Here are some facts.
The Mercier calculator uses a total of twenty data points for each event. For the men's marathon, those data points are all in the range of about 2:05 to 2:10. For the women's marathon, those data points are all in the range of 2:20 to 2:30. All measurements and equivalencies of performances outside the range of those data points are based on crude mathematical extrapolations, not distributions of actual performances.
It's not surprising, then, that the Mercier calculator rates Paula's performance in the 2003 London marathon as the equivalent of a sub-2:00 marathon by a man. Her time falls outside the range of the women's data points, so that rating is pure extrapolation. The Mercier calculator doesn't even recognize that any woman has ever broken 2:20 in the marathon.
A reasonably bright grade-schooler could, in the course of an afternoon, construct a calculator that better reflects the distribution of men's and women's marathon times and more accurately "equates" performances of men and women based on where those performances lie on their respective distribution curves. Of course, as many others have pointed out, the significance of such purported equivalencies is, to say the least, open to serious debate.
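For anyone who wants to see what such an afternoon project might look like, here is a minimal sketch in Python. It is not how the Mercier calculator or the IAAF tables work; it just finds where a time sits in one list of performances and reads off the time at the same relative position in the other list. The function names (`position`, `time_at`, `equivalent`) and the two short sample lists are made up for illustration; a real version would need actual performance lists, which I'm not supplying here. Note that it refuses to extrapolate outside its data instead of guessing.

```python
from bisect import bisect_right


def to_seconds(hms):
    """Convert an 'H:MM:SS' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert seconds to an 'H:MM:SS' string, rounded to the nearest second."""
    sec = round(seconds)
    return f"{sec // 3600}:{sec % 3600 // 60:02d}:{sec % 60:02d}"


def position(time_s, sorted_times):
    """Relative position of time_s in the sample: 0.0 = fastest, 1.0 = slowest.

    Interpolates linearly between neighbouring samples and raises
    ValueError for times outside the sample range (no extrapolation).
    Needs at least two samples.
    """
    if not sorted_times[0] <= time_s <= sorted_times[-1]:
        raise ValueError("time lies outside the sample; refusing to extrapolate")
    n = len(sorted_times)
    i = bisect_right(sorted_times, time_s) - 1  # last sample <= time_s
    if i == n - 1:
        return 1.0
    lo, hi = sorted_times[i], sorted_times[i + 1]
    return (i + (time_s - lo) / (hi - lo)) / (n - 1)


def time_at(pos, sorted_times):
    """Time found at relative position pos (0.0 to 1.0) in the sample."""
    n = len(sorted_times)
    x = pos * (n - 1)
    i = min(int(x), n - 2)
    return sorted_times[i] + (x - i) * (sorted_times[i + 1] - sorted_times[i])


def equivalent(time_s, from_times, to_times):
    """Map a time from one distribution to the same position in another."""
    return time_at(position(time_s, sorted(from_times)), sorted(to_times))


# Hypothetical placeholder samples, NOT real results. Substitute real lists.
women = [to_seconds(t) for t in ("2:20:00", "2:22:00", "2:25:00", "2:30:00")]
men = [to_seconds(t) for t in ("2:05:00", "2:06:00", "2:08:00", "2:10:00")]

print(to_hms(equivalent(to_seconds("2:22:00"), women, men)))
```

With these placeholder lists, a 2:15:25 raises an error rather than producing a number, which is the honest answer when the data simply don't reach that far.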
For people who care about this stuff, the IAAF's tables provide a much more sophisticated basis upon which to compare performances. I am very familiar with the IAAF's caveats about limitations on using the tables, and also recognize how the tables can be misused. But I'm not aware of any better, or more current, basis for comparing times across a range of men's and women's events. Certainly, the Mercier calculator isn't in the running.