Tango on Baseball Archives

© Tangotiger

Baseball: Pythagorean Method (February 11, 2004)

Ben V-L checks in with More than you probably ever wanted to know about the "pythagorean" method
--posted by TangoTiger at 09:57 AM EDT


Posted 10:18 a.m., February 11, 2004 (#1) - David Smyth
  Is this the old post from Fanhome, or a "new edition" recently submitted by Ben? If it's new, why doesn't he mention Pythagopat?

Posted 10:40 a.m., February 11, 2004 (#2) - tangotiger
  It's the old one.

Posted 10:44 a.m., February 11, 2004 (#3) - Ben Vollmayr-Lee
  Can anyone point me towards Pythagopat?

Posted 10:46 a.m., February 11, 2004 (#4) - Patriot
  Pythagopat sets the exponent to RPG^.28.

Posted 10:51 a.m., February 11, 2004 (#5) - Ben Vollmayr-Lee
  Is the 'Pat' short for Patriot? I assume this form gives a lower RMS than the others? It must be a small effect, though, because RPG^0.28 is quite linear in the range of historical run environments.

Posted 10:56 a.m., February 11, 2004 (#6) - Patriot (homepage)
  Yeah, Pat is short for Patriot. It's not so much about RMS. It's about accuracy at the one point where we know for sure what the exponent is. That point is 1 RPG. The minimum RPG for any game is 1, because the game keeps going until someone scores. And if there is only one run scored in the game, the team that scored it wins. So if you play 160 games and score 90 runs and allow 70 runs, you go 90-70. So the exponent at RPG=1 must be 1. (This insight came from David.) I think I came up with RPG^.29, and David something similar, and then Tango determined that .28 would work best for the most teams (there are 3 articles on his webpage... I linked one above).
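
The RPG=1 argument above can be checked numerically. This is only a minimal sketch: the function name and argument layout are made up for illustration, and RPG is taken to mean total runs per game by both teams, (RS + RA) / games.

```python
def pythagopat_wpct(rs, ra, games, power=0.28):
    """Pythagorean win% with the exponent set to RPG^power (Pythagopat).

    Assumption: RPG = (RS + RA) / games, i.e. runs per game by both teams.
    """
    rpg = (rs + ra) / games
    n = rpg ** power
    return rs ** n / (rs ** n + ra ** n)


# The 160-game example from the post: RPG = 1, so the exponent is 1
# and the formula reduces to RS / (RS + RA) = 90 / 160.
print(pythagopat_wpct(90, 70, 160))

# A hypothetical team in a normal run environment (RPG a bit above 9).
print(pythagopat_wpct(800, 700, 162))
```

The first call gives 0.5625, i.e. exactly the 90-70 record, whatever value `power` takes, because 1 raised to any power is 1.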

Posted 10:57 a.m., February 11, 2004 (#7) - Greg Tamer(e-mail)
  Sweet -- a linear fit equation I can roughly do in my head to estimate a team's winning percentage using RS and RA. Will have to use n/2 = 0.90, though, but that's not too terrible.

Is there any model, regardless of complexity but only using RS and RA, that can reduce the rmse to under four games? Well, perhaps lgRS and lgRA can be included as well.

Ben -- will your report remain indefinitely on your website? Perhaps this can be submitted to Primer's Visitor's Dugout and posted as an article for safe-keeping.

Posted 11:12 a.m., February 11, 2004 (#8) - David Smyth
  The thing about the 1 rpg is that, due to the construction of the game (no ties at 0), it is a *unique point*. It is good to have a formula which handles that unique point correctly, but it would not be a good tradeoff if the formula was noticeably less accurate at normal scoring levels. Happily, that doesn't seem to be the case.

Posted 5:25 p.m., February 11, 2004 (#9) - Ben Vollmayr-Lee
  Greg: I will be leaving this on my web page indefinitely, so it's probably safe. Though I have no problems with Primer archiving it (though maybe I should finish formatting it first). To get to a lower RMS I'm pretty sure you would need to account for more information than just RS and RA.

Patriot: thanks for the description. So RPG^x has a feature like pythagoras, that it hits an extreme properly. But, as David already said, this is a lesser concern than fitting the data well in the typical RPG range.

Evidently RPG^0.28 does well as judged by RMS (presumably what Tango showed). That's interesting, because it's a 1-parameter fit, so it's 'fitting with one hand tied behind its back'. I don't have my data set up at the moment to conveniently run it through and find the RMS.

Posted 5:32 p.m., February 11, 2004 (#10) - tangotiger
  If I remember right, doing it as abs(real win% - est win%), all of these were around an average error of .020 (or 3.2 wins per year). I suppose that probably corresponds to the RMS of 4.2. So, you don't gain much anywhere, even with the ultracomplex Tango Distribution.
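
A quick back-of-envelope check of that conversion, as a sketch only. It assumes a 162-game season and, for the RMS comparison, normally distributed errors (neither assumption is stated in the thread); for a normal distribution, RMS = mean absolute error * sqrt(pi/2).

```python
import math

GAMES = 162          # assumed season length
mae_wpct = 0.020     # average absolute error in win% quoted above

mae_wins = mae_wpct * GAMES
# Only valid if the errors are (roughly) normally distributed.
rms_wins_if_normal = mae_wins * math.sqrt(math.pi / 2)

print(round(mae_wins, 2), round(rms_wins_if_normal, 2))
```

This comes out to about 3.24 and 4.06 wins, so an RMS near 4.2 is in the right neighborhood for an average error of .020.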

Posted 3:03 p.m., February 12, 2004 (#11) - Ben Vollmayr-Lee
  Pythagopat revisited: In the old post I had tried a formula of the form

wpct = 0.5 + a RPG^b (x-1/2)

I didn't report the fit values, but they turn out to be a=0.933765, b=0.309298 (linear least square fitting). This gave the RMS of 4.192 reported. If I go back and fit with

wpct = 0.5 + RPG^b (x-1/2)

I get b = 0.277742, also with RMS 4.192. This is a good sign that the 1-parameter fit is better than the 2-parameter fit. Now if I pythagorize 'n', i.e. take

n = a RPG^b

in the pythagorean formula, I get a=0.94935, b=0.310016, with RMS 4.185. Going all the way to Pythagopat, n = RPG^b, I get b=0.286089 with RMS 4.184.*
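
For reference, the four fits above can be put side by side in a few lines. This is a sketch under assumptions: x is not defined in this thread and is taken here to be RS / (RS + RA), RPG is taken to be (RS + RA) / games, and the function names and the 800/700 example team are made up for illustration.

```python
def linear_wpct(rs, ra, games, b, a=1.0):
    """wpct = 0.5 + a * RPG^b * (x - 1/2); x is ASSUMED to be RS / (RS + RA)."""
    rpg = (rs + ra) / games
    x = rs / (rs + ra)
    return 0.5 + a * rpg ** b * (x - 0.5)


def pythagorean_wpct(rs, ra, games, b, a=1.0):
    """Pythagorean formula with exponent n = a * RPG^b (a = 1 is Pythagopat)."""
    rpg = (rs + ra) / games
    n = a * rpg ** b
    return rs ** n / (rs ** n + ra ** n)


# Fitted parameter values as reported in the post.
FITS = [
    ("linear, a*RPG^b", linear_wpct, {"a": 0.933765, "b": 0.309298}),
    ("linear, RPG^b", linear_wpct, {"b": 0.277742}),
    ("pythagorean, n=a*RPG^b", pythagorean_wpct, {"a": 0.94935, "b": 0.310016}),
    ("pythagopat, n=RPG^b", pythagorean_wpct, {"b": 0.286089}),
]


def projected_wins(rs, ra, games=162):
    """Projected wins under each of the four fits."""
    return {label: games * form(rs, ra, games, **params)
            for label, form, params in FITS}


# Hypothetical team: 800 runs scored, 700 allowed over 162 games.
for label, wins in projected_wins(800, 700).items():
    print(f"{label:>24}: {wins:.2f} wins")
```

For a team like this one the four forms land within a fraction of a win of each other, which fits with the RMS values being nearly identical.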

Two comments:

1) these RMS values are right in line with the RMS obtained from other functional forms that had some form of diminishing returns (pythagoras or cubic correction) and a RPG-dependent slope.

2) Pythagopat, in being constructed to behave reasonably in the RPG=1 limit, matches the performance of 2-parameter fits with only 1 parameter. This lends some merit to the theoretical idea behind pythagopat. The way I would put it: the rules of baseball (no ties) are evidently one of the main sources for the RPG-dependence in wpct formulas. And this is pretty cool. I learned something new.

* You might be troubled that RPG^b can fit to a lower RMS than a*RPG^b. The reason is that what I'm reporting as RMS (root-mean-square) has in the denominator of the mean (average) NOT the number of team-seasons in my sample (1972) but rather the number of team-seasons minus the number of fitting parameters. So the one-parameter fit had 1971 in the denominator and the two-parameter fit had 1970 in the denominator. This version of RMS is a better measure of the quality of your fit, and it tells us in this case that adding an extra parameter brings no real improvement to the fit.
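
The denominator convention in the footnote can be sketched as follows. The function name is made up for illustration, and the last line is only a rough consistency check on the rounded values quoted above, not a reconstruction of the actual fit.

```python
import math


def rms_error(residuals, n_params=0):
    """Root-mean-square error with (N - number of fit parameters) in the denominator."""
    dof = len(residuals) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fit parameters")
    return math.sqrt(sum(r * r for r in residuals) / dof)


# Toy residuals: the same errors score slightly worse when charged an extra parameter.
toy = [1, -2, 3, -4]
print(rms_error(toy, n_params=1), rms_error(toy, n_params=2))

# If the two fits had the SAME sum of squared errors, going from 1971 to 1970
# in the denominator alone would move an RMS of 4.184 to about 4.185.
print(round(4.184 * math.sqrt(1971 / 1970), 3))
```

So the 4.184 vs. 4.185 difference is about what the change of denominator alone would produce.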

Posted 3:23 p.m., February 12, 2004 (#12) - tangotiger
  Ben, that's great stuff!

"I get b=0.286089 ". I don't remember what my data set included, but I had .287. So, I think we've pretty much nailed it.

As well, I mentioned that in the cases that we're most interested in (extreme teams and extreme players), I found a better fit with .28. That is, if you throw out the one-third or one-half of your teams with the smallest run differential (or smallest (RS-RA)/(RS+RA)), .28 works better.

I suppose if you really, really wanted to find the best fit, you'd do:
b = x + [ABS(RS-RA) / (RPG)] y

Then, n=RPG^b

For the majority of teams, that y term will reduce to close to zero. For a Tiger or Yankee team, that second term might come out to -.01 or -.02.

I suppose that x=.29 and y= -.01 or -.02.

Posted 3:25 p.m., February 12, 2004 (#13) - tangotiger
  Uhmmm... that should be y= -.1 or -.2.
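
A sketch of the variable-exponent idea from posts #12 and #13, using the suggested x = .29 and the corrected y = -.1. These are guessed values from the posts, not fitted ones. The sketch assumes RS and RA in the b formula are per-game figures, so that ABS(RS-RA)/RPG equals |RS-RA| / (RS+RA); the function names and the example teams are hypothetical.

```python
def variable_exponent_b(rs, ra, games, x=0.29, y=-0.1):
    """b = x + [ABS(RS - RA) / RPG] * y, with RS and RA ASSUMED to be per game."""
    rs_pg, ra_pg = rs / games, ra / games
    rpg = rs_pg + ra_pg
    return x + abs(rs_pg - ra_pg) / rpg * y


def variable_exponent_wpct(rs, ra, games, x=0.29, y=-0.1):
    """Pythagorean win% with n = RPG^b and b from variable_exponent_b."""
    rpg = (rs + ra) / games
    n = rpg ** variable_exponent_b(rs, ra, games, x, y)
    return rs ** n / (rs ** n + ra ** n)


# An even team keeps b = x; a lopsided hypothetical team gets a slightly smaller b.
print(variable_exponent_b(750, 750, 162))
print(variable_exponent_b(900, 700, 162))
print(variable_exponent_wpct(900, 700, 162))
```

For the 900/700 team the second term is 0.125 * -0.1 = -0.0125, which is in the -.01 to -.02 range mentioned in post #12.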