
August 11, 2007

The Pythagorean Formula (of Baseball)

Bill James once proposed that a baseball team's expected ratio of wins to losses equals the ratio of the square of its runs scored to the square of its runs allowed. It's called the Pythagorean Theorem because the real Pythagorean Theorem has something to do with squares and addition and all that, so why not?
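For reference, the formula is usually written in win-percentage form, roughly as below. RS and RA are just my shorthand for runs scored and runs allowed; the two versions say the same thing.

```latex
\[
\text{Expected Win\%} = \frac{RS^2}{RS^2 + RA^2}
\quad\Longleftrightarrow\quad
\frac{\text{Expected W}}{\text{Expected L}} = \frac{RS^2}{RA^2}
\]
```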

The Mariners, who are at this moment one percentage point ahead in the AL wild card race, are having a particularly surprising year by this measure. They have scored 560 runs and allowed 549, which creates the expectation that they will basically be playing .500 ball (actually .510 as of today, so they are expected to be 58-55 instead of 63-50). ESPN has a Relative Power Index page that shows, among other things, the Pythagorean expectations (listed in the last two columns as "ExpW-L" and "ExpWP").

The Mariners are 5 games over their ExpW-L, but they aren't the "best" at that. The Diamondbacks are 66-51, with the best record in the National League, despite having been outscored 516 to 495 by their opponents. Their ExpW-L is 56-61, so they are 10 games over.

The team that is furthest UNDER their ExpW-L is the Yankees. They have scored 684 runs and allowed only 532, so by rights they should be in first place with a 72-43 record, one game ahead of the Red Sox. The Mets, by comparison, who happen to have the exact same record as the Yankees right now, are only up 535-489 in runs, for an expected 63-52 record (nine games behind the Yankees' expectation). And Seattle, which has essentially the same record, has an even worse expectation.
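If you want to check the arithmetic yourself, here is a minimal Python sketch. The function names are mine, and rounding the expected wins to the nearest whole game is my assumption about how the ExpW-L column is produced; the run totals and games played are the ones quoted above.

```python
def pythagorean_wp(runs_scored, runs_allowed):
    """Expected winning percentage: RS^2 / (RS^2 + RA^2)."""
    rs_sq = runs_scored ** 2
    ra_sq = runs_allowed ** 2
    return rs_sq / (rs_sq + ra_sq)


def expected_record(runs_scored, runs_allowed, games):
    """Expected (wins, losses) over `games`, rounded to whole games.

    Rounding to the nearest game is an assumption, not a documented rule.
    """
    wins = round(pythagorean_wp(runs_scored, runs_allowed) * games)
    return wins, games - wins


# (runs scored, runs allowed, games played so far), taken from the post
teams = {
    "Mariners": (560, 549, 113),
    "Diamondbacks": (495, 516, 117),
    "Yankees": (684, 532, 115),
    "Mets": (535, 489, 115),
}

for name, (rs, ra, games) in teams.items():
    wins, losses = expected_record(rs, ra, games)
    print(f"{name}: ExpWP {pythagorean_wp(rs, ra):.3f}, ExpW-L {wins}-{losses}")
```

With these inputs the sketch reproduces the expected records quoted above: 58-55 for Seattle, 56-61 for Arizona, 72-43 for the Yankees and 63-52 for the Mets.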

James claims that teams tend to revert to their expected Pythagorean record over time, although not necessarily in the span of a single season. It does seem reasonable to feel that the Mariners have gotten "luckier" this season than the Yankees, but like many of the calculations that James has come up with over the years, there's no real logic behind it, except some hand-waving and the fact that it seems to play out historically. (Before I continue, I'll point out that I think Bill James is great, and the statistical approach he has taken to analyzing baseball is light-years ahead of the gut-feel approach of old.)

For example, you could argue that won-lost records should be proportional to runs scored vs. runs allowed (as opposed to the squares), or to the cubes of those. These would all work if you consider the league as a whole, where runs scored equals runs allowed and the record is .500. Or you could say that it is proportional to batting average vs. batting average allowed, or batting average vs. league average compared to ERA vs. league ERA, and maybe you could square those for good measure, or take the square root, or who knows.

I know that James has stated that the goal of a baseball team's offense should be to score runs, and the goal of defense is to prevent runs, so I see why he focused on a run-based calculation when he posited the Pythagorean Theorem. But the focus on runs is debatable; arguably the goal of a team should be to win games, not score runs (most people would agree that Arizona, whose actual record is one game ahead of the Yankees, has had a slightly better year than the Yankees so far, despite the fact that the Yankees are 173 ahead in run differential and 17 games ahead in ExpW-L). Or maybe the goal is to win championships, or generate high attendance.

The Pythagorean theory is fun and interesting, you can impress baseball fans by spouting it, and if the Yankees wind up stomping the Mariners in the standings it will have proven itself again, but as a theory it has some clunky underpinnings.
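To make the point about exponents concrete, here is a rough sketch that tries the plain ratio, the squares and the cubes. The `exponent` parameter and the function name are mine, not anything James published. Every exponent gives .500 when runs scored equals runs allowed, so the league-wide totals alone can't tell you which one is right.

```python
def expected_wp(runs_scored, runs_allowed, exponent):
    """Generalized form: RS^k / (RS^k + RA^k); k = 2 is the usual formula."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# A team (or a whole league) that scores exactly what it allows is .500
# no matter which exponent you pick. The 700 is an arbitrary made-up total.
for k in (1, 2, 3):
    assert expected_wp(700, 700, k) == 0.5

# For a team that outscores its opponents, a bigger exponent predicts a
# better record. Yankees totals from the post: 684 scored, 532 allowed.
for k in (1, 2, 3):
    print(f"exponent {k}: expected WP {expected_wp(684, 532, k):.3f}")
```

The only way to choose among the exponents is to see which one fits actual team records best, which is the "plays out historically" argument again.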

Posted by AdamBa at August 11, 2007 12:39 PM
