Tango on Baseball Archives

© Tangotiger


Persistency of reverse Park splits (November 20, 2003)

MGL takes a quick and dirty look at hitters who have Home/Road splits that are in reverse of the HFA of his home park.

As much as I like to challenge established beliefs in baseball, I also like to challenge beliefs that are not yet established. One such example is “What is more predictive of a player’s home/road splits – his own historical splits or his overall stats and the average park factor of his home park?” Conventional wisdom says that the answer lies somewhere in between – that an unusual home/road historical split for a player, or one that is not in line with his home park’s average park factor, may be a fluke, but also may be indicative of that player either intentionally or unintentionally being able to take advantage or not of the unique characteristics of his home park.

In fact, some people express a severe disdain for even using average park factors to adjust a player’s home stats. Presumably they believe that most or all players take advantage (or not) of their home park in unique ways and that applying average park factors to an individual player’s stats to neutralize the effect of perhaps playing in a pitcher’s or hitter’s park does more harm than good.

You will also often hear and see people talk about how players like Cameron or Nomar have such extreme historical splits that they must be doing something either particularly right or wrong in their home parks – in other words, that such extreme splits are due, at least in part, to something other than the average effect of the park itself combined with random fluctuation.

While there seems to be some merit in the argument that not all players are going to be affected in the same way by the unique characteristics of their home park (in fact, that is almost a given), is the signal to noise ratio in a player’s historical home/road splits so low that we can tell very little about how a player is truly affected by his home park by looking at his historical splits? In the same vein, are a player’s unique or true splits insignificant compared to the average effect a park has on that player? In either of those cases, if we want to predict a player’s home/road splits, we would be better off using a player’s average home park factor and ignoring his actual sample historical splits, much like we do with a pitcher’s $H rate (DIPS) or a batter’s (especially a RHB) platoon ratio or clutch stats. In other words, the regression for a player’s home/road splits would be close to 100%, depending upon, as usual, the size of the historical sample you are using for your prediction (although as regression rates approach 0 or 100%, sample sizes become less and less important).

The other side of the coin, as I said, is to use some moderate regression rate, such that in order to predict a player’s home/road splits, we combine a player’s sample historical split rate with the average park factor of his home park.

As usual, I am going to hypothesize that a player’s historical splits are not very predictive of his future splits – therefore our best tool for predicting a player’s splits is his average home park factor applied to his home stats. In other words, I am suggesting that the regression rate for a player’s home/road splits is near 100% for a small sample and 80 or 90% (maybe more) for even a large sample. If I am right, then it is correct to simply park adjust a player’s home stats in the traditional way if we want to compare players on a level playing field, without worrying about the fact that any given player might be uniquely affected by his home park in ways that are not captured by that park’s average park factor.
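The regression idea in the last few paragraphs can be written as a simple blend. This is only a sketch: the name `predict_split` and the example numbers are assumptions, and the park factor is treated as the home/road ratio an average player would be expected to show in that park.

```python
def predict_split(observed_ratio, park_factor, regression):
    """Blend a player's sample home/road ratio with his park's average factor.

    regression = 1.0 ignores the player's own split entirely;
    regression = 0.0 takes the sample split at face value.
    """
    if not 0.0 <= regression <= 1.0:
        raise ValueError("regression must be between 0 and 1")
    return park_factor + (1.0 - regression) * (observed_ratio - park_factor)


# Hypothetical player: 1.14 sample ratio in a .96 park, regressed 90%.
print(round(predict_split(1.14, 0.96, 0.90), 3))
```

At 90% regression, only a tenth of the gap between the sample split and the park factor survives into the prediction.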

There are any number of ways to address this issue in a research study. Unfortunately, I have neither the time nor the energy right now to do a very good job (how’s that for a disclaimer?). Let’s look at the results of a Q&D (quick and dirty) study:

First I computed the 10-year (less if a park was new or changed any of its characteristics) regressed OPS park factor for each park in 2001 and 2002. The 2002 OPS park factors are printed below. The 2001 ones are similar:

ARI 1.04
ATL .98
CHN .99
CIN 1.01
COL 1.18
FLO .96
HOU 1.05
LAN .97
MIL 1.01
MON 1.02
NYN .95
PHI 1.00
PIT 1.00
SDN .96
SLN 1.00
SFN .96


BAL .96
BOS 1.02
ANA 1.03
CHA 1.01
CLE 1.02
DET .97
KCA 1.04
MIN 1.01
NYA .97
OAK .96
SEA .96
TBA 1.02
TEX 1.02
TOR 1.00


Then I computed every player’s home/road OPS ratio in 2001, after first adjusting for the normal home field advantage (HFA). On average, a player’s home OPS is 1.03 times his road OPS. So before dividing a player’s home OPS by his road OPS to get his home/road OPS ratio, I divided his home OPS by 1.015 and multiplied his road OPS by 1.015 (1.015 squared is about 1.03), which removes a normal HFA from the ratio.
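As a rough sketch of that adjustment (the function name and the sample OPS figures are made up for illustration; the text uses the rounded 1.015, while this version takes the square root of 1.03 directly):

```python
import math


def hfa_adjusted_ratio(home_ops, road_ops, hfa=1.03):
    """Home/road OPS ratio with the league-average home edge removed.

    The average advantage is split evenly between the two sides:
    home OPS is divided by sqrt(hfa) and road OPS is multiplied by it.
    """
    half = math.sqrt(hfa)  # about 1.015 when hfa is 1.03
    return (home_ops / half) / (road_ops * half)


# Hypothetical hitter whose home OPS is exactly 3% better than his road OPS:
# after the adjustment his ratio comes out neutral.
print(round(hfa_adjusted_ratio(0.824, 0.800), 3))
```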

Next I only looked at players who: 1) had at least 300 PA’s, home and road combined, 2) played in the top four hitter’s (ARI, HOU, KCA, ANA), not including Colorado, or pitcher’s parks (LAN, SFN, SDN, DET), according to their 2001 and 2002 OPS park factors, and 3) had a “reverse” home/road split in 2001 – i.e., their OPS was better at home even though they played in one of the top four pitcher’s parks or their OPS was worse at home even though they played in one of the top four hitter’s parks. Here are those players in 2001 and their home parks, the lesser of their home or road PA’s, and their home/road OPS ratio (again, after adjusting for HFA):
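The three selection rules can be sketched as a filter. Two things here are assumptions rather than anything stated in the study: that "reverse" is judged on the HFA-adjusted ratio, and that the cutoff is exactly 1.00. The function name is also just illustrative.

```python
HITTERS_PARKS = {"ARI", "HOU", "KCA", "ANA"}   # Colorado left out, as in the text
PITCHERS_PARKS = {"LAN", "SFN", "SDN", "DET"}


def is_reverse_split(park, total_pa, adjusted_ratio, min_pa=300):
    """True if a player-season belongs in the 'reverse split' sample."""
    if total_pa < min_pa:
        return False
    if park in PITCHERS_PARKS:
        return adjusted_ratio > 1.0   # hit better at home in a pitcher's park
    if park in HITTERS_PARKS:
        return adjusted_ratio < 1.0   # hit worse at home in a hitter's park
    return False                      # park is not one of the eight extremes


# Hypothetical player-seasons (park, total PA, adjusted ratio):
print(is_reverse_split("DET", 550, 1.16))  # True
print(is_reverse_split("COL", 600, 0.90))  # False
```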

Pitcher’s parks

Karros LAN 222 1.05
Loduca LAN 200 1.08
Aurilia SFN 305 1.07
Cedeno DET 233 1.16
Higginson DET 270 1.35
Macias DET 220 1.11


Hitter’s parks

Bell ARI 214 .95
Gonzo ARI 315 .98
Williams ARI 196 .81
Ausmus HOU 206 .88
Castilla HOU 205 .74
Eckstein ANA 286 .97
Salmon ANA 275 .97
Alicea KCA 183 .75
Quinones KCA 214 .91
Randa KCA 283 1.00
Sweeney KCA 262 .91


For the above players, the average OPS park factor for the pitcher’s parks was .96. The average OPS ratio for the players in the pitcher’s parks was 1.14. For the hitter’s parks (I did not include Colorado), the average park factor for the above players was 1.04 and the average OPS ratio for those players who played in the hitter’s parks was .91.
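The study does not say how the group averages were computed. One plausible reading, and it is only an assumption, is a weighted average that uses the PA figure listed beside each player as the weight. On the 2001 lists above, that reproduces the quoted 1.14 and .91 to two decimals:

```python
# (player, park, lesser of home/road PA, HFA-adjusted home/road OPS ratio), 2001
PITCHERS_PARK_2001 = [
    ("Karros", "LAN", 222, 1.05), ("Loduca", "LAN", 200, 1.08),
    ("Aurilia", "SFN", 305, 1.07), ("Cedeno", "DET", 233, 1.16),
    ("Higginson", "DET", 270, 1.35), ("Macias", "DET", 220, 1.11),
]
HITTERS_PARK_2001 = [
    ("Bell", "ARI", 214, 0.95), ("Gonzo", "ARI", 315, 0.98),
    ("Williams", "ARI", 196, 0.81), ("Ausmus", "HOU", 206, 0.88),
    ("Castilla", "HOU", 205, 0.74), ("Eckstein", "ANA", 286, 0.97),
    ("Salmon", "ANA", 275, 0.97), ("Alicea", "KCA", 183, 0.75),
    ("Quinones", "KCA", 214, 0.91), ("Randa", "KCA", 283, 1.00),
    ("Sweeney", "KCA", 262, 0.91),
]


def composite_ratio(rows):
    """PA-weighted average of the listed home/road OPS ratios."""
    total_pa = sum(pa for _, _, pa, _ in rows)
    return sum(pa * ratio for _, _, pa, ratio in rows) / total_pa


print(round(composite_ratio(PITCHERS_PARK_2001), 2))  # 1.14
print(round(composite_ratio(HITTERS_PARK_2001), 2))   # 0.91
```

A simple unweighted mean gives about .90 for the hitter's-park group, so the weighting matters a little at the second decimal.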

The next obvious step is to look at the same players’ OPS ratios in 2002, restricted to those who played in the same park and again amassed at least 300 PA’s. There were 17 players (6 and 11) in the 2001 sample with a total of 4089 PA’s home and road combined. In the 2002 sample, 10 players “survived” (same home park and at least 300 total PA’s) with a total of 2452 PA’s.

Here are the 2002 results:

Pitcher’s parks

Karros LAN 248 .87
Loduca LAN 281 .75
Aurilia SFN 228 1.03
Higginson DET 222 .93

Hitter’s parks

Gonzo ARI 267 .87
Ausmus HOU 196 1.43
Eckstein ANA 289 .98
Salmon ANA 232 .81
Randa KCA 267 1.08
Sweeney KCA 1.06


When you do a study like this, the most telling statistics are the aggregate results of each group. If you look at each individual player’s OPS ratio in one year and then the other, you will be tempted to make conclusions one way or another about each individual player. That is what you were trying to avoid in the first place and why you want to look at as many “extreme” players as possible combined in order to get a large sample. Here are the composite results:

In 2002, the players in the hitter’s parks who originally all had a “reverse” OPS ratio of a combined .91, had a combined OPS ratio of 1.02 the following year. The average OPS park factor for these parks was 1.04. The players in the pitcher’s parks who had a “reverse” combined OPS ratio of 1.14 in 2001, ended up with a combined OPS ratio of .89 in 2002. The average OPS park factor for these parks was .96.
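One way to put a number on those composites is to ask what share of each group's 2001 deviation from its park factor had disappeared by 2002. This is a back-of-the-envelope sketch, not a calculation from the study; the function name is an assumption, and with only 10 surviving players the result is very noisy.

```python
def implied_regression(year1_ratio, year2_ratio, park_factor):
    """Share of the year-1 deviation from the park factor that vanished in year 2.

    1.0 means the group landed exactly on the park factor; a value above 1.0
    means it overshot to the other side of the park factor.
    """
    return 1.0 - (year2_ratio - park_factor) / (year1_ratio - park_factor)


hitters = implied_regression(0.91, 1.02, 1.04)    # hitter's parks, 2001 -> 2002
pitchers = implied_regression(1.14, 0.89, 0.96)   # pitcher's parks, 2001 -> 2002
print(round(hitters, 2), round(pitchers, 2))
```

The hitter's-park group gives back roughly 85% of its reverse split, and the pitcher's-park group more than 100%, which fits the hypothesis of a regression rate near 100% for a one-year sample.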

While further (and better) study, especially establishing a larger sample size, is needed to address this issue, my preliminary conclusion is that a player’s sample home/road ratio, at least for one year, is not at all a reliable predictor of his future home/road splits, and that in fact, the best predictor of a player’s home/road splits is the average multi-year park factor of his home park.

--posted by TangoTiger at 10:17 AM EDT


Posted 10:20 a.m., November 20, 2003 (#1) - tangotiger
  Bill James also did a similar study regarding the reverse platoon advantage (LH/RH v LP/RP), and came to a similar conclusion.

Posted 11:42 a.m., November 20, 2003 (#2) - studes
  This makes a ton of sense. I spent a lot of time earlier this year picking apart park factors as best I could and trying to determine if certain types of hitters did better/worse in certain parks. I still think there's something there, but the classification of hitters is very tricky, and so is the regression toward the mean.

I also spent some time on Retrosheet reviewing home/road splits for a lot of Mets hitters (cause I was interested in the effects at Shea, which has been around for a long time). I was amazed at how much variance there was in one-year splits. For instance, Mookie Wilson had several years in which he hit "significantly" better at Shea, and several years in which he hit "significantly" worse. Significantly, by my own non-analyzed impression.

My only point is that I think you can only draw conclusions about specific batters or types of batters with A LOT of data and with an airtight classification system. And I'm not sure what you'll find out at that point.

By the way, check out Sid Fernandez's home/road splits from his Met days. I bet those pass the significance test.

Did I just undermine my entire point?

Posted 11:56 a.m., November 20, 2003 (#3) - tangotiger
  El Sid is an EXTREME FB pitcher, and also at the top of the list with low hits on balls in play (not that those 2 things are mutually exclusive). I would not be surprised if he were the pitcher most exposed to whatever park effect exists at Shea (like Wade Boggs at Fenway).

Posted 2:01 p.m., November 20, 2003 (#4) - MGL
  "By the way, check out Sid Fernandez's home/road splits from his Met days. I bet those pass the significance test.

"Did I just undermine my entire point?"

To some extent you did. Looking at an extreme split (as compared to the average home park factor - i.e., a player who has a 1.20 home/road OPS ratio while playing for the Rox does not have an "extreme" split) for a player and doing a significance test on that is NOT the proper way to decide whether that sample split is a fluke or is "real" (or a combination). As is often the case with these types of questions, this is a Bayesian probability problem. First, you have to answer the question, "What is the distribution and magnitude of players in the population (of ML baseball players) who have unique true home/road splits that are different from the true splits of an average player for that home park?" Then, and only then, can you start doing "significance" tests on a particular sample split and some ensuing calculations. For example, if you answer the first question (the first part of the Bayesian calculation) with "There are no players with unique splits (such as if we were trying to find an association between a player's splits and the month of his birth)," then any weird sample split (even 4 standard deviations from the mean) is not going to suggest that the extreme sample split was anything but a fluke. That's how the analysis must be done.

People need to add one more important word to their baseball analysis vocabulary when it comes to these types of problems - "Bayes!"
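A toy version of that Bayesian calculation, as a sketch only: it assumes just two kinds of player (no unique split, or a unique split of one fixed size) and normally distributed sampling error. All names and inputs are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_split_is_real(observed_z, prior_real, assumed_true_z):
    """Posterior probability that a player has a unique true split.

    observed_z     : sample deviation from the park norm, in standard errors
    prior_real     : assumed fraction of players with a unique true split
    assumed_true_z : assumed size of a real effect, in the same units
    """
    like_real = normal_pdf(observed_z, mean=assumed_true_z)
    like_fluke = normal_pdf(observed_z, mean=0.0)
    numerator = prior_real * like_real
    return numerator / (numerator + (1.0 - prior_real) * like_fluke)


# Same 4-standard-deviation sample split under three different priors.
for prior in (0.0, 0.05, 0.5):
    print(prior, round(prob_split_is_real(4.0, prior, 2.0), 3))
```

With a prior of zero (the "month of birth" case), the posterior stays at zero no matter how extreme the sample split is.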

And yes, not only did James find that there was virtually no such thing as a unique true platoon ratio in major league baseball (see my post in the Clutch thread about the T. Long trade a few days ago), at least for RHB's (and to a lesser extent for LHB's), so did the authors of the book "Curve Ball" and so did I.

Getting back to the Fernandez example and to home/road splits in general, if in fact we find that the regression for players' sample home/road splits is large, which my study suggests that it is, AND we know intuitively that some of a park's unique characteristics that go into its average park factor affect players differently (so it is unlike the "month of birth" example), what must be happening is:

1) The signal to noise ratio is low, probably due in part to the fact that we tend to forget or ignore that the sample size in splits is almost half that of a metric like OPS or BA;

2) there are probably only relatively few players who are significantly and uniquely (different from the average park factor) affected by a particular park; and

3) the effect of these unique influences is probably not that large.

This all leads to the conclusion that using average park factors IS appropriate for adjusting players' home stats and that it does NOT do more harm than good. Using a player's overall stats and the average park factor of his home park is a VERY good way to predict his future splits, regardless of his sample historical splits. In evaluating trades, for example, we should not worry so much about players who have shown extreme and anomalous splits (like Nomar), as those extreme splits are most likely a fluke, absent compelling evidence to the contrary. Even then, we need to be very, very careful (as always) that we don't invent, exaggerate, or embellish "compelling evidence" to accommodate our beliefs.

Now really getting back to El Sid, he may be one of those players with whom you do have SOME compelling evidence at least that his extreme splits may have some merit in terms of accurately representing (with SOME regression) his true splits, given that he was, as Tango said, one of the most extreme fly ball pitchers in baseball history (he once pitched a complete game in which the infield had zero assists), and that he had relatively few balls put in play against him.

Because of the relatively small sample size of the original study, I did the exact same thing for 1999 and 2000. Here are the abbreviated results using the same parks:

There were 27 players (4499 PA's) in the 1999 sample with "reverse" splits. The average of the players' sample splits in the pitcher's parks was 1.13 (remember it "should" be .96). In the hitter's parks, whereas the splits of all players "should" be 1.04, the players with "reverse" splits had a composite split ratio of .89.

In 2000, 19 of the 27 players "survived." The players in the hitter's parks who had a "reverse" composite split of .89 regressed to a composite split of 1.11 and the players in the pitcher's parks regressed from a "reverse" split of 1.13 to a split of .96.

The conclusion is now stronger that, without knowing anything else about a player other than his sample one-year home/road splits, in order to estimate his "true" splits or predict his future splits (again, they are basically one and the same), one should ignore those sample splits and simply assume that his future or true split ratio will be approximately the same as the average player in the league.

BTW, in Tango's lead-in to this study, he meant (or at least, he should have meant) "...in reverse of the 'park factor' of his home park" and not "the 'HFA' (home field advantage)..."

Posted 11:19 p.m., November 20, 2003 (#5) - Michael Humphreys
  Studes,

I assume that El Sid did much better at Shea. That would make a lot of sense. He was probably one of the most extreme fly ball pitchers of all time (I think he once pitched a game with no ground balls), and Shea had one of the largest foul territories when he was pitching. (They've since added new seats in what used to be foul territory.)

Another factor might be visibility. El Sid struck out his share of batters, and the haze at Shea (aggravated by "track" lighting) probably helped him (and Seaver too).

Posted 1:43 a.m., November 21, 2003 (#6) - MGL
  Shea has a smaller than average foul park factor now (.93). Do you know when the seats were added? I keep track of all park changes and I don't recall coming across that, unless it was a long time ago.

Shea is a pitcher's park for basically 3 reasons. One, the infield grass seems to be thick (very low "GB hits thru the IF factor"). Two, as you say the visibility is probably bad and/or the mound is favorable to the pitcher, as the K park factor is high and the BB park factor is low. Three, the HR park factor is low, especially to left and center, because of the average fairly cold weather, the sea-level altitude, and a deep center field and rounded outfield...

Posted 3:05 p.m., November 21, 2003 (#7) - KJOK
  Yes, when were these new Shea seats added? I have around 1985 as the last time any major seating modifications were made to Shea...