Tango on Baseball Archives

© Tangotiger

Batting Average on BIP, 1999-2002 (October 10, 2003)

Someone brings up team DER every now and then, so for those who've never seen it, this is a breakdown of $H (1-DER) by home/away.

--posted by TangoTiger at 08:00 AM EDT


Posted 8:02 a.m., October 10, 2003 (#1) - tangotiger (homepage)
  And this is for 1974-1990.

Eventually, I'll update the first link to include 2003 and the second one to span 72-92.

Posted 12:30 p.m., October 10, 2003 (#2) - FJM
  Tom: could you post the H-R differential for each team and its opponents separately? I believe some parks help the home team a lot more than they help the opposition. Case in point: Coors 2003. If you look only at the Rox, the differential (.324-.283=.041) was down only 8 points from your 1999-2002 number. But from the opponents' perspective, it was down 51 points (.313-.315= -.002). That's right, Rox opponents actually had a better H$ at home than they did in Denver this year.
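
[Editor's sketch] The split FJM asks for is simple subtraction, done once for the home team and once for its opponents. A minimal Python illustration follows, using only the four figures quoted in the post; the function name `coors_minus_elsewhere` is hypothetical, not something from Tango's report.

```python
def coors_minus_elsewhere(at_coors, elsewhere):
    """Batting average on balls in play at Coors minus the same rate elsewhere."""
    return round(at_coors - elsewhere, 3)

# Figures quoted in the post for 2003.
rockies = coors_minus_elsewhere(0.324, 0.283)    # Rockies: Coors vs. road parks
opponents = coors_minus_elsewhere(0.313, 0.315)  # Opponents: Coors vs. their own parks

print(f"Rockies differential:   {rockies:+.3f}")
print(f"Opponents differential: {opponents:+.3f}")
```

Keeping the two differentials separate, rather than pooling them, is what lets a lopsided park effect show up at all.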

Posted 1:05 p.m., October 10, 2003 (#3) - tangotiger
  FJM: while I certainly would be wary of any 1 year differential, your suggestion is a good one, at the multi-year level.

When I get the 2003 data, I'll do breakdowns for 1999-2003 by
- lefty/righty + home/opponent
as well as the overall average.

That sound good?

Posted 1:48 p.m., October 10, 2003 (#4) - FJM
  It does indeed. You might want to take a quick look at the Day/Night split as well. Dodger Stadium and Qualcomm in particular have the reputation of being completely different parks in the daytime.

Also, a study of Questec/non-Questec parks is definitely needed. The effects of Robo-ump should be seen primarily in the K/BB ratio. However, if Curt Schilling is right, it will show up in H$ as well.

Posted 3:59 p.m., October 10, 2003 (#5) - studes (homepage)
  The more I play with park factors, the more convinced I become that one-year factors are virtually meaningless. You can find all sorts of funky stuff in one-year factors. For instance, Rockies' pitchers had a better ERA at home than on the road (obviously, DER had a lot to do with it, as FJM points out). Why? Who knows? The only theory that would make sense is that O'Dowd somehow figured out what a Coors Park pitcher should be, and I sincerely doubt that.

I think MGL, Patriot and Tango are right on: take the long-term view and regress the heck out of the figures. If you don't do that, I think you are better off just totally ignoring park factors. Yes, there will be wild skews on a year-to-year basis. I'd accept those as statistical anomalies that just happen, and that don't really require "correction" except on a long-term basis.

Posted 4:37 p.m., October 10, 2003 (#6) - tangotiger
  Day/night? Hmmm... I suppose grass/turf as well (within home/road), say Fenway-grass, BosAway-grass, BosAway-turf?

So, we've got:
- handedness of pitcher
- handedness of batter
- illumination
- park surface
- park location
- questec

I get the feeling that my sample size will get down to near nothing at this point. On top of that, the unbalanced schedule rears its ugly head, as does the question of whether pitchers are batting or not.

I think what we need here is Alan's logistic function to make some sense out of all this.

Posted 5:01 p.m., October 10, 2003 (#7) - FJM
  If you tried to include all possible factors in your model for every park, you would indeed be left with nothing worth analyzing. I don't believe that will be necessary. For example, the day/night distinction probably only matters for a few parks. The burden of proof is always on the one who asserts the positive. So test it first by itself for each park. Unless it clearly matters for Park X, throw it out. Same for home/visitor, left/right, etc. I'm guessing only one or two factors will be significant for each park, but they won't always be the same ones.
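
[Editor's sketch] One way to "test it first by itself for each park" is a pooled two-proportion z-test on hits per ball in play. That test is an assumption chosen here for illustration; the thread does not name one. The day/night counts and the cutoff of 2 are likewise hypothetical.

```python
import math

def two_proportion_z(hits_a, bip_a, hits_b, bip_b):
    """z-score for the difference between two hit rates on balls in play."""
    p_a = hits_a / bip_a
    p_b = hits_b / bip_b
    pooled = (hits_a + hits_b) / (bip_a + bip_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / bip_a + 1 / bip_b))
    return (p_a - p_b) / se

# Hypothetical counts for one park: day games vs. night games.
z = two_proportion_z(hits_a=620, bip_a=2000, hits_b=1160, bip_b=4000)

# Hypothetical rule: keep the factor for this park only if |z| is at least 2.
keep_factor = abs(z) >= 2.0
print(f"z = {z:.2f}, keep day/night split for this park: {keep_factor}")
```

With these made-up counts, a 20-point gap in $H falls short of the cutoff, so the day/night split would be thrown out for that park.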

Posted 5:51 p.m., October 10, 2003 (#8) - David Smyth
  Not to disrupt the good posts here, but on Tango's link there is an explanation of terms section, and one of the entries is:
Team: the team involved
Thanks, Tango, for making that clear. :)

Posted 6:23 p.m., October 10, 2003 (#9) - tangotiger
  Actually, if you looked even closer, you will note that "team" does not appear as a heading, but "park" does.

Posted 10:24 p.m., October 11, 2003 (#10) - RossCW
  The more I play with park factors, the more convinced I become that one-year factors are virtually meaningless.

I don't think park factors are meaningless - it's pretty clear Coors stadium numbers are not the same as other parks. They can't be accurately applied to individual players or individual seasons. But that just makes them not very useful for most purposes and very often misused, not meaningless.

Posted 12:40 a.m., October 12, 2003 (#11) - studes (homepage)
  Whoa! Targeted by Ross.

I didn't say park factors are meaningless. Just one-year park factors. If I only knew the park factor for Coors Park for one year, I would severely regress it back to the league average.

Look at Pac Bell after its first year, then compared to the next two. Pretty big diff.

Posted 10:16 a.m., October 12, 2003 (#12) - tangotiger
  I agree. We think of Coors as "Coors" because of the number of years of data.

But, I can easily go back and cherry-pick 1 year from Fenway or Wrigley or Fulton County or the Metrodome, etc, etc and find a 30% increase in runs home to road.

To be precise, 1 year is not meaningless (nothing is ever meaningless), but I'll guess that single-year park factors need to be regressed somewhere between 50 and 80%.

And, as Ross points out, when applied to individual players (or style/quality of players), there's no reason that Jack Clark, Willie McGee and Vince Coleman get the same park factors either.

So, the reliability of using single-year park factors on any one player comes in pretty low (though still statistically significant).
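
[Editor's sketch] Regressing a park factor 50 to 80% means removing that share of its distance from the league average. A minimal Python illustration follows, assuming a league average of 1.00 and using the 30% single-year increase mentioned above; the function name is hypothetical.

```python
def regress_park_factor(observed, regression, league=1.0):
    """Pull an observed park factor toward the league average.

    `regression` is the fraction of the deviation removed (0.5 means 50%).
    """
    return league + (observed - league) * (1.0 - regression)

observed = 1.30  # a cherry-picked single-year factor: 30% more runs at home
low = regress_park_factor(observed, 0.5)   # regress 50%
high = regress_park_factor(observed, 0.8)  # regress 80%

print(f"regressed 50%: {low:.2f}")
print(f"regressed 80%: {high:.2f}")
```

So a raw one-year factor of 1.30 would be treated as something between 1.06 and 1.15.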

Posted 12:21 p.m., October 12, 2003 (#13) - studes (homepage)
  Nice job, Tango. I better understand what Ross was saying. But I still question whether applying single-year park factors to individual players is "statistically significant" (at least in terms of determining a player's true ability).

Of course, I should apply the usual caveat: I am not a statistician. And I'm not sure how the concept of statistical significance would apply here. But if one-year park factors are significant for individual players, why regress to the mean?

Posted 1:52 p.m., October 12, 2003 (#14) - RossCW
  And I'm not sure how the concept of statistical significance would apply here

The problem is that it doesn't. There is really no reason to think that a park has the same impact on every player. That large outfield that reduces the SLG of a home run hitter may increase extra base hits for a speedster. So while it is "statistically significant" for the population of players it has no meaning for any individual player.

One way to think about this is to take an average park factor for the AL and NL and then apply them to every park in each league. It ought to be obvious that the impact of Coors has no more relevance to Dodger Stadium than it does to Fenway. But the probability that National League players played at Coors is higher, so the average league park factors would still be statistically significant for the population of NL players, just as individual park factors are.

Posted 5:38 p.m., October 12, 2003 (#15) - studes (homepage)
  But the probability that national league players played at Coors is higher so the average league park factors would still be statistically significant for the population of NL players, just as individual park factors are.

I agree with what you are saying, Ross, except that I am questioning the statistical validity of this last sentence. I mean, the Rockies' pitchers gave up more runs on the road than at home this year.

It's been twenty years since my last statistics class, but I did the following anyway: I looked at the Rockies' 2003 record at home and away, and computed total runs scored per game (both the Rockies and the opposition). At Coors, the number of runs scored per game was 11.9, vs 9.6 on the road, for a park factor of 125%. Next, I calculated the standard deviation for each set of 82 games. It's 4.7 at Coors and 5.3 on the road.

So, just applying one standard deviation to the numerator and denominator, there is a 67% chance that the numerator is between 6.7 and 17.2, and the denominator is between 4.8 and 14.4.

Okay, I admit that I don't know what to do with this statistically, but I am pretty sure this means that, for a given denominator of 9.6, there is a 16.5% (half of 33%) chance the numerator will be 6.7 or lower. If true, there is a 16.5% chance that the Coors park factor is 70 or lower (6.7/9.6).

I know there are problems with this approach, such as assuming a normal distribution and the imbalanced schedule, but I just don't think one-year park factors are statistically significant, even in an extreme case like Coors'.

Posted 5:51 p.m., October 12, 2003 (#16) - studes (homepage)
  One mistake (at least!). The SDs are 4.7 on the road and 5.3 at Coors. The ranges are correct.

Posted 6:06 p.m., October 12, 2003 (#17) - tangotiger
  The SD might be 5 for one game, but it won't be 5 for 81 games.

That is, if 1000 runs over 81 games are scored at Coors, you need the standard deviation of runs over 81 games (and not do 5 x 81).
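
[Editor's sketch] If the 81 games are treated as independent (an assumption made here, not stated in the thread), the spread of the season total grows with the square root of the number of games, and the spread of the per-game average shrinks by the same factor. In Python:

```python
import math

sd_per_game = 5.0  # approximate single-game SD of total runs, from the thread
games = 81

# SD of the season total: scales with sqrt(n), not with n.
sd_of_total = sd_per_game * math.sqrt(games)

# SD of the season's per-game average (the standard error of the mean).
sd_of_average = sd_per_game / math.sqrt(games)

print(f"SD of 81-game total:   {sd_of_total:.1f} runs")
print(f"SD of 81-game average: {sd_of_average:.3f} runs per game")
```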

In any case, we really don't know how to apply PF to individual players with the current reports. You need to figure out park factors by handedness, gb/fb tendency, power/slap hitter, speed, and quality of player.

The problem is that if you want to know how Pac Bell affects all LH, FB, power, fairly fast, great hitters, you might end up having a sample of only 2000 PA over 5 years, 1000 of which would belong to Bonds.

Posted 6:18 p.m., October 12, 2003 (#18) - studes (homepage)
  Yes, I agree about the individual batter thing. That's not really what I'm saying.

And yes, the St Dev is 5 over 81 games. I analyzed this game-by-game.

Posted 8:04 p.m., October 12, 2003 (#19) - studes (homepage)
  Going over my calculations a little more, I think a statistician would say that there is about a 65% chance that Coors is a higher run-scoring environment than the average NL environment, based only on 2003 data. At least, that is my interpretation of the averages and standard deviations, adding in a probability table.

And, obviously, Coors is one of the most (if not the most) extreme environments around. So any conclusions you would want to make about other parks, based on one-year data, would be even less conclusive.

Posted 4:51 p.m., October 13, 2003 (#20) - studes (homepage)
  Okay, I apologize to everyone for using this thread as my own personal learning curve. I spent some time this morning studying my basic statistics even further, and realized I should have used the standard error, instead of the standard deviation (which is what I think you were saying, Tango).

Standard Error is St. Dev divided by the square root of the sample size, which is about 5 divided by 9 in this case, or .556. Two times this gives you a confidence interval of 95%. So I would use this to impute that the Coors Park one-year park factor is statistically significant (that is, there is at least a 95% probability that Coors is a better run environment than the league).
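
[Editor's sketch] The standard-error reasoning can be taken one step further by testing the home/road gap directly. The Python below uses the averages from #15, the corrected SDs from #16, and 81 games per set (the thread says 82 in one place and 81 elsewhere, so 81 is an assumption). It assumes independent games and a normal approximation, and it compares Coors with the Rockies' road games rather than with a true league average.

```python
import math

def mean_diff_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, standard error) for the difference between two sample means."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se, se

# Total runs per game, 2003 Rockies games: at Coors vs. on the road.
z, se = mean_diff_z(11.9, 5.3, 81, 9.6, 4.7, 81)

print(f"standard error of the difference: {se:.3f} runs per game")
print(f"z-score of the home/road gap:     {z:.2f}")
```

A z-score above 2 is in line with the post's conclusion that the one-year Coors gap clears the 95% level, under the stated assumptions.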

Regarding everything else I said: I guess I still don't trust most one-year park factors, and I absolutely agree that, when assigning park factors to individuals, individual batting/pitching types should be taken into account.