BABIP and Speed (January 7, 2004)
Some good stuff in here. Speed may be worth around .015 hits / BIP. With 500 BIP, that's 6 runs.
The issue with LH might have more to do with reaching base on error, which MLB, of course and stupidly, counts as an out for the batter.
--posted by TangoTiger at 07:40 PM EDT
Posted 8:14 p.m.,
January 7, 2004
(#1) -
Charles Saeger(e-mail)
I took a look at the 1992 NL once upon a time and found this advantage to be around the same. I found that adding Speed Score/200 to a base negated speed, or .005 points of BABIP per point of Speed Score.
Posted 8:47 p.m.,
January 7, 2004
(#2) -
MGL
Honestly, I don't understand what the point of this is. Yes, faster players get more infield hits and get more doubles and triples and ROE's (and they bunt more often, which results in a hit around 40% of the time). Is that supposed to be a revelation?
All of these things are reflected in their stats, other than the ROE's. We've always advocated making an adjustment for ROE's in a player's stats. I don't know that it is "stupid" for MLB to treat a ROE as an out. What do you expect them to do? It is the same logic as with earned runs and unearned runs with a pitcher. They assume that a ball that "should" have been an out is an out. That's all. It obviously ends up making a player's official stats not exactly indicative of his true value, but so what? Who says that a player's official stats are supposed to perfectly reflect talent? Yes, from a sabermetric viewpoint an ROE is probably closer to a hit than an out, but one of the points of keeping official stats is simply to reflect exactly what is going on in the field. An ROE is a play that "should have" been an out but for a fielder miscue. The fact that some players cause more ROE's than others is not necessarily relevant to MLB - only to someone trying to estimate true value from official stats. At least they are tabulated separately, so you can do the adjustment if you want. What would you have them do - call it a single? That's fine too, I suppose, although it is certainly more consistent to call it an out than a single. Frankly, I don't think it matters what you call it. If you have a problem with ROE's you have to have a serious problem with SF's or with including IBB's in a batter's BB totals!
As far as the "article" goes, of course LHB's are going to have more infield singles than RHB's, after you control for speed and everything else. Looking at their BABIP or regular BA is not going to tell you anything about a batter's speed unless you control for everything else. The pool of LHB's is worse overall than the pool of RHB's just as the pool of LHP's is worse than the pool of RHP's.
As far as the ROE's, which I assume is your "beef," before we treat them as a single, we would need to see how much is luck and how much is skill. If the skill component is not similar to that of a single (if the period to period r's are not similar), then I don't think you want to treat it as a single. We know that there is SOME skill (i.e. speed) involved in the ROE's - how much is the important question before we get all worked up over the fact that they are normally treated as outs.
In Super-lwts, I am going to either include ROE's, or at least give some extra weight to a player's speed, somewhere. The reason for doing the latter rather than including ROE's per se, is that let's say that there were two RHB's who were equally slow and had equal power (and about the same GB/FB ratio), but one had lots more ROE's than the other. You would have to assume that their difference in ROE rate was pure luck, and any attempt to assign "value" to those different ROE rates would be surplusage. Accordingly, you might be better off just making an adjustment for speed, GB/FB ratio, and handedness. In fact, maybe just adding a little extra value to a player's IF hits might do the trick, as a player's infield hit rate (per PA) might be the best predictor of his ROE rate, as they come from essentially the same "skill" (speed, BIP rate, distance from home to first, and GB/FB ratio, and depth of IF)...
Posted 11:18 p.m.,
January 7, 2004
(#3) -
Tangotiger
Honestly, I don't understand what the point of this is.
Because we are quantifying what we already know qualitatively.
Yes, faster players get more infield hits and get more doubles and triples and ROE's (and they bunt more often, which results in a hit around 40% of the time). Is that supposed to be a revelation?
Now, when you regress a hitter's hit/BIP, you can also use his speed, so that if you have a fast guy with a .305 and a slow guy with a .305, you regress the fast player upwards, and the slow player downwards.
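A minimal sketch of that idea in Python. The group means (.310 for fast hitters, .295 for slow hitters) and the regression constant of 1000 BIP are made-up illustration values, not numbers from this thread; only the .015 gap between the two groups echoes the estimate in the post above.

```python
def regress(observed, n, group_mean, k):
    """Shrink an observed rate toward a group mean.

    k is the sample size at which the observed rate and the group mean
    get equal weight (a hypothetical constant here).
    """
    weight_on_observed = n / (n + k)
    return group_mean + weight_on_observed * (observed - group_mean)

# Hypothetical inputs, for illustration only.
K_BIP = 1000       # assumed regression constant for hits/BIP
FAST_MEAN = 0.310  # assumed mean hits/BIP for fast hitters
SLOW_MEAN = 0.295  # assumed mean hits/BIP for slow hitters

# Two hitters with the same observed .305 over 500 BIP.
fast_estimate = regress(0.305, 500, FAST_MEAN, K_BIP)
slow_estimate = regress(0.305, 500, SLOW_MEAN, K_BIP)

print(round(fast_estimate, 4), round(slow_estimate, 4))
```

With these assumed inputs the fast hitter's estimate lands above .305 and the slow hitter's below it, which is the direction described above.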
All of these things are reflected in their stats, other than the ROE's. We've always advocated making an adjustment for ROE's in a player's stats. I don't know that it is "stupid" for MLB to treat a ROE as an out. What do you expect them to do?
Record it separately, just like you record a single and a walk. I see no reason to lump in a RBOE with the other ABouts.
Who says that a player's official stats are supposed to perfectly reflect talent?
No one. But, why not keep something that has a run value of +.5 separate from something that has a run value of -.3?
but one of the points of keeping official stats is simply to reflect exactly what is going on in the field.
But, "officially", it's considered an ABout, unless you break out your PBP files.
The fact that some players cause more ROE's than others is not necessarily relevant to MLB - only to someone trying to estimate true value from official stats.
Agreed.
At least they are tabulated separately, so you can do the adjustment if you want.
Only for those who have PBP files.
What would you have them do - call it a single?
Keep it separate.
That's fine too, I suppose, although it is certainly more consistent to call it an out than a single. Frankly, I don't think it matters what you call it. If you have a problem with ROE's you have to have a serious problem with SF's or with including IBB's in a batter's BB totals!
At least the SF and IBB is recorded separately so that I can have the option to add in SF to the other ABouts, and remove the IBB from the total BB. Not true with RBOE.
As far as the "article" goes,
I think it was a good effort. I think encouragement, where effort is put in, is warranted, don't you?
As far as the ROE's, which I assume is your "beef," before we treat them as a single, we would need to see how much is luck and how much is skill.
Agreed, but until MLB compiles them for us easily, those of us with PBP files are the only ones who can do the work on it.
If the skill component is not similar to that of a single (if the period to period r's are not similar), then I don't think you want to treat it as a single. We know that there is SOME skill (i.e. speed) involved in the ROE's - how much is the important question before we get all worked up over the fact that they are normally treated as outs.
Yes, that's the point. Keith Woolner's look at it last year is certainly promising. The league leaders were fast or LH hitters. Whether there is anything extra beyond that, like an additional skill, needs to be looked at. Again, keep it separate.
Accordingly, you might be better off just making an adjustment for speed, GB/FB ratio, and handedness. In fact, maybe just adding a little extra value to a player's IF hits might do the trick, as a player's infield hit rate (per PA) might be the best predictor of his ROE rate, as they come from essentially the same "skill" (speed, BIP rate, distance from home to first, and GB/FB ratio, and depth of IF)...
Very possible. But again, this goes back to the stupidity of MLB to not account for this as a separate category. Why should Woolner, Emeigh, MGL, or me be the only ones who can answer this question?
The above blogger, Dick Allen, certainly would have enjoyed looking at this issue. And I'm sure many many more.
Posted 1:13 a.m.,
January 8, 2004
(#4) -
MGL
Now, when you regress a hitter's hit/BIP, you can also use his speed, so that if you have a fast guy with a .305 and a slow guy with a .305, you regress the fast player upwards, and the slow player downwards.
Yes, no doubt. I explain in my regression article how you should use "other things" in order to figure out what constant to regress certain stats towards. This is the same idea.
Record it separately, just like you record a single and a walk. I see no reason to lump in a RBOE with the other AB outs.
Sure, if only we lived in a perfect world, we'd all shop at Walgreens...
At least the SF and IBB is recorded separately so that I can have the option to add in SF to the other ABouts, and remove the IBB from the total BB. Not true with RBOE.
I actually forgot that they WEREN'T listed in a player's official stats!
I think it was a good effort. I think encouragement, where effort is put in, is warranted, don't you?
I didn't really mean to disparage the article. It brings up a point that is definitely worth noting and working on, especially when you consider that they (ROE's) aren't normally available, which changes my whole premise. I guess at the very least, they should be noted separately in a player's official stat line, as you said, like the SF's and IBB's, even if they are considered an out (you can't argue too much with the logic of calling them an out - after all no one on the MLB rules committee thinks about the impact of speed on ROE's - they just figure it should have been caught, therefore it can be treated as an out, period). Now, whoever came up with the sacrifice fly thing, but not the "sacrifice ground ball"...
Posted 4:04 a.m.,
January 8, 2004
(#5) -
Rich(e-mail)
(homepage)
Honestly, I don't understand what the point of this is
MGL, I'm just making my way into baseball research... I've never claimed that anything here was groundbreaking, I was just interested in this issue and thought I would investigate. As I get more savvy with research methods/issues I may be able to do something more worthwhile. Surely it's better that people like me do relatively insignificant research than none at all?
Rich
Posted 5:47 a.m.,
January 8, 2004
(#6) -
Rich
sorry, that comes across as being too defensive... I just meant to say that everyone has to start somewhere, and by having the likes of Tango and yourself point out oversights (such as the reaching on error thing, which hadn't occurred to me), the likes of myself can learn and improve the quality of what we're doing.
Thanks
Rich
Posted 12:42 p.m.,
January 8, 2004
(#7) -
Rob H
(homepage)
Whether there is anything extra beyond that, like an additional skill, needs to be looked at.
Maybe you know this already, but others have looked at this. See homepage for more. Bob Horner of all people was very good at ROE.
Posted 2:17 p.m.,
January 8, 2004
(#8) -
tangotiger
I took all players with at least 800 PA (excluding IBB and bunts) from 1999-2002, and compiled their RBOE. I could have split it up by BIP instead of PA, but whatever.
Anyway, we've got 340 hitters, with an average RBOE/PA of 1%. Pat Meares had 22 RBOE over 885 PA, or a rate of 2.5%. A player with 885 PAs would have a random standard deviation of .0034. That is, sqrt(.01*.99/885) = .0034. (Apologies if I'm not using the correct terms.) Meares was 4.3 standard deviations away from the mean, and best in the 1999-2002 period. That is, (.025-.010)/.0034=4.3
Doing the same for all 340 hitters, and only 51% of the hitters were within 1 standard deviation. The standard deviation of these standard deviations is 1.33. Again, if I'm doing this right, this means that the spread is 1.33 times larger than would be expected by random luck.
Using the Gaussian method that AED so kindly posted for us a few days ago (and assuming that the true variance is .0035, which I'm sure is too high), the regression equation for RBOE becomes:
regression towards the mean (RBOE) = 832 / (832 + PA)
In this case, Meares regresses 48% towards the mean, or a "true talent" RBOE rate of 1.8%.
The average regression towards the mean was 33%, with an average of 1700 PAs.
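One formula that reproduces the 832 constant is k = p*(1-p) / true_sd^2, with p = .0103 (the league rate used in comment #10) and true_sd = .0035. That is an inference from the numbers in this thread, not a restatement of AED's post, so treat the sketch below as an assumption. The function names are made up for illustration.

```python
P_LEAGUE = 0.0103  # league RBOE/PA (the thread rounds this to 1%)

def regression_constant(p, true_sd):
    """PA at which a player is regressed 50% toward the mean (assumed form)."""
    return p * (1 - p) / true_sd ** 2

def regression_to_mean(pa, k):
    """Fraction of the gap to the league mean that gets removed."""
    return k / (k + pa)

def true_talent(observed_rate, pa, p, k):
    """Observed rate pulled toward the league rate p."""
    return observed_rate - regression_to_mean(pa, k) * (observed_rate - p)

k = regression_constant(P_LEAGUE, 0.0035)
meares_regression = regression_to_mean(885, k)
meares_talent = true_talent(22 / 885, 885, P_LEAGUE, k)
average_regression = regression_to_mean(1700, k)

print(round(k), round(meares_regression, 2), round(meares_talent, 3))
```

Under these assumptions k comes out at about 832, Meares regresses about 48% to a rate near 1.8%, and a hitter with 1700 PA regresses about 33%.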
************
Using the other way I had posted:
obs var ^ 2 = true var ^ 2 + luck ^ 2
.0035 ^ 2 = true ^ 2 + .0024 ^ 2
that makes the true = .0024
The regression towards the mean in this case is 50%.
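The subtraction above can be sketched in a few lines of Python. The two inputs are the rounded figures from this comment; with them the true spread comes out near .0025 and luck accounts for roughly 47% of the observed variance, close to the .0024 and 50% quoted. The function names are hypothetical.

```python
import math

def true_sd(observed_sd, luck_sd):
    """Spread left over once random (binomial) variation is removed."""
    return math.sqrt(observed_sd ** 2 - luck_sd ** 2)

def luck_share(observed_sd, luck_sd):
    """Share of the observed variance due to luck, i.e. the regression toward the mean."""
    return luck_sd ** 2 / observed_sd ** 2

talent_sd = true_sd(0.0035, 0.0024)
regression = luck_share(0.0035, 0.0024)

print(round(talent_sd, 4), round(regression, 2))
```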
******
The reason for the difference is that I treated .0035 as the true variance in the Gaussian method. Making .0024 the true variance, the new regression towards the mean equation, according to the Gaussian method, is:
regression towards the mean (RBOE) = 1700 / (1700 + PA)
Posted 2:33 p.m.,
January 8, 2004
(#9) -
Rich
(homepage)
http://www.baseballstuff.com/btf/scholars/levitt/articles/speedscores.htm
Interesting work on RBOE.
Posted 3:56 p.m.,
January 8, 2004
(#10) -
AED
The true variance in skills can be estimated by finding the additional variance needed so that the average of the square of the deviations (in sigma) is one. In other words, find "x" such that:
average [ (RBOE/PA-0.01)^2 / ( 0.0103*0.9897/PA + x^2 ) ] = 1
It turns out that x is 0.0022, closer to Tango's second estimate. This gives a regression toward the mean of 2100/(2100+PA).
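A rough sketch of that search in Python, using bisection. The player list is a made-up sample (only the 22-for-885 line comes from this thread), so the x it finds is not the 0.0022 above. The sketch also uses the same league rate p in both the deviation and the variance term, which is an assumption.

```python
def mean_squared_z(players, p, x):
    """Average squared deviation in sigma units, with x as the extra true-talent sd."""
    total = 0.0
    for roe, pa in players:
        deviation = roe / pa - p
        variance = p * (1 - p) / pa + x ** 2
        total += deviation ** 2 / variance
    return total / len(players)

def solve_true_sd(players, p, upper=0.1, tol=1e-10):
    """Bisection for the x where mean_squared_z equals 1.

    Returns 0.0 when random variation alone already explains the spread.
    """
    if mean_squared_z(players, p, 0.0) <= 1.0:
        return 0.0
    lower = 0.0
    while upper - lower > tol:
        middle = (lower + upper) / 2
        if mean_squared_z(players, p, middle) > 1.0:
            lower = middle  # still too much spread: need a larger x
        else:
            upper = middle
    return (lower + upper) / 2

# Hypothetical (RBOE, PA) pairs, for illustration only.
SAMPLE = [(22, 885), (5, 900), (9, 1000), (18, 1200),
          (3, 850), (12, 1100), (20, 1500), (8, 1600)]

x = solve_true_sd(SAMPLE, 0.0103)
print(x, mean_squared_z(SAMPLE, 0.0103, x))
```

Bisection works here because the average falls steadily as x grows, so there is at most one crossing of 1.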
Posted 6:32 p.m.,
January 8, 2004
(#11) -
MGL
The study linked by Rich above is interesting. In summary, it appears as if speed has almost nothing to do with ROE's! The two factors appear to be propensity to hit GB's (of course) and handedness of batters (a proxy for which side of the infield the GB's are hit to). RHB's have more ROE's per GB! This is further evidence that "speed to first" is not a factor. If we adjust for handedness and count ROE's as per GB, we appear to be left with almost nothing! That is important, as it means that if we want to include a player's ROE's in a value metric, all we need to (and should) do is increase the value of a GB out to SS and 3B! If we don't have PBP data (where each ground ball was actually hit), then we should simply take a player's total GB's, and interpolate how many to SS and 3B, based on his handedness.
If this is true, it is profound, as I think that the heretofore sabermetric wisdom has been that speed, as well as GB propensity of course, is a major factor in ROE rate. Again, this study, at least, seems to completely contradict that - i.e., that speed plays little or no role in ROE rate, once we control for GB rate (since speedy players tend to be GB hitters)...
Posted 7:57 p.m.,
January 8, 2004
(#12) -
D Smyth
---"...speed plays little or no role in ROE rate, once we control for GB rate (since speedy players tend to be GB hitters)..."
Well, then, don't you need to control for GB rate and speed independently? It is common sense, to me, that speed "should be" an independent factor in RBOE.
Posted 8:29 p.m.,
January 8, 2004
(#13) -
MGL
David, that's the whole point! Once you control for GB rate (and to whom the GB's are hit), according to this study at least, speed has NO correlation to ROE's!
Posted 11:50 p.m.,
January 8, 2004
(#14) -
Chris Dial
Yes, it's been pretty well known that RH pull GB hitters ROE the most, for some time. Not sure the date of that article, but Ron Johnson has mentioned similar stuff (from Tom Ruane, I think).
I don't believe ROE is a skill. If errors are defined uniformly, it can't be.
ROE's will also have less baserunner advancement than a regular hit.
I like "separate", but given the present alternatives, I prefer them in "outs" as opposed to "reached base". The player didn't do anything that created the error (as the rulebook defines errors). Meares ROEd. Isn't that at the Metrodome? Isn't much of that park factors? I certainly couldn't look at the data you ran, Tango, without an adjustment for parks (which can be found in the STATS books). The PF for errors will combine both the field and the home scorer (as some are kooks, but still most errors are correct). I looked - the Metrodome actually has a PF of 92 for infield errors. That's surprising.
Any reason you chose 200 PAs a season for 4 seasons? I think I would have set the cutoff a bit higher.
Posted 12:06 a.m.,
January 9, 2004
(#15) -
Tangotiger
The run value of a RBOE is HIGHER than that of a single.
Whether ROE is a skill or the byproduct of a trait (like being RH) doesn't matter much to me... the spread of ROE is still not explained by luck. And, the player deserves the result.
The net effect is that it's a differential of +/- 3 runs.
The PA cutoff doesn't matter, since the standard deviation will take care of all that. I could have made it 100 PAs, and it wouldn't have made a difference.
Posted 1:50 a.m.,
January 9, 2004
(#16) -
MGL
The average ROE park factor on turf is .92 and 1.02 on grass (IIRC). And yes, Chris, since the GB ROE sometimes results in the batter on second, it IS worth more than a single (around .49 to .48), albeit only slightly.
I for one am man enough to admit that all along I thought it was a "skill" having to do with speed. Despite what Tango says, it is a huge revelation to know if that is NOT true. As I said, if that is NOT true, then all you have to do is slightly adjust a player's value for handedness AND slightly adjust the value of the GB "out" (including ROE's of course).
OTOH, since the LHB's singles have more value and their GB outs have more "moving runners over" value, the handedness thing might be a wash (or even STILL favor LHB's even after the ROE adjustment), so you might not have to even adjust for handedness after all.
As far as the value of the GB out, there is a huge premium when you include the ROE! IIRC, around 1 in 30 GB "outs" are ROE's, which increases the value of the GB out by .0126, which is huge!
Posted 1:52 a.m.,
January 9, 2004
(#17) -
MGL
Woops, that should be increases the value of the GB out by .026...
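One reading of the arithmetic behind the .026, sketched in Python: if 1 in 30 ground ball "outs" is really an ROE, the average value of the category rises by that share times the gap between an ROE and a true out. The .49 for an ROE is from comment #16; the -.30 for a plain out is an assumed round figure, not a number given at this point in the thread.

```python
def roe_premium(roe_share, roe_value, out_value):
    """Runs added to the average GB "out" by the ROE's hiding in that category."""
    return roe_share * (roe_value - out_value)

# 1 in 30 GB "outs" is an ROE; -0.30 runs per true out is an assumption.
premium = roe_premium(1 / 30, 0.49, -0.30)
print(round(premium, 3))
```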
Posted 6:33 a.m.,
January 9, 2004
(#18) -
Rich
very interesting.
MGL, do you have values of various outs (K/GB/FB, anything else) to hand please?
Tom Tippett briefly mentions reaching on error in this article:
http://www.diamond-mind.com/articles/ichiro.htm
Of the 3357 errors committed by major league fielders last year, 1928 allowed a hitter to reach base. There were 136,861 plate appearances in which the batter put the ball in play, so the average batter reached via error 1.4% of the time.
Ichiro put the ball in play 647 times last year, so an average reach-on-error count would be 647 x .014 = 9. But Ichiro actually reached via error 12 times. That could just be luck, of course, but since we're trying to put an upper limit on the impact of speed, let's be generous and assume that his speed was responsible for all three of those extra times on base.
Overall, 81% of last year's errors put the batter on first, 17% put the batter on second, 1% put the batter on third, .05% saw the runner score on the play, and 0.7% resulted in the batter being put out trying to get an extra base on the error. Ichiro wound up on first ten times (83%) and on second twice (17%), so he didn't get any extra bases on his errors than did the average batter.
cheers
Rich
Posted 8:26 a.m.,
January 9, 2004
(#19) -
Chris Dial
What is the average value of a hit? I'm going with "more than a single". So I'll stick with ROE is less than a regular hit.
And while I'm no statistician, ROE distribution matters if it is going to be called a skill (or whatever). If it is a skill or trait, then the numbers should have some predictability year-to-year. Do they?
And I don't think the player deserves the "credit". The player in question hit a routine GB - why does he get *any* credit for another player not making his play? The critical aspect to this is the definition of an error.
If using 100 PAs would work as well, run the numbers that way and let's see what happens. I think you'll find a wider rate distribution.
Posted 8:30 a.m.,
January 9, 2004
(#20) -
David Smyth
MGL, you said in the Slwts articles that the value of a GB out and a FB out was about the same. And you implied that this includes all of the extra things which happen on these outs, such as GDP, SF, other runner advancement. I always assumed that the impact of ROE was also included in that statement.
Posted 9:28 a.m.,
January 9, 2004
(#21) -
tangotiger
What is the average value of a hit? I'm going with "more than a single". So I'll stick with ROE is less than a regular hit.
Ok, so a single is also less than a regular hit. What's the point? The RBOE has a .01 run impact greater than a single.
And while I'm no statistician, ROE distribution matters if it is going to be called a skill (or whatever). If it is a skill or trait, then the numbers should have some predictability year-to-year. Do they?
I believe that MGL's method uses year-to-year correlation. So, you have 2 separate methods to establish the nonrandomness of RBOE: (1) the distribution of such an event, (2) the year-to-year persistency of such an event.
And I don't think the player deserves the "credit". The player in question hit a routine GB - why does he get *any* credit for another player not making his play? The critical aspect to this is the definition of an error.
Then, we would expect randomness if the player deserves no credit. We think that a lot of the reason for RBOE is the propensity to: (1) hit GB and (2) hit balls to the right side of the infield. So, we expect persistency and we expect a distribution that is greater than that which would be based on random variations. And, we get it.
If using 100 PAs would work as well, run the numbers that way and let's see what happens. I think you'll find a wider rate distribution.
You *will* get a wider rate distribution. You will *also* get a wider expected random rate distribution based on the smaller number of samples (PAs) for each player. The *difference* between this new observed wider distribution and the new expected random distribution will remain the *same*. This difference is the true talent distribution.
Posted 3:25 p.m.,
January 9, 2004
(#22) -
MGL
Here are the values of various FB and GB events for the AL 2001-2003:
None of these includes bunts.
All FB (including HR's)= -.030
All GB (hits, outs, ROE, etc.) = -.100
Fly ball (Pop or Fly, but not line drive) OUT, including errors and DP's = -.282
Ground OUT, also including errors and GDP's = -.283
Fly ball, not including ROE's (at least one out made) = -.285
Ground out, also not including ROE's and at least one out made = -.314
As you can see, since very few ROE's are on fly balls or pop flies, they only increase the value of the FB out by .003 runs.
However, the ROE on a ground ball increases the value of the GB out (including ROE's) by .031 runs, which is a lot (10%)!
While we are on the subject of lwt values:
K out (including dropped 3rd strikes) = -.300
non-K out (including ROE's, FC where no out was made, etc.) = -.284
non-K out (not including ROE's and at least one out made) = -.301
So the K out and the non-K out are worth almost EXACTLY the same amount if ROE's are NOT included in the non-K outs. Once you include the errors, then a non-K out is "better." And, it doesn't matter whether those non-K outs are fly outs or ground outs...
Posted 4:01 p.m.,
January 9, 2004
(#24) -
MGL
Let me correct the above values. Also, I am now treating a FC where no out was actually made as an out. So there are only two categories of non-K outs now: with ROE's and without ROE's. The corrections are in bold.
RHB/LHB
All FB (including HR's)= -.040/-.018
All GB (hits, outs, ROE, etc.) = -.102/-.097
Fly ball (Pop or Fly, but not line drive) OUT, including errors and DP's = -.281/-.283
Ground OUT, also including errors and GDP's = -.289/-.278
Fly ball, not including ROE's = -.285/-.285
Ground out, also not including ROE's = -.321/-.302
K out (including dropped 3rd strikes) = -.300/-.300
non-K out (including ROE's, FC where no out was made, etc.) = -.286/-.282
Posted 9:26 p.m.,
January 9, 2004
(#25) -
Chris Dial
Tango,
*do* we get persistency? That's my question. Does Pat Meares get 12 ROE, 0, 0, 10 ROE to get 22 in 4 years? That's not (to me) persistency *for an individual*. Thus it isn't an individual's skill.
If there is persistency, please show it - you showed Meares had the widest range - does he fall in the + region every year? To this point, that hasn't been demonstrated here.
Posted 11:07 p.m.,
January 9, 2004
(#26) -
Tangotiger
I thought MGL showed it in his PDF, but apparently he didn't. Perhaps MGL can rerun his stuff for RBOE. If he doesn't do it by Monday, I'll do it.
My bet is that the year-to-year r for the 600 PA hitters is .25. There will be some persistency.
Posted 3:18 a.m.,
January 10, 2004
(#27) -
MGL
Depends on what you want to call a "skill." Since we know that ROE's are related to GB rate (per PA) and handedness (GB rate to 3rd base and SS), then of course, players will demonstrate a "persistency" (skill, if you want to call it that) if we use ROE per PA and don't control for handedness. I don't have to look at the data to tell you that. The more interesting question is what is the "persistency" or correlation (is there a "skill") of ROE, if we look at it on a per GB basis and adjust for handedness? According to the study referenced above, there is little or none. I think the r or r^2 was like .04 once the GB rate and handedness were adjusted for.
FWIW, for ROE per PA, we get a regression of about 71% for 600 PA.
For LHB's we get 77%. For RHB's, we get 72%.
If we use ROE per BIP, we have 74% for RHB and 81% for LHB.
If we further control for GB/FB ratio, the regressions will be even higher. There is definitely a suggestion that other than handedness and GB rate, there isn't much of a persistency/skill/correlation in ROE's, as we suspected...
Posted 7:20 p.m.,
January 10, 2004
(#28) -
Tangotiger
In my equation, I get 75%. So, like I said, the year-to-year r is .25.
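For what it's worth, the link between a regression equation of the form k / (k + PA) and a year-to-year r can be sketched like this. Which constant was plugged in isn't stated here, so the 1700 from comment #8 is used as an assumption; with it, 600 PA gives a regression of roughly 74% and an expected r of roughly .26.

```python
def regression_to_mean(pa, k):
    """Fraction regressed toward the mean for a sample of pa plate appearances."""
    return k / (k + pa)

def expected_r(pa, k):
    """Expected correlation between two samples of pa each: 1 minus the regression."""
    return pa / (pa + k)

K_RBOE = 1700  # assumed: the second constant from comment #8

regression_600 = regression_to_mean(600, K_RBOE)
r_600 = expected_r(600, K_RBOE)
print(round(regression_600, 2), round(r_600, 2))
```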
Posted 7:48 p.m.,
January 10, 2004
(#29) -
MGL
Blows me away how the "equation" comes so close to the empirical number!
Posted 12:27 a.m.,
January 11, 2004
(#30) -
MGL
I ran some correlations for ROE per PA and ROE per "ground ball to the left side of the IF (GBL)." Here are the results (r's):
Players had a min of 300 PA for each of two consecutive years. I correlated 2000 with 2001, 2001 with 2002, and 2002 with 2003. (Technically all data elements should be independent, but they are not as I "overlapped" years, but it is no big deal. It just means that the effective sample size is smaller than the actual sample size.)
N=579
ROE per PA, r=.265
ROE per GBL, r=.386
With my empirical data, I also got a lower regression with the ROE/GBL than the 74% or so I reported with the ROE per PA. I'm not sure why the correlation is HIGHER with ROE per GBL. I'm puzzled by that.
That definitely implies that there IS some consistency in ROE independent of handedness and GB rate - in fact more than if you don't adjust for handedness and GB rate!
The formula I am using for "r" is:
(N * the sum of all the x*y's - the sum of all the x's * the sum of all the y's) divided by the square root of ((N * the sum of all the x^2's - (the sum of all the x's)^2) * (N * the sum of all the y^2's - (the sum of all the y's)^2))
If that is readable. Is that the "correlation coefficient?" Is r-squared (the percent of variance in the y's explained by the x's) just this number squared?
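For reference, the standard Pearson correlation coefficient subtracts (sum of x's * sum of y's) in the numerator and uses the sum of the y's in the second denominator term. A minimal Python sketch of that textbook formula follows; the function name is just for illustration. With one x and one y variable, r-squared is simply this r squared.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums (textbook computational form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Toy check: a perfectly linear pair gives r = 1, so r-squared = 1.
r = pearson_r([1, 2, 3], [2, 4, 6])
print(r, r ** 2)
```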