How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)
By MGL
As a preview to my new and improved Superlwts (total player rating) run values, here are major league baseball's 2000-2003 base running picks and pans (if I may borrow a phrase from People Magazine):
Each player's base running lwts or net run value is computed as per the following general methodology:
Three basic base running categories are considered. One is a player's ability to move from second base to third on a ground ball out to the infield with 0 or 1 out. Each base runner's advances, holds, and outs, per opportunity, are compared to those of a league average base runner and converted into net runs gained or lost.
Two is a base runner's ability to stay out of the double play when he is on first base and a ground ball is hit to the infield. Each player's outcomes per opportunity (out at second, safe at second with the batter out at first, or GDP) are considered.
Three: all other base running situations are evaluated the same way. These are holds, outs, and advances on hits and on fly ball outs to the outfield, and outs made trying to stretch a single or double.
In addition, most of the above are adjusted for the speed and the location of the batted ball.
Finally, after combining each player's 2000-2003 base running lwts (without weighting by year), I convert that total into lwts per 162 base running games. A player's base running games are his base running opportunities divided by the league average number of opportunities per game, and his lwts per 162 is his lwts divided by his base running games, times 162. Then, to establish a player's true base running talent, I regress that rate using the following regression formula:
regression= 1 - base running games/(base running games + 250)
For example, if a player has 600 base running games in 4 years, his raw base running lwts per 162 will be regressed by 29.4% toward the league average, where the league average is zero by definition.
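The conversion and regression steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not MGL's actual code: the function names are hypothetical, the 1.15 opportunities per game is the figure MGL quotes later in the thread, and the 690 opportunities and raw +10 runs are made-up inputs chosen to land on the 600-game example.

```python
# Sketch of the base running lwts steps described above (hypothetical names).
# Assumptions: "base running games" = opportunities / league-average
# opportunities per game, and the league average rate is 0 runs by definition.


def base_running_games(opportunities: float, league_opps_per_game: float) -> float:
    """Express a player's opportunities as an equivalent number of games."""
    return opportunities / league_opps_per_game


def regression_fraction(games: float, k: float = 250.0) -> float:
    """Share of the distance to the league mean to remove: 1 - games/(games + k)."""
    return 1.0 - games / (games + k)


def regressed_lwts_per_162(raw_lwts: float, games: float, k: float = 250.0) -> float:
    """Convert raw lwts to a per-162 rate, then regress it toward zero."""
    per_162 = raw_lwts / games * 162.0
    return per_162 * (1.0 - regression_fraction(games, k))


# Hypothetical player: 690 opportunities at 1.15 per game is about 600 games.
games = base_running_games(690, 1.15)
print(round(games))                               # 600
print(round(regression_fraction(600), 3))         # 0.294
print(round(regressed_lwts_per_162(10, 600), 2))  # 1.91
```

With k = 250, a player with 600 base running games keeps about 70.6% of his raw rate, so a raw +2.7 runs per 162 shrinks to about +1.9.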
Without further ado, here are the best and worst base runners over the last 4 seasons, with comments. Most of these players will not surprise you, as the largest component of a player's base running skill is speed. All values are regressed runs per 162 base running games.
2000-2003 best
1) C. Guzman 3.5 (mitigates the value of an otherwise poor overall player)
2) B. Larkin 3.2 (was once a truly great all-around player)
3) T. Womack 3.2 (see above comment on Guzman)
4) J. Pierre 3.0 (as expected, being one of the fastest players in baseball - with one of the smallest heads)
5) P. Reese 3.0 (his defense and base running more than make up for his weak bat - as if you thought that the BoSox didn't know what they were doing!)
6) B. Jordan 3.0 (he didn't play football for nothing!)
7) R. Furcal 3.0 (truly an up and coming superstar - the next B. Larkin?)
8) T. Goodwin 3.0 (and you thought that Dusty Baker was not so smart!)
9) A-Rod 2.6 (is there anything this guy is not good at?)
10) C. Beltran 2.8 (see comment above on Furcal)
2000-2003 worst
1) Edgar -5.8 (age and bad legs)
2) A. Galarraga -4.2 (no surprise - who was dumb enough to sign him this year?)
3) F. Thomas -4.0 (his overall value is not very good at all anymore)
4) J. Lopez -3.8 (someone paid too much for him this year)
5) C. Delgado -3.5 (considering his defensive position and his peripherals, not nearly one of the best players in the game)
6) B. Molina -3.3 (can't hit, can't run, but is a good defensive catcher)
7) B. Grieve -3.2 (not much to say about this guy - can't run, can't field, used to hit...)
8) R. Sexson -3.1 (mitigates his value - someone probably paid too much for him)
9) R. Palmeiro -2.9 (still can hit, probably doesn't want to hurt himself on the bases)
10) M. Ramirez -2.8 (not too many outfielders on the worst list - is he slow or dumb on the bases?)
--posted by TangoTiger at 03:31 PM EDT
Posted 4:39 p.m.,
February 10, 2004
(#1) -
Greg Tamer
Which runners, if any, were regressed out of the top ten? I understand the need to filter those with limited PT, but what about those with two or three full seasons of PT who are somewhat hurt by the regression?
I'm surprised to see Larkin at #2 after the regression due to his limited PT these past four years. He must be, like, really, really good. As a Reds fan, I'm going to assume he was even more superb in his prime.
Also, you have A-Rod at #9 with 2.6 runs, but Beltran at #10 with 2.8 runs.
Looking forward to the new and improved Superlwts.
Posted 5:03 p.m.,
February 10, 2004
(#2) -
studes
Great work as usual, MGL. Do you really mean some of your comments? For instance, is Delgado really not nearly one of the best players around? -3.5 runs over 162 games for his baserunning doesn't seem to warrant such a downgrade.
Guess we'll see when all the slwts are finished.
Posted 7:35 p.m.,
February 10, 2004
(#3) -
dsm
So if I am interpreting this chart correctly, it appears as if the difference between an average baserunner and the best baserunner is about 1/3 of a win, or 1 Win Share. Ditto for the worst baserunners, unless you are a complete cripple. Which means I second studes' comment about MGL's comments: these adjustments to a player's offensive contributions are somewhere between marginal and negligible.
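dsm's conversion can be checked with a quick sketch. It assumes a rule of thumb of about 10 runs per win (an assumption, not a figure from the thread) and uses Bill James's convention of three Win Shares per team win; the helper names are hypothetical.

```python
# Quick runs -> wins -> Win Shares conversion (hypothetical helper names).
RUNS_PER_WIN = 10.0        # ASSUMPTION: common rule of thumb, not from the thread
WIN_SHARES_PER_WIN = 3.0   # Win Shares awards three shares per team win


def runs_to_wins(runs: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    return runs / runs_per_win


def runs_to_win_shares(runs: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    return runs_to_wins(runs, runs_per_win) * WIN_SHARES_PER_WIN


best = 3.5  # C. Guzman, top of the 2000-2003 list (runs per 162)
print(round(runs_to_wins(best), 2))        # 0.35
print(round(runs_to_win_shares(best), 2))  # 1.05
```

At 10 runs per win, the top baserunner's +3.5 runs is about a third of a win, or roughly one Win Share, in line with dsm's reading.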
Posted 8:41 p.m.,
February 10, 2004
(#4) -
mathteamcoach
On Manny...(not too many outfielders on the worst list - is he slow or dumb on the bases?)
Both.
Posted 10:31 p.m.,
February 10, 2004
(#5) -
MGL
Sorry about the A-Rod mistake. He is actually #11. Beltran is #9, and DeShields and Ellis are tied at #10 at 2.7.
Does James have baserunning win shares? What is the range? Of course, his would be non-regressed, so the range would be wider. The SD for my baserunning lwts per 140 games is 2.8, so 95% of the full timers are between +-5.6. So for 500 games (around 4 years), the SD is 5.3 (95% between +-10.6).
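The jump from 2.8 runs over 140 games to 5.3 over 500 games is what you get if the observed SD of a run total grows with the square root of the number of games. The sketch below assumes that simple model; it reproduces MGL's figures, though he doesn't say that is how he computed them, and the function name is hypothetical.

```python
import math


def scale_sd(sd: float, games_from: float, games_to: float) -> float:
    """Rescale the observed SD of a run total, assuming it grows with sqrt(games)."""
    return sd * math.sqrt(games_to / games_from)


sd_500 = scale_sd(2.8, 140, 500)
print(round(sd_500, 1))      # 5.3
print(round(2 * sd_500, 1))  # 10.6, the approximate 95% (2 SD) band
```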
"I'm surprised to see Larkin at #2 after the regression due to his limited PT these past four years. He must be, like, really, really good. As a Reds fan, I'm going to assume he was even more superb in his prime."
Remember that the regressed values ARE how "good" a player really is. I know everyone wants to see the "actual values" for everything, but I am starting to hate actual (unregressed) values as they mean "nothing," especially if we don't know how much luck is inherent in the measure. At least here you know that the correlation in 600 games is around .70, such that around 50% ("r squared") of the variance (about 34 runs squared per 600 games) is attributable to chance and 50% to the players' baserunning skill.
The new Superlwts (2000-2003) are already in Primer hands waiting to be published. They are all expressed as actual values with no regression. If anyone wants to estimate a player's true total Superlwts from his sample total Superlwts, they can use the following regression formula:
regression=500/(500+PA),
which is around 44% for one year (630 PA or 150 games). Technically, each Superlwts category has to be regressed individually, but the above is good enough for free stuff on a free web site.
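This rule of thumb for regressing a total Superlwts figure can be sketched the same way. The sketch assumes a league mean of zero and treats the constant 500 as a parameter, since the thread goes on to question it; the function names are hypothetical, and the +20 runs is a made-up input.

```python
def superlwts_regression(pa: float, x: float = 500.0) -> float:
    """Fraction of a sample Superlwts total to regress toward the mean: x / (x + PA)."""
    return x / (x + pa)


def estimate_true_superlwts(sample_runs: float, pa: float, x: float = 500.0) -> float:
    """Shrink a sample total toward an assumed league mean of zero."""
    return sample_runs * (1.0 - superlwts_regression(pa, x))


print(round(superlwts_regression(630), 2))         # 0.44
print(round(estimate_true_superlwts(20, 630), 1))  # 11.2 (hypothetical +20 runs)
```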
The comments in the article are partly tongue in cheek. Don't take them too seriously.
I meant what I said about Delgado though. I have 42 players projected better than he for 2004 in regressed position adjusted Superlwts per 150 games! You will see them when Primer publishes the file. It's not the baserunning that hurts him so much. In fact, his 4-year weighted baserunning and UZR are only -6 per 150. It's mainly his age and position (an average first baseman is +11 in Superlwts, over the last 4 years, whereas an average SS is -9, so he has 20 runs to make up when comparing him to A-Rod or Nomar).
Players like Luis Matos, Milton Bradley, Polanco, Kearns, and Adam Kennedy are rated above Delgado. That is because of defense, other peripherals like baserunning, age, and defensive position.
Defensive position is critical in comparing one player to another. The average catcher Superlwts is -15, which is why a player like I-Rod is so valuable.
I want to thank Tango and others for "turning me on" to this idea of defensive positional adjustments, which I believe are critical and often overlooked.
There appear to be two ways of neutralizing everyone's Superlwts rate to account for defensive position. One is the way I do it. The other is to try and neutralize each player's UZR and not touch any of the other values, with the assumption being that any player can potentially play any other defensive position so long as you adjust his UZR appropriately. Needless to say, that kind of adjustment is problematic...
Posted 10:37 p.m.,
February 10, 2004
(#6) -
MGL
Those SD's I quoted above are the "observed" SD's and don't represent the spread of talent. The spread of talent would be the SD of the regressed values, which is only:
1.61 runs per 162 for players with at least 400 games in those 4 years. So yes, it looks like +-3 or 4...
Posted 11:34 p.m.,
February 10, 2004
(#7) -
Greg Tamer
Thanks, MGL. One more question:
regression=500/(500+PA)
Why 500? If this is explained elsewhere in another thread or on another site, a link would be great. I understand the concept of regression, but I can't, off the top of my head, figure out how it is determined how much regression is needed. Thanks again.
Posted 11:36 p.m.,
February 10, 2004
(#8) -
Greg Tamer
MGL -- never mind on the regression question -- I finally remembered Tango had a Studies thread on it.
Posted 9:36 a.m.,
February 11, 2004
(#9) -
tangotiger
MGL, I'm surprised that you have to use a high number like 500.
For off lwts, I use 209, and for UZR I use 420. Since these would be the two largest components of superLWTS, 500 seems out of line.
Otherwise, you are saying that you have to regress superLWTS 50% if given 500 PA. And I know that you have previously stated that you regress around 30% or so.
Posted 12:48 p.m.,
February 11, 2004
(#10) -
Mark Field
"these adjustments to a player's offensive contributions are somewhere between marginal and negligible"
I'm not so sure. In a given year, yeah the difference between best and worst amounts to about one win per year. But over the course of a 15-20 year career, this difference does give us a better appreciation for guys like Larkin or Henderson compared to, say, Delgado or Manny.
I think sabermetrics is still at the stage of overemphasizing batting because that can be measured so well. I applaud MGL for adding in the defensive and baserunning contributions precisely because it leads to comments like the one he made about Delgado, which I see as counter to what most -- including me -- might have said after just looking at OPS+.
Posted 1:55 p.m.,
February 11, 2004
(#11) -
MGL
"MGL, I'm surprised that you have to use a high number like 500."
Good point Tango.
Since batting lwts alone has a y-t-y "r" of .675 for 550 PA's, which implies an x of around 265, I don't know where I got that 500 from. Must be a mistake. It's probably around 200. I'll check.
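The "implied x" comes from reading the regression formulas in this thread as r = n / (n + x), where r is the year-to-year correlation over n opportunities; solving for x gives x = n(1 - r)/r. A minimal sketch, with hypothetical function names:

```python
def implied_x(r: float, n: float) -> float:
    """Regression constant x implied by correlation r over n opportunities,
    assuming r = n / (n + x)."""
    return n * (1.0 - r) / r


def implied_r(n: float, x: float) -> float:
    """Inverse relationship: expected correlation over n opportunities."""
    return n / (n + x)


print(round(implied_x(0.675, 550)))   # 265, batting lwts
print(round(implied_r(600, 250), 2))  # 0.71, base running at 600 games
```

The same relationship ties the base running constant of 250 to the correlation of about .70 in 600 games quoted earlier, and an r of about .600 at roughly 600 PA gives x = 600 × .4/.6 = 400.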
If any of our resident stat experts are lurking, what happens if you "combine" 6 or 8 metrics that have various "r"'s for a given number of opportunities, and various SD's? What should the combined "r" look like? IOW, Superlwts is essentially a combination of batting, defense, arm or DP defense, baserunning, and GDP as a batter, all per "game." For each one, a game represents a different number of opportunities, so the "scale" for each "r" is different. For example, one game for batting runs is 4.2 PA's, while one game for baserunning is 1.15 baserunning opportunities. Anyway, the "r"'s vary from the highest of .675 for around 500 PA's (120 games) of batting to the lowest of right around .300 for 120 games of baserunning and GDP defense for infielders. So if we combine all of those values, per game, what kind of "r" would we expect for the total in, say, 120 games? Obviously, because the variance of all these measures is quite different, batting and UZR will "dominate" the combined value...
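One textbook way to frame this question, offered as a sketch rather than MGL's method: if each category's observed spread is true talent plus independent noise, and the categories are uncorrelated with one another, the reliability of the sum is the variance-weighted average of the component reliabilities, sum(r_i * var_i) / sum(var_i). That independence assumption is shaky here (baserunning, UZR, and GDP are related), and the SDs below are hypothetical placeholders; only the two r values (.675 for batting and about .300 for baserunning, over about 120 games) come from the comment above.

```python
def combined_r(components: list[tuple[float, float]]) -> float:
    """Reliability of a sum of components, given (r, observed SD) pairs.

    Assumes true talents are mutually uncorrelated and all noise is independent,
    so true and observed variances simply add across components.
    """
    true_var = sum(r * sd ** 2 for r, sd in components)
    obs_var = sum(sd ** 2 for _, sd in components)
    return true_var / obs_var


# (r over ~120 games, HYPOTHETICAL observed SD in runs over ~120 games)
batting = (0.675, 15.0)
baserunning = (0.300, 2.5)
print(round(combined_r([batting, baserunning]), 3))  # 0.665
```

Because the batting variance is so much larger, the total's r stays close to batting's, and the small categories barely move it.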
Posted 2:02 p.m.,
February 11, 2004
(#12) -
MGL
Good point Mark F.! While each Superlwts category other than batting and UZR is relatively insignificant, at what point do we stop writing them all off, both over a career, as you say, and in combination, especially since many are correlated (baserunning, UZR, and GDP, for example)? A big, slow guy like Delgado will tend to be bad at all the peripherals, which makes quite a dent in his overall value.
Plus people, including myself, do tend to forget or ignore the positional "adjustments." It is human nature to want to just compare A-Rod's hitting with Delgado's to see who is the "better" player. It is almost a footnote or an afterthought that they play vastly different positions and therefore their overall value, given the same hitting, is vastly different. A 20-run Superlwts difference between a SS and a first baseman is a lot of runs to "make up!"
Posted 2:38 p.m.,
February 11, 2004
(#13) -
tangotiger
MGL, it's even worse, because even if the spread of talent of baserunning were as wide as hitting, each hitting event is worth more.
That is, if you had basestealing as a spread between 50% success and 90% success, and you had OBA as between 25% and 50%, you would have not only the difference in opportunities you mentioned (say 0.1 SB att/game vs. 4.3 PA/game), but also the fact that each SB att is worth far less than an average PA in terms of potential impact.
My guess is that it would be far easier to just convert superLWTS into a binomialized stat, and work from there.
I'll defer to those with more expertise on the matter.
Posted 4:14 p.m.,
February 11, 2004
(#14) -
Mark Field
Bill James once commented that players like Ron Roenicke, who do everything a little above average but nothing a lot above average, tend to be underrated. I think the same applies to these seemingly minor aspects of play -- it's easy to forget how these "little" things can add up.
Being able to play shortstop is worth, say, 1.5 wins per year over being a first baseman (that's Tango's estimate; I think it's low). A good fielder at short might be worth 3 wins above average; a bad fielder at first could cost you a win. That's already 5.5 wins separating the shortstop from the first baseman, and if the shortstop can run the bases well and the first baseman can't, the gap grows further, all before we even get to offense. It takes a very good hitter to make up that kind of disparity.
Posted 4:30 p.m.,
February 11, 2004
(#15) -
Kyle S
I have a question that is somewhat relevant here so I'll ask it and yall can make fun of me. Does run expectancy from the base/out Markov Chain states depend on team SLG? Intuitively, it seems like the marginal value of getting to 2nd base over first base is higher for a team that hits mostly singles as opposed to a high SLG team, which therefore might increase the value of a stolen base. However, I expect that the difference (if it even exists) will be tiny. Have any of yall looked at this?
Posted 4:51 p.m.,
February 11, 2004
(#16) -
tangotiger
(homepage)
Run Expectancy depends on EVERYTHING.
The above link will give you a good idea about the various run values of events in different run environments. All those run environments assume a typical split of hits, HR, and walks given that you have a safe play.
For your specific question about a team that relies much less (or more) on hr, I'd have to run it through (say like the 80s Cards). Chances are, these Cards will have a lower chance of scoring from 1B and greater from 2B and 3B. This makes the SB worth more.
A team with tons of HR hitters will make the value of being on 1B greater and being on 3B less, and makes the SB worth less (relative to the Cards team).
As for magnitude of effect, I'm sure it would be quite small, maybe a .002 win difference per SB at most, and more likely .001? So, Vince Coleman might have been worth an extra .2 wins in a season by not being surrounded by boppers. Just a guess.
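Tango's point can be made concrete with a run expectancy (RE) table: a steal of second with nobody out gains RE(runner on 2nd, 0 out) minus RE(runner on 1st, 0 out), and a caught stealing costs RE(runner on 1st, 0 out) minus RE(bases empty, 1 out). The sketch is illustrative only: the two RE tables are hypothetical placeholders meant to mimic a singles-heavy team and a power-heavy team, not measured values, and the function name is hypothetical.

```python
# Value of a steal of second (nobody out) as a change in run expectancy (RE).
# All RE numbers below are HYPOTHETICAL placeholders, not measured values.


def sb_values(re: dict) -> tuple:
    """Return (runs gained on a steal, runs lost on a caught stealing,
    break-even success rate) for a runner on first with nobody out."""
    gain = re[("2B", 0)] - re[("1B", 0)]     # runner moves up, no out added
    loss = re[("1B", 0)] - re[("empty", 1)]  # runner erased, one out added
    breakeven = loss / (gain + loss)         # success rate where the attempt nets 0
    return gain, loss, breakeven


# Hypothetical singles-heavy team: harder to score from first.
singles_team = {("1B", 0): 0.88, ("2B", 0): 1.10, ("empty", 1): 0.26}
# Hypothetical power-heavy team: a home run scores the runner from anywhere.
power_team = {("1B", 0): 0.95, ("2B", 0): 1.15, ("empty", 1): 0.30}

for name, table in [("singles", singles_team), ("power", power_team)]:
    gain, loss, be = sb_values(table)
    print(f"{name}: gain={gain:.2f} loss={loss:.2f} break-even={be:.3f}")
```

With these placeholders, which were picked to land near Tango's guess, the steal gains about 0.02 more runs for the singles team (roughly .002 wins at about 10 runs per win), and its break-even success rate is slightly lower.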
Posted 7:29 p.m.,
February 11, 2004
(#17) -
MGL
I checked the sample "r" for total Superlwts for 00 regressed on 01 and 02 regressed on 03, players with a min of 500 PA each year. It is around .600 for around 600 average PA's. It is less than that of batting lwts alone. At 2 SD's, that is a range of .52-.67. So I am going to use 400 as my x in the regression formula...