SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)
An empirical approach.
--posted by TangoTiger at 04:46 PM EDT
Posted 6:35 p.m.,
June 10, 2003
(#1) -
EV-9D9
YOU ARE A PROTOCOL DROID ARE YOU NOT
Posted 9:06 a.m.,
June 11, 2003
(#2) -
Andrew Edwards
Fascinating stuff. I could spend all day reading through and finding explanations for all the times when the empirical values deviate from theoretical values.
I think it's those deviations that tell us the most about the game.
Great stuff, Tango.
Posted 9:11 a.m.,
June 11, 2003
(#3) -
Andrew Edwards
For instance, with the bases empty, a walk is worth just a shade more than a single.
My explanation is that this is because pitchers who give up walks are just slightly more likely to be scored on than pitchers who give up hits. Perhaps also that batters who take walks are slightly more likely to be higher in the order, and therefore followed by better hitters.
Like I said, I could do this all day.
It would be interesting, by the way, to control for lineup slot of hitter. Might help reduce some of the noise, especially around the relative value of IBB. I'm not sure if the sample size is there, though.
Posted 9:41 a.m.,
June 11, 2003
(#4) -
tangotiger
Andrew, your suppositions are plausible, though I don't think the sample size at this level is large enough for the difference to be statistically significant. That would be my guess as well though: that walks are more likely to be issued by below-average pitchers, and to the top of the order.
As for lineup slot, yes, I could do that as well, but there are a few problems:
1 - sample size will definitely be an issue here (I think I'd have to do this for all the Retrosheet years, which really isn't more time, but I'd need a more powerful computer)
2 - selective sampling (Mike Piazza would be 3% of the cleanup spot instead of random noise in the overall sample, which makes that pretty significant, unless all cleanup hitters are the same "type" as Piazza. That may be true, but it won't be the case for the #1 or #2 hitters)
3 - we'd want to separate pitchers batting from non-pitchers
4 - most importantly, I find the current chart already a little unwieldy, and I can't imagine readers enjoying NINE of these, and really, if I separate out pitchers, 18 of them!
However, I know I would, and I'd guess you would as well!
Posted 3:00 p.m.,
June 11, 2003
(#5) -
Jim Keller
I haven't studied the subject of statistical analysis enough to question any of your data, but I think that, in order to determine how much of an impact a batter has on the game in question, it would be necessary to expand your base/out states to account for the inning and the score differential. A starter working the fourth inning with a 5-run lead will have a much different approach to his pitching than a closer working the ninth with a 1-run lead, and the batters will likewise take different approaches to their at-bats.
Further, I think that the number of runs expected to be scored in an inning is not the most important factor to evaluate. What matters most about the outcome of a PA is whether the batter has increased or decreased the likelihood that his team will win the game. Here again, it is essential to include the inning and score differential. A grand slam in the bottom of the ninth with the home team 3 runs down obviously has a much larger impact on the result of a game than does a solo home run in the bottom of the eighth with the home team ahead by 10 runs. In order to figure this out, an analysis would have to be done to determine the likelihood of a team to win a game for each of over 11,000 states (+/-13 runs (the smallest run differential that has never been overcome in a game), 0-26 outs as the home team, 0-26 outs as the visiting team and the 8 base runner states). Whether this is feasible given the tremendous amount of information required to get reliable data is something I don't know. Once this has been done, all one needs to do is compare the likelihood of winning the game before the PA to the likelihood of winning the game after the PA to determine whether the PA was beneficial, neutral or detrimental.
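The before/after comparison described above can be sketched in a few lines. This is only a minimal illustration, not a worked-out win expectancy model: the function names `count_states` and `classify_pa` are hypothetical, and the win expectancy numbers in the usage lines are made-up placeholders, not values from any real table. The state count assumes one reading of the figures above: 27 run differentials (-13 to +13), 54 out counts (0-26 for each side), and 8 base runner states.

```python
def count_states(max_diff: int = 13, outs_per_team: int = 27, base_states: int = 8) -> int:
    """Count game states under one reading of the post (an assumption):
    run differentials -max_diff..+max_diff, 0..26 outs recorded for each
    of the two batting sides, and 8 base runner states."""
    run_diffs = 2 * max_diff + 1   # -13..+13 gives 27 differentials
    out_slots = 2 * outs_per_team  # home 0-26 plus visitor 0-26 gives 54
    return run_diffs * out_slots * base_states


def classify_pa(we_before: float, we_after: float, tol: float = 1e-9) -> str:
    """Label a PA by the change in the batting team's win expectancy."""
    delta = we_after - we_before
    if delta > tol:
        return "beneficial"
    if delta < -tol:
        return "detrimental"
    return "neutral"


# Placeholder win expectancies, invented purely for illustration.
print(count_states())           # 27 * 54 * 8 = 11664
print(classify_pa(0.10, 1.00))  # a PA that raises win expectancy
print(classify_pa(0.99, 0.99))  # a PA that leaves it unchanged
```

Under that reading the count comes to 11,664, which matches "over 11,000". The hard part is not this comparison but filling in a reliable win expectancy for every state.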
Posted 3:10 p.m.,
June 11, 2003
(#6) -
Andrew Edwards
Sample size, I expected, would be a restriction. I hadn't thought of the selective sampling issue (I hadn't thought much about it at all).
I guess you're probably right about sifting through nine tables like that. I'd love it though. I'd especially love it if I could make a 3-dimensional pivot table in SPSS out of the nine. *drool* ...pivot tables... *drool*
Jim:
Tango's done Win Expectancy tables too, if you search around this site. The general consensus, in strategy terms, is that you play to maximize total runs in the early parts of the game, since the situation is more variable then. As the possible number of situations diminishes in the later innings, then it becomes more manageable and more sensible to play simply for the win.
Posted 3:11 p.m.,
June 11, 2003
(#7) -
Andrew Edwards
Tango, just to clarify, I'm not actually asking you to do it. Far be it from me to assign you hours of work. You're already doing way too much.
Posted 3:39 p.m.,
June 11, 2003
(#8) -
tangotiger
(homepage)
Andrew, I did not take it as such... just pointed out some areas to think about.
Jim, I didn't talk about win probability here, as that would be another topic. However, if you are interested in that, please go to my site (see homepage link above), and there's plenty there for you. The two important things I've done are:
1 - I've given you the win expectancy by inning/score/base/out for the 7th inning and on, with a score differential of 1 or 0
2 - Created "leverage" situations for ALL innings, ALL base/out with score 3 runs and less, which you could use for pinch hit talk, and to a slightly smaller extent, bullpen usage
Hope this answers at least some of your questions.
Posted 3:42 p.m.,
June 11, 2003
(#9) -
tangotiger
As always, there are other variables to consider:
- batter/pitcher matchup
- batters due up
- potential pitchers due up
- runner speed
- fielding talent and positioning
- park
in addition to the inning/score/base/out. But I'm limited in my time; otherwise, I'd love to generate a WE that incorporates all this.
Posted 12:13 p.m.,
June 12, 2003
(#10) -
JohnW
Very interesting stuff. Am I reading the table correctly? I notice that with a runner on 2nd and nobody out, an IBB has a negative value. Does this mean "the old school" is correct and managers should be calling for 4 straight after a leadoff double?
That seems to fly in the face of conventional sabermetric wisdom.
Posted 1:18 p.m.,
June 12, 2003
(#11) -
tangotiger
The way to read it is:
with man on 2b and 0 outs, there's a certain run expectancy from that point on to the end of the inning, based on the actual chain of batters that faced that situation
with man on 2b and 0 outs, and an IBB then issued, there's a certain run expectancy from that point on to the end of the inning, based on the actual chain of batters that faced that situation
The chain of batters are not necessarily random (well, they probably are in the first case), and certainly not random in the second set.
Therefore, you have to be very careful in how you read the chart, and in trying to make comparisons, what-ifs, etc.
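To make the reading above concrete, here is a minimal sketch of how an empirical run expectancy, and an event's value relative to it, might be computed from play-by-play records. The `PA` record layout, the function names, and the toy data are all assumptions made for illustration; they are not the actual code or numbers behind the chart.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class PA(NamedTuple):
    bases: str        # base state before the PA, e.g. "-2-" for a man on 2b
    outs: int         # outs before the PA
    event: str        # what happened, e.g. "IBB", "1B", "OUT"
    runs_to_end: int  # runs scored from this PA through the end of the inning


def run_expectancy(pas: Iterable[PA]) -> Dict[Tuple[str, int], float]:
    """Average runs scored to the end of the inning, by base/out state."""
    totals = defaultdict(lambda: [0, 0])  # state -> [run sum, PA count]
    for pa in pas:
        totals[(pa.bases, pa.outs)][0] += pa.runs_to_end
        totals[(pa.bases, pa.outs)][1] += 1
    return {state: runs / n for state, (runs, n) in totals.items()}


def event_values(pas: Iterable[PA]) -> Dict[Tuple[str, int, str], float]:
    """Average runs to end of inning after each event in each state,
    minus the average for that state over all events."""
    pas = list(pas)
    baseline = run_expectancy(pas)
    totals = defaultdict(lambda: [0, 0])  # (state, event) -> [run sum, count]
    for pa in pas:
        key = (pa.bases, pa.outs, pa.event)
        totals[key][0] += pa.runs_to_end
        totals[key][1] += 1
    return {
        key: runs / n - baseline[(key[0], key[1])]
        for key, (runs, n) in totals.items()
    }


# Toy records, invented purely for illustration.
toy = [
    PA("-2-", 0, "IBB", 0),
    PA("-2-", 0, "IBB", 1),
    PA("-2-", 0, "1B", 2),
    PA("-2-", 0, "OUT", 1),
]
print(event_values(toy)[("-2-", 0, "IBB")])
```

Both averages are taken over whichever batters actually came up in those innings, so a negative IBB value from a calculation like this describes what happened after the walks that were issued. It does not say what would have happened had the same batters been pitched to.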
Posted 3:52 a.m.,
June 13, 2003
(#12) -
JohnW
The chain of batters are not necessarily random (well, they probably are in the first case), and certainly not random in the second set.
Hmmm, does this mean that in the particular cases where the IBB was issued, we don't know what the run expectancy would have been had the IBB not been called for? And that we therefore cannot judge whether the IBB was a valid strategy or not?
Let me ask a direct question: can one tell from the results in your table if the IBB strategy with 0 outs, runner on 2nd, is a good one?
thanks.
Posted 7:55 a.m.,
June 13, 2003
(#13) -
tangotiger
No.
In order to establish the validity of the IBB, you first need to use Win Expectancy, as I did when I looked at "When to walk Barry Bonds" last Oct. To do that, you have to do some work behind the scenes, to establish a "what-if" scenario.
The empirical results are just what they are.
Posted 9:50 p.m.,
June 13, 2003
(#14) -
Silver King
Pardon the interruption. The address I've been using for fanhome's strategy'n'sabermetrics is broken. This also happened a couple months ago, and that's when a Primate supplied me with this link:
http://pub162.ezboard.com/fbaseballfrm8
Can someone point me back to the site, and does somebody know why the addresses seem to eventually go bad?
Posted 8:12 a.m.,
June 14, 2003
(#15) -
tangotiger
http://pub119.ezboard.com/fbaseballfrm8
Posted 10:25 a.m.,
June 14, 2003
(#16) -
Patriot
I have no clue why they go bad (they do it for all of the FanHome boards from time to time), but you can always just go to fanhome.theinsiders.com and then to baseball and find it from there.