I did a little test with LW by batting order, looking only at the leadoff hitter. I'll put the methodology at the end.
Anyway, a leadoff hitter really gets punished. First of all, his singles, doubles, and triples lose about .02 runs for each event. That is, if for the average hitter a single is worth .48 runs, then for a leadoff hitter it's worth .46 runs. The home run loses almost .10 runs. The walk GAINS .01 runs. And the out is a big whopper: -.30 runs/out becomes -.35 runs for a leadoff hitter. The SB has a +.02 run effect, but the CS has a whopper of -.08 runs. These adjustments are the run contribution of a leadoff hitter to the average team. Based on these numbers, it is fair to say that to maximize a leadoff hitter's value, he should not hit a lot of home runs, he should walk a fair amount, he should not make a lot of outs (high on-base average), and he should have a high SB%. Of course, all this is conventional wisdom, so we didn't really learn a lot here.
I also looked at all players since 1975 to see who was most and least affected by the "leadoff effect". The best leadoff hitter season since 1975:
Joe Morgan, 1975, -19 runs / 600PA. Of course, I am sure that Morgan has a more positive effect as a #4 hitter as well. Rickey Henderson checks in between -20 and -23 runs every year. Tim Raines has the same spread, as does Wade Boggs.
Since MGL likes to harp on Marquis Grissom as a bad leadoff hitter, let's see how he does: The average major leaguer is -24 runs, and he checks in at -23 to -25 for his career. So, Marquis as a leadoff hitter is not the worst move to make, though there are better ones. And the worst example of a leadoff hitter is probably Dave Kingman, with a spread of -24 to -28 runs. The worst leadoff season is Matt Williams, 1989, -28 runs / 600PA.
So, what do we learn? We learn that the difference between the absolute best and absolute worst leadoff hitter is at most 10 runs. We can safely say that a manager might make about 2 runs worth of difference in choosing the right leadoff hitter, since he would never consider Dave Kingman and his ilk.
I also suggest that the career Linear Weights values of Rickey Henderson, Tim Raines, and Wade Boggs be padded by about 40 runs or so to account for their exceptional ability as leadoff hitters.
The methodology: a leadoff hitter will score about 10% more often than a regular hitter, so I padded his singles by .03 runs, his doubles by .045 runs, etc. At the same time, a leadoff hitter will have 25% fewer runners to drive in than a regular hitter, and therefore instead of a single having a runners-moved-over value of .20, it is only .15, or -.05 runs. Overall, the single is worth -.02 runs. Do the same thing with all the stats, and you get adjusted LW values for a leadoff hitter. Perhaps MGL or Voros or whoever has the data can come up with adjustments for all batting orders and we can see who is the prototypical player for each of those slots.
By the way Mark McGwire's 1998 season has a leadoff value of -22 runs. I am NOT saying that Mark McGwire should be a leadoff hitter, since he will have a much more major impact as a cleanup hitter. But his hitting profile shows that he doesn't lose as much as say Dave Kingman as a leadoff hitter.
Let me qualify what I said about absolute best and absolute worst. I am not talking about Rickey vs. Rey Ordonez as hitters, with that being a 10-run difference. I mean that, given you have BOTH players in your lineup, putting Rey #1 and Rickey #9 would lose you 4 or 5 runs for your team, looking at the leadoff effect. I don't know what the #9 effect would be.
Tango,
I'm not sure what your -23's and -24's represent. When you say the average major leaguer has a leadoff value of -24 runs and Grissom has -23 to -25, what do you mean exactly? I understand your whole concept, but I don't know what the numbers represent.
I looked at the 99 play-by-play data and computed the lwts values for the leadoff position only. I used the same methodology as I explained in the "more lwts thread." I computed the run expectancy bases/outs matrix for the leadoff spot only, and then used these values to compute the lwts values for the leadoff spot only. The only "problem" with these values is that they are computed based on a relatively small sample size.
event   99 NL leadoff   99 NL overall
bb      .31             .30
s       .43             .46
d       .65             .78
t       .72             1.05
hr      1.25            1.40
sb      .16             .19
cs      -.38            -.44
out     -.289           -.286
event   99 AL leadoff   99 AL overall
bb      .34             .33
s       .46             .49
d       .75             .80
t       1.13            1.05
hr      1.37            1.42
sb      .21             .17
cs      -.42            -.48
out     -.289           -.286
These numbers seem to jibe with Tango's estimates, other than the SB/CS and the out values. There seems to be a sample size problem with my numbers (some of them look too "random"). I need to run more years and combine NL and AL to get a larger sample size.
mgl, I don't want to speak for Tango but I presume his -24 type number represents the difference between the leadoff linear values and the overall values for a player, for a given year.
David, yes that is exactly right. Basically, a leadoff hitter has much less effect on the run contribution relative to the other players in the batting order simply because he has no one to drive in. That is, while everyone's coefficients for the offensive events are a combination of ability to get on and ability to move runners over, a leadoff hitter does not have the same opportunities. Even worse, because he comes up so often with no outs, his outs cost the team a lot more than any other hitter's (i.e., the cost of going from 0 outs to 1 out is a lot more than going from 1 out to 2 outs). The combination of the two causes the leadoff hitter to produce 24 fewer runs per 600 PA than the average hitter.
Imagine if you will a batter who ALWAYS comes up with the bases empty and no outs. What do you think his LW would be? We know what they are, something like: .40 * 1B + .67 * 2B + .90 * 3B + 1.00 * HR and a whopping -.50(or something) * outs. This does not necessarily total to zero, NOR should it. The full inning should total to zero. MGL, here is a perfect example NOT to use the outs as a way to get everything down to zero. What I propose is that rather than looking at the leadoff hitter for the game, look at the leadoff hitter PER INNING. In fact, what we should attempt to do is to create the 24 possible LW values based on the 24 possible base/out situations. Then simply apply those weights based on the percentage of times the average leadoff hitter encounters those 24 situations. That should give us the "perfect" leadoff LW values.
Tango, do you have an estimate of the LWt values for a hitter who always comes up with 2 outs/bases loaded--the opposite of your no outs/empty example?
Tango,
Your values for a hitter coming up with 0/0 are about right, except for the out. Where do you get -.5? Using the bases/outs matrix, the value should be around -.27, slightly MORE (less negative) than the overall value of an out. This makes sense, as the value of an out is much more negative when there are runners on base (hence, the large negative value of the GDP). The RE with 0/0 is .56. With 1 out it falls to .29. I also question your out value for a leadoff hitter. It should also be slightly less negative than an overall out, as I computed in my empirical analysis of the lwts values for leadoff hitters.
Your suggestion of first computing the lwts values for all the bases/outs situations and then using this to come up with the lwts values for a particular slot in the batting order is a good one. There is one flaw, however. Two things go into each batting order slot's unique linear weights values. One is the frequency that he bats with the various bases/outs combinations (this is the basis for your suggestion). Two is the composition of the hitters behind him. So a leadoff hitter on one team who has exactly the same frequency of AB's with the various bases/outs situations as, say, the #7 hitter on another team would have very different linear weights values than his counterpart on the other team. The best way of computing lwts by batting order or any other category is still by empirical analysis.
BTW, in answer to David's question, all you have to do is take the RE charts (bases/outs matrix), look at the values for 3/2 and go from there. Of course, you would have to know, or estimate, the frequency of the runner scoring from 1st on a single and 2nd on a double. The other values (BB, out, TR, and HR) are trivial to calculate. (A home run is worth 4 runs minus the loss of the baserunners. The loss of the baserunners is .78 minus .11 or .67, so the HR is 4 minus .67, or 3.33. A walk is worth 1 run (obviously). A triple is 3 runs minus the loss of 2 baserunners. Etc.)
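For anyone who wants to check the bases loaded / 2 outs arithmetic, here is a minimal sketch. It uses only the two RE figures quoted above (.78 and .11); the function name `event_value` is just an illustrative label, not anything from an existing tool.

```python
# Run expectancy (RE) figures quoted in the post above; both are approximate.
RE_LOADED_2_OUT = 0.78   # bases loaded, 2 outs
RE_EMPTY_2_OUT = 0.11    # bases empty, 2 outs


def event_value(runs_scored, re_before, re_after):
    """Value of one event: runs scored on the play plus the change in RE."""
    return runs_scored + re_after - re_before


# Home run: 4 runs score and the bases are left empty, still with 2 outs.
hr_value = event_value(4, RE_LOADED_2_OUT, RE_EMPTY_2_OUT)

# Walk: 1 run is forced in and the bases stay loaded with 2 outs.
bb_value = event_value(1, RE_LOADED_2_OUT, RE_LOADED_2_OUT)

print(f"HR = {hr_value:.2f}, BB = {bb_value:.2f}")
```

The single, double, and triple need the after-the-play RE for the resulting base state (plus an estimate of how often runners score), so they are left out here.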
Tango,
Let me get this -24 runs thing straight. If you take an average batter and plug his numbers into a standard lwts formula (.47*s+.78*d+1.06*t+1.40*hr+.32*bb+.19*sb-.46*cs-.29*outs) and then plug these same numbers into your "leadoff batter" lwts formula, you lose 24 runs for a whole season. Is that right? That seems way too high. Of course, I contend that your value for the leadoff hitter's out is way too negative as well. How did you come up with your leadoff hitter lwts values anyway, especially the out?
Here are the lwts values for a leadoff hitter for 97-99, NL and AL (same as before, with a larger sample and both leagues combined):
bb = .34
s = .45
d = .73
t = 1.09
hr = 1.30
sb = .19
cs = -.41
out = -.306
According to these numbers, the average batter loses about 10.5 runs in the leadoff spot (not counting SB/CS), not 24 runs. This seems much more reasonable.
The overall lwts numbers I am using are:
bb = .32
s = .47
d = .78
t = 1.06
hr = 1.40
sb = .19
cs = -.46
out = -.291
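The comparison between the two sets of values can be sketched roughly as follows. The weights are the 97-99 leadoff values and the overall values listed above (SB/CS left out, as in the 10.5-run figure). The batting line is HYPOTHETICAL, made up only to show the mechanics; it is not the average batter actually used.

```python
# Weights copied from the two lists above (SB/CS deliberately excluded).
OVERALL = {"bb": .32, "s": .47, "d": .78, "t": 1.06, "hr": 1.40, "out": -.291}
LEADOFF = {"bb": .34, "s": .45, "d": .73, "t": 1.09, "hr": 1.30, "out": -.306}


def lwts(line, weights):
    """Sum of (event count * event weight) over every event in the line."""
    return sum(weights[event] * count for event, count in line.items())


# HYPOTHETICAL full-season line, for illustration only.
line = {"bb": 60, "s": 110, "d": 30, "t": 3, "hr": 18, "out": 420}

overall_total = lwts(line, OVERALL)
leadoff_total = lwts(line, LEADOFF)
leadoff_loss = leadoff_total - overall_total

print(f"overall {overall_total:+.1f}, leadoff {leadoff_total:+.1f}, "
      f"difference {leadoff_loss:+.1f}")
```

Swap in a real player's totals for `line` to get his own leadoff loss.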
Using the above values, Marquis Grissom, my poster boy for the "anti-leadoff hitter" clocks in at -81 runs for his career (not including 2000), using the overall lwts values. Using the leadoff lwts values, his career lwts is -180 runs. This 99 run difference over his career equals 11 runs per 162 games, a little worse than the average player (am I vindicated!). I wrote this as I calculated the numbers, so I was prepared to eat crow (or just delete the part about Marquis). Even if a particular leadoff hitter lost the same amount of runs as the average hitter (like Grissom in Tango's post, although I contend the -24 runs is WAY too high), that would still make him a pretty bad leadoff hitter, wouldn't it? A leadoff hitter should have a HIGHER leadoff lwts value than the average hitter, and a good leadoff hitter, like Rickey, should have a MUCH higher total. In fact, let me calculate Rickey's...
His overall lwts per 162 games is 21.2 runs.
His lwts per 162 games using the leadoff values is 12.0!
Hmmm....
Could it be that Rickey was miscast as a leadoff hitter? It sure looks like it. His home run totals, which are the most costly (.1 cost) in the leadoff position, cost him 1.6 runs per season. The BB (.02 extra) added 2.4 runs per season, so this more than makes up for the loss from the high HR totals. What's going on here? Did I make a math error? Of course, his CS total, which is much less costly in the leadoff position (I'm not sure why), gives him an extra .93 runs per season. I did not include these in the 21.2 and 12.0 numbers. The SB totals should make no difference (they are worth the same in both formulas; I'm also not sure why).
From the values in both formulas, it seems like the ultimate leadoff hitter, compared to putting him in another position, would have high walks, high triples, and few home runs and doubles (these are the most costly). Of course, it's hard to have high triples and low doubles (I think). As far as BA, I'm not sure. That's all I can muster for now. I need some help with this topic!
MGL, First off, I think I am wrong about the outs. I misinterpreted my matrix, and I will report on that tonight.
As for Rickey, remember EVERYONE loses if moved to the leadoff spot. The question is: WHO loses the LEAST? I am sure that Rickey, Raines, and Boggs lose the least of their values, and I am sure Kingman loses the most of his value. Given Rickey and Kingman, who do you bat 1st and who do you bat 4th? This is obvious, but let's go through it anyway. Say Rickey loses 10 runs batting leadoff, and gains 2 runs batting 4th. On the other hand, maybe King Kong loses 20 runs batting leadoff, and gains 10 runs batting cleanup. What combination makes the most sense? One gives you 0 net runs, while the other gets you -18 net runs.
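To spell out the two combinations, here is a tiny sketch using the made-up numbers from the paragraph above (they are illustrative guesses, not measured values):

```python
# Illustrative run changes from the example above, relative to each
# hitter's overall value. These are NOT measured figures.
delta = {
    "Rickey":  {"leadoff": -10, "cleanup": 2},
    "Kingman": {"leadoff": -20, "cleanup": 10},
}


def net_runs(leadoff_hitter, cleanup_hitter):
    """Combined run change from batting one hitter 1st and the other 4th."""
    return delta[leadoff_hitter]["leadoff"] + delta[cleanup_hitter]["cleanup"]


rickey_first = net_runs("Rickey", "Kingman")
kingman_first = net_runs("Kingman", "Rickey")

print("Rickey 1st, Kingman 4th:", rickey_first)
print("Kingman 1st, Rickey 4th:", kingman_first)
```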
As for the idea of computing the LWTS values for each of the 24 combinations of base/out, and the supposed flaw because of the batters behind him, let me clarify. I am talking about the 24 combinations for ALL batters at all batting orders (i.e., the league). Then the next thing you do is figure out, for each batting order, how often each of the 24 situations comes up. Combine the two, and voila, LW by batting order.
(Let me also note that if you do the LW for the 24 combinations of base/out, this is not a random distribution. That is, certain situations will be weighted more towards one batting spot than another. The work-around here is to come up with the LW for the 24 base/out for each of the batting spots, and you end up with 24 * 9 LW. Then, weighting the 9 spots EQUALLY regardless of number of occurrences of those 24 base/out situations, you end up with the LW for the 24 base/out for the LEAGUE AVERAGE RANDOM batting spot. Then you proceed as always, and apply these figures to each batting spot based on occurrences for that spot.)
Tango,
I still think your shortcut for calculating the values for each spot in the order is flawed, but I have to think about your last post...
As for Rickey, I didn't realize that EVERYONE loses in the 1 hole. I have to think about that too. I'm not clear on why. In any case, it would seem that Rickey loses an awful lot (10 runs) for supposedly the best leadoff hitter of all time. Is it that all the good players (with high overall lwts) lose a lot (much more than 10 runs) in the 1 hole, so Rickey's loss is actually quite good? I think we're looking at 2 issues. One is simply comparing leadoff hitters using the leadoff lwts formula. Whoever has the best (highest) lwts total is the best leadoff hitter. It may be that Rickey comes out tops. The second issue is, is leadoff the best spot for Rickey on any given team? If you put Big Mac in the leadoff spot he may come out as the best leadoff hitter of all time, but you would be costing your team runs by not putting him in the 3rd or 4th hole. I guess for Rickey, we would have to put some other player in the 1 hole and see if we lose more runs. I can't imagine that we couldn't find someone who would lose less than 10 runs. Of course, we really have to know the values for all the slots in order to do a proper analysis of who belongs where.
It looks like we came up with a great way to figure out the optimal batting order without using a sim. Then again, from the sim studies, we also know that batting order, within reason, doesn't make all that much difference anyway.
MGL, Yes I agree, it is a great way to figure out the optimal batting order, and in the end, probably useless!
Yes, there are 2 issues at work here, as you pointed out. As I said earlier, I was inspired by Bill James' sim putting Willie Mays in the leadoff spot, and the team generating as many runs as with Rickey. Essentially, while Rickey loses 10 runs, maybe Willie loses 15 or something. And yes, McGwire may come out on top as leadoff. But he'd probably come out on top in all 9 slots! The question is where does he maximize his numbers.
Basically, we are trying to figure out the best hitting profile for each of the 9 slots. Is the #9 slot better as a "2nd leadoff" position? I think we are really close to figuring that out. The walk/home run LW values changing shows the importance of the walk as a leadoff hitter, etc, etc. We already knew all this though, since the leadoff spot is such a well-defined spot. But, I'd like to see how the other batting spots fare as well.
I am pretty sure that my methodology will work. If you really want to take it to the extreme "perfect" way, then you've got to construct a set of LW by bases/outs/batting order/team (8 x 3 x 9 x 30). Then, average all the teams to give you the league average LW by bases/outs/batting order (8 x 3 x 9). And one more time to average for a neutral batting order (8 x 3). You do all that with NO WEIGHTS. This gives us the true LW values for the 24 base/out situations, batting-order neutral, and team-neutral. Once you've got that, then you do the same thing for "occurrences" (how often for each team will the #3 hitter come up with bases loaded and 1 out?, etc), and go through the same "averaging". Once you've finally got that as well, you can combine the two.
By not weighting anything, you are saying that the impact of the #1 guy is the same as the #9 guy. That is, they all get the same number of PA.
This will yield a team-neutral set of LW by batting order.
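Here is a rough sketch of that averaging, shrunk down so it fits in a post. The real thing would be 30 teams x 9 slots x 24 base/out states; the toy data below (2 teams, 2 slots, 3 states, value of a single) is HYPOTHETICAL, and the function names are just labels I made up.

```python
from statistics import mean


def neutral_state_values(values):
    """values[team][slot][state] -> unweighted value for each state.

    Teams are averaged first, then batting slots, with no weighting by
    how often each situation actually occurred.
    """
    n_slots = len(values[0])
    n_states = len(values[0][0])
    by_slot = [
        [mean(team[slot][state] for team in values) for state in range(n_states)]
        for slot in range(n_slots)
    ]
    return [
        mean(by_slot[slot][state] for slot in range(n_slots))
        for state in range(n_states)
    ]


def slot_value(state_values, state_freq):
    """Combine neutral per-state values with one slot's situation frequencies."""
    return sum(v * f for v, f in zip(state_values, state_freq))


# HYPOTHETICAL toy data: value of a single by team, slot, and state.
single = [
    [[0.30, 0.50, 0.70], [0.34, 0.54, 0.74]],   # team A: slot 1, slot 2
    [[0.32, 0.52, 0.72], [0.36, 0.56, 0.76]],   # team B: slot 1, slot 2
]
neutral = neutral_state_values(single)

# HYPOTHETICAL share of leadoff PA in each of the 3 states (sums to 1).
leadoff_freq = [0.6, 0.3, 0.1]
leadoff_single = slot_value(neutral, leadoff_freq)
print(round(leadoff_single, 3))
```

The occurrence frequencies would go through the same team-then-slot averaging before being combined with the neutral values; the sketch skips that step and just takes the frequencies as given.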
David, as for the bases loaded / 2 outs example, I don't have all the data in front of me, but making a few assumptions yields the following:
1.48 * 1B
2.14 * 2B
2.64 * 3B
3.34 * HR
1.00 * BB
0.08 * SB
(I don't have the data here to figure the out portion.)
Very interesting stuff, keep it going!
Tango stole my thunder because I was going to post on the "who loses the least" idea myself. Obviously, SOMEONE has to bat leadoff.
And since that is the case, and since the linear values for the 2nd, 3rd, and 4th slots are partially dependent on the 1st slot, it should be recognized that there is a certain interaction and mutual dependence between lineup slots which is not captured in this type of analysis. I'm not saying that it's not worthwhile; I'm saying it's not complete, and perhaps not capable of being so. In a way it's similar to the conundrum about how to apportion credit for a game-winning HR in the 9th which dramatically changes win probability; how much should go to the batter and how much to the other players who made it possible?
I am not yet convinced that, if all of this were properly taken into account, the overall linear values would not be the most correct to use for any batter, regardless of lineup slot.
But I'm willing to be educated.
Yes, I agree that all the studies I've done and read show that the lineup order does not matter. And this is just another way of showing whether this is true or not. But maybe, just maybe, there is one particular spot in the order that does have a high range, and therefore should be filled by a hitter properly suited to it.
I also agree there are limitations. For example, my context-neutral methodology of figuring out the "perfect" LW values assumes that the batters in the 3rd, 4th, and 5th spots are all equally average. But we know this not to be the case. Do we want to adjust based on the real-world experience, or adjust based on everything-equal? Both have their places. But again, if batting order does not matter, then maybe this particular question does not matter? We're still not sure.
By the way, the SB value I mentioned in my last post should read 0.86 * SB. (David, if you still have my linear weights spreadsheet, you can fill in the probability of scoring with 2 outs as basically half the values there. If there is a 28% overall chance of scoring from 1B, then there is a 14% chance of scoring with 2 outs. It's a good rough approximation.) For the SB, if you steal with the bases loaded (all 3 guys of course), then basically, after the fact, it's as if the runner from 1B stole home. That gives him 0.86 runs. The cost of the CS is equal to the RE of bases loaded / 2 outs, whatever that happens to be.
To say that batting order "does not matter" is a bit of a misstatement. It just doesn't matter all that much. To some people (me, for example), however, every little bit helps. If I owned or managed a team, I would want to squeeze out every last drop of runs that I could, as long as a small gain did not require much effort and as long as none of my players felt really "uncomfortable" with a particular spot in the order that I assigned him. Of course, they'd be so pissed off at me for other things, like no sac bunting, no IBB to "hot" hitters, no base stealing by players with long-term sb%'s less than 70%, etc., that they probably wouldn't worry too much about the batting order. They also wouldn't be too happy when I traded, released, or benched players like Glanville, Relaford, Alex Gonzales (FLO), Morandini, Darren Lewis, Marlon Anderson, Tim Bogar, Jorge Fabregas, almost everyone on the Twins, etc., guys who belong nowhere near a major league field (can't hit a lick).
I spent all night computing the following, so I hope that not only do you appreciate the work, but that SOMEONE does SOMETHING with it!
Here are the OFFICIAL, empirically computed, lwts values by batting order position. The methodology was essentially the same as computing the overall lwts values. However, I didn't use the overall bases/outs matrix to compile the average values of each of the events for each position in the batting order. I used a separate bases/outs matrix for each batting order position. So first I went through the play-by-play database and computed the 9 different bases/outs matrices for each position in the batting order. (If anyone wants these, send me an e-mail, or I'll print them in another post.) Then I went back through the database and computed the average values of each event for every batting order slot, BASED ON THE INDIVIDUAL BASES/OUTS MATRICES. For example, each time the leadoff hitter completed an event, I subtracted the "before" RE (run expectancy) (from the #1 slot matrix) from the "after" RE (from the #2 slot matrix) and added any runs that scored on that play, of course. This is the value of that event for that slot (leadoff). Of course, to come up with the final values, I average all of the values for each event. Actually, the computer does all of this (in about 3 minutes per league). I just sit back and watch SportsCenter.
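For readers who want to try this on their own play-by-play data, the procedure described above might look something like the sketch below. It is NOT the actual program used here; the record layout, the function name `event_values`, and the toy RE numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def event_values(plays, re_by_slot):
    """Average run value of each event for each batting slot.

    plays: iterable of (slot, event, state_before, state_after, runs).
    re_by_slot[slot][state]: run expectancy from that slot's own matrix.
    A state_after of None means the inning ended (run expectancy 0).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for slot, event, before, after, runs in plays:
        next_slot = slot % 9 + 1            # slot 9 is followed by slot 1
        re_before = re_by_slot[slot][before]
        re_after = 0.0 if after is None else re_by_slot[next_slot][after]
        totals[(slot, event)] += re_after - re_before + runs
        counts[(slot, event)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# HYPOTHETICAL run expectancies and plays, for illustration only.
# A state is (bases, outs), with bases written like the matrices: "x--".
re_by_slot = {
    1: {("---", 0): 0.55},
    2: {("x--", 0): 0.95, ("---", 1): 0.29},
}
plays = [
    (1, "s",   ("---", 0), ("x--", 0), 0),   # leadoff single
    (1, "out", ("---", 0), ("---", 1), 0),   # leadoff out
]
values = event_values(plays, re_by_slot)
print(values)
```

The "before" RE comes from the batter's own slot matrix and the "after" RE from the next slot's matrix, which is the one detail that separates this from the overall lwts calculation.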
First, here are the overall bases/outs matrices for 93-99 (not that these are used in computing the batting order lwts values).
AL
Outs   0      1      2
---    .56    .30    .11
x--    .95    .57    .24
-x-    1.2    .72    .35
--x    1.44   .98    .39
xx-    1.59   .96    .46
x-x    1.88   1.22   .54
-xx    2.08   1.43   .62
xxx    2.34   1.64   .81
NL
Outs   0      1      2
---    .51    .27    .10
x--    .88    .53    .22
-x-    1.13   .68    .32
--x    1.37   .96    .37
xx-    1.47   .91    .44
x-x    1.76   1.11   .49
-xx    1.96   1.36   .56
xxx    2.25   1.51   .72
Remember that these are 93-99 averages. I think the values will be slightly higher for 99 and 00 (higher scoring than 93-98).
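As a quick way to read the matrices, the sketch below uses the AL values above to show how much an out costs in each base/out state. It ASSUMES an out that moves no runners, which is a simplification (real outs sometimes advance or erase runners), so these are not the empirical out values.

```python
# AL 93-99 run expectancy from the matrix above: bases -> (0, 1, 2 outs).
RE_AL = {
    "---": (0.56, 0.30, 0.11),
    "x--": (0.95, 0.57, 0.24),
    "-x-": (1.20, 0.72, 0.35),
    "--x": (1.44, 0.98, 0.39),
    "xx-": (1.59, 0.96, 0.46),
    "x-x": (1.88, 1.22, 0.54),
    "-xx": (2.08, 1.43, 0.62),
    "xxx": (2.34, 1.64, 0.81),
}


def plain_out_value(bases, outs):
    """Change in RE from an out that leaves the runners where they are.

    The third out ends the inning, so the RE afterwards is 0.
    """
    before = RE_AL[bases][outs]
    after = RE_AL[bases][outs + 1] if outs < 2 else 0.0
    return after - before


for bases in RE_AL:
    row = "  ".join(f"{plain_out_value(bases, o):+.2f}" for o in range(3))
    print(bases, row)
```

With the bases empty and no outs the cost is .30 minus .56, in line with the roughly -.27 figure mentioned earlier in the thread, and the cost grows as runners are added.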
O.K., here are the lwts values by BO (batting order) position:
93-99 NL
slot   1      2      3      4      5      6      7      8      9
out    -.282  -.296  -.324  -.322  -.292  -.266  -.243  -.231  -.186
bb     .36    .37    .29    .29    .28    .26    .26    .28    .43
s      .48    .49    .44    .46    .44    .41    .41    .43    .56
d      .75    .74    .73    .78    .76    .71    .74    .73    .86
t      1.03   1.10   .98    .99    1.04   1.08   1.19   1.08   1.19
hr     1.27   1.32   1.33   1.39   1.39   1.40   1.42   1.42   1.56
sb     .20    .19    .14    .13    .12    .14    .16    .17    .33
cs     -.41   -.53   -.53   -.46   -.45   -.41   -.37   -.35   -.43
sh1    -.11   -.10   -.17   -.13   -.15   -.13   -.14   -.07   -.07
nsh1   .06    -.02   -.01   -.01   -.00   .02    .01    .05    .07
sh12   .12    .14    -.06   -.05   -.07   -.09   .00    -.09   -.06
nsh12  -.06   .00    .00    .02    .04    .06    -.11   -.02   .00
sh2    .09    -.06   -.19   -.14   -.04   -.03   -.04   .04    .02
nsh2   .07    -.03   -.01   .01    .01    .01    .01    -.03   .03
SS     .11    -.07   .25    .00    .15    -.06   -.18   .11    -.21
NSS    .03    -.01   -.05   -.03   .03    .02    .01    -.01   -.05
93-99 AL
slot   1      2      3      4      5      6      7      8      9
out    -.303  -.315  -.334  -.338  -.316  -.290  -.276  -.269  -.263
bb     .39    .38    .29    .31    .30    .30    .31    .33    .39
s      .49    .51    .46    .49    .46    .45    .47    .49    .53
d      .77    .78    .74    .75    .79    .76    .78    .81    .84
t      1.07   1.05   .92    .97    1.05   1.13   1.01   1.10   1.15
hr     1.30   1.36   1.37   1.40   1.41   1.42   1.41   1.45   1.47
sb     .22    .20    .13    .12    .14    .16    .18    .20    .25
cs     -.41   -.53   -.51   -.45   -.44   -.42   -.42   -.42   -.42
sh1    -.04   -.09   -.12   -.20   -.19   -.16   -.11   -.11   -.03
nsh1   .03    .01    -.02   .02    .00    .01    .05    .03    .06
sh12   -.05   .14    -.10   -.13   -.06   -.12   -.12   -.09   .03
nsh12  .01    -.05   -.08   -.01   -.06   .02    -.10   .05    .00
sh2    .01    -.05   -.03   -.24   -.16   -.07   -.11   -.07   -.10
nsh2   .01    -.03   -.04   .03    -.01   -.05   .00    .00    .00
SS     -.25   -.02   .04    .00    .04    .00    .10    -.08   -.20
NSS    .01    -.04   .00    .01    .08    .02    -.01   .00    .07
Key: The #9 slot in the NL includes pinch hitters as well as pitchers. Outs are ALL outs, including DP's and TP's (AB-H). SB are ALL SB's. I'm not sure if they include errors where the baserunner(s) advance. They do not include a SB attempt where the runner is safe on an error (no SB). A sh1 is a bunted ball put in play with a runner on 1st only and 0 outs (it includes singles, errors, and outs). A nsh1 is all other PA's OTHER THAN A BUNTED BALL, with a runner on first only and 0 outs. A sh12 and a nsh12 are with runners on 1st and 2nd and 0 outs. A sh2 and nsh2 are with a runner on second only and 0 outs. Keep in mind that the nsh's include times when the batter started to bunt and then changed to swinging away or struck out or walked. (The sh's only include bunted balls put in play.) SS is a suicide squeeze attempt. This includes all bunted balls put into play with 1 out and a runner on third only AND missed bunts where the runner on third is out at home (CS). A NSS is all other PA's other than a SS with a man on 3rd only and 1 out.
There are lots of interesting numbers here. I'll comment on a few. In both leagues, the HR value goes up all the way to the end of the BO. This is (presumably) because you have runners on base AND the average number of outs goes up. A home run is worth more with 2 outs than with 0 or 1. I think perhaps a guy with a lot of HR's but a low OBP or BA (Garrett Anderson, Rob Deer, Pete Incaviglia) should hit 7th or 8th, instead of the traditional 4th or 5th. It looks like the #9 hitter in the AL should have lots of walks with a low BA and some power (Rickey Henderson) and lots of SB's (the SB/CS break even point in the 9 hole is 63%!). Your high BA guys definitely should be #3 or 4 because of the high cost of the out, but a lot of power is not needed! Your leadoff guy should definitely have lots of walks, SB, and no power (it is the least productive spot for HR's, and not very productive for Dbl's). His BA doesn't matter much since the cost of his outs is about average; however, if he had a real low average with high bb, you might want to put him in the 9 hole (if he had some power as well).
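The 63% break-even figure for the AL 9 hole can be checked straight from the sb and cs rows of the AL table. A minimal sketch, where the break-even rate p is the success rate at which p*sb + (1-p)*cs = 0:

```python
# AL 93-99 values by batting slot (1-9), copied from the table above.
SB = [.22, .20, .13, .12, .14, .16, .18, .20, .25]
CS = [-.41, -.53, -.51, -.45, -.44, -.42, -.42, -.42, -.42]


def break_even(sb_value, cs_value):
    """Success rate p at which p*sb_value + (1-p)*cs_value equals 0."""
    return -cs_value / (sb_value - cs_value)


for slot, (sb, cs) in enumerate(zip(SB, CS), start=1):
    print(f"slot {slot}: {break_even(sb, cs):.0%}")
```

The same function works on the NL rows; only the two lists need to be swapped out.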
I also have the values for K's vs. non-K outs, fly outs, ground outs, GDP's, ground outs no DP in a DP situation, non-K outs with no DP. If anyone wants these let me know. They are not that interesting by BO, although the value of the GDP is considerably different from slot to slot (-.792 in the 7 hole to -.955 in the 4 hole!).
Any questions? Got milk?
[Edited by mgl on September 21st, 2000 at 03:57 AM]
MGL, very impressive! Just to make sure I follow you on your methodology, your basis is to assume the "real-world" example of whatever the actual #2 #3 and #4 hitters would do, and that they were NOT neutralized. This is totally acceptable for what we are trying to accomplish.
The other one I have to think about is the NL #9 slot. Let me ask a question. If let's say there were no pinch hitters (i.e., look at only the first 3 innings of each game), will the LW value for that spot be much much lower? And if you look at innings 7+, where essentially it's the pinch hitter, does the NL #9 spot look like the AL #9 spot? I am trying to figure out if the reason that the LW values are so different is because the Markov chain EXPECTS low runs, and so lower LW values result. Not sure if I am making sense here. And so, if you do treat the NL #9 spot as a "2nd leadoff" position, then won't the NL #8 spot's LW also change drastically?
Anyway, there's tons of great stuff here, and I'll look into this more closely tonight. Definitely much appreciated! If you can send me the spreadsheet with all your various LW values, that'd be much appreciated as well.
The first thing I did was plug M Ramirez, 1999 (.333/.442/.663) into 3 AL slots. The results: Leadoff = +59 runs, Cleanup = +42 runs, 9th = + 88 runs. How should this be interpreted? (To be clear, I am assuming that the BB value includes IBB but not HBP, and that the outs value should be multiplied by the batter's (AB-H+GDP) total.)
Tango,
I'm not sure what you mean with the "Markov chain" thing. (I know what a Markov chain is.) The reason the NL and AL values are different is simply the DH (#9 spot not occupied by the pitcher). Of course, this changes the values of all the slots. Yes, if we looked at only those times that a non-pitcher batted in the 9th hole, all the values would be similar to the AL (I assume). Other differences would have to do with the overall difference in the quality of the leagues, and the different ballparks. (BTW, hitters are substantially better in the AL the last few years, or perhaps pitchers are worse.)
Of course, if we only looked at the times when a pitcher batted in the 9th hole, the other slots' values would be different as well, particularly those that are close to the 9 slot (7,8,1,2).
David,
Interesting results for Ramirez. I'm not sure what it means either. The first thing you would have to do is take the average player and see what his lwts looks like in each slot. Then you can compare that to how an individual player's lwts changes from slot to slot, as you did with Manny. Also, you don't want to include IBB's with BB's. IBB's are near neutral events. HBP's you do want to include. For all intents and purposes, a HBP is exactly the same as a regular BB. I believe it is as batter dependent as a regular BB. As far as outs go, the value for the out already includes the GDP. So you can eliminate the DP factor completely or you can take .02 off the out value (-.28 instead of -.30) and then add the GDP. Remember that a GDP is actually worth about 3 times that of a non-DP out, so you have a choice. If you want to evaluate what a player actually did, you should use a formula like Vout*(AB-H+2*GDP). If you want to estimate what part of the GDP he is "responsible for," you might want to use the standard (AB-H+GDP) multiplier for the out value.
I guess the interpretation is clear: The higher RAA for Ramirez at the 9th slot doesn't mean he would be better suited there; it means that he would be further above the average for that slot, as expected. Right? The slot Ramirez is best put in also depends on how all the other batters' RAA would change if you moved Ramirez to 9th, or wherever.
For me, the most interesting category is the outs. If I had to pick one slot for whom OBA has the most team importance, I would have picked leadoff. It appears that the answer is the 3rd and 4th slots. For the leadoff man, the greater frequency of batting with no outs is pretty much canceled out by the lesser frequency of men on base. So, is it really a mistake by an average team to do what they often do--obtain the best 3-4 batters they can and put the leftover .340 OBA slap hitter in the one slot?
I think so. I think that any interpretation from mgl's data should be made carefully and conservatively. As I alluded to in an earlier post, lineup design is a very interactive procedure. The wisdom of moving a Deer or Incaviglia from 5th to 7th cannot be made in isolation.
David, Yes, an excellent point. We see how much the LW weights change when you have the pitcher in the #9 slot hitting. Moving a player with a weird profile like Rob Deer should have an impact to everyone around him.
As for interpreting the results, you would definitely have to compare him to league average (or even better, the rest of his teammates). So, if the Indians are your team, take their 9 regulars, run them through MGL's numbers, and see which players end up better than the rest. Obviously, Manny might end up being the best in several slots, but the key is where does he maximize his numbers
MGL, I know very little of Markov chains, and so I will not comment further on that. However, my problem (and maybe you can help me on this one) is this: let's say we look at LW by pitcher. We agree that each pitcher should have his own set of weights, with Pedro's being drastically lower. It makes sense to me simply because all the PAs belong to Pedro, and each hit rarely results in a run, and therefore his LW values should all be lower. However, my problem comes into play where you have one guy who is so far worse than the rest of the gang, and that is the #9 NL hitting spot. Why would the single LW be far different for a #9 NL hitter than a #9 AL hitter? That is, they both have practically the same number of opportunities to drive in a run, right? And they both have the same type of batters coming up, right? Just because the #9 NL spot is so far less productive than the #9 AL spot, why "punish" his offensive events? If anything, I can see where his outs would be a lot worse, simply because he prevents the top of the order from coming up to the plate. So, getting back to your methodology, I am wondering if you can account for this?
I think that the better methodology might be to figure out how often the pitcher comes to the plate in the 24 base/out situations, and apply the LEAGUE AVERAGE LW values to each of the 24 situations.
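A minimal Python sketch of that suggestion, assuming you already have (a) a count of how often the slot bats in each base/out state and (b) a league-average value for the event in each state. The function and argument names are made up for illustration, and this is not MGL's program. A real run would use all 24 states.

```python
def slot_event_value(state_counts, league_value_by_state):
    """Frequency-weighted league-average value of one event for one slot.

    state_counts: {(bases, outs): PAs the slot had in that state}
    league_value_by_state: {(bases, outs): league-average run value of the
        event (single, out, etc.) when it happens in that state}
    Both inputs are assumed to come from your own play-by-play data.
    """
    total_pa = sum(state_counts.values())
    if total_pa == 0:
        raise ValueError("slot has no plate appearances")
    weighted = sum(
        count * league_value_by_state[state]
        for state, count in state_counts.items()
    )
    return weighted / total_pa
```

Calling this once per event type with the #9 NL counts, and again with the #9 AL counts, would show how much of the difference between the two spots comes purely from the mix of base/out states they bat in, since the per-state values are held at league average.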
I also think that the players most affected should be the #8 NL spot, because once on base, he has a far worse chance of scoring than does the #9 NL spot (the pitcher).
Spoke too fast. OK, I see that the offensive numbers look fine for the #9 spot. However, it's the outs number that seems very low. Wouldn't the #9 NL out cause a lot more damage?
Tango,
Yes, the out value for the #9 spot in the NL looks awfully low. As you implied in your other post, the values for any particular spot should be around the same, regardless of whether Babe Ruth or Babe Ruth's grandmother hits in that spot. The primary effect on a slot's values is the slots before and after that slot. So the NL and AL 9th hole should be very similar. That's why the out value in the NL looks a little strange. Why do you think it should "cause more damage?"
Also, remember that because I used the empirical method to come up with these values, they "are what they are." Unless I have a bug in my number crunching program (which is possible), they are all correct by definition.
MGL, the only way for the out value to be that low is if the #9 slot comes up a lot in low RE situations, meaning 2 outs, or no one on base. Of all the 9 spots, I would think that the leadoff hitter would have the most skewed type of occurrence pattern.
The other thing that might be working with your program is what I alluded to with Pedro. His outs value is very low simply because no one scores on him. Even when there are runners on base, his outs value is low simply because no one will drive them in.
I am wondering if this kind of effect is showing up in your methodology. That is, Pedro's LW MUST be low simply because this is the only way all the offensive and out events will add up to his actual runs allowed. The same thing may be happening with the #9 NL spot.
One question: does your program create the RE from the time the batter comes up to the plate to the time the INNING is over or the time the GAME is over? This might be the "error" in your methodology. If you run it only to the end of the INNING, then this kind of forces the Pedro-effect, as the expectation for a run is so low BECAUSE the pitcher is batting that the value of the out must be low as well.
However, when a pitcher is out, he completely deflates the contribution of the top of the order. I suggest that RE should be calculated to end of game.
We probably never encountered such a situation simply because everything got averaged out nicely. But now, we have to look at things a little differently.
Of course, if your RE is to end of game, then I'll have to give it some more thought.