Tango on Baseball Archives

Factors that affect the chances of scoring (September 24, 2003)

Over at Clutch, there was a discussion about how often Rickey compared to Bonds would score on their walks. Discussing this from a more general level, here are the factors that affect how often a runner will score, when reaching base:

1 - Does he steal? A great basestealer will add about 10 runs to his total. Given that the average runner will get on base about 200 times, that's worth about .05 runs / time on base.

2 - Is he a good baserunner? I can get into a long approach to show how to work this, but MGL has provided the necessary data in his superLWTS that shows that a good baserunner adds about 6-10 runs per season, or about .04 runs / time on base.

3 - Does he get on base with 0 outs? A leadoff hitter will get on base with 0 outs about 48% of the time, while he'll do so about 26% of the time with 1 or 2 outs each. The typical runner will score about 45% of the time with 0 outs, 30% with 1 out, and 15% with 2 outs. Working it out, we see that the leadoff hitter gets about a .03 runs / time on base advantage.
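Here is a rough sketch of the arithmetic behind factor 3. The scoring rates and the leadoff hitter's split are the figures quoted above; the even one-third split for the comparison hitter is my assumption, since the post does not say what the typical runner's split is.

```python
# Chance of scoring by number of outs when the runner reaches (figures from the post).
score_rate = {0: 0.45, 1: 0.30, 2: 0.15}

# Share of times on base by out state for a game-leadoff hitter (figures from the post).
leadoff_share = {0: 0.48, 1: 0.26, 2: 0.26}

# ASSUMPTION: the comparison hitter reaches base equally often in each out state.
typical_share = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}


def expected_score_rate(share):
    """Weight the per-out scoring rates by how often the runner reaches in each out state."""
    return sum(share[outs] * score_rate[outs] for outs in score_rate)


leadoff = expected_score_rate(leadoff_share)
typical = expected_score_rate(typical_share)
advantage = leadoff - typical
print(f"leadoff {leadoff:.3f}, typical {typical:.3f}, advantage {advantage:.3f}")
```

Under that assumed split the gap comes out to roughly .03 runs / time on base, in line with the figure above.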

4 - Does he have good hitters hitting behind him? When I did that batting order thread at fanhome a couple of years back, I produced some interesting data. One of the results of that is that you get a .02 runs / time on base advantage for the leadoff hitter, but almost a .04 runs / time on base for the #2 hitter. (Sidenote: the #2 hitter should usually be your BEST overall hitter. This is probably Bonds' best spot in the order.)

5 - Does he get lots of doubles and triples? Again, pretty obvious here. The difference between getting on base 15% of the time on doubles and triples and 25% of the time on doubles and triples works out to about .02 runs / time on base.

So, if Rickey Henderson scores 40% of the time he reaches base on a walk and Bonds does so 25% of the time (I have no idea what the numbers are), there are at least 5 big reasons that would explain this difference.
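As a back-of-the-envelope check, the five advantages can be added up and set against the made-up 40% vs. 25% gap. This is only a sketch: it assumes the factors are independent and simply add, which the post does not claim, and it uses the leadoff-hitter figure for factor 4.

```python
# Per-time-on-base advantages quoted for each factor (leadoff-hitter figures).
factors = {
    "steals": 0.05,
    "baserunning": 0.04,
    "reaching with 0 outs": 0.03,
    "hitters behind him": 0.02,
    "doubles and triples": 0.02,
}

# ASSUMPTION: the factors are independent, so their effects simply add.
total = sum(factors.values())

# The 40% and 25% rates are the post's placeholder numbers, not measured values.
hypothetical_gap = 0.40 - 0.25

print(f"sum of factors: {total:.2f}, hypothetical gap: {hypothetical_gap:.2f}")
```

Treat the sum as a rough ceiling on what these factors could explain, not as an estimate for any particular player.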

--posted by TangoTiger at 12:22 AM EDT

Posted 3:41 p.m., September 24, 2003 (#1) - OCF
One of the results of that is that you get a .02 runs / time on base advantage for the leadoff hitter, but almost a .04 runs / time on base for the #2 hitter. (Sidenote: the #2 hitter should usually be your BEST overall hitter. This is probably Bonds' best spot in the order.)

I always liked the batting order of the 1982 Brewers. Well, maybe not Simmons cleanup, but that's a triviality. Molitor, batting leadoff and on the good side of all 5 of the points above, scored 136 runs. Yount, the best hitter on the team, batted second and scored 129 and drove in 114.

When you correct for the different eras, Yount's 1982 is extremely similar to Alex Rodriguez's 1996, also batting second, but without a Molitor in front of him.

Posted 11:23 p.m., September 24, 2003 (#2) - RossCW
Tango -

Obviously number 5 does not have an effect on how often a player scores after a walk.

And if number 4 is correct, then how good the pitcher who walks him is at preventing hits must also have an impact. As was pointed out on the other thread - pitchers who walk a lot of batters tend to have lower opponents' batting averages since if they don't, they don't last long. Thus players are more likely to score after a hit than after a walk.

Posted 7:40 a.m., September 25, 2003 (#3) - Tangotiger
And you are wrong if by "hit" you mean single. The chances of scoring on walk/single are virtually the same. I'll point you to the research on this later.

Posted 10:16 a.m., September 25, 2003 (#4) - tangotiger (homepage)
This excerpt was published at the above link.

===============
Table 1. For all methods for leadoff batter to reach
base, number of times each event occurred, the number of
times that batter scored and the frequency of each. Note
that the "E" category includes all times the leadoff
batter reached on an error, which includes those cases
when he went past first.

Event    Reach    Score    Freq
1B      183468    72841    .397  ***
2B       48364    30961    .640
3B        6573     5753    .875
HR       27205    27205   1.000
BB       82637    33002    .399  ***
HP        6217     2543    .409  ***
INT         81       22    .272
E        12105     5298    .438
==================

As you can see, it doesn't matter how you got to 1B, via walk or single. The chances of scoring were virtually the same.

I would guess that the rate for the HBP can be explained by random chance (1 SD = .006). The INT rate is statistically insignificant (only 81 samples, making 1 SD = .054).
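The SD figures can be reproduced with the usual binomial formula. A minimal sketch, assuming a baseline scoring rate of about .40 (roughly the 1B and BB rates in the table); the function name is mine.

```python
from math import sqrt


def binomial_sd(p, n):
    """Standard deviation of an observed rate when the true rate is p over n trials."""
    return sqrt(p * (1 - p) / n)


# ASSUMPTION: a baseline scoring rate of about .40, taken from the 1B and BB rows.
baseline = 0.40

# Sample sizes from the "Reach" column of Table 1.
reach = {"HP": 6217, "INT": 81}

for event, n in reach.items():
    print(f"{event}: 1 SD = {binomial_sd(baseline, n):.3f}")
```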

Posted 10:27 a.m., September 25, 2003 (#5) - tangotiger
By the way, here's some interesting data. From 1994-2002, here is the weighted ERA of the pitchers (weighted by the event). What this tells you is: "Given that a BB was given up, what's the ERA of that pitcher?"

eraBB   eraSO   eraHR   eraBIPH   eraH   eraOuts   eranonKouts
4.77    4.38    4.94    4.71      4.74   4.51      4.55

So, the difference in performance between a pitcher giving up a walk or a hit is pretty darn close.

And as we see from David's chart above, the overall impact is just not there.

(eraOuts is League ERA by definition.)

Posted 8:04 p.m., September 25, 2003 (#6) - FJM
Since eraBB > eraH and since eraHR >> eraBB (and presumably eraHR > era3B > era2B > era1B), it follows that eraBB >> era1B! (">>" means "much greater than".) Which appears to contradict the observation that it doesn't matter how the batter gets to first base. Can you post the values for era1B and era2B?

Posted 11:03 p.m., September 25, 2003 (#7) - Tangotiger
I don't have them easily available. However, if eraBIPH is 4.71, and since 70 to 75% of those are singles, I'll guess that era1B is 4.69 and era2B3B is 4.77.

And while worse pitchers do give up walks (and therefore you expect them to allow more of them to score than on singles), you are probably talking such a small impact when you look at the distribution of pitchers, that you get the reported results that the chances of scoring on a single and walk are virtually the same.

The difference between an ERA of 4.69 and 4.77 is .08 ER, or .09 runs per 9 IP. Or .09 runs per 12 nonHR baserunners. Or .0075 runs per baserunner.

More likely, it's something like one pitcher giving up 5.2 runs / 9 IP with 12 nonHR baserunners and another giving up 5.29 runs / 9 IP with 12.1 nonHR baserunners. Take out, let's say, 1 HR for the first pitcher and 1.01 HR for the second pitcher, and you get:

5.2-1=4.2 runs per 12 nonHR baserunners = .350 runs per baserunner
5.29-1.01=4.28 runs per 12.1 baserunners = .354 runs per baserunner

(I'm sure you can work out the numbers better than me.)

In any case, it's rather easy to see how you can start tweaking numbers here and there to get the empirical results. I did this off the cuff, and I'm showing a .004 difference, and in fact, the empirical is showing .002.

Posted 2:53 p.m., September 26, 2003 (#8) - RossCW
Does leadoff batter refer to the position in the batting order or leading off each inning?

Posted 3:48 p.m., September 26, 2003 (#9) - tangotiger
If you are referring to this

A leadoff hitter will get on base with 0 outs about 48% of the time, while he'll do so about 26% of the time with 1 or 2 outs each

this is saying that the leadoff hitter of the game will get 48% of his times on base with 0 outs.

Posted 6:34 p.m., September 26, 2003 (#10) - RossCW
If you are referring to this

No - I was referring to this:

For all methods for leadoff batter to reach
base, number of times each event occurred, the number of
times that batter scored and the frequency of each.

Posted 6:56 p.m., September 26, 2003 (#11) - Tangotiger
Those are leadoff innings.

Posted 8:19 p.m., September 26, 2003 (#12) - FJM
I think you'll find that era2B3B is more like halfway between era1B and eraHR.

Posted 9:30 p.m., September 26, 2003 (#13) - Mike Emeigh(e-mail)
(eraOuts is League ERA by definition.)

I don't think this is quite correct; because there are innings in which runs are scored without an out being recorded (leadoff walkoff HRs and the like), I would think the eraOuts would be slightly less than the league ERA. If you're calculating eraEvent on a per-game basis, then it would be true, but if it's being done on a per-inning basis or a per-pitcher basis it would not be.

-- MWE

Posted 7:55 a.m., September 27, 2003 (#14) - tangotiger
Just to clear up what I'm doing:

Sum(ERA*IP)/Sum(IP), where I sum over all pitchers

Expanding that, we get

Sum([ER/IP*9]*IP)/Sum(IP)
Sum(ER*9)/Sum(IP)
9*Sum(ER)/Sum(IP)

And that gives you lgERA.

Posted 7:57 a.m., September 27, 2003 (#15) - tangotiger
Oh, and this is true, as long as the IP > 0 for every pitcher.
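To make the weighting concrete, here is a small sketch with three made-up pitcher lines. Weighting each pitcher's ERA by his IP gives back the league ERA, as in the algebra above; weighting by walks instead gives something like eraBB. The pitcher numbers and function names are hypothetical, and every pitcher needs IP > 0.

```python
# HYPOTHETICAL pitcher lines, for illustration only.
pitchers = [
    {"er": 80, "ip": 200.0, "bb": 50},
    {"er": 60, "ip": 120.0, "bb": 70},
    {"er": 30, "ip": 45.0, "bb": 35},
]


def era(pitcher):
    """Earned runs per 9 innings; requires ip > 0."""
    return 9 * pitcher["er"] / pitcher["ip"]


def weighted_era(pitchers, weight_key):
    """Sum(ERA * weight) / Sum(weight), summed over all pitchers."""
    total_weight = sum(p[weight_key] for p in pitchers)
    return sum(era(p) * p[weight_key] for p in pitchers) / total_weight


league_era = 9 * sum(p["er"] for p in pitchers) / sum(p["ip"] for p in pitchers)
era_ip = weighted_era(pitchers, "ip")  # weight by innings pitched
era_bb = weighted_era(pitchers, "bb")  # weight by walks allowed

print(f"league ERA {league_era:.3f}, IP-weighted {era_ip:.3f}, BB-weighted {era_bb:.3f}")
```

In this made-up sample the pitchers with more walks per inning also have higher ERAs, so the BB-weighted figure lands above the league ERA.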

Posted 5:00 p.m., September 28, 2003 (#16) - RossCW
Since 1965 the ratio of walks to hits has been .375 (501901/1338995). The ratio of walks to hits in the leadoff example above is .241 (33002/136760). Apparently batters are far more likely to walk when they are not leading off an inning - I would assume that means they are also far less likely to score.

Posted 5:44 p.m., September 28, 2003 (#17) - RossCW
That should have been far less likely to walk leading off an inning and therefore far less likely to score after a walk.

Posted 12:22 p.m., September 29, 2003 (#18) - FJM
You picked up the wrong column of numbers from Tango's leadoff table. You want the "Reach" column, not the "Score" column. Specifically, the ratio you want is 82,637/265,610.

Since that works out to .311, your point is still valid, however. That's still far less than the overall BB/H ratio you quote (.375).
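For anyone who wants to check both ratios, here is a quick sketch using the 1B, 2B, 3B, HR and BB rows of the leadoff table above; the helper name is mine.

```python
# Rows from the leadoff table: times reached and times scored, by event.
reach = {"1B": 183468, "2B": 48364, "3B": 6573, "HR": 27205, "BB": 82637}
score = {"1B": 72841, "2B": 30961, "3B": 5753, "HR": 27205, "BB": 33002}

HIT_EVENTS = ("1B", "2B", "3B", "HR")


def walks_per_hit(column):
    """Ratio of walks to hits within one column of the table."""
    return column["BB"] / sum(column[event] for event in HIT_EVENTS)


print(f"Reach column: {walks_per_hit(reach):.3f}")
print(f"Score column: {walks_per_hit(score):.3f}")
```

The "Reach" column gives about .311 and the "Score" column gives the .241 quoted earlier.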

I wonder how much of that difference can be explained by Intentional Walks. Frankly, I was surprised to see the leadoff batter ever getting an IBB.

Posted 12:54 p.m., September 29, 2003 (#19) - tangotiger
Ross, I really don't know what your point is.

Given that a player reaches 1B with 0 outs, it doesn't matter if he gets there by single or walk, as he'll score about 40% of the time. Whatever extra information is contained in the manner in which a runner reaches 1B is almost insignificant. The significance is almost completely explained by my 5 factors.

Posted 9:00 p.m., September 29, 2003 (#20) - RossCW
I really don't know what your point is.

My point is that there appears to be a sixth factor:

A smaller percentage of players' walks come when leading off an inning than the percentage of hits that come when leading off an inning. This means that drawing a walk, on average, is less likely to lead to a run scoring than a hit because a walk is less likely to happen at the point most likely to lead to a runner scoring.

Posted 6:53 a.m., September 30, 2003 (#21) - Tangotiger
This means that drawing a walk, on average, is less likely to lead to a run scoring than a hit

This is a given. Walks occur more frequently with 1b open than not, relative to a single, and walks occur more frequently with 2 outs than not, relative to a single.

Across all base/out states, no question, a walk leads to fewer runners scoring than a single. But, GIVEN the base/out state, the runner on a walk is virtually just as likely to score as the runner on a single.

Are we agreed?

Posted 10:24 a.m., September 30, 2003 (#22) - RossCW
But, GIVEN the base/out state, the runner on a walk is virtually just as likely to score as the runner on a single.

The data here says that is true for leading off an inning. I have no idea whether it is true under other circumstances. It would seem unlikely since a runner is much more likely to advance to second on a single with a runner on base, while a leadoff batter advancing to second is counted as an error in the data above.

This is a given. Walks occur more frequently with 1b open than not, relative to a single, and walks occur more frequently with 2 outs than not, relative to a single.

So when we look at Factor 3 above, how does this affect it? Doesn't it imply that a batter with more walks is less likely to get on base with 0 outs and therefore less likely to score?

Of course we are talking about the average walk. It may well be that the pattern of when players walk varies widely. In fact it's pretty likely.

Posted 11:56 a.m., September 30, 2003 (#23) - tangotiger
I have no idea whether it is true under other circumstances.

I'm not going to run the data for you, but my best guess is that it is the same. How you get there is inconsequential, after you account for "the 5 factors". I can say this with confidence because my Markov models match the empirical based on this assumption.

It would seem unlikely since a runner is much more likely to advance to second on a single with a runner on base, while a leadoff batter advancing to second is counted as an error in the data above.

I read this 4 times now, and I don't understand what you are trying to say.

So when we look at Factor 3 above, how does this affect it? Doesn't it imply that a batter with more walks is less likely to get on base with 0 outs and therefore less likely to score?

Correct. As you note in your next statement, this is the average walk. While the average single will score about .26 times, the average walk will score .25 times. (These numbers are dependent on the environment, but the effect is about .01 runs / time on base.) I documented this somewhere in the comments section of the "How Runs Are Really Created" series.

By the 24 base/out states however, there is no difference. (or the difference is as shown in my quote above, with a difference of .002 runs / time on base). Talking fan to fan, it's no difference. Talking professor to professor, well, that's boring isn't it?

Of course we are talking about the average walk. It may well be that the pattern of when players walk varies widely. In fact it's pretty likely.

The walk and K varies the most by base/out states, and therefore, I suspect that those kinds of players vary the most.

Posted 10:30 p.m., September 30, 2003 (#24) - RossCW
my best guess is that it is the same.

I see no reason to believe that and plenty of reasons why they wouldn't.

I read this 4 times now, and I don't understand what you are trying to say.

1) The data above counts leadoff singles where the runner advances to second on an error as an error. I can't think of any other circumstances under which a runner will advance to second on a leadoff single. Can you?

2) With a runner on base a single will sometimes result in the batter reaching second when the throw is made to get the lead runner. Presumably this increases the chances of the batter scoring. A walk on the other hand will never get the batter to second.

And there may be any number of other factors in game situations that will influence actual outcomes. This makes it unlikely that the numbers are the same for other situations. Whether any are significant you can't know without running the numbers for actual events.

While the average single will score about .26 times, the average walk will score .25 times

This is based on actual events from what period of time? This appears to be contradicted by the fact that batters who get on base with a larger proportion of walks tend to score fewer times per time on base.

By the 24 base/out states however, there is no difference.

Which means nothing for the overall likelihood of scoring on a walk or a single. If the distribution of walks and singles across the base/out states is not the same then the chances could be identical in each instance and widely different in aggregate.
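A toy illustration of that point: give walks and singles the same chance of scoring in each out state, but spread them differently across the out states. The per-out scoring rates are the ones from the original post; the two distributions are hypothetical, picked only so that walks skew toward 2 outs.

```python
# Scoring rate by outs, identical for a walk and a single (figures from the original post).
score_rate = {0: 0.45, 1: 0.30, 2: 0.15}

# HYPOTHETICAL shares of each event by out state, for illustration only.
single_share = {0: 0.35, 1: 0.33, 2: 0.32}
walk_share = {0: 0.30, 1: 0.33, 2: 0.37}


def aggregate_rate(share):
    """Overall scoring rate: per-state rates weighted by where the event happens."""
    return sum(share[outs] * score_rate[outs] for outs in score_rate)


single_overall = aggregate_rate(single_share)
walk_overall = aggregate_rate(walk_share)

print(f"single {single_overall:.4f}, walk {walk_overall:.4f}")
```

The per-state rates are identical, yet the overall rates differ purely because of where the events fall.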

The walk and K varies the most by base/out states, and therefore, I suspect that those kinds of players vary the most.

I've read this four times and don't catch your meaning.

Posted 8:04 a.m., October 1, 2003 (#25) - Tangotiger
Ross, if you understand Markov, then I don't get your followup questions, and I will be happy to bow out at this point. If you don't understand Markov, then I'd be happy to explain it.

Which means nothing for the overall likelihood of scoring on a walk or a single.

I know. That's why I said the .26/.25 for overall, for this very reason. Overall, it's .26/.25. By the 24 base/out, it's almost certainly the same. I say that because of Markov.

Posted 11:52 a.m., October 1, 2003 (#26) - RossCW
If you don't understand Markov, then I'd be happy to explain it.

I don't understand Markov.

From This Link:

http://www.taygeta.com/rwalks/node7.html

A Markov chain is a sequence of random values whose probabilities at a time interval depends upon the value of the number at the previous time

Does not seem to have anything to do with the question of whether a player is less likely to score on a walk than a single.

That's why I said the .26/.25 for overall, for this very reason

Is this based on actual events from specific season(s)? If you have the actual data of how many times a player who walked scored and how many times a player who hit a single scored the math is not complicated and requires little analysis. You have six data points: how many walks, how many times the player who walked scores and the percentage, how many singles, how many times the player who singled scored and the percentage.

The data above would indicate that someone who walks should be much less likely to score. They are much less likely to get on base leading off an inning. On the other hand perhaps that difference is insignificant when looking at all walks.

Posted 12:29 p.m., October 1, 2003 (#27) - tangotiger (homepage)
Basically, what Markov says is that "how you enter a state is independent as to how you leave a state".

So, if you can picture the different ways that you enter the "2nd and 1 out" state:
- start state: 1 out, event: double, end state: 2nd and 1 out
- start state: 1 out, man on 2b, event: single, end state: 2nd and 1 out (runner scores, and batter takes 2b on a single... something that is unlikely with a walk, I agree)
- start state: 0 outs, man on 1b, event: succ bunt, end state: 2nd and 1 out

etc, etc, etc.

So, GIVEN that you've got a man on 2b and 1 out, what happens to that runner, according to Markov chains, is independent as to how he got there.

Now, is this true?

What's cool is that you can compare the expectation from a Markov chain to empirical analysis, and you'll get results that are close enough that you can make that claim.
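For readers who have not seen one, here is a stripped-down sketch of the idea. It is not the model described in this thread: it tracks a single runner, the state is just (base, outs), and the outcome probabilities are hypothetical. The point is only that the scoring chance is computed from the current state and nothing else.

```python
from functools import lru_cache

# HYPOTHETICAL per-plate-appearance outcomes for a toy model that tracks one runner.
# Each entry is (bases the runner advances, outs added, probability).
OUTCOMES = [
    (0, 1, 0.68),  # batter makes an out, runner holds
    (1, 0, 0.20),  # runner moves up one base
    (2, 0, 0.08),  # runner moves up two bases
    (3, 0, 0.04),  # runner moves up three bases, which always scores him
]


@lru_cache(maxsize=None)
def p_score(base, outs):
    """Chance the runner on `base` (1-3) with `outs` (0-2) scores before the third out."""
    total = 0.0
    for advance, outs_added, prob in OUTCOMES:
        new_base, new_outs = base + advance, outs + outs_added
        if new_outs >= 3:
            continue  # inning over, runner stranded
        if new_base >= 4:
            total += prob  # runner crossed the plate
        else:
            total += prob * p_score(new_base, new_outs)
    return total


for outs in range(3):
    print(outs, [round(p_score(base, outs), 3) for base in (1, 2, 3)])
```

Note that `p_score` takes no argument describing how the runner reached his base. That is the Markov assumption; whether it holds for real games is what the comparison with empirical data is meant to test.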

The same applies for getting on 1b and 0 outs. If you get there on a walk, you know that the pitcher is slightly worse than the pitcher who gave up a single. Maybe if you got a walk, you are more likely to come from the top of the order, and so you have better hitters behind you. Maybe a single happens with fast players more, etc, etc, etc. There is a lot of "hidden information" contained in how you get there. But, empirical analysis shows that the rate at which you score from a walk or single with 1b and 0 outs is virtually the same (.399 to .397).

Now, when I produce the Linear Weights results for each event, I base this on the empirical data for 1999 to 2002. That is, I look at exactly how many runs scored from that event/base/out to the end of the inning. (See above homepage link. The overall values are in the last line.)

If I set up my Markov chains, I'll get overall numbers that match pretty closely to that last line.

And, as a third step, I can also reproduce the numbers in the last line by assuming things like "single/walk = same chance of scoring". So, again, whatever differences might exist are insignificant.

I'm happy with the data that I've produced, and the reasoning behind it. I'm not going to do further work here, but I encourage you to download the Retrosheet event files, or Ray Kerby's program, and do the work yourself.

Finally, while a single might find the batter at 2b (because of error on the throw, or taking 2b on the throw home, etc), he might also find himself out for trying to take the extra base. Like I said, my best guess is that the difference between a single and walk scoring is probably pretty darn close for each of the 24 base/out states. This assumption leads to results that are consistent with the empirical LWTS, the Markov LWTS, and the LWTS process I detailed in Article 2 of "How Runs are Created".

Posted 12:32 p.m., October 1, 2003 (#28) - tangotiger
(Note: at the above link, you'll find very weird results for HBP and IBB. Those are sample size issues. There are just not enough of them, at each of the base/out states, to feel confident with them. Even overall, the margin of error is high.)

Posted 9:03 p.m., October 9, 2003 (#29) - RossCW
So, GIVEN that you've got a man on 2b and 1 out, what happens to that runner, according to Markov chains, is independent as to how he got there.

Now, is this true?

I think it is clearly true only if you choose to ignore any information that how they got there provides about the state. To say that it doesn't matter whether a hitter got to second on a double or a bunt single and a stolen base is true only if it is the same hitter.

In fact, when one looks at the disparity between players in the number of runs they score per time on base, it appears that who is on base may be more important than whether they are on first or second or how many outs there are. If Vince Coleman is on first he is quite likely to steal second and far more likely to score from first or second on a well hit ball. It may well be that he is more likely to score from first than Willie McCovey is to score from second. Or more likely to score from first with one out than McCovey is with no outs. The only way to know for sure would be to look at the actual instances for each player.

In general the batter who got there with a double is likely slower than the average batter who got there with a bunt single and a stolen base. In other words the runner on second is more likely to be Willie McCovey than Vince Coleman. So while how a runner enters the state doesn't affect how he leaves the state, it may give you critical information about what type of runner he is, which is an important characteristic of the state and does affect how he is likely to leave it.

If I set up my Markov chains, I'll get overall numbers that match pretty closely to that last line.

I don't see how matching "overall numbers" demonstrates that you have considered all the critical variables in the on-base state.

To look at it another way, I ran numbers on how likely batters were to score assuming that the critical factor was whether they were on base. I did not consider how many outs there were or what base they were on. Just that they had reached base. In other words I defined the "state" as being on base and how they left that state as either by the inning ending or by them scoring a run. And what I found was that how often a player scored once on base was as critical a factor in how often they will score as how often they get on base. Does that mean that which base they got on is not a critical factor? No, it just means that I didn't consider it.

Posted 8:34 a.m., October 10, 2003 (#30) - tangotiger
I've already acknowledged that speed is a component, as this was part of the 5 factors that affect scoring.

As a group, the chance of scoring from 1b on a single is close to that of a walk. Since there's a lot of speed information hidden inside a SB, I would expect the chance of scoring from 2b to be different between a double and a single/walk+steal. Obviously, at the individual level, this is even more true. The point is to try to establish how much hidden information there is in the unconsidered variables. The single/walk 1B thing does not have much.

Posted 8:38 a.m., October 10, 2003 (#31) - Anonymous
.

Posted 9:02 a.m., October 10, 2003 (#32) - David Smyth
"The single/walk thing does not have much."

Well, I guess maybe not enough to affect the results, or maybe there is an offset. But I believe that there is some hidden speed info in 1B and walks. At least, I know that the fastest runners tend to hit more 1B (per PA) than the avg batter, and draw fewer walks. As well, they tend to hit fewer HR, an avg number of 2B, and many more 3B (duh).

Posted 10:10 a.m., October 10, 2003 (#33) - tangotiger
This is my point. When you consider all the variables, they don't amount to a hill of beans, overall.