Runs Produced

Should we subtract the Home Run or not?

© Tangotiger

The presumption in this article is that Runs Scored (R) and Runs Batted In (RBI) have value. This article will not discuss the biases in R and RBI. I am setting aside discussion of those biases because they are not germane to the point of this article. It may seem strange that I need to make such strong statements. What is clear to me is that a large group of readers see the term RBI and come fully armed to do battle, even if no battle exists. They want to fight, and need to fight, the fight of alerting the mainstream to the biases in these metrics. This ain't the place for that fight.

Definitions

Runs Produced is a term used to represent R+RBI-HR. There are two issues. The first is the name. Just as people argue ridiculously over the Quality Start (QS), they make the same arguments over Runs Produced (RP). Bill James noted that if the QS metric had simply been called something else, like The Johnson Game (after Walter Johnson), this wouldn't have been an issue. Trying to squeeze a one-sentence, twenty-word definition into two words is a recipe for disaster. There is nothing wrong with a definition that says: a game in which the starting pitcher allows at most three earned runs while pitching at least six innings. It's simple, direct, and to the point. Is it so terrible to call that a Quality Start? Would it have been better to call it a Quatlu Start, basically just inventing a word? People have such strong opinions on what Quality means that they simply refuse to allow any additional definitions. I suppose we can call it a Maddux Game and be on our way.

The idea of Runs Produced causes similar stubbornness. "Why is a solo home run worth less RP than a run resulting from a single?" is the common complaint. (A solo home run nets the batter 1 R + 1 RBI - 1 HR = 1 RP, while a runner scoring on a single produces 2 RP: one R for the runner and one RBI for the batter.) Yet we have no problem with the NHL awarding a single point on an unassisted goal and up to three points on an assisted goal. The problem lies with the term RBI, runs batted in. MLB could easily have created RDI, or runners driven in: in effect, the MLB equivalent of an NHL assist. And then who would complain about MLB creating a stat called Points, which would be runs scored (goals in the NHL) plus runners driven in (assists in the NHL)? Points is, in effect, R+RBI-HR. Unfortunately, the stat is not called Points but Runs Produced. And, doubly unfortunate, the calculation is not written as R+RDI but as R+RBI-HR. "Why subtract the HR?" is forever on our backs.

Background

In the 1987 Baseball Abstract, Bill James started discussing Runs Produced. I thought at that moment that he would shed some light and stop this nonsense. Instead, he said something to the effect that Runs Produced made a terrible mistake in subtracting HR. And, even more amazingly to me, he said that the HR should not be subtracted but added, because of its unifying power. Or something to that effect. The solution is quite easy, and I thought maybe Bill had simply overlooked it. Recently, I bought the 1983 Baseball Abstract, where Bill James laid out his thoughts on Runs Produced in great detail. And he had "pre-iterated" his stance (though "re-iterated" based on my timeline). What made it even more surprising is that the easy solution I just hinted at was the one Bill James offered to argue against subtracting the HR. His evidence supported him, but only because of small sample size-itis.

Even Keith Woolner, another great sabermetrician, suggested in an online chat a few years ago that if you insist on using Runs Produced, you shouldn't make the silly mistake of subtracting the HR.

When one is up against giants, one had better be prepared with more than a rock and a slingshot.

Get On With It Already

I will now present the easy solution that I hinted at. In this article, I will focus on all hitters since 1995 with at least 300 at bats plus walks (300 PA). This gives me 3155 batter-seasons, and I scale each of those seasons to 600 PA.
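The sample filter and rescaling step can be sketched as below. This is a minimal sketch, not the author's code: the dictionary keys (`AB`, `BB`, `HR`) and the function name `scale_to_600` are hypothetical, and it assumes every value in the record is a counting stat that should scale in proportion to PA.

```python
from typing import Optional

MIN_PA = 300     # the article's cutoff: at least 300 AB + BB
TARGET_PA = 600  # every qualifying season is rescaled to this many PA


def plate_appearances(season: dict) -> float:
    """PA as the article uses it: at bats plus walks."""
    return season["AB"] + season["BB"]


def scale_to_600(season: dict) -> Optional[dict]:
    """Scale all counting stats to 600 PA; return None below the 300-PA cutoff.

    Assumes (hypothetically) that every value in `season` is a counting stat.
    """
    pa = plate_appearances(season)
    if pa < MIN_PA:
        return None
    factor = TARGET_PA / pa
    return {stat: count * factor for stat, count in season.items()}


# Hypothetical season: 450 AB + 50 BB = 500 PA, so each count is multiplied by 1.2.
scaled = scale_to_600({"AB": 450, "BB": 50, "HR": 15})
print(round(scaled["HR"], 1))  # 18.0
```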

From that dataset, I label anyone with fewer than 10 HR as "No Power", 10-20 HR as "Average Power", 20-30 HR as "Strong Power", and 30 or more HR as "Ruthian Power".
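One way to express the power buckets in code follows. The article's ranges ("10-20", "20-30") share their endpoints, so this sketch assumes half-open intervals (a hitter with exactly 20 HR counts as Strong Power); it also assumes the HR total is the 600-PA-scaled one. `power_class` is a hypothetical name.

```python
def power_class(hr: float) -> str:
    """Bucket a 600-PA-scaled HR total into the article's power classes.

    Assumption: boundaries are half-open, so 10 <= HR < 20 is Average Power
    and 20 <= HR < 30 is Strong Power.
    """
    if hr < 10:
        return "No Power"
    if hr < 20:
        return "Average Power"
    if hr < 30:
        return "Strong Power"
    return "Ruthian Power"


print([power_class(hr) for hr in (7, 15, 24, 34)])
# ['No Power', 'Average Power', 'Strong Power', 'Ruthian Power']
```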

I also calculate each hitter's wOBA to represent his overall hitting impact. wOBA scales like OBP, so if you're not sure what wOBA is, and you don't want to click the link, just think of OBP, and assume that it properly weights the walks, hits, and home runs. I create five classes of hitters:
- Poor (wOBA under .285)
- Fair (wOBA between .285 and .325)
- Average (.325 to .365)
- Good (.365 to .405)
- Great (wOBA of at least .405)
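The five wOBA classes can be assigned the same way. This sketch uses Python's standard `bisect` module on the class lower bounds; it assumes a hitter sitting exactly on a boundary (say, .325) belongs to the higher class, which matches "under .285" for Poor and "at least .405" for Great but is a guess for the middle boundaries. `woba_class` is a hypothetical name.

```python
from bisect import bisect_right

# Lower bounds of Fair, Average, Good and Great, from the list above.
WOBA_CUTOFFS = [0.285, 0.325, 0.365, 0.405]
WOBA_CLASSES = ["Poor", "Fair", "Average", "Good", "Great"]


def woba_class(woba: float) -> str:
    """Return the article's hitter class for a given wOBA.

    bisect_right counts how many cutoffs are <= woba, which is exactly
    the index of the class the hitter falls into.
    """
    return WOBA_CLASSES[bisect_right(WOBA_CUTOFFS, woba)]


print(woba_class(0.340), woba_class(0.377), woba_class(0.306))  # Average Good Fair
```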

Here's the question to ask: if you have two guys who are overall average, but one derives almost no value from his HR and the other derives a great deal of value from his HR, shouldn't their Runs Produced be similar? That is the point of the metric: to put such diverse hitters on the same scale.

I identified 237 batters who were overall Average but had No Power. Their overall wOBA was .340, which translates to around 79.3 runs created. There were 80 batters who were overall Average but had Ruthian Power. Their overall wOBA was .351, which translates to around 84.7 runs created (after adjusting for their fewer outs). So they are similar overall, with the second group of hitters around 5 runs better.

The R+RBI totals of the two groups were 137 and 178. That difference is 41 runs! And their R+RBI-HR? 130 and 144, a difference of 14 runs. Which one better represents the value of our hitters?
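The comparison above is simple subtraction, but laying it out makes the point explicit: the question is how far each metric's gap strays from the Runs Created gap. The figures are the ones quoted for the overall-Average hitters; the helper name `gaps` is hypothetical.

```python
def gaps(low: dict, high: dict) -> dict:
    """Difference (high - low) for every metric the two groups share."""
    return {metric: high[metric] - low[metric] for metric in low}


# Overall-Average hitters on a 600-PA basis, figures from the text above.
no_power = {"RC": 79.3, "R+RBI": 137, "R+RBI-HR": 130}
ruthian = {"RC": 84.7, "R+RBI": 178, "R+RBI-HR": 144}

g = gaps(no_power, ruthian)
for metric, gap in g.items():
    # How much wider each metric's gap is than the Runs Created gap.
    print(f"{metric}: {gap:.1f} runs ({gap - g['RC']:+.1f} vs RC)")
```

This prints gaps of 5.4, 41.0 and 14.0 runs: R+RBI overstates the Runs Created gap by about 36 runs, R+RBI-HR by about 9.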

How about the overall Good hitters? There were 27 with No Power, a wOBA of .377, and 99 RC, and 192 with Ruthian Power, a wOBA of .386, and 103 RC. They are close, with about a 4-run advantage for our Ruthian hitters. The R+RBI? 155 and 195, a 40-run difference. R+RBI-HR? 148 and 160, a 12-run difference.

Let's look at the Fair hitters: 412 with No Power, 360 with Average Power, and 99 with Strong Power. Their wOBA (and RC), respectively: .306 (61 RC), .309 (63 RC), .312 (65 RC). Again, fairly similar. Their R+RBI: 124, 137, 149. But subtract the HR, and you get 118, 122, 125.
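One sanity check on these groups: since R+RBI and R+RBI-HR differ only by the HR, subtracting the two columns recovers each group's approximate HR per 600 PA. Every implied total lands inside its group's power bucket (under 10 for No Power, 10-20 for Average, 20-30 for Strong, 30 or more for Ruthian). The figures come from the three paragraphs above; because they are rounded group averages, the implied HR are approximate.

```python
# (R+RBI, R+RBI-HR) per group on a 600-PA basis, from the text above.
groups = {
    ("Average", "No Power"): (137, 130),
    ("Average", "Ruthian Power"): (178, 144),
    ("Good", "No Power"): (155, 148),
    ("Good", "Ruthian Power"): (195, 160),
    ("Fair", "No Power"): (124, 118),
    ("Fair", "Average Power"): (137, 122),
    ("Fair", "Strong Power"): (149, 125),
}

# The two columns differ only by subtracting HR, so their difference is the HR total.
implied_hr = {group: r_rbi - rp for group, (r_rbi, rp) in groups.items()}

for (quality, power), hr in implied_hr.items():
    print(f"{quality:>7} / {power:<13}: about {hr} HR per 600 PA")
```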

Why?

The runs scored can be approximated by:
.27*1B + .44*2B + .61*3B + 1.00*HR + .27*BB + reaching on error

RBI is roughly equal to:
.2*1B + .4*2B + .6*3B + 1.6*HR + .02*BB + runners on batting outs

If we add the two equations together, we get:
.47*1B + .84*2B + 1.21*3B + 2.60*HR + .29*BB + ...

As you can see, the numbers are very similar to Linear Weights, except for the HR. If we had instead used runners driven in (RBI-HR, or RDI), we'd have:
.47*1B + .84*2B + 1.21*3B + 1.60*HR + .29*BB + ...
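The coefficient arithmetic can be checked mechanically. This sketch copies the weights from the two approximations above, drops the reached-on-error and runners-on-batting-outs terms (the article gives no coefficients for them), and turns RBI into runners driven in by taking 1.00 off the HR weight, since a home run always drives in the batter himself. The names `RUNS`, `RBI`, `RDI` and `combine` are hypothetical.

```python
# Per-event weights from the runs-scored and RBI approximations above.
RUNS = {"1B": 0.27, "2B": 0.44, "3B": 0.61, "HR": 1.00, "BB": 0.27}
RBI = {"1B": 0.20, "2B": 0.40, "3B": 0.60, "HR": 1.60, "BB": 0.02}


def combine(*weight_sets: dict) -> dict:
    """Add per-event weights across equations, rounded to two decimals."""
    events = weight_sets[0].keys()
    return {e: round(sum(w[e] for w in weight_sets), 2) for e in events}


# Runners driven in: a home run's RBI always includes the batter himself,
# so RDI = RBI - HR removes exactly 1.00 from the HR weight.
RDI = {**RBI, "HR": RBI["HR"] - 1.00}

r_plus_rbi = combine(RUNS, RBI)  # R + RBI
r_plus_rdi = combine(RUNS, RDI)  # R + RBI - HR
print(r_plus_rbi["HR"], r_plus_rdi["HR"])  # 2.6 1.6
```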

And that makes more sense. The HR hitter still gets a bit too much credit, and the walker still doesn't get enough, but at least R+RBI-HR is a better approximation than R+RBI.

Final Adjustment

A final adjustment can be made to go from Runs Produced (R+RBI-HR) to Runs Created. A quick and useful one is to subtract outs/7. It's possible that the better adjustment is AB/10. I don't know. You can work it out.
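The final adjustment is a one-liner. This is a minimal sketch with a hypothetical function name; the article doesn't pin down which outs to count, so treating batting outs as AB - H (a common choice) is an assumption, and the `divisor` parameter exists so you can try alternatives such as the AB/10 variant. The stat line in the usage lines is made up.

```python
def rc_from_rp(rp: float, base: float, divisor: float = 7.0) -> float:
    """Estimate Runs Created as Runs Produced minus base / divisor.

    base: the batter's outs for the outs/7 adjustment (assumed here to be
    AB - H), or his at bats with divisor=10 for the AB/10 alternative.
    """
    return rp - base / divisor


# Hypothetical 600-PA hitter: 130 RP, 540 AB, 155 hits -> 385 outs.
print(rc_from_rp(130, base=540 - 155))        # outs/7 -> 75.0
print(rc_from_rp(130, base=540, divisor=10))  # AB/10  -> 76.0
```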

I published the full data on Google Docs. Feel free to manipulate the data as you see fit.

Yeah, but...

...It doesn't reconcile at the team level, right? Right. It's not necessarily supposed to, just as it won't reconcile in the NHL either. It just means that you need to apply a team-level adjustment, rather than a league-level adjustment, to Runs Produced. So instead of outs/7, a team might need outs/6.8 or outs/7.2, or whatnot.
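A team-level divisor can be backed out from the team's own totals. This sketch assumes the target is the team's actual runs scored, i.e. it picks the divisor d so that the players' summed Runs Produced minus team outs / d equals team runs; the function name and the numbers in the usage line are hypothetical.

```python
from typing import Iterable


def team_out_divisor(player_rp: Iterable[float], team_outs: float, team_runs: float) -> float:
    """Solve sum(RP) - team_outs / d = team_runs for d.

    Assumption: team runs scored is the total the adjustment should hit.
    """
    excess = sum(player_rp) - team_runs
    if excess <= 0:
        raise ValueError("summed RP must exceed team runs to give a positive divisor")
    return team_outs / excess


# Hypothetical team: players' RP sum to 1000, 800 runs scored, 1440 outs.
print(team_out_divisor([400, 350, 250], team_outs=1440, team_runs=800))  # 7.2
```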

If you have a team with 0 HR, then the Runs Produced (or Points, if you are still hung up on the name) of that team will be much higher than that of a team that scores the exact same number of runs but hits a lot of HR. Each hit on the non-HR team drives in a lot more runners, because there were no HR to clear the bases.
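A rough way to see this: at the team level, every run counts once in R, and most runs also count once in RBI, so team Runs Produced is roughly 2R - HR. Two teams with identical run totals therefore differ by about their HR difference. The sketch below assumes RBI equals runs when no RBI figure is supplied, which overstates RBI slightly (runs scoring on errors or wild pitches carry no RBI); the function name and team numbers are hypothetical.

```python
from typing import Optional


def team_runs_produced(runs: int, hr: int, rbi: Optional[int] = None) -> int:
    """Team-level R + RBI - HR.

    If team RBI is not given, assume every run was batted in (RBI == R),
    which makes the result approximately 2 * runs - hr.
    """
    if rbi is None:
        rbi = runs
    return runs + rbi - hr


# Two hypothetical teams that both score 750 runs.
print(team_runs_produced(750, hr=0))    # 1500
print(team_runs_produced(750, hr=200))  # 1300
```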

What is beyond question is that the HR needs to be subtracted. It's the only way to reconcile the 100-RC guy with 35 HR and the 100-RC guy with 5 HR. I understand the theoretical objection; there's no reason to restate it. It's time to look at the actual data.