How are Runs Really Created - Third Installment
by Tangotiger
Part 5 - Some real-life examples
Previously...
Okay, so where do we stand? We know that what we are after is the rate at which runners score, given all the input variables. We know that when using the plus 1 method, the marginal impact of each event in the 1974-1990 environment must correspond to the actual run values, previously published. If we accept BaseRuns' theory on how the rate at which runners score is calculated (B/(B+C)), then we have to provide the coefficients in the "B" equation which corresponds to an actual known environment (1974-1990), and is verified against the plus 1 method. We've done this [in the last article].
What we don't know, yet, is if we can apply this equation to other environments, or to even extreme environments, like a team of Pedros or a team of Bonds or even weird profiles like Rob Deer. All we have to do is run our simulator. We will create various run environments, apply the formula that was fixed for the 1974-1990 environment, and see if it holds up against these other run environments.
To sim or not to sim
I've received (legitimate) complaints regarding the use of a simulator. After all, can we really accurately capture everything that happens in baseball? Yes, we can. But does my simulator actually do all that? No, it doesn't. For example, I do not have it put in a typical batting order, rather making everyone average. Since the results of doing this are remarkably close to what actually happened, there's very little to be gained by making it lineup-specific. (Other questions would necessitate such a thing, like when to walk Barry, when to sac bunt, etc.) The other complaint, which I find more substantive, is that no one could reproduce my work without having a similar sim. That's true. And without peer review, who's to say that I'm not just making all this stuff up?
Retrosheet is a great site. Not only do they have the play-by-play event logs that I base all my programs on, but they also have "game logs" that show each team's game totals, game-by-game, since 1900. What can we do with this information? Lots!
Most run evaluators are measured against seasonal team totals. The main reason is that seasonal totals remove much of the noise that exists in game totals. If one team gets 11 hits, 1 HR, and 3 walks and scores 8 runs, while another scores 2 runs, does that tell us much? There are two problems with aggregating by team. The first is that since the majority of teams are clumped around the average (an OBA of .300 to .360 is not that wide a range), we can't find the true differences. (Sure, run your regression, and you can get decent results.) Second, by aggregating the daily game logs into seasonal totals by team, you are introducing a bias. Baserunning, parks, and opposing pitching will still pollute the totals. By aggregating all the 1987 Cardinals games, you are not accounting for their superior baserunning speed, or their park, or that they didn't have to face their own pitchers. Even worse, you are introducing this bias yourself, rather than hoping that it "cancels out" over a larger sample.
Reduce noise, remove bias
For the balance of this article, I will group the 1974-1990 game totals to reduce the noise, and not introduce any biases. Here are the game-by-game totals, summed by the number of HR hit in the game. The columns represent the number of HR hit, the number of games, actual runs scored, the BaseRuns estimate, the Linear Weights estimate, and the basic RC estimate.
Runs Scored, breakdown by HR hit

HRclass       n       R     BsR    LWTS      RC
      0  33,068    3.08    3.06    3.79    3.03
      1  23,117    4.62    4.62    4.44    4.66
      2   9,218    6.12    6.12    5.00    6.41
      3   2,838    7.65    7.65    5.62    8.37
      4     687    9.03    9.00    6.07   10.29
      5     146   10.55   10.49    6.73   12.45
      6      40   12.33   12.32    7.52   15.35
      7       9   16.22   14.32    8.34   18.27
      8       2   14.00   15.87    8.58   22.52
     10       1   18.00   18.30    9.51   27.03
Note: I am adjusting each of the various run evaluators to account for the information that we have, such that the overall runs scored for 1974-1990 matches the estimate.
A clean sweep for BaseRuns among all HR classes! Linear Weights woefully underestimates the number of runs scored in the high HR games, and Runs Created is equally poor the other way. This is in line with our expectations. While RC has the right idea, its improper handling of the HR is magnified in this set of data. The static values of Linear Weights show their inadequacy.
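The "clean sweep" can be checked directly from the table. The sketch below simply copies the per-game averages from the table above and reports which estimator misses the actual runs by the least in each HR class; the function and variable names are mine, chosen for illustration.

```python
# Per-game averages copied from the HR-class table above:
# HR class -> (games, actual R, BsR, LWTS, RC)
ROWS = {
    0: (33068, 3.08, 3.06, 3.79, 3.03),
    1: (23117, 4.62, 4.62, 4.44, 4.66),
    2: (9218, 6.12, 6.12, 5.00, 6.41),
    3: (2838, 7.65, 7.65, 5.62, 8.37),
    4: (687, 9.03, 9.00, 6.07, 10.29),
    5: (146, 10.55, 10.49, 6.73, 12.45),
    6: (40, 12.33, 12.32, 7.52, 15.35),
    7: (9, 16.22, 14.32, 8.34, 18.27),
    8: (2, 14.00, 15.87, 8.58, 22.52),
    10: (1, 18.00, 18.30, 9.51, 27.03),
}


def abs_errors(row):
    """Absolute miss, in runs per game, of each estimator for one HR class."""
    _, actual, bsr, lwts, rc = row
    return {
        "BsR": abs(bsr - actual),
        "LWTS": abs(lwts - actual),
        "RC": abs(rc - actual),
    }


def closest_estimator(row):
    """Name of the estimator with the smallest absolute miss."""
    errors = abs_errors(row)
    return min(errors, key=errors.get)


winners = {hr: closest_estimator(row) for hr, row in ROWS.items()}

for hr, row in ROWS.items():
    errors = {name: round(err, 2) for name, err in abs_errors(row).items()}
    print(f"{hr:>2} HR  closest={winners[hr]:<4}  errors={errors}")
```

Keep in mind that the 7, 8, and 10 HR classes rest on 9, 2, and 1 games respectively, so the misses there say little by themselves.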
Score rate
As was discussed, the key to the run model is the score rate. Working backwards, we can determine the score rate implied by the LWTS and RC methods (for BaseRuns, we know it is based on the B and C components). Re-presenting the above chart in terms of score rates:
Score rate, breakdown by HR hit

HRclass       n      SR   bsr_SR   lwts_SR   rc_SR
      0  33,068   0.267    0.265     0.328   0.262
      1  23,117   0.304    0.304     0.289   0.307
      2   9,218   0.340    0.340     0.247   0.364
      3   2,838   0.373    0.372     0.210   0.430
      4     687   0.404    0.401     0.166   0.504
      5     146   0.430    0.426     0.134   0.578
      6      40   0.454    0.454     0.109   0.671
      7       9   0.610    0.484     0.089   0.746
      8       2   0.414    0.543     0.040   1.001
     10       1   0.533    0.554   (0.033)   1.135
Now it starts to make more sense. BaseRuns correctly determines the percentage of baserunners that should score, for each HR class. But look at Linear Weights. Not only does it underestimate, but it is actually saying that a smaller percentage of runners scores the more HR that are hit! Runs Created starts off fine, but as the number of HR starts to accumulate, its expectation of the % of runners scoring grows at too fast a rate, with no cap at 100%. With 10 HR, Linear Weights estimates negative runs per runner, while Runs Created estimates more runs than runners. The above chart as an image makes it quite clear:
It is good to remember that the design of RC and LWTS was based on "normal" hitting conditions. While they do fine in that regard, we see here the results of extrapolating into extreme conditions.
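"Working backwards" to the implied score rates can be sketched as follows. I am assuming that BR counts the runners who reached base other than by a HR, so that Runs = BR x score rate + HR; under that reading, the published figures are reproduced to within rounding. The helper names are hypothetical.

```python
def baserunners(actual_runs, hr, actual_score_rate):
    """Non-HR baserunners per game implied by Runs = BR * SR + HR."""
    return (actual_runs - hr) / actual_score_rate


def implied_score_rate(estimated_runs, hr, br):
    """Share of baserunners that an estimator is implicitly saying will score."""
    return (estimated_runs - hr) / br


# The 10-HR row of the tables above: actual R = 18.00, actual SR = 0.533
hr = 10
br = baserunners(18.00, hr, 0.533)

lwts_sr = implied_score_rate(9.51, hr, br)   # LWTS estimate of 9.51 runs
rc_sr = implied_score_rate(27.03, hr, br)    # RC estimate of 27.03 runs

print(f"baserunners per game: {br:.2f}")
print(f"LWTS implied score rate: {lwts_sr:.3f}")
print(f"RC implied score rate:   {rc_sr:.3f}")
```

A negative rate means the estimate is lower than the HR total alone, and a rate above 1 means more runners scoring than reached base. Neither can happen on a real field.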
Here are all the score rates grouped by on-base average (in groups of .050).
Linear Weights and Runs Created both had negative numbers for the lower level of OBA. Runs Created does work fine for this set of data (because the % of hits that are HR is uniform across the data). Linear Weights also works okay for a particular set of data. As noted in the previous article, all the run estimators work fine when the OBA is between .300 and .400. This image shows this to be true. BaseRuns clearly overestimates at the .500 to .800 OBA levels.
Since we've got the data, let's construct more groupings. (Note: at no time should the grouping be based on runs scored. This is what we are trying to study, and any selection based on runs scored will be tainted.) Let's select based on OPS (in groups of .100).
Hmmm, BaseRuns seems to hold up pretty well with games having high OBA and high SLG. Since the "HR effect" is not masked as it was with the OBA groupings, we see very clearly how Runs Created and Linear Weights both break down at the high end. BaseRuns on the other hand tracks the actual score rates very well.
I'm certain that there's some combination of hits, HR, and walks where BaseRuns breaks down. But that would not be an error in the design, since the design of BR x score rate + HR = Runs is true by definition. Rather, it would be in the B and C components that BaseRuns uses to model the score rate. As mentioned, while there is probably some other derivation of score rate that will be more accurate than what BaseRuns proposes, it will probably not be as elegant.
As well, any run measure that does not follow the design based on the score rate is doomed to fail at one extreme or another. The score rate is capped at both ends at 0% and 100% (meaning that you can't have more runs scored than baserunners + HR). This must be enforced by any run model. And BaseRuns is the only model that adheres to this constraint.
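Here is a minimal sketch of why the B/(B+C) form respects the cap. Note the assumption: the B coefficients below are a commonly quoted simple version of BaseRuns, not the coefficients fitted to 1974-1990 in the previous article, and the batting lines are hypothetical. As long as B and C are both non-negative, the score rate stays between 0 and 1, so the estimate can never fall below the HR total or exceed baserunners + HR.

```python
def base_runs(singles, doubles, triples, hr, walks, outs):
    """Return (estimated runs, score rate) for one batting line."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    runners = hits + walks - hr  # BR: reached base without homering
    # ASSUMPTION: commonly quoted simple B term, not the article's fitted one
    b = 1.02 * (1.4 * total_bases - 0.6 * hits - 3 * hr + 0.1 * walks)
    c = outs
    score_rate = b / (b + c) if (b + c) > 0 else 0.0
    return runners * score_rate + hr, score_rate


# HYPOTHETICAL batting lines: (singles, doubles, triples, HR, walks, outs)
lines = {
    "no offense": (0, 0, 0, 0, 0, 27),
    "typical game": (6, 2, 0, 1, 3, 27),
    "10-HR game": (7, 2, 0, 10, 4, 27),
    "softball blowout": (30, 10, 2, 15, 10, 27),
}

results = {name: base_runs(*line) for name, line in lines.items()}

for name, (runs, score_rate) in results.items():
    print(f"{name:<17} runs={runs:6.2f}  score rate={score_rate:.3f}")
```

Each event's contribution to this particular B term is non-negative (roughly 0.8 per single, 2.0 per HR, 0.1 per walk, before the 1.02 multiplier), which is what keeps the rate inside the 0% to 100% range for any batting line.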
Ripple effects
Heisenberg Uncertainty Principle: the separation between the observer and the observed is always more-or-less arbitrary
Throw a pebble in a pond, and you will see some effects. The water will ripple on the surface in a particular pattern, and the pebble will hit the bed, causing whatever is on the floor to move to some degree. In all though, the equilibrium of the pond system has barely been altered. Drop a watermelon in an aquarium, and there will be many more changes. So many more changes that this aquarium system is no longer an aquarium system, but rather an aquarium system with a watermelon in it, and a new equilibrium will be reached based on the existence of the watermelon.
When we applied the "plus 1" method to the hundreds of thousands of PAs of 1974-1990, it was like dropping a pebble in a pond. The dynamics of how runs are scored remained virtually the same. We are confident that adding 1 HR to such a large system would not alter this system, and therefore we can measure the changes in the system and attribute that change solely to the HR.
If we have a system, say the 2002 Giants, with a player like Barry Bonds, and you have another system that has the same 8 players, but with some average player in Barry's place, you have two distinct systems. The players playing with Bonds have a certain run value for each of their accomplishments because of the potential that Barry has when he steps up to the plate. Getting a walk with Barry on deck is much more valuable than getting a walk with an average hitter on deck. Barry defines (part of) the system.
With or without you
The standard way to determine Barry's run impact to the Giants is to determine how many runs they'd score with him, and how many they'd score with some "replacement". The replacement player should not just be anything. Make him too bad, like a pitcher, and you alter the system again. The expectation of the other 8 players in the lineup is that an average team will use their average resources to make average moves so that they have an average player to play with. Therefore, the controlled environment to compare Barry to would be one that has his 8 teammates, plus an average player. The new environment has Barry playing with his 8 teammates. The difference in the result of these two different run environments will show Barry's run creation ability above an average player to the 2002 Giants. (Note: As mentioned last time, a +1 player in 300 PA is not necessarily "better" than a -1 player in 600 PA. The determination of who's better than who is outside the scope of this article.)
Custom Run Values
While we can produce custom run values by run environment that would apply equally to all players playing in that run environment, this would technically be the wrong thing to do. The reason is that each of the 9 players on a team plays in a different run environment. Barry doesn't get to drive himself in if he gets on base. Jeff Kent does get to score more if he gets on base and Barry is on deck. For the most part however, the effect of one player is very minimal in changing the run environment in which he plays (more like a pond than an aquarium). Therefore, using custom run values by run environment that applies to all players in that run environment is an acceptable alternative, for ease. But correctly, you have to measure the player against the environment that he finds himself surrounded by. The player cannot define his own run environment.
The problem here is that by not using the same custom run values for all players on the same team, you cannot guarantee that all the marginal run values of these players will add up on the team level. As mentioned in the first article, baseball is non-linear and interdependent. There is no expectation that everything should add up. Even accepting the nonlinearity and interdependence of run creation, it would add up almost exactly, if the player's effect was on a pond system. But with a player being one-ninth of the team, we don't have that luxury. You are left to either use the simpler custom values by team, or to apply adjustments after the fact such that everything adds up.
Since we have BaseRuns, we can use the plus 1 method to determine the run values for various run environments. Here are the custom linear weights values, by OBA.
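The plus 1 calculation itself is short enough to sketch. Two assumptions to flag: the B coefficients are again a commonly quoted simple BaseRuns version rather than the ones fitted in the previous article, and both per-game lines are hypothetical. Each line is scaled up to 1,000 games so that adding a single event is a pebble in a pond.

```python
EVENTS = ("singles", "doubles", "triples", "hr", "walks", "outs")


def base_runs(env):
    """BaseRuns estimate for a dict of event totals (assumed B coefficients)."""
    hits = env["singles"] + env["doubles"] + env["triples"] + env["hr"]
    total_bases = (env["singles"] + 2 * env["doubles"]
                   + 3 * env["triples"] + 4 * env["hr"])
    runners = hits + env["walks"] - env["hr"]
    # ASSUMPTION: commonly quoted simple B term, not the article's fitted one
    b = 1.02 * (1.4 * total_bases - 0.6 * hits - 3 * env["hr"] + 0.1 * env["walks"])
    return runners * b / (b + env["outs"]) + env["hr"]


def plus_one_values(env):
    """Marginal runs from adding exactly one of each event to the environment."""
    baseline = base_runs(env)
    values = {}
    for event in EVENTS:
        bumped = dict(env)
        bumped[event] += 1
        values[event] = base_runs(bumped) - baseline
    return values


def scale(per_game, games=1000):
    """Scale a per-game line up to a large 'pond' of games."""
    return {event: count * games for event, count in per_game.items()}


# HYPOTHETICAL per-game lines: an ordinary environment and a high-OBA one
normal_env = scale({"singles": 6.3, "doubles": 1.5, "triples": 0.2,
                    "hr": 0.8, "walks": 3.3, "outs": 25.5})
high_oba_env = scale({"singles": 12.0, "doubles": 4.0, "triples": 0.5,
                      "hr": 3.0, "walks": 8.0, "outs": 25.5})

normal_values = plus_one_values(normal_env)
high_values = plus_one_values(high_oba_env)

for event in EVENTS:
    print(f"{event:<8} normal={normal_values[event]:+.3f}  "
          f"high OBA={high_values[event]:+.3f}")
```

Running the same function across a range of environments is all that a "custom linear weights by OBA" chart requires; the values will of course depend on whichever B coefficients are plugged in.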
We see that BaseRuns does break down with respect to the run value of the triple when the OBA is very high. Overall, however, this image certainly looks familiar, doesn't it? In the first article in this series, I asked that we just use our common sense to determine the run values. Here is that image again:
So, what do we have? BaseRuns conforms almost exactly to reality, and almost exactly to our common sense of what each hitting component's run value is in various run environments.
I've previously published preliminary custom run values. With some new data at my disposal, I will be updating that chart to more accurately reflect each team's run environment.
Read this once, and forget I said it
If you did want everything to add up, you can perform some math wizardry. Start off with 9 average players. Then, change the leadoff hitter such that he moves 1/600th towards what the real leadoff hitter does (say add 1 hit to the 160 hits that the average hitter would hit). Measure the change. Then, do the same for the #2 through #9 hitter. Then, keep cycling through, each time making a minute change to the player in question, and measuring the change. Keep doing this until you end up with the actual 9 Giants. In this manner, you are trying to create a pond system. You want such minute changes that the run environment, after each change, barely changes. But after 5400 such moves, you end up with a brand new run environment. Again, you could make the changes 1/1,000,000th, and make 9 million moves to be more accurate. In this way, you are ensuring that everything will add up on the team level, and you are better controlling the run environment. Unless you are crazy, I suggest you not do this.
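For the crazy among you, the cycling procedure above might look like the sketch below. Everything specific in it is an assumption: the B coefficients are a commonly quoted simple BaseRuns version, and the "average", "star", and "weak" batting lines are hypothetical. Each player is nudged 1/600th of the way toward his real line in turn, and the change in team runs after each nudge is credited to the player who was nudged.

```python
EVENTS = ("singles", "doubles", "triples", "hr", "walks", "outs")


def base_runs(totals):
    """BaseRuns estimate for a dict of event totals (assumed B coefficients)."""
    hits = totals["singles"] + totals["doubles"] + totals["triples"] + totals["hr"]
    total_bases = (totals["singles"] + 2 * totals["doubles"]
                   + 3 * totals["triples"] + 4 * totals["hr"])
    runners = hits + totals["walks"] - totals["hr"]
    # ASSUMPTION: commonly quoted simple B term, not the article's fitted one
    b = 1.02 * (1.4 * total_bases - 0.6 * hits
                - 3 * totals["hr"] + 0.1 * totals["walks"])
    return runners * b / (b + totals["outs"]) + totals["hr"]


def team_runs(players):
    """Pool the nine batting lines and estimate the team's runs."""
    return base_runs({ev: sum(p[ev] for p in players) for ev in EVENTS})


def path_credit(average, actual, steps=600):
    """Credit each player with the run change caused by his own small moves."""
    current = [dict(average) for _ in actual]
    credit = [0.0] * len(actual)
    for _ in range(steps):
        for i, target in enumerate(actual):
            before = team_runs(current)
            for ev in EVENTS:
                current[i][ev] += (target[ev] - average[ev]) / steps
            credit[i] += team_runs(current) - before
    return credit


# HYPOTHETICAL season lines
average = {"singles": 110, "doubles": 28, "triples": 4, "hr": 18, "walks": 55, "outs": 400}
star = {"singles": 70, "doubles": 31, "triples": 2, "hr": 46, "walks": 198, "outs": 254}
weak = {"singles": 100, "doubles": 20, "triples": 2, "hr": 5, "walks": 30, "outs": 440}

actual_team = [star, weak] + [average] * 7
credit = path_credit(average, actual_team)
team_gain = team_runs(actual_team) - team_runs([average] * 9)

print("credit by lineup slot:", [round(c, 1) for c in credit])
print("sum of credits:", round(sum(credit), 3), " team gain:", round(team_gain, 3))
```

The credits add up to the team total because the changes telescope: every nudge starts from exactly where the previous one ended. Making the steps finer does not change the sum, only how the total is split among the players.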
Runs Created is dead. It should only be used for "back of the envelope" calculations.
BaseRuns is the now. It is the only run evaluator that I am aware of that follows the definition of how runs are created. The search for run evaluators now is to determine the best estimator for the score rate.
Linear Weights is alive and well. It is the foundation for a whole series of run estimators and run modelers. Custom Linear Weights is a necessary offshoot (which we can determine using BaseRuns). Linear Weights by the 24 base-out states is another critical component. Linear Weights by batting order. By park, by fielders, by pitchers. There's really no end to this.
The holy grail would be Win Expectancy that includes the hitting team, the fielders, the pitchers, the park, the inning, the score differential, the base-out states, and the batting order. I believe that Pete Palmer has some of this already, and last year I had created a preliminary WE matrix using some of these variables (it wasn't mathematically generated, and therefore, it should be taken with a grain of salt). I believe that I have all the tools and information required to produce the holy grail. Since the philanthropist and venture capital markets don't exist for sabermetrics, I will only be able to devote my spare time to this pursuit.
Questions, comments, requests
This concludes my series on this topic. If there are any questions, comments, or requests, feel free to make them. Please refrain from "who's better than who" or "top 10" type requests. Those have a habit of snowballing. Otherwise, thank you for reading.
September 16, 2002 - Rob Wood
Does the superiority of the new methods over runs created have anything to do with the argument that runs created estimates runs scored whereas the linear weights methods estimate runs scored above average (and therefore uses more information)? I am not "taking sides" here, I'm just curious if they use the same information.
September 16, 2002 - tangotiger
Actually, the linear weights that I use can determine either absolute runs scored, or runs above average.
As described in the previous article, the reconciliation between the two methods is simply subtracting .16 runs per out (for 1974-1990).
September 16, 2002 - Marshall Mathers
Let poor Heisenberg alone, lest you want me to cut you open with Ockham's Razor, biatch.
September 16, 2002 - Arvid Engen
1. In the home run example, are you accounting for the fact that games in which more home runs are hit are likely to be associated with favorable batting conditions in general, e.g. the wind blowing out on a humid July day at Coors Field with Jose Lima pitching?
That is, the number of home runs hit isn't independent of other run-generating events. It's not that more runners are scoring per home run when a large number of home runs are hit, but rather, that there are more (run-generating) singles, doubles and triples in these games as well.
2. Your criticism of runs created is a bit like telling the policeman who has just pulled you over that his radar gun is unreliable because it has failed to account for special relativity.
Why the hell should I care about extreme outcomes? If I'm a major league GM, how does BaseRuns help me to build a better team? Does it have better predictive value than runs created? Does it do a better job of explaining run creation in "realistic extreme" environments such as Barry Bonds 01/02 or the Deadball Era or Coors Field?
September 16, 2002 - tangotiger
That is, the number of home runs hit isn't independent of other run-generating events. It's not that more runners are scoring per home run when a large number of home runs are hit, but rather, that there are more (run-generating) singles, doubles and triples in these games as well.
*** First of all, this is not true. Generally speaking, those events ARE independent of the HR. I ran further studies that controlled for those events (for example, looked only at games with 2 to 4 walks, 6 to 8 singles, etc, and separated them by the HR class) so that there was virtually no difference in those events. The results were the same.
Why the hell should I care about extreme outcomes?
*** In extreme examples, you can't hide the shortcomings of your models or estimates. They stick out like a sore thumb. And note, in my extreme examples, the dataset went only so far as Barry Bonds' 01/02. So, it was not "unrealistic" extreme, but realistic extremes.
If I'm a major league GM, how does BaseRuns help me to build a better team? Does it have better predictive value than runs created?
*** As I said, almost all run evaluators are similar at the .300 to .400 OBA range. This will help you determine the true value of those extreme players that GMs are trying to figure if they are overpaying them.
Better predictive value? Neither this system nor Runs Created talks about or explains predictive value. Voros, MGL and a few others do a good job there.
Does it do a better job of explaining run creation in "realistic extreme" environments such as Barry Bonds 01/02 or the Deadball Era or Coors Field?
Yes. This series of articles explains a team of Barry Bonds, a team of Pedros (i.e., Pedro himself), and virtually any run environment, regardless of whether that run environment is due to the hitter, the pitcher, the fielders, or the park. If Runs Created says that the run value of a HR is LESS than 1 for dead ball (an impossibility) pitchers, or worth more than 2, close to 3 runs (!), for Barry Bonds run environment, what does that tell you?
September 16, 2002 - Arvid Engen
*** First of all, this is not true. Generally speaking, those events ARE independent of the HR. I ran further studies that controlled for those events (for example, looked only at games with 2 to 4 walks, 6 to 8 singles, etc, and separated them by the HR class) so that there was virtually no difference in those events. The results were the same.
I'm not convinced. Do you mean to suggest that there's no meaningful correlation between (say) the number of doubles hit in a game and the number of home runs? That's a highly counterintuitive conclusion.
I'm also not certain that the experiment you've described would do an adequate job of controlling for those effects. "2 to 4 walks" is a fairly wide range, the difference between Roy Oswalt on the one hand, and Matt Clement on the other. Moreover, you're not controlling for the interactions between these run elements. One more double or two more singles or two more walks might not make that much difference, but one more double and two more singles and two more walks certainly would.
*** In extreme examples, you can't hide the shortcomings of your models or estimates. They stick out like a sore thumb. And note, in my extreme examples, the dataset went only so far as Barry Bonds' 01/02. So, it was not "unrealistic" extreme, but realistic extremes.
....
*** As I said, almost all run evaluators are similar at the .300 to .400 OBA range. This will help you determine the true value of those extreme players that GMs are trying to figure if they are overpaying them.
The competing models aren't trying to deal with extreme cases. To vigorously pat yourself on the back because BaseRuns specifically does better what it's specifically designed to do better is a bit facetious. If BaseRuns is 50% more accurate at dealing with extreme cases, and 1% less accurate at dealing with realistic cases (I don't know that it is), that seems to me like one step forward and two back.
Yes. This series of articles explains a team of Barry Bonds, a team of Pedros (i.e., Pedro himself), and virtually any run environment, regardless of whether that run environment is due to the hitter, the pitcher, the fielders, or the park.
But how about one Barry Bonds and eight mortals? What I'd like to see is a comparison of the systems within an actual major league context, not a simulation that you've designed to produce an outcome that is preordained to be favorable to your cause.
What is the question you're trying to answer? What are the implications of your research? I apologize in advance for my confrontational tenor, but your advocacy of BaseRuns comes across as almost cultish, based on a series of assumptions it conflates with Truth, without regard for the world around it. It is like a sabermetric version of Ayn Rand.
September 17, 2002 - Paul B
It's really all about what you're interested in studying, isn't it? Base Runs seems to work better in a single-game context than the other major methods of run evaluation and the methodology is adaptable to unusual run environments (e.g. Tango's example in part one of a softball team that scores 20 runs per game).
If you're interested in working with seasons or groups of seasons involving actual major league teams, I'm still inclined to go with a model that more accurately estimates those situations--even if those models lack the theoretical grounding of Base Runs. If the method I'm using correlates strongly with the environment I'm trying to study (e.g. if it closely estimates the 1987 St. Louis Cardinals actual runs scored) to what degree does it matter which technique I use? To some extent, I feel like I'm being told I'll have a better pasta sauce if I take the extra time to stew some fresh tomatoes; that may be true, but for almost every real situation, your sauce will taste just fine if you use a can. Or am I just less bothered by back-of-the-envelope calculations than other people are?
To echo Arvid's question: if I'm studying actual MLB teams from the past thirty years, is Base Runs designed to provide new information? If I'm Billy Beane, does Base Runs tell me anything that's useful in putting my team together that contradicts LW or RC or XR or EQR or any of the myriad of other run estimators we already have? Does it say that Eric Chavez contributes less than we thought, or that John Mabry contributes more? Or is Base Runs more useful in letting me know why we only scored 8 runs on that day when we got six doubles and two home runs or that we need to run more when we face Pedro?
September 17, 2002 - tangotiger
There is some correlation. Here's the data. But since I've shown the breakdown by OBA (where there is HIGH correlation by definition between the # of singles, doubles and OBA), and I broke it down by HR (where the correlation that does exist has very little impact overall), I don't know what it is that you are after.
  S     D     T    HR   HBP    BB
6.3   1.4   0.2     -   0.2   3.1
6.3   1.5   0.2   1.0   0.2   3.3
6.4   1.6   0.2   2.0   0.2   3.4
6.5   1.6   0.2   3.0   0.2   3.6
6.4   1.6   0.2   4.0   0.2   3.7
6.7   1.7   0.2   5.0   0.3   3.8
7.3   1.7   0.3   6.0   0.2   4.2
7.6   2.1   0.3   7.0   0.4   4.4
As you can see, the most important variable here is the HR. It has by far the most impact on how many runs should be scored, *in this grouping of data*.
And the impact that it does have is nowhere near as high as Runs Created would say it is. Its impact, as shown in the article, is virtually exactly what BaseRuns says it should be.
If BaseRuns is 50% more accurate at dealing with extreme cases, and 1% less accurate at dealing with realistic cases (I don't know that it is), that seems to me like one step forward and two back.
Again, the point of the articles is to paint a picture as to how runs are created. BaseRuns is the first step in trying to figure out what the score rate should be. I'm sure there'll be better ones to come around. But the basic model is correct by definition. What Runs Created, static Linear Weights, et al do is to ignore the model, and instead fit their formula to the sample data they have on hand.
What is the question you're trying to answer?
How are runs really created.
What are the implications of your research?
The implication is that by ignoring the actual basis of how runs are scored (that the HR has an absolute minimum value of 1, and caps off at somewhere below 2, and that all events should converge to 1 as the OBA converges to 1), you are fixing your formula to reduce the RMSE. While you might (and should!) get better results by fixing your formula against known sample data, you are deceiving the reader as to how runs are really created.
The implication is that the value of the HR does not have an ever-increasing value. There is a law of diminishing returns for the HR specifically.
I apologize in advance for my confrontational tenor,
I would appreciate if the confrontation aspect is reduced slightly. Thanks. I'd prefer the debate center on the merits of the data and interpretation of the data.
but your advocacy of BaseRuns comes across as almost cultish, based on a series of assumptions it conflates with Truth
Again, except for BaseRuns' definition of the score rate, everything I've said is truth. What assumptions are you referring to?
We start off with a point of fact that runs = BR x scoreRate + HR. Now, we're trying to figure out what the score rate is. David's B/(B+C) seems too simple, but in actual fact, this ends up conforming to reality. There's a problem at the very high end, and that's where we should try to look for better answers. But the runs = BR x scoreRate + HR must hold.
, without regard for the world around it.
BaseRuns is the only model that accounts for the world around it.
But how about one Barry Bonds and eight mortals? What I'd like to see is a comparison of the systems within an actual major league context, not a simulation that you've designed to produce an outcome that is preordained to be favorable to your cause.
Preordained? Would you believe me if I told you that I wrote the first 2 articles BEFORE I ran the data in the third article? I was happy to use my sim, but then I decided to run against real data. I was the biggest skeptic of BaseRuns when David first introduced it to me. There's no bigger skeptic of "new math" than me, be it DIPS, BaseRuns, or Win Shares.
As for 1 Barry and 8 mortals, that would require the use of a sim, because of the problem I mentioned regarding the pond and the aquarium. There are other factors, specifically, the batting spot you put Barry in. He has a different effect if batted 1st than 5th. I intend to look at the batting order effect at some point, but I'd be happy to share anything specific you would like to know.
September 17, 2002 - tangotiger
is Base Runs designed to provide new information
At the risk of repeating myself, it is designed to present how runs are really created. It's up to the reader to decide how valuable it is to know this.
As I mentioned on a few occasions, if all you look at are players and teams with an OBA around .300 to .400, it really doesn't matter what you use.
But, if you are interested in extreme examples, like Pedro, or a high run scoring environment, BaseRuns' value comes through in that it doesn't give you the shortcuts the other run estimators rely on to be accurate.
I agree, if you are not bothered by back of the envelope calculations, and you don't care about extreme situations, then stick with basic RC or static LWTS. They'll serve your purpose. I've said as much in past articles.
I am presenting a framework to understand how runs are created, and that at some point the marginal value of the HR decreases, while other run evaluators never consider this.
And if you are looking at college or high school ball, then BaseRuns becomes much more valuable.
September 17, 2002 - tangotiger
As for how accurate BaseRuns is in "real-life" situations, here's the data behind the "by OBA" chart. The "R" is actual Runs scored.
  oba       R     BsR     LWTS      RC
0.030    0.12    0.10   (1.93)    0.03   ... BsR better
0.077    0.41    0.31   (1.18)    0.20   ... BsR better
0.124    0.61    0.63   (0.37)    0.51   ... BsR better
0.176    1.17    1.22    0.63     1.09   ... BsR better
0.224    1.88    1.97    1.67     1.86   ... RC better
0.275    2.83    2.99    2.89     2.93   ... LWTS better
0.324    4.04    4.21    4.19     4.21   ... LWTS better
0.371    5.31    5.58    5.51     5.65   ... LWTS better
0.421    6.87    7.28    6.95     7.42   ... LWTS better
0.468    8.64    9.18    8.41     9.33   ... LWTS better
0.515   10.34   11.33    9.89    11.49   ... LWTS better
0.566   12.10   13.91   11.78    13.84   ... LWTS better
As you can see, when the OBA is between .275 and .375, all three measures are very very similar. But for the Pedros and Thomes and Bonds of the world, things are different.
Notice also that LWTS is better at the "high-end". This is because LWTS takes advantage of the HR value to be fixed at 1.40, and it doesn't fall into the RC trap.
If I present a similar table broken by HR class, BsR will take over in *all* respects. This is why I say that David's BsR is the first step. Clearly, it still falls into a similar trap as RC in that it overvalues each event (but not as much as RC does). The search is to find out how to better represent the interaction.
September 17, 2002 - tangotiger
Ugh, just trying to turn off those italics. Sorry about all that.
September 17, 2002 - Arvid Engen
1. I will try and tone things down a bit, but think of this as the sort of tough love you'd encounter in defending a dissertation. It is clear that all of this makes sense to you, but it is not so transparent to a well-informed audience.
I also think that you're inviting somewhat more ... aggressive feedback when you say things like "Runs Created is dead, BaseRuns is the now", or invoke (incorrectly) something like the Heisenberg Uncertainty Principle. I mean, you're talking the talk.
2. Thank you for presenting the table of correlations.
3. The fundamental point is that there's no "Holy Grail" here. Runs are created by the particular combination of batting and baserunning events in a particular inning of a particular game. Any attempt to generalize these unique sequences of events into something universally applicable has got to make approximations and assumptions.
Linear weights cuts a different corner than BaseRuns does, by focusing on data at the season level rather than at the game level. You assert that focusing on data at the game level is True, without presenting evidence either as to the utility of this approach (what would Billy Beane do with it?), or to its aesthetic purity. Why focus on the game level, rather than at the inning level? Why try and take all of the context out of run creation at all?
4. My point in criticising your experimental design is that the coefficients you use in BaseRuns were derived based on data gathered at the game level, and that you then use game-level data in order to test its superiority. If you tested the various systems based on season-level data, linear weights would triumph, because that's how it is derived.
5. In the OBA chart you present above, BaseRuns is considerably less accurate over the entire normal range of OBA's. Missing by an additional .25 runs per game would amount to about 40 runs or 4-5 games over the course of a season. That is not "very very similar"; you have made a substantial trade-off here!
6. I was disappointed that you did not reply to the Ayn Rand ad hominem! I suspect that you have a tattered and dog-eared copy of The Fountainhead sitting on your bedside table.
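The arithmetic behind point 5 can be checked with a quick sketch. The 162-game season is standard; the figure of roughly 10 runs per win is a common rule of thumb and is an assumption here, not something stated in this thread.

```python
GAMES_PER_SEASON = 162
RUNS_PER_WIN = 10.0  # assumption: rule-of-thumb conversion, not from the thread


def season_cost(extra_error_per_game, games=GAMES_PER_SEASON,
                runs_per_win=RUNS_PER_WIN):
    """Convert an extra per-game miss into season runs and approximate wins."""
    runs = extra_error_per_game * games
    return runs, runs / runs_per_win


runs, wins = season_cost(0.25)
print(f"{runs:.1f} runs, roughly {wins:.1f} wins")
```

With a slightly lower runs-per-win figure the result lands nearer the top of the "4-5 games" range quoted above.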
September 17, 2002 - tangotiger
Are those italics ever going to die?
...this as the sort of tough love you'd encounter in defending a dissertation. It is clear that all of this makes sense to you, but it is not so transparent to a well-informed audience.
*** Yes, this is very clear to me. Since I don't have the natural honed gift of Bill James in writing, I'll do my best to convey my message better.
I also think that you're inviting somewhat more ... aggressive feedback when you say things like "Runs Created is dead, BaseRuns is the now",
*** That's ok. I say these things with basis. I've gone to great lengths to show different scenarios, etc.
...or invoke (incorrectly) something like the Heisenberg Uncertainty Principle. I mean, you're talking the talk.
*** I didn't mean to suggest that the Heisenberg Uncertainty Principle was at work here, nor that my example was one of Heisenberg. The specific quote I took was one where it's hard to distinguish between what is being observed, without interacting with the system you are observing. Barry Bonds does not interact with himself, only with his teammates. But by throwing Bonds into the mix, you are changing the relative values of the teammates you are trying to study.
2. Thank you for presenting the table of correlations.
*** Sure. I'd be glad to show more detailed data. Just the forum here is not very appealing for it. Email me if you want more.
3. The fundamental point is that there's no "Holy Grail" here. Runs are created by the particular combination of batting and baserunning events in a particular inning of a particular game.
*** The search is for the holy grail, and BaseRuns is *not* it.
Any attempt to generalize these unique sequences of events into something universally applicable has got to make approximations and assumptions.
*** You don't want to make them universally applicable. The holy grail reference was in reference to things to come. You have to understand all the contexts, the base-out situation, the pitcher/batter matchup, the runners, the fielders, the park. I don't expect that we will end up with 1 formula for all that. I do expect that we will get a series of principles that will follow all that. The work that I have done all leads to this.
Linear weights cuts a different corner than BaseRuns does, by focusing on data at the season level rather than at the game level. You assert that focusing on data at the game level is True, without presenting evidence either as to the utility of this approach
*** The game is the unit since the interaction of the events occur at the game level. You make a more accurate point that the interaction occurs at the inning level, and this is true. If I had the data, I would have presented at the inning level.
(what would Billy Beane do with it?),
*** I'm sure he's a smart guy. But not all these things have to be applied by GMs. My audience is myself, and people who think like me. Maybe there's not many people out there like that, that's fine.
or to its aesthetic purity. Why focus on the game level, rather than at the inning level? Why try and take all of the context out of run creation at all?
*** BaseRuns tries to (wrongly) take the context out. LWTS, as I do them, forces all the context right back in.
4. My point in criticising your experimental design is that the coefficients you use in BaseRuns were derived based on data gathered at the game level, and that you then use game-level data in order to test its superiority.
*** As discussed, the interactions occur at the inning level. Game-level is the best I had.
If you tested the various systems based on season-level data, linear weights would triumph, because that's how it is derived.
*** Not necessarily so. Dynamic Custom Linear Weights would always triumph over static linear weights. Assuming you meant static linear weights, this is probably true, but not necessarily. But I agree generally with this statement. The point however is that the single from last week does nothing to determine the impact of run scoring tomorrow. It might have some predictive value, but it has no impact on it. This is why inning-view (or game-view if you are also looking at wins and lineup construction) is the correct view.
5. In the OBA chart you present above, BaseRuns is considerably less accurate over the entire normal range of OBA's. Missing by an additional .25 runs per game would amount to about 40 runs or 4-5 games over the course of a season. That is not "very very similar"; you have made a substantial trade-off here!
*** If I present the chart by HR, or by OPS, BaseRuns would triumph over the normal range. It depends what data it is that you are using, the context that you are presenting. Again, the strength of BaseRuns is how it handles the HR. Its weakness (relatively speaking to itself, but not to the other evaluators) is the rest of the components. This is why I don't support BsR as the end-all and be-all, but as the first step. As I've mentioned a few times, the search is on for a better score rate. If you have a run model that doesn't adhere to something as fundamental as R = BR x scoreRate + HR, what are you supposed to do with this?
*** Here is the data by the OPS class (grouped by .100)
opsClass       R     BsR     LWTS       RC
0.055          -    0.03   (1.76)     0.02   ... RC is better
0.160       0.22    0.18   (0.75)     0.17   ... BsR is better
0.258       0.47    0.50    0.11      0.48   ... RC is better
0.358       0.93    1.02    1.01      0.97   ... RC is better
0.454       1.62    1.72    1.94      1.65   ... RC is better
0.552       2.49    2.60    2.90      2.52   ... RC is better
0.651       3.47    3.63    3.88      3.59   ... RC is better
0.748       4.62    4.79    4.82      4.83   ... BsR is better
0.846       5.85    6.08    5.72      6.27   ... LWTS is better
0.945       7.17    7.48    6.59      7.92   ... BsR is better
1.043       8.60    9.00    7.42      9.78   ... BsR is better
1.141      10.11   10.53    8.18     11.85   ... BsR is better
1.239      11.57   12.25    9.06     14.14   ... BsR is better
1.338      13.16   13.89    9.76     16.66   ... BsR is better
1.443      15.78   16.15   10.78     19.62   ... BsR is better
(You'll note that when RC is better, it is barely better. As the OPS rises, BsR is far better.)
And here it is broken down by HR class:
HR        R     BsR    LWTS      RC
-      3.08    3.06    3.79    3.03   ... BsR is better
1      4.62    4.62    4.44    4.66   ... BsR is better
2      6.12    6.12    5.00    6.41   ... BsR is better
3      7.65    7.65    5.62    8.37   ... BsR is better
4      9.03    9.00    6.07   10.29   ... BsR is better
5     10.55   10.49    6.73   12.45   ... BsR is better
6     12.33   12.32    7.52   15.35   ... BsR is better
7     16.22   14.32    8.34   18.27   ... BsR is better
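For anyone who wants to check the "is better" column, here is a small sketch that recomputes it from the HR-class table above. The averages are unweighted across classes, which is an assumption: the number of games in each class is not given here, so a weighted figure could differ.

```python
# Rows copied from the HR-class table: (HR, actual R, BsR, LWTS, RC).
# The "-" in the first row is treated as 0 home runs.
HR_TABLE = [
    (0, 3.08, 3.06, 3.79, 3.03),
    (1, 4.62, 4.62, 4.44, 4.66),
    (2, 6.12, 6.12, 5.00, 6.41),
    (3, 7.65, 7.65, 5.62, 8.37),
    (4, 9.03, 9.00, 6.07, 10.29),
    (5, 10.55, 10.49, 6.73, 12.45),
    (6, 12.33, 12.32, 7.52, 15.35),
    (7, 16.22, 14.32, 8.34, 18.27),
]
ESTIMATORS = ("BsR", "LWTS", "RC")


def closest_estimator(row):
    """Name of the estimator with the smallest absolute miss for one row."""
    actual = row[1]
    errors = {name: abs(est - actual) for name, est in zip(ESTIMATORS, row[2:])}
    return min(errors, key=errors.get)


def mean_abs_error(name):
    """Unweighted mean absolute error of one estimator across all classes."""
    col = 2 + ESTIMATORS.index(name)
    return sum(abs(row[col] - row[1]) for row in HR_TABLE) / len(HR_TABLE)


for row in HR_TABLE:
    print(row[0], closest_estimator(row))
for name in ESTIMATORS:
    print(name, round(mean_abs_error(name), 3))
```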
September 17, 2002 - Rob Wood
Tango, can you post the complete formula you are using? Or maybe it's a series of formulas? Thanks much.
September 17, 2002 - Paul B
But not all these things have to be applied by GMs. My audience is myself, and people who think like me. Maybe there's not many people out there like that, that's fine.
At the risk of being overly reductive, if a model like this doesn't have a use for baseball management then I'm not sure what the point of the research is.
I don't mean to sell BaseRuns short, because I suspect it does have a use for management. As a method that accurately models how runs are scored in single games (or innings, if you take the method in that direction), BR could help with strategic questions concerning lineup selection or how to efficiently run your offense against a top notch pitcher.
It's fine if BaseRuns doesn't have a practical application beyond the single game environment, or if its primary utility is in studying non-standard MLB environments (e.g., high school ball), or if it's mainly designed to answer theoretical rather than practical questions. But most of these articles have focused on theoretical underpinnings, so I'm curious if you have a sense of how using BaseRuns should change the approaches we've all been using. And I apologize if this is a point you feel you've hammered home in your previous articles, because if it is I'm not sure I've understood it.
September 17, 2002 - tangotiger
At the risk of being overly reductive, if a model like this doesn't have a use for baseball management then I'm not sure what the point of the research is.
*** The point of the research is to enlighten people as to how runs are really created. It doesn't have to have an application beyond that. However, if you want to properly value a player, you should value him on how he really creates runs. And BaseRuns helps in that regard for the extreme players.
BR could help with strategic questions concerning lineup selection or how to efficiently run your offense against a top notch pitcher.
*** That's possible, but I would not rely on BsR for that. Personally, I would use BsR to generate custom linear weights values, and THEN I'd use linear weights to assist in answering those questions. This is what I do, and I am very very confident in the results I get from that.
...so I'm curious if you have a sense of how using BaseRuns should change the approaches we've all been using. And I apologize if this is a point you feel you've hammered home in your previous articles, because if it is I'm not sure I've understood it.
*** I don't think I've really addressed this issue. The approach is to get away from the "typical" run estimators, because they don't model reality. To quote someone's "Equivalent Runs" or "XRuns" or "Runs Created" almost makes it seem as if those estimators are accurate. They may yield accurate results in some or most cases, but the calculations used to derive those results are not correct. Suppose that we know that 3 = 6 x .33 + 1. But, I come out and say, well, you know 3 is also equal to (6+1) x .429. I may end up getting the same answer using the same data, but the way I combined the data is wrong. But since most players and most teams do not deviate much from the norm, then, really who cares? It all works out.
However, I care about the extremes, about Pedro, Bonds, Thome, et al. And just because something works "on average" doesn't mean it works in the extreme.
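A minimal sketch of the 3 = 6 x .33 + 1 example, to show where the two ways of combining the data part company. The function names are made up for illustration, and the "extreme" case (seven home runs, nobody else on base) is a hypothetical chosen because its true run total is known without any estimate of the score rate.

```python
def additive_runs(baserunners, score_rate, hr):
    """R = BR x scoreRate + HR: each HR counts as exactly one run for the batter."""
    return baserunners * score_rate + hr


def lumped_runs(baserunners, hr, factor=0.429):
    """The 'wrong' combination: HR lumped in with baserunners, one fitted factor."""
    return (baserunners + hr) * factor


# Typical case from the post: both forms land on about 3 runs.
print(additive_runs(6, 1 / 3, 1), lumped_runs(6, 1))

# Hypothetical extreme: seven solo home runs and no other baserunners.
# The true total is 7 regardless of the score rate; the lumped form still
# applies the factor that was fitted to the typical case.
print(additive_runs(0, 1 / 3, 7), lumped_runs(0, 7))
```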
So, to get back to your question, BaseRuns should change your approach as to how you view how runs are created, and should force you to question when you see a run evaluator that "works".
For low-level, or game-level actions, a custom set of LWTS or RE or WE charts is what you want (and I've provided some links above throughout the article).
==============
Rob, here is the full "B" component I use. Just a caution: you DON'T need to have all this data. But I have this data, and this is what I am using. You will recognize most from the Retrosheet event files. If you want me to clarify some of the items, let me know. Note: because of "partial innings", you have to be very very careful (which is why I have that last entry). The short answer is that the RE chart at the bottom of the 9th inning of a tied game is DIFFERENT from the RE chart at any other point in the game. Again, if you need the long answer, let me know.
To all: Again, adding each of these components beyond the basics adds very very little to the accuracy of the run construction. But, for completeness, I am providing it.
 0.73   Single
 1.95   Double
 3.13   Triple
 1.69   HR
 0.05   Walk
(0.48)  IBB
 0.16   HBP
 0.80   Error
 0.28   Interference
 1.43   OtherSafe
 0.73   Sac
(0.06)  Strikeout
(0.00)  Out
 0.81   SB
(1.19)  CS
(0.51)  Pickoff
(0.35)  PickoffError
 1.05   Balk
 1.17   PB
 1.17   WP
 0.56   DefensiveIndiff
(1.06)  OtherAdvance
 0.00   FoulError
(1.49)  implied outs
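As a rough sketch of how these coefficients might be applied: the code below sums coefficient times count to get "B", then plugs it into R = BR x B/(B+C) + HR. The coefficients are copied from the list above (parenthesized values read as negative). Everything else is an assumption for illustration: treating B as a plain weighted sum, taking C as outs, the `ImpliedOuts` key name, and the sample game line, which is invented.

```python
# Coefficients copied from the post; parenthesized values are negative.
B_COEFFICIENTS = {
    "Single": 0.73, "Double": 1.95, "Triple": 3.13, "HR": 1.69,
    "Walk": 0.05, "IBB": -0.48, "HBP": 0.16, "Error": 0.80,
    "Interference": 0.28, "OtherSafe": 1.43, "Sac": 0.73,
    "Strikeout": -0.06, "Out": 0.00, "SB": 0.81, "CS": -1.19,
    "Pickoff": -0.51, "PickoffError": -0.35, "Balk": 1.05,
    "PB": 1.17, "WP": 1.17, "DefensiveIndiff": 0.56,
    "OtherAdvance": -1.06, "FoulError": 0.00,
    "ImpliedOuts": -1.49,  # hypothetical key name for the "implied outs" entry
}


def b_component(counts):
    """Weighted sum of event counts (assumed form of the 'B' term)."""
    return sum(B_COEFFICIENTS[event] * n for event, n in counts.items())


def base_runs(baserunners, b, outs, hr):
    """R = BR x B/(B+C) + HR, with C assumed to be outs."""
    denom = b + outs
    score_rate = b / denom if denom > 0 else 0.0
    return baserunners * score_rate + hr


# Invented single-game line, purely for illustration.
game = {"Single": 6, "Double": 2, "HR": 1, "Walk": 3, "Strikeout": 7, "Out": 20}
b = b_component(game)
non_hr_baserunners = game["Single"] + game["Double"] + game["Walk"]
print(round(b, 2), round(base_runs(non_hr_baserunners, b, outs=27, hr=game["HR"]), 2))
```

Events not present in `counts` simply contribute nothing, which matches the caution above that you don't need all of this data to use the basic form.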
September 17, 2002 - Italics
test
test
Did it work?
September 17, 2002 - tangotiger
Italics: yes, this was all my fault. I did not have a proper closing italic tag, and it left everything subsequent in italics. I threw in a whole bunch of closing italic tags in my previous post just to make sure that there was no nesting going on, to close it off, and that seemed to work. Sorry about all that...
September 17, 2002 - Patriot
You would think that Tango had tried to take down Santa Claus and the Easter Bunny! Good grief!
"The competitors are not trying to work for extreme cases" So? That somehow makes it wrong to attempt to model the extremes? If we can have an estimator that works well for the extremes, we can study so much more than we can with Runs Created. Should Barry Bonds be walked every time up? I dunno. Let's see. Barry has 200 basic RC or so last year, in 600 PA. How many basic RC would he have had if he was walked all 600 times: 600*0/600=0! 0 RC! Yep, walk him every time up! For that matter, walk everybody every time up, and they will never score!
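The walk-everybody example can be run through the basic Runs Created formula, (H + BB) x TB / (AB + BB). The second function is only a sanity check on what actually happens with nothing but walks (each walk after the bases are loaded forces in a run); its name is made up for this sketch.

```python
def basic_rc(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


def runs_from_consecutive_walks(n_walks):
    """Runs forced in by n straight walks with no outs: every walk after the third."""
    return max(0, n_walks - 3)


# 600 plate appearances, all walks: no hits, no total bases, no at-bats.
print(basic_rc(hits=0, walks=600, total_bases=0, at_bats=0))
print(runs_from_consecutive_walks(600))
```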
It is true that the method that we use should be accurate for normal cases. But somebody mentioned that if BsR is 1% less accurate in normal cases than RC or LW, it is "1 step forward and two steps back". BS. 1% error for a true 100 RC player means we would be erring and saying he actually created 99 or 101 runs. 1 run! Who cares? I hope no one here thinks 1 run over the course of a whole season is very significant.
Of course, BsR IS accurate in the normal range of offense. Comparable to LW, definitely more accurate than RC.
Some dogmatic person criticized Tango for saying that "RC is dead". RC is dead! It is grossly flawed! This should be obvious to everyone. We KNOW that it overestimates Barry Bonds, and yet we defend it simply because we have used it for 20 years? Come on.
September 17, 2002 - Arvid Engen (aka _some dogmatic person_)
"The competitors are not trying to work for extreme cases" So? That somehow makes it wrong to attempt to model the extremes?
One of the thrusts of Tango's criticism is that RC and XR don't do well in extreme cases; my point is that they don't make any pretense of doing so. I mean, so far as I know, Miguel Tejada can't dunk a basketball to save his life. Absolute goof on the basketball court. That doesn't have any bearing on his effectiveness as a baseball player.
That's why I've pressed Tango to provide examples of how and where the extreme cases are not trivial.
More to the point, I think Tango would have saved himself some trouble if he presented BaseRuns from the outset as a complementary approach to the existing metrics, and not as a replacement for them. This is closer to the tone reflected in his follow-up comments, which come across as a lot more reasonable.
But somebody mentioned that if BsR is 1% less accurate in normal cases than RC or LW, it is "1 step forward and two steps back". BS. 1% error for a true 100 RC player means we would be erring and saying he actually created 99 or 101 runs.
Look, "how are runs generated?" is not one of those sabermetric questions that remains mysterious to us. There isn't much room left for improvement; you have to sweat the small stuff. A small error on a very large number of cases doesn't seem any better to me than a large error on a very small number of cases. And if you look at the OBP data cited above, the error isn't even particularly small in some instances.
By definition, extreme cases are going to be very rare.
September 17, 2002 - Arvid Engen
Tango,
Thank you for your follow-up comments; I don't mean for it to come across as though I think what you've done is without merit. However a couple of further questions/comments:
1. It seems almost oxymoronic that BaseRuns doesn't do particularly well relative to different levels of OBP, but does do very well relative to different levels of OPS. This suggests to me that there is some sort of interaction between the "getting on base" element and the "moving runners along" element that has been lost in the attempt to segregate those two things from one another. For one thing, the probabilities of particular batting outcomes aren't independent of the bases occupied during a given plate appearance.
2. I suppose I'm still somewhat put off by the implications that BaseRuns is a "true" or "real" or otherwise aesthetically pure measure of run creation. Even if you look at data on an inning-by-inning basis, it is still an approximation:
BB-1B-1B-HR-K-K-K produces 4 runs, whereas HR-K-K-1B-1B-BB-K normally produces 1.
That, in aggregating data to the season level, unusual and random sequences tend to get lost in the noise, is as much an advantage as a disadvantage.
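The two sequences above can be replayed with a toy inning tracker. The advancement rules are assumptions made for this sketch: a single moves every runner up exactly one base, a walk only forces runners, and the inning stops at three outs. Real baserunning is messier, which is why the second sequence is said to "normally" produce 1.

```python
def runs_in_sequence(events):
    """Replay one half-inning under station-to-station rules (an assumption)."""
    bases = [False, False, False]  # first, second, third
    outs = 0
    runs = 0
    for ev in events:
        if outs == 3:
            break  # inning is over; later events are ignored
        if ev == "K":
            outs += 1
        elif ev == "HR":
            runs += 1 + sum(bases)
            bases = [False, False, False]
        elif ev == "1B":
            runs += int(bases[2])  # only the runner on third scores
            bases = [True, bases[0], bases[1]]
        elif ev == "BB":
            if all(bases):
                runs += 1  # bases loaded: a run is forced in
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            raise ValueError(f"unknown event: {ev}")
    return runs


print(runs_in_sequence(["BB", "1B", "1B", "HR", "K", "K", "K"]))
print(runs_in_sequence(["HR", "K", "K", "1B", "1B", "BB", "K"]))
```

Both calls use the same seven events, so any estimator that sees only the totals has to give them the same answer.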
September 17, 2002 - Paul B
Patriot, you’re doing a marvelous job of attacking arguments that aren’t being made. Nobody here has suggested that outliers don’t exist, or that it’s a bad idea to measure them. The question, to my mind, is whether BaseRuns’ improvement in measuring outliers is sufficient to justify replacing existing estimators when measuring typical examples within the controlled set of major league baseball teams, or whether (as Arvid argues) it should be used as a complementary technique to those we already have.
Tango frequently mentions that when he is introduced to a new concept, he does his best to beat up on the concept to make sure it holds up. Do you object to the posters on Primer subjecting Tango’s methods to the same scrutiny? I realize that the folks over at FanHome have been examining BaseRuns for a while now, but simply saying “trust me, we’ve done the due diligence” isn’t sufficient.
And making a brash statement such as “Runs Created is dead” invites controversy. I haven’t seen anybody here defend runs created--in fact, my impression is that most of the people at Primer don’t even use runs created for their estimator--but you can hardly claim to be surprised to meet some challenges when you throw down a gauntlet like that.
September 17, 2002 - Patriot
Well, I am one of the FanHome people and we have been championing BsR for quite some time. Of course, this is NOT the first introduction of BsR to the general sabermetric public. Mr. Smyth posted the method on r.s.bb and on James Fraser's site. I wrote a little bit on it for SABR's BTN. And Tango has been doing outstanding work on verifying the formula's accuracy, and trying to improve it. Until now, there have been no real reactions to it outside of the FanHome board. It's not like some secret club that you can't get in on. Now all of a sudden, the biggest step forward in RC methods is being questioned by people who ignored it for a year or two. That's a little annoying; where were you before?
Now, to address Arvid, if we are estimating how many runs Barry has produced, we need a better method than RC for sure. And we know that when we add Barry to the system, all of the LW values will change. Now you can choose to evaluate Barry in the league neutral context, or you can attempt to evaluate the "theoretical team", as BJames does in the New RC. If you want to include the team interaction stuff, you need a method that properly estimates the number of runs a team will score, with the values of each event being dynamic. RC doesn't work. You need BsR if you want to do this.
Really, every RC method is a LW method. You can determine the "effective" weights using the +1 method. RC, again, is irrelevant, because it simply does not work. To assume that LW values stay static is also very wrong. The coefficients are not the same today as they were in 1968. So we need a method that will give us an accurate estimate of the LW. BsR is that method.
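Here is a sketch of the +1 method applied to basic Runs Created, to show what "effective" weights look like and how they move with the environment. The two team lines are invented for illustration (one higher-offense, one lower-offense), and basic RC is used only because its formula is short; the same wrapper would work around any estimator that takes these inputs.

```python
def basic_rc(singles, doubles, triples, hr, walks, outs):
    """Basic Runs Created: (H + BB) * TB / (AB + BB), with AB = H + outs."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    at_bats = hits + outs
    return (hits + walks) * total_bases / (at_bats + walks)


def plus_one_value(estimator, line, event):
    """Marginal value of one extra event: estimate(line + 1 event) - estimate(line)."""
    bumped = dict(line)
    bumped[event] += 1
    return estimator(**bumped) - estimator(**line)


# Hypothetical season lines, not real teams.
HIGH_OFFENSE = dict(singles=1000, doubles=270, triples=30, hr=150, walks=520, outs=4100)
LOW_OFFENSE = dict(singles=900, doubles=200, triples=20, hr=60, walks=400, outs=4300)

for label, line in (("high", HIGH_OFFENSE), ("low", LOW_OFFENSE)):
    weights = {ev: round(plus_one_value(basic_rc, line, ev), 3)
               for ev in ("singles", "hr", "walks", "outs")}
    print(label, weights)
```

The point of the exercise is only that the implied weights are not constants: they depend on the line they are measured against.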
Tango is a big fan of LW. There is nothing wrong with using a LW formula to evaluate a player-as a matter of fact, I would encourage it and I believe Tango does too. But we cannot simply use a formula like ERP or XR over all contexts if we want to be correct. You can think of BsR as a method of estimating the LWs if it makes you feel better. There's nothing wrong with LW, but static LW ARE wrong.
As to "how are runs generated", is that really such an answered question? We have static LW saying each single produces .5 runs, and each double produces .8 runs on average. WRONG. A Rey Ordonez team will not score .5 runs/single. We have RC saying that a HR, by itself, is worth 4 runs. WRONG. We need BsR. BsR is right! It is an absolute fact that runs=baserunners*%whoscore+HR. What we need to focus on, and what Tango has suggested in this piece, is to find the best way to estimate the % of runners who score. However, runs=baserunners*%whoscore+HR never occurred to me until I read Smyth's work. And I have not seen any other sabermetricians advocating measuring like this. So BsR IS a big step forward in our understanding of how runs are created.
Paul, again you cannot deny that runs=baserunners*%whoscore+HR. Again, let's find out how we can best estimate the scoring %. You guys are quibbling over the NEED for such a method, which seems ludicrous to me. Did you guys question the need for Davenport's custom exponent method for estimating pythagorean W%? Actually, thanks to another brilliant insight by David Smyth we may be able to get better pythagorean exponents. We need methods that hold up theoretically, in all cases, so we can feel confident using our methods for all situations. If I estimate the number of runs that will score in a 1 hit game using LW, I can't be confident in that result. And that, BTW, is a very good example of an extreme: the individual game.
And maybe I'm just a fool, but I am surprised that anyone would be put off by "RC is dead".
September 17, 2002 - Paul B
Until now, there have been no real reactions to it outside of the FanHome board. It's not like some secret club that you can't get in on. Now all of a sudden, the biggest step forward in RC methods is being questioned by people who ignored it for a year or two. That's a little annoying; where were you before?
I never claimed to feel left out of a secret club. Look, Patriot, this is the way it works: a new technique is kicked around by the most hardcore of analysts--the kind of people who do a good job of keeping up with everything on r.s.bb and FanHome. At some point, the technique is considered sound enough to introduce to a larger public, where it makes its way to the forums frequented by more casual analytic types, the kind of people who understand why walk rates are important but who don’t spend a lot of time poring over the Lahman database, like much of the readership at either of the BPs. If that audience responds well, the technique will probably continue toward the mainstream. Three or four years ago, even most statheads assumed that pitchers had substantial control over balls in play. Last year, Voros’s work was mentioned in The Village Voice, of all places.
You guys are quibbling over the NEED for such a method, which seems ludicrous to me. Did you guys question the need for Davenport's custom exponent method for estimating pythagorean W%.... We need methods that hold up theoretically, in all cases, so we can feel confident using our methods for all situations.
I don’t question the need for Davenport’s custom exponent, because as Davenport himself acknowledged, for most situations in actual MLB you’ll do just fine using 1.83 (or even 2), but that if your particular interest is in studying extreme examples, you need a custom exponent. If you’re trying to figure out if the Red Sox are underperforming this season given their runs scored and allowed, either the basic or complex exponent will help you to the same conclusion.
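To put numbers on the "either exponent gets you to the same conclusion" point, here is a sketch of the Pythagorean estimate, W% = RS^x / (RS^x + RA^x), with x = 2 and x = 1.83. The run totals are invented. The `custom_exponent` function uses the form commonly cited for Davenport's method, x = 1.5 x log10((RS + RA) / G) + 0.45; that formula is not given in this thread, so treat it as an assumption.

```python
import math


def pythagorean_wpct(rs, ra, exponent=2.0):
    """Pythagorean winning percentage for a given exponent."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)


def custom_exponent(rs, ra, games):
    """Commonly cited custom-exponent form (an assumption, not from the thread)."""
    return 1.5 * math.log10((rs + ra) / games) + 0.45


# Hypothetical team: 850 runs scored, 700 allowed, 162 games.
rs, ra, games = 850, 700, 162
for x in (2.0, 1.83, custom_exponent(rs, ra, games)):
    wpct = pythagorean_wpct(rs, ra, x)
    print(f"x={x:.3f}  W%={wpct:.4f}  wins={wpct * games:.1f}")
```

For a run environment like this one the three exponents differ by a win or two over a season; the gap only becomes large when runs per game are far from normal.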
I don’t begrudge anybody who spends his or her time dealing with theoretical analysis, and I have no patience for the Bill Simmons types who make asinine claims about statheads ruining the game. But I’m not a hardcore analytic type myself, so my interest in any new techniques that come across in a forum like this is to ask what the applications are for my understanding of the game. In the early 1980’s, it was RC and LW that proved to me that all those managers who were valuing speed over obp were killing their offenses (yes, I now know that some of this goes back to Branch Rickey in 1954). More recently, DIPS has taught me that a pitcher can be “hit lucky”, and that a pitcher’s control of his environment is basically limited to his K, BB and HR rates. For a mainstream fan, those concepts were “revolutionary”.
I don’t debate that serious analysts need BaseRuns to better study unusual environments. What I’ve been trying to ask is if the more casual fan with an analytic bent (i.e., someone like me) needs BaseRuns to better understand the game he or she enjoys. If the answer is “not really”--just as the answer is “not really” to whether you need Davenport’s custom exponent--then that’s fine. If the answer is “it does a better job of estimating single game performance than other run estimators” then that’s fine, too.
And maybe I'm just a fool, but I am surprised that anyone would be put off by "RC is dead".
Arrogance puts me off. People claiming to have absolute answers to one great Truth put me off. As I said earlier, I don’t even use RC. But when somebody comes along and declares “the old ways are dead and the toy I use is the way of the future”--well, there’s plenty of hubris in that kind of declaration, and it begs for closer examination. I think I would have reacted in the same way if Tango had declared “OPS is dead”, and the flaws in OPS are even more obvious than those in RC (e.g. “this guy has a 1.010 OPS, so you’d be better off walking him every time”).
Now, I’ve found Tango’s tone to be far more modest in subsequent posts on this discussion, and consequently feel I am beginning to better understand the uses of BaseRuns. But shouting down the questions of somebody who doesn’t share your analytic devotion is the main reason stat-minded fans get such a bad rap. Asking questions is not an outrage at which to take offense.
September 17, 2002 - Arvid Engen
Dear Tony Eason,
Excuse me for wearing white after labor day. I spend too much time worrying about the nuances of baseball statistics as it is, and I choose to focus on this forum for a variety of reasons. BaseRuns certainly isn't widely accepted to the point where its assertions can be taken at face value without further validation; if it were, Tango would not have had to "spread the gospel" over at this site.
The peer review process, if you will, is still ongoing. I don't doubt that you and Tango have worked with the data extensively, have invested significant time in improving the system, but I am not convinced that the philosophical underpinnings of the system have been well articulated, here or anywhere else.
runs=baserunners*%whoscore+HR isn't anything more than an identity. For it to be of any use, you need to solve for its parameters based on the same dirty, polluted dataset that all the other methods have used. It is not clear to me that this particular formulation represents anything other than an algebraic rearrangement of a linear weights formula. You arrive at different coefficients because you use game-based data, instead of season-based data, but there is nothing in the identity itself that is particularly amenable to that technique. You could just as easily generate linear weights or extrapolated runs or whatever else using game-based data.
There is also no intrinsic relationship between the BaseRuns formula and the choice of static versus dynamic LW coefficients. If we wanted LW coefficients that correspond to a high-offense environment, we could get plenty good results by limiting our regression to high-offense seasons. It is not clear to me that BaseRuns has reached some sort of equilibrium between the utility of a formula that can facilitate comparison of offensive performance across different contexts, and the reality that offensive output is inherently context-bound.
BaseRuns is simply a refinement of linear weights. A tremendous amount of this discussion consists of age old questions couched in a new vocabulary. That does not mean that these questions are resolved to everyone's satisfaction, or that the research this project has contributed isn't worthwhile, and I apologize if I have implied otherwise. It does mean that you guys ought to be careful not to mistake the novelty of your jargon with the novelty of your ideas. Certainly, some of the language you and Tango have invoked in defending the system suggests that this may be a strong possibility.
For Tango's next article, I would suggest that he discuss the offensive performance of the 2001 San Francisco Giants in terms of BaseRuns.
September 17, 2002 - tangotiger
Arvid:
1. It seems almost oxymoronic that BaseRuns doesn't do particularly well relative to different levels of OBP, but does do very well relative to different levels of OPS. This suggests to me that there is some sort of interaction between the "getting on base" element and the "moving runners along" element that has been lost in the attempt to segregate those two things from one another. For one thing, the probabilities of particular batting outcomes aren't independent of the bases occupied during a given plate appearance.
*** As mentioned, BaseRuns is the first step at the score rate. It does very well with high OPS and HR classes, simply because it handles the HR properly. Further improvement is called for in cases where no HR are hit. This is why I mention that the search is on for a better score rate.
2. I suppose I'm still somewhat put off by the implications that BaseRuns is a "true" or "real" or otherwise aesthetically pure measure of run creation. Even if you look at data on an inning-by-inning basis, it is still an approximation:
BB-1B-1B-HR-K-K-K produces 4 runs, whereas HR-K-K-1B-1B-BB-K normally produces 1.
*** The model assumes somewhat random distribution of events. It does not purport otherwise. You can pick any single example, and any model will be wrong. You need the sample size behind it.
That, in aggregating data to the season level, unusual and random sequences tend to get lost in the noise, is as much an advantage as a disadvantage.
*** No need to aggregate by team though, since this introduces a bias. By aggregating on other terms, as I've done, you get "better" data.
Paul:
is sufficient to justify replacing existing estimators when measuring typical examples within the controlled set of major league baseball teams
*** As I said, if all you care about is the typical example, then the typical evaluators are all you need.
but you can hardly claim to be surprised to meet some challenges when you throw down a gauntlet like that.
*** I have no problem arguing against RC, since it does not have a basis in logic. Its basis is gobbledygook math that is fitted to a narrow sample of data, and its flaws are exposed when taking it out of its environment. This is also true of static LWTS. This is not the case with BsR or with custom LWTS.
because as Davenport himself acknowledged, for most situations in actual MLB you’ll do just fine using 1.83 (or even 2), but that if your particular interest is in studying extreme examples, you need a custom exponent
*** I didn't know Clay said this, but this is exactly my position as well.
I’ve found Tango’s tone to be far more modest in subsequent posts on this discussion
*** I must be getting old these last few hours. I'll try to be more "O'Reilly Factor" from time-to-time.
Arvid:
BaseRuns is simply a refinement of linear weights
*** BsR allows the generation of custom LWTS. There's no other relationship between the two.
I would suggest that he discuss the offensive performance of the 2001 San Francisco Giants in terms of BaseRuns.
*** For that you need custom LWTS by batting order by the 24 base-out states. BsR is not appropriate, except to help in establishing the baseline custom values.
September 18, 2002 - Paul B
I’ve found Tango’s tone to be far more modest in subsequent posts on this discussion
*** I must be getting old these last few hours. I'll try to be more "O'Reilly Factor" from time-to-time.
No, I only meant modest on a relative scale, where 0 is the amount of arrogance to be found in an average Tango article. For God's sake, there's never a need to turn on Fox News.
I've appreciated the introduction to BaseRuns (since I didn't know I was supposed to go hunt down the info on r.s.bb and FanHome). Thanks for the thorough research and response to posts.
September 18, 2002 - Brian Blake
My compliments to Tango on an excellent and very educational series of articles.
I realize this was covered to some degree particularly in the first article, but there does seem to be some confusion among many readers as to why Base Runs is a superior model in theory.
Perhaps it is worth highlighting again here at the end.
All run creation models generally begin with, as known quantities, the number of hits, singles, doubles, triples, HRs, walks, etc., and attempt to apply some formula to these numbers to estimate how many runs will be scored. The formula that is applied to these, however, always contains some degree of inaccuracy.
In any run estimator, the coefficients which are applied to these numbers are estimates. And no matter how much work is done to refine them, they will always be estimates which introduce some degree of inaccuracy and which cause unacceptable results under certain conditions.
One important thing Base Runs has done is to remove 2 KNOWN quantities from these calculations, thus reducing the level of error.
One thing that is KNOWN is that every HR scores at least one run, that of the batter who hit it. Therefore, there is no reason to multiply this run by any coefficient. Doing so only increases the inaccuracy of the model. In Base Runs, therefore, this part of the HR is removed from the estimated part of the calculation and instead, its TRUE value (1 for each HR) is added in the end.
The other thing that is known is the total number of baserunners. The first part of the calculation (H+BB-HR) calculates this KNOWN quantity. It is important to note that these 2 parts of the calculation are not only quite simple and elegant, but also 100% accurate. No estimate has yet been applied.
It is in the middle part of the equation, the calculation of the score rate, that things get more complex, and some degree of estimation and inaccuracy is introduced. But here Tango has made clear that this part of the calculation is preliminary, and that he believes it may be improved on in the future.
And I would add that not only is it possible that it will be improved in its theoretical accuracy, it is also possible that it might be improved in its simplicity, so that a version of Base Runs might well be created that would be best even for the "back of the envelope" calculations which many prefer.
Once the score rate of baserunners is known, the formula (H+BB-HR)*rate+HR would be quite simple to calculate.
Finally, in the calculation of this ratio as presented in the article, B/(B+outs), where B = (.8*1B + 2.1*2B + 3.4*3B + 1.8*HR + .1*BB), the calculation of B is no more complicated than that used by other run estimators which apply coefficients to each of these events. Because these coefficients are determined in a somewhat similar manner to other estimators, from real world data, they are also estimates and might be likely to introduce some degree of error.
But unlike some other estimators, these coefficients do not represent the VALUE of these events. Instead, they represent only the impact on the calculation of the overall rate of successfully scoring baserunners.
Also, while it might seem counterintuitive at first that the impact of the HR (with a coefficient of 1.8) is less than that of the triple (with a coefficient of 3.4), it is important to remember that we already eliminated the run scoring value of the HR itself from this part of the calculation. Therefore, the HR coefficient only includes its impact in terms of driving in other baserunners, while the other coefficients include both the driving in impact and the scoring impact (the chance that the new baserunner will score).
Thus in theory, the triple likely has roughly the same driving in impact on this ratio as the HR (because a triple would clear the bases), thus 1.8, and the additional 1.6 (for a coefficient of 3.4) would be due to the fact that adding a runner on third base would increase the overall score rate due to the greater likelihood of scoring a runner from 3rd base, as opposed to 2nd or 1st.
On the other extreme, the values for walks and singles are also entirely due to their driving in or moving over value. This is because a runner on first is NOT more likely to score than the average runner. This is why the coefficient for walks is so small. This does not suggest that additional walks do not increase scoring. They do, but this is already accounted for in the first part of the formula, which increases for each additional baserunner.
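The three parts described above (known baserunners, estimated score rate, known HR runs) can be put together in a few lines. This is only a sketch of the formula as stated in this post, (H+BB-HR)*rate+HR with rate = B/(B+outs); the function name and the team totals are hypothetical and chosen purely for illustration.

```python
def base_runs(singles, doubles, triples, hr, bb, outs):
    """Estimate runs as (baserunners * score rate) + home runs."""
    hits = singles + doubles + triples + hr
    # Known quantity: runners who reach base (HR batters are counted separately).
    baserunners = hits + bb - hr
    # Estimated part: the "B" equation quoted in the post.
    b = 0.8 * singles + 2.1 * doubles + 3.4 * triples + 1.8 * hr + 0.1 * bb
    denom = b + outs
    score_rate = b / denom if denom > 0 else 0.0
    # Known quantity: every HR scores exactly one run for the batter.
    return baserunners * score_rate + hr


# Hypothetical team-season totals, not real data.
example = base_runs(singles=1000, doubles=270, triples=30, hr=140, bb=520, outs=4100)
print(round(example, 1))
```

Note that a game with one HR and nothing else returns exactly 1 run, since there are no baserunners for the score rate to act on.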
Perhaps one way this formula could be refined for those who are interested, might be to attempt to separate the run scoring, or the moving over (driving in) components, though I expect this would only increase the complexity.
Another idea, which might make the score rate calculation a bit more intuitively satisfying, though more mathematically complex, might be to present it as the weighted average of the separate score rates for the single, double, triple, and HR. Thus rather than saying a triple "increases the score rate of the average baserunner" we would be saying "take the % of singles multiplied by the score rate for singles, plus the % of doubles multiplied by the score rate for doubles, etc."
But this would also dramatically increase the complexity of the formula, as the score rates for each event are dependent on the others; if there are more triples, more runners who single or walk will be driven in, for example, so you would have a similar lengthy calculation for each event.
One alternate way of presenting the formula for B which does get across to some degree what is going on here, is to say:
B = 8.2 * [.1(1B) + .26(2B) + .41(3B) + .22(HR) + .01(BB)]
All I've done here is to take the sum of the coefficients, 8.2, and back that out by dividing each coefficient by that sum, in order to express each coefficient as a percentage. The coefficients now sum to 1. Multiplying this out will produce the original formula. The only slight differences in results from using this formula would be caused by rounding error.
By treating the 8.2 as a constant, we can see clearly the relative effect of each event, in percentage form, on the score rate. I think it is clear from Tango's data that this model is more accurate across different run environments than those models which treat the marginal value of each new event as static, and it should be, as each event in this formula increases the rate of scoring.
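The rescaling is easy to check numerically. A small sketch, using only the B coefficients quoted in this post (the variable names are illustrative):

```python
# B coefficients as quoted from the article.
coefficients = {"1B": 0.8, "2B": 2.1, "3B": 3.4, "HR": 1.8, "BB": 0.1}

# The constant that gets factored out is just the sum of the coefficients.
total = sum(coefficients.values())

# Each event's share of the total; these shares sum to 1.
shares = {event: weight / total for event, weight in coefficients.items()}

for event, share in shares.items():
    print(f"{event}: {share:.2f}")
```

Rounded to two places, the shares reproduce the bracketed values in the formula above; multiplying them back by the total recovers the original coefficients up to rounding.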
But while Tango has shown his data grouped by HR's, OBP, and OPS, and demonstrated a reasonable degree of accuracy there, I would like to also see the data shown grouped by number of triples in a game, number of walks in a game, number of doubles in a game.
The reason I would like to see this is that I think it might better demonstrate how accurate or inaccurate those coefficients are, and thus might help point the way towards further refinement of this part of the formula.
It seems that if there is any inaccuracy here, it is likely in either the constant, 8.2, or in the relative weights assigned to the impact of the individual events.
We would expect groupings of HRs to work well, as a major part of the HR has been backed out of the part of the calculation that is prone to error. Likewise, if the balance between individual events, like walks, singles, doubles, and triples, is off somewhat, this will not likely show in groupings by OBP and OPS, because games with high relative rates of these events will likely be distributed fairly evenly across the groupings.
Tango, you've shown us the data for where this works well, now show us the data for where it might not (yet)!
September 18, 2002 - David Smyth
First, I'm enjoying all of the commentary, pro and con, and I thank everyone who has participated. As far as the "my stat is better than your stat" stuff, all I have ever wanted is for BsR to be better known, and considered to be an equal alternative to the "standard" run formulas. Sometimes it might be a better alternative. For example, there is always talk about whether they should walk Barry every time (in fact there is a current "clutch hits" discussion of this on this site), and I think that BsR will likely provide a better answer than RC or XR (although a good simulation would be best). Another question might be, how often should you try to steal against Pedro, or against J Lima? BsR will provide custom values for the SB and CS for each pitcher, so that you can compute a custom break-even point. You could do the same with RC, I think, but the CS value in RC seems to be too high.
Also, those B coefficients given by Tango vary on what is included in the A and C factors. Someone asked for an actual practical version of the formula, and unless I'm mistaken none was given by Tango in the articles. (Tango, I thought you were gonna give a version which includes all the minor stuff, like ROE and WP.) Anyway, here is a version that I use most of the time:
A = H +BB +HBP -IBB -HR B = .1*(BB+HBP-IBB) +.8*1B +2.3*2B +3.6*3B +2.1*HR +SB -CS C = AB-H +CS D = HR
This will give you a good estimate of the number of expected runs generated over the long-term. You can also reconcile the entire B result to a league or team by multiplying it by 1.01, or .98, or whatever produces the correct run total.
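For anyone who wants to try this version, here is a rough sketch of the A, B, C, D components above combined as A*B/(B+C) + D. The function name, argument names, and the `b_scale` parameter (standing in for the 1.01 or .98 reconciliation multiplier) are assumptions made for this illustration, and the sample line is hypothetical.

```python
def base_runs_smyth(ab, h, doubles, triples, hr, bb,
                    hbp=0, ibb=0, sb=0, cs=0, b_scale=1.0):
    """BaseRuns with the A, B, C, D components listed in this post."""
    singles = h - doubles - triples - hr
    a = h + bb + hbp - ibb - hr                      # baserunners
    b = (0.1 * (bb + hbp - ibb) + 0.8 * singles + 2.3 * doubles
         + 3.6 * triples + 2.1 * hr + sb - cs) * b_scale  # advancement
    c = ab - h + cs                                  # outs
    d = hr                                           # automatic runs
    denom = b + c
    if denom <= 0:
        return float(d)
    return a * b / denom + d


# Hypothetical team-season line, not real data.
runs = base_runs_smyth(ab=5500, h=1440, doubles=270, triples=30, hr=140, bb=520)
print(round(runs, 1))
```

Passing `b_scale=1.01` or `b_scale=0.98` nudges the estimate up or down, which is how the reconciliation to a known league or team run total would work in this sketch.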
Tango, I appreciate the work you've done, and the publicity you have given BaseRuns. The info is there for anyone to decide for himself what he wants out of a run formula, and which one best meets those needs.
September 18, 2002 - David Smyth
Ugh! The formatting came out bad. I'll reprint that formula and try to make it come out better.
A = H + BB + HBP - IBB - HR
B = .1*(BB + HBP - IBB) + .8*1B + 2.3*2B + 3.6*3B + 2.1*HR + SB - CS
C = AB - H + CS
D = HR
September 18, 2002 - tangotiger
Paul:
No, I only meant modest on a relative scale, where 0 is the amount of arrogance to be found in an average Tango article. For God's sake, there's never a need to turn on Fox News.
Ah-hahaha... the Linear Weights Arrogance Tango Scale! I love it! As for Fox News, they are extremely biased. PBS, CNN maybe, and 60 minutes are really the only good ones out there. Seriously, watch BBC, or other world news and you get such a different perspective on the world. Did you watch those 3 Arabic-American kids from Florida on Larry King 2 nights ago? They were extremely believable, and given the choice between them and that lady, I'd choose them. Of course, the American media was all over them before the King appearance, and since then? Exactly.
You know what else they said when asked if they would sue her? No! They said no! How un-American is that??
Brian:
Excellent summary overall. I agree with almost everything, except
might be to attempt to separate the run scoring, or the moving over (driving in) components
I have done this in Article 2, under the "building blocks of run creation". The separating into components is what you need to do to get custom Linear Weights components.
I'll have to think about your "8.2" concept. Sounds interesting.
I would like to also see the data shown grouped by number of triples in a game, number of walks in a game, number of doubles in a game
Sure, no problem. I'll try to get that done by this weekend (I usually run my research while my newborn is asleep, which is not often these days!).
It seems that if there is any inaccuracy here, it is likely in ... the relative weights assigned to the impact of the individual events
No, that is not possible. Those numbers were generated such that when using the plus 1 method it yields the exact LWTS coefficients determined by the play-by-play data. Therefore, the inaccuracy would be that we can't simply have such a simple "B" equation.
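For readers unfamiliar with the plus 1 method: add one event to a given environment, recompute runs, and take the difference as that event's marginal value. A minimal sketch follows, using the B coefficients quoted earlier in this thread and a simplified baserunner term (H+BB-HR). The environment below is hypothetical, not the 1974-1990 data, so the output will not reproduce the published LWTS values exactly.

```python
B_WEIGHTS = {"1B": 0.8, "2B": 2.1, "3B": 3.4, "HR": 1.8, "BB": 0.1}


def base_runs(line):
    """Runs = baserunners * B/(B+outs) + HR for a dict of event counts."""
    runners = line["1B"] + line["2B"] + line["3B"] + line["BB"]
    b = sum(B_WEIGHTS[event] * line[event] for event in B_WEIGHTS)
    return runners * b / (b + line["outs"]) + line["HR"]


def plus_one(line, event):
    """Marginal run value of adding exactly one `event` to the environment."""
    bumped = dict(line)
    bumped[event] += 1
    return base_runs(bumped) - base_runs(line)


# Hypothetical run environment (team-season totals), for illustration only.
env = {"1B": 1000, "2B": 270, "3B": 30, "HR": 140, "BB": 520, "outs": 4100}

marginal = {e: plus_one(env, e) for e in ["BB", "1B", "2B", "3B", "HR", "outs"]}
for event, value in marginal.items():
    print(f"{event}: {value:+.3f}")
```

Changing `env` and rerunning gives a different set of marginal values, which is the sense in which BsR generates custom LWTS for each environment.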
============
I will post the complete BsR equations that I used by the end of today. What I provided to Rob above was only the "B" equation. I neglected to also include the "Baserunner" portion as well.
September 18, 2002 - Arvid Engen
Finally, in the calculation of this ratio as presented in the article ... the calculation of B is no more complicated than that used by other run estimators which apply coefficients to each of these events. Because these coefficients are determined in a somewhat similar manner to other estimators, from real world data, they are also estimates and might be likely to introduce some degree of error.
Is there any reason to believe that such estimates are subject to any less degree of error than are the other run estimators?
That's why I say that this is an algebraic rearrangement of a traditional run estimation formula; you're calculating the same damned coefficients, and adding some window dressing. Well, not quite; you're also introducing a constraint or two into the calculation, the effect of which seems to be:
1. More robustness across different run-scoring environments.
2. Somewhat less accuracy in normal run-scoring environments.
But unlike some other estimators, these coefficients do not represent the VALUE of these events. Instead, they represent only the impact on the calculation of the overall rate of successfully scoring baserunners.
That sounds neat, but that's little more than a semantic distinction. The "value" of the coefficients in a LW formula is the estimated impact of a particular event on scoring runs. The "value" of the coefficients in the BaseRuns formula is the estimated impact of a particular event on scoring baserunners. I don't see how the latter is superior to the former in any meaningful way.
September 18, 2002 - tangotiger
2. Somewhat less accuracy in normal run-scoring environments.
This may very well be true, but that is only because the other measures (except LWTS) are "cheating" to get there. They ignore the constraints of a HR being at least 1 run, they ignore the constraint that you can't score more runs than you have runners, and so that gives them enough wiggling room to force in coefficients to the sample data they have to get the lowest RMSE possible.
Static LWTS values are derived from the pbp and therefore do no cheating. Well, they cheat in that the values can only be applied to the environment they were generated from, the typical run scoring environment.
BaseRuns may be 1% less accurate in the typical environments but "50%" more accurate in the extremes. You (?) said that this is 1 step forward, 2 steps back. From my standpoint, this is 2 steps forward, 1 step back.
I don't like that the accuracy of the other formulas is fitted to the typical data, *especially since almost everyone then takes that formula out of that environment and applies it to Pedro, Barry, and Thome*. That little disclaimer is always ignored.
Anyway, to repeat: if all you care about is the typical, use the typical. If you want to know how the events interact with each other to produce runs in various run environments, then you need to use R = BR x scoreRate + HR. For now, BaseRuns is it.
September 18, 2002 - Brian
Tango,
Here's what I was getting at with regard to breaking out the "getting on" and "moving over" values in the B equation. Looking again at what you did in part 2 did help me to figure this out though:
First, allow me to change my constant in the B equation from 8.2 to 4.5, which produces this (by dividing 8.2 by 1.8181 and multiplying the coefficients by that same amount):
B= 4.5 * [.18(1B)+.47(2B)+.75(3B)+.40(HR)+.02(BB)]
I did this so that the coefficient for the HR would equal the "moving over" value of the LWTs. From there it is not too difficult to break out the "getting on" and "moving over" values in the B equation. The only difference is that for purposes of the B equation, you want the scoring rate to increase only when the getting on value is above average, and to decrease when it is below average.
In this case, the average "getting on" value turns out to be .28, as the "getting on" value for the B equation is equal to the LWTs value less .28. The "moving over" values are exactly the same.
LWTS:    1B     2B     3B     HR     BB
geton    .25    .41    .61   1.00    .24
mover    .21    .34    .42   0.40    .06

BsR:     1B     2B     3B     HR     BB
geton   (.03)   .13    .33   0.00   (.04)
mover    .21    .34    .42   0.40    .06
total    .18    .47    .75   0.40    .02
There is probably no need to apply these values separately in the formula, but this may be helpful in demonstrating that those coefficients are in fact derived directly from LWTS.
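The arithmetic behind the two tables can be verified with a short script. This is a sketch using only the numbers in this post; setting the HR "geton" value to 0.00 follows the table, which I read as reflecting that the batter's own run is added separately outside the B equation.

```python
EVENTS = ["1B", "2B", "3B", "HR", "BB"]

# LWTS components as listed in the post.
lwts_geton = {"1B": 0.25, "2B": 0.41, "3B": 0.61, "HR": 1.00, "BB": 0.24}
mover = {"1B": 0.21, "2B": 0.34, "3B": 0.42, "HR": 0.40, "BB": 0.06}

AVG_GETON = 0.28  # the average "getting on" value stated in the post

# BsR "geton" is the LWTS value less the average; HR is 0.00 as in the table.
bsr_geton = {e: 0.0 if e == "HR" else round(lwts_geton[e] - AVG_GETON, 2)
             for e in EVENTS}
total = {e: round(bsr_geton[e] + mover[e], 2) for e in EVENTS}

# Multiplying by the 4.5 constant should land close to the original B weights.
original_b = {"1B": 0.8, "2B": 2.1, "3B": 3.4, "HR": 1.8, "BB": 0.1}
for e in EVENTS:
    print(f"{e}: total={total[e]:.2f}  4.5*total={4.5 * total[e]:.2f}  B={original_b[e]}")
```

The small gaps between 4.5*total and the original weights come from rounding the table entries to two decimals.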
As for the 4.5 constant, I was a bit puzzled by what this meant, but I may have an answer.
For the purposes of calculating the scoring rate, in the denominator (successful scoring events + outs), outs in theory ought to only include outs with men on base.
If we divide each term in the equation B/(B+C) by 4.5, we are able to back this constant out of B, and are left with B/(B+(C/4.5)). Thus total outs are being divided by 4.5. If you have 27 outs in a game, that would mean you are counting 6 of them, which it seems to me might be roughly the number of outs made per game with men on base in your data set.
Does this make sense?
September 18, 2002 - tangotiger
I think you are onto something here
Thus total outs are being divided by 4.5. If you have 27 outs in a game, that would mean you are counting 6 of them, which it seems to me might be roughly the number of outs made per game with men on base in your data set.
That is not possible. About 45% of all PAs occur with men on base. 65% of all PAs are outs. Therefore, # of outs with MOB is .45 x .65 x 39 = 11
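The two numbers being compared here can be worked out directly. A tiny sketch; reading the 39 in the calculation above as plate appearances per team per game is an assumption.

```python
# Share of PAs with men on base, and share of PAs that are outs (from the post).
MOB_SHARE = 0.45
OUT_SHARE = 0.65
PA_PER_GAME = 39  # assumed meaning of the "39" in the post

outs_with_mob = MOB_SHARE * OUT_SHARE * PA_PER_GAME

# The quantity implied by dividing 27 outs by the 4.5 constant.
counted_outs = 27 / 4.5

print(f"outs with men on base per game: {outs_with_mob:.1f}")
print(f"outs counted after dividing by 4.5: {counted_outs:.1f}")
```

The gap between roughly 11 and 6 is why outs with men on base cannot be what the constant represents.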
However, I did notice a very interesting relationship between the B component values and the LWTS values in the past. I haven't been able to quantify it well yet, though. I'm sure you are on the right path.
September 18, 2002 - tangotiger
btw, the number I derived was 4.25. Not sure what to do with it yet.
September 18, 2002 - Patriot
"But when somebody comes along and declares “the old ways are dead and the toy I use is the way of the future”--well, there’s plenty of hubris in that kind of declaration, and it begs for closer examination."
Forget that Base Runs even exists. RC is still clearly wrong, and worthless.
Shouting down questions? I did that the first time. Then I wrote 600 words explaining my opinion on BsR, LW, etc. Of course, you have nothing to say about this, you only question my tone. And as for absolute truth, R=BR*%score+HR is true. The question is *How best to estimate score rate?* As Tango said and as I have repeated, this is still an open question. Therefore, BsR is not absolute truth. It is clear from the evidence presented by Tango, though, that for its purpose, it is the best method currently available.
Tony Eason? I HATE those Patriots. Anyway, I have not developed this system, please don't get that misconception. I am just someone who has seen the research and has agreed with it. David and Tango did the work.
Game based versus season based data has nothing to do with anything! You can run a regression for the needed B component based on team seasonal totals, and get a reasonable answer, and a similar one to Tango's analysis. Not only that, but a "game" based linear regression will be very little different than a "season" based one. You do not need a separate formula for 1 game, 5 games, 50 games, 162 games, 1000 games, 1 million games. You just need a formula for runs. I have the 1990 and 1991 game-by-game team offensive totals here. Running a regression:
R = .48S + .75D + 1.18T + 1.41HR + .32W - .087(AB-H)
This is about what you would expect. However, this formula will not work for a game in which there is 1 HR and 27 outs, because it will predict -.9 runs. Static LW simply do not work with extremes.
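The 1 HR, 27 outs case is easy to reproduce. A short sketch comparing the regression formula above with the basic BaseRuns form, (H+BB-HR)*B/(B+outs)+HR; the B coefficients used are the ones quoted from the article earlier in this thread, and the function names are illustrative.

```python
def static_lw(s, d, t, hr, w, outs):
    """The regression formula from this post; outs stands for AB-H."""
    return 0.48 * s + 0.75 * d + 1.18 * t + 1.41 * hr + 0.32 * w - 0.087 * outs


def base_runs(s, d, t, hr, w, outs):
    """Basic BaseRuns: baserunners * score rate + HR."""
    runners = s + d + t + w
    b = 0.8 * s + 2.1 * d + 3.4 * t + 1.8 * hr + 0.1 * w
    rate = b / (b + outs) if (b + outs) > 0 else 0.0
    return runners * rate + hr


# One solo home run and 27 outs, nothing else.
game = dict(s=0, d=0, t=0, hr=1, w=0, outs=27)
lw_runs = static_lw(**game)
bsr_runs = base_runs(**game)
print(f"static LW: {lw_runs:.2f}, BaseRuns: {bsr_runs:.2f}")
```

The linear formula goes negative because the out penalty accumulates regardless of whether there was anyone on base to strand, while BaseRuns returns the one run that actually scored.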
As to limiting our regression for high offense seasons, that might work, but how many LW formulas are we going to have? "Well, we use this one when the OBA<.1 and the SLG is <.15, and this one is for when the OBA>.5, and the SLG is >.7." This is ludicrous. What we need is a way to *estimate* LW for any conditions. BsR, BsR, BsR.
"2. Somewhat less accuracy in normal run-scoring environments."
For 1980-2000, using the basic version of XR, the basic version of RC, and a variation of BsR, the RMSEs for team runs in a season are: XR=23.7, BsR=23.9, RC=25.8. Are you REALLY concerned about .2 runs/162 games, and are you going to let this get in the way of having 1 formula that will give a REASONABLE estimate over almost the entire spectrum of offense? If accuracy for a season is the only thing that matters, then linear regression is the only answer, because it will be the most accurate pretty much by definition. Rating a stat with a RMSE of 23.7 as better than one with a RMSE of 23.9 is like rating a .300 hitter as better than a .297 hitter.
September 18, 2002 - Brian
Tango:
Another thought; rather than outs with men on base, perhaps it is the men LEFT on base that would be near to 6.
Not every out of course, even with men on base, prevents the baserunner from scoring. LOB seems like the number you would want.
It is also, fortunately, something that you might have an actual value available for, if that is the case.
p.s. sorry for the double posting.
September 18, 2002 - David Smyth
I'm glad Patriot mentioned the accuracy thing. I don't know where the idea got started that BsR has less accuracy. As far as I know, BsR has a noticeably better RMSE than RC for real teams. I am assuming that the same elements are included in each tested formula version, although I have reason to believe that BsR in even its usual limited version (not including IBB, HBP, GDP, SF, K, and SH) is more accurate than the full RC formula. That was a result of testing done by J Furtado. I would not be at all surprised if BsR were more accurate than XR, assuming the same elements in each version, but I am not sure about that. BsR will not be more accurate than a proper regression using the same elements, because the regression automatically produces the most accurate weighting. But that regression formula will then fall apart when used against individual game data within that set.
The stuff concerning the rearranging of the B factor is interesting. The outs are supposed to represent the negating of the positive value which has been given in the formula to the runners who don't score (mostly those left on base).
September 19, 2002 - tangotiger
If we go back to article 2, and the definition of the score rate (or just using common sense), we have:
% of runners scoring = (runners who score) / (runners who score + those who don't)
This is the score rate.
Runners who score is represented by the "B" equation. Though, as mentioned astutely in the earlier post, we should strip out the 4.25 (or whatever constant) to represent this actually.
The "outs" portion, the "C", of the score rate represents those runners who don't score, namely those left on base, and those outs on base.
September 21, 2002 - pund
Some html to aid in formatting.
[pre]
Line 1
Line 2
[/pre]
or
[br]Line 1
[br]Line 2
Both seem to work (use '<' and '>' instead of '[' and ']'), though "pre" changes the font (good for tables) while "br" doesn't.
September 23, 2002 - tangotiger
Give me another week please on posting the formula. I'll have to write a whole article on it, as it's not as simple as I thought I could make it.
September 30, 2002 - tangotiger
I've added a baseruns article, which is an addendum to the RC series. I apologize for not making it better, but I'd rather get it out there, rather than let it sit on my backburner.