Construction of Base Runs
Sabermetricians have introduced many versions of Base Runs to accommodate different datasets and philosophies. However, all Base Runs formulas take the form:
A*B/(B + C) + D
A represents baserunners; B represents advancement of baserunners; C represents outs; and D represents guaranteed runs (usually just home runs). Thus, Base Runs adheres to the true identity that Runs Scored = baserunners * (% of baserunners that score) + home runs. B/(B + C) is an empirical estimate of the percentage of baserunners that score, and is the main source of potential improvement for Base Runs formulas.
Base Runs was designed to adhere to real-world constraints on runs scored, and succeeds in doing so to a greater extent than other simple run estimators. Base Runs recognizes that a team must score at least one run for each home run it hits and that the number of runners scored cannot be greater than the number of baserunners.
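As a minimal sketch, the general form can be written as a small Python function. The function name, argument names, and the factor totals in the example are illustrative assumptions, not taken from any published implementation.

```python
def base_runs(a, b, c, d):
    """Generic Base Runs estimate: A*B/(B + C) + D.

    a: baserunners, b: advancement, c: outs, d: guaranteed runs.
    Names are illustrative, not from the source.
    """
    score_rate = b / (b + c)  # estimated share of baserunners that score
    return a * score_rate + d


# Hypothetical factor totals, chosen only to keep the arithmetic simple.
estimate = base_runs(a=2000, b=1000, c=4000, d=150)
print(estimate)

# With b >= 0 and c > 0 the score rate lies in [0, 1), so the estimate
# is at least d (one run per home run) and less than a + d.
assert 150 <= estimate < 2000 + 150
```

The final assertion restates the two constraints described above: at least one run per home run, and no more runners scoring than reached base.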
Base Runs Versions in Use
Smyth, in his "Base Runs Primer", defined three Base Runs equations. The first deals solely with basic events:
A = H + W - HR
B = (1.4*TB - .6*H - 3*HR + .1*W)*1.02
C = AB - H
D = HR
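The basic version might be coded as follows. This is only a sketch: the function names are hypothetical, and the team line used in the example is invented for illustration rather than drawn from real data.

```python
def basic_factors(ab, h, tb, hr, w):
    """Return the (A, B, C, D) factors of the basic formula above."""
    a = h + w - hr
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * w) * 1.02
    c = ab - h
    d = hr
    return a, b, c, d


def base_runs_basic(ab, h, tb, hr, w):
    """Base Runs estimate from at-bats, hits, total bases, home runs, walks."""
    a, b, c, d = basic_factors(ab, h, tb, hr, w)
    return a * b / (b + c) + d


# Invented team season line (not real data).
print(round(base_runs_basic(ab=5500, h=1450, tb=2300, hr=160, w=550), 1))
```

Returning the factors separately from the estimate makes it easier to reuse A, B, C, and D in later calculations, such as calibration.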
A second formula incorporated all of the official offensive statistics with the exception of sacrifices:
A = H + W + HBP - HR - .5*IW
B = (1.4*TB - .6*H - 3*HR + .1*(W + HBP - IW) + .9*(SB - CS - GDP))*1.1
C = AB - H + CS + GDP
D = HR
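A sketch of the second formula follows the same pattern. The function name and the input line are assumptions for illustration, and W is assumed to include intentional walks, since the formula subtracts IW from it.

```python
def base_runs_full(ab, h, tb, hr, w, hbp, iw, sb, cs, gdp):
    """Base Runs estimate using the second formula above.

    Assumes w counts all walks, intentional ones included.
    """
    a = h + w + hbp - hr - 0.5 * iw
    b = (1.4 * tb - 0.6 * h - 3 * hr
         + 0.1 * (w + hbp - iw)
         + 0.9 * (sb - cs - gdp)) * 1.1
    c = ab - h + cs + gdp
    d = hr
    return a * b / (b + c) + d


# Invented team season line (not real data).
estimate = base_runs_full(ab=5500, h=1450, tb=2300, hr=160, w=550,
                          hbp=50, iw=40, sb=100, cs=40, gdp=120)
print(round(estimate, 1))
```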
A third formula was designed to be used with pitching statistics:
A = H + W - HR
B = (1.4*TBe - .6*H - 3*HR + .1*W)*1.1
C = 3*IP
D = HR
Where TBe = 1.12*H + 4*HR
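The pitching version can be sketched the same way. The function name and the sample pitching line are hypothetical, and IP is assumed to be a real number of innings (so that 3*IP is the number of outs), not the "200.1" box-score notation.

```python
def pitching_base_runs(ip, h, hr, w):
    """Base Runs estimate from pitching statistics, per the third formula."""
    tbe = 1.12 * h + 4 * hr  # TBe as defined above
    a = h + w - hr
    b = (1.4 * tbe - 0.6 * h - 3 * hr + 0.1 * w) * 1.1
    c = 3 * ip  # outs recorded
    d = hr
    return a * b / (b + c) + d


# Invented pitcher season line (not real data).
print(round(pitching_base_runs(ip=200, h=180, hr=20, w=60), 1))
```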
Calibrating Base Runs
The definitions of the four factors in the general Base Runs (BsR) equation make the choice of elements (and their coefficients) for the A, C, and D factors generally straightforward. The B factor is therefore the only one with real ambiguity, and the one that is easiest to alter. One can "calibrate" Base Runs so that the output equals a desired value. Rearranging the equation gives the required B factor:
B' = (Runs - D)*C/(A - Runs + D)
The multiplier for the B factor is then simply B'/B.
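The calibration step can be illustrated with a short sketch. The function names and the numbers are assumptions chosen for the example; `b_raw` stands for the B factor computed before any multiplier is applied.

```python
def required_b(runs, a, c, d):
    """B' = (Runs - D)*C / (A - Runs + D): the B that makes BsR equal runs."""
    return (runs - d) * c / (a - runs + d)


def b_multiplier(runs, a, b_raw, c, d):
    """Multiplier B'/B to apply to an uncalibrated B factor."""
    return required_b(runs, a, c, d) / b_raw


# Invented totals: A = 1840, raw B = 1925, C = 4050, D = 160, actual runs = 760.
b_prime = required_b(runs=760, a=1840, c=4050, d=160)
print(round(b_multiplier(760, 1840, 1925, 4050, 160), 4))

# Plugging B' back into the general equation reproduces the target runs.
assert abs(1840 * b_prime / (b_prime + 4050) + 160 - 760) < 1e-9
```

Note that the formula is only meaningful when A - Runs + D is positive, that is, when the target does not require more non-home-run runs than there are baserunners.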
Variations on Base Runs
The structure of Base Runs lends itself well to experimentation and customization. Thus, there are a number of possible choices of how to define the various factors.
In Smyth's equations presented above, the guiding principle is that A represents "initial baserunners" (i.e. the number of runners known to have reached safely). One could alternatively use A to represent "final baserunners" (i.e. baserunners after removing runners known to have been retired on base, by being caught stealing or wiped out on a double play), as Runs Created does. Similarly, the C factor could be defined as "batting outs" (AB - H + SH + SF), all outs (which would include CS and GDP), or other combinations.
Intrinsic Linear Weights
Because Base Runs is considered a logical model of the scoring process, it is often assumed to remain accurate over a wider range of contexts than other run estimators. Its intrinsic linear weights can therefore be useful as approximations of linear weights found by other means.
Since Base Runs is a dynamic formula, the intrinsic weights are not constant but depend on the performance of the entity (team, league, player, etc.) in question. If one knows the totals of the entity's A, B, C, and D factors (denoted by A, B, C, and D) and the coefficient of a given event in each factor (denoted by a, b, c, and d), then the intrinsic weight of that event can be calculated as follows:
weight (event) = ((B + C)*(A*b + B*a) - (A*B)*(b + c))/((B + C)^2) + d
The result is similar to what one would find by, for example, adding one single to a team's totals and taking the value of the single as the difference between the new and initial BsR estimates.
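The sketch below compares the intrinsic weight formula with the "plus one single" method, using the basic Smyth formula. The function names and team totals are illustrative assumptions. In that formula a single adds 1 to A, (1.4 - .6)*1.02 = .816 to B, and nothing to C or D (AB and H both rise by one).

```python
def base_runs(A, B, C, D):
    return A * B / (B + C) + D


def intrinsic_weight(A, B, C, a, b, c, d):
    """Intrinsic linear weight of an event with coefficients a, b, c, d."""
    return ((B + C) * (A * b + B * a) - A * B * (b + c)) / (B + C) ** 2 + d


# Invented team factor totals (not real data).
A, B, C, D = 1840, 1963.5, 4050, 160

# Coefficients of a single in the basic formula.
a, b, c, d = 1, 0.816, 0, 0

formula_value = intrinsic_weight(A, B, C, a, b, c, d)
plus_one_value = base_runs(A + a, B + b, C + c, D + d) - base_runs(A, B, C, D)

print(round(formula_value, 4), round(plus_one_value, 4))
```

The two values agree closely but not exactly, because the formula is the derivative of the estimate at the team's totals while the plus-one method takes a finite step away from them.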
The intrinsic linear weights approach can also be used to solve for the B coefficients needed to result in a certain set of linear weights for a given set of inputs. For example, suppose that one had the empirically-derived linear weights for the 1976 NL, and wanted a BsR formula that would match the results for an average team. Since the A, C, and D factors have pre-determined coefficients based on how they have been defined, the only thing needed is the B coefficients. By rearranging the above equation, one gets this formula for the B coefficient for any event:
b (event) = ((B + C)^2*(L - d) - B^2*a - B*C*a + A*B*c)/(A*C)
where L is the desired linear weight value for the event and B is B' as given above, that is, the B value needed to make BsR equal to runs for the inputs. In order for the procedure to work, all variables must be included in B, even those that may not normally be put there (like outs).
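One way to check the rearranged formula is a round trip: compute the intrinsic weight for a known B coefficient, then solve for the coefficient from that weight. The function names and totals below are assumptions for illustration only.

```python
def intrinsic_weight(A, B, C, a, b, c, d):
    """Intrinsic linear weight of an event with coefficients a, b, c, d."""
    return ((B + C) * (A * b + B * a) - A * B * (b + c)) / (B + C) ** 2 + d


def required_b_coefficient(A, B, C, a, c, d, L):
    """B coefficient that yields linear weight L for the given totals."""
    return ((B + C) ** 2 * (L - d) - B ** 2 * a - B * C * a + A * B * c) / (A * C)


# Invented team factor totals; B plays the role of B' here.
A, B, C = 1840, 1963.5, 4050

# Start from a known coefficient set (a single in the basic formula).
L = intrinsic_weight(A, B, C, a=1, b=0.816, c=0, d=0)
recovered = required_b_coefficient(A, B, C, a=1, c=0, d=0, L=L)
print(round(recovered, 6))
```

In practice L would come from empirically derived linear weights, not from the Base Runs formula itself; the round trip only confirms that the two equations are consistent with each other.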
Base Runs for Individual Batters
Like other dynamic run estimators, Base Runs attempts to model the team scoring process, and thus should not be applied directly to individual batters. In order to use Base Runs to evaluate individuals, several approaches can be utilized:
- One approach is to find the intrinsic linear weights for some entity, and apply those to player performance. This utilizes the weights that Base Runs places on events, but does not allow for the effect that an individual has on the context that he plays in.
- In order to incorporate the individual's impact on his environment, the Base Runs of a league or team can be figured with and without the player's statistics, with the difference credited to him.
- A theoretical team approach, in the spirit of that developed by Bill James and applied to Runs Created, can be used. The theoretical team approach puts the player in the context of eight players (usually average), and figures the difference in the team's runs scored with and without the player. David Smyth used this approach to evaluate individual hitters with BsR.
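The with-and-without approach from the list above can be sketched as follows. This is a rough illustration under stated assumptions: the function names and all numbers are hypothetical, and the factors are assumed to be linear in the counting statistics (as in the formulas given earlier), so removing a player's statistics is the same as subtracting his A, B, C, and D contributions from the team totals.

```python
def base_runs(a, b, c, d):
    return a * b / (b + c) + d


def runs_credited(team, player):
    """Team BsR with the player minus team BsR without him.

    team and player are (A, B, C, D) tuples; the player's factors are
    assumed to be already included in the team totals.
    """
    without = tuple(t - p for t, p in zip(team, player))
    return base_runs(*team) - base_runs(*without)


# Invented factor totals for a team and one of its hitters.
team = (1840, 1963.5, 4050, 160)
player = (250, 300, 400, 30)

print(round(runs_credited(team, player), 1))
```

The result generally differs from applying Base Runs to the player's own line, because the player's baserunners are advanced at the team's score rate and not at one implied by his statistics alone.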
Weaknesses of Base Runs
Base Runs adheres to more of the fundamental constraints on run scoring than most other run estimators, but it is by no means perfectly compliant. Some examples of shortcomings:
- BsR will sometimes give a negative estimate; this happens when the B factor is negative.
- BsR will sometimes project many more than three runners left on base per inning, despite the fact that three is the upper limit. For example, if walks have a B coefficient of .1, an inning with 10 walks and 3 outs will yield an estimate of 10*1/(1+3) = 2.5 runs, meaning that 7.5 runners must have been stranded.
- Tangotiger's research found that BsR overvalued events within the .500-.800 team OBP range.
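The first two shortcomings can be reproduced with a few lines of Python. The function name is illustrative, and the second input set is an invented example chosen only to make the B factor negative.

```python
def base_runs(a, b, c, d):
    return a * b / (b + c) + d


# Ten walks and three outs, with a B coefficient of .1 per walk.
walks, outs = 10, 3
estimate = base_runs(a=walks, b=0.1 * walks, c=outs, d=0)
stranded = walks - estimate
print(estimate, stranded)  # 2.5 runs, 7.5 runners left on base

# A negative B factor (invented values) produces a negative estimate.
print(base_runs(a=5, b=-1, c=3, d=0))
```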
One avenue for possible improvement in the model is the score rate estimator B/(B + C). There is no deep theory behind this construct; it was chosen because it worked empirically. It is possible that a better score rate estimator could be developed, although it would most likely have to be more complex than the current one.