Run Estimators

Run estimators are methods that take a team's offensive data (such as singles, walks, and outs) and return an estimate of the number of runs that will result from that output. Run estimators can be divided into several categories.

Types of Run Estimators

Linear Estimators

Linear run estimators (often referred to simply as Linear Weights) place a coefficient on each offensive event and sum the products of the coefficients and the corresponding event frequencies to estimate runs scored. The coefficient for each event is the number of runs the event adds, on average, in a given context. Usually the context is a league-average team, but the coefficients can be customized for other contexts.

The weights themselves can be generated by several methods, including empirical analysis of play-by-play data, linear regression, trial and error, and derivation from multiplicative or modeling estimators.

Examples of linear estimators include Pete Palmer's Batting Runs, Paul Johnson's Estimated Runs Produced, and Jim Furtado's Extrapolated Runs.
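
The weighted-sum structure of a linear estimator can be sketched in a few lines of Python. The weights and the team line below are hypothetical round numbers chosen only for illustration; they are not the coefficients of Batting Runs, Estimated Runs Produced, Extrapolated Runs, or any other published estimator.

```python
# Hypothetical, illustrative weights (runs per event).
# These are NOT the coefficients of any published estimator.
ILLUSTRATIVE_WEIGHTS = {
    "1B": 0.5,
    "2B": 0.75,
    "3B": 1.0,
    "HR": 1.4,
    "BB": 0.33,
    "OUT": -0.1,
}


def linear_runs(events, weights=ILLUSTRATIVE_WEIGHTS):
    """Estimate runs as the sum of (weight * frequency) over all events."""
    return sum(weights[event] * count for event, count in events.items())


# Hypothetical team season line (event counts).
team = {"1B": 1000, "2B": 300, "3B": 30, "HR": 150, "BB": 500, "OUT": 4100}
print(round(linear_runs(team), 1))
```

Because the estimate is a plain weighted sum, doubling every event count doubles the estimate, and each event is worth the same number of runs regardless of what else the team does.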

Dynamic Estimators

Dynamic (or multiplicative) estimators attempt to model the run scoring process rather than simply assigning values to each event. These approaches recognize that the value of a given event depends on the context in which it occurs.

Examples of dynamic estimators include Bill James' Runs Created, Mike Gimbel's Run Productivity Average, David Smyth's Base Runs, and Eric Van's Contextual Runs.
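
As a minimal sketch of this context dependence, the code below uses the basic version of Runs Created, (H + BB) × TB / (AB + BB), and measures how many runs ten extra home runs add to two different teams. The two team lines and the helper name `value_of_extra_hr` are hypothetical and introduced only for this example; later versions of Runs Created use more inputs than shown here.

```python
def runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def value_of_extra_hr(h, bb, tb, ab, n=10):
    """Runs added by n extra home runs (each adds 1 AB, 1 H and 4 TB)."""
    before = runs_created(h, bb, tb, ab)
    after = runs_created(h + n, bb, tb + 4 * n, ab + n)
    return after - before


# Hypothetical team lines: one reaches base rarely, the other often.
low_obp_team = dict(h=1300, bb=400, tb=2000, ab=5500)
high_obp_team = dict(h=1500, bb=650, tb=2400, ab=5500)

print(round(value_of_extra_hr(**low_obp_team), 2))
print(round(value_of_extra_hr(**high_obp_team), 2))
```

Under this formula the same ten home runs are worth more to the team that puts more runners on base, which a fixed-coefficient linear estimator cannot express.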

Models

Sabermetricians have also developed more intensive models to describe run creation. Because they rely on more advanced mathematical techniques, these models cannot be written as a simple formula. Markov chains are among the most useful of these techniques and have been used by Gary Skoog, Jeff Sagarin, Tango Tiger, and John Beamer. Another technique is the probability-based approach used by D'Esopo and Lefkowitz and, more recently, by Carl Morris.
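
To give a flavor of the Markov chain approach, here is a heavily simplified sketch, not a reconstruction of any of the models named above. It treats an inning as a chain over base-out states and assumes every batter has the same event probabilities, that runners advance exactly as many bases as the batter on a hit, that a walk moves only forced runners, and that outs never move runners. The function names and the probabilities are hypothetical.

```python
from itertools import product


def advance(bases, event):
    """Return (new_bases, runs) for a non-out event under the simplified rules.

    bases is a tuple (first, second, third) of 0/1 flags.
    """
    first, second, third = bases
    if event == "BB":
        # Only forced runners move on a walk.
        if first and second and third:
            return (1, 1, 1), 1
        if first and second:
            return (1, 1, 1), 0
        if first:
            return (1, 1, third), 0
        return (1, second, third), 0
    n = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}[event]
    # Position 0 is the batter; positions 1 to 3 are the occupied bases.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    new_bases, runs = [0, 0, 0], 0
    for pos in positions:
        dest = pos + n
        if dest >= 4:
            runs += 1
        else:
            new_bases[dest - 1] = 1
    return tuple(new_bases), runs


def expected_runs_per_inning(probs, tol=1e-12):
    """Expected runs from (0 outs, bases empty), found by fixed-point iteration."""
    if abs(sum(probs.values()) - 1.0) > 1e-9 or probs.get("OUT", 0.0) <= 0.0:
        raise ValueError("probabilities must sum to 1 and include a positive OUT")
    events = [e for e in probs if e != "OUT"]
    states = [(outs, b) for outs in range(3) for b in product((0, 1), repeat=3)]
    value = {s: 0.0 for s in states}  # expected future runs from each state
    while True:
        delta = 0.0
        for outs, bases in states:
            # An out leaves runners in place; the third out ends the inning.
            total = probs["OUT"] * (value[(outs + 1, bases)] if outs < 2 else 0.0)
            for e in events:
                new_bases, runs = advance(bases, e)
                total += probs[e] * (runs + value[(outs, new_bases)])
            delta = max(delta, abs(total - value[(outs, bases)]))
            value[(outs, bases)] = total
        if delta < tol:
            return value[(0, (0, 0, 0))]


# Hypothetical per-plate-appearance probabilities for an identical-batter lineup.
probs = {"OUT": 0.68, "BB": 0.08, "1B": 0.15, "2B": 0.05, "3B": 0.01, "HR": 0.03}
print(round(expected_runs_per_inning(probs), 3))
```

A quick sanity check on the sketch: if the only possible events are home runs and outs, the expected runs per inning should equal 3 × p(HR) / p(OUT), the mean number of home runs before the third out.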

Accuracy of Run Estimators

The accuracy of a run estimator is usually assessed by comparing the estimates for major league teams to their actual scoring output. The root mean square error (RMSE) between the prediction and the actual figure is often used as a standard of comparison.
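
The RMSE comparison can be sketched as follows. The function name `rmse` and the two-team sample are hypothetical and serve only to show the calculation.

```python
import math


def rmse(estimated, actual):
    """Root mean square error between estimated and actual team run totals."""
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: two teams, each estimate off by 10 runs.
print(rmse([700, 800], [710, 790]))
```

A lower RMSE over the same set of team seasons indicates that an estimator tracks actual scoring more closely on that sample.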

Most published run estimators are similarly accurate for actual teams, so many sabermetricians are more interested in how they perform at extreme levels of performance. Since the range of performances is much wider for individuals than it is for teams, methods that are equally accurate at the team level can differ widely in accuracy when applied to players. To assess accuracy at extremes, sabermetricians use various techniques, including logical tests (How well does the estimator conform to common sense and the rules of baseball?), intrinsic linear weights (How does the estimator value each offensive event in various contexts?), and performance on the game or inning level (since games and innings incorporate a much wider range of offensive output than aggregate seasonal data).
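
One way to picture a logical test is to feed an estimator an impossible-in-practice but rule-abiding extreme and compare the result with what the rules of baseball dictate. The sketch below assumes the basic Runs Created formula, (H + BB) × TB / (AB + BB), and a hypothetical lineup that homers in every plate appearance. Since the bases are always empty, each home run must score exactly one run.

```python
def runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


plate_appearances = 100  # hypothetical sample size

# Every plate appearance is a home run: 1 H, 4 TB and 1 AB each, no walks.
estimated = runs_created(
    h=plate_appearances, bb=0, tb=4 * plate_appearances, ab=plate_appearances
)
# By the rules, every one of these home runs is a solo shot worth one run.
actual = plate_appearances

print(estimated, actual)
```

Under these assumptions the formula credits four runs per home run where only one can score, the kind of breakdown at the extremes that team-level RMSE would not reveal.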