Tango on Baseball Archives

© Tangotiger

SABR 301 - DIPS Bands (July 15, 2003)

This was published last year, but it's worthwhile to look at periodically. It's interesting to try to figure out the extent to which selective sampling based on perceived talent affects the (possibly random) distribution of performance.

--posted by TangoTiger at 02:18 PM EDT


Posted 3:50 p.m., July 16, 2003 (#1) - JK
  It's been a long while since I've done any stats work, so bear with me.

Taking $H* = (Teammate $H/Individual $H), the idea is to show that there is a statistically significant correlation between $H* and pitcher quality, with pitcher quality represented by BIP. The problem is that a pitcher's BIP is a function of how much a manager plays the pitcher, which in turn is a function of perceived quality, which in turn is a function of the pitcher's observed $H* to that point. So, ceteris paribus, pitchers who have exhibited better past $H* will get more BIP opportunities; thus, selection bias.
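To make the selection mechanism concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not data from the article: every pitcher is given the same true H/BIP (0.30), a hypothetical manager watches a 200-BIP trial, and anyone whose observed H/BIP is above a hypothetical 0.33 cutoff gets no more work. The names TRUE_RATE, TRIAL_BIP, EXTRA_BIP and CUTOFF are invented here.

```python
import random
import statistics

random.seed(1)
TRUE_RATE = 0.30   # assumed: every pitcher has the same true H/BIP
TRIAL_BIP = 200    # hypothetical look the manager gets before deciding
EXTRA_BIP = 800    # hypothetical further BIP for pitchers who are kept
CUTOFF = 0.33      # hypothetical: observed H/BIP above this ends the trial


def hits(n_bip):
    """Simulate hits allowed on n_bip balls in play at the common true rate."""
    return sum(random.random() < TRUE_RATE for _ in range(n_bip))


low_bip, high_bip = [], []
for _ in range(2000):
    h = hits(TRIAL_BIP)
    if h / TRIAL_BIP > CUTOFF:
        # Dropped after the trial: ends up in the low-BIP group.
        low_bip.append(h / TRIAL_BIP)
    else:
        # Kept on: accumulates more BIP at the very same true rate.
        h += hits(EXTRA_BIP)
        high_bip.append(h / (TRIAL_BIP + EXTRA_BIP))

low_mean = statistics.mean(low_bip)
high_mean = statistics.mean(high_bip)
print(f"low-BIP group:  n={len(low_bip)}, mean H/BIP={low_mean:.4f}")
print(f"high-BIP group: n={len(high_bip)}, mean H/BIP={high_mean:.4f}")
```

Under these assumptions the low-BIP group shows a worse H/BIP and the high-BIP group a slightly better one, even though no pitcher differs in ability. That gap is produced by the selection rule alone.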

IIRC the usual way to correct for selection bias is a two-step Heckman procedure, where you specify a "selection equation" (usually using a probit model) and then use the residuals of the selection equation to generate a control factor for the selection bias. Then in the second step you just run an OLS regression on the substantive question that you are looking at (e.g., $H* = B1*BIP + B2*X2 + . . . + Bn*Xn + Constant + e), but also including the selection bias control factor derived in the first regression as an independent variable.

Unfortunately, I just don't know how to pull this off; I'm not even sure how to configure the selection model properly. I do know most of the good commercial stats packages out there will do most of the heavy lifting for you if you get the basic specification right. Hopefully someone who knows a lot more about stats will post a solution.
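For readers who want to see the mechanics, the following is a rough sketch of a two-step Heckman correction on synthetic data, using only the Python standard library. It is not a specification for the DIPS question: the variables x (an outcome regressor), z (a variable assumed to affect selection only), the true slope of 2.0 and the error correlation RHO are all invented for the example. The control factor used here is the inverse Mills ratio, pdf/cdf evaluated at the fitted probit index, which is the usual choice in this procedure.

```python
import math
import random


def pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients via the normal equations."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def probit(W, s, iters=25):
    """Probit coefficients by Fisher scoring, starting from zero."""
    k = len(W[0])
    g = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for w, si in zip(W, s):
            t = dot(g, w)
            P = min(max(cdf(t), 1e-10), 1.0 - 1e-10)
            d = pdf(t)
            for i in range(k):
                grad[i] += d * (si - P) / (P * (1.0 - P)) * w[i]
                for j in range(k):
                    info[i][j] += d * d / (P * (1.0 - P)) * w[i] * w[j]
        g = [a + b for a, b in zip(g, solve(info, grad))]
    return g


# Synthetic data (all values hypothetical): y is observed only when picked == 1,
# and the selection error u is correlated with the outcome error e.
random.seed(0)
TRUE_SLOPE, RHO = 2.0, 0.7
W, s, W_sel, X_sel, y_sel = [], [], [], [], []
for _ in range(4000):
    x, z, u = (random.gauss(0, 1) for _ in range(3))
    e = RHO * u + math.sqrt(1 - RHO ** 2) * random.gauss(0, 1)
    w = [1.0, x, z]
    picked = 1 if 0.5 * x + z + u > 0 else 0
    W.append(w)
    s.append(picked)
    if picked:
        W_sel.append(w)
        X_sel.append([1.0, x])
        y_sel.append(1.0 + TRUE_SLOPE * x + e)

# Step 1: probit selection equation on everyone, then the inverse Mills ratio
# for the selected observations.
gamma = probit(W, s)
mills = [pdf(dot(gamma, w)) / cdf(dot(gamma, w)) for w in W_sel]

# Step 2: OLS on the selected sample, with and without the control factor.
naive = ols(X_sel, y_sel)
heckman = ols([row + [m] for row, m in zip(X_sel, mills)], y_sel)
print("naive slope:  ", round(naive[1], 3))
print("Heckman slope:", round(heckman[1], 3))
print("coefficient on Mills ratio:", round(heckman[2], 3))
```

The hard part in practice is the one raised above: choosing a selection equation, and in particular finding a variable like z that plausibly drives selection without driving the outcome. The sketch sidesteps that by construction. It also omits the standard-error correction that a statistics package would apply in the second step.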

Posted 3:59 p.m., July 16, 2003 (#2) - Nick S
  Nice chart. This is generally supportive of the DIPS methodology (i.e., pitchers have little variation in their true ability to prevent H/BIP). The average groups have right about 2/3 of pitchers within 1 SD, as would be expected if all pitchers had the very same ability to prevent H/BIP. Towards the low BIP end, the sample is weighted towards pitchers who did poorly at preventing H/BIP, which would, of course, raise their ERA and lower the likelihood of their staying around the majors. This does not mean that the pitchers necessarily had a lower $H ability (although, on average, I'm sure they did, just not to any great effect), but rather that they were unlucky. Poor fellows.

The upper end is just the opposite, but with a similar interpretation. It does seem likely, with these large sample sizes, that some pitchers do have a better than average $H ability, but you should note that for a pitcher like Maddux, who was 0.008 H/BIP better than average, this comes out to about 5 hits over the course of a season (maybe 3 runs or so). The absolute upper limit for a pitcher is probably less than 0.03 H/BIP, which would be about 20 hits in a season.
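The two numerical claims in this comment can be checked with a few lines of arithmetic. The sketch below rests on assumptions that are not stated in the thread: a league H/BIP of 0.30, roughly 650 BIP in a full season (the figure at which 0.008 H/BIP works out to about 5 hits), and 0.6 runs per hit, which is simply back-solved from "5 hits ... maybe 3 runs". It also assumes the SD in question is the binomial SD for a pitcher of league-average ability.

```python
import math

LEAGUE_RATE = 0.30     # assumed league-average H/BIP
BIP_PER_SEASON = 650   # assumed full-season BIP for a starter
RUNS_PER_HIT = 0.6     # assumed; back-solved from the comment's 5 hits ~ 3 runs


def hits_saved(edge, bip=BIP_PER_SEASON):
    """Hits prevented per season by an H/BIP edge over league average."""
    return edge * bip


def share_within_one_sd(n_bip, rate=LEAGUE_RATE):
    """Exact binomial chance that observed H/BIP falls within 1 SD of rate."""
    sd_hits = math.sqrt(n_bip * rate * (1 - rate))
    lo = math.ceil(n_bip * rate - sd_hits)
    hi = math.floor(n_bip * rate + sd_hits)
    return sum(
        math.comb(n_bip, h) * rate ** h * (1 - rate) ** (n_bip - h)
        for h in range(lo, hi + 1)
    )


maddux_hits = hits_saved(0.008)    # about 5 hits
ceiling_hits = hits_saved(0.03)    # about 20 hits
print(f"0.008 edge: {maddux_hits:.1f} hits, {maddux_hits * RUNS_PER_HIT:.1f} runs")
print(f"0.030 edge: {ceiling_hits:.1f} hits")
print(f"share within 1 SD at {BIP_PER_SEASON} BIP: "
      f"{share_within_one_sd(BIP_PER_SEASON):.3f}")
```

With identical true ability, the share within one binomial SD comes out near two thirds, which is the benchmark the comment compares the chart against.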

Posted 4:33 p.m., July 16, 2003 (#3) - JK
  A thought:
Create dummy variables for each BIP band. Then run the Heckman where the selection model is a probit model estimating the probability of belonging in a given BIP band given $H* and any other included independent variables. Then import the control factor based on those residuals into an OLS regression of $H* against the BIP category dummies plus other included independent variables plus the control factor. Only thing is, I'm not quite sure how this is done for multiple exclusive dummy variables.
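As a small illustration of the first step only, here is one way the band dummies might be coded. The cut points in BAND_EDGES are hypothetical, not the bands used in the article. Because the bands are mutually exclusive, one of them (here the lowest) is left out as the reference category so the dummies are not perfectly collinear with the constant term. This sketch does not address the open question of how to run the selection step across several exclusive categories.

```python
from bisect import bisect_right

BAND_EDGES = [500, 1000, 2000, 4000]  # hypothetical career-BIP cut points


def band_index(bip):
    """Return 0 for the lowest band, up to len(BAND_EDGES) for the highest."""
    return bisect_right(BAND_EDGES, bip)


def band_dummies(bip):
    """One 0/1 dummy per band above the lowest, which is the omitted reference."""
    idx = band_index(bip)
    return [1 if idx == k else 0 for k in range(1, len(BAND_EDGES) + 1)]


print(band_dummies(300))    # lowest band: all zeros
print(band_dummies(1500))   # third band (1000 to 1999 BIP)
print(band_dummies(5000))   # highest band
```

Each pitcher's row in the second-step regression would then carry these dummies in place of a single continuous BIP term.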