Forecasting Pitchers - Adjacent Seasons

© Tangotiger

Process

The following table presents the performance of pitchers, year-to-year, based on age, from 1955 to 2003. It is limited to pitchers with at least 250 PA in the same league, year-to-year. Each pitcher in the sample is equally weighted. The numbers represent the pitcher's ratio, relative to the league's ratio, in various component categories.

Since that probably made no sense, let's take an example. Rick Wise, at the age of 21, in 1966, faced 413 batters, and hit 3 of them. (It's actually 416, but I removed the 3 batters he IBB.) That's a rate of .0073. In the AL in 1966, that rate was .0053. I also computed Wise's rate at the age of 22 in 1967.

I repeated this step for all pitchers aged 21/22. I took the simple average of all the pitcher rates and of all the league rates. This gave me a HBP rate of .0058 at age 21 and .0059 at age 22. The league rates were .0057 and .0058. As you can see, these age 21/22 pitchers hit batters at a slightly higher rate than the overall league rate.

I then turned these rates into ratios (.0058 became .0058/(1-.0058) = .00583, and .0057 became .00573). That's the ratio of hit batters to batters not hit.

Then, you normalize to the league. So, .00583/.00573 = 1.017 for age 21, and 1.017 for age 22. If I went to more decimal places, the age 22 would have been slightly higher.

Finally, I take the normalized age 22 ratio and divide it by the normalized age 21 ratio, and that gives you 1.004. What this number represents is that a pitcher is slightly more likely (by about 0.4%) to hit a batter at age 22 than at age 21.
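The steps above can be sketched in a few lines of Python. This is only an illustration using the rounded rates quoted in the text; the function names are mine, not part of the original study. Because the quoted rates are rounded to four decimals, the year-to-year figure it prints lands close to 1 but will not reproduce the 1.004 obtained from the unrounded data.

```python
def odds(rate):
    """Convert a rate (events per opportunity) into a ratio (events per non-event)."""
    return rate / (1.0 - rate)


def normalized_ratio(pitcher_rate, league_rate):
    """Pitcher ratio relative to the league ratio."""
    return odds(pitcher_rate) / odds(league_rate)


# Rick Wise, 1966: 3 hit batters in 413 non-IBB batters faced.
wise_rate = 3 / 413

# Group averages quoted in the text (rounded to four decimals).
age21 = normalized_ratio(0.0058, 0.0057)
age22 = normalized_ratio(0.0059, 0.0058)

# Year-to-year change: normalized age 22 ratio over normalized age 21 ratio.
change = age22 / age21

print(round(wise_rate, 4), round(age21, 3), round(age22, 3), round(change, 3))
```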

Year-to-year change in performance

Here is the result of all that for the following ratios:
HBP: HBP/(PA-IBB-HBP)
BB: (BB-IBB)/(PA-HBP-BB)
SO: SO/(PA-HBP-BB-SO)
HR: HR/(PA-HBP-BB-SO-HR)
xH: (H-HR)/(PA-HBP-BB-SO-H)
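Each ratio removes the events already counted by the ratios before it, so the denominators shrink as you go down the list. A minimal sketch of the five definitions follows; the function name and the stat line are hypothetical, and it assumes PA is raw batters faced and BB is total walks (intentional ones included), which is how the denominators above read.

```python
def component_ratios(pa, ibb, hbp, bb, so, hr, h):
    """Compute the five component ratios exactly as listed above.

    Assumes `pa` is raw batters faced and `bb` includes intentional walks.
    """
    after_walks = pa - hbp - bb          # PA-HBP-BB
    after_strikeouts = after_walks - so  # PA-HBP-BB-SO
    return {
        "HBP": hbp / (pa - ibb - hbp),
        "BB": (bb - ibb) / after_walks,
        "SO": so / after_strikeouts,
        "HR": hr / (after_strikeouts - hr),
        "xH": (h - hr) / (after_strikeouts - h),
    }


# Hypothetical season line, for illustration only.
ratios = component_ratios(pa=700, ibb=5, hbp=4, bb=55, so=120, hr=18, h=160)
for name, value in ratios.items():
    print(name, round(value, 4))
```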


Age1	Age2	n	 HBP 	 BB 	 SO 	 HR 	 xH
20	21	31	 0.777 	 0.997 	 1.029 	 1.191 	 1.016
21	22	88	 1.004 	 0.993 	 0.988 	 0.971 	 1.009
22	23	206	 0.987 	 0.957 	 0.974 	 1.046 	 1.024
23	24	350	 1.021 	 0.982 	 0.973 	 1.055 	 1.020
24	25	464	 1.011 	 0.986 	 0.996 	 1.062 	 1.014
25	26	586	 0.986 	 0.976 	 0.983 	 1.031 	 1.003
26	27	628	 0.980 	 1.001 	 0.959 	 1.032 	 1.011
27	28	592	 1.089 	 1.005 	 0.966 	 1.030 	 1.030
28	29	520	 1.061 	 0.982 	 0.964 	 1.053 	 1.021
29	30	478	 0.941 	 1.022 	 0.955 	 1.057 	 1.028
30	31	389	 1.067 	 1.013 	 0.937 	 1.083 	 1.008
31	32	351	 0.992 	 0.999 	 0.962 	 1.069 	 1.023
32	33	293	 1.009 	 1.017 	 0.941 	 1.073 	 1.025
33	34	239	 1.039 	 1.005 	 0.967 	 1.053 	 1.025
34	35	183	 1.134 	 1.031 	 0.970 	 1.075 	 1.025
35	36	135	 1.018 	 1.048 	 0.938 	 1.078 	 1.021
36	37	106	 0.892 	 1.010 	 0.971 	 1.085 	 1.018
37	38	67	 1.017 	 0.981 	 0.976 	 1.037 	 1.026
38	39	42	 1.012 	 1.021 	 0.922 	 0.989 	 1.029
39	40	32	 1.249 	 1.095 	 0.948 	 1.103 	 1.033

So, what does this show us? According to this list, a pitcher's strike out rate goes down for every year-to-year pair after age 21 (the lone exception is the small 20-to-21 sample, at 1.029). His age 22 K ratio is 98.8% of what it was at age 21. For every year after that, it's the same thing. A pitcher's hits allowed per ball in play go up for every year-to-year pair.

The next thing to do is to "chain" them. If from age 21 to 22 the K ratio is .988 and from age 22 to 23 it's .974, then from 21 to 23 it's .988 x .974 = .962. Got that?
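Chaining is just a running product. The sketch below uses the first four SO values from the year-to-year table and then rescales so that the best age reads 1.000, which is my reading of how a chained table would be presented (an assumption, as are the variable names). Since it starts from the rounded table values, the third decimal can differ slightly from a chain built on unrounded data.

```python
from itertools import accumulate
from operator import mul

ages = [20, 21, 22, 23, 24]
# Year-to-year SO ratios for 20->21, 21->22, 22->23, 23->24.
so_changes = [1.029, 0.988, 0.974, 0.973]

# Running product, starting from an arbitrary level of 1.0 at age 20.
levels = list(accumulate([1.0] + so_changes, mul))

# For strikeouts higher is better, so the peak is the maximum level.
peak = max(levels)
chained = {age: level / peak for age, level in zip(ages, levels)}

for age in ages:
    print(age, round(chained[age], 3))
```

For the other four categories lower is better, so the same rescaling would divide by the minimum level instead of the maximum.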

Year-to-year performances, chained

Here then is the result of all that:


Age	 HBP 	 BB 	 SO 	 HR 	 xH
20	 1.301 	 1.131 	 0.972 	 0.865 	 1.000
21	 1.011 	 1.128 	 1.000 	 1.030 	 1.016
22	 1.015 	 1.119 	 0.988 	 1.000 	 1.025
23	 1.002 	 1.072 	 0.963 	 1.046 	 1.049
24	 1.023 	 1.053 	 0.936 	 1.104 	 1.070
25	 1.034 	 1.038 	 0.933 	 1.172 	 1.085
26	 1.020 	 1.013 	 0.917 	 1.208 	 1.089
27	 1.000 	 1.013 	 0.879 	 1.246 	 1.101
28	 1.089 	 1.018 	 0.850 	 1.284 	 1.134
29	 1.156 	 1.000 	 0.819 	 1.352 	 1.158
30	 1.088 	 1.022 	 0.782 	 1.429 	 1.190
31	 1.161 	 1.035 	 0.732 	 1.548 	 1.200
32	 1.151 	 1.034 	 0.704 	 1.655 	 1.228
33	 1.162 	 1.052 	 0.663 	 1.776 	 1.259
34	 1.207 	 1.057 	 0.641 	 1.870 	 1.290
35	 1.368 	 1.090 	 0.622 	 2.010 	 1.322
36	 1.392 	 1.143 	 0.584 	 2.167 	 1.351
37	 1.242 	 1.155 	 0.567 	 2.351 	 1.375
38	 1.263 	 1.133 	 0.553 	 2.439 	 1.411
39	 1.279 	 1.156 	 0.510 	 2.411 	 1.452
40	 1.597 	 1.266 	 0.484 	 2.659 	 1.500

According to this table, a pitcher peaks at age 27 for hit batters, 29 for walks, 21 for strike outs, 22 for HR (the age 20 figure of 0.865 is lower, but it rests on only 31 pitchers), and 20 for hits on balls in play. The average would be around 23 or 24.

What's the problem?

Selective sampling, and a lack of regression towards the mean. The pitchers who are allowed to have back-to-back years of 250 PA have, on average, a better than league performance in year X. Performance does not equal ability. Observed performance is equal to the underlying true talent plus luck. And the better your performance, the more likely it's good luck and not bad luck (on average and for a large enough group of players).
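A toy simulation makes the point about luck concrete. Everything here is hypothetical: talent and luck are drawn from the same normal distribution, higher numbers mean better performance, and the cutoff of 1.0 stands in for "good enough to be given another 250 PA". It is not a model of real pitchers, only a sketch of the selection effect.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

players = []
for _ in range(20000):
    talent = random.gauss(0.0, 1.0)  # underlying true talent
    luck = random.gauss(0.0, 1.0)    # random variation in one season
    players.append((talent, luck, talent + luck))

# Keep only the players whose observed performance clears the bar.
selected = [p for p in players if p[2] > 1.0]

mean_talent = sum(p[0] for p in selected) / len(selected)
mean_luck = sum(p[1] for p in selected) / len(selected)
mean_observed = sum(p[2] for p in selected) / len(selected)

print(len(selected), round(mean_talent, 2), round(mean_luck, 2), round(mean_observed, 2))
```

Across all players the average luck is near zero, but among the selected group it is clearly positive, so their observed performance overstates their talent.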

So, what we need is to regress each year X performance, before comparing it to year X+1. But, how much to regress? In another study, I concluded that the regression for pitchers with 650 PA (the average of these year-to-year pairs) was about 10% for K, 20% for BB, 50% for HR, H, and HBP. So, let's see what happens when I use these.
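One way to picture the adjustment is below. It assumes that "regress" means pulling the year X ratio toward the league average (a normalized ratio of 1.0) by the stated percentage; the text does not spell out the exact formula, so treat this as a sketch. The two pitcher ratios are made-up numbers.

```python
# Regression amounts quoted in the text for roughly 650 PA.
REGRESSION = {"SO": 0.10, "BB": 0.20, "HR": 0.50, "xH": 0.50, "HBP": 0.50}


def regress(ratio, amount, mean=1.0):
    """Pull a league-normalized ratio toward `mean` by the fraction `amount`."""
    return mean + (1.0 - amount) * (ratio - mean)


# Hypothetical pitcher: SO ratio 10% above league in year X, 7% above in year X+1.
year_x = 1.10
year_x1 = 1.07

raw_change = year_x1 / year_x
regressed_change = year_x1 / regress(year_x, REGRESSION["SO"])

print(round(raw_change, 3), round(regressed_change, 3))
```

Regressing the year X figure first shrinks the apparent decline, because part of the year X performance is treated as luck that was never going to carry over.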

Year to year performances, chained, and regressed


Age	 HBP 	 BB 	 SO 	 HR 	 xH
20	 1.745 	 1.248 	 0.953 	 0.875 	 0.993
21	 1.360 	 1.286 	 0.996 	 1.000 	 1.000
22	 1.375 	 1.322 	 1.000 	 1.002 	 1.002
23	 1.368 	 1.299 	 0.984 	 1.045 	 1.013
24	 1.396 	 1.297 	 0.964 	 1.075 	 1.024
25	 1.389 	 1.287 	 0.964 	 1.112 	 1.033
26	 1.381 	 1.263 	 0.952 	 1.123 	 1.027
27	 1.351 	 1.265 	 0.920 	 1.143 	 1.027
28	 1.401 	 1.264 	 0.894 	 1.136 	 1.043
29	 1.473 	 1.229 	 0.866 	 1.143 	 1.057
30	 1.372 	 1.233 	 0.831 	 1.149 	 1.071
31	 1.377 	 1.224 	 0.783 	 1.189 	 1.068
32	 1.293 	 1.188 	 0.756 	 1.218 	 1.077
33	 1.235 	 1.168 	 0.712 	 1.261 	 1.084
34	 1.183 	 1.132 	 0.688 	 1.290 	 1.095
35	 1.234 	 1.119 	 0.669 	 1.341 	 1.101
36	 1.193 	 1.127 	 0.631 	 1.413 	 1.097
37	 1.045 	 1.093 	 0.613 	 1.483 	 1.099
38	 1.039 	 1.039 	 0.602 	 1.498 	 1.108
39	 1.000 	 1.000 	 0.562 	 1.450 	 1.130
40	 1.139 	 1.029 	 0.536 	 1.464 	 1.149

Whoa, big difference. Now, a pitcher peaks in hit batters and walks at age 39! This is very interesting, since I have hitters also peaking with walks in their late 30s. That is, both hitters and pitchers get much smarter as they age, such that pitchers reduce their walk rates into their late 30s (presumably because they can't overpower hitters) and hitters increase their walk rates into their late 30s (presumably because they can't overpower pitchers).

Even with regression, a pitcher's K, HR and hits ratios top off at age 21 or 22.

What if my regression rates are wrong? Well, that's definitely a possibility. How much you regress has a huge impact on the whole chaining process. What kind of regression rate would I need to make the K rate peak a little later? If I force in a regression rate of 30% for K (instead of 10%), and 75% for HR and hits (instead of 50%), here's what we get:

Year to year performances, chained, and regressed (part 2)


Age	 HBP 	 BB 	 SO 	 HR 	 xH
20	 1.745 	 1.248 	 0.887 	 0.893 	 1.006
21	 1.360 	 1.286 	 0.956 	 1.000 	 1.009
22	 1.375 	 1.322 	 0.993 	 1.019 	 1.008
23	 1.368 	 1.299 	 0.997 	 1.060 	 1.013
24	 1.396 	 1.297 	 0.992 	 1.078 	 1.019
25	 1.389 	 1.287 	 1.000 	 1.100 	 1.025
26	 1.381 	 1.263 	 0.998 	 1.100 	 1.015
27	 1.351 	 1.265 	 0.978 	 1.112 	 1.009
28	 1.401 	 1.264 	 0.960 	 1.086 	 1.018
29	 1.473 	 1.229 	 0.941 	 1.069 	 1.027
30	 1.372 	 1.233 	 0.913 	 1.050 	 1.034
31	 1.377 	 1.224 	 0.870 	 1.062 	 1.026
32	 1.293 	 1.188 	 0.847 	 1.066 	 1.026
33	 1.235 	 1.168 	 0.797 	 1.085 	 1.024
34	 1.183 	 1.132 	 0.770 	 1.094 	 1.026
35	 1.234 	 1.119 	 0.751 	 1.118 	 1.023
36	 1.193 	 1.127 	 0.716 	 1.165 	 1.007
37	 1.045 	 1.093 	 0.698 	 1.203 	 1.000
38	 1.039 	 1.039 	 0.693 	 1.199 	 1.000
39	 1.000 	 1.000 	 0.661 	 1.149 	 1.015
40	 1.139 	 1.029 	 0.640 	 1.113 	 1.024

Now what do we have? Hit batters and walks remain with a peak of age 39. K rates peak at age 25 (and that's as high as I can get it). HR rates peak at age 21, but are fairly static between age 23 and 35. Hits allowed on balls in play are essentially completely flat (i.e., a pitcher's ability does not change).

Conclusion

Be careful!

Seriously, how the selective sampling issues are dealt with will severely impact the results.

You can also consider taking pitchers with at least 5 consecutive years of 250 PAs. But, there's selective sampling there too. If a pitcher managed to get that much playing time, then chances are his performances did not deteriorate as much as others did. If you took pitchers with at least 15 years, what do you think we'd get? Exactly. Rather flat aging processes. However, what these different selections will give you is sort of a "max". If for every different selection criterion you have, the K rate tops off at age 25, then you know what? Chances are, that's the peak.

Number of prior MLB PAs might also affect the aging process.