SuperLWTS Aging Curve (January 26, 2004)
This is from MGL.
If you remember my article from last year, this was the hitting aging curve I came up with. To compare the two charts, note that my 80% of peak is about -25 runs per 680 PA.
MGL's chart is more instructive, as it includes baserunning and fielding.
Update: MGL's (PDF) file above includes the aging curves for each of the components (hitting, fielding, arm, speed, etc.). His last chart (hitting) is the one that is comparable to mine (and is pretty much in line with what I have).
--posted by TangoTiger at 10:34 AM EDT
Posted 11:09 a.m.,
January 26, 2004
(#1) -
David Smyth
It looks like including the fielding and baserunning "lowers" the peak season a bit to 26, which makes sense. I guess it might be better to use a peak "period" of 25 thru 29 in conversation, instead of trying to get more specific (peak "age").
How does this chart handle the selective sampling problems? Is this just the players who actually played?
Finally, it's interesting to compare that chart vs a salary chart. Young players get paid too little, and old players too much, relative to production. I guess to a degree that's inevitable. But why should a team pay a -13 Slwts 34 yr old, when they can find a minor leaguer of 23 who can produce the same at a lower salary, and with a better chance of improvement?
Posted 11:36 a.m.,
January 26, 2004
(#2) -
tangotiger
I agree that the peak age does go down a bit, since fielding/baserunning is much more speed dependent (and we know those peak pretty early).
I usually use 25-29 to denote peak.
I would almost always stay away from a free agent, since they are usually paid above what they can deliver in the future. No question that trying to make it off the backs of the young guys, if you can find those quality young guys, has a great return for the money.
Posted 1:12 p.m.,
January 26, 2004
(#3) -
tangotiger
I should have noted how MGL did this. He took players from successive seasons (not sure how far back) with at least 100 PAs in each season, and weighted them by the lesser of the two PAs.
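The pairing-and-weighting step described above can be sketched roughly as follows. This is only an illustration of the idea, not MGL's actual code: the function name `age_deltas`, the tuple layout, and the sample numbers are all made up, and the PA cutoff is left as a parameter (MGL later says in this thread that he used 50, not 100).

```python
from collections import defaultdict

def age_deltas(seasons, min_pa=100):
    """Weighted mean change in rate from each age to the next.

    seasons: iterable of (player, age, pa, rate) tuples, where rate is
    runs per full season (hypothetical layout, for illustration only).
    Each back-to-back pair is weighted by the lesser of the two PA totals.
    """
    by_player = defaultdict(dict)
    for player, age, pa, rate in seasons:
        by_player[player][age] = (pa, rate)

    num = defaultdict(float)  # sum of weight * change, keyed by first age
    den = defaultdict(float)  # sum of weights, keyed by first age
    for ages in by_player.values():
        for age, (pa1, rate1) in ages.items():
            nxt = ages.get(age + 1)
            if nxt is None:
                continue  # no successive season, so no pair
            pa2, rate2 = nxt
            if pa1 < min_pa or pa2 < min_pa:
                continue  # both seasons must clear the PA cutoff
            w = min(pa1, pa2)
            num[age] += w * (rate2 - rate1)
            den[age] += w
    return {age: num[age] / den[age] for age in sorted(den)}

# Made-up sample data: three players at ages 26 and 27.
seasons = [
    ("A", 26, 600, 10.0), ("A", 27, 400, 6.0),
    ("B", 26, 200, 0.0),  ("B", 27, 500, 2.0),
    ("C", 26, 80, 5.0),   ("C", 27, 300, 20.0),
]
print(age_deltas(seasons))             # {26: -2.0}
print(age_deltas(seasons, min_pa=50))  # {26: 0.0}
```

With the 100 PA cutoff, player C drops out and the 26-to-27 change is -2.0; lowering the cutoff to 50 lets C's big jump back in and moves the average to 0.0, which shows how sensitive a thin age bucket can be to the inclusion rule.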
There are selective sampling issues to be sure.
One thing that I've started to do is regress the performance of year 1. For year 2, there are other issues as well, which can be partly handled by regression, but not to the same extent or in the same way as the year 1 regressions. As well, the weighting issue is a little dicey, and you have to be careful there.
I'm sure those with stat knowledge of Heckman Selection Correction can speak better than I can on this issue.
Posted 6:35 p.m.,
January 26, 2004
(#4) -
MGL
It was a quick and dirty chart! I thought it would be interesting because it was everything (Superlwts) and not just batting. I only used 00-03 (4 years). It is also nice to do these analyses using context-neutral data, although it may not make much difference with a large enough sample size. The context does affect the selective sampling issues though (e.g., a marginally talented player in LA is more likely to get sent down or released or retire than a marginally talented player in Col).
I did not address any selective sampling issues, plus my sample sizes are small, especially at the corners. 22 and 38 year-olds numbered around 20 in my sample, up to 110 or so for the 26-29 yo's (25 to 31 is by far the biggest group of players, in terms of numbers). In terms of PA's at the various ages, the dropoffs are smoother, i.e., only the very good (talentwise) young and old players get lots of playing time, while mediocre peak players get lots of playing time.
I also want to do separate curves for each Superlwt component! That might be interesting. The SB/CS lwt curve will probably surprise you. Even though players run a lot more when younger, players who run a lot, on the average, because they are young or fast, or both, basically run themselves into around zero net runs. There will probably be no age curve for SB/CS lwts just like I found that there is virtually no y-t-y correlation for SB/CS lwts, which shocked me, until I realized that all players on the average, whether they run a little or a lot, or whether they are fast or slow, basically run themselves into zero lwts, which is amazing when you think about it! For the few players who do steal a lot and have a high SB %, 2 things happen to make the r for all players almost zero: one, there are very few of those players. Most players who steal a lot steal at a rate of 65-75%, which means they are literally spinning their wheels. And two, even some of the ones that have high SB% one year can have dismal ones the next (e.g., L. Castillo). Basically there is no predictive value to a player's SB/CS lwts, so for example for a player like Beltran who is arguably a very good overall player, I would attach almost no value to his SB/CS lwts for purposes of talent or projections. Sure, SB attempts and success rate separately might have some decent r, especially the former (although the age curve for SB attempts is probably real sharp), but who cares about either one of those, per se. All you care about is the net value, unless ALL YOU KNEW was one or the other, of course!
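To see why a 65-75% success rate amounts to "spinning their wheels", here is a small arithmetic sketch. The run values used (+0.20 per SB, -0.45 per CS) are round illustrative numbers assumed for this example; they are not the coefficients MGL uses in Superlwts, which are not given in this thread.

```python
# Illustrative run values only (assumed for this sketch, NOT MGL's
# actual Superlwts coefficients).
SB_RUNS = 0.20   # assumed run value of a stolen base
CS_RUNS = -0.45  # assumed run value of a caught stealing

def sb_net_runs(sb, cs, sb_runs=SB_RUNS, cs_runs=CS_RUNS):
    """Net linear-weights runs from stolen base attempts."""
    return sb * sb_runs + cs * cs_runs

def break_even_rate(sb_runs=SB_RUNS, cs_runs=CS_RUNS):
    """Success rate at which net runs are exactly zero."""
    return -cs_runs / (sb_runs - cs_runs)

print(round(break_even_rate(), 3))       # break-even success rate
print(round(sb_net_runs(35, 15), 2))     # 50 attempts at 70% success
```

Under these assumed values the break-even rate comes out near 69%, and a runner going 35-for-50 (70%) nets only about a quarter of a run, so a heavy base stealer in the 65-75% range ends up close to zero either way.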
Posted 7:28 p.m.,
January 26, 2004
(#5) -
FJM
I'm confused. If as you say "only the very good (talentwise) young and old players get lots of playing time", how can the weights from 33 on all be increasingly negative? Wouldn't such players decline more slowly (or even improve at an advanced age, ala Barry Bonds) than the typical player? How do I interpret the -50 for a 39-year old?
Posted 7:54 p.m.,
January 26, 2004
(#6) -
MGL
Oh BTW, the plateau between 26 and 29 is probably a sample size fluke. I suspect that with Superlwts at least, it should be a smooth decline after age 26. I could be wrong though. In general, when you do graphs like that, especially with limited sample sizes, and particularly since each data point on the x axis could have severe sample size problems, you want to smooth out the curves on the graph, especially since most relationships are indeed smooth (though the shape is not necessarily obvious or evident)...
Also the absolute numbers on the y axis (on the left) mean nothing. I just arbitrarily set the youngest age (22) to zero. It is a scale of lwts per 680 PA's (162 average games) though...
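For anyone wondering how year-to-year changes turn into a curve whose zero point is arbitrary, here is a minimal sketch, assuming the curve is built by chaining the average change at each age starting from the youngest age. The helper names and the sample changes are invented for illustration and are not taken from MGL's data.

```python
def chain_curve(deltas, start_age):
    """Build a curve from per-age changes, anchored at zero at start_age.

    deltas: {age: average change from age to age + 1}.
    """
    curve = {start_age: 0.0}
    age = start_age
    while age in deltas:
        curve[age + 1] = curve[age] + deltas[age]
        age += 1
    return curve

def peak_age(curve):
    """Age with the highest value on the chained curve."""
    return max(curve, key=curve.get)

# Made-up changes, only to show the mechanics.
example_deltas = {22: 4.0, 23: 3.0, 24: 1.0, 25: -2.0}
curve = chain_curve(example_deltas, start_age=22)
print(curve)            # {22: 0.0, 23: 4.0, 24: 7.0, 25: 8.0, 26: 6.0}
print(peak_age(curve))  # 25
```

Adding any constant to every point leaves the shape and the peak age unchanged, which is why only the differences between ages carry information.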
Posted 8:12 p.m.,
January 26, 2004
(#7) -
MGL
I'm confused. If as you say "only the very good (talentwise) young and old players get lots of playing time", how can the weights from 33 on all be increasingly negative? Wouldn't such players decline more slowly (or even improve at an advanced age, ala Barry Bonds) than the typical player? How do I interpret the -50 for a 39-year old?
Don't get confused by the fact that only the really good old and young players are playing and getting substantial time in the major leagues. It's not because they age slower or their peak is any different than any other player's. There is no evidence that that is the case, although surely everyone has his own unique peak age (although it is next to impossible to figure it out).
OK, I take that back. Yes, it is probably true that players who are old and still good, as a group have either aged at a slower rate or peaked at a later age. If that's the case, so what?
The only point I was trying to make is that marginal players will tend to get lots of playing time around their peak years only because they are not good enough to play full time when they are young or old, but that very good players are good enough to play full time at all points in their careers, and obviously they are great players around their peak age...
Posted 8:14 p.m.,
January 26, 2004
(#8) -
MGL
I hit the post button too soon. The curve is based on at least 50 PA's per back to back season and not 100, not that anyone cares. I gave Tango the wrong number.
I agree with Tango that even if you successfully adjust for the selective sampling issues, etc., the aging curve is going to basically look the same. It seems like no matter what anyone does, it always comes out looking the same...
Posted 10:24 a.m.,
January 27, 2004
(#9) -
FJM
I'm not trying to be argumentative, but it seems to me you are making 2 statements that are mutually exclusive. If marginal players get significant playing time only in their peak years while good/great players play a lot both early and late in their careers, then it seems to me there must be at least 2 and possibly 3 different aging curves. Is it possible to construct a curve for those with significant playing time before age 24 and/or after age 33 and then a separate one for everybody else?
Posted 10:33 a.m.,
January 27, 2004
(#10) -
tangotiger
File has been updated.
Posted 12:26 p.m.,
January 27, 2004
(#11) -
MGL
FJM, the problem is that you are introducing SEVERE selective sampling issues, in fact, the mother of all selective sampling issues in constructing different aging curves based on playing time! The idea of an aging curve is to model "real" aging patterns, not observed ones (although without selective sampling that is one and the same), such that you can use it to project and adjust other players' stats.
I don't really know how to explain it, plus I don't have the time right now! Maybe someone else can chime in and help, or maybe I am wrong...
Posted 2:53 p.m.,
January 27, 2004
(#12) -
AED
MGL, I think you're overstating the sampling issue here. The fact that a player's future number of opportunities can be affected by luck in his earlier opportunities does not negate the fact that the opportunities themselves were unbiased measurements of ability.
Also, as long as I'm nitpicking, the statistical uncertainty in the difference of lwts/680 between two seasons is something like sqrt(1/N1+1/N2). So in terms of weighting the values for different players, the weight should equal 1/(1/N1+1/N2), not min(N1,N2).
Tango, no need to regress anything here, unless you want to make a prior assumption of the shape of the aging curve (which is dangerous since it's what you're trying to measure).
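The difference between the two weighting schemes is easy to see with a couple of numbers. The sketch below just codes up the two formulas from this comment; the function names are made up, and the uncertainty is only proportional to sqrt(1/N1+1/N2) ("something like"), so the weights matter only relative to each other.

```python
import math

def diff_sd(n1, n2):
    """Relative uncertainty of the year-to-year difference: sqrt(1/N1 + 1/N2)."""
    return math.sqrt(1.0 / n1 + 1.0 / n2)

def inverse_variance_weight(n1, n2):
    """Weight proportional to 1 / variance of the difference."""
    return 1.0 / (1.0 / n1 + 1.0 / n2)

def min_weight(n1, n2):
    """The 'lesser of the two PAs' weight used for the chart."""
    return min(n1, n2)

for n1, n2 in [(600, 600), (100, 600)]:
    print(n1, n2, min_weight(n1, n2), round(inverse_variance_weight(n1, n2), 1))
# 600 600 600 300.0
# 100 600 100 85.7
```

Under min(N1,N2) the 600/600 pair counts 6 times as much as the 100/600 pair; under the inverse-variance weight it counts only 3.5 times as much, so the two schemes are not just rescalings of each other.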
Posted 3:01 p.m.,
January 27, 2004
(#13) -
tangotiger
(homepage)
AED, If you have not checked it out yet, go to the above homepage link, and click on the article there.
I'd be interested to hear your thoughts on the sampling issue, insofar as the PA component plays a role.
Posted 6:24 p.m.,
January 27, 2004
(#14) -
FJM
I really don't see your problem, since you yourself suggested that playing time early (and late) in a career is a reliable indicator of true talent. But OK, let's say that it isn't reliable. Then use something else as a classification criteria, such as OPS+ or LWTS or whatever you like, but limited to the peak years, 26-29. Classify everybody at least one standard deviation above the mean as a good/great player, and everybody at least one s.d. below the mean as a marginal player. Everybody else is average. Now construct 3 aging curves. The curve for the average group should look pretty much like the overall curve, but I'll be very surprised if the other 2 don't look quite different.
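The grouping step proposed here could look something like the sketch below. It assumes each player's peak-years (26-29) rate has already been computed; the function name, the use of the population standard deviation, and the sample numbers are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def classify_by_peak(peak_rates):
    """Split players into three groups by peak-years (26-29) performance.

    peak_rates: {player: rate over ages 26-29}, in whatever metric is chosen
    (OPS+, LWTS, ...). At least one SD above the mean is "good/great", at
    least one SD below is "marginal", everyone else is "average".
    """
    mu = mean(peak_rates.values())
    sd = pstdev(peak_rates.values())
    groups = {}
    for player, rate in peak_rates.items():
        if rate >= mu + sd:
            groups[player] = "good/great"
        elif rate <= mu - sd:
            groups[player] = "marginal"
        else:
            groups[player] = "average"
    return groups

# Made-up peak rates for five hypothetical players.
sample = {"A": 30, "B": 5, "C": 0, "D": -5, "E": -30}
print(classify_by_peak(sample))
```

A separate aging curve would then be built within each group, using only the seasons of the players assigned to it.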
Posted 10:45 p.m.,
January 27, 2004
(#15) -
MGL
Also, as long as I'm nitpicking, the statistical uncertainty in the difference of lwts/680 between two seasons is something like sqrt(1/N1+1/N2). So in terms of weighting the values for different players, the weight should equal 1/(1/N1+1/N2), not min(N1,N2).
Thanks. That's more than a nitpick. I think that the selective sampling is not a major problem, but it could be, could it not? What if only players who had a good season were ever allowed back for another season? That group would be comprised of great players almost no matter what - no problem there - but also good, mediocre and bad players who only had lucky seasons. So the first year of any "couplet" would have tons of lucky PA's and the second year essentially unbiased PA's, such that it would look like everyone was losing ability every year, even if there were no real aging patterns (true ability stayed the same at all ages). This is not exactly what happens of course, but it does exist to SOME degree. That's the only problem I see. I agree with Tango that you have to do some sort of regressing for year x in any x/x+1 couplet because of some unlucky players dropping out and having no x+1 years, such that all players who have x+1 years will automatically have been a little lucky as a group in year x.
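The "only players who had a good season come back" thought experiment is easy to simulate. The sketch below is a toy model under stated assumptions: true talent is constant (no aging at all), talent and luck are normal with made-up spreads, and a player returns only if his observed year 1 beats a cutoff. It also shows what regressing year 1 toward the mean does in this toy setup, where the population mean (zero) and the reliability are known exactly, which is never the case with real data.

```python
import random
from statistics import mean

def simulate(n_players=20000, talent_sd=10.0, luck_sd=15.0, cutoff=0.0, seed=1):
    """Toy model: constant true talent, players return only after a good year 1.

    Returns (mean change for all players,
             mean change for returning players,
             mean change for returning players with year 1 regressed).
    All spreads are invented numbers in runs per season.
    """
    rng = random.Random(seed)
    # Share of observed variance that is real talent (known here by construction).
    reliability = talent_sd ** 2 / (talent_sd ** 2 + luck_sd ** 2)
    all_diffs, kept_diffs, kept_regressed = [], [], []
    for _ in range(n_players):
        talent = rng.gauss(0.0, talent_sd)
        year1 = talent + rng.gauss(0.0, luck_sd)
        year2 = talent + rng.gauss(0.0, luck_sd)  # same talent: no real aging
        all_diffs.append(year2 - year1)
        if year1 > cutoff:  # only "good season" players are allowed back
            kept_diffs.append(year2 - year1)
            kept_regressed.append(year2 - reliability * year1)
    return mean(all_diffs), mean(kept_diffs), mean(kept_regressed)

all_mean, kept_mean, regressed_mean = simulate()
print(round(all_mean, 2), round(kept_mean, 2), round(regressed_mean, 2))
```

With everyone included, the average change is near zero, as it should be with no aging. Among the players allowed back, the raw change is clearly negative even though nobody's talent moved, and regressing year 1 brings it back to roughly zero. How much of this happens in real data depends on how strongly playing time is tied to the previous season's results.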
I really don't see your problem, since you yourself suggested that playing time early (and late) in a career is a reliable indicator of true talent. But OK, let's say that it isn't reliable. Then use something else as a classification criteria, such as OPS+ or LWTS or whatever you like, but limited to the peak years, 26-29. Classify everybody at least one standard deviation above the mean as a good/great player, and everybody at least one s.d. below the mean as a marginal player. Everybody else is average. Now construct 3 aging curves. The curve for the average group should look pretty much like the overall curve, but I'll be very surprised if the other 2 don't look quite different.
I see what you are saying. I'll have to think about it and/or try it and see what happens. I just think that, off the top of my head, any attempt to group players by "talent" is going to be inextricably related to their aging, a priori. But like I said, I'll have to think about it.
Posted 3:12 a.m.,
January 28, 2004
(#16) -
AED
MGL, I guess I was referring more to Tango's argument from his old thread than to what you did. You're right about selective sampling, but it primarily affects a player's first year in the majors, not his last. The easiest ways to get rid of that are to either require a player to have played at least N seasons in the majors (to get rid of guys who had one lucky year and then hung on for a little longer) or to eliminate players' rookie seasons from consideration.
Players who are near replacement level from 26-29 will show a spuriously shallow decline because of selection effects -- the only such players still on a major league roster at 34 are those who peaked late or declined slowly and stayed near replacement level.
Posted 7:51 a.m.,
January 28, 2004
(#17) -
Jim
Older players don't only get playing time because they are still good, but often times they are still sitting on large salaries which make teams feel like they NEED to play those players. Can't have 'huge salaries on the bench'.
Posted 8:50 a.m.,
January 28, 2004
(#18) -
tangotiger
The easiest ways to get rid of that are to either require a player to have played at least N seasons
This won't work either. Even for 10 years, you still run the risk that the observed line is a bit higher than what his true talent line is. The fewer the years you use, the bigger the gap between the observed line and the true talent line.
The player's last year's observed point will almost surely be way below his true talent point.
No player's true talent drops as much as his observed performance drops in his last year of play.
Posted 2:16 p.m.,
January 28, 2004
(#19) -
tangotiger
(homepage)
If you go to the above link, and page down to my "June 28" comments, you will see that I did an interesting exercise, similar to what is being asked here.
I broke my groups of players by length of careers, to get different aging patterns.