Felipe Alou: Is He Afraid of the Walk?
November 13, 2002 - MGL
Tango, this is more than just a controlling-for-age situation. Since players significantly increase their walk rate throughout their careers, you definitely need to look at pre- versus with- or post-Alou, controlling for the normal walk rate progression with time. If you don't, and you study any manager or any team, it will probably look like no manager or team negatively impacts a player's or players' walk rate, simply because time has elapsed...
I like the study, but unless you use a control group or you control your study group for age or at least for "time progression" I don't think you are likely to get any meaningful results...
I agree with David that even if Alou does not impact a player's walk rate while he plays for his team and/or after he leaves, his disdain for walks may impact the team by him helping to sign and/or him giving more playing time to "aggressive players". Of course, the Alou quote belies the fact that Alou may have this "anti-walk" philosophy...
BTW, is no one concerned that Alou is obsessed with the sac bunt? He executed 65 non-pitcher bunts this year (which is around 110 attempts). The Expos also had 118 SB and 64 CS. This concerns me too. That's a lot of stolen base attempts with absolutely nothing to show for it. This is definitely not a guy that I would want managing MY team...
Felipe Alou: Is He Afraid of the Walk?
November 14, 2002 - MGL
My bad Vinay! Whoever it was ordered an awful lot of bunts! Well, at least I didn't get flamed as I would have on Clutch Hits, which I will not post on anymore, BTW...
I also agree that I don't think that too many batters will change their approach to hitting because of a particular manager or coach...
Felipe Alou: Is He Afraid of the Walk?
November 14, 2002 - MGL
David, it wasn't so much being flamed (I can take it and I bring some of it on due to my sometimes "in-your-face" style of critiquing other people's comments, of course), but the childish posts on Clutch Hits. I am not used to those kinds of "general circulation" boards I guess (I'm used to FanHome, although it has died since it changed venues - mainly because I think the new venue sucks - hard to maneuver, etc.).
Actually I can't believe the number of people who take the time to read Primer (or even know about it) and then take further time to post something that belongs in an AOL teen chat room. Also, I have little patience for uninformed, misguided persons who are not willing to learn.
If you want to see some examples, I guess look at some of the posts on the Tejada MVP thread and the Clayton versus Ordonez thread (I think). It's not that big a deal. I don't care much about the flames. It's just too frustrating and time consuming to wade through all the crap on Clutch Hits; I've got better things to do (not that I ever get to them). Thanks for some of the sympathy though guys!
Although my style is not always conducive to "acceptance" (i.e., I often annoy and piss people off) I try and add as much value as I can to these boards as well as to the field of sabermetrics in general - not because it matters one iota to the general population (it doesn't) or because it makes the game of baseball any better (it doesn't), but simply because I like it and it keeps me out of trouble (and out of class oftentimes - but that's another story - at least I only have one more semester to go)...
Banner Years
October 31, 2002 - MGL
Excellent work!
You lost me a little on this one (you have a great writing style which generally makes everything crystal clear). Either you deviated a little from that style or I suddenly became thick. The latter is entirely possible. In any case, could you explain the following (as if I were a 6-yo child)?
"Let's put it altogether
Let's go back to our banner year groups. If we assume that 14% of the weight should be given to regression towards the mean, then we can use the following weights to predict next season's performance level:
24% - Year 1
24% - Year 2
38% - Year 3
14% - average
By using these weights, we can predict the Year 4 values that were produced by the last two studies. As a shorthand, you can say 1 part average, 2 parts each first two years, 3 parts third year."
(Also why the blue font? Or is that just my browser? I can't seem to highlight, in order to "copy and paste", any of the blue text!)
For those of you who question why 3 149% years followed by a 142% year is "3 lucky" years and one "more indicative of talent" year, that is exactly true! It's hard to explain why. I'll try. First of all, the 142% is exactly what we would expect after 3 years of 149%!
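The quoted weights can be checked against the 149/149/149 group with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and it assumes the "average" term is the league-average lwts ratio of 100%.

```python
# Weights quoted from the article: Year 1, Year 2, Year 3, league average.
WEIGHTS = (0.24, 0.24, 0.38, 0.14)
LEAGUE_AVERAGE = 100.0  # assumed: normalized lwts ratio of the average player, in percent


def project_year4(year1, year2, year3, average=LEAGUE_AVERAGE):
    """Weighted blend of three seasons plus a regression-to-the-mean term."""
    w1, w2, w3, w_avg = WEIGHTS
    return w1 * year1 + w2 * year2 + w3 * year3 + w_avg * average


# Three straight 149% seasons project to roughly 142% for Year 4.
print(round(project_year4(149, 149, 149), 1))
```

Because the three yearly weights sum to 86%, any run of identical seasons is pulled 14% of the way back towards 100%.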
ANY TIME WE SAMPLE A PLAYER'S TALENT (1 YEAR, 2 YEARS, 5 YEARS) WE EXPECT THAT HIS TRUE TALENT LEVEL IS EXACTLY EQUAL TO THE SAMPLE MEAN (LWTS RATIO, OPS, OR WHATEVER) PLUS A REGRESSION TOWARDS THE MEAN OF THE POPULATION THAT PLAYER COMES FROM!
Without knowing the age, height, weight, etc., of that player, we have to assume that that player comes from a population of professional baseball players only. So the mean towards which our 3-year sample will regress is 100% (the mean normalized lwts ratio of all players, which is, by definition, 100%). And of course, the smaller the sample we have (say 2 years of 149%, as opposed to 3 years), the more we expect the next year to regress. Without doing the work, I KNOW that 2 years of 149% will be followed by something LESS than 142%.
Any given year whose value we don't know a priori is expected to be neither lucky nor unlucky. That is the 142% year - simply a random year (for players who already had 3 good years, of course) whose value is unknown before we calculate it. Therefore, we expect it to be neither a lucky nor an unlucky year.
The 3 149% years ARE BY DEFINITION LUCKY YEARS! That is because we purposely chose players with 3 good years, relative to the league average. Any time we purposely choose good or bad years, we are, again, by definition, choosing "good (or bad) AND lucky (or unlucky)" years. That is why all good or bad years will regress towards more average ones!
Believe it or not, now that we have 3 years of 149% followed by 1 year of 142%, we have a "new" sample talent of around 146.7 (ignoring the 1st 149% year, as it is getting old). In fact, let's bump up the 146.7 to 147 because of the first 149% year. Even though this is our "new" talent sample for our player, it is still not our best "estimate" of his talent, i.e., our prediction for the next year! We still think that all 4 years have been lucky (remember all samples above average are automatically lucky and all samples below average are automatically unlucky; how lucky or unlucky depends upon the size of the sample; here we have 4 years of above average performance - it is still lucky performance, but not by that much) so we think that our 147% projection is TOO HIGH! Again, without looking at the database, I can tell you that the 5th year will be something less than 147% - probably around 145% (maybe less, because we may start to have an age bias - the longer the sample you look at, the more likely it is that you are looking at older players).
All that being said, although I don't think it is a problem either, I would address the age thing - either control for age or at least include it in your results.
I would also do some park adjusting or at least include some park info in your results. I am a little concerned that a banner year is weighted towards players who have changed home parks, so that, for example, a banner year followed by 3 average years will tend to be a hitter's park followed by a pitcher's park, so that in the 5th year, the player is more likely to be in a pitcher's park, whereas 3 average years followed by a banner year suggests that the player is more likely to be in a hitter's park for the 5th year. This would "screw up" the weighting system...
Banner Years
October 31, 2002 - MGL
Stephen, there is no flaw in my argument! All players (as a group, NOT every single player - heck some 149/149/149 are actually 155 players!) will regress towards the mean, because...
Some of those players are actually 150 (let's ignore the exactly 149 players) or better players and got UNLUCKY, and some of those players are less than 149 players and got lucky. Those are the only two choices. The chances that any given player is less than a 149 player and got lucky are MUCH higher than the chances that he is better than a 149 player and got unlucky, simply because there are many, many more sub-149 players.
The chances that a random 149/149/149 player is actually a sub-149 player who got lucky as compared to an above-149 player who got unlucky go down as the sample size of the 149 group goes up (for example 149/149/149/149). However the upper limit (when the sample size is infinite) of the real talent of a sample group of players who have a sample performance of 149 is 149! It can never be higher and it can never be exactly 149. It must be lower! That is why all samples of performance above or below average will ALWAYS regress towards the mean of the population they come from. This is not my argument or opinion. It is a mathematical certainty, based on the "upper limit theorem" I describe above.
The only caveat is the definition of the "population they come from". In Tango's study, he looked at ALL players. Any player who had a 149/149/149 period qualified. Yes, this group is comprised of mostly good players (140, 135, 155, etc.), but they still come from a population of all ML players (technically all ML players who have played full time for 3 consecutive years, which is probably an above average population, so they will not regress towards 100%, but maybe 110%). Now if we only look at first basemen or players over 6 feet tall, then the number towards which we regress will change...
David, your writing is excellent (isn't that what I said?). I just got hung up on the part I quoted in my last post. Could you re-explain that part please? As I also said, it may just be me being thick. Why did you and DS claim that I said that you didn't write the article well? That's an example of only telling part of the story and thereby distorting the truth (like politicians and commentators do all the time in order to prove a point). I know you guys didn't intentionally do that, but it is a bugaboo of mine...
Banner Years
November 1, 2002 - MGL
[i]I hope you guys can indulge a stats novice for a minute.
It's been a few years since I took true score theory, but from what I remember, outcomes are a function of true score plus measurement error. So, in other words, in some book in heaven somewhere, it may be written that Barry Bonds is a .320 hitter. Everything else represents measurement error. I think I understand regression to the mean. If the average baseball player hits .280, then we would expect Barry Bonds to follow a "true score" season with something less than .320. If I were a betting man, I would go with that.[/i]
You actually have a nice handle on what's going on! Basically any player who hits better or worse than average over any period of time is "expected" to hit closer to the mean than his sample hitting indicates. It's as simple as that. It is not conjecture. It is a mathematical certainty. It is a fundamental aspect of sampling theory. I think you completely understand how that works.
Given that we have a sample of a player's hitting (1 year, 5 years, whatever), that sample number is ALWAYS the "limit" of our "best estimate of his true talent" which is, of course, the same as his projection. For example, if Bonds' sample BA is .320, that is the "limit" of his true BA. Now the only thing left is to determine if a player's sample performance, like Bonds' .320 BA, is the upper or lower limit of his true level of performance. The way we do that is simple, once we know the mean performance of the population that our player was selected from. If that mean is less than the player's sample performance, then the sample performance is the upper limit of his true talent. If it is greater, then his sample performance is the lower limit of his true talent. In practice, it is usually easy to guess whether that mean is greater or less than a player's sample performance. In some cases, however, it is not so easy.
For Bonds, if his sample BA is .320 we are pretty sure that no matter what, the mean BA of the population that he comes from is less than that, so we estimate his true BA at something less than .320. That doesn't mean that we KNOW or that it is 100% certain that his true BA is less than .320. That's where a lot of people are making a mistake. There is a finite chance that he is a true .320 hitter, a true .330 hitter, or even a true .250 hitter who has been enormously lucky. All these things have a finite chance of being true. It's just that if you add up all the various true BA's times the chances of their occurrence, sampling theory tells you that you get a number which is closer to the population mean than his sample BA. How much closer is completely a function of how large your sample of performance is and nothing else.
The other tricky part that gets people in trouble is "What IS the population of players that a particular player comes from and what is the mean of that population?" After all, that is an important number since that is the number that we need to regress to. Finding out or estimating that number can be tricky sometimes. If we pick a player from a list of all players without regard to anything other than that he has a high or low BA, or whatever we happen to be looking for, then we know that the population is ALL BATTERS. It doesn't matter that we are picking a player who has a high or low BA deliberately. There is no "selection bias" as far as the population and its mean are concerned. Remember no matter what criteria we use to choose a player, the population that that player belongs to, for purposes of estimating a performance mean that we will regress to, is the group of players that we are selecting FROM, NOT the group of players that we think that our player belongs to (good hitters for example)! If we pick a .320 player from a list of all ML players (or some random sample of all NL players), then that player comes from a population of ALL players and hence the population mean that we regress to is the mean of all ML players.
Now if we find out something else about that player we chose, then all of a sudden we have a different population of players and we have to choose a different mean BA, which is not all that easy sometimes. For example, if we find out that the player is a RF'er then all of a sudden we have a player who comes from the population of ML RF'ers and NOT all ML players. Obviously the mean BA of all ML RF'ers is different than that of ALL ML players. Same thing if we find out that our player is tall or heavy or LH or RH or fast or slow, etc.
Anyway, for the umpteenth time, that's regression to the mean with regard to baseball players, in a nutshell, for whatever it's worth...
[i]So if Barry Bonds hits .320 for three years in a row, his failure to regress represents luck. But why does that mean that his true score is not .320? Why can't Barry just be a lucky player who happened to hit at his true score level for three straight years?[/i]
See the explanation above. Yes he could be a true .320 player, just like he could be a true .350 player or .280 player. It's just that the best mathematical estimate of his true BA is NOT .320, it is something less, depending upon the mean BA of his population (big, possibly steroid laden, black, LH, RF'er who has a very nice looking swing, has a great reputation, a talented father, etc.) and how many AB's the .320 sample represents...
Whew!
Banner Years
November 1, 2002 - MGL
Tango, why do you think that the sample mean of any player you choose who has played 5 years with > 299 AB per year will not regress towards 155%? It will, if that 155% is the mean of the population of all such players. Now, in order to get (estimate) the mean of that population, you cannot weight players' numbers. For example, you cannot have more than one Aaron in your group. The population mean must come from a random, non-weighted sample of all players who have played at least 5 years with 300 or more AB per year (or whatever your criteria were). So you must find all players who fit that description and give each player equal weight even though some of those players (Aaron for example) may have many such 5-year spans.
What numbers (let's just call it BA) do you use for those players who appear more than once (have more than one 5-year span with 300 or more AB's each year)? You would take the average BA over all years for that player, as that would represent the best estimate of that player's true BA for any 5-year period...
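A rough sketch of the "one entry per player" idea follows. The player labels and BA values are invented for illustration, and averaging a player's span values is a simplification of averaging over all of his years.

```python
from collections import defaultdict


def unweighted_population_mean(spans):
    """spans: (player, value) pairs, one pair per qualifying 5-year span.

    Each player is first collapsed to a single number, so a player with
    many qualifying spans counts exactly once in the population mean.
    """
    by_player = defaultdict(list)
    for player, value in spans:
        by_player[player].append(value)
    per_player = [sum(values) / len(values) for values in by_player.values()]
    return sum(per_player) / len(per_player)


# Hypothetical data: Player A has three qualifying spans, B and C one each.
spans = [
    ("Player A", 0.310), ("Player A", 0.320), ("Player A", 0.300),
    ("Player B", 0.270),
    ("Player C", 0.260),
]
naive_mean = sum(value for _, value in spans) / len(spans)  # counts A three times
population_mean = unweighted_population_mean(spans)         # counts A once
print(round(naive_mean, 3), round(population_mean, 3))
```

With these made-up numbers the naive mean is pulled upward by the player with the long career, which is the weighting problem described above.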
Banner Years
November 1, 2002 - MGL
BTW, I contradicted myself (in a subtle way) in my second to last post. I said that in order to determine what specific population a player comes from, we look at the "list of players" that we selected our player FROM. Then I went on to say that if we found out afterwards that a player was tall, we would change our population (from ALL ML players to ALL tall ML players). This appears to be a contradiction, which it is.
What I meant was that we can use any characteristic we know about our player (either before or after we chose him) to define or estimate the population of players he comes from. We cannot, however, use his BA (or whatever it is we are trying to regress) to determine what population he comes from (for example, if it is .320, we cannot say "Oh, he must be from a population of good hitters"), because that is what we are trying to ascertain in the first place (the chances that he IS a good hitter versus the chances that he is a bad or average hitter who got lucky, etc.). It's sort of analogous to the dependent and independent variables in a regression analysis. The characteristics of the player we are regressing (like height, weight, position, etc.) are all the "independent" variables and his BA (or whatever number we are trying to regress) is the dependent variable. The "independent" variables determine the population that he is from for purposes of figuring out what number we should regress to (the mean BA of that population), while the dependent variable (the sample BA of that player) CANNOT be used to make any inferences about that population (for purposes of establishing a BA to regress to)...
Banner Years
November 2, 2002 - MGL
As far as I know, and I am no math or statistics maven (maybe slightly ahead of an ignoramus but something short of a sciolist), linear algebra is an advanced, college and graduate level, field of mathematics. So anyone who comprehends nothing more than linear algebra is indeed more advanced than I...
Banner Years
November 2, 2002 - MGL
Actually I wanted to add one more thing about "regression" as it relates to projecting talent in baseball, assuming of course, that not EVERYONE is ignoring this thread now that the cat's out of the bag (that Tango and I are ignoramuses when it comes to statistics).
While the mean of a population from which a player comes determines the upper or lower limit of his true BA (from now on, when I use BA, it is simply a convenient proxy for any metric which measures talent), it isn't that useful in terms of knowing how much to regress a player's sample BA in order to estimate his true BA. In fact, it isn't necessary at all. Nor does the size of the player's sample BA tell us how much to regress, UNLESS AND UNTIL WE KNOW OTHER CHARACTERISTICS OF THE POPULATION.
What I mean by that is that there are actually 2 things that tell us exactly how much to regress a player's sample BA to determine the best estimate of his true BA, and one of them is not the mean BA of the population to which the player belongs.
One of those things IS the size of the sample BA (1 year, 4 years, etc.). The other is the DISTRIBUTION OF THE TRUE BA'S OF ALL PLAYERS IN THE POPULATION.
Once we know those 2 things, we can use a precise mathematical formula (it isn't linear algebra, I don't think) to come up with an exact number which is the best estimate for that player's true BA.
Let's back up a little. In normal sampling statistics, a player's BA over some time period would be a sample of his own true BA and our best estimate of that player's true BA would be exactly his sample BA. So if player A had a .380 average during one month and that's all we knew about this player, regular sampling theory would say that our best estimate of his true BA was .380 and we could use the number of AB's that .380 was based on (the sample size) to determine how "confident" we were that the .380 WAS in fact his real BA, using the standard deviation of BA, which we can compute using a binomial model, etc., etc. Most of you know that.
Now here is where we sort of veer away from a normal "experiment" in sampling statistics, when it comes to baseball players and their talent. We KNOW something about the population of all baseball players, which means, both mathematically and logically, that the .380 sample BA in one month (say 100 AB's) is not necessarily a good estimate of that player's true BA. We know logically that if a player hits .380 in a month that he is NOT a .380 hitter. The only reason we know that, however, is because we know that there is either no such thing as a .380 hitter or at least that a .380 hitter is very rare. If in fact we knew nothing about the range of what ML baseball players usually hit, we would then HAVE TO say that our player was a .380 hitter (within a certain confidence interval, which would be around plus or minus 90 points to be 95% confident as the SD for BA in 100 AB's is around 45 points).
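The "around 45 points" figure can be reproduced with the binomial model mentioned above. A minimal sketch, assuming each AB is an independent trial with a fixed hit probability (the function name is just for illustration):

```python
import math


def ba_standard_deviation(true_ba, at_bats):
    """Binomial standard deviation of an observed BA over a number of AB's."""
    return math.sqrt(true_ba * (1.0 - true_ba) / at_bats)


# For a hitter with a true BA near the league mean, 100 AB's give an SD of
# about 44 points, so two SD's either way is roughly 90 points.
sd = ba_standard_deviation(0.270, 100)
print(round(sd, 4), round(2 * sd, 4))
```

Quadrupling the sample to 400 AB's halves the SD, which is why larger samples need less regression.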
So now the question, as always, is, given that our player hit .320 in 100 AB's and given that we KNOW that players rarely if ever have true BA's of .380, what IS the best estimate of our player's true BA (still within the various confidence intervals)?
Let's say the mean BA of the population of ML baseball players (for the same year as our .380 sample) is .270. According to my other posts, that is the number that we regress the .380 towards, and the number of AB's the .380 is based on (100 ) determines how much we regress. Well, the first part is always true (the .270 is the lower limit of our player's true BA), but the second part is only true given a certain set of characteristics of the population of baseball players. IOW, it is these characteristics that FIRST determine how much we regress the .380 toeards the .270. Once we establish those characteristics, the more sample AB's we have, the more we regress.
What are those characteristics we need to determine before we can figure out how much to regress the .380 towards the .270? It is the percentage of batters in the population (ALL ML players in this case, since we know nothing about our .380 hitter other than he is a ML player) who have various true BA's. IOW, we need to know how many ML players are true .210 hitters, how many are true .230 hitter, true .320 hitters. etc. Obviously, there is a whole continuum of true BA's among ML players, but it would suffice for this kind of analysis if we estimated the number of players in each range. Now, estimating the number of players in baseball for each range of true B'A's is not easy to do and is a little curcuitous as well. The only wayt to do that is to look historically at players who have had a long career and assume that their lifetime BA is is their true BA. Of course, even that lifetime BA would have to be regressed in order to get their true BA, so that's where the "curcuitous logic" comes from - "in order to know how much to regress a sample BA, we have to find out the true BA's of ML players and in order to find out those true BA's we have to know how much to regress a player's lifetime BA..."
We have other problems in terms of trying to figure out how many players in ML baseball have true BA's of x. For example, not many players who have true BA's of .210 have long careers, so if we only looked at long careers to establish our percentages, we might miss some types of players (those with very low true BA's). In any case, let's assume that we can come up with a fairly good table of frequencies for the true BA's of all ML players. It might look something like <.200 (.1%), .200-.220 (1%), .220-.230 (3%),..., .300-.320 (2%), etc.
NOW we can use Bayesian (lower on the totem pole than linear algebra) probability to figure our .380 player's true BA! The way we do that goes something like this:
What are the chances that our player is a true .200-.220 hitter (1% if we know nothing else about this hitter other than he is a ML player) GIVEN THAT he hit .380 in 100 AB's (much less than 1% of course)? What are the chances that he is a .300-.320 hitter given that he hit .380, etc. (more than 2% of course)?...
Do all the multiplication and addition (arithmetic, MUCH lower than linear algebra) and voila we come up with an exact number (true BA) for our .380 hitter (which still has around a 90 point either way 95% confidence interval).
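Here is a rough sketch of that "multiplication and addition" in Python. The frequency table is HYPOTHETICAL: only the <.200, .200-.220, .220-.230 and .300-.320 percentages come from the example above, and every other bin (and the choice of one representative BA per bin) is made up purely for illustration. The function name posterior_mean_ba is also just an illustrative name. The likelihood used is the exact binomial chance of 38 hits in 100 AB's.

```python
from math import comb

def posterior_mean_ba(hits, at_bats, true_bas, prior_weights):
    """Weight each candidate true BA by (prior frequency x binomial likelihood), then average."""
    # Chance of exactly `hits` hits in `at_bats` AB's for a hitter with each candidate true BA.
    likelihoods = [
        comb(at_bats, hits) * p**hits * (1 - p) ** (at_bats - hits)
        for p in true_bas
    ]
    # Multiply by how common each kind of hitter is in the population.
    joint = [w * l for w, l in zip(prior_weights, likelihoods)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    # The single-number estimate is the posterior-weighted average true BA.
    return sum(p * q for p, q in zip(true_bas, posterior))

# HYPOTHETICAL table: one representative true BA per range and its share of ML players.
# Only the .1%, 1%, 3% and 2% entries appear in the text; the rest are invented.
true_bas      = [0.190, 0.210, 0.225, 0.240, 0.260, 0.280, 0.295, 0.310, 0.330]
prior_weights = [0.001, 0.010, 0.030, 0.140, 0.320, 0.350, 0.124, 0.020, 0.005]

estimate = posterior_mean_ba(38, 100, true_bas, prior_weights)
print(f"Estimated true BA for a .380 hitter in 100 AB's: {estimate:.3f}")
```

With a different (better) table the number will change, which is the whole point: the table drives how far the .380 comes down.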
Remember that the mean BA doesn't tell us anything about how much to regress or what the final estimate of the true BA of our .380 hitter is; it only tells us the limit of the regression, and in fact, we don't even need to know that number, as in the calculations above. For example, let's say that the mean BA for all ML players were .270, as in the above, but that all ML players had the same true BA. The true BA for our .380 hitter or ANY hitter with any sample BA in any number of AB's would be .270. Let's say that 1% of all ML players had true BA's of .380 and 99% had true BA's of .290. What would our .380 player's true BA be?
It is either .380 or .290, so it's not really a "fair" question. We could answer it in 2 ways. One would be that "there is an X percent chance that he is a .290 hitter (who got lucky in 100 AB's) and a Y percent chance that he is a .380 hitter (who hit what he was "supposed to" in 100 AB's)". The other answer is that he is a .zzz hitter, where the .zzz is X percent times .290 plus Y percent times .380, divided by 100. Here's how we would do that calculation:
The chances of a .290 hitter hitting .380 or better is .025 (.380 is 2 standard deviations above .290 for 100 AB's). The chances of a .380 hitter hitting .380 or better is .5. So if we didn't know anything about the frequencies of .290 or .380 hitters in our population, our player is 20 times more likely to be a .380 hitter than a .290 hitter (.5/.025), or he has a 95.24% chance of being a .380 hitter. But since 99% of all players are .290 hitters and only 1% are .380 hitters, we now have the chances that our player is a .380 hitter at 20%, rather than the initial 1%. So we can say that our hitter has a 17% chance of being a .380 hitter and an 83% chance of being a .290 hitter or we can say that our hitter is a .305 hitter. We get the 17% chance of our hitter being a .380 hitter by the following Bayesian formula: The ratio of the chance of being a .290 hitter who hit .380 or better (.99 times .025 or .02475) to the chance of being a .380 hitter who hit .380 or better (.01 times .5 or .005), is 4.95 to 1. That means that it is 4.95 times more likely that our .380 hitter is a true .290 hitter who got lucky, so the chances of our hitter being a .290 hitter is .8319, and hence .1681 for being a .380 hitter.
That same above Bayesian calculation would apply for any number of categories of true BA's in the population and the percentage of players in each category.
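For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two-category calculation. It uses the same numbers as the example (99% true .290 hitters, 1% true .380 hitters, and chances of .025 and .5 of hitting .380 or better), and it works unchanged for more categories if you lengthen the lists. The function name bayes_posterior is just an illustrative name.

```python
def bayes_posterior(priors, likelihoods):
    """Return the chance of each category given the observed result."""
    # Chance of being in each category AND producing the observed result.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    # Rescale so the chances add up to 1.
    return [j / total for j in joint]

true_bas    = [0.290, 0.380]   # the two kinds of hitters in the example
priors      = [0.99, 0.01]     # share of the population in each category
likelihoods = [0.025, 0.5]     # chance each kind hits .380 or better in 100 AB's

posterior = bayes_posterior(priors, likelihoods)
estimate = sum(ba * q for ba, q in zip(true_bas, posterior))

print(f"Chance of being a true .290 hitter: {posterior[0]:.4f}")
print(f"Chance of being a true .380 hitter: {posterior[1]:.4f}")
print(f"Single-number estimate of true BA:  {estimate:.3f}")
```

The ratio of the two joint chances (.02475 to .005) is the 4.95 to 1 in the example.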
Now, given the difficulty in determining the categories and frequencies for true BA's in the population of ML baseball players and given the cumbersome nature of the ensuing Bayesian calculations, we can forgo all of that by using a linear regression formula to approximate the same results. If we used a single regression formula for say the above example (a player who hits .380 in 100 AB's), we would take a bunch of data points constituting all players with a certain BA in 100 AB's (our independent variable) and regress this on those same players' BA for the next year or preferably multiple years. As usual, this will yield two coefficients, A and B in our y = Ax + B linear equation, and B will be close to (1 - A) times the mean BA of all baseball players (actually the mean BA in the sample group we are using in the regression analysis), so that the line passes through that mean. Remember that these coefficients will only work for 100 AB's. If we want to do the same thing for a player with 500 AB's, we have to do a new regression analysis and derive a new equation, OR we can do a multiple regression analysis where number of AB's is one of the independent variables. Unfortunately, due to my status as a statistics ignoramus, I don't know whether there is a linear relationship if we include # of AB's (I don't think there is), in which case you would have to do a non-linear analysis, which is beyond my abilities...
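To show what that single regression looks like, here is a small Python sketch run on SIMULATED players, not real data. The assumptions are all mine: 5,000 players, true BA's spread around .270 with a standard deviation of .025, and 100 AB's in each of two years. The helpers simulate_ba and fit_line are illustrative names.

```python
import random

def simulate_ba(true_ba, at_bats, rng):
    """Simulate one season: each AB is a hit with probability true_ba."""
    hits = sum(1 for _ in range(at_bats) if rng.random() < true_ba)
    return hits / at_bats

def fit_line(xs, ys):
    """Ordinary least squares fit of y = A*x + B."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

rng = random.Random(2002)  # fixed seed so the run is repeatable

# ASSUMED population of true BA's (mean .270, SD .025), for illustration only.
true_bas = [rng.gauss(0.270, 0.025) for _ in range(5000)]
year1 = [simulate_ba(p, 100, rng) for p in true_bas]  # independent variable
year2 = [simulate_ba(p, 100, rng) for p in true_bas]  # what we are predicting

A, B = fit_line(year1, year2)
mean_year1 = sum(year1) / len(year1)

print(f"A = {A:.3f}, B = {B:.3f}, mean BA = {mean_year1:.3f}")
print(f"Predicted next-year BA for a .380 hitter in 100 AB's: {A * 0.380 + B:.3f}")
```

A comes out well below 1, which is the regression: only a fraction of the distance between the sample BA and the mean carries over. Rerunning with 500 AB's in year 1 instead of 100 gives a larger A, which is why the coefficients only work for the number of AB's they were fitted on.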
Banner Years
November 3, 2002 - MGL
In case there is anyone on the planet still reading this thread, the 4th sentence in the third paragraph from the bottom should read (among lots of other spelling and grammar errors):
But since 99% of all players are .290 hitters and only 1% are .380 hitters, we now have the chances that our player is a .380 hitter at 17% (NOT 20%), rather than the initial 1%.
Banner Years
November 6, 2002 - MGL
Brother of Jessie,
Sorry but your argument is mathematically (statistically) unsound. I don't have the time right now to explain why.
BTW, Tango's 149 149 149 149 142 observation is not a revelation. It doesn't "need" explaining nor is it open to "criticism".
It is a mathematical certainty that no matter what the distribution of true linear weights ratios in the population of baseball players is, any player or players who show an above or below average ratio in any number of years will "regress" towards the mean (100 in this case) in any subsequent year. How much they regress (in percentage, like the 7/49, if you want to put it that way) depends entirely on how many PA's the historical data is comprised of. In this case, 4 years of 149 regressed to 142 in the 5th year. One year of 149 will regress to, I don't know, something like 120. Tango's observations were just to make sure that nothing really funny (like the statistician's view of the world is completely F'uped) was going on. We don't need to look at ANY data to tell us that 4 149's will be followed by something less than but close to 149, or that 2 149's will be followed by something even lower and less close to 149. Again, it is a mathematical certainty, even if there is lots of learning and developing going on with baseball players. The learning and developing can only decrease the amount of regression; it cannot eliminate it! Of course, what we will and do find if we look at real-life data, is that this learning and developing (to the extent that people "read into" these banner years) is small or nonexistent beyond the normal or average age progression. The reason we know this is that if the learning and developing were a large or even moderate factor, we would see much smaller regressions after banner years than we do. The regressions we do see comport very nicely with what a statistical model would predict if no learning and developing were going on. Given that, there can be only one conclusion - THAT A BANNER YEAR IS 99 PARTS FLUCTUATION AND 1 PART LEARNING AND DEVELOPMENT (i.e., the concept of a "breakout year" is a FICTION!)
Banner Years
November 7, 2002 - MGL
Mr. James,
You need to read my protracted discussion on "regression" as it applies to baseball talent. I think it is in this thread, but I'm not sure.
Despite your moniker, you got no shot to "shoot me down" on this one!
A player's stats get regressed to the mean of the population that he comes from. Yes, if we assemble ALL-STARS and choose players from that group, the mean is greater than 100. Same is true if we assemble right fielders and choose a player or players from that group. If we assemble a group of sub 6-foot players, our mean will probably be less than 100.
Tango looked at all ML players and chose those who had high lwts ratios (an average of 149) for 4 straight years. EVEN THOUGH THESE ARE OBVIOUSLY GREAT PLAYERS, THEY CAME FROM A GROUP OF ALL ML PLAYERS. That is why you regress toward the mean of all ML players (actually you regress toward the mean of all ML players who have played at least 5 years). If you assemble All Stars and choose from that group, THERE HAS TO BE AN INDEPENDENT REASON FOR YOU CALLING THEM ALL-STARS, OTHER THAN THE CRITERIA YOU USED TO SELECT PLAYERS FROM THAT GROUP. IOW, in order to regress those same 149 players to a mean greater than 100, you would need to assemble your ALL STARS first by using some criteria independent of having a high lwts ratio for 4 straight years - say a high lwts ratio for the previous 3 years. If you do that, then you regress toward the mean of all players who have had high lwts ratios for 3 straight years and have played for at least 5 more years. Get it!
You regress towards the mean of the population that your player comes from! You cannot make any inferences about that population based upon your sampling criteria! That's the whole point of regression!
Read this - it is important:
To put it another way, in Tango's example, he looks at the entire population of baseball players. They include all players of all true talent. They have a certain mean lwts ratio, which we can easily measure, and of course, we define it as 100. Next he ignores those players who have not had a minimum number of PA's for at least 5 years, right? So now we have a population of players who have had a min # of PA's for 5 straight years. We take the mean of that population, which is probably higher than 100 (say 105). That is the number we regress to! The fact that we now select only those players who have had at least a 125% lwts ratio for 4 straight years DOES NOT CHANGE THE POPULATION THAT THESE PLAYERS WERE SELECTED FROM. That is the population whose mean we regress to! Yes, that group of > 125% players are ALL-STARS as a group. Their true lwts ratio is much greater than 105, but it is NOT 149, as we can see from the 5th year ratio of 142. By definition, when we regress to the mean (this is not my "opinion", it is a rule of statistics), we regress to the mean of the population from which we chose our players, regardless of what criteria we used to select those players. By criteria, I mean "What range of lwts ratios?", like the > 125% that Tango chose. If we choose criteria (these are independent variables) like what position, or what weight, OR WHAT WAS THEIR LWTS RATIO IN THE PRIOR YEAR OR YEARS, then we have a new population and hence, a new mean to regress to. Anyway, I got off on a tangent as far as the important thing to read...
When Tango chose those players who had ratios above 125% for 4 straight years, the reason we regress at all is that those players selected consist of: 1) players with a true ratio of less than 149 who got lucky, 2) those players with a true ratio around 149, the sample mean, and 3) those players who have a true ratio GREATER than 149. We don't know in what proportion they exist, but even though it is more likely that a player who has a sample mean of 149 is a true 149 player, and it is less likely that he is a less than or greater than 149 true player, there are many, many more players in our original group (our population) that were true sub-149 players, so it is much more likely that an "average" player in our 149 sample group of players is a true sub-149 player who got lucky. It just so happens that the proper mean to regress to is the mean of the original group, whatever that is (105?).
If we had chosen a group of ALL-STARS, based on, say, 3 years worth of lwts ratios above 125%, we now have a group whose true lwts ratio is around 135 or so. Now, if FROM THAT GROUP, we select those players who have had 4 MORE years of > 125%, then we have the same experiment as Tango's, except that that group is ALREADY made up of players who have a true ratio of around 135, as opposed to in Tango's example, where the group he selected from is ALL ML players who have played for 5 years (etc.). They only have a mean ratio of around 105. So in the second experiment, where we choose from a group of KNOWN good players (on the average, not all of them), many more of the players we select are good players who did not get lucky or got a little lucky for the next 5 years (after the initial 3 years of > 125% performance). Many more (percentage-wise) in the second group, as opposed to the first group, are also true > 149 players who got unlucky. That's why the true ratio in the second group is more than 142 (probably 145 or so). It is still not 149, since the mean of the group of ALL-STARS is only around 135, so we still have to regress the 149 sample mean to 135. The "reason" for this is that we still have some lingering players who are not very good, but managed to make it into the ALL-STAR group through luck, and ALSO managed to make it into the > 125% for the next 5 years group. Obviously not many players will make it through these 2 hurdles, but as long as there is a finite chance that any true sub-149 player will make it, the true ratio of the 149 group will ALWAYS be less than 149! You may say, wait, there is an equal chance that a > 149 player made it through both groups as a sub-149 player, so they would cancel each other out! In fact that is true! There is an equal likelihood that any player in our 149 group is a true 154 or a true 144 (each one is 5 points different from the mean).
But here is the kicker that makes us regress downward in either experiment: There are many more players in either population who are true sub 149 players than there are who are > 149 players, so an average player in our 149 group is MORE likely to be a 144 player who got lucky than a 154 player who got unlucky, simply because there are more 144 players!
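The "kicker" can be put into numbers with a short Python sketch. The population shape here is an ASSUMPTION for illustration only: true ratios are taken to be normally distributed around 105 with a standard deviation of 15, and the luck in the sample is taken to be normal with a standard deviation of 8. Neither number comes from Tango's data.

```python
from statistics import NormalDist

# ASSUMED population of true lwts ratios (mean 105, SD 15) and ASSUMED amount of luck (SD 8).
population = NormalDist(mu=105, sigma=15)
LUCK_SD = 8
OBSERVED = 149

def weigh(true_ratio):
    """Return (likelihood of observing 149, prior frequency x likelihood) for a true ratio."""
    # How likely a player of this true ratio is to show a 149 in the sample.
    likelihood = NormalDist(mu=true_ratio, sigma=LUCK_SD).pdf(OBSERVED)
    # How common players of this true ratio are in the population we selected from.
    prior = population.pdf(true_ratio)
    return likelihood, prior * likelihood

like_144, weight_144 = weigh(144)
like_154, weight_154 = weigh(154)
odds = weight_144 / weight_154

print(f"Chance of a true 144 showing 149: {like_144:.5f}")
print(f"Chance of a true 154 showing 149: {like_154:.5f}")
print(f"A true 144 is {odds:.1f} times as likely as a true 154")
```

The two likelihoods are the same (each is 5 points from 149), so the whole difference comes from there being more true 144 players than true 154 players in the population.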
Now if we chose our ALL-STARS such that our estimate of the average true ratio in that group of ALL-STARS were 155 (let's say we selected all players who had a ratio for 3 years of greater than 140 - not too many players, of course), then if we did the same experiment, and the sample ratio of players who had > 125% for the next 4 years was still 149, we would actually regress upwards such that our estimate of the 149 group's true mean ratio would be like 153 or so!
I hope this explains everything, because I just missed my appointment at the gym!
Banner Years
November 7, 2002 - MGL
I'm done trying to explain how it works with baseball talent (or any similar experiment). Either we are misunderstanding one another or you are very stubborn or both. Maybe someone else can explain it to you or maybe we can just drop it.
If a sample of players (yes NOT a random sample), using a measurement criteria (like above 125% for 4 straight years), drawn from the population of baseball players DID NOT regress toward the mean then you would NOT see the 5th year at 142 - you would see it at 149, right?
Do an experiment which should make everything obvious - in fact, you don't even have to do the experiment - the results should be obvious:
Look at all BB players from 2001. Take only those who had an OPS above .900 (non-random sample - obviously). Their average (mean) OPS is something like .980. What do you think their OPS as a group is in 2002? It is the .980 regressed toward the mean of the entire population of baseball players (around .770), which will be maybe .850 or .880. The 2002 OPS is also the best estimate of these players' true OPS (given a large enough sample of players, the best estimate of those players' average true OPS is the next year's - or any large random sample - OPS). We KNOW this right? We know that if we take any non-random sample of players defined by a certain performance (less than .700 OPS, greater than .800, etc.), their subsequent (or past) OPS will REGRESS (toward the mean of the whole population of BB players)! That is regression towards the mean (for an experiment like this)! There does not have to be random sampling, although the exact same thing would happen if we took a random sample!
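If you don't want to take my word for it, here is a Python sketch of that experiment on SIMULATED players (I am not using the real 2001 and 2002 numbers here). The assumptions are mine: 20,000 players, true OPS spread around .770 with a standard deviation of .070, and one season of luck worth another .070 either way.

```python
import random

rng = random.Random(2001)  # fixed seed so the run is repeatable

POP_MEAN = 0.770    # mean OPS of the whole population, as in the example
TALENT_SD = 0.070   # ASSUMED spread of true OPS among players
LUCK_SD = 0.070     # ASSUMED one-season random fluctuation

players = []
for _ in range(20000):
    true_ops = rng.gauss(POP_MEAN, TALENT_SD)
    ops_2001 = true_ops + rng.gauss(0, LUCK_SD)  # first season = talent + luck
    ops_2002 = true_ops + rng.gauss(0, LUCK_SD)  # second season = same talent, new luck
    players.append((true_ops, ops_2001, ops_2002))

# Non-random sample: only players who showed an OPS above .900 in the first season.
selected = [p for p in players if p[1] > 0.900]

def mean(values):
    return sum(values) / len(values)

mean_true = mean([p[0] for p in selected])
mean_2001 = mean([p[1] for p in selected])
mean_2002 = mean([p[2] for p in selected])

print(f"Players selected:          {len(selected)}")
print(f"Their 2001 OPS (selected): {mean_2001:.3f}")
print(f"Their 2002 OPS:            {mean_2002:.3f}")
print(f"Their average true OPS:    {mean_true:.3f}")
```

The second season lands between the population mean and the first-season number, and it sits right on top of the group's average true OPS, even though nobody in the simulation learned or developed anything.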
What do you think would "happen" (what would the future look like) if we looked at all those players who had an OPS of over 1.000 for one week? (See my thread on hot and cold streaks.) Would their future (or past) OPS regress towards the mean or wouldn't it? Would their average OPS (of around 1.100) remain the same? Do you not understand what I am getting at here? What do you think the true talent (OPS) of these one-week 1.100 guys is? I know you don't think it is 1.100, which means they will continue at a 1.100 clip. I know you know that it will continue at a pace closer to the league average (probably around .880). What do you call that other than regression to the mean?
(Light bulb went off in my head!) Now I see what you are saying! My apologies. Yes technically, these "higher than average groups" (the 149 for 4 straight years guys or the better than 1.000 OPS for one week guys) will regress toward THEIR mean lwts ratio or OPS and NOT the mean of the league as a whole. Yes that is true and I think that is what you are trying to say. Again, my apologies. You are absolutely correct. IN PRACTICE, you can use the mean of the whole league to regress to, because you don't know what the mean of the group you selected is - in fact, that is what you are trying to figure out. IOW, if we take the 149 ratio guys and want to figure out their true ratio or what their ratio will be in the next year (same thing, basically), then technically, we must use their true ratio to regress to, but that's what we are trying to figure out in the first place - what that true ratio is. Since we don't know that and the only thing we know is the mean ratio of all players, then we have to regress that 149 towards that all player mean of 105 or whatever it is. Yes, technically that 149 doesn't get regressed towards 105. It gets regressed towards something less than 149 and more than 105, but since we don't know what that is, it LOOKS like it gets regressed towards the 105.
Anyway, there is no argument anymore, unless you think that the true ratio of that 149 group is 149, rather than something like 142 (the 149 regressed towards the true ratio). If you do, you must wonder why the next year comes out to 142. If you do, you must think that one year of a more than .900 OPS is a player's true OPS, again, in which case you must wonder why these guys show a .800 OPS or so the next year. And if you do, you must certainly wonder why a group that shows a 1.100 in a week does not continue at that clip, even though we did not randomly select this group of players (we only looked at players who had greater than a 1.000 OPS in a certain week)...
Cheers!
Banner Years
November 8, 2002 - MGL
I'm gonna try one more time! Forget my last post!
Forget about the expression "regression to the mean". It is a generic expression which could have different meanings depending upon the context. Pretend it doesn't exist.
Remember that when I say that a value (call it value 1) gets "regressed" to another value (call it value 2), THAT MEANS two things and two things only:
1) Value 2 represents the direction in which we move value 1 (it can be moved either up or down).
2) If we don't know how much to move value 1 (which we usually don't in these types of experiments), value 2 represents the limit of the movement.
For example, if value 1 is 149 and value 2 is 135, we know 2 and only 2 things, assuming that we have to "move" value 1. One, we move it down (towards the 135), and two, we move it a certain unknown amount but we never move it past 135.
How does this very vague above concept apply to baseball players? I'm glad you asked.
First, I am going to call "value 2", as described above, "the mean" and I am going to substitute the word "regress" for the word "move" as used in the context above. This is literally an arbitrary choice of words. We might as well say "wfogbnlnfl to the slkvdn". I'm using the word "mean", not in any mathematical sense, but to represent the "limit of how much we move value 1". Likewise, I am using the word "regress", also not in any mathematical sense, but purely as a substitute for the word "move".
So "regression to the mean" from now on simply means "We move value 1 either up or down, depending on whether value 2 is less than or greater than value 1, and value 2 also represents the most we can move value 1."
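Stripped down to code, that definition is one line of arithmetic. In this Python sketch the name regress and the "fraction" argument (how far we move, from 0 for not at all to 1 for all the way) are my own illustrative choices; the definition above only says that value 2 gives the direction and the limit.

```python
def regress(value_1, value_2, fraction):
    """Move value_1 toward value_2 by `fraction` of the gap, never past value_2."""
    if not 0.0 <= fraction <= 1.0:
        # A fraction outside 0..1 would move away from value 2 or overshoot the limit.
        raise ValueError("fraction must be between 0 and 1")
    return value_1 + fraction * (value_2 - value_1)

print(regress(149, 135, 0.0))  # no movement: stays at 149
print(regress(149, 135, 0.5))  # halfway down toward 135
print(regress(149, 135, 1.0))  # the limit: lands on 135
print(regress(149, 155, 0.5))  # value 2 above value 1, so we move up
```

Everything that follows is about which number to use for value 2 and whether we know the fraction.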
Now here are some experiments in which I will attempt to apply the above methodology. You tell me whether it should be applied or not (and if not, why). If you think that it is appropriate to apply, you must also tell me what value we should use for value 2. The correct answers will appear at the end of this post.
Experiment #1:
Let's say that we have a population of BB players and we don't know whether all players in that population have the same true OPS or not. Either everyone has the same true OPS (like a population of all different denominations of coins, where every coin has the same true H/T "flipping" ratio), or some or all of the players have different true OPS's (like if we had a population of coins and some coins were 2-sided, while others were 3-sided or 4-sided, etc.).
Now let's say that we randomly sample 100 of these players and look at a 1-year sample of each player's OPS. We basically have 100 "man-years" of data that were randomly sampled (not exactly, but close enough) from a population of all baseball players.
Let's say that the mean OPS of these 100 players is .780. This is our value 1, by the way. Let's also say that WE DO NOT KNOW WHAT THE MEAN OPS OF THE POPULATION OF ALL PLAYERS IS. Remember that we randomly sampled these 100 players and 1 year's worth of data for each player from a population of all players and all years.
What is the best estimate of the true OPS of these 100 players, given that the average OPS for all 100 players for 1 year each is .780?
In order to arrive at that estimate, did you need to determine a value 2 and did you need to move value 1 (.780) in the direction of value 2 and does value 2 represent the furthest you should move value 1? If the answer is yes to all 3 related questions, how did you arrive at value 2? If the answer is yes to some and no to some (of the above 3 questions), please explain.
Experiment #2:
Same as experiment #1, but we now know (G-d tells us) that the mean OPS of the population of all players is .750 AND we know that all players have the same true OPS.
Again, what is the best estimate of the true OPS of our 100 players chosen randomly (and their 1-year stats each)? This is a no-brainer right? It is not a trick question. The answer is as obvious as it seems.
Given your answer to the above question, did you move value 1 (the .780), is there a value 2 (and if so, what is it), and if the answer to both questions is yes, do we know exactly how much to move value 1 ("towards" value 2)? IOW, is there regression to the mean (remember my above definition - movement, direction, and limit, where "regression to" is "movement towards" and "the mean" is "value 2")?
Experiment #3: (We are getting closer and closer to Tango's experiment)
Same as above (#2) only this time not only do we know that the mean OPS of all players is .750, we also know (again from G-d, not from any empirical data) that all players in the population have different true OPS's. IOW, some players have true OPS's of .600, some .720, some .850, some .980, etc. In this experiment we don't know, nor does it matter, what percentage of players have true OPS's of x, what percentage have true OPS's of y, etc. We only know that different players have different true OPS's. So in our random sample of 100 players, each player could have a true OPS of anything, assuming that every OPS is represented in the population in unknown proportions.
Now, remember, like the previous experiments, the mean OPS of our 100 randomly selected players for 1 year each (at this point it doesn't matter that we used 1-year data for each player; we could have used 2 years or 6 months) is .780. Remember also that we KNOW the true average (mean) OPS of all the players is .750. And don't forget (this is what makes this experiment different from #2) that we KNOW that different players in the population have different true OPS's, of unknown values and in unknown proportions (again, the last part - the "unknown values and in unknown proportions" - doesn't matter yet).
So now what is your best estimate of the true average (mean) OPS of the 100 players? Is this an exact number? Do we use a "regression to the mean"? If yes, what is value 2 and do we know exactly how much to move (regress) value 1 (again, the .780) towards value 2?
Here are the answers to the questions in experiments 1-3:
Experiment #1 (answer):
The best estimate of the average true OPS of the 100 players is .780, the same as their sample OPS. There is no "regression to the mean". There is no value 2; therefore there is no movement from value 1. (Technically, we could say that value 2 is .780 also, the same as value 1, and that we regress value 1 "all the way" towards value 2.) The above comes from the following rule in sampling statistics:
When we sample a population (look at the 1-year OPS of 100 players) and we know nothing about the characteristics of the population, as far as the variable we are sampling (OPS) is concerned, the sample mean (.780) is the best estimate of the population mean.
Experiment #2 (answer):
The answer is that no matter what the sample OPS is (in this case .780), the true OPS of any and all players (including the average of our 100 players) is .750! This is simply because we are told that the true OPS of all players is .750! Any sample that yields an OPS of anything other than .750 MUST BE DUE TO SAMPLING ERROR
Banner Years
November 8, 2002 - MGL
I'm gonna try one more time! Forget my last post!
Forget about the expression "regression to the mean". It is a generic expression which could have different meanings depending upon the context. Pretend it doesn't exist.
Remember that when I say that a value (call it value 1) gets "regressed" to another value (call it value 2), THAT MEANS two things and two things only:
1) Value 2 represents the direction in which we move value 1 (it can be moved either up or down).
2) If we don't know how much to move value 1 (which we usually don't in these types of experiments), value 2 represents the limit of the movement.
For example, if value 1 is 149 and value 2 is 135, we know 2 and only 2 things, assuming that we have to "move" value 1. One, we move it down (towards the 135), and two, we move it a certain unknown amount but we never move it past 135.
How does this very vague above concept apply to baseball players? I'm glad you asked.
First, I am going to call "value 2", as described above, "the mean" and I am going to substitute the word "regress" for the word "move" as used in the context above. This is literally an arbitrary choice of words. We might as well say "wfogbnlnfl to the slkvdn". I'm using the word "mean", not in any mathematical sense, but to represent the "limit of how much we move value 1". Likewise, I am using the word "regress", also not in any mathematical sense, but purely as a substitute for the word "move".
So "regression to the mean" from now on simply means "We move value 1 either up or down, depending on whether value 2 is less than or greater than value 1, and value 2 also represents the most we can move value 1."
Now here are some experiments in which I will attempt to apply the above methodology. You tell me whether it should be applied or not (and if not, why). If you think that it is appropriate to apply, you must also tell me what value we should use for value 2. The correct answers will appear at the end of this post.
Experiment #1:
Let's say that we have a population of BB players and we don't know whether all players in that population have the same true OPS or not. Either everyone has the same true OPS (like a population of all different denominations of coins, where every coin has the same true H/T "flipping" ratio), or some or all of the players have different true OPS's (like if we had a population of coins and some coins were 2-sided, while others were 3-sided or 4-sided, etc.).
Now let's say that we randomly sample 100 of these players and look at a 1-year sample of each player's lwts ratio. We basically have 100 "man-years" of data that were randomly sampled (not exactly, but close enough) from a population of all baseball players.
Let's say that the mean OPS of these 100 players is .780. This is our value 1, by the way. Let's also say that WE DO NOT KNOW WHAT THE MEAN OPS OF THE POPULATION OF ALL PLAYERS IS. Remember that we randomly sampled these 100 players and 1 year's worth of data for each player from a population of all players and all years.
What is the best estimate of the true OPS of these 100 players, given that the average OPS for all 100 players for 1 year each is .780?
In order to arrive at that estimate, did you need to determine a value 2 and did you need to move value 1 (.780) in the direction of value 2 and does value 2 represent the furthest you should move value 1? If the answer is yes to all 3 related questions, how did you arrive at value 2? If the answer is yes to some and no to some (of the above 3 questions), please explain.
Experiment #2:
Same as experiment #1, but we now know (G-d tells us) that the mean OPS of the population of all players is .750 AND we know that all players have the same true OPS.
Again, what is the best estimate of the true OPS of our 100 players chosen randomly (and their 1-year stats each)? This is a no-brainer right? It is not a trick question. The answer is as obvious as it seems.
Given your answer to the above question, did you move value 1 (the .780), is there a value 2 (and if so, what is it), and if the answer to both questions is yes, do we know exactly how much to move value 1 ("towards" value 2)? IOW, is there regression to the mean (remember my above definition - movement, direction, and limit, where "regression to" is "movement towards" and "the mean" is "value 2")?
Experiment #3: (We are getting closer and closer to Tango's experiment)
Same as above (#2) only this time not only do we know that the mean OPS of all players is .750, we also know (again from G-d, not from any empirical data) that all players in the population have different true OPS's. IOW, some players have true OPS' of .600, some .720, some .850, some .980, etc. In this experiment we don't know, nor does it matter, what percentage of players have true OPS's of x, what percentage have true OPS's of y, etc. We only know that different players have different true OPS's. So in our random sample of 100 players, each player could have a true OPS of anything, assuming that every OPS is represented in the population in unknown proportions.
Now, remember, like the previous experiments, the mean OPS of our 100 randomly selected players for 1 year each is .780. (At this point it doesn't matter that we used 1-year data for each player. We could have used 2-year or 6-month data.) Remember also that we KNOW the true average (mean) OPS of all the players is .750. And don't forget (this is what makes this experiment different from #2) that we KNOW that different players in the population have different true OPS's, of unknown values and in unknown proportions (again, the last part - the "unknown values and in unknown proportions" - doesn't matter yet).
So now what is your best estimate of the true average (mean) OPS of the 100 players? Is this an exact number? Do we use a "regression to the mean"? If yes, what is value 2 and do we know exactly how much to move (regress) value 1 (again, the .780) towards value 2?
Here are the answers to the questions in experiments 1-3:
Experiment #1 (answer):
The best estimate of the average true OPS of the 100 players is .780, the same as their sample OPS. There is no "regression to the mean". There is no value 2; therefore there is no movement from value 1. (Technically, we could say that value 2 is .780 also, the same as value 1, and that we regress value 1 "all the way" towards value 2.) The above comes from the following rule in sampling statistics:
When we sample a population (look at the 1-year OPS of 100 players) and we know nothing about the characteristics of the population, as far as the variable we are sampling (OPS) is concerned, the sample mean (.780) is the best estimate of the population mean.
Experiment #2 (answer):
The answer is that no matter what the sample OPS is (in this case .780), the true OPS of any and all players (including the average of our 100 players) is .750! This is simply because we are told that the true OPS of all players is .750! Any sample that yields an OPS of anything other than .750 MUST BE DUE TO SAMPLING ERROR, by definition. I told you this one was a no-brainer! In this experiment, there is "regression to the mean" (again, per my definition - re-read it if you forgot what it was). Value 1 (.780) gets moved towards value 2 (.750). It just so happens that we know exactly how much to move it (all the way). Value 2 is still the limit on how much we can move value 1 in order to estimate the average true OPS of the sample group of players. And in this case, value 2 is equal to THE MEAN OF THE POPULATION! How do you like that? In this experiment, regression to the "mean" is really to the "MEAN"!
Experiment #3 (answer):
Remember we still know the population mean (.750). This time, however, not only are we not told that all players have the same true OPS, we are told that they definitely don't. The answer is that the best estimate of the true OPS of our 100-player sample (with a sample average OPS of .780) is something less than .780 and something more than .750. We don't know the value of the "somethings" so there is no exact answer other than the above (given the information we have). So again we have "regression to the mean", with value 2 still being .750, value 1 still .780, and the amount of regression or movement unknown. The movement must be down since value 2 is less than value 1, and the limit of the movement is .750 since that is the value of value 2. (We can actually estimate the amount of the movement given some other parameters but that is not important right now - we are only interested in whether "regression to the mean" is appropriate in each of these experiments, and if it is, what is the value of value 2.) BTW, as in experiment #2, value 2 happens to be the population mean, so the expression "regression to the mean" is somewhat literal, although again, that is somewhat of a coincidence.
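To make the three answers concrete, here is a minimal Python sketch of "regression to the mean" as defined in this post: value 1 moved toward value 2 by some fraction between 0 and 1. The function name `regress_toward` and the 0.4 fraction in the last call are illustrative assumptions only; experiment #3 tells us nothing more than that the fraction lies somewhere between 0 and 1.

```python
def regress_toward(value_1: float, value_2: float, fraction: float) -> float:
    """Move value_1 toward value_2 by `fraction` (0 = no movement, 1 = all the way).

    Restricting fraction to [0, 1] is what makes value_2 the limit of the movement.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return value_1 + fraction * (value_2 - value_1)


sample_ops = 0.780       # value 1: the sample mean of the 100 players
population_mean = 0.750  # value 2: the known population mean

# Experiment #1: nothing known about the population, so no movement at all.
print(round(regress_toward(sample_ops, population_mean, 0.0), 3))

# Experiment #2: every player has the same true OPS, so move all the way.
print(round(regress_toward(sample_ops, population_mean, 1.0), 3))

# Experiment #3: the fraction is unknown; 0.4 is a made-up placeholder.
print(round(regress_toward(sample_ops, population_mean, 0.4), 3))
```

Whatever fraction is eventually chosen for experiment #3, the result stays between .750 and .780, which is all the answer above claims.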
Back to some more experiments (leading up to Tango's)...
Experiment #4:
We know the average OPS of all players is .750 and we know that all players have the same true OPS. This time, however, we select X players, not randomly, but all those who had an OPS in 1999 greater than .850. Let's say that there were 25 such players and that their average (sample) OPS was .912 (for that 1 year).
What is the average true OPS of our 25 players? Again, easy (trick) answer! It is still .750, since you are told that all players have a true OPS of .750. Again it doesn't matter what criteria we choose to select players or what any player's 1-year sample OPS is. All sample OPS's that differ from .750 are due to statistical fluctuation (sampling error), by definition. Again, "regression toward the mean", where the "regression" is 100% and the "mean" (value 2) is the mean OPS of the population. So we have "regression to the mean" even though we did not choose a random sample of players from the population. We chose them based upon the criterion we set - a sample OPS greater than .850 in 1999.
Experiment #5 (same as Tango's):
Same population of players. It has a mean OPS of .750. Unlike the above experiment, each player can have a different true OPS. This time we only look at players who had a sample OPS of greater than .990 during the month of June in 2000. The average (sample) OPS of this group (say 50 players) is 1.120.
What is your best estimate of the true OPS of this group of players? (This question is of course exactly the same question as "What is your best estimate of what this group of players will hit in July 2000 or August 2000, not counting things like changes in weather, etc.?") Well what is it, and how do you arrive at your answer? Is your answer reasonable?
In order to arrive at your answer, was there any "movement" from the 1.120, like there was in the last experiment (the .912 had to be moved toward the .750 - in fact all the way to it)? In which direction - up or down? Why? If you did move the 1.120 to arrive at a different value for the "best estimate of the sample players' true OPS", how much did you move it? How much should you move it? Is there a limit on how much you should move it? If you did move it, is there a value 2 that tells us in what direction to move value 1 (the 1.120)? How did you arrive at the value 2? Does this value 2 (like all value 2's are supposed to do) represent the limit of the movement? If not, why not, and what is the limit of the movement? Was there "regression to the mean" in arriving at your estimate of the sample players' average true OPS? If yes, what value represented "the mean"?
I'm not going to answer any of the above questions in the last experiment. If you answer them (and the others) correctly, you will know everything there is to know about Tango's and similar experiments and whether there is or is not "regression to the mean", in the generic sense, in determining sample players' OPS, no matter how we choose our sample (randomly or not), and what value is represented by "the mean", given that "regression" simply means "some amount of movement"...
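For anyone who wants to poke at experiment #5 numerically, here is a rough simulation sketch in Python. Everything about the setup is an assumption made for illustration: true OPS is drawn from a normal curve around .750 with a spread of .050, one month of luck adds normal noise with a spread of .150, and the function name `simulate_selected_group` is hypothetical. None of these numbers come from Tango's study.

```python
import random


def simulate_selected_group(n_players=20000, pop_mean=0.750, talent_sd=0.050,
                            noise_sd=0.150, cutoff=0.990, seed=1):
    """Return (mean observed OPS, mean true OPS) for players selected by a hot month.

    All distribution parameters are invented for illustration.
    """
    rng = random.Random(seed)
    selected_true, selected_observed = [], []
    for _ in range(n_players):
        true_ops = rng.gauss(pop_mean, talent_sd)    # the player's real ability
        observed_ops = rng.gauss(true_ops, noise_sd)  # one month of ability plus luck
        if observed_ops > cutoff:                     # non-random selection rule
            selected_true.append(true_ops)
            selected_observed.append(observed_ops)
    mean_observed = sum(selected_observed) / len(selected_observed)
    mean_true = sum(selected_true) / len(selected_true)
    return mean_observed, mean_true


mean_observed, mean_true = simulate_selected_group()
print("sample OPS of selected group:", round(mean_observed, 3))
print("true OPS of selected group:  ", round(mean_true, 3))
```

With these made-up settings, the selected group's true mean lands above the population mean and well below its hot-month average, which is the "movement, direction, and limit" pattern described above.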
Banner Years
November 8, 2002 - MGL
Well, Frank, not only did you not answer my last few questions, but it is obvious to me that you know virtually nothing about basic statistical principles (at least not enough to have an intelligent discussion about this kind of analysis).
Your statement:
"Rey Ordonez has NO chance of showing up in Tango's sample, even if you simulated his performance over a thousand years."
is of course patently wrong. Rey Rey or even my 13 yo cousin does not have NO chance of showing up in Tango's sample. Everyone has SOME FINITE chance, no matter how infinitesimal. The tails of a bell curve reach out to an infinite distance.
That being said, the discussion (with you) is probably over. Nevertheless, my sample will contain slightly better than average players whereas Tango's sample will contain distinctly better than average players. We all know this of course.
Nevertheless, the answer to my questions will be the same whether we use my sample or Tango's sample. Both samples, of course, are non-random, which was the crux of your original criticism. Tango's is very non-random and mine is slightly non-random. At what point do you switch answers (that you do or don't use "regression to the mean" to estimate the sample's true lwts ratio or OPS and that "the mean" is not the mean of all players)? You don't like "one week" because it encompasses most players (again, it is still non-random as the worst players will tend to not have any or many one-week OPS's above .980, whereas the great players will have many - in fact probably around half of all their weeks will be above .950)? What about 2 weeks? 6 months? One year? 2 years? 4 years like in Tango's study? At what point do we no longer "regress to the mean of the population" in order to estimate true OPS or lwts ratio? Your argument is silly!
The reason I used one week is to make it obvious to you that even if we take a non-random sample of players from the population of all players and our selection criterion is greater or less than a certain OPS or lwts ratio, the estimate of true OPS or ratio in that non-random group must be "less extreme" than the sample result. You know that; everyone knows that. The one-week experiment makes it obvious. You are just digging in your heels at this point, for whatever reasons. Remember that what we use for value 2 (the so-called "mean") is simply the number that represents the limit of our regression.
Read this!!!!
Here is where you are getting screwed up, no offense intended. In the one-week experiment, you know intuitively that the true OPS for our sample players is actually somewhere near the mean of the population (a little higher), so you don't mind using the population mean as the number to regress towards (remember that number, value 2, just tells us DIRECTION and LIMIT). In Tango's experiment, you know intuitively that the sample group is mostly very good players with very high true ratios - which is true - so you don't like using the population mean as value 2. That's fine. The true OPS of Tango's sample players is nowhere near value 2 (the mean of the population); we still use it though to give us direction and limit - that's all value 2 is - remember? The concept of value 2 being the limit becomes silly when we choose mostly very good players of course, but we still use it to give us direction because it is the only KNOWN value that we have. In those extreme cases, we don't really NEED it, as we can just say that the true ratio of Tango's sample players is "something slightly less than 149", but we can also say that it is "149 regressed towards 105" (or whatever the mean ratio of all 5-year players is). That's all I have been trying to say.
Basically, Frank, once you establish value 2 as your direction and limit of your regression (and value 2 is always, by definition, the mean of the population), then you can decide how much to regress your sample mean (the 149 in Tango's case). Without doing a regression analysis or having other information about the distribution of true ratios in the population, you can only guess at how much to regress. In my case - the one-week guys - you regress a lot, which you know intuitively. Still doesn't change value 2, does it, even though you know intuitively that the final answer (the best estimate of the sample players' true OPS or ratio) is going to be close to value 2? Let's take 6-month players. Intuitively, you know to regress less than you do for the one-week players, but still a lot. Again, we still use value 2, or the population mean, to tell us direction and limit. If you didn't use a precise value for value 2, you might make a mistake and regress too much! For example, if we did the same experiment and used one-month samples above an OPS of .980, you would know intuitively that their true OPS's were a lot less than that, right? Let's try a guess of .850. Is that too high or too low? Without knowing the population mean and using it as our value 2 (the limit of the regression), we can't tell! If I told you that the population mean is .750, then you would say "Phew, that's not too low!" In fact, you would probably now say that their true OPS was around .780 or .800, wouldn't you? IOW, you need to have that value 2 in order to have SOME IDEA as to how much to regress! And that value 2 is always the population mean, arbitrarily and by definition, simply because you know that the true OPS of your sample of players cannot be less than that (since you selected your players based on a criterion of some time period of performance GREATER than the average).
How about 1-year samples above .980? We know to regress even less, but we still need a value 2 to give us SOME IDEA how much to regress and to make sure that we don't regress too much! When we get to 4-year samples, as in Tango's study, we don't all of a sudden say "No more regression!" We still regress, only this time a very small amount! Do we need to know what direction? Well it's kind of obvious since we know from our selection criteria that our sample includes more players who got lucky than unlucky. So we know to regress downwards. Do we need to know exactly what the limit of regression is - i.e., do we need to know value 2? No, of course we don't, but it doesn't change the fact that, like the 1-week, 1-day, 1-year, or 3-year experiments, there still is a value 2, which is still the mean of the overall population, and that technically that number establishes the direction and limit of our regression. If in Tango's study you don't want to call it "regression to the mean", that's fine. Who cares? It is only semantics. If you want to call it "a little regression downwards" that's fine too. It doesn't change my discussion an iota! I hate when someone takes an "expression" (out of context), criticizes it (with valid criticism), and then uses THAT criticism to invalidate a person's whole thesis. Why don't you just say "I agree with your whole thesis (in fact, as I said before, there is no agreeing or disagreeing - I'm just parroting proper statistical theory to you in my own words and relating it to these baseball experiments), but I don't like sentence # 42?" It's like if I tell you the entire and correct theory of the origin of the universe (I know that it is debatable), and I finish off by telling you that the universe is contracting rather than expanding and you tell me that I'm completely full of crap (political commentators do this all the time)!
Here's an example, BTW, of where knowing value 2, and making sure that it is the mean of the whole population, IS important, even when we take multi-year samples. Let's say we did the same experiment as Tango, but we select players who were only 5% above average for 2 straight years. Let's say that the average OPS for that sample was .788. Even though we are reasonably certain that our sample is made up primarily of above average players (no Rey Rey's in this sample), do we need to know value 2 in order to "hone in on" how much to lower the .788 to arrive at an estimate of the true OPS of our sample? Sure! Let's say I don't tell you what the mean OPS of the population of ALL 3-year players (players who have played for at least 3 or 4 years) is. You might choose .775 as the best estimate of our sample players' true OPS. Whoah! If I then told you that the mean OPS in the population (of all 3-year players) was .776, you would know that you went too far! If I told you that it was .750, then you would know that you were in the ballpark (no pun intended). So while sometimes knowing value 2 is important and sometimes it is not (it is obvious in which direction to regress and it is obvious about how far), it doesn't change the fact that we always HAVE a value 2, and that it is always the mean of the population!
Whew....
OPS: Begone!
May 20, 2003 - MGL
Good work Tango!
I think Darren might be right about Beane's "3 times the value" comment (see his post). I also think that while Beane and company are parsecs ahead of their competition, there is a large gap between his and DePodesta's knowledge of and efficient and correct use of sabermetric principles and that of many sabers on Primer, BP, and Fanhome (and wherever else they might lurk). IMO, the A's would be far better off hiring someone like James, Voros, Tango, etc., than trying to do sabermetric analysis themselves. It is kind of like when Brenly asked Matt Williams to sac bunt last night. I'm sure he is capable of doing so, but...
Also, I think that teams will quickly start catching up to the "player acquisition" principles being espoused and used by Beane and company, especially since they are now being publicized in a mainstream book (nice job Beane). I don't think it will be very long before it will be very difficult to cheaply pick up players with high production (high OPS or whatever) who are not traditionally highly regarded. The next frontier for picking up undervalued players will be and should be DEFENSE, and other Super-lwts components. It will be a long time before teams start using things like UZR to evaluate the overall impact a player will have on their team. Right now, one of the best ways to pick up valuable players cheaply and "sell" players expensively who are not all that valuable (buy low and sell high) is to look for large gaps between a player's traditional defensive rating (scouting, reputation, etc.) and their UZR (or other good defensive metric - are there any others?) rating. (I think the days are numbered as to being able to do that for offense.) This should provide plenty of value for a while I think. In the book, Beane implies that they use some kind of defensive rating that sounds suspiciously like UZR, via some computer company or something. Anyone know more about that?
SABR 101 - Relative and Absolute Scales (June 6, 2003)
Posted 6:11 p.m.,
June 7, 2003
(#3) -
MGL
Good stuff Tango!
Patriot, would like to see you expand on that thought. I always thought that BJ was brilliant (and a very good writer), but like many brilliant people, he can be considerably one-track...
SABR 301 - PZR - Blueprint (June 17, 2003)
Posted 1:23 a.m.,
January 14, 2004
(#4) -
MGL
I read through some of the old stuff and I'm still lost. Here is one of your equations, Tango:
Anyway, DER = UZR+Park+PZR.
Maybe I do get it. Are you saying that UZR measures how much better or worse a fielder handles line drives to zone 7S or hard hit ground balls to zone 56M, etc., without regard to the distribution of those batted balls (e.g., we don't care if fielder A got 100 hard hit balls to zone 56M only and fielder B got 100 slow hit balls to zone 56M only - if they both fielded them at the league average, they would both have a UZR of zero), but PZR only considers the distribution of those different batted balls - i.e. PZR doesn't care which ones are actually turned into outs or not, only the league average out rate for each kind of ball in each zone (as well as the other parameters)? For example, if pitcher A had 100 hard hit balls in zone 56M only and pitcher B had 100 soft hit balls hit to zone 56M only, then pitcher B has a much better PZR? And we would calculate the exact PZR based on the league average out rates of hard hit balls and soft hit balls into zone 56M?
Aha, now I think I get it! I was thinking that PZR was like UZR in that it considered the actual out rate of each type of batted ball/zone/runners, outs, etc., and compared that to the league average rates, yielding the same result as a pitcher's collective fielders. Now I see what you are doing! Brilliant! Now I also see how park adjusted UZR + park adjusted PZR = DER.
Of course, once we figure PZR, we still want to know how much of PZR is luck and how much is skill. I have a feeling that you already calculated that ahead of actually doing the individual PZR's. That must be from the team PZR's that you estimated from the team DER's minus the team UZR's. Is that right? And you came up with the fact the pitchers and fielders have about the same amount of responsibility? Is that right? And how much of each one's value (UZR or PZR) is skill and how much luck? I guess what that question always means is that for an infinite sample size, what is the regression? I think that is what that question means.
Hmmm... PZR. Brilliant! I know Tango is now thinking, "What took that idiot (boor) MGL so long to figure this out?"
Let me know if I have this right now, and I'll do some of the preliminary work.
I assume that the only things you can hold a pitcher responsible for, and that you want to include in PZR, are where the balls are hit, what type, and how hard. You can't hold him responsible for the other parameters, like baserunners (well, MAYBE baserunners), outs, and handedness of batters (other than how the pitcher's handedness affects the batters' handedness), so I assume that you would want to "include" some parameters in PZR, and adjust for, but not include, other parameters. In that way, it is a little different than doing the UZR calculations. Let me give an example of how I would calculate a PZR and how I would handle the parameters issue, which is different from how they are handled in UZR (for PZR some of the parameters establish the baseline, and some of them are used to "adjust" the baseline; for UZR all the parameters are used for one and not the other). Correct me if I'm wrong here...
pitcher A
100 hard hit balls to zone 56M all with one out and by RHB's in 50 innings. That is all of his batted balls.
League averages
All hard hit ground balls to zone 56M are caught 60% of time with 0 outs and RHB, 62% 0 outs and LHB, 64% 1 out and RHB, and 66% 1 out and LHB.
All soft hit ground balls to zone 56M are caught 70% of time with 0 outs and RHB, 72% 0 outs and LHB, 74% 1 out and RHB and 76% 1 out and LHB.
All ground balls are caught 70% of time with 0 outs and RHB, 65% 0 outs and LHB, 75% 1 out and RHB and 70% 1 out and LHB.
All GB's are caught 70% of the time regardless of outs or batter hand.
If we don't want to penalize (or reward - whatever the case may be) the pitchers for the outs and the batter handedness, then we calculate for pitcher A:
If a league average pitcher gives up 100 ground balls with 1 out and a RHB at the plate, 75% are caught (see above league averages). However, pitcher A's 100 ground balls were all hit hard and were all hit to zone 56M (with 1 out and a RHB). A league average pitcher who did that would have only 64% of those kinds of GB's caught (also, see league averages above). So our pitcher A allowed 11 fewer balls to be caught (regardless of how many were actually caught - that gets into the UZR realm), for a PZR of 11*.8 or 8.8 runs per 100 BIP or 50 innings, whichever we used as our "rate."
If we want to penalize (or reward) pitcher A for the fact that all his hits were by RHB and were with 1 out, then we would have to start with:
The league average conversion rate for ALL GB's, regardless of outs and batter handedness is 70%. That is the only thing that changes in our calculations. Now we take the difference between 70% and 64% for a PZR of 6*.8 or 4.8 runs (per 100 BIP or 50 innings).
Is that right? Should the pitcher be "penalized/rewarded" for any parameter other than speed, type, and location of batted ball? I don't think so. After all, we wouldn't think of doing that for park effects. We "adjust" for park effects. Why not "adjust" for outs/baserunners/handedness, and certainly batter G/F ratio (hmm.. do I adjust for batter G/F ratio in UZR? Probably not because there are so many batters it is not worth it - they probably average to near league average), as we should for park effects?
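Here is the pitcher A arithmetic from above as a small Python sketch, so the two baselines can be compared side by side. The out rates and the .8 runs-per-play figure are the ones used in this post; the function name `pzr_runs` and the sign convention (positive means runs charged to the pitcher) are assumptions made only for the example.

```python
RUNS_PER_PLAY = 0.8  # run value of one extra ball not caught, as used in this post


def pzr_runs(baseline_out_rate, expected_out_rate, balls_in_play,
             runs_per_play=RUNS_PER_PLAY):
    """Runs charged to the pitcher over `balls_in_play` batted balls.

    baseline_out_rate: league out rate a typical pitcher would get in the same context
    expected_out_rate: league out rate for the kinds of balls this pitcher allowed
    """
    fewer_outs = (baseline_out_rate - expected_out_rate) * balls_in_play
    return fewer_outs * runs_per_play


# Adjusting for outs and batter hand: baseline is all GB's with 1 out and a RHB (75%).
adjusted = pzr_runs(0.75, 0.64, 100)

# Holding the pitcher responsible for outs and batter hand: baseline is all GB's (70%).
unadjusted = pzr_runs(0.70, 0.64, 100)

print(round(adjusted, 1))    # 8.8 runs per 100 BIP
print(round(unadjusted, 1))  # 4.8 runs per 100 BIP
```

The only thing that changes between the two calls is the baseline, which is the whole question of whether a parameter is "included" in PZR or merely adjusted for.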
SABR 301 - PZR - Blueprint (June 17, 2003)
Posted 1:24 p.m.,
January 14, 2004
(#9) -
MGL
I probably agree with J. Cross on the handedness issue for most pitchers. Kind of like my explanation on the QOC article for not adjusting a pitcher's stats for opponent handedness in the QOC adjustments (but should do it for batters). There are exceptions though, like for LOOGY's, as I explained in the article. And of course, it would be a little unfair (one way or the other - good or bad), if a pitcher with not that many PA's happened to have faced more than his share of RHB's or LHB's, for no particular reason. Our concern might be somewhat baseless, as I'm sure the "adjustments" one way or another don't amount to much.
What about baserunners/outs? Should pitchers be responsible for any weird runners/outs profiles they have that significantly affect their PZR?
I'm not even sure that we are going to gain anything with PZR's. I don't think it will help in pitcher evaluation or projection. After all, the regressions sort of take into consideration the inherent PZR's. Plus we can infer them quite accurately, by just "subtracting" their fielders' UZR from their stats. In fact, when I do my pitcher evaluations and projections, I do a QOF adjustment which uses team regressed UZR (each fielder's multi-year regressed UZR added together, pro-rated by the distribution of that pitcher's BIP's).
Even when we get PZR's, they still have to be regressed to "remove" the luck element. It might be nice to quantify what DIPS tries to ignore, but what is the purpose? Tango originally said something about using PZR to validate DIPS. I'm still not sure what that means. Say you get the PZR and it turns out that it is exactly on the scale of UZR and that the regressions are about the same (as we think is true). What does that say about DIPS? Only that a pitcher's BABIP is x part defense, x part pitcher, and y part luck. I think it has already been proven that: one, the pitcher does have some pretty decent control over BIP, and that two, the luck element is at least as strong as the defense element, probably much stronger...
UZR, multiple positions (July 7, 2003)
Posted 10:10 p.m.,
December 18, 2003
(#20) -
MGL
This is good stuff. I will have to re-read before I put out my Super-lwts again...
SABR 301 - Regression towards the mean (July 22, 2003)
Posted 12:39 p.m.,
January 14, 2004
(#3) -
MGL
I just realized why that website is so great! It's from my alma mater - Cornell U.!
Solving DIPS (August 20, 2003)
Posted 12:51 a.m.,
December 27, 2003
(#19) -
MGL
As I said on Fanhome, that is a phenomenal article. It should make your head spin!
Anyway, looks like the only way for me to stop procrastinating is to go cold turkey. So, after this weekend, I won't be stopping by for a while, or reading anything else online. If someone wants me to post some links in Primate Studies, I'll be glad to do so, but I won't offer any of my thoughts on the matter. I'll be back in time for the World Series in a limited capacity.
I feel for you as much as anyone of course, as I periodically get addicted to Primer and Fanhome. However, how many times have you threatened to leave for a while and then come crawling back? ;)
Actually I need to do the same thing and concentrate on my real work and the book...
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 11:13 a.m.,
October 28, 2003
(#50) -
MGL
Tango,
What are the numbers for a "monkey" if the monkey uses a 3/4/5 weighting for the 3 years? How about a weighting plus a basic park adjustment (using a 3-year, or similar, "OPS park factor") for those players who have not played on the same team for the 4 years in question, or if you don't want to do that much work, a park adjustment for only those players who switched teams from 2002 to 2003?
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 7:00 p.m.,
October 28, 2003
(#59) -
MGL
I am not surprised at Marcel's success. If you factor in the park changes, Marcel probably (and should) blows everyone away! I've been saying (screaming) for years that projecting player performance is NOT rocket science nor does it take any special scouting, observational, intuitive, or even mathematical skills. It is simply "Monkey See Monkey Do," as Tango's experiment illustrates. I cannot say this enough. Barring some injury or other extraordinary factor, the best estimate of a player's performance is his last 3 or 4 years' performance, weighted and adjusted for age and context (park, opponent, etc.)! This is so important it bears repeating a hundred times or so (but I won't)! In fact, if you do just about anything else, you are probably going to do a lot worse than the sophisticated Monkey (Marcel plus context adjustments).
Although I was not able to participate in Tango's experiment this year (hopefully I'll have the time next year), my forecasting algorithm is available for all the world to see, and I'm sure my results would be somewhere near the top. I simply take each player's stats from the last 3 years, adjust them component by component for the strength of all of his opponents in those 3 years, adjust them for each park that a player plays in over those 3 years, and adjust each component for age (remember that aging curves look very different for each component). Then I adjust (to a healthy baseline) an entire year if that player was slightly injured, moderately injured, or severely injured in that year. Then I combine the 3 years using a 5/4/3 weighting system and regress each component towards the mean of an average player of similar height and weight. The fewer PA's a player has in those 3 years, the more each component gets regressed. Finally, if there is a continuing or new injury, I adjust the final stats to account for that injury.
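The last two steps above (the 5/4/3 weighting and the PA-dependent regression) can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual algorithm: the function name `project_rate` and the `regression_pa` constant of 1200 are hypothetical, since the regression formula is not given here, and the opponent, park, age, and injury adjustments are assumed to have been applied to the inputs already.

```python
def project_rate(seasons, population_mean, regression_pa=1200.0):
    """Project a per-PA rate for one component (e.g. BB per PA).

    seasons: list of (rate, pa) tuples, most recent season first,
             already adjusted for context (assumption).
    population_mean: the mean rate of the group being regressed toward,
             e.g. players of similar height and weight.
    regression_pa: hypothetical constant controlling how hard to regress.
    """
    weights = (5, 4, 3)  # most recent season counts the most
    weighted_pa = sum(w * pa for w, (_, pa) in zip(weights, seasons))
    weighted_events = sum(w * rate * pa for w, (rate, pa) in zip(weights, seasons))
    # Adding regression_pa "phantom" PA at the population mean means that
    # the fewer real PA a player has, the more he is pulled toward the mean.
    return (weighted_events + regression_pa * population_mean) / (weighted_pa + regression_pa)


full_time = project_rate([(0.400, 600)] * 3, population_mean=0.300)
part_time = project_rate([(0.400, 100)] * 3, population_mean=0.300)
print(full_time, part_time)
```

With the same sample rate, the part-time player ends up closer to the population mean than the full-time player, which is the behavior described in the post.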
I don't like to say this too much, because you get tons of flak from almost everyone other than hard-core sabermetricians, but, at least as far as player evaluation goes, for the purposes of projecting player performance, setting salaries, and putting together a successful team, you don't need to watch players and you don't need scouts (except perhaps for minor leaguers - even then, can you say MLE?). All you need are a player's stats! I live by this credo and I'll die by it! And I think that this whole experiment and discussion suggests that it is true at least to a large extent!
Seriously, how do you think most managers and GM's would do if they participated in this forecasting experiment? It would be so embarrassing it would be scandalous! Enrique Wilson (the best utility player in baseball according to Tim McCarver), Tony Womack, Neifi Perez, and Luis Sojo might be in someone's top 10!
This crap by some saber-types conceding that you have to combine sabermetrics with a "feel for the players," scouting, and other traditional evaluation techniques, in order to evaluate players and put together successful teams, is just that - a bunch of pandering, lip service crap - and I'm not afraid to say so!
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 8:31 p.m.,
October 28, 2003
(#62) -
MGL
I'll try and do my "after the fact" projections and see where they stand. Of course, no one will know for sure whether I cheated or not (I won't). My algorithm is basically what I described. The only thing I didn't specifically give (I'd be happy to) is what numbers I use to adjust for injury years, what numbers I use for park factors and age factors, and what my regression formula is.
As far as the 5%, 25%, etc. levels, such as Pecota does, personally, I don't think anything other than using regular old z scores is appropriate (IOW, if you have a .700 OPS projection, then there is a 5% chance that that player would have an OPS of greater than 2 SD above or below .700, where one SD is based on one year's worth of projected PA's). Anything other than that (such as what Pecota tries to do) is BS I think (I am not sure)...
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 9:04 p.m.,
October 28, 2003
(#64) -
MGL
I consider Voros the projection God! My algorithm is better now anyway!
Seriously, I think that any variation on Marcel, with park adjustments, is as good as any other and about as good as you can get. Plus there is obviously a fairly large sample error (margin of error) factor in the results...
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 12:58 p.m.,
November 1, 2003
(#75) -
MGL
Walt,
One of the problems is that there are two factors which determine what the "curve" will look like - one, the distribution of a binomial (will the player get a hit, a walk, a home run, etc., in each PA or won't he?), and two, the distribution of possible changes in true talent level, which is presumably based on things like changes in age, physical condition, injury, "learning," and mental and psychological factors. The former should produce a normal curve, by definition - the latter, who knows? Pecota seems to focus on the latter and completely ignore the former. The former cannot be ignored. It will always exist and there is nothing that anyone can do about it. As I like to say, it is possible that certain players have a consistent talent level from day to day while others do not (for whatever reasons), but a player has no control over the random (actually semi-random, but then again, the throw of a die is semi-random as well) nature of the outcome of each PA. You (Walt) are talking only about the latter (changes in talent level from year to year) as well, whereas I am pretty much talking only about the former. But my contention is that the latter is either insignificant as compared to the former OR that it mimics the distribution of the former (it is bell shaped with a similar SD), so that the net result is a performance distribution which is approximately normal with an SD defined by the binomial distribution of OPS, BA, or whatever metric we are talking about...
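To put a number on the binomial part of that argument, here is a small sketch using batting average as a stand-in, because it is a simple proportion (OPS is not, so its SD would need a different calculation). The .270 talent level and 500 AB are made-up illustration values, and the function names are hypothetical.

```python
import math


def binomial_sd(p, n):
    """SD of an observed rate when true talent is p over n trials."""
    return math.sqrt(p * (1.0 - p) / n)


def two_sd_band(p, n):
    """Range covering roughly 95% of outcomes under the normal approximation."""
    sd = binomial_sd(p, n)
    return p - 2.0 * sd, p + 2.0 * sd


sd = binomial_sd(0.270, 500)      # roughly .020, i.e. about 20 points of BA
low, high = two_sd_band(0.270, 500)
print(round(sd, 4), round(low, 3), round(high, 3))
```

Even with no change at all in true talent, a hitter like this lands outside the two-SD band about 5% of the time in a season of this length, which is the part of the distribution that cannot be ignored.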
Diamond Mind Baseball - Sending the runner on a 3-2 count (October 28, 2003)
Posted 1:23 p.m.,
October 29, 2003
(#4) -
MGL
1) I think it needs a lot more study to determine the "break even points."
2) I think assuming that the runner only gets thrown out 1/3 of the time is quite wrong. First, lots of non-base stealers (those who would get thrown out 2/3 of the time if they tried to steal a base) are sent on 3-2 counts.
3) I agree that batters are forced to swing more on marginal pitches with the runner going. Batters already swing too often on 3-2 counts (because they are afraid to take a called third strike). It is probably not a good idea to "force" them to swing even more often.
I think a good rule of thumb is to send above average base stealers only (then again, if they are above average, why were they not stealing before the 3-2 count?)
An interesting question is what about with runners on 1st and 2nd? Is it EVER correct to send the runners? Managers do this all the time.
Diamond Mind Baseball - Sending the runner on a 3-2 count (October 28, 2003)
Posted 6:07 p.m.,
October 29, 2003
(#5) -
MGL
In #2 above, I meant to add..
Second, for some strange reason, many otherwise fast runners and/or good basestealers seem to think that it is OK to get bad jumps and jog down to second on a 3-2 count, rather than the correct approach, which is to treat it as a straight steal...
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 8:25 p.m.,
November 11, 2003
(#52) -
MGL
As I have already said, it is extremely unlikely that a pitcher who has the K, BB, and HR rate that Pedro has had after 105 pitches (his historical stats) AND is still throwing in the 90's with movement, with no great difference in control or command (observation), has a true $H of anywhere near .400. It's just not possible. In order to have a true $H of .400 or so, a pitcher would HAVE TO throw a straight fastball in the 80's (or less) with little else in terms of command or offspeed pitches. That's one of the points that Tango makes and he is correct. Yes, of course it is possible for a good major league pitcher to get so tired that he has the talent of an A-ball pitcher. It is likely, however, that no manager, in any situation, even Grady Little, is going to leave a pitcher in that long. It is also likely that that kind of drop-off would occur at a VERY high pitch count (I don't think that pitchers "drop off the table" - I think that it is a gradual, though not necessarily linear, decline - but I'm not sure and it doesn't really matter to this discussion) AND if Pedro were pitching with a talent anywhere near that of an A-ball pitcher, you would probably notice something severely amiss other than not throwing good 2-strike pitches.
Mike, I'm not sure at all what you were trying to do with your quick and dirty study. Even if a pitcher never had days where his talent level differed from any other day, he would have stretches where any combination of hard/soft/line drive/fly/ground balls were hit due to chance alone. And of course, all pitchers who were taken out after less than 4 innings by definition would have had an unusually high number of hard hit balls AND line drives. With all due respect, I really don't see what the point of presenting the data was.
What you want to do is to look at all pitching starts in which a pitcher gets hit hard in the first 3 innings (high percentage of line drives and high percentage of hits per GB and FB) and then look at their $H (or whatever stat you want to use to represent how "hard" they are being hit) in the next inning. That will tell you something about whether enough pitchers fluctuate significantly enough in their "stuff" (ability as opposed to performance) that it is "correct" to take out a good pitcher after he gets shelled for several innings even though you might be making a mistake (it might just have been bad luck - BTW Ross, there are 2 kinds of bad luck in this regard; one is when bloop hits fall at the right time and a bunch of runs score accordingly; the other is when a pitcher gets hit hard even though his true "talent" has not changed). My guess is that in the next inning all of these pitchers as a group will revert back to very near their normal stats (in fact, I'd bet a lot at even money that this is true). Like clutch hitting, such a test would not prove or disprove that pitchers' talent fluctuates from day to day or even from inning (or batter) to inning; it merely would evidence, one way or another, whether one could use getting "hit hard" in any given number of innings as a proxy for this talent fluctuation or whether there is not enough prevalence in this regard (either not enough pitchers who do fluctuate in talent and/or the fluctuations in talent are not that great) to be able to distinguish it from the inevitable fluctuation in performance due to the random (binomial) nature of the events. Again, my guess is that the idea that a pitcher's getting "hit hard" (or not) is significantly indicative of his true talent at the time (IOW, predictive of the future) is another of those truisms that turn out to be clearly not true. Tango and I have been debunking many such myths lately and will present some of them in due time...
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 5:53 p.m.,
November 13, 2003
(#58) -
MGL
My apologies for putting words into your mouth Tango. I also think you are being a little too politically correct. :) What happened to the Cartman that I knew and loved?
Oh, and why do you bother?
David Pinto and fielding (November 10, 2003)
Posted 7:49 p.m.,
November 10, 2003
(#5) -
MGL
I haven't been following this thread too much, but yes, I regress most of the adjustments (I can't say ALL of them - I'd have to check) before I apply them - definitely the park adjustments. I use multi-year data for the adjustments and apply a regressed version to data in individual years (within that multi-year period).
For example, if the "ground balls through the infield" sample park factor at Dodger Stadium is 1.06 using data from 93-02 (hopefully the infield has not changed much over the years - of course if an infield has changed, like in Philly and Tampa, I use different years and different PF's), I would regress that 1.06 to maybe 1.03 (grass infields get regressed towards .99 and turf infields towards 1.02, I think). That is the "ground ball" park factor (the 1.03) that I would use to adjust all ground balls at Dodger Stadium in any year (between 93 and 02). Actually, I don't think I regress any of the other adjustment factors (GB/FB pitchers, handedness, speed of batted ball). I think I use a 4-year sample adjustment factor, with no regression, but I'm not sure.
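The Dodger Stadium example works out as a simple pull toward the surface-type mean. In this sketch the weight of 4/7 is not a stated number; it is just back-solved from the example (1.06 regressed toward .99 to get about 1.03), and the function name is hypothetical.

```python
def regress_factor(sample_pf, target, weight):
    """Regress a sample park factor toward a target mean.

    weight is the share of the sample's deviation that is kept
    (1.0 = no regression, 0.0 = use the target outright).
    """
    return target + weight * (sample_pf - target)


GRASS_TARGET = 0.99        # grass infields regress toward .99 (from the post)
ASSUMED_WEIGHT = 4.0 / 7   # assumption: back-solved so that 1.06 -> 1.03

dodger_gb_pf = regress_factor(1.06, GRASS_TARGET, ASSUMED_WEIGHT)
print(round(dodger_gb_pf, 2))
```

A turf infield would use 1.02 as the target instead, and a sample built on fewer years of data would presumably get a smaller weight.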
Interesting (and very good!) work by David. When I get some time, I'll check it out in more detail...
Pitcher's Hitting Performance When NOT Bunting (November 18, 2003)
Posted 1:41 p.m.,
November 19, 2003
(#3) -
MGL
To really assess how good a hitter the pitcher is, you need to remove all AB's where he unsuccessfully attempted to bunt at all, even if that wasn't the end of the at bat.
Actually those are grouped as bunt attempts. Since I have pitch by pitch data, most of the time I can tell when a pitcher has attempted a bunt and then switched to swinging away with 2 strikes.
In any case, Tango is right in that the above data is only when there is not a bunt situation, as I did not want to include those times when the infield was playing for a bunt and the pitcher swung away.
I was also able to calculate (not exactly, but pretty close) whether it is correct for a pitcher (or position player) to continue bunting or to swing away with two strikes (and what to do at the various counts, in terms of switching from bunting to swinging away or vice versa). Of course, that depends upon how good a hitter the batter is with 2 strikes versus how good a bunter he is. I can give the break even points OR tell you what an average pitcher (and position player) should do, however.
You'll have to wait for the book on that one though! ;) Good critical thinking on your part (questioning whether it is correct for a pitcher to still bunt or not with 2 strikes)! That is one of those (many) things that a manager would rightfully have NO IDEA what is correct or not, and to think that he does is both arrogant and stupid, since somewhere on this earth lies a person or two who could figure it out if they (managers) would only bother to ask!
Pitcher's Hitting Performance When NOT Bunting (November 18, 2003)
Posted 6:09 p.m.,
November 19, 2003
(#5) -
MGL
That's a good one! That could be taken either way though, and I'm not sure which way you mean it. Either way, you're (I'm) going to piss people off. I assume you mean that you are going to do most of the writing (which is good, since you write better than I do, except when you get cryptic), since we want people to actually buy the book, and controversy on the radio is often a good thing...
Pitcher's Hitting Performance When NOT Bunting (November 18, 2003)
Posted 6:14 p.m.,
November 19, 2003
(#6) -
MGL
One more thing:
and to think that he does is both arrogant and stupid, since somewhere on this earth lies a person or two who could figure it out if they (managers) would only bother to ask!
I thought that was one of my better statements! Actually it is a critical point that needs to be made public (the folly of people in all walks of life making critical decisions or offering opinions when they have no idea how to evaluate the merits of the various alternatives vis-a-vis their decision or opinion AND when those merits can and should be procured from someone else). The point can probably be made in not so harsh a fashion I suppose...
Persistency of reverse Park splits (November 20, 2003)
Posted 2:01 p.m.,
November 20, 2003
(#4) -
MGL
By the way, check out Sid Fernandez's home/road splits from his Met days. I bet those pass the significance test.
Did I just undermine my entire point?
To some extent you did. Looking at an extreme split (as compared to the average home park factor - i.e., a player who has a 1.20 home/road OPS ratio while playing for the Rox does not have an "extreme" split) for a player and doing a significance test on that is NOT the proper way to decide whether that sample split is a fluke or is "real" (or a combination). As is often the case with these types of questions, this is a Bayesian probability problem. First, you have to answer the question, "What is the distribution and magnitude of players in the population (of ML baseball players) who have unique true home/road splits that are different from the true splits of an average player for that home park?" Then, and only then, can you start doing "significance" tests on a particular sample split and some ensuing calculations. For example, if you answer the first question (the first part of the Bayesian calculation) with "There are no players with unique splits (such as if we were trying to find an association between a player's splits and the month of his birth)," then any weird sample split (even 4 standard deviations from the mean) is not going to suggest that the extreme sample split was anything but a fluke. That's how the analysis must be done.
People need to add one more important word to their baseball analysis vocabulary when it comes to these types of problems - "Bayes!"
And yes, not only did James find that there was virtually no such thing as a unique true platoon ratio in major league baseball (see my post in the Clutch thread about the T. Long trade a few days ago), at least for RHB's (and to a lesser extent for LHB's), so did the authors of the book "Curve Ball" and so did I.
Getting back to the Fernandez example and to home/road splits in general, if in fact we find that the regression for players' sample home/road splits is large, which my study suggests that it is, AND we know intuitively that some of a park's unique characteristics that go into its average park factor affect players differently (so it is unlike the "month of birth" example), what must be happening is:
1) The signal to noise ratio is low, probably due in part to the fact that we tend to forget or ignore that the sample size in splits is almost half that of a metric like OPS or BA;
2) there are probably only relatively few players who are significantly and uniquely (different from the average park factor) affected by a particular park; and
3) the effect of these unique influences is probably not that large.
This all leads to the conclusion that using average park factors IS appropriate for adjusting players' home stats and that it does NOT do more harm than good; that using a player's overall stats and his home park's average park factor is a VERY good way to predict his future splits, regardless of his sample historical splits; and that in evaluating trades, for example, we should not worry so much about players who have shown extreme and anomalous splits (like Nomar), as those extreme splits are most likely a fluke, absent compelling evidence to the contrary, and even then, we need to be very, very careful (as always) that we don't invent, exaggerate, or embellish "compelling evidence" to accommodate our beliefs.
Now really getting back to El Sid, he may be one of those players with whom you do have SOME compelling evidence at least that his extreme splits may have some merit in terms of accurately representing (with SOME regression) his true splits, given that he was, as Tango said, one of the most extreme fly ball pitchers in baseball history (he once pitched a complete game in which the infield had zero assists), and that he had relatively few balls put in play against him.
Because of the relatively small sample size of the original study, I did the exact same thing for 1999 and 2000. Here are the abbreviated results using the same parks:
There were 27 players (4499 PA's) in the 1999 sample with "reverse" splits. The average of the players' sample splits in the pitcher's parks was 1.13 (remember it "should" be .96). In the hitter's parks, whereas the splits of all players "should" be 1.04, the players with "reverse" splits had a composite split ratio of .89.
In 2000, 19 of the 27 players "survived." The players in the hitter's parks who had a "reverse" composite split of .89 regressed to a composite split of 1.11 and the players in the pitcher's parks regressed from a "reverse" split of 1.13 to a split of .96.
The conclusion is now stronger that, without knowing anything else about a player other than his sample one-year home/road splits, in order to estimate his "true" splits or predict his future splits (again, they are basically one and the same), one should ignore those sample splits and simply assume that his future or true split ratio will be approximately the same as the average player in the league.
BTW, in Tango's lead-in to this study, he meant (or at least, he should have meant) "...in reverse of the 'park factor' of his home park" and not "the 'HFA' (home field advantage)..."
Persistency of reverse Park splits (November 20, 2003)
Posted 1:43 a.m.,
November 21, 2003
(#6) -
MGL
Shea has a smaller than average foul park factor now (.93). Do you know when the seats were added? I keep track of all park changes and I don't recall coming across that, unless it was a long time ago.
Shea is a pitcher's park for basically 3 reasons. One, the infield grass seems to be thick (very low "GB hits thru the IF factor"). Two, as you say the visibility is probably bad and/or the mound is favorable to the pitcher, as the K park factor is high and the BB park factor is low. Three, the HR park factor is low, especially to left and center, because of the average fairly cold weather, the sea-level altitude, and a deep center field and rounded outfield...
Baseball Player Values (November 22, 2003)
Posted 10:02 p.m.,
November 23, 2003
(#9) -
MGL
The easy way to "factor in" fielding to pitching is to simply substitute a league average or team average $H for a pitcher's sample $H. If you want to be a little more rigorous, simply adjust each pitcher's $H by a team's DER. If you want to be more rigorous than that, use team UZR ratings to adjust a pitcher's $H. If you don't have UZR ratings, you can use MAH's new DRA ratings on a team level.
How to do these adjustments in a win added probability system is a little tricky....
Tendu (November 24, 2003)
Posted 12:15 a.m.,
November 26, 2003
(#11) -
MGL
Two things:
One, it is simply not that important to have anywhere near perfect data input. Two, as one who already does this on my own, using games downloaded from MLB.com's website, I can tell you that after watching thousands of games in my life on TV, it is a relatively simple task to judge pitch selection (if you are ever in doubt, you simply look at the pitch speed). Occasionally a splitter and a changeup are hard to distinguish, but who throws a splitter and a changeup and at around the same speed? Same thing for a curve and slider. Very occasionally, a change and a curve look similar, although a curve is usually thrown much slower. A cutter as opposed to a regular fastball can be problematic, but who cares. Ditto for a 2-seam and 4-seam fastball. As long as you get 90+% of the pitches right (which you should), again, who cares?
Pitch location is ONLY difficult when the batter hits the ball, and even then, you have a pretty good idea with practice. And if you are a Tendu stringer, practice you will get!
As far as pitch speed, you hear about the TV guns being so "inaccurate." As someone who watches more than 200 live games a year, that is BS! And even if it's true, again, who cares? A pitcher's average velocity on each of his pitches will even out even with different guns in different games. And once again, no one is going to care (nor should they) if there is a little "slop."
I have nothing to do with the company. I am only defending them because this "crying" about how hard it is to "string" games is ridiculous. It's not hard! Now if Tendu were not hiring and/or training players properly and consistently, or if they didn't have some sort of quality control system to make sure that stringers were not slacking off and/or faking results, that would be a different story...
Sabermetric Reference - CATCHER'S FIELDING (November 27, 2003)
Posted 7:56 p.m.,
November 29, 2003
(#2) -
MGL
My Superlwts (S-lwts) ratings for all players include catcher "defensive" ratings, which are used for catchers in lieu of UZR, which is used for all other position players. The catcher defensive ratings are based on the net value of their SB/CS(PO) (around .18 runs for a SB and -.46 runs for a CS or PO), and their PB's and errors per inning as compared to an average catcher. PB's are probably somewhat a function of the pitcher, although I give all the "credit" to the catcher. As well, a catcher probably deserves some percentage of the credit for the WP's. Same thing for the SB/CS. Pitchers deserve some credit for that as well. In order to properly adjust or include PB's, WP's, and SB/CS totals, you would have to do a fairly complex analysis to "separate out" the effects of the pitchers. Alas, I don't do that. Maybe some time in the future.
Anyway, here is the best and worst catcher "defense" from 2000 to 2002, minimum of 1000 TPA "faced".
Name runs saved/cost per 500 PA "faced"
Matheny 11.8
Ausmus 8.0
B. Molina 6.1
I-Rod 5.4
Lieberthal 4.6
Piazza -8.3
Hatteberg -8.7
Barrett -4.4
Hernandez -3.9
Kendall -2.8
Posada -2.0
As you can see, most of the best defenders can't hit, and the worst defenders can. Barrett was an exception and is why (I assume) he is no longer playing, or at least shouldn't be. Hernandez is marginal, as he is only a decent hitter (for a catcher), and poor defender, and may be one reason why Oak traded him. And you can see why Pudge is so valuable, as he possesses that rare combination of great defensive and offensive skills. As well, you can see why Piazza's overall value was never as good as it seemed, despite scary offensive numbers, not to mention the fact that his offensive peripherals were always terrible (baserunning, moving runners over from second, and GDP). Now that his offense is not that great due to age (and I assume that his defense is worse as well), it is indeed imperative that he move to another position...
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 2:03 a.m.,
December 5, 2003
(#31) -
MGL
The same you would with anyone else, even if you knew his name was Barry Bonds. Because you've decided to only use 1 year of data, even if you have more, you always regress based on the sample size.
Tango, whoa Nellie on that statement! You may be misleading people. It was significant that Phillie booster said "out of Zeus' head" - IOW, that we know nothing about the player. These formulas (e.g. Marcel the Monkey) are predicated on not knowing anything else about the player other than his 3-year (or whatever time period) stats. If we know that it is Bonds, then we have more info, so the formula will not necessarily work well as it stands in terms of predicting future performance, if that is the intention. Anytime we know something else about our player we would like to tweak the formula if we can.
People should not forget that these formulas are based on regression analyses or at least something loosely analogous to a regression analysis. The precise way to come up with a "best estimate" of a player's true performance level is to know or estimate the true distribution of talent in the population from which the player comes, and then do a Bayesian analysis based on that and a binomial distribution which models sample error in the sample stats you are working with. A regression equation and these simple equations (like Marcel) work great, but they are still VERY shorthand versions of the more precise Bayesian analysis.
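One way to see why the simple regression formulas are shorthand for the Bayesian calculation: if the talent distribution is modeled as a Beta distribution (an assumption made purely for illustration, as are the .270 mean and the prior strength of 400 AB), then the Bayesian posterior mean under binomial sample error is exactly the sample rate regressed toward the population mean. The function names are hypothetical.

```python
def posterior_mean(successes, trials, prior_mean, prior_strength):
    """Posterior mean of true talent under a Beta prior and binomial data."""
    alpha = prior_mean * prior_strength
    beta = (1.0 - prior_mean) * prior_strength
    return (alpha + successes) / (alpha + beta + trials)


def regressed_estimate(successes, trials, prior_mean, prior_strength):
    """The shorthand version: regress the sample rate toward the mean."""
    reliability = trials / (trials + prior_strength)
    return prior_mean + reliability * (successes / trials - prior_mean)


# A .340 average in 300 AB (102 hits), population mean .270 (assumed).
bayes = posterior_mean(102, 300, prior_mean=0.270, prior_strength=400)
shorthand = regressed_estimate(102, 300, prior_mean=0.270, prior_strength=400)
print(round(bayes, 3), round(shorthand, 3))
```

Both come out to .300 here. The equivalence holds only for this particular choice of prior; with a differently shaped talent distribution the full Bayesian answer and the linear shorthand would drift apart, which is the sense in which the shorthand is less precise.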
BTW, when you regress to the league average in the Marcel or any other projection formula, if you know some characteristics about the player, such as ht and wt, or even defensive position alone, you can regress to those subsets of players and not the league average as a whole. You can also do something similar for your observation of a player's athleticism or the "sweetness" of his swing. For example, if a player with a lousy swing hits .340 in 300 AB's, it is more likely to be fluke than for a player with a nice swing. Again, you can handle this by regressing the first player's sample BA to the league average for players with lousy swings, etc.
And yes, each component very definitely has different regression coefficients and different age adjustments, just as they have different park factors...
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 11:43 p.m.,
December 5, 2003
(#36) -
MGL
MGL, I don't think you read me right. Suppose you are a long-time baseball fan, but you can't remember a number to save your life. And all you have are the 2003 batting stats.
It doesn't matter what his name is... you would still regress the 600 PA hitters around the same way.
I know that's what you meant, but I didn't want anyone to be misled. I'm not sure you aren't misleading (some) people again.
Let's take your first sentence above. You say you "can't remember a number," but presumably you have a name (Bonds). Do you know anything about that name, even though you don't know any numbers? Do you know that he is a good player? A great player? His father was a great player? All of these things change the number towards which you regress.
If you know literally nothing about this player named Bonds (he could be named Savkjgbjd for all you know), then isn't it so obvious that it won't affect the regression that it's not worth even mentioning?
There is a tenet in statute interpretation (in law) that says something like "If there are two ways to interpret something and one way (way "A") means that there was absolutely no reason to mention "X" and "X" was in fact mentioned, then interpretation "B" must be assumed."
I assume that you mentioned Bonds with the assumption that our "person" must know something about him otherwise there was no reason to even bring up the fact that he knew the name of the player...
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 7:55 p.m.,
December 10, 2003
(#11) -
MGL
Since a lot of smart guys hang out here but not on Fanhome, where this idea was started and continues, maybe someone can help me out here:
I understand how CS can correlate positively with baserunning runs (BRLWTS). But that is only because CS correlates with SB attempts and with SB themselves. It is like K's and offensive production. They correlate only because players with high K's also have high HR's. There is no cause/effect relationship.
Given that, how can we use the formula BR = .10SB + .12CS for any individual player?? Clearly, the higher the CS's for any given SB, the BRLWTS do not go up! If we have a regression equation that only includes CS, then yes, we can use it to predict or estimate BRLWTS such that the higher the CS, the "faster" the runner. But surely once we use SB already, the CS have to negatively correlate with speed or baserunning! So the formula should read something like BR = .something*SB minus (not plus) .something*CS.
The examples I gave on Fanhome were:
player A: 100 SB, 20 CS
player B: 100 SB, 40 CS
player C: 100 SB, 60 CS
Tango's formula says that player C is the "fastest" (has the highest BRLWTS). That is absurd!
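For anyone who wants to see the numbers, a quick Python sketch of the formula as quoted (BR = .10SB + .12CS) applied to the three example players. The function name is made up for this illustration.

```python
def baserunning_runs(sb, cs):
    """Baserunning runs from the formula quoted in the post: .10*SB + .12*CS."""
    return 0.10 * sb + 0.12 * cs


# The three example players: (SB, CS)
players = {"A": (100, 20), "B": (100, 40), "C": (100, 60)}

for name, (sb, cs) in players.items():
    print(f"player {name}: {sb} SB, {cs} CS -> BR = {baserunning_runs(sb, cs):.1f}")
```

With SB held at 100, each extra CS adds .12 runs, so the estimate rises from A to B to C, which is exactly the behavior being objected to.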
When you combine the basestealing formula with Tango's baserunning formula, you get the correct sign (correlation) for CS, but that is an accident. What if CS were not that bad, such that the correct basestealing formula were SB/CS runs = .20*SB - .10*CS? Now if we add up the two formulas to get QADBR, we have the "wrong sign" for the CS (we get QADBR = .30SB + .02CS).
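The term-by-term addition can be checked with a few lines of Python. This is only a sketch of the arithmetic: the baserunning coefficients are the ones quoted in the post, the basestealing coefficients are the post's "what if" values, and the helper name is hypothetical.

```python
def add_formulas(*formulas):
    """Add linear formulas given as {stat: coefficient} dicts, term by term."""
    total = {}
    for formula in formulas:
        for stat, coef in formula.items():
            total[stat] = total.get(stat, 0.0) + coef
    return total


baserunning = {"SB": 0.10, "CS": 0.12}            # quoted baserunning formula
what_if_stealing = {"SB": 0.20, "CS": -0.10}      # the hypothetical "CS not that bad" case

combined = add_formulas(baserunning, what_if_stealing)
print({stat: round(coef, 2) for stat, coef in combined.items()})
```

The CS coefficient comes out at .12 - .10 = .02, which is small but positive, so in this hypothetical the combined formula would credit a runner for getting caught.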
Someone help me out here!
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 8:03 p.m.,
December 10, 2003
(#13) -
MGL
Now that I think of it some more....
Assuming that Tango's multiple linear regression of SB and CS on baserunning lwts was correct (I assume he did a "normal" MLR analysis), is it possible that it is true that even for a constant SB, that players who have more CS's are actually faster (have higher baserunning lwts)? IOW, what is more predictive of speed or baserunning lwts is a player's attempts and not their success rate?
I don't know that much about regression analyses, but when you do a multiple regression, don't the regression coefficients "assume" that the other variable is constant?
Is that where I am going wrong? I just assumed that a player with 100 SB and 20 CS would be faster than a player who had 100/40, even though the 100/40 attempted a steal more often?
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 8:11 p.m.,
December 10, 2003
(#14) -
MGL
BTW, as far as the overall value of Rickey, as compared to Schmidt or any other player, that is why Super-lwts (whether we use UZR, DRA, or any other good defensive metric doesn't matter) is so important and valuable, if I may say so myself. Without baserunning lwts and defense, and a few other minor things (GDP and moving runners over), we are leaving out a significant part of the picture for no good or even apparent reason.
If we add in Tango's custom lwts by batting order, we have almost everything we need to see who are the best and worst overall players in any era or across eras (assuming we do cross-era adjustments correctly)...
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 11:40 p.m.,
December 10, 2003
(#18) -
MGL
Or maybe, and this may be a stretch, Base Runs are a function of aggressiveness as well as speed. The guy who gets caught stealing 40 times is obviously willing to take some chances.
Actually, that may very well be true. We think that baserunners on the average are WAY too conservative. We will address that in our book. Over-aggressiveness on basestealing is a bad thing, but over-aggressiveness on baserunning may be a very good thing, so your theory may have some merit....
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 1:02 p.m.,
December 11, 2003
(#23) -
MGL
Regarding the data above..
OK, the plus guys are at the top and the minus guys at the bottom, but I can see no particular "order" for the list other than that. Is this part of an IQ test? In what order are the players in the list?
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 2:10 a.m.,
December 12, 2003
(#26) -
MGL
I thought that the best/worst column would be cool, but it looks like most of the "worst" are just players who rarely attempt steals and are really slow, as opposed to players who rarely attempt steals, but are not that slow. Also, I wonder how much randomness there is in the best/worst ratings (I think a lot) as getting thrown out at a few bases or not can be a fluke one way or another.
I could have sworn there was some "order" to that first list, as it looked like most of the fast guys were at the top and the slow guys were at the bottom.
Wouldn't it be great if players could both steal and run the bases optimally, especially the fast ones (i.e., be less aggressive at the right times on basestealing and more aggressive on baserunning)? I guess we'll have to wait until our book comes out! :)
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 12:33 a.m.,
December 12, 2003
(#3) -
MGL
I have some respect for DM, but..
1) Their methods are apparently proprietary, which immediately takes them out of the realm of legitimate scientific research as far as I am concerned.
2) The description of their analyses on the above website is tainted with surplusage and fluff. To wit:
we look at range factors, which are assists and/or putouts per nine defensive innings, keeping in mind that range factors can be severely biased by the nature of a team's pitching staff: the left/right mix, strikeout rates, and tendency to generate ground balls versus fly balls
There is absolutely no reason to "look at range factors" if you have access to PBP data and use it for a zone based defensive metric, which they obviously do. That's like saying that "even though we have sophisticated metrics for evaluating offensive value and production, like lwts, OPS, runs created, baseruns, etc., we also look at BA."
That is pure unadulterated B.S.!
in cases where our findings are at odds with a player's reputation, we use the video clips on MLB.com to watch a large number of plays involving that fielder...
They can't be serious!
I take back what I said in the first sentence. I have absolutely no respect for them....
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 12:45 a.m.,
December 12, 2003
(#4) -
MGL
To continue my excoriation of DM....
But it's hard to judge pitchers on only one season because they typically get dozens of chances to make plays, while other fielders get hundreds of opportunities.
If we extend our review of pitchers who convert a high percentage of chances into outs to include the last three years...
Unbelievably asinine statements! The gold gloves (both the official ones and the DM ones) are awarded each year. What possible relevance is a player's 3-year stats? When giving yearly awards, you don't care if a player's one-year performance was luck or skill. Sheesh!
Mussina was a good pick, in my view, because he was in the league's top tier in turning batted balls into outs, was third in the league in error-free chances, controlled the running game (only 9 steals in 19 attempts), and has done these things well enough in the past to show that this was not a fluke....
I didn't know that controlling the running game was part of a pitcher's qualifications for a gold glove, which is "fielding performance" as far as I know. Do they consider a position player's SB/CS totals for gold gloves? (OK, maybe that's a stretch.)
And again, whether a fielder's performance was a "fluke" or not should have absolutely no bearing on his Gold Glove qualifications in any given year!
I am blown away by the poor quality of this article (as you can tell)...
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 1:02 a.m.,
December 12, 2003
(#5) -
MGL
Two more criticisms, now that I've plunged Primer into italics hell....
"It's a classic question. Would you rather have a guy with great range but is somewhat error-prone or someone who's steadier but doesn't cover as much ground?"
That's a classic question? DM, did you forget that you were a sabermetric web site?
Since an error is "worth" almost exactly what a missed ball (base hit) is, you would rather have the guy that makes the most outs (per opportunity), period! That's not much of a question (for a sabermetrically knowledgeable person)!
Finally, DM just gets done trashing A Jones' defense (and rightfully so, although I haven't done his 2003 UZR yet, but I expect it to be bad like 2002); then they go ahead and make him one of their Gold Glove recipients! Makes no sense to me!
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 2:15 a.m.,
December 12, 2003
(#6) -
MGL
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 2:18 a.m.,
December 12, 2003
(#7) -
MGL
Trying to get rid of italics. Not having much luck.
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 3:33 a.m.,
December 12, 2003
(#9) -
MGL
I take back what I said about Jones. A preliminary 2003 UZR for A. Jones is:
+16 runs per 162 games (not including arm).
In the NL (CF only), BTW, Biggio was terrible, Pierre was only average, Lofton was still good, Finley was atrocious, and Kotsay was God (again)...
In the AL (CF only)...
Bernie once again was off the chart terrible (not even counting his bad arm), at -44 per 162! He has to be moved to DH! Baldelli was terrible, Hunter was good, but not great once again, and Cameron was lights out again (one of the best overall players in baseball), with Wells, Beltran, and Damon all good as expected. Oh and Erstad, for his limited playing time, was once again great, even with all his injuries...
I'll publish the complete 2003 UZR's soon...
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 10:02 a.m.,
December 12, 2003
(#14) -
MGL
Hey, at least I stimulated some good discussion! Seriously, I was unfair to DM of course. Despite their shortcomings, to which none of us is immune, they are one of the best sabermetric resources out there, especially for defense. Of course, we all wish that they would provide more insight into their exact methodologies, as well as provide a "number" for defense, but you can't fault someone for trying to make a buck, and I guess that is one of their market strategies.
I'll have some more comments later. Got to go to a funeral (ex-mother-in-law passed away)....
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 7:47 p.m.,
December 12, 2003
(#22) -
MGL
Thanks guys.
Anyway...
A few more comments...
Since UZR's at first base do not have a large range of values, it is reasonable to think that scooping bad throws is a fairly big part of defensive ability at first. Snow is considered particularly adept at this (hard to say whether it is true of course). His UZR ratings have not been that good (a little above average this year), but he may be a much better first baseman overall because of his scooping ability.
It pisses me off that the "stringers," the people who score the games for STATS and other companies that provide PBP data, don't record "bad throws". It's not the stringers' fault of course. There is a lot of valuable stuff that is left out of the PBP data, but that is starting to change. Eventually, everything will be taken off a video by a computer I imagine.
I am working on a way to estimate a first baseman's scooping ability by simply comparing the errors made by 2b, SS, and 3b with that player at first and the same 2b, SS, and 3b with another player at first, with the appropriate weightings. Actually, it is not as simple as it sounds, as you have to deal with all kinds of tricky sample size issues.
As far as whether one should use prior years' data for these types of awards, I completely disagree with the sentiment here. Although I normally have little interest in performance type metrics (I usually want to know true talent and future performance), I think that an award such as a gold glove should have absolutely nothing to do with any other year but the one in question, nor should you have to use data from any other year. I understand the arguments made herein about using past years to establish baselines and to increase the confidence level of one particular year's evaluation, but the bottom line is that other years should have very little (almost no) relevance or influence on how much "value" a player added or subtracted to his team as compared to another player in the same year, nor should we care at all about that player's true value (real talent, future performance, etc.).
As far as including a pitcher's ability to control the running game in the Gold Glove award, I would have to look at the guidelines before I knew for sure what my stance was on that. Clearly it is "defense" (preventing runs) but clearly it is NOT fielding. The fact that they call it a Gold "Glove" suggests that they are interested in fielding only and NOT every aspect of defense. If they were interested in every aspect of defense, why would you not include a pitcher's pitching in his Gold Glove qualifications? IOW, can you say for sure whether holding runners is part of pitching or part of fielding or an entity all to itself? I don't think it is clear cut (in fact I know it is not). Obviously a catcher's SB/CS totals are a lot closer to being part of his "fielding" than they are for a pitcher, since he has to actually throw the ball. I don't think that the fact that it is included in a catcher's GG quals makes it anywhere near a no-brainer for pitchers. In fact, if traditionally managers do not take that into consideration for pitchers, then you can make a reasonable argument that it shouldn't be considered, as sometimes tradition creates the rules, when the rules are incomplete or ambiguous.
Yes, defense does generally decline before overall offense, J Cross. The evidence is that fielding is like triples and that it peaks very early for obvious reasons (speed and agility rather than learning or experience are the major components, at least at the major league level). As far as whether a "declining" fielding metric suggests an impending decline in offense, on the average, I have no idea. I doubt there is much connection, although there is probably a decline in triples, SB, and bunts and infield hits contemporaneous with a decline in defense, but I don't think that one "follows" or "signals" the other.
More important, however, is the fact that it is almost impossible to tell whether a player has started to decline or not (or when he peaks, etc.), as there is just too much noise. Anecdotal examples are the A. Jones thing and Jeter in hitting, although I'm sure there are many examples...
Some more 03 UZR "previews" (ones that I thought were interesting without necessarily mentioning any numbers): These are preliminary because I haven't put in any changes yet into the UZR methodology that I plan to, based on some great discussions and suggestions on Primer and Fanhome, and I haven't updated the park factors yet (I am using last year's which should be around the same for all parks but Cin I think)...
R. Alomar, not surprisingly, was terrible. Time to retire real soon. Although we should expect to see some rebound offensively and defensively, I would not want him on my team at any price.
Bagwell still good even though offense has declined a lot with age. My aging and defense research last year indicated that defense at first base, unlike any other position, may improve with age into a player's 30's like some offensive components. That is not too surprising, I don't think.
D. Bell is still great at third and unheralded I think.
Berkman is surprisingly decent.
As I said Biggio did not make the adjustment to CF real well, but putting someone in CF (a young person's position) at his age was not a good idea.
A. Boone still good on defense, which makes his overall value pretty good despite mediocre offensive numbers.
Amazingly, Bordick still one of the best at any IF position, still largely because of his great hands!
S. Casey, tremendous! L. Castillo not so good.
Both Chavez' (Endy and Eric) very good.
Hee Seop no good.
Jose Cruz great!
Counsell's defense keeping him in the game, despite the ugliest batting stance in the history of baseball.
J.D. Drew no good anymore or fluke?
Eckstein and A. Everett still great! Carl Everett, OTOH, yuk!
Furcal very good, very good overall player. One of the best.
Giambi's days at first should be numbered. Yanks defense is in big trouble, other than Boone at third (and Soriano)! Here are their approx. 2003 UZR's at each position:
First Base: -21 (N. Johnson was no good either)
Second base: +7
SS -26
3B +13
LF -4
CF -40 (-33 Bernie)
RF -3
That is a total of -74, almost a half run per game! Think how much better their pitchers are than what their ERA's suggest!
S. Green still terrible. Not much value overall any more.
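A quick check of that total in Python, using the approximate position numbers listed above. Dividing by 162 to get a per-game figure is an assumption here, based on UZR being quoted per 162 games elsewhere in these posts.

```python
# Approximate 2003 Yankees UZR by position, as listed in the post.
yankees_uzr_2003 = {
    "1B": -21,
    "2B": 7,
    "SS": -26,
    "3B": 13,
    "LF": -4,
    "CF": -40,
    "RF": -3,
}

total_runs = sum(yankees_uzr_2003.values())
runs_per_game = total_runs / 162  # assumes a 162-game season

print(f"total UZR: {total_runs}")
print(f"per game:  {runs_per_game:.2f}")
```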
Grissom still no good. Despite great and surprising offense in 2003, I hate him as a player.
Vlad no good in 2003. Is that a fluke (before everyone invents reasons for his decline, think Andruw)?
C. Guzman still terrible! Why does this guy still have a job?
Hatteberg, NG!
Izturis average - very surprising. Fluke?
All the Lee's were great - Travis, Derek, and Carlos! Travis was the best.
As I said, Lofton can still chase 'em down. His reputation for "losing it" came from a few awkward plays in the '02 WS, I think.
Lowell was way better this year. Maybe he's not as bad as I previously thought.
Tino still good. Should add a half win to the D-Rays to bring their projected win total up to around 51 games.
Mientk.. excellent!
Olerud still pretty good.
Mags was great in right. Also one of the most underrated players in baseball.
Neifi Perez back to being very good. I don't know what is up with his fielding other than he has fluctuated wildly from year to year. Maybe just extreme random variation. Someone has to have that.
Polanco an unheralded great defensive player.
Pujols can field pretty well too!
I almost forgot. Bonds was not nearly as bad as I anticipated (a little below average).
Manny was terrible!
A-Rod and Nomar both above average again (A-Rod better). One of these guys is better overall than Jeter and the other is a jillion times better overall than Jeter.
Rolen, OK, but nowhere near as good as in previous years (injuries?).
Rey Sanchez not really pickin' 'em anymore. If he can't, that should be end of career - oops, I forgot about his veteran leadership value!
Ichiro finally has good UZR numbers! Maybe his last few years low UZR's were a fluke.
Tejada below average, but not by much.
Thome was horrible! I can't think of his previous years' UZR's (good, bad?) off the top of my head.
Jose Valentin fantastic! Another terrific and underrated player I think.
Vina was terrible in limited play.
Ty Wigginton was atrocious. What's his reputation on defense?
Womack wasn't too bad this year. Still can't hit of course. He and Neifi are interchangeable to me and should be bat boys for some team, not batters.
As I said, Cameron was lights out again. Easily the best defender in baseball year after year.
Keep in mind that the above comments are heavily and unfairly biased towards one-year samples, which are not necessarily representative of a player's true defensive ability. To get a much more reliable snapshot of a player's true defensive value, you should look at their multi-year (3?) UZR's combined (perhaps weighted). If you don't look at them combined you will be tempted to make unreliable inferences about their ascent or decline...
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 3:30 p.m.,
December 15, 2003
(#25) -
MGL
Bob, there is no doubt that a catcher's numbers (SB/CS and PB) should be adjusted for the pitchers they catch, especially a part-time catcher who might catch a particular pitcher every time he pitches. Whether Tippett did that or not, I have no idea.
UZR does not rate catchers. In Super-lwts there is a category for catcher lwts which uses a catcher's SB/CS, PB, and error rates to come up with a runs saved or cost as compared to an average catcher given the same # of opportunities. Unfortunately, I do not control for the pitchers either. I should, and maybe I will from now on.
Jamie Moyer does indeed have one of the lowest WP rates in baseball so yes, you would expect his catchers' PB rates to be somewhat low as well (I think - I don't know how much correlation there is - PB's are supposed to be somewhat independent of the pitcher - i.e. a PB is not supposed to be a pitch in the first or other difficult or "wild" pitch). And of course, any catcher who catches a left pitcher is necessarily going to be helped considerably with his SB/CS numbers. OTOH, it is sometimes difficult to tell "which came first" - the catcher's SB/CS rate or the pitcher's, especially since I don't know of anyone who has made a reasonable estimate of each player's (pitcher or catcher) percent of "responsibility" in the SB/CS attempt and success rates.
Looking at Moyer's 2000-2002 SB/CS numbers and using that as a proxt for how HE controls the running game, i.e., how much HE helps Wilson, he allowed 34 SB in 56 attempts, which looks like about an average number and rate (61%) for a LHP. Compare that to someone like Maddux, who in 3 years has allowed 74 SB's in 101 attempts (73% success rate). OTOH, SEA RHP's, such as Baldwin (24/46), Garcia (36/42), and Piniero (36/50) had similar attempt and success rates.
Looking at my Superlwts from 2000-2002 (I don't have the 2003 ones yet), and using the methodology described above for "catcher defense," I have Wilson at only +5 runs, Molina at +16 (compare Piazza at -27), so even with possibly getting "help" from Moyer, Wilson was not all that great for those 3 years...
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 3:44 p.m.,
December 15, 2003
(#26) -
MGL
proxt=proxy
pitch in the first=pitch inthe dirt
left pitcher=lefy pitch
my typing = sucky
Banner Years and Reverse Banner Years for a Player's BB rate (December 14, 2003)
Posted 1:49 p.m.,
December 14, 2003
(#4) -
MGL
TOK, you got that exactly right. Since Cruz was 29 in his banner year, right on the cusp, I suppose you should use an average of the younger and older guys weighting scheme.
I am using a rate stat, of course, although I'm not exactly sure what you mean (is 50 BB per 500 PA a "rate" stat or a counting stat?). Any stat can be expressed any way you want, e.g., HR's per AB, per PA, per season, per game, per career - it has to be [i]per something[/i] unless you simply don't want to tell someone or you don't know what the "per" is (e.g., "Bonds hit 127 HR's.")
BA I guess is traditionally called a "rate" stat, and RBI, for example, is traditionally a "counting" stat, but that is just semantics - both are "per something." The RBI is "per season" (or per career or whatever you want) and the BA is "hits per AB." Don't get caught up in words like rate stat and counting stat. They mean nothing, and can only confuse people.
It isn't that clear in the write-up, but all of those numbers (42, 60, etc.) are [i]per 500 PA[/i].
MH, I don't know that this study informs us as to whether you can "teach" someone (or to what extent) to have plate discipline. Could be. I think that we always knew two things - one, that players' plate discipline increases quite significantly as they age (or as they get professional experience, I don't know which is the cause), for whatever reasons, and two, that different players probably show different "true aging curves" when it comes to plate discipline.
Maybe the study does suggest that a lot of the differences we see in players' plate discipline (bb rate) curves are NOT random noise. I'm not sure. I thought the principal benefit to a study like this (and much more work needs to be done, especially for other components) was to aid us in our projections (so we can beat that pesky Marcel next year)....
Banner Years and Reverse Banner Years for a Player's BB rate (December 14, 2003)
Posted 3:50 p.m.,
December 14, 2003
(#6) -
MGL
Nice. My first question: What is the correct way to predict BB rate for a non-banner player. A guy who neither had a terrific break through or a horrific fall?
Last 3 years' weighted (5/4/3) average and then regressed towards the league average. You can regress around 25% for a full 3 years and more for less time. You should age adjust as well - 2% per year, peaking at age 37. These numbers are off the top of my head.
For example, a player is now 26 years old. His BB rates per 500 last 3 years were:
38
41
52
First, age adjust each year to get them equivalent. Say "convert them to age 37": 38*1.26, 41*1.24, 52*1.22
48
51
63
Now take the weighted average. 3*48+4*51+5*63 divided by 12.
55
Now, adjust that back to age 27 (55/1.20).
46
Now regress that 25% toward the league average of, say 50 (non-pitchers). 46*.75+50*.25
47! Voilà!
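The steps above can be sketched in Python as follows. This is only an illustration of the off-the-top-of-the-head recipe in the post, not a definitive projection system. The season ages (24, 25, 26) are inferred from the multipliers 1.26, 1.24, 1.22, the age adjustment is treated as a simple 2% per year from age 37, and all function and constant names are made up. The post rounds at each step; the code rounds only at the end.

```python
PEAK_AGE = 37          # from the post
AGE_STEP = 0.02        # 2% per year, from the post
WEIGHTS = (3, 4, 5)    # oldest season first, most recent last


def age_factor(age):
    """Multiplier that converts a rate at `age` to its age-37 equivalent."""
    return 1.0 + AGE_STEP * (PEAK_AGE - age)


def project_bb_rate(rates, ages, target_age, league_avg, regression):
    """Age-adjust, weight 3/4/5, adjust back to target_age, then regress."""
    at_peak = [rate * age_factor(age) for rate, age in zip(rates, ages)]
    weighted = sum(w * r for w, r in zip(WEIGHTS, at_peak)) / sum(WEIGHTS)
    at_target = weighted / age_factor(target_age)
    return at_target * (1.0 - regression) + league_avg * regression


projection = project_bb_rate(
    rates=[38, 41, 52],   # BB per 500 PA, oldest to most recent
    ages=[24, 25, 26],    # inferred from the 1.26 / 1.24 / 1.22 multipliers
    target_age=27,
    league_avg=50,
    regression=0.25,
)
print(f"projected BB per 500 PA: {projection:.1f}")
```

Rounded to a whole number, the result matches the 47 in the worked example.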
Banner Years and Reverse Banner Years for a Player's BB rate (December 14, 2003)
Posted 12:03 p.m.,
December 15, 2003
(#8) -
MGL
In the study, IBB's are removed (I think). I never look at IBB's for anything unless I am studying IBB's...
UZR 2003 Previews (December 18, 2003)
Posted 3:39 p.m.,
December 18, 2003
(#3) -
MGL
Much apologies as I completely screwed up the above numbers (the best and worst). I double counted some years for some players. Please ignore them! They are all wrong.
I have asked Tango to put the new charts up. Hopefully that will be soon! Again, I apologize! That's what I get for working at 3 in the AM!
UZR 2003 Previews (December 18, 2003)
Posted 4:25 p.m.,
December 18, 2003
(#5) -
MGL
Ditto Tango's sentiments. I don't include catchers in the UZR ratings. If I did, I suppose I could look at bunts, squibs, and popups, but alas I don't. In my Super-lwts which I will also publish, I have a defensive rating for catchers, which includes PB, SB/CS and errors. None of these are adjusted for the pool of pitchers a catcher catches, although they obviously play a role in those numbers.
Tango has a nice defensive metric for catchers which DOES control for pitchers, and which includes some other things as well. He and I have been having some discussion and some disagreement about the appropriate way to present and combine the individual values in his metric.
Speaking of combining, Tango pointed out to me - and it is worth stating here - that there is some potential confusion/misrepresentation in combining a fielder's UZR's at different positions, as I did above. For example, a player, like Cruz Jr., may have very good UZR numbers in RF, and relatively poor ones in CF. Aside from the random fluctuations inherent in any sample UZR rating, this is quite normal, as the amount of defensive skill needed to play an average CF is necessarily much higher than the amount needed to play RF. In fact, according to Tango, there is about a 10 runs per year difference, which is a lot. IOW, if a player's UZR is 0 runs in CF (an average CF), his equivalent UZR in RF would be +10 (a well-above average RF'er). Therefore, every defensive position can and should be adjusted to fairly compare it to any other defensive position to get everyone on a "level playing field." To put it another way, one can "neutralize" every player's UZR by doing the appropriate adjustment, depending on their defensive position or positions. For example, and again, according to Tango, you can normalize everyone to an average CF by adding 10 runs per 162 to a SS or C's UZR, 5 runs for a 2B'man and 3B'man, and subtract 10 runs for 1B, LF, and RF. This is a rough guide.
Anyway I did not do any of that. I simply took each player's UZR's at all of the positions they happen to have played and added them up. Each player's UZR at each position is of course as compared to an average player at that position. Again, for example, Cruz Jr. appears to be an average or better RF'er, and an average or worse CF'er. If Cruz were on the lists above, his overall number would simply represent how he was in CF as compared to an average CF'er plus how he was in RF, compared to an average RF'er. Tango doesn't seem to like adding these 2 numbers together, but I see no problem with that, as long as the above explanation is given (and I'm not sure that such a detailed explanation was even necessary).
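For reference, the rough position adjustments attributed to Tango above can be written down as a small lookup in Python. This is a sketch of the "normalize to an average CF" idea only; the table and function names are hypothetical, the values are the rough per-162 figures quoted in the post, and none of it was applied to the lists that follow.

```python
# Runs per 162 to add to a UZR to express it relative to an average CF,
# using the rough figures quoted in the post.
TO_AVERAGE_CF = {
    "C": 10,
    "SS": 10,
    "2B": 5,
    "3B": 5,
    "CF": 0,
    "1B": -10,
    "LF": -10,
    "RF": -10,
}


def normalize_to_cf(position, uzr_per_162):
    """Convert a per-162 UZR at `position` to the average-CF scale."""
    return uzr_per_162 + TO_AVERAGE_CF[position]


# An average RF (UZR 0) rates as -10 on the CF scale, and a +10 RF
# rates as an average CF, matching the CF/RF example in the post.
print(normalize_to_cf("RF", 0))
print(normalize_to_cf("RF", 10))
print(normalize_to_cf("SS", 0))
```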
BTW, even though I do all the work, Tango has a much better handle than I do on how to interpret the UZR stuff. So any criticism should be directed at him, and any praise to me! :)
Here are the top and bottom 5 players (in total UZR runs and NOT UZR runs per 162 "games"), AT EACH POSITION, as requested:
First base:
Worst
F. McGriff -45
M. Vaughn -41
P. Konerko -30
J. Thome -25
A. Galarraga -25
Best
T. Helton 75
T. Martinez 45
T. Lee 40
D. MientkiwieczxCxCv 31
T. Zeile 28
J. Bagwell 28
Second base:
Worst
L. Rivas -51
R. Alomar -48
T. Walker -25
D. Easley -24
M. Young -24
Best
A. Kennedy 70
P. Reese 40
P. Polanco 36
J. Hairston 36
M. Grudzalvzbvczbvnvb 31
Shortstop:
Worst
D. Jeter -82
C. Guzman -60
T. Womack -40
D. Cruz -31
C. Gomez -24
J. Rollins -24
Best
R. Sanchez 47
A. Rodriguez 42
J. Valentin 41
D. Eckstein 41
M. Bordick 40
Third base:
Worst
A. Ramirez -60
T. Fryman -53
F. Tatis -35
M. Lowell -28
G. Norton -27
Best
S. Rolen 94
D. Bell 77
A. Beltre 50
E. Chavez 49
J. Cirillo 44
R. Ventura 44
Left field:
Worst
P. Burrell -59
C. Floyd -50
B. Grieve -42
D. Ward -40
G. Sheffield -38
Best
G. Jenkins 77
B. Higginson 57
L. Gonzalez 53
R. White 38
T. O'Leary 31
D. Erstad 31
Center field:
Worst
B. Williams -82
D. Glanville -64
S. Finley -63
J. Cruz Jr. -58
M. Grissom -47
Best
D. Erstad 95
M. Cameron 90
A. Jones 61
M. Kotsay 53
R. Rivera 48
Right field:
Worst
S. Green -49
M. Alou -39
A. Bell -33
M. Ordonez -30
M. Stairs -29
Best
T. Nixon 44
J. Guillen 39
J. Dye 37
R. Hidalgo 28
V. Guerrero 28
It seems to me that if a player is more than x amount of standard deviations from average, either good or bad, they should be moved to another position. The following players from above would appear to be excellent candidates to move either left or right down the defensive spectrum:
S. Green (RF)
Erstad and Cameron to SS? (CF)
B. Williams (CF)
G. Jenkins (LF)
S. Rolen (3B)
D. Bell (3B)
D. Jeter (SS)
A. Kennedy (2B)
T. Helton (1B)
UZR 2003 Previews (December 18, 2003)
Posted 9:58 p.m.,
December 18, 2003
(#23) -
MGL
I have read many of both DS's and J. Cross' posts, and they are both big contributors to this site as well as thoughtful and intelligent analysts. I think emotion got the better of both. I've been there before. FWIW, I don't think DS meant anything "racial" with that comment. In fact, I didn't even notice it as such.
A lot of good points (and a lot of overstated points) about UZR and changing positions and changing positions in general.
BTW, I was kidding about moving Cameron or Erstad to SS although I did indeed forget about Erstad being LH. Perhaps he would have been an IF'er if not for that.
Now don't forget that Tango's UZR conversions or adjustments are based on players who have played different positions. He did not just make them up or make some assumptions about what happens when you move a player across the "defensive spectrum." That is an important point because that means that yes, we can make some pretty good assumptions, on the average, about what will happen when we move a player from, say CF to RF, or SS to 3B.
Of course, that doesn't mean that these "conversion factors" hold true for every player, by any stretch of the imagination. Common sense alone tells us that certain players are more suited to certain positions than others and that certain players may do particularly well at one position and particularly poorly at another position, regardless of what the UZR conversions try to tell us.
I would be careful, however, about making assumptions about players, like "Jeter sucks at SS (which we all pretty much agree on), but he would not be any better at 2B or 3B." I've seen him as much as anybody I suppose and I can't think of any compelling reasons why that may or may not be true. I think you have to assume that if you moved him to 2B or 3B (or even CF), that he would pick up the usual "conversion" points in UZR runs. Maybe not of course.
Same for Jenkins. Although he is a white guy with few steals, he must be doing something to make him a great left fielder and that something has to be something to do with speed, quickness or "jumps." Again, there is no reason why any great LF'er cannot be a very good CF'er. Unless Jenkins' high UZR in left is, or substantially is, a statistical fluke, which is always possible, he must be "fast" in the outfield (I put fast in quotes to mean and/or quick and/or gets good jumps), which means he potentially has the skills to be a good CF'er.
I agree that there may not be that much correlation between IF and OF skills, however there is SOME, as certainly speed, quickness, and agility are requisites for the entire OF, as well as SS and 2B, and they sure help at third and first as well. I would think that all CF'ers combined are going to do a lot better at SS than all LF'ers and RF'ers combined.
As far as actually moving a player from one position to another, by definition, you [almost] HAVE to move a player if he is that far from average at his regular position, notwithstanding politics. The only legitimate reason NOT to move him, other than politics and the fact that you temporarily might not have someone to replace him, is if he is particularly suited to that position OR unsuited to other positions. If in fact Jeter would not pick up 5 or 10 runs by moving him to another position, then yes, there is no point in moving him, other than to DH, which you can't do for political reasons, at least for another 5-10 years.
The bottom line (as the person who mentioned the 15 runs and SS and DH suggests) is if you are a team, you have to do all the permutations and calculations such that you put players in optimal positions, given their probable defensive value at each position and their offensive value. You have to keep in mind that a run saved does NOT equal a run earned - the former is slightly more valuable, and of course, politics (keeping players and fans happy) is always a consideration.
MH, I have no idea what you are talking about with errors. As you say, an error is indeed worth around what a missed ball (a hit) is worth. In fact, I think STATS ZR lumps errors and hits together, which is perfectly proper. I don't for mostly accounting reasons. The only difference is that one or the other may have a different luck/skill ratio and one may have a greater sample error due to the methodology (i.e., more or less regression when converting observed values to estimates of true values or projections).
UZR does keep errors (hands) and range separate and I usually just combine the two. One reason is that almost all of the variation among fielders at a position is range and not errors. There are occasionally some exceptions, like a lot of Bordick's year in and year out value is in "hands" (errors), and as I did state one time, R. Ordonez fluctuated wildly in errors for a few years for some or for no reason at all.
Interestingly (again, another anecdotal example of how bad observation is), A. Ramirez was -5 total UZR runs (-14 per 162) with the Cubs and +2/+3 with Pittsburgh. The difference is probably not enough to "see" - in fact, I would venture to say that we don't ever "see" the differences in players' UZR's. We only "see" differences in "technique" which may or may not correlate well with a player's true defensive talent. And of course, unfortunately, we see and mentally note "great" plays and "bonehead" plays, both of which probably have little to do with a player's overall defense, as measured by UZR.
A. Ramirez, as we can surmise from looking at his error numbers or his fielding %, was one of those rare players who had decent range and atrocious error numbers. His UZR total range in 2003 was +4 runs while his total error UZR was =8 runs! In 02, it was -23 and -3, and in 01 it was -21 and -1, and in 00, in limited time, it was -5 and -4. It appears as if Aramis is a genuinely atrocious third baseman, both in range and in hands, although it is possible that his range is getting better, as evidenced by his 03 range UZR. Honestly, I have no idea what the proper weights are for combining a player's historical UZR's in order to project their future UZR, although I suspect it is something like OPS, or 5/4/3 plus regression...
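As a rough illustration of the "5/4/3 plus regression" idea, here is a minimal sketch using the Ramirez 2001-2003 range and error runs quoted above. The function name, the 25% regression amount, and the use of raw season totals (rather than per-162 rates) are all assumptions made for the example, not the actual projection method.

```python
def weighted_projection(yearly_runs, weights=(5, 4, 3), regression=0.25, mean=0.0):
    """Weight seasons (most recent first), then regress toward the mean.

    regression=0.25 is a placeholder assumption, not a fitted value.
    """
    pairs = list(zip(yearly_runs, weights))
    total_weight = sum(w for _, w in pairs)
    weighted_avg = sum(runs * w for runs, w in pairs) / total_weight
    return weighted_avg * (1 - regression) + mean * regression


# A. Ramirez season totals from the post, most recent first: 2003, 2002, 2001
range_runs = [4, -23, -21]
error_runs = [-8, -3, -1]

range_proj = weighted_projection(range_runs)
error_proj = weighted_projection(error_runs)
print(f"range: {range_proj:+.1f}, errors: {error_proj:+.1f}")
```

Because the recent +4 range season gets the biggest weight, the projected range comes out less bad than the 2001-2002 numbers alone would suggest, which is the point of weighting by recency.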
UZR 2003 Previews (December 18, 2003)
Posted 10:03 p.m.,
December 18, 2003
(#25) -
MGL
Last paragraph, =8 should be -8.
UZR 2003 Previews (December 18, 2003)
Posted 11:58 p.m.,
December 18, 2003
(#27) -
MGL
If a lefty was platooned and faced 90% righties instead of the normal 70% (?) maybe his numbers should be adjusted down to reflect his true ability.
Yes, yes, of course it should. For my player projections and my offensive ratings in Super-lwts, I adjust for quality and handedness of opponent pitchers. For most players, especially full-time ones, it is no big deal. But of course if a player is being platooned, then yes, his overall ability (versus all handedness pitchers) is going to be quite different than his sample stats would suggest.
Everybody loves a new stat adjustment.
Uh, I don't know about that, unless you define "everybody" as "nobody."
For projection purposes and estimating ability or context-neutral value (all basically the same thing), yes, one should take a player's complete sample stats and do all the adjustments (park, opponents, age etc.) and come out with neutral, normalized stats, be it OPS, lwts or individual components, which represent that player's value or projection if they played on a league average team in a league average park, versus the league average %'s of lefty and righty pitchers. From there you can do anything you want. If you want to project how that player would do in a platoon role, you do exactly the same thing (you DON'T look at their sample splits!) - come up with an overall projection - and then adjust that projection using a league average platoon ratio if the player is a RHB, or a unique platoon ratio for that player if the player is a LHB. And that unique platoon ratio for LHB's must be computed by taking that player's multi-year sample platoon ratio and heavily (like 75% or more, depending upon sample size) regressing it towards a league average platoon ratio for LHB's. You might even be better off just using a league average platoon ratio for all LHB's, as you would for RHB's. It would be like DIPS for platoon ratios. It is definitely BETTER to use a league average ratio for LHB's than to use that batter's actual sample platoon ratio, if those are your only choices (for some reason you can't or don't want to do a regression).
The big mistake that almost everyone makes, even some astute analysts and so-called sabermetricians, is to quote or use a player's sample platoon ratio or sample splits to discuss, analyze or justify a platoon situation. If a player has sample splits that are around league average anyway, then that is fine of course. But if a player has an extreme split one way or another, that is a huge mistake!
For example, let's say a RHB had an overall OPS of .763 in around 600 PA's over the last 2 or 3 years - pretty average. Let's also say that his OPS versus lefties was .864 in around half of those PA's and .666 versus RHP. He is obviously being platooned a lot.
It is a mistake to look at his .864 versus LHP and to say that he is likely to hit around that in the future versus lefties, as if the .666 tells you nothing about his overall batting ability! Basically since this player is a RHB, we assume that he has a league average true platoon ratio, which means that our best estimate of how he will hit versus lefties in the future is simply .763 times 1.09 (league average OPS platoon ratio for RHB), or .832, a far cry from his sample OPS versus lefties of .864, since half of his sample PA's were against lefties and half versus righties. I am not including regressions in these projections or age adjustments or anything like that.
BTW, this exact player is Tony Graffanino who is the poster boy for a RHB who should be platooned. What I am saying is that he should not be platooned any more or less than any other RHB with an overall OPS of .763 (adjusted for the number of RHP and LHP he faced). IOW, a player's sample splits (at least for RHB's) should have NO BEARING on whether they should be platooned or not! The only thing that should be considered is their overall OPS projection. If it is low enough and he has other good qualities, like defense and baserunning, then you might want to consider platooning him, if it is cost effective to tie up 2 players for one slot in the BO...
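The arithmetic in the example above (.763 times 1.09), along with the heavy regression suggested for a LHB's sample platoon ratio, can be sketched roughly as follows. The function names are made up for illustration, and the LHB figures in the usage lines (a 1.30 sample ratio and a 1.10 league ratio) are purely hypothetical placeholders, since no LHB numbers are given here.

```python
def project_vs_lhp(overall_ops, platoon_ratio=1.09):
    """Projected OPS vs. LHP for a RHB: overall OPS times the league
    average RHB platoon ratio (1.09 is the figure quoted in the post)."""
    return overall_ops * platoon_ratio


def regress_platoon_ratio(sample_ratio, league_ratio, regression=0.75):
    """Move a sample platoon ratio most of the way toward the league ratio.

    regression=0.75 follows the "75% or more" suggestion; the right
    amount would depend on sample size.
    """
    return sample_ratio + regression * (league_ratio - sample_ratio)


# RHB example from the post: overall .763, sample .864 vs. LHP
print(round(project_vs_lhp(0.763), 3))

# Hypothetical LHB: 1.30 sample ratio, 1.10 assumed league ratio
print(round(regress_platoon_ratio(1.30, 1.10), 3))
```

Note that the sample .864 never enters the RHB projection at all, which is the whole argument: only the overall OPS and the league ratio are used.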
UZR 2003 Previews (December 18, 2003)
Posted 12:02 a.m.,
December 19, 2003
(#28) -
MGL
BTW, this exact player is Tony Graffanino who is the poster boy for a RHB who should be platooned
What I meant was he is used as a poster boy by those uninformed (about how actual platoon ratios work) people I was talking about...
UZR 2003 Previews (December 18, 2003)
Posted 2:17 p.m.,
December 19, 2003
(#33) -
MGL
Do you think that it is possible (or even likely) that Williams would become an average or above right fielder if put in that position?
It's possible, but no, it is extremely unlikely. Given that Bernie should pick up a few (5-10) runs in UZR in right, yes, it is true that Sheffield may not be much better, if at all, than Bernie.
The problem with Bernie playing RF is that his arm is atrocious for a CF. I don't think that a manager would dare put him in right...
UZR 2003 Previews (December 18, 2003)
Posted 7:21 p.m.,
December 19, 2003
(#36) -
MGL
But as far as the assertion by mgl that the corner OF positions are essentially the same as CF, but just to a lesser degree--I don't see why this should be assumed.
Nowhere did I make or imply such an assertion. Everything else you said in that paragraph is true. Didn't I also say or imply that there are obviously some "skills" unique to certain positions?
All I said is that absent some knowledge about a particular player's skills at a particular position, we can assume that when a player moves from CF to LF or RF that he will automatically pick up x amount of runs! That is true, that is true, that is true! What exactly is it that YOU want to assume, given that in this hypothetical, we know NOTHING about the player? The reason we can ASSUME that our hypothetical CF'er, that we know nothing about, will pick up 5 or 10 runs, theoretically, or on the average, when he moves to RF or LF is because we GOT THAT ADJUSTMENT by studying all players that have played multiple positions. When we find something out about ALL players (assuming that it is large enough to not worry too much about sample error), then it is safe to assume - in fact, you HAVE TO assume - that that something applies to any given player in that sub-group, if we know nothing else about that player! That's all I said about Jenkins or any other CF'er. If you know or you think you know something about Jenkins that would make him different from the average player who has played both LF and CF, then fine - I have no quarrel with that. Obviously, within that sub-group there are going to be all kinds of "exceptions". Maybe Jenkins would not lose only 10 runs in talent by moving to CF. Maybe he would go from a great LF'er to a below average CF'er because he has some skills that are particularly suited to LF and not to CF. I have no idea. IF you do, then fine. That's not what I was talking about, was it - that I KNOW that Jenkins is just like the average OF who has played both LF and CF? I don't know that! Since I don't know anything about Jenkins, the absolute best estimate of his UZR in CF is going to be his LF UZR projection minus 5 or 10 runs, or whatever Tango says the adjustment is, based on historical results from all players who have played both LF and CF! Sheesh!
And, yes, there is a certain number (x amount of runs or x amount of SD's of runs), plus or minus, at which a team MUST consider moving a player to another position. Whether they do or not, or another team does or not, depends on their other personnel AND if they think or know that this player is particularly suited to their present position, such that the standard adjustments from position to position do not apply to him.
Yes, of course, it is going to affect that player's offensive value above replacement or above average at that position, but that doesn't really matter.
David, don't look at it as a player "moving to another position." That is what is screwing up your thinking. Look at it as, when the season starts, every player can potentially play multiple positions. Then look at it like we can estimate their defensive value at all those positions, based on their past UZR's at the positions they have played AND the average adjustments for the other positions. So now you have Jeter, who is projected at say -40 at SS (that's not true, but say it is), -30 at 2B and -20 at 3B. The question is where do you play him, not whether you move him from SS to another position (politics aside). You wouldn't even think of putting him at SS any more than you would think of putting a bad third base defender at SS, because a terrible SS is the exact same entity as a bad 3B'man (again, not considering players' unique skills at certain positions)!
So yes, as soon as you realize that a player is way above or way below the average defender at a certain position, you MUST consider moving him to another position, and if you determine that he doesn't have any skills that are particularly suited to one position over another (as compared to an average defender), then you MUST move him.
The players I pointed out are clearly over that threshold whatever that threshold is, which I have no idea. That means that yes, their teams must consider moving them, notwithstanding their other personnel and politics. For example, if it is determined that Jenkins does not have particular skills that make him especially suited for LF as opposed to CF (again, as compared to other players who have played both positions), i.e., that he is not going to lose more than say 10 or 15 runs in CF, then yes, they SHOULD move him and get another LF'er!
Tango can probably better explain why you HAVE to consider moving a player if he is way better or worse than the average defender. BTW, if a player is way worse than the average defender at a certain position AND his offense is really bad, then you may have to consider moving him to the bench or to the unemployment lines, such as when Bernie's offense goes in the tank. IOW, if a player is way too good defensively at a certain position, then not only MUST you consider moving him to a more demanding defensive position, but if his offense is good enough for the original position, like with Jenkins, then his offense is obviously going to be OK at the new, more demanding, position. OTOH, for the player whose defense is terrible at one position, you have to either move him to a less demanding defensive position, or if his offense is not good enough at that position (less than replacement or so), then you have to bench him or release him (or get some sucker to take him in a trade). And you know what B. Beane said. "There's a sucker GM born every minute!"
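To make the "every player can potentially play multiple positions" framing concrete, here is a minimal sketch built on the made-up Jeter numbers above (-40 at SS, -30 at 2B, -20 at 3B). The adjustment values are simply the differences implied by that hypothetical, not Tango's actual conversions, and the 20-run threshold is an arbitrary placeholder since no real threshold is given.

```python
def extrapolate_positions(known_pos, known_uzr, adjustments):
    """Estimate UZR at other positions from one known projection plus
    average position-to-position adjustments (in runs)."""
    projections = {known_pos: known_uzr}
    for pos, adj in adjustments.items():
        projections[pos] = known_uzr + adj
    return projections


def must_consider_move(uzr, threshold=20):
    """Flag a player who is far above or below average at his position.
    The threshold is a hypothetical placeholder."""
    return abs(uzr) >= threshold


# Adjustments implied by the hypothetical in the post (SS -> 2B, SS -> 3B)
ss_adjustments = {"2B": 10, "3B": 20}

jeter = extrapolate_positions("SS", -40, ss_adjustments)
print(jeter)
print(must_consider_move(jeter["SS"]))
```

This only covers the defensive side for a player we know nothing else about; a real decision would also weigh offense, the other personnel, and any skills specific to one position.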
UZR 2003 Previews (December 18, 2003)
Posted 10:39 p.m.,
December 19, 2003
(#40) -
MGL
I am just putting the finishing touches on the 2003 UZR's. Actually I already have the UZR's done. I am just adding the arms for OF'ers and the gdp's for IF'ers, so that we can get the complete defensive picture (more or less).
UZR 2003 Previews (December 18, 2003)
Posted 4:32 p.m.,
December 20, 2003
(#42) -
MGL
The 2003 (and updated 2000-2002) defensive lwts (UZR, GDP for IF'ers and ARMS for OF'ers) are done!
Tango will post it somewhere!
2003 Gold Gloves (highest total UZR runs plus GDP runs for IF'ers and ARM runs for OF'ers, min 90 games):
First base
NL T. Helton
AL T. Lee
Second base
NL Polanco (in only 90 games, he had the most def. runs!)
AL M. Ellis
Third base
NL A. Beltran
AL E. Chavez
SS
NL A. Everett
AL J. Valentin
OF
NL M. Kotsay, R. Hidalgo, E. Chavez
AL M. Cameron, Ichiro, R. Winn
UZR 2003 Previews (December 18, 2003)
Posted 5:56 p.m.,
December 20, 2003
(#44) -
MGL
Here are Jose's year by year range and error lwts:
Year, range, error, games
99 3, -7, 81
00 17, -10, 144
01 5, -3, 38
02 8, -2, 73
03 27, -1, 123
That is a total of +60 runs in range and -23 in errors in 457 games, which is +21/-8 per 162.
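For anyone who wants to check the per-162 conversion, here is a minimal sketch that totals the table above and rescales it. One note: the games column as listed adds up to 459 rather than the 457 quoted, but the per-162 figures round to the same +21/-8 either way.

```python
# (year, range runs, error runs, games) as listed in the post
seasons = [
    (1999, 3, -7, 81),
    (2000, 17, -10, 144),
    (2001, 5, -3, 38),
    (2002, 8, -2, 73),
    (2003, 27, -1, 123),
]


def per_162(runs, games):
    """Rescale a run total to a 162-game rate."""
    return runs * 162 / games


total_range = sum(s[1] for s in seasons)
total_errors = sum(s[2] for s in seasons)
total_games = sum(s[3] for s in seasons)

print(total_range, total_errors, total_games)
print(round(per_162(total_range, total_games)),
      round(per_162(total_errors, total_games)))
```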
It is not surprising that they don't like his error rate. It is not good (around 10 extra errors per season). However, it would appear as if his range more than makes up for his bad hands.
He would be a good candidate for being undervalued overall. It is also not surprising that a team would put more emphasis on errors than on range, as #1, it is easily measured, and #2, it is the traditional measure of an infielder's defense.
I took a quick look at his 2000-2002 Super-lwts. This guy appears to be a sleeper. His numbers are near superstar levels for a SS. His baserunning and offensive linear weights are terrific, and as discussed above he is likely a well-above average defender.
His 5 mil a year salary is a bargain. I would think that he is worth, or at least he was (he is a little old at 31), around 7 or 8 mil a year...
UZR, 2000-2003 (December 21, 2003)
Posted 1:17 p.m.,
December 21, 2003
(#1) -
MGL
Nice work on the chart Tango! How did you come up with the weightings? An educated guess? There is a typo in the weightings on top. The second 2002 should read 2000.
For readers not that familiar with UZR, check out the UZR articles (part I and part II) on the Primer archives. A crash course:
UZR runs includes "range" and "hands" (hands includes throwing and other errors) combined. A player like Jose Valentin at SS appears to be terrible with "hands" (high error rate), but excellent in range. A player like M. Bordick at SS appears to be good in range but great in hands. Keep in mind that there is much larger variation in range than in hands, even for IF'ers. For OF'ers, there is very little variation in hands.
For DP lwts (IF'ers only), for every DP or missed DP (per opportunity, of course), equal credit is given to the fielder and the pivot man, although I don't know that the true split is indeed 50/50.
First basemen are not evaluated on their "throw receiving ability" although that is certainly part of their fielding "toolbox."
Pop flies and line drives are not evaluated for IF'ers.
Finally, an OF'er's ability to keep players from stretching singles into doubles or doubles into triples, by virtue of the threat of their arm, is not considered in arm lwts. Throwing out a runner trying to stretch IS included. So keep in mind that the lwts value of OF arms is probably understated, maybe by quite a bit (50%?).
A couple of observations:
Lots of very bad and very good CF'ers! I think that the reason is that managers will throw or leave lots of people in center who don't really belong there. Not so, for example, at SS (Jeter notwithstanding).
Cruz is interesting. Horrible UZR in CF and very good UZR in RF in about an equal number of games. Do you think he has some special skills that are not particularly suited for CF, but are for RF or he just can't get the hang of CF, or is it just a fluke and he is really a slightly below average CF'er and a slightly above average RF'er? I vote for the latter.
Are Hidalgo and Valentin two of the most underrated defenders in baseball? Underrated overall? Hidalgo's 2003 salary looks about right at 8.5 mil. Valentin is underpaid I think...
UZR, 2000-2003 (December 21, 2003)
Posted 2:54 p.m.,
December 23, 2003
(#7) -
MGL
Chris, Kearns should be in the complete CSV file, is he not? He is indeed very good. +27 in 182 games in LF, CF, and RF (mostly RF - 128 games). That is +22 in UZR and +5 in arm, so his arm is great too...
UZR, 2000-2003 (December 21, 2003)
Posted 2:13 p.m.,
December 24, 2003
(#12) -
MGL
Tango, not a big deal, but the typo (2002 should read 2000) is still there in the link...
UZR, 2000-2003 (December 21, 2003)
Posted 3:54 p.m.,
December 28, 2003
(#15) -
MGL
Yes, of course, inherent in a player's UZR is his positioning, good or bad. Same thing for pitchers and batters, right? So yes, when we use a performance metric to estimate a player's "skill", we are including some aspects that have nothing to do with the player himself (manager, scouts, etc.).
One, I doubt it is THAT big a deal, as I think that no manager, team, or scout is going to "tolerate" awful positioning by a fielder. All teams have advance scouts, etc. In fact, if there is ANYTHING that most teams do well, it is that. It is illogical to think that Little and the BoSox, especially with Epstein and James on board, are going to promote or tolerate bad positioning from Walker or anyone else.
So to answer some of your questions and concerns, which are very good ones BTW:
1) I don't think that positioning makes much difference as long as it is reasonable and I think that most teams employ reasonable positioning.
2) You can probably optimize positioning with a complex video or visual analysis, but it would be difficult.
3) I don't know if the effect of positioning would be "linear" or not across all player talent levels, but as I said I don't think the effect is large anyway, and for small effects, linearity or non-linearity are almost the same. I highly doubt that positioning makes NO difference for very good fielders. If I had to guess I would say that positioning is roughly linear to SOMETHING across all talent levels.
UZR, 2000-2003 (December 21, 2003)
Posted 10:02 p.m.,
December 29, 2003
(#18) -
MGL
Dave, my UZR includes park adjustments. The 2 parks/positions that have significant adjustments are Coors Field and Fenway/LF, so that should already be factored in to Manny's UZR.
As each year/team is just a small sample of a player's true UZR and his true defensive ability, you get a best estimate of both by combining UZR's over time to get a large sample size. When a player has a large (or small) difference between his UZR from one year or team to another, it is much more likely that this is just random fluctuation than that his talent got better or worse. In 298 games over the last 4 years, mostly in LF, Manny has a UZR of around -18 per 162. That means that he is likely a very bad OF'er, which shouldn't surprise anyone as there is nothing in his offensive stat profile that indicates any kind of speed or quickness, one or both of those usually being requisites to good OF defense. I usually don't like to even look at a player's yearly stats, be it for offense or defense. I like to look at total multi-year stats so that I am not tempted to read into any year to year patterns, as almost everyone is tempted to do, and does do (and you have attempted to do), as that is part of human nature apparently...
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 7:55 p.m.,
December 21, 2003
(#3) -
MGL
Good stuff! The idea of a position-neutral UZR is still rattling around in my head like a pinball on speed. Can you explain a little bit what a chart like that means and how it can be "used" (when you get the time, of course)?
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 9:00 p.m.,
December 21, 2003
(#5) -
MGL
As for its implications, for the most part, you do have the best fielders at the top needing to play in one of the 4 top fielding positions.
That is exactly what I was saying to DS on the other thread - that when a player is x amount of UZR runs above or below average at his position, where x is some large number, you MUST consider moving him to another position. I could not explain to DS why this is so. Is it because it is easier (and cheaper) to replace a player at an easier defensive position?
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 9:04 p.m.,
December 21, 2003
(#6) -
MGL
Does this mean that Rolen, Polanco, and Bell should be considered for SS? Jenkins and Hidalgo should be playing center?
Should all the guys near the bottom who are playing one of the top 4 positions (Bernie, Jeter, et al.) be moved?
Again, unless we have evidence of some special skill set....
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 11:12 p.m.,
December 21, 2003
(#8) -
MGL
Tango, say you have a very good CF'er, and you move him to left field and then his UZR numbers are extremely high (maybe this is the case with Jenkins - we don't know). How do you explain that he SHOULD be in CF and not LF (assuming that is true), when it appears that wherever you put him the team has the same overall offensive and defensive runs, assuming a replacement player in the position you don't put the fielder?
After all, the flip side of the argument that you MUST consider moving an extremely gifted defender at one position is that if he were already at that position, you would not move him to the easier position. IOW, if Jenkins is indeed a CF'er in disguise (his UZR in left STRONGLY suggests that he is), you would not even think of moving him to LF. That would be like moving Cameron or Erstad to LF.
The reason I say that a player must be WAY WAY above average (like Jenkins) before you consider moving him, and that then you MUST consider moving him as opposed to just WAY (one "way") above average, like L-Gon or O'Leary, is that if a player is THAT far above average, that is already a VERY STRONG indication that he is playing the wrong position!
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 4:53 a.m.,
December 22, 2003
(#10) -
MGL
AED, very provocative point! I am interested in Tango's response!
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 4:54 a.m.,
December 22, 2003
(#11) -
MGL
Actually, the only problem with that point is that all the OF positions get around the same number of plays (per game)...
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 6:14 p.m.,
December 23, 2003
(#31) -
MGL
Colin, Tango's regression formula for UZR IS the exact amount of regression you want to use before combining it with offense (after regressing the sample offensive lwts) in order to come up with a total player "value," where "value" is an estimate of "true" value...
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 2:11 p.m.,
December 24, 2003
(#35) -
MGL
Colin M., I understand exactly what you are saying. I'll have to think about what the answer is in terms of UZR and other components of a player's value. Of course you are talking about the classic difference between "reliability" and "accuracy" in experimental science, where reliability is correlation from one measurement to the other (for the same element), i.e., year to year UZR correlation, and accuracy is how well the measurement reflects the true value, i.e., how well UZR describes actual fielding talent or value. For example, fielding percentage probably has a high degree of reliability, but a poor degree of accuracy in terms of defining overall fielding skill (not just error rate).
The more stat-savvy Primates will probably have to chime in on this one. If they do, hopefully they will respond in English rather than stat-ese, which I hate to say it, is the main reason why I hesitate to ask for their help...
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:50 p.m.,
December 23, 2003
(#3) -
MGL
Tango, great stuff! It takes a while to figure out what the heck is going on, but after you do, it's even better. Where do you get the formula for "r" from? That's about what I use (around 50% regression for one year stats and 25% for 3, depending upon the position of course).
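The "around 50% regression for one year stats and 25% for 3" rule of thumb happens to fit a simple shrinkage form, sketched below. To be clear, this is not Tango's formula for "r" (which is not reproduced here); the form k / (seasons + k) with k = 1 season is an assumption chosen only because it reproduces those two figures, and it ignores the position-by-position differences mentioned above.

```python
def regression_fraction(seasons, k=1.0):
    """Fraction to regress toward the mean after `seasons` full seasons.

    k is the number of seasons at which regression is 50%; k=1.0 is an
    assumption picked to match the 50% / 25% rule of thumb.
    """
    return k / (seasons + k)


def regress_uzr(sample_uzr, seasons, mean=0.0, k=1.0):
    """Shrink a sample UZR (runs per 162) toward the position mean."""
    f = regression_fraction(seasons, k)
    return sample_uzr * (1 - f) + mean * f


print(regression_fraction(1), regression_fraction(3))
print(regress_uzr(20, seasons=1), regress_uzr(20, seasons=3))
```

So a +20 per 162 over one season would be treated as roughly +10 in true talent, while the same rate over three seasons would be treated as roughly +15.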
Your conversion numbers from one position to another are essentially based on how players actually do who have played more than one position, right? If so, I think we should be careful about getting carried away with the suggestion that a player might be particularly suited to one position and not another. That is speculation. As I have said a number of times, I think that when a player has a very high or very low UZR at one position, even if they have never played another position (like Jenkins, Jeter, or Bernie), I think that is strong evidence that they are at the wrong position, and I would have to be convinced that they have some special skill that would make those conversion numbers NOT apply to them. I mean you developed those conversion numbers based on thousands of innings of players who HAVE played multiple positions, yet everyone wants to make a "special case" for everyone that I suggest should be moved. That's ridiculous. By definition, the special cases have to be fairly infrequent. I know the argument is that the ones who have never played another position ARE the ones who are the special cases, while the ones who HAVE ALREADY played multiple positions are the ones to whom the conversions apply. I know this is a conundrum, like with MLE's, because of selective sampling issues (can you only apply the "conversions" to players whose managers have determined that they CAN play multiple positions, and therefore you CAN'T use them - they don't apply - for players who have only played one position?), but I think it is naive to think that the reason why Jeter doesn't play third is because managers "know" that no matter how bad he is at SS, he would be just as bad at third. That is ridiculous. One, they are afraid to ask him to move positions, two, they don't think he is nearly as bad as he is, and three, managers don't like to rock the boat, especially on a winning team (maybe they are right). Come on, Jeter is big and has a strong arm. If anything he may be more suited to third than to SS. Using Tango's chart, the Yankees could conceivably pick up 6 runs just by switching Boone (who has played SS) and Jeter - but we all know that's not going to happen for political reasons.
He races from first base to third as well as anyone in the league.
Based on 2000-2002 S-lwts, he is not even in the top 20 for baserunning, although he is very good. For 2003, he is in the negative (worse than average) for baserunning!
The best baserunners in 2003:
Spivey, EY, Podsednik, L. Walker (amazing at his age and size), Castillo (the best of the best), Furcal, M. Giles, Polanco, Matthews Jr., Vizquel, C. Blake (also best of best), Kotsay (how good is this guy), O. Cabrera, Pujols, yes Pujols (also best of the best), Beltran (best of best), Guzman (only redeeming thing about his play), Ichiro, Hinske (THE best of best).
I was just eyeballing the list, so I may have missed some. I have a feeling that players as a whole are so non-optimal in running the bases (too passive) that the above players are either very smart or just plain aggressive. Speed is a factor but not as much as you might think.
Here are the worst, BTW. I suspect that most of these are just too slow to ever attempt many extra base advancements although some of them may be just even more passive than the average baserunner. Again, I am just eyeballing a list that has lots of data and is in no particular order.
Worst of 2003
K. Garcia, T. Hafner, E. Martinez (worst of the worst and takes away a lot from his hitting value, -7 runs per 162), Olerud, Millar, Fullmer, Phelps, Posada, Bellhorn, Matheny, Tino, Karros, R. Martinez (that's pitiful for a SS), V. Guerrero (???), Bradley (pitiful for a CF), A. Ramirez, J. Franco, J. Encarnacion (??), R. Belliard, Counsell (??), Jenkins (??), McGriff.
I'll put up a real best and worst on Fanhome.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 12:27 a.m.,
December 24, 2003
(#8) -
MGL
Scoriano, keep in mind that UZR arms does not measure a fielder's ability to cut hits off and hold a batter to a single or a double, rather than a double or a triple. It only looks at "holds" and assists.
I could do that, but it would be a little hard, as there is no "marker" in the PBP data that indicates that an OF'er held a batter to a single or double. I would have to look at the percentage of singles, doubles and triples in the various parks by an average fielder and compare this to a fielder's own percentages. Lots of noise there. Trouble. That is why I say you can probably add 50% or so to an OF'er's arm lwts to account for unmeasured holding batters to singles and doubles...
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 5:24 p.m.,
January 12, 2004
(#13) -
MGL
Tango, does that +32 at first include the fewer opp's at first? I assume that it does.
I agree that Erstad's true value in CF, and therefore at any position, is much less than it was in prior years, due to age and particularly injury (which is the reason for moving him in the first place). So in order to guess how much it will "cost" the Angels to move everyone around, I think we have to assume that Erstad's true defensive neutral value at this point in time is a lot less than +34.
Plus, and this is a big plus, I don't know why Tango didn't regress those 4-year UZR stats to reflect true ability before doing the calculations. Before we try and figure how many runs the Angels will save/cost by moving everyone around, we have to establish each player's true defensive value (by regressing their sample 3 or 4 year values). Actually we have to take that one step further and establish their projections by taking those regressed sample values and adjusting for the fact that all these players are one year older in 2004, and are presumably going to be worse in talent defensively than ever before. For example, I have Erstad's UZR projection in CF for 2004, including age and regression, at +20...
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 10:19 p.m.,
January 12, 2004
(#17) -
MGL
Tango certainly includes the number of opportunities in his "positional transformation formulas." IOW, the fact that first base has limited opp's as compared to CF or SS is already included in the numbers. Keep in mind that the "formulas" WERE derived in large part from players who DID play multiple positions (Tango can correct me if I am wrong), so they are not that theoretical and SHOULD apply to at least some players across the board.
He does state many times that they are rough approximations, that they would not necessarily apply to any large percentage of players, that there are indeed certain unique non-fungible skills at each position, etc., etc., so all these questions about whether Erstad would indeed be that valuable (+20, +30) at first are quite valid. I think that any exact number we surmise for his value at first is going to be pretty unreliable.
OTOH, given that he is likely to have been the best CF'er in recent baseball history, it is not unreasonable to think that he might be one of the best first basemen, which is probably the equivalent of a true value of +15 or so UZR at first. OTOOH, it seems to me that the skills required for a great CF'er (speed, quickness, positioning, getting a good jump, eye-hand coordination, eyesight, judging ball trajectories, fearlessness, etc.) are somewhat different (to say the least) than those required at first (quick hands and feet, good hand-eye coordination, height, positioning, speed and quickness for bunts). As well, there is some evidence in my UZR research that players need to "learn" how to play first, unlike the other positions, where they "learn" them in amateur and minor league ball and "raw skills" are most important at the major league level. Plus, we have no idea how he is going to be in terms of catching bad throws (although as a great CF'er, you can assume that he probably has great athletic skills, eye-hand coordination and eyesight), and covering first base and feeding the pitcher and things like that (some of these requiring practice and experience which one can only get by practicing and playing that particular position).
All that being said, I would put his over/under UZR and DP fielding at first (remember UZR does not measure catching bad throws) at +10 for 2004. That is just a WAG, and if anything, it may be high.
Silver, I am just finishing up the 2003 Superlwts and revised earlier ones. They should be out in the next week or so. As far as doing earlier ones, one of these days I will. For UZR, I don't have batted ball speed earlier than 1998, I think, but it is not that big a deal. Other than that I can put out S-lwts for any year in which there is PBP data. I have the data going back to the late 80's, and I think earlier data is available on Retrosheet, although I'm not sure they have the batted ball locations, which would put the kibosh on UZR, although methods like DRA are pretty darn good at coming up with UZR-like numbers without using PBP data. We can also use PBP from one time period to approximate the distribution of the location of balls hit in other eras and do a "phantom UZR" using those distributions and traditional fielding data, as well as a team's pitchers' G/F ratio, handedness, BIP rate, etc.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:51 a.m.,
January 13, 2004
(#21) -
MGL
MGL, MGL, MGL... you must be the absolute worst reader I've ever met (to go with your memory)!
Hey! I resemble that remark, as my mother used to say. I was referring to your last post, not to the article. I incorrectly thought that you were using unregressed UZR averages, as that +32 sounded too high to be Erstad's true UZR value. As I said, I had him at a projected +20 for 2004. My regressions are probably stronger than yours, plus I add in a fairly aggressive age adjustment, especially for CF and SS. It's not necessarily that my reading comprehension skills and my memory are poor (although they both probably are). It's just that I have 8 million things on my mind, another 6 million projects going, and I make around 4 million posts on Primer and Fanhome a day!
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 1:47 p.m.,
January 13, 2004
(#30) -
MGL
The numbers for good and bad fielders moving to first are fascinating! People keep assuming that Tango's transformation formulas won't "work" for one reason or another, and then he keeps reminding us that they DO work, as they are based on real life players who did play multiple positions, at least for players who are chosen by their managers to move or to play multiple positions (there might be some selective sampling there).
But even if there were selective sampling (the transformation formulas only "work" for players who have the necessary skills to play multiple positions, and not all players have those unique skills), we might not care, as we really only need to use them for players like Erstad, whom the manager and team already decided CAN play first (has the skills necessary for multiple positions). IOW, they might not work, if we were to just speculate about a random player (e.g., how would Juan Gonzalez do at SS?), or we were to try and figure the optimal placement of players on the field (like Jeter to third, AFB to SS, Matsui to CF, etc.).
Tango, it is a snap for me to do UZR's for past years (89-99)! It would take me like a couple of hours. I already have the PBP data (somewhere) for those years (used to have them stored on 3.5 floppies!).
Btw, more important than all this is if MGL implements PZR. You guys should be starting a petition for that. THAT will be the culmination of DIPS and UZR, the 2 biggest advances in the last few years.
I still don't know (or understand) what PZR's are. I'm sure you can point me to the appropriate thread...
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:56 p.m.,
January 13, 2004
(#35) -
MGL
I would think that an infielder moving to another position might convert around the same % of balls (into outs), regardless of where he plays in the IF, and that this might not vary much from player to player. Ditto for an OF'er. But to go from OF to IF or from IF to OF, I would think that would vary tremendously from player to player, as the skills for converting a ball into an out are significantly different in the OF and the IF. For one thing, almost all (maybe ALL) good OF'ers have excellent raw speed. Not true for good IF'ers, I don't think. Was Ripken fast at all?
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 3:04 p.m.,
January 13, 2004
(#38) -
MGL
It's not like his manager didn't know that he was a great OF'er (I don't think). He probably played some first because he was particularly good at it, didn't mind playing there, had some experience at some level, they had no other good, full-time first baseman (or someone was injured), they had some acceptable replacement in the OF, he was injured, or he/they felt that because of the way he plays in the OF (recklessly and fearlessly), he needed a "rest" every now and then...
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 7:20 p.m.,
January 18, 2004
(#59) -
MGL
It illustrates how using words like spectacular, steady, consistent, etc., can muddle the real issue, which is whose talent is worth more runs to his team, and by how much, not who looks better, who makes more spectacular plays, etc. We are always interested in quantitative assessments in baseball, and not qualitative ones, especially ambiguous and arbitrary descriptive ones...
Baseball America Past Scouting Reports - 2003 Award Winners (December 23, 2003)
Posted 12:31 a.m.,
December 24, 2003
(#2) -
MGL
If you read all of BA's scouting reports, they tend to be positive and optimistic, especially for any decent (grade A or B) prospect. It is no surprise that almost any successful ML'er will have a nice sounding scouting report.
Yes, it would be much more interesting to look at the scouting reports of major league flops...
Super-lwts previews - Baserunning (December 23, 2003)
Posted 11:44 p.m.,
December 23, 2003
(#2) -
MGL
Hey, all I am reporting is a simple sample result. As we all know it is subject to a multitude of sample errors, including hit and runs, who was batting behind him, what kind of hits they happened to have gotten, etc. Keep in mind that a whole season is like 150 baserunning opps or something like that.
Remember also that we regress not only according to sample size, but we regress towards the average of a similar player (i.e., a population from which this player came). There would be nothing wrong with regressing Pujols' sample +6 to the average baserunning lwts of a "power hitter who is not fast but not that slow" or something like that.
Plus, if Pujols is indeed considered an overly aggressive baserunner, as Kjok says, that would explain a lot. As we've said many times, we think that the average baserunner is way too passive (a typical risk-averse strategy). If even a not-very-fast baserunner would just be more aggressive, I think they would show much better baserunning lwts. That may be at least partially what happened with Pujols (in addition to just being lucky with this metric).
Remember that you assume that every metric that "appears" to be wrong IS wrong. There would be no poin in having the metric then! Wasn't it Bill James who said that any good metric (that wasn't measuring something that was obvious to the naked eye) HAS TO have x number of surprises, but more than y?
Responses like that of the Q of E on Fanhome are just plain stupid. Why would he even read the list when he already knew who should or shouldn't be on it?
And the funny thing about "criticizing" the baserunning lwts is that it is one of the most straightforward metrics there is. It is what it is. Sure there are sample errors as discussed above and in the original post, but all it does is count a player's extra bases on other players' hits and their times thrown out on other players' hits, adjust for opportunities and compare it to the average player. That's it. Couldn't get any simpler or more "error-proof."
Super-lwts previews - Baserunning (December 23, 2003)
Posted 11:47 p.m.,
December 23, 2003
(#3) -
MGL
My typing is terrible today (my back is killing me). The 4th paragraph above should read:
Remember that you cannot assume that every metric that "appears" to be wrong IS wrong. There would be no point in having the metric then! Wasn't it Bill James who said that any good metric (that wasn't measuring something that was obvious to the naked eye) HAS TO have at least x number of surprises, but no more than y?
Super-lwts previews - Baserunning (December 23, 2003)
Posted 9:55 a.m.,
December 24, 2003
(#5) -
MGL
Tango, I'll add it to the ever-growing lists of things to do! :)
Batting average on balls in play, ground balls and other such beasts (December 24, 2003)
Posted 1:45 p.m.,
December 30, 2003
(#6) -
MGL
Yes, speed has little to do with GB's that go thru the IF, other than the fact that the IF has to play a little more shallow and in some cases play in to protect against a bunt. Speed also affects the ROE on a GB rate.
Also, if we look at hit rates on GB's (both IF hits and hits thru the IF), we are going to get "cross-relationships" because the fast group are probably the lighter and weaker players of the 2 groups, and probably don't hit their GB's as hard as the slow group. Consequently, they will probably get fewer hits thru the IF (with the IF'ers having to play shallower cancelling some or all of that out) and more IF hits, even after controlling for foot speed.
What was the point of this thread?
[Insert component name] Adjustment Factors (December 26, 2003)
Posted 8:44 p.m.,
December 26, 2003
(#2) -
MGL
MGL's UZR might have this problem to some extent as well, especially for a player like Andruw Jones who plays at the same park for half his games over several years.
Yes, I thought about that recently as well. I'm not sure it's a problem, though, when you are using home and road data to establish the H/R adjustment factors. Pinto's is a problem because he is just using the home data, which is of course comprised of 50% the home player or home team and 50% of lots of players who are presumably around league average. In fact, to estimate a PF, you don't have to use home AND road data. You can use just home data (which is what David is doing), as long as that home data is comprised of an unbiased large group of players. The only reason we use home AND road data in traditional park factors is because we have a biased sample using only the home data (the home team hitters or pitchers or both). I'll have to think about your Ruth example and get back. I'm not sure there is a problem with it.
On a related note: You know how you (Tango) don't like the idea of using park factors at all for "adjusting" (neutralizing or estimating what a player would do in a neutral stadium) a single player's stats, mainly because you don't know how a stadium affects THAT particular player (not to mention the fact that true PF's are hard to estimate)? Assuming that parks affect different players in different ways to SOME extent (remember my little study on how a player's home/road splits regress almost 100%, which means that the unique effect of parks on individual players is SMALL?), let's say that our estimate of the true HR park factor for LHB's in Yankee Stadium is 1.20, and that is based on 20 years of data and then we regressed the sample HR PF appropriately. Now let's say that Ruth has a Yankee Stadium HR PF of 1.5. Doesn't that suggest that Yankee Stadium affects Ruth more than the average LHB, such that when adjusting Ruth's home stats, we might want to take a weighted average of the 1.20 and the 1.5 (maybe 95/5)? Or is Ruth's "extra" advantage already factored into the 1.20 since the 1.20 includes lots of Ruth HR's disproportionately to other players in the league (which is maybe a good reason why we SHOULD partially use a player's own home/road splits to adjust HIS stats)? OTOH, all of the Yankee players are disproportionately represented in that 1.20 (some more than others, depending on how many of the 20 years they played in). We don't want to "weight" Ruth's adjustment factor with all Yankee players. We want to weight it with Ruth's home/road splits. So maybe we do want to use 5 parts 1.50 and 95 parts 1.20 (or whatever combination) for Ruth and some other number than the 1.50 when we park adjust the home stats of other Yankee players.
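To make the 95/5 idea concrete, here is a rough sketch of the arithmetic. The 1.20, 1.50 and 95/5 split come from the paragraph above; the 30 home HR's are a made-up input, and dividing by the park factor is just the usual way of neutralizing a home stat, assumed here for illustration only:

```python
def blended_park_factor(park_pf, player_pf, player_weight):
    """Weighted average of the park's regressed PF and the player's own PF."""
    return (1 - player_weight) * park_pf + player_weight * player_pf

# Values from the discussion: 1.20 for LHB's at the park, 1.50 for Ruth,
# blended 95 parts to 5.
ruth_pf = blended_park_factor(park_pf=1.20, player_pf=1.50, player_weight=0.05)

# Hypothetical home HR total, neutralized by dividing by the blended factor.
home_hr = 30
neutral_home_hr = home_hr / ruth_pf
print(round(ruth_pf, 3), round(neutral_home_hr, 1))
```

With so little weight on the player's own split, the blended factor barely moves off the park's 1.20, which fits the point that the unique effect of a park on an individual player is small.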
On a related note, since weather can change the true PF of an outdoor park in any given year (not counting physical park changes) and so can changes in other parks in the league, when we park adjust say a player's or a team's 1999 stats, is it not better to use say 10 years of data, but with the 1999 data more heavily weighted, especially for outdoor parks?
I can't believe you actually posted that MLE article! You went from like Sabermetrics 101 to graduate level STATS 581!
BTW, Tango if you come up with any more provocative topics, please keep them to yourself. My head is about to explode! I actually have real work to do and a book to write! Thank God I'm stuck in Rochester, NY, in the winter with nothing to do!
[Insert component name] Adjustment Factors (December 26, 2003)
Posted 8:57 p.m.,
December 27, 2003
(#6) -
MGL
Tango is correct. You want to use ALL the years you can for a park that hasn't changed. As far as the "other parks" changing in the meantime, you can either: a) live with it, b) try and adjust for it using some sort of a "strength of schedule" iterative process, or c) put more weight on the year within which you are adjusting. For weather, you can do a) or c).
How much you want to break a park up into its components is personal preference, based on time, energy, availability of data, etc. The more information, the better - ALWAYS, as long as you know how to use it in a reasonable fashion. You can always find a way for MORE data to benefit your model, even if you don't know what the hell you are doing and your sample sizes are very small.
The most coarse kind of adjustment is a "whole park" run factor. If it is a symmetrical park, don't worry TOO MUCH (there are still wind issues, for example Wrigley Field benefits balls to left, i.e., RHB's much more so than balls to right, because of the wind) about the R/L thing. If it is a non-symmetrical park, then perhaps you might want to break that "whole park" run factor into one for RHB's and one for LHB's. It's up to you. Ditto for breaking up run factors into component factors - HR factors, triples factors, etc., or BA, OPS, etc.
Personally, I use component park factors for everything I can think of, for all parts of a park: fly balls to left, right and center, bunt attempts, ground balls hit to the infield (how often they go through or not depending upon the speed of the surface), foul balls (size of foul territory), etc. If you do that, you have to be careful that you regress those component park factors aggressively and correctly before you apply them to a batter's or a team's stats to estimate what that batter would do in an unknown, neutral park, since the individual park factor samples are going to be necessarily small. Also, rather than regressing everything to 1.00, since you know lots of things about a particular park, you want to regress sample component PF's to something other than 1.00, depending upon those "things," like altitude, average temp and wind, size of the OF and wall heights, size of foul territory, kind of surface (grass, astroturf, Nex-turf), etc. Kind of tricky but works great!
Park Factor Thoughts (December 27, 2003)
Posted 1:37 p.m.,
December 27, 2003
(#1) -
MGL
I see, you are throwing the kitchen sink into Primate Studies before you embark on your hiatus!
Valuing Starters and Relievers (December 27, 2003)
Posted 1:35 p.m.,
December 27, 2003
(#2) -
MGL
Good discussion and of course, any system that treats both types of pitchers the same (i.e., uses the same baseline ERA to compare to) is indeed problematic. A few comments:
One, I think you understate the selective sampling problem in doing a study of pitchers who have substantial relief AND starting work. It can be worked with (adjusted for) to some extent, but it is a big problem nonetheless.
You say that Tango comes up with a difference of around .60 runs after doing his study. You then map out a plan for a reasonable study and state that if you did such a study, you would probably come out with a difference of more than .60 runs. As far as I know, Tango came up with that .60 by doing precisely what you map out as the correct way to do such a study!
The reason you can keep giving anecdotal (read: worthless) evidence of large spreads in ERA between relievers and starters is that relievers pitch between 70-100 innings and starters around twice that. Of course, the lowest ERA's for relievers will be lower than the lowest ERA's for starters over the same time period, even if they had exactly the same talent (true ERA's)!
Your article is good, but please, please (pretty please), DO NOT use anecdotal evidence to "prove" a point or even to support or contradict an hypothesis or a notion or someone else's conclusions! That is wrong and a pet peeve of mine! Unfortunately, many writers do that all the time in the context of shoddy research! In fact, the word "evidence" in the term "anecdotal evidence" is a misnomer! If it were evidence (beyond a scintilla or a de minimis value), it would not be called anecdotal. When we provide data from thousands or hundreds of samples, we do not call them "anecdotes"! When we provide one or two (or three or ten) data points, we often call them "anecdotes" (and rightfully so), and of course they have little or no evidentiary value because of the sample size. Anecdotes should only be presented to ILLUSTRATE a contention or a conclusion that is or is going to be proven or at least investigated by the proper scientific method! I really don't want to hear about Smoltz or Gagne as "proof" or evidence of anything!
Anyway, very good central point in your article!
BTW, if a pitcher is able to pitch very well for 1 or 2 innings, but he would suck for 5 or 6 or 7 innings for whatever reasons (not enough pitches, stamina, etc.), is that pitcher more or less talented overall than the average starting pitcher? I ask that because you throw out terms like "more or less talented" with regards to starting and relief pitchers without including the proper context (e.g., more or less talented for how many innings?)...
Valuing Starters and Relievers (December 27, 2003)
Posted 3:27 p.m.,
December 28, 2003
(#15) -
MGL
IIRC, Tango found no evidence that "times through the order" is a significant factor for anyone (starters or starters/relievers). Therefore the difference between a hybrid pitcher's stats when relieving versus starting is probably due to throwing harder or at least differently for 1 or 2 innings when they relieve, but pacing themselves from the get-go when they start.
Or, because of selective sampling, it could just be that pitchers who have substantial time as both starters and relievers have NO true differences between starting and relieving, but it is just that when a pitcher sucks at starting, he may get demoted to the bullpen, and when he pitches well in relief he may get promoted to the rotation. This could easily create the illusion that they are actually better pitchers when they relieve. Here is the proof that this could easily be the case:
Assume exactly the same talent when relieving and when starting. Assume they are exactly average pitchers and they are all the same (100 ERA+). 100 pitchers start out as pure starters. 5 will suck to the tune of 1.5 SD's below average. Those pitchers will get demoted to the bullpen. In the bullpen they will have an ERA+ of 100. As starters they will have an ERA+ of something less than 100. Get it?
Given the way some pitchers get shuffled between the pen and the rotation regardless of their true talent (e.g., Weaver), you have all kinds of SEVERE selective sampling issues when looking at all pitchers who have pitched in the pen AND in the rotation. In fact, because of the reasons why some pitchers get switched to the pen or get promoted to starter, your conclusion when looking at their relative performances is going to be foregone and will NOT give you much insight into their true talents when starting and when relieving! It is exactly like trying to come up with MLE's that apply to any player in the minors - very difficult, even harder for this pitching thing, as you will be hard pressed to find many pitchers who pitch well as starters and then get "demoted" to the pen...
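Here is a quick sketch of that thought experiment as a simulation. Everything numeric in it is an assumption for illustration: a true ERA of 4.50 for every pitcher in both roles, normally distributed sampling noise, 2000 pitchers instead of 100 (to keep the result stable), and the worst 5% of observed starter ERA's getting demoted:

```python
import random

def demotion_illusion(n_pitchers=2000, demote_fraction=0.05, seed=1):
    """Every pitcher has the SAME true ERA in both roles (hypothetical 4.50)."""
    rng = random.Random(seed)
    true_era = 4.50
    # Observed ERA as a starter: true talent plus sampling noise (assumed normal).
    starter_era = [rng.gauss(true_era, 1.0) for _ in range(n_pitchers)]
    # Demote the pitchers with the worst observed starter ERA.
    n_demoted = int(n_pitchers * demote_fraction)
    demoted = sorted(range(n_pitchers), key=lambda i: starter_era[i])[-n_demoted:]
    # Their ERA in relief is a fresh draw around the same true talent.
    relief_era = [rng.gauss(true_era, 1.5) for _ in demoted]
    avg_start = sum(starter_era[i] for i in demoted) / n_demoted
    avg_relief = sum(relief_era) / n_demoted
    return avg_start, avg_relief

avg_start, avg_relief = demotion_illusion()
print(f"demoted pitchers as starters: {avg_start:.2f}, in relief: {avg_relief:.2f}")
```

The demoted group looks much better in relief than it did starting, even though by construction nobody's talent changed. The whole gap comes from how the group was selected.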
Valuing Starters and Relievers (December 27, 2003)
Posted 3:31 p.m.,
December 28, 2003
(#16) -
MGL
The more that I think of it, the more that I think it is worthless to compare performances by the same pitcher in both relief and starting roles. Completely worthless! In fact, if it is true that "times thru the order" is not that relevant, that is evidence that maybe there is no significant difference between when a pitcher starts and when he relieves, at least for those pitchers who can and do both (as opposed to "specialty" pitchers who can throw 100 MPH for one inning, like Wagner)...
Valuing Starters and Relievers (December 27, 2003)
Posted 4:31 a.m.,
December 29, 2003
(#20) -
MGL
A thoughtful discussion, Guy. I have to re-read your original article, because I forgot your contention and its ramifications. If you are saying that when a starter, on the average, switches roles, he will improve his per inning performance, I certainly don't disagree. Did you think I did? We already discussed and we are all in agreement that a pitcher can choose to throw harder knowing that he is only going to pitch for 1, 2 or 3 innings, rather than potentially 6 or 7. While some may not be able to do this for whatever reasons, or their style of pitching is not conducive to "throwing harder" (I can't imagine Maddux, for example, throwing harder or altering his style of pitching when pitching in a relief role, but you never know), I think it is fair to say that on the average a starter will pitch better in relief (i.e., some will pitch better, some will pitch the same, and maybe a few will pitch worse - those who take a while to "warm up," if there really is such a thing).
We might have thought that pitching only 1, 2 or 3 innings would have been an inherent advantage over pitching more innings, as we might have surmised that the more times a batter sees a pitcher, the more of an advantage he has. That appears to NOT be the case, based on Tango's research, but you never know. It could certainly be that seeing a pitcher for the second time IS an advantage for a batter, but that advantage is eliminated because, on the average, a pitcher increases his talent slightly after the first couple of innings. In either case, the net result is the same (same result first, second, and third times through the batting order), so it doesn't really matter why, I don't think. So again, we are left with the fact that a pitcher (or at least some pitchers) can and do alter their pitching when they know they are only out there for a short time, in such a way that their per batter or per inning "talent" is better than if they anticipated being out there for a long time. Makes sense. Probably true. Now, verifying that hypothesis and trying to quantify it is the real tricky part.
Guy, I think you still don't understand the magnitude of the selective sampling problem when you are dealing with dual-use pitchers. It is extremely problematic. It doesn't take a whole lot of pitchers who get bombed or pitch poorly as starters getting demoted to the bullpen to screw up the randomness and independence of your reliever and starter ERA samples for the dual-use pitchers. This can easily be illustrated by doing a simulation on the computer.
The other thing I said is that if a group of pitchers (relief pitchers) have smaller average samples, their best and worst ERA's per season will ALWAYS be more extreme than those of a group (starters) that has a larger average sample size (innings), assuming that one group is not much more or less talented than the other. This is a mathematical certainty of course. If we took a group of 100 pitchers and threw them 3 innings each for the whole season, and another group of 100 and threw them for 200 innings for a whole season, which group would boast the most pitchers with the most extreme ERA's? It is a no-brainer, right! Several of the 3 inning pitchers will have ERA's of 0.00. Several will have ERA's over 10, etc. Of the 200 inning pitchers, none will likely have ERA's of 0.00 or even 1.00 and none would likely have sky high ERA's. Same thing but less extreme results for a group of pitchers who throw 75-100 innings per year (relievers) versus a group of pitchers who throw 150-200 inn. per season (starters). Of course, if you complicate things by throwing pitchers who pitch in only a few innings into the starter group, then the results are going to change a little. I am not saying that relievers will have more extreme ERA's than starters (which they will) because they are relievers - it has nothing to do with that. I am just saying that any group of pitchers who don't pitch that many innings will have more extreme ERA's than a group that pitches a lot more innings, regardless of the relative talents of the two groups (assuming they are reasonably close in talent). It is the notion in statistics and gambling that in the short run "fluctuation trumps expectation," when expectation is expressed as a RATE, like ERA, but in the long run it is the reverse.
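The 3 inning versus 200 inning comparison is easy to sketch in code. The runs-per-inning distribution below is invented (it works out to roughly 0.47 runs per inning) and every pitcher is given identical talent, so any difference in the spread of ERA's is purely a sample size effect:

```python
import random
import statistics

# Hypothetical runs-per-inning distribution (about 0.47 runs per inning).
RUNS = [0, 1, 2, 3, 4]
WEIGHTS = [0.73, 0.14, 0.08, 0.03, 0.02]

def simulate_eras(n_pitchers, innings, rng):
    """Season ERA for n_pitchers of identical talent, each throwing `innings`."""
    eras = []
    for _ in range(n_pitchers):
        runs = sum(rng.choices(RUNS, weights=WEIGHTS, k=innings))
        eras.append(9 * runs / innings)
    return eras

rng = random.Random(2003)
results = {inn: simulate_eras(100, inn, rng) for inn in (3, 75, 200)}
for inn, eras in results.items():
    # innings, best ERA, worst ERA, standard deviation of ERA
    print(inn, round(min(eras), 2), round(max(eras), 2),
          round(statistics.pstdev(eras), 2))
```

The 75 inning group stands in for relievers and the 200 inning group for starters. The fewer the innings, the wider the gap between the best and worst ERA, with talent held exactly equal.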
That's all I have for now...
Best Fielding Teams, 2003 (December 28, 2003)
Posted 5:27 p.m.,
December 30, 2003
(#4) -
MGL
As I've said many times, with my UZR methodology at least, I take pains to reduce the chances of sample error creeping into the results, especially as we make adjustments for more and more variables (and the sample sizes get smaller and smaller). The way I do that is through regressions and "mitigations," to be vague about it.
Let me give you an example. Let's say that player A hit 20 home runs in park P. Let's say that we want to estimate how many home runs he would have hit in a neutral park. We want to do some sort of park adjustment of course. Now let's say that we have some data that suggests that this is a good home run park, but we are not sure how reliable this data is, plus we have only a very small sample of data that suggests this is a good HR park. What to do, what to do. Let's say the small sample of data is such that hitters hit twice as many home runs in this park as in other parks. But that's only in like 100 PA's, so we know that there is not much confidence in that 2-1 ratio. What to do, what to do. Now say we look at the park, and it is smaller than an average park, but not that much smaller. Now we are more confident that at least this is a good HR park, but probably not THAT good (HR PF of 2.00). What to do, what to do. We also have the problem, as Tango often points out, that maybe this park does not truly affect our player the way it affects an "average" player. What to do, what to do. If we use that sample PF of 2.00 and say our player would have hit 10 HR's in a neutral park, rather than the 20 in this park, we know intuitively that we are probably "overadjusting," which is the main thing that Tango worries about with all these UZR adjustments (he should also worry about bugs and mistakes in the "program"). What to do, what to do. Here's what can be done, and here is why any adjustment, no matter how small the data sample is, is ALWAYS better than no adjustment, if you mitigate it properly, which I think I almost always do, because I am always cognizant of the problem of "overadjusting." You start with this: Our player hits 20 HR's, but we have evidence that it is a good HR park.
If we do no adjustments because we are scared about overadjusting because we really don't have much sample data as our evidence, we get a park-neutral HR number of 20 still. That's fine. No park adjustment. What about 19.9? Surely that is better (closer to the truth) than 20, since we have SOME evidence that this is a good HR park. What about 19.8? Well, how strong is our evidence that this is a good HR park and how good do we think it is, based on that evidence? Basically, we keep going until we strike a balance between some level of adjustment to improve our estimate of his neutral HR rate, and the fact that we have limited data to support our notion that this is a good HR park. The key is to be conservative, such that you can say with a high degree of confidence that my "adjusted" value is better than my unadjusted one, given what evidence I have of the true nature of the adjustments. For example, if Manny has an unadjusted UZR of -30 in LF at Fenway, it is safe to say that his park-neutralized UZR is somewhat better than that. How much better? Well, that's the problem. But as long as you are conservative, which I always am, you are OK. No one can accuse you of coming up with adjusted results that are worse than the unadjusted results. If anything (in fact, this is always the case), with my UZR's and my other stats that get adjusted for various things, I am underestimating the adjustments, such that while the values generated may not be perfect, they are ALWAYS better than the unadjusted versions.
You have to take my word for that, but if I were to show you the intermediate results for UZR (unadjusted, adjusted for park, adjusted for park and speed of ball, etc.), you would see that each adjustment is very slight, and also intuitive and obvious, which means, again, that the final adjusted results HAVE TO be better than the unadjusted ones, just as if I asked you, in my 20 HR example above, what is a better estimate of player A's park-neutral home run number, 20 or 19.9, given that we have some sample of data, albeit not very reliable (small size, etc.), that suggests that park P is a better than average HR park?
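To make the 20 HR example concrete, here is a rough sketch of one way a "mitigated" park adjustment could look. It is not the actual UZR code. The shrinkage formula and the prior_pa constant are assumptions picked only for illustration, and, like the example above, it pretends all 20 HR were hit in park P.

```python
def regressed_park_factor(sample_pf, sample_pa, prior_pa=2000.0):
    """Shrink an observed park factor toward 1.00 (a neutral park).

    prior_pa is a hypothetical constant: the number of PA at which the
    sample and the neutral prior would get equal weight.
    """
    weight = sample_pa / (sample_pa + prior_pa)
    return 1.0 + weight * (sample_pf - 1.0)


def park_neutral_hr(observed_hr, sample_pf, sample_pa, prior_pa=2000.0):
    """Divide observed HR by the regressed (not the raw) park factor."""
    return observed_hr / regressed_park_factor(sample_pf, sample_pa, prior_pa)


# Naive version: trust the 100 PA sample PF of 2.00 completely.
raw = 20 / 2.00
# Conservative version: the small sample only moves the PF a little off 1.00.
mitigated = park_neutral_hr(20, 2.00, 100)
print(raw, round(mitigated, 2))
```

With these made-up settings the conservative estimate lands a bit below 20 and nowhere near 10, which is the point of the argument: a small, cautious adjustment in the right direction rather than the full sample PF.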
03 MLE's - MGL (December 28, 2003)
Posted 5:37 p.m.,
December 28, 2003
(#2) -
MGL
It is fascinating to read the scouting reports on players in the minors and to look at their MLE's as well. A few thoughts:
How bad do you have to be on defense not to play in the majors if you are a very good hitter? Not only do we know that errors are a small part of defense overall, shouldn't we at least look at the average error rates in the MINORS at the various positions? Why is this scouting report comparing his error rate to that of major league players? Are error rates very dependent on a minor league player's home stadium (are some of them "cornfields")? What about Myrow's range? What does the age or experience curve for error rate look like?
Why is he "running out of time" at age 27, when his MLE's say that he is more than ready to hit at the major league level? What are they waiting for? Obviously the Yankees have very little room for offensive players at the major league level. It would seem that Myrow still being in the minors is not his fault at all. Is there any evidence that a player with a good MLE over many PA's (1000+) needs "seasoning" in the minors for some reason? Is there any evidence that a 23 yo with the same MLE projection (the projection including the appropriate regression accounting for age) as a 27 yo will do any better or worse in the majors, other than that the 23 yo is likely to get better and the 27 yo is likely to get worse because of age?
Drew Henson? His MLE in 2003 was -21 in 500+ PA's and -13 in 2002 in 500+ PA's. In 2001, it was -33 in over 300 PA's. Seems to me the guy can't hit, period. I wouldn't waste ANY time waiting for him! In fact, after 2001, and even more so after 2002, I would not have wasted my time either.
If you use a player's MLE, isn't it a great opportunity to unload (trade) a player with a lousy hitting projection (especially in a hitter's park, where his raw minor stats might actually look good) but a great reputation, like Henson, and vice versa (pick up an unknown player/prospect with a great MLE)? Shouldn't the Yankees be able to trade Seguignol to some team that desperately needs a good hitting first baseman, as they don't need him as long as they have Bernie and Giambi, and even when Bernie is gone, Seguignol is likely to be too old?
What teams do you think use MLE's to evaluate minor leaguers? I don't see how you can evaluate them any other way! At the very least, shouldn't all teams have a list of every minor league player's MLE in front of them at all times?
And finally, I think Tango is going crazy! Why is UZR in the title to this thread? I think he means "LWTS"! He is starting to type like me (20 WPM and 15 typos per minute)!
03 MLE's - MGL (December 28, 2003)
Posted 7:10 p.m.,
December 28, 2003
(#4) -
MGL
Mr. Tibbs,
No problem. Quite simple actually.
First, as I said, I park and league adjust the raw stats. I use home/road splits to park adjust the stats. I use 5-year component PF's for s, d, t, hr, bb, and so, and regress the 5-year sample PF's to create "true" PF's. The league adjustments are done simply by using the ratios of the player's league to the total league for each of the categories (s, d, t, hr, etc.). When I say "league," I mean INT and PCL for AAA and SOU, EAS, and TEX for AA. No regression there. AA and AAA have separate MLE coefficients, so no league adjustment is needed for AA or AAA as a whole.
Anyway, once the raw stats are adjusted for park and league, I simply multiply them by the following MLE coefficients. These are the "best fit" coefficients (on average - not using a regression analysis) I use:
AA
s=.95
d=.85
t=.95
hr=.61
bb=.87
so=1.15
sb=.9
cs=.9
AAA
s=.98
d=.87
t=.93
hr=.68
bb=.9
so=1.10
sb=.85
cs=1.05
Then I simply use a standard Palmer type lwts formula (including SB and CS), where all the components are set at the last 3 year's major league levels. IOW, let's say a minor league (AAA) player has a normalized (where 1.00 is average in AA or AAA) HR rate of 1.50. After multiplying that 1.50 by the MLE coefficient of .68, we get 1.02, so now our player is expected to hit 1.02 times the average major league HR rate, since he hit 1.5 times the average AAA HR rate, and players who play in AAA and the majors in the same year hit 68% of their AAA HR rate when in the majors (that's where the MLE coefficients come from) over the last 3 years. If the average HR rate in the majors were 14 per 500 PA over the last 3 years, then I use a HR rate of 14 times 1.02, or 14.28, per 500 PA for this player to compute his MLE lwts.
That's it!
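For anyone who wants to follow the arithmetic, here is a small sketch of the coefficient step using the numbers from the post. The function name and the dict layout are just illustrative choices, and the park and league adjustments are assumed to have been done already.

```python
# MLE coefficients as listed in the post.
AA_COEF = {"s": 0.95, "d": 0.85, "t": 0.95, "hr": 0.61,
           "bb": 0.87, "so": 1.15, "sb": 0.90, "cs": 0.90}
AAA_COEF = {"s": 0.98, "d": 0.87, "t": 0.93, "hr": 0.68,
            "bb": 0.90, "so": 1.10, "sb": 0.85, "cs": 1.05}


def mle_rate(normalized_minor_rate, coef, mlb_avg_per_500):
    """Turn a normalized minor league rate (1.00 = league average)
    into a ratio to the major league average and a rate per 500 PA."""
    ratio = normalized_minor_rate * coef
    return ratio, ratio * mlb_avg_per_500


# The HR example: 1.50 times the AAA average, MLB average of 14 HR per 500 PA.
ratio, hr_per_500 = mle_rate(1.50, AAA_COEF["hr"], 14.0)
print(round(ratio, 2), round(hr_per_500, 2))
```

The same call would be repeated for each component before plugging the rates into the lwts formula.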
03 MLE's - MGL (December 28, 2003)
Posted 7:26 p.m.,
December 28, 2003
(#5) -
MGL
I also finished doing the pitcher MLE's. From those I can compute MLE ERC (component ERA). I include a pitcher's minor league WP rate in his component ERA, BTW. I also use a player's actual home/road splits to do the park adjustments, and I also use a pitcher's actual singles, doubles, and triples rates, and not just non-HR hits.
Here are the best MLE ERC's for 2003, min 400 TBF's:
Name, age, team, hand, St/Rel, TBF, ERC
J. Brown, 27, BUF, R, S, 448, 2.42
Cotts, 24, BIR, L, S, 440, 2.70
R. Beltran, 35, OTT, L, S, 412, 3.14
C. Reyes, 35, DUR, R, S, 522, 3.24
Tsao, 23, R, S, 446, 3.35
Wasdin, 32, NVL, R, S, 553, 3.39
Griffiths, 26, NOR, R, S, 459, 3.45
The ERC's are based on an average major league ERC of 4.00 by definition.
Only Brown and Griffiths pitched in AA or AAA in 2002 (I think). Brown's MLE ERC in 2002 was 4.35 in 419 TBF, and Griffiths' was 6.04 in 646 TBF. The average MLE ERC in AA and AAA combined is 5.79, based on an average major league ERC of 4.00.
That is Jamie Brown, BTW. Anyone know who he is? Seems like he should be pitching in the majors, based on these numbers.
03 MLE's - MGL (December 28, 2003)
Posted 9:50 p.m.,
December 28, 2003
(#9) -
MGL
Here are a couple more that were not on my previous list:
Edgar Gonzales, 21, ELP, R, S, 697, 3.45
Telemaco, 30, SWB, R, S, 599, 3.48
As far as Brown's K's, we hear it all the time, but I don't recall ever seeing an actual study that suggests that a pitcher's K rate is a good independent predictor of future success. If anyone knows of one, point me in the right direction. Sure, if a pitcher has a low K rate AND his $H rate is low, then his ERA or ERC may overestimate his projection. But if his $H rate is average or worse or you use a DIPS ERA or something like it (where you regress each component differently before combining them to form a regressed ERC), then does it matter what that K rate is? I don't think so, but I could be wrong.
IOW if pitcher A has a 4.00 ERC or ERA in 1000 TBF with an average $H rate and a K rate of 5 per 9 innings, and pitcher B has the same ERC or ERA in 1000 TBF, also with an average $H rate, but his K rate is 8 per 9 innings, is there any evidence to suggest that pitcher B's overall projection is going to be better than pitcher A's? Again, I don't think so, or at least I've never seen any good evidence to indicate as much. It is not difficult to test that assertion. We hear all the time about how important a pitcher's K rate is, but that is because if a pitcher's K rate is low, in order to be as good as or better than another pitcher whose K rate is higher he will have to excel in either his BB rate or his HR rate, which is difficult, but certainly not impossible.
Obviously if we have 2 pitchers with the same ERA or ERC in the same period of time, we'll take the one with the higher K rate, as the other one is more likely to have gotten lucky with his ERA (low $H). But again, if we have two pitchers with equal ERA's or ERC's and one has a higher K rate, but they both have the same $H, I don't think it matters which one you take.
So before we say that we don't "like" Brown's K rate, don't we have to look at his $H? If it is not low (a low $H would suggest good luck), then his K rate is no problem, and his ERC over the last 2 years is a decent indication of how he will pitch in the future, even after regressing towards the average ERA of a rookie pitcher. In fact, if we take Brown's weighted ERC average from 2002 and 2003 over those 850 TBF's or so, we get 3.25 with a weighting of 4/3. If we regress that even 60% towards the mean of a rookie pitcher, say, 4.5 (again, where 4.00 is defined as major league average), we get a projected normalized (to 4.00) ERA in the majors of 4.00, which is a hell of a lot better than a replacement pitcher. I'll take him any day of the week!
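The Brown projection can be checked in a few lines. This sketch assumes the 4/3 weighting is a plain weighted average of the two season ERC's (2003 weighted 4, 2002 weighted 3), which reproduces the 3.25 quoted above; the helper names are made up for illustration.

```python
def weighted_average(values_and_weights):
    """Weighted mean of (value, weight) pairs."""
    total_weight = sum(w for _, w in values_and_weights)
    return sum(v * w for v, w in values_and_weights) / total_weight


def regress_toward(sample, mean, fraction):
    """Move the sample value the given fraction of the way to the mean."""
    return sample + fraction * (mean - sample)


# Brown's MLE ERC: 2.42 in 2003 (weight 4), 4.35 in 2002 (weight 3).
erc = weighted_average([(2.42, 4), (4.35, 3)])
# Regress 60% toward an assumed rookie pitcher mean of 4.5.
projection = regress_toward(round(erc, 2), 4.5, 0.60)
print(round(erc, 2), round(projection, 2))
```

Regressing less than 60% would only make the projection better, so the 4.00 is on the cautious side.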
03 MLE's - MGL (December 28, 2003)
Posted 10:39 p.m.,
December 28, 2003
(#10) -
MGL
I'll bet you can start a team, RIGHT NOW, and finish above .400, if you are given access to any team's players 26 and older from the minors.
BTW, that's a fascinating thought experiment! What would your average MLB team record be if you fired all of your scouts, fired your entire roster and just used the top 25 guys in MLE lwts (accounting for defensive position of course) and MLE ERC??
What about for a team like Det and TB? What if you did the same but combined your major and minor league rosters?
In fact, here are the top 10 position players in the DET organization (Toledo and Erie) and the top 5 starting pitchers. How many games would this team project to win at the major league level?
Offense
Name, lwts per 500 PA (based on 2003 stats) regressed 50% towards -14 (min 300 PA in 2003), age
C Inge, -11, 27
1B Daigle, -16, 25
2B Tousa, -15, 25
3B Ust, -17, 26
SS Bautista, -15, 26
OF Nicholson, -14, 28
OF Varner, -14, 24
OF Walker, -16, 27
Those are their real defensive positions.
Pitching
Name, 2003 ERC regressed 75% toward 5.00, where 4.00 is major league average, age
Ahearne, 4.67, 35
Loux, 4.75, 25
N. Robertson, 4.90, 27
Henkel, 4.92, 26
M. Johnson, 5.03, 29
OK, so we have a total of -118 lwts runs in offense in 500 PA. 162 games is around 1.3 times that amount or -156 lwts runs for the season. If the average rpg is 5.0 or 810 for the season, we have these guys at 654 runs for the season.
The pitching is really not that bad. If we just average the ERC's of the above 5 pitchers, we get 4.85, or .85 above average. Since each game is around 8.75 innings, you have 157 9-inning games in a 162 game season, so our pitching staff will allow 157 times .85, or 134 runs more than average, for a total of 944 runs.
So we will score 654 and allow 944. Using a pythag exponent of 2, that is a pythag w/l record of .324, or 52.5 wins and 109.5 losses or almost 10 wins better than their major league record this year, and almost 6 wins better than their 2003 pythag record. How about that! How much would the above team cost?
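Here is the pythag step as a quick sketch, taking the rounded run totals from the post (-156 on offense, +134 on pitching, 810 as the league average) as given rather than recomputing them. The function name is an illustrative choice.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


GAMES = 162
LEAGUE_RUNS = 5.0 * GAMES           # 5.0 rpg over 162 games = 810

runs_scored = LEAGUE_RUNS - 156     # hitters: -156 lwts runs for the season
runs_allowed = LEAGUE_RUNS + 134    # pitchers: 134 runs worse than average

pct = pythag_win_pct(runs_scored, runs_allowed)
wins = pct * GAMES
print(round(pct, 3), round(wins, 1), round(GAMES - wins, 1))
```

Changing the exponent or the 5.0 rpg assumption moves the win total a little, but not enough to change the conclusion.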
03 MLE's - MGL (December 28, 2003)
Posted 10:42 p.m.,
December 28, 2003
(#11) -
MGL
The 4.85 normalized MLE ERC for our pitching staff is after regressing the 50-man average 75% towards 5.00.
03 MLE's - MGL (December 28, 2003)
Posted 11:33 p.m.,
December 28, 2003
(#12) -
MGL
5-man, not 50-man...
03 MLE's - MGL (December 28, 2003)
Posted 3:11 a.m.,
December 29, 2003
(#14) -
MGL
It's crude, I admit. I just keep the PA's constant, and adjust the rates of the components using the above MLE coefficients. For example, if a player has 20 HR's per 500 PA in AAA, he gets 20 times .68 (the MLE HR coefficient for AAA) or 13.6 HR's per 500 PA in the majors (that is his HR MLE). So the outs get adjusted automatically as they are whatever is left over after adjusting the MLE s,d,t,hr,and bb+hp.
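A sketch of the "outs are whatever is left over" idea, for the AAA case. The minor league line below is invented purely to show the mechanics, and applying the bb coefficient to bb+hp is an assumption, since the post does not give a separate HBP coefficient.

```python
PA = 500

# AAA coefficients from the earlier post; "bb_hp" reuses the bb value (assumption).
AAA_COEF = {"s": 0.98, "d": 0.87, "t": 0.93, "hr": 0.68, "bb_hp": 0.90}

# Hypothetical park and league adjusted AAA line, per 500 PA.
aaa_line = {"s": 90.0, "d": 25.0, "t": 3.0, "hr": 20.0, "bb_hp": 50.0}

# Scale each component, holding PA constant.
mle = {event: count * AAA_COEF[event] for event, count in aaa_line.items()}

# Outs are not adjusted directly; they are the remainder of the 500 PA.
mle["outs"] = PA - sum(mle.values())

for event, count in mle.items():
    print(event, round(count, 2))
```

Because every coefficient for the on-base events is below 1.00, the outs go up automatically.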
If I want to compute an OBP or SA or BA from the s,d,t,hr,and bb+hp rates per 500 PA, I figure around 5 total SF's and CI's, or something like that (and ignore SH's and IBB's, as I do for major league stats). For SB and CS I just use the minor league raw numbers, again, per PA, and not per times on first, and then multiply those by the MLE coefficients. Not some of my most rigorous work, but it does the trick.
03 MLE's - MGL (December 28, 2003)
Posted 12:54 p.m.,
December 29, 2003
(#17) -
MGL
DS, yes I am vaguely familiar with some research on long-term results of high K and low K pitchers. But...
But if you have 2 pitchers of the same age, with the same ERA, ERC, and DIPS ERAs--the one with the higher K rate has a greater career expectation.
I do not think that BJ or anyone else controlled for DIPS ERA! That is the problem! Given two pitchers equal in regular ERA, one with high K (power) and the other with low K (finesse), the power pitcher will tend to be better at all times in the future because their DIPS ERA's will be different! The low K pitchers will tend to have a lower $H rate (luckier). IOW, their projections will NOT be the same! You have to take the two groups and control for a modified DIPS ERA and THEN look at future (short and long-term) results! I don't think this was done, as no one knew or at least talked about the fact that different components should get regressed differently in calculating a pitcher's projected ERA from his sample ERA!
Even given that, I have no doubt that a pitcher who is a "power" pitcher may have a "longer" career because they tend to be bigger and stronger. There is some selective sampling problem though if you study length of career as a function of K rate. As a pitcher gets older and starts to suck, teams will tend to keep the power pitchers (e.g., B. Witt, Helling), and let them continue pitching, and not the finesse pitchers, even for the same level of suckiness.
Tango, I did it right. I just used confusing terminology. The pitcher numbers are all "normalized" to 4.00 as average. The 4.00 doesn't mean anything. I then regressed their sample ERC's to 1 run higher than a league average pitcher, somewhat arbitrarily. It maybe should be a little higher as the average minor league pitcher has an MLE ERC of almost 2 runs higher than an average major league pitcher.
For the hitters I just took their sample MLE lwts below league average and regressed them to -14 per 500 PA, which is also around the average MLE of all minor league hitters. The 5 rpg is just a number I used to do the pythag w/l record. It could have been 4.5 or it could have been 5.5, although 5.0 is about an average AL rpg over the last 3 years. What did you think of my estimations for an all Detroit minor league team? Do you think it is reasonable? I honestly do. I agree that the average "best of" minor league team would be at least .400 in the majors, which is amazing if you think of it. Also, what do you think the average "MLE" UZR of a minor leaguer is? Given their young ages, I would have to say it is around zero...
03 MLE's - MGL (December 28, 2003)
Posted 3:25 p.m.,
December 29, 2003
(#20) -
MGL
Rally,
I couldn't find a halfway decent catcher in the minors so I used Inge. I just used his MLE from last year, but of course he has been a lot worse in the majors. In fact, I don't know that anyone in recent memory has hit worse in the majors in those 840 big league AB's than Inge.
In 2001-2003 in around 250-300 PA's in the minors, he actually didn't hit that badly - around -3 lwts per 500 PA in MLE lwts. OTOH, in his last full year in the minors, 2000, his MLE lwts was -20.5 per 500, which makes you wonder why he was brought up in the first place, and also should indicate that it was no surprise that he would hit so poorly in the majors. Unless he had some outstanding defensive skills, and I don't recall ever hearing anything about that, a minor league player with a -21 MLE lwts, even a catcher, is a dime a dozen. Surely you can find a better hitting catcher at the major league minimum price. Just an example of how woeful an organization Det is. And yes, it is a pretty fair assumption that no matter how much a major league team has to spend, within reason, if it loses 119 games in a season, and if it is perennially horrible, it is managing its affairs in an incompetent fashion.
I don't have his 2004 projection yet, but I would say that it would be in the -30 per 500 PA range, which is like a .620 OPS or something like that. What do Pecota, Shandler, Marcel, or ZIPS have him as?
03 MLE's - MGL (December 28, 2003)
Posted 4:35 p.m.,
December 29, 2003
(#25) -
MGL
Rally, sure any projection for Inge is going to be a lot higher (and rightfully so) than his career major league average for 2 reasons: one, regression to the mean of an average young major league catcher in a sucky organization, and two, his minor league MLE's are much better than his major league numbers and should be factored into the projection.
The difference, though, is that Rivera has hit well in the minors, and might be a productive player if someone gives him 840 AB to adjust to the majors. I know of no evidence that suggests that hitters have to get "used to" the major leagues. If that were true, then we would see overall a big upswing (more than the regular age curve would indicate) in performance from a player's rookie year to his sophomore year or a big upswing from a player's first 300 PA's to his next 300 PA's or some pattern like that. Do we? I doubt it. When you see a player who comes up to the majors and sucks for a couple of hundred PA's, it is most likely a statistical fluke, just like any sucky couple of hundred PA's for any major leaguer at any time in his career. I could be wrong but the burden of proof with ALL of these conventional wisdoms and assumptions is on the conventional wisdomER. Why? Because I say so! Actually, for no other reason than because conventional wisdom is usually, by an overwhelming degree, wrong. So which is more efficient - to put the burden of proof on the conventional wisdom claim or the sabermetric claim?
DS, yes, you are just repeating what I said about the high K low K thing. The point I am making is, "Does a high K pitcher have a better short or long term future, once we control for $H, than a low K pitcher?" That's all. And it is a critical question, because in this century, we now know that when doing projections for pitchers, we must control for $H (regress it a lot more than the other components). So when we project 2 pitchers to have an equal context-neutral ERA, and one has a higher K rate, we really want to know if one will indeed perform better than the other next year or 5 years from now. James' studies did not answer that. In fact, given what we know now about DIPS, James' results were a foregone conclusion, since if you control for ERA, a pitcher with a high K rate will actually have a better DIPS ERA than a pitcher with a low K rate, so the high K pitcher SHOULD perform better in the future! We don't use non-DIPS ERA's anymore for projections, if we don't have to, so we won't get the same projections any more for those pitchers with the same regular ERA's but different K rates!
A simple study needs to be done to look at this question again. Look at all pitchers of a similar DIPS ERA. Break them down into 2 groups - high K rate and low K rate. Adjust for age, and look at each group's regular or DIPS ERA the next year (going forward, it doesn't matter whether you use DIPS ERA or ERA - they will be the same). That will tell you if a pitcher's K rate is predictive of future success independent of his DIPS ERA (or another good projection). Then look at the rest of the careers of both groups to see if K rate is indicative, again independent of a regular projection, of a longer or more successful career...
03 MLE's - MGL (December 28, 2003)
Posted 7:16 p.m.,
December 29, 2003
(#28) -
MGL
Over the longer term, the advantage of the Ks is the potential for a pitcher to "develop" secondary level skills (BB, HR) while maintaining the K level (or at least to compensate for an age related reduction in Ks). That is a much easier task than that which faces lo K pitchers who are already good in BB and HR--they have already reached the practical limit in those areas, but do not have the raw ability to improve in the area in which they are lacking. Thus the longer term advantage of the hi K pitcher (on average, of course).
You are assuming that this is true. That is what I am questioning. It could be, but then again, it might not be. James' study (again, from memory) was flawed since it did not control for $H, so that the results might not lead to the above conclusion. IOW, all other things being equal, if you are a GM, looking at 3 years or more into the future, maybe it doesn't matter what a pitcher's K rate is.
Or it could be complicated (and non-linear). It could be that, all other things being equal besides K rate, if 2 pitchers are bad, then the one who has the higher K rate is more likely to get better (as you say or imply above). If those pitchers are both already good (Moyer and Clemens) maybe it doesn't matter. Maybe at a certain age it matters and at a certain age it doesn't matter.
What I was objecting to originally was someone who said something like "Yeah, his sample ERC and therefore his projected ERA is pretty good, but I don't particularly like him because of his low K rate." For that statement to have any merit, a low K rate would have to imply either a short or long-term (or both) projection that would be worse than his normal projection, at least as compared to a pitcher with the same projection, but a higher K rate.
I was questioning this wisdom, because I believe that it has developed into a conventional wisdom (we just take it for granted that it is true), but that it was originally predicated on some faulty research OR I did not recall ANY good research that suggests that it might be true. That's my job. Questioning everything that people think or believe is true.
Kind of like "Is the unemployment rate, at least as we traditionally measure it (which I heard changes from time to time), significantly related to the state of the economy (whatever "the state of the economy" MEANS, which is another can of worms), AND does the President, his administration, or Congress have any significant influence over it (i.e., should the Pres get any "credit" when the unemployment rate goes down), etc., etc."
03 MLE's - MGL (December 28, 2003)
Posted 1:28 p.m.,
December 30, 2003
(#31) -
MGL
MGL's implicit answers are:
1 - no
2 - none
3 - none
Come on, you know that my implicit or explicit answers to 2 and 3 are not "none." It is "very little." The problem with scouting is not that it is inherently unimportant. The problem is that it is tainted with the same misconceptions and misinformation that exist at all levels of professional baseball, such that it is a lot less effective than it could be. Scouts should be trained to complement statistical analysis. They (scouts) think that they supplant it of course. That taints the whole process. A simplistic example:
Player A has great minor league hitting numbers in 1000 PA's. His MLE OPS is like .800 or .850. He was never considered a great prospect though, for whatever reasons (already scouts are biased against this guy, although perhaps for good reason, but perhaps not). He gets called up and in 68 PA's hits around a .400 OPS. What is the scout who is watching him in those 68 PA's going to say? "He was completely overmatched. He is either not ready for major league pitching or he never will be." If he is a grade A prospect, and particularly if this scout, or a friend of his, originally touted this guy, he will lean towards, "He's not ready yet." If he is a grade B or C prospect, he might lean towards, "He's not really major league material."
Of course, all of these types of scouting reports are crap! Without the scout understanding the importance of the MLE's, and without him understanding the sample size issues of the 68 major league PA's, and because of his bias going in, based on his pre-conceived notion of whether this player is a true prospect or not, his "scouting report" is going to be so tainted as to render it almost worthless. That is why I think scouting is almost worthless. Not to mention the fact that it is done by "old baseball guys" whose average IQ is probably around 93 and who make anywhere from zero to $50,000 a year.
So what would be the proper way to have "scouted" our guy with 68 major league PA's? Well, anyone who hits .400 (OPS) in 68 PA's is going to "look" terrible in those 68 PA's, even if they were Barry Bonds. I'm sure he's had spates of 68 PA's where he has looked pretty bad (OK, maybe not Bonds, but how about Sosa). Here we have a minor league player who has torn the cover off the ball for 1000 PA's and then looks bad in 68 PA's in the majors! What can a scout tell us? I'm not sure he can tell us anything, to tell you the truth! We already know that a guy who hits .400 is going to "look" bad. He probably swung at lots of bad pitches, etc. He also probably (definitely) does that occasionally in the span of 68 PA's in the minors as well! Part of a batter's fluctuation in batting performance is "looking bad" and swinging at bad pitches, for whatever reasons. All a scout should really say in this situation is, "Well, he sure is a good hitter in the minors. Those 68 PA's were probably just a fluke. After all, it's only 68 PA's compared to the over 1000 in which he's had good success in the minors. Plus, don't forget, he faced Schilling and Maddux in 17 of those 68 PA's. Plus, he seemed particularly nervous in the show - even more so than the average rookie call-up. Like I said, I can't really tell much from 68 PA's. The guy's been playing baseball for 20 years with 100's of thousands of PA's at all levels. We've seen him in the minors for a couple of thousand PA's and he has done pretty well there so far. And let's face it, AAA baseball is no sandlot league. I'd like to see you (talking to management) hit a AAA pitcher throwing 95 MPH! Don't forget, a lot of these AAA pitchers will be in the majors soon. Where do you think all of these major league pitchers come from? Plus, there are lots of major league pitchers who are no better than the average AAA pitcher. Cripes, this guy hit .375 in 1000 PA's in AAA, and you want to know why he hit only .147 in 68 PA's in the show?
I am a scout, not an oracle, you idiots (again, talking to baseball management)! How the hell do I know WHY he didn't hit in 68 PA's? Yeah, he looked overmatched, but what do you think he is going to look like when hitting .147? You guys ever heard of bad luck? 68 PA's for Christ sake! You want my opinion? Let him bat for 680 PA's and then we'll talk. Until then, shut up and do your job and let me do mine! Uh, what is my job again?"
That's what a scout should say...
03 MLE's - MGL (December 28, 2003)
Posted 1:35 p.m.,
December 30, 2003
(#32) -
MGL
I know of no evidence that suggests that hitters are equally disadvantaged as they play against better competition.
Neither do I, but you know as well as I do where the burden of proof lies, and you also know as well as I do that the answer is probably that there is not a huge spread among how much different hitters are truly disadvantaged as they play against better competition. And we can get some idea by looking at the variance among rookies as compared to their MLE's, can we not?
03 MLE's - MGL (December 28, 2003)
Posted 2:35 p.m.,
December 30, 2003
(#34) -
MGL
Here is one thing I would look at:
What is the average variance around a minor leaguer's one year (say min 300 PA) MLE OPS in his rookie year (rookie year, also min 300 PA)?
Compare this to the average variance around a major leaguer's previous year OPS and his next year's OPS.
Or would the year to year r tell us anything about whether and by how much a player's true talent fluctuates when he jumps from minors to majors versus from one year in majors to another year in majors?
The results of either of these analyses might inform us (confuse us) on not only whether there is lots of variation in how well minor leaguers "adapt" to the major leagues, but it might inform us on how good our MLE's are.
IOW, even if all minor leaguers adapted about the same (i.e., going from minors to majors was the same as going from majors to majors once you adjusted the minor league stats to make them equivalent to the majors), if the MLE was no good, we would see a larger variance around those MLE's in the next year's (in the majors) stats than in majors to majors stats, would we not?
03 MLE's - MGL (December 28, 2003)
Posted 5:04 p.m.,
December 30, 2003
(#42) -
MGL
I'd do it differently, at least at the outset, as I hate doing regressions if I don't have to, because no one knows what they mean (rhetorically speaking, of course). Plus, and this is an important point for all of you "regression fans" (Tango): regressions are only necessary when you want to know about "best fit" type relationships or you want to know if correlations apply "across the board" (an r from a regression analysis is basically an average correlation among all data points). When you just want to know whether two groups of elements (in this case, high K and low K pitchers) differ in terms of another variable (longevity, future long or short-term performance), you are way better off doing something a lot simpler, more straightforward, and easier to follow and understand than a regression, at least for starters!
Like with Tango's example, rather than a regression, take the same pitchers in the same age groups with the same minimum number of TBF's, and split them into 2 groups, high K and low K. The criteria for each group do not matter. You just need to strike a balance between making the 2 groups as distinct as you can while still preserving some semblance of sample size. Anyway, then just look at the number of PA's each group has from age 29 to 36, and also look at their average ERA or DIPS ERA before and after. Or first control for DIPS ERA before and then look at DIPS ERA or regular ERA after. For this last point, split each K rate group (low and high) into 2 groups, high ERA and low ERA. Make sure that both low ERA groups have around the same ERA and both high ERA groups have around the same ERA. Then look at the ERA's of these 4 groups in the "post" period. All of this is sort of like a "poor man's" regression, but it is way easier to follow.
The 2 hypotheses for issue number one are that there either will or won't be a significant difference in the number of future (age 29-36) PA's between the high K group and the low K group. OK, unless you control for ERA, like you essentially do in the multiple regression, you won't know if a difference was due to the K rate or the ERA (pitchers with better ERA's, which tend to be the high K pitchers, generally have longer careers). We want to know which one is the cause. Anyway, for the first part, just break down the 2 K groups into high and low ERA groups, as I suggested for the second part of this study (looking at whether K rate affects future performance, independent of a projection based on past DIPS ERA). Here is the reason I prefer this kind of an analysis (I don't know what it is called, if anything) rather than a regression:
Let's say you do the regression and you get an r between K rate and future number of PA's of .4 (or .3 or .5). What does that tell you (in English)? It tells me nothing, and I assume it tells the average reader nothing unless Tango or someone else who does the regression and is familiar with regressions tells you what it means! Then you just have to take their word for it!
If you do it my way, and the data says that group A, the low K rate pitchers with an average DIPS ERA of 3.8, had an average of 3200 subsequent PA's and the high K pitchers with the same average DIPS ERA had an average of 4500 subsequent PA's, the results speak for themselves. Of course, you then have the issue of sample sizes and significance, but at least the average person, and even a sabermetrician, can relate to and understand that kind of a result a lot better than "an r of .4."
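For anyone who wants to try the split-group comparison, here is a minimal Python sketch. The pitcher records, the field names (k_rate, dips_era, future_pa), and the choice of the sample median as the cutoff are all made up for illustration; none of it is data from this thread.

```python
from statistics import mean, median

def split_group_means(pitchers):
    """Average subsequent PA for each (ERA group, K group) cell.

    Each pitcher is a dict with hypothetical keys:
    k_rate and dips_era (both from the "pre" period) and future_pa.
    Groups are split at the sample median, an arbitrary cutoff.
    """
    k_cut = median(p["k_rate"] for p in pitchers)
    era_cut = median(p["dips_era"] for p in pitchers)
    cells = {}
    for p in pitchers:
        era_group = "low ERA" if p["dips_era"] <= era_cut else "high ERA"
        k_group = "low K" if p["k_rate"] <= k_cut else "high K"
        cells.setdefault((era_group, k_group), []).append(p["future_pa"])
    return {cell: mean(values) for cell, values in cells.items()}

# Made-up sample: eight pitchers, two per cell.
sample = [
    {"k_rate": 4.5, "dips_era": 3.7, "future_pa": 3000},
    {"k_rate": 5.0, "dips_era": 3.9, "future_pa": 3400},
    {"k_rate": 8.0, "dips_era": 3.7, "future_pa": 4400},
    {"k_rate": 8.5, "dips_era": 3.9, "future_pa": 4600},
    {"k_rate": 4.6, "dips_era": 4.6, "future_pa": 1500},
    {"k_rate": 5.1, "dips_era": 4.8, "future_pa": 1900},
    {"k_rate": 8.1, "dips_era": 4.6, "future_pa": 2500},
    {"k_rate": 8.6, "dips_era": 4.8, "future_pa": 2900},
]

for cell, avg_pa in sorted(split_group_means(sample).items()):
    print(cell, avg_pa)
```

Comparing the two "low ERA" cells to each other (and the two "high ERA" cells to each other) is what holds ERA roughly constant while K rate varies.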
Anyway, just my tirade on why I hate regression analyses and the use of other similar rigorous scientific tools when it comes to analysing and presenting baseball issues....
03 MLE's - MGL (December 28, 2003)
Posted 12:32 a.m.,
December 31, 2003
(#46) -
MGL
Interesting, Rally. How do your EQR ratios compare to my component ratios? I'm not sure how to convert them to the same "currency."
I get an average of -13.6 MLE lwts runs per 162 for an average minor league player (AA and AAA combined). If we figure that in the major leagues, a player creates 55 runs per 500 PA, then -13.6 works out to 41.4 runs, which is 75% of major league production. For AAA, the average MLE is -11.0, which is 80% of the major league level, and AA is -16.2, which is 71%.
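The arithmetic behind those percentages can be sketched in a few lines of Python. The function name is made up, and the sketch simply follows the shortcut used above of adding the lwts figure (quoted per 162) straight onto the 55-runs-per-500-PA baseline.

```python
MLB_RUNS_PER_500_PA = 55.0  # baseline assumed above for an average major leaguer

def pct_of_mlb_production(mle_lwts_runs):
    """Share of major league production implied by an MLE lwts figure.

    Mirrors the shortcut in the post: the lwts runs are added directly
    to the 55-run baseline, then divided by that baseline.
    """
    return (MLB_RUNS_PER_500_PA + mle_lwts_runs) / MLB_RUNS_PER_500_PA

for level, runs in [("AA+AAA", -13.6), ("AAA", -11.0), ("AA", -16.2)]:
    print(level, round(100 * pct_of_mlb_production(runs)))
```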
According to Dan S., and he probably got this from BJ, a player loses around 18% going from AAA to the majors. I forgot what percentage BJ uses for AA. Clay D. says about 15% per level, which implies 15% for AAA and 28% for AA or something like that, which are close to my numbers.
I don't know if I did that right, in order to compare it to your numbers, but that sounds a lot different from what you get.
Your observation about selective sampling being a huge problem for computing or verifying MLE's is right on the money for the exact reasons you mention. But here is the deal:
If you are using a player's actual minor league stats from one year and that player is a good player (he is from the population that normally gets called up), you don't need to worry about the selective sampling problems. You can just use the MLE's that you came up with in the same manner that you used, which is correct. If you are trying to figure out the MLE of a bad or average minor league player, i.e., a player who is NOT a normal candidate for promotion, then your original MLE coefficient or coefficients will not work correctly. Here's the important point:
Once you regress your coefficients to account for this selective sampling problem, as you did, you can't take a player's (actually ANY player's - good or bad) actual sample minor league stats and apply the regressed MLE coefficients! You have to regress the player's sample minor league stats first (towards those of an average minor league player), to reflect his true talent in the minors, and THEN apply the regressed coefficients.
If you don't do the regression on the minor league stats, that's fine if, as I said, your player is a good player (and lucky). Then you can apply your original unregressed MLE coefficients.
It is a little tricky, but trust me, I've thought about these things for years!
Here's why:
As you correctly said, MLE's gotten by just using the weighted ratios of stats (we'll use OPS) of players who played in both the minors and majors, say in the same year, are not "true" MLE's. They don't accurately reflect the true difference in talent level between majors and minors. That is where James went wrong, or at least he didn't explain it correctly. He apparently didn't realize the selective sampling issue or chose to ignore it.
All players who make it into the majors, and thus have minor stats AND major stats which we can use to come up with MLE coefficients, as a group, are both good and lucky. Let's say we have 100 players who get called up and they each get 300 PA's in the majors and they had 300 PA's in the minors. Let's forget for now about the fact that if they suck in the majors they get sent down, even if that suckiness is bad luck which it will always be, at least partially. (MGL's rule # 3,456: All above average play, on the average, is good talent plus good luck, and all below average play is bad luck plus poor talent!)
Anyway, let's say these 100 minor league players have a group OPS of .900 (in the minors) for those 300 PA's. Their true group OPS in the minors is a lot less - probably around .800 (assuming an average OPS in the minors is .750). That is because each of these players' high OPS is based on only 300 PA's, for purposes of this experiment. In real life, for a player to get called up, he probably needs more than just that. But in any case, for all players who get called up, they got lucky, as a group, in the minors. IOW, their true OPS in the minors is a lot lower than their sample OPS for whatever time period you are using.
So now, these guys all get called up and they hit .750 as a group in the majors. When we calculate our MLE OPS coefficient, we are going to use .750/.900, and conclude that our players lost 17% of their OPS going from minors to majors! Wrong!
The .750 is in fact a good estimate of their true OPS in the majors. No selective sampling there because we have already chosen our group and we go forward with no attrition. Selective sampling only occurs retroactively, when you choose your group after "observing" the data already. So .750 is a fair estimate of this group of players' actual major league OPS talent.
BUT, the .900 is not! That is not a fair estimate of this group's minor league OPS talent. It is .825 (or something like that, maybe .800 - I am regressing each player's sample minor league OPS in 300 PA's towards .750; for 300 PA's, the regression should be like 70%). So the true ratio between minor league OPS and major league OPS is NOT .750/.900, it is .750/.825, or 91%, not 83%. I am making up these numbers, but it doesn't make a difference. The important point is the distinction between our group's sample OPS and their true OPS in the minors.
So basically, you have several choices in how you compute and use your MLE coefficients. You can use the .750/.900 or 83%, but that is only going to apply to players in the minors who hit around .900 in the minors and are likely to get called up. Or you can regress those .900 players' minor league OPS's first in order to estimate their true talent in the minors and then use that number, say .825, with the .750 in the majors to get a true MLE coeff. of 91%. Now you can use this coeff. for ANY player in the minors, good or bad, but before you apply that coeff. (the 91%), you have to first establish the true OPS talent in the minors of the player in question. If the player in question has a minor OPS of .900 in 300 PA's, first do the regression and then apply the 91% coeff. If a player hits .700 in the minors in 300 PA's, first regress that .700 to maybe .735, and THEN apply the 91%.
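Here is a rough Python sketch of the first two choices, using the same made-up numbers. The regress helper and the variable names are hypothetical. Note that the example figures imply different regression amounts (.900 to .825 is 50%, while .700 to .735 is 70%), so the fraction is left as a parameter rather than fixed.

```python
def regress(sample_value, population_mean, fraction):
    """Move sample_value toward population_mean by the given fraction (0 to 1)."""
    return sample_value + fraction * (population_mean - sample_value)

MINOR_LEAGUE_AVG_OPS = 0.750   # assumed minor league average from the example
callup_minor_ops = 0.900       # sample OPS of the call-up group in the minors
callup_major_ops = 0.750       # what the group then hit in the majors

# Choice 1: raw coefficient, only good for players who look like the call-ups.
raw_coeff = callup_major_ops / callup_minor_ops

# Choice 2: regress the minor league side first, then take the ratio.
true_minor_ops = regress(callup_minor_ops, MINOR_LEAGUE_AVG_OPS, 0.5)
true_coeff = callup_major_ops / true_minor_ops

# Using choice 2 on a .700 hitter: regress first, THEN translate.
weak_true_ops = regress(0.700, MINOR_LEAGUE_AVG_OPS, 0.7)
weak_mle_ops = weak_true_ops * true_coeff

print(round(raw_coeff, 2), round(true_coeff, 2), round(weak_mle_ops, 3))
```

The point of keeping regress separate from the coefficient is that the same coefficient can then be reused for any player, with only the regression step depending on the individual.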
Finally, a third way, which I think is what you did, was to first compute the .750/.900 or 83% coeff., and then to regress the coefficient towards 1. If you regressed it 50%, you would be at around 91%. Now you still have to apply that regressed number to a player's regressed minor league OPS, which is where you went wrong in your analysis with Crosby...
03 MLE's - MGL (December 28, 2003)
Posted 1:19 p.m.,
December 31, 2003
(#49) -
MGL
I made a mistake in my last post. You HAVE to regress the player's sample minor league stats to estimate his true minor league stats, as you would in the major leagues, and THEN you would apply the MLE coefficients. The coefficients that you would use would be the ones that accounted for the selective sampling (the higher ones). That's the only way to do it. You can't use the lower MLE coefficients first and THEN regress the final MLE, as I said earlier you could do. That doesn't work. I am in the process of doing more work on MLE's. I'll get back to this thread with the results.
In the meantime, I converted all of my MLE lwts stuff to an MLE OPS+, which is simply a player's MLE OPS divided by the average OPS in the major leagues. IOW, a minor leaguer's MLE OPS+ is exactly the same as a major leaguer's OPS+.
Here are some more lists: I will also send a file of all AA and AAA players' MLE's from 2001 to 2003 to Tango and he can post them somewhere here.
2003 Best OPS+, min 300 PA
Name, 2003 age, team, pa, 2003 OPS+, 2002 OPS+, 2001 OPS+
M. Cabrera, 20, CAR, 303, 123, x, x
F. Seguignol, 28, COL, 446, 122, x, 106
B. Martin, 27, ELP, 387, 117, 76, x
B. Larson, 27, LOU, 315, 117, 124, 84
B. Jacobsen, 28, TEN, 521, 115, 99, 105
B. Myrow, 27, TRE, 591, 114, 100, x
Bu. Crosby, 27, COL, 384, 113, 79, 92
J. Bay, 25, POR, 373, 113, 100, x
J. Leone, 26, SAN, 558, 113, x, x
G. Koonce, 28, SCO, 602, 112, 106, 99
R. Ludwick, 25, OKL, 360, 112, 103, 91
T. Sledge, 26, EDM, 572, 111, 96, 91
Bo. Crosby, 23, SCO, 543, 110, 90, x
Looks like most of these players are the real McCoy. Koonce, Bay, Myrow, Jacobsen, Larson, and Seguignol are particularly impressive over the last 3 years.
Here are the best players in OPS+ over the last 3 years combined, min 800 PA, AND who have never played in the majors. Their OPS+ is also age adjusted this time; however, there is no weighting of the 3-year OPS+ averages. I take each year's sample MLE stats and adjust them to the level of a 28 yo (doesn't matter what age they are all adjusted to at this point). Then I average the 3-year stats (weighted for PA of course). That is each player's 3-year average "as if he were a 28 yo." I then "reverse adjust" those "28 yo" stats to their current age. IOW, it is just like a projection, but no year by year weighting is used. Each year in the 3-year period is given the same weight as any other year.
Interesting list, as I have never heard of half of them.
Name, 2004 age, last team (org), AA/AAA, pos, 3-yr PA, 3-yr OPS+
B. Myrow, 28, TRE (NYY), AA, 3B, 826, 113
B. Jacobsen, 29, TEN (SL), AA, 1B, 1296, 107
T. Meadows, 27, WCH (KC), AA, OF, 854, 105
J. Deardorff, 26, NBR (MIN), AA, 1B, 1239, 105
J. Gall, 26, MEM (STL), AAA, 1B, 1126, 104
T. Alvarez, 26, NVL (PIT), AAA, OF, 1126, 104
J.D. Closser, 24, TUL (COL), AA, C, 822, 101
A. Phillips, 27, COL (NYY), AAA, 2B, 801, 101
T. Sledge, 27, EDM (MON), AAA, OF, 1630, 101
Surely some of these guys belong in the majors! What happened to Phillips? He didn't play much in 2003.
03 MLE's - MGL (December 28, 2003)
Posted 6:26 p.m.,
December 31, 2003
(#50) -
MGL
BTW, Rally, the selective sampling of only "good and lucky" players being called up will of course make it look like the "drop-off" from minors to majors is bigger than it really is. However, the other selective sampling, the fact that if a player hits poorly in the majors after being called up, he tends to be sent back down, will force the MLE coefficient back in the other direction, such that one selective sampling tends to cancel out the other, assuming that you weight the major and minor sample stats by the "lesser of the two PA's," as you correctly do. For those of you who don't understand what this means: If player A has 300 PA's with a .900 OPS in the minors and 100 PA's of an .800 OPS in the majors and player B has a .950 OPS in 200 PA's in the minors and a .750 OPS in 300 PA's in the majors, here is how we figure the MLE OPS coefficient:
Player A
He has 300 PA's in the minors and 100 PA's in the majors so we weight both his major and minor OPS by 100 (the lesser of the 2 PA's). For player B, we weight his minor and major OPS by 200, the lesser of HIS PA's. So for both players' average minor OPS, we have .900 times 100 plus .950 times 200, divided by 300, or .933. Their average major OPS is .800 times 100 plus .750 times 200, divided by 300, or .767 (we use the same weights for the major and minor OPS's). So the OPS MLE coefficient is .767/.933 or 82.2%. As both Rally and I explained, this is NOT the true amount that any player loses when going from minors to majors (assuming that these were real numbers and that the sample size were much larger), as the .900 and .950 do not represent these players' true minor league OPS talent whereas the .750 and .800 do represent their true major league OPS talent. It should be more like .767/.850, or something like that (the .850 being closer to the weighted average of these two players' true OPS). As Rally explained, and as I explained in this post, in addition to that bias there is going to be a "weighting" bias such that players who do poorly in the majors will have fewer PA's in the majors than players who do well, driving the observed coefficient back up towards 100% (the other selective sampling issue drove it down, away from 100%).
Anyway, I'll have more info as I do more research on proper MLE's using my minor league database. As far as I can tell, the only way you can retain any semblance of linearity in applying MLE's is to first regress your sample minor league stats to convert them into a true value and THEN to apply some MLE coefficient or coefficients in order to estimate a true major league value, IF you want to be able to calculate MLE's for ALL minor league players across ALL sample sizes and all levels of talent. For example, a player with a .900 OPS in 100 PA's in the minors obviously has to have a different "true" MLE in the majors (where the definition of a true MLE is that player's true OPS [talent] if he had played in the majors or if he got called up to the majors right away) than a player who has a .900 OPS in 1000 PA's...
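To make the 100 PA versus 1000 PA point concrete, here is a hedged Python sketch. The shrinkage form k / (k + PA), the value k = 700 (picked only because it gives roughly 70% regression at 300 PA), the .750 minor league average, and the 0.91 coefficient are all assumptions for illustration, not numbers from my database.

```python
def regress_ops(sample_ops, pa, league_ops=0.750, k=700):
    """Shrink a sample OPS toward the league average.

    The regression fraction is k / (k + PA). Both that form and
    k = 700 are assumptions made for this sketch.
    """
    fraction = k / (k + pa)
    return sample_ops + fraction * (league_ops - sample_ops)

def true_mle_ops(sample_ops, pa, coeff=0.91):
    """Regress first, then apply a hypothetical true-talent coefficient."""
    return regress_ops(sample_ops, pa) * coeff

short_sample = true_mle_ops(0.900, 100)    # small sample, heavily regressed
long_sample = true_mle_ops(0.900, 1000)    # large sample, mostly kept
print(round(short_sample, 3), round(long_sample, 3))
```

Whatever the exact regression amounts turn out to be, the same .900 sample OPS produces a noticeably lower MLE when it comes from 100 PA's than when it comes from 1000.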
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 2:37 a.m.,
January 1, 2004
(#1) -
MGL
I am going to continue my discussion from the other thread to this one.
Here are some preliminary data:
In the first analysis, I looked at all players who played (at least 1 PA) in both the minors and majors in the same year. I used 3 years worth of minor and major data: 2001-2003. All of the minor league data is park and league adjusted. All of the major league data is not. For large enough samples, it shouldn't make much of a difference whether each set of data is park or league adjusted or not.
First I looked at AAA only. I wanted to see the ratio of normalized AAA stats to normalized major league stats for all players who played at both levels in any one year. If a player played at both levels in 2001, I compared his minor league stats in 2001 to his major league stats in 2001 only. If he played at both levels again in 2002, I compared his 2002 minor stats to his 2002 major stats only. IOW, it is as if they were two different players.
I looked at the following individual components: s, d, t, hr, bb+hp, so, sb, and cs. A "normalized" stat is simply a player's rate divided by the league rate. In the minor leagues, since the raw stats have been park and league adjusted, any player's normalized stats can be fairly compared to another's. For the major leagues, the stats are not park or league adjusted, so regardless of what home park a player plays in or whether he plays in the NL or AL, his normalized stats are his stats (rate-wise, per PA, of course) divided by those of the average NL and AL players combined. As I said, the fact that the minor stats are adjusted and the major ones are not should make little or no difference in this analysis.
There were a total of 1146 "players" who had dual playing time in at least one of those 3 years. I put "players" in quotes because if a player had dual time either on more than one team or in more than one of the 3 years, he was counted more than once. In other words, there were 1146 "pairs" of data in the sample group. Each pair contained a sample of minor league PA's and a sample of major league PA's from the same player in the same year.
For each pair of data, the average number of PA's in the minors was 183 and in the majors it was 113. Again, that is in one year only, and those are averages. Any given player looked at could have had 300 PA's in the minors in 2002 and only 3 PA's in the majors that year. They would still be included in the sample data. In any case, there were 1146 such pairs of data.
As with most analyses where you are looking at matched pairs of data, I weighted each element of each pair by the lesser of the two PA's. In other words, if in the first pair of data there were 100 PA's in the minors and 200 in the majors, the major stats AND the minor stats would be weighted by 100, the lesser of the 100 and 200.
All of the minor stats are averaged using these weights and all of the major stats are averaged using the same weights. For example, let's say we were using OPS and we had 3 data pairs:
OPS minor   PA minor   OPS major   PA major
.800        100        .700        200
.850        200        .800        150
.900        150        .850        50
The weighted average of the minor league OPS's would be .800*100 plus .850*150 plus .900*50 all divided by 300, or .842.
The weighted average of the major league OPS's would be (we use the same weights) .700*100 + .800*150 + .850*50, all divided by 300, or .775. In this example, the ratio of major to minor OPS would be .775/.842 or .92. If those were actual numbers, they wouldn't really mean anything, since the OPS's in the minor leagues and in the major leagues would have to be normalized to their minor and major league averages before we took the ratio.
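The lesser-of-the-two-PA's weighting in this example can be written as a short Python function. The function name and the tuple layout are just choices made for this sketch.

```python
def lesser_pa_coefficient(pairs):
    """Ratio from (minor_ops, minor_pa, major_ops, major_pa) pairs.

    Each pair is weighted on both sides by the lesser of its two PA counts.
    Returns (weighted minor OPS, weighted major OPS, major/minor ratio).
    """
    weights = [min(minor_pa, major_pa) for _, minor_pa, _, major_pa in pairs]
    total = sum(weights)
    minor = sum(w * p[0] for w, p in zip(weights, pairs)) / total
    major = sum(w * p[2] for w, p in zip(weights, pairs)) / total
    return minor, major, major / minor

# The three data pairs from the example table.
pairs = [
    (0.800, 100, 0.700, 200),
    (0.850, 200, 0.800, 150),
    (0.900, 150, 0.850, 50),
]
minor, major, ratio = lesser_pa_coefficient(pairs)
print(round(minor, 3), round(major, 3), round(ratio, 2))
```

Using the same weight on both sides of a pair keeps a player with 300 minor league PA's and 3 major league PA's from dominating the minor league average while barely touching the major league one.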
The total of the PA "weights" (if we added up all of the lesser of the two PA's in each pair) is 58,911. This is what you would use to calculate SD's for confidence intervals of the results.
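As a rough illustration of how that total could feed an SD, here is a Python sketch that treats each weighted PA as an independent binomial trial. That binomial treatment is an assumption of the sketch, and the HR rate used is the 13.4 per 500 PA reported for this AAA group in this post.

```python
from math import sqrt

TOTAL_WEIGHTED_PA = 58_911  # sum of the lesser-of-two PA weights

def rate_sd(rate, n=TOTAL_WEIGHTED_PA):
    """Binomial standard deviation of a per-PA rate observed over n PA."""
    return sqrt(rate * (1.0 - rate) / n)

hr_rate = 13.4 / 500                              # AAA HR rate of the dual-level group
sd = rate_sd(hr_rate)
low, high = hr_rate - 2 * sd, hr_rate + 2 * sd    # rough 2-SD interval
print(round(500 * low, 1), round(500 * high, 1))  # back on a per-500-PA scale
```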
The weighted average minor league component stats of all of these dual players were, per 500 PA:
Remember this is AAA only.
s, d, t, hr, bb, so, sb, cs, OPS
85.1, 26.6, 3.8, 13.4, 48.3, 82.2, 11.7, 5.2, .811
Again, these numbers are meaningless unless you know the averages in AAA.
The average AAA player had the following component stats:
82.5, 24.7, 3.4, 11.8, 45.8, 87.0, 9.5, 4.7, .760
As you can see, the average AAA player who plays at both levels in any given year is above average in each of those categories. In fact, here are their normalized (their stats divided by the league average) component stats:
1.03, 1.07, 1.10, 1.14, 1.05, .94, 1.24, 1.10, 1.07
Interestingly, the average player who plays at both levels (usually a "call-up") excels in both power and speed, but not as much in BB rate. Such a bias is probably not optimal but not surprising.
Here are the average major league stats of the dual level players:
74.3, 21.0, 2.8, 10.6, 39.2, 99.9, 6.6, 3.9, .664
Again, these raw numbers mean little unless they are normalized to major league averages in that same year, so you can see how these "call-ups" did compared to the average major league player.
Here are those same component stats, normalized:
.94, .87, 1.11, .72, .86, 1.21, .84, 1.09, .87
Not surprisingly, they were well below average in almost every category but triples. They actually had a higher triples rate (per PA) than the average major leaguer (and their normalized triples rate actually went up slightly from the minors to the majors), which is not surprising since triples is mainly a function of age and speed. To really get an idea as to what's going on with triples, you need to convert the above triples numbers into a "per doubles and triples" rate rather than a per PA rate as it is expressed above.
It is interesting that despite their young age and presumed superior speed, these call-ups really had their SB rate drop quite a bit and their CS rate go up. This suggests that it is much harder to steal bases in the major leagues, perhaps due to much better catcher arms and pitcher moves.
Finally, here are the normalized major league stats divided by the normalized minor league stats for all of the dual-level players. Again, because of the weighting system used, these should represent the observed drop-offs in performance (not the drop-off in talent, because of the selective sampling issues that will be addressed in a later post and have already been discussed on the other thread) for these dual-level players, as a group:
.91, .81, 1.01, .64, .82, 1.28, .68, .99, .81
Again, the above is:
s, d, t, hr, bb, so, sb, cs, OPS
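For reference, the drop-off ratios can be recomputed from the two normalized lines in this post, as in the Python sketch below. Because it starts from the already-rounded normalized figures, a couple of components (hr and so) come out .01 away from the ratios listed above.

```python
COMPONENTS = ["s", "d", "t", "hr", "bb", "so", "sb", "cs", "OPS"]

# Normalized figures quoted in this post for the AAA dual-level players.
minor_norm = [1.03, 1.07, 1.10, 1.14, 1.05, 0.94, 1.24, 1.10, 1.07]
major_norm = [0.94, 0.87, 1.11, 0.72, 0.86, 1.21, 0.84, 1.09, 0.87]

# Observed drop-off ratio: normalized major rate over normalized minor rate.
ratios = {
    name: major / minor
    for name, minor, major in zip(COMPONENTS, minor_norm, major_norm)
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```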
The most pronounced drop-off (36%) was in HR rate. That high number, as compared to the other drop-off rates, could be a function of severe selective sampling with HR's in that players with very high short-term (i.e. lucky) HR rates in the minors are more likely to get called up than players with high rates in other components. Again, we will get to the selective sampling issues later.
Finally for this installment, here are the ratios of minor to major normalized stats (drop-offs) for AA players who also played in the majors in the same year:
Here we have only 194 pairs of data. Each pair averages 143 AA PA's and 127 major league PA's. The total of the PA weightings was 6412.
Here are the AA to major ratios:
.84, .70, 1.07, .51, .84, 1.42, .96, .91, .74
Compare them to the AAA to major ratios:
.91, .81, 1.01, .64, .82, 1.28, .68, .99, .81
As you can see, in going from AA to the majors rather than AAA to the majors, there is a larger drop-off in every category but triples, SB, and CS (BB is about the same), suggesting that these players are REAL fast and perhaps good basestealers. Remember how for the AAA to majors players, the SB rate drops and the CS rate goes way up. Like with triples, since those SB and CS rates above are per PA and independent of one another, you would have to convert them to at least a function of one another to see what is going on. One reason for the really large drop-off in HR rate as compared to the other components might be the selective sampling issue again. The average AA dual-service player has a very high HR rate in the minors. The normalized HR rate of these players is 1.44 as opposed to only 1.14 for the AAA players. This suggests that the best or perhaps only way to get called up from AA is to hit lots of home runs. Again, the higher the rate in the minors, the more luck component there is in that minor league sample stat, and the more drop-off we should see in the majors due to regression alone, in addition to a change in the pitching level. IOW, that 49% observed drop-off in HR rate from AA to majors may not be nearly that high when we adjust for selective sampling of players who get called up.
Next time I am going to look at what is actually going on with this selective sampling issue and how we can perhaps account for it such that we might arrive at some MLE coefficients which actually reflect the true drop-off rates for a player's hitting talent in the minors versus his hitting talent in the majors, rather than reflecting an observed drop-off rate from a biased sample (the minor stats) to a somewhat random sample (the major sample - although the number of PA's in the majors is going to be biased), which is going to inflate the true drop-off rates if our sample of dual-level players tends to be lucky in the minors, which they do.
While the above ratios will "work" (will predict a minor league player's major league stats) pretty well for a player with around the same stats in the minors as one of our average dual-level players, they will NOT work real well if our minor leaguer has stats that are way above or below the average player in our above sample...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 3:15 a.m.,
January 1, 2004
(#2) -
MGL
Here are the AAA and AA ratios, when we do the same analysis, only this time we use a player's minor stats in one year and his major stats in the next year. If the selective sampling effects are around the same as with the same-year dual service players, we would expect to see not as much of a drop-off with the following-year dual service players as they are one year older and presumably haven't reached their peak age yet (on the average). We have 764 matched pairs for AAA one year and majors the next year. The average PA's in the minors in one year for each pair is 248 and it is 178 in the majors the following year.
AAA minor/major ratios
.94, .86, .99, .72, .85, 1.24, .79, .96, .86
Compare these to the ratios of the same-year dual players:
.91, .81, 1.01, .64, .82, 1.28, .68, .99, .81
There is in fact a smaller drop-off. Whether that is due to age (as we would expect) or the fact that perhaps the same-year dual service players had to be luckier in the minors (to get called up in the same year), we don't know. Actually this group of players had better minor league numbers than the same-year players, but they also had more PA's, so their true stats may be closer to their sample stats than the same-year players'.
For AA, here are the ratios for players who played in AA one year and the majors in the next year: There were only 271 pairs with an average of 304 PA's in AA and 139 in the majors (the next year). Again, because of age, we expect a smaller drop-off, especially with these presumably younger players.
.94, .79, .97, .57, .81, 1.33, .81, .97, .80
Compare to the ratios for same-year AA players:
.84, .70, 1.07, .51, .84, 1.42, .96, .91, .74
Indeed, there is a smaller drop-off in most of the categories (interestingly, not in BB rate, but that could be sample error)...
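A quick way to check which components did NOT show a smaller drop-off is sketched below in Python, using the two AA lines quoted in this post. Treating "distance from 1.0" as the size of the drop-off is an assumption of the sketch (for so, a ratio above 1 is the bad direction), and the function name is made up.

```python
COMPONENTS = ["s", "d", "t", "hr", "bb", "so", "sb", "cs", "OPS"]

def not_smaller_dropoff(same_year, next_year):
    """Components whose next-year ratio is no closer to 1.0 than the same-year one.

    Distance from 1.0 stands in for the size of the drop-off;
    that choice is an assumption of this sketch.
    """
    return [
        name
        for name, same, nxt in zip(COMPONENTS, same_year, next_year)
        if abs(nxt - 1.0) >= abs(same - 1.0)
    ]

aa_same = [0.84, 0.70, 1.07, 0.51, 0.84, 1.42, 0.96, 0.91, 0.74]
aa_next = [0.94, 0.79, 0.97, 0.57, 0.81, 1.33, 0.81, 0.97, 0.80]
print(not_smaller_dropoff(aa_same, aa_next))
```

On the AA numbers this flags BB, as mentioned, and also SB, where the next-year ratio (.81) is well below the same-year ratio (.96).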
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 12:45 p.m.,
January 1, 2004
(#4) -
MGL
Tango, I have to think about that. As I said, these are the preliminary results and NOT the best way to calculate MLE's. It sounds like you are saying that there are other problems besides the selective sampling issue. Are you saying that even if the sample of minor league players who also played in the major leagues were randomly chosen from the minor leagues, you still wouldn't like choosing the lesser of the two PA's (i.e., dividing the 300 PA stats by 3)? Isn't that what we do whenever we do any "matched pairs" studies? I don't think any regression is necessary on the major league side. I think only on the minor league side and then only because of the selective sampling issue. For example, when you look at groups of hitters or pitchers from one year to another, like in your banner years study, don't you weight each year's results by the lesser of two PA's exactly like I did? If this is correct but for the selective sampling on the minor league side, then we are on the same page. If you think that regression must be done on BOTH the minor and major side, then I think I disagree. Also, after you do the regressions, on one side or the other, then do you still weight both sides by the lesser of the two PA's?
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 6:07 p.m.,
January 1, 2004
(#5) -
MGL
Before I talk about why the coefficients generated above don't "work" for any and all players in the minor leagues, it's necessary to first discuss what the definition of an MLE is.
From Dan Szymborski on his web site:
One thing to remember is that MLE's are not a prediction of what the player will do, just a translation of what the major league equivalence of what the player actually did is. Dan S.
Even though you can't use an MLE directly as a prediction of major league performance, what's the point in having an MLE if you can't use it in a projection model? The answer of course is that once you translate a player's sample minor league stats into a "major league equivalency," you can then use it in whatever projection model you happen to like or use. In fact, the contention by Bill James and others is that using an MLE in a projection model is exactly as good as using actual major league stats in a projection model.
For example, according to that claim, if player A had an MLE OPS of .857 in 500 minor league plate appearances and player B had an actual major league OPS of .857 in 500 major league plate appearances, not only would their projections be exactly the same (assuming everything else about them were the same, or that we knew nothing else about them), but those projections would be equally accurate.
In my opinion, that claim is preposterous for two reasons. One, accurate park and league adjustments in the minor leagues, especially the former, are more difficult to do than in the major leagues, and two, no one knows for sure, or perhaps even close to "for sure," what the correct coefficients or multipliers are when doing the MLE's. As far as I know, Bill James and others simply use a "ballpark" figure of an 18% reduction in production from AAA to the majors and then apply that in some crazy way to each component of a player's minor league stat line. Surely this can't lead to a result (an MLE) that is anywhere near as accurate as a player's true major league stats. Also, as far as I know, the only justification for the "claim" is some crude test that James did (apparently others have tried to replicate it) to try and show how "accurate" MLE's were in predicting the next year's major league stats in a small sample of players versus how "accurate" actual major league stats were in predicting the next year's major league stats in another small sample of players. He did something like look at the average value of the "delta" BA in each group, and when he found that it was about the same, he concluded that MLE stats were "just as accurate in predicting major league performance as actual major league stats."
Now, they may in fact be "almost as good," but they CANNOT be "just as good." The only way they could be "just as good" was if we had a way to PERFECTLY compute an MLE. Practically speaking, which I'm sure is what James was doing when he said "just as good," I still don't think that you can say that, because of the inherent problems associated with the park and league adjustments in the minors plus coming up with a method of calculating the MLE conversion numbers in the first place. We don't even know whether there is a linear relationship between minor league and major league performance, let alone exactly what that relationship is for each of the component stats. And we certainly don't know that each of the component stats is reduced by 18% in run value translation, rather than 15% for one stat, 20% for another, or some other combination of numbers, etc.
So while yes, a perfect MLE may be just as good as an actual major league stat in terms of predicting the same major league stat in the future, that reasoning is kind of circular. In fact, it's a given. It's like saying that a player projection is perfect if in fact, we use a perfect projection model!
Anyway, the goal in these posts is to use the data I have to try and come up with true and accurate MLE coefficients that can be used for any and all players, which is what MLE's are designed to do. I've already explained why the coefficients I came up with cannot be used on all players but CAN be used on players who have around the same minor stats as the average player in the groups studied above. However, even if we did that, our MLE coefficients would not represent the actual reduction in talent going from the minors to the majors; they could simply be used, sort of coincidentally, as a one-step projection model. In other words, they would both do a translation AND do a regression of the translated numbers all at the same time.
To prove how the above coefficients would work well as a one-step projection for some players but not for others, I will look at players who played in the minors and majors in the same year, exactly as I did above, but I will look at the years 98-00 rather than 01-03, so that the original coefficients are not a self-fulfilling prophecy. In other words, we want to test how well those coefficients work for certain groups of players in another sample.
First, we'll look at the same overall players (dual-service) in 98-01. Here are the minor/major coefficients after doing all the same adjustments and normalizations that I did above:
.91, .86, .98, .61, .83, 1.26, .66, .96, .82
Here are the same numbers from the 01-03 sample:
.91, .81, 1.01, .64, .82, 1.28, .68, .99, .81
Pretty darn close, which means that there is probably some very good relationship between minor and major performance, which means that one, James is right in that if we can come up with a perfect MLE algorithm, we can probably predict major league stats from minor league stats just as well as from major league stats, and two, we should be able to come up with something pretty good such that our MLE's should be pretty good at predicting major league performance - not "as good as" major league stats, but pretty good.
Getting back to why we can't use the above coefficients to either translate minor stats to MLE's or to predict major stats for ALL players but we can for some players, here are the same coeff. for players who had HR rates around the average of all the players in our group of same-year dual-service players:
We are only concerned with the HR rate here.
.93, .83, .97, .65, 1.24, .72, 1.03, .83
As you can see, the 01-03 sample HR coeff. would have done a pretty darn good job of predicting major league HR rates for these players!
But what about players who had either very low or very high HR rates? Let's look at the low HR rate group, and see if that .64 HR coeff. from the 01-03 sample group would have done a good job of prediction. These players had a HR rate of almost 1/3 of the average HR rate in our complete sample of dual-service players.
Here are their coefficients:
.87, .87, 1.01, 1.16, .86, 1.27, .66, 1.04, .89
Wow! These players hit MORE home runs in the majors. Clearly, if we used the .64 coefficient for HR rate on these players, we would have done a horrible job of predicting their major league home run rate! And don't forget that these are actual players who got called up - who played in both the majors AND minors in the same year sometime in 98-01. In fact, these players averaged 194 PA's in the minors (in one year) and 123 in the majors (in the same year). What the heck is going on here? I'll get to that in a little while, although you can probably guess.
Here are the coeff. for the high HR players. Their HR rate is around 50% higher than the whole dual-service group (since the entire group has a high HR rate to begin with).
.95, .85, .95, .51, .81, 1.25, .66, .87, .78
Well, using the .64 HR coeff. is not very good for this group either. If you did, you would overestimate their major league HR rate.
As you probably figured out already, there is no single coefficient that we can use to do a one-step prediction from minor to major, because the real two-step process is nowhere near linear.
As I already said, what we have to do with this data is to try and figure out how to come up with accurate coefficients such that they represent the true drop-off in talent from minors to majors. Once we know these, we can easily do the two-step process of projecting major performance from a sample of minor performance. The first step is the translation of the sample minor stats to an equivalent sample of major stats, and the next step is the same as we would do with any sample of major performance - regress those sample major stats according to the size (PA's) of the sample. Presumably each component would have its own regression rate.
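As a rough illustration of that two-step idea, here is a minimal Python sketch. It is not the method used in this thread: the function name, the input numbers, and especially the `regress_pa` constant (how many league-average PA's get blended in) are hypothetical, and the simple "add league-average PA's" form of regression is just one common way to regress by sample size.

```python
def two_step_projection(minor_rate, minor_pa, mle_coeff, league_rate, regress_pa):
    """Translate a minor league rate, then regress it toward the league rate."""
    # Step 1: translation (minor league sample -> equivalent major league sample).
    translated = minor_rate * mle_coeff
    # Step 2: regression by sample size. regress_pa is a hypothetical constant:
    # the number of league-average PA's blended in with the translated sample.
    return (translated * minor_pa + league_rate * regress_pa) / (minor_pa + regress_pa)


# Made-up inputs, for illustration only.
projected_hr_rate = two_step_projection(
    minor_rate=0.040,   # HR per PA in the minors (hypothetical)
    minor_pa=400,       # size of the minor league sample
    mle_coeff=0.74,     # hypothetical HR translation coefficient
    league_rate=0.030,  # hypothetical major league average HR per PA
    regress_pa=400,     # hypothetical regression constant
)
print(round(projected_hr_rate, 4))
```

With a smaller `minor_pa` the projection sits closer to `league_rate`, which is the "regress according to the size of the sample" behavior described above. Each component would presumably get its own `mle_coeff` and `regress_pa`.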
Interestingly, and unfortunately, if our dual-service players were chosen at random - i.e., if players were called up from the minors at random or by lottery - our work would have been over a long time ago. The original coefficients, like the .64 for HR rate calculated from the 01-03 sample group, would be fine to use for translations for ANY and all minor league players. Those coefficients would be true MLE's. But alas, players are not chosen randomly from the minors to be called up to the majors and they are not sent down randomly or not at all, so we end up getting a "selective sample" of players in our dual-service groups, such that their sample minor stats, even though it is a large sample, are not anywhere near representative of the true talent of the group as a whole.
This last point is very important in baseball problems and in statistics in general. Tango alluded to it in his last post. If you have 1000 players chosen at random and each player has only 10 PA's, the average stats of those 1000 players in 10,000 PA's is going to be a very close approximation of the average true talent level of the entire group because of the large sample and because they were chosen at random.
However, if we selectively sample a group of 1000 players, let's say players who were good over some short period of time, say 10 PA's, even though we also have 10,000 PA's of data, the average of those 10,000 PA's is NOT, I repeat NOT, going to be a good estimate of the average talent of that 1000 player group! That is what is happening with our dual-service group.
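A small simulation can make the random-versus-selected contrast concrete. This is only a sketch under stated assumptions: the talent distribution (on-base rates centered on .330 with a .030 spread), the pool of 5000 players, and the rule "keep the 1000 with the best 10 PA's" are all made up for illustration and are not from the data in this thread.

```python
import random


def simulate(n_players, pa_each, rng):
    """Return (true_talent, observed_rate) pairs for n_players."""
    players = []
    for _ in range(n_players):
        # Assumed talent spread: a normal distribution, clipped to [0, 1].
        talent = min(max(rng.gauss(0.330, 0.030), 0.0), 1.0)
        successes = sum(rng.random() < talent for _ in range(pa_each))
        players.append((talent, successes / pa_each))
    return players


def averages(players):
    """Average true talent and average observed rate for a group."""
    n = len(players)
    return sum(t for t, _ in players) / n, sum(o for _, o in players) / n


rng = random.Random(2004)  # fixed seed so the run is repeatable

# 1000 players chosen at random, 10 PA's each (10,000 PA's in total).
random_true, random_obs = averages(simulate(1000, 10, rng))

# 1000 players selected BECAUSE their 10 PA's were good (best of a pool of 5000).
pool = simulate(5000, 10, rng)
hot = sorted(pool, key=lambda p: p[1], reverse=True)[:1000]
hot_true, hot_obs = averages(hot)

print("random group:   true %.3f, observed %.3f" % (random_true, random_obs))
print("selected group: true %.3f, observed %.3f" % (hot_true, hot_obs))
```

Both groups have 10,000 PA's, but only in the random group does the observed average land near the true average. In the selected group the observed average is far above the group's true talent.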
The trick then is to try and figure out the average "true" minor stats of all the players in our group and look at their major stats. The ratios between the two are going to be good estimates of the true MLE coefficients. Tango suggests doing that on a player-by-player basis. I'm not sure that is necessary. I'll have to think about that and perhaps do some simulations to see what is the best solution to the problem...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 7:58 p.m.,
January 1, 2004
(#7) -
MGL
Tango, there is no problem! I was using that as an example of how you can't use the original ratios I came up with! Did you just skim my post?
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 9:41 p.m.,
January 1, 2004
(#8) -
MGL
One (usually) good way of figuring out the true stats of a selective sample of players is to look at their previous and/or next year's stats. If the selective sampling only occurs in one year, then the stats of those players in the previous and following year should be more or less random (no selective sampling) and should reflect the true talent of that group of players. For example, if you selected a group of players in one year, based on any biased criteria (good year, bad year, etc.) and you looked at that same group the next year or the previous year, you would see the average stats of the whole group regress to their true talent level.
The only danger in this is that you have to watch out for natural selective sampling in the previous or next year even though you didn't use any criteria to choose those players in the previous or next year. For example, if you chose the bottom (worst) half of all players in the NL in 2002 and wanted to know their true talent level and you looked at those same players the next year, unfortunately, the worst of those players probably did not have a next year or had very few PA's in the next year, so that the next year's stats would be top-heavy with the better players, such that your estimate of the true talent of the players originally chosen will probably be a little high, if you use their next year's stats as a proxy for this true talent. Of course, one way to minimize this natural bias (as opposed to a pure selection bias) is to do the "lesser of the two PA's" type weighting. For example, let's say that your original sample were 3 players with the following stats:
A 450 PA .700 OPS
B 400 PA .650 OPS
C 450 PA .600 OPS
Let's say the next year, these same player's stats were:
A 500 PA .730 OPS
B 300 PA .700 OPS
C 150 PA .680 OPS
Your year two stats are top-heavy with the better player (the worst player didn't get that many PA's because he had the worst stats last year and he is probably the true worst of the three). You can mitigate or minimize that bias by weighting the two years by the lesser of the two PA's in each pair of stats. The weighted average of the year one group is now .600 * 150 + .650 * 300 + .700 * 450 all divided by 900, or .667. The second year weighted average is .730 * 450 + .700 * 300 + .680 * 150 divided by 900, or .712. So the estimate of the true talent of the group as a whole is .712. Basically, by using the "lesser of the 2 PA's" as the weighting factor for each year, year one and year two, we are minimizing the impact that one out-of-whack sample OPS can have if it is based on a very small sample, and at the same time we are accounting for the fact that the players in one year may not be represented in an equal or even proportion in the next year.
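The three-player example above can be reproduced with a few lines of Python. This is just a sketch of the "lesser of the two PA's" arithmetic; the dictionary layout and function name are my own choices.

```python
# (PA, OPS) for each player, taken from the three-player example above.
year_one = {"A": (450, 0.700), "B": (400, 0.650), "C": (450, 0.600)}
year_two = {"A": (500, 0.730), "B": (300, 0.700), "C": (150, 0.680)}


def lesser_pa_weighted(first, second):
    """Weight each player's OPS in both years by min(PA year one, PA year two)."""
    weights = {p: min(first[p][0], second[p][0]) for p in first}
    total = sum(weights.values())
    avg_first = sum(weights[p] * first[p][1] for p in first) / total
    avg_second = sum(weights[p] * second[p][1] for p in first) / total
    return avg_first, avg_second


avg_one, avg_two = lesser_pa_weighted(year_one, year_two)
print(round(avg_one, 3), round(avg_two, 3))  # 0.667 0.712
```

The weights come out to 450, 300, and 150 PA's (900 in total), so the same weight is applied to a player's OPS in both years.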
Surprisingly, when I looked at my data, I found that of the same-season, dual-service players, many of them also had plenty of minor league time the following year. I guess there are lots of players who get called up and then are sent back down and stay down for a while or who get called up for a while and then start the next season in the minors again.
Here are same-year dual-service (AAA and majors) players, this time in a 00-02 sample, who also played in AAA again the following year. What follows is their minor league normalized stats in the first year (good and lucky - that's why they were called up) and then their next year's minor league stats (presumably somewhat close to their true stats). Again, in both years, each player's stats are weighted by the lesser of the two PA's - either the PA's from year two or the PA's from year one, whichever is less.
Year one (these players played in AAA and majors in this year AND had time in the minors again in the next year):
.99, 1.04, 1.10, 1.14, 1.03, 1.01, 1.11, 1.08, 1.04
As you can see, these players are indeed better than average players. That is why they were called up at some point (or sent down I guess if they started the season in the majors - I make no distinction, just that they played at both levels in the same year).
Here is how the same players did the next year in the minors (AAA):
1.00, 1.04, 1.05, 1.11, 1.04, 1.00, 1.03, .99, 1.04
The ratio of the second year to the first year normalized stats (year two divided by year one) is the following:
1.01, 1.00, .96, .98, 1.00, .99, .93, .92, 1.00
These numbers are actually the regression coefficients that we would use to convert the year one normalized stats into their true stats, since we are using their year two stats as a proxy for their true stats. Why does it look like little or no regression is needed? Two very important reasons: One, these are next year's stats, and since these players are all young on the average, we have not accounted for the fact that there is going to be a pretty big increase in talent level from one year to the next. Two, the players were selectively sampled in year one because they were better than average in that year and were chosen to be called up. We would typically expect players who were better than average in only one year or a partial year to regress quite a bit in the next year. But these players were selected for call-up not only because they had one good year; it is likely that they had good years before that and were good prospects in the first place. IOW, the teams can tell lucky players from good ones to SOME extent. IOW, this sample of players who get called up is not as lucky as we thought. Their one year sample stats collectively are probably fairly close to their true stats, on the average. That, and the age thing, is why their next year stats are very close to their stats in the selectively sampled year. We do see some regression in HR rate even with the age increase, and in triples, which is not surprising, as increased age probably means lower triples rate (ditto for SB/CS rate), so the triples regression we would expect to see is not mitigated by an increase in age (ditto for SB/CS rate).
So the only thing that remains is to figure out how much these AAA players' stats increase from one year to the next because of age. That should be fairly easy. The quick and dirty way would be just to look at all AAA players from one year to the next and look at the ratio increase in each of the stats, doing the "lesser of the PA's" weighting to account for the fact that the better AAA players may not be in AAA the subsequent year (they may be in the majors). We should see an increase in everything except for triples, due to an age increase, with minor league players being on the lower part of the age curve (less than 28) on the average.
Another way to handle the regression thing is to use the above regression coefficients and apply them to the MLE coefficients computed when I looked at players who played in the minors one year and the majors the next. Those players already have a built-in age adjustment, just like with these regression ratios. I'll compare the two methods later.
First let's do the same analysis with AA players. We should see the same effect: not too much regression, if any, except for triples, because of the age increase. The age increase effect should be especially pronounced since the AA players would be a little younger than the AAA players. On the other hand, AA players who get called up may be more lucky than AAA players, as the AAA players have more of a history for the teams to review than do the AA players. If a player tears up single A in 250 PA's and then is tearing up AA in 200 PA's he may get called up, whereas for a AAA player to get called up, maybe he has to tear up A, AA, and AAA. I don't know.
Anyway, here are the same stats for AA players who played in AA and the majors in one year and then again in AA in the following year. We don't have a huge sample size for these players, as not too many times did a player play in AA and the majors in one year and AA again in the next year. In fact, it happened only 92 times in 00-02, but at least there was an average of 250 PA's in the first AA go-around and 259 in the second.
First year AA (players also played in majors in this year and AA again in the next year)
1.05, 1.09, 1.20, 1.40, 1.06, .96, 1.41, 1.20, 1.13
Of course, these were the very best (and lucky) players in AA that year.
Next year in AA
1.02, 1.09, 1.21, 1.48, .94, 1.01, 1.10, .87, 1.09
There was indeed an overall regression, as you can see from the normalized OPS's, even though these players were one year older. Interestingly, most of that OPS regression comes from the BB rate.
Here are the regression ratios (the second values divided by the first):
.97, 1.00, 1.01, 1.06, .88, 1.04, .78, .73, .97
Everything regressed but triples and HR's. Triples not regressing may be due to sample error and HR's not regressing may be because of the age increase. Again, to get a better idea as to the effect of the age increase we need to look at the stats of ALL AA players from one year to the next. I'll do that in the next installment for both AA and AAA players...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 12:58 a.m.,
January 2, 2004
(#9) -
MGL
I've thought about it a little. Since the next year minor stats are a decent proxy for the true stats of these players one year later, why not use those stats, normalized of course, and divide them by the normalized major stats of players who played in the minors one year and the majors the next? This should give us a decent estimate of the true MLE coefficients. At least we can compare these to the ones we get if we adjust the regression coefficients we got in the last analysis for the one year increase in age.
Here again, are the ratios of one year normalized minor stats to next year's normalized major stats in AAA for 00-02:
.94, .90, 1.01, .70, .87, 1.20, .75, .95, .86
Now here are the ratios of one year's AAA stats to the next year's AAA stats for players who played in AAA and the majors in the same year. Hopefully players who played in AAA and the majors in the same year are roughly the same pool of players as players who played in AAA one year and the majors the next, since we are using the regression coefficients of one group for the other group.
1.01, 1.00, .96, .98, 1.00, .99, .93, .92, 1.00
To see how similar the groups are, here are the normalized stats of the group that played in AAA in year x and in the majors in year x+1:
1.05, 1.08, 1.11, 1.12, 1.03, .92, 1.24, 1.15, 1.07
Here they are for players who played in AAA AND in the majors in year x:
1.03, 1.09, 1.09, 1.18, 1.05, .95, 1.18, 1.08, 1.08
Not too terribly different.
So the final step is to take the coefficients gotten from dividing the one year minor sample stats by the next year major sample stats, which is, for AAA (see above):
.94, .90, 1.01, .70, .87, 1.20, .75, .95, .86
and dividing those by the regression ratios above. Those regression rates are:
1.01, 1.00, .96, .98, 1.00, .99, .93, .92, 1.00
After dividing one by the other, we get:
.94, .90, 1.01, .70, .87, 1.20, .75, .95, .86
These are the final MLE coefficients that estimate the actual drop-off in true value from AAA to the majors. If we convert them into a lwts and use .122 runs per PA or 61 runs per 500 PA, for the average major league game, we get -16 runs per 500 PA from the above MLE's, which is a 26% drop-off in production from AAA to the majors.
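The percentage in that last step is simple arithmetic, sketched below in Python. The only inputs are the .122 runs per PA and the -16 runs per 500 PA quoted above; the variable and function names are my own.

```python
RUNS_PER_PA = 0.122                       # average major league run rate quoted above
runs_per_500_pa = RUNS_PER_PA * 500       # about 61 runs per 500 PA


def dropoff_pct(lwts_per_500_pa):
    """Express a (negative) lwts figure per 500 PA as a % drop in production."""
    return -lwts_per_500_pa / runs_per_500_pa * 100


print(round(runs_per_500_pa), round(dropoff_pct(-16)))  # 61 26
```

The same function can be applied to any other lwts-per-500-PA figure as long as the same 61-run baseline is assumed.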
One way to check how good those MLE ratios are, and actually another independent way to calculate them, is to see what the ratios are if we only look at approximately league average players in the minor leagues who also played in the majors that same year. This way our players' true rates would be around the same as their sample rates, so that the observed ratios between the minor and major normalized stats would be the same as the true ratios or true drop-offs. In order to retain decent sample sizes, we have to do this one component at a time.
First I did it with HR rate. The average HR rate in AAA in 01-03 was 11.9 per 500. I only looked at same year dual-service batters who had HR rates between 10 and 14. They had an average rate of 12.0, right around the league average. Their MLE ratio for HR's was .68. Not too far from our .72 above.
Now let's do the rest of the components. We will only look at same-year dual-service players with around average BB rates, then average K rates, etc. Here are the minor/major sample coefficients, including HR's, but excluding OPS when we do this:
.92, .83, 1.11, .68, .82, 1.24, .64, .92
Again, let's compare these to the ones calculated above, with the OPS removed:
.93, .86, 1.04, .72, .85, 1.24, .85, 1.05
Not too bad!
Now let's do the whole thing for AA, without the last part (the "check").
Here are the one year AA to the next year AA regression coefficients for players who played in AA and the majors in one year and AA again in the next year:
.98, 1.00, .98, 1.03, .90, 1.04, .75, .75, .97
Now here are the sample MLE coefficients for players who played in AA in one year and in the majors in the next year:
.93, .82, 1.05, .55, .82, 1.32, .81, .80, .80
If we divide the second by the first, to get the true AA MLE coefficients, we get:
.95, .82, 1.07, .53, .91, 1.27, 1.08, 1.06, .82
If you do the lwts, this represents a 32% drop-off in production from AA to the majors.
Final tally:
True AA MLE coefficients:
.95, .82, 1.07, .53, .91, 1.27, 1.08, 1.06, .82
True AAA MLE coefficients:
.94, .90, 1.01, .70, .87, 1.20, .75, .95, .86
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 1:00 a.m.,
January 2, 2004
(#10) -
MGL
fixing bold hell...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 1:00 a.m.,
January 2, 2004
(#11) -
MGL
one more try
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 2:34 p.m.,
January 2, 2004
(#16) -
MGL
Can you post the BABIP MLE coefficients? It seems like most, if not all, of the dropoff in hits can be attributed to the big increase in K's.
Batters are not like pitchers in that their BABIP is fairly constant such that a change in K rate or BB rate will automatically mean a change in hits per PA rate. OTOH, your point that all of the individual component rates are not nearly independent is a good one.
For the rates, I probably should be using BB per PA, K per PA-BB, HR per AB, and s,d, t per AB-HR (or s,d per AB-HR and t per d+t), or something like that.
Tango, what do you think the rates should be that I use?
MGL, MGL, MGL.... YOU are accusing someone ELSE of skimming a post, and then commenting on it? Should I get John McEnroe on you?
As long as we get to Cartman... Seriously, who of us has the time to NOT skim other long posts??
As well, you really should be using the Odds Ratio method. At the level that you are doing it, using a Rates method is wrong.
I knew you'd say that! Can you not use the odds ratio method most of the time since it makes things a thousand times more difficult? Whenever I use the odds ratio method, I need a little "cheat sheet" I have saved on my computer! IOW, is the "ratio" method (as opposed to the "odds ratio") method good enough in most instances? In this instance?
Now, your idea about looking at the stats of the player in question in a year that's NOT part of the sample is EXCELLENT!
However, you still have some issue. Even though you are looking at say a player who played in the minors-majors one year to establish the equivalency, and then look at year+1 in the minors to figure out his true talent level (all the players as a group), you get selective sampling. If in year+1 the guy had very few PAs in the minors, then he's hitting the cover off the ball (luckily) and he gets called up. If he stays in the minors all year+1, and gets tons of PAs, he probably wasn't doing so well. Worse, you weight those PAs more, because he had more.
I know. I am hoping that it is not that big of a factor in this case. It may not be. If it were a big factor, we'd probably see a large observed regression, if as you say, the ones with the large PA's are not doing so well. We don't see a large regression though, which makes me think that this "inherent" selective sampling in year x+1 is not that great. I don't know. There is so much selective sampling going on with trying to compute real MLE's it's not even funny. It's like trying to compute true aging coefficients but 100 times more difficult.
Tango, if you give us any more "old threads" to read through, we are going to have "negative" social lives rather than none.
This MLE project is a work in progress. As you can see, James must have been on drugs when he nonchalantly came up with true MLE coefficients in the 1980's and proudly proclaimed that the resultant MLE's are "just as good as" major league stats!
OK, I changed the rates to the following:
s rate=s/(pa-bb-so-hr)
ex=(d+t)/(pa-bb-so-hr)
t=t/(d+t)
hr=hr/(pa-bb-so)
bb=bb/(pa)
so=so/(pa-bb)
sb=sb/(s+bb)
cs=cs/(sb+cs)
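For anyone who wants to compute these rates from a raw stat line, here is a minimal Python sketch of the definitions listed above. The function name and the sample stat line are hypothetical. The triples rate is read as t/(d+t), following the "t per d+t" suggestion earlier in this post, and PA is treated as if it contained only the listed events (no HBP, SF, etc.), which is a simplification.

```python
def component_rates(pa, bb, so, s, d, t, hr, sb, cs):
    """Component rates using the denominators listed above."""
    balls_in_play = pa - bb - so - hr
    return {
        "s": s / balls_in_play,           # singles per ball in play
        "ex": (d + t) / balls_in_play,    # doubles + triples per ball in play
        "t": t / (d + t),                 # triples as a share of doubles + triples
        "hr": hr / (pa - bb - so),        # HR per PA that is not a BB or K
        "bb": bb / pa,                    # walks per PA
        "so": so / (pa - bb),             # strikeouts per non-walk PA
        "sb": sb / (s + bb),              # steals per single or walk
        "cs": cs / (sb + cs),             # caught stealing per attempt
    }


# Made-up stat line, for illustration only.
rates = component_rates(pa=500, bb=50, so=100, s=80, d=25, t=5, hr=15, sb=10, cs=5)
for name, value in rates.items():
    print(name, round(value, 3))
```

Each denominator removes the outcomes already "used up" by an earlier component, which is why a change in K rate no longer drags the hit rates along with it.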
Here are the old sample "major to minor" coefficients for players in AAA in year x and the majors in year x+1, using the old rates (everything per PA):
2000-2002 AAA and 2001-2003 majors
.94, .86, 1.00, .71, .85, 1.23, .79, .97, .86
These were s, d, t, hr, bb, so, sb, cs, and ops, all per PA (except for OPS of course)
Here are the same coefficients, using the new rates, as described above:
.96, .89, 1.15, .73, .85, 1.22, .87, 1.14, .86
These are s, ex, t, hr, bb, so, sb, cs, ops, with the rate denominators described above.
As FJM insightfully surmised, the lower K rates caused the hit rates per BIP not to decrease as much as per PA. This is definitely the better way to look at the rates. Thanks to FJM!
So now we need to divide these numbers by the regression coefficients again to get the "true" MLE coefficients.
The regression coefficients (year x+1 minor stats divided by year x minor stats for all following-year dual-service players) using these new rates are shown below. Remember also that these include both regression toward the mean AND an increase in talent level due to age, so that it may "look" like very little or no regression at all:
1.01, .99, .96, .98, 1.00, .99, .92, .99, 1.00
As Tango points out, we have to be a little wary of these coefficients, as there is some selective sampling here as well in terms of the number of PA's that each player gets in year x +1 as a function of how they performed in year x+1.
Anyway, dividing these into the sample MLE coefficients above, to yield an estimate of the true MLE coefficients, we get:
.95, .90, 1.20, .74, .85, 1.23, .95, 1.15, .86
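The AAA division step can be checked with a short Python sketch. The two input lists are the new-rate sample MLE coefficients and the regression coefficients posted above; the variable names are my own.

```python
COMPONENTS = ["s", "ex", "t", "hr", "bb", "so", "sb", "cs", "ops"]

# AAA sample MLE coefficients (new rates) and year-to-year regression coefficients.
sample_mle = [.96, .89, 1.15, .73, .85, 1.22, .87, 1.14, .86]
regression = [1.01, .99, .96, .98, 1.00, .99, .92, .99, 1.00]

# "True" MLE coefficient = sample MLE coefficient / regression coefficient.
true_mle = [round(m / r, 2) for m, r in zip(sample_mle, regression)]

for name, value in zip(COMPONENTS, true_mle):
    print(name, value)
```

Because the inputs are already rounded to two decimals, a result can be off by .01 from one computed on the unrounded data. For these AAA numbers the rounded quotients match the posted line.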
For AA, here are the sample MLE coefficients:
.96, .82, 1.21, .59, .81, 1.31, .90, 1.13, .80
Here are the AA regression coefficients from one year to the next (see above for AAA):
.96, .99, 1.01, 1.05, .88, 1.03, .83, .95, .97
Again, dividing one by the other, yields:
1.00, .83, 1.20, .56, .92, 1.27, 1.08, 1.18, .82
So here are our final estimates of the "true" MLE coefficients using the new, and probably much better, rate scheme:
AA
1.00, .83, 1.20, .56, .92, 1.27, 1.08, 1.18, .82
That is an MLE lwts of -20.4 per 500 PA, which is a 33.5% reduction in run production from AA to the majors.
AAA
.95, .90, 1.20, .74, .85, 1.23, .95, 1.15, .86
This is a -17.7 MLE lwts per 500 PA or a 29% reduction in run production from AAA to the majors.
Those reduction %'s seem a little high, but it is hard to tell.
What do you guys think?
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 2:36 p.m.,
January 2, 2004
(#17) -
MGL
As long as we get to Cartman
That should be "as long as we didn't get to Cartman..."
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 5:05 p.m.,
January 2, 2004
(#22) -
MGL
One of the problems you might run into with the NL to AL (and reverse) translations is "getting used to a new league." You may find a reduction in production going both ways. Should you look at NL versus one or two years after going to the AL (and vice versa) and control for age?
The AA reg. coeff. for BB/PA (.88) seems awfully low.
You mean high, right? Yes, I agree that there is no logical reason for the AA BB rate coeff. to be higher than the AAA. Should be the other way around. There is so much sample error in all the different calculations.
For example, in the 97-99 vs 98-00 samples, here are the AA sample MLE coeff.
.94, .85, 1.27, .54, .82, 1.27, .87, 1.21, .80
Compare that to the AA 00-02 vs. 01-03 sample coeff. from before:
.96, .82, 1.21, .59, .81, 1.31, .90, 1.13, .80
Now, here are the 97-99 vs. 98-00 regression coeff. for AA:
.95, .95, 1.00, .94, .90, 1.07, .87, 1.05, .93
Again, compare that to the ones for 00-02 vs. 01-03 from before:
.96, .99, 1.01, 1.05, .88, 1.03, .83, .95, .97
If we average the two sets of sample data for both the sample MLE's and the regression coeff., we get:
.95, .83, 1.24, .56, .82, 1.29, .88, 1.17, .80
and
.96, .97, 1.01, 1.00, .89, 1.05, .85, 1.00, .95
Again dividing one by the other, we get, for AA:
.99, .86, 1.23, .56, .92, 1.23, 1.04, 1.17, .84
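Here is a Python sketch of the average-then-divide step for AA, using the four lines of coefficients posted above. The helper name is my own. The sketch divides the unrounded averages, while the post rounds the averages first, so individual values can differ from the posted line by up to about .01.

```python
def average(a, b):
    """Element-wise average of two lists of coefficients."""
    return [(x + y) / 2 for x, y in zip(a, b)]


# AA sample MLE coefficients for the two samples.
mle_97_99 = [.94, .85, 1.27, .54, .82, 1.27, .87, 1.21, .80]
mle_00_02 = [.96, .82, 1.21, .59, .81, 1.31, .90, 1.13, .80]

# AA year-to-year regression coefficients for the two samples.
reg_97_99 = [.95, .95, 1.00, .94, .90, 1.07, .87, 1.05, .93]
reg_00_02 = [.96, .99, 1.01, 1.05, .88, 1.03, .83, .95, .97]

avg_mle = average(mle_97_99, mle_00_02)
avg_reg = average(reg_97_99, reg_00_02)
true_aa = [m / r for m, r in zip(avg_mle, avg_reg)]

posted = [.99, .86, 1.23, .56, .92, 1.23, 1.04, 1.17, .84]
largest_gap = max(abs(t - p) for t, p in zip(true_aa, posted))
print([round(t, 2) for t in true_aa], round(largest_gap, 3))
```

The largest gap between the recomputed and posted values stays under .01, which is what rounding at the intermediate step would explain.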
For AAA, the 97-99 vs 98-00 regression coeff. are:
.98, 1.00, .93, .96, 1.03, 1.00, 1.02, .97, .99
The other (more recent) sample was:
1.01, .99, .96, .98, 1.00, .99, .92, .99, 1.00
The average of the two samples is:
1.00, .99, .95, .97, 1.01, .99, .97, .98, 1.00
The 97-99/98-00 sample MLE coeff. for AAA are:
.94, .93, 1.10, .70, .88, 1.16, .77, 1.21, .87
The other sample was:
.96, .89, 1.15, .73, .85, 1.22, .87, 1.14, .86
The average of these is:
.95, .91, 1.13, .72, .86, 1.19, .82, 1.17, .86
and the average regression coeff., repeated from above:
1.00, .99, .95, .97, 1.01, .99, .97, .98, 1.00
Dividing the average sample MLE's by the average regression coeff. gives you, for AAA:
.95, .92, 1.19, .74, .85, 1.20, .85, 1.19, .86
Final tallies:
AA
.99, .86, 1.23, .56, .92, 1.23, 1.04, 1.17, .84
32% reduction from AA to majors.
AAA
.95, .92, 1.19, .74, .85, 1.20, .85, 1.19, .86
28% reduction from AAA to majors.
Still don't know why the BB coeff. is higher in AA than in AAA. Would have to look at players who went from AA to AAA and see what happens.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 6:45 p.m.,
January 2, 2004
(#24) -
MGL
Here are the AA to AAA from one year to the next sample coeff. for 97-02 vs. 98-03 (6-yr sample):
.98, .93, 1.04, .82, .95, 1.08, 1.07, 1.03, .93
This includes the selective sample problem just like with the AAA or AA to majors samples, so we need some regression coeff. (from AA one year to AA in the next year for players who had dual service in AA and AAA) to divide by (also 6-yr samples):
.97, .97, 1.01, 1.00, .90, 1.05, .85, .99, .95
That's the number I don't trust. Since the true BB rate would go up quite a bit with age for a young player, even with regression I would expect that next year's BB rate would stay around the same, and not drop by 10%.
Anyway, dividing one by the other, to get the true AA to AAA coefficients, we get:
1.01, .99, 1.06, .89, 1.04, 1.05, 1.23, 1.03, .97
This is only a 3% reduction in run production. Seems too small, although it jibes with the 32% from AA to majors and 28% from AAA to majors. Also, it appears as if the BB rate might go up from AA to AAA, even after adjusting for age.
In fact, let's look at players who played in AA and AAA in the same year and who had around a league average (for AA) BB rate, so that we would expect their sample AA BB rate to be about the same as their true AA BB rate, such that their BB rate in AAA in the same year should reflect their true reduction or increase in BB rate, whichever the case may be.
For players who had around average BB rates in AA and who also played in AAA in the same year, we get a AAA to AA ratio for BB rate of .91, which suggests that players DO lose BB rate when going from AA to AAA. In fact, let's substitute the .95 (a compromise) for the 1.04 in the above true AA to AAA coeff. That gives us:
1.01, .99, 1.06, .89, .95, 1.05, 1.23, 1.03, .97
which is a 6% reduction in run production rather than a 3%.
The .95 rather than the 1.04 probably changes some of the other values, but let's not worry about that.
Let's go back to the AA to majors and AAA to majors true coeff., and see if we can "fix" those screwy (backwards - the AA being higher than the AAA) BB coeff., the same way we "fixed" the AA to AAA one.
We'll look at same year AA to majors and AAA to majors with around average BB rates in their lower league:
AA (6-yr sample)
The sample BB coeff. is .70, so we will call that the true BB coeff.
AAA
The sample BB coeff. is .86, so we will call that the true BB coeff.
Now that's more like it!
Since the AA to AAA and the AAA to majors samples are much larger than the AA to majors samples, let's interpolate the AA to majors from the AA to AAA and the AAA to majors. That gives us a AA to majors of .82. If we average that with the .70 and give more weight to the interpolated value, we'll call it .80, since it is a nice round number.
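One reading of the "interpolation" is simply chaining the two steps: multiply the AA to AAA BB coefficient (.95) by the AAA to majors BB coefficient (.86). The Python sketch below assumes that reading, which does reproduce the .82. The post does not say what weights were used to land on .80, so the second part only backs out the weight that would give exactly .80.

```python
aa_to_aaa_bb = 0.95       # compromise AA -> AAA BB coefficient from above
aaa_to_majors_bb = 0.86   # AAA -> majors BB coefficient (around-average BB group)
direct_aa_bb = 0.70       # AA -> majors BB coefficient from the smaller direct sample

# Assumed reading of "interpolate": chain the two larger-sample steps.
chained = round(aa_to_aaa_bb * aaa_to_majors_bb, 2)
print(chained)  # 0.82

# Weight on the chained value that would make the blend come out to .80.
target = 0.80
implied_weight = (target - direct_aa_bb) / (chained - direct_aa_bb)
print(round(implied_weight, 2))
```

Under this reading, settling on .80 amounts to putting roughly five-sixths of the weight on the chained value and the rest on the direct .70.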
So here is what our true MLE coeff. look like now with the BB rate changes:
AA
.99, .86, 1.23, .56, .80, 1.23, 1.04, 1.17, .84
35% reduction from AA to majors.
AAA
.95, .92, 1.19, .74, .86, 1.20, .85, 1.19, .86
28% reduction from AAA to majors.
Who knows? Could be!
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 6:50 p.m.,
January 2, 2004
(#25) -
MGL
Actually, I was referring to the year-to-year reg. coeff., which you divide by to get the final MLE. In other words, for AAA it is .85 / 1.00 = .85 while for AA it is .81 / .88 = .92. I'm not questioning the numerator, just the denominator, the .88. Note that it is the only year-to-year coeff. other than SB rate which varies significantly from 1.0.
OK, right! You were referring to the regression coeff. and not the MLE coeff. It WAS low! That's what I redid. Yes, I also thought it should be close to 1, as the age thing cancels out the regression, which happens with most of the other components except for triples of course.
Definitely. Very sharp! I think you are probably the only one who is actually reading through and making sense of my gibberish! Good work!
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 9:23 p.m.,
January 2, 2004
(#28) -
MGL
No offense David, but I am just kind of keeping a journal of my work and printing it on this thread in case anyone is interested. I know that it is hard, if not impossible, to follow, but I don't have the time to write something up that's more cogent...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 10:48 p.m.,
January 2, 2004
(#30) -
MGL
No prob! I will soon! I'm redoing my 2003 AA and AAA MLE's using the new coeff., so I can send them to Tango and he can post them somewhere...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 12:17 a.m.,
January 3, 2004
(#33) -
MGL
I used the wrong denominators (and numerators) for the SB and CS portion of the stats. I used sb/(s+bb) and cs/(sb+cs). I should have used (sb+cs)/(s+bb), or "attempts," and THEN sb/(sb+cs) or cs/(sb+cs).
The observed MLE coeff. for these "new" stats (att rate and sb success rate) are:
AA
.96, .93
AAA
.89, .93
The regression coeff. are:
AA
.85, 1.00
AAA
.96, 1.00
Dividing the first by the second to get the true MLE coeff., we get:
AA
1.13, .93
AAA
.93, .93
This means that when going from AA to the majors, attempts go up 13% and success rate goes down 7%. Going from AAA to the majors, attempts and success rate both go down 7%.
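The division step above can be sketched in a few lines of Python. The numbers are the ones quoted in this post; the dictionary and function names are just illustrative.

```python
# Observed MLE coefficients and year-to-year regression coefficients
# quoted in the post, as (attempt rate, SB success rate).
observed = {"AA": (0.96, 0.93), "AAA": (0.89, 0.93)}
regression = {"AA": (0.85, 1.00), "AAA": (0.96, 1.00)}


def true_coefficients(obs, reg):
    """Divide each observed coefficient by its regression coefficient."""
    return tuple(round(o / r, 2) for o, r in zip(obs, reg))


true_mle = {level: true_coefficients(observed[level], regression[level])
            for level in observed}

for level, (attempts, success) in true_mle.items():
    print(level, attempts, success)
```

A coefficient above 1 means the rate goes up at the higher level; below 1 means it goes down.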
Going from AA to AAA, the final true MLE coeff. for these two "new" rates (att and sb succ. rate) are:
1.22, .98
This means that when you go from AA to AAA, you attempt 22% more steals and your success rate goes down by 2%. The AA to AAA numbers jibe very well with the AA to majors and AAA to majors.
If you attempt 100 steals in AA, you will attempt 122 in AAA and 113 in the majors. If you succeed 70 times in AA, you will succeed around 69 times in AAA and 65 times in the majors.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 4:27 p.m.,
January 3, 2004
(#37) -
MGL
Rally that is a reasonable explanation, but there is so much selective sampling and sample error, that I'm not sure the steal attempts thing means that much.
FJM, yes I can split up the leagues, but again, we reduce our sample sizes in half in AAA and in thirds in AA, such that any differences you might see could be sample error, could be something else, etc.
I'm reasonably sure that most of the coefficients I came up with are in the same ballpark as the "real" ones, I'm reasonably sure that there is a fairly linear relationship between major and minor talent, regardless of the level of the talent (high or low), and I'm reasonably sure that the resultant MLE's are a decent predictor of major league stats. Beyond that, who knows? I wouldn't take these coefficients as the gospel. As I said, I don't know where James and others got the original idea that their MLE's are "the gospel."
BTW, I sent my entire MLE files for AA and AAA, 2001-2003, to Tango to put up somewhere. I used a version of the new coefficients. They are quite interesting. Lots of seemingly good hitters in the minors that I have never heard of.
Here are all players who had at least 200 PA's in AA and AAA in 2001-2003 and have had significant major league time as well, and their age adjusted, and weighted by year (5/4/3), total MLE OPS's in AA and AAA were at least .800:
I regressed their weighted, age adjusted total MLE OPS to reflect a Marcel-type major league OPS projection. For 200-400 PA's, I regressed around 70%, for 400-600, I regressed around 50%, for 600 to 800, around 40%, for 800 to 1000, around 30%, and over 1000, around 20%. These are off the top of my head. The OPS I regressed to was .755, which is the average major league OPS that the MLE's are based on.
Name, PA's, MLE OPS, MLE OPS Regressed, Park Adjusted Major League OPS
A. Dunn, 416, 1.132, .906, .859
T. Perez, 229, .928, .790, .727
N. Johnson, 444, .873, .808, .812
A. Kearns, 286, .842, .781, .865
R. Simon, 245, .832, .774, .758
T. Hall, 505, .824, .789, .683
M. Giles, 405, .816, .782, .846
J. Crede, 903, .810, .793, .750
H. Choi, 959, .807, .791, .739
M. LeCroy, 631, .803, .781, .768
R. Ludwick, 1272, .800, .791, .698
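The regression step described before the table can be sketched as below. This is only a rough sketch: the bracket percentages are the "off the top of my head" figures from the post, the handling of the bracket edges is an assumption, and the function names are made up for illustration. The regressed values in the table appear to vary within each bracket (Dunn's .906 at 416 PA works out to roughly 60%), so this will not reproduce them exactly.

```python
LEAGUE_MEAN_OPS = 0.755  # average major league OPS quoted in the post

# (upper PA bound, fraction regressed toward the mean).
# Treating each bracket as "lower <= PA < upper" is an assumption.
REGRESSION_BY_PA = [
    (400, 0.70),
    (600, 0.50),
    (800, 0.40),
    (1000, 0.30),
    (float("inf"), 0.20),
]


def regression_fraction(pa):
    """Return the approximate regression amount for a given PA total."""
    if pa < 200:
        raise ValueError("the post only covers players with at least 200 PA")
    for upper, fraction in REGRESSION_BY_PA:
        if pa < upper:
            return fraction
    return REGRESSION_BY_PA[-1][1]


def regress_ops(mle_ops, pa, mean=LEAGUE_MEAN_OPS):
    """Pull an MLE OPS toward the league mean by the bracket's fraction."""
    fraction = regression_fraction(pa)
    return mean + (1.0 - fraction) * (mle_ops - mean)


print(round(regress_ops(0.810, 903), 3))
```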
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 4:38 p.m.,
January 3, 2004
(#38) -
MGL
Rally, fascinating! I get a 25.7% reduction from AAA and a 37.4% reduction from AA for the last 3 years! Our numbers are so close, it's scary! Did James and Dan S. put that same number at 18% for AAA? Are we all using the same scale - some kind of runs created? I'm using the average MLE lwts in AA and AAA, which is of course, runs below the major league average, and then converting that to a runs created by simply adding .122 runs per PA, which is the average number of runs per PA in the major leagues last year.
IOW, my average AAA player had an MLE lwts of -.031 per PA. So I am converting that to a "runs created" per PA by adding .122, which is .091. IOW, my average AAA player created .091 major league runs per PA.
So, I used 1 minus .091/.122 as the "reduction in run production" going from AAA to the majors. I think that is the right way (or at least one way) to do it...
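As a quick check of that arithmetic, here is a minimal Python sketch using the rounded figures from this post (the function name is just illustrative). With the rounded inputs it comes out at about 25%, close to the 25.7% quoted above, which was presumably computed from unrounded values.

```python
MLB_RUNS_PER_PA = 0.122  # average major league runs per PA quoted in the post


def run_reduction(mle_lwts_per_pa, mlb_runs_per_pa=MLB_RUNS_PER_PA):
    """Reduction in run production implied by an average MLE lwts per PA."""
    runs_created_per_pa = mle_lwts_per_pa + mlb_runs_per_pa
    return 1.0 - runs_created_per_pa / mlb_runs_per_pa


aaa_reduction = run_reduction(-0.031)
print(f"{aaa_reduction:.1%}")
```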
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 7:10 p.m.,
January 3, 2004
(#40) -
MGL
MGL have you ever done any studies about college level equivalencies (CLE's)?
No. The "M" in MLE stands for "major" so they would still be "MLE's" for college - perhaps CMLE's!
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 10:26 p.m.,
January 3, 2004
(#42) -
MGL
To me its a bit easier to work with because being a below average hitter will reduce the total of plate appearances...
If I'm doing this correctly, if I did it by out rather than by PA, it increases the reduction, since as you say, the minor league hitters have slightly fewer PA's in a game, and therefore generate slightly fewer runs.
For AA, the reduction in runs is another 2%, so we are up to 39.5%. For AAA, we have another .8% in reduction, for 26.5%...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 10:27 p.m.,
January 3, 2004
(#43) -
MGL
the minor league hitters have slightly fewer PA's in a game...
What I meant was that the minor league hitters, on the average, would have slightly fewer PA's in a major league game...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 1:36 a.m.,
January 4, 2004
(#46) -
MGL
Look at it this way. If you could get equally reliable MLE's at almost any level, what about Little League? T-Ball? (kidding) That's a good one! T-Ball MLE's!
Seriously, there has to be a point at which current talent, even if you could normalize it from team to team, park to park, league to league, etc., just doesn't correlate well at all to future (major league) talent, because players learn and develop, physically, mentally, intellectually, psychologically, and desire-wise at significantly different rates both quantitatively and qualitatively.
The assumption behind AA and AAA stats being able to be translated into reliable MLE's is that the player's talent has pretty much developed and matured as much as it can, and that the only thing left is physical maturity and level of competition. Or at the very least, that the development in talent between AA or AAA and the majors is more or less the same for all players.
At the other extreme is, as I said, Little League. One, we don't know what a 12 year old's physical makeup is going to be in 10 years. Two, we don't know what his competitive desire will be. Three, we don't know what his learning capability or desire will be. Etc. So even if we could normalize Little League stats to account for park, competition, etc., we still would not be able to come up with good MLE's. Basically the correlation (r or r squared) between even normalized LL stats and major league stats would be very small. For AA or AAA and major leagues it should be pretty high (close to majors versus majors).
So the question is, how much correlation between college and majors would there be, even if you could control for most or all of the things Tango mentions? Somewhere between LL and AAA I would think.
As a practical matter, for college players you are not trying to compare them with major leaguers to see whether you should call a college player up to the pros. That is where MLE's for AA and AAA come in handy - to help you to determine when and whether to call a player up, based on his major league projection, which is of course based on his regressed minor league MLE, and based on who else you have at the major league level that he would replace. That is why MLE's themselves are important at the AA and AAA level.
If you are not going to bring a player up from a certain level, like college ball, to the pros, then MLE's per se are not that important. All you really want to do is to be able to normalize college stats so that you can fairly compare one player on one team and conference to another player on another team and conference. That is what is most important at the college level! Not what those college stats would look like at the major league level, although it would be nice if you had some idea.
You would also like to know how high school stats translated to college stats. In fact, that is probably more important than how college stats translate to major league stats. That way you can compare high school and college players fairly in the draft. The inference of course is that once you are able to put the high school and college players on a "level playing field," the one with the better "equivalent" stats is the one who will more likely do better if ever in the major leagues.
The reason you don't really care that much about MLE's for college or high school players, but you do for AA and AAA (and A) players, is that they are fundamentally different. All teams get to draft X amount of players at any draft time, so all you care about is drafting the best player, or at least being able to identify the best player (so you can factor it into signing bonus issues and things like that). A, AA, and AAA players, on the other hand, are already in your organization or you are trading for them. The only thing you care about with them is their relative chances of making the pros. That's why you not only need normalizing metrics (league and park adjustments) in the minors, but you also need MLE's. If player A is better than player B, but neither player has any chance of making the pros, then they both have around the same value...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 4:56 p.m.,
January 4, 2004
(#51) -
MGL
No, they're not (position adjusted).
The positions listed in the files are the position at which each player had the most "games at position" his last year in AA or AAA for the 04 file and for that year in the 01-03 files.
Tango, do you still hate MLE's? I hate them if only because they have consumed me for about a week now! Time to move onto something else. Actually, I'm working on the pitching ones now.
Question for anyone: For the pitcher MLE's, can we automatically use the same component coefficients since the pitching is just the reverse of the hitting?
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 8:03 p.m.,
January 4, 2004
(#54) -
MGL
Unless #2 is false, and the parks are equivalent, then the pitching MLE factors will not be the same as the hitting ones.
Are you saying that if the parks are the same, then the hitting and pitching MLE's WILL be the same?
BTW, writing rule #23: Don't use more than one negative in a sentence!
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 3:11 p.m.,
January 5, 2004
(#59) -
MGL
This is probably the 10th time that MGL has mischaracterized my position, and seeing that he likes to (and accuses others of) skim articles, let me reiterate:
Honestly, Tango, that's probably the most ridiculous thing you have ever said on these boards! One, how can someone "mischaracterize" (or characterize) ANYTHING with one question, "Do you still hate MLE's?" Two, is it my freakin' imagination, or is part of the name of this thread "Why I hate MLE's?" Maybe we live on different planets, but when someone says "Why I hate MLE's" that implies (more than implies - that is the only conclusion) that they "hate MLE's." When I ask "Do you still hate MLE's?" not only was it just a facetious question, but where in the world am I characterizing anything about WHY you may hate MLE's? Enter McEnroe and Cartman!
If on the other hand you apply MLEs AFTER doing a regression, then what you are left with is simply applying a factor for quality of competition difference.
Well, of course, you are supposed to apply MLE's AFTER doing the appropriate regression on the minor league level (even though no one ever does), since they are supposed to represent the ratio of true talent. That's why I went through all those painstaking steps to try and establish the true talent level of the minor league players! So yes, my question about pitchers and batters was assuming that we have first established the "true" stats of the minor league pitcher or batter.
In this case, it's probably safe to say that talent distribution of hitters and pitchers are similar enough.
Rally and AED seem to think that the average park in the minors is probably significantly different than those in the majors, since their first response was to say that there is no reason to assume that the hitter and pitcher MLE's will be the same, unless I am misinterpreting what they said, or they are backpedalling...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 8:30 p.m.,
January 6, 2004
(#67) -
MGL
AED, you are a hundred times smarter than I when it comes to statistics, but this is really a bugaboo of mine:
Overbay underperformed his MLE at the 0.3 sigma level, which means there is no reason to believe he was any worse than the MLE suggested. The difference between his prediction and performance can be solely attributed to luck. (and not even all that bad luck!)
Kata did the opposite, getting lucky in his first few weeks to overachieve his MLE and stick. By the end of the year, he was most of the way back down to his MLE.
When do certain "sigma's," as you call them, magically turn from "can be solely attributed to luck" to "cannot be solely attributed to luck?" They don't, as you know, which is why I hate when anyone makes a magical distinction between a result that is less than 2 SD's from a null hypothesis versus more than 2 SD's, etc. I much prefer to say, "This is the probability that a certain result occurred by chance given a certain assumption (usually a certain true value)." If it's 1%, 3%, 5%, or 20%, or whatever, the reader can draw his own conclusion.
This notion, used mostly in the social sciences, that 2 SD's or 2.5 SD's (only a 2.5% or a .5% chance of particular results occurring by chance) is the "magical" threshold for statistical significance, is absurd.
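To report a probability instead of a pass/fail threshold, the "sigma" can be converted directly. A minimal sketch, assuming a normal distribution and a one-tailed test (which is what the 2.5% and .5% figures above correspond to); the function name is just illustrative:

```python
import math


def one_tailed_p(z):
    """Chance of a result at least z SDs from the mean, in one direction,
    under a normal null hypothesis."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for z in (0.3, 1.0, 2.0, 2.5):
    print(f"{z} SD: {one_tailed_p(z):.1%}")
```

A 0.3 SD miss comes out at roughly 38%, and nothing special happens to the number on either side of 2 SD; it just keeps shrinking.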
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 1:11 a.m.,
January 7, 2004
(#69) -
MGL
Given the number of at bats, he essentially performed as expected, yet lost his starting job because too much emphasis was given to two weeks' worth of stats (when Spivey was injured and Overbay and Kata were battling for one roster spot).
Now that's a better way to put it....
A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)
Posted 3:20 p.m.,
January 6, 2004
(#4) -
MGL
Interesting premise! Once again, no one who doesn't have a degree in statistics is going to be able to follow the methodology, which, as is often the case with an interesting topic, is a shame.
There is actually a much easier to follow method of achieving the same results using Bayesian probability. I'm not sure why the author went the more complicated route. Speaking of, does the author have a name?
Sample size and the attenuation between regular season record and WS results is not going to allow you to make any reliable inferences about why the best teams did or didn't win the number of WS they were expected to. At the very least you would have to adjust for teams making or not making the playoffs. Even then, I think you can safely assume that the best team in the reg season is not always the favorite in the post-season because of the increased value of a team's top 3 starters, as well as how the home teams are figured in the post-season...
A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)
Posted 9:14 p.m.,
January 6, 2004
(#8) -
MGL
MGL,
I'm not aware of an easier method to find the probability that a given team is the best team. Certainly the rankings of the teams could have been obtained with less effort. Is there a reference you could point me to?
Honestly, I may have been speaking out of my posterior. I only skimmed your study the first time. If I had the distribution of true talent in the league, of course the rest (calculating the P that any given team is the best team, given their sample w/l record for the year) is trivial.
I don't know how to calculate that (the true talent distribution of baseball teams in an average season, e.g., 15% of all teams are .500 teams, 10% are .510 or .490, etc.). I also don't know if this distribution can be defined by a SD (we know that the mean is .500) only (e.g., that it is normal). I also don't know whether you are assuming that this distribution changes from year to year. If so, then you must be using the actual distribution of sample w/l records to determine the "true" talent distribution of teams in that year.
If not (if you are assuming that this distribution is always the same, at least for your test sample of 13 years), then what I meant was why not list the talent distribution or explain the properties of the distribution (again, e.g., it is normal with a SD of .05), and then go through the simple math for one of those years to come up with the P that each team is the best team?
These are semi-rhetorical questions. You don't need to answer them. They probably make little sense anyway. As I said, very interesting premise. If you increase your sample size, and control for "making the playoffs," could we not get some idea as to whether there are indeed other significant factors that contribute to winning in the post-season that don't during the regular season, or vice versa?
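For what it's worth, here is a rough Python sketch of the kind of calculation described above, NOT the method of the study under discussion. It assumes a normal true-talent distribution with mean .500 and SD .05 (the "e.g." figure above), approximates sampling error in w% as sqrt(.25/games), treats teams as independent, and uses simulation rather than closed-form math. The function name, the win percentages, and the number of draws are all made up for illustration.

```python
import random


def best_team_probabilities(win_pcts, games=162, prior_sd=0.05,
                            draws=20000, seed=0):
    """Estimate P(each team has the highest true talent) given sample w%."""
    prior_var = prior_sd ** 2
    sample_var = 0.25 / games  # binomial variance of w% near .500

    # Normal prior + normal sampling error: regress each record toward .500.
    shrink = prior_var / (prior_var + sample_var)
    post_sd = (prior_var * sample_var / (prior_var + sample_var)) ** 0.5
    post_means = [0.5 + shrink * (w - 0.5) for w in win_pcts]

    # Draw a plausible true talent for every team and see who comes out on top.
    rng = random.Random(seed)
    counts = [0] * len(win_pcts)
    for _ in range(draws):
        talents = [rng.gauss(m, post_sd) for m in post_means]
        counts[talents.index(max(talents))] += 1
    return [c / draws for c in counts]


# Hypothetical five-team league.
probs = best_team_probabilities([0.623, 0.599, 0.580, 0.500, 0.420])
print([round(p, 3) for p in probs])
```

The team with the best record is the most likely best team, but in a close race its probability can be well short of 100%.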
Where have you gone Tom Boswell? (January 7, 2004)
Posted 10:10 p.m.,
January 7, 2004
(#5) -
MGL
Has anyone actually studied this? I've wanted to see this study for a long time. Somehow, I really doubt this, especially since teams tend to win about 25% of all Blown Saves anyways.
Obviously if there is any correlation, it is with blown saves that end in a loss!
I've never seen any research on that. I do vaguely recall some research I did many years ago which suggested that after a "tough loss," a team does worse than expected in the next game. I wouldn't write this one off, although I admit that I don't know the answer.
Where are these Boswell quotes coming from?
Where have you gone Tom Boswell? (January 7, 2004)
Posted 1:00 a.m.,
January 8, 2004
(#9) -
MGL
I just re-read one of those studies (comparing the W/L records 1/5/10 days before and after various come-from-behind wins). I remember reading it some time ago. My impressions are:
1) Despite what the author concludes, I think that the sample sizes are too small to conclude ANYTHING.
2) If anything, the "day after" win %'s are much higher than the "5 and 10 days later" win %'s. The author fails to mention this for some reason. He only compares the "after" w%'s to the before w%'s. For some reason the before %'s are much higher than the after %'s, excluding the "one day after" %'s. Either there are other things going on, or the sample sizes are so small that we are seeing lots of noise. I would guess the latter, as the SD in w% for even 1000 games (the approx. number of 1 and 2 run comebacks in the study) is 1.5%, so that differences between, for example, .500 and .530 are less than 2 SD's, and therefore don't tell us much.
I'd have to see a much better study before I made up my mind on this one. Might be impossible though, as it could be looking for a needle in a haystack (a "real" difference of less than 1% for example, which would be almost impossible to detect without an enormous sample). That's the only "study" I read, however...
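The SD figure used above comes from the usual binomial formula. A minimal sketch, assuming independent games and a true .500 team (the function name is just illustrative):

```python
import math


def win_pct_sd(games, true_pct=0.5):
    """Standard deviation of sample w% over a number of independent games."""
    return math.sqrt(true_pct * (1.0 - true_pct) / games)


sd_1000 = win_pct_sd(1000)            # roughly 0.016
z = (0.530 - 0.500) / sd_1000         # a little under 2 SD's
print(round(sd_1000, 4), round(z, 2))
```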
BABIP and Speed (January 7, 2004)
Posted 8:47 p.m.,
January 7, 2004
(#2) -
MGL
Honestly, I don't understand what the point of this is. Yes, faster players get more infield hits and get more doubles and triples and ROE's (and they bunt more often, which results in a hit around 40% of the time). Is that supposed to be a revelation?
All of these things are reflected in their stats, other than the ROE's. We've always advocated making an adjustment for ROE's in a player's stats. I don't know that it is "stupid" for MLB to treat a ROE as an out. What do you expect them to do? It is the same logic as with earned runs and unearned runs with a pitcher. They assume that a ball that "should" have been an out is an out. That's all. It obviously ends up making a player's official stats not exactly indicative of his true value, but so what? Who says that a player's official stats are supposed to perfectly reflect talent? Yes, from a sabermetric viewpoint an ROE is probably closer to a hit than an out, but one of the points of keeping official stats is simply to reflect exactly what is going on in the field. An ROE is a play that "should have" been an out but for a fielder miscue. The fact that some players cause more ROE's than others is not necessarily relevant to MLB - only to someone trying to estimate true value from official stats. At least they are tabulated separately, so you can do the adjustment if you want. What would you have them do - call it a single? That's fine too, I suppose, although it is certainly more consistent to call it an out than a single. Frankly, I don't think it matters what you call it. If you have a problem with ROE's you have to have a serious problem with SF's or with including IBB's in a batter's BB totals!
As far as the "article" goes, of course LHB's are going to have more infield singles than RHB's, after you control for speed and everything else. Looking at their BABIP or regular BA is not going to tell you anything about a batter's speed unless you control for everything else. The pool of LHB's is worse overall than the pool of RHB's just as the pool of LHP's is worse than the pool of RHP's.
As far as the ROE's, which I assume is your "beef," before we treat them as a single, we would need to see how much is luck and how much is skill. If the skill component is not similar to that of a single (if the period to period r's are not similar), then I don't think you want to treat it as a single. We know that there is SOME skill (i.e. speed) involved in the ROE's - how much is the important question before we get all worked up over the fact that they are normally treated as outs.
In Super-lwts, I am going to either include ROE's, or at least give some extra weight to a player's speed, somewhere. The reason for doing the latter rather than including ROE's per se, is that let's say that there were two RHB's who were equally slow and had equal power (and about the same GB/FB ratio), but one had lots more ROE's than the other. You would have to assume that their difference in ROE rate was pure luck, and any attempt to assign "value" to those different ROE rates would be surplusage. Accordingly, you might be better off just making an adjustment for speed, GB/FB ratio, and handedness. In fact, maybe just adding a little extra value to a player's IF hits might do the trick, as a player's infield hit rate (per PA) might be the best predictor of his ROE rate, as they come from essentially the same "skill" (speed, BIP rate, distance from home to first, and GB/FB ratio, and depth of IF)...
BABIP and Speed (January 7, 2004)
Posted 1:13 a.m.,
January 8, 2004
(#4) -
MGL
Now, when you regress a hitter's hit/BIP, you can also use his speed, so that if you have a fast guy with a .305 and a slow guy with a .305, you regress the fast player upwards, and the slow player downwards.
Yes, no doubt. I explain in my regression article how you should use "other things" in order to figure out what constant to regress certain stats towards. This is the same idea.
Record it separately, just like you record a single and a walk. I see no reason to lump in a RBOE with the other AB outs.
Sure, if only we lived in a perfect world, we'd all shop at Walgreens...
At least the SF and IBB is recorded separately so that I can have the option to add in SF to the other ABouts, and remove the IBB from the total BB. Not true with RBOE.
I actually forgot that they WEREN'T listed in a player's official stats!
I think it was a good effort. I think encouragement, where effort is put in, is warranted, don't you?
I didn't really mean to disparage the article. It brings up a point that is definitely worth noting and working on, especially when you consider that they (ROE's) aren't normally available, which changes my whole premise. I guess at the very least, they should be noted separately in a player's official stat line, as you said, like the SF's and IBB's, even if they are considered an out (you can't argue too much with the logic of calling them an out - after all no one on the MLB rules committee thinks about the impact of speed on ROE's - they just figure it should have been caught, therefore it can be treated as an out, period). Now, whoever came up with the sacrifice fly thing, but not the "sacrifice ground ball"...
BABIP and Speed (January 7, 2004)
Posted 6:32 p.m.,
January 8, 2004
(#11) -
MGL
The study linked by Ricj above is interesting. In summary, it appears as if speed has almost nothing to do with ROE's! The two factors appear to be propensity to hit GB's (of course) and handedness of batters (a proxy for which side of the infield the GB's are hit to). RHB's have more ROE's per GB! This is further evidence that "speed to first" is not a factor. If we adjust for handedness and count ROE's as per GB, we appear to be left with almost nothing! That is important, as it means that if we want to include a player's ROE's in a value metric, all we need to (and should) do is increase the value of a GB out to SS and 3B! If we don't have PBP data (where each ground ball was actually hit), then we should simply take a player's total GB's, and interpolate how many to SS and 3B, based on his handedness.
If this is true, it is profound, as I think that the heretofore sabermetric wisdom has been that speed, as well as GB propensity of course, is a major factor in ROE rate. Again, this study, at least, seems to completely contradict that - i.e., that speed plays little or no role in ROE rate, once we control for GB rate (since speedy players tend to be GB hitters)...
BABIP and Speed (January 7, 2004)
Posted 8:29 p.m.,
January 8, 2004
(#13) -
MGL
David, that's the whole point! Once you control for GB rate (and to whom the GB's are hit), according to this study at least, speed has NO correlation to ROE's!
BABIP and Speed (January 7, 2004)
Posted 1:50 a.m.,
January 9, 2004
(#16) -
MGL
The average ROE park factor on turf is .92 and 1.02 on grass (IIRC). And yes, Chris, since the GB ROE sometimes results in the batter on second, it IS worth more than a single (around .49 to .48), albeit only slightly.
I for one am man enough to admit that all along I thought it was a "skill" having to do with speed. Despite what Tango says, it is a huge revelation to know if that is NOT true. As I said, if that is NOT true, then all you have to do is slightly adjust a player's value for handedness AND slightly adjust the value of the GB "out" (including ROE's of course).
OTOH, since the LHB's singles have more value and their GB outs have more "moving runner's over" value, the handedness thing might be a wash (or even STILL favor LHB's even after the ROE adjustment), so you might not have to even adjust for handedness after all.
As far as the value of the GB out, there is a huge premium when you include the ROE! IIRC, around 1 in 30 GB "outs" are ROE's, which increases the value of the GB out by .0126, which is huge!
BABIP and Speed (January 7, 2004)
Posted 1:52 a.m.,
January 9, 2004
(#17) -
MGL
Woops, that should be increases the value of the GB out by .026...
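One plausible way to arrive at a figure like that, sketched in Python. The .49 ROE value and the "1 in 30" share are from the posts above; the run value of an ordinary out (-.30) is an assumption plugged in for illustration, as are the variable names.

```python
ROE_RUN_VALUE = 0.49    # approximate value of a GB ROE, quoted in the thread
OUT_RUN_VALUE = -0.30   # assumption: round figure for an ordinary out
ROE_SHARE = 1 / 30      # "around 1 in 30 GB outs are ROE's"

# Each ROE swaps an out for something worth about as much as a single,
# so the average GB "out" gains the difference times the ROE share.
gb_out_premium = ROE_SHARE * (ROE_RUN_VALUE - OUT_RUN_VALUE)
print(round(gb_out_premium, 3))
```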
BABIP and Speed (January 7, 2004)
Posted 3:25 p.m.,
January 9, 2004
(#22) -
MGL
Here are the values of various FB and GB events for the AL 2001-2003:
None of these includes bunts.
All FB (including HR's)= -.030
All GB (hits, outs, ROE, etc.) = -.100
Fly ball (Pop or Fly, but not line drive) OUT, including errors and DP's = -.282
Ground OUT, also including errors and GDP's = -.283
Fly ball, not including ROE's (at least one out made) = -.285
Ground out, also not including ROE's and at least one out made = -.314
As you can see, since very few ROE's are on fly balls or pop flies, they only increase the value of the FB out by .003 runs.
However, the ROE on a ground ball increases the value of the GB out (including ROE's) by .031 runs, which is a lot (10%)!
While we are on the subject of lwt values:
K out (including dropped 3rd strikes) = -.300
non-K out (including ROE's, FC where no out was made, etc.) = -.284
non-K out (not including ROE's and at least one out made) = -.301
So the K out and the non-K out are worth almost EXACTLY the same amount if ROE's are NOT included in the non-K outs. Once you include the errors, then a non-K out is "better." And, it doesn't matter whether those non-K outs are fly outs or ground outs...
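The ROE premiums mentioned in this post are just the differences between the "with ROE" and "without ROE" values. A small Python check using the AL 2001-2003 numbers listed above (variable names are illustrative):

```python
# Linear-weights run values quoted in the post.
fb_out_with_roe = -0.282
fb_out_no_roe = -0.285
gb_out_with_roe = -0.283
gb_out_no_roe = -0.314

fb_roe_gain = fb_out_with_roe - fb_out_no_roe     # about .003 runs
gb_roe_gain = gb_out_with_roe - gb_out_no_roe     # about .031 runs
gb_gain_pct = gb_roe_gain / abs(gb_out_no_roe)    # about 10% of a ground out

print(round(fb_roe_gain, 3), round(gb_roe_gain, 3), f"{gb_gain_pct:.0%}")
```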
BABIP and Speed (January 7, 2004)
Posted 4:01 p.m.,
January 9, 2004
(#24) -
MGL
Let me correct the above values. Also, I am now treating a FC where no out was actually made as an out. So there are only two categories of non-K outs now: with ROE's and without ROE's. The corrections are in bold.
RHB/LHB
All FB (including HR's)= -.040/-.018
All GB (hits, outs, ROE, etc.) = -.102/-.097
Fly ball (Pop or Fly, but not line drive) OUT, including errors and DP's = -.281/-.283
Ground OUT, also including errors and GDP's = -.289/-.278
Fly ball, not including ROE's = -.285/-.285
Ground out, also not including ROE's = -.321/-.302
K out (including dropped 3rd strikes) = -.300/-.300
non-K out (including ROE's, FC where no out was made, etc.) = -.286/-.282
BABIP and Speed (January 7, 2004)
Posted 3:18 a.m.,
January 10, 2004
(#27) -
MGL
Depends on what you want to call a "skill." Since we know that ROE's are related to GB rate (per PA) and handedness (GB rate to 3rd base and SS), then of course, players will demonstrate a "persistency" (skill, if you want to call it that) if we use ROE per PA and don't control for handedness. I don't have to look at the data to tell you that. The more interesting question is what is the "persistency" or correlation (is there a "skill") of ROE, if we look at it on a per GB basis and adjust for handedness? According to the study referenced above, there is little or none. I think the r or r^2 was like .04 once the GB rate and handedness were adjusted for.
FWIW, for ROE per PA, we get a regression of about 71% for 600 PA.
For LHB's we get 77%. For RHB's, we get 72%.
If we use ROE per BIP, we have 74% for RHB and 81% for LHB.
If we further control for GB/FB ratio, the regressions will be even higher. There is definitely a suggestion that other than handedness and GB rate, there isn't much of a persistency/skill/correlation in ROE's, as we suspected...
BABIP and Speed (January 7, 2004)
Posted 7:48 p.m.,
January 10, 2004
(#29) -
MGL
Blows me away how the "equation" comes so close to the empirical number!
BABIP and Speed (January 7, 2004)
Posted 12:27 a.m.,
January 11, 2004
(#30) -
MGL
I ran some correlations for ROE per PA and ROE per "ground ball to the left side of the IF (GBL)." Here are the results (r's):
Players had a min of 300 PA for each of two consecutive years. I correlated 2000 with 2001, 2001 with 2002, and 2002 with 2003 (Technically all data elements should be independent, but they are not as I "overlapped" years, but it is no big deal. It just means that the effective sample size is smaller than the actual sample size.)
N=579
ROE per PA, r=.265
ROE per GBL, r=.386
With my empirical data, I also got a lower regression with the ROE/GBL than the 74% or so I reported with the ROE per PA. I'm not sure why the correlation is HIGHER with ROE per GBL. I'm puzzled by that.
That definitely implies that there IS some consistency in ROE independent of handedness and GB rate - in fact more than if you don't adjust for handedness and GB rate!
The formula I am using for "r" is:
(N * the sum of all the x*y's - the sum of all the x's * the sum of all the y's) divided by the square root of ((N * the sum of all the x^2's - (the sum of all the x's)^2) * (N * the sum of all the y^2's - (the sum of all the y's)^2))
If that is readable. Is that the "correlation coefficient?" Is r-squared (the percent of variance in the y's explained by the x's) just this number squared?
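For reference, here is a minimal sketch of the standard Pearson correlation coefficient computed from raw sums. Note that the numerator subtracts the product of the two sums, and the second factor under the square root uses the y's. The function name and the sample rates are made up for illustration and are not the data used in the post. For a simple two-variable linear fit, r-squared is indeed just this number squared.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return numerator / denominator


# Made-up year 1 and year 2 rates for five players, for illustration only.
year1 = [0.010, 0.014, 0.008, 0.020, 0.012]
year2 = [0.012, 0.011, 0.009, 0.016, 0.013]
r = pearson_r(year1, year2)
r_squared = r ** 2  # share of variance in year 2 explained by year 1
```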
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 6:12 p.m.,
January 8, 2004
(#3) -
MGL
You mean you reduced my article to 3 lines? Where did you get these? Are you figuring all the rates as per PA, or are you assuming the "traditional" denominators (SO=SO/(PA-BB), etc.)? If you are using per PA, do you think it makes a big difference that these are not the best denominators to use? Do these include the possible "inter-dependencies" you mention in your first response?
I assume these are for batters. Do you have similar numbers for pitchers?
Your numbers are very close to mine, except for triples. His lower values may reflect the "inter-dependency." Tango, why do you think our triples are so different? I was surprised how high mine was, as triples seem to be a good reflection of speed. Also remember I use park adjusted stats. If you did not, the persistency of triples rate may reflect the park more than the batter. Also do you have values for SB/CS (or attempts per 1B+BB and success rate, which is what I would use)?
Here is the comparison:
Tango: 26% 33% 65% 49% 18% 14% 30% 73% 9%
MGL: N/A N/A 70% 60% 60% 15% N/A N/A 15%
Also, you might want to explain how to use the x/(x+PA). It may be a little confusing for those who are not familiar with regression and the "role" the # of PA's play. Also I have a little problem (or at least a caveat) with the quick and dirty formula: x/(x+PA). I assume that technically that is not the correct true "relationship" (curve) between the regression coefficient and PA.
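As a rough illustration of how the x/(x+PA) shortcut is typically applied, here is a minimal sketch. It assumes x is a constant measured in PA (the number of PA at which a rate is regressed halfway to the league mean) and that x/(x+PA) is the fraction of the way to regress toward the mean; the function names and sample rates are hypothetical. The only figure taken from this page is the roughly 71% regression at 600 PA quoted for ROE per PA in the BABIP thread.

```python
def regression_fraction(x, pa):
    """Fraction of the way to regress toward the mean: x / (x + PA)."""
    if x < 0 or pa < 0 or x + pa == 0:
        raise ValueError("x and pa must be non-negative and not both zero")
    return x / (x + pa)


def regressed_rate(observed_rate, league_rate, pa, x):
    """Blend the observed rate with the league rate using x / (x + PA)."""
    f = regression_fraction(x, pa)
    return f * league_rate + (1 - f) * observed_rate


def constant_from_regression(fraction, pa):
    """Back out x from a known regression fraction at a known PA."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return fraction * pa / (1 - fraction)


# A 71% regression at 600 PA implies a constant x of roughly 1469 PA.
x_roe = constant_from_regression(0.71, 600)
# The same constant then gives the regression fraction at any other PA.
fraction_at_300 = regression_fraction(x_roe, 300)
```

Under this form the regression fraction falls smoothly from 100% at 0 PA toward 0% as PA grows, which is the curve the caveat above is questioning.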
I did a little preliminary work on the "inter-dependency" thing. Indeed it appears that some of the components inform the regression of other components (such as $SO and $HR for batters), or at least change the regression constant. However, it also appears as if there is little or no dependency among some of the components, such that you can safely regress each one separately, without worrying about the other. Tango suggests that the best "Q&D" way to handle this potential problem is to just reduce the regressions for all of the components by some amount. That may be OK, but I would like to come up with a better solution. It may require a regression equation for each component, which includes all the other components (as independent variables)...
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 2:29 p.m.,
January 9, 2004
(#10) -
MGL
The denominator thing is VERY important. If you use the "right" denominators (the right ones are the denominators which tend to do two things: one, reduce the interdependence of the components, i.e., make sure that if one goes up, another one doesn't automatically go down (or up also), or something like that, and two, reflect the greatest proportion of "skill"), you will see some of the regressions change quite a bit. For example, for a pitcher, triples per PA are very dependent on the other components, but triples per extra base hit are basically the same for all pitchers of the same handedness. For batters, a triple is basically a "trouble" double hit to RF by a speedy batter. If you use triples per PA for a batter, that will automatically go up as doubles go up, but if you use triples per doubles and triples, this should reflect the true triples speed of the batter.
I agree that Primate Studies this off-season has been fantastic! Props to Tango for all the work he puts in!
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 5:07 p.m.,
January 9, 2004
(#12) -
MGL
FJM, again, the big problem is with the denominators of the rates. You cannot use per PA for both batters and pitchers and expect to get reasonable "responsibility" percentages, especially with the singles. The reason you get such a discrepancy with DIPS is that DIPS specifically refers to hits/BIP and NOT hits/PA. In fact, that's the whole point of DIPS. If you look at hits/PA for pitchers, it will appear as if pitchers have huge control over that, but that is simply because of their BB and K totals, which change the hits/PA...
MGL takes on the Neyer challenge (January 13, 2004)
Posted 12:30 a.m.,
January 14, 2004
(#3) -
MGL
Again, and I know this is a hard concept for some people to swallow, when you adjust team records for "strength of opponent," as Dackle (and others) did, you CANNOT use the other team's actual records! You must use a regressed version of their records. In fact, you must take each team that a team played, one by one, and regress each of their records separately, and THEN average them all to get a team's strength of schedule.
I realize that everyone wants to and has done it the "old" way (it seems intuitively correct), but I can't emphasize enough that you cannot assume anything from the actual w/l records of a team's opponents. As in the explanation/example in my article, if all teams in baseball had the same talent (complete parity - Selig bite your tongue), you would be making a big mistake adjusting any team's record according to the actual records of their opponents! (Why should the Mets care about the records of teams they played if those records didn't mean anything, i.e., they were just statistical fluctuations?)
What if there were nearly parity in the league (i.e., all teams were between .490 and .510 in talent)? Then we would still be making a mistake, albeit not as large, in adjusting teams' records according to the actual w/l records of their opponents, which might be .420, .610, or whatever. In order to avoid those mistakes, we have to first figure out the actual parity in the league and then apply that parity to each team's w/l record in order to do the adjustments. The way to do that is to simply regress each team's w/l record to estimate its true w/l record and then use these records for the opponent adjustments. Tango can easily tell us the correct regression percentage for a team's w/l record in a 162 game schedule.
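The procedure described above (regress each opponent's record separately, then average) might be sketched as follows. This is only an illustration: the regression constant k, the function names, and the schedule are all made up, and the correct regression for team records is not given in this thread.

```python
def regress_win_pct(wins, losses, k, league_mean=0.5):
    """Shrink an observed win percentage toward the league mean.

    k is a hypothetical regression constant in games: a team that has
    played k games is regressed halfway to the mean.
    """
    games = wins + losses
    if games <= 0:
        raise ValueError("team must have played at least one game")
    observed = wins / games
    weight = games / (games + k)  # share of the observed record we keep
    return league_mean + weight * (observed - league_mean)


def strength_of_schedule(opponents, k):
    """Average regressed opponent strength, weighted by games against each.

    opponents: list of (wins, losses, games_against) tuples.
    """
    total_games = sum(g for _, _, g in opponents)
    if total_games <= 0:
        raise ValueError("schedule must contain at least one game")
    weighted = sum(regress_win_pct(w, l, k) * g for w, l, g in opponents)
    return weighted / total_games


# Made-up early-season schedule: each opponent's record and games against it.
schedule = [(8, 2, 3), (3, 7, 4), (5, 5, 3)]
raw_sos = strength_of_schedule(schedule, k=0)         # k=0 keeps actual records
regressed_sos = strength_of_schedule(schedule, k=70)  # k=70 is illustrative only
```

With only ten games per opponent, the regressed strength of schedule sits much closer to .500 than the raw one, which is the point of regressing first.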
I suppose in some philosophical sense (D. Smythe, you can stop reading now lest I bore you!), since the resultant adjusted w/l records of each team after doing the "strength of schedule (SOS)" adjustments are just samples of their true talent anyway (albeit "better" samples after the adjustment is done), you might as well do the SOS adjustments the old-fashioned way and then just regress the whole damn thing even more afterward if you want to know a team's real talent, but that's a different and complex story altogether....
MGL takes on the Neyer challenge (January 13, 2004)
Posted 3:16 a.m.,
January 14, 2004
(#5) -
MGL
Michael (Humphries?), yes that is the exact Bayesian approach that I have also attempted to describe (it ain't easy) many times here and on Fanhome. That is the rigorous and precise way to estimate a true value from a sample value if you know or can estimate the distribution of true talent in the population. I'm glad that someone else recognizes this method, as I was beginning to think I was nuts! Anyway, if you assume a somewhat normal (even with a skew) distribution of true talent (either among teams or players), you can do a shortcut of course (at least that is what Tango and others have said and done). That shortcut is to look at the observed variance in sample talent (e.g., variance of w/l records of all teams in a given year) and compare to what is expected (in variance) if everyone were of the same talent, and the difference in variance, I think, is attributed to the true talent distribution, or something like that! I'm not sure how if at all the skew of the true talent distribution affects the validity of this "shortcut." I think that Tango would say that it is not a shortcut, but the real thing. But I think that is only the case if the true talent distribution were exactly normal (no skew), but I'm not sure.
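A minimal sketch of that variance shortcut, under stated assumptions: luck and talent are independent (so their variances add), every team plays the same number of games, and the luck variance is approximated by the binomial variance of a .500 team, 0.25 / games. The function name and the inputs are hypothetical.

```python
import statistics


def talent_variance_shortcut(win_pcts, games):
    """Split observed variance in win% into luck and true-talent parts.

    Returns (observed_var, luck_var, talent_var, regression_fraction), where
    regression_fraction is the share of observed variance that is luck,
    i.e. how far to regress an observed record toward .500.
    """
    observed_var = statistics.pvariance(win_pcts)
    if observed_var == 0:
        raise ValueError("observed records show no spread at all")
    luck_var = 0.25 / games  # binomial variance of win% for a .500 team
    talent_var = max(observed_var - luck_var, 0.0)  # cannot be negative
    regression_fraction = min(luck_var / observed_var, 1.0)
    return observed_var, luck_var, talent_var, regression_fraction


# Toy two-team league over 100 games, chosen only to keep the arithmetic easy.
obs, luck, talent, regress = talent_variance_shortcut([0.4, 0.6], games=100)
```

If the observed spread is no larger than luck alone would produce, the estimated talent variance is zero and every record is regressed all the way to .500, which matches the complete-parity case discussed earlier in the thread.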
As you said, or implied, since the true talent distribution is smooth and continuous, regardless of what the curve looks like (there is a finite chance that a team can have a true w/l record of anything), I think that the rigorous Bayesian method that you describe would have to use integral math (calculus), but I'm not sure as I am no mathematician, although once upon a time, many years ago, I majored in math in college (I switched to another major when the math started getting weird and the professors even weirder).
Michael, what a great frickin' article (the homepage link)! There is Tango's regression = 1-r formula! I've never seen that anywhere else, even in the many statistics books and websites I have consulted over the years. Also what a great explanation of "regression" in general! Should be must reading for all sabers and aspiring sabers! Thanks for the great link! I bookmarked it...
MGL takes on the Neyer challenge (January 13, 2004)
Posted 1:10 p.m.,
January 14, 2004
(#9) -
MGL
FJM, sure, and that is why I said (somewhere) that you are better off (going to be more accurate in your estimation of that team's true talent) not using a team's actual w/l record at all. As you say, you should use their pythag record! If you do, you need to use the appropriate regression percentage for pythag records and not real records; the former will be smaller than the latter, since a pythag record will correlate with a pythag record (or a real record) in another time frame better than a real record correlates with a real record.
So you can do it either way and you should use the appropriate regressions. If you have a choice and the resources, pythag record is better. Good point!
Of course, you can get more granular than using a team's pythag record, which will also yield more accurate results. Use a team's total park adjusted offensive lwts and their lwts against for defense. Then compute their theoretical w/l record from that. That SHOULD be better than their pythag record. Regress each value first, or regress the final estimated w/l from the lwts.
Even better than that, take each individual player's multi-year value (lwts, OPS, and lwts against or OPS against for pitchers, or whatever), regress each one according to PA (like you are doing an OPS projection for each player), add all the players up, prorating by number of actual PA's in the year in question, and THAT is probably the best estimate yet! As the procedure gets more complicated, gotta be careful about not introducing too much error. It is a tradeoff and balance between rigor and accuracy of the results (getting closer to the team's true talent) and the possibility of tainting the results with all kinds of potential errors in the complex calculations....
MGL takes on the Neyer challenge (January 13, 2004)
Posted 4:17 p.m.,
January 14, 2004
(#12) -
MGL
MGL, how would you calculate the "true value" of a team? Is it done the same way as players? Do you use 5-4-3 weights?
Dackle, you probably didn't see my post #9, when you posted #10. I think I answered that question.
What if the question you want to answer is not "What is the true value of this player/team's opponents" but rather "How much were this player/team's statistics displaced by the distribution of its opponents within a self-contained season?" In that case I'd use the old method.
Let's say that a team (or player), team A, played once against another team, team B, that had an overall w/l of .600 in a 100 game season, and there were 101 teams or something like that. Other than using it to estimate that team's true talent, of what relevance is it to team A what team B's record was against other teams? That is why your question makes no sense (I don't mean that to criticize your question). IOW, team A's record in that one game with team B is only "displaced" by team B's true talent, not by team B's record against other teams!
Like I said before, let's say that all teams had the same talent. Then why would team B's sample record against all other teams have ANY relevance to that one game of team A versus team B? Why would you want to adjust or displace that one game by team B's record versus other teams? You might as well adjust or "displace" it against the record of some team they never even played! The only reason you use team B's (or any team that you played) record against other teams (and against you of course) is to help you to estimate team B's true talent, which is the ONLY thing you are interested in! Like I also said, it's like an MLE. First you have to establish the true talent level of the environment. Then and only then can you do the adjustments or translations. Whatever will help you to establish that true talent, you use. Sometimes it is a team's record against all teams, sometimes it is that AND something else, and sometimes it is not even that!
MGL takes on the Neyer challenge (January 13, 2004)
Posted 2:17 a.m.,
January 15, 2004
(#18) -
MGL
Because you adjust team A and B's strength by the strength of their opponents. You also adjust the strength of team A and B's opponents (the other 100 teams in the league, say teams C through Z) by team A and B's strength. Following the game between team A and team B, the strength of team C through Z's opponents (which includes teams A and B) has to be adjusted slightly.
I know how to do SOC iterations. What I am trying to say (I said in the article that it is hard to understand AND hard to swallow) is that the "strength" of a team is NOT necessarily defined by its record. I think we all understand that. A team's record is a sample of its strength, but it is not its strength per se. If we want to know a team's strength we can take its sample overall record and use that to estimate its strength.
How much a team's overall record can be used as a substitute for (proxy for, estimate of, whatever) its strength depends on the length of the schedule (try doing a SOC or QOC adjustment the "old" way for a 3 game schedule and see what kind of screwy results you get!) and the distribution of talent in the league. As I said, we are trying to adjust each team's record by the true strength of its opponents and NOT by their actual records. These 2 things are sometimes close and sometimes they are not. For very short schedules they will not be close to the same thing. If the true talent distribution of teams were such that all teams had about the same true strength then they would also not be close. Etc. Sure you could just go ahead and adjust everyone by the actual w/l records of their opponents and you wouldn't be that far off in a 162 game season. But you would be making some bad mistakes if you did that for a much shorter season or if baseball had a lot more parity than it does.
If my explanation is still not making sense maybe someone (Tango) can help me out here...
MGL takes on the Neyer challenge (January 13, 2004)
Posted 6:05 p.m.,
January 15, 2004
(#23) -
MGL
Pythagorus shouldn't work as a proxy for strength. A team which has a pythag w% of .750 after five games is not a .750 team.
Sure, pythag just gets you closer to a team's "real" strength so that you don't have to regress as much!
"True talent" is just one way of looking at the question. I'm more interested in how the won-lost records have been displaced by the schedule. If I learn that a .536 team would be a .550 team (using the old method) with a balanced schedule, I'm not assuming that the "true strength" of the .536 team is therefore .550. I'm just recasting the won-lost record of the .536 team, and its leaguemates, in a way that removes the distortion of the schedule.
You're mixing 2 things. One thing is the w/l records of the teams (either before or after the SOS adjustment). No reason we have to worry about their "true strength." People just want to know a team's records. Not too often do you hear someone say, "But have you regressed the Rockies' record to estimate its true record?" But, if you just want to see "how other teams displaced their record," you have to worry about the true w/l record of their opponents, and not the actual w/l record of their opponents! It makes no sense to adjust the w/l record of a team by the w/l record of its opponents! The result doesn't mean anything, unless you KNOW that the actual w/l records of your opponents are a good estimate of their actual strength! You can do it, and it's technically not wrong. It's just not the best way to do it. In practice, in baseball, it is OK to do it that way, because for a 162 game major league baseball schedule, you can be sure that a team's actual w/l record will likely be pretty close to its actual strength. But that is NOT always the case, which is why it is important to understand what's going on.
What about 10 days into the season? Would you be comfortable doing a traditional SOS adjustment for each team? Why not? What about 50 days? At what point do you feel comfortable? Doing the appropriate regression allows you to not have to worry about how many games have been played!
If you are doing "rankings," to some extent, it changes everything. For example, if you are ranking teams based on actual strength, you don't have to do any regressions, assuming all teams have played the same number of games. You can safely and correctly rank all teams based on their actual w/l records. Regressing first won't change anything...
MGL takes on the Neyer challenge (January 13, 2004)
Posted 3:50 a.m.,
January 16, 2004
(#27) -
MGL
Dackle, no you are mixing up the 2 things again (the past and future of the records you are adjusting and the past and future of the records you are using for the adjustments). The question about the 10 game season is not whether you would want to adjust the 9-1 team to account for SOS, but whether you would want to adjust, say a .500 team (5-5), who played that 9-1 team 3 times already? Let's say that your .500 team played that 9-1 team 3 times and for their other 7 games, they played .500 teams. IOW, they haven't played the teams with bad records yet. So their SOS would be 3 games versus .900 teams and 7 games versus .500 teams, or an average opponent of .780. Do you really want to adjust that .500 team by .780 and say that their w/l record, adjusted for SOS is now 8-2, because they played that 9-1 team 3 times already? You may want to, but that information doesn't mean very much. Doing some computation and spitting out the result is silly if the result and the computation don't mean anything. Yes, we know that our 5-5 team played 3 games against a tough opponent (the 9-1 team) so their 5-5 record is not really fair. That means something! But how unfair is it? It depends on how tough that 9-1 team really is! To just go ahead and adjust the 5-5 team using the tough team's 9-1 record is arbitrary and yields a result (my 5-5 team is now 8-2) which very likely has no meaning or truth other than "the result of adjusting my .500 record by .780." Doesn't mean anything. Yeah, we know that the 5-5 record is NOT fair and should be adjusted by something, but NOT the 9-1 record of the team you played 3 times. You might as well say "I make my 5-5 team 6-4 now because they have played a tough schedule so far." That would be closer to the truth. Where is the "magic" in using the 9-1 record to adjust the 5-5 team's record? As I've said before, you might as well adjust a team's w/l record by the actual w/l record of their opponents AND the temperature on the day of each game.
The specific w/l record of a team's opponents doesn't mean ANYTHING other than as a very weak (for only 10 games) approximation of the opponents' actual strength. Why use that exact number (9-1)? Why not 8-2 or 7-3?
If you don't get what I'm saying, we're better off dropping it now. There is no value in arguing an untenable point unless it helps you to understand the tenable one, although in this case it is a little bit of a semantical argument...
MGL takes on the Neyer challenge (January 13, 2004)
Posted 3:46 p.m.,
January 16, 2004
(#31) -
MGL
Amazing! A scintilla of a compliment from Ross! Maybe I'm reading it wrong...
MGL takes on the Neyer challenge (January 13, 2004)
Posted 1:27 a.m.,
January 18, 2004
(#38) -
MGL
"The Royals' 9-1 record was helped 0.8 wins by the schedule," when the 0.8 is calculated using regressed values. Especially when the speaker goes on to argue that this means quality of competition is unimportant, because the adjustment is so small. You should say "The Royals' 6-4 regressed record was helped 0.8 wins by the schedule."
You got it backwards again!
Dackle you just have something which makes little sense stuck in your head and you refuse to give it up. You got some really smart people telling you that what you're trying to do is nice but makes little sense other than the "numbers add up nicely." To each his own...
MGL takes on the Neyer challenge (January 13, 2004)
Posted 4:54 p.m.,
January 18, 2004
(#45) -
MGL
Here is one more way to look at it, using a hypothetical discussion:
So the Royals are 9-1 so far this season, huh? Wow, they must be a great team!
Sure, they seem pretty good so far, but that 9-1 record is a little misleading.
Oh really, why is that?
Well, they played some really crappy teams.
Really, how do you know that?
The combined record of all the teams they played so far is 20-80!
Wow, I guess they did play some crappy teams. So their 9-1 record is kind of misleading, huh?
Yes it is, it should be more like 7-3 or 6-4 or maybe even 5-5. I'm not really sure.
Hey, I've got an idea, let's "adjust" their 9-1 record to account for how bad their opponents were and then we can tell people "Here's what the Royals' record SHOULD be or here's what it WOULD be if they had played average quality teams!"
Great idea, but how should we adjust their 9-1 record?
Well let's adjust it by the collective 20-80 record of their opponents! There are some really great formulas out there, like the log5 and odds ratio methods that can tell us exactly how to adjust a 9-1 record for the "quality of competition"!
Now I'm confused! Why are we adjusting the Royals' 9-1 record?
Because they played crappy teams, on the average! Haven't you been following the discussion?
OK, I get that their opponents are PROBABLY crappy since they had a 20-80 record, but are you sure they are crappy and that 20-80 wasn't just bad luck?
Well, I'm not exactly sure, but it is a good bet that they are crappy since they have a 20-80 record!
Yeah, I guess you are right! So let me get this straight. We are going to adjust the Royals' 9-1 record to account for the fact that their opponents were probably bad?
Right!
Why are we going to use the 20-80 record of their opponents to make that adjustment?
Because that's what their record was!
Yeah, but I thought the whole idea of adjusting KC's 9-1 record was to account for how BAD their opponents were! Why are we using 20-80 to represent how bad they were?
Because that's what their opponents were, 20-80! I already said that!
Yeah, but do we know that they were really that bad?
No, we don't know for sure, but that was their record!
Well, are we adjusting the Royals' 9-1 record by 20-80 because "that's what their opponents' record was," or "because that's how bad we think they were?"
Hmmm... I think both!
But we really want to adjust that 9-1 record by how good their opponents were, right?
Well sure!
So if God came down and told us that their opponents were really average teams, and that their 20-80 record was just bad luck, what would we do then?
Well, then we wouldn't adjust the 9-1 record anymore you idiot, because the 9-1 is the right record! If we told everyone that the 9-1 record should be 7-3 or 5-5 we would be lying to them, because we told everyone that we were going to figure out what the Royals' record would be if they had played average teams, and if we knew they were average, we wouldn't want to adjust the 9-1 record! But God is not coming down is he?
No, but why are we using the 20-80 when you said that the ONLY sensible reason to adjust the 9-1 record is so we can tell everyone what a "fair" record for the Royals is? You know, what their record would be if they played average teams.
Cause that's all we have to go by, since I doubt that God will come down and tell us how bad those teams really are!
Oh, I get it! The only reason we are using the 20-80 record is because that's all we have to be able to guess how bad those opponents really are. If we knew they were really average, we wouldn't adjust the 9-1 at all. And if we knew that they were only a little bad, we would only adjust the 9-1 A LITTLE - not by the 20-80?
Of course. If we KNEW how good their opponents were, why would we want to use the 20-80 record to adjust the 9-1 record? That would be stupid. It wouldn't be good information to use for anything, would it? Like you said, if we KNEW that those opponents were really average teams, and that their 20-80 record was just bad luck, we would be misleading everyone if we adjusted that 9-1 record to 6-4 or 7-3 or 5-5 or whatever it would come out to. What would that adjusted record MEAN if their opponents were really average? Nothing that I could think of!
OK, but since we don't KNOW how bad the Royals' opponents were and how lucky or unlucky that 20-80 record was, we pretty much have to go with the 20-80 record as an estimate of how bad those opponents actually were?
Now you got it!
Wait a minute, I forgot, some really smart people told me that if you want to estimate how good or bad a team is, you don't use their w/l records!
Really? There's a better way to figure out how bad the Royals' opponents were than using their actual w/l record of 20-80?
Yup!
Well, why didn't you say that in the first place? What is it?
It's something to do with "regression" or something like that. And it depends on how many games each team plays.
Well, that makes sense, since a team that is 3-7 is probably not as bad as a team that is 30-70!
Are you sure that this regression method is a better way of guessing how bad the Royals' opponents were than just using their 20-80 record?
Yup, I'm positive!
Boy, I see your point, but I really want to use that 20-80 record to adjust KC's 9-1 record! After all, that WAS the record of their opponents!
Yeah, but that gets back to my original question - why use that 20-80 record just because that is the EXACT RECORD of their opponents, when the whole point of the adjustment is to let people know what the Royals' record SHOULD look like if they played average opponents?
Cause that's how bad their opponents really...Ah! I got you! That's NOT how bad they really are - that 20-80 record! If I want to guess how bad those opponents really are so I can come up with a fair record for the Royals, I have to do that regression thing! If I use the 20-80 record to do the adjusting, even though it "seems" like the right thing to do, I'm really not using a very good number to adjust the Royals' record! There are better numbers I can use to accomplish what we want to accomplish! Probably 21-79 is better since that is closer to how bad their opponents really are. Even 22-78 is probably better! Using 20-80 is not only arbitrary (even though it happens to be their actual record - so what!) it is not a good number to use if we want to present a "fairer" record for the Royals! If we use the 20-80 to adjust that 9-1 record, it may "seem" like the right thing to do, and we can call it something "nice" like "an opponent or schedule adjusted w/l record" but it doesn't mean anything. If it means anything at all, it is an attempt to present the Royals' record as if they played average teams, but it is a poor attempt! Right?
Right!
Dackle, that's the best I can do! The rest is up to you!
MGL takes on the Neyer challenge (January 13, 2004)
Posted 12:50 p.m.,
January 19, 2004
(#54) -
MGL
Dr. Doppler's Cthulhuite Spawn Counterpart,
Yes and yes to both questions. Using 3 years, rather than 4 or more years, is just convenient. Since each prior year is given 20% less weight than the subsequent year, there is diminishing value in using lots of years, plus many players don't even have more than 3 years of major league history of course. QOC adjustments end up being so minor that a perfect "true value" estimate for each player is NOT necessary.
After one or two iterations, nothing changes, again because the adjustments end up being so minor AND because the "true value" estimates are based on at least 3 years of data. When you adjust 3 years of data for QOC, you get almost no changes in sample performance...
MGL takes on the Neyer challenge (January 13, 2004)
Posted 9:03 p.m.,
January 22, 2004
(#56) -
MGL
I'm still here. Again, you're mixing up two different things. I'm not talking at all about the 9-1 team's true strength based on that 10 game sample. I'm simply talking about taking that 9-1 record, for whatever value it has, and "re-doing it" to account for the true strength of their competition. For whatever that is worth. I meant the language in my little dialogue literally. Converting the 9-1 team's record into a "true record" is another story altogether. OK, the two things are related. If I say so-and-so is 9-1 so far, what does that tell us? It tells us that, unless we know a priori that all teams are really the same strength, our 9-1 team is probably better than average. How much better or what their true w% is, we don't know unless we have more information. Could be 90% or it could be 50%, or anything in between. Now, if we find out that the 10 teams that it played had a true or estimated true w% of 40%, that gives us more information that we can use to estimate the 9-1 team's true strength! That's all it does, and that is the proper way to do it. So we convert the team's 9-1 record into what it "would have" done had it played average teams. That is trivial. We use the log5 or odds ratio method to do that. Now we have a QOC adjusted record of, say 8-2. We can leave it at that if we want to, and just say, here is an "opponent neutral" version of my team's record now. It is still a sample record, but it is closer to the truth than the original 9-1 record now that we know that we played bad teams. Or we can go further and estimate the "true w%" of our now 8-2 team. I don't really see the confusion.
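A minimal sketch of that conversion using the odds ratio form of log5, applied to the 9-1 record against opponents with an estimated true w% of 40%. The function names are hypothetical, and the opponent strength passed in is assumed to be an already regressed estimate, not a raw record.

```python
def odds(p):
    """Convert a win probability strictly between 0 and 1 into odds."""
    if not 0 < p < 1:
        raise ValueError("win percentage must be strictly between 0 and 1")
    return p / (1 - p)


def from_odds(o):
    """Convert odds back into a win probability."""
    return o / (1 + o)


def log5(team, opponent):
    """Probability that a team of the given strength beats the opponent."""
    return from_odds(odds(team) / odds(opponent))


def opponent_neutral_win_pct(observed_win_pct, opponent_strength):
    """Win% the team 'would have' posted against average (.500) opposition."""
    return from_odds(odds(observed_win_pct) * odds(opponent_strength))


# 9-1 against opponents whose estimated true w% is .400.
neutral = opponent_neutral_win_pct(0.9, 0.4)  # 6/7, a bit under .860
# Consistency check: a team of that strength is expected to play .900 ball
# against .400 opposition.
expected_vs_weak = log5(neutral, 0.4)
```

The result is still a sample record, just restated as if the schedule had been average; undefeated or winless records have infinite or zero odds and would need separate handling.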
BTW, we are getting so used to regressing everything that we have forgotten that the default rule is that a sample mean is the best estimate of the population mean! No regression! If all we know is a 90% sample result, then the true value is 90%! It is only when we have more information that we may start to regress! And heck, we may even have to regress a 90% sample result upwards, depending upon the information! The extra info in this hypo is that these are some sports teams' records and we know that the mean of the population is by definition .5 and that all sports leagues have some degree of parity! If we didn't know any of that, then we wouldn't regress!
MGL takes on the Neyer challenge (January 13, 2004)
Posted 4:29 p.m.,
January 23, 2004
(#58) -
MGL(e-mail)
jto, why don't you e-mail me directly. You can tell me about the project...
DRA Addendum (Excel) (January 16, 2004)
Posted 2:35 p.m.,
January 17, 2004
(#2) -
MGL
Again, terrific stuff and well-written! People don't realize yet that DRA is by far and away the best metric out there using traditional fielding data (not PBP data). It will some day be the gold standard! I am working on "converting" UZR into a linear weights formula that can be used with traditional fielding stats only. The results of that should be almost exactly the same as DRA (hopefully).
There are a million thoughts that come to mind as I peruse the Excel file. That the best hitter in baseball (by a wide margin of course) and perhaps of all time is also at the top of the defensive list should boggle the mind! And the fact that Buckner is at the top of the defensive first baseman list and yet he is remembered for that one fateful gaffe is sad!
I don't recall you publicly releasing your "linear weights" formula, Mike, but did you publicly release the data that is used to compute each player's DRA, especially for catchers? For example, for a SS, you "need" assists, putouts (do you need putouts, for example), team pitcher data including G/F ratio, handedness, etc.??
DRA Addendum (Excel) (January 16, 2004)
Posted 9:48 p.m.,
January 17, 2004
(#5) -
MGL
Daved,
You can order a "trial" version of Microsoft Office (I think it only lasts for a month, but I'm not sure if you can figure out how to reinstall it) for like $5.00!
MGL's MLEs (January 22, 2004)
Posted 2:56 p.m.,
January 22, 2004
(#1) -
MGL
Those are some damn nice looking charts!
MGL's MLEs (January 22, 2004)
Posted 7:50 p.m.,
January 22, 2004
(#3) -
MGL
I might have minor league fielding data. Which ones are necessary?
SuperLWTS Aging Curve (January 26, 2004)
Posted 6:35 p.m.,
January 26, 2004
(#4) -
MGL
It was a quick and dirty chart! I thought it would be interesting because it was everything (Superlwts) and not just batting. I only used 00-03 (4 years). It is also nice to do these analyses using context-neutral data, although it may not make much difference with a large enough sample size. The context does affect the selective sampling issues though (e.g., a marginally talented player in LA is more likely to get sent down or released or retire than a marginally talented player in Col).
I did not address any selective sampling issues plus my sample sizes are small, especially at the corners. 22 and 38 year-olds numbered around 20 in my sample, up to 110 or so for the 26-29 yo's (25 to 31 is by far the biggest group of players, in terms of numbers). In terms of PA's at the various ages, the dropoffs are smoother, i.e., only the very good (talentwise) young and old players get lots of playing time, while mediocre peak players get lots of playing time.
I also want to do separate curves for each Superlwt component! That might be interesting. The SB/CS lwt curve will probably surprise you. Even though players run a lot more when younger, players who run a lot, on the average, because they are young or fast, or both, basically run themselves into around zero net runs. There will probably be no age curve for SB/CS lwts just like I found that there is virtually no y-t-y correlation for SB/CS lwts, which shocked me, until I realized that all players on the average, whether they run a little or a lot, or whether they are fast or slow, basically run themselves into zero lwts, which is amazing when you think about it! For the few players who do steal a lot and have a high SB %, 2 things happen to make the r for all players almost zero: one, there are very few of those players. Most players who steal a lot steal at a rate of 65-75%, which means they are literally spinning their wheels. And two, even some of the ones that have high SB% one year can have dismal ones the next (e.g., L. Castillo). Basically there is no predictive value to a player's SB/CS lwts, so for example for a player like Beltran who is arguably a very good overall player, I would attach almost no value to his SB/CS lwts for purposes of talent or projections. Sure, SB attempts and success rate separately might have some decent r, especially the former (although the age curve for SB attempts is probably real sharp), but who cares about either one of those, per se. All you care about is the net value, unless ALL YOU KNEW was one or the other, of course!
SuperLWTS Aging Curve (January 26, 2004)
Posted 7:54 p.m.,
January 26, 2004
(#6) -
MGL
Oh BTW, the plateau between 26 and 29 is probably a sample size fluke. I suspect that with Superlwts at least, it should be a smooth decline after age 26. I could be wrong though. In general, when you do graphs like that, especially with limited sample sizes, and particularly since each data point on the x axis could have severe sample size problems, you want to smooth out the curves on the graph, especially since most relationships are indeed smooth (though the shape is not necessarily obvious or evident)...
Also the absolute numbers on the y axis (on the left) mean nothing. I just arbitrarily set the youngest age (22) to zero. It is a scale of lwts per 480 PA's (162 average games) though...
SuperLWTS Aging Curve (January 26, 2004)
Posted 8:12 p.m.,
January 26, 2004
(#7) -
MGL
I'm confused. If as you say "only the very good (talentwise) young and old players get lots of playing time", how can the weights from 33 on all be increasingly negative? Wouldn't such players decline more slowly (or even improve at an advanced age, ala Barry Bonds) than the typical player? How do I interpret the -50 for a 39-year old?
Don't get confused by the fact that only the really good old and young players are playing and getting substantial time in the major leagues. It's not because they age slower or their peak is any different than any other players. There is no evidence that that is the case, although surely everyone has his own unique peak age (although it is next to impossible to figure it out).
OK, I take that back. Yes, it is probably true that players who are old and still good, as a group have either aged at a slower rate or peaked at a later age. If that's the case, so what?
The only point I was trying to make is that marginal players will tend to get lots of playing time around their peak years only because they are not good enough to play full time when they are young or old, but that very good players are good enough to play full time at all points in their careers, and obviously they are great players around their peak age...
SuperLWTS Aging Curve (January 26, 2004)
Posted 8:14 p.m.,
January 26, 2004
(#8) -
MGL
I hit the post button too soon. The curve is based on at least 50 PA's per back to back season and not 100, not that anyone cares. I gave Tango the wrong number.
I agree with Tango that even if you successfully adjust for the selective sampling issues, etc., the aging curve is going to basically look the same. It seems like no matter what anyone does, it always comes out looking the same...
SuperLWTS Aging Curve (January 26, 2004)
Posted 12:26 p.m.,
January 27, 2004
(#11) -
MGL
FJM, the problem is that you are introducing SEVERE selective sampling issues, in fact, the mother of all selective sampling issues in constructing different aging curves based on playing time! The idea of an aging curve is to model "real" aging patterns, not observed ones (although without selective sampling that is one and the same), such that you can use it to project and adjust other players' stats.
I don't really know how to explain it, plus I don't have the time right now! Maybe someone else can chime in and help, or maybe I am wrong...
SuperLWTS Aging Curve (January 26, 2004)
Posted 10:45 p.m.,
January 27, 2004
(#15) -
MGL
Also, as long as I'm nitpicking, the statistical uncertainty in the difference of lwts/680 between two seasons is something like sqrt(1/N1+1/N2). So in terms of weighting the values for different players, the weight should equal 1/(1/N1+1/N2), not min(N1,N2).
Thanks. That's more than a nitpick. I think that the selective sampling is not a major problem, but it could be, could it not? What if only players who had good seasons were ever allowed back for another season? That group would be comprised of great players almost no matter what - no problem there - but also good, mediocre and bad players who only had lucky seasons. So the entire first year of any "couplet" for any player would have tons of lucky PA's in the first year and essentially unbiased PA's in the second year, such that it would look like everyone was losing ability every year, even if there were no real aging patterns (true ability stayed the same at all ages). This is not exactly what happens of course, but it does exist to SOME degree. That's the only problem I see. I agree with Tango that you have to do some sort of regressing for year x in any x/x+1 couplet because of some unlucky players dropping out and having no x+1 years, such that all players who have x+1 years will automatically have been a little lucky as a group in year x.
I really don't see your problem, since you yourself suggested that playing time early (and late) in a career is a reliable indicator of true talent. But OK, let's say that it isn't reliable. Then use something else as a classification criteria, such as OPS+ or LWTS or whatever you like, but limited to the peak years, 26-29. Classify everybody at least one standard deviation above the mean as a good/great player, and everybody at least one s.d. below the mean as a marginal player. Everybody else is average. Now construct 3 aging curves. The curve for the average group should look pretty much like the overall curve, but I'll be very surprised if the other 2 don't look quite different.
I see what you are saying. I'll have to think about it and/or try it and see what happens. I just think that, off the top of my head, any attempt to group players by "talent" is going to be inextricably related to their aging, a priori. But like I said, I'll have to think about it.
Clutch Hitters (January 27, 2004)
Posted 10:49 p.m.,
January 27, 2004
(#5) -
MGL
Tango, or anyone else, if you want to find out if you can identify clutch players, using your definition, why not take everyone's clutch/non-clutch ratio or difference or odds ratio (I'm not sure which would be best), and run a regression correlation from one time period to another? If it is zero or near zero, no clutch "ability" - right? Is your method any different or any better?
Sheehan: Foulkelore (January 29, 2004)
Posted 12:24 a.m.,
January 30, 2004
(#4) -
MGL
I've discussed this with Tango before, but I'm not convinced that there is a strong cause/effect relationship between age and pitching performance, except towards the end of a pitcher's career. Research I did a long time ago suggested that major league experience was the "cause" for the observed aging patterns and not age itself. I'd like for someone to do a quick age "delta thing," but control for ML experience, or vice versa. Remember that the two reasons why batters get better and then worse with age are: 1) first they get bigger and stronger but slower and then all their physical skills decline AND 2) experience (I assume that's why their walk rates go up almost forever). For pitchers, until they get so old that they start to have physical problems, I see no reason why age should have much of an impact on pitching performance....
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 2:19 p.m.,
January 30, 2004
(#12) -
MGL
Tango,
Can you explain the "chaining" in more detail? I didn't understand that before and I still don't understand it. When I do my age curves, as I did for the Superlwts age curves, I simply plot the average delta added to some constant, for each age pair, and that's it. I have no idea what the differences are in the numbers in your pre-chaining and post-chaining tables.
Also, no one wants to address my concern about age versus experience. We've discussed this before and I posted something about it in the other thread. If these patterns are NOT caused primarily by aging, but by ML experience, they will look identical, but an aging pattern will NOT be useful for projecting pitcher performance. I suspect that there is more of an experience influence than age influence, at least until the later ages (mid and late 30's). Certainly we have to check that out before declaring a cause/effect relationship between age and performance!
What I would do is to first do the same exact charts, but simply substitute years of ML service for age. The charts should look similar if not exactly the same. Then you have to do more research to see which is the more prominent cause/effect relationship - age/perform. or ML service/perform! You can't assume that it is age, especially with pitchers!
The best way to attack that, initially at least, is to establish 2 or 3 groups of pitchers. Group I debuted in the ML's at say age 24 or less, group II 25 to 28, and group III, older than 28. Or whatever. Then do the age and service charts for each of those groups.
I suspect you will find that service time is at least as important as age, but I could be wrong. As I said, the distinction is critical if you want to apply these curves to any out of sample pitcher or year (as for projections)...
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 4:16 p.m.,
January 30, 2004
(#15) -
MGL
Tango,
Fantastic stuff! Absolutely fantastic! There can be no one that can do this stuff more quickly than you can!
First of all, I think that much more detailed research needs to be done with the age/experience stuff.
That second portion "carries" information about your pitcher. A pitcher who makes MLB at age 21 probably has much much more true talent than a pitcher who makes MLB at age 28. What this does is that rather than regressing all your pitchers towards the same pop mean, your pitchers who started off at age 21 will regress to a much higher (in terms of true talent) pop mean than the pitcher who makes the bigs at age 28.
What you are actually suggesting is that there IS a significant aging pattern with pitchers, independent of their experience, and that the only reason we see either an experience influence or an "age of debut" influence is because of an "illusion" in the year to year data pairs created by the fact that the earlier a pitcher is called up, the better his true talent, regardless of his performance, such that we want to regress a pitcher's year X stats LESS for a pitcher who debuts early than for a pitcher who debuts late. I agree, but I think there is more to it than that.
I have to think about this some more. It is indeed fascinating, but I'm afraid that we have only scratched the surface. As I said before, there is no particular reason NOT to think that major league experience improves a pitcher's talent regardless of age, or that age, especially in the middle (22-32), in and of itself, should affect a pitcher's talent very much. The opposite is true with hitting, and hitting is much more of a physical thing. Given that reasonable assumption, I would be a little surprised if there is a strong aging/performance (talent) cause/effect relationship, and if there weren't somewhat of an experience/performance (talent) relationship.
Remember that your initial r's (age and experience) are very misleading, which is why you got a higher r when using both variables. One, if performance improves with experience, but declines with age, of course a regression of experience on performance, without controlling for age, is going to yield an r of near zero, as those two things will cancel each other out. Secondly, if what I am saying about age is right, that it may be somewhat irrelevant in the middle age categories, then an r generated from a linear regression is not even valid if the relationship is non-linear! Remember that "r" is only useful for certain linear relationships. Even if there is a nice relationship, if it is not linear, "r" essentially is meaningless.
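The point about "r" and non-linear relationships is easy to demonstrate with a toy example. The numbers below are made up: performance is a perfectly deterministic function of age that peaks at 27 and falls off symmetrically on both sides, which is an assumption for illustration and not anyone's actual aging curve.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


ages = list(range(22, 33))               # ages 22 through 32
perf = [-(a - 27) ** 2 for a in ages]    # made-up curve: peak at 27, symmetric decline

print(pearson_r(ages, perf))
```

Age fully determines performance here, yet the linear r is zero, because the rise before the peak and the fall after it cancel out.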
Plus, the age that a player debuts is almost inextricably related to his years of experience. For a player to have, say 10 years of experience, in order for him to NOT have debuted at an early age, he would have to be old, such that his year to year changes might reflect his age AND his experience rather than his age of debut. IOW, I am suggesting that this age of debut thing might be an illusion - that the real relationship might be age AND experience only, and they might even be independent, although your chart of the 27 yo pitchers with different years of experience tends to refute that. But then again, the problem with that chart is that there are different regression and selective sampling problems with each of those groups. Pitchers who are 27 with 5 or 6 years of experience already will tend to be the much more talented pitchers...
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 5:16 p.m.,
January 30, 2004
(#19) -
MGL
Just want to make sure I understand the data in the table above:
The $ER (don't you mean "delta" and not "$"? We use "$" to mean some kind of "rate" like $H) is the simple ratio of year X-1 ERA to year X ERA? There is no chaining in this data, right? What is the min number of BF for each year? The TPA1 column is the actual total number of PA in year X, and not the "min PA of year X and X-1"? Finally, "age 20" means from 20 to 21?
So your final conclusion is that if we just use an aging curve like we do for batters, we are not going to be costing ourselves much by not paying any attention to years of experience or age of debut?
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 9:43 p.m.,
January 30, 2004
(#22) -
MGL
Thanks Blixa. Tango, BTW, I finally got what you mean by "chaining." After all these years, literally. I can be very dense sometimes! I agree that although it doesn't make that much difference, you might as well do it right. For example, in the other thread AED (I think) pointed out that the correct way to do the weightings when you have matched pairs with different numbers of PA's (for example) is to use 1/(1/PA1+1/PA2), rather than the "lesser of the two PA's." It's almost the same thing, but, as you said, you might as well do it right! And yes, I have done it the way you thought I did it, by just adding or subtracting one from the other. That's how the graphs are constructed. It won't make any difference in the shape of the graphs of course, but I will correct it.
Funny, on BPro, Davenport had an article about constructing MLE's for the Winter Leagues. He weighted the matched pairs by using the "lesser of the two PA's." I chuckled when I read that, because I never heard of anyone using it other than Tango and me, and it turns out that technically it is not correct!
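The two weighting schemes for matched pairs mentioned above can be compared side by side. This is a rough sketch with made-up pairs; the function names and the (PA1, PA2, delta) layout are assumptions for the example, not anyone's actual code.

```python
def pair_weight(pa1, pa2):
    """Weight proportional to the inverse variance of a difference of two rates."""
    return 1.0 / (1.0 / pa1 + 1.0 / pa2)


def weighted_mean_delta(pairs, weight_fn):
    """pairs: list of (pa1, pa2, delta), delta = year x+1 rate minus year x rate."""
    total_weight = sum(weight_fn(pa1, pa2) for pa1, pa2, _ in pairs)
    weighted_sum = sum(weight_fn(pa1, pa2) * delta for pa1, pa2, delta in pairs)
    return weighted_sum / total_weight


# Made-up matched pairs: (PA in year x, PA in year x+1, change in rate)
pairs = [(600, 600, -1.0), (100, 600, 4.0), (300, 250, 0.5)]

print(weighted_mean_delta(pairs, pair_weight))               # 1/(1/PA1 + 1/PA2)
print(weighted_mean_delta(pairs, lambda a, b: min(a, b)))    # "lesser of the two PA's"
```

When both seasons have the same PA, the two weights differ only by a constant factor of two, which washes out of the average. They diverge for lopsided pairs like 100 and 600 PA, where the inverse-variance weight (about 86) is smaller than the minimum (100).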
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 12:52 p.m.,
January 31, 2004
(#26) -
MGL
Chaining assumes that the y-t-y changes occur on a percentage basis for all pitchers (and hitters) regardless of the underlying rate (high or low). Do we know for sure that this is true - that it doesn't change by some constant or some combination of a constant and a percentage (something non-linear)?
For example, if a 22 yo hitter has a true HR rate of 5 per 500 PA and another one has 10 per 500 PA, do we know that the average change in HR rate for these 2 players is not a constant, like +1 per 500?
If you did a study to check this, you can't look at players who had above or below average rates at any age of course, because that would be their observed rates and not their actual rates, and the players with high and low rates will regress the following year, giving you false values for their true changes due to age.
You would have to maybe group players by defensive position and look at the age curves for, say 1st basemen, LF, and RF, versus the age curves for SS, catchers, and 2B man.
For pitchers, I'm not sure how you would get around the huge selective sampling issues I'm talking about. I guess you could group the pitchers by career rates, but again, pitchers with high or low career rates in any category probably had a weird aging pattern, independent of whether all aging patterns are indeed arithmetic or geometric (adding a constant from y-t-y or a percentage, in which case you have to do the chaining)....
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 9:37 p.m.,
January 31, 2004
(#30) -
MGL
I'm not sure what your dependent variable Y is (is it year to year decline or total running decline from the debut year). For a pitcher who debuts at age 23, he experiences a 4.2% decline in the second year. In the third year, is that a 4.5% decline from the first year or the second year? IOW, after the second year are we expecting his ERA to be 8.7% (a little different if we "chain") worse than his debut, or 4.5% worse than his debut?
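The difference between simply adding the yearly declines and "chaining" them can be worked out directly. This sketch assumes the 4.2% and 4.5% figures are each year-over-year changes, which is exactly the open question in the post; the function name is made up.

```python
def chained_change(yearly_changes):
    """Compound a list of year-over-year fractional changes into one total change."""
    total = 1.0
    for change in yearly_changes:
        total *= 1.0 + change
    return total - 1.0


changes = [0.042, 0.045]          # 4.2% then 4.5%, each relative to the prior year
added = sum(changes)              # simple addition
chained = chained_change(changes)

print(round(added, 4), round(chained, 4))
```

Adding gives 8.7%, while chaining gives about 8.9%, so the two are close over a couple of seasons but drift apart as more years are linked together.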
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 11:36 p.m.,
January 31, 2004
(#32) -
MGL
FJM, that wasn't my question....
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 2:41 a.m.,
February 3, 2004
(#7) -
MGL
A lot of good cross-correlation points were brought up. It is by no means clear what the cause-effect is of the results of the study.
It appears to be a very good study, although I must confess I do not understand at all your techniques; therefore it is hard to make any intelligent comments, let alone critique it or offer any suggestions.
I am also not aware of any other studies that have remotely suggested the existence of true clutch and choke hitting.
I think what Charles is saying is that since the y-t-y correlations for clutch hitting are only .04 or so, there is little practical value in the revelation that your study seems to suggest. Is there? What I don't understand is that you say that it is difficult to identify a clutch or choke hitter, especially one with less than a multitude of historical AB's, which seems to jibe with your .04 corr., yet you say something about a hitter like so-and-so being expected to have a clutch OBA 10 points higher than expected. To me that makes no sense. If we can easily identify players who should have a clutch OBA 10 points higher or lower than expected, shouldn't the y-t-y correlation be much higher than .04? Your regression equations also suggest that the y-t-y r's should be much higher as well.
Finally, if it is true that singles-type hitters in general perform better in the clutch than do power hitters, our regular value metrics, like lwts, offensive win shares, VORP, etc., should be adjusted accordingly, should they not?
Basically I am thoroughly impressed by your methods and basic conclusion, but I am lost as to the significance, the magnitude, and the predictability consequences of the results....
The genius of Paul DePodesta (February 4, 2004)
Posted 8:21 p.m.,
February 5, 2004
(#15) -
MGL
Great article! He really writes well too. I'm listening to the video lecture right now. This guy looks about the same age as my son!
One of the funny things is that it appears as if DePodesta did and may still be trying to reinvent the wheel in a lot of the things he does. For example, he talked about creating the run expectancy matrix as if it were a great 21st century innovation when in fact, it has been constructed at least since Hidden Game...
The genius of Paul DePodesta (February 4, 2004)
Posted 10:02 p.m.,
February 5, 2004
(#16) -
MGL
In the video lecture, Depo again (Beane did in Moneyball as well) alludes to the fact that the A's use a player "rating" system or metric similar to what we talked about in the above thread on digitizing the events in a game. For example, a line drive is scored as .8 of a hit (or .5 runs or whatever) and no attention is paid to the actual outcome (hit or out). Same thing for every other type of batted ball. Ideally that is a great "system" and ultimately the optimal system. In order for it to work, though, you have to have almost perfect data. For example, if you assigned .5 runs to every line drive, regardless of the actual outcome, you would end up overvaluing weaker hitters who hit "weaker" line drives and vice versa for stronger hitters, etc. IOW, if you are going to use a system like that, you had better have fine distinctions in your data (like an 80 mph line drive versus a 60 mph line drive). I have no idea what kind of data the A's have or use, but I am a little skeptical of their being able to utilize this kind of a system, at least at the present time until that kind of data becomes available...
The genius of Paul DePodesta (February 4, 2004)
Posted 12:58 a.m.,
February 6, 2004
(#19) -
MGL
It didn't really bother me about not "citing" other sources. Actually I guess maybe it did, as I couldn't imagine doing that kind of lecture and not explaining how I heard or read about these things and then decided to duplicate and refine them myself. The way he explained it comes across as kind of silly to someone knowledgeable about sabermetrics.
Danny, I can't imagine that as he got into sabermetrics, he didn't read about RE charts and the like.
Anyway, he seems like a real smart guy. I've upgraded the level of respect I have for the A's organization...
The genius of Paul DePodesta (February 4, 2004)
Posted 9:30 p.m.,
February 6, 2004
(#21) -
MGL
I wholeheartedly believe in a combination of scouting and objective analysis, however, to some extent objective analysis ALWAYS trumps scouting.
As well, the usefulness of scouting is really only limited to young/inexperienced players and old or injured (presently or in the past) players. Basically scouting fills in the holes/gaps/questions/uncertainties of the objective analysis. When I evaluate a player (almost always in the context of a projection), I like to build a "story" on the player, which ideally includes scouting/injury reports...
The genius of Paul DePodesta (February 4, 2004)
Posted 3:24 p.m.,
February 10, 2004
(#28) -
MGL
But the scout watches game tape of the hitter last year and this, and picks up something he's doing differently, whether it's laying off the hard slider out of the strike zone, driving pitches the other way rather than trying to pull everything, opening up his stance and hitting line drives rather than weak popups and shallow flyouts. Or maybe he's seeing that there's nothing really different, and that he just had a couple of extra good weeks where those little loopers were dropping in.
Mike, the big (huge) flaw in that kind of thinking is that a lot of what the scouts think is a true change in ability or a reflection of true ability is luck itself! How do you think players with true .250 BA's hit .350 over 10 or 12 games or .270 over a season? It's not just bloop hits falling in! It is that they lay off that slider, drive pitches the other way, etc. Those things fluctuate as well - not just whether the bloop hits fall, the liners get caught, or the long flies just make it over the fence or not. There are all kinds of levels and manifestations of luck. What makes you think that a scout or anyone for that matter can watch a player and figure out to any degree of certainty what is random fluctuation and what is true ability?
When I say that objective analysis "trumps" scouting what I mean is that while objective analysis has its limitations (even in a perfect form), it makes no mistakes! Scouting on the other hand, definitely has its limitations too, even in its perfect form, but in addition, and not insignificantly, it makes lots of critical errors, due to the ignorance and superstitions of the scouts, and the "illusions" naturally created by the eyes and the brain...
The genius of Paul DePodesta (February 4, 2004)
Posted 10:39 p.m.,
February 10, 2004
(#31) -
MGL
Mike, you don't understand what I mean by a perfect objective analysis making "no mistakes" yet having limitations?
The genius of Paul DePodesta (February 4, 2004)
Posted 10:43 p.m.,
February 10, 2004
(#32) -
MGL
If I tell you that MLB player A had a .370 BA in 100 AB's, you can tell me exactly what the best estimate of his true BA is. That is what I mean by a "perfect" objective analysis (you can't do any better than that with no more information) and there are no "mistakes" in that estimate (again, it is a perfect estimate given the information we know).
Can I not make an obtuse comment without being accused of being "dead wrong?"
Batter's Box - ShockDome (February 5, 2004)
Posted 7:46 p.m.,
February 5, 2004
(#3) -
MGL
Yes, an interesting study that definitely requires more research. I agree that it is critical to use weighted 3 or 4 year HR rates for the pre-SD HR rates and not career HR rates (it wasn't obvious whether he used career HR rates. I don't think he specified). The whole effect, or a good portion of it, might be simply the effect of aging (Tango has HR rate peaking at age 21). One could look at a control group of pitchers of about the same age as the ones in the study.
I don't think it is critical to use more than one year after going to the Jays, although it would increase the sample size for the post-move HR rate. It is also not critical to park adjust the pre-HR rates, although it would also help given the small samples. Basically whenever you do a study with large samples, adjusting for context, or using more than one year of data is not that important. With smaller samples, it is more important.
I also don't think that you want to use anything but a multi-year (averaged over the history of the Skydome) and regressed HR factor for Skydome. I definitely don't think you want to use the actual HR factor for the year(s) in question. That is almost never the correct thing to do, otherwise you end up essentially "regressing" all the sample HR rates, since the park factors for any given year are generated from the sample H/R splits of the home players themselves (or at least "half" of the park factor). IOW, if all the Toronto pitchers in one year had home HR rates 1.5 times their road HR rates, the park factor would be 1.50 (if you just used the home players to generate the PF's) and you would just end up dividing their home HR rates by 1.5, so no matter what the sample HR park factor was for that year, all TOR pitchers would have a park adjusted home HR rate exactly equal to their road rate, which makes no sense.
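The circularity described above is easy to see with numbers. Everything below is made up for illustration: the HR and batters-faced totals, and the 1.10 "regressed multi-year" factor, are assumptions and not real Skydome figures.

```python
# Made-up single-season totals for one team's pitchers
home_hr, home_bf = 90, 3000
road_hr, road_bf = 60, 3000

home_rate = home_hr / home_bf
road_rate = road_hr / road_bf

# Park factor built from the very same sample it will be used to adjust
same_sample_pf = home_rate / road_rate
adjusted_same_sample = home_rate / same_sample_pf

# Hypothetical multi-year, regressed park factor from outside the sample
regressed_pf = 1.10
adjusted_regressed = home_rate / regressed_pf

print(same_sample_pf, adjusted_same_sample, road_rate)
print(adjusted_regressed)
```

Dividing by the same-sample factor just hands back the road rate, so the home data contributes nothing. With a factor estimated from other seasons, the adjusted home rate keeps whatever the sample home/road split has to say.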
As I said, interesting premise and the results give one pause for thought and an impetus for more work. Good job!
Winter Leagues, Redux (February 5, 2004)
Posted 7:05 p.m.,
February 5, 2004
(#1) -
MGL
I have to agree with Tango 100%. This is a horrible and misleading study I think. A classic case of how regression and selective sampling can give you completely erroneous and misleading results.
Selective sampling in baseball, as the name implies, occurs when you sample a group of players who were not randomly or at least semi-randomly chosen. You get in real trouble when the group you select non-randomly is biased in terms of being very lucky or unlucky, which can happen all the time in these types of studies.
Clearly all players who improved from 2003 to the Winter League are made up of players who, as a group, were either unlucky in 2003 or lucky in the Winter League or a combination. Vice versa for players whose performance declined.
Of course, the Winter League stats give you more information to make a projection for 2004, but if you want to figure out what kind of weight to give it or if you should have a different weighting system for banner Winter League seasons (which is possible, as I found out in my banner year BB study), you HAVE TO adjust the data for the effect of the selective sampling and regression.
I cringed when I read this study, because it was obvious what was coming, and I cringed again when Clay only casually mentioned the regression issue, as if it was a potential and minor problem. It is a huge problem which invalidates the results and conclusions of the entire study! Unless I'm missing something, I am shocked that someone as good as Clay would overlook (or at least severely minimize the importance of) such an obvious thing...
How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)
Posted 10:31 p.m.,
February 10, 2004
(#5) -
MGL
Sorry about the A-Rod mistake. He is actually #11. Beltran is #9 and Deshields and Ellis are #10 at 2.7.
Does James have baserunning win shares? What is the range? Of course, his would be non-regressed, so the range would be wider. The SD for my baserunning lwts per 140 games is 2.8, so 95% of the full timers are between +-5.6. So for 500 games (around 4-years), the SD is 5.3 (95% between +-10.6).
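The 500-game figure above follows from scaling the 140-game SD by the square root of the ratio of games. A small sketch of that arithmetic, assuming games are independent so that the SD of a total grows with the square root of the number of games (the function name `scale_sd` is made up for illustration):

```python
import math

def scale_sd(sd, games_from, games_to):
    """Rescale the SD of a total from one number of games to another,
    assuming independent games (variance grows linearly with games)."""
    return sd * math.sqrt(games_to / games_from)

sd_140 = 2.8                      # observed SD per 140 games, from the post
sd_500 = scale_sd(sd_140, 140, 500)

print(round(sd_500, 1))           # SD over about 500 games
print(round(2 * sd_500, 1))       # the +-2 SD (roughly 95%) band
```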
I'm surprised to see Larkin at #2 after the regression due to his limited PT these past four years. He must be, like, really, really good. As a Reds fan, I'm going to assume he was even more superb in his prime.
Remember that the regressed values ARE how "good" a player really is. I know everyone wants to see the "actual values" for everything, but I am starting to hate actual (unregressed) values as they mean "nothing," especially if we don't know how much luck is inherent in the measure. At least here you know that the correlation in 600 games is around .70 such that around 50% ("r squared") of the variance (34 runs per 600 games) is attributable to chance and 50% to the players' baserunning skill.
The new Superlwts (2000-2003) are already in Primer hands waiting to be published. They are all expressed as actual values with no regression. If anyone wants to estimate a player's true total Superlwts from his sample total Superlwts, they can use the following regression formula:
regression=500/(500+PA),
which is around 44% for one year (630 PA or 150 games). Technically, each Superlwts category has to be regressed individually, but the above is good enough for free stuff on a free web site.
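As a quick sketch of how that formula would be applied, assuming a population mean of zero runs and a made-up sample total of +20 (both the sample value and the function names are hypothetical):

```python
def regression_fraction(pa, x=500):
    """Share of the sample value that is regressed toward the mean: x / (x + PA)."""
    return x / (x + pa)

def estimate_true(sample, mean, pa, x=500):
    """Move the sample value toward the population mean by the regression fraction."""
    frac = regression_fraction(pa, x)
    return sample + frac * (mean - sample)

frac_630 = regression_fraction(630)
est = estimate_true(sample=20.0, mean=0.0, pa=630)

print(round(frac_630, 2))   # about 44% regression for one full year
print(round(est, 1))        # hypothetical +20 sample total after regression
```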
The comments in the article are partly tongue in cheek. Don't take them too seriously.
I meant what I said about Delgado though. I have 42 players projected better than he for 2004 in regressed position adjusted Superlwts per 150 games! You will see them when Primer publishes the file. It's not the baserunning that hurts him so much. In fact, his 4-year weighted baserunning and UZR are only -6 per 150. It's mainly his age and position (an average first baseman is +11 in Superlwts, over the last 4 years, whereas an average SS is -9, so he has 20 runs to make up when comparing him to A-Rod or Nomar).
Players like Luis Matos, Milton Bradley, Polanco, Kearns, and Adam Kennedy are rated above Delgado. That is because of defense, other peripherals, like baserunning, age, and defensive position.
Defensive position is critical in comparing one player to another. The average catcher Superlwts is -15, which is why a player like I-Rod is so valuable.
I want to thank Tango and others for "turning me on" to this idea of defensive positional adjustments, which I believe are critical and often overlooked.
There appear to be two ways of neutralizing everyone's Superlwts rate to account for defensive position. One is the way I do it. The other is to try and neutralize each player's UZR and not touch any of the other values, with the assumption being that any player can potentially play any other defensive position so long as you adjust his UZR appropriately. Needless to say, that kind of adjustment is problematic...
How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)
Posted 10:37 p.m.,
February 10, 2004
(#6) -
MGL
Those SD's I quoted above are the "observed" SD's and don't represent the spread of talent. The spread of talent would be the SD of the regressed values, which is only:
1.61 runs per 162 for players with at least 400 games in those 4 years. So yes, it looks like +-3 or 4...
How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)
Posted 1:55 p.m.,
February 11, 2004
(#11) -
MGL
MGL, I'm surprised that you have to use a high number like 500.
Good point Tango.
Since batting lwts alone has a y-t-y "r" of .675 for 550 PA's, which implies an x of around 265, I don't know where I got that 500 from. Must be a mistake. It's probably around 200. I'll check.
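The "implies an x of around 265" step can be checked by inverting the regression formula. If the regression fraction is x/(x+PA) and that fraction equals 1 - r, then x = PA * (1 - r) / r. A short sketch (the function name `implied_x` is hypothetical):

```python
def implied_x(r, pa):
    """Solve 1 - r = x / (x + PA) for x."""
    return pa * (1 - r) / r

x_batting = implied_x(r=0.675, pa=550)
print(round(x_batting))
```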
If any of our resident stat experts are lurking, what happens if you "combine" 6 or 8 metrics that have various "r"'s for a given number of opportunities, and various SD's? What should the combined "r" look like? IOW, Superlwts is essentially a combination of batting, defense, arm or DP defense, baserunning, and GDP as a batter, all per "game." For each one, a game represents various numbers of opportunities, so the "scale" for each "r" is different. For example, one game for batting runs is 4.2 PA's, while one game for baserunning is 1.15 baserunning opportunities. Anyway, the "r"'s vary from the highest of .675 for around 500 PA's (120 games) of batting to the lowest of right around .300 for 120 games of baserunning and GDP defense for infielders. So if we combine all of those values, per game, what kind of "r" would we expect for the total, in, say, 120 games? Obviously because the variance of all these measures is quite different, batting and UZR will "dominate" the combined value...
How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)
Posted 2:02 p.m.,
February 11, 2004
(#12) -
MGL
Good point Mark F.! While each Superlwts category other than batting and UZR is relatively insignificant, at what point do we not write them all off, both in terms of a career, as you say, and in terms of a combination, especially since many are co-related (baserunning, UZR, and GDP for example). A big, slow guy like Delgado will tend to be bad at all the peripherals, which makes quite a dent in his overall value.
Plus people, including myself, do tend to forget or ignore the positional "adjustments." It is human nature to want to just compare A-Rod's hitting with Delgado's to see who is the "better" player. It is almost a footnote or an afterthought that they play vastly different positions and therefore their overall value, given the same hitting, is vastly different. A 20-run Superlwts difference between a SS and a first baseman is a lot of runs to "make up!"
How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)
Posted 7:29 p.m.,
February 11, 2004
(#17) -
MGL
I checked the sample "r" for total Superlwts for 00 regressed on 01 and 02 regressed on 03, players with a min of 500 PA each year. It is around .600 for around 600 average PA's. It is less than that of batting lwts alone. That is .52-.67 at 2 SD's. So I am going to use 400 as my x in the regression formula...
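A sketch of where numbers like these can come from. The x of 400 follows directly from r = .600 at 600 PA. The 2 SD band on "r" is computed here with the Fisher z-transform, which is one standard approach and not necessarily the one used in the post. The sample size of 300 player pairs is purely an assumption, since the post does not give it.

```python
import math

def fisher_interval(r, n_pairs, z=2.0):
    """Approximate interval for a correlation via the Fisher z-transform.
    n_pairs is the number of paired observations (assumed, not from the post)."""
    center = math.atanh(r)
    se = 1 / math.sqrt(n_pairs - 3)
    return math.tanh(center - z * se), math.tanh(center + z * se)

r, pa = 0.600, 600
x = pa * (1 - r) / r                            # regression constant implied by r
low, high = fisher_interval(r, n_pairs=300)     # 300 is a hypothetical sample size

print(round(x), round(low, 2), round(high, 2))
```

Note that the interval is not symmetric around r, which is typical for correlations away from zero.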
Clutch Hits - Tango's 11 points to think about --- to understand why we regress towards the mean (February 12, 2004)
Posted 2:19 p.m.,
February 12, 2004
(#5) -
MGL
9. Every time you establish a player's true talent level you are actually establishing his probable true talent level. Implicit with this comes a variance. To properly present a player's true talent level, you should provide the mean and the variance. However, because of #3, no one should be silly enough to think that the variance is zero. And, because of #2, no one should be silly enough to think that this won't change day-to-day.
The bottom line is your methodology will always "drag down" a Barry Bonds and "drag up" a Rey Ordonez when there is no necessary reason why they shouldn't go in the other direction.
Tango, you are a brave soul! The only thing that is going to keep you from being blasted for purveying such heresy is that this is Primate Studies and not Clutch Hits and there is no link from a popular, mainstream web-site. ;)
Of course, the above quote is completely wrong and virtually everything you said was completely right. The notion that there is a finite chance that Barry Bonds is not really that good and just got lucky for 10 or 20 years (you know what I mean) or that Rey Ordonez is really a pretty good hitter and just got unlucky for 10 years or so, creates so much cognitive dissonance in most people's minds that no matter what anyone says, some people are just going to think that you are anywhere from somewhat wrong to completely nuts.
Of course, in those 2 cases, the key to the "regression" or the Bayesian analysis is what is the mean and the distribution of true talent in the population of big, strong left-handed hitters, etc., or that of small, wiry, slap hitting shortstops from the Dominican Republic (or wherever Ordonez is from)?
It is a very, very tricky thing trying to establish the "population" from whence a player is selected or even if we can, to try and establish the mean and distribution of true talent with that population. For example, even though we can do a pretty good job of figuring out the mean and distribution of true talent in the entire population of MLB players, which is all we need to be able to regress a "random" MLB player's sample stats properly, how do we go about figuring out the true mean and distribution of all "big, strong, smart, left-handed hitters," etc.? That ain't so easy!
Fortunately, as Tango says, if we have lots of PA's for a player, the regressions are not very important anyway. As well, for players with few PA's, we usually don't know much about them anyway, which makes the regressions for their sample stats a lot easier also!
What has never been explained adequately, I don't think, is the relationship between regressing a player's stats toward the mean (in order to estimate his true stats, or at least come up with a weighted average of all his possible true stats), using one of Tango's standard regression equations, x/(x+PA), and doing the same thing (estimating true talent) through a complete, thorough Bayesian analysis. The Bayesian analysis is THE complete, correct way to solve the problem. It basically goes like this: Given that 1% of all MLB players have a true BA of .200, 2% have a true BA of .220, 10%, .230, 3% .300, 1% .330, etc., if we have a random MLB player who is allowed to bat 100 times no matter what, and hits .230, what are the chances that he is a true .200 hitter, a true .220 hitter, a true .314 hitter, etc.? There is one and only one answer and it is as precise as a human being can get. You can express it as "given that he hit .230 in those 100 AB's, there is a 10% chance that he is a true .230 hitter, 1% chance that he is a .290 hitter, etc.," or "there is a 20% chance he is worse than a .230 hitter, 30% chance he is better than a .270 hitter, etc.," or, you can express it as "his most likely (the weighted average of the first version above) true BA is .244, but there is a 50% chance that it is between .234 and .254, and a 70% chance it is between .220 and .260, etc." BTW, those "intervals" I made up do NOT have to be symmetrical around "the weighted mean" or the "best single estimate of his true BA."
The analogy is if we flipped a coin 100 times, we could say that the best estimate of the number of heads that will come up (this is the above in reverse - here we know the true value and are estimating the possible sample values, but it is really the same thing as looking at the sample value and estimating the possible true values) is 50, or we can say that there is a 70% chance that between 40 and 60 heads will come up, or we can say that there is a 3% chance that 50 heads will come up, a 2% chance that 49 or 51 heads will come up, etc.
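The percentages in the coin analogy read as round, illustrative numbers. For anyone who wants the exact figures for a fair coin, a sketch using the binomial distribution (the helper `binom_pmf` is a made-up name; `math.comb` is standard library):

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_exactly_50 = binom_pmf(50, 100)
p_40_to_60 = sum(binom_pmf(k, 100) for k in range(40, 61))

print(round(p_exactly_50, 3))
print(round(p_40_to_60, 3))
```

The exact values come out somewhat different from the round numbers in the analogy, but the point stands: the same information can be stated as one best estimate, as an interval, or as a full list of probabilities.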
The above Bayesian analysis (or some equivalent version of it) is the ONLY perfect way to solve the problem of what is the true BA of a random player who hits .xxx in y number of PA's. And it is perfect. No one can do better, given the conditions. In this case, the conditions are that the player is a random player from the population - we know nothing else about him, and that we know the exact distribution of true BA's among players in the population (1% of all players are true .330 hitters, 5% are true .290 hitters, etc.). Again, if we know the exact distribution of true BA's in the population and we draw a player from that population and "sample" his BA in x number of AB's, we can come up with an exact, precise, and perfect (given what we know) model (the weighted mean and all the various possibilities) for his likely true BA. What you usually see in a projection, which is simply the result of the above Bayesian analysis, of course, is the player's "weighted mean." As I said above, you can see it represented in lots of other ways. Which ways are more useful is a personal choice I guess, but one is not more right than the other. I suppose it is better to have all the information (the weighted mean, plus the confidence intervals, or the weighted mean plus the entire distribution of possibilities), but what you usually see is just the weighted mean (a typical projection). To tell you the truth, the weighted mean is really all you usually need as that is the most useful piece of information, and in any case, the rest (the variance around that mean, or the confidence intervals, or even the entire distribution of possible true values) can usually be inferred.
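A minimal sketch of the discrete Bayesian calculation described above. The prior below is a made-up five-point distribution of true BA's (it is not the real MLB distribution), and the player is assumed to go 23 for 100. Each AB is treated as an independent trial, so the likelihood is binomial.

```python
from math import comb

# Hypothetical prior: share of the population at each true BA (sums to 1).
prior = {0.220: 0.10, 0.240: 0.20, 0.260: 0.40, 0.280: 0.20, 0.300: 0.10}

def posterior(prior, hits, ab):
    """Bayes' rule on a discrete prior with a binomial likelihood."""
    unnorm = {
        ba: share * comb(ab, hits) * ba**hits * (1 - ba)**(ab - hits)
        for ba, share in prior.items()
    }
    total = sum(unnorm.values())
    return {ba: w / total for ba, w in unnorm.items()}

post = posterior(prior, hits=23, ab=100)

# The single-number "projection" is the weighted mean of the posterior.
weighted_mean = sum(ba * p for ba, p in post.items())

for ba, p in post.items():
    print(f"true {ba:.3f}: {p:.1%}")
print(f"weighted mean: {weighted_mean:.3f}")
```

With this prior the weighted mean lands between the .230 sample and the .260 population mean, and the full posterior gives the "x% chance he is a true .yyy hitter" version of the same answer.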
So how does this rigorous Bayesian analysis, which I just said is the only perfect way to solve this problem, relate to "regressions" and Tango's handy regression formula? This is important, and if I get anything wrong, perhaps someone like Jordan or AED (or Davis or Hsu) can correct me.
If the distribution of true talent in the population is "normal" (or somewhat normal for all practical purposes), AND the sample distribution (in this case of BA in x number of AB's) is also normal, then we can forego the "long version" of the above Bayesian analysis and use the shortcut version, which is Tango's regression formula. We know that the second part is true for sample BA's, since a batting average is basically a binomial and each event is independent (somewhat), etc., and a distribution of a binomial approximates a normal distribution.
As far as the first part, that the distribution of true BA's in the population is normal or somewhat normal, that's the tricky part. Tango says that it is (if we incorporate playing time). For purposes of doing a Bayesian analysis on a player who has a certain sample BA in order to estimate his true BA or the distribution of possible true BA's, I honestly don't know if we can or should assume a normal distribution of true BA's in the population. If there isn't (a normal distribution), then using a regression formula like Tango's handy one is NOT equivalent to using the rigorous (perfect) Bayesian approach. I know that it comes close, but I don't know how close. The closer the distribution of true BA's in the population is to a normal distribution, the closer the "regression" model is to the real Bayesian model.
That's about all I have to say for now. The only other question I have which no one had answered yet, but I know that AED or someone like him has the answer to is this:
If the distribution of true talent IS normal, then we know for a fact that the true correlation, "r", between any 2 sets of independent samples drawn from that population tells us EXACTLY how much to regress to the mean. For example, if we draw 2 sets of 500 AB samples and the true "r" when regressing one sample on the other is .500, then we know that the exact regression is 50% (1-r). Therefore, if a player's sample BA in 500 AB's is .300 and the mean (true) BA in the population is .250, our absolute best estimate of this player's true BA is .275 (.300 regressed 50% towards .250). That we know for a fact. We also know for a fact, that we can do the same thing for any number of AB's as long as we know the true "r" for that number of AB's. We also know (I think) that since all distributions are normal, that we can exactly infer (interpolate) the "r", and thus the proper regression (1-r), for any number of AB's, if we know the true "r" for any one number of AB's.
The question is: is Tango's formula, x/(x+PA), the EXACT way to infer these other "r"'s or is it just a quick and dirty shortcut? I have a feeling that it is the latter, and if it is, I'd like to know the true equation.
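For what it is worth, here is a sketch of how the x/(x+PA) form falls out under a simple set of assumptions: true talent has variance \sigma_t^2 across the population, each PA adds independent noise with a constant per-PA variance \sigma_e^2 (for BA, roughly p(1-p)), and r(n) is the correlation between two independent n-PA samples from the same players. These symbols are introduced here for illustration and are not from the thread.

```latex
\begin{aligned}
\operatorname{Var}(\text{observed in } n \text{ PA}) &= \sigma_t^2 + \frac{\sigma_e^2}{n} \\
r(n) &= \frac{\sigma_t^2}{\sigma_t^2 + \sigma_e^2 / n}
      = \frac{n}{n + x}, \qquad x = \frac{\sigma_e^2}{\sigma_t^2} \\
1 - r(n) &= \frac{x}{x + n}
\end{aligned}
```

So under those assumptions a single x, fixed by the "r" at any one sample size, gives the regression at every other sample size. Where the per-PA noise is not constant or the samples are not independent, the formula would only be an approximation.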
The other question is, is it true that, given the same parameters as above, that we don't have to do a regression and calculate an "r" as described above - that we can take the expected variance for any number of AB's, assuming no spread of BA talent, and divide that by the observed variance, and that also gives us an EXACT estimate of the "r" and regression (in this case expected variance divided by observed variance equals regression, and "r" equals one minus regression)? And if that is true, how do we calculate the confidence interval (i.e., the "standard error") around that ratio?
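A sketch of the "expected variance divided by observed variance" calculation, assuming every player has the same number of AB's and using a made-up list of eight sample BA's. The expected variance with no spread of talent is taken as the binomial variance p(1-p)/AB at the group mean. This only shows the arithmetic; it does not answer the confidence interval question.

```python
from statistics import mean, pvariance

ab = 500
# Hypothetical sample BA's for eight players, each with 500 AB's.
sample_bas = [0.230, 0.245, 0.255, 0.260, 0.270, 0.275, 0.290, 0.305]

p = mean(sample_bas)
expected_var = p * (1 - p) / ab        # variance from chance alone
observed_var = pvariance(sample_bas)   # variance actually seen across players

regression = expected_var / observed_var
r = 1 - regression

print(round(regression, 2), round(r, 2))
```

With so few players the ratio is very noisy, which is exactly why the standard error question matters.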
If I use the regression method above, I know I can look at a table on the web that tells me what my confidence intervals are on the "r" that I get. If I use the "expected variance divided by observed variance method," assuming that they will yield the exact same results (not counting sample error of course), I don't know how to find out the confidence interval around that result...
Clutch Hits - Tango's 11 points to think about --- to understand why we regress towards the mean (February 12, 2004)
Posted 6:04 p.m.,
February 12, 2004
(#10) -
MGL
Every above-average player is overrated by their stats, and every below average player is underrated by their stats..
JC, yes, if I may re-state what AED just said, the above is almost, but not quite, a brilliant statement! I mention it about a dozen times a year.
Let me make it right:
Any player who has stats higher than the mean of whatever population they come from is overrated, on the average, and every player who has stats lower than that mean is underrated, on the average!
That is one of the most important things to remember in analysis of baseball stats (and of course in many other areas). Oh, and that assumes a Gaussian (normal) or at least a symmetrical distribution of talent in the population. I'm not sure if that statement is always correct for "weird" distributions of talent.
As Tango is trying to explain, that doesn't mean that EVERY player is over or underrated. It means that for EVERY given player, our best estimate of his true stats is ALWAYS between his actual stats and the mean stats from his population. It is a certainty, given the parameters I have just described. Of course, it is up to you to try and figure out what his population is, and then what the distribution of true talent in that population is. Sometimes it is easy and sometimes it is not. Most of the time in baseball, we can make a reasonable estimate.
Tango also keeps trying to explain that it might be better (easier to understand?) to stop presenting just one number and one number only to represent the best estimate of a player's true stats, although it IS correct to present that number to answer the question "what is the best estimate..."
So another way to present that above statement in bold - one that might be easier to digest for those not statistically savvy - is:
Any player who has stats higher than the mean of whatever population they come from is more likely to be overrated than underrated and every player who has stats lower than that mean is more likely to be underrated than overrated!
Statisticians have a tendency to speak about probabilities as if they were certainties, or at least they are often interpreted as such. In sampling statistics, there are NO certainties (so to speak). Every "conclusion" in sampling statistics is a probability or series of probabilities. Statisticians also have a tendency to speak of the "mean" when they really mean a set of probabilities. (I am not a statistician, but I tend to do that also.)
When we say that our estimate of a player's true OBA is .330 (given his .340 sample OBA), we really mean that there is a 10% chance it is between .325 and .335, a 5% chance it is between .320 and .325 or .335 and .340, etc. (I'm making those numbers up).
When we "say" .330, we mean the "weighted average of all those probabilities." We also mean, as I stated, that .330 is the "best estimate (if we had to pick one number) of that player's true OBA" (which we don't really know).
I can't explain things any better than that...
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 8:17 p.m.,
February 16, 2004
(#6) -
MGL
I agree with David that whether a trade or acquisition is "correct" or not for one team or another does not boil down to one number. Each team, like each person in real life, has its own unique "utility function." The example I like to give is if a player goes to a casino to play blackjack, must he play perfect basic strategy which optimizes his expected win/loss during his blackjack session? Of course not. It depends on the player's utility. Does he WANT to optimize his win expectancy? Maybe not. Does he want to have fun, and playing basic strategy might detract from that fun? Maybe. Does he want to avoid getting criticized by other players? Etc.
Same thing for the Rangers and Yankees. Obviously part of the Yankees' "utility function" is to win at any cost, and not just (or at all) to optimize their payroll and ultimately their profits.
And that's not to mention that x number of marginal wins is worth different amounts to each team. As well, a player's "marquee value" is different for different players and different teams. Etc. There are a plethora of factors unique to each team that go into optimizing their "deals."
That being said, as Tango has stated, no matter how you slice it, this trade is probably somewhere near "even" for both teams. There are basically 3 kinds of trades, acquisitions, or contracts. Those that are obviously bad, those that are obviously good, and everything else (the ones in the middle). I think that there is little doubt that this one is somewhere in the middle, for both teams.
I do think however that Texas is somehow under the impression that if they lose A-Rod and are able to pick up a quality pitcher or two, that that is somehow better than having A-Rod. As most of us (at Primer) know, many teams are under the false impression that player value is NOT fungible (especially between pitchers and hitters), which of course it is (not considering the slight influence of run environment and player "synergy").
I'm pretty sure the Rangers think and thought that they would be and have been better off with less offense and more defense (pitching), even though, as we know, "a run saved is a run earned (more or less)." I don't think that the Rangers know that. So other than the nuances of the long-term effects of this trade, what do the Rangers gain by giving up around 3 wins a year and saving around 10 mil a year? Well, if 1 marginal win is worth 2-4 mil in profit for the Rangers, as it is for most teams (I assume), then they have gained nothing. And if they expect to make up those 3 wins by acquiring better pitching, it is going to cost them that 10 mil anyway (probably more, as a win in pitching probably costs more than a win in hitting). So the problem with Texas, and for Texas fans (are there any?), I think, is not whether this trade was "good" for them or not, but the fact that I don't think that Texas management understands what it takes to put together a good team. On the other hand, although the Yankees may not understand how to evaluate players either (Jeter has to be by far and away the most overpaid player in baseball), if you are willing to throw almost unlimited amounts of money out there, it is easy to field a 90 or 100 win team, which makes the fans ecstatic. The fans don't care how much money you spend of course, or whether you spend money "optimally," as long as you are willing to spend lots of it, like the Yankees. What we forget from time to time, is that for some owners (e.g. George), and to some extent almost all owners, their teams are in between a business (which is mostly about bottom line) and their houses, cars, and other toys, which are about spending as much money as you want and can afford, to make you happy.
To change the subject a little, BPro had 2 terrible columns lately, which shocked me a little. One was a few days ago and was about the Phillies and closers. Some "study" about whether there was a "carry-over" effect after a blown save with Mesa, Bowa, the Phillies, or teams in general. The sample sizes and the "gross" measures in that study were so obviously small that any results would be next to worthless. As it turned out, there was not a "noticeable" difference between the Phillies (Mesa, Bowa) and the rest of the teams, but there easily could have been a different result, given the ridiculously small samples and "gross measure" (whether they won or not the next day). The article and "study" was a horrible example of trying to "prove" a point by citing some almost meaningless data (either cherry picking the data or using random small-sample data and "hoping" that the results support your hypothesis). I was shocked that BPro (and Sheehan I think) would stoop to that level of "research".
The other article I thought was terrible, and more germane to this thread, was today's article on "the trade." They kept talking about how much money the Rangers would "save" each year, which was fine, and then almost as an afterthought, they mentioned "but some or even all of that savings might get eaten up in lower revenue." Well, isn't that how the economics of running a baseball team works?
If you lower your win expectancy, you save money, but you lower your revenue, and vice versa. Isn't that why teams spend money to improve their teams in the first place? The BPro article's implication was that those were 2 independent things. They acted like it was a revelation that even though they dumped a great player (and got a good one) and saved money, that it might actually affect their revenue! Well, of course it will affect their revenue, otherwise all teams would have 25 replacement players! That's why a marginal win is worth 2-4 million dollars to a team, right? The trick for a team is to try and gain as many theoretical wins as possible for the least amount of money, as well as to try and increase their "revenue per win" as much as possible.
In the article, they kept saying that the Rangers would save around 10 mil per year. That's exactly what they should save (going along with Tango's notion that the trade was about right for Texas) given the fact that they gave up around 3 wins per year by trading A-Rod for Soriano!
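The back-of-the-envelope arithmetic behind that statement, using only the rough figures quoted in these posts (about 3 wins given up, 2-4 mil per marginal win, about 10 mil saved). The variable names are just for illustration:

```python
wins_given_up = 3
value_per_win_musd = (2, 4)     # low and high estimate, in millions
salary_saved_musd = 10

# Revenue (profit) lost from the wins given up, low and high case.
lost_revenue = tuple(wins_given_up * v for v in value_per_win_musd)

# Net effect of the trade on the bottom line, low and high case.
net = tuple(salary_saved_musd - lost for lost in lost_revenue)

print(lost_revenue, net)
```

The savings fall inside the range of lost value, so depending on where in the 2-4 mil range a win really sits, the trade nets out to somewhere around zero.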
I didn't understand the point of that BPro article at all....
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 4:07 a.m.,
February 17, 2004
(#9) -
MGL
If a marginal win is only worth 1 mil for a non-contending team, then they should all field replacement players as it is almost impossible to pick up a win for 1 mil. Of course that is a catch-22, since how do you become a contending team unless you spend some money when you are non-contending?
So Detroit signing Pudge was a gigantic waste of money?
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 12:07 p.m.,
February 17, 2004
(#17) -
MGL
This can be analyzed and debated ad infinitum, like "replacement value" concepts. The bottom line is that for any individual team, trade, acquisition, and contract decisions should be fairly simple.
Is what I am paying for player x worth the extra wins plus marquee value that player x provides for my team?
The only reason why there is such a large gap between what a player is "worth" on the open market and what his true value is to an average team, is that not all teams use this simple decision-making process for various reasons...
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 3:23 p.m.,
February 20, 2004
(#39) -
MGL
I don't have my other computers with me right now (I'm in transit on my laptop), but I wrote in another thread on Soriano's age how the difference between being 26 and 28 has an enormous impact on his 2004 projection, after one considers the "age adjustments." Guys like Silver and Shandler (and myself) must have been scrambling to their computers to change Soriano's projection. Players peak at 26 overall. Batting peak is probably 27 (I don't have my aging curves in front of me). For every year after 26 or 27 the average player loses about 2 runs (RC or lwts) per year and then 3 runs per year after age 30 and 4 after age 35. Prior to age 26 or 27 they gain about 4 runs per year. So you can do the math (without James Brock systems, which BTW, doesn't use the correct aging adjustments and needlessly confuses the issue of aging).
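A rough sketch of "doing the math" with those per-year figures. The exact boundaries are an assumption: the peak is taken as 27 (the batting peak mentioned above), and the steps at 30 and 35 are applied to the season that starts at that age. The function names are hypothetical.

```python
def yearly_change(age, peak=27):
    """Approximate change in runs going from `age` to `age + 1`."""
    if age < peak:
        return 4      # still improving
    if age < 30:
        return -2
    if age < 35:
        return -3
    return -4

def total_change(age_from, age_to, peak=27):
    """Sum of the yearly changes between two ages."""
    return sum(yearly_change(a, peak) for a in range(age_from, age_to))

# Same player, one season ahead, under two different assumed ages.
if_26 = total_change(26, 27)
if_28 = total_change(28, 29)
print(if_26, if_28, if_26 - if_28)
```

Under these assumptions the age correction alone swings a one-year projection by several runs, and the gap compounds over a multi-year contract.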
So given that "the trade" was probably around break even for both teams with Soriano being 26, it was probably pretty bad for Tex with Soriano being 28. Then again, I can't imagine a GM, other than Beane, Epstein, or J.P., knowing, believing, or understanding, that an average player peaks at 26...
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 8:28 p.m.,
February 20, 2004
(#41) -
MGL
Good job, Tango....
Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)
Posted 11:53 a.m.,
February 17, 2004
(#4) -
MGL
The more granular the data, the higher the SD, and the more accurate and useful the results.
I posted this on the above web site:
I'm making no judgements about win shares per se, but clearly it is not appropriate to average win shares in with these other metrics, especially in the OF. The scale and baseline are completely different for win shares, even though the author of this article has done an admirable job in converting it to something that looks like runs per 162 games.
Also, you want to be very careful in averaging metrics and taking the results seriously (as being better than any of the individual metrics). In general, in order to do that, you have to have reason to believe that one metric complements another and/or offers something that another one does not. And even then, you want to make sure that they are all on roughly the same qualitative level. If they are not, then even if there is some complementing, the inferior ones will bring down the average to a level worse than that of any of the individual superior ones. So you want to make sure that there is a high degree of independence among the individual metrics and that they all use about the same granularity of data.
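One way to see the point about inferior metrics dragging down an average: if each metric is an unbiased estimate with its own independent error, the error SD of a simple average of k metrics is sqrt(sum of the error variances) / k. The error SD's below are made-up numbers in runs, and the independence of errors is an assumption.

```python
import math

def sd_of_average(error_sds):
    """Error SD of a simple average of independent, unbiased metrics."""
    k = len(error_sds)
    return math.sqrt(sum(sd**2 for sd in error_sds)) / k

good, similar, poor = 5.0, 6.0, 10.0   # hypothetical error SD's in runs

avg_with_similar = sd_of_average([good, similar])
avg_with_poor = sd_of_average([good, poor])

print(round(avg_with_similar, 2))   # better than the good metric alone
print(round(avg_with_poor, 2))      # worse than the good metric alone
```

Averaging with a metric of similar quality helps, while averaging with a much noisier one leaves you worse off than using the better metric by itself. If the errors are correlated rather than independent, the benefit shrinks further.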
In this case, besides the fact that I am biased towards UZR, as I said, win shares does not belong in this list, I am not familiar with rate2, and although Pinto's "system" appears to be good and very close to UZR, it is a black box as far as I know.
If I were in charge of the world, I would not be averaging UZR or Pinto's system with anything, any more than I would average lwts with OPS or BA with OBA. IMNSHO, UZR is the gold standard, principally because it relies on PBP (hit type and location) data (so does ZR and defensive average, but on a very "gross" level), whereas all the other metrics can only "estimate" such data...
More Help Requested (March 4, 2004)
Posted 2:35 p.m.,
March 11, 2004
(#23) -
MGL
FWIW, I'll throw in a non-technical comment about possible "junk ballots."
You have to be really careful about throwing away a ballot or portion of a ballot that is an outlier in terms of the rest of the ballots or in terms of what you think you know about a particular player, for obvious reasons. The whole point of the project is to get as many subjective evaluations as possible in order to augment the objective data AND what we think we know about a player's skills. Throwing out a ballot that doesn't "look right" is similar to the notion of not regressing Bonds' stats towards a mean because we just "know" that he is indeed a great hitter.
Now, given that, the only criteria for throwing or not throwing out a ballot should be whether the person submitting it was "honest" or not in his evaluations, and/or whether they had some minimum level of competence. Obviously, you don't know for sure about either of these criteria. Your "goal" should be in ascertaining honesty and competency, but again, you have to be really careful about judging them (honesty and competency) by comparing a person's evaluations with those of others and what you think you know about a player or players.
If it were me, I would do two things in order to determine whether a ballot or portion of a ballot is so questionable regarding honesty and competence that it merits tossing: One, if the "pattern" of the evaluations "looks" artificial, I would consider throwing it away, even if the actual ratings look reasonable. Two, if enough of the ratings on one person's ballot are outliers, I would assume that there is a systematic dishonesty or incompetency. If only 1 or 2 player ratings are outliers and there are no suspicious looking patterns on a ballot, I would leave it alone.
What kind of statistical methods you use to ascertain the two things I mentioned, I don't think is that important. I think you could just as well do it by the "seat of your pants." If a ballot "looks goofy" and you suspect that the person is not being honest or for whatever reason is not competent/diligent, then just throw it out. I know you know not to throw out a ballot just because one or two of a person's ratings are far from everyone else's or what you consider reasonable. In fact, it is almost a given that you "want" some unusual ratings for all players, again, otherwise you are somewhat defeating the purpose of the study.
In the "Bernie's arm" example, if someone has his arm as average or even excellent, as long as their ballot otherwise looks OK, you definitely want to NOT throw out that ballot. You WANT unusual ratings like this, even though the unusual rating itself "suggests" that the person is not being honest or is incompetent. Ideally, you would want to pre-test or determine independently honesty and competency, but since you can't, you have to do the best you can with the ballots you get. I would try and err on the side of NOT throwing away ballots unless you think you have some INDEPENDENT (of the actual ratings) evidence of dishonesty or incompetency.
I wouldn't worry too much about it...
Pappas - Marginal $ / Marginal Wins (March 9, 2004)
Posted 1:31 p.m.,
March 10, 2004
(#10) -
MGL
I am amazed how the actual/expected wins for many teams do not match up with the popular perception of whether the team spends money wisely or not. Of course, I'm not sure what the perception was in that time period (95-99).
I am also surprised at how fairly efficient many teams are. I would have thought that there would be a lot more inefficiency.
An interesting study would be to see whether and by how much teams are becoming more efficient in spending money for talent (wins). Would the "r" represent that efficiency? Tango, is that what your r's are (each team's marginal dollars spent, regressed on their marginal wins)?
Pappas - Marginal $ / Marginal Wins (March 9, 2004)
Posted 2:23 p.m.,
March 10, 2004
(#12) -
MGL
I'm reading Pappas' articles now (I just got my BP 2004 yesterday).
It seems to me that if you want to evaluate a team's efficiency or "smarts" in spending, using long-term salary/wins data is not bad but that in the short-term, it is terrible. In the short-term, you have the gap between pythag wins and actual wins, you have the gap between a player's projection and his actual performance and you have the injury factor. All of these things are beyond the team's control and create a lot of noise in the short term. Even if a marginal dollars/marginal wins model is good for "evaluating" teams in the long-term, you have different GM's and even different owners, so the results are not that meaningful.
Judging a team's overall "management talent" or efficiency is REALLY complicated. Much more complicated than just comparing marginal dollars to marginal wins. You have drafting and player development (if a team is good at that, they get good players really cheaply for a few years), you have the years before arb and FA, etc.
It seems to me that the most simple and effective way of "evaluating" a team's efficiency ("smarts") in spending is to compare each individual player's post-arb and FA salary with his projection at the time of the signing.
Tango, it also seems to me that the progression of "r" (from marginal wins regressed on marginal dollars for each team) over the years should give you a very nice idea as to whether and by how much teams are getting more efficient (better in evaluating player talent). I'd love to see a list of the year by year "r's" using Pappas' data...
MGL's superLWTS (March 10, 2004)
Posted 7:16 p.m.,
March 10, 2004
(#2) -
MGL
How come the adjusted stats changed to per 150 games as opposed to per 162?
No particular reason. It's a "rounder" number? Actually, because it is closer to the number of games an average starter plays per year (I think), so that it is easier to "eyeball" relative value among starters...
MGL's superLWTS (March 10, 2004)
Posted 7:36 p.m.,
March 10, 2004
(#3) -
MGL
A word of caution. If anyone wants to use a 4-year Superlwts total to "represent" a player's true talent (such as "I'd rather have so-and-so on my team than so-and-so," or "so-and-so is 'better' than so-and-so"), please at the very least regress his sample Superlwts to the league average of zero. It is simple. Use 400/(400+TPA) as the regression coefficient. For example, if a player has a 4-year Superlwts of +20 per 150 and he has 2000 PA, then the regression coefficient is 400/(2400), or .167. So regress the +20 16.7% towards zero, or simply multiply +20 by 1-.167 (.833). So his estimated true Superlwts is +16.7 rather than +20. This is critical when comparing players who have a large difference in their number of PA's and for players whose sample Superlwts is far from zero (plus or minus). If you want to get even more accurate (as far as a player's true talent now or at some point in the future), you can do some kind of age adjustment by using the following rule of thumb: A player gains 4 runs per year (150 games) prior to age 26, loses 2 runs per year after age 26 and 4 runs per year after age 35...
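The arithmetic in the post above can be sketched in a few lines. This is only an illustration of the quick-and-dirty regression described there; the function name and parameters are made up for the example, and the constant 400 is the value given in this post (a later post in the thread revises it to 330).

```python
def regress_to_mean(sample_per_150, tpa, k=400.0, mean=0.0):
    """Shrink a sample Superlwts rate toward the population mean.

    k is the regression constant from the post; k / (k + tpa) is the
    fraction of the distance to the mean that gets regressed away.
    """
    weight = k / (k + tpa)
    return mean + (sample_per_150 - mean) * (1.0 - weight)


# The example from the post: +20 per 150 games over 2000 PA.
estimate = regress_to_mean(20.0, 2000)
print(round(estimate, 1))
```

A player with very few PA's is pulled almost all the way to the mean, which is why the adjustment matters most when comparing players with very different sample sizes.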
MGL's superLWTS (March 10, 2004)
Posted 1:13 a.m.,
March 11, 2004
(#6) -
MGL
MGL, if we need to regress a player's SLWTs to find his real LWTS, why don't you just regress them before you publish them?
I present the precise, actual sample data, you guys can do whatever you want with it.
Tango, sure, you should regress each of the components differently, and yes, the regressions probably interact with one another - i.e., you should use a multiple regression equation. However, a more than adequate, Q&D regression can be done with the Superlwts total.
The "400" came directly from the y-t-y "r" for the total Superlwts. However, there was a bug in my program that compiled the regression data. After fixing the bug, the y-t-y "r" for players with at least 300 PA in back to back years is .612 (avg. of 507 PA). That gives us a regression of .388 for 507 PA's. Since that 507 PA's is only an "average," we should probably use like 520 PA's for the .388 regression. That gives us an "x" of 330, rather than 400. Does that look better?
For those who don't understand what the heck I am doing above, just substitute 330 for the 400 in the "regression formula" I posted in #3. So the amount of regression (towards zero) is 330/(TPA+330).
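For anyone who wants to reproduce the "330," here is a small sketch of the step from a year-to-year correlation to the regression constant. It assumes, as the post implies, that the amount of regression at a given number of PA's is 1 - r and that it takes the form x/(x+PA); the function name is invented for the example.

```python
def regression_constant(r, pa):
    """Solve x / (x + pa) = 1 - r for x."""
    return pa * (1.0 - r) / r


# r = .612 at roughly 520 PA, the figures used in the post.
x = regression_constant(0.612, 520)
print(round(x))

# Sanity check: with x = 330, a player with 330 PA is regressed halfway.
print(330 / (330 + 330))
```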
Tango didn't mention it, but if you also want to be more accurate with the regression, you can use something other than "zero" to regress everyone's sample Superlwts towards. Remember that the "zero" is the theoretical mean of the population from which the player comes. Since we already are adjusting for defensive position, that is already taken care of. There are other "populations" you can create for any given player, such as players who are big and strong, players who hit lefty or righty, players who were top prospects, players who were not (as long as whether they were prospects or not was not heavily influenced by their actual AA or AAA stats, since those are part of the Superlwts data), etc.
If you identify or create one of these, or some other, "populations," you can use the mean total Superlwts for that population, rather than "zero" to regress towards...
MGL's superLWTS (March 10, 2004)
Posted 2:02 p.m.,
March 11, 2004
(#8) -
MGL
David,
I submitted the file to Dan S. about 2 months ago and was hoping to have it up on Primer in a nice format. Finally, I asked Tango to post it. I probably should rewrite an "article" briefly describing the method for calculating each component, as a few things have changed since the original Superlwts article. If I have time, I'll do that and submit it to Dan and hopefully he can put everything up somewhere in a nice format. I don't know what takes so long. I feel like Superlwts is the best (by far) total evaluation "system" out there, especially after one does the regressions (or at least some kind of Q&D regression), so it should probably get a little more pub than just a blip on Primate Studies...
MGL's superLWTS (March 10, 2004)
Posted 8:45 p.m.,
March 11, 2004
(#11) -
MGL
Jay Fan, good questions. No, the UZR ratings are not "discounted" for whatever influence a player's pitchers have on balls in play. UZR IS adjusted for handedness, G/F ratio, etc. (read the articles on UZR). On the average, the variance of UZR is about half that of offense (maybe a little more), I think. Remember that hitting and defense are pretty much independent, so that any player who happens to have his hitting lwts be around zero will likely have a UZR that is further from zero than his hitting lwts. That doesn't mean anything in and of itself. Also, there is no particular reason why hitting has to be more "important" (greater variance) than defense. It is, but it doesn't have to be. One of the things you get when you come up with a more "accurate" metric (like UZR compared to ZR), is a greater spread in observed talent (assuming that there is a large spread in actual talent). That is a good thing. In fact, the "spread" of a "perfect metric" is always the spread (variance) of true talent PLUS the random variance associated with a sample of that talent.
As far as Erstad and his 56 runs saved one year, don't forget that there is always sample error associated with a sample UZR. Therefore almost any particular sample UZR (+56, -100) is theoretically possible. Given enough players and enough seasons, we will see some +100 and -100's, or whatever. There is a limit to what we could possibly see, as there are only so many balls a player can be "responsible" for in any one season (the tails of the curve do not extend infinitely). The +56 for one season doesn't mean much in and of itself (it is simply a one-year sample of his true UZR) other than it is likely that Erstad is a very, very good defender. If we want to estimate exactly how good, we can look at x number of years of UZR, regress, and come up with a reasonable answer.
MGL's superLWTS (March 10, 2004)
Posted 11:02 p.m.,
March 13, 2004
(#15) -
MGL
SB/CS is simply SB*.18 - CS*.46 (or something close to that). The interesting thing about net SB/CS runs, is that there is almost no y-t-y correlation (near zero) for all players (with a min number of PA's per year). I suspect 3 things are going on there. One, all players, fast, slow, medium, good or bad basestealers, tend to run themselves into a break even or slightly negative net SB/CS runs total. Two, there is only a small window of time (at a certain age) where a player has a good net SB/CS runs total. Three, there are only a relatively few players who, for whatever reasons, are able to maintain for several years a positive SB/CS net run total.
SB/CS are not included in the batting lwts. Neither are GDP's. All outs are treated as single outs, although GB outs and FB outs are given different values. Also GB outs by a RHB are treated differently (given a different lwts value) than those by a LHB. A GB out by a LHB is less negative because they advance runners more often. FB outs are around the same for RHB and LHB (I think, off the top of my head). Also, ROE's are NOT considered outs in the batting lwts. ROE's are given a separate positive value (around the same as a single, a little more I think). There is a fairly significant y-t-y correlation for a player's ROE's even after their G/F ratio and handedness are taken into account.
GDP's are figured separately and are based on a player's number of GDP'S PER OPPORTUNITY (runner on first, less than 2 outs) above or below average.
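To make the "break even" idea concrete, here is a short sketch using the approximate weights quoted above (.18 per SB, -.46 per CS). The function names are invented for the example, and the break-even rate is simply what those two weights imply, not a figure stated in the post.

```python
SB_VALUE = 0.18   # approximate run value of a stolen base (from the post)
CS_VALUE = -0.46  # approximate run value of a caught stealing (from the post)


def net_sb_runs(sb, cs):
    """Net basestealing runs: SB*.18 - CS*.46."""
    return sb * SB_VALUE + cs * CS_VALUE


def break_even_rate():
    """Success rate p at which p*SB_VALUE + (1-p)*CS_VALUE equals zero."""
    return -CS_VALUE / (SB_VALUE - CS_VALUE)


print(round(net_sb_runs(30, 10), 2))   # 30 steals, 10 times caught
print(round(break_even_rate(), 3))
```

With these weights a runner needs to succeed a bit over 70% of the time just to net zero, which fits the observation that most basestealers end up near break even.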
The whole idea of Superlwts is to try and measure anything we can think of that a player "controls" and that adds or subtracts from his team's runs scored or allowed. As Tango says, each of these "things" has a different y-t-y correlation (the player "controls" more or less), so that if we want to use Superlwts to do a projection for a player (how valuable will he be in any future time period to an average team), or to estimate a player's true talent or value, we have to regress each "thing" separately and differently (using different regression coefficients). As I said in a previous post, a Q&D, and more than adequate, method of regression for projections and estimating true talent or value, is to regress the Superlwts total around "50% per 330 PA's" (see the formula in one of my above posts).
If there is anything of value and a player has control over that I may have missed, feel free to let me know, as long as it is not trivial...
MGL's superLWTS (March 10, 2004)
Posted 5:43 p.m.,
March 14, 2004
(#17) -
MGL
Lemme see if I can scrape up some numbers for you. If you download the A.S.S (Astros Daily) database and their software (or the retrosheet data), you can compute any of these numbers you want.
There are about .356 dp opps (runner on first, less than 2 outs) per game. 24.9% of the opps result in a GDP. I use -.567 as the additional value of the DP (difference between a DP and single out).
The way a player's GDP lwts is calculated is to simply take the difference between his expected GDP given his number of opps and his actual GDP, multiplied by -.567, prorated to the average number of opps per 150 games. The player's OBA or anything else about him has nothing to do with it. I'm not even sure what you mean. Of course, if a player plays on a team and/or is in a lineup slot where the players ahead of him get on base more or less often than average, he will have a greater or smaller number of opps such that his positive or negative GDP Superlwts value per 150 will actually be a little "higher." That is the essence of Tango's "custom lwts," and would apply to Superlwts as well.
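A rough sketch of that GDP calculation, using the league figures quoted above (.356 opps per game, 24.9% of opps ending in a GDP, -.567 per extra GDP). The function name is made up, and the sign convention (fewer GDP's than expected yields positive runs) is my assumption about what is intended. The marginal value is left as a parameter, since a later post in the thread revises it.

```python
OPPS_PER_GAME = 0.356    # DP opportunities per game (from the post)
LEAGUE_GDP_RATE = 0.249  # share of opportunities ending in a GDP (from the post)


def gdp_runs_per_150(actual_gdp, opps, marginal_value=-0.567):
    """GDP runs above/below average, prorated to average opps per 150 games."""
    expected_gdp = LEAGUE_GDP_RATE * opps
    # Assumed sign convention: extra GDP's cost runs, avoided GDP's add runs.
    runs = (actual_gdp - expected_gdp) * marginal_value
    average_opps_per_150 = OPPS_PER_GAME * 150
    return runs * average_opps_per_150 / opps


# Hypothetical player: 100 opportunities, 20 GDP's.
print(round(gdp_runs_per_150(20, 100), 2))
```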
Here are the lwt values of the various events for 00-03 in the NL:
PA 399,795
Avg RE .512
All out, no sac -.271
All out, no sac, no err -.283
out, no K, no sac -.266
out, no K, no err, no sac -.283
K out -.284
GDP inc. bunts -.822
GDP opp, GB out, no DP -.268
out, no DP, no K, no bunt -.255
GB out, no bunt -.265
FB out, no bunt -.262
LD out, no bunt -.288
GB out, no bunt, no err -.293
FB out, no bunt, no err -.266
LD out, no bunt, no err -.296
GB -.095
FB -.004
LD .357
Hit val, no bunt .646
Hit val, no bunt, no hr .540
GB hit, no bunt .477
FB hit, no bunt 1.023
FB hit, no bunt, no hr .691
LD hit, no bunt .577
LD hit, no bunt, no hr .545
GB err, no bunt .478
FB err, no bunt .608
LD err, no bunt .560
Non-int BB .301
HBP .328
BB+HBP .304
IBB .157
ROE (no sac) .492
S .452
D .764
T 1.065
HR 1.394
Suicide squeeze att -.181
Non-squeeze .002
Sac first att -.160
Sac first/second att -.092
Sac second att -.108
No Sac first .014
No Sac first/second .038
No Sac second -.008
SB .174
CS -.447
MGL's superLWTS (March 10, 2004)
Posted 10:11 p.m.,
March 14, 2004
(#18) -
MGL
I'm using too high a number (too negative) for a marginal GDP. The -.567 should be like -.466, as the value of a single out with a runner on 1st and no DP is -.356. (These numbers are all NL 2000-2003.) I was using the difference between the GDP out (-.822) and value of the GB single out (-.268) as the value of a marginal GDP, rather than the difference between a GDP out and all single outs (with a GDP opp). So my GDP Superlwts numbers are around 20% too high (negative and positive).
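The correction is just a change of baseline, which a couple of lines make explicit. The values are the NL 2000-2003 figures quoted in this thread; the names are invented for the example.

```python
GDP_OUT = -0.822                # lwts value of a GDP (from the post)
SINGLE_OUT_WITH_DP_OPP = -0.356  # any single out, runner on 1st, no DP (from the post)
OLD_MARGINAL = -0.567            # value previously used (from the post)


def marginal_gdp_value(baseline_out=SINGLE_OUT_WITH_DP_OPP):
    """Extra cost of a GDP relative to an ordinary out in the same situation."""
    return GDP_OUT - baseline_out


corrected = marginal_gdp_value()
overstatement = OLD_MARGINAL / corrected - 1.0
print(round(corrected, 3))
print(round(overstatement, 2))
```

The ratio of the old value to the corrected one is where the "around 20% too high" comes from.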
Also, keep in mind that the lwt values (of all the events) are based on the average RE's of the before and after bases/outs states. IOW, it doesn't take into consideration that SB's are attempted more often by the faster baserunners and from certain lineup slots. As I said, it just uses the average RE's of the before and after states, which isn't quite right, but is close enough. I definitely don't use the actual runs scored after a certain event (SB or CS) and subtract that from the "before RE's," although I suppose I could and that it might be more accurate...
Silver: The Science of Forecasting (March 12, 2004)
Posted 3:38 a.m.,
March 13, 2004
(#11) -
MGL
Surely, this approach is more complicated than the standard method of applying an age adjustment based on the 'average' course of development of all players throughout history. However, it is also leaps and bounds more representative of reality, and more accurate to boot.
This is an unbelievably presumptuous statement, especially the last sentence. How in the world do we (or even they) know that it is "more accurate" or "leaps and bounds more representative of reality?"
There is something insidious about making claims or even writing "academic style" articles about a methodology that is a "black box." I'm even tempted to say something like "I think your forecasting method stinks," and if they want to prove it doesn't (stink) they need to put up (tell us the methodology) or shut up.
You simply can't defend something when you won't tell anyone exactly how it works. And as I said, I sincerely don't think you should be making too many claims about how great something is when no one knows how it works. Not only do we have an absolute right to be skeptical, we have an absolute right to reject those claims summarily.
Given that, it sure sounds like Pecota does one hell of a job in its forecasting methodology, but we have absolutely no idea whether it does what it says it does, or whether any of their impressive claims even has any merit.
I am especially skeptical of the whole "similarity score" thing. One, once you start using 5 or 10 or 15 "similar" players in order to forecast a player's future performance, you run into huge sample size problems. Kind of like their 6'4" catcher study (on Mauer). That was indeed a ridiculous assertion they made about Mauer based on a handful of other 6'4" catchers (and not 6'3" ones, or 6'5" ones, if there were any), as Gleeman properly pointed out. The problem with their "conclusion" was threefold - one, the small sample size, two, the elimination of other useful data (e.g., 6'3" catchers), and three, the assumption that height means anything at all in terms of a catcher's career value (maybe it does and maybe it doesn't). Getting back to their "similarity score" methodology, the second problem is that, do we know that a player's projection is a function of the "type" of player he is, independent of what an average overall player with the same historical stats would be projected at? Before I started sacrificing sample size and used similar players to do my forecasting, I would want to be darn sure that "player types" are significantly related to a player's projection notwithstanding his historical stats and the usual Marcel-like projections. I have never seen anyone show that to be true, which is why I think the statement that "it is also leaps and bounds more representative of reality, and more accurate to boot" is incredibly presumptuous...
Silver: The Science of Forecasting (March 12, 2004)
Posted 4:21 p.m.,
March 13, 2004
(#18) -
MGL
There is one thing that I would also like to know. Of what practical use is knowing the "reliability bands"? I don't mean to imply that there are none, only that nothing obvious jumps out at me.
Take the most extreme cases. Player A (a pitcher) has no history so we assign the league mean (or league rookie mean or whatever) to his 50th percentile. Say that's a 4.50 ERA. His "band" would simply be the distribution of true talent among players with no history (again, say rookies), whatever that might be. Let's say that Player B has the same 50th percentile, but that he has lots of history such that his band is narrower. Which player would any given team want?
What about the same 2 types of players, but player A has a little history and that history is fantastic, such that his 50th percentile is 3.50, and player B is a veteran with a 3.50 mean projection. Again, of what practical importance are the "bands" to any given team?
Silver: The Science of Forecasting (March 12, 2004)
Posted 4:23 p.m.,
March 13, 2004
(#19) -
MGL
Tango, I would guess that the "similar players" have a lot to do with the "bands" such that it would not be obvious what those bands will be just by looking at historical playing time and projected playing time...
Silver: The Science of Forecasting (March 12, 2004)
Posted 4:21 p.m.,
March 15, 2004
(#32) -
MGL
Ballpark factors fluctuate like crazy from year to year, even indoor ones, like in Montreal. Unless you heavily regress AND use many years of ballpark data, you are asking for trouble. A ballpark factor never changes unless the park itself or the rest of the league changes. Montreal has not changed anything about their stadium since 1977 (I don't think off the top of my head). Last year, it "played" as a tremendous hitter's park. In 2001, it was a hitter's park as well. Over the long-haul, Montreal is a neutral, if not slight pitcher's park. In fact, I use .98 as the OPS park factor in Montreal (based on 10 years of component data). To use anything but neutral park adjustments for Montreal hitters (and pitchers) is crazy (when a park "plays" as a hitter's park, there is a teenie-weenie suggestion that the hitters are "suited" for that park).
I want to reemphasize that how a park "plays" in any given year has nothing to do with how you should adjust a player's stats in that year (other than the fact that the long-term park factor is changed by the addition of that year's stats). I would suspect that Pecota has the same type of "bias" for past and present KC players, as Kauffman has "played" as an extreme hitter's park for the last 3 years, even though the long-term park factors suggest that it is only a slight hitter's park (after the most recent changes in dimensions/turf).
I think that if we examine individual Pecota projections and even biases and patterns within all the projections, we will find a lot to criticize, as any "blanket" projection engine is going to miss the boat in a lot of individual cases, unless a human being goes through each projection with a fine tooth comb (maybe Pecota does that - I don't know). When the smoke clears, however, the fact that Pecota appears to be as good or better than most of the other forecast engines says SOMETHING about what it does.
Whether Pecota's "error bands" are doing anything at all (or whether they are all screwed up) can easily be checked I would think. If someone like Tango were to do some "generic" Q&D error bars for X number of players, based on projected playing time and amount of historical stats (the 2 major components to a "generic" error bar), we could easily compare these to Pecota's error bars for, say the 2003 projections, by looking at the variance in actual 2003 stats. For example, if Tango has 50 players who should all have around the same variance (width of error bars), but Pecota has half of them with a much narrower band and half with a much wider band, we can see what the actual variance of 2003 performance was in these 2 groups. If there is a significant difference (in the right direction of course), then it would suggest that Pecota is doing something really cool (and correct) with their "funky" error bands. In general, I suspect that 95% of the reason that Pecota does well is simply because it is a good Marcel-type engine. The other 5% MAY be due to their "secret, yet powerful" "similarity thing," although I remain skeptical that it is any help at all. I remain even more skeptical that their "error bands" have any useful meaning whatsoever...
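The check proposed above might look something like the following. This is only a sketch of the idea: the error values are invented placeholders (actual minus projected, in whatever unit the projection uses), the function name is made up, and a real test would also need some kind of significance test on the ratio.

```python
from statistics import pvariance


def band_variance_ratio(narrow_band_errors, wide_band_errors):
    """Ratio of forecast-error variance: wide-band group over narrow-band group.

    A ratio well above 1 would suggest the bands carry real information;
    a ratio near 1 would suggest they do not.
    """
    return pvariance(wide_band_errors) / pvariance(narrow_band_errors)


# Invented example data: players a "generic" method would give equal bands,
# split by whether the other system assigned them a narrow or a wide band.
narrow = [0.2, -0.1, 0.3, -0.4, 0.0]
wide = [0.9, -1.1, 0.5, -0.7, 1.2]

print(round(band_variance_ratio(narrow, wide), 1))
```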
Silver: The Science of Forecasting (March 12, 2004)
Posted 5:41 p.m.,
March 15, 2004
(#34) -
MGL
MGL, please remember that Nos Amours played 25% of their home games away from the Big O.
Yup, I keep forgetting that...
Silver: The Science of Forecasting (March 12, 2004)
Posted 11:57 p.m.,
March 15, 2004
(#41) -
MGL
I was actually asking the question, of what value is accurately estimating the bands? I don't see any obvious (significant) value, but surely there is some. I suppose that if you are the overwhelming favorite to win your division, you want narrow bands for your players, but I'm not even sure it matters much there. If you are DET (or TBA, et al.) with no shot at making the playoffs, it is not clear whether you want a high probability of simply improving (good players with narrow bands) a little or you want a shot at improving a lot but possibly not improving at all (average or good players with wide bands). I suppose that you could create an argument that if you have a low payroll, you are forced to go with cheaper players with wide bands and hope to get lucky. Again, I'm not real sure it makes all that much difference, and the issue here is whether Pecota is forecasting these bands any more accurately than "Marcel" (the "confidence band" version of Marcel) can do.
Surely Nate would have already tested all of these so-called innovative engines and algorithms. Wouldn't he have?
Silver: The Science of Forecasting (March 12, 2004)
Posted 1:47 a.m.,
March 16, 2004
(#43) -
MGL
Michael, that's basically what I said. What I am not sure of is whether it is going to matter more than an iota. There simply aren't going to be a whole lot of players who have markedly different error bands. Even when there are, you would still have to calculate how much one player is going to affect the team's distribution of wins and losses. Even with the extreme example you give (a SD of 1 versus .25), I'm not sure that it would have much impact on the team's chances of winning the division one way or another. More importantly, a team usually has a choice of pitcher A who is 4.0 with a SD of 1 run or pitcher B who is 4.5 with a .25 SD. Greater confidence usually comes with a price. I doubt that you could tell me off the top of your head which pitcher is better for the Yankees or even for the Jays...
Silver: The Science of Forecasting (March 12, 2004)
Posted 9:25 p.m.,
March 17, 2004
(#49) -
MGL
btw, AED, I used your basetball rankings for my bracket. Thanks.
I didn't realize that if you change (remove) 2 letters you can go from basketball to baseball. Is "basetball" some kind of new game? Like from the movie "baseketball"?
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 3:12 a.m.,
March 13, 2004
(#2) -
MGL
There's something obvious missing from this analysis -- the benefit of aggressive base running. If a team takes a lot of risks on the basepaths, they'll make more outs, sure, but they could also take more extra bases. The value of those extra bases is missing from the Cost columns above.
I'm shocked that this statement is presented as if it were an afterthought. There is absolutely no reason to think that extra outs on the bases is a bad thing (it might be, it might not be). It is entirely possible that the average team is way too conservative on the bases (that would be human nature - the psychological "dynamic" of baserunning is completely different from that of basestealing) such that teams with higher than average OS's may be running the bases MORE optimally. Just because it so happens that basestealers seem to run themselves into a near break even situation, that is not necessarily the case with baserunning. Getting thrown out on the bases, particularly at home, has such a stigma attached to it, that it is very possible (even likely) that the average team does not attempt to take the extra base nearly as often as they should. If that is the case, then the teams that were running the bases most optimally WOULD have the highest number of OS's. In fact, if Wolverton had presented the base stretching opportunities and the successful stretches as well as the OS's, you would see how conservative baserunners and coaches are (there are relatively few OS's compared to the number of opportunities and even compared to the number of successful stretches). I have long suspected that the average baserunner/team was MUCH too conservative on the bases, and this study does nothing to address that issue. In fact, it confounds it. Without some kind of analysis on the cost/reward ratio of a team OS versus opps and success rates, the data presented in the article does not pass the "so what" test...
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 11:29 p.m.,
March 14, 2004
(#5) -
MGL
Tango, you are quite optimistic as far as what a good saber could add to a team! You should be my agent! Baserunning, basestealing and closer usage are probably the most valuable things in terms of optimization for a team. The jury is still out on bunting, BTW. It is NOT true that the bunt is generally a bad play.
Optimizing a lineup is usually not worth that much either. Occasionally it makes a big difference, but usually one reasonable lineup is within a couple of runs of another.
As far as optimizing closer usage (by eliminating some of the low leverage innings the closer usually pitches, like 3 run leads in the 9th when at home, and adding some high leverage non-traditional innings, like tie games in the 8th and 9th, and using your closer for 2 innings more often), it is not clear to what extent "disrupting" traditional closer usage patterns might actually "cost" a team runs and wins.
As far as player acquisitions, trades, and valuations for salaries and contracts, that's where the "money" is. Then again, I believe that the market is much more efficient and teams are much smarter than they used to be. 10-15 wins is an awful lot of wins. I'm not even sure what you mean Tango. Do you mean that the average non-sabermetric team can increase their WE by 10-15 games on the average given the same payroll by using sabermetric analysis to evaluate players?
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 11:36 p.m.,
March 14, 2004
(#6) -
MGL
Wayward, I really have no feel for how sub-optimal baserunning currently is and how many runs/wins a team could add by optimizing it. If I had to guess, I would also say .5 to 1 win per year, which is a lot for one single thing. Also, any time you can optimize something and there are probably few if any hidden costs, it is a "gift horse." For example, with optimizing bullpen (closer) usage and even lineups, there are many potential hidden costs which are hard to put a finger on.
Contrary to what Wolverton says in his article, you CAN figure out to what extent teams are optimizing their baserunning or not, and if not, how much they are costing themselves, as well as what the proper baserunning strategy should be. It is a little tricky, but it can be done. That is one of the things that Tango and I are working on for our book, and should be fascinating (as well as the bunting and other stuff)...
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 4:22 a.m.,
March 15, 2004
(#8) -
MGL
Kyle, actually it's not that complicated. We use RE (run expectancy) as a proxy for WE (win expectancy) anyway. Early in a game they are about the same. Later in a game, we want to use WE exclusively. There is no particular reason why we can't use WE and WE tables (and we should). It is just that it is easier to work with RE and generate RE tables. Figuring what strategy is the most optimal in baseball (e.g., bunting or not) has nothing explicitly to do with "risk and utility functions." It is simply that alternative which yields the highest WE (best chance of winning the game). Of course, there are sometimes hidden costs when it comes to analyzing strategy alternatives. That is not the case with bunting, although there are some game theory issues which come up.
BTW, it is not the chance of "scoring 1 run" which is relevant in close games, it is the chance of scoring "at least 1 run." There is a big difference. For example, if we put a man on second with 1 out, on the average, a team will score exactly 1 run 23% of the time. With a man on first and 0 outs, it is 18% of the time. It might seem that the former is better than the latter in a tie game in the bottom of the 9th or later. It is not. We are not concerned about the "chances of scoring exactly 1 run." We are concerned with the chances of not scoring, or the converse, the chances of scoring at least 1 run. For that, it is 43% with a runner on first and no outs and only 40% with a runner on second and 1 out. So in the bottom of the 9th of a tie game, without knowing anything about the batter, pitcher, lineup, etc., we would NOT accept an offer to move a runner to second and take an out. The WE tables confirm this. With a runner on first and no outs in a tie game in the bottom of the 9th, the WE is .715 and with a runner on second and 1 out, it is .703.
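To make the "exactly 1 run" versus "at least 1 run" distinction concrete, here is a minimal sketch that ranks the two base/out states by each measure. The numbers are the ones quoted in the post; the dictionary layout and the function name are just illustration, not anything from an actual WE table.

```python
# Figures quoted in the post (league average, tie game, bottom of the 9th).
states = {
    "runner on 1st, 0 out": {"exactly_1": 0.18, "at_least_1": 0.43, "we": 0.715},
    "runner on 2nd, 1 out": {"exactly_1": 0.23, "at_least_1": 0.40, "we": 0.703},
}


def best_state(metric):
    """Return the base/out state with the highest value for the given metric."""
    return max(states, key=lambda s: states[s][metric])


for metric in ("exactly_1", "at_least_1", "we"):
    print(metric, "->", best_state(metric))
```

Ranking by "exactly 1 run" points the wrong way; ranking by "at least 1 run" agrees with the WE figures.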
That still doesn't tell us whether it is correct to bunt or not in the bottom of the 9th in a tie game or at any point in the game for that matter. First, we need to know what the average WE is after a bunt attempt, and of course that varies with the bunter, the pitcher, and how the defense is playing. Then we need to know what the alternative would yield - what the average WE is if the batter swings away. That is real complicated. We need to factor in the hitting skill (hitting projection) of the batter, the pitcher, and rest of the run environment (park, weather, etc.). On top of that we need to adjust that hitting skill for the fact that the defense is maybe (probably is) expecting the bunt. As it turns out, the final 2 numbers (WE bunting and WE not bunting) are awfully close, a lot closer than most sabers think, such that whether bunting is "correct" or not depends on a dozen factors or so and is not that easy to figure out.
The final "answer" will be in a "rule of thumb type table," that looks something like Tango's "walk Bonds or not" table - yes, no, or "use your intuition (flip a coin)," with the rules of thumb criteria being "good, bad, or average hitter, lineup slot, good, bad or average pitcher, low, med, or high run scoring environment (park, weather), good, average, or bad bunter (speed/skill of bunter), position of defense (expecting a bunt or suspecting a bunt), and inning/score of game"...
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 9:55 p.m.,
March 16, 2004
(#17) -
MGL
MGL, last summer you were arguing pretty strongly for optimizing lineups. What's changed?
Sometimes it makes a big difference, sometimes it doesn't. Sometimes it makes a small difference and sometimes it doesn't. Not much more I can say. Without looking at a year or 2 worth of team lineups and then comparing them to an "optimal" lineup, I can't even begin to say with any certainty how sub-optimal lineups are on the average. If I had to guess, I would say that the average lineup is less than 5 runs per year sub-optimal. That doesn't mean that there isn't an occasional really bad lineup. Some things just have to be looked at on a case by case basis...
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 9:28 p.m.,
March 17, 2004
(#22) -
MGL
AED, I am surprised also. Can you tell us the 5 teams and the lineups you optimized? I'd like to check it on my sim as well. I think the bullpen usage (I assume you pretty much mean the closer) is high also (3 wins). Can you tell us how you arrived at that?
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 6:52 p.m.,
March 18, 2004
(#29) -
MGL
From playing with lineups on my sim a million times, I would have to say (assuming that the answer lies in the sim) that "rules of thumb" don't work very well when it comes to individual lineups. For example, you can often put a poor, but fast hitter at the top of the lineup and you are still reasonably optimal. K. Robinson batting number one for the Cards works about as well as Renteria even though Edgar is vastly the superior hitter...
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 11:08 p.m.,
March 18, 2004
(#32) -
MGL
Rally, I don't understand your comments. Is the lineup you posted the optimal lineup (according to APBA)? Adds .06 rpg as compared to what? Switching Kennedy to 2 and Erstad to 9 from the lineup you listed? Or from what lineup? So the listed lineup is NOT the optimal one?
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 11:35 p.m.,
March 18, 2004
(#33) -
MGL
The key to using a sim to estimate optimal lineups in real life is having reasonable projections for each player. Without that, the sim is near worthless for optimizing lineups. Does APBA set the stats for each player or does the user? If the former, do they use a good projection engine? If it is the user, what do you use for the projections? Does it allow you to set each component or just OBP and SLG (or something like that)?
I ran your lineup thru my sim and it generated 4.978 rpg versus a typical RHP. I then used an internet version of the real projected ANA lineup, which is (the numbers are their lwts per 150 projection):
Erstad -13
Eck -19
Vlad 35
GA 2
Salmon 3
Glaus 23
Guillen -4
Molina -36
AK -14
This generated 4.867 rpg versus the same pitcher, a difference of 18 runs per 162, which is huge. Then again, I don't consider the above lineup to even be reasonable (having 2 of your worst hitters hitting 1/2), although I'm sure that Scioscia does.
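For anyone checking the arithmetic, the 18-run figure is just the rpg gap scaled to a full season. A quick sketch (the function name is made up for illustration):

```python
def runs_per_season(rpg_a, rpg_b, games=162):
    """Scale a runs-per-game gap between two lineups to a number of games."""
    return (rpg_a - rpg_b) * games


# 4.978 rpg (posted lineup) versus 4.867 rpg (projected ANA lineup)
print(round(runs_per_season(4.978, 4.867)))
```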
I think this is a relatively rare example of a big difference between a manager's lineup and the optimal one (on paper).
Let's see if I can tweak your lineup to make it better...
I didn't try all the combinations, but I couldn't do any better...
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 3:18 a.m.,
March 19, 2004
(#35) -
MGL
Off the top of my head, it looks pretty ugly batting a good and fast hitter last and behind the only real lightweight in the lineup, but you never know...
The first lineup scores 5.178 rpg (versus a RHP) and the second scores 5.143, which is 3 runs worse per 120 games (typical number of games versus a RHSP).
Versus a LHP, the first lineup scores 5.617 and the second scores 5.658. The thing about optimal lineups is that there is usually one for RHP's and one for LHP's, especially if a team has several LHB's like Boston (and maybe different ones for GB or FB pitchers, a la Mike Gimbel)...
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 5:08 p.m.,
March 19, 2004
(#45) -
MGL
Don't know how much difference it makes, but if you use a Markov type sim, you have to at least include the speed of the players on the bases.
Mine is a "true sim" which actually "plays" the game, using log5 matchups between the batters and pitchers, pinch hits, plays the infield in, steals, errors, wp, etc. Don't know how much, if anything, all these "extras" affect the optimization of lineups. I assume that programs like DMB are also "true sims"?
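For readers who have not seen it, the log5 matchup mentioned above is usually written in its odds-ratio form. The sketch below is the standard textbook version, not the code from the sim described in the post, and the rates in the example are hypothetical.

```python
def log5(batter, pitcher, league):
    """Odds-ratio (log5) estimate of an event rate for one batter/pitcher matchup.

    batter  - the batter's rate for the event (e.g. OBP)
    pitcher - the pitcher's rate allowed for the same event
    league  - the league-average rate for the same event
    """
    num = batter * pitcher / league
    den = num + (1 - batter) * (1 - pitcher) / (1 - league)
    return num / den


# Hypothetical rates: a .400 OBP batter against a pitcher who allows .300,
# in a league where the average is .330.
print(round(log5(0.400, 0.300, 0.330), 3))
```

A sanity check on the formula: against a league-average pitcher, the batter's own rate comes back unchanged.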
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 2:23 a.m.,
March 20, 2004
(#47) -
MGL
AED, for a team that has more than one excellent short reliever (or more importantly, no one reliever who is a lot better than the others), they will pretty much use them optimally by default. IOW, in almost all high leverage situations, one good reliever or the other is probably going to pitch. The only time a team shortchanges itself in terms of sub-optimal use of their best reliever(s) is when they put in a reliever who is substantially worse than their best reliever in many high leverage situations (and presumably "waste" their best reliever in many low leverage situations - like ahead by 3 runs in the 9th at home). For example, when the Yankees had Rivera and Wetteland, they pretty much had a great reliever pitch in all high leverage situations. Ditto for Wagner and Dotel on the Astros and probably some other teams as well.
As far as I can tell, the biggest gain from a non-traditional use of the closer comes from using him in the 8th or the 9th in tie games and then using him for another inning if the game is still high leverage. If you aren't going to use him for 2 innings no matter what, then the gain from optimizing his use is not that great. What happens in that case is that a lot of the times you bring him in in the 8th in a fairly high leverage situation, you will not use him in the 9th in an even higher leverage situation. Some managers and pitching coaches will tell you that they don't like using their closer for more than one inning no matter what, so who knows....
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 3:03 a.m.,
March 21, 2004
(#48) -
MGL
The Diamondbacks had excellent relievers in Valverde (ERA=2.15), Mantei (ERA=2.62), and Villareal at times (ERA=2.57). Yet the bulk of the work in tie games went to Oropesa (ERA=5.81). Granted that this may be an especially egregious case, and part of Valverde's low ERA was due to batting average on balls in play, but it isn't the case that managers with 2+ good short relievers will have close to an optimal bullpen usage.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 10:37 p.m.,
March 25, 2004
(#54) -
MGL
You mean if your scouts told you that Valverde was ready to come up mid-season you wouldn't have promoted him? I don't believe it.
I'm not sure what you mean. My projection for him going into 2003 was not very good. His MLE's showed a pitcher with a great K rate and a horrendous BB rate. Everything else was OK. Plus he only had 52 IP's in AA and AAA prior to 2003. The key to his success was clearly going to be his control. I assume he has excellent stuff (I don't recall seeing him, but his K rate is so high, he must). His control was much better in the majors last year, but there is still room for improvement there. We expect his "hits allowed" to regress quite a bit in 2004 (DIPS) as they were ridiculously low in 2003.
The answer to your question is that:
1) It depends on what his stats at mid season looked like. I would simply update his projection.
2) Once I had a scouting report on him, I could tweak the projection (basically adjust the "mean" towards which I regress).
3) I would have to ask the scouts about his control - e.g., has someone been working with him on his mechanics to improve his control?
4) I have no problem deferring to the scouts on a young prospect, as my projection is not going to be very reliable in the first place, especially if I have only 52 innings of history (MLE's), and I cannot "project" development in pitchers.
Why did you ask about "mid-season?" That is what confuses me about your question.
Park Factors (March 18, 2004)
Posted 6:48 p.m.,
March 18, 2004
(#5) -
MGL
Tango, as you know, I now use park factors that are even more granular than Voros' (or anyone else's) component factors. I use home runs to left, right and center, GB "speed" factors thru the IF, foul out factors, IF hit factors, bunt single factors, etc.
How much do you think this addresses the issues you are talking about? I would think a lot.
Park Factors (March 18, 2004)
Posted 11:03 p.m.,
March 18, 2004
(#8) -
MGL
How can you possibly calculate a LH Pac Bell park factor that includes 15% Bonds PAs, and then apply that PF to Bonds himself?
Actually, there is nothing wrong with that at all, given a large enough sample. With large enough sample, any given player's actual home/road splits ARE his exact PF's.
Are your HR factors a function of "long fly balls hit", or a function of HR hit? This makes a HUGE difference
You make an assumption which it turns out is not true! I've done it both ways and there is very little difference.
We've had this debate a million times before, but once you use many years of data and regress, there simply isn't that much difference between one method or another (for example whether you use the odds ratio or the "rate" method).
The bottom line is that you adjust as best as you can for the dimensions, altitude, prevailing weather, foul territory, lighting, and playing surface of a park and you apply that to the appropriate components for a player and if you are "conservative" you simply end up with a better "number" than when you started...
Park Factors (March 18, 2004)
Posted 5:25 p.m.,
March 19, 2004
(#14) -
MGL
You can take Yankee Stadium from 1920 to 1931, and I'd guess that over 50% of LH HR were hit by 1 guy. If, for example, Ruth hit 300 HR at Yankee Stadium and 200 away, and his LH teammates hit 300 HR at Yankee Stadium and 400 away, you would conclude that Yankee Stadium has no HR PF for LH.
To some (perhaps a large) extent, that's the result you want! While individual players do indeed have unique park factors, they also don't have unique park factors, such that we gain useful information (sample size) about every player by combining data from all players. Using your argument, we wouldn't want to combine the data from 100 different players since each of those players has their "own" park factor. If you are going to think of park factors that way, then you might as well not do them at all, other than using a player's own home/road splits and then regressing, or just using a player's road stats (adjusted for HFA) as a park neutral estimate of his true talent. Which actually brings up an interesting question. For a player who plays in an arguably "unusual" park, and has a long history (say more than 15 years of data), which is a better estimate of his true park-neutral talent, his road stats only adjusted for HFA or his total stats with the home stats park adjusted (using some crappy adjustment formula)? Plus, the larger the share of the data in a certain park that one player comprises (like HR's, Ruth and Yankee Stadium), the more the park factor is simply that player's PF, and the more that player's park adjusted stats are really his road stats only. Remember that if we use a player's own non-regressed splits to adjust his own home stats, we simply get his road stats (and end up completely ignoring his home stats, which can't be right unless we have a huge sample of data for that player)....
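Both points can be checked with the hypothetical Ruth numbers from the quoted question. This is only a sketch: it uses a raw home/road ratio as the "park factor", assumes equal opportunities at home and on the road, and does no regression at all.

```python
def park_factor(home, road):
    """Raw, unregressed park factor: home total divided by road total."""
    return home / road


# Hypothetical HR totals from the quoted example
ruth_home, ruth_road = 300, 200
mates_home, mates_road = 300, 400

# Pooling Ruth with his LH teammates: 600 at home, 600 away
pooled_pf = park_factor(ruth_home + mates_home, ruth_road + mates_road)

# Using only Ruth's own splits as his personal park factor
ruth_pf = park_factor(ruth_home, ruth_road)
ruth_home_adjusted = ruth_home / ruth_pf

print(pooled_pf)           # pooled factor shows no HR effect
print(ruth_home_adjusted)  # equals his road total, so his home stats are ignored
```

With his own unregressed factor, Ruth's adjusted home total collapses to his road total, which is the "you end up with road stats only" point.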
Park Factors (March 18, 2004)
Posted 9:24 p.m.,
March 21, 2004
(#18) -
MGL
The other Kurt, yes, the sample size is an issue, but the pitcher/batter/defense bias is not. Pitchers and fielders have very little influence on park effects as compared to hitters. So yes, if there were one or two players who dominated a component of a particular team's offense, it would NOT be a bad idea to just use the visiting team's data. The problem with using only the visitor's stats is that you would have road stats in a park compared to home stats for the rest of the league. So the ratio or odds ratio or whatever method you used to calculate the park factors would end up having the HFA built in to them, which would be extremely problematic. You actually have a similar although not quite so dramatic problem with regular park factors (when you use both teams' stats), and that is if a park has a quirk about it that gives it a greater or lesser than average HFA, that gets built into the regular park factors as well. In fact, you can create 2 park factors for each park - one for the home team and one for the visiting team. That is definitely true for Coors Field. IOW, park factors and HFA are inextricably related...
MGL - Questec and the Strike Zone (March 20, 2004)
Posted 8:54 p.m.,
March 20, 2004
(#6) -
MGL
Hey, this is a Questec thread Jim! No, UZR does not account for shifts. It assumes everyone is playing a normal position, but of course the baseline percentages are based on an average configuration for all fielders, which includes all the shifts, so that a player that never shifts or a player that shifts more than average will have a UZR that is slightly screwed up...
MGL - Questec and the Strike Zone (March 20, 2004)
Posted 11:13 p.m.,
March 20, 2004
(#8) -
MGL
Plus, Bonds' defense and baserunning is starting to hurt his overall value. I expect that it will be even worse this year. Amazingly, his SB/CS totals were 7/0, a better net than some of the so-called best basestealers in the game...
MGL - Questec and the Strike Zone (March 20, 2004)
Posted 12:20 a.m.,
March 23, 2004
(#11) -
MGL
There's just one problem with that theory. Based on your data, it appears that the umpires were able to make that transition successfully in 2001 in the future QuesTec parks but it took them another year or two in the other parks
FJM, I will post the ratios for each park in each of the 4 years if you like (home team hitting only or both teams?), but you are not reading the data correctly.
Here's what the data is "saying":
In 2000, the soon-to-be Questec parks/teams had a lower ball-to-called strike ratio for whatever reasons (park effects, their hitters). Ditto for walks and K's. The ratio of the soon-to-be Questec parks' ball-to-called-strike (b-t-cs) ratio to the league's b-t-cs ratio was around .90 (.891).
In 2001, the strike zone changed. In the league as a whole, the b-t-cs ratio dropped. It also dropped in the Questec parks as we would expect. The Questec/league ratio is about the same (.875 - not statistically different than .891).
In 2002, it drops again in the league even though the strike zone has not changed again. I suspect that more umpires are calling the new strike zone (in all parks of course). What happens in the Questec parks now that Questec cameras are installed? It goes up a little. Were the cameras not installed, we would have expected it to drop again on the Questec parks like it did in 2001. The difference between the expected drop and the actual increase may be due to the camera's influence on the umpires. That's the thesis of the article/study.
In 2003, nothing weird goes on at all. The league-wide ratio drops a little again, maybe just a fluc or maybe even more umpires are calling the higher zone. Drops a little in the Questec parks as expected. There doesn't appear to be a "delayed effect" with the Questec cameras (more umpires being influenced by it in the second year of Questec than in the first).
Were you not aware that Questec was installed in 2002? Maybe that is where the confusion lies. Otherwise there is nothing in the data to contradict my theory of the b-t-cs ratio getting smaller in 2001 and then again in 2002, in both Questec and non-Questec parks...
MGL - Questec and the Strike Zone (March 20, 2004)
Posted 8:03 p.m.,
March 23, 2004
(#15) -
MGL
You are assuming that there is something inherently different about the future QuesTec parks, as evidenced by the disparities in Ball-to-called-strike ratios between them and MLB as a whole in 2000 and 2001.
Now I see what you mean. The whole point of my doing it the way I did it (first comparing the non-Q parks to the league as a whole before Q was installed) is because I assumed that there will be considerable real differences among parks. I still think that is the case. In fact, I'll bet bottom to dollars (or whatever the expression is) that the 98 and 99 data will be similar to the 00 and 01 data (the Q-parks will have a much lower b-t-cs ratio than the non-Q parks). If that is the case, your theory or anti-theory kind of gets blown out of the water. You are obviously saying that the 00 and 01 Q-park numbers are a fluke and that the 02 and 03 ones are "correct." To some extent Occam's razor applies, but not if it is true that parks (and remember also that in my study I did not control for all the players as BP did - the Q-park data includes not only the "parks" but the home hitters as well) have very different true b-t-cs ratios. If that is the case, then it is probably just as likely that 00 and 01 are a fluke as it is that 00 and 01 represent real "strike zone" park factors for the Q-parks as a whole...
MGL - Questec and the Strike Zone (March 20, 2004)
Posted 9:23 p.m.,
March 23, 2004
(#16) -
MGL
FJM, I didn't have home/away balls and strikes for 1999, so I switched everything to both home and road teams in the Q and non-Q parks. I went back to 1993 (so last 10 years). There is going to be lots of noise due to the fact that the pitchers and hitters on the Q and non-Q teams are going to influence the ratios. Hopefully the data will still show some kind of a pattern.
Year/b-t-cs in Q parks/b-t-cs in league/ratio of one to the other
93/2.41/2.69/.894
94/2.43/2.57/.944
95/2.38/2.62/.906
96/2.37/2.48/.955
97/2.28/2.31/.986
98/2.26/2.32/.975
99/2.39/2.44/.980
00/2.32/2.45/.948
01/2.12/2.25/.942
02/2.20/2.20/1.00
03/2.12/2.13/.996
Although it is nothing definitive, there appears to be a pattern prior to 2002 of the Q-parks having a smaller b-t-cs ratio than the rest of the league, presumably because of "park factors." Starting in 2002, however, the b-t-cs ratio in the Q-parks appears to be around the same as the rest of the league, suggesting, as was my original thesis, that Questec is indeed causing umpires to call a smaller strike zone.
In 8 years prior to Questec, the b-t-cs park factors for the Q-parks combined is .9485. In the 2 years after, it is .998. I would guess that that is a statistically significant difference, but I'm not sure. Again, we have the problem that I didn't control for the fact that we have a biased sample of pitchers and hitters each year in the Q-parks, so who knows. But there certainly is a strong inference that Questec is influencing the umpires' calling of the strike zone, which seems quite logical to me. I would have been shocked if that were not the suggestion from the data. Why would umpires want to be chastised by MLB and put their jobs or assignments in jeopardy if they know that they are being "rated" in some parks and not in others and they know (or think) that the Questec machines do not consider pitches "on the black" strikes, assuming that is the case?
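The before/after averages can be reproduced from the ratio column of the table above. The sketch below takes unweighted means of the listed ratios: the 1993-2000 rows give the quoted .9485, adding 2001 moves it only to about .948, and 2002-2003 give .998. It is not a significance test, and the variable names are just for illustration.

```python
# (year, b-t-cs in Q parks, b-t-cs in league, ratio) as listed in the table
rows = [
    (1993, 2.41, 2.69, 0.894),
    (1994, 2.43, 2.57, 0.944),
    (1995, 2.38, 2.62, 0.906),
    (1996, 2.37, 2.48, 0.955),
    (1997, 2.28, 2.31, 0.986),
    (1998, 2.26, 2.32, 0.975),
    (1999, 2.39, 2.44, 0.980),
    (2000, 2.32, 2.45, 0.948),
    (2001, 2.12, 2.25, 0.942),
    (2002, 2.20, 2.20, 1.000),
    (2003, 2.12, 2.13, 0.996),
]


def mean_ratio(first_year, last_year):
    """Unweighted mean of the listed Q-park/league ratios over a span of years."""
    ratios = [r for year, _, _, r in rows if first_year <= year <= last_year]
    return sum(ratios) / len(ratios)


pre_8 = mean_ratio(1993, 2000)    # the eight years behind the quoted .9485
pre_all = mean_ratio(1993, 2001)  # every listed year before the cameras
post = mean_ratio(2002, 2003)     # the two Questec years

print(round(pre_8, 4), round(pre_all, 4), round(post, 4))
```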
MGL - Questec and the Strike Zone (March 20, 2004)
Posted 1:04 a.m.,
March 24, 2004
(#18) -
MGL
FJM, how is looking at the park by park data going to resolve anything? All you are going to see is lots of noise. BTW, you would expect to see all kinds of differences in ratios between home and road teams batting (or pitching) for 2 reasons: One, sample size, and two, you have a completely different subset of pitchers - one is the home team(s) only and the other is all the rest. I think we need to put this one to sleep...
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 11:59 p.m.,
March 22, 2004
(#15) -
MGL
Rally, Davis and Tango are right on the money. As Rally correctly pointed out, we would expect to see a decline in performance from any excellent player at an early age (or at any age for that matter) because of regression to the mean. That in no way implies that these players' true talent peaked early. More than likely these players' true value peaked at 26 or 27, just like everyone else on the average. In fact, if you look at the results after the selectively sampled year (a good player at age 25), we see a very nice and normal aging pattern. So yes, the patterns shown in the article look exactly like what we would expect. Namee and/or the rest of the HT staff should have known this!
The bottom line is that Chavez projects to be one of the best players overall in baseball (according to Superlwts projections) for the next several years. If ever there were a player who deserves that kind of contract, surely it is Chavez. His Slwts projection for this year is 36 (per 150). The year after that, it will be around 34 (depending upon what he does this year of course). Those numbers are more than 5 wins above replacement, which is conservatively worth 10 mil per year. Now, it may not be "correct" for a low budget team to sign any player to a lucrative long-term free-agent kind of contract. In fact, it may be correct for a low budget team to spend 40-50 mil per year on payroll and shoot for an 85 win season. I don't really know. From a P.R. perspective, however, whereby a team has to project to the fans that they are at least trying to win a championship rather than just make as much money as possible, they can't just trade every good player who is going to command big money. That just won't fly. The reason they did that with Tejada (I suspect) is that they knew that he was not worth nearly the money he was asking for (and signed for). They handled it well of course. They didn't say "We really don't think Tejada is worth all that much." They just politely said that they couldn't afford him at this time and wished him luck. With Chavez though they realize that he IS worth the 10 or 11 mil a year and if they have to sign someone for big money every once in a while, this was a great opportunity, as he is one of the few superstars who are worth their contracts (Pujols may be another one)...
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 12:00 a.m.,
March 23, 2004
(#16) -
MGL
Oops, sorry, that should be David and not Davis of course...
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 2:03 a.m.,
March 23, 2004
(#19) -
MGL
As far as Chavez' so-called consistency, not only are Tango's comments correct about the "silliness of consistency" in a player's sample stats (so-called "consistency" is merely the convergence of luck and [probably] a reasonably constant level of true talent), but GPA, like OPS, is a "rough approximation" of a player's offensive value. It is literally a coincidence that Chavez' GPA was almost exactly the same in 2000 as in 2001-2003.
Here are Chavez' park and opponent adjusted offensive lwts (MGL style), per 150, which include everything but the kitchen sink (but does not include GDP and SB/CS numbers). By the kitchen sink, I mean they do not include IBB's or sac bunt attempts, they give credit for ROE's, and they give different credit for GB and FB outs based on handedness.
2000 +9
2001 +24
These are taken right off the Superlwts file (at least my version).
So much for consistency (in 2000 and 2001 - actually he was very "consistent" in 2001-2003).
BTW, if A has a lwts per 150 for the last 3 years of +10, +11, and +9, and player B's is +22, -1, and +9, for the same number of PA's, what does this tell us, assuming that both are around the same age and have been healthy?
One, player A is slightly more likely to have had around the same true talent over the 3 years.
Two, and I'll put this in the form of several questions, are their projections any different (the year by year weightings not withstanding)? Are their error bars (confidence intervals) any different, assuming the same amount of historical PA's? Should we treat one player differently than the other? Should we "care" that one has been more "consistent" in sample performance than the other?
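To put numbers on the A versus B comparison, here is a small sketch using the two lwts lines given above. It uses a plain equal-weight average (the year-by-year weightings are set aside, as in the post) and the sample standard deviation as a rough measure of "consistency". It does not settle the questions posed; it only shows that the two samples average out the same.

```python
from statistics import mean, stdev

player_a = [10, 11, 9]    # lwts per 150, last 3 years
player_b = [22, -1, 9]

for name, seasons in (("A", player_a), ("B", player_b)):
    # Same average for both; only the year-to-year spread differs
    print(name, "mean:", mean(seasons), "sd:", round(stdev(seasons), 1))
```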
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 2:18 a.m.,
March 23, 2004
(#20) -
MGL
As far as why Chavez and not Tejada, yes Crosby, Tejada's replacement, appears to be a terrific hitter, but more importantly, here are their respective Superlwts numbers from the last 3 years, as well as their 2004 Superlwts projections:
              Tejada  Chavez
2001               9      61
2002              27      40
2003              18      41
2004 (proj)       16      36
It is not even close who is by leaps and bounds the better player (as I said, I think that the A's knew that). That's not even considering the age difference...
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 1:49 p.m.,
March 23, 2004
(#25) -
MGL
I think that any "age adjustment/progression" algorithm breaks down for someone that old. I would have to think that sharp decline, injury, or just plain ole voluntary retirement will pretty much be the end of Bonds' career within a year or 2. I am also gonna guess that his defense and baserunning will take a serious chunk out of his offense (which will also decline quite a bit, if they stop issuing the IBB so darn much) this year...
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 1:11 a.m.,
March 24, 2004
(#30) -
MGL
Rally, the SB/CS lwts are meaningless. I'm sorry I made a separate category for them. The y-t-y correlation for SB/CS net runs (lwts) for major league players is a big fat zero. As far as "regular" baserunning lwts, there is only so much you can cost your team (see Edgar Martinez), even in a wheelchair. I'd put the over under on Bonds' 2004 UZR at -15.
Even if Bonds only retired when he couldn't play at the level of a replacement player, I'd say he has about a 5% chance of playing till he is 50 and that is generous. Yes, I'm speculating, but I think that the aging curve probably takes a nosedive after age 40 or so, and the chance of a career ending injury, which is not included in the aging curves, is enormous...
Sophomore Slumps? (March 23, 2004)
Posted 1:44 p.m.,
March 23, 2004
(#7) -
MGL
The article could have been written in 6 words. "Sophomore slump, regression to the mean"
I see nothing worthwhile in this article. For a large enough group of players (in order to smooth out the random fluctuations), any above average season is ALWAYS followed by a decline, and any below average season is always followed by an "improvement." Period! It doesn't matter whether you look at ROY players, MVP players, MVP runners up, best players on the team, silver sluggers, etc., etc., etc.! (The only caveat is that you have to balance regression with the age factor. For example, if your group of players were only slightly above average and they had an average age of less than 26, then you would expect about the same numbers in the subsequent year as the regression and the improvement with age would cancel each other out.)
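A toy simulation shows the effect with no "pitchers figured him out" story at all. Everything here is an assumption made for illustration: true talent and luck are both drawn from normal distributions with an SD of 10 runs per 150, talent is held constant from year 1 to year 2, and there is no age adjustment.

```python
import random

random.seed(0)
N = 20000

# Hypothetical population: fixed true talent plus independent luck each year
talent = [random.gauss(0, 10) for _ in range(N)]
year1 = [t + random.gauss(0, 10) for t in talent]
year2 = [t + random.gauss(0, 10) for t in talent]

# Select players on their year-1 results only
top = [i for i in range(N) if year1[i] >= 15]
bottom = [i for i in range(N) if year1[i] <= -15]


def avg(values, idx):
    """Average of values over the selected player indexes."""
    return sum(values[i] for i in idx) / len(idx)


print("top group:    year 1 %.1f, year 2 %.1f" % (avg(year1, top), avg(year2, top)))
print("bottom group: year 1 %.1f, year 2 %.1f" % (avg(year1, bottom), avg(year2, bottom)))
```

The top group declines and the bottom group improves, even though nobody's talent changed.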
RTM (regression to the mean) may be implied in the article (I agree that it is), but not specifically mentioning it is very misleading to all but the seriously math oriented and/or enlightened readers.
And how can any serious sabermetrician not know EXACTLY what to have expected, almost to the penny, in the second year! Tango, I, and many others could have told them almost exactly what the next year was going to foretell by simply using a one-year regression and then making an age adjustment.
If you really want to tell whether there is indeed a phenomenon whereby pitchers "figure out" good rookie hitters by the following year, it is very tricky. You would have to basically look at the "expected" sophomore year numbers (based on the aforementioned regression plus an age adjustment) and compare that to the actual sophomore numbers. Even then, there is so much "slop" that you are not likely to be able to come up with anything meaningful.
Hinske was an easy one BTW. In 2002, his rookie year, his Superlwts was +20 per 150. In 2000, his MLE Slwts (hitting only) was +2 and in 2001, it was -1. His "sophomore" Superlwts was +8, almost to the "tee" exactly what we would have projected him at!
I am surprised and concerned with the quality of the HT articles (I was kind with the last article linked here) so far...
Sophomore Slumps? (March 23, 2004)
Posted 5:30 p.m.,
March 23, 2004
(#13) -
MGL
Studes, I've always liked Aaron's work. There is a big difference between "soft" articles and misleading and inaccurate ones. The "sophomore slump" article is closer to the latter category, especially in light of J. Cross' comments. A sophomore slump has nothing whatsoever (a little exaggeration) to do with second year ROY's, and everything to do with "the next year after an excellent year, rookie or not..."
Sophomore Slumps? (March 23, 2004)
Posted 5:04 a.m.,
March 28, 2004
(#20) -
MGL
dsm, This "phenomenon" is nothing more than regression, and is not unique to rookies. As I (and others) said in an earlier post, you would see almost identical regressions in any year following an above average year for any large group of players.
The same thing by definition applies to pitchers, except that you will see more regression from one above average year to the next. Why pitchers regress more than hitters, we don't exactly know. Part of it is due to injury, but much of it is not (they regress a lot from bad injury-free seasons as well)...
The Scouting Report - Compared to UZR (March 23, 2004)
Posted 4:29 a.m., March 24, 2004 (#5) - MGL
I should know these off the top of my head by now, but the y-t-y "r" for batting lwts for players with around 550 PA per year is .675.
For UZR, it is based on UZR "opportunities" of course and not PA's or games. For around 400 opps (around the same number of games as 550 PA, or around 130 games, for an "average" fielder), the "r" is around .450. For a SS or CF'er, for 130 games (550 PA) per year, the y-t-y "r" is around .525. For 2B and left or right field, it is around .475, and for a 1st or 3rd baseman, it is around .350.
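As a rough illustration of what these correlations imply, the sketch below treats each y-t-y "r" as the fraction of an above- or below-average performance expected to carry over to the next season. That shortcut assumes a similar number of opportunities and a similar spread of results in both years. The dictionary keys and function name are made up for the example, and only the "r" values come from the numbers above.

```python
# Year-to-year correlations quoted above (roughly 550 PA or 130 games).
YTY_R = {
    "batting_lwts": 0.675,
    "uzr_average_fielder": 0.450,
    "uzr_ss_cf": 0.525,
    "uzr_2b_lf_rf": 0.475,
    "uzr_1b_3b": 0.350,
}


def expected_next_year(observed, r, mean=0.0):
    """Regress an observed value toward the mean by (1 - r).

    Assumes equal variance and equal playing time in both seasons,
    which is a simplification.
    """
    return mean + r * (observed - mean)


# A hypothetical +10 runs above average in one season at each skill.
for skill, r in YTY_R.items():
    print(f"{skill:22s} +10 this year -> {expected_next_year(10.0, r):+.2f} expected next year")
```

The lower the "r", the less of a single season's number you should believe, which is why one year of UZR at 1B or 3B says less than one year at SS or CF.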
The Scouting Report, By the Fans, For the Fans - 1B Report (March 24, 2004)
Posted 4:07 p.m., March 24, 2004 (#5) - MGL
Actually, the "DP" part of UZR is as a pivot man AND as the starter of the DP. So the 3-6-3 and 3-6-1 DP IS included in the first baseman's UZR plus DP. Actually, I don't know whether Tango is including the GDP for IF'ers (and arm for OF-ers). I don't think so. Scooping is NOT included in UZR for 1B'man (I wish that it were - perhaps next year)...
The Scouting Report, By the Fans, For the Fans - 1B Report (March 24, 2004)
Posted 10:27 p.m., March 24, 2004 (#8) - MGL
As is often the case, I think that the "value" of the 3-6-3 or 3-6-1 DP is much overrated. Basically, to calculate the DP portion of UZR, I take all the DP opps (runner on first and less than 2 out) and see how many DP's a fielder starts (say at 1B) and how many he is the pivot man on (for SS and 2B of course). For every "extra" DP (plus or minus) above or below expected DP's (based on the league average number of DP's per DP opp at that position), the pivot man gets "credit" (plus or minus) for half of the .45 runs (or whatever the average diff is between a DP and only one out) and the fielder that started the DP gets the other half "credit." Based on the low SD for first basemen (.6 runs), we have to pretty much conclude that the ability to turn the 3-6-3 or 3-6-1 DP is just not worth very much (probably because it is relatively rare despite what appears to be the case). Even the DP's for the SS and 2B, which include starting the DP AND the pivot, aren't worth all that much. FWIW, the y-t-y "r" for GDP defense is around .35 for an average IF'er per 125 games (obviously more for the SS and 2B and less for 3B and then again 1B). The y-t-y "r" for OF arms is around .4 for 130 games...
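The DP credit described above can be sketched roughly as follows. The .45 runs per DP and the half-and-half split between starter and pivot man come from the description. The function name, the sample opportunity counts, and the league rates are invented for the example, and the real calculation presumably includes adjustments not shown here.

```python
RUNS_PER_DP = 0.45  # approximate value of a DP over a single out, as quoted above


def dp_runs(dp_opps, dps_started, dps_pivoted,
            league_start_rate, league_pivot_rate,
            runs_per_dp=RUNS_PER_DP):
    """Runs above/below average on DPs for one fielder.

    league_start_rate and league_pivot_rate are the league-average DPs
    started and DPs pivoted per DP opportunity at the fielder's position
    (hypothetical inputs here).
    """
    expected_started = dp_opps * league_start_rate
    expected_pivoted = dp_opps * league_pivot_rate
    extra_started = dps_started - expected_started
    extra_pivoted = dps_pivoted - expected_pivoted
    # The starter and the pivot man each get half the value of an extra DP.
    return 0.5 * runs_per_dp * (extra_started + extra_pivoted)


# Made-up first baseman: 100 opps, 12 DPs started vs. 10 expected, no pivots.
first_baseman = dp_runs(100, 12, 0, league_start_rate=0.10, league_pivot_rate=0.0)
# Made-up shortstop: 200 opps, 20 started vs. 18 expected, 25 pivots vs. 24 expected.
shortstop = dp_runs(200, 20, 25, league_start_rate=0.09, league_pivot_rate=0.12)
print(f"1B: {first_baseman:+.3f} runs, SS: {shortstop:+.3f} runs")
```

Even a couple of extra DP's started is worth well under a run once the credit is split, which fits the small SD for first basemen.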
The Scouting Report, By the Fans, For the Fans - 3B Report (March 26, 2004)
Posted 4:45 p.m., March 26, 2004 (#1) - MGL
From a perfunctory glance, there may be a syndrome whereby the fans overrate an older player who was once an excellent defender, but is no longer. Edgardo Alfonzo and Castilla may be cases in point...