SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
This is a blast from the past. Many, many comments from Walt Davis, MGL, myself, Vinay, and a few others. Worth reading.
The original article can be found here.
My conclusion: I hate the way some people use MLEs because we have many issues to resolve...We should not treat the currently-published MLEs as a final product.
--posted by TangoTiger at 11:32 PM EDT
Posted 2:37 a.m.,
January 1, 2004
(#1) -
MGL
I am going to continue my discussion from the other thread to this one.
Here are some preliminary data:
In the first analysis, I looked at all players who played (at least 1 PA) in both the minors and majors in the same year. I used 3 years worth of minor and major data: 2001-2003. All of the minor league data is park and league adjusted. All of the major league data is not. For large enough samples, it shouldn't make much of a difference whether each set of data is park or league adjusted or not.
First I looked at AAA only. I wanted to see the ratio of normalized AAA stats to normalized major league stats for all players who played at both levels in any one year. If a player played at both levels in 2001, I compared his minor league stats in 2001 to his major league stats in 2001 only. If he played at both levels again in 2002, I compared his 2002 minor stats to his 2002 major stats only. IOW, it is as if they were two different players.
I looked at the following individual components: s, d, t, hr, bb+hp, so, sb, and cs. A "normalized" stat is simply a player's rate divided by the league rate. In the minor leagues, since the raw stats have been park and league adjusted, any player's normalized stats can be fairly compared to another's. For the major leagues, the stats are not park or league adjusted, so regardless of what home park a player plays in or whether he plays in the NL or AL, his normalized stats are his stats (rate-wise, per PA, of course) divided by the average NL and AL players combined. As I said, the fact that the minor stats are adjusted and the major ones are not should make little or no difference in this analysis.
There were a total of 1146 "players" who had dual playing time in at least one of those 3 years. I put "players" in quotes because if a player had dual time either on more than one team or in more than one of the 3 years, he was counted more than once. In other words, there were 1146 "pairs" of data in the sample group. Each pair contained a sample of minor league PA's and a sample of major league PA's from the same player in the same year.
For each pair of data, the average number of PA's in the minors was 183 and in the majors it was 113. Again, that is in one year only, and those are averages. Any given player looked at could have had 300 PA's in the minors in 2002 and only 3 PA's in the majors that year. They would still be included in the sample data. In any case, there were 1146 such pairs of data.
As with most analyses where you are looking at matched pairs of data, I weighted each element of each pair by the lesser of the two PA's. In other words, if in the first pair of data there were 100 PA's in the minors and 200 in the majors, the major stats AND the minor stats would be weighted by 100, the lesser of the 100 and 200.
All of the minor stats are averaged using these weights and all of the major stats are averaged using the same weights. For example, let's say we were using OPS and we had 3 data pairs:
OPS minor   PA minor   OPS major   PA major
.800        100        .700        200
.850        200        .800        150
.900        150        .850         50
The weighted average of the minor league OPS's would be .800*100 plus .850*150 plus .900*50, all divided by 300, or .842.
The weighted average of the major league OPS's would be (we use the same weights) .700*100 + .800*150 + .850*50, all divided by 300, or .775. In this example, the ratio between the minor and major OPS would be .775/.842, or .92. If those were actual numbers, they wouldn't really mean anything, since the OPS's in the minor leagues and in the major leagues would have to be normalized to their minor and major league averages before we took the ratio.
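To make the weighting concrete, here is a minimal Python sketch of the computation just described, using the made-up OPS pairs from the table:

# The "lesser of the two PA's" matched-pairs weighting described above,
# applied to the three illustrative OPS pairs.
pairs = [
    # (minor OPS, minor PA, major OPS, major PA)
    (0.800, 100, 0.700, 200),
    (0.850, 200, 0.800, 150),
    (0.900, 150, 0.850, 50),
]
weights = [min(min_pa, maj_pa) for _, min_pa, _, maj_pa in pairs]
total_w = sum(weights)  # 100 + 150 + 50 = 300
minor_avg = sum(w * p[0] for w, p in zip(weights, pairs)) / total_w
major_avg = sum(w * p[2] for w, p in zip(weights, pairs)) / total_w
print(round(minor_avg, 3))              # 0.842
print(round(major_avg, 3))              # 0.775
print(round(major_avg / minor_avg, 2))  # 0.92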
The total of the PA "weights" (if we added up all of the lesser of the two PA's in each pair) is 58,911. This is what you would use to calculate SD's for confidence intervals of the results.
The weighted average minor league component stats of all of these dual players were, per 500 PA:
Remember this is AAA only.
s, d, t, hr, bb, so, sb, cs, OPS
85.1, 26.6, 3.8, 13.4, 48.3, 82.2, 11.7, 5.2, .811
Again, these numbers are meaningless unless you know the averages in AAA.
The average AAA player had the following component stats:
82.5, 24.7, 3.4, 11.8, 45.8, 87.0, 9.5, 4.7, .760
As you can see, the average AAA player who plays at both levels in any given year is above average in each of those categories. In fact, here are their normalized (their stats divided by the league average) component stats:
1.03, 1.07, 1.10, 1.14, 1.05, .94, 1.24, 1.10, 1.07
Interestingly, the average player who plays at both levels (usually a "call-up") excels in both power and speed, but not as much in BB rate. Such a selection bias is probably not optimal, but it is not surprising.
Here are the average major league stats of the dual level players:
74.3, 21.0, 2.8, 10.6, 39.2, 99.9, 6.6, 3.9, .664
Again, these raw numbers mean little unless they are normalized to major league averages in that same year, so you can see how these "call-ups" did compared to the average major league player.
Here are those same component stats, normalized:
.94, .87, 1.11, .72, .86, 1.21, .84, 1.09, .87
Not surprisingly, they were well below average in almost every category but triples. They actually had a higher triples rate (per PA) than the average major leaguer (and their normalized triples rate actually went up slightly from the minors to the majors), which is not surprising since triples are mainly a function of age and speed. To really get an idea as to what's going on with triples, you need to convert the above triples numbers into a "per doubles and triples" rate rather than the per-PA rate as it is expressed above.
It is interesting that despite their young age and presumed superior speed, these call-ups really had their SB rate drop quite a bit and their CS rate go up. This suggests that it is much harder to steal bases in the major leagues, perhaps due to much better catcher arms and pitcher moves.
Finally, here are the normalized major league stats divided by the normalized minor league stats for all of the dual-level players. Again, because of the weighting system used, these should represent the observed drop-offs in performance (not the drop-offs in talent, because of the selective sampling issues that will be addressed in a later post and have already been discussed on the other thread) for these dual-level players, as a group:
.91, .81, 1.01, .64, .82, 1.28, .68, .99, .81
Again, the above is:
s, d, t, hr, bb, so, sb, cs, OPS
The most pronounced drop-off (36%) was in HR rate. That high number, as compared to the other drop-off rates, could be a function of severe selective sampling with HR's in that players with very high short-term (i.e. lucky) HR rates in the minors are more likely to get called up than players with high rates in other components. Again, we will get to the selective sampling issues later.
Finally for this installment, here are the same drop-off ratios (normalized major stats divided by normalized minor stats) for AA players who also played in the majors in the same year:
Here we have only 194 pairs of data. Each pair averages 143 AA PA's and 127 major league PA's. The total of the PA weightings was 6412.
Here are the AA to major ratios:
.84, .70, 1.07, .51, .84, 1.42, .96, .91, .74
Compare them to the AAA to major ratios:
.91, .81, 1.01, .64, .82, 1.28, .68, .99, .81
As you can see, in going from AA to the majors rather than AAA to the majors, there is a larger drop-off in every category but triples, SB, and CS, suggesting that these players are REAL fast and perhaps good basestealers. Remember how, for the AAA to majors players, the SB rate drops and the CS rate goes way up. Like with triples, since those SB and CS rates above are per PA and independent of one another, you would have to convert them to at least a function of one another to see what is going on. One reason for the really large drop-off in HR rate as compared to the other components might be the selective sampling issue again. The average AA dual-service player has a very high HR rate in the minors. The normalized HR rate of these players is 1.44, as opposed to only 1.14 for the AAA players. This suggests that the best or perhaps only way to get called up from AA is to hit lots of home runs. Again, the higher the rate in the minors, the more luck component there is in that minor league sample stat, and the more drop-off we should see in the majors due to regression alone, in addition to a change in the pitching level. IOW, that 49% observed drop-off in HR rate from AA to the majors may not be nearly that high when we adjust for the selective sampling of players who get called up.
Next time I am going to look at what is actually going on with this selective sampling issue and how we can perhaps account for it, such that we might arrive at some MLE coefficients which actually reflect the true drop-off rates for a player's hitting talent in the minors versus his hitting talent in the majors. What we have now reflects an observed drop-off rate from a biased sample (the minor stats) to a somewhat random sample (the major sample - although the number of PA's in the majors is going to be biased), which is going to overstate the true drop-off rates, since our sample of dual-level players tends to have been lucky in the minors, which they were.
While the above ratios will "work" (will predict a minor league player's major league stats) pretty well for a player with around the same stats in the minors as one of our average dual-level players, they will NOT work real well if our minor leaguer has stats that are way above or below the average player in our above sample...
Posted 3:15 a.m.,
January 1, 2004
(#2) -
MGL
Here are the AAA and AA ratios, when we do the same analysis, only this time we use a player's minor stats in one year and his major stats in the next year. If the selective sampling effects are around the same as with the same-year dual-service players, we would expect to see not as much of a drop-off with the following-year dual-service players, as they are one year older and presumably haven't reached their peak age yet (on the average). We have 764 matched pairs for AAA one year and majors the next year. The average PA's in the minors in one year for each pair is 248, and it is 178 in the majors the following year.
AAA minor-to-major ratios (minors in year x, majors in year x+1):
.94, .86, .99, .72, .85, 1.24, .79, .96, .86
Compare these to the ratios of the same-year dual players:
.91, .81, 1.01, .64, .82, 1.28, .68, .99, .81
There is in fact a smaller drop-off. Whether that is due to age (as we would expect) or the fact that perhaps the same-year dual-service players had to be luckier in the minors (to get called up in the same year), we don't know. Actually, this group of players had better minor league numbers than the same-year players, but they also had more PA's, so their true stats may be closer to their sample stats than the same-year players'.
For AA, here are the ratios for players who played in AA one year and the majors in the next year. There were only 271 pairs, with an average of 304 PA's in AA and 139 in the majors (the next year). Again, because of age, we expect a smaller drop-off, especially with these presumably younger players.
.94, .79, .97, .57, .81, 1.33, .81, .97, .80
Compare to the ratios for same-year AA players:
.84, .70, 1.07, .51, .84, 1.42, .96, .91, .74
Indeed, there is a smaller drop-off in most of the categories (interestingly, not in BB rate, but that could be sample error)...
Posted 9:14 a.m.,
January 1, 2004
(#3) -
Tangotiger
I have an issue with taking the lower of the two PAs.
What you should do, I think, is first regress each of the components by the number of PAs. THEN, you can make your comparison.
If you have 300 PAs in AAA and 100 in MLB, the spread of performance will be much larger in MLB. Dividing the AAA stats by 3 will still give you a spread that is much smaller than the 100 PA in MLB.
Therefore, first regress each of the AAA stats, and the MLB stats, based on their actual PAs. Then you can compare.
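A minimal sketch of that suggestion; the shrinkage form and the constant k below are illustrative assumptions, and the real amount of regression would be estimated separately for each component:

# Regress each observed rate toward its league mean before comparing,
# with the amount of regression depending on the sample size (PA).
# The k used here is purely illustrative.
def regress_rate(sample_rate, pa, league_rate, k=300):
    w = pa / (pa + k)  # more PA -> trust the sample more
    return w * sample_rate + (1 - w) * league_rate

aaa_est = regress_rate(0.170, 300, 0.165)  # hypothetical AAA component rate
mlb_est = regress_rate(0.150, 100, 0.148)  # hypothetical MLB component rate
print(round(mlb_est / aaa_est, 3))         # compare the regressed estimates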
Posted 12:45 p.m.,
January 1, 2004
(#4) -
MGL
Tango, I have to think about that. As I said, these are the preliminary results and NOT the best way to calculate MLE's. It sounds like you are saying that there are other problems besides the selective sampling issue. Are you saying that even if the sample of minor league players who also played in the major leagues were randomly chosen from the minor leagues, you still wouldn't like choosing the lesser of the two PA's (i.e., dividing the 300 PA stats by 3)? Isn't that what we do whenever we do any "matched pairs" studies? I don't think any regression is necessary on the major league side. I think only on the minor league side, and then only because of the selective sampling issue. For example, when you look at groups of hitters or pitchers from one year to another, like in your banner years study, don't you weight each year's results by the lesser of the two PA's, exactly like I did? If this is correct but for the selective sampling on the minor league side, then we are on the same page. If you think that regression must be done on BOTH the minor and major side, then I think I disagree. Also, after you do the regressions, on one side or the other, do you still weight both sides by the lesser of the two PA's?
Posted 6:07 p.m.,
January 1, 2004
(#5) -
MGL
Before I talk about why the coefficients generated above don't "work" for any and all players in the minor leagues, it's necessary to first discuss what the definition of an MLE is.
From Dan Szymborski on his web site:
One thing to remember is that MLE's are not a prediction of what the player will do, just a translation of what the major league equivalence of what the player actually did is. Dan S.
Even though you can't use an MLE directly as a prediction of major league performance, what's the point in having an MLE if you can't use it in a projection model? The answer, of course, is that once you translate a player's sample minor league stats into a "major league equivalency," you can then use it in whatever projection model you happen to like or use. In fact, the contention by Bill James and others is that using an MLE in a projection model is exactly as good as using actual major league stats in a projection model.
For example, according to that claim, if player A had an MLE OPS of .857 in 500 minor league plate appearances and player B had an actual major league OPS of .857 in 500 major league plate appearances, not only would their projections be exactly the same (assuming everything else about them were the same, or that we knew nothing else about them), but those projections would be equally accurate.
In my opinion, that claim is preposterous for two reasons. One, accurate park and league adjustments in the minor leagues, especially the former, are more difficult to do than in the major leagues, and two, no one knows for sure, or perhaps even close to "for sure," what the correct coefficients or multipliers are when doing the MLE's. As far as I know, Bill James and others simply use a "ballpark" figure of an 18% reduction in production from AAA to the majors and then apply that in some crazy way to each component of a player's minor league stat line. Surely this can't lead to a result (an MLE) that is anywhere near as accurate as a player's true major league stats. Also, as far as I know, the only justification for the "claim" is some crude test that James did (apparently others have tried to replicate it) to try and show how "accurate" MLE's were in predicting the next year's major league stats in a small sample of players, versus how "accurate" actual major league stats were in predicting the next year's major league stats in another small sample of players. He did something like look at the average value of the "delta" BA in each group, and when he found that it was about the same, he concluded that MLE stats were "just as accurate in predicting major league performance as actual major league stats."
Now, they may in fact be "almost as good," but they CANNOT be "just as good." The only way they could be "just as good" would be if we had a way to PERFECTLY compute an MLE. Practically speaking, which I'm sure is how James meant "just as good," I still don't think that you can say that, because of the inherent problems associated with the park and league adjustments in the minors, plus coming up with a method of calculating the MLE conversion numbers in the first place. We don't even know whether there is a linear relationship between minor league and major league performance, let alone exactly what that relationship is for each of the component stats. And we certainly don't know that each of the component stats is reduced by 18% in run value translation, rather than 15% for one stat, 20% for another, or some other combination of numbers, etc.
So while yes, a perfect MLE may be just as good as an actual major league stat in terms of predicting the same major league stat in the future, that reasoning is kind of circular. In fact, it's a given. It's like saying that a player projection is perfect if in fact, we use a perfect projection model!
Anyway, the goal in these posts is to use the data I have to try and come up with true and accurate MLE coefficients that can be used for any and all players, which is what MLE's are designed to do. I've already explained why the coefficients I came up with cannot be used on all players but CAN be used on players who have around the same minor stats as the average player in the groups studied above. However, even if we did that, our MLE coefficients would not represent the actual reduction in talent going from the minors to the majors; they could simply be used, sort of coincidentally, as a one-step projection model. In other words, they would both do a translation AND do a regression of the translated numbers all at the same time.
To prove how the above coefficients would work well as a one-step projection for some players but not for others, I will look at players who played in the minors and majors in the same year, exactly as I did above, but I will look at the years 98-00 rather than 01-03, so that the original coefficients are not a self-fulfilling prophecy. In other words, we want to test how well those coefficients work for certain groups of players in another sample.
First, we'll look at the same overall players (dual-service) in 98-01. Here are the minor/major coefficients after doing all the same adjustments and normalizations that I did above:
.91, .86, .98, .61, .83, 1.26, .66, .96, .82
Here are the same numbers from the 01-03 sample:
.91, .81, 1.01, .64, .82, 1.28, .68, .99, .81
Pretty darn close, which means that there is probably some very consistent relationship between minor and major performance. That means, one, James is right in that if we can come up with a perfect MLE algorithm, we can probably predict major league stats from minor league stats just about as well as from major league stats, and two, we should be able to come up with something pretty good, such that our MLE's should be pretty good at predicting major league performance - not "as good as" major league stats, but pretty good.
Getting back to why we can't use the above coefficients to either translate minor stats to MLE's or to predict major stats for ALL players, but we can for some players: here are the same coeff. for players who had HR rates around the average of all the players in our group of same-year dual-service players:
We are only concerned with the HR rate here.
.93, .83, .97, .65, 1.24, .72, 1.03, .83
As you can see, the 01-03 sample HR coeff. would have done a pretty darn good job of predicting major league HR rates for these players!
But what about for players who had either very low or very high HR rates? Let's look at the low HR rate group, and see if that .64 HR coeff. from the 01-03 sample group would have done a good job of prediction. These players had an HR rate of almost 1/3 of the average HR rate in our complete sample of dual-service players.
Here are their coefficients:
.87, .87, 1.01, 1.16, .86, 1.27, .66, 1.04, .89
Wow! These players hit MORE home runs in the majors. Clearly, if we used the .64 coefficient for HR rate on these players, we would have done a horrible job of predicting their major league home run rate! And don't forget that these are actual players who got called up - who played in both the majors AND minors in the same year sometime in 98-01. In fact, these players averaged 194 PA's in the minors (in one year) and 123 in the majors (in the same year). What the heck is going on here? I'll get to that in a little while, although you can probably guess.
Here are the coeff. for the high HR players. Their HR rate is around 50% higher than the whole dual-service group (since the entire group has a high HR rate to begin with).
.95, .85, .95, .51, .81, 1.25, .66, .87, .78
Well, using the .64 HR coeff. is not very good for this group either. If you did, you would overestimate their major league HR rate.
As you probably figured out already, there is no single coefficient that we can use to do a one-step prediction from minor to major, because the real two-step process is nowhere near linear.
As I already said, what we have to do with this data is to try and figure out how to come up with accurate coefficients such that they represent the true drop-off in talent from minors to majors. Once we know these, we can easily do the two-step process of projecting major performance from a sample of minor performance. The first step is the translation of the sample minor stats to an equivalent sample of major stats, and the next step is the same as we would do with any sample of major performance - regress those sample major stats according to the size (PA's) of the sample. Presumably each component would have its own regression rate.
Interestingly, and unfortunately, if our dual-service players were chosen at random - i.e., if players were called up from the minors at random or by lottery - our work would have been over a long time ago. The original coefficients, like the .64 for HR rate calculated from the 01-03 sample group, would be fine to use for translations for ANY and all minor league players. Those coefficients would be true MLE's. But alas, players are not chosen randomly from the minors to be called up to the majors, and they are not sent down (or kept down) randomly either, so we end up getting a "selective sample" of players in our dual-service groups, such that their sample minor stats, even though it is a large sample, are not anywhere near representative of the true talent of the group as a whole.
This last point is very important in baseball problems and in statistics in general. Tango alluded to it in his last post. If you have 1000 players chosen at random and each player only has 10 PA's, the average stats of those 1000 players in 10,000 PA's is going to be a very close approximation of the average true talent level of the entire group, because of the large sample and because they were chosen at random.
However, if we selectively sample a group of 1000 players - say, players who were good over some short period of time, like 10 PA's - even though we also have 10,000 PA's of data, the average of those 10,000 PA's is NOT, I repeat NOT, going to be a good estimate of the average talent of that 1000-player group! That is what is happening with our dual-service group.
The trick then is to try and figure out the average "true" minor stats of all the players in our group and look at their major stats. The ratios between the two are going to be good estimates of the true MLE coefficients. Tango suggests doing that on a player-by-player basis. I'm not sure that is necessary. I'll have to think about that and perhaps do some simulations to see what is the best solution to the problem...
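A toy simulation along the lines MGL mentions illustrates the point; every number in it is invented:

# Toy simulation of the selective-sampling effect: players "called up"
# on the strength of a hot 50-PA stretch look better than they are,
# no matter how large the selected group's total PA.
import random

random.seed(1)
true_rates = [random.gauss(0.100, 0.015) for _ in range(5000)]

sample = []
for p in true_rates:
    hits = sum(random.random() < p for _ in range(50))  # short trial
    sample.append((hits / 50, p))

hot = [(obs, p) for obs, p in sample if obs >= 0.14]    # "call-ups" only
print(round(sum(obs for obs, _ in hot) / len(hot), 3))  # observed average
print(round(sum(p for _, p in hot) / len(hot), 3))      # true-talent average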
Posted 7:30 p.m.,
January 1, 2004
(#6) -
Tangotiger
MGL, I think the problem with your HR example (high/low) is selective sampling. You chose the guys with really low HR rates.... well, a lot of them were unlucky, right? You would have been better off using, say, a low 2b+3b to 1b ratio as a proxy for "power", and then looked at the HR rates.
Posted 7:58 p.m.,
January 1, 2004
(#7) -
MGL
Tango, there is no problem! I was using that as an example of how you can't use the original ratios I came up with! Did you just skim my post?
Posted 9:41 p.m.,
January 1, 2004
(#8) -
MGL
One (usually) good way of figuring out the true stats of a selectively sampled group of players is to look at their previous and/or next year's stats. If the selective sampling only occurs in one year, then the stats of those players in the previous and following year should be more or less random (no selective sampling) and should reflect the true talent of that group of players. For example, if you selected a group of players in one year, based on any biased criteria (good year, bad year, etc.), and you looked at that same group the next year or the previous year, you would see the average stats of the whole group regress to their true talent level.
The only danger in this is that you have to watch out for natural selective sampling in the previous or next year, even though you didn't use any criteria to choose those players in the previous or next year. For example, suppose you chose the bottom (worst) half of all players in the NL in 2002 and wanted to know their true talent level, and you looked at those same players the next year. Unfortunately, the worst of those players probably did not have a next year, or had very few PA's in the next year, so that the next year's stats would be top-heavy with the better players; your estimate of the true talent of the players originally chosen will probably be a little high if you use their next year's stats as a proxy for that true talent. Of course, one way to minimize this natural bias (as opposed to a pure selection bias) is to do the "lesser of the two PA's" type weighting. For example, let's say that your original sample were 3 players with the following stats:
A 450 PA .700 OPS
B 400 PA .650 OPS
C 450 PA .600 OPS
Let's say the next year, these same player's stats were:
A 500 PA .730 OPS
B 300 PA .700 OPS
C 150 PA .680 OPS
Your year-two stats are top-heavy with the better player (the worst player didn't get that many PA's because he had the worst stats last year and he is probably the true worst of the three). You can mitigate or minimize that bias by weighting the two years by the lesser of the two PA's in each pair of stats. The weighted average of the year-one group is now .600 * 150 + .650 * 300 + .700 * 450, all divided by 900, or .667. The second-year weighted average is .730 * 450 + .700 * 300 + .680 * 150, divided by 900, or .712. So our estimate of the true talent of the group as a whole is .712. Basically, by using the "lesser of the 2 PA's" as the weighting factor for each year, year one and year two, we are minimizing the impact that one out-of-whack sample OPS can have if it is based on a very small sample, and at the same time we are accounting for the fact that the players in one year may not be represented in an equal or even proportion in the next year.
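In code, the same computation (reproducing the .667 and .712 above):

# The "lesser of the two PA's" weighting applied to the two-year example.
players = [
    # (year-one OPS, year-one PA, year-two OPS, year-two PA)
    (0.700, 450, 0.730, 500),
    (0.650, 400, 0.700, 300),
    (0.600, 450, 0.680, 150),
]
w = [min(pa1, pa2) for _, pa1, _, pa2 in players]  # 450, 300, 150
year1 = sum(wi * p[0] for wi, p in zip(w, players)) / sum(w)
year2 = sum(wi * p[2] for wi, p in zip(w, players)) / sum(w)
print(round(year1, 3), round(year2, 3))  # 0.667 0.712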
Surprisingly, when I looked at my data, I found that of the same-season, dual-service players, many of them also had plenty of minor league time the following year. I guess there are lots of players who get called up and then are sent back down and stay down for a while, or who get called up for a while and then start the next season in the minors again.
Here are the same-year dual-service (AAA and majors) players, this time in a 00-02 sample, who also played in AAA again the following year. What follows is their minor league normalized stats in the first year (good and lucky - that's why they were called up) and then their next year's minor league stats (presumably somewhat close to their true stats). Again, in both years, each player's stats are weighted by the lesser of the two PA's - either the PA's from year two or the PA's from year one, whichever is less.
Year one (these players played in AAA and majors in this year AND had time in the minors again in the next year):
.99, 1.04, 1.10, 1.14, 1.03, 1.01, 1.11, 1.08, 1.04
As you can see, these players are indeed better than average players. That is why they were called up at some point (or sent down I guess if they started the season in the majors - I make no distinction, just that they played at both levels in the same year).
Here is how the same players did the next year in AAA:
1.00, 1.04, 1.05, 1.11, 1.04, 1.00, 1.03, .99, 1.04
The ratio of the second year to the first year normalized stats is the following:
1.01, 1.00, .96, .98, 1.00, .99, .93, .92, 1.00
These numbers are actually the regression coefficients that we would use to convert the year-one normalized stats into their true stats, since we are using their year-two stats as a proxy for their true stats. Why does it look like little or no regression is needed? Two very important reasons: One, these are next year's stats, and since these players are all young on the average, we have not accounted for the fact that there is going to be a pretty big increase in talent level from one year to the next. Two, the players were selectively sampled in year one because they were better than average in that year and were chosen to be called up. We would typically expect players who were better than average in only one year or a partial year to regress quite a bit in the next year. But these players were selected for call-up not only because they had one good year; it is likely that they had good years before that and were good prospects in the first place. IOW, the teams can tell lucky players from good ones to SOME extent. IOW, this sample of players who get called up is not as lucky as we thought. Their one-year sample stats collectively are probably fairly close to their true stats, on the average. That, and the age thing, is why their next-year stats are very close to their stats in the selectively sampled year. We do see some regression in HR rate even with the age increase, and in triples, which is not surprising, as increased age probably means a lower triples rate (ditto for SB/CS rate), so the triples regression we would expect to see is not mitigated by an increase in age (ditto for SB/CS rate).
So the only thing that remains is to figure out how much these AAA players' stats increase from one year to the next because of age. That should be fairly easy. The quick and dirty way would be just to look at all AAA players from one year to the next and look at the ratio increase in each of the stats, doing the "lesser of the PA's" weighting to account for the fact that the better AAA players may not be in AAA the subsequent year (they may be in the majors). We should see an increase in everything except for triples, due to the age increase, with minor league players being on the lower part of the age curve (less than 28) on the average.
Another way to handle the regression thing is to use the above regression coefficients and apply them to the MLE coefficients computed when I looked at players who played in the minors one year and the majors the next. Those players already have a built-in age adjustment, just like with these regression ratios. I'll compare the two methods later.
First let's do the same analysis with AA players. We should see the same effect. Not too much regression, if any, except for triples, because of the age increase. The age increase effect should be especially pronounced since the AA players would be a little younger than the AAA players. On the other hand, AA players who get called up may be more lucky than AAA players, as the AAA players have more of a history for the teams to review than do the AA players. If a player tears up single-A in 250 PA's and then is tearing up AA in 200 PA's, he may get called up, whereas for a AAA player to get called up, maybe he has to tear up A, AA, and AAA. I don't know.
Anyway, here are the same stats for AA players who played in AA and the majors in one year and then again in AA in the following year. We don't have a huge sample size for these players, as not too many times did a player play in AA and the majors in one year and AA again in the next year. In fact, it happened only 92 times in 00-02, but at least there was an average of 250 PA's in the first AA go-around and 259 in the second.
First year AA (players also played in majors in this year and AA again in the next year)
1.05, 1.09, 1.20, 1.40, 1.06, .96, 1.41, 1.20, 1.13
Of course, these were the very best (and lucky) players in AA that year.
Next year in AA
1.02, 1.09, 1.21, 1.48, .94, 1.01, 1.10, .87, 1.09
There was indeed an overall regression, as you can see from the normalized OPS's, even though these players were one year older. Interestingly, most of that OPS regression comes from the BB rate.
Here are the regression ratios (the second values divided by the first):
.97, 1.00, 1.01, 1.06, .88, 1.04, .78, .73, .97
Everything regressed but triples and HR's. Triples not regressing may be due to sample error, and HR's not regressing may be because of the age increase. Again, to get a better idea as to the effect of the age increase, we need to look at the stats of ALL AA players from one year to the next. I'll do that in the next installment for both AA and AAA players...
Posted 12:58 a.m.,
January 2, 2004
(#9) -
MGL
I've thought about it a little. Since the next-year minor stats are a decent proxy for the true stats of these players one year later, why not use those stats, normalized of course, together with the normalized major stats of players who played in the minors one year and the majors the next? This should give us a decent estimate of the true MLE coefficients. At least we can compare these to the ones we get if we adjust the regression coefficients we got in the last analysis for the one-year increase in age.
Here again are the ratios of next year's normalized major stats to one year's normalized minor stats, in AAA, for 00-02:
.94, .90, 1.01, .70, .87, 1.20, .75, .95, .86
Now here are the ratios of next year's AAA stats to the first year's AAA stats for players who played in AAA and the majors in the same year. Hopefully, players who played in AAA and the majors in the same year are roughly the same pool of players as players who played in AAA one year and the majors the next, since we are using the regression coefficients of one group for the other group.
1.01, 1.00, .96, .98, 1.00, .99, .93, .92, 1.00
To see how similar the groups are, here are the normalized stats of the group that played in AAA in year x and in the majors in year x+1:
1.05, 1.08, 1.11, 1.12, 1.03, .92, 1.24, 1.15, 1.07
Here they are for players who played in AAA AND in the majors in year x:
1.03, 1.09, 1.09, 1.18, 1.05, .95, 1.18, 1.08, 1.08
Not too terribly different.
So the final step is to take the coefficients obtained by dividing the next-year major sample stats by the one-year minor sample stats, which are, for AAA (see above):
.94, .90, 1.01, .70, .87, 1.20, .75, .95, .86
and divide those by the regression ratios above. Those regression ratios are:
1.01, 1.00, .96, .98, 1.00, .99, .93, .92, 1.00
After dividing one by the other, we get:
.94, .90, 1.01, .70, .87, 1.20, .75, .95, .86
These are the final MLE coefficients that estimate the actual drop-off in true value from AAA to the majors. If we convert them into a lwts, using .122 runs per PA (61 runs per 500 PA) for the average major league player, we get -16 runs per 500 PA from the above MLE's, which is a 26% drop-off in production from AAA to the majors.
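For readers who want to check that arithmetic, here is a rough sketch of the lwts step. The linear weights and the average component line per 500 PA below are generic stand-ins rather than MGL's actual inputs, so it only approximates his figures:

# Apply the AAA MLE coefficients to an average stat line per 500 PA,
# let outs absorb the lost events, and value the change with linear
# weights. Weights and the stat line are illustrative assumptions.
LWTS = {"s": 0.47, "d": 0.78, "t": 1.09, "hr": 1.40, "bb": 0.33, "out": -0.27}
avg = {"s": 79.0, "d": 23.0, "t": 2.5, "hr": 14.0, "bb": 43.0}   # per 500 PA
mle = {"s": 0.94, "d": 0.90, "t": 1.01, "hr": 0.70, "bb": 0.87}

trans = {c: avg[c] * mle[c] for c in avg}
avg["out"] = 500 - sum(avg[c] for c in mle)      # outs fill out the 500 PA
trans["out"] = 500 - sum(trans.values())

delta = sum(LWTS[c] * (trans[c] - avg[c]) for c in LWTS)
print(round(delta, 1))       # about -16 runs per 500 PA
print(round(delta / 61, 2))  # about -0.27, i.e., roughly the 26% drop-off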
One way to check how good those MLE ratios are - and actually another independent way to calculate them - is to see what the ratios are if we only look at approximately league-average players in the minor leagues who also played in the majors that same year. This way our players' true rates would be around the same as their sample rates, so that the observed ratios between the minor and major normalized stats would be the same as the true ratios or true drop-offs. In order to retain decent sample sizes, we have to do this one component at a time.
First I did it with HR rate. The average HR rate in AAA in 01-03 was 11.9 per 500. I only looked at same year dual-service batters who had HR rates between 10 and 14. They had an average rate of 12.0, right around the league average. Their MLE ratio for HR's was .68. Not too far from our .72 above.
Now let's do the rest of the components. We will only look at same-year dual-service players with around average BB rates, then average K rates, etc. Here are the minor-to-major sample coefficients, including HR's, but excluding OPS when we do this:
.92, .83, 1.11, .68, .82, 1.24, .64, .92
Again, let's compare these to the ones calculated above, with the OPS removed:
.93, .86, 1.04, .72, .85, 1.24, .85, 1.05
Not too bad!
Now let's do the whole thing for AA, without the last part (the "check").
Here are the one-year-AA to next-year-AA regression coefficients for players who played in AA and the majors in one year and AA again in the next year:
.98, 1.00, .98, 1.03, .90, 1.04, .75, .75, .97
Now here are the sample MLE coefficients for players who played in AA in one year and in the majors in the next year:
.93, .82, 1.05, .55, .82, 1.32, .81, .8, .80
If we divide the second by the first, to get the true AA MLE coefficients, we get:
.95, .82, 1.07, .53, .91, 1.27, 1.08, 1.06, .82
If you do the lwts, this represents a 32% drop-off in production from AA to the majors.
Final tally:
True AA MLE coefficients:
.95, .82, 1.07, .53, .91, 1.27, 1.08, 1.06, .82
True AAA MLE coefficients:
.94, .90, 1.01, .70, .87, 1.20, .75, .95, .86
Posted 1:00 a.m.,
January 2, 2004
(#10) -
MGL
fixing bold hell...
Posted 1:00 a.m.,
January 2, 2004
(#11) -
MGL
one more try
Posted 10:37 a.m.,
January 2, 2004
(#12) -
tangotiger
(homepage)
MGL, MGL, MGL.... YOU are accusing someone ELSE of skimming a post, and then commenting on it? Should I get John McEnroe on you?
********
You said:
As you probably figured out already, there is no single coefficient that we can use to do a one-step prediction from minor to major, because the real two-step process is nowhere near linear.
This is not the reason at all. It has nothing to do with linearity, and everything to do with selective sampling, which is what my comment was directed at.
As well, you really should be using the Odds Ratio method. At the level that you are doing it, using a Rates method is wrong.
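For reference, here is a minimal sketch of the difference between the two methods, with purely illustrative rates and multipliers; the odds-ratio form keeps translated rates bounded between 0 and 1:

# Rates method vs. Odds Ratio method for translating a rate stat.
def rates_translate(rate, mult):
    return rate * mult                # can blow past 1.0 for extreme rates

def odds_translate(rate, mult):
    odds = rate / (1 - rate)          # rate -> odds
    new_odds = odds * mult            # scale in odds space
    return new_odds / (1 + new_odds)  # odds -> rate, always in (0, 1)

print(round(rates_translate(0.30, 1.28), 3))  # 0.384
print(round(odds_translate(0.30, 1.28), 3))   # 0.354 -- tamer at the extremes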
******
Now, your idea about looking at the stats of the player in question in a year that's NOT part of the sample is EXCELLENT!
However, you still have an issue. Even though you are looking at, say, a player who played in the minors and majors in one year to establish the equivalency, and then looking at year+1 in the minors to figure out his true talent level (all the players as a group), you get selective sampling. If in year+1 the guy had very few PAs in the minors, then he was hitting the cover off the ball (luckily) and got called up. If he stayed in the minors all of year+1, and got tons of PAs, he probably wasn't doing so well. Worse, you weight those PAs more, because he had more.
*********
I've noted this many times, but it bears repeating. The PAs of a player are dependent on how well he plays. They should not simply be used as a weight as if they were independent of his performance. Just knowing the PAs of a player, and nothing else, will tell you a lot about how a player performed in the minors/majors.
*********
Finally, everyone should reread Walt Davis's post on the subject of "survivorship" (see homepage link), as well as his discussion of running a Heckman Selection Correction (near the bottom of that page). These are very important subjects if you want to be serious about MLEs.
Posted 10:45 a.m.,
January 2, 2004
(#13) -
Michael (originally posted at Clutch @ 01/02/2004 01:35:53 AM)
However, what if we're not that certain of A-Rod's OB talent level? Maybe I'm 95% sure that his OB level is from .380 to .420. Now, given that, what do I expect his performance to be over the next 600 PA by chance alone?
(Primate mathematician, please fill in the blanks. Let's say that it's .340 to .460 95% of the time.)
Actually, if you assume that A-Rod's OBP talent is 95% likely to be between .380 and .420 (and make that uniform for that 95% - we can play with that later, it isn't important for now), and then say the 2.5% tail on each end stretches to .300 on the low end and .500 on the high end, then you can figure out roughly what you'd expect A-Rod's observed OBP to be over the next 600 PA, and we get around .348 to around .452 as his 95% confidence interval.
If you instead assume his OBP talent IS .400, then just on random fluctuation you get the 95% interval being about .360 to about .440, as Tango says.
If you instead assume his OBP talent is between .360 and .440 (say, based on a 600 PA .400 season) and redo the stats, you get his 95% interval of around .330 to around .470.
So you can see that the expected observed 95% interval does change with the assumed confidence interval of the assumed true talent level. But the reflection of this change is not a simple addition. With 0 assumed error, you get a 95% interval of observed data of +/- .040 OBP. When you assume that the error in true talent is +/- .020 OBP, you get a 95% interval of observed data of +/- .052. When you assume that the error in the true talent estimate is +/- .040 OBP, you get a 95% interval of the observed data of +/- .070 (with respect to the predicted mean).
Now, this calculation overstates the amount of error you'd expect in reality, as I made the simplifying assumption that the 95% confidence interval in true talent level that we had was uniformly distributed. In reality we might expect that a .400 OBP true talent level was more likely than a .380 OBP, and as a result those +/- .052 and +/- .070 would come in quite a bit. But still, one can see there is a lot of value in knowing the variance of these estimates - particularly if we have good reason to assume that the player in question has a wider variance in what we'd expect his true talent to be (say it's coming after a breakout season, or coming back from an injury where we aren't sure if he's going to be as good as he once was, or maybe a young player in his prime who might mature/age/get better a lot in one year or may not).
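Michael's intervals can be approximated with a quick Monte Carlo; the talent distribution below (a uniform 95% core plus crude uniform tails) is just a rough stand-in for his stated assumption:

# Draw a true OBP from the assumed talent distribution, simulate 600 PA,
# and read off the percentiles of the observed OBP.
import random

random.seed(7)
obs = []
for _ in range(20000):
    if random.random() < 0.95:
        talent = random.uniform(0.380, 0.420)  # the 95% core
    else:
        talent = random.uniform(0.300, 0.500)  # crude stand-in for the tails
    on_base = sum(random.random() < talent for _ in range(600))
    obs.append(on_base / 600)

obs.sort()
# ~2.5th and ~97.5th percentiles of the observed OBP, roughly .35 and .45
print(obs[int(0.025 * len(obs))], obs[int(0.975 * len(obs))])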
Posted 10:48 a.m.,
January 2, 2004
(#14) -
tangotiger
Michael, can you give details as to how you got your numbers, as well as how you'd handle the issue of non-uniformity mathematically?
Actually if you assume that A-Rod's OBP talent is 95% likely to be between .380 and .420 (and make that uniform for that 95% - we can play with that later, it isn't important for now),
Posted 11:48 a.m.,
January 2, 2004
(#15) -
FJM
Can you post the BABIP MLE coefficients? It seems like most, if not all, of the dropoff in hits can be attributed to the big increase in K's.
Posted 2:34 p.m.,
January 2, 2004
(#16) -
MGL
Can you post the BABIP MLE coefficients? It seems like most, if not all, of the dropoff in hits can be attributed to the big increase in K's.
Batters are not like pitchers, whose BABIP is fairly constant, such that a change in K rate or BB rate automatically means a change in hits per PA. OTOH, your point that all of the individual component rates are far from independent is a good one.
For the rates, I probably should be using BB per PA, K per (PA-BB), HR per AB, and s, d, t per (AB-HR) (or s, d per (AB-HR) and t per (d+t)), or something like that.
Tango, what do you think the rates should be that I use?
MGL, MGL, MGL.... YOU are accusing someone ELSE of skimming a post, and then commenting on it? Should I get John McEnroe on you?
As long as we get to Cartman... Seriously, who of us has the time to NOT skim other long posts??
As well, you really should be using the Odds Ratio method. At the level that you are doing it, using a Rates method is wrong.
I knew you'd say that! Can you not use the odds ratio method most of the time, since it makes things a thousand times more difficult? Whenever I use the odds ratio method, I need a little "cheat sheet" I have saved on my computer! IOW, is the "ratio" method (as opposed to the "odds ratio" method) good enough in most instances? In this instance?
Now, your idea about looking at the stats of the player in question in a year that's NOT part of the sample is EXCELLENT!
However, you still have an issue. Even though you are looking at, say, a player who played in the minors and majors in one year to establish the equivalency, and then looking at year+1 in the minors to figure out his true talent level (all the players as a group), you get selective sampling. If in year+1 the guy had very few PAs in the minors, then he was hitting the cover off the ball (luckily) and got called up. If he stayed in the minors all of year+1, and got tons of PAs, he probably wasn't doing so well. Worse, you weight those PAs more, because he had more.
I know. I am hoping that it is not that big of a factor in this case. It may not be. If it were a big factor, we'd probably see a large observed regression if, as you say, the ones with the large PA's are not doing so well. We don't see a large regression, though, which makes me think that this "inherent" selective sampling in year x+1 is not that great. I don't know. There is so much selective sampling going on with trying to compute real MLE's it's not even funny. It's like trying to compute true aging coefficients, but 100 times more difficult.
Tango, if you give us any more "old threads" to read through, we are going to have "negative" social lives rather than none.
This MLE project is a work in progress. As you can see, James must have been on drugs when he nonchalantly came up with true MLE coefficients in the 1980's and proudly proclaimed that the resultant MLE's are "just as good as" major league stats!
OK, I changed the rates to the following (a code transcription follows the list):
s rate=s/(pa-bb-so-hr)
ex=(d+t)/(pa-bb-so-hr)
t=t/(d+t)
hr=hr/(pa-bb-so)
bb=bb/(pa)
so=so/(pa-bb)
sb=sb/(s+bb)
cs=cs/(sb+cs)
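As code, those denominators look like this (a direct transcription of the list above):

# The revised component-rate denominators listed above, as a function.
def component_rates(pa, s, d, t, hr, bb, so, sb, cs):
    return {
        "s":  s / (pa - bb - so - hr),        # singles per ball in play
        "ex": (d + t) / (pa - bb - so - hr),  # non-HR extra bases per ball in play
        "t":  t / (d + t),                    # triples as a share of 2B + 3B
        "hr": hr / (pa - bb - so),            # HR per PA net of walks and K's
        "bb": bb / pa,                        # walks per PA
        "so": so / (pa - bb),                 # K's per PA net of walks
        "sb": sb / (s + bb),                  # steals per (rough) time on first
        "cs": cs / (sb + cs),                 # caught per steal attempt
    }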
Here are the old sample minor-to-major coefficients for players in AAA in year x and the majors in year x+1, using the old rates (everything per PA):
2000-2002 AAA and 2001-2003 majors
.94, .86, 1.00, .71, .85, 1.23, .79, .97, .86
These were s, d, t, hr, bb, so, sb, cs, and OPS, all per PA (except for OPS, of course).
Here are the same coefficients, using the new rates, as described above:
.96, .89, 1.15, .73, .85, 1.22, .87, 1.14, .86
These are s, ex, t, hr, bb, so, sb, cs, ops, with the rate denominators described above.
As FJM insightfully surmised, with the K's taken out of the denominator, the hit rates per BIP do not decrease as much as the hit rates per PA did. This is definitely the better way to look at the rates. Thanks to FJM!
So now we need to divide these numbers by the regression coefficients again to get the "true" MLE coefficients.
The regression coefficients (year x+1 minor stats divided by year x minor stats for all following-year dual-service players) using these new rates are:
Remember also that these include both regression toward the mean AND an increase in talent level due to age, so it may "look" like there is very little or no regression at all.
1.01, .99, .96, .98, 1.00, .99, .92, .99, 1.00
As Tango points out, we have to be a little wary of these coefficients, as there is some selective sampling here as well, in terms of the number of PA's that each player gets in year x+1 as a function of how he performed in year x+1.
Anyway, dividing these into the sample MLE coefficients above, to yield an estimate of the true MLE coefficients, we get:
.95, .90, 1.20, .74, .85, 1.23, .95, 1.15, .86
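That division step is just elementwise, and reproduces the line above:

# Sample MLE coefficients divided by the year-to-year regression
# coefficients, giving the estimated true AAA MLE coefficients.
sample_mle = [0.96, 0.89, 1.15, 0.73, 0.85, 1.22, 0.87, 1.14, 0.86]
regression = [1.01, 0.99, 0.96, 0.98, 1.00, 0.99, 0.92, 0.99, 1.00]
true_mle = [round(m / r, 2) for m, r in zip(sample_mle, regression)]
print(true_mle)  # [0.95, 0.9, 1.2, 0.74, 0.85, 1.23, 0.95, 1.15, 0.86]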
For AA, here are the sample MLE coefficients:
.96, .82, 1.21, .59, .81, 1.31, .90, 1.13, .80
Here are the AA regression coefficients from one year to the next (see above for AAA):
.96, .99, 1.01, 1.05, .88, 1.03, .83, .95, .97
Again, dividing one by the other, yields:
1.00, .83, 1.20, .56, .92, 1.27, 1.08, 1.18, .82
So here are our final estimates of the "true" MLE coefficients using the new, and probably much better, rate scheme:
AA
1.00, .83, 1.20, .56, .92, 1.27, 1.08, 1.18, .82
That is an MLE lwts of -20.4 per 500 PA, which is a 33.5% reduction in run production from AA to the majors.
AAA
.95, .90, 1.20, .74, .85, 1.23, .95, 1.15, .86
This is a -17.7 MLE lwts per 500 PA or a 29% reduction in run production from AAA to the majors.
Those reduction %'s seem a little high, but it is hard to tell.
What do you guys think?
Posted 2:36 p.m.,
January 2, 2004
(#17) -
MGL
As long as we get to Cartman
That should be "as long as we didn't get to Cartman..."
Posted 3:15 p.m.,
January 2, 2004
(#18) -
FJM
Everything seems to make sense, except for the BB rates. The implication is that AA pitching is about halfway between AAA and MLB in terms of control. The AA reg. coeff. for BB/PA (.88) seems awfully low.
Posted 3:27 p.m.,
January 2, 2004
(#19) -
Rally Monkey
I'm impressed. You aren't leaving any stones unturned in the search for the right coefficients, at least the ones that you can get to.
The reductions are higher than most systems use for MLE's, but it's no shock to me; as you can tell from the stuff I posted in the last thread, I think others have been too high.
On the new and improved rates (s/(pa-bb-k-hr), etc.), how much does this gain you in accuracy? It's a lot easier to work with ratios based on PA. Is the improved accuracy worth the more complex calculations?
These ratios should be looked at every couple of years, in my opinion. Is there any reason to assume that the difficulty of moving from AAA to the majors is constant? It's going to fluctuate every now and then, due to the relative quality of pitching at any time and also the ballparks used at each level.
Also, why stop at AA? I realize that few players go from A to the majors, so you'd have to look at players going from A to AA. There are problems in using this approach to get MLE's, as the players going from A to AA will not be the same as the ones going from AA to the majors. Our estimates wouldn't be as good, but better than nothing. Who's the better player, the 22-year-old with a .925 OPS in A or the 22-year-old with an .850 OPS in AA? If I were running a team, I'd at least want a system that can give me a reasonable guess, like Clay Davenport's minor league EQA ratings.
Posted 3:42 p.m.,
January 2, 2004
(#20) -
Tangotiger
There are problems in using this approach to get MLE's, as the players going from A to AA will not be the same as the ones going from AA to the majors.
Even better!
What you really really want is for the players that jump from league to league to be representative of the players in each league. That is, the CLOSER you can get your sample players to be almost random, the better.
I have to believe that a lot of the A to AA moves must be based on something other than "they were somewhat lucky in A ball". If you get a nice big chunk of players going A to AA, that's great. AA to AAA might also be a good watch, though I'm guessing that AA and AAA are treated almost the same.
Posted 3:44 p.m.,
January 2, 2004
(#21) -
Tangotiger
For example, you can get decent translation numbers between the NL and AL because the types of players and the quality of players that move league-to-league are pretty random. What hurts you is sample size, and you have to control for the handedness of the batters and pitchers.
However, you can easily figure out translations between the AL and NL using, say, 5-year moving averages. I'm a little miffed that I haven't done this already, actually.
Maybe someone will now...
Posted 5:05 p.m.,
January 2, 2004
(#22) -
MGL
One of the problems you might run into with the NL to AL (and reverse) translations is "getting used to a new league." You may find a reduction in production going both ways. Should you look at the NL versus one or two years after going to the AL (and vice versa) and control for age?
The AA reg. coeff. for BB/PA (.88) seems awfully low.
You mean high, right? Yes, I agree that there is no logical reason for the AA BB rate coeff. to be higher than the AAA one. It should be the other way around. There is so much sample error in all the different calculations.
For example, in the 97-99 vs 98-00 samples, here are the AA sample MLE coeff.
.94, .85, 1.27, .54, .82, 1.27, .87, 1.21, .80
Compare that to the AA 00-02 vs. 01-03 sample coeff. from before:
.96, .82, 1.21, .59, .81, 1.31, .90, 1.13, .80
Now, here are the 97-99 vs. 98-00 regression coeff. for AA:
.95, .95, 1.00, .94, .90, 1.07, .87, 1.05, .93
Again, compare that to the ones for 00-02 vs. 01-03 from before:
.96, .99, 1.01, 1.05, .88, 1.03, .83, .95, .97
If we average the two sets of sample data for both the sample MLE's and the regression coeff., we get:
.95, .83, 1.24, .56, .82, 1.29, .88, 1.17, .80
and
.96, .97, 1.01, 1.00, .89, 1.05, .85, 1.00, .95
Again dividing one by the other, we get, for AA:
.99, .86, 1.23, .56, .92, 1.23, 1.04, 1.17, .84
For AAA, the 97-99 vs 98-00 regression coeff. are:
.98, 1.00, .93, .96, 1.03, 1.00, 1.02, .97, .99
The other (more recent) sample was:
1.01, .99, .96, .98, 1.00, .99, .92, .99, 1.00
The average of the two samples is:
1.00, .99, .95, .97, 1.01, .99, .97, .98, 1.00
The 97-99/98-00 sample MLE coeff. for AAA are:
.94, .93, 1.10, .70, .88, 1.16, .77, 1.21, .87
The other sample was:
.96, .89, 1.15, .73, .85, 1.22, .87, 1.14, .86
The average of these is:
.95, .91, 1.13, .72, .86, 1.19, .82, 1.17, .86
(Recall the average regression coefficients from above: 1.00, .99, .95, .97, 1.01, .99, .97, .98, 1.00.)
Dividing the average sample MLE's by the average regression coeff. gives you, for AAA:
.95, .92, 1.19, .74, .85, 1.20, .85, 1.19, .86
Final tallies:
AA
.99, .86, 1.23, .56, .92, 1.23, 1.04, 1.17, .84
32% reduction from AA to majors.
AAA
.95, .92, 1.19, .74, .85, 1.20, .85, 1.19, .86
28% reduction from AAA to majors.
Still don't know why the BB coeff. is higher in AA than in AAA. Would have to look at players who went from AA to AAA and see what happens.
Posted 6:28 p.m.,
January 2, 2004
(#23) -
FJM
The AA reg. coeff. for BB/PA (.88) seems awfully low.
You mean high, right? Yes, I agreee that there is no logical reason for the AA BB rate coeff. to be higher than the AAA. Should be the other way around. There is so much sample error in all the different calculations.
Actually, I was referring to the year-to-year reg. coeff., which you divide by to get the final MLE. In other words, for AAA it is .85 / 1.00 = .85, while for AA it is .81 / .88 = .92. I'm not questioning the numerator, just the denominator, the .88. Note that it is the only year-to-year coeff. other than SB rate which varies significantly from 1.0.
Posted 6:45 p.m.,
January 2, 2004
(#24) -
MGL
Here are the AA to AAA one-year-to-the-next sample coeff. for 97-02 vs. 98-03 (6-yr sample):
.98, .93, 1.04, .82, .95, 1.08, 1.07, 1.03, .93
This includes the selective sampling problem, just like with the AAA or AA to majors samples, so we need some regression coeff. (from AA one year to AA the next year, for players who had dual service in AA and AAA) to divide by (also 6-yr samples):
.97, .97, 1.01, 1.00, .90, 1.05, .85, .99, .95
That's the number I don't trust. Since the true BB rate goes up quite a bit with age for a young player, even with regression I would expect next year's BB rate to stay around the same, and not drop by 10%.
Anyway, dividing one by the other, to get the true AA to AAA coefficients, we get:
1.01, .99, 1.06, .89, 1.04, 1.05, 1.23, 1.03, .97
This is only a 3% reduction in run production. Seems too small, although it jibes with the 32% from AA to majors and the 28% from AAA to majors. Also, it appears as if BB rate might go up from AA to AAA, even after adjusting for age.
In fact, let's look at players who played in AA and AAA in the same year and who had around a league average (for AA) BB rate, so that we would expect their sample AA BB rate to be about the same as their true AA BB rate, such that their BB rate in AAA in the same year should reflect their true reduction or increase in BB rate, whichever the case may be.
For players who had around average BB rates in AA and who also played in AAA in the same year, we get a AAA to AA ratio for BB rate of .91, which suggests that players DO lose BB rate when going from AA to AAA. In fact, let's substitute .95 (a compromise) for the 1.04 in the above true AA to AAA coeff. That gives us:
1.01, .99, 1.06, .89, .95, 1.05, 1.23, 1.03, .97
which is a 6% reduction in run production rather than a 3%.
The .95 rather than the 1.04 probably changes some of the other values, but let's not worry about that.
Let's go back to the AA to majors and AAA to majors true coeff., and see if we can "fix" those screwy (backwards - the AA being higher than the AAA) BB coeff., the same way we "fixed" the AA to AAA one.
We'll look at same year AA to majors and AAA to majors with around average BB rates in their lower league:
AA (6-yr sample)
The sample BB coeff. is .70, so we will call that the true BB coeff.
AAA
The sample BB coeff. is .86, so we will call that the true BB coeff.
Now that's more like it!
Since the AA to AAA and the AAA to majors samples are much larger than the AA to majors samples, let's interpolate the AA to majors from the AA to AAA and the AAA to majors. That gives us an AA to majors of .82. If we average that with the .70 and give more weight to the interpolated value, we'll call it .80, since it is a nice round number.
So here is what our true MLE coeff. look like now with the BB rate changes:
AA
.99, .86, 1.23, .56, .80, 1.23, 1.04, 1.17, .84
35% reduction from AA to majors.
AAA
.95, .92, 1.19, .74, .86, 1.20, .85, 1.19, .86
28% reduction from AAA to majors.
Who knows? Could be!
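In code, that interpolation is just chaining the two level-to-level ratios - a quick sketch using the BB-rate values from this post:

bb_aa_to_aaa = 0.95    # the compromise AA-to-AAA BB coeff. substituted above
bb_aaa_to_mlb = 0.86   # the AAA-to-majors BB coeff.
print(round(bb_aa_to_aaa * bb_aaa_to_mlb, 2))  # 0.82, then averaged with the .70 sample value toward .80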
Posted 6:50 p.m.,
January 2, 2004
(#25) -
MGL
Actually, I was referring to the year-to-year reg. coeff., which you divide by to get the final MLE. In other words, for AAA it is .85 / 1.00 = .85 while for AA it is .81 / .88 = .92. I'm not questioning the numerator, just the denominator, the .88. Note that it is the only year-to-year coeff. other than SB rate which varies significantly from 1.0.
OK, right! You were referring to the regression coeff. and not the MLE coeff. It WAS low! That's what I redid. Yes, I also thought it should be close to 1, as the age thing cancels out the regression, which happens with most of the other components except for triples of course.
Definitely. Very sharp! I think you are probably the only one who is actually reading through and making sense of my gibberish! Good work!
Posted 8:23 p.m.,
January 2, 2004
(#26) -
Tangotiger
AL to NL or vice-versa.
Well, if both experience the same "familiarity" factor, it'll cancel out.
An NL player may lose 5% going to the AL, and an AL player will lose the same 5% going the other way. So, you have to look at the two at the same time.
Posted 9:13 p.m.,
January 2, 2004
(#27) -
David Smyth
---" Very sharp! I think you are probably the only one who is actually reading through and making sense of my gibberish!
No offense, MGL, but these posts have an infinite number of MLE "lines" which look similar. Sure, there are a few bright and informed readers who are following it without much trouble, but for the rest of us interested stragglers, maybe you could at some point do a good and prepared summary post in simple English...
Posted 9:23 p.m.,
January 2, 2004
(#28) -
MGL
No offense David, but I am just kind of keeping a journal of my work and printing it on this thread in case anyone is interested. I know that it is hard, if not impossible, to follow, but I don't have the time to write something up that's more cogent...
Posted 10:11 p.m.,
January 2, 2004
(#29) -
David Smyth
All I was suggesting was to do a summary post at the end. That's not a big deal, and it will let the work be understood and appreciated by many more readers who don't have the interest or time to really follow along in detail...
Posted 10:48 p.m.,
January 2, 2004
(#30) -
MGL
No prob! I will soon! I'm redoing my 2003 AA and AAA MLE's using the new coeff., so I can send them to Tango and he can post them somewhere...
Posted 11:01 p.m.,
January 2, 2004
(#31) -
Rally Monkey
Come on David, you know these posts never really "end". Just use the last line MGL posts.
After reading through all this, I'd do my study a bit different (regress minor league stats more, major league less) but in any case MGL and I got to about the same place, although I don't have a database full of components and am just looking at the final results.
I've got a 26.5% reduction for AAA and a 37% for AA. MGL, do you have data for the lower minors? If so, I wonder how close I'm getting for high A (Cal, FSL, Car). Right now I'm at 47% while regressing the A sample 70% to the league average and not touching the AA sample.
I'll look at low A soon.
Posted 11:35 p.m.,
January 2, 2004
(#32) -
Rally Monkey
MLE's for low A players lose 56%.
Posted 12:17 a.m.,
January 3, 2004
(#33) -
MGL
I used the wrong denominators (and numerators) for the SB and CS portion of the stats. I used sb/(s+bb) and cs/(sb+cs). I should have used (sb+cs)/(s+bb), or "attempts," and THEN sb/(sb+cs) or cs/(sb+cs).
The observed MLE coeff. for these "new" stats (att rate and sb success rate) are:
AA
.96, .93
AAA
.89, .93
The regression coeff. are:
AA
.85, 1.00
AAA
.96, 1.00
Dividing the first by the second to get the true MLE coeff., we get:
AA
1.13, .93
AAA
.93, .93
This means that when going from AA to the majors, attempts go up 13% and success rate goes down 7%. Going from AAA to the majors, attempts and success rate both go down 7%.
Going from AA to AAA, the final true MLE coeff. for these two "new" rates (att and sb succ. rate) are:
1.22, .98
This means that when you go from AA to AAA, you attempt 22% more steals and your success rate goes down by 2%. The AA to AAA numbers jibe very well with the AA to majors and AAA to majors.
If you attempt 100 steals in AA, you will attempt 122 in AAA and 113 in the majors. If you succeed 70 times per 100 attempts in AA, you will succeed around 69 times per 100 attempts in AAA and 65 in the majors.
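As a sanity check, here is that worked example in code (note that the "succeed" figures are per 100 attempts, i.e. they are really translated success rates):

att_aa, succ_rate_aa = 100, 0.70
att_aaa = att_aa * 1.22           # ~122 attempts in AAA
att_mlb = att_aa * 1.13           # ~113 attempts in the majors
rate_aaa = succ_rate_aa * 0.98    # ~.686, i.e. ~69 successes per 100 attempts
rate_mlb = succ_rate_aa * 0.93    # ~.651, i.e. ~65 successes per 100 attempts
print(att_aaa, att_mlb, round(rate_aaa, 2), round(rate_mlb, 2))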
Posted 12:33 a.m.,
January 3, 2004
(#34) -
Michael
Michael, can you give details as to how you got your numbers, as well as how you'd handle the issue of non-uniformity mathematically?
Actually if you assume that A-Rod's OBP talent is 95% likely to be between .380 and .420 (and make that uniform for that 95% - we can play with that later, it isn't important for now),
No problem. It is really pretty easy. You want to take the true talent distribution, and then compute the probability that A-Rod puts up an observed OBP of "x" given that his true talent is "y". Then you do that for all x and all y. In essence you are going to multiply the two probability distributions and look at the resulting distribution. Often, especially for baseball-size sample sizes, I find this is easier if we make everything discrete.
To do this, just form a table (say in Excel) where you have the true talent going across the top (every integer OBP true-talent level between .380 and .420, plus a little bit of tail on each end - you could do more than every integer, since the expected distribution of true talent is probably a continuous function, but I'm discretizing it because it is easier to deal with, and since we are only going to make 600 observations anyway this discretization will not affect the estimate very much) and where you have rows that represent the number of times on base we observe in A-Rod's next 600 PA (starting at 0 and going all the way to 600).
Then you give each column in the table a weight which represents how likely it is that the true talent level is the value of y that this column represents. In the .400-true-talent case, the .400 column gets weight 1 and every other column gets 0. In the .380-to-.420 distribution, every column from .380 to .420 is worth 95/4100, and each of the columns in the tail regions is worth 2.5/(100*num_columns_on_one_tail). Then you basically sum across the rows for each x, making sure to have weighted each term properly. So if y is .395 and x is 120, in my table I'd originally have Combin(600,120)*(.395)^120*(1-.395)^(600-120). Then when I weight the columns, that turns into Combin(600,120)*(.395)^120*(1-.395)^(600-120)*(95/4100).
So then in this final column you have P(OBP observed is x) given the assumed true talent distribution. Then you just use the cumulative sum to find the x's that bound whatever probability range you care about. (This may be close to what's in the Maximum Likelihood Estimation Primer, but I'm not sure.)
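Here is a minimal numpy version of the table Michael describes, assuming the uniform .380-.420 talent distribution with ten tail columns on each side (the exact tail handling is an assumption on my part):

import numpy as np
from scipy.stats import binom

# Columns: discretized true-talent grid, .370 to .430 in steps of .001.
talent = np.round(np.linspace(0.370, 0.430, 61), 3)
inside = (talent >= 0.380) & (talent <= 0.420)
w = np.where(inside, 0.95 / inside.sum(), 0.05 / (~inside).sum())
# 41 inside columns at 95/4100 each; 2.5% spread over each 10-column tail.

# Rows: times on base in the next 600 PA; weight each binomial column by w.
x = np.arange(601)
p_x = (binom.pmf(x[:, None], 600, talent) * w).sum(axis=1)

# The cumulative sum then gives, e.g., a central 95% band for the observed OBP.
cdf = np.cumsum(p_x)
lo = x[np.searchsorted(cdf, 0.025)] / 600
hi = x[np.searchsorted(cdf, 0.975)] / 600
print(round(lo, 3), round(hi, 3))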
Posted 1:11 p.m.,
January 3, 2004
(#35) -
FJM
Thanks for the kind words. Numbers, rather than plain English, is my first language, I guess. Anyway, I'm not having any trouble following you. The idea of a summary at the end is a good one, though.
Do you have the ability to split your AAA (or AA) data by league? I realize it shouldn't make any difference, since the data has already been adjusted for league (and park). Still, the PCL is so different from the IL that I wonder about the accuracy of those adjustments, particularly when it comes to stats like BB rate and K rate. If the adjustments ARE correct, then looking at the 2 leagues separately should produce 2 sets of similar results, the only differences presumably being random variation. That in turn would give us some idea how confident we can be in your MLE's.
Posted 1:13 p.m.,
January 3, 2004
(#36) -
Rally Monkey
It's surprising that players would attempt so many more steals in AAA than at AA. I wonder what would cause that?
Perhaps the best catching prospects tend to stay in AA for a full year, but move quickly through AAA to the majors. AAA becomes the place for catchers who have major league bats but not the defensive skills, the Adam Melhuses and Mike Riveras of the world. Major league teams seem to put more emphasis on defense than on offense at catcher, perhaps more than is optimal.
Posted 4:27 p.m.,
January 3, 2004
(#37) -
MGL
Rally, that is a reasonable explanation, but there is so much selective sampling and sample error that I'm not sure the steal-attempts thing means that much.
FJM, yes I can split up the leagues, but again, we cut our sample sizes in half in AAA and into thirds in AA, such that any differences you might see could be sample error, could be something else, etc.
I'm reasonably sure that most of the coefficients I came up with are in the same ballpark as the "real" ones, I'm reasonably sure that there is a fairly linear relationship between major and minor talent, regardless of the level of the talent (high or low), and I'm reasonably sure that the resultant MLE's are a decent predictor of major league stats. Beyond that, who knows? I wouldn't take these coefficients as gospel. As I said, I don't know where James and others got the original idea that their MLE's are "the gospel."
BTW, I sent my entire MLE files for AA and AAA, 2001-2003, to Tango to put up somewhere. I used a version of the new coefficients. They are quite interesting. Lots of seemingly good hitters in the minors that I have never heard of.
Here are all players who had at least 200 PA's in AA and AAA in 2001-2003, who have had significant major league time as well, and whose age-adjusted, weighted-by-year (5/4/3) total MLE OPS's in AA and AAA were at least .800:
I regressed their weighted, age-adjusted total MLE OPS to reflect a Marcel-type major league OPS projection. For 200-400 PA's, I regressed around 70%, for 400-600, around 50%, for 600-800, around 40%, for 800-1000, around 30%, and over 1000, around 20%. These are off the top of my head. The OPS I regressed to was .755, which is the average major league OPS that the MLE's are based on. (A sketch of this scheme follows the table.)
Name, PA's, MLE OPS, MLE OPS Regressed, Park Adjusted Major League OPS
A. Dunn, 416, 1.132, .906, .859
T. Perez, 229, .928, .790, .727
N. Johnson, 444, .873, .808, .812
A. Kearns, 286, .842, .781, .865
R. Simon, 245, .832, .774, .758
T. Hall, 505, .824, .789, .683
M. Giles, 405, .816, .782, .846
J. Crede, 903, .810, .793, .750
H. Choi, 959, .807, .791, .739
M. LeCroy, 631, .803, .781, .768
R. Ludwick, 1272, .800, .791, .698
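The regression scheme described before the table, as a rough sketch (the bucket percentages are the off-the-top-of-the-head values quoted above, and the function name is mine, so this won't reproduce the regressed column exactly):

def regress_mle_ops(mle_ops, pa, lg_avg=0.755):
    # Marcel-type regression toward the league-average OPS, by PA bucket.
    if pa < 400: r = 0.70
    elif pa < 600: r = 0.50
    elif pa < 800: r = 0.40
    elif pa < 1000: r = 0.30
    else: r = 0.20
    return (1 - r) * mle_ops + r * lg_avg

print(round(regress_mle_ops(1.132, 416), 2))  # ~0.94 vs. the .906 in Dunn's row, so the actual weights clearly differed a bit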
Posted 4:38 p.m.,
January 3, 2004
(#38) -
MGL
Rally, fascinating! I get a 25.7% reduction from AAA and a 37.4% reduction from AA for the last 3 years! Our numbers are so close, it's scary! Did James and Dan S. put that same number at 18% for AAA? Are we all using the same scale - some kind of runs created? I'm using the average MLE lwts in AA and AAA, which is of course runs below the major league average, and then converting that to a runs created by simply adding .122 runs per PA, which is the average number of runs per PA in the major leagues last year.
IOW, my average AAA player had an MLE lwts of -.031 per PA. So I am converting that to a "runs created" per PA by adding .122, which gives .091. IOW, my average AAA player created .091 major league runs per PA.
So, I used 1 minus .091/.122 as the "reduction in run production" going from AAA to the majors. I think that is the right way (or at least one way) to do it...
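In code, the conversion in the last two paragraphs is just the following (the small difference from the 25.7% quoted earlier is rounding in the -.031 input):

lwts_per_pa = -0.031      # average AAA MLE lwts per PA
lg_runs_per_pa = 0.122    # MLB average runs per PA
rc_per_pa = lwts_per_pa + lg_runs_per_pa      # 0.091 "runs created" per PA
reduction = 1 - rc_per_pa / lg_runs_per_pa    # ~0.25, the reduction going from AAA to the majors
print(round(reduction, 3))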
Posted 5:16 p.m.,
January 3, 2004
(#39) -
jto
MGL have you ever done any studies about college level equivalencies (CLE's)?
Posted 7:10 p.m.,
January 3, 2004
(#40) -
MGL
MGL have you ever done any studies about college level equivalencies (CLE's)?
No. The "M" in MLE stands for "major" so they would still be "MLE's" for college - perhaps CMLE's!
Posted 8:29 p.m.,
January 3, 2004
(#41) -
Rally Monkey
I'm using equivalent runs for my samples, pretty much the same thing as runs created. I'm then looking at equivalent runs per out to get my rates, instead of per plate appearance. To me it's a bit easier to work with, because being a below-average hitter will reduce the total of plate appearances, but I think essentially we are on the same scale here.
Posted 10:26 p.m.,
January 3, 2004
(#42) -
MGL
To me it's a bit easier to work with because being a below-average hitter will reduce the total of plate appearances...
If I'm doing this correctly, doing it by out rather than by PA increases the reduction, since as you say, the minor league hitters have slightly fewer PA's in a game, and therefore generate slightly fewer runs.
For AA, the reduction in runs is another 2%, so we are up to 39.5%. For AAA, we have another .8% in reduction, for 26.5%...
Posted 10:27 p.m.,
January 3, 2004
(#43) -
MGL
the minor league hitters have slightly fewer PA's in a game...
What I meant was that the minor league hitters, on the average, would have slightly fewer PA's in a major league game...
Posted 11:07 p.m.,
January 3, 2004
(#44) -
Scoriano
I have a sideways question ignited by all these fantastic major and minor (league) equations. What should clubs do to evaluate college players sabermetrically? I suspect there are enormous challenges in adjusting numbers based on almost completely unbalanced schedules, unreliable park adjustment data, etc.
We know from Moneyball, etc. that Beane looks for undervalued players, lesser tools, better OBP, less $ needed to sign them. But what could you do to actually rank them in order of performance/potential? And how does Beane really gain sufficient comfort that players he thinks are undervalued really are, if, as I suspect, the evaluative tools are so imperfect? Riff away.
Posted 12:02 a.m.,
January 4, 2004
(#45) -
tangotiger
The issues with college equivalencies, in order:
1 - strength of schedule relative to other college players
2 - quality of competition relative to MLB
3 - number of games
4 - age
5 - aluminum bats
6 - parks
7 - fielding position
1: this can probably be handled by a logistic regression model (see the sketch after this list)
2: this is huge really... I'll guess that an average college team would play .100 ball against an average MLB team... it's a huge gap... the "feasting on bad pitchers" effect would apply a lot here
3: sample size... not much you can do about it
4: the talent slope goes very high at the 18 to 21 level... while age itself is not much of an issue, when you couple it with the other issues, it just adds another level of variables that you didn't need
5/6: the style of play is much different... no reason to think that all players will be affected similarly, or anywhere close to it
7: this might actually be huge... most good players in college ball play a strong fielding position, but move them to the majors and very few stick to it... the evaluation of a player's fielding talent is extremely important, as it relates to where you can put the guy's bat
As for pitchers, I wouldn't use the normal stat line, but I'd like to get my hands on their performances by count. THAT's where you can figure some things out. I'd want it for hitters too, but not as much.
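On item 1, a toy version of the kind of logistic (Bradley-Terry-style) strength-of-schedule model Tango has in mind might look like the sketch below; the teams and game results are invented for illustration:

import numpy as np

games = [("A", "B", 1), ("B", "A", 1), ("B", "C", 1), ("C", "A", 1), ("A", "C", 1)]
teams = sorted({t for g in games for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}

s = np.zeros(len(teams))        # one latent strength per team
for _ in range(2000):           # gradient ascent on the logistic log-likelihood
    grad = np.zeros_like(s)
    for a, b, a_won in games:   # a_won = 1 if the first-listed team won
        p = 1.0 / (1.0 + np.exp(s[idx[b]] - s[idx[a]]))  # P(a beats b)
        grad[idx[a]] += a_won - p
        grad[idx[b]] -= a_won - p
    s += 0.05 * grad
s -= s.mean()                   # strengths are only identified up to a constant
print(dict(zip(teams, np.round(s, 2))))

A player's raw college line could then be discounted by the average strength of the opponents he actually faced.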
Posted 1:36 a.m.,
January 4, 2004
(#46) -
MGL
Look at it this way. If you could get equally reliable MLE's at almost any level, what about Little League? T-Ball? (kidding) That's a good one! T-Ball MLE's!
Seriously, there has to be a point at which current talent, even if you could normalize it from team to team, park to park, league to league, etc., just doesn't correlate well at all with future (major league) talent, because players learn and develop - physically, mentally, intellectually, psychologically, and desire-wise - at significantly different rates, both quantitatively and qualitatively.
The assumption behind AA and AAA stats being able to be translated into reliable MLE's is that the player's talent has pretty much developed and matured as much as it can, and that the only things left are physical maturity and level of competition. Or at the very least, that the development in talent between AA or AAA and the majors is more or less the same for all players.
At the other extreme is, as I said, Little League. One, we don't know what a 12-year-old's physical makeup is going to be in 10 years. Two, we don't know what his competitive desire will be. Three, we don't know what his learning capability or desire will be. Etc. So even if we could normalize Little League stats to account for park, competition, etc., we still would not be able to come up with good MLE's. Basically the correlation (r or r squared) between even normalized LL stats and major league stats would be very small. For AA or AAA and the major leagues it should be pretty high (close to majors versus majors).
So the question is, how much correlation between college and majors would there be, even if you could control for most or all of the things Tango mentions? Somewhere between LL and AAA I would think.
As a practical matter, for college players you are not trying to compare them with major leaguers to see whether you should call a college player up to the pros. That is where MLE's for AA and AAA come in handy - to help you determine when and whether to call a player up, based on his major league projection, which is of course based on his regressed minor league MLE, and based on who else you have at the major league level that he would replace. That is why MLE's themselves are important at the AA and AAA level.
If you are not going to bring a player up from a certain level, like college ball, to the pros, then MLE's per se are not that important. All you really want to do is to be able to normalize college stats so that you can fairly compare one player on one team and conference to another player on another team and conference. That is what is most important at the college level! Not what those college stats would look like at the major league level, although it would be nice if you had some idea.
You would also like to know how high school stats translate to college stats. In fact, that is probably more important than how college stats translate to major league stats. That way you can compare high school and college players fairly in the draft. The inference of course is that once you are able to put the high school and college players on a "level playing field," the one with the better "equivalent" stats is the one who will more likely do better if he ever makes the major leagues.
The reason you don't really care that much about MLE's for college or high school players, but you do for AA and AAA (and A) players, is that they are fundamentally different. All teams get to draft X number of players at any draft time, so all you care about is drafting the best player, or at least being able to identify the best player (so you can factor it into signing bonus issues and things like that). A, AA, and AAA players, on the other hand, are already in your organization or you are trading for them. The only thing you care about with them is their relative chances of making the pros. That's why you not only need normalizing metrics (league and park adjustments) in the minors, but you also need MLE's. If player A is better than player B, but neither player has any chance of making the pros, then they both have around the same value...
Posted 2:49 a.m.,
January 4, 2004
(#47) -
Rally Monkey
(homepage)
The guy who does the above site has strength of schedule and park ratings for college players. If I was running a team I'd want to rank the college hitters this way, although sample size and aluminum bats still make things less than reliable.
When it comes to ranking high school players, good luck. I can't see any way to come up with a strength of schedule adjustment that could tell me what the level of competition in Anchorage, Alaska is compared to Great Falls, Montana. Then your sample size is down to what, 25-30 games per year? The only way I can see using stats for high schoolers is this: If his stats don't blow you away, go look at someone else. There must be 5000 kids (wild guess) every year who hit .400 in high school.
Finding good high school players is best left to the scouts.
Posted 2:59 p.m.,
January 4, 2004
(#48) -
David Smyth
Just wanted to be sure of something, Tango or MGL. On those MLE CSV files, are the lwts per 500 PA positionally adjusted or not? I assume not.
Posted 3:41 p.m.,
January 4, 2004
(#49) -
tangotiger
I'm sure not.
Posted 4:25 p.m.,
January 4, 2004
(#50) -
Scoriano
Great stuff, thanks.
Posted 4:56 p.m.,
January 4, 2004
(#51) -
MGL
No, they're not (position adjusted).
The positions listed in the files are the position at which each player had the most "games at position" his last year in AA or AAA for the 04 file and for that year in the 01-03 files.
Tango, do you still hate MLE's? I hate them, if only because they have consumed me for about a week now! Time to move on to something else. Actually, I'm working on the pitching ones now.
Question for anyone: For the pitcher MLE's, can we automatically use the same component coefficients since the pitching is just the reverse of the hitting?
Posted 5:21 p.m.,
January 4, 2004
(#52) -
AED
MGL, probably not. It's not really the reverse of hitting, after all. The batting coefficients reflect differences between minor and major league pitching (plus parks and fielding); the pitching coefficients reflect differences between minor and major league hitting (plus parks and fielding).
Posted 7:42 p.m.,
January 4, 2004
(#53) -
Rally Monkey
We know that the average runs per game for the majors and AAA were about the same. We also know that AAA hitters are not as good as major league hitters. Yet they produce unadjusted numbers that are about the same as major league numbers. This means:
1) The pitching is inferior in AAA
2) The ballparks are easier to hit in at AAA
or
3) Some combination of the two.
Unless #2 is false, and the parks are equivalent, then the pitching MLE factors will not be the same as the hitting ones.
Bring up defense and it gets hairier, but that's no big deal if you are primarily concerned with K's, BB's, and homers.
Posted 8:03 p.m.,
January 4, 2004
(#54) -
MGL
Unless #2 is false, and the parks are equivalent, then the pitching MLE factors will not be the same as the hitting ones.
Are you saying that if the parks are the same, then the hitting and pitching MLE's WILL be the same?
BTW, writing rule #23: Don't use more than one negative in a sentence!
Posted 9:56 p.m.,
January 4, 2004
(#55) -
AED
If the parks and fielding are equivalent, the MLE coefficients will be close to reciprocals for batters and pitchers. There's no reason to assume they are, so it's probably wise to recalculate them for pitchers.
Posted 10:14 p.m.,
January 4, 2004
(#56) -
Rally Monkey
"Are you saying that if the parks are the same, then the hitting and pitching MLE's WILL be the same?"
Yes. Oh, and what AED said.
Posted 1:58 p.m.,
January 5, 2004
(#57) -
tangotiger
Tango, do you still hate MLE's?
This is probably the 10th time that MGL has mischaracterized my position, and seeing that he likes to skim articles (and accuses others of the same), let me reiterate:
I hate the way people derive and treat MLEs as a final product. MLEs, as currently done, are simply a first step (maybe second step in this thread). Until you address the issues of selection bias (and examples have been provided on doing so) and supply confidence levels to what you do, MLEs are far, far from a final product.
The same applies to my disdain for how park factors are derived/treated, along with a whole set of adjustment factors. The basic concept is correct, but the execution leaves much to be desired.
Posted 2:01 p.m.,
January 5, 2004
(#58) -
tangotiger
As for the point about the MLEs being the same for pitchers and hitters: they won't be.
If you apply the MLEs to the observed data, they can't be, since you would regress a pitcher's hit/BIP differently than a hitter's hit/BIP. If on the other hand you apply MLEs AFTER doing a regression, then what you are left with is simply applying a factor for the quality-of-competition difference.
In this case, it's probably safe to say that the talent distributions of hitters and pitchers are similar enough.
Posted 3:11 p.m.,
January 5, 2004
(#59) -
MGL
This is probably the 10th time that MGL has mischaracterized my position, and seeing that he likes to skim articles (and accuses others of the same), let me reiterate:
Honestly, Tango, that's probably the most ridiculous thing you have ever said on these boards! One, how can someone "mischaracterize" (or characterize) ANYTHING with one question, "Do you still hate MLE's?" Two, is it my freakin' imagination, or is part of the name of this thread "Why I hate MLE's"? Maybe we live on different planets, but when someone says "Why I hate MLE's," that implies (more than implies - that is the only conclusion) that they "hate MLE's." When I asked "Do you still hate MLE's?" not only was it just a facetious question, but where in the world am I characterizing anything about WHY you may hate MLE's? Enter McEnroe and Cartman!
If on the other hand you apply MLEs AFTER doing a regression, then what you are left with is simply applying a factor for quality of competition difference.
Well, of course, you are supposed to apply MLE's AFTER doing the appropriate regression on the minor league level (even though no one ever does), since they are supposed to represent the ratio of true talent. That's why I went through all those painstaking steps to try and establish the true talent level of the minor league players! So yes, my question about pitchers and batters was assuming that we have first established the "true" stats of the minor league pitcher or batter.
In this case, it's probably safe to say that talent distribution of hitters and pitchers are similar enough.
Rally and AED seem to think that the average park in the minors is probably significantly different from those in the majors, since their first response was to say that there is no reason to assume that the hitter and pitcher MLE's will be the same, unless I am misinterpreting what they said, or they are backpedaling...
Posted 3:46 p.m.,
January 5, 2004
(#60) -
Rally Monkey
I'm just saying that the effects of ballparks on the AAA and major league game is an unknown, so I'd want to recalculate the MLE factors for pitchers. If the pitcher and hitter factors are different, then we'd have a clue about what the ballparks are doing to the game.
Posted 5:07 p.m.,
January 5, 2004
(#61) -
FJM
Are you able to calculate an R^2 for your MLE's against their actual major league stats for those batters who had enough PA's to qualify at both the minor and major league level during the 2001-03 period? If so, how does it compare to the year-to-year R^2 for all qualified batters who were in the majors for the entire period?
Here's my concern. I looked at the 3 rookies who made the D'Backs in 2003. (Alex Cintron technically didn't qualify as a rookie, even though he had had less than 100 PA's with the big club before 2003.) Here are your MLE's from 2001-03, followed by how they did with the AZ DBs.
Cintron / BA / OBP / SA / OPS
MLEs / .271/ .301/ .374/ .672
AZDB / .317/ .359/ .489/ .848
Kata / BA / OBP / SA / OPS
MLEs / .247/ .285/ .375/ .659
AZDB / .257/ .315/ .420/ .736
Hammock/ BA / OBP / SA / OPS
MLEs / .229/ .288/ .354/ .642
AZDB / .282/ .343/ .477/ .820
The MLEs are far below the numbers they actually put up in every case except Kata's BA. (Even there it should be noted that he was batting .269 through August before slumping to .204 in September.) I believe the problem lies not with your estimates as such. It's the old selective sampling problem reappearing. Your MLE's are very close to replacement level. Consequently, if the batters being called up don't do significantly better than you expect, their playing time will be severely limited and/or they will soon find themselves back in the minors. Either way, they won't get enough MLB PA's to qualify as successful callups.
Posted 5:13 p.m.,
January 5, 2004
(#63) -
FJM
Sorry for the double post.
Posted 6:36 p.m.,
January 5, 2004
(#64) -
Rally Monkey
Lyle Overbay MLE .285/.355/.441/.796
2003 .276/.365/.402/.767
Let's not forget this guy. There are many players who don't live up to their MLE's, so I wouldn't draw any conclusions from a sample of 3 who overachieved.
Posted 7:15 p.m.,
January 5, 2004
(#65) -
FJM
You're right, RM, Overbay underperformed. But he's the exception that proves my point. His MLE's were way above replacement level, so he could afford to fall short and still provide some value to the team. But eventually it even caught up with him. He was sent down for most of July and all of August, returning only after the rosters expanded. Even then he got only 27 AB's after averaging 67 per month through the ASB.
To reiterate my point, unless he is brought up primarily for his outstanding defense, a young player must hit above replacement level to keep his job. A manager might stick with an older player in a prolonged slump if he has "veteran presence" (e.g., Mark Grace). But a rookie must constantly prove himself.
Posted 5:05 p.m.,
January 6, 2004
(#66) -
AED
FJM, you don't want to look at correlations to test MLE accuracy, because the varying numbers of at bats mean the correlation will be better for players with lots of at bats and worse for those with few at bats. Rather, you want to look at whether or not the performances were statistically consistent with the predictions. A player predicted to hit 0.250 who hits 0.300 in 50 at bats, for example, is consistent with his prediction at better than a 1-sigma level.
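AED's 1-sigma example as a quick binomial check (sketch):

from math import sqrt

pred, obs, ab = 0.250, 0.300, 50
sigma = sqrt(pred * (1 - pred) / ab)  # ~0.061, the binomial SD of a .250 hitter over 50 AB
z = (obs - pred) / sigma              # ~0.82, comfortably inside 1 sigma
print(round(z, 2))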
I would expect that MLEs are comparable in value to actual major league stats, because the accuracy of any projection from 3 years' data will be limited primarily by the random noise in the year being projected.
The point about selection effects is right, though. Overbay underperformed his MLE at the 0.3 sigma level, which means there is no reason to believe he was any worse than the MLE suggested. The difference between his prediction and performance can be solely attributed to luck. (and not even all that bad luck!)
Kata did the opposite, getting lucky in his first few weeks to overachieve his MLE and stick. By the end of the year, he was most of the way back down to his MLE.
Posted 8:30 p.m.,
January 6, 2004
(#67) -
MGL
AED, you are a hundred times smarter than I when it comes to statistics, but this is really a bugaboo of mine:
Overbay underperformed his MLE at the 0.3 sigma level, which means there is no reason to believe he was any worse than the MLE suggested. The difference between his prediction and performance can be solely attributed to luck. (and not even all that bad luck!)
Kata did the opposite, getting lucky in his first few weeks to overachieve his MLE and stick. By the end of the year, he was most of the way back down to his MLE.
When do certain "sigmas," as you call them, magically turn from "can be solely attributed to luck" to "cannot be solely attributed to luck"? They don't, as you know, which is why I hate it when anyone makes a magical distinction between a result that is less than 2 SD's from a null hypothesis and one that is more than 2 SD's, etc. I much prefer to say, "This is the probability that a certain result occurred by chance, given a certain assumption (usually a certain true value)." If it's 1%, 3%, 5%, or 20%, or whatever, the reader can draw his own conclusion.
This notion, used mostly in the social sciences, that 2 SD's or 2.5 SD's (only a 2.5% or a .5% chance of the particular results occurring by chance) is the "magical" threshold for statistical significance, is absurd.
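For what it's worth, turning "sigmas" into the probability MGL prefers to report is a one-liner; the one-tailed normal convention below is assumed to match the 2.5% and .5% figures above:

from scipy.stats import norm

for z in (0.3, 1.0, 2.0, 2.5):
    p = 1 - norm.cdf(z)    # one-tailed chance of a result at least z SD's out
    print(z, round(p, 3))  # 0.3 -> 0.382, 1.0 -> 0.159, 2.0 -> 0.023, 2.5 -> 0.006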
Posted 12:07 a.m.,
January 7, 2004
(#68) -
AED
Now, now. Nobody I know uses a certain sigma threshold at which the interpretation suddenly goes from "consistent" to "inconsistent"; obviously it's a gradual scale from better to worse agreement. But I can't imagine any situation in which the first data point agreed with the model at a 0.3 sigma level and my reaction was "goodness, my model's not right!", so I think my statement was correct. Given the number of at bats, he essentially performed as expected, yet lost his starting job because too much emphasis was given to two weeks' worth of stats (when Spivey was injured and Overbay and Kata were battling for one roster spot).
Posted 1:11 a.m.,
January 7, 2004
(#69) -
MGL
Given the number of at bats, he essentially performed as expected, yet lost his starting job because too much emphasis was given to two weeks' worth of stats (when Spivey was injured and Overbay and Kata were battling for one roster spot).
Now that's a better way to put it....
Posted 7:18 p.m.,
January 7, 2004
(#70) -
FJM
The ironic thing about this discussion is that managers (who probably know less about statistics as a group than the least sophisticated Primate) make decisions affecting people's careers all the time AS IF they did understand statistics. Clearly, Brenly felt Overbay failed. Even if it cannot be demonstrated statistically, the result was exactly the same: he got sent down and his days as a Diamondback were essentially over. I've seen this process repeated over and over again. Remember Jack Cust?