Forecasting 2003
February 13, 2003 - Michael
I think this is a good idea. One thing to think about is how people will be scored: sum of linear error, sum of squared error, sum of the rank of their prediction among the entries, etc. In addition to being an interesting question in itself, it may influence what people put for someone like Bonds. If you think his worst case, 25th percentile, median, 75th percentile, and max were, say, OPSs of 800, 900, 1150, 1350, and 1425 respectively, then you might be "wiser" to predict 1075 than 1150 in some scoring systems.
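Here's a quick sketch of how the scoring rule moves the optimal point prediction. The percentile values are from my Bonds example above, but the probability weights on them are made up for illustration:

```python
import numpy as np

# Hypothetical draws from a Bonds-like OPS distribution. The five values are
# the percentiles from the example above; the weights are invented.
rng = np.random.default_rng(0)
outcomes = rng.choice([800, 900, 1150, 1350, 1425], size=20_000,
                      p=[0.10, 0.25, 0.30, 0.25, 0.10])

candidates = np.arange(800, 1426, 5)
sq_loss = [np.mean((outcomes - c) ** 2) for c in candidates]
abs_loss = [np.mean(np.abs(outcomes - c)) for c in candidates]

# Squared error is minimized near the mean, absolute error near the median,
# so the "wise" prediction depends on the scoring system.
print("best under squared error:", candidates[np.argmin(sq_loss)])
print("best under absolute error:", candidates[np.argmin(abs_loss)])
```

With those made-up weights the mean is around 1130 while the median is 1150, so a sum-of-squared-error scorer should indeed shade his prediction below his median.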
I also think it would be interesting to see people's estimated confidence intervals. Asking people for 50% confidence intervals for each of the players would be interesting, as for some players it may be 900 +/- 50 where for others it may be 900 to 1350. People whose 32 players break up roughly 8 below the range, 8 in but below the predicted value, 8 in but above the predicted value, and 8 above the range would be doing their confidence ranges pretty well (assuming their estimates are close to accurate and they didn't just cheat by predicting 8 guys with ranges of 0-0, 8 guys with 2000-2000, and 16 guys with 0-2000, 8 of whom predict 0 and 8 of whom predict 2000). Predicting the ranges of possible values would add a lot of value to people's predictions. For instance, suppose one person predicts player A will have an 800 OPS and player B a 795 OPS, while another person predicts player A will have a 795 OPS with a 50% chance of being between 780 and 815, and predicts player B will have an 800 OPS but with a 50% range of 675 to 900. If it turns out that player A has an OPS of 820 while player B has an OPS of 680, it isn't clear to me that the second person's predictions aren't the more useful and insightful ones, even though the point estimates were slightly more off.
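A minimal sketch of that 8/8/8/8 bookkeeping (the tuple format here is my own):

```python
def interval_report(predictions):
    """predictions: list of (low, point, high, actual) tuples, where
    (low, high) is the elicited 50% interval. With 32 well-calibrated
    players you'd hope for roughly 8 in each bin."""
    bins = {"below range": 0, "in, under point": 0,
            "in, over point": 0, "above range": 0}
    for low, point, high, actual in predictions:
        if actual < low:
            bins["below range"] += 1
        elif actual > high:
            bins["above range"] += 1
        elif actual <= point:
            bins["in, under point"] += 1
        else:
            bins["in, over point"] += 1
    return bins

# The two hypothetical player predictions from above:
print(interval_report([(780, 795, 815, 820), (675, 800, 900, 680)]))
```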
SABR 301 - Talent Distributions (June 5, 2003)
Posted 1:48 a.m.,
June 6, 2003
(#18) -
Michael
Stat terms 101 as applied to this article:
I think one thing that must be kept in mind is that the distribution of talent in MLB (according to this study) is distributed in a normal way when weighting by playing time. If you don't weight by PA, but instead count by number of players, then you get the exponential-like distribution in chart 2. For certain discussions the by-player view is the key one (like what replacement level might be). For other discussions (like what an "average" player faced in a typical MLB situation is like) the PA-weighted version is more powerful.
A distribution with a higher than normal share of values left of the mode [the most frequent value] is a skewed distribution (with a left skew, i.e., a negative skew value). There is also the kurtosis of a distribution, which measures how skinny or fat it is (positive means a narrower peak with fatter tails, negative means a wider, flatter shape, and 0 matches the normal). In the normal Gaussian distribution the mean = mode = median. I've normally (no pun intended) heard a "normal like" distribution that is skewed called, cleverly enough, a skew-normal distribution.
For a finite sample you can calculate the skew by taking the (sum of ((the difference between each value and the mean), cubed) divided by the number of samples), all divided by the cube of the standard deviation.
The standard deviation being the familiar square root of ((the sum of (the difference between each value and the mean), squared) divided by the number of samples) which generally measures the spread of a distribution.
The kurtosis can be calculated similarly as the (sum of ((the difference between each value and the mean), to the fourth power) divided by the number of samples), all divided by the fourth power of the standard deviation.
When the kurtosis and the skew are near zero and you have a normal-like distribution, the confidence-range rules of thumb are the familiar 68% within 1 SD of the mean, etc.
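For the record, here's a sketch of those formulas in Python. Note that the 0-equals-normal convention used above implies "excess" kurtosis, so the sketch subtracts 3 from the fourth-moment formula (the raw value for a normal is 3):

```python
import math

def spread_stats(xs):
    """Mean, SD, skew, and excess kurtosis, per the formulas above
    (population versions, dividing by n)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    skew = (sum((x - mean) ** 3 for x in xs) / n) / sd ** 3
    kurt = (sum((x - mean) ** 4 for x in xs) / n) / sd ** 4 - 3
    return mean, sd, skew, kurt

print(spread_stats([2, 3, 3, 3, 4, 9]))  # long right tail -> positive skew
```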
Now to try to explain what I think we need for what Kevin is getting at:
covariance. So Cov is short for covariance, which is a measure of, cleverly enough, how two things co-vary, i.e., how they vary together. Covariance is the average of the product of each variable's difference from its own mean. For instance, imagine we were measuring the baseball talent of people residing in the US and the income of people residing in the US. If the average baseball talent is B and the average salary is $, then for each person [p] we compute their talent level [T(p)] and their income [I(p)], and for each p residing in the US we do (T(p) - B) * (I(p) - $) and take the average of all these products across all these p's to get the covariance(T, I). (Actually we divide the sum of the products from these n people by (n-1); the technical reason is Bessel's correction, which compensates for estimating the means from the same sample.)
I think that paragraph makes sense if you read it carefully. But even if not, what it implies is that if the covariance is positive then the two things we are measuring (variables) increase together (in the example above we might expect a [slight?] positive covariance between baseball talent and income level). If as one variable goes up the other goes down, you end up with a negative covariance. If the variables being analyzed are independent, then the covariance is 0. Using some math one can figure out that the absolute value of the covariance of X and Y (where X and Y are two variables we are studying) is always less than or equal to the product of the standard deviation of X with the standard deviation of Y. Also the covariance of X and X (i.e., of a variable with itself) is always equal to the standard deviation squared (aka the variance).
Now this covariance may sound similar to correlation. But one thing to note is that covariance is in terms of whatever the units of X and Y are, and the size of the covariance of X and Y is to a very large degree influenced by the standard deviations of X and Y (i.e., a huge standard deviation of X will lead to a larger value for the covariance of X and Y than one might expect for things that are not related). This means that if we want to tell how related X is to Y and compare that with how related A is to B, we might be in trouble using just covariance.
So there we bust out correlation and move each distribution to standard z-scores (mean of 0, standard deviation of 1) by subtracting the mean from each point and dividing by the standard deviation, so the new distribution has a mean of 0 and a standard deviation of 1. This means each point is now measured in standard deviations from the mean. So if we take the average of the product of the z-scores of X and Y (instead of the average of the product of the differences between each point and the mean of the respective distributions) we calculate the correlation coefficient. Note this is just a special type of covariance, so the sign of the correlation is the same as the sign of the covariance. Also note that, thanks to what we know about covariance, the correlation must be between -1 and 1 because the standard deviations of our z-scores are 1.
Once you have the correlation you can compare the correlation of X and Y with the correlation of A and B (or even with the correlation of X and A). Also you can tell how much of the variance in one distribution is explained by the other distribution by squaring the correlation (the r^2 number that so many stat projects show).
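A sketch of the whole chain in Python (covariance, then correlation as the covariance of z-scores, then r^2):

```python
def covariance(xs, ys):
    """Average product of each variable's deviation from its mean,
    dividing by n-1 per the convention mentioned above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """cov(X, X) is the variance, so these square roots are the SDs;
    dividing by them is the same as z-scoring first."""
    sx, sy = covariance(xs, xs) ** 0.5, covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sx * sy)

r = correlation([1, 2, 3, 4], [2, 3, 5, 9])
print(r, r ** 2)  # r^2: share of one variance explained by the other
```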
So getting back to Kevin, the confusing thing is I don't get what the covariances are of. Are they the covariance of park factors and league ERA? Are they instead the covariance of ERA+ and league?
Think, think, think... Ah, I should RTFM, and then I'd learn that COV as Kevin uses it is actually the coefficient of variation (std. deviation / average). So what he's doing is converting the ERA+ to z-scores and then comparing seasons across time on the z-score axis. Which, as alluded to above in my covariance digression, is an attempt to put things in the same units for comparison. In other words it is an attempt to compare an ERA+ of 150 in 2002 with an ERA+ of 150 in 1960 (amongst other things). It answers the question "is an ERA+ of 200 equally good across all time" with "no, look at the ERA++".
Applications of Win Probabilities (June 13, 2003)
Posted 10:57 p.m.,
June 14, 2003
(#1) -
Michael
So to calculate general leverage this may work, but for studying IBB situations or telling if the closer should be used I think one needs to look at the team makeup more closely. One needs to consider both the quality of your non-closer good bullpen arms and the quality of the hitting you face. If you are facing some of the worst hitters of a bad lineup, say Detroit's, then a 1 run lead may be safer than facing the 2-4 hitters of a powerful lineup, say Toronto's, with a 2 run lead.
League Equivalency (July 2, 2003)
Posted 3:10 a.m.,
July 3, 2003
(#2) -
Michael
I think a bigger problem is the non-random differences between the players who change leagues and those who do not. Good young players tend not to change leagues b/c they are not yet free agents. That means the group of players changing leagues is unlikely to be similar to the group of players not changing leagues.
Bats Right, Throws Left (July 29, 2003)
Posted 7:59 a.m.,
July 30, 2003
(#1) -
Michael
Is there a worse stat imaginable than R+RBI-HR?
If there is I don't want to meet it. R and RBI are horribly team-dependent and batting-order-dependent statistics. Subtracting HR is a dumb, dumb, dumb idea that comes about because some people think you shouldn't get credit for 2 team runs (i.e., an R and an RBI) when you hit a HR, since you only score one actual team run. But actually each R and each RBI is really half a team run. Say player A hits a triple. Player B hits a single scoring A: A gets half the credit for the team run (scored as an R) and B gets half the credit (scored as an RBI). C now hits a HR which scores B. The team run that B scores goes half to B (as an R) and half to C (as an RBI). The team run that C scores goes half to C (as an R) and the other half also to C (as an RBI). Thus the team scored 3 team runs, and A gets credit for 0.5 of them, B gets credit for 1 of them, and C gets credit for 1.5 of them. And 0.5 + 1 + 1.5 = 3, which is nice. So even if we were to decide to use the not-that-great R and RBI stats we should *not* subtract the HR. Doing so and then limiting it to bat R and throw L is beyond bizarre.
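A toy version of that bookkeeping, if the inning above is hard to follow in prose:

```python
# Each team run is split half to the runner who scores (the R half) and
# half to whoever drives him in (the RBI half).
credit = {"A": 0.0, "B": 0.0, "C": 0.0}

def score_run(runner, driven_in_by):
    credit[runner] += 0.5        # R half
    credit[driven_in_by] += 0.5  # RBI half

score_run("A", "B")  # B singles A home
score_run("B", "C")  # C homers B in
score_run("C", "C")  # C scores himself on his own HR
print(credit, "total:", sum(credit.values()))  # A 0.5, B 1.0, C 1.5 = 3.0
```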
Posted 5:43 p.m.,
August 11, 2003
(#1) -
Michael
I'm almost positive that BP has the component park factors in the back of their book.
Good thing you said almost, because BP books do not have component park factors in the back of their books. (At least not the years I have).
Double-counting Replacement Level (August 25, 2003)
Posted 8:49 p.m.,
August 29, 2003
(#14) -
Michael
Are you sure that is how BP does replacement level across the board? Someone asked GH about replacement level at the last Pizza Feed, and he claimed a replacement-level player, as used at BP, was a replacement-level hitter who was a league-average fielder at his position.
DIPS bookmarks (September 13, 2003)
Posted 12:56 a.m.,
September 14, 2003
(#1) -
Michael
Anyone know a site that tracks season to date DIPS numbers?
Patriot: Baselines (September 17, 2003)
Posted 5:10 a.m.,
September 19, 2003
(#8) -
Michael
They will say that nobody really cares about the relative ratings of .345 and .355 players. But hasn't a .345 player with 500 PA shown themselves to have more ability than a .355 player with 10 PA. Yes, they have.
Yes, but when we talk about what players have shown in the past, an observed .345 player with 500 PA has less VALUE in the past than the .355 player with 10 PA (assuming a .350 FAT replacement level). And if you knew with certainty, a priori, that A is a .345 player and B is a .355 player, that you had infinite FAT at .350, and that you could get up to 500 PA from A but only a max of 10 PA from B, then you'd want to have player B in your organization, but you wouldn't care to have player A, because you'd rather just replace A with a FAT player.
Injury-prone players (October 14, 2003)
Posted 8:21 p.m.,
October 14, 2003
(#15) -
Michael
I think this is all good stuff, but I agree that looking at effectiveness is going to be important. I mean, Shawn Green may not have missed many games this year but he wasn't very effective.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 3:46 p.m.,
October 27, 2003
(#11) -
Michael(e-mail)
Embarrass me (I think you should post all the names/results; people are embarrassed in HACKING MASS, the pre-season predictions, etc.). Or email me if you like. Thanks.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 7:37 p.m.,
October 28, 2003
(#61) -
Michael
I got #21 and did better on hitters than pitchers, which surprises me not at all. I would suggest in future we look at all players, or more players, not just up-down-up types. It may well be the case that naive (or sophisticated-naive, for a tangotiger monkey) algorithms do really well when there is a lot of uncertainty, but when things are fairly predictable they may underperform scouting or educated guesses. Maybe they miss certain breakouts or crashes. That information would be useful to know (although my hunch is that MGL is right).
And I wouldn't restrict it to players whom people have seen. If you want that information, I'd ask people to answer a yes/no question: have you seen this player? That way you can study both and don't lose data.
Heck, if readers are willing to do a bit more work (I know I would be, but I don't know how many others are), you should ask people to put distributions on the performances, like what they think someone's 5%, 25%, 50%, 75%, and 95% numbers are. It would be interesting to see how people do on distributions, as that may well be a place for intelligence that the monkey, as of yet, misses. And a player whose OPS prediction is 750, 775, 800, 825, 850 might be worth a different amount to a team than a player whose distribution is 650, 750, 800, 850, 900, even though both have a predicted 800 OPS.
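One way to score such elicited percentiles (my suggestion, not something from this thread) is the standard quantile or "pinball" loss, which penalizes each percentile forecast asymmetrically:

```python
def pinball_loss(quantile_forecasts, actual):
    """quantile_forecasts maps a percentile (e.g. 0.05) to the forecast
    value; a lower total loss means better-placed percentiles."""
    total = 0.0
    for tau, q in quantile_forecasts.items():
        diff = actual - q
        total += max(tau * diff, (tau - 1) * diff)
    return total

# The two hypothetical 800-OPS players above:
tight = {0.05: 750, 0.25: 775, 0.50: 800, 0.75: 825, 0.95: 850}
wide  = {0.05: 650, 0.25: 750, 0.50: 800, 0.75: 850, 0.95: 900}
print(pinball_loss(tight, 820), pinball_loss(wide, 820))
```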
Also, it would be interesting to see how people did if you don't throw out the "troubling" data points.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 9:58 p.m.,
October 28, 2003
(#65) -
Michael
As far as the 5%, 25%, etc. levels, such as Pecota does, personally, I don't think anything other than using regular old z scores are appropriate (IOW, if you have a .700 OPS projection, then there is a 5% chance that that player would have an OPS of greater than 2 SD above or below .700, where one SD is based on one year's worth of projected PA's. Anything other than that (such as what Pecota tries to do), is BS I think (I am not sure)...
I most likely agree with you. But I think it is more worthy of study than the question of whether you should go with Marcel on picking the 50% point. I.e., I think it is more of an open issue, even though the default hypothesis should be just the SD of the regressed value.
I also know from other studies in economics that people tend to misestimate their confidence bars quite a bit. Like if you ask people in finance what the GDP of the US is, and ask them to give you a 90% interval estimate, the interval will end up being way too small. So I think people's guesstimates of spread will be off too.
Diamond Mind Baseball - Sending the runner on a 3-2 count (October 28, 2003)
Posted 7:44 p.m.,
October 28, 2003
(#2) -
Michael
But, if all else is equal, you'd rather send the runner, because in the cases where the ball is hit you'd rather have the runner going so he can advance further.
What's a Ball Player Worth? (November 6, 2003)
Posted 4:17 p.m.,
November 7, 2003
(#10) -
Michael
Also, it becomes fun/hard if you consider the context more fully, like who the opposing team is, who is up to bat after you, etc. If the top 4 hitters on a team are really good, then the 9 hitter starting an inning off with a double changes the WPA differently than the 5 hitter doing so, even in the same base/out/score/opponent/park situation.
Golf - player of the year (November 10, 2003)
Posted 10:38 p.m.,
November 11, 2003
(#6) -
Michael
Yeah, and in many ways the replacement level in golf is pretty near $0 (I think). Suppose you picked one guy and got a cut of his winnings when he played, and when he didn't play you got a cut of the winnings of the last player invited to the tournament (some invited pro who doesn't have a PGA card); that replacement level is pretty near $0. In most tournaments (not counting the appearance money the top pros get), to get paid you need to at least make the cut, and most of the time the last player invited to a tournament doesn't make the cut.
So Vijay's durability should be an asset, because it is entirely possible that if Tiger played more he would have played worse, being more fatigued or less focused. If you told me they were both playing a big tournament and I had to pick one, based only on this year's performance, I'd probably go with Woods [although the fact that Singh >= Woods in every major suggests this isn't clear cut]. But if you told me I got a cut of their winnings for a full year, I'd go for Singh (assuming this year's results to be their true talent level). Which suggests that if we are looking at the VALUE counting stat, Singh wins, while Woods may win the VALUE rate stat.
HOOPSWORLD.com Review: Pro Basketball Prospectus 2003-04 Edition (November 18, 2003)
Posted 9:19 p.m.,
November 19, 2003
(#3) -
Michael
I think a lot of that advance in football has already been made; there is a lot of research on these things. We sometimes give a hard time to academics who run a linear regression and publish the new linear run formula as if it were original, without any knowledge of the work the "field" has done, and I think we should be careful that our criticism/suggestions to other sports don't come off sounding that uninformed. That said, baseball has a lot going for it that basketball/football/hockey don't in terms of analysis: from the time the ball is pitched to the time it is put in play it is essentially a 1 vs 1 contest with but 1 final result of the event. In sports where the events are not as distinct, or where the events occur in a team vs team setting, it is harder to get data that is as good to study. Similar to how defensive analysis is harder than offensive analysis in baseball just because things are interdependent.
Win Shares per Dollar (November 20, 2003)
Posted 5:52 p.m.,
November 20, 2003
(#3) -
Michael
Lou is right, which is why marginal wins for marginal payroll is interesting to look at. And, like many things, one can imagine that there are decreasing marginal returns (win) for increases in the payroll.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 4:05 p.m.,
December 1, 2003
(#4) -
Michael
Tango, do you also weight each season by sq_root(PATBF)? Seems like you should.
I also have to say Marcel is one damn smart monkey.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 4:11 p.m.,
December 1, 2003
(#5) -
Michael
That should be PA or TBF. I tried to use the vertical bar character for an "or", but it seems to have been filtered from the message.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 5:04 p.m.,
December 1, 2003
(#14) -
Michael
Well, the square root comes about if you make the simplifying assumption that a player has some true likelihood of reaching base in each plate appearance (.350, say). Then each plate appearance is a single observation of an independent, identically distributed random variable. This simplifying assumption, while obviously false, is useful for modeling the data. And the observed on-base percentage should approach the actual on-base percentage proportionally to the square root of the number of observations. I.e., to get a twice as accurate measure you need four times as many observations.
To use Tango's numbers: first of all there is a calculation error. You should get 7300, not 7100, and as a result an OBP of .342, not .353. (Maybe the formula is already too complicated? :)
A different way to use the weights to estimate OBP would be to just use the OBP each year. That is, do:
2003, .417
2002, .25
2001, .3
lg, .333
and use the 5,4,3,2 on that so (5*2003obp+4*2002obp+3*2001obp+2*lgobp)/(5+4+3+2).
This gives you an estimate of .332 OBP for the numbers above.
At first that is what I thought Tango meant by the weights above.
What Tango actually means (based on message 7) is:
(5*2003obp*2003PA+4*2002obp*2002PA+3*2001obp*2001PA+2*lgobp*600)/(5*2003PA+4*2002PA+3*2001PA+2*600).
This gives you an estimate of .342 OBP for the numbers above.
What I am suggesting is you instead use sq_root(PA) for each of those numbers above:
(5*2003obp*sr(2003PA)+4*2002obp*sr(2002PA)+3*2001obp*sr(2001PA)+2*lgobp*sr(600))/(5*sr(2003PA)+4*sr(2002PA)+3*sr(2001PA)+2*sr(600)), where sr = square root.
This gives you an estimate of .337 OBP and is more consistent with the math behind an observed variable. Using linear PA in this case overweights the 600-PA seasons: it assumes they are 50% more accurate than the season with 400 observed PA, when in reality they are only about 22.5% more accurate (sqrt(600/400) = 1.225).
Even though Tango did the regression without the square root normalization, it wouldn't surprise me if the square root normalization was at least as good as the linear PA weighting (unless the data set was pruned in a systematic way for short seasons), since for many players the years will be balanced, with about the same number of PA in each. And where they are unbalanced they should be unbalanced in an unsystematic way, so the weights shouldn't suffer. The only thing that might be incorrectly set for the square root calculation is the 600 PA for league average, but I doubt that will be off by much.
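To make the three schemes concrete, here's a sketch. The OBPs are the ones above; the PA totals are stand-ins I made up (they aren't listed in the thread), so the linear-PA and sqrt-PA outputs won't exactly reproduce the .342 and .337 above:

```python
import math

seasons = [  # (marcel_weight, obp, pa) -- the PA values are hypothetical
    (5, .417, 600),  # 2003
    (4, .250, 600),  # 2002
    (3, .300, 400),  # 2001
    (2, .333, 600),  # league average, treated as a 600 PA season
]

def weighted_obp(transform):
    num = sum(w * obp * transform(pa) for w, obp, pa in seasons)
    den = sum(w * transform(pa) for w, obp, pa in seasons)
    return num / den

print("rates only: %.3f" % weighted_obp(lambda pa: 1))
print("linear PA:  %.3f" % weighted_obp(lambda pa: pa))
print("sqrt(PA):   %.3f" % weighted_obp(lambda pa: math.sqrt(pa)))
```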
Banner Years and Reverse Banner Years for a Player's BB rate (December 14, 2003)
Posted 2:54 p.m.,
December 14, 2003
(#5) -
Michael
Nice. My first question: what is the correct way to predict BB rate for a non-banner player, a guy who had neither a terrific breakthrough nor a horrific fall?
Ahead of the Curve: Contract Info (December 23, 2003)
Posted 6:12 p.m.,
December 23, 2003
(#1) -
Michael
This is really good. I've been looking for a good contract site for a while. One thing that is odd is that some players have all their ARB years listed as "ARB ARB ARB FA" while others have just one ARB listed. But it is neat to see the full 40-man roster and all the future years.
I guess the only thing we'll have to keep in mind is if someone is cut from the 40 man but still owed money it wouldn't be tracked here. Or if one team is paying some/all of the contract of a traded player. But these are minor quibbles.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 10:45 a.m.,
January 2, 2004
(#13) -
Michael
However, what if we're not that certain of A-Rod OB talent level? Maybe I'm 95% sure that his OB level is from .380 to .420. Now, given that, what do I expect his performance to be over the next 600 PA by chance alone?
(Primate mathematicians, please fill in the blanks. Let's say that it's .340 to .460 95% of the time.)
Actually, if you assume that A-Rod's OBP talent is 95% likely to be between .380 and .420 (and make that uniform for that 95%; we can play with that later, it isn't important for now), and then say the 2.5% tail on each end stretches to .300 on the low side and .500 on the high side, then you can figure out roughly what you'd expect A-Rod's observed OBP to be over the next 600 PA, and we get around .348 to around .452 as his 95% confidence interval.
If you instead assume his OBP talent IS .400 then just on random fluctuation you get the 95% interval being about .360 to about .440 as Tango says.
If you instead assume his OBP talent is between .360 and .440 (say based on a 600 PA .400 season) and redo the stats you get his 95% interval of around .330 to around .470.
So you can see that the expected observed 95% interval does change with the assumed confidence interval of the assumed true talent level. But the reflection of this change is not a simple addition. With 0 assumed error you get the 95% interval of observed data is +/- .040 OBP. When you assume that the error in true talent is +/- .020 OBP you get the 95% interval of observed data is +/- .052. When you assume that the error in true talent estimate is +/- .040 OBP you get the 95% interval of the observed data is +/- .070 (with respect to the predicted mean).
Now this calculation overstates the amount of error you'd expect in reality, as I made the simplifying assumption that the 95% confidence interval in true talent level was uniformly distributed. In reality we might expect that a .400 OBP true talent level is more likely than a .380, and as a result those +/- .052 and +/- .070 would come in quite a bit. But still, one can see there is a lot of value in knowing the variance of these estimates, particularly if we have good reason to assume that the player in question has a wider variance in what we'd expect his true talent to be (be it coming after a breakout season, or coming back from an injury where we aren't sure if he's going to be as good as he once was, or maybe a young player in his prime who might mature/age/get better a lot in one year or may not).
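A Monte Carlo check of those numbers, under exactly the assumptions I described (95% of talent uniform on .380-.420, 2.5% uniform tails down to .300 and up to .500):

```python
import numpy as np

rng = np.random.default_rng(1)
n_pa, n_sims = 600, 200_000

# Draw a true talent for each simulated season from the mixture above...
u = rng.random(n_sims)
talent = np.where(u < 0.025, rng.uniform(.300, .380, n_sims),
         np.where(u < 0.975, rng.uniform(.380, .420, n_sims),
                             rng.uniform(.420, .500, n_sims)))

# ...then layer binomial (600 PA) randomness on top of it.
observed = rng.binomial(n_pa, talent) / n_pa
lo, hi = np.percentile(observed, [2.5, 97.5])
print("95%% interval of observed OBP: %.3f to %.3f" % (lo, hi))
```

This lands right around the .348 to .452 quoted above.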
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 12:33 a.m.,
January 3, 2004
(#34) -
Michael
Michael, can you give details as to how you got your numbers, as well as how you'd handle the issue of non-uniformity mathematically?
Actually if you assume that A-Rod's OBP talent is 95% likely to be between .380 and .420 (and make that uniform for that 95% - we can play with that later, it isn't important for now),
No problem. It is really pretty easy. You want to take the true talent distribution, and then put in the probability that A-Rod puts up an observed "x" OBP given that his true talent is "y". Then you do that for all x and all y. In essence you are going to multiply the two probability distributions and look at the resulting distribution. Often, especially for baseball-sized samples, I find this is easier if we make everything discrete.
To do this, just form a table (say in Excel) where you have the true talent going across the top (every integer OBP true-talent level between .380 and .420, plus a little bit of tail on each end; you could do finer than every integer, since the expected distribution of true talent is probably a continuous function, but I'm discretizing because it is easier to deal with, and since we are only going to make 600 observations anyway this discretization will not affect the estimate very much) and where you have rows that represent the number of times we observe A-Rod reaching base in his next 600 PA (starting at 0 and going all the way to 600).
Then you give each column in the table a weight which represents how likely it is that the true talent level is the value y that this column represents. In the .400 true-talent case the .400 column is 1 and every other column is 0. In the .380 to .420 distribution every column in .380 to .420 was equally worth 95/4100, and each of the columns in each tail region was worth 2.5/(100*num_columns_on_one_tail). Then you basically sum across the rows for each x, making sure to have weighted each column properly. So if y is .395 and x is 120, in my table I'd originally have Combin(600,120)*(.395)^120*(1-.395)^(600-120). When I weight the columns that turns into Combin(600,120)*(.395)^120*(1-.395)^(600-120)*(95/4100).
So then in this final column you have P(OBP observed is x) given the assumed true talent distribution. Then you just use the cumulative sum to find the x values that cut off the bottom 2.5% and the top 2.5%, which gives you the 95% interval. [I think there's a Maximum Likelihood Estimation primer that covers this sort of thing, but I'm not sure.]
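Here is the same table built in Python instead of Excel, using exactly the 95/4100 and 2.5/(100*num_columns_on_one_tail) column weights described above:

```python
import numpy as np
from math import comb

n_pa = 600

# Columns: integer OBP talent levels from .300 to .500, with their weights.
columns = []
for t in range(300, 501):
    if 380 <= t <= 420:
        w = 95 / (100 * 41)   # 41 columns share the middle 95%
    else:
        w = 2.5 / (100 * 80)  # each tail's 80 columns share its 2.5%
    columns.append((t / 1000.0, w))

# Rows: P(x times on base in 600 PA), summed across the weighted columns.
row_prob = np.zeros(n_pa + 1)
for p, w in columns:
    for x in range(n_pa + 1):
        row_prob[x] += w * comb(n_pa, x) * p ** x * (1 - p) ** (n_pa - x)

cdf = np.cumsum(row_prob)
lo = np.searchsorted(cdf, 0.025) / n_pa
hi = np.searchsorted(cdf, 0.975) / n_pa
print("95%% interval of observed OBP: %.3f to %.3f" % (lo, hi))
```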
MGL takes on the Neyer challenge (January 13, 2004)
Posted 2:42 a.m.,
January 14, 2004
(#4) -
Michael
(homepage)
Here's a nice little link that explains why you need to regress the talent level in general in an experiment. But as MGL gets around to, you could do the SoS adjustments and then regress, or you could regress and then do the SoS adjustment.
At the team level, when estimating team quality and the effect of unbalanced schedule and SoS on team results, I think it is most intuitive to approach both simultaneously (but that's just me). If you take a maximum likelihood approach you can say we know roughly what the distribution of team quality is like, by looking at ML history and knowing the rough distribution of quality across teams (take the observed results of previous years' teams and regress by the appropriate amount based on the year-to-year correlation between team wins). We can assign each team this probability mapping. Then for each given probability amount we ask: what is the likelihood that we'd observe the outcome (team A wins 84 games, team B wins 96 games, etc.) given the (true talent of team A is X, team B is Y, etc.)? Then we take this amount for all possible (true talent of team A X, team B Y, etc.) and multiply it by the probability that you'd get a team A of X quality and a team B of Y quality, etc., which was based on our original probability function. OK, my explanation is not the clearest, so let's use a simplified example:
Let's suppose there were only three types of teams in the world, those with talent .6, .5, or .4. Further, let's suppose that 10% of teams are .6 and 10% are .4, while 80% are .5.
Now 4 teams A, B, C, D play each other 16 times, but A plays B 10 times and C and D 3 times each (likewise B plays A 10 times and C and D 3 times each, while C and D play each other 10 times). At the end A has a 7-3, 1-2, 2-1 record respectively. B has a 3-7, 1-2, 1-2 record respectively. C has a 2-1, 2-1, 5-5 record (for games vs A, B, D respectively). D has a 1-2, 2-1, 5-5 record. Now for each of the 3^4 tuples of true talent for A, B, C, D [from A=.4,B=.4,C=.4,D=.4; A=.4,B=.4,C=.4,D=.5; ...; A=.6,B=.6,C=.6,D=.6] find the likelihood that we'd see the above records (in other words, when A=.4,B=.4,C=.4,D=.4, what is the probability that A vs. B is 7-3 AND A vs. C is 1-2 AND ... AND C vs. D is 5-5). Then take this likelihood and multiply it by the a priori probability that A=.4,B=.4,C=.4,D=.4 given the original distribution (.1^4 given my initial supposition). Then take all of these numbers and look for the most likely true talent levels.
Now in baseball we'd probably say there is a continuous distribution of talent rather than my 3 values, so you'd be integrating over the density once you'd multiplied the probability functions, but the idea would be the same.
Of course, if you are willing to use more information you might be willing to say that each baseball team isn't actually equally likely to fall anywhere on the MLB team talent distribution, since we could use previous years' data to say that a team that won 100 games last year is not most likely to win 81 games next year and is more likely to win 101 games than 61 games. But that is just a minor tweak, as you'd perform the same process; you'd just have a different initial a priori probability function for each team that you'd use in the multiplication step.
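A sketch of the toy example in code. One thing the example leaves unspecified is how two talent levels turn into a head-to-head win probability; the sketch assumes the usual log5 formula, which is my choice, not something stated above:

```python
from itertools import product
from math import comb

talents, prior = [.4, .5, .6], {.4: .1, .5: .8, .6: .1}

def p_win(ta, tb):
    # log5: my assumed model for talent-vs-talent win probability
    return ta * (1 - tb) / (ta * (1 - tb) + tb * (1 - ta))

# (team_i, team_j, games, wins_by_i) from the records above
series = [("A", "B", 10, 7), ("A", "C", 3, 1), ("A", "D", 3, 2),
          ("B", "C", 3, 1), ("B", "D", 3, 1), ("C", "D", 10, 5)]

best, best_post = None, -1.0
for combo in product(talents, repeat=4):  # all 3^4 talent tuples
    t = dict(zip("ABCD", combo))
    like = 1.0
    for i, j, g, w in series:
        p = p_win(t[i], t[j])
        like *= comb(g, w) * p ** w * (1 - p) ** (g - w)
    post = like * prior[t["A"]] * prior[t["B"]] * prior[t["C"]] * prior[t["D"]]
    if post > best_post:
        best, best_post = t, post

print("most likely true talents:", best)
```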
Did anyone understand what I just wrote?
MGL takes on the Neyer challenge (January 13, 2004)
Posted 8:40 p.m.,
January 15, 2004
(#25) -
Michael
David, you are sort of right that if you were trying to predict the highest w% of any team next year, it will probably be around .650 even though the most talented few teams might be true-talent .600. But for each of the given .600 teams you'd make the most accurate prediction by predicting they'd win at a .600 rate. Inherently you know there will be some random fluctuation: even if you knew a priori the exact skill level of all 30 teams, you'd still have some error in your predictions just based on random fluctuation (and you could model this easily in a simulation, as below).
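Here's roughly what such a simulation looks like (a toy model: independent coin-flip games, no schedule effects):

```python
import numpy as np

rng = np.random.default_rng(2)

# 30 teams with known true talent; the best anyone truly is is .600.
talent = np.full(30, .500)
talent[:3] = .600

best_observed = [(rng.binomial(162, talent) / 162).max()
                 for _ in range(10_000)]
print("average best observed w%%: %.3f" % np.mean(best_observed))
```

The best observed w% comes out well above .600 even though no team's true talent is above .600.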
P.S. MGL I'm not Humphries, I'm Bodell.
Mike's Baseball Rants - Hall Of Fame (January 21, 2004)
Posted 9:56 p.m.,
January 21, 2004
(#2) -
Michael
I agree. I dislike Win Shares and cringe at most of their use in general (sorry studes) but found this use appropriate, clear, and educational.
Obscure Rule Flags Students Who Sharply Improve SAT Scores (January 21, 2004)
Posted 3:42 a.m.,
January 23, 2004
(#23) -
Michael
Yeah, but Alan Jordan, where did you get the "child molester" part? I don't think you mean cheater == child molester.
At school we actually had some pretty nifty cheat-detection programs in computer science that would test students' program submissions and found numerous cases where students had copied other students' work (even from previous years). Amazing that even when you warn students that this type of checking is happening, they *still* end up cheating.
Speaking of SAT-type problems, a common problem I've heard used to illustrate Bayes' theorem is the taxi accident problem. Imagine that on some mystery island, witnesses identify the color of a car involved in a hit and run correctly 80% of the time, and the other 20% of the time they are incorrect. Now imagine that there are 95 yellow taxis and 5 green taxis on the island. There is a hit and run involving a taxi, and there is a single witness who says the taxi in question was green. To the nearest percentage, what is the chance that the taxi really was green?
A. 17%
B. 20%
C. 50%
D. 63%
E. 80%
Many, many people (incorrectly) say 80%. It's almost as bad as the Let's Make a Deal problem.
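The arithmetic, via Bayes' theorem:

```python
p_green = 0.05    # 5 of the 100 taxis are green
p_correct = 0.80  # witness identifies the color correctly 80% of the time

# P(witness says green) = P(correct on a green) + P(wrong on a yellow)
p_say_green = p_correct * p_green + (1 - p_correct) * (1 - p_green)
p_green_given_said = p_correct * p_green / p_say_green
print(round(p_green_given_said, 3))  # 0.174 -> answer A, about 17%
```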
FANTASY CENTRAL (February 21, 2004)
Posted 9:56 p.m.,
February 22, 2004
(#11) -
Michael
I'm planning on doing what you describe, J. Cross, in terms of figuring out the expected distribution of each stat and updating it as my team and the other teams in my league fill in. But I'm also in a primer-based league (the same Yahoo one as Nob Narb), so I also don't want to share too much. Should be fun to see a league of Primates with more sabermetric-type stats go at it.
FANTASY CENTRAL (February 21, 2004)
Posted 2:48 a.m.,
March 3, 2004
(#94) -
Michael
The major league front offices that are currently debating if it is better to compare projection systems using rmse or correlations are, to say the least, not in the majority.
And most front offices would want to invest in what is likely to bring their particular team more money. That may involve some non-performance things (a good marketing niche [Japan, Latino fans, a record chaser]; a home-town hero; a big name/used-to-be star; etc.) as well as performance things (winning games brings more fans; the playoffs bring a lot of revenue). IMHO, on the performance side you absolutely want not only a rank order of players (correlation) but also accuracy in how much better certain players are than other players and than replacement players (RMSE). If player A is 1 run better than player B, and player C is 15 runs better than player D, with no players in between A and B or in between C and D, and a team is able to pick either A and D or B and C, you want the accurate predictions that correctly get the magnitude of the differences between these players.
FANTASY CENTRAL (February 21, 2004)
Posted 10:17 p.m.,
March 4, 2004
(#100) -
Michael
You also want to baseline by position, such that the "replacement" level at each position is also set to zero $.
So what is the correct way to calculate the "replacement level"? Ignoring adjusting to what players each team has, and ignoring adaptive algorithms, what if you want to come up with a static list of $ values a la a lot of roto fantasy sites and the BP fantasy manager? Here's my thinking to date:
Simplify for a second and imagine a fake league where each of 10 teams needs 3 utility players (and that's it, no other slots) and there's only one category (say SB). Replacement value here is easy to calculate: you just look at the 30th-best SB guy in your projections and he's essentially worth $0 (ignoring the potential need for a minimum $1 bid on all players), since if you are the last team to pick the last guy, he (or someone better, if one of the other teams messed up) should be yours uncontested (maybe it is better to use the 31st guy as replacement value, as in the general case with multiple positions it works best to use the guy who will never be selected). Say you project him to steal 17 bases. Now you just need to figure out how many steals above replacement level there are in the top 30 guys. Say the top 30 guys project to 753 total steals. Then there are 243 steals above replacement available (753 - 30*17), and if there are 10 teams with $260 budgets each, each steal over replacement is worth about $10.70 ($260*10/243). And to calculate a player's value you just know $0 = 17 * $10.7 - b; from that you get b of about 182, so a player's value in my example here is around 10.7 * projected_steals - 182.
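That simplest case, as a sketch:

```python
teams, budget, spots = 10, 260, 3
top30_sb, replacement_sb = 753, 17   # the projections assumed above

sb_above_rep = top30_sb - teams * spots * replacement_sb  # 243
dollars_per_sb = teams * budget / sb_above_rep            # ~10.70

def player_value(projected_sb):
    # equivalent to the 10.7 * projected_steals - 182 form above
    return dollars_per_sb * (projected_sb - replacement_sb)

print("$/SB above replacement: %.2f" % dollars_per_sb)
print("value of a 40-SB projection: $%.0f" % player_value(40))
```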
OK, so that's how it works in the simplest of cases. If you were to generalize to a full field of positions you'd do the same kind of calculation to get the replacement level at each position, calculate the number of steals above position replacement for each position, do the division to get $/steal_above_replacement, and you'd be off again. So imagine another fake simplified league that this time requires a catcher, a SS, and an OF, again with just the steals category and 10 teams. This time replacement level might be 20 for OF, 13 for SS, and 2 for C, and there might be 323, 214, and 50 SB by the top 10 OF, SS, and C respectively. That means there's a total of 237 SB over replacement ((323 - 10 * 20) + (214 - 10 * 13) + (50 - 10 * 2) = 237). With $2600 total that gives about $10.97/SB over replacement. Which means a 9-SB C, a 20-SB SS, and a 27-SB OF are all equally valuable (with formulas of about OF$ = 10.97 * SB - 219; SS$ = 10.97 * SB - 143; C$ = 10.97 * SB - 22).
OK, that's all pretty easy. But the hard part comes with multiple stats. Say we go back to the 3-util-player league but this time there are two categories: SB and HR. How does one calculate replacement value now? Ideally you want some formula that says $player = x * HR + y * SB + b. But to calculate x and y you need to know what replacement is for HR and SB. And it is no longer as easy as picking the 31st-best guy, as you need some way to choose the 31st guy and you don't know whether a guy with 20 SB and 25 HR might be better than a guy with 10 SB and 30 HR. The naive way might be to pretend you have 2 different single-category leagues with $130 per team: in other words, calculate the value of just HR ignoring SB, and then do the same with just SB ignoring HR. I'm pretty sure this isn't sound.
A better way might be to calculate the Z-values for each player in HR and SB, sum the two numbers, and rank the players on that as if you were playing a roto game where the single category was Z-value (in other words, find the 31st-best summed Z-value and use that as replacement level). But here you have a couple of problems:
1. Z-values based on what pool of players? If you use the average HR and stdev HR for the whole league (and same with SB) and you are only going to choose a subset of the league, then why should adding scrubs to the league change your answer? Imagine the HR average was 20 with std dev 10, and the SB average was 10 with std dev 5, and because you are choosing only a subset of players you are sure you'll never take someone with below-average HR and SB. Now all of a sudden you add a bunch of players who hit 2 HR and stole 8 bases, and your average and stdev for HR and SB have changed even though you haven't changed the characteristics of the players you plan on picking. So clearly you'd like to calculate the std dev and average of only the players you might reasonably consider taking. But to do that you have to be able to rank the players somehow, which is our first problem again.
2. As has been pointed out before, in many cases you'd rather have a guy who was 2 std dev above the mean of players you are considering in both HR and SB than a guy who was 5 std dev above in HR but 1 std dev below in SB. This method doesn't reflect that.
So what do people think the best way to come up with replacement values is when you have two categories at once?
FANTASY CENTRAL (February 21, 2004)
Posted 1:07 p.m.,
March 5, 2004
(#103) -
Michael
J. Cross your method is ... unsatisfying.
I understand that it may well work from a practical standpoint as a good first-order approximation, but there ought to be a theoretical way to calculate it from scratch. Imagine that we were the first roto league ever to use just two categories. What would you use then, as the std devs and averages would be different than in a 5x5 or 6x6 or 8x8 league?
FANTASY CENTRAL (February 21, 2004)
Posted 8:34 p.m.,
March 15, 2004
(#131) -
Michael
I agree with NN. The value of the players at the position matter.
Imagine you have the following choices at 3b:
60,30,28,26,25,24,24,23,22,10
And the following choices at 1b:
60,59,58,57,56,55,55,30,20,10
Where in both places the "10" is your replacement guy.
And again we are assuming a draft-based league, not an auction league. If it is your turn to pick, it is clear that the top 3b in this example is worth more than the top 1b. They aren't equally valuable even though they are both worth "60" and both have "50" value over replacement.
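A quick way to see it, assuming you get one of the two top guys now and roughly the next tier at the other position on your next turn:

```python
third = [60, 30, 28, 26, 25, 24, 24, 23, 22, 10]
first = [60, 59, 58, 57, 56, 55, 55, 30, 20, 10]

# Take the top 3b now and settle for a second-tier 1b later, or vice versa.
take_3b_first = third[0] + first[1]  # 60 + 59 = 119
take_1b_first = first[0] + third[1]  # 60 + 30 = 90
print(take_3b_first, "vs", take_1b_first)
```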
FANTASY CENTRAL (February 21, 2004)
Posted 2:21 a.m.,
March 16, 2004
(#136) -
Michael
Seems likely, yes. Although I'm a bit behind (as always) on the amount of functionality my spreadsheet has. I just hope we get enough live people so that you don't see 8 picks go by instantaneously and then have 90 seconds to enter those 8 picks and choose your next guy.
FANTASY CENTRAL (February 21, 2004)
Posted 4:21 a.m.,
March 21, 2004
(#147) -
Michael
The aftermath is at:
http://baseball.fantasysports.yahoo.com/b1/2686/draftresults
If you check out the draft results remember this is an 8x8 league with:
R,HR,RBI,SB,TB,BB,OBP,OPS for hitters
IP,W,SV,K,ERA,WHIP,K/9,K/BB for pitchers
MGL's superLWTS (March 10, 2004)
Posted 3:12 a.m.,
March 15, 2004
(#19) -
Michael
MGL and Tango, just great stuff. I agree with David #7 about the need for it to be presented in a better format. I will willingly go blind looking at this stuff, and it IS the best overall system out there. I understand how busy you guys are, but anything you could do to help my eyes would be appreciated... if we all need to rough Dan up, let us know..
JK
Michael
Silver: The Science of Forecasting (March 12, 2004)
Posted 9:19 p.m.,
March 15, 2004
(#39) -
Michael
Does anyone have the PECOTA numbers for 2003? When BP added 2004 PECOTA, 2003 went down (they reused the URLs, which don't have the year in them). Nate says 2003 is coming back, but it isn't a high priority. If you had the 2003 probability bands you could at least look at the players forecast in 2003 and see what percentage came in over/under their various projections. I.e., if 24.5% of players played below their 25% numbers, that might be pretty good. If 48% of players were below their 40% numbers, that might not be as good. If the distribution of actuals mirrors the projections, then there is some confidence PECOTA is doing a good job with its bands. Especially since, as Tango points out, the PECOTA bands are *not* simply saying here's what you'd expect via the binomial to occur if true talent was X over Y PA.
I had planned to do just this study but the 2003 data went away. (I wanted to know how much I should trust the 2004 bands when I was doing fantasy valuation.)
Silver: The Science of Forecasting (March 12, 2004)
Posted 12:00 a.m.,
March 16, 2004
(#42) -
Michael
It totally matters. First of all, it is not the case that all players have equal and symmetric distributions. But even in a case like your example where they do, think of the following two pitchers:
Pitcher A E(ERA) = 4.00, SD(ERA) = 1
Pitcher B E(ERA) = 4.00, SD(ERA) = 0.25
Imagine that you are the Yankees picking the guy you want to be your 5th starter. Maybe you are comfortable with pitcher B because you know that with him pitching you'll almost certainly expect to win 50+% of his games regardless. I.e., you want the less risky guy.
Now imagine that you are instead the Jays and you are trying to fill your #3 or #4 starter slot. You might prefer pitcher A, because in order to catch the Red Sox and Yankees you pretty much need 1-SD-better-than-average type luck to make the playoffs, and as a result you might be risk-seeking.
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 8:36 p.m.,
March 22, 2004
(#12) -
Michael
Whatever the market will bear:
6 years at $66 million. :)
But seriously, what he is worth is different to different teams. It depends on the size of the city, the likelihood that the extra wins from Chavez versus a replacement will mean playoff trips, the fan reaction, the popularity of the player, who can negotiate with the player, etc.