Forecasting 2003
by Tangotiger
The forecast today is partly sunny, with a chance of showers. Tomorrow will be partly cloudy, with a chance of rainbow. AOL will go up, with a chance to go down.

There's a large group of people who don't have much, if any, use for forecasting systems (baseball, stocks, weather, or otherwise). After all, these forecasting systems are based on probabilities and not certainties, so what good are they? These people reason that they could look at the back of a player's baseball card, check his age, and make a decent forecast of his upcoming performance. There's another large group of people who create forecasting systems, because they reason that there are many intricate details that need to be analyzed, so that they can reduce the error range in their probability distributions and create useful and accurate forecasts.

Put up or shut up

Every year, the Wall Street Journal polls the major brokerage houses for their "top 10" picks of the year. It compares the annual performance of those picks against the S&P 500 index (i.e., the mom & pop investor). About two years ago, I came across their list over a five-year period. Lehman and Smith Barney were the only ones to beat mom & pop; the other nine brokerages trailed the index.

Throughout the year, we will be comparing the expectations of the systematic forecasters, the back-of-the-card forecasters, and the baseline forecast. Four systematic forecasters have been kind enough to agree to supply me with the projected OPS or ERA of a selected group of players: Mitchel Lichtman, Ron Shandler, Nate Silver, and Tom Tippett. STATS did not respond to my request, so I will have to get their numbers with a little more effort.

The baseline forecast is very simple: take a player's last three years' OPS or ERA. If he was born in 1973 or earlier, worsen his OPS by 5% or his ERA by 10%. If he was born in 1976 or later, improve his OPS by 5% or his ERA by 10%. The 1974-75 players keep their 2000-2002 averages. (A sketch of this rule appears at the end of this introduction.)

The back-of-the-card forecasters are the Primer readers. Step right up, and apply whatever process you want. It would help if you do not use the projections of our forecasters.

The selected players

The 32 players were selected as follows:
Essentially, this is a list of players who should be hard to forecast, because their 2000-2002 performance has been very inconsistent. I introduced the Colorado condition, as well as the playing-time condition, because even the back-of-the-card forecasters would agree that some systematic process would be required to handle those players. In the coming weeks, Dan Werr, Chris Dial, and I will present some commentary on each of the 32 players. Some time in March, you will get the chance to fill in your ballot with your forecasts. In the meantime, here are the 32 players; take a look at them, and watch out for the commentary.
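Here is a minimal sketch of the baseline rule described above. The function name and inputs are mine, not part of any forecaster's system; it assumes you already have a player's 2000-2002 average and his birth year.

```python
def baseline_forecast(avg_2000_2002, birth_year, stat="OPS"):
    """Baseline rule: start from the player's 2000-2002 average,
    then nudge it by age bracket (proxied by birth year)."""
    step = 0.05 if stat == "OPS" else 0.10   # OPS moves 5%, ERA moves 10%
    if birth_year <= 1973:
        # Older players are expected to decline: lower OPS, raise ERA.
        return avg_2000_2002 * (1 - step) if stat == "OPS" else avg_2000_2002 * (1 + step)
    if birth_year >= 1976:
        # Younger players are expected to improve: raise OPS, lower ERA.
        return avg_2000_2002 * (1 + step) if stat == "OPS" else avg_2000_2002 * (1 - step)
    return avg_2000_2002                      # born 1974-75: keep the average
```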
February 12, 2003 - Gerry
I can think of a couple of other ways of making a quick & dirty forecast.
1. Take the median OPS over the last three seasons. Make the age adjustment if you wish.
2. Take a weighted average of the OPS over the last three seasons; say, half 2002 plus one-third 2001 plus one-sixth 2000. Make the age adjustment if you wish.
It might be amusing to see how these do against the other competitors.
I guess the simplest forecast would be to just use the 2002 OPS (with or without age adjustment).
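A sketch of the three quick & dirty forecasts above, with hypothetical function names; the article's age adjustment could be applied on top of any of them.

```python
import statistics

def median_forecast(ops_2000, ops_2001, ops_2002):
    # Option 1: median OPS over the last three seasons.
    return statistics.median([ops_2000, ops_2001, ops_2002])

def weighted_forecast(ops_2000, ops_2001, ops_2002):
    # Option 2: half 2002, plus one-third 2001, plus one-sixth 2000.
    return ops_2002 / 2 + ops_2001 / 3 + ops_2000 / 6

def naive_forecast(ops_2002):
    # Simplest of all: just repeat last season.
    return ops_2002
```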
February 12, 2003 - Stevens
Another possibility is to sneak a peek at the best comp for the player's age at Baseball-Reference.com. Then, if the comp is older, you just use the comp player's stats for next year. Or you could use the comp's comp for his next year and use those stats. I'll probably do one of these for grins, mostly because it's easy.
Thanks for the study. It will be fun.
February 13, 2003 - Michael
I think this is a good idea. One thing to think about is how people will be scored: sum of linear (absolute) error, sum of squared error, sum of the rank of their prediction among the entries, etc. In addition to being an interesting question, it may influence what people put for someone like Bonds. If you think his worst case, 25th percentile, median, 75th percentile, and max were, say, OPSs of 800, 900, 1150, 1350, and 1425 respectively, then you might be "wiser" to predict 1075 than 1150 in some scoring systems.
I also think it would be interesting to see people's estimated confidence intervals. Asking for a 50% confidence interval on each player would be interesting, because for some players it may be 900 +/- 50, while for others it may be 900 to 1350. People whose 32 players break up roughly 8 below the range, 8 in the range but below the predicted value, 8 in the range but above the predicted value, and 8 above the range would be doing their confidence ranges pretty well (assuming that their estimates are close to accurate and they didn't just cheat by predicting 8 guys with ranges of 0-0, 8 guys with 2000-2000, and 16 guys with 0-2000, 8 of whom predict 0 and 8 of whom predict 2000). Predicting the range of possible values would add a lot of value to people's predictions. For instance, suppose one person predicts player A will have an 800 OPS and player B a 795 OPS, while another person predicts player A at 795 with a 50% chance of being between 780 and 815, and player B at 800 with a 50% range of 675 to 900. If player A ends up at 820 and player B at 680, it isn't clear to me that the second person's predictions are less useful and insightful, even though his point estimates were slightly more off.
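A sketch of the calibration check Michael describes, assuming each entrant supplies a point estimate plus a 50% interval for every player; all names here are hypothetical.

```python
def calibration_buckets(entries, actuals):
    """Tally how the actual values fall against an entrant's 50% intervals.
    A well-calibrated entrant on 32 players lands near (8, 8, 8, 8)."""
    below = in_low = in_high = above = 0
    for (point, lo, hi), actual in zip(entries, actuals):
        if actual < lo:
            below += 1            # actual fell under the interval
        elif actual > hi:
            above += 1            # actual fell over the interval
        elif actual <= point:
            in_low += 1           # in the interval, at or below the point estimate
        else:
            in_high += 1          # in the interval, above the point estimate
    return below, in_low, in_high, above
```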
February 13, 2003 - Chris Hartjes
Michael, the method you're describing seems very similar to the new PECOTA system that the guys over at Baseball Prospectus are now pushing. It compares players to similar players in order to produce a projection, then displays those results ranked by percentile groups. It gives them a way to project potential breakouts or slumps.
Now, I'm not a shill for BP but the system does look interesting. Can't wait to read BP2003 and see a more in-depth look at it.
February 13, 2003 - tangotiger
I still have not decided how to "rank" the entries. With only 32 players, using raw differentials or RMSE might not be the most appropriate (especially with the Bonds issue). I could create "classes of differentials" (consider each class to be 1 SD of error, and max out at 3 SDs or something like that). Or I might use differentials while capping each individual differential at 3 SDs. Really, it's not that important: I'm going to present the full data, and the reader is free to analyze and interpret it as well.
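One way to read the capped-differential idea; the error scale `sd` would have to be estimated somehow (say, from the spread of all entries), which is my assumption, not a detail from the comment.

```python
def capped_score(forecasts, actuals, sd, cap_sds=3.0):
    """Sum of absolute differentials, with each player's miss capped at
    cap_sds standard deviations, so a single outlier (a Bonds-type season)
    cannot dominate a 32-player sample."""
    cap = cap_sds * sd
    return sum(min(abs(f - a), cap) for f, a in zip(forecasts, actuals))
```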
February 13, 2003 - mulker
Interesting work, as always, Tango. Can't wait to see the results.
For those of us too impatient to wait 8 months, any thoughts about looking retrospectively, using the same methodology, at last year or other prior years? Fantasy players (and others) across the country would be ever grateful.
February 13, 2003 - Paul M
Hey, I love the idea, so don't take the criticism the wrong way.
1 potential flaw-- you're presuming Rogers gets a job and actually pitches this year.
Another-- I wonder if playing time ought to be part of the competition somehow. Surely it is part of a GM's or player personnel director's job-- there are some real ifs here in terms of how much these guys will play:
Moises Alou, Marquis Grissom, Jeromy Burnitz, Jeremy Giambi, and JD Drew-- and among the pitchers, Rogers, Sele, and Ritchie.
Maybe there should be a tiebreaker or secondary competition-- name the position players who will not amass 500 PAs, and the starters who will fall under 150 IP.
But overall, great idea and it should be fun. I've always been able to do this pretty well for hitters in a rotisserie context-- but pitchers are a total crapshoot, aren't they??
February 13, 2003 - tangotiger
...retrospectively, using the same methodology, at last year or other prior years?
I guess surprises are out of the question around here! Voros has looked at the various forecasters for the year 2000 hitters. I was also going to add the "baseline forecast" to his list to see how it stacks up. Stay tuned in a couple of weeks.
When do fantasy drafts usually occur? The last weekend in March?
February 13, 2003 - mulker
Thanks for the link, Tango. Interesting stuff.
Yup, most fantasy drafts are last week of March.
And we do like surprises. We just don't like waiting. :)
February 13, 2003 - Roger
This will be an interesting but far from definitive exercise for one major reason. The number of cases is so small that any of the forecasts might win by chance alone. Or, another way of putting it is that the results will be affected greatly by the cases that were initially selected and how much the performance of those players is affected by (good or bad) luck or chance. It would be far more useful if you had selected a couple of hundred players at random.
February 13, 2003 - tangotiger
I probably should have said this in the article.
If I were to ask the Primer readers to estimate 200 players' ERA or OPS, I'd get a smattering of responses. By limiting it to something reasonable (32), I hope to get decent participation, while at the same time getting reasonable (though not conclusive) results from the forecasters. This is similar to what the WSJ does in using the top 10 picks from the brokerages. The intent is not to prove anything. I also selected the 32 players who showed the most deviations, and therefore we'd expect the forecasters and the Primer readers to have little agreement on these.
I have also asked the forecasters to participate in a second parallel study, where they would submit the projections for a large number of players. I've only received a positive response from 2 of them. This is essentially what Voros did with his study, except he did the hard work by compiling everything himself. I can understand that the forecasters don't want to give everything away (which is why it was easy to ask them for only 32). I hope though that by the end of the season, they'll give me their list, so that I can save some work. So, you'll get the study that you are looking for, plus the other readers will have some fun (I hope) as well.
I hope this answers your concern.
February 13, 2003 - RossyW
If league-wide offense goes up or down, as it probably will, do you adjust all of the projections?
February 13, 2003 - tangotiger
Yes, included with the ballot for the 32 players will be your estimate of MLB OPS and ERA (which will default to the 2002 level if you don't choose anything).
This is critical, because if a forecaster underestimates all his projections by the same amount, it doesn't matter, as long as you use only his system. A uniform bias, therefore, is not a bad thing.
Really, I wanted to ask everyone to submit their OPS/lgOPS, but that loses too much meaning.
Great question!
February 13, 2003 - tangotiger
By the way, if anyone has a systematic forecasting system, then send me an email. It could really be based on anything, like
- weighted or unweighted recent performance
- lefty/righty splits
- gb/fb tendencies
- comparable players
- age
- height/weight
- position
- regression towards the mean
- injury history

You don't have to tell me how your engine processes everything, just what the engine considers and how. I can then throw you into the systematic forecaster pool. Thanks...
February 13, 2003 - Charles Saeger
A thought -- why not use single-season similarity scores? I know it's not OPS, but it would be interesting, though you would need to do a PA correction or something.
February 13, 2003 - DCW3
Charles--
I'm actually working on a project that creates an OPS forecast for hitters based on the next two seasons of each batter's ten most similar players through current age. The object was to test the predictive value (if any) of sim scores, but it seems quite applicable to this situation. I've got 16 teams done so far.
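A hedged sketch of a comp-based forecast along the lines DCW3 describes; the `similarity` function, the `history` collection, and the `ops_at` accessor are all assumed to exist, and every name here is hypothetical rather than taken from his project.

```python
def simscore_forecast(player, history, similarity, n=10):
    """Average the next-season OPS of the n players historically most
    similar to `player` through the same age (a sketch, not DCW3's code)."""
    comps = sorted(history, key=lambda c: similarity(player, c), reverse=True)[:n]
    next_ops = []
    for comp in comps:
        ops = comp.ops_at(player.age + 1)   # the comp's OPS at player's age + 1
        if ops is not None:                  # some comps never played that season
            next_ops.append(ops)
    return sum(next_ops) / len(next_ops) if next_ops else None
```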
February 13, 2003 - David Smyth
Tango, I think this is a fantastic topic. Although I greatly respect the technical expertise of expert projectors such as Voros, MGL, etc., I still cling to the belief that I can do "almost" as well by looking at the back of a baseball card for 30 seconds or whatever. What is important to me is not whether they can "beat" me (I have no doubt that they can), but rather the magnitude of the discrepancy. If I can get 90% of the way there doing 10% of the work, I'm quite happy with that, in the knowledge that even the most accurate projections are still off the mark a noticeable percentage of the time. I don't mean to leave the impression that advanced player projection is for stat nerds only--I realize that this is one of the "higher" forms of the science. But what has a large impact conceptually is not necessarily matched in the practical arena.
February 13, 2003 - Rob Lane
Hope I'm not out of line, but this reminds me of a discussion that came up last spring when Voros was presenting a study he had made of some of the previous year's predictions. Here is a post I made at the time:
> As a sort of control, do you happen to know what the results are if you simply plug in the previous year's performance as your prediction? This would probably have to omit the minor leaguers and would rack up some bad scores for players moving to/from Colorado, but it would establish a baseline for prediction systems, yes?
I dropped out of further discussion because you guys got all statistical and started talking about "disaggregating errors" and such things and scared me off, but what I wanted to suggest at the time was a suite of baseline predictions, all mechanical, against which other systems could be compared. Baseline #0 would just be the previous year's numbers reentered for each player; subsequent baselines would add some straightforward refinement similar to those that have already been discussed here.
Regardless of the above, I'm looking forward to seeing how this study pans out.
February 14, 2003 - Monty
I like the idea of a pure "control group" baseline. In addition to blindly predicting that each player will replicate his previous season, you could predict that each player will hit his career average OPS exactly.
February 16, 2003 - Voros McCracken
David Smyth wrote:
"I still cling to the belief that I can do "almost" as well, looking at the back of a baseball card for 30 seconds or whatever."
Right, and that generally is because in 3/4 of the cases, the projections will be remarkably similar.
Where Tango is on the right track here is isolating a group of players for whom there can and should be a fairly large discrepancy as to what people will think they'll do. Indeed it's in cases like Jeff Cirillo where a fair amount of research and study can be helpful. On the other hand a wild guess on how to do it might match what the research says as well.
It should also be noted that if enough people make guesses, by rules of probability, some Primate should be able to come away with the top score.
I like the player list though.
February 16, 2003 - Roger
Be careful what you're saying about Primates. You know the old story about how, if you put 1000 monkeys in front of typewriters, sooner or later they would hack out a perfect copy of a Shakespeare play. Surely you weren't referring to that kind of primate. (But I think you've got the idea right -- given enough Primates hacking away at Tangotiger's Task, somebody by chance alone is going to beat the "experts.")
Of course a major flaw in the argument about the 1000 monkeys is that not one of those primates would be able to read or use the printed output. Furthermore, there would be far far more "near misses" to the perfect manuscript -- oh, maybe with a few typographical errors somewhere -- than there would be "perfect" manuscripts, and wouldn't it be a shame if nobody were intelligent enough to realize that almost is close enough?
February 17, 2003 - Dan Werr
By the way, I've said that this year would bring another installment of the Boone Pool, not with Boone but with one or more other players. This project is that installment; the two concepts have been combined.
February 17, 2003 - tangotiger
Just to reiterate (or maybe iterate, since I was not very clear), the point is not to figure out who has the best forecasting system, but rather if a systematic forecasting system is any better than a baseline or back of the envelope (card) system.
What the WSJ study shows is not that Lehman Brothers has a better forecasting system (hard to say with only 10 stocks), but rather that mom & pop do better using a baseline (the S&P 500 index) than paying the professionals.
To determine which professionals are better, you need far more than just 10 sample points, and the WSJ also does this by looking at all stock picks. This would be part of a second parallel study, if I get decent participation from the forecasters as well, similar to what Voros did in the 2000 link I provided. However, given that I've chosen 32 players who have very inconsistent performances, I think it might show something about the forecasters, but it will be far from conclusive. (If I had chosen the 32 most consistent players, my guess is that all systematic forecasters would come up with very, very similar estimates. I've removed the Colorado and the inexperienced players from the study, and there again, some forecasting systems might be better with those players.)
February 18, 2003 - being
Looks like fun. I'm surprised and disappointed Darin Erstad isn't on the list. He's gotta be the Most Difficult Player to Forecast.
(.332-26-104?) (.298-17-82?) (.271-8-58?)
February 18, 2003 - tangotiger
Erstad: good call!
The next set of players that missed making the cut were, in order: Renteria, Erstad, Beltran, Sosa, Javy Lopez, Mark Loretta, Vina, Giles, Magglio Ordonez, Garret Anderson, Ben Molina.
February 18, 2003 - Minks
Are you going to make park adjustments if we give you lgOPS? I mean, do I have to submit a park adjusted lgOPS for each player or can I just submit an estimate for the overall lgOPS and assume you will adjust it for each player?
February 18, 2003 - David Smyth
I'm a bit confused about what this project is trying to do with respect to the inclusion of "outside information" (outside the stats, I mean). There are 3 "levels"-- the basic, the "back of card", and the advanced. What would the results be if the 3 levels were restricted to pure stat information? This is what I am most interested in knowing. But the way this seems to be set up, it is what the results would be if each level has different information. If I submit a "back of the card" entry, it will not consider any outside info, such as injuries or new pitchers, etc. It may "infer" such info, due to a lower AB total in 2002 or whatever, but it will not seek to learn or directly include such info. But this may not coincide with what some other back-of-card primey submits, or with what R. Shandler has taken into account. It is easy to say that outside info tends to be unhelpful overall, due to low reliability. But for a small selected sample of "oddball" seasons, outside info might be invaluable.
I just want to make sure that apples are being compared to apples--or, if they aren't, that there is a compelling reason for such. And if there is, that the interpretation is taking all of this into account.
February 19, 2003 - tangotiger
Minks: no, you would supply only the unadjusted OPS. The only reason to supply lgOPS and lgERA would be to establish your basis. Suppose that you miss all your OPS projections by 50 points, but your lgOPS projection was also off by 50. Then this scores 100% (in my book). A person using the results of such a projection will be perfectly happy (as long as he uses only this projection).
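One way to formalize that basis idea: score each projection relative to the forecaster's own league baseline. This additive version is my assumption; a multiplicative version (proj/proj_lg vs. actual/actual_lg) would be just as plausible.

```python
def basis_adjusted_error(proj, proj_lg, actual, actual_lg):
    """Error of a projection measured against the forecaster's own league
    baseline.  A forecaster who misses every player by 50 points of OPS,
    but whose lgOPS estimate was off by the same 50, scores 0 here."""
    return (proj - proj_lg) - (actual - actual_lg)
```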
David: the back-of-the-card forecasters are just like the mom & pop investor. They each have access to their own private information, public information, and intuition, and they combine all of that into some sort of target price for a stock. The collection of all these investors makes up the market. You can benefit from this "wisdom" by buying the S&P 500 index (SPY). The systematic forecasters follow a rigid, repeatable process, like the various brokerage houses, such as Lehman and Smith Barney. The baseline is the monkey throwing darts at a stock chart. So whether I am comparing apples to oranges, I don't really care (for this study). I'm trying to put this study on the same plane as the WSJ study.
A second parallel study, looking at extended picks that the systematic forecasters provide (which the WSJ also does when selecting their best analysts), might satisfy the fruit requirements.
February 19, 2003 - David Smyth
I'm clear now, Tango. But I want to point out something else. The design of the project would seem to put the expert forecasters like MGL and Voros at a disadvantage. That is, their procedures are specifically designed to provide the best accuracy over the overall mass of players. They were not specifically tailored to provide the best accuracy for players who had an unusual season in 2002. Their method might be sacrificing outlier accuracy for bulk accuracy. I have no such restriction as a back-of-the-card guy. In fact, I would be stupid if I didn't take this into account and choose to regress more, or less, than I normally would. So, I question whether this project will really reveal the "true" differences between the "levels" of forecasting complexity.
February 20, 2003 - Vinay Kumar
Greg Spira and Harold Brooks did something similar in '98. See http://www.baseballprospectus.com/news/19990201brooks.html
I wanted to participate, but got frustrated trying to project Edgardo Alfonzo, and never finished my projection list.
February 21, 2003 - tangotiger
Vinay, excellent. I did not know about this. We are essentially after the same goal, but where they have 27 humans projecting 125 players, I'm hoping to get the reverse (100+ humans projecting 32 players).
What is very interesting to me, and which matches the stock market with its S&P 500 index, is that the collective wisdom of the market matches the top forecaster, with all of his intricacies.
The "missing big or getting big" projections of Wilton, I think is probably attributed to lack of regression towards the mean in that system. I'd have to look at the data more carefully though. Because we are dealing with sample performances, you should expect a few guys to have seasons that are out of the norm, and therefore a system like STATS or Palmer will miss the outliers at the gain of the large population. Silver's PECOTA should give the readers the best of both worlds.
February 24, 2003 - FJM
I don't want to turn this into a DIPS thread. But for the sake of argument let's say that Voros is right: most pitchers' performance outside of the DIPS measures [SO/9, (BB+HB)/9, HR/9] is essentially random and hence unpredictable from year to year. Then it follows that, even if someone were to come up with a forecast for the 11 pitchers' ERA that is much better than all the others, we must assume it to be primarily luck rather than forecasting skill that is the determining factor. To separate the two, why not have everybody forecast the DIPS numbers as well as ERA?
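For scoring purposes, the three component forecasts could be collapsed onto an ERA scale. One hedged illustration (this is a FIP-style linear combination, not Voros's DIPS formula and not anything proposed in this thread; the weights and the league constant are assumptions):

```python
def component_era(so9, bbhb9, hr9, const=3.20):
    """ERA-scale estimate built only from the DIPS measures:
    strikeouts, walks+HBP, and homers per 9 innings.  The weights
    (13, 3, -2) and the league constant are illustrative assumptions."""
    return (13 * hr9 + 3 * bbhb9 - 2 * so9) / 9 + const
```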
February 24, 2003 - tangotiger
We are trying to forecast a player's performance for the upcoming year. This performance is a combination of a player's expected true talent level, context in which that talent will manifest itself, and luck.
ERA has more luck (from the pitcher's perspective) than other measures. The point of this forecast is to try to predict a player's performance numbers, with the reader trying to do as little as possible.