Background
I introduced the project on my site last year:
On May 9, 2008, and again on June 5, 2008, Nate Silver, architect of Baseball Prospectus’ PECOTA, and forecaster at the political site FiveThirtyEight.com, issued a challenge to fellow political forecaster Dick Bennett at American Research Group in picking the winner of the Democratic primary, state-by-state. That challenge went unanswered. Inspired by that challenge, and having done a multitude of Forecaster assessments in the past, the website TangoTiger.net issued a similar challenge to a large group of forecasters. Most accepted. With 21 forecasters and Marcel, the scene was set. The concept was simple enough:
1. We will create a simple scoring system.
2. Each Forecaster provides an ordered list of players.
3. Forecasters will be drawn at random to determine draft order.
4. We will create a program to automate the draft, snake-style, until the rosters are filled. That's the team you get, no trading or moves.
5. We then repeat steps 3 and 4 one thousand times. (Running so many drafts limits the effect one lucky or unlucky pick can have on the outcome. It is unlikely that one Forecaster would have ended up with Cliff Lee one thousand times in 2008.)
Then, we’ll see how everyone does at the end of the year. The leader gets to promote the fact that they won The Forecasters Challenge. This challenge is being issued in a spirit of fun and sportsmanship. Here were the Full Rules and List of Participants.
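Here's a minimal sketch of what the draft engine does in steps 3 and 4, assuming each forecaster's submission is just a ranked list of player IDs. The position quotas that the real program enforces are left out to keep the sketch short:

```python
import random

def snake_draft(ranked_lists, roster_size=25):
    """One draft: shuffle the draft order, then snake through the rounds,
    each forecaster taking the best still-available player on their list.
    ranked_lists: dict mapping forecaster name -> ranked list of player IDs.
    Assumes each list is long enough to fill every roster."""
    order = list(ranked_lists)                # forecaster names
    random.shuffle(order)                     # step 3: random draft order
    taken = set()
    rosters = {f: [] for f in order}
    for rnd in range(roster_size):
        turn = order if rnd % 2 == 0 else order[::-1]   # snake-style
        for f in turn:
            # highest-ranked player on f's list not yet drafted
            pick = next(p for p in ranked_lists[f] if p not in taken)
            taken.add(pick)
            rosters[f].append(pick)
    return rosters

# Step 5: repeat a thousand times to wash out draft-order luck.
# all_drafts = [snake_draft(lists) for _ in range(1000)]
```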
The idea was that, with 1000 random drafts, the players would get spread out among the 22 fans, so that each fan would get each player roughly 25 to 100 times, for an average of 45. It didn't quite work out that cleanly, and I'll talk about that in a bit.
The Results
ID    POINTS  WINS  VALUE  FAN
115   1484    385   5672   John Eric Hanson
122   1423    187   3581   RotoWorld
203   1397    126   2743   MGL
101   1396     93   2403   ANONYMOUS1
102   1385     75   2220   Ask Rotoman
218   1379     59   1878   Steamer
116   1340     35   1198   KFFL
204   1298     29    974   Baseball Primer (ZiPS)
120   1269     14    544   Razzball
121   1295      0    385   RotoExperts
109   1246      2    194   Christopher Gueringer
112   1229      1    118   FantasyPros911
      1056      0    189   Bottom 10

Bottom 10, alphabetical order, ID / Name: 105 Brad Null, 106 CAIRO, 207 CBS (proxy), 108 Chone, 110 Cory Schwarz, 111 Fantasy Scope, 113 FeinSports.com, 214 Hardball Times (proxy), 217 Marcel, 119 PECOTA.

John Eric Hanson is the official winner!
Let me explain all the columns. The first column is just a unique ID that means nothing to anyone except my computer program. Marcel is ID 217. Hanson had ID 115. The only meaningful thing out of the ID is that if it starts with a "2" then that means I had to do manipulation on the data in order to create a draft list. I'll talk about this in a second.
The second column is the average number of fantasy points each forecaster's team scored per draft. Again, these numbers are pretty meaningless without context, but I show them here anyway.
The third column is the total number of drafts each participant won. Hanson, Rotoworld, and MGL combined to win 70% of the drafts.
The fourth column is the total value of the drafts. I gave out 11 points for a win, 5 for a 2nd place finish, 3 for a 3rd, 2 for a 4th, and 1 for a 5th. Each draft had 22 points allocated (and with 22 teams, that means an average of 1 point allocated per team). With 1000 drafts, the average point allocation is therefore 1000 points. Hanson earned 5672 points out of a maximum of 11,000 (winning all 1000 drafts would be worth 1000 x 11), which is fantastic.
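In code terms, the scoring is just a lookup table. A sketch; the finish positions would come from scoring each simulated draft:

```python
# Value awarded by finishing position in each draft; 11+5+3+2+1 = 22
# points handed out per draft, an average of 1 per team with 22 teams.
PRIZE = {1: 11, 2: 5, 3: 3, 4: 2, 5: 1}

def draft_value(finishes):
    """finishes: one finishing position (1-22) per draft for a forecaster.
    Anything worse than 5th earns nothing."""
    return sum(PRIZE.get(pos, 0) for pos in finishes)

# A hypothetical forecaster finishing 1st 385 times and 6th-or-worse
# otherwise would score 385 * 11 = 4235.
```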
A couple more explanations
You will notice that MGL has an ID of 203, which means I had to manipulate his submission. MGL provided player names, and forecast lines (S, D, T, HR, etc). Basic stuff. Whatever he did not supply, I filled in using Marcel rates. Furthermore, he used the playing time forecasts from the Community Forecasts.
Steamer is a group of high school students who provided a limited number of forecasted players. The rest was supplemented by Marcel. ZiPS has similar issues to MGL, and therefore, has similar manipulations. CBS and Hardball Times required name/ID matching, and a few may have fallen through the cracks.
Some oddities
As I noted a few months ago:
Any of the lucky breaks of one draft because of draft position or whatnot gets wiped out if I run it a thousand times.

Cool, right? I was also surmising that everyone would have such similar lists that we'd all end up drafting each player around the same number of times. I really thought that since Pujols was going to be drafted 1000 times, and there are 22 of us, then each of us would draft him at least 30 times, and no more than 100 times. I reasoned that he would be a top-five pick, that everyone would draft at each slot more or less 40-50 times, and so eventually it would even out. Indeed, he was ranked #1 by 5 of us, #2 or #3 by another 5, #4 by another 6, and finally #5 or #6 by the final 5.
A funny thing happened to that theory. Of the 5 forecasters that had him ranked #1, he went to one of them 761 times out of the 1000 drafts. The 5 forecasters that had him ranked #2 or #3 got him in 157 drafts. The #4 rankers got him the other 82 times. Anyone who ranked him 5th or 6th simply never drafted Pujols in 1000 drafts.
And this repeated itself with a huge number of players. Sometimes, the rankings were so nuanced that a forecaster ended up drafting that player all 1000 times.
Let's analyze a bit, using Marcel The Monkey Forecasting System as the illustration. Marcel had Chad Billingsley, an excellent pitcher who performed well, ranked 24th on its list. Remember, the automated draft program I wrote simply goes through everyone's list and starts drafting (while ensuring that the position quotas are filled). Several forecasters had Billingsley ranked pretty high; one had him 25th. Marcel ended up drafting Billingsley 984 times, while that other forecaster got him the other 16 times. Quite a difference for a one-spot gap in the rankings! What happened is that the other forecaster had another player ranked much higher than everyone else (Rich Harden, ranked 14th), so that player was always going to that forecaster, leaving Billingsley almost always exposed to Marcel.
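Here is that dynamic in miniature: a toy two-forecaster, four-player draft (the names are just labels for illustration). Because the rival's own favorite always absorbs his early pick, he never gets the player both sides like, no matter who drafts first:

```python
import random

# Toy version of the Billingsley case: two forecasters, four players,
# two picks each. The rival actually likes Billingsley (ranked 2nd!),
# but his love for Harden blocks him from ever landing Billingsley.
marcel = ["billingsley", "smith", "harden", "jones"]
rival  = ["harden", "billingsley", "smith", "jones"]

def toy_draft(first, second):
    """Two-round snake draft between two ranked lists."""
    taken, rosters = set(), ([], [])
    lists = (first, second)
    for rnd in range(2):
        turn = (0, 1) if rnd % 2 == 0 else (1, 0)   # snake
        for i in turn:
            pick = next(p for p in lists[i] if p not in taken)
            taken.add(pick)
            rosters[i].append(pick)
    return rosters

for _ in range(5):
    if random.random() < 0.5:
        m, r = toy_draft(marcel, rival)     # Marcel drafts first
    else:
        r, m = toy_draft(rival, marcel)     # rival drafts first
    print("billingsley" in m)               # True every single time
```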
This particular case worked out very well for Marcel. It wasn’t always so. Here are the 25 players that Marcel ended up drafting the most. You can consider these guys as the guys that Marcel liked more than anyone else:
ORDER_ID  MLBAM_ID  PLAYER_TX
5         407812    Holliday, Matt
24        451532    Billingsley, Chad
38        449107    Aviles, Mike
59        493137    Matsuzaka, Daisuke
65        467055    Sandoval, Pablo
69        425794    Wainwright, Adam
82        425426    Wang, Chien-Ming
109       451482    Galarraga, Armando
112       446209    Litsch, Jesse
119       434578    Saunders, Joe
142       150217    Guzman, Cristian
157       433584    Carmona, Fausto
161       460051    Getz, Chris
182       458690    Volstad, Chris
207       434633    Baker, John
215       460003    Teagarden, Taylor
322       150317    Crede, Joe
332       407487    Rivera, Juan
361       333292    Wilson, Jack
425       457788    Schafer, Jordan
486       334393    Pierre, Juan
503       346795    Chavez, Endy
513       123107    Tatis, Fernando
635       110383    Aurilia, Rich

Let's look at a few of these players. You will see that Marcel thought highly of Dice-K, Chien-Ming Wang, and Fausto Carmona. Talk about bad luck. Was Marcel unusually excited by these players, or was it a case like Billingsley, where Marcel had these guys just a bit higher than everyone else, and so ended up with them disproportionately because, well, that's simply the way the draft works?
Let's start with Dice-K. Marcel had him ranked #41. Here is how he was ranked by the five next-highest forecasters on Dice-K: #58, 59, 65, 71, 83. So, it's not like Marcel was madly in love with him. He had him ranked a bit higher, but not unusually higher. The result, though, is that Marcel ends up with Dice-K in 897 drafts out of 1000! And Dice-K's fantasy points were among the worst in the whole league.
In typical correlation studies, we wouldn't notice much with this pick. That is, take Dice-K completely out of the correlation study, and Marcel's correlation coefficient doesn't change much, if at all. However, because he was drafted by Marcel nearly 90% of the time, and because he had an enormous collapse, it is Marcel that takes almost the whole brunt here. The other forecasters who also had Dice-K ranked highly get off almost scot-free, because Marcel was there to bail them out!
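You can see how little one bust moves a correlation with a quick synthetic check (all the numbers here are made up for illustration): drop one Dice-K-sized collapse from a 550-player pool, and the coefficient barely budges:

```python
import numpy as np

rng = np.random.default_rng(0)
forecast = rng.normal(100, 25, 550)          # forecasted fantasy points
actual = forecast + rng.normal(0, 30, 550)   # actuals: noise around forecast
actual[0] -= 90                              # one enormous collapse

r_with = np.corrcoef(forecast, actual)[0, 1]
r_without = np.corrcoef(forecast[1:], actual[1:])[0, 1]
print(f"{r_with:.3f} vs {r_without:.3f}")    # nearly identical
```

A correlation study shrugs at that one point; a draft, where Marcel owned that player 897 times out of 1000, does not.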
Let's continue with Wang. We all know what happened with Wang. Was Marcel, who drafted him 510 times as the 82nd-best player, unusually excited about Wang? Here were the five forecasters most in love with Wang: #68 (drafted him 490 times), #135, #137, #145, #168. So, yes, Marcel should get penalized here; Marcel was actually saved by the one other forecaster who also had him ranked very high. Everyone else discounted Wang heavily (though of course, not heavily enough!). However, in drafting, you just have to discount him enough that he never gets drafted by you. Whether he was ranked #135 or #494 (his lowest ranking by any forecaster), it's the same thing! As long as there is one forecaster, just one, who is in love with him enough, you will never draft him ahead of that guy.
Finally, Carmona. Marcel had him #157 and drafted him 452 times. There was one forecaster that had him ranked even higher at 119 and drafted him 541 times. The rest of the forecasters were not close.
If you had to draw up a list of pitchers who collapsed this year, these three pitchers would likely be among the 10 biggest busts. And by bad luck, Marcel ended up with all three of them.
At the same time, Marcel got Adam Wainwright 665 times by ranking him 69th. Here's how the next five highest forecasters ranked him: #71, 74, 76, 81, 82. So, Marcel benefited here by ranking him just slightly higher than everyone else, and ended up drafting him two-thirds of the time.
I find these results utterly fascinating, and I can see myself spending an inordinate amount of time studying all this, running different drafts under different scenarios. Clearly, it is not fair that those forecasters who ranked Wainwright high get no acknowledgement for it. After all, it is simply their bad luck to be in a draft with Marcel. But Marcel will not actually be in every one of their drafts; Marcel is not necessarily a representative forecaster in a random fantasy draft.
Different draft options
A long-time reader, KJOK, suggested running a new set of drafts, but this time putting just two of the 22 forecasters in a head-to-head league. And since I need to have 550 players drafted, I let them each select 275 players. I thought this was a great idea: basically, you are on the hook for a huge set of players. And since Marcel will be in only a few of the drafts, those who like Dice-K, Wang, and Carmona will draft them a lot more than they did in the official draft. Here are those results:
ID    POINTS  WINS  FAN
112   14358   42    FantasyPros911
203   14190   38    MGL
122   14344   37    RotoWorld
204   14208   37    Baseball Primer (ZiPS)
217   14025   31    Marcel
113   13859   30    FeinSports.com
108   13751   29    Chone
105   13855   26    Brad Null
116   13784   26    KFFL
101   13822   25    ANONYMOUS1
120   13590   22    Razzball
115   13673   20    John Eric Hanson
      13025   99    Bottom 10

Bottom 10, alphabetical order, ID / Name: 102 Ask Rotoman, 106 CAIRO, 207 CBS (proxy), 109 Christopher Gueringer, 110 Cory Schwarz, 111 Fantasy Scope, 214 Hardball Times (proxy), 119 PECOTA, 121 RotoExperts, 218 Steamer.

Now this is mighty interesting! Hanson, our official winner, is now middle-of-the-pack. MGL moves up from #3 to #2, while RotoWorld drops from #2 to #3. And FantasyPros911, middle-of-the-pack in the official draft, where it won one out of 1000 drafts, here wins all 42 of its head-to-head drafts! Marcel ends up with 31 wins and 11 losses, which is simply fantastic. I expected Marcel to finish 21-21 because, frankly, it does nothing special. But doing nothing special seems to be what you should be aiming for, because, I presume, (many) others overthought their forecasts enough to let Marcel finish with a fantastic record.
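The head-to-head drafts themselves are a simpler version of the same engine. A sketch, again ignoring position quotas, assuming strict alternation, and assuming each pairing was run twice so that each side gets the first pick once (21 opponents x 2 = 42 drafts per forecaster):

```python
def h2h_draft(list_a, list_b, picks=275):
    """One head-to-head draft: the two forecasters alternate picks until
    each side holds 275 players, consuming the whole 550-player pool."""
    taken, roster_a, roster_b = set(), [], []
    for _ in range(picks):
        for lst, roster in ((list_a, roster_a), (list_b, roster_b)):
            pick = next(p for p in lst if p not in taken)
            taken.add(pick)
            roster.append(pick)
    return roster_a, roster_b

# Full round robin: run h2h_draft(a, b) and h2h_draft(b, a)
# for every pair of forecasters a, b.
```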
Here are the players that Marcel drafted in 40 to 42 of the 42 drafts:

ORDER_ID  POINTS_CT  PLAYER_TX
5         101        Holliday, Matt
24        77         Billingsley, Chad
38        -2         Aviles, Mike
44        27         Nolasco, Ricky
56        20         Slowey, Kevin
65        94         Sandoval, Pablo
69        108        Wainwright, Adam
73        95         Gallardo, Yovani
82        -29        Wang, Chien-Ming
109       14         Galarraga, Armando
112       -5         Litsch, Jesse
119       9          Saunders, Joe
123       23         Maine, John
127       5          Bush, Dave
138       70         Escobar, Yunel
139       47         Buehrle, Mark
157       -6         Carmona, Fausto
161       43         Getz, Chris
162       18         Fontenot, Mike
175       15         Maholm, Paul
182       30         Volstad, Chris
198       28         Lannan, John
206       16         Shoppach, Kelly
207       37         Baker, John
215       9          Teagarden, Taylor
229       21         Looper, Braden
234       13         Marshall, Sean
276       5          Owings, Micah
296       10         Martis, Shairon
309       -5         Harrison, Matt
425       2          Schafer, Jordan
477       40         Rasmus, Colby
503       22         Chavez, Endy
511       -2         Sanchez, Gaby
635       5          Aurilia, Rich

Aviles, Wang, Carmona, and Dice-K (38 times) are still featured prominently, but now we also see Sandoval, Buehrle, and others: players who were blocked by ONE other forecaster in the official competition, but not by the other 20 in head-to-head play. This is certainly more representative of the kind of draft list Marcel is putting out.

Conclusion
I have a solution here that makes more sense. Because I have all the draft lists, I can re-run all the drafts while changing any of the assumptions I want, which lets me test various scenarios. Next time, I'll do a head-to-head, but I'll throw in 20 semi-intelligent random amateurs, so that everyone sticks to drafting 25 players. Basically, those 20 guys are going to pick off reasonable players from the draft pool, so that everything mimics reality. As I also explained:
I agree with Ron Shandler that we should reject the correlation studies of forecasting systems. They simply don't do what they purport to do. It's very clear what they do, but then those conclusions are taken out of that particular context, making them not applicable in the real-world sense.

The reality is the reality: what forecasting system should I follow in the draft? Playing time matters; it's a huge component. ZiPS (#4), Marcel (#5) and MGL (#2) all used the Community forecasts. Without those forecasts, Marcel would be at zero (it would have drafted Jeff Francis, among others).
So, in order to replicate the draft model, I need to put one (or two) forecasting systems in a league of "amateurs", and see how well they hold up. It doesn't really matter if Marcel sees Mauer as a $20 or $22 player if, in every draft Marcel finds himself in, there's someone who sees Mauer as a $26-29 player. It's really irrelevant whether Mauer is a $2 or $22 player: he's completely useless to Marcel either way.
Basically, I’m being chickensh!t on Mauer (numbers for illustration only). It’s almost like I’m saying “You know, everyone loves Mauer, but I don’t, but I’m going to put his value high enough so that he comes out ok in the correlation studies, but not so high that someone might actually draft him on that valuation.” It’s nothing more than being a coward in a fight.
This is EXACTLY what Marcel does with the rookies: Marcel knows zero about players without MLB experience. What does it do? It simply presumes league average, with minimal playing time (200 PA for batters, or 60 IP for starters and 25 IP for relievers). So, Marcel values the rookie just enough that the forecast appears reasonable, but so low that it is irrelevant in the real world. The Community playing time forecasts, though, turn this chickensh!t forecast into something better. If Matt Wieters is forecast for 400 PA by the Community, then Marcel is exposed a bit more.
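A sketch of that rookie default, with a made-up rate number; only the playing-time floors come from Marcel's actual rule:

```python
# Playing-time floors for players with no MLB history, per Marcel's rule.
MINIMUM_PT = {"batter": 200, "starter": 60, "reliever": 25}  # PA, IP, IP

def rookie_forecast(role, league_avg_rate, community_pt=None):
    """League-average production at minimal playing time, unless the
    Community supplies a (usually larger) playing-time forecast."""
    pt = community_pt if community_pt is not None else MINIMUM_PT[role]
    return league_avg_rate * pt   # total production ~ rate x playing time

# Wieters at the chickensh!t default vs. the Community's 400 PA:
# rookie_forecast("batter", 0.5)       -> 100.0 (invisible in any draft)
# rookie_forecast("batter", 0.5, 400)  -> 200.0 (now Marcel is exposed)
```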
Therefore, I would say that the only way to evaluate a forecasting system is to actually run it through a model. Not necessarily my models, but something better than correlation studies, which put limits on playing time to be part of the sample and make no effort to separate the chickensh!t forecasts from the real forecasts.
If there's a real winner here, it's the Community. As I noted when I ran that survey:
Who knows more about whether a pitcher will be in the starting rotation or the bullpen: an algorithm or a true fan? Who knows more about the number of games an injured ARod will play in 2009: an algorithm or a Yankees fan? There are certain human observation elements that are critical for forecasting.

Anyway, what I will do is treat this as a learning experience. I'm going to let the official rules stand in terms of awarding a winner. But I'm going to run modified versions unofficially. This will be my pet project in the off-season. Stay tuned...