Tango on Baseball Archives

© Tangotiger

Results of the Forecast Experiment, Part 2 (October 27, 2003)

Thanks to Alan Jordan for contributing to this piece.
--posted by TangoTiger at 10:31 AM EDT


Posted 1:25 p.m., October 27, 2003 (#1) - Unfrozen Caveman Cubs Fan
  So, for those of us who are stuck back in the pack (and I'm sure I was one), will there be a posting of everyone's final numbers individually?

Posted 1:49 p.m., October 27, 2003 (#2) - tangotiger
  I emailed all the readers who participated, telling them that I would send their individual scores if they ask. I had about 20 emails bounce, and perhaps yours was one.

The only reason I did not want to post everyone's score is that I didn't want to embarrass anyone.

Posted 2:23 p.m., October 27, 2003 (#3) - Stevens(e-mail)
  Embarrass me. Or email me if you like. I'm curious. Thanks for all the great work, Tango.

Posted 2:25 p.m., October 27, 2003 (#4) - tangotiger
  Ahh, I love the manliness of that response. Stevens, your score was .840. The average reader was .788.

Posted 2:41 p.m., October 27, 2003 (#5) - David Jones(e-mail)
  I'd love to be embarrassed. Give it to me.

Posted 2:59 p.m., October 27, 2003 (#6) - tangotiger
  David: .748, which makes you #63.

(.788 was #89).

Posted 3:05 p.m., October 27, 2003 (#7) - David Smyth
  What about you, Tango? Did you participate?

Posted 3:12 p.m., October 27, 2003 (#8) - tangotiger
  No, I did not.

Posted 3:13 p.m., October 27, 2003 (#9) - Buford Sharkley
  I'm sorry to say that my email account expired for lack of use.

Would you please tell me my score? Thanks.

Posted 3:20 p.m., October 27, 2003 (#10) - Tim Cramm(e-mail)
  Embarrass me as well (and what's actually more embarrassing is that I can't remember if I actually filled out the predictions or not -- I vaguely remember doing so, but it's Monday and the dead and near-dead brain cells have firm control over my brain).

Posted 3:46 p.m., October 27, 2003 (#11) - Michael(e-mail)
  Embarrass me (I think you should post all the names/results; people are embarrassed in the HACKING MASS, the pre-season predictions, etc.). Or email me if you like. Thanks.

Posted 3:52 p.m., October 27, 2003 (#12) - Buford Sharkley
  Consider me embarrassed already. I realized that I did not participate in this.

Nice.

Posted 3:53 p.m., October 27, 2003 (#13) - Stevens
  Ahh, I love the manliness of that response. Stevens, your score was .840. The average reader was .788.

Thanks Tango. That's some weak predicting. It's the Monkey for me!

Posted 3:56 p.m., October 27, 2003 (#14) - bob mong
  Tango, could you explain the Monkey? If I remember correctly, it was described, in the original article, as a three-year weighted average with some age adjustments - would you mind sharing the details?

Posted 4:33 p.m., October 27, 2003 (#15) - tangotiger (homepage)
  The original article is the above link. Here's the paragraph in question, for the monkey:
===============
The baseline forecast is very simple: take a player's last 3 years' OPS or ERA. If he was born 1973 or earlier, worsen his OPS by 5% or his ERA by 10%. If he was born 1976 or later, improve his OPS by 5% or his ERA by 10%. The 1974-75 players will keep their 2000-2002 averages.
==============
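
In code, the baseline rule amounts to something like this (a minimal sketch; the sample player is hypothetical):
===============
def monkey_forecast(stat_3yr_avg, birth_year, is_pitcher):
    """Baseline: a player's 3-year average OPS or ERA, nudged by age bucket."""
    if birth_year <= 1973:        # older: worsen OPS by 5%, ERA by 10%
        factor = 1.10 if is_pitcher else 0.95
    elif birth_year >= 1976:      # younger: improve OPS by 5%, ERA by 10%
        factor = 0.90 if is_pitcher else 1.05
    else:                         # born 1974-75: keep the average as-is
        factor = 1.00
    return stat_3yr_avg * factor

# A hitter born 1972 with an .850 three-year OPS projects to .8075
print(round(monkey_forecast(0.850, 1972, is_pitcher=False), 4))
===============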

Posted 4:36 p.m., October 27, 2003 (#16) - tangotiger
  Tim: .900, #149

Michael: .695, #22

Posted 4:58 p.m., October 27, 2003 (#17) - Sky Kalkman
  Go ahead and throw mine out, too, if you don't mind. (I'm pretty sure I participated.)

Posted 5:09 p.m., October 27, 2003 (#18) - tangotiger
  Sky: .761, #72

Posted 5:14 p.m., October 27, 2003 (#19) - David Powell
  Mine too. Thanks!

Posted 5:35 p.m., October 27, 2003 (#20) - Tim Cramm
  Woo hoo! I suck! I vaguely recall making some choices at the time that, in retrospect, turned out to be lousy (and that's even with Giambi, for whom I predicted a large breakout year, being removed).

There goes my chance at that Mets GM job...

Posted 5:40 p.m., October 27, 2003 (#21) - Bob Dobalina aka trharlan
  I could send you an email, Tango, but I'd rather be flogged with everyone else here. Please, do tell!

Posted 5:45 p.m., October 27, 2003 (#22) - Craig Knox(e-mail)
  Count me in with those who don't mind being publicly embarrassed. If you'd be so kind, Tango, I'd be grateful to hear my score. Thanks.

Craig

Posted 5:57 p.m., October 27, 2003 (#23) - baseball chick
  i wish i had sent mine in. i see the winner was a GIRL!!!!!
chicks rule

Posted 6:20 p.m., October 27, 2003 (#24) - Scoriano
  The average reader was .788.

(.788 was #89).

Woo-hoo! I'm the average Primate, which also makes me sub-Monkey.

Posted 6:20 p.m., October 27, 2003 (#25) - Patriot
  If you've got nothing better to do, I'd like to see mine please. Pat Burrell, JD Drew, Jose Hernandez all killed me I think.

Posted 6:21 p.m., October 27, 2003 (#26) - Scoriano
  Posted 5:57 p.m., October 27, 2003 (#23) - baseball chick
i wish i had sent mine in. i see the winner was a GIRL!!!!!
chicks rule

Chicks dig the crystal ball.

Posted 6:36 p.m., October 27, 2003 (#27) - David Smyth
  Can you give the results broken down by pitchers and non-pitchers?

Posted 6:36 p.m., October 27, 2003 (#28) - Ira(e-mail)
  Please, embarrass me... my address has changed since the contest, so I'm sure my results bounced.

Ira

Posted 7:01 p.m., October 27, 2003 (#29) - baseball chick
  scoriano, looks like the winning chick HAD the crystal ball!

Posted 7:12 p.m., October 27, 2003 (#30) - Tangotiger
  Hmmm.... ok, I guess I'll embarrass everyone, except the bottom 10. I'll post the individual results tomorrow morning. I'll also break it down by hitters/pitchers.

Posted 8:05 p.m., October 27, 2003 (#31) - Alan Jordan
  "The baseline forecast is very simple: take a player's last 3 years OPS or ERA. If he was born 1973 or earlier, worsen his OPS by 5% or his ERA by 10%. If he was born 1976 or later, improve his OPS by 5% or his ERA by 10%. The 1974-75 players will keep their 2000-2002 averages."

I missed that in part one. I was thinking the monkey was just last year's OPS or something. If this is a monkey, it must have its own library card and bifocals. No wonder the monkey beat more than half of the readers. This is obviously the Warren Buffett of monkeys.

Posted 8:30 p.m., October 27, 2003 (#32) - Jonathan
  Tell me mine too, if it's not too inconvenient...

Posted 9:15 p.m., October 27, 2003 (#33) - studes (homepage)
  Well, now I wish I had participated. Tango and Alan, you guys did a great job presenting the results. Well done!

Posted 9:44 p.m., October 27, 2003 (#34) - Dan Lee(e-mail)
  I'm all ears, Tango. I'd love to hear what an idiot I was...

Posted 9:50 p.m., October 27, 2003 (#35) - Alan Jordan
  Thanks Studes

Posted 2:07 a.m., October 28, 2003 (#36) - Nick Warino
  I am among the elite. I want all your meats, cheeses, and women!

Posted 3:35 a.m., October 28, 2003 (#37) - Sylvain(e-mail)
  Thanks a lot Alan and Tango.

Sylvain

Posted 6:13 a.m., October 28, 2003 (#38) - Dennis Shea
  Give it to me, Tango baby.

Posted 8:59 a.m., October 28, 2003 (#39) - tangotiger (homepage)
  The above link shows all the reader picks. Since Ira was the person who asked to be embarrassed and ended up the most embarrassed, I reverted to initials for all names after Ira. All other names are exactly as they were written back in April.

Posted 9:46 a.m., October 28, 2003 (#40) - Ira
  Wow, that is embarrassing... I wonder what the breakdown was... I forgot my selections...

Ira

Posted 9:48 a.m., October 28, 2003 (#41) - Bob Dobalina
  Tango, you are the best. Thank you.

Posted 9:50 a.m., October 28, 2003 (#42) - Scot
  Woohoo! I'm number 6! Time to quit my day job and put out a glossy magazine with lots of pretty pictures, some shallow commentary and some wild-assed guess predictions. :)

Posted 10:05 a.m., October 28, 2003 (#43) - tangotiger
  The file has been updated to include the breakdown by hitters and pitchers, as well as overall. Note that we have 20 hitters and only 8 pitchers.

Posted 10:15 a.m., October 28, 2003 (#44) - John Church
  I think that my 7th-place finish is the ultimate proof that reader projections are random luck.

Posted 10:19 a.m., October 28, 2003 (#45) - Sylvain(e-mail)
  Thanks one more time Alan and Tango.

Ouch! Tied for 31st as far as the hitting goes, but tied for 162nd for the pitching.

What will be interesting is how the Primates will behave next year (if the Primer Chiefs decide to run this forecast again, of course): since they know the baseline is better than (most of) them, will they just go for the baseline plus 1/10th personal adjustments in order to get an edge, or 2/3 baseline and 1/3 their own forecast, or another solution? Are the forecasters for real? Are the best Primate forecasters for real?

Sylvain

Posted 10:44 a.m., October 28, 2003 (#46) - Nick S
  1) The forecasters clearly do a good job, much better than our educated guesses (4 out of 6 forecasters basically topped the distribution).

2) The forecasters that did the best job are probably using very similar methodology to the (as noted above very, very mathematically inclined) monkey. What are the differences between the two forecasters that did poorly and the ones that did well?

3) Tango, what were the batter/pitcher breakdowns for the forecasters and for the monkey? The assumption would be that the oddball batter predictions are more on the ball, although we won't really be able to tell that from the small pitcher sample size.

4) It is a good thing you removed Giambi from the sample, or I wouldn't have squeaked into the top 20.

Posted 10:46 a.m., October 28, 2003 (#47) - Patriot
  Holy cow. I did pretty well on the hitters, and AWFUL (i.e., 2nd or 3rd worst) on the pitchers. I could've sworn it would be the other way around.

Posted 10:46 a.m., October 28, 2003 (#48) - tangotiger
  Sylvain, that is a good point.

Now that the Primer readers are well aware that their intuition on players with up-and-down recent careers is worse than the baseline, what will they do?

Sabermetricians have long said that sample size, sample size, and sample size is critical. What we have here is that just taking the last three years unweighted, and making virtually no adjustment (except a slight one for age), is almost exactly what a professional forecasting engine gives you (and they would certainly weight the seasons a little differently, and make more adjustments for age and park, and make even more adjustments for types of players, be it speed, power, patience, etc.).

In terms of interpreting numbers, back-of-the-card calculations or intuition just won't cut it (though again, as a group, all the biases of the readers cancel out very nicely).

What edge does that leave anyone? Personal scouting is the only thing left, I think.

The project I am mulling for next year is to have about 300 players on the ballot, and the readers can ONLY choose those players they've watched with their own eyes for at least 10 games, or who play for the team they follow all the time. I'd only count those players where there are at least n readers making forecasts. I'm not sure what n will be; I'm hoping to make it 20, but realistically, I'd say 5.

I'm not sure if I'll be able to get the professional forecasters to give up their OPS/ERA forecasts for this many players, but, as we've seen, the baseline forecast does a pretty good job anyway.

We'll see how well the scouting eye holds up.

Posted 10:54 a.m., October 28, 2003 (#49) - tangotiger
  Nick (re your #3): the hitter/pitcher scores:

.56/.97 - baseline
.62/.87 - mean of forecasters
.66/.79 - readersgroup

Among the individual systematic forecasters, Palmer and Silver were 1-2 for hitters, but Shandler and Szymborski were 1-2 for pitchers. Palmer was last for pitchers, and Shandler last for hitters. So, they were all over the place. I doubt that the systematic forecasters have similar engines, though, overall, they get similar results.

I think pitching is the one place where "personal scouting" might come into play, and, though we have only 8 extreme pitchers, the readers did great there.

Posted 11:13 a.m., October 28, 2003 (#50) - MGL
  Tango,

What are the numbers for a "monkey" if the monkey uses a 3/4/5 weighting for the 3 years? How about a weighting plus a basic park adjustment (using a 3-year, or similar, "OPS park factor") for those players who have not played on the same team for the 4 years in question, or if you don't want to do that much work, a park adjustment for only those players who switched teams from 2002 to 2003?

Posted 12:19 p.m., October 28, 2003 (#51) - Jim
  Where are each of the systematic forecasters' numbers published? I guess Shandler is Baseball Forecaster, Silver is PECOTA/Baseball Prospectus, Szymborski is ZiPS/Baseball Primer, Tippett is Diamond Mind Baseball.

What about Palmer and Warren?

Posted 12:40 p.m., October 28, 2003 (#52) - tangotiger
  All the forecasters sent me their picks, but Palmer and Warren do not have theirs published.

MGL: I can do the 5/4/3 weighting (though I would make it 5/4/3/2, where "2" is the league mean). Park factors I don't have, and I think it would be too much for me to do at this point.

Posted 12:55 p.m., October 28, 2003 (#53) - tangotiger (homepage)
  Hmmm... very interesting.

Somewhere in the above link (the Banner Years article), I proposed a very simple scheme for hitters, which I call the 5/4/3/2 scheme. You take 5 parts 2002, 4 parts 2001, 3 parts 2000, 2 parts LeagueMean. That last component is the "regression towards the mean".

Though I've never published it for pitchers, I usually toy around with 3/2/1/2. That is, I give a little more weight to 2002 and less weight to 2000, and I regress a lot more.
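
A minimal sketch of those two weighting schemes (the sample OPS figures and league mean below are made up):
===============
def marcel(y2002, y2001, y2000, league_mean, is_pitcher=False):
    """5/4/3/2 for hitters, 3/2/1/2 for pitchers; the last weight
    is on the league mean (the regression-towards-the-mean term)."""
    w1, w2, w3, wm = (3, 2, 1, 2) if is_pitcher else (5, 4, 3, 2)
    total = w1 + w2 + w3 + wm
    return (w1 * y2002 + w2 * y2001 + w3 * y2000 + wm * league_mean) / total

# A hitter with OPS of .900/.850/.800 and a .750 league mean projects to ~.843
print(round(marcel(0.900, 0.850, 0.800, 0.750), 3))
===============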

And the results? This monkey, which we'll call Marcel, after the monkey on Friends, does extremely well. How well? Better than all the other forecasters.

Marcel the monkey has a .653 score overall, including a .59 score for hitters (#3 among the systematic forecasters) and a .82 score for pitchers (also #3 among the systematic forecasters). This combination was enough to propel Marcel ahead of the 6 systematic forecasters.

I'm actually a little surprised by the results. I would have expected the 6 systematic forecasters to have applied the Marcel scheme and enhanced it. That doesn't appear to be the case.

Posted 12:58 p.m., October 28, 2003 (#54) - Monty
  Heh. I got 52nd with my prediction of "Everyone will do exactly as well as last year".

Posted 2:48 p.m., October 28, 2003 (#55) - Brandon Feinen
  Hey, cool, I'm 11th overall. Guess that means random guesses are as good as anything else. Where is the hitter/pitcher breakdown at?

Posted 2:49 p.m., October 28, 2003 (#56) - Randy Jones
  Well, I got 13th (my real name is J.P. Gelb), pretty good, especially since I completely forgot about this. Looks like I am much better at predicting pitchers, .59, than hitters, .72, or more likely I got very lucky with the pitchers...

Posted 5:16 p.m., October 28, 2003 (#57) - Jim
  Tango, I assume Palmer is Pete Palmer of Hidden Game. Who is Warren?

Also, in your pre-season writeup you said you were trying to get forecast data from STATS, Inc. Did you ever get them? I think they would be interesting, since in the study Voros did a couple of years ago, he found that STATS was among the most accurate in hitter projections.

Posted 5:35 p.m., October 28, 2003 (#58) - Tangotiger
  Palmer=Pete Palmer.

Warren=Ken Warren

STATS=no one bothered to ever return my emails.

Posted 7:00 p.m., October 28, 2003 (#59) - MGL
  I am not surprised at Marcel's success. If you factor in the park changes, Marcel probably would (and should) blow everyone away! I've been saying (screaming) for years that projecting player performance is NOT rocket science, nor does it take any special scouting, observational, intuitive, or even mathematical skills. It is simply "Monkey See, Monkey Do," as Tango's experiment illustrates. I cannot say this enough. Barring some injury or other extraordinary factor, the best estimate of a player's performance is his last 3 or 4 years' performance, weighted and adjusted for age and context (park, opponent, etc.)! This is so important it bears repeating a hundred times or so (but I won't)! In fact, if you do just about anything else, you are probably going to do a lot worse than the sophisticated Monkey (Marcel plus context adjustments).

Although I was not able to participate in Tango's experiment this year (hopefully I'll have the time next year), my forecasting algorithm is available for all the world to see, and I'm sure my results would be somewhere near the top. I simply take each player's stats from the last 3 years, adjust them component by component for the strength of all of his opponents in those 3 years, adjust them for each park that a player plays in over those 3 years, and adjust each component for age (remember that aging curves look very different for each component). Then I adjust (to a healthy baseline) an entire year if that player was slightly injured, moderately injured, or severely injured in that year. Then I combine the 3 years using a 5/4/3 weighting system and regress each component towards the mean of an average player of similar height and weight. The fewer PA's a player has in those 3 years, the more each component gets regressed. Finally, if there is a continuing or new injury, I adjust the final stats to account for that injury.
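
In rough outline, that pipeline looks something like the sketch below; every adjustment factor, the sample seasons, and the regression constant are placeholders, not MGL's actual numbers:
===============
def project_component(seasons, league_mean, regress_pa=400):
    """seasons: the last three years, most recent first, as tuples of
    (rate, pa, opp_adj, park_adj, age_adj, injury_adj)."""
    weights = (5, 4, 3)                       # 5/4/3 weighting by recency
    num = den = total_pa = 0.0
    for (rate, pa, opp, park, age, injury), w in zip(seasons, weights):
        adjusted = rate * opp * park * age * injury   # context-neutralize
        num += w * adjusted * pa
        den += w * pa
        total_pa += pa
    weighted = num / den
    r = regress_pa / (regress_pa + total_pa)  # fewer PAs -> more regression
    return (1 - r) * weighted + r * league_mean

# Three hypothetical seasons of a roughly .800-OPS hitter
print(round(project_component(
    [(0.820, 600, 1.0, 0.98, 1.01, 1.0),
     (0.790, 550, 1.0, 1.02, 1.01, 1.0),
     (0.760, 500, 1.0, 1.00, 1.01, 1.0)], league_mean=0.750), 3))
===============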

I don't like to say this too much, because you get tons of flak from almost everyone other than hard-core sabermetricians, but, at least as far as player evaluation goes, for the purposes of projecting player performance, setting salaries, and putting together a successful team, you don't need to watch players and you don't need scouts (except perhaps for minor leaguers - even then, can you say MLE?). All you need are a player's stats! I live by this credo and I'll die by it! And I think that this whole experiment and discussion suggests that it is true at least to a large extent!

Seriously, how do you think most managers and GMs would do if they participated in this forecasting experiment? It would be so embarrassing it would be scandalous! Enrique Wilson (the best utility player in baseball, according to Tim McCarver), Tony Womack, Neifi Perez, and Luis Sojo might be in someone's top 10!

This crap by some saber-types conceding that you have to combine sabermetrics with a "feel for the players," scouting, and other traditional evaluation techniques, in order to evaluate players and put together successful teams, is just that - a bunch of pandering, lip service crap - and I'm not afraid to say so!

Posted 7:07 p.m., October 28, 2003 (#60) - Monty
  my forecasting algorithm is available for all the world to see, and I'm sure my results would be somewhere near the top.

Okay. What's your algorithm and what results would it have predicted for these players? Then we can find out where you would have finished.

Posted 7:37 p.m., October 28, 2003 (#61) - Michael
  I got #21 and did better on hitters than pitchers, which surprises me not at all. I would suggest in future we look at all players, or more players, not just up-down-up types. It may well be the case that naive (or sophisticated-naive for a tangotiger monkey) algorithms do really well when there is a lot of uncertainty, but when things are fairly predictable they may underperform scouting or educated guesses. Maybe they miss certain breakouts or crashes. That information would be useful to have (although my hunch is that MGL is right).

And I wouldn't restrict it to players who people have seen. If you want that information, I'd ask people to answer y/n to the question "Have you seen this player?" That way you can study both and don't lose data.

Heck, if readers are willing to do a bit more work (I know I would, but I don't know how many others are), you should ask people to put distributions on the performances: like, what do you think someone's 5%, 25%, 50%, 75%, 95% numbers are? It would be interesting to see how people do on distributions, as that may well be a place for intelligence that the monkey, as of yet, misses. And a player whose OPS prediction is 750, 775, 800, 825, 850 might be worth a different amount to a team than a player whose distribution is 650, 750, 800, 850, 900, even though both have a predicted 800 OPS.

Also, it would be interesting to see how people did if you don't throw out the "troubling" data points.

Posted 8:31 p.m., October 28, 2003 (#62) - MGL
  I'll try and do my "after the fact" projections and see where they stand. Of course, no one will know for sure whether I cheated or not (I won't). My algorithm is basically what I described. The only things I didn't specifically give (I'd be happy to) are what numbers I use to adjust for injury years, what numbers I use for park factors and age factors, and what my regression formula is.

As far as the 5%, 25%, etc. levels, such as Pecota does: personally, I don't think anything other than using regular old z-scores is appropriate (IOW, if you have a .700 OPS projection, then there is a 5% chance that that player would have an OPS more than 2 SD above or below .700, where one SD is based on one year's worth of projected PAs). Anything other than that (such as what Pecota tries to do) is BS, I think (I am not sure)...
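
As a worked example of that kind of z-score band, here is the binomial SD for an on-base rate; OBP is used instead of OPS because it is a true binary rate, and the .340 OBP over 600 PA is a made-up player:
===============
import math

p, n = 0.340, 600                  # hypothetical true OBP over 600 PA
sd = math.sqrt(p * (1 - p) / n)    # binomial SD of the rate: ~.019
lo, hi = p - 2 * sd, p + 2 * sd    # ~95% of seasons land in (.301, .379)
print(round(sd, 3), round(lo, 3), round(hi, 3))
===============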

Posted 8:38 p.m., October 28, 2003 (#63) - Tangotiger
  If you want that information, I'd ask people to answer y/n to the question "Have you seen this player?" That way you can study both and don't lose data.

PERFECT idea.

***

As for MGL (sorry for blowing your cover, Mickey), he did participate in 2 prior Voros studies, and he did not do as well as Voros, who has retired as champion. You can find the results of those studies on Voros' site.

Posted 9:04 p.m., October 28, 2003 (#64) - MGL
  I consider Voros the projection God! My algorithm is better now anyway!

Seriously, I think that any variation on Marcel, with park adjustments, is as good as any other and about as good as you can get. Plus there is obviously a fairly large sample error (margin of error) factor in the results...

Posted 9:58 p.m., October 28, 2003 (#65) - Michael
  As far as the 5%, 25%, etc. levels, such as Pecota does: personally, I don't think anything other than using regular old z-scores is appropriate (IOW, if you have a .700 OPS projection, then there is a 5% chance that that player would have an OPS more than 2 SD above or below .700, where one SD is based on one year's worth of projected PAs). Anything other than that (such as what Pecota tries to do) is BS, I think (I am not sure)...

I agree with you, most likely. But I think it is more worthy of study than the question of whether you should go with Marcel on picking the 50%. I.e., I think it is more of an open issue, even though the default hypothesis should be just the SD of the regressed value.

I also know from other studies in economics that people tend to misestimate their confidence bars quite a bit. For example, if you ask people in finance what the GDP of the US is, and ask them to give you a 90% interval estimate, the interval will end up being way too small. So I think people's guesstimates of spread will be off too.

Posted 10:13 p.m., October 28, 2003 (#66) - Tiberius
  Is the reader projection distribution somewhat similar to PECOTA's? I am not a stat head, but it strikes me that PECOTA gives a distribution of performance just like the Primates did. I wonder which SD was higher?

Posted 11:59 p.m., October 28, 2003 (#67) - Nate (homepage)
  Tango,

I believe that Ken Warren did publish his predictions, if putting them on a website counts as publishing. Anyway, the hitters can be found in the homepage link and on that page is a link to pitchers.

Hope that helps.

Posted 9:05 a.m., October 29, 2003 (#68) - David Smyth
  This crap by some saber-types conceding that you have to combine sabermetrics with a "feel for the players," scouting, and other traditional evaluation techniques, in order to evaluate players and put together successful teams, is just that - a bunch of pandering, lip service crap - and I'm not afraid to say so!

Well, it's 95% crap, and the problem with the other 5% is that it only "proves" itself in retrospect, at a point where everyone knows about it, canceling out any "inside info" advantage. For example, I expect that even the professional forecasters will weight Loaiza's 2003 season heavier than normal in their 2004 projections, because his claim of learning a new pitch has been backed up by a reasonable performance sample. But he claims, IIRC, to have learned the pitch and used it with success in his last few starts of 2002. To upgrade his 2003 projection based just on that would have been foolhardy, but ultimately correct.

So, the point is, even if you have legitimate scouting info, the only way to differentiate it from all of the usual scouting "noise" is to wait and see. And then it becomes part of "everyone's" projection. Is there a forecaster out there (MGL?) who would go strictly by the numbers with Loaiza for 2004?

Posted 4:56 p.m., October 29, 2003 (#69) - Alan Jordan
  Micheal - "It may well be the case that naive (or sophisticated-naive for a tangotiger monkey) algorithms do really well when there is a lot of uncertainty, but when things are fairly predictable they may underperform scouting or educated guesses."

There are two kinds of uncertainty: 1. where the underlying system stays the same, but there is random noise in the data; 2. where the system changes.

Algorithms shine in the first type and fail miserably in the second.

Posted 11:31 a.m., October 30, 2003 (#70) - Walt Davis(e-mail)
  Holy Moly! I don't even remember doing this and I'm second. I'm a freakin' genius.

Tango, would you either e-mail me or post my picks? I'm flabbergasted -- I usually suck at this sort of thing.

Posted 12:07 p.m., October 30, 2003 (#71) - Walt Davis
  As far as the 5%, 25%, etc. levels, such as Pecota does: personally, I don't think anything other than using regular old z-scores is appropriate (IOW, if you have a .700 OPS projection, then there is a 5% chance that that player would have an OPS more than 2 SD above or below .700, where one SD is based on one year's worth of projected PAs). Anything other than that (such as what Pecota tries to do) is BS, I think (I am not sure)...

Well, whether PECOTA does it correctly is another question. But I would be very surprised to find that performance had a normal distribution. Such beasties are in fact quite rare in reality.

First, with the exception of Barry Bonds, there seems to be an upper limit to how well a player can perform. Second, and probably more importantly, there's a lower limit below which a player will not be allowed to perform.

I don't know what the standard error is on an OPS prediction, but +/- 60 points wouldn't surprise me. Well, if the baseline prediction is a 700 OPS, it would seem to be silly to say that the 95% confidence interval is 580 to 820. Unless it's Cesar Izturis or the Tigers, the player's not going to be allowed to post a sub-620 OPS for very long.

Or take Jason Giambi. Even with an age adjustment, I'd guesstimate that his 2003 baseline prediction would have been at least a 1000 OPS. But should anyone think, given his age and weight, that an 1120 OPS was as likely as an 880 OPS? That doesn't make sense to me.

Finally, especially with older players and maybe the youngest ones too, it would seem that their chances of significantly underperforming their prediction are greater than the chance of overperforming it. In short, I'd imagine that age increases the chances of falling off a cliff. And while that is covered somewhat by the age adjustment, I doubt it covers it sufficiently. And, paradoxically, if it did cover it sufficiently, the chances of overperforming would probably be greater than the chances of underperforming ... which means we aren't talking about a normal distribution.

Now, having said all that, I suspect that the substantive impact of a symmetric vs. a non-symmetric error distribution is probably trivial.

Posted 6:01 p.m., October 30, 2003 (#72) - David Smyth
  ---"...given his age and weight..."

I realize that this is a trivial part of Walt's post, but Giambi was only 32 this year, and he does not look fat to me at all. I see no reason to have docked him more than the usual age adjustment.

Posted 2:18 a.m., October 31, 2003 (#73) - Snowboy
  If anyone has done an analysis of PECOTA's 2003 predictions/results, I'd like to read it. Has it been done yet?

Posted 6:23 a.m., October 31, 2003 (#74) - Jonathan
  I agree with David. I'd go so far as to give Giambi a 50% chance of breaking a 1000 OPS in 2004. (Of course, I've been watching Edgar for several years, so my perspective is skewed.)

Posted 12:58 p.m., November 1, 2003 (#75) - MGL
  Walt,

One of the problems is that there are two factors which determine what the "curve" will look like: one, the distribution of a binomial (will the player get a hit, a walk, a home run, etc., in each PA, or won't he?), and two, the distribution of possible changes in true talent level, which is presumably based on things like changes in age, physical condition, injury, "learning," and mental and psychological factors. The former should produce a normal curve, by definition; the latter, who knows? Pecota seems to focus on the latter and completely ignore the former.

The former cannot be ignored. It will always exist and there is nothing that anyone can do about it. As I like to say, it is possible that certain players have a consistent talent level from day to day while others do not (for whatever reasons), but a player has no control over the random (actually semi-random, but then again, the throw of a die is semi-random as well) nature of the outcome of each PA. You (Walt) are talking only about the latter (changes in talent level from year to year), whereas I am pretty much talking only about the former. But my contention is that the latter is either insignificant compared to the former OR that it mimics the distribution of the former (it is bell-shaped with a similar SD), so that the net result is a performance distribution which is approximately normal, with an SD defined by the binomial distribution of OPS, BA, or whatever metric we are talking about...

Posted 6:01 p.m., November 1, 2003 (#76) - Arvin Hsu
  One of the problems is that there are two factors which determine what the "curve" will look like: one, the distribution of a binomial (will the player get a hit, a walk, a home run, etc., in each PA, or won't he?), and two, the distribution of possible changes in true talent level, which is presumably based on things like changes in age, physical condition, injury, "learning," and mental and psychological factors. The former should produce a normal curve, by definition; the latter, who knows?

Actually, the former does not produce a normal curve. It produces a binomial. The latter produces what statisticians often call the beta-binomial: there resides a beta prior distribution on the binomial theta (or p) value. This beta distribution has its own mean and SD. This is what contains the day-to-day variation. It does not, however, account for overall shifts in ability, a la the Loaiza example.

-Arvin
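
A small numeric sketch of the contrast Arvin describes; the Beta(68, 132) prior (mean .340) and the 600-PA season are hypothetical:
===============
import math

alpha, beta, n = 68.0, 132.0, 600
p = alpha / (alpha + beta)                 # prior mean rate: .340
var_binom = n * p * (1 - p)                # plain binomial: fixed p
# The beta-binomial inflates the variance by the spread of p itself
var_bb = var_binom * (alpha + beta + n) / (alpha + beta + 1)
print(round(math.sqrt(var_binom) / n, 3))  # SD of the rate: ~.019
print(round(math.sqrt(var_bb) / n, 3))     # beta-binomial SD: ~.039
===============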

Posted 11:10 p.m., November 1, 2003 (#77) - Tangotiger
  Just to add another perspective on PECOTA-type projections.

1 - PECOTA attempts to establish PERFORMANCE probability ranges. There are 3 very important things that affect these figures:

a) the true talent level of a player. We can't know this, and therefore, this itself comes with its own probability distribution. Say that Bonds' true talent level is .500 OBA, with 1 SD = .050.

b) the probability distribution of performance over 600 PA, given the true talent. So, now we have a probability distribution of a probability distribution. Over 625 PA, you might have 1 SD for Bonds' OBA of .500 at .020. Then, you have another one for Bonds' OBA of .480, etc., etc. a) and b) combined will widen the distribution.

c) the number of PAs. And in here, there are 2 subsets: is his PA total limited because of some random occurrence, or is it limited because of poor short-term performance, with his opportunities limited because of selective sampling? So, again, you have to have different distributions at different PA levels. If Bonds had only 150 PAs, chances are he got hurt. If Jeff Weaver gets 15 starts, chances are he's selectively sampled. But again, you need a probability distribution of why he has those PAs.

So, even though PECOTA gives those distributions, there's really another dimension missing: playing time.
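
A Monte Carlo sketch of levels a) and b) combined, using the illustrative Bonds numbers above (.500 true OBA, .050 talent SD, 625 PA); everything else is an assumption:
===============
import random

def simulate_season(talent_mean=0.500, talent_sd=0.050, pa=625):
    true_oba = random.gauss(talent_mean, talent_sd)                # level a)
    times_on = sum(random.random() < true_oba for _ in range(pa))  # level b)
    return times_on / pa

seasons = [simulate_season() for _ in range(10000)]
mean = sum(seasons) / len(seasons)
sd = (sum((s - mean) ** 2 for s in seasons) / len(seasons)) ** 0.5
print(round(mean, 3), round(sd, 3))  # spread is wider than binomial alone
===============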

Posted 10:09 a.m., November 3, 2003 (#79) - tangotiger
  The average standard deviation of the forecast of each player, among the readers, was .46 overall (.42 for pitchers and .47 for hitters).

The actual year-to-year standard deviation is over 1.0.

So, for the people thinking that the readers would give a representative distribution of the players' performance probability ranges, this is not the case.

Posted 4:00 p.m., November 3, 2003 (#80) - Walt Davis
  One of the problems is that there are two factors which determine what the "curve" will look like - one, the distribution of a binomial (will the player get a hit, a walk, a home run, etc., in each PA or won't he?),

There are a few potential problems with this. First, if we are talking PA, we are possibly talking a Bernoulli distribution. That is a trivial distinction, but the binomial distribution is the result of a series of independent Bernoulli trials with a constant probability.

Now independence seems close enough to being true -- there is little if any evidence that the outcome of a player's last PA impacts the outcome of their current PA (e.g. streaks, etc.). However, we know that the p-level is not constant from PA to PA for a batter -- positive outcomes are far less likely against Pedro Martinez or in Dodger Stadium. Hence we know that the binomial is not the exact distribution of these outcomes. It's probably not far off, but it's definitely not the perfect assumption to make.

Far more importantly, a PA is not a Bernoulli trial. A Bernoulli trial is an event with only two outcomes. The chance of getting a hit vs. not getting a hit may be a Bernoulli trial. But there are multiple possible outcomes (single, double, triple, etc.), each of which makes the other outcomes impossible. This is either a conditional probability problem (i.e., first, was it a hit; next, what kind of hit was it) or it's a multiple-outcome problem. Regardless, each PA is NOT a Bernoulli trial, because Bernoulli trials have only two outcomes. No Bernoulli, no binomial. No binomial, no straightforward leap to a normal distribution.

When modeling the probability of multiple outcomes like this, one usually assumes either a "probit" or a "logistic" model. In both (or any other generalized linear model), you assume that there is an underlying continuous (unmeasurable) variable which determines the likelihood of the varying outcomes. In probit, this underlying variable is assumed to be normal; in logistic, it is assumed to follow a slightly different distribution.

For any individual outcome, a Poisson or negative binomial distribution may be as good a fit as the binomial, or better. These distributions are often considered better for rarer events, though I'll admit I know that only as a rule of thumb; I've never seen any research on it (nor have I ever looked for any).

Thirdly, what is being projected? Are sabermetricians actually trying to simultaneously predict the number of singles, doubles, triples, HRs, BBs, Ks, HBPs, and outs? Or are they trying to project an OPS or VORP or win shares or some other single measure of value/performance? If the former, I've yet to notice anyone talking about running multinomial or conditional logit and probit models. Moreover, we'd essentially be talking about multi-dimensional space, so "normal" would be an odd way to describe the error distribution. If the latter, I remain unconvinced that there's any good reason to think it follows a normal distribution.
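
To make the multi-outcome point concrete, here is a minimal sketch of a PA as a draw from several outcomes rather than a Bernoulli trial; all the outcome probabilities are made up:
===============
import random
from collections import Counter

# Hypothetical per-PA outcome probabilities (they sum to 1.0)
OUTCOMES = [("out", 0.660), ("single", 0.155), ("double", 0.050),
            ("triple", 0.005), ("hr", 0.030), ("bb", 0.100)]

def simulate_pa():
    r, cum = random.random(), 0.0
    for outcome, p in OUTCOMES:
        cum += p
        if r < cum:
            return outcome
    return "out"   # guard against float rounding

# Season totals over 600 PA follow a multinomial, not a binomial
print(Counter(simulate_pa() for _ in range(600)))
===============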

As a brief and very tiny example, let's look at the players in this fun little project.

The baseline estimate overestimated 13 of 21 hitters. The average misestimation was 30 points of OPS over actual performance. Or to put it another way, the average projected OPS for these 21 hitters was 846, the actual was 816. Only 1 hitter outperformed his projection by 100 points or more of OPS; 5 hitters underperformed their projection by 100 points or more.

Things aren't much different for the forecasters, except they overestimated only 11 of 21. But the average misestimation was still 30 points of OPS over actual performance, and their mean projection was pretty much identical.

Things are essentially the same for pitchers, where ERA was underestimated by .36 by the baseline and .53 by the forecasters.

So, for this small sample, assuming that the confidence interval would be a normal distribution with a mean equal to the projected performance would clearly have been a bad idea. Whether the "problem" is a mean bias in the forecasts or asymmetric confidence intervals is an open question. Of course, these players were chosen specifically for their "uncertainty". But there's no question that, this year, a symmetric confidence interval for these uncertain players would have been a bad choice.

Posted 10:47 a.m., November 9, 2003 (#81) - AT
  "Andrea" in Italian = "Andrew" in English. IOW, not a girl. Thanks for the compliment though...

Posted 12:44 p.m., November 9, 2003 (#82) - Scoriano
  "Andrea" in Italian = "Andrew" in English. IOW, not a girl.

So is "Silvio" Berlusconi really "Silvia" in English?

Posted 11:35 p.m., November 9, 2003 (#83) - Tangotiger
  Should have realized, since Andrea Bocelli is not a girl, and I'm Italian.
