Tango on Baseball Archives

© Tangotiger


Baseball Prospectus - Evaluating Defense (March 1, 2004)

This bothers me to no end.

(I'll put in a disclaimer here for those who think I have some ulterior motives: I email with several of the BP authors. Don't look for anything. Let's stick to the matter at hand.)

But, come on.

Davenport Fielding Runs are a major step up from other widely available defensive statistics such as Fielding Percentage, Range Factor, and the enigmatic Zone Rating, but statistical analysis of baseball defense is still in its infancy. Front offices have access to much more advanced metrics than the public, some specifically charting where each batted ball is hit, how hard, and how high.

First of all... enigmatic? Count the number of balls in a zone of responsibility hit to a player, and count how many outs. Fine, we don't like how the whole zone is treated the same. That's why we have UZR, as introduced by STATS, and improved upon by MGL.
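
To spell out just how un-enigmatic that is, here's the whole calculation in a few lines (the season totals are made up for illustration):

# Zone Rating at its core: outs recorded divided by balls hit into the
# fielder's zone of responsibility. The totals below are hypothetical.
balls_in_zone = 400   # balls hit into the player's zone (made-up number)
outs_made = 330       # outs recorded on those balls (made-up number)

zone_rating = outs_made / balls_in_zone
print(f"ZR = {zone_rating:.3f}")   # 0.825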

MGL has access to the STATS PBP data, the data that charts where the ball is hit, how hard, etc. MGL, at his great expense, purchases this data every year, and is generous enough to share his time and effort to produce some wonderful work. UZR is the current gold standard that is available to the public (even if it can be improved upon). Teams don't have a leg up, not as long as we have MGL.

But, to not even mention UZR? When the NY Times and the NY Post are ahead of the curve here, something is missing.

I told one of the BP guys this weekend that I find it a shame that BP never quotes UZR data. And today, I see this.

Tell me. Am I being unreasonable here?

(Note again: for all you guys who are looking for something, don't post it. You know who you are. Email me if you need to get personal.)


--posted by TangoTiger at 11:58 PM EDT


Posted 12:47 a.m., March 2, 2004 (#1) - Michael Humphreys
  Tango,

I'm with you on this. Every baseball fan owes a big "thank you" to MGL for providing essentially the same quality information that major league teams have.

Zone ratings are not "enigmatic"; DFTs are.

The "explanation" of DFTs provided in the attached article explains nothing. (Tomorrow I'll check out the hard copy of Prospectus 2004 to see if any more information is provided there.)

Of course I have to admit that I haven't fully revealed the formulas for DRA, but I've certainly provided a lot more detail about the DRA system than BP had provided regarding DFTs. I've also gone to the trouble of testing DRA results against UZR and Diamond Mind evaluations.

Perhaps the reason BP tries not to talk about zone ratings is that BP might not be allowed to "sell" such ratings directly or indirectly through its website.

All carping aside, I am glad to have DFTs around--they're pretty good, the price is right, and they have ratings for every player who ever played the game.

Posted 1:40 a.m., March 2, 2004 (#2) - Larry H
  Please provide a link to the UZR derivation/methodology. Those of us who are new to the stats part of the game would benefit if the conversation weren't only in code for those who already seem to know what's going on.

Thanks.

LH

Posted 7:11 a.m., March 2, 2004 (#3) - Nod Narb
  Larry,

There's a link to the primate studies index right at the top of the Primate studies page. That should give you easy access to most things referenced in individual threads.

Posted 7:16 a.m., March 2, 2004 (#4) - studes (homepage)
  Right on, Tango. There are at least four systems I would choose over DFT: ZR, UZR, DRA and Pinto. And, like you, I don't even know what goes into DFT. Because of that, I'd even reference fielding Win Shares before DFT! At least I know what some of the flaws in WS are.

Posted 8:33 a.m., March 2, 2004 (#6) - Ed
  I've started to read through BP 04 and was surprised to see Scott Rolen with negative fielding runs for 2003. Do any other metrics produce this surprising result?

Posted 9:03 a.m., March 2, 2004 (#7) - Mike Green
  Ed, Scott Rolen has -1 fielding WinShares. He's mildly positive on UZR and Pinto. It does seem that his fielding has declined noticeably with age.

Posted 9:25 a.m., March 2, 2004 (#8) - Ed
  Interesting. I seem to remember Diamond Mind giving him a GG for 03, but that may be faulty memory.

Posted 9:27 a.m., March 2, 2004 (#9) - Ed (homepage)
  Answering my own question, Beltre beat out Rolen in 2003 (homepage).

Posted 10:13 a.m., March 2, 2004 (#10) - Larry H
  Nod Narb,

Thanks regarding UZR.

Actually the basic UZR methods explanation doesn't appear to be in the Primate Studies index, at least not that I could see. But I searched the Baseball Primer Archive and found two references that are apparently to the latest methods statements.

http://www.baseballprimer.com/articles/lichtman_2003-03-14_0.shtml

http://www.baseballprimer.com/articles/lichtman_2003-03-21_0.shtml

LH

Posted 10:16 a.m., March 2, 2004 (#11) - cricketing baseballer (homepage)
  Weren’t DFTs first described in the 2002 Baseball Prospectus?

And can one take UZRs back in time? If I understand the DFT system correctly, you can use them for 1903 or 2003, regardless. But can one do that with UZR? If not, that would render UZR somewhat “enigmatic” to me.

Also, while we are on the subject of fielding, why are people so dead set against Range Factors? The article highlights a lot of problems, but as a relatively quick and dirty solution wouldn't a kind of RF+ system work? I only ask this because I've missed some of the debate on this, not because I'm carrying any torch for RFs (or DFTs for that matter).

Posted 10:16 a.m., March 2, 2004 (#12) - tangotiger (homepage)
  In addition to getting everything you want about UZR from the Primate Studies Index, you can go to the above homepage link to read all about UZR, along with the dozens of comments. Take a few hours to go through both articles and commentary.

Posted 10:17 a.m., March 2, 2004 (#13) - tangotiger (homepage)
  As well, you should read Mike Emeigh's 8-part series at the above homepage link for a primer on all fielding metrics.

Posted 10:23 a.m., March 2, 2004 (#14) - Larry H
  Thanks. Now I really have my reading cut out for me. Might I suggest that having all the defense stuff on a single index, instead of each author's material on separate pages, would help people like me.

LH

Posted 11:05 a.m., March 2, 2004 (#15) - KJOK(e-mail)
  I agree with Cricketing Baseballer. UZR's are fantastic FOR YEARS WHERE YOU HAVE PLAY BY PLAY DATA!

For the rest of the history of baseball, good alternative methods are still needed.

Posted 11:16 a.m., March 2, 2004 (#16) - tangotiger
  You develop a system to maximize the information at hand. If that means using UZR for 1989 to today, and using something else for other years, fine.

A good process would baseline its system against the years with maximum data before using it in the years with minimum data. That is, we know how many GB and FB were hit from 1989 to today. If you have a way of estimating that, then you should check your estimates against what we know (with an error range). Once you have that system down, you apply it backwards in time (with that same error range, and perhaps a little higher). Same thing with lefty/righty, grass/turf, etc.
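
A quick sketch of the idea, with an entirely made-up estimator and made-up inputs:

import numpy as np

rng = np.random.default_rng(0)

# "Known" era (1989 to today): some traditional inputs, plus the true team GB share.
X_known = rng.normal(size=(300, 3))
gb_true = 0.45 + 0.05 * X_known[:, 0] + rng.normal(scale=0.02, size=300)

# Fit a simple estimator of GB share from the traditional inputs.
A_known = np.column_stack([np.ones(300), X_known])
coefs, *_ = np.linalg.lstsq(A_known, gb_true, rcond=None)

# Measure the error range in the years where the answer is known...
resid_sd = (gb_true - A_known @ coefs).std()

# ...then apply the estimator to the earlier years, carrying that error range
# along (padded a little, since the older environment differs).
X_old = rng.normal(size=(50, 3))
gb_old_est = np.column_stack([np.ones(50), X_old]) @ coefs
print(f"first old-era estimate: {gb_old_est[0]:.3f} +/- {1.2 * resid_sd:.3f}")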

I think Charlie Saeger / Mike Emeigh do something like this, but I'm not sure to what extent.

Posted 11:17 a.m., March 2, 2004 (#17) - tangotiger (homepage)
  Larry, here's one more link for you. James Fraser and Sylvain have done just as you said, and every article related to fielding is on the above page. (Not sure when it was last updated.)

Posted 12:03 p.m., March 2, 2004 (#18) - tangotiger
  Rolen's UZR runs and games from 1999 to 2003:

99: +26, 101
00: +25, 116
01: +33, 154
02: +28, 155
03: +3, 148

Pinto has him at +6 outs (or +5 runs) for 2003.

That's certainly quite a change in performance. I'll take a guess and say that it is statistically significant, but maybe someone else can chime in here.

Figure that from 99 to 02 he had 2104 BIP and .767 outs per BIP (league average of .700). In 03, he had 592 BIP and .706 outs per BIP.

So, his probable true talent in 99-02 was .756. His 03 performance would be 3 SD away from this.
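
In code, the back-of-the-envelope check looks like this (treating outs per BIP as a binomial rate, which is an approximation on my part):

import math

p_true = 0.756   # estimated true talent, outs per BIP, 99-02 (after regression)
bip_03 = 592     # 2003 balls in play in his zones
p_obs = 0.706    # observed 2003 outs per BIP

sd = math.sqrt(p_true * (1 - p_true) / bip_03)   # binomial SD for a 592-BIP sample
z = (p_obs - p_true) / sd
print(f"SD = {sd:.4f}, z = {z:.1f}")   # about -2.8, i.e., roughly 3 SD below expectation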

Posted 12:05 p.m., March 2, 2004 (#19) - tangotiger
  Of course it is statistically significant (pretty much anything is if you put your level low enough). I meant if it was significant at the 95% level.

Posted 1:00 p.m., March 2, 2004 (#20) - ColinM
  "Of course it is statistically significant (pretty much anything is if you put your level low enough). I meant if it was significant at the 95% level. "

The problem with this is that you run into the old selective sampling issue. The reason we're looking at Rolen to begin with is that someone noticed how out of line his '03 was with the rest of his career. By chance alone, there should be a few players who are this far out of their range simply by luck.

I'd say that there's a decent chance that Rolen was just one of those players.

Posted 1:22 p.m., March 2, 2004 (#21) - Mikael
  I posted this over on Clutch Hits, before Darren directed me here.

I hate to be That Guy ripping on Prospectus. I like them and what they do.

But I have mad ish with the "BP Basics" article today on defense. It starts well, with the classic argument against errors and F%, against judgments based on anecdotes. But in discussing the proposed solutions to these problems, James Click lauds Davenport's Fielding Runs as the latest advance over Range Factor, with no mention of similar systems like Win Shares or Context-Adjusted Defense or others. It makes only oblique reference to PBP stats, primarily as proprietary metrics owned by teams. MGL's and Pinto's and everyone else's work is ignored.

I understand that they are a corporation with corporate goals, and they've decided that mentioning extra-BP research harms their pursuit of those goals. But right now, they're publishing a piece whose stated goal is to help readers in "understanding the game better," while willfully ignoring most of the issue for purely profit-based reasons. I think that's really, really lame.

Lame.

Posted 1:23 p.m., March 2, 2004 (#22) - tangotiger
  Colin, agreed.

***

The BP author was kind enough to reply to my rant. In essence, the reply is that the target audience of the piece doesn't necessarily include someone like me. I'm ok with that.

My points are:
1 - calling something as straightforward as ZR "enigmatic" is, in my view, unenlightening (ZR is to a fielder what OBA is to a batter); a new reader, or the target reader, might take that at face value, and not even question it

2 - that if you decide to write those parts that I have noted in bold, then UZR should have been mentioned

Take out those parts in bold, and the article itself does a good job of conveying its message to its target audience.

Posted 1:33 p.m., March 2, 2004 (#23) - ColinM
  Even taking out the bold part doesn't seem like enough. If BPro wants to discuss the state of defense evaluation, they should at least mention that there are better metrics available! If they don't know about UZR, then they really aren't cutting edge anymore, are they?

Posted 1:46 p.m., March 2, 2004 (#24) - Charles Saeger(e-mail)
  DFTs were explained in a BP a couple of years ago. Structurally, they're similar to CAD, with range handled more like the original version, where it was treated like Bill James's old Range Index. The final numbers will be similar, though there are some differences. Clay's published 1b putout formula is better than my published formula, though I have an unpublished one that is better yet. (His 1b putout formula is based on mine, and I was credited for it, making me, I think, the only BBBA contributor ever mentioned in BP.) His 2b/ss putout formulae are also better, and I have revised versions in the spirit of his, though frankly, middle infield putouts are of decidedly minor value, as opposed to 1b/of putouts, where they are bread and butter. I think he weighs an error higher than a hit, which he shouldn't do; CAD looks like it does, but it actually weighs a hit higher than an error for the team, when you use a collective performance.

I place more weight on DFTs than on any other fielding measure that uses the traditional stats, but not because it's the best. I place the weight on it because it is publicly available for all of history, and because I know how it works. It's better than Win Shares, though I really like having Win Shares fielding around because it changes the context, hitting everything from a different angle. Really, though, this isn't rocket science; any frigging idiot who understands baseball and can manually do multiple regressions in Excel can write a decent fielding formula.
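
To make that concrete, here's a toy version of that kind of regression (synthetic data and hypothetical inputs; this is not the actual CAD, DFT or DRA math):

import numpy as np

rng = np.random.default_rng(42)
n = 200  # hypothetical player-seasons

# Hypothetical team context: defensive innings, team GB rate, team K/9.
innings = rng.uniform(400, 1400, n)
team_gb = rng.uniform(0.40, 0.50, n)
team_k9 = rng.uniform(5.5, 7.5, n)

# A made-up "true" process for plays made, plus player-skill noise.
plays = 0.32 * innings * team_gb * (1 - team_k9 / 27) + rng.normal(0, 8, n)

# Regress plays made on context to get "expected plays" (what a simple
# Excel regression would give you), then credit the difference to the fielder.
A = np.column_stack([np.ones(n), innings, team_gb, team_k9])
coefs, *_ = np.linalg.lstsq(A, plays, rcond=None)

plays_above_expected = plays - A @ coefs
runs_saved = 0.8 * plays_above_expected   # rough runs-per-play conversion
print(f"SD of runs saved across players: {runs_saved.std():.1f}")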

Posted 1:46 p.m., March 2, 2004 (#25) - studes (homepage)
  That's a weird response. Who is the target audience? I would presume it is a relatively informed baseball fan with an analytic bent. And why wouldn't that sort of person be interested in UZR and other systems?

BPro writes some very informative stuff, but their attitude stinks.

Posted 1:50 p.m., March 2, 2004 (#26) - Charles Saeger(e-mail)
  Another note:

I cannot understand why teams would want to evaluate current players using traditional fielding stats. Right now, they're most useful for historical evaluation, and for in-season spot checks; MGL can't tally UZR in-season. An adequate (not straight ZR, which is useless; more like the old DA) pbp evaluation is light-years better than any traditional stat method, and allows you to look at smaller areas of performance.

Posted 1:53 p.m., March 2, 2004 (#27) - Dotterer
  After reading the piece, I don't think Click was trying to discuss the state of defense evaluation. I think they're just shooting to put something up for people who are total sabermetric newbies. I don't think it's anything sinister. I've seen Huckabay speak twice, and I know he knows about UZR, because he mentions it as a defensive measurement he likes and uses.

I think the piece would have been stronger without any mention of alternative defensive measurement at all. But the whole series reads like an intro to sabermetrics. This piece is no different.

Posted 2:12 p.m., March 2, 2004 (#28) - studes (homepage)
  Dotterer, you're right. I really should amend my statement to say that sometimes I don't understand BPro's attitude. Full disclosure: I'm a BPro subscriber, so I obviously like what they do.

Posted 5:56 p.m., March 2, 2004 (#29) - Rob H
  That's certainly quite a change in performance. I'll take a guess and say that it is statistically significant, but maybe someone else can chime in here.

Rolen *looked* off in 2003, at least compared to what he did in 2002. I know I had the same thoughts Tippett had about Rolen's injuries (shoulder in 2002 NLDS and neck midway through 2003 season) affecting his fielding.

And, yes, your comments are reasonable. Would anyone be surprised if the A's method for evaluating defense is very similar to MGL's?

Posted 6:11 p.m., March 2, 2004 (#30) - Taps
  Something that has been mentioned before, but surely it's possible that Rolen in particular just had a bad year with the glove, having little to do with injuries? Many otherwise excellent hitters have bad (for them) years at the plate (Tony Gwynn, 1990 & 1991, for example) and many otherwise excellent pitchers have bad (for them) years on the mound (Roger Clemens, 1993 & 1999, for example). Why not fielding?

And to re-ask a question from cricketing baseballer that I don't see an answer for (I could just have missed it), why are people so dead-set against range factors? James goes to some length to point out that it's not the end-all of defensive metrics, but rather a simple-to-figure improvement over fielding average.

Posted 6:17 p.m., March 2, 2004 (#31) - Michael Humphreys
  DFTs were changed significantly after 2002, there are no public formulas for the current form of DFTs, and some of the data used in DFTs seems to be pbp data (though I could be wrong about the last point).

In short, DFTs are no longer reproducible by fans. We have to take them on faith.

They're still very worthwhile to have around. I would not describe DFT (or DRA, or CAD, for that matter) as "light years" inferior to UZR. As mentioned in the DRA article, 2-to-3 year DRA ratings had a .8 correlation with UZR, as well as the same scale. It is true that UZR is significantly better for part-time players, but I believe that Tango has recommended that people try not to evaluate players with less than 2 years of even UZR ratings (i.e., even full-year UZR ratings are subject to distortion, given the extreme amount of randomness in fielding performance).

I bought BP 2004 and will look at it in more detail, but based on a first read, the method is "slightly different" from 2003 (which was *significantly* changed from the 2002 method, and without a detailed explanation--in fact, I seem to recall the BP website *drawing attention* to the fact that the system had been changed significantly between 2002 and 2003, including by no longer treating errors as different from plays not made). BP 2004 says that the GB/FB method has been changed "radically" from 2003, as well as the LF/CF/RF split.

BP is now generating minor league DFTs, which is not surprising, because that's where they're most useful, particularly to major league teams, who don't have minor league zone data. That may also explain why BP is no longer providing a complete (or even a reasonably well-detailed) explanation of DFT.

Posted 9:43 p.m., March 2, 2004 (#32) - Bill
  The series is called "Baseball Prospectus Basics". To me, that means it covers the basics used by Baseball Prospectus. When the series started I thought it was obviously geared towards readers new to BP. They've gotten a whole lot more press this year, from Moneyball fallout and the Rose thing, and I think it's safe to say that this year's book will be the best-selling edition yet (it's been top 30 on Amazon for a while now), not to mention increased web traffic. It seems like the perfect time to run an intro course for all of the newbies. Most people here are not newbies.

Also, I imagine it's safe to say that they will continue to ignore possibly superior metrics in the future. It doesn't help them to say that their methods are inferior. They are a business, after all.

Posted 8:48 a.m., March 3, 2004 (#33) - Mike Emeigh(e-mail)
  and I was credited for it, making me, I think, the only BBBA contributor ever mentioned in BP.

Not quite. Michael Wolverton mentioned something I did on "bequeathed runners" in BPros a few years ago.

-- MWE

Posted 3:11 p.m., March 3, 2004 (#34) - Michael Humphreys
  I posted the following at Clutch:

The BP enigma deepens.

I have looked through all of BP's projections for 2004 fielding runs above average.

Something like 95% of the projections are below average, i.e., negative.

It's true that there is a slight skew to fielding talent distributions, but it's nothing like that for batting. As Tango and others have shown, regulars are only a tiny bit better than average, on average, so even if you're including non-regulars in the sample, close to half should be above average.

It's also true that regulars age more quickly in fielding, so they all should be declining. The problem is that projections for young prospects are also included.

Not one player is projected to save 10 or more runs next year in the field. The closest is first baseman Todd Helton, with a +9 projection. One player I can't recall was +8. Eric Chavez was +7 at third. Mike Cameron was +6 in center. Certainly fewer than 10 players are projected to be +5 or more.

Because the ratings are biased negatively, there are quite a few (20?) that are -10 runs or worse. But the spread is still too narrow. Jeter is projected for -10; Bernie Williams for -5. Both have clearly established, on a consistent basis over a few seasons, that they give up around 20 runs per 162 games. And they're both older.

MGL has taught us to regress fielding ratings considerably, but I very much doubt that MGL would not project a single player to be +10 or more in the field.

BP says that fielding has never been less relevant than it is today, and I agree. In the DRA article, I note that DRA ratings, particularly at short, are converging towards the average. But BP projections essentially imply that fielding is virtually valueless. That is not consistent with UZR, Diamond Mind, Pinto or DRA.

To be clear, there are a few single season ratings shown in BP 2004 that are significant--+21 for Mike Cameron last year. But the spread for even actual (not projected) ratings still seems much smaller than for Pinto and *very much* smaller than for UZR.

Curiosity got the better of me. I decided to look for all +15 seasons reported in the book, which provides 2001, 2002 and 2003 data for each player. There were only 32 such seasons. I chose +15 because even if you regress it by 50%, you still have something meaningful.

I believe only four players in the entire book had even two +15 seasons from 2001-03. None had three. Andruw Jones was +18 in '01 and '02. Adam Kennedy was +17 in '02 and +15 in '03. Eric Chavez was +20 in '01 and +25 in '03. He was, I believe, the only player with two +20 seasons. In '02 he was +3. I think there was some catcher who might have been the fourth. If I missed one or two, it's only one or two.

The highest single-season rating I recall was +26, for first baseman Todd Helton in 2003. Nope, sorry about that--Rey Sanchez was +27 at short in 2001.

There seems to be some skew in the historical ratings--there are more negative ratings, and the negative ratings are probably "bigger" than the positive ratings I've reported above.

Though I'd have to double check by looking at the negative ratings, the data summarized above strongly suggest that the most current BP single-season ratings for the last three years have a much smaller variance than UZR, Diamond Mind, Pinto or DRA ratings. (Diamond Mind doesn't provide runs-saved numbers, but in their essay on fielding, they mention the "spread" of plays made above or below average.)

Posted 5:41 p.m., March 3, 2004 (#35) - Silver King
  As others have mentioned, it's _very_ nice to have an intelligently-done version of defensive runs above average that I can easily look up for any pre-play-by-play historical player. It's very nice that it's on the free part of Prospectus (though I am a subscriber). The DFTs are the best thing going (pre-2000 or eventually pre-1989 which UZR can cover) until/unless something like DRA or CAD can be made available for any/all historical players. To whatever extent it's flawed, the best thing going is flawed, and that's a bummer.

Michael, you're noting that the DFTs for recent seasons don't show the variance that more accurate measures show. 'Squeezed' relative to what PBP shows? Do you think that's also true of the DFTs for the previous 1.25 centuries? (Hey, why limit myself to easy questions?) The raa2 numbers are, I think, supposed to be adjusted to contemporary circumstances, so perhaps all raa2's are in effect 'squeezed.' If they're simply squeezed, it'd be nice to know a rough adjustment factor to apply, since we're unlikely to get Clay to redo them anytime soon. To whatever extent there are things wrong other than squeezing (or a shift toward the negative), well, darn.

I agree that it's annoying--at best--that Prospectus basically publicly ignores the existence of other current defensive runs work and especially MGL's combo of generosity and ingenuity with PBP results.

Posted 6:44 p.m., March 3, 2004 (#36) - Michael Humphreys
  Silver King,

I myself have noted that it's nice to have BP fielding ratings for free going all the way back. In general they match up reasonably well with DRA ratings from 1974-2001, though the BP outfield ratings (at least what's up on the website) have too little variance (look at Tris Speaker compared to Cobb) and the infield ratings seem to have more "noise" (but roughly the right variance), because they include pop-ups (which Diamond Mind, UZR, and DRA ignore for infielder ratings).

But the reason I liked the ratings so much was that I thought I knew how they were calculated. The more I've looked into it, the more completely mysterious they've become. I'm still inclined to think they're better than anything else out there for pre-UZR periods, but . . . .

The variance in defensive performance in the 2004 book seems *way* more compressed--more than what I was used to seeing on-line and more so than the two best PBP systems: MGL's UZR and Pinto's Probabilistic Model of Range.

In a prior Primate Studies thread comparing fielding systems, the standard deviation of Pinto's system was about 12 runs (I think) and of UZR about 15. (DRA is about 12.) I thought Tango had also done some simulation studies suggesting an average standard deviation per position of about 12 to 15 runs. Win Shares is around 6.

If fewer than 1% of player seasons (32 out of how many?) in the most recent book have a positive rating of greater than +15, it is almost a certainty that the new BP ratings have a standard deviation closer to Win Shares and in any case clearly different from the two best "zone"-type systems I know of.
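
Here's the rough arithmetic behind that (the total number of player-seasons in the book is a guess on my part, not a count):

from statistics import NormalDist

over_15 = 32      # +15-or-better seasons reported for 2001-03
total = 3000      # total player-seasons in the book -- a guess, not a count
tail_share = over_15 / total

# If the ratings were roughly normal around zero, the SD implied by that tail:
z = NormalDist().inv_cdf(1 - tail_share)
print(f"implied SD ~ {15 / z:.1f} runs")   # about 6-7: Win Shares territory, not UZR's ~15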

And, if you'll pardon my repeating myself, it seems very strange to trumpet your fielding system if it (i) essentially shows that fielding has no persistent material effect and (ii) shows that almost all fielders are below average.

Posted 12:28 a.m., March 5, 2004 (#37) - studes (homepage)
  As I noted in the previous thread, systems, like Win Shares, that are based on more than just range are going to have lower standard deviations than systems that are used to estimate range.

Like it or not, Win Shares includes double plays (in fact, ranks them ahead of range for second basemen), error rates, etc. etc.

Could this be part of what's going on with DFT? I honestly don't know what goes into the DFT equations, and it wouldn't explain the preponderance of negative rankings, but it might help explain the lower variance.

Posted 8:38 a.m., March 5, 2004 (#38) - Michael Humphreys
  Studes,

Thanks for the comment. In the 2003 on-line article about DFTs (which I can no longer find), Clay said that errors were being de-emphasized and did not say that DPs would be emphasized more, so range is pretty much all that's left (as it should be, except at catcher and first base). I think the article emphasized that a "top down" accounting system (similar to Win Shares) would be used. Perhaps a hard "cap" on *team* fielding runs and a too-high "floor" for each *position's* fielding runs results in too little variance, as happens with Win Shares.

A few new thoughts:

It's odd that none of the posters to the "BP-shouldn't-call-Zone-Ratings-enigmatic-and-ignore-MGL's-contributions" threads here or at Clutch appears to be interested in the "enigma" of *BP's* defensive rating methodology *and* results. (Though PECOTA attracted some comment at Clutch.)

After looking at everything I've been able to find at BP's website (not including Premium) and this year's hardcover book, it has occurred to me that BP nowhere claims to rely exclusively on traditional statistics--i.e., they never say they *don't* use STATS-type data.

Their claim that "front offices have access to much more advanced metrics than the public, some specifically charting where each batted ball is hit, how hard, and how high" is in some sense true. Though the STATS data MGL acquires has zone, speed and some trajectory data (classifications of line-drives, fly balls, pop-ups, etc.), baseball teams have even *more* detailed data. Recall in Moneyball that a more advanced version of STATS data was sold by a company called AVM (I think) to the A's (then replicated by budget-conscious DePodesta). The "granulation" of AVM data seemed way beyond what STATS provides--in fact, it had to be, or else no team would have ever bought it. (By the way, though more granulated data is better than less granulated data, MGL's data is well more than adequate to yield good ratings.)

It may not be coincidental that both Diamond Mind and BP are more "enigmatic" than MGL or Pinto. (Diamond Mind doesn't provide actual runs-saved estimates, only "letter"-type grades.) As profit-making concerns, they may be restricted contractually from effectively "reselling" STATS data.

Anyway, to recap: (i) we don't know what BP's defensive formulas are, (ii) we don't even know what defensive data BP uses, though it seems to me likely they use STATS data, (iii) the resulting ratings "skew" weirdly negative, (iv) the variance in ratings appears to be much lower than estimates provided by Tango, MGL, Pinto and DRA, and (v) the projections for 2004 indicate that virtually no fielder has a meaningful positive impact on run prevention.