Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)
Nothing new... just expressed along a common scale: runs per 162 GP.
Note to you aspiring sabermetricians: if you can, express things in the currency of wins, runs, or outs (in that order). Express things as per 162 GP (full-season) or per play or something useful like that. Don't put things as per 100 GP or per 1000 innings.
--posted by TangoTiger at 10:21 AM EDT
Posted 11:14 a.m., February 17, 2004 (#1) - tangotiger
I sent the following to the author above:
=========================
Here are your totals by position:
Pos      Rate2    WS   UZR  Pinto  Cedeno
------------------------------------------
3           54    85   -11     11      35
4           59    11    82     24      45
5           38   160    36     32      67
6           38     2    -6     32      18
7           29    58   -36    -80      -8
8           31   257    21     83      97
9           27    74   -77    -75     -12
------------------------------------------
Tot        276   647     9     27     242
OF only     87   389   -92    -72      77
I don't expect all positions to be zero, because you
are not looking at all players (and you did set
everyone to 162 GP).
However, the UZR and Pinto numbers do make the most
sense. The Rate2 numbers are about 30 runs too high
total and your WS OF are just off the charts too high.
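The "Tot" and "OF only" rows can be re-derived with a few lines of Python. This is only a sketch: the numbers are copied from the table above, while the dictionary layout and the name `column_totals` are illustrative choices, not anything from the original spreadsheet.

```python
# Runs per 162 GP by position, copied from the table above.
# Column order: Rate2, WS, UZR, Pinto, Cedeno.
SYSTEMS = ["Rate2", "WS", "UZR", "Pinto", "Cedeno"]
BY_POSITION = {
    3: [54, 85, -11, 11, 35],
    4: [59, 11, 82, 24, 45],
    5: [38, 160, 36, 32, 67],
    6: [38, 2, -6, 32, 18],
    7: [29, 58, -36, -80, -8],
    8: [31, 257, 21, 83, 97],
    9: [27, 74, -77, -75, -12],
}

def column_totals(positions):
    """Sum each system's runs over the given scorer position numbers."""
    rows = [BY_POSITION[p] for p in positions]
    return {name: sum(col) for name, col in zip(SYSTEMS, zip(*rows))}

all_positions = column_totals(BY_POSITION)   # the "Tot" row
outfield_only = column_totals([7, 8, 9])     # the "OF only" row
print(all_positions)
print(outfield_only)
```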
Posted 11:23 a.m., February 17, 2004 (#2) - Avkash
There is a mistake in my calculations. I'll have the corrections up. Thanks!
Posted 11:24 a.m., February 17, 2004 (#3) - tangotiger
Btw, the SD of all players in that chart is as follows:
UZR: 16 runs = 1SD
Pinto: 12 runs
Rate2: 11
Win Shares: 6
Win Shares just doesn't have the scale.
Posted 11:53 a.m., February 17, 2004 (#4) - MGL
The more granular the data, the higher the SD, and the more accurate and useful the results.
I posted this on the above web site:
I'm making no judgements about win shares per se, but clearly it is not appropriate to average win shares in with these other metrics, especially in the OF. The scale and baseline are completely different for win shares, even though the author of this article has done an admirable job in converting it to something that looks like runs per 162 games.
Also, you want to be very careful in averaging metrics and taking the results seriously (as being better than any of the individual metrics). In general, in order to do that, you have to have reason to believe that one metric complements another and/or offers something that another one does not. And even then, you want to make sure that they are all on roughly the same qualitative level. If they are not, then even if there is some complementing, the inferior ones will bring down the average to a level worse than that of any of the individual superior ones. So you want to make sure that there is a high degree of independence among the individual metrics and that they all use about the same granularity of data.
In this case, besides the fact that I am biased towards UZR, as I said, win shares does not belong in this list, I am not familiar with rate2, and although Pinto's "system" appears to be good and very close to UZR, it is a black box as far as I know.
If I were in charge of the world, I would not be averaging UZR or Pinto's system with anything, any more than I would average lwts with OPS or BA with OBA. IMNSHO, UZR is the gold standard, principally because it relies on PBP (hit type and location) data (so does ZR and defensive average, but on a very "gross" level), whereas all the other metrics can only "estimate" such data...
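To make the averaging point concrete, here is a minimal sketch under assumptions that are not part of any of the systems discussed: each metric equals true talent plus independent, zero-mean noise, and the metrics are averaged with equal weight. The noise SDs below are made-up numbers, and `error_sd_of_average` is a hypothetical helper.

```python
import math

def error_sd_of_average(noise_sds):
    """SD of the error in an equal-weight average of metrics whose
    errors are independent with the given SDs (assumed model)."""
    n = len(noise_sds)
    return math.sqrt(sum(sd ** 2 for sd in noise_sds)) / n

good, poor = 8.0, 20.0   # hypothetical error SDs, in runs

print(error_sd_of_average([good, good]))   # two comparable metrics
print(error_sd_of_average([good, poor]))   # one good, one much noisier
```

Under this model the equal-weight average of two metrics beats the better one only when the noisier metric's error SD is less than about 1.73 (the square root of 3) times the better one's. If the errors are positively correlated rather than independent, the gain from averaging shrinks further.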
Posted 11:54 a.m., February 17, 2004 (#5) - Avkash
The updated numbers are up, thanks for the heads up.
The total OF Win Shares is now +44.
Posted 12:18 p.m., February 17, 2004 (#6) - studes
By definition, the sum of the Win Shares numbers for all players should equal zero. If they don't then there is an error in the numbers or the methodology. I'll go back and review the data I sent Avkash.
I also agree that WS scale is an issue. MGL, I agree about the preference for UZR. But I do appreciate what Avkash did, because I'm interested in improving Win Shares' fielding numbers, and this approach helps a lot. I'm hoping to eventually use Michael Humphreys' DRA approach, when he publishes his equations.
Posted 12:20 p.m., February 17, 2004 (#7) - tangotiger
Pinto's does rely on PBP, and it uses maximum likelihood estimation. The problems with Pinto's model have been noted by me a few times in a few places (the most notable being CF and the curious decision to look at slices and not grids). Pinto's model would greatly improve if he were to use multi-year data (something that's built into UZR).
We should remember that Pinto's model was a first stab, and it was published as a blog entry. If Pinto were to devote his time/effort as much as MGL did to improving it, it would be great. It's not fair to criticize it as MGL does, since Pinto never really trumpeted it to the point that it should be criticized as a final product.
It would be like criticizing my "position-neutral" fielding, when it's still at the very early stages of even contemplation.
On a 0 to 10 point scale of usefulness, UZR is probably a 7, and Pinto is a 5. Rate2 is probably a 3 and WS a 2. Just wild meaningless guesses.
***
The "r" among the 4 stats as originally published:
UZR/Pinto: .67
WS/Rate2: .58
UZR/Rate2: .51
Pinto/Rate2: .45
UZR/WS: .34
WS/Pinto: .34
all to UZR: .70 (WS added nothing to Pinto/Rate2)
all to Pinto: .68 (Rate2 and WS added almost nothing to Pinto)
all to Rate2: .67 (UZR/Pinto added almost nothing to Rate2)
all to WS: .59 (UZR/Pinto added nothing to WS)
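If the "all to X" figures are multiple correlations (that is one reading; the post does not say how they were computed), they can be approximated from the pairwise values alone. The sketch below uses the standard two-predictor formula with two of the three available predictors for each target. The helper names `r` and `multiple_r` are illustrative, and the inputs are the rounded values quoted above, so the results will only be close to the published figures, not identical.

```python
import math

# Pairwise correlations as quoted above (rounded to two places).
PAIRWISE = {
    frozenset(["UZR", "Pinto"]): 0.67,
    frozenset(["WS", "Rate2"]): 0.58,
    frozenset(["UZR", "Rate2"]): 0.51,
    frozenset(["Pinto", "Rate2"]): 0.45,
    frozenset(["UZR", "WS"]): 0.34,
    frozenset(["WS", "Pinto"]): 0.34,
}

def r(a, b):
    """Look up the quoted correlation between two systems."""
    return PAIRWISE[frozenset([a, b])]

def multiple_r(target, p1, p2):
    """Multiple correlation of `target` on two predictors, computed
    from pairwise correlations only (two-predictor formula)."""
    ry1, ry2, r12 = r(target, p1), r(target, p2), r(p1, p2)
    r_squared = (ry1**2 + ry2**2 - 2 * ry1 * ry2 * r12) / (1 - r12**2)
    return math.sqrt(r_squared)

print(multiple_r("UZR", "Pinto", "Rate2"))
print(multiple_r("Pinto", "UZR", "Rate2"))
print(multiple_r("Rate2", "WS", "UZR"))
print(multiple_r("WS", "Rate2", "Pinto"))
```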
Posted 12:32 p.m., February 17, 2004 (#8) - studes
I'd be interested in those "r" stats with the right outfield numbers. My guess is that Rate2 and WS would be about equivalent.
Of course, I'll run it myself when I get a chance.
Posted 1:04 p.m., February 17, 2004 (#9) - Michael Humphreys
Avkash, thanks for the great chart! And Tango, thanks for the link.
I've run the basic stats and correlations on the updated data, and have the following observations.
I agree with all of Tango's and MGL's points.
UZR is still the gold standard, though the outfield ratings probably have too much variance (Pinto's seem to have slightly too little).
I would consult UZR over Pinto in the infield, and consider *both* in the outfield, because "ball-hogging" is more of a problem in the outfield and UZR ratings seem to have more variance than Diamond Mind ratings.
For non-zone/PBP systems, Rate2 (Davenport Fielding Translations) is clearly an improvement over Win Shares, but DRA is probably more accurate (and definitely computationally simpler) than DFT.
***
I split the data between infield and outfield. Why? Because only UZR (correctly) eliminates infield fly outs--in other words, UZR is measuring something different from the others.
Here are the main findings in the infield:
UZR, Pinto, DFT all have approximately the same standard deviation (13, 15 and 12, respectively). Win Shares has a std of 6. (DRA standard deviations in the infield are about 12.) Furthermore, even using the updated numbers, the mean/median of Win Share infield ratings is 2 (or almost half of 1 std above zero); whereas the others have mean/medians of around +1. Win Shares undervalues defense and overvalues regulars.
Pinto has a .66 correlation with UZR. DFT has a .54 correlation with UZR. (AED measured a .61 correlation between DRA single-season infield ratings and UZR ratings provided in the DRA article.) Win Shares has a .42 correlation with UZR.
I have the feeling that if David (Pinto) eliminated fly outs from infield ratings, the correlation would shoot up to .8 or higher. For *non*-PBP systems, AED has developed a system that better measures "skill" infield fly outs, and including such plays actually improves his correlations with UZR ratings, which, again, do not include fly outs. This is probably because infield fly outs to some degree measure speed/range, and may indirectly adjust for the groundball/flyball tendency of the team's pitching staff.
In the outfield, the standard deviations were:
UZR (18), Pinto (11), DFT (10), Win Shares (5). (DRA, again, is about 12.)
The relevant correlations with UZR:
Pinto (.69); DFT (.47); Win Shares (.27)[!]. (The Win Shares correlation with Pinto is .42.) I don't have a DRA/UZR single-season ratings comparisons handy, but I don't recall the outfield numbers being appreciably worse than the infield numbers. (Actually, they were probably better, because there were more ratings "misses" in the infield (particularly at third, but also at first) than in the outfield, as shown in the DRA article.) So I would pencil in an approximately .6 to .65 correlation between DRA and UZR (or at least UZR ratings not inconsistent with Diamond Mind ratings).
My apologies for not getting out the DRA ratings for 2002 and 2003. I've been a bit overwhelmed by the academic program I'm in. I've also developed (or at least thought of) some new techniques to improve DRA ratings at third base and cope with the lack of separate LF/CF/RF data before 1970 or so.
Thanks again. As Tango would say, "Great stuff!"
Posted 1:12 p.m., February 17, 2004 (#10) - Avkash
Michael,
I've been following DRA and consult it when the data is available. Look forward to seeing your latest.
Thanks for running the numbers on the updated charts.
Posted 1:23 p.m., February 17, 2004 (#11) - studes
"Win Shares undervalues defense and overvalues regulars."
I understand the second point (because mean is higher) but not the first. Are you saying this because the sd is lower for WS? Do you mean WS does not properly credit outstanding fielders?
If so, I agree.
Posted 1:44 p.m., February 17, 2004 (#12) - Michael Humphreys
Avkash--thanks.
studes--WS "dampens" fielder ratings too much and, when you account for this, overrates regulars. The average WS rating for a regular is very high when you "scale" it for the std of WS ratings. Other ratings systems show regulars +1 or +1.5 runs/season, with std of 12-15; WS shows regulars +2 runs/season, with std of 6.
Everyone--a clarifying point. I think it's worthwhile to look at Pinto/UZR ratings in the outfield together because (i) the measurement of outfielder skill is complicated by the ball-hogging factor (so it's worth trying different techniques), and (ii) at least in the outfield Pinto and UZR are trying to measure the same thing using PBP data: the relative number of fly balls caught by an outfielder, compared to what an average outfielder would catch given the same batted ball "opportunities". Pinto and UZR measure the latter in different ways, but I think that's a good thing.
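The scaling point made to studes above can be put in numbers. A minimal sketch, using only the means and standard deviations quoted in this post; `mean_in_sd_units` is a hypothetical helper name.

```python
def mean_in_sd_units(mean_runs, sd_runs):
    """Express the average regular's rating as a fraction of one SD."""
    return mean_runs / sd_runs

# Win Shares: regulars at +2 runs/season with an SD of 6.
win_shares = mean_in_sd_units(2.0, 6.0)

# Other systems: +1 to +1.5 runs/season with an SD of 12 to 15.
others_low = mean_in_sd_units(1.0, 15.0)
others_high = mean_in_sd_units(1.5, 12.0)

print(win_shares, others_low, others_high)
```

On these figures the average regular sits about a third of an SD above zero in Win Shares, against roughly a tenth of an SD in the other systems.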
Posted 1:47 p.m., February 17, 2004 (#13) - tangotiger
Yes, the SD measures the spread of the data. And the spread in WS is half what it is for all other measures.
The "problem" with correcting this though is that you'll end up with negative Win Shares.
Say that all your "regulars" SS WS per 162 GP is between 1 and 13, with a mean of 7. This corresponds to a SD of 2 WS, or 0.67 wins, or 6.7 runs (more or less). However, what we think we know is that the spread should be double that. Since the mean for fielding is fixed, you've got to keep the mean for SS at 7. That gives you a range of -5 to 19 WS for SS. This will correspond to a SD of 4 WS, or 1.33 wins, or 13 runs (which is in-line with what we think is right).
However, according to James' thinking, negative win shares can't happen, and so to "fix" this, he's got to cut the spread in half in order to fit the data into his thinking.
***
Note: A quick way to figure out the SD of something like this is to take the difference between the typical high and low points, and divide by 6. "6" corresponds to 3 SD from the mean on both sides, which covers about 99.7% of the points if the data are roughly normal. It's a handy rule of thumb.
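The arithmetic in this post can be sketched as follows. The conversion constants are read off the figures above (2 WS = 0.67 wins = 6.7 runs, so 3 Win Shares per win and roughly 10 runs per win); the function names are illustrative.

```python
WS_PER_WIN = 3.0      # implied by "2 WS, or 0.67 wins"
RUNS_PER_WIN = 10.0   # implied by "0.67 wins, or 6.7 runs (more or less)"

def sd_from_range(low, high):
    """Rule of thumb: typical range spans about 6 SD (3 on each side)."""
    return (high - low) / 6.0

def ws_to_runs(win_shares):
    """Convert Win Shares to runs via wins."""
    return win_shares / WS_PER_WIN * RUNS_PER_WIN

current_sd = sd_from_range(1, 13)     # SS range as Win Shares has it
doubled_sd = sd_from_range(-5, 19)    # same mean of 7, double the spread

print(current_sd, ws_to_runs(current_sd))
print(doubled_sd, ws_to_runs(doubled_sd))
```

Doubling the spread around a fixed mean of 7 is what pushes the low end of the range below zero, which is the negative Win Shares problem described above.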
Posted 1:54 p.m., February 17, 2004 (#14) - studes
I know I'll get hammered for this, but I should point out that Win Shares is not just a measure of range. It also includes claim points for double plays, errors, outfield assists, etc. I'm not defending Win Shares for doing this, just pointing out that it isn't a straight measure of range, as Pinto's is. I'm not sure which UZR metric Avkash used.
I was looking through some old Baseball Abstracts, and was surprised to find that Bill James actually laid out his Win Shares fielding structure in the 1982 Abstract. He rated fielders according to four different types of buckets, which differed by position, totaling 100 points, I think. It was virtually EXACTLY what went into fielding Win Shares.
Posted 1:57 p.m., February 17, 2004 (#15) - studes
Agreed, Tango. One of the problems with fielding Win Shares is lack of negative Win Shares.
Posted 2:07 p.m., February 17, 2004 (#16) - Avkash
Studes,
I used the UZR runs per 162 games played for the position in question. The info is available in the CSV file linked on the "data" page I link to.
On a side note, directed towards MGL and/or Tango, I noticed some players have a games played higher than 162 in said CSV file. I figured this was an adjustment for innings, but I couldn't figure it out. One example is Aramis Ramirez, who is listed at 173 games played.
Posted 2:14 p.m., February 17, 2004 (#17) - tangotiger
It's based on the "games opps".
I'll present the hitting side, which will be easier to follow. If the avg hitter gets 680 PAs per 162 GP, we can say he's +40 runs per 680 PAs or +40 runs per 162 GP.
But, what if this guy was the leadoff hitter and had 750 PAs? In that case, he had the equivalent of 179 GP. So, if he was a +30 for 750 PAs, he'd be +27 per 680 PAs or per 162 GP.
The same situation applies with fielders, where someone like Jeter might have only 130 fielding games, because a lot of balls are not hit to him, or a flyball staff will have their OF with 178 GP or something.
When you see that things are put to 162 GP, it's not HIS 162 GP, but the equivalent number of PAs or BIP that goes into 162 GP.
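The hitting example above works out like this. A minimal sketch: the 680 PAs per 162 GP baseline and the leadoff hitter's numbers come from the post, while the function names are made up for illustration.

```python
AVG_PA_PER_162 = 680   # average hitter's PAs per 162 GP, from the example

def equivalent_games(pa, avg_pa=AVG_PA_PER_162):
    """Games-played equivalent of a given number of opportunities."""
    return 162 * pa / avg_pa

def runs_per_162(runs, pa, avg_pa=AVG_PA_PER_162):
    """Rescale a run total to the opportunities in an average 162 GP."""
    return runs * avg_pa / pa

# Leadoff hitter: +30 runs in 750 PAs.
print(equivalent_games(750))    # just under 179 GP
print(runs_per_162(30, 750))    # about +27 per 162 GP
```

For fielders the same functions would apply with balls in play (or whatever opportunity count the system uses) in place of PAs and the matching league-average baseline.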
Posted 5:09 p.m., February 17, 2004 (#18) - studes
Thanks, Avkash. I understand you used UZR/162 games. But there are several types, I think. For instance, there's straight range, but there are also DP impact numbers, etc. Depends on which column you pulled.
Posted 5:24 p.m., February 17, 2004 (#19) - Charles Saeger
studes: the structure went into Win Shares, but many of the individual criteria did not, or at least, not in the form he used.