Tango on Baseball Archives

© Tangotiger


Best Fielding Teams, 2003 (December 28, 2003)

Courtesy of UZR and Pinto:



team UZRRuns PintoOuts
SEA 78 37
CHA 54 11
ANA 40 15
KCA 36 5
HOU 23 25
SDN 23 -8
BAL 19 -22
COL 15 -28
TBA 13 25
MON 13 9
SLN 11 -16
ATL 9 41
TOR 8 14
CHN 8 5
FLO 7 -2
CLE 6 35
SFN 5 22
PHI 5 -7
CIN 3 2
LAN -2 -20
DET -8 -41
OAK -12 -11
MIL -15 -2
MIN -17 26
ARI -26 -19
BOS -43 -8
NYN -44 -41
PIT -44 -1
TEX -51 -10
NYA -98 -37

What does this say to the Yankee pitchers? Knock off 0.5 runs from each pitcher's ERA according to UZR, but only a third of that according to Pinto's model. That's how much damage their fielders did. (Knock off more for those pitchers who rely on their fielders, and less for those who walk and strike out a lot.)

The r between the two models is .53. 1 SD with UZR is 34 runs, and 1 SD with Pinto is 23 outs (or 18 runs). I would guess that the probability model that Pinto uses should remove the player in question; otherwise he's being compared, partly, to himself. UZR would have the same problem, but it has the advantage of using multiple years of data. So, it's not really fair to compare the two until Pinto can take advantage of multi-year sample data.
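For anyone who wants to check the figures in that paragraph against the table, here is a minimal sketch. The numbers are copied from the table above; the choice of the sample (n-1) standard deviation and the runs-per-out conversion implied by "23 outs (or 18 runs)" are assumptions, so expect the results to land close to, not exactly on, the quoted values.

```python
from math import sqrt
from statistics import mean, stdev

# (team, UZR runs, Pinto outs), copied from the table in the post.
ROWS = [
    ("SEA", 78, 37), ("CHA", 54, 11), ("ANA", 40, 15), ("KCA", 36, 5),
    ("HOU", 23, 25), ("SDN", 23, -8), ("BAL", 19, -22), ("COL", 15, -28),
    ("TBA", 13, 25), ("MON", 13, 9), ("SLN", 11, -16), ("ATL", 9, 41),
    ("TOR", 8, 14), ("CHN", 8, 5), ("FLO", 7, -2), ("CLE", 6, 35),
    ("SFN", 5, 22), ("PHI", 5, -7), ("CIN", 3, 2), ("LAN", -2, -20),
    ("DET", -8, -41), ("OAK", -12, -11), ("MIL", -15, -2), ("MIN", -17, 26),
    ("ARI", -26, -19), ("BOS", -43, -8), ("NYN", -44, -41), ("PIT", -44, -1),
    ("TEX", -51, -10), ("NYA", -98, -37),
]

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

uzr_runs = [row[1] for row in ROWS]
pinto_outs = [row[2] for row in ROWS]

# Assumption: runs per out implied by the post's "23 outs (or 18 runs)".
RUNS_PER_OUT = 18 / 23

print("r between models:", round(pearson_r(uzr_runs, pinto_outs), 2))
print("SD of UZR runs:", round(stdev(uzr_runs), 1))
print("SD of Pinto outs:", round(stdev(pinto_outs), 1))
print("SD of Pinto in runs:", round(stdev(pinto_outs) * RUNS_PER_OUT, 1))
```

Using the population formula instead of the sample formula shifts the standard deviations slightly, which is one reason a recomputation may not match the rounded figures exactly.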

--posted by TangoTiger at 03:53 PM EDT


Posted 4:06 p.m., December 28, 2003 (#1) - tangotiger
  It's nice that MGL, Pinto, I and 8 million NY fans see the Mets and Yanks as 2 horrible fielding teams.

It would be interesting to hear from BAL, COL, and MIN fans.

Posted 4:22 p.m., December 30, 2003 (#2) - Erik Allen
  Tango (or anyone else in the know)...

One thing I have always been confused about concerning UZR and other such ranking systems is the use or non-use of the speed of the batted ball. Does UZR or Pinto's model account for this?

It seems to me that this is an essential part of evaluating the role of the fielder versus the role of the pitcher, although I recognize how difficult it must be to obtain data like that. If there is some inherent BABIP ability for pitchers, it seems to me that it would almost assuredly be linked to a pitcher's unique distribution of ball velocities as they leave the bat.

Posted 4:30 p.m., December 30, 2003 (#3) - tangotiger
  UZR and Pinto both use the batted ball speed (3 different speeds).

However, I have no idea as to its reliability, nor to its impact on the ratings.

One of the problems with UZR is that it does so much under the hood, that you have to take it on faith that it's done well. No offense to MGL, who does tremendous work on UZR, but I would want to split up the UZR by the various parameters to see what in the world is happening with each one.

That is, what's the basic UZR using only the zone? Then, using the park? Then, add in the batter hand. Then, add in the gb/fb tendency of the pitcher. Then, the base/out situation. Then, the batted ball speed, etc, etc. Right now, the selling of UZR is based on faith and on the fact that the results are somewhat, but not very, surprising. That's good for a lot of people, but not everyone.

Posted 5:27 p.m., December 30, 2003 (#4) - MGL
  As I've said many times, with my UZR methodology at least, I take pains to reduce the chances of sample error creeping into the results, especially as we make adjustments for more and more variables (and the sample sizes get smaller and smaller). The way I do that is through regressions and "mitigations," to be vague about it.

Let me give you an example. Let's say that player A hit 20 home runs in park P, and we want to estimate how many home runs he would have hit in a neutral park. We want to do some sort of park adjustment, of course. Now let's say that we have some data that suggests that this is a good home run park, but we are not sure how reliable this data is, plus we have only a very small sample of it. What to do, what to do. Let's say the small sample is such that hitters hit twice as many home runs in this park as in other parks. But that's only in like 100 PA's, so we know that there is not much confidence in that 2-1 ratio. What to do, what to do. Now say we look at the park, and it is smaller than an average park, but not that much smaller. Now we are more confident that this is at least a good HR park, but probably not THAT good (HR PF of 2.00). What to do, what to do. We also have the problem, as Tango often points out, that maybe this park does not truly affect our player the way it affects an "average" player. What to do, what to do.

If we use that sample PF of 2.00 and say our player would have hit 10 HR's in a neutral park, rather than the 20 in this park, we know intuitively that we are probably "overadjusting," which is the main thing that Tango worries about with all these UZR adjustments (he should also worry about bugs and mistakes in the "program"). What to do, what to do. Here's what can be done, and here is why any adjustment, no matter how small the data sample is, is ALWAYS better than no adjustment, if you mitigate it properly, which I think I almost always do, because I am always cognizant of the problem of "overadjusting."

You start with this: our player hits 20 HR's, but we have evidence that it is a good HR park. If we do no adjustments because we are scared about overadjusting, because we really don't have much sample data as our evidence, we get a park-neutral HR number of 20 still. That's fine. No park adjustment. What about 19.9? Surely that is better (closer to the truth) than 20, since we have SOME evidence that this is a good HR park. What about 19.8? Well, how strong is our evidence that this is a good HR park, and how good do we think it is, based on that evidence? Basically, we keep going until we strike a balance between some level of adjustment to improve our estimate of his neutral HR rate, and the fact that we have limited data to support our notion that this is a good HR park.

The key is to be conservative, such that you can say with a high degree of confidence that your adjusted value is better than your unadjusted one, given what evidence you have of the true nature of the adjustments. For example, if Manny has an unadjusted UZR of -30 in LF at Fenway, it is safe to say that his park-neutralized UZR is somewhat better than that. How much better? Well, that's the problem. But as long as you are conservative, which I always am, you are OK. No one can accuse you of coming up with adjusted results that are worse than the unadjusted results. If anything (in fact, this is always the case), with my UZR's and my other stats that get adjusted for various things, I am underestimating the adjustments, such that while the values generated may not be perfect, they are ALWAYS better than the unadjusted versions.

You have to take my word for that, but if I showed you the intermediate results for UZR (unadjusted, adjusted for park, adjusted for park and speed of ball, etc.), you would see that each adjustment is very slight, and also intuitive and obvious, which means, again, that the final adjusted results HAVE TO be better than the unadjusted ones, just as if I asked you, in my 20 HR example above, what is a better estimate of player A's park-neutral home run number, 20 or 19.9, given that we have some sample of data, albeit not very reliable (small size, etc.), that suggests that park P is a better than average HR park?
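One common way to put a number on this kind of conservative adjustment is to regress the observed park factor toward 1.00 in proportion to the sample size. The sketch below is only an illustration of that general idea, not MGL's actual UZR procedure, which the comment does not spell out. The function names and the `ballast_pa` value are hypothetical, and it assumes all 20 home runs were hit in park P.

```python
def regressed_park_factor(observed_pf, sample_pa, ballast_pa, prior_pf=1.0):
    """Shrink an observed park factor toward a neutral prior.

    ballast_pa is a hypothetical tuning constant: the number of PA at which
    the observed factor and the prior get equal weight.
    """
    weight = sample_pa / (sample_pa + ballast_pa)
    return prior_pf + weight * (observed_pf - prior_pf)

def park_neutral_total(raw_total, park_factor):
    """Convert a raw total compiled in one park to a park-neutral estimate."""
    return raw_total / park_factor

# The example from the comment: 20 HR, observed HR park factor of 2.00 in ~100 PA.
raw_hr = 20
naive = park_neutral_total(raw_hr, 2.00)  # full adjustment: 10 HR

# Assumption: 5000 PA of ballast, chosen only to illustrate a heavy regression.
cautious_pf = regressed_park_factor(2.00, sample_pa=100, ballast_pa=5000)
cautious = park_neutral_total(raw_hr, cautious_pf)

print("naive park-neutral HR:", naive)
print("regressed park factor:", round(cautious_pf, 3))
print("cautious park-neutral HR:", round(cautious, 2))
```

With a small sample the regressed factor stays close to 1.00, so the estimate moves only slightly below 20, which is the "19.9 rather than 10" behavior described above. Other evidence, such as the park's dimensions, could be folded in by moving `prior_pf` away from 1.00.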

Posted 11:04 a.m., January 15, 2004 (#5) - tangotiger
  What I did was take each team's actual players, used their "true talent" that I calculated in the other thread, weighted them by the number of BIP they had, and calculated a team's "true talent" fielding level.

Anaheim and Seattle have the best fielding players, while the Yanks, by far, have the worst fielding players (in 2003). You may think the differential between actual UZR and True Talent UZR is "too much". The standard deviation of the differential is 23 runs. Our expectation for what the SD of the differential should have been is 24 runs. (sqrt(.3*.7*27*162)*.8=24)
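The expected SD in parentheses reads like a binomial calculation, and a short sketch makes the pieces explicit. The interpretation of each constant (a .300 chance that a ball in play becomes a hit, 27 balls in play per game, 162 games, and .8 runs per play) is an assumption about what the formula's terms stand for; the parameter names are made up for illustration.

```python
from math import sqrt

def expected_random_sd_runs(p_hit=0.3, bip_per_game=27, games=162, runs_per_play=0.8):
    """Binomial SD of plays made over a team-season, converted to runs.

    All parameter meanings are assumed readings of sqrt(.3*.7*27*162)*.8.
    """
    n_bip = bip_per_game * games                  # balls in play over the season
    sd_plays = sqrt(n_bip * p_hit * (1 - p_hit))  # binomial standard deviation
    return sd_plays * runs_per_play

print(round(expected_random_sd_runs(), 1))
```

Under that reading, an observed SD of 23 runs against an expected 24 says the gaps between actual and true-talent UZR are about what random variation alone would produce.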

Take, for example, the White Sox fielders. UZR says they were +54, but the true talent of their fielders, weighted by their actual BIP, was -4. By this measure alone, I expect the White Sox to give up an extra 58 runs next year (aside from personnel changes).

The Yanks fielders were extremely unlucky. I expect their fielding runs to improve by 43 runs (had they kept the same personnel).

Essentially, we expect everyone to play up to their true talent levels.

(No age adjustments made)

team actUZR TTruns diff
ANA 40 39 -1
SEA 78 39 -39
SLN 11 30 19
OAK -12 24 36
COL 15 15 0
HOU 21 15 -6
KCA 33 12 -21
MIL -7 9 16
LAN -2 8 10
ATL 9 7 -2
FLO 13 6 -7
CLE 6 5 -1
BAL 17 2 -15
MON 13 2 -11
SDN 28 1 -27
TOR 8 -3 -11
ARI -30 -3 27
CHA 54 -4 -58
CIN 7 -5 -12
BOS -43 -6 37
MIN -17 -9 8
PIT -35 -11 24
TBA 13 -11 -24
PHI 5 -12 -17
CHN 10 -13 -23
TEX -51 -18 33
NYN -34 -19 15
DET -16 -20 -4
SFN -16 -22 -6
NYA -96 -53 43
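A quick consistency check on this table, as a sketch: it confirms that each diff is TTruns minus actUZR and recomputes the spread of the diff column. The rows are copied from the table; using the sample (n-1) standard deviation is an assumption, so the result should be near, not necessarily equal to, the 23 runs quoted above.

```python
from statistics import stdev

# (team, actual UZR, true-talent runs, diff), copied from the table above.
TEAMS = [
    ("ANA", 40, 39, -1), ("SEA", 78, 39, -39), ("SLN", 11, 30, 19),
    ("OAK", -12, 24, 36), ("COL", 15, 15, 0), ("HOU", 21, 15, -6),
    ("KCA", 33, 12, -21), ("MIL", -7, 9, 16), ("LAN", -2, 8, 10),
    ("ATL", 9, 7, -2), ("FLO", 13, 6, -7), ("CLE", 6, 5, -1),
    ("BAL", 17, 2, -15), ("MON", 13, 2, -11), ("SDN", 28, 1, -27),
    ("TOR", 8, -3, -11), ("ARI", -30, -3, 27), ("CHA", 54, -4, -58),
    ("CIN", 7, -5, -12), ("BOS", -43, -6, 37), ("MIN", -17, -9, 8),
    ("PIT", -35, -11, 24), ("TBA", 13, -11, -24), ("PHI", 5, -12, -17),
    ("CHN", 10, -13, -23), ("TEX", -51, -18, 33), ("NYN", -34, -19, 15),
    ("DET", -16, -20, -4), ("SFN", -16, -22, -6), ("NYA", -96, -53, 43),
]

# Teams whose listed diff does not equal true talent minus actual.
mismatches = [team for team, act, tt, diff in TEAMS if tt - act != diff]
diffs = [row[3] for row in TEAMS]

print("rows where diff != TTruns - actUZR:", mismatches)
print("SD of diff:", round(stdev(diffs), 1))
```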

Posted 10:31 a.m., January 29, 2004 (#6) - Blixa Bargeld
  I think if you were to split these data into infield/outfield defensive ratings, you could then take a pitcher's g/f ratio to better estimate how much a pitcher is helped or hurt by the strength of his defense.

Posted 10:41 a.m., January 29, 2004 (#7) - tangotiger
  Agreed.

You can also break that up into lefty/righty pitchers, and look at the sides of the infield and the sides of the OF.

If you include the GB/FB and LH/RH, you know what you get? You get the 7 fielding positions.

So, to best estimate the impact of fielding on each pitcher, you want to know the ball distribution to the 7 major zones for each pitcher, and then apply the fielder (and park) effect on each pitcher.
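As a rough sketch of that idea, each fielder's runs can be scaled by how often a given pitcher sends the ball to that zone relative to the team average. The fielder values below are the position-specific numbers from the 2004 Yankees projection in post #8 of this thread; the zone multipliers and the pitcher's share of team balls in play are hypothetical, and the multipliers are not renormalized, so treat the output as illustrative only.

```python
# Position-specific fielding runs, from the 2004 Yankees table in post #8.
FIELDER_RUNS = {"1B": -1, "2B": -2, "3B": -18, "SS": -23, "LF": -9, "CF": 8, "RF": -4}

def fielding_effect_on_pitcher(fielder_runs, zone_multiplier, share_of_team_bip):
    """Estimate the runs a pitcher gains or loses from his fielders.

    zone_multiplier: pitcher's rate of balls to each zone divided by the team
        average rate (1.0 means an average distribution). Hypothetical input.
    share_of_team_bip: fraction of the team's balls in play this pitcher allows.
    """
    weighted = sum(
        runs * zone_multiplier.get(pos, 1.0) for pos, runs in fielder_runs.items()
    )
    return share_of_team_bip * weighted

# Hypothetical pitchers, each allowing 15% of the team's balls in play.
average_pitcher = {}  # every zone defaults to a multiplier of 1.0
groundballer = {"1B": 1.2, "2B": 1.2, "3B": 1.2, "SS": 1.2,
                "LF": 0.7, "CF": 0.7, "RF": 0.7}

neutral = fielding_effect_on_pitcher(FIELDER_RUNS, average_pitcher, 0.15)
ground = fielding_effect_on_pitcher(FIELDER_RUNS, groundballer, 0.15)

print("average distribution:", round(neutral, 2))
print("groundball pitcher:", round(ground, 2))
```

With an infield this weak, the groundball pitcher comes out worse off than the average one. A lefty/righty split would work the same way, with different multipliers for the two sides of the infield and outfield.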

Posted 11:13 a.m., January 29, 2004 (#8) - tangotiger
  Here is what the 2004 Yanks might look like, fielding-wise:

Position Neutral, Position Specific, Position : Player
+12, +8 CF : Lofton
0, -2 2B : Soriano
-9, -4 RF : Matsui
-12, -23 SS : Jeter
-12, -1 1B : Giambi
-15, -9 LF : Bernie
-16, -18 3B : Sheffield
---------------------------
-52, -49 : Total

The worst fielding 3B from 1999-2003:
Greg Norton
Aubrey Huff
Travis Fryman
Fernando Tatis

Gary Sheffield would be right in the middle of that.

Really, the Yanks' fielding is a big mess. This is what I think I'd do:
1B - Giambi
2B - Bernie
SS - Soriano
3B - Jeter
LF - Sheffield
CF - Lofton
RF - Matsui

The first important move is Soriano. He's a better fielder than Jeter, and he's quicker. Bernie has a horrible arm, and can't cover the ground. Other than 1B/DH, the only other place to hide those attributes is at 2B. I think Jeter would be more reliable than Sheff at 3B.

This of course assumes that these players had half-a-season to make the switch.

But, this is just a really big mess. No matter what combination you come up with, it would be a bad choice. What I just did was bad. It's the Bad News Yanks. Add in the poor catching of Posada, and you've got 6 of your 8 fielders as way below average.

Posted 11:43 a.m., January 29, 2004 (#9) - studes
  Tango, I just came across Post #5 and I want to thank you. That is truly awesome work.

Posted 12:02 p.m., January 29, 2004 (#10) - tangotiger
  Thanks for the kind words!