UZR inter-positional linear correlations (July 6, 2003)
Scott Fischthal presents some of his findings on UZR relationships between positions.
I think this is the interesting comment:
...which would imply that putting two strong fielders next to each other at SS and 3B could cause one of the fielders' UZRs to be suppressed a bit. Or, it could just imply that teams don't find it necessary to put great defenders at both SS AND 3B. Who knows...
--posted by TangoTiger at 10:19 AM EDT
Posted 10:23 a.m.,
July 6, 2003
(#1) -
tangotiger
I think another good test is to look at players who switch teams and see how their UZR changes, e.g., Ventura next to Jeter versus next to Reyrey. Put the 3B UZRs compiled alongside a good SS in one pile, and the same third basemen's UZRs compiled alongside a bad SS in another pile, and see whether there's any relationship.
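A minimal sketch of the pile comparison, assuming a hypothetical input of (player, 3B UZR, partner SS UZR) tuples and treating any partner SS with a UZR above zero as "good"; both the format and the cutoff are illustrative choices, not anything from the study. Only third basemen who appear in both piles are compared, so each player serves as his own control.

```python
from collections import defaultdict
from statistics import mean

def partner_effect(seasons, cutoff=0.0):
    """Average, over 3Bs seen in both piles, of
    (mean UZR next to a good SS) - (mean UZR next to a bad SS).

    seasons: iterable of (player, uzr_3b, partner_ss_uzr) tuples
    (hypothetical format). Returns None if no player is in both piles.
    """
    good = defaultdict(list)  # player -> 3B UZRs compiled next to a good SS
    bad = defaultdict(list)   # player -> 3B UZRs compiled next to a bad SS
    for player, uzr_3b, ss_uzr in seasons:
        pile = good if ss_uzr > cutoff else bad
        pile[player].append(uzr_3b)
    in_both = good.keys() & bad.keys()
    if not in_both:
        return None
    return mean(mean(good[p]) - mean(bad[p]) for p in in_both)

# Made-up numbers, for illustration only.
sample = [("A", 10, 15), ("A", 4, -10), ("B", 0, 5), ("B", 2, -5), ("C", 5, 5)]
print(partner_effect(sample))  # A: +6, B: -2, C skipped -> 2
```

A clearly negative result would be consistent with the idea that a strong SS suppresses his 3B's UZR; a result near zero would not.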
Posted 8:12 p.m.,
July 6, 2003
(#2) -
MAH
Scott,
Thanks for running the test. For what it's worth, a model I've developed, which relies only on traditional (non-PBP/Zone) data, shows that properly context-adjusted team-level performance at each position is basically uncorrelated with performance at the other positions. In the latest version of the model, no Pearson's r was greater than about 0.2, clearly below the 0.3 level commonly used as a rule of thumb for "significant" correlation. That doesn't necessarily mean that it is *impossible* for player A to affect player B's fielding performance, but in general, properly measured fielder performance is "independent".
Thanks again.
Posted 7:32 a.m.,
July 7, 2003
(#3) -
David Smyth
I don't know why a correlation of zero should be expected. That would imply that teams have no plan for constructing their defense and simply plug in whatever players they happen to acquire. But a common plan is to have good up-the-middle defense and sacrifice on the corners. It seems to me that this can 'explain' most of those correlations. Even the low correlations, such as +.06 for 2B/CF, are in the right direction (I hope I remembered that one correctly).
Posted 8:32 a.m.,
July 7, 2003
(#4) -
Rally Monkey
Here's some data for outfielders who switched teams. Not a big sample, but there isn't a lot of correlation between year 1 and year 2.
Posted 10:22 p.m.,
July 7, 2003
(#5) -
Scott Fischthal
In re #3: Interesting observation, but note the +.06 correlation between 2B and 3B as well.
Of course, a .06 correlation is as good as zero given the sample size we're talking about here, so I'd be hard pressed to read much into it.
For what it's worth, the correlation between the sum of a team's 1B and 3B UZRs and the sum of its 2B and SS UZRs is -.14: very weak, but in your predicted direction.
Posted 11:36 a.m.,
July 8, 2003
(#6) -
Mike Emeigh
The most interesting pair of observations in #4 is Cameron and Griffey. Cameron lost 30 runs of UZR moving from Cincy to Seattle, and Griffey gained 31 runs moving the other way. That suggests a possible park or pitching-staff effect that might be worth investigating.
-- MWE
Posted 12:24 p.m.,
July 8, 2003
(#7) -
tangotiger
Mike,
That's very interesting! I looked at all players who played the same position for multiple teams and compared their UZR/162 with each team (weighting each player by the lesser of his games with the two teams), to see whether there was an effect across all players.
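The matching could be sketched like this, assuming a hypothetical list of stints, one (player, position, team, UZR/162, games) tuple per player-team-position. Each player contributes his UZR/162 difference between two teams, weighted by the lesser of his games with them; summing those weights into a "matched games" figure is an assumption about how the G column was built, and `min_games` is a hypothetical parameter for an optional games cutoff.

```python
from collections import defaultdict

def pair_report(stints, min_games=0):
    """Games-weighted UZR/162 differences for every ordered pair of teams.

    stints: (player, position, team, uzr_per_162, games) tuples
    (hypothetical format). Returns {(team_a, team_b): (diff, matched_games)},
    where diff is UZR/162 with team_a minus UZR/162 with team_b.
    """
    by_player = defaultdict(list)
    for player, pos, team, uzr162, games in stints:
        by_player[(player, pos)].append((team, uzr162, games))

    weighted = defaultdict(float)
    matched = defaultdict(float)
    for rows in by_player.values():
        for team_a, uzr_a, g_a in rows:
            for team_b, uzr_b, g_b in rows:
                weight = min(g_a, g_b)  # the lesser of the two games totals
                if team_a == team_b or weight == 0 or weight < min_games:
                    continue
                weighted[(team_a, team_b)] += weight * (uzr_a - uzr_b)
                matched[(team_a, team_b)] += weight
    return {pair: (weighted[pair] / matched[pair], matched[pair]) for pair in matched}

# Made-up stints, for illustration only.
stints = [("X", "CF", "CIN", 20, 100), ("X", "CF", "SEA", -10, 60),
          ("Y", "CF", "CIN", 5, 40), ("Y", "CF", "SEA", 5, 80)]
print(pair_report(stints)[("CIN", "SEA")])  # (18.0, 100.0)
```

Both orderings of each pair are reported (CIN/SEA and SEA/CIN, with opposite signs), which matches how a pair such as CIN/TEX can show up in both a Cincy report and a Texas report.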
The biggest discrepancy was with Tor, followed by Bos, Det, Cin, and ChA. On the flip side were the teams where the "advantage" went the other way: Min, LA, Ari, Tex, and Bal.
This might suggest that the park factors employed by MGL are not good enough. Looking at Cincy specifically, there were 979 matched games.
Here's the breakdown, where diff is a player's UZR/162 with the first team minus his UZR/162 with the second:
team  other team  diff (UZR/162)  matched G
CIN   MIL                     65         37
CIN   COL                     51        143
CIN   MIN                     36        106
CIN   TEX                     32         48
CIN   SEA                     30        281
CIN   KCA                      2         61
CIN   PIT                      0        115
CIN   SDN                     -6         48
CIN   TBA                    -11        140
In six of the nine pairings, a player had a higher UZR playing for Cincy than playing for the other team.
Here's the same report for Texas:
team  other team  diff (UZR/162)  matched G
TEX   COL                     25        145
TEX   CLE                      4         53
TEX   KCA                     -2        100
TEX   OAK                    -14         47
TEX   SEA                    -15        382
TEX   CHA                    -24        217
TEX   ANA                    -25         47
TEX   NYA                    -31         42
TEX   CIN                    -32         48
TEX   SDN                    -38         48
TEX   DET                    -98        145
That's 1274 matched games. Here is the breakdown of the last line, TEX/DET, which accounts for 145 games:
Juan Gone went from +15 in Det to -7 in Tex, over 53 games in RF. Kapler went from +55 in Det to -21 in Tex in CF.
As for why this might be, it could be the park or the pitchers (or something wrong with MGL's programs). MGL has tried as best he could to isolate the park, the pitchers, etc., so that all we are left with is the fielder's performance. It could be that the "batted ball velocity" isn't doing the job: maybe Texas pitchers are so bad that it isn't being measured properly in the data. Or Texas may be such a hard place to field that we can't adjust for it well enough.
Cincy may have the opposite problem: maybe their pitchers are much better than the league at controlling balls in play, or their park is much easier to field in (and the park adjustment does not reflect that properly).
Note: I chose a cutoff point of at least 30 games played with both teams in the above analysis.
Here is the full report using all players, with no minimum games cutoff:
team  diff (UZR/162)  matched G
TOR                8       1861
DET                7       2046
ANA                7       1026
CHA                6       1333
NYA                6       2200
CIN                5       2647
PHI                4       1044
CLE                4       2012
SLN                3       1714
BOS                3       1832
TBA                2       2232
SEA                2       3751
HOU                1       1765
CHN                1       3468
MIL                1       2674
SFN                0       1685
FLO               -1       1322
NYN               -2       2912
SDN               -2       3193
MON               -2       1856
ATL               -3       2501
LAN               -3       1842
PIT               -3       1261
COL               -4       3005
OAK               -4       1886
KCA               -4       2072
BAL               -5       1808
TEX               -6       3548
ARI               -8       1271
MIN              -13        727
Essentially, this suggests that Tor fielders are overcompensated in UZR by 8 runs per 162 games, if their opposing teams are league average. I'd have to adjust for this as well: maybe the teams that the Tor players also played for were +4 or something. This is similar to strength-of-schedule analysis.
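One way to make that strength-of-schedule adjustment, offered as a swapped-in technique rather than anything from the thread, is to model each pair's diff as (bias of team A) minus (bias of team B) and solve for the team biases iteratively, in the spirit of simple rating systems. A minimal sketch, assuming hypothetical (team_a, team_b, diff, matched_games) tuples; the 0.5 damping factor and centering the biases on zero are illustrative choices.

```python
from collections import defaultdict

def schedule_adjusted_bias(pairs, iterations=500):
    """Estimate per-team UZR bias b so that diff(A, B) ~ b[A] - b[B].

    pairs: (team_a, team_b, diff, matched_games) tuples (hypothetical format),
    each pair listed once; diff is UZR/162 with team_a minus with team_b.
    """
    neighbors = defaultdict(list)
    for a, b, diff, games in pairs:
        neighbors[a].append((b, diff, games))
        neighbors[b].append((a, -diff, games))  # same observation seen from b

    bias = {team: 0.0 for team in neighbors}
    for _ in range(iterations):
        updated = {}
        for team, links in neighbors.items():
            total = sum(g for _, _, g in links)
            # Games-weighted average of (observed diff + opponent's bias).
            target = sum(g * (d + bias[other]) for other, d, g in links) / total
            updated[team] = 0.5 * bias[team] + 0.5 * target  # damped step
        center = sum(updated.values()) / len(updated)
        bias = {team: value - center for team, value in updated.items()}
    return bias

# Made-up, perfectly consistent diffs generated from biases +4, -1, -3.
demo_pairs = [("TOR", "BAL", 5.0, 100), ("TOR", "NYA", 7.0, 50), ("BAL", "NYA", 2.0, 80)]
print({t: round(v, 2) for t, v in schedule_adjusted_bias(demo_pairs).items()})
```

The centering step is needed because pair diffs only pin down the biases relative to one another; the damping keeps the plain averaging update from oscillating.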
Posted 12:26 p.m.,
July 8, 2003
(#8) -
tangotiger
The "diff" is the difference in UZR/162.
Posted 12:38 p.m.,
July 8, 2003
(#9) -
tangotiger
Ughhh... "opposing teams" really means "the non-Toronto team that the Toronto player has also played for at the same position." Replace "Toronto" with whatever team you want.
Posted 10:17 a.m.,
July 9, 2003
(#10) -
tangotiger
I re-ran the above to include the standard deviation. I assumed a 70% success rate (p), a sample size of G x 4 plays, and a value of .8 runs per play. Taking Toronto as an example, 1 SD (for plays) is SQRT(.3 x .7 / (1861 x 4)). Multiplying that figure by .8 gives the SD (for runs) per play, and multiplying by 162 x 4 converts it to a per-162-GP figure. So, for Tor, 1 SD (for runs per 162 GP) is 2.8, and Toronto's diff of 8 is 2.91 SD away from zero, the value we'd expect with no park bias.
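The SD arithmetic can be written out as a short sketch; p = .7, 4 plays per game, and .8 runs per play are the assumptions stated in the post, and the function names are hypothetical.

```python
import math

def sd_runs_per_162(games, p=0.7, plays_per_game=4, runs_per_play=0.8):
    """One SD of a team's diff, in runs per 162 games, under a binomial model."""
    plays = games * plays_per_game
    sd_rate = math.sqrt(p * (1 - p) / plays)  # SD of the per-play success rate
    sd_runs_per_play = sd_rate * runs_per_play
    return sd_runs_per_play * 162 * plays_per_game  # scale to 162 games

def diff_in_sds(diff, games, **kwargs):
    """How many SDs a team's diff lies from zero."""
    return diff / sd_runs_per_162(games, **kwargs)

print(round(sd_runs_per_162(1861), 1))  # Toronto: 2.8
print(round(diff_in_sds(8, 1861), 2))   # Toronto: 2.91
```

Because the SD shrinks with the square root of the games, a modest diff over many games (Texas, -6 over 3548) can sit further out in SD terms than a larger diff over few games.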
Now, how did all teams do? Only 10 of 30 were within 1 SD, when we would have expected about 20. 23 of 30 were within 2 SD (we'd expect about 28.6), and all but Texas (3.01) were within 3 SD. So I do think there is some park bias going on.
Here's the data:
team  diff (UZR/162)  matched G  1 SD (runs/162)  diff in SDs
TOR                8       1861              2.8         2.91
DET                7       2046              2.6         2.67
NYA                6       2200              2.5         2.37
CIN                5       2647              2.3         2.17
ANA                7       1026              3.7         1.89
CHA                6       1333              3.3         1.84
CLE                4       2012              2.6         1.51
PHI                4       1044              3.7         1.09
BOS                3       1832              2.8         1.08
SLN                3       1714              2.9         1.05
SEA                2       3751              1.9         1.03
TBA                2       2232              2.5         0.80
CHN                1       3468              2.0         0.50
MIL                1       2674              2.3         0.44
HOU                1       1765              2.8         0.35
SFN                0       1685              2.9         0.00
FLO               -1       1322              3.3        -0.31
MON               -2       1856              2.8        -0.73
PIT               -3       1261              3.3        -0.90
NYN               -2       2912              2.2        -0.91
SDN               -2       3193              2.1        -0.95
LAN               -3       1842              2.8        -1.08
ATL               -3       2501              2.4        -1.26
OAK               -4       1886              2.7        -1.46
KCA               -4       2072              2.6        -1.53
BAL               -5       1808              2.8        -1.79
COL               -4       3005              2.2        -1.85
ARI               -8       1271              3.3        -2.40
MIN              -13        727              4.4        -2.95
TEX               -6       3548              2.0        -3.01
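The counts of teams within 1, 2, and 3 SD can be rechecked from the last column of the table above (negatives written with a minus sign) against the share of a normal distribution expected within k SD, which is erf(k / sqrt(2)). A minimal sketch; the helper names are hypothetical.

```python
from math import erf, sqrt

# Last column of the table above for all 30 teams (negatives as minus signs).
Z = [2.91, 2.67, 2.37, 2.17, 1.89, 1.84, 1.51, 1.09, 1.08, 1.05,
     1.03, 0.80, 0.50, 0.44, 0.35, 0.00, -0.31, -0.73, -0.90, -0.91,
     -0.95, -1.08, -1.26, -1.46, -1.53, -1.79, -1.85, -2.40, -2.95, -3.01]

def count_within(zs, k):
    """Number of teams whose diff lies within k SDs of zero."""
    return sum(1 for z in zs if abs(z) <= k)

def expected_within(n, k):
    """Expected count within k SDs if the diffs were pure normal noise."""
    return n * erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, count_within(Z, k), round(expected_within(len(Z), k), 1))
```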