The following is a letter written by Scott Fischthal. I am posting it virtually unedited.

UZR inter-positional linear correlations

About eight years ago, I performed some very basic analysis on Defensive Averages (DA), a ZR-like stat that Sherri Nichols used to post, based on work done by Pete DeCoursey and data from The Baseball Workshop (the now-defunct Project Scoresheet follow-on). At the time, there was a lot of discussion about whether DA suffered from issues similar to those of Range Factor (RF), particularly the effects adjacent fielders might have on one another (but also pitching staff effects). DA sort of anticipated DIPS, in that it assigned ALL batted balls to fielders; no balls fell "out-of-zone" for all fielders, and the zones tended to be relatively large.

In any case, I briefly explored some of these issues by running some simple linear correlation studies; details can be found in the r.s.b archives. To give you an idea of what I found, the first post was titled: "OF DAs: Independence assumption unwarranted".

A sampling of the posts (in some cases I mention using Spearman rho instead of Pearson r, but don't worry about that too much; I ran both and the results were roughly equivalent):

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=3r0e8u%242ldq%40news-s01.ny.us.ibm.net

http://groups.google.com/groups?q=fischthal+correlation&start=30&hl=en&lr=&ie=UTF-8&oe=UTF-8&scoring=d&selm=4ckhqt%24dft%40news1.radix.net&rnum=34

http://groups.google.com/groups?q=fischthal+correlation&start=40&hl=en&lr=&ie=UTF-8&oe=UTF-8&scoring=d&selm=3rhreg%24sh8%40cedar.cic.net&rnum=47

In general, a Google Groups search for "Fischthal" and "correlation" will get you a pretty good set of these posts. I had a web page once with all the data on it; I need to find that html file and post it somewhere. Bottom line is that there were some pretty fair positive correlations between adjacent position DAs. Some of this had to do with overlapping zones, but (for example) the correlation between 2B and SS was around r=0.5, and their zones did not overlap. I saw little correlation between DAs and pitching staff attributes (GB/FB, K/9 rates, K/W, BB/9, LHP/RHP, etc.). There was, however, a slight correlation (r=0.32) between infield DAs and fly-ball staffs.

Anyway, this afternoon I ran a couple of real quick positional correlations on UZR based on the CSV file you had posted in the baseballprimer.com archives. I summed the data so that I had the annual UZRs for each team at each position, giving me 120 data points per position for the 4 years of data. My results:

PEARSON R   1B       2B       3B       SS       LF       CF       RF       IF
2B       -0.152
3B        0.178    0.062
SS        0.102    0.273   -0.299
LF       -0.061    0.037    0.078    0.014
CF        0.067    0.063   -0.121    0.118   -0.086
RF       -0.088   -0.080   -0.033    0.075    0.160   -0.120
IF        0.483    0.565    0.515    0.507    0.044    0.047   -0.052
OF       -0.027    0.029   -0.059    0.129    0.577    0.621    0.475    0.035

SPEARMAN RHO 1B      2B       3B       SS       LF       CF       RF       IF
2B       -0.144
3B        0.188    0.062
SS        0.108    0.229   -0.295
LF       -0.031   -0.016    0.096   -0.015
CF        0.031    0.024   -0.100    0.138   -0.112
RF       -0.102   -0.113   -0.023    0.014    0.153   -0.120
IF        0.491    0.546    0.490    0.490    0.020    0.053   -0.093
OF       -0.038   -0.030   -0.049    0.109    0.571    0.554    0.485    0.018
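
For anyone who wants to repeat this kind of calculation, here is a minimal sketch in Python of the procedure described above: sum UZR by team, year, and position, then correlate one position against another across team-seasons. It is not the code used to produce the tables. The row layout `(year, team, position, uzr)` and the function names are assumptions made for illustration, since the actual columns of the CSV file are not described here.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

def team_totals(rows):
    """Sum UZR per (year, team) and position.

    rows: iterable of (year, team, position, uzr) tuples (hypothetical layout).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for year, team, pos, uzr in rows:
        totals[(year, team)][pos] += uzr
    return totals

def position_correlation(totals, pos_a, pos_b, corr=pearson):
    """Correlate two positions across all team-seasons that have both."""
    keys = sorted(k for k, v in totals.items() if pos_a in v and pos_b in v)
    xs = [totals[k][pos_a] for k in keys]
    ys = [totals[k][pos_b] for k in keys]
    return corr(xs, ys)
```

With the real data loaded into `rows`, something like `position_correlation(team_totals(rows), "SS", "3B")` would give one Pearson cell of the table, and passing `corr=spearman` would give the matching Spearman cell.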

I think this shows a fairly impressive level of linear independence of UZRs among different positions on the field, especially for outfielders. The only ones that would make me a bit nervous are SS/3B and SS/2B. SS/3B is particularly troublesome in that the correlation is negative, which would imply that putting two strong fielders next to each other at SS and 3B could cause one of the fielders' UZRs to be suppressed a bit. Or, it could just imply that teams don't find it necessary to put great defenders at both SS AND 3B. Who knows...

I'd guess any correlation whose magnitude falls significantly below 0.3 is pretty much irrelevant; after all, we're seeing a -0.121 correlation between 3B and CF, and I would be very surprised if you could find a variable that could drive a negative correlation between those two positions.
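
One rough way to put a number on that 0.3 rule of thumb is a significance check using the Fisher z-transformation, sketched below in Python. This was not part of the analysis above. It also treats the 120 team-seasons as independent observations, which is an assumption that probably does not hold exactly (the same teams and many of the same players recur across the four years), so the results should be read as approximate. The function names are made up for illustration.

```python
import math
from statistics import NormalDist

def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: true correlation is zero.

    Uses the Fisher z-transformation: atanh(r) is roughly normal with
    standard error 1/sqrt(n - 3) when the n observations are independent.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def critical_r(n, alpha=0.05):
    """Smallest |r| that would be significant at level alpha (two-sided)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))
```

Under these assumptions, `critical_r(120)` comes out a little under 0.2, so the -0.121 between 3B and CF is within the range of chance, while the SS/3B and SS/2B values are not.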

Since I haven't really been doing any statistical analysis on the job for the past few years, I don't have much in the way of tools any more to start looking at nonlinear correlations, but I'd suggest you also take a run at those correlations to be sure there isn't anything more disturbing here.