Tango on Baseball Archives

© Tangotiger

David Pinto and fielding (November 10, 2003)

David is rapidly becoming one of my favorites.


09/19 - A Probabilistic Model of Range (using direction, batted ball type, velocity)
09/20 - Followup
09/22 - Rankings
09/26 - Position Rankings
11/02 - Recap
11/09 - Including park factors, batter handedness and pitcher handedness
11/09 - New Rankings
11/12 - Team/pos rankings
11/13 - Pitcher rankings

David Pinto does some work on fielding. Essentially, his process is similar to MGL's UZR, in that they consider virtually the same factors.

The difference is that David uses logistic regression (if I'm using that term properly), while MGL uses "adjustment factors".

Anyway, with the new adjustments that David added, 1 SD over 4500 BIP for the stadium effect is 38 outs. That's pretty substantial, and about what we could expect. For the pitcher's handedness, the effect is 1 SD = 2 outs. That's pretty tiny over 4500 BIP, and also supported by MGL's findings last year. It's 1 SD = 4 outs for the batter's handedness, and 1 SD = 5 outs for both handedness variables combined. I'm a little surprised that the effect for the batter's handedness was not larger. This probably means that, in 2003 anyway, the mix of left- and right-handed hitters each team faced must have been pretty even.

One interesting thing David does is work in terms of direction, not location. That is, while MGL would have a factor for the ball going to 8M, David would simply look at the entire slice up the middle, including the infield.

I wonder what would happen if David would also include the perpendicular axis to this slice (i.e., the grid location, rather than the slice location)? I've gotta figure that it would be an improvement.

The thing with the "stadium" effect is, I think, David is not only capturing the stadium effect, but also any scoring bias. I'm not sure how MGL's process works in this regard (probably the same, though not sure).

All in all, it will be fascinating to compare David and MGL's results at some point. (If David is out there, and if you have the data for any year from 1999-2002, we can do this right now!)

Finally, David, you are now in the perfect position to shed light on DIPS. That is, take all your probabilities for all the combination of factors, and apply it to PITCHERS. If what I said is not self-evident, drop a note here, and we can discuss this further.

--posted by TangoTiger at 02:05 PM EDT


Posted 2:16 p.m., November 10, 2003 (#1) - UZR and PZR blueprint (homepage)
  For those who want more information on how to separate pitching from fielding on BIP, I provided a non-technical blueprint for doing so.

I think that someone out of MGL, Tippett, Pinto and maybe others can shed some big light on this...

====== Cut =====
PZR is the exact same thing as UZR, but from the pitcher's perspective.

To recap the ultimate UZR as quickly as possible: given
- the location of the batted ball
- the trajectory of the batted ball
- the speed of the ball
- the game situation (inning/score/base/out/count)
- the speed of the runner(s) and batter
- the batter's handedness
- the gb/fb tendency of the pitcher/batter
- the type of pitch thrown (fastball, curve)
- the surface of the park
- the dimensions of the park
- the climate that day
- teammates' positioning

how often do we expect a league average fielder to convert that batted ball into an out (for each possible combination of the above)? How often did our playerX actually get an out? The difference is his performance level above league average for that position. Now, MGL captures most, but not all of that. (And maybe there's more too)

Now, you do something very similar, whereby the pitcher gets the same variables being controlled for that are outside his control. So, given
- the game situation (inning/score/base/out)
- the speed of the runner(s) and batter
- the batter's handedness
- the gb/fb tendency of the batter
- the surface of the park
- the dimensions of the park
- the climate that day

how OFTEN did pitcherY allow a batted ball into each of the zones? How OFTEN does a league average pitcher allow a batted ball into each of the zones?

What are the league out-conversion rates in each zone, for each combination of the above? Call these the "array of subzoneOutRates"

Multiply this array against the frequency of pitcherY's array. Do the same for the frequency of the league pitcher array.

The difference is the pitcher's performance level in balls in play.

Following this process will validate (or invalidate) DIPS.
======Paste ====
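
A minimal sketch of the multiplication the blueprint describes, in Python. The zone names, out rates, and frequencies below are made-up illustrative numbers, not real data, and the function name `expected_out_rate` is hypothetical. It weights the league out-conversion rate in each zone by how often the ball goes there, once for pitcherY and once for the league-average pitcher, and takes the difference.

```python
# Hypothetical league out-conversion rates by zone (illustrative numbers only).
subzone_out_rates = {"zone_A": 0.90, "zone_B": 0.70, "zone_C": 0.40}

# Share of balls in play landing in each zone (each set sums to 1).
league_freq = {"zone_A": 0.30, "zone_B": 0.50, "zone_C": 0.20}
pitcher_y_freq = {"zone_A": 0.40, "zone_B": 0.45, "zone_C": 0.15}


def expected_out_rate(freq, out_rates):
    """Weight each zone's league out rate by how often the ball goes there."""
    return sum(freq[zone] * out_rates[zone] for zone in freq)


league_rate = expected_out_rate(league_freq, subzone_out_rates)
pitcher_rate = expected_out_rate(pitcher_y_freq, subzone_out_rates)

# Positive means pitcherY puts more balls into zones that are easier to field.
pitcher_bip_skill = pitcher_rate - league_rate
print(round(league_rate, 3), round(pitcher_rate, 3), round(pitcher_bip_skill, 3))
```

In a fuller version each "zone" would be one combination of grid location and the context variables listed above, so the dictionaries would be keyed by tuples rather than simple names.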

Posted 3:14 p.m., November 10, 2003 (#2) - studes (homepage)
  Did you notice the non-impact of Coors? David is using one-year park factors here, and I think that's a mistake, myself.

The other thing that struck me, looking at his data, is how little impact defense actually has, at least when collated on the team level. The difference between the best and worst fielding teams in the leagues last year (Braves and Mets) was 80 outs, spread over an entire season. That's 0.5 outs/game. 1/6 of an inning/game. etc. etc.
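
The per-game arithmetic here can be checked with a few lines of Python. This is only a sketch; the 162-game schedule is the one assumption not stated in the comment.

```python
GAMES_PER_SEASON = 162  # assumption: standard full MLB schedule
OUTS_PER_INNING = 3

out_gap = 80  # gap between best and worst fielding teams, from the comment

outs_per_game = out_gap / GAMES_PER_SEASON
innings_per_game = outs_per_game / OUTS_PER_INNING
print(round(outs_per_game, 2), round(innings_per_game, 2))
```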

Can this be? Is the relative difference between defenses that small? Is everything else park, handedness, pitcher and (mostly, it seems) luck? How could Darrin Erstad possibly save 55 runs in a season if this is true?

Let me know what I'm missing.

Posted 3:44 p.m., November 10, 2003 (#3) - tangotiger
  If I remember right, on a team level, fielding is about +/- 40 runs per team, with some exceptions (like the 2002 Angels).

Great observation, studes, on the 1-year thing. You are absolutely right. You've got to have multi-year data here, or things get skewed.

What David is doing with his measure is, essentially, best-fitting. It implicitly assumes that the data itself is not a sample, but the population.

You can think of the "strength of schedule" idea, as it's the same principle. You may have the 98 Yanks input into a strength of schedule to come out with a .681 win% against a balanced league, but that itself is still subject to regression towards the mean.

This would actually be a huge stumbling block, depending on the size of the sample, in David's methodology.

MGL, on the other hand, does (I think) regress his adjustment factors to account for this.

Posted 7:41 p.m., November 10, 2003 (#4) - J Cross
  Are the 2003 UZR's and SLWT's posted anywhere? or when are they coming out?

Posted 7:49 p.m., November 10, 2003 (#5) - MGL
  I haven't been following this thread too much, but yes, I regress most of the adjustments (I can't say ALL of them - I'd have to check) before I apply them - definitely the park adjustments. I use multi-year data for the adjustments and apply a regressed version to data in individual years (within that multi-year period).

For example, if the "ground balls through the infield" sample park factor at Dodger Stadium is 1.06 using data from 93-02 (hopefully the infield has not changed much over the years - of course if an infield has changed, like in Philly and Tampa, I use different years and different PF's), I would regress that 1.06 to maybe 1.03 (grass infields get regressed towards .99 and turf infields towards 1.02, I think). That is the "ground ball" park factor (the 1.03) that I would use to adjust all ground balls at Dodger Stadium in any year (between 93 and 02). Actually, I don't think I regress any of the other adjustment factors (GB/FB pitchers, handedness, speed of batted ball). I think I use a 4-year sample adjustment factor, with no regression, but I'm not sure.
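
A rough sketch of what regressing a park factor toward a target looks like, in Python. The function name `regress_toward` and the weight of 0.57 are hypothetical; the weight was picked only so that the result lands near the 1.03 in the example, and the comment does not say how the actual amount of regression is chosen.

```python
def regress_toward(observed, target, weight):
    """Shrink an observed factor toward a target.

    weight is the share of the sample value that is kept:
    1.0 means no regression, 0.0 means use the target outright.
    """
    return target + weight * (observed - target)


GRASS_TARGET = 0.99  # per the comment ("I think"), grass infields regress toward .99
observed_pf = 1.06   # sample ground-ball park factor from the example

weight = 0.57        # hypothetical, chosen to reproduce roughly 1.03
regressed_pf = regress_toward(observed_pf, GRASS_TARGET, weight)
print(round(regressed_pf, 2))
```

With more years of data behind the sample factor, the weight would presumably move closer to 1.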

Interesting (and very good!) work by David. When I get some time, I'll check it out in more detail...

Posted 10:20 a.m., November 11, 2003 (#6) - Tangotiger
  In comparing the MGL and Pinto model:

- The real strength in MGL's model is that we can "see" what all the underlying "impacts" are for every variable, as well as to easily be able to regress those impacts

- The real drawback in MGL's model is how he gets those adjustment factors. Essentially, we're treating all these variables as independent of each other, and then just multiplying them all to get the impact for a given context. We're not even sure we can do that, and, to even get those factors to begin with, you have to try to strip them away from those polluted contexts

- The real strength in the Pinto model is that we don't treat all these variables as being independent

- The real drawback is that the data being used would probably have a rather large standard error, and that we can't really see what the impact is. We can see it if we isolate one variable, but if you have a combination of variables, we really don't know in what direction it's pulling the data, or to what degree. And of course, no regression towards the mean (though that may be able to be built into the system).

This really just becomes a statistics problem.
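
To make the contrast concrete, here is a small Python sketch of the two styles of estimate. Everything in it is an assumption for illustration: the base rate, the factors, the coefficients, and in particular the interaction term, since the thread does not say how David's model is actually specified.

```python
import math


def multiplicative_estimate(base_rate, factors):
    """Adjustment-factor style: multiply a base out rate by one factor per variable."""
    rate = base_rate
    for factor in factors:
        rate *= factor
    return rate


def logistic_estimate(intercept, coefs, x):
    """Logistic style: add effects on the log-odds scale, then map to a probability."""
    z = intercept + sum(c * xi for c, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical numbers: base out rate .70, park factor 1.03, handedness factor 0.98.
mult = multiplicative_estimate(0.70, [1.03, 0.98])

# Hypothetical coefficients for park, handedness, and a park*handedness interaction.
# The interaction term is one way such a model can avoid assuming independence.
logit = logistic_estimate(0.85, [0.10, -0.05, 0.20], [1, 1, 1])
print(round(mult, 3), round(logit, 3))
```

The multiplicative version shows each variable's impact directly, which is the strength noted above; the logistic version hides it inside the log-odds sum.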

Posted 10:42 p.m., November 12, 2003 (#7) - Tangotiger (homepage)
  David posted the team-by-team breakdown (see above link).

Posted 10:52 a.m., November 13, 2003 (#8) - tangotiger
  Doing an "actual outs" minus "expected outs", here are the leaders:
Position Outs Diff Team
4 34 Orioles
3 30 Devil Rays
6 30 Devil Rays
4 29 Athletics
8 24 Braves
5 23 Astros
4 23 Pirates
6 23 Royals
6 22 Expos
4 20 Blue Jays

and the trailers

4 -21 Twins
3 -22 Athletics
5 -22 Mets
1 -22 Orioles
4 -22 Rockies
6 -23 Yankees
4 -24 Mets
1 -25 Devil Rays
9 -25 Orioles
9 -27 Dodgers

Yes, there's Jeter over there...

How can you possibly be -22 at 1B and +30 at 1B? Kinda surprising.

If I take the absolute value of all the differences, here's what the average spread is by position

Pos Avg err
1 8
2 4
3 8
4 12
5 8
6 10
7 6
8 9
9 8

This means that the spread in talent is found most at 2B, and least at C. The C is not surprising, since they'll be involved so little on BIP.

Posted 11:33 a.m., November 13, 2003 (#9) - tangotiger
  Anyone else notice how well the Atl CF are doing (i.e., Andruw Jones)?

The one big difference between David and MGL is that MGL looks at the grid location, while David only looks at the pizza slice location. So, if Jones plays shallow or deep, this will have an effect on things.

Perhaps the Official Primate Statisticians can jump in here, insofar as I'd like to talk about logistic regression and independence.

Let's focus on Andruw Jones. When MGL does his "out rates by zone", where the zone is (more or less):
grid,stadium,type of fly ball, pitcher tendency on flyball, batter handedness
MGL comes up with league-wide adjustment rates, so that one player doesn't have much impact overall. Though, I'm not convinced about the stadium effect.

When David does his analysis, doesn't the identity of the player now become a problem, since the out rates by zone, as David does them, are now polluted by that player to a much larger extent?

I'm still a little confused on making sure that Andruw Jones doesn't impact what the out conversion rate is on a flyball hit in Atlanta.

(Kinda like having a LH HR factor for the 1920's Yanks... and having Babe Ruth make up the majority of that sample.)

Thanks in advance...

Posted 3:19 p.m., November 13, 2003 (#10) - Michael Humphreys
  Tango,

Not sure this is exactly on point, but I've been in touch with David regarding his system, and I think that one of the potentially valuable things about it is that it may help us better quantify "ball-hogging".

His model tracks the league average rate of out conversion for every batted ball with the same parameters: direction, trajectory (grounder, line drive, fly ball, maybe pop-up), speed, and pitcher-and-batter handedness. The out conversion probability is calculated both at the (league average) team and (league average) position level. The probabilities per position sum to the probability per team. Andruw's data gets included in the league data.

At the end of the year, you take the sum of the centerfielder probabilities of out conversion for every BIP on a team and compare that to number of Andruw's gross putouts. I think Andruw was +24 or something like that.

Here's the neat thing. You could calculate the extent to which Andruw recorded a disproportionate amount of his putouts on BIP that had high out-conversion at the *team* level, because the data exists for each and every BIP he caught. For example, was he getting a lot of putouts on short flies that had a .9 probability of being caught by *somebody* on the team, with the normal distribution of probabilities by position being something like .4 SS, .4 2B, .1 CF?
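
A small Python sketch of the bookkeeping described here, assuming per-BIP probabilities by position are available. The four balls in play are invented, and the function names are hypothetical; this is not David's code.

```python
# Each ball in play: league-average out probability by position, plus who
# actually recorded the out (None if it fell for a hit). All numbers are made up.
balls_in_play = [
    {"probs": {"SS": 0.4, "2B": 0.4, "CF": 0.1}, "out_by": "CF"},
    {"probs": {"CF": 0.6, "RF": 0.2}, "out_by": "CF"},
    {"probs": {"CF": 0.3, "LF": 0.1}, "out_by": None},
    {"probs": {"CF": 0.5}, "out_by": "CF"},
]


def team_probability(bip):
    """Position probabilities sum to the team-level out probability."""
    return sum(bip["probs"].values())


def outs_above_expected(bips, pos):
    """Actual outs recorded by a position minus the sum of its out probabilities."""
    expected = sum(b["probs"].get(pos, 0.0) for b in bips)
    actual = sum(1 for b in bips if b["out_by"] == pos)
    return actual - expected


def mean_team_prob_on_outs(bips, pos):
    """Average team-level out probability on balls this position caught.

    A high value hints at "cheap" chances that somebody would have caught anyway.
    Assumes the position recorded at least one out.
    """
    caught = [team_probability(b) for b in bips if b["out_by"] == pos]
    return sum(caught) / len(caught)


cf_diff = outs_above_expected(balls_in_play, "CF")
cf_cheapness = mean_team_prob_on_outs(balls_in_play, "CF")
print(round(cf_diff, 2), round(cf_cheapness, 2))
```

The first ball mirrors the short-fly example above: the center fielder gets a full putout on a ball where his own share of the .9 team probability was only .1.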

We might want to look at the Atlanta ratings at 2B and SS. If they're basically OK, the fact that Atlanta as a team had the best defense under David's system would provide good evidence that Andruw really is valuable, and not just taking cheap chances.

The possibility that individual players could increase their totals with discretionary infield and short outfield fly balls may explain why Soriano shows up as basically OK.

This approach of looking at *team* level probability of out conversion shades into another issue that highlights the differences between David's model and UZR. UZR is measuring fielder impact in terms of estimated runs saved; David's system measures outs recorded in excess of probable outs recorded. UZR is an Expected Value measurement; David's is a Probability measurement. When a player records an out on a ball with a very high probability of out-conversion, he does very little in changing the *Expected Value* of runs allowed by his team, as that value is measured the moment the ball leaves the bat. Adapting David's system to the "expectation" concept will permit calculations of runs-saved per fielder.

In the meantime, David's system is an excellent new method that can potentially provide excellent fielder ratings and provide further insight into DIPS.

Posted 3:36 p.m., November 13, 2003 (#11) - tangotiger
  I believe UZR would have the same thing. For example, if Andruw catches a ball in zone Y where 95% of all balls are caught, this is what happens:

run value of ball in zone Y = .05 x (-.60) + .95 x (.30) = .255

run value of ball CAUGHT in zone Y = .30
run value of ball NOT caught in zone Y = -.60

Therefore, the change in run value of a ball caught in zone Y
= .300 - .255 = +.045 runs

The run value of a ball not caught in zone Y = -.855 runs

95% of .045 + 5% of -.855 = 0
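
The same arithmetic as a short Python check, using only the numbers in the example above. The variable names are mine.

```python
P_CATCH = 0.95         # share of balls caught in zone Y
RUN_VALUE_OUT = 0.30   # run value of a ball caught in zone Y
RUN_VALUE_HIT = -0.60  # run value of a ball not caught in zone Y

# Run value of a ball in zone Y before we know whether it is caught.
baseline = P_CATCH * RUN_VALUE_OUT + (1 - P_CATCH) * RUN_VALUE_HIT

credit_for_catch = RUN_VALUE_OUT - baseline  # small credit for the easy catch
debit_for_miss = RUN_VALUE_HIT - baseline    # large debit for the rare miss

# A league-average fielder nets out to zero by construction.
league_average_net = P_CATCH * credit_for_catch + (1 - P_CATCH) * debit_for_miss
print(round(baseline, 3), round(credit_for_catch, 3), round(debit_for_miss, 3))
```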

Posted 10:55 p.m., November 13, 2003 (#12) - Michael Humphreys
  Tango,

Thanks for the example. So I guess UZR could track if Andruw was recording a lot of cheap outs. But maybe we don't care, because he won't get that much "run" credit on those kinds of plays anyway. In other words, if Andruw takes an above-average number of "easy" plays in "shared" zones, but fails to cover an average number of "difficult" plays in "centerfield-only" zones, he effectively gets docked under UZR, which would be correct.

Posted 12:43 p.m., November 14, 2003 (#13) - Danny
  Pinto's system rates the A's 3B as below average. Chavez played all but 100 innings of 3B for the A's. UZR rates Chavez as 37 runs above average for 2000-2002.

What is the reason for the discrepancy?

Is it a matter of park adjustments? Did Chavez decline?

Posted 2:03 p.m., November 14, 2003 (#14) - tangotiger
  I sent the following to David.

**************

I made a comment here on the perfect UZR which I will cut / paste here
======= cut ==========
- the location of the batted ball
- the trajectory of the batted ball
- the speed of the ball
- the game situation (inning/score/base/out/count)
- the speed of the runner(s) and batter
- the batter's handedness
- the gb/fb tendency of the pitcher/batter
- the type of pitch thrown (fastball, curve)
- the surface of the park
- the dimensions of the park
- the climate that day
- teammates' positioning
===== end cut ========

So, in addition to the base/out that you mentioned,
you might as well throw in inning/score, right?

I don't remember seeing the surface of the park in
your variables, so you might want to consider that.
The gb/fb tendency of the pitcher/batter also affects
the rates, as would the speed of the baserunners.
These are a little more problematic to figure out,
though. The climate (temperature, wind, etc) might be
worth considering as well. I see that STATS now has
the speed and type of pitches thrown.

Anyway, consider using any/all of these.

You have to be careful about now applying your
probabilities to the pitchers, because you do not want
the pitcher skill as a variable. For example, the
handedness of the pitcher should not, I don't think,
be a variable in here, since the handedness is a trait
of the pitcher. We are not trying to remove the
pitcher handedness bias, since it's something inherent
to the pitcher. Same thing with the slice location.
This is under the pitcher's control, so you don't want
to treat that as a bias to account for.

In that above link, I list the variables from the
pitcher's perspective:

===== cut =====
- the game situation (inning/score/base/out)
- the speed of the runner(s) and batter
- the batter's handedness
- the gb/fb tendency of the batter
- the surface of the park
- the dimensions of the park
- the climate that day
==== end cut ==

Great job overall!

Posted 2:15 p.m., November 14, 2003 (#15) - David Pinto(e-mail) (homepage)
  I wanted to respond to Danny's comment about the A's at third base. The A's are turning out to be a very interesting case which will need a great deal of study. With the parameters I've used so far, the A's pitchers look like they set up easy chances for their fielders. So any fielder on the A's is going to get less credit in this system. Also, remember, this system is not measuring runs at the moment, but outs. It could very well be that Chavez is trading stopping doubles for letting singles go by, which might hurt his DER but help his overall runs saved. Chavez does very well in 2003 in defensive win shares. He's going to be a case to examine very closely, as will the Scott Hatteberg/Travis Lee extremes at first base.

Posted 2:17 p.m., November 14, 2003 (#16) - tangotiger
  Octavio Dotel had 205 BIP, meaning that 1 SD = .032. Dotel's actual DER minus expected DER was .063. This means that Dotel's performance was a shade under 2 SD from the mean. That was actually the largest deviation on the one side. Jeff Weaver was 2.7 SD the other way.

In the 230 pitchers in the group, we had only 1 pitcher that was more than 2 SD. We also had 196 within 1 SD, or 85%.
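
The 1 SD figure is consistent with a binomial standard deviation. A sketch in Python, assuming an expected DER of about .70 (my assumption; the comment gives only the .032 result):

```python
import math


def binomial_sd(p, n):
    """Standard deviation of a rate over n independent trials with success rate p."""
    return math.sqrt(p * (1 - p) / n)


ASSUMED_DER = 0.70  # assumption: roughly league-average out rate on BIP
sd = binomial_sd(ASSUMED_DER, 205)   # Dotel's 205 BIP
z = 0.063 / sd                       # actual DER minus expected DER, in SDs

share_within_1sd = 196 / 230
print(round(sd, 3), round(z, 2), round(share_within_1sd, 2))
```

For comparison, a normal distribution puts about 68% of cases within 1 SD, so 85% is a tighter bunching than chance alone would produce.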

However, because of, I think, the "removal of skill" that David is doing, this forces everyone towards the mean.

Consider even something like keeping the slice location in. Say you have a pitcher who gives up plenty of deep flyballs. However, under David's system, all pitcher BIP in that slice should give you the same "expected outs". A pitcher with a great fielder or a pitcher who gives up a lot of easy flyballs will record more outs than expected, and we don't know the real reason.

The OF doesn't have this problem much, because he's got many pitchers getting balls over there. The same can't be said for the flip side.

Posted 1:07 p.m., November 15, 2003 (#17) - studes (homepage)
  I don't know if this is the right approach, but I ran a regression of David's expected DER and actual DER to try and get a grip on the relative responsibility between pitching and fielding for DER. I'm including "luck" in the pitching bucket, cause there is really no way to separate it out.

I ran "Expected DER" and "Diff" against Actual DER. Of course, the R2 was almost perfect, but I was interested in the t-stat. Expected DER had a t stat of 99, and Diff had a t stat of 55.

Based on this limited sample size, I think that would indicate that pitching and luck account for 2/3 of DER variance between teams, and fielding accounts for 1/3. This is slightly higher than my previous guess of 50%.

Is my approach valid?

Posted 1:21 p.m., November 15, 2003 (#18) - Mike Emeigh(e-mail)
  With the parameters I've used so far, the A's pitchers look like they set up easy chances for their fielders.

I wonder how much of this is due to the immense amount of foul territory at the NAC. When I was parsing the PBP data, it seemed to me that there were a *lot* of foulouts in Oakland this year.

-- MWE

Posted 4:17 p.m., November 15, 2003 (#19) - Michael Humphreys
  Studes,

I could very well be wrong about this (I probably am), but I thought the t-stats tell you how confident you can be about the accuracy (standard error) of the coefficients for the variables, not necessarily the relative impact of each variable on the outcome being modeled.

I'm also not sure the following alternative approach is any better, but it might be worth regressing marginal "pitcher" DER (or marginal outs) onto runs allowed, and then, separately, marginal "fielder" DER (or marginal outs) onto runs allowed, and compare the relative r-squareds. It might also be worthwhile simply comparing the standard deviation in "pitcher" DER outs v. "fielder" DER outs.

I think that we won't be able to draw firm conclusions on the relative effect of pitching and fielding on BIP from David's system until we have at least two years of data with which to perform a "persistency" test at the individual pitcher and fielder level.

If I recall correctly from various Primer posts, the "r" between successive individual UZR fielding ratings (which, in the case of infielders, excludes infield pop-ups) is about .5, which corresponds to an r-squared of about 25%. The "r" for individual pitcher BABIP year-to-year is about .2, corresponding to an r-squared of 4%. Dick Cramer's article about the impact of pitchers on BABIP indicates that, for purposes of comparing the relative impact of pitchers and fielders, the relevant comparison is between the r-squareds. By this measure, using the above r-squareds, fielders have six times the impact of pitchers. Of course, there's a *huge* amount of noise for BIP outcomes, year-to-year. DRA allocates the "noise" to the fielders, who account for most of the "signal".
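
The "six times" figure follows from squaring the two correlations. A quick Python check using the values recalled above:

```python
r_fielders = 0.5  # year-to-year r for individual UZR ratings, as recalled above
r_pitchers = 0.2  # year-to-year r for individual pitcher BABIP, as recalled above

r2_fielders = r_fielders ** 2
r2_pitchers = r_pitchers ** 2
impact_ratio = r2_fielders / r2_pitchers
print(round(r2_fielders, 2), round(r2_pitchers, 2), round(impact_ratio, 2))
```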

Mike,

Yes, Oakland's foul territory is vast--I think ballparks.com (which baseball-reference.com posts for each team) says that it is the largest or one of the largest in the majors.

In developing DRA, I did a lot of analysis of infield fly outs, and tried to estimate the impact of foul territory. What I kept coming up with are estimates that Oakland would probably "give" an average fly ball staff an extra 12-15 infield fly outs a year compared with a ballpark with "average" foul territory. That's still an *extremely* rough guess based on a *primitive* "coding" of ballparks.com's verbal descriptions of foul territory. I would not be surprised if the effect were greater.

The regression result between infield fly outs and foul territory also nicely exemplifies the t-stat "accuracy" /r-squared "impact" issue discussed above. The r-squared for foul territory was tiny (less than 5%), but the t-stat was highly significant (.0001).

During the playoffs, there was a short story about how the Red Sox shifted their rotation to get "infield pop up artist" Wakefield to pitch in Oakland, precisely because of the foul territory, and ground ball pitcher Lowe in Boston.

Tango (and David),

Doesn't David's system track not only the "slice", but the trajectory (grounder, line-drive, fly ball, pop-up) and ball speed? Do you think that the trajectory and ball speed variables would serve as proxies for "depth" in the "slice"?

David,

What might be happening to Chavez is that a high number of infield fly balls on the left side of the field would significantly increase the sum of probable out-conversions for an average third-baseman playing the position in an average way. If Eric allowed the shortstop or left fielder to take as many of those discretionary chances as they could possibly handle, with Eric concentrating on the huge foul territory that only he could handle, I would guess that his rating would go down.

As Bill James has written (somehow that sounds like a Scriptural reference), there doesn't seem to be any relationship between third base putouts and fielder skill. I extend the same insight to other infield positions, for various reasons mentioned in the DRA article.

One simple approach to test this theory (and your database is outstandingly well-suited to this) might be to calculate separate ground ball and fly ball "outs-vs.-probable outs" for infielders. Something tells me that Chavez's ground-ball-only rating would go up, and Soriano's would drop from its currently "average" level.

Posted 5:08 p.m., November 15, 2003 (#20) - Tangotiger
  Do you think that the trajectory and ball speed variables would serve as proxies for "depth" in the "slice"?

No, I don't. And, since the grid location is available, I don't understand why not include it as well. Between "slice", "grid", and "ball speed", I would think the latter is most at risk from scorer judgement.

As well, I'll reiterate that I think David is not handling the pitcher portion of his system correctly, so we should wait for a response from him before trying to make sense of the numbers.

Posted 5:18 p.m., November 15, 2003 (#21) - Tangotiger
  Let me try to give an example of where I think David is going wrong with his pitching version.

Say that he only looked at 2 categories: handedness of pitcher, and batted ball type (and let's assume the batted ball type is either air or ground).

So,
p(LP,a)= .70
p(RP,a)=.80
p(LP,g)=.75
p(RP,g)=.85
Those are probability of getting an out.

So, say you have a left-handed ground ball pitcher. Here's what his frequency numbers would be:
f(LP,a)=.20
f(LP,g)=.80

So, what would the "expected outs" be?

.7 x .2 + .75 x .8 = .74

And, let's say that he actually got .74 outs. So, this means that he has average skill, right? Wrong. He has average skill relative to a LP. If you look at the prob rates, every LP out probability is .10 lower than the corresponding RP one.

If you've got 70% of the pitchers as RP, then the average LP would be -.07 relative to an average pitcher.

Therefore, our average LP is actually -.07 relative to an average pitcher.

The problem is that David treats the variable of handedness of the pitcher as something to account for (you should) and essentially remove (which you shouldn't).

I think I have it right here.
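
Here is the same example worked through in Python, using only the illustrative numbers above. The variable names are mine, and the right-hander is given the same air/ground mix so that the comparison isolates handedness.

```python
# Probability of an out by (pitcher hand, batted ball type), from the example.
out_prob = {
    ("LP", "a"): 0.70, ("RP", "a"): 0.80,
    ("LP", "g"): 0.75, ("RP", "g"): 0.85,
}

# Batted-ball mix of the left-handed ground ball pitcher.
ball_mix = {"a": 0.20, "g": 0.80}

expected_as_lp = sum(share * out_prob[("LP", t)] for t, share in ball_mix.items())
expected_as_rp = sum(share * out_prob[("RP", t)] for t, share in ball_mix.items())

RP_SHARE = 0.70  # 70% of pitchers are right-handed, per the example
league_expected = RP_SHARE * expected_as_rp + (1 - RP_SHARE) * expected_as_lp

# What "average for a LP" is worth against the whole league.
lp_vs_league = expected_as_lp - league_expected
print(round(expected_as_lp, 2), round(league_expected, 2), round(lp_vs_league, 2))
```

Conditioning on pitcher handedness sets the baseline to `expected_as_lp`; leaving it out sets the baseline to `league_expected`, which is where the -.07 shows up.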

Posted 5:40 p.m., November 15, 2003 (#22) - Michael Humphreys
  Tango,

I think I get your example. So, using the numbers in the example, the lefty should have a -0.07 rating, but somehow under David's system he has a 0.0 rating?

Are the numbers in the example meant to be representative? Do ground balls hit against lefties have lower out-conversion rates than those hit against righties? Is this controlled for batter-handedness?

I think Mike Emeigh wrote in his "Jeter" series that lefties face more righties, and righties of course have a longer distance to run to beat out a throw on a ground ball, so you'd sorta expect the *average* out-conversion on ground balls hit against lefties *not* controlled for batter handedness to be higher. On the other hand, I think Mike had found that when you control for batter handedness, the effect is partly counterbalanced because ground balls hit to the opposite field have lower out-conversion rates. Or something like that.

Posted 9:55 p.m., November 15, 2003 (#23) - Tangotiger
  My numbers were strictly for illustration and for me being able to do it in my head.

Posted 2:34 p.m., November 16, 2003 (#24) - studes (homepage)
  Michael, thanks for your input on my little calculation. Makes sense, so I went back and did another calculation. I "normalized" a fielding impact DER by averaging the pitcher DER across all teams, and then adding or subtracting the DER difference by team. I then computed the standard error of the pitching DER and the fielding DER and got these results:

- Pitching DER: .0016
- Fielding DER: .0009

This would indicate that variance for pitchers is 80% higher than that for fielders.

Standard error, in this case, is the estimate of the underlying "true" mean of the broader population. But I think that using it to determine the relative impact of pitching and fielding on DER is appropriate.

Caveat: all the "noise" you referred to in your previous post is in the pitching bucket, given the way David approached the data. That's why the variance is higher. So this is not invalidating your point about the relative true impact of pitching and fielding on DER.

I think we all agree that much of DER is dependent on things other than the pitcher or fielder. That is, luck. The only difference I'm bringing to the table is that I think that the noise should be attributed to the pitcher, when assigning "responsibility" for DER.

This is purely a philosophical belief. You can either see the pitcher as passive, or the fielders as passive (or a combination, I guess). Your approach (attributing all the noise to fielders) makes the pitcher passive, which might be consistent with the DIPS framework.

But in my mind, the pitcher is in charge of what happens on the field. He initiates the play, he throws the ball. What happens to a batted ball ought to be attributed to him until other players can make a reasonable impact -- i.e. fielders. So I would choose to make the fielders the passive players. They can't really act until a ball is hit into their "zone." It's the pitcher who ought to be assigned "control" before that moment.

Sorry to ramble. I don't think this has any impact on your DRA calculations, from what you said before. But I thought I would bring up the philosophical differences.

I think the approach Tango seems to be taking -- assigning 50/50 responsibility to pitchers and fielders -- is reasonable until we get more data.

Posted 10:15 p.m., November 16, 2003 (#25) - Mike Emeigh(e-mail)
  I think Mike Emeigh wrote in his "Jeter" series that lefties face more righties, and righties of course have a longer distance to run to beat out a throw on a ground ball, so you'd sorta expect the *average* out-conversion on ground balls hit against lefties *not* controlled for batter handedness to be higher.

In the NL, in 2003, it actually worked the other way around. LHP got a lower percentage of outs on GBIP than their right-handed counterparts, and LHB were retired more often on GBIP than RHB. There are several competing factors at work here:

1. RHB have to run further than LHB when they put a ball into play, and thus should (all else being equal) be retired more often on a GBIP than LHB.
2. RHB tend to hit more balls to the left side of the infield than do LHB, and GBIP hit to the left side are turned into outs less often than balls hit to the right side.
3. Balls hit the other way are turned into outs less often than balls that are pulled.

In 2003, the middle factor was the largest one. RHB hit four times as many GB to the left side of the infield as did their LH counterparts, so that even though they were retired more often on those balls (76.2% of the time, to 73.0% for LHB), the sheer numbers of BIP outweighed the higher conversion rate on those balls. As a result, 52% of GBIP against NL LHP were hit to the left side of the infield, vs only 46% of GBIP vs NL RHP. On balance, the chances for NL infielders were easier when a right-hander was on the mound.

I don't know the extent to which MGL makes these adjustments, but it seems to me to be pretty clear looking at the PBP data that there are distinct differences in expected results when a LHB hits a ball into a zone vs when a RHB hits a ball into the same zone. Batter handedness seems to have a greater impact than pitcher handedness on the results from BIP, and while pitcher handedness is a generally useful proxy for batter handedness in the platoon era, I think we need to remember that it is just a proxy.

-- MWE