Tango on Baseball Archives

© Tangotiger

Archive List

Clutch Hitting: Fact or Fiction? (February 2, 2004)

AED checks in with a superb article on clutch hitting.

Here is my study: Does Clutch Hitting Exist? Yes!

 
--posted by TangoTiger at 08:44 PM EDT


Posted 9:08 p.m., February 2, 2004 (#1) - Ryan (homepage)
  The author notes that the "clutch" hitters are mostly singles hitters and the "chokers" are mostly power hitters.
Is it possible that the skill that is being detected is not clutch hitting, but the ability to hit certain types of pitchers?
As the author notes, the at-bats being defined as clutch are frequently save situations, so a lot of the pitchers being faced are probably closers or other power pitchers.

I've never seen a study, but I've always been curious whether a contact hitter like Gwynn or Ichiro will outperform a homerun/walk/strikeout type of hitter (e.g., Thome) when facing the best pitchers. According to DIPS, a pitcher can keep down his walks and homeruns, but has little control over balls in play. So, wouldn't a contact hitter who rarely walks or strikes out do almost as well against the best pitchers as he does against the worst? He'd strike out a little more, and walk a little less, but all those balls in play should still be hits just as often.

Of course, that wouldn't explain why there is even greater statistical significance when defining clutch as RISP. I'd be curious to see if that definition produced a similar list of "clutch hitters" and "chokers".

Posted 9:55 p.m., February 2, 2004 (#2) - Charles Saeger(e-mail)
  I too want to see this study reclassified by pitcher quality. Do good clutch performers do their work against bad pitchers? Can we see what this is in another "clutch" situation, say, runner on third and less than two out?

Nevertheless, this is interesting, though nowhere near as important as the author says it is.

Posted 11:39 p.m., February 2, 2004 (#3) - Michael Humphreys
  It was all new to me, well-written, and made a lot of sense. Getting on base would seem to be more in the batter's control than getting an extra-base hit. It would be interesting to know whether the batters who succeeded in improving their OBP in "pressure" situations lowered their slugging percentage at the same time (which would provide further evidence that the "clutch" batters made a conscious *adjustment* to avoid bad pitches and shorten their swings to make contact).

Posted 12:22 a.m., February 3, 2004 (#4) - AED
  Analysis of batter types vs. pitcher types would be a full-blown study in its own right. A very interesting one, but not really the point I was trying to address (and not something I have time to do at the moment). However, I think one can get at the more obvious possible manifestations of that by looking at the correlations - are OBAs of sluggers low in clutch situations because of fewer walks, more strikeouts, fewer home runs, or fewer hits on balls in play? The answer is a combination of all of the above, but primarily fewer walks and fewer hits on balls in play. While closers give up fewer walks and sluggers tend to walk more, this trend accounts for only about 20% of sluggers' loss of walks in clutch situations. And the lower batting average on balls in play is an order of magnitude larger than would be explained by pitching style. So it indeed appears to be the hitter, not the pitcher.

Ryan, a typical Gwynn or Ichiro type of player will generally do about 15 points of OBA better in the clutch than would a typical Thome type of player. I haven't looked into the RISP list in the same detail I did clutch, so don't feel comfortable commenting on that.

Charles, I'm intrigued by your comment that this is "nowhere near as important as the author says it is." Can you point me to any study that found a statistically-significant difference in player performance in clutch and non-clutch situations?

Michael, I don't see much of a correlation or anticorrelation between change in OBA and change in HR/H or TB/H. I'll look into that in more detail when I get a chance.

Posted 1:12 a.m., February 3, 2004 (#5) - AED
  One follow-up on the Gwynn/Thome comparison. The typical singles hitter like Gwynn would tend to hit about 10 points of OBA worse in the clutch than would the average major leaguer. Gwynn himself was a very good clutch hitter, of course. (The trends I'm discussing account for only 0.005 of the overall clutch OBA scatter; the other 0.005 of the scatter is for reasons uncorrelated to batter profile.)

Posted 1:26 a.m., February 3, 2004 (#6) - J Cross
Interesting. I wonder if good hitters or sluggers are less likely to have a platoon advantage in clutch situations than singles/bad hitters. Maybe the better hitters face tougher pitching matchups in these situations. Maybe power/flyball hitters benefit less from a shifted/moving infield. I doubt it's a noticeable effect, but sometimes when a pitcher falls behind a slugger in these situations he'll decide to complete the walk intentionally. In that case, a situation where a hitter had worked the count in his favor would be thrown out.

Posted 2:41 a.m., February 3, 2004 (#7) - MGL
A lot of good cross-correlation points were brought up. It is by no means clear what the cause-effect relationship is in the results of the study.

It appears to be a very good study, although I must confess I do not understand at all your techniques; therefore it is hard to make any intelligent comments, let alone critique it or offer any suggestions.

I am also not aware of any other studies that have remotely suggested the existence of true clutch and choke hitting.

I think what Charles is saying is that since the y-t-y correlations for clutch hitting are only .04 or so, there is little practical value in the revelation that your study seems to suggest. Is there? What I don't understand is that you say it is difficult to identify a clutch or choke hitter, especially one with less than a multitude of historical AB's, which seems to jibe with your .04 corr., yet you say something about a hitter like so-and-so being expected to have a clutch OBA 10 points higher than expected. To me that makes no sense. If we can easily identify players who should have a clutch OBA 10 points higher or lower than expected, shouldn't the y-t-y correlation be much higher than .04? Your regression equations also suggest that the y-t-y r's should be much higher as well.

Finally, if it is true that singles-type hitters in general perform better in the clutch than do power hitters, our regular value metrics, like lwts, offensive win shares, VORP, etc., should be adjusted accordingly, should they not?

Basically, I am thoroughly impressed by your methods and basic conclusion, but I am lost as to the significance, the magnitude, and the predictability consequences of the results....

Posted 8:18 a.m., February 3, 2004 (#8) - Charles Saeger(e-mail)
Actually, no, that's not what I'm saying. I'm saying that in runs generated out of this, it doesn't look like it affects that many runs. It's more like counting safe on errors than counting home runs.

Posted 9:12 a.m., February 3, 2004 (#9) - Erik Allen(e-mail)
  Very interesting study! I think the research method used here is sound, and is very similar to what was used in the Solving DIPS thread this summer.

A few comments...first, I think the fact that better hitters tend to hit worse in the clutch is entirely consistent with the idea that there are better pitchers pitching in these situations. To illustrate this, we can estimate the outcome of a particular batter/pitcher matchup using Bill James' log5 formula. Consider the extreme example of bringing in a relief ace whose OBP against is 0.300, versus a league average of 0.340. The first batter to come to the plate has a true OBP of 0.400. The log5 formula predicts the hitter will get on base at about a 0.356 clip in this situation. In other words, the batter's OBP is reduced by 0.044. Next, a poor hitter with a 0.300 OBP comes up. Log5 predicts the OBP of this matchup is 0.263, only a drop of 0.037 in OBP. Better hitters should be more strongly affected by the presence of a good pitcher than poor hitters.
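
In sketch form (Python), using the numbers above -- the 0.340 league OBP is the assumed baseline, and the third-decimal rounding differs trivially from the figures quoted:

def log5(batter, pitcher, league):
    # Bill James' log5 estimate of a batter/pitcher matchup rate
    num = batter * pitcher / league
    den = num + (1 - batter) * (1 - pitcher) / (1 - league)
    return num / den

league = 0.340   # assumed league OBP from the example
ace = 0.300      # relief ace's OBP against
print(round(log5(0.400, ace, league), 3))   # ~0.357, a drop of ~0.043
print(round(log5(0.300, ace, league), 3))   # ~0.263, a drop of ~0.037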

A second comment is an idea for a follow-up study. Although you have demonstrated that a spread of clutch hitting ability exists, you can also use the MC simulation to estimate the persistence of clutch hitting ability. To do this, create a set of imaginary players with the distribution of clutch hitting abilities found in your study. Then, simulate two consecutive seasons for each of these players, and measure the y-t-y correlation. If this correlation coefficient is significantly higher than 0.04, then it raises some questions about the predictability of clutch hitting ability.
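
A minimal sketch of that follow-up, assuming a normal spread of 0.007 in true clutch OBA around a 0.322 league rate and 150 clutch PA per season (figures taken from AED's reply in #17):

import numpy as np

rng = np.random.default_rng(0)
n_players, n_pa = 10_000, 150        # players and clutch PA per season
base, spread = 0.322, 0.007          # league clutch OBA, talent spread

talent = base + rng.normal(0.0, spread, n_players)   # true clutch OBA
year1 = rng.binomial(n_pa, talent) / n_pa            # observed season 1
year2 = rng.binomial(n_pa, talent) / n_pa            # observed season 2

print(np.corrcoef(year1, year2)[0, 1])   # ~0.03, i.e. near the 0.04 y-t-y figure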

Posted 10:53 a.m., February 3, 2004 (#10) - Alan Jordan
  It's a very interesting study. It's the first evidence that I've seen that clutch hitting exists.

This may be semantics, but I'm not ready to call someone a choker because their OBA is lower in clutch situations. I would feel more comfortable using the word choker for someone whose OPS is lower in clutch situations. If a batter is swinging for the fences, he's obviously choosing to sacrifice his probability of getting on base in order to increase his probability of getting a homerun. Since we aren't measuring his odds of getting a homerun, we are getting an incomplete measure of his contribution at the plate. What we are measuring may just be the tendency for sluggers to swing.

Perhaps we should be looking at whether a runner crossed the plate, or the number of runners who crossed the plate vs the expected number of runners who cross the plate. I don't see a simple way of testing that.

Posted 10:54 a.m., February 3, 2004 (#11) - Walt Davis
  The difference between a "one standard deviation good" and an average clutch hitter amounts to only 1.1 successful appearances, while the difference between a good and an average overall hitter amounts to 3.9 successful plate appearances. In short, any argument that clutch skills should be ignored could equally well be an argument that all batting skill should be ignored in clutch situations, given that randomness is the largest factor of all.

I would assume this is what Charles is referring to. A clutch hitter will have 1.1 more successful PAs in about 150 clutch situations in a season. Those 1.1 additional successful PAs will result in maybe one additional run created (on average) per year between your clutch and non-clutch hitters. Now those are fairly high leverage runs and maybe worth something like .5 wins each.

Moreover, it seems quite possible that differences in pitchers faced or randomly distributed differences in base/out scenarios might explain such a small difference (i.e., if the average batter does better with a man on 1st and no one on 2nd due to the 1B holding the runner on, then batters who randomly had more of these clutch situations would do better). At the very least, we'd think that at least some of the variation in clutch performance is due to variation in these factors, meaning the "true" clutch effect is likely even smaller than this.

I'd imagine that in clutch situations, the base/out/deficit situation would be treated differently by different hitters. Here "clutch" includes cases where the tying run is at first base, or in the batter's box, or on-deck. I'd think a Gwynn-type hitter would approach these situations the same way, but if the batter represents the tying run or if the tying run is on 1B, a Thome-type hitter is looking for a HR or at least a double. Rather than Gwynn-types being clutch hitters, perhaps Thome-types are making a conscious (and perhaps correct) decision to sacrifice some OBP for some SLG. In other words, if Gwynn's 1.1 extra successes are singles, Thome only needs one more HR than he would normally hit to make up that difference.

Another issue to address is statistical power. Statistical power is the probability that a test statistic detects an effect of size X in a sample size N (at a given alpha level). While it would seem great to have lots of power, there's the downside that with enough power, even trivial differences will achieve statistical significance. In other words, with a big enough sample size, everything is significant.

If I weren't lazy, I'd look up the proper formula for power in the binomial. Instead I did a quickie simulation. I simulated 10,000 careers of 612 players with a "regular" OBP of .328 where each had 1000 career clutch PAs (is this reasonable?) and 1/3 had "clutch" OBPs of .326, 1/3 .328, and 1/3 .330. For each set of 612 careers, I determined the number of players who had significantly more or fewer hits (at the .05 level) assuming a .328 OBP (i.e. no clutch). In any given set of 612 careers, we'd expect 5% of batters (or 30.6) to exceed that level due to randomness.

I then looked at the distribution of this count across the 10,000 seasons, which ranged from 13 to 54. Given 612 trials and a p of .05, a count higher than 40 should occur in about 3.8% of the sets of 612 careers. However, in our simulation, we get a count of 40 or higher about 6.7% of the time.* In other words, we would easily reject the null hypothesis that there's no difference in clutch OBP. We would conclude this even though the differences are in fact quite trivial.
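
In rough Python form -- the setup is as described above, and the normal-approximation cut for the .05 level is one reasonable implementation choice:

import numpy as np

rng = np.random.default_rng(1)
n_sets, n_pa, base = 10_000, 1000, 0.328
# 612 players per set: thirds with "clutch" OBPs of .326, .328, .330
clutch = np.tile(np.repeat([0.326, 0.328, 0.330], 204), (n_sets, 1))

hits = rng.binomial(n_pa, clutch)
sd = np.sqrt(n_pa * base * (1 - base))             # null SD of career hits
extreme = np.abs(hits - n_pa * base) > 1.96 * sd   # .05-level outliers
counts = extreme.sum(axis=1)                       # outlier players per set

print((counts >= 40).mean())   # cf. the ~3.8% null expectation vs ~6.7% found above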

Which is just a way of saying that the p-values reported in this article aren't necessarily impressive; they're just reflective of the power of the test. The magnitude of the effect is just one of the things that impacts the power of the test. The magnitude of the effect in the study is greater than the ones I used above, but still quite small.

* there appears to be some bias in my random number generator such that even with no effect simulated, I get about 4.4% (instead of 3.8%) with a count above 40....or I shouldn't have used the binomial to generate that 3.8% expected value. But 6.7% is still 50% above 4.4% and we'd conclude a significant difference. The other possibility is that the probability function is off, meaning the 3.8% number is too low.

Posted 11:26 a.m., February 3, 2004 (#12) - cheng
  Sorta new around these parts, but doesn't Tango have a "modified" OBP stat that takes into account all offensive contributions but still expresses the number in a binomial fashion? Seems that this would be a better stat to use than straight OBP. Maybe hitters like Yaz or Winfield deliberately swung for the fences in these situations? Just throwing it out there. Great article.

Posted 11:44 a.m., February 3, 2004 (#13) - PhillyBooster
  Perhaps I was not reading closely enough, but nonetheless here is my question:

It is stated that players, as a whole, perform worse in situations defined as "clutch."

Taking that as a given, is a player defined as a "clutch player" for purposes of this study if he plays exactly the same in clutch and non-clutch situations? Should he be?

I'm not sure what I think the answer to the "should" question should be. On the one hand, performing the same doesn't really seem very "clutch"; on the other hand, a relative increase in performance is still meaningful, no matter what you choose to call it.

Posted 11:46 a.m., February 3, 2004 (#14) - tangotiger
  AED has made a remark regarding that when I first introduced it, and he has provided some process for me to go through to see how well it holds up against the binomial. I'll report back later.

As well, LWTS by the 24-baseout state might be considered, since, as some have pointed out, the value of a walk or HR changes depending on the base/out situation. This would be tied to the above paragraph.

Posted 12:39 p.m., February 3, 2004 (#15) - Charles Saeger(e-mail)
  As an addendum to my earlier comments, I suppose ...

How many runs are we talking about being in play here? How many runs difference are made up between, say, Ozzie Smith and Dave Winfield, because of this? This is what I'm talking about, and I'm inherently skeptical of the 28% figure. I am having a hard time accepting it without seeing the numbers.

I know this sounds very unsabermetric, but what are the differences in batting average? Are the differences due to walk rate or whacking the ball? The slugging percentage difference suggests the latter, but just making sure.

Posted 12:48 p.m., February 3, 2004 (#16) - Charles Saeger(e-mail)
  Ah, rereading, I see the 28% figure's origin. I took it to mean something in an overall sense, but it's just in terms of performance in these situations. It still looks like we're only talking about a couple of runs a year here, so I still can't see it as an important skill.

Posted 12:53 p.m., February 3, 2004 (#17) - AED
  Thanks for all the interest and feedback. Going through the list quickly:

MGL, a significant chunk of the variation in clutch performance is due to correlations between clutch performance and the players' batting profile. This part can be calculated rather accurately for anyone based on their overall stats. Slugging average is a catch-all that hides a lot of factors in about the same proportion as they influence clutch performance, so that's what I chose to use. I could have given clutch performance as a function of BB/PA, SO/PA, HR/(PA-BB-SO), and (H-HR)/(PA-BB-SO-HR), but it wouldn't have gained a whole lot in accuracy. (My note about Gwynn/Thome types did use such a breakdown; in retrospect I would have been nearly as accurate using just their career slugging numbers.)

Charles, the standard deviation is indeed small, but since these are high-leverage situations, the hits/wins ratio is higher. I have "win advancements" computed for the same years, so if I get a chance I'll try to put the two studies together.

Erik, in the initial phase of the analysis (the strict binomial test), I used multiplicative adjustments to the on-base averages to account for pitcher difficulty. The league average in 'clutch' situations was 0.322; that in 'early' (innings 1-5) was 0.331. So all clutch rates were multiplied by 1.02 and non-clutch rates were multiplied by 0.99. This should accurately reverse the log5 effect. (To make sure this adjustment wasn't causing anything bad, I also made a second run by adding/subtracting to the rates.) The follow-up you mention isn't needed -- given 150 clutch plate appearances in consecutive seasons and the measured spread of 0.007 in clutch skill, one knows the correlation to be 0.04.
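
The arithmetic behind that last claim, in sketch form -- the year-to-year r of a noisy measurement is roughly true variance over (true plus sampling) variance:

p, n, spread = 0.322, 150, 0.007   # league clutch OBA, clutch PA, true spread
sampling_var = p * (1 - p) / n     # binomial noise in one season's clutch OBA
true_var = spread ** 2
print(true_var / (true_var + sampling_var))   # ~0.033, i.e. roughly the 0.04 quoted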

Walt, naturally the game situation will affect what strategies are used. But since pitchers are always trying to prevent runs and batters are always trying to score runs, it isn't clear that there should be a huge difference between how they approach trying to score/prevent runs in the 7th inning rather than in the 2nd inning. I'm not completely sure what your test did, but I did simulate a very large number of seasons with both the null hypothesis (equal performance in clutch/non-clutch) and the clutch hitting model. Only 0.4% of null hypothesis models were more deviant than the actual data.

PhillyBooster, I believe the 0.009 drop in overall OBA in clutch situations is due to the fact that opposing teams tend to send out better pitchers in those situations. I would have to check that more carefully to give a definitive answer.

Charles, the 28% figure is the ratio of the total rms in clutch ability to the rms in inherent OBA skill. The small correlations I have detailed are only part of that total -- an equal part of it appears to be uncorrelated with anything obvious. Batting average is affected as well, about half as strongly as OBA and SLG; this is due almost entirely to batting average on balls in play.

Posted 1:03 p.m., February 3, 2004 (#18) - tangotiger
  In terms of the opposing pitcher for our hitters:

Using my own terminology for clutch (based on LI), the standard deviation of the differences between opposing pitcher's "overall 1999-2002 lwts-based OBA" during clutch and non-clutch situations is .0019. (I don't know how random that is, though others here might know.)

Breaking it down by top 50 clutch performers, bottom 50 clutch performers, and overall:
- opposing pitcher's OBA during non-clutch, for all groups, was .347
- opposing pitcher's OBA during clutch, for all groups, was .342
- the differential was .005 for all groups

So, a guy who performs well in the clutch did not do so by facing poorer pitchers.

I have the batter's performances being .0058 lower during clutch, while their opposing pitcher's overall OBA was .0053 lower. The reason that hitters did worse in the clutch than nonclutch is almost entirely due to the better pitchers being faced.

Posted 1:45 p.m., February 3, 2004 (#19) - J Cross
  Tango, do you have data on the quality of the average pitcher pitching in a clutch situation? Basically, the season/career OBP against for pitchers who pitched in these clutch situations weighted by the # of PA they pitched in these situations.

I'm not sure that looking at how well these pitchers did IN the clutch situations themselves is a good measure of how good these pitchers really are. There might be a higher level of batters in clutch situations (so the change in pitcher would be understated), or the OBPs might be affected by more/less common base/out situations. Or maybe there are more clutch situations (as defined by run differential) in pitchers' parks or in games with better pitching/fielding teams.

Posted 2:25 p.m., February 3, 2004 (#20) - tangotiger
  I probably didn't explain post #18 well, but when I said:


Breaking it down by top 50 clutch performers, bottom 50 clutch performers, and overall:
- opposing pitcher's OBA during non-clutch, for all groups, was .347
- opposing pitcher's OBA during clutch, for all groups, was .342
- the differential was .005 for all groups


I meant that the opposing pitcher's "overall 1999-2002 lwts-based OBA" during nonclutch was .347, and during clutch it was .342.

That is, I weighted each pitcher's PAs in clutch and assumed a "true talent" equal to their overall 1999-2002 performance.

Note that that's not the true definition of "true talent", and that for pitchers with less than 800 PAs in that time span, I gave them league average numbers. (i.e., I regressed the regulars 0% and the bubble players 100%). For a quick report, I think this is probably acceptable.

Posted 2:36 p.m., February 3, 2004 (#21) - J Cross
  Okay, I got it now. Makes more sense than the way I read it.

Posted 2:40 p.m., February 3, 2004 (#22) - AED
  Here are numbers from the data I used, which may be more directly useful in addressing questions about this article.

The average pitcher in a "clutch situation" had the following career stats:
0.154 SO/PA, 0.081 BB/PA, 0.021 HR/PA, 0.286 BABIP

The average pitcher in a "non-clutch situation" had the following career stats:
0.145 SO/PA, 0.082 BB/PA, 0.021 HR/PA, 0.285 BABIP

So aside from a 6% higher strikeout rate, there's nothing significantly different about the pitching faced.

Posted 3:14 p.m., February 3, 2004 (#23) - tangotiger
  That works out to a .315 OBA in clutch and .317 OBA in nonclutch.
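
In sketch form, working from AED's component rates above -- walks and homers put the batter on base directly, and a BABIP share of the remaining balls in play fall in for hits:

def oba(so, bb, hr, babip):
    bip = 1 - so - bb - hr           # balls in play per PA
    return bb + hr + babip * bip     # times on base per PA

print(round(oba(0.154, 0.081, 0.021, 0.286), 3))   # clutch: 0.315
print(round(oba(0.145, 0.082, 0.021, 0.285), 3))   # non-clutch: 0.317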

Posted 3:27 p.m., February 3, 2004 (#24) - J Cross
thanks guys. Is the smaller differential in AED's pitchers due to the fact that his study stretches back to the days of more complete games and less bullpen use, or rather a different assignment of "clutch"? Looking at who's pitching in "clutch" situations is interesting for its own sake.

Posted 3:36 p.m., February 3, 2004 (#25) - RossCW
If I understand the selection of "clutch situations", it includes a lot of situations that players would not consider to be clutch. For instance, the top of the seventh with the score tied at 5 to 5 and no one on base isn't really a "clutch" situation. So presumably the measured impact of the situations that are truly "clutch" is diluted by the lack of impact in situations that aren't.

It is certainly likely that players are more sensitive to the impact of clutch hitting than its statistical significance might warrant. At bats that win ball games tend to get noticed.

The relative value of sluggers versus singles hitters in those situations may be related to the quality of pitching they face. I think a manager is much more likely to bring in a top reliever to face a slugger than a singles hitter.

It also seems that a larger percentage of the at bats considered are truly "clutch" for sluggers. Tony Gwynn is not really in a clutch situation with no one on base, even in the 8th inning. A slugger - who could put his team in the lead with one swing - might feel like that is a clutch situation. It's also plausible that the slugger will swing for the fences when the bases are empty and not when there are runners on base.

Assuming someone repeats the methodology used here and gets the same results (something I am wholly incapable of), this study seems to end the debate about whether clutch hitting exists. But it calls for further studies to better understand how it exists. Or, more accurately - there is something about the group of at bats selected that causes some hitters to get statistically significantly different results than other hitters, compared to other at bats. I don't know what another explanation of the results would be, but that also needs to be considered.

My comments are just suggestions for areas where further study is warranted.

Posted 4:59 p.m., February 3, 2004 (#26) - AED
Tango, that's correct about the averages. Regressing the pitchers' career numbers to estimate true talent, I estimate that the OBA allowed by an average pitcher in a clutch situation is 0.316, while that allowed by an average pitcher in a non-clutch situation is 0.313. So, again, aside from the strikeout rates (which are still 0.154 and 0.145 after regressing), the overall quality of pitching faced in the two situations is essentially the same. Pitchers in clutch situations allowed 37.02% of the batters they did not strike out to reach base; pitchers in non-clutch situations likewise allowed 37.02% of non-strikeout batters to reach base.

Ross, absolutely, the choice of "clutch situation" matters. As noted, I used a rather broad definition to include any situation that might have pressure, which allowed a more precise statistical test between the two samples. My main goal was to find out whether or not there is any difference whatsoever; with that determined, follow-up studies are warranted.

If I define "clutch" as 8th inning or later, trailing, and with the tying run on base, at the plate, or on deck, and lower the minimum number of clutch plate appearances to 100, I find the rms spread in clutch skill to be 0.0135 in OBA, which is statistically significantly higher than the spread in clutch skill in my original definition and is over half the distribution of overall OBA. (As before the quality of pitching is the same, except that pitchers in clutch situations are now 10% better at getting strikeouts than those in non-clutch situations.) So yes, it would seem that whatever differences there are get larger in situations with more pressure, which means that I underestimated the impact of clutch hitting on wins. Tango's upcoming study uses lwtsOBA and LI, so he can more directly calculate win impact than I can.

Because the clutch situation definition includes situations when the tying runner is on deck or on base, 'clutch' does not necessarily mean "home run to win or tie". So there are some situations in which the batter feels pressure to hit a home run, others where he feels pressure to hit a guy home from third, and yet others in which he just needs to get on base safely.

Posted 5:12 p.m., February 3, 2004 (#27) - tangotiger
  I have updated the lead-in, with a link to my study on the issue.

Please note, there is room for much improvement here. What has been established by Andy's study, and now by mine, is that clutch hitting is detectable (even if faintly). It is possible, as we look into the issue more, that we'll improve our methods to detect clutch ability.

The important takeaways are:
- just because no one, until now, was able to prove clutch hitting, doesn't mean that it doesn't exist
- since we now have some signs that clutch ability is detectable to some degree, that degree can be increased with improved methods and samples

Posted 5:12 p.m., February 3, 2004 (#28) - AED
  the OBA allowed by an average pitcher in a clutch situation is 0.316, while that allowed by an average pitcher in a non-clutch situation is 0.313.

I realize this can be interpreted two ways, plus I got the numbers reversed for good measure. Having calculated the "OBA talent" for each pitcher using his career stats regressed appropriately towards the mean, the average "pitcher OBA talent" faced in a clutch at-bat is 0.313 and the average "pitcher OBA talent" faced in a non-clutch at-bat is 0.316.

Posted 5:22 p.m., February 3, 2004 (#29) - tangotiger
I made a little goof that AED caught in my article. Article has now been corrected.

Posted 5:26 p.m., February 3, 2004 (#30) - Charles Saeger(e-mail)
  Well, I would guess the next step is to try the same on the old classic, RISP.

Posted 5:31 p.m., February 3, 2004 (#31) - Charles Saeger(e-mail)
  Tom: It's not that clutch hitting didn't exist; we all knew it did. It's whether or not clutch ability exists, and I must ask to be sure, but knowing you I'm sure you already checked -- is the distribution random?

Posted 6:13 p.m., February 3, 2004 (#32) - AED
  Charles, both of our studies carefully compared the actual distributions with random distributions. Mine was more significant in that regard since I used plain OBA (and thus no need to fudge the variance), but Tango also found more variance than would come from randomness for any reasonable value of his fudge factor. So yes, "clutch ability" does exist, in the sense that players have different talent levels in clutch and non-clutch situations.

RISP was one of my samples, and I found a significant deviation between RISP and non-RISP performance there as well.

Posted 6:51 p.m., February 3, 2004 (#33) - Nick S
  Tango-

How did you define clutch situations? That is, I assume you picked an LI threshold above which a situation was clutch; where was that threshold, and have you run the study at various thresholds to see if and how clutch ability varies with LI?

Posted 7:39 p.m., February 3, 2004 (#34) - David Smyth
  I read both AED's and Tango's studies. Good work, fellas, even though I don't understand all of the technical details. But one thing that bothers me is the casual use of the "clutch" term. In the popular arena, my understanding is that "clutch" carries a connotation of a player "rising to the occasion" because of his "steel balls". These studies seem to show that there are differences in situational performance according to type of batter (power vs singles), and perhaps to some degree an individual player's ability to adapt to certain pitching patterns a bit better than his "type" would suggest (Murray?). IMO, this all has ZERO to do with any qualities of "intestinal fortitude". It is not surprising to me at all that these differences exist. That some of these situations seem to coincide with a fan's idea of greater "pressure" being felt, or with a "Leverage Index", is probably a coincidence.

There is still no real evidence that "clutch" ability exists. There is evidence that differences in "situational" ability exist. I do realize that at some point one has to define clutch in terms of specific situations, and so maybe my point has little practical relevance. But still, I wish the researchers would stop using the term "clutch". It's sloppy.

Posted 7:43 p.m., February 3, 2004 (#35) - tangotiger
  I used an LI of at least 1.5 (which gives us an overall avg LI of 3), and that was nearly 20% of the PAs.

My problem is that if I use a threshold too high, my sample size is too low to detect a pattern.

It's a tough call really.

Posted 9:26 p.m., February 3, 2004 (#36) - AED
  David, you seem to have it reversed. Most of the 0.0135 spread in clutch OBA seen in 8th inning and later clutch situations is not correlated with batter profile. (The rms scatter from batter profile is about 0.005; that from other sources is thus 0.0125.) Likewise, the pitching seen in those clutch situations is not very different from that seen in non-clutch situations.
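
(The 0.0125 comes from subtracting in quadrature -- independent scatter components add in variance, not in standard deviation:)

total, profile = 0.0135, 0.005
other = (total**2 - profile**2) ** 0.5   # quadrature subtraction
print(round(other, 4))                   # 0.0125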

That some of these situations seem to coincide with a fan's idea of greater "pressure" being felt, or with a "Leverage Index", is probably a coincidence.

I had three samples of data (3rd-5th inning, 1 out, and winning by 1 or 2 runs), and in none of these was any significant difference between in-sample and out-of-sample performance found. I also have four samples of data (my original clutch definition, the GABSB clutch definition, RISP, and my 8th inning and later clutch definition), all of which have significant differences. So it is extremely unlikely to be a coincidence. Looking at the amount of the difference among these samples, it also appears that the change in true performance increases with LI.

There is still no real evidence that "clutch" ability exists.

There is evidence that players perform differently in clutch situations in a way that is independent of the player's batting profile or the pitching faced. If it's not "batter profile" and not "opposing pitching", "steel balls" is a probable explanation.

Posted 11:38 p.m., February 3, 2004 (#37) - tangotiger
Ok, so Miguel Tejada "situationally adapts when there is a large swing impact in the game" better than expected by random chance. Can't I use the term "clutch situation" or "crucial situation" there? And can't I call his trait clutch ability? Do I have to use the legalese 12-word description?

Posted 8:00 a.m., February 4, 2004 (#38) - David Smyth
  ---"Can't I use the term "clutch situation" or "crucial situation" there? And can't I call his trait as clutch ability?"

My opinion is no, because the term clutch, as it has been commonly used, implies 1) better situational performance 2) due to a greater emotional/mental resolve. I do not consider this a semantic nitpick on my part.

What reason do we have to think that E Murray had more of this than R Jackson? I seem to remember you (Tango) finding that almost all of Murray's advantage was in bases-loaded situations, and not in other high LI PAs. Why would he only tighten his belt with bags full? And by what physical mechanism is this "resolve" supposed to work, given that the batter has only a half-second to react to what the pitcher throws? Much more likely that some batters have more adaptable styles, and some are perhaps smarter (than the opposing pitcher) in certain situations. To convince me that the effect is due to "steel balls", as AED put forth in his last post, would take a lot more breakdowns by different aspects of LI.

So yes, I would use a more conservative terminology at present.

Posted 9:23 a.m., February 4, 2004 (#39) - tangotiger
Fine, if we have to associate "clutch" strictly with the emotional or intellectual (but not physical) resolve of the player, then obviously I can't say that a player is clutch. By that definition, we can never use the term with any statistical significance.

But, I see no reason to try to distinguish between emotional, intellectual, and physical. To me, clutch is any or all 3. I don't really care why Jason Giambi's true talent is different when the game matters the most (keeps his emotions at bay? has a better understanding as to how to bat? his body adapts better to the situation?).

Posted 9:40 a.m., February 4, 2004 (#40) - tangotiger
  I added a short paragraph at the end of my article to show that the 2 clutch runs, because of the timing, is worth 0.6 wins, instead of 0.2.

Posted 10:06 a.m., February 4, 2004 (#41) - Alan Jordan
I'm with David on this. I'm willing to accept as "Steel Balls" a player who hits BETTER than his non-clutch average, controlling for pitching. What we have evidence for so far is that hitters hit worse in clutch situations and some don't hit as badly (some may hit better). It may actually be that nobody hits better in clutch than non-clutch.

I think AED has found something, I'm just not sure what yet.

Posted 10:42 a.m., February 4, 2004 (#42) - Mike Green
  There are very important differences between Tango's and AED's studies. You cannot look at clutch situations and consider OBP only, and get a true result.

Take a particular situation. Two outs, nobody on, the home team is down by a run in 1990 and Dennis Eckersley is trying to close out the save. If Jack Clark is up, there is the realistic possibility of a home run, which is much more valuable in the game context than a walk. Jack's OBP goes down; his slugging percentage stays constant or goes up. If Alfredo Griffin is up, there is no realistic possibility of a home run and his manager will go crazy if he swings as wildly as he often did in less meaningful situations. His OBP will not suffer as Jack's does.

Tango's study, which includes a measurement of power, shows a completely different type of hitter - Miguel Tejada and Jason Giambi as clutch hitters, albeit at a lower level. I have no difficulty in accepting this empirically. AED's study, on the other hand, does not persuade me that Jack Clark, who was famous for his game-winning home runs, deserves the "choker" label, nor that Alfredo Griffin deserves a "clutch hitter" label.

Posted 10:45 a.m., February 4, 2004 (#43) - tangotiger
Alan, in my article, I said that the hitters, as a group, hit 5.8 points lower in clutch compared to nonclutch. At the same time, the quality of opposing pitchers has an OBA that is 5.3 points lower during clutch than nonclutch.

Therefore, players, as a group, hit just as well in clutch/nonclutch, after you control for pitching.

Just to give you the numbers for Jason Giambi:
nonclutch: 2213 PAs, .442 lwtsOBA, .441 realOBA, .346 pitcherlwtsOBA
clutch: 466 PAs, .503 lwtsOBA, .487 realOBA, .339 pitcherlwtsOBA

So, Giambi hit 46 OBA points higher in the clutch, and if you adjust for the better pitching he faced, that's 53 points higher. If you use lwtsOBA (which weights the HR more and BB less), he's out of this world.

After adjusting his performance for the pitchers he faced, and regressing his nonclutch performance to determine his true talent level, Giambi was 3.05 SD higher in clutch than his nonclutch would dictate.
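
A rough cross-check of that z-score using plain OBA alone -- this is not the full calculation (which uses lwtsOBA, the fudge factor, and regression of the nonclutch rate), so it comes out lower:

from math import sqrt

clutch_pa, clutch_oba = 466, 0.487
nonclutch_oba = 0.441
pitcher_adj = 0.339 / 0.346     # tougher pitching faced in the clutch

expected = nonclutch_oba * pitcher_adj
sd = sqrt(expected * (1 - expected) / clutch_pa)
print((clutch_oba - expected) / sd)   # ~2.4 SD on plain OBA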

The performance of the 340 players in my study does not conform to the distribution that randomness would dictate. Something is at play here. What else could it be? Maybe we should look at it the other way. How about "really nonclutch situations" and "the rest"? Instead of taking the LI greater than 1.5, I should look for LI less than 0.5. Instead of players rising to the occasion, maybe Giambi and Tejada are players that drag their true talents down when the game doesn't count.

This is Giambi's lwtsOBA in the 5 classes of pressure that I make:
LI class:  0 to 0.5   0.5 to 1.0   1.0 to 1.5   1.5 to 2.5   2.5+
lwtsOBA:   0.438      0.430        0.461        0.482        0.565

Maybe Giambi is a genius who gets bored when there's no action.

Posted 10:50 a.m., February 4, 2004 (#44) - tangotiger
  The important thing to remember, in both AED's study and mine, is that there is something nonrandom at play here. There may be other factors that need to be controlled further. Maybe I need to break out my LWTS by the 24 base-out states to get a better read as to the true performance of the player.

What is presented here should be considered a starting point: we have a sign that the performance of the player may not be distributed in some random fashion.

That we can all accept that hitters of different profiles approach (input) each LI situation differently already shows that it's not random. But, could the effect (output) still be random?

Right now, there's cause to believe that it's not random.

Posted 1:40 p.m., February 4, 2004 (#45) - AED
  You cannot look at clutch situations and consider OBP only, and get a true result.

Quite the opposite - by using a binomial statistic such as OBP, you know the expected random variance well enough to establish that the differences between the samples are non-random. Because of the ambiguity in the treatment of variance in lwtsOBA, Tango's results do not establish the presence of clutch differences; however, given my finding that they exist, his study is a nice follow-up.

For what it's worth, Jack Clark's OPS and lwtsOBA were both also lower in clutch situations.

Posted 2:08 p.m., February 4, 2004 (#46) - Mike Green
  AED, I'm sorry but it just doesn't make sense. A home run is not of the same significance as a walk, least of all in "clutch" situations. It may assist from a statistical perspective to treat it as the same event for this purpose, but the real life distortion is so large that meaning is lost.

That Jack Clark's OPS is lower in clutch situations does not surprise me; this is true in general. Interestingly, Bill James studied Jack Clark's clutch performance in the 1980s and said back then that Clark was one of the few players who one could actually demonstrate performed well in the clutch. I remember that the evidence back then seemed pretty persuasive. It has been 15 years, so it might not look as good now, but it'll take more evidence than this to persuade me that Jack Clark was a poor clutch hitter.

Posted 2:16 p.m., February 4, 2004 (#47) - tangotiger
  re: OBA treating all safe events as 1.

In terms of establishing the randomness, I don't see this as a problem. The binomial needs to treat things as binary events (safe/out). So, from that perspective, AED has shown something as statistically significant.

However, is the reason that we get nonrandomness because a player changes his hitting profile, so that we were expecting this to begin with? Looking at what I did, I give more weight to the HR and less to the walk. However, in so doing, we no longer have binary events. My lwtsOBA may look binary, but it is not. But, I desperately want to use the binomial. Applying my fudge factor (which should come under a lot of scrutiny, much more scrutiny than what AED has done), I get the same kind of nonrandomness effect (overall) as AED shows using plain OBA.

Posted 2:26 p.m., February 4, 2004 (#48) - tangotiger
  One test that I've been thinking about (and I think someone has brought up) is to split up the player's performances by base/out.

Right now, the base/out frequency won't be the same with clutch/nonclutch. While 45% of all PAs occur with the bases empty, maybe only 30% do so during clutch situations.

As a way to ensure that the hitter's approach should be the same during clutch/nonclutch, we should ensure that they have the same mix of base/out in clutch/nonclutch. Or, at least adjust all the player's performances based on the base/out (similar to what I did with the opposing pitcher).

If for example, we have in nonclutch situations
Giambi, bases empty, .400 lwtsOBA, .390 OBA
Giambi, men on, .410 lwtsOBA, .380 OBA

and we do this for all 340 hitters (equally weighting them), and we get
all players, bases empty, .345 lwtsOBA, .350 OBA
all players, men on, .345 lwtsOBA, .340 OBA

Then we can adjust the player's OBA with men on to account for the fact that players do have a different hitting approach with men on and bases empty.

Then, we can take this adjustment, and apply it to the men on during clutch situations.

Does this sound good?
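
A minimal sketch of that adjustment -- the state names, league splits, and PA counts are illustrative only, along the lines of the example above:

LEAGUE_NONCLUTCH = {"empty": 0.350, "men_on": 0.340}   # illustrative league splits
LEAGUE_OVERALL = 0.345                                 # illustrative overall rate

def adjusted_clutch_oba(pa_by_state, oba_by_state):
    # remove the league-wide base/out context effect from each split,
    # then weight by the player's clutch PA in each state
    total_pa = sum(pa_by_state.values())
    adj = sum(pa * (oba_by_state[s] - (LEAGUE_NONCLUTCH[s] - LEAGUE_OVERALL))
              for s, pa in pa_by_state.items())
    return adj / total_pa

# e.g. a hitter whose clutch PAs skew heavily toward men-on situations
print(adjusted_clutch_oba({"empty": 140, "men_on": 326},
                          {"empty": 0.390, "men_on": 0.380}))   # ~0.385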

Posted 3:00 p.m., February 4, 2004 (#49) - RossCW
I'm with David that we don't know why some players hit better than others. It may be that players that just don't care very much do better - the exact opposite of the popular notion. It's also possible that there are a variety of reasons for different players.

On Jack Clark - I have a different memory. I remember being unconvinced by James' work on clutch hitting and thinking his paean to Clark sounded like a fanboy exception, which reinforced my sense that he had a conclusion in mind in both cases.

Posted 4:05 p.m., February 4, 2004 (#50) - tangotiger
  This time, I tried it the other way.

I took a player's performance when the leverage is at least 0.5, and made that his "regular" performance. Then, I looked to see how they played when the game least counted (LI under 0.5).

Interestingly, there is an effect here as well, but not as much. The true spread in clutch hitting is about 50% wider than the true spread in bored hitting. However, bored hitting also exists.

(This is the case whether I looked at OBA or lwtsOBA.)

The leaders in the "perform much better when the game is not on the line" are: Chipper Jones, Tony Batista, Brian Giles, Vinny Castilla, Nomar Garciaparra.

The opposite, which is the guys who do their worst when the game is not on the line: Tino, Jacque Jones, Richie Sexson, Craig Counsell.

So, it's mostly rising to the occasion, and partly playing down to the crucialness of the game.

Posted 5:09 p.m., February 4, 2004 (#51) - AED
  Tango, how much of an anticorrelation is there between performance in high- and low-leverage situations?

Posted 5:28 p.m., February 4, 2004 (#52) - tangotiger
  I have no idea what you just said. Can I interpret that to mean that you have Jessica Alba's home phone, and you will give it to me?

***

Correlation (r) of the 5 levels of pressure (0 = LI less than 0.5, and 4=LI greater than 2.5). avg PA in each group is: 514, 555, 327, 236, 89

0-1 .66
0-2 .61
0-3 .55
0-4 .47 (.81 OBA0 + k = OBA4)
1-4 .39 (.62 OBA0 + k = OBA4)
2-4 .38
3-4 .39

Not sure how to interpret the differing PA levels in each group.

Posted 6:23 p.m., February 4, 2004 (#53) - J Cross
I think AED was asking for the correlation btw OPS change in situation 0 and OPS change in situation 4, and "anticorrelation" just refers to the fact that it's negative (or we expect it to be).

Are those just correlations btw OPS (not relative to general performance) btw one situation and another? Looks like they have more to do with the number of plate appearances than anything else. Just as we'd expect, I suppose.

Posted 6:40 p.m., February 4, 2004 (#54) - Steve B
  From the opening comment by Ryan:

"The author notes that the "clutch" hitters are mostly singles hitters and the "chokers" are mostly power hitters."

I wonder if this is because singles hitters hit fewer flyballs?

I have noticed that there are significant variations between OBP and BA in different base-out states. I assume that this is mostly due to different defensive positioning by infielders when a runner is on base, particularly in double-play situations. Playing for the DP seems to allow more singles through the infield, and flyball hitters don't take as much advantage from this. I'm wondering if there is a sampling bias in the study, if "clutch" situations contain lots of DP situations while non-clutch situations contain fewer.

I'm not arguing that "clutch" ability doesn't exist, just that I'm not sure it has been identified here.

Posted 7:25 p.m., February 4, 2004 (#55) - AED
  Tango, I should have been clearer. Suppose that you define "neutral" situations as leverage groups 1/2/3. You can then calculate players' "clutchness" by comparing neutral to group 4, and players' "loss of interest" by comparing neutral to group 0. Having done that, is there a correlation (or anticorrelation) between the two?

Steve, it's not that either. I defined yet another "clutch situation" definition that ignores base/out state: 7th inning or later, batting team behind by 1, 2, or 3 runs. The discrepancy between clutch and non-clutch situation batting is still there.

Posted 9:17 p.m., February 4, 2004 (#56) - David Grabiner(e-mail)
  I am not surprised that a well-done study finds a small effect; the key is to have a lot of data. In my own best clutch study, I found a correlation between past career and current-year clutch OPS of .01, with a standard error of .07; the study's .04 year-to-year is consistent with my data.

However, there are still many possible effects which have nothing to do with reaction to the clutch situation. One effect which I had noticed is that good hitters with large platoon splits tended to be chokers; this may have something to do with the tendency of good hitters to choke in AED's study. The probable reason is that good hitters have the platoon advantage less often in the clutch, with relievers who throw from the same side coming in to face them. Weaker hitters are more likely to have the platoon advantage in the clutch, as they will sometimes leave for pinch-hitters when a reliever from the same side is pitching, and may come off the bench to pinch hit when a reliever from the other side is on the mound.

One study which is probably dominated by secondary effects is the study which defines a clutch situation as one with runners in scoring position; this helps explain why it showed the strongest bias on OBP. Virtually all intentional walks are given with runners in scoring position, and the probability that the batter is intentionally walked is strongly dependent on his ability and lineup position. If a team has two equally great hitters batting third and fourth, the fourth hitter will have a higher OBP with runners in scoring position. A study using batting average, which is what increases most in value with runners in scoring position, could be interesting. (Again, I tried this myself, but I couldn't measure an ability from the amount of data I had at the time.)

Posted 11:27 p.m., February 4, 2004 (#57) - tangotiger
  AED removed IBB (and bunts and HBP). I don't think this comes into play, if I'm reading David correctly.

***

I think it's a fair point that if AED (and I) will define clutch in such a way that an abnormal number of those situations are with men on, then we should define the nonclutch or control group or whatever to have the same split.

AED said: First, "clutch" was defined as any plate appearance in the 3rd-5th innings. What I would suggest is to define this the same way as you did for innings 7+ (tie, or tying run on base, at bat, or on deck), but for innings 3 through 5. What this does is that it keeps the same distribution of men on base / bases empty, but in the early innings.

Posted 12:26 a.m., February 5, 2004 (#58) - AED
  David, I don't have the data in the right form to search for players with large platoon splits, but I can eliminate platoon effects by selecting only plate appearances against RH pitchers. Overall, I have a hard time seeing that this makes any difference -- with one clutch definition, such as the GABSB definition, the complete clutch sample gives a marginally larger spread; with another, such as my 8th/later definition, the RHP-only sample gives a marginally larger spread.

I agree about the secondary effects regarding RISP; that is why I didn't make a big deal of it (even though the statistical significance was greater).

Tango, I indeed did the reverse by creating a clutch definition that considered only inning and score (not baserunners), as noted in #55.

Posted 7:06 a.m., February 5, 2004 (#59) - David Smyth
  ---"I think it's a fair point that if AED (and I) will define clutch in such a way that an abnormal number of those situations are with men on, then we should define the nonclutch or control group or whatever to have the same split."

I agree. Over the years 2000-2002 (I don't have 1999), Tejada hit approx. .332/.528 with runners on, and only .313/.437 with bases empty. So this suggests that a disproportionate share of his clutch PAs as defined by Tango were with ROB. And if Tejada has the same sort of split by ROB/BE in both clutch and non-clutch situations, this suggests that his skill is hitting with ROB, not game-on-the-line resolve.

Posted 9:14 a.m., February 5, 2004 (#60) - tangotiger
  DAvid, in Tejada's case, that may very well be.

But, as AED just noted in his previous post, he created, in a subsequent test, a clutch/nonclutch split based only on the inning/score. That is, the base/out distribution would be the same in both samples, and he still found a difference.

Posted 3:48 p.m., February 5, 2004 (#61) - tangotiger (homepage)
   
DATA
At this link, you will find data in various crucial situations. I invite analysts and statisticians to use this data for research purposes, and to make public their findings.
Note that this data was not exactly the same as what I have shown in my article. I changed the leverage categories so that there are roughly an equal number of PAs in each grouping. This might make it a little easier for analysis.

Posted 4:50 p.m., February 5, 2004 (#62) - AED
  Thanks, Tango. Just for reference, what are the average LI's of the five groups?

Posted 5:04 p.m., February 5, 2004 (#63) - tangotiger
  Good question. Here are the avg LI and the avg PA

0: 0.14 / 331
1: 0.50 / 346
2: 0.82 / 347
3: 1.19 / 352
4: 2.29 / 345

ALL: 0.995 / 1720

Posted 5:08 p.m., February 5, 2004 (#64) - tangotiger
  Just for those kind of new to LI, what that first line says is:

The average PA in low-pressure situations had 14% the swing of a regular PA. If you expect 1 run to normally generate 0.100 wins, then a run generated during this low-pressure would generate only 0.014 wins.

***

It's also important to note that how a player perceives a pressure situation does not necessarily imply that it actually is a crucial situation. LI really gives you the crucialness of the situation. I have no idea how a player establishes the pressure of a situation (if he even does).

Posted 6:53 p.m., February 5, 2004 (#65) - AED
Tango, there's a mistake in your calculations. The estimate of true talent has an uncertainty, not just the clutch measurement. So the "model SD" equals:
sqrt( 1.08 * lwtsOBA * (1 - lwtsOBA) * (1/Nclutch + 1/(Nnonclutch + 209)) )
You left out the 1/(Nnonclutch+209) term. Correcting this and rerunning, I find very good agreement between our respective results for plain OBA in the clutch.

Making this correction, the standard deviation of clutch effect is around 0.01 clutch lwtsOBA, or about 0.2 wins per season.
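
With illustrative inputs (the .345 lwtsOBA and the PA counts are made up; 209 is the regression constant from the formula above), the corrected formula works out to:

from math import sqrt

def model_sd(p, n_clutch, n_nonclutch):
    var = 1.08 * p * (1 - p) * (1 / n_clutch + 1 / (n_nonclutch + 209))
    return sqrt(var)

print(model_sd(0.345, 345, 1375))   # ~0.029 with these illustrative PA counts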

Posted 10:46 p.m., February 5, 2004 (#66) - tangotiger
  I think I did that in a separate, but non-disclosed, calculation. That is, I did the SD as I showed it, but then I regressed the SD by the 209 calculation to get a true talent clutch performance. I believe that I got the 1 SD = .01 or so. The 0.2 wins for 1 SD would conform to the 0.6 wins for 3 SD that I showed in the end.

However, let me double-check, to make sure that I did everything right. Thanks for double-checking!

Posted 11:01 p.m., February 5, 2004 (#67) - Cyril Morong(e-mail) (homepage)
I have never done a monte carlo study, so I can’t comment on the method. But I did present research on clutch hitting at SABR32 in Boston. I used a method that Pete Palmer used for testing which players’ performance in the clutch was not due to chance. I looked at players who had 6000 or more plate appearances during the 1987-2001 period. I had 61 players and I looked at their OBPs when it was close and late and not close and late (OBP with intentional walks removed). I found 6 players were outside the range of + or – 1.96 in their Z-scores (Z = (CLUTCH AVG – NONCLUTCH AVG + EXPECTED DIFFERENCE)/SD, with SD being the standard deviation). Below are the players and their Z-scores. The first 3 are the possible clutch hitters and the last three are the possible chokers. But we would expect 1.5 players to have a Z-score of at least 1.96 and 1.5 to be below –1.96. So at most, we have 1.5 clutch hitters and 1.5 chokers. But we cannot be sure which guys are outliers because of some actual clutch ability (or choking) and which ones got there by chance (a point Willie Runquist has made). In any case, there are not many clutch hitters here. I can email my handout from this presentation to anyone who wants it.

Edgar Martinez 2.216
Mark Grace 2.144
Tino Martinez 2.029
B.J. Surhoff -2.063
Travis Fryman -2.158
Ken Caminiti -2.390

Edgar Martinez and Tino Martinez could be labeled as power hitters. Also, the author of this study, AED, said that a .250 hitter won’t become a .400 hitter in the clutch, but a .285 hitter can become a .300 hitter. Suppose you get 150 at-bats in some clutch situation during a season. The .015 differential is only 2.25 hits. That 150 at-bats is probably high, since I think only about 15% of plate appearances come when it is close and late. 1 or 2 hits seems like a small difference, and it suggests to me that the clutch hitting ability found here is very small. I also have some clutch hitting research at my website.

Posted 11:02 p.m., February 5, 2004 (#68) - Cyril Morong
  I think including players who had 1000 plate appearances overall and 250 in the clutch is still a pretty small number. This may only be 2 or 3 years of playing, and that may not be very long. I think more plate appearances would reduce the effect of randomness.

Posted 11:45 p.m., February 5, 2004 (#69) - AED
  Tango, I'd be surprised if you included the uncertainty of the talent estimate. I was able to reproduce your results pretty well without it.

Cyril, that's not the best way to look at it. The probability of getting 6 or more 2-sigma deviations out of 60 points, purely from randomness, is 8%. So by trying to count the number of X-sigma deviations, you'll never measure anything subtle. What I've done is measure the cumulative probability of all the data being created from randomness alone, and what clutch hitting distribution was needed to make it consistent.

Strictly speaking, chi^2 (the better-known version of z-scores) is based on Gaussian, not Poisson/binomial statistics. So you won't get as accurate an answer as you would using the binomial probabilities, but as long as the sample sizes are large, the difference isn't that big.
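
The 8% figure is just a binomial tail, and is easy to check (a one-liner, assuming scipy is available):

```python
from scipy.stats import binom

# P(|z| > 1.96) = 0.05 per player, so 60 players should produce about
# three 2-sigma deviations by chance alone; the odds of six or more:
print(binom.sf(5, 60, 0.05))   # ~0.08, i.e. AED's 8%
```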

Posted 9:25 a.m., February 6, 2004 (#70) - Cyril Morong
  "What I've done is measure the cumulative probability of all data being created from randomness alone..."

Is this what the Monte Carlo method did?

Posted 12:09 p.m., February 6, 2004 (#71) - AED
The equation in the appendix gives the cumulative probability. The Monte Carlo test gives a large number of "randomness alone" samples for comparison.
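
In other words, something like the following sketch, where the player abilities and PA counts are placeholders of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

def null_clutch_spread(oba, clutch_pa, n_players, n_leagues=10000):
    # Simulate many leagues in which clutch OBA is pure binomial noise
    # around each player's true OBA, then record the spread of observed
    # clutch OBAs in each simulated league.  The real spread is then
    # compared against this "randomness alone" distribution.
    sims = rng.binomial(clutch_pa, oba, size=(n_leagues, n_players))
    return (sims / clutch_pa).std(axis=1)

spreads = null_clutch_spread(oba=0.340, clutch_pa=300, n_players=340)
print(spreads.mean(), spreads.std())
```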

Posted 12:22 p.m., February 6, 2004 (#72) - RossCW
  AED -

Have you used this method on Batting Average on balls in play?

Posted 1:51 p.m., February 6, 2004 (#73) - AED
  Ross, there is an inherent spread in BABIP among pitchers that can be measured; I don't think there's much question of that. The greater question is how much is due to ballpark effects, quality of fielding, pitcher's skill, and luck. Go to Tango's clutch hitting study and click the "solving DIPS" link for a discussion of the topic.

Posted 2:04 p.m., February 6, 2004 (#74) - J Cross
AED, maybe you could use this method to study the BABIP of pitchers who changed teams. I'd imagine that there would be almost no correlation in ballpark effects and quality of fielding for pitchers on a new team.

Posted 6:08 p.m., February 6, 2004 (#75) - Cyril Morong
In my study of players with 6000 or more PAs during the 1987-2001 time period, only 5 of the 71 players had a close-and-late batting average that was .010 or more different from their non-close-and-late AVG. The biggest differential was for Tino Martinez at .028 (.297 vs. .269). In a 660 at-bat season, with 15% of at-bats coming when it is close and late, this amounts to about 2.77 hits (660*.15*.028). That seems pretty small for the very best clutch hitter.

How many games would this win? I guess we would have to look at the expected runs and wins tables. But close and late can mean a lot of things. We don't need anyone on base, for instance. So it might not be easy to estimate it that way.

In basic linear weights, a single is worth .47 runs. Suppose we double its value for close and late (my guess is that this is high, and I hope it offsets the fact that I am only talking singles here). .94*2.77 is just 2.6 runs, or about a quarter of a win. (Again, I raised the run value of the single to try to take into account the extra value of close and late.) This seems like a very small difference for the best clutch hitter. And almost everyone else is within .010, so for the vast majority of players, their clutch hitting ability's impact on winning is very small.
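
The arithmetic above, spelled out in a few lines (the ~10 runs per win conversion is my assumption, not Cyril's):

```python
# Back-of-envelope: extra clutch hits for the best clutch hitter,
# valued as singles with a doubled close-and-late run value.
ab, clutch_share, avg_diff = 660, 0.15, 0.028
extra_hits = ab * clutch_share * avg_diff   # ~2.77 hits
runs = (0.47 * 2) * extra_hits              # ~2.6 runs
print(extra_hits, runs, runs / 10)          # ~0.26 wins at ~10 runs/win
```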

The normal difference during the 1991-2001 period was -.012 (probably because of good relievers, especially ones throwing from the same side the hitter bats). Even if we give Tino Martinez this extra .012, we are still talking about a very small impact on winning.

Then there is the question of using this information. Player A has a higher OPS (or whatever) when the game is not close and late than player B, say by 50 points. Player B has a higher OPS when it is close and late. How big does this differential have to be for us to want B on our team and not A? My guess is very big, maybe bigger than anyone would get. A quick estimate based on my own research on team clutch is that player B's close-and-late OPS would have to be about 130 points better than A's. Maybe AED could tell us some pairs of players whose relative ranking we have to change based on clutch ability.

In my study, every player had at least 711 at-bats when it was close and late. 22 were over 1000. I think if AED looked at only players like this, he would find less clutch hitting ability.

Also, at a team level, I think clutch play matters just about the same as non-clutch play. This was my research presentation at SABR33. It is posted at my site; there is a link above. One of the things I did was to use team winning percentage as the dependent variable and team OPS and opponents' OPS as the independent variables. Then I broke OPS down into close and late and non-close and late. There was very little increase in the r-squared, or explanatory power, of the model. Almost all of the increase came from breaking down opponents' OPS into close and late and non-close and late. That could be due to bullpen strength and not necessarily clutch ability.

Also, the coefficients on the non-close and late OPS were much higher than for close and late. So a .010 increase in non-close and late OPS would increase team winning percentage more than a .010 increase in close and late OPS. Now, there are more non close and late PA's but the close and late PAs are supposed to matter more, so at least in theory, the coefficient on the close and late OPS could have come in higher. But it did not.

Posted 6:56 p.m., February 6, 2004 (#76) - tangotiger
  I think clutch play matters just about the same as non-clutch play

No, that can't be right. I've posted the LIs, and they are very significant.

You may not find it at the team level, because of the amount of noise, but you can certainly and easily find it at the granular level.

The effect is small, but that's because there aren't many clutch situations to begin with. Only 20% of the PAs had an LI of around 1.5 or greater. Shoot that up to an LI of 2.5, and that percentage goes down to a bit over 5%.

Posted 7:49 p.m., February 6, 2004 (#77) - Cyril Morong
I did not say that clutch play was not significant, just that close-and-late OPS contributes no more to winning than non-close-and-late OPS. In my regression, the coefficient on close-and-late OPS was significant.

Posted 8:03 p.m., February 6, 2004 (#78) - AED
Cyril, I'm not sure what you're getting at with the correlations of non-close-and-late vs. close-and-late OPS with winning. It looks like you're trying to measure LI the hard way. The combined importance of the 15% of PAs that meet your "close and late" definition is 43% that of the other 85% of PAs. Since that ratio would be 18% if LI were constant, this means that the average LI of "close and late" PAs is 2.5 times that of "not close and late" PAs, or mean LIs of 2.0 and 0.8, respectively.

I think we agree that clutch hitting accounts for at most a few hits per year. The issue is leverage. If a player's clutch skill moves 2 hits from "non-clutch" to "clutch" situations (keeping overall batting average constant), this adds about 0.2 wins for the average LI in your definition, more under Tango's definition (which is selected by LI).
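
The leverage arithmetic in the first paragraph, worked through (the coefficients are the ones quoted from Cyril's regression later in the thread):

```python
clutch_share = 0.15
importance = (0.345 + 0.421) / (0.918 + 0.845)   # ~0.43: clutch vs. non-clutch
flat_ratio = clutch_share / (1 - clutch_share)   # ~0.18 if LI were constant
li_ratio = importance / flat_ratio               # ~2.5
# Mean LI must be 1 overall: share*li_cl + (1-share)*li_ncl = 1
li_ncl = 1 / (clutch_share * li_ratio + (1 - clutch_share))
print(li_ratio, li_ratio * li_ncl, li_ncl)       # ~2.5, ~2.0, ~0.8
```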

Posted 9:05 p.m., February 6, 2004 (#79) - David Grabiner(e-mail)
  A study in _The Hidden Game of Baseball_ found that relief aces' plate appearances had twice the leverage of average plate appearances. Thus I have estimated the relative importance of hitting in the clutch as double that of overall hitting, when clutch was measured as about 1/6 of the plate appearances.

This means that a good clutch hitter who has an ability to hit .270 overall but .276 in the clutch is as valuable as an average clutch hitter who has an ability to hit .272 overall and .266 in the clutch. Such a player would get 1.2 extra clutch hits a year, based on 100 clutch AB.

Posted 10:14 p.m., February 6, 2004 (#80) - RossCW
  there is an inherent spread in BABIP among pitchers that can be measured; I don't think there's much question of that. The greater question is how much is due to ballpark effects, quality of fielding, pitcher's skill, and luck.

I see the problem.

In basic linear weights, a single is worth .47 runs. Suppose we double its value for close and late (my guess is that this is high, and I hope it offsets the fact that I am only talking singles here). .94*2.77 is just 2.6 runs, or about a quarter of a win.

It's pretty clear that the relationship between hits and runs is not linear - the value in terms of runs scored rises as the number of hits rises. And the value of hits is higher in clutch situations precisely because they lead to more wins than hits in non-clutch situations.

Posted 10:56 p.m., February 6, 2004 (#81) - tangotiger
  A study in _The Hidden Game of Baseball_ found that relief aces' plate appearances had twice the leverage of average plate appearances. Thus I have estimated the relative importance of hitting in the clutch as double that of overall hitting, when clutch was measured as about 1/6 of the plate appearances.

In my post 63, I had the 20% of PAs with the most leverage at an LI of about 2.3. Granted, different run environments would have different LIs.

Posted 11:16 p.m., February 6, 2004 (#82) - Cyril Morong
  AED wrote: "The combined importance of the 15% of PAs that meet your "close and late" definition is 43% that of the other 85% of PAs."

How did you get 43%? What was in the numerator and what was in the denominator?

Posted 12:20 a.m., February 7, 2004 (#83) - AED
It's pretty clear that the relationship between hits and runs is not linear

For the team, yes, since aside from a HR you need more than one "positive event" to score a run. Likewise for pitching. For hitters, though, it is basically linear. The result of one at-bat only affects that player's next at-bat if 10+ batters come up in one inning. Otherwise, there's plenty that affects the importance of a batter's at-bats, but the batter himself isn't responsible for it.

Cyril: I used the coefficients you measured: (0.345+0.421)/(0.918+0.845). I assumed the difference between your offense and defense coefficients was probably just from random noise. (Otherwise, it would seem that giving up a run when ahead 4-3 in the bottom of the ninth hurts you more than scoring a run when down 4-3 in the bottom of the ninth helps your opponent.)

Posted 1:26 p.m., February 8, 2004 (#84) - RossCW
  "Its pretty clear that the realtionship between hits and runs is not linear"

For the team, yes, since aside from a HR you need more than one "positive event" to score a run. Likewise for pitching. For hitters, though, it is basically linear. i>

I don't see any way to test the question for individual players. I thought the whole point of linear weights was to assign a run value to an individual action, even though runs are scored by teams. A good hitting team gains more than a bad team from the same offensive action.

The result of one at-bat only affects that player's next at-bat if 10+ batters come up in one inning. Otherwise, there's plenty that affects the importance of a batter's at-bats, but the batter himself isn't responsible for it.

That is true in one sense - but it isn't really true if a player has the ability to hit better in some situations than others. The number of wins created by home runs in "clutch situations" will vary depending on how you define that term. And it is not likely that variations in situational hitting performance are limited to clutch situations.

Posted 1:54 p.m., February 8, 2004 (#85) - Charles Saeger(e-mail)
  Ross ... oops?

Posted 1:55 p.m., February 8, 2004 (#86) - Charles Saeger(e-mail)
  Alright
oops

Posted 4:42 p.m., February 8, 2004 (#87) - AED
  A good hitting team gains more than a bad team from the same offensive action.

Sure, a batter hitting behind a great OBP hitter means that he will have a disproportionate number of at-bats with a man on base, and thus his at-bats have greater leverage. However, since he is not himself responsible for that increased leverage, his impact on the game outcome is linear and should be treated as such.

The easiest way to look at it is that each player goes to the plate with a certain probability of his team winning, and finishes his plate appearance with a different probability of a win. The difference between the two is very closely related to Tango's lwtsOBA for the outcome of the at-bat multiplied by the LI. The lwtsOBA is the player's responsibility; the LI is not.
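
As a sketch of that decomposition (the runs-to-wins divisor is an assumption, and the numbers are illustrative):

```python
def win_change(lwts_runs, li, runs_per_win=10.0):
    # The batter owns the lwts run value of the outcome; the game state
    # owns the LI.  Their product, scaled to wins, approximates the
    # change in win probability across the plate appearance.
    return lwts_runs * li / runs_per_win

print(win_change(0.47, 2.0))    # single in a high-leverage spot: ~+0.09 wins
print(win_change(-0.27, 0.5))   # out in a low-leverage spot: ~-0.014 wins
```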

Posted 10:30 p.m., February 8, 2004 (#88) - RossCW
  Sure, a batter hitting behind a great OBP hitter means that he will have a disproportionate number of at-bats with a man on base, and thus his at-bats have greater leverage. However, since he is not himself responsible for that increased leverage, his impact on the game outcome is linear and should be treated as such.

It seems to me that once you accept the notion that batters' hits are not randomly distributed, this is no longer true. You cannot assume that you can assign an identical value to each hit - if the player had had more opportunities with men on base, they would have had more hits. Or perhaps they would have had more if they had played on a poorer team with fewer clutch situations.

The easiest way to look at it is that each player goes to the plate with a certain probability of his team winning, and finishes his plate appearance with a different probability of a win. The difference between the two is very closely related to Tango's lwtsOBA for the outcome of the at-bat multiplied by the LI. The lwtsOBA is the player's responsibility; the LI is not.

But the actual average impact of an at-bat is to reduce the offensive team's chances of winning. At least if you accept the win probabilities that have been published here. The net for offensive players is negative by a considerable margin. I don't think that is true of the linear weights analysis.

Posted 11:59 p.m., February 8, 2004 (#89) - AED
  Ross, I think we're talking about different things. You initially raised the point that run creation is not linear, which I interpreted as meaning how runs are scored within one inning. While that is true, each player's contribution is linear -- enter with X% chance of winning, leave with Y% chance of winning, and Y-X is closely related to the lwts of the outcome times the leverage index.

The net offense is zero, pretty much by definition. There will obviously be rounding issues; I think the study you are thinking of was in error by something like 0.0002 wins/PA -- it adds up over a lot of seasons, but the error itself is negligible.

Posted 1:12 a.m., February 9, 2004 (#90) - RossCW
  The net offense is zero, pretty much by definition

I think this is a common mistake. There is no reason to think that the net offense will be zero. The zero-sum is for both offense and defense. Take the example where teams trade the lead every inning. At the end of every half inning the average at bat for the offense in that inning will have increased the chances of the offensive team winning. You can make examples of similar effect where the team on offense loses ground in almost every inning.

And in fact, when you look at all game situations and how likely a team is to win before and after each at bat, the average change is negative for the team at bat. Teams lose ground on offense and make it up when they are in the field.

Posted 1:22 a.m., February 9, 2004 (#91) - RossCW
  You initially raised the point that run creation is not linear, which I interpreted as meaning how runs are scored within one inning.

What I was actually referring to was over the course of a season, but the point is the same - if players' performance varies by the situations they are in, then you cannot attach an average value to a typical outcome. There is no typical single, since how many singles a player hits depends on the actual game situations they were in, and the actual value of their singles to the team also depends on those situations.

Posted 2:26 a.m., February 9, 2004 (#92) - AED
  Ross, the median change is negative, in that over 50% of at-bats result in outs. However, the average change is zero because the positive changes tend to be larger than the negative changes. The only way to get net offense to be nonzero for the entire league is to estimate the run-scoring environment correctly. If your metrics are set up for 4.80 runs per 9 innings and the league average is 4.90, you'll get a positive net offense and equally negative net defense.

I don't quite follow your second point. By using the LWTS run value for an event and multiplying it by the LI and dividing it by the average wins/runs scaling ratio, you do account for different performance in different situations. In other words, if a strikeout costs your team -0.027 wins on average but the LI is 2.0, then a good first-order estimate is that by striking out, you cost your team -0.054 wins. If you happen to strike out when LI is 2.0 and happen to hit singles when LI is 0.5 (i.e. you're choking), a leverage-weighted lwts estimate will show you as choking. So there are some finer details that get swept under the rug with this approximation, but since detecting any clutch skill is on the hairy edge of our statistical abilities, I think this is good enough.

Note that it is still linear. The total amount of harm done to my team if I strike out in two games with the bases loaded in the 9th and down a run is exactly double the harm done had I only done that once. So the total effect is indeed the sum of the individual effects.

Posted 8:58 a.m., February 9, 2004 (#93) - Alan Jordan
  Tango,
I don't know if you got my email or not, but I analyzed the data that you put up for OBA by the five categories of LI. I used logistic regression. There simply wasn't a statistically significant effect for differences in clutch hitting. The data could be explained by differences in players' hitting ability and LI. Adding terms for players hitting differently by LI didn't add any predictive ability.

I can provide details if anyone is interested.

Posted 10:20 a.m., February 9, 2004 (#94) - tangotiger
  Alan, Yes, I did get your email, but I did not have easy access to my data to confirm your request. I do now, and I will let you know in 2 minutes, if I have any bugs.

I would also appreciate you posting your findings, as well as an explanation as to how to (and how not to) interpret your results.

Posted 10:22 a.m., February 9, 2004 (#95) - tangotiger
  Alan, the data is correct.

Posted 12:03 p.m., February 9, 2004 (#96) - Alan Jordan
  I'll write it up tonight. Work calls.

Posted 12:40 p.m., February 9, 2004 (#97) - tangotiger
  stat-savvies:

1 - When I presented the standard deviation of the standardized scores, and I reported a figure of 1.12 (based on 340 samples), how can I figure out the statistical significance of this figure? For example, if I did it based on only 30 players, that would be different from 340. Can I assume that after a certain threshold, say 30 or 40, the figure I reported, 1.12, is essentially 1.12 +/- .005 99% of the time?

2 - How about the effect of the park? Now, I think we are saved pretty well in this regard. Take Tejada: he'd have an equal split of PAs in Oakland and away in each of the 5 categories. Or Giambi, who split his home time between Oakland and NYY - same issue. While the park does add a variance distribution that we have to consider, is it negligible because of the way it's being handled in this case?

(I can see that if we were looking at overall OBAs for all 340 hitters, we'd have to consider the variance of the park, similar to what we did with pitchers.)

Thanks

Posted 12:41 p.m., February 9, 2004 (#98) - RossCW
  Ross, the median change is negative, in that over 50% of at-bats result in outs. However, the average change is zero because the positive changes tend to be larger than the negative changes

No. The mean is negative. While the positive changes may tend to be larger, they are not large enough. I gave you an example. I don't think there is any way for both offensive teams to have a better chance to win after every inning and still have a zero average for the offenses at the end of the game.

you'll get a positive net offense and equally negative net defense.

I was talking about win probabilities, and I was talking about both offense and defense. The study that was done here was based on the actual probability that a team would win given a certain score, inning, number of outs, and players on base. These probabilities were then compared before and after each plate appearance. On average, the offensive team lost ground on each plate appearance. And the defense gained ground on average.

Posted 12:46 p.m., February 9, 2004 (#99) - VoiceOfUnreason
  "But the acutual average impact of an at bat is to reduce the offensive team's chances of winning. At least if you accept the win probabilities that have been published here"

The first of these statements is, on its own, clearly incorrect. Probability is conserved, after all.
P(W) = P(A)P(W|A) + P(B)P(W|B) + ..., where A, B, etc. are a space of mutually exclusive events.

Working in innings (because it's a bit easier conceptually), a team's chance of winning immediately prior to their next turn on offense is the probability that they score no runs times the probability that they win if they score no runs in the next inning, plus the probability that they score one run times the probability that they win if they score exactly one run in the next inning, plus....

This is mathematically identical with the weighted average of the probabilities that they will win after their next inning.

Now, the following things are true, given the nature of baseball scoring in today's game: the most common event of a team's offensive innings decreases their chances of winning (mode), the middle result of the possibilities also decreases the chances of winning (median). But the median (average) outcome is neutral.

You can do the math by innings, half innings, plate appearances, pitches, it doesn't matter - probability is conserved.

Also note that this does not in any way depend upon the fact that the game is zero-sum. What that constraint does is ensure that the sum of the probabilities of each team winning at any moment is unity. But in a different game, where perhaps under certain circumstances both teams could win, that constraint could be violated. The conservation of probability would still hold.

Now, if the win probabilities published here are not consistent with this, then perhaps they are in error, or some subsequent calculations introduced an error. That would be useful to know.

Posted 1:29 p.m., February 9, 2004 (#100) - RossCW
  You can do the math by innings, half innings, plate appearances, pitches, it doesn't matter - probability is conserved

Yep. Every time the probability that the offensive team will win declines, the probability that the defensive team will win increases.

Now, if the win probabilities published here are not consistent with this,

The win probabilities are completely consistent with this. If you look at the numbers for defense and offense they total to zero. But the defense is a net positive and the offense is a net negative.

Posted 1:33 p.m., February 9, 2004 (#101) - AED
  Ross, if the mean is negative, it's because of an error in the computation process. As I noted on the thread on this topic, the error amounts to 0.0002 wins/PA, so it's not like it's a huge problem in his methods. However, the true total offense is indeed exactly zero, as is the mean outcome of a plate appearance.

Posted 1:37 p.m., February 9, 2004 (#102) - tangotiger
  "But the median (average) outcome is neutral.", median should read as mean of course.

***

VoiceOfUnreason speaks with great reason, as does AED.

The change in win probability by the offense, on a league level, will always be zero. This is not true at a team, game, inning, or PA level.

It is only true when the underlying expected run environment that generates the win probability tables EXACTLY equals the actual run environment. Furthermore, that run environment is a matter not only of runs per game, but of the distribution of those runs by game and by inning. Fortunately, a Markov chain that only distinguishes by base/out gets us 99.9% of the way there.

The bottom line is that the win prob tables that I generate and use (which are after-the-fact on a league level) will automatically preserve that off=def=0 on a league level.

Obviously, when Pedro is pitching, that's not the case. (And this opens up other thoughts for discourse that is more appropriate with the "Custom LWTS" thread from a few months ago.)

Posted 1:59 p.m., February 9, 2004 (#103) - RossCW
  However, the true total offense is indeed exactly zero, as is the mean outcome of a plate appearance.

That is highly unlikely since there is absolutely no reason it has to be.

The bottom line is that the win prob tables that I generate and use (which are after-the-fact on a league level) will automatically preserve that off=def=0 on a league level.

I accept that if you assume they will be zero they will be zero, but that does not accurately reflect the actual probability as measured. And there is no reason to think it will.

The change in win probability by the offense, on a league level, will always be zero. This is not true at a team, game, inning, or PA level.

You will have to explain how a positive win probability for the offensive teams in one game is accompanied by a negative win probability in some other game.

I gave the example of teams trading leads every inning.

Is it agreed that the win probability of the offensive team went up after each inning? Does that not mean that the average plate appearance in those innings increased the win probability for the offensive team? Doesn't that mean that at the end of the game the average plate appearance by the offense was positive? How does this get made up for in other games? And why would one believe it will?

Posted 2:23 p.m., February 9, 2004 (#104) - AED
  Tango, to answer your question from #97...

The values you are measuring are (variance)/(expected variance). The expectation value for this is 1.0; the random variance in this is 2.0.

For multiple measurements, the random variance drops as 1/N while the expectation value remains 1.0. So for 340 measurements, you would expect the average to be 1.0 with a random variance of 1/170 or a random standard deviation of sqrt(1/170)=0.077.

If you measured a value of 1.12, there is a 6% chance that a value of 1.12 or higher could have arisen purely from chance.

However, I mentioned I was unable to reproduce your findings when I included the uncertainty of the regressed averages. When including this factor, I measure a variance ratio of 1.04, which is well within the noise.

As for your other question, park effects shouldn't really matter, since players are in the same park for low-LI and high-LI at-bats.
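
A quick check of those numbers (scipy assumed available):

```python
import math
from scipy.stats import norm

n = 340
sd = math.sqrt(2.0 / n)             # random SD of the mean variance ratio
print(sd)                           # ~0.077
print(norm.sf((1.12 - 1.0) / sd))   # ~0.06: the 6% chance figure above
```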

Posted 2:40 p.m., February 9, 2004 (#105) - VoiceOfUnreason
  Gah, I need to get caught up. I think I finally understand what Ross is talking about (I don't agree yet, but it no longer seems stubbornly insane), but then Tango lost his mind. Sigh.

Posted 2:44 p.m., February 9, 2004 (#106) - RossCW
  As I noted on the thread on this topic, the error amounts to 0.0002 wins/PA,

The data provided was not based on a theoretical model - but actual probabilities. Unless someone has rerun the data and gotten a different result, asserting there is an error doesn't mean there is one.

Posted 3:08 p.m., February 9, 2004 (#107) - AED
  Ross, it's not that Tango is making an arbitrary assumption that the average probability change should be zero. It's that accurate probabilities will result in a situation where the average probability change should be zero. If the average change is not zero, it is the probabilities that are in error.

Suppose that I assign a 0.90 win probability for the home team in a game tied heading into the bottom of the 9th inning, for example. Obviously at the end of that inning, the home team's win probability will be either 1.00 (if they scored) or 0.50 (if they didn't), which means that the home team's offense gets credited either with +0.10 wins with a score or -0.40 wins if they don't score. Now suppose that only 30% of such teams actually score in this situation, meaning that the "average" wins produced by offenses in this situation is -0.25 wins. By your argument, I would conclude that offenses in this situation tend to produce -0.25 wins. However, that's not right; the reason for the -0.25 win average was that my 0.90 win probability was a mistake and should have been 0.65. Setting the win probability to equal the actual odds that a team in that situation will win the game, the net wins produced equals exactly zero.
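
The same example in a few lines, to show why miscalibration, not the offense, produces the nonzero average:

```python
p_assigned, p_score = 0.90, 0.30   # assumed win prob; actual scoring rate
# Average credited "offense" under the bad probability:
print(p_score * (1.00 - p_assigned) + (1 - p_score) * (0.50 - p_assigned))  # -0.25
# Calibrated probability, and the average change it implies:
p_true = p_score * 1.00 + (1 - p_score) * 0.50                              # 0.65
print(p_score * (1.00 - p_true) + (1 - p_score) * (0.50 - p_true))          # 0.0
```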

Posted 3:24 p.m., February 9, 2004 (#108) - tangotiger
  It's that accurate probabilities will result in a situation where the average probability change should be zero

On a league level (say 1986 NL), it must come out to zero. If you have some 1-0 game where obviously pitching dominated (and the offense was negative), you'll have some other game that same year that went 11-8 where hitting dominated (and offense was positive). The season, in its totality, is zero=off=def.

But, does it have to be?

Take the case of a T-ball league, with 50 fielders per team. In this case, you will probably find that the standard deviation of runs scored will be far higher than the standard deviation of runs allowed. It is those standard deviations (over a large enough sample) that establishes how much gain there is due to off and def.

But, even so, what impact does that give you? You still start off at win=.50/.50 and you still end up with win=1.00/.00.

So, you will have one team that will be off= +.50, and the other team that will be off= -.50 (and def=0 for both). Over the course of the whole league, what do you think will happen? Well, if one team has off=+.50, and another has off=-.50, then the league will be off=0.

But, we already knew that because
off+def=0 (by definition)
And, we constructed a league where def=0. So, off has to be zero.

In MLB, the standard deviation of runs scored and runs allowed, over the last 100 years or so, is virtually identical. But, just like in the T-ball league, this information is not needed to ensure that off=def=0 on a league level.

The value of the variance information is to establish how to split up each of the PAs, one at a time. If you are in the T-ball league noted above, you give the full change in the win prob to the offense. In MLB, it's not that easy (and really, we could discuss this at length for hours and days).

The net effect is that off=def=0 on a league level.

Posted 3:42 p.m., February 9, 2004 (#109) - RossCW
  It's that accurate probabilities will result in a situation where the average probability change should be zero.

That is not true - the average (net) probability changes for the offense and the defense have to sum to zero, but there is no requirement that they each be zero. One can be positive to the extent the other is negative.

Obviously at the end of that inning, the home team's win probability will be either 1.00 (if they scored) or 0.50 (if they didn't),

Actually I doubt that is correct - my guess is that the actual probability the home team will win is greater than 50-50 even if they didn't score.

Now suppose that only 30% of such teams actually score in this situation, meaning that the "average" wins produced by offenses in this situation is -0.25 wins.

I don't follow this at all. The data you are disputing is not a theoretical construct. We know how many teams will win in each situation based on past observation. That is to say, if there is a 90% probability that the home team will win, it is because the observed data found that the home team actually won 90% of the games where it was in that situation. If their chances of winning at the end of the inning were 55%, it is because 55% of home teams win when the game is tied at the end of nine innings. The chances of the home team winning dropped by 35% over the course of that inning. The chances of the visiting team winning increased by 35%.

When you observe the actual data, the findings were that on average the team in the field will increase its chances of winning while the team at the plate will decrease its chances of winning. This is based on measuring the actual chances of winning before and after each plate appearance.

Posted 3:52 p.m., February 9, 2004 (#110) - RossCW
So, you will have one team that will be off= +.50, and the other team that will be off= -.50 (and def=0 for both).

If you assume defense is zero, then you have to assume offense is zero. I see no reason to assume either if we are talking about the impact on the probability a team will win. It ought to be obvious that the team that wins is fielding when the game ends. (The exception being walk-off victories.) Their chances of winning clearly increased while they were in the field. When Torii Hunter catches a ball or Randy Johnson strikes out a batter, they are increasing their team's chances of winning.

Posted 4:31 p.m., February 9, 2004 (#111) - tangotiger
  To clarify this statement:

But, even so, what impact does that give you? You still start off at win=.50/.50 and you still end up with win=1.00/.00.

That should read explicitly as:

But, even so, what impact does that give you? You still start off the game at win=.50/.50 and you still end up the game with win=1.00/.00.

Posted 4:51 p.m., February 9, 2004 (#112) - RossCW
  What you have here is actual data that shows that the actual probability of a team winning increases more while they are fielding than when they are batting. This data is being rejected as in error because it fails to match a theoretical construct that says that won't happen.

Posted 5:09 p.m., February 9, 2004 (#113) - VoiceOfUnreason
  "What you have here is actual data that shows that the actual probability of a team winning increases more while they are fielding than when they are batting"

You've said this a number of times. Telling me doesn't seem to help. Can you show me? I'd consider a satisfactory demonstration any which illustrates an average (mean) change in a team's probability of winning when they are batting. You don't even have to do the fielding part.

Now, what I would expect to see is a probability of winning before the team bats, a list of states after the team bats, a winning probability associated with each, and a transition probability associated with each. But I'll be quite startled if you do it that way and demonstrate a change, so don't feel locked into that approach.

I expect that, seeing how you reach your conclusion, I will be able to work out what assumption you and I don't share.

Posted 6:15 p.m., February 9, 2004 (#114) - J Cross
  actual probability of a team winning increases more while they are fielding than when they are batting.

Ross, take a step back and look at this statement. If the AVERAGE turn of will lead to "x" probability of winning at the end of the inning then, as I understand it, x is by definition the probability of winning at the beginning of the inning.

Posted 7:07 p.m., February 9, 2004 (#115) - RossCW
Voice of Unreason - here is the link (http://www.livewild.org/bb/index.html, posted here) to the study I am referring to. If you add up all the numbers for the offensive players and all the numbers for the pitchers (he did not consider fielding), the combined total is 0. But when pitchers are on the mound, the chances of their team winning increase and the chances of the team hitting decrease - and of course the converse is true as well.

If the AVERAGE turn of will lead to "x" probability of winning at the end of the inning then, as I understand it, x is by definition the probability of winning at the beginning of the inning.

I'm sorry, I really don't understand that.

Posted 7:22 p.m., February 9, 2004 (#116) - VoiceOfUnreason
  "But when the pitchers are on the mound the chances of their winning increases and the chances of the team hittin decreases - and of course the converse is true as well"

This is the bit that I don't see - can you be more specific about the evidence that leads to this conclusion?

Posted 7:52 p.m., February 9, 2004 (#117) - J Cross
  Ross, I left out a word. Should have been "if the AVERAGE turn of events will lead."

Posted 8:18 p.m., February 9, 2004 (#118) - RossCW
  If the AVERAGE turn of will lead to "x" probability of winning at the end of the inning then, as I understand it, x is by definition the probability of winning at the beginning of the inning.

Not if you consider that the other team will then come to bat and the probability they will win will also go down on average.

Voice of Reason - just add up the numbers for batters and the numbers for pitchers.

Posted 8:50 p.m., February 9, 2004 (#119) - VoiceOfUnreason
  OK, so I add up the pitcher numbers ( 4427533 AB, 22760.56 Wins ), and the batter numbers ( 4404677 AB, 12581.43 Wins ). I assume we can gloss over the missing 23K AB. Now what?

Are you proposing that, since the pitchers are getting more credit for wins than the batters, it necessarily follows that the win expectation is going down (on average) in the offensive part of the inning?

Posted 9:33 p.m., February 9, 2004 (#120) - RossCW
  Sorry Voice of Reason - the numbers I was referring to are in the large download where he gives the net impact of every player on the probability that their team will win. If you add up the impact for the batters (which also includes all pitchers who batted) it is negative. If you add up the impact of all pitchers it is positive.

I don't know what you are looking at. I'll take a look later.

Are you proposing that, since the pitchers are getting more credit for wins than the batters, that it necessarily follows that the win expectation is going down (on average) in the offensive part of the inning?

I have no idea whether that is related to the win probability data or not or what its implications are.

Posted 10:01 p.m., February 9, 2004 (#121) - AED
  I've already addressed that study several times. In 4443803 PAs, the author finds a total offense contribution of -1067 wins and a total pitcher contribution of +1067 wins. This means that the discrepancy is 0.0002 wins per PA. So first off, if you really think you can measure contributions to an accuracy of 0.0002 wins, you're deluding yourself.

More to the point. If I estimate a 0.60 win probability from a specific situation, and the average win probability from the next situation is 0.70, I've screwed up. Put differently, if I calculated a 0.60 win probability for a specific situation but 70% of teams in that situation go on to win, then obviously the win probability in that situation was really 0.70! This isn't an arbitrary "theoretical construct" as Ross claims; it's the straightforward definition of the term "win probability".

Anyone who has done this sort of stuff knows that getting the transition probabilities perfect is ridiculously hard, and that small errors of this size are to be expected.

Posted 11:32 p.m., February 9, 2004 (#122) - tangotiger
  This is such a waste of time. Anyone who does the win prob tables does it virtually the same way... you always get the off=def=0.

Posted 11:40 p.m., February 9, 2004 (#123) - RossCW
  Put differently, if I calculated a 0.60 win probability for a specific situation but 70% of teams in that situation go on to win, then obviously the win probability in that situation was really 0.70!

Then you miscalculated. So where is the evidence for a miscalculation in the data presented? Where was the error? I don't think you have evidence of one, except that it contradicts your theory. You believe teams are not supposed to gain ground on defense and lose ground on offense, so it must be wrong.

This isn't an arbitrary "theoretical construct" as Ross claims; it's the straightforward definition of the term "win probability".

Well, yes, it is arbitrary, because there is nothing supporting the claim that the actual probability was 70% when the data presented said it was 60%.

There seems to be this assumption that the average decline of the offense only applies to one team - but it applies to both. So while a team's chances of winning may decline while it is at bat, the chances of its opponent will decline on average while they are at bat.

I have given a specific example where teams trade the lead every half inning. The chances of the team at bat winning goes up after every half inning for both teams. But at the end of the game someone wins because, by definition, the chances of the team in the field winning went down in every inning. Since both teams bat and both teams field there is absolutely no reason why they can't both improve their chances while batting or both improve their chances by fielding.

So first off, if you really think you can measure contributions to an accuracy of 0.0002 wins, you're deluding yourself.

Of course you can measure what actually happened to that degree of accuracy. We know that the actual probability of a team winning is not even close to being fully measured, since there are probably hundreds of factors that affect the actual chances that we are not controlling for. But we can measure how often the offensive and defensive teams have won with no one out, runners on first and third, in the bottom of the third inning with the game tied. And we can measure how often they win with one out, and we can measure the difference between those two and assign that difference to the player who was at the plate and made the first out. And we can do that for every plate appearance. And when we are done doing that, we can add up all those changes. And when we do, the offensive team has lost more ground than it has gained and the defensive team has gained more ground than it has lost. And there is nothing theoretically impossible about that outcome.

Posted 12:08 a.m., February 10, 2004 (#124) - RossCW
  This is such a waste of time. Anyone who does the win prob tables does it virtually the same way... you always get the off=def=0.

Tango - the data presented by Oswalt shows a net loss on offense of over 1000 wins. You can repeat your theory all you want but you have no evidence to back it up and the actual results contradict it. Perhaps you should find where he made his error instead of insisting there must be something wrong.

Posted 12:17 a.m., February 10, 2004 (#125) - VoiceOfUnreason
What I was looking at was garbage. Bugs in the parser. Now that I've fixed it, I'm also getting 4443803, and plus/minus 1067. That's over some 65K games, so a 1.5% error if it really should be "neutral". I think AED is right, but I'm not certain - I have to think about where Ed could have introduced a bias.

http://www.livewild.org/bb/wintab.html.
Here's what I was expecting to see (Tango, take note). The initial probability in favor of the home team is .546 before the top of the first. What is the probability after the top of the first?


        w%     freq   expectation
0 Runs: .593 : .721 : .428
1 Run : .494 : .156 : .077
2 Runs: .398 : .072 : .029
3 Runs: .309 : .030 : .009
4 Runs: .233 : .013 : .003
5 Runs: .171 : .005 : .001
                      .547 (checks)


The frequencies here are specifically those associated with runs scored by the visiting team in the first inning. I'm not being terribly rigorous here (you'll have noticed the frequencies don't add up to unity, for instance), but rather am demonstrating that probability conservation holds at the inning level, and trusting that it is then obvious that it will hold at the plate appearance or pitch level as well [certainly if you use true probabilities, and likely if your estimated probabilities are reasonable].
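
The conservation check, in code form (same numbers as the chart above):

```python
# Weighted average of the after-inning win probabilities should recover
# the before-inning probability of .546 (up to the truncated >5-run tail).
wp   = [0.593, 0.494, 0.398, 0.309, 0.233, 0.171]
freq = [0.721, 0.156, 0.072, 0.030, 0.013, 0.005]
print(sum(w * f for w, f in zip(wp, freq)))   # ~0.547
```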

Posted 12:20 a.m., February 10, 2004 (#126) - RossCW
  Sorry Voice of Reason - the numbers I was referring to are in the large download where he gives the net impact of every player on the probability that their team will win. If you add up the impact for the batters (which also includes all pitchers who batted) it is negative. If you add up the impact of all pitchers it is positive.

I don't know what you are looking at. I'll take a look later.

Ok, it is in his large download of the impact of every at-bat from 1972 to 2002 for every player. He calculated the chances of winning before the at-bat and after the at-bat and attributed the change to the batter. He then calculated the net wins for each player based on those outcomes. Some players had a positive impact, others a negative impact, on their team's chances of winning. The net impact for that entire list is minus 1067 games. The defense had a net positive of 1067 games.

Posted 12:21 a.m., February 10, 2004 (#127) - VoiceOfUnreason
Well, that chart is certainly clear. The .547 under the expectation column is the sum of the values of that column. Expectation, of course, is Win% * frequency.

Posted 1:38 a.m., February 10, 2004 (#128) - AED
But we can measure how often the offensive and defensive teams have won with no one out, runners on first and third, in the bottom of the third inning with the game tied. And we can measure how often they win with one out, and we can measure the difference between those two and assign that difference to the player who was at the plate and made the first out. And we can do that for every plate appearance. And when we are done doing that, we can add up all those changes.

If you actually bothered to do the analysis you suggest, you would find that the offensive team gains exactly as much ground as it loses.

However, this is not what Oswalt has done; he's estimated the state-to-state transition probabilities and computed win probabilities for each state using those estimated transition probabilities. The estimations are pretty good, but not perfect. The errors in those assumptions are what cause the 1000 win discrepancy.

Posted 2:37 a.m., February 10, 2004 (#129) - RossCW
  Using a database of play-by-play accounts of (almost -- a few 1972-1973 are missing) every game in the major leagues 1972-2002, I constructed tables which show a team's chance of winning a game, based on the score, the inning, location of baserunners, and number of outs. I have constructed such tables, one per year, for each year since 1972. In 1996 for example, if the home team is one run behind in the bottom of the seventh with 1 out and a runner on second, they have a .45253 chance of winning.

This sounds like he constructed his tables from the actual results of each situation, not estimates of the likely result.

These tables are useful in many ways. One use is to determine the value of a player's performance. In the example of the seventh inning situation above, if the runner steals third, the home team's chance of winning improves from .45253 to .49368. This improvement of .04115 is a contribution of the baserunner. If instead the batter had singled to score the runner from second, the home team's chance of winning improves to .58374, and the improvement of .13121 (= .58374-.45253) is credited to the batter. On the other hand, the pitcher's contribution to his team for this event is the opposite, namely -.13121. Had the batter instead made an out without advancing the runner, he would get a -.07986 for lowering his team's chance of a win to .37267, and the pitcher is credited with +.07986.

And that sounds like he did exactly the analysis I suggest.

By adding the value of these contributions of each event for each player over the course of a season, we get an exact measure, of the value of that player's contributions as compared to average performances, which I call the player's win value. A value of +x can be interpreted as turning x losses into wins, or as contributing 2x to his team's number of games over .500. (emphasis added)

And Oswalt at least did not think he was making estimates.

Well, that chart is certainly clear.

It is clear, but it seems to be repeating the basic truism, which no one disputes, that the net change to both teams combined is zero. At the end of the top of the first, you don't even have data to compare for offense and defense, if that was your intent.

Posted 3:00 a.m., February 10, 2004 (#130) - RossCW
At the end of the top of the first, you don't even have data to compare for offense and defense, if that was your intent.

Actually I take that back. Obviously you can compare the impact of the top half of the inning on the offense and defense.

By taking the difference between the chances of winning at the start of the inning and after it, you will find a slight advantage for the home team - that is, it gains more from the visitors' failure to score than they gain from scoring.

i.e. (.593-.546)*.721 = .034 is greater than the combined changes that favor the offense - those total .032. My guess is that those differences increase the closer you get to the end of the game.

Posted 7:34 a.m., February 10, 2004 (#131) - tangotiger
To how many decimals did Oswalt set his charts? Did he round, or truncate? I set mine to 38.

We have no good way of assessing the value of fielding, nor of baserunners distracting pitchers, nor of some baserunning plays. But for hitting (including walking and sacrificing), base stealing, pitching

Are WP, PB, Pickoffs, BK, and other events also included? From Ed's description, it does not appear so. In my win analysis I always include these and every single event. Since these events are negative for the defense, and positive for the offense, it's rather easy to see that 1.5% difference could be explained.

The typical win value of these events is about .003 wins. I don't have my Lahman DB handy, but if someone wants to list the # of WP,PB,BK,Picks, we can tell fairly quickly if Ed included those events or not.

Posted 7:35 a.m., February 10, 2004 (#132) - tangotiger
  Uhm, that should be .03 wins (almost .30 runs).

Posted 8:11 a.m., February 10, 2004 (#133) - VoiceOfUnreason
Ross, the difference you observe (.034 vs .032) is rounding error combined with the fact that the frequencies I listed do not include all of the outcomes which favor the offense.

I wonder if that's the error Oswalt made - if he were cropping his analysis at 5-run differentials, then the hitting team gets shortchanged any time they run the lead past 5. It might not have seemed an unreasonable thing to do (sample size goes to hell at 5, especially if you are treating each season independently).

I don't see any evidence to support that beyond the fact that it appears he introduced an error somewhere, and it seems an easy one to make which still gives mostly reasonable results.

What does he mean "1/24 of the entries"?

Posted 9:56 a.m., February 10, 2004 (#134) - tangotiger
From 1974-1990, there were 28,372 WP+PB+BK. Of those, 9,684 were PB; backing the passed balls out as fielding plays leaves 18,688 extra plays for the offense. That's based on 2,638,407 PAs.

Oswalt had 4,443,803 PAs from 1972-2002 (or whatever his time period was). Pro-rating my numbers to Ed's, we have 31,476 extra plays that Ed probably does not account for (based on his description).

Remembering that each of these plays is roughly equal to .03 wins, that gives us: .03 x 31,476 = +944 wins for the offense that are unaccounted for (and -944 wins for the defense).

So, that leaves us with a gap of 123 wins on over 4 million PAs, or about 0.2 wins per team per year.
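
The pro-rating, step by step:

```python
# WP+PB+BK from 1974-1990, with passed balls backed out as fielding plays.
extra_plays = 28372 - 9684                        # 18,688
scaled = extra_plays * 4443803 / 2638407          # ~31,476 plays at Oswalt's PA count
offense_wins = 0.03 * scaled                      # ~944 wins for the offense
print(scaled, offense_wins, 1067 - offense_wins)  # leaves a gap of ~123 wins
```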

Posted 10:08 a.m., February 10, 2004 (#135) - RossCW
Ross, the difference you observe (.034 vs .032) is rounding error combined with the fact that the frequencies I listed do not include all of the outcomes which favor the offense.

There may be a rounding issue, but the argument that some factors favoring the offense are not included is clearly wrong. Anything that results in a runner advancing will show up in the change from one probability state to the next and be attributed to the offense.

Posted 11:38 a.m., February 10, 2004 (#136) - VoiceOfUnreason
  "the argument that some factors favoring the offense are not included is clearly wrong."

No, it isn't.

I know that it isn't because I generated the transition frequencies in #125, so I know what data was thrown away.

You know that it isn't because you can add the frequencies in that post, see that they sum to only .994, and being aware that an offense cannot score fewer than zero runs in the top of the first, the missing datapoints must be those where the visitors scored more than 5 runs, which is the part of the probability table that favors the offense.

Now, if you made that correction, you would still find a small discrepancy in the probability conservation. There is an error that has been introduced because I'm guessing at the correct transition matrix (the frequencies) - I'm substituting the frequencies calculated from the observed transitions in an old study of mine, which will only be very close to the frequencies calculated by Oswalt from the observed transitions in the data from his study. Unless I overlooked it, Oswalt didn't publish his transition matrix, so this is as close as I can get without doing real work.

If we had Oswalt's transition matrix, there would still be a tiny error, from rounding.

But if you would rather believe that it is an accident that Oswalt's data, my approximation of the frequency transitions, and the formula applied to them (from #99 above) give an answer off by one in the last place, suit yourself.

Posted 12:04 p.m., February 10, 2004 (#137) - RossCW
  I know that it isn't because I generated the transition frequencies in #125, so I know what data was thrown away.

You will have to explain how it got thrown away. What happened to the wild pitches? Was the probability of the team winning after the plate appearance calculated assuming the wild pitch didn't happen? I don't think so.

The comparison of before and after states doesn't care how the state changed and it doesn't matter. It just measures the change.

Posted 12:15 p.m., February 10, 2004 (#138) - VoiceOfUnreason
  "You will have to explain how it got thrown away. What happened to the wild pitches? Was the probability of the team winning after the plate appearance calculated assuming the wild pitch didn't happen? I don't think so."

Plate appearances? I thought you were contesting the results in #125 which are based on innings, not PA.

As for how it got thrown away: Oswalt only provided win probabilities for the 0-5 run states, so I didn't have anything to apply the other transitions to. I could have stuck them in, with big question marks to show that Oswalt didn't provide matching win probabilities, and additional footnotes remarking that the sample sizes had gone completely to pot and therefore the transition frequencies must be taken with a grain of salt.

Posted 12:23 p.m., February 10, 2004 (#139) - tangotiger
  Coming out of my purgatory, I will address Ross specifically, in the hopes that he will leave my Primate Studies sanctuary.

*****

The win prob matrix includes all events, yes.

But, when running the player data against the win prob matrix, the wild pitches, balks, and PB got thrown away. This is the best interpretation of the Oswalt statement ("We have no good way of assessing the value of fielding, nor of baserunners distracting pitchers, nor of some baserunning plays.") and resulting data.

***

Now, please, just stop posting here, and go back to Clutch.

Posted 12:43 p.m., February 10, 2004 (#140) - RossCW
  the missing datapoints must be those where the visitors scored more than 5 runs, which is the part of the probability table that favors the offense.

Oswalt didn't base his results on that probability table, but on tables that measured the likelihood of every state before and after each play.

You know that it isn't because you can add the frequencies in that post, see that they sum to only .994,

I get .997 - I am not sure why you think that .003 is significant. For that to be the difference, the chances of the home team winning would have to be zero, and that is not the case. The impact on the overall probabilities in the first inning is very small - as I said above, you would expect the difference to grow as you get closer to the end of the game.

I'm substituting the frequencies calculated from the observed transitions in an old study of mine, which are only going to be very close to the frequencies calculated by Oswalt from the observed transitions in the data from his study.

So we can't calculate the difference in offense and defense reliably based on the data you have provided given the small differences. We still have the results Oswalt did publish for every player based on the changes in state for each play.

What we are left with is:

The only data we have shows that the probability of a team winning increases when it is in the field and decreases when it is at bat.

There is no theoretical reason why it should increase and decrease equally when a team is in the field or at bat.

No one can find an error in Oswalt's methodology or his data that would demonstrate why it is wrong.

But the consensus here remains that he is wrong, the data is wrong, and the assumptions that have been held for so long are correct.

What we have is an object lesson in the elusiveness of "objective" truth.

Posted 12:56 p.m., February 10, 2004 (#141) - AED
  No one can find an error in Oswalt's methodology or his data that would demonstrate why it is wrong.

This is false. In my previous post (#128), I did in fact carry out the work you suggested. That is why I know for a fact that, if you took empirical transition probabilities for base-out-inning-score states, you would indeed find a total offensive value of zero.
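
For anyone who wants to see the cancellation mechanically, here is a self-contained Python sketch on a toy coin-flip game (not baseball, and not Oswalt's data): when each state's win probability is the empirical one - wins observed from that state divided by visits to it - the win-probability changes summed over every play in the sample telescope to exactly zero.

    import random
    from collections import defaultdict

    random.seed(1)
    visits = defaultdict(int)    # state -> times visited
    wins = defaultdict(int)      # state -> eventual wins after visiting
    games = []

    for _ in range(20000):
        state, path = (0, 0), [(0, 0)]       # state = (inning, score diff)
        for inning in range(1, 10):          # nine +/-1 swings; no ties
            state = (inning, state[1] + random.choice((-1, 1)))
            path.append(state)
        won = state[1] > 0
        games.append((path, won))
        for s in path:
            visits[s] += 1
            wins[s] += won

    # Empirical win probability of each state, from the same sample.
    win_prob = {s: wins[s] / visits[s] for s in visits}

    # Sum the change in win probability over every transition of every game.
    total = sum(win_prob[b] - win_prob[a]
                for path, _ in games
                for a, b in zip(path, path[1:]))
    print(round(total, 6))   # zero, up to floating-point rounding

Each game contributes win_prob(final) - win_prob(initial); over the whole sample that is (total wins) - N x (total wins / N) = 0. Substitute a smoothed or estimated win_prob table and the cancellation breaks - which is the claimed source of the 1000-win discrepancy.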

Posted 1:18 p.m., February 10, 2004 (#142) - RossCW
In my previous post (#128), I did in fact carry out the work you suggested. That is why I know for a fact that, if you took empirical transition probabilities for base-out-inning-score states, you would indeed find a total offensive value of zero.

This is post #128: I don't see any work or results here. And your description of how Oswalt did his projections contradicts his description of them.

If you actually bothered to do the analysis you suggest, you would find that the offensive team gains exactly as much ground as it lost.

However, this is not what Oswalt has done; he's estimated the state-to-state transition probabilities and computed win probabilities for each state using those estimated transition probabilities. The estimations are pretty good, but not perfect. The errors in those assumptions are what cause the 1000 win discrepancy.

I'm puzzled why you keep repeating this claim. You have provided no evidence to support it. It's certainly possible Oswalt is wrong - but simply saying so is not sufficient for anything.

It would be a remarkable coincidence if the claim that offense and defense are equal were true. There is no reason why they should be.

Posted 1:22 p.m., February 10, 2004 (#143) - J Cross
  There is no theoretical reason why it should increase and decrease equally when a team is in the field or at bat.

On average it can't increase OR decrease, whether the team is at bat, in the field, or waiting around between innings. Just because you are refusing to listen to the theoretical reasons doesn't mean they don't exist.

It's EXPECTED WINNING PERCENTAGE, after all. As AED stated, if you expect that a team's chances are .6, but an average turn of events (on defense, offense, or wherever) will make them .7, then your expectations were wrong. The expected winning percentage is, by definition, the same as any subsequent expected winning percentage in the game after the average sequence of events has taken place.

What you're claiming is equivalent to saying that batting average doesn't HAVE to equal hits/at-bats. We all just think it does without bothering to verify this with data.
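
The "by definition" step can be shown directly. A sketch on a toy +/-1 game (not baseball): build a win probability table by backward induction, then check that at every state the expected win probability after the next event equals the win probability now.

    INNINGS = 9   # odd, so the final score differential is never zero

    # Backward induction: the team wins if the final differential is positive;
    # each "inning" swings the differential by +1 or -1 with equal chance.
    win_prob = {}
    for score in range(-INNINGS, INNINGS + 1):
        win_prob[(INNINGS, score)] = 1.0 if score > 0 else 0.0
    for inning in range(INNINGS - 1, -1, -1):
        for score in range(-inning, inning + 1):
            win_prob[(inning, score)] = 0.5 * (win_prob[(inning + 1, score + 1)]
                                               + win_prob[(inning + 1, score - 1)])

    # Martingale check: the expected change in win probability is zero
    # at every non-terminal state.
    for (inning, score), w in win_prob.items():
        if inning < INNINGS:
            expected_next = 0.5 * (win_prob[(inning + 1, score + 1)]
                                   + win_prob[(inning + 1, score - 1)])
            assert abs(expected_next - w) < 1e-12
    print("expected change is 0 at every state; win_prob[(0, 0)] =",
          win_prob[(0, 0)])

Any table that fails this check is simply inconsistent with its own transition probabilities - no amount of data can rescue it.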

Posted 1:53 p.m., February 10, 2004 (#144) - VoiceOfUnreason
  "Oswalt didn't base his results on that probability table but tables that measured the likelihood of every state before and after each play."

So you believe that I demonstrated close agreement with Oswalt's first inning result even though I used a probability table completely unrelated to his situation?

Wow. I guess I should be flattered.

On the off chance that Tango is still reading: "The change in win probability by the offense, on a league level, will always be zero. This is not true at a team, game, inning, or PA level."

I still insist that it has to be, by #99. Is there some counterexample you are aware of, or are we perhaps discussing different parts of the elephant?

Posted 2:13 p.m., February 10, 2004 (#145) - RossCW
What you're claiming is equivalent to saying that batting average doesn't HAVE to equal hits/at-bats.

We define batting average as hits/at-bats.

No one is disputing that the probabilities of each team winning sum to 1. That is true by definition - the same way AVG is hits/at-bats by definition.

And no one is disputing that if a player's batting average is .333 and he went 4 for 4 today, then somewhere he was hitless in 8 at-bats (4 for 4 plus 0 for 8 is 4 for 12, i.e. .333) - otherwise his batting average isn't .333.

The question is whether that principle applies here, and the data says it doesn't. So is the data wrong, or is the logic wrong? I have to admit I'm leaning toward the data - but I'm not nearly as certain as everyone else here.

Posted 2:28 p.m., February 10, 2004 (#146) - AED
  Fine, Ross. Even though you didn't realize that I had done the work, you do now. And as I said, I do not duplicate Oswalt's 1000-win discrepancy; rather I find it to be spurious. If you choose not to believe me, that's your problem - not mine. I have better things to do than argue over something as self-evident as this...

Posted 2:28 p.m., February 10, 2004 (#147) - tangotiger(e-mail)
  Voice, email me, and I'll clear it up.

Posted 3:42 p.m., February 12, 2004 (#148) - Schindleria Praematurus
  Sorry to join up late, but I've got a question about this. To summarize a little of what's come before:

1) clutch hitting ability exists
2) singles hitters tend to have more clutch hitting ability than power hitters
3) all hitters tend to perform worse during clutch situations than they do during the rest of the game
4) #3 is mainly attributable to the superior quality of the pitchers who appear during late innings.

One hypothesis that could explain all of this: singles hitters have an advantage against the types of pitchers who become late-inning relievers that power hitters do not share.

Here's my question: is there any kind of "profile" for late-inning relievers that describes how they pitch differently from the types of pitchers we see in the first six innings? (I mean a difference other than the fact that they're better pitchers, a la post #18 and many others, of course.) For example, do we see a larger share of power pitchers in the late innings than in the rest of the game?

Because if, hypothetically, there are more power pitchers in innings 7-9, and singles hitters hit power pitchers better than power hitters do, that might explain conclusion #2. And if we compare the hitters who show clutch ability with the hitters who tend to do well against power pitchers (or whatever kind of pitcher is disproportionately present in late innings), we might find that this hypothesis explains the observations.

The higher strikeout rate of late-inning relievers (post #22) is suggestive of an answer to this question, but I'm not sure where to go from here.

Posted 5:26 p.m., February 22, 2004 (#149) - RossCWXYZ
  Now, please, just stop posting here, and go back to Clutch.

Muwuhahahahahah!