Tango on Baseball Archives

© Tangotiger

Archive List

The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)

Cyril Morong takes "Win Probability Added" and compares it to OPS, among other things, to try to look for clutch performance.
--posted by TangoTiger at 09:51 AM EDT


Posted 10:13 a.m., December 1, 2003 (#1) - Cyril Morong(e-mail) (homepage)
  Tango, thanks for posting this. I look forward to any feedback.

Posted 11:23 a.m., December 1, 2003 (#2) - Steve Rohde
  I do not agree that the fact that the win probability approach dates all the way back to Mills, and the fact that OPS correlates well with Oswalt's results turned into a rate statistic, in any way creates a "problem". I don't know anyone suggesting that this is a totally new approach, although the expanded availability of play by play data makes this approach easier to use.

Also, the fact that OPS correlates well with Oswalt's results, or better yet that OBP and slugging weighted more appropriately correlate well with those results, tends to confirm that OBP and slugging used together are good quick-and-dirty measures of value. But that doesn't create a problem in attempting to get more information through the win probability approach.

By the r squared results reported, more than 10% of the variation in Oswalt's results remains unaccounted for by OPS, and more appropriately weighting OBP and slugging still leaves more than 6% unaccounted for. To the extent that an offensive player over the years has tended to perform well in the clutch, this would explain part of that unaccounted-for variation. So it is added knowledge.

In addition, the win probability added method would appear to have a lot of potential in measuring differences in value added by relief pitchers, where the manager has a lot of discretion in deciding whether a pitcher gets used in high leverage situations. For example, relief pitchers under a win probability approach get a lot more credit for protecting a one-run lead than a three-run lead. Moreover, if a save is blown, it makes a great deal of difference whether the relief pitcher has simply allowed the tying run to score, or whether he has allowed the tying and winning runs to score.

There are some issues that need to be addressed to make the win probability approach more useful, including, for example, coming up with an appropriate methodology to adjust for park effects, and to take into account the different run scoring environments between the two leagues. But I believe that continued development of the win probability approach is an exciting area of ongoing study.

Posted 11:47 a.m., December 1, 2003 (#3) - Sylvain(e-mail)
  2 quick comments:
- concerning the linear regression with PWV/PA, the equation gives PWV/PA = -.024 + .00015*OBP + .000095*SLG, which leads to weight OBP / weight SLG = 1.58. In the OPS Begone series, Tango, you used Base Runs; in this article it seems to be a historical approach, so is this the 3rd time that the 1.6-2.0 coefficient is "confirmed"?

- Cyril, you used only the top 100 in PA, therefore only very good players (or Luis Sojo) are considered; isn't it possible that the regression coefficients are valid just for the "above 100 OPS+" players? Can't one suppose that "clutch ability" is present but at a somewhat constant level among ML players? This must have been mentioned before; it just reminds me of the different paths and reflections during the DIPS debates (is BABIP a bad measure? No ability at all, or no ability compared to league average? etc.)

Sylvain
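Sylvain's implied-weight arithmetic can be checked directly from the two regression coefficients he quotes:

```python
# Coefficients from the regression equation quoted in this post.
obp_coef = 0.00015
slg_coef = 0.000095

# Implied relative weight of OBP vs. SLG in an OPS-style sum.
ratio = obp_coef / slg_coef
print(round(ratio, 2))  # -> 1.58
```

That is, on this data a point of OBP is worth about 1.58 points of SLG, at the low end of the 1.6-2.0 range discussed elsewhere in the thread.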

Posted 11:49 a.m., December 1, 2003 (#4) - studes (homepage)
  I agree with Steve. I don't see how this analysis disproves the existence of "total clutch" hitting. If anything, one could argue that the standard error of 10% is related to clutch hitting, or lack thereof, and that this "proves" the existence of sustained clutch hitting for certain batters.

Posted 11:50 a.m., December 1, 2003 (#5) - Sylvain(e-mail)
  Concerning the 1st point, it is very possible that I missed an article looking at Win Prob and OPS/OBP and SLG....

Sylvain

Posted 11:52 a.m., December 1, 2003 (#6) - tangotiger
  I don't know anyone suggesting that this is a totally new approach, although the expanded availability of play by play data makes this approach easier to use.

Every time I read an article on this topic, if no reference to the Mills brothers is made, it always seems that the author of the piece is claiming to have invented a totally new approach.

Also, the fact that OPS correlates well with Oswalt's results, or better yet that OBP and slugging weighted more appropriately correlate well with those results, tends to confirm that OBP and slugging used together are good quick-and-dirty measures of value. But that doesn't create a problem in attempting to get more information through the win probability approach.

You DO get more information through a WPA approach. But, is that information strictly a function of the game state context (inning,score,base,out), or is the identity of the hitter/pitcher also important?

By the r squared results reported, more than 10% of the variation in Oswalt's results remains unaccounted for by OPS, and more appropriately weighting OBP and slugging still leaves more than 6% unaccounted for. To the extent that an offensive player over the years has tended to perform well in the clutch, this would explain part of that unaccounted-for variation. So it is added knowledge.

Side note: according to this, the weighting scheme is 1.6 OBP + SLG.

But, as Cy points out, there is very little "clutch persistency" year to year. If a player does have clutch ability, we can't find it in this measure (context-specific performance minus context-neutral performance or WPA minus OPS-generated-WPA).

In addition, the win probability added method would appear to have a lot of potential in measuring differences in value added by relief pitchers, where the manager has a lot of discretion in deciding whether a pitcher gets used in high leverage situations. For example, relief pitchers under a win probability approach get a lot more credit for protecting a one-run lead than a three-run lead. Moreover, if a save is blown, it makes a great deal of difference whether the relief pitcher has simply allowed the tying run to score, or whether he has allowed the tying and winning runs to score.

I don't think Cy's paper contradicts any of this.

There are some issues that need to be addressed to make the win probability approach more useful, including, for example, coming up with an appropriate methodology to adjust for park effects, and to take into account the different run scoring environments between the two leagues. But I believe that continued development of the win probability approach is an exciting area of ongoing study.

Agreed!!

Posted 12:08 p.m., December 1, 2003 (#7) - tangotiger (homepage)
  One thought on using WPA or OPS-generated-WPA to estimate next year's WPA. Cy's paper says that OPS is a better predictor. In The Hidden Game of Baseball, it was similarly noted that Linear Weights was a better predictor of the Mills brothers' Player Win Averages (PWA) in year x+1 than was PWA itself from year x.

This is similar to the claim Voros makes that a pitcher's next-year $H is better predicted by the current team $H than by the pitcher's own $H. The reason this works is probably partly due to sample size. There were 6 good reasons for year-to-year r to be high, and only 1 of them was due to "ability" of the thing being measured. It might do well to review the Allen/Hsu paper for a list of these (see homepage link).

I'll reprint them here for review:

To recap, the year-to-year r is dependent on:
1 - how many pitchers in the sample
2 - how many PAs per pitcher in year 1
3 - how many PAs per pitcher in year 2
4 - how much spread in the true rates there are among pitchers (expressed probably as a standard deviation)
5 - possibly how close the true rate is to .5
6 - the true rate being the same in year 1 and year 2

So, I wonder whether the spread of WPA being much larger than that of OPS-generated-WPA affects the year-to-year r.

As well, WPA is a combination of: ability, high-leverage context PA, and performance. It's possible again that a player may find himself in many high-leverage PAs one year and not a lot the next year, AND PERFORM EXACTLY THE SAME in both cases, but his WPA will be much different. Think of someone like Keith Foulke, who performs great year after year, but whose WPA will be much different, not because his clutch performance is an issue, but because the number of high-leverage PAs is far different.

To do this study correctly, you would at least have to control for the number of high-leverage PAs being similar year-to-year. That is, the Leveraged Index (LI) needs to be used as well.
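The dependence of year-to-year r on the list above (especially points 2 through 4) is easy to see in a toy simulation. Everything below is invented for illustration — the player counts, PA totals, true-rate mean of .330, and spreads are assumptions, not figures from the Allen/Hsu paper:

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation, no external libraries needed.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def year_to_year_r(n_players, pa, true_spread, seed=1):
    # Each player gets a fixed true rate; each season is `pa` binomial trials.
    rng = random.Random(seed)
    truths = [min(max(rng.gauss(0.330, true_spread), 0.05), 0.60)
              for _ in range(n_players)]
    season = lambda p: sum(rng.random() < p for _ in range(pa)) / pa
    year1 = [season(p) for p in truths]
    year2 = [season(p) for p in truths]
    return pearson(year1, year2)

r_few_pa = year_to_year_r(200, 50, 0.020)    # few PAs per player per year
r_many_pa = year_to_year_r(200, 600, 0.020)  # more PAs, same true spread
r_wide = year_to_year_r(200, 600, 0.060)     # more spread in true rates
# More PAs per season and a wider spread of true talent both push r up,
# even though "ability" (a stable true rate) is identical in all three runs.
```

The exact correlations depend on the assumed parameters; the point is the ordering — r climbs as PAs per player and the spread of true rates grow, with no change in how "real" the ability is.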

Posted 12:09 p.m., December 1, 2003 (#8) - Cyril Morong
  On the 6.4% not accounted for in the regression with OBP and SLG: if I could get the data broken down into singles, doubles, outs, GIDP, etc., the r-squared might go up even more. But Oswalt says that his stat is compared to the league average. I used the Lee Sinins sabermetric encyclopedia to find relative OBP and relative OPS, but I am not sure if I can get relative numbers for singles, doubles, outs, GIDP, etc. with the sabermetric encyclopedia. And all Oswalt shows is his PWV.

The remaining 6.4% could be from some clutch ability and it could be randomness. Any suggestions on how to test that? But as Willie Runquist has pointed out, we can never tell which of the outliers in some clutch hitting stat got there due to chance or from a real clutch hitting ability.

As for the top 100 players in PA: I chose them because it is more likely that randomness would be smoothed out. Some of those guys stuck around a long time because they were good fielders. I did run a regression with only the bottom thirty of this 100 (15 were negative in PWV, 15 positive). The r-squared was .844. Then I did the top 30 and the r-squared was .867. Close; maybe there is some small effect from having used good hitters.

But why would all the good hitters still conform pretty closely to the regression line? Why aren't there some who exceed what other good hitters do in the clutch?

I will try to run the regression again with more hitters, maybe doing a set of guys who are only negative in PWV. But it takes time to get the data from the Oswalt site matched up with what I generate from the sabermetric encyclopedia.

Posted 12:25 p.m., December 1, 2003 (#9) - Cyril Morong
  The article from the print version of BusinessWeek was titled "Ball Park Figures You Can Bet On." That is a very strong endorsement. Will teams really make personnel decisions based on this stat more than on, say, OBP and SLG? I don't think they should. I hope the White Sox don't. The article also mentioned that two agents said they would consider using it. I don't think anyone should make financial decisions based on this. I wrote letters to them but heard nothing back.

It would take several years of data for us to know if a guy truly has some clutch ability as indicated by this stat. That is, he consistently does better in PWV than his OBP and SLG would suggest. I think the burden of proof is on those who push the "total clutch" stat to show that it tells us something we don't already know.

Posted 2:00 p.m., December 1, 2003 (#10) - studes (homepage)
  Cyril, I agree with your comments about ballclubs buying that data. I personally don't think that WPA is very useful to front offices, because it lacks predictive value, compared to other stats. I think it would be bizarre if front offices made personnel decisions based partly on value metrics such as WPA or Win Shares. As a fan, on the other hand, I find this sort of data fascinating.

I really don't think clutch hitting is provable. A random distribution of batting, even for players with long careers, will sometimes turn up batters who look like awesome clutch hitters (or the opposite).

Posted 2:05 p.m., December 1, 2003 (#11) - David Smyth
  I found this article very interesting and timely. I too have called into question whether WPA reveals enough new information (compared to OPS, say) to be worth the huge increase in data and computation. I know that much of that is mitigated today by computers, but still it reminds me of an old B James line, where he likened some advanced version of some stat to trying to use a 40-pound garden hoe. If the increase in accuracy (or new info) is only 10%, but 20 times the amount of work is required...

A few of you may have seen my recent discussions of a stat called AWP/ALP, which is certainly not going to take the sabermetric community by storm. But it measures something "real", and I expect that the correlation with OPS would be much lower than that of WPA and OPS, suggesting that the information it is adding is different. The objection to AWP is whether the added info is "worthwhile". I don't mean to "hijack" this topic at all, and if anyone is interested they can check and post in those other threads. But I thought it fit right into the general concept of this thread.

Posted 2:11 p.m., December 1, 2003 (#12) - tangotiger
  The great thing about Win Probability Added (WPA) is NOT about finding a new way to measure a player's performance. With all due respect, that's pretty boring. However, "lists" topics generate 100 times the responses of most other things. Find me one HOF or MVP topic (the ultimate lists topic) that doesn't generate at least 100 posts within 2 days.

The power of WPA is to make you see different things in new ways. As I always say, it's the journey, not the destination.

Leveraged Index is a direct result of WPA. Linear Weight win values are a direct result of WPA. How to walk Bonds is a direct result of WPA. Sac bunt in close/late situations? WPA. WPA is what you would use to answer these types of questions.

But, when you consider the amount of noise/fluctuation in WPA, you've got to be very very careful.

As Cy pointed out, you can compare the incredibly context-specific WPA with the very context-neutral Linear Weights, and you get a high degree of correlation for players. WPA doesn't add much in the overall scheme of things.

It adds value when you are trying to study a specific question as it relates to wins, in situations where wins do not correspond directly to runs (late/close situations).

Posted 2:22 p.m., December 1, 2003 (#13) - Steve Rohde
  Tango,

  I agree with the thrust of your post # 12. I would add that the win probability approach has a certain elegance, and in some sense puts into numbers the way fans and baseball people generally view the unfolding of a game, so that showing that Linear Weights highly correlates with it can potentially serve to help validate approaches such as Linear Weights.

Posted 2:25 p.m., December 1, 2003 (#14) - Steve Rohde
  When developing a regression equation to relate OBP and slugging average to a win probability rate, clearly there are some things in the "unaccounted for" percentage other than clutch hitting. For example, variables involving base stealing, grounding into double plays, a baserunner taking the extra base, and not striking out can contribute to winning, and would not be directly reflected in OBP or slugging average. My point was, however, that to the extent clutch hitting occurs, this would also be part of that unaccounted-for percentage.

In addition, I think it is important to point out that the question of the extent to which players can repeat good clutch performance from year to year is a separate question from whether they hit well in the clutch in a particular year (or even, on balance, over their careers). For example, in 2003 Barry Bonds had 3 walkoff homers and other walkoff hits which no doubt did directly result in actual wins, and added to Bonds' value, and the win probability approach explicitly takes that into account.

Another example is Will Clark vs. Kevin Mitchell in 1989. Both had great years, but Mitchell had a better combination of OBP and slugging average. However, while Mitchell won the MVP that year with Clark second, I always thought that the order should have been reversed, in part because Clark seemed to always be coming up with big clutch hits that year. And now, in my opinion, Oswalt's numbers provide strong confirmation that Clark was more valuable than Mitchell in 1989. With Oswalt's methodology, Clark contributed 7.668 wins above average in 1989, to 6.255 for Mitchell.

Posted 2:35 p.m., December 1, 2003 (#15) - Cyril Morong
  Someone spoke of hijacking this thread. I thought metal detectors had been installed. Would somebody call the TSA before it's too late?

Posted 2:54 p.m., December 1, 2003 (#16) - tangotiger
  Steve, I agree that we should use all those things you said, and not just OBP and SLG. That's a very good point, especially for speedsters. Since I have supplied the LWTS run values for 1999-2002 somewhere, it would be a simple enough task (I won't do it now, but someone else can if they like) to come up with the LWTS run value for all hitters, and compare that to WPA. So, the difference would be attributed to the number of high-leverage PAs, and the performance in those particular contexts.

No one, I don't think, is discounting performance in high leverage situations. What is in dispute is whether those performances were the result of some ability, as opposed to random statistical variation (i.e., luck).

Looking backwards, yes, absolutely, give credit to the players only. And, if some guy happened to be lucky in one game (Mark Whiten), one month (Shane Spencer), one year (Brady Anderson), then we do want to count that towards the player's impact to winning the game. He was after all holding the bat.

Looking forward, is it reasonable to think that a player's performance in high-leverage situations will be repeated? That is, if you can get say Manny Ramirez to "rise to the occasion from 1996 to 2003", does that mean that he's more likely than Joe Schlub to perform better in high-leverage situations, relative to their overall performances?

So, give Manny credit, past-tense, for his performance. Don't be so quick to think that, future-tense, he will continue to outpace himself in high-leverage situations.

Is Mariano Rivera a high-leverage pitcher? Some might say that his playoff performance is so out of this world that it must be clutch. Others would respond that he got a good run, and it happened to coincide with the playoffs.

Is Barry Bonds a low-leverage hitter? Up until 2002, his playoff performance was a disaster. Is he now a good clutch performer?

Like Walt Davis said in another post, it's not whether the ability exists, but rather if we have reliable methods to detect it.

Posted 2:58 p.m., December 1, 2003 (#17) - Cyril Morong
  In one of the STATS books they show that Babe Ruth was 0 for 16 in late and close games in the World Series. Those seem like very high leverage. But I would pinch hit Ruth before I pinch hit Mazeroski if it is the bottom of the 9th in game 7 of the WS and the score is tied.

Posted 3:23 p.m., December 1, 2003 (#18) - Steve Rohde
  Tango,

  I think we are largely in agreement. And I agree that it is not whether the ability exists, but whether we have reliable methods to detect it.

Clearly clutch performance from year to year is unpredictable.

That doesn't mean that clutch ability doesn't exist, but it is harder to detect whether a player's run of good years in clutch performance is truly an ability or just a run of good luck. Perhaps there is a relatively small percentage of players who do truly have a clutch ability. And I think it is also clearly true that different kinds of abilities can be subject to different levels of year-to-year variation.

Barry Bonds' generally poor playoff performances prior to 2002 are of particular interest, however, because in other measures of clutch performance he generally has performed very well. For example, in the New Historical Baseball Abstract, which came out before 2002, Bill James does a comparison between Bonds and Joe Carter on a number of measures of clutch performance over their careers, and gave a clear edge to Bonds, as having lifted his game in clutch situations more than Carter. One thing that has always struck me about Bonds was that he consistently has lifted his game during the September stretch runs of tight pennant races.

With respect to Bonds' playoff performances before 2002, this certainly could be explained as a random result of small sample size. But I think it is also possible that after his initial failure, which might have been random, Bonds for whatever reason did let the pressure get to him to some extent. But I agree that there is no way to really tell.

Posted 3:32 p.m., December 1, 2003 (#19) - tangotiger
  I think your post is very well said.

I just want to add that if it's hard to detect (like "team chemistry"), then it's not worth trying to figure it out. Whatever signal you do find will be so small as to be useless to you in the amount of noise. For example, when I looked at 2001 Giambi and Ichiro, their clutch performances added 10 runs over their context-neutral performances. The likelihood is that if these guys did have clutch ability, the effect that we could measure would be around 2 or 3 runs per year (and more likely, 1 or 2). And, that's just not worth it, in my view.

Posted 7:08 p.m., December 1, 2003 (#20) - David Smyth
  ---"No one, I don't think, is discounting performance in high leverage situations. What is in dispute is whether those performances were the result of some ability, as opposed to random statistical variation (i.e., luck)."

Another part of the "dispute" is, or eventually should be, who should get credit for a player performing in more (or fewer) high leverage situations. If you want to automatically give that credit only to the player, then when is the manager ever going to get credit? If E Gagne posts a 1.20 ERA in 80 IP, who deserves the credit that these IP were in (relatively) high leverage situations? Gagne? No, it was the manager who brought that about. If you want to reply that every manager uses his best reliever as the closer, think again. Every year, there are middle relievers who post better ERAs than the closers on their teams. Some of that is random variation, of course, but some of it is likely a real failure on the part of the manager to identify who his best players are, and how to maximize their leverage within the overall team framework.

So, in all of these systems which try to apportion all of the credit to the players for what "happened", is there ever going to be any "space" for the manager?

Posted 8:35 p.m., December 1, 2003 (#21) - FJM
  Cyril: I took another look at the R^2 for PWV/PA related to Relative OPS. Recall this figure was 90% for the Top 100. It dropped a bit when you looked only at the Top 30 (87%) or the Bottom 30 (84%). Still, pretty high. The trouble is, you are still including extreme outliers like Bonds and McGwire, or Boone and Bowa, in your 2 subsets. The inclusion of such outliers increases the R^2 significantly. To avoid this, I looked not at the 2 ends of the distribution, but the middle. Specifically, I looked at everybody in your Top 100 who had a Relative OPS of 110-119. (I did not use PWV/PA to define the middle group because I wanted it to be reasonably homogeneous in terms of ability, as defined by OPS. If I had done it the other way, I would have included hitters as diverse as Sammy Sosa (118) and Tony Phillips (101).)

My group accounted for nearly half of your original group, 46 to be exact. That's 50% more players than were in your high and low groups. So you might think the R^2 would be pretty high. Not so. It drops all the way down to 40%! More importantly, it identifies 7 players out of the 46 who produced anywhere from 27% to 48% more PWVs per PA than would be expected based on their overall Relative OPS. The Magnificent Seven are Keith Hernandez (+48%), Mark Grace (+46%), Wade Boggs (+38%), Ken Singleton (+38%), Darrell Evans (+36%), Will Clark (+30%) and Tony Gwynn (+27%). There is a distinct break at this point, with no one else above +21% (Kirby Puckett and Rickey Henderson). Incidentally, Mr. Clutch (Eddie Murray) ranked 11th of the 46 with +18%. At the other end of the spectrum we find Ron Cey and Carlton Fisk (-27%), Andre Dawson (-31%), Bobby Bonilla and Sammy Sosa (-40%) and Chet Lemon, an almost unbelievable -63%!

Does this prove the players with big pluses were clutch performers, or that those with big minuses were choke artists? No, it doesn't. There are other possible explanations. For example, Rickey Henderson just barely missed the cut. But we know he won a lot of games with his legs, and that is not measured by Relative OPS. If we used a more comprehensive measure of overall value like Linear Weights Ratio, his Clutch Score would undoubtedly drop. Still, there isn't a lot of speed in the Magnificent Seven. So if we're missing something, it must be something other than that.

It's especially interesting to compare 2 long-time teammates, Mark Grace and Sammy Sosa. That largely eliminates the ballpark factor. Based on his Relative OPS (111) the model suggests we would expect .0026 PWV/PA from Gracie. He actually produced .0038. Sammy was (and still is) a much better hitter overall (118). Yet he produced only .0023 PWV/PA, compared to an expectation of .0039. If clutch performance isn't the explanation, what is?

Posted 12:01 a.m., December 2, 2003 (#22) - Cyril Morong
  "The inclusion of such outliers increases the R^2 significantly."

Is this true in all regressions or just this one? I never heard this before. Why is it true? Do you mean outliers in terms of the independent or dependent variable? An outlier as an independent variable could also be far off the trend line. I think that would tend to lower the r-squared in a regression.

I used the top 30 and bottom thirty because someone mentioned that players who play long enough to get a lot of PAs are good hitters. There were 15 of the 100 who had negative PWVs, so I added the first 15 above them to balance things in terms of negative and positive. Then I looked at the top 30 as a check against this.

You mention Rickey Henderson and his legs. But I only used the batting wins portion. Sorry that I did not mention that.

You ask the question: if clutch performance is not the explanation, what is? I would rephrase the question as, what caused Grace to do better than expected while Sosa did worse? That Grace had a special clutch ability is one answer. The other is randomness. How can we tell which one is right? Also, did Grace or Hernandez consistently do better in PWV than OPS would predict, season in and season out? Did they do so every year? I don't know if that data is available at Oswalt's site. It seems the people coming up with these "total clutch" stats don't make all the data available for others to test them.

I ran the regression that you ran with the 46 guys from 110-119. The r-squared is low at .427. But the standard error is still pretty low, at .47 wins for a 700 PA season (a full season). Also, 37 of the 46 guys in your mini study are predicted to within .5 wins for a season of 700 PAs. That still seems pretty accurate for such a limited regression.

Of course, the better regression to run is the one with OBP and SLG as separate variables. Probably more players would be within .5 wins. Even better would be to break everything down into singles, doubles, triples, etc. But like I said earlier, it is not easy to put all of that together, since I don't think you can generate relative outs with the sabermetric encyclopedia, and the independent variables need to be relative to the league average since that is how Oswalt calculated his PWV.

I now have all players (284) with 5000 PA from 1972-2002. To do what FJM did with OPS but broken down by OBP and SLG, I had to figure out who was in the middle. Not straightforward, since a guy could be in the middle on OBP but not SLG. So I created a stat, 1.5*relativeOBP + relativeSLG, and then ran the regression with the middle 143 or so guys. The r-squared was .736. The standard error was .395 wins for a 700 PA season, still pretty accurate.
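The middle-group selection Cyril describes can be sketched as follows. The player rows are hypothetical stand-ins — the real inputs would be relative OBP and relative SLG pulled from the encyclopedia:

```python
def composite(rel_obp, rel_slg):
    # The 1.5*relativeOBP + relativeSLG combination used to rank players.
    return 1.5 * rel_obp + rel_slg

# (name, relative OBP, relative SLG) -- made-up illustrative rows.
players = [("A", 108, 115), ("B", 99, 130), ("C", 120, 105), ("D", 101, 98)]

ranked = sorted(players, key=lambda row: composite(row[1], row[2]))
n = len(ranked)
middle = ranked[n // 4 : n - n // 4]  # keep roughly the middle half
print([name for name, _, _ in middle])  # -> ['A', 'B']
```

On the real 284-player table, the same slice would keep the middle 142 or so players for the regression.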

Posted 10:33 a.m., December 2, 2003 (#23) - tangotiger
  I published this in an earlier post, and I'll repeat it here:

To recap, the year-to-year r is dependent on:
1 - how many pitchers in the sample
2 - how many PAs per pitcher in year 1
3 - how many PAs per pitcher in year 2
4 - how much spread in the true rates there are among pitchers (expressed probably as a standard deviation)
5 - possibly how close the true rate is to .5
6 - the true rate being the same in year 1 and year 2

So, when looking at year-to-year r, I'd be very very careful. As well, the process being followed in the last few posts smells of selective sampling. Combining the two, and only looking at guys in the middle (where there is very little spread), I would expect the r to be low. I don't think this shows anything.

Say we use student grades. Our students have a "true score rate" of 60% to 90%, but when you look at any one test, the spread will be between 0 and 100%. You can probably come up with a model that says to regress the scores of the student's single test by 30% to come up with your best guess as to each student's true score rate.

But, what if you only selected those students who scored between 80 and 100%? Or who scored below 50%? Or who scored between 50 and 80%? I'm not sure what's expected to happen, nor if this is a good statistical technique (my guess is that it's not, but I don't know).
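For what it's worth, the student-test question can be poked at with a quick simulation (all parameters are assumptions: true score rates uniform on 60-90, test noise with a standard deviation of 8 points). Selecting students by their observed score shrinks the spread you are correlating over, and the test-to-test r drops:

```python
import random

rng = random.Random(7)

def pearson(xs, ys):
    # Plain Pearson correlation, no external libraries needed.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Each student has a fixed true score rate; each test adds independent noise.
truths = [rng.uniform(60, 90) for _ in range(5000)]
test1 = [t + rng.gauss(0, 8) for t in truths]
test2 = [t + rng.gauss(0, 8) for t in truths]

r_full = pearson(test1, test2)

# Keep only students who scored 80+ on the first test.
kept = [(a, b) for a, b in zip(test1, test2) if a >= 80]
r_selected = pearson([a for a, _ in kept], [b for _, b in kept])
# Selecting on the observed score restricts the range, and r drops.
```

Under these assumed numbers the restricted-group correlation comes out well below the full-group correlation, which is the selective-sampling worry raised above: looking only at a narrow observed band should be expected to depress r on its own.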

Posted 3:40 p.m., December 2, 2003 (#24) - dlf
  Is anyone here familiar with a study of clutch pitching? It seems that as the initiator of action, a pitcher could (should?) be more affected by heart / desire / character / balls or whatever cliche you want to use. But of all the hundreds of studies of clutch performance that I've seen, each and every one has looked at hitters rather than pitchers. I don't mean to suggest that Mariano Rivera is necessarily "clutch," but because of the physical and psychological differences in duties, I think it silly to suggest that evidence that Derek Jeter lacks a discernable ability to consistently outperform his norms in clutch situations requires us to believe the same is true of Rivera.

Posted 3:58 p.m., December 2, 2003 (#25) - tangotiger
  Since pitcher mechanics DO change based on the base situation, we should at least see a change there. I wouldn't necessarily attribute that to "clutch" or some "character trait".

Now, does a pitcher "bear down" in close/late situations? Again, any change in late situations can also be explained by fatigue, so again, any change there would not necessarily be a character trait.

So, what are we saying then? We want to look for pitcher performances in the
- same inning/base/out situation
- but different score situation (say 1 run differential versus 4+ run differential)

Again, a pitcher may change his pitching approach based on the score differential to "preserve" his arm, and any differences in performances would not necessarily be a character trait.

So, my question is:
Are we looking to establish if we can find a character trait of a pitcher to "bear down", or are we looking to establish if we can find a difference in performance based on the leverage of the situation?

Ask the question that you are after first, and then, we can discuss how you can construct a model to find the answer.

Posted 4:20 p.m., December 2, 2003 (#26) - FJM
  The difference between the best clutch performers and the average ones over the course of a long career is only about 1 win (or 10 runs) per year. And yes, there is an enormous amount of random variation, year-to-year. So you shouldn't expect to see consistency from one year to the next. To the extent this does measure clutch performance, it will only be apparent when looking at many years. One thing you might try: split the players' careers into odd and even years. Identify the clutch (or choke) players using the odd-year data, then do it again for the even years. If a player appears on both lists, that is pretty strong evidence.
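The odd/even split suggested above could be mechanized roughly like this; the players and per-season clutch scores below are entirely invented:

```python
def clutch_in_both_halves(seasons, top_n):
    # seasons: {player: {year: clutch score (PWV above OPS expectation)}}.
    # Return players ranked in the top `top_n` in BOTH odd and even years.
    def half_avg(years, parity):
        vals = [v for y, v in years.items() if y % 2 == parity]
        return sum(vals) / len(vals) if vals else float("-inf")
    odd = sorted(seasons, key=lambda p: half_avg(seasons[p], 1), reverse=True)[:top_n]
    even = sorted(seasons, key=lambda p: half_avg(seasons[p], 0), reverse=True)[:top_n]
    return sorted(set(odd) & set(even))

# Invented data: player "A" scores well in every season, "B" and "C" bounce around.
data = {
    "A": {1990: 0.4, 1991: 0.5, 1992: 0.3, 1993: 0.6},
    "B": {1990: 0.5, 1991: -0.2, 1992: -0.1, 1993: 0.1},
    "C": {1990: -0.3, 1991: 0.1, 1992: 0.2, 1993: -0.4},
}
print(clutch_in_both_halves(data, top_n=1))  # -> ['A']
```

Only a player who leads both independent halves survives the intersection, which is the point of the test: a fluke season can top one half, but it takes persistence to top both.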

All regressions display this phenomenon to some extent; for some it's a lot more significant than for others. It's a direct result of the methodology, which is LEAST SQUARES REGRESSION. That is, you minimize the sum of the SQUARES of the residuals. So, for example, Barry Bonds and Mark McGwire get a lot more weight in your regression than an average player would. Same for Bob Boone and Larry Bowa on the low end.

Sensitivity testing a regression model is always a good idea, especially when you suspect a small number of data points may be having an undue influence. Still, I agree that excluding half the data is probably too much. So I reran the regression, including all players with Rel. OPS in the 100-119 range. That increased the database to 83 out of the original 100. I still feel that the 5 superstars (Bonds, McGwire, Bagwell, Schmidt and Griffey Jr.) should be excluded. The same goes for the 12 players who made the list primarily for their gloves, not their bats. Anyway, the R^2 did improve to 74%, still a long way from 90%. But the model hardly changed at all. For every percentage point above average in Rel. OPS, the PWV/PA increases by .000226. That compares to .000224 for the 47-player group. (I inadvertently left Ted Simmons out the first time, hence the increase of one player.)
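The sensitivity check described here (refit on a trimmed sample, compare slopes) can be sketched on simulated data. The numbers below are illustrative, not Cyril's actual dataset; the slope of .000225 is just planted so we can see whether trimming recovers it:

```python
import random

random.seed(7)

def ols_slope(xs, ys):
    """Simple least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Simulated players: Rel. OPS and a PWV/PA that truly rises .000225 per point.
rel_ops = [random.uniform(85, 135) for _ in range(100)]
pwv_pa = [0.000225 * (x - 100) + random.gauss(0, 0.0005) for x in rel_ops]

full = ols_slope(rel_ops, pwv_pa)
# Drop the "superstars" (Rel. OPS 120+) and refit.
trimmed_pairs = [(x, y) for x, y in zip(rel_ops, pwv_pa) if x < 120]
trimmed = ols_slope(*zip(*trimmed_pairs))
print(f"slope, full sample:    {full:.6f}")
print(f"slope, trimmed sample: {trimmed:.6f}")
```

If the two slopes agree closely, as FJM's .000226 vs .000224 did, the fitted relationship is not being driven by the extreme points.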

One thing that expanding the data base did was to add several new candidates for the title of Mr. Clutch, most notably Tony Phillips. With a Rel. OPS of only 101, you'd expect him to have a PWV/PA of .000224. Instead, he's at .0019, over 8 times his expected value. Other strong candidates include Toby Harrah, Ken Griffey SENIOR, Pete Rose, Lou Whitaker and Jose Cruz.

Posted 4:43 p.m., December 2, 2003 (#27) - tangotiger
  PWV/PA of .000224. Instead, he's at .0019, over 8 times his expected value.

Please don't multiply. This is what Bill James does in Win Shares to discredit Linear Weights. PWV/PA is a relative scale. I mean, if it was -.0001 and with OPS he'd be at +.0001, are you saying his value would be -1 times his expected? As I said, please don't multiply.

In your case here, the difference comes out to 1 win / 600 PA, which is fairly large.
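Tango's arithmetic can be made explicit. On a relative scale like PWV/PA, the *difference* times playing time is denominated in wins, while a ratio blows up or flips sign depending on an arbitrary baseline (the .000224 and .0019 figures are from FJM's post above; the shifting baselines are illustrative):

```python
expected = 0.000224   # PWV/PA predicted for Phillips from Rel. OPS
actual = 0.0019       # Phillips' observed PWV/PA
pa = 600

# The difference, scaled by playing time, is in wins: about 1 win per 600 PA.
wins_above_expected = (actual - expected) * pa
print(f"difference: {wins_above_expected:.2f} wins per {pa} PA")

# Why not the ratio: shift both numbers by the same small baseline and the
# ratio changes wildly, while the wins gap does not move.
for baseline in (0.0, 0.0001, 0.0002):
    a, e = actual - baseline, expected - baseline
    print(f"baseline {baseline}: ratio = {a / e:.1f}, wins gap = {(a - e) * pa:.2f}")
```

The "8 times his expected value" figure is the baseline-0 ratio; the same data re-baselined gives ratios of 14 or 70 while the wins gap stays fixed, which is why the difference is the meaningful number.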

Posted 7:30 p.m., December 2, 2003 (#28) - FJM
  I understand your point, and it is a valid concern. Ratios can get very dicey when an average performer does slightly better (or slightly worse) than expected. OTOH, simply showing Tony Phillips' Actual-minus-Fitted (.0017) makes it appear his clutch performance is just a little better than that of Mark Grace and Keith Hernandez (both .0013). To use your grading analogy, Phillips is like a C student throughout high school who gets a B on the final exam, enabling him to go on to college. Grace and Hernandez are more akin to B+ students who get an A on the final. I would argue that Phillips' achievement is a lot more significant.

Posted 9:47 p.m., December 2, 2003 (#29) - Cyril Morong
  Regarding comment #26, if you look at your regression with 83 players, sure the r-squared is only .74. But the standard error is not much more than for the 100-hitter regression. It goes from .441 to .466 for a 700 PA season. With just 83 guys, it is still less than half a win. Over 60 guys, almost 75%, are predicted to within half a win. So although the r-squared goes down a lot, you still have a model that predicts well. A key thing is that the coefficient estimate for relative OPS doesn't change much when you remove the outliers. That shows the regression is not sensitive to who is in and out.

Also, and again, OPS is crude and it is better to run a regression with relative OBP and relative SLG, which I have already mentioned. I put OPS in the study because it is simple and, being a single variable, it could be shown in a graph. And since I put in the graph I thought it would be good to put in the numbers for people to look at.

As for Phillips, someone, by chance, is going to be in the tail. We cannot be sure if he is there because of clutch ability or luck.

Posted 10:08 p.m., December 2, 2003 (#30) - Ted T
  Yes, only looking at subsamples (such as the 110-119 OPS guys) is methodologically bogus.

Here's why the R^2 plummets. Selecting on a range like this cuts down the variation due to OPS in the sample. However, it *doesn't* cut down the other variation in that sample. So R^2 will certainly fall. To see this, consider the polar case: if we only selected guys with OPS=115, say, our R^2 would be zero -- we couldn't explain *any* variation using OPS!
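Ted's range-restriction argument is easy to demonstrate by simulation. In this made-up example (the slope, noise level, and OPS ranges are invented for illustration), the underlying relationship and the residual noise are identical in both samples, yet R^2 collapses in the restricted one while the residual standard error barely moves, which is also Cyril's point about prediction accuracy:

```python
import random

random.seed(42)

def r_squared(xs, ys):
    """R^2 of a simple linear regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def resid_se(xs, ys):
    """Residual standard error of the same fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return (sse / (n - 2)) ** 0.5

ops = [random.uniform(90, 160) for _ in range(1000)]
y = [0.000225 * x + random.gauss(0, 0.001) for x in ops]

sub = [(x, yy) for x, yy in zip(ops, y) if 110 <= x <= 119]
sub_x, sub_y = zip(*sub)

full_r2, sub_r2 = r_squared(ops, y), r_squared(sub_x, sub_y)
full_se, sub_se = resid_se(ops, y), resid_se(sub_x, sub_y)
print(f"full range:       R^2 = {full_r2:.2f}, resid SE = {full_se:.5f}")
print(f"restricted range: R^2 = {sub_r2:.2f}, resid SE = {sub_se:.5f}")
```

The restricted-range R^2 falls dramatically even though nothing about the model or the noise has changed, while the residual standard error stays near the true noise level in both samples. That is exactly the pattern in Cyril's follow-up: a tiny r-squared on the narrow-OPS subsample with an almost unchanged standard error.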

I also don't understand this talk about "outliers". In the scatterplot I looked at based on Cyril's data, the guys at the extremes aren't far off the regression line. They aren't outliers. And it's not true that the Bondses of the world carry heavier weight, because least squares is on *differences*, not absolute values.

Posted 11:23 p.m., December 2, 2003 (#31) - Cyril Morong
  As a footnote to what Ted T. posted (#30), I ran a regression with the 32 players who ranged in OPS from 112 down to 108. The r-squared was only .068. But the standard error was still just .482 for 700 PAs.