Individual Poster Page



Forecasting 2003

February 24, 2003 - FJM

I don't want to turn this into a DIPS thread. But for the sake of argument let's say that Voros is right: most pitchers' performance outside of the DIPS measures [SO/9, (BB+HB)/9, HR/9] is essentially random and hence unpredictable from year to year. Then it follows that, even if someone were to come up with a forecast for the 11 pitchers' ERA that is much better than all the others, we must assume luck, rather than forecasting skill, is the primary reason. To separate the two, why not have everybody forecast the DIPS numbers as well as ERA?


Are Managers Optimizing Their Best Relievers?

January 2, 2003 - FJM

I want to make sure I understand how the LI is calculated. Let's consider 2 hypothetical closers. #1's typical outing: he enters the game at the start of the 9th inning with a 1-run lead and retires the side 1-2-3. #2's typical outing: he also starts the 9th with a 1-run lead. But he gives up a couple hits and a walk before retiring the side. Would #2 have a higher LI than #1?
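
For concreteness, here's a rough Python sketch of the LI calculation as I understand it: the expected absolute swing in win expectancy (WE) for the current game state, normalized by the average swing across all states. Every number in it is invented for illustration, including the normalizing constant.

```python
# Rough sketch of a Leverage Index calculation, assuming LI is the expected
# absolute swing in win expectancy for the game state, normalized by the
# league-average swing. All values below are invented.

AVG_SWING = 0.035  # assumed league-average |delta WE| per plate appearance

def leverage_index(event_probs, we_after, we_now, avg_swing=AVG_SWING):
    """event_probs: {event: probability}; we_after: {event: WE if it happens}."""
    expected_swing = sum(p * abs(we_after[ev] - we_now)
                         for ev, p in event_probs.items())
    return expected_swing / avg_swing

# Hypothetical state: home closer enters the 9th with a 1-run lead.
probs = {"out": 0.68, "single": 0.20, "walk": 0.08, "homer": 0.04}
we    = {"out": 0.96, "single": 0.88, "walk": 0.89, "homer": 0.50}
print(round(leverage_index(probs, we, we_now=0.93), 2))
```

If LI is accumulated per PA like this, then closer #2, by putting runners on, would pass through extra high-leverage states; if it is set only by the entry state, the two closers would look identical. That is exactly the distinction I want confirmed.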


Are Managers Optimizing Their Best Relievers?

January 3, 2003 - FJM

My concern is that the LI as defined now may be biased against closers who allow very few baserunners per inning (e.g., Mariano Rivera). Is there an easy way to test this?


Are Managers Optimizing Their Best Relievers?

January 3, 2003 - FJM

Looking at the "giants" of the 1974-90 period doesn't prove anything. As you have pointed out elsewhere, it's almost impossible for a starting pitcher to have an LI significantly different than 1.00. It really only has meaning for relievers.

Over the last 4 years, Rob Nen has a WHIP of 1.125; Mariano's is 0.969. Nen has allowed 41 more hits and 22 more walks while pitching only 18 more innings than Rivera. I suspect that goes a long way toward explaining Nen's higher LI.


Are Managers Optimizing Their Best Relievers?

January 4, 2003 - FJM

I understand your point. When a pitcher gets himself in trouble and his manager leaves him in, the manager is in effect making a new decision: to let him "relieve" himself. I agree with you, except in the case of closers. The way closers are used now, the manager really doesn't have to do any thinking at all. When the closer comes in, everybody knows he will stay in until either he gets the Sv or he gets a BS. There is simply no other option. I'm not saying that's the way it should be; I'm saying that's just the way it is.

Anyway, I guess I'll just have to wait until you get a chance to rerun the data using the reliever's entry point. Don't make me wait too long, OK?

In the meantime I have another request. I was surprised your TOP 10 list didn't include several of the top closers of the last 4 years, most notably Billy Koch (144 saves) and Troy Percival (142 saves). I think I know why Koch didn't make it: he really isn't that good. But I certainly thought Troy would make any Top 10 list. Is it really fair to include guys like Shuey who rarely faced high pressure situations and exclude guys like Percival who always did? How do we know Shuey would have performed as well as he did under pressure?

Actually, what I'd like to request is that you rerun the study using the Top 10 Closers of the last 4 years, using Saves rather than mFIP as the selection criteria. Also, since most of the top closers in 2002 were relatively new to the role (e.g. Smoltz, Williams, etc.), could you look at that group as well? Thanks a lot.


SABR 301 - Talent Distributions (June 5, 2003)

Discussion Thread

Posted 8:21 p.m., July 9, 2003 (#28) - FJM
  In a perfect world there would be a perfect correlation between talent level and playing time. The real world is far from perfect. I hope we can all agree that Barry Bonds was the best outfielder last year. Yet he was tied for 33rd (with Trot Nixon, no less!) in PA's. Manny Ramirez had the 2nd best OPS among outfielders, yet he ranked 55th in PA's.

Injuries are the most obvious reason that the correlation breaks down, but they are not the only one. Ichiro led the way in PA's, but his OPS was far down the list. (In fact, he barely beat the aforementioned Mr. Nixon, .813 to .808.) Darin Erstad ranked 15th in PA's, despite a lousy .702 OPS. I hope your future studies will develop a more complete Playing Time model.



Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 5:11 p.m., June 24, 2003 (#7) - FJM
  It is often said that closers don't pitch well in low leverage situations. Can you confirm or deny that? Does performance really improve with LI? Can you post OBA or OOBP for each pitcher in each LI Class?


Reliever Usage Pattern, 1999-2002 (June 24, 2003)

Discussion Thread

Posted 7:29 p.m., June 25, 2003 (#19) - FJM
  Walt: I don't think you can conclude that, just because most situations are low leverage, there must be a lot of blowouts. Consider all 2 out, nobody on, score tied situations. Unless it's late in the ballgame, these are low leverage situations. Hitting a homerun is about the only thing you can do to significantly alter the win probability, and even that wouldn't make a big difference early in the game.


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 8:35 p.m., June 26, 2003 (#11) - FJM
  I'm surprised that no one has mentioned there is a serious problem with the regression in the paper. It can most easily be seen if you put his first table into a spreadsheet. Then multiply n, the number of players in each group, by Mean PA. This gives you the number of PA's for all players by group. For example, in the 1-5 PA group we have 53 players averaging 3.8 PA's each, or a total of 201 PA's. At the other extreme we have 24 players averaging 7,271 PA's, or a total of 174,504 PA's. (Actually, the next-to-last group has even more: 49x4,835=236,915.)

So, what's the problem? Regression proceeds by minimizing the sum of the squared errors, ASSUMING that the data at each point along the regression curve is equally significant. But that is obviously violated here. When you have around 200,000 PA's, you can be very confident that the measured value is very close to the real value. When you have only 200 PA's (from 53 different players, no less!) you really can't conclude anything at all. Yet the regression treats the first point as if it is every bit as significant as the last.

To do the regression properly, you would need to "weight" each point by the total number of PA's it represents. That is, if the first point appears once in the regression, then the last two points would appear about 1,000 times each. (Actually, the next-to-last point would appear 236915/201=1,179 times, the last 174504/201=868 times.)

Of course most regression routines can't handle anywhere near that many points. But that's OK, because what this tells us is that there is very little useful information in the small PA groups anyway. In that case, you can throw them out of the regression altogether without affecting the result.

In this case, I'd probably throw out all groups with less than 5,000 PA's, which turns out to be up through the 71-110 PA group. Then the smallest group becomes 111-150, with a total of 6,630 PA's. If that data point enters the regression once, then the next-to-last group should go in 236915/6630=36 times, which is certainly feasible.
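
Here is a sketch of the setup. Weighting each group by its total PA's is mathematically equivalent to the replication scheme just described. The three rows are placeholders; the real inputs are every group's mean PA, total PA and BR/PA from the paper's table (the first BR/PA value in particular is invented).

```python
import numpy as np

# Weighted least squares for the model BR/PA = a*ln(PA) + b, weighting
# each group by its total PA's. Equivalent to replicating each group's
# row in proportion to its weight.
mean_pa  = np.array([130.0, 4835.0, 7271.0])        # group mean PA
total_pa = np.array([6630.0, 236915.0, 174504.0])   # weight = n * mean PA
br_pa    = np.array([-0.020, 0.0102, 0.0231])       # illustrative only

X = np.column_stack([np.log(mean_pa), np.ones_like(mean_pa)])
W = np.diag(total_pa)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ br_pa)  # (X'WX) beta = X'W y
print(f"BR/PA = {beta[0]:.5f} * ln(PA) + {beta[1]:.4f}")
```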

I don't have time to do it now, but I'll post the result tomorrow.


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 2:00 p.m., June 27, 2003 (#14) - FJM
  Here is the regression equation using the "weighting" scheme outlined above: BR/PA = 0.01656 * ln(PA) - 0.129.

I must admit I expected the effect of the change to be more dramatic. To refresh your memory, Nate's version was BR/PA = 0.0154 * ln(PA) - 0.117. Although the two equations look similar (and, in fact, produce similar results over most of the range) they differ significantly at the upper end. For the group with the most data (4201-5000), which has an actual BR/PA of 0.0102, Nate's model gives 0.0134 while mine says 0.0118. His model fits the data better than mine in the 4 lowest ranges: 111-150, 151-200, 201-300 and 301-400. That's to be expected, since they get very little "weight" in my regression (1, 1, 2 and 2 respectively). On the other hand, mine fits better than his in all but one of the other ranges.

Ironically, the lone exception is the very last group (5,501-10,184). Both models seriously underestimate the actual BR/PA here, but his does come quite a bit closer (Actual: 0.0231, His: 0.0199, Mine: 0.0186). I suspect the reason we both do so poorly here is that this group, which has only 24 members, includes some truly exceptional players. Note that their productivity as a group was more than double that of the next lower group. It would be helpful if someone could produce the individual numbers for each member of this group.

This led me to another flaw in the analysis. Nate says: "I've compiled the performance of all non-pitchers whose careers were completed between the years 1973 and 1992..." I assumed he meant that he compiled the career numbers for all players who retired during that period. But he goes on to say that the player with the most PA's was Dave Parker with 10,584. Well, both Pete Rose and Hank Aaron ended their careers during that period, and they had roughly 15,000 PA's each. Apparently he only included PA's that occurred during that 20-year period, which means players like Aaron and Rose are misclassified and their numbers are incomplete. An even greater distortion occurs in the case of Willie Mays. He retired in 1973, the first year of the study, with about 250 PA's and a .211 BA. Please rerun the study, either by including the entire careers of all these players or, if that is not possible, by excluding them entirely.

One more thing I discovered


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 2:05 p.m., June 27, 2003 (#15) - FJM
  Ignore the "One more thing I discovered" line at the end of the previous post.


Cycles (June 27, 2003)

Discussion Thread

Posted 5:26 p.m., June 27, 2003 (#4) - FJM
  PhillyBooster makes a very good point about the danger of compounding averages. It's even more critical than that. Since the vast majority of potential cycles fail for lack of either a triple or a homerun, let's confine ourselves to those two stats. Last year there were a total of 921 triples and 5,059 HR's hit. Of the 5,059 homers, 1,189 (23.5%) were hit by players who didn't have a triple all season! So nearly one quarter of all HR's could not possibly have contributed to a cycle. Another 1,103 HR's (21.8%) were hit by players who had only one triple all season. While we can't rule out the possibility that these contributed to a cycle, the odds are overwhelmingly against it. So nearly half of last year's HR's were essentially "wasted" on players who had little or no hope of hitting for the cycle.


Fewest BB / PA, since 1947, min 150 PA (July 10, 2003)

Discussion Thread

Posted 5:42 p.m., July 10, 2003 (#2) - FJM
  I agree. He is well on his way to setting a record for control that may never be broken. And what makes it all the more amazing is, his sixteen-year history doesn't give the slightest indication he is capable of anything like this. Coming into this year he was averaging an uninspiring 53 BB's per 1000 PA. Even in his best year, 2000, he could only get it down to 31.4 per 1000. How does a 40-year-old pitcher suddenly cut his walk rate by 79% (relative to his career average) or 64% (relative to his previous best)? I have no idea, but apparently the Questec system doesn't bother him.

Here's another amazing bit of information to keep an eye on. He has 7 HBP's to go with his 6 BB's! Has anybody ever hit more batters than they walked over an entire season? Coming into this season he had walked nearly 10 batters for every one he hit (604/61).


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 7:44 p.m., August 5, 2003 (#26) - FJM
  Let's change the R's to R^2's for the 250+ group so we can think in terms of % of variance explained.

1B/BIP: .18^2 = 3.2% explained.
XB/BIP: .17^2 = 2.9%.
XBH/PA: .22^2 = 4.8%.
1B/PA: .40^2 =16.0%.

Now the question is, which of these doesn't belong with the others? I think the answer is pretty clear.

It makes sense that XBH/PA would have a somewhat higher R^2 than XB/BIP since the denominator is bigger and hence more stable. (If K/PA, BB/PA and HR/PA were all perfectly correlated year-to-year it wouldn't make any difference. Since they aren't, it does.)

It also makes sense that the XBH/PA R^2 is only a little higher than the XB/BIP R^2, because the bulk of the variation comes from the numerator, not the denominator. But that means that the 1B/PA R^2 should also be only slightly higher than the 1B/BIP R^2, not 5 times greater. Something is wrong somewhere.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 5:10 p.m., August 6, 2003 (#41) - FJM
  As long as the ability to prevent hits on BIP is viewed as a skill WHICH DOES NOT CHANGE from year-to-year, then your model is an accurate representation of the real world and the correlation coefficient increases with the width of the range of abilities. But that assumes every 0.20 pitcher remains a 0.20 pitcher, every 0.18 pitcher stays right there, and so on. How realistic is that? Well, if the range of abilities is very narrow, then the chance of any pitcher greatly improving (or worsening) is very remote. But if the range is very wide, significant changes in year-to-year ability are certainly possible.

At the extreme, assume your range is very wide (say, 0.12-0.28) but the pitcher's ability in year 2 is completely independent of his ability in year 1. Then the correlation coefficient is 0 by definition. So you can get a small r in either of 2 ways: 1) very small differences in true ability among pitchers with a lot of random variation, or 2) large differences in true ability accompanied by large year-to-year variation in that ability for individual pitchers. DIPS assumes that reason #1 is THE reason for the low correlation. I strongly suspect that #2 plays a very significant if secondary role.
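
A quick Monte Carlo sketch of the two scenarios (every parameter invented for illustration) shows that both worlds produce a small observed year-to-year r:

```python
import numpy as np

# Two ways to get a small year-to-year correlation in hits per BIP.
rng = np.random.default_rng(0)
n_pitchers, bip = 200, 500   # pitchers, balls in play per season

def observed_r(talent_sd, rho):
    """rho = 1: ability never changes; rho = 0: year-2 ability independent."""
    t1 = rng.normal(0.29, talent_sd, n_pitchers)           # true year-1 ability
    t2 = rho * t1 + (1 - rho) * rng.normal(0.29, talent_sd, n_pitchers)
    y1 = rng.binomial(bip, np.clip(t1, 0, 1)) / bip        # observed rates
    y2 = rng.binomial(bip, np.clip(t2, 0, 1)) / bip
    return np.corrcoef(y1, y2)[0, 1]

print("narrow talent range, stable ability :", round(observed_r(0.005, 1.0), 2))
print("wide talent range, unstable ability :", round(observed_r(0.030, 0.2), 2))
```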


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 7:54 p.m., August 10, 2003 (#76) - FJM
  The GB/FB distinction is important for assessing the impact of fielding on a pitcher's BABIP. But no less important, at least among the starters, is separating lefties from righties. In 2002 Arizona's 3 RHP starters saw LHB's in 47% of their BFP, so the fielders were about equally tested. In contrast, Brian Anderson saw LHB's only 26% of the time. And Randy Johnson? Only 15%.

Ironically, both Anderson and Johnson had higher BABIP's against the lefties than the righties. In Randy's case it was almost a wash (.295 to .291). But Brian was hammered by lefties at a .323 clip, compared to .288 by the righties. It would be interesting to know how much of that disparity was attributable to his fielders.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 1:18 p.m., August 11, 2003 (#79) - FJM
  I didn't make my point as clear as I should have. I don't think it is necessary to take the ball distribution down to the individual pitcher level. All 3 RHP's faced 47% LHB's, so the only important distinctions among them are GB/FB and Power/Finesse. On the other hand, not only did the D'backs lefties see far fewer LHB's, the mix was very different between the two of them. Even here, I think the GB/FB and Power/Finesse splits would probably be enough, although the Big Unit really should be in a class all by himself. But the LHP/RHP split is fundamental.

Of course if you start from the assumption that all fielders have equal ability, then nothing else matters. But that assumption is so far removed from reality that I'm afraid it will significantly understate the standard deviation attributable to fielding.

On the other hand, as with pitchers, I don't think you need to simulate separate fielding ratings for each position on the team, much less for every individual at each position. The left side of the infield could be rated as a unit. The right side could be treated similarly as far as assists are concerned, although the first baseman's ability to turn bad throws into outs might require a separate rating. The outfield could get by with 2 ratings, a LF-CF rating and a CF-RF rating. Finally, I don't think you need to rate the catcher at all for this purpose. The difference between a great catcher and an average one (or even a bad one) generally comes down to some combination of stolen base and wild pitch/passed ball prevention, neither of which affects BABIP. (Of course, pitch selection is part of it too, but that is generally credited to --- or blamed on --- the pitcher.)


BP - Sample size and park factors (August 11, 2003)

Discussion Thread

Posted 7:21 p.m., August 11, 2003 (#2) - FJM
  The Stats (now Stats/Sporting News) green book has component factors. They vary too much from year to year to be given a lot of credence, but they're better than nothing. They also show 3 year averages.


BP - Sample size and park factors (August 11, 2003)

Discussion Thread

Posted 9:07 p.m., August 11, 2003 (#5) - FJM
  There is something weird going on at Dodger Stadium this year. 11.7% (103/883) of all hits by the Dodgers and their opponents have left the yard. That compares to only 7.4% (67/907) on the road.

It is true Nomo has been victimized the most, with 18.9% (14/74) at home and just 8.2% (4/49) on the road. But it's affecting the rest of the Dodger staff too: 12.4% (43/348) home, 7.7% (30/391) away.

The Dodger hitters are not affected as much: 9.8% (46/461) home, 7.1% (33/467) road.

Whatever is going on here, it's not a sample size issue.


BP - Sample size and park factors (August 11, 2003)

Discussion Thread

Posted 12:45 p.m., August 12, 2003 (#7) - FJM
  Using BINOMDIST in Excel, the probability of 50 or more HR's given 400 hits and a true 10% rate is 4.3%. The probability of 30 or less is 5.2%. When you include Nomo and the Dodgers hitters the probability drops to around 1.5% either way.
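
For anyone who'd rather check this outside Excel, here's the same calculation as a Python sketch (scipy's binomial distribution; I'm only restating the inputs quoted above):

```python
from scipy.stats import binom

# With 400 hits and a true 10% HR-per-hit rate, how unusual are the
# observed home/road splits?
n, p = 400, 0.10
print("P(50 or more HR):", binom.sf(49, n, p))    # sf(49) = P(X > 49)
print("P(30 or fewer HR):", binom.cdf(30, n, p))
```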


BP - Sample size and park factors (August 11, 2003)

Discussion Thread

Posted 5:49 p.m., August 12, 2003 (#9) - FJM
  It's not quite that simple. We came up with our presumed 10% rate by combining the Home and Road data. So if the Home HR % turns out to be much higher than 10%, then the Road HR % must be much lower. In other words, the two results are not independent. Multiplying the probabilities together is appropriate only if they are independent.

As it turns out, HR rates were somewhat higher in Dodger Stadium in 2002 as well, although the difference was much less than it is this year. The Dodgers pitchers had 13.4% of their hits leave the park at home vs. 11.8% away. For Dodger hitters it was 10.9% home, 10.3% away. Putting them together we have 12.1% home, 11.0% away. So HR Rates have fallen just slightly in 2003 in L.A. while the Road % is down 33%!


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 7:59 p.m., August 20, 2003 (#6) - FJM
  I took a quick look at the Team ZR's for SS and 3B, 2001-03. I wanted to see how close they came to your estimate for observed standard deviation (.025). The answer is, pretty close, but there is an important caveat attached.

At SS I got a 3 year average of .843 with a st. dev. of .027, very close to your number. But that doesn't tell the whole story. The skewness parameter is -.766, which is highly significant. What does that mean? Two things. 1) There are a lot more teams with above average shortstops (49) than there are with below average ones (41). However, 2) the difference between the worst shortstop (.741) and the average is much greater than the difference between the best (.918) and the average. The distribution is skewed to the left. THIS IS NOT A NORMAL DISTRIBUTION; it's not even close. That makes sense, when you think about it. There is a practical limit to how good a shortstop can be. Only Superman or the Flash could get to every ground ball, and only the guy with the big red "S" could throw out every runner from deep in the hole. On the other hand, a really bad shortstop is limited only by the patience of his manager. (Incidentally, the Yankees rank either 29th or 30th all 3 years.)

The situation is a bit different at third. The average is .762, suggesting that a lot more balls get through. The st. dev. is also higher, .031, suggesting there is greater variation in ability. That makes sense too, since 3rd base is viewed by most people as being primarily an offensive position. Yet the skewness parameter is much less extreme (-.198). There are nearly as many teams below average at 3B (44) as there are above (46). Assuming a normal distribution here is probably OK.
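
For anyone who wants to replicate these summary stats, they're one-liners in scipy. The array below is a placeholder, not the actual 90 team-seasons:

```python
import numpy as np
from scipy.stats import skew

# Mean, standard deviation, and skewness of team zone-rating samples.
# `zr_ss` stands in for the 90 team-season ZR values at shortstop.
zr_ss = np.array([0.741, 0.843, 0.918, 0.850, 0.861])  # placeholder values

print("mean :", zr_ss.mean().round(3))
print("sd   :", zr_ss.std(ddof=1).round(3))
print("skew :", skew(zr_ss, bias=False).round(3))  # negative => left tail
```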


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 3:06 a.m., August 21, 2003 (#8) - FJM
  I'll give it a try. I wouldn't expect too much, though. The wide variation in number of games played will make the observed standard deviation questionable, unless the selection criterion is set awfully high. And if I do that, I won't have enough observations to work with. Is there any way to get annual Team UZR's? I'm also unclear how UZR Runs/162 translates to UZR percentage.


Solving DIPS (August 20, 2003)

Discussion Thread

Posted 9:06 p.m., August 22, 2003 (#14) - FJM
  Dirk: I'm guessing that Woolner simply used each batter's overall stats to compute these averages. But if you're trying to determine how tough a batter is for a particular pitcher, that's the wrong approach. At the very least you need to consider how each batter does against LHP/RHP, whichever is appropriate. Depending on the pitcher, you might also need to look at how each one does against GB or FB pitchers and/or power/finesse pitchers. In its present form, this is pretty useless.


CF Rankings (August 22, 2003)

Discussion Thread

Posted 8:45 p.m., August 22, 2003 (#7) - FJM
  KJOK: "Interesting San Diego, with "old" Steve Finley, is near the top..."

Old Steve has been in Arizona since 1999. According to Mong, he's slightly below average.

I have a hard time believing Torii Hunter is below average.


Instructions for MVP (September 22, 2003)

Discussion Thread

Posted 8:28 p.m., September 22, 2003 (#5) - FJM
  There's another aspect to this highly subjective process that nobody has touched on. I have often heard it said: "Joe Blow may be the MVP of the league, but Jack Black is the MVP on Joe Blow's team." That always struck me as odd. Now, having read the instructions, it is clear (to me at least) that it is not merely odd but a violation of the rules. In order to qualify for league MVP voting at all, Joe Blow must first of all be the MVP of his own team. That suggests a two-stage voting process: each team selects its own MVP first, then the sportswriters (or whoever) select among the team MVP's. This process would alleviate several weaknesses of the current system, such as a team with 2 strong MVP candidates having their votes split.


Most pitches / game in a season (September 22, 2003)

Discussion Thread

Posted 11:58 a.m., September 23, 2003 (#7) - FJM
  I think we can agree that many pitchers could handle a 120-pitch average, if that's what they became accustomed to. The real problem is not so much the mean but the standard deviation. Throwing 120 every time out is very different than throwing 100 in half your starts and 140 in the other half. Could you post the st. dev. as well? Alternatively, show the total number of starts and the number in the 120's, 130's and 140's separately.


Most pitches / game in a season (September 22, 2003)

Discussion Thread

Posted 8:14 p.m., September 23, 2003 (#12) - FJM
  TT: I tried to test your Johnson/Radke hypothesis. First I looked at all of Radke's 2001-03 starts where 1)he pitched at least 7 innings, and 2) BB+SO < IP. There were a total of 38 starts that qualified. Here is what his average qualified start looked like. IP: 7.7, H: 6.4, BB: 0.7, SO: 3.9, Pitches: 99.5.

I then ran a multiple regression on the data, forcing the intercept to 0. Although the fit wasn't very good (R = .55), the coefficients looked reasonable, except for the BB coefficient. Here is the Radke Model: Pitches = 9.45*IP + 3.40*H + 1.34*BB + 0.87*SO. Of course, the BB coefficient should be greater than the H coefficient. But since Radke hardly ever allows more than 1 walk, they don't affect the model very much.
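
For anyone who wants to replicate this, here's a rough numpy sketch of a zero-intercept fit of that form. The rows are invented stand-ins, not Radke's actual game logs:

```python
import numpy as np

# Zero-intercept multiple regression: Pitches ~ IP + H + BB + SO.
# Omitting the constant column forces the fit through the origin.
starts = np.array([
    [8.0, 6.0, 1.0, 4.0, 101.0],
    [7.0, 7.0, 0.0, 3.0,  94.0],
    [9.0, 5.0, 1.0, 5.0, 106.0],
    [7.3, 8.0, 1.0, 4.0, 100.0],
    [7.0, 6.0, 2.0, 3.0,  97.0],
])  # columns: IP, H, BB, SO, Pitches

X, y = starts[:, :4], starts[:, 4]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["IP", "H", "BB", "SO"], coef.round(2))))

# Apply the fitted coefficients to another pitcher's average start:
johnson_avg = np.array([7.7, 7.0, 0.4, 5.7])
print("predicted pitches:", round(float(johnson_avg @ coef), 1))
```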

When I tried to do the same for Randy Johnson, I ran into a problem immediately: he had only 7 starts that met the criteria! With so few data points, fitting a model was out of the question. So instead I used the coefficients from the Radke model and applied them to Randy's average start. Here are Randy's numbers: IP : 7.7, H: 7.0, BB: 0.4, SO: 5.7, Pitches: 103.1. Given those numbers, the Radke model predicted he would average 102.3 pitches. That's less than a 1% error. Not bad!

To further test the Radke model, I did the same thing for another non-contact pitcher, Curt Schilling. He had 12 starts that qualified. Here are his averages: IP : 7.8, H: 8.2, BB: 0.5, SO: 5.5, Pitches: 108.7. And the Radke model predicted 107.3. That's a 1.3% error. Again, not bad!

My conclusion: using a model derived from a contact pitcher and applying it to a non-contact pitcher does not result in a significant underestimate of pitch count in games where his strikeout count is lower than normal.


Most pitches / game in a season (September 22, 2003)

Discussion Thread

Posted 12:16 p.m., September 25, 2003 (#15) - FJM
  Here's a possible explanation for why your theory didn't pan out. I've studied Schilling's performance at various ball-strike counts extensively, and one thing stands out. Hitters go up there hacking at the first pitch that they can put in play far more than they do with most other pitchers. That makes good sense, when you think about it. They know they have almost no chance if he gets 2 strikes on them, so they try to beat him to the punch. I haven't looked at Johnson as closely, but I'll bet they do it to him too. So the reason your model overestimates their counts may have more to do with first and second pitch BIP than it does with 2 strike BIP.



Aging patterns (September 23, 2003)

Discussion Thread

Posted 12:47 p.m., September 29, 2003 (#23) - FJM
  Also remember that the ratios in the table are based on HR rates, not the raw number of HR's. A player's number of AB's is likely to decline significantly as he reaches his late 30's, even if his production rate remains fairly constant. Also, his strikeout rate will increase, giving him relatively fewer opportunities to make contact.

One thing I don't understand, TT: why do you subtract HR's from the denominator: HR/(AB-K-HR)? Also, why subtract K's from the denominator of the K-rate: K/(AB-K)?



Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 8:04 p.m., September 25, 2003 (#6) - FJM
  Since eraBB > eraH and since eraHR >> eraBB (and presumably eraHR > era3B > era2B > era1B), it follows that eraBB >> era1B! (">>" means "much greater than." Since eraH is a weighted average of the four hit types, dominated by singles but pulled up by the extra-base hits, eraH < eraBB forces era1B well below eraBB.) Which appears to contradict the observation that it doesn't matter how the batter gets to first base. Can you post the values for era1B and era2B?


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 8:19 p.m., September 26, 2003 (#12) - FJM
  I think you'll find that era2B3B is more like halfway between era1B and eraHR.


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 12:22 p.m., September 29, 2003 (#18) - FJM
  You picked up the wrong column of numbers from Tango's leadoff table. You want the "Reach" column, not the "Score" column. Specifically, the ratio you want is 82,637/265,610.

That works out to .311, so your point is still valid. It's still far less than the overall BB/H ratio you quote (.375).

I wonder how much of that difference can be explained by Intentional Walks. Frankly, I was surprised to see the leadoff batter ever getting an IBB.


Psychological Impact of a Devastating Outcome (September 27, 2003)

Discussion Thread

Posted 5:28 p.m., October 1, 2003 (#7) - FJM
  Building on Andrew's suggestions.

Walking or hitting the opposing pitcher (or #9 hitter in the AL).
Walking or hitting the 1st batter, especially if he is a weak hitter.
Allowing a stolen base, esp. to a runner who almost never steals.
Allowing a HR to a guy with no power.
Allowing a hit on an 0-2, 1-2 or 2-2 count, esp. with RISP.
Allowing a hit on a 2-0,3-0 or 3-1 count with 1st base open and RISP.
Losing a no-hitter (6th inning or later).
Wild pitch/passed ball on the 3rd strike.
Blooper falls in.
A visit by the pitching coach.
Returning to the mound after a very long half-inning.


Psychological Impact of a Devastating Outcome (September 27, 2003)

Discussion Thread

Posted 8:03 p.m., October 1, 2003 (#9) - FJM
  I'm sure hitters get just as frustrated as pitchers. But their frustration is usually tied to a specific pitcher. As soon as the starter leaves the game, it's like starting fresh. So you'd only be able to use a fraction of the PA's in a typical game.

You could try looking at hitters who have been "owned" by a particular pitcher in the past. But is that caused by frustration or by a "real" problem, like the inability to pick up the ball because of that pitcher's motion?


2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 11:22 a.m., October 3, 2003 (#15) - FJM
  Bob: You could test your hypothesis simply by removing all Rockies games, home and away, from your data base and recomputing the PF's. You should still have enough data to look at for the NL West teams on a 3-year basis, except for the Rox themselves of course.

A couple other things you might want to check. BOB has the reputation, rightly or wrongly, of being a much better hitters park with the roof open. Also, since the Questec system was installed there, umpires supposedly won't give pitchers borderline strikes there that they can still get in non-Questec parks. (Both of these observations courtesy of Curt Schilling.)

Finally, I have heard many times that the Southern Cal. parks play very differently day vs. night. Can you do the split?


2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 5:57 p.m., October 10, 2003 (#20) - FJM
  It's always hard to compare starters and closers. So let's compare Gagne to Smoltz. Does anyone believe EG contributed 56% more to LA than JS did to the Braves? He threw 28% more innings, so that explains half the difference. Where's the other half?

While we're on the subject, does anyone believe Pudge was the 9th best catcher in the NL defensively?


Player Game Percentages, World Series (October 8, 2003)

Discussion Thread

Posted 8:04 p.m., October 9, 2003 (#8) - FJM
  If you have the Win Probability Matrix for a given situation (say, Coors Field, 2000-2002) in an Excel spreadsheet and you are able to read in the PBP data for Coors in 2003, it shouldn't be too difficult to calculate the WPA for each player, game by game. But what if the 2003 version of Coors is significantly different than 2000-2002, as it seems to have been? Then the old WPM may no longer apply. But if you try to create a new matrix based solely on 2003 data, you'll get all sorts of odd results. Even 3 years data won't be enough to prevent that entirely. The point is, doing the calculation isn't hard; creating a reliable situation-specific matrix is.


Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 12:30 p.m., October 10, 2003 (#2) - FJM
  Tom: could you post the H-R differential for each team and its opponents separately? I believe some parks help the home team a lot more than they help the opposition. Case in point: Coors 2003. If you look only at the Rox, the differential (.324-.283=.041) was down only 8 points from your 1999-2002 number. But from the opponents' perspective, it was down 51 points (.313-.315= -.002). That's right, Rox opponents actually had a better H% at home than they did in Denver this year.


Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 1:48 p.m., October 10, 2003 (#4) - FJM
  It does indeed. You might want to take a quick look at the Day/Night split as well. Dodger Stadium and Qualcomm in particular have the reputation of being completely different parks in the daytime.

Also, a study of Questec/non-Questec parks is definitely needed. The effects of Robo-ump should be seen primarily in the K/BB ratio. However, if Curt Schilling is right, it will show up in H% as well.


Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 5:01 p.m., October 10, 2003 (#7) - FJM
  If you tried to include all possible factors in your model for every park, you would indeed be left with nothing worth analyzing. I don't believe that will be necessary. For example, the day/night distinction probably only matters for a few parks. The burden of proof is always on the one who asserts the positive. So test it first by itself for each park. Unless it clearly matters for Park X, throw it out. Same for home/visitor, left/right, etc. I'm guessing only one or two factors will be significant for each park, but they won't always be the same ones.


Injury-prone players (October 14, 2003)

Discussion Thread

Posted 10:10 p.m., October 14, 2003 (#16) - FJM
  You are mixing two different phenomena here: frequency and severity of injury. Frequency should be more predictable than severity. You don't want to treat a player with 3 different visits to the DL totalling 90 days the same as one with a single, 90-day layoff.

Here's what I suggest. Set aside severity for the moment. For each player who has been in MLB at least 4 full years, run a regression where Y=number of times on the DL in Year N and X1 is the number of times in Year N-1, X2 is the number in N-2 and X3 is the number in N-3.
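
A rough sketch of that setup in Python, with an invented DL-trip matrix standing in for real transaction data:

```python
import numpy as np

# Lagged regression: DL trips in year N against trips in N-1, N-2, N-3.
dl = np.array([
    [0, 1, 2, 1, 0],
    [3, 2, 3, 2, 3],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 2, 1],
])  # rows = players, columns = consecutive seasons (invented counts)

rows = []
for player in dl:
    for n in range(3, len(player)):  # need three full years of history
        rows.append([player[n], player[n-1], player[n-2], player[n-3]])
rows = np.array(rows, dtype=float)

y = rows[:, 0]
X = np.column_stack([np.ones(len(rows)), rows[:, 1:]])  # intercept + 3 lags
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept and lag-1..lag-3 coefficients:", coef.round(3))
```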


Anatomy of a Collapse (October 15, 2003)

Discussion Thread

Posted 8:09 p.m., October 15, 2003 (#23) - FJM
  Fascinating! Naturally, I have a few questions.
1) Did you consider the possibility that the Run Expectancy (and hence the Win Probability) had already changed during Castillo's at bat by virtue of the count going to 3-2? In the NL last season the overall OBP was .332. But it jumped to .467 after the count reached 3-2, a 41% increase. So instead of a 1-in-3 chance of getting on base the odds had already changed to nearly 50/50. That made the fan interference somewhat more costly than if it had occurred on the first pitch.
2) How does this one compare to other notable post-season collapses (2002 W.S. Game 6, the Kim blowups in 2001, Billy Buckner & Bucky Dent)? Was this the worst ever?


Anatomy of a Collapse (October 15, 2003)

Discussion Thread

Posted 6:34 p.m., October 16, 2003 (#33) - FJM
  I remember Woolner's study. I (and many others) wrote to him at the time that it was flawed in that it mixed two different effects into one: the length of the at bat (the number of pitches thrown) and the ball-strike count. I believe he published a followup study in which he separated the two. Sorry I don't have the link.

According to Woolner, the weighted average OBP for pitches 12 and up is .461. That number will only go down as you bring in pitch 11, pitch 10, etc. (Pitch 11 and above is .449, for example.) Yet as I stated the NL OBP on a 3-2 count was .467 last year. So either the two data sets are very different or there are a lot of long at bats where the count never reaches 3-2.


Relevancy of the Post-season (October 16, 2003)

Discussion Thread

Posted 6:05 p.m., October 16, 2003 (#3) - FJM
  Tonight the Yankees and Red Sox will play for the 26th time this season, the most meetings ever between two teams. If New York wins they will have a 14-12 edge. Would that establish Yankee superiority? Of course not, no more than flipping a coin 26 times and getting 14 heads proves it's biased. Nor would a Boston win prove the two teams are equal, even though they would have the same number of wins.

Even if they played the entire 162-game schedule against each other you couldn't prove one was stronger than the other. For one thing, they were very different ballclubs back in April and May than they are now. Consequently, you'd have to give much greater weight to the most recent experience; i.e., this series. And that's why playoff baseball is so exciting and so unpredictable: because the importance of every play, every single pitch, is enormously magnified. The playoffs aren't about who is best on paper. A computer simulation would be the best way to decide that. They are about who is best on that particular day (or series of days) under conditions totally different than the regular season. And that's as it should be.


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 1:27 p.m., October 21, 2003 (#5) - FJM
  You can calculate the run distribution from the Win Probabilities for the bottom of the 9th and a little algebra.

P(0 runs) = 1-2*(.634-.500) = 1-.268 = .732.

P(1 run) = (.268-.194)*2 = .148,

P(2 runs) = (.120-.086)*2 = .068,

and so on.

All of which begs the question: If you go to the bottom of the 9th trailing by one run with the road team's closer coming in, can you really expect to win the game right there 12.0% of the time? And will you really send it to extra innings 14.8% of the time? That suggests a one-inning, one-run expected BS rate of 26.8%. Time for your opponent to get a new closer. Conclusion: Run scoring potential is highly situation-dependent.
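
To make the algebra explicit: if home field is worth 50% in extra innings, then entering the bottom of the 9th down k runs, WP(k) = P(score more than k) + 0.5 * P(score exactly k), so the run distribution unwinds one deficit at a time. A sketch using the WP values quoted above:

```python
# wp[k] = win probability entering the bottom of the 9th down k runs
wp = [0.634, 0.194, 0.086]  # deficit of 0, 1, 2 runs

p = []            # p[k] = P(exactly k runs in the bottom of the 9th)
tail = 1.0        # P(score >= k), starts at 1 for k = 0
for w in wp:
    # w = (tail - p_k) + 0.5 * p_k  =>  p_k = 2 * (tail - w)
    p_k = 2 * (tail - w)
    p.append(round(p_k, 3))
    tail -= p_k

print(p)  # [0.732, 0.148, 0.068]
```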


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 5:58 p.m., October 21, 2003 (#8) - FJM
  That is a very interesting (and surprising) result! Here's one possible explanation. Back in 1979-1990 the role of the closer was not as clearly defined as it is today. Blowing an occasional save didn't carry the stigma it does now. Do you have similar data for the last 3 or 4 years?


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 8:01 p.m., October 21, 2003 (#10) - FJM
  If you define a professional closer as one who earned at least 30 saves, then there were only 3 of them in 1979. That number jumped to 11 by 1990 and 17 by 1999. (Interestingly, it declined to only 12 in 2003.) It truly is a different game today.


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 2:03 p.m., October 22, 2003 (#14) - FJM
  The one run, one inning BS probability still seems high to me at 26.6%. The 1979-90 Actual data implies it was 26.2% back then, before most teams had true closers. I still think it's significantly lower today.

I don't know what the HFA was back then. Your selection (4.5 vs. 4.3 overall) implies 4.7%. Last year it was only 1.2% in the AL (4.93 vs. 4.87 overall) and 1.8% in the NL (4.69 vs. 4.60).


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 8:02 p.m., October 22, 2003 (#17) - FJM
  Doesn't it seem strange that, if the best pitchers are on the mound Close & Late, the OBP-BAA in these situations would be far higher than it is overall? As you suggest, IBB would explain some of the difference, but certainly not all.

One problem I see with your data is it includes situations where the game is tied and where the team being studied is trailing by one run. Most teams don't use their closers in those situations.

I took a different approach. I selected all pitchers who had at least 10 Save Opportunities in 2002. (I used Sv. Opps. instead of Saves to avoid any chance of biasing the results toward the more successful closers.) There were 40 of them. Here are their combined stats: .228/.293/.344.

Comparing your C&L results with mine, your group BAA is 26 points (11%) higher. Your SLG is 45 points (13%) above mine. That's about what I would expect, since mine is a more restricted sample. But here's the kicker. The OBP for your group is 57 points (19.5%) worse than mine. And if you look at OBP-BAA, the difference is huge: 96-65=31, a 48% disparity. Something is definitely out of whack there.


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 9:42 p.m., October 22, 2003 (#18) - FJM
  I should add that, of my Fab 40, only one (Matt Herges) had an OBP of .350 or higher. (According to your data, .350 was the average OBP over all C&L situations!) And Herges just barely made my cut, with 10 Sv. Opps. and 4 BS's. There was only one other closer over .340 (Kelvim Escobar.) At the other end of the spectrum, 15 of them were under .280 and 24 were under .300. Now that's the kind of pitching I would expect to see, down by one run going to the bottom of 9th.


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 12:02 p.m., October 23, 2003 (#20) - FJM
  Thanks for your latest post, Tom. We're now only 10-12% apart on all the stats. I can live with that. It does raise an interesting question, though. Do closers perform better with a 2- or 3-run lead than they do with only 1 run to work with? Or perhaps they do better at home, where they get to do their thing in the top of the 9th?


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 1:22 p.m., October 23, 2003 (#21) - FJM
  I don't have the data to test the 2- or 3-run lead hypothesis. But I did test my closers for HFA. They came in at .221/.283/.331. Looks like they are about 3% better at home than overall, which implies they are 6% worse in the bottom of the 9th than they are in the top half. Note that their HFA is about twice as large as the HFA overall. This explains about 1/3 of the remaining discrepancy between your data and mine.

One correction on Matt Herges. He did have 6 saves in 2002. But that was out of 14 Sv. Opps., not 10. That's 43% success. 84% was average. Nobody else finished below 60%. No wonder he wasn't a closer very long!



Managers Post-season records (October 22, 2003)

Discussion Thread

Posted 4:42 p.m., October 23, 2003 (#6) - FJM
  You can never really answer the question "Who is the best post-season manager?" because you can't separate a manager from the players he manages. However, you can take a shot at it by comparing his record in each postseason series against his expected record vs. that opponent based on the performance of each team during the regular season. To that end, you can use either the 2 teams' actual W-L Pctgs. or their Pythagorean ones. It won't make much difference. The point is, a manager deserves a lot more credit for winning the World Series with a .550 team than with a .650 one.



Evaluating Catchers (October 22, 2003)

Discussion Thread

Posted 5:23 p.m., October 23, 2003 (#8) - FJM
  Fascinating stuff! Here's another way you could use this data. For those catchers where you have plenty of data on both sides of age 30, treat them as two different players. You could then determine how much each catcher lost in each category as he aged.


Gleeman - Jeter - Clutch (October 30, 2003)

Discussion Thread

Posted 6:48 p.m., October 30, 2003 (#6) - FJM
  Let's take this out of the postseason context, so we don't have to worry so much about small sample sizes. The point about comparing Jeter to his teammates in similar situations is a valid one. If his average drops only 10 points under pressure while his teammates lose 20 points, then he is a "clutch hitter", at least in a relative sense. And the fact is, as a team, the Yankees hit 21 points worse with RISP & 2 Out than they did overall last year (.271-.250). Jeter? He hit 17 points BETTER (.341-.324). That's a 38-point swing. A fluke? Well, in 2000-2002 he hit .309 overall but .333 in the clutch, a 24-point improvement. The Yankees? They were .273 overall but only .249 with RISP & 2 Out, 24 points worse. A 48-point swing.


Value of keeping pitch count low (October 30, 2003)

Discussion Thread

Posted 6:09 p.m., October 30, 2003 (#2) - FJM
  From 1999 thru 2003, Randy has 1,144 IP in 158 starts, an average of 7.24 IP/GS. That leaves about 5.3 outs for the bullpen per start. In contrast, Brad has 1,002 IP in 154 starts, an average of 6.51 IP/GS. That leaves 7.5 outs for the relievers. Conclusion: the bullpen can expect to get almost 50% more work with Radke on the mound.


Value of keeping pitch count low (October 30, 2003)

Discussion Thread

Posted 6:45 p.m., October 31, 2003 (#6) - FJM
  First, a word of caution. By this definition, the 2001 version of Tim Wakefield qualified as a power pitcher (1.31). But the 2001 Curt Schilling did not (1.29). That's because Wakefield gave up nearly twice as many walks in 1/3 fewer IP. I suggest you define "power" strictly as K/IP>1 and "crafty" as K/IP<0.6.

Second, can you calculate the BABIP for the 2 groups? (Might as well do it for the middle group as well.)


Value of keeping pitch count low (October 30, 2003)

Discussion Thread

Posted 1:17 p.m., November 3, 2003 (#13) - FJM
  In 2003 the 30 teams had ERA's for their starting staffs ranging from 3.49 (L.A.) to 6.24 (Texas). The median of the distribution is 4.47; the (unweighted) mean is 4.55. So 4.50 certainly seems reasonable.

Defining a marginal reliever is more difficult. I chose to define them as pitchers who had 1) no more than 2 starts, 2) no more than 2 saves, and 3) no more than 2 holds. 168 pitchers qualified, which works out to an average of 5.6 per team. They averaged just over 15 IP. Together they accounted for about 6% of all innings, roughly one inning every other day. Seems pretty marginal to me. Anyway, their collective ERA was 5.81.
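
In pandas terms the filter looks like this (a sketch: the column names are my guesses, not any particular database's schema, and innings are treated as plain decimals rather than baseball's .1/.2 notation):

```python
import pandas as pd

# Sketch of the "marginal reliever" filter defined above, on invented rows.
pitchers = pd.DataFrame({
    "GS":  [0, 35, 2, 0],
    "SV":  [0, 0, 1, 45],
    "HLD": [2, 0, 0, 1],
    "IP":  [15.0, 220.0, 22.0, 70.0],
    "ER":  [10, 90, 16, 15],
})

marginal = pitchers[(pitchers.GS <= 2) & (pitchers.SV <= 2) & (pitchers.HLD <= 2)]
era = 9 * marginal.ER.sum() / marginal.IP.sum()
print(len(marginal), "marginal pitchers, collective ERA:", round(era, 2))
```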

If you eliminate the guys with any starts or any saves the ERA goes up to 6.23. But the remaining pitchers account for only 3% of all innings pitched or about 2 IP per team per week.



Offensive Performance, Omitted Variables, and the Value of Speed in Baseball (November 6, 2003)

Discussion Thread

Posted 4:16 p.m., November 7, 2003 (#8) - FJM
  Does anybody segregate true SB attempts from failed run-and-hit plays where the batter swings and misses? I'd expect they would have different distributions across the base/out states. Certainly they would have different success rates. When a runner is thrown out on a failed run-and-hit, shouldn't the batter be charged with the out?


Offensive Performance, Omitted Variables, and the Value of Speed in Baseball (November 6, 2003)

Discussion Thread

Posted 6:37 p.m., November 7, 2003 (#12) - FJM
  In the regular season the Marlins stole 150 bases and were caught 74 (!!!) times. Then in the postseason they were 8-6. Did they steal too much?


What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 2:52 p.m., November 8, 2003 (#13) - FJM
  IMO, the division of credit should be the same, regardless of when the runs were scored. Assuming a 4.5 expected run environment and using the Pythagorean formula, a team scoring 3 runs expects to win 30.8% of the time. A team scoring 2 will only win 16.5%. .308+(1-.165)=1.143. If the sum were exactly 1.00, you'd have your split already. Since it isn't, you have to normalize: .308/1.143=.269. So the offense deserves about 27% of the credit for the win and 73% goes to the pitching/defense.
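
Here's that arithmetic as a short sketch (Pythagorean exponent 2, the assumed 4.5-run environment):

```python
# Credit split from Pythagorean win expectancies.
def pyth(rs, ra, exp=2.0):
    return rs**exp / (rs**exp + ra**exp)

env = 4.5
offense = pyth(3, env)       # team scoring 3 wins ~30.8% of the time
defense = 1 - pyth(2, env)   # team allowing 2 wins ~83.5% of the time
total = offense + defense    # 1.143, so normalize
print("offense share:", round(offense / total, 3))  # ~0.269
print("defense share:", round(defense / total, 3))  # ~0.731
```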


Pitcher's Hitting Performance When NOT Bunting (November 18, 2003)

Discussion Thread

Posted 10:50 a.m., November 19, 2003 (#1) - FJM
  This doesn't tell the full story either. When the pitcher fails on 2 bunt attempts and falls into a quick 2-strike count, the manager will frequently take off the bunt sign and let him swing away, apparently thinking he has a better chance of advancing the runner that way than by attempting another sacrifice. I'm not convinced that's the right decision. But one thing it certainly does is put the pitcher deep in a hole. To really assess how good a hitter the pitcher is, you need to remove all AB's in which he attempted a bunt unsuccessfully, even if that wasn't the end of the at bat.


Baseball Player Values (November 22, 2003)

Discussion Thread

Posted 1:19 p.m., November 24, 2003 (#11) - FJM
  Fascinating! But there are some strange things here, particularly among the 1972-2002 Pitcher Rankings. For example, how can Appier, Hershiser, Gooden, Franco (!!!), Saberhagen, Brown, Mussina and Glavine all score higher than Steve Carlton even though Lefty faced more batters and had a lower BAA (.238) than any of them (.243-.252)? Franco is really weird, as he faced only 29% as many batters. Assuming equal performance, he'd need an LI of 3.41 just to match him.


The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)

Discussion Thread

Posted 8:35 p.m., December 1, 2003 (#21) - FJM
  Cyril: I took another look at the R^2 for PWV/PA related to Relative OPS. Recall this figure was 90% for the Top 100. It dropped a bit when you looked only at the Top 30 (87%) or the Bottom 30 (84%). Still, pretty high. The trouble is, you are still including extreme outliers like Bonds and McGwire, or Boone and Bowa, in your 2 subsets. The inclusion of such outliers increases the R^2 significantly. To avoid this, I looked not at the 2 ends of the distribution, but the middle. Specifically, I looked at everybody in your Top 100 who had a Relative OPS of 110-119. (I did not use PWV/PA to define the middle group because I wanted it to be reasonably homogeneous in terms of ability, as defined by OPS. If I had done it the other way, I would have included hitters as diverse as Sammy Sosa (118) and Tony Phillips (101).)

My group accounted for nearly half of your original group, 46 to be exact. That's 50% more players than were in your high and low groups. So you might think the R^2 would be pretty high. Not so. It drops all the way down to 40%! More importantly, it identifies 7 players out of the 46 who produced anywhere from 27% to 48% more PWV's per PA than would be expected based on their overall Relative OPS. The Magnificent Seven are Keith Hernandez (+48%), Mark Grace (+46%), Wade Boggs (+38%), Ken Singleton (+38%), Darrell Evans (+36%), Will Clark (+30%) and Tony Gwynn (+27%). There is a distinct break at this point, with no one else above +21% (Kirby Puckett and Rickey Henderson). Incidentally, Mr. Clutch (Eddie Murray) ranked 11th of the 46 with +18%. At the other end of the spectrum we find Ron Cey and Carlton Fisk (-27%), Andre Dawson (-31%), Bobby Bonilla and Sammy Sosa (-40%) and Chet Lemon, an almost unbelievable -63%!

Does this prove the players with big pluses were clutch performers, or that those with big minuses were choke artists? No, it doesn't. There are other possible explanations. For example, Rickey Henderson just barely missed the cut. But we know he won a lot of games with his legs, and that is not measured by Relative OPS. If we used a more comprehensive measure of overall value like Linear Weights Ratio, his Clutch Score would undoubtedly drop. Still, there isn't a lot of speed in the Magnificent Seven. So if we're missing something it must be something other than that.

It's especially interesting to compare 2 long-time teammates, Mark Grace and Sammy Sosa. That largely eliminates the ballpark factor. Based on his Relative OPS (111) the model suggests we would expect .0026 PWV/PA from Gracie. He actually produced .0038. Sammy was (and still is) a much better hitter overall (118). Yet he produced only .0023 PWV/PA, compared to an expectation of .0039. If clutch performance isn't the explanation, what is?


The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)

Discussion Thread

Posted 4:20 p.m., December 2, 2003 (#26) - FJM
  The difference between the best Clutch performers and the average ones over the course of a long career is only about 1 win (or 10 runs) per year. And yes, there is an enormous amount of random variation, year-to-year. So you shouldn't expect to see consistency from one year to the next. To the extent this does measure clutch performance, it will only be apparent when looking at many years. One thing you might try: split the players' careers into odd and even years. Identify the clutch (or choke) players using the odd-year data, then do it again for the even years. If a player appears on both lists, that is pretty strong evidence.

All regressions display this phenomenon to some extent; for some it's a lot more significant than others. It's a direct result of the methodology, which is LEAST SQUARES REGRESSION. That is, you minimize the sum of the SQUARES of the residuals. So, for example, Barry Bonds and Mark McGwire get a lot more weight in your regression than an average player would. Same for Bob Boone and Larry Bowa on the low end.

Sensitivity testing a regression model is always a good idea, especially when you suspect a small number of data points may be having an undue influence. Still, I agree that excluding half the data is probably too much. So I reran the regression, including all players with Rel. OPS in the 100-119 range. That increased the data base to 83 out of the original 100. I still feel that the 5 superstars (Bonds, McGwire, Bagwell, Schmidt and Griffey Jr.) should be excluded. The same goes for the 12 players who made the list primarily for their gloves, not their bats. Anyway, the R^2 did improve to 74%, still a long way from 90%. But the model hardly changed at all. For every percentage point above average in Rel. OPS, the PWV/PA increases by .000226. That compares to .000224 for the 47 player group. (I inadvertently left Ted Simmons out the first time, hence the increase of one player.)

One thing that expanding the data base did was to add several new candidates for the title of Mr. Clutch, most notably Tony Phillips. With a Rel. OPS of only 101, you'd expect him to have a PWV/PA of .000224. Instead, he's at .0019, over 8 times his expected value. Other strong candidates include Toby Harrah, Ken Griffey SENIOR, Pete Rose, Lou Whitaker and Jose Cruz.


The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)

Discussion Thread

Posted 7:30 p.m., December 2, 2003 (#28) - FJM
  I understand your point, and it is a valid concern. Ratios can get very dicey when an average performer does slightly better (or slightly worse) than expected. OTOH, simply showing Tony Phillips' Actual-Fitted (.0017) makes it appear his clutch performance is just a little better than that of Mark Grace and Keith Hernandez (both .0013). To use your grading analogy, Phillips is like a C student throughout high school who gets a B on the final exam, enabling him to go on to college. Grace and Hernandez are more akin to B+ students who get an A on the final. I would argue that Phillips' achievement is a lot more significant.


Correlation between Baserunning and Basestealing (December 10, 2003)

Discussion Thread

Posted 7:57 p.m., December 10, 2003 (#12) - FJM
  Another way around the multicollinearity problem.

Define 2 new variables: SB1=SB+CS and either SB2=SB-CS or SB3=SB-2*CS.


Pettitte agrees to deal with Astros (December 11, 2003)

Discussion Thread

Posted 5:08 p.m., December 11, 2003 (#1) - FJM
  The present value of $5.5 M at the beginning of 2004 plus $8.5 M in one year and $17.5 M one year after that is $29.1 M, not $30.8 M. $30.8 M would be the value discounted to the END of 2004. You are correct, though, in your main point, that $5.5 M, $8.5 M, $17.5 M is equivalent to $1 M, $1 M, $30.5 M. It's also equivalent to ten annual payments of $3.48 M each.
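
For anyone checking the arithmetic, here's a sketch. The discount rate isn't stated above, but roughly 6% annually reproduces the $29.1 M figure:

```python
# Present-value arithmetic, assuming a 6% annual discount rate.
def pv(cashflows, rate=0.06):
    """cashflows[t] is paid t years from now, in $M."""
    return sum(c / (1 + rate) ** t for t, c in enumerate(cashflows))

print(round(pv([5.5, 8.5, 17.5]), 1))  # ~29.1
print(round(pv([1.0, 1.0, 30.5]), 1))  # ~29.1 as well: the deals are equivalent
```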


Pythag - Ben VL (December 12, 2003)

Discussion Thread

Posted 2:34 p.m., December 15, 2003 (#2) - FJM
  In 1991, by almost any measure, the Yankees were a better team overall than my D'backs. The Cardinals and the Braves were too. But they won the WS because of a 2-headed monster, Johnson and Schilling. And it shouldn't have been a great surprise. During the regular season they were 51-18 (.739) with either one starting; 41-52 (.441) otherwise. Their overall record was mediocre (92-70, .568) because the Big 2 started only 42.6% of the time. But in the postseason they started 11 out of 17 games (64.7%), and the Snakes were 9-2 in those games. (They were 2-4 in the other games.) So unless you adjust for pitching matchups regular season records (whether "actual" or "expected") are very unreliable predictors of postseason success.


Pythag - Ben VL (December 12, 2003)

Discussion Thread

Posted 3:32 p.m., December 15, 2003 (#3) - FJM
  That was 2001, of course.


Pythag - Ben VL (December 12, 2003)

Discussion Thread

Posted 3:06 p.m., December 16, 2003 (#5) - FJM
  You continue to ignore my point. Yes, regular season records DO have predictive value, IF AND ONLY IF you adjust for pitching matchups. The D'backs with Johnson or Schilling on the mound were a completely different team than they were with Batista or Anderson.


Banner Years and Reverse Banner Years for a Player's BB rate (December 14, 2003)

Discussion Thread

Posted 4:10 p.m., December 15, 2003 (#9) - FJM
  Applying a 4-2-1 weighting to (48,29,25) I get 39, not 37. I assume you got the lower value because you weighted each year by the number of actual PA's in that year. That introduces a small bias.

Using a simple 3-2-1 weighting I get 38, the actual subsequent year value. So why not use that?

You could refine your weights considerably by running a multiple regression.
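
The two weightings, spelled out in code (the walk totals are the ones quoted above, most recent season first):

```python
# Weighted-average arithmetic for the BB forecast; values from the post.
def wavg(vals, wts):
    return sum(v * w for v, w in zip(vals, wts)) / sum(wts)

bb = (48, 29, 25)
print(round(wavg(bb, (4, 2, 1))))  # 39
print(round(wavg(bb, (3, 2, 1))))  # 38 -- the actual subsequent-year value
```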



Converted OBA (December 15, 2003)

Discussion Thread

Posted 2:52 p.m., December 16, 2003 (#12) - FJM
  You're right, it is unreasonable. A minor change to your example will show that very clearly. Instead of transforming 15 Outs into 15 HR's, change 15 SINGLES into 15 HR's. Now both the batter and the pitcher have the same BA (.278) and OBA (.350) as the league average. So we would expect those stats to stay the same when they face each other. But when you apply the Ben Matchup method to each rate stat and add everything up, you get a BA of .310 and an OBA of .379! The problem is, reducing the batter's singles rate by an additional 15% (from 85 to 72) is more than offset by doubling his HR rate (from 30 to 60).
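
A short script reproduces the example. The per-600-PA counts are a reconstruction (league: 100 singles, 35 doubles-plus-triples, 15 HR, 60 BB), and the matchup method is assumed to compute each component as batter x pitcher / league; under those assumptions the .310/.379 line falls out exactly.

```python
# Reconstructed example.  Per 600 PA: league = 100 1B, 35 2B+3B, 15 HR, 60 BB,
# so BA = 150/540 = .278 and OBA = 210/600 = .350.  Matchup rate is assumed
# to be batter * pitcher / league for each component.
PA, BB = 600, 60
AB = PA - BB

league = {"1B": 100, "XBH": 35, "HR": 15}
hitter = {"1B": 85, "XBH": 35, "HR": 30}   # 15 singles traded for 15 HR
pitcher = dict(hitter)                     # pitcher altered the same way

matchup = {k: hitter[k] * pitcher[k] / league[k] for k in league}

for name, line in (("league", league), ("hitter", hitter), ("matchup", matchup)):
    hits = sum(line.values())
    print(f"{name}: BA {hits / AB:.3f}  OBA {(hits + BB) / PA:.3f}")
# league:  BA 0.278  OBA 0.350
# hitter:  BA 0.278  OBA 0.350  (same as league, as it should be)
# matchup: BA 0.310  OBA 0.379  (the inflated line objected to above)
```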


Converted OBA (December 15, 2003)

Discussion Thread

Posted 7:49 p.m., December 17, 2003 (#25) - FJM
  Arvin: It's even weirder than that. If "true talent level" exists only in some context-neutral sense, with all park, league, and year effects removed, then what does it really mean, anyway? And if you want to somehow talk about how Barry would have done against Pedro at Coors in 1994, you first have to specify which Barry and which Pedro you are talking about. After all, both Bonds and Martinez were playing in 1994, presumably including games in Colorado, although not against each other. It might make some sense to ask how such a matchup would have turned out if it had taken place. But that's not what is being asked here. We're asking how the 2003 Bonds would have performed against the 2003 Martinez if both had been sent back through time and space to Coors 1994.

True talent level is constantly changing, and it can never be totally separated from the context in which it was observed. We can measure it many different ways, making dozens of adjustments. But in the end we can never know its true value. This is the sabermetric equivalent of the Heisenberg Uncertainty Principle.

Asking questions like how many home runs the 1929 Babe Ruth would hit today, training on steroids instead of beer and babes, may be fun. But if we take the answers too seriously we make a mockery of our own analysis. We have made great strides in understanding this wonderful game since the days of Bill James' Abstracts. Let's be mindful of one of his later warnings: this time, let's not eat the bones.


UZR 2003 Previews (December 18, 2003)

Discussion Thread

Posted 12:17 p.m., December 18, 2003 (#1) - FJM
  I must say I find this very surprising. What do you make of the fact that not one SS made the Top 10 and only one made the Top 17? Don't they get more opportunities per game than anyone else? And no catchers on either list?
Can you post the Top 10/Bottom 10 at each position, or at least at the key spots (2, 4, 5 and 8)?


Questec Interview (December 18, 2003)

Discussion Thread

Posted 8:19 p.m., December 18, 2003 (#1) - FJM
  Since there were 10 Questec parks in 2003, there should be enough data to test the hypothesis that the umpiring is different under Questec. Anyone care to take it on?
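
One way to take it on: a two-proportion z-test on the called-strike share of called pitches, QuesTec parks vs. the rest. The counts in the sketch below are placeholders, not real 2003 data.

```python
# Two-proportion z-test on called strikes / (balls + called strikes),
# QuesTec parks vs. all other parks.  Counts are hypothetical.
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    p = (x1 + x2) / (n1 + n2)                    # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # pooled standard error
    return (x1 / n1 - x2 / n2) / se

# hypothetical: (called strikes, called pitches) in Q and non-Q parks
z = two_prop_z(61_000, 200_000, 123_500, 400_000)
print(round(z, 2))  # |z| > 1.96 would reject "no difference" at the 5% level
```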


Valuing Starters and Relievers (December 27, 2003)

Discussion Thread

Posted 7:30 p.m., December 30, 2003 (#44) - FJM
  In 2003 the average Team ERA for Starters Only was 4.55. For Relievers, it was 4.11. So the difference was 0.44. Fairly large, yes, although a long way from 0.6. The difference is just under 10%.

But let's dig a little deeper. Let's split the data into 3 groups, depending on whether each team's starters had a low, medium or high ERA. My cutoffs are under 4.00 for the good group and 4.90+ for the bad group. The cutoffs were chosen so that the low and high groups would each have 8 teams, leaving 14 teams in the middle. Now, what are the differences for each group?

The average Starters' ERA for the teams with good starting pitchers was 3.78. The Relievers on those 8 teams averaged 3.68. So here the difference is only 0.10 run, or 2.6%. (In fact, it turns out the entire difference for this group can be explained by just one team, the Dodgers, whose Starters came in at 3.49 while the Relievers were a phenomenal 2.46.)

Moving on to the medium group, the average Starters' ERA is 4.45. Their Relievers come in at 4.30. So the difference here is only 0.15 run, or 3.4%. (For this group, over 90% of the difference can be traced to 2 teams with exceptional bullpens, Houston and Minnesota.)

That leaves us with 8 teams to account for the great majority of the Starter-Reliever differential. And their Starters were indeed awful, posting an average ERA of 5.51, more than 1 run worse than the middle group. In contrast, their Relievers posted a decent ERA of 4.23, slightly better than the middle group. That's a differential of 1.28 runs, 23.2%. Moreover, unlike the other 2 groups, where one or two teams accounted for most of the difference, every one of these 8 teams had a significant Starter/Reliever disparity. All but one of them had a differential of at least one run. For three teams (Anaheim, Milwaukee and Cincinnati) the difference exceeded 1.5 runs.

In summary, for those 22 teams that have decent or better starting pitchers, there is very little difference between their starters and relievers in terms of ERA. For teams with lousy starters, their relievers are indeed much better. Isn't that what you would expect?
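
For anyone who wants to replicate the grouping, here is a sketch assuming a pandas DataFrame with one row per team and hypothetical column names start_era and relief_era.

```python
# Sketch of the grouping above; column names are assumptions, not a real API.
import pandas as pd

def summarize(team_eras: pd.DataFrame) -> pd.DataFrame:
    # under 4.00 = good, 4.00-4.89 = medium, 4.90+ = bad (cutoffs from the post)
    bins = pd.cut(team_eras["start_era"], [0.0, 4.00, 4.90, 99.0],
                  labels=["good", "medium", "bad"], right=False)
    out = team_eras.groupby(bins, observed=True)[["start_era", "relief_era"]].mean()
    out["diff"] = out["start_era"] - out["relief_era"]
    out["pct_diff"] = 100 * out["diff"] / out["start_era"]
    return out.round(2)

# summarize(teams_2003)  # teams_2003: a hypothetical 30-row frame
```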


Valuing Starters and Relievers (December 27, 2003)

Discussion Thread

Posted 1:10 p.m., December 31, 2003 (#46) - FJM
  To some extent, we're arguing about semantics here. What do you mean by "average"? When you say "the average reliever is worse than the average starter", I interpret that as follows: pick starters and relievers at random, and the typical starter will be better than the typical reliever. I'm sure you are right about that, but that has very little to do with my statistics. You are talking about individuals; I'm talking about teams. The last 2 or 3 guys in most bullpens are marginal, near replacement level pitchers. But they get very little opportunity to pitch, and never in high leverage situations. So by looking at teams rather than individuals the importance of these players is greatly reduced.

As I said above, for 22 of the 30 teams the difference between their starters and relievers is only about 3%. To me, that's the norm. And that difference is probably largely attributable to the defect in the way ERA is calculated, charging all runners to the pitcher who allowed them to reach base.

8 teams had very poor starting pitching, as defined by an ERA of 4.90 or more. Together, those 8 teams accounted for nearly 80% of the overall Starter/Reliever differential, 0.34 out of 0.44. 3 other teams (Dodgers, Astros and Twins) account for the rest of it. So what is your norm, the 11 teams who average a 1.2 run differential, or the 19 teams where the average differential is 0?


03 MLE's - MGL (December 28, 2003)

Discussion Thread

Posted 5:40 p.m., December 29, 2003 (#26) - FJM
  The top 10 position players could reasonably be used as a proxy for a team's entire offense. (You only list 8 Tigers minor leaguers, though, not 10. You need to add a DH at least.) But the Top 5 starters cannot be used as a proxy for a team's entire pitching staff. If the average starter gets 30 starts per season and averages 5 innings per start, then the 5 starters will consume about 55% of the total IP. The other 45% will be filled by mostly inferior pitchers, with the closer (if you've got one) accounting for 8 or 9%.


SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)

Discussion Thread

Posted 11:48 a.m., January 2, 2004 (#15) - FJM
  Can you post the BABIP MLE coefficients? It seems like most, if not all, of the dropoff in hits can be attributed to the big increase in K's.


SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)

Discussion Thread

Posted 3:15 p.m., January 2, 2004 (#18) - FJM
  Everything seems to make sense, except for the BB rates. The implication is that AA pitching is about halfway between AAA and MLB in terms of control. The AA reg. coeff. for BB/PA (.88) seems awfully low.


SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)

Discussion Thread

Posted 6:28 p.m., January 2, 2004 (#23) - FJM
  The AA reg. coeff. for BB/PA (.88) seems awfully low.

You mean high, right? Yes, I agree that there is no logical reason for the AA BB rate coeff. to be higher than the AAA. It should be the other way around. There is so much sample error in all the different calculations.

Actually, I was referring to the year-to-year reg. coeff., which you divide by to get the final MLE. In other words, for AAA it is .85 / 1.00 = .85 while for AA it is .81 / .88 = .92. I'm not questioning the numerator, just the denominator, the .88. Note that it is the only year-to-year coeff. other than SB rate which varies significantly from 1.0.
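
Spelled out, the division in question (coefficients as quoted above):

```python
# The final MLE coefficient is the level coefficient divided by the
# year-to-year regression coefficient; values from the post.
def final_mle_coeff(level_coeff, year_to_year_coeff):
    return level_coeff / year_to_year_coeff

print(round(final_mle_coeff(0.85, 1.00), 2))  # AAA BB rate: 0.85
print(round(final_mle_coeff(0.81, 0.88), 2))  # AA BB rate:  0.92
```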


SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)

Discussion Thread

Posted 1:11 p.m., January 3, 2004 (#35) - FJM
  Thanks for the kind words. Numbers, rather than plain English, is my first language, I guess. Anyway, I'm not having any trouble following you. The idea of a summary at the end is a good one, though.

Do you have the ability to split your AAA (or AA) data by league? I realize it shouldn't make any difference, since the data has already been adjusted for league (and park). Still, the PCL is so different from the IL that I wonder about the accuracy of those adjustments, particularly when it comes to stats like BB rate and K rate. If the adjustments ARE correct, then looking at the 2 leagues separately should produce 2 sets of similar results, the only differences presumably being random variation. That in turn would give us some idea how confident we can be in your MLE's.


SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)

Discussion Thread

Posted 5:07 p.m., January 5, 2004 (#61) - FJM
  Are you able to calculate an R^2 for your MLE's against their actual major league stats for those batters who had enough PA's to qualify at both the minor and major league level during the 2001-03 period? If so, how does it compare to the year-to-year R^2 for all qualified batters who were in the majors for the entire period?

Here's my concern. I looked at the 3 rookies who made the D'Backs in 2003. (Alex Cintron technically didn't qualify as a rookie, even though he had had less than 100 PA's with the big club before 2003.) Here are your MLE's from 2001-03, followed by how they did with the AZ DBs.

Cintron / BA / OBP / SA / OPS
MLEs / .271/ .301/ .374/ .672
AZDB / .317/ .359/ .489/ .848

Kata / BA / OBP / SA / OPS
MLEs / .247/ .285/ .375/ .659
AZDB / .257/ .315/ .420/ .736

Hammock/ BA / OBP / SA / OPS
MLEs / .229/ .288/ .354/ .642
AZDB / .282/ .343/ .477/ .820

The MLEs are far below the numbers they actually put up in every case except Kata's BA. (Even there it should be noted that he was batting .269 through August before slumping to .204 in September.) I believe the problem lies not with your estimates as such. It's the old selective sampling problem reappearing. Your MLE's are very close to replacement level. Consequently, if the batters being called up don't do significantly better than you expect, their playing time will be severely limited and/or they will soon find themselves back in the minors. Either way, they won't get enough MLB PA's to qualify as successful callups.


SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)

Discussion Thread

Posted 7:15 p.m., January 5, 2004 (#65) - FJM
  You're right, RM, Overbay underperformed. But he's the exception that proves my point. His MLE's were way above replacement level, so he could afford to fall short and still provide some value to the team. But eventually it even caught up with him. He was sent down for most of July and all of August, returning only after the rosters expanded. Even then he got only 27 AB's after averaging 67 per month through the ASB.

To reiterate my point, unless he is brought up primarily for his outstanding defense, a young player must hit above replacement level to keep his job. A manager might stick with an older player in a prolonged slump if he has "veteran presence" (e.g., Mark Grace). But a rookie must constantly prove himself.


SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)

Discussion Thread

Posted 7:18 p.m., January 7, 2004 (#70) - FJM
  The ironic thing about this discussion is that managers (who probably know less about statistics as a group than the least sophisticated Primate) make decisions affecting people's careers all the time AS IF they did understand statistics. Clearly, Brenly felt Overbay failed. Even if it cannot be demonstrated statistically, the result was exactly the same: he got sent down and his days as a Diamondback were essentially over. I've seen this process repeated over and over again. Remember Jack Cust?



MGL - Component Regression Values (PDF) (January 8, 2004)

Discussion Thread

Posted 11:18 a.m., January 9, 2004 (#7) - FJM
  TT: Can you explain why the 2B regression for the batters is so high, just as high as it is for the pitchers? If you combine 2B and 3B, does it change much?


MGL - Component Regression Values (PDF) (January 8, 2004)

Discussion Thread

Posted 4:45 p.m., January 9, 2004 (#11) - FJM
  Let's set aside Fielding, Park and Luck for now. If all you wanted to know is the relative importance of the batter vs. the pitcher in determining the expected frequency of each outcome, you would calculate it as (1-B)/(1-P), correct? For example, for singles it would be .67 / .55 = 1.22; i.e., the batter's influence is 22% greater than the pitcher's. Alternatively, you could express the batter's influence as a percentage by (1-B)/(1-B+1-P). Then you have .67 /1.22 = 55%.

Here then are the 2 stats, sorted low to high:

2B+3B: 0.89 and 47%

SO: 1.02 and 51%
RBOE: 1.08 and 52%
NIBB: 1.13 and 53%
1B: 1.22 and 55%

HBP: 1.63 and 62%
HR: 1.86 and 65%

Now you can see the reason for my earlier question. The batter has a little more influence than the pitcher in every category except HBP and HR, where he has a LOT more influence. But the pitcher apparently has more influence than the batter over doubles and triples. Not only does this run contrary to DIPS, but it seems to go against common sense. Shouldn't the batter's influence over doubles and triples fall somewhere in between his effect on singles and his impact on HR's?
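
For reference, here is the arithmetic behind that list as code, where B and P are the amounts of regression toward the mean for batter and pitcher; only the singles pair (.33, .45) is given explicitly above, so that's the one computed.

```python
# The two influence measures: 1-B and 1-P are the retained (real) shares.
def influence(B, P):
    ratio = (1 - B) / (1 - P)              # batter influence relative to pitcher
    share = (1 - B) / ((1 - B) + (1 - P))  # batter influence as a percentage
    return round(ratio, 2), round(share * 100)

print(influence(0.33, 0.45))  # the 1B line above: (1.22, 55)
```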


MGL - Component Regression Values (PDF) (January 8, 2004)

Discussion Thread

Posted 6:21 p.m., January 9, 2004 (#13) - FJM
  True. But if you divide all 3 hit categories by (PA-NIBB-HBP-SO) instead of by PA, won't 2B+3B still indicate less batter influence than 1B?


MGL takes on the Neyer challenge (January 13, 2004)

Discussion Thread

Posted 12:57 p.m., January 14, 2004 (#8) - FJM
  If you want to do regression toward the mean on team W-L records, shouldn't you use different regression parameters for each margin of victory or defeat? For example, if a game decided by one run was replayed (or simulated) one hundred times, wouldn't you expect it to come out close to 50/50? Whereas a game decided by 10 or more runs would almost always result in the same winner, albeit generally by a smaller margin. That's the basis for the Pythagorean formula, and that's why it can be off significantly for teams that have very good or very bad records in close games.
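
For reference, the classic Pythagorean expectation in code (exponent 2; the run totals below are made up):

```python
# Pythagorean winning percentage; exponent 2 is the classic form,
# and other exponents are common refinements.
def pythag_wpct(runs_scored, runs_allowed, exponent=2.0):
    rs, ra = runs_scored**exponent, runs_allowed**exponent
    return rs / (rs + ra)

print(round(pythag_wpct(818, 674), 3))  # ~0.596 expected winning percentage
```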


DRA Addendum (Excel) (January 16, 2004)

Discussion Thread

Posted 1:56 p.m., January 20, 2004 (#12) - FJM
  The results of your study are certainly thought-provoking. I'm troubled though by the amount of year-to-year variation shown for each player. I computed the mean and standard deviation for each of the shortstops and then built an interval of +/- 1 s.d. about the mean. (Normally I'd use +/- 2 s.d., but if I did that here I wouldn't be able to conclude much of anything.) The results: out of the 33 SS's shown, only 5 are significantly better than average (i.e., mean - st.dev. > 0). (If I used 2 st. dev., only Mark Belanger would qualify, and we can't even be sure about him because half his career is missing!) Meanwhile, only 1 SS (DJ) is significantly worse than average (i.e., mean + st.dev. < 0). Everybody else could be either better or worse.

In an attempt to improve the resolution I threw out all the italicized years (i.e., years in which they played fewer than 146 games). That cut the data base down by more than one third, from 256 years to 161. And it did reduce the standard deviations somewhat. (Using the full data base, the average SS had a st. dev. of 11.0. In the smaller data base, it was 10.2.) Frankly I'd been hoping for, and expecting, a bigger reduction than that. The results do improve a bit, with 9 players now coming out above average and 2 being worse. But that comes with a price. One player (Craig Reynolds) was thrown out entirely since he had only one qualifying year and so had no standard deviation. 5 others had only 2 qualifying years, which means their computed standard deviations are probably understated. If I throw those 5 out as well, we're back to 7 players better than average and only 1 worse. And 2 of those 7 make the cut by the tiniest of margins (0.03). Essentially, we're right back where we started.
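
The screen itself is easy to state in code; the ratings list below is made up, and note the test needs at least two seasons, which is exactly why Craig Reynolds dropped out.

```python
# The +/- 1 s.d. screen, with a made-up list of single-season ratings.
import statistics

def classify(seasons):
    m = statistics.mean(seasons)
    s = statistics.stdev(seasons)   # sample s.d.; needs >= 2 seasons
    if m - s > 0:
        return "significantly above average"
    if m + s < 0:
        return "significantly below average"
    return "can't tell"

print(classify([14, 2, 25, 9, -6]))  # -> "can't tell"
```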


DRA Addendum (Excel) (January 16, 2004)

Discussion Thread

Posted 11:51 a.m., January 21, 2004 (#17) - FJM
  It might be an interesting exercise to calculate the proportion of players who, based on their year-to-year *offensive* performance, were reliably better than average at *hitting*, using the one-standard-deviation test. Of course there would be at least several, but probably a lot fewer than you would expect.

It's a long way from the study you suggested, but here's an interesting observation. 8 of the OPS Top 15 qualifiers repeated in 2003. (It would have been 9 out of 15 if Vlad G. had had a few more PA's.) Out of the remaining 6, 4 finished in the next group of 15. Only Larry Walker and Mike Sweeney slipped significantly, and even they were still better than average.


SuperLWTS Aging Curve (January 26, 2004)

Discussion Thread

Posted 7:28 p.m., January 26, 2004 (#5) - FJM
  I'm confused. If as you say "only the very good (talentwise) young and old players get lots of playing time", how can the weights from 33 on all be increasingly negative? Wouldn't such players decline more slowly (or even improve at an advanced age, ala Barry Bonds) than the typical player? How do I interpret the -50 for a 39-year old?


SuperLWTS Aging Curve (January 26, 2004)

Discussion Thread

Posted 10:24 a.m., January 27, 2004 (#9) - FJM
  I'm not trying to be argumentative, but it seems to me you are making 2 statements that are mutually exclusive. If marginal players get significant playing time only in their peak years while good/great players play a lot both early and late in their careers, then it seems to me there must be at least 2 and possibly 3 different aging curves. Is it possible to construct a curve for those with significant playing time before age 24 and/or after age 33 and then a separate one for everybody else?


SuperLWTS Aging Curve (January 26, 2004)

Discussion Thread

Posted 6:24 p.m., January 27, 2004 (#14) - FJM
  I really don't see your problem, since you yourself suggested that playing time early (and late) in a career is a reliable indicator of true talent. But OK, let's say that it isn't reliable. Then use something else as a classification criterion, such as OPS+ or LWTS or whatever you like, but limited to the peak years, 26-29. Classify everybody at least one standard deviation above the mean as a good/great player, and everybody at least one s.d. below the mean as a marginal player. Everybody else is average. Now construct 3 aging curves. The curve for the average group should look pretty much like the overall curve, but I'll be very surprised if the other 2 don't look quite different.


Clutch Hitters (January 27, 2004)

Discussion Thread

Posted 10:56 a.m., January 28, 2004 (#7) - FJM
  Very interesting indeed. A couple observations.

1) From the data you provided I calculate an average LI for Tejada of 0.936. Since 1.00 is normal, that strikes me as surprisingly low, especially for such a large number of PA's (2,736). But perhaps I'm wrong. Could you post the average LI and number of PA's for each of your Top 20/Bottom 20? Thank you.

2) Wouldn't it be better to rank the players by Clutch Runs per a fixed number of PA's? The distribution appears to be skewed to the right. But that could be caused by the Top 20 getting more opportunities than the Bottom 20.


Clutch Pitchers (January 28, 2004)

Discussion Thread

Posted 2:15 p.m., January 29, 2004 (#2) - FJM
  Wasdin's "achievement" isn't merely interesting; it appears (to me at least) to be mathematically impossible. Same goes for McElroy, who finished exactly one notch above Wasdin in both LI and the non-clutch standings. That in itself is suspicious. Moreover, both pitchers actually had winning records during the period in question (12-11 for Wasdin, 8-4 for McElroy). That's really hard to believe, if they both gave up a lot of important runs (despite being used in very few high leverage situations). Compare that to the next 2 pitchers on your list (Rupe and Acevedo), who allowed 58 clutch runs between them and posted a combined 35-62 record. But the kicker is this: Wasdin allowed a total of 124 earned runs. Throw in some inherited runners and unearned runs and he's probably somewhere around 150. So to be 72 runs worse than the average pitcher in the clutch, just about all the runs he allowed would have to have come in high LI situations, which we know is impossible. Could you double-check your numbers for these 2 pitchers?


Clutch Pitchers (January 28, 2004)

Discussion Thread

Posted 4:35 p.m., January 29, 2004 (#8) - FJM
  You're welcome. There are some surprising names at the top of the Clutch list, particularly Weathers and Mlicki. Could you break down their performance by LI category, as you did with Wasdin? I wonder if they are mirror images of him: great when the pressure is on, lousy when it isn't.


Clutch Pitchers (January 28, 2004)

Discussion Thread

Posted 7:10 p.m., January 29, 2004 (#10) - FJM
  If it isn't too much trouble, I'd appreciate it. Mlicki's clutch performance looks like nothing more than a statistical fluke, resulting from just 48 very high LI PA's. (He does seem to be somewhat better under moderate to heavy pressure as opposed to no pressure at all, though.)

Weathers is much more interesting. He looks like a closer-in-waiting. He's actually pretty good all the time, but he really steps it up when the pressure is heavy to extreme. If you were to assume his overall performance is the mean, how much would you regress his high pressure stats?


Clutch Pitchers (January 28, 2004)

Discussion Thread

Posted 2:26 a.m., January 30, 2004 (#12) - FJM
  We could give it more weight if we knew it was coming from improvements in SO, BB or HR as opposed to a BIP.


Leveraged Index Leaders/Trailers (January 29, 2004)

Discussion Thread

Posted 10:53 a.m., January 29, 2004 (#1) - FJM
  Thanks, Tiger. I find it interesting that 5 of the Bottom 10 are catchers. Looks like their managers call for a PH whenever they come up in a high leverage situation.

Turning to the Top 10, I assume Rodriguez and Sierra made the top of the list due to pinch-hitting. But how in the world did Burrell come in so high?


Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 4:12 p.m., January 31, 2004 (#29) - FJM
  I ran a quick multiple regression on your data set. Since the average pitcher debuts at age 23, I defined X1 as DebutAge-23 and X2 as years of experience. The results came out pretty much as you expected.

Y = 1.042 + 0.012*X1 + 0.003*X2.

In other words, a pitcher who debuted at age 23 (X1=0) can expect to experience a 4.2% decline in his second year, 4.5% in his third year, then 4.8% and so on. If he debuts at age 24, the first year decline increases to 5.4%, then 5.7% and 6.0% and so on. For each year his debut is delayed, his rate of decline increases by 1.2% every year.

So if his debut is delayed until 27, his second year decline is expected to be 4.2%+4*1.2%=9.0%, more than double the rate of decline experienced by those who break in at 23. No wonder pitchers who arrive late don't last very long!

Of course, for those who break in before age 23, the rate of decline is reduced by 1.2% per year. So a pitcher who debuts at 20 can expect a first year decline of only 4.2%-3*1.2%=0.6%, hardly any dropoff at all.

The model is statistically significant overall, although R^2 is only 10%. The F-statistic is 7.59. X1 is definitely significant (t-statistic of 3.88) while X2 is only marginally significant (t-statistic 1.86). It's interesting that the model never projects any improvement, regardless of debut age or years of experience.

But wait. There is something seriously wrong with this model. It weights all observations the same, even though some are based on only a little over 2,000 PA's and others are based on 40,000+. I will rerun it next week with the observations weighted from 1 to 12. The weights will be determined by PA/4000, rounded off to the nearest whole number. It will be interesting to see how it changes.
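
For concreteness, a sketch of that weighted rerun (the arrays are placeholders; the weights follow the round(PA/4000), 1-to-12 rule just described):

```python
# Weighted least squares via the sqrt-weight transform.  X1 = debut age - 23,
# X2 = years of experience, y = year-to-year change; weights = round(PA/4000),
# clipped to the 1..12 range as described above.
import numpy as np

def weighted_fit(x1, x2, y, pa):
    w = np.clip(np.round(np.asarray(pa) / 4000), 1, 12)
    X = np.column_stack([np.ones(len(y)), x1, x2])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(y) * sw, rcond=None)
    return beta  # (intercept, debut-age coefficient, experience coefficient)
```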


Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 11:32 p.m., January 31, 2004 (#31) - FJM
  I didn't chain before running the regression.


Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 3:40 p.m., February 2, 2004 (#37) - FJM
  My regression was on the year-to-year changes, not the cumulative effect. They still need to be "chained" together for that.

I reran the regression using the weighting procedure outlined in my previous post. I was glad to see the model didn't change very much. The R^2 improved only slightly to 14%, although the F-statistic jumped from 7.59 to 41.31. The t-statistic for debut age more than doubled, from 3.88 to 8.72, and the t-stat for years of experience nearly tripled, from 1.86 to 5.24. But the actual coefficients changed very little. The new regression equation is Y=1.037+0.0127*X1+0.0036*X2.


Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 6:08 p.m., February 3, 2004 (#40) - FJM
  It's true that pitchers who break in at 24 "age" about 3 times faster than those who debut at 20. But you are ignoring X2, the Years of Experience factor. Although its coefficient is small relative to the X1 coefficient, it becomes very significant for pitchers who have been around a long time, like Pedro.

Looking at Tango's raw data for pitchers who debut at 20, it looks like Pedro has 5 years left, at best. Age 37 is the last year in his data base where the total PA's exceed 2,000. And remember, that's for all pitchers in that category combined. I don't know how many of them made it to 37, but I can tell you that in their peak years (age 24 and 26) the same group was recording over 10,000 PA's per year. Interestingly, age 37 seems to be the end of the line for pitchers who debut at 24 (like Colon) as well. But since there were a lot more of them, their decline is lot more spectacular: at their peak, they were posting over 45,000 PA's per year.

While I'm on the subject, take a look at this chart which I developed from Tango's data by "chaining", multiplying the year-to-year factors together to see the cumulative effect of aging.

PriorExp   DA20   DA21   DA22   DA23   DA24   DA25   DA26
    0      0.93   0.95   1.02   1.04   1.07   1.10   1.16
    1      0.94   1.02   1.05   1.03   1.08   1.18   1.22
    2      1.02   0.98   1.09   1.05   1.12   1.25   1.19
    3      1.05   1.01   1.15   1.13   1.17   1.41   1.35
    4      0.97   0.93   1.23   1.20   1.23   1.49   1.32
    5      1.07   1.04   1.25   1.21   1.30   1.52   1.45
    6      1.17   1.10   1.29   1.27   1.43   1.68   1.70
    7      1.13   1.21   1.34   1.36   1.50   1.79   1.82
    8      1.12   1.20   1.50   1.53   1.59   2.12   1.71
    9      1.19   1.12   1.56   1.67   1.82   2.31   1.68
   10      1.04   1.24   1.62   1.72   2.03   2.77   1.81
   11      1.17   1.34   1.72   1.92   2.07   2.58   2.25
   12      1.29   1.52   1.81   2.11   2.51   3.30
   13      1.36   1.40   1.84   2.26   2.71   3.23
   14      1.57   1.30   2.10   2.31          3.62
   15      1.34   1.47   2.52   2.58
   16      1.59   1.78   2.47   2.69
   17      1.46   1.87   2.84   2.66
   18             1.93
   19             2.04

Look at the "12" line, for example. After 12 years a pitcher who debuted at 20 (e.g., Pedro) has declined in effectiveness by 29% on average. That's not bad at all. But notice almost all that decline can be attributed to the last 2 years. (The apparent dramatic improvement from 9 to 10 I attribute to small sample size. Remember, the number of pitchers in each group keeps changing over time.)

Returning to the "12" line, notice how much greater the decline in effectiveness is for those who debut later. For the 21-year-old phenoms, the dropoff is nearly twice as much, 52%. For the 22's, it's almost 3 times as much, 81%. For the 23's, the average pitchers, it's nearly 4 times as much, 111%. Pitchers like Colon who start at 24 can expect a 151% worsening, more than 5 times as much. And those who don't make it until 25 can look forward to a catastrophic 230% decline over 12 years, nearly 8 times as bad as their 20-y.o. counterparts. That's assuming they make it that far. Those who debut after 25 have little hope of being around after 12 years.

I must say I find this data shocking. How could any pitcher have his ERA more than triple and still see significant playing time? Even doubling an ERA seems highly unlikely. Perhaps Tango can shed some light on this.
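
Mechanically, the chaining is just a running product of the year-to-year factors; a sketch with placeholder numbers, not Tango's data:

```python
# Chaining: cumulative aging effect = running product of year-to-year factors.
import numpy as np

year_to_year = np.array([0.93, 1.01, 1.07, 1.03, 1.10])  # hypothetical factors
print(np.cumprod(year_to_year).round(2))  # cumulative factor after each year
```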


Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 12:47 p.m., February 4, 2004 (#42) - FJM
  I understand what you are saying, but I'm not sure it applies here. In life insurance terminology this analysis involves looking at cohorts; i.e., groups of pitchers sorted by debut age. So the population we are dealing with is fixed; no new pitchers can join a cohort after the debut age. The only changes in the cohort over time come from pitchers dropping out due to injury, aging or just plain poor performance. Thus it would seem to me any bias would be in the direction of retaining the better pitchers while purging the marginal ones. If that is the case, we would expect to see less deterioration in this data base than there is in reality. Imagine how bad it could get if every pitcher was forced to continue pitching until age 37!

The easiest way I can think of to test for this bias is to create new cohorts based not only on debut age but also on how old each pitcher was in his final qualifying year. Do you have the data to do that?


Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 2:28 p.m., February 4, 2004 (#46) - FJM
  I think we're saying the same thing. The point is, we can all speculate on the effect that attrition has on our data base. But if you define your cohorts based on both debut age AND final age we won't need to speculate; we'll know the answer.


Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 7:22 p.m., February 4, 2004 (#49) - FJM
  The study you are proposing is somewhat different than what I suggested. Requiring that a pitcher have a minimum of 250 PA's every year strikes me as too restrictive. It essentially requires that he be free of major injury for most of his career, something very few pitchers achieve. For some of the smaller cohorts (e.g., debut age 20) you could be down to just a handful of pitchers at some ages, perhaps none at all in the advanced stages of their careers. Moreover, as you point out, you still have to deal with the problem of attrition in year x-y. If x>>y the attrition problem isn't too serious, but the injury-free requirement is; if x=y+4 or 5 attrition is still a major issue. If you try my approach attrition is not a problem at all and major injuries are a minor complication not requiring that a pitcher be thrown out of the study.



Aaron's Baseball Blog - Basketball (February 9, 2004)

Discussion Thread

Posted 12:53 p.m., February 9, 2004 (#4) - FJM
  Jim: I understand why you want to exclude non-shooting fouls, just like technical fouls. But why would you want to exclude fouls completing a 3-point play?


More Help Requested (March 4, 2004)

Discussion Thread

Posted 3:58 p.m., March 5, 2004 (#16) - FJM
  Can you create a correlation matrix for each player using all the raters and their ratings of the 7 skills? I'm guessing the correlations are going to be very high across the board, even though the skills are theoretically independent of each other. In other words, if a player is perceived to be better (or worse) than average by a fan, he'll probably be viewed that way across the board.
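
A sketch of what such a matrix might show, with simulated ratings in which each rater's scores share a common "overall impression" term (the halo effect suspected above); all numbers are made up.

```python
# Simulated fan ratings: 50 raters x 7 skills, halo-contaminated.
import numpy as np

rng = np.random.default_rng(1)
halo = rng.normal(size=(50, 1))                   # one overall impression per rater
ratings = halo + 0.5 * rng.normal(size=(50, 7))   # 7 skill scores per rater

corr = np.corrcoef(ratings, rowvar=False)         # 7 x 7 skill-by-skill matrix
print(corr.round(2))  # large off-diagonal entries = ratings that move together
```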


Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)

Discussion Thread

Posted 1:27 p.m., March 18, 2004 (#24) - FJM
  Most teams try to put their best overall hitter in the #3 spot. Are you saying that the typical lineup is seriously suboptimal for that reason alone? Why is it better to have your best hitter in the cleanup spot as opposed to #3?


MGL - Questec and the Strike Zone (March 20, 2004)

Discussion Thread

Posted 1:04 p.m., March 22, 2004 (#10) - FJM
  MGL: "I suspect however that the change in ball to (called) strike ratio in the league as a whole in 2002 and 2003 has nothing to do with QuesTec and everything to do with the fact that it took a year or so for many umpires to get used to the new strike zone."

There's just one problem with that theory. Based on your data, it appears that the umpires were able to make that transition successfully in 2001 in the future QuesTec parks but it took them another year or two in the other parks. Since they had no way of knowing that QuesTec was coming, much less where it would be, that seems like a bizarre result. I suspect the explanation lies elsewhere, either with the home batting bias you mentioned or perhaps with a bias created by the usage of umpires themselves in Q/non-Q parks.

Can you post the Ball-to-called strike ratios for each of the 30 MLB parks for all 4 years? That might help us to understand the large differences that apparently existed between future Q/non-Q parks back in 2000 and 2001.


MGL - Questec and the Strike Zone (March 20, 2004)

Discussion Thread

Posted 7:29 p.m., March 23, 2004 (#14) - FJM
  I don't read the data the same way you do. You are assuming that there is something inherently different about the future QuesTec parks, as evidenced by the disparities in Ball-to-called-strike ratios between them and MLB as a whole in 2000 and 2001. I admit that's a possibility, which is why I'd like to see the data park-by-park. But I think that it is unlikely.

There are essentially no differences between Q and non-Q B-to-C-S ratios (or walks or K's, for that matter) in either 2002 or 2003, which appears to contradict the Schilling Hypothesis. Occam's Razor says, when faced with 2 or more possible explanations for anything, the simplest one is usually the right one. In this case the simplest explanation is that Schilling was wrong, at least overall. (He might have been right about Bank One Ballpark.) There is no reason to believe that the pre-QuesTec disparities would have persisted if QuesTec had never been implemented.


MGL - Questec and the Strike Zone (March 20, 2004)

Discussion Thread

Posted 12:23 a.m., March 24, 2004 (#17) - FJM
  Well, let's see. With an average ratio of 0.925, there is no question that there were significant differences between the 2 groups back in 1993-96. That's a mystery worth exploring, but I question its relevance to the issue at hand. Virtually every park has changed significantly since then. Bank One Ballpark, where this controversy began, didn't even exist until 1998!

The 1997-99 period is a much tougher call. There does appear to be a small difference here. But with an average ratio of 0.98, I'm pretty sure it is not statistically significant. A few individual parks may be significantly different than MLB as a whole, but not the future QuesTec parks as a group.

Which brings us back to 2000-01. The average ratio for the 2 years using both home and road batters was 0.945. It was 0.883 using home batters only. That suggests the ratio for road batters (and home pitchers) was 0.825. Even allowing for the bias caused by introducing the home pitchers I find it hard to understand how the same parks (and the same umpires) could be simultaneously producing ratios of 0.945 and 0.825.

As further confirmation of this point, look at 2002-03. The ratios are virtually identical, whether you look at home batters only or both teams. The only way to resolve this mystery is to look at the data park-by-park.


Sophomore Slumps? (March 23, 2004)

Discussion Thread

Posted 7:54 p.m., March 23, 2004 (#15) - FJM
  How about changing the title of the article to this:

DEFYING REGRESSION TOWARDS THE MEAN:

1/3 OF ALL ROY'S ACTUALLY IMPROVE IN THEIR SOPHOMORE YEAR!!!!

Actually, it would be closer to 50% if you limited it to those who enjoyed an INJURY-FREE 2nd season. Now that's newsworthy.


Copyright notice

Comments on this page were made by person(s) with the same handle, in various comments areas, following Tangotiger © material, on Baseball Primer. All content on this page remains the sole copyright of the author of those comments.

If you are the author, and you wish to have these comments removed from this site, please send me an email (tangotiger@yahoo.com), along with (1) the URL of this page, and (2) a statement that you are in fact the author of all comments on this page, and I will promptly remove them.