Evaluating A-Rod (December 8, 2003)
Posted 10:09 p.m.,
December 11, 2003
(#24) -
Guy
I think there's a fundamental problem with the logic that says A) wins are worth X ($1.85M), B) A-Rod is good for about 7.5 net wins, therefore C) it only makes sense to pay A-Rod around $14M. The unstated premise is that all wins are worth the same, that 3 players producing 2, 2.5, and 3 wins are cumulatively worth exactly the same as A-Rod, that if I don't "overpay" A-Rod I can always buy more wins with my $$ elsewhere. In many economic markets that would be true, but I don't think it applies here, for two reasons. First, a team faces artificial constraints within which it tries to accumulate wins: a roster limit and, more importantly, PA and IP limits. Second, the market does not provide at all times a wide array of players of different values at each position. In that context, a player who can generate 7.5 wins while consuming about 1/10 of your team's offensive playing time, especially at SS, may be worth much more than 3 guys in the OF contributing 7.5 wins over 30% of your PAs.
I look at it this way. Tango says a team of replacement players will win about 49 games (.300). Suppose my goal as owner is to win 100 games and advance to the postseason, so I need 51 marginal wins. Realistically, I need to generate most of those wins from 14 players -- 9 position players, 4 starters, and a closer. I don't have the option of buying 30 1.5-win players for $2.8M each -- because I can't play them all! So those 14 players have to generate an average of 3.6 net wins to reach my goal; assuming my 10 backup players produce something, let's call it 3.2. That's a very high average level of performance (PECOTA projects Thome at 4.1/season). Can I get there without buying one or two 5-7 win players? It's possible, but very hard.
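Sketching that arithmetic out (the backup-wins total below is an assumption I chose so the numbers land on the 3.2 figure):

```python
# Back-of-the-envelope version of the paragraph above. Assumptions: 162 games,
# a .300 replacement-level team, 14 "core" players, and ~6 wins from the bench.
GAMES = 162
replacement_wins = round(0.300 * GAMES)                  # ~49 wins
target_wins = 100
marginal_wins_needed = target_wins - replacement_wins    # ~51

core_players = 14            # 9 position players, 4 starters, 1 closer
bench_wins = 6               # hypothetical contribution from the other roster spots

per_core_player = (marginal_wins_needed - bench_wins) / core_players
print(f"{marginal_wins_needed} marginal wins needed; "
      f"core players must average {per_core_player:.1f} wins each")   # ~3.2
```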
It's hard because of the second factor: a limited and "illiquid" market for baseball talent, especially when you need to fill 8 discrete defensive positions. How many 3-win or better catchers, SSs and 2B are available? Not many. If I don't sign one or two very elite players, putting together a top team is like trying to fill an inside straight -- I need to find very good players everywhere else. In contrast, if I have A-Rod, I have more flexibility to gamble on 1 or 2 cheaper players with the potential to overperform. And I may figure that I can find some good-hitting corner IFs or a LF at below-market prices (see 2003 Red Sox), which you can never do up the middle.
Given this constrained market, and the scarcity of very high "win density" players like A-Rod, the price naturally goes up.
That leaves the question of whether it makes economic sense to pay more than $1.8M for a marginal win. Certainly for an owner who wants to win (they do exist!), it's easy to see them paying at least the full marginal revenue ($2.5M) for a player who can contribute so much. There's no profit, but also no loss, and your team is better. And if Studes is right that wins 95 and 100 generate more revenue than wins 70 and 75 -- which certainly sounds right -- then a lot of that higher revenue will go to the very best players. For it is they, not the 1-2 win players, who make it possible for a team to win at that level. So it's not hard to see how a 7.5-win player could be worth considerably more than $14M to a good team.
Evaluating A-Rod (December 8, 2003)
Posted 10:14 a.m.,
December 12, 2003
(#26) -
Guy
Doesn't sound like we're far apart, Tango. To get to your benchmark of 95W, the 14 key players need to average 3.4 wins above replacement (I guessed 3.2). The question is, how hard is that to do without buying one or more elite 5+ win players? You describe +3W as "a little above average." But how many players consistently perform at that level? And specifically, how many catchers, ss, and 2b? My guess is it's fewer than one would expect. In the PECOTA table in Bloom's article, the players projected to perform in that range are I-Rod and Cliff Floyd. That strikes me as a very high level of performance to average over 9 position players and 5 pitchers. And assuming you will inevitably fall short at 2 or 3 positions, you then need to exceed that level elsewhere.
This illustrates the larger problem with using the arithmetic mean to define "average," which is common to almost all baseball analysis. Mathematically, you have to do it. But analytically, you always have to remember that the mean performance is much higher than the median, or what a typical MLB player delivers. In part, this is because good players get most of the playing time, boosting the mean performance. Also, as Bill James pointed out long ago, the distribution of talent in baseball is a pyramid, not a normal curve. Consequently, there are far fewer "above-average" players than "below-average" players, and the number who are one full game above average is much smaller still.
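To illustrate the point (with a made-up, right-skewed talent distribution, not real data): a pyramid of talent plus playing time concentrated among better players pushes the playing-time-weighted mean well above the median player.

```python
# Toy illustration only: skewed "pyramid" talent distribution, playing time
# weighted toward better players, weighted mean ends up well above the median.
import random

random.seed(1)
talents = sorted(random.expovariate(1.0) for _ in range(750))   # invented "talent"
weights = [rank + 1 for rank in range(len(talents))]            # better players play more

weighted_mean = sum(t * w for t, w in zip(talents, weights)) / sum(weights)
median_player = talents[len(talents) // 2]
print(f"playing-time-weighted mean: {weighted_mean:.2f}, "
      f"median player: {median_player:.2f}")
```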
You're in a better position than I to evaluate this, but it seems to me that assembling a core of players that average +3.4 wins is no mean feat. And if the owner wants a high degree of confidence that the team will win 95+ games (as the Red Sox clearly do today), having an A-Rod/Vlad/Bonds type player is almost essential.
Evaluating A-Rod (December 8, 2003)
Posted 12:22 p.m.,
December 12, 2003
(#28) -
Guy
In defining median, are you including all players who play during a season? Just those who play with some regularity? If the former, the median can't possibly be 95% of the mean; if the latter, I remain skeptical, but maybe so.
In any case, this is a sidebar (if interesting). How common are players worth 3.4 wins over replacement?
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 11:42 p.m.,
December 15, 2003
(#9) -
Guy
Tango, you say that "WS is shortchanging the def by 0.5 runs per game, and overcrediting the off by 0.5 runs per game. That's a huge, huge problem." However, the RS/RA ranges you are talking about don't actually happen. Yes, there are .350 teams, but none that are .350 solely because of poor hitting or poor defense -- it's always some of both. At the team level in today's game, a great offensive team is one run per game above average, a very weak team one run below. Assuming a 5 run environment, a top offensive club will be 6RS/5RA or .59, while a top defensive club will be 5RS/4RA, or about .61. If you use James' .52/1.52 bookends, the good offensive team is credited with +3.4 R/G and the good defensive team is at +3.6, very close to the correct relationship (which I assume is why the .52/1.52 were selected). I could see how in the 1960s low-run environment James might have a small problem, but still nothing like the magnitude you suggest.
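A quick check of that arithmetic (standard Pythagorean record with exponent 2 and a 5 R/G league; the .52/1.52 bookends are James'):

```python
# Verifying the .59/.61 and +3.4/+3.6 figures above.
def pyth(rs, ra, exp=2.0):
    return rs**exp / (rs**exp + ra**exp)

league = 5.0
print(f"top offense, 6 RS / 5 RA: {pyth(6, 5):.3f}")   # ~.590
print(f"top defense, 5 RS / 4 RA: {pyth(5, 4):.3f}")   # ~.610

# James' .52/1.52 bookends: runs beyond the offensive/defensive margins
off_credit = 6 - 0.52 * league       # +3.4 R/G
def_credit = 1.52 * league - 4       # +3.6 R/G
print(f"offense credited with +{off_credit:.1f}, defense with +{def_credit:.1f}")
```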
The real issue is VARIANCE, as AED suggests and you said in your original post. To try another metaphor, in Little League games that use a machine to pitch, the game is presumably something like 75% hitting, 25% fielding, and 0% pitching. If every "pitcher" is the same, pitching no longer matters for explaining wins and losses. If every single MLB hitter performed in a range of .780 to .800 OPS, but we had the current variance among pitchers and fielders, baseball might indeed be 75% pitching. And that would still be true even if those MLB hitters were much better hitters than the rest of us, as long as there were enough of those hitters to fill all ML rosters (admittedly, an unlikely distribution of human athletic talent). The distribution of hitting and defensive talent seems fairly balanced in the current game, but that doesn't mean it is exactly the same. If there is more variance in RA than RS, then defense becomes a bit more important (James seems to say this in justifying his 52/48 split, but the case is murky). I could also see how this could vary in different historical eras: when 4 starters provided most of a team's innings, pitching variance may have been greater (or smaller); modern gloves probably affected the range of defensive abilities in the game; etc.
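A toy simulation of the variance point, with invented spreads rather than real team data:

```python
# If the spread in offense shrinks while the spread in run prevention stays put,
# run prevention explains nearly all of the variation in winning percentage,
# regardless of how good the hitters are in absolute terms.
import random

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def run_prevention_corr(sd_rs, sd_ra, teams=30):
    rs = [random.gauss(5.0, sd_rs) for _ in range(teams)]
    ra = [random.gauss(5.0, sd_ra) for _ in range(teams)]
    wpct = [s * s / (s * s + a * a) for s, a in zip(rs, ra)]
    return corr([-a for a in ra], wpct)   # runs prevented vs. W%

random.seed(7)
print(f"equal spreads:       r = {run_prevention_corr(0.5, 0.5):.2f}")
print(f"tiny offense spread: r = {run_prevention_corr(0.05, 0.5):.2f}")
```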
That said, it seems to be the case that the offensive and defensive performance ranges are never greatly different, and this presumably reflects some fundamental facts about athletic ability and the rules of baseball.
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 11:43 a.m.,
December 16, 2003
(#16) -
Guy
Tango: On RS/RA, if one run saved = 1.2 runs scored, which I accept, then it seems that James' .52/1.52 benchmarks need tweaking, but are not "half a run/game off" for both offense and defense. Is that fair?
It occurs to me that Tango's point matters much more at the individual pitcher level than for the question of team-level allocation. No team is much more than 1 R/G above or below average. But Pedro takes 2 R/G away from the opposing team, and it's concentrated in 30 or so games. So perhaps his contribution has proportionally more impact than WS suggests (where preventing 2 R/G is treated as only twice as valuable as preventing 1 R/G or scoring an extra 1 R/G). Does this seem like a useful distinction? Seems to address the widespread perception (which I share) that WS undervalues great starters.
If std dev for RS and RA has always been the same, then 50-50 is absolutely right for offense/defense split. Interesting it's been so stable.
Tango: To go back to original question of pitching/defense split, you say you've concluded it's 70-30. Could you post the cite? However, I think you've said elsewhere that responsibility for BIP is about 50-50. If that's true, and you then credit pitchers for Ks, HR, and BB, won't pitchers account for far more than 70% of total variance in RA?
Seems to me that if we give defense 50%, give pitchers 75-80% of that, and credit top starters for full impact of high level of runs prevented per game, then starters will get their due in WS.
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 12:50 p.m.,
December 16, 2003
(#18) -
Guy
On 70/30 split: Have you also factored variance into this? My perception was that the variance in teams' park adjusted H/BIP rates was relatively smaller than variance in HR/SO/BB rates (certainly, we know individual pitchers can be very successful with high H/BIP and vice versa), which would mean pitchers' performance could explain more than 70% of defense variance, even if BIP account for 75% of all RS. Is the variance actually about the same, or am I thinking about this wrong?
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 12:58 p.m.,
December 16, 2003
(#19) -
Guy
In addition to needing a different RPW converter for pitchers vs. hitters, I think the other important implication of Tango's analysis is that preventing 2 R/G is more than twice as valuable as preventing 1 R/G: the first run increases W% by .095, the second by .106. Thus, great starters have a disproportionate impact on game outcomes.
(Conversely, each additional marginal RS is less valuable than the preceding RS -- the first is worth .083, the second .070. Perhaps this is the underpinning of the old saw that "great pitching beats great hitting"?)
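A rough Pythagorean version of that arithmetic (run environment and exponent are assumptions here; Tango's exact .095/.106 and .083/.070 figures come from his converter, not this):

```python
# Each successive run prevented adds MORE win probability; each successive run
# scored adds less.
def pyth(rs, ra, exp=1.83):
    return rs**exp / (rs**exp + ra**exp)

base = 4.8   # assumed league R/G
w0 = pyth(base, base)   # .500
print(f"runs prevented: 1st {pyth(base, base - 1) - w0:+.3f}, "
      f"2nd {pyth(base, base - 2) - pyth(base, base - 1):+.3f}")
print(f"runs scored:    1st {pyth(base + 1, base) - w0:+.3f}, "
      f"2nd {pyth(base + 2, base) - pyth(base + 1, base):+.3f}")
```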
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 4:27 p.m.,
December 17, 2003
(#38) -
Guy
I think there may be a problem with the starting assumption that "the standard deviations of offense, pitching, and fielding have a ratio of 52:35:17." Assuming that the ratio of offense SD to total defense SD is 52:52, and the ratio of pitching SD to fielding SD is something like 2:1, it does NOT follow that the ratio of offense to fielding is 3:1 (52:17). For the very same reason that one can't simply add the SDs for offense and fielding and get the combined SD for position players, it would follow that the individual SDs for pitching and fielding -- also largely uncorrelated -- can't be added to equal the total SD for defense. So if SD for total defense does indeed equal 52, then the SD for pitching must be more than 35 and/or the SD for fielding is larger than 17, using this scale. Right?
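In numbers (assuming the components are uncorrelated and the 2:1 pitching-to-fielding ratio holds):

```python
# Uncorrelated SDs add in quadrature: sqrt(p^2 + f^2) = 52, with p = 2f.
total_defense = 52.0
ratio = 2.0                                       # assumed pitching : fielding
fielding = total_defense / (ratio**2 + 1) ** 0.5  # ~23.3 on this scale
pitching = ratio * fielding                       # ~46.5
print(f"pitching ~ {pitching:.1f}, fielding ~ {fielding:.1f}, "
      f"combined: {(pitching**2 + fielding**2) ** 0.5:.1f}")
```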
Can anyone provide the actual team-level SDs for all these factors? Might help clarify the discussion.
UZR 2003 Previews (December 18, 2003)
Posted 5:47 p.m.,
December 18, 2003
(#6) -
Guy
For newcomers, could you post link(s) to an explanation of UZRs and past results? Thanks.
Clutch Hits - Race and Violence (December 18, 2003)
Posted 12:13 p.m.,
December 19, 2003
(#1) -
Guy
I think the most interesting aspect of this is the "QB issue" rather than the HBP stuff -- is there still discrimination at work in baseball, and if so, what form(s) does it take? A few years ago, I remember reading that black MLB players -- both pitchers and hitters, but especially pitchers -- performed appreciably better than white players across most or all statistical categories (not sure about Hispanics). This invariably leads to a debate on causation -- is this a genetic difference, social/opportunity difference, or combination -- that is certainly interesting (if often contentious). But the other implication gets much less attention: if black players are performing at a higher average level than whites, then unless the distribution of talent is vastly different in the two populations, this means that even today many blacks are being kept off rosters (and/or denied playing time) in favor of less talented white players.
In other words, if blacks are, for whatever reason, more likely than whites to be MLB-caliber baseball players, then they should be overrepresented in MLB compared to the total population (as they are). But that shouldn't mean that within MLB black players are any better than white players -- both groups should include a range of talent from replacement-level to superstar. If black mean performance is appreciably better, then there must be a lot of "missing" black players of below-league-average ability. I suppose one could argue that these black athletes are choosing the NFL or NBA over MLB -- that black athletes only choose to pursue baseball if they are very, very good -- but I can't imagine there are enough jobs in those leagues to begin to provide the explanation.
Does anyone know if the performance of black players continues to exceed that of white players?
Valuing Starters and Relievers (December 27, 2003)
Posted 1:00 a.m.,
December 29, 2003
(#18) -
Guy
As a new contributor, I want to thank MGL for his warm welcome to Primate Studies. Seriously, thanks for the (generally) thoughtful comments. Three basic criticisms have been offered of my contention that relievers have an inherent advantage over starters:
1) Even if it is true, it doesn’t matter for valuing pitchers’ performance;
2) It isn’t true, and any observed difference is likely a function of small sample sizes for relievers and/or selection bias in which pitchers assume the two roles; and
3) It probably is true, but there is no way to accurately measure it.
Let me address each one.
IT DOESN’T MATTER
Tango has already addressed this issue, so I don’t have a lot to add. Personally, I am skeptical of systems that purport to assign absolute value to players without any reference either to average performance or replacement-level performance. This is one of Win Shares’ conceits, but I don’t think it really succeeds, and I’m not even sure such a system is possible, or desirable. But in any case, most valuation systems DO compare players’ performance to some kind of benchmark, and this includes important elements of the Win Shares calculation for pitchers. So I think the reliever advantage, if it exists, clearly does have important ramifications for many player valuation systems.
IT ISN’T TRUE.
Two simple facts provide overwhelming evidence of a reliever advantage:
1) The large majority of relief innings are thrown by pitchers who have failed to succeed as starters;
2) Despite fact #1, in 2003 relievers posted a collective ERA 0.38 lower than starters.
The average team had 490 relief innings, and the average closer threw about 68 of these, or 14%. Even if we generously assume that all closers are very talented pitchers (the list includes Jose Mesa, Rocky Biddle, and Mike Williams), 86% of all relief innings were hurled by failed starters. The fact that relievers nonetheless perform better than starters on a per inning basis – collectively, not “anecdotally” – means that the reliever role on average confers a powerful advantage.
MGL hypothesizes that many pitchers get demoted to the bullpen on the basis of a small IP sample, and are then kept there despite the fact they could pitch just as well in a starting role. To begin with, this ignores the fact that pitchers’ entire performance record, including the minors and spring training, informs decisions about who starts and relieves. And think about what this theory means. There are scores of relievers who regularly post ERAs better than the average starter (4.54), and at least 25 teams each year must have a couple of relievers -- excluding their closer -- with ERAs better than the team’s worst starter. We’re supposed to believe that as 30 teams desperately try to find decent #4 and #5 starters, they continually overlook pitchers on their own roster who could successfully start? I recognize that GMs and managers make a lot of mistakes, but this is preposterous. If most decent middle relievers could pitch just as well as starters, many would be given the chance, and many of those would succeed (and the starter/reliever gap would shrink in my proposed dual-use sample).
MGL also suggests that relievers will naturally post many of the best ERAs, because they pitch only 75-100 IP while starters pitch twice that. This is an interesting theory, but a quick review of actual data shows it has little relevance here. Looking at pitchers who logged at least 50 IP last year, among those who had an ERA of 2.50 or lower 18 were relievers and 4 were starters, while among those with an ERA of 6.00 or higher just 8 were relievers and 17 were starters (and a handful were hybrids). The predominance of relievers in the list of low-ERA pitchers obviously reflects the fact that the entire curve is shifted for relievers, not sampling error. Clearly, the threshold for continued employment in MLB is higher (i.e. a lower ERA) for relievers than for starters – as it should be.
Given Tango’s findings on times-through-the-lineup, the clear source of the reliever advantage is throwing harder, resulting in far more strikeouts and, perhaps, a lower hit% on balls in play. Last year, relievers struck out 1 more batter per 9 IP than starters (7.07 vs. 6.04), a 17% advantage. If we corrected for relievers’ lesser talent, I would guess that the K advantage in relieving would easily exceed 20%. (I couldn’t find PA totals for starters and relievers, but the K/PA gap is probably even greater).
Perform this thought experiment: imagine that all relievers are required to start 4 games in 2004, and all starters forced to pitch 30 innings in relief. Can any knowledgeable fan doubt that the former would be a disaster, or that the latter would, generally speaking, be a success? At a minimum, we should assume that starters are just as talented as relievers, which creates a presumption that the relief role delivers an advantage of at least 0.30 in ERA, and recognize that the real number is almost certainly higher.
IT CAN’T BE MEASURED.
Several people have suggested that my proposed methodology for measuring the reliever advantage, by comparing the performance of pitchers who have pitched in both roles, is flawed. Sample size would not be a problem, if one assembled data from the past ten years or so. The more serious objection is the potential for selection bias. It is not true that the dual-use sample would comprise only pitchers demoted to the bullpen after failing in only a handful of starts. It would include quite a mix, including solid relievers asked to make some spot starts, pitchers who have performed reasonably well in both roles (M. Batista), good starters who became closers (Smoltz), and a variety of other combinations. Still, we can’t be sure that dual-use pitchers are the same as those who have pitched only in one role.
Let me say first that I hope others will come forward with other suggested approaches for measuring the reliever advantage. This is the best I came up with, but I’d be happy to hear better ideas. And I hope Tango will share his findings soon (I reported his conclusion, but do not know what methodology he used).
That said, I still think such a comparison is valuable, if imperfect. Let’s break down what I’ve been calling the reliever advantage into two discrete, and potentially different, numbers: 1) Reliever Decline -- how much worse the average reliever would perform if forced to start, and 2) Starter Improvement -- how much better the average starter would perform in relief. My proposed study would almost certainly understate the magnitude of reliever decline. After all, for a pitcher to become a dual-use pitcher, a manager must have had some reason to think he could succeed in the starter role at the time. For relievers who have never been used as starters -- despite posting better ERAs than most starters! -- it is fair to assume that they would see at least as large a decline in performance as the dual-use group, and probably even larger.
Moreover, most relievers probably had some experience starting in the minor leagues. If someone can demonstrate that current ML relievers performed just as well in the starting role while in the minors as did pitchers who went on to succeed as ML starters, I’ll reconsider my position. But I will be shocked if that is the case, for the simple reason that teams desperately want to find quality starters, and would give successful minor league starters a reasonable opportunity to succeed before relegating them permanently to the pen.
So, I think a study of dual-use pitchers would, at least, provide a conservative estimate of reliever decline.
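For what it's worth, here is a minimal sketch of what such a dual-use study might look like (the file and column names are hypothetical, and real data would need era/park adjustments and sensible IP minimums):

```python
# Compare each pitcher's ERA as a starter vs. as a reliever, restricted to
# pitchers with meaningful innings in BOTH roles over the study window.
import pandas as pd

logs = pd.read_csv("pitcher_role_splits.csv")   # hypothetical: pitcher_id, role, er, ip
both = (logs.pivot_table(index="pitcher_id", columns="role",
                         values=["er", "ip"], aggfunc="sum")
            .dropna())
both = both[(both["ip"]["SP"] >= 50) & (both["ip"]["RP"] >= 50)]

era_start = 9 * both["er"]["SP"] / both["ip"]["SP"]
era_relief = 9 * both["er"]["RP"] / both["ip"]["RP"]
# a (conservative) estimate of the reliever advantage
print("mean within-pitcher ERA gap (start minus relief):",
      round((era_start - era_relief).mean(), 2))
```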
The much harder question is: Would current starters improve their performance by the same amount that relievers decline, if asked to pitch in one or two inning increments? It could be true that the gap would be smaller going in this direction – that what distinguishes starters is their ability to maintain a high performance level for 6 or more innings, but they cannot improve greatly even if asked to pitch fewer innings. And if true, a study only of dual-use pitchers would miss this.
To this objection, I would say two things:
First, that the burden of proof should fall on those who want to argue that starters are a totally different animal, immune from the difference found elsewhere. We have observed that relievers post better ERAs, even though we have good reason to think they are lesser talents. Among pitchers who have pitched in both roles, we expect to find (and maybe Tango has?) that they perform better as relievers. Given that, the burden of proof would shift to those who believe starters who have never pitched in relief form a separate class.
Second, and more importantly, the fact that our best estimate of the reliever advantage may be flawed is not a reason to ignore this difference. If the best evidence we have suggests relievers have an inherent advantage of .6 runs per game, that is very significant. Should we ignore that because we can’t totally rule out the possibility that it is really .45 (or .7)? Should we keep using a single benchmark for all pitchers when we know that makes no sense, because we aren’t sure exactly how different the benchmarks should be? To my mind, this is making the perfect the enemy of the good.
Finally, there are good reasons beyond the issue of player valuation to study and try to measure the reliever advantage as best we can, as well as which types of pitchers enjoy greater or lesser advantages in the relief role. Because if it exists at any level close to .6 runs/game, then it probably has major implications for the development of the game. Such an advantage would mean that a substantially below average starter will generally become an average pitcher in relief, an average pitcher will become a very good one, and a good pitcher will become excellent. And if that is true, we would expect a number of things to happen:
Teams would want to use relievers as much as they could, except when starting the very best starters;
Teams would move in the direction of carrying as many pitchers as they could, within the constraint of the 25-man roster (i.e. until the cost of losing another position player was too great);
If pitchers can throw harder when pitching fewer innings, then having starters pitch every 5th day rather than every 4th may improve their performance, as will having them pitch fewer innings per start;
Strikeouts will increase to the extent that teams pursue some or all of these strategies.
As it happens, this is a pretty good description of changes in pitching use over the past twenty five years or so. In 1968, starters accounted for 75% to 80% of IP for most teams, while today it's 67%. Five man rotations are now universal. Strikeout rates are up dramatically.
I’m not suggesting that the reliever advantage is the sole, or even primary, reason for these changes. But it could well have played an important role in the evolution of pitching staff management. This would be true even if managers and GMs didn’t consciously recognize the reliever advantage concept – simple trial and error response to the increasing advantages held by the offense over the past two decades would have pushed teams in this direction.
Thanks again for the comments. And I do hope others will contribute their thoughts about better ways to measure and understand this issue.
It is true that I take starting pitching to in some sense be the norm, when I say that someone who can pitch well for 7 innings is “more talented” than someone who can perform the same for only 1 or 2 innings. Clearly, the former is more valuable than the latter (assuming a performance above replacement level). I think it’s also reasonable to call that superior talent – certainly, most pitchers start their careers hoping to start – but I can see the contrary view.
Valuing Starters and Relievers (December 27, 2003)
Posted 1:04 a.m.,
December 29, 2003
(#19) -
Guy
Sorry -- last paragraph there was included by mistake.
Valuing Starters and Relievers (December 27, 2003)
Posted 10:53 a.m.,
December 29, 2003
(#25) -
Guy
I like David Smyth's suggested approach, both as a way of measuring the reliever advantage and because it addresses another important implication of the advantage: that there is no such thing as a single "replacement level pitcher," but rather replacement-level starters and replacement-level relievers. Depending on the distribution of the "durability talent," we could find that the gap is bigger (my guess) or smaller at the replacement level than overall.
Another interesting implication is the aging pattern for pitchers. If many starters pitch an increasing number of relief innings in their 30s, while few relievers do the reverse, then this will artificially flatten out the performance curve by age. That is, the actual performance of starters may decline more quickly than it would appear by looking at their combined start/relief IP. And the curve may be quite different for starters than for relievers -- it may be easier to maintain performance with age in the relief role than it is as a starter. I would certainly guess that the average age of relievers (weighted by IP) is higher than that of starters.
Valuing Starters and Relievers (December 27, 2003)
Posted 2:21 p.m.,
December 29, 2003
(#28) -
Guy
Perhaps thinking of two separate pools is not helpful. But clearly a better component ERA is required to hold a MLB job in relief than as a fifth starter, and I would expect the difference to be substantial. And when we evaluate starters in comparison to replacement level, the benchmark should be what a replacement-talent pitcher does in a starting role (not mixed with relief performance).
I may be misunderstanding Colin's suggestion, but it seems to me that comparing Schilling to a composite of 5 IP of replacement starter performance together with 3 IP of replacement relief performance would understate Schilling's value. His benchmark would then be better (lower) than the starter benchmark alone, in effect penalizing him for his durability. This seems particularly misguided when we consider that relievers' performance is presumably enhanced by the days off they are periodically provided by 8-9 inning starts like Schilling's.
Valuing Starters and Relievers (December 27, 2003)
Posted 3:44 p.m.,
December 29, 2003
(#35) -
Guy
While we like to have an image of "the" bubble player, it is really the average of many different bubble players. One replacement shortstop may play above-league-average defense but bat .200, while another plays below-average defense but hits .260. Presumably, the NET impact on the team's run advantage is the same, making both replacement-level players.
My guess is that Tango's image of a RL pitcher -- the poor man's Jamie Moyer -- is the OLD version of the RL pitcher. The young RL pitcher probably looks quite different: he can throw hard for a couple of innings, doesn't have good control, is generally inconsistent -- and probably performs much better as a reliever. And my guess is that this type of RL pitcher is more common, and would come closer to matching the average RL pitcher. Of course, that's just a guess.
Valuing Starters and Relievers (December 27, 2003)
Posted 3:56 p.m.,
December 29, 2003
(#37) -
Guy
Colin: If our replacement scenario is 5 IP for a starter and 3 IP for reliever(s), shouldn't we also ask the question: wouldn't replacement-level relievers pitch even worse if forced (collectively) to pitch 3 innings every day? Perhaps more relevant, replacing Curt Schilling with a starter who went only 5 IP per start would increase the workload of his team's bullpen by about 13% (60-70 IP). This would almost certainly have a measurable negative impact on the bullpen's performance. So no, I don't think you should leave aside the obvious value to the team of Schilling going 8.
Valuing Starters and Relievers (December 27, 2003)
Posted 4:05 p.m.,
December 29, 2003
(#39) -
Guy
Tango (and others): can you post citations or links to what you consider the best current research/thinking on replacement level performance?
Valuing Starters and Relievers (December 27, 2003)
Posted 9:21 a.m.,
December 30, 2003
(#43) -
Guy
Colin: wasn't offended at all by the very reasonable criticism.
As for your example, let me turn it on you: suppose you had four pitchers (call them "Ligtenberg A", B, C, and D) who cumulatively put together Schilling's exact same stats. Would you value them the same as Schilling? Obviously not -- at that point the cost of consuming 4 roster spots is too huge to ignore. But the cost of 1 roster spot is also real. I guess I just don't agree that the roster spot is a different issue -- it has to be accounted for in valuation.
In any case, whether we define the starter benchmark as 8 IP of a replacement starter or the 5/3 combo, it will probably have a fairly trivial impact on the final definition of our benchmark.
Valuing Starters and Relievers (December 27, 2003)
Posted 7:43 a.m.,
January 2, 2004
(#51) -
Guy
David: I just checked the Woolner formulas using the 2003 MLB ERA of 4.40. They would give you replacement ERAs of St 5.96 and Rel 4.82, a difference of over 1 R/G. Is there an error in the formulas you posted, or is the difference actually greater than .6?
Valuing Starters and Relievers (December 27, 2003)
Posted 10:04 a.m.,
January 2, 2004
(#54) -
Guy
Sorry to belabor this, but I don't follow. At BP Woolner has RA/9IP for ML at 4.77 for 2003. Using the formulas above:
St: 1.37*4.77 - .066 = 6.47
Rel: 1.70*4.77 - 2.66 = 5.45
Of course ERAs will be a bit lower, but the issue is the gap. What am I missing?
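For anyone checking my arithmetic, here is the plug-in calculation with the formulas exactly as posted above (not re-derived from Woolner):

```python
# Replacement-level RA for starters and relievers, per the formulas quoted above.
def repl_starter(league_ra):
    return 1.37 * league_ra - 0.066

def repl_reliever(league_ra):
    return 1.70 * league_ra - 2.66

for league_ra in (4.40, 4.77):
    st, rel = repl_starter(league_ra), repl_reliever(league_ra)
    print(f"league {league_ra}: starter {st:.2f}, reliever {rel:.2f}, "
          f"gap {st - rel:.2f}")
```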
Valuing Starters and Relievers (December 27, 2003)
Posted 12:05 p.m.,
January 2, 2004
(#56) -
Guy
Thinking about this some more, it's not clear that this comparison captures the inherent reliever advantage. I assume Woolner is measuring the bottom 10% of starters and relievers, or something similar. But the worst #5 starters are presumably better pitchers than the worst middle-relievers. Tango suggested this spectrum above:
1S, 2S, 1R, 3S, 4S, 2R, 5S, 3R, 4R, 5R, 6R
Assuming it's something like that, comparing RL starter to RL reliever will considerably understate the reliever advantage. Put another way, a replacement-level starter would perform better in relief (on average) than a replacement-level reliever.
Valuing Starters and Relievers (December 27, 2003)
Posted 5:52 p.m.,
January 2, 2004
(#58) -
Guy
David: I think we're back to Tango's point that there's just RL pitchers, not two pools of starters and relievers. A #6 reliever is a RL pitcher -- what you can generally find available. But the worst #5 starters are better pitchers, and are not freely available in the same way. They have been selected for that role precisely because they are better than most of the team's relievers. Suppose you calculated a RL closer, by looking at the worst pitchers used in at least 10 save opportunities. This would be something comparable to R2 or R3 -- perhaps even a better pitcher than R5 -- but clearly not "replacement level" in the usual sense. You're doing the same thing here.
To know the reliever advantage, we have to know -- or estimate -- how the RL reliever would perform in a starting role. And we can be confident that would be worse than Woolner's "RL starter," or else he'd be starting!
Valuing Starters and Relievers (December 27, 2003)
Posted 11:43 p.m.,
January 2, 2004
(#60) -
Guy
I'm at a disadvantage, not knowing Woolner's methodology. But my point is precisely that the worst 5th starters -- if that's what you mean by a "replacement level starter" -- are indeed more scarce than the worst relievers. They are in fact the 6th or 7th best pitcher on most staffs. I think you would find they are paid considerably more than the worst relievers, and probably out-perform them when they are called upon to pitch in relief (or when the relievers are forced to make an occasional start). IOW, the worst starters are NOT replacement level pitchers; replacement level pitchers aren't permitted to start.
And given this disparity in talent between the worst starters and worst relievers, I find a .4 ERA gap entirely consistent with the idea that relievers enjoy an advantage of .6 or more (given two pitchers of equal ability).
Valuing Starters and Relievers (December 27, 2003)
Posted 9:02 a.m.,
January 4, 2004
(#62) -
Guy
I must not be making my point clearly. I don't believe that the scarcity of 5.36 ERA starters and 4.95 ERA relievers is the same. Do you? There are plenty of 5.36 starters out there making good money, well above ML minimum. Cory Lidle (5.75 ERA) just signed a $2.75M contract -- think any 4.95 relievers have done the same? If a pitcher only manages a 4.95 ERA in relief, pitching 1-2 innings at a time, he won't keep a job very long (i.e. he is a true replacement level pitcher). These are NOT equivalent pitchers, in terms of scarcity, ability, or anything else. Does Woolner present evidence to the contrary?
Valuing Starters and Relievers (December 27, 2003)
Posted 1:32 p.m.,
January 5, 2004
(#68) -
Guy
Yes, that's what I've been trying to say. You can't select a subset of players based on ability, then measure the worst of that group and proclaim it "replacement level." Then a "replacement level closer" might have an ERA of 4.00. A "replacement level starting 1B" (500 PA) might have an OPS of .800, and a "replacement level All-Star 1B" an OPS of .860. And in none of these cases would "replacement level" be meaningful. Taking "starting pitchers" as a group is just a less extreme version of the same thing -- they are selected to start because they are better than replacement level.
FIP and DER (December 30, 2003)
Posted 6:50 a.m.,
December 31, 2003
(#2) -
Guy
Nice piece. It seems that the numbers on the relative power of FIP and DER in predicting ERA have implications for the debate over the proper pitching/defense allocation. The R^2 for FIP is .54 and for DER is .31. If we used RS instead of ERA as the dependent variable, and replaced DER with BIP-Slg ((1B + 2x2B + 3x3B)/IP) to better capture the impact of balls in play, we should find that BIP accounts for more than 31% of the variation. Let's assume the split would be something like 55/45 FIP/BIP if we eliminate unexplained variance due to chance.
In "Solving DIPS," Tango et. al. argue that the division of responsibility for DER/BIP is about 62/38 pitchers/fielders, leaving aside park effect and random chance. I find this much more believable than the original Voros thesis -- which Studes seems to accept (sorry if I'm misinterpreting, Studes) -- that DER is almost entirely driven by fielding.
Combining these, it would suggest a pitcher/fielder split of around 83/17, giving fielders far less influence (about 9% overall) than they are assigned in WS. Thoughts?
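The combination I have in mind, in numbers (the 55/45 split is my assumption from above; 62/38 is the "Solving DIPS" figure):

```python
# Pitchers get all of the FIP share plus their share of BIP; fielders get the rest.
fip_share, bip_share = 0.55, 0.45        # assumed split of defensive variance
pitcher_bip, fielder_bip = 0.62, 0.38    # "Solving DIPS" split of BIP responsibility

pitcher_total = fip_share + bip_share * pitcher_bip   # ~0.83
fielder_total = bip_share * fielder_bip               # ~0.17
print(f"pitchers {pitcher_total:.0%}, fielders {fielder_total:.0%}; "
      f"fielders' overall share (x 50% for defense): {fielder_total * 0.5:.0%}")
```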
FIP and DER (December 30, 2003)
Posted 1:44 a.m.,
January 1, 2004
(#12) -
Guy
" In theory, shouldn't DERA decrease as FIP decreases? Less BIP, less impact of a hit. Instead, DERA increases (and DER stays flat) as FIP decreases. I think I'm not doing this quite right."
Yes, a low FIP should tend to produce a lower DERA as more Ks = fewer BIP (but shouldn't affect DER in the same way). However, keep in mind that DERA is really a composite of two factors: pitcher-DERA and fielder-DERA. And there are other possible relationships at work. In terms of basic pitcher ability, low FIP is probably associated at least weakly with low pitcher DERA -- it's hard to believe the skills are wholly unrelated. In addition, you'd expect that financial disparities among teams could produce some association between low FIP (good pitching) and low fielders DERA (good fielding).
Against all of those tendencies is one major factor that cuts the other way: if you are a high-FIP pitcher, you must have a reasonably good pitcher-DERA or you won't pitch for long in the major leagues (i.e. if you can't strike batters out, you'd better be able to get them out on BIP). This selection process should contribute a negative correlation between FIP and DERA. (It would be interesting to see if FIP and DER are somewhat correlated in the minors, but not in the majors). Apparently, this last factor slightly outweighs the impact of the others (and maybe I'm missing others). The zero correlation between FIP and DER may just be an interesting coincidence.
FIP and DER (December 30, 2003)
Posted 12:15 p.m.,
January 2, 2004
(#20) -
Guy
Why should fielders get more WS under either scenario? If your premise is that pitcher A compensates for his fewer Ks with more outs on BIP due to his ability, shouldn't he receive those WS?
FIP and DER (December 30, 2003)
Posted 10:25 p.m.,
January 4, 2004
(#24) -
Guy
Let's say you have a good, well-balanced .600 team, with above-average hitting, pitching, and fielding. Where should you put resources to improve? Studes' article points out that such a team will get diminishing returns on defense: with good pitching, the impact of good fielding is reduced. The reverse must also be true -- adding an ace pitcher will reduce RA a bit less on a good fielding team than on an avg fielding team. Offense is just the opposite: adding a good hitter to a high OBP/high SLG team will increase RS more than for an average team. Since salaries are presumably based on players' impact on an average team, this team should get more bang for their buck by upgrading their hitting.
But, on the other side, you have Tango's point that the win multiplier is higher for marginal reductions in RA than for marginal increases in RS. That would argue for a pitching upgrade strategy. So, which factor outweighs the other? Or maybe it's a wash, and either strategy would deliver an equal W/$ return?
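Here's the Pythagorean half of the question only -- it captures Tango's win-multiplier point but not the diminishing/increasing-returns interaction, and the team numbers are assumed:

```python
# For a ~.600 team, a half-run of run prevention buys slightly more W% than a
# half-run of offense, per the standard Pythagorean formula.
def pyth(rs, ra, exp=2.0):
    return rs**exp / (rs**exp + ra**exp)

rs, ra, delta = 5.5, 4.5, 0.5
base = pyth(rs, ra)   # ~.599
print(f"base {base:.3f}, +{delta} RS: {pyth(rs + delta, ra) - base:+.3f}, "
      f"-{delta} RA: {pyth(rs, ra - delta) - base:+.3f}")
```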
A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)
Posted 11:35 a.m.,
January 6, 2004
(#1) -
Guy
Cool stuff. I wonder: what is the cumulative probability that so many teams with relatively small chances of being the best team actually won the WS (or that the most-likely-best-team won so few)? Can we say, with a high degree of confidence, that the playoff/WS system is not an efficient system for identifying the best regular-season team? (For whatever reason: favoring top 3 starters, something else.)
A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)
Posted 9:55 p.m.,
January 6, 2004
(#10) -
Guy
Back to Tom:
Another way to look at this is to assume the 8 playoff teams always include the likely best team, and the cumulative percentage for the 8 teams is close to 100%. Then if the WS had been perfect at identifying the best team in these 12 seasons, the WS champs would have an average p of .44. If the WS were no better than a coin toss, the WS champs would have an average p of about .12. In these seasons it was actually 15% -- just a little better than the coin.
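To make the coin-toss baseline concrete (the individual p values below are invented; all that matters is that the eight playoff teams' probabilities sum to roughly 1):

```python
# If the champion were effectively a coin flip among the 8 playoff teams, the
# expected p of the eventual champion is about 1/8, or .12.
playoff_p = [0.44, 0.20, 0.12, 0.08, 0.06, 0.05, 0.03, 0.02]   # hypothetical season
expected_p_random_champ = sum(p / len(playoff_p) for p in playoff_p)
print(f"expected p of a randomly chosen champion: {expected_p_random_champ:.2f}")
```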