Tango on Baseball Archives

© Tangotiger


Win and Loss Advancements (November 13, 2003)

This is what my version of Win Probability Added is going to be called. What I'm presenting is very little, and a very early draft.



I have not yet considered park or quality of opposition. Please remember this, especially if you see a lot of Colorado pitchers. For a ball in play (BIP), the following assumes that the success rate in converting it into an out is based half on the pitcher and half on the fielder.

WA refers to Win Advancement. That is, add up all those times where the team's win expectancy (WE) increased after the pitcher was involved in a play. How much was the gross increase in WE? That's Win Advancement. LA refers to Loss Advancement. Again, every time the team's chances of winning decreased, how much was the total impact of that, that we can attribute to the pitcher? The difference between Win Advancement and Loss Advancement is Wins Above Average (WAA).

On a team level (i.e., the sum of all players), the WAA is exactly equal to the team's actual
(Wins - Losses) / 2

I also present a WAA2 column that assumes that 100% of the BIP goes to the pitcher, just so that you can see what the difference is. Not much is the short answer.
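To make the bookkeeping concrete, here is a minimal sketch of the definitions above. It is not the actual allocation method behind the tables (the split rules are not spelled out here); the event list, the `tally` function, and the `bip_pitcher_share` parameter are hypothetical names used only for illustration. It shows that WA collects the credited increases in WE, LA collects the credited decreases, and WAA is the difference.

```python
# Illustrative only: made-up events, not real play-by-play data.
# Each event is (change in the pitching team's win expectancy, is_ball_in_play).
events = [
    (+0.04, False),  # strikeout
    (-0.06, True),   # single on a ball in play
    (+0.03, True),   # groundout on a ball in play
    (-0.10, False),  # home run (not a ball in play)
]

def tally(events, bip_pitcher_share=0.5):
    """Return (WA, LA, WAA) credited to the pitcher.

    Assumption: on a ball in play the pitcher is credited with
    `bip_pitcher_share` of the WE change; on everything else, all of it.
    """
    wa = la = 0.0
    for delta_we, is_bip in events:
        credit = delta_we * (bip_pitcher_share if is_bip else 1.0)
        if credit > 0:
            wa += credit      # gross increases go to Win Advancement
        else:
            la += -credit     # gross decreases go to Loss Advancement
    return wa, la, wa - la

wa, la, waa = tally(events)                            # half of each BIP to the pitcher
wa2, la2, waa2 = tally(events, bip_pitcher_share=1.0)  # all of each BIP to the pitcher
print(f"WAA  = {waa:+.3f}")
print(f"WAA2 = {waa2:+.3f}")
```

Assuming each game starts at a WE of 0.5, a team's credited changes net to +0.5 in a win and -0.5 in a loss, which is where the (Wins - Losses) / 2 on the team level comes from.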

Here's the top 20, and the bottom 10. I'm sorry, but for the moment, this is all I'm prepared to present.


pitcherid WA LA WAA WAA2
johnr005 83 60 23 24
martp001 55 35 20 21
schic002 67 54 13 14
maddg002 67 55 12 13
benia001 41 30 12 14
mussm001 66 56 11 11
browk001 50 40 10 12
nen-r001 43 34 8 5
lowed001 50 41 8 10
foulk001 36 28 8 9
smolj001 32 23 8 8
hudst001 59 52 8 10
remlm001 31 24 7 9
hofft001 38 31 7 8
rivem002 36 29 7 7
leita001 65 58 6 8
morrm001 37 31 6 7
colob001 63 57 6 9
gravd001 48 42 6 8
kochb001 42 36 6 8
...
...
blaiw001 19 23 (4) (5)
perec001 18 22 (4) (4)
navaj001 14 19 (4) (5)
suzum001 30 34 (4) (4)
meadb001 36 41 (4) (4)
joneb003 38 42 (4) (6)
erics001 37 42 (5) (5)
sancj001 24 30 (5) (6)
bohab001 35 41 (6) (6)
bellr003 28 34 (6) (6)


Updated: Nov 14, 2003


Here are the totals for the hitters. As with the pitchers, park and opposition have not yet been accounted for. (So, beware of those Coors hitters.) As well, no basestealing has been accounted for yet. Any baserunning gain/loss is given to the batter (for the moment). Fielding has also not (yet) been accounted for.

However, these charts do include: hitting with men on base, clutch performance in high-leverage situations, moving men over on outs, IBB, bunts, and everything else.


batterid WA LA WAA
bondb001 57 28 30
giamj001 61 36 25
heltt001 62 40 22
ramim002 57 36 21
gileb002 60 40 20
walkl001 47 27 20
palmr001 60 41 19
guerv001 63 44 19
sosas001 63 44 18
bagwj001 62 44 18
delgc001 57 40 17
thomj002 54 37 17
rodra001 58 41 17
shefg001 54 38 16
gonzl001 57 41 16
jonec004 58 42 16
willb002 55 39 16
abreb001 55 40 14
marte001 46 32 14
burke001 43 30 14
...
...
barrm003 26 32 (6)
flahj001 22 28 (6)
youne001 37 43 (6)
easld001 31 38 (6)
gonza002 25 32 (7)
reesp001 32 39 (7)
castv001 35 42 (8)
gland001 36 45 (9)
peren001 37 46 (9)
ordor001 23 33 (10)


And here's the list for basestealing. This includes any runner movement while the batter is still at the plate, including SB, CS, PO, WP, PB, BK, etc, etc, etc.

If there are 2 runners on base, both runners share the WA and LA, regardless of who is the lead runner. This will have little impact overall. I'll only present the top 10, and I'll include the decimal place. As you can see, basestealing does not have much impact over 4 years.


runnerid WA LA WAA
damoj001 4.4 1.6 2.7
gland001 3.1 0.7 2.4
castl001 5.2 3.0 2.2
womat001 4.1 2.1 2.0
biggc001 2.4 0.5 2.0
jeted001 2.6 0.7 1.9
alomr001 2.8 1.0 1.8
goodt001 3.1 1.4 1.7
reesp001 2.7 1.0 1.7
ceder001 4.4 2.7 1.6

Here's one place where Jeter really shines.
--posted by TangoTiger at 10:22 AM EDT


Posted 10:24 a.m., November 13, 2003 (#1) - tangotiger
  Should have noted, but all the above numbers are for 1999-2002.

Posted 10:52 a.m., November 13, 2003 (#2) - Sylvain(e-mail)
  Tango: Great Great stuff, as usual.

As for the BIP, how did you do?
- Situation 1, WE = x
- BIP by the batter
- Situation 2, WE = y

and WA/LA (assigned to the pitcher) = (x-y)/2, this for every kind of BIP, or do you have some "classification" (infield fly goes 90% to the pitcher as in DRA, other)?

Interesting point and questions, if you have the time: the best starters lead the way, then come the best relievers with the very good starters... How do the populations "Starters" and "Relievers" compare? One could expect over time to have the same average WAA = 0, but the deviation?

I was also surprised by the gap between Pedro, RJ, and the rest of the pack. It seems a very good starter/exceptional reliever season is worth about (10 to 13)/4 = 2.5-3 WAA, or US$5-6 million, and an exceptional one about 5 WAA, or US$10 million, right? You must have said it before, but the gap is huge.

And finally, if this can be used for forecasting, what is the year to year correlation?

Sylvain

Posted 11:10 a.m., November 13, 2003 (#3) - tangotiger
  For the moment, I have not made a distinction between the types of BIP, nor of using the quality of the fielder. Even without doing so, the spread was so tiny that the extra work to do this will set me back several weeks. Since I'm not writing a book about this (yet), I'm going to let it go for a while. I'd much rather account for the park before I do anything else. That'll have, by far, the largest impact.

As for how to do the split, it's a secret for now, but if you followed the "Anatomy of a Collapse", you might be able to figure it out by a couple of examples in there. It's all pretty much common sense, after I would explain it, and it follows pretty much what a fan watching the game would think.

I have not compared the difference between Starters/Relievers, though I will at some point.

The only provision for the reader is that he must accept the concept of win expectancy. If you can't accept it, Win Advancement is not for you.

Posted 11:36 a.m., November 13, 2003 (#4) - Sylvain(e-mail) (homepage)
  Thanks Tango. Your calculations must have taken you a lot of time. Thanks for sharing them.

The homepage links to the "Anatomy of a collapse" thread, for those who like me might want to look back at it.

Sylvain

Posted 12:36 p.m., November 13, 2003 (#5) - ColinM
  This is going to be great stuff. It will be interesting to see if the best relievers are closer to the best starters in value when looking at individual seasons rather than an aggregate. It seems to me that you would generally find more variance in an ace reliever's performance from season to season than in a starter's, and it might be that the best relief seasons do compare well with the best starter seasons. Make that the best non-Pedro and non-RJ starter seasons.

IOW, you might find that while an ace reliever is unlikely to be as valuable as an ace starter over a number of years, it is not so uncommon for a reliever to be the top pitcher in a league for a single season (like Gagne in 2003). I'm just guessing though.

Posted 1:42 p.m., November 13, 2003 (#6) - tangotiger
  In terms of performance (value), I wouldn't be surprised if that's the case, simply because the smaller sample size and the higher leverage will conspire to give you that.

Posted 1:45 p.m., November 14, 2003 (#7) - tangotiger
  I have also added the leading hitters (go to top of page).

I'm having a tough time trying to figure out what to do with basestealing. For example, you have runners on 1b and 2b. I'm inclined to give the double-steal value completely to the runner on 2b. With runners on 1b and 3b, again, some of that value of the steal of 2b should go to the runner at 3b, since he's keeping the pitcher/catcher honest a little. I might just decide to split the difference, and just give everyone an equal share. Any thoughts on the matter?

Posted 2:33 p.m., November 14, 2003 (#8) - tangotiger (homepage)
  Anyone else surprised by how little Bonds stands head and shoulders above the rest?

For example, I'm saying here that Bonds is +7.5 wins above the average hitter, Giambi is +6 wins, and you have a group at +5 wins.

If we look at MGL's superLWTS from 2000-2002 (See homepage link), Bonds is +271 runs over those 3 years, or +9 wins per year. Giambi is next at +202, or +7 wins, and then a group at the 150-160 range, or +5 wins.

So, while Bonds' counting stats are impressive, his performance is not at all randomly distributed to the extent that others' performances would be. What happens here is that, just as Hoffman et al can have higher leverage PAs, then Bonds's PA can be lower-leveraged to the point where he can be contained (relatively speaking). Bonds himself is still a monster, but a contained monster.
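The per-year figures in this comparison can be checked with a few lines of arithmetic. This is only a sketch: the 10 runs per win conversion is a common rule of thumb that is assumed here (it is not stated in the post), and the function name is made up for the example.

```python
# Assumption: roughly 10 runs per win (rule of thumb, not stated in the post).
RUNS_PER_WIN = 10.0

def per_year(total, years, units_per_win=1.0):
    """Convert a multi-year total into wins per year."""
    return total / units_per_win / years

# superLWTS runs above average over 2000-2002 (3 seasons), as quoted above
bonds_lwts = per_year(271, 3, RUNS_PER_WIN)
giambi_lwts = per_year(202, 3, RUNS_PER_WIN)

# WAA over 1999-2002 (4 seasons), from the hitters table
bonds_waa = per_year(30, 4)
giambi_waa = per_year(25, 4)

print(f"Bonds:  LWTS {bonds_lwts:.1f} wins/yr vs WAA {bonds_waa:.1f} wins/yr")
print(f"Giambi: LWTS {giambi_lwts:.1f} wins/yr vs WAA {giambi_waa:.1f} wins/yr")
```

Note that the two measures cover different spans (3 years versus 4), so the comparison is only rough.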

Posted 3:16 p.m., November 14, 2003 (#9) - Michael Humphreys
  Tango,

Your model is a significant advance over counting stat models, particularly for outliers such as Bonds. It also directly addresses the most difficult issue of pitcher evaluation--the real impact of relief pitchers. I also like the name "Advancements", which concisely captures the temporal feature of the model.

The value for Bonds looks good. Not only do all those IBBs contain his impact, but all of the quasi-IBBs have a similar effect. As Yogi Berra might say, "Barry's not so valuable anymore--he's too good."

One counting stat of yours that I like very much, Runs Produced (R + RBI - HR - (AB/10)), indirectly gets to the same result as well. I think Barry had 117 Runs Produced last year--obviously outstanding, particularly as it was accomplished in only about 130 or so games, but I'm pretty sure that Pujols was more valuable.

Posted 3:20 p.m., November 14, 2003 (#10) - tangotiger
  I've included basestealing. Go to top of page.

Posted 3:33 p.m., November 14, 2003 (#11) - tangotiger
  Thanks Michael. I'm kinda enjoying this.

***

Just to show the non-effect of how I'm handling the basestealing, let's consider Johnny Damon. Over 1999-2002, 192 times something happened with Damon on 1B (SB, CS, PO, WP, etc, etc). 166 of those times (86%), he was the only runner on base. Of those 26 times where there were other runners on base, the total WAA was 0.73 wins. In my system, I gave Damon 0.32 wins. So, he had another 0.41 wins that you may think he would have deserved.

When he was on 2B, he was alone 11 times, and lead runner another 27 times, (and 3 times he had a runner at 3b). Again, in my accounting system, I give Damon 0.46 wins, but if we give Damon a win for all the times he was the lead runner, that would be 0.67 wins. So, that's another .21 wins that you may think he deserves.

Finally, with him on 3B and a runner on 1b and no one on 2b, I gave Damon .20 wins, while you can argue that all .40 wins should go to the runner on 1B.

Net/net: you can argue about adding .42 wins (over 4 years) to Damon's total. I'm not too interested, right now, to worry about the .1 win per year (1 run per year) impact in trying to improve this model.
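For anyone who wants to follow the Damon arithmetic, here is a small sketch that adds up the three cases. The numbers are the ones quoted above; the dictionary layout and variable names are just for illustration.

```python
YEARS = 4

# For each situation: (wins Damon could arguably be given, wins actually credited)
cases = {
    "on 1B with other runners on": (0.73, 0.32),
    "on 2B, counting all lead-runner plays": (0.67, 0.46),
    "on 3B with a runner on 1B": (0.00, 0.20),  # arguably all .40 goes to the other runner
}

# Positive means Damon was arguably short-changed, negative means over-credited.
adjustments = {name: arguable - credited for name, (arguable, credited) in cases.items()}
net = sum(adjustments.values())
per_year = net / YEARS

for name, adj in adjustments.items():
    print(f"{name}: {adj:+.2f} wins")
print(f"net over {YEARS} years: {net:+.2f} wins")
```

That works out to roughly a tenth of a win per year, which is the size of the effect being set aside.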

Posted 3:35 p.m., November 14, 2003 (#12) - ColinM
  I do find the Bonds result surprising. I wonder though, how much of that difference from LW is because you are including his sub-par 1999 season? If his yearly totals are something like 4-9-9-8, then his 2000-2002 WAA numbers are actually very close to the LW estimate.

Posted 3:36 p.m., November 14, 2003 (#13) - ColinM
  Actually, I guess 4-7-10-9 might be a better guess but you get the point.

Posted 6:44 p.m., November 14, 2003 (#14) - Michael Humphreys
  Tango,

I've been tinkering with Runs Produced by Bonds over the course of his career, and it really helps discount the last three or four years. From 1990-2003, Bonds has been consistently in the 130-150 range, except for injury or strike-shortened seasons. 2001-2002 were *not* his best years on a "gross value" basis (by this measure), though on a Runs Produced/27 outs basis, 2002 was clearly his best year--though not freakishly out of line.

Have you done a BaseRuns analysis for Bonds? I tried going to your articles, but I couldn't be sure I'd be doing it right. As we've discussed, I believe that BaseRuns would do a better job with an outlier like Bonds. If Win Advancements shows that BaseRuns is better than Linear Weights (or the new, "linear" Runs Created), you'd have more evidence of the value of the BaseRuns approach for years we don't have PBP data.

If you have the latest version of the BaseRuns formula, it would be interesting to check Barry's 1999-2002 numbers. Also, if I understand it correctly, BaseRuns provides a "gross" runs number. Can the formula be adjusted to yield runs above average?

Posted 7:02 p.m., November 14, 2003 (#15) - Tangotiger
  BaseRuns would only work if I take one-9th Bonds, and 8-9ths an average player. It would be easy enough to come up with the average runs scored.

I doubt that BaseRuns (or its offshoot: custom LWTS) will give us a reliable enough number for Bonds, simply because of the non-randomness of his numbers. Though, I can't say for sure how far off it might be.

Also, let's remember that I have not park-accounted yet. That might actually be enough to make up the difference that we see here.

I already have a pretty good idea how I would generate Win and Loss Advancement for non-PBP years, so that I can generate this for the whole history of baseball. I would guess that I would ask you at some point to supply your fielding numbers to the WA system, so that we'd have a complete set of numbers. But, we'll talk about that in a few months.

Posted 8:08 a.m., November 15, 2003 (#16) - studes (homepage)
  Tango, I just want to echo everyone else. This is tremendous.

It's a little bizarre (at least, to me) that you're doing this at the same time I'm playing with Win Shares. I'm posting WSAA (Win Shares Above Average) for all 2003 batters over the weekend. Once you've done 2003, it will be interesting to see what the differences between the systems are at that point.

I'm also fascinated that you have figured out a way to apply this approach to non-PBP data. That's super. Can't wait.

Posted 1:13 p.m., November 15, 2003 (#17) - jto
  I think I'm confused. Isn't PBP data being used if you need to know the base/out/inning situation of every occurrence? That's hardly easily available data.

Posted 1:30 p.m., November 15, 2003 (#18) - Tangotiger
  jto: I don't understand the question. I use my system where I have PBP data. And I'll have another system that'll try to match, as best as possible, to the PBP system for those years with no PBP data.

Posted 9:55 a.m., November 25, 2003 (#19) - tangotiger
  Not sure if you noticed Foulke in the list. If you remember an article from last year, Foulke's LI was around 1.3 for the 1999-2002 time period (essentially, he was used, overall, similar in impact to a setup guy). So, his incredibly strong showing is even more impressive here.

If he was used "properly", he would have shot up as the #1 reliever in impact, along with Benitez. Benitez is another interesting pitcher, and the above suggests that he was quite effective "in the clutch", which is very contrary to the perception and results of 2003. (Numbers not park adjusted.)

Keith Foulke not only has the context-neutral numbers of potentially the best reliever in the game, but his performance when it counted was even more impressive.

Foulke will go to a team that can appreciate him. Redsox? BlueJays? Mets?

Posted 3:39 p.m., December 1, 2003 (#20) - tangotiger
  I wrote this at Fanhome, and will repeat it here:

====================
As for the various win impact methods, you have to decide what perspective you like the most, and use that method:

Perspective 1
I look at things in real-time, like a manager or fan does, and want to attribute the win impact as it happens. This means that I must assume a random distribution of events (centered around expectations of player matchups and managerial tendencies) for all future events.

Perspective 2
I look at things after the game is over, and try to attribute value to the various performances, given that I know all future outcomes. I assume that value was only created by players on the winning team (meaning performances that led directly to runs scoring or not).

Going 4-4 in innings where your team scored no runs, yet won the game had, essentially, very little value. (You can argue it had some because it let your team send an extra batter and prolonged the inning, but now you get into the what-if / probability scenario, and this perspective does NOT like this.)

Perspective 3
I think that all performances are statistical random variations centered around the players/park matchups, and therefore, I don't care whether my team won or lost. I just want to know what would have happened, on average, to a team, if I were able to insert this player into it.

This is the seasonal perspective, where you look at a player's line, and simply use a simple runs-to-win converter to figure out his theoretical win impact on a theoretical team.

I think that about covers it.

I don't really see anyone as being right or wrong here, since this is a question of perspective or opinion.

Posted 7:59 p.m., December 1, 2003 (#21) - studes (homepage)
  One of the aspects of these approaches that intrigues me is the adjustments that are used. I tend to think that there are two kinds of adjustments:

- Those that "normalize" the data, so that players can be evaluated over environments. This is hugely important for ability stats, like slwts, but I'm unsure what role that should play in value stats.

- Those that refine the win probability appropriately, such as starting out in a bigger probability "hole" against Pedro.

Seems that park factor differs in its role between the two approaches. To normalize data, you must adjust for the park/environment so that Todd Helton can be directly compared to Shawn Green, for instance.

To refine the win probability, you use the park factor to account for the fact that a home run in Coors does not increase your probability of a win as much as a home run in Dodger Stadium.

Two approaches, two goals. Same "factor."

As you've said, Tango, Win Shares is sort of a hybrid of the two. Win Shares includes adjustments for park, leveraged innings, pitcher handedness, clutch hitting and opportunities (such as double play opportunities). It also includes an adjustment for the difference between actual wins and projected wins.

I scratch my head on a couple of these. Pitcher handedness normalizes the fielding data; clutch hitting refines the win probability. I think that Win Shares strives to be a "normalized value stat." Not the cleanest approach perhaps, but useful anyway.

Posted 11:00 a.m., December 3, 2003 (#22) - tangotiger
 
pitcherid WA LA WAA WAA2
johnr005 83 60 23 24
martp001 55 35 20 21
schic002 67 54 13 14

Only Schilling could go from the 2nd best pitcher on his team to ... 2nd best pitcher on his new team. If Foulke ends up with the Sox, the Redsox would have 4 of the 10 best pitchers from 1999-2002. I'm sure they could get Benitez cheap, to make it 5 out of 10. I smell Voros.

If I separate the top 330 pitchers from the rest in terms of game impact (WA+LA), here's what the totals are:

- the top 330 pitchers have 80% of the game impact (essentially the effective IP).
- the bottom pitchers have .480 WA and .520 LA for every PA

I think this is a fair line to draw between regulars and replacement level. 80% to regulars and 20% to the backups/replacements. For Randy Johnson, the replacement level comes out to 1.4 wins below average. For the rest of the top starters, it's around 1.2 to 1.3 wins below average. THAT, I think, is the best measure of replacement level.

Anyway, in terms of wins above replacement, Pedro is +23 and Schilling is +17, over the 4 year period. At 1.85 million$ / win, Pedro "earned" 11.1 million$ and Schilling earned 8.3 million$.

Pitchers are incredibly overpaid.
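The replacement-level arithmetic can be roughly reproduced from the rounded table values. This is a sketch under assumptions: it takes game impact as WA+LA, applies the .480/.520 replacement rates to each pitcher's own total, and uses made-up function names. It is not necessarily how the original numbers were produced.

```python
YEARS = 4
DOLLARS_PER_WIN = 1.85  # million$ per win, from the post

# (WA, LA) for 1999-2002, rounded values from the pitchers table
table = {
    "johnr005": (83, 60),
    "martp001": (55, 35),
    "schic002": (67, 54),
}

def replacement_gap(wa, la, repl_wa_rate=0.48):
    """Wins by which a replacement pitcher trails an average one over the same WA+LA."""
    ga = wa + la
    repl_la_rate = 1.0 - repl_wa_rate
    return (repl_la_rate - repl_wa_rate) * ga

def wins_above_replacement(wa, la, repl_wa_rate=0.48):
    """WAA plus the average-to-replacement gap."""
    return (wa - la) + replacement_gap(wa, la, repl_wa_rate)

for pid, (wa, la) in table.items():
    gap_per_year = replacement_gap(wa, la) / YEARS
    war = wins_above_replacement(wa, la)
    earned = war / YEARS * DOLLARS_PER_WIN
    print(f"{pid}: replacement {gap_per_year:.2f} wins/yr below average, "
          f"{war:.1f} wins above replacement, {earned:.1f} million$/yr")
```

With whole-number inputs the results land near the quoted figures (about 1.4 wins per year for Randy Johnson, and totals close to the +23 and +17 above) but will not match them exactly.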

Posted 11:15 a.m., December 3, 2003 (#23) - tangotiger
  (That was per year for the salaries).

I decided to separate based on 90% as regulars, and 10% as the backups/replacement. At that level, the WA/LA is .473/.527. The replacement level is now around 1.7 wins below average. In order to maintain the whole overall salaries paid out to pitchers at the same rate, I had to reduce the marginal $/ win to 1.4 million. Pedro comes in at 9 million$ and Schilling at 7 million$.

Kind of strange that I make the replacement level more favorable to the top pitchers, but they get less money. This is because I put in a fixed 800 million$ allocated to pitchers.
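A short sketch of why that happens, using Pedro's rounded table line (WA 55, LA 35) as the example. The pool and the two $/win rates are from the posts above; the variable names and the way the replacement gap is applied are assumptions for illustration.

```python
POOL = 800.0                      # million$ allocated to pitchers (fixed)
RATE_80, RATE_90 = 1.85, 1.40     # million$ per win at the 80% and 90% cutoffs

# With a fixed pool, the $/win rate implies the total wins above replacement being paid for.
total_wins_80 = POOL / RATE_80
total_wins_90 = POOL / RATE_90
growth_total = total_wins_90 / total_wins_80

# Pedro over 4 years: WAA = 55 - 35 = 20, game impact = 55 + 35 = 90
ga, waa, years = 90, 20, 4
pedro_80 = (waa + (0.520 - 0.480) * ga) / years   # wins above replacement per year
pedro_90 = (waa + (0.527 - 0.473) * ga) / years
growth_pedro = pedro_90 / pedro_80

salary_80 = pedro_80 * RATE_80
salary_90 = pedro_90 * RATE_90

print(f"league-wide wins above replacement grow by {growth_total - 1:.0%}")
print(f"Pedro's wins above replacement grow by {growth_pedro - 1:.0%}")
print(f"Pedro: {salary_80:.1f} -> {salary_90:.1f} million$/yr")
```

Lowering the replacement level adds wins for everyone, but it adds proportionally far more for the rest of the pool than for a pitcher who was already well above average, so his share of the fixed total shrinks.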

Anyone have any numbers as to how much pitchers were paid in 2003?

Posted 11:18 a.m., December 3, 2003 (#24) - David Smyth
  Tango, could you explain that a bit more. I don't understand what you are doing. The top 330 pitchers? Aren't there about 330 pitchers in the league at any given time?

Posted 11:32 a.m., December 3, 2003 (#25) - tangotiger
  Sure thing.

From 1999-2002, there were 1006 players who pitched. If I take the 450 pitchers who pitched the most (15 per team), that gives me 90% of the IP. Therefore, the other 556 pitchers can be considered the "replacements". They accounted for 10% of the IP, or essentially 144 IP per team per year.

These are the 10 pitchers in the replacement group that pitched the most:
coloj001
vizcl001
holmd001
bradc001
sands002
dipoj001
orosj001
tuckt001
georc002
kohlr001

I don't recognize any name except Jesse Orosco, and that, I think, we can say is the definition of replacement level from 1999-2002.
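The split itself is easy to sketch. The version below goes the other way round from the description above: it picks a target share of IP and finds how many pitchers are needed, rather than fixing 450 pitchers and observing the share. The innings totals are made up and the function name is hypothetical.

```python
def split_by_ip_share(ip_by_pitcher, regular_share=0.90):
    """Smallest group of top-IP pitchers covering at least `regular_share` of all IP.

    Returns (regulars, replacements) as lists of pitcher ids, most IP first.
    """
    ranked = sorted(ip_by_pitcher.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(ip_by_pitcher.values())
    regulars, running = [], 0.0
    for pid, ip in ranked:
        if running >= regular_share * total:
            break
        regulars.append(pid)
        running += ip
    replacements = [pid for pid, _ in ranked[len(regulars):]]
    return regulars, replacements

# Made-up innings totals, purely for illustration
sample = {"a": 800, "b": 600, "c": 420, "d": 90, "e": 50, "f": 40}
regulars, replacements = split_by_ip_share(sample)
print("regulars:", regulars)
print("replacements:", replacements)
```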

Posted 11:50 a.m., December 3, 2003 (#26) - ColinM
  I don't know, .473 looks awfully high as a replacement level. Isn't ~.400 a more commonly used level? For example, I quickly looked at all pitchers with < 25 innings in 2002. They had a collective ERA of 6.05 compared to the MLB average of 4.27. So you would expect around a .340 W% from this group. Now there are going to be some selective sampling issues at play here, but I would think this shouldn't bring the expected W% much past .400 if you wanted to use this method to set the replacement level. What am I missing with your method?

Posted 11:59 a.m., December 3, 2003 (#27) - tangotiger
  Careful!!

We've got a new scale here. (You will note that Pedro's WA/LA is 55/35, or 61% win advancements, and we know that Pedro is not a .610 pitcher.) 0.47 win advancements to .53 loss advancements is NOT a .470 pitcher. Apologies for not making this clear.

For example, a heavily used starting pitcher from 1999-2002 had 120 "game advancements" (WA+LA), or 30 GA per year.

An average pitcher would have gone 15/15 in WA/LA. The replacement level pitcher, using .47/.53, would come in at 14/16. That is, 14 win advancements and 16 loss advancements, or -2 wins.
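The same example as a couple of lines of arithmetic, in case the scale is still confusing. The helper name is made up; the inputs are the ones quoted above.

```python
def wa_la(ga, wa_rate):
    """Split a number of game advancements into (WA, LA, WA - LA) at a given WA rate."""
    wa = ga * wa_rate
    la = ga - wa
    return wa, la, wa - la

average = wa_la(30, 0.50)      # heavily used starter, 30 GA per year
replacement = wa_la(30, 0.47)

print("average:     WA %.1f, LA %.1f, diff %+.1f" % average)
print("replacement: WA %.1f, LA %.1f, diff %+.1f" % replacement)

# Pedro's share of advancements that were win advancements
pedro_rate = 55 / (55 + 35)
print(f"Pedro WA share: {pedro_rate:.3f}")
```

The unrounded replacement line is 14.1/15.9, or -1.8 wins, which rounds to the 14/16 and -2 wins given above.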

Posted 12:01 p.m., December 3, 2003 (#28) - ColinM
  Some of the difference is I would only be looking at about 5% of IP. If I increase it to all pitchers with < 37 IP that comes to about 10% with an ERA of 5.54 or about a .375 W%.
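One way to get from those ERAs to a W% is the Pythagorean formula with an exponent of 2. Whether that is the conversion used for the figures above is an assumption (the method is not stated), and using ERA in place of runs allowed is a simplification.

```python
def pythag_wpct(staff_rate, league_rate, exponent=2.0):
    """Expected W% for a staff allowing `staff_rate` runs with league-average offense.

    Assumption: Pythagorean expectation, with ERA standing in for runs per game.
    """
    scored = league_rate ** exponent
    allowed = staff_rate ** exponent
    return scored / (scored + allowed)

LEAGUE_ERA = 4.27
w_under_25_ip = pythag_wpct(6.05, LEAGUE_ERA)   # pitchers with < 25 IP in 2002
w_under_37_ip = pythag_wpct(5.54, LEAGUE_ERA)   # pitchers with < 37 IP in 2002

print(f"< 25 IP group: {w_under_25_ip:.3f}")
print(f"< 37 IP group: {w_under_37_ip:.3f}")
```

This gives values in the same neighborhood as the .340 and .375 quoted, though not identical.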

Posted 12:17 p.m., December 3, 2003 (#29) - ColinM
  Thanks Tango, that clears it up. I should have seen that after all the posting I did on other threads about win and loss shares.

How many GA would an entire staff have? I ask because I'm interested in seeing where you have the replacement level set in terms of W%. How many wins would a staff of replacement pitchers have in your system?

Posted 12:30 p.m., December 3, 2003 (#30) - tangotiger
  That's a good question. The average game had 1.4 game advancements.

Setting the .473/.527 level to that, and our pitchers come in at 0.66 WA and 0.74 LA, or -.08 wins (i.e., the team would win 42% of the time, if given average hitting and fielding).

Posted 12:39 p.m., December 3, 2003 (#31) - tangotiger
  What if I take the top pitchers in playing time, so that they account for 95% of all IP?

In this case, the WA/LA would be .46/.54, which works out to -.11 wins, or a team of replacement pitchers would win 39% of the time with average fielding and hitting.

And if I take 99% of all IP? That would be .43/.57, or -.20 wins, or a 30% record.
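The three cutoffs can be reproduced with the same small calculation. The function name is made up; the 1.4 GA per game and the WA rates are the figures given above, and the win rate assumes the team is otherwise exactly average.

```python
GA_PER_GAME = 1.4  # average game advancements per game, from the post

def replacement_staff(wa_rate, ga_per_game=GA_PER_GAME):
    """Return (WA, LA, wins vs average, implied win rate) per game."""
    wa = ga_per_game * wa_rate
    la = ga_per_game * (1.0 - wa_rate)
    diff = wa - la
    # With average hitting and fielding, the team starts from a 0.5 win rate.
    return wa, la, diff, 0.5 + diff

results = {}
for regular_share, wa_rate in [(0.90, 0.473), (0.95, 0.46), (0.99, 0.43)]:
    wa, la, diff, win_rate = replacement_staff(wa_rate)
    results[regular_share] = win_rate
    print(f"{regular_share:.0%} regulars: WA {wa:.2f}, LA {la:.2f}, "
          f"{diff:+.2f} wins/game, win rate {win_rate:.0%}")
```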

I don't know about anyone else, but I think 90% is probably the right level, and maybe 95%.

Therefore, I would probably say that a team of replacement level pitchers and a team of replacement level non-pitchers would win about 30% of the time (calculations not shown, but just an educated guess). More accurately, they'd have the true talent to win 30% of the time. Over 162 games, they can of course win a lot less (or a lot more) by random statistical variation.

Posted 1:40 p.m., December 3, 2003 (#32) - ColinM
  I might lean towards 95%, just because it seems to line up better with some existing replacement levels.

However, there still seems to be some (smaller) discrepancy between the replacements' W% using WA/LA and the expected W% using ERA. Would I be correct in assuming that these replacement pitchers would tend to have a low GA/IP ratio because they tend to pitch in low leverage situations? Is it possible that as the GA/IP ratio increases that the WA/LA ratio may change? Or should it remain static?

Also, if a team were to pitch all replacement pitchers, I would think it's possible that the total amount of GAs might increase. If a game with average pitching had 1.4 GA, the replacements might have more of an overall effect on the game (negatively of course) than an average pitcher. So the total impact might be greater than the -.11 wins you came up with keeping the GA the same.

Posted 1:50 p.m., December 3, 2003 (#33) - ColinM
  I guess what I'm getting at in the last part of my previous post is that using "game advancements" to figure replacement value might not be a good idea. If everything a pitcher did was very close to the expected value and he pitched 270 innings he would be a very valuable pitcher but might only be 5-5 in GA. This would not be any worse than a pitcher who is 20-20, but when comparing to replacement level, there would be a big difference.

Posted 2:10 p.m., December 3, 2003 (#34) - tangotiger
  I might lean towards 95%, just because it seems to line up better with some existing replacement levels.

I really wouldn't pay attention to "existing replacement levels".

Would I be correct in assuming that these replacement pitchers would tend to have a low GA/IP ratio because they tend to pitch in low leverage situations?

You are correct.

Is it possible that as the GA/IP ratio increases that the WA/LA ratio may change? Or should it remain static?

The WA/LA ratio remains static, though the DIFFERENTIAL would increase. And, it's the differential that we care about.

Also, if a team were to pitch all replacement pitchers, I would think it's possible that the total amount of GAs might increase.

That is a great question, one that is on my to-do list in terms of creating historical WA/LA. My guess is that extreme pitchers (good and bad) have a below average GA/BFP ratio. That is, pitchers who pitch in non-close games have a lower Leverage Index. However, at the level that exists in MLB, the LI of Bert Blyleven and Bob Knepper were both around 1.0. I suspect the GA/BFP ratio (which is similar to LI) to be pretty constant for all types of starters.

If a game with average pitching had 1.4 GA, the replacements might have more of an overall effect on the game (negatively of course) than an average pitcher. So the total impact might be greater than the -.11 wins you came up with keeping the GA the same.

This is definitely on my to do list. I'm going to look at the GA for pitchers when they win and lose, and for different types of pitchers. With RJ and Pedro, the game score won't fluctuate as much as an average pitcher, and so, I'd guess that the GA is smaller for them. Same for a bad pitcher. Think of it this way: remember that 1993 playoff game between Phi and Tor, where it was 15-13 or something? There were so many wild GA in that game that I suspect that it would be off the charts. For a one-hitter, the GA would be very very small.

I guess what I'm getting at in the last part of my previous post is that using "game advancements" to figure replacement value might not be a good idea.

You are probably right in the technical sense, though I'm not sure if there's any practical difference. I should have tracked # of PAs, but neglected to. And, re-running the report is a bit of a pain. I'd rather go on to something else, and revisit later.

Posted 2:18 p.m., December 3, 2003 (#35) - tangotiger
  I might lean towards 95%, just because it seems to line up better with some existing replacement levels.

I really wouldn't pay attention to "existing replacement levels".

To continue, I don't think you should come in with a mindset that "hmmm.... 40% seems right... what does that imply", and get a result of "o i c... that means that 5% of PAs are considered replacement level".

I would suspect that very very few people would consider 5% replacement level. Remember, 10% of IP being replacement level means that you have 450 pitchers over 4 years as being regulars, or 15 per team.

If you ask yourself "what is replacement", you'll probably answer "if I lose a pitcher, who can I get to replace him with", and you'll probably look outside the 300 or 330 pitchers. Over 4 years, with turnover and such, maybe that balloons up to 400 or 450.

But at the 95% level, you've got 550 pitchers in your regular pool, or over 18 pitchers per team over 4 years. That's kind of unrealistic, in my view.

Posted 3:13 p.m., December 3, 2003 (#36) - ColinM
  Point taken about the 95%. I will say though that I'm not just pulling .400 out of the air. This is about the level I would expect given the ERA of the bottom 10% of pitchers by IP. So if you use 10% as the cutoff, I would also think that this should be consistent with the level you find.

Posted 4:36 p.m., December 3, 2003 (#37) - tangotiger
  What's interesting is that you take the bottom 10% in IP for a given year, whereas I'm taking the bottom 10% over a period over 4 years.

I suppose if you were to look at it from a 1 year time period, you should take the top 300 or 330 pitchers, and everyone else would be part of the replacement group. (Ideally, you would do this based on Opening Day pitchers + DL, and NOT after-the-fact performances.)

Posted 9:45 p.m., December 3, 2003 (#38) - studes (homepage)
  Tango,

According to Pete's spreadsheet, pitchers were paid $1,063,471,375 last year. That includes a full year's salary at $300K for a number of pitchers who only pitched part of the year. I'd estimate that, if you took them out, pitchers were paid about a cool billion last year.

Posted 2:03 p.m., December 9, 2003 (#39) - tangotiger
  Fixing the total at 1.063 billion$ (since with my list I won't be able to split a player's salary by MLB service time), and fixing the $/win at 1.85 million$ (based on previous research), the win advancement replacement level comes out to .469, which is pretty much what we concluded it should have been. So, what we have is a strong indication that the $/win and the replacement levels are in sync, and that, overall, pitchers are properly paid.

Anyway, here are the $ earned from 1999-2002 for the top 10 pitchers:
pitcherid salary earned (million$ per year)
johnr005 $15.1
martp001 $12.0
schic002 $9.6
maddg002 $9.4
mussm001 $8.8
benia001 $7.8
browk001 $7.6
hudst001 $7.0
leita001 $6.9
lowed001 $6.9

Very interesting that the 2 best starters are properly paid. The problem is with all those second-tier pitchers getting the money of first-tier pitchers. (And, in this light, anyone not named RJ or Pedro is second-tier.)

Furthermore, while a pitcher may have earned those salaries, they would not be expected to continue earning such salaries. Their true talent in the coming years is probably below their previous performance levels.

I'd put a cap at 9 million$/yr for a pitcher (except for the big two).

Posted 2:11 p.m., December 9, 2003 (#40) - tangotiger
  The reason that free agent pitchers are overpaid is that teams save a lot of money on the under-6-yr pitchers (because the system allows them to). So, they have a HUGE ROI on these pitchers, and instead of pocketing the money or reinvesting it into the organization, the teams overpay for the freely available talent.

Overall, they are paying what they should: 1 billion$ for pitchers. To put this into common man terms, a pair of Air Jordans costs 125$, and my Reeboks cost 50$. So, I'm prepared to pay 175$ for the 2 pairs, but I got a discount on the Reeboks, down to 25$. Most people would still only pay 125$ for the Air Jordans, but an MLB team would pay 150$ for them.

Posted 2:51 p.m., December 9, 2003 (#41) - studes (homepage)
  But Tango, why would you single out pitchers for the "boomerang" effect? Why would they benefit from it, but not everyday players? Wouldn't a GM "overpay" for marginal everyday players, because they benefitted from a Giles or Pujols pre-arbitration?

I think pitchers are paid more because of the potential high production they could bring to the mound. It's the same as the A-Rod factor. A single, good pitcher COULD have a year in which he contributes 5 or 6 wins, and the market pays for those wins at a higher rate than it pays for the first few wins.

You may have already touched on this (I haven't followed the entire thread), but my guess is that standard deviations are higher for pitchers than everyday players. It may seem counterintuitive, but that would increase their salaries.

Posted 3:14 p.m., December 9, 2003 (#42) - tangotiger
  I didn't mean to single out pitchers. This would apply to hitters as well.

And, I take it as a given that MLB, on the whole, have properly split the allocation between pitchers and non-pitchers, hence my allocation of 1.063 billion$ to pitchers. So, I'm presuming that the average pitcher and average non-pitcher are properly paid.

By the way, how much $ went to nonpitchers? Seeing that pitchers get 36% of the Win Shares, that means that they earned 1.063 billion$ for 2620 Win Shares, or 400,000$ per win share. You had the overall as 300,000$ per win share, meaning that nonpitchers must be at around 250,000$ per win share. I wouldn't be surprised if this is tied into a problem with Win Shares (i.e., not enough WS for pitchers).
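
A rough check of the per-Win-Share arithmetic above, as a sketch only. The league-wide Win Share total is not stated in the thread; here it is inferred from the 36% pitcher share, and the overall 300,000$ rate is assumed to apply to that inferred total. Under those assumptions pitchers come out near 405,000$ per Win Share and nonpitchers near 240,000$, in line with the rounded 400,000$ and 250,000$.

```python
# Figures quoted in the thread.
pitcher_salary = 1.063e9   # $ paid to pitchers
pitcher_ws = 2620          # Win Shares credited to pitchers
pitcher_share = 0.36       # pitchers' share of all Win Shares
overall_per_ws = 300_000   # studes' overall $ per Win Share

# Assumption: total Win Shares inferred from the 36% share (not stated in the thread).
total_ws = pitcher_ws / pitcher_share
nonpitcher_ws = total_ws - pitcher_ws

pitcher_per_ws = pitcher_salary / pitcher_ws

# Assumption: the overall rate applies to the inferred total, so the
# nonpitcher payroll is whatever is left after paying the pitchers.
nonpitcher_salary = overall_per_ws * total_ws - pitcher_salary
nonpitcher_per_ws = nonpitcher_salary / nonpitcher_ws

print(f"pitchers:    ${pitcher_per_ws:,.0f} per Win Share")
print(f"nonpitchers: ${nonpitcher_per_ws:,.0f} per Win Share")
```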

Posted 3:24 p.m., December 9, 2003 (#43) - studes (homepage)
  Yeah, that's the big question. Are pitchers overpaid? They are according to Win Shares. Also according to VORP. Not sure about WARP.

Total salaries last year were $2.2B. If you split runs and runs allowed 50/50, then $1.1B would go to batting and $1.1B to defending. $1B went to pitchers, which means $0.1B went to fielding. Does that seem right?
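
The 50/50 split above, written out as a quick sketch. The variable names are only illustrative, and the 50% defensive share is the post's premise, not an established figure.

```python
total_salary = 2.2e9     # total MLB salaries, from the post
pitcher_salary = 1.0e9   # rounded pitcher payroll, from the post
defense_share = 0.5      # premise: runs scored and runs allowed count equally

defense_budget = total_salary * defense_share
batting_budget = total_salary - defense_budget
# Whatever the defensive half does not spend on pitchers is left for fielding.
fielding_residual = defense_budget - pitcher_salary

print(f"batting:  ${batting_budget / 1e9:.1f}B")
print(f"pitching: ${pitcher_salary / 1e9:.1f}B")
print(f"fielding: ${fielding_residual / 1e9:.1f}B")
```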

Posted 4:43 p.m., December 9, 2003 (#44) - tangotiger
  Don't forget that 300 million of the pitcher salary is part of their minimum salary. It's not like a non-pitcher gets 2 minimum salaries (one for hitting and one for fielding).

I'll take a guess that since 700 million$ goes to pitchers' above-the-minimum salary, 300 million$ should go to fielders. However, because the distribution of the fielders is not like that of pitchers for various reasons, I'll say it's more like 100 to 200 million.

I've gone through a long process, and this is what I've come up with. Set the replacement level to .47 win advancements for pitchers and hitters (that works out to a "win %" of .420). Set the marginal $/win to 1.42 million. This will give us 860 million$ for pitchers and 1340 million$ for nonpitchers.

That 1340 is split as 480 minimum salary, 120 fielding, and 740 for hitting.

The 860 is split as 300 minimum salary, and 560 for pitching.

So, using 740, 560, 120, the breakdown is 52% hitting, 39% pitching, and 9% fielding.
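
A small sketch that re-adds the provisional split above and recomputes the shares of the above-minimum pool. The dictionary names are just for illustration. The components do sum to 1340 and 860; fielding works out nearer 8.5% of the 1420 pool, which the post rounds up to 9%.

```python
# Provisional split from the post, in millions of $.
nonpitchers = {"minimum salary": 480, "fielding": 120, "hitting": 740}
pitchers = {"minimum salary": 300, "pitching": 560}

nonpitcher_total = sum(nonpitchers.values())
pitcher_total = sum(pitchers.values())

# Only the above-minimum money is split among hitting, pitching and fielding.
above_minimum = {
    "hitting": nonpitchers["hitting"],
    "pitching": pitchers["pitching"],
    "fielding": nonpitchers["fielding"],
}
pool = sum(above_minimum.values())
shares = {skill: amount / pool for skill, amount in above_minimum.items()}

print(f"nonpitchers: {nonpitcher_total}, pitchers: {pitcher_total}, pool: {pool}")
for skill, share in shares.items():
    print(f"{skill}: {share:.1%}")
```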

Posted 4:49 p.m., December 9, 2003 (#45) - studes (homepage)
  I'll have to chew on that one. Got to admit that I don't understand exactly what you mean. $300 million for minimum salaries seems like a huge number. That would imply that there were 1,000 full-year pitchers in the majors last year, at $300K per pitcher.

Posted 5:20 p.m., December 9, 2003 (#46) - tangotiger
  Hmmmm... you know, my numbers were based on 1600 nonpitchers, and 1000 pitchers (because I'm covering a 4-yr span). So, I think I have a problem. I should probably set everything up to the 4-year salaries, rather than annualized. It'll save me grief. Ignore the above for now, and I'll repost the new results tomorrow.
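
To make the mismatch concrete, here is a sketch that backs the head counts out of the minimum-salary pools quoted earlier, assuming the flat $300K minimum that studes uses. The pools imply 1000 pitchers and 1600 nonpitchers, which are the 4-year counts, so they were being set against a single year's payroll.

```python
MIN_SALARY = 300_000  # $ per player, the figure studes uses

# Minimum-salary pools from the earlier post, in $.
min_pools = {"pitchers": 300_000_000, "nonpitchers": 480_000_000}

# Head count implied by each pool at the flat minimum.
implied_players = {group: pool // MIN_SALARY for group, pool in min_pools.items()}

for group, count in implied_players.items():
    print(f"{group}: {count} players at ${MIN_SALARY:,} each")
```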

Posted 11:00 p.m., December 9, 2003 (#47) - AED
  I think it's an issue with the definition of 'replacement level' in win shares. A player who is a replacement-level hitter probably only plays in the majors if he's well above a replacement-level fielder. Likewise, a replacement-level fielder is probably a decent hitter. So a true replacement-level position player is better than someone who is both a replacement-level batter and a replacement-level fielder. Since Win Shares are calculated separately for batting and fielding using the replacement level of each, the values of position players are inflated.

Posted 11:41 p.m., December 10, 2003 (#48) - studes (homepage)
  I'll be the first to admit that I get lost on some of these more obscure replacement level issues. I've studied this thread a bit, and I still don't get win advancement percentages vs. won/loss percentages, and what that means for replacement level. Oh well.

What I have done is go back to the salary data. One of the issues with the salary data is that our/my database doesn't include salaries for a lot of players -- several hundred in fact -- most of whom had a cup of coffee, but some of whom played a significant amount. In fact, a quick calculation showed me that over 10% of innings pitched in the AL were by pitchers for whom we had no salary data. Also, I had not fully allocated the salaries of players who played for multiple teams.

So I corrected for that, as best I could. Now, at least, no player's salary is double-counted. And every player who played has at least a little bit of salary paid to them. If you want the dirty details on how I did this, let me know.

Here are the estimated salary paid results:

Total salary paid last year was $1,150 million in the NL and $930 million in the AL, for a total of $2,080,000,000. I like seeing all the zeros.

In the NL, pitchers were paid $455 million (or 40% of the total) and in the AL, pitchers were paid $362 million (39% of the total) for a total pitchers salary paid of $817 million (39%).
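
The league figures above can be re-added in a few lines. This is only a sketch using the rounded millions quoted in the post; the names are illustrative.

```python
# Estimated salary paid, in millions of $, as quoted in the post.
total = {"NL": 1150, "AL": 930}
pitchers = {"NL": 455, "AL": 362}

# Pitchers' share of payroll within each league.
league_share = {lg: pitchers[lg] / total[lg] for lg in total}
for lg, share in league_share.items():
    print(f"{lg}: {share:.1%} of salary went to pitchers")

mlb_total = sum(total.values())
mlb_pitchers = sum(pitchers.values())
mlb_share = mlb_pitchers / mlb_total
print(f"MLB: {mlb_pitchers} of {mlb_total} million ({mlb_share:.1%})")
```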

I apologize that my previous numbers were off by so much. I'm not sure where I was off, but I am pretty sure these are in the right neighborhood, though I'll keep staring at them. The percent breakouts look "right" to me (for what that's worth), in line with Win Shares and probably most total value systems.

By the way, if you divide outfield salaries by three, the second most expensive position was NL 1B ($120M), led by Vaughn, Thome, Helton and Bagwell. I will post the totals for all positions on my site in the next day or so, after I've triple-checked them all.

Posted 9:58 a.m., December 12, 2003 (#49) - studes (homepage)
  If you're interested, I've posted salary, win shares and win share value breakouts by league and position at the above link.