Individual Poster Page

See copyright notice at the bottom of this page.

SABR 201 - Custom Linear Weights (July 11, 2003)

Posted 8:22 p.m., July 12, 2003 (#1) - studes (homepage)
Thanks, Tango. What are the default values in this spreadsheet?

Sabermetric Site to Visit - Patriot (July 25, 2003)

Discussion Thread

Posted 12:55 p.m., July 28, 2003 (#5) - studes (homepage)
This will probably strike you as silly, but Patriot's essay on linear weights reminds me that, a long time ago, I worked out my own easy-to-remember relative scale for batting events. I figured it out after first reading about Paul Johnson's Extrapolated Runs. My scale is: 9 (home run), 7 (triple), 5 (double), 3 (single), 2 (walk) and 1 (stolen base). Reading Patriot's essay, I saw that the scale fits almost perfectly.

The absolute numbers make no sense, but I've often used the relative scale off the top of my head to remember how valuable different batting events are compared to each other (e.g., a home run is worth three singles).
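A minimal Python sketch of using the scale as a relative tally. The point values are the ones from the post; the function names and the stat line are made up for illustration, and only the ratios between totals mean anything.

```python
# Relative scale for batting events, as listed in the post.
# Only ratios between events are meaningful; the absolute numbers are not runs.
RELATIVE_SCALE = {"HR": 9, "3B": 7, "2B": 5, "1B": 3, "BB": 2, "SB": 1}


def relative_value(events: dict[str, int]) -> int:
    """Sum scale points over a dict of event counts (unknown events raise KeyError)."""
    return sum(RELATIVE_SCALE[event] * count for event, count in events.items())


def ratio(event_a: str, event_b: str) -> float:
    """How many of event_b a single event_a is worth on this scale."""
    return RELATIVE_SCALE[event_a] / RELATIVE_SCALE[event_b]


# Hypothetical stat line, invented purely for illustration.
line = {"1B": 100, "2B": 30, "3B": 3, "HR": 25, "BB": 60, "SB": 10}
print(relative_value(line))
print(ratio("HR", "1B"))  # a home run is worth three singles
```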

Leveraged Index (LI) - by the 24 base-out states (July 30, 2003)

Discussion Thread

Posted 6:52 a.m., August 1, 2003 (#5) - studes (homepage)
Why do you sometimes see funky progressions within out states? For instance, why is a runner on third with two outs higher leverage than a runner on third with one or zero outs?

Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 9:27 a.m., August 2, 2003 (#33) - studes (homepage)
I have to admit that I agree with Crack. I was very surprised to see that Tippett hadn't read McCracken's second article. I mean, he was making public statements about McCracken's work, and I tend to think he has a moral obligation to make sure he has seen all of McCracken's work. A simple Google search would have uncovered the Primer article.

Tippett and DIPS (August 1, 2003)

Discussion Thread

Posted 9:32 a.m., August 2, 2003 (#34) - studes (homepage)
By the way, Tango, call me slow, but your correlation table indicates that pitchers have even less control over HR% than over BABIP. In fact, the difference is probably even greater than the relative coefficients on your table suggest, because HR% is subject to fewer extraneous factors, such as defense, that drag down the BABIP correlation. Does this make sense?

DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 6:17 p.m., August 5, 2003 (#21) - studes (homepage)
Okay, I know I'm showing my ignorance again (I'm almost embarrassed to post sometimes!). But 1B and XBH rates on BIP are basically even, right? At least, the coefficients are not different enough to draw significant conclusions. So the difference between the coefficients of the 1B rate and the XBH rate must lie in those pitchers who have different rates of balls in play.

So, against those pitchers who allow more balls in play, those balls are more likely to be singles than extra base hits, right? This obviously would be a huge new insight, but isn't that essentially what the data says?
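The year-to-year correlations discussed in this thread are, presumably, ordinary Pearson coefficients between a pitcher's rate in one season and the same pitcher's rate the next. The sketch below assumes that; the rates are invented and the function is a generic implementation, not the study's code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical singles-per-BIP rates for the same five pitchers in consecutive seasons.
year1 = [0.210, 0.195, 0.225, 0.205, 0.200]
year2 = [0.205, 0.200, 0.215, 0.210, 0.198]
print(round(pearson_r(year1, year2), 3))
```

A coefficient near 1 means the rate repeats from year to year (more pitcher control); near 0 means it mostly does not.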

DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 6:39 p.m., August 5, 2003 (#22) - studes (homepage)
Nope. My bad. See what I mean? I misinterpreted the coefficients. So let me restate it: Among pitchers who allow higher rates of balls in play, the rate of singles is more predictable than any other kind of hit. Am I getting close?

Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 12:08 p.m., August 21, 2003 (#27) - studes (homepage)
I think Ed raises a good point here. "Regression to the mean" isn't exactly what's going on, unless you know the player's true mean, right? Without it, we're using the league mean to estimate the player's true mean. And that's one step removed from regression to the true mean.

The more information we have about a player (i.e., the longer he has played in the majors), the more accurately we can estimate his true mean, particularly if we adjust for age. Barry Bonds is the example I'm thinking of here: I'm guessing that his current season shows some regression to his own true, age-adjusted mean, rather than the overall league mean.
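The distinction can be sketched as a weighted average in which the prior, either the league mean or the player's own age-adjusted mean, counts as a fixed number of phantom opportunities. This is a generic shrinkage sketch, not any particular projection system; the 300 PA of ballast and the OBP figures are hypothetical.

```python
def regress_to_mean(observed_rate, opportunities, prior_mean, ballast):
    """Shrink an observed rate toward a prior mean.

    Treats the prior as `ballast` extra opportunities performed exactly at
    `prior_mean`; a larger ballast (a hypothetical tuning value) means
    heavier regression.
    """
    total = opportunities + ballast
    return (observed_rate * opportunities + prior_mean * ballast) / total


# Hypothetical: a .400 OBP over 300 PA, regressed with 300 PA of ballast.
toward_league = regress_to_mean(0.400, 300, prior_mean=0.330, ballast=300)
toward_own = regress_to_mean(0.400, 300, prior_mean=0.420, ballast=300)
print(round(toward_league, 3), round(toward_own, 3))
```

With a long track record, the player's own mean is the better prior, which is the point about Bonds above.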

I do think that UZR and DIPS are the biggest, most recent advancements in sabermetrics, and the most important underlying "technological" trend in the area is wider access to and use of play-by-play (PBP) data. Tango, I also think that your leveraged index is a great advancement.

And I want to echo dlf's point: I'd like to see more analyses done within team contexts. I think Tango and MGL intend to get there with SLWTS.

Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 12:59 p.m., August 21, 2003 (#29) - studes (homepage)
Thanks, Tango. I really should go back to school. Just because I was last there 20 years ago is no excuse!

Advances in Sabermetrics (August 18, 2003)

Discussion Thread

Posted 7:25 a.m., August 22, 2003 (#32) - studes (homepage)
Did you see that Will Carroll mentioned "regression to the mean" in his August 21 column? Here's the quote:

"Luck--or whatever you want to call it--tends to even out. Perhaps injuries have a regression to the mean, but the explanation is secondary to the result."

I'm not exactly sure what he means, but it's nice to see the intent.

Solving DIPS (August 20, 2003)

Discussion Thread

Posted 10:43 a.m., August 21, 2003 (#10) - studes (homepage)
Tango, thanks for the summary. I'm looking forward to the summarized summary, too. I definitely can't keep up with the mathematics, though this summary helped me a lot.

Non-mathematical comment: When I read these sorts of studies, I always wonder if there's a way to include ALL elements into the study. In other words, if you include batters by batter type, let's say, might you come to different conclusions? In particular, if batter were included, along with luck, park, pitching and fielding, what would happen to the relative results?

I'm sure this is nearly impossible to analyze with the data.

CF Rankings (August 22, 2003)

Discussion Thread

Posted 11:04 p.m., August 22, 2003 (#8) - studes (homepage)
Bob Mong's findings about Andruw Jones are similar to Jones's Win Shares ranking (see link). There was a recent thread about this in Clutch Hits.

CF Rankings (August 22, 2003)

Discussion Thread

Posted 11:34 a.m., August 23, 2003 (#10) - studes (homepage)
In other words, if Cameron or Andruw Jones played for the Cubs or Dodgers they would have less win shares, but presumably roughly the same ability to contribute but not the same opportunities. I think we already knew this.

Actually, that's not right. Fielding Win Shares are not based on opportunities, in general. Outfielders receive fielding win shares based on the percentage of putouts caught by the outfielders. Thus it "corrects" for number of opportunities.

As Mike Emeigh has pointed out, there are some problems with this, particularly with groundball staffs. But the relative difference in flyballs between leagues shouldn't affect the rankings.

Regarding the offensive Win Shares: you're right, but Win Shares is essentially built to do this. A player who creates 100 runs in a league that scores 5.0 runs/game will receive fewer Win Shares than one who creates 100 runs in a league that scores 4.5 runs/game. That's because the latter player's runs are more "valuable" within the league context.
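A rough way to see why the same 100 runs are worth more in a lower-scoring league: under the Pythagorean formula with exponent 2, W% = R²/(R² + RA²), the slope at R = RA = r runs per game is 1/(2r), so one extra win over a season costs about 2r extra runs. This is a simplified illustration, not the Win Shares calculation itself (which works from marginal runs and claim points).

```python
def runs_per_win(league_runs_per_game: float) -> float:
    """Marginal runs per win implied by the exponent-2 Pythagorean formula.

    At R = RA = r runs per team per game, d(W%)/dR = 1 / (2r), so an extra
    win over a season costs about 2r extra runs.
    """
    return 2.0 * league_runs_per_game


def runs_to_wins(runs: float, league_runs_per_game: float) -> float:
    """Convert a run total to wins in a given run environment."""
    return runs / runs_per_win(league_runs_per_game)


for rpg in (5.0, 4.5):
    print(rpg, round(runs_to_wins(100, rpg), 2))
```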

CF Rankings (August 22, 2003)

Discussion Thread

Posted 10:30 p.m., August 23, 2003 (#13) - studes (homepage)
One of the interesting/weird things that Win Shares does is allocate more fielding Win Shares to the outfield based on team park-adjusted DER. James' theory is that outfields should get the bulk of the credit for team DER. I really struggle with this part of Win Shares, and the A's are a great example.

The reason their outfielders rank so highly is that their defense leads the league in park-adjusted DER. Only Anaheim's outfield gets more claim points than Oakland's. But Oakland's outfield zone rating is 11th among all major league teams. So something doesn't seem right.

Oakland's pitching GB/FB ratio is 1.35, vs. a league average of 1.18. I think that their infield may deserve a bit more of the credit for team DER. But I haven't really studied this further.
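For reference, a simplified DER is the share of balls in play that the defense turns into outs. The sketch below ignores errors, sacrifices, and the park adjustment that Win Shares applies; the function name and team totals are hypothetical.

```python
def simple_der(pa, h, hr, bb, so, hbp):
    """Simplified Defensive Efficiency Record: 1 - (hits in play / balls in play).

    Ignores errors, sacrifices and park effects (a simplification).
    """
    balls_in_play = pa - bb - so - hbp - hr
    hits_in_play = h - hr
    return 1.0 - hits_in_play / balls_in_play


# Hypothetical team season totals, for illustration only.
print(round(simple_der(pa=6100, h=1350, hr=150, bb=500, so=1000, hbp=50), 3))
```

Nothing in this number says whether the outs came from infielders or outfielders, which is why a groundball staff like Oakland's raises the allocation question above.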

CF Rankings (August 22, 2003)

Discussion Thread

Posted 4:54 p.m., August 25, 2003 (#15) - studes (homepage)
Win Shares probably isn't the right metric for what you're suggesting. You're better off with something like Base Runs, or even runs created for comparisons without team or league contexts. Team and league context is a big part of what Win Shares is about.
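For context-free comparisons, both estimators can be computed straight from a batting line. The sketch uses Bill James's basic Runs Created and one commonly circulated basic version of David Smyth's Base Runs; other versions add HBP, SB and other terms and tweak the B coefficients. The function names and the batting line are hypothetical.

```python
def basic_runs_created(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def base_runs(h, bb, hr, tb, ab):
    """A simple Base Runs version: A * B / (B + C) + D."""
    a = h + bb - hr                                      # baserunners, excluding homers
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02  # advancement factor
    c = ab - h                                           # outs (approximate)
    d = hr                                               # home runs always score
    return a * b / (b + c) + d


# Hypothetical batting line, made up for illustration.
stats = dict(h=150, bb=50, hr=20, tb=250, ab=500)
print(round(basic_runs_created(stats["h"], stats["bb"], stats["tb"], stats["ab"]), 1))
print(round(base_runs(**stats), 1))
```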

Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 2:27 p.m., August 29, 2003 (#8) - studes (homepage)
Alan, this is great. But could you interpret the table a little, please? What is LF? Just a name for the function? And how do the calculations work, exactly? Is each variable a 1 or 0 and are all the factors additive? Do you only add the one situation that applies? Thanks very much.

Aaron's Baseball Blog - Andruw Jones (September 9, 2003)

Discussion Thread

Posted 2:56 p.m., September 9, 2003 (#3) - studes (homepage)
What's weird to me is that Andruw is still only 26 years old. That is wild.

Livan Hernandez and Scouting (September 10, 2003)

Discussion Thread

Posted 1:43 p.m., September 10, 2003 (#1) - studes (homepage)
A good scout or pitching coach would tell you that even if the results weren't there after the change in his pitching mechanics, that the results would come.

But isn't that questionable? Don't athletes and coaches try things all the time, until they find something that works? I agree with your sentiment, Tango, but it seems to me that current performance is best judged by the results, to the degree we can quantify them accurately.

Livan Hernandez and Scouting (September 10, 2003)

Discussion Thread

Posted 8:10 a.m., September 11, 2003 (#6) - studes (homepage)
I don't know the answer to your question either, Tango, but I think David hits the nail on the head. Talking about a change in the arm angle, without actual "proof" of improvement (representing some appropriate statistical likelihood -- and I certainly agree that the scouts' comments reduce the threshold for statistical likelihood) is what some sportswriters do all the time. (Gammons: "Scouts say Joe Shlabotnik has experienced great success in Winter Ball with a new delivery.") And then you never hear of the player again, and the writer never mentions it again.

Together, scouts' reports and players' results tell a great story. I wouldn't separate them.

DIPS bookmarks (September 13, 2003)

Discussion Thread

Posted 9:43 a.m., September 14, 2003 (#2) - studes (homepage)
DIPS is just way too complicated for me, and, as Tango has pointed out, FIP works just as well (and it's a lot easier to calculate and understand). Here's the link: http://www.geocities.com/tmasc/drspectrum.html

But I don't know anyplace that keeps DIPS and FIP up-to-date by player (I have team FIP calculations in my graphs).
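FIP as usually quoted is (13·HR + 3·BB − 2·K) / IP plus a constant that puts it on the ERA scale. The constant of 3.2 below is an assumption (it is normally set per league-season), some versions add HBP to walks, and the pitcher line is hypothetical.

```python
def fip(hr, bb, k, ip, constant=3.2):
    """Fielding Independent Pitching: (13*HR + 3*BB - 2*K) / IP + constant.

    `ip` is true innings (200 1/3 IP -> 200.333, not 200.1).
    `constant` is an assumed league-scaling value.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical season line.
print(round(fip(hr=20, bb=50, k=200, ip=200.0), 2))
```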

DIPS bookmarks (September 13, 2003)

Discussion Thread

Posted 7:55 a.m., September 15, 2003 (#7) - studes (homepage)
I think Charlie has a great idea. Just looking at BPro's lists randomly, Mike Mussina faced batters with a .273 BA, while Roy Halladay faced batters with a .266 BA. And we're not talking about big differences in BABIP for pitchers, are we?

I may be completely off, but it seems to me that batters are much more likely to face representative pitching samples than vice versa. A pitcher can start 30 games in a season but, due to scheduling quirks or just plain luck, face the same team in 4 or 5 of those starts.

Patriot: Baselines (September 17, 2003)

Discussion Thread

Posted 10:27 a.m., November 12, 2003 (#20) - studes (homepage)
Joshua, I tend to think scales like these are just like temperature scales. Is a 100-degree day twice as hot as a 50-degree day? I don't think anyone says that -- they say it's 50 degrees hotter. Why? Because zero is a rather arbitrary figure, at least if you're talking Fahrenheit. Of course, Celsius is a little less arbitrary, but it's not really that meaningful as a practical matter. And if you're talking five degrees below zero Celsius, how does that compare to five degrees above zero? Would we say "twice as hot" even if we used a Kelvin scale? I don't think so.
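The arbitrary-zero point in numbers: the "twice as hot" ratio depends entirely on where the scale puts zero, while the difference is a fixed 50 Fahrenheit degrees. A minimal sketch using the standard Fahrenheit-to-Kelvin conversion.

```python
def fahrenheit_to_kelvin(f):
    """Standard conversion from degrees Fahrenheit to kelvins."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15


hot, mild = 100.0, 50.0
print(hot / mild)  # 2.0 on the Fahrenheit scale
print(round(fahrenheit_to_kelvin(hot) / fahrenheit_to_kelvin(mild), 3))  # about 1.098 in kelvins
```

Value scales with a baseline behave the same way: "twice as valuable" changes when the baseline moves, but "so many wins above" does not.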

In my opinion, these value or ability scales are relative, not absolute, and should be discussed in that manner. I'm not a fan of multiple tiers. Way too confusing. I like Tango's concept: just present Wins and Losses and let the reader take it from there.

Sabermetrics >WIN SHARES bibliography (September 19, 2003)

Discussion Thread

Posted 6:40 p.m., September 19, 2003 (#1) - studes (homepage)
Nothing about me? Seriously, Pete and I plan to "play" with the Win Shares methodology this offseason, now that we have a full year's data. We've already talked about one small mathematical change. We'll post our findings and let people comment.

Just to drive Patriot nuts.

Sabermetrics >WIN SHARES bibliography (September 19, 2003)

Discussion Thread

Posted 10:05 p.m., September 19, 2003 (#3) - studes (homepage)
Whoa! I missed it altogether, nestled next to your spreadsheet. Sorry.

Sabermetrics >WIN SHARES bibliography (September 19, 2003)

Discussion Thread

Posted 4:10 p.m., September 20, 2003 (#7) - studes(e-mail) (homepage)
Charlie, is that something you'd be willing to share? Let me know over e-mail, if you'd like.

Aging patterns (September 23, 2003)

Discussion Thread

Posted 6:16 p.m., September 23, 2003 (#2) - studes (homepage)
Thanks, Tango. Has anyone ever done a similar study for pitchers? Seems to me that pitchers age differently than hitters, but there's also probably a lot more data noise. Are you aware of any similar pitcher studies?

Aging patterns (September 23, 2003)

Discussion Thread

Posted 7:24 p.m., September 23, 2003 (#3) - studes (homepage)
BTW, after staring at this chart a bit, what does this mean about players being able to learn to take a walk? Doesn't Beane propose, in Moneyball, that players can't be taught plate discipline? And doesn't this seem to run counter to that argument?

Aging patterns (September 23, 2003)

Discussion Thread

Posted 7:32 p.m., September 23, 2003 (#4) - studes (homepage)
Okay, last post for now. Why the heck does the $H go down from year one??? That floors me. Is it due to speed? I could understand sample bias issues at the age extremes, but that is just weird.

Other observation: the walk rate and hit rate trend lines intersect between ages 27 and 28, right around the $LW peak. Don't know if that means anything, but it is interesting.

Aging patterns (September 23, 2003)

Discussion Thread

Posted 12:10 a.m., September 24, 2003 (#6) - studes (homepage)
Ah. Not T-ball. I'll try and remember that.

Aging patterns (September 23, 2003)

Discussion Thread

Posted 12:38 a.m., September 24, 2003 (#9) - studes (homepage)
It's an interesting discussion from a GM's point of view. How much you decide to invest in a young developing player depends on your perception of how well he can learn certain skills. I'm guessing development people make this sort of calculation every day: ability to learn vs. potential if learning occurs.

Skills can be learned, but it's very hard to change attitudes and nearly impossible to change personalities. Maybe a lot of baseball people believe plate discipline is an attitude, not a skill.

Sorry about the ramble. I'm still not sure what to make of the BABIP decline. Is that warped by lack of regression to the mean?

Aging patterns (September 23, 2003)

Discussion Thread

Posted 11:13 a.m., September 24, 2003 (#13) - studes (homepage)
Hey, I referred to speed in #4! I guess great minds think alike...

Nah. I'm just greedy for some credit. :)

Aging patterns (September 23, 2003)

Discussion Thread

Posted 2:08 p.m., September 24, 2003 (#15) - studes (homepage)
NP, David. I really don't mind at all. Just kidding.

2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 7:30 a.m., October 2, 2003 (#4) - studes (homepage)
Patriot, I think you need to fix the links on your page as well. They're the same gobbledygook as the original link you posted here. Can't wait to see them; I didn't know you had posted 2003 stats.

Thanks

2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 12:02 p.m., October 2, 2003 (#7) - studes (homepage)
Patriot, the links work great. Thank you!

BTW, how do you derive park factors from Doug's data? I don't see any park breakouts on his page. Or are you using another source?

2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 11:40 a.m., October 7, 2003 (#17) - studes (homepage)
Slightly off topic, but Pete Simpson has created an awesome spreadsheet of all players' base data in 2003. He's formatted it extremely well for anyone who'd like to play with the data. It rivals Patriot's spreadsheets, and it does contain James' version of park factors (at least, as laid out in Win Shares).

You can download it at www.baseballgraphs.com/winshares/

2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 10:13 a.m., October 9, 2003 (#19) - studes (homepage)
Yes, Gagne is very highly rated by Win Shares. I actually think that's appropriate. He had an amazing year in a lot of high-leverage situations. If you estimate his leveraged innings (see Tango's work), you can approximate his Win Shares total.

Obviously, Webb was not in the majors for the entire year.

2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 6:52 p.m., October 10, 2003 (#21) - studes (homepage)
FJM, Gagne rates more highly than Smoltz for a number of reasons (this is off the top of my head; I've got the spreadsheet at home):

- You're right about the innings.

- Gagne's Component ERA was lower. He allowed fewer hits than Smoltz (in more innings) and he had almost twice as many K's as Smoltz. His BB rate was higher than Smoltz's. Win Shares does adjust ERA for relievers, based on Earned Run components, and Gagne's was much lower than Smoltz's.

- I think it's an accident of the methodology, but Gagne rates higher because he depended less on his fielders than Smoltz did.

Yes, the catcher fielding rankings often look screwy. Pete probably got the most questions about those. You're welcome to e-mail him and ask him why Rodriguez ranked so low. I'll try to look it up over the weekend.

2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 7:46 p.m., October 10, 2003 (#23) - studes (homepage)
I agree, Patriot. I'm really hoping to use this winter to try to improve the Win Shares methodology, refining it in some spots and "modernizing" it in others. Now that we have a year's worth of raw data, we can run "what if" scenarios on last year's results and see if we can get them to better match what we know today.

My long-term plan for next year is to keep posting Win Shares in-season, but also post a newer version of Win Shares based on our winter research. I know this may be a complete waste of time in the end, but at least I'll have some fun doing it.

2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 8:00 p.m., October 10, 2003 (#25) - studes(e-mail) (homepage)
Yeah, I know I'm asking for it. Once I get a little organized, I'd love a laundry list of complaints (though I think I know some of them already, from reading previous fanhome and primer threads).

BTW, if anyone wants it, I've got the complete 2003 Win Shares calculations, with all player data loaded (thanks to Pete) in an Excel spreadsheet. The calculations are built in, so if you're a glutton for punishment, you can start messing with the Win Shares calculations and come up with your own version. Just let me know if you'd like a copy of the spreadsheet. Warning: it's not "user friendly" so you'll have to figure out how it all comes together. I found it fairly easy to navigate, though.

2003 Park Factors (October 1, 2003)

Discussion Thread

Posted 9:12 a.m., October 11, 2003 (#27) - studes (homepage)
Did some research on Rodriguez this morning, looking at Pete's spreadsheets. A good comparison is Rodriguez to Ausmus (4 Win Shares to 9 Win Shares). Comparing the two:

- They played a similar number of innings (1132 to 1158 for I-Rod and Ausmus)
- I-Rod had more errors (8 to 3)
- I-Rod allowed more passed balls (10 to 3)
- I-Rod allowed fewer stolen bases, and they had similar SB% (40/60 for I-Rod and 68/105 for Ausmus). Win Shares only looks at CS% on the position level, not absolute SBs. It does look at absolute SBs allowed for individual fielders.

So, while the record is mixed, it is certainly hard to justify the position that Ausmus contributed twice as many fielding wins to the Astros as I-Rod did to the Marlins.

I think the biggest issue is that Win Shares assigns defensive Win Shares to a position first, then to individual players. And while Rodriguez is great, Redmond is a pretty bad catcher, and this brings down the fielding Win Shares claim points available to Marlins catchers.

So there are a few wrinkles we could play with here. The stolen base methodology, for one. And definitely the position/individual splits. This, in fact, may be the place to introduce "loss shares," to ensure that lousy fielders don't bring down good fielders.

Player Game Percentages, World Series (October 8, 2003)

Discussion Thread

Posted 6:31 p.m., October 9, 2003 (#7) - studes (homepage)
I just finished reading Curveball, and I enjoyed it immensely (which just proves that I'm a total geek, I guess). Many of the topics that are covered here are described in detail in the book. I'd highly recommend it if you're interested in this approach.

Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 3:59 p.m., October 10, 2003 (#5) - studes (homepage)
The more I play with park factors, the more convinced I become that one-year factors are virtually meaningless. You can find all sorts of funky stuff in one-year factors. For instance, Rockies' pitchers had a better ERA at home than on the road (obviously, DER had a lot to do with it, as FJM points out). Why? Who knows? The only theory that would make sense is that O'Dowd somehow figured out what a Coors Field pitcher should be, and I sincerely doubt that.

I think MGL, Patriot and Tango are right on: take the long-term view and regress the heck out of the figures. If you don't do that, I think you are better off just totally ignoring park factors. Yes, there will be wild skews on a year-to-year basis. I'd accept those as statistical anomalies that just happen, and that don't really require "correction" except on a long-term basis.

Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 12:40 a.m., October 12, 2003 (#11) - studes (homepage)
Whoa! Targeted by Ross.

I didn't say park factors are meaningless. Just one-year park factors. If I only knew the park factor for Coors Field for one year, I would severely regress it back to the league average.
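One hypothetical way to "regress the heck out of" a park factor: average the available seasons, then keep only a share of the deviation from 1.00 that grows with the number of seasons. The one-season ballast and the factors are made-up illustration values, not MGL's, Patriot's or Tango's method.

```python
def regressed_park_factor(yearly_factors, ballast_seasons=1.0):
    """Average several seasons' park factors, then pull the result toward 1.00.

    `ballast_seasons` is a hypothetical tuning value: the kept share of the
    deviation is n / (n + ballast_seasons), so one season is regressed
    hardest. Seasons are weighted equally (a simplification).
    """
    n = len(yearly_factors)
    if n == 0:
        raise ValueError("need at least one season")
    multi_year = sum(yearly_factors) / n
    keep = n / (n + ballast_seasons)
    return 1.0 + keep * (multi_year - 1.0)


print(round(regressed_park_factor([1.24]), 3))              # one hypothetical season
print(round(regressed_park_factor([1.24, 1.32, 1.28]), 3))  # three hypothetical seasons
```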

Look at Pac Bell after its first year, then compared to the next two. Pretty big diff.

Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 12:21 p.m., October 12, 2003 (#13) - studes (homepage)
Nice job, Tango. I better understand what Ross was saying. But I still question whether applying single-year park factors to individual players is "statistically significant." (at least not in terms of determining a player's true ability).

Of course, I should apply the usual caveat: I am not a statistician. And I'm not sure how the concept of statistical significance would apply here. But if one-year park factors are significant for individual players, why regress to the mean?

Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 5:38 p.m., October 12, 2003 (#15) - studes (homepage)
But the probability that national league players played at Coors is higher so the average league park factors would still be statistically significant for the population of NL players, just as individual park factors are.

I agree with what you are saying, Ross, except that I am questioning the statistical validity of this last sentence. I mean, the Rockies' pitchers gave up more runs on the road than at home this year.

It's been twenty years since my last statistics class, but I did the following anyway: I looked at the Rockies' 2003 record at home and away, and computed total runs scored per game (both the Rockies and the opposition). At Coors, the number of runs scored per game was 11.9, vs 9.6 on the road, for a park factor of 125%. Next, I calculated the standard deviation for each set of 82 games. It's 4.7 at Coors and 5.3 on the road.

So, just applying one standard deviation to the numerator and denominator, there is a 67% chance that the numerator is between 6.7 and 17.2, and the denominator is between 4.8 and 14.4.

Okay, I admit that I don't know what to do with this statistically, but I am pretty sure this means that, for a given denominator of 9.6, there is a 16.5% (half of 33%) chance the numerator will be 6.7 or lower. If true, there is a 16.5% chance that the Coors park factor is 70 or lower (6.7/9.6).

I know there are problems with this approach, such as assuming a normal distribution and the imbalanced schedule, but I just don't think one-year park factors are statistically significant, even in an extreme case like Coors'.
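The park factor and the one-standard-deviation ranges above can be reproduced from the rounded figures in the post (SDs as corrected in post #16). With these rounded inputs the factor comes out near 124% rather than 125%, presumably because the post worked from unrounded totals. Note that these ranges describe how much a single game varies, not how uncertain the season average is.

```python
home_rpg, road_rpg = 11.9, 9.6  # total runs per game (both teams), from the post
home_sd, road_sd = 5.3, 4.7     # game-to-game standard deviations, as corrected

park_factor = home_rpg / road_rpg
home_range = (home_rpg - home_sd, home_rpg + home_sd)
road_range = (road_rpg - road_sd, road_rpg + road_sd)

print(f"park factor: {park_factor:.1%}")
print("one-SD single-game range, home:", [round(x, 1) for x in home_range])
print("one-SD single-game range, road:", [round(x, 1) for x in road_range])
```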

Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 5:51 p.m., October 12, 2003 (#16) - studes (homepage)
One mistake (at least!). The SD's are 4.7 on the road and 5.3 at Coors. The ranges are correct.

Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 6:18 p.m., October 12, 2003 (#18) - studes (homepage)
Yes, I agree about the individual batter thing. That's not really what I'm saying.

And yes, the St Dev is 5 over 81 games. I analyzed this game-by-game.

Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 8:04 p.m., October 12, 2003 (#19) - studes (homepage)
Going over my calculations a little more, I think a statistician would say that there is about a 65% chance that Coors is a higher run-scoring environment than the average NL environment, based only on 2003 data. At least, that is my interpretation of the averages and standard deviations, using a probability table.

And, obviously, Coors is one of the most (if not the most) extreme environments around. So any conclusions you would want to make about other parks, based on one-year data, would be even less conclusive.

Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 4:51 p.m., October 13, 2003 (#20) - studes (homepage)
Okay, I apologize to everyone for using this thread as my own personal learning curve. I spent some time this morning studying my basic statistics even further, and realized I should have used the standard error instead of the standard deviation (which is what I think you were saying, Tango).

Standard Error is St. Dev divided by the square root of the sample size, which is about 5 divided by 9 in this case (9 being the square root of 81 games), or .556. Two standard errors on either side of the mean give roughly a 95% confidence interval. So I would use this to infer that the Coors Field one-year park factor is statistically significant (that is, there is at least a 95% probability that Coors is a higher run-scoring environment than the league).
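The standard-error step, extended to a comparison of the two averages, can be sketched as a two-sample z test. It assumes game run totals are independent and roughly normal, ignores the unbalanced schedule, and compares Coors with the Rockies' road parks rather than with the league; the inputs are the rounded figures from the posts above, and the function names are hypothetical.

```python
from math import erfc, sqrt


def standard_error(sd, n):
    """Standard error of a mean: SD / sqrt(n)."""
    return sd / sqrt(n)


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """z statistic and two-sided p-value for a difference of two means."""
    se_diff = sqrt(standard_error(sd_a, n_a) ** 2 + standard_error(sd_b, n_b) ** 2)
    z = (mean_a - mean_b) / se_diff
    p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided


print(round(standard_error(5, 81), 3))  # the .556 from the post
z, p = two_sample_z(11.9, 5.3, 81, 9.6, 4.7, 81)
print(round(z, 2), round(p, 4))
```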

Regarding everything else I said: I guess I still don't trust most one-year park factors, and I absolutely agree that, when assigning park factors to individuals, individual batting/pitching types should be taken into account.

Pythag Expansion (October 11, 2003)

Discussion Thread

Posted 11:43 a.m., October 11, 2003 (#4) - studes (homepage)
Just had my son, the math genius, look this over. He understands the math, but says it wouldn't work for other exponents. The binomial expansion in the first step would be different for other exponents, and the logic wouldn't hold.

postseason odds - Silver (October 11, 2003)

Discussion Thread

Posted 9:38 a.m., October 11, 2003 (#1) - studes (homepage)
I think it would be kinda cool to see these reports in graph-form day-to-day, so you can see the wild swings.

Now you're talking! :)

RISP for hitters and pitchers (October 13, 2003)

Discussion Thread

Posted 5:19 p.m., October 14, 2003 (#9) - studes (homepage)
I guess I'm missing something. We assign runs to pitchers when we use ERA, and I thought the point of ARP was to correct ERA for situations in which another pitcher affects the stat. This particularly affects relievers, I believe. Virtually all fans use ERA to assess pitchers, so this makes something like ARP more imperative for pitchers than hitters.

I don't believe we assign runs to batters in the same way. Yes, we do count runs and RBIs, but they are not primary stats for evaluating hitters (at least, not for educated fans). Speaking for myself, I pretty much ignore them. And most fans do look at other stats, such as BA, HRs, etc.

If you develop a stat that assigns run values to hitters, similar to ERA for pitchers, then I agree that you should correct for base/out/inning situation.

Game State Matrix (October 13, 2003)

Discussion Thread

Posted 6:16 p.m., October 13, 2003 (#1) - studes (homepage)
Cool graph. I'd like to know what software he used to create it.

However, there are enough weird things going on in this graph to make me question some of the data points. Little "splotches" of red or light blue seem to crop up in backgrounds of yellow or fuchsia (really, what are some of those colors?), when I would think the distribution of each probability would follow a fairly normal pattern.

Game State Matrix (October 13, 2003)

Discussion Thread

Posted 1:01 p.m., October 15, 2003 (#7) - studes (homepage)
I personally like the leveraged index for pitchers very much, because we know that different innings have varying levels of importance, and this seems like a "straightforward" (at least, conceptually) way to quantify that. But we do need to be clear that it is a value stat only (just like ARP) and does reflect managerial choices (as well as other factors, such as whether the team was involved in a lot of close games compared to the league average).

I also like it for batters, but only in a system that assigns run creation to batters, such as Tango's example.

I hadn't thought of this, but can we use leveraged indices to quantify the effectiveness of a manager?

BTW, Tango, thanks for the links.

Results of the Forecast Experiment, Part 2 (October 27, 2003)

Discussion Thread

Posted 9:15 p.m., October 27, 2003 (#33) - studes (homepage)
Well, now I wish I had participated. Tango and Alan, you guys did a great job presenting the results. Well done!

For Aging Runners, a Formula Makes Time Stand Still (October 29, 2003)

Discussion Thread

Posted 10:30 a.m., October 29, 2003 (#1) - studes (homepage)
Tango, there is a graphic of his age factors, but it's not in the print version that you're linking to. I've put the link to the article under "homepage". The link to the graphic is halfway down the page on the right.

Fun with Win Shares (November 5, 2003)

Discussion Thread

Posted 9:02 p.m., November 5, 2003 (#3) - studes (homepage)
Thanks, guys. We're having fun.

Sylvain, thanks for your "Win Shares Resources" page. It was a natural link for us to add.

What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 8:36 p.m., November 6, 2003 (#1) - studes (homepage)
See related NL Cy Young discussion from home page. Go, Gagne!

Got to admit, though, that the Halladay rating surprises me. Judging from the rankings, these guys are not splitting credit between pitching and fielding.

If a team or two buys this, Tango and MGL could be rich men!

What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 3:49 p.m., November 10, 2003 (#18) - studes (homepage)
Colin, Tango:

Thanks. I think a little light bulb just started flickering in my head. Tango, I'm beginning to understand why you once said that Win Shares undervalues pitching.

Sorry to bring up Win Shares, but I've been working on research regarding the pitching/fielding split, so this has been on my mind. James essentially splits pitching and fielding by giving pitchers complete credit for FIP runs below the ceiling, and half credit for DER runs below the ceiling. On the surface, this makes sense to me.

I think you're both suggesting that this should be approached on a purely incremental basis, pitcher by pitcher. Am I close?

What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 4:49 p.m., November 10, 2003 (#21) - studes (homepage)
Tango, thanks. I need to noodle on this. Sorry to bring up Win Shares.

Colin, I actually don't think that's right. Don't forget that James does use a replacement level, even though he says he doesn't. If you model it out (which I did, quickly), you'll see that a pitcher who achieves a 33% increase in Pythagoras also achieves about a 33% increase in marginal runs.

What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 5:51 p.m., November 10, 2003 (#22) - studes (homepage)
One more comment: Halladay saved 37 earned runs against the league average (league ERA minus his ERA, times innings pitched, divided by nine). If you attribute all those earned runs saved to Halladay, which I agree with, I don't see how you get 9 wins contributed above average. I would think it would be four to five games, max.
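Here is a quick sketch of the arithmetic behind the "four to five games" estimate. It assumes the common rule of thumb of roughly 10 runs per win, which the post doesn't state. The function names are illustrative only.

```python
def earned_runs_saved(era: float, league_era: float, innings: float) -> float:
    """Earned runs saved vs. a league-average pitcher over the same innings."""
    # ERA is earned runs per nine innings, so scale innings by 1/9.
    return (league_era - era) * innings / 9.0


def runs_to_wins(runs: float, runs_per_win: float = 10.0) -> float:
    """Convert runs to wins with a flat runs-per-win rule of thumb (assumed)."""
    return runs / runs_per_win


# Hypothetical pitcher: 3.00 ERA vs. a 4.50 league ERA over 180 innings.
print(earned_runs_saved(3.00, 4.50, 180))  # 30.0

# The post's figure of 37 earned runs saved:
print(round(runs_to_wins(37), 1))  # 3.7
```

Even at a more generous 8 runs per win, 37 runs comes out to under five wins.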

The Jays also scored 5.6 runs in each game he started, which certainly contributed to his fine W/L record. His BA with RISP also wasn't great, relative to his overall performance, so I don't expect Tango's WPA stats to reveal anything out of the ordinary. This ranking doesn't make sense to me. The Gagne ranking could be right.

Giambi is rated ahead of A-Rod. My gut still tells me that they didn't factor fielding into runs allowed.

What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 1:48 p.m., November 11, 2003 (#25) - studes (homepage)
Colin, thanks a ton. I appreciate the time you took to model PWS. You're right about leveraged innings taking Win Shares away from starters, though I'm not sure that's a bad thing.

My gut tells me that this may indeed be an issue and, if it is, it's an issue in the way WS handles marginal runs allowed and splits them between pitching and fielding. I plan to take some time analyzing it over the next month or so. I'll let you know if I make any progress. Thanks again.

What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 11:56 p.m., November 12, 2003 (#34) - studes (homepage)
I'm even more convinced than ever that the issue lies with the pitching/fielding breakout AND the absence of Loss Shares. It's going to take a bit of time, but I think I can pull those out of Win Shares.

See the link to the latest article on the home page. If any of you have any comments, I'd appreciate them.

What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 12:29 p.m., November 13, 2003 (#37) - studes (homepage)
Tango, one math comment. For the everyday player, you multiplied the games in the denominator by nine. I assume this is for nine players on the field, but you've already separated out one of those nine players (the pitcher). So I think you mean to multiply 162 by 8, which gives an average player 17 Win Shares, vs. 15 for an average pitcher. I'd buy that.

Of course, this is assuming the National League.

What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 3:20 p.m., November 13, 2003 (#42) - studes (homepage)
Tango, I don't think that's quite right, because you're adding back in pitching and fielding shares of a designated hitter, although he doesn't play defense. I'd have to think a bit about how your scenario plays out in the AL.

One small note: not hitting in the AL actually helps most pitchers, because their negative batting marginal runs aren't impacting their totals.

I also agree with Colin. A pitcher who throws 250 innings with an ERA of 4.00 is a good pitcher, IMO.

Colin, you may be onto something with the replacement level for fielding and pitching. I have to work this out over the next couple of weeks, but I think the Win Shares issue is this: once you compute Loss Shares for fielders and pitchers, you're going to find that most, if not all, fielders hover around the .500 mark. That is, the variability won't be high.

Pitchers, however, will vary widely. In fact, the superstar pitchers will incur negative Loss Shares (at least, in the way I'm thinking about it) and I am interested to see what will happen to their pitching Win/Loss Share totals, compared to everyday players, when that happens.

At least, that is what I think will happen. Sheesh, I've got a lot of work to do.

What's a Ball Player Worth? (November 6, 2003)

Discussion Thread

Posted 3:42 p.m., November 13, 2003 (#43) - studes (homepage)
One other thought: using averages in this example may be misleading, because the variance among pitchers may be wider than the variance among batters, particularly once Loss Shares (or something) are added. Different distributions will skew your analysis. The median might be a better point of analysis.
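To show how a skewed distribution pulls the mean away from the median, here is a minimal sketch. It uses made-up "above average" values for two small groups, with one ace in the pitcher group. None of these numbers come from real data.

```python
from statistics import mean, median

# Hypothetical Win Shares Above Average (illustrative only).
batters = [-3, -1, 0, 1, 2, 3, 4]
pitchers = [-6, -4, -3, -2, -1, 0, 15]  # one ace drags the mean upward

for label, values in (("batters", batters), ("pitchers", pitchers)):
    print(label, "mean:", round(mean(values), 2), "median:", median(values))
```

For the batters, the mean and median are close. For the pitchers, the single outlier lifts the mean well above the typical (median) pitcher.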

METS SEARCHING FOR STATS ANALYST (November 7, 2003)

Discussion Thread

Posted 11:56 a.m., November 7, 2003 (#6) - studes (homepage)
Hey, I'm a Mets fan! Think they want someone to create graphs for their analysis????

David Pinto and fielding (November 10, 2003)

Discussion Thread

Posted 3:14 p.m., November 10, 2003 (#2) - studes (homepage)
Did you notice the non-impact of Coors? David is using one-year park factors here, and I think that's a mistake, myself.

The other thing that struck me, looking at his data, is how little impact defense actually has, at least when collated on the team level. The difference between the best and worst fielding teams in the leagues last year (Braves and Mets) was 80 outs, spread over an entire season. That's 0.5 outs/game. 1/6 of an inning/game. etc. etc.
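Here is the per-game arithmetic for the 80-out gap, as a quick check. The only inputs added are the 162-game schedule and three outs per inning.

```python
GAMES = 162
OUTS_PER_INNING = 3

outs_gap = 80  # best-vs-worst team fielding gap quoted in the post
outs_per_game = outs_gap / GAMES
innings_per_game = outs_per_game / OUTS_PER_INNING

print(f"{outs_per_game:.2f} outs/game, {innings_per_game:.3f} innings/game")
# 0.49 outs/game, 0.165 innings/game
```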

Can this be? Is the relative difference between defenses that small? Is everything else park, handedness, pitcher and (mostly, it seems) luck? How could Darrin Erstad possibly save 55 runs in a season if this is true?

Let me know what I'm missing.

David Pinto and fielding (November 10, 2003)

Discussion Thread

Posted 1:07 p.m., November 15, 2003 (#17) - studes (homepage)
I don't know if this is the right approach, but I ran a regression of David's expected DER and actual DER to try and get a grip on the relative responsibility between pitching and fielding for DER. I'm including "luck" in the pitching bucket, cause there is really no way to separate it out.

I ran "Expected DER" and "Diff" against Actual DER. Of course, the R² was almost perfect, but I was interested in the t-stats. Expected DER had a t-stat of 99, and Diff had a t-stat of 55.

Based on this limited sample size, I think that would indicate that pitching and luck account for 2/3 of DER variance between teams, and fielding accounts for 1/3. This is slightly higher than my previous guess of 50%.
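Here is a minimal sketch of the regression described above, using NumPy on made-up team data (not David's figures). It includes an intercept term, which the post doesn't specify, and computes t-stats from the standard OLS covariance. The final line applies the post's heuristic of splitting responsibility by relative t-stat. That heuristic is the poster's own, not a standard variance decomposition.

```python
import numpy as np


def ols_t_stats(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fit y = X @ beta by ordinary least squares; return t-stats for beta."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the estimates
    return beta / np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
n_teams = 30
expected_der = rng.normal(0.700, 0.010, n_teams)  # hypothetical team data
diff = rng.normal(0.000, 0.008, n_teams)          # actual minus expected
actual_der = expected_der + diff + rng.normal(0.0, 0.0005, n_teams)

X = np.column_stack([np.ones(n_teams), expected_der, diff])
t_const, t_expected, t_diff = ols_t_stats(X, actual_der)

share = t_expected / (t_expected + t_diff)
print(f"t(expected)={t_expected:.0f}, t(diff)={t_diff:.0f}, share={share:.2f}")

# The post's own t-stats:
print(round(99 / (99 + 55), 2))  # 0.64
```

When both coefficients are near 1, each t-stat is roughly proportional to that variable's standard deviation. So the t-stat ratio compares spreads rather than variances.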

Is my approach valid?

David Pinto and fielding (November 10, 2003)

Discussion Thread

Posted 2:34 p.m., November 16, 2003 (#24) - studes (homepage)
Michael, thanks for your input on my little calculation. Makes sense, so I went back and did another calculation. I "normalized" a fielding impact DER by averaging the pitcher DER across all teams, and then adding or subtracting the DER difference by team. I then computed the standard error of the pitching DER and the fielding DER and got these results:

- Pitching DER: .0016
- Fielding DER: .0009

This would indicate that the standard error for pitchers is about 80% higher than that for fielders, which works out to roughly three times the variance.

Standard error, strictly speaking, measures how precisely a sample estimates the underlying "true" mean of the broader population. But I think that using it to determine the relative impact of pitching and fielding on DER is appropriate.
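Here is a sketch of the normalization step and the standard-error comparison, on hypothetical per-team values (not David Pinto's data). Adding the same league-average constant to every team's difference leaves the spread unchanged. So the "fielding DER" standard error is simply the standard error of the differences. The last lines use the post's .0016 and .0009: the ratio of standard errors is about 1.78, and squaring it gives the ratio of variances.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Sample standard deviation divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical per-team values (illustrative only).
pitching_der = [0.695, 0.702, 0.688, 0.710, 0.699, 0.705]  # expected DER
der_diff = [0.004, -0.003, 0.002, -0.001, 0.000, -0.002]    # actual minus expected

# "Fielding DER": average pitching DER plus each team's difference.
avg_pitching = mean(pitching_der)
fielding_der = [avg_pitching + d for d in der_diff]

se_p = standard_error(pitching_der)
se_f = standard_error(fielding_der)
print(f"SE pitching={se_p:.4f}, SE fielding={se_f:.4f}")

# The post's figures: ratio of standard errors vs. ratio of variances.
se_ratio = 0.0016 / 0.0009
print(round(se_ratio, 2), round(se_ratio ** 2, 2))  # 1.78 3.16
```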

Caveat: all the "noise" you referred to in your previous post is in the pitching bucket, given the way David approached the data. That's why the variance is higher. So this is not invalidating your point about the relative true impact of pitching and fielding on DER.

I think we all agree that much of DER is dependent on things other than the pitcher or fielder. That is, luck. The only difference I'm bringing to the table is that I think that the noise should be attributed to the pitcher, when assigning "responsibility" for DER.

This is purely a philosophical belief. You can either see the pitcher as passive, or the fielders as passive (or a combination, I guess). Your approach (attributing all the noise to fielders) makes the pitcher passive, which might be consistent with the DIPS framework.

But in my mind, the pitcher is in charge of what happens on the field. He initiates the play, he throws the ball. What happens to a batted ball ought to be attributed to him until other players can make a reasonable impact -- i.e. fielders. So I would choose to make the fielders the passive players. They can't really act until a ball is hit into their "zone." It's the pitcher who ought to be assigned "control" before that moment.

Sorry to ramble. I don't think this has any impact on your DRA calculations, from what you said before. But I thought I would bring up the philosophical differences.

I think the approach Tango seems to be taking -- assigning 50/50 responsibility to pitchers and fielders -- is reasonable until we get more data.

Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 8:08 a.m., November 15, 2003 (#16) - studes (homepage)
Tango, I just want to echo everyone else. This is tremendous.

It's a little bizarre (at least, to me) that you're doing this at the same time I'm playing with Win Shares. I'm posting WSAA (Win Shares Above Average) for all 2003 batters over the weekend. Once you've done 2003, it will be interesting to see what the differences between the systems are at that point.

I'm also fascinated that you have figured out a way to apply this approach to non-PBP data. That's super. Can't wait.

Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 7:59 p.m., December 1, 2003 (#21) - studes (homepage)
One of the aspects of these approaches that intrigues me is the adjustments that are used. I tend to think that there are two kinds of adjustments:

- Those that "normalize" the data, so that players can be evaluated across environments. This is hugely important for ability stats, like slwts, but I'm unsure what role they should play in value stats.

- Those that refine the win probability appropriately, such as starting out in a bigger probability "hole" against Pedro.

Seems that park factor differs in its role between the two approaches. To normalize data, you must adjust for the park/environment so that Todd Helton can be directly compared to Shawn Green, for instance.

To refine the win probability, you use the park factor to account for the fact that a home run in Coors does not increase your probability of a win as much as a home run in Dodger Stadium.

Two approaches, two goals. Same "factor."

As you've said, Tango, Win Shares is sort of a hybrid of the two. Win Shares includes adjustments for park, leveraged innings, pitcher handedness, clutch hitting and opportunities (such as double play opportunities). It also includes an adjustment for the difference between actual wins and projected wins.

I scratch my head on a couple of these. Pitcher handedness normalizes the fielding data; clutch hitting refines the win probability. I think that Win Shares strives to be a "normalized value stat." Not the cleanest approach perhaps, but useful anyway.

Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 9:45 p.m., December 3, 2003 (#38) - studes (homepage)
Tango,

According to Pete's spreadsheet, pitchers were paid $1,063,471,375 last year. That includes a full year's salary at $300K for a number of pitchers who only pitched part of the year. I'd estimate that, if you took them out, pitchers were paid about a cool billion last year.

Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 2:51 p.m., December 9, 2003 (#41) - studes (homepage)
But Tango, why would you single out pitchers for the "boomerang" effect? Why would they benefit from it, but not everyday players? Wouldn't a GM "overpay" for marginal everyday players, because they benefitted from a Giles or Pujols pre-arbitration?

I think pitchers are paid more because of the potential high production they could bring to the mound. It's the same as the A-Rod factor. A single, good pitcher COULD have a year in which he contributes five or six wins, and the market pays for those wins at a higher rate than they pay for the first few wins.

You may have already touched on this (I haven't followed the entire thread), but my guess is that standard deviations are higher for pitchers than for everyday players. It may seem counterintuitive, but that would increase their salaries.

Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 3:24 p.m., December 9, 2003 (#43) - studes (homepage)
Yeah, that's the big question. Are pitchers overpaid? They are according to Win Shares. Also according to VORP. Not sure about WARP.

Total salaries last year were $2.2B. If you split runs and runs allowed 50/50, then $1.1B would go to batting and $1.1B to defending. $1B went to pitchers, which means $0.1B went to fielding. Does that seem right?

Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 4:49 p.m., December 9, 2003 (#45) - studes (homepage)
I'll have to chew on that one. Got to admit that I don't understand exactly what you mean. $300 million for minimum salaries seems like a huge number. That would imply that there were 1,000 full-year pitchers in the majors last year, at $300K per pitcher.

Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 11:41 p.m., December 10, 2003 (#48) - studes (homepage)
I'll be the first to admit that I get lost on some of these more obscure replacement level issues. I've studied this thread a bit, and I still don't get win advancement percentages vs. won/loss percentages, and what that means for replacement level. Oh well.

What I have done is gone back to the salary data. One of the issues with the salary data is that our/my database doesn't include salaries for a lot of players -- several hundred in fact -- most of whom had a cup of coffee, but some of whom played a significant amount. In fact, a quick calculation showed me that over 10% of innings pitched in the AL were by pitchers for whom we had no salary data. Also, I had not fully allocated salaries between players who played for multiple teams.

So I corrected for that, as best I could. Now, at least, no player's salary is double-counted. And every player who played has at least a little bit of salary paid to them. If you want the dirty details on how I did this, let me know.

Here are the estimated salary paid results:

Total salary paid last year of $1,150 million in the NL and $930 million in the AL for a total of $2,080,000,000. I like seeing all the zeros.

In the NL, pitchers were paid $455 million (or 40% of the total) and in the AL, pitchers were paid $362 million (39% of the total) for a total pitchers salary paid of $817 million (39%).
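Here is a quick check of the percentages above, using only the post's estimated totals (in millions of dollars).

```python
# Estimated salary paid last year, in $ millions, as given in the post.
salaries = {
    "NL": {"total": 1150, "pitchers": 455},
    "AL": {"total": 930, "pitchers": 362},
}

grand_total = sum(lg["total"] for lg in salaries.values())
pitcher_total = sum(lg["pitchers"] for lg in salaries.values())

for league, lg in salaries.items():
    share = lg["pitchers"] / lg["total"]
    print(f"{league}: pitchers {share:.0%} of ${lg['total']}M")
print(f"MLB: pitchers ${pitcher_total}M of ${grand_total}M = {pitcher_total / grand_total:.0%}")
```

This reproduces 40% (NL), 39% (AL) and 39% overall on $2,080M.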

I apologize that my previous numbers were off by so much. I'm not sure where I was off, but I am pretty sure these are in the right neighborhood, though I'll keep staring at them. The percent breakouts look "right" to me (for what that's worth), in line with Win Shares and probably most total value systems.

By the way, if you divide outfield salaries by three, the second most expensive position was NL 1B ($120M), led by Vaughn, Thome, Helton and Bagwell. I will post the totals for all positions on my site in the next day or so, after I've triple-checked them all.

Win and Loss Advancements (November 13, 2003)

Discussion Thread

Posted 9:58 a.m., December 12, 2003 (#49) - studes (homepage)
If you're interested, I've posted salary, win shares and win share value breakouts by league and position at the above link.

Win Shares, Loss Shares, and Game Shares (November 15, 2003)

Discussion Thread

Posted 5:29 p.m., November 15, 2003 (#1) - studes (homepage)
Tango, that is too funny. Thanks for the kind words.

You know, I never meant to wind up "trumpeting" Win Shares. It's just that I've always thought Win Shares were interesting, and I thought it was a shame they weren't being presented anywhere. Then I "met" Pete on Primer and we started working together.

I'm hoping my site will have some influence -- I know Rob Neyer drops by once in a while -- and that folks will understand that Win Shares can and should be improved. I would think this could only help your WPA approach, by the way, by helping the same folks better understand the validity of your interesting system.

Win Shares, Loss Shares, and Game Shares (November 15, 2003)

Discussion Thread

Posted 5:08 p.m., November 17, 2003 (#20) - studes (homepage)
Holy cow, Colin. That is a lot to think about. I'm still digesting it, but a couple of thoughts:

I don't understand how you can create a "baseline" offensive player whose WS=0, but somehow differs from absolute runs created. You mention a replacement level player who only creates outs, but I don't get it. Got to admit, I'm sort of hung up on the point.

The problem with going to absolutes, of course, is that you run into the curved nature of runs scored/allowed and wins. This is the main reason that James went with marginal runs (besides creating a "zero" point for the defense) -- to create a span in which that didn't matter too much. Now, maybe I've blown his intent away by going into the negative win shares, I'm not sure.

Maybe it's bad to go into negative win shares and loss shares, but I actually don't see why, unless it's the aforementioned curvilinear problem. The negative win shares and loss shares are only there to get to WSAA, given the way I've put the system together. I think/hope that most people can understand that. Same with the way I've dealt with Game Shares. When presenting the data, we don't have to present the intermediate steps.

BTW, Game Shares (in my proposed system) do generally equal outs made for batters. So they act in the same manner that Loss Shares do in your system.

I like your ideas on the defensive side. I've started to try and think this through for the defense, and it's hard! You may be onto something by focusing first on losses.

Great stuff!

Win Shares, Loss Shares, and Game Shares (November 15, 2003)

Discussion Thread

Posted 5:27 p.m., November 17, 2003 (#23) - studes (homepage)
Charlie, I'd love to see your system, of course. I am struggling with the defense side of the equation, though I've really just started.

I'm glad to hear you take negative batting/pitching performances into account. As I tried to point out in the Bonds article, you run into equity problems between teams if you don't. I think this would be a problem with any system that uses a threshold.

Win Shares, Loss Shares, and Game Shares (November 15, 2003)

Discussion Thread

Posted 7:17 p.m., November 17, 2003 (#25) - studes(e-mail) (homepage)
That would be super. I've attached my e-mail address in case you need it. Thanks.

Win Shares, Loss Shares, and Game Shares (November 15, 2003)

Discussion Thread

Posted 10:27 p.m., November 17, 2003 (#29) - studes (homepage)
Thanks, Colin. I understand. Is it this simple?:

Figure the W/L record of offense and defense exactly as you're suggesting, based on team runs scored and allowed vs. league average (park-adjusted, of course) and pro-rating the actual W/L of the team on top of that. Then assign offensive Wins to each player based on proportion of absolute runs created (no need for marginal or replacement level, I don't think) and pro-rate the losses based on outs made. Bingo, your win and loss shares. As you say, this is essentially what I've accomplished with WSAA.
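Here is a minimal sketch of the offensive allocation step only. It takes the team's offensive W/L as a given input and skips the earlier park-adjusted runs-vs.-league step. Each hitter's wins follow his share of absolute runs created, and his losses follow his share of outs made. The 45-36 record and the three-man lineup are made-up numbers.

```python
def allocate_offense(team_off_wins, team_off_losses, players):
    """Split a team's offensive W/L among hitters.

    players maps name -> (runs_created, outs_made). Wins follow each
    player's share of team runs created; losses follow his share of outs.
    """
    total_rc = sum(rc for rc, _ in players.values())
    total_outs = sum(outs for _, outs in players.values())
    return {
        name: (team_off_wins * rc / total_rc, team_off_losses * outs / total_outs)
        for name, (rc, outs) in players.items()
    }


# Hypothetical lineup and offensive record.
lineup = {"A": (120, 400), "B": (80, 400), "C": (50, 450)}
shares = allocate_offense(45.0, 36.0, lineup)
for name, (w, l) in shares.items():
    print(f"{name}: {w:.1f} W, {l:.1f} L")
```

By construction, the individual wins and losses add back up to the team's offensive record.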

For defense, assign wins based on outs created (splitting BIP outs 50/50 between pitchers and fielders) and losses based on runs allowed (following the same sort of split). You'd probably have to use a component ERA approach for relievers, and throw in the leveraged innings concept.
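Here is a rough sketch of the team-level defensive split, under stated assumptions. Strikeout outs go wholly to pitchers, and balls-in-play outs are split 50/50. For losses, the sketch assumes runs allowed have already been divided into a portion tied to non-BIP events and a BIP portion, with the BIP portion split the same way. Those inputs are hypothetical. The component ERA and leveraged-innings pieces are left out.

```python
def split_defense(def_wins, def_losses, k_outs, bip_outs,
                  non_bip_runs, bip_runs, pitcher_share=0.5):
    """Divide a team's defensive W/L between pitchers and fielders.

    Wins follow outs recorded; losses follow runs allowed. Non-BIP outs and
    runs go to pitchers; BIP outs and runs are split by pitcher_share.
    """
    p_win_frac = (k_outs + pitcher_share * bip_outs) / (k_outs + bip_outs)
    p_loss_frac = (non_bip_runs + pitcher_share * bip_runs) / (non_bip_runs + bip_runs)
    return {
        "pitchers": (def_wins * p_win_frac, def_losses * p_loss_frac),
        "fielders": (def_wins * (1 - p_win_frac), def_losses * (1 - p_loss_frac)),
    }


# Hypothetical team totals.
result = split_defense(40.0, 40.0, k_outs=1000, bip_outs=3300,
                       non_bip_runs=350, bip_runs=400)
for group, (w, l) in result.items():
    print(f"{group}: {w:.1f} W, {l:.1f} L")
```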

Then do lots of oddball things to assign outs and runs allowed to individual fielders.

Can it be this simple?

Win Shares, Loss Shares, and Game Shares (November 15, 2003)

Discussion Thread

Posted 12:24 p.m., November 20, 2003 (#36) - studes (homepage)
BTW, I've played a little bit with the absolute approach, but haven't made it work yet. I think it's doable, but, frankly, the relational approach doesn't bother me one iota. Either approach is just as valid, and either approach will yield results that will be expressed as plus/minus vs. average, or replacement level.

So I probably won't spend more time on it (my family is ready to kill me, anyway) and I plan to move onto pitching and fielding.

Win Shares, Loss Shares, and Game Shares (November 15, 2003)

Discussion Thread

Posted 9:11 p.m., November 20, 2003 (#38) - studes (homepage)
Colin, that is great. Makes me feel better about not having the time to devote to it.

If you flesh it out some more, I'll be happy to try your ideas with the Win Shares database that Pete has put together.

Persistency of reverse Park splits (November 20, 2003)

Discussion Thread

Posted 11:42 a.m., November 20, 2003 (#2) - studes (homepage)
This makes a ton of sense. I spent a lot of time earlier this year picking apart park factors as best I could and trying to determine if certain types of hitters did better/worse in certain parks. I still think there's something there, but the classification of hitters is very tricky, and so is the regression toward the mean.

I also spent some time on Retrosheet reviewing home/road splits for a lot of Mets hitters (cause I was interested in the effects at Shea, which has been around for a long time). I was amazed at how much variance there was in one-year splits. For instance, Mookie Wilson had several years in which he hit "significantly" better at Shea, and several years in which he hit "significantly" worse. Significantly, by my own non-analyzed impression.

My only point is that I think you can only draw conclusions about specific batters or types of batters with A LOT of data and with an airtight classification system. And I'm not sure what you'll find out at that point.

By the way, check out Sid Fernandez's home/road splits from his Met days. I bet those pass the significance test.
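Here is one way to check whether a single-season home/road split "passes the significance test": a two-proportion z-test. It treats each at-bat as an independent trial, which is a simplifying assumption. The .290/.250 split over 300 at-bats each is hypothetical, not anyone's actual record.

```python
from math import erf, sqrt


def two_proportion_z(hits_home, ab_home, hits_road, ab_road):
    """Two-sided z-test for a difference in rates (e.g. batting average)."""
    p_home = hits_home / ab_home
    p_road = hits_road / ab_road
    pooled = (hits_home + hits_road) / (ab_home + ab_road)
    se = sqrt(pooled * (1 - pooled) * (1 / ab_home + 1 / ab_road))
    z = (p_home - p_road) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical one-season split: .290 at home vs. .250 on the road.
z, p = two_proportion_z(87, 300, 75, 300)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A 40-point gap over one season's worth of at-bats gives z of about 1.1. That is well short of conventional significance, which fits the point about how noisy one-year splits are.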

Did I just undermine my entire point?

Win Shares per Dollar (November 20, 2003)

Discussion Thread

Posted 12:15 p.m., November 20, 2003 (#1) - studes (homepage)
Yes, that's a very good suggestion. I'll do it tonight.

Personally, I like the graph.

Win Shares per Dollar (November 20, 2003)

Discussion Thread

Posted 10:54 p.m., November 20, 2003 (#4) - studes (homepage)
I've updated the comments, added a graph and data.

Win Shares per Dollar (November 20, 2003)

Discussion Thread

Posted 9:44 p.m., November 24, 2003 (#6) - studes (homepage)
I haven't run playoff numbers, Scoriano. I think Pete was going to do that, so I'll check with him and see what's up.

Win Shares per Dollar (November 20, 2003)

Discussion Thread

Posted 1:51 p.m., November 25, 2003 (#9) - studes (homepage)
Robert, you may want to check out the thread on the baseballgraphs site. That's exactly where it goes.

Baseball Player Values (November 22, 2003)

Discussion Thread

Posted 10:45 a.m., November 23, 2003 (#5) - studes (homepage)
It's amazing how WPA is cropping up everywhere. The fact that he doesn't split responsibility between pitching and fielding pretty much undermines the pitching numbers, IMO. Also, it is certainly possible that Helton would rank first in 2000, even playing in Coors.

Tango, I imagine that park factors/run environment impact the WPA weights. Are park factors going to be part of your system?

Tendu (November 24, 2003)

Discussion Thread

Posted 7:28 a.m., November 25, 2003 (#5) - studes (homepage)
FYI, this was written up in Newsweek a month or two ago. I can't find the link. The Tendu guy must have a good PR firm.

ABB# (November 24, 2003)

Discussion Thread

Posted 4:00 p.m., November 24, 2003 (#2) - studes (homepage)
Wow, Tango. I really like the formula at the end. It's simple and elegant, really. It's even something that some could do in their heads, roughly. Maybe.

My first thought when I read Aaron's column was "well, why not just use BRA (OBP times SLG) instead?" The downside is that you don't have a number that's roughly comparable to BA. But I wonder how OBA/2 + SLG/4 compares, in terms of fit, to BRA?
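Here is a side-by-side of the two measures for a couple of hypothetical hitters. It shows why OBP/2 + SLG/4 reads like a batting average while BRA (OBP times SLG) lives on a much smaller scale. The function names are illustrative. Which one "fits" run scoring better would need team-level data, which this sketch does not attempt.

```python
def bra(obp: float, slg: float) -> float:
    """Batter's Run Average: OBP times SLG."""
    return obp * slg


def half_obp_quarter_slg(obp: float, slg: float) -> float:
    """OBP/2 + SLG/4, scaled to read roughly like a batting average."""
    return obp / 2 + slg / 4


# Hypothetical hitters: (label, OBP, SLG).
for label, obp, slg in [("average-ish", 0.330, 0.420), ("star", 0.420, 0.600)]:
    print(f"{label}: BRA={bra(obp, slg):.3f}, "
          f"OBP/2+SLG/4={half_obp_quarter_slg(obp, slg):.3f}")
```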

I know you don't have this in your database, but the other comparison I can think of is EqA, which the Prospectus folks derived to approximate the same scale as BA, I believe.

ABB# (November 24, 2003)

Discussion Thread

Posted 4:31 p.m., November 24, 2003 (#6) - studes (homepage)
You know, calling it "the Aaron number" would be an appropriate honor for its creator. Which calculation are you talking about, though? His first, or the OBP/2 + SLG/4 calculation?

ABB# (November 24, 2003)

Discussion Thread

Posted 6:04 p.m., November 24, 2003 (#12) - studes (homepage)
Abba? You want Mike Piazza "Dancing Queen" jokes for the next six months? kidding...

I personally like "the Aaron number". Catchier. Sure to sell on Madison Avenue. Maybe you can even trademark it!

Cheers,
Dave

ABB# (November 24, 2003)

Discussion Thread

Posted 11:37 p.m., November 24, 2003 (#23) - studes (homepage)
Count me as someone who's sick of acronyms. That's why I like "the Aaron number." That's classy. Sort of like the Richter Scale, the Laffer Curve and the Doppler Effect. Giving it an acronym will ensure that it gets lost in the sabermetric haze.

Baseball Graphs - Money and Win Shares (November 28, 2003)

Discussion Thread

Posted 8:12 a.m., November 29, 2003 (#4) - studes (homepage)
Alan, thanks. Maybe you can help me think this through. I did the first step you mention (WS*$300K minus salary) to derive "net value added to the team." I noticed that net value decreased as salary increased. Are you saying that this must be true, given the way I defined value? Is there a better way to define value?
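Here is the "net value added" definition from the post (Win Shares times $300K, minus salary) as a small function. It is applied to two hypothetical players to show how a high salary pushes net value negative.

```python
DOLLARS_PER_WIN_SHARE = 300_000  # the post's valuation of one Win Share


def net_value(win_shares: float, salary: float) -> float:
    """Net value added to the team: Win Shares at $300K each, minus salary."""
    return win_shares * DOLLARS_PER_WIN_SHARE - salary


# Hypothetical players: a cheap regular and an expensive star.
print(net_value(15, 400_000))     # 4100000
print(net_value(30, 15_000_000))  # -6000000
```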

On a team level, there is a fairly straightforward negative relationship between value and payroll. The team with the second-most value was Tampa Bay. Fourth was Milwaukee. This didn't seem like the most helpful analysis to me.

So I derived a formula that "best fit" the data. Frankly, I didn't pay any attention to the r or r squared, cause I wasn't interested in fit. I wanted a formula that best described the negative slope between salary and value, to better evaluate the GM, as you say. Was it inappropriate to try and capture the negative slope between salary and value, given the way I defined value?

Thanks again for your help.

Baseball Graphs - Money and Win Shares (November 28, 2003)

Discussion Thread

Posted 3:04 p.m., November 29, 2003 (#6) - studes(e-mail) (homepage)
Scoriano, sorry. Pete doesn't have time to do that right now, and I'm not going to tackle it myself. There are also some issues deciding how to set certain baselines -- particularly for fielding purposes -- with postseason data. Not as easy as just loading the data.

If you feel up to it, I'd be happy to e-mail you Pete's basic spreadsheets, and you could fill in the data. I think Patriot also has a Win Shares spreadsheet on his site. I'm guessing that Patriot's spreadsheet is more user-friendly.

Baseball Graphs - Money and Win Shares (November 28, 2003)

Discussion Thread

Posted 6:55 a.m., November 30, 2003 (#9) - studes (homepage)
Thanks, Alan. I appreciate the comments and work.

I went back to my data and looked at my basic Win Shares vs. Salary regression. The basic formula was WS = 4 + 1.2 * (salary in $M). So you would expect, on average, 16 Win Shares for a player who was paid $10M. This doesn't deviate a lot from my formula, so I think my graphs are still appropriate (if not the underlying approach).
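Plugging the fitted line into the $300K-per-Win-Share valuation shows the negative slope directly. Each extra $1M of salary buys about 1.2 Win Shares, worth $0.36M, so expected net value falls by roughly $0.64M per $1M. This sketch uses only the post's two numbers, with salary in millions of dollars.

```python
WS_PRICE_M = 0.3  # $300K per Win Share, expressed in $ millions


def expected_win_shares(salary_m: float) -> float:
    """Fitted line from the post: WS = 4 + 1.2 * (salary in $M)."""
    return 4 + 1.2 * salary_m


def expected_net_value(salary_m: float) -> float:
    """Expected Win Shares priced at $300K each, minus salary (all in $M)."""
    return expected_win_shares(salary_m) * WS_PRICE_M - salary_m


print(expected_win_shares(10))  # 16.0
for s in (1, 5, 10):
    print(s, round(expected_net_value(s), 2))
```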

When you say that, by definition, I will get a negative correlation between salary and value, I don't understand why. If you're saying that this is true, given baseball's salary structure, I can understand that. But I posit that there are a lot of industries in which the correlation would be positive. And this is an important insight.

Imagine an industry with a high learning/experience curve, some sort of specialized work. A good example would be baseball without the minor leagues. You might pay an entry worker $40,000 and literally receive no value in return, because that person is learning their craft.

Over time, as they continue to learn their craft, they start to contribute and their "value", as defined in my approach, begins to rise.

But, depending on the rate of increase in salary vs. contribution, that person's "value" might actually increase over time, and over different salary levels.

In a theoretical perfect labor market, with no inherent learning skill issues, value would remain constant across salary levels.

But baseball is different. My articles are about trying to determine how it is different, and what some of the implications are.

Baseball Graphs - Money and Win Shares (November 28, 2003)

Discussion Thread

Posted 6:58 a.m., November 30, 2003 (#10) - studes (homepage)
By the way, I posted a link to your comments and analysis from my site. I hope you don't mind.

The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)

Discussion Thread

Posted 11:49 a.m., December 1, 2003 (#4) - studes (homepage)
I agree with Steve. I don't see how this analysis disproves the existence of "total clutch" hitting. If anything, one could argue that the standard error of 10% is related to clutch hitting, or lack thereof, and that this "proves" the existence of sustained clutch hitting for certain batters.

The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)

Discussion Thread

Posted 2:00 p.m., December 1, 2003 (#10) - studes (homepage)
Cyril, I agree with your comments about ballclubs buying that data. I personally don't think that WPA is very useful to front offices, because it lacks predictive value, compared to other stats. I think it would be bizarre if front offices made personnel decisions based partly on value metrics such as WPA or Win Shares. As a fan, on the other hand, I find this sort of data fascinating.

I really don't think clutch hitting is provable. A random distribution of batting, even for players with long careers, will sometimes turn up batters who look like awesome clutch hitters (or the opposite).
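Here is a small simulation of that point. Every hitter has the same true .270 ability in and out of the clutch, yet some still "beat" their non-clutch average by 30 points or more purely by chance. The plate-appearance counts, the .270 rate and the 30-point threshold are all assumptions chosen for illustration.

```python
import random


def simulate_clutch(n_players=500, clutch_pa=600, other_pa=4400,
                    true_avg=0.270, gap=0.030, seed=1):
    """Count identical-ability hitters whose clutch average beats their
    non-clutch average by at least `gap` purely by chance."""
    rng = random.Random(seed)
    lucky = 0
    for _ in range(n_players):
        clutch = sum(rng.random() < true_avg for _ in range(clutch_pa)) / clutch_pa
        other = sum(rng.random() < true_avg for _ in range(other_pa)) / other_pa
        if clutch - other >= gap:
            lucky += 1
    return lucky


lucky = simulate_clutch()
print(f"{lucky} of 500 identical hitters look like clutch hitters")
```

With these settings, the clutch-minus-other difference has a standard deviation of about 19 points. So roughly 6% of a career-length sample of hitters would clear a 30-point bar with no clutch skill at all.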

Marcel, The Monkey, Forecasting System (December 1, 2003)

Discussion Thread

Posted 6:02 p.m., December 1, 2003 (#16) - studes (homepage)
That is one helluva monkey.

Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 11:26 p.m., January 14, 2004 (#17) - studes(e-mail) (homepage)
Well, I don't have anything for you, Colin. But are you willing to share your database?????

Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 2:43 p.m., January 15, 2004 (#19) - studes (homepage)
Colin, check out the copyright of the Win Shares book. I think that as long as you don't sell the database (i.e., for commercial purposes), you can do whatever you want with the info. I went through this with a Stats salesman already, as you can imagine.

Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 8:02 p.m., January 15, 2004 (#23) - studes(e-mail) (homepage)
Michael/Tango: if it's not posted on Primate Studies, please send me a copy of the Excel spreadsheet, too. Thanks.

Regarding Davenport's numbers, is there some way to get them other than looking up each player one by one? Seems like a lot of work to me.

BTW, how's the book coming, Michael?

Evaluating A-Rod (December 8, 2003)

Discussion Thread

Posted 7:41 a.m., December 9, 2003 (#5) - studes (homepage)
In a recent issue of the "Journal of Sports Economics", a couple of guys ran a study in which they found that the average revenue per marginal win increases as total wins increase. That is, the 80th incremental win is not worth as much as the 85th, which is not worth as much as the 90th, etc. (No, I don't subscribe, but I did backorder the issue.)

I think the market approaches ballplayers in the same manner. That is, I think the market pays more for a single player's seventh win than the sixth win. I think that's rational, but only within the context of the total team's position (something that wasn't well-considered with ARod). Over time, I'm exploring this thought on my site.

There are several other important aspects of baseball economics that drive up the salary paid for an incremental win. One is the relatively high fixed cost of running a baseball team. I've worked and consulted in several industries with high fixed costs, and it's interesting to see what the market does. Sooner or later, each market (if it's competitive enough) squeezes the incremental profit margin out of the product.

That is, the temptation to offer a player virtually all of your incremental revenue is too hard to resist, because ANY contribution to your bottom line helps. I believe baseball owners often look at baseball players this way. The players are the mechanism whereby owners realize their return on their large fixed investment.

The other issue is that the "win market" is a zero-sum game. There is a cap on the number of wins that a "market" can produce. An individual team can't invest more money and create more wins, beyond some reasonable number, because the total can't grow.

So game theory comes into play here, and I'm no expert in game theory. But I believe the zero-sum nature of the win market turns the entire affair into a bit of a gambling casino.

The junk bond thought is a good one. Think of your roster as a diversified portfolio of investments. In order to increase your total wins, you will decide to take on some riskier investments. That is, you'll price a junk bond like a AAA bond, in order to reach a certain level of wins, as long as the overall risk of your portfolio is under control. This is also rational, within the appropriate team context.

I'll stop here -- too many thoughts are making me ramble. But my basic point is that there are reasons to pay a player more than $15M. Some of them are rational, some are built into the economics of the business, others are irrational. But I believe the case can be made.

Evaluating A-Rod (December 8, 2003)

Discussion Thread

Posted 10:55 a.m., December 9, 2003 (#8) - studes (homepage)
Right. I decided to argue. :)

Evaluating A-Rod (December 8, 2003)

Discussion Thread

Posted 12:18 p.m., December 9, 2003 (#12) - studes (homepage)
Well, geez, my Microeconomics teacher told me that your salary WILL be your marginal revenue.

Well, that's a weird statement for an economics professor to make. It only makes sense to me if it applies to a situation in which the owner and labor are the same person.

If business management does not get to keep any of the marginal revenue generated by a player, then there is no incentive to hire that player. None. In every deal that has been made, the GM/owner made some sort of assumption about incremental revenue and the portion of that revenue he (or she) would get to keep.

Now, their assumptions may have been completely wrong. But that's a different matter.

Evaluating A-Rod (December 8, 2003)

Discussion Thread

Posted 2:38 p.m., December 9, 2003 (#19) - studes (homepage)
J., I agree with most of your last post. I went to business school, and have been a hardcore businessman for twenty years, so economic theory language often doesn't make sense to me.

I believe that when Hicks cut the deal with A-Rod, he truly believed that the marginal revenue generated by A-Rod for his team would be greater than his marginal cost/salary. Turned out, he was wrong. So now he wants to trade him.

To the Red Sox, who also believe that the marginal revenue he will generate will be greater than his marginal cost/salary -- particularly in contrast to the Manny situation.

The notion of increasing marginal revenue as total player wins increase is important to the discussion, too.

Building the 2004 Expos (December 8, 2003)

Discussion Thread

Posted 7:09 a.m., December 9, 2003 (#5) - studes (homepage)
I feel strongly that declining to offer Vlad arbitration would be neither the "fair" nor the "moral" thing to do, even under David's circumstances.

The Expos invested a lot of money and time in Vlad. He was paid fairly and paid well for his contribution. Now, because of the rules of this labor market and the bizarre ownership structure of the Expos, he gets to walk and the Expos get nothing.

The rules are clear here, as are the repercussions. Even if the Expos were run by angels, I think offering Vlad arbitration would be fair and moral -- even if the Expos knew they would not be able to afford him.

Correlation between Baserunning and Basestealing (December 10, 2003)

Discussion Thread

Posted 2:13 p.m., December 10, 2003 (#8) - studes (homepage)
I may be wrong, but it seems to me that you have a huge multicollinearity issue when you include both SB and CS in your formulas (that is, the correlation between SB and CS is huge). I would think that it undermines the equations, though I don't know how much.
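One quick way to gauge how severe the SB/CS collinearity is: compute their correlation and the variance inflation factor (VIF), which for a model with exactly two predictors is 1 / (1 − r²). With more predictors, each VIF instead uses the R² from regressing that predictor on all the others. The SB/CS totals below are made-up numbers for illustration, and the "VIF above roughly 10" warning level is a common rule of thumb, not a hard cutoff.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def vif_two_predictors(r):
    """VIF for either predictor when the regression has exactly two of them."""
    return 1.0 / (1.0 - r * r)


# Hypothetical season totals for six players (not real data).
sb = [5, 10, 20, 30, 45, 60]
cs = [2, 4, 7, 10, 14, 18]

r = pearson_r(sb, cs)
print(f"r = {r:.3f}, VIF = {vif_two_predictors(r):.1f}")
```

A large VIF means the individual SB and CS coefficients are poorly pinned down (their standard errors are inflated), even if the fitted totals are fine.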

Correlation between Baserunning and Basestealing (December 10, 2003)

Discussion Thread

Posted 3:38 p.m., December 10, 2003 (#10) - studes (homepage)
Yah, I know you mentioned the correlation; I was just pointing out some of its implications.

Anyway, merging two of the formulas in this way is a bit more interesting, at least to me:

BS+BR = 0.20 * SB - 0.09 * CS - .01 * timeson1B

Harder to calculate, but it more dramatically shows the implicit "positive" correlation of CS on overall baserunning lwts, and also maintains a rate factor.
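The merged formula is straightforward to apply directly. A minimal sketch; the player line in the usage example is hypothetical.

```python
def baserunning_lwts(sb, cs, times_on_first):
    """Merged basestealing + baserunning linear weights, in runs:
    BS+BR = 0.20*SB - 0.09*CS - 0.01*timeson1B
    """
    return 0.20 * sb - 0.09 * cs - 0.01 * times_on_first


# Hypothetical player: 30 SB, 10 CS, on first base 150 times.
print(round(baserunning_lwts(30, 10, 150), 2))  # 3.6
```

The timeson1B term is what gives the formula its rate element: two runners with identical SB/CS totals come out differently depending on how many chances they had.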

Great stuff, Tango. Extremely insightful.

Eck (December 10, 2003)

Discussion Thread

Posted 9:15 p.m., December 12, 2003 (#2) - studes (homepage)
Bob, I would think that they very clearly do. Those were not two separate players you're talking about -- they're one guy. Think of it in terms of Win Shares, or something like that. What was the total value that he created, over his entire career? I personally wouldn't think of his splits as two separate players.

I'm not arguing that he belongs in the Hall -- he seems right on the borderline to me -- but I wouldn't undervalue his total worth by bifurcating his accomplishments in two.

Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)

Discussion Thread

Posted 7:21 p.m., December 11, 2003 (#2) - studes (homepage)
Same with Win Shares, by the way. Valentin was the top-rated shortstop in the AL.

Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 11:05 p.m., December 15, 2003 (#7) - studes (homepage)
Time for me to tackle Fibonacci. I was sort of sneaking up on this. I printed out the old Fanhome discussion a long time ago. So I'm going to plug in 61/1.61 just for kicks and see what happens. I'll get it done in a day or two.

Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 1:09 a.m., December 16, 2003 (#10) - studes (homepage)
Guy, I'm not completely following everything you say, but I do want to point out that Tango's numbers do occur today. The 3/5/7.8 distribution is pretty close to today's averages.

Also, it's very important to remember that moving the thresholds to 1.61/.61 DOES NOT imply that pitching/fielding are more important than batting. It's a mathematical thing. It means that a run not allowed helps slightly more than a run allowed hurts.

Or, from the batting point of view, a run not scored hurts a bit more than a run scored helps. Defense and offense are still 50/50 responsible for the outcome of a play.
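A simplified sketch of how moving the thresholds shifts credit between offense and defense. It assumes Win Shares' marginal-runs definitions (offense: RS − 0.52 × lgRS; defense: 1.52 × lgRA − RA, with lgRS and lgRA as league-average totals for the team's schedule) and ignores park adjustments and the later pitching/fielding split. For a perfectly average team, the offensive share of marginal runs works out to one minus the offensive threshold.

```python
def offense_share(rs, ra, lg_rs, lg_ra, off_threshold=0.52, def_threshold=1.52):
    """Fraction of a team's marginal runs credited to offense (simplified Win Shares)."""
    off_marginal = rs - off_threshold * lg_rs   # runs scored above the offensive baseline
    def_marginal = def_threshold * lg_ra - ra   # runs allowed below the defensive ceiling
    return off_marginal / (off_marginal + def_marginal)


# A league-average team (hypothetical: 750 runs scored and allowed).
print(round(offense_share(750, 750, 750, 750), 2))              # 0.48
print(round(offense_share(750, 750, 750, 750, 0.61, 1.61), 2))  # 0.39
```

Each run scored or prevented still moves the team's marginal runs by exactly one in both versions; only the baselines move, and with them the share of credit each side starts with.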

Also, regarding AED's comments on replacement level -- I think I understand the issue, but I'm not convinced it's a big deal with Win Shares. Yes, Win Shares does use a pseudo-replacement level for batting and fielding, but it's a relatively low replacement level. It's not nearly as high as Baseball Prospectus's, for instance.

Once/If I get to it, I have a feeling we'll find that replacement level for a position player is something like eight Win Shares. It won't matter if the Win Shares are from batting or fielding -- eight will about do it. And I think it will be lower for pitchers (at least, given how the current system is configured). Something like four.

I may be wrong (I often am), but it seems to me that you don't need a zero-based system to avoid double counting replacement levels.

Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 9:41 a.m., December 16, 2003 (#12) - studes (homepage)
I plugged in 61% this morning. Pretty interesting impact.

The percent of WS attributed to batting/baserunning drops from 48% to 39.5%. Pitching increases from 36% to 42%. Fielding goes from 16.6% to 19%. So offense clearly appears to be underweighted.

The WS leader (these are not the original numbers -- they include some of the methodological changes I've made) was Delgado at 33, but he drops to 31 and ARod becomes the leader. ARod only drops from 33 to 32. In fact, everyday players other than DHs and first basemen don't see their totals move much.

One of the unintended consequences (but maybe a good one) is that starters are helped a bit more relative to relievers. Their innings pitched help them garner a larger percentage of pitching Win Shares.

For instance, Hudson moves from 24 to 27, while Foulke moves from 20 to 22.

So I think the practical implications of Fibonacci are too extreme.

Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 11:07 a.m., December 16, 2003 (#15) - studes (homepage)
Steve, I agree completely. I didn't mean to make a categorical statement. I do believe replacement value will vary by position. Colin, I think you have a very valid point too. If you could elaborate (here or in the thread on my site) that would be great.

In the end, it may well be that varying replacement levels by position and function is the only thing that can make Win Shares whole.

Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 1:46 p.m., December 16, 2003 (#20) - studes (homepage)
Conversely, each additional marginal RS is less valuable than the preceding RS -- first is worth .083, second is .070. Perhaps this is underpinning of old saw that "great pitching beats great hitting?"

Well, as I tried to say before, this is a mathematical phenomenon. In particular, it's a result of basing a system on variance away from a .500 level, which is what Win Shares does. If you build a system on a bottom-up level, like WARP or Win Advancement, I would think you'd have different impressions of the relative weight between the two.

Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 9:55 p.m., December 16, 2003 (#26) - studes (homepage)
In other words, the total variance from 50% from one variable and 15% from another is similar to 52.2% from only one variable. This is equivalent to giving position players 60% of the win shares and pitchers 40%.

I apologize, but this conversation is beyond me. However, AED, I particularly did not follow this thought, and would like to. How did you get to the 52.2%, and then to the 60/40 split?

Thanks.

Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 12:16 p.m., December 17, 2003 (#31) - studes (homepage)
AED, that's great. Thanks for taking the time to go through the details. I'm not a mathematician (my son got those genes, which seem to skip a generation) but I'll spend a bit of time with your description and see what I can apply to Win Shares.

I know Win Shares drives some folks crazy (and I understand why) but hey, I'm having fun with it on my site, and a few people seem interested. If nothing else, all this work will help me better present some of the stats and graphs on my site next year. I certainly don't intend to re-work all of Win Shares, but a few tweaks can make the output more valid.

Having said that, Michael, if you want to try and tackle some of the methodology you outlined, I'd be happy to try it. I'd need a lot of supervision, however!

Anyway, thanks for the dialogue.

Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 1:35 p.m., December 17, 2003 (#32) - studes (homepage)
Putting 1 and 2 together, one concludes that Bill James, at least, feels that the standard deviations of offense, pitching, and fielding have a ratio of 52:35:17. Since most of offense and fielding are from position players, one can estimate that the ratio of standard deviation of total position player contributions to that for pitchers is 54.7:35, or 6:4.

This is where I still get lost. Has this methodology been covered in another thread that I'm missing? How does one make the leap from 52:35:17 to 54.7:35?
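The step being asked about looks like adding standard deviations in quadrature: if two contributions are independent (uncorrelated), their variances add, so the combined standard deviation is the square root of the sum of squares. That independence assumption is my reading, not something stated in the thread, but it reproduces both numbers quoted: √(50² + 15²) ≈ 52.2, and combining offense (52) with fielding (17) gives √(52² + 17²) ≈ 54.7 for position players. A sketch:

```python
import math


def combined_sd(*sds):
    """SD of a sum of independent components: variances add, SDs don't."""
    return math.sqrt(sum(s * s for s in sds))


print(round(combined_sd(50, 15), 1))  # 52.2

# Implied offense : pitching : fielding SD ratio of 52 : 35 : 17.
position_players = combined_sd(52, 17)  # offense and fielding combined
pitchers = 35
print(round(position_players, 1))                                  # 54.7
print(round(position_players / (position_players + pitchers), 2))  # 0.61, i.e. about 6:4
```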

Do Win Shares undervalue pitching? (December 15, 2003)

Discussion Thread

Posted 2:37 p.m., December 17, 2003 (#36) - studes (homepage)
Ah. Thank you.

Valuing Starters and Relievers (December 27, 2003)

Discussion Thread

Posted 9:41 a.m., December 28, 2003 (#11) - studes (homepage)
Excellent article. I agree with Guy's point, though I also agree with MGL. I cannot think of a way to formulate a study that would not include a significant selection bias.

David, no fair whetting our appetite. What's ALP?

Valuing Starters and Relievers (December 27, 2003)

Discussion Thread

Posted 12:48 p.m., December 28, 2003 (#13) - studes (homepage)
Sorry, David. I should have guessed that.

Valuing Starters and Relievers (December 27, 2003)

Discussion Thread

Posted 3:00 p.m., December 29, 2003 (#30) - studes (homepage)
...since making outs as a reliever is "easier," then relievers progress teams more quickly toward wins, on an inning-for-inning basis, and thus earn more win shares due to their role alone.

I think you may be mixing up skill and performance. Guardado and Loaiza performed similarly per inning (based on ERA), as did Spooneybarger and Eaton. Guy's point is that the starters of these two matched sets have a better set of skills/ability, and he feels that a value-based system ought to recognize that fact.

I also wanted to make a few Win Shares comments (if I may). The reasons relievers receive more Win Shares per inning are:

- The relative credit for wins and saves (which I've argued should be changed or dropped altogether because it favors relievers).
- The leveraged innings concept (Spooneybarger had 6 holds in 2003, which gave him some leveraged innings).
- The Component ERA (ERC) adjustment. Spooneybarger was also helped by this, as his base stats were very good (42 IP, 1 HR, 32 K's, 11 BB's and 27 H's). Starters don't get a component ERA adjustment.

So relievers earn more win shares per inning because the innings they pitch are more important than a starter's, not because their roles are easier. These are two different points altogether.

I'm torn about the best way to handle this, because value is value, regardless of the underlying skill. On the other hand, I think Guy's point is valid, because a reliever's value is based largely on the way he is used by the manager -- which makes reliever evaluation fundamentally different than any other evaluation.

I tend to think that replacement level is the way to go, though Tango and Colin raise good points. To me, replacement level is an economic concept and not a skill concept -- and the market is speaking pretty clearly about the relative value of starters vs. relievers (see Colon/Foulke).

Valuing Starters and Relievers (December 27, 2003)

Discussion Thread

Posted 3:23 p.m., December 29, 2003 (#33) - studes (homepage)
Perhaps the best way to establish replacement level for starters is per game started, not inning pitched.

Valuing Starters and Relievers (December 27, 2003)

Discussion Thread

Posted 3:42 p.m., December 29, 2003 (#34) - studes (homepage)
Sorry. I'm a bit out of order. #33 was in response to Colin and Guy.

Tango (#32), are you saying that Guy may have a point for an average pitcher, but that a replacement level per inning might be the same for starters and relievers? If so, how can a measurement system resolve that?

Best Fielding Teams, 2003 (December 28, 2003)

Discussion Thread

Posted 11:43 a.m., January 29, 2004 (#9) - studes (homepage)
Tango, I just came across Post #5 and I want to thank you. That is truly awesome work.

FIP and DER (December 30, 2003)

Discussion Thread

Posted 7:37 a.m., December 31, 2003 (#3) - studes (homepage)
Thanks for posting the article, Tango. How is the DERA definition not technically accurate?

Guy, every time I've looked at it, I've come to the conclusion that responsibility for DER is around 50/50, maybe as high as 60/40 pitching.

This isn't inconsistent with Voros' theory, by the way. Saying that pitchers don't have a lot of control over BIP is different from assigning responsibility for BIP. IMO, if fielders can't be expected to reach a batted ball, then the pitcher is responsible for the hit.
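For reference, DER is essentially the flip side of BABIP: the share of balls in play turned into outs. A minimal sketch using one common approximation; errors and sacrifices are ignored, the exact DER definition used in the article may differ, and the team totals are hypothetical.

```python
def babip_allowed(h, hr, bfp, k, bb, hbp):
    """Hits allowed on balls in play divided by balls in play (approximation)."""
    balls_in_play = bfp - hr - k - bb - hbp
    return (h - hr) / balls_in_play


def der(h, hr, bfp, k, bb, hbp):
    """Defense Efficiency Ratio approximated as 1 - BABIP."""
    return 1.0 - babip_allowed(h, hr, bfp, k, bb, hbp)


# Hypothetical team season: 6200 BFP, 1450 H, 170 HR, 1000 K, 500 BB, 50 HBP.
print(round(der(1450, 170, 6200, 1000, 500, 50), 3))  # 0.714
```

Under this definition every hit on a ball in play lowers DER, whoever is "at fault"; splitting that responsibility between pitcher and fielders is a separate question.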

BTW, I was going to compute XBHDER, using your formula, but that data isn't available historically so I backed off. I should probably conduct the analysis, but with a more recent group of pitchers.

I was thinking that the next step is to construct a pitching/fielding split for specific pitchers and teams, in which the pitcher credit/responsibility increases proportionally as FIP decreases. Not sure what the exact formula might be.

FIP and DER (December 30, 2003)

Discussion Thread

Posted 10:42 a.m., December 31, 2003 (#6) - studes (homepage)
I don't mean to bog this down, but why doesn't FIP have the same scale as ERA? I thought that was the point of it.
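For reference, a sketch of FIP as it is commonly written, (13×HR + 3×BB − 2×K) / IP. Without an added constant, that raw number sits well below ERA; adding the league ERA minus the league's raw FIP puts it on the ERA scale. The 13/3/2 weights are the commonly cited ones and may not match the version in the article exactly (later versions also fold in HBP), and the league and pitcher lines are hypothetical.

```python
def raw_fip(hr, bb, k, ip):
    """FIP without a league constant: (13*HR + 3*BB - 2*K) / IP."""
    return (13 * hr + 3 * bb - 2 * k) / ip


def fip_constant(lg_er, lg_hr, lg_bb, lg_k, lg_ip):
    """Constant that makes league-average FIP equal league ERA."""
    lg_era = 9 * lg_er / lg_ip
    return lg_era - raw_fip(lg_hr, lg_bb, lg_k, lg_ip)


# Hypothetical league totals and pitcher line.
c = fip_constant(lg_er=2700, lg_hr=700, lg_bb=2300, lg_k=4500, lg_ip=6000)
pitcher_raw = raw_fip(hr=20, bb=50, k=150, ip=200)
print(round(pitcher_raw, 2), round(pitcher_raw + c, 2))  # 0.55 3.43
```

Because the constant is the same for every pitcher in a league, it shifts the scale without changing anyone's ranking.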

FIP and DER (December 30, 2003)

Discussion Thread

Posted 1:42 p.m., December 31, 2003 (#8) - studes (homepage)
Thanks, Tango. I understand. However, given that you basically use linear weights for your weights in FIP, I think it comes pretty close to a proportion. Average DERA across all pitchers was 3.00, BTW.

I struggled with this article a lot, and I don't think I nailed it. The conclusion bugs me. In theory, shouldn't DERA decrease as FIP decreases? Fewer BIP, less impact from hits. Instead, DERA increases (and DER stays flat) as FIP decreases. I think I'm not doing this quite right.

My guess is that I'm using FIP in a way that wasn't intended, so I'm probably misinterpreting or misapplying something. In particular, it would seem to me that applying league-average weights to FIP for extraordinary pitchers is throwing off the conclusion. What do you think?

FIP and DER (December 30, 2003)

Discussion Thread

Posted 3:49 p.m., December 31, 2003 (#10) - studes (homepage)
Thanks, Tango. Agreed. I might try and play with this to see if I can develop a proxy for the weights (don't want to re-run your spreadsheet for every pitcher). Maybe I'll use the table you posted a week or two ago, in some way.

FIP and DER (December 30, 2003)

Discussion Thread

Posted 10:05 a.m., January 1, 2004 (#13) - studes (homepage)
Agree about the high-FIP pitchers. But even looking at pitchers with FIP of 1.5 and lower, DERA actually rises as FIP declines.

FIP and DER (December 30, 2003)

Discussion Thread

Posted 2:19 p.m., January 1, 2004 (#15) - studes (homepage)
Yes, I agree, Tango. I need to figure out some way to do that as a next step. I'd like to refine this analysis, and then use it to replace the current Win Shares methodology for splitting runs allowed between pitching and fielding.

FIP and DER (December 30, 2003)

Discussion Thread

Posted 7:57 a.m., January 2, 2004 (#17) - studes (homepage)
Thanks, Charlie. I'll send you an e-mail over the weekend about this, but I'm trying to develop an approach that would adjust the split, based on the underlying run environment established by the pitcher.

Your split formula (and James's) has an implicit assumption about a "baseline" split, if you know what I mean. I'm trying to see if there's a way to establish a "baseline" split based on the fundamental team and pitcher environment.

If I'm able to somehow make it work (a big if), then I hope to add your adjustments, including ADER, to refine the splits.

FIP and DER (December 30, 2003)

Discussion Thread

Posted 1:01 p.m., January 2, 2004 (#22) - studes (homepage)
I'm sorry that I'm not following your example, Tango, and I shouldn't be answering a question you posted for Charlie.

But from my point of view, the issue is whether you give K's a DIPS/FIP weight or a more typical out weight in the Win Shares framework. My interpretation may be off, but I think James gives them more of a typical out weight, and I think the DIPS/FIP weight is the better approach. I think Charlie agrees with this, too.

MGL - Component Regression Values (PDF) (January 8, 2004)

Discussion Thread

Posted 10:04 a.m., January 9, 2004 (#5) - studes (homepage)
THANK YOU! This is one of the areas I've wanted to better understand. MGL's article, and Tango's comments, are filling in one of the (many) holes in my understanding of some key concepts.

BTW, this has been a great offseason for Primate Studies. Nice job, Tango.

MLB Timeline - Best players by position (January 14, 2004)

Discussion Thread

Posted 11:32 a.m., January 14, 2004 (#1) - studes (homepage)
Wow. That is a nice presentation. It looks like output from Excel, which blows me away.

Of course, it would also be nice if they explained their rankings. But it is a good chart for what it does.

Mike's Baseball Rants - Hall Of Fame (January 21, 2004)

Discussion Thread

Posted 2:48 p.m., January 21, 2004 (#1) - studes (homepage)
Well, I'll chip in. Out of all the things that have been done with Win Shares this offseason, I think this is the best, most appropriate and creative use of them. Note that he's not using Win Shares to make a case for a player in particular (though he could) but he's using them to spotlight trends in voting.

I love a couple of those summary tables that clearly show how the Hall has gotten stingier in recent times.

Just a nice job all around. Mike's site doesn't get enough attention.

A Graphical History of Baseball (January 23, 2004)

Discussion Thread

Posted 10:12 a.m., January 24, 2004 (#2) - studes (homepage)
I love these graphs. One of my hopes is that, over time, sites will start to integrate stats with graphs like these. Wouldn't it be neat if Baseball Reference pulled up a graph next to the numbers when you looked up certain things?

Adam, I've been thinking of creating historic FIP and DER graphs. I'll do that and hopefully post them on my site over the weekend, and leave a message here when I do.

BTW, there are also some neat graphs on this page:

http://www.rose-hulman.edu/~rickert/BB/

I particularly like the graphs of cumulative W/L records of the eight original major league teams.

A Graphical History of Baseball (January 23, 2004)

Discussion Thread

Posted 4:56 p.m., January 24, 2004 (#3) - studes (homepage)
OK, I did it. You can find FIP and DER line graphs at the home page link, as well as a few other graphs I thought of.

Actually, this article turned out to be really interesting, I think. Beyond what I originally imagined. Please leave any comments you might have.

Baseball Graphs - FIP and DER (January 24, 2004)

Discussion Thread

Posted 11:22 a.m., January 25, 2004 (#1) - studes (homepage)
Thanks for the link, Tango. It seems highly likely that the new parks are responsible for a large portion of the increase in BABIP in the 1990's.

However, I think the 1940 transition is the most amazing thing. DER climbed from the mid .710s to the mid .730s in the course of three or four years and stayed there for several decades. There were apparently no park factors involved, and it happened in both leagues. I have never read anything about this before, have you?

I've got to say, it looks like one of the most fundamental, profound changes in the history of the game, and I had never heard of it before.

Baseball Graphs - FIP and DER (January 24, 2004)

Discussion Thread

Posted 12:11 p.m., January 26, 2004 (#8) - studes (homepage)
Interesting. I don't have the source file with me, but I believe the increase occurred from about 1938 to 1941. Straight up, and stayed up.

It makes sense that the gloves would have a big impact, but I didn't realize that they were adopted that quickly. Is there a "history of baseball gloves" anywhere? A chronology would be cool.

Baseball Graphs - FIP and DER (January 24, 2004)

Discussion Thread

Posted 12:12 p.m., January 26, 2004 (#9) - studes (homepage)
Never mind. Just went to coach's link.

Still, I'm sure there's a more detailed history of baseball gloves somewhere.

Baseball Graphs - FIP and DER (January 24, 2004)

Discussion Thread

Posted 2:19 p.m., January 26, 2004 (#11) - studes (homepage)
This article puts the development of the new glove at 1935, with the introduction of the official Rawlings version at 1941. This fits well with the DER record.

https://customglove.securelook.com/extras_gloveevolution.html

Fascinating stuff.

Baseball Graphs - FIP and DER (January 24, 2004)

Discussion Thread

Posted 4:11 p.m., January 29, 2004 (#13) - studes (homepage)
Sure sounds like it, doesn't it?

Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 2:54 p.m., January 27, 2004 (#15) - studes (homepage)
Very nice job by Jay, especially in the presentation of information.

Not to be a nattering nabob of negativism, but I do think that FIP is just as powerful, and a whole lot less work.

Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 11:54 p.m., January 27, 2004 (#25) - studes (homepage)
One-year park factors are my newest pet peeve. There is no good reason to use them, in this case or any other, if you have long-term park factors. It is just as easy to have a fluke park factor year as it is for Brady Anderson to hit 50 home runs in a year. Okay, maybe not that easy, but still easy. Why apply a fluke park factor to get at the "truth"?

I'm not jumping on you, Jay, because you're just following the methodology. But one-year park factors are really not much better than no park factors at all.
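A minimal sketch of the difference, assuming the simplest possible park factor: runs per game in home games divided by runs per game in road games, counting runs by both teams. Pooling several seasons' runs and games before taking the ratio dampens a single fluky year. The three seasons below are hypothetical.

```python
def park_factor(home_runs, home_games, road_runs, road_games):
    """Basic park factor: home R/G divided by road R/G (1.00 = neutral)."""
    return (home_runs / home_games) / (road_runs / road_games)


# Hypothetical seasons: (runs in home games, home games, runs in road games, road games)
seasons = [(800, 81, 720, 81), (760, 81, 740, 81), (900, 81, 700, 81)]

one_year = [park_factor(*s) for s in seasons]
multi_year = park_factor(*(sum(col) for col in zip(*seasons)))  # pool, then divide

print([round(pf, 3) for pf in one_year])  # [1.111, 1.027, 1.286]
print(round(multi_year, 3))               # 1.139
```

Many systems go further, for example regressing the factor toward 1.00 or applying only half of it because a team plays half its games at home; those refinements are left out here.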

Clutch Hitters (January 27, 2004)

Discussion Thread

Posted 7:35 a.m., January 29, 2004 (#10) - studes (homepage)
That's interesting to me; I'd like to see the list of leaders and laggards in batting LI.

Have you ever looked at batting LI before? You could potentially use it to (once again) see if batting order really has any impact (do cleanup hitters have the most LI?), or judge to see which hitters were most "on the spot."

Smack the Pingu (January 29, 2004)

Discussion Thread

Posted 8:13 a.m., January 30, 2004 (#4) - studes (homepage)
1216 here. A negative number? Did you swing backwards or something?

Tango, is this Canadian baseball?

Smack the Pingu (January 29, 2004)

Discussion Thread

Posted 2:46 p.m., January 30, 2004 (#11) - studes (homepage)
I'm pretty sure bob is cheating.

Smack the Pingu (January 29, 2004)

Discussion Thread

Posted 5:28 p.m., January 30, 2004 (#16) - studes (homepage)
told ya.

Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 12:26 p.m., January 30, 2004 (#6) - studes (homepage)
Great job, Tango. Two simple questions: did you correct for ballpark at all? (I assume you did).

Also, why did you use the ratio of performance to the league, instead of straight performance metrics? Was that to isolate the quality of batters faced?

The genius of Paul DePodesta (February 4, 2004)

Discussion Thread

Posted 11:23 a.m., February 10, 2004 (#27) - studes (homepage)
BTW, a friend of mine heard him speak at an investor conference, and said he was super. Great speaker. DePodesta is originally from the East Coast, and said, in response to questions, that he wouldn't mind going back to the East Coast. Something like "If the Expos move to D.C., they could get me cheap."

Also, he said that Michael Lewis originally was just working on a NY Times Sunday article when he was covering the A's. It wasn't going to be a book. Evidently, they cut him off when they realized that he was going farther (after about a month of observation), and DePodesta claims that Lewis must have gotten a lot of his information from other sources.

Batter's Box Analysis (February 5, 2004)

Discussion Thread

Posted 12:37 p.m., February 5, 2004 (#2) - studes (homepage)
Neat stuff. Robert is one of the best analysts/writers around. I like the Manifested Power, as well as the Kansas City Futility Score.

How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)

Discussion Thread

Posted 5:03 p.m., February 10, 2004 (#2) - studes (homepage)
Great work as usual, MGL. Do you really mean some of your comments? For instance, is Delgado really not nearly one of the best players around? -3.5 runs over 162 games for his baserunning doesn't seem to warrant such a downgrade.

Guess we'll see when all the slwts are finished.

Peak Age by Year of Birth (February 11, 2004)

Discussion Thread

Posted 1:00 a.m., February 12, 2004 (#1) - studes (homepage)
Alright! Great graph, tells a story, love it.

Seems pretty clear that the average peak age has been increasing over the past fifteen years. I wonder how he weighted the players? By plate appearances, or maybe not at all? Was there a minimum career length?

Great job, bob.

ARod and Soriano - Was the Trade Fair? (February 16, 2004)

Discussion Thread

Posted 11:19 p.m., February 16, 2004 (#7) - studes (homepage)
According to a recent study I read re: incremental revenue, an extra win is worth less than $1M to a non-contending team. The Rangers weren't going to contend, so ARod's contract didn't make sense. It obviously makes a lot of sense to a contending team in media heaven, like the Yankees.

I thought both articles were very weak, too.

ARod and Soriano - Was the Trade Fair? (February 16, 2004)

Discussion Thread

Posted 6:54 a.m., February 17, 2004 (#10) - studes (homepage)
Yes, signing Pudge was a gigantic waste of money. You knew that.

Of course, the Tigers are desperate, so who knows what the economic rules should be for them.

You're right about the economics. Build a team with young, cheap guys, then spend for free agents when you're ready to contend. Ideally.

It would be an interesting study to go back and review non-contending teams that signed big contracts with free agents, to see how many of them "worked."

BTW, I put a link to my actual review of the article in the homepage field.

ARod and Soriano - Was the Trade Fair? (February 16, 2004)

Discussion Thread

Posted 5:26 p.m., February 17, 2004 (#24) - studes (homepage)
Erik, that's an interesting way of describing the "player market". Can you elaborate a bit? How does the 15-slot constraint play out when "inflating" the salaries of marquee players?

I tend to think of it as game theory, but I think you're describing something a bit different.

Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)

Discussion Thread

Posted 12:18 p.m., February 17, 2004 (#6) - studes (homepage)
By definition, the sum of the Win Shares numbers for all players should equal zero. If they don't then there is an error in the numbers or the methodology. I'll go back and review the data I sent Avkash.

I also agree that WS scale is an issue. MGL, I agree about the preference for UZR. But I do appreciate what Avkash did, because I'm interested in improving Win Shares' fielding numbers, and this approach helps a lot. I'm hoping to eventually use Michael Humphries' DRA approach, when he publishes his equations.

Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)

Discussion Thread

Posted 12:32 p.m., February 17, 2004 (#8) - studes (homepage)
I'd be interested in those "r" stats with the right outfield numbers. My guess is that Rate2 and WS would be about equivalent.
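The "r" here is the Pearson correlation between two systems' ratings of the same players. Below is a minimal sketch for anyone who wants to run it. The ratings are made up and are not real Rate2 or Win Shares figures, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings for the same five outfielders under two systems.
rate2 = [12.0, 4.0, 0.0, -3.0, -9.0]
win_shares = [6.1, 4.8, 3.9, 3.5, 2.0]
print(round(pearson_r(rate2, win_shares), 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient.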

Of course, I'll run it myself when I get a chance.

Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)

Discussion Thread

Posted 1:23 p.m., February 17, 2004 (#11) - studes (homepage)
"Win Shares undervalues defense and overvalues regulars."

I understand the second point (because the mean is higher) but not the first. Are you saying this because the SD is lower for WS? Do you mean WS does not properly credit outstanding fielders?

If so, I agree.

Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)

Discussion Thread

Posted 1:54 p.m., February 17, 2004 (#14) - studes (homepage)
I know I'll get hammered for this, but I should point out that Win Shares is not just a measure of range. It also includes claim points for double plays, errors, outfield assists, etc. I'm not defending Win Shares for doing this, just pointing out that it isn't a straight measure of range, as Pinto's is. I'm not sure which UZR metric Avkash used.

I was looking through some old Baseball Abstracts and was surprised to find that Bill James actually laid out his Win Shares fielding structure in the 1982 Abstract. He rated fielders in four different buckets, which differed by position and totaled 100 points, I think. It was virtually EXACTLY what went into fielding Win Shares.

Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)

Discussion Thread

Posted 1:57 p.m., February 17, 2004 (#15) - studes (homepage)
Agreed, Tango. One of the problems with fielding Win Shares is lack of negative Win Shares.
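Here's a toy illustration of why the missing negatives matter. If below-average fielders get zero instead of a negative number, the system's mean goes up and its spread shrinks. This is a minimal sketch with hypothetical runs-above-average figures. Flooring at zero is a simplification for illustration and is not Bill James's actual procedure.

```python
import statistics


def floor_at_zero(values):
    """Replace every negative rating with zero (no 'negative Win Shares')."""
    return [max(0.0, v) for v in values]


# Hypothetical fielding runs above average for six players.
runs_above_avg = [15.0, 8.0, 2.0, -4.0, -9.0, -12.0]
floored = floor_at_zero(runs_above_avg)

print(round(statistics.mean(runs_above_avg), 2),
      round(statistics.pstdev(runs_above_avg), 2))  # 0.0 9.43
print(round(statistics.mean(floored), 2),
      round(statistics.pstdev(floored), 2))         # 4.17 5.61
```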

Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)

Discussion Thread

Posted 5:09 p.m., February 17, 2004 (#18) - studes (homepage)
Thanks, Avkash. I understand you used UZR/162 games. But there are several types, I think. For instance, there's straight range, but there are also DP impact numbers, etc. Depends on which column you pulled.

Baseball Prospectus - : Evaluating Defense (March 1, 2004)

Discussion Thread

Posted 7:16 a.m., March 2, 2004 (#4) - studes (homepage)
Right on, Tango. There are at least four systems I would choose over DFT: ZR, UZR, DRA and Pinto. And, like you, I don't even know what goes into DFT. Because of that, I'd even reference fielding Win Shares before DFT! At least I know what some of the flaws in WS are.

Baseball Prospectus - : Evaluating Defense (March 1, 2004)

Discussion Thread

Posted 1:46 p.m., March 2, 2004 (#25) - studes (homepage)
That's a weird response. Who is the target audience? I would presume it is a relatively informed baseball fan with an analytic bent. And why wouldn't that sort of person be interested in UZR and other systems?

BPro writes some very informative stuff, but their attitude stinks.

Baseball Prospectus - : Evaluating Defense (March 1, 2004)

Discussion Thread

Posted 2:12 p.m., March 2, 2004 (#28) - studes (homepage)
Dotterer, you're right. I really should amend my statement to say that sometimes I don't understand BPro's attitude. Full disclosure: I'm a BPro subscriber, so I obviously like what they do.

Baseball Prospectus - : Evaluating Defense (March 1, 2004)

Discussion Thread

Posted 12:28 a.m., March 5, 2004 (#37) - studes (homepage)
As I noted in the previous thread, systems like Win Shares that are based on more than just range are going to have lower standard deviations than systems that estimate range alone.

Like it or not, Win Shares includes double plays (in fact, ranks them ahead of range for second basemen), error rates, etc. etc.

Could this be part of what's going on with DFT? I honestly don't know what goes into the DFT equations, and it wouldn't explain the preponderance of negative rankings, but it might help explain the lower variance.
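Here's a toy illustration of the lower-variance point. It assumes a system that averages a range rating with a double-play rating that varies less than range does and isn't perfectly correlated with it. The numbers are hypothetical, and the 50/50 weighting is an assumption, not Win Shares' actual formula.

```python
import statistics


def composite(range_runs, other_runs, range_weight=0.5):
    """Weighted average of a range rating and a non-range rating."""
    w = range_weight
    return [w * r + (1 - w) * o for r, o in zip(range_runs, other_runs)]


# Hypothetical ratings for five second basemen.
range_runs = [10.0, 5.0, 0.0, -5.0, -10.0]
dp_runs = [-3.0, 4.0, 0.0, 2.0, -3.0]

mixed = composite(range_runs, dp_runs)
print(round(statistics.pstdev(range_runs), 2))  # 7.07
print(round(statistics.pstdev(mixed), 2))       # 3.92
```

The spread shrinks because of the less variable, imperfectly correlated component. If the extra components were as spread out as range and moved in lockstep with it, the composite would be no narrower.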

The 2004 Marcels (March 10, 2004)

Discussion Thread

Posted 5:45 p.m., March 22, 2004 (#31) - studes (homepage)
Great job, Tango. Question: did you think of taking a FIP/DIPS approach to the pitching stats? Or would regression to the mean take care of that, in theory?

The 2004 Marcels (March 10, 2004)

Discussion Thread

Posted 7:13 a.m., March 25, 2004 (#33) - studes (homepage)
I was thinking that a FIP/DIPS approach is a better way to predict ERA than using the previous three years' ERA. Take the BABIP component out of each of the last three years, average and regress the result, and then add BABIP back in.
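That suggestion can be sketched in a few lines. This is a minimal sketch of one way to do it, not the Marcel method itself. It uses FIP as the "BABIP removed" measure and weights the last three seasons (the 3/2/1 weights are an assumption). It regresses toward the league by blending in a hypothetical 100 innings of league-average FIP. Because FIP's constant puts it on the ERA scale, a roughly league-average BABIP is effectively added back in through that constant. The function names and the 3.10 constant are assumptions.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    The constant (3.10 is an assumed placeholder) puts FIP on the ERA
    scale; in practice it is recalibrated each season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def projected_era(seasons, league_fip, weights=(3, 2, 1), regress_ip=100.0):
    """Forecast ERA from a weighted, regressed multi-year FIP.

    seasons: dicts with hr, bb, hbp, k, ip keys, most recent first.
    weights: hypothetical recency weights, most recent first.
    regress_ip: hypothetical innings of league-average FIP blended in,
        which pulls the estimate toward the mean.
    """
    total = league_fip * regress_ip  # league-average "ballast"
    innings = regress_ip
    for season, weight in zip(seasons, weights):
        season_fip = fip(season["hr"], season["bb"], season["hbp"],
                         season["k"], season["ip"])
        total += weight * season["ip"] * season_fip
        innings += weight * season["ip"]
    return total / innings


# Hypothetical pitcher with three identical 200-inning seasons.
season = {"hr": 20, "bb": 50, "hbp": 5, "k": 150, "ip": 200}
print(round(fip(**season), 3))                                  # 3.725
print(round(projected_era([season] * 3, league_fip=4.50), 2))   # 3.78
```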

Silver: The Science of Forecasting (March 12, 2004)

Discussion Thread

Posted 3:13 p.m., March 12, 2004 (#7) - studes (homepage)
Tango, that's a great question. My gut tells me that it does matter, that you do want to forecast similar players based on components. I think this is particularly true of pitchers, and somewhat true of hitters. And I might wonder if there is some cross-impact between a certain skill set and a certain ballpark when forecasting players.

OTOH, I don't think the components have very much impact, and the added accuracy may not be worth the added work and complexity.

This was a nice article. Nate is the best at using graphs in his work. Well, second best...

Sophomore Slumps? (March 23, 2004)

Discussion Thread

Posted 1:26 p.m., March 23, 2004 (#6) - studes (homepage)
Shoot, he showed it to me before publication last night, and it was the first thing I said!

Got to remember that Aaron is writing to a different type of audience than the folks who drop by here.

Sophomore Slumps? (March 23, 2004)

Discussion Thread

Posted 5:21 p.m., March 23, 2004 (#12) - studes (homepage)
Calm down, MGL. THT is not primarily about cutting-edge analysis, like you guys do here. It's not even about analysis, really. For instance, I won't be posting my heavier research articles on THT. I'll put those on baseballgraphs, still.

THT's articles will be about baseball, and the writing will be aimed at the common fan. I would expect you to be dissatisfied with a lot of it.

This article was no different than the sort of thing Aaron has been posting on his blog all along.

Sophomore Slumps? (March 23, 2004)

Discussion Thread

Posted 6:06 p.m., March 23, 2004 (#14) - studes (homepage)
That's a good point, MGL. Actually, that's a good idea for a follow-up article: what happens to most players when they have a year similar to a typical ROY? I think we all know the answer, but it would be constructive to carry out the analysis.
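The follow-up study could start with something like this minimal sketch. It takes every player-season at or above some ROY-like cutoff, whether or not the player was a rookie, and averages how much the same metric changes the next year. The OPS figures, the 0.850 cutoff, and the function name are all hypothetical.

```python
from statistics import mean


def next_year_change(seasons, threshold):
    """Average next-year change after seasons at or above a cutoff.

    seasons: dict mapping player -> list of (year, metric) pairs.
    threshold: hypothetical cutoff for a "ROY-like" season.
    Returns None if no qualifying season has a following year.
    """
    changes = []
    for history in seasons.values():
        by_year = dict(history)
        for year, value in history:
            if value >= threshold and year + 1 in by_year:
                changes.append(by_year[year + 1] - value)
    return mean(changes) if changes else None


# Hypothetical OPS histories.
seasons = {
    "player_a": [(2002, 0.900), (2003, 0.850)],
    "player_b": [(2002, 0.700), (2003, 0.720)],
    "player_c": [(2002, 0.880), (2003, 0.800)],
}
print(round(next_year_change(seasons, 0.850), 3))  # -0.065
```

A real study would also want to weight by playing time and compare rookies against veterans with similar seasons.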

Mo and the HOF (March 25, 2004)

Discussion Thread

Posted 2:26 p.m., March 25, 2004 (#6) - studes (homepage)
Not to be a Win Shares apologist, but Matthew only starts his article with Win Shares, and then goes on to other stats. I do think that HOF qualification is one of the best uses of Win Shares.

I think we're all pretty certain that Win Shares overvalues today's relievers. Arguably, every statistic overvalues relievers, as we discussed in a previous Primer discussion (see homepage link).

I agree that Mo has been lights out in the postseason, but he's also had many, many more opportunities than other players, because of the teams he's played on. How much credit should he be given for that?

Copyright notice

Comments on this page were made by person(s) with the same handle, in various comments areas, following Tangotiger © material, on Baseball Primer. All content on this page remains the sole copyright of the author of those comments.

If you are the author, and you wish to have these comments removed from this site, please send me an email (tangotiger@yahoo.com), along with (1) the URL of this page, and (2) a statement that you are in fact the author of all comments on this page, and I will promptly remove them.