Individual Poster Page



Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 2:53 p.m., June 19, 2003 (#5) - RossCW
  I don't see any evidence for the claim that more BBs and Ks mean more pitches are being thrown. It seems to make sense on the surface, but are there any studies that show the current average number of pitches before a ball is put in play, compared to Ks and walks?

Even if you look at current data, that does not mean the same percentages held 20 or 40 years ago, when striking out was considered a much worse outcome. Batters would choke up and focus on just making contact to avoid a third strike. That could easily mean many more foul balls and weak balls in play after lengthy at bats than there are today.

- higher OBA means more batters / game meaning tougher to get a complete game

Again - is there solid evidence for this - or is it speculation?

A rough OBA of (H+BB)/(AB+BB) shows MLB OBA in the '50s to be consistently higher than last year's. It was high in 1999 and 2000, but last year it was .327, slightly below the .328 average since 1920, and in 2001 it was right at the average.
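That rough figure is easy to recompute. A minimal sketch, using made-up season totals rather than actual league data:

```python
# Rough on-base average as defined above: (H + BB) / (AB + BB).
# This ignores HBP and sacrifice flies, so it only approximates true OBP.
def rough_oba(hits, walks, at_bats):
    return (hits + walks) / (at_bats + walks)

# Illustrative totals only, not a real season line:
example = rough_oba(hits=1400, walks=500, at_bats=5500)
print(round(example, 3))  # 0.317
```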

I am also not convinced that OBA is a proxy for the number of pitches a pitcher throws - especially across eras. There are a lot of other variables that affect pitches thrown.


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 4:37 p.m., June 19, 2003 (#9) - RossCW
  there are about 5.4 pitches thrown per BB, 4.8 per K, and 3.3 per BIP

Has that been consistent across eras?

So, if you have a pitcher with lots of (BB+K) / PA, you can bet that that pitcher will have more pitches / PA than league average

Well you can bet on it, but it may not be true. The average pitches thrown for each category may vary widely by pitcher.
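Taking the per-event averages quoted above at face value, the estimate is just a weighted sum. The game line below is hypothetical, purely for illustration:

```python
# Estimate pitches thrown from event counts, using the averages quoted
# above: about 5.4 pitches per BB, 4.8 per K, 3.3 per ball in play.
def est_pitches(bb, k, bip, per_bb=5.4, per_k=4.8, per_bip=3.3):
    return bb * per_bb + k * per_k + bip * per_bip

# Hypothetical complete game: 3 BB, 7 K, 28 balls in play (38 batters).
total = est_pitches(bb=3, k=7, bip=28)
print(round(total, 1))       # 142.2 pitches
print(round(total / 38, 2))  # 3.74 pitches per batter
```

The per-event rates are averages, which is exactly the objection here: a pitcher whose own rates differ from 5.4/4.8/3.3 breaks the estimate.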

Is this what you are getting at?

No, although that is part of it. The question is what the effects are of the difference in Ks and walks between eras, which is the way Sheehan is using it. It is perfectly possible that 50 years ago more two-strike counts ended in a ball in play rather than a third strike. That does not change the number of pitches at all.

I'm not sure why you are talking about the 50s, when the low OBA was probably in 1967/68

The point is that today's OBAs are not really higher than other eras' - they are about average. I don't have the numbers in front of me, but MLB OBA appears to be quite volatile from one year to the next.

So, as the OBA rises, so does the number of batters per 27 outs.

Assuming that the number of double plays, caught stealings and runners thrown out advancing are all the same. I don't think they are.
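For reference, the version of the claim that does assume away those baserunning outs is a one-liner; the OBA values below are just illustrative:

```python
# If every out comes at the plate (no DPs, CS, or runners thrown out
# advancing), a pitcher must face 27 / (1 - OBA) batters for 27 outs.
# Baserunning outs reduce this figure, which is the caveat above.
def batters_per_27_outs(oba):
    return 27 / (1 - oba)

print(round(batters_per_27_outs(0.300), 1))  # 38.6
print(round(batters_per_27_outs(0.330), 1))  # 40.3
```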

With more batters available because of the run environment/playing conditions, the more pitches needed overall.

Assuming the number of pitches per batter is constant for all pitchers and regardless of the number of batters.

I agree with your last point that there are more variables at play for pitches thrown across eras.

I think there is a fundamental problem with using average results to compare elite pitchers. The question is whether Roger Clemens has to face more batters and throw more pitches to accomplish the same things Steve Carlton did. The fact that Sean Bergman gave up a lot of baserunners or that Bobby Witt struck out and walked a lot of batters doesn't seem to have much to do with that. And yet when we compare era totals they are included in the analysis - it may be there were fewer Bobby Witts and Sean Bergmans pitching in the mid-60's, but that doesn't make Steve Carlton's job easier than Clemens's.


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 4:51 p.m., June 19, 2003 (#10) - RossCW
  Of course Carlton didn't really pitch in the mid-60's, but the point is the same if you use the mid-70's or Bob Gibson.


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 11:44 p.m., June 19, 2003 (#15) - RossCW
  I think I understand what you are saying, but then I don't.

Well, I'm losing track of the issue I originally raised - the question of whether you can conclude that pitchers in earlier eras had an easier time completing games.

One argument for this idea, as I understood it, was that OBAs are higher today, so pitchers have to face more batters to complete a game. Another was that batters draw more walks and strikeouts. My point was that this is pitcher specific. There is no reason to think that batters are more likely to get on base against the same quality of pitching today than they were in earlier eras. The same is true of Ks and BBs. So there is no reason to think that a pitcher of the same quality will need to throw more pitches to complete a game today than he did 20 or 30 years ago.

I agree there is a close relationship for an individual pitcher between his opponents' OBP and PA/9IP. There may be some variations in CS, pickoffs, double plays and runners thrown out, but I doubt they are significant. When you compare different eras, I am not sure that is as true. When I looked at CS per baserunner, the rate was not constant.


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 4:19 p.m., June 20, 2003 (#19) - RossCW
  In other words, a specific pitcher who allows 8 hits, 3 walks, and 7 strikeouts per 9 innings in 1920 (if there was one) should be expected to throw about as many pitches per inning as someone in 2003 who has the same numbers. Is that what you are saying?

It seems obvious that that is true. But a pitcher of the same quality would also be expected to have the same number of hits, walks and strikeouts per 9 innings in both eras.

It may be that any overall increase in OBA is an effect of the best pitchers pitching a smaller percentage of the total innings rather than a cause.


Sheehan: Pitcher Workloads (June 19, 2003)

Discussion Thread

Posted 7:37 p.m., June 20, 2003 (#21) - RossCW
  Two pitchers of the same quality should be expected to have the same k's, walks, hits only if they pitch in the same circumstance. The differences in level of competition, size of hitters, hitters approaches, ballparks, fielding ability will affect all the pitchers numbers.

I agree.

How many K's per inning would Randy Johnson have in the 1930's, when hitters used heavier bats and many were more concerned about making contact than hitting for power? Probably less than he has now.

But then what is the common measure of quality? It's possible Randy Johnson wouldn't have been at all successful in the 30's. It's possible Walter Johnson wouldn't be successful today. It seems to me that we need to assume that pitchers with the same numbers in different eras were of similar ability.


Redefining Replacement Level (June 26, 2003)

Discussion Thread

Posted 8:06 p.m., June 26, 2003 (#10) - RossCW
  No. Why is it better to use an "exact" calculation which has limited real-world utility, than a more "inexact" calculation which has proper real-world relevance? ... what logic tells us is a better concept of value.

It seems to me that measuring from a barely discernible starting point means you have to account for that in the result. You may have a tape measure accurate to within inches - but the actual accuracy of your measurement is closer to yards or even miles if your starting point is that far off.

The problem I have with Silver's argument is that it seems to move from a theoretical replacement level to a practical one. From a practical standpoint, the replacement players available to a team for little or nothing are the ones already on its major league roster. Any other player at minimum requires opening a roster spot and exposing some other player to waivers. Moreover, teams use both the major league and minor league rosters to store spare parts. So if a team doesn't have a player on those rosters who meets the replacement level, it is likely because one wasn't available to it. That means the real replacement level is the worst AAA replacement catcher (distinguishing replacement catchers from starting catchers who are prospects but may currently be even worse than their backups).


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 8:39 p.m., August 5, 2003 (#28) - RossCW
  Just want to point out that post #16 Erik Allen brings up something important ... the lower coefficient on XBH doesn't necessarily have any implications about baseball, it could easily be explained as the nature of the data

As could the difference between the year-to-year correlation of BABIP and that of K/9 or HR or ... In fact it's quite likely that the relative correlation of two different stats from year to year has no baseball meaning.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 12:35 p.m., August 6, 2003 (#36) - RossCW
  Let me just start by saying that I agree with everything you say up to the end of the official quote. You are absolutely right, IMO, to say that correlation coefficients can give us an indication of how predictive a given statistic will be for the next year. For many situations, this is all we really need, since we are simply trying to project next year's performance...my only objection was in trying to relate these correlation coefficients to physical realities of the game (i.e. attributing blame or credit to the hitter, pitcher, or fielder). I think that going down that road is very difficult to justify.

This is a much clearer statement of what I was trying to say.


DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:53 a.m., August 7, 2003 (#47) - RossCW
  Therefore, a low correlation coefficient ON ITS OWN is not enough to say that a talent or persistence of ability does not exist.

Which is a point I have tried to make several times. But no one owes me an apology. I had a very strong hunch but lacked the statistical knowledge to identify the specific problem.

The question is: Can we create a model in which pitcher ability is fixed, and have that model describe the observed variability in BABIP? If you can, then the source of variability is really irrelevant.

Can you expand on this? I don't really understand whether you are reiterating the point about r's or saying something else.


Game-Calling Revisited - Chris Dial (August 16, 2003)

Discussion Thread

Posted 11:23 a.m., August 17, 2003 (#1) - RossCW
  This study seems to depend on the discredited parts of Voros's claims. Or am I missing something?


CF Rankings (August 22, 2003)

Discussion Thread

Posted 12:33 p.m., August 31, 2003 (#19) - RossCW
  I don't think outfield range is a zero-sum game. If the left fielder catches a ball, that doesn't mean the center fielder has less range - it may mean the left fielder has more. That's probably part of the problem with Torii Hunter's ranking - he plays between outfielders with the range to play center.



Double-counting Replacement Level (August 25, 2003)

Discussion Thread

Posted 10:28 p.m., November 4, 2003 (#47) - RossCW
  Eating dogshit is not the equal of eating filet mignon

'Tis true, there are many people in the world with a clear understanding of the difference between right and wrong who would tell you that eating dogshit is, morally, vastly superior.

I am probably wrong my share of the time. But that doesn't mean that everything is created equal.

Certainly not all shares are equal - you must be a lion.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 2:34 a.m., August 31, 2003 (#5) - RossCW
  Whether a ground ball scores the runner often depends on the decision of the fielders. If they are playing in, they will go to the plate - otherwise they are conceding the run. While that is not always the case for groundballs, it is virtually never the case for fly balls.

Maybe so, but we are only awarding statistical credit here for what did happen, not for what was most likely to happen. Furthermore, the only studies I've seen suggest that there is no "ability" to hit a FB in a SF situation that is different than the player's overall tendency to hit FBs.

I don't see why it matters. The player who had the ability to hit fly balls more often is rewarded for it in a situation where it is a better outcome.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 1:03 p.m., November 3, 2003 (#9) - RossCW
  , but we are only awarding statistical credit here for what did happen, not for what was most likely to happen.

That is true - we only give "credit" when the fly ball does score the run.

Furthermore, the only studies I've seen suggest that there is no "ability" to hit a FB in a SF situation that is different than the player's overall tendency to hit FBs.

I doubt this is true, but why would it matter? It still means that player had the ability to hit a fly ball that scored the run. Do we count home runs based on whether the player was trying to hit one or it simply happened?

A ground ball isn't really the same. In that case, it is the defense that is making the choice between the sure out at first and the possible out at the plate. It is the runner going or not going that initiates that choice.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 1:06 a.m., November 4, 2003 (#11) - RossCW
  As I recall, a batter's longer-term rate of hitting a SF (in a SF situation) is the same as his tendency to hit FBs in general.

If true, wouldn't that indicate they must be hitting FB at a higher rate in SF situations since clearly not every FB will lead to a successful SF?

a GB out(including all runner advancement and GDPs) and a FB out(including SFs, outs on OF Assts, etc.) have almost exactly the same value

Again, if true, it means that a FB is more likely to score a runner since I think groundballs will clearly advance runners on first and second more often than a flyball will.

So I just don't see what is the special quality of SFs which merits their being kept as a special category in modern baseball

As I pointed out, they are the only out that directly leads to a run scored. A ground ball out scores a run only if the fielder decides to allow the run to score. You will be hard-pressed to find a game that was won on a ground ball out. It's probably happened, but the circumstances must have been pretty extraordinary.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 1:28 p.m., November 4, 2003 (#13) - RossCW
  "the league rate is to have a SF on 40% of all FB, then our above hitter will end up with 25 SF per 100 (FB+GB)."

I understood the claim was that his "longer-term rate of hitting a SF (in a SF situation) is the same as his tendency to hit FBs in general." Not that he hit FB in SF situations at the same rate as other situations.

But either way it doesn't prove much. After all, in SF situations not only is the hitter trying to hit a fly ball, the pitcher is trying to prevent it. And in critical situations managers may even choose pitchers or hitters for that reason.

I doubt you will find hitters that have a special ability in hitting SF, beyond what is known about their hitting profile.

That may be. But it seems obvious that if a player is trying to hit a fly ball he is more likely to hit one than if he isn't, and if the pitcher is trying to prevent a fly ball he is more likely to prevent it than if he isn't.

Anyway, it's stupid to make the distinction in the official stats this way.

Just record what happened.

No one is eliminating the plate appearance from official records. The question is whether you are going to penalize a player who does his job driving in the run. The reason we don't take away an at bat for a ground ball that scores the run is the same reason we don't give players a hit on a fielder's choice. It was primarily the fielder's decision, not the batter's ability, that determined the outcome.

I see no reason to treat failed SB the same as successful ones. There is simply no reason not to treat them as at bats, and you don't count successful ones for the same reason you don't count SF.

since the pitching/hitting approach are completely different.

Hitting and pitching approaches change in a whole variety of situations. There is really very little reason to think that results can be measured and then treated as random samples. I sometimes wonder if the whole sabermetric community isn't in denial about that fact.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 10:16 p.m., November 4, 2003 (#17) - RossCW
  A batter does "his job" by scoring the runner from 3B? Nope. The win probability in almost all cases says that the batter REDUCED the chances of his team winning

I don't know how you are calculating win probabilities, but this ought to be a red flag for your methodology. I don't think it is credible to say that a team is more likely to win with one out and a runner on third than with that runner scored and two outs. It may be true if they are down by 10 runs, but that is about it.
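In raw run-expectancy terms the intuition here checks out: trading the out for the run raises the expected runs for the inning. The RE values below are rough figures of the kind published for recent seasons, used purely for illustration, not an official table:

```python
# Approximate run expectancy (runs expected over the rest of the inning)
# for two base/out states. Rough, illustrative values only.
RE = {
    ("3rd", 1): 0.98,    # runner on third, one out
    ("empty", 2): 0.12,  # bases empty, two outs
}

before = RE[("3rd", 1)]
after = 1.0 + RE[("empty", 2)]  # the run scores; bases empty, two outs
print(round(after - before, 2))  # positive: scoring the runner helped
```

Whether the *win* probability moves the same way depends on score and inning, which is the caveat about being down by 10 runs.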

What you do is assume randomness to make life easy, but being aware that there's a margin of error in so doing.

The problem with this convenient assumption is that it allows people to pick and choose conclusions based on personal preference while appealing to their findings as "objective" evidence.

So, what REALLY gives a more honest representation of what happened?

What does this mean? What happened, happened. At the end of a game, does a team having 15 hits matter more than that it only scored one run, while its opponent scored two runs, both on sacrifice flies, and those were the only runners on that team to reach base? What is the "more honest representation of what happened", the 2-1 score or the 15-2 margin in hits? Both happened and neither is "more honest" than the other. But it's pretty clear who won the game.

What it comes down to is that regardless of the batter/pitcher approach with a man on 3B and less than 2 outs, the results are better represented when counting the SF as an out in the air/ground ratio, and counting the SF as an out in AB.

You think that a fly ball that scores a run is pretty much the same as a fly ball that is just an out. I don't. I am not sure, given this argument, why you think we should make a distinction between a fly ball that is a hit and one that isn't. Or a foul fly and a fair fly. Or a foul ball in the stands and one that is caught for an out. Or ... you can find all sorts of apparent equivalencies that aren't really equivalent.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 5:01 p.m., November 5, 2003 (#20) - RossCW
  You are trying to separate the SF from the other outs

No I'm not. If you want to include SF in the percentage of outs, I have no objection.

The question however is whether to count an SF as an at bat - the very same question as with a BB.

Why count the SF as an unsuccessful opp in OBA but don't consider it in batting average?

Because on base percentage measures how often a batter gets on base per plate appearance. If you want to change AVG so that it uses plate appearances instead of at bats, then you would have a brand new stat.

The obvious reason that wasn't done is that it would change the comparison. Barry Bonds and Frank Thomas would look like much worse hitters. So the decision was to ignore plate appearances that were partially, but not wholly, successful - sacrifices and walks - so that a player was neither rewarded nor punished for them.


Mike's Baseball Rants - Sac Flies (August 28, 2003)

Discussion Thread

Posted 12:14 p.m., November 6, 2003 (#22) - RossCW
  I know that's not what the rules say, but that's me. I might change the "contact" to "non-bunt contact".

And probably you should add "and is caught" to the "number of times batter contacts a ball in play" since presumably foul balls are "in play" but only count if they are caught.

I'm also not sure why you would eliminate bunt hits and not slow ground balls. But I suppose it would make SLG more accurate.



Sabermetrics Crackpot Index (August 29, 2003)

Discussion Thread

Posted 11:53 a.m., August 29, 2003 (#9) - RossCW
  Uhm huh. Apparently, some people think crackpots are those who believe 100 years of baseball wisdom should not be thrown out based on back-of-the-envelope calculations and sportswriter musings that can't be tested. My guess is that if you put Bill James through the crackpot test against traditional baseball wisdom he would score quite high.


Sabermetrics Crackpot Index (August 29, 2003)

Discussion Thread

Posted 7:46 p.m., August 29, 2003 (#15) - RossCW
  Bill James is one of those guys who (rightly so) says "Baseball wisdom says X is true. Here's why I think that X is not true, and here are the numbers to prove it."

Bill James is a sportswriter who comes up with interesting ideas and proves them to his own satisfaction. Where is the objective test for MLEs? Where is the objective test for win shares? Where is the objective test for his claims about how relievers should be used? There aren't any. They are interesting ideas. He may be right, but there's not much reason to think so.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 8:20 p.m., September 7, 2003 (#3) - RossCW
  It seems to me that there are a couple parts of the analysis missing - or I missed it.

1) Runners do not score runs at the same rates even when compensating for teammates. Vince Coleman and Otis Nixon scored an average of 40% of the times they got on base other than by a home run. This is over 10 percentage points more than their teammates. At the other extreme is Willie McCovey, who scored only 20% of the time, almost 10 points lower than his teammates.

In fact, the variation in how often runners score when they get on base is larger than the variation in how often they get on base. So any analysis of a player's value to his team which leaves this out is incomplete.

2) It is quite clear that teams are greater or less than the sums of their parts. You can see this in that as a team's averages (AVG, OBP, SLG) increase, its runs scored generally increase even faster. That is, the same single (or other event) is worth more on one team than on another because of who the other players are.

If Tango is correct about the value of home runs dropping when OBP gets beyond a certain point, it would seem that the relative value of other contributions would have to increase.
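The scoring rate in point 1 is just runs minus homers over times on base minus homers. A sketch with made-up counting lines:

```python
# Share of times a runner scored after reaching base, excluding home runs
# (an HR scores the batter automatically, so it is removed from both the
# numerator and the denominator). All totals here are hypothetical.
def score_rate(runs, hr, times_on_base):
    return (runs - hr) / (times_on_base - hr)

# A speedy leadoff type vs. a slow slugger, illustrative lines only:
print(f"{score_rate(runs=90, hr=5, times_on_base=220):.2f}")   # 0.40
print(f"{score_rate(runs=75, hr=30, times_on_base=250):.2f}")  # 0.20
```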


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 11:12 p.m., September 7, 2003 (#6) - RossCW
  .04 runs per time on base.

So if Otis Nixon scores runs at a .440 rate when he gets on base, speed only accounts for .040 of that? If he had average speed his runs per time on base would be .400. And McCovey's would rise to .240 with average speed.

2 - They have better hitters behind them (#2 through cleanup, as opposed to #5 thru 7)

Probably that is part of it - are there studies that have demonstrated that?

3 - They leadoff more, meaning they get on base with 0 outs more, meaning there are more PAs opportunities to drive them in.

What is the impact on the average leadoff hitter from this? I don't have data that breaks out by batting order.

I am doubtful that the difference is entirely attributable to batting order. If you are right about the value of speed being marginal, then any leadoff batter would be much more likely to score when they got on base than any cleanup hitter. I am sure there are some statistics by position in the batting order that would establish averages for each, but I don't have them.

Regardless of what causes the difference, it needs to be accounted for in any system of objective evaluation of a player's contribution to his team. The differences in how often players score when they get on base appear to be at least as great as the differences in how often they get on base.


Bonds, Pujols and BaseRuns (September 6, 2003)

Discussion Thread

Posted 11:41 p.m., September 8, 2003 (#18) - RossCW
  But, to Bonds himself, he turned a .296 situation if he was not the batter into a .351 situation because he was the batter after the event completed. That is worth +.055 wins for Bonds' IBB.

If Bonds is walked in the situations where his hits would be most valuable, doesn't that imply that Bonds's other numbers are less valuable than their face value, since the average situation in which they were produced was less likely to create wins?


Instructions for MVP (September 22, 2003)

Discussion Thread

Posted 8:00 p.m., September 28, 2003 (#9) - RossCW
  It's an either-or question from where I sit.

The value to a team is from winning games. Contributions in losses are of no value because the outcome created no value. They are like helping build a house that is blown down in a windstorm: your contribution has no value since the house has no value. This makes it not an either-or question. A player may contribute so much to his team's wins that it makes up for the fact there aren't very many of them, but the fewer the wins, the greater the contribution required.

I think you can take this a step further - that wins in a pennant race are more valuable to a team than wins which aren't. So even on different teams with the same number of wins, the player who contributed to his team winning the pennant race is more valuable to his team than one who contributed to a fourth place finish.


Instructions for MVP (September 22, 2003)

Discussion Thread

Posted 10:42 a.m., September 29, 2003 (#11) - RossCW
  where does it say anything about team wins in the guidelines for MVP?

It says the "value of a player to his team". You can interpret that any number of ways. What value is there to the team from a player's contribution in a loss? Without the player, presumably they still would have lost.


Instructions for MVP (September 22, 2003)

Discussion Thread

Posted 12:24 p.m., September 29, 2003 (#13) - RossCW
  As so many do, you have chosen to ignore the second part of that same sentence, which is "...that is, strength of offense and defense."

And how does that change its meaning? I think you are reading something into it that isn't there.

The letter explicitly states "The MVP need not come from a division winner or other playoff qualifier." If the intent was to ignore a team's record entirely why not just say so? Especially since that has traditionally been one of the criteria used by many voters.

Sure, you can figure out a way to get around that, in order to justify what you *want* it to mean. But as Bill James once wrote, "Very like a whale".

And that seems to me to be what you are doing.


Instructions for MVP (September 22, 2003)

Discussion Thread

Posted 9:46 p.m., September 29, 2003 (#17) - RossCW
  I interpret the "The MVP need not come from a division winner..." guideline in a different way than RossCW. To me, that statement, especially considering the specific "strength of offense and defense" guideline, is saying something like, "Vote for the best player, regardless of the quality of his teammates."

You certainly are reading a lot into it.

The worst offenders, of course, are those voters who say that a pitcher should not be MVP, and that a reliever could never be valuable enough to win. Maybe I'm blind, but I do not see even a hint of such a sentiment in the rules.

2. Number of games played.


Instructions for MVP (September 22, 2003)

Discussion Thread

Posted 10:43 p.m., September 29, 2003 (#19) - RossCW
  That's the point--a combination of quality and quantity. You can argue all day about how to blend quality and quantity

And it appears to be a perfectly legitimate position that a pitcher's "quality" - especially a relief pitcher's - cannot overcome the quantity issue.

, but if games played was the only criteria, Cal Ripken would have been the MVP 15 times.

We aren't really discussing what the criteria are but what they should be.

No, that's how he's interpreting something that is left wide open to his interpretation.

It's not left that wide open. I don't doubt that "value" is deliberately left up to the voters to interpret, but these instructions are not written in code. They wouldn't use a very narrow, specific statement to communicate some very broad, unrelated meaning.


Instructions for MVP (September 22, 2003)

Discussion Thread

Posted 9:29 a.m., September 30, 2003 (#21) - RossCW
  There is nothing in those guidelines that implies that this is the case.

The guidelines leave that judgement to the voter - so the question is whether there is something in the guidelines that prevents a voter from making that judgement. I don't think there is. It is quite different than saying pitchers aren't eligible.



Aging patterns (September 23, 2003)

Discussion Thread

Posted 7:43 p.m., September 28, 2003 (#18) - RossCW
  Why the heck does the $H go down from year one??? That floors me. Is it due to speed? I could understand sample bias issues at the age extremes, but that is just weird.

I doubt there is a big drop-off in speed from 22 to 23. You will notice more balls are going out of the park as players get older - that probably means fewer well-hit balls staying in the park. There is also likely a sample bias in that speedy slap hitters get to the big leagues faster than lumbering power hitters.


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 11:23 p.m., September 24, 2003 (#2) - RossCW
  Tango -

Obviously number 5 does not have an effect on how often a player scores after a walk.

And if number 4 is correct, then how good the pitcher who walks him is at preventing hits must also have an impact. As was pointed out on the other thread, pitchers who walk a lot of batters tend to have lower opponents' batting averages - if they don't, they don't last long. Thus players are more likely to score after a hit than after a walk.


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 2:53 p.m., September 26, 2003 (#8) - RossCW
  Does leadoff batter refer to the position in the batting order or leading off each inning?


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 6:34 p.m., September 26, 2003 (#10) - RossCW
  If you are referring to this

No - I was referring to this:

For all methods for leadoff batter to reach
base, number of times each event occurred, the number of
times that batter scored and the frequency of each.


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 5:00 p.m., September 28, 2003 (#16) - RossCW
  Since 1965 the ratio of walks to hits has been .375 (501901/1338995). The ratio of walks to hits in the leadoff example above is .241 (33002/136760). Apparently batters are far more likely to walk when they are not leading off an inning - I would assume that means they are also far less likely to score.
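The two ratios follow directly from the counts given in the post:

```python
# BB-to-H ratio overall since 1965 vs. leading off an inning,
# using the counts quoted above.
overall = 501901 / 1338995   # walks / hits, all situations since 1965
leadoff = 33002 / 136760     # walks / hits, leading off an inning
print(round(overall, 3))  # 0.375
print(round(leadoff, 3))  # 0.241
```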


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 5:44 p.m., September 28, 2003 (#17) - RossCW
  That should have been: batters are far less likely to walk leading off an inning and therefore far less likely to score after a walk.


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 9:00 p.m., September 29, 2003 (#20) - RossCW
  I really don't know what your point is.

My point is that there appears to be a sixth factor:

A smaller percentage of players' walks come when leading off an inning than the percentage of hits that come when leading off an inning. This means that drawing a walk, on average, is less likely to lead to a run scoring than a hit because a walk is less likely to happen at the point most likely to lead to a runner scoring.


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 10:24 a.m., September 30, 2003 (#22) - RossCW
  But, GIVEN the base/out state, the runner on a walk is virtually just as likely to score as the runner on a single.

The data here says that is true for leading off an inning. I have no idea whether it is true under other circumstances. It would seem unlikely since a runner is much more likely to advance to second on a single with a runner on base, while a leadoff batter advancing to second is counted as an error in the data above.

This is a given. Walks occur more frequently with 1b open than not, relative to a single, and walks occur more frequently with 2 outs than not, relative to a single.

So when we look at Factor 3 above, how does this affect it? Doesn't it imply that a batter with more walks is less likely to get on base with 0 outs and therefore less likely to score?

Of course we are talking about the average walk. It may well be that the pattern of when players walk varies widely. In fact it's pretty likely.


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 10:30 p.m., September 30, 2003 (#24) - RossCW
  my best guess is that it is the same.

I see no reason to believe that and plenty of reasons why they wouldn't be the same.

I've read this 4 times now, and I don't understand what you are trying to say.

1) The data above counts leadoff singles where the runner advances to second on an error as an error. I can't think of any other circumstances under which a runner will advance to second on a leadoff single. Can you?

2) With a runner on base a single will sometimes result in the batter reaching second when the throw is made to get the lead runner. Presumably this increases the chances of the batter scoring. A walk on the other hand will never get the batter to second.

And there may be any number of other factors in game situations that will influence actual outcomes. This makes it unlikely that the numbers are the same for other situations. Whether any are significant you can't know without running the numbers for actual events.

While the average single will score about .26 times, the average walk will score .25 times

This is based on actual events from what period of time? This appears to be contradicted by the fact that batters who get on base with a larger proportion of walks tend to score fewer times per time on base.

By the 24 base/out states however, there is no difference.

Which means nothing for the overall likelihood of scoring on a walk or a single. If the distribution of walks and singles across the base/out states is not the same then the chances could be identical in each instance and widely different in aggregate.
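
That aggregation point can be illustrated with invented numbers: suppose the chance of scoring is identical for a walk and a single within each base/out state, but the two events are distributed differently across states (all figures below are hypothetical):

```python
# Per-state chance of scoring, assumed IDENTICAL for walks and singles.
score_prob = {"0 outs": 0.40, "2 outs": 0.15}

# Hypothetical distributions: walks skew toward 2-out situations.
walk_mix   = {"0 outs": 0.20, "2 outs": 0.80}
single_mix = {"0 outs": 0.60, "2 outs": 0.40}

walk_rate   = sum(walk_mix[s] * score_prob[s] for s in score_prob)
single_rate = sum(single_mix[s] * score_prob[s] for s in score_prob)

print(round(walk_rate, 2), round(single_rate, 2))  # 0.2 0.3
```

Identical chances in every state, yet the aggregate walk scores less often - exactly the distribution effect described above.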

The walk and K varies the most by base/out states, and therefore, I suspect that those kinds of players vary the most.

I've read this four times and don't catch your meaning.


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 11:52 a.m., October 1, 2003 (#26) - RossCW
  If you don't understand Markov, then I'd be happy to explain it.

I don't understand Markov.

From This Link:

http://www.taygeta.com/rwalks/node7.html

A Markov chain is a sequence of random values whose probabilities at a time interval depends upon the value of the number at the previous time

Does not seem to have anything to do with the question of whether a player is less likely to score on a walk than a single.

That's why I said the .26/.25 for overall, for this very reason

Is this based on actual events from specific season(s)? If you have the actual data of how many times a player who walked scored and how many times a player who hit a single scored, the math is not complicated and requires little analysis. You have six data points: how many walks, how many times the player who walked scored, and the percentage; how many singles, how many times the player who singled scored, and the percentage.

The data above would indicate that someone who walks should be much less likely to score. They are much less likely to get on base leading off an inning. On the other hand perhaps that difference is insignificant when looking at all walks.
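
For what it's worth, the Markov idea can be sketched as a small chain over runner states; the transition probabilities below are invented for illustration, not estimated from real play-by-play:

```python
# Toy Markov chain: the runner's next state depends only on his current
# state. "scored" and "stranded" are absorbing states. All probabilities
# here are hypothetical.
transitions = {
    "on 1B": {"on 2B": 0.30, "scored": 0.10, "stranded": 0.60},
    "on 2B": {"on 3B": 0.20, "scored": 0.35, "stranded": 0.45},
    "on 3B": {"scored": 0.50, "stranded": 0.50},
}

def p_score(state):
    """Probability the runner eventually scores from `state`."""
    if state == "scored":
        return 1.0
    if state == "stranded":
        return 0.0
    return sum(p * p_score(nxt) for nxt, p in transitions[state].items())

print(round(p_score("on 1B"), 3))  # 0.235
```

Note the chain gives the same answer for "on 2B" however the runner arrived there - which is precisely the Markov assumption debated in this thread.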


Factors that affect the chances of scoring (September 24, 2003)

Discussion Thread

Posted 9:03 p.m., October 9, 2003 (#29) - RossCW
  So, GIVEN that you've got a man on 2b and 1 out, what happens to that runner, according to Markov chains, is independent as to how he got there.

Now, is this true?

I think it clearly is true only if you choose to ignore any information that how they got there provides about the state. To say that it doesn't matter whether a hitter got to second on a double or a bunt single and a stolen base is true only if it is the same hitter.

In fact, when one looks at the disparity between players in the number of runs they score per time on base, it appears that who is on base may be more important than whether they are on first or second or how many outs there are. If Vince Coleman is on first he is quite likely to steal second and far more likely to score from first or second on a well hit ball. It may well be that he is more likely to score from first than Willie McCovey is to score from second. Or more likely to score from first with one out than McCovey is with no outs. The only way to know for sure would be to look at the actual instances for each player.

In general the batter who got there with a double is likely slower than the average batter who got there with a bunt single and a stolen base. In other words the runner on second is more likely to be Willie McCovey than Vince Coleman. So while how a runner enters the state doesn't affect how he leaves the state, it may give you critical information about what type of runner he is, which is an important characteristic of the state and does affect how he is likely to leave it.

If I set up my Markov chains, I'll get overall numbers that match pretty closely to that last line.

I don't see how matching "overall numbers" demonstrates that you have considered all the critical variables in the on-base state.

To look at it another way, I ran numbers on how likely batters were to score assuming that the critical factor was whether they were on base. I did not consider how many outs there were or what base they were on - just that they had reached base. In other words I defined the "state" as being on base, and how they left that state as either the inning ending or their scoring a run. And what I found was that how often a player scored once on base was as critical a factor in how often they will score as how often they get on base. Does that mean that which base they got on is not a critical factor? No, it just means that I didn't consider it.
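
The collapsed one-state version described above reduces to a single rate; with invented totals:

```python
# Collapsing all on-base situations into one "state": the runner either
# eventually scores or the inning ends. Totals below are hypothetical.
times_on_base = 250
times_scored = 80

print(times_scored / times_on_base)  # 0.32
```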


Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 10:24 p.m., October 11, 2003 (#10) - RossCW
  The more I play with park factors, the more convinced I become that one-year factors are virtually meaningless.

I don't think park factors are meaningless - it's pretty clear Coors Field numbers are not the same as other parks'. They can't be accurately applied to individual players or individual seasons. But that just makes them not very useful for most purposes and very often misused, not meaningless.


Batting Average on BIP, 1999-2002 (October 10, 2003)

Discussion Thread

Posted 1:52 p.m., October 12, 2003 (#14) - RossCW
  And I'm not sure how the concept of statistical significance would apply here

The problem is that it doesn't. There is really no reason to think that a park has the same impact on every player. That large outfield that reduces the SLG of a home run hitter may increase extra base hits for a speedster. So while it is "statistically significant" for the population of players it has no meaning for any individual player.

One way to think about this is to take an average park factor for the AL and NL and then apply them to every park in each league. It ought to be obvious that the impact of Coors has no more relevance to Dodger Stadium than it does to Fenway. But the probability that National League players played at Coors is higher, so the average league park factors would still be statistically significant for the population of NL players, just as individual park factors are.


Odds of Cubs losing an 11-run lead (October 11, 2003)

Discussion Thread

Posted 10:18 p.m., October 13, 2003 (#5) - RossCW
  I do remember the Expos leading the Cubs 15-2 one game, and when I got home, I learned that Jeff Reardon got the save (final score: 17-15).

Which may mean it has nothing to do with CYA. Baker has watched a lot of baseball and no doubt seen a lot of unlikely things happen. Was the Marlins scoring 11 runs any less likely than the Cubs taking an 11 run lead to begin with?


Odds of Cubs losing an 11-run lead (October 11, 2003)

Discussion Thread

Posted 12:51 p.m., October 14, 2003 (#7) - RossCW
  Of course it means nothing. It was just a silly little anecdote that I remembered when I was a teenager.

That may be. But I'm trying to figure out why, immediately after watching the Cubs jump out to an 11 run lead in the first three innings, people are so quick to jump to the conclusion that it was virtually impossible that Florida would do the same thing over the next six.


Odds of Cubs losing an 11-run lead (October 11, 2003)

Discussion Thread

Posted 10:11 p.m., October 14, 2003 (#9) - RossCW
  it would have been virtually impossible for Florida to do the same

Florida scored 11 or more runs nine times during the regular season.


Odds of Cubs losing an 11-run lead (October 11, 2003)

Discussion Thread

Posted 10:16 p.m., October 14, 2003 (#10) - RossCW
  And the Cubs opponents also did it nine times - which might be more important to Baker.


Odds of Cubs losing an 11-run lead (October 11, 2003)

Discussion Thread

Posted 2:44 a.m., October 15, 2003 (#12) - RossCW
  how often does a team OUTscore its opponents by 11 runs over a span of 4 innings?

Arizona did it to the Cubs on August 23rd. From the 3rd-6th they scored 13 runs, the Cubs scored 2. And the White Sox did it to the Cubs in the first four innings on June 20th. So the Cubs had it happen twice just this year. And there were at least a couple other "close calls" in Cubs games where it took a team five innings to score 11 more or they only scored 10 more in four.

Perhaps the "average" team is dfferent but I can't find Phil Birnbaum's data on the topic anyway. In any case, Baker had first hand knowledge that is wasn't impossible.


Odds of Cubs losing an 11-run lead (October 11, 2003)

Discussion Thread

Posted 12:28 p.m., October 15, 2003 (#16) - RossCW
  you can quibble about my use of "virtually"

Actually you are quibbling about your use of the term "virtually". I didn't attribute "impossible", virtual or otherwise, to you, I simply used it in a sentence about what Baker was thinking.

The fact is Baker watched his team give up that many runs in that short a period twice this year and had a couple other occasions where they came close. It was perfectly reasonable for him not to take any chances on it happening again in the playoffs.


Odds of Cubs losing an 11-run lead (October 11, 2003)

Discussion Thread

Posted 10:28 p.m., October 15, 2003 (#21) - RossCW
  It was perfectly unreasonable for him to leave a young pitcher who had already been worked hard in the playoffs, and who has pitched much more than he ever has before in a year this year, to pitch two utterly meaningless innings when it would have left him fresh for a Game 6.

There is exactly zero evidence that pitching seven innings instead of five has any predictable impact on pitching performance five days later. That is a lot less than the evidence that a team can score eleven runs in four innings against the Cubs pitching since two teams did it this season.


Odds of Cubs losing an 11-run lead (October 11, 2003)

Discussion Thread

Posted 2:49 a.m., October 17, 2003 (#23) - RossCW
  Managers and coaches will tell you that if you've been worked hard the last time out, most pitchers won't respond as well in their next start.

I didn't see Dusty Baker say that about Mark Prior and he's the manager with the most experience with Prior.

Prior pitched 9,9,7,8,8 innings from August 10th through September 1st. He gave up 3 runs. That does not look to me like a pitcher who has a hard time coming back after five days.

Frankly there is no evidence, anecdotal or otherwise, to support a claim that Prior suffered from being overworked in game 2. Its just speculation.

The fact is that if you play the odds, no one should run out pop flies or ground balls. They should save their energy for when it is likely to count.



Injury-prone players (October 14, 2003)

Discussion Thread

Posted 2:30 a.m., October 19, 2003 (#23) - RossCW
  This might tell us whether some teams are able to avoid injuries either by avoiding injury-prone players or by encouraging superior training/conditioning techniques ... Yet this study implies individual injuries can be predicted to some resonable degree of accuracy, and so therefore a team should be able to at the very least make personnel decisions that are likely to reduce team injuries

Or it may be that some teams use the DL more often while others tend to keep players on the active roster for minor injuries.

I suspect that most teams tend to keep star players on the roster while they recover from minor injuries where they would DL another player. They may be used to DH or pinch hit even when they aren't able to play in the field.


Relevancy of the Post-season (October 16, 2003)

Discussion Thread

Posted 12:35 p.m., October 17, 2003 (#5) - RossCW
  I don't know what the "best team" means if it isn't the team that won. Over the course of a 162 game season there is a lot of luck in who wins or loses. There are injuries, variations in schedules, different pitchers, and teams play at different levels at different times. So if the team that won a division isn't the "best team", what do you mean?

You can decide that you will run 150 seasons of Diamond Mind and name the team that comes out ahead the most. Or you can just as well award the "best team" title to the team with the largest seasonal run differential. But I don't think either of those really determine that. Baseball is a series of contests and the team that wins the most of them is the "best team".

To try to decide who might have won if a fan doesn't interfere with the ball, or a ball hadn't gone through someone's legs or if the series had been played 20 times, is the same as trying to decide what would have happened if a player hadn't been hurt or the teams had played different schedules or played in April instead of September. If you want to know who is the best team in baseball this year, its the team that won the World Series or the question is meaningless.


Relevancy of the Post-season (October 16, 2003)

Discussion Thread

Posted 8:06 p.m., October 17, 2003 (#9) - RossCW
  The team whose players have performed better than their opponents over... what?

By that measure you could say the Cubs were the "best team" in the LCS since they scored more runs than the Marlins. But the rules of baseball, unlike most golf tournaments, don't work that way.

And if they played with European rules, where the regular season champ is the champ, then they were the best team?

But isn't that exactly the question - played by these rules, who is the best team? Once you allow for different rules you might as well be talking about which team had the best golf scores.

Was Seattle's 116 win season meaningless, because of the existence of a post-season?

Of course it wasn't meaningless, but I have no doubt they would have gladly traded those 116 wins for a World Series championship. So if someone wants to say Seattle was the best team over the course of the regular season based on their record, I don't see any objection. But by the rules of baseball the object is to win the world series and Seattle didn't. You could just as well decide the team that scored the most runs or had the largest run differential was the best team.


Relevancy of the Post-season (October 16, 2003)

Discussion Thread

Posted 1:15 p.m., October 18, 2003 (#13) - RossCW
  How teams match up is an issue that hasn't been addressed here. It's perfectly possible - maybe even likely - that the Yankees would beat the Red Sox head-to-head in an infinite game series and the Red Sox would beat the A's, but the A's would beat the Yankees. Perhaps baseball should go to a round robin league championship series where all the teams play one another instead of going head to head.


Relevancy of the Post-season (October 16, 2003)

Discussion Thread

Posted 3:13 p.m., October 20, 2003 (#16) - RossCW
  I agree with Jim. The Vegas betting line is probably the most accurate predictor of the team most likely to win the World Series at the end of the season. So if you want the best team based on the regular season that would be it.


Relevancy of the Post-season (October 16, 2003)

Discussion Thread

Posted 10:39 p.m., October 20, 2003 (#18) - RossCW
  jimd -

You may be right. But I think Vegas odds are set by the professional (and therefore informed) gamblers. The odds are adjusted to keep the total payout on teams roughly equal. If a bunch of Yankee fans bet on the Yankees, the odds of the Yankees winning will go up above their "true value" and other teams will go down below their "true value". But the professionals - who are only interested in making money - will then start betting on the Yankees' opponents since the odds are now below their "true value". The result is that the odds for the Yankees will drop back to their "true value" while the opponents' odds will return to theirs.

That assumes that the professional gamblers, collectively, have an accurate assessment of the true value of teams and that they have enough money to move the odds when they stray from that value.
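
The balancing mechanism described above can be sketched by converting quoted odds into implied probabilities; the decimal odds below are made up for illustration:

```python
# Hypothetical decimal odds on four World Series futures. The implied
# probabilities sum to more than 1; the excess is the book's margin.
odds = {"Yankees": 3.0, "Red Sox": 4.0, "A's": 6.0, "Field": 2.5}

implied = {team: 1.0 / o for team, o in odds.items()}
overround = sum(implied.values())

# Dividing out the margin gives the market's consensus "true values".
fair = {team: p / overround for team, p in implied.items()}

print(round(overround, 2))        # 1.15
print(round(fair["Yankees"], 3))  # 0.29
```

When fan money distorts one line, its implied probability drifts away from the professionals' estimate of true value, and their counter-bets push it back - the mechanism the post describes.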


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 11:43 p.m., October 21, 2003 (#11) - RossCW
  Am I confused about the meanings, or is there something strange about these two sets of data? There appear to be 4381 games tied at the beginning of the 8th and 11,507 at the beginning of the 9th:

"V0801 0",4381,2100
"V0901 0",11507,5552

That doesn't make sense to me.


Chance of Winning a Baseball Game (October 20, 2003)

Discussion Thread

Posted 1:48 a.m., October 22, 2003 (#12) - RossCW
  That is a very interesting (and surprising) result!

Is it surprising if one considers that the offensive team is also optimizing its performance with pinch hitters, pinch runners, etc.?


Cities with best players (October 23, 2003)

Discussion Thread

Posted 11:23 p.m., October 23, 2003 (#16) - RossCW
  Russell played for the University of San Francisco - does that count? How about Chamberlain with the Harlem Globetrotters?

When you reduce it down to 10 players the list gets very short. There are a lot of great players listed here who would not make a top ten.

For baseball off the top of my head:

Ruth
Cobb
Aaron
Mays
Cy Young
Ted Williams
Walter Johnson
Honus Wagner
Rogers Hornsby
Jackie Robinson

Satchel Paige and Josh Gibson might get added if you include the Negro Leagues, but for what city? You can make a case for Mantle, Gehrig, Musial, Mathewson, Gibson, Henderson and Koufax and probably some others, but it would be a hard case.


Value of keeping pitch count low (October 30, 2003)

Discussion Thread

Posted 8:44 p.m., November 2, 2003 (#9) - RossCW
  So, the crafty pitcher gets to face 4 extra batters, or essentially saves the manager of 1 inning of bad relief.

I'm not sure why it would be an inning of bad relief. In the case of both Johnson and Radke it is likely that they would leave the game with a lead and the extra innings would be pitched by the team's top relievers. You can pretend that relievers will pitch the same number of innings, with the worst pitcher on the staff having some left-over innings that would have been used if the starters had gone fewer innings. But I don't think it really works that way. The setup guys and closer come in to pitch with a lead unless they are really used up.

Instead there is a combination of using pitchers for longer stretches and with less rest when the role they are in is called on more often than anticipated. The impact may well be felt in reduced performance by those pitchers rather than simply the run difference.

1 inning of bad relief is worth about .12 runs per game over what an average starter would have done.

Is the difference between an average starter and a bad reliever really only 1.08 runs over nine innings? Relief pitchers give up fewer earned runs, but earned runs don't necessarily reflect the contribution of each pitcher. A relief pitcher may come in with runners on first and second with two outs and let both of them score before getting an out. The starter gets charged with the two runs, the reliever gets charged with none. I'm not sure that reflects their contribution to the runs scoring.
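
The 1.08-run figure being questioned is just the quoted per-game cost scaled to nine innings:

```python
# One inning of bad relief is quoted above as costing ~0.12 runs per game
# versus an average starter; over nine such innings that comes to:
runs_per_relief_inning = 0.12
print(round(runs_per_relief_inning * 9, 2))  # 1.08
```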


Value of keeping pitch count low (October 30, 2003)

Discussion Thread

Posted 12:12 p.m., November 3, 2003 (#11) - RossCW
  Feel free to debate the merits of this.

I don't know where any of those numbers come from, but the math looks right. If the 4.50 is supposed to reflect the average starter's ERA, I would point out that both your "crafty" and "power" pitchers do better than that. It wouldn't be surprising that there is not a huge difference between two groups of elite pitchers.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 4:34 p.m., November 5, 2003 (#1) - RossCW
  "there is evidence that pitchers do NOT have good and bad days"

Unlike everyone else in the world. And no, I don't think that is taken out of context. I think it is indicative of a world view that has become far too prevalent in sabermetrics - what I wish to be true is true.

That said, his basic point that it was not obvious Pedro was done is obviously correct. Little had seen Pedro pitch a lot and it wasn't obvious to him. I don't know how it could be obvious to commentators in a booth or fans watching the game on television.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 7:16 p.m., November 5, 2003 (#5) - RossCW
  Also informing my view is Pedro's history of poorer performance with higher pitch counts

I think this is a mistake. While poor performance in the 75-100 pitch range may indicate a lack of endurance, above that it may simply reflect the decisions of the manager. If Little almost always left Martinez in until he got poor results, then he is always going to have bad stats for the last pitches of the day.

Instead, I think you have to look at how often he goes well beyond 100 pitches. Presumably if he wasn't getting results he would be lifted before he got to 130 pitches.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 11:18 p.m., November 6, 2003 (#14) - RossCW
  You've gotta figure that since even bad pitchers don't have a .400, that Pedro was probably extremely unlucky on BIP.

There are a lot of pitchers who have had years with BABIP over .400 - they just didn't get to pitch very many innings. To assume that a tired Pedro would pitch better is, I think, a mistake.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 1:25 p.m., November 7, 2003 (#25) - RossCW
  The highest BA/BIP against any pitcher with 100 or more innings

The 100 IP standard eliminates every pitcher who was not a starter. In 2002 here are the players with BABIP over .400 who also got over 30 outs:

NAME                 YEARID  BABIP  OUTS
========================================
Jaret Wright           2002  0.488    55
Kane Davis             2002  0.456    42
Bret Prinz             2002  0.452    40
Doug Nickle            2002  0.442    35
Victor Santos          2002  0.440    78
Jerrod Riggan          2002  0.413    99
Mark Lukasiewicz       2002  0.410    42
Matt Anderson          2002  0.410    33
Mark Corey             2002  0.407    36
Paul Abbott            2002  0.405    79
Juan Rincon            2002  0.402    86
Sterling Hitchcock     2002  0.402   118

It would be a mistake to think that very high BABIP are limited to pitchers who never were and never will be major league pitchers.
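
For reference, BABIP figures like those in the table are conventionally computed as hits on balls in play divided by balls in play; the stat line below is invented to show the formula, not any listed pitcher's totals:

```python
def babip(h, hr, ab, so, sf):
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    return (h - hr) / (ab - so - hr + sf)

# Hypothetical pitcher line.
print(round(babip(h=60, hr=8, ab=180, so=40, sf=2), 3))  # 0.388
```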

I took all pitchers from 1994 to 2002, and selected only those pitchers who had, at most, 100 BFP in that season. You have to figure that such pitchers probably produced pretty bad

Not really. You have pitchers who got September call-ups, pitchers who pitched well and were hurt, and a whole collection of pitchers who may have only pitched in one or two games. It's not at all clear what category makes up the bulk of the innings that create your averages. It would be reasonable to assume that those who pitched better also faced more batters.

that is that distinguishable from the average Joe MLB pitcher....

Pitch $H
01-15 0.290
16-30 0.285
31-45 0.284
46-60 0.281
61-75 0.287
76-90 0.279
91-105 0.281
106-120 0.288

I find it hard to believe those numbers - what are they based on? We either have to conclude that pitchers never need to be taken out of a game or that managers are quite skillful at removing them before they do any damage.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 10:14 p.m., November 7, 2003 (#33) - RossCW
  "their $H was only .310."

I am not sure what "only" means in this context. That is about the same as the best hitting team in baseball.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 12:19 a.m., November 9, 2003 (#39) - RossCW
  Don't you find it rather odd that I can find such a horrible set of performance, and when I look at the one variable that I did not select from, that I got a $H that was just a bit worse than average? Wouldn't you have expected a $H of somewhere in the .350 to .400 level?

No, I wouldn't. You are making any number of assumptions that I think are incorrect. The first is that a pitcher with absolutely no success in any phase of the game is going to continue to be allowed to pitch. Second is that a group of pitchers specifically chosen for their low K's, high walks and high home runs has some sort of meaning - is it really surprising that group has a high ERA? No. And third is that there is some expected relationship between poor performance on HR, SO and BB and how many hits a pitcher gives up. Let's assume we took the opposite - a .350 BABIP - would you conclude that the pitcher must walk a lot of batters? How about give up a lot of home runs? How about strikeouts - must they be lousy at strikeouts? I don't think so, and I see no reason to think they are necessarily related.

Finally, you suggest that .310 (or .302) isn't so bad, and I point out that in this context it is "really bad" - perhaps not as extreme as the extremes you used to choose your group, but still really bad. You complain that somehow that is a minor detail - it clearly isn't. It's the difference between a team hitting like the Yankees and hitting like the Tigers.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 11:34 a.m., November 10, 2003 (#47) - RossCW
 
I'm saying that .330 is about as bad as bad should get, that .400 is really quite inconceivable from a true talent perspective (but obviously quite expected from a sample of 63 BIP).

I'm saying that Pedro was supremely unlucky at .400 over 63 BIP (with that being only 2 SD away from his mean), and that it's more likely that a .320 or so is really representative of Pedro's true talent on BIP.

Once you accept that, Pedro-tired is about a lg avg pitcher.

So you can leave him in for 200 pitches and he will still be league average? I think you are missing the point. Pitchers' ability varies widely over the course of a game. There is really no reason to think that it does not vary more widely than the average variation between major league pitchers over the course of a season. There is a point at which a low-A ball pitcher will be more effective than Pedro.
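
The sampling-noise side of this exchange is easy to check with a simple binomial model; assuming a true talent of about .320 on balls in play (the figure suggested above) over 63 balls in play:

```python
import math

# Standard deviation of an observed BABIP over n balls in play, under a
# binomial model with an assumed true rate (assumption, not measured).
p_true, n_bip = 0.320, 63
sd = math.sqrt(p_true * (1 - p_true) / n_bip)

# How unusual is an observed .400 under these assumptions?
z = (0.400 - p_true) / sd
print(round(sd, 3), round(z, 2))  # roughly 0.059 and 1.36
```

How many SDs out the .400 looks depends heavily on the assumed true rate, which is part of what is being argued here.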

For those who don't follow, if Roger Clemens gets rocked in the 1st 2 innings, and Torre has no faith in him, he'll take him out.

If Clemens has given up a couple of bloop singles and a couple of mistake pitches, etc., Torre is not going to lift him. It's not as if the manager can't see when a pitcher is suffering from bad luck as opposed to not having good stuff.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 2:14 p.m., November 10, 2003 (#49) - RossCW
  Why do you have to say stuff like that?

It was a question I asked, since it seems that is where the argument you are making leads. You seem to be claiming that it isn't possible for Martinez to get so tired that he would pitch at a level less than major league average. Is that what you are saying? If not, how does the "evidence" you provide for your argument support it being true for 130 pitches and not for 200?

So, does Pedro hit a wall and he drop to a lower plateau at some point, or is a decline on a pitch-by-pitch basis? I don't know

So, what I'm saying is that when Pedro hits that wall, he becomes a league average pitcher.

As you point out, it isn't clear that he ever "hits a wall". If not, then I agree there is some point in his decline where he is "league average". But isn't that simply begging the question of whether he sometimes/often drops below that point when his pitch counts get high?

What's your evidence for this?

I watch baseball games. Pitchers fall apart. Pitchers struggle in the first and settle down later. Is some of that chance attributed to skill? Yes. But it doesn't appear to me that is always the case and it doesn't appear that the people who play the game think so either.

My guess is that a pitcher's true talent stays extremely static during the course of a game.

I agree. And I misused the term "ability" - I should have said performance.

Pitchers talk about the adjustments they make all the time. They find out one of their pitches doesn't have movement or they can't get it over the plate and change their approach and the pitches they use. They adjust to the umpire's strike zone. They lose something off their fastball. They "find" their curve.

Unless you know many low a-ball pitchers who can strike out 26% of MLB batters, as Pedro post-105 has done from 2001-2003,

As you point out when Pedro starts to fade is going to vary from game to game. You are including in your averages what happens between 105 and 130 pitches even when he doesn't fade until 130 pitches.

My guess is that there are quite a number of low A ball pitchers that would do just fine striking out major league hitters. Unfortunately when they weren't striking them out, they would be either walking them or getting lit up.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 12:43 p.m., November 11, 2003 (#51) - RossCW
  and I provided a clarification

Yeh - after I raised the question. As I pointed out, the evidence you provided for Martinez pitching at major league average at 130 pitches applies equally to 200.

There's no question that there's great variability in performance. This is true in all walks of life where you have a binary (safe/out) result.

I said performance - not results. Martinez's ability doesn't change from pitch to pitch; his performance clearly does. And in fact there is not a binary result for pitchers - there are any number of outcomes for a particular pitch.

This was to all those people who bring up Pedro's post-105 PAs,

I agree with you on that point obviously. You can't look at results and determine performance because its the manager's job to look at performance and act before the bad results. Given the sample size above 100 pitches, it doesn't take many mistakes by a manager to make the overall results look bad.

but there's no way they'd be able to maintain a 2:1 K/BB ratio

I don't think that is really true. They would just have to serve up pitches over the meat of the plate to avoid walking batters. I'm not sure that is any different from the effect of being tired on a veteran pitcher. Their slider and curve may still break just as sharply, but they no longer catch the plate.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 1:48 p.m., November 13, 2003 (#54) - RossCW
  is still throwing in the 90's with movement and any difference in control or command

If this part were true, I don't think it would matter what his K/BB or HR ratio was. How exactly do you determine whether there are any differences?

In order for a pitcher to have a true $H of .400 or so would HAVE TO throw a straight fastball in the 80's (or less) with little else in terms of command or offspeed pitches

I just don't agree with this. I think it is perfectly possible for a pitcher with a 90 mph fastball to have a hard time getting major league hitters out if that is all he has and he can't locate it well within the strike zone.

The question isn't what happens on the pitches batters don't hit - its what happens on the pitches where they do. If 90% of Pedro's pitches are fine he probably isn't going to struggle a lot. But if that percentage drops to 75% he is going to be offering up at least one batting practice pitch to every batter. Put another way - when 10% of his pitches are easily hittable he probably isn't going to struggle, when 25% are easily hittable he will.

no manager, in any situation, even Grady Little, is going to leave a pitcher in that long.

I agree - but a .400 BABIP is not A-ball pitching, it's AAA pitching. Some pitchers put up .400 numbers in the major leagues.

it might just have been bad luck - BTW Ross, there are 2 kinds of bad luck in this regard; one is when bloop hits fall at the right time and a bunch of runs score accordingly; the other is when a pitcher gets hit hard even though his true "talent" has not changed

And managers can tell the difference in both cases. In the latter case, the manager can tell it's just bad luck when batters are hitting every mistake pitch and even a few that aren't. When every third pitch is a mistake, it isn't bad luck.

What you want to do is to look at all pitching starts in which a pitcher gets hit hard in the first 3 innings (high percentage of line drives and high percentage of hits per GB and FB) and then look at their $H (or whatever stat you want to use to represent how "hard" they are being hit) in the next inning.

This will show you nothing if managers can tell the difference between a pitcher who is unlucky and one that is serving up too many mistakes. The pitchers that are unlucky will still be in the game, the others won't.

my guess is that the idea that pitchers who are getting "hit hard" (or not) is significantly indicative of their true talent at the time (IOW, predictive of the future) is another of those truisms that turn out to be clearly not true.

And my guess is that it will be clear only if you assume that you are dealing with random results rather than optimized results.

Tango and I have been debunking many such myths lately

The problem is that no statistical analysis can debunk many of these "myths" since you don't have the data to do that. You have the results, and you often pretend that they reflect random actions when in fact they are the result of conscious manipulation.


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 1:59 p.m., November 13, 2003 (#55) - RossCW
  Just to be clear, I am not suggesting that anyone doesn't try to eliminate the "noise" - but that even when that isn't possible, rather than throwing up your hands and saying "this isn't possible," you go with the data you do have. The problem is that analysis based on that data is often less reliable than the subjective observations being "debunked".


ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)

Discussion Thread

Posted 12:09 a.m., November 14, 2003 (#59) - RossCW
  Tango -

You apparently are trying to prove that its possible for a pitcher to have a .400+ BABIP even if their "true" ability is closer to .300. I agree - it is possible.

What evidence is there that this is the explanation for every pitcher who had a BABIP over .400? It isn't surprising that very few pitchers who give up 20 hits while getting 30 outs (not counting home runs) make it to 50 balls in play. Nor is it surprising that in 2002 there were 8 pitchers with between 25 and 50 BIP who had a BABIP over .400.

The problem for your statistical approach is that all your data is as hopelessly contaminated as this. The player who goes over .400 is sent down. The pitcher who gets lucky and has a BABIP of only .300 when his true ability is .400 gets a few more chances but is quickly jettisoned when he returns to form. You seem to acknowledge this "selective sampling" issue in one sentence and then ignore it in the next as if having acknowledged it somehow makes it disappear.

BTW - I'm not a statistician. But you are developing standard deviations at 50 BIP and then applying them to pitchers with between 50 and 150 BIP. Shouldn't you be using the average of the group - i.e. 100 BIP - rather than the minimum to determine the standard deviation?
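
The binomial arithmetic behind this objection can be sketched (a toy illustration, not anyone's actual study; the .300 "true rate" and the cutoffs are assumptions for the example):

```python
import math

def babip_sd(p, n):
    """Standard deviation of observed BABIP for true rate p over n balls in play."""
    return math.sqrt(p * (1 - p) / n)

# The spread narrows as the sample grows, so a standard deviation computed
# at n=50 overstates the expected spread for pitchers with 100-150 BIP.
for n in (50, 100, 150):
    print(n, round(babip_sd(0.300, n), 3))

# Exact binomial tail: chance a true-.300 pitcher allows 20+ hits (a .400+ BABIP)
# on exactly 50 balls in play.
tail = sum(math.comb(50, k) * 0.3 ** k * 0.7 ** (50 - k) for k in range(20, 51))
print(round(tail, 3))
```

With hundreds of pitcher-seasons at these sample sizes, a handful of .400+ observed BABIPs is unremarkable on chance alone, which is the point being conceded above.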


Effect of SB attempt on batter (November 10, 2003)

Discussion Thread

Posted 1:29 p.m., November 10, 2003 (#1) - RossCW
  None of the studies seem to consider the quality of the pitcher and its impact on when batters choose to steal. Without that any evaluation of the impact of stealing on the batter may just be catching variations in the quality of pitching. Henderson attempts steals against all pitchers, Murphy only when this team is playing for a single run against a quality pitcher.

It seems to me one of the fatal flaws in a lot of sabermetric analysis is failing to account for the fact that players/managers are consciously optimizing their chances of success. You need to be very careful that the effect you are measuring isn't just an artifact of that optimization.


Baseball Player Values (November 22, 2003)

Discussion Thread

Posted 9:57 p.m., January 30, 2004 (#25) - RossCW
  Perhaps someone here can explain why Oswalt's totals for players' "offensive wins" are negative and his totals for pitchers' contributions are positive by an equal amount. It seems to have some fairly serious implications.


Baseball Player Values (November 22, 2003)

Discussion Thread

Posted 10:31 p.m., January 30, 2004 (#26) - RossCW
  Perhaps someone here can explain why Oswalt's total for players' "offensive wins" is negative and his total for pitchers' contributions is positive by an equal amount. It seems to have some fairly serious implications.


Where have you gone Tom Boswell? (January 7, 2004)

Discussion Thread

Posted 2:57 a.m., January 16, 2004 (#14) - RossCW
  Same for relievers. You need 'em. But, their win impact is less than twice that of a starter on a per batter basis, but they face one-third the batters.

This tells you nothing about the impact of elite closers who reduce the game to 8 innings.



MGL takes on the Neyer challenge (January 13, 2004)

Discussion Thread

Posted 2:36 p.m., January 16, 2004 (#30) - RossCW
  If you are trying to prove that there is very little difference between players, the best way to accomplish that may be to use MGL's method - to the extent his method is described here (just once I'd like to see a reproducible description of one of these "studies"). It starts by using three years of data and then regresses that to further limit the measured difference. The unsurprising conclusion is that there isn't much difference in the competition.

But that hardly says anything about the impact of competition on the raw data of a single season for a single player.


MGL takes on the Neyer challenge (January 13, 2004)

Discussion Thread

Posted 9:23 p.m., January 16, 2004 (#32) - RossCW
  Maybe I'm reading it wrong...

No - you did a very good job of manipulating the data to get the result you wanted. I guess that's a compliment.


MGL takes on the Neyer challenge (January 13, 2004)

Discussion Thread

Posted 2:05 p.m., January 31, 2004 (#61) - RossCW
  it would be nice if a system could give some sort of confidence score or estimate based on how well-interwoven a player's playing time was with other players.

Be careful - even suggesting the Emperor is only wearing shorts is not well-received here.

The problem comes when the confidence level is less than the differences measured by the performance-measuring system. Everyone probably agrees that Coors has some impact on almost any player's performance; it's hard to make that argument for Yankee Stadium. The Rockies have never figured out how to adjust their lineup to take advantage of Coors; the Yankees have always considered their stadium in constructing their lineup.



Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 8:19 p.m., January 28, 2004 (#32) - RossCW
  The ones between ERA and the next season's ERA are lower in my batch than they are in McCracken's, and strangely seem to get even lower when the bar is raised from 100 to 162 innings. It's a strange anomaly, but alas, I don't have similar data from McCracken to compare.

I'm curious - who here actually duplicated McCracken's work and produced identical results?


Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 1:27 p.m., January 31, 2004 (#37) - RossCW
  If you would RTFA, you'd see that Jay Jaffe actually does replicate one of Voros's most important results.

Well no - he didn't. He did similar calculations for last year and compared them to Voros's results. The first step of any "peer review" process is to duplicate the exact methodology described in the original research. People make mistakes. And the acceptance of un-duplicated research as the basis for comparison is a problem.

ERA to next ERA .407 .288 .378
dERA to next ERA .521 .513 .524

The baseline IP is the number of innings pitched in both seasons a pitcher needed to qualify for the study. McCracken used separate baselines for the two comparisons, but since I had data for both 100- and 162-inning levels, I'm running it here.

The numbers in the last two lines are the most important single result in sabermetrics over the last five years.

I won't argue with that, for the reason I stated above. The problem is that it takes a sample of pitchers who have pitched over 100 IP (or whatever baseline) two years in a row and treats it as if it were a random sample. It compares pitchers whom managers elected to pitch a certain amount two years in a row, and ignores what happens to pitchers who don't meet the standard the second year.

There is a similar problem with evaluating any predictive system in baseball. How do you treat the people for whom you have little or no data because of playing time?
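
The selection problem can be illustrated with a toy simulation (all parameters invented; this is a sketch of the objection, not a study): give each pitcher a true ERA talent, add independent season noise, then keep only the pairs whose second season stayed good enough to plausibly hold a rotation spot, and compare correlations.

```python
import random
import statistics

random.seed(1)

def corr(xs, ys):
    """Pearson correlation."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

# Invented model: true talent plus single-season noise.
talent = [random.gauss(4.50, 0.50) for _ in range(5000)]
era1 = [t + random.gauss(0, 0.70) for t in talent]
era2 = [t + random.gauss(0, 0.70) for t in talent]

# "Survivors": pitchers whose second-season ERA stayed low enough to keep pitching.
kept = [(a, b) for a, b in zip(era1, era2) if b < 5.00]
print(round(corr(era1, era2), 2))                                  # full population
print(round(corr([a for a, _ in kept], [b for _, b in kept]), 2))  # survivors only
```

The survivor-only correlation comes out lower than the full-population one here, which is the sense in which conclusions drawn from the surviving pairs don't automatically transfer to the whole starting group.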


Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 1:52 p.m., January 31, 2004 (#38) - RossCW
  I would add it's important to remember that DIPS ERA is a modification of Bill James's Component ERA, using team averages instead of individual performances for the non-DIPS stats. James uses walks, home runs, each type of hit, etc. to create a theoretical ERA based on the individual components. DIPS ERA takes that one step further, so that players are assigned the team-average hit rate and the team-average distribution of hits among singles, doubles and triples.

I don't think any conclusions can be drawn from this beyond the fact that it correlates better for the population used. Since several adjustments are made at the same time, there is no way to know which are causing the changes.


Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 2:32 p.m., January 31, 2004 (#39) - RossCW
  Assuming my numbers are correct, 85 pitchers pitched over 160 innings in 2001, 50 of those also pitched over 160 innings in 2002. That means well over a third of the pitchers in the initial sample are not evaluated in the comparison to the next year. The numbers for other years appear to be similar.

Here is the Lahman database SQL:

-- Count pitchers with over 160 IP (480 outs) in consecutive seasons
select p1.YEARID, count(*)
from PitchingAnnual p1
JOIN PitchingAnnual p2
  on p1.PLAYERID = p2.PLAYERID and p1.YEARID = p2.YEARID - 1
where p1.IPOUTS > 480 and p2.IPOUTS > 480
GROUP BY p1.YEARID
ORDER BY p1.YEARID DESC

"PITCHINGANNUAL" is a view that sums a players stats for the entire year regardless of the team they played for.


Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 5:45 p.m., February 1, 2004 (#41) - RossCW
  will concede that RossCW has a point in #37 in that I haven't made any comparison of 98-99 DIPS 1.x to 02-03 DIPS 1.x and 98-99 DIPS 2.0 to 02-03 DIPS 2.0. I am but one man with limited capabilities, and while I've made a good faith effort to do the task I set out to do with as much accuracy as possible

Jay - I wasn't taking potshots at the work you did do, just responding to the claim it duplicated Voros original work.

Ross, you make one point there about "team average hits" etc., which is not accurate -- DIPS 1 used team averages, but DIPS 2 uses league averages.

Thanks for the correction.

And while I see Ross' point about a non-random sample, I'm not sure how meaningful a comparison of, say, pitchers who pitched 100 innings in Season 1 and at least 1 inning in Season 2 would be -- the "sample size" issues seem obvious when it comes to small amounts of playing time.

I don't think it is a solvable problem - but it is something that needs to be considered with all the predictive systems. How do you handle data points where there is no data in the second year?

The problem is not really sample size. If you predict something is true for 85 people and you have a good correlation for the 50 for whom you have data the next year, can you draw conclusions about the entire first group from that? I don't think so. You may be able to draw a conclusion about the second group, but that isn't very useful for predictive purposes.

There is a larger question with DIPS ERA that I can't find anything on: to what extent does it predict better than a system that simply regresses ERA to the mean? After all, that is the practical impact of using league (or team) averages in making the calculation. Wouldn't we expect almost any system that did that to show better year-to-year correlations?
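
One wrinkle worth noting in any such test (a toy sketch, with invented numbers): shrinking every pitcher's ERA toward a fixed league mean is a linear transform, so by itself it leaves the year-to-year Pearson correlation unchanged, even though it does reduce prediction error. Any correlation gain for DIPS would have to come from reweighting the components, not from the implicit regression alone.

```python
import math
import random
import statistics

random.seed(2)

def corr(xs, ys):
    """Pearson correlation."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

def rmse(pred, actual):
    """Root mean squared prediction error."""
    return math.sqrt(statistics.mean([(p - a) ** 2 for p, a in zip(pred, actual)]))

# Invented pitchers: true talent plus season noise.
talent = [random.gauss(4.50, 0.40) for _ in range(2000)]
era1 = [t + random.gauss(0, 0.80) for t in talent]
era2 = [t + random.gauss(0, 0.80) for t in talent]

league = statistics.mean(era1)
shrunk = [league + 0.3 * (e - league) for e in era1]  # regress 70% toward the mean

# Linear shrinkage: correlation with next year's ERA is identical either way...
print(round(corr(era1, era2), 3), round(corr(shrunk, era2), 3))
# ...but the prediction error improves.
print(round(rmse(era1, era2), 2), round(rmse(shrunk, era2), 2))
```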

BTW - are your baseline numbers inclusive - i.e. does the over 50 IP include pitchers who pitched over 100?


Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 12:12 p.m., February 2, 2004 (#43) - RossCW
  One possible explanation for the decline in correlation over 162 IP could be that the range of variation among pitchers' ERAs is a lot smaller. Bad pitchers don't pitch that many innings two years in a row, but they may still get over 100, so you get better correlations at the lower threshold.
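
That restriction-of-range intuition is easy to demonstrate with made-up numbers (an illustrative sketch, not a claim about the actual data):

```python
import random
import statistics

random.seed(3)

def corr(xs, ys):
    """Pearson correlation."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

talent = [random.gauss(4.50, 0.50) for _ in range(5000)]
era1 = [t + random.gauss(0, 0.70) for t in talent]
era2 = [t + random.gauss(0, 0.70) for t in talent]

# A higher innings bar keeps mostly the better pitchers, compressing the ERA range.
good = [(a, b) for a, b in zip(era1, era2) if a < 4.50]
print(round(corr(era1, era2), 2))                                  # wide range
print(round(corr([a for a, _ in good], [b for _, b in good]), 2))  # restricted range
```

The restricted group shows a weaker year-to-year correlation even though every pitcher's talent and noise were generated the same way, purely because the spread of first-year ERAs shrank.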


Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 12:05 a.m., February 7, 2004 (#46) - RossCW
  You have a serious reading comperhension problem. Please see a specialist, or a f***ing fourth-grade teacher. I said it replicates Voros's results. With a different data set, yes.

You need a f***ing dictionary. The "results" are not identical if the data is different, you idiot. Here is the exact question I asked that provoked your first response:

who here actually duplicated McCracken's work and produced identical results?

As I suspected - no one here ever duplicated Voros results before endorsing them and spreading them to the world at large.

I see the point about redoing Voros's study, the idea that he may have screwed up his study.

Then stop complaining about my raising the fact that no one bothered to do it. Or is that just another red herring.

If he did, then it's odd that everyone since has reported similar results

Everyone being Jay. As you note, it's nice that Jay got similar results - but what if he hadn't? It's the first time anyone has done the work. When Tippett tried to duplicate the results Voros claimed about the bunching of pitchers' BABIP, he couldn't.

The fact that a single year's DIPS ERA correlates better with next year's ERA than the raw annual ERA does hardly has any meaning, since no one has ever believed that ERA is very predictive of the next year's ERA. That's why Bill James came up with component ERA in the first place.

I prefer to use balls in play as the basis for my comparisons, rather than innings pitched. It's a personal preference, but since we're trying to look at a measure of skill on balls in play it makes more sense to me to evaluate the group based on that, rather than on IP which is only indirectly related to BIP.

I agree - but if there were substantially different results between the two you would want to figure out why.

One issue that I have never seen addressed is that we are dealing with data that is shaped in part by how pitchers are managed. It is related to the above, since IP is a measure of how long someone lasted in the game - not how many batters they faced, but how many they got out.


Futility Infielder - 2003 DIPS (January 27, 2004)

Discussion Thread

Posted 12:13 a.m., February 7, 2004 (#47) - RossCW
  Actually if the only question is BABIP the proper way of choosing a cutoff is probably just balls in play.



Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 12:05 a.m., February 1, 2004 (#33) - RossCW
  Here is the result of all that for
HBP: HBP/(PA-IBB-HBP)
BB: (BB-IBB)/(PA-HBP-BB)
SO: SO/(PA-HBP-BB-SO)
HR: HR/(PA-HBP-BB-SO-HR)
xH: (H-HR)/(PA-HBP-BB-SO-H)

Perhaps someone can explain why the relationship of home runs to plate appearances that are not strikeouts, walks, home runs or hit-by-pitches is significant, or the relationship of strikeouts to plate appearances that are not walks, hit-by-pitches or strikeouts?
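
Spelled out as code, my reading of the ratios listed above (the sequential denominators are exactly what the question is about; the function and the sample line are mine, for illustration only):

```python
def component_rates(pa, h, hr, bb, ibb, so, hbp):
    """Each event expressed against the plate appearances left after removing
    the 'earlier' events, per the list above."""
    return {
        "HBP": hbp / (pa - ibb - hbp),
        "BB": (bb - ibb) / (pa - hbp - bb),
        "SO": so / (pa - hbp - bb - so),
        "HR": hr / (pa - hbp - bb - so - hr),
        "xH": (h - hr) / (pa - hbp - bb - so - h),
    }

# An invented pitcher line, just to show the denominators shrinking step by step:
line = component_rates(pa=700, h=150, hr=20, bb=60, ibb=5, so=140, hbp=10)
for k, v in line.items():
    print(k, round(v, 4))
```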


Forecasting Pitchers - Adjacent Seasons (January 30, 2004)

Discussion Thread

Posted 2:04 p.m., February 1, 2004 (#35) - RossCW
  It always seemed to me, by casual observation, that pitchers in general tended to improve their K rates and HR rates for a few years after coming into the lg.

  That's because they do. The numbers above measure K rate against plate appearances that are not walks, HBP or strikeouts. So if a pitcher strikes out 5 and walks 4 in 10 plate appearances one year, and strikes out 5 and walks 1 in 7 plate appearances the next, the "K rate" remains the same even though the pitcher struck out over 70% of the batters he faced the second year and only 50% the first.
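
A quick check of the arithmetic in that example, using the SO ratio defined earlier in the thread (SO/(PA-HBP-BB-SO)):

```python
def so_ratio(so, pa, bb, hbp=0):
    # SO relative to plate appearances that are not walks, HBP, or strikeouts
    return so / (pa - hbp - bb - so)

print(so_ratio(5, 10, 4))  # year 1: 5 K, 4 BB in 10 PA -> 5.0
print(so_ratio(5, 7, 1))   # year 2: 5 K, 1 BB in 7 PA  -> 5.0 (identical)
print(5 / 10, round(5 / 7, 2))  # raw K/PA: 0.5 vs 0.71
```

The ratio is unchanged between the two seasons even though the raw strikeout percentage jumped, which is the distortion being described.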


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 3:36 p.m., February 3, 2004 (#25) - RossCW
  If I understand the selection of "clutch situations", it includes a lot of situations that players would not consider clutch. For instance, the top of the seventh with the score tied 5-5 and no one on base isn't really a "clutch" situation. So presumably the measured impact of the situations that are truly "clutch" is diluted by the lack of impact in situations that aren't.

It is certainly likely that players are more sensitive to the impact of clutch hitting than its statistical significance might warrant. At bats that win ball games tend to get noticed.

The relative value of sluggers versus singles hitters in those situations may be related to the quality of pitching they face. I think a manager is much more likely to bring in a top reliever to face a slugger than a singles hitter.

It also seems that a larger percentage of the at bats considered are truly "clutch" for sluggers. Tony Gwynn is not really in a clutch situation with no one on base, even in the 8th inning. A slugger - who could put his team in the lead with one swing - might feel like that is a clutch situation. It's also plausible that the slugger will swing for the fences when the bases are empty and not when there are runners on base.

Assuming someone repeats the methodology used here and gets the same results (something I am wholly incapable of doing), this study seems to end the debate about whether clutch hitting exists. But it calls for further study to better understand how it exists. Or more accurately: there is something about the group of at bats selected that causes some hitters to get statistically significantly different results than other hitters do, compared to other at bats. I don't know what another explanation of the results would be, but that also needs to be considered.

My comments are just suggestions for areas where further study is warranted.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 3:00 p.m., February 4, 2004 (#49) - RossCW
  I'm with David that we don't know why some players hit better than others. It may be that players that just don't care very much do better - the exact opposite of the popular notion. Its also possible that there are a variety of reasons for different players.

On Jack Clark - I have a different memory. I remember being unconvinced by James's work on clutch hitting and thinking his paean to Clark sounded like a fanboy exception, which reinforced my sense that he had a conclusion in mind in both cases.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 12:22 p.m., February 6, 2004 (#72) - RossCW
  AED -

Have you used this method on Batting Average on balls in play?


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 10:14 p.m., February 6, 2004 (#80) - RossCW
  there is an inherent spread in BABIP among pitchers that can be measured; I don't think there's much question of that. The greater question is how much is due to ballpark effects, quality of fielding, pitcher's skill, and luck.

I see the problem.

In basic linear weights, a single is worth .47 runs. Suppose we double its value for close and late (my guess is that this is high, and I hope it would offset the fact that I am only talking singles here). .94*2.77 is just 2.6 runs, or about a quarter of a win.

It's pretty clear that the relationship between hits and runs is not linear - the value in terms of runs scored rises as the number of hits rises. And the value of hits is higher in clutch situations precisely because they lead to more wins than hits in non-clutch situations.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 1:26 p.m., February 8, 2004 (#84) - RossCW
  "Its pretty clear that the realtionship between hits and runs is not linear"

For the team, yes, since aside from a HR you need more than one "positive event" to score a run. Likewise for pitching. For hitters, though, it is basically linear.

I don't see any way to test the question for individual players. I thought the whole point of linear weights was to assign a run value to an individual action and runs are scored by teams. A good hitting team gains more than a bad team from the same offensive action.

The result of one at-bat only affects that player's next at-bat if 10+ batters go in one inning. Otherwise there's lots that affects the importance of a batter's at-bats, but the batter himself isn't responsible for it.

That is true in one sense - but it isn't really true if a player has the ability to hit better in some situations than others. The number of wins created by home runs in "clutch situations" will vary depending on how you define that term. And it is not likely that variations in situational hitting performance are limited to clutch situations.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 10:30 p.m., February 8, 2004 (#88) - RossCW
  Sure, a batter hitting behind a great OBP hitter means that he will have a disproportionate number of at-bats with a man on base, and thus his at-bats have greater leverage. However, since he is not himself responsible for that increased leverage, his impact on the game outcome is linear and should be treated as such.

It seems to me once you accept the notion that batter's hits are not randomly distributed then this is no longer true. You cannot assume that you can assign an identical value to each hit - if the player had had more opportunities with men on base they would have had more hits. Or perhaps they would have had more if they had played on a poorer team with fewer clutch situations.

The easiest way to look at it is that each player goes to the plate with a certain probability of his team winning, and finishes his plate appearance with a different probability of a win. The difference between the two is very closely related to Tango's lwtsOBA for the outcome of the at-bat multiplied by the LI. The lwtsOBA is the player's responsibility; the LI is not.

But the actual average impact of an at bat is to reduce the offensive team's chances of winning - at least if you accept the win probabilities that have been published here. The net for offensive players is negative by a considerable margin. I don't think that is true of the linear weights analysis.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 1:12 a.m., February 9, 2004 (#90) - RossCW
  The net offense is zero, pretty much by definition

  I think this is a common mistake. There is no reason to think that the net offense will be zero. The zero-sum is for offense and defense combined. Take the example where teams trade the lead every inning. At the end of every half inning, the average at bat for the offense in that inning will have increased the chances of the offensive team winning. You can construct examples of similar effect where the team on offense loses ground in almost every inning.

And in fact, when you look at all game situations and how likely a team is to win before and after each at bat, the average change is negative for the team at bat. Teams lose ground on offense and make it up when they are in the field.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 1:22 a.m., February 9, 2004 (#91) - RossCW
  You initially raised the point that run creation is not linear, which I interpreted as meaning how runs are scored within one inning.

What I was actually referring to was over the course of a season, but the point is the same: if players' performance varies by situation, then you cannot attach an average value to a typical outcome. There is no typical single, since how many singles a player hits depends on the actual game situations he was in, and the actual value of those singles to the team also depends on those situations.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 12:41 p.m., February 9, 2004 (#98) - RossCW
  Ross, the median change is negative, in that over 50% of at-bats result in outs. However, the average change is zero because the positive changes tend to be larger than the negative changes

No. The mean is negative. While the positive changes may tend to be larger, they are not large enough. I gave you an example. I don't think there is any way for both offensive teams to have a better chance to win after every inning and still have a zero average for the offenses at the end of the game.

you'll get a positive net offense and equally negative net defense.

I was talking about win probabilities, and about both offense and defense, not just offense. The study that was done here was based on the actual probability that a team would win given a certain score, inning, number of outs, and players on base. These probabilities were then compared before and after each plate appearance. On average, the offensive team lost ground on each plate appearance, and the defense gained ground.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 1:29 p.m., February 9, 2004 (#100) - RossCW
  You can do the math by innings, half innings, plate appearances, pitches, it doesn't matter - probability is conserved

Yep. Every time the probability that the offensive team will win declines, the probability that the defensive team will win increases.

Now, if the win probabilities published here are not consistent with this,

The win probabilities are completely consistent with this. If you look at the numbers for defense and offense they total to zero. But the defense is a net positive and the offense is a net negative.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 1:59 p.m., February 9, 2004 (#103) - RossCW
  However, the true total offense is indeed exactly zero, as is the mean outcome of a plate appearance.

That is highly unlikely since there is absolutely no reason it has to be.

The bottom line is that the win prob tables that I generate and use (which are after-the-fact on a league level) will automatically preserve that off=def=0 on a league level.

I accept that if you assume they will be zero they will be zero, but that does not accurately reflect the actual probability as measured. And there is no reason to think it will.

The change in win probability by the offense, on a league level, will always be zero. This is not true at a team, game, inning, or PA level.

You will have to explain how a positive win probability for the offensive teams in one game is accompanied by a negative win probability in some other game.

I gave the example of teams trading leads every inning.

Is it agreed that the win probability of the offensive team went up after each inning? Does that not mean that the average plate appearance in those innings increased the win probability for the offensive team? Doesn't that mean that at the end of the game the average plate appearance by the offense was positive? How does this get made up for in other games? And why would one believe it will?


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 2:44 p.m., February 9, 2004 (#106) - RossCW
  As I noted on the thread on this topic, the error amounts to 0.0002 wins/PA,

The data provided was not based on a theoretical model - but actual probabilities. Unless someone has rerun the data and gotten a different result, asserting there is an error doesn't mean there is one.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 3:42 p.m., February 9, 2004 (#109) - RossCW
  It's that accurate probabilities will result in a situation where the average probability change should be zero.

That is not true - the average (net) probability change for the offense and the defense combined has to equal zero, but there is no requirement that each be zero. One can be positive to the extent the other is negative.

Obviously at the end of that inning, the home team's win probability will be either 1.00 (if they scored) or 0.50 (if they didn't),

Actually I doubt that is correct - my guess is that the actual probability the home team will win is greater than 50-50 even if they didn't score.

Now suppose that only 30% of such teams actually score in this situation, meaning that the "average" wins produced by offenses in this situation is -0.25 wins.

I don't follow this at all. The data you are disputing is not a theoretical construct. We know how many teams win in each situation based on past observation. That is to say, if there is a 90% probability that the home team will win, it is because the observed data found that home teams actually won 90% of the games in that situation. If their chances of winning at the end of the inning were 55%, it is because 55% of home teams win when the game is tied at the end of nine innings. The chances of the home team winning dropped by 35% over the course of that inning; the chances of the visiting team winning increased by 35%.

When you observe the actual data, the finding was that on average the team in the field increases its chances of winning while the team at the plate decreases its chances. This is based on measuring the actual chances of winning before and after each plate appearance.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 3:52 p.m., February 9, 2004 (#110) - RossCW
  So, you will have on team that will be off= +.50, and the other team that will be off= -.50 (and def=0 for both).

If you assume defense is zero then you have to assume offense is zero. I see no reason to assume either if we are talking about the impact on the probability a team will win. It ought to be obvious that the team that wins is in the field when the game ends (the exception being walk-off victories). Their chances of winning clearly increased while they were in the field. When Torii Hunter catches a ball or Randy Johnson strikes out a batter, they are increasing their team's chances of winning.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 4:51 p.m., February 9, 2004 (#112) - RossCW
  What you have here is actual data showing that the actual probability of a team winning increases more while it is fielding than while it is batting. This data is being rejected as in error because it fails to match a theoretical construct that says that won't happen.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 7:07 p.m., February 9, 2004 (#115) - RossCW
  Voice of Unreason - here is the link to the study I am referring to, posted here: http://www.livewild.org/bb/index.html. If you add up all the numbers for the offensive players and all the numbers for the pitchers (he did not consider fielding), the combined total is 0. But when the pitchers are on the mound their chances of winning increase and the chances of the team hitting decrease - and of course the converse is true as well.

If the AVERAGE turn at bat will lead to "x" probability of winning at the end of the inning then, as I understand it, x is by definition the probability of winning at the beginning of the inning.

I'm sorry, I really don't understand that.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 8:18 p.m., February 9, 2004 (#118) - RossCW
  If the AVERAGE turn at bat will lead to "x" probability of winning at the end of the inning then, as I understand it, x is by definition the probability of winning at the beginning of the inning.

Not if you consider that the other team will then come to bat and the probability they will win will also go down on average.

Voice of Reason - just add up the numbers for batters and the numbers for pitchers.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 9:33 p.m., February 9, 2004 (#120) - RossCW
  Sorry Voice of Reason - the numbers I was referring to are in the large download where he gives the net impact of every player on the probability that their team will win. If you add up the impact for the batters (which also includes all pitchers who batted) it is negative. If you add up the impact of all pitchers it is positive.

I don't know what you are looking at. I'll take a look later.

Are you proposing that, since the pitchers are getting more credit for wins than the batters, that it necessarily follows that the win expectation is going down (on average) in the offensive part of the inning?

I have no idea whether that is related to the win probability data or not or what its implications are.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 11:40 p.m., February 9, 2004 (#123) - RossCW
  Put differently, if I calculated a 0.60 win probability for a specific situation but 70% of teams in that situation go on to win, then obviously the win probability in that situation was really 0.70!

Then you miscalculated. So where is the evidence for a miscalculation in the data presented? Where was the error? I don't think you have evidence of one, except that it contradicts your theory. You believe teams are not supposed to gain ground on defense and lose ground on offense, so it must be wrong.

This isn't an arbitrary "theoretical construct" as Ross claims; it's the straightforward definition of the term "win probability".

Well, yes, it is arbitrary, because there is nothing supporting the claim that the actual probability was 70% when the data presented said it was 60%.

There seems to be an assumption that the average decline of the offense only applies to one team - but it applies to both. So while a team's chances of winning may decline while it is at bat, its opponent's chances will also decline on average while they are at bat.

I have given a specific example where teams trade the lead every half inning. The chances of the team at bat winning go up after every half inning, for both teams. But at the end of the game someone wins because, by definition, the chances of the team in the field winning went down in every inning. Since both teams bat and both teams field, there is absolutely no reason why they can't both improve their chances while batting, or both improve their chances while fielding.
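A sketch of the bookkeeping in that see-saw example: the win-probability values below are hypothetical, chosen so the lead trades hands every half inning and the home team wins on a walk-off. Each half inning's change is credited to the batting team's offense and debited to the fielding team's defense.

```python
# Hypothetical home-team win probabilities at successive half-inning
# boundaries; the lead changes hands every half inning.
wp_home = [0.54, 0.35, 0.70, 0.30, 0.75, 0.40, 1.00]

credit = {"home_off": 0.0, "home_def": 0.0, "vis_off": 0.0, "vis_def": 0.0}
for i, (before, after) in enumerate(zip(wp_home, wp_home[1:])):
    delta = after - before            # change in the HOME team's win prob
    if i % 2 == 0:                    # top half: visitors bat, home fields
        credit["vis_off"] += -delta   # visitors' own probability moved -delta
        credit["home_def"] += delta
    else:                             # bottom half: home bats, visitors field
        credit["home_off"] += delta
        credit["vis_def"] += -delta

print(credit)
print(round(sum(credit.values()), 10))  # 0.0 - the four ledgers balance
```

In this trace both offenses come out positive and both defenses negative, yet the four totals still sum to zero. That balancing constraint is the part no one disputes; the argument is over whether real data must also make each side's total zero on its own.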

So first off, if you really think you can measure contributions to an accuracy of 0.0002 wins, you're deluding yourself.

Of course you can measure what actually happened to that degree of accuracy. We know that the actual probability of a team winning is not even close to being measured, since there are probably hundreds of factors that affect the actual chances that we are not controlling for.

But we can measure how often the offensive and defensive teams have won with no one out and runners on first and third in the bottom of the third inning with the game tied. And we can measure how often they win with one out, and we can measure the difference between those two and assign that difference to the player who was at the plate and made the out. And we can do that for every plate appearance. And when we are done we can add up all those changes. And when we do, the offensive team has lost more ground than it has gained and the defensive team has gained more ground than it has lost. And there is nothing theoretically impossible about that outcome.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 12:08 a.m., February 10, 2004 (#124) - RossCW
  This is such a waste of time. Anyone who does the win prob tables does it virtually the same way... you always get the off=def=0.

Tango - the data presented by Oswalt shows a net loss on offense of over 1000 wins. You can repeat your theory all you want but you have no evidence to back it up and the actual results contradict it. Perhaps you should find where he made his error instead of insisting there must be something wrong.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 12:20 a.m., February 10, 2004 (#126) - RossCW
  Sorry Voice of Reason - the numbers I was referring to are in the large download where he gives the net impact of every player on the probability that their team will win. If you add up the impact for the batters (which also includes all pitchers who batted) it is negative. If you add up the impact of all pitchers it is positive.

I don't know what you are looking at. I'll take a look later.

OK - it is in his large download of the impact of every at bat from 1972 to 2002 for every player. He calculated the chances of winning before the at bat and after the at bat and attributed the change to the batter. He then calculated the net wins for each player based on those outcomes. Some players had a positive impact, others a negative impact, on their team's chances of winning. The net impact for the entire list is minus 1067 games. The defense had a net positive of 1067 games.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 2:37 a.m., February 10, 2004 (#129) - RossCW
  Using a database of play-by-play accounts of (almost -- a few 1972-1973 are missing) every game in the major leagues 1972-2002, I constructed tables which show a team's chance of winning a game, based on the score, the inning, location of baserunners, and number of outs. I have constructed such tables, one per year, for each year since 1972. In 1996 for example, if the home team is one run behind in the bottom of the seventh with 1 out and a runner on second, they have a .45253 chance of winning.

This sounds like he constructed his tables from the actual results of each situation not estimates of the likely result.

These tables are useful in many ways. One use is to determine the value of a player's performance. In the example of the seventh inning situation above, if the runner steals third, the home team's chance of winning improves from .45253 to .49368. This improvement of .04115 is a contribution of the baserunner. If instead the batter had singled to score the runner from second, the home team's chance of winning improves to .58374, and the improvement of .13121 (= .58374-.45253) is credited to the batter. On the other hand, the pitcher's contribution to his team for this event is the opposite, namely -.13121. Had the batter instead made an out without advancing the runner, he would get a -.07986 for lowering his team's chance of a win to .37267, and the pitcher is credited with +.07986.

And that sounds like he did exactly the analysis I suggest.
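The crediting arithmetic in the quoted example is simple enough to reproduce directly, using the probabilities Oswalt gives for that 1996 seventh-inning situation (home team down one, one out, runner on second):

```python
# Reproduce the credit/debit arithmetic from Oswalt's quoted example.
wp_before = 0.45253          # home team's chance of winning before the play

def credit_play(wp_before, wp_after):
    """Return (offense_credit, defense_credit) for a play that moved the
    home team's win probability from wp_before to wp_after."""
    delta = round(wp_after - wp_before, 5)
    return delta, -delta

# Runner steals third:
print(credit_play(wp_before, 0.49368))   # (0.04115, -0.04115)
# Batter singles home the runner:
print(credit_play(wp_before, 0.58374))   # (0.13121, -0.13121)
# Batter makes an out, runner holds:
print(credit_play(wp_before, 0.37267))   # (-0.07986, 0.07986)
```

Every play is a zero-sum transfer between the batting side and the pitching side; the dispute in this thread is about what those transfers sum to over a whole season.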

By adding the value of these contributions of each event for each player over the course of a season, we get an exact measure of the value of that player's contributions as compared to average performances, which I call the player's win value. A value of +x can be interpreted as turning x losses into wins, or as contributing 2x to his team's number of games over .500. (emphasis added)

And Oswalt at least did not think he was making estimates.

Well, that chart is certainly clear.

It is clear, but it seems to repeat the basic truism - which no one disputes - that the net change for both teams combined is zero. At the end of the top of the first you don't even have data to compare for offense and defense, if that was your intent.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 3:00 a.m., February 10, 2004 (#130) - RossCW
  At the end of the top of the first you don't even have data to compare for offense and defense if that was your intent.

Actually I take that back. Obviously you can compare the impact of the top half of the inning on the offense and defense.

By taking the difference between the chances of winning at the top of the inning and after, you will find a slight advantage for the home team - that is, it gains more from the visitors' failure to score than they gain from scoring.

i.e. (.598-.546)*.721 = 34 is greater than the combined changes that favor the offense - that total is 32. My guess is that those differences increase the closer you get to the end of the game.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 10:08 a.m., February 10, 2004 (#135) - RossCW
  Ross, the difference you observe (34 vs 32) is rounding error combined with the fact that the frequencies I listed do not include all of the outcomes which favor the offense.

There may be a rounding issue, but the argument that some factors favoring the offense are not included is clearly wrong. Anything that results in a runner advancing will show up in the change from one probability state to the next and be attributed to the offense.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 12:04 p.m., February 10, 2004 (#137) - RossCW
  I know that it isn't because I generated the transition frequencies in #125, so I know what data was thrown away.

You will have to explain how it got thrown away. What happened to the wild pitches? Was the probability of the team winning after the plate appearance calculated as if the wild pitch hadn't happened? I don't think so.

The comparison of before and after states doesn't care how the state changed, and it doesn't need to - it just measures the change.
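A small sketch of that point, with made-up win-probability values: the wild pitch isn't discarded, it is folded into the overall before/after change, which decomposes into the wild-pitch step plus the strikeout step.

```python
# Hypothetical win probabilities for three states during one plate appearance.
wp = {
    ("runner_on_1st", 1): 0.55,   # before the plate appearance (1 out)
    ("runner_on_2nd", 1): 0.58,   # after a wild pitch moves the runner up
    ("runner_on_2nd", 2): 0.50,   # after the batter then strikes out
}

full_change = wp[("runner_on_2nd", 2)] - wp[("runner_on_1st", 1)]
wild_pitch_step = wp[("runner_on_2nd", 1)] - wp[("runner_on_1st", 1)]
strikeout_step = wp[("runner_on_2nd", 2)] - wp[("runner_on_2nd", 1)]

# The overall state-to-state change already contains the wild pitch:
print(round(full_change, 5))                       # -0.05
print(round(wild_pitch_step + strikeout_step, 5))  # -0.05
```

However the intermediate events are sliced up, the end-to-end state comparison captures them all; nothing that changes the state can be lost.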


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 12:43 p.m., February 10, 2004 (#140) - RossCW
  the missing datapoints must be those where the visitors scored more than 5 runs, which is the part of the probability table that favors the offense.

Oswalt didn't base his results on that probability table but tables that measured the likelihood of every state before and after each play.

You know that it isn't because you can add the frequencies in that post, see that they sum to only .994,

I get .997 - I am not sure why you think that .003 is significant. For that to be the difference, the chances of the home team winning would have to be zero, and that is not the case. The impact on the overall probabilities in the first inning is very small - as I said above, you would expect the difference to grow as you get closer to the end of the game.

I'm substituting the frequencies calculated from the observed transitions in an old study of mine, which are only going to be very close to the frequencies calculated by Oswalt from the observed transitions in the data from his study.

So we can't reliably calculate the difference between offense and defense from the data you have provided, given how small the differences are. We still have the results Oswalt published for every player, based on the changes in state for each play.

What we are left with is:

The only data we have shows that the probability of a team winning increases when it is in the field and decreases when it is at bat.

There is no theoretical reason why it should increase and decrease equally when a team is in the field or at bat.

No one can find an error in Oswalt's methodology or his data that would demonstrate why it is wrong.

But the consensus here remains that he is wrong, the data is wrong and the assumptions that have been held for so long are correct.

What we have is an object lesson in the elusiveness of "objective" truth.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 1:18 p.m., February 10, 2004 (#142) - RossCW
  In my previous post (#128), I did in fact carry out the work you suggested. That is why I know for a fact that, if you took empirical transition properties for base-out-inning-score states, you would indeed find a total offensive value of zero.

That was post #128, and I don't see any work or results there. And your description of how Oswalt did his projections contradicts his own description of them.

If you actually bothered to do the analysis you suggest, you would find that the offensive team gains exactly as much ground as it lost.

However, this is not what Oswalt has done; he's estimated the state-to-state transition probabilities and computed win probabilities for each state using those estimated transition probabilities. The estimations are pretty good, but not perfect. The errors in those assumptions are what cause the 1000 win discrepancy.

I'm puzzled why you keep repeating this claim. You have provided no evidence to support it. It's certainly possible Oswalt is wrong - but simply saying he is wrong is not sufficient.

It would be a remarkable coincidence if the claim that offense and defense are equal were true. There is no reason why they should be.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 2:13 p.m., February 10, 2004 (#145) - RossCW
  What you're claiming is equivalent to saying that batting average doesn't HAVE to be equal to hit/at bats.

We define batting average as hits / at-bats.

No one is disputing that the sum of the two teams' probabilities of winning is 1. That is true by definition - the same way avg is hits / at-bats.

And no one is disputing that if a player's batting average is .333 and they went 4 for 4 today, then somewhere they were hitless in 8 at-bats - or their batting average isn't .333.

The question is whether that principle applies here and the data says it doesn't. So is the data wrong or is the logic wrong? I have to admit I'm leaning toward the data - but I'm not nearly as certain as everyone else here.


Copyright notice

Comments on this page were made by person(s) with the same handle, in various comments areas, following Tangotiger © material, on Baseball Primer. All content on this page remains the sole copyright of the author of those comments.

If you are the author, and you wish to have these comments removed from this site, please send me an email (tangotiger@yahoo.com), along with (1) the URL of this page, and (2) a statement that you are in fact the author of all comments on this page, and I will promptly remove them.