See copyright notice at the bottom of this page.
Aging Patterns
June 26, 2002 - tangotiger
(www)
(e-mail)
In a study I did in March 2001 (which included the hitter's last year, but used a much larger sample of players), I found that hitters improve their walk ratio virtually every year, strike out the least at age 29, and get their best HR ratio at age 27. Their balls-in-play success goes down almost instantly, their line drive power stays pretty flat for a long period of time, their speed as measured by triples goes down instantly, and their speed as measured by SB peaks at 24 and then goes down at almost the same rate as the triples.
My intention is to eventually re-run that study but with the new information I've discovered recently regarding the last year effect.
A thread with all the data can be found here - http://baseball.fanhome.com/forums/showthread.php?threadid=662692#post1958322
Aging Patterns
June 26, 2002 - tangotiger
(www)
(e-mail)
As to your other question on different aging patterns for different types of players, I also took a look at this a while ago. My sample set was pretty small, and so I wouldn't want to draw any strong conclusions from it, but the evidence was showing that all types of players age the same way. The Tim Raines class of runners would lose its abilities across the board (SB, HR, Hits) to the same extent that the Wade Boggs class of runners would. This is another area that I will be (eventually) looking at.
Aging Patterns
June 26, 2002 - tangotiger
(www)
(e-mail)
I usually only look at 1919 and later because I don't think that "power" is well-represented in the pre-1919 time period. Even HR are not a true representation of "power", but they're still pretty good to use.
In that March 2001 study (which I'll reiterate used a slightly different methodology), I found virtually no difference between the aging patterns of the various skill sets between 1919-1979 and 1979-1999.
In that study, I concluded the following: "...the historical averages match up very well with the recent period. While today's ballplayers may be better, and playing longer, the "curve" of their aging is the same. There is no age bias with today's regimen of training and medication. It affects all age groups the same."
Aging Patterns
June 26, 2002 - tangotiger
(www)
(e-mail)
jmac: yes, I agree that you have other forces at work. This is why I first presented that large chart breaking up the performances by number of years in the league (for players who debuted at age 25).
What I can do is present a similar chart for players who debuted at age 22, 23... etc, so that we can determine a more specific pattern. The only reason I did not do so is that it would be so overwhelming that the reader wouldn't know where to begin. As well, this kind of analysis suffers from sample size issues, and so conclusions will not be reliable. Let me think how I can best present such data.
Aging Patterns
June 27, 2002 - tangotiger
(www)
(e-mail)
In other words (and more simply), age is calculated as of Dec 31st.
I was wondering when someone would say that. Yes, I simply make Age = Year - YOB. Not only is it a snap to calculate, you also don't need to know the month the player was born.
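As a minimal sketch of the Age = Year - YOB rule, assuming only a season year and a year of birth are known (the function name `season_age` and the sample years are made up for illustration):

```python
def season_age(season_year: int, birth_year: int) -> int:
    """Age as of Dec 31st of the season: Year - YOB, no birth month needed."""
    return season_year - birth_year


# Two hypothetical players born in January and December of the same year
# get the same season age, because only the year of birth is used.
print(season_age(2002, 1975))
```

The trade-off is that a player born in January and one born in December of the same year are treated as the same age for the whole season.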
Do you know that batting averages ...follows a normal distribution?
I seem to recall looking at this a long time ago to determine how many single-hit and multi-hit games a player would have, if it did follow such a distribution. And it did.
Do you stick to that in strike years, when 300 PA were harder to come by?...
Yes, on all counts. It's not as if I can say that the reliability of 300 PA in 1982 is similar to 200 PA in 1981. 300 PA is 300 PA. In cases like this, my sample size goes down somewhat. My other option would be to limit my sample set so that 1981 is not part of the study. I can do this for pairs of seasons, but to the extent that I looked at this issue, I needed to have 5 and 10 and 15 consecutive years. Removing 1981 from such a study would drastically reduce my sample size. Your point is valid however.
I do think the league and park adjustments should be done, so that there is more confidence in the conclusions.
I agree. As my sample size goes down, these adjustments become much more important. As for followup articles, I think I'm going to have to make it a whole series of them because there are so many results that this dataset will give us.
Aging Patterns
June 28, 2002 - tangotiger
(www)
(e-mail)
One of the previous posters mentioned that players who play longer will have different aging curves than those who don't.
Working always from the same sample, I broke my main sample into three subsets. The first subset is those players who played between the ages of 24 (or earlier) and 33 (or later). This is a group of players that have at least 10 years' experience, and who have had the chance to play during the "traditional" peak years. The second subset is those players whose careers were over by the age of 31. Therefore, these might be those guys who you think might have peaked earlier. The third subset is everyone else. This means it's a group of players whose careers were over after the age of 31. These are just a whole bunch of different types of guys, but skewed slightly towards the older players.
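The three-way split can be sketched as below. This is only one reading of the description: the function name `career_group`, the use of first and last playing age as inputs, and the exact cutoffs (24 or younger at the start, 33 or older at the end, last season before age 31) are assumptions, and the original study may have applied playing-time or other filters not shown here.

```python
def career_group(first_age: int, last_age: int) -> str:
    """Assign a player to one of three subsets by first and last playing age.

    Assumed cutoffs (one interpretation of the description, not the
    author's actual code):
      long  - played at 24 or earlier AND at 33 or later
      short - career over before age 31
      rest  - everyone else
    """
    if first_age <= 24 and last_age >= 33:
        return "long"
    if last_age < 31:
        return "short"
    return "rest"


# Hypothetical careers given as (first age, last age).
for career in [(20, 36), (22, 30), (26, 35), (25, 32)]:
    print(career, career_group(*career))
```

Because the groups are defined by when a career starts and ends, any curve computed within a group inherits those endpoints, which is the selection effect discussed in the post.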
Because of sample size issues, some of the results might look strange. In any case, here it is:
Age     #   Long     #  Short     #   Rest
19      2  0.565     1  0.810
20      5  0.687     4  0.871
21     25  0.816    15  0.921     3  0.936
22     61  0.857    36  0.939     9  1.025
23     99  0.930    81  0.999    16  1.007
24    140  0.936   131  0.987    21  0.972
25    140  0.982   167  0.994    79  0.929
26    140  0.989   191  0.994   119  0.973
27    140  1.000   181  1.000   166  0.991
28    140  0.987   141  0.994   185  0.989
29    140  0.981    80  0.966   203  1.000
30    140  0.979                218  0.969
31    140  0.977                162  0.980
32    119  0.954                114  0.976
33     88  0.938                 80  0.951
34     58  0.930                 55  0.945
35     39  0.885                 35  0.930
36     27  0.887                 20  0.905
37     17  0.856                 11  0.937
38     11  0.842                  4  0.906
39      5  0.809                  1  1.014
40      4  0.790                  1  0.942
41      2  0.672
42      2  0.563
Hopefully, the formatting comes out here.
What do we see? As you would expect, those players with long careers centered around the expected peak years did just that. They had their best years at the 25-29 level, peaking at age 27. They had a bit of a jump prior to that. Then they had a slowly declining phase to age 34, after which they plummeted. However, don't forget we specifically selected our subset for players who played between the ages of 24 and 33. Therefore, we should not be surprised to see demarcation points close to these ages. Furthermore, the peak point was still age 27. The slowly declining phase is a result of the selective sampling.
The second set of players is far more interesting. These are players who, for whatever reason, had their careers over by the age of 31. These players did not have the traditional aging curve. Essentially, they stayed at a peak level between the ages of 23 and 29. Is it possible that there is a class of players that don't get to the next level? That perhaps there is a class of players that peaks at age 23 and stays there? Or is this again a bias in our sample? Because we specifically chose players whose careers ended prior to age 31, is this exactly what we should expect to see? The sampling bias is the much more likely explanation. Management did not give these players a chance to show their stuff, and simply cut them before their truly good seasons could be shown.
The last set of players has a bias to be an older type of player, and the results show this. These players peaked between the ages of 26 to 32, peaking at age 29.
Selectively choosing your sample set leads to many biases.
Aging Patterns
July 1, 2002 - tangotiger
(www)
(e-mail)
Do you feel the Linear Weights Ratio is a much better measure of offensive performance than OPS? If so, what evidence do you have? If not, may I suggest you use OPS+ in your future studies?
On June 26, I posted the following on fanhome:
"[OPS]'s extraordinarily useful and practical because:
- it's readily available
- it's made up of the two most important rate stats we have
- it's highly correlated to runs scored
- it can be used in research when you have the power of sample size that masks its deficiencies
It is NOT useful because:
- you cannot count on it for game-level decisions
- you cannot count on it to evaluate players of weird profiles
- it does not properly weight all the events
So, depending what you are trying to do, OPS is either a godsend or a bane.
The reason I hate it is that people use it for the exact reasons that it is not useful."
In sum, OPS is used as a stand-in when you don't have something better. LWR is something better.
I'm not sure why I need to show "better" evidence to use LWR over OPS. All the deficiencies of OPS are taken care of in LWR. I can use LWR to convert it into Runs Created or Runs over average, or really anything, in a simple one-step process. LWR is the best "rate" measure we have. (If you want more discussion on LWR, you can check out my site at http://www.geocities.com/tmasc/lwr.html which will give you the full formula, as well as a link that discusses LWR.)
******** Not only will this take care of the adjustments for park and year which you indicate are necessary, but it will make it easier to incorporate your findings with other studies that are OPS+ based...
I have the data to calculate the park/year adjustments, I just didn't want to add another layer of complexity. (If someone wanted to reproduce the above research, they could. If I added park and year adjustments, they couldn't.) As I indicated, I'll add that layer the next time. All studies that are OPS+ based are flawed for the reason that they rely on an indicator that has deficiencies that are circumventable.
Felipe Alou: Is He Afraid of the Walk?
November 13, 2002 - tangotiger
(www)
(e-mail)
I know that Walker hated the way Felipe would talk to him about hitting approach.
The first poster is doing exactly what I said we shouldn't: looking at team totals.
However, the first poster is correct that Alou does have a choice from within the 15 hitters which ones to play. But Alou was not dealt a good hand. If you're given the bottom of the barrel, you should expect to have low walk totals, period.
The team was selected by the GM, and therefore, the low team walk totals are more an indication of the type of team that the GM has selected.
I'm going to continue the analysis tomorrow, looking at it from another angle. I don't know what the results will be, so I'll report whether they favor Alou or not.
Banner Years
October 31, 2002 - tangotiger
(www)
Good comments, guys. I actually meant to address these two issues, and I'm glad you brought them up.
Age: definitely should be looked at, but I can tell you that there is no big age bias, even with the 149 group. I will do the breakdown, hopefully before the end of the day.
Banner selection: one of the considerations I had was that I did not want to select players that were say 110-110-110-140, because of the "regression towards the mean" issue I brought up. That is, even though you've got a guy who you think is a 110 level for three years, he might actually be 107 or 113, etc. The closer you are to 100, the more likely it is that this is a 100 player. Furthermore, by introducing all players, then I get into trouble with losing players. While it is unlikely that you will lose a player from the pool who is 100-100-100, it is very possible that you might lose a player who is 80-80, thereby introducing a bias. Of course, this also depends on position.
Having said that, I was thinking about running through the data anyway, and see what happens. And with the much larger sample size this would allow me, I can select 35% above "previous 3 years" to really highlight the banner years. I'll try to get to this next week.
Banner Years
October 31, 2002 - tangotiger
(www)
Walt, good comment at the end, and this is exactly what I did with the HR study I linked to. And rather than seeing a "retention", we see essentially that what the player did the year after the banner year was repeated the year after that as well.
Again, what I am talking about is not "retention", even though I used that term. We are presuming that a player's performance level is a sample of his true talent level. Therefore, by selecting 130-100-100, I am choosing those players that had a great year followed by 2 average years. This does not imply that this player had an injury or something that forced him to go down to 100. The more years I tack on, the smaller my sample size. You are correct that I can simply show year1, select on years 2 through 4 (whether 100-100-130, or 130-100-100), and then look at year 5. My guess is that year 5 production will be only slightly different than year 1 production, age notwithstanding. This is a good idea, and I will run that next week as well.
Walt: any comment about the Hank Aaron issue?
Banner Years
October 31, 2002 - tangotiger
(www)
MGL, yes, I agree with almost everything you said. Two points:
1 - yes, not my best writing work, as I wrote it in 30 minutes, but what is it that is unclear? Was it the weighting thing at the end? It basically means that you put more weight on the most recent year, and you have some weight to regress towards the mean. Or was it something else?
2 - As for the 149,149,149, which I selected for, the 4th year was 142. However, you then say that this group is actually a "147" group . Not! Because my group is "fairly large", then I would say that this selected group of 149,149,149 is a 142 player. And, if I looked at the 5th year, I would bet that this group would also exhibit 142. I would also bet that the year prior to 149 would also be 142. I would say *every* year around the 149 years would be 142. Do you agree? (Age of course is an issue if I start going crazy and start to consider pre-24 and post-36, etc years.)
However, for a single player, if I had a 149,149,149,142 player, since I didn't select such a player, then I would have to guess that he is a 147 player.
I think we are on the same page, but I'm not sure.
*** As for parks and changing teams, etc, yes that is always a problem. It's "possible" that the park may play an influence in the selection of my players, but I doubt it. The banner year was 25% above the base years, and so, while playing at Coors does increase the chances that he will be selected in the banner year, I don't think this is the case. I'll look into it though.
*** By the way, the more I look into this, this is just like MGL's hot/cold streak study. While he is looking at 15-day periods, I am looking at 3-year periods. We are (or will in my case) looking at the pre-selected and post-selected period, and we are (or might/will in my case) finding that those two values pretty much match, regardless of the intervening period.
Banner Years
October 31, 2002 - tangotiger
(www)
First off, I'm not trying to capture ALL banner years, just some of them. As well, I am not suggesting at all that 149,149,149 is banner performance. I am using that type of player to show that a 149,149,149 player is not in fact a 149 player but a 142 player.
So, when you look at a 130,100,100 player, a player that certainly had a banner year, we should treat the 130 with some hesitation, since, as we've seen, this performance was "lucky" in some respect.
****
Anyway, I've re-run, so that we have "x", 149,149,149, "y". That is, how did the players with 3 great years do just before the "banner 3 years", and just after? Here are the results
Year 1  1.42
Year 2  1.49
Year 3  1.49
Year 4  1.49
Year 5  1.41
This population of players had 593 strings.
Now, if we break it down by age (in Year 5), this is what we get:
Age    Year 1  Year 2  Year 3  Year 4  Year 5    n
34+      1.46    1.51    1.51    1.48    1.37  173
30-33    1.44    1.49    1.49    1.47    1.41  229
29-      1.36    1.46    1.49    1.51    1.45  191
Again, as you can see, the "3 selected years", were pretty constant around that 149 level. The before/after years are consistent with the age grouping. But, in all cases, the before year was less than the selected period, even for the old guys.
There is also about an annualized 2% change in performance level between year 1 and year 5, which is also consistent with my findings in aging patterns previously done.
So, the "true talent level" is year 1 and year 5, and everything in-between is "lucky".
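The annualized change between year 1 and year 5 can be checked with a short calculation. This is a sketch only: it assumes a compound (geometric) rate over the four steps between the two years, which is one reasonable way to annualize and not necessarily how the figure in the post was derived. The function name is made up for illustration; the inputs are the year 1 and year 5 values from the age breakdown.

```python
def annualized_change(start: float, end: float, years: int) -> float:
    """Compound annual rate of change from start to end over `years` steps."""
    return (end / start) ** (1 / years) - 1


# (year 1, year 5) values by age group, taken from the table in the post.
groups = {"34+": (1.46, 1.37), "30-33": (1.44, 1.41), "29-": (1.36, 1.45)}

for label, (year1, year5) in groups.items():
    # Year 1 to year 5 spans four one-year steps.
    print(f"{label:>5}: {annualized_change(year1, year5, 4):+.1%}")
```

Under that assumption the youngest and oldest groups move by a bit under 2% per year in opposite directions, while the middle group is closer to flat.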
Banner Years
October 31, 2002 - tangotiger
(www)
MGL, sorry for the bugaboo.
To go back to your question, let me amplify. The 149 performance is regressed 14% towards the mean to match the "expected probable" true talent level of 142. So, generally speaking, we should regress all 3 year performances by 14%.
Now, of the three remaining components (year x, year x-1, year x-2), we weight the most recent seasons (x) as 38%, and the other two as 24% equally.
As a shorthand, rather than remembering kooky percentages, you can apply integer weights of "3" for "x", "2" for "x-1","x-2", and "1" for "mean". Maybe I should have skipped this part, as it's probably more confusing than it should be.
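The 3-2-2-1 shorthand can be written out as a small function. This is a sketch under stated assumptions: the function name `weighted_forecast` is made up, performance is expressed as a percentage of league average, and the mean is taken to be 100 (the value implied by regressing 149 by 14% to reach 142).

```python
def weighted_forecast(x, x_minus_1, x_minus_2, mean=100.0, weights=(3, 2, 2, 1)):
    """Weighted average of the last three seasons plus the mean.

    `weights` are ordered (year x, year x-1, year x-2, mean). The default
    integer weights 3-2-2-1 approximate the 38% / 24% / 24% / 14% split.
    """
    w_x, w_x1, w_x2, w_mean = weights
    total = w_x + w_x1 + w_x2 + w_mean
    return (w_x * x + w_x1 * x_minus_1 + w_x2 * x_minus_2 + w_mean * mean) / total


# Three straight 149 seasons, using the integer shorthand.
print(weighted_forecast(149, 149, 149))
# Same seasons, using the percentage weights.
print(weighted_forecast(149, 149, 149, weights=(38, 24, 24, 14)))
```

The integer weights work out to 37.5%, 25%, 25% and 12.5%, so the two versions give slightly different answers for the same player (roughly 143 versus 142 here).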
Banner Years
November 1, 2002 - tangotiger
(www)
Good job, MGL!
The mean of the players who played in those 5 year spans, with at least 300 PA, is 115%. Now, this may sound like a lot, but don't forget, we have a lot of repeating players in there (like Aaron).
I don't think that the regress towards the mean would regress to 115%, but I'd like to hear from the statistics-oriented fellows about their thoughts on this matter. I would guess at this point that the Aaron situation comes up, and I should identify unique players only.
Banner Years
November 1, 2002 - tangotiger
(www)
Since age is an issue, and I can easily control for it, I will re-run using that.
As well, the "mean" of the players is 115%. If we look only at one age group for the 5 year period (say ages 26-30), we see that each year they average 115%. If we select any other time period like 24-28, we also get similar results. And of course, no player could possibly exist more than once in each age group. Therefore, the mean is 115%.
Therefore, I should probably select players that center around 115%, and that center around the 27 age group. I'll get to this next week.
Banner Years
November 1, 2002 - tangotiger
(www)
MGL, maybe you missed my last post, but if I only look at one 5-yr period, say ages 24-28, then of course Aaron can only exist once in this string. And, the players in this group are 115% of league average. Now, if I select some other age group, the unique players in that group are also 115%.
However, if I decide to combine the two groups, I might have two Aarons, and two Ruths, etc. I don't see why I would want to remove one of them from the groups.
I think it would be easier to keep all the age groups separate (24-28, 25-29, 26-30, etc, etc) and report on each one separately. This removes the conflicting players, but addresses the Aaron issue. However, I don't see the problem in then combining these three age groups afterwards, AND KEEPING the mean at 115%.
Or maybe I'm missing something?
Banner Years
November 2, 2002 - tangotiger
(www)
Contrarian: I've already admitted my shortcomings in many areas, including statistics. I've taken enough that I can follow conversations, but that's as far as I would take it. I also know enough to apply the basics. This is no news to people who've been reading me, and any of my comments should be taken like that.
I am always interested to hear from Walt Davis, and frankly I just missed his second post (the way Primer regenerates the site, there is a lag, and Walt's post got sandwiched in-between).
I have no problems with people criticizing my approach, or my comments, or anything I do. It would be nice though if you would provide an email address so that we can correspond privately, and you can elaborate further.
Banner Years
November 4, 2002 - tangotiger
(www)
Sancho, thanks for the links. The first one I had not seen, and is a not bad one. As for Albert, I'm frankly disappointed. There's a long list of math professors who have tackled baseball issues, and really either miss something, or write so dry that I miss something. (Of course, there's an even longer list of sabermetricians who miss some math issues as well.)
Banner Years
November 5, 2002 - tangotiger
(www)
(e-mail)
Shaun, I agree age should be taken into account, and I'm currently working on this. I should have something to show as soon as I get the time (which these days is not too much).
As for contract status, certainly this would have an impact. However, by having an aggregate of players, this impact should not be noticeable too much. And of course, since my data is from 1919 onwards, there's an even smaller population which would even be affected by this at all.
As for learning and improving, etc. This is the issue. Is it the case that the player is learning and improving, or is it simply random chance that the player happens to have a banner year. Hopefully, with the new data I have, we'll have a better answer.
Banner Years
November 7, 2002 - tangotiger
(www)
(e-mail)
MGL, no F James specifically said that these 149,149,149 players would not regress, except for aging. In fact, they do regress to 142.
This group of players will regress towards THEIR mean, I agree. In fact, they will regress 100% towards their mean. But since we don't know what their mean is (without looking at other non-sampled years), we take the next best thing: the mean of the population they were drawn from. This mean is in fact 115%. Therefore, given the number of years (3), the number of players (I don't remember, let's say 100), and the number of PAs (let's say 500 / player / year), the best players will regress 7/34 (20%) towards the mean of the population they were drawn from. Different years, different # of players, and different # of PAs will regress differently.
Now, I know little of statistics, and perhaps Walt Davis or Ben V can put this matter to rest.
I'll be back in a week or two with detailed data, broken down by age.
Banner Years
November 8, 2002 - tangotiger
(www)
(e-mail)
F James: I think I wrote this already, but it might have gotten lost with all of MGL's explanation: the year before the 149,149,149 string was 141 and the year after the string is 142. Subsequent years drop off slightly from 142, and in fact match what you would expect from normal aging. (This will become more clear when I do the breakdown by age... eventually, whenever that is.)
Essentially, MGL's point boils down to: whatever period you take, however many players you take, however many PAs that performance makes up, you have to regress to some degree. The amount to regress is related to the variables I just mentioned. By choosing 1 day, we are regressing almost 100%; by choosing 5 years of performance between the ages of 25 and 29, where in each of those years the player has 1 googol PAs, you regress close to 0%. Everything in-between is subject to more analysis.
Given my sample of 3 years of 600+ players of about 500 PAs, the regression of the 149 player is 20% towards the mean of 115 to achieve the true talent level of 142 (more or less).
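The regression arithmetic can be worked through directly. This is a sketch with made-up function names: `regression_fraction` backs out how far a selected group moved toward a chosen mean, and `regress` applies a given fraction. Which mean to regress toward (100 or 115) is the point under discussion in the thread, so both are shown.

```python
def regression_fraction(observed: float, later: float, population_mean: float) -> float:
    """Share of the gap to the mean that was given back in the later period."""
    return (observed - later) / (observed - population_mean)


def regress(observed: float, population_mean: float, fraction: float) -> float:
    """Move `observed` toward `population_mean` by `fraction` of the gap."""
    return observed - fraction * (observed - population_mean)


# 149 selected, 142 in the surrounding years, mean of 115: 7/34.
print(regression_fraction(149, 142, 115))
# Same performances measured against a mean of 100: 7/49.
print(regression_fraction(149, 142, 100))
# Applying a 20% regression toward 115.
print(regress(149, 115, 0.20))
```

The same 7-point drop is about 14% of the gap when the mean is 100 and about 20% when the mean is 115, which is why the two figures quoted in the thread differ.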
Let's Contract Two Different Teams
July 12, 2002 - tangotiger
(www)
(e-mail)
Proofreader guy: you know, I read and reread and re-reread my article, and it amazes me what I miss. How about "here" for "hear", and "marker" for "market"? Competitif is french for "competitive", so I don't think I can use the french excuse.
Common Sense: do you think that if Steinbrenner reduces his payroll from 140 million to 90 million that he will give that 50 million$ of savings to you? In fact, don't you think that now that he set up the YankeeNets that it will be very easy for Steinbrenner to claim much less revenue because the YankeeNets corporation owns the Cable rights, and not Steinbrenner?
If teams claim that they can't play in the same playing field as the Yankees, then either level the playing field by introducing teams into a lucrative market to siphon off some of that revenue, or take some of that Cable money, or realign the two leagues by market size. Let the Yankees and Mets and Redsox and Braves and Dodgers spend themselves crazy. Let the A's and Expos and Royals and Twins spend smart.
To think that by controlling player salaries that you will get an outcome that is different from today is ludicrous. Nothing is going to change. In 5 years, you'll be right back to where you started.
Let's Contract Two Different Teams
July 14, 2002 - tangotiger
(www)
(e-mail)
There's no question that we are introducing accountants into the fold with the owners' plan. As if lawyers aren't bad enough. How many white collar solutions do we have to introduce to "solve" the problem?
Just re-align based on market size. 4 divisions of 8 teams. The top team of each division goes forward, while the 2nd and 3rd place go into a wild-card system where the 2nd place of Division 1 plays 3rd place of Division 4, etc. There's no need to force a socialistic solution. Just change dance partners.
There's no need to overhaul anything. If you want to overhaul, then disband the league, and do it right.
Let's Contract Two Different Teams
July 15, 2002 - tangotiger
(www)
(e-mail)
Common sense: it seems that you've been getting more and more common sense. How much longer before we get Commen Sense the third?
Seriously, when I say to "contract" the New York teams, I intended it to be in a humorous note. But the point of contracting the teams is to reposition the power that is highly concentrated in the New York teams. Since Steinbrenner is consolidating and hiding his power and revenue in a second enterprise (that exists only because of the first), it is unlikely that he will reduce the market value of his interests.
Why would 29 intelligent men buy out a franchise that has limited value (Expos) when they can buy out a franchise that has substantial value (Yankees)? Steinbrenner used the system to its fullest; he capitalized on it with the unanticipated TV value that has created the great divide. Everyone has his price. So, buy out Steinbrenner at fair market price, and redeploy the value of the Yankees by siphoning away the cable and TV value, and selling the rest of the team to an interested buyer. That is, buy Steinbrenner's TV and cable rights away from Steinbrenner.
If that is too hard or too expensive to do (as if maintaining the status quo does not have its own expenses), then just take the "barnstorming" idea to something more palatable. Put the Yanks, Mets, Redsox, Dodgers, Braves, Orioles, Rangers, and Cubs in the "Division 1" league. Put the Expos, Pirates, Brewers, Reds, A's, Marlins, Devil Rays, and Blue Jays into the "Division 4" league.
What would happen? Well, all those Division 1 teams will soon realize that they can't hope to buy their way in because they've got too much competition for too few spots. They'll have to be smart. The Division 4 teams will realize that with just a little effort and smarts, they'll have a decent chance to make a run for the playoffs.
Once in the playoffs, anything can happen (especially if you make the first round 5 games instead of 7).
Without spending a single dollar on either side, we can reshape the entire competitive balance by simply changing divisions.
And what's more shocking: that I say to redistribute the wealth of the Yankees to the poor teams, or Selig redistributing the wealth of the Expos to the rich teams?
Let's Contract Two Different Teams
July 16, 2002 - tangotiger
(www)
(e-mail)
Willy Loman is all in favor of the American dream. And I didn't say to steal it from the Boss, but buy it back from him. MLB made a huge error in not securing the TV rights the way the NFL did. Now they've got to pay for it. Literally. Once they do that, the chips will fall into place. But to restrict player salaries through non-American ways? I don't think so.
Let's Contract Two Different Teams
July 24, 2002 - tangotiger
(www)
(e-mail)
I think the soccer relegation/promotion idea is viable. But I question the 30 teams/league decision. The disparity will still exist. Why not have a 12 team premier league, 24 team division-1, etc, etc. Which just brings us back to my proposal of having leagues segregated by market size, but having ALL of them play for the World Series. By having each league have its own championship, the fans will question the legitimacy of any except the World Series.
Let's Contract Two Different Teams
July 24, 2002 - tangotiger
(www)
(e-mail)
As for pay for performance, why not simply limit contracts to 1 year? And make everyone a free agent? That would make it truly free market. You'd end up paying rotisserie style prices (about 15% to your top player), because of the abundance of supply. So, a team with a 60 million$ payroll will pay say Mike Piazza 9 million$. A-Rod would have a tough time getting more than 15 million$.
So, we have a mechanism that can severely limit top players' earning potential. All owners have to do is declare everyone a free agent, and no more guaranteed contracts. Too hard. It's like going to the Playboy mansion and being told you have a chance with 1 girl, and 1 night only. The owners want control, and they want to feel empowered. It'll cost you.
Let's Contract Two Different Teams
July 31, 2002 - tangotiger
(www)
(e-mail)
It would turn exactly into a rotisserie style system. The top guy would get at most 20% of the payroll. In any case, it doesn't matter how much the #1 guy gets. It's the overall payroll that matters. Players will be willing to sign for below market in some cases, simply because they don't want to be left out.
Teams will have their budgets before the bidding starts, and they won't try to run up prices, because of all the other fish in the sea.
Owners need help controlling themselves, and this is the best way. And if they overpay? So what, it'll only be for 1 year. You won't have all those 5 year contracts guaranteed to worry about.
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
I still have not decided how to "rank". With only 32 players, using differentials or RMSE might not be the most appropriate (esp with the Bonds thing). I could create "classes of differentials" (consider each class to be 1 SD of error, and max out at 3 SDs or something like that). Or I might use differentials, while capping the individual differential at 3 SDs. Really, it's not important. I'm going to present the full data, and the reader is free to analyze and interpret the data as well.
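The two scoring options mentioned here, plain RMSE and differentials capped at 3 SDs, can be compared with a short sketch. Everything in it is illustrative: the function names, the sample forecasts and actuals, and the SD of 0.050 are made up, and how the SD of error would actually be estimated is left open in the post.

```python
import math


def rmse(forecasts, actuals):
    """Root mean squared error of the forecasts."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def capped_mean_abs_diff(forecasts, actuals, sd, cap_sds=3.0):
    """Mean absolute differential, with each differential capped at cap_sds * sd."""
    cap = cap_sds * sd
    diffs = [min(abs(f - a), cap) for f, a in zip(forecasts, actuals)]
    return sum(diffs) / len(diffs)


# Hypothetical OPS forecasts; the last player is a Bonds-style outlier.
forecasts = [0.800, 0.750, 0.900]
actuals = [0.820, 0.700, 1.300]

print(rmse(forecasts, actuals))
print(capped_mean_abs_diff(forecasts, actuals, sd=0.050))
```

With only a few dozen players, one very large miss dominates the RMSE, while the capped version limits how much any single player can move a forecaster's score.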
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
...retrospectively, using the same methodology, at last year or other prior years?
I guess surprises are out of the question around here! Voros has looked at the various forecasters for the year 2000 hitters . I was going to also add in the "baseline forecast" to his list to see how that stacks up. Stay tuned in a couple of weeks.
When do fantasy drafts usually occur? The last weekend in March?
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
I probably should have said this in the article.
If I were to ask the Primer readers to estimate 200 players' ERA or OPS, I'd get a smattering of responses. By limiting it to something reasonable (32), I hope to get decent participation, while at the same time getting reasonable (though not conclusive) results from the forecasters. This is similar to what the WSJ does with using the top 10 picks from the brokerages. The intent is not to prove anything. I also selected those 32 players who showed the most deviations, and therefore, we'd expect the forecasters and the Primer readers to have little agreement on these.
I have also asked the forecasters to participate in a second parallel study, where they would submit the projections for a large number of players. I've only received a positive response from 2 of them. This is essentially what Voros did with his study, except he did the hard work by compiling everything himself. I can understand that the forecasters don't want to give everything away (which is why it was easy to ask them for only 32). I hope though that by the end of the season, they'll give me their list, so that I can save some work. So, you'll get the study that you are looking for, plus the other readers will have some fun (I hope) as well.
I hope this answers your concern.
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
Yes, included with the ballot for the 32 players will be your estimate of MLB OPS and ERA (which will default to the 2002 level if you don't choose anything).
This is critical because if a forecaster underestimates all his projections, it doesn't matter, as long as you only use his system. Therefore, that's not a bad thing.
Really, I wanted to ask everyone to submit their OPS/lgOPS, but that loses too much meaning.
Great question!
Forecasting 2003
February 13, 2003 - tangotiger
(www)
(e-mail)
By the way, if anyone has a systematic forecasting system, then send me an email. It could really be based on anything, like
- weighted or unweighted recent performance
- lefty/righty splits
- gb/fb tendencies
- comparable players
- age
- height/weight
- position
- regression towards mean
- injury history
You don't have to tell me how your engine processes everything, but just what/how does the engine consider. I can then throw you into the systematic forecaster pool. Thanks...
Forecasting 2003
February 17, 2003 - tangotiger
(www)
(e-mail)
Just to reiterate (or maybe iterate, since I was not very clear), the point is not to figure out who has the best forecasting system, but rather if a systematic forecasting system is any better than a baseline or back of the envelope (card) system.
What the WSJ study shows is not that the Lehman brothers have a better forecasting system (hard to say with only 10 stocks) but rather that the mom&pop do better using a baseline (the S&P500 index) than in paying off the professionals.
To determine which professionals are better, you need far more than just 10 sample points, and the WSJ also does this by looking at all stock picks. This would be part of a second parallel study if I get a decent participation from the forecasters as well, similar to what Voros did in the 2000 link I provided. However, given that I've chosen 30 players who have very inconsistent performances, I think it might show something about the forecasters, but will be far from conclusive. (If I had chosen the 30 most consistent players, my guess is that all systematic forecasters would come up with very very similar estimates. I've removed the Colorado and the inexperienced players from the study, and there again, some forecasting systems might be better with those players.)
Forecasting 2003
February 18, 2003 - tangotiger
(www)
(e-mail)
Erstad: good call!
The next set of players that missed making the cut were, in order: Renteria, Erstad, Beltran, Sosa, Javy Lopez, Mark Loretta, Vina, Giles, Magglio Ordonez, Garret Anderson, Ben Molina.
Forecasting 2003
February 19, 2003 - tangotiger
(www)
(e-mail)
Minks: no, you would only supply the unadjusted OPS. The only reason to supply lgOPS and lgERA would be to establish your basis. Suppose that you miss all your OPS projections by 50 points, but that you also projected the lgOPS to be off by 50 too. Then, this scores 100% (in my book). A person using the results of such a projection will be perfectly happy (as long as he uses only this projection).
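A minimal Python sketch of this scoring idea, working in OPS points. The function name, the sample numbers, and the choice to subtract the league-level miss (rather than take a ratio) are assumptions for illustration, not the actual scoring used in the study.

```python
def league_relative_errors(proj_ops, actual_ops, proj_lg_ops, actual_lg_ops):
    """Each player's projection error after removing the forecaster's
    miss on the league OPS. All values are in OPS points (e.g. 750 = .750)."""
    lg_miss = proj_lg_ops - actual_lg_ops
    return [(p - a) - lg_miss for p, a in zip(proj_ops, actual_ops)]

# Hypothetical forecaster: every player projected 50 points too low,
# and the league OPS projected 50 points too low as well.
projected = [750, 820, 700]
actual = [800, 870, 750]
errors = league_relative_errors(projected, actual, proj_lg_ops=700, actual_lg_ops=750)
print(errors)  # [0, 0, 0]
```

Under this assumed scoring, a forecaster who is consistently off by the same amount on players and on the league shows no error, which is the "scores 100%" case described above.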
David: the back-of-the-card forecasters are just like mom&pop investors. They each have access to their own private information and public information and intuition, and combine all their data into some sort of target price for a stock. The collection of all these investors makes up the market. You can benefit from this "wisdom" by buying the S&P500 index (SPY). The systematic forecasters follow a rigid, repeatable process, like the various brokerage houses, like Lehman and Smith Barney. The baseline is the monkey throwing darts at a stock chart. So, whether I am comparing apples or oranges I don't really care (for this study). I'm trying to put this study on the same plane that the WSJ puts its study in.
A second parallel study, looking at extended picks that the systematic forecasters provide (which the WSJ also does when selecting their best analysts), might satisfy the fruit requirements.
Forecasting 2003
February 21, 2003 - tangotiger
(www)
(e-mail)
Vinay, excellent. I did not know about this. We are essentially after the same goal, but where they have 27 humans projecting 125 players, I'm hoping to get the reverse (100+ humans projecting 32 players).
What is very interesting to me, and which matches the stock market with its S&P 500 index, is that the collective wisdom of the market matches the top forecaster, with all of his intricacies.
The "missing big or getting big" projections of Wilton are, I think, probably attributable to a lack of regression towards the mean in that system. I'd have to look at the data more carefully though. Because we are dealing with sample performances, you should expect a few guys to have seasons that are out of the norm, and therefore a system like STATS or Palmer will miss the outliers but gain on the large population. Silver's PECOTA should give the readers the best of both worlds.
Forecasting 2003
February 24, 2003 - tangotiger
(www)
(e-mail)
We are trying to forecast a player's performance for the upcoming year. This performance is a combination of a player's expected true talent level, context in which that talent will manifest itself, and luck.
ERA has more luck (from the pitcher's perspective) than other measures. The point of this forecast is to try to predict a player's performance numbers, with the reader trying to do as little as possible.
The 2003 Projections
May 6, 2003 - tangotiger
(www)
(e-mail)
I didn't think about that sort of thing when making my projections; they were more seat-of-the-pants than that, and I assume that they were for most people.
I hope this is the case, as this is what I was hoping for. Can you get 100+ baseball fans to make seat-of-the-pants calls on extreme players, average them, and come up with something decent? We'll see in a few months...
Crucial Situations
December 3, 2002 - tangotiger
(www)
(e-mail)
Really? Hmmm, are you using an old version of Netscape? What's your browser version?
Crucial Situations
December 4, 2002 - tangotiger
(www)
(e-mail)
...but the shading doesn't print for me. Just pages of empty grids.
Hmmm... maybe I should put text and color? I'll see what I can do about that.
In some innings, does the blue shading bleed into the -4/+4 run columns (or even further)?
Good question! I was thinking about that, but since I used the same program as for Bonds, I limited to -3/+3. Maybe next time I'll expand to something larger.
...although the column headings would be better if repeated every half-inning, not just every inning.
Thanks for the suggestion! My artistic skills are not what even an average person has, so any formatting improvements suggestions are appreciated. I'll do this next time as well.
But is there anything here that isn't intuitive?
I was a bit surprised by how much the leverage changes as soon as you get one guy on base, especially in the late innings.
...but can there really be an argument for pinch hitting for guys (other than your pitcher) in the 3rd or 4th inning? I mean, you'd run through your bench, pretty fast.
I mentioned at the end that that was not what I was suggesting. Though I would consider this if my batter was Ordonez, and Piazza had the day off.
...like what to do when you're in a particular colored box, so the practical value can be perceived. In contrast, your earlier, similar piece on when to walk Bonds seemed eminently practical, ...
There's really no end to this WE stuff. Eventually, I will be producing charts for the SB break-even points, when to bring in your reliever, should you go for the DP or try the runner at home, should you test the RF's arm, etc, etc. Any suggestions you can offer would be appreciated as well.
What are your definitions of 'Very high-leverage', 'High Leverage', etc.?
It gets a bit dry (series of math equations), but I just picked some arbitrary thresholds to try to distinguish easily the various situations. I could have put +.054 wins and +.013 wins, etc, but who the heck knows what that means?
Crucial Situations
December 4, 2002 - tangotiger
(www)
(e-mail)
I'll do my best.
This is what you do:
1 - Determine the WE for every inning/game/base/out for an average team. I've provided a subset of that in the initial link.
2 - Assume that your "great pitcher" or "great hitter" or whatever is going to come in for 1 PA. What is the expected WE following this player's PA? (I used a player whose component stats translate to a .750 win%)
3 - Take the difference between the two. That is the impact in wins of a "typical super-great" player for 1 PA.
The biggest swing, in this example, is about +.07 wins, and that occurs in the bottom of the 9th, home team up by 1, and you have men on 2b and 3b and 1 out. That is, if you bring in say Pedro or RJ or Mo or Bonds or Thome or Giambi for ONE SINGLE PA, he will have an effect of .07 wins (assuming these guys are .750 players) over an average player.
How much is +.07 wins? Well, the typical star is +6 wins in 600 PA (+.01 wins per PA). If you bring in Giambi IN THIS PARTICULAR SITUATION 100 times, he'll have as much impact as playing full time.
Now, now, you won't have this situation 100 times, and not having Giambi regularly in the lineup might even mean you might have this situation zero times, who knows. But this is the magnitude of the impact.
So, while Theo Epstein and Bill James are saying that tied games in late innings are very important (AND THEY ARE!), my research shows that having a great pitcher pitching has even more of an impact when the fielding team is up by 1.
Anyway, the thresholds I used are .01 / .02 / .04. Just made them up to try to get a balance to the chart. Well, I used the .01 because that's what a great player is worth randomly. And .04 cause that would make it 4 PAs in a game. So, given the choice to hit Piazza 4 times randomly, or once in the "very high-leverage" situation, it's a wash. Of course, if that situation doesn't come up, well, you lose on the deal.
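The cutoffs and the "4 PAs" equivalence can be sketched in a few lines of Python. The function names, the "medium" and "low" labels, and the use of >= at each cutoff are assumptions made for this sketch; only the .01 / .02 / .04 values and the +6 wins in 600 PA figure come from the comments above.

```python
STAR_WINS_PER_PA = 6 / 600  # a typical star: +6 wins in 600 PA, i.e. +.01 per PA

# Cutoffs from the comment; the "medium" and "low" labels are assumed.
CUTOFFS = [(0.04, "very high"), (0.02, "high"), (0.01, "medium")]

def leverage_class(swing_wins):
    """Bucket the win swing of one PA by a .750 player over an average player."""
    for cutoff, label in CUTOFFS:
        if swing_wins >= cutoff:
            return label
    return "low"

def equivalent_random_pa(swing_wins):
    """How many randomly timed PAs by the same star one such PA is worth."""
    return swing_wins / STAR_WINS_PER_PA

print(leverage_class(0.07), round(equivalent_random_pa(0.07)))  # very high 7
```

On these numbers, one +.07 PA is worth about seven ordinary PAs, so 100 of them come to roughly 700 PA, which is the "as much impact as playing full time" comparison.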
Is that enough detail? Too much?
Crucial Situations
December 4, 2002 - tangotiger
(www)
(e-mail)
Chris, you got it!
If you followed my "Runs Created" series, it shows that the "run environment" (really WIN environment) already exists when the batter/pitcher matchup comes up. That is, the runners on base already have a built-in chance of scoring, given the environment they are playing under.
So, if you then introduce a great player into the mix at that point in time, the entire environment changes. Now, the chances of winning change (sometimes drastically). With 1 out, more damage can be done (not only with the runner on base, but with the batter getting on base). You bring Bonds as a PH with 1 out, not only is the guy on base likely to score, but Bonds will now put himself in a position to extend the inning.
Crucial Situations
December 4, 2002 - tangotiger
(www)
(e-mail)
Oliver, I really don't know. You'd really have to compare how teams should make their choices optimally against how they really make their choices. And you'd have to break it down by the kinds of choices as well (steals, sacs, taking extra base, throwing to wrong base, bringing in the wrong reliever, batting order, etc, etc, etc). It's gotta be a few wins at least. I don't know, 5? 6?
In the business world, I would perform a cost/benefit analysis. But, the reason I'm doing all this baseball stuff is so that I can get away from doing these boring dry cost/benefit reports! Please don't make baseball like a job for me!!
Crucial Situations
December 5, 2002 - tangotiger
(www)
(e-mail)
Here's a printer-friendlier version
http://groups.yahoo.com/group/tangotiger/files/crucialpa.pdf
Still no text, though.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
Vinay, the starter is almost exactly 1.00. I looked at two starters, one who was good and went long (Blyleven), and one who went short and was not so good (Knepper). Bert was .99 and Bob was .98.
As for historical, I have the LI for the 20 pitchers with the most relief games from 1974-1990:
pitcherid   Leverage Index
suttb001    1.90
smitl001    1.76
fingr001    1.75
gossr001    1.72
rearj001    1.69
smitd001    1.52
orosj001    1.48
laveg001    1.46
quisd001    1.45
mintg001    1.45
tekuk001    1.41
garbg001    1.38
lyles101    1.35
campb001    1.31
stanb001    1.30
martt001    1.24
leffc001    1.20
hernw001    1.18
baird001    1.10
andel001    1.03
This is the LI only while as a reliever.
Sorry, but my data is limited to the pbp provided by Retrosheet.
I agree that doing the LI by year, and then doing the multiplication by year would add the "timeliness" aspect as well. I might do it for one of these guys, maybe Quiz.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
Thanks... good question. I haven't calculated it yet, but I have another tool (which for lack of a better name I call the Tango Distribution), which shows the expected runs per game distribution, given the runs per game of a team. Using this, I can figure out the win% for any two teams, broken down by run differential. Surprisingly (to me), there is little difference between a .400, .500, and .600 team in terms of "number of close games". I would suspect that I could extend this to "number of crucial situations" as well, and therefore, expect that most teams face a similar number of crucial situations. The more you get away from being a .500 team, the fewer the number of crucial situations. I'm not sure what the relationship is between team win% and crucial situations (yet). I'll keep this in mind next time I'm working on this. Great question!
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
Craig: Mark Eichhorn, in 1986, was 1.32. If you look at the list of 20 players I listed in my followups, you will note that Bob Stanley is 1.30. I think this is probably what you'd find with your multi-inning non-closing firemen. Eichhorn did have 10 saves, and finished half his games, so you might be careful about how you extrapolate his usage to other relievers. In any case, his 157 relief innings is equivalent to 207 typical innings. The guys below 1.00 in LI are the true mop-up guys.
Devin: Your point is valid regarding what it would take. Since the HOF is a self-defining institution, I don't see how I can answer that question with any basis. Writers, like fans, are flying by the seat of their pants in trying to establish the potential impact a reliever has.
Pete Palmer and Bill James try to answer this question by using a combination of SV, GF, and G to come up with a reasonable estimate. I'm offering the same type of solution from a different angle. Therefore, I think it is irrelevant how we think they impact, and how they've changed the way the game is played and managed. The fact is that the impact of the best relievers, while real, is not substantial enough to catapult them to the levels of the superstars. And the best of the lot is good enough to put them in line with star pitchers who lack longevity. This is why relievers are paid the way they are. GMs may have figured out their true value already.
However, your point is just as valid, and that the HOF may not simply be about "overall value". And perhaps relievers do deserve a special spot. I don't know, and I think that the writers also don't know.
As for my sims, I'll run a couple more, like for Quiz and Reardon and Bob Stanley, using their LI. I'll let you know what shows up.
Charles: thank you! I had a lot of fun doing this piece! I just wish I could devote more time to this.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
No, you didn't miss it. I explained it in another article.
If you go to the first line of the article, and click on the link, you get a general explanation of how I determined the leverage of the situations. If you then go into the comments section, in one of the December 4 comments, I elaborate on how I derived the leverage values. Hope that's good enough?
I apologize for making each of these win expectancy articles links to links to links. They're all related, and it's very tough (for me) to write it adequately, without making it a mathfest.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
If ever I get pbp prior to 1999, and I get the World Series pbp, I'd *love* to look at Mariano Rivera. He may turn out to be a borderline candidate like Goose or Lee Smith, based on regular season numbers. But when you add in his tremendous playoff performance, that may be enough to get him over.
In fact, I am surprised how little play playoff heroics get. In the NHL, they have the same problem. The NHL and NBA are *all* about the playoffs. But the awards and HOF, etc, are mostly about the regular season. Rarely do you see the two combined. I believe in soccer, they combine all games, regardless of "league". Pele has 1,241 (or whatever) goals, with no split.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
Thanks Colin.
Yes, your point is very valid, and Vinay also brought up the issue in his comment. Essentially, the next breakdown is to look at each PA within the context of the leverage of the situation (which is what the Mills Brothers and Doug Drinen did). So, while Bruce Sutter may be 1.90 overall, he'd have say 30% at a 4.0 leverage, and 40% at a 1.50 leverage, and 30% at a 0.3 leverage. And maybe during the 4.0 leverage situations, that's when he was his best (whether because his manager used him during his peak years, or he rose to the occasion, or by luck), and therefore, he was even more valuable.
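As a quick check on that hypothetical split, a short Python sketch can confirm that the three buckets average out to about the overall figure. The names here are illustrative; the shares and leverage values are the made-up ones from the paragraph above, not measured data.

```python
# (share of PAs, leverage of those PAs), from the hypothetical split above
sutter_mix = [(0.30, 4.0), (0.40, 1.5), (0.30, 0.3)]

def overall_li(mix):
    """PA-weighted average leverage across the buckets."""
    total_share = sum(share for share, _ in mix)
    return sum(share * li for share, li in mix) / total_share

print(round(overall_li(sutter_mix), 2))  # 1.89
```

The same overall number can hide very different mixes, which is why the per-PA breakdown matters if a pitcher performed differently in the highest-leverage bucket.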
It's possible that there is some impact here, especially with the relief wild card.
It requires some upfront work on my part to get the whole thing set up. I'll see if I can devote some time to this. Perhaps after XMAS.
Bruce, Lee, and the Goose
December 17, 2002 - tangotiger
(www)
(e-mail)
I think this is the exercise that Vinay did, but in response to Joe's point, let's go through it step-by-step.
Let's assume that Gossage has an ERA+ of 126. Let's assume that he had 251 IP as a starter, and 1558 as a reliever. Let's assume that as a starter, his ERA+ was 100, and as a reliever it was 130. Fair enough?
Now, his LI as a starter was 0.99. His LI as a reliever was 1.72. As we see in the above paragraph, he pitched his best as a reliever. So, if we take his 1558 innings and multiply by 1.72, that gives us his "adjusted typical" innings. Do the same for the 251 x .99. Good so far?
Now, weight the 130 ERA+ by 1558x1.72 and the 100 ERA+ by 251x.99. You end up getting an ERA+ of 127. That's compared to the initial value of 126.
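The weighting can be reproduced with a short Python sketch. The function name is made up for illustration, and it follows the post in taking a straight weighted average of ERA+ (an assumption of convenience, since ERA+ is a ratio stat); the inputs are the assumed figures above.

```python
def weighted_era_plus(roles):
    """roles: list of (era_plus, innings, leverage_index) per role.
    Returns ERA+ averaged with innings * leverage as the weight."""
    total_weight = sum(ip * li for _, ip, li in roles)
    return sum(era * ip * li for era, ip, li in roles) / total_weight

# (ERA+, IP, LI): reliever line first, then starter line
gossage = [(130, 1558, 1.72), (100, 251, 0.99)]

leveraged = weighted_era_plus(gossage)
unleveraged = weighted_era_plus([(era, ip, 1.0) for era, ip, _ in gossage])
print(round(leveraged), round(unleveraged))  # 127 126
```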
The point is that because very few of Goose's innings came as a starter, the change won't affect much. If this was Eckersley, then that's a different story.
That said, while the impact is small in this case, we should still do the breakdown as I mentioned in the previous post, so that we are leveraging each particular PA, and not applying an overall leverage on the sums of the PA.
Bruce, Lee, and the Goose
December 18, 2002 - tangotiger
(www)
(e-mail)
Good point.
There are two things to consider with "leverage". You can take the position of what was the leverage of the situation, assuming that the pitcher will pitch to the end of the inning. So, if it's top of 9th, 1 out, man on 1B, up by 1, the leverage is not that particular situation, but rather that particular situation as the starting point, until the end of the inning. It could be that that particular PA may have a leverage of "4", but the "starting from that PA to end of inning" may have a leverage of "2".
Furthermore, you can also take the point of view that if a reliever gets himself into a jam that the manager is "bringing him in" to get himself out. That is, after every PA, the manager is deciding whether to bring in his existing pitcher, or bring in a new pitcher.
Remember, my point of view is crucial PAs. So, PA by PA, what is the leverage. I don't know if it's the pitcher or the fielders that caused the change in leverage. And really, I don't care. What I care about is how often did he face a high-leverage situation.
It is important that you don't make a stat do what it wasn't designed to do.
If I were designing a model to decide when is the optimal point in the game to bring in a reliever, such that he will pitch to end of inning, I would have different leverage numbers. And if I design a model, such that my pitcher will pitch to end of inning, plus one more full inning, I'd have again, different leverage numbers.
All these methods are good, within the context of their design assumptions.
Bruce, Lee, and the Goose
December 19, 2002 - tangotiger
(www)
(e-mail)
To add to the point about "how often do relievers cause high-leverage PAs for themselves": Bruce Sutter comes into a high-leverage situation, and he keeps it high-leverage. Fat Rojas comes into a high-leverage situation, and turns it into a low-leverage situation (by giving up 3 run HRs).
So, there are various reasons as to turning a type of leverage situation into another type of leverage situation. It's not just a "if he's bad, then..." kind of deal. It's a lot more intricate than that.
Bruce, Lee, and the Goose
December 19, 2002 - tangotiger
(www)
(e-mail)
Gossage gave up .5 more walks than Gooden, but .7 fewer non-HR hits. Gooden's run environment was slightly higher than Gossage's. The actual runs allowed by relievers are a little suspect because of the "accountability" issue. Not that it should be ignored, but you do have to account for it.
On a "rate" basis, of pitchers born since 1950, I have Guidry, Cone, Rijo, Sutter, Blyleven, Gooden, Gossage all being "equivalent". In terms of IP or leveraged-IP, clearly Blyleven is the one that stands out here. By the way, John Smoltz is also in this group.
Gossage is borderline, in my view.
Bruce, Lee, and the Goose
December 19, 2002 - tangotiger
(www)
(e-mail)
Paul, welcome! I don't think I've seen you around here? I'd love for Retrosheet to get more PBP, and I'd love to run the 73 Hiller, and the Franco career through their paces.
Walt, my inclination is to say that they are in the pen because they are not Roger Clemens or Greg Maddux. However, they are David Cone or Ron Guidry, and those guys were pretty darned good. I don't have a separate standard for catchers or anyone else. I look at it as how many wins did they contribute over some baseline. If you are a catcher, and you only play 120 games, and you are done by 34, then I don't have different standards. Not to say I'm right or anything.
You make a valid point that relievers can be considered similar to catchers (can't play long enough in a season or a career). So, you have to first resolve why they have shorter careers (because of the position, or the quality of the players there). Then you have to resolve if you want to have a different standard.
It's a tough call no matter what your perspective is.
Bruce, Lee, and the Goose
December 20, 2002 - tangotiger
(www)
(e-mail)
If I remember my post, Eichhorn had 157 real innings, and 203 leveraged innings. It may be that that's as much mileage as you can get out of a reliever. That, because of all his warm up tosses, etc, etc, you won't get more than that. Then, he's got to do that for 15 years.
However, I don't know if this is a physical limitation, as it is for catchers (who play 130 games instead of 150, and who play 13 years instead of 18, e.g.). If the relievers are physically limited to 200 leveraged innings instead of 250 for starters, and 12 years instead of 16 for starters (just examples), then it may be fair to consider the relievers to have lower standards, like catchers.
However, this should be studied to the extent that catchers' careers have been, before we pronounce sentence.
Even after all this though, people can still choose to not lower the standards for the C/RP.
Bruce, Lee, and the Goose
December 26, 2002 - tangotiger
(www)
(e-mail)
If there's anyone still out there, Eric Gagne's LI last year was 1.83, and Smoltzie was 1.79.
Are Managers Optimizing Their Best Relievers?
December 31, 2002 - tangotiger
(www)
(e-mail)
But first, I'd like to suggest that the most optimal use of the best relievers would generally be as a starter.
Agreed.
Why don't we use the same thinking for relievers? Why is the 9th inning any more important than the 1st?
If you bring in Mariano Rivera with a 6-run lead 50 times, you won't change the outcome of the game compared to bringing in an average pitcher.
If you bring in Mo 50 times with a 1-run lead, the Yanks will win a few more games than if you brought in an average pitcher.
If there's a one-run game, aren't each of the starters' six innings just as vital as the closer's ninth?
I'm not taking anything away from the starters. Their LI is about 1.00.
In fact, I would even argue that the first 7 innings of pitching are MORE important than the 9th because the score after the 7th (and often the 6th) influences the choice of relievers the opposing manager will use.
7 innings of LI of 1.00 is 7 leveraged innings. 2 innings of LI of 2.00 is 4 leveraged innings. Yes, the first 7 are more important, or at least, they have more impact to the final outcome of the game.
Are Managers Optimizing Their Best Relievers?
December 31, 2002 - tangotiger
(www)
(e-mail)
Is it reasonable to conclude that Rivera's LI could be somewhat deflated due to the fact that the Yankees have been consistent winners over the last few years...
I believe I mentioned that as a possibility that the Yanks pay (earn) this price.
I don't recall any mention in the previous article of a correlation between overall team LI and team W-L records (though I would expect really bad teams to have the lowest LI's for their relievers).
On my to do list. I should be able to come up with the LI, on a team-by-team, year-by-year basis, from 1974-1990. I expect the LI to peak with teams at .500, and slowly degrade the more the team's win% is from .500 (on either side).
It would also be fun to see the converse of this study, i.e. what is the xFIP for pitchers with the highest LI?
Also on my to-do list. I just ran a preliminary report for 1974-1990, and Todd Worrell actually tops the list at 1.97. Bruce Sutter is second at 1.90. The top of the list is all the usual suspects. The first name that I didn't recognize was Victor Cruz at 1.58. Next was Steve Foucault at 1.50.
Among "middle-relievers", Tim Burke was 1.54. He's a favorite of mine, and it certainly looks like he was used prominently. Paul asked earlier, and John Hiller was 1.62. Mike Marshall was 1.51.
Among pitchers with at least 2000 PA, Dave Tomlin was 0.73, and worst of the bunch.
Are Managers Optimizing Their Best Relievers?
December 31, 2002 - tangotiger
(www)
(e-mail)
Thanks, I'm enjoying this as well!
The problem with the "out" is that sometimes an out increases your WE (win expectancy), say a flyball with a man on 3b, of a tie game in the 9th inning. Strictly speaking, you have to look at the change in WE for every possible event, and then come up with the variance (and the frequency of those events). In essence, how much swing potential in winning does a particular game state provide? That's the question to answer.
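One way to picture "swing potential" is as the spread of the WE changes across the possible events in a game state, weighted by how often each event happens. The Python sketch below uses the probability-weighted standard deviation for that spread. The function name, the event list, and every number in it are hypothetical, and using the standard deviation is an assumption of this sketch rather than the exact formula behind the published leverage values.

```python
from math import sqrt

def swing_potential(events):
    """events: list of (probability, change_in_WE) for one game state.
    Returns the probability-weighted standard deviation of the WE change."""
    total_p = sum(p for p, _ in events)
    mean = sum(p * d for p, d in events) / total_p
    variance = sum(p * (d - mean) ** 2 for p, d in events) / total_p
    return sqrt(variance)

# Hypothetical states, from the batting team's view: out, single, extra-base hit.
tight_spot = [(0.68, -0.10), (0.24, +0.12), (0.08, +0.30)]
blowout = [(0.68, -0.005), (0.24, +0.006), (0.08, +0.015)]

print(swing_potential(tight_spot) > swing_potential(blowout))  # True
```

Because each event carries its own WE change, an out that helps the batting team (such as a sacrifice fly) is handled by giving that event a positive change instead of treating all outs alike.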
I'd love a faster computer, as I'm running this on a 650 MHz (but 512 RAM). Sometimes, I have to run stuff overnight.
Are Managers Optimizing Their Best Relievers?
January 1, 2003 - tangotiger
(www)
(e-mail)
I will give a performance breakdown for Shuey and Stanton, among crucial, normal, and non-crucial situations. Look for this in a few days. We'll see if they can "handle" the pressure...
Are Managers Optimizing Their Best Relievers?
January 2, 2003 - tangotiger
(www)
(e-mail)
What we are after is *not* to maximize a pitcher's LI, but rather to maximize their leveraged-innings (LI x IP). For a reliever, LI of 1.00 with 120 IP will have the same win impact as 1.50 LI with 80 IP. Of course, it's not that simple, as you have to take the totality of your starters and relievers, and maximize the leveraged innings for the good pitchers, and minimize the leveraged innings for the bad pitchers, such that all innings are accounted for. You have other constraints as well, with respect to the tiredness of a pitcher's arm, etc.
Mark Eichhorn, for example, had 200 leveraged innings (LI of about 1.3) in his great year. That is an excellent total.
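The leveraged-innings arithmetic is simple enough to sketch in Python. The function name is made up for illustration; the 157 innings and 1.32 LI for the 1986 season are the figures quoted in the earlier relievers thread.

```python
def leveraged_innings(ip, li):
    """Leveraged innings = innings pitched times Leverage Index."""
    return ip * li

# Two usage patterns with the same expected win impact
same_impact = leveraged_innings(120, 1.00) == leveraged_innings(80, 1.50)
print(same_impact)  # True

# 157 relief innings at an LI of 1.32
print(round(leveraged_innings(157, 1.32)))  # 207
```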
Are Managers Optimizing Their Best Relievers?
January 2, 2003 - tangotiger
(www)
(e-mail)
All things equal, you are better off having your pitcher as a starter.
Your considerations would be to take someone like Urbina and Wetteland, and determine their level of effectiveness as a starter or reliever.
Say that as a starter, their performance would be a win% of .600. And as a reliever, they would be .650. You know that you can get say 160 leveraged-innings as a reliever, or 220 leveraged-innings as a starter. What do you do?
Compared to a baseline level of .450 (the effective level of rejigging your whole pitching lineup), you get 160/9 * (.650-.450)= +3.6 wins as a reliever or 220/9 * (.600-.450) = +3.7 wins. Essentially, a wash.
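The same comparison as a small Python sketch. The function name is hypothetical; the .450 baseline, the win percentages, and the leveraged-inning totals are the example values from the comment.

```python
def wins_above_baseline(leveraged_ip, win_pct, baseline=0.450):
    """Wins added over a baseline pitcher: leveraged innings converted to
    nine-inning games, times the gap in win percentage."""
    return leveraged_ip / 9 * (win_pct - baseline)

as_reliever = wins_above_baseline(160, 0.650)
as_starter = wins_above_baseline(220, 0.600)
print(f"{as_reliever:.1f} {as_starter:.1f}")  # 3.6 3.7
```

Changing either the assumed effectiveness gap between roles or the leveraged-inning totals by a small amount flips the answer, which is why the choice comes out as a wash here.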
So, you really have to go into it deeply, determine the effectiveness level of all your pitchers based on the starter/reliever role, determine how you can best optimize your leverageable innings, and come up with your plan. It's not so easy, especially considering injuries throw a wrench in your whole plan. Unless you are the Yankees.
Are Managers Optimizing Their Best Relievers?
January 2, 2003 - tangotiger
(www)
(e-mail)
The leverage classes were broken up into high-leverage (LI of 2 or greater), low-leverage (LI of 0.5 or less), and the rest.
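A minimal Python sketch of that three-way split. The function name, the "normal" label for the middle class, and the sample LI values are assumptions for illustration.

```python
from collections import Counter

def leverage_bucket(li):
    """High-leverage at LI of 2 or greater, low-leverage at 0.5 or less."""
    if li >= 2.0:
        return "high"
    if li <= 0.5:
        return "low"
    return "normal"

sample_li = [3.1, 2.0, 1.2, 0.9, 0.5, 0.2]  # hypothetical per-PA LI values
counts = Counter(leverage_bucket(li) for li in sample_li)
print(counts["high"], counts["normal"], counts["low"])  # 2 2 2
```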
$H is non-HR hits per ball in play. All the others should be self-explanatory.
Paul Shuey? He was at his best in high-leverage situations. Mike Stanton? He was by far his best in high-leverage situations. Note the small sample of PAs. Note also that it's easier to get more WP in high-leverage situations, since high-leverage situations occur more often with men on base. In any case, Shuey's WP rate wasn't so high, relative to his other situations.
I think there are some interesting DIPS numbers in there as well. With the leverage situations different, each pitcher gave up fewer hits / ball in play, and fewer Ks as well. Almost as if the pitcher had to bear down in the high-leverage situation, and therefore, has a different pitching approach, thereby lowering his K rate, and improving his $H rate. We may in fact find that pitchers DO control the hits/ball in play A LOT. And it may simply be the fact that once you reach the majors, the pitchers are similar in this regard overall.
Are Managers Optimizing Their Best Relievers?
January 2, 2003 - tangotiger
(www)
(e-mail)
FJM: yes that is correct. The second guy was on a hotter seat, and that's what LI is reflecting. As I mentioned on another thread, LI is not about rewarding a player, but classifying each PA.
Note that a manager is choosing to bring back the same reliever. If he had chosen to replace the reliever after 2 hits with another reliever, we'd have no problem saying that the replacement was on a hot seat.
It doesn't matter who sets the fire. We are capturing the existence of the fire, and we are capturing that the manager is letting someone pitch in that fire.
Doug Drinen's reliever reports work based on when the reliever enters and exits the inning. This metric works great in other areas, for other purposes. Eventually, I'll probably create an LI for this as well.
Are Managers Optimizing Their Best Relievers?
January 3, 2003 - tangotiger
(www)
(e-mail)
Well, I provided the LI for 10 top relievers of 99-02, as well as the historical LI for all pitchers in the 74-90 time period (see Clutch hits).
As for biased, again, there is no bias. It's a reflection of the game state for each PA. I know what you are saying about say John Franco or Mel Rojas being arsonists.
But it's not like there is a giant in a land of pygmies, even Mariano, that we should be concerned about. In the 74-90 time period, Clemens and Gooden are probably the giants. Their LI are 0.96 and 1.03. Hershiser was 1.03. Ryan was 1.05 and Blyleven was 0.98.
Are Managers Optimizing Their Best Relievers?
January 4, 2003 - tangotiger
(www)
(e-mail)
FJM: Again, I don't know how much effect it has, but I suspect a little. I'll find out eventually.
But again, remember the purpose of leveraged PAs. It's about describing the level of fire during that PA, regardless of whether that fire was arson or not. The manager is bringing back Mel Rojas, the arsonist, for the next PA.
As mentioned in another article, I can also create leveraged appearances, whereby I only note the fire level when the reliever is first brought in. This I will also do eventually. (Drinen essentially did this already.)
It's important to realize that a stat is constructed to answer a specific question, and it should not necessarily be used to answer other questions. Nor is it a shortcoming of the stat if it can't answer this new question.
Are Managers Optimizing Their Best Relievers?
January 4, 2003 - tangotiger
(www)
(e-mail)
If you page up to my Jan 2 comment, you will see a link to Paul Shuey and Mike Stanton, and how they performed in the various leverage situations. Paul Shuey, and especially Stanton, have excelled in high-leverage situations, when given the chance. The sample size is small, so who knows.
I was surprised with Percy too. I thought he was better, but his K,BB,HR numbers don't compare with the best, though he would have come in the 11-20 list.
As for more analysis, I would love to do it. But my time is really constrained. I want to do an analysis on a team-by-team year-by-year basis for the last 4 years, and within that, show how each pitcher performed in the high-leverage and low-leverage situations. There is really so much I want to do, I don't know where to begin.
Right now, I'm taking a break from relievers and concentrating on baserunners.
Are Managers Optimizing Their Best Relievers?
January 6, 2003 - tangotiger
(www)
(e-mail)
David, thanks much! I'm actually bringing a lot of different concepts into all this, so it's rewarding to me as well.
As for 2002 PBP, astrosdaily.net has it, so I'm fine there. What I need is *time*. Can you help me there?
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
The A's have an additional point that by being able to work the count longer, a team can "choose" their opposing pitchers to the point where the average opposing pitcher is worse than by random chance.
They "choose" the pitcher by forcing their opponent to bring in the 10th reliever, because they wore out the starter. While this is certainly conceivable, you would need a whole team of such batters for this to work. As well, there's no guarantee that your team will benefit from it, since your opposition's next opponent might reap the benefits.
In the end, we are talking about a max .20 run difference/GP (see a previous Clutch hit for calculation), if the whole team is like this, and they are the ones who get the benefit. I fail to see how jumping the OBA to 3x from 1.8x would capture this. The "extra pitches" is not a function of OBA, but of (BB+K)/PA. By jumping the number from 1.8 to 3, you are capturing only part of this effect (BB/PA), in a whole bunch of other noise (H,HR,outs). This extra 1.2 is sort of trying to rise above the noise to find the BB/PA. If this is what the A's are trying to do, I don't think they're doing it in the best way. It's hard to comment further, without having the specifics (like James / Todd Walker comment as the best #2 hitter). From what we think they are trying to do, they are wrong.
OPS: Begone!
May 20, 2003 - tangotiger
(www)
The "additional point" thing is what I'm capturing. It doesn't matter if you do: 3*(OBA-.3)+(SLG-.35) OR 3*OBA+SLG-1.25
It's the same thing.
========== As for the "wearing out the starter", Ted is correct in his approach. If you have a team of players whose "true talent level" was .333/.400, this team would score about 4.5 runs per game. However, because these guys all work the count, they have a synergistic effect in tiring out the starter, and bringing in the 10th man. These guys, because they feed off each other in this manner, will end up with .343/.405 numbers (let's say). Now, all of a sudden, this team of talent of .333 with the synergy effect, acts just like a team of .343 with no synergy effect.
This extra effect the A's are capturing inside the OBA, by overweighting that metric. However, there's no reason to rely on such a noise-filled metric, when what you want is (BB+K)/PA or (pitches/PA). Because of the amount of noise, to try to capture the little extra pitches/PA in the OBA, you have to severely overvalue the OBA to find it.
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
The other reason for using "3" for OPS is if you are actively looking for those types of players. If you really really want guys with high OBA, then you would overweight OBA. You would do this because maybe you feel that it's a better predictor of future production. Or you feel that you need to get the players to toe the company line, or whatever. Guys like Vlad, Nomar, and Soriano would not be properly appreciated in such a system.
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
This is how SLOB*k and SLOB*PA*k (where k is some constant to make things add up nicely) for 6 equivalent players from that last chart look:
SLOB*k   SLOB*PA*k
81       76
81       78
79       79
77       79
74       78
70       76
SLOB by itself works ok, except at the real extreme. SLOB*PA works much better. SLOB*PA is essentially Runs Created, and we already know that BaseRuns is more logical/accurate than Runs Created.
The best one in this group remains static Linear Weights. The best one "on the market" right now is BaseRuns-generated custom Linear Weights.
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
Rob, you know what, you are right! I goofed.
While I was using outs as my baseline in the last chart, I should have used PA instead. Each player on the team should have the same number of PAs, not outs. Let me re-run the chart, and I'll publish the update on my site.
Good catch!
(Vinay, you are right about RC = SLOB*AB, and not PA as I mentioned in my last post.)
OPS: Begone!
May 20, 2003 - tangotiger
(www)
(e-mail)
For the last example, I should have been more careful.
What happens is that I should fix the team outs to something. In my example, I actually fixed it to each player making the same number of outs (440) which is wrong.
Anyway, what I did now (see link) was start with the team outs (3960), and, making sure each player had the same number of PAs, find the 8 typical guys and the 1 variable guy that would produce 3960 outs.
Things actually change. The Best-Fit becomes 1.64 (and not 1.75). I suspect that the best-fit will fall somewhere between 1.5 and 2.0, and for ease, probably use 1.5.
(Static) Linear Weights now looks less good than originally. I like this change, as it shows that the component values should change if the underlying environment also changes.
Custom Linear Weights wouldn't have this issue. Though at this point, I don't want to pronounce that custom LWTS will see all these guys as the same. It would definitely see all the teams as the same (just like BaseRuns). I think there will be some differences among these players though through custom LWTS. I'm not sure how much difference though.
Great catch again, Rob!
OPS: Begone!
May 21, 2003 - tangotiger
(www)
(e-mail)
See link for the values I used. For the categories I didn't use, I set them to "zero". It's not too important for what I am trying to do though.
As for the other question, you are asking if you can only know one thing, OBA or SLG, which one correlates to run scoring the best? I seem to remember Dan Werr doing a correlation study a month or 2 ago that showed the r to be pretty even between the two. That doesn't mean they are "equally important", especially if you have both.
As well, the coefficient itself (1.56, 1.64 or whatever) doesn't specify the level of importance. If you made lilSLG = 1/4S + 2/4D + 3/4T + HR, all divided by AB, what do you think would happen? The best-fit would be 1.64*OBA + 4*lilSLG. That doesn't make lilSLG twice as important as OBA, now, does it?
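To make the lilSLG point concrete, here is a small sketch. The batting line and the .350 OBA are made up for illustration; only the 1.64 coefficient and the lilSLG definition come from the post. Since lilSLG is just SLG divided by 4, the "4" in front of it only undoes the rescaling and says nothing about importance.

```python
def slg(s, d, t, hr, ab):
    # Standard slugging: total bases per at-bat.
    return (s + 2 * d + 3 * t + 4 * hr) / ab

def lil_slg(s, d, t, hr, ab):
    # The rescaled version from the post: every weight divided by 4.
    return (s / 4 + 2 * d / 4 + 3 * t / 4 + hr) / ab

# Hypothetical batting line (not from the post).
line = dict(s=100, d=30, t=5, hr=20, ab=500)
oba = 0.350  # hypothetical OBA

est_with_slg = 1.64 * oba + slg(**line)
est_with_lil = 1.64 * oba + 4 * lil_slg(**line)

# Same estimate either way; only the unit of the second term changed.
print(est_with_slg, est_with_lil)
```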
OPS: Begone!
May 21, 2003 - tangotiger
(www)
(e-mail)
I agree that it would be a rush to judgement to make any conclusions without having all the information.
While you can conclude that using 3*OBA+SLG is a poor way to evaluate current run production, it's not so clear if you want to use that equation to try to evaluate future run production (or for other secondary reasons). And you certainly can't indict someone or some organization overall. Sample size! You need a lot more evidence.
I also agree that being able to work with a group of people, respecting their views, regardless of what it is, as long as they respect your views as well, is very important. Respect, courtesy, professionalism. Isn't that the police motto?
However, I'll note that in the ESPN chat, Bill James said: "Baldelli's a lot of fun. In my office we were making fun of some scout who compared him to Joe DiMaggio, but when you see him play you realize what people are reacting to. Of course, he doesn't have DiMaggio's entire package, but he does have more than half of it." I kinda didn't like the first part, which left me with the impression that the stat-heads and the scouts clash behind each other's backs. But, this was a throwaway sentence, so who knows what James meant.
Finally, as for anyone's ability to deal with people, I'm not sure that you can necessarily say that DePodesta is good or bad, nor could you say that with me, or Voros, or anyone else, unless you deal with these people on different issues in different settings (or you have some second-hand knowledge... definitely not third-hand or worse). I don't think that an executive is a better people-person, or can deal with people, than a non-exec.
I agree that arrogance is a turn-off to most people, and that's something that a speaker should be conscious of. Mike Gimbel, who I've had occasion to e-mail from time-to-time, seems like a pleasant enough fellow. But I've heard from many many people that he is insufferable. That by itself, truth or perception, will keep Gimbel out of MLB, in my view.
OPS: Begone!
May 21, 2003 - tangotiger
(www)
(e-mail)
I agree with your comment on the corporate world (as I've been here for...geez, almost 13 years... my "corporate world" anniversary will be in 1 month).
Rather, I'm talking about the ability to be persuasive when dealing with people who have dissenting or at least ambivalent viewpoints, which at the very least involves some combination of:
That sentence alone is interesting to read!
But to really do all of those things well is fairly unusual, and I would guess that among the pool of the 15 or 50 or 500 or whatever leading analysts, there's a lot more differentiation in terms of interpersonal ability than technical ability.
That's an interesting thought too. I'm not sure if there is more differentiation in one or the other, or how you would qualify/quantify all that. And even if the differentiation is more in one category, the impact of that differentiation might not be as much as the other category.
Did it just feel like we had an OBA v SLG discussion? (More differentiation in SLG, but more impact with OBA differences.)
As with everything, there's degrees of impact to everything, and it's rather pointless to label them black/white (not that that's what I think anyone is doing here). Even if you have a terribly insufferable analyst, his work might be of such quality that it tips the scales towards good. Even if you would be able to classify DePodesta as a mediocre sabermetrician (and I'm not doing that), the rest of his skills might be so strong, that he can make an impact with his research, while others might not (even with better "stuff").
The fact that a successful organization has him employed, and he is highly regarded by other successful people, even though his experience is not as vast as other baseball execs, must show that his total package is something to respect highly. He's a mover and a shaker, and he gets things moving and shaking in generally the right direction.
OPS: Begone!
May 21, 2003 - tangotiger
I think that you should give the benefit of the doubt when you can. I've heard nothing but good (in fact great) things about DePodesta, so, without him actually saying anything, I give him that benefit.
Now, I can interpret the 3 thing as being "you know, I've got this great formula, and you know what, this correlates highly to 3*OBA+SLG. I don't use 3OPS, I have my own, but as it turns out, it's close to 3OPS. BaseRuns, which I don't use, is close to 1.6OPS. I'm sure Tango/David don't use 1.6OPS, but their equation is close to that".
I don't think that explanation is unreasonable, is it?
OPS: Begone!
May 22, 2003 - tangotiger
(www)
(e-mail)
David, I agree that the 1.64 value is a little suspect since it is based only on those 6 players that I happened to construct. I mentioned that 1.50 to 2.00 would be the correct value, if you were to look for it.
I've used the plus-1 method in the past, and I find I can minimize the runs error by using 1.83 as the coefficient for OBA. That is, 1.83*OBA+SLG. I think that as long as you use something between 1.5 and 2.0, you'll be ok, or at least better than not. I suppose if you really wanted to find the best-fit via the "plus 1" method, you'd look at 200 regular hitters, and figure it out that way.
(For the uninitiated, the "plus 1" method was described in the "Runs Really Created" series last year. Check out the archives.)
OPS: Begone!
May 22, 2003 - tangotiger
(www)
(e-mail)
Interesting. You know, I'm pretty sure I never include the IBB, but it was several months ago when I did that 1.8 thing. Interesting results though. I suppose we should compare it to the full-blown BsR version in that case.
OPS: Begone!
June 2, 2003 - tangotiger
3*(OBA-x)+(SLG-y)
This works out to 3OBA+SLG-(3x+y) which works out to 3OBA+SLG-k
Therefore, it is irrelevant what "k", "x", or "y" is. Whatever numbers you choose won't affect the ranking of the players, or the degree of their rankings, relative to each other, than if you simply used 3OBA+SLG
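A quick sketch of the same point in code. The three OBA/SLG lines are invented for illustration, and x=.30, y=.35 are just the example offsets from the earlier post. Subtracting constants shifts every player by the same k = 3x+y, so the order and the gaps between players don't move.

```python
# Hypothetical (OBA, SLG) lines, invented purely for illustration.
players = {
    "A": (0.400, 0.450),
    "B": (0.350, 0.550),
    "C": (0.320, 0.500),
}

def raw(oba, slg):
    return 3 * oba + slg

def shifted(oba, slg, x=0.30, y=0.35):
    # Same formula with baselines subtracted; x and y are arbitrary.
    return 3 * (oba - x) + (slg - y)

rank_raw = sorted(players, key=lambda p: raw(*players[p]), reverse=True)
rank_shifted = sorted(players, key=lambda p: shifted(*players[p]), reverse=True)

# The difference between the two versions is the same constant for everyone.
gaps = [raw(*v) - shifted(*v) for v in players.values()]

print(rank_raw, rank_shifted, gaps)
```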
OPS: Begone! Part 2
May 27, 2003 - tangotiger
(www)
Nick, very well said, and I especially liked this
"...because he generates extra PA at (mostly) his teammates ability levels, not at his own."
It would have taken me a paragraph to explain this, but you said it perfectly in half a sentence.
As for the batting average thing, I suppose that's another myth. It's pretty clear that given two guys with the same OBA and SLG, you want the guy with the LOWER BA (though in reality, we're not talking about much difference).
I suppose if you really needed to quantify it, probably something like 3*OBA+2*SLG-BA (I really don't know, but it would be of some form like that). I'll bow out of any discussion on trying to find the best-fit equation using OBA,SLG,BA. I already don't have much use for OPS, and I know I won't like OPSMB!
OPS: Begone! Part 2
May 27, 2003 - tangotiger
(www)
(e-mail)
Jason, interesting thought.
I just tried with a weird environment (OBA/SLG of .393/.493), and in this case, the higher the BA, the more runs scored. I then tried the other way, with .289/.351, and this time the LOWER the BA, the more runs scored.
The "break-even" point seems to be about .360/.450. That is, at that level, the change in batting average (and I checked from .200 to .340) made zero change to the run production of the team.
Great call!
OPS: Begone! Part 2
May 27, 2003 - tangotiger
(www)
(e-mail)
"Key" situation is another topic entirely.
Click the above link, select your "key" situation, and plug in the numbers (on a /PA or /600PA basis). That'll tell you which guy you want.
If by key you mean inning/score as well as base/out, then you need another tool to evaluate it.
OPS: Begone! Part 2
May 27, 2003 - tangotiger
(www)
(e-mail)
I just want to make it clear: do not, absolutely do NOT, rely on OBA/SLG/AVG to make game decisions.
You must break it down to your components, and you must apply those components against the context being faced (base/out states, inning/score/base/out game state, game/pitcher state, etc, etc).
OPS is quick and dirty and has no place in game decisions. Relying on it for some cases will make you rely on it for most cases, and sometimes all cases. That's a bad habit to start. OPS, begone!
OPS: Begone! Part 2
May 27, 2003 - tangotiger
Every game context produces different "win potential" for H, HR, BB, outs, SB, sacs. The values between those components are not static. In a completely "run potential" world, you would never call for an IBB or a sac. But in a "win potential" world, there are many many times that you need to call for the IBB or sac.
OPS, if left to its own devices, would become the de facto mechanism to evaluate game situations, when in fact its purpose is to gloss over player evaluations. I don't believe in taking baby steps, and the long path to get the job done. I also don't believe that we should hand hold the manager for 20 years to lead him to the proper tools.
Give them the right tools for the right job, and let them decide if they want it. If Felipe Alou says that looking at OPS is b.s. to decide whether to walk Bonds, I'm going to agree with him. Should I say that OPS is less b.s. than using BA? A rose by any other name...
OPS: Begone! Part 2
May 30, 2003 - tangotiger
Yes, what you want is win-based LWTS (or a sim). And I would guess that a manager will be able to be right (using only his experience) more often than using just OPS, in a tight in-game decision.
OPS: Begone! Part 2
May 30, 2003 - tangotiger
(www)
1) How can injecting more than 40 extra bases into the same number of plate appearances or outs produce a negative result?
40 extra bases on hits, but 100 less bases on walks.
OPS: Begone! Part 2
May 30, 2003 - tangotiger
(www)
The differences between the top guy and the bottom guy, the bottom guy has: 100 more walks 19.7 more HR 119.7 less singles
everything else is the same.
Straight static LWTS says that works out to +33, +28, -56 = +5, or some such.
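For anyone who wants to redo that arithmetic, a rough sketch follows. The per-event weights are my assumption of typical static linear-weights values (about .33 for a walk, 1.40 for a HR, .47 for a single); the post only gives the rounded products, so the total lands near +4 to +5 depending on the exact weights.

```python
# Assumed static linear weights in runs per event (not stated in the post).
WEIGHTS = {"BB": 0.33, "HR": 1.40, "1B": 0.47}

# Difference in event counts, bottom guy minus top guy, from the post.
delta_events = {"BB": 100.0, "HR": 19.7, "1B": -119.7}

# Run contribution of each component, then the net.
delta_runs = {ev: n * WEIGHTS[ev] for ev, n in delta_events.items()}
net_runs = sum(delta_runs.values())

for ev, runs in delta_runs.items():
    print(f"{ev}: {runs:+.1f}")
print(f"net: {net_runs:+.1f}")
```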
OPS: Begone! Part 2
May 30, 2003 - tangotiger
RC has its own problems, magnified substantially when the HR/H or HR/PA becomes out of whack. RC does not model run scoring at all: it just got lucky that it looks like it models it. If you've got a computer, there's zero reason to use RC, when you've got BsR (unless you want to propose a model that's better).
I don't really care about the different denominators. The whole thing of OPS centers around: more good, less bad. The more walks, the more hits, the more TB, the less outs, the better the number. There's nothing inherent in OPS that ensures that the balance is proper. It's just plain old luck that for the run environment of MLB, that it works out that way.
Believe me, if the run environment was half what it is today, or double what it is, there'd be some other "quick" estimator that would get lucky to model run creation.
Sorry for the rant.
How are Runs Really Created
August 12, 2002 - tangotiger
(www)
(e-mail)
Devin, excellent points.
...If the results of the RC formula didn't correspond roughly to actual runs, James wouldn't be using it.
As I mentioned, as long as you are using typical teams in the .300 to .400 OBA range, and as long as the HR/game hit is around the norm, then RC works fine as something useful.
The problem is when you try to extend that to Barry Bonds types of teams (not that they exist) or Pedro Martinez types of teams (and they exist plenty, as Pedro, when on the mound, is his own team).
My point is to make sure that just because the results of Runs Created works on a particular set of samples doesn't mean that you can extend that methodology to other types of things you may be doing.
There's a reason RC fails, and it's in its treatment of the HR.
2) Okay, my common sense has a problem with a run value system that has events with the same outcome (walk, HBP, interference) having different run values.
Let's take a real simple example: a regular walk v IBB. Since an IBB walk occurs almost always with first base open, then an IBB has zero "moving over" value. Since the IBB is given out much more with 2 outs than with 0 outs, the "run scoring" value of the IBB is much less than a regular walk.
So, based on the frequency of when the events happen, and the effect of each event, the values can change drastically.
As for a regular walk v HBP, HBP occur in more or less random fashion. A walk occurs with more frequency with 2 outs than 0 outs, and with more frequency with no runners on 1B than expected in random fashion. The effect of these two things reduce the "moving over" value of the walk and the "run scoring potential" of the walk.
If you are thirsty for more, I've published PRELIMINARY results on the run values of various hitting events by the 24 base-out states. (I should be publishing an updated table in a few weeks.) From there you will see there are virtually no differences between the walk, IBB, and HBP, as you'd expect.
http://www.tangotiger.net/lwtsrobo.html
Thanks, Tom
How are Runs Really Created
August 13, 2002 - tangotiger
(www)
(e-mail)
Rob, your question on base-out differences in run value can be found here http://www.geocities.com/tmasc/lwtsrobo.html
I looked at the batting order differences of run values, and there was a long thread posted on fanhome. It is not easily digestible, and someday I'll write an article on the discoveries there. But yes, as you'd expect the leadoff hitter's HR value was 1.30 while the #5 hitter was somewhere around 1.47.
John Warren: the steal is an interesting point. The run value of the SB is very independent of the run environment, as the additive value of the SB is around .17 to .21 for the most part. The CS however changes HIGHLY, as the out is the most dependent on the run environment. The break-even point is therefore much lower with Pedro, and more steals should be attempted against him.
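Here is a sketch of the break-even logic. The SB value of .19 sits in the .17 to .21 range quoted above; the two CS costs are hypothetical numbers I picked to stand for an average run environment and a Pedro-like one. The break-even success rate is the point where p*SB = (1-p)*CS.

```python
def break_even_rate(sb_gain, cs_cost):
    # Success rate p at which p * sb_gain equals (1 - p) * cs_cost.
    return cs_cost / (sb_gain + cs_cost)

SB_GAIN = 0.19            # within the .17 to .21 range from the post

# Hypothetical CS costs in runs (assumptions, not from the post).
CS_COST_AVERAGE = 0.45    # typical run environment
CS_COST_LOW_RUN = 0.30    # low run environment, e.g. an ace on the mound

be_average = break_even_rate(SB_GAIN, CS_COST_AVERAGE)
be_low_run = break_even_rate(SB_GAIN, CS_COST_LOW_RUN)

print(f"break-even, average environment: {be_average:.3f}")
print(f"break-even, low-run environment: {be_low_run:.3f}")
```

With the out costing less in the low-run environment and the SB gain holding steady, the required success rate drops, which is why more attempts make sense there.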
Mike: I've previously published charts on win expectancy which I have to update in the near future. There's no doubt that win expectancy is really the most important aspect of analysis since that's what we are after. Again, for those thirsty for more, you can consult my preliminary chart on WE here: http://www.geocities.com/tmasc/we.htm . Again, where this comes most into play is the IBB. While the run value of a regular walk is .30 runs and the run value of the IBB is .17 runs, the win values are far different. Because the IBB occurs in game situations where it is "controlled" to minimize the impact of win/loss, then its win value would also decrease.
Thanks for all your great comments.
How are Runs Really Created
August 13, 2002 - tangotiger
(www)
(e-mail)
GIDP: it's worth around -.45 runs. I was thinking of breaking up the "outs" PA into "outs 1, outs 2, outs 3", but decided against it. Maybe I will fix that.
Jason: what I am presenting is how runs are really created. It's the building blocks to whatever it is you want answered. From this, you can generate win expectancy tables, if you like, or the more detailed run values by the 24 base-out states. You can then further extend this to a 24x9 run values that ALSO includes batting order. And from that standpoint, you can evaluate the #9 v Bonds with the bases loaded.
These other run evaluators give no option to do this simply because they are the end to the means. They were built to answer a specific question, and therefore are not very extendable. Play-by-play analysis is very extendable.
How are Runs Really Created
August 14, 2002 - tangotiger
(www)
(e-mail)
Linear Regression
There are certain things that must be understood about linear regression and using it to determine the relationship between hitting events and runs scored.
First, a little background on linear regression. If you have two things, say, the price of a stock and the earnings per share, you can probably find a relationship between these two variables. The higher the earnings, the higher the price of the stock. You will end up with a formula like P = m times E + b, where P is price, E is earnings, b is some constant and m is the slope. The price of a stock, and runs in baseball, is influenced by more than one variable. You end up with an equation that says y = m1a1 + m2a2 + m3a3 + ... + b. Linear regression lets you input the independent variables a1, a2, a3..., the dependent variable y, and solve for m1, m2, m3..., and b.
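As a minimal sketch of the one-variable case P = m times E + b, here is an ordinary least-squares fit in plain Python. The earnings and price numbers are made up so that they sit exactly on P = 10E + 2; real data would of course scatter around the line.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = m*x + b with a single predictor.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Made-up data lying exactly on P = 10*E + 2.
earnings = [1.0, 2.0, 3.0, 4.0]
prices = [12.0, 22.0, 32.0, 42.0]

m, b = fit_line(earnings, prices)
print(f"slope m = {m:.2f}, constant b = {b:.2f}")
```

The multi-variable version works the same way in principle, solving for m1, m2, m3... and b together, usually with a matrix library rather than by hand.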
Here are 4 major problems with using this in baseball:
1 - Linear regression is LINEAR. Linear as in a straight line. While there is a somewhat linear relationship between runs and singles, doubles, triples, and walks, there is NOT a linear relationship between runs and HR, or runs and everything else like SB, WP, BK, etc. Baseball is non-linear.
2 - The independent variables are not independent. There is an interdependence between all these variables. A walk is only worth what it is because of the other things that happen. Linear regression attempts to "freeze" all the other variables when calculating the value of the unfrozen variable. As your run environment increases however, we know that the values of these variables change. Baseball is interdependent.
3 - Even if you assume for ease that run creation is linear and independent (a safe assumption for very controlled environments), what sample data will you use to run your regression against? Most people will use team season totals, which is an aggregate of individual games, which is an aggregate of individual innings. If you want to run a proper regression analysis, at the very least run it on a game or inning level. Your sample size will explode to something much more reliable.
4 - Not accounting for all the variables. Triples have a strong relationship to speed. If you don't have SB in your sets of variables, the regression analysis will award more weight to the triples as a stand-in (because of its relationship to steals). It is possible, based on some samples, that the value of a triple could exceed the value of the HR! What other variables are you not accounting for?
Arvid - Let me get back to your post. The purpose of this article is to explain the building blocks of run creation at the team level. I have not shown how to extrapolate this to individual players. The end-result is not to end up with linear values for each hitting event, since these linear values only apply to a given run environment. We need to determine the linear values for EVERY run environment! As I said, the value of a single in Pedro's run environment is far less than a single in an average pitcher's run environment.
I am interested in the pieces of how runs get created, an actual model. I am not interested in a formula that estimates runs based on whatever variables that ONLY works for a given run environment. Runs Created and Linear Weights work fine for that. BaseRuns is the key, and I will present this hopefully by the end of this month.
Michael - The building blocks of run creation do lie in run expectancy tables for the 24 base-out states. I am not introducing anything new here, but rather showing how we should extend this to other run environments. I have not read Curve Ball. Please clarify your post further so that I can properly answer you.
Rob - Are you asking me what would a player's run value be using a context-neutral approach (i.e., the final weighted average values I presented) compared to a context-specific approach (i.e., the specific values by the 24 base-out states)? If this is the case, the answer is about +/- 10 runs at the extremes. I looked at this last year, with regards to Ichiro. You can find that article here http://www.geocities.com/tmasc/lwbymob.htm though I only looked at the 8 base states. If this is not what you are talking about, please clarify further.
How are Runs Really Created
August 14, 2002 - tangotiger
(www)
(e-mail)
Michael: I agree that easily the most digestible measure of run creation is one that is context-neutral, and therefore, I am not adding anything new here, except more perfect values to use (and adding values to the obscure events like RBOE or BK).
My interest lies "under the hood", and the how and the why.
The important point that I'm also trying to get across is that even if you stick to a linear context-neutral measure like linear weights, that you should use a custom version, based on the run environment. It really makes no sense to apply the same formula to Mel Rojas as to Pedro Martinez. We only do this, because it's easy for us. And if we keep doing it, we will forget to question why we do it. Runs Created, as great as it was then, is an example of this. It completely fails us at the extreme player level.
I think I am in basic agreement with your point of view.
Rob: OUCH! First of all, I did look at the batting order about 2 years ago, and there was an effect of something like 15-20 runs for Rickey Henderson in the leadoff spot. That is, putting a player whose skillset is uniquely qualified for the batting spot that has the most variability (which is Rickey to a tee), in his best season, had I think a variability of close to 20 runs (against putting Rickey in, say, the #5 spot). The #2 hitter also showed great variability, and I concluded elsewhere that in certain (many!) situations, your best hitter should bat #2.
With the MVP/Ichiro thread, I showed that batting great with men on base, or being given a lot of men on base will add 10 runs. Give both, and you're close to 20 runs as well.
I really don't need to run a simulator to determine all this though. This is a simple problem of determining the frequency of facing the 24 base-out states, and your success in those same states.
I wouldn't be surprised if you have a player who is ideally qualified for a particular spot (say Ichiro for #2, though I don't know that), who faces more than normal high-leverage situations, who is one of the best hitters in the game and who performs far above his "neutral" performance level would add 30 more runs than if placed in a "neutral" spot and performing at his normal high level. This is of course a rarity, and I would guess in practical terms that 1 standard deviation would be +/- 4 runs.
This issue however is very interesting to look at, but it would be something that I would have to prioritize in with the other equally interesting things I'm looking at.
How are Runs Really Created
August 14, 2002 - tangotiger
(www)
(e-mail)
Michael, I would not look at SF actual run output to determine anything since 6000 PA is not a very small sample.
Anyway, I once ran sims where I had a team of 9 .333 OBA guys, and another team with 8 .300 OBA guys and 1 .600 OBA guy. Overall, both groups are the same. I also made the SLG average about 30 or 40% higher for each player.
I then moved this Bonds type player through the batting order.
From what I remember, I did not notice much difference between the 9 equal guys and the Bonds + bad team.
I'll have to redo that study now that I have better data available. It is again another interesting question that I must look at.
How are Runs Really Created
August 14, 2002 - tangotiger
I meant IS a small sample.
How are Runs Really Created
August 15, 2002 - tangotiger
(www)
(e-mail)
Michael, I guess I didn't make myself very clear, since what you replied is exactly what I said.
"Anyway, I once ran sims where I had a team of 9 .333 OBA guys, and another team with 8 .300 OBA guys and 1 .600 OBA guy. Overall, both groups are the same. I also made the SLG average about 30 or 40% higher for each player."
So, the 9 equals of .333 OBA had a team weighted team average of .333 OBA. The 8 equals of .300 OBA plus the Bonds-like .600 OBA would have a team weighted average of .333. So, the first team has the Bonds magic spread around. We are talking about two equal teams in terms of overall talent, except that the spread is far different.
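The arithmetic behind "same overall talent, different spread" can be checked in a couple of lines. This sketch assumes every lineup spot gets the same number of PAs, which is a simplification of mine (in real lineups the top spots bat more often).

```python
# OBA by lineup spot for the two hypothetical teams from the post.
team_equal = [0.333] * 9
team_bonds = [0.300] * 8 + [0.600]

def team_oba(obas):
    # Simple average, assuming equal PAs per spot (a simplification).
    return sum(obas) / len(obas)

avg_equal = team_oba(team_equal)
avg_bonds = team_oba(team_bonds)

# Spread between best and worst hitter on each team.
spread_equal = max(team_equal) - min(team_equal)
spread_bonds = max(team_bonds) - min(team_bonds)

print(f"{avg_equal:.3f} vs {avg_bonds:.3f}")
print(f"spread: {spread_equal:.3f} vs {spread_bonds:.3f}")
```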
As I mentioned, I don't remember seeing any noticeable difference. It might have been maybe 2% difference (say 15 runs over a season) only to the extent that you'd be able to optimize the batting order so that the .600 guy could do the most damage. I will redo the study at some point in the future though to get more accurate results.
Here is a link to the results of the study I did last year. Please take it as preliminary and crude. Spreading the Bonds magic
How are Runs Really Created
August 15, 2002 - tangotiger
(www)
(e-mail)
tango I think your crude Bonds analysis goofed up in exactly the types of ways you intended to prevent with your article.
I was doing my best to avoid lone gunmen types like Bonds. I did that analysis ONLY to show the effect on runs at a team level of having either 9 equal guys, or 8 equal guys and 1 outlier, even though overall they have the same stats. I did not want to talk about the "run environment" because...
The problems I see are that as you so elegantly noted walks are valuable only because others drive you in. By using just OBA you missed that completely as most of Bonds exceptional value is in the walks.
...because Bonds doesn't get to partake in his own run environment. Bonds' run environment (the chances that the runners ahead of him will score, and the chances that he himself will score) is derived from all the other batters. You can't measure Bonds' value in moving runners over if those chances partly include Bonds' own effect.
So, I was hoping that everyone would overlook this, because the Bonds effect to the run environment is outside the scope here. However, since you brought it up, what you have to do, in this case, is establish a run environment for each batting spot for this particular team, such that if Bonds is the #3 hitter, then the run environment of the #2 hitter includes Bonds, but the run environment of the #3 hitter should "assume" an "average" type of ballplayer.
I went into this in great and deathly detail in the batting order thread on fanhome. I really want to avoid talking about that here, because we are going to get away from the basics too fast.
Your point is well-taken and accurate.
It would be interesting to analyze all of the line-ups that have been tried to see which would be the most effective based on the run environment concept, and of course see if you could find a better one.
The run environment concept applies to the basic building blocks of run creation, and I did apply this to the above mentioned thread on batting order.
The correct and proper way to do what you are suggesting is to use the proper model (a simulator) to go through all the variations. The run environment concept, with its building blocks of run creation, will however greatly reduce the number of player combinations you need to look through.
For those interested in the batting order thread, drop me a line, and I'll point you there. tom@tangotiger.net
How are Runs Really Created
August 15, 2002 - tangotiger
(www)
(e-mail)
new: you pretty much have got it, except near the end. The run environment is established by the overall offense + pitching + fielding. You CAN create the run expectancy tables and all that with a little programming. You can also extend this into win expectancy tables, which is where the real fun and learning experience lies.
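To give a sense of the "little programming" involved, here is a minimal sketch of building a run expectancy table from play-by-play. The record layout (inning id, bases string, outs, runs on the play) is an assumption for illustration, not the Retrosheet format, and the sketch assumes every inning in the input was played to three outs.

```python
from collections import defaultdict


def run_expectancy(plays):
    """Average runs scored from each base-out state to the end of the inning.

    plays: iterable of (inning_id, bases, outs, runs_on_play) in game order,
    where bases is a string such as "---" or "1-3", outs is 0, 1, or 2, and
    the state is the one in effect BEFORE the play. Hypothetical layout.
    """
    by_inning = defaultdict(list)
    for inning_id, bases, outs, runs in plays:
        by_inning[inning_id].append((bases, outs, runs))

    totals = defaultdict(float)
    counts = defaultdict(int)
    for events in by_inning.values():
        # Runs still to come in this inning, counting the current play.
        remaining = sum(runs for _, _, runs in events)
        for bases, outs, runs in events:
            totals[(bases, outs)] += remaining
            counts[(bases, outs)] += 1
            remaining -= runs
    return {state: totals[state] / counts[state] for state in totals}


# Toy input: inning "a" has a single, a 2-run HR, then three outs;
# inning "b" goes three up, three down.
toy = [
    ("a", "---", 0, 0), ("a", "1--", 0, 2), ("a", "---", 0, 0),
    ("a", "---", 1, 0), ("a", "---", 2, 0),
    ("b", "---", 0, 0), ("b", "---", 1, 0), ("b", "---", 2, 0),
]
re_table = run_expectancy(toy)
```

With real data the result has up to 24 entries, one per base-out state. A win expectancy table follows the same pattern, with score and inning added to the state and "won the game" replacing "runs to end of inning".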
How are Runs Really Created
August 16, 2002 - tangotiger
(www)
(e-mail)
Voros: BaseRuns does not fall into the trap that RC and LWTS do. You will find it an appropriate measure, though you lose the great additive advantages that LWTS affords you. Readers of fanhome know what I am talking about here. For the others, please bear with me until the end of the month.
Walt: your terrific dissection deserves a generous response. I will in due course. I do want to make three specific points in the meantime though: 1 - the linear models that are presented with regards to baseball are almost always to the power of 1, and therefore that was my basis for my statement
2 - John Jarvis did a regression analysis on I believe the 1976-2000 TEAM SEASONAL totals and came away with a regression value of .62 (or something) for a double, and .87 (or so) for a triple. Those values are nonsensical in reality. It doesn't matter that his r-squared was 90% or that the standard error was very low. It's wrong. I've done regression analysis on team totals by era, and the results also were strange in some cases.
3 - But I'm really not seeing what this has to do with how slopes change by run environment. Modeling that would suggest other possibilities like a series of dummy variables representing different run-scoring eras or a multi-level random effects model.
Yes, we've tried that, but it doesn't work. As I've shown, each element would have to have its own best-fit linear or parabolic or whatever equation, with respect to the run environment. And the run environment itself would have to be known before the fact. Since we are attempting to determine what is the actual run environment without knowing the number of runs scored, we're stuck. This is where BaseRuns comes in. An elegant, simple and accurate equation.
I will reply to your lengthy post soon. Thanks...
How are Runs Really Created
August 16, 2002 - tangotiger
(www)
(e-mail)
Not exactly. Linear regression is linear because it's "linear in the parameters." There are many ways to model non-linearities among the variables using multiple regression.
Thanks for clarifying some points. I should then say that baseball is virtually linear in the parameters, but is non-linear to its environment.
In regression, a coefficient gives you the impact of adding that particularly variable to the model, after having removed all the influence of the other variables from both the dependent variable and the independent variable in question (aka "statistical control"). I don't see any inherent problem with doing that here, but perhaps I'm missing something.
The problem is that if you freeze say all the hits, HR, etc, but leave the walk to be the independent variable in question, its value is dependent on the values of hits and HR. So, you freeze hits and HR at say 10 and 2, then the value of 1 walk might be .30, the value of 2 walks might average .32, the value of 3 walks might average .34. Furthermore, if you then freeze the hits at 11 and the HR at 1, all these values change. So, exactly what is the value of the walk?
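The dependence described here can be sketched with any model in which events interact. The example below uses a commonly cited simple form of BaseRuns; its coefficients are an assumption for illustration and are not the values used in this thread. It also assumes 27 outs and treats every non-HR hit as a single.

```python
def base_runs(hits, hr, bb, outs=27):
    """Simple BaseRuns form (assumed coefficients, all non-HR hits singles)."""
    singles = hits - hr
    total_bases = singles + 4 * hr
    a = hits + bb - hr                                   # baserunners
    b = (1.4 * total_bases - 0.6 * hits - 3 * hr + 0.1 * bb) * 1.02
    c = outs
    return a * b / (b + c) + hr


def walk_value(hits, hr, bb):
    """Marginal runs from one more walk, holding hits and HR fixed."""
    return base_runs(hits, hr, bb + 1) - base_runs(hits, hr, bb)


first_walk = walk_value(10, 2, 0)     # 10 hits, 2 HR, going from 0 to 1 BB
second_walk = walk_value(10, 2, 1)    # same hits and HR, going from 1 to 2 BB
other_mix = walk_value(11, 1, 0)      # 11 hits, 1 HR, going from 0 to 1 BB
```

Under these assumptions each additional walk is worth slightly more than the one before it, and changing the hit and HR totals shifts the whole schedule, which is why a single regression coefficient cannot pin down "the" value of the walk.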
But I'm really not seeing what this has to do with how slopes change by run environment. Modeling that would suggest other possibilities like a series of dummy variables representing different run-scoring eras or a multi-level random effects model.
Yes. But that's really really hard.
This is a good point, but of course this has nothing to do with the appropriateness or inappropriateness of multiple regression, but rather with what the proper unit of analysis is.
Yes, my third statement was exactly this point.
...That's a lot of conditional RE tables. :-)
Yes, the 24 basic states are the minimum number of states that you should accept. 24 x 9 to include the batters would be better. The point about the fielders etc. should be factored into the RE tables before the game, so that you have a customized set of 24 x 9 RE tables based on the actual 9 hitters, the pitcher, and the fielders.
However, if things like HBP and interference are truly random, omitting them from the model will not bias the coefficients for variables included in the model.
Things like HBP may be an indication of poor or wild pitching and therefore before we omit anything, we have to determine if they are truly random. Interference I'm sure we can ignore.
All this aside, chances are none of this will have much impact. Baseball scoring is not all that variable, most of the important variables have been identified, etc. Chances are the best we can hope for is minor improvement in the level of error. The proof's in the pudding there and I hope a future article will compare the accuracy of your method to the existing ones.
John Jarvis has gone through the exercise of comparing the various estimators, so I don't need to rehash that.
As I said in the article, as long as you adhere to typical MLB teams who play at the typical OBA levels, then really any run estimator will "work". That's because at a very narrow given specific run environment, what you say is correct, and there is not much variability.
For a "team" like Pedro, this does not apply whatsoever. And Voros is correct that while there are no 9 Bonds hitters, there are effectively 9 Bonds hitters when a really bad pitcher is on the mound. This pitcher would provide his opposition with a Bonds environment.
This is why it is important to understand the building block of run creation, and its high dependence to the run environment (which itself is determined by the various offensive events working together in a non-linear interdependent fashion).
Great comments Walt, and I hope that my lack of knowledge on specific statistical concepts did not take away from the comments I have presented. Thanks.
How are Runs Really Created
August 17, 2002 - tangotiger
(www)
(e-mail)
Here are the results of a basic linear regression, using team totals from 1969-1999 (808 teams). The second line is the standard error.
outs 1b 2b 3b hr bb k sb cs
(0.11) 0.51 0.72 1.10 1.47 0.34 (0.10) 0.21 (0.19)
0.00 0.01 0.03 0.08 0.03 0.01 0.01 0.03 0.07
The r-squared is 95.5%. I still wouldn't take those numbers. They are nice guidelines. Very good ones in fact. But when we have access to play-by-play that tells us exactly what each event, on average, is worth, what does looking at the aggregated seasonal line tell us?
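For readers who want to see what applying such coefficients looks like, here is a minimal sketch. The coefficients are the ones reported above, with parenthesized values read as negatives. The team line is made up for illustration, no intercept is reported so none is used, and whether "outs" includes strikeouts is not stated, so the sketch simply multiplies whatever counts it is given.

```python
# Regression coefficients from the 1969-1999 team totals quoted above.
COEF = {
    "outs": -0.11, "1b": 0.51, "2b": 0.72, "3b": 1.10, "hr": 1.47,
    "bb": 0.34, "k": -0.10, "sb": 0.21, "cs": -0.19,
}


def estimated_runs(team_line):
    """Sum of coefficient x count over the events present in team_line."""
    return sum(COEF[event] * team_line.get(event, 0) for event in COEF)


# Hypothetical season line, not a real team.
example_team = {
    "outs": 4000, "1b": 1000, "2b": 280, "3b": 30, "hr": 160,
    "bb": 550, "k": 1000, "sb": 100, "cs": 50,
}
season_estimate = estimated_runs(example_team)
```

Every event gets the same fixed value no matter what else the team did, which is exactly the limitation being argued in this thread.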
Related Web Pages and Articles
How are Runs Really Created - Second Installment
August 20, 2002 - tangotiger
(www)
(e-mail)
I've gotten a few response emails with some nice remarks, but I guess there's not much controversy in what I'm saying.
How are Runs Really Created - Second Installment
August 25, 2002 - tangotiger
(www)
(e-mail)
Tango, don't let the lack of response deter you from completing this project. It is important to get all this down on paper in one place. Combined, it will become an important source for proper RC understanding.
David, I agree with your sentiment. I have a half-dozen other projects that I don't yet know the answer to, and which therefore interest me more (things like when to actually walk Barry Bonds specifically, based on score, inning, men on base, outs, and batting order). However, I've read so many run evaluator articles that I hope to put a stop to the gobbledygook-type approaches, and steer the search in the right direction.
Your BaseRuns is using the only right approach, by definition. As I've mentioned, the only thing left in understanding run creation is the score rate, and how to calculate it.
Tango, trust me, it's not yawning -- it's digesting. I am thoroughly enjoying the work.
Thanks, and hopefully the next article in the series will be as satisfying.
...it was the position of several well-qualified analysts that models don't matter--all that matters is accuracy in the range of interest. To them, the range of interest was the 3 to 6 R/G range of real MLB teams. To me, and to you, the range of interest is 0 to infinity R/G.
Yes, the shortcut way, while easier, and sometimes even more accurate, does not lend itself to extrapolating beyond what it was designed for. Since we are living in the time of Pedro and Bonds, maybe we care more about the extreme types.
I hope you delve into the subject of how to apply a successful team run formula to individuals. I believe that this area has NOT to this point been analyzed properly, and I am curious as to what you have come up with.
This area also needs a lot of work, but I will present what I have nonetheless (probably in the 4th installment).
*** A few others have responded by email, and I appreciate any feedback (positive or negative).
How are Runs Really Created - Second Installment
August 26, 2002 - tangotiger
(www)
(e-mail)
Thanks Ben. To answer your question, there are 24 base-out states to consider. Depending on when the K occurs, its value would be different than that of a regular out.
You can check out a chart that breaks down the various run values by the 24 base-out states here: Run values by the 24 base-out states .
The K also has an extra wrinkle in that the batter can be safe after a K, or other events can happen afterwards. I've chosen to include them as part of the K, but in reality the effect is very tiny.
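One way to see why the K's value depends on the base-out state is the usual run expectancy bookkeeping: the value of a play is the runs that scored on it plus the change in run expectancy. The sketch below assumes that approach; the RE numbers in it are placeholders invented for illustration, not measured values.

```python
def run_value(re_table, before, after, runs_scored=0.0):
    """Runs scored on the play plus the change in run expectancy.

    before and after are (bases, outs) states; after=None means the play
    ended the inning, so nothing more can be expected.
    """
    re_after = 0.0 if after is None else re_table[after]
    return runs_scored + re_after - re_table[before]


# Placeholder RE values for illustration only, NOT measured data.
re_table = {
    ("---", 0): 0.50, ("---", 1): 0.27,
    ("--3", 0): 1.35, ("--3", 1): 0.95,
}

k_bases_empty = run_value(re_table, ("---", 0), ("---", 1))
k_runner_on_third = run_value(re_table, ("--3", 0), ("--3", 1))
```

With any realistic table the strikeout with a runner on third costs more than the strikeout with the bases empty, because a ball in play might have scored the runner.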
Related Web Pages and Articles
How are Runs Really Created - Third Installment
September 16, 2002 - tangotiger
(www)
(e-mail)
Actually, the linear weights that I use can determine either absolute runs scored, or runs above average.
As described in the previous article, the reconciliation between the two methods is simply subtracting .16 runs per out (for 1974-1990).
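The reconciliation can be sketched in a couple of lines. This assumes the .16 is the league-average runs scored per out, so that a team scoring at exactly that rate comes out at zero runs above average; the function names are illustrative.

```python
RUNS_PER_OUT = 0.16  # 1974-1990 figure quoted above


def runs_above_average(absolute_runs, outs):
    """Convert absolute runs to runs above average."""
    return absolute_runs - RUNS_PER_OUT * outs


def absolute_runs(runs_above_avg, outs):
    """Convert runs above average back to absolute runs."""
    return runs_above_avg + RUNS_PER_OUT * outs


# A 27-out game at the league rate: 0.16 x 27 = 4.32 runs, zero above average.
league_game = runs_above_average(4.32, 27)
```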
How are Runs Really Created - Third Installment
September 16, 2002 - tangotiger
(www)
(e-mail)
That is, the number of home runs hit isn't independent of other run-generating events. It's not that more runners are scoring per home run when a large number of home runs are hit, but rather, that there are more (run-generating) singles, doubles and triples in these games as well.
*** First of all, this is not true. Generally speaking, those events ARE independent of the HR. I ran further studies that controlled for those events (for example, looked only at games with 2 to 4 walks, 6 to 8 singles, etc, and separated them by the HR class) so that there was virtually no difference in those events. The results were the same.
Why the hell should I care about extreme outcomes?
*** In extreme examples, you can't hide the shortcomings of your models or estimates. They stick out like a sore thumb. And note, in my extreme examples, the dataset went only so far as Barry Bonds' 01/02. So, it was not "unrealistic" extreme, but realistic extremes.
If I'm a major league GM, how does BaseRuns help me to build a better team? Does it have better predictive value than runs created?
*** As I said, almost all run evaluators are similar in the .300 to .400 OBA range. This will help you determine the true value of those extreme players, the ones GMs worry they may be overpaying.
Better predictive value? Neither this system nor Runs Created says anything about predictive value. Voros, MGL and a few others do a good job there.
Does it do a better job of explaining run creation in "realistic extreme" environments such as Barry Bonds 01/02 or the Deadball Era or Coors Field?
Yes. This series of articles explains a team of Barry Bonds, a team of Pedros (i.e., Pedro himself), and virtually any run environment, regardless of whether that run environment is due to the hitter, the pitcher, the fielders, or the park. If Runs Created says that the run value of a HR is LESS than 1 for dead ball (an impossibility) pitchers, or worth more than 2, close to 3 runs (!), for Barry Bonds run environment, what does that tell you?
How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger
(www)
(e-mail)
There is some correlation. Here's the data. But since I've shown the breakdown by OBA (where there is HIGH correlation by definition between the # of singles, doubles and OBA), and I broke it down by HR (where the correlation that does exist has very little impact overall), I don't know what it is that you are after.
S D T HR HBP BB
6.3 1.4 0.2 - 0.2 3.1
6.3 1.5 0.2 1.0 0.2 3.3
6.4 1.6 0.2 2.0 0.2 3.4
6.5 1.6 0.2 3.0 0.2 3.6
6.4 1.6 0.2 4.0 0.2 3.7
6.7 1.7 0.2 5.0 0.3 3.8
7.3 1.7 0.3 6.0 0.2 4.2
7.6 2.1 0.3 7.0 0.4 4.4
As you can see, the most important variable here is the HR. It has by far the most impact on how many runs should be scored, *in this grouping of data*.
And the impact that it does have is nowhere near as high as Runs Created would say it is. Its impact, as shown in the article, is virtually exactly what BaseRuns says it should be.
If BaseRuns is 50% more accurate at dealing with extreme cases, and 1% less accurate at dealing with realistic cases (I don't know that it is), that seems to me like one step forward and two back.
Again, the point of the articles is to paint a picture as to how runs are created. BaseRuns is the first step in trying to figure out what the score rate should be. I'm sure there'll be better ones to come around. But the basic model is correct by definition. What Runs Created, static Linear Weights, et al do is to ignore the model, and instead fit their formula to the sample data they have on hand.
What is the question you're trying to answer?
How are runs really created.
What are the implications of your research?
The implication is that by ignoring the actual basis of how runs are scored (that the HR has an absolute minimum value of 1, and caps off at somewhere below 2, and that all events should converge to 1 as the OBA converges to 1), you are fixing your formula to reduce the RMSE. While you might (and should!) get better results by fixing your formula against known sample data, you are deceiving the reader about how runs are really created.
The implication is that the value of the HR does not have an ever-increasing value. There is a law of diminishing returns for the HR specifically.
I apologize in advance for my confrontational tenor,
I would appreciate if the confrontation aspect is reduced slightly. Thanks. I'd prefer the debate center on the merits of the data and interpretation of the data.
but your advocacy of BaseRuns comes across as almost cultish, based on a series of assumptions it conflates with Truth
Again, except for BaseRuns' definition of the score rate, everything I've said is truth. What assumptions are you referring to?
We start off with a point of fact that runs = BR x scoreRate + HR. Now, we're trying to figure out what the score rate is. David's B/(B+C) seems too simple, but in actual fact, it ends up conforming to reality. There's a problem at the very high end, and that's where we should try to look for better answers. But the runs = BR x scoreRate + HR must hold.
, without regard for the world around it.
BaseRuns is the only model that accounts for the world around it.
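The structure being defended, runs = BR x scoreRate + HR with a score rate of B/(B+C), can be sketched as follows. The argument names follow the usual BaseRuns lettering (A for baserunners, D for home runs), which is an assumption here; how B is built from the individual events is deliberately left to the caller.

```python
def score_rate(b, c):
    """Share of baserunners who score: B / (B + C), always below 1 when C > 0."""
    return b / (b + c)


def base_runs(a, b, c, d):
    """runs = baserunners x score rate + home runs.

    a: baserunners other than HR hitters, b: advancement factor,
    c: outs, d: home runs.
    """
    return a * score_rate(b, c) + d


# Whatever B is, a home run scores at least the batter himself ...
only_homers = base_runs(a=0, b=5.0, c=27, d=3)
# ... and no combination can score more than baserunners plus home runs.
ceiling_check = base_runs(a=12, b=500.0, c=27, d=2)
```

The two properties in the comments hold for any non-negative B, which is the sense in which the structure, as opposed to the particular score rate, is fixed.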
But how about one Barry Bonds and eight mortals? What I'd like to see is a comparison of the systems within an actual major league context, not a simulation that you've designed to produce an outcome that is preordained to be favorable to your cause.
Preordained? Would you believe me if I told you that I wrote the first 2 articles BEFORE I ran the data in the third article? I was happy to use my sim, but then I decided to run against real data. I was the biggest skeptic of BaseRuns when David first introduced it to me. There's no bigger skeptic of "new math" than me, be it DIPS, BaseRuns, or Win Shares.
As for 1 Barry and 8 mortals, that would require the use of a sim, because of the problem I mentioned regarding the pond and the aquarium. There are other factors, specifically, the batting spot you put Barry in. He has a different effect if batted 1st than 5th. I intend to look at the batting order effect at some point, but I'd be happy to share anything specific you would like to know.
How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger
(www)
(e-mail)
is Base Runs designed to provide new information
At the risk of repeating myself, it is designed to present how runs are really created. It's up to the reader to decide how valuable it is to know this.
As I mentioned on a few occasions, if all you look at are players and teams with an OBA around .300 to .400, it really doesn't matter what you use.
But, if you are interested in extreme examples, like Pedro, or a high run scoring environment, BaseRuns' value comes through in that it doesn't take the shortcuts the other run estimators rely on to be accurate.
I agree, if you are not bothered by back of the envelope calculations, and you don't care about extreme situations, then stick with basic RC or static LWTS. They'll serve your purpose. I've said as much in past articles.
I am presenting a framework to understand how runs are created, and that at some point the marginal value of the HR decreases, while other run evaluators never consider this.
And if you are looking at college or high school ball, then BaseRuns becomes much more valuable.
How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger
(www)
(e-mail)
As for how accurate BaseRuns is in "real-life" situations, here's the data behind the "by OBA" chart. The "R" is actual Runs scored.
oba R BsR LWTS RC
0.030 0.12 0.10 (1.93) 0.03 ... BsR better
0.077 0.41 0.31 (1.18) 0.20 ... BsR better
0.124 0.61 0.63 (0.37) 0.51 ... BsR better
0.176 1.17 1.22 0.63 1.09 ... BsR better
0.224 1.88 1.97 1.67 1.86 ... RC better
0.275 2.83 2.99 2.89 2.93 ... LWTS better
0.324 4.04 4.21 4.19 4.21 ... LWTS better
0.371 5.31 5.58 5.51 5.65 ... LWTS better
0.421 6.87 7.28 6.95 7.42 ... LWTS better
0.468 8.64 9.18 8.41 9.33 ... LWTS better
0.515 10.34 11.33 9.89 11.49 ... LWTS better
0.566 12.10 13.91 11.78 13.84 ... LWTS better
As you can see, when the OBA is between .275 and .375, all three measures are very very similar. But for the Pedros and Thomes and Bonds of the world, things are different.
Notice also that LWTS is better at the "high-end". This is because LWTS takes advantage of the HR value to be fixed at 1.40, and it doesn't fall into the RC trap.
If I present a similar table broken down by HR class, BsR will take over in *all* respects. This is why I say that David's BsR is the first step. Clearly, it still falls into a similar trap as RC in that it overvalues each event (but not as much as RC does). The search is to find out how to better represent the interaction.
How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger
(www)
(e-mail)
Ugh, just trying to turn off those italics. Sorry about all that.
How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger
(www)
(e-mail)
Are those italics ever going to die?
...this as the sort of tough love you'd encounter in defending a dissertation. It is clear that all of this makes sense to you, but it is not so transparent to a well-informed audience.
*** Yes, this is very clear to me. Since I don't have the natural honed gift of Bill James in writing, I'll do my best to convey my message better.
I also think that you're inviting somewhat more ... aggressive feedback when you say things like "Runs Created is dead, BaseRuns is the now",
*** That's ok. I say these things with basis. I've gone to great lengths to show different scenarios, etc.
...or invoke (incorrectly) something like the Heisenberg Uncertainty Principle. I mean, you're talking the talk.
*** I didn't mean to suggest that the Heisenberg Uncertainty Principle was at work here, nor that my example was one of Heisenberg. The specific quote I took was one where it's hard to distinguish between what is being observed, without interacting with the system you are observing. Barry Bonds does not interact with himself, only with his teammates. But by throwing Bonds into the mix, you are changing the relative values of the teammates you are trying to study.
2. Thank you for presenting the table of correlations.
*** Sure. I'd be glad to show more detailed data. Just the forum here is not very appealing for it. Email me if you want more.
3. The fundamental point is that there's no "Holy Grail" here. Runs are created by the particular combination of batting and baserunning events in a particular inning of a particular game.
*** The search is for the holy grail, and BaseRuns is *not* it.
Any attempt to generalize these unique sequences of events into something universally applicable has got to make approximations and assumptions.
*** You don't want to make them universally applicable. The holy grail reference was in reference to things to come. You have to understand all the contexts, the base-out situation, the pitcher/batter matchup, the runners, the fielders, the park. I don't expect that we will end up with 1 formula for all that. I do expect that we will get a series of principles that will follow all that. The work that I have done all leads to this.
Linear weights cuts a different corner than BaseRuns does, by focusing on data at the season level rather than at the game level. You assert that focusing on data at the game level is True, without presenting evidence either as to the utility of this approach
*** The game is the unit since the interaction of the events occurs at the game level. You make a more accurate point that the interaction occurs at the inning level, and this is true. If I had the data, I would have presented it at the inning level.
(what would Billy Beane do with it?),
*** I'm sure he's a smart guy. But not all these things have to be applied by GMs. My audience is myself, and people who think like me. Maybe there's not many people out there like that, that's fine.
or to its aesthetic purity. Why focus on the game level, rather than at the inning level? Why try and take all of the context out of run creation at all?
*** BaseRuns tries to (wrongly) take the context out. LWTS, as I do them, forces all the context right back in.
4. My point in criticising your experimental design is that the coefficients you use in BaseRuns were derived based on data gathered at the game level, and that you then use game-level data in order to test its superiority.
*** As discussed, the interactions occur at the inning level. Game-level is the best I had.
If you tested the various systems based on season-level data, linear weights would triumph, because that's how it is derived.
*** Not necessarily so. Dynamic Custom Linear Weights would always triumph over static linear weights. Assuming you meant static linear weights, this is probably true, but not necessarily. But I agree generally with this statement. The point however is that the single from last week does nothing to determine the impact of run scoring tomorrow. It might have some predictive value, but it has no impact on it. This is why inning-view (or game-view if you are also looking at wins and lineup construction) is the correct view.
5. In the OBA chart you present above, BaseRuns is considerably less accurate over the entire normal range of OBA's. Missing by an additional .25 runs per game would amount to about 40 runs or 4-5 games over the course of a season. That is not "very very similar"; you have made a substantial trade-off here!
*** If I present the chart by HR, or by OPS, BaseRuns would triumph over the normal range. It depends what data it is that you are using, the context that you are presenting. Again, the strength of BaseRuns is how it handles the HR. Its weakness (relatively speaking to itself, but not to the other evaluators) is the rest of the components. This is why I don't support BsR as the end-all and be-all, but as the first step. As I've mentioned a few times, the search is on for a better score rate. If you have a run model that doesn't adhere to something as fundamental as R = BR x scoreRate + HR, what are you supposed to do with this?
*** Here is the data by the OPS class (grouped by .100)
opsClass R BsR LWTS RC
0.055 - 0.03 (1.76) 0.02 ... RC is better
0.160 0.22 0.18 (0.75) 0.17 ... BsR is better
0.258 0.47 0.50 0.11 0.48 ... RC is better
0.358 0.93 1.02 1.01 0.97 ... RC is better
0.454 1.62 1.72 1.94 1.65 ... RC is better
0.552 2.49 2.60 2.90 2.52 ... RC is better
0.651 3.47 3.63 3.88 3.59 ... RC is better
0.748 4.62 4.79 4.82 4.83 ... BsR is better
0.846 5.85 6.08 5.72 6.27 ... LWTS is better
0.945 7.17 7.48 6.59 7.92 ... BsR is better
1.043 8.60 9.00 7.42 9.78 ... BsR is better
1.141 10.11 10.53 8.18 11.85 ... BsR is better
1.239 11.57 12.25 9.06 14.14 ... BsR is better
1.338 13.16 13.89 9.76 16.66 ... BsR is better
1.443 15.78 16.15 10.78 19.62 ... BsR is better
(You'll note that when RC is better, it is barely better. As the OPS rises, BsR is far better.)
And here it is broken down by the HR class
HR R BsR LWTS RC
- 3.08 3.06 3.79 3.03 ... BsR is better
1 4.62 4.62 4.44 4.66 ... BsR is better
2 6.12 6.12 5.00 6.41 ... BsR is better
3 7.65 7.65 5.62 8.37 ... BsR is better
4 9.03 9.00 6.07 10.29 ... BsR is better
5 10.55 10.49 6.73 12.45 ... BsR is better
6 12.33 12.32 7.52 15.35 ... BsR is better
7 16.22 14.32 8.34 18.27 ... BsR is better
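The "is better" labels in the HR-class table can be reproduced, and summarized, with a few lines. The rows are copied from the table above, with the "-" HR class read as 0. The average here is unweighted, because the number of games in each class is not given; treat it as a rough summary only.

```python
# (HR class, actual R, BsR, LWTS, RC), copied from the HR-class table.
ROWS = [
    (0, 3.08, 3.06, 3.79, 3.03),
    (1, 4.62, 4.62, 4.44, 4.66),
    (2, 6.12, 6.12, 5.00, 6.41),
    (3, 7.65, 7.65, 5.62, 8.37),
    (4, 9.03, 9.00, 6.07, 10.29),
    (5, 10.55, 10.49, 6.73, 12.45),
    (6, 12.33, 12.32, 7.52, 15.35),
    (7, 16.22, 14.32, 8.34, 18.27),
]
NAMES = ("BsR", "LWTS", "RC")


def mean_abs_error(rows):
    """Unweighted mean absolute error of each estimator against actual R."""
    totals = dict.fromkeys(NAMES, 0.0)
    for _, actual, *estimates in rows:
        for name, estimate in zip(NAMES, estimates):
            totals[name] += abs(estimate - actual)
    return {name: total / len(rows) for name, total in totals.items()}


def closest(row):
    """Name of the estimator nearest to actual R for one row."""
    _, actual, *estimates = row
    return min(zip(NAMES, estimates), key=lambda pair: abs(pair[1] - actual))[0]


mae = mean_abs_error(ROWS)
winners = [closest(row) for row in ROWS]
```

On these rows BsR is the closest estimator in every HR class, matching the labels in the table, and its average miss is the smallest of the three.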
How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger
(www)
(e-mail)
At the risk of being overly reductive, if a model like this doesn't have a use for baseball management then I'm not sure what the point of the research is.
*** The point of the research is to enlighten people as to how runs are really created. It doesn't have to have an application beyond that. However, if you want to properly value a player, you should value him on how he really creates runs. And BaseRuns helps in that regard for the extreme players.
BR could help with strategic questions concerning lineup selection or how to efficiently run your offense against a top notch pitcher.
*** That's possible, but I would not rely on BsR for that. Personally, I would use BsR to generate custom linear weights values, and THEN I'd use linear weights to assist in answering those questions. This is what I do, and I am very very confident in the results I get from that.
...so I'm curious if you have a sense of how using BaseRuns should change the approaches we've all been using. And I apologize if this is a point you feel you've hammered home in your previous articles, because if it is I'm not sure I've understood it.
*** I don't think I've really addressed this issue. The approach is to get away from the "typical" run estimators, because they don't model reality. To quote someone's "Equivalent Runs" or "XRuns" or "Runs Created" almost makes it seem as if those estimators are accurate. They may yield accurate results in some or most cases, but the calculations used to derive those results are not correct. Suppose that we know that 3 = 6 x .33 + 1. But, I come out and say, well, you know 3 is also equal to (6+1) x .429. I may end up getting the same answer using the same data, but the way I combined the data is wrong. But since most players and most teams do not deviate much from the norm, then, really who cares? It all works out.
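The 3 = 6 x .33 + 1 example can be pushed a little further to show where the two ways of combining the data part company. In this sketch the .33 and .429 are the numbers from the example; the function names are illustrative, and the score rate is held fixed for simplicity even though in practice it would move with the run environment.

```python
def structural(baserunners, hr, rate=0.33):
    """Baserunners score at some rate; every HR scores itself."""
    return baserunners * rate + hr


def shortcut(baserunners, hr, factor=0.429):
    """Lump HR in with the baserunners and scale the total."""
    return (baserunners + hr) * factor


# At the point the shortcut was fitted to, the two agree (about 3 runs each).
typical = (structural(6, 1), shortcut(6, 1))

# Away from that point, with 2 baserunners and 4 HR, they disagree badly.
extreme = (structural(2, 4), shortcut(2, 4))
```

The shortcut credits 4 home runs and 2 baserunners with fewer than 4 runs, which cannot happen, while the structural form can never go below the number of home runs.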
However, I care about the extremes, about Pedro, Bonds, Thome, et al. And just because something works "on average" doesn't mean it works in the extreme.
So, to get back to your question, BaseRuns should change your approach as to how you view how runs are created, and should force you to question when you see a run evaluator that "works".
For low-level, or game-level actions, a custom set of LWTS or RE or WE charts is what you want (and I've provided some links above throughout the article).
============== Rob, here is the full "B" component I use. Just a caution: you DON'T need to have all this data. But I have this data, and this is what I am using. You will recognize most from the Retrosheet event files. If you want me to clarify some of the items, let me know. Note: because of "partial innings", you have to be very very careful (which is why I have that last entry). The short answer is that the RE chart at the bottom of the 9th inning of a tied game is DIFFERENT from the RE chart at any other point in the game. Again, if you need the long answer, let me know.
To all: Again, adding each of these components beyond the basics adds very very little to the accuracy of the run construction. But, for completeness, I am providing it.
0.73 Single
1.95 Double
3.13 Triple
1.69 HR
0.05 Walk
(0.48) IBB
0.16 HBP
0.80 Error
0.28 Interference
1.43 OtherSafe
0.73 Sac
(0.06) Strikeout
(0.00) Out
0.81 SB
(1.19) CS
(0.51) Pickoff
(0.35) PickoffError
1.05 Balk
1.17 PB
1.17 WP
0.56 DefensiveIndiff
(1.06) OtherAdvance
0.00 FoulError
(1.49) implied outs
How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger
(www)
(e-mail)
Italics: yes, this was all my fault. I did not have a proper closing italic tag, and it left everything subsequent in italics. I threw in a whole bunch of closing italic tags in my previous post just to make sure that there was no nesting going on, to close it off, and that seemed to work. Sorry about all that...
How are Runs Really Created - Third Installment
September 17, 2002 - tangotiger
(www)
(e-mail)
Arvid:
1. It seems almost oxymoronic that BaseRuns doesn't do particularly well relative to different levels of OBP, but does do very well relative to different levels of OPS. This suggests to me that there is some sort of interaction between the "getting on base" element and the "moving runners along" element that has been lost in the attempt to segregate those two things from one another. For one thing, the probabilities of particular batting outcomes aren't independent of the bases occupied during a given plate appearance.
*** As mentioned, BaseRuns is the first step at the score rate. It does very well with high OPS and HR classes, simply because it handles the HR properly. Further improvement is called for in cases where no HR are hit. This is why I mention that the search is on for a better score rate.
2. I suppose I'm still somewhat put off by the implications that BaseRuns is a "true" or "real" or otherwise aesthetically pure measure of run creation. Even if you look at data on an inning-by-inning basis, it is still an approximation:
BB-1B-1B-HR-K-K-K produces 4 runs, whereas HR-K-K-1B-1B-BB-K normally produces 1.
*** The model assumes somewhat random distribution of events. It does not purport otherwise. You can pick any single example, and any model will be wrong. You need the sample size behind it.
That, in aggregating data to the season level, unusual and random sequences tend to get lost in the noise, is as much an advantage as a disadvantage.
*** No need to aggregate by team though, since this introduces a bias. By aggregating on other terms, as I've done, you get "better" data.
Paul:
is sufficient to justify replacing existing estimators when measuring typical examples within the controlled set of major league baseball teams
*** As I said, if all you care about is the typical example, then the typical evaluators are all you need.
but you can hardly claim to be surprised to meet some challenges when you throw down a gauntlet like that.
*** I have no problem arguing against RC, since it does not have a basis in logic. Its basis is gobbledygook math that is fitted to the narrow sample data, and its flaws are exposed when taking it out of its environment. This is also true of static LWTS. This is not the case with BsR or with custom LWTS.
because as Davenport himself acknowledged, for most situations in actual MLB you’ll do just fine using 1.83 (or even 2), but that if your particular interest is in studying extreme examples, you need a custom exponent
*** I didn't know Clay said this, but this is exactly my position as well.
I’ve found Tango’s tone to be far more modest in subsequent posts on this discussion
*** I must be getting old these last few hours. I'll try to be more "O'Reilly Factor" from time-to-time.
Arvid:
BaseRuns is simply a refinement of linear weights
*** BsR allows the generation of custom LWTS. There's no other relationship between the two.
I would suggest that he discuss the offensive performance of the 2001 San Francisco Giants in terms of BaseRuns.
*** For that you need custom LWTS by batting order by the 24 base-out states. BsR is not appropriate, except to help in establishing the baseline custom values.
How are Runs Really Created - Third Installment
September 18, 2002 - tangotiger
(www)
(e-mail)
Paul:
No, I only meant modest on a relative scale, where 0 is the amount of arrogance to be found in an average Tango article. For God's sake, there's never a need to turn on Fox News.
Ah-hahaha... the Linear Weights Arrogance Tango Scale! I love it! As for Fox News, they are extremely biased. PBS, CNN maybe, and 60 minutes are really the only good ones out there. Seriously, watch BBC, or other world news and you get such a different perspective on the world. Did you watch those 3 Arabic-American kids from Florida on Larry King 2 nights ago? They were extremely believable, and given the choice between them and that lady, I'd choose them. Of course, the American media was all over them before the King appearance, and since then? Exactly.
You know what else they said when asked if they would sue her? No! They said no! How un-American is that??
Brian:
Excellent summary overall. I agree with almost everything, except
might be to attempt to separate the run scoring, or the moving over (driving in) components
I have done this in Article 2, under the "building blocks of run creation". The separating into components is what you need to do to get custom Linear Weights components.
I'll have to think about your "8.2" concept. Sounds interesting.
I would like to also see the data shown grouped by number of triples in a game, number of walks in a game, number of doubles in a game
Sure, no problem. I'll try to get that done by this weekend (I usually run my research while my newborn is asleep, which is not often these days!).
It seems that if there is any inaccuracy here, it is likely in ... the relative weights assigned to the impact of the individual events
No, that is not possible. Those numbers were generated such that when using the plus 1 method it yields the exact LWTS coefficients determined by the play-by-play data. Therefore, the inaccuracy would be that we can't simply have such a simple "B" equation.
============
I will post the complete BsR equations that I used by the end of today. What I provided to Rob above was only the "B" equation. I neglected to also include the "Baserunner" portion as well.
How are Runs Really Created - Third Installment
September 18, 2002 - tangotiger
(www)
(e-mail)
2. Somewhat less accuracy in normal run-scoring environments.
This may very well be true, but that is only because the other measures (except LWTS) are "cheating" to get there. They ignore the constraints of a HR being at least 1 run, they ignore the constraint that you can't score more runs than you have runners, and so that gives them enough wiggling room to force in coefficients to the sample data they have to get the lowest RMSE possible.
Static LWTS values are derived from the pbp and therefore do no cheating. Well, they cheat in that the values can only be applied to the data they were generated from, the typical run scoring environment.
BaseRuns may be 1% less accurate in the typical environments but "50%" more accurate in the extremes. You (?) said that this is 1 step forward, 2 steps back. From my standpoint, this is 2 steps forward, 1 step back.
I don't like that the accuracy of the other formulas is fitted to the typical data, *especially since almost everyone then takes that formula out of that environment and applies it to Pedro, Barry, and Thome*. That little disclaimer is always ignored.
Anyway, to repeat: if all you care about is the typical, use the typical. If you want to know how the events interact with each other to produce runs in various run environments, then you need to use R = BR x scoreRate + HR. For now, BaseRuns is it.
How are Runs Really Created - Third Installment
September 18, 2002 - tangotiger
(www)
(e-mail)
I think you are onto something here
Thus total outs are being divided by 4.5. If you have 27 outs in a game, that would mean you are counting 6 of them, which it seems to me might be roughly the number of outs made per game with men on base in your data set.
That is not possible. About 45% of all PAs occur with men on base. 65% of all PAs are outs. Therefore, # of outs with MOB is .45 x .65 x 39 = 11
However, I did notice a very interesting relationship between the B component values and the LWTS values in the past. I haven't been able to quantify well yet though. I'm sure you are on the right path.
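The arithmetic behind the "that is not possible" reply can be checked with a few lines of Python. This is only a sketch: the constant names are made up here, 39 is taken as the plate appearances figure used in the post, and the product treats outs as equally likely with or without men on base, which is the same simplification the post makes.

```python
PA_FIGURE = 39                    # figure used in the post (assumed to be PA per game)
SHARE_PA_WITH_MOB = 0.45          # about 45% of PAs occur with men on base
SHARE_PA_THAT_ARE_OUTS = 0.65     # 65% of all PAs are outs

# Outs made with men on base, per the post's calculation.
outs_with_mob = SHARE_PA_WITH_MOB * SHARE_PA_THAT_ARE_OUTS * PA_FIGURE

# Outs "counted" if 27 outs are divided by the 4.5 divisor from the quoted comment.
outs_counted_by_divisor = 27 / 4.5

print(round(outs_with_mob, 1), outs_counted_by_divisor)
```

The two numbers differ by roughly a factor of two, which is why the divisor cannot simply be read as "outs with men on base".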
How are Runs Really Created - Third Installment
September 18, 2002 - tangotiger
(www)
(e-mail)
btw, the number I derived was 4.25. Not sure what to do with it yet.
How are Runs Really Created - Third Installment
September 19, 2002 - tangotiger
(www)
(e-mail)
If we go back to article 2, and the definition of the score rate (or just using common sense), we have:
% of runners scoring = (runners who score) / (runners who score + those who don't)
This is the score rate.
Runners who score is represented by the "B" equation. Though, as mentioned astutely in the earlier post, we should strip out the 4.25 (or whatever constant) to represent this actually.
The "outs" portion, the "C", of the score rate represents those runners who don't score, namely those left on base, and those outs on base.
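As a rough illustration of the structure described above (R = BR x scoreRate + HR, with the score rate built from "B" and "C"), here is a minimal Python sketch. The function names and the input values are hypothetical; the actual "B" and "C" equations are not reproduced here, so they are passed in as plain numbers.

```python
def score_rate(b, c):
    """Share of runners who score: B / (B + C)."""
    return b / (b + c)


def base_runs(baserunners, b, c, home_runs):
    """R = BR x scoreRate + HR, as stated in the thread."""
    return baserunners * score_rate(b, c) + home_runs


# Hypothetical inputs, for illustration only.
print(base_runs(baserunners=10, b=3, c=7, home_runs=2))
```

Because the score rate always stays between 0 and 1, this form respects the two constraints mentioned earlier in the thread: a HR is always worth at least 1 run, and you cannot score more runs than you have runners plus home runs.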
How are Runs Really Created - Third Installment
September 23, 2002 - tangotiger
(www)
(e-mail)
Give me another week please on posting the formula. I'll have to write a whole article on it, as it's not as simple as I thought I could make it.
How are Runs Really Created - Third Installment
September 30, 2002 - tangotiger
(www)
(e-mail)
I've added a baseruns article, which is an addendum to the RC series. I apologize for not making it better, but I'd rather get it out there, rather than let it sit on my backburner.
SABR 301 - Talent Distributions (June 5, 2003)
Posted 11:58 a.m.,
June 5, 2003
(#2) -
tangotiger
In a typical plate appearance, a player will not face the median pitcher, but the average pitcher. If you've got say 360 pitchers, but the BFP are spread out based on talent, with the 360th pitcher barely pitching, then the batter will face a weighted version of the 360 pitchers, which comes out to exactly average.
However, the concept of median and other things that you can gather from these charts is certainly interesting. You just have to be careful in how you apply it, and its purpose.
SABR 301 - Talent Distributions (June 5, 2003)
Posted 1:11 p.m.,
June 5, 2003
(#9) -
tangotiger
Philly, I'll answer your specific questions in post8, and if you have other questions from post7, please rephrase them. I had a hard time following it.
1 - 1.00 is just a "fictitious" number, like Pamela Anderson is a 9.5 / 10, or what have you. I say "fictitious" in quotes because there is some reason behind it, but I haven't presented it here, though I will in the future. It's a number that can be multiplied and divided. A guy with .5 talent level compared to a 1.0 talent level is the same as a 1.0 talent level compared to a 2.0 talent level. That is, if you have a 1.0 hitter against a .5 pitcher, the resultant expected matchup is exactly the same as a 2.0 hitter against a 1.0 pitcher.
Those numbers might be roughly equivalent to a single-A pitcher (.50), an avg MLB hitter or pitcher (1.00) and Barry Bonds/Pedro (2.0). (Maybe top college and not single-A... I don't know yet.)
2 - If there are say 680x9x30 PA in MLB, and you have say 14x30 hitters, then the avg #ofPA per average hitter is about 430 PAs. A top hitter would have say 700PAs, or 160% of average, or 1.60. Something like that. I was thinking of putting actual numbers, say from a scale of 0 to 750, instead of 0 to 1.8, or whatever. I kind of like having the average at 1.0 though.
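The playing-time scale in point 2 is simple arithmetic, sketched below in Python. The constant names are made up for this example, and reading "680x9x30" as 680 PA per lineup slot, 9 slots, and 30 teams is an assumption about what the post intends.

```python
PA_PER_LINEUP_SLOT = 680   # assumed meaning of the "680" in the post
LINEUP_SLOTS = 9
TEAMS = 30
HITTERS_PER_TEAM = 14
TOP_HITTER_PA = 700

total_pa = PA_PER_LINEUP_SLOT * LINEUP_SLOTS * TEAMS
hitters = HITTERS_PER_TEAM * TEAMS
avg_pa = total_pa / hitters                  # PA for the average hitter
playing_time_index = TOP_HITTER_PA / avg_pa  # average hitter = 1.00

print(total_pa, hitters, round(avg_pa), round(playing_time_index, 2))
```

Under these inputs the average works out a little above the post's "about 430", and the top hitter lands at roughly 1.60 on a scale where the average is 1.00.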
SABR 301 - Talent Distributions (June 5, 2003)
Posted 4:23 p.m.,
June 5, 2003
(#12) -
tangotiger
Chart 6 has a Y-axis labelled "Playing Time" 1-100. Draw a vertical line on that chart so that there is exactly as much Playing Time on each side. That line would probably fall between 4.50 and 4.55 (in any event, to the right of 4.45).
Yes, that's correct.
But 4.45 is defined as 1.00 ("MLB average") in chart 4.
4.45 is defined as the talent level of 1.00. The talent level of 1.00 is the MLB average.
Why is the mid-point of Playing Time chart (Chart 6) not identical to the MLB average (1.00) in Chart 4?
Because the distribution is not normal. Chart 6 is a multiplication of Chart 5 (with a very high skew to the right) and Chart 2 (with a very high skew to the left).
The mid-point of chart 6 (Playing Time chart) would be the *median* and not the mean. Because of the skew, this is pretty much what we expected.
SABR 301 - Talent Distributions (June 5, 2003)
Posted 4:31 p.m.,
June 5, 2003
(#13) -
tangotiger
I made an error in my article. If you multiply Chart SIX by Chart 4, you'll get 1.00. That's the same as multiplying Charts 2 (number of players at each level), 4 (talent at each level), and 5 (playing time at each level).
Vinay, you might remember I had a thread last week regarding "ERA by era" or some such. And in there, I showed that regardless of the run environment, your ERA relative to league average ERA was pretty constant. So, if you are Pedro with a 2 ERA in a league of 4, you should expect to be a 2.5 ERA in a league of 5. That makes his ERA+ as 200, or twice the league average. That gives him a talent level of 2.00, in a league of 1.00.
If you decided to double the number of MLB teams, the talent level would drop to say .83, and Pedro maintains his 2.00 talent level. I used 1.00 as a convenient marker, but it was fixed, so that I didn't always have to redo the baseline.
Consider the 1.00 level to be the avg MLB player in 2002.
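The Pedro example can be written out as a short Python sketch. The function names are hypothetical, and this is the plain ratio described in the post; published ERA+ figures also include park adjustments, which are ignored here.

```python
def talent_level(era, league_era):
    """Talent relative to a league defined as 1.00 (lower ERA = higher talent)."""
    return league_era / era


def era_plus(era, league_era):
    """ERA+ as used in the post: 100 x league ERA / pitcher ERA."""
    return 100 * league_era / era


def expected_era(talent, league_era):
    """ERA the same talent level would post in a different run environment."""
    return league_era / talent


pedro = talent_level(era=2.0, league_era=4.0)
print(pedro, era_plus(2.0, 4.0), expected_era(pedro, league_era=5.0))
```

The point of the ratio form is that the talent level stays fixed while the league ERA moves, which is what lets the 1.00 baseline be pinned to one season.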
SABR 301 - Talent Distributions (June 5, 2003)
Posted 10:29 p.m.,
June 5, 2003
(#17) -
tangotiger
Walt: I was trying to avoid technical terms so that I wouldn't get slammed for using normal, binomial, standard normal, etc improperly.
common or typical a normal distribution is
I don't think I said that a normal distribution was typical, but rather that the distribution that does exist in Chart 6 was a typical looking distribution (in lay terms).
For my education, if the median is to the left of the mean, what is that distribution called? Does a normal distribution imply that the mean and median are equal? Does a standard normal distribution imply that the 68% of the points fall within 1 SD and that the mean and median are equal?
Thanks...
Tom: yes, that's pretty much it.
Kevin: please explain the purpose of the equation. I don't know what it's trying to tell me.
SABR 301 - Talent Distributions (June 5, 2003)
Posted 9:43 a.m.,
June 6, 2003
(#19) -
tangotiger
Michael, thanks much for all that info.
I really don't get the use of the covariance for the ERA, insomuch as what it's trying to tell us.
I showed mathematically that an ERA of 2.00 in a 4 RPG environment is the same as a 2.5 in a 5 RPG environment. Therefore, an ERA+ type fits the bill.
However, if you are trying to use the covariance to say "how hard" it is to get an ERA+ of 200, based on the talent distribution of your opponents, or your peers or something, that's another issue. Maybe the talent is so diluted that an ERA+ of 200 in 1906 is equivalent to 160 in 1993 or something. That's really a whole other ball of wax (and really more in line with what I'm trying to do here, than what ERA+ equivalencies are normally used for).
It almost looks like Kevin is trying to work backwards by trying to infer what the talent distribution could have been to produce those results. It's a worthwhile exercise, but I must ask what is the confidence level and sampling error in doing so.
SABR 301 - Talent Distributions (June 5, 2003)
Posted 12:34 p.m.,
June 6, 2003
(#21) -
tangotiger
Well said.
Another thing that opens up now is "Regression towards the mean". What mean? The unweighted mean of all MLB players (say talent level .91)? The weighted mean by PA of all MLB players (talent level 1.00)?
The issue is with PA: is that based on a player's sample performance, or based on his "tools"? This becomes critical, especially for rookies. If I were to regenerate the charts, but only look at say 22 year olds, things are going to get skewed differently. Are 22 year olds given PAs by talent or performance? What mean do we regress them towards? What's the difference between a performance level of .85 for a 22 year old in MLB and the same level in Triple-A?
The only reason he's in MLB is because he was selected there for some reason. If that selection was based on tools, that's one thing. But if it was based on sample performance, that's quite another.
SABR 301 - Talent Distributions (June 5, 2003)
Posted 1:32 p.m.,
June 6, 2003
(#23) -
tangotiger
Dave, you want me to multiply Chart 2 (number of players, per SD) by Chart 4 (talent level, per SD)? That resultant will be a weighting of the players, by talent level, per SD.
That is, if you had 300 players at 4.2 and 150 at 4.6, and the talent level at 4.2 was .8 and the talent level at 4.6 was 1.6, then you'd get "240" at 4.2 and at 4.6.
What would this represent?
If you want to multiply them all AND add them, to come up with one number, that's a different story. This would be the unweighted average talent level in MLB. The answer to that question would be 0.92. That is, if you take the 1500 players who play in MLB in any given year, and they were to each play the same amount, their mean talent level would be 0.92.
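The two operations discussed in this post (multiplying per level, then multiplying and adding) can be sketched in Python using the two-level example above. The variable names are made up, and with only two levels the result is not meant to reproduce the 0.92 figure, which comes from the full charts.

```python
# Number of players and talent level at each point on the scale,
# using the two-level example from the post.
players = {4.2: 300, 4.6: 150}
talent = {4.2: 0.8, 4.6: 1.6}

# Chart 2 x Chart 4: players weighted by talent, per level.
weighted = {level: players[level] * talent[level] for level in players}

# Multiply AND add: the mean talent if every player played the same amount.
unweighted_mean = sum(weighted.values()) / sum(players.values())

print(weighted, round(unweighted_mean, 3))
```

Weighting by playing time instead would mean multiplying each level by its share of PA as well, which is the Chart 2 x Chart 4 x Chart 5 product mentioned earlier in the thread.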
SABR 301 - Talent Distributions (June 5, 2003)
Posted 12:36 p.m.,
July 7, 2003
(#25) -
tangotiger
With these distributions, we are in a good position to establish how much "talent dilution" exists with adding or subtracting teams.
If we assume that the current 2003 average player has a talent level of 100, what would happen if half the teams were to disband, leaving us with 14 or 16 teams? What would the average player look like?
I figure that the average player in such a league would have a talent level of 110, which is roughly equivalent to a player that is right now about +1 win / 162 GP over the 2003 average player.
How about if we double the number of teams from 30 to 60 teams? These talent distributions say that the average player in such a league would have a talent level of 90, or about -1 win / 162 GP from the average 2003 player. Troy O'Leary or Ricky Ledee would be an average player in such a league.
So, when we talk about adding or subtracting 4 teams, how much impact is that? This would have the effect of making the player with the 97 talent level or 103 talent level average. Effectively, this would be imperceptible to the viewer.
So, when people talk about "talent dilution", it's hard to see it, if you are talking about adding/subtracting 2 or 4 teams.
SABR 301 - Talent Distributions (June 5, 2003)
Posted 9:02 p.m.,
July 7, 2003
(#27) -
tangotiger
At fanhome I did a very long study on timeline adjustments. However, based on assumptions, you can make the case that Ruth's era had half the talent as today, making Ruth average today. Using a very slightly different assumption, Ruth then became much much better.
The study also suffered from my non-understanding (at the time) of regression towards the mean, and my non-understanding that a player's performance is only a sample of his true talent, and not representative of his talent. This basically invalidated everything I've just said about Ruth.
I've seen this error repeated when looking at aging patterns for pitchers, where regression towards the mean is much more important to understand. You can spot these studies when they show a pitcher's peak age to be 23.
I was going to rerun that study (eventually).
The talent distributions that I've listed here would probably be best used as an explanation after the fact, rather than as a way to reach a conclusion.
SABR 101 - Relative and Absolute Scales (June 6, 2003)
Posted 7:43 a.m.,
June 9, 2003
(#6) -
tangotiger
Because James views "negative" value as being "bad", so bad that you'd be better off as 0 runs above average in 5 PAs, than 5 runs below average in 1000 PAs. But, the key to understanding this issue is the point I made in the article regarding the "key" point.
SABR 101 - Relative and Absolute Scales (June 6, 2003)
Posted 12:59 p.m.,
June 9, 2003
(#9) -
tangotiger
But those lists in Total Baseball are implicitly intended to be a ranking of players according to their value
Palmer is wrong in how he sells Linear Weights by doing this.
James is wrong in saying that because he doesn't buy the TB ranking, then he can't possibly buy Linear Weights, and then go on and start telling you why he can't buy Linear Weights. James can't get past the concept of "zero" not meaning zero in an absolute sense, where zero is "absence of something". Zero is defined in LWTS as average.
The issue I have with James is not that he doesn't buy it, because I really don't care. I also don't care too much that he gives weak arguments against Linear Weights. My issue with James is that he has an enormous amount of influence (more than all of us put together), and the reader has a certain amount of trust in James, that they won't feel they have to do the dirty work to validate what James is saying, and that James then puts out these weak arguments that it takes us 20 years (and counting) to undo the damage.
That run-on sentence means: James has to be responsible with what he says, because people treat him as judge, jury and executioner. James derives (or at least derived) his income by getting people to buy into what he says, and he should be more responsible with his analysis.
SABR 101 - Relative and Absolute Scales (June 6, 2003)
Posted 10:56 a.m.,
November 12, 2003
(#12) -
tangotiger
Just bringing this one forward as a companion to Patriot's article.
Velocity loss of a pitched baseball (June 10, 2003)
Posted 7:49 a.m.,
June 17, 2003
(#7) -
tangotiger
Must be a typo. I think some tennis player just hit 147 mph over the weekend. The max is around 102 or so.
SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)
Posted 9:41 a.m.,
June 11, 2003
(#4) -
tangotiger
Andrew, your suppositions are plausible, though I don't think the sample size at this level will make it statistically significant. That would be my guess as well though, that walks are more likely to be issued by below average pitchers, and to the top of the order.
As for lineup slot, yes, I could do that as well, but a few problems
1 - sample size will definitely play an issue here (I think I'd have to do this for the whole retrosheet years, which really isn't more time, but I just need a more powerful computer)
2 - selective sampling (Mike Piazza would be 3% of the cleanup spot instead of a random noise in the overall, which makes that pretty significant, unless all cleanup hitters are of the same "type" as Piazza.. that may be true, but that won't be the case for the #1 or #2 hitters)
3 - we'd want to separate pitchers batting from non-pitchers
4 - most importantly, I find the current chart already a little unwieldy, and I can't imagine readers enjoying NINE of these, and really, if I separate by pitchers, 18 of them!
However, I know I would, and I'd guess you would as well!
SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)
Posted 3:39 p.m.,
June 11, 2003
(#8) -
tangotiger
(homepage)
Andrew, I did not take it as such... just pointed out some areas to think about.
Jim, I didn't talk about win probability here, as that would be another topic. However, if you are interested in that, please go to my site (see homepage link above), and there's plenty there for you. The two important things I've done are:
1 - I've given you the win expectancy for inning/score/base/out for 7th inning and on, with score differential of 1 or 0
2 - Created "leverage" situations for ALL innings, ALL base/out with score 3 runs and less, which you could use for pinch hit talk, and to a slightly smaller extent, bullpen usage
Hope this answers at least some of your questions.
SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)
Posted 3:42 p.m.,
June 11, 2003
(#9) -
tangotiger
As always, there are other variables to consider:
- batter/pitcher matchup
- batters due up
- potential pitchers due up
- runner speed
- fielding talent and positioning
- park
in addition to the inning/score/base/out. But, I'm limited in my time, otherwise, I'd love to generate a WE that incorporates all this.
SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)
Posted 1:18 p.m.,
June 12, 2003
(#11) -
tangotiger
The way to read it is:
with man on 2b and 0 outs, there's a certain run expectancy from that point on to the end of the inning, based on the actual chain of batters that faced that situation
with man on 2b and 0 outs, and an IBB then issued, there's a certain run expectancy from that point on to the end of the inning, based on the actual chain of batters that faced that situation
The chain of batters are not necessarily random (well, they probably are in the first case), and certainly not random in the second set.
Therefore, you have to be very careful in how you read the chart and trying to make comparisons, and what-ifs, etc.
SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)
Posted 7:55 a.m.,
June 13, 2003
(#13) -
tangotiger
No.
In order to establish the validity of the IBB, you first need to use Win Expectancy, as I did when I looked at "When to walk Barry Bonds" last Oct. To do that, you have to do some work behind the scenes, to establish a "what-if" scenario.
The empirical results are just what they are.
SABR 201: Linear Weights by the 24 base/out states, 1999-2002 (June 10, 2003)
Posted 8:12 a.m.,
June 14, 2003
(#15) -
tangotiger
http://pub119.ezboard.com/fbaseballfrm8
Applications of Win Probabilities (June 13, 2003)
Posted 9:51 a.m.,
June 15, 2003
(#2) -
tangotiger
It's always important to use the right tools for the right job. Phil's data is completely empirical, and therefore, the situation you find yourself in should be similar to what the empirical data shows.
If you have additional variables like "pitcher is 20% above average", the hitter at bat is 30% above average, but the batter on deck is 10% below average, and the hitter after him is 5% above average, you have to create your model to reflect what it is you want. Empirically, you won't find the sample size to match that. Which is why you need a Markov chain that handles all this (or you run a sim).
The empirical data or a basic Markov may help you and guide you to an answer, but if the variables not being considered are very relevant, your conclusion will be suspect.
Hitting the cutoff man (June 13, 2003)
Posted 9:54 a.m.,
June 16, 2003
(#2) -
tangotiger
Sylvain,
I know you said drag is not considered, but that has to be a key consideration right? I mean, I'm sure Tim Raines can throw a ball at 75mph, but he'd never be able to throw it 330 feet, certainly not on the fly. I'd also think that maybe a throw on 1 or 2 bounces would be better, in terms of "accuracy".
Thanks for producing your work, as it is very fun to try to decipher!
Hitting the cutoff man (June 13, 2003)
Posted 11:15 a.m.,
June 16, 2003
(#4) -
tangotiger
No need to go to the trouble for the pdf file. I meant deciphering not in your presentation layout, but in the equation itself. I'm trying to remember my physics classes from 15 years ago.
Hitting the cutoff man (June 13, 2003)
Posted 9:26 a.m.,
June 17, 2003
(#8) -
tangotiger(e-mail)
Sylvain,
Send me the file, and I'll ask Primer to post it.
Hitting the cutoff man (June 13, 2003)
Posted 1:46 p.m.,
June 17, 2003
(#9) -
tangotiger
(homepage)
Sylvain's file has been posted to the above link. It is an Excel file. If you have trouble opening it, right-click the "homepage" link, and "save target as".
Hitting the cutoff man (June 13, 2003)
Posted 12:05 p.m.,
June 18, 2003
(#10) -
tangotiger
Sylvain has issued an update (with graphs!). You can find it at the same link from post #9.
SABR 301 - Rocco Baldelli, sabermetrician (June 16, 2003)
Posted 5:10 p.m.,
June 16, 2003
(#1) -
tangotiger
And try to answer that question WITHOUT referencing a player's K or BB numbers.
SABR 301 - PZR - Blueprint (June 17, 2003)
Posted 1:15 p.m.,
June 19, 2003
(#2) -
tangotiger
I agree that we don't know what the knuckle advantage is, especially since we've got such a small sample to deal with. It could even be that "good knucklers versus bad knucklers" have a huge gap in $H, while the "good flyballers versus bad flyballers" have a smaller gap in $H, etc.
In all cases, we are always comparing a MLB relative to other MLB pitchers, who *may* have been specifically selected to play in the majors because they can keep the $H down. Lots of work ahead of us.
SABR 301 - PZR - Blueprint (June 17, 2003)
Posted 1:56 p.m.,
January 13, 2004
(#3) -
tangotiger
I'm bringing this thread forward in conjunction with the True Talent Fielding thread.
SABR 301 - PZR - Blueprint (June 17, 2003)
Posted 9:47 a.m.,
January 14, 2004
(#5) -
tangotiger
I know Tango is now thinking, "What took that idiot (boor) MGL so long to figure this out?"
Not at all. I only criticize your memory!
I'll comment on the rest of your post in a while.
SABR 301 - PZR - Blueprint (June 17, 2003)
Posted 10:40 a.m.,
January 14, 2004
(#6) -
tangotiger
Actually, this PZR thing is getting more complicated than I thought. I'm going to need some time to try to sort things out.
(For those also trying to work on this, there are two layers to consider: things that the pitcher controls and those that he doesn't. We probably also want to consider HR and not just BIP. We don't want to consider the end-result of the play, so that we can ignore the fielder's impact.)
SABR 301 - PZR - Blueprint (June 17, 2003)
Posted 12:50 p.m.,
January 14, 2004
(#8) -
tangotiger
PZR should be calculated independently. But, as a test, they should all add up.
Team UZR does not necessarily apply equally to all the pitchers on the team (if for example you have a great CF and you have a pitcher that rarely allows a ball to that CF, then he won't benefit as much, etc, etc).
The point of PZR is that we should be able to get it independent of the fielders, while, as a test, the fielding + pitching + park should get you the total of defense on contacted balls or BIP (not sure which yet).
The handedness issue is an interesting thought. After all, RJ gets to face a disproportionate number of RH batters because he is a LHP.
SABR 301 - PZR - Blueprint (June 17, 2003)
Posted 2:10 p.m.,
January 14, 2004
(#11) -
tangotiger
MGL, you can't just blanket use the team UZR on each pitcher. If you have 3 great fielding OF, and 4 poor fielding IF, and you have 1 GB pitcher and 1 FB pitcher, you can't give them the same UZR runs / BIP impact.
What if you have a great SS,3B,RF, but poor other fielders?
If you are going to go down the path of adjusting a pitcher's stats by taking the UZR of the fielders, then you'll have to do it one position at a time.
I really have no problem with this, and it should be done.
I'm (trying to) offer a way to do PZR without needing to know about UZR. But, by calculating PZR, UZR and parks, you'll end up with a team's DER.
Sheehan: Pitcher Workloads (June 19, 2003)
Posted 3:37 p.m.,
June 19, 2003
(#7) -
tangotiger
Varb: it would be interesting to see if the spread of hitting talent is tighter across years, and not concentrate so much on the position.
Rob: yes, pure speculation, but a variable to consider.
Ross: there are about 5.4 pitches thrown per BB, 4.8 per K, and 3.3 per BIP. So, if you have a pitcher with lots of (BB+K) / PA, you can bet that that pitcher will have more pitches / PA than league average. Dan Quisenberry I will wager had the lowest pitch / PA count of any pitcher with at least 1000 PA in the last 50 years. (If not lowest, then in the bottom 10). Just for fun, go look at any active pitcher's (BB+K) / PA, and you will see a direct correlation to pitches / PA. MLB has the data from 1998-2003 on their site. Benitez is I think the #1 guy in (BB+K)/PA and pitches/PA (or pretty close).
Is this what you are getting at? Otherwise, please restate the concern.
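The per-event pitch figures above lend themselves to a quick estimate, sketched here in Python. This is not the extended pitch count model mentioned elsewhere in the thread; the function name and the two sample pitcher lines are hypothetical, and treating every PA that is not a BB or K as a BIP is a simplifying assumption.

```python
PITCHES_PER_BB = 5.4
PITCHES_PER_K = 4.8
PITCHES_PER_BIP = 3.3


def pitches_per_pa(bb, k, pa):
    """Estimated pitches per PA from a pitcher's BB, K and total PA."""
    bip = pa - bb - k  # assumption: everything else is a ball in play
    total = PITCHES_PER_BB * bb + PITCHES_PER_K * k + PITCHES_PER_BIP * bip
    return total / pa


# Two made-up pitcher lines over 1000 PA each.
contact_pitcher = pitches_per_pa(bb=30, k=50, pa=1000)
power_pitcher = pitches_per_pa(bb=120, k=300, pa=1000)

print(round(contact_pitcher, 2), round(power_pitcher, 2))
```

The higher the (BB+K) / PA, the higher the estimated pitches / PA, which is the relationship described in the reply to Ross.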
- higher OBA means more batters / game meaning tougher to get a complete game
I'm not sure why you are talking about the 50s, when the low OBA was probably in 1967/68, but let's say there are 25 batting outs / 27 outs (I realize that a pitcher with few baserunners will be closer to 27/27).
Anyway, if your OBA is .300, that means .3 safe plays per .7 outs, or 10.7 safe plays per 25 batting outs, or 35.7 batters. A .350 OBA works out to 38.5 batters. (Just an example here). So, as the OBA rises, so does the number of batters per 27 outs. With more batters available because of the run environment/playing conditions, the more pitches needed overall.
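The OBA example works out as follows in Python. The function names are made up for this sketch, and the 25 batting outs per 27 outs is the figure assumed in the post.

```python
def batters_per_game(oba, batting_outs=25):
    """Batters needed to record the given batting outs at a given OBA."""
    return batting_outs / (1 - oba)


def safe_plays_per_game(oba, batting_outs=25):
    """Batters who reach safely over the same span."""
    return batters_per_game(oba, batting_outs) * oba


for oba in (0.300, 0.350):
    print(oba, round(batters_per_game(oba), 1), round(safe_plays_per_game(oba), 1))
```

Multiplying the batters figure by an estimated pitches / PA then gives a rough pitches-per-game number for a given run environment.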
I agree with your last point that there are more variables at play for pitches thrown across era. I've got a working model, though I have no data to validate it against. Essentially, it shows Cy Young with, if I remember right, 2.8 pitches / batter, and Nolan Ryan with 3.9 pitches / batter. (or somewhere around there.) Eventually, I'll present a mathematical proof that shows the relationship between K,BB,BIP and pitches thrown. This is as much as I can say for the moment.
Sheehan: Pitcher Workloads (June 19, 2003)
Posted 4:34 p.m.,
June 19, 2003
(#8) -
tangotiger
This article would have been much more interesting if Joe had gotten a hold of a pitch count estimator like Tango's and actually checked how many pitches were being thrown per start by today's aces versus yesteryear's.
To expand on this point, and tie it in to Ross's point, our estimators are based on "all things being equal". I am reasonably confident that the extended pitch count model (unpublished) that I have works for pitchers from the 1990s to today. I wasn't certain how well it would hold up for the 1960s pitchers, and so I was very happy to see how well it matched Koufax.
So, for Koufax and for the 90s pitchers, we can say that the batter/pitcher matchups are pretty equal in terms of "all things". Because it worked on Koufax, will it work on all other pitchers of his era? I would bet yes, but I'm not sure. If it works in the 60s and 90s, should it work in the 70s? I would say yes, but Sheehan brings up a good point that maybe baseball changed the way it was played in the 70s. I can't just discount it, especially since Ryan and Carlton and a few others had kinda high pitch counts. If you look at the Ryan progression, he completely tails off at around age 30. It's kind of remarkable, and I don't know if he was injured or what happened. Maybe the Angels didn't want to wear him out so he could re-sign with them as a free agent? But again, I would bet that if we really looked at it, baseball was the same from the 50s to today, in terms of batter/pitcher matchups, and how often balls/strikes were thrown, etc.
But how about earlier? Well, again, the more you go back, the less likely things are the same. You get to the point where you have very few BB+K per PA across the whole league. Those pitcher/batter matchups may not be "all things equal".
To get back to the point, I agree the article would have been enhanced if he used my estimator or someone else's (it seems Nate/BP also has an estimator). However, the article is excellent, and gives a good push for people to tackle some of the issues, most notably, the talent distribution across eras.
Sheehan: Pitcher Workloads (June 19, 2003)
Posted 4:58 p.m.,
June 19, 2003
(#11) -
tangotiger
there are about 5.4 pitches thrown per BB, 4.8 per K, and 3.3 per BIP
Has that been consistent across eras?
It's not even consistent among pitchers of the same era. However, I've got a working model that takes the "type" of pitcher you are, and spits out the pitches / event. It's quite logical, but there's room for error. Because of the way that model works, it's transportable to other eras, assuming that the way batters/pitchers approach each other in terms of throwing/looking for strikes is the same. For this reason, I'm not too crazy about publishing this until I get some data to at least point me to the right answer.
So, if you have a pitcher with lots of (BB+K) / PA, you can bet that that pitcher will have more pitches / PA than league average
Well you can bet on it, but it may not be true. The average pitches thrown for each category may vary widely by pitcher.
It actually does vary widely. But, I think eric's point explained it very well in post 6. I won't extend my answer more than that, and my bet would be there. Of course I could be wrong, but that's why betting is fun!
Is this what you are getting at?
No. Although it is part of it. The question is what are the effects of the difference in K's and walks between eras, which is the way Sheehan is using it. It is perfectly possible that 50 years ago more 2 strike counts ended in a ball in play rather than a third strike. That does not change the number of pitches at all.
Sure, it's possible. As I said, we don't know what the strike/ball matchups were for pitcher/batter.
So, as the OBA rises, so does the number of batters per 27 outs.
Assuming that the number of double plays, caught stealings and runners thrown out advancing are all the same. I don't think they are.
I agree, which is why I added my provision. But again, what's the impact here? I'm sure I can come up with an r of over .95 between OBA and PA/27 outs historically (if I had the data back then).
With more batters available because of the run environment/playing conditions, the more pitches needed overall.
Assuming the number of pitches per batter is constant for all pitchers and regardless of the number of batters.
Assuming the same K and BB rates, I'd say yes, you are right about your first part.
I agree with your last point that there are more variables at play for pitches thrown across era
I think there is a fundamental problem with using average results for comparing elite pitchers. The question is does Roger Clemens have to face more batters and throw more pitches to accomplish the same things Steve Carlton did. The fact that Sean Bergman gave up a lot of baserunners or Bobby Witt struck out a lot of batters and walked a lot of batters doesn't seem to have much to do with that. And yet when we compare era totals they are included in the analysis - it may be there were fewer Bobby Witt's and Sean Bergman's pitching in the mid-60's but that doesn't make Steve Carlton's job easier than Clemens.
I think I understand what you are saying, but then I don't. I agree that you don't necessarily want to use the basic pitch count estimator, as it won't work too well on the extreme pitchers. But then again, that basic estimator was 2% off for Clemens and Koufax compared to their actual totals.
Sheehan: Pitcher Workloads (June 19, 2003)
Posted 7:51 p.m.,
June 19, 2003
(#12) -
tangotiger
Ross, just so I'm not going to go nuts again, here's what I'm going to do. You let me know if this will satisfy the issue.
Select
- all starter seasons
- from 99 to 02
- min 400 PA in that season
Calculate
- seasonal OBP
- seasonal PA / 9IP
Then run a regression of OBP against PA/9IP. I expect to get an r over .90, and more likely over .95.
My contention is that the more runners on base / batter faced, the more batters at the plate / 9ip, regardless of pitcher. This I think is rather obvious, so rather than wasting my time again, you tell me what you want me to run, and what contention you are positing. Thanks...
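Here is a sketch of why an r that high is plausible. It uses no real pitcher-seasons. It generates idealized PA/9IP values from the simplifying assumption of 25 batting outs per 9 IP, so PA/9IP = 25 / (1 - OBP), and measures how linear that relationship is over a realistic OBP range. Real data would scatter around this curve because of double plays, caught stealings, and so on. All names here are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Idealized seasons: OBP from .280 to .380, PA/9IP implied by 25 batting outs.
obps = [0.280 + 0.010 * i for i in range(11)]
pa_per_9 = [25 / (1 - obp) for obp in obps]

r = pearson_r(obps, pa_per_9)
print(f"r on the idealized curve: {r:.4f}")
```

The curve is slightly convex, but over this range it is close enough to a straight line that a linear regression loses very little.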
Sheehan: Pitcher Workloads (June 19, 2003)
Posted 11:30 a.m.,
June 20, 2003
(#17) -
tangotiger
(homepage)
So there is no reason to think that a pitcher of the same quality will need to throw more pitches to complete a game today than they did 20 or 30 years ago.
I don't think I was really talking about this, but I agree. The number of pitches a pitcher has to throw is dependent on the style of the pitcher/batter, in which both are dependent on the skillset of the pitcher/batter and the "run environment", which is partially provided by the pitcher/batter in question.
This is neatly evidenced by Bartolo Colon 2001/2002, who must have changed his pitching style drastically (assuming he maintained the same quality) in order for him to have that change in pitch count, walks and K changes, etc.
See above link for his stats. Looking at his 2002 to today (about 1400 batters), his pitches/batter was: 3.6. In 1999-2001, he averaged 4.0 pitches/batter.
His (K+BB)/PA rate in 1999-2001 was: 32%. In 2002 to today, he's at: 23%.
Looking at his HR/H:
1999-2001: 12.5%
2002-today: 10.6%
His (BB+H)/PA:
1999-2001: .319
2002-today: .297
So, Bartolo changed his style to the point that batters were no longer going deep in the count, probably because Colon was giving what looked like more "hittable" pitches (though they probably weren't). The net effect is that by not going deep in the count (or by batters not taking Colon deep, who knows), Colon:
- reduced his pitches/batter
- reduced his walk and K rates per batter
and as a side effect
- improved his overall performance level (though not necessarily to a statistically significant degree... I didn't check).... that is, maybe Colon's skillset remained the same, but he found a more optimal pitching style to increase his performance level.
Sheehan: Pitcher Workloads (June 19, 2003)
Posted 12:22 p.m.,
June 20, 2003
(#18) -
tangotiger
In 2002, Colon went to 3 balls or 2 strikes: 501/966: 51.9%
In 1999-2001, he did that: 41.6%
Boy, that's not at all what I was expecting. Assuming I didn't make a mistake somewhere, Colon managed to get to 3 balls or 2 strikes MORE when not concentrating on striking players out. Yet, somehow, he ended up with fewer pitches/batter. Not sure about 2-strike fouls changing. I can only guess he had more pitcher's counts, so that when he got to 2 strikes, he did that with getting fewer balls.
Let's see.... ok, here's how many balls/PA thrown, when Colon managed to get to 2 strikes:
year.. PA with 2 strikes... balls/PA
2002 455 1.58
2001 459 1.77
2000 432 1.77
Interestingly, he managed to get to 2 strikes while throwing fewer balls, which may explain some of his good performance (more pitcher's counts, and fewer pitches to get there).
Koufax Pitch Counts (June 19, 2003)
Posted 3:20 p.m.,
June 19, 2003
(#2) -
tangotiger
Rally: no they are not, but I'll see if I have permission to post them for you.
After Sabre-School Special (June 19, 2003)
Posted 3:46 p.m.,
June 19, 2003
(#2) -
tangotiger
Park: well, that makes a lot more sense. I should have verified what he did. That looks just about what you and I discussed a year or two ago, about putting the PF at the center of the time period.
28/29: well, .28 is my preference, because I look at extreme teams. .29 is the best-fit for actual all teams.
After Sabre-School Special (June 19, 2003)
Posted 7:47 p.m.,
June 19, 2003
(#5) -
tangotiger
I agree with Patriot that I don't think you'll see any difference (max 1 run / 600PA is my guess), especially since a random team really isn't that different from the average. A team .300 to .360 OBA is no big diff really.
After Sabre-School Special (June 19, 2003)
Posted 7:48 p.m.,
June 19, 2003
(#6) -
tangotiger
I mean no big diff for what's being proposed here.
Actual Pitch Log for Koufax, game-by-game (June 20, 2003)
Posted 1:21 p.m.,
June 20, 2003
(#3) -
tangotiger
(homepage)
Keith Woolner passed on the above link to me, which I remember reading, but I quickly forgot. Some excellent stuff in there, specifically about how the avg pitches / start was the same between the Old Dodgers, and our current pitchers, but that the distribution was much different, with starters being pulled out much much faster than today (and as well kept going longer and longer in other outings). Seems like managers have all agreed in the last 60 years that 100-120 pitches is what your top starter should average per start, but have not agreed on how that distribution should be reached.
Actual Pitch Log for Koufax, game-by-game (June 20, 2003)
Posted 2:41 p.m.,
June 24, 2003
(#6) -
tangotiger
Well, if you maintain your pitch count level and improve your BB rates, then you'll get to face more batters (which is what happens here). On top of which, he got better, meaning he got more outs/batter, and increased his chances for a CG.
I'm still shocked his overall average is essentially what a 1990s starter would get.
Making Money (June 23, 2003)
Posted 8:48 a.m.,
June 24, 2003
(#2) -
tangotiger
You don't have to look for the article as it is in the link of the title of this thread.
Your quote of mine was me referring to Voros' equation of
Team Revenue = (W% * $430,169,580) + (Metro Population * $3.46) - (Teams in Metro Area * $27,962,685) + (Home Playoff Games * $2,446,043) + (Per Capita Income of Area * $2,655.60) - $160,287,379 + ($22,906,159 if the team’s stadium is less than two years old).
So, the non-winning variables are: population, teams, per capita income. I meant "all things equal" among these three, and let's only focus on the relationship between winning and revenue.
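To make "all things equal" concrete, here is the equation above written as a function, with the three non-winning variables held fixed while wins change. The parameter names and the sample market are hypothetical. The sketch assumes W% is a fraction between 0 and 1, population is a head count, and per capita income is in dollars.

```python
def voros_team_revenue(win_pct, metro_population, teams_in_metro,
                       home_playoff_games, per_capita_income,
                       new_stadium=False):
    """Team revenue in dollars, using the coefficients quoted in the post."""
    revenue = (win_pct * 430_169_580
               + metro_population * 3.46
               - teams_in_metro * 27_962_685
               + home_playoff_games * 2_446_043
               + per_capita_income * 2_655.60
               - 160_287_379)
    if new_stadium:  # stadium less than two years old
        revenue += 22_906_159
    return revenue


# Hypothetical market, identical in both calls; only the win total changes.
market = dict(metro_population=5_000_000, teams_in_metro=1,
              home_playoff_games=0, per_capita_income=30_000)
rev_87 = voros_team_revenue(87 / 162, **market)
rev_88 = voros_team_revenue(88 / 162, **market)
marginal_win = rev_88 - rev_87
print(f"revenue from one extra win: ${marginal_win:,.0f}")
```

Because the equation is linear in W%, the value of one extra win over a 162-game season is the same in every market: the W% coefficient divided by 162.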
I also said that he has an extra variable for playoffs, which will force a curved relationship for the higher achieving teams. According to the Voros equation, each home playoff game adds 2.5 million$. If you are the "upper echelon" team, maybe you expect to add an extra 5 million$ of which you'd give 4 million to the players. So, instead of having an 80 million$ payroll to finance an 87 win team, you really have 84 million$ payroll, or 2 million$ instead of 1.85.
I also didn't say anything about any conspiracies, but rather that there's a fundamental equilibrium point that would force the free market to center around.
So, what exactly am I saying that you disagree with?
Making Money (June 23, 2003)
Posted 5:06 p.m.,
June 25, 2003
(#4) -
tangotiger
Steve, thanks for writing back.
I am surprised that the effect of revenue works as it did, rather than as a multiple of the other factors (like population, etc). Makes life easier though that way... And the playoff impact wasn't as great as you would have thought, though I am surprised.
It's good to remember that Voros' equation is based only on 3 years of data, and so, we would have issues there. It would be worthwhile to re-run for the last 10 years.
Making Money (June 23, 2003)
Posted 8:00 p.m.,
June 26, 2003
(#6) -
tangotiger
I suppose what we want is "disposable net income". NYC residents and workers also pay a city tax.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 2:39 p.m.,
June 24, 2003
(#2) -
tangotiger
Well, I'm not so surprised that Percy gets 1/4 of his batters when they basically don't count, but that that's the best figure, and that the top relievers get 1/3. That seems a bit high to me. I'm guessing 15-20% would be a better target?
I could understand if say Mariano comes into the 8th in a high-leverage situation and gets out of the jam, and then the Yanks take a big lead, making the 9th inning a mop-up job for Mo (though you could consider taking him out if that happens). But, this is not what usually happens with the current fireman.
Give me the usage pattern of the late 70s to mid 80s.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 4:25 p.m.,
June 24, 2003
(#5) -
tangotiger
Rally: I looked at Gossage from 82-86. His line was: 20,9,14,27,31. That's probably what we should shoot for.
Tribe: Yes, you are accurate. Remember what we are NOT measuring. We are NOT measuring the performance level of a pitcher. What we ARE measuring is the level of fire a pitcher finds himself in. If you are John Franco and you have a propensity to get men on base, and then work yourself out of a jam, well, that's a big fire you had to put out, even though you were the arsonist.
As well, the manager is making the call to leave Franco in after every batter. I can't say that "If Franco stays in, the LI is 1, but if Strickland comes in to bail him out, the LI is 2", can I?
To measure performance, you need a "win probability added" type of measure (which I have, and have shown some results in the past). What this does is combine the performance of the pitcher with the leverage of the situation. So, say you are at LI 1.0, and you give up a hit. That brings you to LI 2.0, and you give up a walk. That walk will count as "2 walks". Now you are at 3.0, and you give up a HR. That counts as "3 HR with 2 men on each". At this point, the game may be out of reach, at which point your LI is now 0.3.
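The weighting in that example can be sketched as follows: each event is counted at the LI in effect when it happened. This covers only the bookkeeping of "a walk at LI 2.0 counts as 2 walks", not a full win probability added calculation, and the function name is illustrative.

```python
from collections import defaultdict


def leverage_weighted_events(sequence):
    """Sum each event type, weighting every occurrence by its LI.

    `sequence` is an iterable of (leverage_index, event_name) pairs.
    """
    totals = defaultdict(float)
    for li, event in sequence:
        totals[event] += li
    return dict(totals)


# The outing described in the text: a hit at LI 1.0, a walk at 2.0, a HR at 3.0.
outing = [(1.0, "hit"), (2.0, "walk"), (3.0, "home run")]
weighted = leverage_weighted_events(outing)
for event, count in weighted.items():
    print(f"{event}: counts as {count:g}")
```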
Leveraged Index is NOT this measure. It's one-half of what you need to measure the performance of a reliever. Eventually, I'll combine the two for a complete measure. For the moment, take my LI and multiply it by Wolverton's ARP. That'll get you close enough.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 6:37 p.m.,
June 24, 2003
(#10) -
tangotiger
Nick, maybe you should sell someone at FOX on the idea! I love the story!
Jim, the example is based on figuring out what the win probability is at each situation, and the possible distribution of win probabilities based on each possible outcome. That "variance" gives you the leverage. Check out Phil Birnbaum's article that I posted elsewhere on this STUDIES section of Primer as he gets into it as well. I figured this mathematically, for all possible inning/score/base/out, with the score +/- 18 runs.
As for the cutoffs, they are arbitrary, but I tried to group them so you get 50% in the first grouping, and then about 10-15% in each of the other 4.
FJM: I started to do that a little while ago, with Urbina, Shuey, Stanton, and Benitez. Eventually I'll do it, but I'm just trying to find the best way to present it. My feeling is that I will not find a difference.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 8:06 p.m.,
June 24, 2003
(#12) -
tangotiger
I agree, when he was brought in would ALSO be a good metric. Heck, when he was REMOVED would also be a good metric too. Both are a snap to calculate as well. A manager therefore can do 3 things:
1 - when does he put him into the fire, and how big is that fire
2 - how long does he let him stay in the fire, and how big is that fire while he's battling it
3 - when does he take him out, and how big is the fire when he leaves
You can look at 1 and 3 and say that the difference in fire is his performance level, but
a - if he pitches multiple innings, then his offense can help/hurt
b - his fielders always help/hurt
Drinen handles #1/#3 by taking him "out" at the end of the inning, and inserting him back "in" at the start of the next inning. It's a good way to do it.
Like I said, there's a lot of descriptions you can generate using the LI concept. For the moment, I've only done #2. Drinen has done #1 and #3.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 4:41 p.m.,
June 25, 2003
(#14) -
tangotiger
(homepage)
What might be interesting for this chart is an indication of each reliever's percentage of the team's total reliever innings at each leverage category.
You are absolutely correct. I actually did this for the Yankees (see homepage link), and I should do this for all the teams.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 5:11 p.m.,
June 25, 2003
(#16) -
tangotiger
Walt, good idea! I can try to pick out say the "typical" LI for each class.
As for the % of distribution, I really wasn't sure what to expect. 45% for 0-0.5, 15% for 0.5-1, 10% for 1 to 1.5, 8% for 1.5 to 2.0, 7% for 2.0 to 2.5 among our group of relievers here. I guess that's an ok distribution, though I was surprised by the big dropoff from 45% to 15%.
Essentially, half the PAs in MLB have very little value.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 9:04 p.m.,
June 25, 2003
(#23) -
tangotiger
LI *is* normalized to 1.0, so your equation holds for PA, by definition.
From 1974-1990, the LI for starters was 1.01, and relievers was .98. I haven't checked the 99-02 period, yet.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 6:37 a.m.,
June 26, 2003
(#25) -
tangotiger
Doug, your concern was also expressed by post #13, and I addressed part of that in #14.
Btw, the overall LI, by team, varies by about 0.1. I don't think you will find quite the distribution differences that you might be looking for. But the concern is definitely valid.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 9:40 a.m.,
June 26, 2003
(#27) -
tangotiger
You know, I spoke too quickly. ARP includes base/out, but not inning/score. So, I can't use LI and multiply it by ARP, even for a basic version.
Yes, all you need to do is calculate the change in WE (win expectancy). This is what Drinen did in the other article I posted (WE before pitcher came in, after he left, and give difference to pitcher. In multi-innings, assume the pitcher left the game, and came back in, so that the change in offense won't affect him). I've actually also done this, and posted the results somewhere.
But, I cannot use LI directly to get into my change in WE.
The issue comes with "crediting" the fielders. Right now, I give it all to the pitcher (which is why I'm not too crazy about publishing my current results). Same with hitters, and I give all the credit to the hitter, and none to the runners.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 2:21 p.m.,
June 26, 2003
(#29) -
tangotiger
Jim, check out the other link I added, called "Win Probability Added".
What I intend on doing (eventually) is
1 - win probability added using inning/score/base/out
2 - win probability added using no context
3 - calculate hitter's LI
Take #3 and multiply by #2. This gives you his win probability, if he were to hit the same regardless of the leveraged situation.
Compare the result of this to #1. The difference is the player's "clutch" performance.
To measure his underlying clutch skill, you'd check year-to-year correlation.
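The three steps combine as in the sketch below. The inputs are invented numbers for one hypothetical hitter-season, in wins, and the function name is illustrative. It shows only the subtraction, not how the WPA figures themselves would be computed.

```python
def clutch_performance(wpa_with_context, wpa_no_context, hitter_li):
    """Clutch = actual context WPA minus what context-neutral hitting would give.

    wpa_with_context: #1, win probability added using inning/score/base/out
    wpa_no_context:   #2, win probability added using no context
    hitter_li:        #3, the hitter's average leverage index
    """
    expected = hitter_li * wpa_no_context  # #3 multiplied by #2
    return wpa_with_context - expected


# Hypothetical hitter: +2.0 wins context-free, average LI of 1.1, +2.4 wins in context.
clutch = clutch_performance(wpa_with_context=2.4, wpa_no_context=2.0, hitter_li=1.1)
print(f"clutch performance: {clutch:+.2f} wins")
```

A positive number means the hitter did more in high-leverage spots than his context-free line implies.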
Now, if you read the Hidden Game, I believe they did this kind of thing with the Mills' Brothers PWA (2 years only). We're in a position to do it from 20+ years.
Not now though.
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 11:02 p.m.,
June 26, 2003
(#30) -
tangotiger
I wrote this on Jan 2, in one of the Primer articles, and it bears repeating:
==========================
What we are after is *not* to maximize a pitcher's LI, but rather to maximize their leveraged-innings (LI x IP). LI of 1.00 with 120 IP will have the same win impact as 1.50 LI with 80 IP to a reliever. Of course, it's not that simple, as you have to take the totality of your starters and relievers, and maximize the leveraged innings for the good pitchers, and minimize the leveraged innings for the bad pitchers, such that all innings are accounted for. You have other constraints as well, with respect to the tiredness of a pitcher's arm, etc.
Mark Eichhorn, for example, had 200 leveraged innings (LI of about 1.3) in his great year. That is an excellent total.
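A minimal sketch of the leveraged-innings idea (LI x IP) from the quoted passage, using its two example relievers. The function name is illustrative.

```python
def leveraged_innings(li, ip):
    """Leveraged innings: average leverage index times innings pitched."""
    return li * ip


workhorse = leveraged_innings(li=1.00, ip=120)
fireman = leveraged_innings(li=1.50, ip=80)
print(workhorse, fireman)  # both usage patterns carry the same win impact
```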
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 1:48 p.m.,
June 27, 2003
(#32) -
tangotiger
That would be good to know, but I'm not sure what you mean about "adjusting".
LI is a reflection of the game state. It is dependent only on the inning/score/base/out.
Certainly, if Bonds were at plate, the LI would be different. Certainly, if Clemens is throwing 160 pitches already, the LI would be different. But, I'm trying to keep the player's identity out of the picture. (For this purpose anyway.)
Reliever Usage Pattern, 1999-2002 (June 24, 2003)
Posted 4:42 p.m.,
July 11, 2003
(#34) -
tangotiger
(homepage)
Phil,
See above link for a discussion on wearing out the pitcher. There's also a cool chart by MGL in post #25 I think it was. I'd ignore the back/forth between Steve and Ross... they no like each other.
As for your other comment, I think this definitely has value. I've been meaning to break down the performance of all relievers by classes of leverage. It would be interesting to see at the team level this difference, as you propose. Something tells me that we shouldn't find much difference.
2003 Win Shares, updated (June 24, 2003)
Posted 12:31 p.m.,
June 25, 2003
(#5) -
tangotiger
The highest single-season LI that I have found was 2.3. No way is any pitcher getting an average of 3.
10-15% of all batters for a team occur with an LI over 2.5. That translates to about 60 innings. I think the single-season record for a single pitcher is probably the equivalent of about 40 innings. The current relievers have the equivalent of 25 innings of LI over 2.5.
So, if you've only got 25 innings with an LI over 2.5 for the season, you've got 50-60 innings with an LI under 2.5.
2003 Win Shares, updated (June 24, 2003)
Posted 3:54 p.m.,
June 25, 2003
(#7) -
tangotiger
Oh, I wasn't trying to say whether WS was valuing Gagne correctly or not. Gagne's LI last year was 1.8, so if he has a 2 this year, I can accept that.
In the case of Gagne and Nomo, if you have something like
Gagne: 40 IP, 1.50 "component" ERA
Nomo: 120 IP, 2.50 "component" ERA
our LI comes in and treats Gagne's 40 IP as 80.
The question then is if you want a 1.50 ERA in 80 IP, or a 2.5 ERA in 120 IP. Compared to say an average of 4.00, that makes Gagne worth +22 runs and Nomo is +20 runs. Compared to a replacement of 5.00, then Gagne is worth +31 runs, and Nomo is +33.
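The run totals above come from the usual ERA arithmetic, (baseline ERA - pitcher ERA) x IP / 9, applied to the leverage-adjusted innings. A quick sketch, with a made-up function name:

```python
def runs_better_than(baseline_era, era, innings):
    """Runs saved relative to a baseline ERA over the given innings."""
    return (baseline_era - era) * innings / 9


# Gagne's 40 IP count as 80 leveraged innings; Nomo's 120 IP are taken as-is.
pitchers = {"Gagne": (1.50, 80), "Nomo": (2.50, 120)}
for baseline in (4.00, 5.00):
    for name, (era, innings) in pitchers.items():
        runs = runs_better_than(baseline, era, innings)
        print(f"vs {baseline:.2f}: {name} {runs:+.0f} runs")
```

The ranking flips between the two baselines because a lower baseline rewards the pitcher with more innings.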
I see no issue where you have a reliever having 1/3 the innings, but if he is more effective, then he could be worth the same.
Redefining Replacement Level (June 26, 2003)
Posted 10:31 a.m.,
June 26, 2003
(#2) -
tangotiger
Patriot, good stuff! I had forgotten about that equation. (I think I was looking for something where it would cap at a ratio of 1, and never got around to it.)
The other thing that we should remember is that Nate groups them by classes, when really, we want the aggregate to that point. For example, if you have a 10-yr time frame, you don't want the guy with 5000 AB, but *all* the guys up to the guy with 5000 AB. I'm not sure if I'm saying it correctly.
From that standpoint, I would guess that you might get similar results (as this would slow down Nate's curve) using completely different approaches, which as you say, is a great thing.
As well, we have to remember about "chaining", which is a concept that Patriot I think introduced. This is easier to think about with hockey, where you have 6 defensemen, each getting ice time based on their (perceived) talent. When the #1 guy goes down, the other 5 get more ice time, and the 7th defenseman comes in to play with limited ice time. The #1 guy was replaced by a combination of the other 5, and this 7th guy. When the #6 guy goes down, the #7 slides right into place. The "replacement level" is higher for the #1 guy than the #6 guy, because of the chaining effect. In baseball, it's a little different because of the positions not being so interchangeable, and the talent depth not being so even at each position/team.
Redefining Replacement Level (June 26, 2003)
Posted 4:40 p.m.,
June 26, 2003
(#6) -
tangotiger
Using a running total of Nate's numbers, I'll add a 3rd column to Patriot's chart
YEARS...Repl-Nate...Repl-Tango... running-Nate/Tango
1 .76 .81 .80
2 .84 .86 .85
3 .88 .88 .88
4 .92 .90 .91
5 .94 .92 .94
6 .96 .93 .95
and it caps out at 1.00 at 11 years.
Redefining Replacement Level (June 26, 2003)
Posted 1:45 p.m.,
June 27, 2003
(#13) -
tangotiger
A nice quick simple function would be
repl rate = 1 - 1 / (2n + 4)
where n = years
You get the following
years, repl rate
0.001 0.75
0.25 0.78
0.5 0.80
1 0.83
2 0.88
3 0.90
4 0.92
5 0.93
6 0.94
7 0.94
8 0.95
9 0.95
10 0.96
11 0.96
12 0.96
13 0.97
14 0.97
15 0.97
16 0.97
17 0.97
18 0.98
19 0.98
20 0.98
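The table can be regenerated directly from the function, as in this short sketch (the function name is illustrative):

```python
def repl_rate(years):
    """Replacement rate as a share of league average: 1 - 1 / (2n + 4)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1 - 1 / (2 * years + 4)


for n in (0.25, 0.5, 1, 2, 3, 5, 10, 20):
    print(f"{n:>5} {repl_rate(n):.2f}")
```

The rate starts at .75 for n = 0 and approaches, but never reaches, 1.00 as the time frame grows.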
Redefining Replacement Level (June 26, 2003)
Posted 6:25 p.m.,
June 27, 2003
(#17) -
tangotiger
I'm not sure why we need a "general" evaluation system.
A player does add PA-by-PA value. From that standpoint then, the comparison level is always the average player, since that is the environment he finds himself in. Average teammates, average opponents, average pitchers, average park, average everything. His value is derived therefore against the average player. The average player is worth "0" relative to average. The "0" win value corresponds to the average salary of say 4 million$ for a regular.
The problem is that you also have a time component. The more you play, the more you can keep somebody else out of a job (the bad guy hopefully). This is worth positive dollars. And the longer you play, the longer you keep someone out of a job. But if you are a .400 player playing for 15 years, then, you kept a GOOD player out of a job. That creates negative absolute value.
Anyone above the sliding time-dependent replacement level scale creates absolute positive value, even if he has negative relative value. If you drop BELOW the replacement level, then you've created negative ABSOLUTE value.
If we insist on one scale as a general evaluation system, I'd say to use the .450 scale or the 90% scale, as this corresponds to roughly say a player's 3 or 4 year career. But I really like the time-dependent scale.
Redefining Replacement Level (June 26, 2003)
Posted 11:05 p.m.,
June 27, 2003
(#22) -
tangotiger
I agree with Patriot that the starting point must be .500. You can argue that the final point should be something else like .450 or .400 or whatever.
But, the run values of all events, the win values of all events always presumes the actual context being played in. And the average actual context is average.
The reason that the HR has a 1.4 run value is because, given average teammates to get on base, average pitchers to hit against, average fielders, average park, average everything, we expect the HR to add, on average, 1.4 runs to the game.
So, .500 must be the starting point in actual game conditions.
Now, the next step is: "Well, how would the bottom of the barrel have done, given that average environment?" And, for one PA, you can argue that the bottom of the barrel is 80% of league average.
But the argument here is that the average 3-yr hitter, over his career, is 90% of league average. THAT's the guy that you have to beat. It's not the 2-month guy who gets the emergency callup and who has no business being in the majors. It's that guy, that 3-yr guy, who's your bottom of the barrel. Anyone below that is keeping you from finding that 3-yr guy.
It's like if you have a company, and you suddenly got a lot of work, and you need someone, anyone. So you hire some schlub. But, if you keep this schlub for 3 years, you know what happens? You start to lose business. You actually lose absolute dollars.
I don't see why the "emergency" baseline is the baseline we need to compare against all the time. Use that to compare against for an "emergency" time period (1 month). Use a different baseline for different needs.
I feel another 108 post thread coming...
Redefining Replacement Level (June 26, 2003)
Posted 1:01 p.m.,
June 28, 2003
(#26) -
tangotiger
I agree with Patriot's good explanation.
I also want to talk about "salaries".
Salaries have 2 components: value relative to average, and playing time. In terms of marginal dollars, each marginal win is worth about 2 million $ in salary. So, an average team (81 wins) will pay the average payroll (70 million$). If you are a player that is +1 win over average, you should get 2 million$ more than average.... but what average?
That's where playing time comes in. If the average regular gets 4 million$, then this player is worth 5 million$. If this guy did +1 as a backup, then maybe the backup average gets 2 million$, and this guy is worth 3 million$.
Enter replacement level. This guy, who is +1 over average, might be +1.5 over replacement (while a regular might be worth +3 over replacement).
Multiply by 2 million/win, and you get his salary worth (3 million or 6 million depending on the playing time).
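A sketch of that last step. The gap between average and replacement (0.5 wins for the backup's playing time, 2.0 for the regular's) is inferred from the +1.5 and +3 figures above and is an assumption of this example, as are the names.

```python
MILLIONS_PER_WIN = 2.0  # marginal salary per win, from the text


def salary_worth(wins_over_average, avg_over_replacement):
    """Salary in millions: wins over replacement times dollars per win.

    avg_over_replacement is how many wins an average player is worth over
    replacement in the same playing time (assumed values in this sketch).
    """
    wins_over_replacement = wins_over_average + avg_over_replacement
    return wins_over_replacement * MILLIONS_PER_WIN


backup = salary_worth(wins_over_average=1.0, avg_over_replacement=0.5)
regular = salary_worth(wins_over_average=1.0, avg_over_replacement=2.0)
print(f"backup: {backup:.0f} million$, regular: {regular:.0f} million$")
```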
This is just another dimension to replacement level, and one that is based on really the replacement level being about 80% all the time. On a one-year salary basis, I think this makes sense.
I'll be gone for a week, so have fun!
SABR 301- Win Probability Added (June 26, 2003)
Posted 7:12 a.m.,
July 2, 2003
(#4) -
tangotiger
I've only got a sec to reply to the first post.
No, I do not yet adjust for park or league or any other environmental condition (opposing pitcher or batter, etc). But I should.
To measure reliever's past effectiveness, you need some form of WPA, meaning inning/score/base/out at least, plus park/league preferably, and batter/fielders, too.
Wolverton does only base/out, which is also pretty good. For the moment, your best bet is to stick with Wolverton.
As for using LI, the thing is that that assumes that the pitcher is equally effective in all base/out, which is one reason you can't simply multiply it to Wolverton's. The other is that Wolverton already "LI's" the base/out, in effect. Therefore, I'd have to give you an LI based only on inning/score, for you to multiply by Wolverton's number.
Your other option is to calculate a reliever simply by using "peripheral ERA", and then multiply by LI. That might get you part way there.
Bonderman and Age 20 (June 26, 2003)
Posted 12:36 p.m.,
June 27, 2003
(#3) -
tangotiger
I would also propose that you use a "pitch-count estimator" instead of innings. Or at the very least, an estimate of BFP.
Estimating Pitch Counts (July 2, 2003)
Posted 10:12 p.m.,
July 2, 2003
(#5) -
tangotiger
bob, you are right, and I'm wrong. He's essentially giving out 2.8 pitches / non-K out (which as we know is too low). The intercept should go through zero, though.
UZR inter-positional linear correlations (July 6, 2003)
Posted 10:23 a.m.,
July 6, 2003
(#1) -
tangotiger
I think another good test is looking at players that switch teams, and see how they have their UZR change. I.e., Ventura next to Jeter and next to Reyrey, etc. Put all of the 3B UZR in one pile with good SS UZR, and the same 3B UZR in another pile with the bad SS UZR, and see if there's any relationship there.
UZR inter-positional linear correlations (July 6, 2003)
Posted 12:24 p.m.,
July 8, 2003
(#7) -
tangotiger
Mike,
That's very interesting! I looked at all players who played for multiple teams, and tracked their UZR/162 (weighted by the lesser of their games), to see if there was anything there for all players.
The biggest discrepancy was with Tor, followed by Bos, Det, Cin, ChA. On the flip side were teams who had the "advantage" go the other way: Min, LA, Ari, Tex, Bal.
This might suggest that the park factors employed by MGL are not good enough. Looking at Cincy specifically, there were 979 matched games.
Here's the breakdown
team team diff G
CIN MIL 65 37
CIN COL 51 143
CIN MIN 36 106
CIN TEX 32 48
CIN SEA 30 281
CIN KCA 2 61
CIN PIT 0 115
CIN SDN -6 48
CIN TBA -11 140
In almost all cases, a player playing for Cincy had a higher UZR than playing for another team.
Here's the report from Texas:
team team diff G
TEX COL 25 145
TEX CLE 4 53
TEX KCA -2 100
TEX OAK -14 47
TEX SEA -15 382
TEX CHA -24 217
TEX ANA -25 47
TEX NYA -31 42
TEX CIN -32 48
TEX SDN -38 48
TEX DET -98 145
That's 1274 matched games. Here is the match on the last line, TEX/DET, making up 145 games:
Juan Gone went from +15 in Det to -7 in Tex, over 53 games in RF. Kapler went from +55 in Det to -21 in Tex in CF.
As for why this might be, it could be the park or the pitchers (or something wrong with MGL's programs). That is, MGL has tried as best he could to isolate the park and pitchers, etc, so that all we are left is the fielder's performance to measure. It could be that the "batted ball velocity" isn't doing the job, and that maybe Texas pitchers suck so bad, that it's not being measured properly in the data. Or that Texas is such a hard place to field at, that we can't adjust it well enough.
Cincy has the opposite problem that maybe their pitchers are much better at controlling balls in play than the league, or that their park is much easier to play at (and the park adjustment does not reflect that properly).
Note: I chose a cutoff point of at least 30 games played with both teams in the above analysis.
Here is the full report using all players, with no min cutoff:
team diff G
TOR 8 1861
DET 7 2046
ANA 7 1026
CHA 6 1333
NYA 6 2200
CIN 5 2647
PHI 4 1044
CLE 4 2012
SLN 3 1714
BOS 3 1832
TBA 2 2232
SEA 2 3751
HOU 1 1765
CHN 1 3468
MIL 1 2674
SFN 0 1685
FLO -1 1322
NYN -2 2912
SDN -2 3193
MON -2 1856
ATL -3 2501
LAN -3 1842
PIT -3 1261
COL -4 3005
OAK -4 1886
KCA -4 2072
BAL -5 1808
TEX -6 3548
ARI -8 1271
MIN -13 727
Essentially, this suggests that Tor fielders are 8 runs over compensated in UZR (if their opposing teams are league average... I'd have to adjust for this as well... that maybe the teams that the Tor played for were +4 or something... this is similar to strength of schedule analysis).
UZR inter-positional linear correlations (July 6, 2003)
Posted 12:26 p.m.,
July 8, 2003
(#8) -
tangotiger
The "diff" is difference in UZR/162.
UZR inter-positional linear correlations (July 6, 2003)
Posted 12:38 p.m.,
July 8, 2003
(#9) -
tangotiger
Ughhh.. "opposing teams" is really "the non-Toronto team that the Toronto player has also played for at the same position".... replace "Toronto" with whatever team you want.
UZR inter-positional linear correlations (July 6, 2003)
Posted 10:17 a.m.,
July 9, 2003
(#10) -
tangotiger
I re-ran the above to include the standard deviation. What I did was assume a 70% success rate (p), and a sample number of plays = G x 4. And each play was worth .8 runs. Taking Toronto as an example, we get that 1 SD (for plays) is SQRT(.3 x .7 / (1861 x 4) ). Taking that figure, and multiplying it by .8 gives us the SD (for runs) per play. Converting this into a /162GP figure, I multiply that value by 162 x 4. So, for Tor, 1 SD (for runs per 162 GP) is 2.8. Their figure is 2.91 SD from their sample mean.
Now, how did all teams do? Only 10 of 30 were within 1 SD, when we would have expected 20. 23 of 30 were within 2 SD, and all were within 3 SD. So, I do think there is some park bias going on.
Here's the data
team diff G 1SD SD
TOR 8 1861 2.8 2.91
DET 7 2046 2.6 2.67
NYA 6 2200 2.5 2.37
CIN 5 2647 2.3 2.17
ANA 7 1026 3.7 1.89
CHA 6 1333 3.3 1.84
CLE 4 2012 2.6 1.51
PHI 4 1044 3.7 1.09
BOS 3 1832 2.8 1.08
SLN 3 1714 2.9 1.05
SEA 2 3751 1.9 1.03
TBA 2 2232 2.5 0.80
CHN 1 3468 2.0 0.50
MIL 1 2674 2.3 0.44
HOU 1 1765 2.8 0.35
SFN 0 1685 2.9 -
FLO -1 1322 3.3 (0.31)
MON -2 1856 2.8 (0.73)
PIT -3 1261 3.3 (0.90)
NYN -2 2912 2.2 (0.91)
SDN -2 3193 2.1 (0.95)
LAN -3 1842 2.8 (1.08)
ATL -3 2501 2.4 (1.26)
OAK -4 1886 2.7 (1.46)
KCA -4 2072 2.6 (1.53)
BAL -5 1808 2.8 (1.79)
COL -4 3005 2.2 (1.85)
ARI -8 1271 3.3 (2.40)
MIN -13 727 4.4 (2.95)
TEX -6 3548 2.0 (3.01)
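The standard-deviation calculation described in this post can be sketched as follows. The 70% success rate, 4 plays per game, and .8 runs per play are the assumptions stated in the post; the function names are only illustrative.

```python
import math


def sd_runs_per_162(games, p=0.7, plays_per_game=4, runs_per_play=0.8):
    """One SD of the UZR/162 difference, treating each play as a binomial trial."""
    plays = games * plays_per_game
    sd_success_rate = math.sqrt(p * (1 - p) / plays)  # SD of the success rate
    sd_runs_per_play = sd_success_rate * runs_per_play  # convert to runs per play
    return sd_runs_per_play * 162 * plays_per_game  # scale to 162 games


def sds_from_zero(diff, games):
    """How many SDs the observed difference is from zero."""
    return diff / sd_runs_per_162(games)


# Toronto: diff of 8 runs over 1861 matched games.
print(round(sd_runs_per_162(1861), 1))
print(round(sds_from_zero(8, 1861), 2))
```

For Toronto this gives roughly 2.8 runs for one SD and about 2.9 SD for the observed difference, in line with the table.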
What value firemen? (July 6, 2003)
Posted 7:14 a.m.,
July 7, 2003
(#3) -
tangotiger
According to the Voros link I put up last week, the "playoff factor" will increase the marginal $/win to 2 or 2.15 or something / win. It's certainly possible that his study would be improved greatly by increasing sample size, and that maybe we'll find a 1.5 million$/win for the average team, and 2.5 million$/win for teams in the hunt.
What value firemen? (July 6, 2003)
Posted 9:29 a.m.,
July 7, 2003
(#5) -
tangotiger
Since the media has mangled his original definition of how to use a bullpen, I wouldn't be surprised if they've mangled anything else he has said on the subject.
All I can say is that Rollie Fingers, Bruce Sutter, and Goose Gossage were fine and did not suffer any psychological depression or anxiety attacks.
What value firemen? (July 6, 2003)
Posted 12:39 p.m.,
July 7, 2003
(#7) -
tangotiger
I agree that the biggest difference is quantity. As I showed in an earlier article, the leverage situation of Gossage and Sutter in the 9th was pretty much what Percy and Hoffman have been getting. The difference is that Goss/Sutter have received equally high-leveraged situations in the 8th and even 7th, innings that Percy and Hoff don't see.
What value firemen? (July 6, 2003)
Posted 8:56 p.m.,
July 7, 2003
(#10) -
tangotiger
Vinay, that 80/87 thing is very interesting! What's the breakdown for say Pettitte?
jto: An "optimal" LI would be somewhere above 2.0. So, if the average RPW converter is 10 to 11, then a leveraged RPW converter would be about 5 for a top optimally used reliever. However, a top reliever gets an LI of about 1.7 for the most part, giving him a RPW of over 6.
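The numbers above are consistent with simply dividing the runs-per-win converter by the leverage index. That reading is an assumption on my part, as is the 10.5 midpoint used here; a minimal sketch:

```python
def leveraged_rpw(runs_per_win, leverage_index):
    """Runs-per-win converter for a reliever pitching at the given leverage index."""
    return runs_per_win / leverage_index


# 10.5 is taken as the midpoint of the "10 to 11" range quoted above.
print(leveraged_rpw(10.5, 2.0))            # optimally used reliever
print(round(leveraged_rpw(10.5, 1.7), 1))  # typical top reliever
```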
What value firemen? (July 6, 2003)
Posted 11:12 a.m.,
July 9, 2003
(#12) -
tangotiger
The causal relationships would be the following
1 - more talent+experience leads to more payroll
2 - more talent leads to more wins
3 - more wins leads to more revenue
So, if we take say the Yanks or Redsox, they do #1 (get good players), and they pay for it. At the same time, those players will do #2, and generate wins. #2 leads into #3, causing more revenue.
You can combine causal relationships #2 and #3 to form:
1 - more talent+experience leads to more payroll
2 - more talent leads to more wins which leads to more revenue
With #2, we know that when the talent adds 1 win, it adds 2.65 MM in revenue (of which say 1.85 will be redirected to the players).
If a team spends more initially, you are talking about #1, which is somewhat related (though not in direct causal effect) to #2.
I don't see how a team having a higher base of revenue, say like the Redsox, would be able to spend their disposable income any wiser than the Blue Jays.
What value firemen? (July 6, 2003)
Posted 3:12 p.m.,
July 9, 2003
(#14) -
tangotiger
Ah, gotcha.
Sure, I can buy that there may be other variables. I for one would have expected the increase to not be linear, but tied to the "fan base". That is, I would have expected that with more people to draw from that the Redsox increase in wins would be proportional to their fan base, rather than be as linear as every team. It seems that the only non-linearity (or at least the pronounced non-linearity) is tied into the playoff possibilities.
Until other variables are identified, the marginal wins x 2MM rule-of-thumb seems to be pretty good.
UZR, multiple positions (July 7, 2003)
Posted 9:10 p.m.,
July 7, 2003
(#4) -
tangotiger
ADP, your comments about high-school SS, or other sports (say like an avg NFL QB v avg NFL OT), are bang-on, and something that I've been talking about at fanhome for the longest time.
It's a given that the avg HS SS is better than the avg HS player at almost any other position (except for probably pitcher). So, if we were to expand MLB to 3000 teams, I think we can say the same thing. What if we only had 6 MLB teams? (The NHL had 6 teams for the longest time.) Well, I think we can see how maybe the avg LF or the avg SS in MLB might be better than at other positions. So, somewhere between 6 and 3000 teams, there's a balance where the avg player at each position is equivalent.
There's no reason that balance has to be at 26 or 30 teams. And really, we shouldn't even look for it. We also shouldn't expect that if you do find that the balance should exist for 30 teams that this should exist EVERY YEAR. Sometimes, the avg SS overall is better than average, or the avg 1B overall is better. Why balance to every year?
I think we can maybe accept that over a 25-year period, there would be a balance, but that's only to make our life easier. That trying to balance it to the last 2 or 3 runs might be not worth our while. And maybe that's true. But, to use a 1-year adjustment is just plain wrong.
UZR, multiple positions (July 7, 2003)
Posted 9:13 a.m.,
July 8, 2003
(#6) -
tangotiger
David, you are probably right that the avg 3B is probably better than the avg player overall (since I believe as a hitter, he is lg average).
I don't buy the "someone has to play that position", because you can make that argument about HS baseball too, or NFL. The avg QB gets paid far more than the avg at other positions. The avg HS SS is far better than the avg HS at most other positions. Every team needs a QB or SS.
This is useful not only for MVP discussion, but for how much to pay the player. The avg 3B, if the above analysis is correct (above average fielder, average hitter) should be paid more than the average player overall.
UZR, multiple positions (July 7, 2003)
Posted 10:33 a.m.,
July 8, 2003
(#7) -
tangotiger
I have revised my process. Please see article above, and page down to the "Revised" section.
UZR, multiple positions (July 7, 2003)
Posted 3:08 p.m.,
July 8, 2003
(#8) -
tangotiger
I revised the article again. In addition to the "Revised" section, look for the "Practical Application" section.
UZR, multiple positions (July 7, 2003)
Posted 10:29 a.m.,
July 9, 2003
(#10) -
tangotiger
Well, it is only based on 4 years of data, so just generally confident. If you want to bump some position up or down .003 runs / play (about 2 runs / 162 GP), I wouldn't disagree with you.
I'm a little annoyed about the LF-RF thing not being closer. But with the largest matched-pair being LF-RF, of all the matchups, that's the one that we should have the most confidence in. As well, position-wise, there's really very little difference in playing LF or RF (though I suppose if you wanted to add a trait like "what's his UZR against LH/RH while in LF/RF", that would be a good thing).
Generally speaking, the values I presented are in-line with the fielding spectrum.
As for the Yankees comment, putting Bernie at LF and Matsui in CF should be done immediately. Switching say Jeter for Ventura is probably a net loss, knowing the traits of these 2 players specifically. Generally, you want your better fielder at SS and worse at 3B, but when one guy is 27 and the other is 35 (or whatever), speed becomes critical. Putting say Soriano at LF and Bernie at 2B I think might also be a positive (after some time).
As for the process, what I did was take UZR / 162 GP and turned that into UZR / play. (using 4 plays per game for 3b, 3 plays for lf,rf, etc...). That gives you this list that I published
6 +.011 (6)
4 +.009 (5)
5 +.007 (4)
8 +.005 (4)
7 -.016 (3)
9 -.021 (3)
3 -.025 (2)
Then, that final table was generated using only the above figures (difference in runs / play, and number of total plays / 162 GP). If you look at the Hubie column, those would be the neutral "positional adjustments".
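The conversion between UZR / 162 GP and UZR / play can be sketched as below. Reading the parenthesized number on each row of the list as plays per game is my assumption (it matches the "4 plays per game for 3b, 3 plays for lf,rf" remark), and this only shows the unit conversion, not how the final table was built.

```python
GAMES = 162

# position number -> (UZR runs per play, plays per game), copied from the list above
RUNS_PER_PLAY = {
    6: (0.011, 6),
    4: (0.009, 5),
    5: (0.007, 4),
    8: (0.005, 4),
    7: (-0.016, 3),
    9: (-0.021, 3),
    3: (-0.025, 2),
}


def to_runs_per_162(runs_per_play, plays_per_game):
    """Scale a per-play figure up to a 162-game season."""
    return runs_per_play * plays_per_game * GAMES


def to_runs_per_play(runs_per_162, plays_per_game):
    """Inverse conversion: a per-162 figure down to a single play."""
    return runs_per_162 / (plays_per_game * GAMES)


for pos, (rpp, ppg) in RUNS_PER_PLAY.items():
    print(pos, round(to_runs_per_162(rpp, ppg), 1))
```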
UZR, multiple positions (July 7, 2003)
Posted 5:14 p.m.,
July 9, 2003
(#11) -
tangotiger
I'm looking at the LF-RF multiple positioning, and I split it up into "Primary LF", "Primary RF", "Neutral LF/RF", with the split being that the primary position is at least 50% more games than the secondary position. In each matchup, there were between 1100 and 1700 games, which is pretty good.
Anyway, when the player whose position was a primary LF got moved to RF, this was the change:
-7 runs to +2 runs.
So, right away, we know that the guy who moves from LF to RF is a below-average LF, and he becomes above average at RF.
What about the primary RF who moves to LF? He was +1 run at RF and +2 runs in LF. So, an average RF gets moved to LF, and he continues to be around average. In fact, he looks slightly better.
And what about the players with no primary positions? They were -2 runs in LF and -4 runs in RF, meaning that there was tougher competition in RF (and that the guys with no primary position were below average).
It seems that the next breakdown I should do would be based on:
primary LF - above average
primary LF - average
primary LF - below average
... how does each do when moved to RF?
Repeat for primary RF moving to LF, and non-primaries.
Not sure when I'll get to it, but I also want to repeat this for the 2b/ss, the other natural comparison point.
UZR, multiple positions (July 7, 2003)
Posted 10:11 a.m.,
July 10, 2003
(#12) -
tangotiger
Breaking down by Primary, Secondary and Neutral positions adds an interesting layer.
Let's look at the IF, starting with SS as the primary position. I will present the UZR runs / play at each position, relative to the league average of that position.
SS/2B (823): 0.000, -.004
SS/3B (647): +.001, +.008
So, what does this mean? In the 823 games where our player played at SS and 2B, with SS being the primary position of the player, he performed at league average level at SS, and slightly BELOW league average at 2B. Since we "know" that the average 2B is a worse fielder than the avg SS, we would have expected that our SS would have performed better at 2B. He didn't. Sample size is an issue. The other factor would be experience, that there's a natural dropoff in performance when playing at a secondary position. On the other hand, looking at SS/3B, we get pretty much what we expected. We have essentially an average SS performing very well at 3B.
Let's look at the 2B as the primary position
2B/3B (689): +.002, +.001
2B/SS (710): +.013, +.001
We see that the guys who shift from 2B to 3B are slightly above average fielders as 2B. They pretty much maintain that same level of performance at 3B.
The next line is the very interesting one. Of the players whose primary position is 2B, and they were asked to play SS, they were the cream of the crop, with a very very above average +.013 runs / play at 2B (the equivalent of +11 runs / 162 GP). When they played at SS, they performed essentially as the equivalent of a league average SS.
Finally, with 3B as the primary position:
3B/2B (638): +.022, -.008
3B/SS (513): +.017, +.007
The 3B to 2B shift is only done with very good 3B, and they end up performing below average at 2B. Interestingly, the same level of 3B also gets moved to SS, but he performs at an above average level at SS. That is, there is more of a dropoff going to 2B than to SS, even though the base talent level he is being compared to is higher at SS than 2B.
Whew! That's a lot of inconsistencies on the surface. With a sample of 670 games, at 5 plays per game, and .8 runs / play, one standard deviation is .006 runs. Essentially, I really don't have the sample size here to say anything with confidence. So, what I'm about to say in the rest of the article, I'm saying it without the numbers actually supporting me.
The SS to 3B move indicates that the avg SS is .007 runs / play better than the avg 3B. The 3B to SS move indicates a .010 runs advantage for the SS. Splitting the difference, and it works out to a .009 advantage. We can even say that the difference between the .009 and .007 is the "experience/familiarity/similarity" factor.
Doing the same process for SS to 2B (.004 advantage to 2B) and 2B to SS (.012 advantage to the SS), and we can say that the SS gets a .004 advantage (with that whopping .008 difference being the familiarity factor).
The 2B to 3B (.001 advantage to 3B) and 3B to 2B (.030 advantage to 2B) implies a whopping .015 advantage to the 2B, with the largest familiarity factor of the bunch as well. Essentially, it's tough to ask the players to switch between 2B and 3B.
Listing this mathematically, and we have:
3B + .009 = SS
2B + .004 = SS
3B + .015 = 2B
Trying to force a best-fit equation gives us the following spectrum among these three positions:
SS: +.004
2B: +.004
3B: -.008
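One way to arrive at a spectrum like this is a least-squares fit of the three pairwise gaps, with the position values forced to sum to zero. That is an assumption about the method, not something stated in the post. When every pair of positions has a gap, that fit reduces to averaging each position's signed gaps:

```python
# Pairwise advantages from the equations above, in runs per play:
# (better position, worse position, gap).
PAIRS = [("SS", "3B", 0.009), ("SS", "2B", 0.004), ("2B", "3B", 0.015)]


def zero_sum_fit(pairs):
    """Least-squares position values summing to zero.

    Valid when every pair of positions appears exactly once in pairs.
    """
    positions = sorted({pos for better, worse, _ in pairs for pos in (better, worse)})
    totals = {pos: 0.0 for pos in positions}
    for better, worse, gap in pairs:
        totals[better] += gap  # the better position gains the gap
        totals[worse] -= gap   # the worse position loses it
    return {pos: totals[pos] / len(positions) for pos in positions}


for pos, value in zero_sum_fit(PAIRS).items():
    print(pos, round(value, 3))
```

The fit cannot satisfy all three equations at once (they imply .009, .004 and .015, and .004 + .015 is not .009), which is the inconsistency the post attributes to sample size and familiarity.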
I think what this exercise shows is two important points:
1 - Sample size, sample size, sample size
2 - That the primary/secondary component is critical, since the familiarity/tools aspect comes into play. Specifically, the 2B/3B tools are different enough, such that the experience level to leverage those tools conspire to bring the whole house down.
Anyway, I don't think I can do much more without a large enough sample.
UZR, multiple positions (July 7, 2003)
Posted 10:14 a.m.,
July 10, 2003
(#13) -
tangotiger
By the way, based on the level of talent that is moved around, this supports what we already knew: the fielding spectrum is SS/2B/3B. Going from SS to 2B to 3B, the talent level is much less than going from 3B to 2B to SS.
UZR, multiple positions (July 7, 2003)
Posted 1:07 p.m.,
July 10, 2003
(#14) -
tangotiger
I had this laying around, so I thought it would be interesting to look at too.
Off LWTS by position, both leagues, 1989-2001.
Pos LWTS
ss -13
c -10
2b -6
cf -1
3b 0
lf 7
rf 9
1b 17
This looks pretty similar to the "Hubie" column. Let me put them up, side-by-side:
Pos   OffLWTS   FieldingAdj (Hubie column)
ss      -13       -11
c       -10       ???
2b       -6        -7
cf       -1        -3
3b        0        -2
lf        7         7
rf        9        10
1b       17         6
The first column is the actual Offensive value. The second column is how Hubie Raines would play, relative to lg average, at each position.
As you can see, the big difference is at 1B. I mentioned in the article that there's only so much damage a fielder can do at 1B, since the opportunities for damage are less. (Even more so at DH, where the fielding value of Hubie, relative to an average fielder at DH is ZERO.)
If you take col 1 and subtract col 2, you get the overall value at that position. The 1B has a sizeable advantage in this regard (i.e., we put too much penalty on the 1B fielding value.) In pretty much all other cases, it seems that the managers have properly balanced the fielding to their hitting.
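The subtraction can be spelled out with the two columns from the table above. Catcher is left out because its fielding adjustment is unknown; the variable names are only illustrative.

```python
# Offensive LWTS by position (column 1) and the Hubie fielding adjustment (column 2).
off_lwts = {"ss": -13, "2b": -6, "cf": -1, "3b": 0, "lf": 7, "rf": 9, "1b": 17}
fielding_adj = {"ss": -11, "2b": -7, "cf": -3, "3b": -2, "lf": 7, "rf": 10, "1b": 6}

# Overall value at each position: column 1 minus column 2.
overall = {pos: off_lwts[pos] - fielding_adj[pos] for pos in off_lwts}

for pos, value in overall.items():
    print(pos, value)
```

Every position lands within a couple of runs of zero except 1B, which stands out at +11.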
UZR, multiple positions (July 7, 2003)
Posted 10:28 p.m.,
July 11, 2003
(#16) -
tangotiger
Excellent post David! I pretty much agree with the sentiment.
I also think the key is to look at the replacement level to help us out. However, I think this applies far more in NFL than MLB. What does the best QB and the best tackle not in the NFL have in common? They are both worth exactly 0 dollars to the NFL team. AND they (pretty much) have no other position to play except QB and tackle, respectively. Therefore, for the NFL, I have no problem setting a player's worth compared to this best player at his position not in the bigs.
In baseball, CF/LF/RF are very much related, as are SS/2B/3B. And the pool to draw from 1B is the largest. C might be a unique position. P is. (And guys who can't make it as an SS/2B/3B can still make it as OF.)
Therefore, I think it's hard to find that replacement level by position. Which is why the multiple-position player analysis comes in handy. It's a built-in conversion factor between positions that you can chain for the whole spectrum. But, as we've found, going from a primary position to a less difficult secondary position is not always advantageous. This is because it's not THAT easy to make the switch (at least insofar as 2b/ss/3b is concerned).
I agree that what I think we will find is that if we use a long-term offensive-based positional adjustment (OPA), we'll be better off. 10 or 20 years or something.
I would NOT do like some people do and use a 1-year OPA.
UZR, multiple positions (July 7, 2003)
Posted 11:08 a.m.,
July 30, 2003
(#17) -
tangotiger
Someone sent me an email regarding figuring out position-neutral UZR. Here is my response:
===============================
For example, if you go to the bottom of [the above article], you'll see my hypothetical neutral fielder called Hubie Raines. This fielder, at SS, would be 11 runs below the MLB avg SS. Put him in RF, and he'd be 10 runs above the MLB avg RF, etc, etc.
So, in terms of trying to get his neutral fielder rating, and assuming all you know is that he's 11 runs below the MLB avg SS, this translates to "0" relative to ALL fielders.
Trying to look for a best-fit equation, I would then say that
UZR(neutral) = UZR(SS) x 0.667 + 8.
So, Jeter, if we think that he's really a -18 UZR at SS would come out as -4 as a fielder at a neutral position.
Doing the same for 1B: UZR(neutral) = UZR(1B) * 2 - 16
So, if you've got a top 1B who is +10 UZR relative to the avg 1B, he'd come in at +4 as a fielder at a neutral position.
We of course can't take it to this extreme. There are certain tools of a 1B that just won't translate to SS, and vice-versa. Zeile could be a +10, or more at 1B, but the tools that he can hide at 1B would be exposed at SS. And what he can leverage at 1B might not be exploited at SS.
It is better to think of "what would an average fielder do at this and that position". From this standpoint, now you can compare Jeter's -18 to Hubie's -11. (Jeter is 7 worse). And you can compare Zeile's +10 to Hubie's +8. Zeile is 2 better.
You've given them the same comparison baseline player, without getting involved with actually figuring out how to move Zeile or Jeter around.
It allows you to do these neutral positional comparisons, without losing the argument (which you would) about moving Zeile to SS or Jeter to 1B.
UZR, multiple positions (July 7, 2003)
Posted 3:20 p.m.,
August 4, 2003
(#18) -
tangotiger
Having said what I said above, and all that still holds, for those who wish to compute a UZR(neutral) for all positions, here's the best-fit equations:
Pos...slope... intercept
3 2.0 -16
4 0.8 6
5 1.0 5
6 0.65 8
7 1.3 -10
8 1.0 3
9 1.3 -13
So, for second base, it would look like this:
UZR(neutral) = UZR(2B) * 0.8 + 6
A 2B who has a -7.5 UZR as a second baseman would be considered an "average" fielder overall.
Again, I'll repeat it. This does not mean that all 2B who are -7.5 would be average at a neutral position. It's just a neat little way to say that an average fielder at a neutral position would be a -7.5 if he played 2B.
You'll end up with weird numbers, if say your 1B was +15 runs above the avg 1B. By this process, this 1B would end up as +14 relative to an average fielder at a neutral position. Hard to believe you can have a 1B that good. But, in reality, what I'm saying is that an average fielder at a neutral position would be +8 as a 1B. So, if you do have a 1B who was +15, then he must have really really leveraged his skills that added an extra 7 runs. What kind of skill? Maybe his height or his scooping or his charging the plate or whatever... things that essentially do not come into play anywhere near as much at a neutral position. And he was able to hide his lack of speed maybe, a skill that would be quickly exposed at a neutral position.
Hope all that was clear...
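For anyone who wants to apply the best-fit equations from the table in this post, here is a minimal sketch. The slopes and intercepts are copied from that table; the function name is only illustrative.

```python
# position number -> (slope, intercept), copied from the table in this post
BEST_FIT = {
    3: (2.0, -16),
    4: (0.8, 6),
    5: (1.0, 5),
    6: (0.65, 8),
    7: (1.3, -10),
    8: (1.0, 3),
    9: (1.3, -13),
}


def uzr_neutral(position, uzr):
    """UZR relative to an average fielder at a neutral position."""
    slope, intercept = BEST_FIT[position]
    return uzr * slope + intercept


print(uzr_neutral(4, -7.5))  # the -7.5 second baseman from the example
print(uzr_neutral(3, 15))    # the +15 first baseman from the example
```

The same caveat applies as in the post: this is a common baseline for comparison, not a prediction of how a given player would field somewhere else.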
UZR, multiple positions (July 7, 2003)
Posted 7:28 p.m.,
December 18, 2003
(#19) -
tangotiger
I'm going to make this thread "Required Reading of the Week". The article has pertinent information in trying to compare fielders at different positions. Please read the article and all posts herein, prior to commenting.
UZR, multiple positions (July 7, 2003)
Posted 10:59 p.m.,
December 18, 2003
(#21) -
tangotiger
I had forgotten that I did work on the primary/secondary position thing (post #12). It's definitely a critical component. As soon as I get my hands on the 2003 UZR, I'll update all this.
Ruane - Cost of outs, and speed (July 9, 2003)
Posted 11:41 a.m.,
July 10, 2003
(#4) -
tangotiger
I showed a few months ago the GIDP rate of Rickey, Coleman, and Raines, it was much less than the league average. Something like if the league average was around .115, these guys were at the .060 to .080 level. Can't remember though.
Ruane - Cost of outs, and speed (July 9, 2003)
Posted 1:23 p.m.,
July 10, 2003
(#5) -
tangotiger
(homepage)
The above link has the data I mentioned.
Fewest BB / PA, since 1947, min 150 PA (July 10, 2003)
Posted 2:39 p.m.,
September 29, 2003
(#4) -
tangotiger
Wells ended up the season with 20 walks, in 213 IP, while facing 887 batters.
After his incredible start, Wells ends up with 23 walks per 1000 batters. Still great, but not the record.
SABR 201 - Should we non-sac bunt more? (July 10, 2003)
Posted 1:15 p.m.,
July 11, 2003
(#4) -
tangotiger
For a great hitter, the break-even point is much higher.
If we look at the bases empty, 1 out situation, here are the run values for all events:
1b .27, 2b .40, 3b .65, hr 1.00, bb .27, out -.18 (for 1999-2002).
Let's assume you have a .333/.440/.675 hitter, what's his run value / PA? In this case, it works out to +.08 runs / PA.
That makes the breakeven point for this hitter to non-sac bunt at .500 (assuming he doesn't miss his bunt, and is now down in the count).
How often would Bonds, Pujols et al have to bunt to keep the fielders "honest". I think the fielders are giving that up, that risk that he might bunt, by playing them the way they do. It would probably cost them too much to do otherwise.
But for your Vizquels, the bunt is a good weapon.
Aaron's Baseball Blog - David Wells (July 10, 2003)
Posted 9:18 p.m.,
July 11, 2003
(#2) -
tangotiger
Aaron is the Santana of baseball writing.
SABR 201 - Custom Linear Weights (July 11, 2003)
Posted 7:42 a.m.,
July 13, 2003
(#2) -
tangotiger
Actual totals from 1974-1990.
SABR 201 - Custom Linear Weights (July 11, 2003)
Posted 2:44 p.m.,
July 17, 2003
(#5) -
tangotiger
It might be unclear, because it sounds like the same thing.
If the RE in state1 is .60 and the RE in state2 is .90, and a walk will ALWAYS get you from state1 to state2, then its value is .30.
To expand, there are 24 states to consider, and therefore, 24 (or more) transitions to consider. Each of the 24 start states will give you a run value for the walk (LWTS by the 24 base-out states). The weighted average of the walk (frequency of each walk by start state) will give you the LWTS value of the walk.
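The weighted average described above can be sketched like this. Only the .60 and .90 run expectancies come from the post; the other rows and all of the walk counts are hypothetical, and a real run would cover all 24 start states.

```python
# Each row: (RE before the walk, RE after the walk, number of walks from that start state).
# The "after" figure should include any run that scores on the play.
# Only the first row's RE values come from the post; everything else is hypothetical.
WALKS = [
    (0.60, 0.90, 50),
    (0.30, 0.55, 120),
    (1.50, 2.40, 10),
]


def lwts_value(transitions):
    """Frequency-weighted average change in run expectancy for one event."""
    total = sum(count for _, _, count in transitions)
    weighted = sum((after - before) * count for before, after, count in transitions)
    return weighted / total


print(round(lwts_value(WALKS), 3))
```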
I hope that was clear?!?
Workshop on Pitch Counts (July 14, 2003)
Posted 2:22 p.m.,
July 14, 2003
(#1) -
tangotiger
Just to qualify the "regardless of time period". I mean, as long as the time period has the modern day rules (4 balls, 3 strike, 2-strike foul), then the model should work. I suppose that this should apply to minor leaguers and college too.
GIDP (July 14, 2003)
Posted 4:00 p.m.,
July 15, 2003
(#2) -
tangotiger
Yes, that method would be better, so that at least we could control somewhat for the fielder influence.
Also comparing different pitchers on the same team would help too.
Previous DIPS research (July 22, 2003)
Posted 3:55 p.m.,
July 22, 2003
(#1) -
tangotiger
(homepage)
This is also a good link.
SABR 301 - Regression towards the mean (July 22, 2003)
Posted 10:10 a.m.,
January 14, 2004
(#1) -
tangotiger
I'm bringing this back, as a blast from the past. Let's make this thread the "required reading thread of the week".
SABR 301 - Regression towards the mean (July 22, 2003)
Posted 5:10 p.m.,
January 17, 2004
(#4) -
Tangotiger
(homepage)
The above is another good link.
Chances of making the playoffs (July 23, 2003)
Posted 3:25 p.m.,
July 24, 2003
(#6) -
tangotiger
If all teams were .500, then your guess is pretty good. Dackle is showing a 55% chance of winning the division in that case, looking only at the lead.
If you use dackle's expectation of the true talent of that team and its opponents (which I don't), then you get a much higher value.
Sabermetric Site to Visit - ESPN (July 25, 2003)
Posted 3:31 p.m.,
July 29, 2003
(#3) -
tangotiger
Well, you can visit my site for my individual work, and that's been categorized.
For a more all-encompassing list, you can go to baseballstuff.com , and I believe that James Fraser or Jim Furtado's sites (see links on the right on that site) have a pretty good list of such topics and categories.
My suggestion, to those who really want to do this, and it'll be a lot of work, is to be an "editor" on the open source directory, specifically for sabermetrics, and build the thing directly there. The value is that many search sites use that open source directory (including google I believe) to build their search engines.
Sabermetric Site to Visit - ESPN (July 25, 2003)
Posted 11:10 a.m.,
July 30, 2003
(#5) -
tangotiger
Yes, my idea was simply to have a set of links, just like John Skilton's site, but better (with categories along the lines of what I have on my site, and expanded).
If you want to create such an index, and if the open-source directory that Google uses doesn't want it, I'd be happy to post it here.
Competitive Balance (July 25, 2003)
Posted 11:46 a.m.,
July 26, 2003
(#4) -
tangotiger
Thanks guys...
One thing that I would want to do as well is to add the "expenses". The cost of doing business in NYC is a lot higher than in a typical city, so the potential net income, aside from salaries, is not as great as the revenue would suggest. This is complicated, especially if you try to figure out a cost for the ballpark.
Competitive Balance (July 25, 2003)
Posted 2:38 p.m.,
July 29, 2003
(#7) -
tangotiger
What the Voros equation shows is that, aside from winning, all of a team's base revenue is inherent.
The only way to increase revenues (long-term anyway) is to put a product on the field that performs better than the competition.
Any short-term gimmick ("Come see our prospects!", "Come see our European players!", "Come eat our free food") will not impact revenues for an extended period of time.
(This is just like stocks where fundamentals will drive the price of the stock long-term, but the technicals will set the pace short-term. The fundamentals of a stock, its expected future earnings, are equivalent to the talent level of a team, its expected future wins.)
The one doubt I have is the linear relationship between wins and revenue (except for that curve at the end for the playoff-bound teams). I really expected something like an extra win would result in a 2% increase in revenue (that is, proportional to the base revenue), instead of the linear relationship Voros is presenting. This is what Palmer reported in the Hidden Game, and this is what I found in a very very quick look last year (where Palmer and I only looked at attendance, which is our best stand-in for revenue). And this is contrary to Voros' more detailed, but smaller sampled, study.
Competitive Balance (July 25, 2003)
Posted 2:40 p.m.,
July 29, 2003
(#8) -
tangotiger
If not proportional to its base revenue, at least proportional to parts of its base revenue. And this may be where Jim is heading about a team being able to increase its revenues. I think they can only leverage a certain part of their base components, but they can't do anything, unless that leverage is based on the team winning.
Competitive Balance (July 25, 2003)
Posted 4:29 p.m.,
July 29, 2003
(#10) -
tangotiger
Sure thing.. you are talking about making an investment. Let me spend 200 million$ on a new stadium, or let me spend 5 million$ on new luxury boxes, or let me spend 500,000$ on a new scoreboard. I can drive more people to come to the game, and that will drive more revenue to my team.
I can choose, or not, to spend those revenues on my team to sustain that increase in attendance.
So, yes, there is an extra way to get more revenue. You can either get more revenue by getting more wins from your players (which may or may not cost you more money), or you can get more revenue by paying for it now (spend 10 million$ in expenses, and hope to get 1 million$ in revenue each year for the next 20 years, which discounted at a certain rate might be worth 11 or 12 or 7 million$ in revenue). If I can leverage it right, spending that money can generate more revenue.
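The "discounted at a certain rate" arithmetic above can be sketched as follows. This is only an illustration: the function name and the three discount rates are assumptions picked to show how the same 1 million$ a year for 20 years can be worth more or less than the 10 million$ spent.

```python
def present_value(annual_revenue, rate, years):
    """Present value of a level amount received at the end of each year."""
    return sum(annual_revenue / (1 + rate) ** t for t in range(1, years + 1))


# 1 million$ per year for 20 years, at three illustrative (assumed) discount rates.
for rate in (0.05, 0.07, 0.13):
    pv = present_value(1.0, rate, 20)
    print(f"discount rate {rate:.0%}: worth {pv:.1f} million$ today vs 10 million$ spent")
```

The lower the rate you discount at, the more that future revenue is worth today, which is why the same investment can look good or bad.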
Bats Right, Throws Left (July 29, 2003)
Posted 11:15 a.m.,
July 30, 2003
(#2) -
tangotiger
(homepage)
The list of bat R and throws L was simply to get a list of "great players" with that weird combination of handedness. The easiest listing I used was Runs Produced (though I don't subscribe to the name, I do subscribe to its idea). I could have used just runs, or just games, or just hits, or whatever. Wouldn't have made a difference.
As for subtracting the HR from R+RBI, I suggest you click the Homepage link above, and read the discussion I have with a few of the fanhome regulars. (Unfortunately, some of the data presented was lost in transition.)
Open Directory Project - Sabermetrics (July 30, 2003)
Posted 6:06 p.m.,
July 30, 2003
(#3) -
tangotiger
I agree.
The suggestion here is for Sylvain to do the work of coordinating all of the material, based on the help from Primates wishing to contribute their favorite links.
After Sylvain has gathered, sorted, indexed, and done his stuff, he can go to the editor, or become the editor, of the Sabermetrics directory at dmoz.org.
And, if they don't want him, or they take too long, Primer can be the official source to house the index.
Leveraged Index (LI) - by the 24 base-out states (July 30, 2003)
Posted 1:07 p.m.,
July 30, 2003
(#1) -
tangotiger
If someone wanted to come out with a clutch index, and if you didn't want to use the Linear Weights by the 24-base out states to do it, then using the LI is a good stand-in.
In essence, figure out every player's OBA and SLG by the 24 base-out states. Multiply that figure by his LI x lgPA for that base out state. Add up your totals and divide by the sum of lgPA.
Compare that figure to his overall figure. Voila. Clutch Index.
(Again, using OBA and SLG is problematic, especially since they don't have the same denominator. And I would include SF in the SLG calculation. But like I said, to do it right, use LWTS by the 24-baseout states.)
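Here is a minimal sketch of the weighting described above. The three base-out states and every number in them are made up for illustration (a real run would use all 24 states), and the function name is hypothetical.

```python
def leveraged_rate(states):
    """states: (player_rate, li, lg_pa) per base-out state.

    Multiplies the player's rate in each state by LI x lgPA, adds up the
    totals, and divides by the sum of lgPA, as described in the post.
    """
    weighted_total = sum(rate * li * lg_pa for rate, li, lg_pa in states)
    total_lg_pa = sum(lg_pa for _, _, lg_pa in states)
    return weighted_total / total_lg_pa


# Hypothetical data: (player OBA in that state, LI of the state, league PA in the state).
# The LI values are chosen so that their lgPA-weighted average is 1.
example_states = [
    (0.340, 0.7, 500),
    (0.350, 1.1, 300),
    (0.380, 1.6, 200),
]
overall_oba = 0.350  # hypothetical overall figure for the same player

leveraged_oba = leveraged_rate(example_states)
print(f"leveraged OBA {leveraged_oba:.3f} vs overall {overall_oba:.3f}")
print(f"clutch index (difference): {leveraged_oba - overall_oba:+.3f}")
```

Dividing by the sum of lgPA only returns a proper average if the LI values average out to 1 across the league PA, which is assumed here.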
Leveraged Index (LI) - by the 24 base-out states (July 30, 2003)
Posted 12:37 p.m.,
August 1, 2003
(#6) -
tangotiger
Probably because with a runner on 3b and 0 outs, he's always assured of scoring (not much leverage there), while with 2 outs, he's got a 30% chance of scoring. So, there's a big swing possibility there.
If a reliever comes in to the 9th inning with a 3 run lead, he's almost assured of winning the game (not much leverage there), but coming in with a 1 run lead, there's a big swing possibility of winning or losing the game.
Leverage is about swing possibilities, and nothing else.
Baseball Prospectus - Small sample size (July 30, 2003)
Posted 12:49 p.m.,
August 1, 2003
(#2) -
tangotiger
What would have been better would have been to break down by GB, Flys (and ignore pops and liners). The 1B being there or not would not affect the flyball out rate, and vice versa for the CF and groundballs.
If you really wanted to do something cool, extend that for multiple years. How did Giambi's teams throughout the years do on groundballs when he was there, and when his backups were there? Problem is that you might not get enough "backup" games to get any meaningful results.
Like I said, it's a fun exercise, but limited to the sample size.
Calculating Relative Stats (July 30, 2003)
Posted 6:09 p.m.,
July 30, 2003
(#1) -
tangotiger
In that new McGwire example, with Coors affecting the medium/long flyballs, he gets 173 long flys, and 82 HR.
Forecast 2003 - Interim Results (July 31, 2003)
Posted 10:10 a.m.,
July 31, 2003
(#1) -
tangotiger
The deltas were calculated as follows, and I'll take Bonds as an example:
Baseline: Bonds = 1.231, lg=0.760, comp=+0.471
Current: Bonds = 1.240, lg=0.757, comp=+0.483
delta = Absolute value of (+.471 minus +.483) = .01
We can do it as a percentage as well, but I think this will do for now.
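The same calculation as a short sketch, using the Bonds figures quoted above. The function names are just for illustration.

```python
def comp(player, league):
    """Player's figure relative to the league."""
    return player - league


def delta(baseline, current):
    """Absolute change in the player-minus-league figure between two snapshots."""
    return abs(comp(*baseline) - comp(*current))


bonds_delta = delta(baseline=(1.231, 0.760), current=(1.240, 0.757))
print(f"{bonds_delta:.3f}")
```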
Forecast 2003 - Interim Results (July 31, 2003)
Posted 11:54 a.m.,
July 31, 2003
(#3) -
tangotiger
standard deviation, in the above order:
hitters: .094, .099, .115
pitchers: 1.00, .83, .83
10 runs = 1 win? (August 1, 2003)
Posted 4:57 p.m.,
August 7, 2003
(#2) -
tangotiger
(homepage)
Every 10 marginal runs will add 1 marginal win.
So, you start with a team of 4.5 RS and 4.5 RA. What's their win%? .500.
Now, you have a team with 4.6 RS and 4.4 RA. What's their win%? If you use PythagaPat, it estimates .521. You can also try running the Tango Distribution (see above site), and you get .520.
Try 4.8 v 4.2, and you get .563 with PythagaPat and .561 with TD.
Try 5 v 4, and you get .604 with PP, .601 with TD, and .600 in real-life.
The equation: W% = .500 + (RS-RA)/10 will lead to a similar answer to the above.
It is a nice shorthand, but for more rigor, I suggest the PP or TD processes.
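To compare the shorthand with a Pythagorean-style estimate, here is a rough sketch. It assumes the usual PythagenPat form, where the exponent is runs per game raised to a small power. The 0.29 used below is an assumption, picked because it roughly reproduces the figures quoted above. The Tango Distribution is not reproduced here.

```python
def pythagenpat(rs, ra, power=0.29):
    """Win% from runs scored/allowed per game; exponent depends on the run environment."""
    x = (rs + ra) ** power  # assumed form: exponent = (runs per game) ** power
    return rs ** x / (rs ** x + ra ** x)


def ten_runs_shorthand(rs, ra):
    """W% = .500 + (RS - RA) / 10, with RS and RA expressed per game."""
    return 0.500 + (rs - ra) / 10


for rs, ra in ((4.5, 4.5), (4.6, 4.4), (4.8, 4.2), (5.0, 4.0)):
    print(f"{rs} v {ra}: PythagenPat {pythagenpat(rs, ra):.3f}, "
          f"shorthand {ten_runs_shorthand(rs, ra):.3f}")
```

The two stay close around a 9 runs per game environment. The gap widens as the run differential grows or the run environment changes.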
John Jarvis SABR presentation on the IBB (August 1, 2003)
Posted 2:00 p.m.,
August 1, 2003
(#2) -
tangotiger
Excellent point on the "rolling-forward" concept.
I have my own method of handling the IBB issue, and it has similar variables to Jarvis'. I do not use SLG or BA, but rather a tailored version based on the 24 base/out state that we are in (much less when looking at the IBB situation). I would definitely use the LWTS by 24-base out states, rather than SLG and BA.
I'm also disturbed that Jarvis only looks at the recent performance to establish the player's performance level, because the regression deemed that to be the best fit. What you should use is the best estimate to a player's true talent level. Maybe at a high group level, the 2-week performance does the job, but at an individual player's level, it definitely can't.
All in all, I'd have to give this thing a few more readings.
Tippett and DIPS (August 1, 2003)
Posted 12:58 a.m.,
August 2, 2003
(#30) -
tangotiger
Year-to-year correlation, pitchers, from 1969-2002
minPA   HR/PA   BB/PA   K/PA   bipH/PA   n
500     0.20    0.45    0.61   0.22      2814
200     0.13    0.36    0.56   0.17      6690
bipH: H-HR
n: number of pitchers
Seems to me that if HR/PA and bipH/PA are pretty close, we'd expect (2b+3b)/PA to also be close.
Using /PA or /BIP or /(AB-SO), I don't think, will make much difference.
Tippett and DIPS (August 1, 2003)
Posted 9:46 p.m.,
August 2, 2003
(#41) -
tangotiger
I was a little surprised by my results, esp since the HR is so related to the park too.
However, Tippett, in a much larger study, shows the "r" of the HR to be much higher.
Lots more work to do.
Tippett and DIPS (August 1, 2003)
Posted 4:19 p.m.,
August 3, 2003
(#44) -
tangotiger
Please note that every size sample comes with its own error range.
As well, the different environment almost forces you to start separating by era.
It could very well be that Tippett's findings and my findings are consistent with being drawn from the same population, and that the differences can be explained by chance. I don't know that. But, as with everything, we should always report the margin of error.
Also note that we did not use the same denominator.
And, we should also try to control for change in home team.
I don't expect to do much work in the near future, but if I do, I'll report whatever I find.
Tippett and DIPS (August 1, 2003)
Posted 10:00 a.m.,
August 4, 2003
(#49) -
tangotiger
Voros is an excellent sabermetrician. Fanhome has a great group of guys who post consistently with the same handle (which is as close as you'll get to a signature.)
Tippett and DIPS (August 1, 2003)
Posted 10:50 a.m.,
August 4, 2003
(#53) -
tangotiger
By the way, I once dared ask if Babe Ruth would have been an average hitter today, and I went through a rigorous, thorough analysis of how he would hit today. And based on the assumptions and analysis, which NO ONE questioned, I was forced to conclude, at that time, that Ruth would have been an average hitter today. My conclusions were questioned, but instead of going back to where I went wrong, people came back with other arguments.
Afterwards I realized that my error was in not handling regression towards the mean (of which I was not aware at that time), the most important concept any sabermetrician should be aware of.
I'd hate to think what people would have called me for making the first statement that Ruth might have been an average hitter today, while not recognizing that I corrected myself in further research.
This is why we have peer review after all, something that Voros was extremely open to, publishing virtually all of his findings. This is a far cry from some sabermetricians who will only publish their results (which I do from time to time).
Now, Tom Tippett was very thorough and honest with everything he's done, issuing corrections and addendums to his conclusions, based on new material that he's been made aware of. He's opened himself up to peer review, and so, we should try to further his findings.
So, can we get back to the matter at hand? Let's talk about the topic, rather than people's opinion of the topic, and people's opinion of people's opinion of the topic.
Tippett and DIPS (August 1, 2003)
Posted 12:11 p.m.,
August 4, 2003
(#56) -
tangotiger
When you post under a different handle than what you normally use, the point is exactly for the reason you are specifying: to protect your good name, from the darker but still real thoughts that you have. If it's not cowardice, it's dishonest.
(If it's not cowardice or dishonest, what in the world would you have to do to be a coward? It's like Reagan saying "no, that's not a threat... that's a promise". No, it's a threat, because there's nothing worse that you can do to make that "promise" a threat.)
If I want to say that Bill James "just doesn't get it" about linear weights, and that "Runs Created is dead", those are pretty harsh words, and I'll stand by them. For me to say that with a bag over my head... what does that do?
And before anyone asks, Tangotiger is my handle or nom de plume, one that I use with the same thought and respect that I would use with my real name. For privacy's sake, I don't give out my real name. (I figure Mark Twain and Dr. Seuss don't have a problem with nom de plumes, and those "names" are much more valuable and well-known than Samuel Clemens and Theo Geisel.)
Tippett and DIPS (August 1, 2003)
Posted 12:28 p.m.,
August 4, 2003
(#57) -
tangotiger(e-mail)
And that's my last comment on "etiquette". If anyone else wants to talk about this issue, send me an email, and I'll open up a separate thread, where those who want to continue this, will have their chance.
Tippett and DIPS (August 1, 2003)
Posted 3:42 p.m.,
August 4, 2003
(#67) -
tangotiger
I was thinking of calling the book "Tangotiger presents [The Book] by Mitchel Lichtman".
Baseball Prospectus doesn't put bylines to most of what they write in their own book. Maybe the "brand" of BP or the "brand" of Tangotiger is good enough. Maybe I can spin off magazines like Oprah and Rosie. Anyone want to design a "T" logo?
Seriously, I'll probably "come out" by that time... not that there's anything wrong with the way things are now.
Tippett and DIPS (August 1, 2003)
Posted 9:57 a.m.,
August 5, 2003
(#70) -
tangotiger
Triples, relative to doubles, is almost definitely due to: speed, handedness, park, OF fielding.
How do we know? Well, the age progression of triples/(2b+3b) is almost exactly the same as sb/(1b+bb). Triples is easily an artifact of speed.
Other variables, like the batter's handedness (more balls hit to RF) and the park, would explain some of the individual variations.
Extended Pitch Count Estimator (August 4, 2003)
Posted 9:10 a.m.,
August 6, 2003
(#2) -
tangotiger
Bob, let's take it backwards. If let's say you construct a model where a pitcher, with every pitch, will do the following to a batter:
make contact for a fair ball: 90% of the time
others (fouls, swing and miss, or called ball/strike): 10%
I think it's easy to see that at some point, that 10% will eventually lead to a walk or a K. So, what value for "others" would I have to set to guarantee that the PA will end with a fair ball? Mathematically, I have no choice but to set "others" to 0%.
Now, this is not exactly how baseball works. Every count offers the pitcher/batter matchup different expectations for the batter to get a fair ball. And in fact, every individual pitcher/batter/count matchup has their own rates.
But, to the extent that I want the BIP rate to be 100%, I have to put the pitches/PA at 1. We also know that the league average is around 3.7 to 3.8 pitches when the BIP rate is around 73 to 76%. So, we have that fixed point. We also have the two sample points provided by Randy Johnson (57%, close to 4 pitches/PA) and Brad Radke (81% and close to 3.5 pitches/PA).
So, given those 4 points (and some other behind-the-scenes work that I'm doing), those equations come out. It maxes out to 3.5 pitches / BIP, 5.5 / BB and 4.9 / K, though I suspect that a pitcher with very high BB and K would probably throw more pitches than I'm showing here (will go to 3-2 more than someone else). It may be that I also need a function for BB that is based on the BIP rate AND the K rate.
In any case, I think what we have here is a reasonable basis to further the research on estimating pitch counts.
Extended Pitch Count Estimator (August 4, 2003)
Posted 1:05 p.m.,
August 6, 2003
(#4) -
tangotiger
Bob, what you are saying is that the penalty for the walk is so high that a pitcher would rather finally give a fat one down the middle than to give up a walk.
That a pitcher/batter will work the plate to the point where the pitcher can throw at the corners and risk a ball, or a batter will wait for a fat one and risk a strike.
But, don't forget there are also swinging strikes. If the pitcher has the batter 0-2, it's very possible therefore that he can swing and miss, and thus the strikeout.
What you are saying is that the batter has the chance of swinging and missing on 0-0 and 0-1, but that he cannot at 0-2. Is that because the price of the strikeout is so high that he just absolutely has to get the ball on the bat, rather than take a big cut and maybe get a solid hit on it?
In the "real-world", there are a few pitchers that I estimate had 95% BIP rate (back to the early 1900s). From that standpoint, the pitches / BIP checks in at 3.0. Even at 99% BIP rate, I'm estimating 2.7 pitches / BIP.
So, from 99% BIP to 100%, I'm going from 2.7 to 1.0. I really have no reason to make it go to 1.5 or 2.2 or anything else.
I agree, it's an interesting scenario, but one which we probably don't need to worry about, as you are also saying.
(There were 2 pitchers in the beginnings of baseball with no walks and Ks and at least 150 PA, but they used different rules for balls and strikes.)
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 3:36 p.m.,
August 5, 2003
(#1) -
tangotiger
Lower the bar to at least 250 PA in consecutive years, and you get the same order of results (all r are about .05 less than the above ones).
Event r
K 0.74
BB 0.61
1B 0.40
HR 0.30
XBH 0.22
1bBIP 0.18
2bBIP 0.17
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 4:21 p.m.,
August 5, 2003
(#6) -
tangotiger
You know what's even more weird? The year-to-year correlation for single/BIP and (2b+3b)/BIP for that second class was .18 and .17, right? The year-to-year correlation for (1b+2b+3b)/BIP was .15.
I think, though I'm not sure, that this must imply some negative relationship between 1b and 2b+3b. This may be due to the GB/FB tendency of the pitcher (FB pitcher has more xbhits and outs, than a GB pitcher).
(For the 500 class, those numbers are .25, .21, .20)
As for what it says of DIPS, there's no change. The year-to-year r is .20 as has been reported by many people many times with very different data sets. It's still our best guess that if a pitcher has a (1b+2b+3b)/BIP rate of .320 and the league is .300, then a pitcher's "true" talent, based on the BIP, is .304 (80% regression towards the mean, or 1-r). This applies to pitchers with 500 to 1200 PAs.
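A small sketch of that regression step, using r as the share of the observed difference from the league that you keep. The function name is only illustrative.

```python
def regress_to_mean(observed, league, r):
    """Keep r of the gap from the league mean; regress the other (1 - r)."""
    return league + r * (observed - league)


# The example from the post: .320 observed, .300 league, r = .20 (80% regression).
estimate = regress_to_mean(observed=0.320, league=0.300, r=0.20)
print(f"best guess of true rate: {estimate:.3f}")
```

With an r this low, the estimate ends up only a few points away from the league average, no matter how extreme the observed rate is.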
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 4:22 p.m.,
August 5, 2003
(#8) -
tangotiger
I think I do agree that fielding and park may play a much larger role on flyballs than ground balls, and therefore, we'll see the pitcher's influence relatively less.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 5:22 p.m.,
August 5, 2003
(#19) -
tangotiger
I really gotta go run, but here you go. I broke up the pitchers with at least 250 PA in both years into "GB" and "FB" pitchers.
I ran the correlation only for the xbhBIP category. The FB pitcher's year-to-year r was .10, while it was .19 for the GB pitchers. Seems to me that park and OF fielders play a big part here.
Ok... I'll do the same for 1bBIP: .13 for GB pitchers, and .15 for FB pitchers. Again, makes sense.
Sorry, but I don't have the breakdown by FBhits, GBhits, though that would be very useful.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:49 a.m.,
August 6, 2003
(#30) -
tangotiger
We are not trying to establish if a pitcher has a skill, even though I and others are saying that, when we look at the year-to-year correlation.
What we are really saying is "does this particular metric correlate well year-to-year.... and if it does NOT correlate well year-to-year, then we should not be using it as a basis to predict the next year's metric".
So, if we replace the "ability" talk with the "metric's persistence", I think we'd be more accurate.
So, regardless of the extent to which a pitcher has a skill at preventing hits on balls in park, we are saying that:
[Official quote]
the metric "hits per ball in park" has an r of about .20 among pitchers with 500 to 1200 PA, and therefore we need to regress that metric heavily (80% for the group, which may not necessarily apply to the individuals to the same figure), if you want to predict next year's metric.
Even having next year's metric still does not tell you about the pitcher's true underlying skill at preventing hits on balls in park. Just to the extent that we can measure this underlying skill, that's our best guess as to the expected outcome of that skill, with a [insert number] margin of error.
It may very well be that if we look at very specific breakdowns by zones, opponent, fielders, park, weather, etc, that we CAN ascertain what a pitcher's skill is at preventing hits on balls in play (see: PZR). It's just that, for the moment, the metric called "hits per ball in park" does not do a good enough job at establishing the pitcher's skill with "hits per ball in park". (This would be similar to how ERA, earned runs per 9 innings, does not do a good enough job of establishing a pitcher's skill at allowing "earned runs per 9 innings".)
[End Official Quote]
************
I may be completely wrong, but the numerator is irrelevant to establish the "strength" of the correlation. Triples/PA for a hitter I believe has an r over .50.
Think of it this way. Say I do: x = Triples/PA * 10 + .300, and then say newRate = x / PA. And I did a correlation year-to-year with either Triples/PA or x/PA.... I'm almost positive that my "r" will be identical.
It's the denominator that counts, not the numerator.
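The narrow point about rescaling can be checked directly: multiplying a rate by a constant and adding a constant leaves the correlation unchanged. The rates below are made-up numbers, and the sketch says nothing about whether the size of the rate matters through random variation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical triples/PA for six hitters in two consecutive years.
year1 = [0.004, 0.010, 0.002, 0.015, 0.007, 0.001]
year2 = [0.006, 0.008, 0.003, 0.011, 0.009, 0.002]

# The transformation from the post: rate * 10 + .300
scaled1 = [rate * 10 + 0.300 for rate in year1]
scaled2 = [rate * 10 + 0.300 for rate in year2]

print(f"r on raw rates:    {pearson_r(year1, year2):.4f}")
print(f"r on scaled rates: {pearson_r(scaled1, scaled2):.4f}")
```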
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:47 a.m.,
August 6, 2003
(#32) -
tangotiger
Tom or Tango or Tangotiger is good.... I'm not old enough to be a mister.
Ok, I just ran the following test, and perhaps you can tell me what it means. I took 5 pitchers each with 1000 PA, and I randomly gave them, for each PA, a double, a single, or an out, at the rate of 0.1, 0.2, 0.7.
Therefore, we "know" what their true rates are. And we give them a full season to let their true rates manifest themselves.
Then, I did the same for year 2.
As an example, here are singles allowed, year-to-year, for the first 4 of the 20 pitchers in my group.
203,201
208,192
211,196
199,207
Now, since we know, absolutely know, that it's the same talent rate, then we should be able to explain the "r" based strictly on some statistical principle, probably standard deviation. [I'll let you insert that here.]
Anyway, for these 20 pitchers, here are their year-to-year r
2b: .18, 1b: .47, out: .11
Wouldn't we have expected the out, with the highest numerator, to have the highest r, based on your previous explanation?
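The first test can be sketched like this. It is not the original code, just one way to set it up with the rates given above. Since every pitcher has identical true rates, the expected year-to-year r is zero for all three events, and with only 20 pitchers the sample r will bounce around a lot (its standard deviation is roughly 1/sqrt(n-1), so about .23).

```python
import random
from math import sqrt

EVENTS = ("2b", "1b", "out")


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def simulate_season(rng, pa=1000, rates=(0.1, 0.2, 0.7)):
    """Give one pitcher a double, single, or out for each PA at the given true rates."""
    outcomes = rng.choices(EVENTS, weights=rates, k=pa)
    return {event: outcomes.count(event) for event in EVENTS}


def year_to_year_r(n_pitchers, seed=0):
    """Simulate two seasons per pitcher (same true rates) and correlate the counts."""
    rng = random.Random(seed)
    year1 = [simulate_season(rng) for _ in range(n_pitchers)]
    year2 = [simulate_season(rng) for _ in range(n_pitchers)]
    return {e: pearson_r([s[e] for s in year1], [s[e] for s in year2]) for e in EVENTS}


for n in (20, 2000):
    print(n, {e: round(r, 2) for e, r in year_to_year_r(n).items()})
```

Re-running with different seeds at 20 pitchers shows how much of any single set of r values is just noise. At 2000 pitchers all three r should sit close to zero.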
Now, what I did for a second test was take the same 20 pitchers, but this time, change their talent rates in the second year. For example, allow a .10 doubles rate in the first year, and make it .08 in the second year for 1 pitcher, or .12 rate in the second year. In essence, I'm trying to change the talent rate of my pitcher year-to-year to try to get a lower "r".
Here was the results of that:
2b: .10, singles: .33, outs: .35
I'm not sure what this means, if anything. Perhaps having only 20 pitchers is really limiting, and maybe I should redo with 50 or 100 pitchers.
I look forward to your comments...
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:48 a.m.,
August 6, 2003
(#33) -
tangotiger
In my initial comment, "5 pitchers" should read "20 pitchers".
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 1:52 p.m.,
August 6, 2003
(#38) -
tangotiger
Very very interesting!
A couple of things. First, what you are showing is that with a pitcher's singles and extrabase skills consistent year-to-year, that the correlation among a group of 10,000 pitchers with exactly 1000 PAs each was .46 for singles and .28 for doubles.
These figures are virtually IDENTICAL to what I have presented at the top of this page. That is, GIVEN that a pitcher has a set skill, the best year-over-year r that you can hope for is .46 and .28.
More specifically, the year-over-year r that I have presented is consistent with a pitcher having a skill where the range is between .18 and .22 for singles and .09 and .11 for doubles.
Am I reading this right?
What if you extend this to .15 and .25 for singles, and .05 to .10 for doubles?
Will the larger spread in talent among pitchers allow us to get an r to approach 1?
So, I guess what I'm saying is not that the low r is telling you that you've got little consistency, but rather that the low r is showing that you can only get little consistency, simply because the range of talent is so tight.
And that "tightness" is really what DIPS is all about.
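The link between the spread of talent and the best r you can hope for can be put in a formula. This sketch assumes true rates spread evenly (uniformly) between a low and a high value, the same PA in both years, unchanged talent, and pure binomial luck. Under those assumptions the expected r is the talent variance divided by talent variance plus luck variance. Whether this matches how the simulation was actually set up is a guess.

```python
def expected_r(low, high, pa):
    """Expected year-to-year r when true rates are uniform on [low, high]."""
    mean = (low + high) / 2
    var_talent = (high - low) ** 2 / 12               # variance of a uniform spread
    var_luck = (mean * (1 - mean) - var_talent) / pa  # average binomial variance
    return var_talent / (var_talent + var_luck)


scenarios = {
    "singles .18 to .22": (0.18, 0.22),
    "doubles .09 to .11": (0.09, 0.11),
    "singles .15 to .25": (0.15, 0.25),
    "doubles .05 to .10": (0.05, 0.10),
}
for label, (low, high) in scenarios.items():
    print(f"{label}: expected r = {expected_r(low, high, 1000):.2f}")
```

Widening the spread pushes r up, but with a fixed number of PA the luck term never disappears, so r approaches 1 only as the spread gets large relative to the luck.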
This is tremendous stuff Erik! Keep it up.
Finally, can you also give the "r" for the out, the largest component of them all? I'm still not convinced. My guess is that the further you get from .500, the less the "r". So, the "r" of the out (which is .2 from .500) should be slightly larger than the single (which is .3 the other way from .500).
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 3:49 p.m.,
August 6, 2003
(#40) -
tangotiger
I think what we are prepared to say is that:
- given the spread of "true skill rates" of whatever metric you want, you can estimate the expected "r" year-to-year
- using the sample year-to-year results, you will get an "r" for those samples
COMPARING these two "r" is what establishes the extent to which you can say that a skill exists (in that metric).
So, we can easily have a hits/BIP with an r of .2 and a BB/PA with an r of .7 and in both cases we can say "yes, a pitcher's skill is perfectly represented in those metrics".
It would be good if the Primate statisticians spoke up at this point to add clarity and conviction to what we are saying.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:29 p.m.,
August 6, 2003
(#42) -
tangotiger
To recap, the year-to-year r is dependent on:
1 - how many pitchers in the sample
2 - how many PAs per pitcher in year 1
3 - how many PAs per pitcher in year 2
4 - how much spread in the true rates there are among pitchers (expressed probably as a standard deviation)
5 - possibly how close the true rate is to .5
6 - the true rate being the same in year 1 and year 2
Given all that, the biggest factor in the K "r" being the highest and the XBH "r" being the lowest may be entirely due to #4. That is, the "r" is not explaining #6 anywhere near as much as we think it is.
Someone please slap me awake... it seems that there's about 10,563 Primates that need to give RossCW an apology?!?!?
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 7:31 a.m.,
August 7, 2003
(#45) -
tangotiger
I think, maybe, that simply the tightness of the h-hr / BIP (over a career) is what is being explained and not the "persistence" of ability, based on the "r".
For those of us hoping that "r" was trying to find the signal, that's not what it's doing. The h-hr / BIP is too tight to find a signal.
So, we should use a heavily regressed h-hr / BIP, but not for the reason of "lack of control".
I think.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:12 a.m.,
August 7, 2003
(#48) -
tangotiger
I would think that you create a model where you have known fixed talents, with a range equivalent to what you think MLB has (however you do that, but you can try different reasonable scenarios). And figure out the year-to-year "r" based on this model, and the number of BIP these pitchers have. That essentially gives you the "upper boundary" of r, which may be something like .2 or .25 for hits on BIP.
If in actual life, the MLB r is .18, well then, that's pretty strong evidence of persistence, right?
I think (again).
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 12:11 p.m.,
August 7, 2003
(#51) -
tangotiger
Erik, that should be no problem, but there are a couple of issues.
If you've got a pitcher with 800 BIP, chances are that he would be of a certain quality. So, you shouldn't expect a .350 BABIP in this class, based on selective sampling.
If you've got a pitcher with 100 BIP, chances are that the reverse would happen... you'll get lots of .400 BABIP, by luck, and the manager's had enough, and won't put him out there.
That is, I'd expect to find the mean to be different among the classes, and the distribution around them might be skewed based on selective sampling.
I don't know how the spread would be affected.
Send me an email, and I'll send you the file. Unless you need something else, I will give you a file that has:
BIP,1B,2B,3B
for every pitcher by year, 1972-1992, min 100 BIP, and I'll let you select the necessary classes.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 12:11 p.m.,
August 7, 2003
(#52) -
tangotiger(e-mail)
My email address.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 1:13 p.m.,
August 7, 2003
(#54) -
tangotiger
(homepage)
Erik, if you haven't seen it, click on the above link.
It is the career records of pitchers, relative to their teammates, broken down by career BIP classes.
It is very apparent the skew that exists. We just don't know the reason (selective sampling, ability, or both).
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 3:17 p.m.,
August 7, 2003
(#56) -
tangotiger
Virtually exactly what we expected if we suspected selective sampling.
A couple of points: you should probably at least adjust for the year-to-year league changes in BABIP. Park does play a role, but it's not like pitchers at Dodger Stadium will get more BIP per season than at Fenway. We kinda expect to have 1 pitcher on each team with 900 BIP, and one each with 750 BIP, etc, etc.
What's interesting is that after 600 BIP, you are talking about guys with at least 30 starts. So, it's not like the manager will have suffered with a pitcher for 30 starts and then pull the plug on him. Essentially, selective sampling should not play an issue with 600+ BIP.
Therefore, the effect we see from the 600 to 999 classes would probably be due to skill more than anything.
From 200 to 600, it's pretty stable, and that's probably also due to great relievers balancing out the starters who couldn't cut it after bad luck.
The impact that we are talking about is that the great pitchers will have a BABIP of .272 against the league average of .282. That's .01 hits / BIP, or 7 hits per 700 BIP. That's really what we'll be talking about, after the dust settles.
The range of skill is so tight among MLB pitchers that there's little to differentiate at this level (with the metrics currently at our disposal).
The conclusions that DIPS is showing are still supported, just that the statistical justification using the "r" is not applicable for those conclusions (at least to the extent that we first thought).
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 3:27 p.m.,
August 7, 2003
(#57) -
tangotiger
Tippett brought this up, and since no one else is going to say anything, I will.
The idea to use the team $H (or BABIP) in place of the player $H is severely flawed (though I have used this process many many times).
Because of what we now know about sample sizes affecting the correlation, it's probable that the reason that the team $H works better than the player $H might be simply due to the team $H being based on a much larger sample (4000 BIP to a pitcher's 500 BIP).
In fact, I would bet that if you randomly took any team $H, and compared that to the next year's pitcher $H, that it would do better than the current year's pitcher $H.
Therefore, if you want to do this "substitution" process to kind of mimic your team's fielders, you should find a pitcher on your team with a similar # of BIP. So, if you've got Steve Rogers with 700 BIP and a $H of .270 and Charlie Lea, with 650 BIP and a $H of .285, then use Lea's $H as your control. I think that would work out better.
If DIPS holds, then we'd expect that the pitcher and his control will have an equal "r" when compared to the pitcher's next year's $H.
If someone wants to do this, you should control for
- both pitchers being on the same team in year x
....(and at least 600 BIP each to kind of circumvent selective sampling issues,
.....and have the number of BIP within say 10% of each other),
- the pitcher being studied to also be on the same team in year x+1 (and also at least 600 BIP).
Anyone want to try?
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 6:31 p.m.,
August 7, 2003
(#59) -
tangotiger
#BIP      #seasons   observed STDEV   expected STDEV if all random
200-299   1446       0.032            .028  (sqrt(.28*.72/250))
300-399    812       0.0268           .024
400-499    592       0.0245           .023
500-599    507       0.0221           .019
600-699    579       0.0210           .018
700-799    454       0.0204           .016
That is, if all the pitchers in the 500-599 group were all just pitchers of the same ability, we'd expect 1 SD = .019, while we observe .0221.
However, as Erik is showing us, to match the observed, the spread of the pitchers' true talent cannot be the same (even though my .019 and .022 look so close). We are showing that the standard deviation of the true ability must be .012, essentially across all samples. This is a great discovery!!
And this .012 is much higher than I would have expected. This means that 95% of the pitchers are within +/-.025 hits / BIP. At 700 BIP, that works out to +/- 17 hits. This is more than double what I would have expected.
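A rough way to see where the .012 comes from, without running a sim: assume the observed variance is the true-talent variance plus the binomial (random) variance, which is the relation this thread settles on in later posts. The bin midpoints (250, 350, ...) are my assumption; the observed figures are the ones in the table.

```python
import math

LEAGUE_H = 0.28  # league hits per BIP, as used in the post

# (assumed midpoint BIP of the bin, observed standard deviation)
bins = [(250, 0.0320), (350, 0.0268), (450, 0.0245),
        (550, 0.0221), (650, 0.0210), (750, 0.0204)]

def random_sd(n, p=LEAGUE_H):
    """Binomial standard deviation of a rate over n trials."""
    return math.sqrt(p * (1 - p) / n)

def implied_true_sd(observed, n):
    """Talent spread left after removing binomial noise from the observed spread."""
    return math.sqrt(observed ** 2 - random_sd(n) ** 2)

for n, obs in bins:
    print(n, round(random_sd(n), 4), round(implied_true_sd(obs, n), 4))

# two standard deviations (the "95%" range) in hits over 700 BIP
print(round(2 * 0.012 * 700, 1))
```

Every bin lands in the neighborhood of .011 to .015, which is the sense in which the true spread is "essentially" the same across samples.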
The problem is that even though we have this huge gap, we just can't measure it on an individual pitcher's basis with much reliability (until his career is almost over).
Lots to think about....
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 7:37 p.m.,
August 7, 2003
(#61) -
tangotiger
(homepage)
2 things I forgot about: park and fielding.
Go to my site to get park factors. Divide by 2 to simulate half season at a park. Maybe randomly assign a park to your simulated pitchers.
Also assume about 1 sd = .008 hits / BIP for a team of fielders.
Run your stuff again. I think that'll cut your numbers down to half what they are showing.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:38 a.m.,
August 8, 2003
(#65) -
tangotiger
(homepage)
Erik,
To simulate park, that's easy enough. Just go to the above link. We see that the stdev for park is .0085. Since they play half their games at home, the "seasonal" park adjustment would be .004.
We definitely have to simulate fielding, but the question is "how"? If I look at team-level UZR, on a year-by-year basis (n=120 over 4 years), the stdev is about .0100 (but you need to regress somewhat). If I take it on a multi-year basis (1999-2002, n=30), the stdev is .0070. Since teams do turnover, I think the answer lies somewhere in-between, I'd guess. So, I'd make that .008. (I'd guess that if you even just used ZR, or any other measure, you'll get similar results.)
If you were to run your simulation where you set the standard deviation of the park to .004 and the fielders to .008, we can figure out what's left over for the pitchers.
Now, you can try running your sim so that fielding is set to .006 or .010 or anything (reasonable) you want really. So, you can say that "if fielding stdev is .006, pitching stdev is .007... if fielding stdev is .008, pitching stdev is .005", or something along those lines.
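For anyone who wants to try this at home, here is a minimal sketch of that kind of sim. It is not Erik's actual program: the function name, the normal draws for each factor, and the default values are all assumptions for illustration. Each simulated pitcher gets an independent pitcher, fielding, and park offset, and then his BIP are resolved one at a time.

```python
import random
import statistics

def simulate_observed_sd(n_pitchers=1000, bip=600, league=0.28,
                         sd_pitch=0.008, sd_field=0.008, sd_park=0.004,
                         seed=1):
    """Simulate one season per pitcher and return the SD of observed hits/BIP."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_pitchers):
        # true hit rate = league + pitcher + fielders + park, drawn independently
        p = (league + rng.gauss(0, sd_pitch)
             + rng.gauss(0, sd_field) + rng.gauss(0, sd_park))
        # binomial luck: each ball in play is a hit with probability p
        hits = sum(rng.random() < p for _ in range(bip))
        rates.append(hits / bip)
    return statistics.pstdev(rates)

# try a few pitching spreads and see which one matches the observed spread
for sd_pitch in (0.005, 0.007, 0.010):
    print(sd_pitch, round(simulate_observed_sd(sd_pitch=sd_pitch), 4))
```

The idea is to vary sd_pitch (and sd_field) until the simulated spread matches the observed one for that BIP group.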
This is really exciting! We can finally come up with the proper "split" between fielding, pitching, and park.
My original guess would have been a 4/3/2 split between fielding/pitching/park.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 2:15 p.m.,
August 8, 2003
(#67) -
tangotiger
1) Both the park effect and the defensive deviations you mention are the observed standard deviations, correct? If so, I would think that the measured stdev is larger than the true stdev as in the general case. I think we can account for this somehow.
The link I have for the park effects on DER is over a 17-year period, or about 80,000 BIP per park. Feel free to regress whatever your sim would say to regress. That is, run your sim giving each team 80,000 BIP, and try to match the observed. I have to believe that you won't regress more than 5%.
2) I just read the UZR Primer by Mitchel Lichtman. However, he focuses on individual performance, not team performance. Do you have a good article relating to team UZR?
On my site, I have MGL's file by player, pos, team, year. The results I just published were based on this data.
3) It appears that UZR ignores certain outcomes (pop flys?) which would not give credit to pitchers who were able to induce lots of pop flys. I am worried this (or other effects) might give more credit to the defense than is due.
That's a fair point. For every one ground out (as opposed to ground ball), there are 0.7 flyouts and 0.3 line outs and pop outs (about evenly distributed). Again, if you want to set aside a certain percentage of BIP as fielder-independent, that's a good idea too.
4) Has anyone done a comparison of year-to-year correlation of pitchers who remain with the same team, versus pitchers who change teams? This seems like it might provide some insight into how much control a pitcher has.
Yes, and I think the Tippett article also examined that. If I remember, he said the year-to-year r of pitchers who switched teams was .09.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 3:43 p.m.,
August 9, 2003
(#75) -
tangotiger
I would say that the stdev for team fielding would be .008, as I noted earlier.
I suppose we can break down UZR by "IF" and "OF" and get stdev by that level, as an approximation for "GB" and "FB".
Then, we can use the GB and FB rates of the pitchers to figure out the extent to which the IF or OF is impacting them. (Sid Fernandez would be impacted by the OF or FB stdev more than the IF or GB stdev. So, you give Sid and Gooden et al the same OF stdev, but that stdev would apply to Sid more than anyone else.)
I'll guess that the stdev for GB and FB rates is .04. Virtually all pitchers are within .12 of the league average.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:54 p.m.,
August 10, 2003
(#77) -
Tangotiger
Ok, how about we split up the fielding factors by team by position.
Then, give each pitcher his own ball distribution.
THEN, you can figure out the effect the fielding has on the pitcher.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:54 a.m.,
August 11, 2003
(#78) -
tangotiger
Let me take a few steps back. We're almost to the point that if we start doing all this (accounting for fielding by position and accounting for ball distribution by pitcher), we are really just doing UZR and PZR, and therefore, no need to do this sim analysis.
Therefore, I suggest that to proceed in baby steps, that we assume that the pitchers have the same ball distribution, and that the fielders on the same team are equals as defenders.
Once we get those results in, we can start adding layers like taking into account ball distribution, and individualized fielding.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 2:28 p.m.,
August 11, 2003
(#80) -
tangotiger
Please note that I meant that each fielder on the same team would be "equals", but that each team of fielders would follow the .008 standard deviation that UZR says it is.
Like I said, start off with baby steps, and work your way up.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 1:57 p.m.,
August 12, 2003
(#83) -
tangotiger
Erik, SUPERB stuff! I think as a rule of thumb that on balls in play, fielding/pitching are 50/50, based on your analysis in Case 2.
Now, what if case 1 is more representative? You asked how I got the ".008" as the true expected. In my post #65, I said the following:
If I look at team-level UZR, on a year-by-year basis (n=120 over 4 years), the stdev is about .0100 (but you need to regress somewhat).
Therefore, if you want to rerun to establish what the "true rate" is based on this "observed rate", remembering that we've got n=120, perhaps we will find that the true rate standard deviation will be .007 rather than .008 for fielding. My guess is that if you rerun using .007 for fielding, that you'll get .008 for pitching.
In any case, even if you stop here, I think you've added a tremendous amount of knowledge to this.
Our current best-guess is that fielding and pitching are (more or less) equally impactful on BIP.
The revelation about what an "r" really is is also incredibly important to us non-statisticians.
If you were to rewrite your research and analysis, I'd be glad to post it here, or send it to the "home page" of Primer.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 7:18 a.m.,
August 14, 2003
(#99) -
Tangotiger
To split GB/FB: I mentioned earlier that we can probably use a mean of .50, with a standard deviation of .04. I'll confirm that later.
I will redo the UZR observed calcs, splitting between IF and OF (to approximate GB/FB). I'll guess that we'll get a std dev of .015 observed for each.
For the park, for the IF/GB, I have to believe that the effect is almost all grass/turf. From that standpoint, you would do something like +.002 grass, and -.002 turf or something. If someone wants to look at the DER factors I put up, you can probably make a good guess at that. Maybe.
So, for the OF/FB factors, you'll probably get .006 or .007 for the standard dev.
So, as the next baby step, in addition to the steps Erik has taken, you randomly assign a pitcher a grass or turf park (based on the 1972-1992 teams), and you randomly assign him a GB/FB tendency, and you randomly assign him an OF/FB park factor.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 7:19 a.m.,
August 14, 2003
(#100) -
Tangotiger
And this is where now Erik has to split things in 2: based on what his GB/FB rate is, you'd have to give say your first pitcher 2000 GB and 2550 FB, and then use the appropriate park and fielding factors for each of GB and FB.
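The split itself could look something like this. It is a sketch only: split_bip is a made-up name, and the mean of .50 and standard deviation of .04 are the guesses from the previous post, not confirmed values.

```python
import random

def split_bip(total_bip, mean_gb=0.50, sd_gb=0.04, rng=random):
    """Draw a GB rate for one pitcher and split his BIP into (GB, FB) counts."""
    gb_rate = rng.gauss(mean_gb, sd_gb)
    gb_rate = min(max(gb_rate, 0.0), 1.0)   # keep the rate a valid proportion
    gb = round(total_bip * gb_rate)
    return gb, total_bip - gb

rng = random.Random(7)
for _ in range(3):
    print(split_bip(4550, rng=rng))
```

Each half would then get its own park and fielding factor before the hits are simulated.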
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:56 a.m.,
August 14, 2003
(#103) -
tangotiger
Among the 752 pitchers with at least 1000 PA (average of 3,589), the standard deviation of their GB rates was .078.
Among the 183 pitchers with at least 5000 PA (average of 8,106), the standard deviation of their GB rates was .066.
Among the 36 pitchers with at least 10,000 PA (average of 12,705), the standard deviation of their GB rates was .058.
My guess is that if you were to run your "expected" to "observed" sim using these numbers, that you would get the "expected" stdev of .04.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:14 a.m.,
August 14, 2003
(#104) -
tangotiger
The standard deviation, observed, over 120 team-years:
whole team: .008
IF only: .010
OF only: .013
The rest of this take with a grain of salt, since I had to make some assumptions. Anyway, by position, over 120 team-years, here are the stdev:
2b/ss: .015
3b/lf/cf/rf: .023
1b: .030
Now, what can you do with this information, besides what we've talked about? Well, you can FINALLY answer the question: is fielding talent at a position independent on a team level? That is, do teams seeing that they have a bad SS counter that with a great 2B? Or, are the talents at the positions randomly distributed?
Well, once Erik or someone confirms what the "expected" stdev is based on these true rates at a position level, you can then see if using these values as independent variables will match the observed at either the IF/OF level or at the team level.
My guess is that teams DO treat positions rather independently.
Should be fun to find out...
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:31 a.m.,
August 14, 2003
(#105) -
tangotiger
Actually, Arvin, I'm assuming "league average" for everything else. For example, I already published the DER park factors over the 21 year span. The standard deviation (50% home, 50% road) was .004. I'm assuming that over that many years and BIP that the observed and expected would come in at pretty much the same thing.
A pitcher takes a random point inside this DER park factor, and when applied with the pitcher's expected DER rate, and his sample (say BIP=600), this will match the observed DER rates (which I sent to Erik).
So, now we're extending that. We're saying that a pitcher will have a random GB rate which we're taking from the stdev observed of .06, which we have the "true" rate as probably .04. We split up his BIP into 2. Instead of the park factor DER, we use the park factor IF or OF DER. Since the observed at the IF/OF level is around .011 or so, the expected might be .006.
etc, etc...
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 9:50 a.m.,
August 15, 2003
(#110) -
tangotiger
(homepage)
Excellent ball distribution data can be found at the above. Essentially, this is what I used, even though it doesn't exactly correspond with the same time period.
I did not park adjust any of the figures I supplied. I will rerun my standard deviations by pos to make sure I've done it correctly.
Yes, I use UZR for everything.
The .010 was the observed standard dev for n=120, and .008 was the "agreed expected true", which you can confirm using your process.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:41 a.m.,
August 15, 2003
(#111) -
tangotiger
Ok, reworking my numbers to match the Levitt numbers, this is what I get for standard deviations. n=120.
Both: .009
IF: .013
OF: .013
rf: .020
2b: .020
ss: .021
lf: .022
cf: .026
3b: .031
1b: .032
(Doing a weighted average of the above, and we get a value of .024. I think for ease, we should consider the standard deviation on a per-position basis to be the same and equal to .024. Erik, it's your time, so do whatever you figure you can handle.)
These standard deviations are all observed and need to be sim-ed or calculated to determine the "true rates".
There's something that looks strange with the Levitt numbers. For example, the BABIP against SS comes out to .066, while against CF it's .454.
I do know that the BABIP for GB and FB are more or less similar (about .030 off, with the value lower against OF). But the Levitt numbers show a BABIP of under .100 for IF and over .500 for OF. Do some balls hit into the OF count as GB? Is this what I'm missing?
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:15 a.m.,
August 15, 2003
(#112) -
tangotiger
When I think about it, each of those positions need to be regressed a different amount, since the opps for each position is different. So, first, we'd have to do the sim process to get to the true rates for each position. THEN, we can do a weighted average if we want a uniform true rate to use.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 3:58 p.m.,
August 15, 2003
(#115) -
tangotiger
Arvin,
I don't know what you are talking about! Please re-explain.
What I am saying is that for the *park factors* DER, those splits were based on 21 years of data, comprising about 80,000 BIP per team. So, if I say that Fenway is +.020 hits / BIP compared to a non-Fenway park, my guess is that this observed difference will be pretty darn close to whatever "true" difference would produce this observed difference over 80,000 BIP. Are we talking about the same thing here?
As for UZR, just think about ZR or DER instead. We are simply talking about how many extra outs a fielder makes / BIP. The standard deviation, on the observed team-level data (n=120) is .010. Broken down by position, the observed standard deviation (n=840) is .024.
If you regress a certain amount, or calculate the "True" rate using this sim process, my guess is that the true standard deviation that produces those observed figures would be .008 for team fielding and .015 for positional fielding.
Mike: interesting. Can you provide the "plays,outs,hits in zone" (however you want to define it), split by GB/FB, by position for any year that you have it?
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:26 p.m.,
August 16, 2003
(#118) -
Tangotiger
.010 is the observed stdev, which we simed (or mentally regressed) to .008.
Arvin, is that a statistical equation? Because it is brilliant and simple! Pythag move over, make way for the Arvin theorem.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:49 p.m.,
August 16, 2003
(#119) -
Tangotiger
Arvin's theorem is intriguing. For example, I mentioned that the observed stdev for IF and OF was .013, and for the team it's .009 (according to my post 111).
Let's see what happens with this new equation, and realizing that half the BIP are IF and half are OF (let's say).
Observed team ^ 2 = [(.013/2) ^ 2] + [(.013/2)^2] = .009 ^ 2
Wow!
How about if we use the .024 for each of the 7 positions? Following the same process, and we get: .009!
Holy moley!
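Both checks are easy to reproduce. The sketch below assumes the components are independent and that each one's share of the BIP is known (half each for IF/OF, one seventh each for the positions); combined_sd is just a name I made up for the calculation.

```python
import math

def combined_sd(component_sds, shares):
    """SD of a share-weighted sum of independent components."""
    return math.sqrt(sum((s * w) ** 2 for s, w in zip(component_sds, shares)))

# IF and OF, each with observed SD .013 and half of the BIP
print(round(combined_sd([0.013, 0.013], [0.5, 0.5]), 4))

# seven positions, each with observed SD .024 and one seventh of the plays
print(round(combined_sd([0.024] * 7, [1 / 7] * 7), 4))
```

Both come out at roughly .009, matching the observed team figure.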
Now, if you want to really impress me, tell me how to get from the observed stdev to the true stdev. That is, how much do I regress towards the mean, given the sample size? Do I make it k/sqrt(n)? How do I know what to set k to?
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:17 p.m.,
August 17, 2003
(#121) -
Tangotiger
Ah, got it now. Tremendous stuff.
So, to go back first to my team-level data. I showed an observed stdev of .009 for my 120 teams, each of which has about 4500 BIP. In your equation above, is n=120, or n=120x4500 or n=4500? If n=120, then how do you account for a team having 4500 BIP or 62 BIP?
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 9:45 a.m.,
August 18, 2003
(#123) -
tangotiger
Good stuff again.
So, we have
.090^2 = .28*.72/4500 + true^2
that makes the true std dev at the team level as: .090
Actually, even after only 450 BIP, the true stdev rate comes in at .087.
Am I doing this right?
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 9:49 a.m.,
August 18, 2003
(#124) -
tangotiger
Oops... that should be .009. Working it out again, and we get: .006
So, that's the fielding.
.012 ^2 = .006^2 + .004^2 + pitching^2
pitching = .010
So, are we saying that each pitcher has a .010 stdev, each team of fielders is .006?
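Worked out step by step, assuming the factors are independent so that their variances add (the function names are mine, for illustration only):

```python
import math

def true_sd(observed, n, p=0.28):
    """Remove binomial noise, p*(1-p)/n, from an observed SD of a rate."""
    return math.sqrt(observed ** 2 - p * (1 - p) / n)

def remaining_sd(total, *known):
    """SD left over once independent known components are removed from the total."""
    return math.sqrt(total ** 2 - sum(k ** 2 for k in known))

fielding = true_sd(0.009, 4500)               # team fielding, 4500 BIP per team
pitching = remaining_sd(0.012, 0.006, 0.004)  # total true DER less fielding and park
print(round(fielding, 4), round(pitching, 4))
```

The pitching figure comes out a shade under .010, which rounds to the value above.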
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:05 a.m.,
August 18, 2003
(#125) -
tangotiger
Continuing in the same vein:
true team fielding ^2 = true 1b fielding ^ 2 + true 2b fielding ^ 2...
.006 ^ 2 = [(t/7)^2] * 7
(That is, each position is on average getting 1/7th of the plays, and there are 7 positions. See post 115 for more info.)
t = .016 = true avg single fielding position
So...... the true standard deviation for a single position is about .016. The true standard deviation for pitchers is .010. So, on any given BIP, the fielder has more influence than the pitcher.
On a group of BIP, the pitcher has more influence than the team of fielders.
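Solving that equation for t is a one-liner. This assumes, as above, seven independent positions with equal shares of the plays:

```python
import math

def single_position_sd(team_sd, n_positions=7):
    """Solve team_sd^2 = n * (t / n)^2 for t: t = team_sd * sqrt(n)."""
    return team_sd * math.sqrt(n_positions)

print(round(single_position_sd(0.006), 3))
```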
Anyway, since we know that range of fielders UZR runs is about +/- 30 runs (and since we know that their stdev is .016), then I would make a guess that pitchers with a stdev of .010 would have a range of +/- 20 runs. That is probably our best guess as to the influence of the pitcher on BIP.
Just taking a wild guess, but if the range is +/- 20 runs, then 1 stdev is probably +/- 6 runs. So, we expect say 95% of pitchers to be +/- 12 runs.
Since our best interpretation of BABIP shows that a pitcher's skill is about +/- 8 runs, for 95% of them, then the BABIP is not a good enough metric to capture the real skill that a pitcher has on the influence of BIP.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 10:14 a.m.,
August 18, 2003
(#126) -
tangotiger
Sorry for the continuous posts, but I'm writing faster than I'm thinking. That last step you should ignore, as it uses different denominators.
Anyway, since we've established for fielders that 1 stdev is .016, and if they average about 650 plays each, that gives us 1 stdev = 10 plays per season, or about 8 runs. That's 1 stdev for fielding runs for an average fielding position.
For pitchers, 1 stdev is .010. The average full-time starter will have 700 BIP, and the average reliever will have 200 BIP. For the starter, .010 stdev on 700 BIP = 7 plays per season, or about 6 runs. That's 1 stdev for pitching runs for an average full-time starter. That means 95% of pitchers have a skill at preventing hits on BIP to the tune of 12 runs per 700 BIP.
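The conversion to plays and runs can be sketched as follows. The 0.8 runs per play is my assumption, picked because it roughly reproduces the rounded run totals above; the posts do not state the exact conversion.

```python
RUNS_PER_PLAY = 0.8   # assumed conversion, not stated in the thread

def season_sd(sd_per_bip, opportunities, runs_per_play=RUNS_PER_PLAY):
    """Return (plays, runs) corresponding to one standard deviation over a season."""
    plays = sd_per_bip * opportunities
    return plays, plays * runs_per_play

fielder_plays, fielder_runs = season_sd(0.016, 650)
starter_plays, starter_runs = season_sd(0.010, 700)
print(round(fielder_plays, 1), round(fielder_runs, 1))
print(round(starter_plays, 1), round(starter_runs, 1))
print(round(2 * starter_runs, 1))   # two SDs, the "95%" range for a starter
```

With this conversion the two-SD range for a starter comes out at about 11 runs, in line with the "12 runs" quoted above once the intermediate rounding is taken into account.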
I believe that our current interpretation of BABIP is that 95% of pitchers have a skill to the tune of 8 runs per 700 BIP, but I'll have to look that up again.
Bottom line? Pitchers have the skill, not as much as an individual fielder (60/40 split on a single BIP), but they have more skill than a team of fielders (60/40 split the other way for a season of BIP). And the BABIP metric is not good enough to capture this skill.
Which is why we need PZR to find their skill.
Tremendous work by Erik and Arvin!!
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 1:50 p.m.,
August 18, 2003
(#127) -
tangotiger
Erik, Arvin, and anyone else who has contributed to this thread: I was thinking of doing a writeup of this entire thread as an article, hopefully citing everyone's work at the appropriate places. This has really been an eye-opener for me, and perhaps having a (hopefully logically ordered) detailed summary of the really incredible work by Erik and Arvin would be the reference point for DIPS going forward. (Arvin, I've already got Erik's email, so please send me your email address.) I don't know about anyone else, but I call this a sabermetric orgasm!
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 7:08 a.m.,
August 19, 2003
(#131) -
Tangotiger
I think you're right, but I've got to think about it for it to sink in. Makes sense though...
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:10 a.m.,
August 19, 2003
(#134) -
tangotiger
Yes, I think I am agreeing with you. Since our assumption is based on fielding talents on single fielders only, I think we can stick with .006 (though again, I don't see this being impacted to a significant degree if we look at SS to 1B throws, or 2B to SS DPs, etc).
So, what we are saying is that we have a 10/6/4 split between pitching/fielding/park, in that order. Luck plays a part, and that is dependent on the sample size. When n=1, it's almost all luck. When n=1 million, luck is not involved.
So, over 700 BIP, where we observed a .020, we have the following:
observed ^ 2 = .010 ^ 2 + .006 ^ 2 + .004 ^2 + luck ^2 = .020 ^ 2
solving for luck = .016
So, can we say that when a starter has 700 BIP, the influence on those BIP as a group can be broken down by:
luck : 44%
pitch: 28%
field: 17%
park : 11%
??
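Here is the arithmetic behind those percentages, as a sketch. Note that the shares are taken on the standard deviations themselves, which is what reproduces the 44/28/17/11 figures.

```python
import math

observed = 0.020
true_parts = {"pitch": 0.010, "field": 0.006, "park": 0.004}

# luck is whatever variance is left after the three true components
luck = math.sqrt(observed ** 2 - sum(v ** 2 for v in true_parts.values()))
parts = {"luck": luck, **true_parts}

sd_total = sum(parts.values())
for name, sd in parts.items():
    print(name, round(sd, 3), f"{sd / sd_total:.0%}")
```

If the shares were taken on the variances instead (the squared values), luck's slice would be larger, about 62% with the same inputs, so the split depends on which convention you prefer.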
I have to admit that I've recently said, though I don't remember where, that I thought the split would be 40/30/20/10 with the order being luck,fielding,pitching,park.
What we are saying here is that pitching and not fielding is the larger determinant between the two. And perhaps before I read about DIPS I might have had the correct order.
I think it's still important that yes we need to separate the components (HR, BB, K) from the BIP, as Voros does. But, the conclusions drawn from that do not stand based on the reasoning.
I think our best conclusions would be as follows:
1 - pitching has more impact on BIP than does fielding
2 - luck has more impact than anything, over 700 BIP
3 - BABIP is not a good enough measure for the pitcher's skill
What would be interesting is that if MGL or Tippett or someone with pbp data gets around to implementing the PZR blueprint I published (the flip side to UZR), that we'll get closure on this subject. That is, we should be able to get the standard deviations on the pitcher's side that will support the data we are inferring here.
So, before we trample in any direction, it may be worthwhile to keep the case open, pending final data. After all, we may have made a serious miscalc somewhere.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 11:46 a.m.,
August 19, 2003
(#135) -
tangotiger
Someone asked me about the implication of all past DIPS work. I responded the following:
=====================
I'm not really sure of the impact. It's still a blur
to me as to what use to make of it.
What we are saying is that there are 2 components for
a pitcher: his non-fielding dependent skill (HR, BB, SO)
and his fielding-dependent skill.
We know very well how to estimate the former, and not
very well the latter. Since the BABIP figure is not
reliable for an individual pitcher, it's more accurate
to use say 50% lg, 40% team, 10% pitcher to estimate
his expected BABIP. But, that estimate will come with
a very wide margin for error.
The conclusion stands that you need to separate
things, and you can't rely on a pitcher's past BABIP
to predict the future (much like you wouldn't use his
ERA). Still outstanding is WHAT to use for BABIP.
I'll contend that PZR would be that measure. But,
that has yet to be implemented by anyone.
=====================
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 2:47 p.m.,
August 19, 2003
(#137) -
tangotiger
Actually, the observed should have been
600-699 579 0.0210
700-799 454 0.0204
So, at 700 BIP, I should have used .0207. Reworking, and we get a nearly perfect match.
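One way to read that "match" (this is my interpretation, not something spelled out in the thread) is to compare the luck left over after removing pitching, fielding, and park with the pure binomial luck expected over 700 BIP:

```python
import math

def residual_luck(observed, *true_sds):
    """Luck SD left after removing independent true components from the observed SD."""
    return math.sqrt(observed ** 2 - sum(s ** 2 for s in true_sds))

binomial = math.sqrt(0.28 * 0.72 / 700)   # random SD expected at 700 BIP

print(round(residual_luck(0.0200, 0.010, 0.006, 0.004), 4))
print(round(residual_luck(0.0207, 0.010, 0.006, 0.004), 4))
print(round(binomial, 4))
```

Using .0207 brings the leftover luck closer to the binomial figure than .020 does.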
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 5:02 p.m.,
August 19, 2003
(#140) -
tangotiger
Arvin,
Please note that each individual fielding position has a true standard deviation of about .016, and an observed of .024, for n=120.
The .006 figure is the TEAM standard deviation for fielding.
So, on a player basis, it's .016 x 650 (or 10). On a team basis it's .006 x 4500 (or 27).
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 5:04 p.m.,
August 19, 2003
(#141) -
tangotiger
Also note that if each team of 7 fielders were independent and randomly chosen, you would get
team ^ 2 = fielder ^2 x 7
And, 10 ^ 2 x 7 = 700
27 ^ 2 = 729
Close enough.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:58 a.m.,
August 20, 2003
(#143) -
tangotiger
I just want to interject something to keep in mind. Remember, our equation is
trueDER ^ 2 = truePitch^2 + trueField^2 + truePark^2
Erik has provided trueDER from his sim, and Arvin has confirmed it with his "observed" equation, and that is .012. I've provided the truePark figure as .004. Dropping all the decimals, and our equation becomes
128 = truePitch^2 + trueField^2
Based on UZR, which I'll have to go over because I'm not sure I'm using the right numerator (Levitt's numbers might include HR), the observed single position UZR is around .025 and the observed team fielding UZR is around .010. So, our true UZR will be somewhere between .005 and .008, probably.
We're not even sure that UZR is the best thing to use, but it is the best thing available at the moment. (You could even use ZR, and I'm pretty sure you'll get a single position observed stdev of .025 for your regular players. This is easy to eyeball since the range of players is mostly around +/-.05 outs/BIP, so that would be 2 standard deviations.)
Anyway, so we've got something like
128 = truePitching^2 + [5 to 8]^2
So, when fielding = 8, pitching = 8.
When fielding = 5, pitching = 10
etc, etc.
So, depending on how the fielding measure is determined and manipulated, a small change there will have a huge impact in the relative value between fielding and pitching.
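To see how sensitive the split is, tabulate pitching against a few fielding values. Units are thousandths of a hit per BIP, as in the equation above; the variable names are mine.

```python
import math

TRUE_DER = 12    # true DER spread, in thousandths of a hit per BIP
TRUE_PARK = 4    # true park spread, same units

budget = TRUE_DER ** 2 - TRUE_PARK ** 2   # 144 - 16 = 128

for fielding in (5, 6, 7, 8):
    pitching = math.sqrt(budget - fielding ** 2)
    print(fielding, round(pitching, 1))
```

Moving fielding from 8 down to 5 takes pitching from 8 up to about 10, so the ratio of the two swings from even to two-to-one.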
Exactly How Full of S is OPS? (August 6, 2003)
Posted 3:08 p.m.,
August 7, 2003
(#5) -
tangotiger
I mentioned this in my OPS article, but that doesn't matter.
Value = (OBP - baseX) * 3 + (SLG - baseY) * 1
Value = 3*OBP + SLG - 3baseX - baseY
Value = 3*OBP + SLG - whatever
As you can see, it's irrelevant what baseX, baseY, or whatever equals. The DIFFERENCE among the players will remain exactly the same.
Try it out. Take a 400/500 player and a 300/600 player, and take whatever baseline for OBP and SLG you want. The difference between Value(player1) and Value(player2) will be exactly the same.
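Here is that exercise in code. The three baselines are arbitrary values picked for illustration, not recommended ones.

```python
def value(obp, slg, base_obp, base_slg):
    """Value = 3*(OBP - baseX) + (SLG - baseY), as written above."""
    return 3 * (obp - base_obp) + (slg - base_slg)

player1 = (0.400, 0.500)
player2 = (0.300, 0.600)

# arbitrary baselines; the gap between the players should not move
for base in [(0.0, 0.0), (0.330, 0.420), (0.300, 0.500)]:
    gap = value(*player1, *base) - value(*player2, *base)
    print(base, round(gap, 3))
```

The gap is .200 every time, whatever baseline is plugged in.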
Hoban - A player ranking (August 8, 2003)
Posted 1:33 p.m.,
August 8, 2003
(#4) -
tangotiger
Hoban was kind enough to reply back to my email, pointing him to UZR, which he is aware of.
I replied:
=========
In order to evaluate Jeter, Nomar, and ARod, you would be forced to use play-by-play data, since that would contain all the data available for the players that you are evaluating.
The less data you decide to use, the larger the margin of error being introduced.
If the intent is to compare Arod to Ozzie to Honus Wagner, it is not clear that using the same data would be the best way. What you can say is that by using UZR, you can say something like Arod is +10 relative to average, +/- 3 runs, and by using the basics, you can say Honus Wagner is +12 relative to average, +/-8 runs (or something along those lines).
While I see the point in using the same metric for all players, you would at least have to use the most advanced metric to baseline the more basic metrics against. The validation of the basic version must be done against the more advanced metrics, with a certain margin of error.
Just my 2 cents.
============
Hoban - A player ranking (August 8, 2003)
Posted 9:13 a.m.,
August 12, 2003
(#6) -
tangotiger
I posted this at Clutch:
Just because you do 5*fld% + RF or whatever, does NOT make one thing 5 times more important. This is the same kind of explanation that people give about OPS.
Because these things are on different scales, the multiplier does not and cannot imply importance.
What that multiplier does is streeeeetch out the range of performance.
If for example the fld% stretches from .950 to .990, that's a .040 swing. If RF stretches from 4.00 to 6.00, that's a 2.00 swing.
By multiplying fld% by 5, you are increasing the swing from .040 to .200. The RF swing is still much much larger.
I don't know what Hoban's exact equation is, but, please keep this in mind when you think of "importance" and matching it to the "multiplier".
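The same point in a few lines, using the example ranges above (they are illustrative ranges, not Hoban's actual inputs):

```python
def swing(low, high, multiplier=1.0):
    """Range of a component after scaling by its multiplier."""
    return (high - low) * multiplier

fld_swing = swing(0.950, 0.990, multiplier=5)
rf_swing = swing(4.00, 6.00)
print(round(fld_swing, 3), round(rf_swing, 3), round(rf_swing / fld_swing, 1))
```

Even with the multiplier of 5, the RF term still swings about ten times as much as the fld% term.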
Pitch Counts, estimated (August 8, 2003)
Posted 4:10 p.m.,
August 11, 2003
(#1) -
tangotiger
Based on the above estimate file, here are how many pitches were thrown, by decade, per game per team.
decadeStart pitchesPerGame
1890 134
1900 129
1910 133
1920 135
1930 137
1940 137
1950 139
1960 139
1970 139
1980 140
1990 143
2000 144
As you can see, we are increasing the counts by about 1 pitch per decade. So, I don't think we can say that it's harder for a pitcher to get a complete game these days because more pitches are thrown. I think it simply comes down to pitchers today being on much tighter pitch ranges on a per-start basis, even though, on a per-season basis, today's pitchers pitch as much as any non-70s pitcher.
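The "about 1 pitch per decade" figure can be checked with a simple least-squares fit on the table above. This is a sketch; the function name is mine.

```python
pitches = {1890: 134, 1900: 129, 1910: 133, 1920: 135, 1930: 137, 1940: 137,
           1950: 139, 1960: 139, 1970: 139, 1980: 140, 1990: 143, 2000: 144}

def slope_per_decade(series):
    """Ordinary least-squares slope of pitches per game against decade."""
    xs = [(year - 1890) / 10 for year in series]
    ys = list(series.values())
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

print(round(slope_per_decade(pitches), 2))
```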
Pitch Counts, estimated (August 8, 2003)
Posted 3:16 p.m.,
August 13, 2003
(#3) -
tangotiger
It's fun for me too.
One thing that I didn't show was the number of pitchers, per year, with a pitch count of at least 4000. By far, since 1919, the 1969 to 1975 time period shows the largest number of pitchers at that level. From 1989 to the present, it's by far the lowest.
Number of starts has something to do with it. The increase in the number of pitchers in the bullpen might be another reason.
And I agree, it's not like those workhorses were getting injured like crazy in the 70s.
If it was me, I'd go back to the 1970s style of starter and reliever usage.
Pitch Counts, estimated (August 8, 2003)
Posted 7:05 a.m.,
August 14, 2003
(#5) -
Tangotiger
The earliest pitch counts I have are for Koufax, and those haven't changed.
However, why SHOULD they change, be it 1911 AL, or 1976 College? Think about it. The rule is 4 balls 3 strikes and 2-strike fouls. If you end up with 75% contacted balls, 15% Ks, and 10% walks, don't you think the ball-strike progression to get to those observed results would be similar regardless of league?
If Koufax, Feller, Walter Johnson, or RJ are all at 60% contacted balls, 30% Ks, 10% walks, again, could the approaches of the batter/pitcher be that different that the ball-strike progression for each pitcher be completely different?
But, like I said, this is mostly theoretical. I'll be getting the Dodgers 1947-1964 data soon, so we'll see how that stacks up.
Pitch Counts, estimated (August 8, 2003)
Posted 10:27 a.m.,
August 14, 2003
(#6) -
tangotiger
Remembering that I did NOT use Koufax as part of my sample to establish my equations, here is how Koufax stacks up, through 1964:
Actual pitches thrown: 26,450
Estimated, xPCE: 26,785
Estimated, basic: 26,300
So, the xPCE is 1.3% too high, and the basic is 0.6% too low.
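The percentages are just the estimate relative to the actual count:

```python
actual = 26_450
estimates = {"xPCE": 26_785, "basic": 26_300}

for name, est in estimates.items():
    pct = (est - actual) / actual * 100
    print(name, f"{pct:+.1f}%")
```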
So, if I say that Steve Carlton averages say 4,300 pitches/season, I'm probably within 100 pitches/season of being accurate.
Of course, getting the pitch count totals for games of yesteryear would be great to validate against.
Pitch Counts, estimated (August 8, 2003)
Posted 3:55 p.m.,
August 14, 2003
(#7) -
tangotiger
By the way, I'm not saying that pitches are thrown at the same rate, whether on a pitches/PA or /IP or /game basis. What I am saying is that the function of pitches/PA is dependent almost entirely on the rate at which balls are put into play. So, if you have an era where most balls were put into play, then the pitches/PA would be lower than otherwise.
Psychological Impact of Losing an Easy Game (August 9, 2003)
Posted 10:34 a.m.,
August 10, 2003
(#2) -
Tangotiger
I believe that the link underlying that thread points to a study that does that.
BP - Sample size and park factors (August 11, 2003)
Posted 7:50 p.m.,
August 11, 2003
(#4) -
Tangotiger
Patriot, I agree with you.
Depending on what you want, sometimes run PF are what you want, and other times component PF.
Posted 10:43 p.m.,
August 11, 2003
(#6) -
Tangotiger
FJM: can you calculate the odds that a team of pitchers that gives up 10% of their hits as HR would give up 12.5% at home and 7.5% on road, given 400 hits in each, by random chance alone?
Posted 1:41 p.m.,
August 12, 2003
(#8) -
tangotiger
So, combining the two, it's 1 chance in 460 of having those rates occur by luck, correct? Seems like something's up, especially if you go back to the history of Dodger Stadium where I presume that the split would usually be the other way.
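One way to get a figure of this size is a normal approximation to the binomial, treating the home and road splits as independent. FJM's actual method is not shown here, so this is a sketch under that assumption; it lands in the same neighborhood as the 1-in-460 figure rather than reproducing it exactly.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p, n = 0.10, 400                  # assumed true HR/H rate, hits in each split
sd = math.sqrt(p * (1 - p) / n)   # binomial standard deviation of the rate
z = (0.125 - p) / sd              # home split; the road split (7.5%) is the mirror image
p_one = upper_tail(z)             # chance of 12.5% or more at home
p_both = p_one ** 2               # and 7.5% or less on the road, if independent
odds = 1 / p_both
print(f"sd={sd:.4f}  z={z:.2f}  one split={p_one:.4f}  both: 1 in {odds:.0f}")
```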
Posted 7:38 p.m.,
August 12, 2003
(#13) -
Tangotiger
FJM: the 10% was simply a historical figure that I like to use.
Neyer - Angels (August 15, 2003)
Posted 1:43 p.m.,
August 15, 2003
(#2) -
tangotiger
That "32" should be "34".
Neyer - Angels (August 15, 2003)
Posted 3:48 p.m.,
August 15, 2003
(#6) -
tangotiger
(homepage)
I agree you can't just say take the last 3 years, and ignore the rest.
The homepage is Erstad's br.com page. His career OPS+ (park-adjusted OPS) is 100, or league average.
If you were to weight his OPS on a 5,4,3,2,1,1,0.5 basis from 2002 back to his rookie year, his weighted OPS would be: 96.
So, that's pretty darn close to average. Also, don't forget all the little things he does that I mentioned that do not show up in OPS or the "boxscore".
Any way you cut it, our best guess is that Erstad was an average hitter entering 2003. He has 280 PAs in 2003.
Neyer - Angels (August 15, 2003)
Posted 5:04 p.m.,
August 15, 2003
(#8) -
tangotiger
Actually, just because you perform at +40 runs over average doesn't mean that is your talent level. It's more likely he is +30 runs over average.
Anyway, how can you figure it out at home? Assume there are 3.5 plays available for every CF, and the avg CF makes 3 outs on them (rate of .857). How much would a great and bad CF make? Let's guess .900 on the top end and .800 on the bottom end. So, essentially, a great defender will be +.05 hits / BIP better than average. In this case, that works out to about +.2 hits per game, or about 32 hits per season.
Each hit-to-out is worth .80 runs, and so, the top defender would be worth about +25 runs in this example.
If the great defender is +.06 hits / BIP, then, he would be +30 runs.
For all intents and purposes, I think that +30 is the upper boundary for a CF, and realistically, +20 is the upper boundary for a CF's career.
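The back-of-envelope chain above (plays per game, extra outs per play, runs per play) can be written out as a short sketch. The inputs are the ones quoted in the post; the function name is made up. Without rounding .175 hits per game up to .2, the results come out a few runs lower than the +25 and +30 quoted.

```python
PLAYS_PER_GAME = 3.5    # balls available to a CF per game (from the post)
GAMES = 162
RUNS_PER_PLAY = 0.80    # value of turning a hit into an out (from the post)

def runs_above_average(extra_outs_per_play):
    """Season runs saved by a fielder who converts this many extra plays per chance."""
    hits_saved_per_game = extra_outs_per_play * PLAYS_PER_GAME
    hits_saved_per_season = hits_saved_per_game * GAMES
    return hits_saved_per_season * RUNS_PER_PLAY

for edge in (0.05, 0.06):
    print(f"+{edge:.2f} per play -> {runs_above_average(edge):+.1f} runs per season")
```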
Neyer - Angels (August 15, 2003)
Posted 3:53 p.m.,
August 18, 2003
(#11) -
tangotiger
No, each position should have its own converter, based on how many extra base hits are hit in his zone. I believe Chris Dial may have published this somewhere, or maybe it was MGL?
Pretty much, I think, it was between .75 and .85 for every position. So, to not complicate matters, I like to use .8
Game-Calling Revisited - Chris Dial (August 16, 2003)
Posted 1:08 p.m.,
August 17, 2003
(#2) -
Tangotiger
No, it is based only on those parts that are mostly catcher-pitcher in relationship (or those things that are not dependent also on the fielders). I don't think Chris should even have brought DIPS into play, or any of that other stuff.
New postseason odds (August 17, 2003)
Posted 11:28 a.m.,
August 19, 2003
(#6) -
tangotiger
It's worth noting that BP and Dackle offer very close odds for everything except the 2 central divisions.
The differences between approaches are:
BP: head-to-head matchups do not have binary approaches
Dackle: does not do a good job at valuing a team's true talent level
What does this mean? As long as the number of games remaining is large enough, the BP estimates are more reliable. However, as soon as you've got say 20 games left, the BP estimate would have to be discarded, and Dackle's estimates would take precedence. You can't have the possibility as BP has it that the Yanks and Redsox can win the same game and hope it evens out over such a small number of remaining games.
Pankin - Walking Bonds - SABR presentation (August 17, 2003)
Posted 4:12 p.m.,
August 21, 2003
(#3) -
tangotiger
(homepage)
Aurilia on 3rd, Grissom at 1st. 1 out, bottom of the 9th, tied at 1
Let's look at some win probabilities, assuming that you've got an average opponent, and you yourself are also average. (I'm guessing Smoltz was there, but I suppose we have a great pitcher for the Giants as well? Maybe not.)
Anyway, bottom 9th, tied, 1 out and:
men on 1b/3b: .829
bases loaded: .835
uhmmm, I said "don't walk?"... let me see. According to the link above, I'm saying don't walk any time you have a runner at 1b or 3b with 1 out. Kinda strange, so let's look into it some more.
If Bonds gets a hit, game over. If Bonds gets a regular walk, he barely gains anything (.006 wins). If he gets an out, the Giants win prob goes down to .643.
So, a hit adds +.171 wins, an out drops -.186 wins, and a walk adds .006 wins.
You know, that really doesn't make any sense. The win prob cannot be .829, it must be much lower. In this case, it's very easy to figure out.
Win prob(when hit wins game) means:
freq(H) * (1 - winprob) = freq(out) * (winprob - .643)
(assuming that the walk is almost irrelevant)
That sets our winprob at .762.
This is really strange, I must have programmed something seriously wrong.
Thanks for pointing this out to me...
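The balance equation above can be solved for the win probability directly. The post does not state the hit and out frequencies; 1/3 and 2/3 are assumed here because they reproduce the .762 figure. The function name is made up for illustration.

```python
def balanced_win_prob(freq_hit, freq_out, wp_after_out, wp_after_hit=1.0):
    """Solve freq_hit*(wp_after_hit - w) = freq_out*(w - wp_after_out) for w."""
    return (freq_hit * wp_after_hit + freq_out * wp_after_out) / (freq_hit + freq_out)

# Assumed frequencies: hit 1/3 of the time, out 2/3 of the time, walks ignored.
wp = balanced_win_prob(1 / 3, 2 / 3, wp_after_out=0.643)
print(f"{wp:.3f}")   # 0.762
```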
Pankin - Walking Bonds - SABR presentation (August 17, 2003)
Posted 10:58 p.m.,
August 21, 2003
(#5) -
Tangotiger
This is really strange, and I'm going to post my thoughts on the matter tomorrow.
The win probs that I have listed are correct, and I have one "sure-fire" way of doing them, and I have a second "fail-safe" way to verify them. I can't verify them for this 9th inning scenario.
As soon as aaron gave me the situation (man on 1b/3b, 1 out, tied, bottom of 9th), this was such an easy "walk now" situation (since the runner being on 1b or 2b almost virtually doesn't matter), that it really stunned me that I said "don't walk".
Anyway, I'll give more details tomorrow, and maybe one of the clever Primates can point out my flaw.
Pankin - Walking Bonds - SABR presentation (August 17, 2003)
Posted 10:09 a.m.,
August 22, 2003
(#7) -
tangotiger
Let's figure out how to calculate run expectancy (RE). Given the following:
- safe play occurs 33% of the time
- RE AFTER a safe play is 1.10 runs
- RE AFTER an out play is 0.30 runs
What is the RE BEFORE this PA?
RE = .333 x 1.10 + .667 x .30 = .5667
So, we can also say:
- LWTS value of safe play = 1.10 - .5667 = +.533
- LWTS value of out play = .30 - .5667 = -.2667
And
.333 x .533 = .667 x .2667
With me?
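The same arithmetic as a small sketch, using exactly 1/3 for the safe rate. It also checks that the two linear weights, weighted by how often each outcome happens, cancel out.

```python
p_safe = 1 / 3
re_after_safe, re_after_out = 1.10, 0.30

# RE before the PA is the frequency-weighted average of the RE after it.
re_before = p_safe * re_after_safe + (1 - p_safe) * re_after_out

# Linear weights: change in RE caused by each outcome.
lwts_safe = re_after_safe - re_before
lwts_out = re_after_out - re_before

# Weighted by frequency, the gains and losses sum to zero.
balance = p_safe * lwts_safe + (1 - p_safe) * lwts_out
print(f"RE={re_before:.4f}  safe={lwts_safe:+.4f}  out={lwts_out:+.4f}  balance={balance:.1e}")
```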
Ok, now let's try to do this with Win Expectancy.
What's the chance of scoring at least one run (and thus winning the game), with a man on 1b, 3b, 1 out, bottom 9th, tie game, assuming lg average opponents?
You have a 66% chance of scoring your run IN THIS INNING, and winning the game. The other 34% chances you get into Extra Innings, and you have a 50/50 shot at winning.
So,
WE (bottom 9th, tied, man on 1b/3b, 1 out) = .66 + .34x.50 = .830
Doing the same thing
WE (bottom 9th, tied, man on 1b/3b, 2 outs) = .28 + .72x.50 = .640
Ok, so those are our known true WE.
Now, doing the same as we started with RE:
- 26% chance of hit or RBOE = WE of 1.00
- 9% chance of walk = WE of .840
- 65% chance of out = WE of .640 (maybe less cause of DP)
So,
WE = .26 x 1.00 + .09 x .840 + .65 x .640 = .752
But, we expected .830
Can someone figure out where I'm going wrong?
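Here is the mismatch reproduced as a sketch. Note that, as in the list above, every out is assumed to leave the runners at 1b/3b with 2 outs; the variable names are made up for illustration.

```python
def we_tied_bottom9(p_score_this_inning):
    """Win now if you score; otherwise extra innings are a 50/50 shot."""
    return p_score_this_inning + (1 - p_score_this_inning) * 0.50

we_1out = we_tied_bottom9(0.66)   # known WE, 1b/3b, 1 out
we_2out = we_tied_bottom9(0.28)   # known WE, 1b/3b, 2 outs

# (frequency, WE after the event), with every out treated as "runners hold".
outcomes = [
    (0.26, 1.00),      # hit or RBOE: game over
    (0.09, 0.840),     # walk: bases loaded, 1 out
    (0.65, we_2out),   # out: 1b/3b, 2 outs
]
we_from_outcomes = sum(freq * we for freq, we in outcomes)
gap = we_1out - we_from_outcomes
print(f"known={we_1out:.3f}  rebuilt={we_from_outcomes:.3f}  gap={gap:.3f}")
```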
Pankin - Walking Bonds - SABR presentation (August 17, 2003)
Posted 10:12 a.m.,
August 22, 2003
(#8) -
tangotiger
After I posted that, it finally hit me: sac flies! The WE can actually increase substantially, following an out, and my Bonds program looks like it did not consider the SF properly.
Pankin - Walking Bonds - SABR presentation (August 17, 2003)
Posted 6:57 p.m.,
August 22, 2003
(#10) -
Tangotiger
Yup, the win prob tables I published did consider the SF and grounders scoring the runners, etc.
But, my walk/don't walk Bonds did not balance that properly. Kinda embarrassing really. It's easy enough to fix, maybe 1 or 2 hours of work. Not sure when I'll do it though, but I'll target the 1st game of the playoffs.
Pankin - Walking Bonds - SABR presentation (August 17, 2003)
Posted 10:56 p.m.,
August 23, 2003
(#12) -
Tangotiger
Sorry, but I need the pbp event files to do that. I'll only be able to look at this in the off-season.
Advances in Sabermetrics (August 18, 2003)
Posted 7:15 a.m.,
August 19, 2003
(#2) -
Tangotiger
Greg, thanks for the kind words, as I wake up from a night of a crying baby. I don't think any of the stuff I did is a great advance... maybe I'm making a better hammer, a better tool, but UZR or DIPS is a building by comparison.
I think the next great advance would be to get "tools-based" analysis done properly. There's a ton of information in the heads of scouts that needs to be extracted and quantified for more systematic and widespread use. Of course, this may already be done by others, and we just don't know about it.
Advances in Sabermetrics (August 18, 2003)
Posted 9:01 a.m.,
August 19, 2003
(#4) -
tangotiger
I've actually been working on #1. If you send me an email, I'll let you know what I've done.
Your #2 and #3 are great ones. I think the #3 will be a huge advance if we can also incorporate scouting.
Advances in Sabermetrics (August 18, 2003)
Posted 11:19 a.m.,
August 19, 2003
(#7) -
tangotiger
I meant more in say Brady Anderson or others with great 600 PA numbers, but not great 1800 PA numbers.
Essentially, and this happens in hockey too, you give a player a 4-year contract based on his first breakout year. And how did they know it was a breakout year? They were hoping/looking for it, and those 600 PAs confirmed it for them.
Unless something fundamental has changed about a player or pitcher's approach to a player, a breakout year is virtually impossible to find. They exist, but I don't think you can find it until well-after the fact. 1997 may have been a breakout year for Pedro, that maybe everything finally came together for him at that time. But we only know that if it's 1999 or 2000, if we use only the numbers.
Was 1988 a breakout year for Mark Davis? Well, he did even better in 1989, and then phhft. If you do a systematic view, I would be surprised if you can find a breakout year, based only on the numbers.
Your visual tools might have spotted Pedro's breakout in 1997, and you might have realized that Mark Davis might have been getting lucky in 88/89.
Advances in Sabermetrics (August 18, 2003)
Posted 12:33 p.m.,
August 19, 2003
(#9) -
tangotiger
It's interesting that we've now got 2 readers saying such a thing, that this advance is 20 years old.
Let me ask a question then: all you know is the following 2 bits of information
- Johnny Walker has an OBA of .380, with 600 PA
- the league OBA is .340
What is your best guess as to JW's true OBA talent level? That is, if he were to have 1 million PAs, what's your single best guess as to his true OBA level? Are his chances at really being .380+ equal to, more than, or less than 50%?
Advances in Sabermetrics (August 18, 2003)
Posted 2:52 p.m.,
August 19, 2003
(#12) -
tangotiger
Ed, this is where we will have to disagree. Our best guess is that it is less than .380.
A player's performance numbers are a sample of his true talent. It is observed.
If I had 1000 such players, regression towards the mean would say that as a group, they would move towards .340. Therefore, if the group moves towards the mean, more than half has to move toward the mean (unless you think these moves are not symmetrical enough that we can say such a thing).
Little Johnny Walker: .380 OBA, 50 PAs
League: .340 OBA
What's your best guess there?
Advances in Sabermetrics (August 18, 2003)
Posted 3:31 p.m.,
August 19, 2003
(#15) -
tangotiger
Rally, yup it would have to be under .380. And you are right, the second part, the degree of movement would be dependent on the number of PAs.
For 600 PAs, I'd guess a 30% regression on OBA, or .368. For 50 PAs, I'd guess an 80% regression, or .348. Just some guesses, and a little ingenuity really. Each component has its own regression factor based on # of PAs.
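The two guesses follow from one line of arithmetic: move the observed rate toward the league rate by the regression fraction. A minimal sketch, with the 30% and 80% fractions taken from the post as guesses and the function name made up:

```python
def regress(observed, population_mean, regression):
    """Move `observed` toward `population_mean` by the fraction `regression` (0 to 1)."""
    return observed - regression * (observed - population_mean)

league_oba = 0.340
jw_600 = regress(0.380, league_oba, 0.30)   # 600 PA, guessed 30% regression
jw_50 = regress(0.380, league_oba, 0.80)    # 50 PA, guessed 80% regression
print(f"{jw_600:.3f} {jw_50:.3f}")          # 0.368 0.348
```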
Advances in Sabermetrics (August 18, 2003)
Posted 3:36 p.m.,
August 19, 2003
(#16) -
tangotiger
I agree that James did do a good job of it. But, I'm not so sure he understood WHY he was doing it, nor do I think the stat fan really understood the implication of all this. Even fans on this site will quote you the 11-20 Shawn Green has against Bacardi Rum, and that this means something of any significance.
James also "invented" replacement level, so I'm not sure what Gary H was talking about with Woolner's advances on that. Comparing the two, James inventing replacement level and regression towards the mean, and the extra knowledge that Woolner has added, and the extra knowledge that we've all come to know about regression from the statisticians around, I think the regression issue had a bigger advance.
Anyway, sorry for manipulating the topic. Any other advances, or advances-to-come?
Advances in Sabermetrics (August 18, 2003)
Posted 8:50 p.m.,
August 19, 2003
(#21) -
Tangotiger
I get these insulting messages every now and then, so I suppose I should address them every now and then.
Off the top of my head, I linked to a BP article regarding sample size and the Redsox and BIP and I praised them for taking the time to take a full paragraph to explain the limitations.
Nate had an excellent piece on replacement level, and I praised that too.
Sheehan had an article on pitcher workloads, and I can't remember what I said, but I was I think complimentary.
So, that was probably in the last 2 or 3 months.
Over at Clutch, I linked to the Jack Morris reprint article of Joe Sheehan. I remember another link to Craig Biggio, and then another to Andruw Jones too.
I'm pretty sure I had a direct link back to the article.
Oh yeah, I think I linked to a Randy Johnson quote that Ryan Wilkins had, but I don't remember if I did link to them.
I've made I think 2 links to Keith Scherer on balls and strike counts, though I distinctly remember at least one.
Ok, so I didn't put a direct link in the title to Brook Keisheickiek, but I did reference it in my comment, and it was not even germane to BP. I just liked the idea of dual players. And same here. I just like the question that the BP reader had. Do I have to link to the whole chat?
And finally, I make some tongue in cheek comments about "unnamed authors" at BP with their TP series (no bylines). So, I intentionally didn't name Gary in my opening piece as a play on that. But, who cares anyway?
Now, is that satisfactory to you?
In terms of BP and Primer referencing each other, I have to say that there are 100 BP links from Primer to every 1 Primer link from BP. If you want to ask about policy, don't ask me.
Now that you've p-ssed me off, and taken 10 minutes out of my life, I'm going to play with my kid now. Why do I waste my time? Sheesh...
Advances in Sabermetrics (August 18, 2003)
Posted 10:13 a.m.,
August 20, 2003
(#24) -
tangotiger
But the forecasts for t+1 are not the same thing as the best guess of the true values for individual players. How could they be? If there is measurement error or other random errors that lead us to believe that one set of 600 PA is only an estimate of a player's true value, we should believe the same thing about any set of 600 PA, right?
Actually, the way I worded the question, by saying "1 million PAs", was a way to say "don't worry about random variations that will exist in all PAs, as they will cancel out, at least to the 3 significant digits after 1 million PAs." I actually don't know if 1 million PAs will give me something as less than +/-.0005 99.999% of the time, but whatever "1 million" should be, that's what I meant. I should have used the safe Austin Powers "100 billion".
Even if you knew that his true OBA was .368, that doesn't mean that in the next 600 PAs he will perform at .368, just that his performance will center around that, with some distribution around it. Just like flipping a coin.
Anyway, I'm using "true rate" and "future estimated rate, with five 9s, within .0005" interchangeably, even though, technically that is inaccurate.
Advances in Sabermetrics (August 18, 2003)
Posted 12:00 p.m.,
August 20, 2003
(#26) -
tangotiger
I don't think you are wrong on anything.
In essence, the more data you have the more able you are able to discern the true ability from past performance.
I agree with this statement especially.
And for the regression towards the mean, I agree that the larger your group, the better you will be able to estimate the group's true mean by applying the appropriate regression factor, but as the group gets smaller and smaller (down to a group of 1), your confidence in that regression becomes smaller and smaller. While you may guess that that single player's OBA would be .368, that's really just an average estimate that is centered around .368, that could be between .310 and .440, with various probability rates for each point. In fact, even that exact point (.3680000000) is impossible.
The likelihood is that JW IS a below .380 true hitter, but I might only be [insert appropriate number here]% sure of that.
And like you mentioned, the more data you have (whether more "n", or more description of each "n"), the more reliable your estimate can be.
Advances in Sabermetrics (August 18, 2003)
Posted 12:41 p.m.,
August 21, 2003
(#28) -
tangotiger
A player will regress 100% towards his true mean.
A group of players will regress [insert number]% toward the population mean, from which they were drawn.
Regression towards the mean is the second case, and not the first.
We are trying to estimate the group mean as best we can by looking at the observed mean of the group, the observed mean of the population, and regressing a certain amount based on other factors (correlation between the two samples of the population).
If we already knew the individual player's true mean, we wouldn't need regression towards the population mean. We already know his true mean.
Advances in Sabermetrics (August 18, 2003)
Posted 11:01 p.m.,
August 21, 2003
(#31) -
Tangotiger
His 2003 performance is a sample of his true mean.
We can never know a player's true mean. We just know that every single day he is performing at his true talent level, which differs day-to-day, based on his conditioning, state of mind, etc.
I was just pointing out that regression is towards the population mean.
There's no need to think of regression towards his own mean, since, by definition, he will always play to his own mean.
Advances in Sabermetrics (August 18, 2003)
Posted 7:38 a.m.,
August 22, 2003
(#33) -
Tangotiger
To answer David directly, yes it was silly of me to say that you regress 100% to your own mean, and I probably made that more confusing than it was. My post 31 hopefully makes that clearer.
As for the Carroll statement, I'm reading and reading it, and I'm not sure what he's after there. That there's a large luck component to getting injured, and that other than your personal history and maybe position, there's not much more to it than that? Probably.
Advances in Sabermetrics (August 18, 2003)
Posted 9:59 a.m.,
August 22, 2003
(#35) -
tangotiger
9791 PA is not enough, and our best guess is that his .482 OBA IS a little less, probably regressed 5 to 10% towards the population mean, or .470 to .475 or so.
A player will always play to his true mean for every play, and this mean will be different play to play. As his sample number of plays approaches infinity, his average performance level in those plays will approach his average true talent level over that time span of plays.
So, I should never have said the "100% towards his own mean". I just meant the above paragraph.
So, a player himself does not regress towards the population mean, but we regress his sample performance towards the population mean to infer as a best guess what his average true talent level was over that time span.
Advances in Sabermetrics (August 18, 2003)
Posted 4:03 p.m.,
August 22, 2003
(#37) -
tangotiger
The regression factor would be different for various events. For example, with OBA, the year-to-year r might be .70 for 600 PA, and therefore, you would regress 30% towards the mean. But, the year-to-year r might be .50 for BA, and so you regress 50% towards the mean. Each metric has its own regression factor.
My guess is that at 10,000 PAs, the OBA needs to be regressed between 3 and 10%, while the BA needs to be regressed between 5 and 15%, towards the population mean.
Rob, I'm looking forward to your results.
Advances in Sabermetrics (August 18, 2003)
Posted 6:51 p.m.,
August 22, 2003
(#39) -
Tangotiger
We regress a player's sample performance mean towards his population mean as an estimate of his true talent mean, such that we minimize the error for the group.
A player's sample performance was done at discrete points over a certain amount of time (days, years, etc). His true talent level at those discrete points was not constant (since he is human... except maybe Bonds).
So, seeing that you know how he performed, seeing how you can account for his context, and seeing how the population (i.e., the average player) does in that context, you're now ready to regress a group of players similar to your player a certain amount towards the mean.
*****
A true .400 player, over 10,000 PAs will have a stdev of .005. So, that true known .400 player will be between .390 and .410 95% of the time over 10,000 PAs (hopefully I'm doing this right, going from true to observed and not the other way around.... been a long day for me too).
Get 100 times more PA, and the standard deviation shrinks by a factor of 10 (the square root of 100), and so that .005 becomes .0005. So, at 1 million PAs, your true talent and your performance levels converge (at +/- .001 95% of the time).
I think.
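The standard deviations above come from the binomial formula sqrt(p(1-p)/n). A short sketch for both sample sizes, using the usual rule of thumb that roughly two standard deviations either side covers about 95% (the function name is made up):

```python
import math

def binomial_sd(p, n):
    """Standard deviation of an observed rate over n independent trials."""
    return math.sqrt(p * (1 - p) / n)

for n in (10_000, 1_000_000):
    sd = binomial_sd(0.400, n)
    print(f"n={n:>9,}  sd={sd:.5f}  ~95% band = +/-{2 * sd:.5f}")
```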
Advances in Sabermetrics (August 18, 2003)
Posted 10:39 p.m.,
August 22, 2003
(#40) -
Tangotiger
To confirm the above numbers, I ran a sim, where I gave my true .400 OBA player 10,000 PAs each season for 600 seasons. The standard deviation was .00495.
Our expectation was sqrt(.4*.6/10000) = .00490
So, 10,000 PAs is not enough to say that performance ~ true talent.
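A sim along these lines is easy to rerun. The original program is not shown, so this is only a sketch that assumes independent PAs with a fixed .400 chance of reaching base; the seed and names are arbitrary, and the simulated figure will differ slightly from .00495 from run to run.

```python
import random
import statistics

def simulate_season(true_rate, pa, rng):
    """Observed rate over `pa` independent PAs for a player with a fixed true rate."""
    return sum(rng.random() < true_rate for _ in range(pa)) / pa

rng = random.Random(2003)   # arbitrary seed, only for repeatability
seasons = [simulate_season(0.400, 10_000, rng) for _ in range(600)]

observed_sd = statistics.pstdev(seasons)
expected_sd = (0.4 * 0.6 / 10_000) ** 0.5
print(f"simulated sd={observed_sd:.5f}  expected sd={expected_sd:.5f}")
```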
Advances in Sabermetrics (August 18, 2003)
Posted 4:54 p.m.,
August 25, 2003
(#45) -
tangotiger
Yes! You can use anything you want really, as long as you specify your criteria not based on the numbers you are measuring. That is, is Sosa a power hitter because the variable you are studying says so about him?
But, yes, you can select RF who swing hard and are bulky and make that your representative population, and draw Sammy from there.
Advances in Sabermetrics (August 18, 2003)
Posted 10:12 a.m.,
August 26, 2003
(#49) -
tangotiger
Rob, great stuff! I can confirm that between 500 and 2000 PA, those results are in line with empirical results of year-to-year r, with the regression towards the mean being 1-r. Great!
Advances in Sabermetrics (August 18, 2003)
Posted 11:20 a.m.,
August 26, 2003
(#51) -
tangotiger
I think they only apply to OBA, if OBA is dependent only on the batter's skills. Therefore, I think you have to regress a little more towards the mean with OBA, a lot more with BABIP, etc, etc.
For SLG, it's more complicated. You have to have successes/trials, and SLG won't work that way.
Solving DIPS (August 20, 2003)
Posted 12:09 p.m.,
August 20, 2003
(#2) -
tangotiger
I'm actually exhausted Erik! I've been putting off doing some other baseball stuff for 2 weeks, and I'm really happy with the way this thread unfolded.
Feel free to run more sims, but I don't think they're necessary at this point. They might be valuable if you do more breakdowns, like with GB/FB and Lefty/Righty and by the 7 fielding positions, etc. I think Arvin's equation works on independent variables, but I don't think that would apply here.
But, I think your work and Arvin's should shake things up!
Solving DIPS (August 20, 2003)
Posted 4:04 p.m.,
August 20, 2003
(#4) -
tangotiger
It's an interesting thought, and I'll pass it along to the group.
One thing I did a few months ago was to clean up my website by ordering things, so that it's more useful. Of course, since then, I've added a few more articles, but I have not updated the index to point to them. Story of my life. My wife has been after me to update the pictures of my baby on our personal site. I'm 7 months behind that too.
Many many times I've thought about doing a "best of" kind of deal, and putting things in one place. But like you are alluding to: time/money/work/family is a tough thing to balance.
I agree though that it's nice to have everything in one place, and I think that within 1 year, maybe less, I'll have consolidated everything I've done into something organized, if not in PDF/book format, at least in a "finalized" fashion.
Thanks for the idea!
Solving DIPS (August 20, 2003)
Posted 10:26 p.m.,
August 20, 2003
(#7) -
Tangotiger
(homepage)
FJM: check out the above link. It lists the UZR for all players, min 120 games over 4 years. Maybe you can take that, bring up the threshold to 240 games or 300 games or something, and run your thingie again. I'd like to see the results against UZR.
I agree that there is greater variability at 1b,3b than ss,2b. I think my numbers bore that out (.022 or something for ss and .027 or something for 3b). It's reassuring that ZR showed something similar, but a bit higher (which we'd expect because ZR includes the park factor, and pitcher tendency/handedness effect, which UZR strips away).
Anyway, just eyeballing the UZR chart, and things do look normal, but I agree that you would expect at positions that don't tolerate bad defense to have a different skew. 3b is neutral-type of position, and so we should expect no skew, and wide variance. 1b we expect the skew opposite of SS.
Good stuff!
Andrew: thanks for the offer. I'm not sure what can be done. I usually work on impulse, and have a habit of leaving a lot of things unfinished.
Solving DIPS (August 20, 2003)
Posted 7:49 a.m.,
August 21, 2003
(#9) -
Tangotiger(e-mail)
I can send you the annual Team UZRs. Send me an email.
To translate runs into a rate stat, you divide by the number of plays per year at that position. For example, I think I set 1B at 2x162 and 3B at 4x162.
(Actually, I kinda fudge a little: if a SS makes 3 of the 21 outs on BIP, and there were 28 BIP, I give him 4 "plays". It kinda keeps things in line, since each BIP doesn't belong to any one fielder.)
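Both steps in this post are simple ratios, sketched below. The plays-per-game values are the ones recalled in the post ("I think"), the +10 runs input is purely hypothetical, and the function names are made up for illustration.

```python
GAMES = 162
PLAYS_PER_GAME = {"1B": 2, "3B": 4}   # as recalled in the post

def runs_per_play(runs, position):
    """Convert season runs above average into a per-play rate."""
    return runs / (PLAYS_PER_GAME[position] * GAMES)

def credited_plays(outs_by_fielder, team_outs_on_bip, team_bip):
    """The 'fudge': scale a fielder's share of the outs up to a share of all BIP."""
    return outs_by_fielder / team_outs_on_bip * team_bip

print(f"{credited_plays(3, 21, 28):.1f}")     # 4.0, the SS example from the post
print(f"{runs_per_play(10, '3B'):.4f}")       # hypothetical +10 runs at 3B
```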
Solving DIPS (August 20, 2003)
Posted 10:54 a.m.,
August 21, 2003
(#11) -
tangotiger
Hmmm... the batter. From the perspective of the pitcher, the true variance of the batter, and any random element, would be zero (I think I'm saying that correctly). Even something as substantial as the park has a stdev of .004, barely making a dent into the equation.
I don't think it's an issue in this case.
Solving DIPS (August 20, 2003)
Posted 2:24 p.m.,
August 22, 2003
(#12) -
tangotiger
(homepage)
Guys,
I just wanted to thank you all for this thread again. It's been a very big eye opener for me, and I enjoyed tremendously the work that Erik and Arvin especially put in, as well as the different perspectives of everyone who posted. This may have been the only DIPS thread where it was truly a pleasure to read everyone's posts.
I don't think I will be doing a summary of this summary. If someone would like to do it, feel free to jump in.
I've been trying to "bend the wand" for about a month now, but this great DIPS work really reeled me in. And other things that I've read on other topics around (like at battersbox and Clutch) have conspired to pull me in further.
Anyway, looks like the only way for me to stop procrastinating is to go cold turkey. So, after this weekend, I won't be stopping by for a while, or reading anything else online. If someone wants me to post some links in Primate Studies, I'll be glad to do so, but I won't offer any of my thoughts on the matter. I'll be back in time for the World Series in a limited capacity.
MGL and I have talked about maybe starting a site to preview our research, so maybe we'll have something worked out by then. You can join the group at the "homepage" link above to be on the mailing list.
Thanks again guys.... truly fun to talk with all of you.
Tom
Solving DIPS (August 20, 2003)
Posted 6:37 p.m.,
August 25, 2003
(#16) -
Tangotiger
Remember the equation:
True variance (DER) = True variance (pitching) + True variance(fielding) + True variance (park) + True variance (hitting) + True variance (fill in the blanks)
We know that the true variance is .012 for DER. My guess is that the true variance for hitting, from the perspective of the pitcher, to be close to zero.
I'm pretty sure this is how we are supposed to look at it, but I'll defer to the statisticians.
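Because the components add as variances and not as standard deviations, a small component contributes very little. The sketch below assumes the .012 figure for DER and the .004 park figure mentioned earlier in this thread are both standard deviations; that reading is an assumption, not something the posts state outright.

```python
import math

sd_der = 0.012    # assumed to be a standard deviation (post calls it "true variance")
sd_park = 0.004   # park figure quoted earlier in the thread

# Variances add, so subtract the park's variance and take the root again.
sd_without_park = math.sqrt(sd_der ** 2 - sd_park ** 2)
park_share = sd_park ** 2 / sd_der ** 2

print(f"sd without park = {sd_without_park:.4f}")
print(f"park share of variance = {park_share:.1%}")
```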
Solving DIPS (August 20, 2003)
Posted 11:01 p.m.,
December 26, 2003
(#18) -
tangotiger
This article is this week's "Oprah's Book of the Week", and required reading for anyone who missed it.
Solving DIPS (August 20, 2003)
Posted 9:50 a.m.,
December 27, 2003
(#20) -
tangotiger
I'm drawn by the intelligence of the readers here... it's my vice. But, yes, I am once again (third time now?) wondering whether to take a break.
Making (some) sense of RBI (August 20, 2003)
Posted 3:09 p.m.,
August 20, 2003
(#1) -
tangotiger
(homepage)
The battersbox article is at the homepage above.
Rereading it, I see that Elias Bureau does virtually the same thing as I do, though I'm not sure how they handle the HR issue.
Making (some) sense of RBI (August 20, 2003)
Posted 10:45 p.m.,
August 20, 2003
(#4) -
Tangotiger
(1) Where do you get the data for #1 and #2.
All Ray Kerby, my lord and saviour.... uhmmm, my saviour anyway. The query is easy. Put "basesit:outs" in the key field, and "n pa rbi hr" in the output field, and "1999-2002" in the years field. 2 minutes later, you look prolific.
(2) If its already written up what is the laborious process for controlling (b) and (c). [Skip this if its not written up]
It's not written up. For (b) you figure out the following. Say you have a guy with 600 PA, 150 H, 50 2b, 30 HR, 25 BB, etc, etc... how many RDI would this guy get if he were to get a normal number of opps in each of the 24 base out states? (You have to figure how many RDI a double with man on 1b/2b, 1 out gets for the league, etc, etc.) So, what that gives you is "if my player performed the same across the 24 base-out states, how many RDI would he have gotten?" How many did he actually get? The difference is his clutch. For (c) you have a little tougher time. How often did he have Raines at 1b and 1 out? How does a speedster like this do when a double is hit? etc, etc. Kinda complicated, but something along those lines. You may also want to check the Tom Ruane article on Joe Carter at www.baseballstuff.com
If one were to try and build upon this work and determine if there are skills for Ichiro's magic ...if the HR has a contextual value it still must be considered?
I'm only considering the base/out state first of all. If you hit a HR, regardless of the base/out state, that's 1 run. If you happen to hit a HR with 2 men on, that's 2 RDI. So, you'll get "credit" for the RDI with the timely HR. You just don't need to get the credit for driving yourself in in a "timely" situation, since driving yourself in has nothing to do with the base/out state.
I ask this not to be a smart ass, but instead to understand and or extrapolate my worldly view on baseball. If one were to accept that (1) some batters were to change their approach on the AB based on the dynamics of the game; (2) they have differing performance parameters based on their approach.
There's no question that every batter/pitcher matchup is unique based on the context (inning/score/base/out/park, etc, etc). Therefore, absolutely everyone changes their approach to some degree, each hoping to perform optimally, and most likely everything cancelling out. But not quite, and certainly not in all instances.
We do know that certain base/out situations, like certain counts, are "hitter's states" and "pitcher's states". So, batters can leverage say the bases loaded 0 outs situation. Maybe they overcompensate, etc. A look at the league totals at each base/out state will show you the direction that these matchups go.
I do believe in all that. I don't believe we can say with much reliability who does what though. We just know general tendencies as to what they are probably doing. Clutch ability exists, but is more elusive to find than a pitcher's skill at preventing a hit on BIP. You won't be able to find the numbers to state at a high enough statistical significance that a player is clutch, whether he is Eddie Murray or Manny Ramirez. In fact, once you look at Murray by the 8 base states (and count the SF as a regular out... VERY IMPORTANT!), you will see that Murray's entire clutch value is when the bases are loaded. You can probably show some significance there, but I don't think anywhere else.
I hope I answered your issue, even though I went off to some other place!
Making (some) sense of RBI (August 20, 2003)
Posted 10:47 p.m.,
August 20, 2003
(#5) -
Tangotiger
how many RDI would this guy get if he were to get a normal number of opps in each of the 24 base out states? (You have to figure how many RDI a double with man on 1b/2b, 1 out gets for the league, etc, etc.)
Sorry... I said that wrong. How many RDI would he get if he performed the same in all baseout states, given his actual opps in the baseout states?
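To make the corrected version concrete, here is a minimal sketch in Python. Every number in it is made up for illustration (only 3 of the 24 base-out states are shown, and only three event types), and the names `league_rdi_per_event`, `player_rate`, `player_opps` and `expected_rdi` are hypothetical. It assumes the player's overall per-PA rates apply in every state, weights them by his actual opps in each state, and treats the gap between actual and expected RDI as the "clutch" figure.

```python
# League-average RDI per event in each base-out state (hypothetical values).
# A HR with the bases empty is 0 RDI because the batter driving himself in is excluded.
league_rdi_per_event = {
    ("1b_2b", 1): {"1B": 0.60, "2B": 1.40, "HR": 2.00},
    ("empty", 0): {"1B": 0.00, "2B": 0.00, "HR": 0.00},
    ("3b", 2): {"1B": 1.00, "2B": 1.00, "HR": 1.00},
}

# The player's overall per-PA event rates, assumed identical in every state.
player_rate = {"1B": 0.15, "2B": 0.05, "HR": 0.05}

# The player's actual number of PA in each state (hypothetical).
player_opps = {("1b_2b", 1): 40, ("empty", 0): 200, ("3b", 2): 20}


def expected_rdi(opps, rates, league):
    """RDI the player would get if he hit the same in every base-out state."""
    total = 0.0
    for state, pa in opps.items():
        rdi_per_pa = sum(rates[ev] * league[state][ev] for ev in rates)
        total += pa * rdi_per_pa
    return total


actual_rdi = 17  # hypothetical observed total
expected = expected_rdi(player_opps, player_rate, league_rdi_per_event)
clutch = actual_rdi - expected
print(f"expected RDI: {expected:.1f}, clutch: {clutch:+.1f}")
```

A real version would cover all 24 states and every event type, with the league table built from play-by-play data.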
Making (some) sense of RBI (August 20, 2003)
Posted 9:15 a.m.,
August 21, 2003
(#6) -
tangotiger
(homepage)
By the way, I would only take this RBI thing so far.
If you want to get serious about it, I suggest you click the homepage link above. That was written 2 years ago, but applicable all the time.
CF Rankings (August 22, 2003)
Posted 1:56 p.m.,
August 22, 2003
(#3) -
tangotiger
This is the number of putouts by OF, for each team. Feel free to come up with "league averages" (it sure seems as though you have a lot of NL teams under 1000).
TOR 1088
TEX 1066
TBA 1215
SEA 1172
SLN 1093
SFN 1148
SDN 1014
PIT 907
PHI 940
OAK 976
NYN 1046
NYA 1031
MON 989
MIN 1202
MIL 1083
LAN 1017
KCA 1097
HOU 926
FLO 1051
DET 1150
COL 1062
CLE 1054
CIN 1065
CHN 973
CHA 1081
BOS 983
BAL 1069
ATL 1057
ARI 979
ANA 1182
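For anyone who wants the "league averages", a short Python sketch follows. The putout totals are copied from the list above; the split into AL and NL is my assumption, based on the 2003 league alignment for these team codes.

```python
# OF putouts by team, copied from the list above.
of_putouts = {
    "TOR": 1088, "TEX": 1066, "TBA": 1215, "SEA": 1172, "SLN": 1093,
    "SFN": 1148, "SDN": 1014, "PIT": 907, "PHI": 940, "OAK": 976,
    "NYN": 1046, "NYA": 1031, "MON": 989, "MIN": 1202, "MIL": 1083,
    "LAN": 1017, "KCA": 1097, "HOU": 926, "FLO": 1051, "DET": 1150,
    "COL": 1062, "CLE": 1054, "CIN": 1065, "CHN": 973, "CHA": 1081,
    "BOS": 983, "BAL": 1069, "ATL": 1057, "ARI": 979, "ANA": 1182,
}

# Assumed league membership (2003 alignment); every other code is treated as NL.
AL_TEAMS = {"TOR", "TEX", "TBA", "SEA", "OAK", "NYA", "MIN",
            "KCA", "DET", "CLE", "CHA", "BOS", "BAL", "ANA"}

al = [po for team, po in of_putouts.items() if team in AL_TEAMS]
nl = [po for team, po in of_putouts.items() if team not in AL_TEAMS]

al_avg = sum(al) / len(al)
nl_avg = sum(nl) / len(nl)
nl_under_1000 = sorted(t for t, po in of_putouts.items()
                       if t not in AL_TEAMS and po < 1000)

print(f"AL average: {al_avg:.1f} over {len(al)} teams")
print(f"NL average: {nl_avg:.1f} over {len(nl)} teams")
print("NL teams under 1000:", nl_under_1000)
```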
Double-counting Replacement Level (August 25, 2003)
Posted 3:32 p.m.,
August 25, 2003
(#2) -
tangotiger
Great perspective Patriot!
FWIW, using only 2001 superLWTS, and setting 300 PA as the line between regulars and backups, I get, per 680 PA:
regulars: +6 overall, +6 batting, 0 fielding
backups: -22 overall, -21 batting, -1 fielding
The *players* that are replacement level (backups, or shades below backups) are *average* fielders.
There's no such thing as a replacement level fielder or replacement level hitter... there are replacement level *players*. A replacement level player turns out to be an average fielder.
Double-counting Replacement Level (August 25, 2003)
Posted 4:30 p.m.,
August 25, 2003
(#7) -
tangotiger
In terms of "value" for Edgar, I always look at it this way: "How would a baseline player do, if he was in Edgar's shoes?" From that standpoint, your baseline player will have no fielding contributions, just like your baseline AL pitcher has no hitting contributions.
I agree that Edgar theoretically limits the way the team is set up, in that if they had a truly horrible fielder, they couldn't hide him at DH..... except, have you looked at the way teams use the DH? It's not reserved to just the bad fielders. You'll get decent fielders in there. Edgar being in the DH doesn't really affect the way teams do their business.
Double-counting Replacement Level (August 25, 2003)
Posted 4:50 p.m.,
August 25, 2003
(#8) -
tangotiger
I agree with Patriot. BP should never have added it up as they do.
I mean, why stop there? Why not set the replacement level for batting, for stealing, for taking the extra base, for range, for throwing, for every facet of play? And then add it up.
The idea of replacement level is exactly what Patriot is saying: that a replacement level player = 0 wins = 0 (or 300K) dollars in salary. It's the minimum level of play in which you will get paid MLB dollars.
As I mentioned in my scales article a few months ago, *first* compare everything to average, and then, as a *final* step, compare to replacement. Do *not* have your intermediary steps do replacement as well.
Double-counting Replacement Level (August 25, 2003)
Posted 10:52 a.m.,
August 26, 2003
(#12) -
tangotiger
I looked at 1999-2002 UZR. I selected, by year, all players with at least 81 "UZR games" (treat that as "full" games), including if they had 60 games at 2b and 30 games at SS. Those are my regulars.
Then, by position, I figured the regular's UZR/162. I did the same for the backups. Here are the results.
pos Regular Backups diff
3 0.5 -1.8 2.3
4 0.8 -1.9 2.7
5 2.0 -5.0 7.0
6 1.7 -6.0 7.7
7 0.9 -1.5 2.4
8 1.2 -3.9 5.1
9 0.6 -0.9 1.5
So, we see that the regulars are slightly above average fielding-wise, at about +1 relative to all players at their position. The backups are -3 relative to all players at their position. That makes the difference, 4 runs, how much an average regular is better than an average backup, fielding-wise.
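The summary figures can be checked with a few lines of Python. This sketch takes a simple unweighted average across the seven positions in the table, which is my assumption; a version weighted by playing time could come out slightly differently.

```python
# UZR/162 by position, copied from the table above: (regulars, backups).
uzr_per_162 = {
    3: (0.5, -1.8),
    4: (0.8, -1.9),
    5: (2.0, -5.0),
    6: (1.7, -6.0),
    7: (0.9, -1.5),
    8: (1.2, -3.9),
    9: (0.6, -0.9),
}

regulars = [reg for reg, _ in uzr_per_162.values()]
backups = [bak for _, bak in uzr_per_162.values()]

# Unweighted means across positions (an assumption, see above).
avg_regular = sum(regulars) / len(regulars)
avg_backup = sum(backups) / len(backups)
avg_diff = avg_regular - avg_backup

print(f"regulars {avg_regular:+.1f}, backups {avg_backup:+.1f}, diff {avg_diff:.1f}")
```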
If someone wants to repeat this exercise for hitters (I'd HIGHLY suggest using LWTS) by position, that would be nice.
Double-counting Replacement Level (August 25, 2003)
Posted 9:02 p.m.,
August 29, 2003
(#15) -
Tangotiger
I don't know what "all BP" does. I'm just telling you what is on their site, and you can see the result by looking at Mike Schmidt.
Look at what they do, and not at what they say.
Double-counting Replacement Level (August 25, 2003)
Posted 12:56 p.m.,
August 31, 2003
(#18) -
Tangotiger
You have to regress their observed performance to establish their true rates. At 900 PAs, you probably regress about 25%, so that -35 runs would come in at -26 runs.
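As a quick sketch of that arithmetic in Python (the function name `regress_to_mean` is just for illustration, and the mean is taken to be 0 runs, i.e. league average):

```python
def regress_to_mean(observed, mean, fraction):
    """Move the observed value the given fraction of the way toward the mean."""
    return observed + fraction * (mean - observed)


# -35 runs observed over about 900 PA, regressed 25% toward an average of 0 runs.
estimate = regress_to_mean(-35.0, 0.0, 0.25)
print(estimate)
```

That gives -26.25, which rounds to the -26 runs quoted above.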
Double-counting Replacement Level (August 25, 2003)
Posted 12:58 p.m.,
August 31, 2003
(#19) -
Tangotiger
To put it another way, you selectively sampled your players by looking at their performance after the fact, and selecting on that. That's a no-no.
However, if you take ALL those players based on 2000-2002, AND THEN, tell me what their average performance was in 2003, then, that's the correct figure to use as your replacement level.
And, it'll probably come in at around -26 runs or so.
Double-counting Replacement Level (August 25, 2003)
Posted 4:45 p.m.,
August 31, 2003
(#21) -
Tangotiger
You can't selectively sample your group after-the-fact on the metric that you are studying. To combat this selection issue, you regress. Otherwise, your sample is tainted.
Why did you choose a PA cutoff then? Why not select ALL players, and take the worst runs / PA of the bunch? If you have a guy whose rate is -60 runs / 600 PA, but he did this after only 25 PAs, then so be it. I agree, ridiculous.
Double-counting Replacement Level (August 25, 2003)
Posted 9:40 p.m.,
August 31, 2003
(#23) -
Tangotiger
The reliability of a metric is always dependent on the number of trials, and not the number of successes.
If you flip a weighted coin, and you get 73 heads in 100 flips, is that 73 successes or is it 27 successes? 73 hits may be a success to a hitter, but 27 would be the success to a pitcher.
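One way to see this, sketched in Python under the assumption of a simple binomial model (independent flips with a fixed probability): the standard error of the observed rate comes out the same whether you call heads or tails the "success", and it shrinks only as the number of trials grows.

```python
import math


def standard_error(successes, trials):
    """Binomial standard error of an observed rate."""
    p = successes / trials
    return math.sqrt(p * (1 - p) / trials)


se_heads = standard_error(73, 100)   # heads counted as the success
se_tails = standard_error(27, 100)   # tails counted as the success
se_more_trials = standard_error(730, 1000)  # same rate, ten times the trials

print(f"{se_heads:.4f} {se_tails:.4f} {se_more_trials:.4f}")
```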
Double-counting Replacement Level (August 25, 2003)
Posted 7:57 a.m.,
September 1, 2003
(#26) -
Tangotiger
Yes, it would apply. That is simply your best guess.
In the case of 100 PA, you'd regress probably 70% towards the mean. So, if he goes 100 for 100, and the league mean is .300, your best guess as to the true talent level that would produce such an observed rate is .510.
Instead of using a weighted coin, you can use an unbalanced die, where the weighting of the die changes for every roll, but skewed towards say landing on 4,5,6. This would be like a human where his "true rate" changes PA-by-PA, but centered around something.
Double-counting Replacement Level (August 25, 2003)
Posted 4:51 p.m.,
September 3, 2003
(#29) -
tangotiger
I was thinking a bit about this. The problem is that we use regression towards the mean on rate stats, when I'm not sure that's entirely accurate, especially when you have distributions such as this.
So, I propose the following, with an example. Say the league mean is .300 and your 100 PA player is a .950 player. The regression towards the mean is set at .700 for a player with 100 PAs.
Let's break out our ratios.
.300 rate = .300/.700 = .429 ratio
.950 rate = .950/.050 = 19.000 ratio
.400 rate = .400/.600 = .667 ratio
With a .400 player with 100 PA, we would normally do a regression as 70% towards .300, or .330. With our new-fangled ratio method, that would become
new ratio = .667 - (.667 - .429) * .700 = .500
new rate = .5 / (1+.5) = .333 (as opposed to our previous .330)
With your .950 player and 100 PA, that becomes
new ratio = 19 - (19-.429) * .700 = 6.00
new rate = 6 / (6+1) = .857
I don't know if that even makes mathematical sense, but I find that my trusty ratios always come through in the pinch.
(That 70% regression for a rate might translate to 73% for a ratio, or something.)
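Here is the same proposal as a small Python sketch, so the two approaches can be compared side by side. The function names are made up for illustration; the .300 mean and the 70% regression are the figures from the example above.

```python
def to_ratio(rate):
    """Convert a rate (successes per trial) to a ratio (successes per failure)."""
    return rate / (1 - rate)


def to_rate(ratio):
    """Convert a ratio back to a rate."""
    return ratio / (1 + ratio)


def regress_rate(rate, mean, fraction):
    """Ordinary regression toward the mean, done directly on the rate."""
    return rate - (rate - mean) * fraction


def regress_via_ratio(rate, mean, fraction):
    """Regress on the ratio scale, then convert the result back to a rate."""
    ratio, mean_ratio = to_ratio(rate), to_ratio(mean)
    return to_rate(ratio - (ratio - mean_ratio) * fraction)


for observed in (0.400, 0.950):
    plain = regress_rate(observed, 0.300, 0.700)
    via_ratio = regress_via_ratio(observed, 0.300, 0.700)
    print(f"{observed:.3f}: rate method {plain:.3f}, ratio method {via_ratio:.3f}")
```

Near the mean the two methods barely differ; the gap only opens up for extreme observed rates like .950.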
Double-counting Replacement Level (August 25, 2003)
Posted 4:55 p.m.,
September 3, 2003
(#30) -
tangotiger
Btw, your .980 player becomes a .936 player using this process. I think I may be onto something here. Maybe I should break out my stats books from 15 years ago.
Double-counting Replacement Level (August 25, 2003)
Posted 1:53 p.m.,
September 8, 2003
(#33) -
tangotiger
Joe, that looks about right, though I can't comment on what replacement level for fielding is used for those golden years. Assuming it's set the same way, then yes.
As for pitchers, there's no double-counting going on, though they should have their own runs-per-win converter. I think you already do this.
Double-counting Replacement Level (August 25, 2003)
Posted 3:49 p.m.,
September 9, 2003
(#34) -
tangotiger
(homepage)
Michael: what you are reporting about what Gary told you is inconsistent with what Clay is reporting at the above link. BP is, to the best that I can tell, double-counting the replacement level. This, according to the WARP-3, makes Loaiza a very viable MVP candidate.
My note to BP from last month was left unanswered, and therefore, I will report a "no comment" from them.
Double-counting Replacement Level (August 25, 2003)
Posted 3:52 p.m.,
September 9, 2003
(#35) -
tangotiger
To clarify, if WARP-3 did not double-count the replacement level for non-pitchers, it sees Loaiza as a viable MVP candidate.
Double-counting Replacement Level (August 25, 2003)
Posted 10:10 a.m.,
October 31, 2003
(#38) -
tangotiger
(homepage)
Clay says: ...an assumption that the ultimate replacement level team was the Cleveland Spiders of 1899, a combination of craptastic hitting, pitching, and fielding. That puts my "replacement level" player at a .130 win pct., significantly below the "freely available" threshold (which typically involves a .300-.350 win pct), but still above the "no contribution whatsoever" of Win Shares.
Is this reasonable? Is it reasonable that you can have crappy hitting AND crappy fielding from the same players at the MLB level, in today's day and age? Is this who you are trying to be better against?
The most reasonable baseline is that a MLB scrub non-pitcher is a bit over 2 wins / 162 GP worse than average (hitting and fielding). For pitchers, it's probably a bit under 3 wins / 27 full games. So a team of non-pitchers would be -2 x 9 = -18. A team of pitchers would be -3 x 6 = -18 wins as well. That's -36 wins from an average team of 81 wins, or 45 wins, or .278.
You can present data, based on your varying assumptions, that'll put the baseline at somewhere between .250 and .350 for a team. If you want to set the replacement level to .130, this would mean you have a team scoring 2.90 runs and allowing 7.75. I find that completely unreasonable.
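The baseline arithmetic above, written out as a short Python sketch. The inputs are the round numbers from the post (2 wins per position player over 162 games, 3 wins per pitcher slot, 9 and 6 slots); the variable names are only for illustration.

```python
GAMES = 162
AVERAGE_WINS = GAMES / 2          # an average team wins 81

wins_below_avg_per_hitter = 2     # scrub non-pitcher, per 162 GP
wins_below_avg_per_pitcher = 3    # scrub pitcher, per 27 full games
hitter_slots = 9
pitcher_slots = 6

team_wins = (AVERAGE_WINS
             - wins_below_avg_per_hitter * hitter_slots
             - wins_below_avg_per_pitcher * pitcher_slots)
team_win_pct = team_wins / GAMES

print(f"{team_wins:.0f} wins, {team_win_pct:.3f} win pct")
```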
***********
When the Tigers and the Mets and the Spiders get brought up as examples, I have to remind the readers about (a) the difference between a sample performance and a true talent level, as well as (b) the non-random distribution of talent among teams.
Taking that last one (b): there's some players on the 62 Mets and 03 Tigers that would not have seen the light of day on any other team.
As for (a): the stats of players are samples... SAMPLES.... OBSERVATIONS... of their true talent level. You simply can't take a player's stats and assume that that's representative of their true talent level, and therefore base a theoretical team from those stats.
If you've got a theoretical team of .300 talent, they will NOT play .300. They will play between .200 and .400. So, if you knew (which is impossible) that you had a team that is expected to win .300 over 1 million games, it's quite possible that they will play .200 over 162 games.
So, if you've got the 62 Mets and 03 Tigers or the 99 Spiders, you must, absolutely must, regress their performance to some degree to establish the true talent level of that team of players.
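For a rough sense of how wide that spread is, here is a Python sketch that assumes each game is an independent coin flip at the team's true talent level. That is a simplification (real schedules, injuries and roster changes are ignored), and the function name is hypothetical.

```python
import math


def win_pct_sd(true_pct, games):
    """Standard deviation of observed win pct under a simple binomial model."""
    return math.sqrt(true_pct * (1 - true_pct) / games)


sd = win_pct_sd(0.300, 162)
# How many standard deviations the .200 and .400 endpoints sit from .300.
z_edge = (0.400 - 0.300) / sd

print(f"sd of win pct over 162 games: {sd:.3f}")
print(f".200 and .400 are about {z_edge:.1f} sd from .300")
```

Under this model the random variation alone is about 6 wins per season, which is why a single extreme season has to be regressed before it is treated as a talent level.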
Double-counting Replacement Level (August 25, 2003)
Posted 10:26 a.m.,
October 31, 2003
(#39) -
tangotiger
Not to pick on Clay, since he's doing great work with translations, but his statement on Matsui:
...it sent me back to the drawing board with respect to Japanese translations, with Extra Special Attention paid to power. The result is a revised system specifically meant to deal with Japan, and not treating it like every other league in the States. If I had had these revisions in the spring, my forecast would have been more like .290/.375/.479 (22 HR) instead of the .290/.421/.567 (32 HR) we actually forecast - since his actual line was .287/.353/.435 (16 HR), that cuts more than half the error away.
PECOTA does the same thing, as does just about every regression equation out there. You can't include your samples to establish an equation, and then use those samples to test against. All you are doing is best-fitting your samples, which is not necessarily predictive of data outside your samples.
If Clay did not include Matsui in his samples, then that's another story, and I'd have no problem with it... as long as the equations being developed had no knowledge of Matsui. If on the other hand, Matsui was included, then you can't talk about "cutting errors in half", since Matsui was part of the sample group you established the equations on.
As an example, John Jarvis shows, using regression, that the value of a double is .67 runs. This is laughable. And then, he shows the RMSE of all teams, and shows that a LWTS equation, with the .67 figure, comes out the best! Well, it was best-fitted to do so. The true test would be to best-fit say the 1974-1990 time period, and then test against the 1961-1973 time period and 1991-2003 time period.
For PECOTA, MLEs, and other translation systems, you should only use a certain percentage of the data, and then test it against the rest of the data.
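A minimal sketch of that idea in Python, using synthetic data rather than any real baseball numbers. The "true" relationship (slope 2, intercept 1), the noise level and the 100/100 split are all arbitrary choices for illustration. The point is only the workflow: fit on one part of the data, then measure the error on the part the fit never saw.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the line's predictions."""
    sq_err = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return (sq_err / len(xs)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.5) for x in xs]

# Fit on the first half only; hold the second half out for testing.
train_x, train_y = xs[:100], ys[:100]
test_x, test_y = xs[100:], ys[100:]

slope, intercept = fit_line(train_x, train_y)
in_sample_rmse = rmse(train_x, train_y, slope, intercept)
out_of_sample_rmse = rmse(test_x, test_y, slope, intercept)

print(f"in-sample RMSE: {in_sample_rmse:.3f}")
print(f"out-of-sample RMSE: {out_of_sample_rmse:.3f}")
```

The in-sample error is the one the fit was tuned to minimize, so only the out-of-sample figure says anything about predictive accuracy.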
(Forewarning: if you're going to comment that my comment is too "harsh" or I'm picking on anyone, then send me an email to that effect, and we can discuss it privately. I'm not going to debate this type of issue in an open forum any more.)
Double-counting Replacement Level (August 25, 2003)
Posted 5:03 p.m.,
October 31, 2003
(#41) -
tangotiger
Well, John Jarvis then goes out and starts using that figure for other purposes.
As well, the regression value itself has a margin of error, which is ignored as well.
Double-counting Replacement Level (August 25, 2003)
Posted 5:17 p.m.,
November 4, 2003
(#45) -
tangotiger
We're slowly trudging along. The outline is all written, and the data is all pretty well parsed for easy querying. The hard part is really managing our family (and for me work) lives with this project. And, I know that writing and presenting the report will take up 80% of the time. As for a "pay-for" website, we've discussed it, but still not sure yet if/when to implement that.
Empirical Win Probabilities (August 28, 2003)
Posted 10:02 a.m.,
August 29, 2003
(#5) -
tangotiger
I understand your issue with innings, but that can't be right, especially with men on base and with the score.
I would add that interaction you did with Innings to "SIT" and "Difruns". Even if it adds little to the overall predictive power, it will add *a lot* to the overall predictive power for the 9th inning of a tie game with men on base.
(Can I guess SIT2=8 will be multiplied by zero?)
Empirical Win Probabilities (August 28, 2003)
Posted 10:52 a.m.,
August 29, 2003
(#7) -
tangotiger
(homepage)
Tremendous stuff Alan!
Again, doing only the work that you feel is worth your time, see if a model for any of the following interests you:
- 9th inning, score within 3 runs
- 9th inning, rest
- 8th inning, score within 3 runs
- 8th inning, rest
- 7th inning, score within 3 runs
- 7th inning, rest
- 1 thru 6, all
If you go to the homepage link, you will see that I have generated WE using a math model. Feel free to run your system against that if you want.
Again, what has been presented is excellent work, so thanks!
Empirical Win Probabilities (August 28, 2003)
Posted 9:43 a.m.,
August 31, 2003
(#11) -
Tangotiger
Cool, thanks. No need to email, I can generate this on my own.
It's worth pointing out that mine is math generated assuming that both teams are equals at all times, with no HFA.
I would expect discrepancies, especially in the later innings where the pitching talent would change drastically.
Good stuff again!!
With your permission, I will reproduce my chart, along with yours (and the empirical provided by Phil), side-by-side-by-side, so people can see how things compare.
Empirical Win Probabilities (August 28, 2003)
Posted 10:14 a.m.,
September 3, 2003
(#13) -
tangotiger
(homepage)
Here is the win probability chart that shows my math model, Phil's empirical data, and Alan's function.
Empirical Win Probabilities (August 28, 2003)
Posted 10:20 a.m.,
September 3, 2003
(#14) -
tangotiger
The largest discrepancies between mine and Phil's real data are the following:
Inning HomeAway Score Base Out Tom Phil Alan
7 Away 0 2nd_3rd 0 0.279 0.167 0.288
7 Away 1 3rd 0 0.517 0.619 0.587
7 Away 1 2nd_3rd 2 0.665 0.768 0.631
7 Away 1 Loaded 1 0.500 0.608 0.533
7 Home 0 Loaded 0 0.826 0.967 0.830
8 Away 1 1st_3rd 0 0.453 0.340 0.574
8 Away 1 2nd_3rd 0 0.410 0.300 0.534
9 Away 1 2nd_3rd 1 0.552 0.448 0.696
9 Home -1 3rd 0 0.593 0.457 0.410
9 Home -1 2nd_3rd 0 0.741 0.628 0.509
9 Home -1 Loaded 0 0.766 0.614 0.536
You get oddball results with Phil's data because of the sample size issue. For this reason, I would not rely too much on that data.
Empirical Win Probabilities (August 28, 2003)
Posted 10:44 p.m.,
September 3, 2003
(#16) -
Tangotiger
I mean that I don't give a HFA advantage in terms of a home team winning about 54% of their games.
But, tied in the bottom of the 9th for the home team is far better than tied in the top of the 9th for the home team. After all, if the visiting team scores a run in the top of the 9th of a tied game, the home team can still win the game. But, the home team scoring in the bottom of the 9th guarantees the win.
Mike's Baseball Rants - Sac Flies (August 28, 2003)
Posted 12:11 p.m.,
November 4, 2003
(#12) -
tangotiger
If true, wouldn't that indicate they must be hitting FB at a higher rate in SF situations since clearly not every FB will lead to a successful SF?
If you have a batter that has 100 FB and 60 GB, and the league rate is to have a SF on 40% of all FB, then our above hitter will end up with 25 SF per 100 (FB+GB).
100 * .40 + 60 * 0 = 40 SF, in 160 outs, or 25% of outs are SF
If you have the reverse, 60 FB and 100 GB, then our batter is expected to have 15% of outs as SF.
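The two cases can be reproduced with a short Python sketch. The 40% SF rate on fly balls is the league figure assumed in the example above, and the function name is hypothetical.

```python
import math


def sf_share_of_outs(fb_outs, gb_outs, sf_rate_on_fb=0.40):
    """Share of (FB + GB) outs that become SF, given the SF rate on fly balls."""
    sac_flies = fb_outs * sf_rate_on_fb  # ground balls never count as SF
    return sac_flies / (fb_outs + gb_outs)


fly_ball_hitter = sf_share_of_outs(100, 60)
ground_ball_hitter = sf_share_of_outs(60, 100)

print(f"{fly_ball_hitter:.0%} vs {ground_ball_hitter:.0%}")
```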
I doubt you will find hitters that have a special ability in hitting SF, beyond what is known about their hitting profile. (long FB rate, FB/GB ratio, etc).
And you have the flip side with GB as well. It all balances out nicely, more or less.
The point is that the batter did not "sacrifice" himself. He alters his hitting approach to maximize his team's chances of winning. If that means he might get slightly more outs, so be it. That doesn't mean that, after the fact, after you know he has a FB out and the runner scores, that you should remove that as an opportunity.
If you want to be "right" about it, remove the PA for all "men on 3b and less than 2 outs", regardless of the outcome (HR, H, SF, GB, etc).
With sac bunts, you *should* throw out all bunt ATTEMPTS where the batter TRIED to give himself up, regardless of whether he was successful or not. As it is, only successful SH are removed. This is another case where the result is irrelevant, and it should depend on the initial intent.
The sac bunt and the SF are not the same thing at all. In the former case, we know that the batter has the bat taken out of his hands, and into the manager's (like the IBB). In the SF, the batter changes his hitting approach (as they do for ALL 24 base/out situations).
Anyway, it's stupid to make the distinction in the official stats this way.
Just record what happened.
What I almost always do is throw out the IBB and Bunts from the batter's and pitcher's line, and track them separately, since the pitching/hitting approach are completely different.
If there was a preponderance of SF, where the batter would completely change his hitting approach to "force" a FB, then I would remove those as well. As it is, a SF is a lot more of a regular PA than a sac PA.
Mike's Baseball Rants - Sac Flies (August 28, 2003)
Posted 1:44 p.m.,
November 4, 2003
(#14) -
tangotiger
Nothing is ever completely random. Just our ability to spot these things is dependent on the size of the sample. What you do is assume randomness to make life easy, but being aware that there's a margin of error in so doing.
******
I would count a "reached base on error" as a "safe" play in OBA, even though the official records give one AB and no safe play for it.
A batter does "his job" by scoring the runner from 3B? Nope. The win probability in almost all cases says that the batter REDUCED the chances of his team winning. The obvious exception is when it's the winning run. Can't a batter do his job by moving a runner from 2b to 3b, while hitting the ball to the 1b or 2b? Or is the job only about scoring the run, and not moving the runners over?
All these things are so contextual that you might as well just break out the batter's PA by the 24 base out states, instead of inventing rules as to what constitutes a job.
Mike's Baseball Rants - Sac Flies (August 28, 2003)
Posted 1:59 p.m.,
November 4, 2003
(#15) -
tangotiger
In 2002, with men on 3b and less than 2 outs, this is what happened:
AVG : .329
AVG (but including SF as an AB): .278
airout/groundout ratio (without SF): 0.45
airout/groundout ratio (without SF): 1.00
The air/ground ratio for ALL situations is: 1.00.
So, what REALLY gives a more honest representation of what happened? Do you want to exclude the SF from the airout to ground out ratio? Nope, I don't think so. Do you want to exclude SF from AB, since they are not really failed AB, but only so-so failed AB? I don't think so either. The .278 is a lot more representative than .329 is.
What it comes down to is that regardless of the batter's or pitcher's intended approach with a man on 3b and less than 2 outs, the results are better represented when counting the SF as an out in the air/ground ratio, and counting the SF as an out in AB.
Mike's Baseball Rants - Sac Flies (August 28, 2003)
Posted 2:01 p.m.,
November 4, 2003
(#16) -
tangotiger
That should obviously read as:
airout/groundout ratio (with SF): 1.00
Mike's Baseball Rants - Sac Flies (August 28, 2003)
Posted 7:51 a.m.,
November 5, 2003
(#18) -
Tangotiger
Why not remove singles where the runner is out trying to stretch into a double?
Why not remove singles where the runner is stranded on the bases, and never drove anyone in?
Why count the SF as an unsuccessful opp in OBA, but don't consider it in batting average?
Why treat a BB the same as a SF with batting average?
You are trying to separate the SF from the other outs, while not doing the same thing with the hits and other events.
Anyway, I'm bored already.
Mike's Baseball Rants - Sac Flies (August 28, 2003)
Posted 10:04 a.m.,
November 5, 2003
(#19) -
tangotiger
I don't think it is credible to say that a team is more likely to win with one out and a runner on third than they are with that runner scored and two outs.
I spoke too soon. I was thinking about some other study I ran.
Anyway, is it better to have the man on 3b and 0 outs, or bases empty 1 out and a run scoring (assuming average batters all-round)?
If it's the home team and you have 0 outs:
- The SF is ALWAYS preferable if you are ahead.
- It is also preferable with the score tied in the 3rd and later innings (the closer to the 9th, the more preferable).
- It is also preferable being down by 1 run in the 7th and later innings
If you have 1 out:
- all the above applies, plus
- being down by as much as 6 in the 1st, 5 in the 2nd/3rd, 4 in the 4th, 3 in the 5th, 2 in the 6th
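The rules above can be written as a small lookup, sketched here in Python. This is only one reading of the list: it assumes the rules describe the home team batting, that "down by as much as N" means any deficit from 0 up to N runs (so a tie counts), and that nothing beyond the listed cases is preferable. The names `MAX_DEFICIT_ONE_OUT` and `sf_preferable` are made up for illustration.

```python
# With 1 out: the largest deficit at which the SF is still preferable, by inning.
MAX_DEFICIT_ONE_OUT = {1: 6, 2: 5, 3: 5, 4: 4, 5: 3, 6: 2}


def sf_preferable(inning, run_diff, outs):
    """True if the list above says the SF is the preferable outcome.

    run_diff is the batting (home) team's score minus the opponent's
    score before the play; outs is the number of outs before the play.
    """
    if outs not in (0, 1):
        raise ValueError("the rules only cover 0 or 1 outs")
    if run_diff > 0:                       # always preferable when ahead
        return True
    if run_diff == 0 and inning >= 3:      # tied, 3rd inning or later
        return True
    if run_diff == -1 and inning >= 7:     # down by 1, 7th inning or later
        return True
    if outs == 1 and inning in MAX_DEFICIT_ONE_OUT:
        return -run_diff <= MAX_DEFICIT_ONE_OUT[inning]
    return False


print(sf_preferable(inning=1, run_diff=0, outs=0))    # tied in the 1st, 0 outs
print(sf_preferable(inning=1, run_diff=-6, outs=1))   # down 6 in the 1st, 1 out
```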
Mike's Baseball Rants - Sac Flies (August 28, 2003)
Posted 5:09 p.m.,
November 5, 2003
(#21) -
tangotiger
Because on base percentage measures how often a batter gets on base per plate appearance. If you want to change AVG so
Finally! My definition of batting average is:
"number of times batter reaches (but not limited to) 1B safely on a contacted ball in play, without forcing a runner out" divided by "number of times batter contacts a ball in play"
I know that's not what the rules say, but that's me. I might change the "contact" to "non-bunt contact".
Sabermetrics Crackpot Index (August 29, 2003)
Posted 9:50 a.m.,
August 29, 2003
(#1) -
tangotiger
Thanks to Andrew Clarke.
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 11:40 p.m.,
September 6, 2003
(#1) -
Tangotiger
I tried posting this at batters box, but couldn't.
****
I just came across this.
Runs are created on a game-by-game basis (getting a runner on in April won't help you win a game in June). So, your run evaluator should be based on a game-by-game basis.
As for why I didn't test on a seasonal basis, as pointed out elsewhere, because of the incredible clustering of teams to the mean, virtually any half-decent run measure will be acceptable. All that that means is that any deviations will be masked by the 90% of the teams that are close to the mean.
However, when I selected games with 3 HR each, and then grouped them together, that gives you a few hundred or whatever games. So, instead of trying to select 100 teams with 180 HR or whatever, I've given you essentially a couple of teams that hit HR at the pace of Babe Ruth! (And of course, I gave teams of HR at the 0 level, 1 level, 2, 3, 4...).
If nothing else, the one major point of BaseRuns, the one thing to keep in mind at all times, is that the HR does not generate runs the same way that the other events do. The more baserunners you have (at some point), the less valuable the HR. That's the takeaway from BaseRuns. That the HR does not have the ever-increasing value that all "multiplicative" methods say it does, or the always-stable value that all "linear" methods say it does. Its value increases to a point (around an OBA up to .350 to .400), AND THEN, it diminishes.
But, since no team actually exists at that level, then who cares?
But pitchers ARE their own teams, and you should care about that.
To evaluate hitters, custom-generated Linear Weights is probably the best thing to use.
Thanks for the interesting discussion!
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 9:50 p.m.,
September 7, 2003
(#4) -
Tangotiger
There are 3 reasons why Coleman, Willie, Raines, etc score more runs per time on base than McCovey and his ilk:
1 - They are faster (this adds about +/- .04 runs / time on base, if I remember my research)
2 - They have better hitters behind them (#2 through cleanup, as opposed to #5 thru 7)
3 - They lead off more, meaning they get on base with 0 outs more, meaning there are more PA opportunities to drive them in
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 9:52 p.m.,
September 7, 2003
(#5) -
Tangotiger
The average IBB is virtually win-neutral based on research I published a few months ago. I did not look to see whether the IBB to Bonds specifically are win-neutral as well, but they probably are.
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 9:58 a.m.,
September 8, 2003
(#7) -
tangotiger
(homepage)
Ross, you can go to my site, and look for the link on "Batting Order". In there, I have MGL's run expectancy matrix by batting order and league (but only for 1999 I think). I have much better data with more years, and I'll be doing a lot more with it sometime in the upcoming months, and that work will happen to address your issues here, which are all legitimate.
In fact, the reason I started that batting order thread at fanhome was because I believed that Rickey Henderson and Tim Raines were being ripped off: their skills were optimally suited for the leadoff spot, but all run evaluation methods were not giving them that credit. I.e., they are able to leverage their particular skills more in the leadoff spot than others would. This impact, for Rickey in particular, I think amounted to almost 1 win per season. You can reasonably add a whopping 10 to 15 wins to Rickey's career simply on the fact that his skills were ideally leveraged in the leadoff spot.
Considering that a HOFer is about +30 to +40 wins above average for his career, this +10/15 thing is an enormous impact that is simply not quantified by any other sabermetrician (but is probably intuitively recognized by the average fan).
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 10:29 a.m.,
September 8, 2003
(#8) -
tangotiger
Ross, by the way, MGL's superLWTS *does* take into account the "taking the extra base" performance of players. You can check it out. If I remember right, Juan Pierre and Derek Jeter do quite well.
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 3:07 p.m.,
September 8, 2003
(#10) -
tangotiger
I think you are on the right track, but we all should separate runs from wins. Since each component has its own runs-to-win conversion ratio, it makes little sense to compute a run evaluator using IBB, and then convert those overall runs to wins.
What you want to do is figure out the win value for each component.
Now, you make a great point, and I'll reiterate here: the win-neutral value of the IBB is for the GIVEN PLAYER and not a league average player.
That is, if the win expectancy in a given state is .764 with Bonds at bat and pitching to him and .761 with Bonds being IBB to face Santiago, then, the IBB is worth NEGATIVE .003 wins ... and this is the important part... relative to Bonds himself. So, if Bonds is +.010 wins / PA above average in his "pitched-to" PAs, then in this particular IBB PA, he'd be +.007 wins / PA.
If we assume that managers sometimes walk Bonds when they shouldn't and sometimes walk him when they should, so that overall these are win-neutral PAs *to Bonds*, then you would do the following:
Compute Bonds' runs above average excluding IBB and convert to wins above average. Say that works out to +80 runs or +8 wins over 400 PA, or +.02 wins / PA.
Suppose that he gets 100 IBB. He gets credit for +.02 x 100 = 2 wins above the average player (or zero wins above himself).
His new wins above average is +10 wins over 500 PA (including the IBB).
Remember, the IBB is win-neutral relative to the player at bat, but not relative to the average player.
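The bookkeeping in that example, as a short Python sketch. The 10 runs per win is simply what "+80 runs or +8 wins" implies; all other figures are the hypothetical ones from the example, and the variable names are for illustration only.

```python
import math

RUNS_PER_WIN = 10.0            # implied by +80 runs = +8 wins in the example

runs_above_avg = 80.0          # from pitched-to PA only, IBB excluded
pitched_to_pa = 400
ibb = 100

wins_above_avg = runs_above_avg / RUNS_PER_WIN
wins_per_pa = wins_above_avg / pitched_to_pa

# Each IBB is win-neutral relative to the batter himself, so it is credited
# at his own pitched-to rate rather than at the league-average rate.
ibb_credit = wins_per_pa * ibb
total_wins_above_avg = wins_above_avg + ibb_credit
total_pa = pitched_to_pa + ibb

print(f"{wins_per_pa:.3f} wins/PA, +{ibb_credit:.0f} wins from IBB, "
      f"+{total_wins_above_avg:.0f} wins over {total_pa} PA")
```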
Great comment Colin!!
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 3:08 p.m.,
September 8, 2003
(#11) -
tangotiger
Colin, I'm rereading what you said, and you said it better than I did.
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 3:56 p.m.,
September 8, 2003
(#13) -
tangotiger
Suppose you have bottom of the 9th, home team down by 1, man on 2b, 1 out. With everyone in the game an average player, the chances of the home team winning is .296. Suppose though that with Barry Bonds at the plate, the chance that the home team will win is .370. (I didn't check what it is, so let's go with that.) So, do you walk him or not?
Well, the win expectancy for bottom of 9th, with men on 1b and 2b, and 1 out, and down by 1 is .351.
So, insofar as the visiting team is concerned, walking Bonds is worth -.019 wins to the home team.
But, to Bonds himself, he turned a .296 situation if he was not the batter into a .351 situation because he was the batter after the event completed. That is worth +.055 wins for Bonds' IBB.
Because managers probably walk Bonds and INCREASE the chance that Bonds' team wins, we can guess that the win expectancy before a Bonds PA and after a Bonds PA to be virtually the same, following an IBB.
It's a win-neutral event to the visiting team, but a huge win-gaining event for Bonds himself.
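To keep the perspectives straight, here is a small sketch using only the win expectancies quoted in this post (the .370 with Bonds at bat is the post's own unchecked guess). The helper name is hypothetical.

```python
# Win expectancies quoted in the post (bottom of the 9th, home team down 1, 1 out).
WE_AVERAGE_BATTER = 0.296   # man on 2b, average hitter at the plate
WE_BONDS_AT_BAT = 0.370     # same state with Bonds pitched to (the post's guess)
WE_AFTER_IBB = 0.351        # men on 1b and 2b after the walk

def change_in_win_expectancy(we_after, we_before):
    """Change in the home team's win expectancy between two states."""
    return we_after - we_before

# Relative to pitching to Bonds: what the walk does to the home team.
versus_pitching_to_bonds = change_in_win_expectancy(WE_AFTER_IBB, WE_BONDS_AT_BAT)
# Relative to an average batter: what Bonds gets credited with.
versus_average_batter = change_in_win_expectancy(WE_AFTER_IBB, WE_AVERAGE_BATTER)

print(f"{versus_pitching_to_bonds:+.3f} {versus_average_batter:+.3f}")  # -0.019 +0.055
```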
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 4:50 p.m.,
September 8, 2003
(#15) -
tangotiger
(homepage)
Colin,
I'll be working on that in the upcoming months (my guess is that it is win-neutral by batter), but in the meantime, you may be interested in the above link.
Tom
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 6:49 p.m.,
September 8, 2003
(#17) -
Tangotiger
No, what he is saying IS consistent with what I am saying. The defense is better off walking Bonds (they gain say +.02 wins in the process).
But, from the perspective of Bonds, Bonds already gains +.20 wins just for being in the batter's box. By being handed 1B in that situation, his worth is now +.18 wins instead.
It's a question of which perspective you have, the offense, the defense, or the batter.
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 7:02 a.m.,
September 9, 2003
(#19) -
Tangotiger
Yes!
Just like Pedro would be less valuable if the opposing manager would be allowed to have him replaced for one batter when Pedro allows a runner to get on base, and replace him with the mop-up guy. (Not THAT bad, because Bonds does get to go to 1B.)
How much impact is this? I don't know, but it might be a bit. I did publish the "Win Probability Added" a few months ago, and Bonds' numbers were NOT out of this world (though they were pretty incredible and tops in the league), for 1999-2002.
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 3:28 p.m.,
September 9, 2003
(#21) -
tangotiger
Be careful with your use of runs. Is the .125 absolute runs, or marginal runs?
The IBB has a marginal run value of .17 runs or so for the average player from the perspective of the team, and probably including Bonds. The win-value of the IBB is win-neutral from the perspective of the player, as discussed.
The NIBB has a marginal run value of about .32 runs for the average player, and probably for Bonds as well. Though, my guess is that for Bonds, because he probably gets a lot more NIBB with 1B open, that the walk is worth less to Bonds (maybe .28 runs or something).
Now, suppose you have 2 equals, and we'll call them Pujols and Bonds. But one of them gets IBB a lot, and the other, not as much. In the cases where Bonds can do a lot of damage, he gets IBB. But, Pujols gets pitched to, and as a result can create more wins than Bonds in the exact same situation.
That is, if we have that late and close situation where walking Bonds will have a win expectancy of .35 and facing him will be .37, they walk him. But Pujols, they face him, and he makes them pay... to the average win expectancy of .37.
Like it or not, Pujols has now impacted his overall PAs more than Bonds (assuming they were equals to begin with).
So, yes, it does make a difference, if the managers approach players in non-optimal ways.
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 4:19 p.m.,
September 9, 2003
(#23) -
tangotiger
Robert is reporting marginal runs. In fact, you should ALWAYS talk about marginal runs in cases like this. ALWAYS.
****
Let me think about the rest of your post.
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 2:58 p.m.,
March 18, 2004
(#34) -
tangotiger
For pitchers, you want BaseRuns per out (akin to ERA).
For hitters, you want LWTS per PA.
The formula shouldn't change based on era, though I have not yet tested whether the best-fitted 1974-1990 BsR matches that of 1991-2003. I'm sure it would be quite close.
***
Also, be careful on using the equation with missing data. Any fudge factor you apply can ONLY be applied to the "B" component. As Patriot rightly pointed out, Clay Davenport did NOT do this for BsR, thereby making BsR look worse than it should have been.
The idea behind BsR is very simple. As the OBA approaches zero, the run value of the HR approaches 1, and the run values of all other events approach 0. As the OBA approaches 1, the run value of all non-out events approaches 1, and the out event approaches infinity.
BsR is the only model that adheres to these known constraints. And, to boot, it's as accurate as anything out there in the "regular" MLB run environments.
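A rough sketch of how the BsR structure respects those two limits. BaseRuns has the form A*B/(B+C) + D, with A = baserunners, C = outs and D = home runs; the B weights below are one commonly quoted basic version and are an assumption here, not something taken from this thread.

```python
def base_runs(ab, h, tb, hr, bb):
    """Basic BaseRuns sketch: runs = A * B / (B + C) + D.

    The B weights are illustrative (an assumption); the A/B/C/D structure is the point.
    """
    a = h + bb - hr                                       # baserunners, excluding HR
    b = (1.4 * tb - 0.6 * h - 3.0 * hr + 0.1 * bb) * 1.02  # advancement factor
    c = ab - h                                            # outs
    d = hr                                                # HR always score themselves
    if b + c == 0:
        return float(d)
    return a * b / (b + c) + d

# OBA near zero: only HR and outs, so each HR is worth exactly 1 run.
print(base_runs(ab=100, h=5, tb=20, hr=5, bb=0))

# OBA of 1: no outs at all, so every baserunner scores.
print(base_runs(ab=10, h=10, tb=10, hr=0, bb=0))
```

With A equal to zero the first case reduces to D alone, and with C equal to zero the second reduces to A + D, which is the behavior the post describes at the two extremes.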
Bonds, Pujols and BaseRuns (September 6, 2003)
Posted 10:12 p.m.,
March 18, 2004
(#37) -
tangotiger
Those are good points. The fudge should go wherever the data is lacking. If HBP is lacking, then of course you need to fudge the A as well.
How you fudge is not clear. I fudge as a function of PAs, or estimated PAs. I don't think that the HBP fudge should be a function of H+BB. IBB might be a function of BB. I guess you'd have to run a regression to figure those things out.
By The Numbers - Sept 7 (September 8, 2003)
Posted 3:31 p.m.,
September 9, 2003
(#8) -
tangotiger
Keith Woolner has some good data at BP from a few years ago. Do a search for "Lumina", and you should get it.
By The Numbers - Sept 7 (September 8, 2003)
Posted 3:35 p.m.,
September 9, 2003
(#9) -
tangotiger
(homepage)
Actually, the data you want can be found at the above link from diamond-mind.com .
By The Numbers - Sept 7 (September 8, 2003)
Posted 5:25 p.m.,
September 17, 2003
(#13) -
tangotiger
That Jai Alai equation makes no sense from what I'm looking at.
Plug in a .600 team against a .500 team, and the result SHOULD be .600, but it is nowhere close to that.
As it stands now, the best method to use is the Odds Ratio method. Maybe I'll put up a Javascript program so that people can use it.
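For reference, a minimal Python sketch of the Odds Ratio method as it is commonly defined (this is not the Javascript program mentioned above, and the function names are made up): convert each team's winning percentage to odds, divide, and convert back.

```python
def odds(p):
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1.0 - p)

def odds_ratio_win_prob(team_pct, opp_pct):
    """Chance that `team` beats `opp`, given each team's winning percentage
    against a .500 league. Undefined for percentages of exactly 0 or 1."""
    matchup_odds = odds(team_pct) / odds(opp_pct)
    return matchup_odds / (1.0 + matchup_odds)

# The sanity check from the post: a .600 team against a .500 team should be .600.
print(round(odds_ratio_win_prob(0.600, 0.500), 3))  # 0.6
print(round(odds_ratio_win_prob(0.600, 0.400), 3))  # 0.692
```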
Livan Hernandez and Scouting (September 10, 2003)
Posted 2:33 p.m.,
September 10, 2003
(#2) -
tangotiger
Suppose that two pitchers named Orlando and Livan have been pitching rather so-so (to the eye and performance-wise).
The pitching coach comes up to both of them and exclaims "I know what you are doing wrong!" They practice the two new pitching mechanics, and the pitching coach is satisfied that they both have adjusted properly, and should be more effective.
For the next 7 starts each facing a total of 200 batters, Orlando shows no change in his performance numbers (always striking out 15% of his batters), but Livan does improve his numbers (say from striking out 15% of his batters to striking out 25% of his batters).
You also have a third pitcher, say Pasqual, who always strikes out 15% of batters, but, without changing anything, struck out 25% of his next 200 batters.
Question to the Primate statisticians:
1 - what is your best guess as to Livan's true K rate (assume lg of 15% if you need that)?
2 - What is your best guess to Orlando's true K rate?
3 - What is your best guess to Pasqual's true K rate?
Please provide confidence levels and margin of error.
Livan Hernandez and Scouting (September 10, 2003)
Posted 3:52 p.m.,
September 10, 2003
(#4) -
tangotiger
Interesting approach. I'll give you my thoughts tomorrow, as I'd like to see how others would approach this.
I do agree with your last statement, and it is for this reason that I *do* believe that scouts serve a purpose, the extent of which is yet to be established by the public (though it may have been established privately).
Accuracy of Run Estimators (September 12, 2003)
Posted 11:51 a.m.,
September 12, 2003
(#1) -
tangotiger
Patriot tried more best-fit equations for BsR and RC and he is reporting the following
For RC:
1st--24.89
2nd--23.02
3rd--22.60
BsR:
1st--22.66
2nd--22.48
3rd--22.44
So, RC vaults from worst to 3rd best, and BsR jumps to best.
Like I said, all we are doing is best-fitting. It doesn't prove anything.
Accuracy of Run Estimators (September 12, 2003)
Posted 1:39 p.m.,
September 12, 2003
(#4) -
tangotiger
Cool, good stuff!
What I wouldn't mind seeing (if not from Patriot, from some aspiring sabermetrician) is using the 1974-1990 data as the "sample" data to fit all your equations. What's good about this is that I already give you the "plus 1 method" true values to fit against (at the bottom of article 1, or at the bottom of the page of article 3, which links to the BaseRuns addendum). You can limit it to the fields you've been using (ab,h,2b,3b,hr,bb,sb).
Once you've fitted all the equations against this data, you then apply it to the 1961-1973 and 1991-2002 data.
As Patriot is starting to show here, I would guess that BsR would come up with a better estimate than the best-fit linear equation, and probably anything else.
What's really cool about the time periods I am showing is that they each have their own peculiarities, and so, should be a good test against extreme-type team-seasons.
Accuracy of Run Estimators (September 12, 2003)
Posted 3:24 p.m.,
September 12, 2003
(#7) -
tangotiger
(homepage)
Great stuff again Patriot!!
You will find the "absolute" (along with the "marginal") event values of empirical data from 1974-1990 at the above link. The CS value is something like -.28 runs.
Accuracy of Run Estimators (September 12, 2003)
Posted 4:00 p.m.,
September 12, 2003
(#9) -
tangotiger
(homepage)
You may be interested by the great work by Tom Ruane, who uses the "runs value-added" approach, on a PA-by-PA basis for all players from 1980 to 1999.
The next step is of course the Mills' brothers approach on WPA. That will come from me next year, unless someone else beats me to it.
Accuracy of Run Estimators (September 12, 2003)
Posted 11:07 p.m.,
September 12, 2003
(#13) -
Tangotiger
I agree with Robert's general sentiment. Let's get it right first, and let others worry about how accurate they need something.
As for the CS, please note the difference between an "absolute" method and a "marginal" method. When the out value is set to -.10 runs or thereabouts, you are employing an absolute method. When the out value is set to about -.27 runs, you are using a marginal method.
The same applies to CS. -.28 runs? Absolute. -.45 runs? Marginal. Check out article 2 of "How runs are created" for more on this.
Accuracy of Run Estimators (September 12, 2003)
Posted 10:15 p.m.,
September 13, 2003
(#18) -
Tangotiger
All of Patriot's testing shows that there is an enormous number of teams where there is not a large distinction between them. Essentially, most of the teams are .320 to .340 OBA and .390 to .420 SLG (or whatever).
So, what his testing shows is that all these run estimators "work", not for any logical reason, but simply because every team in the sample group also has a matching team in the testing group (more or less).
However, to extend these things beyond your sample group, to pitchers like Gibson for example, you need to be grounded in logic. And for that, you need a non-linear interdependent model. And that can be generated using a math model or sim or custom-RE matrix (and looking at change-in-states). Or, you can use the thing that most closely matches what you really want: BaseRuns, or the custom-LWTS generated from BaseRuns.
Can you use ERP or XR or LWTS or RC or EqR or ....? Sure thing. As it turns out, while run creation is non-linear interdependent, you can assume a linear independent process and you will be pretty close (say within 3 runs / 600 PA for a hitter).
Much ado about nothing for most people. But, if someone says that EqR or RC or basic LWTS is fatally flawed, you can't argue with that either.
The proper thing would be for say Bill James or Clay Davenport to say "hey, this is how accurate it is... it won't work for these kinds of teams or players.... so, it's up to you to decide if this is good enough".
Accuracy of Run Estimators (September 12, 2003)
Posted 3:32 p.m.,
September 15, 2003
(#20) -
tangotiger
(homepage)
I made a comment at battersbox.ca at the above link, and again 2 posts later.
Accuracy of Run Estimators (September 12, 2003)
Posted 7:19 a.m.,
September 17, 2003
(#22) -
Tangotiger
I have not done any of that work, but the pitcher thing is I think the most worthwhile to pursue.
The results of that will make it clear as to the relevance of BsR compared to the others. And, since we've got pbp for the last 30 years, we'd be in great shape to get all the data we want at the pitcher level.
DIPS bookmarks (September 13, 2003)
Posted 11:32 p.m.,
September 14, 2003
(#6) -
Tangotiger
Charlie, I would guess it would not matter.
For example, I think MGL showed that there was a 2 to 3 run difference max / 600 PA with the quality of opposing pitchers for each batter. That is, 1 SD = 1 run / 600 PA. In there, it includes HR, BB, K. I would therefore guess that 1 SD would equal about .5 hits / 400 BIP, or 1 SD = .001.
1 SD for the park variation is .004, and the fielding is probably .007, and the pitching is .009.
So, I would guess that the hitting variation would be virtually insignificant.
Just an educated guess though...
DIPS bookmarks (September 13, 2003)
Posted 10:44 a.m.,
September 15, 2003
(#8) -
tangotiger
Remember, as long as the opposition hitting distribution is random, then we don't have to worry about it.
That is, you are trying to answer the following question:
"What is the true variance of the opposition hitting, over and above luck, that would produce the observed variance?"
If the observed variance is exactly as would be predicted by luck, then the true variance of the opposition hitting is zero, and we don't need to consider it as a variable. Don't forget that we are looking at the pitchers as a group, and we are not trying to pinpoint the effect on any single one pitcher.
I think that's right.
Clemens' turnaround? (September 15, 2003)
Posted 5:11 p.m.,
September 15, 2003
(#1) -
tangotiger
And of course, win-loss records are very heavily team-dependent. Don't we all know this already?
Clemens' turnaround? (September 15, 2003)
Posted 10:57 p.m.,
September 15, 2003
(#3) -
Tangotiger
I just used one of my quick things. I looked for all pitchers with a K/9IP and B/9IP that were within .7 of Clemens, and between 600 and 900 IP over 4 years, and at the age of 31-34.
Certainly not an exhaustive study, nor the best one. But, I'm simply providing some data that Mr Edes did not.
First, Edes was wrong in calling Clemens' 4 years "mediocre", since the other pitchers I've shown had quite good 4 years. Secondly, it wasn't such a great turnaround. Clemens was already starting from sky high, and he became very good, and then followed that up with sensational to great.
The other comps did not start so high, but they also continued to have a few more good years after those similar "mediocre" years.
And to say "100 years" as if Edes actually looked into it? All these comps were within the last 10 years.
Patriot: Baselines (September 17, 2003)
Posted 9:27 a.m.,
September 18, 2003
(#2) -
tangotiger
My personal preference is to present TWO figures,
the "Wins" and "Loss" or
"Runs scored" and "Runs Allowed" or
"Player x" and "Average"
and then let the reader decide how to manipulate the numbers.
If you want win differential, fine. If you want 2*wins - losses, fine too.
As Patriot noted, we each have our own objectives and questions to answer. Providing two numbers allows all those objectives to be answered individually, instead of having the Win Shares or TPR model imposed on us.
Patriot: Baselines (September 17, 2003)
Posted 10:50 a.m.,
September 18, 2003
(#4) -
tangotiger
I agree with Michael's assertion about the comprehensiveness and balance to the article.
The tier-ed approach is also excellent because it attempts to model reality, and I'm a big proponent of that line of thinking.
I also agree with the quote of Patriot that the "playing time issue" and the "negative /positive" paradox is not limited to a .500 baseline but to any baseline. This was well-described by Patriot.
In terms of paying someone money, you'd pay someone the minimum to perform the minimum. Kinda like a college graduate coming in as a stage at my company. Any marginal performance above this marginal player gets money at a multiple of this marginal performance.
(You can of course get a non-linear relationship, but I don't believe in that, unless you factor in playoffs.)
Therefore, what a team pays is based on overall team performance. If you lose Vlad, you have a chaining process so that the team won't be as bad as replacing all his PAs with a schlub.
A team pays based on the marginal change to the overall team, but crediting that change to the variable that changed.
It's a fascinating topic, and as Patriot points out, there's no 1 right answer. From this standpoint, Pete Palmer and Bill James should listen and read Patriot's article.
Patriot: Baselines (September 17, 2003)
Posted 8:28 a.m.,
September 19, 2003
(#9) -
Tangotiger
What if the true FAT line is .340? .330? .300?
Patriot: Baselines (September 17, 2003)
Posted 10:26 a.m.,
September 19, 2003
(#11) -
tangotiger
(homepage)
I just want to point people who may not have seen it to the above link. It's my theoretical work (with some empirical data to support it) on the talent distribution in MLB and around the world.
I think it's easy to see that while the ideal line might be at 80% of MLB average, that the non-uniform distribution of talent at the team and position level would make it a very non-stationary line.
Again, depending what you want, using an average baseline is perfectly fine. Going forward, as Patriot noted, all you want is a rate stat. You just need to know that this player is a "101" and that player is a "98" and it's irrelevant that the average is "100" or that the minor leaguer is a "75" or whatever. 101 is better than 98.
Going backwards, the 101 might have contributed 1.1 wins and 1.0 losses, while the 98 might have contributed 3.9 wins and 4.0 losses. Playing time is a consideration. While the 101 did contribute more than the opponent that he actually played against, the 101 contributed LESS than the opponent that he did NOT play against (because he was on the bench at the time). There's an opportunity cost in sitting down and not playing, and that's in letting someone else, presumably worse than you, play.
Patriot: Baselines (September 17, 2003)
Posted 11:48 a.m.,
September 19, 2003
(#13) -
tangotiger
This assumes that the replacement line is fixed, whereas the likelihood is that the replacement line is centered around some point, say 80% of league average, with a distribution around it, of say 1 SD = 3%.
So, the question is: what is the probable true talent distribution of the .345 player in 500 PA? You might say that it is centered at .355, with 1 SD = .020. We can never know for certain what his true talent level is, so the best you can do is come up with a distribution of what his true talent probably is.
Then, you ask the same question about the .355 player in 10 PA. Maybe it's centered at .370, with 1 SD = .050. You are less certain of his true talent level, and therefore, your distribution is much wider than the first guy's.
Now, overlaying on these 2 distributions is the "replacement-level" true distribution. And again here, we don't have 1 fixed point. The true level might be .350, with 1 SD = .01.
Finally, the question you can ask is: what is the probability that the first player is above a replacement-level player? In essence, what's the chance that a .345-in-500-PA player (or a true .355 player, +/- .020 = 1SD) would "win" against a .350 +/- .01 player?
And you ask the same question of the .370 +/- .050.
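One way to put numbers on that question, as a sketch only: assume each true-talent distribution is normal and that the player and the replacement level are independent (both assumptions of mine, not stated in the post). Then the difference is also normal, and the chance that the player is above replacement follows from the normal CDF. The function name is hypothetical.

```python
import math

def prob_above(mean_a, sd_a, mean_b, sd_b):
    """P(A > B) for independent normal A and B (an assumed model)."""
    diff_mean = mean_a - mean_b
    diff_sd = math.sqrt(sd_a ** 2 + sd_b ** 2)   # SDs combine in quadrature
    z = diff_mean / diff_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

replacement = (0.350, 0.010)
regular = prob_above(0.355, 0.020, *replacement)      # the .345-in-500-PA player
small_sample = prob_above(0.370, 0.050, *replacement)  # the .355-in-10-PA player

print(f"{regular:.2f} {small_sample:.2f}")
```

Under these assumptions both players come out as more likely than not to be above replacement, with the wide distribution of the 10-PA player keeping his edge modest despite the higher center.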
Patriot: Baselines (September 17, 2003)
Posted 8:05 a.m.,
November 12, 2003
(#17) -
tangotiger
Bringing this forward for those who missed it.
Patriot: Baselines (September 17, 2003)
Posted 10:24 a.m.,
November 12, 2003
(#19) -
tangotiger
Herman had three times more value above his actual .500 opponent than did Myer
is meaningfully different from saying
Herman was three times the player that Buddy Myer was
The first one is a relative scale, and the second one is an absolute scale.
You cannot, just cannot, perform your division/multiplication on a relative scale and think it's going to give you anything meaningful.
Compare -1 celsius to +1 celsius. Compare -1 runs to +1 runs. Compare +.0001 runs to +1 run. Compare +1 runs to +10 runs.
Why in the world would you try to do +1/.0001 ? Or +1/-1?
Now, if you had one player being "101" and the other being "99" (where "100" is average), then you'd be on firmer ground.
Patriot: Baselines (September 17, 2003)
Posted 11:19 a.m.,
November 12, 2003
(#22) -
tangotiger
I agree on the issue of Palmer's intent. Palmer is wrong about his intent, and James is wrong for blasting wins above average, because of Palmer's intent. Just because Palmer misused it doesn't mean that the whole framework is wrong.
Patriot: Baselines (September 17, 2003)
Posted 1:34 p.m.,
November 12, 2003
(#24) -
tangotiger
Patriot, I reread your article again. Just an excellent piece!
Thanks for the clarification from the Palmer quote. He makes perfect sense there. I'll guess that David's Palmer quote was probably some "rush statement" he made, similar to stuff James would say in ESPN Chats.
Fanhome's Dackle: World Series Odds (September 18, 2003)
Posted 12:05 p.m.,
September 18, 2003
(#8) -
tangotiger
Good job Joe!
Here's his list in "percentage" format. You'll note the juice adds an extra 38%. Ahh, to be a bookie.
NewYork 25.0%
Oakland 20.0%
SanFrancisco 20.0%
Atlanta 14.3%
Boston 14.3%
Minnesota 11.1%
Chicago 7.7%
Florida 6.3%
Houston 5.3%
Philadelphia 4.8%
Chicago 3.8%
LosAngeles 2.0%
Seattle 2.0%
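As a quick sketch of what the juice does, the listed percentages can be summed and then rescaled so they total 100%. The thirteen rounded figures above sum to roughly 136.6%, in the neighborhood of the 38% quoted. Rescaling proportionally is just one simple way to strip the juice, and is my assumption rather than anything from the post.

```python
# Implied percentages as listed in the post (the two Chicago entries kept separate).
implied = [
    ("NewYork", 25.0), ("Oakland", 20.0), ("SanFrancisco", 20.0),
    ("Atlanta", 14.3), ("Boston", 14.3), ("Minnesota", 11.1),
    ("Chicago", 7.7), ("Florida", 6.3), ("Houston", 5.3),
    ("Philadelphia", 4.8), ("Chicago", 3.8), ("LosAngeles", 2.0),
    ("Seattle", 2.0),
]

total = sum(pct for _, pct in implied)
overround = total - 100.0   # the bookie's margin, in percentage points

# Proportional rescaling so the percentages sum to 100 (an assumed method).
fair = [(team, 100.0 * pct / total) for team, pct in implied]

print(f"total {total:.1f}%, juice {overround:.1f}%")
for team, pct in fair[:3]:
    print(f"{team} {pct:.1f}%")
```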
Fanhome's Dackle: World Series Odds (September 18, 2003)
Posted 12:22 p.m.,
September 18, 2003
(#10) -
tangotiger
We also should remember a few things that would change the odds drastically:
1 - Your top 3 starters have a higher % of innings in the playoffs than regular season
2 - Your top 2 relievers have a higher % of innings (and high-leverage innings) in the playoffs than the regular season... I think Mariano Rivera has something like 80 innings in 87 Yankee playoff games
3 - Certain types of hitters, though we haven't established which ones, may be less able to optimize their abilities against higher level of pitching
4 - The park affects every player (hitter or pitcher) differently. While in a season, the road parks balance all this out, more or less, it's not necessarily the case in a short series (say Ted Lilly at Fenway).
Because of the non-random nature of the contexts faced by the players, the "true talent" level of each team might vary greatly in the "playoff universe".
And finally, injuries.
Fanhome's Dackle: World Series Odds (September 18, 2003)
Posted 1:35 p.m.,
September 18, 2003
(#11) -
tangotiger
And by the way, kudos to Nate Silver for putting this up:
DISCLAIMER: Because this analysis does not take into account head-to-head matchups, it may be less reliable from this point in the season onward.
It's good to see analysts establishing the boundaries of their work to the readers.
Fanhome's Dackle: World Series Odds (September 18, 2003)
Posted 5:08 p.m.,
September 18, 2003
(#16) -
tangotiger
Primer policy has: Comments ... may be removed...if the comment ... really does nothing to move the conversation forward.
It's (unfortunately) rarely exercised, but I thought the volume (especially by me) was just too much. I'll be happy to send anyone the exchange.
In the words of the great Leslie Nielsen: "There's nothing to see here. Please move on. "
Fanhome's Dackle: World Series Odds (September 18, 2003)
Posted 5:20 p.m.,
September 18, 2003
(#18) -
tangotiger
The big differences are all at the top:
YANKEES 23.2 ... 16.6
GIANTS 15 .... 21.9
BRAVES 13.8 .... 24.8
ATHLETICS 12.5 .... 14
REDSOX 11 .... 7.8
I think the Yanks have a bias for whatever reason. The A's+Sox are 23.5% on the one side and 21.8% on the other. So, it comes down to the Giants+Braves being so much higher with dackle.
So, the question to ask first is:
Is the AL/NL talent even enough that you can do as dackle is doing? Or are some NL teams getting the benefit of beating up on worse teams than the AL does? Or are the AL teams so evenly matched that no 1 team can really stand out?
The other thing to remember is that if the Giants/Braves are really as good as suggested by their W/L record, compared to the Yanks/A's, etc, then it's no surprise that a team from the NL will be more likely to win than a team in the AL.
Fanhome's Dackle: World Series Odds (September 18, 2003)
Posted 10:18 a.m.,
September 19, 2003
(#20) -
tangotiger
I'm sure it does not.
I would think that the best way to do these odds is to use Diamond-Mind baseball or some similar game, where the random creation of injuries, or the occurrence of injuries based on past history can be incorporated into the game.
As well, since pitcher usage changes in the playoffs, again, you can incorporate that as well.
Fanhome's Dackle: World Series Odds (September 18, 2003)
Posted 12:50 p.m.,
September 19, 2003
(#22) -
tangotiger
Can someone take the top 5 NL teams and top 5 AL teams, and see how they did against "same competition"?
That is, lump ATL, SF, et al into "NL leaders", and find their records against their AL opponents. Weighting only by the games against the AL opponents, how did those 5 unweighted AL leaders do?
And vice-versa. W/L, RS/RA would be nice.
Anyone, anyone? Bueller?
Fanhome's Dackle: World Series Odds (September 18, 2003)
Posted 3:50 p.m.,
September 19, 2003
(#24) -
tangotiger
Just to add some data, NL teams are .544 when facing AL teams.
That is, an average NL team will win .544 of their games against an AL team. An average AL team will win .500 of their games against an average AL team.
RS/RA would have been better to use, but I no got.
1 standard deviation is about .030, so this may be completely due to luck.
TheStar.com - Analyze this: NBA '04 (September 19, 2003)
Posted 11:06 a.m.,
September 19, 2003
(#3) -
tangotiger
Funny thing, Andrew. I've been doing the goalie stuff for a few years now. I've got a system in place that I think is pretty simple and logical. Same thing for plus/minus for players. I've got "sim scores" for players as well.
But, seeing my time management skills aren't up-to-speed yet, I'm not sure when I can deliver on this publicly, if ever.
TheStar.com - Analyze this: NBA '04 (September 19, 2003)
Posted 3:41 p.m.,
September 19, 2003
(#5) -
tangotiger
I agree that soccer won't translate as well, since you wouldn't get enough sample to do your testing on.
The way to do the plus/minus thing is virtually the same as you would do with strength-of-schedule. I would call this "strength-of-context" (SoC).
Jason Kidd + player1 + player2 + player3 + player4 + opp1 + opp2 + opp3 + opp4 + opp5 + Home/Away = 14 pts for + 11 pts allowed over 30 minutes
You do this for every single combination of teammates and opponents. And for every player. It simply becomes a mathematical problem.
I agree you lose sample size at this level, but there's ways around it. (I wrote to a basketball exec asking him for the pbp files, and I'd do it for free, but no dice. Funny isn't it? I would imagine if I told him I'd do it for 10,000$ that he would take me more seriously.)
Pitchers, MVP, Quality of opposing hitters (September 19, 2003)
Posted 3:33 p.m.,
September 19, 2003
(#2) -
tangotiger
Here's data to support your conclusion:
***
Looking at all hitters with at least 400 PA, the standard deviation of their opposing pitchers' OPS is .009.
Looking at all pitchers with at least 400 PA, the standard deviation of their opposing hitters' OPS is .017.
Concentrating only on those hitters and pitchers with 400 to 600 PAs, the standard deviations are .010 and .022, respectively.
Pitchers are much more likely to be influenced by their schedule than hitters are.
Pitchers, MVP, Quality of opposing hitters (September 19, 2003)
Posted 10:34 p.m.,
September 19, 2003
(#3) -
Tangotiger
By the way, the reason that pitchers will be more influenced is not necessarily the distribution of their opposing teams, but rather that there is a larger spread of true talent among hitters than pitchers on a per PA basis.
If someone wanted to, they can figure out the stdev on OPS for hitters and pitchers with at least 400 PA. I don't think it will be twice as much for hitters, but it will probably be close to it.
Pitchers, MVP, Quality of opposing hitters (September 19, 2003)
Posted 3:33 p.m.,
September 20, 2003
(#6) -
Tangotiger
Charlie,
Suppose that pitchers did not have a home park, and instead randomly played at a park for each start. They won't pitch exactly once at each park. Some parks they won't pitch in, and others they might pitch 2 or 3 times in.
As long as the distribution of where they pitch can be explained by random chance, then we don't need to consider the park factor. I think that the Central Limit Theorem would apply (though don't quote me on that).
The same would be the case with their opposing hitters. As long as the distribution can be explained by random chance, then the TRUE VARIANCE (which is what we are after) would be equal to zero. Now, I grant you, that the opposing hitters might not be due to random chance, and there is something at work here. However, I would guess that we are talking about a true variance of .001 to .002. I would be surprised if it's any higher than that.
Pitchers, MVP, Quality of opposing hitters (September 19, 2003)
Posted 6:43 p.m.,
September 20, 2003
(#8) -
Tangotiger
My guess is that the #3 and #4 hitters have an LI of 1.05 to 1.10.
As for 2B/SS, I don't think you'll find much spread in talent there compared to 3B or CF, fielding-wise.
I agree that it is tough for a pitcher to keep up, but it is not unreasonable to think a great year from a pitcher is equal to an almost great year from a hitter. Pedro and Gooden and Maddux and RJ easily equal the greats, regardless what other analysts say. Loaiza/Halladay? Maybe not, but they are in the top 10.
Sabermetrics >WIN SHARES bibliography (September 19, 2003)
Posted 10:30 p.m.,
September 19, 2003
(#4) -
Tangotiger
I think Sean Smith has the basketball win shares somewhere...
Sabermetrics >WIN SHARES bibliography (September 19, 2003)
Posted 11:35 a.m.,
September 29, 2003
(#11) -
tangotiger
Rally: if you want, I can post it up, though I think "studes" at baseballgraphs.com might want to post it.
Instructions for MVP (September 22, 2003)
Posted 4:19 p.m.,
September 22, 2003
(#3) -
tangotiger
The genius of BBWAA is that by not making things stone-cold, it opens up the debate every single damn year, making this debate probably the most debated topic after Pete Rose.
Instructions for MVP (September 22, 2003)
Posted 12:40 p.m.,
September 29, 2003
(#14) -
tangotiger
Value in a loss:
It depends what the objective of the player is. I look at the objective of the player in "trying to help his team towards winning that game".
From that standpoint, having Ted Lilly pitch a 1-0 no-hit loss would qualify as having a lot of value. He kept his team in the game as much as he could. Therefore, I measure value from the "win probability added" perspective, where you measure the change in theoretical win expectancy after every discrete event, and attribute (somehow) the change to the players involved.
However, people can view it from "playoff probability added", or "playoff probability added in games won", or whatever. Again, you can really make the definition however you want it, since it's not clearly defined as to what value is.
My only problem is people saying that they won't vote ARod #1, but will think nothing of voting him #2. Apparently, there is such a huge price to pay in not playing for a contender that it knocks say about 20 effective runs from him for the #1 spot, but it doesn't knock him down enough to put him in the #10 spot or so.
That is, if you have something like this, as runs above replacement (illustration purposes only):
+80 ARod
+65 Boone
+63 Posada
+61 Ramirez
+59 Halladay
etc, etc
Somehow, ARod is dropped from the top spot, and someone else gets ahead of him, but he is dropped only far enough that he stays ahead of everyone else. In this case, the voters penalize ARod by exactly 16 runs for performing on a team that doesn't have much to gain. They could have penalized him 14 runs, and let him take the top spot. They could have penalized him 22 runs, and let him take the #5 spot. But, no. The way the ballots will come in, he'll get either a #1 or #2 spot on every ballot (probably).
In fact, chances are, because of the diverse possibilities, he might get the fewest #1 ballots and still be MVP. Anyone have a count of the current record?
Anyway, I think some voters have decided only after the fact what the penalty is. If they would say before the fact, that they will penalize such a player 20 runs, then I can respect that. I don't think that's how some voters think. They just "look at everything", and make their selection.
Of course, they can "look at everything" in 3 months, and have different conclusions too.
Instructions for MVP (September 22, 2003)
Posted 12:45 p.m.,
September 29, 2003
(#15) -
tangotiger
As for foreign players and the Rookie of the Year: I have no problem setting an age limit. They did this in hockey, and no one complains. Makarov, probably the best player on the best Russian team that ever played, a team that went toe-to-toe with probably the best Canadian team that ever played...
(Gretz and Lemieux on the same line!... I love that play where they win the draw in their own end, and Gretz sends that pass against the boards to Lemieux, who passes it back to Gretz, and while a Russian falls/trips, Lemieux goes by him, gets the pass from Gretz, and even though Murphy is wide open at the net, Lemieux, at an unbelievable speed, just flicks the shot past the goalie!.... I LOVE that play!)
... won the Rookie of the Year. With the incredible influx of new talent in the league, they quickly capped the Rookie of the Year age at 26 or something. (Interestingly, Gretzky, in his first year in the NHL, won the MVP, but not the Rookie of the Year, because he played the prior year in the WHA.)
I think we need a hockeyprimer.com
Most pitches / game in a season (September 22, 2003)
Posted 4:35 p.m.,
September 22, 2003
(#1) -
tangotiger
Same list, ordered by name.
playerID yearID PperG
allenjo02 1933 120
blackew01 1947 127
bluevi01 1971 120
blylebe01 1973 121
blylebe01 1975 120
blylebe01 1976 126
bridgto01 1935 122
carltst01 1969 120
carltst01 1970 121
carltst01 1972 126
carltst01 1974 122
carltst01 1980 123
carltst01 1981 123
carltst01 1982 121
carltst01 1983 123
chandsp01 1942 124
chaseke01 1940 123
chesnbo01 1948 120
clemero02 1987 122
clemero02 1997 120
colemjo04 1949 120
coneda01 1995 121
coopewi01 1921 126
fellebo01 1938 127
fellebo01 1939 123
fellebo01 1941 130
fellebo01 1946 123
ferrewe01 1936 120
ferrewe01 1937 123
gibsobo01 1965 123
gibsobo01 1968 128
gibsobo01 1969 137
gibsobo01 1970 136
gibsobo01 1971 121
gibsobo01 1972 124
gomezle01 1937 125
grimebu01 1921 121
grimebu01 1923 126
grimebu01 1924 125
grovele01 1937 126
grovele01 1939 122
hudsosi01 1950 124
hunteca01 1976 120
jenkife01 1971 121
johnsra05 1992 121
johnsra05 1994 123
johnsra05 1999 123
kenneve01 1936 127
langfri01 1981 120
langsma01 1987 128
langsma01 1988 121
leeth01 1941 125
leonadu02 1940 125
lolicmi01 1969 122
lolicmi01 1971 127
lopated01 1947 120
lyonste01 1935 120
lyonste01 1938 127
lyonste01 1940 120
lyonste01 1941 121
lyonste01 1942 120
malonji01 1965 124
marchph01 1947 124
maricju01 1964 120
maricju01 1968 122
marreco01 1952 123
martipe02 1997 120
mccatst01 1981 122
mcdowsa01 1970 127
morrija02 1983 121
morrija02 1987 122
newhoha01 1946 122
newsobo01 1938 129
niekrph01 1977 126
niggejo01 1944 132
norrimi01 1980 126
palmeji01 1977 121
pascuca02 1963 123
perryga01 1969 123
perryga01 1972 120
perryga01 1973 130
perryga01 1974 127
perryga01 1975 124
pfeffje01 1919 125
piercbi02 1956 124
raschvi01 1950 124
richajr01 1976 120
richajr01 1978 127
richajr01 1979 121
rignejo01 1941 122
ringji01 1923 121
ruethdu01 1923 122
ruffire01 1936 123
ruffire01 1937 122
ruffire01 1938 122
ruffire01 1939 122
ryanno01 1972 121
ryanno01 1973 133
ryanno01 1974 135
ryanno01 1975 123
ryanno01 1976 127
ryanno01 1977 142
ryanno01 1978 133
ryanno01 1989 122
schilcu01 1998 120
schwado01 1961 120
scorehe01 1955 120
seaveto01 1970 121
shantbo01 1952 120
sheasp01 1952 121
singebi01 1973 126
smithed04 1941 120
sotoma01 1983 124
spahnwa01 1951 120
stonebi01 1971 124
tananfr01 1976 126
tiantlu01 1974 121
turlebo01 1954 122
uhlege01 1926 125
valenfe01 1984 124
valenfe01 1985 120
valenfe01 1986 123
valenfe01 1987 124
vanceda01 1924 131
vanceda01 1925 131
vandejo01 1943 130
vuckope01 1982 120
waltebu01 1940 120
waltebu01 1941 120
whiteea01 1931 124
whiteea01 1935 128
whiteea01 1936 120
whitejo02 1935 122
wittbo01 1988 133
Most pitches / game in a season (September 22, 2003)
Posted 9:53 a.m.,
September 23, 2003
(#5) -
tangotiger
Michael, that is actually something I will be looking at within the next 6 months.
In my preliminary look, I see no evidence of a link between arm damage and usage patterns at this level. Until I continue this study, I would say that pitchers can handle 120 pitch outings, and that relievers can handle 100 inning seasons without extra damage.
Most pitches / game in a season (September 22, 2003)
Posted 10:15 a.m.,
September 23, 2003
(#6) -
tangotiger
Bob, those were simply results from my already-published pitch count estimator. I published the estimates for all pitchers since 1889 (when the ball-strike count was finally the most similar to today) at the baseball-databank yahoo group.
Most pitches / game in a season (September 22, 2003)
Posted 12:25 p.m.,
September 23, 2003
(#8) -
tangotiger
My estimates are on a seasonal-level. I only have the pbp from 1972-1992, and I have not confirmed if my estimates work at that level. I would think they would not.
For example, say you have RJ, who is a non-contact pitcher, but he happened to pitch 7 innings, with 5 hits, 1 walk and 4 K. Now, my estimates would infer that he did not go deep into the count. But, what if you had Brad Radke post the same line?
I think what happens here is a sampling issue. That, probably, RJ went deeper into the count than Radke did based simply on their pitching styles, but that, by luck, hitters managed to contact balls deep in the count with RJ, rather than K/BB as they normally would.
In any case, I will eventually produce such numbers, and we can compare to see if my theory here would hold.
***
I did previously publish Koufax's log, and the standard deviation there was much much higher than what we are used to. He had numerous outings of over 140 pitches. However, his overall average was under 120. Why? Because he had numerous outings of under 100 and even under 80 pitches.
I'm dumbfounded by this type of pitcher usage pattern. (I'm guessing managers saw from the small sample that Koufax was "in trouble" and assumed he was off that day. Today's managers are probably much smarter and realize that small sample size is probably at work, and maybe their pitching coach is telling them that mechanically all is fine, so just wait for the performance to catch up to his talent.)
Most pitches / game in a season (September 22, 2003)
Posted 3:15 p.m.,
September 23, 2003
(#10) -
tangotiger
For relievers, yes, it is based on broader data. Relievers are being incredibly pampered.
For starters: stay tuned!
Most pitches / game in a season (September 22, 2003)
Posted 11:50 p.m.,
September 23, 2003
(#13) -
Tangotiger
That's great work!!
Using the basic pitch count estimator on:
IP: 7.7, H: 6.4, BB: 0.7, SO: 3.9, Pitches: 99.5.
I get: 104 pitches.
Applying the basic estimator to RJ:
IP : 7.7, H: 7.0, BB: 0.4, SO: 5.7, Pitches: 103.1.
and I get: 107
and to Schilling:
IP : 7.8, H: 8.2, BB: 0.5, SO: 5.5, Pitches: 108.7.
I get: 112
Since the basic pitch count estimator overvalues the non-Contact pitchers (or games), if I had used the extended pitch count estimator, I'd guess that I'd come in pretty close to the actuals.
It seems therefore that I am probably wrong in my original thought that there is a difference between two similar outcome games from Radke and RJ. Chances are, they both got there the same way.
Very interesting.
Great work!!
Most pitches / game in a season (September 22, 2003)
Posted 10:16 a.m.,
September 24, 2003
(#14) -
tangotiger
When I use the extended pitch count estimator, and I compare it to the actual pitch count, my estimates are 3 to 4 pitches above the actual in each case.
Therefore, I would say that it is irrelevant what type of pitcher put up those performances, and that my original thought was wrong.
Aging patterns (September 23, 2003)
Posted 11:42 p.m.,
September 23, 2003
(#5) -
Tangotiger
Yes, I did the same for pitchers, and they peaked at age 24.
The problem is that I didn't handle "regression towards the mean", the single most important thing to understand if you're going to do these things. I was unaware of the concept then, and will eventually redo these charts for pitchers and hitters, as well as taking care of park and era adjustments.
The strike zone IS a learned skill, but it is based on experience and intelligence. Even something like power IS a learned skill. You may physically mature at age 23 or so, but hitting HR or hitting with power also requires experience and maturity. It's not like it's a T-ball league.
Aging patterns (September 23, 2003)
Posted 12:26 a.m.,
September 24, 2003
(#7) -
Tangotiger
I wasn't trying to be facetious or anything. It's basically bs when someone says that you can "learn" something, and I don't care if it's Beane or Bill James that says it.
What they are probably trying to say is that there is a certain learning pattern that is being followed, and you won't be able to change that pattern too much. That is, whether you are Frank Thomas or Alfredo Griffin, you may start off at different levels, but you will both follow the curves that are based on your skillset, and that those curves exist because of the experience that you will get.
Usually, the right answer is the easy answer, and I believe this is the case here. I haven't proved it, but neither has the opposite been proven by its believers.
Aging patterns (September 23, 2003)
Posted 12:26 a.m.,
September 24, 2003
(#8) -
Tangotiger
"you can " = "you can't"
(That changes the whole meaning!)
Aging patterns (September 23, 2003)
Posted 7:40 a.m.,
September 24, 2003
(#10) -
Tangotiger
It's either that, or that players change their approach to hitting as they age. Once you have confidence to be patient and hit HR, why settle just for a BIP?
Aging patterns (September 23, 2003)
Posted 10:06 a.m.,
September 24, 2003
(#12) -
tangotiger
Great call!
If I had the data, it would have been nice to break it down into $Hfly and $Hground.
I suppose I can redo the charts eventually, one set for "fast" runners and one for "slow" runners, and see what the aging patterns are. I'm sure PECOTA has done this kind of work already.
Aging patterns (September 23, 2003)
Posted 2:03 p.m.,
September 26, 2003
(#17) -
tangotiger
I probably messed up somewhere. Remember, I just did those real quick a few years ago. That's why it looks like that, instead of polished.
Aging patterns (September 23, 2003)
Posted 9:55 p.m.,
September 28, 2003
(#20) -
Tangotiger
Suppose that HR (age 36)=30, HR factor (age 36) = .90, HR factor (age 37) = .87, you do:
30/.90*.87 = whatever-that-comes-out-to
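For anyone who wants to plug in their own numbers, here is a minimal sketch of that calculation in Python. The function name `project` is made up for illustration; the inputs are the ones from the example above.

```python
def project(value, factor_now, factor_next):
    """Move an observed total from one age to another using aging factors.

    Dividing by factor_now converts the observed value to its implied
    peak-age level; multiplying by factor_next moves it to the target age.
    """
    return value / factor_now * factor_next


# Numbers from the post: 30 HR at age 36, factor .90 at 36, factor .87 at 37.
hr_37 = project(30, 0.90, 0.87)
print(round(hr_37, 1))  # 29.0
```

Working with the ratio of the two factors, rather than subtracting percentages, is what lets the step be repeated from one age to the next.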
Aging patterns (September 23, 2003)
Posted 10:17 a.m.,
September 29, 2003
(#22) -
tangotiger
What's cool about 30/.90 is that it gives you the player's "peak". That is, you would do 30/.90 x 1.00 to give you the HR at the peak age.
Of course, there is GREAT variability, so be careful with this train of thought.
Aging patterns (September 23, 2003)
Posted 12:57 p.m.,
September 29, 2003
(#24) -
tangotiger
That's how you can do chaining. I explained this in a long-deleted post at fanhome.
In essence, if you don't do things as ratios, you will get results that are mathematically incorrect.
That is, if you did aging rates for 1b/pa,2b/pa,3b/pa,hr/pa,bb/pa AND outs/pa, and redid this chart for ages 21 to 37, you will not be able to do it.
If you really want me to get into it, I'll see if I can dig something up. But, ratios, and not percentages, is how you need to do this.
Aging patterns (September 23, 2003)
Posted 12:45 p.m.,
September 30, 2003
(#26) -
tangotiger
I did not adjust for park or league or anything. The only reason is because I did not do this for any purpose other than to get a general idea.
Aging patterns (September 23, 2003)
Posted 2:26 p.m.,
October 1, 2003
(#28) -
tangotiger
(homepage)
I promise I'll get to this again eventually bob!
At the above link is an email exchange I had with Mike Gimbel on the peak age of a player (something about the peak age being later than we think). This was posted in Jan at fanhome, but might be worthwhile for Primer readers who've never seen it. Mike was understandably tight-lipped about his thoughts, as he was trying to keep things proprietary and valuable to him.
Factors that affect the chances of scoring (September 24, 2003)
Posted 7:40 a.m.,
September 25, 2003
(#3) -
Tangotiger
And you are wrong if by "hit" you mean single. The chances of scoring on walk/single are virtually the same. I'll point you to the research on this later.
Factors that affect the chances of scoring (September 24, 2003)
Posted 10:16 a.m.,
September 25, 2003
(#4) -
tangotiger
(homepage)
This excerpt was published at the above link.
===============
Table 1. For all methods for leadoff batter to reach
base, number of times each event occurred, the number of
times that batter scored and the frequency of each. Note
that the "E" category includes all times the leadoff
batter reached on an error, which includes those cases
when he went past first.
Event   Reach    Score   Freq
1B     183468    72841   .397  ************
2B      48364    30961   .640
3B       6573     5753   .875
HR      27205    27205  1.000
BB      82637    33002   .399  **************
HP       6217     2543   .409  **************
INT        81       22   .272
E       12105     5298   .438
==================
As you can see, it doesn't matter how you got to 1B, via walk or single. The chances of scoring were virtually the same.
I would guess that the rate for the HBP can be explained by random chance (1 SD = .006). The INT rate is statistically insignificant (only 81 samples, making 1 SD = .054).
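Those two SD figures can be reproduced with the usual binomial formula. This is a rough sketch that assumes a true scoring rate of about .40 for a leadoff runner on first (my reading of the table, not something stated in the excerpt) and independent trials; the sample sizes are the HP and INT rows of Table 1.

```python
import math


def binomial_sd(p, n):
    """Standard deviation of an observed proportion over n independent trials."""
    return math.sqrt(p * (1 - p) / n)


# Assumption: true scoring rate of roughly .40 from first base with 0 outs.
p = 0.40
sd_hbp = binomial_sd(p, 6217)  # HP sample size from Table 1
sd_int = binomial_sd(p, 81)    # INT sample size from Table 1
print(round(sd_hbp, 3), round(sd_int, 3))  # 0.006 0.054
```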
Factors that affect the chances of scoring (September 24, 2003)
Posted 10:27 a.m.,
September 25, 2003
(#5) -
tangotiger
By the way, here's some interesting data. From 1994-2002, here is the weighted ERA of the pitchers (weighted by the event). What this tells you is: "Given that a BB was given up, what's the ERA of that pitcher?"
eraBB  eraSO  eraHR  eraBIPH  eraH  eraOuts  eranonKouts
4.77   4.38   4.94   4.71     4.74  4.51     4.55
So, the difference in performance between a pitcher giving up a walk or a hit is pretty darn close.
And as we see from David's chart above, the overall impact is just not there.
(eraOuts is League ERA by definition.)
Factors that affect the chances of scoring (September 24, 2003)
Posted 11:03 p.m.,
September 25, 2003
(#7) -
Tangotiger
I don't have them easily available. However, if eraBIPH is 4.71, and since 70 to 75% of those are singles, I'll guess that era1B is 4.69 and era2B3B is 4.77.
And while worse pitchers do give up walks (and therefore you expect them to allow more of them to score than on singles), you are probably talking such a small impact when you look at the distribution of pitchers, that you get the reported results that the chances of scoring on a single and walk are virtually the same.
The difference between an ERA of 4.69 and 4.77 is .08 ER, or .09 runs per 9 IP. Or .09 runs per 12 nonHR baserunners. Or .0075 runs per baserunner.
More likely, it's something like a pitcher giving up 5.2 runs / 9 IP with 12 nonHR baserunners and 5.29 runs / 9 IP with 12.1 nonHR baserunners. Take out the let's say 1 HR for the first pitcher, and the 1.01 HR for the 2nd pitcher, and you get:
5.2-1=4.2 runs per 12 nonHR baserunners = .350 runs per baserunner
5.29-1.01=4.28 runs per 12.1 baserunners = .354 runs per baserunner
(I'm sure you can work out the numbers better than me.)
In any case, it's rather easy to see how you can start tweaking numbers here and there to get the empirical results. I did this off the cuff, and I'm showing a .004 difference, and in fact, the empirical is showing .002.
Factors that affect the chances of scoring (September 24, 2003)
Posted 3:48 p.m.,
September 26, 2003
(#9) -
tangotiger
If you are referring to this
"A leadoff hitter will get on base with 0 outs about 48% of the time, while he'll do so about 26% of the time with 1 or 2 outs each"
this is saying that the leadoff hitter of the game will get 48% of his times on base with 0 outs.
Factors that affect the chances of scoring (September 24, 2003)
Posted 6:56 p.m.,
September 26, 2003
(#11) -
Tangotiger
Those are leadoff innings.
Factors that affect the chances of scoring (September 24, 2003)
Posted 7:55 a.m.,
September 27, 2003
(#14) -
tangotiger
Just to clear up what I'm doing:
Sum(ERA*IP)/Sum(IP), where I sum over all pitchers
Expanding that, and we get
Sum([ER/IP*9]*IP)/Sum(IP)
Sum(ER*9)/Sum(IP)
9*Sum(ER)/Sum(IP)
And that gives you lgERA.
Factors that affect the chances of scoring (September 24, 2003)
Posted 7:57 a.m.,
September 27, 2003
(#15) -
tangotiger
Oh, and this is true, as long as the IP > 0 for every pitcher.
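A quick numeric check of that identity, as a sketch only. The three pitchers below are invented (ER, IP) pairs, not real data, and every one has IP > 0, as required.

```python
# Hypothetical pitchers as (earned runs, innings pitched); not real data.
pitchers = [(80, 200.0), (45, 90.0), (10, 12.0)]


def era(er, ip):
    """Earned run average; undefined when ip is 0."""
    return er / ip * 9


total_ip = sum(ip for _, ip in pitchers)

# Left side: each pitcher's ERA weighted by his innings.
weighted_era = sum(era(er, ip) * ip for er, ip in pitchers) / total_ip

# Right side: league ERA computed from the totals.
league_era = 9 * sum(er for er, _ in pitchers) / total_ip

print(round(weighted_era, 3), round(league_era, 3))  # 4.023 4.023
```

Weighting by something other than IP (say, by walks allowed, as in the eraBB figure) breaks the cancellation, which is why those weighted ERAs differ from the league ERA.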
Factors that affect the chances of scoring (September 24, 2003)
Posted 12:54 p.m.,
September 29, 2003
(#19) -
tangotiger
Ross, I really don't know what your point is.
Given that a player reaches 1B with 0 outs, it doesn't matter if he gets there by single or walk, as he'll score about 40% of the time. Whatever extra information is contained in the manner in which a runner reaches 1B is almost insignificant. The significance is almost completely explained by my 5 factors.
Factors that affect the chances of scoring (September 24, 2003)
Posted 6:53 a.m.,
September 30, 2003
(#21) -
Tangotiger
"This means that drawing a walk, on average, is less likely to lead to a run scoring than a hit"
This is a given. Walks occur more frequently with 1b open than not, relative to a single, and walks occur more frequently with 2 outs than not, relative to a single.
Across all base/out states, no question, a walk leads to fewer runners scoring than a single. But, GIVEN the base/out state, the runner on a walk is virtually just as likely to score as the runner on a single.
Are we agreed?
Factors that affect the chances of scoring (September 24, 2003)
Posted 11:56 a.m.,
September 30, 2003
(#23) -
tangotiger
"I have no idea whether it is true under other circumstances."
I'm not going to run the data for you, but my best guess is that it is the same. How you get there is inconsequential, after you account for "the 5 factors". I can say this with confidence because my Markov models match the empirical based on this assumption.
"It would seem unlikely since a runner is much more likely to advance to second on a single with a runner on base, while a leadoff batter advancing to second is counted as an error in the data above."
I read this 4 times now, and I don't understand what you are trying to say.
"So when we look at Factor 3 above, how does this effect it? Doesn't it imply that a batter with more walks is less likely to get on base with 0 outs and therefore less likely to score?"
Correct. As you note in your next statement, this is the average walk. While the average single will score about .26 times, the average walk will score .25 times. (These numbers are dependent on the environment, but the effect is about .01 runs / time on base.) I documented this somewhere in the comments section of the "How Runs Are Really Created" series.
By the 24 base/out states however, there is no difference. (or the difference is as shown in my quote above, with a difference of .002 runs / time on base). Talking fan to fan, it's no difference. Talking professor to professor, well, that's boring isn't it?
"Of course we are talking about the average walk. It may well be that the pattern of when players walk varies widely. In fact its pretty likely."
The walk and K varies the most by base/out states, and therefore, I suspect that those kinds of players vary the most.
Factors that affect the chances of scoring (September 24, 2003)
Posted 8:04 a.m.,
October 1, 2003
(#25) -
Tangotiger
Ross, if you understand Markov, then I don't get your followup questions, and I will be happy to bow out at this point. If you don't understand Markov, then I'd be happy to explain it.
"Which means nothing for the overall likelihood of scoring on a walk or a single."
I know. That's why I said the .26/.25 for overall, for this very reason. Overall, it's .26/.25. By the 24 base/out, it's almost certainly the same. I say that because of Markov.
Factors that affect the chances of scoring (September 24, 2003)
Posted 12:29 p.m.,
October 1, 2003
(#27) -
tangotiger
(homepage)
Basically, what Markov says is that "how you enter a state is independent of how you leave a state".
So, if you can picture the different ways that you enter the "2nd and 1 out" state:
- start state: 1 out, event: double, end state: 2nd and 1 out
- start state: 1 out, man on 2b, event: single, end state: 2nd and 1 out (runner scores, and batter takes 2b on a single... something that is unlikely with a walk, I agree)
- start state: 0 outs, man on 1b, event: succ bunt, end state: 2nd and 1 out
etc, etc, etc.
So, GIVEN that you've got a man on 2b and 1 out, what happens to that runner, according to Markov chains, is independent of how he got there.
Now, is this true?
What's cool is that you can compare the expectation from a Markov chain to empirical analysis, and you'll get results that are close enough that you can make that claim.
The same applies for getting on 1b and 0 outs. If you get there on a walk, you know that the pitcher is slightly worse than the pitcher that made you get on base with a single. Maybe if you got a walk, you are more likely to come from the top of the order, and so you have better hitters behind you. Maybe a single happens with fast players more, etc, etc, etc. There is a lot of "hidden information" contained in how you get there. But, empirical analysis shows that the rate at which you score from a walk or single with 1b and 0 outs is virtually the same (.399 to .397).
Now, when I produce the Linear Weights results for each event, I base this on the empirical data for 1999 to 2002. That is, I look at exactly how many runs scored from that event/base/out to the end of the inning. (See above homepage link. The overall values are in the last line.)
If I set up my Markov chains, I'll get overall numbers that match pretty closely to that last line.
And, as a third step, I can also reproduce the numbers in the last line by assuming things like "single/walk = same chance of scoring". So, again, whatever differences might exist is insignificant.
I'm happy with the data that I've produced, and the reasoning behind it. I'm not going to do further work here, but I encourage you to download the Retrosheet event files, or Ray Kerby's program, and do the work yourself.
Finally, while a single might find the batter at 2b (because of error on the throw, or taking 2b on the throw home, etc), he might also find himself out for trying to take the extra base. Like I said, my best guess is that the difference between a single and walk scoring is probably pretty darn close for each of the 24 base/out states. This assumption leads to results that are consistent with the empirical LWTS, the Markov LWTS, and the LWTS process I detailed in Article 2 of "How Runs are Created".
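To make the Markov idea concrete, here is a toy sketch, not the model used for the numbers above. It follows a single runner and ignores other baserunners, double plays, steals and the rest. The outcome probabilities are invented for illustration and are not fitted to any data. The point is only that the function takes the current (base, outs) state and nothing else, so a runner who reached 2nd and 1 out by a double and one who got there by a bunt get the same answer by construction.

```python
from functools import lru_cache

# Hypothetical per-plate-appearance outcomes for a lone runner:
# (probability, bases the runner advances, outs added). Illustration only.
OUTCOMES = [
    (0.68, 0, 1),  # batter is out, runner holds
    (0.22, 1, 0),  # runner moves up one base
    (0.07, 2, 0),  # runner moves up two bases
    (0.03, 3, 0),  # runner moves up three bases
]


@lru_cache(maxsize=None)
def p_score(base, outs):
    """Chance the runner eventually scores, given only the current state."""
    if base >= 4:   # crossed home plate
        return 1.0
    if outs >= 3:   # inning over, runner stranded
        return 0.0
    return sum(p * p_score(base + adv, outs + o) for p, adv, o in OUTCOMES)


# Same state, same answer, however the runner arrived there.
print(round(p_score(2, 1), 3))
print(round(p_score(1, 0), 3))
```

Comparing the output of a chain like this (built with real transition rates) against the empirical scoring rates is the kind of check described above.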
Factors that affect the chances of scoring (September 24, 2003)
Posted 12:32 p.m.,
October 1, 2003
(#28) -
tangotiger
(Note: at the above link, you'll find very weird results for HBP and IBB. Those are sample size issues. There are just not enough of them, at each of the base/out states, to feel confident with them. Even overall, the margin of error is high. )
Factors that affect the chances of scoring (September 24, 2003)
Posted 8:34 a.m.,
October 10, 2003
(#30) -
tangotiger
I've already acknowledged that speed is a component, as this was part of the 5 factors that affect scoring.
As a group, the chance of scoring from 1b on a single is close to that of a walk. Since there's a lot of speed information hidden inside a SB, I would expect the chance of scoring from 2b to differ between a double and a single/walk+steal. Obviously, at the individual level, this is even more true. The point is to try to establish how much hidden information there is in the unconsidered variables. The single/walk 1B thing does not have much.
Factors that affect the chances of scoring (September 24, 2003)
Posted 10:10 a.m.,
October 10, 2003
(#33) -
tangotiger
This is my point. When you consider all the variables, they don't amount to a hill of beans, overall.
Psychological Impact of a Devastating Outcome (September 27, 2003)
Posted 12:34 p.m.,
October 1, 2003
(#6) -
tangotiger
Great stuff! Thanks for the excellent ideas. I agree with your last statement as to how to proceed. This is exactly the way I'd do it.
Psychological Impact of a Devastating Outcome (September 27, 2003)
Posted 7:19 p.m.,
October 1, 2003
(#8) -
Tangotiger
How about from the hitter's perspective? Or do we think that because 30 minutes elapse between PAs that he has time to recover? Just the errors?
How about leaving runners stranded?
Or leaving bases loaded twice as a hitter?
CS in a close game?
3 consecutive Ks?
Psychological Impact of a Devastating Outcome (September 27, 2003)
Posted 6:17 p.m.,
October 13, 2003
(#11) -
tangotiger
Charles, your question was answered in a Clutch thread several months ago. I'll see if I can find it.
Another thought: a pitcher who hits a batter is more likely than his average to get hit himself his next time up.
Pitch Type and Count May Increase Risk of Elbow and Shoulder Pain in Youth Baseball Pitchers (September 27, 2003)
Posted 11:58 p.m.,
September 27, 2003
(#1) -
tangotiger
(homepage)
Check out this link too.
Pitch Type and Count May Increase Risk of Elbow and Shoulder Pain in Youth Baseball Pitchers (September 27, 2003)
Posted 11:02 a.m.,
September 29, 2003
(#6) -
tangotiger
(homepage)
If you click the above link, I show Maddux's estimated pitch counts. You can check the mlb.com site for his actual pitch counts. In any case, it's a fallacy that Maddux has low pitch counts, or that he routinely pulls himself from games early, etc, etc. Maybe these last 2 years, but not years earlier.
And I don't think his K totals are modest. I'm pretty sure they are above average.
Mark Prior and the Perfect Delivery (September 30, 2003)
Posted 1:31 p.m.,
September 30, 2003
(#2) -
tangotiger
That's a good point, and after bringing it up with Will, he also mentioned the videotape as the best tool to use.
2003 Park Factors (October 1, 2003)
Posted 7:02 p.m.,
October 1, 2003
(#2) -
Tangotiger
Patriot, with yahoo groups, you should right-click the file name, copy the URL address, and post that address. They randomize URLs, making this one inaccessible.
2003 Park Factors (October 1, 2003)
Posted 12:30 p.m.,
October 2, 2003
(#9) -
tangotiger
KJOK's great parks database can be found at the baseball-databank yahoo group.
Results of the Forecast Experiment (October 2, 2003)
Posted 11:57 a.m.,
October 3, 2003
(#4) -
tangotiger
(homepage)
A similar experiment was done a few years ago at the above link. In this case, rather than having 165 people picking on 32 select players, there were about 30 people picking on 125 or so players.
There were 3 good reasons I did not do as that study did:
1 - No interest in duplicating the work
2 - Much easier to get 100+ people to participate if I ask them to do less work
3 - By selecting the 32 players who were most inconsistent, it's here we'll see the variations in picks. The other say 100 players that I would have added would have left little variation. If you've got a guy with a 4.20, 3.98, 4.12 ERA, there'll be very little difference in what any group will pick. So, you can increase the sample size, but little variance will be added.
Alan, can you send me an email?
Player Game Percentages, World Series (October 8, 2003)
Posted 5:11 p.m.,
October 9, 2003
(#5) -
tangotiger
Reno, I'm thinking about doing this, though I'm not sure if I've got the time.
Pete, you are right that you need a different win probability matrix for different run environments. You also have a different run environment, from the hitter's perspective, if you have Pedro at Dodger Stadium rather than Joe Schlub at Coors. It gets to be a very very messy thing to try to do right.
I'll post a link to Doug Drinen's article, as it was top-notch as well.
Player Game Percentages, World Series (October 8, 2003)
Posted 5:14 p.m.,
October 9, 2003
(#6) -
tangotiger
(homepage)
This is Doug Drinen's Win Probability Added (WPA).
Player Game Percentages, World Series (October 8, 2003)
Posted 8:22 p.m.,
October 9, 2003
(#9) -
tangotiger
Having 1 million games won't give you the reliable data you need at the inning/score/base/out level. You need to create a math model based on various event/base-out-state/transition rates.
For those who don't know, the author of the article cited is the author of Curve Ball.
Batting Average on BIP, 1999-2002 (October 10, 2003)
Posted 8:02 a.m.,
October 10, 2003
(#1) -
tangotiger
(homepage)
And this is for 1974-1990.
Eventually, I'll update the first link to include 2003 and the second one to span 72-92.
Batting Average on BIP, 1999-2002 (October 10, 2003)
Posted 1:05 p.m.,
October 10, 2003
(#3) -
tangotiger
FJM: while I certainly would be wary of any 1 year differential, your suggestion is a good one, at the multi-year level.
When I get the 2003 data, I'll do breakdowns for 1999-2003 by
- lefty/righty + home/opponent
as well as the overall average.
That sound good?
Batting Average on BIP, 1999-2002 (October 10, 2003)
Posted 4:37 p.m.,
October 10, 2003
(#6) -
tangotiger
Day/night? Hmmm... I suppose grass/turf as well (within home/road), say Fenway-grass, BosAway-grass, BosAway-turf?
So, we've got:
- handedness of pitcher
- handedness of batter
- illumination
- park surface
- park location
- questec
I get the feeling that my sample size will get down to near nothing at this point. As well, the unbalanced schedule rears its ugly head, as does the question of pitchers batting or not.
I think what we need here is Alan's logistic function to make some sense out of all this.
Batting Average on BIP, 1999-2002 (October 10, 2003)
Posted 6:23 p.m.,
October 10, 2003
(#9) -
tangotiger
Actually, if you looked even closer, you will note that "team" does not appear as a heading, but "park" does.
Batting Average on BIP, 1999-2002 (October 10, 2003)
Posted 10:16 a.m.,
October 12, 2003
(#12) -
tangotiger
I agree. We think of Coors as "Coors" because of the amount of years of data.
But, I can easily go back and cherry-pick 1 year from Fenway or Wrigley or Fulton County or the Metrodome, etc, etc and find a 30% increase in runs home to road.
To be precise, 1 year is not meaningless (nothing is ever meaningless), but I'll guess that single-year park factors need to be regressed somewhere 50 to 80%.
And, as Ross points out, when applied to individual players (or style/quality of players), there's no reason that Jack Clark, Willie McGee and Vince Coleman get the same park factors either.
So, the reliability of using single-year park factors on any one player comes in pretty low (though still statistically significant).
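To make the regression step concrete, here is a minimal sketch. It assumes the regression is toward a neutral park factor of 1.00, and it treats the 30% home-to-road increase mentioned above as an observed factor of 1.30 purely for illustration. The function name is made up for this example.

```python
def regress_park_factor(observed, regression, neutral=1.0):
    """Shrink an observed park factor toward the neutral value.

    regression is the fraction of the observed deviation that is removed,
    so 0.5 keeps half of the deviation and 0.8 keeps one fifth of it.
    """
    return neutral + (observed - neutral) * (1.0 - regression)


# A single-year observed factor of 1.30, regressed 50% and 80% (assumed range).
for regression in (0.5, 0.8):
    print(regression, round(regress_park_factor(1.30, regression), 3))
```

Under those assumptions, a one-year 1.30 shrinks to somewhere between roughly 1.06 and 1.15.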
Batting Average on BIP, 1999-2002 (October 10, 2003)
Posted 6:06 p.m.,
October 12, 2003
(#17) -
tangotiger
The SD might be 5 for one game, but it won't be 5 for 81 games.
That is, if 1000 runs over 81 games are scored at Coors, you need the standard deviation of runs over 81 games (and not do 5 x 81).
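As a rough sketch of the scaling, and assuming the per-game run totals are independent with the same spread (an assumption, since parks, weather and opponents make games less than independent), the standard deviation of a sum grows with the square root of the number of games:

```python
import math


def sd_of_total(sd_per_game, games):
    """SD of the summed runs over `games` games.

    Assumes independent games with identical variance, so variances add
    and the SD scales with sqrt(games) rather than with games.
    """
    return sd_per_game * math.sqrt(games)


print(sd_of_total(5, 81))  # square-root scaling
print(5 * 81)              # the naive (too large) figure
```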
In any case, we really don't know how to apply PF to individual players with the current reports. You need to figure out park factors by handedness, gb/fb tendency, power/slap hitter, speed, and quality of player.
The problem is that if you want to know how Pac Bell affects all LH, FB, power, fairly fast, great hitters, you might end up having a sample of only 2000 PA over 5 years, 1000 of which would belong to Bonds.
Pythag Expansion (October 11, 2003)
Posted 12:24 p.m.,
October 11, 2003
(#6) -
tangotiger
(homepage)
The above is Patriot's link.
memo to Pat: easier to shove that in the "Homepage" box when posting.
memo to All: The PythagenPat equation is the best, most reliable equation we currently have.
Pythag Expansion (October 11, 2003)
Posted 4:52 p.m.,
October 13, 2003
(#15) -
tangotiger
(homepage)
Right click on the above link, and "save picture as" to your PC. Then double-click the file that you just downloaded.
(If you exceed my bandwidth limit, try again tomorrow. I'll try to upload it to baseballstuff.com, but I'm having problems from work right now.)
I used the PythagoPat (with the .28 value) to figure out the expected win% for RS+RA from 2 to 16, and from RS-RA from 0 to 5. Then, I simply figured the RPW (runs-per-win) converter as (RS-RA)/(win% - .500)
What are you going to see?
You are going to see the "black line" where the run differential is 0, and the RPW converter starting at "2" when RS+RA = 1. And you will see it go up, in an almost straight line, to almost "15" when RS+RA = 16.
You will see a few more lines at other RS-RA lines that follow the same pattern. (This corresponds to the legend on the right.)
As I mentioned in other threads, the PythagoPat corresponds the closest to the Tango Distribution, and that distribution is the best one to model actual win matchups.
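For anyone who cannot open the picture, the calculation can be sketched as follows. This assumes the exponent is x = (RS + RA)^.28 with runs expressed per game, which is my reading of "with the .28 value". The function names are illustrative only.

```python
def pythagenpat_win_pct(rs, ra, power=0.28):
    """Expected win% with exponent x = (RS + RA) ** power (runs per game)."""
    x = (rs + ra) ** power
    return rs ** x / (rs ** x + ra ** x)


def runs_per_win(total_runs, run_diff, power=0.28):
    """RPW converter: (RS - RA) / (win% - .500).

    run_diff must be non-zero; use a small value to approximate the
    zero-differential line.
    """
    rs = (total_runs + run_diff) / 2.0
    ra = (total_runs - run_diff) / 2.0
    return run_diff / (pythagenpat_win_pct(rs, ra, power) - 0.5)


# Approximate the run-differential-of-zero line at a few run environments.
for total in (1, 4, 9, 16):
    print(total, round(runs_per_win(total, 0.01), 2))
```

A tiny run differential stands in for zero here, since the converter is undefined when RS equals RA exactly.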
Pythag Expansion (October 11, 2003)
Posted 9:19 p.m.,
October 13, 2003
(#17) -
Tangotiger
David,
I wasn't trying to get into the fray here, because I'm not sure what the discussion actually is.
I will start a new thread to only discuss the RPW (and you will be able to see it without any trouble), so as to not hijack your discussion here.
postseason odds - Silver (October 11, 2003)
Posted 4:06 p.m.,
October 14, 2003
(#3) -
tangotiger
Alan, I didn't notice that. Thanks for pointing it out.
If you've been keeping track on a day-to-day basis, can you compile your results in an Excel file so that we can see the progress of each team?
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 9:50 p.m.,
October 11, 2003
(#2) -
tangotiger
I can't imagine it being much worse than 1%.
A pitcher that gives up 6 RPG will allow at least 11 runs over 9 innings almost 11% of the time. A pitcher that allows 4.5 RPG will allow at least 11 runs over 9 IP less than 4% of the time.
Not that this helps us any, but if one pitcher is at .14% of blowing it, I have to believe that the other pitcher would be under 1%.
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 3:56 p.m.,
October 13, 2003
(#4) -
tangotiger
I agree, it's a CYA (cover your a.s.s) move. Managers in baseball, as well as in the business world, do the exact same thing.
I do remember the Expos leading the Cubs 15-2 one game, and when I got home, I learned that Jeff Reardon got the save (final score: 17-15).
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 9:03 a.m.,
October 14, 2003
(#6) -
Tangotiger
Of course it means nothing. It was just a silly little anecdote that I remembered when I was a teenager.
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 1:14 p.m.,
October 14, 2003
(#8) -
tangotiger
Because it was virtually impossible for the Cubs to have done that to begin with. And like I said, it would have been virtually impossible for Florida to do the same, regardless of the Cubs doing that, UNLESS the wind was blowing out so much that the run environment was completely changed (and of course, changed for BOTH teams).
Phil Birnbaum published the empirical data, and I posted a link to that page on Primate Studies. Feel free to publish the results here.
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 12:40 a.m.,
October 15, 2003
(#11) -
Tangotiger
The keys are the following:
1 - Remove Prior after 5, and therefore, only have 4 innings remaining for the opposition
2 - The Cubs also having 4 innings to score
Therefore, how often does a team OUTscore its opponents by 11 runs over a span of 4 innings? My math model says .14% of the time, or 1 in 700. MGL's sim showed 1 in 500. What does Phil Birnbaum's empirical data show?
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 11:08 a.m.,
October 15, 2003
(#13) -
tangotiger
(homepage)
Putting "Birnbaum" in the search box below yielded the above link. If you click on that link, you will be directed to
http://www.philbirnbaum.com/winprobs.txt
According to this record
"V0601-6",1851,24
which reads
"when the visiting team is batting in the 6th inning, with 0 outs and bases empty and down by 6, they won 24 of 1851 games".
That's 1.3%. I think it's easy to see that when they are down by 11 that it would fall pretty low don't you think?
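If you want to pull these records apart yourself, here is a small sketch. The key layout is an assumption inferred only from the one reading quoted above (side, two-digit inning, outs, base state with "1" meaning bases empty, then the score differential); check it against Phil's file before relying on it.

```python
import csv


def parse_record(line):
    """Split one record into its parts (key layout is assumed, see above)."""
    key, games, wins = next(csv.reader([line]))
    return {
        "batting": "visitor" if key[0] == "V" else "home",
        "inning": int(key[1:3]),
        "outs": int(key[3]),        # assumed position of the outs digit
        "base_state": key[4],       # assumed: "1" means bases empty
        "run_diff": int(key[5:]),   # negative means the batting team trails
        "games": int(games),
        "wins": int(wins),
    }


rec = parse_record('"V0601-6",1851,24')
print(rec)
print(round(rec["wins"] / rec["games"], 3))
```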
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 11:17 a.m.,
October 15, 2003
(#14) -
tangotiger
FWIW, my math model, for the above situation reads 2.2%. Here's the chance of the home team losing the game, entering the 6th, if leading by 1 all the way to leading by 11.
33.3% 1
21.0% 2
12.7% 3
7.3% 4
4.1% 5
2.2% 6
1.2% 7
0.6% 8
0.3% 9
0.2% 10
0.1% 11
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 11:58 a.m.,
October 15, 2003
(#15) -
tangotiger
Oh, and I didn't say "impossible", but "virtually impossible". And, I also provided a concrete number (1 in 700) to define what "virtually impossible" means. So, you can quibble about my use of "virtually", but there's no reason to drop it from my statement, and to ignore the rather clear odds statement I made.
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 12:33 p.m.,
October 15, 2003
(#17) -
tangotiger
Sorry about that Ross.... natural reaction.
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 12:57 p.m.,
October 15, 2003
(#18) -
tangotiger
As for the absolutely most stunning inning I've ever seen (even my baseball-hating wife was intrigued by what was unfolding), the "Win Probability Added" to that fan is -.014 wins.
Cubs chance of winning was .970 if Alou catches, and .956 with the fan getting in the way.
Odds of Cubs losing an 11-run lead (October 11, 2003)
Posted 4:46 p.m.,
October 15, 2003
(#20) -
tangotiger
Excellent point there. The number of hard hit balls, rather than the number of hits I think is more telling. Prior was getting creamed, and who knows if the extra rest in game 2 would have helped. The presumption is that it certainly wouldn't have hurt, and the .0014 wins that the Cubs gained in game 2 was just not worth it.
RISP for hitters and pitchers (October 13, 2003)
Posted 12:44 p.m.,
October 14, 2003
(#5) -
tangotiger
(homepage)
If you guys keep clamoring for it, I might decide to get off my lazy a-- and do it seriously.
You can check out the above for what I did a few months ago.
RISP for hitters and pitchers (October 13, 2003)
Posted 3:48 p.m.,
October 14, 2003
(#8) -
tangotiger
Alan, I agree that that ARP process is sound, and I whole-heartedly support it. I just happen to support it to the same extent that I would for hitters, too.
I would support it more for pitchers than hitters, but not much more, if research shows that pitchers have more variability in their performance with men on base v bases empty than hitters do.
RISP for hitters and pitchers (October 13, 2003)
Posted 5:52 p.m.,
October 14, 2003
(#10) -
tangotiger
What happens with ARP is that if you happen to be brought in a lot with men on base, and you happen (by luck or design) to get yourself out of those jams (but not necessarily your own jams), you will get a very favorable ARP.
In essence, ARP allows you to leverage your talent on the mistakes of others. I whole-heartedly support this.
At the same time, Eddie Murray leveraging his talent by having a lot more men on base to work with is just as valid.
Game State Matrix (October 13, 2003)
Posted 6:21 p.m.,
October 13, 2003
(#2) -
tangotiger
It was based on empirical data, and even though it's 2000 games, it's still ONLY 2000 games. Believe me, you need far far more than that, which is why a simulator or a math model would be better, IF those models can match reality in terms of better pitchers coming into the game in close games, etc.
If you want, Studes, take Phil Birnbaum's data from several weeks ago, and reconstruct this graph with Phil's data.
As for the software used, it's mentioned elsewhere on Rhoid's site.
Game State Matrix (October 13, 2003)
Posted 6:28 p.m.,
October 13, 2003
(#3) -
tangotiger
(homepage)
Studes,
The software was developed by the person in charge of the baseball Rhoids' site. You can check out the above link.
Game State Matrix (October 13, 2003)
Posted 12:47 a.m.,
October 15, 2003
(#6) -
Tangotiger
If it's the bottom of the 9th, tie game, bases empty, 0 outs, say that gives the home team a .65 chance of winning. If he hits a triple, say that gives the team a chance of winning of .95. So, that triple with bases empty adds .30 wins. The next batter, if he drives in that runner, will add only .05 wins.
To answer the question being put forth: it's all taken care of. The win probability is based on assuming a typical random distribution of FUTURE events to establish the CURRENT win probability.
So, to start the 9th, we *expect* 30% of the time the team to score. And the other 70% of the time? Well, it goes into extra innings, and you get (an almost) 50-50 chance of winning. .30 + .70/2 = .65
You get the triple with 0 outs, and you have a nearly 90% chance of scoring. Why? Because you put yourself on 3B, and you *expect* the next 3 batters to perform at league average, and therefore, the batter-runner will reap the large majority of the share of getting on base.
On the other hand, if it was 2 outs, and the batter walked, he'd add very little value to his team winning. The batter that drives him in would get the lion's share of the credit.
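The arithmetic above can be sketched like this. It is a simplified illustration, not the full game-state model: it assumes a tie game in the bottom of the 9th, a flat 50-50 chance in extra innings, and the rough scoring chances quoted above (30% to start the inning, about 90% after the leadoff triple).

```python
def home_win_prob_tied_bottom_9th(p_score, p_extra_innings_win=0.5):
    """Win probability = score now, or fail to score and win in extras."""
    return p_score + (1.0 - p_score) * p_extra_innings_win


before = home_win_prob_tied_bottom_9th(0.30)  # bases empty, 0 outs
after = home_win_prob_tied_bottom_9th(0.90)   # runner on 3B, 0 outs

print(round(before, 2), round(after, 2))
print("wins added by the triple:", round(after - before, 2))
print("left for whoever drives him in:", round(1.0 - after, 2))
```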
Injury-prone players (October 14, 2003)
Posted 12:28 p.m.,
October 14, 2003
(#2) -
tangotiger
(homepage)
The above link is probably what Steve was originally referring to.
You can google the following
hammonds treder davis site:baseballprimer.com
Injury-prone players (October 14, 2003)
Posted 12:39 p.m.,
October 14, 2003
(#3) -
tangotiger(e-mail)
Steve, a couple of things:
1 - I'm not sure how I missed this thread, but I'm glad I saw it finally.
2 - I see that you have some players who were on the DL as of opening day. I also see that you selected the players over 3 weeks prior to opening day. However, were any of those players on the DL in the last week of 2002, the prior season? That is, they were on the DL until Oct 1, 2002, and starting from March 31, 2003. In fact, they could have been injured the whole time, and the way you selected the players, you might have grabbed a few like that. It certainly won't explain the whole difference though.
3 - Age should have been a controlled variable as you alluded to. Perhaps a breakdown of the 50 players by: born 1977 or later, born 1970-1976, born 1969 or earlier (or whatever boundaries you want to set so that the 100 players fall more or less as 33/33/33).
4 - As well, rather than just DL days, just "players on DL", and maybe a breakdown by "15 days or less", "60 days or more", "16-59 days".
At this point, the level of granularity might not leave us with much.
I think you already did great work on this. If you've had enough with this, send me your spreadsheet, and I wouldn't mind taking a look at this.
And for posterity, I think an "official" writeup of your findings is called for, and I'd be happy to post it here, or send it to the home page for publication.
Great job!
Injury-prone players (October 14, 2003)
Posted 1:39 p.m.,
October 14, 2003
(#5) -
tangotiger
Thanks to Steve for providing the data.
I split his data into
Old: born 1970 or earlier (30 players)
Young: born 1975 or later (27 players)
MiddleAged: born 1971-1974 (43 players)
Among the old players, 10 were non-injury prone, and 20 were. The average DL time per class was: 9 days for non-injury prone, and 65 days for injury prone.
Among the young, 21 were non-injury prone, and 6 were. 18 DL days for the healthy ones, and 40 for the unhealthy. Though, at 6 data points...
Among the middle aged folks, 19 non-injury prone, and 24 injury-prone. TWO DL days for the healthy, and 27 for the unhealthy.
I think this is a fascinating idea, and if you want to improve on this study, I would suggest the following:
Do a matched-pair study. That is, you have 2 groups that are equals in terms of:
- age
- position
- body type
- performance level
but differ in the number of times and number of days on the DL over the last 4 years.
I would also say that you would want all players to NOT have been in the DL in 2003, so that there are no "lingering" effects.
Removing catchers and pitchers is also the right idea.
So, you find out say Alfonso Soriano's twin (born within 1 year, plays 2B, BMI within 1, above average hitter), but someone who has gone on the DL in 2000-2002.
You could potentially do this for past years, as long as you don't let your future knowledge make you select players.
Fascinating stuff Steve!
Injury-prone players (October 14, 2003)
Posted 2:49 p.m.,
October 14, 2003
(#6) -
tangotiger
Just running some more stuff on Steve's data.
I get an r of almost .40 between age/proneness to days on DL.
Days on DL = x + y, where
x = 31 if injury prone
y = 1.3 * (Age - 23)
So, a non-injury prone 23 year old would be expected to be on the DL 0 days, while an injury-prone 36 year old would be expected to be on the DL 48 days.
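Written out as a small sketch (the function name is just for illustration; the coefficients are the ones above, fitted on Steve's 100 players, so treat the output as rough):

```python
def expected_dl_days(age, injury_prone):
    """Days on DL = x + y, with x = 31 if injury prone, y = 1.3 * (age - 23)."""
    x = 31 if injury_prone else 0
    y = 1.3 * (age - 23)
    return x + y


print(round(expected_dl_days(23, injury_prone=False)))  # healthy 23 year old
print(round(expected_dl_days(36, injury_prone=True)))   # injury-prone 36 year old
```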
Injury-prone players (October 14, 2003)
Posted 4:04 p.m.,
October 14, 2003
(#8) -
tangotiger
dlf,
It's an interesting thought.
So, you are saying that if you look at players from 2000 to 2003, and you've got 2 players with a .280/.350/.470 line, but:
- player 1 did that with 1500 PA over the 4 years, though never on the DL in 2003
- player 2 did that with 2500 PA over the 4 years, and never on DL
that.... What would happen in 2004?
We're not really tracking what the performance level of the player will be in 2004, but rather what his DL status will be.
By taking two players that are similar in performance level, body type and position, we've got "twins". If they didn't have the same performance level, then maybe there's something else different about them.
Technically, you want the injured player to have a higher performance level (if above average), on a rate basis, to make these guys equals. Why? Because his observed rates occur on a smaller PA sample size, and therefore, are less reliable to his true talent level.
Injury-prone players (October 14, 2003)
Posted 4:23 p.m.,
October 14, 2003
(#10) -
tangotiger
Hmmm... but how about the flip-side? How about a guy who is hurt, but doesn't land himself on the DL?
Isn't it possible that the non-DL guy will play hurt more than the DL-guy? I'm not sure... just throwing it out there.
Injury-prone players (October 14, 2003)
Posted 7:26 p.m.,
October 14, 2003
(#14) -
Tangotiger
The "x"/"std error" was about 3.5 for the "31" and less than 1 for the "1.3".
If those refer to the standard deviations, then I suppose the second parameter is probably chance.
Can you fill in the blanks, Alan?
Injury-prone players (October 14, 2003)
Posted 1:03 p.m.,
October 17, 2003
(#22) -
tangotiger
That's a good point. If you have an injury in which it's expected to be recurring, then we're not really addressing the issue of random injuries.
So, the selection of injury-prone players would be those players who have an injury that is not expected to recur. In hockey and football, Troy Aikman, Eric Lindros, Pat LaFontaine all have had multiple concussions, and I think we can recognize that this injury might be more likely to happen to guys who've already had it before, and we're not really proving anything.
What we want is really for a guy to have been on the DL for one ailment, and then been prone to be on the DL for another ailment.
You can also have a concurrent study of guys who pull their hammys and get on the DL, and then get back on the DL for the same ailment. That is, how recurring is an injury? In this case, a regression analysis would probably suffice.
Good point Andrew!
Injury-prone players (October 14, 2003)
Posted 4:16 p.m.,
October 21, 2003
(#25) -
tangotiger
J, that's a great idea!
I have long been annoyed at the prognosticators that did not have the decency to revisit what they've said, before going on to their next Nostradamus projections.
Tom Tippett is one of the very few that has actually laid it all out. Voros did this as well for a few years.
And, this is not just baseball, but in stock picking, weather, football lines, and any forecasting model. As far as I'm concerned, if someone makes a set of forecasts, he should be obligated to go back and look at how well he did, and let the readers know.
Anatomy of a Collapse (October 15, 2003)
Posted 3:41 p.m.,
October 15, 2003
(#1) -
tangotiger
Note2: I'm also assuming that the pitchers/fielders are equally responsible on balls in play, which again, may not necessarily be a good assumption when looking at these actual plays. Some balls were extremely hard hit, and most of the blame on those BIP should go to the pitcher.
Anatomy of a Collapse (October 15, 2003)
Posted 4:42 p.m.,
October 15, 2003
(#11) -
tangotiger
DK: yes, I threw that in with "Fielders".
Craig: Good point. If I were to watch the inning over, we can do additional micro-level things as you are doing it. The key is for everything to add up.
As well, there were many hard hit balls in that inning, making the OF almost inconsequential in those cases, and Prior/Farnsworth should get more blame.
***
Looking at the Jeter play at the end, you can almost credit that play as if it was 80% hit, 20% out (and credit the pitcher as such). That Jeter makes the play gets him more credit. At the same time, Bernie looked like he might have been able to make it, so, technically, you could give Bernie some credit for being in the position to make that play (say if Jeter wasn't there, maybe Bernie would have made that play 85% of the time). You can really get into the nitty-gritty with all this. However, if you look at it on a seasonal basis, this kind of approach will lead to very few surprises (the breaks even out, more or less). The impact is really felt at the game or inning level.
Anatomy of a Collapse (October 15, 2003)
Posted 4:48 p.m.,
October 15, 2003
(#13) -
tangotiger
By the way, when I did WPA for 1999-2002, one of the pitchers that was in the leaders was Mike Remlinger. I haven't looked at his performance this year, but how has he pitched? Is there a reason that Dusty did not go to him earlier? Was this his pattern all year?
Anatomy of a Collapse (October 15, 2003)
Posted 5:42 p.m.,
October 15, 2003
(#21) -
tangotiger
Thanks for the kind words, guys!
Matt: hmmm... I agree, it would not be a 100% sure out. Let's say that Alou would have made that play 80% of the time. So, the fan's WPA should be -.025 and not -.031. Alou+Prior get +.025.
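A quick sketch of that adjustment, assuming the -.031 is the fan's WPA if the catch were a sure out and that 80% is only a guess at Alou's chance of making the play:

```python
def fan_wpa(wpa_if_sure_out, p_catch):
    """Scale the sure-out WPA by the chance the play would have been made."""
    return wpa_if_sure_out * p_catch


fan = fan_wpa(-0.031, 0.80)
print("fan:", round(fan, 3))
print("Alou+Prior:", round(-fan, 3))  # the offsetting credit
```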
Anatomy of a Collapse (October 15, 2003)
Posted 9:18 a.m.,
October 16, 2003
(#27) -
Tangotiger
I'll guess it's around .4^3 = 6%
Anatomy of a Collapse (October 15, 2003)
Posted 12:21 p.m.,
October 16, 2003
(#28) -
tangotiger
FJM: that's a good point, and I'm glad you brought it up. On a 3-2 count, it is definitely a hitter's count. In a random inning (say the 4th), if the team normally has a .500 chance of winning on an 0-0 count, it would have a .507 chance of winning on a 3-2 count. I will take a guess that if the chance of Cubs winning at the start of the Castillo PA was .924, then it was probably around .922 or .921 on a 3-2 count. So, instead of the fan costing .031 wins, it might have been .033 or .034. That's assuming a sure out, which it certainly wasn't. Giving Alou an 80% chance of coming out with that ball, that reduces the fan to .026 loss. At this point in the game, the count didn't come into play much.
On a side note, since the count was 3-2 before the fan got in the way, and it remained 3-2, there was no change in Win Expectancy for having the pitcher throw an extra pitch. It was essentially a "let".
Anatomy of a Collapse (October 15, 2003)
Posted 3:30 p.m.,
October 16, 2003
(#30) -
tangotiger
I think Keith Woolner did some study along those lines. I don't remember the results of that. My guess is that if you are at 3-2, and you continually foul off the pitch, you may get a minuscule advantage as a hitter, but I'd be surprised if the numbers would show anything that would be statistically significant.
Anatomy of a Collapse (October 15, 2003)
Posted 4:13 p.m.,
October 16, 2003
(#32) -
tangotiger
(homepage)
This is the Woolner study.
Aggregating the "at least 10 pitches" here's the breakdown:
5,935 PA / .242 / .430 / .406
A batter on a full count does better than that. I really don't think you see a tiring effect on that particular batter, but perhaps on the next batter or for the rest of the inning, maybe you might.
Perhaps consecutive 25 pitch innings might have an effect on the 3rd inning too.
Anatomy of a Collapse (October 15, 2003)
Posted 11:35 a.m.,
October 17, 2003
(#35) -
tangotiger
FJM: I agree, the Woolner study does not show what we want, and that was pretty clear the first time I read his article. He did follow up on that in a "mailbag" issue, I think. There was a similar issue with reaching base on error, from what I remember.
The "problems" in those articles, from the perspective of some of the readers, was they had a question to answer, and the data was not properly set up for them to do so. However, from the perspective of Keith, he was simply presenting data from a different perspective, and trying to infer things from that data.
This is the classic process that I describe: ask your question first, then find the data to answer that. Doing the reverse, like "here are the results of 10-pitch-plus outings...what does that tell you" won't answer every question you may have.
Anatomy of a Collapse (October 15, 2003)
Posted 12:03 p.m.,
October 22, 2003
(#36) -
tangotiger
On BP (from probably Woolner), they are reporting
"Visitors have three-run lead with one out in the 8th inning, 1972-2003: 6281-445 W-L (.934 winning percentage) "
In my math model, I have it at .929. Again, all those things that differ from the overall average in this situation (8th inning, 3 run lead) don't amount to much, if anything.
Anatomy of a Collapse (October 15, 2003)
Posted 12:14 p.m.,
October 24, 2003
(#38) -
tangotiger
The analysis is useful to the point that you accept the underlying assumptions of "win probability added". That assumption is that you only care about the current inning/score/base/out/pitch state, and not how you got there.
Therefore, the traumatic experience that Mark Prior had with the 4 fans reaching over, with one of them batting it away is irrelevant to him walking Castillo, under these assumptions.
If you mean to say that the assumptions are not acceptable, then I don't have a problem with that.
Relevancy of the Post-season (October 16, 2003)
Posted 2:09 p.m.,
October 16, 2003
(#1) -
tangotiger
And this is true in other NA sports as well. No one believes that the 1993 Canadiens were the best team in the league. They happen to have a goalie, Patrick Roy, who did not play so well in the regular season, but was un-freaking-believable in the playoffs, reeling off 10 consecutive OT wins. And in 1986, the rookie Patrick Roy, after a so-so season, established his dominance again. This was done much easier after the ouster of the powerhouse Oilers at the hands of the tough Flames that year.
Don't talk to me about the World Series or Stanley Cup proving anything about who the best team is, as if "best" can be established by playing 3 or 4 teams over 20 games.
Relevancy of the Post-season (October 16, 2003)
Posted 1:10 p.m.,
October 17, 2003
(#6) -
tangotiger
The question is: who is the best team in the NL? The Braves, Giants, or Florida?
Given that a team is not static, and the players that make up each team change day to day, or month to month, what is even meant by the "best team"? The team whose players have performed better than their opponents over... what? the first 162 games (i.e., regular season)? The last 20 games (i.e., playoffs, more or less)? Was Seattle's 116 win season meaningless, because of the existence of a post-season? And if they played with European rules, where the regular season champ is the champ, then they were the best team?
While some people subscribe to the view that the objective of the playoffs is to establish the best team in a league, my perspective is that the purpose of the playoffs is to create fun, money, and drama.
Neither perspective is any more right than the other.
Relevancy of the Post-season (October 16, 2003)
Posted 4:17 p.m.,
October 17, 2003
(#8) -
tangotiger
I agree with Patriot. The Marlins are the NL champs, since they beat the contenders for the title.
Tiger Woods is the best golfer even if he doesn't win the Masters. Serena is the best tennis player, even if... etc, etc, etc. This is no slight on the champions of any of those events.
Chance of Winning a Baseball Game (October 20, 2003)
Posted 8:26 p.m.,
October 20, 2003
(#2) -
Tangotiger
Yes, I should have added what the runs per inning distribution was.
Chance of Winning a Baseball Game (October 20, 2003)
Posted 12:09 a.m.,
October 21, 2003
(#4) -
Tangotiger
Yes, something like 71% 0 runs, 15% 1 run, 8% 2 runs, etc, etc.
You have to be a little careful here, since the runs per inning distribution is not random between teams, since they play in the same park, and the batting order is a problem as well.
You can also try to extend this to other sports, but again, the points / game distribution would be even less random between 2 teams (probably). Baseball is unique that each team is guaranteed a certain clock (27 outs). Hockey, football, soccer, basketball have their possessions / time determined by both teams.
Still, if someone has a log of scores, including the identity of teams for the other sports, it should be easy enough to construct a similar chart as I have here.
Chance of Winning a Baseball Game (October 20, 2003)
Posted 2:28 p.m.,
October 21, 2003
(#6) -
tangotiger(e-mail)
(homepage)
Good job FJM! The actual numbers are
0 - 73.1%
1 - 14.9%
2 - 6.7%
3 - 3.0%
4+ - 2.2%
Rounding errors would give you your results.
If you go to the above link, you will see Phil Birnbaum's chart of actual data. Look for this record:
"H0901-1",4685,887
which is read as "Home team batting, 9th inning, bases empty 0 outs, down by 1 run occurred 4685 times, and won 887 times".
That works out to: 18.9%
My chart, which assumes everyone is always equals, says: 19.4%
With 1 standard deviation being .006, the difference between these two numbers is not statistically significant.
So, while it would be more accurate to use actual run scoring by inning/score (to simulate the closer coming in, etc), we're actually pretty close to reality, aren't we? That is, all that stuff that we know is true doesn't have anywhere near the impact that we might think.
Anyone feel like reproducing my chart using Phil's data?
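Here is a sketch of the significance check on the "H0901-1" record. It assumes a simple binomial model, with each of the 4685 games treated as an independent trial, which is an approximation.

```python
import math


def binomial_se(p, n):
    """Standard error of an observed proportion under a binomial model."""
    return math.sqrt(p * (1.0 - p) / n)


wins, games = 887, 4685
empirical = wins / games
se = binomial_se(empirical, games)
model = 0.194  # the "everyone is equal" chart value

z = (model - empirical) / se
print(round(empirical, 3), round(se, 4), round(z, 2))
```

The gap comes out at less than one standard error, which is why I'm not reading anything into it.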
Chance of Winning a Baseball Game (October 20, 2003)
Posted 3:40 p.m.,
October 21, 2003
(#7) -
tangotiger
Phil sent me a note saying he will reformat the data to be much easier to import. Look for it in the coming days.
Chance of Winning a Baseball Game (October 20, 2003)
Posted 6:12 p.m.,
October 21, 2003
(#9) -
tangotiger
(homepage)
Also note that I don't have a HFA, so that'll cost something there too.
Click on the above link for the ACTUAL results, based on Phil's data.
Chance of Winning a Baseball Game (October 20, 2003)
Posted 11:56 a.m.,
October 22, 2003
(#13) -
tangotiger
(homepage)
This chart uses probability theory as well, but this time I DO have a HFA. This will make the comparison to the empirical chart much more appropriate.
I kept the run environment at 4.3, but put the home team at 4.5 RPG and the visiting team at 4.1 RPG. The empirical data has the home team with a .542 winning record, while in my probabilistic model I have it at .539. I could have best-fitted it to .542 as well (by setting the RPG to 4.51 v 4.09 or something), but the current HFA is actually lower. Whenever the empirical data gets expanded to cover the recent seasons, the .539 will probably be about right.
Anyway, now the effect that FJM notices in the bottom of the 9th is much stronger, and the likelihood is simply that you have better pitchers in the game.
Chance of Winning a Baseball Game (October 20, 2003)
Posted 2:24 p.m.,
October 22, 2003
(#15) -
tangotiger
HFA was around .540 for the longest time, but over the last 10 years, that has shrunk considerably, for unknown reasons.
As for "significantly lower" and "true closers", you seem to be implying that by having a "true closer" that you must have much better pitching in the 9th these days than in the old days (relative to their league). I would say that the "true closer" is not even necessarily the best reliever on the team, and I would highly question your presumption of the "true closer" impact, and even if you are right, I would be shocked if it's "significantly lower".
Chance of Winning a Baseball Game (October 20, 2003)
Posted 3:09 p.m.,
October 22, 2003
(#16) -
tangotiger
1999-2002
row1: all PAs, in the 9th or later innings, with a score differential within +/- 1 run
row2: all PAs
close/late: .254/.350/.389
all: .267/.338/.429
1974-1978
close/late: .255/.338/.353
all: .258/.325/.377
Taking the ratio of OBAxSLG and we have
1999-2002: Close/Late: 94% of league average
1974-1978: Close/Late: 97% of league average
I probably should have removed IBB, but oh well.
So, setting the league average runs scored to 4.3, that gives us:
1999-2002: Close/late: 4.04 runs per game
1974-1978: Close/late: 4.17 runs per game
So, yes, the good inning late relievers of today are better than their peers of 30 years ago, relative to league average (for reasons of skill and dilution). But the impact is pretty small.
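The OBAxSLG step can be sketched as below, using the slash lines posted above and the same 4.3 league run environment. The function names are illustrative. Small differences from the posted runs-per-game figures can come from rounding the ratio before multiplying.

```python
def relative_offense(oba, slg, lg_oba, lg_slg):
    """Ratio of OBA x SLG in a split to OBA x SLG overall."""
    return (oba * slg) / (lg_oba * lg_slg)


def equivalent_rpg(ratio, league_rpg=4.3):
    """Scale the league run environment by the relative offense."""
    return ratio * league_rpg


# (close/late OBA, SLG), (all OBA, SLG) as posted above
splits = {
    "1999-2002": ((0.350, 0.389), (0.338, 0.429)),
    "1974-1978": ((0.338, 0.353), (0.325, 0.377)),
}

ratios = {}
for era, ((oba, slg), (lg_oba, lg_slg)) in splits.items():
    ratios[era] = relative_offense(oba, slg, lg_oba, lg_slg)
    print(era, round(ratios[era], 3), round(equivalent_rpg(ratios[era]), 2))
```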
Using the Tango Distribution, this is what it works out to:
Runs/inn 4.04 4.17
0 0.7440 0.7379
1 0.1460 0.1483
2 0.0627 0.0644
3 0.0270 0.0280
4+ 0.0203 0.0214
Chance of Winning a Baseball Game (October 20, 2003)
Posted 9:48 a.m.,
October 23, 2003
(#19) -
tangotiger
Situation: home 9th inning, home team down by exactly 1
1999-2002: .250/.325/.386, 86.5% of league, 3.72 equivalent RPG
1974-1978: .254/.316/.360, 92.8% of league, 3.99 equivalent RPG
Tango Distribution
3.72:
0 runs - 75.9%
1 run - 14.0%
2+ runs - 10.1%
3.99:
0 runs - 74.6%
1 run - 14.5%
2+ runs - 10.9%
So, with the current relievers, they lose the game in the 9th 10.1% of the time, and go into extra innings to lose another 7% for a total of 17.1% loss.
With the 70s style/talent, they lose in the 9th 10.9%, and another 7.2% for a total of 18.1% loss.
Phil's data from 1978-1990 says 18.9%.
The "theoretical everyone is equal, with HFA" says 20.4%
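The loss totals above combine two pieces: the home team scoring 2+ (game over in the 9th) and the home team scoring exactly 1 (tie, then extra innings). A minimal sketch, assuming the extra-inning game is treated as roughly a coin flip; the post does not state that figure, but it is what the "another 7%" implies.

```python
def loss_chance(p_one_run, p_two_plus, p_lose_in_extras=0.5):
    """Chance that the team protecting a 1-run lead in the bottom of the 9th loses."""
    lose_in_ninth = p_two_plus                     # home team scores 2 or more
    lose_in_extras = p_one_run * p_lose_in_extras  # home team ties, then wins later
    return lose_in_ninth + lose_in_extras

current = loss_chance(0.140, 0.101)    # 1999-2002 figures from the post
seventies = loss_chance(0.145, 0.109)  # 1974-1978 figures from the post
print(current, seventies)
```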
Evaluating Catchers (October 22, 2003)
Posted 6:28 p.m.,
October 22, 2003
(#2) -
tangotiger
Unfortunately, I don't have access to the PBP files from 1993-1998. Not only am I missing out on a good chunk of years, but all that cross-referencing that I do would also be missed.
Evaluating Catchers (October 22, 2003)
Posted 9:09 a.m.,
October 23, 2003
(#6) -
Tangotiger
Follow the link to "Ray Kerby" from the article.
Evaluating Catchers (October 22, 2003)
Posted 9:32 a.m.,
October 23, 2003
(#7) -
tangotiger
Thanks for your kind words, guys.
Evaluating Catchers (October 22, 2003)
Posted 10:57 p.m.,
October 23, 2003
(#11) -
Tangotiger
Thanks again guys!
FJM: hmmm... good idea.
Colin: almost always, when I do these types of studies, I take the "lesser of the two PAs". The primary reason I didn't do it in this case was that it was a little more work than I wanted to do. And, I would have to do that fudging you mentioned to bring the PAs back to their norm. However, all the other actuals for the catchers would not necessarily match. Because of the sheer number of pitchers, I feel pretty safe that any oddball effect would be drowned out mostly. However, I would certainly have no problems doing it as you suggest.
Evaluating Catchers (October 22, 2003)
Posted 7:51 a.m.,
October 24, 2003
(#14) -
Tangotiger
That's a good point... no I did not.
Evaluating Catchers (October 22, 2003)
Posted 1:47 p.m.,
October 27, 2003
(#16) -
tangotiger
In the example in the article, I would take Foote's rate, and apply it to the 2176 PAs with Rogers. I do this with all catchers that Rogers had. This tells me how good the "non-Carter" catchers were. And I repeat the step, etc, etc....
***
What Colin's trying to do is scale back up the # of PAs, so that the actual # of PAs is preserved.
Evaluating Catchers (October 22, 2003)
Posted 5:07 p.m.,
October 27, 2003
(#19) -
tangotiger
In response to Colin, and looking only at Carter's PB and WP, and Carter's top 20 pitchers:
The total # of Carter PAs was 47,476. However, if I weight by the lesser of the two PAs (with and without Carter), the total is 40,724.
If I prorate the WP and PB down to the lesser of the 2 PAs, I get:
Carter is 52 WP and 36 PB better than his baseline catchers.
Increasing all the totals by 17%, so that we put the PAs back in line to 47,476, and Carter is 61 and 42 better. In the original process, he was 79 and 51 better.
Only 2 pitchers were affected by this process: Steve Rogers and Charlie Lea. As it turns out, 2 of the 4 pitchers that benefited the most from Carter were.... Rogers and Lea.
The effect is 27 extra WP and PB over 47,000 PAs, or about 1 run per season.
So, what we have is that:
- the majority of the pitchers did throw to other catchers more than they did to Carter
- of all catchers, Carter would have this effect the most
- of the pitchers whose other catchers had their PAs prorated up, they were particularly better with Carter
And after all this, the net effect is 1 run per season (PB and WP only).
I would have to say that even though taking the lesser of the two PAs has some good aspects to it, you have to figure that the impact of not doing so will be pretty small.
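The proration in this post can be retraced with a few lines. A minimal sketch using only the totals quoted above; the variable names are illustrative.

```python
total_pa = 47_476    # all Carter PAs with his top 20 pitchers
lesser_pa = 40_724   # PAs when weighting by the lesser of with/without Carter

scale = total_pa / lesser_pa           # about 1.17, the "17%" increase
wp_lesser, pb_lesser = 52, 36          # WP and PB saved at the lesser-PA weighting
wp_scaled = round(wp_lesser * scale)   # scaled back up to 47,476 PAs
pb_scaled = round(pb_lesser * scale)

wp_original, pb_original = 79, 51      # the original process
gap = (wp_original - wp_scaled) + (pb_original - pb_scaled)
print(wp_scaled, pb_scaled, gap)
```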
Cities with best players (October 23, 2003)
Posted 12:12 p.m.,
October 23, 2003
(#3) -
tangotiger
Uhhhh... only look at their prime years.... Not sure if Gretz post 88 would still be "top 10" level.... might be though.
Cities with best players (October 23, 2003)
Posted 1:42 p.m.,
October 23, 2003
(#7) -
tangotiger
NY: I don't follow basketball, but where do Patrick Ewing and Willis Reed fall in the "greatness" list? As for Phil Esposito: I don't think so. His best years were in Boston, and even then, I wouldn't put him in the all-time category. NY can choose from Bossy and Trottier too.
A "top 10" for hockey would include:
For sure: Gretzky (Edm), Howe (Det), Orr (Bos), Lemieux (Pit) (in whatever order you want)
The next rung would have any 6 of the (at least) following: Rocket (Mon), Beliveau (Mon), Dionne (LA), Lafleur (Mon), Bossy (NY), Trottier (NY), Messier (Edm), Bourque (Bos), Robinson (Mon), Harvey (Mon), Roy (Mon/Denv), Hasek (Buf), Plante (Mon). Yzerman (Det), Sakic (Denv), Jagr (Pit), and Forsberg (Denv) might be part of that list too.
I think Roy is probably the only legitimate 2-city star. You can argue whether 1986-1995, or 1996-2003 were his best years.
Cities with best players (October 23, 2003)
Posted 10:49 a.m.,
October 24, 2003
(#23) -
tangotiger
The point about "where they played" was to let Boston fans know that they've been lucky to have seen such great players at their peaks (Russell, Williams, Orr).... I mean, some people would argue that each of those players was the best ever at their sport.
That hockey list looks fine except for Phil Esposito. Are you a closet Espo fan or something? I'm the biggest Roy fan, and I think Hasek is better. In terms of "career", I think Marcel Dionne probably had it better than Espo.
Cities with best players (October 23, 2003)
Posted 12:27 p.m.,
October 24, 2003
(#25) -
tangotiger
I'm an open anti-Espo guy, so that may have something to do with it too.
My favorite Phil Esposito moment was when Boston retired his number (#7). At the time, Ray Bourque also had #7 (for reasons I don't know, since his first year he had #29). I'm sure there was talk in the press about how can you retire a number if someone else is also wearing it, etc, etc.
Anyway, at center ice, as Bourque skates to Espo to congratulate him at the retirement ceremony, Bourque takes off his #7 jersey to reveal another jersey underneath (#77). Bourque turns around to show Espo Bourque's new number.
Espo, for one of the few times in his life, was completely speechless, and mouth gaping. The crowd cheers like crazy. Espo composes himself enough to speak over the crowd to say "I will never forget what this man just did for me."
****
My other favorite Bourque moment was when he was in Colorado (40 years old), and playing in yet another double-OT game. While everyone is in the dressing room between periods, he starts doing push-ups, and announces "Anybody here feel tired?".
****
And my last hockey story was Pavol Demitra. He had a clause in his contract to increase his salary by 500,000$ if he reached a certain milestone (I think 40 goals). In the last game with under a minute to go, he's on the ice, while the other team pulls their goalie. Pavol gets the puck, and as he's skating towards the net decides to pass it to his teammate (I think his teammate had 2 goals).
The Blues management actually wanted Pavol to score the goal, because that contract bonus was being covered by insurance. They paid the premium on it, and so, would have been happy for Pavol to collect on it.
Best time to bring in your best reliever (October 23, 2003)
Posted 10:00 p.m.,
October 26, 2003
(#2) -
tangotiger
I agree... that's the toughest part.... when to warm him up, and not waste the warm ups. If he could stop the game, and let his reliever warm up for 5 minutes, that'd be great.
Since that's not the case, he either has to bring in his reliever a little early, or a little late.
Best time to bring in your best reliever (October 23, 2003)
Posted 6:49 a.m.,
October 28, 2003
(#4) -
Tangotiger
The really high leverage situations are when you've got men on base already. Managers really should be bringing them in with men on base in the 8th. Playoff managers know this, as well as 70s/80s managers. After McKeon and Alou, maybe Whitey and Davey Johnson need to be brought back in?
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 1:49 p.m.,
October 27, 2003
(#2) -
tangotiger
I emailed all the readers who participated, telling them that I will send their individual scores, if they so ask. I had about 20 emails bounce, and perhaps you were one.
The only reason I did not want to post everyone's score is that I didn't want to embarrass anyone.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 2:25 p.m.,
October 27, 2003
(#4) -
tangotiger
Ahh, I love the manliness of that response. Stevens, your score was .840. The average reader was .788.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 2:59 p.m.,
October 27, 2003
(#6) -
tangotiger
David: .748, which makes you #63.
(.788 was # 89).
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 3:12 p.m.,
October 27, 2003
(#8) -
tangotiger
No, I did not.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 4:33 p.m.,
October 27, 2003
(#15) -
tangotiger
(homepage)
The original article is the above link. Here's the paragraph in question, for the monkey:
===============
The baseline forecast is very simple: take a player's last 3 years OPS or ERA. If he was born 1973 or earlier, worsen his OPS by 5% or his ERA by 10%. If he was born 1976 or later, improve his OPS by 5% or his ERA by 10%. The 1974-75 players will keep their 2000-2002 averages.
==============
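The baseline rule quoted above is simple enough to write down directly. A minimal sketch, assuming "last 3 years" means an unweighted mean of the three season figures and that "worsen by 5%" is multiplicative (OPS x 0.95, ERA x 1.10); the function name and signature are illustrative.

```python
from statistics import mean

def baseline_forecast(last_three, birth_year, stat):
    """Unweighted 3-year average with a flat age adjustment.

    stat is "OPS" (higher is better) or "ERA" (lower is better).
    """
    base = mean(last_three)
    pct = 0.05 if stat == "OPS" else 0.10
    if birth_year <= 1973:
        worsen = True
    elif birth_year >= 1976:
        worsen = False
    else:
        return base                  # born 1974-75: keep the 2000-2002 average
    # Worsening lowers OPS but raises ERA, and the reverse for improving
    if stat == "OPS":
        return base * (1 - pct) if worsen else base * (1 + pct)
    return base * (1 + pct) if worsen else base * (1 - pct)
```

For example, `baseline_forecast([4.00, 4.00, 4.00], 1978, "ERA")` (a made-up pitcher) comes out at about 3.60.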
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 4:36 p.m.,
October 27, 2003
(#16) -
tangotiger
Tim: .900, #149
Michael: .695, #22
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 5:09 p.m.,
October 27, 2003
(#18) -
tangotiger
Sky: .761, #72
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 7:12 p.m.,
October 27, 2003
(#30) -
Tangotiger
Hmmm.... ok, I guess I'll embarrass everyone, except the bottom 10. I'll post the individual results tomorrow morning. I'll also break it by hitters/pitchers.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 8:59 a.m.,
October 28, 2003
(#39) -
tangotiger
(homepage)
The above link shows all the reader picks. Since Ira was the person who has asked to be embarrassed and was the most embarrassed, I reverted to initials for all names after Ira. All other names are exactly as was written back in April.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 10:05 a.m.,
October 28, 2003
(#43) -
tangotiger
File has been updated to include breakdown by hitters and pitchers, overall. Note that we have 20 hitters and only 8 pitchers.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 10:46 a.m.,
October 28, 2003
(#48) -
tangotiger
Sylvain, that is a good point.
Now that the Primer readers are well aware that their intuition on players with up-and-down recent careers is worse than the baseline, what will they do?
Sabermetricians have long said that sample size, sample size, and sample size is critical. What we have here is that just taking the last three years unweighted, and making virtually no adjustment (except a slight one for age) is almost exactly what a professional forecasting engine gives you (and they would certainly weight the seasons a little differently, and make more adjustments for age, park, and make even more adjustments for types of players, be it speed, power, patient, etc, etc).
In terms of interpreting numbers, back-of-the-card calculations or intuition just won't cut it (though again, as a group, all the biases of the readers cancel out very nicely).
What edge does that leave anyone? Personal scouting is the only thing left, I think.
The project I am mulling for next year is to have about 300 players on the ballot, and the readers can ONLY choose those players that they've seen with their eyes at least 10 games, or they are a member of the team they follow all the time. I'd only count those players where there are at least n number of readers making their forecasts. I'm not sure what n would be, but I'm hoping to make it 20, but realistically, I'd say 5.
I'm not sure if I'll be able to get the professional forecasters to give up their OPS/ERA forecasts for this many players, but, as we've seen, the baseline forecast does a pretty good job anyway.
We'll see how well the scouting eye holds up.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 10:54 a.m.,
October 28, 2003
(#49) -
tangotiger
Nick (#3): the scores (hitter/pitcher) for
.56/.97 - baseline
.62/.87 - mean of forecasters
.66/.79 - readers group
Among the individual systematic forecasters, Palmer and Silver were 1-2 for hitters, but Shandler and Szymborski were 1-2 for pitchers. Palmer was last for pitchers, and Shandler last for hitters. So, they were all over the place. I doubt that the systematic forecasters have similar engines, though, overall, they get similar results.
I think pitching is the one place where "personal scouting" might come into play, and, though we have only 8 extreme pitchers, the readers did great there.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 12:40 p.m.,
October 28, 2003
(#52) -
tangotiger
All the forecasters sent me their picks, but Palmer and Warren do not have theirs published.
MGL: I can do the 5/4/3 weighting (though I would make it 5/4/3/2, where "2" is the league mean). Park factors I don't have, and I think it would be too much for me to do at this point.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 12:55 p.m.,
October 28, 2003
(#53) -
tangotiger
(homepage)
Hmmm... very interesting.
Somewhere in the above link (Banner Years article), I proposed a very simple scheme, which I call the 5/4/3/2 scheme (for hitters). You take 5 parts 2002, 4 parts 2001, 3 parts 2000, 2 parts LeagueMean. That last component is the "regression towards the mean". This is for hitters.
Though I've never published it for pitchers, I usually toy around with 3/2/1/2. That is, I give a little more weight to 2002, and less weight to 2000, and I regress a lot more.
And the results? This monkey, which we'll call Marcel, after the Friends' monkey, does extremely well. How well? Better than all the other forecasters.
Marcel the monkey has a .653 score overall, including a .59 score for hitters (#3 among the systematic forecasters) and a .82 score for pitchers (also a #3 among the systematic forecasters). This combination was enough to propel Marcel ahead of the 6 systematic forecasters.
I'm actually a little surprised by the results. I would have expected the 6 systematic forecasters to have applied the Marcel scheme, and enhanced it. It doesn't appear to be the case.
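The 5/4/3/2 and 3/2/1/2 schemes described in this post are plain weighted averages. A minimal sketch, assuming each season counts by its weight alone (the post does not mention any playing-time weighting); the function and constant names are illustrative.

```python
HITTER_WEIGHTS = (5, 4, 3, 2)    # 2002, 2001, 2000, league mean
PITCHER_WEIGHTS = (3, 2, 1, 2)   # same order, regressed more heavily

def weighted_forecast(y2002, y2001, y2000, league_mean, weights=HITTER_WEIGHTS):
    """Weighted average of three seasons plus a league-mean term (the regression)."""
    values = (y2002, y2001, y2000, league_mean)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def regression_share(weights):
    """Fraction of the forecast that comes from the league mean."""
    return weights[-1] / sum(weights)

print(regression_share(HITTER_WEIGHTS), regression_share(PITCHER_WEIGHTS))
```

The league-mean term carries 2/14 of the hitter forecast and 2/8 of the pitcher forecast, which is the "regress a lot more" part.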
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 5:35 p.m.,
October 28, 2003
(#58) -
Tangotiger
Palmer=Pete Palmer.
Warren=Ken Warren
STATS=no one bothered to ever return my emails.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 8:38 p.m.,
October 28, 2003
(#63) -
Tangotiger
If you want that information I'd ask people to put y/n on a question have you seen this player. That way you can study both and don't lose data.
PERFECT idea.
***
As for MGL (sorry for blowing your cover Mickey), he did participate in 2 prior Voros studies, and he did not do as well as Voros, who has been retired as champion. You can find the results to those studies on Voros' site.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 11:10 p.m.,
November 1, 2003
(#77) -
Tangotiger
Just to add an additional perspective to PECOTA-type projections.
1 - PECOTA attempts to establish PERFORMANCE probability ranges. There are 3 very important things that affect these figures
a) the true talent level of a player. We can't know this, and therefore, this itself comes with its own probability distribution. Say that Bonds' true talent level is .500 OBA, with 1 SD = .050.
b) the probability distribution of performance over 600 PA, given the true talent. So, now we have a probability distribution of a probability distribution. So, over 625 PA, you might have 1 SD for Bonds' OBA of .500 at .020. Then, you have another one for Bonds' OBA of .480, etc, etc. a) and b) combined will widen the distribution
c) the number of PAs. And in here, there are 2 subsets: is his PA limited because of some random occurrence, or is it limited because of poor short-term performance, and his opps have been limited because of selective sampling? So, again, you have to have different distributions at different PA levels. If Bonds had only 150 PAs, changes are, he got hurt. If Jeff Weaver get 15 starts, changes are, he's selectively sampled. But again, you need a probability distribution of why he has those PAs.
So, even though PECOTA gives those distributions, there's really another dimension missing, playing time.
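Points a) and b) can be put into numbers. A minimal sketch, assuming each PA is an independent trial (so the binomial SD applies) and that the talent uncertainty and the random variation are independent, so their variances add; evaluating the random part at the mean talent only is a simplification.

```python
from math import sqrt

def binomial_sd(p, n):
    """SD of an observed rate over n trials when the true rate is p."""
    return sqrt(p * (1 - p) / n)

talent_mean, talent_sd = 0.500, 0.050   # the example figures from a)
pa = 625                                # the PA count used in b)

luck_sd = binomial_sd(talent_mean, pa)          # random variation around true talent
combined_sd = sqrt(talent_sd**2 + luck_sd**2)   # spread of observed OBA from both sources
print(luck_sd, combined_sd)
```

The combined figure is wider than either piece alone, which is the widening the post refers to.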
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 11:11 p.m.,
November 1, 2003
(#78) -
Tangotiger
"changes" = "chances"
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 10:09 a.m.,
November 3, 2003
(#79) -
tangotiger
The average standard deviation of the forecast of each player, among the readers, was .46 overall (.42 for pitchers and .47 for hitters).
The actual year-to-year standard deviation is over 1.0.
So, for the people thinking that the readers would give a representative distribution of the players' probability performance ranges, this is not the case.
Results of the Forecast Experiment, Part 2 (October 27, 2003)
Posted 11:35 p.m.,
November 9, 2003
(#83) -
Tangotiger
Should have realized since Andrea Bocelli is not a girl, and I'm Italian.
For Aging Runners, a Formula Makes Time Stand Still (October 29, 2003)
Posted 4:01 p.m.,
October 29, 2003
(#3) -
tangotiger
(homepage)
Here's the chart.
Gleeman - Jeter - Clutch (October 30, 2003)
Posted 1:22 p.m.,
October 30, 2003
(#2) -
tangotiger
I wrote this elsewhere, but it applies to clutch hitting, pitcher's skill at preventing hits on balls in play, catcher's skill at blocking the plate, etc, etc, etc:
The sample size required to show evidence of clutch hitting is so large that we don't have the chance to prove clutch hitting. And, the likelihood is that even if some players are proven to be clutch hitters, then
1 - by the time you figure this out, he's 38 years old
2 - the impact will not be large enough that you would even care
The numbers at our disposal make it unlikely that we can ascertain who is a clutch hitter. If we want to figure out who is a clutch hitter, we have to look elsewhere.
Gleeman - Jeter - Clutch (October 30, 2003)
Posted 3:03 p.m.,
October 30, 2003
(#4) -
tangotiger
Comparing to teammates: now you're really mincing it down, aren't you? I mean, I agree that this should be done, but if Jeter is
.245/.345/.329 hitter with men on base
what do you expect to find with his teammates, something like
.230/.310/.300?
You'd have to see Pedro-like numbers from his teammates to show that Jeter stood head and shoulders above them.
And if you take out Jeter's injured games, you have to do it to all injured players.
Gleeman - Jeter - Clutch (October 30, 2003)
Posted 1:47 p.m.,
October 31, 2003
(#8) -
tangotiger
Comparing Jeter to his teammates is not enough. What you want is:
deltaJeter: Jeter post v Jeter regular
deltaYanks: rest of Yanks post v rest of Yanks regular
Then, we can talk.
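The comparison of the two deltas can be sketched as a difference of differences. This is a minimal sketch; the function name is illustrative and the OPS values in the example are hypothetical placeholders, not real data for Jeter or the Yankees.

```python
def clutch_delta(player_post, player_reg, team_post, team_reg):
    """Player's postseason change minus the rest of the team's change (same stat)."""
    delta_player = player_post - player_reg   # deltaJeter
    delta_team = team_post - team_reg         # deltaYanks
    return delta_player - delta_team

# Hypothetical OPS values, for illustration only
example = clutch_delta(player_post=0.850, player_reg=0.830,
                       team_post=0.740, team_reg=0.790)
print(round(example, 3))
```

A positive result only says the player's drop-off was smaller than his teammates'; whether it is larger than chance would still need the PA counts.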
And, if you want to take out Jeter's injured series, go ahead.
Btw, didn't Giambi hit 4 HR this post-season? And wasn't his OPS near the top of the team as well?
Gleeman - Jeter - Clutch (October 30, 2003)
Posted 11:18 p.m.,
November 1, 2003
(#17) -
Tangotiger
# of PAs for Jeter would have been better, since your splits sometimes have multiple lines per year, and sometimes not.
Summarizing:
96: plus
97: plus
98: minus
99: plus
00: plus
01: minus
02: plus
03: plus
Pretty good overall.
Value of keeping pitch count low (October 30, 2003)
Posted 7:11 p.m.,
October 30, 2003
(#4) -
Tangotiger
I agree, I'd jump straight to IP/GS.
Value of keeping pitch count low (October 30, 2003)
Posted 11:42 a.m.,
October 31, 2003
(#5) -
tangotiger
I sent this to one of the Primer readers, and perhaps it will be interesting to readers here as well:
=============
Taking all pitchers from 1993 to 2002, and using my own estimate for BFP, I get the following for pitchers with at least 300 PA in a season:
"power": n=412, PA/IP=4.37 (using 1.3+ as the boundary for bb+so / IP)
"crafty": n=482, PA/IP=4.37 (using .9- as the boundary for bb+so / IP)
crafty:
4.42 ERA
1.09 HR
2.51 BB
4.77 SO
9.82 H
power:
4.00 ERA
0.94 HR
4.21 BB
8.86 SO
8.11 H
... wanna see something cool? here are the BB+H for both
crafty: 12.33
power: 12.32
So.... they allow the same number of runners per IP, and get the same number of batters per IP.
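The BB+H totals and the grouping stat can be checked from the two lines above. A minimal sketch, assuming the HR/BB/SO/H figures are per 9 innings (the post does not say so explicitly); the dictionary layout is illustrative.

```python
# Lines copied from the post, assumed to be per 9 IP
groups = {
    "crafty": {"BB": 2.51, "SO": 4.77, "H": 9.82},
    "power":  {"BB": 4.21, "SO": 8.86, "H": 8.11},
}

summary = {}
for name, g in groups.items():
    runners = round(g["BB"] + g["H"], 2)              # baserunners per 9 IP
    bb_so_per_ip = round((g["BB"] + g["SO"]) / 9, 2)  # the stat used to define the groups
    summary[name] = (runners, bb_so_per_ip)
    print(name, runners, bb_so_per_ip)
```

Both groups sit on the expected side of their boundary (.9 or lower for crafty, 1.3 or higher for power).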
Value of keeping pitch count low (October 30, 2003)
Posted 11:25 p.m.,
November 1, 2003
(#8) -
Tangotiger
Crafty: .291, Power: .283
When you come up with a definition of "power" and "crafty", they are just some loose terms to try to convey something. It's obvious that some players will appear when they shouldn't and vice-versa.
However you want to characterize the pitchers by some street term, you are still left with one group with 4.77 K and 2.51 BB, and another group with 8.86/4.21.
This whole thing started with pitch counts. And, the pitchers with lots of BB and K will throw more pitches than the low BB and K, simply because they'll go deeper in the count.
So, if you want to call them "high maintenance" or "game workers" or whatever, that's fine. But, the point is to try to separate guys with lots of pitches and few pitches per game, without using actual pitch count totals.
Value of keeping pitch count low (October 30, 2003)
Posted 9:37 p.m.,
November 2, 2003
(#10) -
Tangotiger
The 1.08 is simply 4.50 / (4.50 + 1.08) = .80 (more or less). Feel free to debate the merits of this.
Gleeman - Vlad (November 4, 2003)
Posted 7:56 a.m.,
November 5, 2003
(#2) -
Tangotiger
I'll defer to the A's braintrust when they cited their market research saying Giambi was worth 119 million$ for 7 years.
I've done very very little in terms of establishing a player's worth to a team long-term. And if Vlad is more like a +9 replacement guy, then he shoots up to 18 mill per year.
The market will always overpay, in any case, because we don't have the shorts to keep them honest. If 29 teams think he's worth 12 million, and 1 team thinks he's worth 15, how much is he really worth? Unless you can account for the extra 3 for that last team through something unique to that team, he's probably worth 12.1 million.
Gleeman - Vlad (November 4, 2003)
Posted 12:27 p.m.,
November 5, 2003
(#4) -
tangotiger
I would think so, and had thought so.
Voros' look at this issue showed little to no relationship to the market size.
Frankly, I'm skeptical about this. The "rule of thumb" that I use was first put forth by Pete Palmer in the Hidden Game of Baseball. Each marginal win accounts for 2% increase in attendance. Assuming that you can extrapolate that 2% to ALL revenues, then a 110 million revenue generating team will have 1 marginal win = 2.2 million $ (similar to the 2.6 that Voros found). And so, you'd probably give 1.5 of that to the player.
If we DO make the relationship to revenues, then the Yanks, who generate at least double the revenue of the average team would double the marginal win to over 3 million $ / win.
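The rule of thumb works out as below. A minimal sketch, assuming the 2% per win applies to all revenue (as the post itself assumes) and reading the "1.5 of that to the player" as a fixed share of each marginal win; that second reading is an interpretation, and the names are illustrative.

```python
def marginal_win_value(team_revenue, pct_per_win=0.02):
    """Revenue added by one marginal win, if all revenue grows by pct_per_win."""
    return team_revenue * pct_per_win

AVG_REVENUE = 110e6        # revenue figure used in the post
PLAYER_SHARE = 1.5 / 2.2   # assumption: the player gets 1.5 of every 2.2 million

avg_win = marginal_win_value(AVG_REVENUE)              # average team
big_market_win = marginal_win_value(2 * AVG_REVENUE)   # team with double the revenue
big_market_player = big_market_win * PLAYER_SHARE      # player's share at that team
print(avg_win, big_market_win, big_market_player)
```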
As well, increasing the chances of your team getting into the playoffs also adds even more to the marginal $ / win.
As you can see, the margin of error here is pretty large. There are *a lot* of variables in-between the performance of the player and the eventual revenue generated. The more variables, especially those unreliable variables, the more shaky ground you are on.
Gleeman - Vlad (November 4, 2003)
Posted 12:29 p.m.,
November 5, 2003
(#5) -
tangotiger
Also note that revenue sharing decreases the marginal $ / win. Sometimes, I think that economists and not baseball writers should be the ones covering labor negotiations.
"Wow! Look at all that money! There's millions there! Can't they agree on something?"
Gleeman - Vlad (November 4, 2003)
Posted 4:24 p.m.,
November 5, 2003
(#9) -
tangotiger
(homepage)
You guys are great!
The above link shows the complete study in question. The interesting aspect there is the split between the teams that are in contention against those teams that are not in contention.
Essentially, a team not in contention has a marginal $ / win of under 0.5 million $ / win, while a team in contention is all over the place, depending on which team it is.
Pretty interesting stuff, though with only a few years of data, I'm not sure of its reliability (and the same applies to the Voros study).
Fun with Win Shares (November 5, 2003)
Posted 10:20 a.m.,
November 7, 2003
(#4) -
tangotiger
(homepage)
I urge readers to go to the above link, as studes has some excellent analysis on zero-ing out those negative win shares.
Fun with Win Shares (November 5, 2003)
Posted 12:40 p.m.,
November 7, 2003
(#5) -
tangotiger
(homepage)
Batters Box has a thread on Win Shares, and I made a comment at the above link.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 5:02 p.m.,
November 5, 2003
(#3) -
tangotiger
there is evidence that pitchers do NOT have good and bad days
I agree with Ross, and disagree with MGL.
What we should say instead is:
"the sample data is not large enough for us to show WHO is having a good day or bad day"
Obviously, everyone has good/bad days, obviously clutch exists, obviously pitchers control the outcomes on BIP, obviously chemistry affects a team.... but for us to catch this, to know this, to say that "yes, this guy has it", etc, etc... we can't. There's not enough data to make it statistically significant.
So, while I would ignore what we think is a good day or bad day, I do this not because it doesn't exist, but because I can't figure it out using the data.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 5:04 p.m.,
November 5, 2003
(#4) -
tangotiger
And no one can really figure it out using the data. You have to go beyond the data.
The problem is that people will always look at the result, instead of ignoring the results completely, and concentrating only on the mechanics, the delivery, the movement, the demeanour, etc.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 3:26 p.m.,
November 6, 2003
(#10) -
tangotiger
(homepage)
You've all heard how Pedro doesn't have the stamina at the over 105 pitch count limit right?
Please click on the above link.
And to all you analysts who based your opinions on 100 PA: you should know better.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 8:16 p.m.,
November 6, 2003
(#13) -
Tangotiger
Walt, Pedro walked .12 walks per batter in the post-105 2001-2003, and .06 walks per batter pre-2001. 2 SD over 100 PA = .05. So, while it may look like a lot, I'm not that impressed. As noted, his $H was .400. You've gotta figure that since even bad pitchers don't have a .400, that Pedro was probably extremely unlucky on BIP. His ERA was a direct consequence of the $H.
And, even if Pedro is actually that line I showed in 2001-2003, he's still a league average pitcher, and certainly no Jeff Weaver. Even if he was tired, he went from being superhuman to mortal. And that's the worst-case.
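The "2 SD over 100 PA = .05" figure is consistent with a binomial calculation at the .06 walk rate. A minimal sketch, assuming each PA is an independent trial; the post does not say which formula was used, and the names are illustrative.

```python
from math import sqrt

def two_sd(rate, n):
    """Two binomial standard deviations of an observed rate over n trials."""
    return 2 * sqrt(rate * (1 - rate) / n)

baseline_bb = 0.06    # walks per batter, pre-2001
observed_bb = 0.12    # walks per batter after pitch 105, 2001-2003
band = two_sd(baseline_bb, 100)
print(round(band, 3), round(observed_bb - baseline_bb, 2))
```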
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 11:28 p.m.,
November 6, 2003
(#15) -
Tangotiger
Why was he tired in 2001-2003 and not earlier? Injury? Here, compile a list of injured pitchers that you think might be susceptible to hitting a wall more than others.... without looking up the stats. Just give me 10 pitchers. Then, let's look to see what their $H was in 2001-2003. My guess is that they will be just slightly worse than their overall average. (i.e., if their overall average was .300, then their post-105 pitch $H would be .310). No cheating now.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 9:38 a.m.,
November 7, 2003
(#18) -
tangotiger
What evidence do you have to suggest that $H is consistent from pitch 1 to pitch 115?
Realizing that not the same pitchers make up each pool, and realizing that even great pitchers like RJ etc don't have a $H that is that distinguishable from the average Joe MLB pitcher....
Pitch $H
01-15 0.290
16-30 0.285
31-45 0.284
46-60 0.281
61-75 0.287
76-90 0.279
91-105 0.281
106-20 0.288
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 9:39 a.m.,
November 7, 2003
(#19) -
tangotiger
That's 2003 AL.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 9:51 a.m.,
November 7, 2003
(#20) -
tangotiger
I took all pitchers from 1994 to 2002, and selected only those pitchers who had, at most, 100 BFP in that season. You have to figure that such pitchers probably produced pretty bad. Here are their overall totals:
ERA: 6.67 (4.51)
H/9IP: 10.9 (9.3)
HR/9IP: 1.5 (1.1)
BB/9IP: 5.1 (3.5)
K/9IP: 6.3 (6.5)
League average totals in ().
You have to admit, that's a pretty sucky pitcher, and probably representative of what you think about with a replacement-type pitcher. His ERA+ is a paltry 68. And what do you think his $H was? 0.312. The league average is .285.
If you take the league BB and divide by this group's BB, you will get BB+. That figure is 69. The HR+ is 71. The H+ is 91.
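The "plus" figures are league rate divided by group rate, times 100. A minimal sketch using the rounded lines quoted above; the function name is illustrative.

```python
def plus_stat(league, group):
    """Index where 100 is league average; below 100 is worse when lower raw values are better."""
    return round(100 * league / group)

era_plus = plus_stat(4.51, 6.67)   # ERA: league 4.51, group 6.67
bb_plus = plus_stat(3.5, 5.1)      # BB/9IP: league 3.5, group 5.1
print(era_plus, bb_plus)
```

The ERA+ and BB+ come out at the quoted 68 and 69. The rounded HR and H lines do not reproduce the quoted HR+ and H+ exactly, so those were presumably computed from unrounded or differently scaled totals.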
As we've found out with the Allen/Hsu model, 1 SD in $H is around .010. We expect 95% of pitchers to be within .020 of the league average. The difference between the paltry group and the league average $H is .027 (or 3 SD).
Do you think it's reasonable to think that Pedro's $H would be .100 different from his mean when he's tired (TEN SD)? His BB+ was 50 (compared to his mean). His ERA+ (relative to his mean) was comparable to the paltry group. I think that, at most reasonable, Pedro's $H would have increased by about .030 because he was tired. .100? No way.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 11:00 a.m.,
November 7, 2003
(#24) -
tangotiger
I just want to point out that selecting pitchers based on $H, and then giving those numbers is extremely biased.
What I did, selecting pitchers based on number of PAs is also biased, but not as much. And, that bias would give a sort of upper boundary. By selecting pitchers with at most 100 PAs, we are getting:
a) pitchers who suck
b) pitchers who got hurt
c) pitchers who were unlucky, and weren't given the chance
d) pitchers who got called up, and might have a good chance next year
I think that having a pitcher with that ERA/BB/HR line was pretty much a "worst-case" scenario. That is, that's as bad as a pitcher can reasonably be expected to pitch at the MLB level. And his $H was only .030 points worse than the league average.
In conjunction with our best-guess that the true talent level of a pitcher's $H has 1 SD = .010, then I would say our best-guess of a pitcher getting tired will have his $H be worse than his normal level by .030.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 2:06 p.m.,
November 7, 2003
(#27) -
tangotiger
In post 19, I said those numbers are from 2003 AL. In actual fact, if I show you the AVG, OBA, and SLG, you will be quite surprised. Here it is for AL, 2003
AVG OBP SLG
01-15 0.267 0.331 0.420
16-30 0.256 0.331 0.403
31-45 0.267 0.329 0.426
46-60 0.275 0.340 0.452
61-75 0.273 0.327 0.427
76-90 0.274 0.333 0.436
91-105 0.266 0.329 0.422
106-20 0.265 0.333 0.408
Realizing that the very good pitchers would dominate the 106-20 category, what you have are very good pitchers that end up pitching like a league average type pitcher. I would also say that your statement is accurate, that managers probably ARE skillful at removing pitchers.
You have to figure that such pitchers probably produced pretty bad
Not really. You have pitchers who got september callups, pitchers who pitched well and were ...
I already said that in post #24.
Please explain to me what kind of pitcher would put up the following
ERA: 6.67 (4.51)
H/9IP: 10.9 (9.3)
HR/9IP: 1.5 (1.1)
BB/9IP: 5.1 (3.5)
K/9IP: 6.3 (6.5)
$H: .312 (.285)
I'm giving you here a pitcher with an ERA of 6.67, with 5.1 walks, and K/BB ratio of around 1.2, and who gives up a lot of HR. And yet, and yet, his $H is "only" .027 above the league mean.
What does this suggest to you?
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 2:29 p.m.,
November 7, 2003
(#29) -
tangotiger
Here's some interesting stuff. Using ALL pitchers from 1994-2002, here's their performance lines, when:
min .20 BB / PA
H ER HR BB SO PA $H
10.28 9.28 1.46 12.43 7.37 49.71 0.310
(about 5000 PA).
Ok, so I selected guys who just performed really really bad at walks. Their ERA was pretty high, they allowed a ton of HR, struck out at a good rate, and their $H was only .310.
max .05 SO / PA
H ER HR BB SO PA $H
14.19 9.15 1.60 6.41 1.36 47.59 0.329
(about 4000 PA)
Again, truly awful numbers, lots of HR, lots of BB, very few K... and still only $H of .329.
min .08 HR / PA
H ER HR BB SO PA $H
13.94 11.15 4.79 4.94 6.40 45.88 0.308
(about 4000 PA)
Once again, a truly awful performance, with a super-high ERA, lots of walks, a ton of HR, average # of K.... and a $H of .308.
Don't you find it strange that guys who perform rather poorly in either walks, K, or HR will also perform poorly in at least one of the other areas, and yet, have a $H that is bad but not horrible?
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 2:36 p.m.,
November 7, 2003
(#30) -
tangotiger
And all this $H is pretty much expected. In the time period, using my own PA calculations, the average $H was .285. We know, using the Allen/Hsu model, that the "true talent" rate of $H for a pitcher follows a distribution where 1 SD = .010. This means that we'd expect virtually every pitcher to have a "true talent rate" $H between .255 and .315 (the mean plus or minus 3 SD). That essentially gives you the bounds within which the true talent rate of a pitcher will fall, with respect to preventing hits on BIP.
And, selecting pitchers based on a truly horrible BB, HR, or K rate, the $H were between .308 and .329.
I would bet that the floor for a pitcher's $H performance, at the MLB level, is around .330.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 2:42 p.m.,
November 7, 2003
(#31) -
tangotiger
One last one.
Looking for across the board ineptness:
max .10 SO / PA
min .15 BB / PA
min .05 HR / PA
Here's your completely pathetic pitcher:
H ER HR BB SO PA $H
13.3 10.1 2.6 7.8 2.3 48.1 0.302
(about 2500 PA total)
So, you have a pitcher that gives up a ton of HR (2.5 x the league average), a pitcher that gives up a ton of walks (2.5 x the league average), a pitcher that can't strike anyone out (the league is 2.5 x higher), and yet.... and yet... somehow, these truly inept pitchers managed to get a .302 hits per ball in park.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 10:29 p.m.,
November 7, 2003
(#34) -
Tangotiger
Grrrr....
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 10:53 p.m.,
November 7, 2003
(#35) -
Tangotiger
Ok, 15 minutes have elapsed now, so I'm nice and calm.
What f-ck-ng p-sses me off about someone like.... well, guess.... is that I plant a whole g-dd-mn forest, and the best comeback is about s-me f-ck-ng fern. G-dd-mn it. Why bother coming here, other than to p-ss me off, and not illuminate anything. G-dd-mn it.
Uhmmmm... sorry about that.. somehow Cartman from South Park took over.
Let me try to repeat my point, so please try to look at the broader picture.
I looked for the absolutely worst-performance in a season by a pitcher, from 1994 to 2002. I was looking for across the board ineptness:
max .10 SO / PA
min .15 BB / PA
min .05 HR / PA
Any way you slice it, if you come across that kind of performance, that pitcher pretty much sucked.
Here's what 2500 PAs of that actually did, on a per 9IP basis:
H - 13.3
ER - 10.1
HR - 2.6
BB - 7.8
SO - 2.3
$H - 0.302
Now, that is one helluva bad performance. I specifically chose pitchers who gave up tons of HR, gave up tons of BB, and couldn't strike anybody out. 2500 PAs of that. That's one pretty terrible performance. But, somehow, they managed a .302 hits / BIP, where the league was .285.
Don't you find it rather odd that I can find such a horrible set of performances, and when I look at the one variable that I did not select on, I get a $H that is just a bit worse than average? Wouldn't you have expected a $H of somewhere in the .350 to .400 level?
But that's not at all what we found.
I maintain that at the MLB level, the true talent level for $H is probably around .330 at worst-case. And therefore, I find it virtually impossible to believe that Pedro Martinez, even if he was supposedly a tired and worn out pitcher (yet still managed to strike out 26% of his batters), could have a $H of .400 by his ability. I contend that it was almost definitely incredibly bad luck.
26 K, 12 BB and 0 HR, over 21 IP is a rather decent performance, and that's what Pedro did in pitch count 106+ from 2001-2003.
Pedro may actually have been tired, and maybe his performance dropped drastically.... but his performance was still league average at worst.
F-ck!
*****
Apologies all-round. If you read the 2nd paragraph as Cartman, maybe it'll absolve me.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 7:15 p.m.,
November 8, 2003
(#37) -
Tangotiger
No, you do make a good point, as I was thinking about this myself, and was waiting for someone to bring it up.
The one interesting thing about the "tired" Pedro (post 105 pitches, 2001-2003) is that even though his K/BB did become human (26/12), he still managed to not give up a HR in 21 IP. It's very possible that one of the "quirks" in DIPS is the removal of the HR from the BIP.
It's possible that a hard hit ball against Pedro-tired would simply become a hit rather than a HR against other types of pitchers. That is, we may *expect* Pedro-tired to have a $H far higher than a schlub, simply because he gets to keep the ball in the park, even tired.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 9:33 a.m.,
November 9, 2003
(#40) -
Tangotiger
I'm saying that .330 is about as bad as bad should get, that .400 is really quite inconceivable from a true talent perspective (but obviously quite expected from a sample of 63 BIP).
I'm saying that Pedro was supremely unlucky at .400 over 63 BIP (with that being only 2 SD away from his mean), and that it's more likely that a .320 or so is really representative of Pedro's true talent on BIP.
Once you accept that, Pedro-tired is about a lg avg pitcher.
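As a rough check on the "2 SD" figure, here is a minimal sketch that treats each ball in play as an independent binomial trial. The .290 true rate is an assumption used only for illustration (Pedro's real true $H is unknown), and the function name is made up.

```python
import math

def binomial_sd(true_rate: float, n_bip: int) -> float:
    """One SD of observed hits per BIP for a pitcher with the given true rate."""
    return math.sqrt(true_rate * (1.0 - true_rate) / n_bip)

assumed_true_rate = 0.290  # assumption: Pedro's healthy true $H
observed_rate = 0.400      # what he actually allowed post-105 pitches
n_bip = 63                 # balls in play in that sample

sd = binomial_sd(assumed_true_rate, n_bip)
z = (observed_rate - assumed_true_rate) / sd

print(f"1 SD over {n_bip} BIP: {sd:.3f}")
print(f"An observed .400 sits {z:.1f} SD above the assumed true rate")
```

With only 63 BIP, one SD is already close to .060, which is why a .400 can show up without the true rate being anywhere near that.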
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 11:32 p.m.,
November 9, 2003
(#43) -
Tangotiger
Mike, there's great data to interpret in there, and I'll take a better look tomorrow.
However, most importantly, you have serious selective sampling issues. By limiting to the max 4 innings, you can guarantee that you have "unlucky" breaks on the pitcher. Phil Birnbaum did a great article on a similar issue in one of the BTNs.
For those who don't follow, if Roger Clemens gets rocked in the 1st 2 innings, and Torre has no faith in him, he'll take him out. If, at the end of the season, you pick out all of Clemens' starts of max 2 innings, what will you find? High $H, high HR, high BB, low K. It's a given.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 11:33 p.m.,
November 9, 2003
(#44) -
Tangotiger
To continue, if Torre instead had left him in after 2, what would have happened? Hard to believe, but the best guess is that in the next 2 or 3 innings, Roger Clemens would have "reverted" to his true talent rate.
So, by selectively sampling those games where Torre took him out after 2, you are not giving Clemens a chance to show his true stuff.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 10:01 a.m.,
November 10, 2003
(#46) -
tangotiger
While it would be true that he's getting hit harder, and that this is the reason, the degree of that impact is nowhere near as large as the observation/sample would show, simply because of the selective sampling issue. This is why we have the regression towards the mean concept. You can pick out all the students who get a 100 on their last test, and they simply won't all get 100 on their next test. Or you can pick out all the kids with an F, and they won't all get an F. What happens is that the D student follows a distribution where, by luck, he will sometimes get an F, sometimes get a C, and a lot of times get a D. Picking out all the F students on 1 test doesn't tell you much, other than that they failed, and that they got a lot of wrong answers (i.e., got hit hard). And they did that, in large part, by luck. Their true talent had only something to do with failing. What we're after is trying to figure out how much of Pedro's .400 was true talent Pedro-tired, and how much was just bad luck.
Since I don't have a stats degree, I'll let those who make this their vocation discuss this with authority.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 12:02 p.m.,
November 10, 2003
(#48) -
tangotiger
So you can leave him in for 200 pitches and he will still be league average? I think you are missing the point.
Why do you have to say stuff like that?
Cue Cartman
!@##$$%%% - !@!#@$!@%%^ - !@!$@#!#%%#$^$
Exit Cartman
I'm looking at it from a practical human viewpoint. I think we can all accept that Pedro has no problems with the first 60 pitches he throws, and his arm would fall off at 300. Obviously. So, does Pedro hit a wall and drop to a lower plateau at some point, or is it a decline on a pitch-by-pitch basis? I don't know.
But, even if Pedro does hit a wall, and drop to a lower plateau, it certainly wouldn't be at the exact same pitch count each time. Maybe he'll hit a wall at 95 or 110 or 120 or 85 or something each time. The net effect, over a period of time, is that you'll probably notice a decline at some point. (i.e., Pedro say hits a wall at 105, with a distribution centered at 105, with maybe 1 SD = 5 pitches or something... just guessing).
So, what I'm saying is that when Pedro hits that wall, he becomes a league average pitcher.
You can probably create a function that says something like:
Pedro's healthy true talent $H = .290 (or whatever it is)
For every pitch above 90, increase his $H by .002, or something like that. It doesn't necessarily have to be linear. You can make it +.002 for every pitch 90 - 120 (for Pedro), and then increase by +.005 for pitch 121 - 150, and +.010 for 151+ pitches. (Again, numbers only for illustration).
So, at pitch 105, his true talent $H might be .320, and at pitch 125, his true talent $H might be .380 (or whatever it works out to).
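A minimal sketch of that kind of function, using only the illustrative increments given above (none of these numbers are estimated from data, and the function name is hypothetical):

```python
def fatigued_h_rate(pitch_count: int, base_rate: float = 0.290) -> float:
    """Illustrative true-talent $H as a piecewise-linear function of pitch count."""
    rate = base_rate
    # +.002 per pitch for pitches 91 through 120
    rate += 0.002 * max(0, min(pitch_count, 120) - 90)
    # +.005 per pitch for pitches 121 through 150
    rate += 0.005 * max(0, min(pitch_count, 150) - 120)
    # +.010 per pitch for pitch 151 and beyond
    rate += 0.010 * max(0, pitch_count - 150)
    return rate

for pitches in (60, 90, 105, 125, 160):
    print(pitches, round(fatigued_h_rate(pitches), 3))
```

With these increments, pitch 105 comes out at .320 and pitch 125 at .375, in line with the rough figures quoted above.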
Pitchers ability varies widely over the course of a game.
Cue RossCW
What's your evidence for this?
Exit RossCW
No, I doubt it. My guess is that a pitcher's true talent stays extremely static during the course of a game. The variation of his performance is probably just an expected distribution around his talent that day. His day-to-day abilities I would guess would change widely (not feeling well, tight muscle, healthy outlook, feeling psyched, or whatever). In-game? Probably those with low confidence might let themselves be susceptible. I don't think this applies to many MLB pitchers. But, I have as much evidence about this as you do.
There is really no reason to think that it does not vary more widely than the average variation between major league pitchers over the course of a season. There is a point at which a low a-ball pitcher will be more effective than Pedro.
Obviously. But, my best guess is that it's not Pedro at 105. Unless you know many low a-ball pitchers who can strike out 26% of MLB batters, as Pedro post-105 has done from 2001-2003, while maintaining a better than 2:1 K/BB ratio, and not give up any HR.
Maybe Pedro at 160? I dunno.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 2:25 p.m.,
November 10, 2003
(#50) -
tangotiger
I just love the way things are taken out of context, or misinterpreted.
it seems that is where the argument you are making leads. Your seem to be claiming that it isn't possible for Martinez to get so tired that he would pitch at a level less than major league average. Is that what you are saying?
No, and I provided a clarification to that by saying that at some point Pedro might have a true talent of .380, and his arm would drop off at some point. So, why ask if this is what I'm saying when I clearly said that he would further decline until he drops off?
And I misused the term "ability" - I should have said performance.
There's no question that there's great variability in performance. This is true in all walks of life where you have a binary (safe/out) result.
As you point out when Pedro starts to fade is going to vary from game to game. You are including in your averages what happens between 105 and 130 pitches even when he doesn't fade until 130 pitches.
Bingo! Bingo!!! This was to all those people who bring up Pedro's post-105 PAs, when they can't even say whether the true talent level he was showing at game 7 at pitch count 106 was the same, better, or worse than his average true talent level at pitch count 106 from 2001-2003.
My guess is that there are quite a number of low A ball pitchers that would do just fine striking out major league hitters. Unfortunately when they weren't striking them out, they would be either walking them or getting lit up.
I actually said:
Unless you know many low a-ball pitchers who can strike out 26% of MLB batters...while maintaining a better than 2:1 K/BB ratio, and not give up any HR.
So, yes, I'm sure there are some pitchers that can strike out 25% of the batters, but there's no way they'd be able to maintain a 2:1 K/BB ratio. Nuke LaLoosh, off the top of my head, is one of them.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 2:13 p.m.,
November 13, 2003
(#56) -
tangotiger
Just so that MGL doesn't put words in my mouth, these things are *not* going to be debunked. As I've said many many many times, all these things exist. Just because we can't see it, or have enough data to say anything with statistical significance, doesn't mean it doesn't exist.
However, if we can't see it, then, chances are, neither can you.
I agree - but a .400 BABIP is not A ball pitcher, its AAA pitching. Some pitchers put up .400 numbers in the major leagues.
Ugh. They put up .400 almost by random chance. How many times do we have to go through this? Our best guess is that 1 SD = .010 $H among MLB pitchers. If pitchers give up .400, it's far far more likely it's because they have few BIP and/or really bad fielding, rather than the pitcher's probable true $H.
And .400 is A ball pitcher. AAA pitcher is probably .320. Before you ask, I don't have any evidence.
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 2:45 p.m.,
November 13, 2003
(#57) -
tangotiger
$H BIP=50 BIP=500
0.270 0.063 0.020
0.280 0.063 0.020
0.290 0.064 0.020
0.300 0.065 0.020
0.310 0.065 0.021
0.320 0.066 0.021
0.330 0.066 0.021
The first column would be a pitcher's true $H. We don't know, and can't ever know, a pitcher's true $H (or true anything for that matter). Our best guess is that if a league has a .300 $H, then 1 SD = .010. The above table represents 3 SD of pitchers around the mean (i.e., 99%+ of pitchers).
The second column is what 1 SD of performance would be, over 50 BIP, given the true rate of the pitcher. Essentially, 19 times out of 20, we'd expect a pitcher to perform at a $H rate that is within +/- .130 hits / BIP. A true .300 pitcher, with 50 BIP, will have an observed $H of .170 to .430.
(In fact, we should add extra variation to account for park and fielding differences.)
The third column is the same thing with BIP=500. So, 19 times in 20, a pitcher's observed $H will be within +/- .040 of his true rate.
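The second and third columns look like plain binomial standard deviations, sqrt(p * (1 - p) / BIP). Here is a short sketch that regenerates the table under that assumption (the helper name is mine, and park and fielding variation are ignored):

```python
import math

def sd_of_observed_rate(true_rate: float, n_bip: int) -> float:
    """Binomial SD of observed hits per BIP around a known true rate."""
    return math.sqrt(true_rate * (1.0 - true_rate) / n_bip)

true_rates = [0.270, 0.280, 0.290, 0.300, 0.310, 0.320, 0.330]

# Each row: (true $H, SD at 50 BIP, SD at 500 BIP)
table = [(p, sd_of_observed_rate(p, 50), sd_of_observed_rate(p, 500))
         for p in true_rates]

print("$H     BIP=50  BIP=500")
for p, sd50, sd500 in table:
    print(f"{p:.3f}  {sd50:.3f}   {sd500:.3f}")

# The "19 times out of 20" band is roughly plus or minus 2 SD
band_50 = 2 * sd_of_observed_rate(0.300, 50)
band_500 = 2 * sd_of_observed_rate(0.300, 500)
```

For a true .300 pitcher the 2 SD band works out to about .130 over 50 BIP and about .040 over 500 BIP, matching the ranges described above.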
So, based on the above, with at least 500 BIP, we should find maybe 1 pitcher a year at the .400 level, if even. (If of course they're allowed to pitch at that level for so long.)
And what do we find from 1994 to 2002? 7 pitchers with at least 200 BIP had a $H in the .360s. So, we probably have selective sampling issues here: a pitcher with a poor $H doesn't get to keep pitching, even at a merely bad level.
Bring it down to 50 BIP? We have 31 pitchers over 9 years with a $H of over .400. There were 1232 pitchers with between 50 and 150 BIP in that time period, or 2.5%. The average # of BIP of these 31 pitchers was 74, or 1 SD = .053. So, we'd expect 95% to be between .200 and .400. We'd expect 2.5% to be under .200, and .... 2.5% to be over .400.
(There were 41 pitchers with BIP of between 50 and 150 with a $H of under .200, just about exactly what we expected.)
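A sketch of that expectation, assuming every pitcher in the group has a true $H of .300 and the group's average of 74 BIP, and using the 2 SD rule of thumb (95% inside, 2.5% in each tail):

```python
import math

n_pitchers = 1232   # pitchers with 50 to 150 BIP, 1994-2002
avg_bip = 74        # average BIP of the 31 pitchers over .400
true_rate = 0.300   # assumption: everyone shares the same true $H

sd = math.sqrt(true_rate * (1.0 - true_rate) / avg_bip)
low_bound = true_rate - 2 * sd
high_bound = true_rate + 2 * sd

tail_share = 0.025  # rule of thumb: 2.5% beyond 2 SD on each side
expected_per_tail = n_pitchers * tail_share

observed_over_400 = 31
observed_under_200 = 41

print(f"1 SD = {sd:.3f}, 2 SD band = {low_bound:.3f} to {high_bound:.3f}")
print(f"Expected in each tail: {expected_per_tail:.1f}")
print(f"Observed: {observed_under_200} low, {observed_over_400} high")
```

The point is only that the observed tail counts are in the same neighborhood as what luck alone would produce, so no appeal to .400 true talent is needed to explain them.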
ALCS Game 7 - MGL on Pedro and Little (November 5, 2003)
Posted 9:47 a.m.,
November 14, 2003
(#60) -
tangotiger
Ross said:
But you are developing standard deviations for 50 BIP and then applying them to pitchers between 50 -150 BIP. Shouldn't you be using the average, not the minimum, of the group - i.e. 100 BIP - to determine the standard deviation
You must have missed this, because I said:
There were 1232 pitchers with between 50 and 150 BIP in that time period, or 2.5%. The average # of BIP of these 31 pitchers was 74, or 1 SD = .053.
Offensive Performance, Omitted Variables, and the Value of Speed in Baseball (November 6, 2003)
Posted 10:23 a.m.,
November 7, 2003
(#2) -
tangotiger
I mentioned that the impact of the SB is rather randomly distributed. This was figured out using the Leverage Index (LI) on the SB attempt. That is, rather than figuring out the LI for a reliever or starter, I figured it for an event. Things like "defensive indifference" had an LI of 0.01 I think, while the SB attempt was around 1.00 (I don't remember exactly, but it was somewhere between 0.90 and 1.10).
The win break-even point DOES vary wildly by game situations, anywhere from 60% to 80% if memory serves. (This will be covered in a few months).
Offensive Performance, Omitted Variables, and the Value of Speed in Baseball (November 6, 2003)
Posted 12:50 p.m.,
November 7, 2003
(#4) -
tangotiger
Ted,
In the "How Runs Are Really Created" article, I show the chances of scoring from each base, using a Markov model on the 1974-1990 data.
Here's the snip from that chart:
======================================================
Chance of scoring, from each base/out state
0 outs 1 out 2 outs
1B .38 .25 .12
2B .61 .41 .21
3B .86 .68 .29
======================================================
If we focus only on the "1 out" column, we see that a SB adds .16 runs from 1b to 2b, and .27 runs from 2b to 3b. I'll guess that errors might add another .02 runs or so to the SB event itself.
The CS figure can be similarly calculated. You lose the .25 runs from being on 1B (and another .20 runs for making the 2nd out... not listed in this table, but the breakdown is essentially .24/.20/.10 in terms of the cost of the out, when it's 0, 1, or 2 outs). So, the CS is worth about -.45 runs, if it happens at 2B.
If it happens at 3B, the CS is worth -(.41 + .20) = -.61 runs.
So, the breakeven of the SB at 2B (+.18 SB, -.45 CS) is 71%, and the breakeven of the SB at 3B (+.29 SB, -.61 CS) is 68%.
Also note that a CS does *not* always lead to an out, one of the many quirks in baseball recordkeeping.
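The break-even arithmetic above can be sketched as follows. It uses only the "1 out" column of the chart, the guessed .02 error bonus, and the .20 cost of making the second out; the function and variable names are mine.

```python
# Chance of scoring with 1 out, from the chart above
SCORE_FROM_1B = 0.25
SCORE_FROM_2B = 0.41
SCORE_FROM_3B = 0.68

ERROR_BONUS = 0.02       # guessed extra value from errors on the SB play
COST_OF_2ND_OUT = 0.20   # run cost of making the second out

def breakeven(sb_gain: float, cs_cost: float) -> float:
    """Success rate at which expected gains equal expected losses."""
    return cs_cost / (sb_gain + cs_cost)

# Steal of second: runner moves from 1B to 2B, or is lost from 1B
sb_2b = SCORE_FROM_2B - SCORE_FROM_1B + ERROR_BONUS
cs_2b = SCORE_FROM_1B + COST_OF_2ND_OUT

# Steal of third: runner moves from 2B to 3B, or is lost from 2B
sb_3b = SCORE_FROM_3B - SCORE_FROM_2B + ERROR_BONUS
cs_3b = SCORE_FROM_2B + COST_OF_2ND_OUT

breakeven_2b = breakeven(sb_2b, cs_2b)
breakeven_3b = breakeven(sb_3b, cs_3b)

print(f"Steal of 2B: +{sb_2b:.2f} / -{cs_2b:.2f}, breakeven {breakeven_2b:.0%}")
print(f"Steal of 3B: +{sb_3b:.2f} / -{cs_3b:.2f}, breakeven {breakeven_3b:.0%}")
```

The break-even rate p solves p * gain = (1 - p) * cost, which is where cost / (gain + cost) comes from.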
Offensive Performance, Omitted Variables, and the Value of Speed in Baseball (November 6, 2003)
Posted 12:59 p.m.,
November 7, 2003
(#5) -
tangotiger
btw, the old adage about "don't make the 1st or 3rd out at 3b" is dead-on. The breakeven in those cases jumps to 75 to 80%. Only really really really good baserunners should try to steal 3B, or if there's a really really bad catcher behind the plate.
Offensive Performance, Omitted Variables, and the Value of Speed in Baseball (November 6, 2003)
Posted 2:10 p.m.,
November 7, 2003
(#7) -
tangotiger
I should amend my statement. SB are *not* randomly distributed by inning/score/base/out, but they *are* (more or less) distributed in such a way that the "leverage" of when they occur, on average, is 1.0 (a random situation).
What this suggests is that
a) SB are distributed such that it follows a normal distribution
b) SB are distributed such that a lot more SB occur at the center
c) SB are distributed such that a lot more SB occur at the tails
while all at the same time, distributed around the average in all cases.
Since we really care about the leverage of the situation, and not necessarily the inning/score only, I'm happy to leave it at that.
Offensive Performance, Omitted Variables, and the Value of Speed in Baseball (November 6, 2003)
Posted 4:25 p.m.,
November 7, 2003
(#9) -
tangotiger
In a play-by-play world, I would charge the batter with part of the out of the runner.
Offensive Performance, Omitted Variables, and the Value of Speed in Baseball (November 6, 2003)
Posted 4:39 p.m.,
November 7, 2003
(#11) -
tangotiger
I'm quite certain that Rickey stealing 130 and getting thrown out 42 (!!) times was not optimal. He probably could do 100/20, making the extra 30 SB and 22 CS a very bad thing.
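For a rough sense of scale, here is a sketch that prices the marginal attempts with the per-event run values from post #4 (+.18 per SB, -.45 per CS, both for a steal of second with 1 out). The 100/20 line is only the guess above, and applying one situation's run values to every attempt is a simplifying assumption.

```python
SB_RUNS = 0.18    # run value of a successful steal of 2B (from post #4)
CS_RUNS = -0.45   # run value of a caught stealing at 2B (from post #4)

actual_sb, actual_cs = 130, 42   # Rickey's actual line
guess_sb, guess_cs = 100, 20     # the hypothetical "could do" line

extra_sb = actual_sb - guess_sb
extra_cs = actual_cs - guess_cs

marginal_runs = extra_sb * SB_RUNS + extra_cs * CS_RUNS
marginal_success = extra_sb / (extra_sb + extra_cs)

print(f"Extra attempts: {extra_sb} SB, {extra_cs} CS")
print(f"Success rate on the extra attempts: {marginal_success:.0%}")
print(f"Net value of the extra attempts: {marginal_runs:+.1f} runs")
```

Under those assumptions the extra attempts succeed well below the roughly 71% break-even rate and come out several runs in the red.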
Offensive Performance, Omitted Variables, and the Value of Speed in Baseball (November 6, 2003)
Posted 10:21 a.m.,
November 10, 2003
(#15) -
tangotiger
Ted, actually, I don't think I ever read that paper, since I'm involved in similar research, and I didn't want to have any kind of conflict. I'm working on a model that shows what the optimal steal% should be for a given talent level. It's all theoretical, but, off-hand, I'd say it would be quite improbable that Rickey, in that season (1982?) was stealing optimally.
What's a Ball Player Worth? (November 6, 2003)
Posted 10:31 p.m.,
November 6, 2003
(#3) -
Tangotiger
Ok, I just read the article.
1 - This is NOT a novel concept, since the Mills Brothers (PWA) did this over 30 years ago. This has since been done in some form by Albert/Bennett (PGP), Doug Drinen (WPA), Rhoids, Phil Birnbaum and yours truly (among others).
2 - I am unimpressed with their statement about fielding. They should read Primer more often. Their pitchers are probably overrated because of that.
3 - I would NOT use such a system to evaluate the "true talent" level of a player, but I would (and do) use such a system to evaluate the performance of a player. Therefore, this system is valuable for game-level type decisions more than anything.
4 - The 2 million$ / win has been repeated many many times by many many people.
5 - Their replacement level seems to be about -2 to -2.5 wins (based on the Giambi example cited). Good, cause that's what it should be.
All in all, it's great to see this kind of stuff come out.
What's a Ball Player Worth? (November 6, 2003)
Posted 10:51 p.m.,
November 6, 2003
(#5) -
Tangotiger
Just thought of something
6 - Since they don't measure fielding, they should have a positional replacement level. I would guess they don't based on their leaders list.
studes: I doubt there's riches to be made. And I don't think MGL cares, seeing that: a) he buys the STATS pbp and b) has publicly offered to GIVE his UZR to any team that wants it. This reminds me of corporate America that will only take something if they can pay for it. Linux? No! We need to hold someone accountable when something breaks down! That's the American way! Let's pay 10x more for Microsoft (the kings of the blue screen of death).
American Company to Japanese supplier: ... and finally, we want to see 2% defect rates on the product you supply.
Weeks go by, and Japanese supplier sends the product to America.
Japanese supplier: You will find the product you requested enclosed. As per your request, we have complied with the 2% defect rate. To make matters easier for you, we have sent you the defective parts in a separate box, though we don't know why you want a defective part.
What's a Ball Player Worth? (November 6, 2003)
Posted 10:53 a.m.,
November 7, 2003
(#7) -
tangotiger
It is funny that the people cited either need to be an author (Bill James, Pete Palmer) or need to have a PhD. Me? My ears would perk if Tippett and Ruane are cited, because then I know the journalist knew where to go.
As for the "15 hits a month", this section gets about 20,000 hits a month, which is still way way behind the whole Primer site of 1 million a month. My best guess is that Primate Studies is reaching 1 unique user per 1 million internet users.
What's a Ball Player Worth? (November 6, 2003)
Posted 12:55 p.m.,
November 7, 2003
(#9) -
tangotiger
Can you clarify your statement with an example? I'm having a hard time following it.
To the extent that I understand it, on average, the swings will be exactly the same at the game level (over n games). That is, if a team plays 2 million games, and wins 1 million of them, the amount contributed by the offense and defense, given that they were equals to begin with, will be exactly the same.
Note that WPA is great in terms of answering a very very very specific question: "Given that you know the inning/score/base/out state, and given that all other things are equal, how much did the next event change the probability of the team winning? ", and it also tries to distribute that change to the players involved in that event (hitter, pitcher, runners, fielders) in some fashion. How you do that distribution is the fun/hard part, especially when you have to consider the park too.
What's a Ball Player Worth? (November 6, 2003)
Posted 4:24 p.m.,
November 7, 2003
(#11) -
tangotiger
I agree completely. Your context is actually any variable that you are aware of. It makes it very hard, for example, if you have Pedro on the mound. Right away, instead of the Sox having a .500 chance of winning, the combination of the Sox hitting, and Pedro pitching (say to an expected 27 batters) is .680, let's say.
So, before you do anything, you automatically have to give +.18 to the Sox players before the game starts! (I know, hard to believe.) Then, as the game is progressing, each marginal Pedro out is worth less than the opposing pitcher's out, simply because Pedro started from a higher win prob.
You might have some really funky stuff like Pedro being injured after 1, at which point the win prob drops down to say .55, and so you must now dock Pedro -.13 wins for being injured (but that's balanced against say the +.14 wins he gets just for being on the mound).
It's actually an extremely fascinating topic (to me anyway), and I will hope to finally implement this by opening day, 2004. But, we'll see.
What's a Ball Player Worth? (November 6, 2003)
Posted 3:34 p.m.,
November 8, 2003
(#14) -
Tangotiger
FJM, with all due respect, this is not a question about opinion. Once you accept the definition of what "Win Probability Added" is, then you are forced to accept its results.
Going to these examples:
300000000
200000000
The home team pitchers give up 3 runs in the first. That probably knocks say .25 wins out. The home hitters then score 2 runs, and that probably adds .15 wins. To start the top of the 2nd, the home team now has, say, a .40 chance of winning the game. From that point, to the end of the game, the off+def will lose .40 wins. Since the def is not allowing any runs to the end of the game, I'm guessing that they will gain about .05 wins per inning, while the offense, not scoring any runs, will probably lose about .10 wins per inning (I'd have to work it out exactly).
Anyway, adding it up, and we have the defense being worth +.15 wins, and the offense being -.65 wins. The total is -.50 wins. That's really just a wild guess.
000000003
200000000
In this case, the defense holds the opponent down in the first inning. Say that's worth +.03 wins. The offense gets 2 runs, and so let's say that adds +.12 wins (again, numbers just for illustration). All the way through the 8th, the def holds the opponent, and let's say that adds +.08 wins, while the off can't score any more runs, and let's say that's worth -.07 wins. So, going into the 9th, the home team now has a .72 chance of winning. The defense gives up 3 runs, and let's say knock the chance of winning all the way down to .32 (or a -.40 loss). The offense scores nothing, and so they are worth -.32 wins.
Adding it up again, and the defense is .03 + .08*7 -.40 = +.19 wins, and the offense is .12 - .07*7 - .32 = -.69 wins. I wouldn't be surprised, since both these examples gave you pretty close numbers using my wild-ass guesses, that if I did it using my model you might actually end up with exactly the same numbers.
000000003
000000002
Ok, in this one, the defense gains +.05 wins per inning, and the offense loses .05 wins per inning going into the 9th. The defense gives up 3 runs, and the home team now has a .03 chance of winning. The home team scores 2 runs, but they may as well have scored 0 or 1, as this is still worth -.03 wins.
Adding it up, and the defense is +.05 * 8 - .47 = -.07 wins, while the hitting is worth -.05 * 8 - .03 = -.43 wins.
So, if you can accept these kinds of results, the real-time change in win expectancy, then welcome to the world of "Win Probability Added". If you can't accept this kind of premise, then WPA is just not for you.
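Here is a small ledger of the three examples, using the same guessed win-probability swings. The per-inning values and the count of 8 remaining innings in the first game are my reading of the guesses above, not output from a real win expectancy table.

```python
def wpa_split(defense_swings, offense_swings):
    """Sum win-probability changes credited to the defense and the offense."""
    return sum(defense_swings), sum(offense_swings)

games = {
    # Opponent scores 3 in the 1st, home team answers with 2, then zeros
    "3-2, early": ([-0.25] + [0.05] * 8, [0.15] + [-0.10] * 8),
    # Home team up 2-0 from the 1st, opponent scores 3 in the 9th
    "3-2, late": ([0.03] + [0.08] * 7 + [-0.40], [0.12] + [-0.07] * 7 + [-0.32]),
    # Scoreless through 8, opponent scores 3 in the 9th, home scores 2
    "3-2, 9th only": ([0.05] * 8 + [-0.47], [-0.05] * 8 + [-0.03]),
}

results = {name: wpa_split(d, o) for name, (d, o) in games.items()}

for name, (defense, offense) in results.items():
    total = defense + offense
    print(f"{name}: defense {defense:+.2f}, offense {offense:+.2f}, total {total:+.2f}")
```

Whatever the split, the two sides always sum to -.50, because the home team started at a .50 chance of winning and ended with a loss.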
What's a Ball Player Worth? (November 6, 2003)
Posted 11:09 a.m.,
November 10, 2003
(#16) -
tangotiger
Colin is dead-on. As long as the rest of the context is average, then the overall performance (whether measured as above-average or above-replacement) won't change. However, as soon as the rest of the context is, say, a really bad fielding team (i.e., the Yanks), then things change a little.
Work it out, and assume you've got a fielding team that gives up +.02 / BIP more than league average, and assume you've got 2 equal pitchers in terms of $H and component ERA, but one gets 50% of his PA on BIP, and the other gets 90%.
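A first step in working it out, as a sketch: the extra hits per PA that each pitcher absorbs from the bad fielders is just the fielding penalty times his share of PA that end in a ball in play. The 800 PA season is a made-up workload for illustration, and converting hits to runs is left out.

```python
EXTRA_HITS_PER_BIP = 0.02   # fielders allow .02 more hits per BIP than average

def extra_hits_per_pa(bip_share: float) -> float:
    """Extra hits per plate appearance caused by the below-average fielders."""
    return EXTRA_HITS_PER_BIP * bip_share

low_bip_pitcher = extra_hits_per_pa(0.50)    # 50% of PA end in a BIP
high_bip_pitcher = extra_hits_per_pa(0.90)   # 90% of PA end in a BIP

season_pa = 800  # hypothetical workload
print(f"Low-BIP pitcher:  {low_bip_pitcher * season_pa:.1f} extra hits")
print(f"High-BIP pitcher: {high_bip_pitcher * season_pa:.1f} extra hits")
```

The pitcher who puts more balls in play is hurt almost twice as much by the same fielders, even though the two are equals in a neutral context.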
What's a Ball Player Worth? (November 6, 2003)
Posted 2:31 p.m.,
November 10, 2003
(#17) -
tangotiger
(homepage)
Danil,
If you are so inclined, use the process that I set forth in post 14, with the theoretical or empirical data at the above homepage link, and figure out the off/def split on your examples using WPA.
Only do this if you want to get your sabermetric hands dirty a little.
What's a Ball Player Worth? (November 6, 2003)
Posted 4:33 p.m.,
November 10, 2003
(#20) -
tangotiger
Studes, I don't want to comment specifically on WS, so let me just extend my above example.
Say you've got 3 pitchers, each with a $H of .280, in a league of $H, and the fielders are all league average. Let's say that one pitcher has lots of K and BB, another has low in both, and another is league average.
All these pitchers give up 4.2 RPG, and are therefore equals.
Now, let's say each of these 3 pitchers now pitch with Ozzie and a few other good fielders, so that their $H is now .260 each. The high K+BB guy won't get to benefit as much from this arrangement (not as much leverage) as the low K+BB guy. The RPG for these 3 pitchers are now 3.70, 3.75, 3.80.
Now, we can see here that the fielders with an average pitcher are worth +.45 RPG (4.2-3.75). However, the high K+BB guy essentially wasted .05 RPG simply because he didn't use them enough.
In this context, in the context that this high K+BB pitcher pitched, he's actually -.05 RPG compared to the average.
From a replacement standpoint, they'd all be compared against the same baseline, whatever it is, as long as that baseline is based against the actual fielders these guys had.
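The same example as a few lines of arithmetic. The assignment of 3.70 to the low K+BB pitcher and 3.80 to the high K+BB pitcher follows from the reasoning above (the high K+BB guy uses his fielders the least); the labels are mine.

```python
NEUTRAL_RPG = 4.2  # all three pitchers with league-average fielders

# RPG with Ozzie and friends behind them
rpg_good_fielders = {
    "low K+BB": 3.70,
    "average K+BB": 3.75,
    "high K+BB": 3.80,
}

# How much each pitcher gains from the good fielders
benefit = {name: NEUTRAL_RPG - rpg for name, rpg in rpg_good_fielders.items()}

# Gain relative to what the average pitcher gets out of the same fielders
relative = {name: gain - benefit["average K+BB"] for name, gain in benefit.items()}

for name in rpg_good_fielders:
    print(f"{name}: saves {benefit[name]:.2f} RPG, {relative[name]:+.2f} vs average")
```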
What's a Ball Player Worth? (November 6, 2003)
Posted 6:53 p.m.,
November 10, 2003
(#23) -
Tangotiger
The "9" is wins above replacement.
What's a Ball Player Worth? (November 6, 2003)
Posted 12:30 p.m.,
November 12, 2003
(#27) -
tangotiger
You will find that most events will have a WPA figure that is in the same proportion to the run probability added figure (i.e., LWTS). IBB and defensive indifference would be 2 that would stand out against this rule.
***
I'm working, on and off, on my own WPA. The preliminary results show that Randy Johnson and Pedro, from 1999 to 2002, were each around 21 wins above average, if we assume that their BIP values would be split half/half with their fielders. (In any case, Pedro/RJ did not have performance on their BIP that was much different than average.) So, a top level pitcher is worth +5 wins above average per year.
At the tier below that, Schilling, Maddux, Mussina, Brown, they were 10 wins above average, or about +2.5 wins per year above average.
Assuming around 2 wins below average being replacement, and 2 million$ / win, I get RJ specifically as being +8 wins and worth about 16.8 million$, on average, from 1999-2002. Pedro was +6 wins, and worth 12.8 million$, Schilling +5 and 10.3 million$. After that, you've got a good number of pitchers at the 8 million$ level.
Interestingly, Armando Benitez came out as the best reliever in that time period! The relievers were Benitez, Nen, Foulke, Hoffman, Remlinger, Rivera. A top reliever is worth around 7 million$ / year.
I've got Pettitte as worth 5 mill / year, and the Yanks would be crazy to pay him at the 10 mill level.
I have not accounted for the park yet and a few other things. This is still a work-in-progress.
What's a Ball Player Worth? (November 6, 2003)
Posted 3:05 p.m.,
November 12, 2003
(#29) -
tangotiger
Actually, since you've got top pitchers over those 4 years that were not hurt, then I really don't see an issue in this particular case.
And even if they were, it would be a simple matter for you or someone else to extrapolate that to a full-4 years.
What's a Ball Player Worth? (November 6, 2003)
Posted 3:19 p.m.,
November 12, 2003
(#30) -
tangotiger
As for Win Shares, can someone post RJ and Pedro's WS totals over 1999-2002 (as a total), as well as the number of WS that a league average pitcher would have been expected to get given the # of PA (or IP) of RJ and Pedro?
What's a Ball Player Worth? (November 6, 2003)
Posted 8:06 p.m.,
November 12, 2003
(#32) -
Tangotiger
Using IP (though PA is probably a better approach), RJ had 1030 IP from 99-02, and Pedro had 746.
Assuming ARI,BOS had 162 GP each:
Avg Pitching WS = 162 GP x .5 W / GP x 3 WS / W x .35 (pitching share) = 85
IP = 162 x 9 = 1458
WS / IP = .0583
So, an avg pitcher over 1030 IP would have 60 WS. Over 746, it would be 44.
RJ therefore was 107 - 60 = +47 Win Shares above average = +16 wins above average. I have him at +24 I think.
Pedro was 89 - 44 = +45 Win Shares above average = +15 wins above average. I think I have him at +20.
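For anyone who wants to redo the baseline with different inputs, here is a rough sketch of the same calculation. The constants are the ones stated above (162 games, .500 team, 3 WS per win, 35% of WS to pitching); the function names are just for illustration.

```python
GAMES = 162
WINS_PER_GAME = 0.5      # a .500 team
WS_PER_WIN = 3           # Win Shares per team win
PITCHING_SHARE = 0.35    # share of team Win Shares assumed to go to pitching

team_pitching_ws = GAMES * WINS_PER_GAME * WS_PER_WIN * PITCHING_SHARE  # about 85
team_ip = GAMES * 9                                                     # 1458
ws_per_ip = team_pitching_ws / team_ip                                  # about .0583


def ws_above_average(actual_ws, ip):
    """Win Shares above what an average pitcher would earn in the same IP."""
    return actual_ws - ws_per_ip * ip


def wins_above_average(actual_ws, ip):
    """Convert Win Shares above average to wins (3 WS per win)."""
    return ws_above_average(actual_ws, ip) / WS_PER_WIN


rj = wins_above_average(107, 1030)     # Randy Johnson, 1999-2002
pedro = wins_above_average(89, 746)    # Pedro Martinez, 1999-2002
print(round(rj, 1), round(pedro, 1))
```

Swapping IP for PA (which is probably the better denominator) only requires changing `team_ip` and the playing-time argument.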
Win Shares is way below what it should be for pitching.
And remember, I'm looking at it from a PA-by-PA basis. I can also perform the same calculation simply using component ERA or any other measure.
James is saying that Pedro/RJ were +4 wins above average per year from 99-02, and I'm saying they were a bit over +5 wins above average per year.
James, to the best of my analysis, is wrong.
What's a Ball Player Worth? (November 6, 2003)
Posted 8:04 a.m.,
November 13, 2003
(#36) -
Tangotiger
Thanks to studes, if we include pitching+fielding+hitting, pitchers (AL and/or NL) get 36% of win shares.
So, pitcher win shares = .36 x 243 = 87.5
non-pitcher WS = .64 x 243 = 155.5
Give an average pitcher 7 IP x 36 starts = 252 IP
This pitcher's WS = 87.5 x 252 / (162 x 9) = 15
For a non-pitcher playing average at an average position (3b? cf?), if this player played 140 full games, he'd also have 15 win shares. (140 x 155.5 / [162 x 9] = 15).
So, who here believes that these 2 players are equals? That is, an average pitcher, pitching 7 IP / game for 36 games, has produced the same win/loss impact as an average hitter/fielder at an average position over 140 games?
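A quick sketch of the scale comparison, using the 36% pitcher share and 243 team Win Shares from above. The `lineup_slots` parameter is an assumption added for illustration: 9 reproduces the 140-game figure, and 8 (no DH) is one way to see how a different games total would land on the same 15 Win Shares.

```python
TEAM_WS = 243            # 81 wins x 3 Win Shares per win
PITCHER_SHARE = 0.36     # pitching + fielding + hitting by pitchers

pitcher_pool = PITCHER_SHARE * TEAM_WS          # about 87.5
non_pitcher_pool = TEAM_WS - pitcher_pool       # about 155.5


def pitcher_ws(ip, team_ip=162 * 9):
    """Win Shares for an average pitcher throwing `ip` innings."""
    return pitcher_pool * ip / team_ip


def position_player_ws(games, lineup_slots=9, team_games=162):
    """Win Shares for an average non-pitcher playing `games` full games.

    lineup_slots is a hypothetical knob: 9 with a DH, 8 without.
    """
    return non_pitcher_pool * games / (team_games * lineup_slots)


print(round(pitcher_ws(7 * 36), 1))             # 36 starts of 7 IP
print(round(position_player_ws(140), 1))        # 140 full games, 9 slots
print(round(position_player_ws(125, 8), 1))     # 125 full games, 8 slots
```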
What's a Ball Player Worth? (November 6, 2003)
Posted 1:46 p.m.,
November 13, 2003
(#38) -
tangotiger
Good point. Ok, so that's 125 NL games and 140 AL games. That by itself is a little troubling no?
So, we are saying that the following will all have 15 win shares:
- an average everyday NL player, playing in 125 complete NL games
- an average everyday AL player, playing in 140 complete AL games
- an average pitcher, either league, throwing 252 IP
Thanks studes, you just convinced me even more about how poor the scale is between non-pitchers and pitchers.
METS SEARCHING FOR STATS ANALYST (November 7, 2003)
Posted 10:41 a.m.,
November 7, 2003
(#3) -
tangotiger
I didn't, but I was thinking about it a few months ago.
Someone gave me the bright idea that the best way to get into a front office, if you really want that, is to schmooze with the assistant GMs. Since I live in northern NJ, I was well-aware of what was going on with Steve Phillips and Jim Duquette. "Hmmmm, I thought... what if I contact Duquette?" But, I never got around to that, as I was much too busy posting here.
However, I need a stable and good paying job (funny how your perspective changes once you have a baby), and I'm not sure that MLB offers that.
Note to Mets: MGL recently moved back to NY.
Note to MLB teams: Tom Tippett, Tom Ruane, Mike Emeigh, Keith Woolner, Chris Dial and several others should be on your "exhaustive search" list.
METS SEARCHING FOR STATS ANALYST (November 7, 2003)
Posted 10:44 a.m.,
November 7, 2003
(#4) -
tangotiger
You know, USA runs Shawshank all the time, and I can never get enough of it. It's such a brilliantly-crafted movie. Every character in there breathes life. Godfather (I and II), Shawshank... never get tired of those. Color Purple. Anything else?
David Pinto and fielding (November 10, 2003)
Posted 3:44 p.m.,
November 10, 2003
(#3) -
tangotiger
If I remember right, on a team level, fielding is about +/- 40 runs per team, with some exceptions (like the 2002 Angels).
Great observation, studes, on the 1-year thing. You are absolutely right. You've got to have multi-year data here, or things get skewed.
What David is doing with his measure is, essentially, best-fitting. It implicitly assumes that the data itself is not a sample, but the population.
You can think of the "strength of schedule" idea, as it's the same principle. You may have the 98 Yanks input into a strength of schedule to come out with a .681 win% against a balanced league, but that itself is still subject to regression towards the mean.
This would actually be a huge stumbling block, depending on the size of the sample, in David's methodology.
MGL, on the other hand, does (I think) regress his adjustment factors to account for this.
David Pinto and fielding (November 10, 2003)
Posted 10:20 a.m.,
November 11, 2003
(#6) -
Tangotiger
In comparing the MGL and Pinto model:
- The real strength in MGL's model is that we can "see" what all the underlying "impacts" are for every variable, and can easily regress those impacts
- The real drawback in MGL's model is how he gets those adjustment factors. Essentially, we're treating all these variables as independent of each other, and then just multiplying them all to get the impact for a given context. We're not even sure we can do that, and, to even get those factors to begin with, you have to try to strip them away from those polluted contexts
- The real strength in the Pinto model is that we don't treat all these variables as being independent
- The real drawback is that the data being used would probably have a rather large standard error, and that we can't really see what the impact is. We can see it if we isolate one variable, but if you have a combination of variables, we really don't know what direction it's pulling the data, or to what degree. And of course, no regression towards the mean (though that may be able to be built into the system).
This really just becomes a statistics problem.
David Pinto and fielding (November 10, 2003)
Posted 10:42 p.m.,
November 12, 2003
(#7) -
Tangotiger
(homepage)
David posted the team-by-team breakdown (see above link).
David Pinto and fielding (November 10, 2003)
Posted 10:52 a.m.,
November 13, 2003
(#8) -
tangotiger
Doing a "actual outs" minus "expected outs", here are the leaders:
Position Outs Diff Team
4 34 Orioles
3 30 Devil Rays
6 30 Devil Rays
4 29 Athletics
8 24 Braves
5 23 Astros
4 23 Pirates
6 23 Royals
6 22 Expos
4 20 Blue Jays
and the trailers
4 -21 Twins
3 -22 Athletics
5 -22 Mets
1 -22 Orioles
4 -22 Rockies
6 -23 Yankees
4 -24 Mets
1 -25 Devil Rays
9 -25 Orioles
9 -27 Dodgers
Yes, there's Jeter over there...
How can you possibly be -22 at 1B and +30 at 1B? Kinda surprising.
If I take the absolute value of all the differences, here's what the average spread is by position
Pos Avg err
1 8
2 4
3 8
4 12
5 8
6 10
7 6
8 9
9 8
This means that the spread in talent is found most at 2B, and least at C. The C is not surprising, since they'll be involved so little on BIP.
David Pinto and fielding (November 10, 2003)
Posted 11:33 a.m.,
November 13, 2003
(#9) -
tangotiger
Anyone else notice how well the Atl CF are doing (i.e., Andruw Jones)?
The one big difference between David and MGL is that MGL looks at the grid location, while David only looks at the pizza slice location. So, if Jones plays shallow or deep, this will have an effect on things.
Perhaps the Official Primate Statisticians can jump in here, insofar as I'd like to talk about logistic regression and independence.
Let's focus on Andruw Jones. When MGL does his "out rates by zone", where the zone is (more or less):
grid,stadium,type of fly ball, pitcher tendency on flyball, batter handedness
MGL comes up with league-wide adjustment rates, so that one player doesn't have much impact overall. Though, I'm not convinced about the stadium effect.
When David does his analysis, doesn't the identity of the player now become a problem, since the out rates by zone, as David does them, are now polluted by that player to a much larger extent?
I'm still a little confused on making sure that Andruw Jones doesn't impact what the out conversion rate is on a flyball hit in Atlanta.
(Kinda like having a LH HR factor for the 1920's Yanks... and having Babe Ruth make up the majority of that sample.)
Thanks in advance...
David Pinto and fielding (November 10, 2003)
Posted 3:36 p.m.,
November 13, 2003
(#11) -
tangotiger
I believe UZR would have the same thing. For example, if Andruw catches a ball in zone Y where 95% of all balls are caught, this is what happens:
run value of ball in zone Y = .05 x (-.60) + .95 x (.30) = .255
run value of ball CAUGHT in zone Y = .30
run value of ball NOT caught in zone Y = -.60
Therefore, the change in run value of a ball caught in zone Y
= .300 - .255 = +.045 runs
The run value of a ball not caught in zone Y = -.855 runs
95% of .045 + 5% of -.855 = 0
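The same bookkeeping works for any zone. A minimal sketch, assuming the run values used above (+.30 for a ball caught, -.60 for a ball not caught, from the defense's point of view); the function name is made up for the example.

```python
def zone_run_values(catch_rate, caught_value=0.30, not_caught_value=-0.60):
    """Baseline run value of a ball hit into a zone, plus the credit for
    catching it and the debit for missing it, both relative to baseline."""
    baseline = catch_rate * caught_value + (1 - catch_rate) * not_caught_value
    credit_caught = caught_value - baseline
    debit_missed = not_caught_value - baseline
    return baseline, credit_caught, debit_missed


baseline, credit, debit = zone_run_values(0.95)

# An average fielder catches 95% and misses 5%, so his net comes out to zero.
net = 0.95 * credit + 0.05 * debit
print(round(baseline, 3), round(credit, 3), round(debit, 3), round(net, 6))
```

Because credits and debits are both measured against the zone baseline, the league average is zero by construction in every zone, whatever the catch rate.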
David Pinto and fielding (November 10, 2003)
Posted 2:03 p.m.,
November 14, 2003
(#14) -
tangotiger
I sent the following to David.
**************
I made a comment here on the perfect UZR which I will cut / paste here
======= cut ==========
- the location of the batted ball
- the trajectory of the batted ball
- the speed of the ball
- the game situation (inning/score/base/out/count)
- the speed of the runner(s) and batter
- the batter's handedness
- the gb/fb tendency of the pitcher/batter
- the type of pitch thrown (fastball, curve)
- the surface of the park
- the dimensions of the park
- the climate that day
- teammates' positioning
===== end cut ========
So, in addition to the base/out that you mentioned,
you might as well throw in inning/score, right?
I don't remember seeing the surface of the park in
your variables, so you might want to consider that.
The gb/fb tendency of the pitcher/batter also affects
the rates, as would the speed of the baserunners.
These are a little more problematic to figure out,
though. The climate (temperature, wind, etc) might be
worth considering as well. I see that STATS now has
the speed and type of pitches thrown.
Anyway, consider using any/all of these.
You have to be careful about now applying your
probabilities to the pitchers, because you do not want
the pitcher skill as a variable. For example, the
handedness of the pitcher should not, I don't think,
be a variable in here, since the handedness is a trait
of the pitcher. We are not trying to remove the
pitcher handedness bias, since it's something inherent
to the pitcher. Same thing with the slice location.
This is under the pitcher's control, so you don't want
to treat that as a bias to account for.
In that above link, I list the variables from the
pitcher's perspective:
===== cut =====
- the game situation (inning/score/base/out)
- the speed of the runner(s) and batter
- the batter's handedness
- the gb/fb tendency of the batter
- the surface of the park
- the dimensions of the park
- the climate that day
==== end cut ==
Great job overall!
David Pinto and fielding (November 10, 2003)
Posted 2:17 p.m.,
November 14, 2003
(#16) -
tangotiger
Octavio Dotel had 205 BIP, meaning that 1 SD = .032. Dotel's actual DER minus expected DER was .063. This means that Dotel's performance was a shade under 2 SD from the mean. That was actually the largest deviation on the one side. Jeff Weaver was 2.7 SD the other way.
In the 230 pitchers in the group, we had only 1 pitcher that was more than 2 SD. We also had 196 within 1 SD, or 85%.
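For reference, the SD figure can be reproduced with the usual binomial formula, sqrt(p x (1-p) / n). The sketch below assumes an out rate on BIP of .70, which is not stated above but is the value that gives .032 for 205 BIP. It also shows what a normal distribution would put within 1 SD, for comparison with the 85% observed.

```python
import math


def der_sd(bip, der=0.70):
    """Binomial standard deviation of DER over `bip` balls in play.

    der=0.70 is an assumed league out rate on BIP, not a figure from the post.
    """
    return math.sqrt(der * (1 - der) / bip)


sd = der_sd(205)                 # Dotel's 205 BIP
z = 0.063 / sd                   # actual DER minus expected DER, in SD units
share_within_1sd = 196 / 230     # observed share of pitchers within 1 SD

# Share a normal distribution would put within 1 SD (about 68%).
expected_within_1sd = math.erf(1 / math.sqrt(2))

print(round(sd, 3), round(z, 2), round(share_within_1sd, 2), round(expected_within_1sd, 2))
```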
However, because of, I think, the "removal of skill" that David is doing, this forces everyone towards the mean.
Even look at, say, keeping the slice location in. Say you have a pitcher that gives up plenty of deep flyballs. However, under David's system, all pitcher BIP in that slice should give you the same "expected outs". A pitcher with a great fielder or a pitcher that gives up a lot of easy flyballs will record more outs than expected, and we don't know the real reason.
The OF doesn't have this problem much, because he's got many pitchers getting balls over there. The same can't be said for the flip side.
David Pinto and fielding (November 10, 2003)
Posted 5:08 p.m.,
November 15, 2003
(#20) -
Tangotiger
Do you think that the trajectory and ball speed variables would serve as proxies for "depth" in the "slice"?
No, I don't. And, since the grid location is available, I don't understand why not include it as well. Between "slice", "grid", and "ball speed", I would think the latter is most at risk from scorer judgement.
As well, I'll reiterate that I think David is not handling the pitcher portion of his system correctly, so we should wait for a response from him before trying to make sense of the numbers.
David Pinto and fielding (November 10, 2003)
Posted 5:18 p.m.,
November 15, 2003
(#21) -
Tangotiger
Let me try to give an example of where I think David is going wrong with his pitching version.
Say that he only looked at 2 categories: handedness of pitcher, and batted ball type (and let's assume the batted ball type is either air or ground).
So,
p(LP,a)= .70
p(RP,a)=.80
p(LP,g)=.75
p(RP,g)=.85
Those are probability of getting an out.
So, say you have a left-handed ground ball pitcher. Here would be his frequency numbers:
f(LP,a)=.20
f(LP,g)=.80
So, what would the "expected outs" be?
.7 x .2 + .75 x .8 = .74
And, let's say that he actually got .74 outs. So, this means that he has average skill, right? Wrong. He has average skill relative to a LP. If you look at the prob rates, all LP have a probability of getting an out of -.10 relative to a RP.
If you've got 70% of the pitchers as RP, then the average LP would be -.07 relative to an average pitcher.
Therefore, our average LP is actually -.07 relative to an average pitcher.
The problem is that David treats the variable of handedness of the pitcher as something to account for (you should) and essentially remove (which you shouldn't).
I think I have it right here.
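Here is the same illustration as a small sketch, using only the made-up probabilities above. The comparison holds the air/ground mix fixed and changes only the handedness, which is how the -.10 and -.07 figures fall out.

```python
# Illustrative out probabilities from the example: (pitcher hand, batted ball type).
OUT_PROB = {
    ("LP", "air"): 0.70,
    ("RP", "air"): 0.80,
    ("LP", "ground"): 0.75,
    ("RP", "ground"): 0.85,
}


def expected_out_rate(hand, freq):
    """Expected out rate for a pitcher of the given hand and batted-ball mix."""
    return sum(OUT_PROB[(hand, bb_type)] * share for bb_type, share in freq.items())


lefty_groundballer = {"air": 0.20, "ground": 0.80}

within_hand = expected_out_rate("LP", lefty_groundballer)     # baseline if hand is "removed"
same_mix_rhp = expected_out_rate("RP", lefty_groundballer)    # same mix thrown by a RP
hand_gap = within_hand - same_mix_rhp                         # LP versus RP

RP_SHARE = 0.70  # 70% of pitchers are right-handed in the example
# League average = .70 x RP + .30 x LP, so LP minus league = .70 x (LP - RP).
lp_vs_average = RP_SHARE * hand_gap

print(round(within_hand, 2), round(hand_gap, 2), round(lp_vs_average, 2))
```

A pitcher who matches `within_hand` looks average only against other LP; against the whole league he is short by `lp_vs_average`.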
David Pinto and fielding (November 10, 2003)
Posted 9:55 p.m.,
November 15, 2003
(#23) -
Tangotiger
My numbers were strictly for illustration and for me being able to do it in my head.
Win and Loss Advancements (November 13, 2003)
Posted 10:24 a.m.,
November 13, 2003
(#1) -
tangotiger
Should have noted, but all the above numbers are 1999-2002.
Win and Loss Advancements (November 13, 2003)
Posted 11:10 a.m.,
November 13, 2003
(#3) -
tangotiger
For the moment, I have not made a distinction between the types of BIP, nor of using the quality of the fielder. Even without doing so, the spread was so tiny that the extra work to do this will set me back several weeks. Since I'm not writing a book about this (yet), I'm going to let it go for a while. I'd much rather account for the park before I do anything else. That'll have, by far, the largest impact.
As for how to do the split, it's a secret for now, but if you followed the "Anatomy of a Collapse", you might be able to figure it out by a couple of examples in there. It's all pretty much common sense, after I would explain it, and it follows pretty much what a fan watching the game would think.
I have not compared the difference between Starters/Relievers, though I will at some point.
The only provision for the reader is that he must accept the concept of win expectancy. If you can't accept it, Win Advancement is not for you.
Win and Loss Advancements (November 13, 2003)
Posted 1:42 p.m.,
November 13, 2003
(#6) -
tangotiger
In terms of performance (value), I wouldn't be surprised if that's the case, simply because the smaller sample size and the higher leverage will conspire to give you that.
Win and Loss Advancements (November 13, 2003)
Posted 1:45 p.m.,
November 14, 2003
(#7) -
tangotiger
I have also added the leading hitters (go to top of page).
I'm having a tough time trying to figure out what to do with basestealing. For example, you have runners on 1b and 2b. I'm inclined to give the double-steal value completely to the runner on 2b. With runners on 1b and 3b, again, some of that value of the steal of 2b should go to the runner at 3b, since he's keeping the pitcher/catcher honest a little. I might just decide to split the difference, and just give everyone an equal share. Any thoughts on the matter?
Win and Loss Advancements (November 13, 2003)
Posted 2:33 p.m.,
November 14, 2003
(#8) -
tangotiger
(homepage)
Anyone else surprised how little Bonds is head and shoulders above the rest?
For example, I'm saying here that Bonds is +7.5 wins above the average hitter, Giambi is +6 wins, and you have a group at +5 wins.
If we look at MGL's superLWTS from 2000-2002 (See homepage link), Bonds is +271 runs over those 3 years, or +9 wins per year. Giambi is next at +202, or +7 wins, and then a group at the 150-160 range, or +5 wins.
So, while Bonds' counting stats are impressive, his performance is not at all randomly distributed to the extent that others' performances would be. What happens here is that, just as Hoffman et al can have higher leverage PAs, then Bonds's PA can be lower-leveraged to the point where he can be contained (relatively speaking). Bonds himself is still a monster, but a contained monster.
Win and Loss Advancements (November 13, 2003)
Posted 3:20 p.m.,
November 14, 2003
(#10) -
tangotiger
I've included basestealing. Go to top of page.
Win and Loss Advancements (November 13, 2003)
Posted 3:33 p.m.,
November 14, 2003
(#11) -
tangotiger
Thanks Michael. I'm kinda enjoying this.
***
Just to show the non-effect of how I'm handling the basestealing, let's consider Johnny Damon. Over 1999-2002, 192 times something happened with Damon on 1B (SB, CS, PO, WP, etc, etc). 166 of those times (86%), he was the only runner on base. Of those 26 times where there were other runners on base, the total WAA was 0.73 wins. In my system, I gave Damon 0.32 wins. So, he had another 0.41 wins that you may think he would have deserved.
When he was on 2B, he was alone 11 times, and lead runner another 27 times, (and 3 times he had a runner at 3b). Again, in my accounting system, I give Damon 0.46 wins, but if we give Damon a win for all the times he was the lead runner, that would be 0.67 wins. So, that's another .21 wins that you may think he deserves.
Finally, with him on 3B and a runner on 1b and no one on 2b, I gave Damon .20 wins, while you can argue that all .40 wins should go to the runner on 1B.
Net/net: you can argue about adding .42 wins (over 4 years) to Damon's total. I'm not too interested, right now, to worry about the .1 win per year (1 run per year) impact in trying to improve this model.
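To make the net/net explicit, here is the tally as a short sketch. The win figures are the ones quoted above; the 10 runs per win conversion is the one implied by ".1 win per year (1 run per year)".

```python
# (situation, wins credited to Damon in my system, wins under the alternative view)
CASES = [
    ("on 1B with other runners on base", 0.32, 0.73),
    ("on 2B as the lead runner", 0.46, 0.67),
    ("on 3B with a runner on 1B", 0.20, 0.00),  # alternative: all .40 to the runner on 1B
]

YEARS = 4            # 1999-2002
RUNS_PER_WIN = 10    # implied by the post's own win-to-run conversion

net_wins = sum(alternative - credited for _, credited, alternative in CASES)
wins_per_year = net_wins / YEARS
runs_per_year = wins_per_year * RUNS_PER_WIN

print(round(net_wins, 2), round(wins_per_year, 2), round(runs_per_year, 1))
```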
Win and Loss Advancements (November 13, 2003)
Posted 7:02 p.m.,
November 14, 2003
(#15) -
Tangotiger
BaseRuns would only work if I take one-9th Bonds, and 8-9ths an average player. It would be easy enough to come up with the average runs scored.
I doubt that BaseRuns (or its offshoot: custom LWTS) will give us reliable enough numbers for Bonds, simply because of the non-randomness of his numbers. Though, I can't say for sure how much off they might be.
Also, let's remember that I have not park-accounted yet. That might actually be enough to make up the difference that we see here.
I already have a pretty good idea how I would generate Win and Loss Advancement for non-PBP years, so that I can generate this for the whole history of baseball. I would guess that I would ask you at some point to supply your fielding numbers to the WA system, so that we'd have a complete set of numbers. But, we'll talk about that in a few months.
Win and Loss Advancements (November 13, 2003)
Posted 1:30 p.m.,
November 15, 2003
(#18) -
Tangotiger
jto: I don't understand the question. I use my system where I have PBP data. And I'll have another system that'll try to match, as best as possible, to the PBP system for those years with no PBP data.
Win and Loss Advancements (November 13, 2003)
Posted 9:55 a.m.,
November 25, 2003
(#19) -
tangotiger
Not sure if you noticed Foulke in the list. If you remember an article from last year, Foulke's LI was around 1.3 for the 1999-2002 time period (essentially, he was used, overall, similar in impact to a setup guy). So, his incredibly strong showing is even more impressive here.
If he was used "properly", he would have shot up as the #1 reliever in impact, along with Benitez. Benitez is another interesting pitcher, and the above suggests that he was quite effective "in the clutch", which is very contrary to the perception and results of 2003. (Numbers not park adjusted.)
Keith Foulke not only has the context-neutral numbers of potentially the best reliever in the game, but his performance when it counted was even more impressive.
Foulke will go to a team that can appreciate him. Redsox? BlueJays? Mets?
Win and Loss Advancements (November 13, 2003)
Posted 3:39 p.m.,
December 1, 2003
(#20) -
tangotiger
I wrote this at Fanhome, and will repeat it here:
====================
As for the various win impact methods, you have to decide what perspective you like the most, and use that method:
Perspective 1
I look at things in real-time, like a manager or fan does, and want to attribute the win impact as it happens. This means that I must assume a random distribution of events (centered around expectations of player matchups and managerial tendencies) for all future events.
Perspective 2
I look at things after the game is over, and try to attribute value to the various performances, given that I know all future outcomes. I assume that value was only created by players on the winning team (meaning performances that led directly to runs scoring or not).
Going 4-4 in innings where your team scored no runs, yet won the game had, essentially, very little value. (You can argue it had some because it let your team send an extra batter and prolonged the inning, but now you get into the what-if / probability scenario, and this perspective does NOT like this.)
Perspective 3
I think that all performances are statistical random variations centered around the players/park matchups, and therefore, I don't care whether my team won or lost. I just want to know what would have happened, on average, to a team, if I were able to insert this player into it.
This is the seasonal perspective, where you look at a player's line, and simply use a simple runs-to-win converter to figure out his theoretical win impact on a theoretical team.
I think that about covers it.
I don't really see anyone as being right or wrong here, since this is a question of perspective or opinion.
Win and Loss Advancements (November 13, 2003)
Posted 11:00 a.m.,
December 3, 2003
(#22) -
tangotiger
pitcherid WA LA WAA WAA2
johnr005 83 60 23 24
martp001 55 35 20 21
schic002 67 54 13 14
Only Schilling could go from the 2nd best pitcher on his team to ... 2nd best pitcher on his new team. If Foulke ends up with the Sox, Redsox would have 4 of the 10 best pitchers from 1999-2002. I'm sure they could get Benitez cheap, to make it 5 out of 10. I smell Voros.
If I separate the top 330 pitchers from the rest in terms of game impact (WA+LA), here's what the totals are:
- the top 330 pitchers have 80% of the game impact (essentially the effective IP).
- the bottom pitchers have .480 WA and .520 LA for every PA
I think this is a fair line to draw between regulars and replacement level. 80% to regulars and 20% to the backups/replacements. For Randy Johnson, the replacement level comes out to 1.4 wins below average. For the rest of the top starters, it's around 1.2 to 1.3 wins below average. THAT, I think, is the best measure of replacement level.
Anyway, in terms of wins above replacement, Pedro is +23 and Schilling is +17, over the 4 year period. At 1.85 million$ / win, Pedro "earned" 11.1 and Schilling earned 8.3 million$.
Pitchers are incredibly overpaid.
Win and Loss Advancements (November 13, 2003)
Posted 11:15 a.m.,
December 3, 2003
(#23) -
tangotiger
(That was per year for the salaries).
I decided to separate based on 90% as regulars, and 10% as the backups/replacement. At that level, the WA/LA is .473/.527. The replacement level is now around 1.7 wins below average. In order to maintain the whole overall salaries paid out to pitchers at the same rate, I had to reduce the marginal $/ win to 1.4 million. Pedro comes in at 9 million$ and Schilling at 7 million$.
Kind of strange that I make the replacement level more favorable to the top pitchers, but they get less money. This is because I put in a fixed 800 million$ allocated to pitchers.
Anyone have any numbers as to how much pitchers were paid in 2003?
Win and Loss Advancements (November 13, 2003)
Posted 11:32 a.m.,
December 3, 2003
(#25) -
tangotiger
Sure thing.
From 1999-2002, there were 1006 players who pitched. If I take the 450 pitchers who pitched the most (15 per team), that gives me 90% of the IP. Therefore, the other 556 pitchers can be considered the "replacements". They accounted for 10% of the IP, or essentially 144 IP per team per year.
These are the 10 pitchers in the replacement group that pitched the most:
coloj001
vizcl001
holmd001
bradc001
sands002
dipoj001
orosj001
tuckt001
georc002
kohlr001
I don't recognize any name except Jesse Orosco, and that, I think, we can say is the definition of replacement level from 1999-2002.
Win and Loss Advancements (November 13, 2003)
Posted 11:59 a.m.,
December 3, 2003
(#27) -
tangotiger
Careful!!
We've got a new scale here. (You will note that Pedro's WA/LA is 55/35, or 61% win advancements, and we know that Pedro is not a .610 pitcher.) 0.47 win advancements to .53 loss advancements is NOT a .470 pitcher. Apologies for not making this clear.
For example, a heavily used starting pitcher from 1999-2002 had 120 "game advancements" (WA+LA), or 30 GA per year.
An average pitcher would have gone 15/15 in WA/LA. The replacement level pitcher, using .47/.53, would come in at 14/16. That is, 14 win advancements and 16 loss advancements, or -2 wins.
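A small sketch of the scale, to show why .47/.53 is not a .470 pitcher. It uses the 30 game advancements per year quoted above; the function name is made up for the example.

```python
def wa_la(game_advancements, wa_rate):
    """Split game advancements into win and loss advancements.

    Returns (WA, LA, WA - LA); the differential is the wins above/below average.
    """
    wa = game_advancements * wa_rate
    la = game_advancements - wa
    return wa, la, wa - la


GA_PER_YEAR = 30  # heavily used starter: 120 GA over 1999-2002

average = wa_la(GA_PER_YEAR, 0.50)
replacement = wa_la(GA_PER_YEAR, 0.47)

print([round(x, 1) for x in average])
print([round(x, 1) for x in replacement])
```

The unrounded replacement line is 14.1 / 15.9, a differential of -1.8, which is where the rounded 14/16 and -2 wins come from.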
Win and Loss Advancements (November 13, 2003)
Posted 12:30 p.m.,
December 3, 2003
(#30) -
tangotiger
That's a good question. The average game had 1.4 game advancements.
Setting the .473/.527 level to that, and our pitchers come in at 0.66 WA and 0.74 LA, or -.08 wins (i.e., the team would win 42% of the time, if given average hitting and fielding).
Win and Loss Advancements (November 13, 2003)
Posted 12:39 p.m.,
December 3, 2003
(#31) -
tangotiger
What if I take the top pitchers in playing time, so that they account for 95% of all IP?
In this case, the WA/LA would be .46/.54, which works out to -.11 wins, or a team of replacement pitchers would win 39% of the time with average fielding and hitting.
And if I take 99% of all IP? That would be .43/.57, or -.20 wins, or a 30% record.
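The three cutoffs can be run through the same calculation. This is a sketch assuming, as above, 1.4 game advancements per game and an otherwise average team (.500 before the pitchers' differential is added); the dictionary of cutoffs simply restates the WA rates quoted in these posts.

```python
GA_PER_GAME = 1.4  # average game advancements per game for pitchers

# Share of IP treated as "regulars" -> WA rate of the remaining (replacement) pitchers.
WA_RATE_BY_CUTOFF = {0.90: 0.473, 0.95: 0.46, 0.99: 0.43}


def replacement_team(wa_rate, ga_per_game=GA_PER_GAME):
    """Per-game win differential and implied win% with average hitting and fielding."""
    wa = ga_per_game * wa_rate
    la = ga_per_game * (1 - wa_rate)
    diff = wa - la
    return diff, 0.5 + diff


results = {cutoff: replacement_team(rate) for cutoff, rate in WA_RATE_BY_CUTOFF.items()}
for cutoff, (diff, win_pct) in results.items():
    print(f"{cutoff:.0%} regulars: {diff:+.2f} wins per game, {win_pct:.0%} win rate")
```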
I don't know about anyone else, but I think 90% is probably the right level, and maybe 95%.
Therefore, I would probably say that a team of replacement level pitchers and a team of replacement level non-pitchers would win about 30% of the time (calculations not shown, but just an educated guess). More accurately, they'd have the true talent to win 30% of the time. Over 162 games, they can of course win a lot less (or a lot more) by random statistical variation.
Win and Loss Advancements (November 13, 2003)
Posted 2:10 p.m.,
December 3, 2003
(#34) -
tangotiger
I might lean towards 95%, just because it seems to line up better with some existing replacement levels.
I really wouldn't pay attention to "existing replacement levels".
Would I be correct in assuming that these replacement pitchers would tend to have a low GA/IP ratio because they tend to pitch in low leverage situations?
You are correct.
Is it possible that as the GA/IP ratio increases that the WA/LA ratio may change? Or should it remain static?
The WA/LA ratio remains static, though the DIFFERENTIAL would increase. And, it's the differential that we care about.
Also, if a team were to pitch all replacement pitchers, I would think it's possible that the total amount of GAs might increase.
That is a great question, one that is on my to do plate in terms of creating historical WA/LA. My guess is that extreme pitchers (good and bad) have a below average GA/BFP ratio. That is, pitchers who pitch in non-close games have a lower Leverage Index. However, in the level that exists in MLB, the LI of Bert Blyleven and Bob Knepper were both around 1.0. I suspect that the GA/BFP ratio (which is similar to LI) is pretty constant for all types of starters.
If a game with average pitching had 1.4 GA, the replacements might have more of an overall effect on the game (negatively of course) than an average pitcher. So the total impact might be greater than the -.11 wins you came up with keeping the GA the same.
This is definitely on my to do list. I'm going to look at the GA for pitchers when they win and lose, and for different types of pitchers. With RJ and Pedro, the game score won't fluctuate as much as an average pitcher, and so, I'd guess that the GA is smaller for them. Same for a bad pitcher. Think of it this way: remember that 1993 playoff game between Phi and Tor, where it was 15-13 or something? There were so many wild GA in that game that I suspect that it would be off the charts. For a one-hitter, the GA would be very very small.
I guess what I'm getting at in the last part of my previous post is that using "game advancements" to figure replacement value might not be a good idea.
You are probably right in the technical sense, though I'm not sure if there's any practical difference. I should have tracked # of PAs, but neglected to. And, re-running the report is a bit of a pain. I'd rather go on to something else, and revisit later.
Win and Loss Advancements (November 13, 2003)
Posted 2:18 p.m.,
December 3, 2003
(#35) -
tangotiger
I might lean towards 95%, just because it seems to line up better with some existing replacement levels.
I really wouldn't pay attention to "existing replacement levels".
To continue, I don't think you should come in with a mindset that "hmmm.... 40% seems right... what does that imply", and get a result of "o i c... that means that 5% of PAs are considered replacement level".
I would suspect that very very few people would consider 5% replacement level. Remember, 10% of IP being replacement level means that you have 450 pitchers over 4 years as being regulars, or 15 per team.
If you ask yourself "what is replacement", you'll probably answer "if I lose a pitcher, who can I get to replace him with", and you'll probably look outside the 300 or 330 pitchers. Over 4 years, with turnover and such, maybe that balloons up to 400 or 450.
But at the 95% level, you've got 550 pitchers in your regular pool, or over 18 pitchers per team over 4 years. That's kind of unrealistic, in my view.
Win and Loss Advancements (November 13, 2003)
Posted 4:36 p.m.,
December 3, 2003
(#37) -
tangotiger
What's interesting is that you take the bottom 10% in IP for a given year, whereas I'm taking the bottom 10% over a period of 4 years.
I suppose if you were to look at it from a 1 year time period, you should take the top 300 or 330 pitchers, and everyone else would be part of the replacement group. (Ideally, you would do this based on Opening Day pitchers + DL, and NOT after-the-fact performances.)
Win and Loss Advancements (November 13, 2003)
Posted 2:03 p.m.,
December 9, 2003
(#39) -
tangotiger
Fixing the total at 1.063 billion$ (since with my list I won't be able to split a player's salary by MLB service time), and fixing the $/win at 1.85 (based on previous research), the win advancement replacement level comes out to .469, which is pretty much what we concluded it should have been. So, what we have is a strong indication that the $/win and the replacement levels are in sync, and that, overall, pitchers are properly paid.
Anyway, here are the $ earned from 1999-2002 for the top 10 pitchers:
pitcherid salary earned
johnr005 $15.1
martp001 $12.0
schic002 $9.6
maddg002 $9.4
mussm001 $8.8
benia001 $7.8
browk001 $7.6
hudst001 $7.0
leita001 $6.9
lowed001 $6.9
Very interesting that the 2 best starters are properly paid. The problem is with all those second-tier pitchers getting the money of first-tier pitchers. (And, in this light, anyone not named RJ or Pedro is second-tier.)
Furthermore, while a pitcher may have earned those salaries, they would not be expected to continue earning such salaries. Their true talent in the coming years is probably below their previous performance levels.
I'd put a cap at 9 million$/yr for a pitcher (except for the big two).
Win and Loss Advancements (November 13, 2003)
Posted 2:11 p.m.,
December 9, 2003
(#40) -
tangotiger
The reason that free agent pitchers are overpaid is that teams save a lot of money on the under 6-yr pitchers (because the system allows them to). So, they have a HUGE ROI on these pitchers, and instead of pocketing the money or reinvesting it into the organization, the teams overpay for the freely available talent.
Overall, they are paying what they should: 1 billion$ for pitchers. To put this into common man terms, a pair of Air Jordans costs 125$, and my Reeboks cost 50$. So, I'm prepared to pay 175$ for the 2 pairs, but I got a discount on the Reeboks to 25$. Most people would still only pay 125$ for the Air Jordans, but an MLB team would pay 150$ for them.
Win and Loss Advancements (November 13, 2003)
Posted 3:14 p.m.,
December 9, 2003
(#42) -
tangotiger
I didn't mean to single out pitchers. This would apply to hitters as well.
And, I take it as a given that MLB, on the whole, have properly split the allocation between pitchers and non-pitchers, hence my allocation of 1.063 billion$ to pitchers. So, I'm presuming that the average pitcher and average non-pitcher are properly paid.
By the way, how much $ went to nonpitchers? Seeing that pitchers get 36% of the Win Shares, that means that they earned 1.063 billion$ for 2620 Win Shares, or 400,000$ per win share. You had the overall as 300,000$ per win share, meaning that nonpitchers must be at around 250,000$ per win share. I wouldn't be surprised if this is tied into a problem with Win Shares (i.e., not enough WS for pitchers).
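The $ per Win Share figures above can be reproduced from the numbers quoted in this post. This is only a sketch of the arithmetic: it takes the 36% share, the 2620 pitcher Win Shares and the 300,000$ overall rate at face value, and backs out the nonpitcher rate from them.

```python
# All inputs are quoted in the post; dollars are in millions.
pitcher_dollars = 1063.0   # allocated to pitchers
pitcher_ws = 2620.0        # Win Shares credited to pitchers
pitcher_ws_share = 0.36    # pitchers' share of all Win Shares
overall_per_ws = 0.300     # 300,000$ per Win Share overall

pitcher_per_ws = pitcher_dollars / pitcher_ws

# Back out the totals implied by the 36% share and the overall rate.
total_ws = pitcher_ws / pitcher_ws_share
total_dollars = overall_per_ws * total_ws
nonpitcher_per_ws = (total_dollars - pitcher_dollars) / (total_ws - pitcher_ws)

print(f"pitchers:    {pitcher_per_ws * 1e6:,.0f}$ per Win Share")
print(f"nonpitchers: {nonpitcher_per_ws * 1e6:,.0f}$ per Win Share")
```

The nonpitcher rate lands a little under the "around 250,000$" quoted above, which is consistent with the rounding used in the post.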
Win and Loss Advancements (November 13, 2003)
Posted 4:43 p.m.,
December 9, 2003
(#44) -
tangotiger
Don't forget that 300 million of the pitcher salary is part of their minimum salary. It's not like a non-pitcher gets 2 minimum salaries (one for hitting and one for fielding).
I'll take a guess that since 700 million$ goes to pitchers above the minimum salary, that 300 million$ should go to fielders. However, because the distribution of the fielders is not like that of pitchers for various reasons, I'll say it's more like 100 to 200 million.
I've got some long process, and this is what I've come up with. Set the replacement level to .47 win advancements for pitchers and hitters (that works out to a "win %" of .420). Set the marginal $/win to 1.42. This will give us 860 million$ for pitchers and 1340 million for nonpitchers.
That 1340 is split as 480 minimum salary, 120 fielding, and 740 for hitting.
The 860 is split as 300 minimum salary, and 560 for pitching.
So, using 740, 560, 120, the breakdown is 52% hitting, 39% pitching, and 9% fielding.
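A short sketch of the split above, using only the figures given in this post (millions of dollars). The dictionary names are just labels for illustration.

```python
# Figures in millions of dollars, as given in the post.
nonpitchers = {"minimum": 480, "fielding": 120, "hitting": 740}
pitchers = {"minimum": 300, "pitching": 560}

# The components should add back up to the two group totals.
assert sum(nonpitchers.values()) == 1340
assert sum(pitchers.values()) == 860

# Share of the above-minimum money going to each skill.
above_minimum = {"hitting": 740, "pitching": 560, "fielding": 120}
total = sum(above_minimum.values())
shares = {skill: 100 * dollars / total for skill, dollars in above_minimum.items()}

for skill, pct in shares.items():
    print(f"{skill}: {pct:.1f}%")
```

Unrounded, fielding comes out nearer 8.5% than 9%; the post's 52/39/9 split rounds it up so that the three figures total 100.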
Win and Loss Advancements (November 13, 2003)
Posted 5:20 p.m.,
December 9, 2003
(#46) -
tangotiger
Hmmmm... you know, my numbers were based on 1600 nonpitchers, and 1000 pitchers (because I'm covering a 4-yr span). So, I think I have a problem. I should probably set everything up to the 4-year salaries, rather than annualized. It'll save me grief. Ignore the above for now, and I'll repost the new results tomorrow.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 6:42 p.m.,
November 16, 2003
(#3) -
tangotiger
I meant that I don't believe in absolute wins/losses at the player level. Players, in a team sport, get their value based on changing the state at a marginal level. This is just my perspective.
To convert from a marginal utility to a total utility will require some weird things, like negative wins.
If you are going to say that runs scored leads to wins and runs allowed leads to losses, you are still going to have some "playing time component" to carry the context, so that you will always know what the average or expected absolute wins and losses are for that player. And even your conversion from runs to wins needs to be aware of the entire context for you to make that relationship. I think you are really hiding the marginal impact inside an absolute total.
I look forward to seeing your post...
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 8:38 p.m.,
November 16, 2003
(#5) -
tangotiger
Exactly. Those things are MARGINAL. At the very least, your process starts with marginal utility.
While you and others try to convert that into total utility, I'm happy to leave it in marginal terms. Bonds had 60 win advancements and 30 loss advancements. Pretty clear. Why muddle it up by doing some conversion into total utility?
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 10:42 p.m.,
November 16, 2003
(#7) -
tangotiger
Btw, I like a common sense approach.
You see, a win probability added approach adds up after every single pitch, PA, inning, game, season.
However, any other approach doesn't have that luxury. For example, you lose a game 5-4. In any other approach, you either have to treat those 4 runs as worthless, or, you have to have a large sample of games to give those 4 runs some meaning. And, you would give a 3-4 or 5-4 result the same win impact for those 4 runs, even though in one instance they happened in a loss and another in a win.
So, I would say I can essentially debunk any win impact system by saying that it doesn't hold in at least one case.
Win Probability Added doesn't have that problem.
It has other problems, to be sure. (i.e., it requires you to have the win probability for every particular infinite context.) But, it is mathematically perfect.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 9:36 a.m.,
November 17, 2003
(#9) -
tangotiger
Well, in terms of actual wins, they *are* worthless.
If you believe that, then in terms of value of a player (past tense), you should only include the performance of a player in a team win, right?
(Again, I'm not talking about establishing a player's true talent level here.)
So, Nolan Ryan's league-leading ERA in his 8-16 season was pretty worthless to the Astros, given that they weren't able to leverage that performance.
When discussing MVP, we should only look at ARod's performance in Texas wins.
Hey, I don't have a problem with this approach.... but don't try to take a player's entire season's PAs and try to estimate what he did in the wins. As long as you only include a player's performance in wins, then, you're right, this process is justifiable.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 11:20 a.m.,
November 17, 2003
(#11) -
tangotiger
Like I said, at the very least, it should hold up at a game level. Otherwise, what you have is something that is not mathematically perfect.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 11:55 a.m.,
November 17, 2003
(#12) -
tangotiger
Oh, just to be clear. I'm not saying that a mathematically perfect process is the only way to go. I'm just saying that if you are not going to follow such a process, then you will have at least one hole in the process. How big that hole is, and what its impact is, needs to be understood.
Even a math perfect process might have other holes in it, as noted earlier.
What I'm talking about is something theoretical/fundamental, that if it doesn't add up at the actual game level, then that's an obvious hole in a "win impact" system.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 2:01 p.m.,
November 17, 2003
(#14) -
tangotiger
Good job, Colin!
I think you're on the right step with the conceptual part. This process is similar to my win advancement, in that outs lead to loss advancement and positive actions (not necessarily runs created) lead to win advancement (and the reverse for pitchers).
However, you still have to prove that "it works". The easiest way is to figure out how many wins above average your various players are in your system, and compare that to BaseRuns or Linear Weights, etc. I'm not sure that it'll work out fine though. But, it's a good first step.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 2:27 p.m.,
November 17, 2003
(#17) -
tangotiger
Mirroring reality is always the goal.
***
I wouldn't say the goal is to necessarily make the playoffs, otherwise a player like Arod might not have value. A player's goal may simply be for his team to win THAT game, and not say 85 or 90 or whatever games. If you stretch out otherwise, the goal is really to win the World Series. I think you've got to draw a line at either one game, or the World Series, and not have anything in-between, like playoffs, etc.
Since it is far far easier to figure out (game) Win Advancement rather than (World Series) Win Advancement, I'm happy to leave it this way.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 5:02 p.m.,
November 17, 2003
(#19) -
tangotiger
Thanks for taking the time to soak it all in.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 4:42 p.m.,
November 19, 2003
(#33) -
tangotiger
When we attempt to assign absolute wins and losses to a player what we are really trying to do is answer the question: In games that this player's team won, what percentage of those wins is this player responsible for?
If this is your question, then you should at the least split up a player's record as to whether the team won or lost.
Then, in all the team's wins, you don't care if a player went 0-20 or 0-150, as he didn't contribute to his team winning (though he made it harder for his team to win). So, there are no losses to hand out here.
And the reverse question for losses. And the fact is, negative wins and losses don't make any sense as an answer to these questions.
At the same time, you don't care if someone went 150-300 in his team's losses, since, again, it wasn't enough. So, there are no wins to hand out. As for losses, I suppose you could just make the outs proportional to the losses.
Of course, we don't have players' records by team wins/losses for most of history.
So, if you were really serious, you would try to estimate that. For example, let's say that when a team wins, they score 6.5 runs and allow 2.5 runs. You would create a complete batting line as well. Then, you take your player's overall performance, and translate into a team win and team loss setting, so that overall it still adds up. (Complicated, but doable).
Then, you can apply your absolute win/loss concept.
(Btw, I still think marginal win/loss advancement is the way to go, but this is just for those who want to go to the logical conclusion on absolute win/losses.)
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 12:06 p.m.,
November 20, 2003
(#35) -
tangotiger
(homepage)
Studes has another installment at the above link.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 4:34 p.m.,
November 21, 2003
(#39) -
tangotiger
I posted the following at fanhome, in response to David Smyth's comment on absolute wins.
====================
A guy hits a leadoff triple in the game, and the next batter hits a SF. Then, 53 consecutive outs get recorded.
Now, in the system that I construct that models the real world, I give each team a .50 chance of winning the game, because this is the environment in which each team finds itself (let's assume equality for my purposes). The leadoff triple will bring the win expectancy to say .59, making that triple worth +.09 wins. This win expectancy, .59, assumes a typical run scoring distribution to the end of that inning, and the game.
The next batter gets the SF, and brings the WE to .60, making that SF worth +.01 wins.
To the end of the game, the leading team's pitching will add about +.60 wins, and its batters will add about -.20 wins.
So, winning team has added +.09 + .01 + .60 -.20 = +.50 wins
To the end of the game, the losing team's pitching will add about +.20 wins, and the hitters will add -.60 wins.
So, the losing team has added -.09 -.01 +.20 -.60 = -.50 wins
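The bookkeeping in this example can be checked with a few lines. This is only a sketch of the ledger, using the win-expectancy changes quoted above; the dictionary labels are descriptive names added for illustration.

```python
# Win-expectancy changes quoted in the example above.
winning_team = {
    "leadoff triple": +0.09,
    "sacrifice fly": +0.01,
    "pitching, rest of game": +0.60,
    "batting, rest of game": -0.20,
}
losing_team = {
    "triple allowed": -0.09,
    "sacrifice fly allowed": -0.01,
    "pitching, rest of game": +0.20,
    "batting, rest of game": -0.60,
}

win_total = sum(winning_team.values())
loss_total = sum(losing_team.values())

# Each team starts at .50, so the winner must net +.50 and the loser -.50.
print(f"winning team: {win_total:+.2f}")
print(f"losing team:  {loss_total:+.2f}")
```

Every entry on one side has a mirror image on the other (the triple is +.09 for the hitter and -.09 for the pitcher who allowed it), which is why the two ledgers always cancel out at the game level.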
I submit that this model captures how the fan sees the game unfolding, and how that fan attributes the performance of the various players involved.
In the "absolute model", we start with the end of the game, 1-0. We then give 1.00 wins to the only players involved in the scoring, and in this case, it's the first 2 batters, and presumably, we give each batter 0.5 wins. The pitcher who threw to those two guys gets 1.0 losses.
Does anyone really think this captures reality?
I further submit that reality is that players contribute towards trying to win. That is, reality is that players have marginal contribution in a given real-time context. This is how we perceive and react to things (be it baseball or life).
I think that absolute wins have no meaning whatsoever to individuals that are involved in a group. NASA's space shuttle may blow up (wins = 0, losses = 1), but there are thousands of people that made substantial marginal contributions, but there was one person or a small group of people who failed miserably.
Furthermore, not only do absolute wins at a player level fail to model reality, their application is so limited as to be useless. They won't even come close to capturing a player's true talent level, they won't have any predictability, and they won't have any other potential secondary byproducts.
Marginal wins on the other hand have these properties. While that doesn't validate marginal wins, it's a happy by-product that makes marginal wins very attractive.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 3:05 p.m.,
November 24, 2003
(#43) -
tangotiger
I assume that more wins added are now credited to the two hitters than in the first example, and the pitcher has less wins added. But is the pitcher's shutout really any less valuable than in the first example? Does this capture reality?
In the case where the batters would score say the 1 run in the bottom of the 9th, I think the pitcher on the winning team would get about +.45 wins, the two batters would get +.35 wins, and the rest of the batters would get -.30 wins. (On the losing team, the pitcher would get +.10 wins, and the batters would get -.60 wins.) Again, I'd say, yes, this captures reality, since the impact, in real-time, is felt much much more in the bottom of the 9th than in the top of the 1st.
After-the-fact, of course they are the same. That's the difference between an absolute model, and a marginal model. One thinks in real-time, and the other in after-the-fact.
Win Shares tries to do both at the same time, by looking at things after the fact (and over a span of games), but still trying to give some credit to anyone who did something good, even if it did not result in a run or a win.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 3:39 p.m.,
November 24, 2003
(#47) -
tangotiger
Colin, I think we reached an agreement here, which is rather boring!
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 7:38 p.m.,
November 24, 2003
(#50) -
Tangotiger
There are many viewpoints, none of which is necessarily better, but rather different.
Here, we have the marginal approach in real-time, and there's the absolute approach after the fact.
The absolute approach described by David has very limited application. We cannot directly use it to establish a player's ability, since he is, by definition, removing all performances in hitter's losses, and pitcher's wins.
The marginal approach described by me has applications that are also limited in some sense. It can describe somewhat a player's abilities.
You can have other marginal approaches that only consider the base/out, or even no context. And the less context, the more you can describe a player's abilities, and the less you can describe how much impact he had in a given game or inning.
It's a long scale, with value on one end, and ability on the other end.
What metric you choose should be based on what you want to describe.
In the two metrics we are discussing, I'm saying the better model is the real-time marginal approach, because fans follow in real-time, and decisions are made in real-time. And ALL performances count, whether in a win or loss.
I have no qualms with David's approach, either. I just don't like it too much.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 11:12 a.m.,
November 25, 2003
(#52) -
tangotiger
Well, David's assumption is that hitters create runs and runs create wins. Pitchers allow runs and runs allowed create losses.
If you can't buy into that, then you won't buy into David's system.
So, right away, David completely removes essentially half a player's performances. And, he's right to do so, under the above assumption.
And, as David has pointed out elsewhere, when you compare to an average player, things do balance out somewhat ok. For example, a pitcher with 34 starts may get 0 wins created (by definition) and 6 losses created (say Pedro), but that an average pitcher would have been 0 and 12. So, from that standpoint, Pedro was 6 better than average.
But, you have to be able to accept his initial assumption. If you can't get past it, then you will not care at all for his approach.
What I like about this is that it doesn't try to take some marginal / win probability crack at it. He doesn't say, "well Pedro only gave up 1 run, and even though it was a 1-0 game, I know he did something good, so I've got to give him some positive win contribution, and I'll balance that out against the hitters", even though you may think like this. You see, this thought is a marginal approach thought, that people are trying to fit into some absolute construction (with the eventual result of things like negative wins, etc).
David's approach is completely different.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 9:35 a.m.,
November 26, 2003
(#55) -
tangotiger
Colin, from that standpoint then, there should be no problems with a negative total utility (i.e., negative win shares).
David, I can buy into it, it's just that I don't think it's terribly useful or revealing. Essentially you are only including those PAs of a player that lead to a run and in games with a win. So, imagine a player with 150 runs produced in 162 games, with 81 wins. You are basically only going to count 100 of his 650 PAs (100 of his 150 RP would be in a win).
Absolute wins are going to reveal very little, in my view.
What is good about this is that it draws a line as to what is an absolute looking-backward system, and what is not. If we compare it to Win Shares, Win Shares essentially assumes that a certain amount of hits and walks lead to runs, and assumes that a certain amount of those runs were in wins. It really hides what it really wants to do (after the fact look at contribution in wins), by making these assumptions.
Your system lays it all out, and is proud to only consider 15% of a hitter's PAs.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 11:20 a.m.,
November 26, 2003
(#57) -
tangotiger
I agree that WPA and AWP would be THE best measures to evaluate MVP (probably the least-important, yet the most discussed topic ever!).
But again, no offense to you or your system, how many people will buy into the AWP assumption? That is, look at the game after-the-fact, consider only performances that lead to a run in games that led to a win (for hitters). If you tell them that you only looked at the 15% of the PAs that the player had, what do you think they will say?
With WPA, the assumption is that you are measuring the performance in real-time, and all of the player's PAs. (I'm not tooting my own horn here, since this was first described as Player Win Average by the Mills brothers).
I dunno, but I guess that 90% of people would buy into WPA more than they would into AWP, from a "modeling reality" standpoint.
Furthermore, WPA has more application since it considers all of a player's performances. So, while that technically doesn't make it better, its side benefits easily trump AWP.
Again, limiting the discussion to "modeling reality", I don't think that AWP does (or at least it doesn't model reality as most people perceive it, would be my guess). However, it IS better than Win Shares.
Win Shares, Loss Shares, and Game Shares (November 15, 2003)
Posted 10:28 p.m.,
November 26, 2003
(#60) -
Tangotiger
David, that's a fair enough point. If you have 2 win-based systems that work in completely different ways, and they both tell you the same thing about Pujols, let's say, then that leaves little room for discussion.
Since both WPA and AWP are grounded in their very specific assumptions, unlike Linear-Weights-converted-into-wins or Win Shares or anything else (where these try to assume a standard conversion process from runs to wins using seasonal data), and since they both have very different perspectives, they can both live in harmony.
Pitcher's Hitting Performance When NOT Bunting (November 18, 2003)
Posted 11:12 a.m.,
November 19, 2003
(#2) -
tangotiger
Well, since he removed the potential bunt situations (i.e., I suppose man on 1b and 0 or 1 out), then that would remove that potential bias, I'd guess.
Pitcher's Hitting Performance When NOT Bunting (November 18, 2003)
Posted 1:47 p.m.,
November 19, 2003
(#4) -
tangotiger
and to think that he does is both arrogant and stupid, since somewhere on this earth lies a person or two who could figure it out if they (managers) would only bother to ask!
Anyone want to guess who will do most of the writing, and which person will do most of the sports talk shows?
Pitcher's Hitting Performance When NOT Bunting (November 18, 2003)
Posted 10:45 a.m.,
November 25, 2003
(#8) -
tangotiger
I'd say 6-9 months after our website comes up. The website would only come up in 2-6 months.
I hope people aren't too eager for it.... I have a full-time job and family, and I'm spreading the few hundred hours required to do this, here and there.
HOOPSWORLD.com Review: Pro Basketball Prospectus 2003-04 Edition (November 18, 2003)
Posted 11:09 a.m.,
November 19, 2003
(#2) -
tangotiger
It's a lot easier in basketball than in football. The plus/minus concept that's used in hockey can easily be used in basketball. The results of that can be adjusted to account for strength of opponent and teammates.
For football, I think the advance would come with a "Win Expectancy" chart based on yard line, yards to go, down, time left in half, and which half.
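One way such a chart could be laid out is as a lookup table keyed by those five state variables. The sketch below is purely illustrative: `GameState`, `bucket`, `build_win_expectancy`, the bin sizes, and the toy observations are all hypothetical, and a real chart would need a very large sample of plays.

```python
from collections import defaultdict
from typing import NamedTuple

class GameState(NamedTuple):
    yard_line: int             # yards from the offense's own goal line
    yards_to_go: int
    down: int
    seconds_left_in_half: int
    half: int                  # 1 or 2

def bucket(state, yard_bin=10, time_bin=300):
    """Coarsen a state so that sparse observations share a cell."""
    return GameState(
        state.yard_line // yard_bin * yard_bin,
        min(state.yards_to_go, 10),
        state.down,
        state.seconds_left_in_half // time_bin * time_bin,
        state.half,
    )

def build_win_expectancy(plays):
    """plays: iterable of (GameState, offense_won). Returns {bucketed state: win rate}."""
    wins = defaultdict(int)
    seen = defaultdict(int)
    for state, offense_won in plays:
        key = bucket(state)
        seen[key] += 1
        wins[key] += int(offense_won)
    return {key: wins[key] / seen[key] for key in seen}

# Made-up observations, for illustration only.
toy_plays = [
    (GameState(25, 10, 1, 900, 1), True),
    (GameState(28, 10, 1, 910, 1), False),
    (GameState(75, 3, 3, 120, 2), True),
]
table = build_win_expectancy(toy_plays)
print(table[bucket(GameState(25, 10, 1, 900, 1))])
```

The binning is a design choice to keep cells from being empty; finer bins need more data.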
HOOPSWORLD.com Review: Pro Basketball Prospectus 2003-04 Edition (November 18, 2003)
Posted 1:40 p.m.,
November 21, 2003
(#8) -
Tangotiger
I believe it is possible in football to separate out the individual offensive linemen by following a process similar to what I did in evaluating catchers' performance separate from pitchers when it comes to the running game and PB/WP.
You might not be able to get in-season performance evaluation, but you should be able to get it for career performance.
HOOPSWORLD.com Review: Pro Basketball Prospectus 2003-04 Edition (November 18, 2003)
Posted 4:44 p.m.,
November 21, 2003
(#9) -
tangotiger
Head Football Outsiders: I sent an email to one of you on the contact page. Can you confirm that someone will be getting back to me?
HOOPSWORLD.com Review: Pro Basketball Prospectus 2003-04 Edition (November 18, 2003)
Posted 11:17 a.m.,
November 25, 2003
(#13) -
tangotiger
I find any database program or Excel to fit the bill. I've got my own set of sim scores and projection systems, and I use Access and Excel for them. I prefer not to use C/Java simply because the DB program takes care of a lot of the mess that I don't have to worry about debugging.
I can't help you with too many details, because I don't want to get into an IP issue. But, if you are a huge baseball fan and you are proficient with math, sim scores are essentially a piece of cake. (I'll eventually release mine.) That said, I find that on the fun-to-useful scale, that they are 95% fun and 5% useful. (Not that there's anything wrong with that.)
Sports Quotes (November 18, 2003)
Posted 10:29 a.m.,
November 19, 2003
(#5) -
tangotiger
(homepage)
Talking about Bonds' 6 MVPs, the story says:
Among the four major North American professional sports, only the NHL's Wayne Gretzky has more MVP awards, with nine. The NBA's Kareem Abdul-Jabbar also won six MVPs.
Gordie Howe also had 6 MVPs I believe.
"I feel that Hank Aaron's record is the greatest single record in all of sports," Bonds said. "It's going to be a very difficult task to do. I'm prepared for the challenge. I just don't know if it's reachable."
The aforementioned Gretzky has about 50% more points than the #2 guy.
(Please, don't bring up Ks for Ryan.)
Grrr....
Sports Quotes (November 18, 2003)
Posted 1:01 p.m.,
November 19, 2003
(#7) -
tangotiger
According to my estimates, Cy Young and Nolan Ryan threw in the neighborhood of the same number of pitches. I forget exactly, but I think it was 95,000 or so for Ryan and 100,000 or so for Cy Young. And I'm sure that Ryan threw a lot harder. In terms of durability, no one can touch Ryan.
Young managed to pitch more innings and face more batters because back then batters were probably hackers like Alfredo Griffin and Vlad.
Persistency of reverse Park splits (November 20, 2003)
Posted 10:20 a.m.,
November 20, 2003
(#1) -
tangotiger
Bill James also did a similar study regarding the reverse platoon advantage (LH/RH v LP/RP), and came to a similar conclusion.
Persistency of reverse Park splits (November 20, 2003)
Posted 11:56 a.m.,
November 20, 2003
(#3) -
tangotiger
El Sid is an EXTREME FB pitcher, and also at the top of the list with low hits on balls in play (not that those 2 things are mutually exclusive). I would not be surprised if whatever park effect exists at Shea that he'd be most exposed to it (like Wade Boggs at Fenway).
Baseball Player Values (November 22, 2003)
Posted 4:35 p.m.,
November 22, 2003
(#1) -
tangotiger
There are many interesting links on that page. Notice how Gagne contributed more wins above average than any other pitcher in 2002? And 4 of the 5 pitchers were relievers?
The career totals show how few pitchers can break into the +40 win class.
Bert Blyleven does NOT do well at all with this system, though I'm not sure if any park adjustments were made. Check out Goose Gossage and Trevor Hoffman's impact here.
Baseball Player Values (November 22, 2003)
Posted 8:25 p.m.,
November 22, 2003
(#3) -
tangotiger
(homepage)
If, and only if, you want to do the hard work, at the above link I generated a sim of Win Expectancies using 1 million games. (I don't know how accurate it is, as I did it a few years ago.)
Do a search on this site for Phil Birnbaum, as I have a link to his work where he published the actual data from 1978-1990.
Then, just compare.
Baseball Player Values (November 22, 2003)
Posted 11:00 a.m.,
November 23, 2003
(#6) -
tangotiger
Park factors, by far the most important thing to consider, will be part of my (eventual) system.
As for the pitcher's totals, the numbers are completely appropriate if, and only if, you believe that his fielders were league average.
Since the question being asked is: "what is the win probability, given average conditions, at the bottom of the 5th, down by 1, bases empty, 1 out before and after this PA", then as long as that pitcher had average fielders at every position, then you do NOT have to worry about fielding.
While you can't say this for any given season, you can pretty much figure that this is the case for a pitcher's career. I'd be a little surprised if the effect for a given pitcher would amount to much.
Baseball Player Values (November 22, 2003)
Posted 11:50 a.m.,
November 24, 2003
(#10) -
tangotiger
Michael, can you do the following:
1 - figure out how many runs / BIP were saved by each team that Maddux pitched for
2 - multiply the above figure by Maddux's BIP, year by year
3 - Sum the above figures
This will give us the fielding support that Maddux received, relative to average, (assuming that the GB/FB IF/OF effect is also balanced).
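The three steps above amount to a weighted sum. A minimal sketch, where every number is a hypothetical placeholder (no real team or Maddux data is used) and `fielding_support` is a name made up for illustration:

```python
# Hypothetical placeholder rows:
# (season label, team runs saved per BIP relative to average, pitcher's BIP).
seasons = [
    ("year 1", +0.004, 600),
    ("year 2", -0.002, 700),
    ("year 3", +0.001, 650),
]

def fielding_support(rows):
    """Steps 1-3: team runs saved per BIP times the pitcher's BIP, summed over seasons."""
    return sum(runs_per_bip * bip for _, runs_per_bip, bip in rows)

total = fielding_support(seasons)
print(f"fielding support relative to average: {total:+.2f} runs")
```

A positive total would mean the pitcher's fielders helped him relative to average; it still assumes the GB/FB and IF/OF mix is balanced, as noted above.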
If you want to break it down by IF/OF to try to make it more Maddux-centric, feel free.
I have to believe that the effect for any given pitcher will be under 5 runs per season. I'd say over a career, it's probably under 50 runs. These are just guesses.
Baseball Player Values (November 22, 2003)
Posted 3:09 p.m.,
November 24, 2003
(#14) -
tangotiger
If someone has a little time, do they want to compare the results of my Win Advancements from 1999 to 2002 to Ed Oswalt's (at least for the pitchers)?
Since I have not yet factored in park, the differences would be entirely due to the win probability tables we would have used (mine generic/math, and his empirical for a given year). Like I said, you need hundreds of thousands of games to get a good table, and not just 2000 games.
Baseball Player Values (November 22, 2003)
Posted 3:55 p.m.,
November 24, 2003
(#16) -
tangotiger
The other way to get a decent estimate is to remember the work from Allen/Hsu.
We figure that the standard deviation of hits / BIP from fielders is about .006 to .008 or so. Giving Maddux 700 BIP means that number is around 5 hits. 2 SD (95%) gives you 10 hits, or about 8 runs. Since you get some turnover in fielding, it's unlikely that a team can continue to field great fielders year-in year-out. Perhaps over 15 seasons, the standard deviation for fielding may fall down to .003 to .004. Over 10,000 BIP, that works out to about 60 runs (for 95% of the pitchers).
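The estimate above is a chain of multiplications, sketched here. The 0.8 runs per hit is an assumption backed out of "10 hits, or about 8 runs"; the SD ranges and BIP totals are the ones quoted in the post.

```python
# Assumption: about 0.8 runs per hit, implied by "10 hits, or about 8 runs".
RUNS_PER_HIT = 0.8

def two_sd_runs(sd_hits_per_bip, bip, runs_per_hit=RUNS_PER_HIT):
    """Runs corresponding to a 2-SD swing in fielding support over `bip` balls in play."""
    return 2 * sd_hits_per_bip * bip * runs_per_hit

# Single season: SD of .006 to .008 hits per BIP over 700 BIP.
single_season = [two_sd_runs(sd, 700) for sd in (0.006, 0.008)]
# Career: SD of .003 to .004 hits per BIP over 10,000 BIP.
career = [two_sd_runs(sd, 10_000) for sd in (0.003, 0.004)]

print("single season, 2 SD: %.1f to %.1f runs" % tuple(single_season))
print("career, 2 SD:        %.0f to %.0f runs" % tuple(career))
```

Both ranges bracket the rounded figures in the post (about 8 runs for a season, about 60 for a career).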
Baseball Player Values (November 22, 2003)
Posted 4:54 p.m.,
November 24, 2003
(#17) -
tangotiger
RJ is the only pitcher that appears in all 4 years in the top 5. Ed has him at +24 and I have him at +23 (though I have him at +24 if I give 100% of the BIP to RJ).
I've emailed Ed, but have not gotten a response from him.
Baseball Player Values (November 22, 2003)
Posted 12:47 p.m.,
November 28, 2003
(#19) -
Tangotiger
I mentioned this elsewhere, but WPA will NOT be part of the book.
Baseball Player Values (November 22, 2003)
Posted 7:57 p.m.,
November 29, 2003
(#20) -
Tangotiger
Here are the top pitchers (inc active) not in the HOF, according to this system:
Glavine, Gossage, K Brown, Hoffman, Smoltz, Mussina, Schilling, Eckersley, Saberhagen, John Franco, Cone, Gooden, Hershiser, Appier, Lee Smith, Blyleven
Baseball Player Values (November 22, 2003)
Posted 2:37 p.m.,
November 30, 2003
(#22) -
Tangotiger
Wow, that must have been what I was thinking. You know, you can argue that any one of those 4 pitchers was the best pitcher since 1919. And you've got them all in their primes at the same time. Damn right pitchers aren't like they used to be.
Baseball Player Values (November 22, 2003)
Posted 5:05 p.m.,
November 30, 2003
(#24) -
Tangotiger
Those are all great points. I can easily see an extra 10 to 15 wins because of this.
Park and league factors are critical here.
Tendu (November 24, 2003)
Posted 9:20 a.m.,
November 25, 2003
(#6) -
Tangotiger
(homepage)
Studes, see above link. It was written by Alan Schwarz of Baseball America (who also did that Carl Morris article) for Newsweek.
Tendu (November 24, 2003)
Posted 11:54 a.m.,
November 25, 2003
(#7) -
tangotiger
Ron mentioned the training practice to me, which I will quote in full:
"I only use former players for recording this data, players who have many years of seeing these pitches. Every player must prove to me his knowledge of pitch movement, pitch grips, pitch follow through, pitching mechanics and prove it via tests before they are hired. Every data collector goes through a one month probation, proving that the hiring tests were not flukes and also proving that they will focus intensely and be dedicated to data accuracy. Roughly half of the guys who start are dropped during training, or more likely, they quit."
Tendu (November 24, 2003)
Posted 4:53 p.m.,
November 25, 2003
(#9) -
tangotiger
Well, his quote did say to prove it to him via tests. So, just as umpires are evaluated by their peers, Ron seems to imply that his tests were constructed by professional players (akin to the umpires). I don't see Ron as the judge (evaluator) here, any more than Sandy Alderson is the evaluator of the umpires.
Tendu (November 24, 2003)
Posted 11:13 a.m.,
November 26, 2003
(#13) -
tangotiger
The ideal would be to have a "FoxTrax Puck" that measures the precise location and spin of the ball at all times. Anything that tries to do that is great.
I suppose Questec tries to do that, and I'm sure it would be a great feed into Tendu.
The Tendu scorers apparently, according to one of the articles, spend 12 hours per game recording the various ball locations (strike zone, and field). I have to figure that you'll get SOMETHING of value. It's certainly better than not recording any pitch location whatsoever.
And, if the scorers are consistent, and there's quality control, it must be half-decent. And if you do it for 700,000 pitches, the poor quality that may exist (you still drive a car, and buy a computer, even though some parts break down don't you) will certainly not take away from the incredible benefit that you may gain.
Everything comes with a margin of error. The key is for Tendu, STATS, anyone else to SHOW what that margin of error is. It's the only right thing to do.
ABB# (November 24, 2003)
Posted 3:37 p.m.,
November 24, 2003
(#1) -
tangotiger
Finally, I ran a regression using all players from 1919 to 2002, with at least 300 PA, and the best-fit between OPS and Linear Weights was:
0.45 * OBA + 0.25 * SLG + .016
Or: (1.8 * OBA + SLG) / 4 + .016
So, whether you decide to use that ".016" to get the overall average to match the batting average is up to you (it may be more annoying than it's worth).
"1.8" is the best constant to use, but if you use 1.7 it's almost as good. (The advantage of 1.8 is not only in its accuracy, but it also bumps up the overall average a little closer to the real batting average.)
If you insist on NOT having the intercept, then the best-fit would be
(2 * OBA + SLG) / 4
which is OBA/2 + SLG/4.
My recommendation is to use this last one.
ABB# (November 24, 2003)
Posted 4:14 p.m.,
November 24, 2003
(#3) -
tangotiger
Compare the recommended new Aaron number to OBAxSLG, and you get an r-squared of .975. So, no sense in using a harder metric, when all you want is a quick estimate, right? As well, the scale of OBAxSLG is higher (.032 for batting average, .036 for the recommended Aaron number, and .041 for OBAxSLG).
To scale your number to batting average, you'd have to do something like:
0.88 x OBA x SLG + .15
which is not very satisfying.
I personally really like the Aaron number for its simplicity and power.
ABB# (November 24, 2003)
Posted 4:27 p.m.,
November 24, 2003
(#5) -
tangotiger
Just to give you some quick calculations, this is what OBA/2 + SLG/4 gives you:
OBA... SLG... Aaron
0.300 0.300 0.225
0.320 0.350 0.248
0.340 0.400 0.270
0.360 0.450 0.293
0.380 0.500 0.315
0.400 0.550 0.338
0.420 0.600 0.360
0.440 0.650 0.383
0.460 0.700 0.405
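As a quick check, the table above can be regenerated with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and `Decimal` with half-up rounding is used only so that values like .2475 round to .248 as in the table.

```python
from decimal import Decimal, ROUND_HALF_UP


def aaron_number(oba, slg):
    """OBA/2 + SLG/4, rounded half-up to three decimals."""
    value = Decimal(oba) / 2 + Decimal(slg) / 4
    return value.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


rows = []
for i in range(9):
    # Step OBA by .020 and SLG by .050, starting from .300/.300
    oba = Decimal("0.300") + Decimal("0.020") * i
    slg = Decimal("0.300") + Decimal("0.050") * i
    rows.append((oba, slg, aaron_number(oba, slg)))

print("OBA   SLG   Aaron")
for oba, slg, value in rows:
    print(oba, slg, value)
```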
ABB# (November 24, 2003)
Posted 4:36 p.m.,
November 24, 2003
(#7) -
tangotiger
I'll let Aaron put his stamp on his own metric. I'm just offering my recommendation.
ABB# (November 24, 2003)
Posted 5:09 p.m.,
November 24, 2003
(#9) -
tangotiger
Tango OPS correlates .999 to the Linear Weights rate stat whether you divide by 4 or not. Multiplying by a constant or adding a constant have no effect on r.
I know.
I could easily have said I ran the correlation prior to Aaron dividing by 4. I didn't mean to imply that dividing by some number led to the correlation.
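The point about constants is easy to confirm numerically. The sketch below uses made-up (OBA, SLG, run-rate) lines purely for illustration; they are not real players or real Linear Weights values, and `pearson_r` is a hand-rolled helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (OBA, SLG, run rate) lines, invented for this example
players = [
    (0.300, 0.350, 0.090),
    (0.330, 0.420, 0.115),
    (0.360, 0.480, 0.140),
    (0.390, 0.560, 0.170),
    (0.420, 0.600, 0.190),
]

raw = [1.7 * oba + slg for oba, slg, _ in players]
scaled = [value / 4 + 0.016 for value in raw]  # divide by 4, add a constant
target = [runs for _, _, runs in players]

r_raw = pearson_r(raw, target)
r_scaled = pearson_r(scaled, target)
print(r_raw, r_scaled)  # identical up to floating-point error
```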
What about correlating it to something based on BaseRuns, wouldn't that be better than linear weights?
No, for a player, Linear Weights is the better measure. Custom LWTS (generated from BaseRuns) would be even better, but when you look at 15,000 players, I doubt that custom LWTS will show any appreciable difference.
ABB# (November 24, 2003)
Posted 5:11 p.m.,
November 24, 2003
(#10) -
tangotiger
I also mentioned that:
1 - 1.7 * OBP + SLG has such a strong correlation to Linear Weights Ratio
as opposed to
(1.7*OBP+SLG)/4.
ABB# (November 24, 2003)
Posted 10:43 p.m.,
November 24, 2003
(#21) -
Tangotiger
Yes, I like aOPS and gOPS. The name will at least convey something, unlike the millions of other TLA out there.
TLA = three letter acronym
ABB# (November 24, 2003)
Posted 7:07 a.m.,
November 25, 2003
(#25) -
Tangotiger
heh... never associated Aaron to Hank Aaron.... I think "The Aaron Number" is better in that case, as it gives Gleeman his credit, and maybe better name recognition because of Hank.
ABB# (November 24, 2003)
Posted 10:12 a.m.,
November 25, 2003
(#26) -
tangotiger
Aaron has updated his site, with this GPA, along with the league leaders. To really show this off, I'm going to take Aaron's list, and include each player's EqA.
GPA EqA Player
0.425 0.420 Barry Bonds
0.364 0.362 Albert Pujols
0.363 0.345 Todd Helton
0.340 0.341 Gary Sheffield
0.328 0.325 Jim Edmonds
0.321 0.322 Brian Giles
0.317 0.321 Jim Thome
0.316 0.310 Richard Hidalgo
0.314 0.308 Luis Gonzalez
0.314 0.312 Lance Berkman
0.340 0.337 Top 10
GPA EqA Player
0.340 0.338 Carlos Delgado
0.339 0.341 Manny Ramirez
0.328 0.326 Alex Rodriguez
0.323 0.325 Trot Nixon
0.317 0.325 Jason Giambi
0.316 0.318 Frank Thomas
0.314 0.316 David Ortiz
0.314 0.317 Bill Mueller
0.312 0.318 Jorge Posada
0.308 0.310 Magglio Ordonez
0.321 0.323 Top 10
Except for Todd Helton, which we expected, that's a huge BINGO!
ABB# (November 24, 2003)
Posted 3:03 p.m.,
November 26, 2003
(#28) -
tangotiger
(homepage)
Just for those who are interested in my other comments on this subject, you can go to the above link, and also look for these posts: #72 and several starting at #136
I also made some comments at http://www.battersbox.ca
ABB# (November 24, 2003)
Posted 1:46 p.m.,
November 27, 2003
(#31) -
Tangotiger
The difference that I see is the "finality" of this scale. Once you put it on a run scale, you are now saying something specific. In this case, "a team of 9 hitters with an equal performance will produce this many runs per game". And, that is a b-**** thing to do.
At least, with Aaron's number, it's presented along a batting average scale, and so, it's saying that there's still at least another step to convert it into runs. Since Aaron's number and Linear Weights Rate have the same standard deviation, it's actually a pretty easy thing to convert it into runs for a player, within the context of a team.
With OBAxSLGx34, since the currency is already runs, that's it. Sorry, but I don't like it.
However, whatever floats someone's boat. If David thought long and hard for a R/G scale, then by all means. And if Aaron and lovers of EqA prefer the BA scale, so be it.
ABB# (November 24, 2003)
Posted 9:44 p.m.,
November 28, 2003
(#33) -
Tangotiger
What does that mean exactly? That Barry Bonds created 13.30 runs, within the context of an average team, per 27 of... whose outs? His actual outs? An average player's outs given the same number of PAs as Bonds?
What is the one-line definition of OBAxSLGx34?
ABB# (November 24, 2003)
Posted 7:30 p.m.,
November 29, 2003
(#35) -
Tangotiger
I just wanted to make sure we are talking about the same thing.
OTS*34 is a quick estimate, using seasonal data, of the number of absolute runs created by a batter per offensive game.
But, didn't you say earlier that you don't have the problem with the interactive effect? I still don't know what it means exactly that Bonds has 13 RC/game, in the real sense. You aren't talking about 9 Bonds interacting with himself, right? Are you talking about Bonds interacting with 8 players, and then, ..... what exactly?
An offensive game is 27 total outs by this batter, or 25.2 batting outs,
This is another problem. Again, Bonds may make 2.2 outs per 5 PAs, while everyone else makes 3.3 outs per 5 PAs.
I still don't know what it means that Bonds created 13 runs per game, when he himself will only have 5 PAs.
I know what you are trying to do, but I don't know what it is EXACTLY that is happening.
ABB# (November 24, 2003)
Posted 10:50 p.m.,
November 29, 2003
(#37) -
Tangotiger
David, I have no intents other than my direct questions.
OBA*SLG/(1-OBA) is RC/27, I believe. And the reason we don't like it is because of the interaction of having 9 Bonds on the same team.
Now, you said that by doing OBA*SLG*34 that you get a result similar to the Bonds + 8 average players, as best as I understand it. And you get a result of 13 runs. But, that doesn't make sense, which is why I've been asking for a clarification.
So, I ask again: what exactly does OBA*SLG*34 represent? What does 13 RC/27 for Bonds mean? Are you saying that the definition is:
"Barry Bonds created 13 runs per 27 of his own outs, within the context of that performance occurring with 8 league average hitters"?
ABB# (November 24, 2003)
Posted 2:46 p.m.,
November 30, 2003
(#41) -
Tangotiger
One of the GREAT things about putting something in a probability metric (like batting average and OBP are, and SLG is not) is that you can apply statistical probability distribution techniques. So, there are some benefits of the BA scale.
Again, a reader can prefer to look at the BA scale, and others the R/G scale.
And yes, I wanted to say that OTS*34 is a fudge, albeit, a very good one.
ABB# (November 24, 2003)
Posted 5:06 p.m.,
November 30, 2003
(#43) -
Tangotiger
In practical terms, Barry Bonds is really the limit here, and his BA scale corresponds just about exactly to Linear Weights rate. From that standpoint, it's a GREAT tool.
Baseball Graphs - Money and Win Shares (November 28, 2003)
Posted 12:44 p.m.,
November 28, 2003
(#2) -
Tangotiger
The way I calculate "earned" salary is wins above replacement times 1.5 to 2 million$ + 0.3 million$.
So, an average team with 30 players will have about 30-35 wins above replacement, which works out to a total team salary of around 70 million$.
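A minimal sketch of that salary rule, in millions of dollars. The function names are hypothetical, and the 1.75 default is just the midpoint of the stated 1.5 to 2 range, not a value from the post.

```python
def earned_salary(wins_above_replacement, dollars_per_win=1.75, base=0.3):
    """Earned salary for one player, in millions of dollars."""
    return wins_above_replacement * dollars_per_win + base


def team_salary(total_war, roster_size=30, dollars_per_win=1.75, base=0.3):
    """Total earned salary: every roster spot gets the base amount."""
    return total_war * dollars_per_win + roster_size * base


low = team_salary(30, dollars_per_win=1.5)   # low end of both ranges
high = team_salary(35, dollars_per_win=2.0)  # high end of both ranges
mid = team_salary(32.5)                      # midpoints

print(f"team salary range: {low:.0f} to {high:.0f}, midpoint {mid:.0f}")
```

With these inputs the range brackets the "around 70 million$" figure in the post.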
I'm not sure how different this is from this Win Shares process.
I don't think I yet believe in the extra value paid to the extreme player. This may be a byproduct of being a free agent or not.
Perhaps, studes can generate TWO different lines, one for those players who are 6-yr and over players (i.e., paid the fair market value), and the rest. I'll bet that you'll get 2 lines that will make far more sense and are far different (instead of the current 1 line that averages these 2 lines).
The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)
Posted 11:52 a.m.,
December 1, 2003
(#6) -
tangotiger
I don't know anyone suggesting that this is a totally new approach, although the expanded availability of play by play data makes this approach easier to use.
Every time I read an article on this topic, and any time a reference to the Mills brothers is not made, it always seems that the author of the piece is claiming a totally new approach being invented.
Also, the fact that OPS correlates well with Oswalt's results, or better yet that OBP and slugging weighted more appropriately correlates well with those results, tends to confirm that OBP and slugging used together are good quick and dirty measures of value. But that doesn't create a problem in attempting to get more information through the win probability approach.
You DO get more information through a WPA approach. But, is that information strictly a function of the game state context (inning,score,base,out), or is the identity of the hitter/pitcher also important?
By the r squared results reported, more than 10% of the variation in Oswalt's results remains unaccounted for by OPS, and more appropriately weighting OBP and slugging still leaves more than 6% unaccounted for. To the extent that an offensive player over the years has tended to perform well in the clutch, this would explain part of that unaccounted for variation. So it is added knowledge.
Side note: according to this, the weighting scheme is 1.6 OBP + SLG.
But, as Cy points out, there is very little "clutch persistency" year to year. If a player does have clutch ability, we can't find it in this measure (context-specific performance minus context-neutral performance or WPA minus OPS-generated-WPA).
In addition, the win added probability method would appear to have a lot of potential in measuring differences in value added by relief pitchers, where the manager has a lot of discretion in deciding whether a pitcher gets used in high leverage situations. For example, relief pitchers under a win probability approach get a lot more credit by protecting a one run lead than a 3 run lead. Moreover, if a save is blown, it makes a great deal of difference whether the relief pitcher has simply allowed the tying run to score, or whether he has allowed the tying and winning runs to score.
I don't think Cy's paper contradicts any of this.
There are some issues that need to be addressed to make the win probability approach more useful, including for example, coming up with an appropriate methodology to adjust for park effects, and to take into account the different run scoring environments between the two Leagues. But I believe that continued development of the win probability approach is an exciting area of ongoing study.
Agreed!!
The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)
Posted 12:08 p.m.,
December 1, 2003
(#7) -
tangotiger
(homepage)
One thought on using WPA or OPS-generated-WPA to estimate next year's WPA. Cy's paper says that OPS is a better predictor. The Hidden Game of Baseball noted the same thing: Linear Weights was a better predictor of the Mills brothers' Player Win Averages (PWA) in year x+1 than was PWA itself from year x.
This is similar to the claim Voros makes that a pitcher's next-year $H is better predicted by the current team $H than by the pitcher's own $H. The reason that this works is probably partly due to sample size. There are 6 good reasons for year-to-year r to be high, and only 1 of them is due to "ability" of the thing being measured. It might do well to review the Allen/Hsu paper for a list of these (see homepage link).
I'll reprint them here for review:
To recap, the year-to-year r is dependent on:
1 - how many pitchers in the sample
2 - how many PAs per pitcher in year 1
3 - how many PAs per pitcher in year 2
4 - how much spread in the true rates there are among pitchers (expressed probably as a standard deviation)
5 - possibly how close the true rate is to .5
6 - the true rate being the same in year 1 and year 2
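To see how items 2 through 5 drive the result, here is a sketch using the standard reliability approximation for a binomial rate. It is not taken from the Allen/Hsu paper: it assumes item 6 holds (the true rate is the same in both years), and it ignores item 1, which affects how noisy the observed r is rather than its expected value. The function name is hypothetical.

```python
from math import sqrt


def expected_year_to_year_r(true_sd, mean_rate, pa_year1, pa_year2):
    """Approximate expected r between two seasons of an observed rate.

    true_sd   : spread of true rates among pitchers (item 4)
    mean_rate : typical true rate; binomial noise peaks at .5 (item 5)
    pa_year1, pa_year2 : PAs per pitcher in each year (items 2 and 3)
    """
    true_var = true_sd ** 2
    noise1 = mean_rate * (1 - mean_rate) / pa_year1
    noise2 = mean_rate * (1 - mean_rate) / pa_year2
    return true_var / sqrt((true_var + noise1) * (true_var + noise2))


# Same ability spread, different sample sizes
print(expected_year_to_year_r(0.01, 0.30, 700, 700))
print(expected_year_to_year_r(0.01, 0.30, 200, 200))
# Same sample size, wider spread in true rates
print(expected_year_to_year_r(0.03, 0.30, 700, 700))
```

A low r under this approximation can come from few PAs or a narrow spread of true rates, without saying anything about whether the ability exists.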
So, I wonder whether the spread of WPA, which will be much larger than that of OPS-generated-WPA, affects the year-to-year r.
As well, WPA is a combination of: ability, high-leverage context PA, performance. It's possible again that a player may find himself in many high-leverage PAs one year and not a lot the next year, AND PERFORM EXACTLY THE SAME in both cases, but his WPA will be much different. Take Keith Foulke, say, who performs great year after year, but whose WPA will be much different, not because of his clutch performance, but because the number of high-leverage PAs is far different.
To do this study correctly, you would at least have to control for the number of high-leverage PAs being similar year-to-year. That is, the Leveraged Index (LI) needs to be used as well.
The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)
Posted 2:11 p.m.,
December 1, 2003
(#12) -
tangotiger
The great thing about Win Probability Added (WPA) is NOT about finding a new way to measure a player's performance. With all due respect, that's pretty boring. However, "lists" topics generate 100 times the responses than most other things. Find me one HOF and MVP topic (the ultimate lists topic) that doesn't generate at least 100 posts within 2 days.
The power of WPA is to make you see different things in new ways. As I always say, it's the journey, not the destination.
Leveraged Index is a direct result of WPA. Linear Weight win values are a direct result of WPA. How to walk Bonds is a direct result of WPA. Sac bunt in close/late situations? WPA. There are many things that you would use WPA to answer these types of questions.
But, when you consider the amount of noise/fluctuation in WPA, you've got to be very very careful.
As Cy pointed out, you can compare the incredibly very context-specific WPA, and the very context-neutral Linear Weights, and you get a high degree of correlation for players. WPA doesn't add much in the overall scheme of things.
It adds when you are trying to study a specific question as it relates to wins where it does not correspond directly to a random run (late/close situations).
The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)
Posted 2:54 p.m.,
December 1, 2003
(#16) -
tangotiger
Steve, I agree that we should use all those things you said, and not just OBP and SLG. That's a very good point, especially for speedsters. Since I have supplied the LWTS run values for 1999-2002 somewhere, it would be a simple enough task (I won't do it now, but someone else can if they like) to come up with the LWTS run value for all hitters, and compare that to WPA. So, the difference would be attributed to the number of high-leverage PAs, and the performance in those particular contexts.
No one, I don't think, is discounting performance in high leverage situations. What is in dispute is whether those performances were the result of some ability, as opposed to random statistical variation (i.e., luck).
Looking backwards, yes, absolutely, give credit to the players only. And, if some guy happened to be lucky in one game (Mark Whiten), one month (Shane Spencer), one year (Brady Anderson), then we do want to count that towards the player's impact to winning the game. He was after all holding the bat.
Looking forward, is it reasonable to think that a player's performance in high-leverage situations will be repeated? That is, if you can get say Manny Ramirez to "rise to the occasion from 1996 to 2003", does that mean that he's more likely than Joe Schlub to perform better in high-leverage situations, relative to their overall performances?
So, give Manny credit, past-tense, for his performance. Don't be so quick to think that, future-tense, he will continue to outpace himself in high-leverage situations.
Is Mariano Rivera a high-leverage pitcher? Some might say that his playoff performance is so out of this world that it must be clutch. Others would respond that he got a good run, and it happened to coincide with the playoffs.
Is Barry Bonds a low-leverage hitter? Up until 2002, his playoff performance was a disaster. Is he now a good clutch performer?
Like Walt Davis said in another post, it's not whether the ability exists, but rather if we have reliable methods to detect it.
The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)
Posted 3:32 p.m.,
December 1, 2003
(#19) -
tangotiger
I think your post is very well said.
I just want to add that if it's hard to detect (like "team chemistry"), then it's not worth trying to figure it out. Whatever signal you do find will be so small as to be useless to you in the amount of noise. For example, when I looked at 2001 Giambi and Ichiro, their clutch performances added 10 runs over their context-neutral performances. The likelihood is that if these guys did have clutch ability, the effect that we could measure would be around 2 or 3 runs per year (and more likely, 1 or 2). And, that's just not worth it, in my view.
The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)
Posted 10:33 a.m.,
December 2, 2003
(#23) -
tangotiger
I published this in an earlier post, and I'll repeat it here:
To recap, the year-to-year r is dependent on:
1 - how many pitchers in the sample
2 - how many PAs per pitcher in year 1
3 - how many PAs per pitcher in year 2
4 - how much spread in the true rates there are among pitchers (expressed probably as a standard deviation)
5 - possibly how close the true rate is to .5
6 - the true rate being the same in year 1 and year 2
So, when looking at year-to-year r, I'd be very very careful. As well, the process being followed in the last few posts smells of selective sampling. Combining the two, and only looking at guys in the middle (where there is very little spread), I would expect the r to be low. I don't think this shows anything.
Say we use student grades. Our students have a "true score rate" of 60% to 90%, but when you look at any one test, the spread will be between 0 and 100%. You can probably come up with a model that says to regress the scores of the student's single test by 30% to come up with your best guess as to each student's true score rate.
But, what if you only selected those students who scored between 80 and 100%? Or who scored below 50%? Or who scored between 50 and 80%? I'm not sure what's expected to happen, nor if this is a good statistical technique (my guess is that it's not, but I don't know).
The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)
Posted 3:58 p.m.,
December 2, 2003
(#25) -
tangotiger
Since pitcher mechanics DO change based on the base situation, we should at least see a change there. I wouldn't necessarily attribute that to "clutch" or some "character trait".
Now, does a pitcher "bear down" in close/late situations? Again, any change in late situations can also be explained by fatigue, so again, any change there would not necessarily be a character trait.
So, what are we saying then? We want to look for pitcher performances in the
- same inning/base/out situation
- but different score situation (say 1 run differential versus 4+ run differential)
Again, a pitcher may change his pitching approach based on the score differential to "preserve" his arm, and any differences in performances would not necessarily be a character trait.
So, my question is:
Are we looking to establish if we can find a character trait of a pitcher to "bear down", or are we looking to establish if we can find a difference in performance based on the leverage of the situation?
Ask the question that you are after first, and then, we can discuss how you can construct a model to find the answer.
The Problem With "Total Clutch" Hitting Statistics (December 1, 2003)
Posted 4:43 p.m.,
December 2, 2003
(#27) -
tangotiger
PWV/PA of .000224. Instead, he's at .0019, over 8 times his expected value.
Please don't multiply. This is what Bill James does in Win Shares to discredit Linear Weights. PWV/PA is a relative scale. I mean, if it was -.0001 and with OPS he'd be at +.0001, are you saying his value would be -1 times his expected? As I said, please don't multiply.
In your case here, the difference comes out to 1 win / 600 PA, which is fairly large.
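A small sketch of why the difference is the safer comparison on a relative scale. The function name is hypothetical; the rates are the ones quoted in this post.

```python
def wins_difference(actual_rate, expected_rate, pa=600):
    """Difference between actual and expected PWV/PA, scaled to wins per pa."""
    return (actual_rate - expected_rate) * pa


# The case from the post: .0019 actual versus .000224 expected
print(wins_difference(0.0019, 0.000224))  # about 1 win per 600 PA

# The sign-flip case: the ratio is -1, which is meaningless as a multiple,
# while the difference is a small, interpretable number of wins.
ratio = -0.0001 / 0.0001
diff = wins_difference(-0.0001, 0.0001)
print(ratio, diff)
```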
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 3:34 p.m.,
December 1, 2003
(#2) -
tangotiger
(homepage)
Jay, click the above link, and that should get you started.
(Note: I didn't know about regression towards the mean at the time I did that, and therefore, the numbers are a little unreliable. When I did the same process for pitchers, the pitchers peaked at age 24, and that's ridiculous. Again, all explained by regression towards the mean.)
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 4:18 p.m.,
December 1, 2003
(#7) -
tangotiger
I'm not sure I follow. Given the following as an example:
YR,PA,timesOnBase
2003,600,250
2002,400,100
2001,500,150
lg,1,0.333
So, first thing we convert that last line into 600 PA, so now we have
lg,600,200
Then, we multiply by the weight:
2003,600*5,250*5
2002,400*4,100*4
2001,500*3,150*3
lg,600*2,200*2
And we add to get:
total,7300,2500 (with 14 weights)
average,521,179
which gives you an OBA of .342
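The weighting steps in this example can be sketched as a short function. The name `marcel_rate` and its parameters are hypothetical; the 5/4/3 season weights, the league weight of 2, and the 600 PA league line come from the example.

```python
def marcel_rate(seasons, league_rate, weights=(5, 4, 3),
                league_weight=2, league_pa=600):
    """Weighted rate from recent seasons plus a league-average line.

    seasons: list of (pa, times_on_base), most recent season first.
    Returns (rate, weighted_pa, weighted_on_base).
    """
    # Start with the league line: league_pa PAs at the league rate
    total_pa = league_weight * league_pa
    total_on_base = league_weight * league_pa * league_rate
    for weight, (pa, on_base) in zip(weights, seasons):
        total_pa += weight * pa
        total_on_base += weight * on_base
    return total_on_base / total_pa, total_pa, total_on_base


seasons = [(600, 250), (400, 100), (500, 150)]  # 2003, 2002, 2001
rate, weighted_pa, weighted_on_base = marcel_rate(seasons, league_rate=200 / 600)
print(weighted_pa, round(weighted_on_base), round(rate, 3))
```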
Now, are you saying instead to use SQRT(600*5) for the first record, and so on? And then, after I get a total, take the square of that? And same for the "timesOnBase"?
If you are, then I wouldn't use 5/4/3/2, but maybe 25/16/9/4, or something else. That is, on my scale, my best-fit was with the 5/4/3/2 system. I'm not sure what my best-fit would be if I used SQRT.
Again, seeing that you have 2 great systems with DMB and PECOTA, and they are very intricate and pay a lot of attention to detail... and since Marcel does just as well, I don't see much benefit to adding all that complexity.... unless of course I would charge you for it, and therefore, precision would be a nice thing.
(I also didn't mention anything about putting stuff to league average, but again, no big deal if you do it or not.)
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 4:36 p.m.,
December 1, 2003
(#9) -
tangotiger
I'm thinking the weights for hitters should be:
5*2003PA, 4*2002PA, 3*2001PA, 2*600*sqrt[1800/(2003PA+2002PA+2001PA)]
So, for 1800 PAs, the regression towards the mean remains at 2/14, or 14%. But, for 900 PAs, the regression towards the mean would be 40% under your system, and 25% for Marcel. If I remember right, I would regress about 30-35% with 500 PAs. So, I don't think we need any further adjustment beyond what Marcel tells us.
(I also didn't mention anything about putting stuff to league average, but again, no big deal if you do it or not.)
You said that the final weight is for the league average and that the point is to regress to the mean, right?
I meant to say if you have something like 1987, where the overall league average was much different from 1986 and 1988, that maybe you'd want to adjust against that first. But, again, no big deal for the most part, since most players will be affected the same way, more or less.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 4:38 p.m.,
December 1, 2003
(#10) -
tangotiger
I must have calculated wrong... I think your system says 32% regression. Much better, but I'm not sure that that's correct.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 4:58 p.m.,
December 1, 2003
(#12) -
tangotiger
Ahh, but don't forget the number of PAs in the other 3 years are also different.
So, you'd have:
5*300 + 4*300 + 3*300 + 2*1.4*600
So, the regression towards the mean is effectively 32%.
For 500 PAs, that's now:
5*167 + 4*167 + 3*166 + 2*1.9*600, with an effective regression towards the mean of 53%.
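The square-root variant being discussed can be sketched like this. It assumes the league weight is 2*600*sqrt(1800 / total PA), as written a few posts up; the function name is hypothetical, and the unrounded square root is used, so results differ slightly from the 1.4 and 1.9 shortcuts.

```python
from math import sqrt


def sqrt_scaled_regression(pa_by_year, weights=(5, 4, 3),
                           base_league_pa=1200, reference_pa=1800):
    """Effective regression toward the mean under the sqrt-scaled league weight.

    pa_by_year: PAs for the last three seasons, most recent first.
    """
    total_pa = sum(pa_by_year)
    league = base_league_pa * sqrt(reference_pa / total_pa)
    player = sum(w * pa for w, pa in zip(weights, pa_by_year))
    return league / (player + league)


print(round(sqrt_scaled_regression((600, 600, 600)), 3))  # 1800 PA
print(round(sqrt_scaled_regression((300, 300, 300)), 3))  # 900 PA
print(round(sqrt_scaled_regression((167, 167, 166)), 3))  # 500 PA
```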
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 5:03 p.m.,
December 1, 2003
(#13) -
tangotiger
That's the really cool part about the "2" part of the equation. It's really "2*600". So, if the player's total PA over the 3 years is around 300 (5*100 + 4*100 + 3*100), the regression towards the mean is 50%. Put the PAs at 600, and regression towards the mean is now 33%. Put the PAs at 1200, and regression is 20%.
I.e, put things in terms of the ratio of the player to the regression total:
PA*4 : 1200
300*4 : 1200 = 1:1 ratio = 50% regression towards mean
600*4 : 1200 = 2:1 ratio = 33% regression
1200*4: 1200 = 4:1 ratio = 20% regression
1800*4: 1200 = 6:1 ratio = 14% regression
(the "4" is the average of 5/4/3)
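The ratio table above reduces to one line of arithmetic. A minimal sketch, with a hypothetical function name, using the average season weight of 4 and the 2*600 league weight from the post:

```python
def marcel_regression_share(total_pa, avg_weight=4, league_weight_pa=1200):
    """Share of the forecast that comes from the league mean.

    total_pa is the player's PA over the three seasons; avg_weight=4 is
    the average of the 5/4/3 season weights.
    """
    return league_weight_pa / (avg_weight * total_pa + league_weight_pa)


for pa in (300, 600, 1200, 1800):
    print(pa, f"{marcel_regression_share(pa):.0%}")
```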
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 9:38 a.m.,
December 2, 2003
(#20) -
tangotiger
By the way, in the Forecast Experiment, with the 28 players that were the hardest to forecast based on their up-and-down last 3 years, Marcel beat each of the other 6 systematic forecasters. Of course, since I never put out the equation until after the season was over, it's kind of "back testing", which isn't really fair. We'll see how it does in 2004.
My personal opinion is that the amount of precision that Diamond-Mind, PECOTA, ZiPS et al add is so slight that we don't learn too much. Those systems do have the advantage in using minor league data for the youngsters, and park factors for Coors hitters. You can also argue, slightly, that speedsters and power hitters might follow a different career path. But, all in all, for purposes of say fantasy drafts, Marcel should fit the bill.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 9:25 a.m.,
December 3, 2003
(#23) -
tangotiger
The same you would with anyone else, even if you knew his name was Barry Bonds. Because you've decided to only use 1 year of data, even if you have more, you always regress based on the sample size.
In your case, with say 600 PAs, you'd regress about 30-35% (a little less for HR,BB,K, and a lot more for nonHR hits).
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 12:04 p.m.,
December 3, 2003
(#26) -
tangotiger
(homepage)
If you go to my post#2, you will see that the younger player should get a bigger adjustment.
You may also want to read the article at the homepage link. At the end of the article you will get a great chart to use.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 2:17 p.m.,
December 4, 2003
(#28) -
tangotiger
Marcel, the Monkey... the monkey from Friends.
Marcel could also be Marcel Dionne.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 8:46 a.m.,
December 5, 2003
(#32) -
Tangotiger
MGL, I don't think you read me right. Suppose you are a long-time baseball fan, but you can't remember a number to save your life. And all you have are the 2003 batting stats.
It doesn't matter what his name is... you would still regress the 600 PA hitters around the same way.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 7:59 p.m.,
December 5, 2003
(#34) -
Tangotiger
Marcel only works correctly for starters
... but, since we usually only validate for players in year x+1 who have at least 300 PA.... well, you see where I'm going right? This is just like those MLE programs that try to forecast MLB performance using minor league data. The better the rookie happens to perform, the more PAs he gets... the worse he performs, he gets sent down. And so, the guys who play the most do better and get the higher weight.
No question, if you look at my "Talent Distribution" article, you can create a rather simple function between PAs and true talent level, and you can regress based on that.
I was thinking of changing Marcel so that you would regress to 90% of league average. While this would be a little less accurate for the regulars (they should regress to 105%), it would be more accurate for rookies and sophs.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 8:13 p.m.,
December 5, 2003
(#35) -
Tangotiger
To continue...
If I have a regular with 1800 PAs over the last 3 years, and I have his OBA at .400, and the league is .333 (at 600 PA), this would work out to a forecasted OBA of .390.
For a rookie, he'd by default get .333.
However, if I use say .300 as my "league" or "regression towards the mean", my regular comes in at .386, and my rookie comes in, again by default, at .300.
Against this though is the few number of rookies. So, the regulars should regress to .340, the rookies to .300, but my guess is that if I try to minimize the differences, I'd probably end up using .333 for all players. I think.
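The effect of changing the regression target can be checked with a short sketch. The function name is hypothetical; it uses the same simplification as elsewhere in this thread (average season weight of 4, league weight of 2*600).

```python
def forecast_oba(player_oba, total_pa, regress_to,
                 avg_weight=4, league_weight_pa=1200):
    """Blend a player's OBA with a regression target, Marcel-style."""
    player_weight = avg_weight * total_pa
    blended = player_weight * player_oba + league_weight_pa * regress_to
    return blended / (player_weight + league_weight_pa)


# Regular: 1800 PA at .400, regressed to .333 versus .300
print(round(forecast_oba(0.400, 1800, 0.333), 3))
print(round(forecast_oba(0.400, 1800, 0.300), 3))

# Rookie with no PAs: the forecast is just the regression target
print(round(forecast_oba(0.0, 0, 0.300), 3))
```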
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 10:12 a.m.,
December 6, 2003
(#38) -
Tangotiger
If you know literally nothing about this player named Bonds (he could be named Savkjgbjd for all you know), then isn't it so obvious that it won't affect the regression that it's not worth even mentioning?
No, it is not obvious, considering that someone asked the question. The original question was:
How would you regress a player with only 1 year of data available?
If Player X emerged fully formed out of Zeus's forehead and hits .360 in 2004, without knowing anything else about him, how would you predict he would hit in 2005?
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 9:39 a.m.,
December 8, 2003
(#41) -
tangotiger
(homepage)
There was a good discussion on regression towards the mean at the above link.
I concluded that the regression towards the mean should not be based on rates, but on RATIOS (odds).
If you look at Rob Wood's post near the end of that, I looked for a best-fit to match it, and I end up with the following process:
======================
1 - Take 318 / PA. Call this the "RegressionRatio", or rr.
2 - RegressionRate = rr / (1+rr)
The 318 would change based on whatever it is you are looking at. I'm not sure of the significance of 318, other than it's close to SQRT(100,000).
I'm sure I stumbled upon something really cool, but as an amateur, I really don't know why it works so well.
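The two steps above can be sketched as follows. The function name and the default constant argument are illustrative; only the 318 and the rr / (1 + rr) step come from the post. Algebraically, rr / (1 + rr) is the same as 318 / (318 + PA), which may be why it behaves so well: it is the share of the estimate that belongs to the mean when 318 PA of "average" are mixed in with the observed PA.

```python
def regression_rate(pa, k=318.0):
    """Fraction to regress toward the mean, given PA and the constant k."""
    rr = k / pa            # step 1: the "RegressionRatio"
    return rr / (1 + rr)   # step 2: convert the ratio (odds) into a rate


for pa in (100, 318, 600, 1800):
    direct = 318.0 / (318.0 + pa)   # equivalent closed form
    print(pa, round(regression_rate(pa), 3), round(direct, 3))
```

At PA = 318 the result is exactly one half, so 318 plays the role of "the number of PA at which you regress halfway".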
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 11:20 a.m.,
December 8, 2003
(#43) -
tangotiger
To just clarify, while Rob's process did only involve the error from the binomial process, my contention is that if I modify that "318" constant to tailor what I want to regress (HR/PA, nonHRhits / BIP, etc), that I get a function that would work for any type of process. (I'd do that by looking at empirical year-to-year data.)
The complaint by the readers is that you should not regress a guy who went say 90 for 100 to the same "rate" degree as the one who went 40 for 100. The ratio process takes care of this in a more realistic fashion.
That said, I look forward to you and AED discussing this regression topic, and will be happy to watch.
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 12:25 p.m.,
December 10, 2003
(#49) -
tangotiger
AED: I'd be interested in seeing the results of your methodology against Marcel. How much extra accuracy does your process add?
Marcel, The Monkey, Forecasting System (December 1, 2003)
Posted 10:08 a.m.,
December 11, 2003
(#52) -
tangotiger
Technically, it should be weighted by 5*PA(yrX) + 4*PA(yrX-1) + 3*PA(yrX-2).
Hockey Summary Project (December 1, 2003)
Posted 9:42 a.m.,
December 2, 2003
(#2) -
tangotiger
(homepage)
The hockeydb site is linked above.
I agree, it's an incredible site, thanks to Ralph Slate.
Hockey Summary Project (December 1, 2003)
Posted 10:53 p.m.,
December 2, 2003
(#5) -
Tangotiger
(homepage)
Click on the above link. That is the best one I've come across.
Bases Batted Forward (December 3, 2003)
Posted 9:30 a.m.,
December 4, 2003
(#3) -
tangotiger
David, I added the thing in parentheses exactly for this reason.
From the perspective of player evaluation, we ARE talking about odds and probabilities. It's not about getting bases, it's about increasing the odds that your team will win.
I believe that you are 100% wrong on the issue when you say "I think they are the real deal, and will eventually be the basis for sabermetrics." I say that the basis of sabermetrics is and always will be the marginal change of win probabilities (in baseball, football, hockey, basketball, soccer, and any other team sport).
Bases Batted Forward (December 3, 2003)
Posted 2:22 p.m.,
December 4, 2003
(#5) -
tangotiger
Just to expand on my point:
What are we after? We are trying to figure the impact that a player has on a given team.
What kind of impact? The only kind that matters: win impact.
And what does win impact mean? It's the marginal impact this player has on this given team in contributing to wins.
And how do you figure marginal impact? It's the change in win probability that this player is responsible for, given this team.
How you establish this win impact is fun exercise #1.
How much this win impact will cost you is fun exercise #2.
Outs? Bases? Runs? Nope, nope, nope. Wins.
Bases Batted Forward (December 3, 2003)
Posted 5:17 p.m.,
December 4, 2003
(#8) -
Tangotiger
In terms of evaluating a player's true talent level, what you want is a marginal win probabilistic approach. And, in that sense, not all bases are created equal (from an individual player standpoint). The backward looking stuff is really just for fun, and has no real-world impact. MVP, HOF, all that stuff... just fluff really. The good stuff is estimating the future, and how much to pay for it. And, you can't do that with a base approach and think you'll get it better than a win approach.
Anyway, I really have nothing more to add to this topic, so I'll bow out at this time.
Bases Batted Forward (December 3, 2003)
Posted 11:15 p.m.,
December 4, 2003
(#11) -
Tangotiger
I meant that I want to bow out, because I feel I've exhausted anything I have to say. I'm bored with it! So, it's a preemptive strike, in case someone wanted to continue this with me specifically. It's not like I said something so new, and decided to run for the hills. I just feel like a broken record at this point, like a broken record at this point. In any case, if I receive a personal email on any topic, I almost always respond.
As for WPA, the cool part is all the little things to try to figure out the components. The end-result, the list, is really a byproduct. But, it's these lists that usually excite people. I get my kicks on figuring out the win probability matrix for inning/score/base/out/park, and splitting hitting from running and pitching from fielding. As for superLWTS, WPA will one day supplant it.
Bases Batted Forward (December 3, 2003)
Posted 11:19 p.m.,
December 4, 2003
(#12) -
Tangotiger
As for the snob comment, I felt I was being courteous in announcing my departure, especially considering that there's a reasonable expectation that I respond to all comments on this section of Primer, seeing that I start all the threads.
Bases Batted Forward (December 3, 2003)
Posted 8:22 p.m.,
December 6, 2003
(#15) -
Tangotiger
WPA measures theoretical marginal win impact based on actual performance in real-time in a context-specific environment. That, to me, is value.
WPA is easily adjustable to account for the opportunity context to measure theoretical win impact in a context-neutral setting (i.e., superLWTS). However, superLWTS doesn't account for everything that happens, while WPA does. (MGL has decided that some things are just luck, so why account for it. I give credit to the runners for WP, BK, etc, and a whole host of other things.)
To make it an ability metric, you need to regress the metric. Each of the subcomponents would have its own regression rate. (And, maybe the WP would regress 100% for the runners... I don't know. Same for clutch performance, etc.)
superLWTS is somewhere between a value and an ability metric. MGL has implicitly decided to regress 100% all those components that he doesn't include (like WP with a runner on base), and regress 0% all the other components.
Baseball Musings: Defense Archives (December 5, 2003)
Posted 11:32 p.m.,
December 5, 2003
(#1) -
Tangotiger
(homepage)
I also want to direct the readers to a rather interesting (and at one point, orgasmic, for me anyway) discussion on fielding at fanhome.
Baseball Musings: Defense Archives (December 5, 2003)
Posted 8:12 p.m.,
December 6, 2003
(#5) -
Tangotiger
To close this off, yup, I was being sincere, and not sarcastic.
Baseball Musings: Defense Archives (December 5, 2003)
Posted 11:54 a.m.,
December 8, 2003
(#7) -
tangotiger
Want to guess what a backup fielder's value is?
What I did was take all of David's data by position, and separated them into two pools: regulars and backups. To qualify as a regular, he had to be in the top 30 at his position in "expected outs". That's essentially a "playing time" component. Everyone else at that position was considered a backup. (You do have a problem with multi-position players, but, we can take care of that in a future study.)
Here is the total for SS:
regular expected outs: 12,372
backup expected outs: 3,249
total SS expected outs: 15,620
So, based on this, we can say that the regular shortstops played 79% of the time, and the backups 21%. So, to put this in a "162 game context", the 30 regular shortstops played .79 x 162 x 30, or 3849 games, and the backups played 1011 full games.
How many outs did the backups actually record? 3225, or a total of 24 fewer than expected if they were average. Seeing they played 1011 full games, to put that into a 162 game context, we get -24 / 1011 x 162 = -4 outs.
So, the backup SS is worth -4 outs compared to the average SS. How do all the positions do?
3 (4)
4 (0)
5 (2)
6 (4)
7 4
8 (6)
9 4
You'll notice that the backup LF and RF are BETTER fielders than their regulars. This makes sense to some level, since you've got great hitters/bad fielders at those positions. As well, I wouldn't be surprised that some CF are also playing at LF and RF (i.e., double-counting the regular CF as a backup LF/RF).
In any case, what happens if we add all that up? A team of backup fielders is worth a total of -8 outs compared to a team of average fielders. So, -8 outs per team, or -6 runs per team over a season. That's about -1 run per position.
Essentially, a team of backup fielders is worth around the same as a team of regular fielders.
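The shortstop arithmetic in this post can be sketched as below. The inputs are the expected and actual out totals quoted above; the function name and argument names are illustrative only.

```python
def backup_outs_per_162(regular_expected, backup_expected, backup_actual,
                        teams=30, games=162):
    """Outs above (+) or below (-) average for a backup, per 162 full games."""
    total_expected = regular_expected + backup_expected
    backup_share = backup_expected / total_expected     # share of playing time
    backup_full_games = backup_share * teams * games    # "full games" played by backups
    outs_vs_average = backup_actual - backup_expected
    return outs_vs_average / backup_full_games * games


# SS figures from the post: 12,372 regular and 3,249 backup expected outs,
# 3,225 actual outs recorded by the backups
ss = backup_outs_per_162(12372, 3249, 3225)
print(round(ss, 1))
```

Running the same function over each position and summing the results gives the team-level total discussed above.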
Baseball Musings: Defense Archives (December 5, 2003)
Posted 12:10 p.m.,
December 8, 2003
(#8) -
tangotiger
Note: because David's list does not include all players, the backups list might actually be worse. If I look at the regulars, they come out to being +5 better than average in almost 75% of the playing time. If we include the missing players, the playing time of the regulars probably drops to around 72 or 73%. In order for everything to add up, +5 for 72% of playing time for the regulars matches up to -18 outs for the 28% of playing time for the backups. I'm a little bothered that the few players that didn't make David's list could have that much of an effect. If you take that number, that works out to -14 runs per team, or -2 runs per position that the backup fielder is worth compared to the average.
Baseball Musings: Defense Archives (December 5, 2003)
Posted 12:14 p.m.,
December 8, 2003
(#9) -
tangotiger
Finally, putting the regular pool at 45 players per position, here's what the list looks like:
Pos  ExpectedOuts  ActualOuts  %PT   Outs/162
3    1290          1281        0.14  (2)
4    1929          1924        0.12  (1)
5    1656          1622        0.14  (8)
6    1174          1149        0.08  (11)
7    1314          1325        0.15  3
8    1599          1548        0.13  (13)
9    1623          1644        0.17  4
That's a total of -29 outs for a team of backup fielders, or about -3 runs per fielder per 162 GP.
Baseball Musings: Defense Archives (December 5, 2003)
Posted 2:08 p.m.,
December 8, 2003
(#11) -
tangotiger
I performed a similar analysis using 1999-2002 UZR last year, and the results were similar there as well. The typical backup fielder is only a couple of runs worse than the typical starting fielder.
**************
Here's another one you may like. For each player, I estimated his "BIP" as actual outs divided by 0.7. How does that make any sense? Well, the average DER is around 0.700 for a team. So, giving each player the responsibility, on average, to convert 70% of his BIP into an out, we can get his BIP. It's a useful measure for what I'm about to do...
... Now that we know each player's opportunity context (BIP), we can figure out how much 1 standard deviation works out to. Taking Andruw Jones as the example, he had 362 expected outs or 517 "expected assigned" BIP. One standard deviation would come out to .020 outs / BIP (sqrt(.3*.7/517)). Andruw Jones is +28 outs per 517 BIP, or 2.7 standard deviations (28/517 / .020).
How do all the 640 player-positions do from Pinto's model? 435 of those player-positions were within 1 SD, or 68%. That number should appeal to many people: 68.3% of all samples in a normally distributed population will be within 1 SD of the mean. 102 player-positions were to the right of 1 SD, and 103 were to the left of -1 SD. What you've got is a fairly normal distribution of fielding talent, and NOT at all any kind of pyramid-shaped distribution of fielding talent.
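A minimal sketch of the standard deviation step, using the Andruw Jones numbers from the post. It treats each ball in play as a binomial trial with a 0.7 chance of becoming an out, as the post does; the function names are illustrative.

```python
import math


def binomial_sd_per_bip(bip, out_rate=0.7):
    """One standard deviation of the out rate, in outs per BIP."""
    return math.sqrt(out_rate * (1 - out_rate) / bip)


def sds_from_average(outs_above_average, bip, out_rate=0.7):
    """How many standard deviations a fielder sits from average."""
    return (outs_above_average / bip) / binomial_sd_per_bip(bip, out_rate)


bip = 362 / 0.7   # expected outs converted to assigned BIP, about 517
print(round(binomial_sd_per_bip(bip), 3))
print(round(sds_from_average(28, bip), 1))
```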
Seeing that we can model fairly accurately the talent distribution of fielding, it becomes a snap to figure out how good a fielder can be at fielding.
The leader in BIP was Tejada at 807, with a group of 2b and ss at around the 720 BIP mark. Essentially, you can say that you've got 800 BIP. At that level, 1 SD is .016 outs / BIP. With 95% of the players being within 2 SD, and I think 99% being within 3 SD (someone can correct me), you can essentially say that you've got an upper boundary of 3 x .016 outs / BIP, as to how much a fielder can be better than average. That's .05 outs / BIP. With 800 BIP, that's 40 outs per season, or 32 runs. So, 30 runs is really about the most a fielder can contribute, with two-thirds expected to be within 10 runs. (When I repeat this at the position level, it goes from a low of 23 for 1B/LF/RF to a high of 30 for SS/2B).
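The upper-bound reasoning in this paragraph can be sketched as below. The 0.8 runs per out is an assumption inferred from the post's "40 outs ... or 32 runs"; the function and parameter names are illustrative. Without the post's intermediate rounding (.016, then .05), the result comes out slightly under 40 outs and 32 runs.

```python
import math


def fielding_cap(bip=800, out_rate=0.7, n_sd=3, runs_per_out=0.8):
    """Rough ceiling on outs and runs above average for one fielder-season."""
    sd_per_bip = math.sqrt(out_rate * (1 - out_rate) / bip)
    max_outs = n_sd * sd_per_bip * bip
    return max_outs, max_outs * runs_per_out


outs, runs = fielding_cap()
print(round(outs, 1), round(runs, 1))

# Fewer BIP (for example at 1B/LF/RF) lowers the ceiling
print(fielding_cap(bip=500))
```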
Could we have gotten that .05 outs/BIP in other ways? Sure thing. Bring up the ZR chart for any position. You will note that the league leaders and league trailers are around +/- .05 outs / BIP.
So, what you've got is a real hard cap here as to how much a fielder can add.
Baseball Musings: Defense Archives (December 5, 2003)
Posted 2:16 p.m.,
December 8, 2003
(#12) -
tangotiger
Limiting it to the 210 regulars (30 per position), we have 146 players within 1 SD (69.5%), 35 to the right of 1 SD, and 29 to the left of -1 SD. Regulars, backups, whatever. The distribution of fielding talent is virtually the same (though the mean of the regulars is ever so slightly better).
Baseball Musings: Defense Archives (December 5, 2003)
Posted 7:22 p.m.,
January 15, 2004
(#22) -
Tangotiger
Michael,
If you like, I can post it at Primate Studies in the morning.
Tom
By The Numbers, Dec 7 (December 8, 2003)
Posted 11:52 a.m.,
December 10, 2003
(#4) -
tangotiger
The attendance research is one that I was interested in doing, so I'm happy to see something like this.
The best-fit equation is listed as:
ATT/AVE = 2.7525 * (WIN) -.3769
where, WIN = win% and ATT/AVE is the attendance relative to league average.
So, setting the league attendance to 2.4 million, we see that the marginal change of 1 win in a season (.006 wins/game), will increase attendance by 1.7%. If I remember correctly, Pete Palmer, in The Hidden Game of Baseball noted that 1 marginal win changed attendance by 2%.
Assuming that attendance and TV revenue and all those revenues also increase by 1.7%, and assuming 120 million$ of team revenue, 1 marginal win generates 2 million$ in revenue. The 2% rule would specify 2.4 million$, and Voros' research comes out with 2.65 million$.
My own research on replacement level, coupled with the total amount of salaries paid out, shows a marginal win costs, in salary, somewhere around 1.4 to 2.0 million$ / win.
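The attendance arithmetic above can be checked with a short sketch. The coefficients are the ones from the best-fit equation quoted in this post, and the 120 million$ revenue figure is the post's own assumption; the function names are illustrative.

```python
def relative_attendance(win_pct):
    """ATT/AVE from the quoted best-fit line."""
    return 2.7525 * win_pct - 0.3769


def marginal_win_effect(team_revenue=120e6, games=162):
    """Change in relative attendance from one extra win, and its revenue value."""
    pct_change = 2.7525 / games          # slope times 1/162 of win%
    return pct_change, pct_change * team_revenue


pct, dollars = marginal_win_effect()
print(round(relative_attendance(0.500), 3))   # a .500 team draws about league average
print(round(pct * 100, 1), round(dollars / 1e6, 2))
```

Swapping in 0.02 for the slope-based percentage gives the 2.4 million$ figure from the 2% rule.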
By The Numbers, Dec 7 (December 8, 2003)
Posted 12:00 p.m.,
December 10, 2003
(#5) -
tangotiger
(homepage)
Baltimore was one of those strange teams that I noticed as well. (See homepage link)
Baltimore was a very good team in the 70s, but they just didn't draw. Even when they weren't playing so well in the 80s, their attendance was poor. Only when they went into Camden Yards did things change drastically.
I think Baltimore should be split into two cities. The Baltimore pre-Camden and the Baltimore/Camden era. I think there was a whole redevelopment of the area, and not just the park. I understand it's the same people, but there's a completely different mindset here. Other areas with new parks don't also do a massive redevelopment of the entire area.
At the least, I would introduce an extra variable called "massive neighborhood redevelopment", and give a "1" to the Baltimore/Camden years.
Evaluating A-Rod (December 8, 2003)
Posted 9:42 p.m.,
December 8, 2003
(#3) -
Tangotiger
I was just taking a look at aging patterns a few weeks ago. I really didn't see anything with K/BB ratios, or batters who are hackers. I did notice something with fast players (they age slightly better), and fast players with no power (they also age well, presumably because they have a bit more potential).
You might want to ask Nate Silver, as I'm sure he has taken a look at this. I can present some of my preliminary research, but I didn't really want to, because I really didn't do any thorough work on it.
****
I may generate 200,000$ of marginal revenue for my company, but my salary won't be that. If a company is going to spend say 10 million$ on a hitter, they don't want him to generate 10 million$ of revenue (that's not a good ROI), but more like 12 to 14 million$. And that 12 to 14 million will really be more like 6 to 20 million$ (lots of variability in player performance).
I would consider a player like a junk bond (or maybe the coupon rate of a bond), though perhaps others here have a better security to compare them to.
Evaluating A-Rod (December 8, 2003)
Posted 10:48 a.m.,
December 9, 2003
(#6) -
tangotiger
You can contact Nate from the "contact us" link at BP.
I was just taking a look at my sim scores (great way to waste half an hour). Nomar is not just a hacker, but a smart, strong, and fast hacker, which makes him different from most. Right now, I have my program set up to look at players aged 23 to 26 only. (Nomar was born in 1973, and so, his age 23-26 would be 1996 to 1999). And I haven't lg/park adjusted yet.
His 4 best comps among players from the same age group are:
Billy Williams, Tony Oliva, Ruben Sierra, Reggie Smith
Hope this helps.
And I spoke too soon regarding the profiles of players that don't age differently, so I'll take that statement back.
Evaluating A-Rod (December 8, 2003)
Posted 10:50 a.m.,
December 9, 2003
(#7) -
tangotiger
Studes, I don't think your last statement is any different than mine:
Sorry, but I would essentially cap anybody's salary at 15 million$ / year (with anything extra being marketing driven).
Of course, you can argue that wins in certain markets or conditions, like playoffs, might generate more money.
Evaluating A-Rod (December 8, 2003)
Posted 11:51 a.m.,
December 9, 2003
(#10) -
tangotiger
Seriously though, if a player's marginal revenue is $12-$14M I'm sure teams would love to pay him $10M but some other team would offer $11M.
I'm sure that's how Kevin Brown got his 15 million/year.
I can't see any reason for teams to be risk averse with respect to player's salaries
Uncertainty as to their actual true talent level + uncertainty as to their expected true talent level = RISK! So, while ARod may have produced over the last 4 years at say +9 wins above average / year, that's not what his true expectation is today, nor would it be for the next 4 years.
and I'd think that teams would bid up a free agent until his salary equaled his expected marginal revenue.
I think they used to, and now they don't.
I don't think the investment analogy is a good one since (for the most part) you are receiving the revenue at the same time as you're paying the salary.
I don't think it's that close, but it might be. A win today will add some money tomorrow, but a bit more the weeks after, and even some next year. So, I think there's probably a 6-month lag between revenue and performance. Plus, the playoffs too add more risk, as this is an all or nothing revenue source.
In fact, with a small pool of free agents and a situation with, say, one star shortstop but several bidders for that shortstop with varying expectations we might expect a little winner's curse and a salary greater than the expected marginal revenue.
Because we can't "short" a player's stock, the market price is not the average market price, but the highest price a market will bear. So, there is inefficiency here.
You can have 5 bidders for the ARod stock, and it could be $21, $22, $23, $24, $25. It'll sell at 25$, but that's only because you don't have any shorts to keep them honest (and of course, you only have 1 share outstanding).
Evaluating A-Rod (December 8, 2003)
Posted 1:45 p.m.,
December 9, 2003
(#15) -
tangotiger
Yes, they can take that 10 million$ and buy a government bond.
Evaluating A-Rod (December 8, 2003)
Posted 2:17 p.m.,
December 9, 2003
(#17) -
tangotiger
Well, present day value makes a big difference. 10 million$ invested in a government bond is still 10 million$ in present-day dollars.
However, the performance of a player won't be discounted at the T-bill rate. It will be much more likely to be discounted at a junk bond rate (15 to 20% or whatever).
Even after you account for the best-guess true talent level of the player today, the expected future earnings of that player will have such a high variability, that a team would be crazy to pay a player equal to the marginal revenue he's expected to generate.
Maybe if they had 500 players, they would do it. But, they've only got a handful of players.
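To see how much the choice of discount rate matters, here is a minimal present-value sketch. Everything in it is hypothetical: a player expected to generate 12 million$ a year for four years, a 5% rate standing in for a government bond, and 20% for the junk-bond end of the range mentioned above. None of these are figures for an actual player or contract.

```python
def present_value(cash_flows, rate):
    """Discount a list of year-end cash flows back to today."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


expected_revenue = [12e6] * 4   # hypothetical: 12 million$ per year for 4 years

pv_safe = present_value(expected_revenue, 0.05)    # assumed government-bond-like rate
pv_risky = present_value(expected_revenue, 0.20)   # assumed junk-bond-like rate

print(round(pv_safe / 1e6, 1), round(pv_risky / 1e6, 1))
```

The same expected revenue is worth far less today once it is discounted at the riskier rate, which is the gap between expected marginal revenue and what a team would be willing to pay.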
Evaluating A-Rod (December 8, 2003)
Posted 3:06 p.m.,
December 9, 2003
(#21) -
tangotiger
That was based on a study by Voros, from I think 1999-2001. A good rule of thumb is that 1 marginal win = 2% marginal increase in revenue. A MLB team would have around 120 million$ in revenue.
*******
Perhaps you can enlighten me on stocks.
You have a basket of 30 stocks, where the correlation in price movement among the 30 stocks is the least. So, while the beta for every individual stock might be 2 or 3 or 10, as a basket, the beta is 0.5.
Now, the market doesn't know that you've figured out how to reduce the variability to this extent, and so, each individual stock's price has a high discount rate (good for you!).
However, in baseball, player performances are naturally not tied to others on the team, and so, the team beta will generally be the same for every team. While the discount rate for any one player might be 20%, you are saying that the discount rate, when looking at it from the team's perspective, should be closer to 5% or something. Is this what you are saying?
I'm not sure I'm buying into this, yet.
Evaluating A-Rod (December 8, 2003)
Posted 11:06 p.m.,
December 11, 2003
(#25) -
Tangotiger
Guy, I will concede that there is an extra variable here, and that is the cap on the number of roster spots. I still haven't spent any time trying to figure out the cost of that.
***
Let me also clear up a little something on the distribution of those marginal wins above replacement. A team of all replacement players will win around 49 games, and realistically, what you want is a team to get you to 95 wins, or +46 wins.
Your 8 hitters, 4 starters and 1 closer, if they were all average, would contribute about +26 wins. Make them all above average a little (+1 win above average, or +3 wins above replacement), and you are at +39. Your backups would generate about +4 wins or so. That makes it +43, or 92 wins.
So, I'm not sure that roster size is the real constraint.
Evaluating A-Rod (December 8, 2003)
Posted 11:12 a.m.,
December 12, 2003
(#27) -
Tangotiger
(homepage)
mean performance is much higher than the median
This is not true. I direct you to the above link. If the true talent level of the average player is 100, and the replacement level is 80, the median player will be around 95 if I remember right.
While James is right that the talent is kind of pyramidal (i.e., the tail end of the right-hand side of a bell curve), he is wrong when he said something about 20 players 10% below average for every 1 player 10% above average. If I remember right, it's more like 4 or 5 to 1.
As well, because of the playing time component, you end up with an almost normal distribution of "talent times playing time".
Absolute Wins Produced (December 8, 2003)
Posted 10:55 a.m.,
December 9, 2003
(#2) -
tangotiger
Colin,
David's process assigns "credit" (I shouldn't have used the word value) to players who participate in scoring runs in games in which a team won. So, AWP would be similar to runs scored and RBI. It's a type of counting stat.
In terms of comparison, you can figure out what an average player's AWP / out is, just as you would figure out an average player's Runs/Game or RBI/AB or something.
And, David's perspective is not based on a "what if would have happened", and therefore, you can't bring in that perspective. It's an after-the-fact accounting system.
Whether this kind of system is relevant to your perspective is another story entirely. I don't have much use for this system (nor do I for RBIs), but it is well-constructed with regards to what it's trying to count.
Absolute Wins Produced (December 8, 2003)
Posted 11:35 a.m.,
December 9, 2003
(#4) -
tangotiger
No, I agree that we shouldn't talk about it in the sense of value as we define it. We don't hold R, RBI to those standards.
(As a side note, because the word value has morphed into 20 definitions, we should try to avoid it when possible.)
AWP is, by its name, Absolute Wins Produced. Any counting stat would have an "absolute" implied to its name.
How you relate it to "value" is another story. I can construct a measure called (Absolute) Runs Produced, and make it R+RBI-HR. And, I can start to try to make it better for comparison by creating Adjusted Runs Produced as R+RBI-HR - AB/10. Then, I can figure out what an average player has done in ARP, given the same number of outs. So, maybe Pujols is 60 Adjusted Runs Produced above the average player, given the same number of outs. Does this mean that Pujols has a "value" of 60 runs? I don't know. I just know that he was directly involved in 60 runs more than an average player, given the same number of outs.
Absolute Wins Produced is the same thing. Maybe Pujols was involved in 20 Absolute Wins Produced. And an average player, given those outs, was involved in 12 Absolute Wins Produced. That makes Pujols +8. Maybe the reference point would be outs, at that park with that team, I don't know.
The dirtiness of it is rather clean.
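The Runs Produced bookkeeping described above can be sketched as follows. The two formulas come from the post; the stat line and the league-average ARP per out are made-up illustrations, not any real player's numbers.

```python
def runs_produced(r, rbi, hr):
    """(Absolute) Runs Produced: R + RBI - HR."""
    return r + rbi - hr


def adjusted_runs_produced(r, rbi, hr, ab):
    """Adjusted Runs Produced: R + RBI - HR - AB/10."""
    return runs_produced(r, rbi, hr) - ab / 10


def arp_above_average(player_arp, player_outs, avg_arp_per_out):
    """Player ARP minus what an average player does in the same number of outs."""
    return player_arp - avg_arp_per_out * player_outs


# Hypothetical stat line, for illustration only
arp = adjusted_runs_produced(r=120, rbi=110, hr=40, ab=590)
print(arp)
print(arp_above_average(arp, player_outs=400, avg_arp_per_out=0.15))
```

The same "count it, then compare to average per out" pattern is what the post proposes for Absolute Wins Produced.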
Absolute Wins Produced (December 8, 2003)
Posted 11:48 p.m.,
December 9, 2003
(#7) -
Tangotiger
You're right, it is a bit misleading.
Let me clarify:
Perspective 1: marginal win contributions in real-time
Perspective 2: absolute win contributions after the game is over
Perspective 3a): marginal win contributions, theoretical, based on seasonal data
Perspective 3b): absolute win contributions, theoretical, based on seasonal data
So, in all cases, we are simply interested in recording the win contributions, without necessarily worrying about "valuation" or "total valuation". That would be a next step, if we want to.
Win Advancements (WA), Win Probability Added (WPA), Mills' Player Win Averages (PWA), these are all perspective 1. It's a rather simple process to convert that into a "value" stat.
Absolute Wins Produced (AWP) is perspective 2. It's not clear yet if we can even convert that into a value stat and make it mean something.
Linear Weights is perspective 3a. It's a rather simple process to convert into a value stat. It's also rather straightforward to associate this to perspective 1.
Win Shares is perspective 3b. It's not clear how to convert that into a value stat. It's not straightforward (or even necessarily possible) to associate this to perspective 2.
The valuation process is the next step, and shouldn't take away from the perspectives that these metrics try to describe.
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 12:14 p.m.,
December 10, 2003
(#2) -
tangotiger
Michael,
re: Rickey. I did a (perhaps my most rewarding) study on the effects of batting order in a no longer web available thread at fanhome called "Linear Weights by the 24 base out states". And, in there, I made the case that Rickey's optimal talent for the one batting spot where optimization is the most critical (leadoff spot) could add 10 runs per year above what his neutral-batting-order LWTS would suggest. It is a huge, huge number. An extra 15 wins in a career potentially. (I am in the middle of redoing the study, this time also adding Markov, as well as empirical pbp data for the eventual book.) Baserunning may also add something to Rickey (probably an extra 5 runs a year as well). Rickey may have been the best nonpitcher post-Mays and pre-Bonds.
The Ed Oswalt "win probability added" I don't think included baserunning, giving all the credit to the batter.
And yes, that equation was strictly done against the 2000-2002 data, and therefore would not necessarily apply to the Raines/Coleman/Rickey era. Presumably, the fast runners of today were as fast then, and therefore, we shouldn't expect more than +/- 5 runs added on baserunning back then. However, their SB totals were double those of today. Cut the BR best-fit in 2, and you'd get something like BS+BR = .25*SB - .40*CS. We just have to be careful what we do.
Finally, I have a high degree of correlation between triples/(2b+3b) and sb/timeOn1B. I'm sure adding triples would increase the correlation, but I doubt it would add much more. But, if we have it, we should use it.
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 12:21 p.m.,
December 10, 2003
(#3) -
tangotiger
What the heck... I included the triples rate, and the r went up to .65. The best-fit was .08 * SB + .07 * CS + .04 * Triples - .01*timeOn1B. The r of just triples to BR was .53.
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 2:12 p.m.,
December 10, 2003
(#7) -
tangotiger
(homepage)
You are right that the LWTS for SB would be different in the 80s. I don't remember off the top of my head what it would be, but being about .05 runs below the 99-02 time period is about right. The SB value is pretty constant across the 3-6 RPG environment.
I would guess that if BR = .10SB + .12CS in 99-02, then it might be = .05SB + .06 CS in the 80s. So, .20+.05 for SB and -.41+.12 for CS.
So, SB-CS seems to work pretty well. It's just the constant (.25 or .30 or whatever) that needs to change.
**********
I agree, there may have been more fast players in the 80s.
As for changes in going from 1b to 3b, you can check out John Jarvis' site. It's a great resource. (See homepage link above.)
You can also check out what I've got here:
http://www.geocities.com/tmasc/destmob.html
That's from 1978-1990. I would guess if you look at Jarvis' data for 1999-2002, you'll probably get similar numbers.
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 2:37 p.m.,
December 10, 2003
(#9) -
tangotiger
I think I mentioned the high correlation between SB, CS and triples.
If you want a best-fit with only SB, and only CS, and only triples:
BR = [.11 3b/(3b+2b) - .01] * timesOn1B
BR = .13 * SB - .01*timesOn1B
BR = .36 * CS - .01*timesOn1B
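For anyone who wants to play with these, the three single-variable fits above can be written as small functions. The coefficients are the ones listed; the function names and the sample inputs are illustrative, and the inputs are not a real player's line.

```python
def br_from_triples(triples, doubles, times_on_1b):
    """BR = [.11 * 3b/(3b+2b) - .01] * timesOn1B (needs at least one 2b or 3b)."""
    return (0.11 * triples / (triples + doubles) - 0.01) * times_on_1b


def br_from_sb(sb, times_on_1b):
    """BR = .13 * SB - .01 * timesOn1B."""
    return 0.13 * sb - 0.01 * times_on_1b


def br_from_cs(cs, times_on_1b):
    """BR = .36 * CS - .01 * timesOn1B."""
    return 0.36 * cs - 0.01 * times_on_1b


# Hypothetical runner: 5 triples, 20 doubles, 30 SB, 10 CS, 150 times on first
print(round(br_from_triples(5, 20, 150), 2))
print(round(br_from_sb(30, 150), 2))
print(round(br_from_cs(10, 150), 2))
```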
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 10:06 a.m.,
December 11, 2003
(#20) -
tangotiger
I posted one last study at the fanhome thread. Go to the bottom and look for around today's date/time.
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 12:47 p.m.,
December 11, 2003
(#22) -
tangotiger
(homepage)
Yup, running = basestealing + baserunning
*********
If anyone wants, I uploaded the data so that you can play with it as well (see homepage link).
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 1:53 p.m.,
December 11, 2003
(#24) -
tangotiger
No order. Just data provided for those who want to do their own research. I think I've exhausted what I can do.
Correlation between Baserunning and Basestealing (December 10, 2003)
Posted 2:14 p.m.,
December 11, 2003
(#25) -
tangotiger
Ok, I added more columns of data (including the best-fit, as well as the difference between the best-fit and the actual baserunning LWTS, where you see that Roger Cedeno and Vlad are much worse baserunners than their speed says they should be).
And, I ordered the data by best-fit.
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 2:16 p.m.,
December 11, 2003
(#3) -
tangotiger
Machines are dumb but fast. Man is smart but slow. You would think people would appreciate how well man can use machines.... but not in sports.
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 2:39 p.m.,
December 11, 2003
(#5) -
tangotiger
Can any of the Unofficial Primate Statisticians comment on the Colley process and a logistic regression model (I think KRACH uses one)?
As well, I would have found it simpler to state the "1/2" initial values by simply stating that the first 2 games a team plays are against itself, thereby getting 1 win and 1 loss.
People forget that a win% is not against an average team but against an average team that doesn't include you. Throwing in say 12 games of Yanks v Yanks takes care of this thorny issue in a rather clean way.
However, the logistic regression process does not include, I don't think, regression towards the mean (it assumes that the performance results are representative of the true talent). Does the Colley method address this? Or is all this implicit?
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 3:27 p.m.,
December 11, 2003
(#7) -
tangotiger
I don't follow college sports, but I overheard this issue 2 days ago on WFAN. In there, they mentioned how the #1 computer-ranked team at the time, no matter what it did in its next game, was guaranteed to fall out of the #1 ranking. I remember a similar type of issue with tennis a few years ago.
First of all, can someone go into more detail as to what the situation was?
Secondly, I agree that this kind of stuff is frustrating. As the WFAN host was saying: how can you be #1 now, win, and then not be #1 tomorrow. Then, how can you be #1 now?
I would think what you'd want to do is make assumptions on the rest of the schedule as to who wins and who loses, going through this iterative process.
The issue, as I see it, is that all the past games are always being reevaluated based on the new set of games being played. If it was the case that it didn't matter if the team won or not, that they'd automatically be sent to below #1, then there's no reason for them to be #1 now.
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 3:28 p.m.,
December 11, 2003
(#8) -
tangotiger
To continue, I think that forecasting the rest of the season lessens the impact of reevaluating all the past games (I think).
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 4:24 p.m.,
December 11, 2003
(#10) -
tangotiger
(homepage)
See above for the KRACH system (Bradley-Terry model). Surely they have something similar in college football? How does this compare to Colley?
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 5:14 p.m.,
December 11, 2003
(#12) -
tangotiger
Can you do as I noted as well, and add in 1 game where the team plays itself to a tie?
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 9:25 a.m.,
December 12, 2003
(#19) -
Tangotiger
Jesse:
The game or games that are added in need to serve as a penalty to keep undefeated teams from having an infinitely high rating, and a team having a game against itself doesn't do that.
If the NJ Devils start the year at 13-0, my system would give them 1 win and 1 loss against the NJ Devils so that they'd be 14-1. So, my method WOULD prevent a team from having an infinitely high rating, would it not?
By making each team play against the same fictitious .500 opponent, and being .500 against them, that to me screams of regression towards the mean. Heck, I'd bet that if you gave each team a 3-3 starting point record, that might approximate regression towards the mean the best (similar to my adding 2*600 PA to every player's record in Marcel).
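As a minimal sketch of the padding idea in this post (not the Colley or KRACH method itself), the snippet below adds a fixed number of fictitious wins and losses to a record before computing the win%. The helper name `padded_record` and its parameters are hypothetical, chosen only for illustration.

```python
def padded_record(wins, losses, pad_wins=1, pad_losses=1):
    """Add fictitious games (default: 1 win, 1 loss) to a team's record.

    Returns the padded wins, padded losses, and the padded win%.
    """
    w = wins + pad_wins
    l = losses + pad_losses
    return w, l, w / (w + l)


# The 13-0 Devils example from the post: padded to 14-1.
print(padded_record(13, 0))
# The 3-3 starting record suggested as a rough regression toward the mean.
print(padded_record(13, 0, pad_wins=3, pad_losses=3))
```

With any positive padding, an undefeated team's padded win% stays below 1.000, which is the point being argued above.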
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 11:40 a.m.,
December 12, 2003
(#22) -
Tangotiger
I still don't understand why making a team play itself does not take care of the infinity issue.
I understand that it's not required in any other case.
I don't understand why adding 1 win and 1 loss is required.
********
I would love it if the statistical pros that grace this board (Ben V-L, AED, Alan, Walt, and a few others) can give a step-by-step in explaining some of this stuff, without losing the audience.
Professor who developed one of computer models for BCS speaks (December 11, 2003)
Posted 1:06 p.m.,
December 12, 2003
(#28) -
Tangotiger(e-mail)
Jesse, thanks for that great example. I'll have to think about it, as I'm about to feed my baby.
(Btw, is your logistic regression program available? If it is, please email me, as I have some things I'd like to run.)
Pettitte agrees to deal with Astros (December 11, 2003)
Posted 5:15 p.m.,
December 11, 2003
(#2) -
tangotiger
I didn't say what the present value was, I simply said that a 3-yr 30.8 million$ deal would be equivalent to the signed 31.5 million$ deal (i.e., both of these have the same present-day value).
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 9:19 a.m.,
December 12, 2003
(#10) -
Tangotiger
I am disappointed that when someone as intelligent and knowledgeable as MGL makes an evaluation, he doesn't provide some balance. It's not like DMB is trash, where finding balance would be hard. DMB may have the best sabermetric minds and system, bar none. I too would hold them to the highest standards, to the level of James and Palmer.
I found their presentation balanced, as they attempted to bring in the old minds, with talks of Range Factor and other non-PBP methods, and introducing their PBP methods as well.
We have to remember that when you talk about RF with lefty/righty splits, park and a few other 100% reliable data, you are trying to sell someone that thinks the PBP data, like grid location, and batted ball type/speed, is questionable.
Since no one has publicly shown the reliability of such PBP data (I want a correlation by stringers!), it's certainly a fair thing for DMB to do. Essentially, they could say that "hey! without considering grid/ball type/speed, Mike Cameron is great! And look, if we consider all the possibly questionable PBP data, Mike Cameron is great! So? Mike Cameron is great!"
***
As for SB/CS, if they are considered for catchers, they should most certainly be considered for pitchers. And heck, why not for the 2b/ss (though of course that would be 1% of what they'd do)?
***
As for multi-year data for pitchers, I think that's also fair. With the extremely limited data points, you can essentially pick 15 different pitchers, and you can make an argument for any of them that would be statistically significant. Because DMB does not track each pitcher's actual positioning, and the ball off the bat, and all that, they then decide to cut their losses, and look at prior years. I mean Greg Maddux and Mitch Williams may end up looking the same one year, by the stats by luck (ok, that's a stretch), but just looking at them for 10 plays would be enough to say that Greg would be a much better fielder.
***
Pinto had Andruw Jones near the top in CF rankings, as did DMB. I'm not surprised that UZR also has him near the top. This is more evidence of the silly talk about figuring out that someone is in a decline phase because you see his +20 to +0 ratings over a 4-year period. Now that he's +16, what are people going to say?
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 11:19 a.m.,
December 12, 2003
(#15) -
Tangotiger
dlf: I very much enjoyed reading your comments, so please don't think that they could have been supplanted by my comments. I thought they were very well-written.
Diamond Mind Baseball - Gold Glove Winners (December 11, 2003)
Posted 1:16 p.m.,
December 15, 2003
(#24) -
Tangotiger
Since Tippett goes through a process similar to what I do with "Evaluating Catchers", I find it a little hard to believe that Tippett would have let something like that slip by. When I get the 2003 PBP data, I'll break down the Mariner pitchers and catchers for 1999-2003.
Converted OBA (December 15, 2003)
Posted 4:28 p.m.,
December 15, 2003
(#3) -
tangotiger
I would prefer to consider the limits to be Barry Bonds and a pitcher. Can you repost your thoughts under those realistic limits?
Converted OBA (December 15, 2003)
Posted 4:36 p.m.,
December 15, 2003
(#4) -
tangotiger
I'm doing this to use in future articles. I think it's a lot easier to show, for example, that Thome's effective OBA is .420, and Maddux's is .280 and come up with the matchup effective OBA, than to list the full line of the players, along with the expected matchup line, and then convert it.
I can do in 1 number and 1 line (and for 30 players), what would take me 10 times as much data to get virtually the same level of accuracy.
Converted OBA (December 15, 2003)
Posted 6:37 p.m.,
December 15, 2003
(#5) -
tangotiger
This is the Ben matchup method:
=============================
B=Batter rate
P=Pitcher rate
L=League rate
B*P/L = X
Sum X for all the events, and divide the X for each specific event by the sum.
=============================
Now, here's some homework for those who want to get their hands dirty.
Method 1, the long accurate way:
Take 2 realistic players, and calculate their matchup lines using the Ben method noted above. Take that resultant matchup line, and convert it into the effective OBA as I have shown it.
Method 2, the quick and dirty way:
Take 2 realistic players, and calculate their respective effective OBA as I have shown it.
Then, use the Odds Ratio method (log5) and calculate the resultant matchup OBA.
How far apart can you get these 2 methods?
Converted OBA (December 15, 2003)
Posted 8:21 p.m.,
December 15, 2003
(#9) -
tangotiger
I'm trying to run some tests, and I don't think I'm liking the Ben Matchup Method right now. Here's my data:
player   PA   outs  1B   2b  3b  hr  bb  effOBA
League   600  390   100  30  5   15  60  0.342
Hitter   600  375   100  30  5   30  60  0.391
Pitcher  600  375   100  30  5   30  60  0.391
matchup  600  351   97   29  5   58  58  0.476
All I did was transfer 15 outs into HR for both the hitter and pitcher. Now, look at the resultant matchup. It's far higher than I expected for the HR. And the outs look too low.
When you look at the effOBA, the resultant effOBA should be .442, which certainly looks better (at this level, a differential process, as opposed to an Odds process, should work out to pretty much the same).
I'm actually pretty much liking this effOBA approach. Kid has my mouse... gotta go.
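For anyone who wants to check the matchup row in the table above, here is a minimal sketch of the Ben matchup method (B*P/L per event, then rescale so the events sum to the PA total). The function name `ben_matchup` and the dictionary layout are my own illustrative choices; the input lines are the League, Hitter and Pitcher rows from this post.

```python
EVENTS = ["outs", "1B", "2B", "3B", "HR", "BB"]


def ben_matchup(batter, pitcher, league, pa=600):
    """Per-event B*P/L, normalized so the matchup line sums to `pa`."""

    def rates(line):
        total = sum(line[e] for e in EVENTS)
        return {e: line[e] / total for e in EVENTS}

    b, p, lg = rates(batter), rates(pitcher), rates(league)
    x = {e: b[e] * p[e] / lg[e] for e in EVENTS}   # X = B*P/L for each event
    total_x = sum(x.values())                      # sum of X over all events
    return {e: pa * x[e] / total_x for e in EVENTS}


league = dict(zip(EVENTS, [390, 100, 30, 5, 15, 60]))
hitter = dict(zip(EVENTS, [375, 100, 30, 5, 30, 60]))
pitcher = dict(hitter)  # same line as the hitter, as in the post

line = ben_matchup(hitter, pitcher, league)
for e in EVENTS:
    print(f"{e:>4}: {line[e]:6.1f}")
```

Rounded to whole numbers, this reproduces the 351 / 97 / 29 / 5 / 58 / 58 matchup line, so the large HR figure comes from the method itself rather than from an arithmetic slip.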
Converted OBA (December 15, 2003)
Posted 2:23 p.m.,
December 16, 2003
(#11) -
tangotiger
That is the Odds Ratio (log5) approach.
*****
For the Ben Matchup method, you do that process for each individual component. So, you have 1b/pa, 2b/pa, hr/pa, outs/pa, etc, etc, and you follow that process to get the matchup line I presented.
That matchup line seems unreasonable, and I think the Ben Matchup method is questionable.
Converted OBA (December 15, 2003)
Posted 4:56 p.m.,
December 16, 2003
(#13) -
tangotiger
The only way to prove this is to use empirical data. Just because you and I think it's unreasonable doesn't mean that we're right. It would be interesting to see whether the specific matchups result in what Ben VL says, or what my model says. I'll get to this at some point within the next few months (along with 100 other things).
Converted OBA (December 15, 2003)
Posted 9:14 a.m.,
December 17, 2003
(#17) -
tangotiger
AED: unless I missed something, this IS what Ben does.
***
Arvin: the .250 OBA means something only in context to the talent distribution of the opponent. To get a context-free metric (some "inherent" talent):
.250/.750 = player
.333/.667 = league
player's "true talent" = player/league = .667
Converted OBA (December 15, 2003)
Posted 9:48 a.m.,
December 17, 2003
(#18) -
tangotiger
Let me go through this step-by-step.
Ben Method:
Assume you have 600 PA, 440 outs, 100 hits, 60 walks as your league. Your pitcher and hitter are both 600,380,100,120. What's the resultant matchup?
On a perPA basis, these are the rates of the lg,hitter,pitcher:
outs H bb
0.733 0.167 0.100
0.633 0.167 0.200
0.633 0.167 0.200
For each event, you do hitter*pitcher/league (the B*P/L from before), so you get:
outs H bb
0.547 0.167 0.400
You sum this to get: 1.114
You divide the 3 numbers by 1.114, and then multiply by 600 PA to get:
outs H bb
295 90 216
So, that's your matchup line.
The Tango (Odds Ratio) Method
Convert everything into the effOBA, so we have: .219,.291,.291
Turn all those rates into RATIOS, so we have: .281,.411,.411
Calculate the matchup ratio as H*P/L, so we have: .602
Convert the ratio into a rate (.602/1.602) = .376
That's my effective OBA matchup rate. Ben has it as .391.
*********
If I were to swap the BB and 1B numbers, Ben gets an effective OBA of .425, and I get .404.
So, there's something also to the various "weights" I give them that carry extra information (that may or may not be correct).
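The Odds Ratio steps above (rate to ratio, H*P/L on the ratios, ratio back to rate) can be sketched as follows. The function names are hypothetical; the inputs are the effOBA figures quoted in this post (.219 league, .291 hitter and pitcher), which I take as given rather than recompute.

```python
def rate_to_ratio(rate):
    """Turn a rate (successes per PA) into a ratio (successes per failure)."""
    return rate / (1.0 - rate)


def ratio_to_rate(ratio):
    """Inverse of rate_to_ratio."""
    return ratio / (1.0 + ratio)


def odds_ratio_matchup(hitter, pitcher, league):
    """Odds Ratio (log5-style) matchup rate from three rates."""
    ratio = rate_to_ratio(hitter) * rate_to_ratio(pitcher) / rate_to_ratio(league)
    return ratio_to_rate(ratio)


matchup = odds_ratio_matchup(0.291, 0.291, 0.219)
print(round(matchup, 3))
```

Carrying full precision through the intermediate steps gives a value within about a point of the .376 quoted above; the small gap comes from rounding the ratios to three decimals along the way.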
Converted OBA (December 15, 2003)
Posted 1:55 p.m.,
December 17, 2003
(#20) -
tangotiger
Arvin,
But then what do you compare against?
I agree that you don't need the league if you have the true rate. But, to go from the observed rate (which is true rate of hitter against the true rate of the opposing league of pitchers and luck), you need to account for the league (as I show in post 17), and regress as you mention.
Now, we've got everyone on an "inherent" true talent level.
Then, you can compute the expected "true talent" matchup. But, you then need to convert that back to the league level to give it meaning.
I can have Bonds as 2.00 and Pedro as 0.75 matchup against each other, and I'll get a 1.50 matchup. But, that doesn't mean anything. The expected OBA and SLG will depend on whether they face each other at Coors or the Astrodome, in 1994 or 1968.
I think we are all talking about the same thing, but for some reason, we are not.
Converted OBA (December 15, 2003)
Posted 2:02 p.m.,
December 17, 2003
(#21) -
tangotiger
For those who may not be following, think of this.
The success rate of the SB is dependent on:
- runner speed, lead, jump
- pitcher pitch type, speed, move, handedness
- catcher arm, release, accuracy
So, all these things are context-free. You can measure each of these things, without accounting for anything else (let's assume all runners are only on grass surface).
Then, you can create probability distributions for all of these variables, and you can figure out the chance of Raines stealing a base against Pettitte and Bench.
However, for hitting, it's not quite so easy. If you can "scout" a player to figure out his strike zone judgement, power, acceleration, etc, etc, and the similar for the pitcher and fielders, and add in some game theory, THEN you can do the same. But, for now, we can't. (This would be the holy grail by the way.)
Anyway, you can try to simulate all this by using the actual OBA record of the hitter against the league OBA of pitchers he's faced and parks and fielders, and after regression, end up with a "true talent" rate, similar to say a runner's speed.
Once you do this for all players, you can then create probability distributions to see how they would matchup. Again, you need a park, and that's another probability distribution. Then, you get your resultant matchup.
These last 3 paragraphs I'm simulating using the Odds Ratio Matchup method (and Ben is doing with his Matchup method).
Converted OBA (December 15, 2003)
Posted 2:06 p.m.,
December 17, 2003
(#23) -
tangotiger
Arvin, you probably posted at the same time as I did, but let me ask you a question:
How do you figure out the expected OBA of Bonds v Pedro at Coors 1994, and Astrodome 1980?
Converted OBA (December 15, 2003)
Posted 8:31 p.m.,
December 17, 2003
(#26) -
tangotiger
(homepage)
Yes, the true talent level means "nothing" and "everything". It's just a number from 0 to infinity that captures the player's true talent level in some context-free sense (like Raines' speed from 1b to 2b).
If you go to the homepage link, I showed what the "true talent level", in a context-free sense, would look like for all people. I set it so that "1" meant a current MLB player, and the function essentially is at "2" for a Bonds-type player, and "0.8" for a top minor leaguer.
********
FJM: I didn't mean to consider the time as a chaning variable, though it certainly is. I just meant to use it to establish a playing environment that we can all picture.
Converted OBA (December 15, 2003)
Posted 8:31 p.m.,
December 17, 2003
(#27) -
tangotiger
chaning=changing
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 6:25 p.m.,
December 15, 2003
(#3) -
tangotiger
David, I do remember you bringing up Fibonacci. I think I, and Patriot as well, said that .58 and its reciprocal was the best one, and then you brought up Fib.
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 8:30 p.m.,
December 15, 2003
(#5) -
tangotiger
AED, the overall assignments are NOT reasonable at all. I've already shown that a bad offensive team scoring 3 and allowing 5 will win .30 of its games, and a bad def team scoring 5 and allowing 8 will win .30 of its games. So, what makes you say that WS are reasonable between off/def? They are most definitely not. Win Shares sets the boundaries so that 2.5 off and 7.5 def are equivalent. See? WS is shortchanging the def by 0.5 runs per game, and overcrediting the off by 0.5 runs per game. That's a huge, huge problem.
My comments regarding .6/1.6 is meant as a shorthand explanation to this.
Your comments on fielding are definitely valid.
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 11:26 p.m.,
December 15, 2003
(#8) -
tangotiger
I haven't thought about how to fix this.
1 - If you want a guy who starts 36 games and averages 7 innings to be equivalent to a CF or 3B who plays in 127 games (with all players performing at league average), you better be prepared to prove it. Right now, all people who accept Win Shares accept this. (This is as bad as the replacement level thing with EqA.)
2 - That 3 runs scored and 5 runs allowed has the same win impact as 5 runs scored and 8 runs allowed. You can't make it 2.5 and 7.5 and think that you can make the system "work".
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 8:46 a.m.,
December 16, 2003
(#11) -
tangotiger
Variance: yes, it's all about the variance to split up the distribution. The RS/RA standard deviation is virtually the same for any era since 1900. So, 50/50 is the best split, based on that granularity. It's NOT so obvious if you include hitting, baserunning, pitching, fielding as the various distributions (though 3 of these 4 are not independent of each other). In any case, 50/50 should be the given, and others should prove against it.
Assuming a 5 run environment, a top offensive club will be 6RS/5RA or .59, while a top defensive club will be 5RS/4RA or .61.
Those are not equals, are they? But these are:
RS/RA
4.0 / 5.0
5.0 / 6.2
This will give you about the same win%. So, that's 80% of league and 124% of league.
And, in any case, James uses 52/152, so HE's setting the boundaries at a low level. 61/161 is more in line with what he wants to do.
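A quick way to check the RS/RA pairs in this post is a Pythagorean-style win estimator. The post does not say which estimator was used; the exponent of 2 below is my assumption, picked because it reproduces the .59 quoted for 6RS/5RA. The function name is hypothetical.

```python
def pythag_win_pct(rs, ra, exponent=2.0):
    """Pythagorean win% from runs scored and runs allowed per game.

    The exponent of 2 is an assumption, not something stated in the post.
    """
    return rs**exponent / (rs**exponent + ra**exponent)


pairs = [(6.0, 5.0), (5.0, 4.0), (4.0, 5.0), (5.0, 6.2)]
for rs, ra in pairs:
    print(f"{rs:.1f} RS / {ra:.1f} RA -> {pythag_win_pct(rs, ra):.3f}")
```

Under that assumption, the 6/5 and 5/4 clubs come out about 20 points apart, while the 4.0/5.0 and 5.0/6.2 clubs land within a few points of each other, which is the asymmetry the post is describing.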
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 12:09 p.m.,
December 16, 2003
(#17) -
tangotiger
The 52/152 split: we can't just come up with numbers to fit our liking. They have to be derived somehow. Since James is intent on doing a .300 replacement level by off and by def, then the PROPER levels to set are 61/161. If you want to do a .400 replacement level, then 80/125 is more like it. (Essentially you want the reciprocal.)
Now, the pitcher and hitter effect are different on a game level. While the RPW converter might be 11 for a hitter, it would be 9 for a pitcher. However, James applies, essentially, a constant RPW converter. Therefore, to counteract that, you may have to "tweak" the 61/161 to something else so that it adds up properly. Essentially, fudge one number and fudge another so you come up with something right.
As for the 70/30 split: the run value of a safe and out play for a BIP is almost exactly the same as the safe and out play for a non-BIP (i.e., the HR is worth 1.4, the BB is worth 0.33, and so on average, that's almost exactly the same as a 1b.2b.3b). There are about 75% BIP and 25% non-BIP. Giving the pitcher 60% of the BIP, and we get:
.6 x .75 + 1.0 x .25 = .70
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 4:52 p.m.,
December 16, 2003
(#22) -
tangotiger
Tango, you are implicitly assuming that a team with average offense and replacement-level defense would have the same record as one with replacement-level offense and average defense.
First of all, let's not forget what Bill James is trying to do. He's NOT after wins above replacement, but wins above zero. James is SOMEHOW trying to get there by doing wins above some baseline, and when he adds up these wins he again SOMEHOW ends up with an absolute wins total.
Therefore, in terms of apportioning ABSOLUTE wins, you've got to assume that the split is 50/50, since the standard deviation of RS and RA are the same across all eras. If you want to use something else, you've got to prove it.
I am not at all saying that a replacement level def and avg off, and vice-versa, are equals. This is not a question about replacement level as the rest of us talk about it, but as Bill James talks about it.
A team that scores 3 and allows 5 will win as often as a team that scores 5 and allows 8. Therefore, those are the boundaries. Why would Bill James' boundaries of 2.5/7.5 be more right than mine?
I don't see this as a given. In my example of slow-pitch softball, the wins cost by replacement-level defense would be less than the wins cost by replacement-level defense in baseball. Requiring the two numbers to be reciprocals of each other is only valid if you believe that replacement-level defense is as bad as replacement-level offense.
The real replacement level, unlike the Bill James baseline level, can be completely determined by the overall talent distribution of the pitchers and non-pitchers. My best guess is that the talent distribution of the nonpitchers is wider than that of the pitchers, and therefore, the average pitcher, relative to the replacement level, is LESS than the average nonpitcher. By how much, I don't know (yet).
I disagree that 0.62/1.62 (or 0.52/1.52, 0.50/1.50, etc.) is a reasonable definition of replacement level.
We should probably use different terms. The real replacement level is .80/1.25. The Bill James marginal baseline level should be .61/1.61.
Fundamentally, I think it's pointless to try to quantify specific replacement levels for fielding, pitching, and hitting.
I never said as such, and I apologize if I said something that might have been construed as such.
A replacement-level player is one whose overall contribution is at the replacement level, not whose fielding, batting, or baserunning skills specifically are at replacement level.
Yes, I know, as I railed against BP's WARP calculation precisely for this reason.
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 9:10 a.m.,
December 17, 2003
(#30) -
tangotiger
Great stuff AED! I agree that Win Shares has a multitude of problems. Though it seems that people prefer handling problems, as evidenced by the posts in this thread!
I've done some preliminary work on the talent distributions of hitting, pitching, fielding, baserunning. From this point on, I'm going to talk on a PER-PLAY BASIS. While the distribution is in that order, when you consider that hitting,fielding,baserunning are really the same player, something wonderful happens. The talent distribution of the pitcher and nonpitcher are close to each other. Again, on a per-play basis. (I don't want to say that they are the same, because this is all preliminary work, and I still have lots of work to do.)
However, the nonpitcher is involved in 63% of the plays and the pitcher is involved in 37% of the plays. I'm not sure if that means that the split should be that.
***
Using Win Advancement, which I'm always working on on-and-off, I get about a 38% allocation to pitchers on wins above replacement (the "real" replacement level). However, I don't have the problem that James has about capping the pitchers on the bottom at 0, and as I noted in another thread, RJ and Pedro come out much much higher than James does, on a wins above average metric.
***
Still lots of work to do.
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 1:49 p.m.,
December 17, 2003
(#33) -
tangotiger
I don't know how, but you are adding two distributions, and so you would expect the spread to increase.
new dist ^ 2 = dist1 ^ 2 + dist2 ^ 2
So, if you have a standard deviation of "3" for the first one, and a "1" for the second one, the standard deviation for the new distribution would be sqrt(3^2+1^2) = 3.16
So, if the "3" corresponds to "52", then 3.16 would correspond to 54.77
Do Win Shares undervalue pitching? (December 15, 2003)
Posted 2:37 p.m.,
December 17, 2003
(#35) -
tangotiger
Assuming a 70/30 split between pitching/fielding, 90/10 between hitting and baserunning, and a 50/50 on off/def, and using the stdev^2 = stdevA^2 + stdevB^2, then we get the following breakdowns:
hitting = 41%
pitching = 38%
fielding = 16%
baserunning = 4%
How does this match to the 50/50 off/def split?
41^2 + 4^2 = 38^2 + 16^2
So, the pitchers get 38% and the nonpitchers get 61%. (Obviously, I shouldn't have rounded off. Any case, these are just estimates. )
As well, the pitchers are also fielders and hitters, so you can probably say that nonpitchers and pitchers would have a 60/40 split.
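The breakdown above can be reproduced with a short calculation. This is a sketch under my reading of the post: the 70/30 and 90/10 figures are treated as ratios between component standard deviations within each side, the two sides are scaled so their combined (root-sum-of-squares) standard deviations are equal, and the four components are then expressed as percentages of their plain sum. The function name and parameters are hypothetical.

```python
import math


def talent_breakdown(pitch_share=0.70, hit_share=0.90):
    """Split 100% among hitting, baserunning, pitching, fielding.

    Within each side the components keep the stated ratio; the sides are
    scaled so that sqrt(hit^2 + run^2) == sqrt(pitch^2 + field^2).
    """
    hit, run = hit_share, 1.0 - hit_share
    pit, fld = pitch_share, 1.0 - pitch_share
    off_len = math.hypot(hit, run)   # combined stdev of offense before scaling
    def_len = math.hypot(pit, fld)   # combined stdev of defense before scaling
    parts = {
        "hitting": hit / off_len,
        "baserunning": run / off_len,
        "pitching": pit / def_len,
        "fielding": fld / def_len,
    }
    total = sum(parts.values())
    return {name: 100.0 * value / total for name, value in parts.items()}


breakdown = talent_breakdown()
for name, pct in breakdown.items():
    print(f"{name:>11}: {pct:4.1f}%")
```

Unrounded, baserunning comes out between 4% and 5%, which accounts for the four rounded figures in the post summing to 99 rather than 100.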
Request for statistical assistance (December 17, 2003)
Posted 8:26 p.m.,
December 17, 2003
(#3) -
tangotiger
(homepage)
Check the above homepage link.
***********
This is what I'm thinking as an amateur:
If stdev of random = 1, and the catchers' deltas is = 2, then:
2 ^ 2 = true ^ 2 + 1 ^ 2, making true = 1.73
So, to turn the observed standard deviation from 2 to 1.73, I regress all the observed values by .27/2 = 13.5%. Am I doing this right?
Request for statistical assistance (December 17, 2003)
Posted 12:00 a.m.,
December 18, 2003
(#6) -
tangotiger
Alan,
I wasn't interested (for the moment) in trying to redo the catcher study. I was just using the results of that to try to establish if I can use the standard deviations of the observed results to determine the regression towards the mean.
As for this comment:
You're method has two problems.
1. You have no proof that it works (if it even does).
2. Nobody would understand what you're doing.
Are you referring to the method in my article? For #2, it's rather straightforward, and many people understand what I'm doing. As for #1, I'm satisfied with my processes in everything I do. I'm not looking to prove it beyond however I present it.
Request for statistical assistance (December 17, 2003)
Posted 2:16 p.m.,
December 18, 2003
(#8) -
tangotiger
Thanks for the clarification Alan.
Can you expand on error variance and true variance, with some examples?
Request for statistical assistance (December 17, 2003)
Posted 2:22 p.m.,
December 18, 2003
(#9) -
tangotiger
The method I'm talking about is attempting to use SS errors to regress balks.
Actually, I didn't explain it properly. I was making the distinction between 2 things. For example, if I used SS errors, and I get a standard deviation of the observed value of .026, and if the expected standard deviation of SS errors was also .026, then I can say that the SS errors deltas are completely random, from the perspective of the catcher.
For Pitcher Balks, the observed standard deviation was 2 per 162 GP, while I expected it, from a purely random distribution, to be 1 per 162 GP. Therefore, I can't say that the catcher doesn't influence the pitcher's balk totals. He DOES. But, I don't know to what degree he does.
This is why I'm asking how to make use of the standard deviation of the observed deltas against the standard deviation of random expected deltas.
Apologies for not making it clearer. Even more apologies if I'm still not clear!
Request for statistical assistance (December 17, 2003)
Posted 8:28 p.m.,
December 18, 2003
(#11) -
tangotiger
Is this right? Do you mean standard deviation or rate?
I could, and should, have marked the rates and standard deviation of the rates as a per play, but I put it at per 162 GP. It makes it a little more confusing, but I find it easier not to deal with decimals if I don't have to.
I think there's something that's being missed between you and me. How about we try a simple example.
I flip 1000 coins with my kid in the room, and I get 520 heads. I flip 1000 coins with my kid sleeping, and I get 490 heads. So, I give my kid a +30.
My 10,000 other friends do the same thing. Some get +50, others -20, others +10, others -5, etc, etc.
So, question 1: what is the observed standard deviation of the deltas of these 10,000 flippers? ( I guess it would help if I give you the data.)
question 2: what is the expected standard deviation of the deltas, assuming that only luck is expected.
If the standard deviations of the deltas of the two questions above are the same, then I would say that the kids have no effect on the flipping, and so, I can regress the deltas at 100%.
If on the other hand the SD of the deltas of the first question is twice the SD of the deltas of the second question, then THERE is an effect. So, my question #3 is how to get a regression equation for that? Is it as simple as post #3?
Thanks again...
Request for statistical assistance (December 17, 2003)
Posted 8:47 a.m.,
December 19, 2003
(#12) -
tangotiger
I have 100 people, who each flip 1000 coins. The expected win/loss is 50-50. The standard deviation of the "true talent" of these flippers is known to be zero (i.e., pure luck).
The experiment yields a standard deviation of the deltas for this sample at 32.
I then invite everyone's kids, and they repeat the process. The expected win/loss is still 50-50. The standard deviation of the "true talent" of these flippers with kids is known to be .028.
The experiment yields a standard deviation of the deltas for this sample at 64. I ask them to flip again, and again I get a standard deviation of 64.
I then run a "sample1-to-sample2" correlation. The r was .73, meaning that I want to regress 27%.
Can I get that 27% in other ways?
observed stdev ^ 2 = true stdev ^ 2 + err stdev ^ 2
64 ^ 2 = true ^ 2 + 32 ^ 2
(32 ^ 2) / (64 ^ 2) = 25%
So, I think it's a rather simple step to establish what the regression towards the mean figure is, given only the standard deviation of the observed, and the stdev of the random.
Am I on to something here?
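The arithmetic in this post can be sketched as below, assuming (as the post does) that observed variance is the sum of true variance and random variance, and that the deltas are heads minus tails. The function name `regression_from_sds` is hypothetical.

```python
import math


def regression_from_sds(observed_sd, random_sd):
    """Return (true_sd, regression fraction) from observed and random SDs.

    Assumes observed^2 = true^2 + random^2, with the regression toward the
    mean taken as the random share of the observed variance.
    """
    true_sd = math.sqrt(observed_sd**2 - random_sd**2)
    regress = random_sd**2 / observed_sd**2
    return true_sd, regress


# SD of (heads - tails) over 1000 fair flips is sqrt(1000), about 32.
random_sd = math.sqrt(1000)
print(round(random_sd, 1))

true_sd, regress = regression_from_sds(64, 32)
print(round(true_sd, 1), regress)
```

The 25% from this calculation sits close to the 27% implied by the sample-to-sample r of .73, and the implied true SD of roughly 55 flips out of 2000 (heads minus tails) lines up with the .028 true talent spread given above.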
Request for statistical assistance (December 17, 2003)
Posted 7:18 p.m.,
December 19, 2003
(#14) -
tangotiger
My deltas are p-q, as opposed to what you are showing as p - n/2, or (p-q)/2. I think we're on the same page now.
Thanks for the responses Alan.
Request for statistical assistance (December 17, 2003)
Posted 8:53 a.m.,
December 20, 2003
(#16) -
tangotiger
AED, can you tell me how to do that extra work?
Right now, I limited it to the 29 catchers who caught the most, and then took the average of them to get about 45,000 PA, or about 20,000 PA with at least 1 runner on base. That's a good shorthand, but I'd like to capture at least 100 or more of the catchers.
Request for statistical assistance (December 17, 2003)
Posted 3:04 a.m.,
December 21, 2003
(#19) -
tangotiger
Post #3 was illustration only.
Perhaps the problem is that I'm using the wrong terms. I think I should have said that the variance was "1 standard deviation = 1 bk / 162 GP".
Request for statistical assistance (December 17, 2003)
Posted 8:12 a.m.,
December 22, 2003
(#22) -
tangotiger
Thanks for the input guys. I think I'll need a couple of hours to digest this, and then play around with it. Hopefully, I'll have something to report before xmas.
Request for statistical assistance (December 17, 2003)
Posted 2:42 p.m.,
December 22, 2003
(#25) -
tangotiger
I use something like 5500 PAs per year (about 140 games).
You'll notice that Gary Carter's line reads a delta of -76 PB with 72,385 PAs. In the next table, we have Carter at 13 effective seasons (72,385/13=5568). His per season delta PB is -6. So, either -76/13, or -76/72,385*5500.
UZR 2003 Previews (December 18, 2003)
Posted 12:27 p.m.,
December 18, 2003
(#2) -
tangotiger
I'm not surprised, since the baselines are different. The average SS is a much better fielder than the average LF. However, how much better can the top SS be from an already pretty good fielding "average" SS?
UZR 2003 Previews (December 18, 2003)
Posted 4:22 p.m.,
December 18, 2003
(#4) -
tangotiger
Updated.
UZR 2003 Previews (December 18, 2003)
Posted 7:23 p.m.,
December 18, 2003
(#18) -
tangotiger
As per Primer policy, I can (and have) removed posts that don't move the discussion forward. I try not to, and I would ask that you police yourselves. Trying to clean up a mess is a waste of time, compared to generating interesting discussion.
I had a thread on race/athletics. I'll bring that one forward, and you can discuss that particular issue there.
UZR 2003 Previews (December 18, 2003)
Posted 7:58 p.m.,
December 18, 2003
(#20) -
tangotiger
Bill James' study was incredibly eye opening to me. I would love to redo that study, if I ever get my hands on them. Based on the James study, race does have an impact. Whether that is because blacks are faster, or whether blacks are chosen more for their speed than comparable whites (selective sampling) I don't know.
Tom Timmerman, I believe, had a great study on this issue as well.
UZR 2003 Previews (December 18, 2003)
Posted 10:01 p.m.,
December 18, 2003
(#24) -
tangotiger
I was more talking about the other post I deleted, though calling one person stupid is close to a line.
and do not really need or want you to do it for us. You have better things to do....
I'll do it for myself, because I enjoy coming here reading intelligent, articulate, inquisitive, funny, silly posts. I have no desire to let this place be a free-for-all. Everyone here is a guest, and this place is supposed to be a place that people want to come to. Think of me as the Cheers bouncer who'll throw anyone out who needs to sober up. If I have to waste my time doing this, I'll get another job. This issue is not up for discussion, so let's please drop it.
UZR 2003 Previews (December 18, 2003)
Posted 7:53 a.m.,
December 19, 2003
(#30) -
tangotiger
Is there enough data to know how well defense in one position correlates to defense in another for each position swap?
See the "UZR multiple positions" thread that I brought forward yesterday. It's this week's "Required Reading".
UZR 2003 Previews (December 18, 2003)
Posted 12:00 p.m.,
December 19, 2003
(#32) -
tangotiger
Rally, can you point us to an article that discusses his injury in particular?
UZR 2003 Previews (December 18, 2003)
Posted 10:23 p.m.,
December 19, 2003
(#39) -
tangotiger
One of the interesting things about the 10 run swing between a CF and LF is that it's different based on what the primary position is.
For example, if a +15 LF moves to CF, he might be +0 at CF. If a +0 CF moves to LF, he might only be +5 in LF. (Numbers for illustration only.)
So, not only is there a "degree of difficulty" adjustment, but there's also a "familiarity/experience" adjustment.
Once MGL finishes the 2003 UZR, I'll update all of my other UZR articles (I'll have 5 years worth of data, which should be substantial).
Questec Interview (December 18, 2003)
Posted 8:31 p.m.,
December 18, 2003
(#2) -
tangotiger
I remember MGL, Keith Woolner/Nate Silver doing some studies. Google on baseballprimer.com, and you should get something.
Questec Interview (December 18, 2003)
Posted 11:58 a.m.,
December 19, 2003
(#4) -
tangotiger
or location within the batters box
I don't see this part as relevant.
As for the changing strike zone by player, I guess the effect is overstated. I suppose it could change by half an inch or something, and that would be nice to know.
Sabremetrics 301: Custom Linear Weights (December 18, 2003)
Posted 7:07 p.m.,
December 18, 2003
(#2) -
tangotiger
That's a good question, and it brings up something worthwhile to talk about.
Those are the weights given that that is the normal run environment. Therefore, everything adds up so that the marginal impact of all events equals ZERO. This is true for every one of those columns. Every pitcher should have a marginal impact of zero, because the pitcher himself is shaping that run environment.
That's step 1.
For step 2, the step where you try to compare Pedro to a league average pitcher, that's a simple enough step. If Pedro creates a 2 RPG environment while the batters he faces shape a 5 RPG environment against all other pitchers, you give Pedro an extra 3 RPG (or 3 runs per 9 innings, or 3 runs per 27 outs, or -.111 runs per out).
In this particular example, his marginal outs are worth -.122 -.111 = -.233 runs.
(For a pitcher, the more negative the better.)
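As a rough sketch of step 2 in Python (the function name is hypothetical, and the -.122 base out value is simply taken from the example above):

```python
OUTS_PER_GAME = 27

def pitcher_out_value(base_out_value, pitcher_rpg, opponents_rpg):
    """Marginal run value of an out once the pitcher is compared to average."""
    # The gap in runs per game is spread evenly over the 27 outs of a game.
    adjustment = (pitcher_rpg - opponents_rpg) / OUTS_PER_GAME
    return base_out_value + adjustment

# 2 RPG environment against a 5 RPG baseline, base out value of -.122
print(round(pitcher_out_value(-0.122, 2, 5), 3))   # -0.233
```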
Sabremetrics 301: Custom Linear Weights (December 18, 2003)
Posted 1:10 p.m.,
December 23, 2003
(#4) -
tangotiger
I'll have to see exactly what happens in that case. I'll give you a breakdown later.
Sabremetrics 301: Custom Linear Weights (December 18, 2003)
Posted 4:08 p.m.,
December 23, 2003
(#5) -
tangotiger
From 1974-1990, there were only 30 plays marked as "Pickoff Error".
Here's the breakdown:
a: 1 time, the runner got 2 extra bases
b: 11 times, the runner got an extra base
c: 17 times, the runner was out
d: 1 time, one runner was out, and the other runner advanced 1 base
So, if you work it out, that's 14 bases gained, and 18 outs (and 18 runners removed).
Applying a shorthand standard +.2 runs per base, and -.45 runs per out (with runner removed), and that's -5.3 runs over 30 opps, or -.18 runs per PickoffError. And that corresponds exactly to what the chart says.
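The tally above can be checked with a few lines of Python. This is only a sketch of the shorthand calculation, using the +.2 and -.45 run values quoted in the post; the variable names are made up.

```python
# (number of plays, bases gained per play, outs recorded per play)
outcomes = [
    (1, 2, 0),    # a: runner got 2 extra bases
    (11, 1, 0),   # b: runner got an extra base
    (17, 0, 1),   # c: runner was out
    (1, 1, 1),    # d: one runner out, the other advanced 1 base
]
RUNS_PER_BASE = 0.20
RUNS_PER_OUT = -0.45   # out with the runner removed

plays = sum(n for n, _, _ in outcomes)
bases = sum(n * b for n, b, _ in outcomes)
outs = sum(n * o for n, _, o in outcomes)
total_runs = bases * RUNS_PER_BASE + outs * RUNS_PER_OUT
per_play = total_runs / plays

print(plays, bases, outs)                        # 30 14 18
print(round(total_runs, 1), round(per_play, 2))  # -5.3 -0.18
```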
For people who resist Linear Weights for whatever reason, realize that everything can be reduced to the Run Expectancy by the 24 base/out chart. And from that, you can generate the Win Expectancy by inning,score,base,out, you can generate Linear Weights run values for any run environment, you can generate Leveraged Index. The Run Expectancy chart itself can be constructed from a very basic Markov model. Basically, the "truth" lies completely in the Run Expectancy chart. Learn it, live it.
Sabremetrics 301: Custom Linear Weights (December 18, 2003)
Posted 4:55 p.m.,
February 11, 2004
(#7) -
tangotiger
Based on this chart, here are the SB breakeven rates, based on run environment:
RPG... breakeven
1 59.5%
2 63.7%
3 66.8%
4 69.3%
5 71.4%
6 73.2%
7 74.8%
8 76.2%
9 77.4%
10 78.5%
As you can see, and as you figured, when facing Pedro, steal a lot. When in Coors, be very careful about stealing.
Sabremetrics 301: Custom Linear Weights (December 18, 2003)
Posted 5:01 p.m.,
February 11, 2004
(#8) -
tangotiger
If you are looking to do this in your head:
(RPG+5)/4
That'll tell you how many steals you should make for every 1 CS.
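A quick Python sketch comparing the mental shortcut with the breakeven table from the previous post. The function names are illustrative; the conversion from "steals per CS" to a success rate is just steals / (steals + 1).

```python
# Breakeven rates (percent) by run environment, copied from the table above.
TABLE = {1: 59.5, 2: 63.7, 3: 66.8, 4: 69.3, 5: 71.4,
         6: 73.2, 7: 74.8, 8: 76.2, 9: 77.4, 10: 78.5}

def steals_per_cs(rpg):
    """Shortcut: successful steals needed to pay for one caught stealing."""
    return (rpg + 5) / 4

def breakeven_pct(rpg):
    """The same shortcut expressed as a success rate in percent."""
    s = steals_per_cs(rpg)
    return 100 * s / (s + 1)

for rpg, table_pct in TABLE.items():
    print(rpg, table_pct, round(breakeven_pct(rpg), 1))

worst_gap = max(abs(breakeven_pct(r) - pct) for r, pct in TABLE.items())
print(round(worst_gap, 1))   # the shortcut stays within about half a point
```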
Clutch Hits - Race and Violence (December 18, 2003)
Posted 12:26 p.m.,
December 19, 2003
(#2) -
tangotiger
I agree. The performance of blacks, whites, latins, lefties, righties, fatsos, skinsos, and anything you can think of that has a large enough population to draw from should be pretty much even, otherwise they're not being properly selected.
In hockey, the European players are under-represented because when it comes to the 3rd and 4th line players, it's more cost-effective, and easier language-wise, to deal with a North American. When you look at 1st round picks, or you look at the "50 best players in the NHL", invariably, over 50% of the players are non-North Americans.
However, of all players, I think about 60% are North Americans. In fact, even though the NHL has expanded from 17 to 30 teams over the last 25 years, the number of North Americans has remained stagnant at 400 players. If the NHL continues to expand say to 40 teams, a great majority of the extra players will almost certainly come from Europe.
UZR, 2000-2003 (December 21, 2003)
Posted 4:44 p.m.,
December 22, 2003
(#5) -
tangotiger
(homepage)
The homepage link lists a comparison between MGL and Pinto, sorted by biggest difference.
An in-depth look at Steve Finley would be enlightening. MGL?
UZR, 2000-2003 (December 21, 2003)
Posted 3:08 p.m.,
December 23, 2003
(#8) -
tangotiger
I used a cutoff of 48 UZR Games by position. Kearns was at around 40 in CF and RF.
Kearns was -1 with UZR and Pinto in CF, and +7/+4 in RF.
For 1999-2003, I have him at a true talent fielding level of +11 runs per 600 BIP, which puts him in the top 10% of all fielders.
UZR, 2000-2003 (December 21, 2003)
Posted 1:59 p.m.,
December 24, 2003
(#11) -
Tangotiger
Jay:
===================
Suppose a team with Ozzie at SS gives up on average 12 non-HR hits, and 2.6 walks every game (which of course is 27 outs). Applying .50 runs per non-HR hit (I know it should be closer to .57, but I just want to keep it basic), and .30 runs per BB, and -.10 runs per out, and we get 4.08 runs scored per game. And per game, we see that Ozzie's team faces 41.6 batters (again, let's not worry about DPs, etc).
Now, let's say Ozzie was traded for Spike, and let's say for every 41.6 batters faced, there is one ball that Ozzie gets to that Spike doesn't. So, for those 41.6 batters, Spike's team records 13 non-HR hits (1 more than Oz), 2.6 walks, and 26 outs (1 less than Oz). However, there's still one more out to go! Since Spike's team gives up 13 non-HR hits / 26 outs, we can estimate that this team will give up 13.5 non-HR hits, 2.7 walks, and 27 outs per game (a total of 43.2 batters, a remarkable 1.6 MORE batters than Oz). Anyway, applying our LW constants, and we see that Spike's team gives up 4.86 runs per game.
This number is .78 runs MORE than Ozzie. This is the result of Ozzie getting to one more hit than Spike. .50 runs for the hit, and about .30 runs for the out gives you the .80 runs.
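The Ozzie/Spike example can be sketched in Python as follows. It uses the deliberately simplified run values from the post (.50 per non-HR hit, .30 per walk, -.10 per out); the function names are made up for illustration.

```python
RUNS_HIT, RUNS_BB, RUNS_OUT = 0.50, 0.30, -0.10

def runs_per_game(hits, walks, outs=27):
    """Linear-weights runs allowed for one 27-out game."""
    return hits * RUNS_HIT + walks * RUNS_BB + outs * RUNS_OUT

def scale_to_full_game(hits, walks, outs):
    """Stretch a partial line so that it covers 27 outs."""
    factor = 27 / outs
    return hits * factor, walks * factor

ozzie = runs_per_game(12, 2.6)

# Spike turns one of Ozzie's outs into a hit: 13 hits, 2.6 BB, 26 outs
spike_hits, spike_walks = scale_to_full_game(13, 2.6, 26)
spike = runs_per_game(spike_hits, spike_walks)

print(round(ozzie, 2), round(spike, 2), round(spike - ozzie, 2))
# 4.08 4.86 0.78
print(round(spike_hits + spike_walks + 27, 1))   # 43.2 batters per game
```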
UZR, 2000-2003 (December 21, 2003)
Posted 3:37 p.m.,
December 28, 2003
(#14) -
tangotiger
Joel,
Thanks for taking the time to drop in. Not sure why you would be timid. Hundreds of different people have posted here, and only a few of them have mysteriously disappeared.
Your point about positioning is definitely valid. How much impact it has though needs to be established.
And, would this affect only one player on the whole team? Looking at the team-level UZR for Bos/2003, and we have:
CF: +11
RF: +7
1B: +1
SS: -2
3B: -7
2B: -26
LF: -27
I'd have to ask why would we think that the 2B were not being positioned properly, but the SS were. Prior to 2003, Nomar and the other Bos SS were pretty average, and this average performance was repeated again in 2003.
The LF in Boston, from 1999-2003 were:
Ramirez: -22 / 162 GP
O'Leary: +16 / 162 GP
Rest: -6 / 162 GP
(Each of the 3 above had at least 200 GP).
So, again, do we want to blame bad positioning for Todd Walker, but bad talent for Manny? Todd Walker, from 99-03, was -10 per 162 GP. Walker is simply not a good fielder.
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 8:28 p.m.,
December 21, 2003
(#4) -
tangotiger
I don't think it'll change much, but we'll see.
As for its implications, for the most part, you do have the best fielders at the top needing to play in one of the 4 top fielding positions.
There might be some oddball players where they have such a specific skillset, that they can't leverage those skills.
To whoever follows Jenkins: please give us a scouting report.
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 10:10 p.m.,
December 21, 2003
(#7) -
tangotiger
There's only a 4 or 5 run advantage between SS and 2b/3b. However, there is a familiarity disadvantage of moving a 2b to SS, and so you might not make up for all that.
There's also this thing about leveraging skills. If the 2b just doesn't have the arm, the conversion rate would be worse from 2b to ss. That is, if the only difference is the arm, and that's the reason for the 4 run difference, a 2b with a really bad arm might have an 8 run difference, and a 2b with a great arm would have no difference.
Certainly, Jenkins should be considered for CF, but, you'd really have to know how fast he is. Maybe the LF/CF need a quick first step, but the CF needs to be extra fast as well.
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 8:08 a.m.,
December 22, 2003
(#13) -
tangotiger
AED,
Actually, I only work on UZR runs per play. I agree that the chart is a little confusing when you look at it. I'll update it a little later, as well as a step-by-step so that you can see what and why I'm doing what I'm doing.
I'm also working on better positional adjustments, rather than those off-the-cuff ones I'm using.
MGL,
Read the other article I brought forward on multiple positions. I gave a perfect example of moving Andruw Jones to LF and Chipper Jones to CF, and the effect that would have.
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 11:15 a.m.,
December 22, 2003
(#15) -
tangotiger
I added a whole new section in my comments. Please go to the top of this page.
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 12:50 p.m.,
December 22, 2003
(#17) -
tangotiger
I guess I don't like it, because there's a lot of work left to do!
I added another revision. Go to top of page, and look for it.
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 4:50 p.m.,
December 22, 2003
(#20) -
tangotiger
I have this lying around:
Pos LWTS
ss -13
c -10
2b -6
cf -1
3b 0
lf 7
rf 9
1b 17
Off LWTS by position, both leagues, 1989-2001.
Adding the fielding chart I just put up:
+7: SS
+5: CF
+2: 2B
+1: 3B
-3: LF,RF
-9: 1B
and we get, overall OFF+DEF:
Pos LWTS
ss -6
2b -4
3b +1
cf +4
lf +4
rf +4
1b +8
(Doesn't add up because of catching.)
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 11:40 p.m.,
December 22, 2003
(#21) -
tangotiger
(homepage)
The above homepage link contains the Excel workbook that goes through in step-by-step detail as to how I did all this. (Numbers will be different, as I changed things slightly.)
I used Erstad as the example. Fields in green are given by MGL. Fields in red are set by me. I've tried to make it easy to follow. Let me know if there are any questions.
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 11:26 a.m.,
December 23, 2003
(#22) -
tangotiger
The year-to-year r for players with at least 300 BIP was .54 (average of 510 BIP).
My initial regression towards the mean equation for UZR is:
Step 1: rr = 430/BIP
Step 2: regression towards the mean = rr / (1+rr)
So, if the fielder has 510 BIP, rr=.84, and regression = .46
For a fielder with 1500 BIP, the regression would be 22% towards the mean. So, if Erstad is showing +40, he's probably actually +31.
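A small Python sketch of the two-step equation above. The 430 constant comes from the post; the function names are hypothetical.

```python
def uzr_regression(bip, k=430):
    """Fraction to regress observed UZR towards the mean."""
    rr = k / bip              # step 1
    return rr / (1 + rr)      # step 2, algebraically the same as k / (k + bip)

def regressed_uzr(observed_runs, bip):
    """Observed UZR pulled towards a mean of zero."""
    return observed_runs * (1 - uzr_regression(bip))

print(round(uzr_regression(510), 2))      # 0.46
print(round(uzr_regression(1500), 2))     # 0.22
print(round(regressed_uzr(40, 1500)))     # 31
```

Note that at 510 BIP the amount kept, 1 - .46 = .54, lines up with the year-to-year r of .54 quoted above.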
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 1:09 p.m.,
December 23, 2003
(#24) -
tangotiger
The r would be at .95 or .99 for those hitting measures. That's because they are really measuring it the same way (the differences only affect a handful of players, like Bonds.)
I'm not sure what should happen if you regress say Off LWTS by 20%, and UZR by 50% and add it up. That is, what's the new confidence interval?
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 1:32 p.m.,
December 23, 2003
(#26) -
tangotiger
Charlie, this is true. However, the effect is probably on the order of .05 runs or so.
That is, if a single/out is worth around .75 runs, and the double/out is worth 1.05 runs, let's say the following:
of hits saved by IF: 90% are singles
of hits saved by OF: 70% are singles
So, the weighted average comes in at .78 for the IF and .84 for the OF.
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 2:08 p.m.,
December 23, 2003
(#28) -
tangotiger
Charlie, no, those numbers were just for illustration. I'm just pointing out that the impact is rather contained. Perhaps we are talking about .10 runs per play, or 5 runs difference over a season between the IF and OF.
If I had the more granular data, I would definitely do as you suggest.
UZR, 2000-2003, Adjusted by Difficulty of Position (December 21, 2003)
Posted 12:04 p.m.,
December 24, 2003
(#33) -
tangotiger
I don't think you can further regress.
If Erstad is +31, with a 95% interval of +/- 6 runs on fielding, and he's -5, with a 95% interval of +/- 2 runs on hitting, he now becomes +26 with a 95% interval of ???? runs overall.
Assuming his BIP is 1500 on fielding, and his PAs are 2500 on hitting, maybe it's a simple matter of weighting the interval like that, so you get: (1500*6 + 2500*2) / 4000 = 3.5 runs.
So, maybe ???? is +/- 3.5?
I'll leave this to the stats-savvy to comment on.
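For comparison, here is a hedged Python sketch of two ways to combine the intervals. The first is the sample-size weighting guessed at in the post. The second is not from the post: it is the textbook rule for the sum of two independent, roughly normal errors, where the interval half-widths add in quadrature. Whether the hitting and fielding errors really are independent is an assumption.

```python
import math

def weighted_half_width(half_widths, weights):
    """The post's guess: average the half-widths, weighted by sample size."""
    total = sum(w * n for w, n in zip(half_widths, weights))
    return total / sum(weights)

def quadrature_half_width(half_widths):
    """Independent errors: variances add, so half-widths add in quadrature."""
    return math.sqrt(sum(w ** 2 for w in half_widths))

print(weighted_half_width([6, 2], [1500, 2500]))    # 3.5
print(round(quadrature_half_width([6, 2]), 1))      # 6.3
```

Under the independence assumption the combined interval comes out wider than either piece, not narrower, so the +26 would carry roughly +/- 6.3 runs.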
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 1:12 p.m.,
December 23, 2003
(#1) -
tangotiger
The "r" is the regression towards the mean factor for that player.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:16 p.m.,
December 23, 2003
(#2) -
tangotiger
Here's how ESPN/STATS sees some of the top UZR fielders:
From ESPN/STATS
Darin Erstad: Erstad, the fastest player on the team, led the Angels' relentless charge on the bases. He races from first base to third as well as anyone in the league. His spectacular defensive style-diving to the ground and slamming into outfield walls-makes him susceptible to injury; he missed seven games after suffering a concussion last year.
Mike Cameron: In the field, there's nothing Cameron can't do, whether it's turning his back to a drive and chasing it down, climbing a wall to take away a home run, or outrunning a drive in the gap.
Geoff Jenkins: ...no one ever figured he'd develop into one of the better defensive left fielders in baseball. But that's what he's made himself into, by getting good jumps and going hard for everything that comes anywhere near his territory. He had fairly good speed and ran the bases aggressively before his ankle injury...
Aaron Rowand: He doesn't have great range but is fearless chasing down balls.
Seems that all these players have this "extra drive" that might make them go from being good fielders to great fielders. This "extra drive" might not necessarily translate itself into other positions.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 7:10 p.m.,
December 23, 2003
(#5) -
Tangotiger
There are several hundred players on that list. I expect that there should be tens of players that don't make sense (as you would expect from any list where the year-to-year r is .50).
Having said that, I'd like to hear from more people as to where they think their players should be. For example, I know that the Batters Box fans love Vernon Wells, but I think his fielding True Talent number is rather unremarkable.
Try to give them a number. For example, if Lofton has a true talent of +10, should he be +5, +15, -5 in your view? Exactly who is it that you have seen say for 20 games that surprises you?
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 9:42 a.m.,
December 24, 2003
(#10) -
tangotiger
I've seen quite a bit of Nick Johnson, and I don't know who it is that says he's a good fielder.... he's horrible. The number of times he's not positioned well, or doesn't know what to do with a ball on a bunt, it's just staggering. He's a good athlete, so you hope that'll translate itself. But, I think that, as a fielder, he is completely lost.
Reyes is probably small sample, so the verdict is out.
I've also seen a lot of Matsui, and I am surprised by how low he is. He seems like a very smart fielder, but it's hard to measure his quickness unless you watch him from the crack of the bat. When your frame of reference is Bernie, it's easy to look good.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 3:26 p.m.,
January 12, 2004
(#11) -
tangotiger
Erstad to 1B
I've got Erstad's true talent level at +34 runs per 600 BIP at a neutral fielding position.
Jose Guillen is +10.
Vlad is +5.
Anderson is -2.
Salmon is -11.
So, what you want is Erstad in CF, Guillen in LF, Vlad in RF, Anderson at 1B and Salmon DH.
The position-specific UZR would work out to:
Erstad +32
Guillen +11
Vlad +7
Anderson +7
Salmon 0
TOTAL +57
Moving Guillen to CF, Anderson to LF, and Erstad to 1B and we get:
Erstad +37
Guillen +6
Vlad +7
Anderson +1
Salmon 0
TOTAL +51
So, it seems that Erstad at 1B won't be that bad a move. While Erstad is +32 relative to an average CF, he would be +37 relative to an average 1B. How does that work out?
Well, Erstad is +34 relative to an average fielder at a neutral position, with 600 BIP. In CF, there'd be 4x162 BIP, and at 1B, there'd be 3x162 BIP. So, Erstad would be +37 in CF, relative to an average fielder (not average CF), and +28 at 1B. So, moving Erstad to 1B really costs the Angels only 9 runs.
(This assumes that the translation, which works for the average fielder, happens to work for Erstad as well. This might be a huge problem here.)
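A minimal Python sketch of the scaling used here, assuming (as the post does) about 4 BIP per game in CF and 3 per game at 1B over 162 games. The function name is made up, and the lineup totals simply re-add the position-specific numbers listed above.

```python
def runs_at_position(true_talent_per_600, bip_per_game, games=162):
    """Scale a per-600-BIP true talent figure to a season at one position."""
    return true_talent_per_600 * bip_per_game * games / 600

erstad_cf = runs_at_position(34, 4)   # 648 BIP
erstad_1b = runs_at_position(34, 3)   # 486 BIP
print(round(erstad_cf), round(erstad_1b), round(erstad_cf - erstad_1b))
# 37 28 9

# Position-specific UZR totals for the two alignments listed above
erstad_in_cf = [32, 11, 7, 7, 0]   # Erstad, Guillen, Vlad, Anderson, Salmon
erstad_at_1b = [37, 6, 7, 1, 0]
print(sum(erstad_in_cf), sum(erstad_at_1b))   # 57 51
```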
Leveraging Guillen in CF (a slightly good move) and we've only got a 6 run loss with Erstad at 1B.
I'm a little skeptical though.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 11:39 p.m.,
January 12, 2004
(#19) -
Tangotiger
Plus, and this is a big plus, I don't know why Tango didn't regess those 4-year UZR stats to reflect true ability before doing the calculations
MGL, MGL, MGL... you must be the absolute worst reader I've ever met (to go with your memory)! The title of this article is "True Talent Fielding...". I also spell out how I did the weightings in the article, and I provided a complete spreadsheet example. And I also have a column that specified how much I regressed it.
To recap:
- title of article
- weightings of seasons
- regression towards the mean equation
- regression value for EACH player
- complete example of regression
I'm not sure what more I needed to do!
I actually should have added an age adjustment.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 11:58 p.m.,
January 12, 2004
(#20) -
Tangotiger
I wouldn't be surprised that for 1B that it would be hard to leverage your skills to the point that you can be better than +20. If that is Erstad's over/under, then that drops another 17 runs, for a total of -23 runs by putting him at 1B, compared to what a healthy Erstad delivers.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 11:18 a.m.,
January 13, 2004
(#22) -
tangotiger
Please note that in 1999 through 2002, Erstad has 93 "UZR games" at 1B. (And Stoneman et al "know" what's best? It's like putting Ozzie at 1B for 93 games.... ridiculous.) He was +8 runs in that time span, or +14 runs per 162 GP.
I think Helton is +16 per 162 GP. So, perhaps +20 is probably around the best that you can hope for in a 1B.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 11:50 a.m.,
January 13, 2004
(#23) -
tangotiger
Finally, I looked at all players who:
- have a true talent level of above 0 and
- played 1B but
- not as their primary position
With 572 games, they had an average UZR at 1B of +12.5 above the average 1B.
Their "true talent" average UZR of these guys was +6 per 600 BIP, which translates itself to +14 relative to the avg 1B according to my adjustment factors. Pretty close to what they actually got.
So, maybe that's as high as the translation system works. That you can cap somebody at around +10 true talent per 600 BIP, which translates to +17.
On the flip side, I looked at all players
- have a true talent level of below -13 and
- played 1B but
- not as their primary position
With 609 games (equivalent to the above), these guys were -11 runs at 1B, relative to the avg 1B. Their true talent UZR was -18 runs per 600 BIP, which translates to -6 runs at 1B, relative to the avg 1B. So, they were actually a little worse, which may mean that there's a certain "familiarity" factor that a poor fielder has that a good fielder doesn't.
I can look at this all day, and I will, if MGL ever produces UZR back to 1989 (which I think is the first year Retro has almost complete locations for PBP).
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 12:10 p.m.,
January 13, 2004
(#25) -
tangotiger
I updated the main article to highlight the player's primary position.
You will notice the large number of CF, and SS are a ways down. This is only because of the adjustment factors I use, which may or may not be accurate. A lot of it has been explained already, so please reread those posts.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 12:24 p.m.,
January 13, 2004
(#27) -
tangotiger
I was thinking about adding extra parameters to do the comparisons.
For example, the diff between a 1B and 3B is really just the arm. But, between 1B and CF, it's a host of things.
2B/SS is the arm
SS/3B is the speed
2B/3B is the arm and speed.
LF/RF is the same (since UZR tracks the arm, i.e. throwing out runners on base, separately).
CF/(LF or RF) is the speed
So, just starting off with those categories (speed and arm) might be a good first step.
It would also be interesting to know if a good fielder has less need for familiarity than a poor fielder.
And, if some position needs more familiarity involved. I kinda think that all the positions would be the same, but maybe you need a bit more in the IF, since that involves 2 things (catching and throwing) as opposed to the OF (throwing).
But, MGL, I need more data!!
Btw, more important than all this is if MGL implements PZR. You guys should be starting a petition for that. THAT will be the culmination of DIPS and UZR, the 2 biggest advances in the last few years.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 1:28 p.m.,
January 13, 2004
(#29) -
tangotiger
Actually, I think all of what you are saying is important.
You have:
a - time of ball to you
b - how much you have to move to get the ball
c - how much time it takes to throw the ball
All numbers ONLY for illustration
So, looking at the 4 IF positions (mean, sigma):
a)
1B,3B (2 sec, 0.5 sec)
2B,SS (2.5 sec, 0.7 sec)
b)
1B,3B (0.5 sec, 0.2 sec)
2B,SS (0.8 sec, 0.4 sec)
c)
1B (0.1 sec, 0.05 sec)
2B (0.3 sec, 0.05 sec)
3B (0.5 sec, 0.10 sec)
SS (0.6 sec, 0.15 sec)
So, when making the 2B/3B translation, you have to figure out where you fit in each of the distributions.
I think it would be a rather interesting (and rather monumental) work to model baseball as we really see it, and plug the various tools of the players as we think we see them (i.e., the best Earl Weaver baseball game you can do), and see if you can get the results to match the observed.
Then, it should be a rather simple task to then move these players to different positions, since you now know their specific tool set, and what each position's tool set requirement is.
What I am doing with my article is less than 1% of what I should be doing.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:02 p.m.,
January 13, 2004
(#31) -
tangotiger
(homepage)
The other thread has been brought forward.
Aaron, at the above link, also looks at Erstad at 1B.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:42 p.m.,
January 13, 2004
(#33) -
tangotiger
I think the Erstad issue is really just one of leverage of skills.
Say that Hubie Raines, the most average of all fielders, plays any position, and he always converts 70% of all balls in his zone of responsibility. (An avg 1B say will convert say 67% and an avg SS say will convert 73%, etc, etc.)
Darin Erstad, when playing CF, converts 77% of all balls in his zone of responsibility. Now, the way I do my translations, I assume that he will convert 77% of all balls in zone of responsibility at every single position.
But, because of the tools he leverages at CF, this statement, while true for Hubie Raines, is not true for players of a "weird" profile like Erstad or Ozzie.
So, we may find that Erstad, because of his tool set, will only convert 74% of balls at 1B, and 76% in LF, and 73% at SS, etc, etc. That Erstad's tool set is being completely leveraged and maxed out at CF at 77%.
A player like Clay Bellinger, because he has a slightly above average profile across the board, might convert 72% of his balls in zone of responsibility at each and every position.
So, the key is to try to establish what those tools are, and how those tools are leveraged at each position. Speed and "ball tracking" can be leveraged the most at CF and the least at 1B. So, if Erstad can add +.04 to his tool set in CF, he might be a say "73%" true talent player, but because of his profile, he might be 77% at CF and 73% at 1B.
Hope all that was clear...
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:52 p.m.,
January 13, 2004
(#34) -
tangotiger
To try to expand, what you should end up having is:
Erstad's speed, relative to Hubie Raines, in catching balls: +.04 / BIP
Leveraged Index of speed in catching balls at each position:
1B: 0.1
2B: 0.6
SS: 0.8
3B: 0.4
LF: 0.7
CF: 1.0
RF: 0.7
So, to figure out how much impact Erstad's speed has on catching balls at each position, you multiply the above LI by +.04/BIP.
So, Erstad's speed in CF is worth +.04, but at 1B, it's only worth +.004.
Then, you can come up with "instincts". Erstad's instincts say are worth +.03 relative to Hubie Raines.
Let's come up with an LI for instincts
Leveraged Index of instincts in catching balls at each position:
1B: 0.8
2B: 0.5
SS: 0.5
3B: 1.0
LF: 0.7
CF: 0.7
RF: 0.7
So, maybe instincts are more important at other positions, and you come up with new numbers for that.
You do the same for arm strength, and a whole bunch of other tools that you can think of.
THIS is how I would do the translation for fielding.
(All numbers only for illustration.)
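The proposed translation can be sketched in Python like this. All numbers are the illustrative ones from the post. Adding the tools together to get a per-position total is an assumption made here for the sketch; the post only says to repeat the process for each tool.

```python
# Illustrative leverage of each tool by position (1.0 = most leveraged)
SPEED_LI = {"1B": 0.1, "2B": 0.6, "SS": 0.8, "3B": 0.4,
            "LF": 0.7, "CF": 1.0, "RF": 0.7}
INSTINCTS_LI = {"1B": 0.8, "2B": 0.5, "SS": 0.5, "3B": 1.0,
                "LF": 0.7, "CF": 0.7, "RF": 0.7}

def tool_value(edge_per_bip, leverage):
    """Value of one tool at each position: edge over average times LI."""
    return {pos: edge_per_bip * li for pos, li in leverage.items()}

# Erstad relative to the perfectly average fielder, per BIP
speed = tool_value(0.04, SPEED_LI)
instincts = tool_value(0.03, INSTINCTS_LI)

# Assumption: tools combine additively
total = {pos: speed[pos] + instincts[pos] for pos in SPEED_LI}

for pos in SPEED_LI:
    print(pos, round(speed[pos], 3), round(instincts[pos], 3),
          round(total[pos], 3))
```

With these made-up inputs, speed is worth +.040 per BIP in CF and +.004 at 1B, matching the figures above.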
So, if we want to get started, start jotting down specific tools that we want to look at, and the LI that you think it should be at every position (max it out to 1.0 for the top position).
Ready, set...
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:58 p.m.,
January 13, 2004
(#36) -
tangotiger
In fact, Darin Erstad is about as far from being the average player who gets moved from the OF to first base as he could possible be
Which is why it is so shocking at the amount of time that he has ALREADY put in at 1B. And this was in his 20s.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 3:10 p.m.,
January 13, 2004
(#39) -
tangotiger
MGL, I agree, which is why I said to break it up as I did (you probably posted at the same time as I did).
By the way, I just noticed that Jeff Davanon is one of the best fielders in the league (similar to Shinjo from his numbers). He's an average hitter. This would make him worth around +20 runs above average per season or +40 runs above replacement, or worth about 8 million bucks per year. So, why is this guy 30 years old with only 500 career PAs?
So, my recommendation is to get your scouts on Geoff Jenkins and Jeff Davanon.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 3:20 p.m.,
January 13, 2004
(#42) -
tangotiger
If I had to pick btw Erstad and Sheffield to be my shortstop tomorrow I'd take Sheffield b/c he was once a SS.
No way, not for me.
If both players had the rest of the offseason and spring training to work on it I'd be tempted to go with Erstad but not so sure
This one is even easier. Hubie Brooks, the 80s Derek Jeter, played SS in college. It didn't matter. Giving a guy the entire offseason and spring training is enough for me to choose Erstad over Sheffield. Heck, Vlad over Sheffield. How low can I go?
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 4:17 p.m.,
January 13, 2004
(#47) -
tangotiger
J, great list!! Unless someone beats me to it, we can try to generate some LI for each of those.
Then, we can try to discuss a few players that we've seen, like Erstad, Cameron, Jeter, etc. At the very least, we should end up getting EXACTLY what UZR says for their positions.
Afterwards, we can transform these players into other positions to see how they look there. This might be a fun exercise.
The only players I can comment on are Mets/Yanks players, and a few Expos players.
Assuming he's healthy, how long would you think Erstad needs to play SS to be better than Jeter? better than the average SS? Would he ever be an average SS?
better than Jeter: spring training
better than avg SS: probably under a year, but let's see after we do the transformation above
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 4:19 p.m.,
January 13, 2004
(#48) -
tangotiger
Lefty: hmmmm... that's a great point! That should be part of the transformation as well... handedness!
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 4:34 p.m.,
January 13, 2004
(#51) -
tangotiger
(homepage)
See above link for Fielding LI, using Cross's list.
I removed instinct, cause I thought it was captured in 2 other entries.
Also, "arm strength" I'm only considering with respect to getting the BATTER out (or keeping him to a single), and NOT for getting runners already on base out at 3B or home.
I'll add handedness as well.
Comments?
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 4:47 p.m.,
January 13, 2004
(#52) -
tangotiger
(homepage)
I updated the file with a new name.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 10:11 p.m.,
January 13, 2004
(#56) -
Tangotiger
The scale should be extra outs per BIP compared to Hubie Raines.
So, if Jeter's arm is +.02 outs per BIP, and the LI for arm strength for a SS is .7, then his extra value is +.014. (I see what you are saying, that we need another LI to compare all the tools to themselves.)
You do this for all the categories, and the sum of all these things will give you his UZR relative to Hubie Raines. In Jeter's case, I think that works out to about -.04 outs per BIP.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 9:58 a.m.,
February 16, 2004
(#60) -
tangotiger
Without speaking about ARod or Jeter specifically, players of their overall ability (as opposed to their specific toolset) would have the following impact fielding-wise on their team:
ARod SS / Jeter 3B: -7 runs
ARod 3B / Jeter SS: -13 runs
Putting out a suboptimal fielding configuration of players of this caliber will cost a team 6 runs.
If you think that 6 runs is "no big deal", then you have really come to the wrong place.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 10:59 a.m.,
February 16, 2004
(#61) -
tangotiger
For those thinking that the change should be even larger, remember that Jeter has to play *somewhere*. This is just like creating an optimal batting order. You can try to hide ReyRey at the bottom of the order, but he's still going to have over 10% of the PAs (as opposed to 11%).
Same thing here. The SS is involved in 5 plays a game while the 3B is involved in 4. What you need is to put Jeter at LF, RF, 1B (3 plays each), or DH (0 plays).
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 11:45 a.m.,
February 16, 2004
(#62) -
tangotiger
I posted this at fanhome.
=====================================
ARod is 30 runs better than Jeter at SS.
ARod is 24 runs better than Jeter at 3B.
Jeter has to play SOMEWHERE. So, that's a 6 run gain by putting ARod at SS and Jeter at 3B (rather than vice-versa).
You can think of batting orders the same way. You might say that Barry Bonds is 100 runs better than ReyRey at #2 and that Barry Bonds is 85 runs better than ReyRey at #8. So, putting Bonds at #2 and ReyRey at #8 will be 15 runs better than the reverse.
This extends into parks as well. Piazza may be +40 runs compared to an average hitter at Shea, but +30 runs compared to that same hitter at Coors. So, Piazza would be more efficient at Shea (numbers for illustration only).
Or Larry Walker may be +30 runs compared to an average hitter against Randy Johnson, and +40 runs compared to an average hitter against Andy Pettite. So, Walker, in this case, can't take advantage of RJ to the extent that he can against Pettite. (Numbers for illustration only.)
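The arithmetic behind the examples above can be sketched as follows. This assumes the only thing that matters is how much better the stronger player is than the weaker one in each slot; `gain_from_assignment` is a made-up helper name, and the inputs are the illustrative figures from this post.

```python
def gain_from_assignment(edge_in_slot_a, edge_in_slot_b):
    """Runs gained by giving slot A (rather than slot B) to the stronger player.

    edge_in_slot_x is how many runs better the stronger player is than the
    weaker one when both are compared in slot X. Since the weaker player has
    to go in the other slot, only the difference between the two edges matters.
    """
    return edge_in_slot_a - edge_in_slot_b


# Fielding: 30 runs better at SS, 24 runs better at 3B.
print(gain_from_assignment(30, 24))

# Batting order: 100 runs better at #2, 85 runs better at #8.
print(gain_from_assignment(100, 85))
```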
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:40 p.m.,
February 16, 2004
(#64) -
tangotiger
I agree that we can't look at these two guys in particular. Just generally, if you have two guys who happen to be a 30-run difference at SS, they would be a 24-run difference at 3B.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 10:50 a.m.,
February 17, 2004
(#66) -
tangotiger
As noted in the article, you can't take it too far, which is why I'm trying to distance myself from comparing ARod and Jeter specifically.
Given the average player who has played SS and 3B from 1999-2003, my conversions are what they are. I should really look at primary SS going to 3B. And, a good SS and a bad SS making that move. Given the lack of data points for such a small period of time (99-03), the confidence level will go down the tubes.
How this applies to ARod/Jeter specifically is anyone's guess.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 10:56 a.m.,
February 17, 2004
(#67) -
tangotiger
(homepage)
Please read the above homepage link for more analysis on players switching between SS/3B.
True Talent Fielding Level, 1999-2003
Adjusted by Difficulty of Position, and
Extracted to All Positions (December 23, 2003)
Posted 2:43 p.m.,
February 17, 2004
(#69) -
tangotiger
I don't have the necessary data to say either way.
The skillset required between CF/LF/RF is the strongest in terms of similarity. So, we are on far better footing to say that the translations we have in the OF are going to be right.
If the reason that your +11 SS is a plus 11 is because of his blazing speed, then he won't be able to leverage that so much at 3B. If the reason that your -9 SS is a minus 9 is because he is so incredibly slow, then he'll be able to unleverage all that at 3B.
It's very possible that the translations would work as you show them, if the profile of the players that make up your left column is far different than what is required in your right column.
Super-lwts previews - Baserunning (December 23, 2003)
Discussion Thread
Posted 9:48 a.m.,
December 24, 2003
(#4) -
tangotiger
MGL: why not present the intermediate data, so that we can try to make sense of it? There might be a systematic bias, say that Edmonds moves ALL his runners over more than average. This is another one of those "matchup" type things, like I did with the catcher/pitcher.
In any case, what I'd like to see is:
runnerid,batterid,startbase,startouts,finalbase,gridLocationHit,GBorFB,1BorXBH,n
For Pujols, and league average.
Batting average on balls in play, ground balls and other such beasts (December 24, 2003)
Posted 1:52 p.m.,
December 24, 2003
(#4) -
Tangotiger
David: yes.
Voice: good point. We should also include handedness.
The Base on Balls (December 24, 2003)
Posted 3:37 p.m.,
December 25, 2003
(#3) -
Tangotiger
I finally got a chance to read this fully. A sharp-minded man, this guy was.
Home run: 100.0%
Triple: 74.1%
Double: 50.6%
Single: 29.4%
Base on balls: 16.4%
Multiply by 1.55 all the way, and you get:
Home run: 1.55 runs
Triple: 1.15 runs
Double: .79 runs
Single: .46 runs
Base on balls: .25 runs
These numbers would be similar if you looked at R+RBI-HR. That is, a HR will score himself, plus 0.6 runners on base. A double would score himself .43 times, plus another .42 runners. A walk would score himself .26 times, plus .02 other runners on base (i.e., bases loaded).
This would be the first step towards understanding Linear Weights. Hard to believe it took so long before this finally took hold.
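As a rough sketch of the scaling step above: take each event's value relative to the home run and multiply by 1.55 runs. The dictionary and constant names are made up for this example, and the last digit can differ slightly from the figures in the post depending on rounding.

```python
# Value of each event relative to a home run (from the list in the post).
relative_value = {
    "Home run": 1.000,
    "Triple": 0.741,
    "Double": 0.506,
    "Single": 0.294,
    "Base on balls": 0.164,
}

HR_RUNS = 1.55  # run value assigned to the home run

# Scale every relative value into runs.
run_values = {event: share * HR_RUNS for event, share in relative_value.items()}

for event, runs in run_values.items():
    print(f"{event}: {runs:.2f} runs")
```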
The Base on Balls (December 24, 2003)
Posted 11:00 p.m.,
December 25, 2003
(#6) -
Tangotiger
I particularly liked the approach that pitchers had already settled on back then, a forerunner to DIPS: don't walk the batter, and take your chances with him hitting it to your fielders. Jamie Moyer and David Wells would have fit in perfectly back then.
BGIM : Maximum Likelihood Estimation Primer (December 26, 2003)
Discussion Thread
Posted 11:47 p.m.,
January 2, 2004
(#1) -
Tangotiger
I'm hoping some of the usual stat-savvy suspects chime in here:
Let's say that it's a given that I know the true talent distribution of players in a league: a mean OBA of .340 with 1 standard deviation = .040.
Now, you have a player who has a .440 OBA in 600 PA, and I ask the question: what's his most likely true talent OBA?
I can think of 2 ways to find that answer:
1 - Find out the sample observed OBA of all the players, and establish the spread of those samples, compare it to the true known, and regress such that true SD ^ 2 + luck SD ^ 2 = observed SD ^ 2
2 - Choose the a priori value throughout the true known distribution, and come up with a best guess true OBA based on the observed of .440 over 600 PA (say with a 95% confidence level). Take the weighted average of the results based on the true known distribution. (Is this known as MLE?)
Is what I am saying making sense? If #2 is correct, how do you set up your equations to find the answer? And are the results of #2 going to be close to #1?
BGIM : Maximum Likelihood Estimation Primer (December 26, 2003)
Posted 5:15 p.m.,
January 5, 2004
(#3) -
tangotiger
AED, this is great stuff! I'm not sure where you get all your numbers, but walking through it, here's what I've got:
Taking 1 / SD^2 as the weight, we end up weighting the population mean by 26% and the observed mean by 74%. That gets us to .414. The standard deviation would be sqrt(.414*.586/600) = .020.
I get where you get this:
x^264 * (1-x)^336
where 264 = 600*.44, and 336 = 600*.56
but I don't get where the standard deviation of .0237 comes from.
Thanks again... it's been very enlightening!
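The weighting walked through above can be sketched like this, assuming each mean is weighted by the inverse of its squared standard deviation. The .0237 figure is taken as given from the thread (where it comes from is exactly the open question in this post), and `precision_weighted_mean` is a made-up name for this example.

```python
def precision_weighted_mean(prior_mean, prior_sd, obs_mean, obs_sd):
    """Combine a population mean and an observed mean, weighting each by 1 / SD^2.

    Returns (share of the weight given to the population mean, combined estimate).
    """
    w_prior = 1 / prior_sd ** 2
    w_obs = 1 / obs_sd ** 2
    prior_share = w_prior / (w_prior + w_obs)
    estimate = prior_share * prior_mean + (1 - prior_share) * obs_mean
    return prior_share, estimate


# League: .340 with SD .040. Player: .440 observed, SD .0237 as quoted in the thread.
share, estimate = precision_weighted_mean(0.340, 0.040, 0.440, 0.0237)
print(f"population weight: {share:.0%}, estimate: {estimate:.3f}")
```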
BGIM : Maximum Likelihood Estimation Primer (December 26, 2003)
Posted 8:26 p.m.,
January 5, 2004
(#5) -
Tangotiger
Numbers for illustration only.
BGIM : Maximum Likelihood Estimation Primer (December 26, 2003)
Posted 9:56 a.m.,
January 6, 2004
(#7) -
tangotiger
This is some good stuff! Thanks again.
I looked at the OBA, and the stdev is around .030. The regression towards the mean equation, to match the above, would be rr=270/PA, r=rr/(1+rr). I've been using 250 instead of 270. Interestingly, when I use .340 as my observed mean, the regression equation would now have 250. In fact, it seems that how much you would regress (to best-fit to the above) is related to how far away from .500 you are (as well as what the population variance is).
In any case, given the league mean of .340, and having an observed OBA of between .250 and .500 (and regardless of PA), the "270" works out to between 210 and 280.
Therefore, if you are looking for a quick regression towards the mean equation, use 250.
A .340 OBA with 250 PAs will give you a standard deviation of .030. This will match the league population standard deviation. Since the two sigmas now match, this means the population mean and the observed mean will be equally weighted. That is, you would regress the observed by 50%.
rr=250/PA, or in this case rr=1
regression rate = rr/(1+rr) = .50
I suppose then it would be rather simple to come up with regression equations for a whole bunch of metrics. Figure out what the population variance is. Figure out how many observed PAs the variance from the binomial would generate. That would be your numerator in the rr equation.
(With all the provisions I've already noted.)
For example, if the league BA is .27 and the sigma is .025, then the numerator would be 316. If the SLG is .4 and the sigma is .04, then the numerator is 153.
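Here is a minimal sketch of the quick regression equation, assuming the numerator is the number of PAs at which binomial luck has the same standard deviation as the population, that is p*(1-p)/sigma^2. The function names are made up for this example. With the inputs above, this formula gives roughly 249 for OBA, about 315 for BA, and 150 for SLG, so it lands close to, but not exactly on, every figure quoted in the post.

```python
def regression_numerator(league_mean, league_sd):
    """PAs at which the binomial luck SD equals the population SD: p*(1-p)/sd^2."""
    return league_mean * (1 - league_mean) / league_sd ** 2


def regression_rate(pa, numerator=250):
    """Share of the weight given to the league mean: rr/(1+rr), with rr = numerator/PA."""
    rr = numerator / pa
    return rr / (1 + rr)


def regressed_estimate(observed, pa, league_mean, numerator=250):
    """Observed rate pulled toward the league mean by the regression rate."""
    r = regression_rate(pa, numerator)
    return r * league_mean + (1 - r) * observed


print(round(regression_numerator(0.340, 0.030)))  # OBA numerator
print(regression_rate(250))                       # regress 50% at 250 PA
print(round(regressed_estimate(0.440, 600, 0.340), 3))
```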
AED, thanks for the sabermetric orgasm!
[Insert component name] Adjustment Factors (December 26, 2003)
Discussion Thread
Posted 5:47 p.m.,
December 26, 2003
(#1) -
tangotiger
(homepage)
The original fanhome discussion can be found here.
[Insert component name] Adjustment Factors (December 26, 2003)
Posted 10:43 p.m.,
December 26, 2003
(#3) -
tangotiger
I can't work at home with my baby... what else am I supposed to do?
Just to clear it up, MGL did not characterize my position on park factors. I hate the way people interpret park factors, not the fact that people use park factors to begin with. The park factors, as currently done, are only a first step. Because of the way they are used, people treat them as the final step.
[Insert component name] Adjustment Factors (December 26, 2003)
Posted 9:12 a.m.,
December 27, 2003
(#5) -
tangotiger
I would use ALL years for a park. For example, the dimensions of Wrigley have not changed, right? Well, go back to the very beginning to establish the Wrigley factor.
However, the types of players HAVE probably changed. As well, the other parks being compared to have changed.
And of course, the weather would apply only to the given year. So, you have to figure out which parks have their climate change the most year-to-year.
There's tons of stuff to consider, which is why I say all this is only a first step.
Park Factor Thoughts (December 27, 2003)
Posted 2:43 p.m.,
December 27, 2003
(#2) -
tangotiger
I figure I've got 4 days to cleanse myself.
Valuing Starters and Relievers (December 27, 2003)
Discussion Thread
Posted 2:41 p.m.,
December 27, 2003
(#3) -
tangotiger
Excellent article, and thoughts. This is the kind of article I like to read, even if it has no accompanying research.
Valuing Starters and Relievers (December 27, 2003)
Posted 11:34 p.m.,
December 27, 2003
(#8) -
tangotiger
a single run given up in a game has the same weight regardless of whether a starter or a reliever allows it.
So? Don't you value an out from a SS different from a 1B? Don't you value the hitting performance of a pitcher different from a RF?
What we are talking about here is the appropriate baseline for comparison purposes.
The specific question being asked is: "How would a MLB average pitcher do, if he had pitched to those 6 batters in relief?" or "How would a MLB average pitcher do, if he had pitched to those 25 batters to start the game?" Because of the very specific question, you are automatically going to get different baseline standards to compare against.
At this point, I'm not even sure what an average MLB pitcher is, because of this starter/reliever issue. If you take all pitchers, and weight them by their actual PAs, but assume they all started, you might get a 5.00 ERA. If you assume they all relieved, you might get a 4.40 ERA. If you then weight them at 2/3, 1/3, you'll get 4.80 ERA. But, the actual MLB ERA might be 4.70 because of the way the pitchers are used individually.
(Numbers for illustration purposes only.)
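The 2/3, 1/3 blend above is just a weighted average, sketched here with the illustrative ERAs from this post. `blended_era` is a made-up name, and the sketch assumes the weights are shares of PAs.

```python
def blended_era(starter_era, reliever_era, starter_share):
    """Weighted average of the as-starter and as-reliever ERAs."""
    return starter_share * starter_era + (1 - starter_share) * reliever_era


# 5.00 if everyone started, 4.40 if everyone relieved, weighted 2/3 and 1/3.
print(round(blended_era(5.00, 4.40, 2 / 3), 2))
```

The gap between this blended figure and the actual league ERA is what comes from pitchers being used individually rather than in a uniform 2/3, 1/3 split.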
Valuing Starters and Relievers (December 27, 2003)
Posted 4:03 p.m.,
December 28, 2003
(#17) -
tangotiger
To confirm MGL: yes, I also found that, looking at "times through the order", a pitcher, be it a career starter, career reliever, or something in between, performed equally well each time through the order (NO dropoff whatsoever). So, knowing you are going to start, you can pace yourself.
However, it's certainly possible that a Billy Wagner has no reason to pace himself, and decides to juice it up 2 MPH per pitch. Why not? The difference in starting/relieving can simply be traced to effort exerted. This may also be the reason that "leveraged innings" for relievers are a lot less than for starters. That is, a starter will have an LI of 1.0 for 230 innings, or 230 leveraged innings, pitching at 95% of maximum. A reliever will have an LI of 2.0 for 90 innings, or 180 leveraged innings, pitching at 99% maximum.
It could very well be that effort x Leverage x Innings works out the same for both starters and (top) relievers.
Valuing Starters and Relievers (December 27, 2003)
Posted 9:26 a.m.,
December 29, 2003
(#22) -
tangotiger
The most important takeaway in Guy's last reply is that, overall, the league average ERA of a reliever is lower than that of a starter. Assuming that the component ERA is lower as well (and I only look at component ERA btw), then that by itself is enough to say that "yes, relievers have an advantage". Why? Because your best reliever is worse than your best starter. Your 2nd best reliever is worse than your 2nd best starter, and so on and so on.
I would guess that if you take out the best reliever (BEFORE THE FACT) and your two best starters (BEFORE THE FACT), THEN, I would say that the remaining relievers and remaining starters would be equals. Just a guess.
And MGL: Guy said that while the relievers' top ERAs were much better than the starters' top ERAs, the relievers' bottom ERAs were ALSO much better than the starters' bottom ERAs. Not sure if he used the appropriate IP cutoffs. In any case, this suggests that:
a) the spread is probably wider as expected, but
b) the distribution is shifted to one side by a large degree (otherwise you would have expected a lot more bad reliever ERAs out there)
However, this last paragraph can be selective sampling. A manager might need (or accept) going to 100 innings with a starter before benching him, and might only accept 25 innings from a reliever at that level.
Valuing Starters and Relievers (December 27, 2003)
Posted 10:13 a.m.,
December 29, 2003
(#24) -
tangotiger
since bullpen pitchers receive higher per-inning evaluation scores than starters
This is not true. The average reliever's Leveraged Index (LI) is 1.0, as it is for the average starter. Obviously, the Riveras and Hoffmans et al are at 1.7 or 1.8, while the Pedros and RJ are still at the 1.0 level.
The LI ONLY serves to establish the relative impact that the relievers have on the game state.
Relievers have it easier, just like catchers have it tougher. I would certainly advocate a different baseline comparison for catchers too.
Valuing Starters and Relievers (December 27, 2003)
Posted 12:42 p.m.,
December 29, 2003
(#27) -
tangotiger
Colin, great thought, and I was thinking of something similar.
Essentially, if Curt is averaging 30 BFP per game, then his replacement's performance should be judged against that. While the "performance in times-through-the-order" is fairly static, this really only applies to the pitchers who were able to pitch through the order 3 times. It's possible that a replacement-level pitcher just is not durable/cunning enough to go through 27 batters, and he might pay the price the third-time through (yes, I will look into this as well).
However, I will disagree with part of this replacement talk. There IS a replacement-level TRUE TALENT pitcher. However, how this pitcher performs in context as a starter, or in context as a reliever is in question. I'm not sure that taking the replacement-level reliever and replacement-level starter, from 2 separate pools, is a good way to equate it.
To take an example, you wouldn't take a replacement-level high school SS and compare that to a replacement-level high school 1B as a way to baseline the SS and 1B positions in high school. The pools of position players are very interchangeable in high school, and they are as well with the roles of MLB pitchers.
Valuing Starters and Relievers (December 27, 2003)
Posted 2:55 p.m.,
December 29, 2003
(#29) -
tangotiger
And when we evaluate starters in comparison to replacement level, the benchmark should be what a replacement-talent pitcher does in a starting role (not mixed with relief performance).
Agreed.
As for Colin's suggestion, this would assume, I think, that the replacement-level reliever is worse than the replacement-level starter (assuming two pools of pitchers).
a better component ERA is required to hold a MLB job in relief than as a fifth starter
Ok, so what do we have? In terms of TRUE TALENT, what would be our best guess as to the order:
1S, 2S, 1R, 3S, 4S, 2R, 5S, 3R, 4R, 5R, 6R
(where S=Starter, R=reliever)
Is this pretty much the "Pitching Spectrum"?
Valuing Starters and Relievers (December 27, 2003)
Posted 3:22 p.m.,
December 29, 2003
(#32) -
tangotiger
What if we find that a player, when placed at catcher, loses 10% of his offensive output? What do you do with this information?
****
Again, we go back to our question. I'm going to make this "Tangotiger Question #1", as I ask this question all the time, and I think it's the most basic question that every fan asks: how would an average player do if placed in this context? This usually leads to "Tangotiger Question #2": how much worse would the team be if a bubble player played instead of the average player?
TQ1: So, I think we can all agree that it is much more likely that an average pitcher would perform better as a reliever than as a starter. So, how would an average pitcher do in Prior's place? How would an average pitcher do in Gagne's place? In Dotel's place?
TQ2: the bubble player will suck more as a starter than as a reliever, but maybe not to the same degree. Maybe, he just sucks. A Billy Wagner, who lives and dies on 100% exertion, might not be as effective as a 95% exerted starter. But, a bubble pitcher, just might not have that much of a discrepancy between starter and reliever... sort of a poor man's Greg Maddux or Jamie Moyer... the type of guys who might be equally effective as a starter or reliever.
So, it might be possible that if your baseline is the bubble pitcher, that a reliever and starter should have the same comparison line. It's just that it's easier to leverage Wagner's skills as a reliever than it would be to leverage Maddux's skills as a starter. (Just like it's better to leverage Cameron's skills as a CF than to leverage Rolen's skills as a 3B, even though if they played at some same neutral position, like 1B, they might be equally effective.)
Valuing Starters and Relievers (December 27, 2003)
Posted 4:00 p.m.,
December 29, 2003
(#38) -
tangotiger
If someone is going to try to do some research on this issue, it is *imperative* that your classification of a starter, reliever, #5 starter, #5/6 reliever is done prior to the season in question (as is any replacement-level issue).
As well, when we talk about "true talent" we are talking about the *expected* performance in a neutral setting. In the case here, a neutral setting might be two-thirds of PAs as a starter, and one-third PAs as a reliever. And *expected* performance would be actual prior performance, but regressed a certain amount.
Valuing Starters and Relievers (December 27, 2003)
Posted 4:13 p.m.,
December 29, 2003
(#40) -
tangotiger
(homepage)
Go to the above link, and read that article. It's the best one-stop-shop place for replacement ideas.
Valuing Starters and Relievers (December 27, 2003)
Posted 10:13 p.m.,
December 30, 2003
(#45) -
Tangotiger
Your first statement:
In 2003 the average Team ERA for Starters Only was 4.55. For Relievers, it was 4.11. So the difference was 0.44. Fairly large, yes, although a long way from 0.6.
is good evidence of what is being said. If my pitching spectrum (post #29) holds, we have to accept that the average starter is a better pitcher than the average reliever. So, at the very least, using your above data, you have to bump up the relievers' ERA by at least .44 just to make them equals. And, a bit more to make sure that the average reliever is worse than the average starter.
Valuing Starters and Relievers (December 27, 2003)
Posted 3:47 p.m.,
December 31, 2003
(#48) -
Tangotiger
I don't see how looking at teams helps in looking at players in this instance, but let's not beat this dying horse.
***
For WPA, it doesn't matter much what you use, because my consideration is that both teams are equally talented at every point in the game. So, if you've got a 5.5 ERA pitcher, I've got one too. And my offense is as bad as your pitching, so that we're always a .500 team in a 5 RPG environment. This is the easy way to do WPA, and one which I do while I continue having a full-time job.
Valuing Starters and Relievers (December 27, 2003)
Posted 10:24 a.m.,
January 2, 2004
(#55) -
tangotiger
I think David meant to say -0.66 and not -.066.
Valuing Starters and Relievers (December 27, 2003)
Posted 1:39 p.m.,
January 4, 2004
(#64) -
tangotiger
both of whom are presumed to be somewhat freely available, almost by definition
I don't get this. The replacement level starter is your #3 or #4 reliever. The replacement level reliever is your best pitcher in the minors, or some crud that was just released.
The value of the avg starter v avg reliever has nothing to do with the IP numbers. The avg starter is simply a better pitcher, on a per PA basis, than the avg reliever.
In the minors, the #5 starter is better than probably even the #1 reliever.
I think a lot of the discussion so far assumes, from both sides, things that the other side doesn't.
Best Fielding Teams, 2003 (December 28, 2003)
Posted 4:06 p.m.,
December 28, 2003
(#1) -
tangotiger
It's nice that MGL, Pinto, I and 8 million NY fans see the Mets and Yanks as 2 horrible fielding teams.
It would be interesting to hear from BAL, COL, and MIN fans.
Best Fielding Teams, 2003 (December 28, 2003)
Posted 4:30 p.m.,
December 30, 2003
(#3) -
tangotiger
UZR and Pinto both use the batted ball speed (3 different speeds).
However, I have no idea as to its reliability, nor to its impact on the ratings.
One of the problems with UZR is that it does so much under the hood, that you have to take it on faith that it's done well. No offense to MGL, who does tremendous work on UZR, but I would want to split up the UZR by the various parameters to see what in the world is happening with each one.
That is, what's the basic UZR using only the zone? Then, using the park. Then, add in the batter hand. Then, add in the gb/fb tendency of the pitcher. Then, the base/out situation. Then, the batted ball speed, etc, etc. Right now, the selling of UZR is based on faith and that the results are somewhat, but not very, surprising. That's good for a lot of people, but not everyone.
Best Fielding Teams, 2003 (December 28, 2003)
Posted 11:04 a.m.,
January 15, 2004
(#5) -
tangotiger
What I did was take each team's actual players, used their "true talent" that I calculated in the other thread, weighted them by the number of BIP they had, and calculated a team's "true talent" fielding level.
Anaheim and Seattle have the best fielding players, while the Yanks, by far, have the worst fielding players (in 2003). You may think the differential between actual UZR and True Talent UZR is "too much". The standard deviation of the differential is 23 runs. Our expectation for what the SD of the differential should have been is 24 runs. (sqrt(.3*.7*27*162)*.8=24)
Take for example the Whitesox fielders. UZR says they were +54, but the true talent of their fielders, weighted by their actual BIP, was -4. By this measure alone, I expect the Whitesox to give up an extra 58 runs next year. (Aside from personnel changes).
The Yanks fielders were extremely unlucky. I expect their fielding runs to improve by 43 runs (had they kept the same personnel).
Essentially, we expect everyone to play up to their true talent levels.
(No age adjustments made)
team actUZR TTruns diff
ANA 40 39 -1
SEA 78 39 -39
SLN 11 30 19
OAK -12 24 36
COL 15 15 0
HOU 21 15 -6
KCA 33 12 -21
MIL -7 9 16
LAN -2 8 10
ATL 9 7 -2
FLO 13 6 -7
CLE 6 5 -1
BAL 17 2 -15
MON 13 2 -11
SDN 28 1 -27
TOR 8 -3 -11
ARI -30 -3 27
CHA 54 -4 -58
CIN 7 -5 -12
BOS -43 -6 37
MIN -17 -9 8
PIT -35 -11 24
TBA 13 -11 -24
PHI 5 -12 -17
CHN 10 -13 -23
TEX -51 -18 33
NYN -34 -19 15
DET -16 -20 -4
SFN -16 -22 -6
NYA -96 -53 43
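As a quick check on the two standard deviations quoted above, here is a sketch that recomputes both from the table. My reading of sqrt(.3*.7*27*162)*.8 is a binomial spread over 27 BIP per game for 162 games at about .8 runs per play; that interpretation is an assumption, not something the post spells out. The population (not sample) standard deviation is used for the diff column, which is also an assumption.

```python
import math
import statistics

# "diff" column from the table (TTruns minus actUZR), in table order.
diffs = [
    -1, -39, 19, 36, 0, -6, -21, 16, 10, -2,
    -7, -1, -15, -11, -27, -11, 27, -58, -12, 37,
    8, 24, -24, -17, -23, 33, 15, -4, -6, 43,
]

# Observed spread of the differential across the 30 teams.
observed_sd = statistics.pstdev(diffs)

# Expected spread from luck alone, using the formula quoted in the post.
expected_sd = math.sqrt(0.3 * 0.7 * 27 * 162) * 0.8

print(f"observed SD: {observed_sd:.1f} runs, expected SD: {expected_sd:.1f} runs")
```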
Best Fielding Teams, 2003 (December 28, 2003)
Posted 10:41 a.m.,
January 29, 2004
(#7) -
tangotiger
Agreed.
You can also break that up into lefty/righty pitchers, and look at the sides of the infield and the sides of the OF.
If you include the GB/FB and LH/RH, you know what you get? You get the 7 fielding positions.
So, to best estimate the impact of fielding on each pitcher, you want to know the ball distribution to the 7 major zones for each pitcher, and then apply the fielder (and park) effect on each pitcher.
Best Fielding Teams, 2003 (December 28, 2003)
Posted 11:13 a.m.,
January 29, 2004
(#8) -
tangotiger
Here is how the 2004 Yanks might look, fielding-wise:
Position Neutral... Position Specific.... Player
+12, +8 CF : Lofton
.0, -2 2B : Soriano
-9, -4 RF : Matsui
-12, -23 SS : Jeter
-12, -1 1B : Giambi
-15, -9 LF : Bernie
-16, -18 3B : Sheffield
---------------------------
-52, -49 : Total
The worst fielding 3B from 1999-2003:
Greg Norton
Aubrey Huff
Travis Fryman
Fernando Tatis
Gary Sheffield would be right in the middle of that.
Really, the Yanks' fielding is a big mess. This is what I think I'd do:
1B - Giambi
2B - Bernie
SS - Soriano
3B - Jeter
LF - Sheffield
CF - Lofton
RF - Matsui
The first important move is Soriano. He's a better fielder than Jeter, and he's quicker. Bernie has a horrible arm, and can't cover the ground. Other than 1B/DH, the only other place to hide those attributes is at 2B. I think Jeter would be more reliable than Sheff at 3B.
This of course assumes that these players had half-a-season to make the switch.
But, this is just a really big mess. No matter what combination you come up with, it would be a bad choice. What I just did was bad. It's the Bad News Yanks. Add in the poor catching of Posada, and you've got 6 of your 8 fielders as way below average.
Best Fielding Teams, 2003 (December 28, 2003)
Posted 12:02 p.m.,
January 29, 2004
(#10) -
tangotiger
Thanks for the kind words!
03 MLE's - MGL (December 28, 2003)
Discussion Thread
Posted 9:33 a.m.,
December 29, 2003
(#16) -
tangotiger
Mickey: you have the MLB ERA average (or is that RA) as 4.00, and you regress your pitchers towards 5.00. Then, you state that your hitters have a league average 5.00 RPG. I think it gets confusing, and you should settle on a common baseline for both hitters and pitchers, and for pitchers, use RA and not ERA.
03 MLE's - MGL (December 28, 2003)
Posted 4:21 p.m.,
December 29, 2003
(#23) -
tangotiger
MGL: Marcel and I don't have access to minor league data in a downloadable form. Therefore, what should Marcel do? Give every rookie a blanket OBA and SLG of 95% of league average!! How can Marcel do that? Well, since we only compare various forecasting systems with an after-the-fact 200 or 300 PAs, this is a great way to cheat. And, if a rookie were given that much playing time, he probably performed around the league average.
If Marcel were a smarter monkey, he'd make it 90% of league average for the IF/C, and 100% of league average for the OF/1B.
The one place where PECOTA, DMB, ZiPS, and MGL trump Marcel is with rookies and sophs.
03 MLE's - MGL (December 28, 2003)
Posted 9:33 a.m.,
December 30, 2003
(#30) -
tangotiger
I know of no evidence that suggests that hitters have to get "used to" the major leagues.
I know of no evidence that suggests that hitters are equally disadvantaged as they play against better competition.
It's rather obvious that some players will have different translation numbers. The 3 questions to ask are:
1 - can we spot these players using their profile/quality numbers
2 - how much impact does this have
3 - to what extent does scouting help us find these players
MGL's implicit answers are:
1 - no
2 - none
3 - none
And my reply to that is: it's a lot more fun to at least try to find the answer, than to write a long-winded paragraph response with no statistically significant argument.
(Man, it's a lot easier to argue with MGL this way.)
03 MLE's - MGL (December 28, 2003)
Posted 2:18 p.m.,
December 30, 2003
(#33) -
tangotiger
Don't worry, I'll be working on this too.
03 MLE's - MGL (December 28, 2003)
Posted 3:58 p.m.,
December 30, 2003
(#39) -
tangotiger
To control for "quality" of pitcher, simply do:
FIP = (13*HR+3*BB-2*SO)/IP
If you want to ERAize it, ERA = FIP + 3.2
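A minimal sketch of the arithmetic above, using only the coefficients and the 3.2 constant given in the post. The function names and the sample season line are made up for illustration.

```python
def fip(hr, bb, so, ip):
    """Raw FIP as written in the post: (13*HR + 3*BB - 2*SO) / IP."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * bb - 2 * so) / ip


def fip_on_era_scale(hr, bb, so, ip, constant=3.2):
    """Raw FIP shifted onto the ERA scale by the post's 3.2 constant."""
    return fip(hr, bb, so, ip) + constant


# Hypothetical season line, not a real pitcher.
raw = fip(hr=20, bb=60, so=180, ip=200.0)
print(round(raw, 2), round(fip_on_era_scale(20, 60, 180, 200.0), 2))
```

Note that the raw number sits on its own scale and can go negative for an extreme strikeout pitcher; only the shifted version reads like an ERA.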
If I were to do this study, I'd do something like:
1 - Look for all pitchers aged 25 to 28, min 3000 PAs (basically, starters)
2 - Calculate their FIP
3 - Calculate their SO/PA
4 - Calculate how many PAs they have from age 29 to 36
5 - Run a regression of 2 and 3 against 4
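Step 5 could be sketched as an ordinary least-squares fit of later playing time on FIP and SO/PA. This is only one way to do it, written in plain Python so it has no dependencies; the function name and the toy data are hypothetical, and the toy data is constructed from a known line rather than taken from any real pitchers.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares for y = a + b1*x1 + b2*x2. Returns (a, b1, b2)."""
    n = len(y)
    if not (len(x1) == len(x2) == n) or n < 3:
        raise ValueError("need at least 3 aligned observations")
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Toy inputs: FIP and SO/PA at ages 25-28, PAs faced at ages 29-36.
fips = [0.2, 0.5, 0.8, 1.1, 0.6]
k_rates = [0.22, 0.15, 0.18, 0.12, 0.20]
later_pa = [6000 - 2000 * f + 10000 * k for f, k in zip(fips, k_rates)]
print(fit_two_predictors(fips, k_rates, later_pa))
```

With real data the interesting output is the sign and size of the SO/PA coefficient once FIP is held fixed; this sketch does not compute standard errors.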
03 MLE's - MGL (December 28, 2003)
Posted 4:33 p.m.,
December 30, 2003
(#41) -
tangotiger
I have no idea what the PF is at b-r. BUT, in your case, why bother with it? You are not trying to pinpoint the results of any one pitcher. The PF will have virtually zero impact here.
03 MLE's - MGL (December 28, 2003)
Posted 10:20 p.m.,
December 30, 2003
(#44) -
Tangotiger
I meant to do the regression analysis as a quick way. I almost always use the "splitting the data into groups" approach when presenting to the reader.
As for what the regression will give you, again only for the researcher's understanding, and not to explain to a reader as it would be boring, you'd get your slope and standard error for the slope. That by itself will tell you what you need to know.
Instead of using the actual K/PA rate, you could create a dummy flag as +1, 0, -1 for a K/PA rate above a certain threshold. Essentially, you are splitting the group into 3. Again here, this kind of merges the "splitting analysis" with the power of regression.
Yes, of course, statistical analysis is boring to explain, present, and read. The splitting into various control groups helps to explain to the interested reader what the boring regression analysis says.
FIP and DER (December 30, 2003)
Posted 9:45 a.m.,
December 31, 2003
(#4) -
tangotiger
Studes, it's not a "proportion". FIP does not share the same scale as the other metrics.
FIP and DER (December 30, 2003)
Posted 11:42 a.m.,
December 31, 2003
(#7) -
tangotiger
To get technical about it, studes said:
As a next step, I computed an intermediate stat called DERA (a combination of DER and ERA -- get it?). It equals ERA minus FIP. So it represents the proportion of a pitcher's ERA for which he shares responsibility with his fielders.
dipsERA = FIP + 3.2
DERA = ERA - FIP
DERA = ERA - dipsERA + 3.2
when the pitcher and fielders are both league average at hits / BIP, ERA = dipsERA
so,
DERA = 3.2
So, there's no real "proportion" being allocated between FIP and ERA.
A proportion would be: "50% of his salary is paid by the Yanks" or some such.
There is a relationship in what studes is doing, but it is not a "proportion". You can't have FIP divided by ERA and have it tell you anything. FIP is just a number. To give it meaning, you have to add 3.2 to it.
(FIP + 3.2)/ERA will tell you something useful, though.
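A small sketch of why the raw ratio misleads while the shifted one does not. The helper names and the sample pitcher (a negative raw FIP, as with the Pedro case raised in this thread) are hypothetical; the 3.2 constant is the one from the post.

```python
FIP_TO_ERA = 3.2  # constant from the post


def dera(era, raw_fip):
    """DERA as defined in the quoted article: ERA minus raw FIP."""
    return era - raw_fip


def dips_ratio(era, raw_fip):
    """(FIP + 3.2) / ERA, the ratio the post calls useful."""
    if era <= 0:
        raise ValueError("era must be positive")
    return (raw_fip + FIP_TO_ERA) / era


# Hypothetical pitcher with a negative raw FIP.
era, raw_fip = 2.10, -0.80
print(raw_fip / era)             # negative, so it cannot be a "proportion"
print(dips_ratio(era, raw_fip))  # DIPS-based ERA relative to actual ERA
print(dera(era, raw_fip))
```

When ERA equals FIP + 3.2, `dera` returns 3.2 regardless of how good the pitcher is, which is the point of the derivation above.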
FIP and DER (December 30, 2003)
Posted 3:43 p.m.,
December 31, 2003
(#9) -
Tangotiger
However, given that you basically use linear weights for your weights in FIP, I think it comes pretty close to a proportion
What is Pedro's "proportion", what with his negative FIP? My only problem is with you using the word proportion.
In particular, it would seem to me that applying league-average weights to FIP for extraordinary pitchers is throwing off the conclusion.
Correct. FIP is a shortcut to DIPS, and a damn good one. You lose at the extremes, as you would with any shortcut.
What you REALLY want to do is apply BaseRuns. You plug in the known BB, K, HR, and use league average for hits, 2b, 3b, nonKouts. That'll give you the dipsRA. Use the player's real numbers for hits, 2b, 3b, nonKouts to get the RA.
THEN, you can perform your analysis.
FIP and DER (December 30, 2003)
Posted 11:07 a.m.,
January 1, 2004
(#14) -
Tangotiger
Studes,
Until you rerun using BaseRuns, what you are finding may simply be a byproduct of the shortcut.
Hits per ball in play is fairly static across all quality of pitchers, quality as measured by BB, K, HR.
So, what you are left with is how much run impact a hit per BIP has. And, we know with custom LWTS that the run value of a hit decreases with the run environment. So, I would expect a slight slope downwards with a real DERA and FIP, as the FIP decreases.
FIP and DER (December 30, 2003)
Posted 10:54 a.m.,
January 2, 2004
(#19) -
tangotiger
Pitcher A: 40 PAs, 1 HR, 3 BB, 6 K, 21 non-K outs, 5 RPG
Pitcher B: 40 PAs, 1 HR, 3 BB, 10 K, 17 non-K outs, 5 RPG
So, pitcher A has 30 BIP that the fielders might be able to do something with, while pitcher B has 26 BIP. (Let's assume that the hits/bip talent rates are different for the 2 pitchers, such that overall, they are the same at 5 RPG).
So, if fielders contributed "50 win shares" for a season of pitcher A, they'd contribute 43 win shares for a season of pitcher B.
Charlie, are you saying there's more to it than that? Are you talking about the cases where the hits/BIP talent rates of the pitchers are the same?
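The 50-to-43 figure is just the fielders' credit scaled by balls in play. A short sketch, assuming BIP is PA minus HR, BB and K (the post's two lines have no other non-BIP events); the function name is made up.

```python
def balls_in_play(pa, hr, bb, so):
    """Plate appearances that end with a ball the fielders can handle."""
    return pa - hr - bb - so


bip_a = balls_in_play(pa=40, hr=1, bb=3, so=6)
bip_b = balls_in_play(pa=40, hr=1, bb=3, so=10)

# Scale the fielders' credit by the share of balls in play.
fielder_ws_a = 50
fielder_ws_b = fielder_ws_a * bip_b / bip_a
print(bip_a, bip_b, round(fielder_ws_b, 1))
```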
FIP and DER (December 30, 2003)
Posted 12:34 p.m.,
January 2, 2004
(#21) -
tangotiger
I didn't say he compensated for it completely.
In any case, the fielders have the same effect on each BIP, more or less. So, I take that as a given. (I know it's different between a GB/FB pitcher, etc.)
Whatever is left over to keep the RPG equal, assume that the different talent levels of the pitchers on BIP are responsible.
It's like the park effect. Say both pitchers are GB pitchers, and they play on turf. But, if one pitcher gives up lots more GB because he strikes out a lot less, the "turf effect", per GB, is still the same, but over the whole game, it'll affect the lowK guy more.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 9:14 a.m.,
January 1, 2004
(#3) -
Tangotiger
I have an issue with taking the lower of the two PAs.
What you should do, I think, is first regress each of the components by the number of PAs. THEN, you can make your comparison.
If you have 300 PAs in AAA and 100 in MLB, the spread of performance will be much larger in MLB. Dividing the AAA stats by 3 will still give you a spread that is much smaller than the 100 PA in MLB.
Therefore, first regress each of the AAA stats, and the MLB stats, based on their actual PAs. Then you can compare.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 7:30 p.m.,
January 1, 2004
(#6) -
Tangotiger
MGL, I think the problem with your HR example (high/low) is selective sampling. You choose the guys with really low HR rates.... well, a lot of them were unlucky, right? You would have been better off using say low 2b+3b to 1b ratio as a proxy to "power", and then looked at the HR rates.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 10:37 a.m.,
January 2, 2004
(#12) -
tangotiger
(homepage)
MGL, MGL, MGL.... YOU are accusing someone ELSE of skimming a post, and then commenting on it? Should I get John McEnroe on you?
********
You said:
As you probably figured out already, there is no single coefficient that we can use to do a one-step prediction from minor to major, because the real two-step process is nowhere near linear.
This is not the reason at all. It has nothing to do with linearity, and everything to do with selective sampling, which is what my comment was directed at.
As well, you really should be using the Odds Ratio method. At the level that you are doing it, using a Rates method is wrong.
******
Now, your idea about looking at the stats of the player in question in a year that's NOT part of the sample is EXCELLENT!
However, you still have an issue. Even though you are looking at say a player who played in the minors-majors one year to establish the equivalency, and then look at year+1 in the minors to figure out his true talent level (all the players as a group), you get selective sampling. If in year+1 the guy had very few PAs in the minors, then he's hitting the cover off the ball (luckily) and he gets called up. If he stays in the minors all year+1, and gets tons of PAs, he probably wasn't doing so well. Worse, you weight those PAs more, because he had more.
*********
I've noted this many times, but it bears repeating. A player's PAs are dependent on how well he plays. They should not simply be used as a weight as if they were independent of his performance. Just knowing the PAs of a player, and nothing else, will tell you a lot about how a player performed in the minors/majors.
*********
Finally, everyone should reread Walt Davis's post on the subject of "survivorship" (see homepage link), as well as running a Heckman Selection Correction (near the bottom of that page). These are very important subjects if you want to be serious about MLEs.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 10:48 a.m.,
January 2, 2004
(#14) -
tangotiger
Michael, can you give details as to how you got your numbers, as well as how you'd handle the issue of non-uniformity mathematically?
Actually if you assume that A-Rod's OBP talent is 95% likely to be between .380 and .420 (and make that uniform for that 95% - we can play with that later, it isn't important for now),
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 3:42 p.m.,
January 2, 2004
(#20) -
Tangotiger
There are problems in using this approach to get MLE's, as the players going from A to AA will not be the same as the ones going from AA to the majors.
Even better!
What you really really want is for the players that jump from league to league to be representative of the players in each league. That is, the CLOSER you can get your sample players to be almost random, the better.
I have to believe that a lot of the A to AA moves must be based on something other than "they were somewhat lucky in A ball". If you get a nice big chunk of players going A to AA, that's great. AA to AAA might also be a good watch, though I'm guessing that AA and AAA are treated almost the same.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 3:44 p.m.,
January 2, 2004
(#21) -
Tangotiger
For example, you can get decent translation numbers between NL and AL because the types of players and the quality of players that move league-to-league are pretty random. What hurts you is sample size, and you have to control for the handedness of the batters and pitchers.
However, you can easily figure out translations between AL and NL using say 5-year moving averages. I'm a little miffed that I haven't done this already actually.
Maybe someone will now...
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 8:23 p.m.,
January 2, 2004
(#26) -
Tangotiger
AL to NL or vice-versa.
Well, if both experience the same "familiarity" factor, it'll cancel out.
An NL player may lose 5% going to the AL, and an AL player will lose the same 5% going to the NL. So, you have to look at the two at the same time.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 12:02 a.m.,
January 4, 2004
(#45) -
tangotiger
The issues with college equivalencies, in order:
1 - strength of schedule relative to other college players
2 - quality of competition relative to MLB
3 - number of games
4 - age
5 - aluminum bats
6 - parks
7 - fielding position
1: this can probably be handled by a logistic regression model
2: this is huge really... I'll guess that an average college team would play .100 ball against an average MLB team... it's a huge gap... the "feasting on bad pitchers" would apply a lot here
3: sample size... not much you can do about it
4: the talent slope goes very high at the 18 to 21 level... while age itself is not much of an issue, when you couple it with the other issues, it just adds another level of variables that you didn't need
5/6: the style of play is much different... no reason to think that all players will be affected similarly, or anywhere close to it
7: this might actually be huge... most good players in college ball play a strong fielding position, but move them to the majors and very few stick to it... the evaluation of a player's fielding talent is extremely important, as it relates to where you can put the guy's bat
As for pitchers, I wouldn't use the normal stat line, but I'd like to get my hands on their performances by count. THAT's where you can figure some things out. I'd want it for hitters too, but not as much.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 3:41 p.m.,
January 4, 2004
(#49) -
tangotiger
I'm sure not.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 1:58 p.m.,
January 5, 2004
(#57) -
tangotiger
Tango, do you still hate MLE's?
This is probably the 10th time that MGL has mischaracterized my position, and seeing that he likes to (and accuses others of) skim articles, let me reiterate:
I hate the way people derive and treat MLEs as a final product. MLEs, as currently done, are simply a first step (maybe second step in this thread). Until you address the issues of selection bias (and examples have been provided on doing so) and supply confidence levels to what you do, MLEs are far, far from a final product.
The same applies for my disdain to how park factors are derived/treated, along with a whole set of adjustment factors. The basic concept is correct, but the execution leaves much to be desired.
SABR 201 - Issues with MLEs - Why I hate them (December 31, 2003)
Posted 2:01 p.m.,
January 5, 2004
(#58) -
tangotiger
As for the point about the MLEs being the same for pitchers and hitters: they won't be.
If you apply the MLEs on the observed data, they can't be, since you would regress a pitcher's hit/bip differently than a hitter's hit/bip. If on the other hand you apply MLEs AFTER doing a regression, then what you are left with is simply applying a factor for quality of competition difference.
In this case, it's probably safe to say that talent distribution of hitters and pitchers are similar enough.
Linear Weights by Batting Order (January 2, 2004)
Posted 10:58 a.m.,
March 22, 2004
(#3) -
tangotiger
(homepage)
I made a long comment about batting orders here.
Win Values: Updated for 1969, 1974-1977 - Rob Wood (January 6, 2004)
Posted 11:52 p.m.,
January 6, 2004
(#2) -
Tangotiger
Charlie, great stuff!
Your second bullet point is the most telling. Essentially, Bert did not pitch "well enough to lose". He essentially pitched the same, regardless of his offensive support.
Your third bullet point is not bizarre at all. When he wins, he gave up 1.61 RA and his team had 5.76. When he lost, his team scored 1.89 and he gave up 4.82 RA. I don't remember what the average RS/RA in wins and losses is, but 5.5 RS and 2.5 RA seems about right. So, when he wins, he gives up fewer runs than an average pitcher that wins. When he loses he gives up fewer runs than an average pitcher that loses.
All your other numbers in the third bullet are because of your sampling... because he lost, we expect him to have had bad H, HR,BB rates.
So, Blyleven's W/L record does NOT show any "underachieving". Charlie's analysis, and Rob Wood's analysis match here.
Win Values: Updated for 1969, 1974-1977 - Rob Wood (January 6, 2004)
Posted 3:26 p.m.,
January 7, 2004
(#4) -
tangotiger
My guess is that *all* great pitchers post a league average ERA and go 0-whatever in their losses.
Win Values: Updated for 1969, 1974-1977 - Rob Wood (January 6, 2004)
Posted 4:52 p.m.,
January 7, 2004
(#6) -
tangotiger
From 1974-1990, the average team (4.3 RPG) scored 2.7 runs in its losses and 5.9 in its wins, or a little more than twice as many runs in wins as in losses.
If you have a good pitcher, say one who allows 3.0 RPG, I would expect such a pitcher to give up 4.64 RPG in his team's losses (which would happen 35% of the time), and 2.12 RPG in his team's wins (which would happen 65% of the time).
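As a consistency check on the splits above: the runs in wins and in losses, weighted by how often each happens, should blend back to the overall rate. A quick sketch with a made-up helper name; the 50% win rate for the average team is an assumption, the other inputs are the post's.

```python
def blended_rpg(rpg_in_wins, rpg_in_losses, win_pct):
    """Overall runs per game as a win/loss-weighted average."""
    return win_pct * rpg_in_wins + (1 - win_pct) * rpg_in_losses


# Average team's scoring, assuming it wins half its games.
league_scored = blended_rpg(rpg_in_wins=5.9, rpg_in_losses=2.7, win_pct=0.50)

# The good pitcher's runs allowed, using the post's 65%/35% split.
pitcher_allowed = blended_rpg(rpg_in_wins=2.12, rpg_in_losses=4.64, win_pct=0.65)

print(round(league_scored, 2), round(pitcher_allowed, 2))
```

Both blends land on the stated overall figures (4.3 and about 3.0), so the expected splits are at least internally consistent.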
Where have you gone Tom Boswell? (January 7, 2004)
Posted 11:23 p.m.,
January 7, 2004
(#7) -
Tangotiger
If you go to the "Index of Primate Studies" link, you will see two Primate Studies links that deal with the issue of psychological impact, including a study that was done (not by me). That researcher concluded that the theory was unfounded.
I also looked at this issue, and it is unfounded.
Where have you gone Tom Boswell? (January 7, 2004)
Posted 11:27 p.m.,
January 7, 2004
(#8) -
Tangotiger
(homepage)
Here's a quick link.
Where have you gone Tom Boswell? (January 7, 2004)
Posted 9:28 a.m.,
January 8, 2004
(#10) -
tangotiger
I agree that it would be almost impossible to find it. And, if you read one of my quotes of MGL in that above link, you'll realize that the impact of this, even if it existed to a great extent, is so small as to be almost meaningless.
Much ado...
Where have you gone Tom Boswell? (January 7, 2004)
Posted 2:17 p.m.,
January 14, 2004
(#12) -
tangotiger
I'm only saying what their win impact is. I expect the football HOF to be filled with QBs and not having many OTs. You need 1 of each on your team, but the impact of the QB is far higher.
Same for relievers. You need 'em. But, their win impact is less than twice that of a starter on a per batter basis, but they face one-third the batters.
What if in the future the reliever will only come in with 2 outs to go each game? Do you still want that guy in the HOF?
BABIP and Speed (January 7, 2004)
Posted 11:18 p.m.,
January 7, 2004
(#3) -
Tangotiger
Honestly, I don't understand what the point of this is.
Because we are quantifying what we already know qualitatively.
Yes, faster players get more infield hits and get more doubles and triples and ROE's (and they bunt more often, which results in a hit around 40% of the time). Is that supposed to be a revelation?
Now, when you regress a hitter's hit/BIP, you can also use his speed, so that if you have a fast guy with a .305 and a slow guy with a .305, you regress the fast player upwards, and the slow player downwards.
All of these things are reflected in their stats, other than the ROE's. We've always advocated making an adjustment for ROE's in a player's stats. I don't know that it is "stupid" for MLB to treat a ROE as an out. What do you expect them to do?
Record it separately, just like you record a single and a walk. I see no reason to lump in a RBOE with the other ABouts.
Who says that a player's official stats are supposed to prefectly reflect talent?
No one. But, why not keep something that has a run value of +.5 separate from something that has a run value of -.3?
but one of the points of keeping official stats is simply to reflect exactly what is going on in the field.
But, "officially", it's considered an ABout, unless you break out your PBP files.
The fact that some players cause more ROE's than others is not necessarily relevant to MLB - only to someone trying to estimate true value from official stats.
Agreed.
At least they are tabulated separately, so you can do the adjustment if you want.
Only for those who have PBP files.
What would you have them do - call it a single?
Keep it separate.
That's fine too, I suppose, although it is certainly more consistent to call it an out than a single. Frankly, I don't think it matters what you call it. If you have a problem with ROE's you have to have a serious problem with SF's or with including IBB's in a batter's BB totals!
At least the SF and IBB are recorded separately so that I can have the option to add in SF to the other ABouts, and remove the IBB from the total BB. Not true with RBOE.
As far as the "article" goes,
I think it was a good effort. I think encouragement, where effort is put in, is warranted, don't you?
As far as the ROE's, which I assume is your "beef," before we treat them as a single, we would need to see how much is luck and how much is skill.
Agreed, but until MLB compiles them for us easily, those of us with PBP files are the only ones who can do the work on it.
If the skill component is not similar to that of a single (if the period to period r's are not similar), then I don't think you want to treat it as a single. We know that there is SOME skill (i.e. speed) involved in the ROE's - how much is the important question before we get all worked up over the fact that they are normally treated as outs.
Yes, that's the point. Keith Woolner's look at it last year is certainly promising. The league leaders were fast or LH hitters. Whether there is anything extra beyond that, like an additional skill, needs to be looked at. Again, keep it separate.
Accordingly, you might be better off just making an adjustment for speed, GB/FB ratio, and handedness. In fact, maybe just adding a little extra value to a player's IF hits might do the trick, as a player's infield hit rate (per PA) might be the best predictor of his ROE rate, as they come from essentially the same "skill" (speed, BIP rate, distance from home to first, and GB/FB ratio, and depth of IF)...
Very possible. But again, this goes back to the stupidity of MLB to not account for this as a separate category. Why should Woolner, Emeigh, MGL, or me be the only ones who can answer this question?
The above blogger, Dick Allen, certainly would have enjoyed looking at this issue. And I'm sure many many more.
BABIP and Speed (January 7, 2004)
Posted 2:17 p.m.,
January 8, 2004
(#8) -
tangotiger
I took all players with at least 800 PA (excluding IBB and bunts) from 1999-2002, and compiled their RBOE. I could have split it up by BIP instead of PA, but whatever.
Anyway, we've got 340 hitters, with an average RBOE/PA of 1%. Pat Meares had 22 RBOE over 885 PA, or a rate of 2.5%. A player with 885 PAs would have a variance of .0034. That is, sqrt(.01*.99/885) = .0034. (Apologies if I'm not using the correct terms.) Meares was 4.3 standard deviations away from the league average, and best in the 1999-2002 period. That is, (.025-.010)/.0034=4.3
Doing the same for all 340 hitters, and only 51% of the hitters were within 1 standard deviation. The standard deviation of these standard deviations is 1.33. Again, if I'm doing this right, this means that the spread is 1.33 times larger than would be expected by random luck.
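A sketch of the two calculations just described: how far one hitter sits from the league rate in units of binomial luck, and how wide the whole group is relative to luck alone. Function names are made up, and the league rate is taken as the rounded 1% from the post, so the Meares figure comes out close to, not exactly at, the quoted 4.3.

```python
import math
import statistics


def luck_sd(rate, pa):
    """Spread of an observed per-PA rate from binomial luck alone."""
    return math.sqrt(rate * (1 - rate) / pa)


def z_score(events, pa, league_rate):
    """Distance from the league rate in units of luck_sd."""
    return (events / pa - league_rate) / luck_sd(league_rate, pa)


def spread_vs_luck(players, league_rate):
    """Std. dev. of the z-scores; near 1.0 if luck explained everything."""
    return statistics.pstdev([z_score(e, pa, league_rate) for e, pa in players])


meares_z = z_score(events=22, pa=885, league_rate=0.01)
print(round(meares_z, 1))
```

`spread_vs_luck` takes a list of (RBOE, PA) pairs; a result well above 1.0, like the 1.33 in the post, says the hitters differ by more than chance would produce.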
Using the Gaussian method that AED so kindly posted for us a few days ago (and assuming that the true variance is .0035, which I'm sure is too high), and the regression equation for RBOE becomes:
regression towards the mean (RBOE) = 832 / (832 + PA)
In this case, Meares regresses 48% towards the mean, or a "true talent" RBOE rate of 1.8%.
The average regression towards the mean was 33%, with an average of 1700 PAs.
************
Using the other way I had posted:
obs var ^ 2 = true var ^ 2 + luck ^ 2
.0035 ^ 2 = true ^ 2 + .0024 ^ 2
that makes the true = .0024
The regression towards the mean in this case is 50%.
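A sketch of that decomposition, reading the post's ".0035" and ".0024" as standard deviations that get squared before subtracting. The function names are made up, and treating the luck share of observed variance as the regression amount is a standard reading rather than something the post spells out.

```python
import math


def true_spread(observed_sd, luck_sd):
    """Solve observed^2 = true^2 + luck^2 for the true spread."""
    diff = observed_sd ** 2 - luck_sd ** 2
    if diff < 0:
        raise ValueError("observed spread is smaller than luck alone")
    return math.sqrt(diff)


def regression_fraction(observed_sd, luck_sd):
    """Share of observed variance that is luck."""
    return luck_sd ** 2 / observed_sd ** 2


true_sd = true_spread(0.0035, 0.0024)
frac = regression_fraction(0.0035, 0.0024)
print(round(true_sd, 4), round(frac, 2))
```

With these rounded inputs the true spread comes out at roughly the same size as the luck spread, which is why the regression amount lands near one half.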
******
The reason for the difference is that I treated .0035 as the true variance in the Gaussian method. Using .0024 as the true variance, the new regression towards the mean equation, according to the Gaussian method, is:
regression towards the mean (RBOE) = 1700 / (1700 + PA)
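One way to see where a constant like 832 or 1700 can come from: it is the number of PAs at which binomial luck is as wide as the true talent spread. This is an assumption about the method, not AED's posted derivation, and the function name is made up. With the rounded 1% league rate it lands near, not exactly on, the post's constants.

```python
def regression_constant(league_rate, true_sd):
    """PAs at which luck variance p(1-p)/PA equals true variance."""
    return league_rate * (1 - league_rate) / true_sd ** 2


x_wide = regression_constant(0.01, 0.0035)    # true spread taken as .0035
x_narrow = regression_constant(0.01, 0.0024)  # true spread taken as .0024
print(round(x_wide), round(x_narrow))
```

A narrower true spread means a larger constant, so the same hitter gets pulled further toward the league rate.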
BABIP and Speed (January 7, 2004)
Posted 12:06 a.m.,
January 9, 2004
(#15) -
Tangotiger
The run value of a RBOE is HIGHER than that of a single.
Whether ROE is a skill or the byproduct of a trait (like being RH) doesn't matter much to me... the spread of ROE is still not explained by luck. And, the player deserves the result.
The net effect is that it's a differential of +/- 3 runs.
The PA cutoff doesn't matter, since the standard deviation will take care of all that. I could have made it 100 PAs, and it wouldn't have made a difference.
BABIP and Speed (January 7, 2004)
Posted 9:28 a.m.,
January 9, 2004
(#21) -
tangotiger
What is the average value of a hit? I'm going with "more than a single". So I'll stick with ROE is less than a regular hit.
Ok, so a single is also less than a regular hit. What's the point? The RBOE has a .01 run impact greater than a single.
And while I'm no statistician, ROE distribution matters if it is going to be called a skill (or whatever). If it is a skill or trait, then the numbers should have some predictability year-to-year. Do they?
I believe that MGL's method uses year-to-year correlation. So, you have 2 separate methods to establish the nonrandomness of RBOE: (1) the distribution of such an event, (2) the year-to-year persistency of such an event.
And I don't think the player deserves the "credit". The player in question hit a routine GB - why does he get *any* credit for another player not making his play? The critical aspect to this the definition of an error.
Then, we would expect randomness if the player deserves no credit. We think that a lot of the reason for RBOE is the propensity to: (1) hit GB and (2) hit balls to the right side of the infield. So, we expect persistency and we expect a distribution that is greater than that which would be based on random variations. And, we get it.
If using 100 PAs would work as well, run the numbers that way and let's see what happens. I think you'll find a wider rate distribution.
You *will* get a wider rate distribution. You will *also* get a wider expected random rate distribution based on the smaller number of samples (PAs) for each player. The *difference* between this new observed wider distribution and the new expected random distribution will remain the *same*. This difference is the true talent distribution.
BABIP and Speed (January 7, 2004)
Posted 11:07 p.m.,
January 9, 2004
(#26) -
Tangotiger
I thought MGL showed it in his PDF, but apparently he didn't. Perhaps MGL can rerun his stuff for RBOE. If he doesn't do it by Monday, I'll do it.
My bet is that the year-to-year r for the 600 PA hitters is .25. There will be some persistency.
BABIP and Speed (January 7, 2004)
Posted 7:20 p.m.,
January 10, 2004
(#28) -
Tangotiger
In my equation, I get 75%. So, like I said, the year-to-year r is .25.
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 9:52 a.m.,
January 8, 2004
(#1) -
tangotiger
One of the problems with what you are doing with component regression is that you treat each of the components as independent of each other.
Say that each component has the same flat regression of 40% for 600 PA.
However, if you were to take LWTS of a player, I'll bet you the regression should be 30% for 600 PA.
By keeping the components separate, you overstate the OVERALL regression, while correctly stating the component regression.
A good way to test this is to do your component regression prediction (and AFTER convert that into LWTS), and also come up with a LWTS regression prediction. Do the two match? Yes? Then, you can ignore what I said, and all the components are independent of one another. No match? Hmmmm....
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 5:16 p.m.,
January 8, 2004
(#2) -
tangotiger
Here are the regression towards the mean figures to use for 600 PAs, as well as the "x" value to use in
regression = x / (x+PA)
Event           obaLWTS   1B     2B   3B   HR  NIBB  HBP   RBOE   SO
x                   209  298  1,101  571  131    96  255  1,627   62
Regr. at 600 PA     26%  33%    65%  49%  18%   14%  30%    73%   9%
obaLWTS is that "effective OBA" that I mentioned a few weeks ago, one that weights the single as 0.9 and the HR as 2.0, etc, etc.
All figures are per PA, though that's not necessarily the way I'd do it.
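The percentage row follows directly from the "x" row and the formula. A short sketch that recomputes it at 600 PA; the dictionary and function names are made up, the values are the post's.

```python
# "x" values per event, as listed in the post.
X_VALUES = {
    "obaLWTS": 209, "1B": 298, "2B": 1101, "3B": 571, "HR": 131,
    "NIBB": 96, "HBP": 255, "RBOE": 1627, "SO": 62,
}


def regression_pct(x, pa=600):
    """Regression towards the mean, x / (x + PA), as a whole percent."""
    return round(100 * x / (x + pa))


for event, x in X_VALUES.items():
    print(f"{event:8s} x={x:5d}  regress {regression_pct(x)}% at 600 PA")
```

Changing `pa` shows how quickly each component becomes trustworthy: strikeouts need very few PAs, RBOE needs thousands.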
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 11:34 p.m.,
January 8, 2004
(#4) -
Tangotiger
You mean you reduced my article to 3 lines? Where did you get these.
Just continuing work that I brought up a few weeks ago, which is being explained with great thoroughness by the stat-savvy here, like AED, Alan, Arvin, et al.
It's all based on comparing the observed variance to the expected variance based on luck, and attributing the difference to the true variance.
Are you figuring all the rates as per PA
Read the last line of my last post.
PA, do you think it makes a big difference that these are not the best denominators to use?
Yes, I believe that it is wrong.
Do these include the possible "inter-dependencies" you mention in your first response?
Yes, though I don't know the degree to which this is wrong.
I assume these are for batters. Do you have similar numbers for pitchers?
I did those quick on my way out. I'll do the pitchers tomorrow morning.
Your numbers are very close to mine, except for triples. His lower values may reflect the "inter-dependency." Tango, why do you think our triples are so different.
I believe it's because of the sample size. Your sample is 2B+3B, and mine is PA. However, your spread of 3b/(2b+3b) is much larger than my 3b/pa. I'm not sure which of the two variables has more of an effect. (They might cancel out.)
I use park adjusted stats. If you did not, the persistency of triples rate may reflect the park more than the batter.
That's possible, but I don't measure persistency per se. I just measure the spread of triples. However, I should have done (all numbers are variances):
observed = true + park + pitchers + fielders + luck
I think pitchers and fielders would have a variance of 0 from the perspective of the batter. I should have included park, and maybe that has more of an effect on triples. But, I don't really want to get into nuances.
Also do you have values for SB/CS (or attempts per 1B+BB and success rate, which is what I would use)?
No, though I could have done SB,CS,WP,PB,BK,PO as well.
Also, you might want to explain how to use the x/(x+PA)...with the quick and dirty formula: x/(x+PA). I assume that technically that is not the correct true "relationship" (curve) between the regression coefficient and PA.
Technically, I believe it IS correct, but the stat-savvy can chime in with their expertise.
That formula is probably the most important one to remember out of any formula you will read. In the "obaLWTS" I put out, "x" = 209. Think of obaLWTS as OBA, but "weighted". Not important. Anyway, if you have an OBA of .400 with 1000 PAs, and the league is .300, how much do you regress?
Regression = 209 / (209 + 1000) = .17
So, you regress the .400 17% towards .300, or .383. That's your best guess as to his true obaLWTS. There's also a simple equation that AED put out to figure out the confidence interval, but that escapes me right now.
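The worked example as a function, so the same step can be applied to any component. The function name is made up; the inputs are the ones in the post, and the result evaluates to roughly .383.

```python
def regress_to_mean(observed, league_mean, pa, x):
    """Pull an observed rate towards the league mean by x / (x + PA)."""
    weight = x / (x + pa)  # share of the gap given back to the mean
    return observed + weight * (league_mean - observed)


estimate = regress_to_mean(observed=0.400, league_mean=0.300, pa=1000, x=209)
print(round(estimate, 3))
```

The same call with an RBOE rate of .025, a league rate of .010, 885 PA and x = 832 gives about .018, matching the Meares figure from the BABIP and Speed thread.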
Tango suggests that the best "Q&D" way to handle this potential problem is to just reduce the regressions for all of the components by some amount.
I'll run the above regressions against my players, by component and then converted to obaLWTS as well as directly on obaLWTS, and see if they match. If they do, then we can assume that interdependency does not play a role. Otherwise, I'll just factor in a blanket interdependency factor across the board to get a fit.
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 10:28 a.m.,
January 9, 2004
(#6) -
tangotiger
Here are the regression values for hitters and for pitchers, on a per 600 PA basis:
Bat Pit Event
26% 39% All
33% 44% 1B
65% 64% 2B
49% 83% 3B
18% 56% HR
14% 24% NIBB
30% 57% HBP
73% 75% RBOE
9% 11% SO
"All" refers to the Linear Weights-based OBA.
Like I said, I WOULDN'T do it this way, per PA, because of the interdependency, but this is good enough for now.
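If these per-600 values follow the same x/(x+PA) form (that is my assumption here, though the 26% for hitters lines up with x = 209), each one can be turned into its own "x" and then applied at any number of PA. A rough sketch, with names of my own choosing:

```python
def implied_x(reg_per_600, pa=600):
    """Solve r = x / (x + pa) for x, given the regression r at `pa` PA."""
    return pa * reg_per_600 / (1.0 - reg_per_600)


def regression_at(reg_per_600, pa):
    """Regression fraction at some other number of PA, same x."""
    x = implied_x(reg_per_600)
    return x / (x + pa)


# (batter, pitcher) regression per 600 PA, copied from the table above
TABLE = {
    "All": (0.26, 0.39), "1B": (0.33, 0.44), "2B": (0.65, 0.64),
    "3B": (0.49, 0.83), "HR": (0.18, 0.56), "NIBB": (0.14, 0.24),
    "HBP": (0.30, 0.57), "RBOE": (0.73, 0.75), "SO": (0.09, 0.11),
}

if __name__ == "__main__":
    for event, (bat, pit) in TABLE.items():
        print(f"{event:5s} x_bat={implied_x(bat):7.0f} x_pit={implied_x(pit):7.0f}")
```

The same caveat applies as for the table itself: this treats each component on a per-PA basis and ignores the interdependency.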
Check out the RBOE. There's a similar impact based on the hitter and pitcher. Now, I *know* that the distribution of batters faced from the pitcher's perspective does NOT have a variance of zero, especially as it relates to handedness. The LH/RH split for a LP and RP are far different.
That the 3B rates regress much more for pitchers than batters is probably due to the batter's speed. The park effect, if the variance is not zero from each of the hitter's and pitcher's perspective, is probably the same for both. So, the regression differentials are probably the same, but the amount of regression might be different.
Check out how much a pitcher's HR has to regress... right in line with his doubles, and MORE than his singles. This does NOT mean that a pitcher has less control on HR than singles, or whatever. It just means that our ability to figure out how much HR skill the pitcher has is limited by the sample available.
In virtually all cases, the hitter's performance is more indicative of his skill level than a pitcher's performance. Again, this is not to say, necessarily, that a hitter has more influence on a PA (they probably do), but rather that the individual performance lines AND the distribution of these performance lines are such that we can tell more about a player if he's a hitter than if he's a pitcher.
***
Incidentally, these numbers kind of support my off-the-cuff Marcel for pitchers to be 3/2/1/2, where the last value is for regression towards the mean, compared to the hitter's 5/4/3/2.
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 11:36 a.m.,
January 9, 2004
(#8) -
tangotiger
Good question: 67% for hitters and 63% for pitchers.
If you count XBH as 2b+3b+hr, 30% for hitters (similar as for singles) and 45% for pitchers (similar as for singles).
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 12:26 p.m.,
January 9, 2004
(#9) -
tangotiger
I just want to reiterate, and it's important, that from a pitcher's perspective, this is the equation for variance:
Obs = True + Batters + Fielders + Park + Luck
From a pitcher's perspective, batters (in terms of talent) will have a variance of 0 but not in terms of handedness. Fielders will definitely not have a variance of 0 for BIP, but would for BB,K,HR. Park would not have a variance of 0.
So, those regression equations I have listed assume that all these variances ARE zero.
To bring back the famous equation from the Solving DIPS document:
.012 ^ 2 = pitch ^ 2 + field ^ 2 + park ^ 2 + luck ^ 2
After you figure out field, park, luck, you are left with a pitch variance of .009 or .010, depending on the other values.
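To see what that equation implies, here is a small sketch that works backward from the .009 and .010 pitch figures to the combined size of everything else. I don't list the individual field, park and luck values here, so the code does not assume any; it only takes components out in quadrature, which requires them to be independent. The function name is mine.

```python
import math


def residual_sd(total_sd, *known_sds):
    """SD left over after removing independent components in quadrature."""
    remaining_var = total_sd ** 2 - sum(sd ** 2 for sd in known_sds)
    if remaining_var < 0:
        raise ValueError("known components exceed the observed variance")
    return math.sqrt(remaining_var)


if __name__ == "__main__":
    # Observed spread of .012, pitcher spread of .009 or .010:
    # how big is field + park + luck combined?
    for pitch_sd in (0.009, 0.010):
        print(pitch_sd, round(residual_sd(0.012, pitch_sd), 4))
```

The same function runs the other way: pass in field, park and luck, and what comes back is the pitcher's share.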
So, just be careful in trying to conclude anything with the numbers I published.
To continue the work that I did, you want to:
1 - figure out the best denominator for each component
2 - figure out the variance of field,park,bat on each of these components, for the pitcher (and similarly for the batter)
3 - figure out the interdependent relationship between these components
MGL - Component Regression Values (PDF) (January 8, 2004)
Posted 11:00 p.m.,
January 9, 2004
(#15) -
Tangotiger
Great point, Mike!
FJM: you might also be mixing up 2 separate things.
1 - A .300 true OBA pitcher facing a .400 true OBA hitter will have virtually the same result as a .400 pitcher against a .300 hitter. (I've yet to publish the study, but that's pretty much it.) So, the hitter doesn't have greater influence.
2 - The spread of talent is greater with hitters than pitchers. So, the likelihood is that the result is more based on the hitter than the pitcher. If all pitchers were like Pedro, RJ, Maddux and Clemens (variance close to zero), then the result of the matchup would depend almost entirely on the hitter.
In your case, you are seeing #2.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 10:25 p.m.,
January 13, 2004
(#1) -
Tangotiger
This appeared at Clutch Hits
Here's a quick way to figure out the impact of quality of competition.
Suppose that the true talent of your average opponent was a .536 record. With a 10-1 runs to wins, that's +.36 run differential, or +.18 on offense and +.18 on defense (I'll assume that your opponent was just as good on off as on def). Remember, this is just a rule of thumb.
If the average team scores 4.5 RPG, we see here the impact is 4% (4% x 4.5 = .18).
So, a hitter who came out with 80 RC was really 4% too high, or 3 RC. Someone with 125 RC was 5 runs too high.
I understand that you should look at how good the opposing pitchers were, and not the whole team (absent play-by-play data).
But, this is basically the extent of the impact.
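The rule of thumb above, as a few lines of code. This is only a sketch: the function names are mine, and it bakes in the same assumptions as the post (10 runs per win, 4.5 RPG, and opponents who are equally good on offense and defense).

```python
def opponent_run_effect(opp_win_pct, runs_per_win=10.0, league_rpg=4.5):
    """Size of the quality-of-competition effect, as a fraction of league scoring."""
    # .536 opponents -> +.36 runs per game of run differential
    run_diff_per_game = (opp_win_pct - 0.500) * runs_per_win
    # split evenly between offense and defense
    one_side = run_diff_per_game / 2.0
    return one_side / league_rpg


def rc_adjustment(runs_created, effect):
    """Runs created attributable to the quality of competition."""
    return runs_created * effect


if __name__ == "__main__":
    effect = opponent_run_effect(0.536)
    for rc in (80, 125):
        print(rc, round(rc_adjustment(rc, effect), 1))
```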
MGL takes on the Neyer challenge (January 13, 2004)
Posted 9:43 a.m.,
January 14, 2004
(#7) -
tangotiger
MGL, further proof of your lack of memory... this is getting real old.
That article that Michael linked was the very very first article I ever linked to on "Regression Towards the Mean". And, yes, that's where I got the 1-r.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 1:29 p.m.,
January 14, 2004
(#11) -
tangotiger
If you're going to regress each game one at a time, independent of the other games, the regression will be virtually 100%.
If on the other hand you regress each game, but dependent on the team's identity, I don't think you'll get much further than just taking the actual RS/RA distributions of the teams, and coming up with the expected win%.
Essentially, the reason you want to take it one game at a time is for the "playing to the score" variable. The likelihood is that there's no such thing, so why bother doing it that way, and just use the scoring distributions of the teams in question.
And, even taking the scoring distributions of the teams in question, and assuming that both teams are 4.5 RPG teams, the result will not be that much different than assuming they have the same scoring distribution (I worked it out once, and at the very extremes, I think it worked out to a 4.5 RPG team against a 4.5 league might win .520 games with an extreme distribution).
So, you are essentially just back to using the mean RPG.
Take the team's mean RS and RA, regress it, and come up with the probability of winning, using the Tango Distribution.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 9:14 p.m.,
January 14, 2004
(#14) -
Tangotiger
Always use 5/4/3/2 for hitters, regardless of # of PAs. Remember, the "2" means 2x600.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 11:31 p.m.,
January 14, 2004
(#16) -
Tangotiger
but it doesn't explain that elusive way in which teams turn components into wins.
It's not elusive at all. The Tango Distribution explains it pretty well. The combination of the various components (h,hr,bb, etc) gives you the mean and the variance.
Once you've got that, it's rather trivial to come up with an expected win% given 2 distributions.
A good vigorous least-squares regression analysis on all players with 20 PAs from 1900 to 2003 would hit the spot nicely
I'm not sure what you are asking here, but a player's overall talent level can be determined by regressing the performance by
209 / (209 + PA)
MGL takes on the Neyer challenge (January 13, 2004)
Posted 11:37 p.m.,
January 14, 2004
(#17) -
Tangotiger
David, to expand on the regression, say you have
2003: 200 PA
2002: 600 PA
2001: 100 PA
How do you regress?
Marcel says:
performance PA = 200 x 5 + 600 x 4 + 100 x 3 = 3700
league mean PA = 600 x 2 = 1200
So, 1200 / (1200+3700) = 24.5%
That's what Marcel the Monkey says.
How about something a bit better?
To get "effective" PAs, I'd do:
effective PA = 200 x 1 + 600 x 0.8 + 100 x .6 = 740
regression = 209 / (740 + 209) = 22%
You'll note that 740 is really just 1/5 of 3700.
To get Marcel in-line with this, I should actually do 500 x 2, and not 600 x 2.
The "209" came from another recent thread.
So, we know exactly how much to regress knowing how many PAs.
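Both calculations above, side by side. A minimal sketch with made-up function names; seasons are listed most recent first, and the weights and the 209 are the ones from this post.

```python
def marcel_regression(pa_by_year, weights=(5, 4, 3), mean_weight=2, mean_pa=600):
    """Marcel: league-mean PA over (league-mean PA + weighted performance PA)."""
    weighted_pa = sum(w * pa for w, pa in zip(weights, pa_by_year))
    league_pa = mean_weight * mean_pa
    return league_pa / (league_pa + weighted_pa)


def effective_pa_regression(pa_by_year, weights=(1.0, 0.8, 0.6), x=209):
    """Discount older seasons into 'effective' PA, then apply x / (x + PA)."""
    effective_pa = sum(w * pa for w, pa in zip(weights, pa_by_year))
    return x / (x + effective_pa)


if __name__ == "__main__":
    seasons = (200, 600, 100)  # 2003, 2002, 2001
    print(round(marcel_regression(seasons), 3))
    print(round(effective_pa_regression(seasons), 3))
    # Marcel with 500 x 2 instead of 600 x 2 for the league-mean PA
    print(round(marcel_regression(seasons, mean_pa=500), 3))
```

Passing mean_pa=500 shows how close the 500 x 2 version of Marcel gets to the effective-PA answer.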
MGL takes on the Neyer challenge (January 13, 2004)
Posted 12:40 p.m.,
January 15, 2004
(#20) -
tangotiger
But, I don't think it does.
If we know that every team is equal, but that, as it turns out, the opposing actual win% of the Yanks was .530, we can't assume that they played an unbalanced schedule.
In fact, since we know that every team is equal, the Yanks played a group of .500 teams that managed to play .530 over their games.
While I agree that these SOS type things do assume that the opponent was .530 in this case and therefore the Yanks played an unbalanced schedule, in fact they did not.
The problem is that we don't know that every team is the same (and we are pretty sure they are not). Therefore, we need to regress the opponents' record to account for this.
Perhaps AED can explain how he handles the SOS that takes this into account.
(SOS = strength of schedule)
MGL takes on the Neyer challenge (January 13, 2004)
Posted 7:20 p.m.,
January 15, 2004
(#24) -
Tangotiger
David, you minimize your errors by projecting the true talent as the actual record, rather than increasing the spread of the true talent to match the expected spread of actual records.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 12:23 p.m.,
January 16, 2004
(#28) -
tangotiger
And so if they'd played seven of those games against the Yankees, three against Boston, and I do schedule adjustments and bump them up to .923, that's fine. I know they aren't a .900 team, or a .923 team.
Dackle, what you are saying is fine, but ONLY if you know what the "true talent" win% of Bos and NYY were.
There are two issues here:
1) recasting the 9-1 record into something "true"
2) establishing the opponent's strength
As far as Dackle is concerned, he doesn't care about 1), and neither should we. If KC happens to go 9-1, then that's what they did, and we don't take it away from them. Sosa goes 3-3, with 3 HR and 10 RBIs, and so he gets to keep that performance.
But who did KC do the 9-1 against? Who did Sosa go 3-3/3HR against?
If you look at KC's opponents, even if we "knew" that all of them were .500 teams, they would not necessarily have performed at .500 over the 10 games. Therefore, if we know the opponents are .500, but they actually PERFORMED at .600, this does NOT mean that we set KC's opponent's strength at .600 (and reset their .900 record to .920 or something). The opponents ARE a .500 team, but they just happened to play like a .600 team. KC's 9-1 record was done against a .500 team, and therefore, no adjustment is needed.
Same thing with Sosa. If he happens to do that against Pedro Martinez, but Pedro in his two only previous games was shelled, we don't say "oh, Sammy's strength of opponent had a 7.89 ERA". The limited sample of the opponent does not carry enough information about what the opponent is truly capable of.
In order to do SOS, you need to establish the true talent level of the opponent (whether by team or player).
This does not mean that you regress KC's 9-1 or Sammy's 3 HR night (for what Dackle is trying to do anyway).
MGL takes on the Neyer challenge (January 13, 2004)
Posted 11:21 p.m.,
January 16, 2004
(#34) -
Tangotiger
Dackle, I think we are at a definite impasse, as both sides understand (but don't accept) each other's perspective.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 8:27 a.m.,
January 18, 2004
(#40) -
Tangotiger
I want, on average, for the 2003 Red Sox to go 95-67 in my theoretical dice replay.
And I suppose you want the Royals to go 9-1 each time, right?
Under those requirements then, Dackle's approach is the only one that can fit the bill.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 4:37 p.m.,
January 18, 2004
(#43) -
Tangotiger
Actually, AED, there's no reason to make the SOS additive. I think this was just an example.
In the case where KC is 9-1 and the SOS was .600, I would do:
KC Odds: 9:1
Opp Odds: 1.5:1
KC adj Odds: 13.5:1 (9x1.5)
KC adj win% = 13.5/14.5 = .931
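The same odds arithmetic as a sketch in code. The function names are mine; note that it breaks down for an undefeated team (odds of infinity), so a 1.000 record is not handled.

```python
def to_odds(win_pct):
    """Win% -> odds, e.g. .900 -> 9 (to 1). Not defined for win_pct == 1."""
    return win_pct / (1.0 - win_pct)


def to_win_pct(odds):
    """Odds -> win%, e.g. 13.5 -> 13.5 / 14.5."""
    return odds / (1.0 + odds)


def sos_adjusted_win_pct(team_win_pct, opp_win_pct):
    """Multiply the team's odds by the opponents' odds, then convert back."""
    return to_win_pct(to_odds(team_win_pct) * to_odds(opp_win_pct))


if __name__ == "__main__":
    # KC at 9-1 against opponents with a .600 strength
    print(round(sos_adjusted_win_pct(0.900, 0.600), 3))
```

Against .500 opposition the odds multiplier is 1, so the record comes back unchanged.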
***
The problem here is Dackle trying to justify his approach, when really all he cares about is replicating KC's 9-1. From that standpoint, all the discussion and debate goes away. What Dackle wants is to replicate KC's 9-1, and therefore, he has no choice but to make that its true talent. And therefore, everyone else is in the same boat... they all have their true talents kept. So, from that perspective, Dackle is fine.
His premise though is hard to accept. But, given his premise, I think we have to accept his methodology.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 4:39 p.m.,
January 18, 2004
(#44) -
Tangotiger
Essentially, whether KC's .900 is done in 10 games or 10 million, Dackle WANTS the same result.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 7:05 p.m.,
January 18, 2004
(#47) -
tangotiger
Essentially, as the odds approach infinity, the rate approaches 1.
So, regardless of the opposition (unless that opposition is also 100%), the opposition's strength is ignored for a team with a win% of 100.
MGL takes on the Neyer challenge (January 13, 2004)
Posted 9:41 a.m.,
January 19, 2004
(#52) -
tangotiger
And I think that's exactly what Dackle wants.
Treat any 9-1 record as really being 9 billion wins to 1 billion losses.
Or 5-0 being 5 gazillion wins and 0 losses.
MLB Timeline - Best players by position (January 14, 2004)
Posted 3:55 p.m.,
January 23, 2004
(#23) -
tangotiger
AED, feel free to send me that article, and I'd be glad to post it here (or put a link to your site, if you post it there).
I can't believe that the "clutch ability" would be more than 1 SD = 2 runs, and I would guess that 1 SD = 1 run. I'd be interested to see that, for sure.
DRA Addendum (Excel) (January 16, 2004)
Posted 11:20 p.m.,
January 17, 2004
(#7) -
Tangotiger
I'll post a PDF version of Michael's file on Monday.
DRA Addendum (Excel) (January 16, 2004)
Posted 10:01 a.m.,
January 19, 2004
(#9) -
tangotiger
(homepage)
PDF file can be found here.
DRA Addendum (Excel) (January 16, 2004)
Posted 3:07 p.m.,
January 20, 2004
(#13) -
tangotiger
Using my "True Talent" fielders generated from UZR, 166 of the 402 (41%) fielders were within 1 SD (using an average of about 1000 BIP per player). The spread of fielding talent, according to UZR, is about twice that which you'd get from random.
1 SD corresponds to about 5 runs per 600 BIP.
DRA Addendum (Excel) (January 16, 2004)
Posted 1:14 p.m.,
March 12, 2004
(#20) -
tangotiger
This was posted by Miko:
=========================
Posted 11:50 a.m., March 12, 2004 (#21) - Miko
To those of us who don't (yet) have excel (yes, it's true, and I'm embarrassed), why not just put a "leaders" post or whatever here, so that everyone can share in the glory of DRA (absolutely no disrespect or satire intended).
FYI:
One can almost always open excel spreadsheets of the sort MHumpreys provides with the spreadsheet program "Calc," which is included in the open-source and freely downloadable Open Office.
OO may barf on excel files which use lots of graphs and such, but so far I've been able to open all of the ones linked to on BPrimer.
Oh, and thanks, tango, for releasing data in .csv format so often. Data portability is underrated.
Obscure Rule Flags Students Who Sharply Improve SAT Scores (January 21, 2004)
Posted 10:49 p.m.,
January 21, 2004
(#1) -
tangotiger
(homepage)
This link has the full story, along with this:
The testing service looked at Soriano's answers and those of the students sitting near her. Using a measure called the Probability of Matched Incorrect Answers, the review panel determined Soriano's incorrect answers on the test were similar enough to those of a student at a neighboring desk to indicate she had cheated.
Obscure Rule Flags Students Who Sharply Improve SAT Scores (January 21, 2004)
Posted 9:32 a.m.,
January 22, 2004
(#5) -
tangotiger
Having multiple tests (or even the same test but with differently ordered questions) in the same room seems to be the easiest way to control this. Not sure why they wouldn't just make it a rule.
MGL's MLEs (January 22, 2004)
Posted 3:01 p.m.,
January 22, 2004
(#2) -
tangotiger
Now, all we need is for someone to supply minor league fielding statistics, which I can translate into position-neutral stats, and we'll have the best list of minor league players.
Michael H: do you have access to minor league data?
MGL's MLEs (January 22, 2004)
Posted 4:21 p.m.,
January 23, 2004
(#5) -
tangotiger
The teams are whatever their last team was they played for.
CSV: you should be able to look at them with any text editor. I don't know what kind of text format the mac uses, but there must be some conversion utility somewhere.
MGL's MLEs (January 22, 2004)
Posted 4:40 p.m.,
January 27, 2004
(#7) -
tangotiger
(homepage)
According to the above link, Myrow played in the minor league all-star team.
According to MGL's MLEs, Myrow is an excellent hitter by MLB standards.
He's also a 3B in the Yankee system.
Soooooooo..... can anyone tell me about this guy?
MGL's MLEs (January 22, 2004)
Posted 10:44 a.m.,
January 28, 2004
(#9) -
tangotiger
Ouch.... that's pretty bad. He sounds like the perfect DH.
Clutch Hits - True Talent Levels (January 22, 2004)
Posted 4:38 p.m.,
January 24, 2004
(#2) -
tangotiger
(I usually block these threads, when I reference other threads. Not sure how this one slipped by me!)
***
Just to be clear, are you saying that I'm saying that:
1 - there is only one true talent level per player/student, given the exact context
2 - there are multiple talent levels per player/student, given the exact context
3 - there are multiple talent levels per player/student, given multiple contexts
The answer is #1 and #3.
Are you saying that I'm saying:
1 - We can measure the whole distribution of true talent levels of ALL Players, as a single true talent level per player, given the single same context
2 - We can measure the whole distribution of true talent levels of ALL Players, as a single true talent level per player, given the single different contexts
3 - We can measure the whole distribution of true talent levels of ALL players, as multiple true talent levels per player, given the multiple contexts
The answer is #1.
If you agree that this is what I'm saying, then you can proceed with whatever assumptions you have about what I'm saying.
If you don't agree with my answers, then you do not know what I'm saying. Therefore, proceed with whatever you have to say, without referencing what I'm saying.
SuperLWTS Aging Curve (January 26, 2004)
Posted 11:36 a.m.,
January 26, 2004
(#2) -
tangotiger
I agree that the peak age does go down a bit, since fielding/baserunning is much more speed dependent (and we know those peak pretty early).
I usually use 25-29 to denote peak.
I would almost always stay away from a free agent, since they are usually paid above what they can deliver in the future. No question that trying to make it off the backs of the young guys, if you can find those quality young guys, has a great return for the money.
SuperLWTS Aging Curve (January 26, 2004)
Posted 1:12 p.m.,
January 26, 2004
(#3) -
tangotiger
I should have noted how MGL did this. He took players from successive seasons (not sure how far back) with at least 100 PAs in each season, and weighted them by the lesser of the two PAs.
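To make the weighting concrete, here is a sketch of that kind of year-to-year comparison: keep only players with at least 100 PA in both seasons, and weight each player's change by the lesser of his two PA totals. The function name and the sample rows are purely illustrative (they are not MGL's data), and "rate" stands in for whatever per-PA measure is being aged.

```python
def weighted_mean_change(pairs, min_pa=100):
    """Average year-to-year change, weighted by the lesser of the two PA totals.

    pairs: iterable of (rate_year1, pa_year1, rate_year2, pa_year2).
    """
    num = 0.0
    den = 0.0
    for rate1, pa1, rate2, pa2 in pairs:
        if pa1 < min_pa or pa2 < min_pa:
            continue  # needs at least min_pa in each season
        weight = min(pa1, pa2)
        num += weight * (rate2 - rate1)
        den += weight
    if den == 0:
        raise ValueError("no qualifying player-seasons")
    return num / den


if __name__ == "__main__":
    # Hypothetical players: (rate, PA) in year 1, then (rate, PA) in year 2
    sample = [
        (0.340, 600, 0.350, 200),
        (0.320, 150, 0.310, 500),
        (0.300, 90, 0.400, 600),   # dropped: under 100 PA in year 1
    ]
    print(round(weighted_mean_change(sample), 4))
```

Nothing here deals with the selective sampling problem; it only shows the mechanics of the weighting.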
There are selective sampling issues to be sure.
One thing that I've started to do is regress the performance of year 1. For the year 2, there's other issues as well, which can be partly handled by regression, but not to the same extent or way that you would do the year 1 regressions. As well, the weighting issue is a little dicey, and you have to be careful there.
I'm sure those with stat knowledge of Heckman Selection Correction can speak better than I can on this issue.
SuperLWTS Aging Curve (January 26, 2004)
Posted 10:33 a.m.,
January 27, 2004
(#10) -
tangotiger
File has been updated.
SuperLWTS Aging Curve (January 26, 2004)
Posted 3:01 p.m.,
January 27, 2004
(#13) -
tangotiger
(homepage)
AED, If you have not checked it out yet, go to the above homepage link, and click on the article there.
I'd be interested to hear your thoughts on the sampling issue, insofar as the PA component plays a role.
SuperLWTS Aging Curve (January 26, 2004)
Posted 8:50 a.m.,
January 28, 2004
(#18) -
tangotiger
The easiest ways to get rid of that are to either require a player to have played at least N seasons
This won't work either. Even for 10 years, you still run the risk that the observed line is a bit higher than what his true talent line is. The fewer the years you use, the bigger the gap between the observed line and the true talent line.
The player's last year's observed point will almost surely be way below his true talent point.
No player's true talent drops as much as his observed performance drops in his last year of play.
SuperLWTS Aging Curve (January 26, 2004)
Posted 2:16 p.m.,
January 28, 2004
(#19) -
tangotiger
(homepage)
If you go to the above link, and page down to my "June 28" comments, you will see that I did an interesting exercise, similar to what is being asked here.
I broke my groups of players by length of careers, to get different aging patterns.
Aging Patterns and Selective Sampling (January 26, 2004)
Posted 5:07 p.m.,
January 26, 2004
(#1) -
tangotiger
There's a tremendous amount of information you can get from that last chart at the end. See the "count" number? You can generate "attrition rates" based on that. That is, guys who are full-timers for 2 years in a row at the age 26-28 class have a low attrition rate, while part-timers have a larger attrition rate.
I'm sure if you add more parameters, like performance level, etc, you can create a function for attrition rates rather easily. That is, given age, probable true talent level (hitting+fielding), and number of PAs, you can come up with a nifty function to figure out the expected distribution of PAs for the next 3 years for a player.
Futility Infielder - 2003 DIPS (January 27, 2004)
Posted 11:25 a.m.,
January 27, 2004
(#5) -
tangotiger
(homepage)
Mike,
I suggest you use FIP:
(13*HR+3*BB-2*SO)/IP
You may find the above link somewhat informative. You can also check out baseballgraphs.com
Tom
Futility Infielder - 2003 DIPS (January 27, 2004)
Posted 3:02 p.m.,
January 27, 2004
(#16) -
tangotiger
Yes, I think a simple:
dipsERA = FIP + 3.2
(or whatever the 3.2 needs to be to best-fit to the 2003 data)
would be a good idea, to go along side Jay's data.
Futility Infielder - 2003 DIPS (January 27, 2004)
Posted 3:43 p.m.,
January 27, 2004
(#19) -
tangotiger
Jay, if you want to include FIP, you can do the following:
dipsERA = (13*HR + 3*(BB-IBB+HBP) - 2*SO)/IP + 3.12
That's for 2003. The constant for the last few years is:
2003: 3.12
2002: 3.06
2001: 3.15
2000: 3.22
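For anyone who wants to drop this into a script, a minimal sketch. The function names are mine, the yearly constants are the four listed above, and the sample stat line is invented for illustration. IP should be a true decimal (200 1/3 innings is 200.333, not 200.1).

```python
# Yearly constants from the list above
FIP_CONSTANT = {2003: 3.12, 2002: 3.06, 2001: 3.15, 2000: 3.22}


def fip(hr, bb, so, ip, ibb=0, hbp=0):
    """(13*HR + 3*(BB - IBB + HBP) - 2*SO) / IP, before the yearly constant."""
    return (13 * hr + 3 * (bb - ibb + hbp) - 2 * so) / ip


def dips_era(hr, bb, so, ip, year, ibb=0, hbp=0):
    """FIP plus the constant for that season."""
    return fip(hr, bb, so, ip, ibb=ibb, hbp=hbp) + FIP_CONSTANT[year]


if __name__ == "__main__":
    # Hypothetical pitcher: 20 HR, 60 BB (5 IBB), 5 HBP, 180 SO in 200 IP
    print(round(dips_era(20, 60, 180, 200.0, 2003, ibb=5, hbp=5), 2))
```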
Futility Infielder - 2003 DIPS (January 27, 2004)
Posted 12:07 a.m.,
January 28, 2004
(#26) -
tangotiger
The only reason to use 1 year PF is if you think that:
1 - The climate was very nonrandom that year compared to other years
2 - The park dimensions changed enough that you are playing in practically a new park
I would cut my losses, and assume for an outdoor park that it's 50% 1 year and 50% 100 years (or however long you can go not to conflict with #2 above).
(Reminder with multi-year PF: the PF may be set to 1.0 for the league for 2003, but it won't be 1.0 for the league in any other year where the parks are not the same.)
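As a sketch, the 50/50 blend is just a weighted average. The function name and the sample park factors below are made up for illustration.

```python
def blended_park_factor(pf_one_year, pf_long_run, one_year_weight=0.5):
    """Blend a 1-year park factor with a long-run park factor."""
    return one_year_weight * pf_one_year + (1.0 - one_year_weight) * pf_long_run


if __name__ == "__main__":
    # Hypothetical outdoor park: 1.10 this year, 1.02 over its history
    print(round(blended_park_factor(1.10, 1.02), 3))
```

Setting one_year_weight to 1.0 covers the two cases above where only the 1 year PF makes sense.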
Futility Infielder - 2003 DIPS (January 27, 2004)
Posted 2:21 p.m.,
January 28, 2004
(#31) -
tangotiger
I agree with Charlie.
Our best guess as to the ability of a pitcher on BIP, is 1 SD = .009 hits / BIP.
So, given 700 BIP, we expect 95% of our pitchers to have a true talent rate of +/- 10 runs.
I don't know what the +/- is on the 250 non-BIP, but I think it would be higher. (It's certainly higher on a per-play basis)
Clutch Hitters (January 27, 2004)
Posted 10:26 p.m.,
January 27, 2004
(#4) -
tangotiger
Leveraged Index.
It's the amount of "swing win" impact each situation presents. Bottom of 9th, down by 1, men on base, 1 out, the probability distribution for the potential new win % swings about 10 times as much as in a random situation. That's an LI of 10.
Bottom of the third, up by 18, the LI is almost 0.
Clutch Hitters (January 27, 2004)
Posted 11:32 a.m.,
January 28, 2004
(#8) -
tangotiger
I'll try to do that tomorrow.
I do it both ways, by total runs and total runs per 600 PA. I just presented it one way here.
Clutch Hitters (January 27, 2004)
Posted 12:37 p.m.,
January 28, 2004
(#9) -
tangotiger
Among the 119 hitters with at least 2000 PA from 99-02, the LI for the hitters ranged from 1.09 (Phil Nevin) to .91 (Fernando Vina). Tejada was 110th among these hitters at .94.
Clutch Pitchers (January 28, 2004)
Posted 11:55 p.m.,
January 28, 2004
(#1) -
tangotiger
Btw, I redid the LI for all my relievers, but this time, I used the win probability table for 1999-2002 (as opposed to my generic 1974-1990 win prob table).
Boy, did that make a difference. It's a lot easier to win a game with a 3-run lead in 1986 than in 2000. As a result of redoing the LI, Troy Percival comes in with a 2.17 for 1999-2002. 9 pitchers had an LI above Bruce Sutter's career LI of 1.90. Keith Foulke came in #23, which shows how horribly he was used. Lou Pote brought up the rear among pitchers with at least 800 PA, at .54, and the incredible John Wasdin had a .64 for 2nd to last place.
Clutch Pitchers (January 28, 2004)
Posted 2:44 p.m.,
January 29, 2004
(#3) -
tangotiger
I wanted to run Wasdin, because I couldn't believe it either. Here we go.
This is how Wasdin did, in various pressure situations (the more negative, the better):
PA avgLI runs
582 0.18 (19)
198 0.73 (7)
84 1.24 10
69 1.90 7
42 3.35 8
So, he had 582 PAs in low-pressure situations, and he was great. He had 42+69 high pressure PAs, and he was terrible.
The total of his performance was 975 PAs, for a total of -1 runs. That is, without quantifying the pressure of his performance, he was a league average pitcher.
However, those 582 PAs had very little pressure (18% of normal swing impact). Recasting the above numbers by multiplying by the LI, and we get:
PA runs
103 (3)
145 (5)
104 13
131 13
140 28
His total leveraged PAs is now 622. (Remember, his total was 975, so his LI is .64).
The total of his runs is now +46 runs.
Hmmm... how did I get 72? Looks like I have a bug in my program, as I double-counted his LI (46/.64 = 72). I'll fix that shortly.
So, Wasdin is 47 runs worse, based on the game context, than his random performance would have dictated.
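Here is the recasting step as a sketch, using the five rows from the first Wasdin table. The names are mine. Because the avgLI and runs columns in that table are rounded, the leveraged totals come out close to, but not exactly at, the 622 PA and 46 runs shown above.

```python
# (PA, avgLI, runs) per pressure bucket; negative runs are good for the pitcher
WASDIN = [
    (582, 0.18, -19),
    (198, 0.73, -7),
    (84, 1.24, 10),
    (69, 1.90, 7),
    (42, 3.35, 8),
]


def leverage_summary(buckets):
    """Raw and leverage-weighted totals for a set of pressure buckets."""
    total_pa = sum(pa for pa, _, _ in buckets)
    total_runs = sum(runs for _, _, runs in buckets)
    leveraged_pa = sum(pa * li for pa, li, _ in buckets)
    leveraged_runs = sum(runs * li for _, li, runs in buckets)
    return {
        "total_pa": total_pa,
        "total_runs": total_runs,
        "leveraged_pa": leveraged_pa,
        "leveraged_runs": leveraged_runs,
        "overall_li": leveraged_pa / total_pa,
    }


if __name__ == "__main__":
    for key, value in leverage_summary(WASDIN).items():
        print(key, round(value, 2))
```

The gap between leveraged_runs and total_runs is the effect of the game context; dividing leveraged_runs by the overall LI a second time is the double-counting bug mentioned above.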
Looking at Wasdin's 42 PA with the highest pressure (i.e., with the game most on the line), this is what he did:
HR: 5
2B: 3
RBOE: 2
1B: 5
HBP: 2
BB: 4
Outs: 21
Now, that is complete futility: 21 times on base in the 42 PAs where the game was most on the line. That's a .500 OBA. Each of these PAs had an LI of at least 2.5 (which is fireman territory).
John Wasdin pitched exactly as bad as Barry Bonds hits well, and he did it with the game on the line.
If I go back to his 111 PAs with an LI of at least 1.5, the hitters got on base 49 times in 111, or .441, with a total of TWELVE HR. Remember, that's in 111 PAs.
A truly horrible performance.
Clutch Pitchers (January 28, 2004)
Posted 2:53 p.m.,
January 29, 2004
(#4) -
tangotiger
FJM: thanks for spotting the bug. List has now been updated. Seeing Mariano Rivera as "bad clutch" and Armando as "good clutch", I will present their numbers in a few minutes as well.
Clutch Pitchers (January 28, 2004)
Posted 3:02 p.m.,
January 29, 2004
(#5) -
tangotiger
Mariano Rivera. Here we go. 312 PA (I exclude all IBB) with an LI of at least 2.5
6 HR, 6 2B, 6 RBOE, 52 1B, 1 HBP, 17 BB, 224 outs.
That's an OBA of .282, which is excellent.
However, overall, he had 1066 PA, with an OBA of .260, which is sensational. So, his "poor" showing with the pressure for Mo is that he went from super-great overall to great overall.
I'll do Armando next.
Clutch Pitchers (January 28, 2004)
Posted 3:06 p.m.,
January 29, 2004
(#6) -
tangotiger
Armando. .275 OBA overall. .220 OBA with the game on the line!
So, Armando went from being great to unfreakingbelievable.
Benitez had 353 PAs with the game on the line. This is what he did:
8 HR, 2 3B, 8 2B, 2 RBOE, 28 1B, 2 HBP, 28 BB, 275 outs.
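The OBA figures in these posts can be reproduced straight from the event lines. A minimal sketch, with my own function name; note that RBOE counts as reaching base here, which is how the "21 times on base" for Wasdin was tallied.

```python
def oba_against(events):
    """On-base average allowed: everything that is not an out, over all PA."""
    pa = sum(events.values())
    on_base = pa - events.get("Outs", 0)
    return on_base / pa


# High-leverage lines (LI of at least 2.5) as posted in this thread
WASDIN = {"HR": 5, "2B": 3, "RBOE": 2, "1B": 5, "HBP": 2, "BB": 4, "Outs": 21}
RIVERA = {"HR": 6, "2B": 6, "RBOE": 6, "1B": 52, "HBP": 1, "BB": 17, "Outs": 224}
BENITEZ = {"HR": 8, "3B": 2, "2B": 8, "RBOE": 2, "1B": 28, "HBP": 2, "BB": 28, "Outs": 275}

if __name__ == "__main__":
    for name, line in (("Wasdin", WASDIN), ("Rivera", RIVERA), ("Benitez", BENITEZ)):
        print(name, sum(line.values()), round(oba_against(line), 3))
```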
Clutch Pitchers (January 28, 2004)
Posted 3:51 p.m.,
January 29, 2004
(#7) -
tangotiger
When I look at my "Win Advancement" (WA), who do we see bringing up the rear? John Wasdin. He had 9.7 WA and 14.2 LA (loss advancements). That makes him -4.5 wins compared to average. This WA process was done on a PA-by-PA basis.
As you can see, my current "rough" process detailed in this thread had Wasdin at -46 effective runs. At a 10:1 runs to win converter, that translates to -4.6 wins compared to average.
Clutch Pitchers (January 28, 2004)
Posted 4:57 p.m.,
January 29, 2004
(#9) -
tangotiger
"effOBA" is LWTS converted into an OBA, as described in a thread a few weeks ago.
Mlicki
PA avgLI effOBA
600 0.31 0.407
1018 0.77 0.374
561 1.23 0.353
356 1.87 0.346
48 3.11 0.258
Weathers
PA avgLI effOBA
489 0.21 0.334
284 0.73 0.321
183 1.24 0.340
263 1.97 0.293
171 3.48 0.243
Should I just generate this for all the pitchers?
Clutch Pitchers (January 28, 2004)
Posted 11:16 p.m.,
January 29, 2004
(#11) -
tangotiger
Pitcher stats are a lot less reliable than hitter stats. If you take his last 2 lines of pressure (434 PA), you would regress that by a little less than 50%. If he was a hitter, you'd regress by about 30%.
But, this regression is towards the population mean.
However, since we "know" that the observed distribution of high-pressure OBA compared to overall true talent OBA is exactly what you'd expect from luck, you should regress everyone's high-pressure OBA by 100% towards the true talent.
So, either you regress his performance by 50% towards the pop mean of .340, or you regress his performance by 100% towards his true mean.
Bottom line, our best guess is that all these performances are just random variations centered around the player's true mean.
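A sketch of the first option. The 434 PA matches the last two Weathers lines posted above (263 + 171), so I'm assuming those are the lines in question; the function names are mine, and the 50% and the .340 population mean are the round figures from this post.

```python
def pa_weighted_rate(lines):
    """Combine (PA, rate) lines into one PA-weighted rate and a PA total."""
    total_pa = sum(pa for pa, _ in lines)
    rate = sum(pa * r for pa, r in lines) / total_pa
    return rate, total_pa


def regress(observed, target, fraction):
    """Move `observed` toward `target` by `fraction` (0 = not at all, 1 = all the way)."""
    return observed + fraction * (target - observed)


if __name__ == "__main__":
    # Weathers, two highest-pressure lines: (PA, effOBA)
    high_pressure, pa = pa_weighted_rate([(263, 0.293), (171, 0.243)])
    print(pa, round(high_pressure, 3))
    # Option 1: regress 50% toward the population mean of .340
    print(round(regress(high_pressure, 0.340, 0.50), 3))
```

The second option needs no arithmetic: regressing 100% toward the pitcher's own true mean just returns that true mean, whatever it is.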
Clutch Pitchers (January 28, 2004)
Posted 11:45 a.m.,
January 30, 2004
(#15) -
tangotiger
Having a low LI just means that they weren't facing many "fire" situations. We don't know why that is. Is it because their teams took early leads? Is it because they pitched great, to let their teams score? Is it because they pitched horribly, and their teams couldn't keep up the score?
LI is not an indication of the pitcher, so much as the situation. LI = level of fire.
As for Wasdin, to the extent that clutch pitching might exist, Wasdin is by far the worst clutch pitcher around. If a psychologist were to interview Wasdin, and tell us that Wasdin is perfectly fine, and whatever intangible characteristic they can figure out, then that's another nail in the "clutch doesn't exist".
Wasdin is the poster boy here, as much as Patrick Roy is the poster boy for great clutch performer. I BELIEVE that Roy does have something extra.... up until the point in the semi-finals when he gave up 9 goals against Detroit. So, who knows.
Super Bowl Notebook: Is Adams a genius' genius? (January 29, 2004)
Posted 9:51 a.m.,
January 29, 2004
(#1) -
tangotiger
(homepage)
Here's another piece on Adams.
Adams is in charge of computer analysis and statistical evaluation, but as a former coach and scout he is more than a mathematician. Adams has had a hand in nearly all of the major personnel and game-planning decisions made by Belichick in the past four seasons. ... it was Belichick and Adams who ... determined that the timing of the entire Rams offense was predicated on running back Marshall Faulk. The Pats' championship-winning game plan, which relied on pounding Faulk at every opportunity, was the result of that conversation. ...
Adams truly is a behind-the-scenes character and he has not granted interview requests to this point, but much of his work will be on full display in the coming days (or weeks) as the Pats go after another championship.
Leveraged Index Leaders/Trailers (January 29, 2004)
Posted 11:22 a.m.,
January 29, 2004
(#2) -
tangotiger
You will also note that Mike Lieberthal (also with the Phils) came in the top 10. The Phils may have had an unusual number of close games played.
Tejada being low may also be a sign that the A's had fewer close games played. Sorry, I don't have the time to run the team data, but I can give you the LI for Zito (0.91), Hudson (0.95) and Mulder (0.97). Lidle was 0.91 and Heredia was 0.97 from 1999-2002. So, it could simply be that the A's didn't play in many pressure situations because they would take an early lead.
Leveraged Index Leaders/Trailers (January 29, 2004)
Posted 11:22 a.m.,
January 31, 2004
(#4) -
tangotiger
I wouldn't try to make any conclusion on two or three players.
Sheehan: Foulkelore (January 29, 2004)
Posted 1:56 p.m.,
January 29, 2004
(#1) -
tangotiger
Hmmm... just thinking about it now. I said:
28: 4.5
29: 4.3
30: 4.1
31: 3.9
32: 3.7
33: 3.5
34: 3.3
If I add in the 26 and 27 lines, I would have had:
26: 4.5
27: 4.7
28: 4.5
29: 4.3
30: 4.1
31: 3.9
32: 3.7
33: 3.5
34: 3.3
The 26-30 group we expect was actually a total of 22.5. So, all the numbers should bump up by 0.1, so that we get:
26: 4.6
27: 4.8
28: 4.6
29: 4.4
30: 4.2
31: 4.0
32: 3.8
33: 3.6
34: 3.4
The 26-30 group comes in for a total of 22.6. That's the regressed performance (true talent) of our pitchers.
Their out-of-sample true talents would be 14.8 wins.
Not much change, but just wanted to clear it up.
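The arithmetic in this post can be checked with a short sketch. It assumes that "out-of-sample" refers to ages 31 through 34, which is a guess based on those four lines summing to 14.8; the variable names are illustrative only.

```python
# Expected wins by age, as listed in the post before the 0.1 bump.
base = {26: 4.5, 27: 4.7, 28: 4.5, 29: 4.3, 30: 4.1,
        31: 3.9, 32: 3.7, 33: 3.5, 34: 3.3}

# Bump every age by 0.1, as the post does.
bumped = {age: round(wins + 0.1, 1) for age, wins in base.items()}

# In-sample group: ages 26-30.
in_sample = round(sum(bumped[a] for a in range(26, 31)), 1)

# Out-of-sample group: ages 31-34 (assumption, see the note above).
out_of_sample = round(sum(bumped[a] for a in range(31, 35)), 1)

print(in_sample, out_of_sample)
```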
Sheehan: Foulkelore (January 29, 2004)
Posted 6:54 p.m.,
January 29, 2004
(#3) -
tangotiger
Great stuff, Bob!
Smack the Pingu (January 29, 2004)
Posted 4:54 p.m.,
January 29, 2004
(#1) -
tangotiger
(homepage)
This link contains the original.
Smack the Pingu (January 29, 2004)
Posted 9:40 a.m.,
January 30, 2004
(#6) -
tangotiger
jto, I got the same score.... I think that might be the max.
I also got a negative number, and it might be if you swing and miss the very first time.
Astute Phelps learning his way with Jays (January 29, 2004)
Posted 1:48 p.m.,
January 30, 2004
(#3) -
tangotiger
A pitcher has a skill with hits per bip. A batter has a skill with hits per bip.
It's far easier to detect that with batters. And, the spread in the talent is wider for batters.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 11:50 a.m.,
January 30, 2004
(#2) -
tangotiger
Please be careful that I have listed things as RATIOS and not rates. So, there's going to be some "interdependence". Just be careful!
I'll let the stat-savvy's speak their mind on the regression towards the mean and selective sampling issue. Following that, I'll be happy to create an Excel file (something like Brock2.... I don't want to use Marcel, because this is too convoluted for that... how about Rodin1?).
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 11:54 a.m.,
January 30, 2004
(#4) -
tangotiger
(e-mail)
Oh, and the other thing that I haven't done (and I'm not sure that I will do in the foreseeable future) is playing time.
So, in conjunction with the above, you want to have something else that also includes playing time. Pitchers are a pain, because of the start/relief issue, and the high number of injuries.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 1:31 p.m.,
January 30, 2004
(#8) -
tangotiger
Actually, I should have noted that I only looked at pitchers who faced at least 250 batters in consecutive years while playing for the same team and in the same league. So, in the rare cases where a team changed park, this was not a good choice. (Perhaps I should have limited by that too.) However, seeing that it is rare, it would hardly cause a dent. Remember, I've got several hundred pitchers per age pair. One or two pitchers won't affect that.
As for ratios and not rates, I've talked about this a few times in a few places. It's the only way to get the chaining to work fine. I'll explain in a separate post.
As for the "binary" approach, this is also necessary to chain, and to get the best "opportunity" factor for every event, and, so far, the best way to isolate all the events (probably).
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 1:32 p.m.,
January 30, 2004
(#9) -
tangotiger
Would using the various DIPS components break up the interdependences
This is what I used. From the article, I noted:
HBP: HBP/(PA-IBB-HBP)
BB: (BB-IBB)/(PA-HBP-BB)
SO: SO/(PA-HBP-BB-SO)
HR: HR/(PA-HBP-BB-SO-HR)
xH: (H-HR)/(PA-HBP-BB-SO-H)
As you can see, the numerator becomes more and more isolated.
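As a rough illustration, the five ratios listed above can be computed from a single pitching line. This is only a sketch: the function name and the sample numbers are made up, and the formulas are copied as written in the post.

```python
def binary_ratios(pa, ibb, hbp, bb, so, hr, h):
    """Return the five chained ratios (odds, not rates) from one pitching line."""
    return {
        "HBP": hbp / (pa - ibb - hbp),
        "BB": (bb - ibb) / (pa - hbp - bb),
        "SO": so / (pa - hbp - bb - so),
        "HR": hr / (pa - hbp - bb - so - hr),
        "xH": (h - hr) / (pa - hbp - bb - so - h),
    }

# Hypothetical pitching line, for illustration only.
ratios = binary_ratios(pa=800, ibb=5, hbp=5, bb=60, so=150, hr=20, h=180)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

Each ratio compares an event to what is left over once that event and the earlier ones are removed, which is what lets the ratios be chained one after another.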
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 1:32 p.m.,
January 30, 2004
(#10) -
tangotiger
Ughh... denominator.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 1:45 p.m.,
January 30, 2004
(#11) -
tangotiger
I posted this at fanhome 3 years ago, and it's important to those who want to understand the difference between rates and ratios (odds).
=========================
Let's say that at age 31, the typical player has 5 triples, and 25 doubles. And let's say that that same guy, playing at age 36 would have 2 triples and 20 doubles.
So, as a "ratio", the age 31 3b/2b is 5:25, or .20. At age 36, the ratio is 2:20, or .10. So, comparing age 36 to age 31 shows that the ratio should be .10/.20 or .50.
Now, let's say you have a guy that has 10 triples, and 20 doubles at age 31. That ratio is .50. At age 36, we would expect this guy, if he follows the same aging pattern as the typical example from above, to have a ratio of .25. (Now, this is where you need more info. You need to figure out his $BB, so that you can get his new AB and BB. Then you need his $K, $HR, $H, $XBH to get his new K, HR, H, XBH. Let's say his XBH comes out to 21.) So, if his ratio is .25, and we know that his 2b+3b is 21, then we can say that his 3B will be about 4, and his 2B will be about 17.
Now, let's try it the other way, and work with percentages. The typical player at age 31 will have 17% of his XBH as triples, and 83% as doubles. At age 36, the triples% will be 9% and his doubles% will be 91%. As you can see, his triples rate at age 36 is 55% of age 31, and his doubles rate at age 36 is 109% of his age 31 rate.
Ok, so, now, let's take our guy with 10 triples and 20 doubles at age 31 again. His triples rate is 33% and his doubles rate is 67%. Applying the factors from above, at age 36 his triples rate will be .333 * .545 = .182. His doubles rate will be .667 * 1.091 = .727. As you can see, his 2b+3b will equal .909. And this is impossible, since the two rates have to add up to 1.
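The contrast between the two approaches can be sketched in a few lines. Here the ratio is computed directly as triples divided by doubles, and the rate as each event's share of doubles plus triples; the helper names are illustrative and not from the original post.

```python
def shares(triples, doubles):
    """Rates: each event as a share of doubles plus triples."""
    total = triples + doubles
    return triples / total, doubles / total

def ratio_to_shares(ratio):
    """Convert a triples-to-doubles ratio (odds) back into two shares."""
    triples_share = ratio / (1.0 + ratio)
    return triples_share, 1.0 - triples_share

typical_31, typical_36 = (5, 25), (2, 20)   # typical player from the post
player_31 = (10, 20)                        # the triples-heavy player

# Ratio approach: age the odds, then convert back to shares.
ratio_factor = (typical_36[0] / typical_36[1]) / (typical_31[0] / typical_31[1])
player_ratio_36 = (player_31[0] / player_31[1]) * ratio_factor
ratio_shares_36 = ratio_to_shares(player_ratio_36)

# Rate approach: age each share separately.
t31, d31 = shares(*typical_31)
t36, d36 = shares(*typical_36)
pt, pd = shares(*player_31)
rate_shares_36 = (pt * t36 / t31, pd * d36 / d31)

print("ratio approach sums to", sum(ratio_shares_36))
print("rate approach sums to", round(sum(rate_shares_36), 3))
```

The ratio approach always converts back to shares that total 1, whatever the starting mix, while aging each rate separately gives no such guarantee.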
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 2:27 p.m.,
January 30, 2004
(#13) -
tangotiger
MGL, I looked at age/experience a few years ago at Fanhome. I'll try to find it, and post it here for you.
==============================
As for the pre-chaining, and post-chaining.
Say you have
Age Change in Performance
26-27 +5%
27-28 +0%
28-29 -5%
29-30 -10%
30-31 -15%
What do you do?
Now, the simple way would be to do:
26 100
27 105 (or 100 + 5)
28 105 (or 105 + 0)
29 100 (or 105 - 5)
30 90 (or 100 - 10)
31 75 (or 90 - 15)
This is probably the way MGL does it. But, percentages are not meant to be added like this. It's close, but not mathematically accurate.
What you want is:
26 100
27 105 (or 100 + 5%)
28 105 (or 105 + 0%)
29 99.75 (or 105 - 5% or 105 * .95)
30 89.775 (or 99.75 - 10% or 99.75 * .90)
31 76.31 (or 89.775 - 15% or 89.775 * .85)
I understand there's not much difference (especially since we are really only talking about 2 or 3% change, and not the 5 or 10% in my example).
But, I see no reason not to do it right.
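The two chaining methods from this post can be compared side by side. This is a minimal sketch using the example percentages above; the variable names are illustrative.

```python
# Year-to-year changes for the age pairs 26-27 through 30-31.
changes = [0.05, 0.00, -0.05, -0.10, -0.15]

additive = [100.0]        # the "simple way": add percentage points
multiplicative = [100.0]  # the chained way: multiply by (1 + change)

for change in changes:
    additive.append(additive[-1] + 100.0 * change)
    multiplicative.append(multiplicative[-1] * (1.0 + change))

for age, (a, m) in enumerate(zip(additive, multiplicative), start=26):
    print(age, round(a, 3), round(m, 3))
```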
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 2:53 p.m.,
January 30, 2004
(#14) -
tangotiger
(homepage)
MGL, the above homepage link contains a file I had prepared for you at fanhome 2 years ago. I don't remember the time period but I think it was 1979-1999.
It shows you the unregressed year-to-year change, for ERA, given: prior years experience and age.
Let's look at age 27, and let me cut and paste the relevant portion:
PriorExp Age PA1 $ER
0 27 9135 1.06
1 27 19121 1.05
2 27 33890 1.06
3 27 44862 1.04
4 27 38466 1.06
5 27 29669 1.02
6 27 22825 1.06
7 27 8268 0.96
So, what does this show us? For players aged 27, their performance worsened by about 5%, regardless of number of previous years of experience.
****
A regression of prior years against change in performance gives an r of .02. A regression of age against change in performance gives an r of .18. With age and prior years together, the r is .31. So, it seems that the number of prior years is important.
However...
How about running a regression of age when first made the majors (age minus prior exp) against change in performance? r is .28.
When a pitcher is brought up at age 21, it includes with it a certain amount of scouting information.
Let's look at the 10 best change in performance year-to-year:
PriorExp Age First Year PA1 $ER
18 37 19 2260 0.82
15 35 20 2449 0.85
6 25 19 2346 0.87
10 30 20 6319 0.87
16 35 19 2776 0.89
4 24 20 10130 0.92
17 37 20 2321 0.92
4 25 21 24857 0.92
13 34 21 7089 0.92
7 34 27 2892 0.92
Wow. In 9 of the 10 groups, the pitchers first entered the league at age 19 through 21, and we see that those pitchers had the best change in performance.
Of course, we've got small sample-itis. Redoing this, but limiting it to at least 10,000 PAs for each year-to-year change, and we have:
PriorExp Age First Year PA1 $ER
4 25 21 24857 0.92
4 24 20 10130 0.92
9 30 21 14527 0.94
2 23 21 20635 0.96
2 28 26 16468 0.98
4 30 26 14764 0.98
1 24 23 40812 0.99
8 29 21 19124 0.99
1 25 24 45688 1.01
5 28 23 38200 1.01
Again, the pitchers who started their careers at age 20/21 dominate the most.
Here are the 5 worst year-to-year change in performance:
PriorExp Age First Year PA1 $ER
3 28 25 27264 1.13
3 29 26 17358 1.13
9 33 24 18345 1.14
0 26 26 19150 1.16
6 32 26 11438 1.17
Mostly pitchers who first started at age 25/26.
Let me reiterate: the biggest indicator is NOT based on how many prior years experience you have, but rather the age at which you first entered the league.
The younger you entered the league, the better your year-to-year changes will be, regardless of how many prior years experience you have had to that point.
So, to maximize the forecast for your pitchers, you want to know:
- how old is he now
- how old was he when he first entered the league
That second portion "carries" information about your pitcher. A pitcher who makes MLB at age 21 probably has much much more true talent than a pitcher who makes MLB at age 28. What this does is that rather than regressing all your pitchers towards the same pop mean, your pitchers who started off at age 21 will regress to a much higher (in terms of true talent) pop mean than the pitcher who makes the bigs at age 28.
That is, knowing the age a pitcher starts off, is kind of like using scouting information.
And, as we see here, an incredible amount of information can be gained from that.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 4:18 p.m.,
January 30, 2004
(#16) -
tangotiger
One final word on this.
If I start to create "classes", I get better regressions using:
Prior Year's Experience: 0,1,2+
First Year in MLB: 21 or under, 22, 23, 24, 25 or later
Age: (no classes, each age is its own class)
So, for prior year's experience, once you have been in the league 2+ years, experience doesn't count for anything.
For first year, treat all the 21 and under pitchers as carrying the same amount of scouting information, and the 25 and older pitchers as well.
Have fun!
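For anyone who wants to try the classes described above, here is one possible way to code them. It assumes, as earlier in the thread, that the first-year age is current age minus prior years of experience; the function names and labels are hypothetical.

```python
def experience_class(prior_exp):
    """Prior years of experience: 0, 1, or 2+."""
    return str(prior_exp) if prior_exp < 2 else "2+"

def debut_class(first_year_age):
    """First year in MLB: 21 or under, 22, 23, 24, or 25 or later."""
    if first_year_age <= 21:
        return "21 or under"
    if first_year_age >= 25:
        return "25 or later"
    return str(first_year_age)

def pitcher_classes(age, prior_exp):
    """Return (experience class, debut class, age); age is its own class."""
    first_year_age = age - prior_exp  # assumption: debut age = age - prior exp
    return experience_class(prior_exp), debut_class(first_year_age), age

print(pitcher_classes(age=27, prior_exp=6))
print(pitcher_classes(age=26, prior_exp=0))
```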
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 4:22 p.m.,
January 30, 2004
(#17) -
tangotiger
experience doesn't count for anything.
that should read, "count for anything more". That is, 2 or 5 years of prior experience doesn't make any difference.
I agree, lots more to do, but I don't see much anyway. The changes, year-to-year will still be pretty tiny.
I suppose if you were trying to forecast for the next 5 years, it would be important. But, for year-to-year? I don't see it.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 4:43 p.m.,
January 30, 2004
(#18) -
tangotiger
Ok, one more thing to think about.
Say that the year-to-year changes showed that the ERA got worse by 2% for EVERY age pair. The regression would then show that age has a correlation of ZERO.
That is, suppose you have:
21 to 22 102%
22 to 23 102%
23 to 24 102%
24 to 25 102%
25 to 26 102%
26 to 27 102%
27 to 28 102%
28 to 29 102%
29 to 30 102%
30 to 31 102%
31 to 32 102%
32 to 33 102%
Clearly, what we have is that for every age pair, there is a constant decrease in performance. However, the regression shows that the age pair would have zero effect.
What the age thing is actually showing is that there's no ADDITIONAL effect at that age, but that age does have an effect.
The problem is that I'm doing age pairs, and then running a regression on the age.
What I REALLY should do (but won't) is to first chain the age performances (1.02, 1.04, 1.06, etc, etc), and then run a regression of that performance line. However, it won't be a straight line anymore, but some parabolic curve.
So, really, I need to, somehow, run a regression against some sort of quadratic equation.
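The point can be illustrated with the constant 102% example. In this sketch a hand-rolled least-squares slope stands in for the regression (the helper is illustrative, not the method used for the article), and the chaining is done by multiplication rather than the rounded 1.02, 1.04, 1.06 shorthand.

```python
ages = list(range(21, 33))           # first age of each pair, 21 through 32
changes = [1.02] * len(ages)         # ERA gets 2% worse at every age pair

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Regressing the pair-by-pair change on age finds no effect at all.
slope = ols_slope(ages, changes)

# Chaining the changes shows the effect that the regression misses.
chained = []
level = 1.0
for change in changes:
    level *= change
    chained.append(level)

print("slope of change on age:", round(slope, 6))
print("chained level after the last pair:", round(chained[-1], 3))
```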
Anyway, I posted the file with all the data. The stat-savvy's among you probably know what I'm talking about, and can figure it out better than I could.
Hopefully, we'll see some results.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 11:41 p.m.,
January 30, 2004
(#23) -
tangotiger
Voros also used the lesser of 2 PAs for his minor league MLEs. It's always nice to have someone with AED's background to set us stat-amateurs straight.
MGL, your post 19 is almost accurate, except that PA column IS the min(PA1,PA2).
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 11:25 a.m.,
January 31, 2004
(#25) -
tangotiger
(homepage)
Not really. Regression towards the mean affects hitters much less than pitchers (because their stats are a more reliable indicator).
However, my first article at Primer 2 years ago had a rather extensive look at hitters. And, if you go to my site above, you'll get a great chart on aging patterns by components (similar to what I did for pitching).
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 1:04 p.m.,
January 31, 2004
(#27) -
tangotiger
MGL, I'm fairly confident that if you use the RATIO process, that you will get the best year-to-year change that can be applied to various levels of performance.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 3:45 p.m.,
February 1, 2004
(#36) -
tangotiger
if your study of regression gave the values in the middle chart, why not just use those? Why confuse us as to which chart to use?
Because of the amount of selective sampling, I'm not sure that my regression values apply directly here.
Since I did not spend much effort in creating those charts, I did not want to leave the impression that any of the charts are final, and then having to justify something that I did not research extensively.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 4:01 p.m.,
February 2, 2004
(#38) -
tangotiger
By the way, I'm not sold that the binary split that I have is fine. They have never been verified. These splits were first brought forth (to me) by Voros, and making it binary does give us some benefits.
They just seem logical, but you can come up with different ways to split it up. For example, you can make it nonContactPAs / contactPAs, so that you have (BB+K+HBP)/(PA-BB-K-HBP) as one ratio. Then, you can do HBP/(BB+K), and then, K/BB.
In the end, the way to break it down has to be supported by how baseball really works. I'm pretty sure that we can't just break it down into these binary measures, but I'm not sure the impact of doing so.
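The alternative split mentioned above could be computed along these lines. This is just a sketch of the three ratios as described; the function name and the sample line are made up.

```python
def contact_split_ratios(pa, bb, k, hbp):
    """Alternative binary split: non-contact vs contact, then HBP vs BB+K, then K vs BB."""
    non_contact = bb + k + hbp
    return {
        "nonContact/contact": non_contact / (pa - non_contact),
        "HBP/(BB+K)": hbp / (bb + k),
        "K/BB": k / bb,
    }

# Hypothetical pitching line, for illustration only.
split = contact_split_ratios(pa=800, bb=60, k=150, hbp=5)
print(split)
```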
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 11:45 p.m.,
February 3, 2004
(#41) -
tangotiger
I think regression comes into play here.
If the players in my sample are continually made up of better-than-average performances, we expect them to have a worse ERA the next year.
So, a 3.8 ERA will have a 4.0 ERA the next year, even though the true talent level stayed the same.
Now, the next year, your group of pitchers has an ERA of 3.8 (those guys in the 4.0 group, plus a new batch of pitchers). They follow that up with an ERA of 4.0.
When you chain, that makes it 3.8, 4.0, 4.2. In actual fact, they should probably be 4.0, 4.0, 4.0.
Regression makes a world of difference. Without it, you get the incorrect conclusion that pitchers peak at age 24.
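Here is a small sketch of how chaining unregressed pairs creates a decline that isn't there. The regressed year-one value of 4.0 is an assumption made purely for illustration; the actual amount of regression would have to come from the data.

```python
def chain(pairs):
    """Chain year-to-year (year1, year2) ERA pairs into one performance line."""
    line = [pairs[0][0]]
    for year1, year2 in pairs:
        line.append(line[-1] * year2 / year1)
    return line

# Observed: each selected group posts 3.8, then 4.0 the following year.
observed = chain([(3.8, 4.0), (3.8, 4.0)])

# Assumed: after regression, the year-one true talent was 4.0 all along.
regressed = chain([(4.0, 4.0), (4.0, 4.0)])

print([round(era, 2) for era in observed])
print([round(era, 2) for era in regressed])
```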
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 1:08 p.m.,
February 4, 2004
(#43) -
tangotiger
Let's concentrate on your last column: players who debuted at age 26.
Those are not the same players each year, as you will have attrition. Who goes? Those guys at the (performance) bottom of the barrel.
So, in their rookie year, they'd have an ERA of 4.0, and, of those guys who made it to the soph year, they have an ERA of say 4.4. But, what about the guys who didn't make it into their soph years? The guys at the 4.0 level are a subset of a larger group. This larger group really had an ERA of (say) 4.5. Now, how did they do (or how would they have done) in their soph years? Well, the guys who stuck around did 4.4, and the guys who didn't... well, we don't know, but say they would have done a 5.0, such that the overall average is 4.5.
So, tracking the exact same group of pitchers from year-to-year-to-year, and we see we've got a problem with attrition. If you've got attrition (i.e., selective sampling in most cases), then you need regression towards the mean.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 2:38 p.m.,
February 4, 2004
(#47) -
tangotiger
Yes, one thing that I wanted to do is take players over the same age span. Say, all pitchers who pitched at least 250 PA each year from age 23 to age 33. Then, look only at those pitchers from age 24 to 32. I would apply this last condition only because of the selective sampling that happens in the last year, and maybe the first year.
Of course, if I looked at pitchers aged 22 to 40, I'd get a different aging pattern (likely that the decline phase would not be so bad... think Clemens).
Essentially, any combination of age x to age y, such that you look at the performances of x+1 to y-1. If you make x from 21 to 26, and you make y from 26 to 36, that gives you 66 different combinations to look at.
It's a good idea, one which I've wanted to do, but, it looks like a big pain in the butt. If someone wants to contract me out :), I'd be happy to do it.
Forecasting Pitchers - Adjacent Seasons (January 30, 2004)
Posted 2:52 p.m.,
February 4, 2004
(#48) -
tangotiger
For the sharp-thinkers, that would be 60, since the span from x to y must cover at least 4 seasons (that is, y-x must be at least 3).
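The count can be checked by brute force. This sketch assumes that "at least 4" means four seasons counted inclusively from age x to age y, so y minus x is at least 3; that reading is an inference, chosen because it reproduces the 60.

```python
# x runs from 21 to 26 and y from 26 to 36, as in the previous post.
all_pairs = [(x, y) for x in range(21, 27) for y in range(26, 37)]

# Keep only spans that cover at least 4 seasons (assumed: y - x >= 3).
valid_pairs = [(x, y) for x, y in all_pairs if y - x >= 3]

print(len(all_pairs), len(valid_pairs))
```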
Pigskin Pythagoras (January 31, 2004)
Posted 3:39 p.m.,
February 1, 2004
(#2) -
tangotiger
(homepage)
The above link contains Will's comments, and my reply.
I'm not sure what you are asking about. Will made a rather clear statement, and I made a rather clear reply. There's nothing more to it than that.
If anyone has anything more to say on this subject, please post it directly on Will's site (or email me).
I'd prefer to keep this thread/forum to discuss only those topics I initiate.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 11:46 a.m.,
February 3, 2004
(#14) -
tangotiger
AED has made a remark regarding that when I first introduced it, and he has provided some process for me to go through to see how well it holds up against the binomial. I'll report back later.
As well, LWTS by the 24-baseout state might be considered, since, as some have pointed out, the value of a walk or HR changes depending on the base/out situation. This would be tied to the above paragraph.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 1:03 p.m.,
February 3, 2004
(#18) -
tangotiger
In terms of the opposing pitcher for our hitters:
Using my own terminology for clutch (based on LI), the standard deviation of the differences between opposing pitcher's "overall 1999-2002 lwts-based OBA" during clutch and non-clutch situations is .0019. (I don't know how random that is, though others here might know.)
Breaking it down by top 50 clutch performers, bottom 50 clutch performers, and overall:
- opposing pitcher's OBA during non-clutch, for all groups, was .347
- opposing pitcher's OBA during clutch, for all groups, was .342
- the differential was .005 for all groups
So, a guy who performs well in the clutch did not do so by facing poorer pitchers.
I have the batter's performances being .0058 lower during clutch, while their opposing pitcher's overall OBA was .0053 lower. The reason that hitters did worse in the clutch than nonclutch is almost entirely due to the better pitchers being faced.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 2:25 p.m.,
February 3, 2004
(#20) -
tangotiger
I probably didn't explain post #18 well, but when I said:
Breaking it down by top 50 clutch performers, bottom 50 clutch performers, and overall:
- opposing pitcher's OBA during non-clutch, for all groups, was .347
- opposing pitcher's OBA during clutch, for all groups, was .342
- the differential was .005 for all groups
I meant that the opposing pitcher's "overall 1999-2002 lwts-based OBA" during nonclutch was .347, and during clutch it was .342.
That is, I weighted each pitcher's PAs in clutch and assumed a "true talent" equal to their overall 1999-2002 performance.
Note that that's not the true definition of "true talent", and that for pitchers with less than 800 PAs in that time span, I gave them league average numbers. (i.e., I regressed the regulars 0% and the bubble players 100%). For a quick report, I think this is probably acceptable.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 3:14 p.m.,
February 3, 2004
(#23) -
tangotiger
That works out to a .315 OBA in clutch and .317 OBA in nonclutch.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 5:12 p.m.,
February 3, 2004
(#27) -
tangotiger
I have updated the lead-in, with a link to my study on the issue.
Please note, there is room for much improvement here. What has been established by Andy's study, and now by mine, is that clutch hitting is detectable (even if faintly). It is possible as we look into the issue more, that we'll improve our methods to detect clutch ability.
The important takeaways are:
- just because no one, until now, was able to prove clutch hitting, doesn't mean that it doesn't exist
- since we now have some signs that clutch ability is detectable to some degree, this degree can be increased by improved methods and samples
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 5:22 p.m.,
February 3, 2004
(#29) -
tangotiger
I made a little goof that AED caught in my article. Article has now been corrected.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 7:43 p.m.,
February 3, 2004
(#35) -
tangotiger
I used an LI of at least 1.5 (which gives us an overall avg LI of 3), and that was nearly 20% of the PAs.
My problem is that if I use a threshold too high, my sample size is too low to detect a pattern.
It's a tough call really.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 11:38 p.m.,
February 3, 2004
(#37) -
tangotiger
Ok, so Miguel Tejada "situationally adapts when there is a large swing impact in the game" better than expected by random chance. Can't I use the term "clutch situation" or "crucial situation" there? And can't I call his trait as clutch ability? Do I have to use the legalese 12-word description?
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 9:23 a.m.,
February 4, 2004
(#39) -
tangotiger
Fine, if we have to associate "clutch" to strictly the emotional or intellectual (but not physical) resolve of the player, then obviously I can't say that a player is clutch. By that definition, we can never use the term with any statistical significance.
But, I see no reason to try to distinguish between emotional, intellectual, and physical. To me, clutch is any or all 3. I don't really care why Jason Giambi's true talent is different when the game matters the most (keeps his emotions at bay? has a better understanding as to how to bat? his body adapts better to the situation?).
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 9:40 a.m.,
February 4, 2004
(#40) -
tangotiger
I added a short paragraph at the end of my article to show that the 2 clutch runs, because of the timing, are worth 0.6 wins, instead of 0.2.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 10:45 a.m.,
February 4, 2004
(#43) -
tangotiger
Alan, in my article, I said that the hitters, as a group, hit 5.8 points lower in clutch compared to nonclutch. At the same time, the quality of opposing pitchers has an OBA that is 5.3 points lower during clutch than nonclutch.
Therefore, players, as a group, hit just as well in clutch/nonclutch, after you control for pitching.
Just to give you the numbers for Jason Giambi:
nonclutch: 2213 PAs, .442 lwtsOBA, .441 realOBA, .346 pitcherlwtsOBA
clutch: 466 PAs, .503 lwtsOBA, .487 realOBA, .339 pitcherlwtsOBA
So, Giambi hit 46 OBA points higher in the clutch, and if you adjust for the better pitching he faced, that's 53 points higher. If you use lwtsOBA (which weights the HR more and BB less), he's out of this world.
After adjusting his performance for the pitchers he faced, and regressing his nonclutch performance to determine his true talent level, Giambi was 3.05 SD higher in clutch than his nonclutch would dictate.
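The Giambi point differences quoted above can be reproduced from the two lines of numbers. This sketch covers only the simple subtraction; the regression and the SD calculation are not attempted here, and the dictionary keys are illustrative.

```python
nonclutch = {"pa": 2213, "lwts_oba": 0.442, "real_oba": 0.441, "pitcher_lwts_oba": 0.346}
clutch = {"pa": 466, "lwts_oba": 0.503, "real_oba": 0.487, "pitcher_lwts_oba": 0.339}

# Raw gap in real OBA between clutch and nonclutch.
raw_gap = clutch["real_oba"] - nonclutch["real_oba"]

# Pitchers faced in the clutch were better (lower OBA allowed).
pitcher_gap = nonclutch["pitcher_lwts_oba"] - clutch["pitcher_lwts_oba"]

# Credit the hitter for facing the tougher pitchers.
adjusted_gap = raw_gap + pitcher_gap

# Same comparison using lwtsOBA, before any pitcher adjustment.
lwts_gap = clutch["lwts_oba"] - nonclutch["lwts_oba"]

print(round(raw_gap * 1000), round(adjusted_gap * 1000), round(lwts_gap * 1000))
```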
The performance of the 340 players in my study does not conform to a distribution that would be dictated by random. Something is at play here. What else could it be? Maybe we should look at it the other way. How about "really nonclutch situations" and "the rest"? Instead of my taking the LI of greater than 1.5, I should look for LI less than 0.5. Instead of players rising to the occasion, maybe Giambi and Tejada are players that drag their true talents down when the game doesn't count.
This is Giambi's lwtsOBA in the 5 classes of pressure that I make:
LI class   0 to 0.5   0.5 to 1.0   1.0 to 1.5   1.5 to 2.5   2.5+
lwtsOBA    0.438      0.430        0.461        0.482        0.565
Maybe Giambi is a genius who gets bored when there's no action.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 10:50 a.m.,
February 4, 2004
(#44) -
tangotiger
The important thing to remember, in both AED's study and mine, is that there is something nonrandom at play here. There may be other factors that need to be controlled further. Maybe I need to break out my LWTS by the 24 base-out states to get a better read as to the true performance of the player.
What is presented here should be considered as a starting point that we see a light that maybe the performance of the player is not distributed in some random fashion.
That we can all accept that hitters of different profiles approach (input) each LI situation differently already shows that it's not random. But, could the effect (output) still be random?
Right now, there's cause to believe that it's not random.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 2:16 p.m.,
February 4, 2004
(#47) -
tangotiger
re: OBA treating all safe events as 1.
In terms of establishing the randomness, I don't see this as a problem. The binomial needs to treat things as binary events (safe/out). So, from that perspective, AED has shown something as statistically significant.
However, is the reason that we get nonrandomness because a player changes his hitting profile so that we were expecting this to begin with? Looking at what I did, I give more weight to the HR and less to the walk. However, in so doing, we no longer have binary events. My lwtsOBA may look binary, but it is not. But, I desperately want to use the binomial. Applying my fudge factor (which should come under a lot of scrutiny, much more scrutiny than what AED has done), and I get the same kind of nonrandomness effect (overall) as AED shows with using plain OBA.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 2:26 p.m.,
February 4, 2004
(#48) -
tangotiger
One test that I've been thinking about (and I think someone has brought up) is to split up the player's performances by base/out.
Right now, the base/out frequency won't be the same with clutch/nonclutch. While 45% of all PAs occur with the bases empty, maybe only 30% do so during clutch situations.
As a way to ensure that the hitter's approach should be the same during clutch/nonclutch, we should ensure that they have the same mix of base/out in clutch/nonclutch. Or, at least adjust all the player's performances based on the base/out (similar to what I did with the opposing pitcher).
If for example, we have in nonclutch situations
Giambi, bases empty, .400 lwtsOBA, .390 OBA
Giambi, men on, .410 lwtsOBA, .380 OBA
and we do this for all 340 hitters (equally weighting them), and we get
all players, bases empty, .345 lwtsOBA, .350 OBA
all players, men on, .345 lwtsOBA, .340 OBA
Then we can adjust the player's OBA with men on to account for the fact that players do have a different hitting approach with men on, and bases empty.
Then, we can take this adjustment, and apply it to the men on during clutch situations.
Does this sound good?
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 4:05 p.m.,
February 4, 2004
(#50) -
tangotiger
This time, I tried it the other way.
I took a player's performance when the leverage is at least 0.5, and made that his "regular" performance. Then, I looked to see how they played when the game least counted (LI under 0.5).
Interestingly, there is an effect here as well, but not as much. The true spread in clutch hitting is about 50% wider than the true spread in bored hitting. However, bored hitting also exists.
(This is the case whether I looked at OBA or lwtsOBA.)
The leaders in the "perform much better when the game is not on the line" are: Chipper Jones, Tony Batista, Brian Giles, Vinny Castilla, Nomar Garciaparra.
The opposite, which is the guys who do their worst when the game is not on the line: Tino, Jacques Jones, Richie Sexson, Craig Counsell.
So, it's mostly rising to the occasion, and partly playing down to the crucialness of the game.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 5:28 p.m.,
February 4, 2004
(#52) -
tangotiger
I have no idea what you just said. Can I interpret that to mean that you have Jessica Alba's home phone, and you will give it to me?
***
Correlation (r) of the 5 levels of pressure (0 = LI less than 0.5, and 4=LI greater than 2.5). avg PA in each group is: 514, 555, 327, 236, 89
0-1 .66
0-2 .61
0-3 .55
0-4 .47 (.81 OBA0 + k = OBA4)
1-4 .39 (.62 OBA0 + k = OBA4)
2-4 .38
3-4 .39
Not sure how to interpret the differing PA levels in each group.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 11:27 p.m.,
February 4, 2004
(#57) -
tangotiger
AED removed IBB (and bunts and HBP). I don't think this comes into play, if I'm reading David correctly.
***
I think it's a fair point that if AED (and I) will define clutch in such a way that an abnormal number of those situations are with men on, then we should define the nonclutch or control group or whatever to have the same split.
AED said: First, "clutch" was defined as any plate appearance in the 3rd-5th innings. What I would suggest is to define this the same way as you did for innings 7+ (tie, or tying run on base, at bat, or on deck), but for innings 3 through 5. What this does is that it keeps the same distribution of men on base / bases empty, but in the early innings.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 9:14 a.m.,
February 5, 2004
(#60) -
tangotiger
David, in Tejada's case, that may very well be.
But, as AED just noted in his previous post, in a subsequent test he created a clutch/nonclutch split based only on the inning/score. That is, the base/out distribution would be the same in both samples, and he still found a difference.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 3:48 p.m.,
February 5, 2004
(#61) -
tangotiger
(homepage)
DATA
At this link, you will find data in various crucial situations. I invite analysts and statisticians to use this data for research purposes, and to make public their findings.
Note that this data was not exactly the same as what I have shown in my article. I changed the leverage categories so that there are roughly an equal number of PAs in each grouping. This might make it a little easier for analysis.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 5:04 p.m.,
February 5, 2004
(#63) -
tangotiger
Good question. Here are the avg LI and the avg PA
0: 0.14 / 331
1: 0.50 / 346
2: 0.82 / 347
3: 1.19 / 352
4: 2.29 / 345
ALL: 0.995 / 1720
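As a quick sanity check on the list above, the ALL row should be roughly the PA-weighted average of the five groups. A minimal sketch using only the figures quoted in this post (the variable names are mine, not from the original data file):

```python
# Average LI and average PA per pressure group, as quoted in the post.
avg_li = [0.14, 0.50, 0.82, 1.19, 2.29]
avg_pa = [331, 346, 347, 352, 345]

total_pa = sum(avg_pa)

# PA-weighted mean of the per-group Leverage Index values.
weighted_li = sum(li * pa for li, pa in zip(avg_li, avg_pa)) / total_pa

print(total_pa, round(weighted_li, 3))
```

The total lands within one PA of the 1720 shown, and the weighted LI comes out at about 0.995; the small gap is presumably rounding in the per-group averages.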
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 5:08 p.m.,
February 5, 2004
(#64) -
tangotiger
Just for those kind of new to LI, what that first line says is:
The average PA in low-pressure situations had 14% the swing of a regular PA. If you expect 1 run to normally generate 0.100 wins, then a run generated during this low-pressure would generate only 0.014 wins.
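The same scaling, as a small sketch. The 0.100 wins per run is the assumed baseline from the sentence above, and the function name is only for illustration:

```python
def win_value_of_run(li, baseline_wins_per_run=0.100):
    """Scale the normal win value of a run by the Leverage Index of the PA."""
    return li * baseline_wins_per_run

# Average LI for each of the five pressure groups quoted in post 63.
for group, li in enumerate([0.14, 0.50, 0.82, 1.19, 2.29]):
    print(group, round(win_value_of_run(li), 3))
```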
***
It's also important to note that how a player perceives a pressure situation does not necessarily imply that it actually is a crucial situation. LI really gives you the crucialness of the situation. I have no idea how a player establishes the pressure of a situation (if he even does).
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 10:46 p.m.,
February 5, 2004
(#66) -
tangotiger
I think I did that in a separate, but non-disclosed, calculation. That is, I did the SD as I showed it, but then I regressed the SD by the 209 calculation to get a true talent clutch performance. I believe that I got the 1 SD = .01 or so. The 0.2 wins for 1 SD would conform to the 0.6 wins for 3 SD that I showed in the end.
However, let me double-check, to make sure that I did everything right. Thanks for double-checking!
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 6:56 p.m.,
February 6, 2004
(#76) -
tangotiger
I think clutch play matters just about the same as non-clutch play
No, that can't be right. I've posted the LIs, and they are very significant.
You may not find it at the team level, because of the amount of noise, but you can certainly and easily find it at the granular level.
The effect is small, but that's because there aren't many clutch situations to begin with. Only 20% of the PAs had an LI of around 1.5 or greater. Shoot that up to an LI of 2.5, and that percentage goes down to a bit over 5%.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 10:56 p.m.,
February 6, 2004
(#81) -
tangotiger
A study in _The Hidden Game of Baseball_ found that relief aces' plate appearances had twice the leverage of average plate appearances. Thus I have estimated the relative importance of hitting in the clutch as double that of overall hitting, when clutch was measured as about 1/6 of the plate appearances.
In my post 63, I have 20% of the PAs with the most leverage to have a leverage of 2.3. Granted, different run environments would have different LI.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 10:20 a.m.,
February 9, 2004
(#94) -
tangotiger
Alan, Yes, I did get your email, but I did not have easy access to my data to confirm your request. I do now, and I will let you know in 2 minutes, if I have any bugs.
I would also appreciate you posting your findings, as well as an explanation as to how to (and how not to) interpret your results.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 10:22 a.m.,
February 9, 2004
(#95) -
tangotiger
Alan, the data is correct.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 12:40 p.m.,
February 9, 2004
(#97) -
tangotiger
stat-savvies:
1 - When I presented the standard deviation of the standardized scores, and I reported a figure of 1.12 (based on 340 samples), how can I figure out the statistical significance of this figure? For example, if I did it based on only 30 players, that would be different from 340. Can I assume that after a certain threshold, say 30 or 40, that the figure I reported, 1.12, is essentially 1.12 +/- .005 99% of the time?
2 - How about the effect of the park? Now, I think we are saved pretty well in this regard. Say Tejada, he'd have an equal split of PAs in Oak and away, in each of the 5 categories. Or Giambi, who split his home time between Oak and NYY, the same issue. While the park does add a variance distribution that we have to consider, is it negligible because of the way it's being handled in this case?
(I can see that if we were looking at overall OBAs for all 340 hitters, we'd have to consider the variance of the park, similar to what we did with pitchers.)
Thanks
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 1:37 p.m.,
February 9, 2004
(#102) -
tangotiger
"But the median (average) outcome is neutral.", median should read as mean of course.
***
VoiceOfUnreason speaks with great reason, as does AED.
The change in win probability by the offense, on a league level, will always be zero. This is not true at a team, game, inning, or PA level.
It is only true when the underlying expected run environment that generates the win probability tables EXACTLY equals the actual run environment. Furthermore, that run environment is not only with runs per game, but in the distribution of those runs by game and by inning. Fortunately, a Markov chain that only distinguishes by base/out gets us 99.9% of the way there.
The bottom line is that the win prob tables that I generate and use (which are after-the-fact on a league level) will automatically preserve that off=def=0 on a league level.
Obviously, when Pedro is pitching, that's not the case. (And this opens up other thoughts for discourse that is more appropriate with the "Custom LWTS" thread from a few months ago.)
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 3:24 p.m.,
February 9, 2004
(#108) -
tangotiger
It's that accurate probabilities will result in a situation where the average probability change should be zero
On a league level (say 1986 NL), it must come out to zero. If you have some 1-0 game where obviously pitching dominated (and the offense was negative), you'll have some other game that same year that went 11-8 where hitting dominated (and offense was positive). The season, in its totality, is zero=off=def.
But, does it have to be?
Take the case of a T-ball league, with 50 fielders per team. In this case, you will probably find that the standard deviation of runs scored will be far higher than the standard deviation of runs allowed. It is those standard deviations (over a large enough sample) that establishes how much gain there is due to off and def.
But, even so, what impact does that give you? You still start off at win=.50/.50 and you still end up with win=1.00/.00.
So, you will have one team that will be off= +.50, and the other team that will be off= -.50 (and def=0 for both). Over the course of the whole league, what do you think will happen? Well, if one team has off=+.50, and another has off=-.50, then, the league will be off=0.
But, we already knew that because
off+def=0 (by definition)
And, we constructed a league where def=0. So, off has to be zero.
In MLB, the standard deviation of runs scored and runs allowed, over the last 100 years or so, is virtually identical. But, just like in the T-ball league, this information is not needed to ensure that off=def=0 on a league level.
The value of the information of the variance is to establish how to split up each of the PAs, one at a time. If you are in the T-ball league noted above, you give the full change in the win prob to the offense. In MLB, it's not that easy (and really, we can discuss this at length for hours and days).
The net effect is that off=def=0 on a league level.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 4:31 p.m.,
February 9, 2004
(#111) -
tangotiger
To clarify this statement:
But, even so, what impact does that give you? You still start off at win=.50/.50 and you still end up with win=1.00/.00.
That should read explicitly as:
But, even so, what impact does that give you? You still start off the game at win=.50/.50 and you still end up the game with win=1.00/.00.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 11:32 p.m.,
February 9, 2004
(#122) -
tangotiger
This is such a waste of time. Anyone who does the win prob tables does it virtually the same way... you always get the off=def=0.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 7:34 a.m.,
February 10, 2004
(#131) -
tangotiger
To how many decimals did Oswalt set his charts? Did he round, or truncate? I set mine to 38.
We have no good way of assessing the value of fielding, nor of baserunners distracting pitchers, nor of some baserunning plays. But for hitting (including walking and sacrificing), base stealing, pitching
Are WP, PB, Pickoffs, BK, and other events also included? From Ed's description, it does not appear so. In my win analysis I always include these and every single event. Since these events are negative for the defense, and positive for the offense, it's rather easy to see that 1.5% difference could be explained.
The typical win value of these events is about .003 wins. I don't have my Lahman DB handy, but if someone wants to list the # of WP,PB,BK,Picks, we can tell fairly quickly if Ed included those events or not.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 7:35 a.m.,
February 10, 2004
(#132) -
tangotiger
Uhm, that should be .03 wins (almost .30 runs).
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 9:56 a.m.,
February 10, 2004
(#134) -
tangotiger
From 1974-1990, there were: 28,372 WP+PB+BK. There were also 9684 PB. So, that's 18,688 extra plays for the offense. That's based on 2,638,407 PAs.
Oswalt had 4,443,803 PAs from 1972-2002 (or whatever time period he was). Pro-rating my numbers to Ed's and we have 31,476 extra plays that Ed probably does not account for (based on his description).
Remembering that each of these plays is roughly equal to .03 wins, that gives us: .03 x 31,476 = +944 wins for the offense that are unaccounted for (and -944 wins for the defense).
So, that leaves us with a gap of 123 wins on over 4 million PAs, or about 0.2 wins per team per year.
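The pro-rating above is easy to re-run. A minimal sketch using only the counts quoted in this post (variable names are mine; the 123-win gap depends on a total from earlier in the thread, so it is not recomputed here):

```python
# Counts quoted for 1974-1990.
events_for_offense = 28_372      # first count quoted above
events_subtracted = 9_684        # second count quoted above
pa_1974_1990 = 2_638_407

# PA total quoted for the Oswalt data.
pa_oswalt = 4_443_803

# Approximate win value of each of these plays (corrected figure from post 132).
wins_per_play = 0.03

net_extra_plays = events_for_offense - events_subtracted

# Scale the net count up to the larger PA sample.
prorated_plays = net_extra_plays * pa_oswalt / pa_1974_1990
unaccounted_wins = wins_per_play * prorated_plays

print(net_extra_plays, round(prorated_plays), round(unaccounted_wins))
```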
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 12:23 p.m.,
February 10, 2004
(#139) -
tangotiger
Coming out of my purgatory, I will address Ross specifically, in the hopes that he will leave my Primate Studies sanctuary.
*****
The win prob matrix includes all events, yes.
But, when running the player data against the win prob matrix, the wild pitches, balks, and PB got thrown away. This is the best interpretation of the Oswalt statement ("We have no good way of assessing the value of fielding, nor of baserunners distracting pitchers, nor of some baserunning plays.") and resulting data.
***
Now, please, just stop posting here, and go back to Clutch.
Clutch Hitting: Fact or Fiction? (February 2, 2004)
Posted 2:28 p.m.,
February 10, 2004
(#147) -
tangotiger(e-mail)
Voice, email me, and I'll clear it up.
The genius of Paul DePodesta (February 4, 2004)
Posted 2:58 p.m.,
February 4, 2004
(#2) -
tangotiger
I used to want to be like Tim Raines. Now I want to be like Paul DePodesta.
Not something that I should admit.
The genius of Paul DePodesta (February 4, 2004)
Posted 4:16 p.m.,
February 4, 2004
(#5) -
tangotiger
90% of the player population in major league baseball is replaceable by someone who makes less
That's a true statement, but a little deceiving.
Take the case of Albert Pujols and Mark Prior. Based on their salary of last year, you can say that they are in the bottom 10% of the salary scale. So, every player who makes more than Pujols (which is 90% of the player population) is replaceable by Pujols. Well, in some weird way, yeah.
How about looking at it from a more realistic perspective: flip 90% of your roster, and expect to replace them with players who make less, but perform the same (for each one). Yes, you can flip Jeter for Nomar. Yes, you can flip Alomar for Reyes, etc, etc, all the way down the line. You can't flip ARod, you can't flip Pujols, or Prior, etc.
But, is there a cost to flipping? If you want Reyes, you give up Alomar, and some cash. So, while you can say that 90% of the players are replaceable for players who make less, they are not readily available, and it's going to cost you money.
The genius of Paul DePodesta (February 4, 2004)
Posted 4:38 p.m.,
February 4, 2004
(#6) -
tangotiger
I should say that 90% of a single team's roster might be flippable, but not every roster can be flipped at the same time. I mean, I flip Jeter to get Nomar, and then Bos flips Jeter to get Tejada, and then Bal flips Jeter to get... at some point, you can't flip.
The genius of Paul DePodesta (February 4, 2004)
Posted 10:42 a.m.,
February 5, 2004
(#9) -
tangotiger
I don't see how I missed the point if I said that you would flip Alomar for Reyes (which is essentially the point you are saying about flipping Tejada for Crosby).
***
Tejada is an above average player. If Tejada would have offered the A's 3 years at 5 million$ each, what do you think they would do? If DePodesta's point is that there are a lot of overpaid players, fine. Crosby is an excellent prospect, and may be close to Tejada's equal, but there's a large margin for error there. I would bet that the A's wouldn't mind having Tejada at 5 million / yr, even though that is much more than Crosby will make.
The genius of Paul DePodesta (February 4, 2004)
Posted 12:40 p.m.,
February 5, 2004
(#11) -
tangotiger
I think my favorite line was the one that DePodesta quoted, about if you weren't doing this already, how would you do it.
There's an incredible inertia, simply because most teams don't know WHY the heck they are doing what they are doing. To believe in sabermetrics is to believe that you can properly identify and balance performance-based results and tools-based observations. And, the only way to do it is to have a systematic methodology.
I want scouting information, I need scouting information, but my guess is that the way that the scouting information is compiled makes it less than ideal.
If I had scouting information, instead of regressing the performance-based results of my players towards the population mean, I'd be able to regress them towards players with similar tools. Once you have a convergence of tools and performance, then you've got the pinnacle to sabermetrics.
My feeling is that most teams would rather dip their toe (as opposed to jumping) into sabermetrics because:
1 - they think it's only about statistical analysis
2 - there's inertia
3 - they're afraid to pay for it
Sabermetrics, as a process, can probably identify 5 to 10 wins that you can maximize in some way. That's at least 10 million$ of value right there. But why would an owner believe a guy named Tangotiger of all things that there's 10 million$ of inefficiencies that can be tapped?
What baseball owners should want is geniuses like Beane/DePodesta or Ricciardi/Law, guys who believe in doing things differently, adhering to a set justifiable philosophy, and controlling things from the top.
The genius of Paul DePodesta (February 4, 2004)
Posted 3:55 p.m.,
February 5, 2004
(#13) -
tangotiger
I agree, and that's why I don't think a team that's going to just dip their toe in the sabermetric water will be enough. They are either in, or they're out.
Regarding my "named Tangotiger" comment, I meant that you need Paul DePodesta (himself, not Michael Lewis) to write the book, otherwise, it's just a whole bunch of people giving good advice online, without having the "proven" credentials. That's also the ironic part, since the "proven veteran player" is usually the guy that sabermetrics would say is being overpaid and should be avoided.
The genius of Paul DePodesta (February 4, 2004)
Posted 10:53 p.m.,
February 5, 2004
(#17) -
tangotiger
Yeah, I was a little annoyed about the lack of citations on things like run expectancy and Markov chains, making it almost seem like it was A's-invented.
Then again, it *is* a presentation, and he's not doing a history lecture, or writing a magazine article.
The genius of Paul DePodesta (February 4, 2004)
Posted 2:47 p.m.,
February 9, 2004
(#23) -
tangotiger
Sure, they both need to be evaluated. The question is how much weight to give the scout.
You have to know:
1 - How good is the scout at evaluating what he sees (qualitatively)
2 - How persistent is it what he sees
If the scout sees that Dwight Evans has changed his stance, and is finally hitting the way he should, can the scout's qualitative observation be very confident after 20 or 30 or 50 PAs?
And, even if the scout is right... is what he sees going to be what he's going to continue to see for the next 100 PAs?
***
In terms of results-based performance, you would regress a player's performance by:
200 / (200+PA)
This means that if you have 200 PAs, then you would regress the player's performance 50% towards the league mean, because you have that much uncertainty to account for. If you had 1800 PAs, you'd regress 10% towards the league mean.
But, what about scouting, or tools-based analysis? How reliable is it, and how good can it get?
I would say that a trained scout can do something like:
50 / (50 + PA), with a max of 200 PAs or so.
That is, what I can tell you about a player with 200 PAs of stats (50% regression), a scout can tell you with 50 PAs of observations (50% regression).
Give a scout 200 PAs, and we'd regress his observation 20% towards the mean, and you'd have to give me 800 PAs of results to give you the same confidence.
But, after that, there's not much more that the scout can give you. There's too much deception that the scout can't process out.
***
Note: I have no idea if I'm right, and what the boundaries or break-even points are. This is just a guess as to how I think it works.
***
Optimally, I would have the scouts mark things down in a systematic way so that it becomes part of the performance-based results. Like, where was Jeter, how long did it take him to react to a ball hit, how fast did he get the ball out of his glove, how did his 1B handle the throw, etc, etc. Some of these things are qualitative, and some are quantitative.
The key is to try to quantify as much as you can, and in a systematic fashion.
The genius of Paul DePodesta (February 4, 2004)
Posted 3:06 p.m.,
February 9, 2004
(#24) -
tangotiger
Just to try to give you a better idea as to what I'm thinking:
regression(performance) = 200 / (200 + PA)
regression(tools) = 5 / (5 + PA^0.333)
So, here's how much regression you would need with various levels of PA:
Results  Tools     PA
    95%    70%     10
    80%    58%     50
    67%    52%    100
    50%    46%    200
    25%    37%    600
    20%    35%    800
    10%    29%   1800
     5%    24%   3800
     2%    19%   9800
I kind of fixed 100 tools PA to be equal to 200 performance PA. I don't know that they are.
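The table above can be regenerated directly from the two formulas. This is just a sketch of those formulas as written in this post (function names are mine); the constants 200, 5, and 0.333 are the guesses stated above, not fitted values:

```python
def regress_performance(pa):
    """Fraction to regress results-based performance toward the mean."""
    return 200 / (200 + pa)

def regress_tools(pa):
    """Fraction to regress a tools-based (scouting) read toward the mean."""
    return 5 / (5 + pa ** 0.333)

print("Results  Tools     PA")
for pa in (10, 50, 100, 200, 600, 800, 1800, 3800, 9800):
    print(f"{regress_performance(pa):7.0%} {regress_tools(pa):6.0%} {pa:6d}")
```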
Take the case of say Shane Spencer, and his remarkable Sept. After 50 PAs, a tools-based analysis might have determined that he's really a slightly above-average MLB hitter (say a .350 OBA guy in a league of .340), and would have regressed their analysis 58% towards the .340, for a .344 "true" OBA. So, the scout thinks he's seeing a .350 guy after 50 PAs, but we make a best guess that he's actually seeing a .344 true player.
Results-based, we'd see Shane Spencer as Barry Bonds or Jim Thome, and claim him to be a .440 player. But, we regress that 80% towards the mean, to a .360 player. Again, our best guess would be that he's a .360 player after 50 PAs.
What would be better is to merge these two as being not independent. If a scout's observation says that he's a true .344, then we don't regress Shane towards .340, but towards .344. And, having the extra reliability of the scout, now we don't regress only 80% towards the scout-driven mean, but say 90% towards the scout-driven mean. In this case, his .440 becomes .354.
So, that becomes our best guess, merged, result. Based on scouting, and how we think it's reliable, and based on performance-results, and we have a pretty good idea of its reliability, we come up with a true talent level of .354 for Shane.
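Here is the Shane Spencer arithmetic as a short sketch. The helper name is mine, and the 58%, 80%, and 90% regression amounts are the illustrative figures from this post, not measured values:

```python
def regress(observed, mean, fraction_toward_mean):
    """Move an observed rate the given fraction of the way toward a mean."""
    return observed + fraction_toward_mean * (mean - observed)

league_oba = 0.340

# Scout sees a .350 hitter after 50 PAs; regress 58% toward the league.
scout_true = regress(0.350, league_oba, 0.58)

# Results alone say .440 after 50 PAs; regress 80% toward the league.
results_true = regress(0.440, league_oba, 0.80)

# Merged: regress the .440 toward the scout-driven mean instead, by 90%.
merged_true = regress(0.440, scout_true, 0.90)

print(round(scout_true, 3), round(results_true, 3), round(merged_true, 3))
```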
This is where I see the next advance of sabermetrics/scouting (if it hasn't already occurred behind the curtain).
The genius of Paul DePodesta (February 4, 2004)
Posted 4:33 p.m.,
February 9, 2004
(#26) -
tangotiger
Voros, MGL, and Nate Silver, among others, use the player's body type and position to do their forecasting.
Batter's Box Analysis (February 5, 2004)
Posted 1:47 p.m.,
February 5, 2004
(#4) -
tangotiger
In MGL's MLEs, he had Phelps as one of the best hitters in the minors from 01-03.
Batter's Box Analysis (February 5, 2004)
Posted 3:52 p.m.,
February 5, 2004
(#7) -
tangotiger
I agree that triples are speed-doubles. This is further evidenced by the fact that the rate of decline of 3b/(2b+3b) is similar to that of sb/(1b+bb).
Aaron's Baseball Blog - Basketball (February 9, 2004)
Posted 12:31 p.m.,
February 9, 2004
(#3) -
tangotiger
1. Yes, anything like technicals, or I suppose those late non-shooting fouls, should be removed.
2. Yes, that would count as "1" opp.
***
J. Cross: If I read that measure properly, there are no opps included for free throws in there, and the numerator will also include technicals.
***
Can I assume that a missed field goal on a foul does not count as a field goal attempt if missed, but it does if they get the basket?
Aaron's Baseball Blog - Basketball (February 9, 2004)
Posted 1:40 p.m.,
February 9, 2004
(#6) -
tangotiger
If you don't want to measure how well Shaq does on free throws, but you want to account for the fact that he gets a lot of free throws, then you need to add in a term:
.75*free throws
which will give you the number of points an avg NBA player would have gotten, if he got the number of free throws that Shaq earned by being in a position to take a shot, but was denied by the opponent's foul.
Aaron's Baseball Blog - Basketball (February 9, 2004)
Posted 5:09 p.m.,
February 9, 2004
(#17) -
tangotiger
Player A takes 100 2pt shots, hits on 50, and the other 50 are rebounded. Of those 50, the opposition grabs 40 of them, and the off gets another 10 possessions.
So, that's 90 possessions turned over, with 100 points earned.
Player B takes 100 3pt shots, hits on 33, and the other 67 are rebounded. Of those 67, the opposition grabs 63 of them, and the off gets another 4 possessions.
So, that's 96 possessions turned over, with 99 points scored.
To equate them to say both with 1 point per possession, you need:
player A: gets 90 points per 100 shots, or 45% from 2pt
player B: gets 96 points per 100 shots, or 32% from 3pt
(All numbers just educated guesses.)
Aaron's Baseball Blog - Basketball (February 9, 2004)
Posted 5:17 p.m.,
February 9, 2004
(#20) -
tangotiger
To expand my example more to include fouls:
Player A:
100 2pt "opps"
41 hits
10 fouls (of which he makes 80%, and which the def gets back all of them)
10 missed and off rebound
39 missed and def rebound
So, that's a total of 82+8 = 90 points earned for a price of 90 possessions turnover. His shooting percentage is 41/(41+10+39)=45.6%
Player B:
100 3pt shots
31 hits
4 fouls (of which he makes 80%, and which the def gets back all of them)
3.8 missed and off rebound
61.2 missed and def rebound
So, that's a total of 93+3.2 = 96.2 points earned for a price of 96.2 possessions turnover. His shooting percentage is 32.3%.
So, a player with a 46% 2pt FG is equivalent to a 32% 3pt FG player.
(Again, just plug in the appropriate numbers.)
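To make it easier to plug in other numbers, here is the same bookkeeping as a sketch. The function and parameter names are mine, and it follows the example above in crediting 0.8 points per foul and treating every outcome except an offensive rebound as a possession turned over:

```python
def shot_summary(shot_value, made, fouled, off_reb, def_reb, ft_pct=0.80):
    """Return (points, possessions turned over, FG%) for a set of shot opps."""
    # Made baskets plus the free-throw points credited in the example above.
    points = shot_value * made + ft_pct * fouled
    # Only an offensive rebound keeps the possession alive.
    possessions = made + fouled + def_reb
    # Fouled attempts are left out of the field goal attempts.
    fg_pct = made / (made + off_reb + def_reb)
    return points, possessions, fg_pct

player_a = shot_summary(2, made=41, fouled=10, off_reb=10, def_reb=39)
player_b = shot_summary(3, made=31, fouled=4, off_reb=3.8, def_reb=61.2)

for name, (pts, poss, fg) in (("A", player_a), ("B", player_b)):
    print(name, round(pts, 1), round(poss, 1), round(pts / poss, 3), f"{fg:.1%}")
```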
Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)
Posted 3:17 p.m.,
February 10, 2004
(#5) -
tangotiger
you are treating the probability of Y as if it were linear with respect to LI,
Agreed. I really meant:
x= .400/(1-.400) all divided by .340/(1-.340)
newOBA = x/(x+1)
I see the point about having whole numbers for PA*OBA.
Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)
Posted 1:28 p.m.,
February 11, 2004
(#12) -
tangotiger
Can you give us a small primer as to how a multinomial model would work?
And, do you think that my lwtsOBA, coupled with the fudge factor that Andy suggested, would be an excellent approximation?
Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)
Posted 2:33 p.m.,
February 11, 2004
(#14) -
tangotiger
Yes, it does look familiar.
In this case then, you would have say :
ln(1b/out),
ln(2b/out),
ln(3b/out),
ln(hr/out),
ln(bb/out),
ln(hbp/out)
For each player/LI category, right?
In terms of least-squares, since the BB is worth less than the HR, would you weight the ln(bb/out) less than ln(hr/out)?
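For anyone following along, this is how I picture those log-ratio terms, with outs as the baseline category. It is only a sketch of the quantities listed above, not a fitted model: the counts are hypothetical, the function names are mine, and no weighting is applied.

```python
import math

def log_ratios(counts, baseline="out"):
    """ln(event/out) for every non-baseline event with a nonzero count."""
    base = counts[baseline]
    return {ev: math.log(n / base)
            for ev, n in counts.items()
            if ev != baseline and n > 0}  # ln(0) is undefined, so skip zeros

def probabilities(ratios):
    """Invert the log-ratios back into per-PA event probabilities."""
    denom = 1 + sum(math.exp(r) for r in ratios.values())
    probs = {ev: math.exp(r) / denom for ev, r in ratios.items()}
    probs["out"] = 1 / denom
    return probs

# Hypothetical counts for one player in one LI category.
counts = {"1b": 60, "2b": 20, "3b": 2, "hr": 15, "bb": 40, "hbp": 3, "out": 260}

ratios = log_ratios(counts)
probs = probabilities(ratios)
print({ev: round(r, 3) for ev, r in ratios.items()})
```

Inverting the ratios recovers each event's share of PAs, which is a handy check that the six terms carry the whole batting line.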
How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)
Posted 9:36 a.m.,
February 11, 2004
(#9) -
tangotiger
MGL, I'm surprised that you have to use a high number like 500.
For off lwts, I use 209, and for UZR I use 420. Since these would be the two largest components of superLWTS, 500 seems out of line.
Otherwise, you are saying that you have to regress superLWTS 50% if given 500 PA. And, I know that you have previously stated that you regress around 30% or so.
How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)
Posted 2:38 p.m.,
February 11, 2004
(#13) -
tangotiger
MGL, it's even worse, because even if the spread of talent of baserunning were as wide as hitting, each hitting event is worth more.
That is, if you had basestealing as a spread between 50% success and 90% success, and you had OBA as between 25% and 50%, you not only have what you mentioned about the opps (say 0.1 SB att/game to 4.3 PA/game), but that each SB att is worth far less than an average PA in terms of potential impact.
My guess is that it would be far easier to just convert superLWTS into a binomialized stat, and work from there.
I'll defer to those with more expertise on the matter.
How Valuable Is Base Running and Who Are the Best and the Worst? (February 10, 2004)
Posted 4:51 p.m.,
February 11, 2004
(#16) -
tangotiger
(homepage)
Run Expectancy depends on EVERYTHING.
The above link will give you a good idea about the various run values of events in different run environments. All those run environments assume a typical split of hits, hr, walks given that you have a safe play.
For your specific question about a team that relies much less (or more) on hr, I'd have to run it through (say like the 80s Cards). Chances are, these Cards will have a lower chance of scoring from 1B and greater from 2B and 3B. This makes the SB worth more.
A team with tons of HR hitters will make the value of being on 1B greater and being on 3B less, and makes the SB worth less (relative to the Cards team).
As for magnitude of effect, I'm sure it would be quite small, maybe a .002 win difference per SB at most, and more likely .001? So, Vince Coleman might have been worth an extra .2 wins in a season by not being surrounded by boppers. Just a guess.
Baseball: Pythagorean Method (February 11, 2004)
Posted 10:40 a.m.,
February 11, 2004
(#2) -
tangotiger
It's the old one.
Baseball: Pythagorean Method (February 11, 2004)
Posted 5:32 p.m.,
February 11, 2004
(#10) -
tangotiger
If I remember right, doing it as abs(real win% - est win%), all of these were around an average error of .020 (or 3.2 wins per year). I suppose that probably corresponds to the RMS of 4.2. So, you don't gain much anywhere, even with the ultracomplex Tango Distribution.
Baseball: Pythagorean Method (February 11, 2004)
Posted 3:23 p.m.,
February 12, 2004
(#12) -
tangotiger
Ben, that's great stuff!
"I get b=0.286089 ". I don't remember what my data set included, but I had .287. So, I think we've pretty much nailed it.
As well, I mentioned that in the cases that we're most interested in (extreme teams and extreme players), I found a better fit with .28. That is, if you throw out the one-third or one-half of your teams with the smallest run differential (or smallest RS-RA / RS+RA), .28 works better.
I suppose if you really, really wanted to find the best fit, you'd do:
b = x + [ABS(RS-RA) / (RPG)] y
Then, n=RPG^b
For the majority of teams, that y term will reduce to close to zero. For a Tiger or Yankee team, that second term might come out to -.01 or -.02.
I suppose that x=.29 and y= -.01 or -.02.
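A minimal sketch of that exponent, plugged into the usual Pythagorean form RS^n / (RS^n + RA^n). Assumptions on my part: RS and RA are per-game figures (so the bracketed term equals |RS-RA| / (RS+RA)), and y defaults to -0.1, the value from the correction in the next post. The function names are hypothetical.

```python
def pythag_exponent(rs_per_game, ra_per_game, x=0.29, y=-0.1):
    """n = RPG^b, with b = x + [ABS(RS-RA) / RPG] * y."""
    rpg = rs_per_game + ra_per_game
    b = x + (abs(rs_per_game - ra_per_game) / rpg) * y
    return rpg ** b

def pythag_win_pct(rs_per_game, ra_per_game, x=0.29, y=-0.1):
    """Estimated win% from runs scored and allowed per game."""
    n = pythag_exponent(rs_per_game, ra_per_game, x, y)
    return rs_per_game ** n / (rs_per_game ** n + ra_per_game ** n)

# Hypothetical team scoring 5.0 and allowing 4.0 runs per game.
print(round(pythag_exponent(5.0, 4.0), 3), round(pythag_win_pct(5.0, 4.0), 3))
```

For an evenly matched team the second term vanishes and b is just x, which matches the point above that y only matters for the extreme teams.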
Baseball: Pythagorean Method (February 11, 2004)
Posted 3:25 p.m.,
February 12, 2004
(#13) -
tangotiger
Uhmmm... that should be y= -.1 or -.2.
Peak Age by Year of Birth (February 11, 2004)
Posted 4:38 p.m.,
February 12, 2004
(#3) -
tangotiger
You might want to try 3 separate lines:
players with 5 years or less,
players with 6 to 10 years,
players with 11 years or more
Peak Age by Year of Birth (February 11, 2004)
Posted 4:38 p.m.,
February 12, 2004
(#4) -
tangotiger
You will have selective sampling issues of course.
Clutch Hits - Tango's 11 points to think about --- to understand why we regress towards the mean (February 12, 2004)
Posted 1:08 p.m.,
February 12, 2004
(#2) -
tangotiger
why wouldn't you use as a population mean say, the average change in performance from year one to year two for all first year players in the last 10/20/30/40 years
I never said not to. I clearly said that if you wanted to take a subpopulation of players (like Arod, Giambi, Delgado), then you can do so, with my provision.
Choosing other similar-aged and experienced players is also a good criterion, and this particular combination was specifically cited a few weeks ago when I presented the "Forecasting Pitchers" article.
Clutch Hits - Tango's 11 points to think about --- to understand why we regress towards the mean (February 12, 2004)
Posted 1:09 p.m.,
February 12, 2004
(#3) -
tangotiger
of their own career "sample" mean.
Again, if you have a large sample mean, then you will regress barely at all towards the pop mean. A player with 1800 PAs will be regressed 10% towards the pop mean. That means a player who performed at +50 runs per year for 3 years is, at our best guess, a +45 runs per year player (with a margin of error).
Clutch Hits - Tango's 11 points to think about --- to understand why we regress towards the mean (February 12, 2004)
Posted 2:31 p.m.,
February 12, 2004
(#6) -
tangotiger
(homepage)
MGL, your question in your last paragraph was asked/answered at the above homepage link.
Clutch Hits - Tango's 11 points to think about --- to understand why we regress towards the mean (February 12, 2004)
Posted 4:14 p.m.,
February 12, 2004
(#8) -
tangotiger
No, because you are forgetting about the confidence interval.
Say you have a league average OBA of .340 and Bonds has a .440 OBA. Because that's a sample, maybe Bonds' true OBA is .430. But, that comes with a confidence interval. It would be better to say that Bonds' true OBA is .430 +/- .030 95% of the time. As you can see, there is a chance, in this example, that Bonds' true OBA is actually higher than his sample OBA.
What is being said is that, on average, players above the mean got more good luck than bad luck. Not all players, of course. Again, in my above example, there is a chance that Bonds got more bad luck than good luck.
Perhaps what we should show is a line like:
Bonds: player
.440: sample OBA,
.430: true OBA,
.015: 1 standard deviation
20%: chance that true OBA is greater than sample OBA
(Numbers for illustration only.)
Would this make it easier to swallow?
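The last entry on that line can be computed rather than eyeballed. A sketch, assuming the uncertainty around the true OBA is normally distributed (my assumption, not something established above), using the same illustrative numbers:

```python
from statistics import NormalDist

true_oba = 0.430     # best estimate of true talent (illustrative)
sd = 0.015           # one standard deviation (illustrative)
sample_oba = 0.440   # observed OBA

estimate = NormalDist(mu=true_oba, sigma=sd)

# Chance that the true OBA is actually above the sample OBA.
chance_above_sample = 1 - estimate.cdf(sample_oba)

# Roughly 95% of the distribution lies within two standard deviations.
low, high = true_oba - 2 * sd, true_oba + 2 * sd

print(round(chance_above_sample, 2), round(low, 3), round(high, 3))
```

Under the normal assumption the chance comes out near 25%, in the same neighborhood as the illustrative 20% on the line above.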
Clutch Hits - Tango's 11 points to think about --- to understand why we regress towards the mean (February 12, 2004)
Posted 11:08 p.m.,
February 12, 2004
(#12) -
tangotiger
I think that from now on, I will do my best to present stats with the variance.
Clutch Hits - Tango's 11 points to think about --- to understand why we regress towards the mean (February 12, 2004)
Posted 2:42 p.m.,
February 21, 2004
(#14) -
tangotiger
With regression.
Clutch Hits - Tango's 11 points to think about --- to understand why we regress towards the mean (February 12, 2004)
Posted 9:40 a.m.,
February 24, 2004
(#17) -
tangotiger
(homepage)
If you include the playing time component, then you are pretty close to it.
See above homepage link.
USATODAY.com - Study backs NHL's claim of major financial losses (February 16, 2004)
Posted 5:20 p.m.,
March 1, 2004
(#2) -
tangotiger
I'm trying to remember, but when Stevens first signed with the Blues, they forfeited 5 1st-round picks back to the Caps.
Then, when he was restricted again, NJ claimed him, and NJ had to give up Brendan Shanahan back to the Blues (I think this was arbitrator ruled). In effect, it was a trade. The only thing the "restricted" thing does is it allows a team to trade for a player at full market value.
I seem to remember Dale McCourt 20 years ago was claimed by the Kings from Det (or vice-versa), and there was stiff compensation too.
I'm not sure how Marcel Dionne went from Det to LA after only 2 years.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 5:04 p.m.,
February 16, 2004
(#2) -
tangotiger
The PTBNL is a good point. Let's see who it is first. I think Texas has a choice of 5 players.
If it was a Joe Mauer type, then you're talking about say 45 million$ of present value for a salary of 20 million$ over 6 years. Otherwise, it could be Joe Schmoe. So, we're talking about a 0 to 20 million$ net income value of PTBNL.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 10:32 a.m.,
February 17, 2004
(#12) -
tangotiger
I never had a chance to fully go through that article, but I would be wary. I believe it only covered a period of 5 years (late 90s). I think you have tons of issues here.
I do agree that a team will (should) spend more per win as they increase their chances of hitting the playoffs, and getting all that money. I'd like to see another look at this as well.
Furthermore, there should also be a topping off at some point. NYY and Bos are already "guaranteed" of making the playoffs. Well, say they are 95% of the way there. Flipping ARod for Soriano increases that to what, 96%? It's like bringing in Mariano Rivera in the bottom of the 9th with a 6-run lead. Do you need to do that to MAKE the playoffs? Not if it'll cost you 100 million$ to do so.
However, once in the playoffs, this flip makes a world of difference.
The point is that we shouldn't necessarily rely on a 0.5 million / win for noncontenders and 1.5 for contenders (or whatever numbers the authors suggest). Even my flat 1.85 million$/win isn't good enough. There are more dynamics going on.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 10:43 a.m.,
February 17, 2004
(#13) -
tangotiger
Darren: I don't have Manny's particulars handy. Let's assume that his remaining 100 million$ (some deferred) for 5 years has a present value of around 90 million$. The market value for Manny, for 5 years, would probably be around 60 million$. (Vlad, younger, less healthy, better defensive, signed for 70/5, and Tejada signed for 72/6 I think).
So, Manny is being paid at 40 million$ over market value.
Tex/NYY, based on their trade and my back-of-the-envelope calculation, has ARod as being about 46 to 53 million$ over market value.
Boston, based on market value, should have asked for relief of around 10 million$. I think they asked for over 30 million (though Saturday they were saying "as-is").
***
And, never should a player take a pay cut. Do you think the Cards will give Pujols a 10 million$ bonus for being severely underpaid? However, if ARod wanted to "buy" his way into playing for Bos, that's another story. But, that sets a precedent for an MLB team to strong-arm players. The union sees this 3 steps ahead of the rest of us, and I'd defer to their market analysis on this.
***
Note that what the market thinks of a player is not necessarily what he is actually worth. The market is only right if it can properly balance all known information. The stock market has the luxury of having millions of shares of each company being traded every day. MLB does not have that luxury.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 10:46 a.m.,
February 17, 2004
(#14) -
tangotiger
"So, Manny is being paid at 40 million$ over market value."
That should read "30", and so Boston should have asked for relief of around 20. Again, change some assumptions around, and asking for relief of 10 or 30 may have been appropriate.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 11:35 a.m.,
February 17, 2004
(#16) -
tangotiger
I agree that at this level, changing the assumptions slightly can have a 20 million$ swing. Both sides can reasonably argue that it was a fair trade. A straight ARod for Sori trade (or even just ARod for nothing) would have been a bad trade for the Yanks.
***
You may think ARod is unique, because of the combination of hitting, fielding and position, but that's not too important.
His 2000-2002 performance says he was 20 runs better than the best players other than Bonds:
http://www.baseballprimer.com/bodies/lichtman/200002.html
That's 2 wins, which you can value at anywhere from 2 to 6 million$ per year. Seeing that he'd be in his decline phase at some point, you wouldn't expect that over the next 7 years. So, he should be paid somewhere around 30 million$ more than the standard superstar over 7 years.
Vlad got 70/5, so we'd expect a 7 yr contract should be around 90 to 95 million$. That's the market for a typical superstar over 7 years. ARod should be around 120 million$ for 7 years, or 17 million$/yr.
I mentioned at the beginning that "That production, today, would be paid out at about 17 million $ / year."
It's not just wild speculative guesses. They are reasonable educated guesses.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 12:24 p.m.,
February 17, 2004
(#18) -
tangotiger
The biggest reason is that business practices get thrown out the window when it comes to baseball. Having bright people who understand economics like Law, DePodesta, Epstein et al can only improve the economic landscape.
We aren't at the point where players are priced optimally, but they are priced fairly (as many good contracts for players as bad).
A team economically managed by Primates would do better than every other team out there.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 2:23 p.m.,
February 17, 2004
(#21) -
tangotiger
dlf,
I certainly don't mean to ignore the "draw" factor beyond what his performance on the field has. Certainly, Gretzky had that in hockey (witness the resurgence of the Kings and the actual creation of teams in that market) or Jordan in basketball, etc.
I think some marketing folks set Jordan's value (if he were to play at his peak) to the NBA as (and I think I'm remembering this right) 1 billion$.
I won't begin to try to figure out ARod's draw value, and I won't dismiss it either.
But, given the time, I think it would be a rather simple process to go through, to the same extent that Nike can establish Tiger was worth 60 million$ as a rookie.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 5:09 p.m.,
February 17, 2004
(#23) -
tangotiger
Yes, but there's a lot of value in being quite sure what you're getting back. With ARod, you're getting a consistent performer, with Soriano, you've got a big chance that he tanks.
That we are more certain in ARod's true talent level than we are with Soriano has already been factored into their market price.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 1:15 p.m.,
February 18, 2004
(#31) -
tangotiger
I don't see why a "proprietary" system from the A's or Redsox would be any better, if they have access to the same data that we do.
The only place where they'd have a (short-term) leg up on us would be that they have access to scouting reports. However, once we have that data, then we're all on the same footing.
And, I'll stack up the collective wisdom of the Primates against anyone out there.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 3:20 p.m.,
February 18, 2004
(#33) -
tangotiger
You should make a distinction between the drawing power that a team can benefit from, and one from which they won't.
Bonds probably has great draw from a fan perspective that probably sends more people to see the Giants than they normally would for that win level. (Maybe.)
Will ARod help the Yanks draw more in attendance/TV than they would based strictly on his marginal win impact? I dunno. Maybe?
I'm not sure how much of an effect a baseball player can have here.
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 10:00 p.m.,
February 18, 2004
(#35) -
tangotiger
I estimate each marginal win to add 2% to attendance. The diff between ARod and Soriano is about 4 marginal wins, or 8%. NYY had about 35 to 40 K fans per game. So, based only on what he brings to the field, that's about a 3,000 fan increase per game.
Tickets for NYY have been BLAZING. They might actually have tons of sellouts this year.
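The attendance estimate above works out as follows. This is a back-of-the-envelope sketch: the function name is invented, and it assumes the 2% per marginal win adds up linearly rather than compounding.

```python
def attendance_gain(fans_per_game, marginal_wins, pct_per_win=0.02):
    """Extra fans per game, assuming each marginal win adds a flat pct_per_win."""
    return fans_per_game * marginal_wins * pct_per_win


# ARod vs Soriano: about 4 marginal wins, NYY drawing 35 to 40 K per game
low = attendance_gain(35_000, 4)
high = attendance_gain(40_000, 4)
print(round(low), round(high))  # 2800 3200
```

That range of 2,800 to 3,200 is where the "about 3,000 fan increase per game" comes from.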
ARod and Soriano - Was the Trade Fair? (February 16, 2004)
Posted 3:44 p.m.,
February 20, 2004
(#40) -
tangotiger
I don't think it's so much 2004, as the years beyond.
Say you have Sori with the following:
Age Performance
24 +30
25 +30
26 +30
(I don't know what Sori's numbers were, but let's just assume that was his performance).
So, what do you do? First, you translate his performance, each year independently, into an expected performance at age 27:
Age.... Performance.... Expected Performance at 27
24 +30... +37
25 +30... +33
26 +30... +31
You give more weight to the recent seasons, on a scale of 5/4/3. That gives you a weighted average of: +33 (assumes he had the same number of PAs each year).
Finally, assuming 1800 PAs, you regress 10% towards the mean, or a true talent of +30 at age 27.
(Just a coincidence that it's +30 btw.)
Now, let's repeat, but this time, let's adjust his age:
Age.... Performance.... Expected Performance at 27
26 +30... +31
27 +30... +30
28 +30... +29
The weighted performance is now: +30. Regress 10% towards the mean, and you have +27 as his true talent at age 27. But, he will be 29. From 27 to 29, you probably lose 4 runs total. That brings him in at +23 as his true talent level at age 29.
(I calculate age as season minus YOB. Not too important, as long as you do the same for everyone.)
So, we thought he had a true talent level of +30 at age 27, but he actually had a true talent of +27 at age 27, and seeing that he is 29, he has a true talent of +23.
That's a 7 run difference, which is a whopper.
You guys can plug in whatever numbers Sori actually had.
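To make plugging in easier, here is a small sketch of the steps above. The age-27 translations (+37/+33/+31 and +31/+30/+29), the 5/4/3 weights, the 10% regression at 1800 PAs and the 4-run loss from 27 to 29 are all taken from the post; the function names are made up, and equal PAs per season and a population mean of zero are assumed.

```python
def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def true_talent(age27_equivalents, weights=(3, 4, 5), regression=0.10, aging_loss=0.0):
    """age27_equivalents: oldest season first, each already translated to age 27.
    weights: 3/4/5 so the most recent season counts most.
    regression: share regressed toward a population mean of 0.
    aging_loss: runs lost between age 27 and the player's actual age."""
    weighted = weighted_average(age27_equivalents, weights)
    regressed = weighted * (1.0 - regression)
    return regressed - aging_loss


# If the +30 seasons came at ages 24-26 (he would be 27 now)
believed = true_talent([37, 33, 31])
# If the same +30 seasons came at ages 26-28 (he is really 29 now)
actual = true_talent([31, 30, 29], aging_loss=4)

print(round(believed), round(actual), round(believed - actual))  # 30 23 7
```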
Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)
Posted 11:14 a.m.,
February 17, 2004
(#1) -
tangotiger
I sent the following to the author above:
=========================
Here are your totals by position:
Pos      Rate2    WS   UZR  Pinto  Cedeno
------------------------------------------------
3           54    85   -11     11      35
4           59    11    82     24      45
5           38   160    36     32      67
6           38     2    -6     32      18
7           29    58   -36    -80      -8
8           31   257    21     83      97
9           27    74   -77    -75     -12
------------------------------------------------
Tot        276   647     9     27     242
OF only     87   389   -92    -72      77
I don't expect all positions to be zero, because you
are not looking at all players (and you did set
everyone to 162 GP).
However, the UZR and Pinto numbers do make the most
sense. The Rate2 numbers are about 30 runs too high
total and your WS OF are just off the charts too high.
Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)
Posted 11:24 a.m.,
February 17, 2004
(#3) -
tangotiger
Btw, the SD of all players in that chart is as follows:
UZR: 16 runs = 1SD
Pinto: 12 runs
Rate2: 11
Win Shares: 6
Win Shares just doesn't have the scale.
Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)
Posted 12:20 p.m.,
February 17, 2004
(#7) -
tangotiger
Pinto's does rely on PBP, and it uses maximum likelihood estimation. The problems with Pinto's model have been noted by me a few times in a few places (which are most notable with the CF and the curious decision to look at slices and not grids). Pinto's model would greatly improve if he were to use multi-year data (something that's built into UZR).
We should remember that Pinto's model was a first stab, and it was published as a blog entry. If Pinto were to devote his time/effort as much as MGL did to improving it, it would be great. It's not fair to criticize it as MGL does, since Pinto never really trumpeted it to the point that it should be criticized as a final product.
It would be like criticizing my "position-neutral" fielding, when it's still at the very early stages of even contemplation.
On a 0 to 10 point scale of usefulness, UZR is probably a 7, and Pinto is a 5. Rate2 is probably a 3 and WS a 2. Just wild meaningless guesses.
***
The "r" among the 4 stats as originally published:
UZR/Pinto: .67
WS/Rate2: .58
UZR/Rate2: .51
Pinto/Rate2: .45
UZR/WS: .34
WS/Pinto: .34
all to UZR: .70 (WS added nothing to Pinto/Rate2)
all to Pinto: .68 (Rate2 and WS added almost nothing to Pinto)
all to Rate2: .67 (UZR/Pinto added almost nothing to Rate2)
all to WS: .59 (UZR/Pinto added nothing to WS)
Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)
Posted 1:47 p.m.,
February 17, 2004
(#13) -
tangotiger
Yes, the SD measures the spread of the data. And the spread in WS is half what it is for all other measures.
The "problem" with correcting this though is that you'll end up with negative Win Shares.
Say that the WS per 162 GP for all your "regular" SS fall between 1 and 13, with a mean of 7. This corresponds to a SD of 2 WS, or 0.67 wins, or 6.7 runs (more or less). However, what we think we know is that the spread should be double that. Since the mean for fielding is fixed, you've got to keep the mean for SS at 7. That gives you a range of -5 to 19 WS for SS. This will correspond to a SD of 4 WS, or 1.33 wins, or 13 runs (which is in-line with what we think is right).
However, according to James' thinking, negative win shares can't happen, and so to "fix" this, he's got to cut the spread in half in order to fit the data into his thinking.
***
Note: A quick way to figure out the SD of something like this is to take the difference in the typical points, and divide by 6. "6" corresponds to 3 SD from the mean on both sides, which covers about 99.7% of the points if the data is roughly normal. It's a handy rule of thumb.
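Here is the rule of thumb applied to the SS example. This is a sketch only: the conversion factors of 3 Win Shares per win and 10 runs per win are the ones implied by the numbers in the post, and the names are made up.

```python
WS_PER_WIN = 3      # implied by "2 WS, or 0.67 wins"
RUNS_PER_WIN = 10   # implied by "0.67 wins, or 6.7 runs"


def sd_from_range(low, high):
    """Rule of thumb: a typical range spans about 6 SD (3 on each side of the mean)."""
    return (high - low) / 6.0


def ws_to_runs(win_shares):
    return win_shares / WS_PER_WIN * RUNS_PER_WIN


published_sd = sd_from_range(1, 13)    # Win Shares as published
stretched_sd = sd_from_range(-5, 19)   # same mean of 7, spread doubled

print(published_sd, round(ws_to_runs(published_sd), 1))  # 2.0 6.7
print(stretched_sd, round(ws_to_runs(stretched_sd), 1))  # 4.0 13.3
```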
Amalgamation of Fielding Measures: Cedeno Charts (February 17, 2004)
Posted 2:14 p.m.,
February 17, 2004
(#17) -
tangotiger
It's based on the "games opps".
I'll present the hitting side, which will be easier to follow. If the avg hitter gets 680 PAs per 162 GP, we can say a given hitter is +40 runs per 680 PAs or +40 runs per 162 GP.
But, what if this guy was the leadoff hitter and had 750 PAs? In that case, he had the equivalent of 179 GP. So, if he was a +30 for 750 PAs, he'd be +27 per 680 PAs or per 162 GP.
Same situation applies with fielders, where someone like Jeter might have only 130 fielding games, because a lot of balls are not hit to him, or a flyball staff will have their OF with 178 GP or something.
When you see that things are put to 162 GP, it's not HIS 162 GP, but the equivalent number of PAs or BIP that goes into 162 GP.
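The leadoff hitter example can be written out like this. It is a sketch with invented function names, using the 680 PAs per 162 GP from the post as the baseline; for fielders you would swap in balls in play for PAs.

```python
LEAGUE_PA_PER_162 = 680  # average hitter's PAs per 162 GP, per the post


def equivalent_games(pa):
    """Convert a player's opportunities into the equivalent number of games."""
    return 162 * pa / LEAGUE_PA_PER_162


def runs_per_162(runs, pa):
    """Scale a run total to the opportunities in a standard 162 GP."""
    return runs * LEAGUE_PA_PER_162 / pa


print(round(equivalent_games(750)))      # 179
print(round(runs_per_162(30, 750)))      # 27
```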
Blog Entry of the Week (February 20, 2004)
Posted 3:19 p.m.,
February 20, 2004
(#4) -
tangotiger
Gleeman has started drinking, I think. He's on his way.
Blog Entry of the Week (February 20, 2004)
Posted 6:48 p.m.,
February 20, 2004
(#8) -
tangotiger
I think he was looking at OPS and saw 1.1, and interpreted as he did (or something like that).
Blog Entry of the Week (February 20, 2004)
Posted 1:24 p.m.,
February 23, 2004
(#13) -
tangotiger
(homepage)
Here you go. (Will probably only be available for 10 days).
Poisson Distribution - Win % between two teams (Excel Spreadsheet) (February 20, 2004)
Posted 4:49 p.m.,
February 20, 2004
(#2) -
tangotiger
I updated the Poisson file to allow you to put in "matchup" style data.
Technically, I should have a regression component to convert the performance of the team into a true talent. So, just be careful.
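For readers without the spreadsheet, the general idea can be sketched as below. This is not a copy of the Excel file: it assumes each team's runs follow an independent Poisson distribution, that tied scores are simply replayed (so they are dropped and the rest renormalized), and that the run rates fed in are already true-talent estimates. The function names are made up.

```python
import math


def poisson_pmf(k, mean_runs):
    """Probability of scoring exactly k runs under a Poisson distribution."""
    return math.exp(-mean_runs) * mean_runs ** k / math.factorial(k)


def win_probability(mean_a, mean_b, max_runs=40):
    """Chance that team A outscores team B, ignoring ties."""
    pmf_a = [poisson_pmf(k, mean_a) for k in range(max_runs + 1)]
    pmf_b = [poisson_pmf(k, mean_b) for k in range(max_runs + 1)]
    # Sum over every score pair where one side is strictly ahead
    p_a_wins = sum(pmf_a[a] * pmf_b[b] for a in range(max_runs + 1) for b in range(a))
    p_b_wins = sum(pmf_a[a] * pmf_b[b] for b in range(max_runs + 1) for a in range(b))
    return p_a_wins / (p_a_wins + p_b_wins)


print(round(win_probability(5.0, 4.0), 3))
```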
Poisson Distribution - Win % between two teams (Excel Spreadsheet) (February 20, 2004)
Posted 9:49 p.m.,
February 21, 2004
(#5) -
tangotiger
Soccer I would guess yes. If you email me the game scores of a soccer league, I can tell you right away.
Football has weird scoring rules (7 pts or 3 pts). It's like having 1 goal and half a goal.
FANTASY CENTRAL (February 21, 2004)
Posted 12:36 p.m.,
February 22, 2004
(#3) -
tangotiger
SD is the most important thing to know to establish the roto values.
You also want to baseline by position, such that the "replacement" level at each position is also set to zero $. Gets more complicated when you have 1 ss, 1 2b, and 1 2b or ss.
The sum of the 200 players (or how many players are drafted) should equal the sum of the auction dollars available for the league. Those $ values are the max you should spend. Ideally, you should pick up 300$+ worth of players for 250$ of money.
That is, you should end up with 1 or 2 more star players than your opponents.
FANTASY CENTRAL (February 21, 2004)
Posted 11:43 a.m.,
February 24, 2004
(#23) -
tangotiger
J, what kind of difference are we talking about here? In your post #2, are your coefficients based on the SD of the players, or is it based on putting them into teams?
FANTASY CENTRAL (February 21, 2004)
Posted 5:05 p.m.,
February 24, 2004
(#45) -
tangotiger
J,
Can you print out the SD (and for the rate stats, the number of AB or IP) for your 100 team sample?
***
As for the higher correlation, it might be that the high K pitchers also have better than avg ERA and wins, etc.
FANTASY CENTRAL (February 21, 2004)
Posted 9:34 a.m.,
February 25, 2004
(#52) -
tangotiger
Using J. Cross' numbers from post #49 (which are based on the team results' SD), force the HR to 1, and his hitters come in at:
hitter
= HR
+ 1.05 * SB
+ 0.39 * Runs
+ 0.37 * RBI
+ 0.63 * (H-AB*.28)
+ some constant that will be the same for all players
If I instead look at the SD of individual players (from 1994-2002 as a group, with at least 300 PA), I get:
hitter
= HR
+ 1.00 * SB
+ 0.45 * Runs
+ 0.39 * RBI
+ 0.83 * (H-AB*.28)
+ some constant that will be the same for all players
If I make it at least 500 PA, I get:
hitter
= HR
+ 0.96 * SB
+ 0.51 * Runs
+ 0.41 * RBI
+ 0.85 * (H-AB*.28)
+ some constant that will be the same for all players
***
We see that the biggest difference comes in the handling of the batting average.
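To see how much the choice of coefficients matters for one player, the three formulas can be put side by side. The coefficients are copied from the lists above; the stat line is a made-up hitter, the names are invented, and the constant term is left out since it is the same for everyone.

```python
# Coefficients with HR forced to 1, copied from the three runs above
COEFFS = {
    "team SD (post #49)": {"SB": 1.05, "R": 0.39, "RBI": 0.37, "AVG": 0.63},
    "players, 300+ PA":   {"SB": 1.00, "R": 0.45, "RBI": 0.39, "AVG": 0.83},
    "players, 500+ PA":   {"SB": 0.96, "R": 0.51, "RBI": 0.41, "AVG": 0.85},
}


def hitter_score(hr, sb, runs, rbi, hits, ab, coeffs, baseline_avg=0.28):
    """Roto score in HR-equivalent units, before the common constant."""
    hits_above_baseline = hits - ab * baseline_avg
    return (hr
            + coeffs["SB"] * sb
            + coeffs["R"] * runs
            + coeffs["RBI"] * rbi
            + coeffs["AVG"] * hits_above_baseline)


# Hypothetical hitter: 30 HR, 10 SB, 100 R, 100 RBI, 180 H in 600 AB (.300)
for label, coeffs in COEFFS.items():
    print(label, round(hitter_score(30, 10, 100, 100, 180, 600, coeffs), 1))
```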
***
I suppose what you really want to do is take each team, and replace 1/14th of the team totals with runs scored from 60 to 130, and recompute the standings. Then, come up with the best-fit function. Hopefully, you'll get something like a straight-line, so you don't have to worry about it too much. And, you repeat this for all the categories.
Anyone want to try?
FANTASY CENTRAL (February 21, 2004)
Posted 1:53 p.m.,
February 25, 2004
(#56) -
tangotiger
Can you tell me how many hitters are drafted, and from which league?
As you can see from my run in post #52, the SDs hardly change, whether you select all players with at least 300 PA or 500 PA.
FANTASY CENTRAL (February 21, 2004)
Posted 12:04 p.m.,
February 26, 2004
(#68) -
tangotiger
I think you are understanding it fine.
I doubt the .28 will change much, but that's easy enough to check with a little work.
I'll have the Marcels, but remember, Marcel is the most basic estimate that all other forecasters should improve upon. If you have access to ZiPS, PECOTA, DMB, Ken Warren, etc, use those.
FANTASY CENTRAL (February 21, 2004)
Posted 5:31 p.m.,
February 28, 2004
(#75) -
tangotiger
Selecting the pitchers from 94-03 with at least 20 GS or 5 SV (that's an average of 160 pitchers per MLB season), here are the SD's for the counting categories:
W: 5
L: 4
G: 16
GS: 13
CG: 2
SHO: 1
SV: 12
I'm not sure if my thresholds are proper. If you would like different ones, please let me know.
FANTASY CENTRAL (February 21, 2004)
Posted 10:31 a.m.,
March 1, 2004
(#77) -
tangotiger
I'll try to target devoting an hour or two on Fri Mar 5 to do the Marcels.
I'm going to assume a playing time estimate for hitters of:
.5 * 2003 PA + .1 * 2002 PA + 200
That means a guy with 500 PAs in 2003 and 2002 is expected to get 500 PA in 2004.
I haven't yet looked at what to do for starters, middle guys, and relievers.
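The hitter playing-time estimate above, as a one-liner you can drop into your own sheet or script. The function name is made up; the weights and the 200 PA constant are the ones given in the post.

```python
def projected_pa(pa_last_year, pa_two_years_ago):
    """Playing-time estimate for hitters: .5 * last year + .1 * year before + 200."""
    return 0.5 * pa_last_year + 0.1 * pa_two_years_ago + 200


print(round(projected_pa(500, 500)))  # 500
print(round(projected_pa(650, 600)))  # 585
```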
FANTASY CENTRAL (February 21, 2004)
Posted 11:44 a.m.,
March 2, 2004
(#85) -
tangotiger
"The best stat they report is RMSE - the average difference between projected stats and actual stats."
The implication of this statement is that if you had a league OPS in 1984,85,86 of .730,.730,.730, you would expect .730 for the league in 1987. We know that 1987 was not like the others. Say it was .770.
If you had Tim Raines at .830,.830,.830 in those 3 years, you might project him at .820 or something in 1987. If he ended up being .860, you'd think you were off by .040. But, everyone in the league would be off by that.
The best way to do it is to compare the player to the league (whether by differential, division, or ratio).
FANTASY CENTRAL (February 21, 2004)
Posted 1:28 p.m.,
March 2, 2004
(#87) -
tangotiger
No, because how can you predict 1987?
The fair thing to do is to say: "this forecaster predicted the overall lg avg the best". But, after that, within that forecaster's universe of players, it should be normalized. There are two sets of predictions here.
In fact, every forecaster worth his salt would do the forecasts in this two-step process. The first thing he tries to figure out is how is the whole league average going to be affected (say like when the strike zone was changed). After that, he estimates the player relative to the league average.
FANTASY CENTRAL (February 21, 2004)
Posted 2:00 p.m.,
March 2, 2004
(#89) -
tangotiger
J, agreed. Even the correlation is no good, because that fits the slope and the intercept. In my view, the slope should be fixed at 1, and you fit the intercept to the league average.
Afterwards, you can compute your RMSE.
Essentially, only do RMSE on OPS/lgOPS or OPS-lgOPS.
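A small sketch of the difference, using the 1987 example from earlier in this thread. The Raines line and the league levels (.730 projected, .770 actual) come from that example; the other two players are invented, as are the names in the code.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# (projected OPS, actual OPS); first row is the Raines example, the rest are made up
players = [(0.820, 0.860), (0.700, 0.740), (0.760, 0.800)]
projected_lg_ops = 0.730
actual_lg_ops = 0.770

# Raw RMSE punishes the forecaster for missing the league-wide jump
raw = rmse([actual - proj for proj, actual in players])

# RMSE on OPS - lgOPS only scores the player relative to his league
relative = rmse([(actual - actual_lg_ops) - (proj - projected_lg_ops)
                 for proj, actual in players])

print(round(raw, 3), round(relative, 3))  # 0.04 0.0
```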
FANTASY CENTRAL (February 21, 2004)
Posted 3:07 p.m.,
March 2, 2004
(#92) -
tangotiger
I prefer the normalized stats, esp for pitchers who change leagues. If his ERA is 150+, does it matter what his NL ERA forecast is and what his AL ERA forecast is?
What a player's forecast is actually saying is: "given that I think the level of competition would produce a runs per game of 4.52, this is what I think his ERA is going to be". If you take that pitcher, and throw him into a different league, you will still expect that player, relatively speaking, to produce at around the same rate.
Anyway, I think we both said our points 3 different ways, so we ain't getting nowhere.
FANTASY CENTRAL (February 21, 2004)
Posted 10:59 a.m.,
March 3, 2004
(#96) -
tangotiger
I made a post in the PECOTA thread at Clutch, but since that takes forever to load, I'll repost here, and continue the discussion here.
***
Why project "Ice" Williams to have a 465 OPS? Better make it 680, since the only way he manages 130 PAs is by putting up an OPS over 650.
This is a great point that is not said enough. The forecast of the performance is also dependent on the number of PAs. There was an interesting article on the Primer home page that talked about this kind of thing.
I also put out something, somewhere at Primate Studies, that shows how a player's performance is tied in to his PAs.
What I would do, if I were so inclined, would be to forecast a different set of OPS, based on the number of PAs he'd get. For example, if my best guess is that, in full-time play, Soriano's OBA is going to be .360 (with an error range), then I would make the following guess:
PA, OBP
700,.360
600,.355
500,.350
400,.345
300,.340
200,.335
100,.330
(Numbers for illustration only.)
Why would I do that? Because, as a group, this pattern exists. Why does it exist? One reason might be injuries: a guy might be playing through injuries, and then it catches up to him. Another might be that when you start off slow, a manager might be tempted to start benching you (see World Series), therefore, not giving you enough PAs to catch up to your normal talent level. Another might be that something might have changed with you.
So, to graphically show this, you would do:
        percentile
PA      25%     50%     75%
700     .340    .360    .380
600     .330    .355    .380
500     .320    .350    .380
etc, etc
(Numbers for illustration only.)
Then, within each of those, you would "click" that estimate (say the .355), and you would get another set of "percentiles" to show you the likelihood of getting that from walks or hits, etc.
***
The point here is that PAs are very germane to the issue here. Forecasters are doing their best to estimate the true talent level within that park/league context, but the only thing we can verify is their actual performance levels. And that is tied-in to the number of opps they are given/earned.
FANTASY CENTRAL (February 21, 2004)
Posted 3:07 p.m.,
March 3, 2004
(#98) -
tangotiger
3. feel free to use this thread
1. I meant how the run scoring in 87 was out-of-line with the surrounding years
FANTASY CENTRAL (February 21, 2004)
Posted 1:10 p.m.,
March 5, 2004
(#104) -
tangotiger
I agree that if you take a reasonable set of players, you should use the SD among those players. You could rerun it every time. That is, start off with the 250 players you think might be selected, and figure out the SD for the 500 MLB players. Rank them. Take the top 250, and redo your SDs. Rerank them. And on and on. Your SDs will stabilize very very quickly.
FANTASY CENTRAL (February 21, 2004)
Posted 1:11 p.m.,
March 5, 2004
(#105) -
tangotiger
and figure out the SD for the 500 MLB players.
Should read as: and using the SD for the 250 players, figure out the scores for the 500 MLB players.
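The loop described in #104, with the correction above, might look something like this. It is a sketch on random made-up stats: the three categories, the equal weighting of them, the use of z-scores as the "score", and all names are assumptions for illustration, not anything from an actual league.

```python
import random
import statistics

CATEGORIES = ("HR", "RBI", "SB")


def score_all(players, pool):
    """Score every player using the mean and SD of the current draft pool."""
    stats = {c: (statistics.mean(p[c] for p in pool),
                 statistics.pstdev(p[c] for p in pool)) for c in CATEGORIES}
    return [sum((p[c] - stats[c][0]) / stats[c][1] for c in CATEGORIES)
            for p in players]


def stabilize_pool(players, pool_size, max_rounds=20):
    """Re-rank with the pool's own SDs until the pool stops changing."""
    pool_ids = set(range(pool_size))  # first guess at who gets selected
    for _ in range(max_rounds):
        pool = [players[i] for i in pool_ids]
        scores = score_all(players, pool)
        ranked = sorted(range(len(players)), key=lambda i: scores[i], reverse=True)
        new_ids = set(ranked[:pool_size])
        if new_ids == pool_ids:
            break
        pool_ids = new_ids
    return pool_ids


random.seed(0)
players = [{"HR": random.randint(0, 50),
            "RBI": random.randint(20, 140),
            "SB": random.randint(0, 60)} for _ in range(500)]
pool = stabilize_pool(players, 250)
print(len(pool))  # 250
```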
FANTASY CENTRAL (February 21, 2004)
Posted 1:28 p.m.,
March 5, 2004
(#108) -
tangotiger
J, if you send it to me, I can post it here. Your call.
Michael/#103: I think if you stick to the player level, you should be fine.
FANTASY CENTRAL (February 21, 2004)
Posted 9:09 a.m.,
March 8, 2004
(#114) -
tangotiger
Sky, the spread in RBIs is twice that of HR. If most players will have 10 to 50 HR, they will also have 60 to 140 RBIs (numbers made simple for illustration).
You use the SD to try to make them even, because RBIs and HR have the same impact on the overall Rotisserie score.
If you do:
HR score = (HR - 20)
RBI score = (RBI - 80) / 2
what you are doing is:
a) comparing players to the mean
b) standardizing the impact of the events
Since you will be subtracting 20 or 80 from everyone, all that cancels out. What you are left with is just the spread (standard deviation).
Total score = HR + RBI / 2
You figure out what the replacement level "total score" will be. That sets your dollar value for that player to zero.
***
If you have just 6 players in the whole league, and they have the following "total scores":
player1: 400
player2: 300
player3: 200
player4: 100
player5: 50
player6: 20
and if the whole league has to draft 4 players, then you know that that last guy will go for the minimum (say 1$). So, recalculate the relative total scores as:
player1: 300
player2: 200
player3: 100
player4: 0
player5: less than zero
player6: less than zero
Say that the whole league will have 64$ in salary. Everyone has a minimum 1$ in salary, so that leaves us with 60$ in marginal dollars.
The total score above replacement is 300+200+100+0=600. So, we see that the marginal dollar per marginal total score is 1 per 10.
That gives us the following roto values:
player1: 31
player2: 21
player3: 11
player4: 1
player5: 0
player6: 0
Total$ = 64
That's how you do it.
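The same 6-player example in code form, in case you want to run your own league through it. The function and variable names are made up; the logic follows the steps above (replacement level at the last player drafted, minimum salary for everyone drafted, marginal dollars spread over marginal score).

```python
def roto_values(total_scores, players_drafted, league_budget, min_salary=1):
    """Return a dollar value per player; undrafted players are worth 0."""
    ranked = sorted(total_scores.items(), key=lambda kv: kv[1], reverse=True)
    drafted = ranked[:players_drafted]
    replacement_score = drafted[-1][1]  # the last guy taken goes for the minimum

    marginal_scores = {name: score - replacement_score for name, score in drafted}
    marginal_dollars = league_budget - min_salary * players_drafted
    total_marginal_score = sum(marginal_scores.values())

    values = {name: 0 for name in total_scores}
    for name, marginal in marginal_scores.items():
        values[name] = min_salary + marginal * marginal_dollars / total_marginal_score
    return values


scores = {"player1": 400, "player2": 300, "player3": 200,
          "player4": 100, "player5": 50, "player6": 20}
values = roto_values(scores, players_drafted=4, league_budget=64)
print({name: round(v) for name, v in values.items()})
```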
FANTASY CENTRAL (February 21, 2004)
Posted 9:19 a.m.,
March 8, 2004
(#115) -
tangotiger
As for being underwhelmed by PECOTA, I'm not sure what you are expecting. There are no soothsayers here. Anyone who can predict a "breakout" is full of it. The best you can do is establish a probability distribution to that player's true talent level, and then you can establish a probability distribution of a performance level, based on the player's true talent probability distribution.
For example, say that the MLB player is a "100", and Bonds is a "200", and a top minor leaguer is an "80". You've got Javier Vazquez. What do you do?
Well, you try to figure out what his probable true talent level is. You figure that he's a 130. But, that's a best guess. He's more likely to be a 130, with 1 SD = 15. So, you are 68% sure that he's a 115 to 145, and 95% sure that he's a 100 to 160, etc, etc. (Not quite so symmetrical). There is a chance that he's actually a below average pitcher.
Now that you've got that, for every point on the distribution curve, you have to figure what the likelihood of him performing at various levels if given only 1000 PAs. So, at the 100 level, he's got a prob distribution of 100, with 1 SD = 20. At the 101 level, he's got a prob distribution of 101, with 1 SD = 20. (Again, not so symmetrical).
You add it all up, and you end up with a weighted performance level of 130, with 1 SD = 20. You break that out into "percentile" rankings, and you get your answers.
***
You should also realize that there is another dimension: number of PAs. The fewer PAs, the lower the performance level. Why? Injuries, or just bad luck, and the manager not sticking with you. So, that's another probability distribution to consider.
***
In any case, just be happy with the true talent level. All the other stuff is interesting, but not really useful.
FANTASY CENTRAL (February 21, 2004)
Posted 10:54 a.m.,
March 8, 2004
(#117) -
tangotiger
Don't do replacement level by each stat. You have replacement level players, and not replacement level HRs.
***
"I guess I missed the part where you subtracted out league-average stats from each player's category totals"
For counting stats, you don't have to do that, because it all comes out in the wash.
***
As for batting average, see posts: 47 through 52.
FANTASY CENTRAL (February 21, 2004)
Posted 5:16 p.m.,
March 15, 2004
(#127) -
tangotiger
It should be based on the last reliever to be picked. The last player at every position is worth exactly 1 dollar. How much value Gagne has above this player compared to Vlad is what decides who you should pick.
If Vlad has "150 units", Gagne has "70 units", the last OF chosen has 100 units, and the last reliever expected to be chosen is 10 units, then it is irrelevant that Wagner has 60 or 30 or 69 units. Gagne is 60 above and Vlad is 50 above. (Assuming that all units are equal).
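Spelled out, the comparison is just each player against the last one expected to be taken at his own position. The names in this snippet are made up; the unit counts are the ones from the example.

```python
def units_above_replacement(player_units, last_drafted_units):
    """Value over the last player expected to be drafted at the same position."""
    return player_units - last_drafted_units


gagne = units_above_replacement(70, 10)    # last reliever expected to go: 10 units
vlad = units_above_replacement(150, 100)   # last OF expected to go: 100 units

print(gagne, vlad)  # 60 50
```

Wagner's number never enters into it, which is the point.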
FANTASY CENTRAL (February 21, 2004)
Posted 7:46 p.m.,
March 15, 2004
(#129) -
tangotiger
I'm not saying to draft Wagner! I meant that it doesn't matter how much Gagne is above Wagner.
FANTASY CENTRAL (February 21, 2004)
Posted 10:38 p.m.,
March 15, 2004
(#133) -
tangotiger
I agree with Sky. The only way that you care about the "next best player" is with very extreme distributions of positions. Even then, it's hard to believe that this can even have much of an effect when you consider the number of positions in baseball.
Maybe in basketball, if you have your positions as guard, center, forward. And, if there's a big skew in the distributions of centers. Only then do I think I can even consider buying it. And, I'm not even sure about that either.
FANTASY CENTRAL (February 21, 2004)
Posted 11:51 a.m.,
March 16, 2004
(#137) -
tangotiger
But, you are assuming only 2 positions. In the baseball scheme of things (with the 8 position players, and pitchers), I don't think your scenario has much if any impact. Just a guess on my part.
What you need to do is run Monte Carlo.
FANTASY CENTRAL (February 21, 2004)
Posted 3:05 p.m.,
March 19, 2004
(#142) -
tangotiger
I have no way of telling by your point listing. You need to:
1 - apply your point system to hitters and to pitchers
2 - draw a line at the number of hitters and pitchers who will be selected (preferably based on position, but not necessary for something Q&D)
3 - reset your points as points above replacement hitter and pitcher
4 - average your new hitter points and pitcher points
That'll tell you if there's any favoritism.
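The four steps above can be sketched roughly as follows. The point totals and the number of players selected are hypothetical, and this is the quick-and-dirty version that ignores position.

```python
def points_above_replacement(points, n_selected):
    """Keep the top n_selected players and rebase to the last one kept."""
    ranked = sorted(points, reverse=True)[:n_selected]
    replacement = ranked[-1]  # the last player who would be drafted
    return [p - replacement for p in ranked]


def average(values):
    return sum(values) / len(values)


# Step 1: hypothetical point totals after applying a point system.
hitter_points = [520, 480, 455, 430, 400, 390, 360, 300]
pitcher_points = [610, 500, 420, 380, 350, 310]

# Steps 2 and 3: draw the line and reset points relative to replacement.
hitters_par = points_above_replacement(hitter_points, n_selected=6)
pitchers_par = points_above_replacement(pitcher_points, n_selected=4)

# Step 4: compare the averages; a large gap suggests the system favors one group.
avg_hitter = average(hitters_par)
avg_pitcher = average(pitchers_par)
print(round(avg_hitter, 1), round(avg_pitcher, 1))
```

In this made-up example the pitchers average far more points above replacement than the hitters, which would be a sign of favoritism toward pitchers.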
FANTASY CENTRAL (February 21, 2004)
Posted 10:19 p.m.,
March 21, 2004
(#149) -
tangotiger
Funny thing... I was doing them while my kid was sleeping. I'm almost done.
FANTASY CENTRAL (February 21, 2004)
Posted 1:58 p.m.,
March 23, 2004
(#152) -
tangotiger
I set the replacement level at about the 70th best OF, but at the end of the day, I should have set it at about the 40th best OF
How many teams were there, was it AL+NL or one league only, and how many OF subs and "general" subs did you have to choose?
FANTASY CENTRAL (February 21, 2004)
Posted 10:53 p.m.,
March 24, 2004
(#154) -
tangotiger
60-70 OF would be what I would have figured as well. Now, you mentioned that about 20% of those in your top 70 OF list were not drafted. What about the other positions? Was there the same kind of sub-optimal drafting?
If it's the case that there is a particular position that you EXPECT to have sub-optimal drafting, then you would adjust your base accordingly.
While we like to say to draw a line at the 140 hitters, we're really saying to draw a line at the number of hitters, out of those 140, that would be selected. So, if only 120 of those 140 hitters would have been selected, draw a line at 120.
FANTASY CENTRAL (February 21, 2004)
Posted 3:07 p.m.,
March 25, 2004
(#156) -
tangotiger
It's certainly possible that your group specifically overvalued OF (or undervalued their replacement level). Same thing happens in hockey with defensemen.
EconPapers: Steven Levitt (February 24, 2004)
Posted 5:33 p.m.,
February 24, 2004
(#1) -
tangotiger
(homepage)
More from Levitt, including the penalty-kick paper.
EconPapers: Steven Levitt (February 24, 2004)
Posted 6:54 a.m.,
February 25, 2004
(#5) -
tangotiger
Fish: this forum is not limited to baseball, though the emphasis is baseball.
If you check out the index that I compiled, you will see a handful of links on other sports, as well as statistical concepts.
EconPapers: Steven Levitt (February 24, 2004)
Posted 4:33 p.m.,
February 25, 2004
(#6) -
tangotiger
Fish said: Am I missing something, or is this link just for the sake of admiring data in other realms?
Did you read the soccer link? I just finished reading it. It's fascinating stuff. This is a great example of game theory, with a minimal number of variables at play (unlike say the pitcher/batter matchups). I look forward to reading a few more of Levitt's links. The application of his methods can certainly be translated in some form to baseball.
To answer your question: no, it's not just for the sake of admiring "data" in other realms. It's for the sake of enjoying, and perhaps even learning, expressions of ideas beyond baseball. And, if we're lucky, applying them to baseball.
Baseball Prospectus - : Evaluating Defense (March 1, 2004)
Posted 10:16 a.m.,
March 2, 2004
(#12) -
tangotiger
(homepage)
In addition to getting everything you want about UZR from the Primate Studies Index, you can go to the above homepage link to read all about UZR, along with the dozens of comments. Take a few hours to go through both articles and commentary.
Baseball Prospectus - : Evaluating Defense (March 1, 2004)
Posted 10:17 a.m.,
March 2, 2004
(#13) -
tangotiger
(homepage)
As well, you should read Mike Emeigh's 8-part series at the above homepage link for a primer on all fielding metrics.
Baseball Prospectus - : Evaluating Defense (March 1, 2004)
Posted 11:16 a.m.,
March 2, 2004
(#16) -
tangotiger
You develop a system to maximize the information at hand. If that means using UZR for 1989 to today, and using something else for other years, fine.
A good process would baseline their system against maximum data to use in those years with minimum data. That is, we know how many GB and FB are hit from 1989 to today. If you have a way of trying to estimate that, then you should ensure that you do so against what we know (with an error range). Once you have that system done, you apply it backwards in time (with that same error range, and perhaps a little higher). Same thing with lefty/righty, grass/turf, etc.
I think Charlie Saeger / Mike Emeigh do something like this, but I'm not sure to the extent that it is done.
Baseball Prospectus - : Evaluating Defense (March 1, 2004)
Posted 11:17 a.m.,
March 2, 2004
(#17) -
tangotiger
(homepage)
Larry, here's one more link for you. James Fraser and Sylvain have done just as you said, and every article related to fielding is on the above page. (Not sure when it was last updated)
Baseball Prospectus - : Evaluating Defense (March 1, 2004)
Posted 12:03 p.m.,
March 2, 2004
(#18) -
tangotiger
Rolen's UZR runs and games from 1999 to 2003:
99: +26, 101
00: +25, 116
01: +33, 154
02: +28, 155
03: +3, 148
Pinto has him at +6 outs (or +5 runs) for 2003.
That's certainly quite a change in performance. I'll take a guess and say that it is statistically significant, but maybe someone else can chime in here.
Figure that from 99 to 02 he did: 2104 BIP, and .767 outs per BIP (lg of .700). In 03, he'd be 592 BIP, and .706 outs per BIP.
So, his probable true talent in 99-02 was .756. His 03 performance would be 3 SD away from this.
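A rough check of the "3 SD" figure, assuming each ball in play is an independent trial so that the binomial standard deviation applies (that assumption is mine, not stated in the post). The inputs are the numbers quoted above.

```python
import math

true_talent = 0.756    # regressed 1999-2002 outs per BIP, from the post
observed_2003 = 0.706  # 2003 outs per BIP
bip_2003 = 592         # 2003 balls in play

# Binomial SD of an outs-per-BIP rate over 592 BIP at the assumed true talent.
sd = math.sqrt(true_talent * (1 - true_talent) / bip_2003)

# How many SDs the 2003 rate sits below the 1999-2002 true-talent estimate.
z = (observed_2003 - true_talent) / sd
print(round(sd, 4), round(z, 2))
```

Under that assumption the gap comes out a little under 3 SD, which lines up with the estimate in the post.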
Baseball Prospectus - : Evaluating Defense (March 1, 2004)
Posted 12:05 p.m.,
March 2, 2004
(#19) -
tangotiger
Of course it is statistically significant (pretty much anything is if you put your level low enough). I meant if it was significant at the 95% level.
Baseball Prospectus - : Evaluating Defense (March 1, 2004)
Posted 1:23 p.m.,
March 2, 2004
(#22) -
tangotiger
Colin, agreed.
***
The BP author was kind enough to reply to my rant. In essence, the reply is that the target audience of the piece doesn't necessarily include someone like me. I'm ok with that.
My points are:
1 - calling something as straight-forward as ZR "enigmatic" is, in my view, unenlightening (ZR is to a fielder what OBA is to a batter); a new reader, or the target reader, might take that at face value, and not even question it
2 - that if you decide to write those parts that I have noted in bold, then UZR should have been mentioned
Take out those parts in bold, and the article itself does a good job of conveying its message to its target audience.
More Help Requested (March 4, 2004)
Posted 11:53 a.m.,
March 5, 2004
(#3) -
tangotiger
Actually Alan, I specifically asked only for honest opinions of those people who watched at least 20 games of the player. Here's the full ballot in question, with the person's email removed. (To that person: thank you so much for adding a few hours of work to my plate to look for other junk ballots.)
Coomer, Ron 1 1 1 2 1 2 2
Giambi, Jason 1 1 1 1 1 1 1
Jeter, Derek 4 5 5 5 5 5 5
Johnson, Nick 2 2 2 2 2 2 2
Matsui, Hideki 2 2 2 2 2 2 2
Mondesi, Raul 1 1 1 1 1 2 2
Posada, Jorge 5 4 4 4 4 4 4
Soriano, Alfonso 3 3 3 3 3 3 3
Spencer, Shane 3 3 3 3 3 3 3
Ventura, Robin 3 3 3 3 3 3 3
White, Rondell 1 1 1 1 1 1 1
Williams, Bernie 5 5 5 5 5 5 5
That to me just reads junk. While Shane Spencer's line is defensible, there are enough junk picks in there that I have no problem believing that this person was not being honest.
You can easily flag several picks in there as being indefensible. Bernie's arm is nowhere near comparable to Vlad's. It's comparable to mine.
Your point about analyzing the data with and without the changes is a good one. I can almost see 3 types of ballots: (1) honest, (2) questionable, but maybe defensible, (3) lies.
I have no problem taking the above ballot as a lie, and removing it as if it didn't exist. I'm hoping that I won't have many, if any, ballots in case #2. If I do, then I would follow your suggestion.
Thanks for the thoughts...
More Help Requested (March 4, 2004)
Posted 1:07 p.m.,
March 5, 2004
(#7) -
tangotiger
Variability: I'll have to check that out.
One bad/goofy datapoint out of thirty won't affect any end-analysis too much, I wouldn't think
See, that's the problem. Take out the "5" and the SD of the remaining ballots on Bernie's arm is .5. Include the junk ballot, and the SD is .8. The lower the SD, the more agreement there is on a trait. But, using the junk ballot, there's only as much agreement on Bernie's arm as there is on all his other traits. Remove the junk, and the agreement soars.
In this case, this was 1 out of 40 ballots for Bernie. Imagine those players with only 10 ballots.
I'm hoping that this yahoo is the only person that has this kind of an effect. We'll see...
More Help Requested (March 4, 2004)
Posted 2:25 p.m.,
March 5, 2004
(#9) -
tangotiger
I only used Bernie as an example, because it was so obvious.
The problem with what you are saying is for cases like Ichiro or Beltran or Shane Spencer or Jeremy Giambi, where plenty of ballots have put strings of 5 or 4 or 3 or 1, i.e., they are even in their talent traits across the board. Agreed that to see that many strings on one ballot would be tough, but then again, some people only fill out a few names on a ballot. So, if someone only picks Cameron and Ichiro, I shouldn't flag that ballot (or if I flag it, I will accept it upon investigation).
My current preference is to use:
- ballot minus group average
- if the difference is > 2, flag
- if # of flags > 3, investigate to delete
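A minimal sketch of the flagging rule listed above. The players, traits, and scores are hypothetical, and the function names are made up for the example; the thresholds (a difference of more than 2, more than 3 flags) come from the list.

```python
def count_flags(ballot, group_means, max_diff=2.0):
    """Count selections that differ from the group average by more than max_diff."""
    flags = 0
    for key, score in ballot.items():
        if abs(score - group_means[key]) > max_diff:
            flags += 1
    return flags


def needs_investigation(ballot, group_means, max_flags=3):
    """A ballot with more than max_flags flagged selections gets a manual look."""
    return count_flags(ballot, group_means) > max_flags


# Hypothetical group averages keyed by (player, trait).
group_means = {
    ("A", "Speed"): 4.2,
    ("A", "Arm"): 1.4,
    ("B", "Speed"): 2.0,
    ("B", "Arm"): 4.5,
    ("C", "Speed"): 3.1,
}

honest_ballot = {("A", "Speed"): 4, ("A", "Arm"): 1, ("B", "Speed"): 2,
                 ("B", "Arm"): 5, ("C", "Speed"): 3}
junk_ballot = {("A", "Speed"): 1, ("A", "Arm"): 5, ("B", "Speed"): 5,
               ("B", "Arm"): 1, ("C", "Speed"): 3}

print(count_flags(honest_ballot, group_means), needs_investigation(honest_ballot, group_means))
print(count_flags(junk_ballot, group_means), needs_investigation(junk_ballot, group_means))
```

A flagged ballot is only a candidate for deletion; the rule says to investigate, not to delete automatically.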
However, thinking further, a lot of the junk balloters are really lazy, and they would likely have used the same value across the board.
I'll have to think about it some more.
More Help Requested (March 4, 2004)
Posted 3:02 p.m.,
March 5, 2004
(#10) -
tangotiger
Using my flagging method in my last post, the person who put that Bernie ballot out was flagged with 11 entries out of 84 selections on the ballot. I found another one with 38 entries out of 76 selections, someone calling themselves a Sox fan, and marking all their players as 1 practically.
Here are the biggest offenders:
FanID n junk junkPercent
88 76 38 50
389 28 10 36
248 63 11 17
241 84 11 13 .... that's the Bernie fan
362 105 9 9
I've got over 400 ballots, so that's comforting. I'll check out the other ballots, but I'm sure they're of the same variety.
I should have implemented a registration system.
More Help Requested (March 4, 2004)
Posted 3:08 p.m.,
March 5, 2004
(#11) -
tangotiger
Using Alan's method of SD, fan 88, 389, and 241 are in the top 4.
Fans 248 and 362 that I've flagged as junk, have a normal SD. It's almost like they tried to make their ballot look reasonable (like the exact opposite of what they really felt). I'll check out those ballots.
More Help Requested (March 4, 2004)
Posted 3:14 p.m.,
March 5, 2004
(#12) -
tangotiger
In reply to J CRoss, here is the average of the SD (take each player's SD by tool, and average that):
Instincts: .82
First Step: .80
Speed: .70
Hands: .75
Footwork: .77
Arm strength: .75
Arm Accuracy: .78
So, there's not as much agreement as you might expect.
More Help Requested (March 4, 2004)
Posted 3:31 p.m.,
March 5, 2004
(#14) -
tangotiger
It might be a sign that the fans listen to the same announcers and analysts too.
More Help Requested (March 4, 2004)
Posted 4:02 p.m.,
March 5, 2004
(#17) -
tangotiger
I don't have much time for anything (but that's rarely stopped me unfortunately), and the SD are severely affected even with only 1 junk responder, as noted earlier.
If I look at that ballot in post #3, I can't just leave it in. I realize "where do I draw the line". But that ballot, and another ballot by a purported Sox fan listing all players as 1 in all traits. That's a gulf.
I'm also thinking that showing the results pre and post junking is going to be a time-waster. I don't see how anyone would ask me for the results with that junk ballot put back in.
I think what I will do is list all ballots that I consider junk. If I get a couple of you to say "hmmm, that ballot seems weird, but I can live with it", then I'll put it back in.
More Help Requested (March 4, 2004)
Posted 4:12 p.m.,
March 5, 2004
(#18) -
tangotiger
Correlation (r) between Speed and...
Instincts: .41
First Step: .78
Hands: .33
Footwork: .31
Arm Strength: .18
Arm Accuracy: .23
All of them: .84
All of them (Except First Step): .41
All of them (Except First Step and Instincts): .34
More Help Requested (March 4, 2004)
Posted 5:05 p.m.,
March 5, 2004
(#20) -
tangotiger
Means, SD, sim scores, regressions with UZR all at the player/Tools, player and position/tools level.
The SD won't cancel out enough with Bernie. His SD is either .5 or .8 for throwing arm. What I'm going to present is a "level of agreement", and show it either as the SD (.5) or convert that into some sort of percentage, like 75% or something. A .8 is the same SD as for an average player's trait. And one thing is for certain, if it wasn't for that moron, almost everyone agrees that Bernie is either a 1 or 2 for arm.
Maybe we can ignore that person's ballot without throwing it out. Say we have 40 Bernie/arm ballots, and it shows:
1 - 30
2 - 7
3 - 2
4 - 0
5 - 1
The group mean is 1.375, and the mode is 1.
So, 75% agree that he's a 1. 92.5% agree that he's a 1 or 2. 97.5% agree that he's a 1,2,3. 97.5 as 1,2,3,4.
In terms of "level of agreement", what if I weight the first number as "4", the second as "3", the third as "2", and the fourth as "1". This will give me a level of agreement of: 87%.
If I threw out the junk ballot, I'd get: 89%. So, it's almost not worth throwing out.
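The "level of agreement" arithmetic can be reproduced with a short sketch. It accumulates ratings in ascending order starting from 1, which matches the Bernie example where the mode is 1; how to order the ratings when the mode sits in the middle of the scale is not specified in the post, so that case is left out. The function name is made up for the example.

```python
def level_of_agreement(counts, weights=(4, 3, 2, 1)):
    """Weighted average of cumulative shares: share of 1s, of 1-2s, of 1-3s, of 1-4s.

    counts[i] is the number of ballots giving a rating of i + 1.
    """
    total = sum(counts)
    cumulative = 0
    score = 0.0
    for count, weight in zip(counts, weights):
        cumulative += count
        score += weight * cumulative / total
    return score / sum(weights)


with_junk = [30, 7, 2, 0, 1]     # the 40 Bernie/arm ballots from the post
without_junk = [30, 7, 2, 0, 0]  # same ballots with the lone "5" removed

print(round(level_of_agreement(with_junk), 3))
print(round(level_of_agreement(without_junk), 3))
```

The junk ballot moves this measure by only about two points, whereas it moves the SD from .5 to .8.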
I think this better expresses how the ballots look, than what the SD tries to do.
Thoughts?
More Help Requested (March 4, 2004)
Posted 10:22 a.m.,
March 8, 2004
(#22) -
tangotiger
Alan, I agree that this would only make sense to me. Showing a "% of respondents who agreed on a 3/4 was 75%" is much clearer.
More Help Requested (March 4, 2004)
Posted 6:44 p.m.,
March 11, 2004
(#24) -
tangotiger
I agree that 1 or 2 outliers should stay in.
In the cases that I cited, in one of them, 11 of the selections were outliers (that ballot at the top of this thread). In another, 36 of the 72 selections were outliers.
In all, I had less than 10 ballots where at least 10% of the selections were outliers. I would only consider those, and as well, I would publish those ballots so that the reader will be free to add those back in if he so chooses.
More Help Requested (March 4, 2004)
Posted 10:03 a.m.,
March 18, 2004
(#25) -
tangotiger
I'm thinking about regression towards the mean, based on the number of ballots cast. For example, someone complained about Neifi Perez being one of the lowest ranked fielders in MLB, when he "surely" is above average. My problem is that only 4 people voted for Neifi, while 58 voted for Yankee fielders.
So, what I need to do is regress the responses, and apply a confidence interval. In this case, if Neifi is, based on 4 ballots, a 25 on a scale of 0 to 100, with 50 as average, and the SD is 20, I would want to convert that into something like:
true talent = 40
1 SD = 15
(numbers for illustration)
When I check out what IMDB.com does with their movies:
http://us.imdb.com/top_250_films
they note the following:
The formula for calculating the top 250 films gives a true Bayesian estimate:
weighted rank (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
R = average for the movie (mean) = (Rating)
v = number of votes for the movie = (votes)
m = minimum votes required to be listed in the top 250 (currently 1250)
C = the mean vote across the whole report (currently 6.9)
So, it seems IMDB uses a regression towards the mean equation similar to what I use for baseball (x/(x+PA)). The key value in the IMDB weighting is the "m" value.
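Here is a small sketch of the weighted-rank formula quoted above, applied to the Neifi example (4 ballots, a raw score of 25, a mean of 50). The values of m tried below are hypothetical and are only there to show how much the answer depends on that choice.

```python
def weighted_rating(R, v, m, C):
    """Weighted rank: (v / (v + m)) * R + (m / (v + m)) * C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


raw_score = 25  # Neifi's average on the 0-100 scale
ballots = 4     # number of ballots cast for him
mean = 50       # average fielder

# Hypothetical regression weights; the right value of m is the open question.
results = {m: weighted_rating(raw_score, ballots, m, mean) for m in (2, 6, 12)}
for m, value in results.items():
    print(m, round(value, 1))
```

With m = 6 the regressed score lands on the illustrative "true talent = 40" from the post, but that is a coincidence of the numbers chosen here, not a recommended value.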
Any thoughts as to what that m should be in my case?
More Help Requested (March 4, 2004)
Posted 8:03 p.m.,
March 20, 2004
(#28) -
tangotiger
The "m" is the specific weight, in this case, 1250. As they use it here, they mean 2 things: 1) the regression towards the mean component, so, the fixed value of 1250, and 2) "oh, and by the way, we only list those movies that are regressed at most 50% towards the mean".
More Help Requested (March 4, 2004)
Posted 11:12 p.m.,
March 20, 2004
(#30) -
tangotiger
I didn't think bayesian systems used cut offs for inclusion
I think the cutoff was something that was added and has nothing to do with Bayesian. Just like if I decided to give everyone 5 votes of "3" (average), that I still wouldn't want to include a guy with 1 vote of a "5". The total would be 3.33, but that really tells me nothing about that guy.
Same here. A movie gets 100 votes of "10". But, every movie starts off with 1250 votes of "7". That would give this little-seen movie a "true talent" ranking of 7.2. Heck, my home movies would get a 7.1.
However, making the "m" as 1250... fine... I have no idea how they figured that one out, other than the way I would do it with OBA in MLB. 1250 sure looks way too high. But, then going out and also making that the cutoff? I guess they thought it would be too weird to show the sample average as 10.0, and the weighted average of 7.2.
More Help Requested (March 4, 2004)
Posted 12:52 p.m.,
March 21, 2004
(#32) -
tangotiger
Alan,
I had thought that they just drew a line at 1250, and whatever movies fell above that line (say 2,000), those movies qualify. Then, using 1250 again (coincidentally) as the regression towards the mean component, they put out the weighted "true score" list.
However, what you are suggesting is that they first selected the top 250 unweighted movies, and the movie with the fewest votes in that list is 1250, and then proceeded. I don't think that's right.
More Help Requested (March 4, 2004)
Posted 2:06 p.m.,
March 22, 2004
(#35) -
tangotiger
Here are the ballots I rejected (6), along with the reason. These ballots were rejected on the following basis: more than 2 selections (and 10% of the selections on the ballot) that differed from the mean by more than two levels. Out of nearly 500 ballots, only these 6 were rejected.
FanID Team Player Instincts FirstStep Speed Hands Release Strength Accuracy
210 TOR Cash, Kevin 1 1 1 1 1 1 1
210 TOR Delgado, Carlos 5 5 5 5 5 5 5
210 TOR Wells, Vernon 5 5 5 5 5 5 5
210 TOR Woodward, Chris 1 1 1 1 1 1 1
Delgado was an across-the-board "5". Having across-the-board "1" on Cash and Woodward didn't help either.
259 NYA Coomer, Ron 1 1 1 2 1 2 2
259 NYA Giambi, Jason 1 1 1 1 1 1 1
259 NYA Jeter, Derek 4 5 5 5 5 5 5
259 NYA Johnson, Nick 2 2 2 2 2 2 2
259 NYA Matsui, Hideki 2 2 2 2 2 2 2
259 NYA Mondesi, Raul 1 1 1 1 1 2 2
259 NYA Posada, Jorge 5 4 4 4 4 4 4
259 NYA Soriano, Alfonso 3 3 3 3 3 3 3
259 NYA Spencer, Shane 3 3 3 3 3 3 3
259 NYA Ventura, Robin 3 3 3 3 3 3 3
259 NYA White, Rondell 1 1 1 1 1 1 1
259 NYA Williams, Bernie 5 5 5 5 5 5 5
No one would ever rate Bernie's arm as the best in the league. Williams' best comp I think was Rondell White. Considering they both played on the same team, this ballot is just disgusting. Raul Mondesi's arm strength a "1"? The whole same-score listing for each player just shows what utter nonsense this ballot is.
311 NYA Giambi, Jason 2 2 1 3 2 3 4
311 NYA Jeter, Derek 1 1 1 1 1 1 1
311 NYA Johnson, Nick 4 3 3 4 4 4 4
311 NYA Matsui, Hideki 2 2 1 3 2 2 1
311 NYA Mondesi, Raul 3 4 4 3 2 4 4
311 NYA Posada, Jorge 1 1 1 1 1 1 1
311 NYA Rivera, Juan 4 5 4 3 3 3 4
311 NYA Soriano, Alfonso 2 3 5 2 2 3 2
311 NYA Williams, Bernie 2 1 4 3 2 3 3
This fan doesn't care at all for Derek Jeter. A lot of the other selections are justifiable, but this blatant attempt to bring down Jeter invalidates the whole ballot.
387 BOS Clark, Tony 5 1 1 0 5 5 5
387 BOS Damon, Johnny 1 1 1 1 1 1 1
387 BOS Daubach, Brian 1 1 1 1 1 1 1
387 BOS Garciaparra, Nomar 1 1 1 1 1 1 1
387 BOS Millar, Kevin 1 1 1 1 1 1 1
387 BOS Mirabelli, Doug 1 1 1 1 1 1 1
387 BOS Mueller, Bill 1 1 1 1 1 1 1
387 BOS Nixon, Trot 1 1 1 1 1 1 1
387 BOS Ortiz, David 1 1 1 1 1 1 1
387 BOS Ramirez, Manny 1 1 1 1 1 1 1
387 BOS Varitek, Jason 1 1 1 1 1 1 1
Yup, the worst fielders in the whole league at every position. The worst ballot I've ever seen.
396 CIN Boone, Aaron 5 5 5 4 5 5 3
396 CIN Branyan, Russ 2 1 2 1 0 0 0
396 CIN Casey, Sean 2 2 1 3 2 2 1
396 CIN Castro, Juan 4 4 3 5 4 3 4
396 CIN Dunn, Adam 1 1 4 2 5 5 3
396 CIN Griffey Jr., Ken 5 5 1 5 3 1 2
396 CIN Guillen, Jose 3 3 3 3 5 5 5
396 CIN Kearns, Austin 5 5 4 5 5 5 4
396 CIN Larkin, Barry 1 1 1 1 1 1 1
396 CIN Larson, Brandon 2 2 1 3 3 4 3
396 CIN LaRue, Jason 3 0 0 2 4 5 3
396 CIN Pena, Wily Mo 1 1 3 1 1 4 1
Similar to the Jeter ballot, this person thinks that Larkin is a rundown completely decaying SS, which is a far cry from the overall balloting.
407 BOS Damon, Johnny 1 1 1 1 1 1 1
407 BOS Garciaparra, Nomar 1 1 1 1 1 1 5
Again, an attempt to bring down Red Sox players.
418 CHA Crede, Joe 5 0 0 0 0 4 4
418 CHA Lee, Carlos 3 3 4 5 0 0 0
418 CHA Valentin, Jose 1 1 1 1 0 1 3
Again, another fan that can't stand Valentin. The rest of the ballots had Valentin as: 3.9, 4.1, 3.4, 2.8, 3.1, 4.3, 2.1. That's 4 huge discrepancies.
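The count of discrepancies on the Valentin line can be checked with a quick sketch. It assumes the averages quoted above follow the same column order as the ballot header, and that a "0" means the trait was left blank; both are my reading of the listing, not something stated outright.

```python
traits = ["Instincts", "FirstStep", "Speed", "Hands", "Release", "Strength", "Accuracy"]
ballot_418 = [1, 1, 1, 1, 0, 1, 3]                 # fan 418's Valentin line
others_mean = [3.9, 4.1, 3.4, 2.8, 3.1, 4.3, 2.1]  # rest of the ballots

# A selection is a discrepancy if it differs from the mean by more than two levels.
# Scores of 0 are skipped on the assumption that they mark an unrated trait.
discrepancies = [
    trait
    for trait, score, mean in zip(traits, ballot_418, others_mean)
    if score != 0 and abs(score - mean) > 2
]
print(discrepancies)
```

Under those assumptions the sketch finds the same 4 discrepancies, which is more than 2 selections and well over 10% of a 3-player ballot.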
More Help Requested (March 4, 2004)
Posted 7:12 a.m.,
March 23, 2004
(#38) -
tangotiger
The ballots are only junk if the person doesn't believe what they are saying, not if they're incorrect.
I had already emailed 2 of the balloters, and they did not reply. Two others did not leave their email addresses. I'm going to email the other 2 today. As far as I'm concerned, these are not observational biases, but an attempt to force an outcome. Observational biases even out. Since I set pretty loose standards to catch junk ballots, I'm sticking with this.
Relocation and the effect on attendance (March 5, 2004)
Posted 11:45 a.m.,
March 5, 2004
(#2) -
tangotiger
I just think that if you have 10 million people born in NJ in the next 10 years, and 1 million of them will be baseball fans, maybe 250,000 of them would become NJ Expos fans and reject their fathers' allegiances to Yanks or Mets.
No question that there are enough fans to handle 3 or 4 franchises in the NY area. The Yanks and Mets want them all to themselves.
Pappas - Marginal $ / Marginal Wins (March 9, 2004)
Posted 5:30 p.m.,
March 9, 2004
(#1) -
tangotiger
There's an addendum as well:
====================
A note on where I got my figures: canoe.ca provided the figures which are the average salary per player per team. I made the assumption of 30 players per team. And of course, I have no idea whether these salaries are at start of year, or aug 30, or whatever. As I said, I would have used different numbers if given the access.
Pappas - Marginal $ / Marginal Wins (March 9, 2004)
Posted 9:24 a.m.,
March 10, 2004
(#4) -
tangotiger
Bob, the results of what? Pappas? Me? Neil?
In any case, we ALL use marginal wins per marginal dollars. The equation is the following:
wins = a (salary) + b
Even if you want it to be:
wins = a (salary - c) + b
or
wins - d = a (salary - c) + b
it's ALL the same thing.
Let's do that last one:
wins - d = a (salary - c) + b
adding "d" to both sides and this becomes
wins = a (salary - c) + b + d
expand the "a" part and we get:
wins = a (salary) - a(c) + b + d
make z = -a(c) + b + d
and we get:
wins = a (salary) + z
Pappas - Marginal $ / Marginal Wins (March 9, 2004)
Posted 10:01 a.m.,
March 10, 2004
(#5) -
tangotiger
(homepage)
I took the above chart, and plotted it. r = .81.
In terms of how much money teams pay and how much production they get out of it, I think a linear fit works pretty well.
Pappas - Marginal $ / Marginal Wins (March 9, 2004)
Posted 10:13 a.m.,
March 10, 2004
(#6) -
tangotiger
More tidbits:
If your team payroll is 60 million$ below league average (say 10 million compared to a league of 70), your expectation is 30 wins below league average (or 51 wins or .315) with a 2 million$ / win converter and 20 wins below league average (or 61 wins or .377) with a 3 million$ / win converter. I have other reasons that I've discussed at length for using 2 million$ / win, so that's the one I prefer.
For a team that is 60 million$ above league average (say 130 million$ in a league of 70), at 2 million$ per win, that gives you 111 wins. I think that's about as good as you can expect. Any team that spends more than this amount is hoping to get a benefit beyond just wins.
The Yanks have NOT amassed a team that will win commensurate with their payroll. Being over 100 million$ above league average would translate to 131 wins. Their actual talent is more indicative of a team with 101 wins (or an equivalent payroll of 110 million$ or 40 million$ above league average).
What the Yanks are doing is spending 60 million$ more than they should for equivalent talent. The teams should be REJOICING that the Yanks are spending tons of money like this (unless they think these extravagant contracts are setting the market; but Kevin Brown and ARod were inherited, Vazquez is similar to Halladay, Sheffield is in-line with the market).
The Yankees are wasting 60 million$ on the hope that this money generates at least that much money based on the "brand" of players that they have.
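The payroll arithmetic in this post can be sketched as a simple linear converter. The function names are made up for the example, and it assumes a 162-game season with a league-average team at 81 wins; the dollar figures are the ones used above, in millions.

```python
GAMES = 162
AVERAGE_WINS = GAMES / 2  # a league-average team wins 81


def expected_wins(payroll_m, league_avg_payroll_m, dollars_per_win_m=2.0):
    """Wins expected from payroll, at a fixed number of million$ per marginal win."""
    return AVERAGE_WINS + (payroll_m - league_avg_payroll_m) / dollars_per_win_m


def equivalent_payroll(wins, league_avg_payroll_m, dollars_per_win_m=2.0):
    """Payroll that a given win total would cost at the same converter."""
    return league_avg_payroll_m + (wins - AVERAGE_WINS) * dollars_per_win_m


low_2 = expected_wins(10, 70)                       # 60 below average, 2 million$/win
low_3 = expected_wins(10, 70, dollars_per_win_m=3)  # same team, 3 million$/win
high = expected_wins(130, 70)                       # 60 above average
yanks = expected_wins(170, 70)                      # 100 above average
yanks_talent_payroll = equivalent_payroll(101, 70)  # payroll matching 101 wins

print(low_2, round(low_2 / GAMES, 3), low_3, round(low_3 / GAMES, 3))
print(high, yanks, yanks_talent_payroll)
```

The gap between the 170 million$ payroll and the 110 million$ equivalent is the 60 million$ described above.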
Pappas - Marginal $ / Marginal Wins (March 9, 2004)
Posted 12:36 p.m.,
March 10, 2004
(#9) -
tangotiger
What does the log of the salary have to do with it? The r using that is .76. I get a best-fit with salary^1.5. The r of that is .818. Otherwise, with no exponent, it's .812. Linear works fine.
All data is from 1995-1999.
Pappas - Marginal $ / Marginal Wins (March 9, 2004)
Posted 1:46 p.m.,
March 10, 2004
(#11) -
tangotiger
MGL, you might want to follow Pappas' series. It would be more useful if Doug were to normalize his data to 2003 dollars at some point.
The "r" represents the relationship between payroll and team wins. There is NO CAUSAL effect here. The causations are:
talent + luck = performance = wins
talent + years to free agency + mismanagement = salary
So, when I run a regression of wins to salary, we would hope that:
1: talent is the prevailing variable
2: teams have the same distribution of players with respect to years to free agency
Since we know that "2" is nowhere near the same, and since we know that teams have a lot of mismanagement, it's quite interesting that the r is as high as it is. Of course, if every team mismanages in the same way, that cancels out.
In terms of efficiency, like I said, each team should only be spending 2 million$ / win. That the best-fit shows 3 million$ doesn't mean I'm wrong. I could be. It just so happens on this small sample that it's 3.
Pappas - Marginal $ / Marginal Wins (March 9, 2004)
Posted 2:32 p.m.,
March 10, 2004
(#13) -
tangotiger
The answer to figuring a team's efficiency has already been given by my two equations here:
talent + luck = performance = wins
talent + years to free agency + mismanagement = salary
That second line shows you how smart/lucky a team was in matching salary to talent.
The 2004 Marcels (March 10, 2004)
Posted 2:52 p.m.,
March 10, 2004
(#3) -
tangotiger
I forgot about that. I used the BDB. I'll update that one. I'll post a revised file on Friday in case there are other changes needed.
The 2004 Marcels (March 10, 2004)
Posted 2:56 p.m.,
March 10, 2004
(#4) -
tangotiger
Hmmm... thanks Charlie. I see that I've got a bug in my age adjustments. I divided where I should have multiplied, and multiplied where I should have divided. This applies to all players.
Give me 5 minutes, and I'll reupload the whole thing.
The 2004 Marcels (March 10, 2004)
Posted 3:02 p.m.,
March 10, 2004
(#5) -
tangotiger
Ok, the latest version is up. You know you've got the latest version if Sori is shown with an age of 28. I aged Soriano from a YOB of 1978 to 1976. If someone has the correct year of birth, let me know.
The 2004 Marcels (March 10, 2004)
Posted 3:42 p.m.,
March 10, 2004
(#7) -
tangotiger
Correct. I did only and exactly what I listed above. It should be possible for someone to independently verify my results using only the 2001-2003 data from the Batting table, and the Master table (from BDB or Lahman).
Consider this to be a good exercise to those who want to improve their Access/Query/SQL skills.
***
I agree that park adjustments, "profile" adjustments (like strong, fast, smart, tall, skinny, athletic, etc) would be necessary to improve reliability.
The 2004 Marcels (March 10, 2004)
Posted 4:22 p.m.,
March 10, 2004
(#8) -
tangotiger
I added another column called "reliability". That shows how much of the forecast is based on his performance, and how much was regression towards the mean.
Bobby Abreu shows a .87. That means that I regressed towards the mean 13%. Using that, it should be easy enough to figure out a confidence interval for each of the stats. If I show a reliability of .00, this means that it is an absolute pure guess on my part.
The 2004 Marcels (March 10, 2004)
Posted 4:54 p.m.,
March 10, 2004
(#10) -
tangotiger
Hmmmm.... since these are csv files, Excel should automatically parse it for you properly. When you see the link of the file, do a "right-click" and "save target as". Then, open up Excel, and from Excel, open this csv file. Excel should automatically parse it for you.
If Excel doesn't, then do the following:
- Data / Text to Columns
- select Delimited
- click Comma and set the Text qualifier to none
- click Finish
The 2004 Marcels (March 10, 2004)
Posted 12:57 p.m.,
March 11, 2004
(#15) -
tangotiger
Agreed.
My intent is only to do what a monkey would do: the simplest forecasts possible: uses last 3 years of data weighted, regression, and age.
(Feel free to quibble that this monkey is too smart for a monkey.)
The 2004 Marcels (March 10, 2004)
Posted 6:36 p.m.,
March 11, 2004
(#17) -
tangotiger
No, it's just like the stock market. The stock price is based on all known information. What the price will be in 1 year is, for all intents and purposes, random. A monkey picking a stock to improve is like a monkey picking a player to perform better than his Marcel forecast.
The 2004 Marcels (March 10, 2004)
Posted 10:23 a.m.,
March 12, 2004
(#20) -
tangotiger
Yup, it should be 29 - age. That was the bug I reported in post #4.
The 2004 Marcels (March 10, 2004)
Posted 12:51 p.m.,
March 13, 2004
(#22) -
tangotiger(e-mail)
If you are asking if there's anything of the mundane things that I do that I'd like to take off my plate (like updating the Team Previews file, or my Primate Index file, or formatting MGL's superLWTS file [I don't have time for that one], etc), sure! If that's what you'd like to do, then email me.
The 2004 Marcels (March 10, 2004)
Posted 4:14 p.m.,
March 15, 2004
(#25) -
tangotiger
Ok, let's look at Carlos Beltran's HR forecast.
From 01 to 03:
HR: 24, 29, 26
PA: 680, 722, 602
lgHR/PA: .0300, .0279, .0285
***
Weighting his numbers on a 3/4/5 level, and we have:
HR: 318
PA: 7938
The league numbers would be:
.0300x680x3, .0279x722x4, .0285x602x5 = 228. That's the league mean HR for Beltran's 7938 PAs. Set this to 1200 PAs, and we have 34.4 league HR.
***
HR: 318 + 34.4 = 352.4
PA: 7938 + 1200 = 9138
Or, HR/PA = 352.4/9138 = .0386
Those are Beltran's expected rates
***
We are projecting Beltran at:
PA = 602*.5 + 722*.1 + 200 = 573 PA
***
573 PA x .0386 HR / PA = 22 HR
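The steps so far can be reproduced with a short sketch. It uses only the numbers in this post (3/4/5 weights, 1200 PA of league average, and the PA formula) and leaves out the age adjustment; variable names are made up for the example.

```python
hr = [24, 29, 26]                        # Beltran HR, 2001 to 2003
pa = [680, 722, 602]                     # Beltran PA, 2001 to 2003
lg_hr_per_pa = [0.0300, 0.0279, 0.0285]  # league HR/PA, 2001 to 2003
weights = [3, 4, 5]                      # most recent year weighted heaviest
REGRESSION_PA = 1200                     # PA of league-average performance added

# Weighted player totals.
weighted_hr = sum(w * h for w, h in zip(weights, hr))
weighted_pa = sum(w * p for w, p in zip(weights, pa))

# League HR over the same weighted PA, then scaled to 1200 PA.
league_hr = sum(w * p * r for w, p, r in zip(weights, pa, lg_hr_per_pa))
league_hr_1200 = league_hr / weighted_pa * REGRESSION_PA

# Regressed HR rate, projected playing time, and the resulting HR forecast.
rate = (weighted_hr + league_hr_1200) / (weighted_pa + REGRESSION_PA)
projected_pa = 0.5 * pa[2] + 0.1 * pa[1] + 200
projected_hr = rate * projected_pa

print(weighted_hr, weighted_pa, round(league_hr_1200, 1))
print(round(rate, 4), round(projected_pa), round(projected_hr))
```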
***
Beltran is 27, so the age adjustment has almost no impact.
***
See, the thing with Beltran is that he had an ENORMOUS number of PAs in 2001/2002. The projected PA for 2004 is HEAVILY influenced by his PAs in 2003 (rightly or wrongly).
His simple average number of PAs in 2001-2003 is 668 PAs, or almost 100 more PAs than I'm projecting him for. Give him 4 HR in that number of PAs, and you get to 26. And that matches his average.
The 2004 Marcels (March 10, 2004)
Posted 11:44 a.m.,
March 16, 2004
(#27) -
tangotiger
To everything, except PA and AB.
I actually have to fix that... it should be RATIOs relative to batting outs (AB-H), and not per PA.
The 2004 Marcels (March 10, 2004)
Posted 12:33 p.m.,
March 22, 2004
(#28) -
tangotiger
Ok, I have completed the 2004 Marcels for pitching. FTP is currently down, so I'll have to wait until that opens up.
It follows the exact same process as the Marcels for batting. Here are the particulars that are different (which you can line up with the top of this thread).
1 - Weights are 3/2/1.
2 - Removed nonpitchers pitching totals (i.e. Wade Boggs as a pitcher.)
3 - same
4 - used IP instead of PA. Change "200" to 25 for relievers and 60 for starters (or something in between for part-time starters based on GS/G).
5 - Same
6 - Same
Now, I need to make one final modification. Pitchers in the NL have a .2 or .3 ERA advantage (and big-time K advantage) over their counterparts in the AL (because of the DH). To make better forecasts, I need to know whether the pitcher is currently in the AL or NL. Right now, I have lumped everyone into the same league.
If someone wants to help me out, download the files (after I post them), and send me a csv file of all pitcherid and their leagues.
If I get no takers for this, I will repost the files with my own markings of a player's current league: last league pitched in. I'm not keen on going through each pitcher manually afterwards, like Clemens and Vazquez. Marcels will just have to be a little off on those.
As well, I added a category called bsrER, which is the "component" ER, based on BaseRuns. The ERA column is a 50/50 split between the mER column and bsrER columns.
That's it...
The 2004 Marcels (March 10, 2004)
Posted 4:44 p.m.,
March 22, 2004
(#29) -
tangotiger
Ok, they are all there now! For pitchers, I used "last league pitched in" as the baseline. So, for guys like Vazquez and Clemens, you'll have to mentally adjust them slightly. Over the last 3 years, the ERA in the AL was 0.25 higher than in the NL.
Per 9 IP, the HR rates are similar. 0.5 more K and 0.2 more BB in the NL. That's kind of weird. I'd expect more K because: pitchers batting and more HR allowed. I'd expect more BB because: more K and HR. I'd expect fewer BB because: pitchers batting. I'm surprised that the BB rate increased as much as it did.
If I make any changes, it will be before Opening Day. After that, that's it.
The 2004 Marcels (March 10, 2004)
Posted 4:45 p.m.,
March 22, 2004
(#30) -
tangotiger
Btw, for W/L, be careful! You need to look no further than Javier Vazquez to see how useless it is.
The 2004 Marcels (March 10, 2004)
Posted 3:23 p.m.,
March 23, 2004
(#32) -
tangotiger
Since we are after the pitcher's ERA, that includes the hits allowed by his fielders. FIP/DIPS wouldn't apply here.
***
I have added a file called: jtoMarcel.zip. This contains an Access 2000 database for the Marcel for 2001 to 2003. My program is now setup to generate the Marcels for any year in history. (It takes about 2 minutes to generate the data for each year.) Not now, but eventually, I might generate them for every year. It might be useful as a way for other forecasters to improve their engines.
The 2004 Marcels (March 10, 2004)
Posted 9:52 a.m.,
March 25, 2004
(#34) -
tangotiger
The best way would be some combination of:
- past ERA
- component ERA (BaseRuns)
- DIPS/FIP
Tom
MGL's superLWTS (March 10, 2004)
Posted 11:25 p.m.,
March 10, 2004
(#4) -
tangotiger
To be more accurate, you should regress each component independently. No need to lump in fielding and hitting together. The hitting component will regress with a value of 200 and fielding with 420. I don't buy MGL's comment about using 400 overall. That just looks wrong.
On the other hand, doing it independently is kinda wrong too. After all, a player is not just a sum of his parts. Knowing how good a fielder a player is might be another thing to use in the regression for him as a HITTER. (And vice-versa, possibly.)
***
Btw, I have nothing to do with superLWTS. I only post the file as-is.
MGL's superLWTS (March 10, 2004)
Posted 8:29 p.m.,
March 13, 2004
(#14) -
tangotiger
MGL sent me this last month with his file:
The defensive position listed for a player is his primary position
for that year, but in the defensive lwts category, his entire
combined UZR for the year is given. I know that's not right, but
it's close enough for government work.
The "position adjusted" categories are simply the unadjusted values
with the following adjustments made:
C +15
1B -11
2B +6
3B +3
SS +9
LF -11
CF -1
RF -8
DH -6
These are roughly derived from the 4-year Superlwts averages at each
position.
I removed the "moving runners over" category and essentially included
it (at least as far as handedness, GB and GB rate are concerned) in
the batting lwts category. Since last year, I added the SB/CS
categories.
2004 Team Previews Around the Web (March 10, 2004)
Posted 12:59 p.m.,
March 11, 2004
(#2) -
tangotiger
Thanks, I'll add them tomorrow.
2004 Team Previews Around the Web (March 10, 2004)
Posted 4:01 p.m.,
March 12, 2004
(#3) -
tangotiger
File has been updated.
2004 Team Previews Around the Web (March 10, 2004)
Posted 9:45 a.m.,
March 15, 2004
(#6) -
tangotiger
Team previews has been updated.
2004 Team Previews Around the Web (March 10, 2004)
Posted 1:24 p.m.,
March 23, 2004
(#7) -
tangotiger
Page has been updated.
Silver: The Science of Forecasting (March 12, 2004)
Posted 10:17 a.m.,
March 12, 2004
(#1) -
tangotiger
Let me touch on one point that I would want substantiated. This would be a good exercise for you the sabermetricians in the group.
Here's what Nate says at the end:
PECOTA accounts for these sorts of factors by creating not a single forecast point, as other systems do, but rather a range of possible outcomes that the player could expect to achieve at different levels of probability.
Let me just say that all forecasting systems implicitly do this. It would be impossible to project a player as getting exactly a .360 OBA for next year. A forecaster is really giving an implicit range where this .360 is the mean (or possibly median) of that distribution.
Instead of telling you that it's going to rain, we tell you that there's an 80% chance of rain, because 80% of the time that these atmospheric conditions have emerged on Tuesday, it has rained on Wednesday.
Nate should be applauded for actually being explicit with his range. Saying that Hubie's OBA will be .360 with 1 SD = .020 is clearer than saying that his OBA will be .360. We all know that the forecaster in the second case isn't trying to pinpoint it exactly. But, in the first case, at least he's trying to give you enough information for you to figure out his probability range.
Surely, this approach is more complicated than the standard method of applying an age adjustment based on the 'average' course of development of all players throughout history. However, it is also leaps and bounds more representative of reality, and more accurate to boot.
What Nate is showing is not "more representative of reality", since no one should think that a forecaster's mean has an SD = 0. What he is showing is something explicit. However, is what Nate is showing something that is peculiar to how his engine works? Or, is it simply something implicit in all systems that he's showing explicitly?
Now, here's where I'd like to challenge Nate's system. I would say that all players with a similar number of PAs over the last three years (say 1800 - 1900 PAs) with around the same overall talent level (say somewhat above average), but with widely varying set of skills (say like Jeter, Magglio) should have a similar range of forecasts. That is, the range that is being provided is almost entirely based on the number of PAs.
So, for the sabermetricians in the group: check it out. Can the forecast for the range for OBA, HR/PA, BB/PA be almost completely derived from the number of PAs? Or, is the profile of the player important to establish this range?
And look at pitching too. Is the range of ERA for pitchers with similar past IP, but different K/BB ratio and K/IP rates the same or different?
Silver: The Science of Forecasting (March 12, 2004)
Posted 11:38 a.m.,
March 12, 2004
(#3) -
tangotiger
I published aging patterns based on 1979-1999 data, and 1919-1999 data. They were the same.
Whatever advantages the players have today with regards to nutrition/exercise carry over to all ages.
Silver: The Science of Forecasting (March 12, 2004)
Posted 1:09 p.m.,
March 12, 2004
(#6) -
tangotiger
J, good stuff!
Silver: The Science of Forecasting (March 12, 2004)
Posted 3:22 p.m.,
March 12, 2004
(#8) -
tangotiger
Actually, I didn't ask if using components is important for the forecasts... they are.
I'm asking if the RANGES of the forecasts for a player are any different among players with the same number of PAs. If the range of ERA forecast for Pettitte is 2.5 to 5.5 and Mussina is 2.0 to 5.0 and Pedro is 1.5 to 4.5... well, you see what I mean? It doesn't matter what kind of pitcher you are, everyone with the same number of past PAs will get the same range.
(I have not looked into it. This is what I'm asking for others to verify.)
Silver: The Science of Forecasting (March 12, 2004)
Posted 10:46 a.m.,
March 13, 2004
(#16) -
tangotiger
The way I see Silver doing similarity is: take the 100 most similar players, giving them a certain weight (maybe 10 times more weight for the most similar down to 1 weight for the 100th most similar). Then, take each of those players, and find THEIR most similar players.
That's one degree of separation, and you might end up with say 1000 unique players, each weighted between 1 and 30 (numbers for illustration only).
At this level, sample size might not be an issue. However, before you discard the 10,000 hitters in favor of the 1000 hitters that might be more representative, you have to establish whether being more representative using this model does contain extra information.
It's an interesting process to go through (I'm not even sure this is what PECOTA does). We should remain skeptical about any grandiose claims that can't be reproduced. Chris Kahrl said so just last week.
Silver: The Science of Forecasting (March 12, 2004)
Posted 12:48 p.m.,
March 13, 2004
(#17) -
tangotiger
Here, this is what I'm talking about. Let's look at some pitchers from the Angels:
Pitch  90th  50th  10th
Colon  2.83  3.71  5.34
Ortiz  3.09  4.55  6.92
Washb  3.40  4.16  5.75
Sele.  3.73  5.30  8.08
Lacke  3.08  4.18  5.27
Escob  2.62  3.97  5.14
This is the ERA forecasted by Nate. I put them roughly in order by experience.
Resetting the above numbers relative to the 50th percentile and we get:
Pitch  90th  50th  10th
Colon   76%  100%  144%
Ortiz   68%  100%  152%
Washb   82%  100%  138%
Sele.   70%  100%  152%
Lacke   74%  100%  126%
Escob   66%  100%  129%
ALL     73%  100%  140%
There's not really much to glean from here in terms of my expectation that the more PAs a pitcher has had, the narrower their bands should be.
I don't understand why Ortiz' band would be so wide. Why do we know less about Ortiz than Lackey?
There's a lot to talk about here with regards to these bands. Their reliability has also not been established. It's a good "to be" model (to use a corporate America term). But we can't say that we are there yet.
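For anyone who wants to repeat this on another team, the rescaling to the 50th percentile is a one-liner per pitcher. A small sketch, using the ERA bands quoted in this post; the variable and function names are my own.

```python
# (90th, 50th, 10th percentile ERA) for each pitcher, as quoted in this post.
bands = {
    "Colon": (2.83, 3.71, 5.34),
    "Ortiz": (3.09, 4.55, 6.92),
    "Washb": (3.40, 4.16, 5.75),
    "Sele":  (3.73, 5.30, 8.08),
    "Lacke": (3.08, 4.18, 5.27),
    "Escob": (2.62, 3.97, 5.14),
}


def relative_band(p90, p50, p10):
    """Express the 90th and 10th percentile ERA as a fraction of the 50th."""
    return p90 / p50, p10 / p50


rel = {name: relative_band(*era) for name, era in bands.items()}
for name, (lo, hi) in rel.items():
    print(f"{name:6s} {lo:4.0%}  100%  {hi:4.0%}")

# Simple (unweighted) average across the pitchers, like the ALL row.
avg_lo = sum(lo for lo, _ in rel.values()) / len(rel)
avg_hi = sum(hi for _, hi in rel.values()) / len(rel)
print(f"{'ALL':6s} {avg_lo:4.0%}  100%  {avg_hi:4.0%}")
```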
Silver: The Science of Forecasting (March 12, 2004)
Posted 11:21 p.m.,
March 13, 2004
(#22) -
tangotiger
The most extreme case would be a player coming straight from college (and assuming that comparables are set only starting from the minor leagues, and draft position is not used).
Silver: The Science of Forecasting (March 12, 2004)
Posted 2:35 p.m.,
March 14, 2004
(#23) -
tangotiger
I just picked out 2 relievers:
Donne: 1.89, 3.18, 4.89 (59%, 154%)
Perci: 1.95, 3.47, 4.88 (56%, 141%)
That spread, relative to the starters, just looks wrong. I will guess that the spread is not based on probability distributions based on # of PAs (as it should be mostly), but mostly on the comps. And, I would guess that you are going to get weird results like this.
The fewer the # of PAs, the larger the expected spread of performance. We're getting that, but just barely. When I get a chance tomorrow, I'll show what we think the spread should be.
Silver: The Science of Forecasting (March 12, 2004)
Posted 11:58 a.m.,
March 15, 2004
(#25) -
tangotiger
Suppose you have a pitcher that you "know" his true talent OBA is .340. What do you expect his ERA to be? A quick short-hand would be to do: OBA^2 * 37. So, that's 4.28. It's not too important that it's 37 or 38, or 1.8 instead of 2, etc. This is just a nice quick and dirty way.
Now, what if this pitcher is going to face 1000 batters? What do we expect his ERA to be? In this case, we are 95% sure that it'll be between 3.56 and 5.06, or 83% to 118% of his true talent ERA.
What if instead, he will face 300 batters? In this case, the spread would be 70% to 135%. So, we have a substantial difference in our expected ERA based strictly on how much playing time the player will get.
Remember, we started off "knowing" his true ERA. For a starting pitcher, we are more certain about this than about a reliever (because a starting pitcher will have faced 2500 batters in the last 3 years compared to the 800 that a reliever would face). If you add that level of uncertainty to the true ERA, that widens the gap even more.
Therefore, if you want to verify the reasonableness of the PECOTA probability ranges, just compare the forecasted range for the starters and relievers. You should find a substantial difference between the two.
(Note also that the ERA itself is also subject to uncertainty, because of the "stringing" of hits and outs, along with the fielders impact on BIP).
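Here is a sketch of how those ranges can be reproduced. I am assuming the spread comes from treating each batter faced as a binomial trial on OBA, taking plus or minus 2 SD as the "95%" band, and running both ends through the OBA^2 * 37 shorthand; the function names are mine.

```python
import math


def era_from_oba(oba, k=37.0):
    """Quick-and-dirty ERA estimate from OBA allowed (the OBA^2 * 37 shorthand)."""
    return k * oba ** 2


def era_band(true_oba, batters_faced, z=2.0):
    """ERA range implied by binomial luck on OBA over a number of batters faced."""
    sd = math.sqrt(true_oba * (1 - true_oba) / batters_faced)
    return era_from_oba(true_oba - z * sd), era_from_oba(true_oba + z * sd)


true_era = era_from_oba(0.340)
for bf in (1000, 300):
    lo, hi = era_band(0.340, bf)
    print(f"{bf:5d} BF: {lo:.2f} to {hi:.2f} "
          f"({lo / true_era:.0%} to {hi / true_era:.0%} of {true_era:.2f})")
```

This covers only the luck in the coming season. Uncertainty about the true talent itself, which is larger for the reliever, would widen each band further.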
Silver: The Science of Forecasting (March 12, 2004)
Posted 1:27 p.m.,
March 15, 2004
(#27) -
tangotiger
Wally, I'm not disagreeing with what you are saying. But, it can be avoided for what I'm talking about. If you just concentrate on the 30 starters with the most PAs over the last 3 years, and the 30 relievers with the most saves over the last 3 years, the selective sampling would be similar.
Our expectation therefore is that the spread of the forecast (any forecast) for those starters must be much smaller (at least as a group) than the spread of the forecast for those relievers.
Anyone who looks at the prob distribution spread of PECOTA (or any other forecast) must expect this. If they don't get it, then I would say that that forecast engine would have to explain itself.
Silver: The Science of Forecasting (March 12, 2004)
Posted 2:11 p.m.,
March 15, 2004
(#29) -
tangotiger
I just went through the Angels pitchers, starting from the top, and stopped in the middle. I looked for pitchers with no, or almost no, MLB experience. These are guys who we must be less sure of, if only because minor league data cannot be as reliable as MLB data. And, to boot, I split them off as being minor league relievers or minor league starters.
Andra 2.04 3.66 5.59 56% 153% Reliever
Dunca 3.32 5.16 7.76 64% 150% Reliever
Jones 2.29 3.7 5.62 62% 152% Reliever
...
Bootc 3.12 5.04 7.46 62% 148% Starter
Cyr 3.71 5.31 8.74 70% 165% Starter
Fisch 3.28 4.49 6.59 73% 147% Starter
Green 3.23 4.79 6.75 67% 141% Starter
Hensl 3.08 5.02 7.84 61% 156% Starter
Notice a pattern? That's right... not much. The relievers are between 60 and 150% of their 50th percentile, while the starters are 65-70 and 150%.
These reliever spreads match the spreads of Donnelly and Percival (see post #23). These starter spreads are pretty close to the spreads of the experienced Angel starters.
We shouldn't confuse the quality of the mean forecasts of PECOTA and the quality of its prob distribution forecasts.
Silver: The Science of Forecasting (March 12, 2004)
Posted 4:01 p.m.,
March 15, 2004
(#31) -
tangotiger
Actually J, you should only compare it to the Marcels, since that's the only system that we know what it's doing.
What's the r between Marcel and the others?
Silver: The Science of Forecasting (March 12, 2004)
Posted 5:03 p.m.,
March 15, 2004
(#33) -
tangotiger
MGL, please remember that Nos Amours played 25% of their home games away from the Big O. Can you publish your non-regressed PF for Montreal (and San Juan), and regressed for the last few years?
Silver: The Science of Forecasting (March 12, 2004)
Posted 10:51 p.m.,
March 15, 2004
(#40) -
tangotiger
It might be worth asking Nate for a CSV file of the 2003 data, at least OBP, SLG, and PA for each player/band.
One good thing that Nate shows with the bands is that as the player's performance gets worse, he gets less playing time. That's a good thing, as that models reality. That is, if your true talent is an ERA of 4.0, and you are performing half-way through the season at a 5.5 clip, you will get less PAs the rest of the way. Even if the rest of the way you continue at your true rate of 4.0, your overall rate will be around 5.0, and your PA will be less than forecasted.
Like I said in another thread: you really need another dimension to the forecast. I would do it something like this:
90%: 3.00 ERA
75%: 3.50
50%: 4.00
25%: 4.50
10%: 5.00
Given 3.00 ERA
90%: 260 IP
75%: 230 IP
50%: 210 IP
25%: 190 IP
10%: 160 IP
Given 3.50 ERA...
And so on. Then
Given 3.00 ERA
90%: 1.0 K / IP
75%: 0.8 K / IP
etc, etc
So, there are a lot of dimensions going on here. I think Nate does a good job in presenting it as he does. But, I think that hides what's really happening with the probability distributions.
In any case, as MGL said, what does it really matter? If you have 2 guys with a forecasted ERA of 3.00, but one has an SD of 0.50 and the other has it at 0.25, do you need to apply risk aversion? Aren't they both equals? Even applying a non-linear salary to each probability distribution, I don't think you'll have much difference.
Silver: The Science of Forecasting (March 12, 2004)
Posted 11:49 a.m.,
March 16, 2004
(#45) -
tangotiger
Michael, I don't disagree with you, but you haven't specified the impact.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 10:53 p.m.,
March 14, 2004
(#4) -
tangotiger
As a general rule, I usually find these optimization things are on the order of 0.5 to 1.5 wins. By themselves, you might say "no big deal". But, add them up, and a good sabermetrician should be able to add 5 wins to a team just on optimization alone (i.e., helping the manager). (He should be able to add another 10 to 15 wins in helping the GM). At 2 million$ / win of value..... it's just crazy that not more teams pay for this.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 9:16 a.m.,
March 15, 2004
(#9) -
tangotiger
MGL, batting order optimization can add 0.5 to 1.5 wins. As long as the #2 hitter continues to be an average hitter, teams have a lot of places to improve upon.
And yes, given the exact same payroll, a good sabermetrician should be able to add 10 wins for his GM. This means that there's 20 million$ being poorly allocated. So, a well-run sabermetric team with a 70 million$ payroll is equivalent to a poorly-run nonsabermetric team with a 90 million$ payroll. And I think I'm being conservative.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 12:28 p.m.,
March 15, 2004
(#12) -
tangotiger
J, you're going to have to wait a while. I think MGL and I have good intentions, and we had great momentum up to a point. The outline we have written out, along with the ideas for research, is really something great to work from. But, we have let other things take precedence.
Right now, I'm having the most fun working on the Fans' Scouting Report. I think there's great payoff there. Just wait until I roll out the "Similarity Scores" for fielders! For example, Derek Jeter's most similar fielding comp is Bobby Abreu. There are NO similar fielders to Vlad. That's what YOU fans say. This sabermetric stuff to me is a hobby and it's supposed to be fun. And, this Scouting Report is loads of fun. Is Mark Ellis' UZR real? Well, you fans think he's a GREAT fielder, so this makes it more reliable. But, Eckstein is average according to the fans. So, maybe his UZR should be regressed more heavily.
The only way to get back into the book is to really shut down everything. No more Primate Studies, no Fanhome, no Clutch, no web altogether. I can shut all that, except Studies. I find I gain so much talking with you guys at Primate Studies that I'm not sure it's worth it.
I wrote a Markov Chain Batting Order program. It's really cool (programmatically, I take great pride in it), and I did it a few months ago. I needed it for one chapter in the book. There's a bug in it, and after a day, I stopped looking for it. I was going to get back to it, as I usually like to relax for a couple of days before going back into it. But, I have not. I went on to something else, because it's more fun for me to do so.
So, who knows what we'll do.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 12:31 p.m.,
March 15, 2004
(#13) -
tangotiger
(homepage)
Kyle, the above link probably is saying what you are saying. It's not a question of the measure of risk, but rather what are the probabilities of a given situation, regardless of how much spread there is in possible outcomes.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 9:47 p.m.,
March 16, 2004
(#16) -
tangotiger
The APBA one I think comes from Mark Pankin, and he's the guru of batting orders.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 8:18 a.m.,
March 17, 2004
(#18) -
tangotiger
I agree that it's a case-by-case basis. Any team that doesn't have one of their 2 best hitters as the #2 hitter is almost definitely suboptimal, and you can gain at least 5 runs. I would say this is the case for most teams.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 2:46 p.m.,
March 17, 2004
(#20) -
tangotiger
[Spam between posts #14 and #15 was deleted]
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 2:51 p.m.,
March 17, 2004
(#21) -
tangotiger
AED, very interesting. 0.1 RPG = 16 runs per season. That's on the high-side of what I was expecting. MLB is in worse position than I thought! Can I take a guess that in your 5 teams, all of them had a #1 and/or #2 hitter that is considered at best average?
As a rule of thumb, your 3 best hitters should hit somewhere 1,2,4.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 1:53 p.m.,
March 18, 2004
(#25) -
tangotiger
If you check out the Primate Studies index, I have a "workshop" on batting orders. If you have 2 or 3 hours, I suggest reading that.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 5:36 p.m.,
March 18, 2004
(#28) -
tangotiger
Off-hand, I'd say put Nomar 2 and Manny 4.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 10:19 p.m.,
March 18, 2004
(#31) -
tangotiger
Could you also list their SLG? My rule of thumb would have the largest gap for the #4 hitter in SLG-OBA, and smallest for the #1 hitter, all the while having the top OBA*1.8+SLG for the #2 hitter, with #1 and #4 close behind.
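To make that rule of thumb concrete, here is one possible reading of it in code: take the three best hitters by OBA*1.8+SLG, put the best of them 2nd, and split the other two by their SLG-OBA gap (smaller gap leads off, larger gap bats 4th). The four hitters and their lines are made up for illustration, and the function is just my sketch of the rule, not the batting order sim.

```python
def top_of_order(hitters):
    """hitters maps name -> (OBA, SLG). Returns {lineup slot: name} for slots 1, 2, 4."""
    def value(name):
        oba, slg = hitters[name]
        return 1.8 * oba + slg

    def gap(name):
        oba, slg = hitters[name]
        return slg - oba

    best3 = sorted(hitters, key=value, reverse=True)[:3]
    second = best3[0]                   # top OBA*1.8+SLG bats 2nd
    rest = sorted(best3[1:], key=gap)   # smaller SLG-OBA gap leads off
    return {1: rest[0], 2: second, 4: rest[1]}


# Hypothetical hitters (OBA, SLG), for illustration only.
hitters = {
    "A": (0.400, 0.550),
    "B": (0.390, 0.450),
    "C": (0.340, 0.560),
    "D": (0.330, 0.420),
}
print(top_of_order(hitters))
```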
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 8:40 a.m.,
March 19, 2004
(#36) -
tangotiger
You definitely want a different lineup for LH/RH and GB/FB. Heck, since it's all done by computer anyway, I'd have a different one by park.
Of course, the devastating psychological impact of moving Manny from 2 to 4 to 2 might be something from which he would never recover. Funny how these world-class athletes are the most susceptible to things that would never bother a kid in high school.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 11:50 a.m.,
March 19, 2004
(#39) -
tangotiger
Wouldn't they be different in any given lineup?
Yes, but I don't think much. In my batting order sim, one thing I wanted to do is figure out whether the structure of the batting order has more impact than who the players are in those lineups, to determine the LWTS by batting order.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 12:52 p.m.,
March 19, 2004
(#41) -
tangotiger
I have a Markov Monte Carlo sim. It's just a couple of steps from being "perfect", but those last 2 steps are a time-killer. Tippett at DMB did a similar one last year. Mark Pankin has the best one that I know of. Ben Baumer might have one, but I'm not sure if he uses Markov or not.
Wolverton - Wavin' Wendells - Outs on Base (March 12, 2004)
Posted 2:34 p.m.,
March 19, 2004
(#43) -
tangotiger
I think Nate Silver also has one from a blurb I saw on BP.
***
If ever you try to write one, you will have to think of having 4,5, or 6 dimensional arrays, and making it recursive. As a programming challenge, it's great. But, it's a huge time killer. And debugging it is a b-tch, especially if you are doing it on the side, a half-hour at a time.
Batter's Box Baseball Blog - Statistical Evaluations of College Pitchers (March 13, 2004)
Posted 2:52 p.m.,
March 17, 2004
(#1) -
tangotiger
[Spam messages have been deleted.]
Park Factors (March 18, 2004)
Posted 4:44 p.m.,
March 18, 2004
(#2) -
tangotiger
The quote was from Dayn Perry's chat.
Park Factors (March 18, 2004)
Posted 5:43 p.m.,
March 18, 2004
(#4) -
tangotiger
It needs to go one step further: by profile and quality of batter, what's the park factor.
Given a strong, LH, FB, good hitter, what's his PF at Pac Bell? Shea? Fenway?
Park Factors (March 18, 2004)
Posted 10:26 p.m.,
March 18, 2004
(#6) -
tangotiger
Are your HR factors a function of "long fly balls hit", or a function of HR hit? This makes a HUGE difference. As well, the factors should be based on ratios and not rates.
For example:
Olerud: 10 HR, 100 long flys
Bonds: 40 HR, 100 long flys
If Coors turns a 15/100 hitter into a 20/100 hitter, that's a factor of:
20/80 divided by 15/85 = 1.4
So, Olerud is: 10/90 x 1.4 = ratio of .156 = rate of .135 = 13.5 HR per 100 long flys
Bonds is 40/60 x 1.4 = ratio of .93 = rate of .48 = 48 HR per 100 long flys
So, Olerud's rate was increased by 35%, while Bonds' was increased by 20%.
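The ratio-versus-rate conversion above, as a short sketch. The helper names are mine. Note that the exact Coors factor from 15/100 to 20/100 is about 1.42; the post rounds it to 1.4, and I apply the rounded 1.4 below so the output lines up with the numbers in the post.

```python
def to_odds(rate):
    """Rate (HR per long fly) -> ratio (HR per non-HR long fly)."""
    return rate / (1 - rate)


def to_rate(odds):
    """Ratio -> rate."""
    return odds / (1 + odds)


def park_factor_odds(neutral_rate, park_rate):
    """Park factor expressed on the ratio (odds) scale."""
    return to_odds(park_rate) / to_odds(neutral_rate)


def apply_factor(rate, factor):
    """Apply an odds-scale park factor to a hitter's rate."""
    return to_rate(to_odds(rate) * factor)


factor = park_factor_odds(0.15, 0.20)
print(f"exact factor: {factor:.3f}")

for name, rate in (("Olerud", 0.10), ("Bonds", 0.40)):
    adj = apply_factor(rate, 1.4)
    print(f"{name}: {rate:.3f} -> {adj:.3f} ({adj / rate - 1:+.0%})")
```

A straight multiplicative factor of 20/15 would push Bonds from 40 to over 53 HR per 100 long flys; working on the ratio scale is what keeps the extreme hitter's adjustment smaller.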
Park Factors (March 18, 2004)
Posted 10:52 p.m.,
March 18, 2004
(#7) -
tangotiger
And the other, larger point is Barry Bonds. What is your component PF for LH at Pac Bell? Now, can you redo your calculations, but this time removing Barry Bonds?
These park factors only work when you have a typical distribution of players. I think it's fairly certain that the LH at Pac Bell don't follow that typical distribution, when 15% of those LH PAs go to the best hitter in the league.
How can you possibly calculate a LH Pac Bell park factor that includes 15% Bonds PAs, and then apply that PF to Bonds himself?
Can you imagine doing a LH HR park factor for the 1920 NYY? What is it... 75% of the LH HR are hit by 1 guy?
Park Factors (March 18, 2004)
Posted 8:34 a.m.,
March 19, 2004
(#11) -
tangotiger
but once you use many years of data and regress
But, since Pac Bell opened, hasn't Bonds continued to make up 15% of all LH PAs there?
You can take Yankee Stadium from 1920 to 1931, and I'd guess that over 50% of LH HR were hit by 1 guy. If, for example, Ruth hit 300 HR at Yankee Stadium and 200 away, and his LH teammates hit 300 HR at Yankee Stadium and 400 away, you would conclude that Yankee Stadium has no HR PF for LH.
But, this is not a representative sample of MLB players, since Ruth makes up 50% of the sample. Furthermore, the more PAs you have of Ruth home and away, the less you need to know about the PF for the non-Ruth players.
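The made-up Ruth numbers show the problem quickly. A sketch, assuming the crudest possible park factor (home HR divided by away HR, with equal opportunities home and away); that simplification is mine, and the HR totals are the hypothetical ones from this post.

```python
def hr_park_factor(home_hr, away_hr):
    """Crude HR park factor: home HR / away HR, assuming equal opportunities."""
    return home_hr / away_hr


ruth = (300, 200)        # hypothetical home/away HR from the example above
teammates = (300, 400)   # hypothetical LH teammates

combined = hr_park_factor(ruth[0] + teammates[0], ruth[1] + teammates[1])
ruth_only = hr_park_factor(*ruth)
others_only = hr_park_factor(*teammates)

print(f"all LH:    {combined:.2f}")
print(f"Ruth only: {ruth_only:.2f}")
print(f"non-Ruth:  {others_only:.2f}")
```

The pooled number says the park is neutral, while the two halves of the sample say very different things. With one hitter making up half the sample, the pooled factor mostly tells you about that hitter.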
Park Factors (March 18, 2004)
Posted 8:35 a.m.,
March 19, 2004
(#12) -
tangotiger
Alan, thanks for that great insight. I'd love to see an article on your findings, and I'd be glad to post it here, if you like.
Park Factors (March 18, 2004)
Posted 9:28 a.m.,
March 19, 2004
(#13) -
tangotiger
Btw, it's a given that using the additive, multiplicative, or odds ratio method will give you similar results, on average. After all:
1 - these adjustment factors, on average, are small to begin with
2 - there are an enormous number of players clustered to the mean
But, when it comes time for Bonds and Pac Bell, the extreme cases, the cases we most care about, we should do it the right way.
Park Factors (March 18, 2004)
Posted 10:35 a.m.,
March 20, 2004
(#16) -
tangotiger
Using your argument, we wouldn't want to combine the data from 100 different players since each of those players has their "own" park factor.
No, my argument is that the players in your sample should be representative of the MLB population. The 1920s LH Yanks, and the 2000s Giants are not.
The Scouting Report, By the Fans, For the Fans - Most Similar Fielders (March 18, 2004)
Posted 5:40 p.m.,
March 18, 2004
(#4) -
tangotiger
On a scale of 0 to 100, with 50 as average, here are Garrett and Royce's numbers, in order of the listed traits:
Garr: 63,58,58,64,69,48,66
Royc: 60,56,54,69,61,56,66
Do you think that too many fans evaluated the players with respect to their positions?
The Scouting Report, By the Fans, For the Fans - Most Similar Fielders (March 18, 2004)
Posted 10:29 p.m.,
March 18, 2004
(#7) -
tangotiger
Jose Cruz Jr is one of the most complete fielders, according to the fans. Another guy the fans love is Cesar Izturis.
The Scouting Report, By the Fans, For the Fans - Most Similar Fielders (March 18, 2004)
Posted 3:00 p.m.,
March 19, 2004
(#10) -
tangotiger
That is absolutely correct, and intentional.
I have synthesized the fielding aspects of players into 7 traits. The sim scores are based solely on those traits.
You can argue that 7 is not enough, and that those 7 are not representative enough. However, I'm fairly pleased at how seriously the fans have taken this project, and how many of the comps are good or at least interesting. Mostly when catchers are involved would you think that the sims are not good.
Out of all 8 positions, I think the catcher is the one that does things that are so different from the rest that it might be a good idea to exclude him. You can put Manny Ramirez at 1B, or Nick Johnson in LF, or Todd Helton at SS, and you can already kind of figure how they might do there. Put Ivan Rodriguez at 2B? Ausmus in RF? Their skillsets are not exposed enough at C that you can try to make an informed opinion.
The Scouting Report, By the Fans, For the Fans - Most Similar Fielders (March 18, 2004)
Posted 10:33 a.m.,
March 22, 2004
(#12) -
tangotiger
mommy, you may think it's "almost impossible", but it shouldn't be (especially if you are an experienced scout). I think a scout should be able to see a player make 100 plays at one position (say like the throws from SS to 1B, or relays from short OF to home), and be able to give it one number, like 95 out of 100.
Change "almost impossible" to "difficult if you're not looking for it", then I will agree with your statement.
The one thing that I noticed in the player evaluations is that catchers are of such a different breed that I should remove them altogether. For the other 7 fielders, we are looking at how they react on batted balls in play, and trying to get the runners out from that BIP. For catchers, they are starting from such a different position on BIP that it's really almost impossible to compare the first 2 categories.
There is a quality to catchers (experience) that is far more important than all the other categories I listed. When I compared the average experienced catcher to the average inexperienced catcher in things like PB, WP, etc, there was a huge difference, one that I have not captured in this study.
Since the study is focused on BIP, I don't think I can come to good conclusions with catchers.
The Scouting Report, By the Fans, For the Fans - Most Similar Fielders (March 18, 2004)
Posted 2:27 p.m.,
March 24, 2004
(#13) -
tangotiger
This is something I've been doing 5 different ways: how to compare fielders at different positions. You can check out the various articles I have on this in the Primate Studies index.
The idea goes: every player is compared against the average at that position. Every position's average will end up being 0. So, we need some way to adjust the avg SS upwards and the avg 1B downwards to properly balance them. The traditional way has been to look at their offense, and take the opposite. So, if the avg SS is -10 hitting runs, then they must be +10 fielding (relative to all fielders), so that overall they are zero. The assumption here is that all positions, off+def, are equals.
As noted, I have many different ways to get to that answer.
Here's another. I took Mark Ellis, and looked at the Fans' Scouting Report. I looked for the SS that is most similar to him. I repeated this for all 2B. What I end up with is the average 2B profile, and the average profile of their most similar SS. In my group of 2B, their UZR was +3.5 runs. Their SS equivalents had a UZR of -0.7 runs.
While I can't move my 2B to SS and see how they do, I can instead look at the SS who share the most similar traits to my 2B, and assume that if I moved my 2B to SS, that they would do as well as those SS! Pretty cool, right?
Well, the answer here is 4 runs. When comparing a SS and a 2B, you need to make a 4 run adjustment. What's really interesting about that number is that you get something similar when you do it the many different ways I've been doing it, as well as the "traditional" way.
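The matching step can be sketched in Python. The arithmetic behind the post is +3.5 - (-0.7) = 4.2, or about 4 runs. Everything in the sketch below is made up for illustration: the player labels, the trait scores, the UZR values, and the choice of Euclidean distance as the similarity measure are all assumptions, not the actual Fans' Scouting Report data or the exact method used.

```python
import math

# Hypothetical fielders: (label, seven trait scores, UZR in runs).
second_basemen = [
    ("2B-a", [60, 55, 50, 65, 55, 45, 60], 5.0),
    ("2B-b", [45, 50, 55, 50, 50, 40, 50], 2.0),
]
shortstops = [
    ("SS-x", [62, 57, 52, 63, 60, 55, 62], 1.0),
    ("SS-y", [47, 48, 54, 52, 48, 45, 49], -2.0),
    ("SS-z", [75, 70, 65, 70, 70, 70, 70], 9.0),
]

def distance(a, b):
    # Euclidean distance between two trait profiles (an assumed similarity measure).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(player, pool):
    # The fielder in the pool whose trait profile is closest to the player's.
    return min(pool, key=lambda other: distance(player[1], other[1]))

def positional_gap(group, pool):
    # Average UZR of the group minus the average UZR of their closest comps.
    own = sum(p[2] for p in group) / len(group)
    comps = [most_similar(p, pool) for p in group]
    comp = sum(c[2] for c in comps) / len(comps)
    return own - comp

gap = positional_gap(second_basemen, shortstops)
print(round(gap, 1))
```

With real data, the gap would be the run adjustment to apply when comparing the two positions.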
What I will be doing is making a few more of these types of comparisons. The tricky ones will be the IF-OF comps. We'll see what that gives us.
The Scouting Report, By the Fans, For the Fans - Most Similar Fielders (March 18, 2004)
Posted 2:47 p.m.,
March 24, 2004
(#14) -
tangotiger
Just looking at LF-RF. Wow, what a difference. Based on the Fans' evaluation, there are many more good fielders in RF than in LF. Bad fielders are abundant in LF and largely missing in RF. Among my sample, the adjustment is a whopping +6 runs. An average fielder in RF should be +6 in LF.
This is in direct conflict with:
http://www.tangotiger.net/UZRmultiple.html
where I show that the avg LF (from 1999-2003) was 3 runs BETTER than the avg RF. In this case, I looked at the players who actually did play in both positions.
What we have here are:
- sample size issues with the Fans' scouting comparison
- Fans' evaluation may be positionally-biased
The Scouting Report, By the Fans, For the Fans - Most Similar Fielders (March 18, 2004)
Posted 10:03 a.m.,
March 25, 2004
(#16) -
tangotiger
Good point.
Here are the average scores for the LF and RF in the first 4 categories:
LF: 48 48 51 47
RF: 50 52 59 50
In each case, the RF is ahead of the LF. He's way ahead of the LF in speed.
In the throwing categories, it's no contest, as we'd expect:
LF: 45 44 46
RF: 59 69 59
The top speedsters in LF/RF are:
Ichiro
Crawford (LF)
Drew
Cruz Jr
Jacques (LF)
Stewart (LF)
Vlad
Sanders
Abreu
Mondesi
Tucker
At the bottom of the pile:
Burrell
Salmon (RF)
Alou
Pujols
Ramirez
Berkman
Dye (RF)
Floyd
Giles
Ibanez
Green (RF)
The fans sure think that the better fielder is in RF.
MGL - Questec and the Strike Zone (March 20, 2004)
Posted 4:51 p.m.,
March 20, 2004
(#3) -
tangotiger
questec.pdf - MGL's new article
superLWTS2003.html - super-LWTS nicely formatted (I just uploaded it to replace the old one that was there)
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 3:38 p.m.,
March 22, 2004
(#3) -
tangotiger
I agree. I'm unimpressed with these "sim score" views. 10 players really means almost nothing to me. While each type of player would have his own aging profile, the small sample that would set this trajectory has a huge error range. I prefer a sample based on either all players, or somewhere on the order of at least 50 very similar players (or 200 - 300 somewhat similar).
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 4:26 p.m.,
March 22, 2004
(#7) -
tangotiger
There's no way you can figure out the "consistency" of a player. A player's true talent level changes slightly day-to-day, year-to-year. This is what Marcel tells us. Just take the last 3 years, weighted, and adjusted slightly for age. The PERFORMANCE of the player is subject to much randomness. You just can't figure out the consistency of a player's true talent level from the performance of that player.
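A rough sketch of "last 3 years, weighted, and adjusted slightly for age" might look like the following. The post does not give the weights or the size of the age adjustment, so the 5/4/3 weights, the peak age of 27, and the 0.3% per year step are all assumptions chosen only to make the example run.

```python
def weighted_estimate(rates, weights=(5, 4, 3), age=27, peak_age=27, age_step=0.003):
    """Estimate a rate stat from the last three seasons.

    rates: season rates, most recent first.
    weights, peak_age, age_step: illustrative assumptions, not values from the post.
    """
    base = sum(r * w for r, w in zip(rates, weights)) / sum(weights)
    # Small linear age adjustment: shrink slightly for each year past the assumed peak.
    age_factor = 1.0 - age_step * (age - peak_age)
    return base * age_factor

estimate = weighted_estimate([0.500, 0.450, 0.480], age=29)
print(round(estimate, 3))
```

The point of the sketch is that the estimate moves only a little from year to year, while single-season performance bounces around it.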
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 7:20 p.m.,
March 22, 2004
(#10) -
tangotiger
Yes. Chavez is not 10th best, but 5th to 20th best. And, over the next 6 years, he'll be 1st to 100th best.
We need the error range.
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 7:18 a.m.,
March 23, 2004
(#21) -
tangotiger
I agree Tejada was way overrated, they have a good kid in Crosby.
Over the next 6 years, Chavez projects at 30 wins above replacement. 2 million$/win x 30 wins = 60 million$. The deal was fair all-around.
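As a quick check of that arithmetic, here is a minimal Python sketch. The function name is made up for illustration; the 30 wins and the 2 million$/win rate are the figures from the post.

```python
def contract_value_millions(wins_above_replacement, millions_per_win=2.0):
    # Fair value = projected wins above replacement x market price per win.
    return wins_above_replacement * millions_per_win

print(contract_value_millions(30))  # 60.0
```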
Was the Eric Chavez signing a good one? (March 22, 2004)
Posted 11:08 p.m.,
March 23, 2004
(#28) -
tangotiger
I'd guess Bonds has (at least) another 5 years before he becomes an average hitter (star players usually retire when they become average). Of course, his fielding and baserunning talents are already a drag. He should become a DH, ala Molitor, in a year or two, but I'm guessing he won't bother. Gordie Howe played at an effective pace into his 50s.
Sophomore Slumps? (March 23, 2004)
Posted 1:12 p.m.,
March 23, 2004
(#4) -
tangotiger
I've been having an email discussion with Aaron, and I immediately told him about "regression towards the mean". I told him I'd post his thread, and that I would guarantee him that one of the regulars here would bring it up. Just to get the point home, I looked at the MVP award winners from 1955-2002, and their SLG in the award year was .565 and in the next year it dropped 50 points. That is, it regressed about one-third of the way towards the mean. And, that's pretty much what we expected.
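The MVP numbers can be worked through in a few lines of Python. The only inputs are the .565 award-year SLG, the 50-point drop, and the "about one-third" regression from the post; the function names are made up, and the implied mean of roughly .415 is just what those three numbers work out to, not a figure quoted in the post.

```python
def regress(observed, mean, fraction):
    # Move the observed value a given fraction of the way toward the mean.
    return observed - fraction * (observed - mean)

def implied_mean(observed, next_year, fraction):
    # Invert regress(): which mean makes the observed drop equal to `fraction` of the gap?
    return observed - (observed - next_year) / fraction

award_slg = 0.565
next_slg = award_slg - 0.050
mean_slg = implied_mean(award_slg, next_slg, 1 / 3)
print(round(mean_slg, 3))  # 0.415
```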
Sophomore Slumps? (March 23, 2004)
Posted 10:20 a.m.,
March 24, 2004
(#18) -
tangotiger
Regression is by far the most important thing. We're talking about 30 to 35% change! The age factor is up to 5%.
Sophomore Slumps? (March 23, 2004)
Posted 10:06 a.m.,
March 29, 2004
(#22) -
tangotiger
I didn't think there was anything wrong with Clay's response.
The Scouting Report - Compared to UZR (March 23, 2004)
Posted 5:21 p.m.,
March 23, 2004
(#2) -
tangotiger
What will be really telling is the 2002/2003 UZR v 2004 UZR and the 2002/2003 UZR+Fans v 2004 UZR.
If the fans have been unduly influenced by the UZR, then by definition they have nothing more to add to it, and therefore the two r's above should be identical. I'll be happy if the Fans add to the UZR regression as much as they did in my little study here.
What I'll do is list the players on whom the Fans and the "true talent UZR" (already regressed) disagree the most. If this is meaningful, as with Cesar Izturis, it would mean that the 2004 UZR (or ZR) will regress towards the Fans' somewhat.
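The two-r comparison could be set up along these lines in Python. All the numbers below are invented placeholders (no 2004 data existed at the time of the post), and the straight 50/50 blend of UZR and Fans is an assumption made only to have something to correlate.

```python
import math

def pearson_r(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical runs per player, same player order in every list.
uzr_prior = [10.0, -5.0, 3.0, -8.0, 6.0]   # regressed 2002/2003 UZR
fans_runs = [4.0, 2.0, 5.0, -6.0, 1.0]     # Fans' evaluation converted to runs
uzr_2004 = [6.0, 1.0, 4.0, -7.0, 2.0]      # the later season to be predicted

blend = [(u + f) / 2 for u, f in zip(uzr_prior, fans_runs)]
r_uzr_only = pearson_r(uzr_prior, uzr_2004)
r_with_fans = pearson_r(blend, uzr_2004)

# Players on whom the two sources disagree most (by index).
by_disagreement = sorted(range(len(uzr_prior)),
                         key=lambda i: abs(uzr_prior[i] - fans_runs[i]),
                         reverse=True)
print(r_uzr_only, r_with_fans, by_disagreement[0])
```

If the Fans add nothing beyond UZR, the two r's should come out the same; a higher r for the blend would suggest the Fans carry extra information.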
The Scouting Report - Compared to UZR (March 23, 2004)
Posted 9:50 a.m.,
March 24, 2004
(#7) -
tangotiger(e-mail)
I should know this, but how well do year y UZRs correlate with year y+1?
Uh... this is what I said in my intro:
I compared the UZR for the 160 players between 2002 and 2003 who I had enough Fans' balloting on. The r was .58.
I guess that wasn't clear enough. The year-to-year r for UZR (between 2002 and 2003) for the 160 players that qualified was .58.
***
How did you convert Izturis' scouting report to runs? Was that just an example to make a point? Also, do you (or will you) weight all the skills evenly?
In this example, I weighted all traits evenly. That's not what I will be doing in the future.
Will you regress (or have you regressed) your results vs. UZR so as possibly to establish the importance of your categories?
I have done this, but my sample size is too small, and there is a lot of interdependence. Some traits end up with negative coefficients, which makes no sense.
Could positional averages be used to create a profile of who might succeed at a certain position? At least average quickness, reactions and hands are necessary conditions to being even a below average SS.
This is what I will be publishing soon. I will show the average rankings by position. Off the top of my head, the avg SS was above average (relative to all fielders) in every category, and to about the same degree. C/1B was by far the lowest in speed. 2B do not fare so well.
It'd be cool to be able to conclude that Mientkiewicz has the profile of a defensively better 2B than Rivas. Or to figure out which position Jeter is best suited to switch to.
That's also my next assignment. I will compare each player to each positional average, and figure out where he is most similar (and dissimilar). In the meantime, I have provided you with each player's most similar comp, so that should give you some idea. Since Jeter's most similar comp is Bobby Abreu, we know that one, or both, of these guys are playing out of position. My guess is that it's Jeter.
Does anyone know if teams do this sort of analysis on their 20-80 scouting?
I should hope so. Isn't this where most teams are spending their efforts? It would shock me if teams only collect scouting data, but don't have a systematic process to handle it. If they are not doing what I'm doing (at least at the minor league level), then those teams should call me!
The Scouting Report, By the Fans, For the Fans - 1B Report (March 24, 2004)
Posted 2:14 p.m.,
March 24, 2004
(#2) -
tangotiger
Excellent comments. Let me just take a few.
One danger in regressing to the fans is that, sometimes, the fans' evaluation could be based more on reputation or distant skills than on current demonstrable skills.
I was thinking about that as well. The equation should probably be: Fans * .25 + UZR * .50 + mean * .25, or some such.
Perhaps the weight should vary by sample size, with a max weight of 1/3 and a min of 0?
Absolutely. With UZR, the regression towards the mean is 420/(420+BIP). So, I'd have to use some similar thing here, and not the above equation I just listed.
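Both formulas mentioned so far can be written out in Python. The 420/(420+BIP) weight and the .25/.50/.25 blend are from the posts; the function names and the example inputs are made up for illustration, and the mean is taken as 0 runs, which is an assumption.

```python
def regression_weight(bip, k=420):
    # Share of the estimate that comes from the mean: 420 / (420 + BIP).
    return k / (k + bip)

def regressed_uzr(observed_uzr, bip, mean=0.0, k=420):
    # Blend the observed UZR with the mean; more BIP means less regression.
    w = regression_weight(bip, k)
    return w * mean + (1 - w) * observed_uzr

def fixed_blend(fans, uzr, mean=0.0):
    # The tentative fixed-weight equation: Fans * .25 + UZR * .50 + mean * .25.
    return 0.25 * fans + 0.50 * uzr + 0.25 * mean

print(regression_weight(420))     # 0.5
print(regressed_uzr(10.0, 420))   # 5.0
print(fixed_blend(8.0, 4.0))      # 4.0
```

The fixed blend ignores sample size, which is why something shaped like the first formula would be needed for the Fans' weight as well.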
If someone is evaluating players on 20 teams
This is nowhere near the case. I think the most I had was 3 teams from 1 person. 95%+ of the people voted for 1 team.
BTW, could you add sample size to your table?
Ugh. I keep meaning to do this, and I keep forgetting. The reason is that some players have 10 evaluations for instincts but 12 for speed, etc. I'm thinking of just taking the sum of the total evaluations and dividing it by 7.
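That sample-size shortcut is easy to sketch in Python. The trait names are the seven categories used in these reports; the ballot counts and the function name are hypothetical.

```python
TRAITS = ("Instincts", "Acceleration", "Speed", "Hands",
          "Release", "Strength", "Accuracy")

def ballots_per_player(counts):
    """counts: trait -> number of evaluations received for that trait."""
    # Sum the evaluations over all seven traits and divide by 7.
    return sum(counts.get(t, 0) for t in TRAITS) / len(TRAITS)

example = {"Instincts": 10, "Acceleration": 12, "Speed": 12, "Hands": 11,
           "Release": 10, "Strength": 11, "Accuracy": 11}
print(ballots_per_player(example))  # 11.0
```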
The Scouting Report, By the Fans, For the Fans - 1B Report (March 24, 2004)
Posted 4:05 p.m.,
March 24, 2004
(#4) -
tangotiger
Great point!
I'm trying to come up with the best weighting scheme (by position). Right now, for 1B, I'm double-weighting Acceleration and Speed (because of their high correlation to UZR), and Hands (because of the scooping ability which is not captured in UZR). I'm not sure how important the DP-type skills are for a 1B. After all, only about 20% of PAs have a DP in effect, I believe. So, even if it was very important, it would only apply 20% of the time, thereby dragging it down to very unimportant.
In terms of covering for the sac bunt, I see Speed, Hands, and the 3 throwing categories as important. But again, not many bunt attempts to begin with.
Are we satisfied with my current weighting scheme? I'm tempted to half-weight the throwing categories, simply because of lack of opportunities. Maybe keep Release as single-weighted to capture the 3-1 plays.
Thoughts?
The Scouting Report, By the Fans, For the Fans - 1B Report (March 24, 2004)
Posted 4:17 p.m.,
March 24, 2004
(#6) -
tangotiger
MGL: you know, I never noticed that 1B had that. To confirm, I am only looking at UZR, though in these positional reports, I should include the Arm as well.
For 1B with at least 40 games, the range is from +2 to -2 runs for the arm. Bump up the game requirement to 140, and it goes from +1 to -1. Interestingly, Minky has a -1.
In any case, this goes to the heart of my point. Among 1B with at least 120 games from 1999-2003, the SD for UZR is 9.6, while for the DParm it's 0.6. I think giving the 3 throwing categories a half-weight might even be too generous.
Right now, I'm thinking:
1: Instincts
2: Acceleration, Speed, Hands
1/2: Release, Strength, Accuracy
Total weights = 8.5, of which 1.5 are for throwing (or 18%).
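The weighting scheme and the 18% figure can be checked with a short Python sketch. The weights are the ones listed above; the function name and the idea of applying them as a weighted average of trait scores are assumptions about how they would be used.

```python
WEIGHTS_1B = {
    "Instincts": 1.0,
    "Acceleration": 2.0, "Speed": 2.0, "Hands": 2.0,
    "Release": 0.5, "Strength": 0.5, "Accuracy": 0.5,
}
THROWING = ("Release", "Strength", "Accuracy")

total_weight = sum(WEIGHTS_1B.values())
throwing_weight = sum(WEIGHTS_1B[t] for t in THROWING)
throwing_share = throwing_weight / total_weight

def weighted_score(scores):
    # Weighted average of a player's trait scores under the 1B scheme.
    return sum(scores[t] * w for t, w in WEIGHTS_1B.items()) / total_weight

print(total_weight, throwing_weight, round(throwing_share * 100))  # 8.5 1.5 18
```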
The Scouting Report, By the Fans, For the Fans - 1B Report (March 24, 2004)
Posted 10:58 a.m.,
March 25, 2004
(#10) -
tangotiger
I spoke with MGL about this a while ago. You should not do the "divide by 2". We don't do that for pitchers/fielders, or hitters/pitchers, and we shouldn't here.
If the average OBA is .34, and the batter gets on base, he doesn't get +.33 times on base, while the pitcher also gets +.33 times on base. If the average BIP is .30, and the batter gets an out, the pitcher doesn't get -.15 hits and the fielder doesn't get -.15 hits. That's not how it works.
Suppose you have the following combos:
Jeter/Sori: .40 DP/opp
Jeter/Wils: .40 DP/opp
ARod/Sori: .50 DP/opp
ARod/Wils: .50 DP/opp
avg: .50 DP/opp
What this means is that ARod, Sori, and Wils are all average, and that Jeter is -.10 DP/opp.
If the Jeter/Sori combination did 40 DP per 100 Opp, you would give Jeter -10 DP and Sori 0.
For a more extensive explanation, you should see what I did with catchers:
http://www.tangotiger.net/catchers.html
On the other hand, with so little impact from DP, MGL's process is ok. But, I wouldn't do it that way.
The Scouting Report, By the Fans, For the Fans - 1B Report (March 24, 2004)
Posted 11:02 a.m.,
March 25, 2004
(#11) -
tangotiger
Note: my example here was very simple. Sori and Wils could be +.1, ARod -.1, and Jeter -.2. This is why I suggested reading the catcher article.
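The DP example can be checked in Python under one loudly stated assumption: that a combo's DP rate is the average plus the SS's effect plus the 2B's effect (a simple additive model, which the posts do not spell out). The combo rates and both sets of player effects come from the posts; the names in the code are made up.

```python
LEAGUE_AVG = 0.50

# Observed DP per opportunity for each SS/2B combination.
combos = {
    ("Jeter", "Sori"): 0.40,
    ("Jeter", "Wils"): 0.40,
    ("ARod", "Sori"): 0.50,
    ("ARod", "Wils"): 0.50,
}

def fits(effects, tol=1e-9):
    # True if average + SS effect + 2B effect reproduces every combo rate.
    return all(abs(LEAGUE_AVG + effects[ss] + effects[second] - rate) < tol
               for (ss, second), rate in combos.items())

simple = {"Jeter": -0.10, "ARod": 0.0, "Sori": 0.0, "Wils": 0.0}
shifted = {"Jeter": -0.20, "ARod": -0.10, "Sori": 0.10, "Wils": 0.10}

print(fits(simple), fits(shifted))  # True True

# Under the simple reading, Jeter gets the whole shortfall: no "divide by 2".
jeter_dp_per_100 = simple["Jeter"] * 100
sori_dp_per_100 = simple["Sori"] * 100
```

Both assignments reproduce the four combos, which is why more pairings (each player with many different partners) are needed to pin the individual effects down.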
The Scouting Report, By the Fans, For the Fans - 1B Report (March 24, 2004)
Posted 11:15 a.m.,
March 25, 2004
(#13) -
tangotiger
Yes, absolutely.
You figure ARod-[specific2B]-1B, and repeat this for all SS (like my Gary Carter example). That's for the 6-4-3 DP skill.
Then, [specific2B]-ARod-1B, and repeat. This is for the 4-6-3 skill.
You would probably want to limit it to a 4 year period, as aging will probably come into play here a lot more than with pitchers/catchers.
The Scouting Report, By the Fans, For the Fans - 1B Report (March 24, 2004)
Posted 12:33 p.m.,
March 25, 2004
(#14) -
tangotiger
I updated the 1B report for the following:
- added DParm to the UZR
- resorted based on Fans Runs
- did the correlation between Fans Runs and UZR162 (I actually had the correlation between Fans Runs and the UZR162 for 2003 only; now I have it for both years)
Mo and the HOF (March 25, 2004)
Posted 10:51 a.m.,
March 25, 2004
(#2) -
tangotiger
I agree that you should not weight all his innings the same. I was basing the "25 wins" from here:
http://www.livewild.org/bb/toppit.html
This uses win expectancy, and has Mo as +21 wins above average through 2002. So, he's probably around +24 to +25 through 2003.
In the playoffs, he's about +40 runs above average, which for his usage pattern is probably about +8 wins. Single-weighting, he's borderline.
The Scouting Report, By the Fans, For the Fans - 3B Report (March 26, 2004)
Posted 7:43 p.m.,
March 26, 2004
(#2) -
tangotiger
MGL, that should be easy enough to check. I just have to run a regression of the Fans Runs to UZR162, and include age, and see if there is any correlation to that.
I did notice that Barry Larkin, ReyRey and Vizquel came out pretty high with the Fans evaluations.
The Scouting Report, By the Fans, For the Fans - 3B Report (March 26, 2004)
Posted 9:28 a.m.,
March 29, 2004
(#4) -
tangotiger
I think someone did mention it, and I have not done it.
So, I'll add age and offensive performance to the mix to see if those indeed have an impact.
Copyright notice
Comments on this page were made by person(s) with the same handle, in various comments areas, following Tangotiger © material, on Baseball Primer. All content on this page remains the sole copyright of the author of those comments.
If you are the author, and you wish to have these comments removed from this site, please send me an email (tangotiger@yahoo.com), along with (1) the URL of this page, and (2) a statement that you are in fact the author of all comments on this page, and I will promptly remove them.