tangotiger posted September 22nd, 2000 06:00 PM
Senior Member
Member Since: May 2000
Location:

Also, to compound the error, when the pitcher is on base, the top of the order comes up to bat, almost giving the pitcher too much credit for the runs scored when he does get on base.

However, now that the top of the order bats in the pitcher's inning, this removes the possibility of the top of the order batting in their own (next) inning.

The more I think about it, the more I believe RE must be calculated to end of game.

mgl posted September 22nd, 2000 07:29 PM
Senior Member
Member Since: Apr 2000
Location:

Tango,

The RE is to the end of the inning. I'll have to give it some thought as to why (and how) I would want to use the end of the game. For example, how would I adjust the RE's when they are values like 5.0, etc.? And obviously, RE's at the beginning of the game are higher than at the end. Could you explain the EXACT methodology for using the end of the game as an endpoint?

tangotiger posted September 22nd, 2000 10:00 PM
Senior Member
Member Since: May 2000
Location:

MGL, I have no idea how. I just know the why.

If you just pick out all the real games where the #9 hitter went 0-fer, I'll bet his LW out value will be very low. Just because he doesn't get a hit (and consequently drastically reduces the number of runs in the inning) doesn't mean that his out value should be worth less than someone else's. (That's why I call it the Pedro effect, because for the pitcher it DOES mean that his out is worth less.) If anything, the out value of the #8 hitter should go way down, since the expectation is that the #9 hitter will not get a hit. When the #9 hitter gets an out, he prevents the top of the order from coming up.

If I understand your RE methodology correctly, you determine the number of runs expected in any given inning at any point in time, with the understanding that the state will go to zero once you're at 3 outs. And I guess the problem is that you have an inner state that must get to 0 (end of inning), and a final state that also gets to 0 (end of game). I'm not sure how to resolve this.

Maybe we can tackle this from a different angle. You have a sim, right? What if you create a lineup of exactly equal batters? This should give you the true batter-neutral LW by batting order.

Now, the next variable to introduce is to create the #9 hitter as a player who always gets out. Then run your RE program against these results. My guess is that your LW will show the #9 spot with a very low LW for the outs. And this is wrong. The only difference is the #9 hitter. But the guys around him, the guys who should have the ONLY impact on a player's LW, are exactly the same. The only reason for the #9 hitter to have slightly different LW values from the batter-neutral test is if he comes up with different base/out number of occurrences.

It's almost like we're saying after the fact that we EXPECT the #9 hitter to not get a hit, and therefore WHOEVER (like a pinch hitter) is in the #9 spot, his outs are not worth as much. If we introduce a pinch hitter, then the LW values of the #9 hitter will be drastically changed. Even worse, his outs will be worth MORE. And this can't be right.

The batter operates in a vacuum, where the other batters change. Unlike Pedro, who is involved in every plate appearance.

Anyway, sorry if I'm pounding this thing dead....

mgl posted September 22nd, 2000 10:12 PM
Senior Member
Member Since: Apr 2000
Location:

I'm glad you wrote such a "treatise." I have to ruminate and cogitate..

tangotiger posted September 22nd, 2000 11:49 PM
Senior Member
Member Since: May 2000
Location:

Let me get back to the out value for the leadoff hitter. Yes, I was definitely wrong. The normal out value for any batter is about .30 runs. The one situation where the leadoff hitter has more occurrences than anyone else is the one with bases empty and no outs. And the value of that out is .26 runs. I estimate then that the out value of a leadoff hitter should be worth about .29 runs if he operates in a vacuum. Given, however, that the best hitters on the team bat after him, the .30 runs for the AL #1 seems right, and no adjustment is required for the leadoff batter for outs.

Given that, and assuming my leadoff values, here is the best leadoff hitter's PROFILE since 1975: Vince Coleman, 1986. Remember, with my assumptions, the adjustments were very minor for all offensive events except for the home run. So basically, whoever gets the fewest homers is best positioned for the leadoff spot. The worst one is Sammy Sosa, 1998. And the range between best and worst was only 7 runs.

However, trying MGL's adjustments (+.06 BB, +.01 1B, -.01 2B, +.02 3B, -.10 HR, +.04 SB, +.04 CS), we get: Rickey Henderson 1982, 1983, 1980, and Tim Raines 1981. The 6 worst were all Matt Williams and Juan Gonzalez. The range top to bottom was 15 runs. Marquis Grissom, in his Montreal days, was an average leadoff hitter type.

Anyway, I think the more interesting test is to take a team's lineup, and try to find the optimal setup, based on MGL's AL numbers.


mgl posted September 23rd, 2000 12:13 AM
Senior Member
Member Since: Apr 2000
Location:

I'll try and put the lwts/batting order matrix (NL and AL) into a spreadsheet. I can use my program to spit out some kind of a space or comma delimited text (ascii) file and then import this into a spreadsheet like Excel (I think). I don't work with spreadsheets, so I'm not sure. If anyone wants this, drop me an e-mail. I still have to think about this "end of the inning/end of the game" thing, and the out value thing. As Tango says, it shouldn't be lower in any given spot unless the OTHER spots have something weird going on. In other words, the frequency of a single (or any other event(s)) in any given slot should have no influence on the VALUE of the single (or other event) in that same slot. Tango is right about the pitcher/batter analogy. That's why I always say you should use OTS (on base TIMES slugging) for pitchers and OPS (on base PLUS slugging) for hitters. It's the same idea.

David Smyth posted September 23rd, 2000 08:06 AM
Sports Guru
Member Since: Dec 1999
Location: Lake Vostok

Could the reason for the low NL 9th slot outs value be due to the sac bunt? Whenever there's a productive situation, where the cost of an out would be high, the pitcher bunts. My interpretation of mgl's nomenclature is that sacrifices are not included in the out values, which would mean that mostly bases empty situations and 2-out situations comprise most of the out value data.

Speaking of nomenclature, in an above post mgl gave me info on the values of IBB, HBP, etc. I'm about the last person who needs such a 'lesson'. I was simply unsure what was being included/not included in order for the out weight to balance out to zero.

And mgl, I maintain that OTS is better than OPS, even for batters. (I know all about the batter interacting with himself problem). OTS does allow some batter self-interaction, but much less than actually occurs on the team level. And what it does allow is compensated for, relative to OPS, by the much better balancing of on base/advancement issues. The only way to make OPS possibly better is normalize SLG and OBA to the league averages before adding them, but then you lose simplicity.

tangotiger posted September 23rd, 2000 11:44 AM
Senior Member
Member Since: May 2000
Location:

Boring stuff: using Excel. You can cut and paste exactly the matrix MGL posted into Excel. This will stick everything into one column of cells. You can "parse" the data into separate cells by clicking "Data/Text-To-Columns" and use "space" as the delimiter. Or you can import from comma-separated, and use "comma" as the delimiter.

tangotiger posted September 23rd, 2000 12:11 PM
Senior Member
Member Since: May 2000
Location:

Here's a little more food for thought.

Let's say the pitcher comes up with no one on and 2 outs. The Run Expectancy (RE) for a typical player is 0.10 runs, but for a pitcher it's 0.06 runs (just examples here). But these numbers are AFTER THE FACT.

Let's also say that the only two possible events in baseball are the out and the hit. If the 9th hitter gets an out, then the RE for the rest of the inning is zero (end of inning). If however the 9th hitter gets a hit, the RE for the rest of the inning (the top of the order) is 0.24 runs.

Therefore, for a pitcher, the LW is .06 for an out, and .18 for a hit. For a pinch hitter, the LW is .10 for an out and .14 for a hit.

What if instead, we look at the RE for the REST of the inning for EACH POSSIBLE type of event AFTER THE BATTER did his deed (the hit and the out). Whether it's the pitcher or the pinch hitter, after the hit, the RE for the top of the inning is ALWAYS .24. And after the out, regardless of the batter, the RE is zero.

Now, work it backwards and assume it is a league-average type player in that batting spot (batting-order neutral). What would a typical batter do? Well, let's say that 35% of the time the average batter gets a hit with 2 outs and the bases empty, and 65% of the time an out. Then you simply take .24 * .35 + .00 * .65 = .08 runs. THAT is the RE of an AVERAGE batter given the situation of having the top of the order coming up behind him (batting order specific).

In reality, there are a dozen batting offensive events that you have to account for.
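
A minimal sketch of the two-event arithmetic above, using only the numbers quoted in the post. The function names are made up for illustration, and the out values come out negative here (the post quotes them as magnitudes).

```python
# Toy two-event model from the post: the batter either makes an out or gets a hit.
RE_AFTER_HIT = 0.24  # RE for the rest of the inning after a hit (top of the order up)
RE_AFTER_OUT = 0.00  # the third out ends the inning


def before_re(p_hit):
    """RE before the plate appearance for a batter who gets a hit at rate p_hit."""
    return p_hit * RE_AFTER_HIT + (1.0 - p_hit) * RE_AFTER_OUT


def linear_weights(re_before):
    """Value of each event = RE after the event minus RE before it."""
    return {"hit": RE_AFTER_HIT - re_before, "out": RE_AFTER_OUT - re_before}


avg_re = before_re(0.35)            # league-average batter: .24 * .35 = .084
avg_lw = linear_weights(avg_re)     # weights if an average batter hits in this spot
pitcher_lw = linear_weights(0.06)   # weights implied by the pitcher's observed RE

print(f"average batter RE: {avg_re:.3f}")
print(f"average batter LW: {avg_lw}")
print(f"pitcher LW:        {pitcher_lw}")
```

The point of the comparison is that the "after" values never change with the batter; only the "before" RE does, and that is what drags the pitcher's weights away from the average batter's.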

What do you think?

mgl posted September 23rd, 2000 01:54 PM
Senior Member
Member Since: Apr 2000
Location:

David,

I realize that you don't need a lesson in IBB/HBP, etc. Sometimes I think aloud on these boards. Also, you are not the only person that reads these posts. So even if I am answering a question that you posed, I may include some info that is geared towards other folk... You really are a snob.

As far as the sac bunts...

Last night I thought that was the answer to the low out problem for the #9 NL slot. I had some bugs in my program (although that wasn't the reason for the out values). I did include sac bunts in the out values, which would lower the overall values, especially for the #9 (and other "weak") slot. Also, I did not include errors which should be included, I guess, and should raise the out value. I also did not include FC's, which shouldn't matter all that much, I don't think.

Anyway, it dawned on me around 3 in the AM what Tango was talking about. The "problem" is in using the RE's for each slot to come up with the values. As Tango said, if the #9 slot always makes an out, the RE of the 8th slot with 2 outs, will always be 0. Since the RE after the 3rd out is always 0, the value of an out for the 9th slot is 0 minus 0 (RE before and after) or 0. It wouldn't be fair to use this value (0) for another potential hitter. This other hitter, assuming he didn't make an out all the time, would change the before RE (to something positive), so the value of his out would not be 0. Same goes for all the other events, of course. So the reason that the #9 slot in the NL is so low, is because the RE for the #8 hitter is so low, especially with 2 outs.

I have no idea how to handle this. Tango may be on to something in his last post. I have to think about that...

tangotiger posted September 23rd, 2000 04:05 PM
Senior Member
Member Since: May 2000
Location:

MGL, Not too often do people think about me or what I say at 3AM. I guess it's too much to hope for that you are actually Laetitia Casta or one of her Victoria's Secret model friends?

That "average hitter thing" is probably what makes the most sense. The question we are always trying to answer is "how much better than average is he", etc, etc, or "all things being equal, meaning given the exact same situation and environment, what would an average player have done".

The quickest solution is to run your sim with the average stats for each of the 9 batting spots (i.e., pitcher averages in the 9th spot, McGwire, Sosa and their brothers' averages in the cleanup). Then, run 9 sims, and with each execution, replace the true stats for that particular batting spot with the "league average of all spots". This way, you always have 8 controlled stats, and one variable.

Then, run your RE 9 times, but limit it only to the batting spot where you introduced the variable. This will give you the true RE, based on the exact type of batters surrounding him (as I described in my last post).
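
The nine controlled runs described above could be set up roughly like this. It is only a sketch: the profiles are placeholder strings, and the sim and RE steps are left as comments because the thread does not show how MGL's program is invoked.

```python
def controlled_lineups(slot_profiles, league_average):
    """Return one lineup per slot; lineup i swaps slot i's profile for the league average."""
    lineups = []
    for i in range(len(slot_profiles)):
        lineup = list(slot_profiles)   # copy, so the other 8 slots stay controlled
        lineup[i] = league_average     # the single variable for this run
        lineups.append(lineup)
    return lineups


# Hypothetical placeholders: a real run would use stat profiles, not labels.
slot_profiles = [f"avg_slot_{n}" for n in range(1, 10)]
lineups = controlled_lineups(slot_profiles, "league_avg")

for slot, lineup in enumerate(lineups, start=1):
    # Here one would run the sim on `lineup`, then compute the RE table
    # using only the plate appearances of batting spot `slot`.
    print(slot, lineup)
```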


David Smyth posted September 24th, 2000 09:52 PM
Sports Guru
Member Since: Dec 1999
Location: Lake Vostok

I hope you guys (mgl and Tango) post a clearly written explanation when/if you get to the end of this analysis. I (and probably most others), am not really sure what everything means. I think I'm capable of understanding it; I just need a clear, organized presentation of this interesting material.

David Smyth posted September 25th, 2000 09:27 AM
Sports Guru
Member Since: Dec 1999
Location: Lake Vostok

I think I got it now. The numbers from each slot are supposed to have a certain type of independence from each other, but they don't. The values for each slot are dependent on what the production is from the surrounding slots. Tango's suggestion is intended to control for this and produce the desired independence.

Is that it?

While I'm here, I have a question. Why is the value for CS (-.47) so much higher than in other eras? I realize it's a high scoring era, etc., but if that's the reason, then why is the BB value (.33) the same as it is at other times? The CS value has a much larger change than any other value, looking at the present and the past. The answer that that's what it is, by definition, isn't really an explanation. If more CS were occurring nowadays with 0 outs, that would make sense, but I don't think that's the case. If SB attempts have suddenly become much more randomly distributed than before, that might make sense, but I see no evidence of that--and besides, the SB value (.19) is the same as it always is.

tangotiger posted September 25th, 2000 11:44 AM
Senior Member
Member Since: May 2000
Location:

SB: As it turns out, the chance of scoring from 1B and 2B is always separated by about .17 runs, regardless of the scoring era. That is, if there is a 30% chance of scoring from 1B, there is a 47% chance of scoring from 2B. 24% chance of scoring from 1B? 41% chance of scoring from 2B. I only looked at this in the past quickly, but that was the basic result.

CS: The CS has 2 factors. One is how many runs are lost by the removal of the runner, and the other is the runs lost on the out. Both of these are completely dependent on the scoring environment. In the above example, the runner's lost value can be -.24 runs or -.30 runs or whatever era you happen to be in. Even worse, the more runs that the team scores, the more negative runs the out is worth. In today's high scoring era, you'd be crazy to be caught stealing since the out alone costs you .30 runs.

CS, part 2: adding up those values, we get -.55/-.60 runs. But all the regression analyses show that the CS costs much less. This brings up the old argument of the out as a fudge factor, where if you use -.10 runs for the out, the result is true runs scored, and if you use -.30 runs for the out, the result is runs above average. It's apparent that we should use the -.30 on all types of outs. The debate over this is probably best saved for a new thread.

What we are trying to do: Yes, David, I think you've got it. Basically, if there are 5000 plays with a runner on 1st and 2nd and 2 outs, and by the end of the inning there are 2000 runs scored, we can say that the Run Expectancy (RE) is 0.40 runs for that combination. There are 24 such base/out combinations, and you calculate the RE in this kind of manner (unless MGL does something different). The problem though, for what we are trying to do, is that the pitcher impacts the RE so much for his lack of hitting ability. Perhaps, if the pitcher came up with those 5000 situations, maybe only 1000 runs would have scored by the end of the inning, meaning a RE of 0.20 runs. This is fine, but when you try to extrapolate the value of each offensive event, just because THAT hitter only has a RE of 0.20 runs doesn't mean that the AVERAGE batter, given those situations and circumstances, would have gotten 0.20 runs. He clearly would not.

This is why I proposed splitting up each of those 24 situations to reflect what happens after a particular event (single, walk, homer, out). Then you would apply the league average OCCURRENCE rate of each of those events to simulate an average batter coming up in those number of situations. This would force the RE up, since the RE will go up with the more hits and fewer outs created.
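
As a rough sketch of the two steps just described: the empirical RE is runs divided by plays, and the "average batter" RE re-weights the expected runs after each event by league-average event rates. The 2000/5000 and 1000/5000 counts are from the post; the event rates and post-event values below are invented placeholders, not real data.

```python
def run_expectancy(runs_scored, plays):
    """Empirical RE for one base/out state: runs scored by end of inning per play."""
    return runs_scored / plays


re_all = run_expectancy(2000, 5000)      # all batters, from the post
re_pitcher = run_expectancy(1000, 5000)  # pitchers only, the post's guess


def average_batter_re(event_rates, runs_after_event):
    """Weight the expected runs after each event by the league-average event rate."""
    return sum(rate * runs_after_event[event] for event, rate in event_rates.items())


# Hypothetical league-average event rates for this state (must sum to 1).
event_rates = {"out": 0.68, "walk": 0.09, "single": 0.20, "homer": 0.03}

# Hypothetical expected runs from the event onward (runs scoring on the play
# plus the RE of the resulting state) with runners on 1st and 2nd and 2 outs.
runs_after_event = {"out": 0.00, "walk": 0.75, "single": 1.45, "homer": 3.10}

re_average_batter = average_batter_re(event_rates, runs_after_event)
print(re_all, re_pitcher, round(re_average_batter, 3))
```

Because `runs_after_event` does not depend on who is batting, swapping in a pitcher's event rates lowers the result and swapping in league-average rates raises it, which is the effect the post is after.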

Once you've got that down, then it's just a matter of running MGL's program to determine the actual LW values.


mgl posted September 25th, 2000 03:54 PM
Senior Member
Member Since: Apr 2000
Location:

Tango,

Love your explanation of the SB/CS thing! To simplify it, positive events change little with an increase or decrease in run scoring and negative events (outs and removal of baserunners) change a lot. The "reason" this is so is because an increase in say .5 runs per game gets spread out over several offensive events (s,d,t,hr,bb, etc.). Since the out value must always counterbalance the positive values of the other events, all of this extra .5 runs per game also goes into one term, the out (negatively, of course).

BTW, the way I figure the RE's of all the bases/outs situations IS exactly as you describe.

I still have to think about your other suggestions...

David Smyth posted September 25th, 2000 06:00 PM
Sports Guru
Member Since: Dec 1999
Location: Lake Vostok

I understand what you's are saying about the CS value, but it still doesn't add up.

First of all, even though outs are only one category, there are more outs made than the sum of the positive events, which would at least partially negate what mgl just wrote.

But more importantly, according to the Hidden Game of Baseball, a CS in 1921-40 was worth -.39 and an out -.30. The ratio CS/out is therefore 1.3. For 1941-60, it's -.36, -.27, and 1.33, respectively. For 1961-77, it's -.32, -.25, and 1.28, respectively. But now it's -.47 and -.30, for a ratio of 1.57, which is way out of line with those other eras.

Why should the ratio of a CS to a regular out not still be at around 1.3?
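
For anyone who wants to check the ratios, here is a small sketch using the CS and out values quoted above (as magnitudes). The dictionary keys are just labels chosen for this example.

```python
# (CS cost, out cost) per era, as quoted in the post.
eras = {
    "1921-40": (0.39, 0.30),
    "1941-60": (0.36, 0.27),
    "1961-77": (0.32, 0.25),
    "mgl":     (0.47, 0.30),
}

# Ratio of the cost of a caught stealing to the cost of a regular out.
ratios = {era: cs / out for era, (cs, out) in eras.items()}

for era, ratio in ratios.items():
    print(f"{era}: CS/out = {ratio:.2f}")
```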

Patriot posted September 25th, 2000 08:56 PM
Sports Guru
Member Since: Jul 2000
Location: Ohio

Actually, I would ask if the value should be less because it is easier to replace the baserunner in a high offense context. Of course, then the out costs more, so it might negate it.

tangotiger posted September 26th, 2000 11:48 AM
Senior Member
Member Since: May 2000
Location:

Actually, it should not be a ratio, but a difference.

value of CS = value of Out + value of lostRunner

Reworking, we get:

lostRunner = CS - Out

Therefore, in your examples, the value of the lostRunner (in the early days) was .10 runs. And this is wrong. In the MGL example, it is .17, and this is wrong as well.

This only makes sense if there are an abundant number of CS in non-random fashion, where the chance of the runner scoring is already so low (say with 1 or 2 outs).

Given that I trust MGL's methodology, I will then assume that there is a lot of CS with 1 or 2 outs, where the chance of the runner from 1B scoring is probably like 22% (instead of 27%), and where the value of the out, rather than being .30 runs is more like .25 runs. Adding the two, and you get a -.47 runs for the SITUATION-DEPENDENT CS.

The league average non-situation-dependent out remains -.30 runs.
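
The difference version of the argument can be sketched the same way. Values are treated as run costs (magnitudes); the function name is made up for this example, and the .22 and .25 figures are the post's own guesses, not measured values.

```python
def lost_runner_value(cs_cost, out_cost):
    """lostRunner = CS - Out, with both values expressed as run costs."""
    return cs_cost - out_cost


early = lost_runner_value(0.39, 0.30)   # 1921-40 figures quoted earlier in the thread
modern = lost_runner_value(0.47, 0.30)  # mgl's figures

# The post's situational guess: in typical CS situations the runner is worth
# about .22 runs and the out about .25 runs.
situational_cs = 0.22 + 0.25

print(f"early lost runner:  {early:.2f}")
print(f"modern lost runner: {modern:.2f}")
print(f"situational CS:     {situational_cs:.2f}")
```

The early-era difference comes out at about .09, which the post rounds to .10.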

As for the past, if we had all the stats, maybe we could explain it the same way.




David Smyth posted September 26th, 2000 08:35 PM
Sports Guru
Member Since: Dec 1999
Location: Lake Vostok

To me, Tango, your equation "lost baserunner= CS-out", only seems to work, or make sense, if you use the .10 instead of the .30 out value. Here is what Palmer wrote about the out value: "An out is considered to be a hitless at-bat and its value is set so that the sum of all events times their frequency is zero...".

IOW, the (his) CS value does not receive this added treatment. Using .09 for the out value in those eras, for 1921-40, lost BR = .39-.09, which is .30. For 1941-60, it's .36-.09 = .27. For 1961-77, it's .32-.09 = .23.

All of these lost baserunner values seem reasonable, given a bit of variation in the typical number of outs in the inning on a CS. I think your own estimate for a baserunner at 1st was .27 runs.

For the now era, the equation is .47- .10 = .37. Is a current baserunner worth .37 runs? Mgl's BB value is .33. If you take off a bit to remove the advancement value of the walk, the remainder is around .27 instead of .37.

No, it still doesn't add up, IMHO. There's something wrong.

tangotiger posted September 27th, 2000 12:55 AM
Senior Member
Member Since: May 2000
Location:

I agree, there may be something wrong. I am only giving one possible explanation. The out value of -.10 runs is the league average for all base/out situations. If it turns out that the distribution of CS is not random, and skewed towards situations where there is a high penalty value for the out, then .47 - .17 = .30, and there lies one possible explanation.

While I champion the use of -.10 for most situations, I'm still undecided on issues like this, and that maybe the -.25/-.30 might be better. Cases can be made on both sides.

David Smyth posted September 27th, 2000 11:16 AM
Sports Guru
Member Since: Dec 1999
Location: Lake Vostok

1) According to mgl, the SB attempts nowadays are not really 'skewed' much from random.

2) For the out value to go from -.10 to -.17 because of the average CS situation would require a huge, huge situational variation. If the average CS situation were that much different nowadays than what it used to be, we would be quite aware of it even just by observing games.

The explanation lies elsewhere.

tangotiger posted October 2nd, 2000 01:21 AM
Senior Member
Member Since: May 2000
Location:

Going back to my earlier post:

====== snip ====
Here's a little more food for thought.

Let's say the pitcher comes up with no one on and 2 outs. The Run Expectancy (RE) for a typical player is 0.10 runs, but for a pitcher it's 0.06 runs (just examples here). But these numbers are AFTER THE FACT.

Let's also say that the only two possible events in baseball are the out and the hit. If the 9th hitter gets an out, then the RE for the rest of the inning is zero (end of inning). If however the 9th hitter gets a hit, the RE for the rest of the inning (the top of the order) is 0.24 runs.

Therefore, for a pitcher, the LW is .06 for an out, and .18 for a hit. For a pinch hitter, the LW is .10 for an out and .14 for a hit.

==== end snip ===

I'm thinking about how I can use this using MGL's batting spot LW. First off, if we assume (big assumption which we'll try to prove in another post) that the value of the out stays pretty constant from batting spot to batting spot, then maybe we can combine my above example with MGL's numbers.

In the above example, the value of the out should be .10 and the value of the hit should be .14. Yet, MGL's method shows that it is .06/.18 (and this is because the value of the hitter introduced the error... MGL explained it a bit better). In any case, if we KNOW that the out should be .10 and not .06, then we adjust the hit the EXACT same value of .04 from .18 to .14, so that everything balances out.

Following this logic (if it's even logical), and looking at MGL's #9 NL numbers, out is .186 runs, and the average out is .271 runs. That difference is .085 runs. Therefore his offensive numbers (bb,s,d,t,hr,wsb,cs) now change to .345,.475,.775,1.105,1.475,.245,-.515. These numbers also turn out to be much closer to the league average, with the exception of the home run.
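
The rebalancing step above amounts to moving every other event by the same amount the out was moved. A minimal sketch, with a hypothetical `rebalance` helper; MGL's unadjusted #9 values are not quoted in the post, so only the toy example and the size of the NL shift are computed here.

```python
def rebalance(weights, observed_out, target_out):
    """Shift every non-out event by the same amount the out value was moved.

    Out values are passed as costs (magnitudes), as in the post; event
    weights are signed run values.
    """
    shift = target_out - observed_out
    adjusted = {event: value - shift for event, value in weights.items()}
    return shift, adjusted


# Toy example from the post: out .06 -> .10, so the hit moves .18 -> .14.
shift, adjusted = rebalance({"hit": 0.18}, observed_out=0.06, target_out=0.10)
print(f"shift = {shift:.2f}, adjusted hit = {adjusted['hit']:.2f}")

# #9 NL slot: observed out .186 versus the average out .271.
nl9_shift = 0.271 - 0.186
print(f"NL #9 shift = {nl9_shift:.3f}")
```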

In a future post I will calculate how much an out SHOULD be worth based on batting order.

For now, these are the best profiles for each batting spot, assuming all the above is true:
leadoff - high walks, low homers, high SB
2nd - high walks, high singles, high SB, low CS, low HR
3rd - pretty much average all the way thru
4th - high singles, high doubles, high HR,
5th - low SB
6th - low singles, low doubles, low SB
7th - low singles, low walks, low doubles, low SB
8th - same as 7th
9th - high HR, high SB

Some interesting things come out here: maybe the best OBA should be the #2 hitter? Your best hitter is your #4 hitter. Your worst hitters are #7 and #8. The #9 is NOT the "2nd leadoff position" but rather a one-dimensional home run guy (Butch Hobson maybe?). I think his value comes from the fact that the #7 and #8 hitters are bad hitters who have a hard time getting into scoring position, so the value of the HR of the #9 hitter is enhanced.

however, note that if the hitting order is changed to fit the above profile, then the whole thing has to be re-run, since everything affects everything else.

given all that, here are MGL's numbers, adjusted by my thesis above:

slot       1       2       3       4       5       6       7       8       9
out   -0.286  -0.286  -0.286  -0.286  -0.286  -0.286  -0.286  -0.286  -0.286
bb     0.382   0.395   0.333   0.344   0.308   0.272   0.259   0.269   0.349
s      0.492   0.520   0.493   0.519   0.468   0.422   0.414   0.424   0.484
d      0.767   0.780   0.778   0.809   0.793   0.727   0.734   0.734   0.789
t      1.057   1.095   0.993   1.024   1.063   1.097   1.074   1.054   1.109
hr     1.292   1.360   1.393   1.439   1.418   1.402   1.389   1.399   1.454
sb     0.217   0.215   0.178   0.169   0.148   0.142   0.144   0.149   0.229
cs    -0.403  -0.510  -0.477  -0.411  -0.427  -0.423  -0.421  -0.421  -0.486

tangotiger posted October 2nd, 2000 01:39 AM
Senior Member
Member Since: May 2000
Location:

The value of the typical out is about .285 runs. The value of the typical out with no one on base is about .185 runs. Since the no one on base scenario happens about 55% of the time, we can figure that the value of the out with someone on base is about .407 runs. Now, the #4 hitter probably comes up to bat with a runner on base 50% of the time. Working that out, the value of the typical out for the cleanup hitter is .296 runs. So, we can assume that the value of the out, dependent on the batting position should be +/- .01 runs. Given that, it is clear that the OBA of the batter should not have too much of an effect when determining batting order.
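
The weighted-average arithmetic above can be checked with a few lines. All inputs are the post's estimates; the constant and function names are made up for this sketch.

```python
TYPICAL_OUT = 0.285  # value of the typical out (run cost)
EMPTY_OUT = 0.185    # value of the out with no one on base
P_EMPTY = 0.55       # share of plate appearances with the bases empty

# Solve TYPICAL_OUT = P_EMPTY * EMPTY_OUT + (1 - P_EMPTY) * on_base_out
on_base_out = (TYPICAL_OUT - P_EMPTY * EMPTY_OUT) / (1 - P_EMPTY)


def out_value(p_runner_on):
    """Out value for a batter who comes up with a runner on at rate p_runner_on."""
    return (1 - p_runner_on) * EMPTY_OUT + p_runner_on * on_base_out


cleanup_out = out_value(0.50)  # the post's guess for the #4 hitter

print(f"out with runner on: {on_base_out:.3f}")
print(f"cleanup out value:  {cleanup_out:.3f}")
```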

If someone has the exact numbers, feel free to post them.

mgl posted October 2nd, 2000 04:56 AM
Senior Member
Member Since: Apr 2000
Location:

Here's what I think you have to do to come up with the proper (a good approximation) lwts values for each spot in the BO:

I think Tango's suggestion was the following. Use an average player's RE for the "before" RE for every slot. Then in order to calculate the "after" RE, use the frequency that each slot comes up to bat in the various bases/outs situations. Then use the average player's RE's for the "after" RE's. E.g., say the leadoff batter comes up with no one on and no one out 10% of the time. Use the average batter's RE for no one on and no one out (around .6 runs). So in that situation, a walk or a single is worth whatever a runner on first and no one out is for an average batter minus .6. Of course multiply this number by .1 (10%). Do the same thing for all of the 24 bases/outs situations. I think this is what Tango suggested. Is it, Tango? This eliminates the problems associated with my original lwts by batting order - namely that the "before" RE's are generated from an average batter in that slot. So if we want to substitute another batter in that slot - one that is not average - we would actually have to recalculate the "before" RE's, which would give us different lwt values for that slot. It's kind of like a recursive procedure (with only one "recurse").
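
A rough sketch of the frequency-weighted calculation described above, for a walk with the bases empty. Only the .6 RE (no one on, no one out) and the 10% frequency come from the post; every other number is a hypothetical placeholder. Because the sketch lists only two of the 24 states, it divides by the frequency covered, which is an assumption added here.

```python
# League-average RE by (bases, outs). Only the 0.60 figure is from the post;
# the rest are hypothetical placeholders.
AVG_RE = {
    ("empty", 0): 0.60, ("1st", 0): 0.95,
    ("empty", 1): 0.32, ("1st", 1): 0.58,
}

# Hypothetical share of the leadoff slot's plate appearances in each state.
# A real table would cover all 24 states and sum to 1.
slot_freq = {("empty", 0): 0.10, ("empty", 1): 0.05}


def walk_value(slot_freq, avg_re):
    """Frequency-weighted value of a walk over the bases-empty states listed."""
    total = 0.0
    covered = 0.0
    for (bases, outs), freq in slot_freq.items():
        if bases != "empty":
            continue  # this sketch only handles walks with no one on
        # Value of the walk = average RE after (runner on 1st) minus average RE before.
        total += freq * (avg_re[("1st", outs)] - avg_re[(bases, outs)])
        covered += freq
    return total / covered


print(f"walk value for this slot: {walk_value(slot_freq, AVG_RE):.3f}")
```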

There is one weakness to this (Tango's) method, although it may be better than using my original lwts values by batting order. Using league average "before" and "after" RE's for all 9 slots does not account for the slots AFTER the slot we are interested in. In other words, if we put in Barry Bonds in the #8 hole, and wanted to see how he fared there - it wouldn't be fair to use my lwts values for the 8 hole because they were generated using the "before" RE's assuming an average # 8 hitter. However, it also wouldn't be fair to use the "before" RE's for an average hitter, since the #9 (and #1 and #2, etc.) hitter has a significant effect on the "before" RE's of the # 8 hitter. For example, in the NL, the value of a HR in the #8 hole is probably quite a bit higher than the value of a single with no runners on and 2 outs because the #9 (and #1) hitters are not likely to drive in a runner on first with 2 outs, while a HR creates a run right away.

So here is the compromise between Tango's suggestion (if I got it right) and my original methodology. I think this will work the best, although it will require a lot of work (I'll work out the numbers). Use an average player's stats for each slot, but use the subsequent hitter's average stats for his (the subsequent) slot, and THEN calculate the "before" and "after" RE's. In other words, start with the frequencies for the 24 bases/outs matrix for each slot, as Tango suggests. E.g., the #8 hole bats with 0/0 (no one on, no one out) 8% of the time, 1/0, 5% of the time, etc. The "after" RE's will remain the same as the ones I have in my program - the real RE's for the subsequent slot, in this case the #9 hitter. It's the "before" RE's that are tricky. The problem before was that the "before" RE's are based on a typical #8 player. We somehow have to compute the "before" RE's based on this new player we are evaluating in the #8 slot. Here's the way to do that. It's a bit cumbersome. We take each bases/outs situation (24 of them) and then use our new player's stat profile to do the following: Let's say our new player gets a single 15% of the time, a double 5% of the time, etc. We have to take each one of those percentages and do the following: Start with the first bases/outs situation - no one on and no one out. A single puts a runner on 1st with no one out. We now take the RE for the #9 hitter with a runner on first and no one out. This number multiplied by .15 (frequency of a single) is the partial "before" RE. How do we get the whole "before" RE? We do the same thing for the double with no one on and no one out. We take the #9 slot RE for a runner on 2nd and no one out and multiply this by .05 (frequency of a double by our new hitter). We then add this number to our partial "before" RE to get a new partial RE. 
Obviously, we keep on doing this for all the possible offensive events (using the frequencies of these events of our new hitter) and all 24 of the bases/outs situations. We then have 24 times 6 (144) different calculations to do (24 bases/outs situations and 6 possible offensive events, not including sb, cs, or GDP).

When we add the results of all these calculations, voila, we have the "before" RE for any hitter in any slot!
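
The procedure above can be sketched in a few lines of Python. This is only a rough illustration, not anyone's actual program: the runner-advancement rules (every runner moves exactly as many bases as the batter, runners hold on an out) are a simplifying assumption, the names `advance` and `before_re` are made up, and the stat profile and RE table are placeholder numbers. The sketch also adds the runs that score on the play itself and treats a third out as ending the inning with zero further runs, which the description above leaves implicit.

```python
from itertools import product

# All 24 bases/outs states: bases is (first, second, third) as 0/1 flags.
STATES = [(bases, outs) for outs in range(3) for bases in product((0, 1), repeat=3)]

def advance(bases, outs, event):
    """Return (new_bases, new_outs, runs_scored) under simplified advancement."""
    b1, b2, b3 = bases
    if event == "OUT":
        return bases, outs + 1, 0          # runners hold; no GDP or sac fly
    if event == "BB":
        if b1 and b2 and b3:
            return (1, 1, 1), outs, 1      # bases loaded: a run is forced in
        if b1 and b2:
            return (1, 1, 1), outs, 0
        if b1:
            return (1, 1, b3), outs, 0
        return (1, b2, b3), outs, 0
    n = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}[event]
    new_bases, runs = [0, 0, 0], 0
    # Position 0 is the batter; every runner moves exactly n bases (assumption).
    for pos in [0] + [i + 1 for i, on in enumerate(bases) if on]:
        if pos + n >= 4:
            runs += 1
        else:
            new_bases[pos + n - 1] = 1
    return tuple(new_bases), outs, runs

def before_re(event_freq, next_slot_re):
    """'Before' RE for each of the 24 states, given the new hitter's profile."""
    table = {}
    for bases, outs in STATES:
        total = 0.0
        for event, p in event_freq.items():
            nb, no, runs = advance(bases, outs, event)
            # Three outs ends the inning, so nothing more is expected.
            future = 0.0 if no == 3 else next_slot_re[(nb, no)]
            total += p * (runs + future)
        table[(bases, outs)] = total
    return table

# Made-up inputs for illustration only.
profile = {"1B": 0.15, "2B": 0.05, "3B": 0.01, "HR": 0.04, "BB": 0.10, "OUT": 0.65}
flat_re = {state: 0.5 for state in STATES}   # placeholder "#9 hitter" RE table
table = before_re(profile, flat_re)
```

With a real RE table for the next slot in place of `flat_re`, `table[((0, 0, 0), 0)]` would be the "before" RE for this hitter with no one on and no one out. The inner loop runs 24 times 6 = 144 times, matching the count above.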

Now we can use the real "after" RE of the subsequent batter (I have these numbers in my program) and the "before" RE that we just calculated based on the stat profile of the new batter and the bases/outs frequencies of that slot, to come up with very (perfectly?) accurate lwts values. Of course, using the rigorous method I have just described, we are assuming that every other slot besides the one we are putting a certain player into holds an average player. This is fine. We have to assume something. In reality, if we are trying to construct a perfect batting order for a particular team given 10 or 15 different possible players, putting one player in one slot will affect the "before" and "after" RE's of all the other slots, over and above the methodology I just described to calculate the "before" RE (you couldn't use a standard "after" RE either); but practically speaking, I think this methodology is pretty darn close to perfect.
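
As a rough sketch of how the "before" and "after" RE's combine into a lwts value: for one event type, take (runs scored on the play + "after" RE - "before" RE) in each bases/outs state and average over how often the slot bats in that state. The function name `linear_weight`, the state labels, and every number below are made up for illustration; a real table would cover all 24 states.

```python
def linear_weight(state_freq, before_re, after):
    """Average run value of one event type across bases/outs states.

    after[state] = (runs_scored_on_play, after_re) for this event in that state.
    """
    return sum(
        p * (after[state][0] + after[state][1] - before_re[state])
        for state, p in state_freq.items()
    )

# Made-up two-state example (hypothetical numbers, not real RE values).
state_freq = {"empty_0out": 0.6, "first_0out": 0.4}
before = {"empty_0out": 0.50, "first_0out": 0.90}
home_run = {"empty_0out": (1, 0.45), "first_0out": (2, 0.45)}
hr_value = linear_weight(state_freq, before, home_run)
```

Here the HR is worth 0.6 * (1 + 0.45 - 0.50) + 0.4 * (2 + 0.45 - 0.90) runs with these toy inputs. Changing the `before` table for a different hitter changes the value of every event, which is the whole point of the post.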

Of course, it would be a heck of a lot easier to just use a sim to generate a perfect batting order.

I hope this post is somewhat understandable.

I'll try and come up with a formula for the method I describe above. I can't print lwts values for each spot in the BO because they are not static. The whole point of this post is that they are different depending upon whom you are testing in a particular slot. I suppose we could use Tango's "average player" lwts values by batting order and then have an "adjustment" factor depending upon the profile of the batter we are testing in a particular slot. For example, the reason my original out values in the #8 and #9 slots are so low, as Tango pointed out, is that they were calculated using the typical (bad) #8 and #9 hitters. So using these out values for testing a good hitter in the #8 or #9 hole is a bad idea. We could, however, use an out value for the #8 or #9 hole that is based on an average batter, and then use an "adjustment" for the out value which would depend upon whether the player we were testing was better or worse than an average player. A bad player would have a lower out value (and probably a higher value for the positive events) and a good player would have a higher out value in those spots (or any spot, I think). In fact, now that I think of it, this sounds like an excellent idea - better than the complicated formula and calculations we would need to do if we used the methodology I described at the beginning of this post. It would be nice to have the lwts values already calculated and then just adjust them for an individual batter rather than calculating new ones (in a very cumbersome manner) for every batter we are testing...
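
One possible form of that "adjustment", offered only as a sketch: if the "after" RE's are held fixed, swapping in a different hitter changes each event's value only through the "before" RE, so every event (outs included) shifts by the same amount, namely the frequency-weighted difference between the new hitter's "before" RE and the average hitter's. The function name `adjust_weights` and all numbers are hypothetical. Out values are written as negative run values here, so a "higher" out value for a good hitter shows up as a more negative number.

```python
def adjust_weights(avg_weights, state_freq, before_avg, before_batter):
    """Shift average-batter lwts values for a specific batter in the same slot."""
    shift = sum(
        p * (before_batter[s] - before_avg[s]) for s, p in state_freq.items()
    )
    # Every event loses (or gains) the same amount, since only 'before' changed.
    return {event: value - shift for event, value in avg_weights.items()}

# Made-up numbers for illustration only.
avg_weights = {"1B": 0.45, "HR": 1.40, "OUT": -0.25}
state_freq = {"empty_0out": 0.6, "first_0out": 0.4}
before_avg = {"empty_0out": 0.50, "first_0out": 0.90}
before_good = {"empty_0out": 0.60, "first_0out": 1.05}   # a better-than-average hitter
adjusted = adjust_weights(avg_weights, state_freq, before_avg, before_good)
```

With these toy inputs the good hitter's out costs more and his positive events are worth a bit less, which is the direction described above.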


[Edited by mgl on October 2nd, 2000 at 04:10 AM]

IP

tangotiger posted October 2nd, 2000 01:27 PM
Senior Member
Member Since: May 2000
Location:

MGL, I think you got the gist of what I'm saying. However, the only thing we have to adjust is the "before" RE, since the "before" RE is actually the "before and current" RE. That is, the RE is based on all other events leading to this hitter, AND the abilities of the current hitter. The "after" RE is completely fine, as it represents the actual RE of the event, based on whatever batting spots are coming up.

Therefore, to remove the "current" RE bias, you do exactly as you specified and assume an AVERAGE player's production of the offensive events, BUT with the frequency of opportunities of the 24 base/out situations of that hitting spot. So, in reality, you could change your program to produce the 24 base/out situations for EACH offensive event, and then apply the frequencies to get the batter-neutral (but batting-order specific) "before/current" RE.

The "after" RE should not change since this represents the actual production of all batters after this batting spot.

The interesting thing of course is that the offensive events of the #8 NL hitter will be pretty low, because we KNOW the #9 NL hitter is a pitcher. BUT the offensive events of the #9 NL hitter will be pretty average because we are assuming the current hitter is average, and even more, that the next batter is at the top of the order.

I think, though I am not positive, that the adjustments I made based on my last post will give us results that are pretty much the same as what we are talking about here. It will be interesting to see that.

MGL, can your program spit out the following: the frequency in which each of the 24 base/out situations happens for each of the 9 spots for each league? I'd like to try some things with this.....
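
For what it's worth, that tabulation is simple to sketch if the play-by-play data can be reduced to one record per plate appearance. The record layout `(slot, bases, outs)`, the base-state strings, and the function name `state_frequencies` are assumptions for illustration, not the format of any existing program; run it once per league to get the per-league tables.

```python
from collections import Counter, defaultdict

def state_frequencies(plate_appearances):
    """Share of each bases/outs state seen by each batting-order slot.

    plate_appearances: iterable of (slot, bases, outs) tuples (hypothetical format).
    """
    counts = defaultdict(Counter)
    for slot, bases, outs in plate_appearances:
        counts[slot][(bases, outs)] += 1
    return {
        slot: {state: n / sum(c.values()) for state, n in c.items()}
        for slot, c in counts.items()
    }

# Tiny made-up sample: "---" means bases empty, "1--" a runner on first.
sample = [(8, "---", 0), (8, "---", 0), (8, "1--", 1), (9, "---", 2)]
freqs = state_frequencies(sample)
```

States a slot never sees are simply absent from its table, so treat a missing key as a frequency of zero.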

I don't know about you guys, but I'm having fun with this one....



IP
