Tango on Baseball Archives

© Tangotiger


Professor who developed one of the computer models for the BCS speaks (December 11, 2003)

"Next time, don't leave it up to us," Colley said. "It bothered me when (USC coach) Pete Carroll said Southern Cal did everything they could possibly do to be in the national championship game. No, you didn't beat Berkeley. You didn't do everything possible."
--posted by TangoTiger at 11:44 AM EDT


Posted 11:45 a.m., December 11, 2003 (#1) - The Colley Matrix (homepage)
  .

Posted 2:05 p.m., December 11, 2003 (#2) - Chuck Oliveros
  "The computer speaks!"

I hate that kind of remark, not to mention ones like Tony Kornheiser's: "This is a man versus machine issue." It's nothing of the kind. In determining BCS rankings, the computer is little more than a sophisticated calculator. It is computing results based upon algorithms that implement the criteria decided upon by the humans who run the BCS. It makes no more sense to say that the computer itself is doing the ranking than it would be to say that a sports writer's article is written by the computer upon which his word processor resides.

Posted 2:16 p.m., December 11, 2003 (#3) - tangotiger
  Machines are dumb but fast. Man is smart but slow. You would think people would appreciate how well man can use machines.... but not in sports.

Posted 2:33 p.m., December 11, 2003 (#4) - Erik Allen
  Amen, brother. When the computers disagree with the ESPN polls, the talking head response is to say that the computers are flawed since they cannot replicate the human poll process. I tend to turn the tables: what does it say about human fallibility that we can't have a press poll that reproduces the results of a logical computer ranking system?

Incidentally, I think it would be quite easy to write a computer ranking system that replicated the coaches' and writers' polls: a team's ranking would be heavily weighted by its previous week's ranking, with an automatic drop in ranking for any losses. However, simply stating the "logic" of such a rating system shows what a ridiculous notion it is.

EA

Posted 2:39 p.m., December 11, 2003 (#5) - tangotiger
  Can any of the Unofficial Primate Statisticians comment on the Colley process and a logistic regression model (I think KRACH uses one)?

As well, I would have found it simpler to state the "1/2" initial values by saying that the first 2 games a team plays are against itself, thereby getting 1 win and 1 loss.

People forget that a win% is not against an average team but against an average pool of teams that doesn't include you. Throwing in, say, 12 games of Yanks vs. Yanks takes care of this thorny issue in a rather clean way.

However, I don't think the logistic regression process includes regression towards the mean (it assumes that the performance results are representative of the true talent). Does the Colley method address this? Or is it all implicit?

Posted 3:05 p.m., December 11, 2003 (#6) - Sean Forman
  I got so upset by Bill Lyon's column yesterday, I sent a letter to the editor. I don't have the text here, but basically, I used the word processor metaphor mentioned above. My wife can tell you how much this "machines" vs. "humans" tack upsets me, perhaps because I program computers.

Posted 3:27 p.m., December 11, 2003 (#7) - tangotiger
  I don't follow college sports, but I overheard this issue 2 days ago on WFAN. They mentioned how the #1 computer-ranked team at the time was guaranteed to fall out of the #1 spot no matter what it did in its next game. I remember a similar issue with tennis a few years ago.

First of all, can someone go into more detail as to what the situation was?

Secondly, I agree that this kind of stuff is frustrating. As the WFAN host was saying: how can you be #1 now, win, and then not be #1 tomorrow? And if that can happen, how can you be #1 now?

I would think what you'd want to do is make assumptions on the rest of the schedule as to who wins and who loses, going through this iterative process.

The issue, as I see it, is that all the past games are always being reevaluated based on the new set of games being played. If it was the case that it didn't matter if the team won or not, that they'd automatically be sent to below #1, then there's no reason for them to be #1 now.

Posted 3:28 p.m., December 11, 2003 (#8) - tangotiger
  To continue: by forecasting the rest of the season, the reevaluation of all the past games would be lessened in impact (I think).

Posted 4:06 p.m., December 11, 2003 (#9) - Jesse Frey
  tangotiger,

I'm not sure what specific situation you are thinking of, but I can confirm for you that the Colley method does have the unfortunate property that playing a weak opponent and winning could cost a #1 team its ranking. This can occur even if the game in question is the only game played (i.e., even if there is no reevaluation of past games). This makes it of some interest to USC fans that LSU's game against a D-IAA opponent didn't count in the Colley rankings.

The Colley method has this problem because it obtains its rankings just by comparing each team's winning percentage to the strength of its average opponent. The method doesn't correspond to any statistical model that would give you, for example, probabilities for future game results. A method based on a likelihood would almost certainly not have this type of problem.

Posted 4:24 p.m., December 11, 2003 (#10) - tangotiger (homepage)
  See above for the KRACH system (Bradley-Terry model). Surely they have something similar in college football? How does this compare to Colley?

Posted 4:59 p.m., December 11, 2003 (#11) - Jesse Frey
  tangotiger,

I have a program that can implement the B-T model described in your link (#10). Leaving out D-IAA games just as Colley does, the top 5 teams would be LSU, Oklahoma, Miami(OH), Ohio State, and USC. If you do a bit more of what one might call 'regression to the mean' and assume that each team, rather than simply having a tie with a fictitious average team (as in the link), splits a pair of games with this fictitious team, then the top 5 teams would be Oklahoma, LSU, Miami(OH), USC, and Ohio State. Under either of these methods, winning an additional game could only improve your rating.

Posted 5:14 p.m., December 11, 2003 (#12) - tangotiger
  Can you do as I noted as well, and add in 1 game where the team plays itself to a tie?

Posted 5:30 p.m., December 11, 2003 (#13) - Jesse Frey
  tangotiger,

I don't think that adding a game in which a team plays itself to a tie makes any difference in these ratings. The game or games that are added in need to serve as a penalty to keep undefeated teams from having an infinitely high rating, and a team having a game against itself doesn't do that.

When you add in a win and a loss against a fictitious average team, that loss appears less and less probable the higher the team's rating, and there eventually comes a point, even for an undefeated team, when increasing the rating further would decrease rather than increase the likelihood of the entire set of real and fictitious results.

Posted 5:37 p.m., December 11, 2003 (#14) - AED (homepage)
  The BCS has not done a very good job of selecting computer rankings on the basis of merit. The fact that a win over a I-AA team is ignored, while a win against a weak I-A team can lower your ranking is absurd. That wouldn't really have mattered here, though, since better computer rankings also put Oklahoma #1, LSU #2, and USC #3.

Only three of the ranking systems use statistics in any meaningful way. Sagarin's system is based on the Elo chess system, in that it finds the set of team ratings such that every team's record is "predicted" properly. (The cumulative odds of winning each game equals the team's number of wins.) This is similar to the KRACH system. Massey's and Wolfe's ranking systems are based on more accurate maximum likelihood models. All regress to the opponent strength rather than to the mean. This is somewhat problematic in that a win over a really bad team will lower your rating, albeit only by a tiny amount. Also, Massey and Wolfe do not consider home field, so overrank Ohio State (8 home games, 4 road games).

I've spent several years working on this sort of stuff; my homepage link has details on correct Bayesian treatments, both with and without margin of victory considerations.

Posted 10:53 p.m., December 11, 2003 (#15) - Alan Jordan
  Massey's does use home vs. away. There are two versions of his ratings. The one he posts on his website uses the points scored and allowed. The one he uses for the BCS uses only wins and losses.

http://www.masseyratings.com/theory/massey.htm

Where most people in baseball use the pyth or some variation, Massey has a function based on the difference of the scores divided by the sum of the scores, adjusted by two constants. He told me how he got the constants, but I'd have to dig up that email. He lists it out for football in this presentation.

http://www.masseyratings.com/theory/uttalk_files/frame.htm

The Colley Matrix, on the other hand, doesn't use homefield advantage. I asked him about that once and he stated that because homefield advantage varied so much from place to place, it was better not to model it at all. His system is pretty simple (probably the only one you could do on a single worksheet of Excel) and doesn't really have a place for it. I'm not sure how you would add it in.

The Colley Matrix essentially solves a simultaneous set of equations to derive the rankings. The equations basically represent who beat whom, with a Bayesian prior of 1/2 thrown in. The real beauty is that you don't have to use an iterative method to solve it, because it's linear. Bradley-Terry and logistic regression both require an iterative procedure to solve.

http://www.colleyrankings.com/matrate.pdf

There is a list of links on Massey's website that includes many other people who rank teams and players from various different sports, including our own AED (look for Dolphin).

http://www.masseyratings.com/index1.htm

Posted 2:09 a.m., December 12, 2003 (#16) - KJOK(e-mail)
  "...I remember a similar type issue with Tennis a few years ago.

First of all, can someone go into more detail as to what the situation was?"

Tango, I think the tennis issue is a little different. I believe tennis uses a 12-month rolling tournament-results method. So if a tournament win by the player was 12 months ago and is just about to "roll out" of the calculation, there can be situations where, even if the player wins the current tournament, he loses points and goes down in ranking, because the current tournament is of a lesser "tier" than the one rolling out, and/or because the quality of opponents defeated is less.

Posted 2:09 a.m., December 12, 2003 (#17) - AED
  Alan, I think you're mixing up the details of Massey's two rankings. While he uses a fairly sophisticated homefield treatment in his main ranking, he does not use homefield effects at all in his BCS ranking.

The variance in score difference is proportional to the sum of the scores - something that is true in every sport and allows better rankings than what one gets from Pythagorean (or related) systems. From the link Alan provided, it seems that Massey's game outcome function assumes the variance is proportional to the square root of the sum of the scores (the standard deviation goes as the fourth root), which doesn't sound right.

As for Colley's system. Each team's rating equals its winning percentage plus the average of its opponents' ratings minus 0.5, with all teams additionally considered to have won and lost to an average team (rating=0.5). It would be fairly trivial to build in a homefield factor to this by adding another linear equation where the homefield factor is set equal to the winning percentage of the home team plus the rating difference between road and home teams minus 0.5. It's probably not worth the effort; it's sort of like arguing about 1.6OPS vs. 1.7OPS when linear weights is vastly better than either.
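AED's formulation is easy to check in code. Below is a minimal sketch (not Colley's actual implementation) that builds the standard Colley linear system, (2 + games_i) * r_i - sum_j (meetings with j) * r_j = 1 + (wins_i - losses_i)/2, and solves it with plain Gaussian elimination. The three-team schedule is invented for illustration:

```python
def colley_ratings(teams, games):
    """Solve the Colley system C r = b, where
    C[i][i] = 2 + (games played by i), C[i][j] = -(meetings of i and j),
    b[i] = 1 + (wins_i - losses_i) / 2.
    `games` is a list of (winner, loser) pairs."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    C = [[0.0] * n for _ in range(n)]
    b = [1.0] * n
    for i in range(n):
        C[i][i] = 2.0  # the fictitious win and loss vs. a 0.5-rated team
    for w, l in games:
        wi, li = idx[w], idx[l]
        C[wi][wi] += 1.0       # one more game played by each side
        C[li][li] += 1.0
        C[wi][li] -= 1.0       # one more meeting between the pair
        C[li][wi] -= 1.0
        b[wi] += 0.5           # +1/2 per win
        b[li] -= 0.5           # -1/2 per loss
    # Gaussian elimination with partial pivoting (pure stdlib, no numpy).
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(C[row][col]))
        C[col], C[piv] = C[piv], C[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = C[row][col] / C[col][col]
            for c in range(col, n):
                C[row][c] -= f * C[col][c]
            b[row] -= f * b[col]
    r = [0.0] * n
    for i in range(n - 1, -1, -1):
        r[i] = (b[i] - sum(C[i][j] * r[j] for j in range(i + 1, n))) / C[i][i]
    return dict(zip(teams, r))

# Invented round-robin: A beats B and C, B beats C.
ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
# -> A: 0.7, B: 0.5, C: 0.3
```

For this tiny round-robin you can verify AED's phrasing directly: A's winning percentage including the fictitious split is 3/4, its average opponent rating is (0.5 + 0.3 + 2*0.5)/4 = 0.45, and 0.75 + 0.45 - 0.5 = 0.7.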

Posted 9:09 a.m., December 12, 2003 (#18) - Alan Jordan
  You're right, Massey is not using homefield advantage in his BCS rankings. I don't get that.

How could factoring in homefield advantage not be worth the effort? If homefield advantage is an effect, then not factoring it in produces biased results. Even if it has a variable effect across teams, adding it to the model should produce less biased results than leaving it out.

If you added a homefield advantage equation to the Colley Matrix, would you still be able to solve it without an iterative procedure?

Posted 9:25 a.m., December 12, 2003 (#19) - Tangotiger
  Jesse:

The game or games that are added in need to serve as a penalty to keep undefeated teams from having an infinitely high rating, and a team having a game against itself doesn't do that.

If the NJ Devils start the year at 13-0, my system would give them 1 win and 1 loss against the NJ Devils, so that they'd be 14-1. So, my method WOULD prevent a team from having an infinitely high rating, would it not?

By making each team play against the same fictitious .500 opponent, and being .500 against them, that to me screams of regression towards the mean. Heck, I'd bet that if you gave each team a 3-3 starting point record, that might approximate regression towards the mean the best (similar to my adding 2*600 PA to every player's record in Marcel).

Posted 10:08 a.m., December 12, 2003 (#20) - AED
  Alan, the process I described for adding homefield advantage to the Colley Matrix would just add one more variable and one more linear equation. So yes, it could still be solved by matrix inversion.

  Tango, there is no way to build an effective probabilistic win-loss rating without using a prior of some kind. The odds of all of this season's results having happened are maximized when all undefeated teams have ratings of infinity (and winless teams have ratings of negative infinity), since only then are the probabilities of those teams' wins (losses) maximized at exactly one. A team's games against itself don't change anything, since those probabilities are constant as the ratings change. (By crediting a team with a win and a loss against itself, you multiply the total season probability by 1/4, regardless of the team's rating.) Likewise, a team whose only loss was to an unbeaten would gravitate to a rating of 'half infinity', which gives 100% probability of its wins and losses having happened.

Failing to use a prior also violates Bayes' theorem, in that it ignores the fact that it would be quite unlikely for a team to score 1000 on a rating system that has never created a team rating over 10.

Posted 11:26 a.m., December 12, 2003 (#21) - Jim (homepage)
  I'm a BCS novice, but I noticed from David Wilson's page (linked above) that the BCS includes both predictive and retrodictive ratings systems. Can anyone here explain why?

Also, one thing that bothers me about the strength of schedule component is that it keeps adjusting itself throughout the season beyond what seems relevant. For example, USC lost points when Syracuse beat Notre Dame because USC played Notre Dame two months ago. But did USC really get perceptibly weaker when SU beat ND? Or to put it another way, if USC and LSU were scheduled to play each other, and then SU beat ND, would the betting line on USC/LSU change as the result of this seemingly unrelated game? I seriously doubt it.

Posted 11:40 a.m., December 12, 2003 (#22) - Tangotiger
  I still don't understand why making a team play itself does not take care of the infinity issue.

I understand that it's not required in any other case.

I don't understand why adding 1 win and 1 loss is required.

********

I would love it if the statistical pros that grace this board (Ben V-L, AED, Alan, Walt, and a few others) can give a step-by-step in explaining some of this stuff, without losing the audience.

Posted 11:56 a.m., December 12, 2003 (#23) - Tom T
  "Also, one thing that bothers me about the strength of schedule component is that it keeps adjusting itself throughout the season beyond what seems relevant. For example, USC lost points when Syracuse beat Notre Dame because USC played Notre Dame two months ago. But did USC really get perceptibly weaker when SU beat ND? Or to put it another way, if USC and LSU were scheduled to play each other, and then SU beat ND, would the betting line on USC/LSU change as the result of this seemingly unrelated game? I seriously doubt it."

I think the idea is that, as you see how an opponent does against other teams, that tells you more about how impressive your win (or loss) against them was. Suppose, for example, that Team A beats the number 1 team in the country in September. People would see that and think, wow, that Team A must be pretty good, they just beat the #1 team in the country.

But, now, suppose that a couple of months later, this (now former) #1 team in the country lost three straight games to fairly mediocre competition. All of a sudden, that win by Team A doesn't look so good -- it wasn't against the best team in the country, it was against a mediocre team that was severely overrated.

To a lesser degree, this is what happened when ND lost to Syracuse. Assuming the computer dealt with this correctly, ND's loss to Syracuse made ND look worse, so USC's win over ND was less impressive, because it was against a worse opponent than one might have thought at the time USC won that game. That seems reasonable to me.

Posted 12:07 p.m., December 12, 2003 (#24) - Jim
  Tom, I understand the concept behind strength of schedule, but I just think they're splitting hairs by talking about a team that you played two months ago. Perhaps ND simply isn't as strong as it was then, maybe due to injuries or whatever.

I still go back to the Vegas hypothetical. If the ND/SU game wouldn't affect a USC game betting line by even 0.5 points, then you're not talking about anything of significance. You might as well flip a coin.

Posted 12:15 p.m., December 12, 2003 (#25) - AED
  Jim, the BCS includes only 'retrodictive' rankings. Predictive rankings are more accurate in terms of predicting outcomes of future games, but the best ones, which rely exclusively on game scores (no win-loss bonus), would give #1 vs. #2 pairings that are unacceptable to the general public. For example, the 2002 title game would have been Kansas State vs. USC, despite a couple losses for both teams. Sagarin and Massey both have predictive rankings, but their rankings used by the BCS are entirely retrodictive.

The schedule strength issue is a thorny one. A real-life example was Alabama a few years back, which was ranked #3 in the preseason and went on to have a dismal season. Their first opponent was a Pac-10 team, I think USC, and the poll voters gave USC a ton of credit for beating Alabama. The final computer rankings were less impressed, because they evaluated Alabama as they really played rather than how good they were perceived to be at the time of the game. So yes, you really do have to use all games an opponent plays - including those played later - to make an accurate statistical evaluation.

Posted 12:33 p.m., December 12, 2003 (#26) - AED
  Tango, assuming you are asking about a KRACH-like system, the basic idea is to find the set of team ratings such that each team's expected number of wins equals its actual number of wins. The expected number of wins is the sum of the probabilities of winning each game. In other words, if your "trial" rating has team A one sigma better than team B, team A gets 0.84 expected wins and team B gets 0.16 expected wins.

The infinity problem comes in if a team is unbeaten, in which case you need to find the set of ratings such that its probability of winning each game equals one. Because even a 10-sigma mismatch has a nonzero chance of an upset, one only finds a win probability of exactly one for all of a team's games if its rating is infinitely better than that of any of its opponents.

Having that team play itself to a tie N times (or winning N/2 times and losing N/2 times) doesn't affect this, since you're adding N/2 to both sides of the equation (the probability of a team beating itself exactly equals 1/2). Thus you would still need the probabilities of winning each of its real games to all equal one, which again is only the case if it is infinitely better than its opponents.

Instead you have to do one of the following. The KRACH system adds N (N=1?) ties against an average opponent to the team's record. Sagarin's system appears to credit a win as slightly less than one win (perhaps 0.97 wins) and a loss as slightly more than zero wins (around 0.03), or something to this effect. This is equivalent to using a prior in a probabilistic model, and is equivalent to regression to the mean.

The main drawback to this sort of model is that it ignores the details of which teams you beat and which you lost to. All it tries to do is match the actual and expected wins, but the specific details of which games were won and lost give additional information about a team's quality.
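A toy version of the expected-wins matching described above, using the one-sigma-is-0.84 normal win model from the post and a KRACH-style tie against a fictitious average (rating 0) opponent to keep unbeatens finite. The fixed-point "nudge" loop and the two-team schedule are my own illustration, not any BCS system's actual code:

```python
import math

def norm_cdf(x):
    # P(win) when you are x sigma better than your opponent.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def expected_win_ratings(teams, games, steps=5000, lr=0.5):
    """Find ratings so each team's expected wins equal its actual wins.

    `games` is a list of (winner, loser). Every team also gets one tie
    against a fictitious rating-0 opponent: 0.5 actual wins and
    norm_cdf(rating) expected wins. Without that anchor, an unbeaten
    team's rating would run off to infinity."""
    r = {t: 0.0 for t in teams}
    actual = {t: 0.5 for t in teams}   # the half-win from the fictitious tie
    for w, _ in games:
        actual[w] += 1.0
    for _ in range(steps):
        expected = {t: norm_cdf(r[t]) for t in teams}  # vs. the average team
        for w, l in games:
            p = norm_cdf(r[w] - r[l])
            expected[w] += p
            expected[l] += 1.0 - p
        for t in teams:
            r[t] += lr * (actual[t] - expected[t])  # nudge toward balance
    return r

# A beat B once; with the anchor, A lands above average and B below it.
r = expected_win_ratings(["A", "B"], [("A", "B")])
```

Because the schedule is symmetric, the two ratings come out as mirror images around zero, and at convergence each team's expected wins match its actual wins to high precision.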

Posted 1:02 p.m., December 12, 2003 (#27) - Jesse Frey
  tangotiger,

Here is an example to explain the infinity issue. Suppose that there are only 2 teams, A and B. A single game is played, and A defeats B. In the Bradley-Terry model, we assume that there is a hidden merit parameter for each team, say m(A) for team A and m(B) for team B. The probability that team A defeats team B, given these merit ratings, is m(A)/(m(A)+m(B)). Since we observed only that team A defeated team B, we maximize the likelihood of the observed game results by choosing m(A) and m(B) in such a way that the probability m(A)/(m(A)+m(B)) that A defeats B is as large as possible. Since the merit parameters are determined only up to a constant, we may assume that m(B)=1. We then try to maximize m(A)/(m(A)+1). No finite maximizing value m(A) exists, since the bigger we make m(A), the higher the value for m(A)/(m(A)+1). This is the problem that adding in extra game results tries to solve.

Suppose now that we add in a pair of games in which team A defeats itself and then loses to itself. The probability of A defeating B is still m(A)/(m(A)+m(B)). The probability that team A defeats itself is m(A)/(m(A)+m(A)), or 1/2. The probability that team A loses to itself is also 1/2. The likelihood which we choose m(A) and m(B) to maximize is then (1/2)(1/2)(m(A)/(m(A)+m(B))). Since the two factors of 1/2 just give a constant factor of 1/4, we end up in the same situation as in the previous paragraph.

Suppose that, instead of adding in a game in which team A ties itself, we add in, for each of team A and team B, a win and a loss to a team with known rating 1. The probability that A defeats this team is m(A)/(m(A)+1), and the probability that A loses to this team is 1/(m(A)+1). Thus the probability, given m(A) and m(B), that all 5 (1 real, 4 fictitious) games have the results we observed is the product (m(A)/(m(A)+1))*(1/(m(A)+1))*(m(B)/(m(B)+1))*(1/(m(B)+1))*(m(A)/(m(A)+m(B))). What makes this different from adding in games where team A plays itself is that the additional factors involve the parameter m(A). Choosing m(A) and m(B) to maximize this product then gives the ratings m(A)=1.695 and m(B)=0.590 without any further normalizations.
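Jesse's numbers can be reproduced with the standard MM (minorization-maximization) iteration for Bradley-Terry, m_i <- wins_i / sum over opponents of (games vs. opp)/(m_i + m_opp). The iteration itself is a standard fitting technique, not something from the post; a minimal sketch of his five-game example:

```python
def bt_fit_with_anchor(steps=2000):
    """Maximize the 5-game likelihood from Jesse's example:
    A beat B once, and each of A and B split a win/loss pair with a
    fixed fictitious team of rating 1.0. Uses the standard MM update
    for the Bradley-Terry model."""
    mA, mB = 1.0, 1.0
    for _ in range(steps):
        # A: 2 wins (over B and over the fictitious team);
        # plays B once and the fictitious team twice.
        mA = 2.0 / (1.0 / (mA + mB) + 2.0 / (mA + 1.0))
        # B: 1 win (over the fictitious team);
        # plays A once and the fictitious team twice.
        mB = 1.0 / (1.0 / (mA + mB) + 2.0 / (mB + 1.0))
    return mA, mB

mA, mB = bt_fit_with_anchor()
# -> roughly mA = 1.695, mB = 0.590, matching the values in the post.
```

Each update sets a team's merit so that its expected wins under the current ratings equal its actual wins, which is why the fictitious loss keeps m(A) finite.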

Posted 1:06 p.m., December 12, 2003 (#28) - Tangotiger(e-mail)
  Jesse, thanks for that great example. I'll have to think about it, as I'm about to feed my baby.

(Btw, is your logistic regression program available? If it is, please email me, as I have some things I'd like to run.)

Posted 2:25 p.m., December 12, 2003 (#29) - Jim
  OK, if the BCS is not intended to be predictive, that makes the strength-of-schedule issue make more sense. So USC's strength-of-schedule factor would be no different if they had beaten ND last week (as opposed to two months ago).

On the other hand, perhaps a predictive system would weigh recent games more heavily, including the strength-of-schedule component, to account for teams improving or declining during the season.