Saturday, February 21, 2009

Team Rankings and Playoff Probabilities

The first chart shows how each team in the league has fared thus far in terms of adjusted winning percentage. Adjusted winning percentage is essentially each team’s Pythagorean Expectation, with the exception that, instead of goals for and goals against, I use adjusted goals for and adjusted goals against. In computing each team’s adjusted GF and adjusted GA, I simply take each team’s actual GF and GA, subtract shootout goals and empty netters, and then make a second order correction for schedule difficulty. In determining schedule difficulty, oppositional strength is determined through the goal differential of the opponent, the location of the game (i.e. whether it’s a home or away game), and the circumstances of the game – namely, whether or not it’s the second half of a back-to-back for the road team.
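As a rough sketch of the calculation described above, the snippet below strips shootout and empty-net goals and then applies a Pythagorean formula. Two things here are assumptions rather than details from the post: the exponent of 2 (the classic Pythagorean value; the exponent actually used isn't stated) and the `schedule_factor` multiplier, which is a hypothetical stand-in for the schedule-difficulty correction. The team totals are made up.

```python
def adjusted_goals(goals, shootout_goals, empty_net_goals, schedule_factor=1.0):
    """Strip shootout and empty-net goals, then apply a schedule correction.

    schedule_factor is a hypothetical stand-in for the schedule-difficulty
    adjustment; the post does not give its exact form.
    """
    return (goals - shootout_goals - empty_net_goals) * schedule_factor


def pythagorean_expectation(goals_for, goals_against, exponent=2.0):
    """Expected winning percentage from goals for and goals against.

    exponent=2.0 is the classic value and an assumption here.
    """
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Made-up team: 170 GF (5 shootout, 4 empty-net), 160 GA (3 shootout, 2 empty-net)
adj_gf = adjusted_goals(170, 5, 4, schedule_factor=1.02)
adj_ga = adjusted_goals(160, 3, 2, schedule_factor=0.99)
adj_win_pct = pythagorean_expectation(adj_gf, adj_ga)
print(f"adjusted winning percentage: {adj_win_pct:.3f}")
```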

If you compare these rankings to the actual standings, most teams are similarly positioned. There is, however, one notable outlier.

The Rangers are currently 9th in the league in points per game, yet 26th by this metric. Not surprisingly, they’ve had a ton of success in the shootout so far (record: 9-4), which is basically equivalent to sheer luck. While some may point to the Rangers’ shot differential, especially at EV, as evidence that they aren’t that bad of a team, I’m inclined to disagree. The reason: they're in the red in terms of expected goals, which suggests that they’ve been below average in terms of shot quality for, shot quality against, or both.

Of course, there are a few teams who can be labeled as either lucky or unlucky in general – notwithstanding the fact that these rankings aren’t too different from the standings. In other words, these are teams who are either better or worse than these rankings would suggest.

In terms of teams that probably aren’t as good as their adjusted winning percentage would indicate, I’m thinking of BOS, FLA, and PHI. These teams have all been greatly aided by the percentages this year. I think that the success that each of these teams has experienced thus far is unlikely to continue during the remainder of the regular season and the playoffs. Granted, the Flyers outperformed their underlying numbers last season as well. As the sample size in games played increases, it becomes increasingly difficult for one to point to randomness in an attempt to account for success with the percentages. On the other hand, I find it very difficult to look at a team that’s scored 15 shorthanded goals and conceded none and say that they haven’t been at least somewhat fortunate. I just don’t think that they're an inherently good hockey team.

And for teams in which the opposite is true, I’m thinking of OTT, LAK, COL, and TOR. These teams have all been – for lack of a better term – utterly screwed by the percentages this season, to the point where none of them have a realistic shot at making the playoffs. This is unfortunate in the sense that, if you were to compare this group of teams with the three listed above, I don’t think that there’s much to choose between them. Hell, I think that one could make a reasonable argument for the Kings being the best team of the seven – at least, looking at it in terms of which team is most likely to experience success from this point forward.

Anyway, here are the playoff probabilities for all 30 teams (updated on 02/19/09). The left hand column contains seeds 1-15 in each conference, with the corresponding column for each team showing the probability of finishing the season in that position, expressed as a percentage. So, for example, the Blackhawks have an (approximately) 1% chance of finishing in 1st place in the West. The final two rows contain each team's probability of making the playoffs (in the second-to-last row) and each team's probability of winning the division (in the last row). Future game probabilities are based on the respective adjusted winning percentages of the involved teams, game location, and whether or not the game is the second half of a back-to-back for the road team.

Thursday, February 12, 2009

Coming off a Win/Loss: The effect of Prior Results

I’ve often wondered if the outcome of a team’s previous game has any effect on the result of that team’s subsequent game. Intuitively, I wouldn’t expect there to be much of an effect. The outcome of any given game is determined by many different factors, some of which are known to have a large effect.

While the result of the previous game could conceivably be one of these factors, it would probably rank pretty far down the list in terms of importance. In other words, if there is such an effect, I would expect its magnitude to be small.

That said, I’ve heard it argued before that the previous game does in fact have an effect on a team’s performance in the following game, so it’s something worth examining, I think.

On the one hand, some have suggested that the momentum of winning the previous game carries over to the next game, thus enhancing a team’s chance of success. According to this line of reasoning, the average team should do slightly better when coming off a win than when coming off a loss.

Conversely, others have suggested that winning breeds complacency, with losing having the opposite effect. This approach predicts that teams should do better when coming off a loss, on average.

I don’t think that either of these arguments has much merit. Both are based on the idea that psychological factors have a measurable effect on game outcomes, a premise with which I personally disagree. While casual fans often resort to folk psychology when discussing success and failure at the NHL level, its relevance has never, to my knowledge, been demonstrated through actual evidence.

In any event, I attempted to determine if the preceding game has any effect on following game results. My methodology was pretty straightforward. The sample included all regular season games played during the seasons of 2005-06, 2006-07 and 2007-08. Each game played was classified as a win, a loss, or a tie for both of the involved teams. For the sake of simplicity, any game that went past regulation was considered to be a tie. I then looked at whether that team won, lost or tied in its next game. Here are the results for 2007-08. The teams that had a better record when coming off a win compared to coming off a loss are shaded green. Teams for which the opposite was true are shaded orange.

Below is a chart of the average winning percentages of all 30 teams in each situation (coming off a win, coming off a loss, and coming off a tie) for all three seasons. The left hand column shows the average winning percentage for all 30 teams in games played after a win. The middle and right hand columns do the same, only for games where the team was coming off a loss and tie, respectively. It’s necessary to look at the average winning percentages rather than the aggregate winning percentages for one simple reason: better teams, by virtue of winning more games, tend to play a higher percentage of their games when coming off a win. For example, the Thrashers played a mere 18 games coming off a win last season; Detroit played 46. It needn’t be explained as to how this could confound the results.
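To illustrate why pooling games can mislead, here is a small sketch with two invented teams. The numbers are constructed so that neither team's results depend on its previous game at all (each wins at the same rate after a win as after a loss), yet the aggregate figures still differ, because the stronger team contributes more of the "after a win" games.

```python
# Hypothetical (wins, games) after a win and after a loss for two teams.
# Each team wins at the same rate in both situations by construction.
teams = {
    "strong": {"after_win": (30, 50), "after_loss": (18, 30)},  # 0.600 in both
    "weak":   {"after_win": (12, 30), "after_loss": (20, 50)},  # 0.400 in both
}


def aggregate_pct(situation):
    """Pool all games across teams, then divide total wins by total games."""
    wins = sum(t[situation][0] for t in teams.values())
    games = sum(t[situation][1] for t in teams.values())
    return wins / games


def average_pct(situation):
    """Compute each team's percentage first, then average across teams."""
    pcts = [t[situation][0] / t[situation][1] for t in teams.values()]
    return sum(pcts) / len(pcts)


for situation in ("after_win", "after_loss"):
    print(situation, round(aggregate_pct(situation), 3), round(average_pct(situation), 3))
```

With these made-up numbers the aggregate method shows 0.525 after a win against 0.475 after a loss, while the per-team average is 0.5 in both situations.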

Also included is a chart that breaks down the number of teams that had a better record after winning vis-à-vis their record after losing, and vice-versa.

The results are pretty consistent with my expectation in that the effect of the preceding game appears to be fairly small. In the 90 ‘team-seasons’ analyzed, 41 teams had a better record after winning, whereas the other 49 had a better record after losing. The average winning percentage for teams coming off a win was slightly less than 0.49. For teams coming off a loss, that figure was approximately 0.505. Therefore, it can be said that teams have, since the lockout, done slightly better after losing their previous game than they have when coming off a win. Of course, the margin is quite small and well within the potential range of random variance. Even supposing that the results are statistically significant, the influence of a team’s preceding game upon the outcome of its following game appears to be limited.
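One rough way to gauge whether a 41-to-49 split is distinguishable from chance is a sign test. This is not something done in the analysis above; it's a minimal sketch that assumes the 90 team-seasons can be treated as independent coin flips, which is a simplification.

```python
from math import comb


def two_sided_sign_test(successes, n):
    """Exact two-sided binomial p-value against a fair 50/50 split."""
    k = min(successes, n - successes)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# 41 of 90 team-seasons had a better record after a win than after a loss
p_value = two_sided_sign_test(41, 90)
print(f"two-sided p-value: {p_value:.2f}")
```

Under that assumption the p-value comes out far above the usual 0.05 threshold, which fits the reading that the split is within the range of random variance.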

Sunday, February 1, 2009

Even Strength Shooting Percentage

To what extent is team-to-team variation in even strength shooting percentage the product of random variation? I'm not sure what the answer is, but I suspect that the contribution is substantial. I've included several graphs below in order to illustrate this. The table below the first graph contains the data upon which each distribution is based.

In the first graph, the yellow line is the actual spread in EV (note: 5-on-5 only) shooting percentage that exists among NHL teams at this point in the 2008-09 NHL season.

The X-axis contains the percentage 'categories' in which the figure listed is the midpoint value of the category.

The Y-axis is the relative frequency of each individual percentage 'category'.

As an example, 6 teams in the NHL this year currently have an EV shooting percentage that is between 0.08 and 0.085. As there are 30 teams in the league, the relative frequency is 0.2 (as 6/30 = 0.2). The midpoint value for this category is 0.0825. Therefore, the relative frequency of the '0.0825' category is 0.2.

The pink line shows the predicted spread in EV shooting percentage if each team had the exact same underlying shooting percentage at ~0.085 (i.e. the league average 5-on-5 shooting percentage). This was determined as follows.

100 "seasons" were simulated.
For each "season", each team has an artificial shooting percentage.
This percentage is the number of goals that a team scores over x number of trials.
The number of trials is equivalent to the number of EV shots that the team has taken through this point in the season.
The probability of "scoring" in each individual trial is the same for every team at 0.085.
Therefore, any team-to-team variation will be the product of randomness.
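The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the code actually used for the graphs: the function names are hypothetical, the number of seasons is left as a parameter, and only Philadelphia's 984 shots is a figure from this post (the other shot totals are placeholders).

```python
import random


def simulate_shooting_pct(shots, p=0.085, rng=random):
    """One simulated 'season': each shot scores with probability p."""
    goals = sum(1 for _ in range(shots) if rng.random() < p)
    return goals / shots


def simulate_seasons(shots_by_team, n_seasons, p=0.085, seed=0):
    """Return {team: [simulated shooting pct for each season]}."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return {
        team: [simulate_shooting_pct(shots, p, rng) for _ in range(n_seasons)]
        for team, shots in shots_by_team.items()
    }


# PHI's 984 shots is from the post; the other totals are placeholders
shots_by_team = {"PHI": 984, "TEAM_B": 1100, "TEAM_C": 1250}
results = simulate_seasons(shots_by_team, n_seasons=100)
print(results["PHI"][:5])
```

Because every team is given the same scoring probability, any spread in `results` comes from randomness alone.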

A specific example will hopefully make this clear.

Philadelphia has taken 984 shots at EV at this point in the 2008-09 season. Therefore, Philadelphia has 984 trials. The probability of scoring in each individual trial for Philadelphia is the league average EV shooting percentage at ~0.085. In Philadelphia's first "season", they scored 107 times. As 107 / 984 = ~0.109, Philadelphia's EV shooting percentage for their 1st "season" is 0.109.
I then did this for every team and repeated the process 100 times (i.e. simulated 100 seasons). Here's how the first 48 or so shaped out:

Even though the probability of a goal on any given "shot" is 0.085, the artificial shooting percentage will generally differ from 0.085 because the sample is finite. As the sample size (number of trials) increases, any given team's artificial shooting percentage will more closely approximate 0.085. Teams that have taken more shots through this point in the 2008-09 season have more "trials", so the spread in their simulated shooting percentages will be narrower. For example, the standard deviation for Detroit's 100 seasons is ~0.007. By comparison, the same value for Pittsburgh is ~0.009.
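The link between shot volume and spread follows from the binomial distribution: if every shot scores independently with probability p, the standard deviation of goals divided by shots is sqrt(p(1 - p) / n). A quick sketch, using Philadelphia's 984 shots from the example above (the function name is mine):

```python
from math import sqrt


def shooting_pct_sd(p, shots):
    """Standard deviation of goals/shots when each shot scores with prob. p."""
    return sqrt(p * (1 - p) / shots)


print(round(shooting_pct_sd(0.085, 984), 4))      # 984 shots
print(round(shooting_pct_sd(0.085, 4 * 984), 4))  # four times as many shots
```

Quadrupling the number of shots halves the standard deviation, which is why teams with more shots show a tighter simulated spread.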

The same rules regarding the x and y axes that apply to the yellow (actual) distribution also apply to the pink (random) distribution. The relative frequency for the pink distribution is the proportional representation of each artificial shooting percentage category. As an example, as there were 100 "seasons" and 30 teams, the entire sample consisted of 3000 artificial shooting percentages. 601 artificial percentages fell between 0.08 and 0.085. The relative frequency for the '0.0825' category is therefore ~0.2, as 601/3000 = ~0.2.
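The binning used for both distributions can be sketched as follows. The sample values are made up for illustration, and the helper name is hypothetical; the 0.005 bin width matches the categories described above.

```python
from collections import Counter


def relative_frequencies(percentages, bin_width=0.005):
    """Map each bin midpoint to the share of values falling in that bin.

    Values sitting exactly on a bin edge may land on either side because
    of floating-point rounding.
    """
    counts = Counter(int(pct / bin_width) for pct in percentages)
    total = len(percentages)
    return {
        round((index + 0.5) * bin_width, 4): count / total
        for index, count in sorted(counts.items())
    }


# Made-up shooting percentages, for illustration only
sample = [0.069, 0.081, 0.082, 0.084, 0.086, 0.091, 0.093, 0.097, 0.101, 0.108]
print(relative_frequencies(sample))
```

Here three of the ten values fall between 0.08 and 0.085, so the '0.0825' category gets a relative frequency of 0.3.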

As many will note, the spread between the worst (NYI at 0.069) and best (BOS at 0.108) teams appears to be sizable, as is indicated by the breadth of the yellow distribution.

However, the pink distribution is itself fairly broad. In fact, it very closely resembles the yellow distribution. As would be anticipated, the yellow distribution is slightly broader than its counterpart, but the difference is not large. This suggests that much of the inter-team variation in EV shooting percentage is the result of randomness.

The second graph, shown above, contains a 'smoothed' version of the actual distribution, which is represented by the dark line. The average shooting percentage in the league is currently ~0.085, as has been mentioned. The standard deviation is currently ~0.01. The dark graph is simply a normal distribution (bell curve) with a mean of 0.085 and standard deviation of 0.01.

The light line is merely the pink distribution reproduced. Again, the two distributions are very similar to one another.

The fact that the actual distribution is somewhat broader than the expected distribution shows that teams do indeed differ in their underlying shooting percentage at EV. Nonetheless, this variation is only very slightly larger than what would be predicted by chance alone. The underlying differences appear to be minimal.

Vic Ferrari has done a lot of excellent, excellent work over at his site that is similar to this. Much of his work has examined the ability of individual players to influence shooting and save percentage while on the ice. His findings are comparable in that the vast majority of inter-individual variation seems to be due to random variation.

EDIT: I've included some supplementary data tables for the purposes of clarity.

I should mention that the data I used for this post was obtained at behindthenet -- an awesome site that I highly recommend. Without it, this post wouldn't have been possible.