Saturday, June 25, 2011

How often does the Best Team Win?

This year’s Stanley Cup Final concluded with a somewhat surprising outcome. The Vancouver Canucks – who were widely regarded as the league’s best club – were defeated by the underdog Bruins.

To those who regard the NHL playoffs as a competition designed to determine the league’s best team, the result can mean only one thing – that the Bruins were the best team all along, and the Canucks mere pretenders.

A more reasonable explanation, however, is that shit happens over the course of a seven-game series and, because of that, the better team doesn’t always win. The Canucks were better than Boston during the regular season, and were likely better in the first three rounds of the playoffs as well. They were better than Boston last year and there’s a good chance that they’ll be better next year. They were probably the better team.

The Canucks may or may not have been the best team in the league, but if they were in fact better than Boston, then that means that a team other than the best team in the league won the cup. That raises an interesting question – how often does the best team in the league end up winning the cup?

(The answer, of course, will vary with the level of parity that exists in the league. Because league parity has varied from era to era, we’ll confine our answer to the post-lockout years.)

Unfortunately, the question cannot be answered directly, because it’s not possible to identify the league’s best team in any given season with any certitude. One can only speak in terms of probability and educated guesses.

It is, however, possible to arrive at an approximate answer through assigning artificial win probabilities to each team, simulating a large number of seasons, and looking at how often the team with the best win probability ends up winning the cup.

This exercise is made possible by the fact that the distribution in team ability – which we’ll define as true talent goal ratio – can be ascertained through examining the observed spread in goal ratio and identifying the curve which best produces that spread when run through the appropriate simulator.

In order to generate an observed distribution of results, I randomly selected 40 games from every team and looked at how each of them performed with respect to goal percentage (empty netters excluded) over that sample. This exercise was performed 2000 times for each of the six post-lockout seasons. The following curve resulted:

The likely ability distribution is the curve shown below – a normal distribution with a mean of 0.5 and standard deviation of 0.03.

If a large number of half-seasons are simulated through assigning artificial goal percentages based on the above ability distribution, the spread in simulated results closely matches the observed results displayed in the first graph.
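
The check described above can be sketched roughly as follows. This is not the original simulator, only a minimal illustration: each team's true talent is drawn from a normal distribution with mean 0.5 and standard deviation 0.03, and 40 games are then played with Poisson scoring around 5.49 total goals per game. The function names, the use of Poisson scoring here, and the simplification of ignoring opponent quality are all assumptions made for the example.

```python
import math
import random
import statistics

GOALS_PER_GAME = 5.49   # combined non-empty-net goals per game (from the post)
ABILITY_MEAN = 0.5      # mean true-talent goal percentage (from the post)
ABILITY_SD = 0.03       # standard deviation of true talent (from the post)


def poisson_draw(rng, lam):
    """Draw a Poisson variate by inverting the CDF (fine for small lam)."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)
    cumulative = p
    while u >= cumulative and k < 50:
        k += 1
        p *= lam / k
        cumulative += p
    return k


def simulate_half_season(rng, talent, games=40):
    """Observed goal percentage for one team over `games` games.

    ASSUMPTION: opponent quality is ignored, so the team scores at
    talent * 5.49 and allows (1 - talent) * 5.49 goals per game.
    """
    gf = sum(poisson_draw(rng, talent * GOALS_PER_GAME) for _ in range(games))
    ga = sum(poisson_draw(rng, (1 - talent) * GOALS_PER_GAME) for _ in range(games))
    return gf / (gf + ga)


def observed_spread(n_teams=30, n_runs=200, seed=1):
    """Standard deviation of simulated 40-game goal percentages."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        for _ in range(n_teams):
            talent = rng.gauss(ABILITY_MEAN, ABILITY_SD)
            results.append(simulate_half_season(rng, talent))
    return statistics.pstdev(results)
```

The simulated spread comes out wider than the 0.03 talent spread, because 40 games of scoring luck are layered on top of true ability. Matching that wider spread to the observed one is what pins down the talent curve.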

As the ability distribution can be used to generate results that closely parallel those observed in reality, it can also be used in order to answer the question posed earlier in the post – that is, the probability of the best team in the league winning the cup in any given season.

Here’s how the simulations were conducted:

  • For each simulated season, every team was assigned an artificial goal percentage based on the ability distribution produced above
  • The artificial goal percentages were, in turn, used to produce GF/game and GA/game values for each team
  • GF/game values were calculated by multiplying a team’s goal percentage by 5.49 (5.49 being the approximate average number of non-empty-net goals scored per game in the post-lockout era)
  • GA/game values were calculated by subtracting a team’s GF/game value from 5.49
  • All 1230 games from the 2010-11 regular season were then simulated, with a score being generated for each individual game
  • The probability of a team scoring ‘x’ number of goals in an individual game was determined through taking its GF/game value and adjusting it based on the GA/game value of the opponent
  • If each team scored an equal number of goals, each team was awarded one point and a random number generator was used to determine which of the two teams received the additional point
  • After all games were simulated, the division and conference standings were determined in accordance with NHL rules (that is, with the teams ranked by points, with the division winners being placed in the first three seeds in each conference)
  • If two teams were tied in points, greater number of wins was used as a tiebreaker
  • If two teams had the same number of points and wins, then a random number generator was used as a second tiebreaker
  • The playoff matchups were then determined based on the regular season standings
  • Individual playoff games were not simulated; rather, each series was simulated as a whole based on the Pythagorean expectations (which were derived from the goal percentage values) of the involved teams
  • Home advantage for the higher seed was valued at +0.015
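
The regular season steps above can be sketched for a single game as follows. This is an illustrative reconstruction rather than the code actually used. In particular, the list does not spell out how a team's GF/game is adjusted for the opponent's GA/game, so the multiplicative scaling in `adjusted_rate` is an assumption, as are all of the function names.

```python
import math
import random

LEAGUE_GOALS = 5.49            # combined goals per game, from the post
LEAGUE_AVG = LEAGUE_GOALS / 2  # average goals per team per game


def team_rates(goal_pct):
    """GF/game and GA/game from a goal percentage, as in the list above."""
    gf = goal_pct * LEAGUE_GOALS
    ga = LEAGUE_GOALS - gf
    return gf, ga


def adjusted_rate(gf, opp_ga):
    """ASSUMPTION: scale GF/game by the opponent's GA/game relative to
    league average. The exact adjustment is not given in the post."""
    return gf * opp_ga / LEAGUE_AVG


def poisson_draw(rng, lam):
    """Draw a Poisson variate by inverting the CDF."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)
    cumulative = p
    while u >= cumulative and k < 50:
        k += 1
        p *= lam / k
        cumulative += p
    return k


def simulate_game(rng, pct_a, pct_b):
    """Return (points_a, points_b, goals_a, goals_b) for one game."""
    gf_a, ga_a = team_rates(pct_a)
    gf_b, ga_b = team_rates(pct_b)
    goals_a = poisson_draw(rng, adjusted_rate(gf_a, ga_b))
    goals_b = poisson_draw(rng, adjusted_rate(gf_b, ga_a))
    if goals_a > goals_b:
        return 2, 0, goals_a, goals_b
    if goals_b > goals_a:
        return 0, 2, goals_a, goals_b
    # Equal scores: one point each, then a coin flip for the extra point.
    if rng.random() < 0.5:
        return 2, 1, goals_a, goals_b
    return 1, 2, goals_a, goals_b
```

Calling `simulate_game(random.Random(0), 0.53, 0.47)` once per scheduled game and summing the points by team would give the simulated standings. The standings tiebreakers and the playoff series step are left out of this sketch.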

20,000 simulations were conducted in total. Here’s how the league’s best team – defined as the team with the best underlying goal percentage in each individual season – fared. We’ll start with the regular season results:

The above chart shows how the best team performed in four areas – division rank, conference rank, league rank in points, and league rank in goal differential. So, as an example, the best team ended up winning the Presidents’ Trophy – i.e. finishing with the most points – about 32% of the time.

The results are interesting. The best team does very well in general, but the range in outcomes is considerable. It wins its division a majority of the time yet still manages to finish dead last every now and then (about once every 200 seasons). It wins the conference almost half the time and finishes in the top four about 84% of the time. However, it still misses the playoffs a non-trivial percentage of the time (2.2%). The latter fact may not be too surprising – the 2010-11 Chicago Blackhawks were close to being the best team in the league but only made the playoffs by the slimmest of margins.

It wins the Presidents’ Trophy about a third of the time and does even better in terms of goal differential, posting the best mark in roughly 40% of the simulations. However, it occasionally finishes in the bottom half of the league in both categories (about 2% and 1% of the time, respectively).

The graph below shows the distribution in year end point totals for the best team. It averaged just over 107 points, with a high of 145 and a low of 73.

And the distribution in goal differential (mean = 57; max = 161; min = -34).

Finally, the chart showing the playoff outcomes for the best team, and therefore answering the question posed earlier.


It turns out that the best team wins the cup 22% of the time – about once every five seasons. This accords well with what we’ve observed since the lockout, with the 2007-08 Detroit Red Wings being the only cup winner that was also unambiguously the best team in the league. The 2009-10 Chicago Blackhawks were probably the best but it’s hard to say for sure. The 2008-09 Penguins were a good team but the Wings were probably better that year. Ditto for the 2006-07 Ducks. The 2010-11 Bruins were merely a good team, and I can’t even say that much for the 2005-06 Hurricanes, who may not have even been one of the ten best teams in the league during that season.

Caveats:

The exercise assumes that team ability is static. This is obviously untrue in reality, given injuries, roster turnover, and like variables. Consequently, the best true talent team at one point in the season may not be the best team at a different point. Moreover, the spread in team talent at any given point in the season is likely to be somewhat broader than the ability curve used in the exercise.

Scores were generated for individual games through the use of Poisson probabilities, which does not take into account score effects. Thus, the model slightly underestimates the incidence of tie games. For the same reason, it also overestimates the team-to-team spread in goal differential.

17 comments:

Anonymous said...

The 05-06 Hurricanes were 4th in the league that year. They had over 50 wins, tied for 3rd in points. They followed up that impressive regular season with a Cup win. And you claim they weren't a top 10 team that year? Come on now, that's just ignorance talking.

Hawerchuk said...

Very nice work, puts some ideas in my head! It would be interesting to see the impact of simulated trade deadline moves on these outcomes. Some of the observed outcome spread comes from teams acting to improve their teams if they've had good results and vice-versa.

Snarksd said...

Awesome post. Do you have probabilities on all ranked 30 teams? I'd be interested to know probabilities on the last team, and groups, e.g. Top 5, bottom 10, etc. This really does give an idea about the nature of parity in the NHL post lockout. Great work

JLikens said...

Anonymous:

The 05-06 Hurricanes were a +25 GD team (274 For, 249 against) once you exclude shootout and empty net goals. That gives them a goal percentage of 0.524. Ten teams were better in that respect (OTT, DET, DAL, NYR, NSH, COL, S.J, ANA, BUF & CGY).

They also had one of the easiest schedules in the league, having been the best team in the worst division. My schedule correction algorithm shows them to have had the 28th hardest schedule, behind only DET and NYR.

If you correct their goal percentage to account for the easier schedule, they fall to 0.514, which places them 13th in the league.

Another method of estimating a team's ability is to regress all major statistics in order to account for the percentage of variation due to luck.

The 05-06 Hurricanes rank 10th with this method, with a goal percentage of 0.514. But this method does not take into account team differences in schedule difficulty, and therefore flatters teams like Carolina.

So the numbers suggest that Carolina was probably somewhere between the 10th and 13th best team in the league.

It's possible that they were a better team than their numbers, but other lines of evidence don't really support that. For example, they missed the playoffs in the two preceding years as well as the two following ones.

JLikens said...

Hawerchuk:

That's a good point about lucky teams taking measures to improve their roster, and vice versa.

The model could be refined in order to account for this (as well as other stuff I've overlooked).

JLikens said...

Snarksd:

That's a good idea.

I'll make a follow up post with those numbers.

Anonymous said...

Good teams make their own luck. To use the 05-06 Hurricanes as an example. One of the knocks against them was the injury to Roloson in the Finals.

The injury definitely changed the outcome of at least one of the games. If Roloson wasn't injured, Conklin doesn't take over in Game 1, turn the puck over and the Hurricanes might not have won that game.

Speculating past that Game 1 involves too many factors, especially considering the Roloson that had looked invincible early in the playoffs had regressed considerably late in the Anaheim series, then had given up 4 goals against the Hurricanes in Game 1.

But back to the injury, which was caused when MA Bergeron pushed Andrew Ladd into Roloson as Ladd attempted to crash the net. Sure, it was lucky for Carolina that the injury happened, but would that play have existed if Ladd didn't follow the strategy of the team and do what got his team into the Finals in the first place?

JLikens said...

"Good teams create their own luck"

What does that even mean?

Anyway, the 05-06 Hurricanes were not a very good team compared to other cup winners in the post-expansion era.

They got outshot at even strength - even when the score was tied - and only managed to stay afloat on account of an unsustainably high shooting percentage.

Their PP and PK units were no great shakes, but because they were a great team at drawing penalties they ended up posting a relatively impressive special teams goal differential.

Put it all together and they were a +25 team, which isn't terribly impressive for a cup winner. There wasn't a lot of parity in the league in 2005-06 and, as I mentioned earlier, ten teams were better in that respect that year. And when you take into account schedule difficulty, the Canes look even worse.

The 06-07 Hurricanes actually had better underlying numbers but got screwed by the percentages big time at EV and ended up missing the playoffs altogether.

snarksd said...

Hey J,
Just a quick methods question. How did you determine the artificial goal differential for each team? did this change for each game played? Or was a GD generated from probability each game? I would think this would be a critical step to the outcome, as changing the artificial GD changes the results quite a bit.

JLikens said...

snarksd:

The scores for each regular season game were determined through Poisson probabilities, with the probability of a team scoring x goals being based on its theoretical GF/game value and the theoretical GA/game of the opponent.

A random number generator was used to determine the game result.

So, in the latest simulation for example, the first game featured ANA (a 0.472 goal ratio team) against DET (a 0.497 team).

ANA's GF/game and GA/game values were 2.59 and 2.90, respectively. DET's values were 2.73 and 2.76.

Thus, ANA has an adjusted GF/game value of 2.59, and DET has an adjusted GF/game value of 2.89.

DET's random number was 0.743, so it "scored" 4 goals (because the random number fell between 0.67 and 0.83). ANA's random number was 0.451, so it "scored" 2 goals (because the random number fell between 0.27 and 0.52).
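
For anyone who wants to reproduce that lookup, here is a minimal sketch. It assumes the random number is mapped to a goal count by walking up the cumulative Poisson distribution, which is consistent with the ranges quoted above. The function name `goals_from_uniform` is made up for the example.

```python
import math


def goals_from_uniform(u, lam, max_goals=30):
    """Smallest k whose cumulative Poisson(lam) probability exceeds u."""
    p = math.exp(-lam)      # P(0 goals)
    cumulative = p
    k = 0
    while u >= cumulative and k < max_goals:
        k += 1
        p *= lam / k        # P(k goals) from P(k - 1 goals)
        cumulative += p
    return k


det_goals = goals_from_uniform(0.743, 2.89)  # DET: adjusted GF/game of 2.89
ana_goals = goals_from_uniform(0.451, 2.59)  # ANA: adjusted GF/game of 2.59
print(det_goals, ana_goals)  # prints "4 2"
```

With a rate of 2.89, the cumulative probabilities for 3 and 4 goals are roughly 0.67 and 0.83, which is where the 0.67 to 0.83 range for DET comes from.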

Hope that answered your question.

E said...

now this makes for some good ponderin'. thank you.

N Albert Roche said...

I realize I'm quite late to the party here, but do you think you're maybe not thinking about this in the right way? I'd say that although not a huge sample size, a seven game series is enough for us to say that, if playing head to head, the Bruins are the better team than the Canucks. But what about all the other Western Conference teams the Bruins didn't play? Just because the Bruins beat the Canucks, doesn't mean they are better than the Hawks or Sharks or Predators, the teams Vancouver beat, nor does it mean they were better than the Wings or Capitals or Penguins.....They simply had, what was for them, the most favorable set of matchups. The regular season determined that Vancouver was the best team at beating the most teams, but the playoffs showed that they were not as good when it came to beating the other great teams (or those good enough to advance into the next stage of the season).

Anonymous said...

Albert - you are wrong.

Seven games is not a sufficiently large sample.

Anonymous said...

I like this post, but i think JLikens is a raging homosexual.

Food for thought...

Anonymous said...

Anonymous - you are wrong.

JLikens is heterosexual.

As is almost everyone else who cares about hockey.

Hostpph.com said...

I think that it is great that it has a different outcome that people would expect to happen. It makes things a little more interesting.

Anonymous said...

Not convinced...likens definitely is a raging homo erectus