Saturday, December 13, 2008


Since the 2005-06 season, there’s been a lot of talk in the media about the amount of parity that currently exists in the NHL. While I’m inclined to agree with this, I have a feeling that people are simply looking at the (presumably diminished) spread in point totals and making their conclusions on that basis. This is, of course, completely misguided and incorrect.

Point totals themselves are not necessarily indicative of reduced parity. In order to measure parity, you first have to measure team strength, and point totals do not adequately measure team strength.

To be sure, point totals are correlated with team strength. Hockey would be a very strange game if this were not true. However, there are certain problems with point totals that preclude their use as a proxy for team quality.

For one, point totals are influenced by overtime and shootout success, and I would argue that overtime and shootout success have very little to do with how strong a team is. When I use the term ‘team quality’, I’m referring to how good a team is at actually playing hockey. And when I use the term ‘actually playing hockey’, I’m basically referring to how good a team is at winning in regulation. The distinction between regulation and extra-regulation results might seem arbitrary at first, but there's good reason for it. For one thing, overtime and shootout success have almost nothing to do with regulation success. Observe:

Moreover, extra-regulation results are not very repeatable across seasons, especially compared to regulation results.

The fact that extra-regulation results have virtually nothing to do with regulation results and have little to no repeatability suggests that they are largely the product of randomness. If something is largely random, then it cannot be thought of as an underlying ability. And if something cannot be thought of as an underlying ability, then it ought not to be part of a metric that ostensibly measures team strength. And yet, shootout and overtime success does have a sizable effect on point totals. Hence my reluctance to use point totals as a metric for team strength.
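The repeatability argument can be checked with an ordinary correlation between one season's results and the next. What follows is only a rough sketch: the helper `pearson_r` is my own illustration, and the win percentages are made up for five hypothetical teams, not taken from any real season.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up win percentages for five hypothetical teams (illustration only).
season_1 = [0.40, 0.45, 0.50, 0.55, 0.60]
season_2_repeatable = [0.42, 0.44, 0.51, 0.56, 0.58]  # ordering mostly preserved
season_2_random = [0.55, 0.40, 0.60, 0.45, 0.50]      # ordering scrambled

r_repeatable = pearson_r(season_1, season_2_repeatable)
r_random = pearson_r(season_1, season_2_random)
print(round(r_repeatable, 2), round(r_random, 2))
```

A result that reflects underlying ability should look like the first pairing (a correlation near 1), while a largely random one should look like the second (a correlation near 0).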

However, the inadequacy of point totals goes much deeper than this. Even before the advent of 4-on-4 overtime and the shootout, points were not the best metric for team strength. The reason for this is that point totals only reflect wins and losses while completely ignoring the margin of victory. If there are two teams with similar point totals, one of them tending to win convincingly and lose narrowly, the other tending to win narrowly and lose convincingly, then the former team is, in almost all cases, the better team. The concept is an intuitive one. If you disagree with the assertion that a team’s goal differential conveys its ability better than its point total or place in the standings, then you’re probably at the wrong site.

Granted, goal differential per se, while better than points, is not the best available metric. Several corrections need to be made to it before it can serve as one. Firstly, shootout and empty net goals should be excluded from the totals, as they provide no useful information. Secondly, raw goal differential is problematic in that not all teams play identical schedules. Some teams, usually by virtue of playing in a stronger division or conference, are burdened with a more difficult schedule than average. If you thought that the 2005-06 Phoenix Coyotes and the 2005-06 Carolina Hurricanes had equally difficult schedules, then you would be mistaken. Thus, some attempt should be made to correct for schedule difficulty. Finally, it is not so much a team’s absolute goal differential that is important, but its GF-GA ratio. A team that scores 200 goals and concedes 100 is better than one that scores 400 and gives up 300. Furthermore, simple goal differential is too sensitive to scoring context for it to provide any useful information on league parity, as it would lead to the spurious conclusion that there was less parity in higher scoring seasons. These two problems are avoidable by using each team’s Pythagorean expectation instead: essentially, its theoretical winning percentage, determined through the following calculation:

(Adjusted goals for)^2 / [(adjusted goals for)^2 + (adjusted goals against)^2]

The resulting metric can be termed adjusted winning percentage.
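As a minimal sketch, the calculation can be written as a small Python function. The name `adjusted_win_pct` and the `exponent` parameter are my own illustrative choices, and the inputs are the hypothetical 200/100 and 400/300 teams from the comparison above, not real adjusted goal totals.

```python
def adjusted_win_pct(goals_for, goals_against, exponent=2.0):
    """Pythagorean expectation: GF^k / (GF^k + GA^k), with k = 2 as in the formula above."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)

# Both teams have a +100 goal differential, but very different GF-GA ratios.
team_a = adjusted_win_pct(200, 100)  # 40000 / 50000
team_b = adjusted_win_pct(400, 300)  # 160000 / 250000
print(round(team_a, 3), round(team_b, 3))

# Doubling both totals leaves the result unchanged, which is why the
# metric is not sensitive to the league's scoring level.
same_ratio = adjusted_win_pct(400, 200)
print(round(same_ratio, 3))
```

The two teams share the same raw goal differential, yet the first comes out at 0.8 and the second at 0.64, which matches the intuition that the 200/100 team is the stronger one.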

AW% is important as it provides us with a suitable metric for assessing team strength. By computing the standard deviation in AW% across teams in any particular season, we’re essentially measuring parity.
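In code, the parity measure is just the standard deviation of a list of team AW% values. The sketch below assumes the population standard deviation (the post does not say whether the population or sample form was used), and the two four-team leagues are hypothetical numbers chosen purely for illustration.

```python
from statistics import pstdev

def parity(aw_pcts):
    """Spread of AW% across teams in one season; a smaller value means more parity.

    Assumption: population standard deviation. Swap in statistics.stdev
    if the sample form is preferred.
    """
    return pstdev(aw_pcts)

# Hypothetical four-team leagues (illustration only, not real NHL data).
tight_league = [0.48, 0.50, 0.50, 0.52]
spread_league = [0.30, 0.45, 0.55, 0.70]

print(round(parity(tight_league), 4), round(parity(spread_league), 4))
print(parity(tight_league) < parity(spread_league))
```

The league whose teams are bunched around .500 produces the smaller standard deviation, so it is the one with more parity by this measure.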

What, then, does AW% tell us about the amount of parity in the NHL over the last ten years?

A few comments. Firstly, parity in the pre-lockout NHL was pretty invariant on a year-to-year basis (mean: 0.094, ST DEV: 0.008). Only 1996-97 is anomalous, with all of the remaining values falling between 0.092 and 0.101. Secondly, there is clearly more parity (read: the standard deviation in AW% is smaller) in the post-lockout NHL relative to the pre-lockout NHL. The difference may not seem like much, but the 2005-06 and 2006-07 values are one SD from the pre-lockout mean. The value for 2007-08 is 4 SD(!) from the pre-lockout mean. That's a fairly significant difference.
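To put the "number of SDs" comparison in concrete terms, here is a rough sketch using only the pre-lockout mean and standard deviation quoted above. The function names are my own, and the values it prints are what those rounded figures imply (assuming the post-lockout seasons sit below the pre-lockout mean, since their spread is smaller). They are not the actual season values.

```python
PRE_LOCKOUT_MEAN = 0.094  # quoted above
PRE_LOCKOUT_SD = 0.008    # quoted above

def sds_from_mean(value, mean=PRE_LOCKOUT_MEAN, sd=PRE_LOCKOUT_SD):
    """How many pre-lockout SDs a season's parity value sits from the pre-lockout mean."""
    return (value - mean) / sd

def value_at(n_sds, mean=PRE_LOCKOUT_MEAN, sd=PRE_LOCKOUT_SD):
    """Parity value lying n_sds standard deviations from the pre-lockout mean."""
    return mean + n_sds * sd

one_sd_below = value_at(-1)   # roughly where a season 1 SD below the mean would sit
four_sd_below = value_at(-4)  # roughly where a season 4 SD below the mean would sit
print(round(one_sd_below, 3), round(four_sd_below, 3))
```

Taking the rounded figures at face value, a season one SD below the pre-lockout mean would have a spread of about 0.086, and a season four SD below it would have a spread of about 0.062.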

Parity in the new NHL seems to be more reality than fiction. Teams really are less separated in ability now compared to five or ten years ago. I find this interesting because the shootout and three-point games seem, to me, like a ploy designed by the NHL with the intention of creating the illusion of parity. However, the fact that the new NHL is characterized by genuine parity has in some sense obviated this purpose. That considered, perhaps the NHL should do away with three-point games and the shootout. I certainly wouldn't complain.


Sunny Mehta said...

Agreed that goal ratio is the best indicator of team strength in the long run. But i'm starting to come around to the idea that shot ratio might be better to look at in the short run (and i'm considering one season to be the "short run").

I'm sure you read Tyler's thing on PDO Numbers (ES SV% + ES S%), but basically the premise is that even goals scored and goals allowed can be subject to a lot of short term randomness due to the influence of a lucky (i.e. - unsustainable) sv% or s%.

So, to take your analogy further, if two teams had the same goal differential and the same goal ratio in a given season, i might be inclined to think the one with the better shot ratio was a stronger team.

JLikens said...

Well, I think it's an issue of reliability vs validity.

You're definitely right that shot ratio is less subject to error than goal ratio -- the split half reliability for shots for and shots against is much higher than that for goals for and goals against.

On the other hand, goal ratio is better in the sense that, relative to shot ratio, it captures more of the factors that have an impact on team performance (goaltending, shot quality, etc).

For example, shot ratio likely doesn't do justice to teams like Pittsburgh and Minnesota, who routinely get outshot yet manage, and have managed, to stay in the black in terms of GD (though for different reasons, I suspect). And it likely overrates shoot-happy teams with reliably bad goaltending, like Toronto and Carolina.

I think that expected goals, such as those used by Chris at hockeynumbers, are a step in the right direction in terms of capturing underlying ability while eliminating the noise. Then again, the current figures are only as good as the shot quality data that they're based upon. Not to mention the fact that they completely neglect goaltending as well.

Perhaps a fusion of the three would be most informative?

Sunny Mehta said...

"Perhaps a fusion of the three would be most informative?"

totally. we've made great strides in the realm of hockey knowledge over the past few years. i am excited about where we go from here.

Hostpph said...

you are right points are a good way to measure it. there are other way that it can make a better difference.