Tuesday, April 27, 2010

The Repeatability of Special Teams Performance

In my post on playoff probabilities, one of the methods by which I calculated each team's expected winning percentage was on the basis of the underlying numbers.

Under this model, shot volume on the powerplay, shot prevention on the penalty kill, as well as penalty differential, were incorporated as determinants of special teams goal differential. However, neither shooting percentage on the powerplay nor save percentage on the penalty kill were used as predictors.

Initially, my intention was to include both variables within the model. However, after looking at the relationship between team powerplay shooting percentage in even numbered games and team powerplay shooting percentage in odd numbered games in the 09-10 regular season, I discovered that there was essentially no correlation. I then did the same thing for the 07-08 season, and the result was the same: no relationship.
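A minimal sketch of the even/odd comparison described above. The data layout (a list of per-game `(pp_goals, pp_shots)` tuples for each team, in schedule order) and all function names are assumptions made for illustration, as is the choice to pool goals and shots within each half rather than average per-game percentages.

```python
from math import sqrt

def pearson(xs, ys):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def split_shooting_pct(games):
    """games: list of (pp_goals, pp_shots), one tuple per game, in order.

    Returns (even-game S%, odd-game S%). Goals and shots are pooled
    within each half (an assumption, see the lead-in).
    """
    halves = []
    for parity in (0, 1):  # 0 = even-numbered games, 1 = odd-numbered games
        subset = [g for i, g in enumerate(games, start=1) if i % 2 == parity]
        goals = sum(g for g, _ in subset)
        shots = sum(s for _, s in subset)
        halves.append(goals / shots)
    return halves[0], halves[1]

def even_odd_correlation(league):
    """league: dict mapping team name -> list of (pp_goals, pp_shots)."""
    pairs = [split_shooting_pct(games) for games in league.values()]
    evens = [p[0] for p in pairs]
    odds = [p[1] for p in pairs]
    return pearson(evens, odds)
```

A correlation near zero across the 30 teams is what "no relationship" means here; the sketch does not attempt a significance test.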

I found this to be unusual, given that I had looked at the distribution of powerplay shooting percentage in the past and found that the team-to-team spread was somewhat broader than what one would expect if there was no skill component. Nevertheless, my exercise had revealed the absence of any split-half correlation, thus necessitating the exclusion of PP S% from the model.
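One way to make the "broader than one would expect" comparison concrete is to set the observed team-to-team standard deviation against the spread that binomial luck alone would produce. The sketch below is illustrative only: it assumes every team takes the same number of powerplay shots and uses the simple mean of team percentages as the league rate, neither of which comes from the original analysis.

```python
from math import sqrt
from statistics import pstdev

def luck_only_sd(league_pct, shots_per_team):
    """Binomial standard deviation of a team's S% if every team shared
    the same true shooting percentage (no skill component)."""
    return sqrt(league_pct * (1 - league_pct) / shots_per_team)

def spread_ratio(team_pcts, shots_per_team):
    """Observed SD of team S% divided by the luck-only SD.

    A ratio near 1 is consistent with pure luck; a ratio well above 1
    suggests some spread in underlying ability.
    """
    league_pct = sum(team_pcts) / len(team_pcts)  # assumption: simple mean
    return pstdev(team_pcts) / luck_only_sd(league_pct, shots_per_team)
```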

(As mentioned above, I also excluded PK save percentage, even though I had not specifically examined its repeatability. This was somewhat unjustified given that, as discussed below, team PK SV% is somewhat repeatable. However, the regression is fairly strong and, even though I ought to have taken it into account, its exclusion didn't affect things too greatly.)

In any event, my curious findings prompted the following question: To what degree is special teams performance repeatable?

Real Effects

Vic Ferrari had an excellent post about a year ago where he looked at the various components of team even strength performance -- specifically, shooting percentage, save percentage, and shot differential -- and determined the extent to which each component was repeatable. His method involved looking at each team's shooting percentage, save percentage, and shot differential, all at EV with the score tied, in 38 randomly selected games from the 2008-09 season. He then looked at the same variables over a separate 38 game sample, and determined the correlation between the two sets of games. The exercise was then repeated over 1000 simulations.

The rationale behind the exercise is a simple one -- as expressed by Vic, "if an element of nature is affected by something other than randomness, that it should sustain itself from one independent sample to another." Thus, if the observed correlation is significantly non-zero, it can be assumed that the variable is at least partly determined by factors other than luck. On the other hand, if the observed correlation is insignificantly different than zero, then fluctuations in the variable are assumed to be primarily luck driven.

I decided to apply a similar technique in order to determine the degree to which the components of special teams performance are governed by 'real effects.' Specifically, my methodology involved the following:
  • I obtained special teams data at the team level for each season from 2003-04 to 2009-10
  • Within each season, I looked at team performance on special teams at the level of individual games
  • In particular, I looked at the following variables: powerplay shooting percentage, penalty kill save percentage, powerplay shot rate (shots for divided by time on ice), penalty kill shot rate (shots against divided by time on ice), and powerplay ratio (the ratio of powerplays drawn to powerplays conceded)
  • However, shooting rates were not examined for 2007-08, 2008-09, and 2009-10, as I was not able to obtain data on PP TOI and PK TOI for those seasons
  • Empty net goals were excluded when calculating shots and goals
  • For each team, I randomly selected 20 home games and 20 road games, combined the two sets of games, and looked at how that team performed within that sample with respect to the above stated variables
  • I then did the same thing for 40 other randomly selected games (again, consisting of 20 home and 20 road games)
  • I then looked at the correlation between the two sets of games for each of the listed variables
  • I repeated the exercise 1000 times, for each of the six seasons
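
The steps above can be sketched roughly as follows. This is not the code used for the post: the data layout (each team as a dict with `"home"` and `"road"` lists of per-game dicts holding a numerator `"num"` and denominator `"den"`, e.g. PP goals and PP shots), the function names, and the decision to pool numerators and denominators within each sample are all assumptions made for illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def draw_disjoint_samples(home_games, road_games, rng, per_venue=20):
    """Return two non-overlapping samples, each made of `per_venue` home
    games and `per_venue` road games."""
    home = rng.sample(home_games, 2 * per_venue)  # sampled without replacement
    road = rng.sample(road_games, 2 * per_venue)
    first = home[:per_venue] + road[:per_venue]
    second = home[per_venue:] + road[per_venue:]
    return first, second

def pooled_rate(games):
    """Sum of numerators over sum of denominators across a set of games."""
    return sum(g["num"] for g in games) / sum(g["den"] for g in games)

def mean_split_half_r(league, n_sims=1000, seed=0):
    """Average, over n_sims random splits, of the across-team correlation
    between the two samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        xs, ys = [], []
        for team in league.values():
            a, b = draw_disjoint_samples(team["home"], team["road"], rng)
            xs.append(pooled_rate(a))
            ys.append(pooled_rate(b))
        total += pearson(xs, ys)
    return total / n_sims
```

Running `mean_split_half_r` once per variable and per season would give one averaged correlation for each cell of a season-by-variable table. Drawing 40 home and 40 road games in one call and then splitting them is one simple way to guarantee that no game appears in both samples.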

The results

I should note that the final highlighted column shows the averaged value for each variable.

As indicated by the table, both generating shots on the powerplay and preventing shots on the penalty kill appear to be largely ability driven measures. The same applies to drawing more powerplays than the opposition.

Not surprisingly, both PP S% and PK SV% are less ability driven than the other three variables. It's worth noting that PK SV% appears to be more reliable than PP S%. I presume that this can be attributed to the influence of the goaltender on PK SV%.


Scott Reynolds said...

Thanks JLikens! This is great!

The results are about what I'd expect, though PK save percentage looks to be more repeatable than I'd anticipated. I just figured forty games wouldn't usually provide a big enough sample of PK shots to get a look at the real ability of goaltenders.

A quick question: Does this data include only 5-on-4 situations or do they also include 6-on-4, 5-on-3 and 4-on-3 chances?

Vic Ferrari said...

Great stuff again, J.

JLikens said...



You raise a good point about the sample size that I planned to address in my post, but opted not to in order to keep the post length tolerable.

40 games is a fairly small sample, and if you look at the seasonal correlation for the same variables, the relationship is invariably stronger. For example, the seasonal correlation for PP S% at the team level is somewhere on the order of ~0.25.

The strength of the PK SV% correlation is interesting, though. If the same exercise is applied to EV SV% with the score tied, the average correlation is about 0.10. The increased correlation observed with PK SV% is not the result of sample size, as the number of shots faced over the course of a season is about the same for both situations (~500).

As the goaltender exerts influence in both situations, the implication is that team factors have an effect on PK SV%. I believe that Vic made this point a while back, although I can't remember where.

Regarding your other question, the data includes all powerplay situations, rather than just 5-on-4. Ideally, I ought to have isolated 5-on-4 situations. However, because I scraped the information from the NHL.com play-by-play feed -- which merely identifies whether or not the shot was taken on the powerplay, and not the specific type of man advantage -- I wasn't able to.

JLikens said...



It wouldn't have been possible but for your initial post on real effects.

It's a great method, and not something that I would have come up with on my own.

Scott Reynolds said...

Thanks for the response.

Why would the stronger correlation in PK save percentage suggest that the team has a greater effect in that situation? If the percentage of scoring chances per shot is higher on the PK, perhaps that situation is more revealing of goaltender ability. On the other hand, maybe some teams really are better than others to a significant degree at limiting shot quality on the PK. It's an interesting question but I'm not sure I'm fully comfortable with either answer as the right one just yet. Though if I had to guess, I would side with you that the team has a greater effect on the PK than they do at EV.

JLikens said...


You're right - both explanations are possible.

I suppose one way to settle the issue would be to compare the PK SV% of goaltenders that changed teams to the PK SV% of goaltenders that remained with the same team.

Perhaps that's something that I ought to look at this offseason.

Vic Ferrari said...

I think that getting rid of the 5 on 3 PKs would make a significant difference. There are teams that just never seem to have to kill one off (I don't know if I've ever seen MIN killing a 5 on 3) and teams that always seem to take a whack of penalties while PKing (ANA comes to mind). That's going to make a measurable difference when we're looking at things so closely.

Also, shot counter bias is huge here. Corsi and Fenwick provide some remedy for that at EV, but they don't help us on special teams. I think a guy would have to use just road games in order to minimize that effect.

A quick check for team effects on goalie PK save% would be to look at team 4v5 save% from Gabe's site. Then the same for the No.1 goalie on each team. Then subtract the shots and saves for each goalie from his team's totals. That would be the combined backup goalie shots and saves on the PK.

Then see if there are any real effects from starter to backup(s).

Makes sense, no?
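
The subtraction step described in this comment is simple arithmetic; a minimal sketch follows. The function name, argument names, and the returned dict layout are hypothetical, and the inputs are assumed to be season totals at 4v5.

```python
def backup_pk_line(team_shots, team_saves, starter_shots, starter_saves):
    """Derive the combined backup goalies' 4v5 shots, saves and SV% by
    subtracting the starter's totals from the team's totals.

    Returns None when the backups faced no shots, since SV% is undefined.
    """
    shots = team_shots - starter_shots
    saves = team_saves - starter_saves
    if shots <= 0:
        return None
    return {"shots": shots, "saves": saves, "sv_pct": saves / shots}
```

Pairing each starter's SV% with the `sv_pct` from this function, team by team, gives the two columns needed to check for a starter-to-backup relationship.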

JLikens said...


Good point - the 5-on-3s are a big confound.

Your suggestion about comparing the 4-on-5 SV% of the starter to his backups is a good one.

I actually remember doing a similar exercise last year (i.e. looking at the (weighted) correlation between the SV% of the starter and that of his backup within certain seasons).

In order to mitigate the sample size issues, I assigned each data point a weighting based on the number of shots faced by the backup.

I wasn't able to find much - If I recall correctly, the correlation was about 0.15 for EV and 0.04 on the PK. However, the PK data included 5-on-3s, which, as you said, tends to distort things.
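
A weighted correlation of the kind described above can be sketched as follows, with one data point per team-season and the backup's shots faced as the weight. The function name and argument layout are assumptions for illustration, and the exact weighting scheme used in the original exercise is not specified beyond being based on shots faced by the backup.

```python
from math import sqrt

def weighted_pearson(xs, ys, weights):
    """Pearson correlation in which each (x, y) pair counts in proportion
    to its weight, e.g. x = starter SV%, y = backup SV%,
    weight = shots faced by the backup."""
    total = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / total
    my = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    var_x = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    var_y = sum(w * (y - my) ** 2 for w, y in zip(weights, ys))
    return cov / sqrt(var_x * var_y)
```

With equal weights this reduces to the ordinary Pearson correlation; a backup who faced very few shots contributes correspondingly little.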

I think I'll take your suggestion and repeat the exercise at 4-on-5 only (ideally, with home games excluded as well).

Tom Awad said...

Stupid question... in your 2 samples of 40 games, can there be any overlap (i.e. games chosen in both samples?). That would skew the results.

JLikens said...


There was no overlap.

If a game was chosen in one group, it was not included in the other.

Tom Awad said...

Just checking. Excellent analysis, BTW.

Tom Awad said...

Another stupid question (I'm full of them!): any idea if arena bias plays into the PP / SH rates? At evens it balances out, but if we have arenas that record fewer shots suddenly we're seeing "repeatable skill". It would affect the percentages too, in the other direction, but there's so much noise there that you wouldn't notice it.

It would be interesting if you did only road games. The coefficient would go down, but it might go down more than we think.

JLikens said...


You're right - the inclusion of home games is going to make each metric appear to be more sustainable than it actually is (aside from powerplay ratio, of course).

Initially, I considered looking only at road games, but decided against that because I didn't want to make the sample any smaller than it already was.

That said, I think I'll take your suggestion and re-run the experiment for road games only. If nothing else, it'll provide an interesting comparator.

BTW, your questions aren't stupid - they raise some legitimate issues that I didn't address in my post.
