Tuesday, May 10, 2011

Team Effects and Even Strength Save Percentage

The extent to which a goaltender's team has an impact on his save percentage - and, in particular, his even strength save percentage - has received some attention in the hockey blogging world in the past - see here and here for some good articles.

One way to gauge team effects on even strength save percentage (EV SV%) is to compare goalies that changed teams to goalies that remained with the same team. This can be done by creating two groups of goalies on the basis of that criterion and looking at how well EV SV% repeats from one year to the next for each group. If the correlation for the group of goalies that changed teams is significantly smaller than the correlation for the group of goalies that remained with the same team, then that would be evidence of team effects.

Using a spreadsheet kindly supplied by Geoff Detweiler, I performed the above exercise with respect to goaltender data from 1997-98 to 2010-11. Goalies that played for more than one team in a single season were excluded. No minimum shots faced cutoff was employed. However, because some of the goalies in the sample faced very few shots in a given season, I used a weighted correlation in which the weight assigned to each season pair was the lower number of shots faced in the two seasons used. Thus, if a goalie faced 1600 EV shots in one season, and 400 in the next, the weight assigned to the season pair would be 400.
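For readers who want to see the weighting scheme concretely, here is a minimal sketch of a weighted Pearson correlation in which each season pair is weighted by the lower of its two shot totals. The season pairs below are made-up numbers for illustration only, and the function name is my own; this is not the spreadsheet calculation itself.

```python
import math

def weighted_correlation(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)

# Hypothetical season pairs (not real goalie data):
# (EV SV% in year 1, EV shots in year 1, EV SV% in year 2, EV shots in year 2)
season_pairs = [
    (0.925, 1600, 0.918, 400),
    (0.915, 900, 0.917, 1100),
    (0.930, 300, 0.921, 1500),
    (0.910, 1200, 0.913, 1250),
]

year1 = [p[0] for p in season_pairs]
year2 = [p[2] for p in season_pairs]
# The weight for each pair is the lower of the two shot totals.
weights = [min(p[1], p[3]) for p in season_pairs]

r = weighted_correlation(year1, year2, weights)
print(weights)
print(round(r, 3))
```

The first hypothetical pair mirrors the 1600 and 400 shot example above, so its weight is 400.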

Additionally, because the league average EV SV% was not uniform over the period in question, I adjusted each goalie's raw EV SV% by dividing it by the league average EV SV% in that particular season. Here are the results:

[ n refers to the number of season pairs in each group ]

The correlations are scarcely distinguishable, which implies that team effects aren't important at even strength. This essentially replicates what Vic Ferrari found when performing similar analysis a few years ago.

Of course, that doesn't necessarily settle the issue. For example, another approach would be to look at the relationship between the EV SV% of a team's starting netminder and the collective EV SV% of its backups. If the two variables are positively correlated, then that implies the existence of team effects.

The advantage of this method is that it allows for team effects to be measured more directly by examining the relationship between the variables of importance at the within-season level. This is significant as team effects on save percentage - to the extent that they do exist - may not repeat overly well from one season to the next. For example, Tom Awad, in an excellent article written last year, found that while team differences in shot quality over the course of a single season were much larger than what would be predicted from chance alone, the metric exhibited weak season-to-season repeatability.

Using the same goaltender data referred to earlier, I separated starting goaltenders and backup goaltenders into two groups. A starting goaltender was defined as the goaltender that faced the most shots for his team in a particular season. All other goaltenders were defined as backups, except for goaltenders that played for more than one team in a season, who were excluded from the sample. Just like in the first method, the EV SV% for all goaltenders was adjusted by dividing it by the league average EV SV% in the particular season. I then determined the weighted correlation between the EV SV% of a team's starter and the collective EV SV% of his backups. The weight assigned to each data pair was the lower of the number of shots faced by the starter and the number faced collectively by his backups. So, for example, if the starter faced 1000 shots, and his backups collectively faced 1400, the weight would be 1000.
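The grouping described above can be sketched roughly as follows. The rows, team codes and column layout are hypothetical, and for simplicity the league average is computed from the sample rows themselves, which is an assumption on my part rather than a description of the actual spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows: (season, team, goalie, EV shots faced, EV saves).
# Goalies who played for more than one team are assumed to be removed already.
rows = [
    ("2009-10", "AAA", "Starter A", 1200, 1108),
    ("2009-10", "AAA", "Backup A1", 400, 364),
    ("2009-10", "AAA", "Backup A2", 100, 93),
    ("2009-10", "BBB", "Starter B", 1000, 915),
    ("2009-10", "BBB", "Backup B1", 800, 730),
    ("2009-10", "BBB", "Backup B2", 600, 551),
]

def starter_backup_pairs(rows):
    by_team = defaultdict(list)
    league = defaultdict(lambda: [0, 0])  # season -> [shots, saves]
    for season, team, goalie, shots, saves in rows:
        by_team[(season, team)].append((shots, saves))
        league[season][0] += shots
        league[season][1] += saves

    pairs = []
    for (season, team), goalies in by_team.items():
        if len(goalies) < 2:
            continue  # no backup to compare the starter against
        goalies.sort(reverse=True)  # most shots first, so index 0 is the starter
        s_shots, s_saves = goalies[0]
        b_shots = sum(g[0] for g in goalies[1:])
        b_saves = sum(g[1] for g in goalies[1:])
        league_sv = league[season][1] / league[season][0]
        pairs.append({
            "season": season,
            "team": team,
            # Both save percentages are divided by the league average.
            "starter": (s_saves / s_shots) / league_sv,
            "backups": (b_saves / b_shots) / league_sv,
            # Weight is the lower of starter shots and pooled backup shots.
            "weight": min(s_shots, b_shots),
        })
    return pairs

pairs = starter_backup_pairs(rows)
for p in pairs:
    print(p["team"], round(p["starter"], 4), round(p["backups"], 4), p["weight"])
```

Team BBB mirrors the example in the text: the starter faced 1000 shots, the backups 1400 collectively, and the weight is 1000.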

After doing all of that, I obtained a correlation of 0.156. With 340 data pairs, the probability of a correlation that large materializing by chance alone is very small - slightly under 1%, in fact. Moreover, it cannot be accounted for by shot recording bias.* Therefore, it would appear that the EV SV% of individual goaltenders is affected to some degree by team effects. The question that must now be answered is this: how large is the effect?
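As a rough check on the significance figure, the sketch below applies the standard Fisher z approximation to a correlation of 0.156 over 340 pairs. It treats the pairs as unweighted and independent, which is a simplification of the weighted analysis, so it should be read as a sanity check rather than the exact calculation behind the number quoted above.

```python
import math

def correlation_p_value(r, n):
    """Two-sided p-value for a Pearson correlation, via the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal at |z|.
    return math.erfc(abs(z) / math.sqrt(2))

p = correlation_p_value(0.156, 340)
print(f"{p:.4f}")
```

Under these assumptions the p-value comes out below 1%, in line with the figure in the text.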

As discussed before on this blog, the fact that two variables are weakly correlated over a given sample does not in itself mean that there is no strong underlying relationship between those variables. For example, if each of the variables exhibits low reliability over the sample in question, a weak correlation may in fact indicate a close underlying relationship. Thus, ascertaining the reliability values of the two variables is critical in interpreting the significance of the correlation between them.

Applying this to our value of 0.156, it becomes necessary to determine the seasonal reliability co-efficients of both starting goalie EV SV% and backup EV SV%. While it is not possible to perform this calculation directly,** it can be approximated by simulating seasons to match the spread of the averaged observed results and noting the average correlation between such seasons. Using this method, the approximate reliability co-efficients are 0.33 for starter EV SV% and 0.22 for backup EV SV%.
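To give a sense of how a simulation of this kind can work, here is a simplified sketch. It assumes each goalie has a normally distributed true EV SV%, approximates the shot-to-shot sampling noise with a normal distribution, and takes the average correlation between two simulated seasons as the reliability estimate. The mean, the talent spread and the shot count are placeholder values I chose for illustration; in practice the talent spread would be tuned until the simulated spread of results matches the observed spread.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def simulated_reliability(mean_sv, talent_sd, shots, goalies=30, trials=2000, seed=1):
    """Average correlation between two simulated seasons for the same goalies."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        talents = [rng.gauss(mean_sv, talent_sd) for _ in range(goalies)]
        # Normal approximation to binomial sampling noise over `shots` shots.
        season_a = [rng.gauss(t, math.sqrt(t * (1 - t) / shots)) for t in talents]
        season_b = [rng.gauss(t, math.sqrt(t * (1 - t) / shots)) for t in talents]
        total += pearson(season_a, season_b)
    return total / trials

# Placeholder inputs (assumptions, not fitted values).
rel = simulated_reliability(mean_sv=0.920, talent_sd=0.0055, shots=1168)
print(round(rel, 2))
```

Lowering the shot count, as would be appropriate for backups, increases the sampling noise and pulls the reliability estimate down.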

These values imply that the true correlation between the two variables is roughly 0.58. Assuming that both variables are normally distributed***, this means that the variation in one variable would be able to explain 33% of the variation in the other over the long run, suggesting that team effects are important.
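The arithmetic behind that figure is the standard correction for attenuation: the observed correlation divided by the square root of the product of the two reliabilities. A short sketch using the values from the text (the function name is my own):

```python
import math

def disattenuated_correlation(r_observed, reliability_x, reliability_y):
    """Estimate of the underlying correlation after correcting for unreliability."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

r_true = disattenuated_correlation(0.156, 0.33, 0.22)
shared_variance = r_true ** 2  # proportion of variance explained
print(r_true, shared_variance)
```

With these inputs the corrected correlation is roughly 0.58, and its square is roughly one third.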

As a final note, this post was intended to generate discussion more than anything. Comments demonstrating flaws in my reasoning and/or methodology are welcome, as is the presentation of contrary evidence.

* Shot recording bias causes the save percentages of goalies playing for the same team to be more similar to one another than would be the case if shots were recorded in the same way in every rink. However, because of a) the small number of shots taken over the course of the season, b) the relatively mild nature of the bias, and c) the fact that half of all games are played on the road, the effect is fairly minor. Of the observed correlation of 0.156, only 0.018 can be attributed to shot recording bias.

** Ordinarily, and as I've done in the past, I would calculate the split-half reliability values for each variable and then calculate the split-half correlation between them. This method is superior as no approximation is necessary with respect to determining the reliability co-efficients. Unfortunately, EV SV% data at the individual game level is required in order to do so. As such data is only available for 2007-08 onward, I'm only able to apply this method to the years of 2007-08, 2008-09 and 2009-10 (and even then, for 5-on-5 play rather than all EV situations). Here are the results:

The results imply that team effects have a very important role in relation to 5-on-5 SV% - indeed, that there would be a perfect correlation between the 5-on-5 SV% of starters and backups in the long run! This being an obvious absurdity, I think it's preferable to ignore this and concern ourselves with the results from the larger 12-year sample instead.

*** This is merely a simplifying assumption. In reality, it is unlikely that either variable is normally distributed.


Anonymous said...

I am not a stats wizard. I was wondering if you could help me by translating your analysis into a more direct prediction of how large the team effect on actual save percentage might be. For example, is it conceivable that team defense can swing a .900 goalie to a .910 goalie? (under what conditions?).

JLikens said...

That's a good question.

Because the evidence is mixed, I'm not entirely confident that there is a significant team effect.

However, we'll work from the assumption that the correlation of 0.15 between starters and backups is both indicative of a real effect and representative of the magnitude of that effect.

In answering your question, the relevant issue is this: What kind of team spread in shot quality allowed does there need to be in order to produce a starter-backup correlation of 0.15?

Chris Boersma publishes EV shot quality data on his statistics website. Using data from 2009-10, I assigned the starters and backups for each team an expected save percentage based on the shot quality allowed rating of their team.

I then simulated 50 seasons - in which the starters faced 1168 shots and backups 508 - and looked at the average correlation between starters and backups.

The average correlation was 0.13, which closely matches the observed correlation of 0.15.

Given that correspondence, we can look at the range in the team shot quality allowed values in order to get a sense of the practical effect size.

Tampa Bay allowed the least dangerous shots and had an expected EV save percentage of 0.925. Carolina allowed the most dangerous shots and had an expected EV save percentage of 0.913.

So it seems that the maximum effect that a team can have on the EV save percentage of its goaltender is approximately 0.006 in either direction, that being half of the 0.012 gap between the two extremes.
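For anyone who wants to try the simulation described above, here is a minimal sketch. Since I'm not reproducing Chris' team-by-team values here, the expected save percentages are hypothetical: 30 teams spaced evenly between the 0.913 and 0.925 extremes quoted above. The shot totals (1168 for starters, 508 for backups) and the 50 seasons are taken from the description; everything else, including the function names and the seed, is an assumption.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def simulate_sv(rng, expected_sv, shots):
    """Simulate one goalie-season: each shot is saved with probability expected_sv."""
    saves = sum(1 for _ in range(shots) if rng.random() < expected_sv)
    return saves / shots

def average_starter_backup_correlation(expected_svs, starter_shots=1168,
                                       backup_shots=508, seasons=50, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(seasons):
        # Starters and backups on a team share the same expected SV%,
        # so any correlation comes from the team effect alone.
        starters = [simulate_sv(rng, p, starter_shots) for p in expected_svs]
        backups = [simulate_sv(rng, p, backup_shots) for p in expected_svs]
        total += pearson(starters, backups)
    return total / seasons

# Hypothetical team values: evenly spaced across the quoted range.
teams = 30
expected_svs = [0.913 + 0.012 * i / (teams - 1) for i in range(teams)]

avg_r = average_starter_backup_correlation(expected_svs)
print(round(avg_r, 3))
```

Because the team values here are evenly spaced rather than the real ones, the output should not be expected to match the 0.13 reported above exactly.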

Sunny Mehta said...


I'm not sure what happened to your response to Anonymous, it seems to have disappeared. But I wanted to point out one thing...

If a shot quality model is measuring something other than scorer bias and playing-to-the-score effects, we should see the effects even in road games when the score is tied.

I took Chris' data for the 2010-'11 season and assigned each team the Expected Save Percentage that Chris' model predicts they'd have at even strength due to team shot quality effects alone, i.e. if every goaltender were the same.

I then let each team flip a coin weighted to their individual ExSVP, and flip it the number of times equal to their actual shots against at even strength on the road in tied games.

If there were no goalie skill at all, chance and team effects alone (according to Chris' model) would result in a save percentage spread of about .015 sd.

Turns out the actual sd was only about .013 though, so something about the model seems off. And it's not just a small sample size thing because if every team's coin were weighted to the same league average SvP, we'd see a spread of about .013 sd for EV-road-tied. So even if every goalie were the same, and the shots against sample size is as low as EV-road-tied, we should still see the results of team shot quality if the effect is as big as Chris' model assumes.

But we don't.

JLikens said...

Strange thing about my response disappearing. Hopefully Anonymous had a chance to see it while it was still up. If not, I'll summarize what I said originally.

But to address your comment, the point of the exercise was to see what range in team EV SQA values would produce a starter-backup correlation in the neighborhood of 0.15. Chris' data fit the observed results well in that respect, so I decided to use it instead of arbitrarily assigning my own values.

As to the quality of the SQA data itself, I have no position either way, although it does seem that scorer bias is an issue, what with Tampa Bay consistently ending up at the top of the list.
