Wednesday, January 26, 2011

Even Strength Outshooting and Team Quality

Readers familiar with the team even strength data that I've published over the course of the season might wonder why I seem to place a large amount of emphasis on even strength shot ratio with the score tied.

After all, only about 35% of league play occurs with the score tied. And of that 35%, one-fifth consists of special teams play. Taken together, that means that time played at even strength with the score tied represents less than 30% of a typical NHL game.

Indeed, if we examine the relationship between a team's even strength shot ratio with the score tied and its overall goal ratio for every season since the lockout, we find an average correlation of 0.556, meaning that even strength shot ratio with the score tied accounts for only roughly 30% of the variance in outscoring over a single NHL season.
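As a quick check on the two back-of-envelope figures above, here is a minimal Python sketch. It uses only the numbers quoted in the text; the variable names are mine.

```python
# Figures quoted in the post.
tied_share = 0.35            # share of league play with the score tied
special_teams_share = 1 / 5  # portion of that tied play spent on special teams

# Share of a typical game played at even strength with the score tied.
ev_tied_share = tied_share * (1 - special_teams_share)

# Squaring a correlation gives the proportion of variance explained.
r_observed = 0.556
variance_explained = r_observed ** 2

print(f"{ev_tied_share:.2f}")       # 0.28
print(f"{variance_explained:.3f}")  # 0.309
```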

However, because goals in the modern-day NHL are relatively rare events, a substantial proportion of the team-to-team variation in seasonal goal ratio can be attributed to luck. For example, random variation accounted for 47, 35, and 41 percent of team variation in goal ratio in 2007-08, 2008-09 and 2009-10, respectively.
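One common way to put a number on the luck component is to compare the spread you would expect from chance alone with the spread actually observed between teams. The sketch below takes that approach under a simple binomial assumption. It is not necessarily the method used to produce the figures above, and the function name and sample inputs are hypothetical.

```python
from statistics import mean, pvariance

def luck_share(ratios, events):
    """Estimate the fraction of between-team variance attributable to chance.

    ratios: each team's observed proportion (e.g. goals for / total goals).
    events: number of events behind each proportion (e.g. total goals).
    Assumption: every event is an independent trial at the league-average rate.
    """
    league_rate = mean(ratios)
    observed_var = pvariance(ratios)
    # Binomial sampling variance of a proportion, averaged across teams.
    chance_var = mean([league_rate * (1 - league_rate) / n for n in events])
    return min(1.0, chance_var / observed_var)

# Hypothetical five-team league, 450 total goals per team.
example = luck_share([0.45, 0.48, 0.50, 0.52, 0.55], [450] * 5)
print(round(example, 2))
```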

In a hypothetical season with a sufficiently long schedule, that random variation would eventually disappear, leaving each team with a goal ratio commensurate with its abilities. What each team's goal ratio might look like in such a scenario can be approximated by taking its seasonal statistics - namely, shot ratio, shooting percentage and save percentage - and adjusting them to account for the extent to which each one is affected by random variation.* For both shooting and save percentage, the adjustment is significant as luck accounts for a majority of the variation in respect of both over the course of a single season, as indicated in the table below.

For shot ratio, however, the adjustment is less severe as the impact of randomness is comparatively smaller. Consequently, as the sample size increases, so too does the correlation between shot ratio and goal ratio.
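The adjustment described above can be sketched as a simple regression to the mean: keep only the portion of a team's deviation from league average that isn't attributed to luck, then rebuild a goal ratio from the adjusted components. This is a minimal illustration for a single game situation, not the exact procedure behind the tables. The luck shares and team numbers below are made up.

```python
def regress(observed, league_mean, luck_share):
    """Shrink an observed value toward the league mean.

    luck_share is the fraction of between-team variance attributed to chance,
    so only (1 - luck_share) of the deviation from the mean is retained.
    """
    return league_mean + (1 - luck_share) * (observed - league_mean)

def goal_ratio(shot_ratio, sh_pct, sv_pct):
    """Goal ratio implied by shot ratio, shooting % and save %."""
    goals_for = shot_ratio * sh_pct
    goals_against = (1 - shot_ratio) * (1 - sv_pct)
    return goals_for / (goals_for + goals_against)

# Hypothetical team and hypothetical luck shares, for illustration only.
true_shot = regress(0.54, 0.50, luck_share=0.25)   # mild adjustment
true_sh = regress(0.090, 0.080, luck_share=0.70)   # heavy adjustment
true_sv = regress(0.915, 0.920, luck_share=0.70)   # heavy adjustment
print(round(goal_ratio(true_shot, true_sh, true_sv), 3))
```

Because the percentages are pulled much closer to average than shot ratio is, the adjusted goal ratio ends up tracking shot ratio far more closely than the raw one does.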

If this exercise is performed for each post-lockout season, one is able to determine the relationship between true goal ratio and even strength shot ratio with the score tied. The results:**

Therefore, in an imaginary league in which luck is a complete non-factor, EV shot ratio with the score tied would account for roughly 65% of the variance in outscoring. In other words, even though the two variables may not be strongly correlated over the course of a single season, a team's EV shot ratio with the score tied serves as a reasonably good indicator of how it can be expected to perform over the long run. This is especially true for the three most recent seasons, in which EV shot ratio accounts for 75% of the variation in outscoring ability. It seems that as the level of parity between teams has increased, even strength outshooting has become even more important.

Finally, the remaining 35% of outscoring variance indicates that there are other sustainable components of team success. Apportioning the remaining proportion of the variance between these components gives us an idea of their relative importance.

As special teams ability and EV tied shot ratio are correlated variables, residual special teams skill refers to the proportion of special teams skill that cannot be accounted for by EV tied shot ratio. Residual special teams skill accounts for about 49% of the remaining variance.

Similarly, residual EV shot ratio refers to the proportion of even strength outshooting that cannot be predicted by EV tied shot ratio. This accounts for 7% of the remaining variance.

The rest of the remaining variance is explained by even strength shooting, even strength save percentage and residual variance. Residual variance is the amount of variance left over after subtracting the sum of the other four components from 1. It results from the fact that the four components are not uncorrelated, independent variables.
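To put those shares on a common scale, the sketch below converts the percentages of the remaining variance into percentages of total outscoring variance. It uses the rounded figures quoted above (65% explained, 49% and 7% of the remainder); the variable names are mine.

```python
explained_by_ev_tied = 0.65           # rounded share from the post
remaining = 1 - explained_by_ev_tied  # the "remaining 35%"

# Shares of the remaining variance, expressed as shares of the total.
residual_special_teams = 0.49 * remaining
residual_ev_shot_ratio = 0.07 * remaining
# EV shooting %, EV save % and residual variance together.
everything_else = remaining - residual_special_teams - residual_ev_shot_ratio

for label, share in [
    ("EV tied shot ratio", explained_by_ev_tied),
    ("residual special teams", residual_special_teams),
    ("residual EV shot ratio", residual_ev_shot_ratio),
    ("EV SH%, EV SV% and residual", everything_else),
]:
    print(f"{label}: {share:.1%} of total variance")
```

On these rounded inputs, residual special teams skill works out to about 17% of total variance, residual EV shot ratio to about 2.5%, and the percentages plus residual variance to about 15%.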

* even strength and special teams statistics were, of course, treated separately for this part of the analysis

**There is an alternative calculation that can be applied as a check on the correctness of these values. As the seasonal reliability of both goal ratio and EV tied shot ratio is imperfect, it is necessary to upwardly adjust the observed correlations between the two variables in order to ascertain their 'true' relationship - that is, the correlation that would result if each variable were perfectly reliable.

The adjustment involves dividing the observed correlation by the square root of the product of the two variables' reliability co-efficients. In other words:

r adjusted = r observed / SQRT(reliability EV tied shot ratio * reliability goal ratio)

The application of the above formula involves determining the reliability co-efficients for each variable, which can be calculated as follows:

reliability = 1 - [(1 - split-half reliability) / SQRT(2)]
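The two formulae translate directly into code. The sketch below implements them exactly as written above; the function names and the split-half inputs in the example are hypothetical.

```python
from math import sqrt

def full_season_reliability(split_half):
    """Full-season reliability from a split-half value, per the formula above."""
    return 1 - (1 - split_half) / sqrt(2)

def adjusted_correlation(r_observed, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both variables."""
    return r_observed / sqrt(rel_x * rel_y)

# Hypothetical split-half reliabilities, for illustration only.
rel_shot = full_season_reliability(0.80)
rel_goal = full_season_reliability(0.40)
print(round(adjusted_correlation(0.556, rel_shot, rel_goal), 3))
```

Note that the adjusted value can exceed 1 when the reliabilities are low or poorly estimated, which is one reason to treat the result as an approximation.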

If these formulae are applied with respect to each post-lockout season, the following results:

The average adjusted correlation is 0.81, which is comparable to the average adjusted correlation obtained through the first method (0.804). It should be noted that this second method is likely to slightly overestimate the true correlation, given that the two variables are not truly independent.

EDIT: Accidentally used Fenwick ratio instead of Shot ratio when determining observed correlations for 2007-08, 2008-09 and 2009-10. The table and accompanying discussion have been edited accordingly.

EDIT 2: In re-thinking the method used in the alternative calculation, it occurred to me that the better way to adjust the observed correlations would be to calculate all three input values at the half-season level.

There's no sense in using the split-half reliabilities in order to estimate the full season reliabilities for EV shot ratio and goal ratio when the split-half reliabilities can be used themselves, given that the split-half correlation between EV shot ratio and goal ratio is readily ascertained.

This approach produced the following results.*

* the half-season values were calculated through randomly selecting 40 games, randomly selecting another 40 games without replacement, and determining the correlation between the relevant variables across the data sets. This was repeated 1000 times, with the average values used.
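The resampling procedure can be sketched as follows. This is an illustration rather than the code used for the table: the league data is synthetic, the names are mine, and I've assumed the observed correlation is taken across halves (shot ratio in one half against goal ratio in the other).

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def half_ratios(games):
    """games: list of (shots_for, shots_against, goals_for, goals_against)."""
    sf, sa, gf, ga = (sum(col) for col in zip(*games))
    return sf / (sf + sa), gf / (gf + ga)

def split_half_adjusted(teams, half=40, reps=1000, rng=random):
    """teams: dict mapping team -> list of per-game tuples (>= 2 * half games).

    Returns (adjusted r, shot ratio reliability, goal ratio reliability,
    observed r), each averaged over `reps` random splits.
    """
    sums = [0.0, 0.0, 0.0]
    for _ in range(reps):
        shot_1, goal_1, shot_2, goal_2 = [], [], [], []
        for games in teams.values():
            picked = rng.sample(games, 2 * half)  # two disjoint sets of games
            s1, g1 = half_ratios(picked[:half])
            s2, g2 = half_ratios(picked[half:])
            shot_1.append(s1)
            goal_1.append(g1)
            shot_2.append(s2)
            goal_2.append(g2)
        sums[0] += pearson(shot_1, shot_2)  # shot ratio split-half reliability
        sums[1] += pearson(goal_1, goal_2)  # goal ratio split-half reliability
        # Assumption: observed correlation is measured across the two halves.
        sums[2] += (pearson(shot_1, goal_2) + pearson(shot_2, goal_1)) / 2
    rel_shot, rel_goal, r_obs = (s / reps for s in sums)
    if rel_shot * rel_goal <= 0:
        raise ValueError("reliabilities must be positive to adjust")
    return r_obs / sqrt(rel_shot * rel_goal), rel_shot, rel_goal, r_obs

def synthetic_league(rng, n_teams=30, n_games=82, shots_per_game=60):
    """Made-up data: each team has a fixed shot share and 8% of shots score."""
    teams = {}
    for t in range(n_teams):
        share = rng.uniform(0.40, 0.60)
        games = []
        for _ in range(n_games):
            sf = sum(rng.random() < share for _ in range(shots_per_game))
            sa = shots_per_game - sf
            gf = sum(rng.random() < 0.08 for _ in range(sf))
            ga = sum(rng.random() < 0.08 for _ in range(sa))
            games.append((sf, sa, gf, ga))
        teams[f"team{t}"] = games
    return teams

if __name__ == "__main__":
    rng = random.Random(0)
    print(split_half_adjusted(synthetic_league(rng), reps=200, rng=rng))
```

In the synthetic league every team has identical shooting and save percentage talent, so the adjusted correlation should land near 1. Real data would of course come in lower.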


Wednesday, January 19, 2011

EV Data for Games 1 - 692

The spreadsheet that should appear below contains detailed EV data at the team level for games 1 to 692. As the NHL has inexplicably failed to publish play-by-play data for the following games, they were not included:

Game 124 - WSH@CAR
Game 429 - ATL@NYI
Game 491 - PHX@PIT*

*The play-by-play feed for this game was initially available, but is no longer accessible. Consequently, I have EV and EV close data for this game, but no EV tied data.


A couple of points:

The document contains three worksheets. The first sheet shows even strength data for all situations. The second shows even strength data for when the score was close (i.e. whenever the score margin was 1 or 0 in the first two periods, or tied in the third period or overtime). The last sheet contains data for when the score was tied.
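For anyone wanting to reproduce the three-way split, the worksheet definitions above boil down to a simple rule on period and score margin. A minimal sketch, with a hypothetical function name and overtime encoded as period 4:

```python
def worksheets_for(period, margin):
    """Return which of the three worksheets an EV event belongs to.

    period: 1, 2 or 3 for regulation, 4 for overtime.
    margin: goal difference at the time of the event (either sign).
    """
    sheets = ["all"]
    gap = abs(margin)
    # Close: within one goal in periods 1-2, or tied in period 3 / overtime.
    if (period <= 2 and gap <= 1) or (period >= 3 and gap == 0):
        sheets.append("close")
    if gap == 0:
        sheets.append("tied")
    return sheets

print(worksheets_for(2, -1))  # ['all', 'close']
print(worksheets_for(3, 1))   # ['all']
print(worksheets_for(4, 0))   # ['all', 'close', 'tied']
```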

Empty net goals have been removed from the data.

The abbreviations are defined as follows:

GF: goals for
GA: goals against
SF: shots for, where shots = goals + saved shots
SA: shots against
SHOT%: shots for/(shots for + shots against)
SH%: shooting percentage
SV%: save percentage
PDO: shooting percentage + save percentage
FF: fenwick for, where fenwick = shots + missed shots
FA: fenwick against
F%: fenwick for/ (fenwick for + fenwick against)
CF: corsi for, where corsi = shots + missed shots + blocked shots
CA: corsi against
C%: corsi for/ (corsi for + corsi against)
ADJ: refers to the fact that an adjustment for schedule difficulty has been made. See here for the details of the adjustment process.
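The definitions above can be computed from raw counts as in the sketch below. The function and argument names are mine, the sample counts are invented, and the schedule adjustment (ADJ) is not included. Here "blocked_for" means the team's own attempts that were blocked by the opponent.

```python
def team_metrics(gf, ga, sf, sa, missed_for, missed_against,
                 blocked_for, blocked_against):
    """Compute the spreadsheet's rate stats from raw event counts.

    sf / sa are shots on goal (goals + saved shots), for and against.
    """
    sh_pct = gf / sf        # shooting percentage
    sv_pct = 1 - ga / sa    # save percentage
    ff, fa = sf + missed_for, sa + missed_against  # Fenwick events
    cf, ca = ff + blocked_for, fa + blocked_against  # Corsi events
    return {
        "SHOT%": sf / (sf + sa),
        "SH%": sh_pct,
        "SV%": sv_pct,
        "PDO": sh_pct + sv_pct,
        "FF": ff, "FA": fa, "F%": ff / (ff + fa),
        "CF": cf, "CA": ca, "C%": cf / (cf + ca),
    }

# Invented counts, for illustration only.
stats = team_metrics(gf=8, ga=6, sf=100, sa=100,
                     missed_for=40, missed_against=35,
                     blocked_for=50, blocked_against=45)
for name, value in stats.items():
    print(name, round(value, 3))
```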

Finally, as mentioned earlier, I've included EV tied data this time around. The reason is that score effects appear to remain very much relevant when the score margin is 1 in the first two periods. For example, consider the table below, which shows how teams trailing by one goal in the first two periods have performed with respect to shot percentage and PDO over the last three seasons.

In other words, teams trailing by one goal in the first two periods tend to play more aggressively, which increases their shot differential yet hurts their PDO. Additionally, there appear to be strategic differences between teams with respect to style of play when the score margin is one in the first two periods. That's a topic that I plan to explore in more detail in the upcoming weeks.

The result of all of this is that Fenwick or Corsi percentage with the score tied should be a better measure of a team's true ability to control the play at even strength, given that EV score close shot statistics tend to favor teams that play more from behind and/or play more aggressively, relative to the average team, when leading or trailing by one in the first two periods.

Tuesday, January 4, 2011

East vs West Follow Up

In my last post, I included a detailed breakdown of how the two conferences have matched up against one another so far this season and, on the basis of the data, concluded that there wasn't much to choose between them, the West's superior record notwithstanding.

My conclusion was implicitly premised on the assumption that there's no significant skill difference between the two conferences with respect to shooting or save percentage. The rationale behind that assumption was that, while there are talent differences between teams in terms of the percentages, those differences should cancel out when comparing large groups of teams. Without taking a further look at the data, however, it's impossible to determine whether or not the assumption relied on is true.

To test that assumption, I think it's helpful to look at data from interconference games over the last six seasons. I'd delve back further in time, but 2003-04 is the oldest season for which I have advanced statistics at the team level. It's well established that the West dominated the East over this timeframe, as evidenced by the table below. The West had the better record in each of the six seasons examined, with an aggregate winning percentage of 0.54. That's only slightly better than what one would expect on the basis of its goal ratio (its expected winning percentage was 0.536).

[The % column indicates East wins/goals/shots/powerplay opportunities as a percentage of overall wins/goals/shots/powerplay opportunities. For the post-lockout seasons, games that went to a shootout are considered ties. Empty net goals have been removed from the data.]

In order to determine the nature of the West's dominance, however, it becomes necessary to take a more granular look at the data. This can be achieved through looking at how each conference has done in terms of shots and the percentages, through breaking down the data by game situation (even strength and special teams), and through looking at data from when the score was tied in order to identify and/or control for playing to the score effects.

[I elected not to include data on shorthanded shots and goals given that shorthanded scoring is neither an important nor sustainable component of team success.]

The first thing one might notice is that the West did better than the East across the board - both at even strength and on special teams, and both in terms of shots and the percentages. It was also a fair bit better at drawing penalties, which is noteworthy given that NHL referees tend to favor the trailing team -- Eastern teams presumably would have spent more time playing from behind than Western teams during the games sampled.

However, while the West has technically outperformed the East in respect of shooting percentage, the difference is marginal. Indeed, there's effectively no difference at all at even strength, and while the difference in terms of PP SH% is larger, it's not necessarily reflective of an underlying talent advantage. For example, if the two conferences had the same "true" PP SH%, one would expect one conference to have an advantage of at least 0.004 approximately 50% of the time.

Similarly, the fact that one conference tended to outperform the other in terms of EV SH% in specific seasons should not be construed as meaningful. Random variation alone would be expected to produce an average difference of 0.00468 when comparing the shooting percentage of one conference to the other. The actual value? 0.00467.
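Both chance benchmarks above (how often a gap of a given size arises by luck, and how large the average gap should be) can be approximated with a normal model for the difference between two binomial proportions. The sketch below shows the idea. The shot totals and percentages in the example are hypothetical rather than the actual interconference figures, and the function names are mine.

```python
from math import erfc, pi, sqrt

def gap_sd(true_pct, shots_a, shots_b):
    """Standard deviation of the difference between two observed percentages
    when both groups share the same true percentage."""
    return sqrt(true_pct * (1 - true_pct) * (1 / shots_a + 1 / shots_b))

def prob_gap_at_least(gap, true_pct, shots_a, shots_b):
    """Chance that either side leads by at least `gap` (normal approximation)."""
    z = gap / gap_sd(true_pct, shots_a, shots_b)
    return erfc(z / sqrt(2))  # two-sided tail probability

def expected_abs_gap(true_pct, shots_a, shots_b):
    """Average absolute difference expected from chance alone."""
    return gap_sd(true_pct, shots_a, shots_b) * sqrt(2 / pi)

# Hypothetical inputs, for illustration only.
print(round(prob_gap_at_least(0.004, 0.13, 6000, 6000), 2))
print(round(expected_abs_gap(0.08, 4500, 4500), 5))
```

The expected absolute gap relies on the fact that a normal variable with mean zero has a mean absolute value of sd * sqrt(2 / pi). If the observed average gap is close to that figure, as it is here, there's little left over for talent to explain.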

The success of the Western teams was driven primarily by their outshooting advantage, and this was true both at even strength and on the powerplay. The fact that the data for this season shows no such advantage for Western teams suggests to me that it may no longer be the better conference. While it's true that the West has done better than the East in relation to the percentages, it's difficult to interpret that as a difference in underlying skill for the reasons outlined above.