Tuesday, November 30, 2010

Adjusted Corsi Update and Goalposts

I've been posting adjusted team Corsi data on a more or less bi-weekly basis over the course of the 10-11 season. As a continuation of that trend, here are the updated numbers. For those that may not be familiar, the following adjustments have been made:
  • only even strength events with the score close (when the score margin is one or zero in the first two periods or tied in the third period and overtime) are included
  • a strength of schedule correction is made with reference to both game location and oppositional strength
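
For anyone who wants to apply the same filter to their own play-by-play data, the score-close rule can be sketched in Python as follows. This is only an illustration: the function name and the convention of numbering overtime as period 4 are assumptions, not part of the script behind these numbers.

```python
def is_score_close(period: int, margin: int) -> bool:
    """Return True when an even strength event counts as 'score close'.

    period: 1-3 for regulation, 4 or higher for overtime (assumed convention).
    margin: goal difference at the time of the event; the sign is ignored.
    """
    margin = abs(margin)
    if period <= 2:
        # First two periods: tied or a one-goal game.
        return margin <= 1
    # Third period and overtime: tied only.
    return margin == 0
```

A one-goal game in the second period passes the filter; the same margin in the third period does not.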

[Column abbreviations, from left to right: Corsi For, Corsi Against, Corsi percentage, Corsi percentage rank, Schedule difficulty rank, Adjusted Corsi For, Adjusted Corsi Against, Adjusted Corsi percentage, Adjusted Corsi percentage rank]

I received an email in relation to my last post on the subject regarding the manner in which schedule difficulty is corrected for. I figured that I should post a summary of the methodology involved in case any other readers were curious.

The method is essentially an iterative process. In the first iteration, oppositional strength is calculated with reference to each team's raw Corsi (both for and against) with the score close, and each team's Corsi numbers are adjusted on this basis. The second iteration is identical to the first, except that oppositional strength is calculated with reference to each team's adjusted Corsi, as calculated in the first iteration. Each subsequent iteration proceeds on this basis (i.e. calculating schedule difficulty through the adjusted Corsi data obtained in the previous iteration).

It turns out that ten iterations are sufficient -- the average change in Corsi percentage from the 9th to the 10th iteration is 0.00000043.

Within each iteration, several other adjustments are made that are worth mentioning. Firstly, an adjustment is made in terms of the location of each individual game and whether either team is playing a back to back. Secondly, each individual game is weighted with respect to the total number of Corsi events with the score close. Finally, a correction is made to account for the extent to which each team has contributed to the Corsi percentage of its opponents (this is necessary so that the adjustment does not favour weak teams).
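
The description above does not give formulas, so what follows is only a rough sketch of what an iterative opponent adjustment of this general kind could look like, not the actual method. It assumes a simple update rule (raw Corsi percentage plus the amount by which the event-weighted opponents' current adjusted percentage exceeds 0.5), adds a damping factor so the iteration settles, and ignores game location, back to backs and the correction for each team's own contribution to its opponents' numbers. The function name, the tuple layout and the damping value are all hypothetical.

```python
from collections import defaultdict


def adjust_corsi(games, iterations=10, damping=0.5):
    """Iteratively adjust Corsi percentage for opponent strength (sketch).

    games: iterable of (home, away, home_corsi, away_corsi), score-close events only.
    Returns (raw, adjusted), two dicts mapping team -> Corsi percentage.
    """
    corsi_for = defaultdict(float)
    corsi_against = defaultdict(float)
    # shared[a][b] = total Corsi events in games between a and b (used as weights)
    shared = defaultdict(lambda: defaultdict(float))
    for home, away, home_corsi, away_corsi in games:
        events = home_corsi + away_corsi
        corsi_for[home] += home_corsi
        corsi_against[home] += away_corsi
        corsi_for[away] += away_corsi
        corsi_against[away] += home_corsi
        shared[home][away] += events
        shared[away][home] += events

    raw = {t: corsi_for[t] / (corsi_for[t] + corsi_against[t]) for t in corsi_for}
    adjusted = dict(raw)  # the first pass uses raw Corsi as opponent strength
    for _ in range(iterations):
        updated = {}
        for team in raw:
            total = sum(shared[team].values())
            # Event-weighted average of the opponents' current adjusted percentage
            opp = sum(w * adjusted[o] for o, w in shared[team].items()) / total
            target = raw[team] + (opp - 0.5)  # assumed update rule
            updated[team] = damping * adjusted[team] + (1 - damping) * target
        adjusted = updated
    return raw, adjusted
```

In a made-up three-team league where A outshoots B and C 60-40 and B and C split 50-50, A's raw 0.600 comes down to roughly 0.567 because its opponents are below average, while B and C move up from 0.450 to roughly 0.467.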

In addition to that, I thought it might be worthwhile to throw up some data on the number of posts that each team has hit thus far, as well as how many times their opponents have struck iron. Vic Ferrari made two posts on this very subject during the 2007-08 season, which I found to be quite interesting then. I'm not sure if the information I'm about to present is all that valuable -- the underlying numbers provide a good indication of which teams have been fortunate or unfortunate, so there's not really any need to rely on goalposts as a proxy for luck. With that said, here's the data.

Based on the above, I think it's fair to characterize the Lightning, the Predators and the Flames as teams that have been unlucky. I'd also include the Wild within that group, given how poorly they've fared on the shot clock so far. On the other end of the spectrum lie the Hurricanes, as well as the Red Wings and the Sharks (the former because of their favorable differential, the latter two because of their favorable differential relative to their shot statistics).

For what it's worth, I also ran the numbers for last year. There's been some debate within the hockey blogging world as of late as to whether it's appropriate to regard Colorado's 09-10 season as lucky. I figure that goalpost data is probably relevant to this issue to some extent, especially among those who reject the validity of Corsi and other shot statistics. Assuming that I haven't made any errors in obtaining the data, the Avalanche hit 27 posts last season, with their opponents hitting 41. Take that for what it's worth.

Tuesday, November 23, 2010

In Defence of Outshooting

David Johnson recently put up a post at his blog that examined the relationship between shot volume, shooting percentage and goals scored at even strength. Specifically, he determined each team's number of Fenwick shots (shots + missed shots) as well as their "Capitalization Ability" (goals scored/Fenwick shots) over the last three regular seasons and how each variable correlated with goals scored over that same period. He then repeated the exercise with respect to Fenwick shots against, save percentage (goals against/ Fenwick shots against) and goal prevention. After presenting his findings, the following conclusion was drawn:
"The conclusion we can draw from these four charts is when it comes to scoring goals, having the ability to capitalize on opportunities (shots) is far more important than having the ability to generate opportunities (getting shots). Controlling the play and generating shots does not mean you’ll score goals (just ask any Maple Leaf fan), having the talent to capitalize on those opportunities is what matters most. From my perspective, this means the usefulness of ‘Corsi Analysis’ to be minimal, at least for the purpose of evaluating players and teams."
At first glance, Johnson's findings and conclusion seem sound enough. For example, if we determine each team's Fenwick differential and Fenwick PDO (Fenwick SH% + Fenwick SV%) over the last three seasons, and look at the correlation of each with each team's even strength goal differential over that same timeframe, the following values are obtained.*

*empty netters were removed from the sample

The results appear to support Johnson's conclusion. The average correlation between outshooting ability (as measured by Fenwick differential) and goal differential is weaker than the average correlation between [shooting + save percentage] and goal differential. As Johnson might put it, the ability to capitalize on opportunities and prevent the opposition from doing likewise is more important than the ability to generate opportunities.
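
For reference, the two team-level quantities compared above can be written out as a short Python sketch. The function and argument names are hypothetical, and the numbers in the usage note are invented for illustration rather than taken from any team.

```python
def fenwick_differential(fenwick_for, fenwick_against):
    """Fenwick shots (shots + missed shots) for, minus Fenwick shots against."""
    return fenwick_for - fenwick_against


def fenwick_pdo(goals_for, fenwick_for, goals_against, fenwick_against):
    """Fenwick SH% + Fenwick SV%, with both expressed as fractions."""
    shooting_pct = goals_for / fenwick_for
    save_pct = 1 - goals_against / fenwick_against
    return shooting_pct + save_pct
```

A team with 3000 Fenwick shots for and 2800 against has a differential of +200. If it scores on 5.5% of its shots and concedes on 5.5% of its opponents' shots, its Fenwick PDO is 1.0, which is by construction the league-wide average.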

However, Johnson's analysis suffers in that he fails to consider the impact of random variation upon the correlations that he adduces as evidence to support his position. For example, suppose that outshooting was the sole determinant of even strength goal differential, with the percentages merely reflecting the favour (or disfavour) of the hockey gods. If a full NHL regular season was played out under such conditions, we would still expect to observe:

a) a less than perfect correlation between outshooting and goal differential
b) a substantial correlation between the percentages and goal differential

This can be illustrated by simulating the last three NHL seasons a sufficiently large number of times and averaging out the results, using the following parameters:
  • the number of shots taken by a team in any given simulation was the number of Fenwick shots taken by that team during the season simulated
  • conversely, the number of shots taken against that team corresponded to the number of Fenwick shots that it conceded during that year
  • each team's probability of scoring a goal on any particular shot was the league average Fenwick shooting percentage in that particular season (~5.5%)
  • similarly, the probability of conceding a goal on any particular shot was the same for all teams, again corresponding to the league average Fenwick shooting percentage during the season in question
  • after each simulation, the correlation between Fenwick differential and even strength goal differential at the team level was determined and recorded, with the same then being done with respect to [Fenwick shooting percentage + Fenwick save percentage] and even strength goal differential
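
The parameters listed above translate into roughly the following Python sketch. It is not the code used for the post: the function names, the default of 1000 simulations and the fixed seed are assumptions, and the shot totals have to be supplied by the reader.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def draw_goals(shots, p, rng):
    """Goals on `shots` independent attempts, each scoring with probability p."""
    return sum(rng.random() < p for _ in range(shots))


def average_correlations(shots_for, shots_against, p=0.055, n_sims=1000, seed=0):
    """Simulate one season n_sims times with identical percentages for all teams.

    shots_for / shots_against: each team's Fenwick totals for the season.
    Returns the average correlation with goal differential of
    (Fenwick differential, Fenwick SH% + Fenwick SV%).
    """
    rng = random.Random(seed)
    shot_diff = [sf - sa for sf, sa in zip(shots_for, shots_against)]
    r_shots_total = r_pct_total = 0.0
    for _ in range(n_sims):
        goal_diff, pdo = [], []
        for sf, sa in zip(shots_for, shots_against):
            gf = draw_goals(sf, p, rng)  # same scoring probability for every team
            ga = draw_goals(sa, p, rng)  # same conceding probability for every team
            goal_diff.append(gf - ga)
            pdo.append(gf / sf + (1 - ga / sa))
        r_shots_total += pearson(shot_diff, goal_diff)
        r_pct_total += pearson(pdo, goal_diff)
    return r_shots_total / n_sims, r_pct_total / n_sims
```

Because every shot has the same chance of going in, any correlation between the percentages and goal differential that comes out of this function is produced by luck alone.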
The results:

So, in the three simulated seasons, the average correlation between Fenwick differential and even strength goal differential was 0.73, whereas the average correlation between [Fenwick shooting percentage + Fenwick save percentage] and even strength goal differential was 0.67. This is significant for two reasons.

Firstly, even in our imaginary world in which the only way for a team to control its goal differential is through generating and limiting shots, the correlation between the percentages and goal differential is effectively as large as the correlation between outshooting and goal differential. This despite the fact that teams have no ability to influence the former.

Secondly, the simulated values (0.73 and 0.67) are comparable to the actual values (0.54 and 0.61), suggesting that the underlying factors that dictate even strength goal differential in the real NHL are not too different from those that prevail in our simulated world. The relationship between the percentages and outscoring is slightly stronger, and the relationship between outshooting and outscoring slightly weaker, but that's to be expected. After all, we know that:

1. There is a skill component to both even strength shooting percentage and even strength save percentage at the team level.
2. Game score (whether a particular team is playing while tied, from behind or while leading) has an effect on both shot differential and the percentages. Over the course of a particular season, the amount of time played in each of these goal states at even strength will vary from team to team.
3. Not all teams adopt the same strategy in relation to playing to the score.

The influence of these last two factors cannot be overstated. For example, if we repeat the above exercise, but use only data from when the score was tied at even strength, the actual results are essentially indistinguishable from the simulated results.

Therefore, the fact that there exists a strong relationship between the percentages and even strength goal differential over the course of a single regular season does not in any way negate the utility of Fenwick, Corsi or even strength shot differential as a measure of a team's ability level. Results at the NHL level are strongly subject to the influence of random variation, even over what might seem like a long period of time (i.e. a single NHL season). Losing sight of this fact - or ignoring it to begin with - can only lead to misguided analysis and flawed conclusions.

Friday, November 19, 2010

Adjusted Corsi w/ Score Close - Part III

In two previous posts, I showed how each team had performed in terms of Corsi percentage with the score close at that particular point in the season, and then adjusted each team's percentage in order to account for variance in strength of schedule.

While my strength of schedule adjustment corrected for game location (i.e. whether, from a particular team's perspective, the game was being played at home or on the road), it did not account for back-to-back games. As it turns out, this was a mistake on my part -- the effect of back-to-back games on Corsi (with the score close) is much larger than the effect of game location.

Table Legend
GP = Games played; The number of games played in the 2010-11 season which meet the criterion in the left hand column
R Corsi = Road Corsi; The number of shots directed at the net at even strength by the road team
H Corsi = Home Corsi; The number of shots directed at the net at even strength by the home team
R Corsi % = Road Corsi Percentage; the Corsi percentage from the road team's vantage

The above table shows how game location and whether or not the game is a back-to-back for either team have interacted to affect Corsi percentage with the score close in games played during the 2010-11 season thus far (up to and including the 268th game).

Some observations:

Firstly, in games in which one team played the night before and the other team did not, the road team carries the burden roughly 75% of the time. I guess the league wants to give the home team every advantage it can without making it seem too obvious.

Secondly, the effect of back-to-back games on Corsi percentage with the score close is considerable -- approximately 0.04. In other words, an average team playing a likewise average team that played the previous night can expect to achieve Corsi results on par with those of the 2009-10 Boston Bruins (a strong, if underrated, team). If the positions are reversed, however, it can expect to perform more along the lines of the 2009-10 Tampa Bay Lightning. That's a sizable difference.

Thirdly, the effect of game location, in and of itself, is pretty small. In games in which both teams are playing on at least one day's rest, the home team only marginally outshoots its counterpart.

With these findings in mind, I decided to modify my method of correcting for scheduling difficulty by accounting for the effect of back-to-back games in addition to game location. Here are the results for all games played as of November 17.

More observations:
  • The Capitals' raw numbers are ok, but they really get killed by the schedule adjustment. I think that they've played the Thrashers three times already
  • Chicago's numbers might come across as underwhelming, given where they were last year, but they've improved in this regard as the season has progressed. I expect that to continue
  • The Oilers are a real outlier. They're 0.095 from the mean when the next furthest team is a mere 0.064. Things don't look good
  • The Wings may not be what they once were in terms of territorial dominance, but they've still managed to best everyone else. It's hard not to have some degree of admiration for that club

Thursday, November 4, 2010

Adjusted Corsi w/ Score Close - Updated

A couple of weeks back, I put together a post showing how each team had performed thus far in terms of Corsi (shots directed toward the net at even strength) with the score close (whenever the score margin is zero or one in the first two periods, or zero in the third period or overtime) after making an adjustment for schedule difficulty (the method of adjustment is described in the original post).

Here are the updated rankings as of game 177 (VAN@COL). I should note that Game 124 between the Capitals and Hurricanes wasn't included due to the lack of a play-by-play feed.

(Table abbreviations: C F = Corsi For; C A = Corsi Against; C % = Corsi Percentage; SD RANK = Schedule Difficulty Rank (lower values indicate a more difficult schedule); ADJ = Adjusted)

Some observations:
  • Anaheim, Atlanta and Edmonton are terrible
  • If Florida can maintain their pace, they might be the league's most improved team in this respect
  • What's going on in Washington?
  • Neither the Devils nor the Sabres are as bad as their records would indicate, but even this assessment of their abilities isn't all that positive

Wednesday, November 3, 2010

Score Effects and Minor Penalties

The playing to the score effect has received a fair amount of coverage from the more statistically inclined members of the hockey blogosphere over the last two years or so (see here, here and here for some good overviews on the subject). In short, playing from behind tends to have a favourable effect on a team’s shot ratio, whereas playing with the lead tends to have the opposite result. The effect increases linearly as a function of goal margin, and is exaggerated in the third period.

The majority of analysis conducted in relation to the subject thus far has focused on the effect of game score on shot ratio. Little, if anything, appears to have been done in the way of determining the extent to which the effect operates in other aspects of the game. It’s conceivable, for example, that game score could have an analogous effect on team penalty percentage (defined as minor penalties drawn / (minor penalties drawn + minor penalties taken)).
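
Written out as a small Python helper, the definition looks like this. The function name is hypothetical, and raising an error when a team has no penalties either way is a choice made for the sketch.

```python
def penalty_percentage(drawn: int, taken: int) -> float:
    """Minor penalties drawn as a share of all minors drawn and taken."""
    total = drawn + taken
    if total == 0:
        raise ValueError("penalty percentage is undefined with no penalties")
    return drawn / total
```

A team that draws 60 minors and takes 40 has a penalty percentage of 0.60.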

In order to answer the above question, I looked at the NHL.com play-by-play data from the 2007-08 and 2008-09 seasons and created a script in Excel to determine how many penalties each team drew and took during that period. However, because I was only interested in penalties that provided one of the teams with a manpower advantage (relative to the situation that existed before the penalty was called), I only counted “unique” penalties. I defined a unique penalty as a penalty not accompanied by the calling of any other penalty at that specific point in time. This obviously fails to account for situations where multiple penalties are called and one team emerges with a powerplay, but such situations are rare enough so as to not affect the data materially.

I then determined the goal state prevailing at the time that the penalty occurred – in other words, whether the team that drew/took the penalty was trailing, leading or tied at the time that the penalty was called. Here are the combined results for the two seasons, broken down at the team level.
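
The original work was done in Excel, so the following Python is only a sketch of the same two steps: keeping "unique" penalties and tallying them by goal state. The record layout (a dict with `game`, `time`, `drawn_by`, `taken_by` and a `score` mapping) is invented for the example and does not reflect the NHL.com feed.

```python
from collections import Counter, defaultdict


def unique_penalties(penalties):
    """Keep penalties that were the only one called at that game and time."""
    counts = Counter((p["game"], p["time"]) for p in penalties)
    return [p for p in penalties if counts[(p["game"], p["time"])] == 1]


def goal_state(own_goals, opp_goals):
    """Classify the score from one team's point of view."""
    if own_goals > opp_goals:
        return "leading"
    if own_goals < opp_goals:
        return "trailing"
    return "tied"


def tally_by_state(penalties):
    """Return {team: {goal state: [drawn, taken]}} for unique penalties only."""
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for p in unique_penalties(penalties):
        drew, took = p["drawn_by"], p["taken_by"]
        score = p["score"]  # goals for each team when the penalty was called
        tally[drew][goal_state(score[drew], score[took])][0] += 1
        tally[took][goal_state(score[took], score[drew])][1] += 1
    return tally
```

Each penalty is counted twice, once as drawn for one team and once as taken for the other, so a penalty drawn by the trailing team is also a penalty taken by the leading team.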

Evidently, the trailing team does significantly better in terms of penalty ratio than does the leading team. During the period in question, every single team did better in terms of penalty percentage when trailing than when leading. In aggregate, the trailing team drew roughly 54.5% of all penalties, making the magnitude of the effect similar to that observed with respect to shot percentage (the trailing team had an aggregate Corsi percentage of 55.2 during the same timeframe).

One final question remains – does the trailing team earn its penalty advantage on merit, or is it a product of referee bias? Given that the trailing team also enjoys an advantage in terms of Corsi, and therefore spends more time in the opponent’s end than its own (an area of the rink in which a disproportionate percentage of penalties are drawn), one might be inclined to favour the former explanation. However, as demonstrated in the table below, there isn’t much of a relationship between Corsi percentage and Penalty percentage when the score margin is other than zero.

In other words, while the trailing team tends to both outshoot and “outdraw” the leading team, teams that outshoot the opposition by a large margin when playing from behind don’t do significantly better with respect to (trailing) penalty ratio than teams that outshoot the opposition to a lesser extent.

As alluded to above, the second possibility is that the penalty advantage accruing to the trailing team is the result of referee bias. If this explanation is correct, then the team-to-team variation in both leading and trailing penalty percentage would be the product of randomness and of differences between teams in the ability to draw more penalties than the opposition. To test this hypothesis, the following experiment can be performed:

  • take each team’s penalty percentage with the score tied over the two seasons in question and regress each value 60% to the mean. The resulting values provide an estimate of each team’s underlying ability to draw more penalties than the opposition. The regression is necessary given that approximately 60% of the team-to-team variation in penalty percentage with the score tied (in the sample in question) can be attributed to luck
  • simulate the two seasons such that every “unique” penalty that occurred when each team was trailing constitutes an individual trial
  • designate the probability of drawing any given penalty as that team’s “true ability” penalty percentage (as determined above) plus 0.045 (0.045 being the magnitude of the referee bias)
  • calculate the average team-to-team spread (standard deviation) in penalty percentage after conducting a sufficiently large number of simulations
  • compare the predicted standard deviation to the actual value
  • repeat the above with respect to all penalties that occurred when each team was leading
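
The experiment can be sketched in Python along the following lines. This is an illustration under stated assumptions rather than the original script: the function names are hypothetical, the spread is computed as a population standard deviation, and for the leading case the bias is assumed to enter with the opposite sign (-0.045), which the steps above do not spell out.

```python
import random
from statistics import mean, pstdev


def regress_to_mean(values, fraction=0.60):
    """Move each value `fraction` of the way toward the group mean."""
    m = mean(values)
    return [v + fraction * (m - v) for v in values]


def predicted_spread(tied_pct, n_penalties, bias=0.045, n_sims=1000, seed=0):
    """Average simulated team-to-team standard deviation in penalty percentage.

    tied_pct: each team's observed penalty percentage with the score tied.
    n_penalties: each team's number of unique penalties (drawn + taken) in the
        goal state being simulated; every one of them is a single trial.
    bias: referee bias added to each team's estimated true ability
        (use a negative value for the leading case; an assumption).
    """
    rng = random.Random(seed)
    true_ability = regress_to_mean(tied_pct)
    spreads = []
    for _ in range(n_sims):
        simulated = []
        for ability, n in zip(true_ability, n_penalties):
            p = ability + bias
            drawn = sum(rng.random() < p for _ in range(n))
            simulated.append(drawn / n)
        spreads.append(pstdev(simulated))
    return mean(spreads)
```

The number returned by `predicted_spread` is what gets compared with the actual standard deviation across teams; a close match means that randomness plus tied-score ability plus a uniform bias is enough to account for the observed spread.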

The results:

This essentially confirms the above hypothesis in that the predicted standard deviations are virtually identical to the actual values. As such, an analogy can be drawn between the trailing team advantage in penalty percentage and home ice advantage. The probability of a team winning any given game is approximately 5% higher on home ice relative to neutral ice, but all teams benefit from the effect equally (that is, the team-to-team variation in home vs. road winning percentage is entirely random). Similarly, the referee bias in favour of the trailing team appears to be even across the board.