Tuesday, March 9, 2010

Shot Recording Bias: Part n

This is my third post on the subject. For a more detailed discussion of the methodology and reasoning, please refer to my first two posts ([1] [2]).

After looking over my previous post on the subject, the one examining Florida and New Jersey specifically, I realized that I'd made an error in entering the data for the 2007-08 and 2008-09 seasons. Here are the corrected charts. You may need to enlarge them to view the information properly.

New Jersey

In my original post, I concluded that the shot recorder in New Jersey undercounts shots on goal. The corrected data only reinforces that conclusion.

The only major difference is that the chart in my original post incorrectly showed more shots recorded in New Jersey home games than in New Jersey road games in both 2007-08 and 2008-09. This led me to suspect that the bias might no longer persist, even though the shooting percentage in New Jersey home games was higher than in New Jersey road games during those two seasons.

However, as the corrected chart shows, there were actually fewer shots recorded in Devils home games than in Devils road games in both 2007-08 and 2008-09. This is consistent with the data from previous seasons, with the corresponding shooting percentage data for 2007-08 and 2008-09, and with the undercounting hypothesis.

Florida

Unlike in New Jersey, the corrected data for Florida does change my conclusions somewhat. While the home-road shot gap for 2007-08 and 2008-09 is similar in magnitude to that observed in the previous three seasons, the shooting percentage data for those two seasons suggests that an overcounting bias may have emerged. Still, I'm reluctant to assert that a bias exists on the basis of only two seasons' worth of data, especially since the shot gap has not increased materially.

Other Arena Recording Biases

Given that we're on the subject, I figured I'd take this opportunity to explore the issue of shot recording bias more generally.

The above tables show each team's home-road splits for shots on goal and shooting percentage from 2003-04 to 2008-09. The first table shows the 15 teams with the greatest number of recorded shots on goal in home games relative to road games, ranked in descending order. The second table shows the reverse: the 15 teams with the fewest recorded shots on goal in home games relative to road games.

The first highlighted column in each table displays the number of shots recorded in home games minus the number of shots recorded in road games.

The second highlighted column displays road game shooting percentage minus home game shooting percentage.

Where both columns show a significant positive value, an overrecording bias is implied: more shots are logged at home, and the extra recorded shots drag the home shooting percentage down.

Conversely, where both values are significantly negative, an underrecording bias is implied.
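To make the sign logic of the two highlighted columns concrete, here is a minimal Python sketch that classifies a single team from its home and road totals. The function name `classify_recording_bias` and the zero default thresholds are hypothetical stand-ins: the post judges "significant" by eye rather than by a fixed cutoff, and the totals in the usage lines are made up.

```python
def shooting_pct(goals: int, shots: int) -> float:
    """Goals per recorded shot on goal (both teams combined)."""
    return goals / shots


def classify_recording_bias(home_shots, home_goals, road_shots, road_goals,
                            shot_threshold=0, pct_threshold=0.0):
    """Hypothetical helper mirroring the two highlighted columns.

    shot_gap: home shots minus road shots (first highlighted column).
    pct_gap:  road shooting % minus home shooting % (second highlighted column).
    The thresholds are illustrative stand-ins for "significant".
    """
    shot_gap = home_shots - road_shots
    pct_gap = shooting_pct(road_goals, road_shots) - shooting_pct(home_goals, home_shots)
    if shot_gap > shot_threshold and pct_gap > pct_threshold:
        return "overrecording implied"
    if shot_gap < -shot_threshold and pct_gap < -pct_threshold:
        return "underrecording implied"
    return "no clear signal"


# Made-up totals, not figures from the tables.
print(classify_recording_bias(13000, 1050, 12000, 1100))  # overrecording implied
print(classify_recording_bias(11000, 1000, 12000, 1000))  # underrecording implied
```

A team whose columns point in opposite directions lands in "no clear signal", which matches the post's requirement that both values agree before a bias is implied.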

Looking at the two tables together, it would appear that the shot recorders in Colorado, Ottawa, Nashville and Boston overcount shots to some degree, whereas the recorders in Minnesota, Dallas, St. Louis and Vancouver are seemingly guilty of undercounting.

Of course, a more rigorous analysis is required before any conclusions can be reached.

The above tables break down the home-road shot and shooting percentage splits by game state and season for the eight listed teams. I'm not sure these tables add much to the aggregated data presented earlier, but I think they're worth including for two reasons.

First, the home games of some teams may have featured more special teams play over the period in question, by chance alone. Because both shot rate and shooting percentage are significantly higher on special teams than at even strength, this can distort the overall data.
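A small worked example of why game-state mix matters: the overall shooting percentage is a shot-weighted average of the per-state percentages, so two arenas with identical even-strength and special-teams rates can still post different overall numbers if one simply sees more special-teams shots. The rates and shot counts below are hypothetical, chosen only to show the effect.

```python
def overall_shooting_pct(states):
    """Shot-weighted overall shooting % from {state: (shots, shooting_pct)}."""
    total_shots = sum(shots for shots, _ in states.values())
    total_goals = sum(shots * pct for shots, pct in states.values())
    return total_goals / total_shots


# Hypothetical per-state rates, identical in both arenas (no recording bias).
rates = {"EV": 0.080, "ST": 0.130}  # EV = even strength, ST = special teams

# Arena B simply sees more special-teams shots than arena A.
arena_a = {"EV": (9000, rates["EV"]), "ST": (2000, rates["ST"])}
arena_b = {"EV": (9000, rates["EV"]), "ST": (3000, rates["ST"])}

print(f"arena A: {overall_shooting_pct(arena_a):.4f}")  # arena A: 0.0891
print(f"arena B: {overall_shooting_pct(arena_b):.4f}")  # arena B: 0.0925
```

Arena B records 1,000 more shots and a higher overall shooting percentage even though nothing is mis-recorded, which is why it helps to look at EV and special teams separately.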

Second, it's important to break down the data by season in order to see whether any of the apparent recording biases are time-limited, that is, present in some seasons but not others. For example, some teams may have employed more than one arena statistician at different points over the last seven years.

As for the tables themselves, one thing that strikes me as unusual is that the Wild's home-road shooting percentage gap is quite large on special teams yet virtually non-existent at even strength (indeed, not even in the predicted direction). I can't think of any reason why this would be so, but it leads me to suspect that there may be no bias in Minnesota. The home-road shot gap is large, but that could be a product of the Wild playing more conservatively at home.

Looking at the data collectively, there's overwhelming evidence of a recording bias in Dallas and Colorado, strong evidence of one in Vancouver and Ottawa, and moderate evidence of bias in the other four locations.

The above table requires some explanation. It shows the 95% and 99% confidence intervals for each team's home shooting percentage (that is, the combined shooting percentage of both teams in that team's home games) over the period in question (2003-04 to 2008-09). The intervals were generated by assuming that the underlying shooting percentage in each team's home games was the same as in its road games, and that shots were recorded accurately regardless of game location.

The final column shows the actual shooting percentage in each team's home games over that timeframe. Values colored light blue fall outside the 95% confidence interval. Highlighted values fall outside both confidence intervals. Values in white are within both confidence intervals.

A specific example will be illustrative. The EV shooting percentage in Stars road games from 2003-04 to 2008-09 was 0.081. Using that value as the underlying home shooting percentage, the EV shooting percentage in Dallas home games would be expected to fall between 0.075 and 0.087 95% of the time, and between 0.073 and 0.089 99% of the time. The observed value was 0.089, which strongly implies that shots were undercounted in Dallas during this period.
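The Dallas intervals can be reproduced, at least approximately, with a normal approximation to the binomial distribution. This is a sketch, not necessarily the method used to build the table. The post does not give the number of EV shots in Dallas home games, so `n_shots = 8000` below is an assumed figure (it happens to yield intervals matching the quoted ones), and the hypothetical `classify` function mimics the color coding described above.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile
Z99 = 2.575829  # two-sided 99% standard normal quantile


def interval(p, n, z):
    """Normal-approximation interval for a shooting percentage p on n shots."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def classify(observed, p, n):
    """Mirror the table's color coding: white, light blue, or highlighted."""
    lo95, hi95 = interval(p, n, Z95)
    lo99, hi99 = interval(p, n, Z99)
    if not (lo99 <= observed <= hi99):
        return "outside both intervals (highlighted)"
    if not (lo95 <= observed <= hi95):
        return "outside 95% interval (light blue)"
    return "within both intervals (white)"


road_pct = 0.081  # EV shooting % in Dallas road games (from the post)
n_shots = 8000    # ASSUMED EV shot count for Dallas home games (not in the post)
observed = 0.089  # EV shooting % in Dallas home games (from the post)

for label, z in (("95%", Z95), ("99%", Z99)):
    low, high = interval(road_pct, n_shots, z)
    print(f"{label}: {low:.3f} to {high:.3f}")
# 95%: 0.075 to 0.087
# 99%: 0.073 to 0.089
print(classify(observed, road_pct, n_shots))
```

With these assumed inputs, 0.089 lands just above the unrounded 99% bound. Because the published figures are rounded to three decimals, the real classification depends on unrounded values the post does not give.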

Of course, the assumption that each team's observed road shooting percentage equals its underlying road shooting percentage is somewhat questionable, since the observed figure is itself subject to random variation. For example, if the underlying shooting percentage in a team's road games is 0.092, the 95% confidence interval after 12,000 shots (the average number of shots in road games for each team over the five-season period) runs roughly from 0.087 to 0.098.
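One way to see why the road figure's own noise matters: if home and road shots are independent samples with the same underlying rate, the standard error of the home-road difference combines both sample sizes, so it is wider than the home-only standard error used for the intervals above. The sketch below uses the 0.092 rate and the 12,000-shot road average from the paragraph above; the 12,000 home shots are an assumption for illustration.

```python
import math


def se_single(p, n):
    """Standard error of a shooting % on n shots, treating p as known exactly."""
    return math.sqrt(p * (1 - p) / n)


def se_difference(p, n_home, n_road):
    """Standard error of home minus road shooting %, both estimated from data."""
    return math.sqrt(p * (1 - p) * (1 / n_home + 1 / n_road))


p = 0.092        # underlying rate from the example above
n_road = 12000   # average road shots per team (from the post)
n_home = 12000   # ASSUMED for illustration; the post gives only the road average

print(f"home-only SE:  {se_single(p, n_home):.4f}")              # home-only SE:  0.0026
print(f"difference SE: {se_difference(p, n_home, n_road):.4f}")  # difference SE: 0.0037
```

With equal shot counts, the standard error of the difference is about 1.41 times larger, which is one reason to test the home-road difference directly rather than treat the road figure as exact.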

That being the case, the above table takes a slightly different approach. The left-hand column, titled 'DIFF', shows the absolute difference between home and road shooting percentage for each team over the entire sample, both at EV and overall. The right-hand column, titled 'PROB', displays the probability of a difference that large or larger occurring by chance alone (estimated from 100 simulations).

For example, the difference between the EV shooting percentage in Dallas road games and Dallas home games was 0.008. The probability of a difference at least that large arising from chance alone is 5%. In other words, the gap is probably not the result of chance but rather of shots having been undercounted in Dallas over that period.
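The post does not spell out how the 100 simulations were run, so the following is only one plausible version, not necessarily the method behind the table: pool home and road shots to get a common shooting percentage, simulate home and road samples of the observed sizes from that rate, and count how often the simulated absolute difference is at least as large as the observed one. The function name and the shot and goal counts in the usage line are hypothetical.

```python
import random


def simulated_gap_probability(home_goals, home_shots, road_goals, road_shots,
                              n_sims=100, seed=None):
    """Estimate P(|home% - road%| >= observed) if both share one true rate.

    Hypothetical sketch: the pooled rate stands in for the common underlying
    shooting percentage, and each shot is treated as an independent trial.
    """
    rng = random.Random(seed)
    observed = abs(home_goals / home_shots - road_goals / road_shots)
    pooled = (home_goals + road_goals) / (home_shots + road_shots)

    def simulate_pct(shots):
        # Each simulated shot becomes a goal with probability `pooled`.
        goals = sum(1 for _ in range(shots) if rng.random() < pooled)
        return goals / shots

    hits = 0
    for _ in range(n_sims):
        gap = abs(simulate_pct(home_shots) - simulate_pct(road_shots))
        if gap >= observed:
            hits += 1
    return hits / n_sims


# Hypothetical counts giving 0.089 at home and 0.081 on the road.
print(simulated_gap_probability(712, 8000, 648, 8000, seed=1))
```

With only 100 simulations the estimate moves in 1% steps and varies with the seed, so a figure like 5% should be read as approximate.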


So, what can we conclude from all that?
  • The shot recorder in New Jersey undercounts (this was addressed in a previous post)
  • The shot recorder in Dallas undercounts
  • The shot recorder in Colorado overcounts
  • The shot recorder in Vancouver almost certainly undercounts
  • The shot recorder in Ottawa probably overcounts
  • The shot recorders in Boston and Nashville may overcount, but the evidence is not conclusive
  • The shot recorders in St. Louis and Minnesota may undercount, but the evidence is not conclusive