Monday, May 18, 2009

PP S%: Correction

Well, I feel like a bit of an idiot.

In my post examining powerplay shooting percentage, I concluded that a sizable component of the team-to-team variance in powerplay shooting percentage was non-random.

For the 2008-09 season, the standard deviation in team powerplay shooting percentage was about 0.018. I determined that the predicted standard deviation (that is, the standard deviation one would expect if powerplay shooting percentage were entirely random) was roughly 0.011. Had that figure been correct, only about 1/3 of the variance in powerplay shooting percentage could have been accounted for by randomness.
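The "share due to randomness" is the ratio of the variance expected from chance alone to the observed variance. A minimal sketch, assuming the chance and non-chance components are independent so that their variances simply add; the function name is hypothetical and the inputs are the rounded figures quoted above:

```python
def random_share(sd_random: float, sd_observed: float) -> float:
    """Share of observed team-to-team variance attributable to chance alone.

    Assumes the chance and non-chance components are independent, so that
    var_observed = var_random + var_non_random.
    """
    if sd_observed <= 0:
        raise ValueError("observed standard deviation must be positive")
    return (sd_random / sd_observed) ** 2


# The original (flawed) 2008-09 estimate: predicted SD 0.011, observed SD 0.018.
share = random_share(0.011, 0.018)
print(f"{share:.2f}")  # 0.37
```

Because the standard deviations are squared, a modest error in the predicted SD becomes a much larger error in the share attributed to randomness.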

However, I had erred in calculating that figure. The mistake is an embarrassing one: instead of using each team's shot totals, I used each team's PP S% multiplied by 1000 (the shooting percentages at behindthenet are expressed in this manner). Those familiar with Excel are aware that an incorrectly typed formula can sometimes have profound consequences.

As a result, the shot total I used for each team was roughly double its actual shot total, and the predicted standard deviation came out much lower than it should have.
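Why doubling the shot totals shrinks the prediction: under pure chance, a team's shooting percentage behaves like a binomial proportion with standard deviation sqrt(p(1 - p) / n), so doubling n divides it by sqrt(2). A quick check, using the league-average 5-on-4 rate of about 0.128 and a hypothetical 400 shots per team (an illustrative value, not the real data):

```python
import math


def binomial_sd(p: float, n: int) -> float:
    """SD of a shooting percentage from n shots at a true scoring rate p."""
    return math.sqrt(p * (1 - p) / n)


P_LEAGUE = 0.128  # league-average 5-on-4 S%, 2008-09
N_SHOTS = 400     # hypothetical per-team shot total, for illustration only

correct = binomial_sd(P_LEAGUE, N_SHOTS)
inflated = binomial_sd(P_LEAGUE, 2 * N_SHOTS)  # shot totals roughly doubled
print(f"{correct:.4f} {inflated:.4f} ratio={inflated / correct:.3f}")
```

The ratio is 1/sqrt(2) ≈ 0.707 whatever n is, which is roughly the gap between 0.016 and 0.011. Real teams take different numbers of shots, so this single-n calculation is only an approximation to the simulation.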

After running another set of simulations (correctly this time, I hope), I get a predicted standard deviation somewhere in the area of 0.016.

This means that about two-thirds of the variance in team powerplay shooting percentage can be explained through randomness, which is considerably different from my original (flawed) estimate.

However, as identified by Vic Ferrari in the comments section, there was another reason why my study was somewhat flawed.

Vic correctly pointed out that the standard deviation in team powerplay shooting percentage will vary around the 'true' mean standard deviation from year to year. This means that using the standard deviation from any single season as an approximation for the true standard deviation is problematic.
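Vic's point can be seen in a small simulation: even if every team had exactly the same true shooting percentage, the observed team-to-team standard deviation would itself bounce around from season to season. A sketch, assuming 30 teams, a hypothetical 400 shots apiece and a true rate of 0.128 for every team (all illustrative values, not the real data):

```python
import random
import statistics


def season_sd(rng: random.Random, n_teams: int, shots: int, p: float) -> float:
    """Simulate one season of pure chance; return the SD of team shooting percentages."""
    pcts = []
    for _ in range(n_teams):
        goals = sum(1 for _ in range(shots) if rng.random() < p)
        pcts.append(goals / shots)
    return statistics.pstdev(pcts)


rng = random.Random(2009)  # fixed seed so the run is reproducible
sds = [season_sd(rng, n_teams=30, shots=400, p=0.128) for _ in range(300)]

med = statistics.median(sds)
cuts = statistics.quantiles(sds, n=20)  # 5th, 10th, ..., 95th percentiles
low, high = cuts[0], cuts[-1]
print(f"median SD: {med:.4f}")
print(f"middle 90% of seasons: {low:.4f} to {high:.4f}")
```

The width of that middle-90% band is the season-to-season noise in the SD itself, which is why a single season's figure is a shaky stand-in for the true value.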

I made the mistake of representing my findings as more or less definitive when that wasn't necessarily the case.

This methodological problem also applies to my earlier study that dealt with even strength shooting percentage.

Obviously, this somewhat limits the extent to which my findings are generalizable.

What I've done, then, is to apply this methodology to other seasons to see if the contribution of randomness to the overall variance is similar. I also did this for even strength shooting percentage (5-on-5 shooting percentage, strictly speaking, as the numbers are from behindthenet), as well as even strength shooting percentage when the score is tied.

The results:


Some observations:

1. It appears that all of the team-to-team variation in EV shooting percentage when the score is tied is due to randomness.

I don't think I've erred this time: there was essentially no year-to-year correlation between 07-08 and 08-09 in team EV shooting percentage with the score tied (r = 0.004).
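The year-to-year check is a Pearson correlation between each team's value in one season and its value in the next; an r near zero means one season's figure says essentially nothing about the next. A minimal sketch with made-up numbers for four hypothetical teams (not the actual behindthenet data):

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical tied-score EV S% for the same teams in consecutive seasons.
s_0708 = [0.081, 0.095, 0.072, 0.088]
s_0809 = [0.090, 0.079, 0.085, 0.083]
print(f"r = {pearson_r(s_0708, s_0809):.3f}")
```

With 30 real teams in place of the toy lists, a result like r = 0.004 is what a purely random quantity would be expected to produce.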

2. By contrast, the contribution of randomness to EV shooting percentage in general is much lower, and appears to be somewhere around 50%.

Since the tied-score numbers show no non-random component at all, this implicates the 'playing to the score' effect as one of the non-random causes of team-to-team variation in EV S%.

3. Contrary to what my last post on the subject would have one believe, team powerplay shooting percentage actually appears to be more random in its distribution than EV shooting percentage, not less.

This seems counterintuitive to me, and I'm not sure how much confidence can be placed in these findings. Perhaps there is some other flaw in my methodology that I've overlooked.

4 comments:

Ryan said...

You're a little vague about your methodology, but it sounds like you took number of power play shots as a given. Which isn't reality--power play shots are a random variable too. So your estimate of .016 for the standard deviation due to pure chance is actually a lower bound for the standard deviation due to pure chance. That means your estimate of how much of PP shooting percentage is random is, once again, low-balling it.

Ryan said...

Sorry, I was wrong. I looked at it a little more closely today, and I realised I was implicitly making the assumption that shots and goals are uncorrelated, which is obviously not true. Somewhere in the neighbourhood of 0.016 is, as far as I can tell, a good estimate for the pure chance shooting %age standard deviation, even with shots being randomly distributed.

JLikens said...

Here's a brief summary of the methodology involved, as described in my original post on the matter:

"I looked at how many shots each team took at 5-on-4 during the 2008-09 regular season. The values can be viewed at behindthenet. I then figured out the average 5-on-4 shooting percentage in the league (~0.128) and simulated 100 'seasons'. In each 'season', the number of shots taken by each team was the number of 5-on-4 shots taken by that team during the 2008-09 season. However, the probability of scoring a goal on each shot was 0.128 for every team, the league-average 5-on-4 shooting percentage. That is, each team was assigned the exact same shooting percentage. This is significant because, in any particular 'season', any deviation from the mean is then strictly due to randomness, which allows one to determine how the spread in 5-on-4 shooting percentage should look under the impact of randomness alone."
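The procedure described above can be sketched as follows. This is an illustrative reconstruction rather than the original spreadsheet: the shot totals are hypothetical placeholders (the real 2008-09 values are at behindthenet), and averaging the per-season standard deviations is an assumption about how the 100 simulated seasons were summarized.

```python
import random
import statistics

LEAGUE_PCT = 0.128  # league-average 5-on-4 S%, 2008-09
N_SEASONS = 100     # number of simulated 'seasons', as in the description

# Hypothetical 5-on-4 shot totals, one per team (placeholders, not real data).
team_shots = [380, 412, 455, 398, 430, 367, 441, 405, 389, 420,
              376, 449, 402, 415, 393, 428, 384, 437, 409, 396,
              371, 446, 418, 391, 425, 388, 433, 400, 411, 379]


def simulate_season(rng: random.Random) -> float:
    """One 'season': every team shoots at the league average; return the SD of team S%."""
    pcts = []
    for shots in team_shots:
        goals = sum(1 for _ in range(shots) if rng.random() < LEAGUE_PCT)
        pcts.append(goals / shots)
    return statistics.pstdev(pcts)


rng = random.Random(42)  # fixed seed so the run is reproducible
season_sds = [simulate_season(rng) for _ in range(N_SEASONS)]
predicted_sd = statistics.mean(season_sds)
print(f"predicted SD from chance alone: {predicted_sd:.4f}")
```

Putting inflated numbers into `team_shots` (for example 1000 × S% instead of the shot totals) would shrink `predicted_sd`, which is the error described at the top of the post.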

Kent W. said...

Ahh...special teams. The enigma of pro hockey. Vic has been saying for a while that the PP and PK seem to be total mysteries when it comes to trying to predict anything.