Wednesday, January 19, 2011

EV Data for Games 1 - 692

The spreadsheet that should appear below contains detailed EV data at the team level for games 1 to 692. As NHL.com has inexplicably failed to publish play-by-play data for the following games, they were not included:

Game 124 - WSH@CAR
Game 429 - ATL@NYI
Game 491 - PHX@PIT*

*The play-by-play feed for this game was initially available, but is no longer accessible. Consequently, I have EV and EV close data for this game, but no EV tied data.



[If you're having difficulty viewing the document, click here to view the spreadsheet directly at Google Docs.]


A couple of points:

The document contains three worksheets. The first sheet shows even strength data for all situations. The second shows even strength data for when the score was close (i.e. whenever the score margin was 1 or 0 in the first two periods, or tied in the third period or overtime). The last sheet contains data for when the score was tied.
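For anyone rebuilding the three sheets from play-by-play, the split described above can be sketched roughly as follows. This is only an illustrative Python sketch: the function name `score_states` and the convention that period 4 or higher means overtime are my own assumptions, not something taken from the spreadsheet.

```python
def score_states(period: int, margin: int) -> dict:
    """Classify one even-strength event into the three worksheets.

    period: 1, 2, 3, or 4+ for overtime (assumed convention).
    margin: goal difference at the time of the event (sign is ignored).
    """
    gap = abs(margin)
    tied = gap == 0
    if period <= 2:
        # First two periods: tied or a one-goal game counts as close.
        close = gap <= 1
    else:
        # Third period and overtime: only tied counts as close.
        close = tied
    return {"all": True, "close": close, "tied": tied}
```

So an event with a team trailing by one in the second period lands on the first two sheets but not the third, while the same margin in the third period lands only on the first.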

Empty net goals have been removed from the data.

The abbreviations are defined as follows:

GF: goals for
GA: goals against
SF: shots for, where shots = goals + saved shots
SA: shots against
SHOT%: shots for/(shots for + shots against)
SH%: shooting percentage
SV%: save percentage
PDO: shooting percentage + save percentage
FF: fenwick for, where fenwick = shots + missed shots
FA: fenwick against
F%: fenwick for/ (fenwick for + fenwick against)
CF: corsi for, where corsi = shots + missed shots + blocked shots
CA: corsi against
C%: corsi for/ (corsi for + corsi against)
ADJ: refers to the fact that an adjustment for schedule difficulty has been made. See here for the details of the adjustment process.
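
To make the definitions above concrete, here is a small Python sketch that derives the unadjusted columns from raw event counts. The `Counts` class and `ev_summary` function are hypothetical names used for illustration. The list above does not spell out SH% and SV%, so the sketch assumes the conventional definitions (goals for divided by shots for, and one minus goals against divided by shots against). It also assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    goals: int    # empty net goals already removed
    saved: int    # shots on goal that were saved
    missed: int   # shot attempts that missed the net
    blocked: int  # shot attempts that were blocked

    @property
    def shots(self) -> int:
        return self.goals + self.saved

    @property
    def fenwick(self) -> int:
        return self.shots + self.missed

    @property
    def corsi(self) -> int:
        return self.fenwick + self.blocked


def share(for_: int, against: int) -> float:
    """Fraction of events belonging to the team: for / (for + against)."""
    return for_ / (for_ + against)


def ev_summary(team: Counts, opp: Counts) -> dict:
    sh = team.goals / team.shots      # SH% (assumed conventional definition)
    sv = 1 - opp.goals / opp.shots    # SV% (assumed conventional definition)
    return {
        "SHOT%": share(team.shots, opp.shots),
        "SH%": sh,
        "SV%": sv,
        "PDO": sh + sv,
        "F%": share(team.fenwick, opp.fenwick),
        "C%": share(team.corsi, opp.corsi),
    }
```

For example, a made-up team with 10 goals on 100 shots, facing 8 goals on 80 shots, would have SH% of 0.100, SV% of 0.900 and a PDO of 1.000.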

Finally, as mentioned earlier, I've included EV tied data this time around. The reason is that score effects appear to remain very much relevant when the score margin is 1 in the first two periods. For example, consider the table below, which shows how teams trailing by one goal in the first two periods have performed with respect to shot percentage and PDO over the last three seasons.

In other words, teams trailing by one goal in the first two periods tend to play more aggressively, which increases their shot differential yet hurts their PDO. Additionally, there appear to be strategic differences between teams with respect to style of play when the score margin is one in the first two periods. That's a topic that I plan to explore in more detail in the upcoming weeks.

The upshot is that Fenwick or Corsi percentage with the score tied should be a better measure of a team's true ability to control the play at even strength. EV score close shot statistics tend to favor teams that play more from behind and/or play more aggressively, relative to the average team, when leading or trailing by one in the first two periods.

9 comments:

  1. This is great stuff.

    It looks like the play-by-play for game 20491 is back up, by the way. Or, at least, I'm not having trouble seeing it. The game was only tied for the first 6:48. Shots were 5-2, Fenwick 6-5, and Corsi 7-5, all in favour of PIT.

  2. Weird. I'm still getting a blank screen. And I have no problem viewing 490 or 492. Not sure what the issue is.

    Thanks for the numbers, though. I'll make sure to update the data accordingly.

  3. Great work again. However, I can't see the full data...on the right (COR%?) it's prob. my computer..any tips??
    thanks

  4. Anonymous:

    I might have to play around with the table width settings.

    Are you able to use the scroll bar at the bottom? If so, how far to the right are you able to scroll?

  5. only able to scroll to barely CF.

  6. I've included a link below the data that links to the spreadsheet at Google Docs.

    Hope that helps.

  7. How do you calculate the adjustments for Corsi?

  8. I described the adjustment process in this post.

    I've excerpted the relevant parts below.

    "The method is essentially an iterative process. In the first iteration, oppositional strength is calculated with reference to each team's raw Corsi (both for and against) with the score close, and each team's Corsi numbers are adjusted on this basis. The second iteration is identical to the first iteration, except that oppositional strength is calculated with reference to each team's adjusted Corsi, as calculated in the first calculation. Each subsequent iteration proceeds on this basis (i.e. calculating schedule difficulty through the adjusted Corsi data obtained in the previous iteration).

    It turns out that ten iterations are sufficient -- the average change in Corsi percentage from the 9th to the 10th iteration is 0.00000043.

    Within each iteration, several other adjustments are made that are worth mentioning. Firstly, an adjustment is made in terms of the location of each individual game and whether either team is playing a back to back. Secondly, each individual game is weighted with respect to the total number of Corsi events with the score close. Finally, a correction is made to account for the extent to which each team has contributed to the Corsi percentage of its opponents (this is necessary so that the adjustment does not favour weak teams)."
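
    The general shape of that iteration can be sketched in Python as below. This is a deliberately simplified illustration and not the exact code behind the spreadsheet. It keeps only the opponent-strength loop and the weighting by Corsi events, and it leaves out the location, back-to-back and self-contribution corrections. The function name `adjust_corsi`, the input layout, and the additive shift by the opponent's distance from 50% are all assumptions made for the example.

```python
from collections import defaultdict


def adjust_corsi(games, iterations=10):
    """games: list of (team, opponent, corsi_for, corsi_against) tuples,
    one row per team per game, score-close events only (assumed layout)."""
    # Iteration zero: raw score-close Corsi% for each team.
    cf, ca = defaultdict(int), defaultdict(int)
    for team, _, f, a in games:
        cf[team] += f
        ca[team] += a
    rating = {t: cf[t] / (cf[t] + ca[t]) for t in cf}

    for _ in range(iterations):
        num, den = defaultdict(float), defaultdict(float)
        for team, opp, f, a in games:
            events = f + a
            # Assumed adjustment: shift the game's Corsi% by how far the
            # opponent's current rating sits from an average team (0.5).
            adjusted = f / events + (rating[opp] - 0.5)
            # Weight each game by its number of score-close Corsi events.
            num[team] += events * adjusted
            den[team] += events
        # Ratings from this pass feed the next pass's opponent strength.
        rating = {t: num[t] / den[t] for t in num}
    return rating
```

    Each pass recomputes schedule difficulty from the ratings produced by the previous pass, which is the part of the process the excerpt describes. How quickly this simplified version settles depends on the schedule, so the ten-iteration figure quoted above should not be assumed to carry over to it.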

  9. I don't know why it is quite hard for them to post that kind of information. It is quite hard to miss data these days.
