I guess it was only a matter of time until I put this together.
On Google Documents I have placed an all-time GVT database that covers the NHL, WHA, AHL, IHL, the Swedish, Finnish, Russian, Czech and German elite leagues, as well as some of the Canada Cups and Olympic tournaments. There is both a text file, multi_gvt.txt, and an Excel file, multi_gvt2.xls. The text file contains all the players; the Excel file only contains the first several hundred, as the site didn't allow me to upload spreadsheets larger than ~10 MB (!). You can download the text file and then paste it into an Excel spreadsheet.
Calculating the GVT data itself was fairly simple, although for many of these leagues the data I was able to find was quite rudimentary: Games, Goals and Assists. For recent years, goaltender shots against and player plus-minus are available, which at least gives us a skeleton of defensive information to work with. For now, I have restricted myself to seasons for which I had enough data to work with and in which there is enough overlap between the various leagues that I can be reasonably confident of my normalization rates. I've started in 1983 for the AHL, 1988 for the Swedish Elitserien, 1989 for the Finnish SM-Liiga, 1992 for the IHL, 1999 for the Czech Republic League, 2001 for the German Deutsche Eishockey Liga and 2003 for the Russian Elite League / KHL. I have also included all the Canada Cups, World Cups and the 2006 and 2010 Olympics. More seasons will be added as I find the time.
The process of normalization, of course, is the most interesting part. GVT naturally normalizes to 3 goals a game, and I normalized goals and assists in the same way. I also normalized for schedule length, although because of the huge disparity in schedule lengths between various leagues, and the different levels of variance inherent in each, I could not normalize every league to 82 games: what would I do with an 8-game Olympic tournament? My compromise was to normalize versus a minimum of 70 games, so if a league had a 50-game season, I normalized the games played to 50 / 70 * 82 ≈ 59 games. It's not perfect, but it's close enough.
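For readers who want to reproduce the schedule adjustment, here is a minimal Python sketch of the arithmetic above. The function and parameter names are hypothetical, and it assumes that any season of 70 games or more is simply scaled to a full 82 games; only the 50-game example is taken from the text.

```python
def schedule_adjusted_games(games, season_length, floor=70, target=82):
    """Scale games played to an 82-game schedule, using a 70-game floor.

    Assumption: seasons of at least `floor` games scale directly to
    `target`; shorter seasons are scaled as if they had `floor` games.
    """
    if season_length <= 0:
        raise ValueError("season_length must be positive")
    # Never divide by less than the floor, so short tournaments are not
    # inflated all the way up to a full 82-game season.
    denominator = max(season_length, floor)
    return games * target / denominator


# A full 50-game season counts as roughly 59 games.
print(round(schedule_adjusted_games(50, 50)))
# An 8-game tournament stays small instead of being blown up to 82.
print(round(schedule_adjusted_games(8, 8), 1))
```

The floor is what keeps a short tournament from carrying the same weight as a full league season, at the cost of treating every sub-70-game schedule as if it were 70 games long.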
The second part was to normalize for league difficulty. Past approaches, especially the most well-known ones by Gabriel Desjardins, normalize by games played. This is the correct approach when doing projections for the majority of players, as good players in lower leagues will often be marginal players in the NHL. However, I chose to use a translation system that was more accurate for elite players; to do this, I needed to normalize by ice time instead of by games played. Obviously, I don't have ice time numbers for any league but the NHL, but I do have a simple algorithm to estimate ice time based on basic statistics. It's not perfect, but it's close enough. The upshot is that my normalization factors are slightly higher than Gabriel's and track good players better but weaker players worse.