Fancy Stats: The Good, The Bad, and The Ugly
I have a love-hate relationship with statistics. I always did well in statistics classes, and in my career I have excelled at applying and developing ways to measure business performance. But statistics can be frustratingly inexact. You try to find correlations between data sets that appear disparate in order to extrapolate and predict outcomes, or you measure results against established baselines to judge performance. In business, an unexpected contract may be an outlier that corrupts your model. In sports, a role player may make a big play against a superstar that falls far outside what you would expect from him. You identify these outliers, but you fight an endless battle to maintain your confidence levels. At what point does turning a simple sales forecast into a twelve-variable monster formula, or layering several statistics to determine whether a defenseman plays a ‘stay-at-home’ or ‘offensive’ game, miss the point? That question is what my ‘Fancy Stats’ feature series will focus on.
Don’t get the ‘Fancy Stats’ moniker wrong. Statistical analysis is big business, and I’m not mocking its usefulness in the sports world. Baseball started the trend with sabermetrics, a movement to answer objective questions using statistics drawn from in-game activity. A similar movement is breaking through in hockey. A dedicated and passionate group of people is producing analysis, projects, and an endless amount of data to use. I see a lot of good coming out of this work, but also a lot that frustrates me because of the way people hang opinions on the numbers. The advanced statistics that have been developed provide a wonderful set of parameters that can be applied in an attempt to answer questions. But what questions are being asked? And are they the right questions? When do the stats become so layered that you’ve created a number that simply fits a narrative? This is the line where the battle with hockey purists is fought: their general opinion is that these statistics can’t provide the insight that getting on the ice and carrying a puck does.
This first entry in the series introduces a few statistics I’ll work with in future posts and links you to some of the already immense body of writing on the subject. Please read and enjoy it if you choose to broaden your knowledge further. I’m admittedly new to a lot of these statistical measures myself, but I learn by doing, so this series is my outlet to take them apart and see how they work, then apply and rearrange them to answer the questions I have about the game.
Most of these advanced statistics aim to answer a simple question: who possessed the puck more? The idea is that you can’t score unless you possess the puck, and you can’t be scored on while you possess it. Possession is key to winning hockey games, but a quick scan of the stats pages on the NHL’s website shows no discrete possession information for teams, for individuals, or even in box scores. Bring in the fancy stats to answer the question.
What if there were a concrete measure that could represent possession? When you possess the puck, your objective is usually to score a goal. You make passes, get into position, and shoot the puck at the net. Sometimes it reaches the back of the net; other times the goaltender snags it. Some shots miss the net, and others get blocked along the way. It turns out that all of these shot attempts are recorded during the game and can be added up into what is known as Corsi (named after Buffalo Sabres goaltending coach Jim Corsi). You can attribute these Corsi events to teams and individuals and split them up in all sorts of fun ways (those will become apparent in future posts). Similar to Corsi is Fenwick, which counts everything Corsi does except blocked shots. Fenwick is intended to remove any influence an opponent skilled at blocking shots may have. Both are usually analyzed for even-strength shots only, to eliminate the possession bias of the power play. But there is value in viewing all activity, depending on the question you are trying to answer. Check out this post from SecondCityHockey.com for a real-world example and breakdown of the two measures.
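To make the counting concrete, here is a minimal Python sketch, assuming a simplified event feed in which every shot attempt is recorded with the shooting team, its outcome, and an even-strength flag. The ShotEvent record, the outcome labels, and the attempts and for_percentage helpers are hypothetical and do not reflect any league's or site's actual data format, and the six-attempt game is made up for illustration. Expressing each count as a team's share of all attempts in the game is one common way to compare the two sides.

```python
from dataclasses import dataclass

# Hypothetical outcome labels; real play-by-play feeds use their own names.
GOAL = "goal"
SAVED = "saved"        # on goal, stopped by the goaltender
MISSED = "missed"      # missed the net
BLOCKED = "blocked"    # blocked by a skater before reaching the net

CORSI_OUTCOMES = {GOAL, SAVED, MISSED, BLOCKED}  # every shot attempt
FENWICK_OUTCOMES = {GOAL, SAVED, MISSED}         # Corsi without blocked shots


@dataclass
class ShotEvent:
    """One shot attempt in a hypothetical, simplified event feed."""
    team: str             # team that attempted the shot
    outcome: str          # one of the outcome labels above
    even_strength: bool   # True when both teams had equal numbers of skaters


def attempts(events, team, outcomes, even_strength_only=True):
    """Count attempts by `team` whose outcome is in `outcomes`."""
    return sum(
        1
        for e in events
        if e.team == team
        and e.outcome in outcomes
        and (e.even_strength or not even_strength_only)
    )


def for_percentage(events, team, opponent, outcomes, even_strength_only=True):
    """Share of the two teams' combined attempts that belong to `team`."""
    f = attempts(events, team, outcomes, even_strength_only)
    a = attempts(events, opponent, outcomes, even_strength_only)
    return f / (f + a) if f + a else 0.0


# A made-up six-attempt game, for illustration only.
game = [
    ShotEvent("HOME", GOAL, True),
    ShotEvent("HOME", SAVED, True),
    ShotEvent("HOME", BLOCKED, True),
    ShotEvent("HOME", MISSED, False),  # power-play attempt, excluded by default
    ShotEvent("AWAY", SAVED, True),
    ShotEvent("AWAY", MISSED, True),
]

print(attempts(game, "HOME", CORSI_OUTCOMES))                  # 3
print(for_percentage(game, "HOME", "AWAY", CORSI_OUTCOMES))    # 0.6
print(for_percentage(game, "HOME", "AWAY", FENWICK_OUTCOMES))  # 0.5
```

Keeping the two measures as sets of outcomes makes their only difference explicit: dropping BLOCKED from the Corsi set gives Fenwick. Passing even_strength_only=False brings the power-play attempt back in, which is the "all activity" view mentioned above.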
Other statistics that will come up in these posts include Shooting Percentage (the percentage of shots on goal that become goals), Time on Ice, Quality of Competition, and Zone Start Percentage. I will go into more detail on these as we encounter them in future posts. I strongly recommend reading this six-part series by Matchsticks and Gasoline, SBNation’s Calgary Flames community, but I’ll cover the basics when needed. If you’d like a fun read about these statistics and the community in general, I recommend Grantland’s The Faker’s Guide to Advanced Stats in the NHL.
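Shooting Percentage is simple arithmetic, but it helps to see how its denominator differs from Corsi's. Below is a minimal Python sketch, assuming goal and shot-on-goal counts are already available; the function name is hypothetical, and the 5-goals-on-50-shots line is invented for illustration.

```python
def shooting_percentage(goals: int, shots_on_goal: int) -> float:
    """Goals as a percentage of shots on goal (0.0 if there were no shots).

    Missed and blocked attempts are not shots on goal, so unlike Corsi
    they are left out of the denominator.
    """
    if shots_on_goal == 0:
        return 0.0
    return 100.0 * goals / shots_on_goal


# Hypothetical line: 5 goals on 50 shots on goal.
print(shooting_percentage(5, 50))  # 10.0
```

Returning 0.0 when there are no shots is a convenience choice for this sketch; an analysis might prefer to treat that case as undefined instead.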
My series will focus on applying statistics to practical questions. The goal is to examine whether and how they can provide answers, particularly from a hockey player’s or coach’s perspective. Please post comments and offer feedback; the only way this community can advance is with active participants. I hope you will join in.