Tuesday, March 18, 2014

SSAC Recap Day 1


For those of you who missed the 2014 MIT Sloan Sports Analytics Conference, I wanted to provide a recap. Here are some shorthand notes from Day 1.
Opening Remarks: Daryl Morey (Geek Elvis) and Jessica Gelman

The theme for this year’s conference is "ripple to revolution", describing how the field of analytics has grown in the sports world over the past few years, thanks in part to this conference. It started as a small field and has now become an integral part of competition. About 2,000 people were in attendance for the conference, which had approximately 1,000 on the waitlist alone. 10% of the attendees are international; there are 800 students from 180 institutions, and 360 sports organizations in attendance. The “heart” of the conference is the 26 panels offered. There are also four innovative areas to highlight:

1. Research Papers- with over 300 submissions and 8 finalists

2. “Evolution of Sports” Talks

3. Trade Show Blitz- Start Ups Pitching Ideas to Venture Capital/ PE Firms

4. Competitive Adjustment: presented by industry experts



Panel #1: Athlete Analytics: Instrumentation, Training, and Injuries
Panelists: Shira Springer (moderator), Andrew Luck, Matthew Hasselbeck, John Brenkus, Adir Shiffman, and Qaizar Hassoujee

1st topic was on the evaluation of NFL Combine Process - Hasselbeck had interesting experience- was not invited to it after entering draft as Junior, was in same draft as Ryan Leaf and Peyton Manning

A.Luck- “There is merit to the interviews and psychological testing done”, wants to see as a QB more football specific movements as part of combine, such as simulating 2 minute drill, how to quantify focus and clock management

Brenkus- interesting to note that the 40 yard dash does not really matter for wide receivers- look at times of Jerry Rice and Larry Fitzgerald, change of direction and stopping is more important - for quarterbacks an important metric is concerning their release, not overall arm strength, ex: Kaepernick tested high for quick release

Shiffman- the real challenge is to provide data that is valid, 3rd party testing is needed

JB- it’s all about context, the defensive rookie of the year the last three years has done well in the combine - Florida State and its injury prevention use of analytics with performance maximization - great success was no soft tissue injuries at Combine this year

AL & MH: see wearable performance-tracking technology as a hindrance, too bulky - This year the Colts tried out different wearables, such as corrective posture shirts and masks that simulated altitude play

S: if athletes are aware that they are wearing technology, then the technology is a failure- they are conscious of it - most of the early adopters of new technology are the teams that are on the cusp of greatness and cannot buy additional success in free agency - baseball is most hesitant to adopt new technology, have very traditional approach

JB: bat speed actually goes down if you swing a weighted bat in on deck circle, argues that players should be swinging a whiffleball bat instead - NASCAR seen as head of analytics- always looking for ways to shave 1/100th of second

MH: we are always concentrating on studying the analytics on the opponent, might need to look at data on ourselves more

JB: there exists a generational gap when thinking about different levels of education on technology, e.g. some coaches are still very old school

How they all see future of this field

S: in- ball technology, inertial movement

AL: invisible sensors

MH: hydration levels testing, head impacts

Q: college level, sport specific, fan engagement

JB: defining standards

Colts use analytics most in terms of 3rd down blitz tendencies and red zone situations

Buzz word: CONTEXT

The “holy grail” is predictive analytics

What they would like to see measured

AL: pitch count/ arm throw count and analysis

S: position-specific data

MH: recovery methods- which are best

JB: sport psychology

Final comment by Hasselbeck: would be interesting to put a heart rate monitor on Adam Vinatieri when he is about to kick game winner vs. all other kickers in league



Panel #2: Man & Machine: Real Time Data and Referee Analytics

Panelists: Hank Adams, Michael Bantom, Dan Brooks, Mike Carey, Paul Hawkins, Tom Penn (moderator)

Bantom: most important thing is for referees to maintain their focus over emotions

Adams: created SportVision, which is used for 1st down lines in football

Hawkins- sports of cricket, tennis, and soccer have real time computer decisions, have a signal that is relayed to the referee’s watch when ball crosses goal line in soccer (goal to increase the speed in soccer) - goal for referees and technology creators is to not get noticed, Example: arguably best tennis match of all time, the 2008 Wimbledon Final (Nadal vs. Federer)- there was a critical line call that was called correctly

Brooks- sorts through the SportVision data and maps calls of umpires in MLB, some people are unhappy with inconsistencies

Bantom: NBA has log of every call made in the last 10 years and they rate each one as correct or incorrect, looking for next year to have a centralized location for calls and replay reviews

Carey: issue of player safety, NFL is “game of inches”- get tough calls with “double action”, where knee hits, ball hits and breaks plane - there has been an adaptive habit among players of how to approach tackles by not hitting with the helmet

Hawkins: issues in soccer with cheating in regards to diving vs. what is actually a genuine foul - future may look at review of red cards and other fouls to possibly overturn, want to be proactive with approach rather than reactive, Example: Lampard goal that was disallowed in the England vs. Germany match in World Cup

Carey: still uses rubber bands on each of his fingers during games to keep track of what down it is - “most difficult call is catch/no catch” - again it’s a game of inches- there is human judgment error when it comes to marking where a player is down on a field, where the chains are placed for the 1st down, how that compares to the SportVision computerized line we see on TV, is chain actually precisely 10 yards?, etc.

Hawkins: in soccer, it would be easier to judge call if it was based on whether middle of ball crossed goal line for goal

Brooks: catcher framing in baseball is arguably worth up to 20 wins a season- where you are fooling umps, what is value of Molina brothers? - would be hard to apply technology that soccer uses for NHL goals due to speed of puck, how much of puck crosses line, etc.

Bantom: in NBA, there is great transparency, as fans are able to see what officials see in terms of replay reviewing - However in MLB, interesting that there is no explanation of calls by umpires to the fans

Panel #3: College Football’s Playoff Selection Dilemma Presented by ESPN

Panelists: Jeff Bennett (moderator), Dean Oliver, Alok Pattani, Brad Edwards, and Chris Fallica

- ESPN Stats & Info provides metrics such as expected points and probability models to help sort out Playoff selections - also includes strength of schedule calculations and adjusted ratings of team strength - going past the “eye test”, looking at how the game played out, Ex: FSU was up 42-0 in 2nd quarter

- 12 committee members comprising selection committee- How many of them are watching games/ which ones are they watching? ALL members are over 50 years old- cause for concern????

Key distinction is between “Best” and “most deserving” team- last year’s title game: Alabama vs. Notre Dame (undefeated, what did their resume look like, were they deserving??) - all panelists agree that Notre Dame would have been 4th seed in playoff

BIGGEST PROBLEM: determining who is 4th seed - looking at polls does not tell the story necessarily, Example: Gonzaga in NCAA Tournament last year, #1 seed (was #1 in AP Poll), with AP Poll you have to reassess every week win probability

- can look at metrics now such as final score margin, other team stats, average in-game win probability at different points in game

- Strength of Schedule is relative to perspective, there are different ratings and systems of how to configure SOS

The March Madness Selection committee is provided with a “Nitty Gritty” sheet, with a bunch of different stats and columns on teams, when they enter the meeting - CFB Playoff Committee will be provided with NOTHING

Panelists then ran an activity where the audience voted on blind resume tests from previous years to see who they would vote in/out of the playoff system; the results were compared with the panelists’ opinions and what the numbers said

First talked about FSU this year and looking at performance vs. level of competition - Loss Tolerance- it was actually found that it would be harder to go 12-1 with Auburn’s SOS than to go 13-0 with FSU’s SOS

Scenario #1: revealed to be 2008 Alabama (only loss to #1 Florida in SEC Championship) vs. USC (conference champs)- numbers favor Alabama

Head to head matchups were a high priority for the panelists

#2: Stanford (11-1) vs. Oregon (11-2) in 2011- Oregon destroyed Stanford in regular season, votes and numbers favored Oregon

#3: 2013 4th seed: Alabama vs. MSU vs. Stanford- audience mostly voted to leave out MSU, but numbers say Stanford would be left out

Panel #4: Beyond the 4-4-2: Soccer Analytics

Panelists: Taylor Twellman (moderator), Steven Houston, Robbie Mustoe, Jim Pallotta, and Paul Neilson

Latest development in soccer analytics: rise of technical scouts and positional data, future is with physiological and social fields - interesting that there is rise in U.S. owners in Europe (Pallotta with AS Roma as example)

Look at economics with transfers and acquisitions- Is Wayne Rooney really worth $500,000/ week? Tottenham was able to acquire Gareth Bale for 7 M pounds and later sold him to Real Madrid for about 87M pounds

Pallotta: At Roma, strategy is to build strength in the midfield, middle of pitch and with defense (which has given up least amount of goals in Europe this year) - notion of buying one star player and surrounding him with other players goes against his philosophy of success

Prime example: Dortmund has no real “star” players, are able to identify quality players before they become great - analytics is seeking to understand playing styles better, and how certain players can fit system of clubs

Best way for teams to get better is to improve revenue streaming, attract better players

Panel #5: Basketball Analytics

Panelists: Steve Kerr, Stan Van Gundy, Brad Stevens, Mike Zarren, moderator Zach Lowe

SportVU cameras in every NBA arena now!!

Kerr: interesting to see that for Miami, they are not concerned with their low offensive rebounding numbers because they create turnovers and get more possessions, more efficient than most teams

SVG: you need a style to fit the players/personnel you have, there are different interpretations of pick n’ roll defenses - can’t get caught up in hiring guys that only know analytics and do not know the game of basketball: idea that you can substitute numbers and analytics for actually watching the games - “I read a useless stat in this ESPN Magazine that said Paul George has run the most in the league (130 miles). What possible use is that?”

- Stan Van Gundy was by far the most entertaining panelist at the conference! He was in a sense playing "Devil's Advocate" but his arguments were convincing

BS: have noticed some psychological analytics- observe Dirk’s interaction with his teammates, constant talking and communication, always smiling

SK: would be nice to have a measure of conditioning and to be able to measure the stress these players put on their bodies throughout the season

SVG: discussion on the balance between work and rest, many teams now resting their players on some nights or not playing as many minutes - “Michael Jordan when he won his 6 titles never averaged less than 38 minutes/game”

- Tom Thibodeau has success despite injuries because he plays his good guys more minutes than most coaches - why it seems there are more injuries than ever in NBA? Increase in pick n’roll play and strategy, lot more guards attacking rim, very few practices during the season now

“Minute restrictions are BS” - Interesting theory on why Derrick Rose is getting hurt so often: Van Gundy says that guys in league are becoming stronger and more explosive than ever, D.Rose is most explosive in his attacks = larger load on his knees = more injuries

Is it the best to be training our players to become stronger and more explosive? LeBron would say yeah

SK: if I am hiring someone in the front office, I want someone with both an analytics and basketball background ideally

Zarren: TANKING solution = the WHEEL (Wheel) – basically picks in perpetuity that would eliminate current lottery system with the aim of avoiding tanking, submitted it to the league 2 years ago

major criticism: a college player who has his eye on a particular team might stay for another year in college if he knows he’s going somewhere like Milwaukee - could be solved with saying top 3 picks are thrown in a hat, no certainty player knows what team he will end up on or if that pick will be traded

- Another MIT conference staple, Mark Cuban, has provided his idea to avoid tanking, which includes not giving the worst 3 teams in the league picks in the draft, giving an incentive to at least finish 4th worst (Cuban Solution)

Colangelo (Cornell grad!)- Admitted to tanking as Raptors GM a few years ago towards end of season

This concludes PART 1 of My Recap. PART 2 will cover 2nd Day of Panels.


Saturday, November 30, 2013

Know your Stats: The Key Pass


In my last piece, I started looking at the importance of properly contextualizing player performance in order to isolate what we -- as fans, managers, and coaches -- truly care about: ability and value. In this piece, I'd like to show how advanced metrics can help out in that difficult task.

I'm very partial to passes. Passing is my favorite part of the game. Nothing trumps a side that can pass the ball around fluidly, and aesthetically nothing beats a beautifully threaded through-ball to put a forward in scoring position. Therefore, a metric that I'm very partial to is the "Key Pass". First, let's begin by defining the Key Pass. Per Opta Sports, the company that measures and tracks the metric, the definition of the Key Pass is:

The final pass or pass-cum-shot leading to the recipient of the ball having an attempt at goal without scoring.

So there we go. The definition gives the people at Opta a standard for classifying events objectively, and it makes the data they provide very reliable, moving away from subjectivity. Opta essentially has a large number of people sitting in their viewing center (or working remotely, maybe?) counting Key Passes. Sounds like a fun job. Unless your assignment is to track Crystal Palace games or something.

The Key Pass metric provides one big advantage over assists, and that advantage can be found in the final clause of the definition: "without scoring". The Key Pass is a better way to unveil the true measure of what a player is actually doing on the field. Let's view this through an example of one of the best playmakers in the world, Mesut Ozil, who has averaged around 4 Key Passes per 90 minutes over the past few seasons -- an impressive figure.

Say Mesut Ozil filters a beautiful ball from midfield in between the two center backs. Last season, Cristiano Ronaldo would have been on the receiving end of such a pass, and Cristiano Ronaldo, being the monster that he is, probably would have buried the ball in the back of the net (the guy breaks hands from 30 meters – a goalkeeper a few meters away has no chance). Now, this season, Nicklas Bendtner could be on the receiving end of those passes, and it is just as likely that Bendtner will trip over his own feet as it is that he scores.

Observe Nicklas Bendtner in his natural habitat


So we have two situations where Mesut Ozil makes the exact same pass and gives his forward the same chance of scoring – let’s say an *85% chance. The remaining 15%, which decides whether the goal is actually scored, comes down to 1) the scorer’s ability and 2) random chance (say a beach ball getting in the way). Neither 1 nor 2 tells us anything about Mesut Ozil. And yet it is 1 and 2 that determine whether Ozil's pass gets tallied as an assist.

*Note: these percentages are mere abstractions to make my point.

Now, expand that situation to a much grander scale, where a creative midfielder plays with only Bendtners and no Ronaldos. I’m inclined to say his assist numbers would not be as high with Bendtners as they would be with Ronaldos.
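
To make the Bendtner/Ronaldo point a little more concrete, here is a minimal Python sketch. It assumes a playmaker who creates the same 100 chances in both scenarios, and two hypothetical forwards whose conversion rates (30% and 10%) are invented for illustration, much like the percentages above. Following the Opta definition quoted earlier, a created chance that is scored is tallied as an assist and one that is missed is tallied as a Key Pass, so the sum of the two is what describes the passer.

```python
import random


def simulate_chances(chances_created, finish_prob, seed=0):
    """Split a playmaker's created chances into assists and Key Passes.

    finish_prob is a hypothetical probability that the forward converts
    a given chance; it describes the forward, not the passer.
    """
    rng = random.Random(seed)
    assists = sum(1 for _ in range(chances_created) if rng.random() < finish_prob)
    key_passes = chances_created - assists  # chances that did not end in a goal
    return {"assists": assists, "key_passes": key_passes}


# Same passer, same 100 chances, same random draws; only the finisher changes.
with_clinical_forward = simulate_chances(100, finish_prob=0.30)
with_wasteful_forward = simulate_chances(100, finish_prob=0.10)

for label, result in [("clinical", with_clinical_forward),
                      ("wasteful", with_wasteful_forward)]:
    total = result["assists"] + result["key_passes"]
    print(f"{label}: assists={result['assists']}, "
          f"key passes={result['key_passes']}, chances created={total}")
```

The assist column moves with the forward, while the number of chances created stays at 100 in both runs, which is the part that actually reflects the midfielder.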

Now, this is not to say that the Key Pass Metric is a flawless measure. This article from Statsbomb does a fairly good job at evaluating the Key Pass, pointing to one big flaw: not all key passes are created the same. In the example given above, say instead of Ozil threading a perfect ball that leaves the forward on a 1 v 1 with the goalkeeper and with an 85% chance of finishing, he puts a little too much on the ball and instead leaves the forward at an awkward angle with the goalkeeper and only with a 70% chance of scoring. Also, not every final pass is equally "key" -- there is a big difference between a ball played from midfield to put a forward through and a short tap-in pass in the keeper's box.

Over many different observations (instances), these differences are meaningful, and they are directly a result of a player's ability. However, the Key Pass is a great starting point and a vast improvement over the statistics we have typically had to contend with. Anyway, I strongly recommend the Statsbomb article; it actually attempts to normalize for the differences in the quality of the Key Pass using another metric called Expected Goals (ExpG). It shows the type of improvement that we can continue to make in soccer analytics.
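
The idea of weighting each Key Pass by the quality of the chance it creates can be sketched in a few lines of Python. This is not the Statsbomb method, just a simplified illustration: each pass carries an assumed scoring probability for the shot it sets up. The 0.85 and 0.70 figures reuse the made-up percentages from the example above, and 0.10 is an equally made-up value for a low-quality chance.

```python
def expected_goals_created(shot_probabilities):
    """Sum the assumed scoring probabilities of the shots a passer set up."""
    return sum(shot_probabilities)


# Two hypothetical players with three chances created each.
player_a = [0.85, 0.70, 0.85]   # mostly clear one-on-ones
player_b = [0.10, 0.10, 0.70]   # mostly low-quality chances

for name, probs in [("A", player_a), ("B", player_b)]:
    print(f"Player {name}: chances created={len(probs)}, "
          f"expected goals created={expected_goals_created(probs):.2f}")
```

Both players would show the same raw count of chances created, but the weighted total separates them, which is the kind of distinction a plain Key Pass tally cannot make.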

My initial intent with this piece was to actually start engaging in analytics and to track Ozil's career based on the Key Pass. However, the availability of soccer statistics is very limited, and the availability of advanced soccer statistics is even more limited. I have put in a request to Opta to see if they will grant me access to their data, which they do grant to bloggers and writers as long as their projects merit it. Hopefully I get access to it, and the quality of my content on this space can go up a notch.

I'll leave you with some passing gems from Ozil:


And in case you were wondering about the significance of the picture at the beginning of the post, here are some gems from one of the masters of the Key Pass, Carlos Valderrama:



Around the Web:

  • Richard Whittall with a short but thought provoking look at football finance and the possibilities of teams...GASP...actually spending their money wisely and not indiscriminately.
  • Statsbomb with a similar piece, suggesting improvements over the decision making organization that most teams employ in an effort to improve that word we all love: efficiency.
  • Over the weekend: Barcelona, Real Madrid, and Atletico Madrid combined for a 16-0 trouncing over their opponents. Gotta love the parity in La Liga. I'm just gonna keep hitting the snooze button, sadly.
  • A World Cup Draw simulator! The most competitive field ever? 



Tuesday, November 5, 2013

Soccer Analytics: From Fantex to the Fundamentals


Fantex Brokerage Services recently announced that it will be offering shares in the stock of Houston Texans running back Arian Foster. That’s right – shares in an athlete. Essentially, Fantex paid Foster a cool $10 million for a 20 percent stake in Foster’s future income, which includes but is not limited to: contracts, endorsement deals, and public speaking. Fantex will begin to offer shares in that 20% stake as early as next month.
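
As a back-of-the-envelope reading of those two numbers, the short Python sketch below works out what the deal implies for the whole income stream. It is only the arithmetic of "price divided by stake" and ignores discounting, fees, and risk; the function name is my own.

```python
def implied_total_value(price_paid, stake_percent):
    """Value of 100% of the income stream implied by the price of a stake."""
    return price_paid * 100 // stake_percent


price_paid = 10_000_000   # dollars Fantex paid, per the announcement
stake_percent = 20        # share of Foster's future income acquired

total = implied_total_value(price_paid, stake_percent)
print(f"Implied value of Foster's future income: ${total:,}")
```

In other words, paying $10 million for a fifth of the income stream prices the whole stream at $50 million.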

This is an incredibly interesting idea. Beyond the logistical limitations (will players want to do this? How will leagues respond?), I think it strikes the core of much of what analytics in sports is attempting to get at: player value.

More specifically, I want to examine this story from the point of view of soccer analytics in an effort to start building an analytical framework – to start molding a mode of thinking through which we can analyze players and team performance. That all sounds rather abstract and wordy, but there really are some very concrete and real ideas behind what Fantex is trying to do. Caution: I started writing this article by talking about Fantex’s IPO of Arian Foster, but by the end it will have little to nothing to do with this. If you’re interested in the valuation of Arian Foster, check out NYU Stern Professor and valuation demi-God Aswath Damodaran’s blog post.

Let’s start by examining the most obvious parallel: an athlete has an intrinsic value, much like a company has an intrinsic value.

This intrinsic value can be measured by performance metrics – say goals allowed by a goalkeeper and revenues by a company. However, both of these statistical measures must be viewed in a proper context. Let’s take this very obvious example: Ford Motor Company rakes in revenues that are far larger than they were 100 years ago. A part of this revenue growth is simply because Ford produces more cars than it did 100 years ago, since more people demand Ford cars. However, a major part of this large growth in revenues is due to inflation – Ford takes in more for every car for a reason that is really beyond its control.
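
A minimal Python sketch of that adjustment follows. Every number in it is invented for illustration (these are not Ford's actual revenues or a real price index); the point is only that restating today's revenue in old prices strips out the part of the growth that comes from inflation.

```python
def real_growth(nominal_then, nominal_now, price_index_then, price_index_now):
    """Revenue growth after restating today's revenue in old prices."""
    real_now = nominal_now * price_index_then / price_index_now
    return real_now / nominal_then


# All four inputs are invented for illustration, not Ford's actual figures.
nominal_multiple = 1000 / 1    # revenue grew 1000x in nominal terms
real_multiple = real_growth(1, 1000, price_index_then=1, price_index_now=25)

print(f"nominal growth: {nominal_multiple:.0f}x, real growth: {real_multiple:.0f}x")
```

With these made-up inputs, a 1000x nominal increase shrinks to 40x once prices are held constant; the remaining growth is the part the company can plausibly take credit for.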

Similarly, and this is another obvious example (but that’s why I want to use it – it elucidates the point quite well), the number of goals a goalkeeper allows is highly dependent on the context within which he performs. A goalkeeper that plays for a team at the bottom of the league (which presumably allows a lot of goals, although it could be that they just don’t score many goals; for now, let’s assume that they’re bad at both defending and attacking) will allow more goals than a goalkeeper that plays for a team at the top of the league.

Like I said, this is an obvious example that most fans intuitively get and wouldn’t make the mistake of overlooking. However, there are many other instances where the importance of context won’t be so clear.

And I feel like this is where most professional soccer analysis starts to fall apart – a failure to properly contextualize performance makes it hard to analyze what actually is due to player performance and what is simply a product of the context or even flukes.

It is also symptomatic of a larger problem within the community of soccer analysis, journalism, blogging, etc, and that is a problem of “irrationality”. Pick up a copy of any big soccer newspaper (Marca, Sport, La Gazzetta…) if you’d like a quick survey of any and all logical fallacies: straw man arguments, inductive fallacies, the fallacy of false cause, etc. Much of soccer punditry in every part of the world consists of knee-jerk reactions to fluctuations in performance over small sample sizes of games or even a single game. Take, for example, the guy pictured above -- public enemy #1 of rationality. The World Cup (and the Euro) is a perfect opportunity for Mr. Alexi Lalas to make outlandish claims on the basis of 45, 90, 135, or however many minutes (all small samples, no doubt) he's happened to watch a team play during the tournament. Now, Lalas happens to say things that are easily recognizable as drivel -- but what about your favorite pundits? Your favorite newspaper?

Take, for example, Lionel Messi’s current dip in performance – he hasn’t scored a goal in 4 games. Alarmists are already ringing up theories that Messi is taking the club season off in order to be at full peak for the World Cup, which is what he really wants to win. This is after about a month of mediocre performances from Messi.

I believe a major part of better decision making for soccer managers, owners, and executives and a major part of the discipline brought along with soccer analytics has nothing to do with data or statistical analysis. In fact, saying that is probably a misuse of the term “soccer analytics” on my part. Being a better soccer executive and being a smarter fan has nothing to do with data; but it has everything to do with being a more rational observer and analyst. This is a point Professor Chris Anderson alludes to in his interview with our blog:

“General texts explaining decision making and analysis like Daniel Kahneman’s Thinking, Fast and Slow or Silver’s The Signal and the Noise are very useful for understanding how people think about and interpret information.”

In a game such as soccer, where research has shown (mainly by Professor Anderson) that luck plays a bigger role than in any other sport, a rigorous decision making and logical foundation is even more important – there is more “noise” that distorts our understanding of what is actually going on on the pitch.

Now, being irrational and succumbing to the emotions of being a fan is part of what makes soccer beautiful. I’m not suggesting that we need to be cold, emotionless fans. Rather, what I am suggesting is that it is important to detach ourselves from the irrationality of being a fan in order to see the game as what it is, not as what we may construct it to be based on biases and faulty reasoning. “Animal spirits” don’t just apply to the stock market.



Friday, October 4, 2013

Soccer Analytics: An Interview with Chris Anderson


The age of Big Data is upon us. In industries throughout the world, the collection and analysis of data is a focal point of decision making. Sports are no exception -- not surprising given that professional sports are billion dollar industries. In the past decade, sports such as baseball and basketball have started integrating statistical and objective analysis into player evaluation and team management. Soccer -- futbol, football, the Beautiful Game -- is slowly beginning to accept analytics as a decision-making tool.

Cornell's own Chris Anderson is one of the leading innovators in the burgeoning field of soccer analytics. On campus, he is a Professor of Government and Labor Relations whose work primarily brings together the fields of economics, politics, and sociology.

Outside of  the classroom, Professor Anderson's interests lie on the soccer pitch, where he played as a goalkeeper in the German lower divisions.  Along with David Sally, Professor Anderson authored The Numbers Game: Why Everything You Know About Soccer is Wrong, a book that breaks many established conceptions in soccer and counters them with an objective, analytical approach -- all supported with careful statistical analysis. It is a great read for any soccer fan looking to nuance his/her view of the game, but it can also serve as a great introduction to anyone that is just becoming interested in the sport.

I hope to review the book later in the semester, mostly as a basic introduction to the field of soccer analytics. Professor Anderson is also co-partner of Anderson Sally, a sports analytics consulting firm that works closely with professional teams.

Professor Anderson recently took time out of his busy schedule (check out his recent interview on CNN) to answer some questions for the Cornell Sports Business Society.

Professor Anderson, first I’d like to thank you on behalf of the Cornell ILR Sports Business Society for taking time out of your busy schedule to do this interview. It is very exciting to see that a member of the Cornell community is one of the leading figures in the growing field of soccer analytics. First, would you mind giving us your personal definition of “soccer analytics”?
It’s become kind of a catch-all term for all kinds of things. I think it’s basically one thing that’s applied to another: first of all, it’s analytics – which is about collecting and interpreting information, evidence, data, what-have-you. That information can be quantitative or qualitative in nature; and analytics is not just about having information, but also about deriving meaning from it, and doing so in a systematic way.  
Wikipedia defines analytics as “the discovery and communication of meaningful patterns in data”, and I think that sums it up nicely. Analytics is becoming a common tool across lots of industries, and soccer analytics is simply analytics ideas and practices applied to the game of soccer. Within soccer, we’re talking about analytics with regard to playing the game, recruiting players, or player fitness – the various areas that affect a team’s performance.

How did you first become interested in analytics in soccer, and how did you start getting involved in the field?
For me it started with a love of the game. I've always been interested in understanding soccer as a game played by 22 people who have to make decisions both in isolation (e.g., do I pass, dribble, or shoot?) and together (as part of a team). We tell a story in the book about how I took an analytical approach to soccer from an early age; more recently, Michael Lewis’ book Moneyball got me excited about the potential of applying similar ideas to soccer.
Then I started a soccer analysis blog on a lark, and attended the MIT Sloan Sports Analytics Conference (which was pretty inspiring). As the blogging and analysis became more serious, David Sally and I started talking about writing a book about soccer analytics. That book eventually became The Numbers Game. I guess the lesson for me was that it’s fun to start small and go from there – the key to any of it is to stick with it over time.

Do you think the growing economic disparity between the richer clubs and the poorer clubs in the European leagues will help the field of soccer analytics grow even faster than it already is? There was a similar context in the mainstream emergence of sabermetrics, where Billy Beane had to look for ways to compete with the big money teams in the Majors.
That’s a good question, and I’m not sure of the answer. In principle, the clubs with less money to spend on superstars should be willing to try new ways of winning or to get more bang out of the buck for money invested in analytics. A great example of a club that did some fairly basic but very effective things coming out of analysis were Bolton Wanderers under their then-manager Sam Allardyce (who now coaches West Ham United). Bolton was able to do much better than their wage budget would have suggested.  
But the reality at many of the lesser clubs is that money is really tight, and clubs find it difficult to justify spending money on people, software, data, and computers to ramp up their analytics operations. So ironically, the better-financed clubs like Manchester City or Liverpool are spending more money and resources on analytics, and they benefit from those investments. By the way, Billy Beane is a huge soccer fan, and I’d love to see him give advice to soccer teams (and you’d only have to hire one guy) – but I don’t think he’s available!

One of the bigger and most counter-intuitive points you make in your book “The Numbers Game” is the importance of luck in the game of soccer – significantly more than in any other sport. Do you think that observation should have any effect on the way teams and fans analyze on-field performance?
I would hope so, but I’m not sure. It should be pretty logical. More randomness and luck means more noise in the data, and that should make fans and clubs look longer term. In statistics language, what you want is a bigger sample before drawing any kinds of firm conclusions about performance because outcomes can be too much influenced by chance in the short term. But of course, telling a fan or a coach not to worry about the last 2-3 games is likely to encounter resistance. So we have to divorce our role as fans and the emotion that comes with that from the reality of what the data really do or don’t tell us.

If patience is hard to come by – and it always is – then another thing the role of luck and chance should teach us is that fans and coaches might be well-advised to focus more on those aspects of a team’s or player’s performance that are more controllable or less subject to chance. Shot conversion rates are an example of a performance indicator that is less replicable than, say, producing high quality chances in the first place. The former regress more quickly to the mean than the latter.
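
[Editor's illustration, not part of the interview.] The sample-size point can be made concrete with a small simulation. The sketch below assumes a team whose "true" shot conversion rate is fixed at 10% and compares how much the observed rate bounces around over about 40 shots (roughly three games) versus about 500 shots (roughly a season). The 10% rate, the shot counts, and the function name are all assumptions chosen for illustration, not figures from the interview or the book.

```python
import random
import statistics

def observed_conversion_rates(true_rate, shots, trials, rng):
    """Simulate `trials` independent samples of `shots` shots each and
    return the observed conversion rate of every sample."""
    rates = []
    for _ in range(trials):
        goals = sum(1 for _ in range(shots) if rng.random() < true_rate)
        rates.append(goals / shots)
    return rates

rng = random.Random(2014)  # fixed seed so the run is repeatable
TRUE_RATE = 0.10           # assumed "true" conversion rate (illustrative only)

# Roughly three games' worth of shots vs. roughly a season's worth (assumed counts).
short_run = observed_conversion_rates(TRUE_RATE, shots=40, trials=2000, rng=rng)
long_run = observed_conversion_rates(TRUE_RATE, shots=500, trials=2000, rng=rng)

# Spread (standard deviation) of the observed rate around the true rate.
short_spread = statistics.pstdev(short_run)
long_spread = statistics.pstdev(long_run)

print(f"spread over 40 shots:  {short_spread:.3f}")
print(f"spread over 500 shots: {long_spread:.3f}")
```

For a binomial process the spread of the observed rate is sqrt(p(1-p)/n), which works out to about 0.047 for 40 shots and about 0.013 for 500 shots. In other words, the same underlying team can look like a 5% or a 15% finisher over a few games purely by chance, which is why the short-term numbers deserve less weight.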

On a similar note, we have seen the emergence of analytics in other major sports, namely baseball, where analytics are firmly entrenched. However, soccer is a very different game than baseball – it is much more fluid with fewer fixed events. How does that limit the extent of the objective analysis that can be used to view the game?
It doesn't really; people working on basketball and hockey, for instance – two sports that are fluid and team-based – have already shown us that quite a lot of interesting insights can be produced about soccer’s “cousins”. At the same time, it’s probably naive to think you can simply apply ideas from one sport – especially one, like baseball, that’s very different – to another. So you have to be careful, and every sport has to find its best ways of using analysis. More fundamentally, the nature of the game makes soccer analytics simply a harder set of analytical problems. But that doesn't mean it can’t be done.

One could say that soccer has always been a mathematical game, but in a different way than baseball in that it is a very geometric game. Formations have been a big obsession from the very beginnings of the sport, shape – mostly defensive – is always emphasized by youth coaches, and triangles are a big part of the ideology of FC Barcelona. Do you think the close connection between soccer and geometry opens up other objective frontiers within the game of soccer?
It’s only natural. Soccer is a game of space and a game of timing, so when it comes to the spatial aspects, and the team aspects of players having to coordinate, I think there is a lot of potential here. It’s also an area that is easier to explain to coaches (say, on a blackboard or a computer screen) than a set of numbers.

Would you mind sharing with us any soccer analytics research you are currently working on? Any upcoming books or projects we can look forward to?
Not at the moment. Actually, I am busy working on various projects related to my job as a political scientist in the Government Department. That’s keeping me pretty busy.
Lastly, could you recommend some crucial readings for any readers looking to start learning about soccer analytics?
Going back to your first question, I think it would be important for any aspiring analyst to get a good handle on “analytics” – analytical thinking, analysis tools (econometrics, statistics, etc.) – as well as soccer as a game. There are a variety of sources out there, and many of them aren't very technical (which is nice). My personal favorites that I recommend with regularity are books like Jonathan Wilson’s Inverting the Pyramid: The History of Football Tactics and Kuper and Szymanski’s Soccernomics.
More generally, I would say that becoming familiar with soccer analytics does not imply only learning about soccer. I would always recommend reading (not just watching) Michael Lewis’ Moneyball and Jonah Keri’s The Extra 2%. Scorecasting by Moskowitz and Wertheim and Basketball on Paper by Dean Oliver are excellent, too. General texts explaining decision making and analysis like Daniel Kahneman’s Thinking, Fast and Slow or Silver’s The Signal and the Noise are very useful for understanding how people think about and interpret information.
Finally, I would recommend reading all the great material that’s now available courtesy of various analytics-focused blogs. For soccer, I would recommend socceranalysts.com and statsbomb.com. But there are also lots of great analysis blogs on hockey and basketball, for instance.
