Copa America: Notes – WTF is xG?? edition

We got a bunch of questions from yesterday’s notes – primarily WTF is xG? So we’re dedicating this note to our methodology and a “201” on soccer analysis.

1.      A rose by any other name – xG, EGV, Expected Goals – whatever name you prefer – is a way of assessing shot quality in soccer. Every shot has a probability of turning into a goal. A shot from 5 meters when Messi is one-on-one with the goalkeeper has a very high xG, or expected goal value. Let’s say 50%. A shot from 40 meters by yours truly has a very different xG – let’s say 1%, to be nice to me. Some details if you have 5 minutes; if not, skip forward to #2:

a.      A bunch of factors go into xG calculations – and ours are the best out there. Shot distance, shot angle and type of play are the obvious factors, and everybody uses those.

b.      Then there’s the context of who is shooting, how they are shooting and how they got there. Messi, as good as he is, doesn’t take headers, so if he were to take a header, his probability of scoring would be lower than if he were shooting from the same spot with his left foot.

c.      Finally, we layer on the context of how well the opposing defense and goalkeeper defend shots of that type from that distance.

d.      However, when assessing goalkeepers we don’t use this xG value, because it’s not fair to a goalkeeper. My 1% shot from the halfway line might just end up in the top left corner of the goal after a wicked curve (one can dream). So even though it was an impossibly unlikely shot, from the perspective of the goalkeeper it was very tough to stop. That is why we calculate a separate saveability index when we assess goalkeepers. Head hurt yet? We’re almost done.

e.      Here is a look at our xG vs. actual goals per 90 minutes by team across the EPL and La Liga over 3 years. You can see that very few teams manage to stray much from their xG value:

Data provided by Opta
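
If you prefer to see the idea in code, here is a minimal sketch in Python. To be clear, this is not our model: the logistic form, the coefficients and the names (`shot_xg`, `total_xg`) are all made up for illustration, and it only looks at distance, angle and whether the shot is a header. It assumes `angle_rad` is the angle the goal mouth covers as seen from the shot location. The point is simply that each shot gets a probability, and a player’s or team’s xG is the sum of those probabilities.

```python
import math

# Hypothetical coefficients, chosen by hand for illustration only.
# They are NOT fitted to any data and are not the model described above.
INTERCEPT = -1.0
PER_METER = -0.12       # farther out -> lower chance
PER_RADIAN = 1.5        # wider view of the goal -> higher chance
HEADER_PENALTY = -0.8   # headers convert less often than kicked shots


def shot_xg(distance_m, angle_rad, is_header=False):
    """Toy probability (between 0 and 1) that a single shot becomes a goal."""
    z = INTERCEPT + PER_METER * distance_m + PER_RADIAN * angle_rad
    if is_header:
        z += HEADER_PENALTY
    # Logistic function squeezes the score into a probability.
    return 1.0 / (1.0 + math.exp(-z))


def total_xg(shots):
    """xG for a list of (distance_m, angle_rad, is_header) shots."""
    return sum(shot_xg(d, a, h) for d, a, h in shots)


# Made-up shots: a close-range chance, a 40 m punt, a header from 8 m.
shots = [(5.0, 1.2, False), (40.0, 0.18, False), (8.0, 0.7, True)]
for shot in shots:
    print(shot, round(shot_xg(*shot), 3))
print("total xG:", round(total_xg(shots), 3))
```

A real model would be fitted to many thousands of shots and would include the shooter, build-up and defensive context described in (b) and (c).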

2.      What goes up must come down: We are big proponents of mean reversion. Which is to say, you can get lucky for short periods of time, but eventually you will revert to your xG value.

a.      When a player first suddenly breaks from his xG line (remember, our xG takes into account that player’s accuracy and conversion for each shot type from each zone on the pitch), we’re inclined to agree with others who conclude it’s luck.

b.      Orange line is actual goals, blue line is xG – this is Harry Kane in 2014 – this led Nate Silver to call him the luckiest player in soccer:

xG Harry Kane
Data provided by Opta

c.      So how do you tell skill from luck? The best way is time. If a player is consistently outperforming his xG, over time we adjust their accuracy and conversion – effectively we start accepting they have improved. Look at Kane at the end of 2015 – our models expected more from him in 2015 based on his 2014 performance – and he still beat his xG line:

Orange line is actual goals, blue line is xG – this is Harry Kane in 2015:

xG Harry Kane
Data provided by Opta
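
One simple way to picture the “time tells skill from luck” idea is a shrinkage estimate, sketched below in Python. This is an illustration under our own assumptions, not the adjustment our models actually make: the function name `finishing_multiplier` and the `prior_strength` value of 20 are hypothetical, and the numbers in the example are invented, not Kane’s. The estimate starts every player at 1.0 (scoring exactly his xG) and only moves away from that as the evidence piles up.

```python
def finishing_multiplier(goals, xg, prior_strength=20.0):
    """Shrunken ratio of actual goals to expected goals.

    prior_strength is a hypothetical tuning knob: it acts like that many
    xG worth of "average finisher" evidence (goals == xG) mixed in with
    the player's own record.
    """
    return (goals + prior_strength) / (xg + prior_strength)


# Two invented players who have both scored double their xG.
short_run = finishing_multiplier(goals=10, xg=5)    # hot streak, small sample
long_run = finishing_multiplier(goals=40, xg=20)    # same ratio, 4x the sample

print(short_run)  # stays closer to 1.0: probably mostly luck
print(long_run)   # moves further from 1.0: starting to look like skill
```

The raw ratio is 2.0 in both cases, but the longer run earns the larger adjustment, which is the behavior described in (c).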

3.      Though this be madness, yet there is method in it: The big question is why. Why should you care about any of this? Because when you put all the pieces together (the above is a small part of the Sportify analytical system), you get a great way to assess every single thing about soccer – from players to teams to matchups.

a.      Team analysis – the performance indices let you get a snapshot of a team’s strengths, weaknesses and consistency:

Arsenal Performance 2015
Data provided by Opta

A snapshot view shows you that Arsenal have been a frustratingly inconsistent team this season.

b.      Player analysis – you can look at a league across a season (let’s look at EPL 2015) and get a snapshot of performance:

Premier League 2015 xG
Data provided by Opta

Lots to look at in the xG vs. goals across a season: Aguero’s amazing finishing (can he sustain this, or will this be an outlier for him?), Bony’s struggles in a new system, Vardy’s breakout year. There’s a treasure trove of insights at your fingertips.

c.      Moneyball in soccer! – Now layer on player salaries and values, and you can assess performance relative to price.

MLS Moneyball
Data provided by Opta

This is the MLS 2016 season. Colors show how much xG each player was involved in (either as an assister/creator or as the shooter), and the size of the circle is their cost. A lot of big red circles show wasted money; the small green circles show hidden value. Time for MLS to rethink their retirement home strategy?
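
The “performance relative to price” idea can be sketched in a few lines of Python. Everything here is hypothetical: the `Player` class, the `xg_per_million` metric and the three players with their numbers are invented for illustration and are not taken from the chart or from Opta data. The sketch just divides xG involvement by salary and ranks players by the result.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    xg_involved: float  # xG from shots taken plus shots created
    salary_usd: float


def xg_per_million(player):
    """xG involvement bought per $1M of salary (a hypothetical metric)."""
    if player.salary_usd <= 0:
        raise ValueError("salary must be positive")
    return player.xg_involved / (player.salary_usd / 1_000_000)


def rank_by_value(players):
    """Best value first."""
    return sorted(players, key=xg_per_million, reverse=True)


# Invented players, not real MLS data.
squad = [
    Player("Aging Star", xg_involved=12.0, salary_usd=6_000_000),
    Player("Academy Kid", xg_involved=8.0, salary_usd=200_000),
    Player("Solid Pro", xg_involved=10.0, salary_usd=1_000_000),
]

for p in rank_by_value(squad):
    print(p.name, round(xg_per_million(p), 1))
```

In chart terms, the names at the top of this ranking are the small green circles and the ones at the bottom are the big red ones.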