Busting the Myth of Moneyball in Soccer Statistics?

Over the past month or so Tim, @7amkickoff, and I have been having some great discussions about soccer, statistics, and how statistics can better communicate what is happening on the pitch beyond what supporters normally see.

I’m not sure we’ve cracked the nut completely but these discussions have spurred me to come up with some other ways to show the strengths and weaknesses of statistics in soccer and what key indicators may better tell the story of a team exclusive of Goals Scored or Goals Against.

My article today is an attempt to do that.

In setting the stage, I feel it is worth reinforcing that the pioneering of soccer statistics is not just about one or two people; I’m aware of many folks trying to help others better understand the nuance of soccer in a variety of different ways.

But with all that hard work, by people across the pond and now, more recently, here in the US, I think some of the well-intended efforts have strayed off the mark.

Why?  As much as it pains me to say this, I blame Moneyball: specifically, taking the event-based statistical thinking of baseball and trying to apply it to statistical measurement in soccer.

Soccer is not a game played in series (like baseball); it’s a fluid game played with continuous, sometimes random decision making, all with the intent to possess the ball, retain and move the ball, penetrate, create, take shots, put them on target and score goals.

And at any time a single decision, be it by a Coach, a Referee, an Assistant Referee, or a split-second choice by any player, with or without the ball, can influence the outcome of a game.

Therefore single statistics simply miss the mark in translating the nuance of soccer to the general supporter and, as such, are (on the surface) flawed if used alone to evaluate the market value of a player.

To put this into perspective, ignoring Coaching or Referee decisions, here’s a rundown of the three strongest Attacking correlations (r’s) for each team in the English Premier League.

Caveat:  Each statistic is measured either by volume (quantity) or by percentage of accuracy (quality), and each r is its correlation to Points Earned in the League Table over the span of 21 games, one game at a time; these are not Aggregate r’s.

Said another way, this is NOT a measurement relative to winning or losing… it’s a measurement relative to winning, drawing, or losing (points earned).
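To make that measurement concrete, here is a minimal Python sketch of a game-by-game r: one volume indicator and one ratio indicator, each correlated with points earned (3, 1 or 0) across a handful of games. The match numbers are invented for illustration and the function and variable names are assumptions of this sketch; it is not the actual PWP data or tooling.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical six-game sample for one team (illustrative numbers only).
shots_taken   = [12, 9, 15, 7, 11, 14]
shots_on_goal = [5, 2, 6, 1, 4, 3]
points_earned = [3, 0, 3, 0, 1, 1]   # win = 3, draw = 1, loss = 0

# Volume (quantity) indicator: Shots on Goal, one value per game.
r_volume = pearson_r(shots_on_goal, points_earned)

# Ratio (quality) indicator: Shots on Goal per Shots Taken, one value per game.
accuracy = [on / taken for on, taken in zip(shots_on_goal, shots_taken)]
r_ratio = pearson_r(accuracy, points_earned)

print(f"Shots on Goal r: {r_volume:.2f}")
print(f"Shots on Goal per Shots Taken r: {r_ratio:.2f}")
```

Because points earned takes three values rather than two, the same calculation distinguishes a draw from a loss, which a simple win/lose split would not.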

  • Chelsea: Goals Scored (.46) Shots on Goal per Shots Taken (.30) Shots on Goal (.18)
  • Burnley: Shots on Goal per Shots Taken (.46) Goals Scored (.44) Goals Scored per Shots on Goal (.40)
  • Man City: Opponent Total Passes (.59) Goals Scored (.58) Opponent Total Passes Completed (.54)
  • Newcastle: Goals Scored per Shots on Goal (.60) Goals Scored (.52) Passes Completed Final 1/3 per Passes Completed Entire Pitch (.49)
  • Southampton: Goals Scored (.58) Goals Scored per Shots on Goal (.58) Shots Taken per Passes Completed Final 1/3 (.46)
  • Liverpool: Goals Scored (.66) Shots Taken per Passes Completed Final 1/3 (.52) Opponent Possession Percentage (.43)
  • Crystal Palace: Goals Scored (.67) Shots Taken per Passes Completed Final 1/3 (.62) Shots Taken (.53)
  • Arsenal: Goals Scored (.60) Goals Scored per Shots on Goal (.41) Shots Taken per Passes Completed Final 1/3 (.20)
  • Spurs: Goals Scored (.67) Goals Scored per Shots on Goal (.65) Shots on Goal (.46)
  • West Ham: Goals Scored (.76) Shots on Goal (.44) Goals Scored per Shots on Goal (.43)
  • Sunderland: Goals Scored (.61) Goals Scored per Shots on Goal (.40) Passes Completed Final 1/3 per Passes Completed Entire Pitch (.38)
  • West Brom: Shots on Goal per Shots Taken (.45) Passes Completed Final 1/3 per Passes Completed Entire Pitch (.45) Goals Scored (.44)
  • Aston Villa: Goals Scored (.76) Goals Scored per Shots on Goal (.46) Shots on Goal per Shots Taken (.41)
  • Stoke City: Goals Scored per Shots on Goal (.81) Goals Scored (.68) Opponent Possession Percentage (.50)
  • Hull City: Goals Scored (.63) Goals Scored per Shots on Goal (.59) Shots on Goal (.36)
  • QPR: Goals Scored (.68) Passes Completed Final 1/3 per Passes Completed Entire Pitch (.56) Shots on Goal (.44)
  • Everton: Shots Taken per Passes Completed Final 1/3 (.71) Goals Scored (.60) Goals Scored per Shots on Goal (.34)
  • Leicester City: Goals Scored per Shots on Goal (.74) Goals Scored (.53) Opponent Possession Percentage (.31)
  • Swansea: Shots on Goal per Shots Taken (.52) Goals Scored (.48) Shots on Goal (.40)
  • Man United: Goals Scored per Shots on Goal (.69) Goals Scored (.62) Shots on Goal per Shots Taken (.28)

What’s that mean?

For the most part what this means is that no two teams show the same consistency of pattern in what single (game to game) quantity or quality indicators best represent team performance in Attacking.

Therefore – the individual player statistics behind these values have a different meaning (amount of influence) in whether a team wins, draws, or loses.

In addition, while Goals Scored appears as a relevant indicator, it is not the most relevant indicator for every team, which reinforces that teams, in attacking, behave differently with respect to earning points in the League Table.

Of additional note is that the Goals Scored r for eight of those teams is less than .60, and only two teams show a Goals Scored r greater than .70.
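Reading those counts as referring to each team's Goals Scored r, they can be checked directly against the list above. A short Python sketch follows; the values are copied from the team list, while the variable names are just for illustration.

```python
# Goals Scored r for each team, copied from the team-by-team list above.
goals_scored_r = {
    "Chelsea": 0.46, "Burnley": 0.44, "Man City": 0.58, "Newcastle": 0.52,
    "Southampton": 0.58, "Liverpool": 0.66, "Crystal Palace": 0.67,
    "Arsenal": 0.60, "Spurs": 0.67, "West Ham": 0.76, "Sunderland": 0.61,
    "West Brom": 0.44, "Aston Villa": 0.76, "Stoke City": 0.68,
    "Hull City": 0.63, "QPR": 0.68, "Everton": 0.60, "Leicester City": 0.53,
    "Swansea": 0.48, "Man United": 0.62,
}

# Teams whose Goals Scored r falls strictly below .60 or strictly above .70.
below_60 = sorted(team for team, r in goals_scored_r.items() if r < 0.60)
above_70 = sorted(team for team, r in goals_scored_r.items() if r > 0.70)

print(len(below_60), "teams below .60:", below_60)
print(len(above_70), "teams above .70:", above_70)
```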

Finally, the single indicators (either by volume or by ratio) that fit into the top three, exclusive of Goals Scored, are:

  • Goals Scored per Shots on Goal (thirteen times)
  • Shots on Goal per Shots Taken (six times)
  • Shots on Goal (six times)
  • Shots Taken per Passes Completed Final 1/3 (five times)
  • Passes Completed Final 1/3 per Passes Completed Entire Pitch (four times)
  • Opponent Possession Percentage (three times)
  • Opponent Total Passes (once)
  • Opponent Total Passes Completed (once)
  • Shots Taken (once)
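The tally above can be reproduced from the team-by-team list with a simple counter. In this Python sketch the top-three entries are transcribed from that list, but the short keys (GS, GS/SoG and so on) are abbreviations invented here to keep the table readable.

```python
from collections import Counter

# Abbreviations (assumed for this sketch only):
# GS = Goals Scored, GS/SoG = Goals Scored per Shots on Goal,
# SoG/ST = Shots on Goal per Shots Taken, SoG = Shots on Goal,
# ST/PCF3 = Shots Taken per Passes Completed Final 1/3,
# PCF3/PC = Passes Completed Final 1/3 per Passes Completed Entire Pitch,
# OppPoss = Opponent Possession Percentage, OppTP = Opponent Total Passes,
# OppTPC = Opponent Total Passes Completed, ST = Shots Taken.
top_three = {
    "Chelsea": ["GS", "SoG/ST", "SoG"],
    "Burnley": ["SoG/ST", "GS", "GS/SoG"],
    "Man City": ["OppTP", "GS", "OppTPC"],
    "Newcastle": ["GS/SoG", "GS", "PCF3/PC"],
    "Southampton": ["GS", "GS/SoG", "ST/PCF3"],
    "Liverpool": ["GS", "ST/PCF3", "OppPoss"],
    "Crystal Palace": ["GS", "ST/PCF3", "ST"],
    "Arsenal": ["GS", "GS/SoG", "ST/PCF3"],
    "Spurs": ["GS", "GS/SoG", "SoG"],
    "West Ham": ["GS", "SoG", "GS/SoG"],
    "Sunderland": ["GS", "GS/SoG", "PCF3/PC"],
    "West Brom": ["SoG/ST", "PCF3/PC", "GS"],
    "Aston Villa": ["GS", "GS/SoG", "SoG/ST"],
    "Stoke City": ["GS/SoG", "GS", "OppPoss"],
    "Hull City": ["GS", "GS/SoG", "SoG"],
    "QPR": ["GS", "PCF3/PC", "SoG"],
    "Everton": ["ST/PCF3", "GS", "GS/SoG"],
    "Leicester City": ["GS/SoG", "GS", "OppPoss"],
    "Swansea": ["SoG/ST", "GS", "SoG"],
    "Man United": ["GS/SoG", "GS", "SoG/ST"],
}

# Count every top-three appearance, excluding Goals Scored itself.
counts = Counter(
    ind for inds in top_three.values() for ind in inds if ind != "GS"
)

# How many teams have Goals Scored as their single strongest indicator?
gs_first = sum(1 for inds in top_three.values() if inds[0] == "GS")

for indicator, n in counts.most_common():
    print(f"{indicator}: {n}")
print(f"Goals Scored is listed first for {gs_first} of {len(top_three)} teams")
```

The last line puts a number on the earlier point that Goals Scored, while always present, is not every team's strongest indicator.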

What’s intriguing is that three Defending Indicators appear: Opponent Possession Percentage, Opponent Total Passes and Opponent Total Passes Completed.

With all that variety in attacking r values, it’s pretty clear it simply isn’t all about scoring goals (getting a man on base and moving him forward)… therefore any market value assigned to a player should be questioned if it doesn’t consider outside factors that influence output…

In other words, it’s about a variety of different ways and means to do well, even (in a small way) about not possessing the ball, so even passing accuracy is influenced, somewhat, by a head coach’s tactical decisions.

But wait, there’s more:

All those indicators above show the top three r’s for a team when attacking.

There’s a whole side of the game that is missed with those – and that’s defending.

So here are the top three negative (inverse) r’s compared to Points Earned in the League Table for each team in the English Premier League:

  • Chelsea: Opponent Goals Scored (-.51) Opponent Shots on Goal (-.43) Opponent % of Successful Passes Final 1/3 (-.42)
  • Burnley: Opponent Goals Scored (-.59) Total Passes Completed (-.55) Total Passes (-.54)
  • Man City: Opponent Goals Scored per Shots on Goal (-.67) Opponent Goals Scored (-.53) Opponent Passes Completed Final 1/3 per Passes Completed Entire Pitch (-.43)
  • Newcastle: Opponent Goals Scored (-.57) Passing Accuracy (-.44) Total Passes (-.43)
  • Southampton: Opponent Goals Scored (-.72) Opponent Goals Scored per Shots on Goal (-.63) Opponent Shots on Goal per Shots Taken (-.35)
  • Liverpool: Opponent Goals Scored per Shots on Goal (-.67) Opponent Goals Scored (-.60) Passing Accuracy (-.46)
  • Crystal Palace: Opponent Goals Scored (-.42) Opponent Shots on Goal (-.37) Opponent Shots on Goal per Shots Taken (-.41)
  • Arsenal: Opponent Goals Scored (-.86) Opponent Goals Scored per Shots on Goal (-.64) Opponent Shots Taken (-.47)
  • Spurs: Opponent Goals Scored (-.52) Opponent Shots on Goal (-.43) Opponent Shots on Goal per Shots Taken (-.42)
  • West Ham: Opponent Goals Scored (-.66) Opponent Goals Scored per Shots on Goal (-.50) Opponent Shots on Goal (-.48)
  • Sunderland: Opponent Shots Taken (-.50) Total Passes Completed (-.40) Total Passes (-.39)
  • West Brom: Opponent Goals Scored (-.80) Opponent Goals Scored per Shots on Goal (-.64) Opponent Shots on Goal (-.57)
  • Aston Villa: Opponent Goals Scored (-.60) Opponent Goals Scored per Shots on Goal (-.55) Passing Accuracy (-.37)
  • Stoke City: Opponent Goals Scored per Shots on Goal (-.70) Total Passes (-.60) Total Passes Completed (-.60)
  • Hull City: Opponent Goals Scored per Shots on Goal (-.60) Opponent Goals Scored (-.57) Opponent Total Passes (-.40)
  • QPR: Opponent Goals Scored (-.55) Opponent Goals Scored per Shots on Goal (-.42) Opponent Shots on Goal (-.35)
  • Everton: Opponent Goals Scored (-.57) Passes Completed Final 1/3 per Passes Completed Entire Pitch (-.56) Opponent Goals Scored per Shots on Goal (-.52)
  • Leicester City: Opponent Shots on Goal per Shots Taken (-.54) Opponent Goals Scored (-.47) Opponent Goals Scored per Shots on Goal (-.42)
  • Swansea: Opponent Goals Scored (-.72) Opponent Goals Scored per Shots on Goal (-.66) Opponent Shots on Goal (-.59)
  • Man United: Opponent Goals Scored per Shots on Goal (-.57) Opponent Goals Scored (-.47) Passes Completed Final 1/3 per Passes Completed Entire Pitch (-.36)

What’s that mean?

Again, for the most part, no two teams show the same consistency of pattern in what single (game to game) quantity or quality indicators best represent team performance in Defending.

Therefore – the individual player statistics behind these values have a different meaning (amount of influence) in whether a team wins, draws, or loses.

In addition, while Opponent Goals Scored appears as a relevant indicator, it is not the most relevant indicator for every team, which reinforces that teams, in defending, behave differently with respect to earning points in the League Table.

Of additional note is that the Opponent Goals Scored r for eleven of those teams is weaker than -.60 (closer to zero), and only four teams show an r stronger than -.70.

Also, Opponent Goals Scored does not appear in the top three single defending indicators for two teams, Stoke City and Sunderland.
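With negative r's, "less than" and "greater than" are easy to misread, so it helps to compare on strength (absolute value). The Python sketch below uses the Opponent Goals Scored values copied from the list above; Stoke City and Sunderland are left out because that indicator is not in their top three. The variable names are assumptions of this sketch.

```python
# Opponent Goals Scored r, copied from the defending list above (18 teams).
opp_goals_r = {
    "Chelsea": -0.51, "Burnley": -0.59, "Man City": -0.53, "Newcastle": -0.57,
    "Southampton": -0.72, "Liverpool": -0.60, "Crystal Palace": -0.42,
    "Arsenal": -0.86, "Spurs": -0.52, "West Ham": -0.66, "West Brom": -0.80,
    "Aston Villa": -0.60, "Hull City": -0.57, "QPR": -0.55, "Everton": -0.57,
    "Leicester City": -0.47, "Swansea": -0.72, "Man United": -0.47,
}

# Strength of an inverse relationship is the size of r, ignoring its sign.
weaker_than_60 = [t for t, r in opp_goals_r.items() if abs(r) < 0.60]
stronger_than_70 = [t for t, r in opp_goals_r.items() if abs(r) > 0.70]

# A literal numeric reading of "less than -.60" selects a different group.
literally_below = [t for t, r in opp_goals_r.items() if r < -0.60]

print(len(weaker_than_60), "teams weaker than -.60")
print(len(stronger_than_70), "teams stronger than -.70:", stronger_than_70)
print(len(literally_below), "teams numerically below -.60")
```

Comparing on absolute value gives the eleven and four quoted above, whereas the literal numeric comparison picks out only the five teams with the strongest inverse relationship.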

Finally, the single indicators (either by volume or by ratio) that fit into the top three, exclusive of Opponent Goals Scored, are:

  • Opponent Goals Scored per Shots on Goal (fourteen times)
  • Opponent Shots on Goal (seven times)
  • Opponent Shots on Goal per Shots Taken (four times)
  • Total Passes (four times)
  • Total Passes Completed (three times)
  • Passing Accuracy (three times)
  • Passes Completed Final 1/3 per Passes Completed Entire Pitch (twice)
  • Opponent Percentage of Successful Passes Final 1/3 (once)
  • Opponent Shots Taken (twice)
  • Opponent Total Passes (once)
  • Opponent Passes Completed Final 1/3 per Passes Completed Entire Pitch (once)

A few thoughts here to go with some of these indicators:

Most recognize that a negative r means there is an inverse relationship – in other words you get more with less or you get less with more.

What is intriguing is that Attacking Total Passes appears four times, while Attacking Total Passes Completed and Attacking Passing Accuracy each appear three times.

Meaning, as those teams have fewer overall Passes Attempted, fewer Passes Completed or lower Accuracy, they are more likely to earn points.  Imagine that sort of logic applying to baseball, where a team that sometimes puts fewer men on base is more likely to win!

Finally, with the variety of defending r values it also seems pretty clear that earning points is not just about putting a man on base and moving him forward, and in some cases it may even be about not possessing the ball!?!

In Closing:

Single statistics have value – but they should be offered up, in context, with relation to other things that occur in the game of soccer.

Not enough writers do that – they simply offer up individual statistics as if they are the panacea of greatness… the more they do this the more ingrained most soccer supporters become in individual statistics that over-value a player.

And the more the media does it the more likely the supporters will become disenchanted with front office decisions that don’t make sense based upon those high-visibility individual statistics…

I’m not a Moneyball guy for soccer – never have been – and to me that line of thinking is flawed (as it applies to individual statistics in baseball).

What’s that mean??? (Editorial)

After a great question offered up in the comments section I think I should clarify what I mean by that with respect to soccer.

When I read Moneyball I was more focused on the individual statistics part of the game that were used to generate market value than the ‘economic state’ of buying and selling players that might lead to more wins…

That being said, I am not saying that you can’t measure the value of a player in soccer – it can be done but it needs to be done after considering teammates, opposing players, and at least the Head Coach of the team the player plays for.

Modern day soccer statistics, for the most part, don’t measure the appropriate level of influence of teammates, opposing players, and Head Coaching tactics. As such, when I say I’m not a Moneyball guy when it comes to soccer, it really means I don’t buy all that crap about tackles, clearances, goals scored, etc…

I value players relative to team outputs, and I strongly feel and think that the more media and supporters understand this about soccer, the less frustration they will have in blaming or praising one individual player over another.

I hope that makes sense???

Anyhow, an example if you will…

A player with many tackles or clearances is simply a player with many tackles or clearances – it doesn’t mean they are better or worse than another player with fewer tackles or fewer clearances.

And… actually, I could make a reasonable argument that a player with many tackles or clearances is actually a worse player… why?

For one reason – if an opposing head coach knows that a player on the other side is weak – what do you think that head coach will want his players to do?

Drive or pass the ball towards the weaker player. As such, that extra traffic will naturally increase the weaker player’s tackles and clearances, simply because of increased volume!!!!

Bottom line here is that individual tackles or clearances can be over-valued or under-valued – as such – as an individual statistic its relevance to a player being better or worse than another player is flawed…
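A rough sketch of that argument in Python, using made-up numbers: the `attacks_faced` field is a hypothetical count of how often opponents ran or passed at a defender, not a statistic the public data providers publish, and the class and field names are invented for this example.

```python
from dataclasses import dataclass

@dataclass
class DefenderGame:
    name: str
    attacks_faced: int   # hypothetical: times opponents attacked this player
    tackles_won: int

    @property
    def tackle_rate(self) -> float:
        """Tackles won per attack faced (0.0 if the player was never attacked)."""
        return self.tackles_won / self.attacks_faced if self.attacks_faced else 0.0

# Illustrative numbers only: one defender is targeted, the other is avoided.
targeted = DefenderGame("Targeted defender", attacks_faced=20, tackles_won=9)
avoided = DefenderGame("Avoided defender", attacks_faced=8, tackles_won=6)

most_tackles = max((targeted, avoided), key=lambda d: d.tackles_won)
best_rate = max((targeted, avoided), key=lambda d: d.tackle_rate)

for d in (targeted, avoided):
    print(f"{d.name}: {d.tackles_won} tackles, rate {d.tackle_rate:.2f}")
print("Most tackles:", most_tackles.name)
print("Best rate:", best_rate.name)
```

On raw volume the targeted defender looks better; once the count is set against how often each player was attacked, the ranking flips.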

However viewed…

I would offer that more individual statistics need to be created for players, ones that better reflect how those statistics relate to points earned.

It’s that type of reporting and analysis that should help others better understand the nuance of soccer and that it isn’t just all about scoring goals.

Best, Chris

COPYRIGHT, All Rights Reserved.  PWP – Trademark

You can follow me on twitter @chrisgluckpwp

15 thoughts on “Busting the Myth of Moneyball in Soccer Statistics?”

  1. “I’m not a Moneyball guy for soccer – never have been – and to me that line of thinking is flawed.”

    To say that the line of thinking of Moneyball is flawed seems to suggest you are using a different definition. “Moneyball” is not synonymous with getting guys on base, or whatever soccer analogy you want to use. Moneyball is a basic economics concept that suggests the transfer (free agent) market occasionally values a particular skill or quality more than its on-field value suggests. In baseball, one such skill happens to be drawing walks.

    By saying that Moneyball can’t apply to soccer, you’re saying that there’s no way that we can identify undervalued and overvalued players. I agree that looking at tackling or interception rate in isolation is a flawed way of measuring individual talent, due to all kinds of things that could cause that to happen (gets a lot of tackles because he gets beat a lot). But to throw up your hands and say we can never measure player ability is giving up too early, in my opinion.

    “Meaning, as those teams have less overall Passes Attempted, Passes Completed or lower Accuracy they are more likely to earn points. Imagine that sort of logic applying to baseball – where a team who, sometimes, puts less men on base is more likely to win!”

    Teams that win tend to have worse shot rates than teams that lose (relatively) because teams that win turtle up at the end and allow an onslaught of shots from distance. These negative correlations are being muddled by the gamestate, and don’t negate the value of shots to the analysis of player and team quality.

  2. Matt, You are right! But many equate Moneyball with individual statistics that drive overall team performance in winning and losing… even after reading the book twice I certainly did…

    What I am saying, and perhaps in a muddled kind of way, is that the individual statistics of a player have nuanced value relative to the overall team performance and the End State (winning)… thereby rendering the market value of the player, generated by tracking their individual statistics, flawed…

    There are different levels of flawed I suppose but where the market value for soccer players is strictly based upon individual statistics, without considering the influence of coaches, referees, teammates or opponents, it is flawed – in a big way… meaning for me the concept of Moneyball, as it applies to baseball (for soccer) is inappropriate…

    I will clarify that in my article – so thanks 🙂

    By Shot rates do you mean Shots Taken to Goals Scored?

    If so, an important distinction – I don’t measure TSR as I work from the premise that in between shots taken and goals scored there are three questions that need to be answered first… is the shot taken on goal, was the shot taken blocked, or was the shot taken saved.

    Another distinction – I don’t measure statistics relative to winning or losing – I measure team performance relative to points earned…

    I’m not sure that scratches your itch Matty but it’s good to hear from you again…

    1. I agree that it’s harder to use individual statistics to measure value in soccer. In baseball, a player wanting to get on base more doesn’t hinder the team in any way. But in soccer, wanting to shoot more can definitely take away from the team in some cases.

      But individual metrics can be derived from a combination of player events, and team success when players take part in that event successfully. If we learn that teams succeed a lot when their defenders win a high percentage of headers, then we can attempt to assign value to those headers. It’s not as easy in soccer, but it shouldn’t be impossible.

      As for shots, I’m referring specifically to something like TSR, but the point can be made for any shooting stat. I wrote about it here: http://www.americansocceranalysis.com/home/2014/12/18/shots-confusion-in-correlations

      Taking high quality shots in high quantities is what all teams want ideally (and the opposite on defense). So players that help to produce more, better shots for, and fewer, worse shots against must be valuable players. However, the in-game correlations will lead to faulty conclusions about the value of shots, as shown in the article linked.

      1. Matty, As always I value our chats – please bear with me 🙂 Your thoughts have given me an opportunity to circle back to both market value and what I think and feel should be new statistics to better measure activities on the pitch that I think will satisfy both of us…

        I read your article – I think we have two completely different approaches…

        Within PWP the data inputs are percentages derived from past events developed through ratios associated with the process of soccer as I see it.

        I don’t measure predictability as such for shots for one primary reason – I simply do not agree that shots taken are ‘repeatable’ because no two shots are really the same… granted shot locations may be the same but the actual physical contact and the environment in which the shot occurs are different – with one exception – penalty kicks…

        In terms of variation – as expected, the variation for the ratios of shots taken to shots on goal and shots on goal to goals scored is quite high – that is more due to the lower volume/frequency of the event in the game than anything else. I don’t measure TSR…

        Penetration, Passing Accuracy, Possession and Creation of Shots relative to Penetration all have far lower variations… better teams show lower variation – for the most part but the greater variations are more associated with home games versus away games…

        The other reason I stay away from predictability is due to low overall frequency of the primary events (major sample points) – games played… As I mentioned – I do not see shots as being repeatable and (it’s just me) but I don’t really rely upon predictability (forecasting) because I feel and think you need at least 15 games worth of data (both home and away) to establish a reasonable predictability model… each game is different – each season is different – and sometimes head coaches change too….

        I agree 100% that game states can impact volume – and that in turn – (may?) impact accuracy but over the course of a season poor team performance, no matter what volume of activity, shows up as a regular occurring event, regardless of volume, for teams that fail to earn points… So – in looking at the game as a whole – with the measurements within PWP game state is not a ‘game changer’ — I don’t measure game state yet the aggregate R2 remains .90 or higher… and even for an individual team the R2 has been recorded as high as .92 for Southampton and .90 for Burnley – other teams it varies – but if game state had a huge impact in my analysis (and it wasn’t measured) then I would offer my R2 would be far lower…

        Anyhow, I digress… I completely understand and get your approach in looking at a sequence of events – I just simply disagree as I see the label of a key pass being attributed to the attacking player (as a success) when in fact the outcome of that pass might better be attributed to poor defending by a defender and not a ‘great ball’ by the attacker — this gets back to what does and doesn’t get measured in soccer…

        Great positional defending is not measured – nor is poor positional defending – hence my call to create new statistics – (open pass – hindered pass – open shot – hindered shot) this way you can attribute great attacking key passes and shots to great attacking play (great being attributed to higher success when being hindered) versus a regular pass that becomes ‘great’ simply because the pass or shot was open… when in fact it wasn’t a great pass it was just crap defending…

        Imagine if you could quantify strikers or defenders/midfielders (with public statistics) with a success rate attributed to both hindered activities and open activities.

        If you want real market value statistics it would seem to me those would be them…

        Hope that helps???

        Best, Chris

  3. To be honest I never heard of ‘aggregate R2’ until Alex Olshanksy used it to support his research about possession… but here’s how I would explain it based upon what he said – aggregate R2 shows that possession compared to points earned for ‘teams as a whole’ has an aggregate R2 of .77……..

    When I measure possession specifically for a team, in a game by game event based approach, possession (alone) has very little correlation to points earned – the single game statistic with the best game to game correlation is usually goals scored or goals against – as noted above in the article – though goals against for a team like Arsenal is better than Goals Scored…

    So for me, that (goals scored) logic (for attack) supports your theory that goals scored has relevance and (if considered a repeatable statistic) means your approach and logic makes sense… hope that helps???

    Overall, I think we agree to disagree that shots are repeatable – I simply believe they are not repeatable – therefore I won’t use them as a predictive tool… And even as my missus pointed out to me tonight – some PK’s are not repeatable either – for instance a PK taken in the snow or pouring rain is different than a PK taken in 90 degree heat…

    I suppose the other difference, for me, is that TSR is not an ideal statistic as it measures total shots taken to total goals scored – in my view that completely misses out the quality aspect associated with the strike itself – is it on target, whether or not the shot is blocked – good positional play by the defender – or whether or not it is saved – good positional play by the goal keeper…

    If TSR is not a ratio of shots taken to goals scored then I completely misunderstand what it is and should read about it again… it wouldn’t be the first time I misunderstood work by others 🙂

  4. Matty, one last thought for the evening – my CPWP Predictability Index (a title best attributed to some guys from Prozone as well as Ben Knapper and myself) has an R2 of .84 relative to points earned in the English Premier League Table (through week 22) and that Index ‘excludes’ goals scored per shots on goal.

    It was agreed that the best way to create a predictability model from PWP was to omit goals scored and goals against. But I don’t rely on it to actually predict outcomes because I don’t have 15 home games and 15 away games (the minimum sample size I need to validate the tool as a forecasting model). As you say in your article, my approach is evaluating past events to ‘explain’, not forecast…

    And since I don’t see shots taken as being repeatable I can’t simply use shots taken either… because there is no way to ensure at least 15 shots taken were taken under the exact same circumstance, with the same defending players in the same exact defending positions while also having the ball struck exactly the same from exactly the same place…
