Moneyball 3 Soccer Statistics – Major League Soccer (2014 to 2016 in Review)

Each year for the last two years I’ve offered some thoughts on current soccer statistics: their strengths and weaknesses, how current efforts are taking shape, and how English Premier League team statistics (in a limited measurement period) show that one size / type of statistic doesn’t speak equally for every team.

Here are links to those previous articles in case you missed them.

  1. Moneyball 2 – Soccer Statistics and Taking it to the Next Level (Published March 2016) (2nd most read article in 2016)
  2. Busting the Myth of Moneyball in Soccer Statistics   (Published Jan 2015)  (2nd most read article in 2015 and 4th most read article in 2016)

My intent with this latest review is to take a comprehensive look at team soccer statistics offered for Major League Soccer (from 2014 to 2016) and prove (beyond all doubt) that modern team soccer statistics do not, and never will, speak equally in value for all teams in one league in one year.

  • That’s a heavy load and it potentially flies in the face of individual fantasy statistics and how they’re used to calculate individual player performance.
  • It also brings into question the real value of individual player data large soccer statistic companies use to create a baseline market value for a player.
  • Finally, and for me perhaps most important, this information should shine a very bright light on the weaknesses in how mainstream media tell you a story about one team/player as if it had relevance to another team/player.

To begin, my research takes into account every regular season game played in Major League Soccer for 2014, 2015, and 2016.

  • The premise is to conduct the same analysis I conducted in my smaller sample size of the English Premier League published in January 2015.
  • I will look at team statistics (quantitative and qualitative data) for each team, for each year, and determine which of those statistics had greater or lesser correlation to those teams earning points in the league table.
  • These measurements shall be considered lagging indicators first, but if patterns appear (team to team, and year to year where the head coach has been the same) then I would offer that these same measurements may also be interpreted as leading indicators.
  • All told there are 2024 team performances across 1012 games played, with roughly every team playing 102 games (51 home and 51 away); this excludes the newer teams like New York City FC and Orlando City FC.
  • Categories measured for correlation to points earned, for both teams in every game (1012 games played), include: Total Passes, Passing Accuracy, Possession Percentage, Total Passes Successful/Unsuccessful, Total Final Third Passes, Final Third Passes Successful/Unsuccessful, Percentage of Successful/Unsuccessful Passes Final Third, Passes Successful/Unsuccessful Outside the Final Third, Passes Successful in Final Third versus Passes Successful Entire Pitch, Shots Taken, Percentage of Shots Taken versus Successful Final Third Passes, Shots Blocked, Shots on Goal, Percentage of Shots on Goal versus Shots Taken, Goals Scored, Percentage of Goals Scored versus Shots on Goal, Percentage of Goals Scored versus Shots Taken (ATSR), Percentage of Opponent Goals Scored versus Shots Taken (DTSR), Composite Goals Scored versus Shots Taken (CTSR), Goals Saved, Save Percentage, Location, APWP Index, DPWP Index, CPWP Index, Goal Differential, Crosses, Percentage of Successful/Unsuccessful Crosses, Corners, Duels Won, Tackles Won, Clearances, Fouls, Yellow Cards, Red Cards, and PKs Awarded.
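As a minimal sketch of the kind of calculation described above, the snippet below computes a Pearson r between one team statistic and points earned. The rows are made-up numbers, not MLS data, and treating each team-game as one observation is an assumption; the names `pearson_r`, `points_earned`, and `games` are hypothetical and not taken from the original analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def points_earned(goals_for, goals_against):
    """League points for one game: 3 for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0


# Hypothetical team-game rows (illustrative only):
# (goals_for, goals_against, possession_pct)
games = [
    (3, 0, 48.0),
    (1, 1, 55.0),
    (0, 1, 61.0),
    (3, 1, 45.0),
    (0, 0, 52.0),
]

points = [points_earned(gf, ga) for gf, ga, _ in games]
goal_diff = [gf - ga for gf, ga, _ in games]
possession = [poss for _, _, poss in games]

r_goal_diff = pearson_r(goal_diff, points)
r_possession = pearson_r(possession, points)
print(f"goal differential vs points: r = {r_goal_diff:.2f}")
print(f"possession % vs points:      r = {r_possession:.2f}")
```

The same function can be run once per category in the list above, which is all a statistic-by-statistic correlation table amounts to.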

The results (by team):

  • Before offering up those categories that had the best positive or negative correlation to points earned (by team), here’s a look at what I call “noise”: team statistics that, taken separately, show no substantive correlation to points earned.  Statistically speaking, these are all the measurements with an r to points earned between -.3 and .3.  In no particular order, here are the results:
    • Clearances
    • Opponent Unsuccessful Crosses
    • Opponent Crosses
    • Shots Taken divided by Passes Completed in the Final Third
    • Opponent Unsuccessful Passes Final Third
    • Attacking Total Shot Ratio
    • PKs Awarded
    • Opponent Red Cards
    • Shots Taken
    • Percentage of Clearances to Opponent Final Third Crosses and Corners
    • Percentage of Successful Passes Final Third
    • Opponent Percentage of Passes Completed Outside the Final Third
    • Opponent Possession Percentage
    • Percentage of Successful Crosses
    • Opponent Total Final Third Passes
    • Opponent Total Passes
    • Opponent Successful Crosses
    • Opponent Corners
    • Opponent Passes Completed Outside the Final Third
    • Opponent Shots Blocked
    • Opponent Total Passes Unsuccessful
    • Goals Saved
    • Opponent Passes Attempted Outside the Final Third
    • Tackles Won
    • Fouls
    • Opponent Passing Accuracy
    • Opponent Final Third Passes Successful
    • Duels Won
    • Passes Completed Final Third versus Passes Completed Entire Pitch
    • Opponent Fouls
    • Opponent Percentage of Unsuccessful Passes
    • Passes Completed Outside the Final Third
    • Saves
    • Opponent Duels Won
    • Opponent Tackles Won
    • Final Third Passes Successful
    • Total Passes Completed
    • Opponent Passes not Completed Outside the Final Third
    • Shots Blocked
    • Total Passes Unsuccessful
    • Possession Percentage
    • Opponent Percentage of Successful Passes Final Third
    • Successful Crosses
    • Total Final Third Passes
    • Opponent Cross Completion Percentage
    • Opponent Shots Taken
    • Opponent PKs
    • Red Cards
    • Defending Total Shot Ratio
    • Opponent Shots Taken versus Passes Completed Final Third
    • Final Third Passes Unsuccessful
    • Crosses
    • Unsuccessful Crosses
    • Opponent Clearances
    • Opponent Shots on Goal versus Shots Taken
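The “noise” cutoff can be sketched as a simple filter, assuming the r values have already been computed. The three values for Goal Differential, Goals Scored, and Opponent Goals Scored come from this article; the values for Possession Percentage and Clearances are hypothetical placeholders, since the article only reports that they fall inside the ±.3 band. The function names are illustrative.

```python
NOISE_THRESHOLD = 0.3  # cutoff used in this article: r strictly inside ±.3 is "noise"


def classify(r, threshold=NOISE_THRESHOLD):
    """Label a correlation as positive, negative, or noise."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "noise"


def split_by_correlation(r_by_stat, threshold=NOISE_THRESHOLD):
    """Group statistic names by label, strongest correlation first in each group."""
    groups = {"positive": [], "negative": [], "noise": []}
    for stat, r in r_by_stat.items():
        groups[classify(r, threshold)].append(stat)
    for names in groups.values():
        names.sort(key=lambda stat: abs(r_by_stat[stat]), reverse=True)
    return groups


r_by_stat = {
    "Goal Differential": 0.86,       # reported in this article
    "Goals Scored": 0.63,            # reported in this article
    "Opponent Goals Scored": -0.58,  # reported in this article
    "Possession Percentage": 0.05,   # hypothetical placeholder value
    "Clearances": -0.10,             # hypothetical placeholder value
}

groups = split_by_correlation(r_by_stat)
for label, names in groups.items():
    print(label, names)
```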

My thoughts on this initial listing:

This is a pretty comprehensive list of standard and derived statistics many may be familiar with.

Perhaps most damning of all are the team statistics built from the individual statistics used to substantiate a player’s value (either in fantasy soccer or real-time statistics).

Note the inclusion of clearances, tackles won, duels won, yellow cards, passing accuracy, crosses, fouls – and most importantly – the favorite among television pundits – possession percentage.

It may be harsh or cruel to offer this summation, but when looking at team statistics on television, or even in news articles, the inclusion of any one of these statistics without relation to other statistics is flawed – seriously flawed.

If the United States is going to get better at understanding and playing soccer at the world level, then the pundits on TV and in the mainstream media who offer this nonsense need to take notice.

After all, it’s these folks that have the ears of our younger players, and the more erroneous information those players are fed, the less likely it is that this country will ever reach the heights of Germany, France, Brazil, Spain, England or others…

Now for those team statistics that offer the best positive correlation to points earned; in order of importance/correlation to points earned in Major League Soccer:

  1. Goal Differential – the gold standard:  r .86
  2. Composite Possession with Purpose Index – the silver standard:  r .74
  3. Goals Scored – the bronze standard: r .63
  4. Those with an r greater than .3 but below the bronze standard include Possession with Purpose Attacking Index (.53), Goals Scored versus Shots on Goal (.44), Composite Total Shot Ratio (.38), Shots on Goal (.35), and Shots on Goal versus Shots Taken (.32)

Note the Composite Possession with Purpose Index is a composite of 6 qualitative team statistics – showing that it’s a combination of measured quality (across the pitch) that exceeds the simplified measurement of goals scored.

In other words – goals alone are not good enough – you need some form of qualified team possession, penetration, shot creation, shots taken, shots on goal, and goals scored to win a game.  And the good teams (those that usually earn points) are very “consistent” in offering that combination (chemistry) of events.

But there are two sides to soccer – here are the team statistics with the best negative correlation to points earned, in order of importance/correlation to points earned in Major League Soccer:

  1. Opponent Goals Scored – the gold standard:  r -.58
  2. Possession with Purpose Defending Index – the silver standard: r -.50
  3. Opponent Goals Scored versus Shots on Goal – the bronze standard: r -.45
  4. Those less than -.3 but weaker than the bronze standard include Save Percentage (-.42) and Opponent Shots on Goal (-.32)

Finally, there are also two locations in soccer – home and away.  Do teams show different correlations to these same measurements when playing at home versus away?  The common belief is yes; let’s see.

  1. The gold standard for home games shows Goal Differential as having the highest correlation (.86)
  2. The silver standard, again Composite PWP Index, is next up at (.71)
  3. The bronze standard, too, is the same; Goals Scored at (.61)
  4. Those with an r greater than .3 include PWP Attacking Index (.57), Goals Scored versus Shots on Goal (.51), Composite TSR (.49), Shots on Goal versus Shots Taken (.41), and Opponent Unsuccessful Crosses and Opponent Crosses, both at (.31).
  5. The last two are new above .3 and replace Shots on Goal.

Again the Composite PWP Index shows itself as a stronger summation of success related to points earned than other individual team statistics (excluding goal differential).  What is interesting is the emergence of Opponent Crosses, and their failure, as showing correlation to the home team winning.  Does this offer a tactical piece of evidence that teams can take advantage of when playing at home?

  1. The gold standard team statistic with the best negative correlation to points earned is Opponent Goals Scored at (-.55)
  2. The silver standard is Save Percentage (-.48), and
  3. The bronze standard, tied at (-.43), is shared by Opponent Goals Scored versus Shots on Goal and Opponent Clearances.
  4. Defending PWP Index (-.42), Crosses (-.38) and Unsuccessful Crosses (-.37) round out those with less than (-.3) correlation to points earned.

As on the attacking side, there are some new additions in the top tier: Crosses and Unsuccessful Crosses as well as Opponent Clearances.  Save Percentage edged up to 2nd and Defending PWP Index dropped to 4th.

Looking at away games – here are the team statistics with the best positive correlation to points earned:

  1. Gold standard is Goal Differential (.85)
  2. Silver standard is Composite PWP Index (.71)
  3. Bronze standard is Goals Scored (.60)
  4. Composite TSR (.48), PWP Attacking Index (.45), Goals Scored versus Shots on Goal (.42), and Clearances (.41) follow, with Opponent Crosses and Unsuccessful Opponent Crosses tied at (.37)

The top three remain, while Clearances as well as Opponent Crosses and Unsuccessful Opponent Crosses show well too.

  1. Gold standard for the best negative correlation is Opponent Goals Scored (-.56)
  2. Silver standard is Defending PWP Index (-.55)
  3. Bronze standard is Opponent Goals Scored versus Shots on Goal (-.49)
  4. Save Percentage falls back to 4th (-.41), followed by Opponent Shots on Goal versus Shots Taken (-.38) and Crosses (-.38).

Defending PWP Index returns to 2nd and Save Percentage drops back to 4th best.  Crosses, too, make an appearance in the top tier.
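The home versus away comparison can be sketched as the same correlation run separately on each location. The rows below are made-up numbers, not MLS data, and the names `pearson_r`, `r_by_location`, and `rows` are hypothetical; treating each team-game as one observation is also an assumption.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def r_by_location(rows):
    """rows: iterable of (location, stat_value, points). Returns {location: r}."""
    grouped = defaultdict(lambda: ([], []))
    for location, value, pts in rows:
        values, points = grouped[location]
        values.append(value)
        points.append(pts)
    return {loc: pearson_r(values, points) for loc, (values, points) in grouped.items()}


# Hypothetical rows: (location, opponent crosses, points earned)
rows = [
    ("home", 20, 3),
    ("home", 12, 0),
    ("home", 18, 3),
    ("home", 15, 1),
    ("away", 10, 0),
    ("away", 14, 1),
    ("away", 22, 3),
    ("away", 9, 0),
]

result = r_by_location(rows)
for location, r in result.items():
    print(f"{location}: r = {r:.2f}")
```

Splitting before correlating matters because a statistic can sit inside the noise band overall and still clear the .3 mark in one location, which is what happens with Opponent Crosses in the lists above.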

In conclusion:

  • Goal Differential remains the gold standard as the team statistic that best correlates to points earned.
  • Composite PWP Index remains 2nd best.  What makes it best, in my view, is the ability to parse out the 6 qualitative parts of the Index to translate a bigger picture about what team events went well and what didn’t go well… in comparison to the opponent.
  • Crosses, Unsuccessful Crosses (failure), and Clearances show well as individual team statistics that have okay positive or negative correlation to points earned.
  • By okay, I mean there is ‘some’ correlation at times – but it’s not consistent.
  • That said, it’s far more consistent in being okay than these better known, and oft used, team or individual statistics:
    • Possession Percentage
    • Passing Accuracy
    • Penetration into the Final Third
    • Shots Taken
    • Corners
    • Tackles
    • Duels
    • Yellow Cards
  • Bottom line here – when someone tells you that a player, or a position, has greater market value or importance because his individual statistics (like those above) are better than someone else’s, that person really hasn’t got a clue about what they’re selling.