SQL Demo

 

In this demo I answer the question “Who defeated Manchester United in the 2014/2015 season?” using SQL and the European Soccer Database. Click to skip to the results.

Details of the database

The European Soccer Database contains information from >25k matches, including names of >10k players, various attributes of the players and teams (such as the team line up and squad formation), as well as historic betting odds from up to 10 providers, and details of match events (such as the number of fouls and corner kicks). The data spans 11 European countries with their lead championships from seasons 2008-2016, and is contained in 7 tables, Fig. 1.

EU-soccer-db-details
Fig. 1 Details of the database we will investigate (source: kaggle.com)

Tables of interest

To answer the question we only need two out of the seven tables: Match and Team. Their relationship and some relevant column names are shown in Fig. 2.

details-of-tables
Fig. 2 Schema for the Match and Team tables. Only a few relevant columns from the Match table are displayed.
The team_api_id fields map to each other between the tables as shown.

An sample of the Match table is as follows

SELECT id, date, home_team_api_id, away_team_api_id, home_team_goal, away_team_goal, season
FROM Match
LIMIT 3;
id date home_team_api_id away_team_api_id home_team_goal away_team_goal season
1 2008-08-17 00:00:00 9987 9993 1 1 2008/2009
2 2008-08-16 00:00:00 10000 9994 0 0 2008/2009
3 2008-08-16 00:00:00 9984 8635 0 3 2008/2009

And similarly, for the Team table:

SELECT  id, team_api_id, team_fifa_api_id, team_long_name, team_short_name
FROM Team
LIMIT 3;
id team_api_id team_fifa_api_id team_long_name team_short_name
1 9987 673 KRC Genk GEN
2 9993 675 Beerschot AC BAC
3 10000 15005 SV Zulte-Waregem ZUL

Building the query

We will need to scan through every match played by Manchester United in the Match table, identify who they were playing against, limit it to the 2014/2015 season, and further narrow it down to just the games where they lost.

Since some of the time they will be the home team, and some of the time the away team, we will actually need to scan through the Match table twice. In a first step, it will be useful to obtain the team_api_id and team_short_name for Manchester United in order to hardcode this into future queries:

SELECT team_short_name, team_api_id AS MU_team_api_id 
FROM Team 
WHERE team_long_name = 'Manchester United';
team_short_name MU_team_api_id
MUN 10260

Home team games vs Away team games

To retrieve the match IDs in 2014/2015 where MU are the home team, as well as the outcome of the football match, we can do the following query:

SELECT m.id AS match_id,
    CASE WHEN m.home_team_goal > m.away_team_goal THEN 'MU Win'
         WHEN m.home_team_goal < m.away_team_goal THEN 'MU Loss' 
         ELSE 'Tie' END AS outcome_home_games
FROM Match AS m
LEFT JOIN Team AS t ON m.home_team_api_id = t.team_api_id
WHERE m.season = '2014/2015' AND t.team_api_id = 10260;

The first few results are:

match_id outcome_home_games
4013 MU Loss
4031 MU Win
4051 MU Win

A similar query retrieves the match IDs for when MU are the away team, where we would instead use m.away_team_api_id in the left join.

Combining into CTEs

By using common table expressions (CTEs) for the home and away games obtained above, we can do further joins back onto the Match table in order to extract all of the records for the different football games played by MU.

-- Set up the home team CTE
WITH home AS (
  SELECT m.id, t.team_long_name
  FROM Match AS m
  LEFT JOIN Team AS t ON m.home_team_api_id = t.team_api_id),
-- Set up the away team CTE
away AS (
  SELECT m.id, t.team_long_name
  FROM Match AS m
  LEFT JOIN Team AS t ON m.away_team_api_id = t.team_api_id)
-- Select team names, the date and goals
SELECT DISTINCT
    date,
    home.team_long_name AS home_team,
    away.team_long_name AS away_team,
    m.home_team_goal,
    m.away_team_goal
-- Join the CTEs onto the match table
FROM Match AS m
INNER JOIN home ON m.id = home.id
INNER JOIN away ON m.id = away.id
WHERE m.season = '2014/2015'
      AND (home.team_long_name = 'Manchester United' 
           OR away.team_long_name = 'Manchester United');

The first few results:

date home_team away_team home_team_goal away_team_goal
2014-08-16 00:00:00 Manchester United Swansea City 1 2
2014-11-02 00:00:00 Manchester City Manchester United 1 0
2014-11-08 00:00:00 Manchester United Crystal Palace 1 0

A few notes:

Putting it all together

The most recent query above returns 38 records in total – comprising all of the games played by MU in that Premier League season. This is few enough that we could simply go through them and read off which teams defeated Manchester United.

However, what if many more games were played per season? Or what if we were instead interested in results across every season in the dataset? In either case it would be unrealistic to read through every fixture. We can add a CASE statement to the main query with a corresponding WHERE clause to list just the outcomes which were losses for Manchester United:

WITH home AS (
  SELECT m.id, t.team_long_name, t.team_short_name
  FROM Match AS m
  LEFT JOIN Team AS t ON m.home_team_api_id = t.team_api_id),
away AS (
  SELECT m.id, t.team_long_name, t.team_short_name
  FROM Match AS m
  LEFT JOIN Team AS t ON m.away_team_api_id = t.team_api_id)
SELECT DISTINCT
    date,
    home.team_long_name AS home_team,
    away.team_long_name AS away_team,
    m.home_team_goal,
    m.away_team_goal,
        CASE 
          WHEN (
            (m.home_team_goal > m.away_team_goal AND home.team_short_name = 'MUN')
            OR (m.home_team_goal < m.away_team_goal AND away.team_short_name = 'MUN')
          )
          THEN 'MU Win'
          WHEN (
            (m.home_team_goal < m.away_team_goal AND home.team_short_name = 'MUN')
            OR (m.home_team_goal > m.away_team_goal AND away.team_short_name = 'MUN')
          )
          THEN 'MU Loss' 
          ELSE 'Tie' 
        END AS outcome
FROM Match AS m
INNER JOIN home ON m.id = home.id
INNER JOIN away ON m.id = away.id
WHERE m.season = '2014/2015'
      AND (home.team_long_name = 'Manchester United' 
           OR away.team_long_name = 'Manchester United')
      AND outcome = 'MU Loss';

Results

date home_team away_team home_team_goal away_team_goal outcome
2014-08-16 00:00:00 Manchester United Swansea City 1 2 MU Loss
2014-11-02 00:00:00 Manchester City Manchester United 1 0 MU Loss
2015-01-11 00:00:00 Manchester United Southampton 0 1 MU Loss
2015-02-21 00:00:00 Swansea City Manchester United 2 1 MU Loss
2015-04-18 00:00:00 Chelsea Manchester United 1 0 MU Loss
2015-04-26 00:00:00 Everton Manchester United 3 0 MU Loss
2015-05-02 00:00:00 Manchester United West Bromwich Albion 0 1 MU Loss
2014-09-21 00:00:00 Leicester City Manchester United 5 3 MU Loss

In conclusion, in the 2014/2015 season Manchester United were defeated by seven teams: Southampton, Swansea City (twice), Manchester City, Chelsea, Everton, West Bromwich Albion, and Leicester City.