Football Data Visualization

Guwahati • 2013

Introduction

This project visualizes data on over 20,000 football players from around the world. Only international career data is used; club football data is left out, largely because it is very difficult to find. Also, different leagues have different standards, so they can't be compared directly (for example, La Liga is more competitive than, say, the Danish Superliga).

The script I wrote to scrape the data from Wikipedia can be found here. I used Node.js for the script. I am planning to write a detailed blog post on the entire 'scraping' part of the project in the near future.

The full dataset can be found here. The data is in CSV format. You are free to use it for any purpose.

Getting football data is more difficult than you might think. Unlike basketball and American football, data on soccer is sparse. Sure, there are wonderful repositories of open-source data, like football.db, but for the purpose of this project I needed the birth location of each and every player.

For this, I had to scrape the data on my own. I used Node.js for scraping, along with the ExpressJS, Request and Cheerio libraries. The source code used for scraping can be found here.

ScatterPlot Visualization

However, the GeoData file and the dataset didn't match: the GeoData file was based on the 2001 census, while the dataset was based on 2013 data. Additional districts had been created in the meantime, and many local boundaries had been redrawn.

South America

Below is the Geo Scatterplot of the birthplaces of South American players. The size of each bubble corresponds to the number of appearances made by the player.

  • Forward players are colored red

  • Midfielders are colored green

  • Defenders are colored yellow

  • Goalkeepers are colored blue
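The encoding described above (colour by position, bubble size by appearances) could be sketched roughly as follows. This is only an illustration, not the project's actual code: the field names `position` and `appearances`, the fallback colour and the `maxRadius` default are assumptions. It also assumes the bubble's area, rather than its radius, should be proportional to appearances, which is why the radius uses a square root.

```javascript
// Colour legend taken from the list above. The keys are hypothetical
// labels; the real CSV may name positions differently.
const POSITION_COLORS = {
  forward: "red",
  midfielder: "green",
  defender: "yellow",
  goalkeeper: "blue",
};

// Radius grows with the square root of appearances, so the bubble AREA
// is proportional to the count (an assumed design choice).
function bubbleRadius(appearances, maxAppearances, maxRadius = 20) {
  if (maxAppearances <= 0) return 0;
  return maxRadius * Math.sqrt(Math.max(appearances, 0) / maxAppearances);
}

// Turn one player record into the visual attributes of its bubble.
// `player.position` and `player.appearances` are assumed field names.
function encodePlayer(player, maxAppearances) {
  return {
    fill: POSITION_COLORS[player.position] || "gray",
    r: bubbleRadius(player.appearances, maxAppearances),
  };
}
```

With d3.js, the resulting `fill` and `r` values would typically be applied as attributes on SVG circles placed at the projected birth coordinates.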

Forward Players

Midfield Players

Defense Players

Goalkeepers

European Geo-ScatterPlot

Below is the Geo Scatterplot of the birthplaces of European players. Each dot represents the birth location of a player, and the size of each bubble is proportional to the player's number of appearances.

Since the dataset is quite large, the visualization script for Europe is somewhat slow. Also note that the map of Turkey isn't shown here, which leads to a weird situation where all the Turkish players appear to have been born in the sea. I will fix this when I find a better European Union GeoJSON map.

As you can see, most of the football players are concentrated in the central European regions. There are stark regional differences in the distribution of football players within many countries, such as France and the Scandinavian countries.

Europe has had a very colourful history, and its territorial boundaries have changed a lot since World War I. Hence, the distribution of football players is not uniform.

Although the boundaries shown are correct as they stand today, there are a couple of other points to be noted:

  1. The West German football team has not been included in the plot. There were only 26 West German players in the dataset for the entire Cold War period, compared to 161 East German players.
  2. I've included the Soviet Union too, along with its present-day successor states, because there was a large discontinuity in the dataset otherwise.
  3. Turkey has not been included in this map.

Geo-torque map over the years

Birthplaces of football players over the years. The data is available from the year 1871, the early days of the organized game in England. Since then, the game has spread to nearly all corners of the world.

These time-lapses roughly depict the spread of the game of football over the last century. Europe and South America are the continents where the game first became popular; it was then gradually adopted by Asian and African countries.
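A time-lapse like this boils down to building one frame per year, each holding every birthplace recorded up to that year. The sketch below shows one way to prepare such frames. It is not the code behind the maps here, and the `year` field is a hypothetical column (the dataset may key the timeline on a different value).

```javascript
// Build cumulative frames: frame N contains every player whose `year`
// is less than or equal to that frame's year.
// `year` is an assumed field name, not taken from the published CSV.
function cumulativeFrames(players, startYear, endYear) {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...players].sort((a, b) => a.year - b.year);
  const frames = [];
  let shown = 0; // number of players visible so far
  for (let year = startYear; year <= endYear; year++) {
    while (shown < sorted.length && sorted[shown].year <= year) shown++;
    frames.push({ year, visible: sorted.slice(0, shown) });
  }
  return frames;
}
```

Each frame can then be drawn in turn (for instance on a timer), so that bubbles accumulate on the map as the year counter advances.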

Below is the Geo-torque map for the European continent. Press 'Run' to play the visualization:

Overseas Players

Since I had the birthplace coordinates of each and every player, I could chart out the world map, plot the birth coordinates of the players of a particular country, and find out which players, and how many, were born overseas. Once again, the size of the bubbles is kept proportional to the number of appearances to give a weighted emphasis.
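One possible way to automate the "born overseas" check is a point-in-polygon test of each birthplace against the boundary of the home country or continent. The sketch below uses the standard ray-casting method on plain `[lon, lat]` rings. It is an illustration under stated assumptions, not the method used for the maps here: the `lon` and `lat` field names are hypothetical, polygon holes are ignored, and coordinates are treated as planar.

```javascript
// Ray-casting test: is the point [x, y] inside a single polygon ring?
// `ring` is an array of [lon, lat] pairs, as in a GeoJSON outer ring.
function pointInRing(point, ring) {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // Does a horizontal ray from the point cross the edge (j -> i)?
    const crosses =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Keep the players whose birthplace lies outside every home ring.
// `lon` and `lat` are assumed field names on each player record.
function bornOverseas(players, homeRings) {
  return players.filter(
    (p) => !homeRings.some((ring) => pointInRing([p.lon, p.lat], ring))
  );
}
```

Calling `bornOverseas(players, rings).length` gives the count for one country or continent, where `rings` holds the outer rings of its boundary polygons.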

Below is the birthplace scatterplot of international players who have played for any of South America's football teams. In total, 16 South American football players were born overseas. There was no particular pattern in the results.

In contrast to South American international players, European players have been born in many overseas locations, as is evident from the map plotted below. A total of 62 European football players were born overseas.

A large number of players were born in erstwhile South American and African colonies of imperialist countries. Spain, Portugal and France lead the numbers with 17, 11 and 19 players born overseas respectively.

The full dataset scraped for this project can be found here. The data is in CSV format. You are free to use it for any purpose.

The visualizations were built with d3.js, a JavaScript visualization library. I will document and upload the JS code once the project is completed.