As metro networks grow, it becomes harder to tell which stations people use most intensively. Yet this knowledge could help a city improve its timetables or plan station maintenance and cleaning.
Transport for London has published the "London Underground Performance Reports", which allow us to see the number of entries and exits at each metro station on weekdays, Saturdays and Sundays between 2007 and 2017. The data is quite large, occasionally contains small errors, and is spread across multiple Excel sheets, which is somewhat inconvenient. Before we begin, we can therefore consolidate it into a single file that is easy to access and extend, and that does not require extra libraries dedicated to data extraction.
Each row of the consolidated data holds 6 values for a station: "entry weekday", "entry Saturday", "entry Sunday", "exit weekday", "exit Saturday" and "exit Sunday".
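As a minimal sketch of such a consolidated file, assume a plain CSV with one row per station and year, followed by the six counts. The column names (`station`, `year`, `entry_weekday`, ...), the `Record` type and the `load_records` helper are hypothetical, and the sample rows are invented; the point is only that the standard library's `csv` module is enough to read the result back.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical column layout: station, year, then the six counts described above.
FIELDS = [
    "station", "year",
    "entry_weekday", "entry_saturday", "entry_sunday",
    "exit_weekday", "exit_saturday", "exit_sunday",
]


@dataclass
class Record:
    station: str
    year: int
    entries: tuple[int, int, int]  # (weekday, Saturday, Sunday)
    exits: tuple[int, int, int]    # (weekday, Saturday, Sunday)


def load_records(stream):
    """Parse the consolidated CSV into Record objects using only the standard library."""
    reader = csv.DictReader(stream)
    if reader.fieldnames != FIELDS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    records = []
    for row in reader:
        records.append(Record(
            station=row["station"],
            year=int(row["year"]),
            entries=(int(row["entry_weekday"]), int(row["entry_saturday"]), int(row["entry_sunday"])),
            exits=(int(row["exit_weekday"]), int(row["exit_saturday"]), int(row["exit_sunday"])),
        ))
    return records


# Invented sample rows, for illustration only.
SAMPLE = """station,year,entry_weekday,entry_saturday,entry_sunday,exit_weekday,exit_saturday,exit_sunday
Station A,2016,1000,800,600,1100,750,580
Station A,2017,1050,820,610,1120,760,600
"""

records = load_records(io.StringIO(SAMPLE))
```

Extending the file with another year then means appending rows, and checking the header catches files written with a different column order.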
We can use the data for the 268 stations to find those with the highest number of entries and observe how this changes over time:
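A ranking like this can be sketched as a per-year top-N selection. This is an illustration under assumptions, not the method behind the diagram: the input dictionary, the helper name `top_stations_by_year` and all numbers are invented, and the total is simply the sum of the three entry values. If the counts are per-day averages, weighting the weekday value by 5 would approximate a full week instead.

```python
from collections import defaultdict

# Hypothetical input: (station, year) -> (entry weekday, entry Saturday, entry Sunday).
# The numbers are invented for illustration.
entries = {
    ("Station A", 2016): (900, 700, 500),
    ("Station B", 2016): (1200, 600, 400),
    ("Station C", 2016): (300, 200, 100),
    ("Station A", 2017): (1300, 800, 600),
    ("Station B", 2017): (1100, 600, 400),
    ("Station C", 2017): (350, 250, 150),
}


def top_stations_by_year(data, n):
    """Return {year: [(station, total), ...]} with the n largest totals per year."""
    by_year = defaultdict(list)
    for (station, year), values in data.items():
        by_year[year].append((station, sum(values)))  # plain sum of the three day types
    return {
        year: sorted(rows, key=lambda r: r[1], reverse=True)[:n]
        for year, rows in sorted(by_year.items())
    }


print(top_stations_by_year(entries, 2))
```

In this toy data "Station A" overtakes "Station B" between 2016 and 2017, which is the kind of change over time the diagram is meant to reveal.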
Here "Piccadilly Circus" lies between "Paddington" and "Leicester Square", so their lines overlap slightly. We also notice the high growth rate of "Stratford", which within ten years has overtaken stations that were initially much more popular. While many stations registered slight decreases in entries in 2017, the station with the highest number of entries, "King's Cross St. Pancras", grew even further. For clarity, many other stations have been omitted here.
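Growth such as Stratford's can be quantified with a relative change between two years, and, to compare stations over periods of different length, with an average annual (compound) growth rate. The helpers `percent_change` and `annual_growth` and the yearly totals below are hypothetical; they only illustrate the arithmetic.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (new - old) / old * 100


def annual_growth(old, new, years):
    """Compound average growth per year, in percent."""
    if old <= 0 or years <= 0:
        raise ValueError("need a positive baseline and a positive number of years")
    return ((new / old) ** (1 / years) - 1) * 100


# Invented yearly totals for one station, for illustration only.
totals = {2007: 40_000, 2012: 70_000, 2017: 120_000}
first, last = min(totals), max(totals)
growth = percent_change(totals[first], totals[last])
annual = annual_growth(totals[first], totals[last], last - first)
print(f"{first}-{last}: {growth:+.1f}% total, {annual:.1f}% per year")
# → 2007-2017: +200.0% total, 11.6% per year
```

The same `percent_change` also expresses a decline: a station going from 3 to 2 units of traffic has changed by about -33.3%, i.e. lost one third.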
Similarly, we can look at the stations with the highest number of exits over the whole week; the picture is quite similar:
It is then straightforward to compute the difference between all entries and all exits:
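A minimal sketch of that difference, assuming each row holds the six counts in the order listed earlier (three entry values, then three exit values). The `net_flow` helper and the numbers are invented for illustration.

```python
# Hypothetical rows: (station, year) -> six counts in the order
# entry weekday, entry Saturday, entry Sunday, exit weekday, exit Saturday, exit Sunday.
# The numbers are invented.
rows = {
    ("Station A", 2016): (3000, 1200, 800, 2500, 1000, 700),
    ("Station B", 2016): (1500, 900, 600, 1600, 900, 600),
}


def net_flow(rows):
    """Entries minus exits per (station, year); positive means more entries than exits."""
    return {key: sum(values[:3]) - sum(values[3:]) for key, values in rows.items()}


net = net_flow(rows)
print(net)
```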
Normally, we would expect most stations to have a difference between entries and exits close to zero, preserved relatively consistently over the years. If we plotted the data for all stations, they would form a thick band around zero. To avoid overplotting, we show only the stations that deviate most significantly from zero. The diagram shows that more people entered the stations "Finsbury Park", "Stratford", "Mornington Crescent" and "Vauxhall" than exited there. At the same time, more people left through the stations "Covent Garden", "Camden Town", "Green Park" and "Oxford Circus" than entered them. The reasons behind this outcome can vary.
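One way to pick the stations worth plotting is to average each station's net flow over the years and keep the k most positive and the k most negative. This is a sketch of that selection rule, not necessarily how the diagram was produced; `strongest_deviations` and the net-flow values are hypothetical.

```python
from statistics import mean

# Hypothetical net flows (entries minus exits) per station over several years; invented numbers.
net = {
    "Station A": [800, 900, 950],
    "Station B": [-100, 50, -20],
    "Station C": [-700, -650, -720],
    "Station D": [10, -5, 0],
    "Station E": [400, 380, 420],
}


def strongest_deviations(net, k):
    """Return (k stations with most surplus entries, k stations with most surplus exits)."""
    ranked = sorted(net, key=lambda s: mean(net[s]))  # ascending mean net flow
    most_entries = [s for s in reversed(ranked) if mean(net[s]) > 0][:k]
    most_exits = [s for s in ranked if mean(net[s]) < 0][:k]
    return most_entries, most_exits


print(strongest_deviations(net, 2))
```

Averaging over years favours stations that deviate consistently; ranking by the largest single-year value would instead highlight one-off spikes.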
To see where people go at the weekend, we consider only the values related to Saturday and Sunday:
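Restricting to the weekend amounts to summing only the Saturday and Sunday positions of the six values, for both entries and exits. The index tuple `WEEKEND`, the `weekend_totals` helper and the rows are assumptions made for this sketch.

```python
# Hypothetical rows: station -> six counts
# (entry weekday, entry Saturday, entry Sunday, exit weekday, exit Saturday, exit Sunday).
# The numbers are invented.
rows = {
    "Station A": (5000, 1500, 1200, 4800, 1600, 1100),
    "Station B": (6000, 700, 500, 6100, 650, 480),
    "Station C": (2000, 900, 800, 1900, 950, 820),
}

# Positions of the Saturday and Sunday values for entries and exits.
WEEKEND = (1, 2, 4, 5)


def weekend_totals(rows):
    """Sum Saturday and Sunday entries and exits per station."""
    return {station: sum(values[i] for i in WEEKEND) for station, values in rows.items()}


ranking = sorted(weekend_totals(rows).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

In this toy data "Station B" is the busiest on weekdays but the quietest at the weekend, which is why the weekend view is worth looking at separately.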
"King's Cross St. Pancras" is much more popular than the rest, which may indicate that people are continuing their journeys by train or waiting for friends and relatives to arrive. The stations "Oxford Circus", "Victoria", "Stratford" and "Waterloo" form another well-separated cluster.
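Such clusters can be spotted by eye, but a simple heuristic makes them explicit: sort the totals and start a new group wherever the drop to the next station is at least some chosen gap. This is one possible one-dimensional grouping, not the author's method; the `split_at_gaps` helper, the `min_gap` value and the totals are hypothetical.

```python
def split_at_gaps(totals, min_gap):
    """Group stations whose sorted totals are not separated by a drop of at least min_gap."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    groups = []
    for i, (station, value) in enumerate(ranked):
        # Start a new group at the first station or after a large drop.
        if i == 0 or ranked[i - 1][1] - value >= min_gap:
            groups.append([])
        groups[-1].append(station)
    return groups


# Invented weekend totals, for illustration only.
weekend = {
    "Station A": 9000,
    "Station B": 5200,
    "Station C": 5000,
    "Station D": 4700,
    "Station E": 2100,
    "Station F": 1900,
}
print(split_at_gaps(weekend, 1000))
```

The result depends strongly on `min_gap`; a relative threshold or a dedicated method such as natural breaks would be alternatives.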
Finally, we can also look at the least-used stations in terms of the total number of entries and exits across all days:
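A sketch of this view: add up all six values per station, then take either the n smallest totals or every station below a fixed threshold. The helpers `least_used` and `below` and the rows are hypothetical, with invented numbers.

```python
# Hypothetical rows: station -> six counts
# (entry weekday, entry Saturday, entry Sunday, exit weekday, exit Saturday, exit Sunday).
# The numbers are invented.
rows = {
    "Station A": (900, 400, 300, 880, 420, 310),
    "Station B": (500, 200, 150, 480, 210, 140),
    "Station C": (400, 150, 100, 390, 160, 110),
    "Station D": (5000, 2000, 1500, 4900, 2100, 1400),
}


def least_used(rows, n):
    """Return the n stations with the smallest total of entries and exits."""
    totals = {station: sum(values) for station, values in rows.items()}
    return sorted(totals.items(), key=lambda kv: kv[1])[:n]


def below(rows, threshold):
    """Return, alphabetically, the stations whose total is under the threshold."""
    return sorted(station for station, values in rows.items() if sum(values) < threshold)


print(least_used(rows, 2))
print(below(rows, 2000))
```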
Between 2015 and 2017, the station "Chorleywood" lost one third of its previous traffic (total entries and exits). On the other hand, "Chesham" increased its low popularity among travelers by almost 300% over ten years. "Roding Valley" is the only station with a total of entries and exits below 2000.
Overall, this data could become more interesting and useful if combined with other datasets, so that one could see how events in the surroundings affect travelers on a given day. However, this requires even more data.
Note: the label "(number of people)" on the diagrams is likely inaccurate, since one person can enter or leave the same station many times; each entry or exit is not necessarily a different person.