
About

Visualising rail disruption in a nutshell

On average, a Londoner spends 18 months of their life travelling to and from work. For commuters all across the country, travelling to work often involves multiple stages, meaning a delay at any stage could potentially be amplified as the journey progresses.

There are now a multitude of apps providing real-time data to help people plan journeys they are about to take. What often isn’t included are indications of a particular service’s reliability based on historic records of punctuality. Such a tool would be just as useful in planning a new journey, or reviewing a regular commute.

With funding and support from the Open Data Institute, the Fasteroute/Visualising Rail Disruption team built a web application, Delay Explorer, and integrated records of historic punctuality into their app, Fasteroute. With these, users can save significant amounts of time by exploring potential routes and planning journeys by rail using services that are more reliable. They can also avoid risky connections that could be missed due to trains that are often delayed.

The Fasteroute/Visualising Rail Disruption team are:

  • George Goldberg
  • Joe Letts
  • Craig Buchanan

Analysing the data

The Visualising Rail Disruption team worked with the ODI to analyse 11 weeks of aggregated data, to find out how commuters could reduce the time they spend stuck on trains by changing the time they travel. Trains arriving into 16 major commuting hubs across Britain were analysed for their punctuality, by recording any discrepancy between the plan-of-the-day schedule and a service's arrival time. Discrepancies were categorised as 1–5 minutes late, 5–10 minutes late, 10–15 minutes late, more than 15 minutes late, or more than 30 minutes late, with the last category treated as equivalent to a cancellation. The trends were analysed in half-hour periods throughout the morning between 7am and 10am.
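The delay categories described above could be expressed as a small classification function. This is only a minimal sketch, not the project's own code: the function name, the label strings, the treatment of band edges (a delay of exactly five minutes falls into the 5–10 band) and the use of None for a cancelled service are all assumptions made for illustration.

```python
from typing import Optional

# (lower bound in minutes, label), checked from the longest delay downwards.
# Treating each lower bound as inclusive is an assumption; the source does
# not say how delays that fall exactly on a boundary were handled.
BANDS = [
    (30, "more than 30 minutes late or cancelled"),
    (15, "more than 15 minutes late"),
    (10, "10-15 minutes late"),
    (5, "5-10 minutes late"),
    (1, "1-5 minutes late"),
]


def classify_delay(delay_minutes: Optional[float]) -> str:
    """Return the delay band for one arrival.

    `delay_minutes` is the actual arrival time minus the plan-of-the-day
    arrival time. None stands for a cancelled service (an assumption).
    """
    if delay_minutes is None:
        return BANDS[0][1]
    for lower_bound, label in BANDS:
        if delay_minutes >= lower_bound:
            return label
    return "on time"


if __name__ == "__main__":
    for sample in (0.5, 3, 12, 45, None):
        print(sample, "->", classify_delay(sample))
```

Checking the bands from the longest delay downwards keeps the function to a single loop and means each arrival lands in exactly one category.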

You can find a link to the aggregated data here (Google Sheet). Fasteroute worked with the ODI to find out what the data means for commuters across Britain.

Categorised into 30-minute time periods between 7am and 10am, the data shows that:

  1. Services arriving into cities between 8:30am and 9am are the most unreliable for commuters. An average of two-fifths (42.5%) of train services are delayed or cancelled during this time, causing commuters using these services to endure over 10 hours a year in delays
  2. More than one-third (35.7%) of all train arrivals between 8:30am and 9am are delayed by between one and 10 minutes, accounting for the most common delay durations

Considering all train delays, on average commuters could:

  • Recover over three and a half hours a year if they travelled an hour earlier (arriving 7:30–8am instead of 8:30–9am), reducing the average chance of delays by a quarter
  • Save over five hours a year if, as early risers, they travelled an hour and a half earlier (arriving 7–7:30am instead of 8:30–9am), reducing the average chance of delays by half

Although 1–10 minute delays are considered by the rail industry to be minor disruptions, avoiding such delays will allow commuters to accrue time savings over a year. Commuters who avoid them are also more likely to avoid knock-on impacts, such as missing transport connections, which could result in further delays.

The data shows that delays become longer as the morning goes on, which may suggest that earlier delays affect later services. If people were to work more flexible hours, they could reduce their exposure to regular delays. This may also reduce passenger congestion and ease pressure points on the network.

Cities most affected by morning delays

On average, between 7am and 10am, commuters have between a one-in-four chance (24%, 7–7:30am) and two-in-five chance (43%, 8:30–9am) of arriving late. This varies from city to city:

  1. Commuters in Birmingham, Manchester, York, Sheffield and Cardiff are most likely to be delayed on services arriving between 7am and 10am (two in five, 39–45%, of trains are late)
  2. Commuters arriving into Birmingham, Manchester, Glasgow and Cardiff between 8:30am and 9am, the busiest time for train services, are more likely than not (50–52%) to be on a service arriving late rather than on time
  3. Commuters in Exeter, Southampton and Newcastle are the least likely to arrive late with roughly one in four (23–31%) late services

The analysis counts a train as delayed if it arrives one minute or more after the planned arrival time. This differs from network operators, which may count a train as delayed only if it is five minutes late for short journeys or 10 minutes late for longer journeys.

Background

The UK’s urban population is growing in size, pressuring public services and resources. While city planners and local authorities work hard to future-proof infrastructure against the needs of a growing population – by developing transport services, for example – plans may take years to come into effect. Better use of open data can provide insights to help the public to find more efficient ways of using services already in operation.

On average, a Londoner spends 18 months of their life travelling to and from work. For those whose commute involves multiple means of transport, a delay at any stage could potentially be amplified as the journey progresses. Imagine, then, if a commuter had access to information that indicated that taking a more reliable train 10 minutes earlier than usual could get them to their destination in less time. Any onward journey may also be completed in shorter time: commuters would be more likely to catch connecting services by rail or bus. By arriving to work in less time, they would be able to leave earlier, and potentially avoid further delays on their way home. Aside from simply experiencing less frustration, our commuter could have more time to read a bedtime story to their children.

Another commuter may be travelling by train to a job interview or an important meeting in a town or city that they don’t typically visit. If this person could find out how consistent and punctual certain trains were likely to be, they would be able to judge how much time to leave spare in case of delays.

For most commuters, rail travel is a means to an end. They want to find the fastest, most efficient route from A to B. Users of the Visualising Rail Disruption web application, and the smartphone app Fasteroute, can save significant amounts of time by exploring potential routes and planning journeys using services that are more reliable, avoiding risky connections that could be missed due to trains that are often delayed.

There are many apps and tools that use real-time open data to help commuters find the best immediate route to a destination, such as the National Rail Enquiries app or CityMapper. But until now, commuters have been unable to easily evaluate how reliable an individual service is by using a train’s historical record of punctuality.

— George Goldberg, Fasteroute developer and Project Manager of Visualising Rail Disruption

The Visualising Rail Disruption application provides useful visual information about the reliability of train services through a chosen station. It provides valuable information to people, such as potential house buyers or renters who may be searching for a home with access to well-connected, reliable rail links. House price websites already provide information about the nearest transport links and schools around a prospective property. Knowing whether transport links are reliable before making a large investment in an area can prevent these potential commuters losing minutes or hours every week due to rail delays and cancellations.

Since the project began, we have worked to improve access to the aggregated data, adding an indicator of reliability into our mobile application Fasteroute. Users can see at-a-glance which trains are reliably on time and which are not when looking at the departure board through the traffic-lights coloured indicators. When looking at the service details, they can see an explanation of the traffic light colour indicated for that train.

— George Goldberg, Fasteroute developer and Project Manager of Visualising Rail Disruption

About the data

Without open data, Fasteroute would not be able to provide accurate and comprehensive analysis of train punctuality across the rail network. The Delay Explorer and our mobile applications build upon the extensive open data provided by organisations such as Network Rail and National Rail Enquiries. — Craig Buchanan, Fasteroute team, developers of Visualising Rail Disruption

Open data is data anyone can access, use or share.

Visualising Rail Disruption used open data from National Rail's Darwin Push Port (available under the Open Government Licence). This data provides highly detailed information about rail schedules and real-time train movements of passenger trains up and down the country. This includes delays (and the official reasons for them), live train arrivals and departures, and changes to train schedules (from changes in time to changes in starting points, destinations and routes).

The project relied on map images from the OpenStreetMap project (Open Database Licence), which allowed us to create an accurate geographic representation of rail stations in the UK. This data produced the map imagery that the application displays in the main window and is a key part of the application.

Open source software was used to help us build the application, including Facebook's React framework. This software allows developers to produce simple, streamlined and interactive web applications. It is coupled with the Leaflet mapping library for the presentation of the OpenStreetMap tiles.

All the code written to collect, display and explore the data during this Summer Showcase Project is available from our GitHub team page.

You can read about how Visualising Rail Disruption worked with the ODI on the Collaboration page.

Methodology

The project produced a web application to allow commuters to explore rail service punctuality. In order to achieve this, a number of steps were carried out:

  1. The Darwin Push Port, which is the primary source for train timetable information and real-time updates on when trains actually arrive and depart, is a live feed, with no historical data storage. This made it necessary to build infrastructure to collect data in real-time and aggregate it to build up a historical data store.
  2. Data from NaPTaN and the Darwin Reference data sets were combined (along with some manually researched data) to produce a single reference for stations, including the two different identifiers used by National Rail systems (TIPLOC and CRS), station names and their geographical coordinates.
  3. A program was developed to aggregate the real-time data collected each day, to group together the same trains on each day they run, and generate the statistics over time to be presented in the web application.
  4. Finally the web application, based around a map from OpenStreetMap, was built to present this collected data.
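The aggregation in step 3 could look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's aggregator: the DailyRecord type, its field names and the idea of a single train_key shared by the same timetabled train on every day it runs are hypothetical, and a train is counted as late at one minute or more.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DailyRecord:
    train_key: str        # hypothetical id shared by the same train on each day it runs
    run_date: str         # ISO date on which the train ran
    delay_minutes: float  # arrival delay observed on that day


def aggregate(records):
    """Group daily records by train and compute statistics over time."""
    delays_by_train = defaultdict(list)
    for record in records:
        delays_by_train[record.train_key].append(record.delay_minutes)
    return {
        train_key: {
            "days_observed": len(delays),
            "mean_delay": mean(delays),
            # Share of days on which the train was one minute or more late.
            "share_late": sum(d >= 1 for d in delays) / len(delays),
        }
        for train_key, delays in delays_by_train.items()
    }


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        DailyRecord("train-A", "2015-09-01", 0.0),
        DailyRecord("train-A", "2015-09-02", 2.0),
        DailyRecord("train-A", "2015-09-03", 4.0),
        DailyRecord("train-B", "2015-09-01", 0.0),
    ]
    for key, stats in aggregate(sample).items():
        print(key, stats)
```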

Analysing service punctuality: methodology

Fasteroute, working with the ODI, first assembled a list of cities that were both regional centres and commuter hubs. The list of commuter cities and stations monitored in this analysis was created using the Office of Rail Regulation list of busiest rail stations in the UK, identifying destinations with large volumes of commuters. All arrivals at main stations in each city were counted: any difference from the scheduled arrival time was noted, and a mean average was calculated. Where multiple main stations in a city are positioned on the same rail line, as is the case for Blackfriars and St Pancras on the City Thameslink line in London, any delay of a service visiting those stations was recorded at each, but to avoid counting the delay more than once, the average delay was used.
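The rule for services that call at more than one main station in the same city can be sketched as follows. This is a minimal illustration, assuming each observation is a (service id, station, delay in minutes) tuple; the function name, the tuple layout and the sample values are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def delay_per_service(observations):
    """Collapse per-station delays into one delay per service for a city.

    `observations` is an iterable of (service_id, station, delay_minutes).
    A service seen at several main stations contributes the mean of its
    delays, so it is counted only once.
    """
    delays_by_service = defaultdict(list)
    for service_id, _station, delay_minutes in observations:
        delays_by_service[service_id].append(delay_minutes)
    return {
        service_id: mean(delays)
        for service_id, delays in delays_by_service.items()
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    london = [
        ("service-1", "Blackfriars", 4.0),
        ("service-1", "St Pancras", 6.0),
        ("service-2", "St Pancras", 3.0),
    ]
    print(delay_per_service(london))
```

Here service-1 is recorded at both stations but appears once in the result, with the mean of its two delays.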

The aggregated data, which counts the trains arriving at each main train station in each delay category (on time, under five minutes, 10 minutes, 15 minutes, and 30 minutes or cancelled), is presented in 30-minute increments: 07:01–07:30, 07:31–08:00, 08:01–08:30, 08:31–09:00, 09:01–09:30, 09:31–10:00.

The number of trains in each 30-minute period was counted and an average delay in minutes was calculated based on all delays of all services arriving in that timeframe.
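The counting and averaging per 30-minute period could be sketched like this. It is not the project's code: the function names are hypothetical, arrivals are assigned to a period by their scheduled arrival time (an assumption, since the source does not say whether scheduled or actual times were used), and on-time trains are assumed to contribute a delay of zero to the average.

```python
from collections import defaultdict
from datetime import time
from statistics import mean
from typing import Optional


def half_hour_window(arrival: time) -> Optional[str]:
    """Return the label of the 30-minute period containing `arrival`.

    Periods run 07:01-07:30, 07:31-08:00, ... 09:31-10:00. Times outside
    07:01-10:00 return None.
    """
    minutes = arrival.hour * 60 + arrival.minute
    if not 7 * 60 < minutes <= 10 * 60:
        return None
    end = -(-minutes // 30) * 30  # round up to the next multiple of 30
    start = end - 29
    return f"{start // 60:02d}:{start % 60:02d}-{end // 60:02d}:{end % 60:02d}"


def summarise(arrivals):
    """Map each period label to (number of trains, mean delay in minutes).

    `arrivals` is an iterable of (scheduled arrival time, delay in minutes).
    """
    delays_by_window = defaultdict(list)
    for scheduled, delay_minutes in arrivals:
        label = half_hour_window(scheduled)
        if label is not None:
            delays_by_window[label].append(delay_minutes)
    return {
        label: (len(delays), mean(delays))
        for label, delays in sorted(delays_by_window.items())
    }


if __name__ == "__main__":
    # Made-up sample arrivals, for illustration only.
    sample = [(time(7, 30), 2.0), (time(7, 31), 4.0), (time(7, 45), 0.0)]
    print(summarise(sample))
```

Rounding the minute count up to the next multiple of 30 gives the closing edge of each period, which matches period boundaries that end on the hour and half hour.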

Working openly: see how Fasteroute arrived at these findings

Due to the newness of the Darwin Push Port as a data feed, and the resulting limited availability of open source software to assist working with it, Fasteroute built a number of different tools as part of this project, all of which have been released as open source software under the Apache license. These components are as follows:

  • National Rail stations This project consists of a JSON file containing details of all National Rail stations in the UK, including their CRS codes, TIPLOC codes, managing Train Operating Companies, names, latitudes and longitudes. This data has been created by combining that available from the Darwin Reference data set, NaPTaN and some manual research to fill in the final few gaps.
  • Darwin Gateway This Java program acts as a proxy to the Darwin Push Port, receiving messages from it in real-time as well as pulling in snapshot files where necessary to ensure all data is present, and translating the messages from XML to JSON for easier consumption in downstream applications.
  • Darwin DB This is a Python program which listens to the Darwin Push Port (proxied by darwin-gateway), and records all train schedules and the updates and forecasts they receive to a PostgreSQL database for persistent storage.
  • Delay Explorer Aggregator This is a Python program which generates the aggregate statistics for timetabled trains, routes and stations presented in the train-explorer web application based on the data recorded each day by darwin-db.
  • Delay Explorer API This component provides a REST API for access to the data provided by the train-explorer-aggregator and national-rail-stations components. It is implemented partially in Python and partially in Go.
  • Delay Explorer, web version The web application frontend that, in conjunction with train-explorer-api, provides the user interface of the project, allowing non-technical users to explore the data collected and understand the different delay performances of the trains they travel on. It is implemented in ReactJS making use of Leaflet (and OpenStreetMap) for the map component.
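As an illustration of why a single station reference is useful, the sketch below builds lookups by both identifiers from a station list. The JSON field names (crs, tiploc, name, lat, lon) and the function name are assumptions, not the actual layout of the national-rail-stations file, and the coordinates shown are approximate.

```python
import json

# One illustrative record; the field names are hypothetical.
SAMPLE_STATIONS_JSON = """
[
  {"crs": "PAD", "tiploc": "PADTON", "name": "London Paddington",
   "lat": 51.5154, "lon": -0.1755}
]
"""


def build_indexes(stations):
    """Return two dictionaries keyed by CRS code and by TIPLOC code."""
    by_crs = {station["crs"]: station for station in stations}
    by_tiploc = {station["tiploc"]: station for station in stations}
    return by_crs, by_tiploc


if __name__ == "__main__":
    stations = json.loads(SAMPLE_STATIONS_JSON)
    by_crs, by_tiploc = build_indexes(stations)
    # A feed message that identifies a station by TIPLOC can be resolved
    # to a display name and map coordinates.
    station = by_tiploc["PADTON"]
    print(station["name"], station["lat"], station["lon"])
```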

Collaboration

The ODI has helped us financially to build and run the data collection tools and the web application, [and has provided] inspiration [for] how to utilise open data. Through the ODI we have been inspired to not only use and benefit from open data, but also to communicate about the use of open data to others.

— George Goldberg, Fasteroute developer and Project Manager of Visualising Rail Disruption

Analysing the aggregated data

You can find a link to the aggregated data here (Google Sheet). Fasteroute worked with the ODI to find out what the data means for commuters across Britain.

Categorised into 30-minute time periods between 7am and 10am, the data shows that:

  1. Arriving into cities between 8:30am and 9am is the most unreliable time for commuters. An average of two-fifths (42.5%) of train services are delayed or cancelled during this time, causing commuters using these services to endure over 10 hours a year in delays
  2. More than one-third (35.7%) of all train arrivals between 8:30am and 9am are delayed by between one and 10 minutes, the most common delay duration.

Considering all train delays, on average commuters could:

  • Recover over three and a half hours a year if they travelled an hour earlier (arriving 7:30–8am instead of 8:30–9am), reducing the average chance of delays by a quarter
  • Save over five hours a year if, as early risers, they travelled an hour and a half earlier (arriving 7–7:30am instead of 8:30–9am), reducing the average chance of delays by half

Although 1–10 minute delays are considered by the rail industry to be minor disruptions, avoiding such delays will create cumulative time savings and help commuters to avoid negative knock-on effects to journeys, such as missing transport connections, which could result in further delays.

The data shows that delays become longer as the morning goes on, which may suggest that earlier delays affect later services. If people were to work more flexible hours, they could reduce their exposure to regular delays. This may also reduce passenger congestion and ease pressure points on the network.

Cities most affected by morning delays

On average between 7am and 10am, commuters have between a one in four chance (24%, 7–7:30am) and two in five chance (43%, 8:30–9am) of arriving late. This varies from city to city:

  1. Commuters in Birmingham, Manchester, York, Sheffield and Cardiff are most likely to be delayed on services arriving between 7am and 10am (two in five, 39–45%, of trains are late)
  2. For commuters arriving into Birmingham, Manchester, Glasgow and Cardiff between 8:30am and 9am, the busiest time for train services, more than half (50–52%) of trains are likely to arrive late rather than on time
  3. Commuters in Exeter, Southampton and Newcastle are the least likely to arrive late with roughly one in four (23–31%) late services

The Fasteroute data analysis counts a train as delayed if it arrives one minute or more after the planned arrival time, which differs from network operators that may classify delays as five minutes or more. The data is collected in real time, giving rail passengers an up-to-the-minute view of train service delays and allowing them to make better-informed travel choices to optimise their journeys on a daily basis.

Limitations

This data differs from the audited figures released by National Rail, as it shows the data in real time before it is filtered. For example, National Rail’s audited figures may omit unexpected and unpreventable delays caused during adverse weather conditions. Fasteroute uses the raw real-time data from National Rail’s Darwin Push Port, published under the Open Government Licence.

National Rail measures against a Plan of the Day, which is the standard timetable with any amendments made for engineering works. This Plan of the Day is also used by Fasteroute when comparing real-time arrival data with the scheduled arrival time.

Overall punctuality falls significantly during the most popular commuter arrival window of 8:30–9am (the average delay was 3min 20s across all cities analysed, weighted by commuting population) and improves the earlier people travel (the average delay was only 1min 33s for trains arriving 7–7:30am, for instance).
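An average weighted by commuting population can be computed as in the sketch below. The function names and the input figures are made up for illustration and are not taken from the project's data; the sketch only shows how cities with more commuters pull the overall average towards their own delay.

```python
def weighted_mean_delay(cities):
    """Return the mean delay in seconds, weighted by commuting population.

    `cities` is an iterable of (mean delay in seconds, commuting population).
    """
    cities = list(cities)
    total_population = sum(population for _delay, population in cities)
    if total_population == 0:
        raise ValueError("total commuting population must be positive")
    weighted_sum = sum(delay * population for delay, population in cities)
    return weighted_sum / total_population


def format_delay(seconds):
    """Format a delay in seconds in the style used in the text, e.g. 2min 30s."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}min {secs:02d}s"


if __name__ == "__main__":
    # Made-up inputs: (mean delay in seconds, commuting population).
    sample = [(180, 300), (60, 100)]
    print(format_delay(weighted_mean_delay(sample)))
```

With these made-up inputs the larger city dominates: the weighted mean is 150 seconds, compared with 120 seconds for a simple unweighted average of the two cities.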

What next?

We see this project as being the starting point of allowing rail commuters opportunities to explore historical rail data. We are incredibly excited for the future as every day more National Rail data is collected, allowing users to view an ever richer dataset. We envisage allowing users to select the historical date range of rail data that they want to view, so commuters could find out if their train is more delayed in the winter or in the summer, or if their rail service is getting more reliable with time. We also hope to keep building transport tools to help better inform travellers based on open data, and to spur on others to produce tools based on open data that allow people to make better decisions. In the future, this project could be expanded to cover more than just rail journeys. It could cover local buses, long distance coaches, city metro services and ferries. It could even be extended to cover private, personal transport – indicating historical data on delays due to congestion on motorways, for example.

— George Goldberg, Fasteroute