

Optimized data management of a fleet of more than 2,000 trains that supply hundreds of terabytes of information

Reduction of crowds and capacity control during events in the Community

Service quality enhancement: predictive maintenance, optimized network planning, control of station capacity, and real-time train occupancy

Metro de Madrid is the rail service of Madrid, with 294 km of network and 302 stations. It ranks among the world's great underground systems: it is fifth by number of stations, behind London, New York, Shanghai and Paris. With 6,977 employees, the company is the heart of mobility in the capital city of Spain. Its mission is to make the Metro the preferred mobility option in the community by offering efficient, quality transport to all the citizens of Madrid. Public service, efficiency, and customer experience are the three pillars for achieving this mission and contributing positively to society. Daily activity must also rest on principles of security, sustainability, technological innovation and digitalization.

Limited access to data

Metro de Madrid needed to analyze passenger mobility at different times and on different routes so that it could plan the number of trains running on each line. The massive volumes of data were available only through a relational database that held partial data sets, because of capacity constraints and limited user access. Managing the data was labor intensive and expensive.

Previously, Metro de Madrid carried out large-scale surveys of metro users at its stations every seven years. This research produced a limited, static data set that did not account for calendar events such as concerts, soccer games or holidays.

Real-time analytics

To use ridership data for service improvements that benefit the citizens of Madrid, Metro de Madrid is driving a digital transformation. That required transforming its data management, so the organization launched a project to evaluate Big Data ecosystems and select the best technology.

After analyzing different solutions, the team chose Cloudera Data Platform (CDP) for its functionality, scalability, and lower total cost of ownership relative to other options. Since Metro de Madrid migrated to CDP, data-driven decisions have been near-instant.

The real-time information comes from a control panel and allows Metro de Madrid to generate warnings automatically based on what is detected.

Previously, Metro de Madrid could track users only when they entered the underground system, not when they left, which painted an incomplete picture of the typical commuter journey. With Cloudera technology, Metro de Madrid developed an algorithm based on turnstile data that analyzes entry patterns to deduce where each passenger exited and to derive the usual routes. The system models schedules and day types, and it has taken Metro de Madrid from generating a single generic matrix every seven years to obtaining 80 daily matrices modeled by type of day.
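The case study does not describe the algorithm itself. As a rough illustration of how exits can be deduced from entry-only data, the sketch below uses a common "trip chaining" heuristic, which is an assumption here and not Metro de Madrid's actual method: each trip is assumed to end where the same card next enters, and the last trip of the day is assumed to return to the first entry station. The record layout `(card_id, timestamp, station)` and the function name `infer_od_matrix` are hypothetical.

```python
from collections import Counter, defaultdict


def infer_od_matrix(entries):
    """Estimate an origin-destination matrix from tap-in records only.

    entries: iterable of (card_id, timestamp, station) for a single day.
    Returns a Counter mapping (origin, destination) to an estimated trip count.
    """
    # Group the tap-ins by card so each passenger's day can be chained.
    by_card = defaultdict(list)
    for card_id, timestamp, station in entries:
        by_card[card_id].append((timestamp, station))

    od = Counter()
    for taps in by_card.values():
        if len(taps) < 2:
            continue  # a single tap-in gives no evidence about the exit
        taps.sort()  # chronological order
        stations = [station for _, station in taps]
        # Assumed destination of each trip = origin of the next trip;
        # the last trip is assumed to return to the first origin of the day.
        for origin, dest in zip(stations, stations[1:] + stations[:1]):
            if origin != dest:
                od[(origin, dest)] += 1
    return od


if __name__ == "__main__":
    sample = [
        ("card-a", 8, "Sol"), ("card-a", 18, "Chamartin"),
        ("card-b", 9, "Sol"), ("card-b", 17, "Chamartin"),
        ("card-c", 10, "Opera"),  # only one tap-in: skipped
    ]
    for pair, count in sorted(infer_od_matrix(sample).items()):
        print(pair, count)
```

Running this once per day type (for example weekdays, holidays, or match days) would yield one matrix per type, which is the general shape of the "daily matrices modeled by type of day" mentioned above.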

"With Cloudera technology, this information, which is obtained every 15 minutes, is what determines the exact number of trains that should drive on each line at a certain time or the exact number of passengers per square meter in each wagon," says Metro de Madrid.
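The relationship between passenger counts, trains and density can be illustrated with simple arithmetic. The sketch below is a simplification under stated assumptions: the function names, the usable floor area per train and the target density are all hypothetical, and the source does not say how Metro de Madrid actually computes these figures.

```python
import math


def trains_needed(passengers, floor_area_m2, max_density):
    """Trains required so that density stays at or below max_density.

    passengers: expected passengers on the line in one 15-minute window.
    floor_area_m2: usable standing area of one train (assumed value).
    max_density: target passengers per square meter (assumed value).
    """
    capacity_per_train = floor_area_m2 * max_density
    return max(1, math.ceil(passengers / capacity_per_train))


def passengers_per_m2(passengers, trains, floor_area_m2):
    """Average density if passengers spread evenly across all trains."""
    return passengers / (trains * floor_area_m2)


if __name__ == "__main__":
    n = trains_needed(4500, floor_area_m2=300.0, max_density=4.0)
    print(n, passengers_per_m2(4500, n, 300.0))
```

With the illustrative inputs above, 4,500 passengers at a target of 4 passengers per square meter on trains with 300 square meters of floor area require 4 trains, which gives an average of 3.75 passengers per square meter.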

Metro de Madrid is using Cloudera Data Platform to advance its data strategy and, in the future, will look for the right opportunities to take advantage of the flexibility of the cloud. In addition, Metro de Madrid is receiving and extracting information from trains for analysis with CDP.

A better service for the community

Big Data has allowed Metro de Madrid to take a series of measures to protect passenger safety during the COVID-19 pandemic. Using the origin-destination matrix model, Metro de Madrid analyzed historical data and predicted the number of people riding on each train. This allowed it to predict station density every 15 minutes and, when capacity was exceeded, to stop access to the stations so that safety measures could be maintained. Today, the same functionality is used to control capacity at crowd events such as soccer matches, concerts and demonstrations.
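A capacity check of this kind can be sketched as follows. This is a minimal illustration, not the production system: it assumes the prediction is a plain historical average per station, day type and 15-minute slot, and the names `predict_occupancy`, `access_decisions` and the data layout are hypothetical.

```python
from statistics import mean


def predict_occupancy(history, station, day_type, slot):
    """Average past occupancy for a station, day type and 15-minute slot.

    history maps (station, day_type, slot) to a list of past head counts.
    Returns None when there is no history to predict from.
    """
    samples = history.get((station, day_type, slot), [])
    return mean(samples) if samples else None


def access_decisions(history, capacities, day_type, slot):
    """Decide, per station, whether access should stay open in this slot."""
    decisions = {}
    for station, capacity in capacities.items():
        predicted = predict_occupancy(history, station, day_type, slot)
        if predicted is None:
            decisions[station] = "no-data"
        elif predicted > capacity:
            decisions[station] = "restrict"  # predicted crowd exceeds capacity
        else:
            decisions[station] = "open"
    return decisions


if __name__ == "__main__":
    # Slot 72 is the 15-minute window starting at 18:00 (72 * 15 minutes).
    history = {
        ("Sol", "matchday", 72): [900, 1100],
        ("Opera", "matchday", 72): [200, 300],
    }
    capacities = {"Sol": 950, "Opera": 500, "Goya": 400}
    print(access_decisions(history, capacities, "matchday", 72))
```

A real deployment would presumably use a richer model than a plain average, but the decision step has the same shape: compare a per-slot prediction with a station limit and act on the result.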

In addition, Metro de Madrid captures and manages train information in real time and sends it to Cloudera’s platform, where it is decrypted and analyzed to translate the data into activity indicators. This data serves three areas: maintenance, service planning and customer service.

In maintenance, data on the railway network's assets is obtained from sensors in the trains, traffic signs, or rolling stock. By managing this data, Metro can predict possible breakdowns and implement intelligent, predictive maintenance. This speeds up incident resolution, improves the quality of the service, and sharply reduces stoppages of trains in service.

As for service planning, the platform helps Metro de Madrid manage the capacity of its stations by analyzing data from the facilities, such as escalator start-ups and turnstile readings, and from the trains themselves.

"We have a lot of work ahead of us; we now see that we have unlimited ground in the world of data. Technological barriers have been eliminated and the possibilities are multiplying," concludes Metro de Madrid.

"We have achieved an evolution from a generic matrix every seven years to achieve 80 daily matrices modeled by type of day thanks to Cloudera," says Metro de Madrid.

