Building Cepsa’s Data Lake

data lake, production

The first product we have created is Data Lake Production, which stores the information (150 million data points a day) from 300,000 sensors installed at eight of our industrial plants: the La Rábida, Gibraltar and Tenerife refineries; the chemicals plants at Detén Química (Brazil), Shanghai (China), Puente Mayorga and Palos de la Frontera; and the biofuels plant at San Roque.
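To put those figures in perspective, the short calculation below converts them into average rates. It is only a back-of-the-envelope sketch: it assumes the readings are spread evenly over the day and across all sensors, which real plant data will not do exactly.

```python
# Figures quoted in the article.
DATA_POINTS_PER_DAY = 150_000_000
SENSORS = 300_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumption (not from the article): readings are evenly distributed
# over the day and across sensors.
points_per_second = DATA_POINTS_PER_DAY / SECONDS_PER_DAY
points_per_sensor_per_day = DATA_POINTS_PER_DAY / SENSORS
seconds_between_readings = SECONDS_PER_DAY / points_per_sensor_per_day

print(f"Average data points per second: {points_per_second:,.0f}")
print(f"Average data points per sensor per day: {points_per_sensor_per_day:,.0f}")
print(f"Average seconds between readings per sensor: {seconds_between_readings:.1f}")
```

Under that assumption, the lake receives on the order of a couple of thousand data points every second, and each sensor reports roughly once every three minutes.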

How did we do it?

Data Lake Production is built on Amazon Web Services (AWS) and is capable of collecting, storing, and giving users access to an average of 2,000 signals a second, in a safe and scalable manner, on artificial intelligence platforms. All of this information is available both in real time and as historical data, with no practical limit on storage. Another benefit for Cepsa is that the information from our plants and laboratories can be enriched with external data such as meteorological, cost or price information.
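The idea of enriching plant data with external data can be illustrated with a minimal sketch. This is not Cepsa's implementation: the sensor tag, the readings, the weather values and the `enrich` function are all hypothetical, and the example simply attaches an hourly ambient temperature to each sensor reading taken during that hour.

```python
from datetime import datetime

# Hypothetical sensor readings: (timestamp, sensor tag, value). All values are made up.
readings = [
    (datetime(2020, 1, 1, 10, 5), "TI-101", 351.2),
    (datetime(2020, 1, 1, 10, 40), "TI-101", 352.0),
    (datetime(2020, 1, 1, 11, 15), "TI-101", 349.8),
]

# Hypothetical external data: ambient temperature (deg C) keyed by the start of each hour.
weather_by_hour = {
    datetime(2020, 1, 1, 10): 14.5,
    datetime(2020, 1, 1, 11): 16.0,
}


def enrich(readings, weather_by_hour):
    """Attach the ambient temperature for the matching hour to each reading."""
    enriched = []
    for timestamp, sensor_id, value in readings:
        # Truncate the reading's timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        enriched.append({
            "timestamp": timestamp,
            "sensor_id": sensor_id,
            "value": value,
            # None when no weather record exists for that hour.
            "ambient_temp_c": weather_by_hour.get(hour),
        })
    return enriched


enriched = enrich(readings, weather_by_hour)
for row in enriched:
    print(row)
```

The same pattern would apply to cost or price information: any external series that shares a time key with the sensor data can be joined to it in this way.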


Did you know that a refinery generates 170 billion data points each day?

The result

A strong product that allows us not only to store the data generated at our production plants, but also to extract value from this information, helping us make our processes more efficient and anticipate decisions.


This will help us build other initiatives that require this information, such as the optimization of our chemical processes using artificial intelligence at our Huelva chemical plant.

Teamwork has been key in carrying out this project, which opens the way to exploring data from our plants to help us meet the challenges of Industry 4.0.
Pepe Leal
The Data Lake of Manufacturing will allow analysis, optimization and visualization tools to access all the data and extract value from it.
Ramón Rodríguez López
