Carrefour On-Premise Big Data Platform

Case Study


eSolutions implemented a big-data platform for Carrefour Romania to optimize the company's operations. The platform aggregates information about stock, prices, sales, promotions, orders, and more from shops and warehouses all over the country.

Data is collected and processed centrally, then consumed by other systems and applications through APIs or transferred to external systems.

The big-data platform implemented for Carrefour Romania increases performance and process scalability, offers the high availability needed for daily routine activities, and facilitates access to data for the entire ecosystem of apps and IT systems. Through its integrated technologies, the platform supports product lifecycle automation and increases the speed at which changes can be made, contributing to the optimization of operations by reducing data processing time.

The Client

With over 10,100 stores in 34 countries, the Carrefour Group is the world's second-largest retailer and the leader in Europe. Every day, more than 10 million customers visit Carrefour stores across the world, enjoying a wide variety of products and services at fair prices.

In Romania, the Carrefour Group offers its customers multiple shopping options, both in physical stores and online, through the single portal at www.carrefour.ro or through the BRINGO delivery service.

Business Challenge

As the company evolved, business needs and technical requirements grew. Our partner was therefore facing a series of challenges:

  • large volumes of data accumulated over time as the number of stores increased;
  • heterogeneous systems producing the data;
  • growing needs to process data and transform it into valuable information;
  • the need to dramatically reduce the time until data becomes available (real-time/near real-time).

As a result, the client had difficulty centralizing, processing, and transmitting data because of its large volume, inconsistency, and lack of standardization. Communication between applications and platforms was hampered by the complexity of the ecosystem and the absence of a standardized approach.

Solution Delivered

  • The first step in laying the foundation of the big-data platform was installing and configuring the infrastructure (servers, virtual machines). We opted for a scalable, cloud-ready infrastructure built on the Hadoop platform.
  • For the data collection/transfer/ingestion layer we chose Apache NiFi, an extremely flexible, efficient, and complete solution. We used Apache Hadoop (HDFS) on multiple nodes for distributed data storage. Distributed data processing is done with Apache Spark, and real-time events are processed with Apache Kafka (Kafka Streams).
  • Data processed according to business needs is stored in Apache Cassandra (high performance in scalability, throughput, and availability). For other usage scenarios, PostgreSQL, Redis, and Apache Druid are also used.
  • Data is made available for consumption in different formats via secure APIs.
  • From a security standpoint, we configured differentiated access between data areas, alerting, and data-recovery procedures for emergencies.
  • The platform includes a monitoring solution offering information and alerts covering not only the infrastructure and technical components but also the business processes being executed. For logging and monitoring we used the ELK Stack (Elasticsearch, Logstash, Kibana).
  • The solution was built so that any component can scale horizontally to meet future business needs or technological developments, while offering high availability and fast access to data.
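As a rough illustration of the processing layer described above, the sketch below folds a stream of stock/price/promotion events into a per-(product, store) state record, the kind of stateful aggregation a Kafka Streams topology performs over a changelog-backed state store. The event shape and field names here are hypothetical assumptions for illustration, not Carrefour's actual schema.

```python
from collections import defaultdict

# Hypothetical event shape: (product_id, store_id, field, value).
# A plain dict stands in for the Kafka Streams state store.
def fold_events(events):
    state = defaultdict(dict)
    for product_id, store_id, field, value in events:
        # Last-write-wins per (product, store, field), mirroring an
        # upsert into the materialized per-product, per-store view.
        state[(product_id, store_id)][field] = value
    return dict(state)

events = [
    ("sku-1", "store-7", "price", 9.99),
    ("sku-1", "store-7", "stock", 120),
    ("sku-1", "store-7", "price", 8.49),   # later price update overwrites
    ("sku-2", "store-7", "promotion", "2-for-1"),
]

view = fold_events(events)
print(view[("sku-1", "store-7")])  # {'price': 8.49, 'stock': 120}
```

In a real topology the fold runs continuously and the resulting view is what gets written out to Cassandra and served to consuming applications.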

Results

The big-data platform implemented for Carrefour gathers data from across 400 stores on 500,000 unique products, updating it and serving it to other applications in real time. Among other things, it aggregates 35 million records that capture the state of each product (stock, price, attributes, promotions, etc.) in all stores. The data flows processed in a single day exceed 10 billion records.

The platform helps streamline and simplify internal processes, ensuring real-time access to data for the entire Carrefour application ecosystem and thus becoming a single operational data source. For Carrefour customers, the entire process translates into an improved shopping experience through the eCommerce platform and the product delivery service (Bringo).

The Team

The first phase of the project started at the end of 2017. Since then, new flows and features have been developed continuously, and the project is still in progress. The project team is a complex one, including Big Data Solution Architects, Big Data Developers, Big Data Engineers, SysOps engineers, and a Project Manager.

Technologies

Apache NiFi (data ingestion), HDFS (data lake), Kafka Streams (event sourcing), Spark (batch processing), PostgreSQL, Redis, Cassandra, and Druid (data storage).
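Of the storage technologies listed, Redis typically plays the low-latency cache role in front of a primary store such as Cassandra or PostgreSQL. A minimal cache-aside sketch of that serving pattern follows; an in-memory dict stands in for the Redis client and another for the primary store, and all names are illustrative assumptions, not the platform's actual code.

```python
# Cache-aside read path: try the cache first, fall back to the
# primary store, and populate the cache on a miss.
cache = {}                                           # stands in for a Redis client
primary = {"sku-1": {"price": 8.49, "stock": 120}}   # stands in for Cassandra

def get_product(sku):
    hit = cache.get(sku)
    if hit is not None:
        return hit, "cache"
    record = primary.get(sku)   # authoritative lookup
    if record is not None:
        cache[sku] = record     # warm the cache for the next reader
    return record, "primary"

print(get_product("sku-1"))  # first read is served by the primary store
print(get_product("sku-1"))  # second read is served from the cache
```

In production the cached entry would also carry a TTL or be invalidated when the underlying product record changes, so that real-time updates are not masked by stale cache hits.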