UE Data management in large-scale distributed systems

User information

Please note that you are currently viewing the ongoing Academic Programs. Applications are now closed for this academic year (2020-2021) for licences, professional licences, masters, DUT and regulated health training. If you are interested in applying for 2021-2022, please refer to the corresponding Academic Programs.

Degrees incorporating this pedagogical element:


Data management and knowledge extraction have become core activities of most businesses and organizations. The increasing speed at which systems and users generate data has led to many interesting challenges, both in industry and in the research community. Data management infrastructures are growing fast, leading to the creation of large data centers and federations of data centers. Current data requirements can no longer be handled exclusively with classic DBMSs. They require a variety of flexible data models (relational, NoSQL...), consistency semantics and algorithms drawn from the database and distributed systems communities. In addition, large-scale systems are more prone to failures and should implement appropriate fault tolerance mechanisms. The spread of a growing number of sensors and devices in our environment contributes significantly to Big Data and to the development of ubiquitous computing. Data stream processing, combined with persistent data processing, is becoming an important issue. Combining large amounts of data from different sources offers many opportunities in the domains of data mining and knowledge discovery. Heterogeneous data, once reconciled, can be used to produce new information that adapts to the behavior of users and their context, thus generating a richer and more diverse experience. As more data becomes available, innovative data analysis algorithms are being designed to provide new services, focusing on two key aspects: accuracy and scalability.
Program summary:
In this course, we will first present the fundamentals of distributed data management, including distributed query evaluation, consistency models and transactional issues. We will then give an overview of large-scale data management systems, focusing on MapReduce approaches and some NoSQL solutions.

Recommended prerequisite

RDBMS (SQL), parallel programming (threads)

Targeted skills

Distributed queries, consistency models, MapReduce, NoSQL