Course unit: Data management in large-scale distributed systems

Degree programs that include this course:

Description

Data management and knowledge extraction have become core activities of most businesses and organizations. The increasing speed at which systems and users generate data has raised many interesting challenges, both in industry and in the research community. Data management infrastructures are growing fast, leading to the creation of large data centers and federations of data centers. Current data requirements can no longer be handled exclusively with classic DBMSs. They require a variety of flexible data models (relational, NoSQL, etc.), consistency semantics, and algorithms developed by the database and distributed systems communities. In addition, large-scale systems are more prone to failures and should implement appropriate fault tolerance mechanisms. The spread of a growing number of sensors and devices in our environment contributes greatly to Big Data and to the development of ubiquitous computing. Data stream processing, combined with persistent data processing, is becoming an important issue. Combining large amounts of data from different sources offers many opportunities in the domains of data mining and knowledge discovery. Heterogeneous data, once reconciled, can be used to produce new information for adapting to the behavior of users and their context, thus generating a richer and more diverse experience. As more data becomes available, innovative data analysis algorithms are being designed to provide new services, focusing on two key aspects: accuracy and scalability.
Program summary:
In this course, we will first present the fundamentals of distributed data management, including distributed query evaluation, consistency models and transactional issues. Then, we will give an overview of large-scale data management systems, and focus on MapReduce approaches and some NoSQL solutions.

Prerequisites

RDBMS (SQL), parallel programming (threads)

Target skills

Distributed queries, consistency models, MapReduce, NoSQL

Additional information

Teaching method: In person
Location(s): Grenoble - Domaine universitaire
Language(s): English