UE Data management

Degrees incorporating this pedagogical element:


Data management and knowledge extraction have become core activities for most businesses and organizations. The increasing speed at which systems and users generate data has led to many interesting challenges, both in industry and in the research community.

Data management infrastructures are growing fast, leading to the creation of large data centers and federations of data centers. Current data requirements can no longer be handled exclusively by classic DBMSs. They require a variety of flexible data models (relational, NoSQL...), consistency semantics, and algorithms originating from the database and distributed systems communities. In addition, large-scale systems are more prone to failures and should implement appropriate fault-tolerance mechanisms.

The proliferation of sensors and devices in our environment contributes heavily to Big Data and to the development of ubiquitous computing. Data stream processing, combined with the processing of persistent data, has become an important issue.

Combining large amounts of data from different sources offers many opportunities in data mining and knowledge discovery. Once reconciled, heterogeneous data can be used to produce new information that adapts to users' behavior and context, thus providing a richer and more diverse experience. As more data becomes available, innovative data analysis algorithms are being designed to provide new services, focusing on two key aspects: accuracy and scalability.

Program summary: In this course, we will first present the fundamentals of distributed data management, including distributed query evaluation, consistency models, and data mediation. We will then study ubiquitous data management, focusing on sensor networks and their use in context-aware applications. Next, we will give an overview of large-scale data management systems, from fully decentralized architectures such as peer-to-peer networks to the MapReduce framework. The course also provides an overview of key topics in data mining.
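As an illustration of the MapReduce programming model mentioned in the program, here is a minimal, single-process sketch of the classic word-count job in Python. The function names (`map_words`, `reduce_counts`, `run_mapreduce`) are hypothetical; the code does not use Hadoop or any real framework's API and only simulates the map, shuffle, and reduce phases in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum all partial counts emitted for one key."""
    return word, sum(counts)


def run_mapreduce(documents: Iterable[str]) -> Dict[str, int]:
    """Hypothetical driver simulating map, shuffle, and reduce in one process."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    # Map phase: in a real framework, each document would go to a different worker.
    for doc in documents:
        for key, value in map_words(doc):
            # Shuffle: group intermediate values by key.
            intermediate[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_counts(k, v) for k, v in intermediate.items())
```

For example, `run_mapreduce(["the cat", "The dog"])` counts "the" twice and "cat" and "dog" once each. In a distributed setting, the map and reduce functions stay the same; only the driver changes, which is what makes the model easy to scale out.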

Evaluation:

A 2-hour written exam (E) and reports on practical work or research work (P). The final mark in session 1 is computed as 0.7E + 0.3P. The final mark in session 2 is based on the written exam only.
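The grading rule can be written as a small function. This is a sketch only: it assumes E and P are given on the same scale (the scale is not stated here), and the name `final_mark` is hypothetical.

```python
from typing import Optional


def final_mark(exam: float, practicals: Optional[float] = None, session: int = 1) -> float:
    """Final mark: 0.7E + 0.3P in session 1, written exam only in session 2.

    Assumes exam (E) and practicals (P) use the same grading scale.
    """
    if session == 1:
        if practicals is None:
            raise ValueError("session 1 requires a practicals mark P")
        # Weighted average: the exam counts for 70%, the reports for 30%.
        return 0.7 * exam + 0.3 * practicals
    if session == 2:
        # Session 2: the practicals mark is not taken into account.
        return exam
    raise ValueError("session must be 1 or 2")
```

For instance, E = 12 and P = 16 give about 13.2 in session 1 (8.4 + 4.8), whereas in session 2 the mark is simply E.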


Fundamentals of DBMS

Targeted skills

At the end of the course, the student will be able to analyze a distributed application and choose the appropriate database, develop MapReduce applications, and perform basic data analysis.