Course unit: Software infrastructure for data centers and Cloud computing

Description

Modern Web applications, popular online services (e.g., search engines, social networks, streaming services) and Big Data applications share several major requirements: they need large amounts of computing resources and must meet stringent reliability, availability and performance constraints. To fulfill these requirements, such systems are implemented on a large number of servers hosted in a data center, forming so-called “rack-scale” or even “warehouse-scale” platforms.

At the core of the success of companies such as Google, Facebook, Twitter and Amazon is the ability to exploit data center resources efficiently and reliably through well-designed software infrastructures. While a few challenges are specific to the massive scale of these giant companies, most design principles, as well as most research and development work on such software infrastructures, are also relevant to smaller-scale systems.

This course studies the design of software infrastructures for data center systems. It introduces some of the main building blocks and abstraction levels of such infrastructures. The following topics will be covered:
- An overview of the Cloud computing landscape, including (i) the basic facilities for deploying applications (virtual machines, containers, functions) and storing data (block storage, object storage, file storage, database storage), and (ii) the characteristics of “Cloud-native” applications
- Resource management services for the allocation, placement, scheduling, supervision and orchestration of distributed applications (for example, the Kubernetes system)
- Coordination and communication services: coordination services (for example, etcd and ZooKeeper) for building consistent and highly available applications despite failures and churn, and communication services (for example, Kafka) for interconnecting and integrating various applications acting as producers and consumers of data streams
- Data processing and storage services, including in-memory data stores used to increase application throughput (for example, memcached)
- The impact of new hardware trends: an overview of recent progress in the hardware design of computer systems (e.g., speed improvements, evolution of the hardware/software interface, specialized processing units) and its consequences for Cloud infrastructures and applications.

Through this course, students will learn how these services and frameworks are designed and will understand the underlying theoretical and practical challenges related to operating systems and distributed systems (including scalability, fault tolerance, data consistency and resource virtualization).

The course combines several types of activities: lectures and case studies, lab sessions (mini-projects), and the study and presentation of influential or recent research papers.

Related courses:

The topics covered in this course are related to, but do not significantly overlap with, the M2R courses listed below. Attending these courses is recommended but not required for this course.

 - “Advanced aspects of operating systems” (focused on operating systems aspects at the scale of a single machine)
 - “Distributed systems” (mostly focused on the algorithmic aspects of distributed systems)
 - “Parallel systems”
 - “Data management in large-scale distributed systems” (mostly focused on building applications to process large volumes of data)

Prerequisites

Basic knowledge (M1 level) of operating systems and networks

Learning outcomes

By the end of the course, students will (i) understand the main challenges involved in using and operating Cloud infrastructures, (ii) know the internal design principles of Cloud infrastructure services, and (iii) be able to design Cloud-based applications.

Bibliography

- Kris Nova and Justin Garrison. Cloud Native Infrastructure. O’Reilly, 2017.
- Brendan Burns. Designing Distributed Systems. O’Reilly, 2018.
- Martin Kleppmann. Designing Data-Intensive Applications. O’Reilly, 2017.
- Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan. The Datacenter as a Computer: Designing Warehouse-Scale Machines (3rd edition). Morgan & Claypool, 2018.

Additional information

Teaching method: In person
Location(s): Grenoble - Domaine universitaire
Language(s): English