Please note that you are currently looking at the ongoing Academic Programs. Applications are now closed for this academic year (2020-2021) for licences, professional licences, masters, DUT and regulated health training. If you are interested in applying for 2021-2022, please click on this link for the appropriate Academic Programs.
Degrees incorporating this pedagogical element:
Description
Today, parallel computing is omnipresent across a large spectrum of
computing platforms, from processor cores or GPUs up to Supercomputers
and Cloud platforms. The largest Supercomputers gather millions of
processing units and are heading towards Exascale (a quintillion or
10^18 flops - http://top500.org). While parallel processing
initially targeted scientific computing, it is now part of many
domains like Big Data analytics and Deep Learning. But making
efficient use of these parallel resources requires a deep
understanding of architectures, systems, parallel programming models,
and parallel algorithms.
This class will progressively enable attendees to master advanced
parallel processing. No prior knowledge of parallel programming is required to attend this
class, nor specific skills in systems, processor architecture, or
theoretical models beyond the base training that any computer science M1 student should
have received. Students should have a taste for experimenting with
advanced computer systems and be ready to be exposed to a few
theoretical models (mainly cost models for reasoning about parallel
algorithms).
The class is organized around weekly lectures, discussions and help
time. The presented materials will be
available each week on the class web page. To gain practical experience
and a good understanding of the main concepts, the students are
expected to develop short programs and experiments. Students will also
have to prepare, in teams of two, a presentation to be given at the
end of the class. Students will have access to Grid'5000 parallel
machines and the SimGrid simulator for experiments.
This class prepares students to pursue an M2R internship on related
topics in an academic or industrial research team, leading to a PhD
in computer science or to work in companies on parallel or large-scale systems.
This class is organized around 2 main blocks:
1. Overview of parallel systems:
- Introduction to parallelism, from GPUs to supercomputers.
- Hardware and system considerations for parallel processing
(multi-core architectures, process and thread handling, cache efficiency, remote data access, atomic instructions)
- Parallel programming: message passing, one-sided communications,
task-based programming, work stealing based runtimes (MPI, Cilk, TBB, OpenMP).
- Modeling of parallel programs and platforms. Locality,
granularity, memory space, communications.
- Parallel algorithms, collective communications, topology aware algorithms.
- Scheduling: list algorithms; partitioning techniques. Application to resource management (PBS, LSF, SGE, OAR).
- Large scale scientific computing: parallelization of Lagrangian
and Eulerian solvers, parallel data analytics and scientific
visualization.
- Parallel big data frameworks, batch and stream processing (Spark, Flink)
- AI and HPC: from frameworks (TensorFlow, PyTorch, Scikit-learn) to
dedicated hardware (GPUs, TPUs). Parallel stochastic gradient
descent, parallel neural networks, synchronous/asynchronous
training strategies.
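As a minimal illustration of the message-passing style listed above, here is a hedged sketch in Rust using only standard-library channels and threads (the class itself covers MPI; the vectors, worker count, and chunking here are illustrative, not course material):

```rust
use std::sync::mpsc;
use std::thread;

// Message-passing sketch: each worker computes a partial dot product
// over its chunk and sends the result back over a channel, in the
// spirit of MPI send/receive (but with Rust std channels).
fn main() {
    let v1: Vec<f64> = (0..8).map(|i| i as f64).collect();
    let v2: Vec<f64> = vec![2.0; 8];
    let workers = 4;
    let chunk = v1.len() / workers;
    let (tx, rx) = mpsc::channel();
    for w in 0..workers {
        let a = v1[w * chunk..(w + 1) * chunk].to_vec();
        let b = v2[w * chunk..(w + 1) * chunk].to_vec();
        let tx = tx.clone();
        thread::spawn(move || {
            let partial: f64 = a.iter().zip(&b).map(|(x, y)| x * y).sum();
            tx.send(partial).unwrap(); // "message" back to the master
        });
    }
    drop(tx); // close the channel so the receiving loop terminates
    let total: f64 = rx.iter().sum();
    println!("{}", total); // dot product of (0..8) with all-2s: 56
}
```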
2. Functional parallel programming:
We propose to study a clean and modern approach to the design of parallel algorithms: functional programming.
Functional languages are known to provide a different and cleaner programming experience by allowing a shift of focus from "how to do it" to "what to do".
Take, for example, a simple dot product between two vectors. In the C language you might end up with:
unsigned int n = length(v1); /* length() is assumed to return the number of elements */
double s = 0.0;
for (unsigned int i = 0 ; i < n ; i++) {
s += v1[i] * v2[i];
}
In Python, however, you could write:
return sum(e1*e2 for e1, e2 in zip(v1, v2))
You can easily notice that the C code displayed here is highly sequential, with data-flow dependences carried by the s and i variables.
It intrinsically contains an ordering of operations because it tells you how to do things in order to obtain the final sum.
On the other hand, the Python code exhibits no dependencies at all. It does not tell you how to compute the sum but just what to compute: the sum of all products.
In this course we will study how to express parallel operations in a safe and performant way.
The main point of the course is to study parallel iterators and their uses, but we will also consider classical parallel programming schemes like divide and conquer.
We will both study the theoretical complexity of different parallel algorithms and practice programming and performance analysis on real machines.
All applications will be developed in the Rust (https://www.rust-lang.org/) programming language, using the Rayon (https://github.com/rayon-rs/rayon) parallel programming library.
No previous knowledge of Rust is required, as we will introduce it gradually during the course.
You do, however, need to be proficient in at least one low-level language (typically C or C++).
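The iterator style can already be seen with Rust's sequential iterators. Below is a small sketch of the dot product in the "what to compute" style, using only the standard library (no Rayon, so it runs as-is); with Rayon, swapping `iter()` for `par_iter()` is the usual way to parallelize such a pipeline:

```rust
// Dot product in functional style: no explicit index, no visible
// loop-carried accumulator in the source code.
fn dot(v1: &[f64], v2: &[f64]) -> f64 {
    v1.iter().zip(v2).map(|(a, b)| a * b).sum()
}

fn main() {
    let v1 = [1.0, 2.0, 3.0];
    let v2 = [4.0, 5.0, 6.0];
    // With Rayon one would write v1.par_iter().zip(v2)... to run the
    // same computation in parallel (sketch; see the Rayon docs).
    println!("{}", dot(&v1, &v2)); // 1*4 + 2*5 + 3*6 = 32
}
```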
In brief
Period: Semester 9
Credits: 6
Lectures (CM): 36h
Code (APOGEE): GBX9MO41
Teaching methods: In person
Location(s): Grenoble - University campus
Language(s): English
Contacts
Arnaud Legrand
Bruno Raffin
