UE Parallel systems

Degrees incorporating this pedagogical element:


Today, parallel computing is omnipresent across a large spectrum of
computing platforms. At the ``microscopic'' level, processor cores
have used multiple functional units in concurrent and pipelined
fashions for years, and multiple-core chips are now commonplace with a
trend toward rapidly increasing numbers of cores per chip.  At the
``macroscopic'' level, one can now build clusters of hundreds to
thousands of individual (multi-core) computers. Such
distributed-memory systems have become mainstream and affordable in
the form of commodity clusters.  Furthermore, advances in network
technology and infrastructures have made it possible to aggregate
parallel computing platforms across wide-area networks in so-called
``grids''. The popularization of virtualization has made it possible
to consolidate workloads and resource usage in ``clouds'', raising
many energy issues.

An efficient exploitation of such platforms requires a deep
understanding both of architecture, software, and infrastructure
mechanisms and of advanced algorithmic principles. The aim of this
course is thus twofold. First, it introduces the main trends and
principles in the area of high-performance computing infrastructures,
illustrated by examples from the current state of the art. Second, it
provides a rigorous yet accessible treatment of parallel algorithms,
including theoretical models of parallel computation, parallel
algorithm design for homogeneous and heterogeneous platforms,
complexity and performance analysis, and fundamental notions of
scheduling and work-stealing. These notions will always be presented
in connection with real applications and platforms.

Program summary and expected schedule:

  1. Introduction: parallelism from GPUs to world-wide grid platforms, illustrated by a presentation of the corresponding middleware.
  2. Hardware: high-performance processor architectures (superscalar, simultaneous multi-threading, multi-core…); symmetric multiprocessors; hierarchical memory (cache coherency, NUMA, COMA).
     OS features for cluster computing: multi-threading, memory management for NUMA, communications (low-level interfaces, active messages, RDMA).
  3. Networks: performance characteristics (bandwidth, latency, DMA, PIO, overlapping); data-transfer protocols (GridFTP, TakTuk…); communication models (Hockney, LogP, TCP); analysis of global communication schemes.
     Storage: redundant disk arrays, fault tolerance, high availability…
  4. Modeling of parallel programs and platforms. Fundamental characteristics: work and depth. Dataflow-graph representation of an execution. BSP programs. Locality, granularity, memory space, communications.
  5. Classical scheduling techniques: list algorithms; partitioning techniques. Application to resource-management systems (PBS, LSF, SGE, OAR).
  6. Case study 1: HPC applications. Seismic applications (cluster computing, ANR/NUMASIS).
     Desktop grids. High-energy physics applications (grid computing, LCG/EGEE). Monte Carlo simulations for astrophysics (grid computing, Ciment).
  7. Case study 2: Google MapReduce.
     Dynamic simulations (molecular dynamics). Data-mining.
  8. Scheduling by work-stealing: description and fundamental properties. Analysis on processors with changing speeds. Work-stealing and data locality.
  9. Parallel programming: MPI, POSIX threads, Kaapi/CILK, TBB. Practical session.
 10. Parallelism extraction and algorithmic schemes: recursive splitting and series-parallel graphs. Examples: iterated product; branch & bound; FFT; matrix product; sorting.
 11. Adaptive algorithms and cascading divide & conquer: prefix computation, data compression, linear-system solving.