UE Fundamentals of probabilistic data mining

User information

Please note that you are currently viewing the ongoing Academic Programs. Applications are now closed for this academic year (2020-2021) for licences, professional licences, masters, DUT and regulated health training. If you are interested in applying for 2021-2022, please click on this link for the appropriate Academic Programs.

Degrees incorporating this pedagogical element:


This lecture introduces fundamental concepts and the associated numerical methods in model-based clustering, classification and models with latent structure. These approaches are particularly relevant for modelling random vectors, sequences or graphs and for accounting for data heterogeneity, and they serve to present general principles of statistical modelling. The following topics are addressed:

  • Principles of probabilistic data mining and generative models; models with latent variables
  • Probabilistic graphical models
  • Mixture models and clustering
  • PCA and probabilistic PCA
  • Nonparametric density estimation
  • Generative models for series and graphs: hidden Markov models
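
To give a flavour of the "Mixture models and clustering" topic, here is a minimal sketch of model-based clustering with a two-component, one-dimensional Gaussian mixture. It is fitted with the EM algorithm, which is a standard choice assumed here and not necessarily the method taught in the course. The function names, the toy data and the initialisation rule are all hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1D Gaussian mixture by EM (illustrative sketch)."""
    # Deterministic initialisation (an assumption made for this example):
    # start the two means at the extremes of the data.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a degenerate component
    # Cluster label = component with the highest posterior probability.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


# Toy data (made up): two well-separated groups around 0 and 5.
data = [-0.3, 0.1, 0.2, -0.1, 0.4, 4.8, 5.1, 5.3, 4.9, 5.2]
weights, means, variances, labels = em_two_gaussians(data)
```

On this toy data set the latent variable is the unobserved group membership of each point, and the clustering is read off the posterior probabilities computed in the E-step.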

Evaluation:

A 2-hour written exam (E1) and two reports on practicals or research work (P). The final mark in session 1 is obtained as 0.5 E1 + 0.5 P. The final mark in session 2 is obtained as E2 (a second-session written exam only).
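
The marking rule above can be written as a short sketch. The function names and the example marks are hypothetical, used only to illustrate the arithmetic.

```python
def session1_mark(e1, p):
    """Session 1: equal weighting of the written exam (E1) and the reports (P)."""
    return 0.5 * e1 + 0.5 * p


def session2_mark(e2):
    """Session 2: the second written exam (E2) alone determines the mark."""
    return e2


# Hypothetical marks: 12 on the exam and 16 on the reports.
example_session1 = session1_mark(12, 16)
# A hypothetical second-session exam mark of 11.
example_session2 = session2_mark(11)
```

With these example values, the session 1 mark is the plain average of the two components, and the reports do not enter the session 2 mark.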

Recommended prerequisites

Fundamental principles in probability theory (conditioning) and statistics (maximum likelihood estimator and its usual asymptotic properties).

Constrained optimization, Lagrange multipliers.

Targeted skills

At the end of the course, the student will be able to perform model-based clustering, analyse and segment time series with hidden Markov models, build a graphical model associated with a given distribution, and represent multivariate numerical data with missing coordinates by projection onto planes.