UE Fundamentals of probabilistic data mining

Degrees incorporating this pedagogical element:

Description

This lecture introduces fundamental concepts and the associated numerical methods in model-based clustering, classification and models with latent structure. These approaches are particularly relevant for modelling random vectors, sequences or graphs and for accounting for data heterogeneity, and they illustrate general principles of statistical modelling. The following topics are addressed:

  • Principles of probabilistic data mining and generative models; models with latent variables
  • Probabilistic graphical models
  • Mixture models and clustering
  • PCA and probabilistic PCA
  • Nonparametric density estimation
  • Generative models for series and graphs: hidden Markov models
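
As an illustration of the mixture-model and latent-variable topics listed above, here is a minimal sketch of model-based clustering with a two-component univariate Gaussian mixture. It assumes the EM algorithm as the fitting method, which this description does not name explicitly. The function names, the initialisation and the synthetic data are hypothetical choices made for the example, not course material.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=100):
    """Fit a two-component 1D Gaussian mixture by EM (illustrative sketch)."""
    n = len(data)
    # Crude initialisation (an assumption): means at the sample extremes,
    # both variances equal to the overall sample variance.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var, overall_var]
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability of each latent component given x.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k])
                     for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])

        # M-step: responsibility-weighted maximum-likelihood updates.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2
                        for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against variance collapse

    # Clustering: assign each point to its most probable component,
    # using the responsibilities from the last E-step.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


# Hypothetical synthetic data: two well-separated groups.
random.seed(0)
sample = ([random.gauss(-2.0, 0.5) for _ in range(200)]
          + [random.gauss(3.0, 0.5) for _ in range(200)])
weights, means, variances, labels = em_two_gaussians(sample)
print("weights:", weights)
print("means:", means)
```

The component label of each observation plays the role of the latent variable: it is never observed, and the E-step replaces it by its posterior probability under the current parameters.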

Evaluation:

A 2-hour written exam (E1) and two reports on practicals or research work (P). The final mark in session 1 is 0.4 E1 + 0.6 P. The final mark in session 2 is E2 (a second-session written exam only).
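
The marking rule above can be written as a short sketch. The function names and the example marks are hypothetical, and no particular marking scale is assumed.

```python
def session1_mark(e1, p):
    """Session 1 final mark: 0.4 * E1 + 0.6 * P."""
    return 0.4 * e1 + 0.6 * p


def session2_mark(e2):
    """Session 2 final mark: the second-session exam mark E2 alone."""
    return e2


# Hypothetical marks, for illustration only.
print(session1_mark(e1=12, p=15))
print(session2_mark(e2=11))
```

With these example marks, the session 1 result is 0.4 × 12 + 0.6 × 15 = 4.8 + 9 = 13.8 (up to floating-point rounding). In session 2 the reports no longer count.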

Prerequisites

Fundamental principles in probability theory (conditioning) and statistics (maximum likelihood estimator and its usual asymptotic properties).

Constrained optimization, Lagrange multipliers.

Targeted skills

At the end of the course, the student will be able to perform model-based clustering, analyse and segment time series with hidden Markov models, build a graphical model associated with a given distribution, and represent multivariate numerical data with missing coordinates by projecting them onto planes.

Bibliography

Lauritzen, S.L. Graphical Models. Clarendon Press, Oxford, United Kingdom, 1996.

Koller, D. and Friedman, N. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

Bishop, C.M. Pattern Recognition and Machine Learning. Springer, 2006.