UE Fundamentals of Data Processing and Distributed Knowledge

Degrees incorporating this pedagocial element :


Modern computing increasingly takes advantage of large amounts of distributed data and knowledge. This is grounded on theoretical principles borrowing to several fields of computer science such as programming languages, data bases, structured documentation, logic and artificial intelligence. The goal of this course is to present some of them, the problems that they solve and those that they uncover. The course considers three perspectives on data and knowledge: interpretation (what they mean), analysis (what they reveal) and processing (how can they be traversed efficiently and transformed safely).

The first part offers a semantic perspective on distributed knowledge. Distributed knowledge may come from data sources using different ontologies on the semantic web, autonomous software agents learning knowledge or social robots interacting with different interlocutors. The course adopts a synthetic view on these. It first presents principles of the semantics of knowledge representation (RDF, OWL). Ontology alignments are then introduced to reduce the heterogeneity between distributed knowledge and their exploitation for answering federated queries is presented. A practical way for cooperating agents to evolve their knowledge is cultural knowledge evolution that is then illustrated. Finally, the course defines dynamic epistemic logics as a way to model the communication of knowledge and beliefs.

The second part summarizes data models and algorithms required to mine and recommend content on the social Web. The course examples are drawn from real-world applications such as movie recommendations and extracting travel itineraries from photos. The course goals are: acquire knowledge on how recommender systems work in practice, and on scalable algorithms for processing large volumes of social data and extracting value from that data, and learn how to run and interpret large-scale user studies.

The third part introduces a perspective on programming language foundations, algorithms and tools for processing structured information, and in particular tree-shaped data. It consists in an introduction to relevant theoretical tools with an application to NoSQL (not only SQL) and XML technologies in particular. Theories and algorithmic toolboxes such as fundamentals of tree automata and tree logics are introduced, with applications to practical problems found for extracting information. Applications include efficient query evaluation, memory-efficient validation of document streams, robust type-safe processing of documents, static analysis of expressive queries, and static type-checking of programs manipulating structured information. The course also aims at presenting challenges, important results, and open issues in the area.