PhD students



  • Giorgia Cantisani is working on multimodal, attention-driven music analysis.
  • Nicolas Furnon is working on multi-view speech enhancement.
  • Salah Zaiem is working on self-supervised speech representation learning.
  • David Perera is working on few-shot learning applied to audio event detection.


  • Valentin Barrière worked on multimodal sentiment analysis in interactions.
  • Alexandre Garcia worked on structured-output prediction methods for opinion prediction.
  • Magdalena Fuentes worked on audio-based beat and downbeat tracking using deep learning and structured-output prediction methods.
  • Kah Jun Hong worked on acoustic-based detection of pulmonary edema.
  • Sanjeel Parekh worked on audiovisual object modelling. He successfully defended his PhD on March 18th, 2019.
  • Ayoub Hajlaoui worked on EEG-based affect recognition. He successfully defended his PhD on September 20th, 2018.
  • Victor Bisot worked on acoustic scene and event recognition. He successfully defended his PhD on March 16th, 2018.
  • Anne-Claire Conneau worked on emotion classification based on EEG. She successfully defended her PhD on June 9th, 2016.
  • Nicolas Seichepine worked on multimodal nonnegative matrix factorisation.
  • Aymeric Masurelle worked on the analysis of multimodal dance scenes. He successfully defended his PhD on October 1st, 2015.
  • Rémi Foucard worked on audio autotagging with boosting techniques. He successfully defended his PhD on December 20th, 2013.
  • Sébastien Gulluni worked on interactive electroacoustic music structuring. He successfully defended his PhD on December 20th, 2011.
  • Cyril Joder worked on music audio to score alignment with graphical models. He successfully defended his PhD on September 29th, 2011.
  • Félicien Vallet worked on the automatic structuring of TV talk shows. He successfully defended his PhD on September 21st, 2011.




Ongoing research projects


Here is an overview of the research projects I have been actively involved in.

AHEAD (Augmented HEADphones Experience) is a maturation project whose purpose is to develop an AI-powered headphone system and audio infrastructure to offer spectators an immersive sound experience at sports events.

DiSCogs aims at developing methods inspired by machine learning to solve some of the challenges faced in heterogeneous unconstrained microphone array processing.

StaRel is an interdisciplinary research project which aims to develop innovative technological and music-analytical methods to gain fresh insight into the understanding and modeling of the rhythmic/metrical structure in audio recordings of expressive music performances. The project brings together researchers from Télécom ParisTech; L2S, CNRS, France; Universidade Federal do Rio de Janeiro, Brazil; and Universidad de la República, Uruguay.

Past projects


The Chair Machine Learning for Big Data was a Télécom ParisTech Chair held by Stéphan Clémençon. It aimed at developing methodological research addressing the challenges posed by the statistical analysis of big data.

LASIE (Large Scale Information Exploitation of Forensic Data) provided a set of tools and processes to support law enforcement agents, investigators and analysts in their everyday work. The proposed system was designed to significantly reduce the required investigation time by using automatic processes for analysing multimedia content, together with visual analytics from an inference engine able to highlight otherwise hidden information.

SeNSE (Signaux Socio Emotionnels) focused on the analysis of social and emotional signals exchanged during human-virtual agent and human-human interactions.

The ultimate goal of REVERIE was to provide the means for building a mixed reality space in which real and virtual worlds engage and seamlessly interact in real time, generating compelling and highly realistic immersive environments.

MEEGAPERF (Monitoring EEG pour l'Anticipation des PERFormances) was a French DGA-funded project that focused on real-time human performance monitoring through EEG analysis.

VERVE (Vanquishing fear and apathy through E-inclusion: Personalised and populated Realistic Virtual Environments for clinical, home and mobile platforms) aimed to develop ICT tools to support the treatment of people at risk of social exclusion due to fear and/or apathy associated with a disability. These tools took the form of personalised Virtual Reality (VR) scenarios and serious games specifically designed for therapeutic targets and made broadly available via a novel integration of interactive 3D environments directly into Web browsers. The project carried out research into rendering and simulating personalised and populated VR environments, 3D web graphics, and serious games. These technical efforts were underpinned by clinical/laboratory and industry partners, in liaison with the stakeholders (i.e., participants, carers/family, and health professionals).

"Bringing the Media Internet to Life" - or simply, 3DLife - was a European Union-funded project that aimed to integrate research conducted within Europe in the field of Media Internet. 3DLife's ultimate target was to lay the foundations of a European Competence Centre under the name "Excellence in Media Computing and Communication" or simply EMC2.

Quaero was a collaborative research and development program focused on developing multimedia and multilingual indexing and management tools for professional and general public applications such as the automatic analysis, classification, extraction and exploitation of information.
The research aimed to facilitate the extraction of information from unlimited quantities of multimedia and multilingual documents, including written texts, speech and music audio files, and images and videos.
Quaero was created to respond to new needs for general public and professional use, and to new challenges in multimedia content analysis resulting from the explosion of various information types and sources in digital form, available to everyone via personal computers, television and handheld terminals.

TANGERINE (Theory and applications of nonnegative matrix factorization) was a "young researcher" project funded by the French National Research Agency (ANR) and coordinated by Cédric Févotte.

Older projects



K-Space stands for Knowledge Space of Semantic Inference for Automatic Annotation and Retrieval of Multimedia Content. K-Space integrated leading European research teams to create a Network of Excellence in semantic inference for semi-automatic annotation and retrieval of multimedia content. The aim was to narrow the gap between content descriptors that can be computed automatically by current machines and algorithms, and the richness and subjectivity of semantics in high-level human interpretations of audiovisual media: The Semantic Gap ...


SELIA was a collaborative project on speaker diarization, involving Télécom ParisTech and EURECOM, coordinated by N. Evans.


This project aimed at developing a versatile multimedia indexing and mining platform. This platform, called PLATO, is a Python web application using Ajax techniques and PostgreSQL as a backend database. It is at once: i) an intelligent media library, capable of handling a diversity of multimedia documents (images, sounds, videos and texts) and associated metadata; ii) a repository of research software, processing tools and computation resources (cluster of machines) allowing one to run online experiments; and iii) a set of demonstration and communication tools.


"StAndardisation du Remastering Audio Haute-Définition" was a project with MIST technologies (now known as Audionamix) and the Studios-Copra on high-quality HD remastering of music recordings.


On creating powerful multimedia search, information retrieval and knowledge mining tools.


COdage Hiérarchique et Robuste de sources Audiovisuelles et applications à l'INTErnet (hierarchical and robust coding of audiovisual sources and applications to the Internet).