Information

Type: Thesis/HDR defence
Location: Centre Georges Pompidou, Petite Salle (Paris)
Date: July 8, 2024

Lenny Renault's thesis defence

Lenny Renault, a doctoral student at Sorbonne University, completed his thesis, entitled "Neural Audio Synthesis of Realistic Piano Performances", at the STMS laboratory (Ircam - Sorbonne University - CNRS - Ministry of Culture) in the Sound Analysis and Synthesis team, under the supervision of Axel Roebel, head of the team, and the co-supervision of Rémi Mignot, researcher. His thesis was funded by the European Horizon 2020 project no. 951911 - AI4Media.

The jury is made up of:
- Mark Sandler, Queen Mary University of London, Rapporteur
- Mathieu Lagrange, CNRS, Laboratoire des Sciences du Numérique de Nantes (LS2N), Rapporteur
- Gaël Richard, Laboratoire Traitement et Communication de l'Information (LTCI) - Télécom Paris, Examiner
- Jesse Engel, Google DeepMind, Examiner
- Juliette Chabassier, Modartt, Examiner
- Axel Roebel, STMS Lab, Thesis supervisor

Abstract:

Musician and instrument form a central duo in the musical experience. Inseparable, they are the key actors of the musical performance, transforming a composition into an emotional auditory experience. To this end, the instrument is a sound device that the musician controls to transcribe and share their understanding of a musical work. Access to the sound of such instruments, often the result of advanced craftsmanship, and to the mastery required to play them, can demand extensive resources that limit the creative exploration of composers. This thesis explores the use of deep neural networks to reproduce the subtleties introduced by the musician's playing and by the sound of the instrument, which make the music realistic and alive. Focusing on piano music, the work conducted has led to a sound synthesis model for the piano as well as an expressive performance rendering model.

DDSP-Piano, the piano synthesis model, is built upon the hybrid approach of Differentiable Digital Signal Processing (DDSP), which enables traditional signal processing tools to be embedded in a deep learning model. The model takes symbolic performances as input and explicitly includes instrument-specific knowledge, such as inharmonicity, tuning, and polyphony. This modular, lightweight, and interpretable approach synthesizes sounds of realistic quality while separating the various components that make up the piano sound.

The performance rendering model, in turn, transforms MIDI compositions into expressive symbolic interpretations. In particular, thanks to unsupervised adversarial training, it stands out from previous work by not relying on aligned score-performance training pairs to reproduce expressive qualities.

Combining the sound synthesis and performance rendering models would make it possible to synthesize expressive audio interpretations of scores, while still allowing the generated interpretations to be modified in the symbolic domain.
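To make the DDSP idea described in the abstract more concrete, the sketch below shows a minimal differentiable additive synthesizer whose partial frequencies follow the piano inharmonicity law f_k = k · f0 · sqrt(1 + B · k²). This is an illustrative assumption about how such a signal-processing component can sit inside a deep learning model, not the thesis implementation; all names and values are hypothetical.

```python
# Minimal sketch (not the thesis code): a differentiable inharmonic
# additive synthesizer. Written with tensor operations so that a neural
# network could learn f0, B, and the partial amplitudes from data.
import torch

def inharmonic_synth(f0, B, amplitudes, n_samples, sample_rate=16000):
    """Render one sustained note as a sum of inharmonic partials.

    f0:         fundamental frequency in Hz (scalar tensor)
    B:          inharmonicity coefficient (scalar tensor, ~1e-4 for piano)
    amplitudes: (n_partials,) tensor of per-partial amplitudes
    """
    n_partials = amplitudes.shape[0]
    k = torch.arange(1, n_partials + 1, dtype=torch.float32)
    # Stretched partial frequencies: higher partials are sharper than k * f0.
    freqs = k * f0 * torch.sqrt(1.0 + B * k**2)
    t = torch.arange(n_samples, dtype=torch.float32) / sample_rate
    phases = 2.0 * torch.pi * freqs[:, None] * t[None, :]
    # Silence partials above the Nyquist frequency to avoid aliasing.
    valid = (freqs < sample_rate / 2).float()
    return ((amplitudes * valid)[:, None] * torch.sin(phases)).sum(dim=0)

# Example: an A3 (220 Hz) with mild inharmonicity and decaying partials.
f0 = torch.tensor(220.0)
B = torch.tensor(3e-4, requires_grad=True)   # could be predicted by a network
amps = 0.5 ** torch.arange(12, dtype=torch.float32)
audio = inharmonic_synth(f0, B, amps, n_samples=16000)
audio.sum().backward()                        # gradients flow back to B
```

Because every operation is differentiable, gradients can flow from an audio loss back into whatever network predicts f0, B, and the amplitudes, which is the core of the DDSP hybrid approach.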

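The performance rendering model's training can be illustrated in a similarly hedged way. The sketch below shows a generic adversarial setup with unpaired data: a generator adds expressive deviations to score notes, and a discriminator only ever sees real performances and rendered scores separately, so no aligned score-performance pairs are needed. The feature layout and networks are placeholders, not the architecture from the thesis.

```python
# Minimal sketch (assumed, not the thesis architecture) of unsupervised
# adversarial training for expressive performance rendering.
import torch
import torch.nn as nn

N_FEATS = 4            # e.g. pitch, duration, onset shift, velocity per note
generator = nn.Sequential(nn.Linear(N_FEATS, 64), nn.ReLU(), nn.Linear(64, N_FEATS))
discriminator = nn.Sequential(nn.Linear(N_FEATS, 64), nn.ReLU(), nn.Linear(64, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(score_notes, performance_notes):
    """score_notes and performance_notes are unpaired (batch, N_FEATS) tensors."""
    fake = generator(score_notes)              # rendered expressive notes

    # Discriminator: real performances vs. rendered scores.
    d_loss = bce(discriminator(performance_notes), torch.ones(len(performance_notes), 1)) \
           + bce(discriminator(fake.detach()), torch.zeros(len(fake), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: make rendered notes indistinguishable from real performances.
    g_loss = bce(discriminator(fake), torch.ones(len(fake), 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# Example with random, unpaired batches of 32 notes each.
train_step(torch.randn(32, N_FEATS), torch.randn(32, N_FEATS))
```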