Audio Language Models

Audio analysis and audio synthesis require modeling complex, long-term phenomena and have historically been tackled asymmetrically, with dedicated analysis models that differ from their synthesis counterparts. In this presentation, we will introduce audio language models, a recent approach aimed at overcoming these limitations. By discretizing audio signals with a neural audio codec, we can frame both audio generation and audio understanding as similar autoregressive sequence-to-sequence tasks, building on the well-established Transformer architecture used in language modeling. This approach unlocks novel capabilities such as textless speech modeling, zero-shot voice conversion, text-to-music generation and even real-time spoken dialogue. We will also show how integrating analysis and synthesis within a single model yields versatile audio models that can handle a wide range of tasks with audio as input or output. We will conclude by highlighting the promise of these models and discussing the key challenges that lie ahead in their development.
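The key step described above, turning audio into discrete tokens that a Transformer can model, can be illustrated with a toy sketch. Neural audio codecs such as SoundStream and EnCodec discretize their encoder outputs with residual vector quantization (RVQ); the sketch below assumes that scheme, but it uses random codebooks in place of learned ones and random vectors in place of real encoder outputs. All helper names (`rvq_encode`, `rvq_decode`, `flatten`, and so on) are hypothetical. The final `flatten` step shows just one simple way to turn the frames-by-stages token grid into a single sequence for autoregressive modeling; published systems use several different interleaving or hierarchical schemes.

```python
import math
import random

# Hypothetical toy sizes: latent dimension, entries per codebook, quantizer stages.
DIM, CODEBOOK_SIZE, NUM_STAGES = 4, 16, 3


def make_codebooks(seed=0):
    """Random codebooks standing in for learned ones (illustration only)."""
    rng = random.Random(seed)
    books = []
    for _ in range(NUM_STAGES):
        # Codeword 0 is the zero vector, so no stage can increase the residual.
        book = [[0.0] * DIM]
        book += [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE - 1)]
        books.append(book)
    return books


def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(c, vec)) for c in codebook]
    return min(range(len(codebook)), key=dists.__getitem__)


def rvq_encode(frame, codebooks):
    """Each stage quantizes the residual left over by the previous stages."""
    residual, tokens = list(frame), []
    for cb in codebooks:
        idx = nearest(cb, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Sum the chosen codewords; fewer tokens means using fewer stages."""
    out = [0.0] * DIM
    for idx, cb in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


def error(frame, tokens, codebooks):
    """Euclidean reconstruction error of a frame from (possibly truncated) tokens."""
    rec = rvq_decode(tokens, codebooks)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(frame, rec)))


def flatten(token_frames, codebook_size):
    """Frame-major flattening; stage s ids are offset by s * codebook_size
    so one shared Transformer vocabulary can hold every stage's tokens."""
    return [stage * codebook_size + idx
            for frame_tokens in token_frames
            for stage, idx in enumerate(frame_tokens)]


codebooks = make_codebooks()
rng = random.Random(1)
# Stand-ins for encoder outputs: one DIM-dimensional latent per audio frame.
frames = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
token_frames = [rvq_encode(f, codebooks) for f in frames]
sequence = flatten(token_frames, CODEBOOK_SIZE)
# Reconstruction error of the first frame using 0, 1, ..., NUM_STAGES stages.
errors = [error(frames[0], token_frames[0][:n], codebooks) for n in range(NUM_STAGES + 1)]
print(token_frames)
print(sequence)
print(errors)
```

Because each stage only encodes what earlier stages missed, dropping the last stages trades fidelity for bitrate; in this toy, the zero codeword guarantees that `errors` never increases as stages are added. An autoregressive Transformer would then be trained to predict `sequence` one token at a time.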

From the same archive

Introduction to the GdR IASIS study day dedicated to audio synthesis - Thomas Hélie, Mathieu Lagrange

Poster sessions - Clara Boukhemia, Samir Sadok, Amandine Brunetto, Haoran Sun, Vincent Lostanlen, Morgane Buisson, Xiran Zhang, Reyhaneh Abbasi, Ainė Drėlingytė, Étienne Paul André, Yuexuan Kong, Étienne Bost, Axel Marmoret, Javier Nistal, Hugo Pauget Ballesteros

AI in 64Kbps: Lightweight neural audio synthesis for embedded instruments - Philippe Esling

Music sound synthesis using machine learning - Fanny Roche

Grey-box modelling informed by physics: Application to commercial digital audio effects - Judy Najnudel

Hybrid deep learning for music analysis and synthesis - Gaël Richard

Invariance learning for a music indexing robust to sound modifications - Rémi Mignot

Basic Pitch: A lightweight model for multi-pitch, note and pitch bend estimations in polyphonic music - Rachel Bittner

GDR ISIS, Methods and models in signal processing, Introduction

Labeling a Large Music Catalog - Romain Hennequin

IRCAM