Information

Type: Workshop / Training
Location: Ircam, Salle Igor-Stravinsky (Paris)
Date: December 6, 2024

Knowledge of the geometric properties of a room can be highly beneficial for many audio applications, including sound source localization, sound reproduction, and augmented and virtual reality. Room geometry inference (RGI) addresses the problem of localizing acoustic reflectors from room impulse responses (RIRs) recorded between loudspeakers and microphones.

Rooms with highly absorptive walls, or walls far from the measurement setup, pose challenges for RGI methods. In the first part of the talk, we present a data-driven method to jointly detect and localize acoustic reflectors corresponding to nearby and/or reflective walls. We employ a multi-branch convolutional recurrent neural network whose input is a time-domain acoustic beamforming map, obtained via the Radon transform from multichannel room impulse responses. We propose a modified loss function that forces the network to pay more attention to walls that can be estimated with small error. Simulation results show that the proposed method detects nearby and/or reflective walls and improves localization performance for the detected walls.
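The beamforming map at the heart of this pipeline can be illustrated with a simplified time-domain delay-and-sum sketch. Note that the talk describes a Radon-transform formulation; the linear-array geometry, sampling rate, and function names below are illustrative assumptions, not the presented system:

```python
import numpy as np

def beamforming_map(rirs, mic_positions, angles, fs, c=343.0):
    """Time-domain delay-and-sum map: rows = steering angles, cols = time.

    rirs: (n_mics, n_samples) multichannel room impulse responses
    mic_positions: (n_mics,) positions along a linear array (metres)
    angles: candidate plane-wave arrival angles (radians)
    fs: sampling rate (Hz); c: speed of sound (m/s)
    """
    n_mics, n_samples = rirs.shape
    bf_map = np.zeros((len(angles), n_samples))
    for a_idx, theta in enumerate(angles):
        acc = np.zeros(n_samples)
        for m in range(n_mics):
            # integer-sample delay of a plane wave from direction theta
            delay = int(round(mic_positions[m] * np.sin(theta) / c * fs))
            # compensate the delay so reflections from theta align in time
            acc += np.roll(rirs[m], -delay)
        bf_map[a_idx] = acc / n_mics
    return bf_map
```

Peaks in such a map correspond to strong reflections arriving from a given direction at a given time, which is the kind of structure the network's convolutional branches can exploit.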

Data-driven RGI methods generally rely on simulated data, since measuring RIRs in a diverse set of rooms is a prohibitively time-consuming and labor-intensive task. In the second part of the talk, we explore regularization methods to improve RGI accuracy when deep neural networks are trained on simulated data and tested on measured data. We use a smart speaker prototype equipped with multiple microphones and directional loudspeakers for real-world RIR measurements. The results indicate that applying dropout at the network’s input layer improves generalization compared to using it solely in the hidden layers. Moreover, RGI using multiple directional loudspeakers increases estimation accuracy compared to the single-loudspeaker case, mitigating the impact of source directivity.
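Input-layer dropout of the kind discussed here is standard inverted dropout applied directly to the input features, which acts as a noise-injection augmentation at training time. A minimal sketch, with function name and dropout rate as illustrative assumptions rather than the talk's exact configuration:

```python
import numpy as np

def input_dropout(batch, p=0.2, rng=None, training=True):
    """Inverted dropout applied at the network's input layer.

    During training, each input feature is zeroed with probability p and
    the surviving features are scaled by 1/(1-p), so the expected value
    of the input is unchanged; at test time the input passes through.
    """
    if not training or p == 0.0:
        return batch
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(batch.shape) >= p  # True = keep this feature
    return batch * mask / (1.0 - p)
```

Randomly zeroing input features forces the network not to over-rely on any single simulated cue, which is one plausible reason it helps bridge the simulated-to-measured domain gap.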


Methodological advances for Audio Augmented Reality and its applications

As part of the project HAIKUS (ANR-19-CE23-0023), funded by the French national research agency, IRCAM, LORIA and IJLRA organized a one-day workshop focusing on methodological advances for Audio Augmented Reality and its applications.

Audio Augmented Reality (AAR) seeks to integrate computer-generated and/or pre-recorded auditory content into the listener's real-world environment. Hearing plays a vital role in understanding and interacting with our spatial environment. Convincing spatial audio significantly enhances immersion and increases user engagement in Augmented Reality (AR) applications, particularly in the artistic creation, cultural mediation, entertainment, and communication industries.

Audio-signal processors are a key component of the AAR workflow, as they provide real-time control of 3D sound spatialization and artificial reverberation applied to virtual sound events. These tools have now reached maturity, supporting large multichannel loudspeaker systems as well as binaural rendering over headphones. However, the accuracy of the spatial processing applied to virtual sound objects is essential to their seamless integration into the listener's real environment, and hence to a high-quality user experience. To achieve this level of integration, methods are needed to identify the acoustic properties of the environment and adjust the spatialization engine's parameters accordingly. Ideally, such methods should infer the characteristics of the acoustic channel automatically, based solely on live recordings of the natural, often dynamic, sounds present in the real environment (e.g. voices, noise, ambient sounds, moving sources). These topics are gaining increasing attention, especially in light of recent advances in data-driven approaches within acoustics. In parallel, perceptual studies are being conducted to define the level of accuracy required to guarantee a coherent sound experience.
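As one concrete example of inferring an acoustic property to feed a reverberation engine, a classical building block is reverberation-time (RT60) estimation from a measured RIR via Schroeder backward integration. This is a textbook method sketched here for illustration, not the approach developed in HAIKUS, and the decibel fit range is an assumed convention:

```python
import numpy as np

def schroeder_edc_db(rir):
    """Energy decay curve in dB via Schroeder backward integration."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]  # remaining energy at each sample
    return 10.0 * np.log10(energy / energy[0])

def estimate_rt60(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """Estimate RT60 from the slope of the -5..-25 dB EDC segment."""
    edc = schroeder_edc_db(rir)
    idx = np.where((edc <= lo_db) & (edc >= hi_db))[0]
    t = idx / fs
    slope, _ = np.polyfit(t, edc[idx], 1)  # decay rate in dB per second
    return -60.0 / slope  # time to decay by 60 dB
```

A spatialization engine could use such an estimate to match the decay of its artificial reverberation to the listener's actual room.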

Organising committee: Antoine Deleforge (INRIA), François Ollivier (MPIA-IJLRA), Olivier Warusfel (IRCAM)

