A DEEP LEARNING FRAMEWORK FOR MUSICAL ACOUSTICS SIMULATIONS

Abstract

The acoustic modeling of musical instruments is a computationally heavy process, often requiring the solution of complex systems of partial differential equations (PDEs). Numerical models can achieve a high level of accuracy, but they may take up to several hours to complete a full simulation, especially in the case of intricate musical mechanisms. The application of deep learning, and in particular of neural operators that learn mappings between function spaces, has the potential to revolutionize how acoustics PDEs are solved and to noticeably speed up musical simulations. However, such operators require large datasets capable of exemplifying the relationship between input parameters (excitation) and output solutions (acoustic wave propagation) for each target musical instrument/configuration. With this work, we present an open-access, open-source framework designed for the generation of numerical musical acoustics datasets and for the training/benchmarking of acoustics neural operators. We first describe the overall structure of the framework and the proposed data generation workflow. Then, we detail the first numerical models that were ported to the framework. Finally, we conclude by sharing some preliminary results obtained by training a state-of-the-art neural operator with a dataset generated via the framework. This work is a first step towards the gathering of a research community that focuses on deep learning applied to musical acoustics, and shares workflows and benchmarking tools.

1. INTRODUCTION

The study of the acoustics of musical instruments is a challenging topic. The physical phenomena underlying music making are quite varied, and include excitation, resonant behavior, as well as the coupling and the dynamic modification of the involved mechanical parts. These make musical instruments remarkable examples of engineering, but also acoustic systems that are difficult to model. The most accurate simulations available today leverage the numerical solution of partial differential equations (PDEs), which are in turn designed to model the specific acoustic behavior of the targeted instruments (Bilbao, 2009). Unfortunately, the majority of the employed solvers are characterized by heavy computational requirements, often leading to restrictive implementation conditions (e.g., low spatio-temporal resolution, high degree of model simplification, non-interactive paradigms). Recent advancements in deep learning have shown how neural networks may be used to enhance or even replace traditional PDE solvers (Bhatnagar et al., 2019), with the aim of improving performance. In particular, the use of neural operators has yielded promising results in fluid dynamics (Li et al., 2020), suggesting that their application may be successfully extended to revolutionize the simulation of the acoustics and aeroacoustics of musical instruments. Being completely data-driven, neural operators could be trained to solve acoustics PDEs with synthetic datasets, generated via the large array of traditional numerical implementations available in the literature [1]. Although exciting, this scenario is hindered by a lack of the common practices needed to bridge the domains of musical acoustics and deep learning. These include shared datasets and benchmarks, as well as general tools to help researchers categorize, manage, and employ acoustics data for training and inference.
The aim of our research is to foster the rapid growth of an active community where these common practices could be discussed and formalized, along with the overall emerging field of deep learning-based musical acoustics. In line with this mission, in this work we present the Neuralacoustics framework, a collection of open-access/open-source scripts and tools designed to address the aforementioned needs. In particular, we provide an in-depth description of the dataset generation workflow proposed as part of the framework, and we introduce the first numerical models available in it. We also discuss preliminary results obtained by training a state-of-the-art neural operator for the solution of a simple acoustics problem, using exclusively the tools available in the framework.

2. BACKGROUND

Musical Acoustic Simulations. In the musical domain, the practice of designing mathematical models of instruments is often referred to as physical modeling synthesis. Common techniques include modal synthesis (Causse et al., 2011) and digital waveguides (Smith, 1992). Yet, the most precise techniques rely on numerical analysis (Castagné & Cadoz, 2003) (e.g., finite elements, finite differences). Numerical models implement solvers of PDE systems; they can finely simulate fundamental aspects of musical acoustics, like wave propagation and aeroacoustics, as well as physical phenomena beyond instruments and music (Yokota et al., 2002; Arnela & Guasch, 2014). The downside of numerical approaches lies in the computational load of the resulting models, as well as in the number of parameters they must comprise to properly simulate an instrument's behavior. Of particular interest to our work is the case of time-domain simulations of musical instruments (Bilbao, 2009). In this context, the PDEs solved by the models describe the relationship between the previous and next states of the instrument, organized over discrete time steps. Other than taking into account time-varying acoustic excitation of the instruments, this approach potentially enables the design of interactive models. Despite the high computational requirements of numerical analysis, real-time interactive models of musical instruments have been designed in recent years (Sosnick & Hsu, 2010; Allen & Raghuvanshi, 2015; Zappi et al., 2017). Unfortunately, this approach relies on expensive dedicated hardware (GPUs), and implementations are characterized by noticeable technical constraints that limit access to the models' parameters and interaction (Renney et al., 2022).
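To make the time-stepping idea concrete, the following is a minimal, illustrative sketch (not part of the framework itself) of an explicit finite-difference time-domain (FDTD) solver for the 2D wave equation on a square membrane with fixed boundaries. The function name `simulate_membrane` and all parameter values are hypothetical, chosen only to show how a new state is computed from the two previous states at each discrete time step:

```python
import numpy as np

def simulate_membrane(excitation, c=1.0, dt=1e-3, dx=1e-2, steps=100):
    """Explicit FDTD update for the 2D wave equation
    u_tt = c^2 (u_xx + u_yy), with fixed (Dirichlet) boundaries.
    `excitation` is the initial displacement field (a 2D array)."""
    lam2 = (c * dt / dx) ** 2  # squared Courant number; must stay <= 0.5 for 2D stability
    u_prev = np.array(excitation, dtype=float)
    u = u_prev.copy()
    frames = [u.copy()]
    for _ in range(steps):
        # discrete Laplacian via neighbor sums
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
               + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        # next state depends on the current and previous states
        u_next = 2 * u - u_prev + lam2 * lap
        # clamp the edges (fixed boundary)
        u_next[0, :] = u_next[-1, :] = u_next[:, 0] = u_next[:, -1] = 0.0
        u_prev, u = u, u_next
        frames.append(u.copy())
    return np.stack(frames)  # shape: (steps + 1, H, W)

# example: strike a 32x32 membrane at its centre
exc = np.zeros((32, 32))
exc[16, 16] = 1.0
frames = simulate_membrane(exc, steps=50)
```

Each iteration of the loop corresponds to one discrete time step of the simulation; the full `frames` tensor (time x space x space) is exactly the kind of input/output pair a data-driven solver can be trained on.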
As a result, numerical analysis is for the most part employed to model simple musical systems [2] (Bilbao et al., 2019), or for batch (i.e., non-real-time) simulations (Bilbao & Chick, 2013; Arnela & Guasch, 2014) that may require run-times of several hours. In both cases, the applicability as well as the intelligibility of the resulting models are heavily hindered.

Deep Learning and PDE Solvers. Recently, deep learning has been successfully explored for the generation of PDE solvers describing time-dependent problems (Blechschmidt & Ernst, 2021; Li et al., 2020). These neural solvers may reduce the overall computational requirements of traditional ones, while approximating their output with a remarkable degree of precision. One of the simplest examples of neural solvers consists of deep convolutional neural networks that parametrize the operator mapping the inputs and outputs (i.e., solutions) of the PDEs (Bhatnagar et al., 2019; Khoo et al., 2021). The limitation of this approach lies in its dependence on the chosen mesh: it is not possible to compute solutions outside the discretization grid used for training. Physics-informed neural networks address this issue, as they are mesh-independent and designed to work alongside classical schemes (e.g., Runge-Kutta) (Raissi et al., 2019). They are capable of addressing problems in the small-data setting and with high dimensionality (Blechschmidt & Ernst, 2021), and they are often employed to solve time-dependent PDEs that share many similarities with those modeling musical acoustics, e.g., the Navier-Stokes equations (Rudy et al., 2017; Font et al., 2021; Cai et al., 2022). However, being only partially data-driven, this approach requires tailoring the network to a specific instance of the PDEs and retraining for each new input. Most of the individual advantages of the approaches introduced so far are combined in neural operators (Li et al., 2020).
Neural operators are mesh-free operators that require no prior knowledge of the underlying PDEs. They learn mappings between infinite-dimensional spaces of functions relying only on a finite collection of observations, and they can be used without retraining to solve PDEs
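The mesh-free property can be illustrated with the spectral convolution at the core of the Fourier Neural Operator (Li et al., 2020): the input field is transformed to the Fourier domain, a truncated set of low-frequency modes is multiplied by learned complex weights, and the result is transformed back. The sketch below is a simplified, NumPy-only illustration (the function name `spectral_conv_2d` and the random "learned" weights are hypothetical, not an actual FNO implementation); because the weights act on a fixed number of Fourier modes, the same layer can be applied to grids of any resolution:

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_conv_2d(u, weights, modes=8):
    """One FNO-style spectral convolution on a scalar 2D field:
    real FFT -> scale the lowest `modes` x `modes` frequency
    coefficients by complex weights -> inverse real FFT."""
    u_hat = np.fft.rfft2(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes, :modes] = u_hat[:modes, :modes] * weights
    return np.fft.irfft2(out_hat, s=u.shape)

# hypothetical "learned" weights, randomly initialized for illustration
W = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))

# the same weights apply to a coarse grid...
out_coarse = spectral_conv_2d(rng.normal(size=(64, 64)), W)
# ...and, unchanged, to a finer one: the layer is resolution-independent
out_fine = spectral_conv_2d(rng.normal(size=(128, 128)), W)
```

In a full neural operator, several such spectral layers are interleaved with pointwise linear maps and nonlinearities, and the weights are fit to input/output function pairs such as those produced by a numerical simulation.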

[1] In this scenario the only constraint would be computational time: an affordable caveat when generating training sets.
[2] These numerical models can be deemed as "simple" only if compared to the complexity of actual acoustic instruments.

