Acoustic SLAM — State of the Art: Review

A review of Simultaneous Localization and Mapping (SLAM) based on acoustic signals.

From Wiki:

With the recent increase in the use of unmanned vehicles (both aerial and ground-based), the ability to navigate autonomously in previously unknown and unexplored locations has become a task of paramount importance. Robots capable of exploring both standard and extreme environments (such as caves and damaged buildings) are receiving increasing attention. One solution that has been considered is Simultaneous Localization and Mapping (SLAM), in which a vehicle simultaneously explores and maps its environment.

Visual SLAM using camera arrays has received widespread attention in both academia and industry, partially due to the rapid improvement of computer vision technology. Cameras, however, are limited by several factors, such as their considerable computational demand and their inability to operate under harsh lighting conditions, which can be a severe limitation for some missions.

An alternative to camera-based sensing is to use acoustic sensors, which operate in a sonar-like fashion. Acoustic SLAM (aSLAM) has received limited attention, but some work has been conducted in this respect, especially with ground-based robots (such as the work by Evers et al. [1]). It is important to note that acoustic SLAM is also subject to some limitations, such as low resolution (especially compared to visual instrumentation), but its computational requirements are much lower. The review is divided into three parts: (a) a general and brief background to the SLAM problem; (b) state-of-the-art methods used today for acoustic SLAM; and (c) a review of modern industrial applications of aSLAM.


This review contains an extensive literature survey of acoustic SLAM in light of its current industrial applications. In SLAM, an object (robot, car, etc.) moves along an unknown trajectory [2] in an unknown environment, with the goal of mapping its surroundings. A SLAM algorithm is used to reconstruct the environment map and the object's trajectory. In most cases, the map is specified by known landmarks, so reconstructing the map is equivalent to establishing the locations of these landmarks relative to the object. The problem and various methods for solving it have been previously summarized by Durrant-Whyte and Bailey [3]-[4], as well as by Cadena et al. [5] and Thrun et al. [6].

Two classical algorithms for solving the general SLAM problem are the Extended Kalman Filter (EKF-SLAM) [7] and the Factored Solution to SLAM (FastSLAM) [8]. The EKF is a well-known navigation and control algorithm that handles the non-linearity of the problem by linearizing around the current estimate. The FastSLAM algorithm, on the other hand, utilizes a Rao-Blackwellized [9] representation of the posterior, integrating particle-filter and Kalman-filter representations. Currently, these two approaches form the heart of the field of SLAM.
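As a rough illustration of the Kalman update at the heart of EKF-SLAM, the following sketch updates a single one-dimensional landmark estimate from a range measurement. It is an illustrative toy, not the algorithm of any cited work: the function name and all numerical values are hypothetical, and a real EKF-SLAM state would jointly contain the robot pose and all landmarks.

```python
def kalman_landmark_update(m, P, z, x_robot, R):
    """One Kalman update of a 1-D landmark estimate m (with variance P)
    from a range measurement z taken at a known robot position x_robot.
    The measurement model h(m) = m - x_robot is linear here; EKF-SLAM
    performs the same update after linearizing a nonlinear h via its Jacobian."""
    innovation = z - (m - x_robot)   # measurement residual
    S = P + R                        # innovation covariance
    K = P / S                        # Kalman gain
    return m + K * innovation, (1.0 - K) * P

# Hypothetical usage: a vague prior (P=100) sharpened by one 5.2 m ranging.
m1, P1 = kalman_landmark_update(m=0.0, P=100.0, z=5.2, x_robot=0.0, R=0.25)
```

In FastSLAM, by contrast, the robot trajectory is sampled by a particle filter, and each particle carries independent low-dimensional Kalman filters like the one above, one per landmark.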

State-of-the-Art Methods for Acoustic SLAM

In acoustic SLAM, many methods work well in free space but are more challenging to deploy in indoor environments due to echoes from landmarks.

DoA Estimation using Particle Filter

Evers and Naylor [1] presented an algorithm based on an observer equipped with microphone arrays for exploring and learning acoustic landmarks in a given environment. The algorithm is based on a probabilistic triangulation of the source positions, accomplished by estimating the Direction-of-Arrival (DoA) of each source and by exploiting the observer's spatiotemporal diversity, maximized by using Direct-Path Dominance MUltiple SIgnal Classification (DPD-MUSIC) [10]. For each time step, as suggested by Botev et al. [11], the continuous single-source PDF (probability density function) is approximated by kernel density estimation over the samples, and the multi-source PDF is approximated by its Probability Hypothesis Density (PHD). The study suggested fusing motion reports and source maps by a probabilistic method known as the particle filter (also called Sequential Monte Carlo) [12], an approach applied to audio by Andrieu and Godsill [13]. In their experimental study, they achieved localization and mapping results that were robust to DoA estimation errors [1].
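To make the filtering step concrete, the following is a minimal bootstrap particle filter (sequential Monte Carlo) sketch for a scalar state such as a source bearing. It is an illustrative toy under a random-walk motion model and Gaussian measurement noise, not the PHD-based multi-source filter of [1]; all names and parameter values are hypothetical.

```python
import math
import random

def bootstrap_pf(measurements, n=2000, meas_std=0.1, proc_std=0.01):
    """Bootstrap particle filter for a slowly drifting scalar state,
    e.g. a source bearing in radians, observed with Gaussian noise."""
    # initialize particles uniformly over all possible bearings
    particles = [random.uniform(-math.pi, math.pi) for _ in range(n)]
    for z in measurements:
        # predict: propagate each particle with random-walk process noise
        particles = [p + random.gauss(0.0, proc_std) for p in particles]
        # update: weight each particle by the Gaussian measurement likelihood
        weights = [math.exp(-0.5 * ((z - p) / meas_std) ** 2) for p in particles]
        # resample: draw a new particle set proportional to the weights
        particles = random.choices(particles, weights=weights, k=n)
    # point estimate: posterior mean over the particle cloud
    return sum(particles) / n

# Hypothetical usage: noisy bearing observations of a source at 0.5 rad.
random.seed(0)
obs = [0.5 + random.gauss(0.0, 0.1) for _ in range(50)]
estimate = bootstrap_pf(obs)
```

The resampling step concentrates particles in regions of high likelihood, which is what allows the filter to represent the non-Gaussian posteriors that arise in reverberant acoustic scenes.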

Geometry Reconstruction based on Microphone Array

Dokmanic et al. [14] showed a room geometry reconstruction based on acoustic instrumentation (specifically, a microphone array). They formulated the model as a constrained optimization problem without using a probabilistic approach, but their solution is limited to a few microphones. Ba et al. [15] suggested a novel room modeling algorithm based on a circular microphone array. They presented experimental results in which the mapping was accomplished almost perfectly, and the distances to the walls of the room were determined with high precision.
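The geometric core of such echo-based methods can be sketched with the simplest first-order reflection model: if the time delay of a wall echo can be measured, the wall distance follows from the round trip at the speed of sound. The function name and values below are hypothetical; real systems must additionally disambiguate which echo belongs to which wall.

```python
def wall_distance(echo_delay_s, c=343.0):
    """Distance to a reflecting wall from a first-order echo delay:
    the pulse travels to the wall and back, so d = c * tau / 2.
    c is the speed of sound in air in m/s at roughly 20 degrees C."""
    return c * echo_delay_s / 2.0

# e.g. a 10 ms echo round trip implies a wall about 1.7 m away
d = wall_distance(0.010)
```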


Furukawa et al. [16] reported on acoustic-based localization using an Unmanned Aerial Vehicle (UAV). They developed a method for improving acoustic source localization by installing a microphone array on a multi-rotor UAV. They identified the self-noise emitted by the vehicle's propellers as the main problem and addressed it with Generalized Eigenvalue Decomposition (GEVD) based MUSIC.
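The underlying MUSIC principle can be sketched as follows for a uniform linear array (assuming NumPy; all names and values are hypothetical). This is plain MUSIC; the GEVD variant used in [16] additionally whitens the covariance with an estimated noise correlation matrix before the eigendecomposition, suppressing the rotor self-noise.

```python
import numpy as np

def music_doa(R, n_sources, n_mics, spacing, wavelength, grid):
    """Plain MUSIC pseudospectrum over candidate angles (radians)
    for a uniform linear array with the given element spacing."""
    # eigenvectors of the smallest eigenvalues span the noise subspace
    _, V = np.linalg.eigh(R)            # eigenvalues in ascending order
    En = V[:, : n_mics - n_sources]
    m = np.arange(n_mics)
    spec = []
    for theta in grid:
        # array steering vector for a plane wave from angle theta
        a = np.exp(-2j * np.pi * spacing / wavelength * m * np.sin(theta))
        # peaks where the steering vector is orthogonal to the noise subspace
        spec.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return np.array(spec)

# Hypothetical usage: one source at 0.3 rad, half-wavelength spacing, low noise.
M, lam = 8, 1.0
a0 = np.exp(-2j * np.pi * 0.5 * np.arange(M) * np.sin(0.3))
R = np.outer(a0, a0.conj()) + 0.01 * np.eye(M)   # ideal rank-1 covariance + noise
grid = np.linspace(-np.pi / 2, np.pi / 2, 181)
theta_hat = grid[np.argmax(music_doa(R, 1, M, 0.5, lam, grid))]
```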

In 2017, Krekovic, Dokmanic, and Vetterli reconstructed the trajectory and room geometry based on echoes captured by a static microphone array. Their work was inspired by bats, which fly indoors by listening to the echoes of their chirps [17]. Lollmann et al. [18] introduced the data corpus and experimental results of the IEEE Audio and Acoustic Signal Processing (AASP) challenge on source LOcalization and TrAcking (LOCATA). The audio was recorded using planar, spherical, and pseudo-spherical microphone arrays, and the experiments included static and moving arrays recording moving human talkers and loudspeakers.


[1] Christine Evers and Patrick A Naylor. Acoustic slam. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(9):1484–1498, 2018.

[2] MWM Gamini Dissanayake, Paul Newman, Steve Clark, Hugh F Durrant-Whyte, and Michael Csorba. A solution to the simultaneous localization and map building (slam) problem. IEEE Transactions on Robotics and Automation, 17(3):229–241, 2001.

[3] Hugh Durrant-Whyte and Tim Bailey. Simultaneous localization and mapping: part i. IEEE Robotics & Automation Magazine, 13(2):99–110, 2006.

[4] Tim Bailey and Hugh Durrant-Whyte. Simultaneous localization and mapping (slam): Part ii. IEEE Robotics & Automation Magazine, 13(3):108–117, 2006.

[5] Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, José Neira, Ian Reid, and John J Leonard. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Transactions on Robotics, 32(6):1309–1332, 2016.

[6] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic robotics (Intelligent Robotics and Autonomous Agents series). MIT Press, 2005.

[7] Andrew H Jazwinski. Stochastic processes and filtering theory. Courier Corporation, 2007.

[8] Michael Montemerlo, Sebastian Thrun, Daphne Koller, Ben Wegbreit, et al. Fastslam: A factored solution to the simultaneous localization and mapping problem. AAAI/IAAI, pages 593–598, 2002.

[9] Arnaud Doucet, Nando De Freitas, Kevin Murphy, and Stuart Russell. Rao-Blackwellised particle filtering for dynamic bayesian networks. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 176–183. Morgan Kaufmann Publishers Inc., 2000.

[10] Or Nadiri and Boaz Rafaely. Localization of multiple speakers under high reverberation using a spherical microphone array and the direct-path dominance test. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10):1494–1505, 2014.

[11] Zdravko I Botev, Joseph F Grotowski, Dirk P Kroese, et al. Kernel density estimation via diffusion. The Annals of Statistics, 38(5):2916–2957, 2010.

[12] Pierre Del Moral. Nonlinear filtering: Interacting particle resolution. Comptes Rendus de l'Académie des Sciences - Série I - Mathématique, 325(6):653–658, 1997.

[13] C Andrieu and SJ Godsill. A particle filter for model based audio source separation. In Proceedings of the International Workshop on Independent Component Analysis and Blind Signal Separation (ICA 2000), 2000.

[14] Ivan Dokmanić, Laurent Daudet, and Martin Vetterli. From acoustic room reconstruction to slam. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6345–6349. IEEE, 2016.

[15] Demba Ba, Flávio Ribeiro, Cha Zhang, and Dinei Florêncio. L1 regularized room modeling with compact microphone arrays. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 157–160. IEEE, 2010.

[16] Koutarou Furukawa, Keita Okutani, Kohei Nagira, Takuma Otsuka, Katsutoshi Itoyama, Kazuhiro Nakadai, and Hiroshi G Okuno. Noise correlation matrix estimation for improving sound source localization by multirotor uav. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3943–3948. IEEE, 2013.

[17] Miranda Kreković, Ivan Dokmanić, and Martin Vetterli. Omnidirectional bats, point-to-plane distances, and the price of uniqueness. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3261–3265. IEEE, 2017.

[18] Heinrich W Löllmann, Christine Evers, Alexander Schmidt, Heinrich Mellmann, Hendrik Barfuss, Patrick A Naylor, and Walter Kellermann. The locata challenge data corpus for acoustic source localization and tracking. In 2018 IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM), pages 410–414. IEEE, 2018.