3D-Sound-Localization

Quaternion Neural Networks for 3D Sound Source Localization: Implementation using First Order Ambisonics.

The objective of our work is to build a deep quaternion neural network (DQNN) that works with First Order Ambisonics (FOA) datasets. In particular, we extend DQNN so that it supports both the pre-existing datasets (ansim, resim, etc.) and the FOA dataset in a modular and efficient way. In addition, we added further metrics, such as the SELD score used to evaluate the results in the 2019 paper, and a small library for plotting the results.
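
The SELD score combines detection and localization errors into a single number. The repository's own implementation is not reproduced here. The sketch below assumes the DCASE 2019 Task 3 definition, in which the SED error rate (ER), the F1 score, the DOA error in degrees and the DOA frame recall (FR) are averaged as (ER + (1 - F1) + DOA/180 + (1 - FR)) / 4, so 0 is a perfect score. The function names are hypothetical.

```python
from statistics import mean


def sed_score(error_rate: float, f1: float) -> float:
    """Detection part: mean of the error rate and (1 - F1). Lower is better."""
    return mean([error_rate, 1.0 - f1])


def doa_score(doa_error_deg: float, frame_recall: float) -> float:
    """Localization part: DOA error normalized by 180 degrees, averaged with (1 - frame recall)."""
    return mean([doa_error_deg / 180.0, 1.0 - frame_recall])


def seld_score(error_rate: float, f1: float, doa_error_deg: float, frame_recall: float) -> float:
    """DCASE 2019-style SELD score (assumed definition): mean of the four normalized terms."""
    return mean([error_rate, 1.0 - f1, doa_error_deg / 180.0, 1.0 - frame_recall])


if __name__ == "__main__":
    # Illustrative inputs only, not results from this project.
    print(seld_score(error_rate=0.2, f1=0.8, doa_error_deg=36.0, frame_recall=0.9))
```

Because each partial score is itself a two-term mean, under this definition the SELD score is also the average of `sed_score` and `doa_score`.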

Usage

This project can be run with either of the two provided notebooks:

The second notebook lets you work with a pre-loaded, pre-extracted dataset (~200 GB).

Model metrics CSV table

An overview of the columns in our CSV metrics files.

| Column | Description |
| --- | --- |
| A | training_loss |
| B | validation_loss |
| C | sed_loss_er |
| D | sed_loss_f1 |
| E | doa_loss_avg_accuracy |
| F | doa_loss_gt |
| G | doa_loss_pred |
| H | doa_loss_gt_cnt |
| I | doa_loss_pred_cnt |
| J | doa_loss_good_frame_cnt |
| K | sed_score |
| L | doa_score |
| M | seld_score |
| N | sed_confidence_interval_low |
| O | sed_confidence_interval__up |
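
A minimal sketch for loading one of these files with Python's standard `csv` module. It assumes, without confirmation from the repository, that each file has no header row, one row per epoch, and comma-separated columns in the A to O order listed above. `METRIC_COLUMNS`, `read_metrics` and `best_epoch` are hypothetical names, and the sample rows are made-up values used only to exercise the parser.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names in the A..O order of the table (assumed file layout).
METRIC_COLUMNS = [
    "training_loss", "validation_loss",
    "sed_loss_er", "sed_loss_f1",
    "doa_loss_avg_accuracy", "doa_loss_gt", "doa_loss_pred",
    "doa_loss_gt_cnt", "doa_loss_pred_cnt", "doa_loss_good_frame_cnt",
    "sed_score", "doa_score", "seld_score",
    "sed_confidence_interval_low", "sed_confidence_interval__up",
]


def read_metrics(stream: TextIO) -> List[Dict[str, float]]:
    """Parse a headerless metrics CSV into one {column: value} dict per row."""
    rows = []
    for record in csv.reader(stream):
        if not record:
            continue  # skip blank lines
        # Columns beyond O, if any, are ignored by this sketch.
        values = [float(v) for v in record[: len(METRIC_COLUMNS)]]
        rows.append(dict(zip(METRIC_COLUMNS, values)))
    return rows


def best_epoch(rows: List[Dict[str, float]], key: str = "seld_score") -> int:
    """Return the index of the row with the lowest value of `key` (lower SELD is better)."""
    return min(range(len(rows)), key=lambda i: rows[i][key])


if __name__ == "__main__":
    # Two made-up rows, purely to exercise the parser.
    sample = io.StringIO(
        "0.50,0.60,0.45,0.55,0.70,10,12,100,98,80,0.45,0.42,0.40,0.38,0.52\n"
        "0.40,0.50,0.35,0.65,0.75,11,12,100,99,85,0.35,0.30,0.30,0.28,0.42\n"
    )
    metrics = read_metrics(sample)
    i = best_epoch(metrics)
    print(i, metrics[i]["seld_score"])
```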