Annual: 2019

AS026 »
iOwlT: Sound Geolocalization System
📁Machine Learning
👤Matheus Farias
 (Universidade Federal de Pernambuco (UFPE))
📅Oct 08, 2019
Regional Final
Community Award





Description

iOwlT is a sound geolocalization system inspired by the way nocturnal owls hunt their prey. Using the technique of multilateration, widely used in telecommunications, and the phase shift of a signal detected by distinct sensors well distributed in space, it is possible to prove, with algebra and physics, that in an N-dimensional space only N+1 detectors are needed to accurately determine the origin of an event. Since real space has 3 dimensions, only 4 sound sensors are required to locate an event.

The project combines the parallel processing power of the FPGA on the DE-10 Nano board, used for the simultaneous treatment of the audio signals (mainly by adaptive digital filters, which adapt to the received signals in order to optimize their processing), with machine learning algorithms trained to recognize the desired event. The goal is an embedded system that can be coupled to a vehicle, detects the location of gunshot events, and displays each identified event in a mobile application.

The system can be used in urban areas to detect the sources of gunshots, and even in prohibited hunting areas to identify poachers. The underlying problem is not restricted to one source location or one specific sound: the system can be adapted to recognize other audible patterns.

Demo Video

  • URL: https://youtu.be/tyNTH2mqUxQ

  • Project Proposal

    1. High-level Project Description

    Acoustic systems for locating and identifying events have several applications in the everyday world, being present in security systems, earthquake recognition, sonar and various types of human-machine interaction.

    Shooting-sound mapping techniques began to be implemented in recent decades, even though the problem has been of interest since the First World War. In addition to military practice and environmental protection (e.g. detection of hunters in forbidden areas), this mechanism can be used in urban areas, providing instantaneous data to the local police or collecting data for further study of violence in certain areas.

    Aiming to recognize and map specific types of sound, the idea of an intelligent, self-adaptive system was developed, based on the functioning and learning of the auditory system of some species of owls.

    Owls are powerful nocturnal hunters. Since sight is naturally impaired at night by the absence of light, the owl must use other gifts of evolution to improve the accuracy with which it predicts the location of its dinner; one of them is sound.

    Experiments conducted by the neurobiologists Eric I. Knudsen and Masakazu Konishi, reported in Mechanisms of sound localization in the barn owl, proved that barn owls can locate prey while immersed in a totally dark room, using only the sound emitted by the prey.


                                                                                 A barn owl

    The great evolutionary advantage of this species is the considerable asymmetry between its ears: the left ear is positioned slightly below the right ear, and because of this difference in height the owl receives the emitter's information with a phase shift. From this difference it becomes possible to measure the location of its target accurately, much like the triangulation positioning process widely used in telecommunications engineering with telephone networks, or even with satellites. A very interesting video produced by the BBC demonstrates the whole hunting process of this species: How Does An Owl's Hearing Work?.

                                                               Front view of a barn owl skull


                                                                Back view of a barn owl skull

    Owls that locate their prey by sharp hearing are not born with this technique already well developed; they need an apprenticeship to adapt to their own physical characteristics (skull diameter, height difference between the ears, etc.), which can vary significantly within the same species. Beyond that, owls have channels of rigid feathers on the sides of the head that regulate the passage of sound. These animals therefore have a very efficient adaptive control, keeping the accuracy of their location predictions high even under different environmental conditions or the physiological differences inherent to the species.

    The technique of finding the coordinates of an unknown source from delays in the reception of the signal at receivers distributed in a known manner in space is called multilateration, and it has no trivial solution. It is possible to show with algebra that in an N-dimensional space, N+1 receivers with known positions are needed to uniquely determine the coordinates of an unknown source.


    Taking a case that is easier to visualize: there are 3 receivers R1, R2 and R3 with known positions and a target T with unknown location in the x-y plane.



    When T emits a sound, the receivers detect the signal at different times. From the image below, R1 receives the information first at a time t, R2 at t + dT1 and R3 at t + dT2. To calculate the distance between T and the i-th receptor we have

        d_i = v · t_i

    where v is the speed of sound and t_i is the time of arrival at the i-th receptor of the signal emitted by T.



    Centered at each of these receptors one can draw the circles C1, C2 and C3:

        C_i : (x − x_i)² + (y − y_i)² = d_i²,   with d_2 = d_1 + v·dT1 and d_3 = d_1 + v·dT2

    The only unknown variables in this system of equations are x, y and d1. In the ideal case it is possible to solve the system by direct minimization techniques; in the real case, with noise and inaccuracies, these circles do not have a common intersection, and we need to define a cost function, minimized with a numerical algorithm (e.g. gradient descent), that finds an approximate value for T.
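    A minimal sketch of that numerical solution (the receiver positions, target position and Gauss-Newton iteration below are illustrative assumptions, not the project's actual code):

```python
import numpy as np

# Sketch of the 2D case described above. Receiver positions, the target
# position and the Gauss-Newton solver are illustrative assumptions.
v = 343.0                                        # speed of sound, m/s
receivers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # R1, R2, R3
target = np.array([3.0, 4.0])                    # source (known only for the demo)

d = np.linalg.norm(receivers - target, axis=1)
dT = (d - d[0]) / v                              # delays relative to R1, dT[0] = 0

def locate(receivers, dT, v, iters=20):
    """Drive the circle residuals r_i(x, y) - (d1 + v*dT_i) to zero
    over the unknowns (x, y, d1) with Newton steps."""
    p = np.append(receivers.mean(axis=0), 1.0)   # start: centroid, small d1
    for _ in range(iters):
        diff = p[:2] - receivers                 # (3, 2) vectors from receivers
        r = np.linalg.norm(diff, axis=1)         # current distances
        res = r - (p[2] + v * dT)                # circle residuals
        J = np.column_stack([diff[:, 0] / r, diff[:, 1] / r, -np.ones(3)])
        p = p - np.linalg.solve(J, res)          # square Newton step
    return p                                     # (x, y, d1)

x, y, d1 = locate(receivers, dT, v)
```

    With noisy delays the same residuals can instead feed a least-squares or gradient-descent minimization, as the text suggests.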


    2. Block Diagram


    Block Diagram of iOwlT system

     

    FPGA

    The FPGA has 2 essential modules: the Digital Filters module, which processes the sound digitized by the A/D converter using parameters adapted to the environment, and the Circular Buffer module, which stores the sound signals in circular buffers for the subsequent correlation operation.

    A/D Converter

    The DE-10 Nano board has an analog-to-digital converter with only one output, so the signals picked up by the sound detectors pass through an 8:1 multiplexer. Observing the datasheet, this mux takes 3 µs to switch, so the largest possible delay in the acquisition of the signals, i.e. considering 8 sound detectors, is 3 × 7 = 21 µs. As audible frequencies lie in the range of 20 Hz to 20 kHz, by the Nyquist theorem the sampling rate for the set of observed signals is 40 kHz, which corresponds to a 25 µs period. The analysis of the signals is therefore practically simultaneous, leading to a considerable increase in performance as well.
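    The timing argument above can be checked with a few lines of arithmetic (values taken from the paragraph, as quoted from the datasheet):

```python
# Sanity check of the A/D timing argument (values from the text).
mux_switch_us = 3                   # 8:1 mux switching time, microseconds
n_channels = 8
worst_case_skew_us = mux_switch_us * (n_channels - 1)   # 3 * 7 = 21 us

fs_hz = 40_000                      # Nyquist rate for 20 kHz audio
sample_period_us = 1e6 / fs_hz      # 25 us per sample

# The worst-case inter-channel skew fits inside one sampling period,
# so the eight channels are effectively sampled simultaneously.
assert worst_case_skew_us < sample_period_us
```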

    HPS (ARM)

    The HPS has 3 modules: the Adaptive System module, which changes the filter parameters to adapt the solution to the environment; the Multilateration Algorithm module, which combines the correlation operation with the effective measurement of where the sound emitter is, using the multilateration technique; and the Neural Network module, which is responsible for determining whether the sound is the desired sound or not.

    3. Intel FPGA virtues in Your Project

    Adapt to changes

    The processing of sound signals in the FPGA is supported by an adaptive system: depending on the distance of the sound source, or the type of sound (being more general than gunshots), the variables that determine the nature of the filter and the threshold that starts the sound recognition eventually change. The iOwlT system adapts to these changes.

    The project need not be embedded only in a police car. Depending on the use of the technology, the iOwlT system may well be placed in static strategic positions, such as on traffic lights. Another interesting application would be identifying an imminent earthquake, since the onset of a seismic shake is announced much earlier by sound signals of high intensity but very low frequency, audible to animals like horses but not to humans. Such signals could be identified by the iOwlT system, giving a longer preparation time before the coming earthquake.

    Boost Performance

    Using FPGA technology, the iOwlT system performs the multilateration algorithm with outstanding precision, using circular buffers as the data structure that stores the sound received by each microphone. As the microphones receive the signal with a phase difference, that difference can be seen very clearly by counting the pivot index difference obtained through correlation. As discussed in the A/D Converter section of the Block Diagram, the near-simultaneity of signal analysis also contributes a considerable performance increase.

    The sampling rate control system and real-time audio capture are of great importance for sound recognition and for the phase difference calculation between microphones. Implementing this circular-buffer system in hardware guarantees that the system functions properly; the same implementation would be very difficult on an ordinary microprocessor system due to severe timing constraints.

    Expands I/O

    The analog inputs of the DE-10 Nano board are mostly occupied by sound detectors. Although 4+1 detectors would suffice to determine the target precisely (1 being the reference), extra microphones were added as a safety measure: they do not increase the cost considerably and they ensure the reliability of the signal that will be further processed.

    The output of the multilateration is the location of the sound event. This output is sent by the Bluetooth module to the connected mobile phone, so the location of the event can be easily viewed in the application.

     

    4. Design Introduction

    The project was initially motivated by a recent survey by Correio Braziliense, titled Brazil leads the ranking of firearm deaths in the world. Given such strong data, the idea of locating shooters is quite relevant for public safety worldwide; there are also several examples of alarming news about firearm use in other countries, such as Nearly 40,000 People Died From Guns in US Last Year, Highest in 50 Years.

    Given this, the proposed project focuses on using location technology in conjunction with police work. The system can be applied statically, fixed to posts or buildings, or coupled to police cars, which can be alerted live during their patrol. The main user of iOwlT would therefore be the government itself, as a measure to improve national public security, although the great beneficiary is the population.

    The use of Intel FPGA devices is extremely relevant because the detection process is time-critical. The performance boost from using circular buffers to calculate the correlation lag is essential to the project: a mere loss of phase information between microphones can result in a large location error, given that the speed of sound is close to 340 m/s.

    In addition, the project needs to adapt to the environment. Assigning a threshold for the analysis of the signal received by the neural network is important so that it does not perform calculations all the time. For this, ambient-noise metrics are needed so that a reasonable threshold can be obtained for the application environment; for example, rural zones have less noise than urban zones.

    Finally, being able to connect numerous devices to the Intel FPGA board is important for the iOwlT system, since the multilateration algorithm requires at least 5 microphones in use at the same time (4 + 1 as the reference). Taking further advantage of this extensibility, it is possible to establish Bluetooth communication between the DE-10 Nano and devices widely used in everyday life, such as mobile phones, making it easier for ordinary users to observe the shooter's location.

    5. Function Description

    Following the natural flow of the project, sound data is first acquired by the A/D Converter and then processed by the digital filters.

     

    Digital Filters

    At this point, the received data goes through digital filters whose purpose is to clean the received signal. To perform this cleaning, it is assumed that ambient noise is concentrated at higher frequencies, so a low-pass filter is applied, characterized by the moving-average expression

        y[n] = (1/M) · (x[n] + x[n−1] + … + x[n−M+1])

    where M is the number of averaged samples.

    Once filtered, the data is sent to circular buffers, where there is a circular buffer for each microphone.
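    The moving-average filter above can be sketched in software as follows (illustrative Python; the window length M = 8 is an assumed value, not one stated by the project):

```python
import numpy as np

def moving_average(x, M=8):
    """Length-M moving-average low-pass filter:
    y[n] = (1/M) * sum_{k=0}^{M-1} x[n-k].
    M = 8 is an illustrative choice."""
    # Convolving with a length-M box of weight 1/M averages each sample
    # with its M-1 predecessors; mode="same" keeps the output length.
    return np.convolve(x, np.ones(M) / M, mode="same")
```

    Filtering a constant signal leaves it unchanged (unit DC gain), while high-frequency noise is attenuated.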

     

    Circular Buffer

    Circular buffers are data structures defined by a pivot, where the first (oldest) data placed in the structure is located, and a tail, where the last (newest) data is located, as shown in the figure below.


    Example of circular buffer running
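    In software terms, the behavior can be sketched like this (illustrative Python; the real buffers are implemented in FPGA hardware):

```python
from collections import deque

class CircularBuffer:
    """Fixed-capacity buffer: when full, pushing a new sample at the
    tail evicts the oldest sample (the pivot)."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, sample):
        self.buf.append(sample)   # oldest sample is dropped when full

    def snapshot(self):
        return list(self.buf)     # pivot (oldest) first, tail last
```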

     
    In iOwlT, since the microphones are arranged at well-defined distances, the signals received by the microphones are pairwise lagged. Watching a recording in the Audacity software, the lag is virtually imperceptible, both visually and by listening, as shown in the figure below.

    Same signal detected by two different microphones
     
    The idea behind the circular buffers, however, is that two microphones will have their pivots shifted by a certain number of samples, which can be found through signal-similarity analysis methods; the method used here is cross-correlation. The use of the FPGA at this point is crucial, because even a slight acquisition delay can lead to a considerable calculation error, since the speed of sound is high.
     

                                             

    Example of two sine signals with phase shift

     

                                              

    The cross correlation of the sine functions

    By applying the method, a peak is obtained, and the distance from that peak to the center represents the number N of samples shifted between the analyzed microphones. The lag time dT can now be defined using the sampling rate Fs; in the iOwlT system, Fs is 16 kHz. In the example above, the phase shift is revealed by the peak not being at the center.
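    The lag estimation can be sketched as follows (illustrative Python with a synthetic signal; the true shift of 37 samples is a made-up value, while Fs = 16 kHz matches the text):

```python
import numpy as np

Fs = 16_000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)           # reference microphone
y = np.roll(x, 37)                      # second microphone: same signal, delayed

# The peak of the full cross-correlation sits N samples away from the
# center, where N is the shift between the two signals.
corr = np.correlate(y, x, mode="full")
lag = int(np.argmax(corr)) - (len(x) - 1)
dT = lag / Fs                           # lag time in seconds
```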

     

    Neural Network

    Before determining the cross-correlation between the signals, one must first know whether these signals really are gunshot sounds, and for this, neural networks are used. First, so that the system does not keep running the neural network all the time, a threshold based on the impulsiveness of the signal is used. Once the received signal is considered impulsive, it is processed by the neural network, which determines whether the signal is a shot or not.

    The neural network architecture used in the project is a 4-layer MLP (input + 2 hidden layers + output), with the layer sizes, in order: input, 200, 10, 1. The number of input neurons depends on both the sampling rate and the time window that defines the signal (e.g. 16 kHz × 0.5 s = 8,000 input neurons). The architecture figure follows below:

                                                        

    Neural Network architecture with Fs = 16 kHz and window time = 0.5 s
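    A forward-pass sketch of this architecture (the weights below are random placeholders, and the ReLU/sigmoid activations are assumptions, since the page does not state which activations were used):

```python
import numpy as np

# MLP described above: 8000 inputs (16 kHz * 0.5 s) -> 200 -> 10 -> 1.
rng = np.random.default_rng(0)
sizes = [8000, 200, 10, 1]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(window):
    """window: 8000 audio samples -> score in (0, 1) for 'gunshot'."""
    a = window
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(a @ W + b, 0.0)    # ReLU hidden layers (assumed)
    return sigmoid(a @ weights[-1] + biases[-1])[0]
```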

    It is also important to highlight that, due to the flexibility of neural networks, the system can be trained, with an appropriate dataset, to identify other sound events.

     

    Multilateration Algorithm

    Having the pairwise microphone delays, it is possible to determine the location of the shooter using the multilateration algorithm; the implemented algorithm is basically the solution of the linear system shown below.

    The multilateration linear system, with the reference microphone at the origin:

        x_i·x + y_i·y + z_i·z + (v·τ_i)·d₀ = (x_i² + y_i² + z_i² − v²·τ_i²) / 2,   i = 1, …, 4

    where τ_i is the time difference of the i-th microphone relative to the microphone set as reference, v is the speed of sound, and (x_i, y_i, z_i) are the coordinates of each microphone relative to the reference microphone. The unknowns are the source coordinates (x, y, z) and its distance d₀ to the reference microphone.
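    The solve step can be sketched as follows (the standard TDOA linearization with the reference microphone at the origin; microphone and source positions are made-up test values, not the project's geometry):

```python
import numpy as np

v = 343.0                                   # speed of sound, m/s
mics = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [1.0, 1.0, 1.0]])          # 4 mics besides the reference
source = np.array([2.0, 3.0, 1.5])          # known only for this demo

d0 = np.linalg.norm(source)                 # distance to reference mic
tau = (np.linalg.norm(mics - source, axis=1) - d0) / v   # measured TDOAs

# Linear system: mic_i . (x,y,z) + (v*tau_i) * d0 = (|mic_i|^2 - v^2 tau_i^2)/2
A = np.column_stack([mics, v * tau])
b = (np.sum(mics**2, axis=1) - (v * tau) ** 2) / 2.0
x, y, z, d0_est = np.linalg.solve(A, b)
```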

    6. Performance Parameters

    In iOwlT system, there are three important performance parameters that are analyzed:

     

    Threshold

    The first important parameter is the threshold, since it determines whether the neural network should judge if a detected impulsive sound is a gunshot or not. Tests were therefore made with impulsive sounds such as fireworks, plastic bags popping and actual gunshots.
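    One common way to implement such an impulsiveness threshold is to compare the short-term energy of a window against an ambient-noise estimate (a sketch; the factor k and the energy measure are assumptions, not the project's actual parameters):

```python
import numpy as np

def is_impulsive(frame, noise_energy, k=10.0):
    """frame: one window of samples; noise_energy: ambient estimate.
    Flags the frame when its mean energy exceeds k times the ambient level.
    k = 10 is a made-up tuning value."""
    return np.mean(frame ** 2) > k * noise_energy

rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(1600)        # ambient recording
noise_energy = np.mean(noise ** 2)

quiet = 0.01 * rng.standard_normal(1600)        # more ambient noise
bang = quiet + np.concatenate(
    [np.zeros(800), 0.5 * np.ones(10), np.zeros(790)])  # short loud pulse
```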

    Neural Network

    The second important parameter is the neural network itself. To train it, a partnership was made with BOPE (the Brazilian SWAT), making it possible to create a considerable dataset for training the network. The sounds were recorded in an open environment with a pistol.

                                        

    Creating dataset at BOPE's training camp

     

    Using the holdout validation technique, the neural network achieved an average performance of 91.38%, with the average confusion matrix shown below:

               

    Average confusion matrix

     

    Multilateration Algorithm

    The third important parameter is the multilateration algorithm. For this, a 5-legged, umbrella-shaped microphone arrangement was constructed: each leg has a microphone at its end, used for the multilateration calculation, and at the center there is a microphone used for the threshold and neural network processing.

                    

    The iOwlT system

     

    The origin of the coordinate system was defined at the center of the pentagon, with the y-axis along one of the legs. Using a measuring tape, the actual distances of an emitted sound were compared with the values found by the multilateration algorithm. The main idea was to analyze the system error (both distance and direction) over the four quadrants and varying distances (4 measurements for each point, averaged), as shown in the tables:

     

                       

                          Table of angle direction system error

     

               Table of distance system error  

     

    The echo problem

    It is curious to observe that the iOwlT system was very good at determining the direction of the detected sound, with a 5.12% error in the worst case, as seen in the table. On the other hand, the error in the distance measurements was larger, which can be explained by the echo phenomenon.

    The great problem is that an echo mimics the sound emitted from a source in another position, creating a pseudo-source emitting the same sound. This can confuse the iOwlT system, since it uses only the phase difference of the sound in its calculations.

    One possibility to mitigate the distance error due to the echo problem is to treat the signal with an adaptive filter, perhaps based on the LMS or RLS algorithm. In more open areas echo is not a problem, so adding those filters is a decision that can be made based on the application.

     

     

     

    7. Design Architecture

    iOwlT system design scheme

    Hardware circuit for every microphone

    Each of the microphones required a circuit external to the FPGA before it could be connected to the A/D Converter. This circuit biases the microphones so they can operate, amplifies the signal, and offsets the signal into the A/D Converter's conversion range. In addition, two protection diodes prevent too high a voltage from being delivered to the FPGA.

    Software flow



    10 Comments

    Maria Dias
    Thank you for this work! A great impact in Brazil's security is surely the key for our development!
    🕒 Jul 07, 2019 11:51 AM
    Arlene Haines
    Congrats for such an inspiring project!
    🕒 Jul 07, 2019 11:41 AM
    Dr. Jason Thoreou
    An interesting project!

    Glad to see someone working at the intersection of a wide variety of domains. All the best!
    🕒 Jul 06, 2019 02:33 PM
    AS026🗸
    Thank you, Dr. Jason! This competition is a great opportunity to learn and to apply our knowledge to solve real problems. I'm curious to see your feedback on the final implementation.
    🕒 Jul 07, 2019 08:39 AM
    carlos silva
    Even as Bolsonaro's policies cut support for research, you keep believing in doing science in Brazil. Congrats!!!
    🕒 Jul 05, 2019 10:10 AM
    AS026🗸
    Thank you, Carlos!
    🕒 Jul 07, 2019 08:35 AM
    Lucas matheus zirondi
    Good luck! Wish you all the best!
    🕒 Jul 01, 2019 11:40 PM
    AS026🗸
    Thank you, Lucas!
    🕒 Jul 07, 2019 08:35 AM
    Hygor Jardim da Silva
    Excellent project, congratulations to those involved. It has enormous potential for several Brazilian regions and also for the world.
    🕒 Jul 01, 2019 09:07 AM
    AS026🗸
    Thank you, Hygor! I really appreciate your words, it is such a pleasure to represent Brazil in this global competition, hope you will like the final implementation.
    🕒 Jul 01, 2019 10:41 PM