AS026 » iOwlT: Sound Geolocalization System
iOwlT is a sound geolocalization system inspired by the way a nocturnal owl hunts its prey. Using signal multilateration, a technique widely used in telecommunications, together with the phase shift of a signal detected by distinct sensors well distributed in space, it can be shown with basic algebra and physics that in an N-dimensional space only N+1 detectors are needed to determine the origin of an event accurately. Since real space has 3 dimensions, only 4 sound sensors are required to locate an event.
Combining the parallel-processing power of the FPGA on the DE-10 Nano board, used for the simultaneous treatment of the audio signals (mainly through adaptive digital filters, which adapt to the captured signals in order to optimize their processing), with machine learning algorithms trained to recognize the desired event, this project aims to design an embedded system that can be coupled to a vehicle, detects the location of gunshot events, and displays each identified event in a mobile application.
It can be used in urban areas to detect the sources of gunshots, and even in forbidden hunting areas to identify possible hunters. The underlying problem is not restricted to locating one specific sound: the system can be adapted to recognize other audible patterns.
Acoustic location and event-identification systems have many everyday applications, appearing in security systems, earthquake detection, sonar, and various forms of human-machine interaction.
Gunshot sound-mapping techniques began to be implemented only in recent decades, even though the problem has been of interest since the First World War. Beyond military practice and environmental protection (e.g., detecting hunters in forbidden areas), the mechanism can be used in urban areas, providing real-time data to the local police or collecting data for further study of violence in certain regions.
Aiming to recognize and map specific types of sounds, the idea of an intelligent, self-adaptive system was developed, based on how the auditory system of some owl species works and learns.
Owls are powerful nocturnal hunters. Since sight is naturally impaired by the absence of light at night, the owl relies on other gifts of evolution to improve the accuracy with which it predicts the location of its dinner; one of them is sound.
Experiments conducted by the neurobiologists Eric I. Knudsen and Masakazu Konishi, reported in Mechanisms of sound localization in the barn owl, proved that barn owls can locate prey while immersed in a totally dark room, using only the sound the prey emits.
A barn owl
The great evolutionary advantage of this species lies in the considerable asymmetry between its ears: the left ear is positioned a few centimeters below the right one, and because of this height difference the owl receives the emitter's information with a phase shift. From this difference it becomes possible to measure the target's location accurately, much like the triangulation process widely used in telecommunications engineering with telephone networks, or even with satellites. A very interesting video produced by the BBC demonstrates the whole hunting process of this species: How Does An Owl's Hearing Work?.
Front view of a barn owl skull
Back view of a barn owl skull
Owls that locate their prey by sharp hearing are not born with this technique fully developed; they must learn it, adapting to their own physical characteristics (skull diameter, height difference between the ears, etc.), which can vary significantly within the same species. In addition, owls have channels of rigid feathers on the sides of the head that regulate the passage of sound. These animals therefore possess a very efficient adaptive control, keeping localization accuracy high even under different environmental conditions or the physiological differences inherent to the species.
Consider a simpler case for visualization: three known receptors R1, R2, and R3 and a target T at an unknown location in the x-y plane.
When T emits a sound, the receivers detect the signal at different times. From the image below, R1 receives the information first, at a time t, R2 at t + dT1, and R3 at t + dT2.
To calculate the distance between T and the i-th receptor we have d_i = v · t_i, where v is the speed of sound and t_i is the time of arrival of the signal from T at the i-th receptor.
Centered at each of these receptors one can draw the circles C1, C2, and C3, with radii d1, d2 = d1 + v·dT1, and d3 = d1 + v·dT2:
The only unknown variables in this system of equations are x, y, and d1. In the ideal case, the system can be solved by direct minimization techniques. In the real case, with noise and inaccuracies, the circles do not share a single intersection, so a cost function must be defined and minimized with numerical algorithms (e.g., gradient descent) to find an approximate value for T.
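As an illustration, this simpler 2D case can be solved numerically. The sketch below (receiver positions and source chosen purely for illustration) eliminates d1 in closed form and then minimizes the resulting cost by direct grid search; a gradient-based solver could refine the result further:

```python
import math

# Hypothetical 2D example: positions in meters, chosen only for illustration
v = 340.0                                        # speed of sound, m/s
receivers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
target = (3.0, 4.0)                              # ground truth, used only to build the dTs

# Synthesize the measured arrival-time differences relative to R1
dists = [math.hypot(target[0] - rx, target[1] - ry) for rx, ry in receivers]
dts = [(d - dists[0]) / v for d in dists]        # dT for R1 is 0 by construction

def cost(x, y):
    """Least-squares TDOA cost with d1 eliminated in closed form."""
    ds = [math.hypot(x - rx, y - ry) for rx, ry in receivers]
    # For a fixed (x, y), the best d1 is the mean of (dist_i - v*dT_i)
    d1 = sum(d - v * dt for d, dt in zip(ds, dts)) / len(ds)
    return sum((d - d1 - v * dt) ** 2 for d, dt in zip(ds, dts))

# Direct minimization by a coarse grid search over the plane
best = min((cost(x / 10, y / 10), x / 10, y / 10)
           for x in range(0, 101) for y in range(0, 101))
print(best[1], best[2])   # close to the true source (3.0, 4.0)
```

With noisy dT measurements, the same cost function stays well defined even though the circles no longer intersect, which is exactly the situation described above.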
Block Diagram of iOwlT system
The FPGA will have 2 essential modules: the Digital Filters module, which processes the sound digitized by the A/D converter using parameters adapted to the environment, and the Circular Buffer module, which stores the sound signals in circular buffers for the subsequent correlation operation.
The DE-10 Nano board has an analog-to-digital converter with a single output, so the signals picked up by the sound detectors pass through an 8:1 multiplexer. According to the datasheet, this mux takes 3 µs to switch, so the largest possible delay in acquiring the signals, i.e., considering 8 sound detectors, is 3 × 7 = 21 µs. Since audible frequencies lie in the range of 20 Hz to 20 kHz, the Nyquist theorem gives a sampling rate of 40 kHz for the observed signals, corresponding to a 25 µs sampling period. The analysis of the signals is therefore practically simultaneous, which also brings a considerable performance gain.
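This timing budget can be checked with a few lines (the 3 µs switching figure is the one quoted from the datasheet above):

```python
# Worst-case acquisition skew across channels vs. the sampling period
mux_switch_us = 3.0                     # mux switching time, from the datasheet
n_detectors = 8
worst_case_skew_us = mux_switch_us * (n_detectors - 1)   # 3 x 7 = 21 µs

fs_hz = 40_000                          # Nyquist rate for 20 kHz audio
sample_period_us = 1e6 / fs_hz          # 25 µs

# The skew fits inside one sampling period, so acquisition is
# practically simultaneous across all 8 channels
print(worst_case_skew_us < sample_period_us)   # True
```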
The HPS will have 3 modules: the Adaptive System module, which changes the filter parameters to adapt the solution to the environment; the Multilateration Algorithm module, which combines the correlation operation with the actual estimation of the sound emitter's position using the multilateration technique; and the Neural Network module, which determines whether the sound is the desired one.
Adapt to changes
The processing of sound signals in the FPGA is supported by an adaptive system: depending on the distance of the sound source or on the type of sound (the approach is more general than gunshots), the variables that determine the nature of the filters, and the threshold that triggers sound recognition, may need to change. The iOwlT system adapts to these conditions.
The project need not be embedded only in a police car. Depending on the use of the technology, the iOwlT system could be installed at static strategic positions, such as traffic lights. Another interesting application would be to detect an imminent earthquake: the onset of a seismic shake is announced much earlier by high-intensity, very-low-frequency sound signals, audible to animals such as horses but not to humans. Such signals could be identified by the iOwlT system, giving more preparation time before the earthquake arrives.
The iOwlT system, using FPGA technology, can perform the multilateration algorithm with outstanding precision, using a circular buffer as the data structure that stores the sound received by each microphone. As the microphones receive the signal with a phase difference, that difference can be seen very clearly by counting the pivot-index difference obtained with correlation. As discussed in the A/D converter section of the Block Diagram, the near-simultaneity of signal analysis also contributes a considerable performance gain.
The control of the sampling rate and the real-time audio capture are of great importance for sound recognition and for calculating the phase difference between microphones. Implementing this circular-buffer system in hardware guarantees that the system functions properly; the same implementation would be very difficult on an ordinary microprocessor system because of the severe timing constraints.
The analog inputs of the DE-10 Nano board will mostly be occupied by sound detectors. Although 4+1 detectors are enough to determine the target precisely (1 serving as the reference), extra microphones add little cost and, as a safety margin, improve the reliability of the signal that will be processed.
The output of the multilateration is the location of the sound event. This output is sent through the Bluetooth module to the connected mobile phone, so that the event's location can be displayed conveniently in the application.
The project was initially motivated by a recent survey by Correio Braziliense, titled Brazil leads the ranking of firearm deaths in the world. Given such striking data, the idea of locating shooters is highly relevant to public safety worldwide; alarming news about firearm use also appears in other countries, such as Nearly 40,000 People Died From Guns in US Last Year, Highest in 50 Years.
Given this, the project focuses on using location technology in conjunction with police work. The system can be deployed statically, fixed to posts or buildings, or coupled to police cars, which can be alerted live during patrol. The main user of iOwlT would therefore be the government itself, as a measure to improve national public security, although the greatest benefit goes to the population.
The use of Intel FPGA devices is extremely relevant because the detection process is time-critical. The performance boost of using circular buffers to calculate the correlation lag is essential to the project: the mere loss of phase information between microphones can produce a large localization error, given that the speed of sound is close to 340 m/s.
In addition, the project must adapt to its environment. Assigning a threshold for the analysis of the signal delivered to the neural network is important so that the network does not run calculations all the time. This requires metrics that estimate the ambient noise, so that a reasonable threshold can be chosen for the deployment environment; rural zones, for example, are quieter than urban zones.
Finally, the ability to connect numerous devices to the Intel FPGA board is important for the iOwlT system: the multilateration algorithm requires at least 5 microphones operating simultaneously (4 + 1 as the reference). Taking further advantage of this extensibility, Bluetooth communication can be established between the DE-10 Nano and everyday devices such as mobile phones, making it easy for ordinary users to see the shooter's location.
Following the natural flow of the project, sound data is first acquired by the A/D converter and then processed by the digital filters.
Once filtered, the data is sent to circular buffers, where there is a circular buffer for each microphone.
A circular buffer is a data structure defined by a pivot, where the oldest data placed in the structure is located, and a tail, which holds the newest data, as shown in the figure below.
Example of circular buffer running
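A minimal software sketch of such a buffer can be written with Python's `deque`; the actual design implements this in FPGA hardware, so the class and method names here are purely illustrative:

```python
from collections import deque

class CircularBuffer:
    """Toy model of the per-microphone circular buffer."""
    def __init__(self, size):
        self.buf = deque(maxlen=size)   # oldest sample (the pivot) is dropped when full

    def push(self, sample):
        self.buf.append(sample)         # the new sample becomes the tail

    def snapshot(self):
        return list(self.buf)           # samples ordered from pivot to tail

cb = CircularBuffer(4)
for s in [1, 2, 3, 4, 5, 6]:
    cb.push(s)
print(cb.snapshot())   # [3, 4, 5, 6]: the two oldest samples were overwritten
```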
Example of two sine signals with phase shift
The cross correlation of the sine functions
By applying the method, a peak appears in the operation; the distance from that peak to the center gives the number N of samples shifted between the analyzed microphones, and the lag time dT can then be obtained from the sampling rate Fs (16 kHz in the iOwlT system). In the example above, the phase shift is visible because the peak is not at the center.
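The lag estimation can be sketched with NumPy's cross-correlation. A seeded broadband noise signal stands in for a recorded gunshot, and the 12-sample delay is an arbitrary illustrative value:

```python
import numpy as np

fs = 16_000                        # iOwlT sampling rate (16 kHz)
rng = np.random.default_rng(0)

# Broadband test signal standing in for a gunshot recording (illustrative)
a = rng.standard_normal(512)                               # microphone 1
true_lag = 12                                              # delay in samples
b = np.concatenate([np.zeros(true_lag), a[:-true_lag]])    # microphone 2, delayed

xcorr = np.correlate(b, a, mode="full")
lag = int(np.argmax(xcorr)) - (len(a) - 1)   # offset of the peak from the center
dT = lag / fs                                # lag time between the microphones
print(lag, dT)   # 12 samples, i.e. 0.00075 s
```

The peak offset directly recovers the sample shift, which divided by Fs gives the lag time dT used by the multilateration stage.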
Before computing the cross-correlation between the signals, the system must first know whether these signals are really gunshot sounds, and for this, neural networks are used. So that the system does not run the neural network all the time, a threshold based on the impulsivity of the signal is applied first. Once the received signal is considered impulsive, it is processed by the neural network, which determines whether the signal is a shot or not.
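A crude version of such an impulsivity gate can be sketched as below. The crest-factor measure and the threshold value are illustrative assumptions; the real system adapts its threshold to the ambient noise, as described elsewhere in this document:

```python
import numpy as np

def is_impulsive(frame, crest_threshold=6.0):
    """Illustrative gate: compare the frame's peak-to-RMS ratio
    (crest factor) against a threshold."""
    rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
    return np.max(np.abs(frame)) / rms > crest_threshold

rng = np.random.default_rng(1)
noise = 0.05 * rng.standard_normal(8000)        # steady ambient noise
bang = noise.copy()
bang[4000] = 1.0                                 # one sharp impulse in the frame

print(is_impulsive(noise), is_impulsive(bang))   # False True
```

Only frames that pass this cheap test would be forwarded to the neural network.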
The neural network used in the project is a 4-layer MLP (input + 2 hidden layers + output), with layer sizes, in order: input, 200, 10, 1. The number of input neurons depends on both the sampling rate and the window time that defines a signal frame. The architecture figure follows below:
Neural Network architecture with Fs = 16 kHz and window time = 0.5 s
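The layer sizes can be sketched as follows. The weights are untrained and random, and the sigmoid activation is an assumption not stated in the text, so the sketch only shows the shapes flowing through the network:

```python
import numpy as np

fs, window = 16_000, 0.5
n_input = int(fs * window)            # 8000 input neurons, one per sample
layers = [n_input, 200, 10, 1]        # the 4-layer MLP described in the text

rng = np.random.default_rng(0)
# Untrained weights, just to exercise the shapes
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layers[:-1], layers[1:])]
biases = [np.zeros(n) for n in layers[1:]]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    for w, b in zip(weights, biases):
        x = sigmoid(x @ w + b)        # activation choice is illustrative
    return x                          # one value in (0, 1): shot / not shot

frame = rng.standard_normal(n_input)  # one 0.5 s window of audio
print(forward(frame).shape)           # (1,)
```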
It is also worth highlighting that, thanks to the flexibility of the neural network, the system can be trained to identify other sound events (given suitable training).
With the pairwise delays between microphones, the shooter's location can be determined using the multilateration algorithm; essentially, the implementation solves the linear system shown below.
The multilateration linear system
Here, tau_i is the time difference of arrival at the i-th microphone relative to the microphone set as reference, v is the speed of sound, and x_i, y_i, and z_i are the coordinates of each microphone relative to the reference microphone.
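A standard linearization consistent with this description (reference microphone at the origin, unknowns x, y, z and the reference distance d0) can be sketched as follows; the microphone and source positions are illustrative, not the project's actual geometry:

```python
import numpy as np

v = 340.0                                # speed of sound, m/s
# Reference microphone at the origin; four others (positions in meters, illustrative)
mics = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [1.0, 1.0, 1.0]])
source = np.array([2.0, 3.0, 4.0])       # ground truth, used only to build the taus

d0 = np.linalg.norm(source)
taus = (np.linalg.norm(mics - source, axis=1) - d0) / v   # measured TDOAs

# Expanding |source - mic_i|^2 with d_i = d0 + v*tau_i gives, per microphone:
#   x_i*x + y_i*y + z_i*z + v*tau_i*d0 = (|r_i|^2 - v^2*tau_i^2) / 2
A = np.column_stack([mics, v * taus])
b = (np.sum(mics ** 2, axis=1) - (v * taus) ** 2) / 2
sol = np.linalg.solve(A, b)              # solves for (x, y, z, d0)
print(sol[:3])                           # recovers the source position [2, 3, 4]
```

With noise-free taus the system is exact; with real measurements a least-squares solve over more than 4 microphones plays the same role.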
In the iOwlT system, three important performance parameters are analyzed:
The first important parameter analyzed is the threshold, as it determines whether the neural network should judge the detected impulsive sound at all. Tests were therefore made with impulsive sounds such as fireworks, plastic bags, and actual gunshots.
The second important parameter analyzed is the neural network itself. To train it, a partnership was established with BOPE (the Brazilian SWAT), making it possible to build a considerable dataset for training. The sounds were recorded in an open environment with a pistol.
Creating dataset at BOPE's training camp
Using the holdout validation technique, the neural network reached an average performance of 91.38%, with the average confusion matrix shown below:
Average confusion matrix
The third important parameter to analyze is the multilateration algorithm. For this, a 5-leg umbrella-shaped microphone arrangement was built: each leg carries a microphone at its end, used in the multilateration calculation, while a microphone at the center serves the threshold and neural-network stages.
The iOwlT system
The origin of the coordinate system was set at the center of the pentagon, with the y-axis along one of the legs. Using a measuring tape, the actual distances of an emitted sound were compared with the values found by the multilateration algorithm. The main idea was to analyze the system error (in both distance and direction) across the four quadrants and at varying distances (4 trials per point, averaged), as shown in the tables:
Table of angle direction system error
Table of distance system error
The echo problem
It is curious to observe that the iOwlT system determined the direction of the detected sound very well, with a 5.12% error in the worst case, as seen in the table. The distance measurements, on the other hand, showed a larger error, which can be explained by the echo phenomenon.
The great problem is that an echo mimics the sound emitted by a source at another position, creating a pseudo-source emitting the same sound. This can confuse the iOwlT system, which relies only on the phase difference of the sound for its calculations.
One way to mitigate the distance error caused by echo is to treat the signal with an adaptive filter, perhaps based on the LMS or RLS algorithm. In more open areas echo is not a problem, so adding such filters is a decision that can be made per application.
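As a sketch of that idea, a normalized LMS (NLMS) filter can learn a toy echo path, after which the echo could be subtracted from the signal. The echo impulse response, filter length, and step size here are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(5000)                 # clean source signal
echo_path = np.array([0.0, 0.5, 0.0, 0.3])    # toy echo impulse response
d = np.convolve(x, echo_path)[:len(x)]        # observed echoed signal

# NLMS adaptive filter: learn the echo path sample by sample
n_taps, mu, eps = 4, 0.5, 1e-8
w = np.zeros(n_taps)
for n in range(n_taps, len(x)):
    u = x[n - n_taps + 1:n + 1][::-1]         # most recent samples, newest first
    e = d[n] - w @ u                          # prediction error
    w += mu * e * u / (u @ u + eps)           # normalized LMS update

print(np.round(w, 3))    # converges to the echo path [0, 0.5, 0, 0.3]
```

An RLS variant would converge faster at higher computational cost, which is the trade-off hinted at above.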
iOwlT system design scheme
Hardware circuit for every microphone
Each microphone required an external circuit, outside the FPGA, before it could be connected to the A/D converter. This circuit biases the microphones so they can operate, amplifies the signal, and offsets it into the A/D converter's conversion range. In addition, two protection diodes prevent excessive voltage from reaching the FPGA.