Smart City

Self-Steering Pan-tilt Based on Sound Source Localization

PR013

Shurui Fan (Jianghan University)

Apr 04, 2022

Cardioid directional microphones perform well and are used in many scenarios, but they must be re-aimed whenever the sound source moves. Likewise, in a videoconferencing system the camera always needs to point at the speaker. Sound source localization (SSL) will guide a pan-tilt carrying the microphone, camera, or other device so that it points at the sound source. Echo cancellation (EC) and beamforming (BF) will be used to enhance signal quality. SSL, EC, and BF will be implemented on an Intel FPGA connected to a microphone array. Analog Devices plug-in boards will control the motors that adjust the pan-tilt.

Project Proposal


1. High-level project introduction and performance expectation

Directional microphones are used on many occasions for their good performance. Their directivity also makes them good spatial filters that reject noise from other directions. However, when the target changes position, we have to re-aim the microphone or place additional microphones to maintain excellent sound quality. Although beam steering can give a microphone array directivity, it is still not good enough to substitute for a high-fidelity directional microphone. In this design, we propose a method that automatically adjusts the pan-tilt to point at the sound source, based on sound source localization implemented in Intel FPGA devices. A high-quality directional microphone, or another device such as a videoconference camera, mounted on the pan-tilt will therefore follow the sound source, achieving good performance while saving cost by reducing the number of expensive devices.

The main sound source localization techniques, such as maximizing the steered response power (SRP), high-resolution spectral estimation, and time difference of arrival (TDOA), require massive computation with parallel signal processing, which makes them well suited to implementation in Intel FPGA devices.

2. Block Diagram

Figure 1 Block diagram (pan-tilt assembled with a directional microphone or camera)

In our design, the filter-and-sum SRP method will be used to estimate the angle θ between the x axis of the array and the line from the sound source to the center of the array. By calculating the delay of every microphone relative to the center, a delay compensation is applied to each microphone's response to form a beamformer whose main lobe is steered to the desired direction. By adjusting the delays so that the main lobe scans all directions and calculating the power in each direction, we can find the maximum power value. At that θ, the main lobe is aligned with the sound source. We can then use this information to work out how to adjust the pan-tilt servo motors so that the pan-tilt points at the sound source. A 3-axis MEMS accelerometer from ADI will provide feedback to control the pan-tilt adjustment.
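The scan described above can be sketched in Python with NumPy as a software model, not as the FPGA implementation. The sketch uses plain delay-and-sum, the special case of filter-and-sum in which every channel filter is a pure delay with unit gain. The sample rate, the speed of sound, the even 60-degree spacing of the ring microphones, the far-field assumption, and all function names are assumptions made for illustration; only the 0.04 m radius and the seven-microphone layout come from the proposal.

```python
import numpy as np

C = 343.0        # speed of sound in m/s (assumed)
FS = 48_000      # sample rate in Hz (assumed, not stated in the proposal)
RADIUS = 0.04    # ring radius in metres

def mic_positions():
    """Centre microphone plus six ring microphones, assumed evenly spaced."""
    angles = np.arange(6) * np.pi / 3
    ring = RADIUS * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.vstack([[0.0, 0.0], ring])              # shape (7, 2)

def arrival_delays(theta, pos):
    """Far-field arrival delay (s) of each mic relative to the array centre."""
    direction = np.array([np.cos(theta), np.sin(theta)])
    return -(pos @ direction) / C    # mics nearer the source hear it earlier

def steered_power(frames, theta, pos):
    """Compensate each channel's delay, sum the channels, return the power."""
    n = frames.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    spectra = np.fft.rfft(frames, axis=1)
    tau = arrival_delays(theta, pos)[:, None]
    aligned = spectra * np.exp(2j * np.pi * freqs * tau)   # undo delay tau
    return float(np.sum(np.abs(aligned.sum(axis=0)) ** 2))

def localize(frames, pos, n_angles=360):
    """Scan the main lobe over all directions and return the best theta (rad)."""
    thetas = np.arange(n_angles) * 2 * np.pi / n_angles
    powers = [steered_power(frames, t, pos) for t in thetas]
    return thetas[int(np.argmax(powers))]
```

Here `frames` is a `(7, n)` array holding one block of samples per microphone, in the same order as `mic_positions()`. Each candidate angle is evaluated independently of the others, which is the kind of work that maps onto parallel hardware.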

Figure 2 The filter-and-sum beamformer

3. Expected sustainability results, projected resource savings

High computational power and low-latency response make the FPGA an ideal platform for the most time-critical sound source localization applications. Even as the number of microphones in the array grows, the design is easy to scale thanks to the FPGA's parallel resources. While searching for the angle of the sound source, the weight vector must be updated and the power at each angle must be calculated, which also requires massive computation in real time. This is the FPGA's dominant strength: it accelerates the sound source localization algorithm so that the application can achieve real-time detection.

4. Design Introduction

The design aims to provide a pan-tilt that automatically rotates to point at the sound source. Equipment mounted on the pan-tilt, such as a camera or directional microphone, moves with it, giving the equipment a larger effective working area or higher performance without increasing the number of units.

The design uses a seven-microphone array, and delay estimation is essential for computing the position of the sound from the time difference of arrival at each microphone. Parallel processing is needed to capture the sound signals of all seven channels simultaneously. It is straightforward to design a master device that generates the I2S control signals and synchronously reads the sound signals from the 7 MEMS microphones.

5. Functional description and implementation

1. Acquire the sound signals from seven microphones simultaneously.

We will use seven MEMS microphones with I2S interfaces as the sound acquisition equipment. Six microphones are distributed around a circle with a radius of 0.04 m, and one is located at the center of the circle. Because I2S is a two-channel protocol, we have to implement 4 I2S receivers running at the same time, all driven by the same sck and ws signals.
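The arithmetic behind this layout can be checked with a few lines of Python. This is only a sketch: the 48 kHz sample rate, the 343 m/s speed of sound, the even 60-degree spacing of the ring microphones, and the function names are assumptions, while the radius and the microphone count come from the text.

```python
import math

C = 343.0                  # speed of sound in m/s (assumed)
FS = 48_000                # sample rate in Hz (assumed, not stated in the proposal)
RADIUS = 0.04              # ring radius in metres, from the text
N_MICS = 7                 # one centre microphone plus six on the ring
CHANNELS_PER_I2S_BUS = 2   # left and right slots, selected by ws

def receivers_needed(n_mics=N_MICS):
    """Number of two-channel I2S receivers needed to cover all microphones."""
    return math.ceil(n_mics / CHANNELS_PER_I2S_BUS)

def mic_positions():
    """(x, y) of the centre mic followed by six ring mics, 60 degrees apart."""
    ring = [(RADIUS * math.cos(k * math.pi / 3),
             RADIUS * math.sin(k * math.pi / 3)) for k in range(6)]
    return [(0.0, 0.0)] + ring

def max_delay_samples():
    """Largest centre-to-ring delay, in samples, for a far-field source."""
    return RADIUS / C * FS

print(receivers_needed())             # 4
print(round(max_delay_samples(), 2))  # 5.6
```

Under these assumptions the largest delay between the center microphone and a ring microphone is under six samples, so one of the four receivers has an unused slot and the delay search range per channel is short.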

2. LMS Adaptive Filter TDOA

An LMS adaptive filter is used to estimate the time delay between the microphone at the center of the circle and each microphone on the circle. Once each filter has converged, the position of its peak coefficient gives the estimated time delay.
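A minimal software sketch of this idea, in Python with NumPy, is shown here. It is not the FPGA design: the function name `lms_delay`, the tap count, and the step size `mu` are illustrative assumptions, and the input is assumed to be scaled to roughly unit variance so that the fixed step size stays stable.

```python
import numpy as np

def lms_delay(reference, delayed, n_taps=16, mu=0.01):
    """Adapt an FIR filter so that filter(reference) tracks `delayed`.

    Returns the index of the largest-magnitude tap (the delay estimate in
    samples) together with the converged tap vector.
    """
    w = np.zeros(n_taps)
    for n in range(n_taps - 1, len(reference)):
        # x[k] holds reference[n - k], so tap k models a delay of k samples.
        x = reference[n - n_taps + 1 : n + 1][::-1]
        e = delayed[n] - w @ x       # error between target and filter output
        w += mu * e * x              # LMS coefficient update
    return int(np.argmax(np.abs(w))), w

# Example: a white-noise signal and a copy delayed by 3 samples.
rng = np.random.default_rng(1)
ref = rng.standard_normal(4000)
target = np.concatenate([np.zeros(3), ref[:-3]])
delay, taps = lms_delay(ref, target)
print(delay)   # 3
```

This version resolves only whole-sample, non-negative delays. A ring microphone that hears the source before the center microphone would need a fixed bulk delay added to the target channel, and finer resolution would need interpolation around the peak tap; both are left out of the sketch.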

The LMS filter is easy to implement in an FPGA, as shown in the figure.

6. Performance metrics, performance to expectation

7. Sustainability results, resource savings achieved

8. Conclusion
