Annual: 2019

AS015 »
Keyword Spotting
📁Machine Learning
👤Walter Gontijo
📅Sep 29, 2019
Regional Final




This project implements an FPGA-based acoustic keyword spotting (KWS) system for the Portuguese language. The system performs real-time processing, using MFCC extraction as the pre-processing stage and a convolutional neural network (CNN) as the classifier.

Demo Video

  • URL:

  • Project Proposal

    1. High-level Project Description

    Figure 1 shows a high-level description of the proposed work. First, the acoustic KWS topology is defined, using an MFCC extractor and a CNN. Next, a speech database in Brazilian Portuguese is built. Then, Python code is developed to train the CNN model and to generate the parameters of the MFCC extractor. Finally, the FPGA implementation is carried out, comprising the blocks Framing, MFCCs Extractor, CNN, and Decisor, as shown in Figure 1.

    Figure 1 - Acoustic KWS High-level Description


    2. Block Diagram

    Figure 2 presents the block diagram of the FPGA-based acoustic keyword spotting implementation.


    Figure 2 - FPGA-based Acoustic KWS implementation.


    As shown in Figure 2, the acoustic KWS implementation is composed of the blocks Framing, MFCCs Extractor, CNN, and Decisor. Figure 3 presents the Framing block diagram.

    Figure 3 - Framing block diagram.
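    As a hypothetical software model of the Framing block (the 256-sample frame length comes from Section 5; the 50% overlap is an assumption, since the hop length is not stated here), the framing step can be sketched in Python:

```python
import numpy as np

def frame_signal(x, frame_len=256, hop_len=128):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop_len
    # Build a (n_frames, frame_len) index matrix and gather in one step.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return x[idx]
```

    In hardware this corresponds to a buffer that emits one 256-sample window per hop; the software model just makes the indexing explicit.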

    Figures 4 and 5 show the block diagrams of the MFCCs Extractor and the CNN, respectively.

    Figure 4 - MFCCs Extractor Block Diagram.
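    As a rough software reference for the MFCCs Extractor (the report fixes 15 MFCCs per 256-sample frame at 8 kHz; the Hamming window and the 26-filter mel bank are illustrative assumptions), one frame could be processed as:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filterbank over the positive-frequency FFT bins."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mfcc(frame, sr=8000, n_mfcc=15, n_mels=26):
    n_fft = len(frame)                                # 256 in this design
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2  # energy spectrum
    mel_energy = mel_filterbank(n_mels, n_fft, sr) @ spec
    log_mel = np.log(mel_energy + 1e-10)
    # DCT-II of the log-mel energies; keep the first n_mfcc coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * np.arange(n_mfcc)[None, :])
    return log_mel @ dct
```

    The FPGA blocks in Figure 4 mirror these stages: windowing, energy spectrum, mel filtering, logarithm, and DCT.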


    Figure 5 - CNN Block Diagram.
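    A minimal, purely illustrative forward pass for a CNN of this kind (the real layer sizes and weights come from the Python training step and are not given in this report) might look like:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d_valid(x, k):
    """Naive 'valid' 2-D correlation of a single-channel feature map."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def cnn_forward(mfcc_map, kernel, w_fc, b_fc):
    """One conv layer + ReLU, flattened into a dense layer of class scores."""
    feat = relu(conv2d_valid(mfcc_map, kernel)).ravel()
    return feat @ w_fc + b_fc
```

    The multiply-accumulate loops in `conv2d_valid` and the dense product are exactly the operations the FPGA maps onto its DSP blocks.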

    3. Intel FPGA Virtues in This Project

    The main Intel FPGA resources considered in the development of this project are:

    - Parallelism techniques;

    - Pipelining techniques;

    - MLAB - the number of FPGA internal memory blocks used;

    - The number of 9x9 DSP blocks used;

    - The number of logic blocks used.

    These features are available in Intel FPGAs (Cyclone V family) and allow complex digital signal processing algorithms (MFCC and CNN) to be implemented in a single device. Furthermore, the KWS system is intended to run with real-time processing on a DE10-Nano kit.

    4. Design Introduction


    The purpose of this design is to create a high-performance keyword classifier using neural networks on an FPGA. As a result, future FPGA implementations of neural networks and other audio-processing classifiers may be facilitated.


    Application Scope

    This project could be used by R&D companies that want to run neural networks on FPGAs in order to process them at high speed. The parallel processing and pipelining techniques offered by FPGA devices provide the fast inference that deep neural networks require, e.g., in autonomous cars.

    This project could also be used in Brazilian research to provide accessibility to people with physical disabilities, using a speech-command-based system to perform actions in their daily routine.


    Using Intel FPGA Devices

    This project has certainly benefited from an Intel FPGA device, which was chosen for its logic resources, high speed grades, and internal DSP and M10K memory blocks. Without these features, it would be impossible to implement this high-performance keyword spotting system using neural networks.

    5. Function Description


    This project receives as input digital audio at a rate of 48 kHz, which is decimated to 8 kHz. The digital audio is then sent to an MFCC extractor, which extracts 15 MFCCs per 256 audio samples. These MFCCs are fed to a convolutional neural network, which classifies the keyword spoken in the audio. The output is used as a trigger to activate one of three memories, which respond to the keyword spoken in the audio input.
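    The 48 kHz to 8 kHz decimation described above can be modeled as a low-pass FIR filter followed by 6:1 downsampling; a sketch (the tap count and window choice are assumptions, not the actual FPGA filter):

```python
import numpy as np

def decimate_by_6(x, n_taps=31):
    """Windowed-sinc low-pass (cutoff near the new Nyquist), then keep every 6th sample."""
    M = 6
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = (np.sinc(n / M) / M) * np.hamming(n_taps)  # anti-aliasing filter
    y = np.convolve(x, h, mode="same")
    return y[::M]
```

    The low-pass stage is essential: without it, content between 4 kHz and 24 kHz would alias into the 8 kHz band.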

    How to implement it

    To implement this project, the neural network must first be trained so that its weights and depth can be extracted. It is also necessary to define the MFCC hyper-parameters: FFT length, hop length, number of mel filters, and number of MFCCs.
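    For reference, a hypothetical hyper-parameter set consistent with this report (only the FFT length and MFCC count are actually stated, in Section 5; the hop length and mel filter count below are illustrative placeholders):

```python
# Hypothetical values: fft_length and n_mfcc come from Section 5;
# hop_length and n_mels are illustrative assumptions only.
mfcc_params = {
    "fft_length": 256,  # samples per analysis frame
    "hop_length": 128,  # assumed 50% frame overlap
    "n_mels": 26,       # assumed number of mel filters
    "n_mfcc": 15,       # MFCCs kept per frame
}
```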

    An audio codec is needed to receive the input audio and send the output audio. In this work, the AD1836 codec was used to receive the digital audio input, which requires SPI and I2S communication. To output the audio, a low-pass sigma-delta modulator was used, which is easily implemented in an FPGA.
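    A first-order sigma-delta modulator of the kind mentioned above can be modeled in a few lines (this is a generic first-order sketch, not the exact FPGA circuit used here):

```python
def sigma_delta_1st_order(samples):
    """First-order sigma-delta: map samples in [-1, 1] to a 1-bit stream.

    The integrator accumulates the error between the input and the
    fed-back quantizer output; the 1-bit density tracks the input level.
    """
    integ, fb, bits = 0.0, 0.0, []
    for s in samples:
        integ += s - fb
        fb = 1.0 if integ >= 0.0 else -1.0
        bits.append(1 if fb > 0.0 else 0)
    return bits
```

    The resulting bitstream only needs an external analog low-pass (RC) filter to recover the audio, which is why this structure is so cheap to put on an FPGA pin.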

    After these steps, the four blocks must be implemented in the FPGA, along with the memories that store the responses.



    6. Performance Parameters

    Parameters Needed to Reach

    To use this project in real time, the following resources are needed:

    • At least 40 separate internal memories, each used independently;
    • At least 20 internal multipliers, used to calculate the MFCCs and to perform the convolution and matrix operations required by the CNN;
    • A clock frequency high enough to perform all of these calculations within the time limit defined by the real-time specification.

    Current Situation Using Intel FPGA Devices

    • Only 10% of the available internal memory blocks are used;
    • Only 10% of the internal multipliers available on the DE10-Nano's FPGA chip (5SCE10CE4C7N) are used;
    • Clock frequency needed for real-time operation: 4 MHz; maximum clock frequency obtained: 300 MHz. Comparing these frequencies, the project has a frequency slack of 75x, thanks to the features and benefits of Intel FPGAs.
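    The numbers above can be sanity-checked with a little arithmetic (the 4 MHz and 300 MHz figures come from this section; the frame size and sample rate from Section 5):

```python
fs = 8_000        # sample rate after decimation, Hz
frame = 256       # samples per MFCC frame
f_req = 4e6       # clock needed for real-time operation, Hz
f_max = 300e6     # maximum clock frequency obtained, Hz

frame_period = frame / fs                 # 32 ms to collect one frame
cycles_per_frame = f_req * frame_period   # cycles available per frame at 4 MHz
slack = f_max / f_req                     # frequency headroom
```

    So even at the modest 4 MHz requirement there are 128,000 clock cycles available per 32 ms frame, and the achieved Fmax leaves a 75x margin on top of that.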

    7. Design Architecture

    Hardware Design

    The hardware design is split into two parts: the MFCCs Extractor and the Convolutional Neural Network.

    MFCCs Extractor

    The implementation of the MFCCs extractor uses the following blocks:


    The Energy Spectrum Calculator and DCT Calculator blocks use an FFT calculator to perform their computations.
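    The shared-FFT idea can be illustrated in software: a DCT-II can be computed with a single FFT using Makhoul's even-odd reordering, which is presumably how one FFT core can serve both blocks (this is a sketch of the technique, not the actual RTL):

```python
import numpy as np

def dct2_via_fft(x):
    """Unnormalized DCT-II computed with a single FFT (Makhoul's reordering)."""
    N = len(x)
    # Interleave: even-index samples first, odd-index samples reversed.
    y = np.concatenate([x[0::2], x[1::2][::-1]])
    Y = np.fft.fft(y)
    k = np.arange(N)
    # Twiddle and take the real part to recover the DCT-II coefficients.
    return 2.0 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * Y)
```

    Reusing one FFT engine for both the energy spectrum and the DCT saves multipliers, which matters given the DSP-block budget discussed in Section 6.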


    The CNN layers are:

    The implementation of each layer is described as follows:


    Mandy Lei
    Good topic! Which FPGA board will you use?
    🕒 Jun 27, 2019 09:13 PM
