Annual: 2018

PR026 »
FPGA realization of sign language interaction device using SVM/HMM
📁High Performance Computing
👤Yin Xie
 (Wuhan University of Science and Technology)
📅Apr 30, 2018
Regional Final


👀 9289   💬 36

PR026 » FPGA realization of sign language interaction device using SVM/HMM

Description

This proposal presents the implementation of the communication between deaf-mutes and the normal with a hardware design named sign language interaction device adopting SVM and HMM. On the one hand, gesture recognition system is to create a system which understands human gesture and translates them into texts and audio. On the other hand, our oral language can be interpreted into texts displayed on LCD. Ultimately, the system is designed to identify 20 Chinese sign language and also real time hand gesture signs. The proposed system is very convenient and high-efficiency due to FPGA implementation which is highly suitable for control of equipment by the handicapped people.

Demo Video

  • URL: http://v.youku.com/v_show/id_XMzU3NDI5MTUzNg==.html

  • Project Proposal

    1. High-level Project Description

    Introduction

    The main purpose of this project is to realize the conversion between sign language and text. The specific process is as follows: the gestures information which comes from camera goes through the decoder, image preprocessing , skin-color detection, edge recognition, SVM classification and HMM which can establish the state model to achieve the purpose of image recognition, and then matches the identified images with the templates that have been trained by SVM and HMM in advance. On the other hand, this project also comes with voice to text function, it can realize the communication between deaf and normal people. Some of the proper nouns used are explained as follows:

    Support Vector Machine (SVM), a classifier, has a strong ability of classification. It can automatically find the sample with good distinctions of classification and use the optimal segmentation between the surface structure of these samples to maximize classes to achieve the classification of the interval static image, assisting to realize the HMM model and gesture recognition.

    Hidden Markov Model (HMM), a statistical model, mainly used to translate SVM classification results into state transition probability distribution to get the information characteristics of the system to achieve the purpose of extracting image information.

     

    Design Purpose

    Our sign language translator is based on the DE10-NANO development suite to complete the design. Terasic DE10-Nano is a development kit based on Intel SoC, which integrates the capabilities of a Cyclone FPGA and a dual core ARM Cortex-A9 processor. HDMI data interface transmits efficiently frame image data from the video decoder to HPS CPU0 module through the Bridge cache, FPGA control circuit to achieve the completion of Gauss filtering, color recognition of image data, DDR3 SDRAM is applied to store the sign template and is parallel to HPS CPU1 for dynamic gesture track to achieve real-time gesture tracking and template matching. In addition, the SVM/HMM algorithm based on FPGA will also be completed in FFT Acceleration Engine module. GPIO interface on HPS part is expanded for text display on LCD. The speech reconstruction algorithm based on HMM realizes the matching of words and speech to maximize utilization of the hardware system and the optimization of the text analysis. The proposal is designed to remove the communication barriers between deaf-mute groups and normal people, our sign language translator offers an approach by hand gesture Chinese character, voice information, text information as a bridge between them to establish a harmonious communication environment.

    Applications

    People who are not able to speak and hear, called deaf-mute, are one of the special population group in our community. The fact that few normal people know about sign language would become a difficulty that impede the development of relationship between these two social groups. The interpreter that we hope to design realizes a two-way communication between them. The combination of FPGA and the HMM&SVM algorithm makes it much easier to implement. The target users are those normal people who want to talk to the deaf-mute, or people who are interested in sign language and want to learn more about it. This machine can be trained to be better when more and more samples of sign language are captured by various people.

     

    2. Block Diagram

    Outline

    Detail Description

    Flowchart

                                   

    Gesture recognition

    The video information was collected from D8M-GPIO and converted into image sequences using the frame extraction. The FFT Acceleration Engine module of FPGA is utilized to realize preprocessing by gaussian filter algorithm. The CPU0 module in HPS is used to obtain the characteristic image by using the Canny operators to realize the image binarization. The feature image is classified by the CPU1 module in HPS, and the results are stored in SDRAM. The matching results are transmitted to the HMM module to predict the probability of the next gesture, then they are transmitted to the LCD by GPIO to get the corresponding text information.

     Speech recognition

    The voice information of the speaker is transmitted through the speech sensor to the HMM module in CPU1 and recognized and transmitted to he LCD.

    Hardware Platform

                                  

    3. Intel FPGA Virtues in Your Project

    1.Combination between FPGA and HPS to provide a complete system development environment facilitates the compatibility and integration of each module, meanwhile, we adopt the top-down the successive refinement method to improve the efficiency.

    2.Based on the look-up table design and parallel operation mechanism, we can quickly implement the DSP algorithm, optimize memory with FFT, and provide operation space for real-time data operation at the same time. it can also provide hardware support for sign recognition design combined with SVM and HMM algorithm.

    3. 1 GB DDR3 SDRAM template database provides a 32-bit data transmission, implements fast data processing between HPS and two pieces of CPU, as well as dynamic sign recognition and tracking.

    4. HDMI (High Definition Multimedia Interface) is applied to transmit the high definition video and high fidelity audio data to improve the effectiveness of the SVM/HMM algorithm.

     

    4. Design Introduction

    In order to realize the function of this translator, apart from this development board, we also need a keyboard to input the command of Linux, a mouse to control the system, a displayer to display our outcomes, as well as a USB flash disk to import video. Some details are shown in the picture below.

     

    The whole project is based on the hardware support and powerful software algorithms. The details of these two aspects are listed below.

    1.Hardware

    A whole development board can be divided into two major parts: FPGA and HPS. There are several functional modules on each parts, which play vital roles to realize various functions. In FPGA, we choose to use FFT Acceleration Engine and HDMI Video Pipeline. FFT Acceleration Engine allows the speed of data processing much more faster, while HDMI Video Pipeline serves as a channel to connect the development board and displayer through which we can observe the outcomes easily. In HPS, SDRAM, USB OTG PHY and SD Card are listed in our consideration. SDRAM is connected with CPU0 and CPU1, as well as 12cache, finishing the controls of the whole system. USB OTG PHY helps us to import live video to the development board. SD Card is used to complete the startup of Linux system and hardware initialization by using Boot Loader.

    2.Software

    We write several algorithms by using C programming language on opencv platform to complete our functions. The algorithms we use are listed below:

           1. Frame Extraction

           2. Gauss Filter

           3. Image Binaryzation

           4. Gesture Tracking

           5. Feature Extraction

           6. Template Training

           7. Template Matching

           8. Sign Recognition

    More details about these algorithms are explained in next section.

                         

    5. Function Description

    In order to realize this function,we use some functions as the following shows

    Function Explanation

    VideoCapture():The invoking of camera

    VideoFrame(string VideoName)

     GaussianBlur(InputArray src, OutputArray dst, Size ksize, double sigmaX, double sigmaY=0, int                 borderType=BORDER_DEFAULT )

    InputArray src: The input image can be the Mat type, and the image depth is CV_8U, CV_16U, CV_16S, CV_32F,  and CV_64F.
    OutputArray dst: The output image has the same type and size as the input image.

    Size ksize: Gauss kernel size, this size is different from the previous two filter kernel sizes, ksize.width and ksize.height can not have to be the same, but these two values must be positive odd numbers, if these two values are 0, their values will be calculated by sigma.
    double sigmaXThe standard deviation of the Gauss kernel function in the direction of the X.
    double sigmaY:The standard deviation of the Gauss kernel function in the Y direction, if sigmaY is 0, then the function will automatically set the value of sigmaY to the same value as sigmaX, and if sigmaX and sigmaY are 0, the two values will be calculated by ksize.width and ksize.height. Specifically, you can refer to the getGaussianKernel () function to see the details. It is recommended that size, sigmaX and sigmaY be specified. 
     int borderType=BORDER_DEFAULT: To deduce some convenient mode of the image external pixel, there is a default value of BORDER_DEFAULT. If there is no special need to change, you can refer to the borderInterpolate () function.

    cvCanny( const CvArr* image,CvArr* edges,double threshold1,double threshold2, int aperture_size=3);

    Image:The input single channel image (which can be a color image) can be modified by cvCvtColor ().

    Edges:The output edge image is also single channel, but it is black and white.

    threshold1: A first threshold threshold2: A second threshold

    aperture_size Sobel: The size of the kernel operator

    intial(Mat src)

    accbackgound(int count);

    backgound(int count);

    foregound(Mat src,Mat pre);

    skin(Mat src) Using YCrCb space to realize skin-color detection

    FindBigContour(IplImage* src,CvSeq* (&contour),CvMemStorage* storage)

    ComputeCenter(CvSeq* (&contour),CvPoint& center,float& radius)

    GetFeature(IplImage* src,CvPoint& center,float radius,float angle[FeatureNum][10],

                         float anglecha[FeatureNum][10],float count[FeatureNum])

    OneGestureTrain(CString GesturePath,CvFileStorage *fs,GestureStruct gesture)

    Train()

    Train() Vddio2Img (String VideoName)

     

     

     

     

     

     

     

     

     

    6. Performance Parameters

    Considering the time cost and hardware performance, we perform Support Vector Machine (SVM) training on 6 static gestures.

    1.50 training sets and 50 test sets. The results are as follows:

     

    Gesture

    Correct Num

    Error Num

    Accuracy

    Total time for 50 Images /s

    “你”

    47

    3

    94%

    2.3~2.4

    “好"

    48

    2

    96%

    2.2~2.5

    F

    49

    1

    98%

    2.2~2.4

    “P”

    47

    3

    97%

    2.3~2.4

    “G”

    47

    3

    94%

    2.1~2.5

    “A”

    48

    2

    96%

    2.2~2.4

     

     

    2 .60 mixed gesture recognition, statistical results and time efficiency table:

     

    Gesture

    Correct Num

    Error Num

    Accuracy

    Total time for 60 Images/s

    “你”

    9

    1

    90%

    3.1~3.5

    “好”

    8

    2

    80%

    2.8~3.0

    F

    9

    1

    90%

    2.6~2.8

    “P”

    9

    1

    90%

    2.7~3.0

    “G”

    8

    2

    80%

    2.5~2.7

    “A”

    9

    1

    90%

    2.6~2.9

                                                                                     (note that all Models are trained in MATLAB platform)

    Since the training data come from the indoors scene the test also needs to be trained much enough data . It can be seen from the data that  static recognition achieves better performance than mixed identification.It has a great relationship with the training data , and it also reflects some shortcomings of the SVM in the classification mechanism  in terms of the efficiency for dynamic recognition. In addition, the introduction of HMM will greatly improve the performance . Due to the limited time, the research on HMM is still in the commissioning stage. We will submit corresponding evaluations in the follow-up process to further improve the solution.


     
     

    7. Design Architecture



    36 Comments

    Tariq Ziad Kanaan
    Hello
    Good luck,keep going
    🕒 Apr 29, 2018 03:12 AM
    PR026🗸
    thanks,we will
    🕒 May 05, 2018 03:29 PM
    Tia Jiang
    Great! I've voted for you.
    Can you vote for our project too? The link is:
    http://www.innovatefpga.com/cgi-bin/innovate/teams.pl?Id=PR130
    3q~~
    🕒 Jan 30, 2018 08:21 PM
    Taufik
    Interesting project, look forward to your project.
    🕒 Jan 30, 2018 11:57 AM
    Chen Junbo
    good idea!
    🕒 Jan 30, 2018 11:31 AM
    PR026🗸
    thanks for your support! we will keep going.
    🕒 Jan 30, 2018 12:23 PM
    Tia Jiang
    please consider viewing my project
    http://www.innovatefpga.com/cgi-bin/innovate/teams.pl?Id=PR130
    3q so much~
    🕒 Jan 30, 2018 08:19 PM
    Nagaraj Shivaramaiah
    Nice project. How fast can this go in real-time (signs or words/minute)? What is the validation strategy/plan?
    🕒 Jan 30, 2018 12:23 AM
    PR026🗸
    Dear judge,thanks for your support. This is a good question, to be honr, we are trying to deal with this problem since the before test is almost the ideal condition we assume the signs are in approximately the same frequency. as for the validation stategy, we adopt SVM/ HMM to classify the static image with SVMand extract the " meaningful" image with HMM method, meanwhile, we want to optimize our algorithm with Hu matix. we are trying.
    🕒 Jan 30, 2018 02:02 PM
    111
    Nice project! I've voted for you.
    Can you vote for our project too? The link is:
    http://www.innovatefpga.com/cgi-bin/innovate/teams.pl?Id=PR132
    thank you !
    🕒 Jan 26, 2018 07:04 PM
    PR026🗸
    thanks for your voting.
    I have vote for your project since it's a practical project,come on!
    🕒 Jan 28, 2018 10:02 AM
    PR026 🗸
    Our New Version Hello 1.1.1
    More detailed design and optimized algorithm
    🕒 Jan 26, 2018 12:14 PM
    Donald Bailey
    How do you plan to do the segmentation of the hand from the background? What features are you using for the sign recognition / classification?
    🕒 Jan 26, 2018 05:38 AM
    PR026🗸
    Dear judge,thanks for your voting.
    For segmentation part,we adopt RGB-YCrCb mode to recognize hands region,then filt the connected domain which does not belong to the hands region by setting the threshold.
    For features extraction,we are trying to optimize the SVM/HMM algorithm with Hu Matix,which keeps the features of hands' geometrical characteristics,such as revolvtion,translation,scale and so.
    🕒 Jan 27, 2018 08:31 AM
    Jun Fu
    不错的作品,期待看到完成。
    🕒 Jan 25, 2018 06:51 PM
    PR026🗸
    感谢您的肯定,我们会更加努力的,事实上我们遇到过很多问题,我们正在尝试一点点解决,相信很快就有新突破的,感谢!
    🕒 Jan 26, 2018 09:37 AM
    Jun Fu
    完成的作品值得称赞,但没有完全达到预期的只要付出努力也是好作品。我一般的观点是,先解决有没有,再解决好不好。至于会有多好,有努力就足够了!祝顺利!
    🕒 Jan 29, 2018 10:20 AM
    PR026🗸
    谢谢您的鼓励,我们会继续努力。
    路漫漫其修远兮,吾将上下而求索!
    🕒 Jan 29, 2018 07:39 PM
    Adobe
    good luck,better design,best giving
    🕒 Jan 25, 2018 11:53 AM
    Adobe
    Create a innovative interface!
    🕒 Jan 25, 2018 11:45 AM
    Apple
    Innovate FPGA,come on!
    🕒 Jan 25, 2018 11:36 AM
    Google
    There is a long way to go,hope you can do it!
    🕒 Jan 25, 2018 11:11 AM
    Doreen Liu
    加油, 期待你的作品!
    🕒 Jan 19, 2018 05:59 PM
    PR026🗸
    感谢您的支持,我们会努力做得更好。您的鼓励将是我们突破难关的动力,加油!
    🕒 Jan 25, 2018 11:02 AM
    MOHAMED
    good, Keep going.
    🕒 Jan 14, 2018 11:54 PM
    MaseratiHh
    66666
    🕒 Jan 13, 2018 11:59 PM
    berkay egerci
    i voted good project
    🕒 Jan 13, 2018 04:11 AM
    一一
    Good job! Waiting for good news from your guys!
    🕒 Jan 12, 2018 08:25 PM
    Yiren
    excellent!
    🕒 Jan 12, 2018 08:20 PM
    日月潭
    good
    🕒 Jan 07, 2018 01:32 PM
    liushunyun
    good job!
    🕒 Jan 01, 2018 10:22 PM
    lixiangxiang
    so great your project!
    go ahead
    🕒 Jan 01, 2018 10:18 PM
    Wuzhe
    ヾ(@^▽^@)ノ
    🕒 Jan 01, 2018 04:36 PM
    Zhou Jiao
    Happy New Year!
    🕒 Jan 01, 2018 10:31 AM
    Apike
    Come On!
    🕒 Jan 01, 2018 10:07 AM
    PR026 🗸
    Our Device Name Is "Hello",Which reflects our idea-To say hello to the deaf-mutes.In this special day,don't forget to express your blessing to those people around you.
    Merry Christmas to all the FPGA groups,Come On!
    🕒 Dec 24, 2017 12:34 PM