
AP047 » Realtime sign language translation to speech using DNN
📁 Machine Learning
👤 Ramith Hettiarachchi (University of Moratuwa)
📅 Jun 30, 2019
Regional Final





Description

A considerable number of people worldwide reportedly suffer from speech disorders such as muteness, apraxia (childhood or acquired) and aphasia. These may occur due to brain damage, stroke, head injury, a tumor or any other illness that affects the brain, vocal cords, mouth or tongue. Our device mainly targets the community whose conditions cannot be cured by speech-language pathologists.

Existing solutions include image-processing techniques, in which camera frames are processed and decoded to text, and text-to-speech converters, which turn typed text into electronic vocalizations. However, camera-based devices must adapt to varying or sudden lighting conditions while maintaining image quality, which can reduce accuracy. In addition, the text-to-speech method interrupts eye contact during conversation, so it does not give the person a natural conversation experience.

To address these problems and give users a real-time communication experience, we propose a system that recognizes gestures (sign language / fingerspelling) using electromyography (EMG) and inertial measurement unit (IMU) sensors. The DE-10 Nano kit will receive these sensor readings, and a pre-trained deep neural network (DNN) will be used for inference. Since inference is done on the FPGA board itself, the need for a separate computational device is eliminated, making our device a portable, real-time processor. Ultimately this will enable people in need to communicate efficiently.

The output is given in both voice and text formats and supports up to five spoken languages: English, Chinese, French, Hindi and Arabic. An Arduino Nano will be used to interface the output devices: a speaker and an HC-05 Bluetooth module. The transcript of the translated sign language can be viewed on a mobile device connected via Bluetooth.

Possible future extensions of this work include accepting other sign languages as input and supporting additional output languages beyond the five built-in ones. Community support will be very useful in scaling this to multiple sign languages and spoken outputs.

Project Proposal

1. High-level Project Description

A considerable number of people worldwide reportedly suffer from speech disorders such as muteness, apraxia (childhood or acquired) and aphasia. These may occur due to brain damage, stroke, head injury, a tumor or any other illness that affects the brain, vocal cords, mouth or tongue. Exclusion from communication can have a significant impact, especially in day-to-day interactions with people who can speak, and can lead to social issues such as loneliness, isolation and frustration. Our device mainly targets the community whose conditions cannot be cured by speech-language pathologists.

Existing solutions include image-processing techniques, in which camera frames are processed and decoded to text, and text-to-speech converters, which turn typed text into electronic vocalizations. However, camera-based devices must adapt to varying or sudden lighting conditions while maintaining image quality, which can reduce accuracy. In addition, the text-to-speech method interrupts eye contact during conversation, so it does not give the person a natural conversation experience.

To address these problems and give users a real-time communication experience, we propose a system that recognizes gestures (sign language / fingerspelling) using electromyography (EMG) and inertial measurement unit (IMU) sensors. The DE-10 Nano kit will receive these sensor readings, and a pre-trained deep neural network (DNN) will be used for inference. Our system will take the form of a single-board-computer-based device that uses the FPGA for the heavy computation, since it must process gestures in real time and give output in five different languages. Since inference is done on the FPGA board itself, the need for a separate computational device is eliminated, making our device a portable, real-time processor. Ultimately this will enable people in need to communicate efficiently.
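As a rough illustration of how the raw sensor streams could be shaped into inputs for the DNN, the sketch below slices the combined EMG and IMU readings from both armbands into fixed-length windows. It is a minimal Python sketch under our own assumptions; names such as WINDOW_SIZE, STEP and the channel count are illustrative, not final design values.

```python
import numpy as np

# Assumed stream layout: 2 armbands x (8 EMG + 9 IMU) channels = 34 channels per sample.
N_CHANNELS = 2 * (8 + 9)
WINDOW_SIZE = 50   # samples per inference window (assumed)
STEP = 25          # 50% overlap between consecutive windows (assumed)

def make_windows(stream: np.ndarray) -> np.ndarray:
    """Slice a (num_samples, N_CHANNELS) stream into overlapping windows.

    Returns an array of shape (num_windows, WINDOW_SIZE, N_CHANNELS) that can be
    fed, one window at a time, to the pre-trained DNN on the FPGA.
    """
    starts = range(0, len(stream) - WINDOW_SIZE + 1, STEP)
    return np.stack([stream[s:s + WINDOW_SIZE] for s in starts])

if __name__ == "__main__":
    fake_stream = np.random.randn(500, N_CHANNELS)  # stand-in for live MYO data
    windows = make_windows(fake_stream)
    print(windows.shape)                            # (19, 50, 34)
```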

The output is given in both voice and text formats and supports up to five spoken languages: English, Chinese, French, Hindi and Arabic. An Arduino Nano will be used to interface the output devices: a speaker and an HC-05 Bluetooth module. The transcript of the translated sign language can be viewed on a mobile device connected via Bluetooth.

Possible future extensions of this work include accepting other sign languages as input and supporting additional output languages beyond the five built-in ones. Community support will be very useful in scaling this to multiple sign languages and spoken outputs.

2. Block Diagram

3. Intel FPGA Virtues in Your Project

I/O Expansion

Our proposed system incorporates an HC-05 low-power Bluetooth module in order to communicate with the 2x MYO armbands; the DE-10 Nano will be configured to talk to it over the UART protocol. In addition, the Arduino expansion header on the DE-10 Nano will be used to display the decoded text on an OLED display, and the speaker will be driven directly by the DE-10 Nano board.
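As a small illustration of the UART side of this I/O, the following Python sketch (assuming the pyserial package and a Linux environment on the DE-10 Nano's HPS) sends a decoded gesture index out over a serial port. The device path /dev/ttyS1, the baud rate and the one-integer-per-line framing are assumptions made for the example, not the project's actual configuration.

```python
import serial  # pyserial

# Assumed UART device and settings; the real port and baud rate depend on the board setup.
uart = serial.Serial("/dev/ttyS1", baudrate=115200, timeout=1)

def send_gesture_id(gesture_id: int) -> None:
    """Send the decoded gesture index to the Arduino Nano as one ASCII line."""
    uart.write(f"{gesture_id}\n".encode("ascii"))

send_gesture_id(7)  # e.g. gesture class 7; the Arduino maps it to text on the OLED
```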

Scalability

Although we propose deploying a deep neural network trained on data collected from many users, per-user modifications and fine-tuning can be done over time. For example, the initially trained model may not carry the optimal parameters for describing a particular user's gestures. Therefore, based on the user's input, we can perform a calibration stage so that future readings are more accurate for that particular user.
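One simple way such a calibration stage could be realised is per-user normalisation of the EMG/IMU channels. The sketch below is a numpy illustration under our own assumptions, not the project's final method: it estimates user-specific statistics from a short enrolment recording and applies them to subsequent readings before inference.

```python
import numpy as np

def fit_user_calibration(enrolment: np.ndarray):
    """Estimate per-channel statistics from a short per-user enrolment recording.

    enrolment: array of shape (num_samples, num_channels).
    """
    mean = enrolment.mean(axis=0)
    std = enrolment.std(axis=0) + 1e-8   # avoid division by zero
    return mean, std

def apply_calibration(window: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Normalise a new window with the user-specific statistics before inference."""
    return (window - mean) / std

# Usage: record a short enrolment session once per user, then reuse the statistics.
enrolment = np.random.randn(2000, 34) * 3.0 + 1.5    # stand-in for recorded sensor data
mean, std = fit_user_calibration(enrolment)
calibrated = apply_calibration(np.random.randn(50, 34), mean, std)
print(calibrated.shape)  # (50, 34)
```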

Boosts Performance

There has been previous work on gesture recognition using mobile phones. However, due to limited processing power, those methodologies have been restricted to traditional machine learning approaches such as SVM, Naive Bayes, Random Forest and MLP.

With the DE-10 Nano FPGA, we will have accelerated hardware to perform real-time inference with our deep learning model. Ultimately, this will enable much better precision in decoding sign language gestures.

Furthermore, since we will be receiving data from the 2x MYO bands, the data streams need to be fed to the inference engine without exceeding the latency threshold. This is addressed effectively by processing the streams in parallel on the FPGA.
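On the software side, one way the two armband streams could be kept flowing while inference runs is to read them on separate threads and merge them through a queue, as in the Python sketch below. This is only an illustration of the idea: read_myo_packet is a hypothetical placeholder for the actual Bluetooth/UART read, and the real parallel number-crunching happens in the FPGA fabric.

```python
import queue
import threading
import time

samples = queue.Queue()

def read_myo_packet(band_id: int) -> dict:
    """Hypothetical placeholder for reading one packet from a MYO armband."""
    time.sleep(0.01)  # stand-in for the real Bluetooth/UART wait
    return {"band": band_id, "t": time.time(), "emg": [0] * 8, "imu": [0.0] * 9}

def reader(band_id: int) -> None:
    """Continuously push packets from one armband into the shared queue."""
    while True:
        samples.put(read_myo_packet(band_id))

for band in (0, 1):  # one reader thread per armband
    threading.Thread(target=reader, args=(band,), daemon=True).start()

# The inference loop drains the queue, pairing packets from both bands
# before handing a complete window to the inference engine.
for _ in range(5):
    packet = samples.get()
    print(packet["band"], len(packet["emg"]), len(packet["imu"]))
```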

MYO Armband

(Figure: MYO armband)

4. Design Introduction

Our device mainly targets the community whose conditions cannot be cured by speech-language pathologists. The target group includes people suffering from speech disorders such as muteness, apraxia (childhood or acquired) and aphasia, who often find it very challenging to convey their messages to people who do not understand sign language.

The purpose of this design is to build a mobile system powerful enough to translate sign language gestures to text and speech. Existing systems that use a trained deep neural network rely on a separate computer to perform the computations, which becomes an issue in practical applications. Therefore, the DE-10 Nano will play a major role in providing accelerated performance while keeping the device in a small form factor.

Why Intel FPGA?

There has been previous work on gesture recognition using mobile phones. However, due to limited processing power, those methodologies have been restricted to traditional machine learning approaches such as SVM, Naive Bayes, Random Forest and MLP.

However, with the DE-10 Nano FPGA, we will have accelerated hardware to perform real-time inference with our deep learning model. Ultimately, this will enable much better precision in decoding sign language gestures.

5. Function Description

The IMU and EMG data will be extracted from two MYO armbands, each of which has the following specifications:

  • Eight EMG electrodes
  • 9-axis IMU composed of a 3-axis accelerometer, a 3-axis gyroscope and a 3-axis magnetometer
  • Vibration motor used to alert the user

These floating-point numbers will be fed to the inference engine (a trained DNN), which will then classify the gesture based on its probability score.
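As a concrete illustration of that decision step, the sketch below converts the network's raw outputs into probabilities and accepts the most probable gesture only above a confidence threshold. The gesture labels and the 0.7 threshold are assumptions made for the example, not values from our trained model.

```python
import numpy as np

GESTURES = ["hello", "thank_you", "yes", "no", "help"]  # illustrative labels only
CONFIDENCE_THRESHOLD = 0.7                              # assumed cut-off

def pick_gesture(logits: np.ndarray):
    """Convert raw DNN outputs to probabilities and select the best gesture."""
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    if probs[best] < CONFIDENCE_THRESHOLD:
        return None, probs[best]          # treat as a general hand movement, not a sign
    return GESTURES[best], probs[best]

gesture, score = pick_gesture(np.array([0.2, 3.1, 0.4, -1.0, 0.5]))
print(gesture, round(float(score), 3))    # -> thank_you and its probability
```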

The selected gesture will be transmitted as an integer to the Arduino Nano via UART, which will then display the corresponding characters on the OLED screen.

Intended Features from DE-10 Nano as the Main Computational Device

  • Sensor fusion of the IMU and EMG data from the 2x MYO armbands

  • Conversion of gestures to text (shown on the OLED)

  • Conversion of sign language gestures to speech

  • Customizable text-to-speech output language

  • Streaming of the decoded text to a remote computer

  • Sending of urgent/emergency commands

  • Filtering out general hand movements from sign commands

 

The exact output control logic could be implemented on the DE-10 board itself; however, because outputting the decoded text is not time-critical, we make use of an intermediate device (the Arduino Nano).

6. Performance Parameters

The performance of our device can be evaluated on two metrics: first, the overall accuracy of the translation from sign language to speech/text, and second, the efficiency of the system in terms of speed, energy consumption and memory usage.

A study [1] shows that the traditional machine learning approaches SVM, Naive Bayes, Random Forest and MLP give average accuracies of 81.38%, 90.66%, 80.47% and 90.45% respectively. Our aim is to move beyond the traditional machine learning approach and harness the power of a DNN to achieve higher accuracy.

[1] P. Paudyal, A. Banerjee, and S. K. S. Gupta, "Sceptre: A pervasive, non-invasive, and programmable gesture recognition technology," in Proceedings of the 21st International Conference on Intelligent User Interfaces (IUI), ACM, 2016, pp. 282–293.
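For completeness, the accuracy figure we intend to report is the plain classification accuracy over a held-out test set of gesture windows; the short sketch below (with made-up labels, purely for illustration) shows the metric computation.

```python
import numpy as np

def accuracy(predicted: np.ndarray, actual: np.ndarray) -> float:
    """Fraction of test windows whose predicted gesture matches the ground-truth label."""
    return float(np.mean(predicted == actual))

# Illustrative labels for ten test windows (made up for the example).
y_true = np.array([0, 1, 2, 1, 0, 3, 3, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 0, 3, 1, 2, 1, 0])
print(f"accuracy = {accuracy(y_pred, y_true):.2%}")  # accuracy = 90.00%
```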

7. Design Architecture



8 Comments

Pravilasha Ramakrishnan
Good concept, all the best
🕒 Jul 06, 2019 03:16 AM
AP047🗸
Thank You! :)
🕒 Jul 06, 2019 11:07 AM
Aba Gn
Good concept! Best of luck!
🕒 Jul 05, 2019 02:13 AM
AP047🗸
Thank You! :)
🕒 Jul 05, 2019 05:36 AM
Amaya Dharmasiri
Great stuff!! good luck
🕒 Jul 05, 2019 12:48 AM
AP047🗸
Thank You! :)
🕒 Jul 05, 2019 12:58 AM
Vihanga
Why not use MYO's inbuilt BLE?
🕒 Jul 03, 2019 05:05 PM
AP047 🗸
Hi,
We are using the MYO's built-in Bluetooth; the HC-05 is used only as an intermediate module on the FPGA's end.
🕒 Jul 03, 2019 05:20 PM
