AP070 » Image Captioning Based Aid for the Blind Using Deep Learning
According to the World Health Organization (WHO), the global population of visually impaired people was estimated at 285 million in 2010. This massive population has to deal with many environmental, social, and technological challenges in daily life. It is difficult for them to navigate outside the spaces they are accustomed to; they cannot easily participate in social activities; and blindness restricts their career options, which affects both their finances and their self-esteem. Blindness can also make it difficult to use the internet for research, general purposes, or social media. Impaired vision affects an individual not only physically but also emotionally. Becoming familiar with the challenges that blindness creates can help sighted people better understand these problems and the importance of this project. Our team is taking a step forward to bridge the gap between the lifestyle of the visually impaired and that of sighted people.
The proposed idea is to create a wearable device that can guide a visually impaired individual in daily life. The device consists of a camera and actuators (a speaker) interfaced with an FPGA (OpenVINO Starter Kit); it takes an image of the external environment as input and generates a meaningful output understandable by a visually impaired person. We are implementing an image captioning algorithm using a Convolutional Neural Network (CNN). In the first stage, the project will generate output through actuators that a visually impaired individual can understand. After successful implementation of the first stage, the project will be upgraded with voice output using an LSTM architecture and a text-to-speech generator.
Computer vision has become ubiquitous in our society, with applications in several fields. In this project, we focus on one of its visual recognition facets, namely image captioning. Due to recent advancements in object detection, the task of describing the scene in an image has become easier. We can create a wearable product for the blind that will guide them while travelling on the roads without anyone else's support. This is done by interfacing a camera sensor with an FPGA (the OpenVINO Starter Kit), which captures real-time images and feeds the image frames to the FPGA. A CNN-based image captioning logic inside the FPGA then takes those frames as input data for an already trained Convolutional Neural Network (CNN). The CNN generates a meaningful output that is sent to the external environment through actuators, where it can be sensed by a blind person.
During CNN training, the dataset is provided from an SD card interfaced with the OpenVINO Starter Kit.
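The capture → caption → actuator loop described above can be sketched as follows. This is a minimal illustration only: the function names, the 4×4 toy frame, and the threshold-based "CNN" are placeholders we invented for this sketch, not the actual FPGA implementation.

```python
# Hypothetical sketch of the system loop: camera -> CNN caption -> speaker.
# All functions here are stand-ins, not the real hardware interfaces.

def capture_frame():
    """Stand-in for the camera interface: returns a 4x4 grayscale frame."""
    return [[0.1 * (r + c) for c in range(4)] for r in range(4)]

def cnn_caption(frame):
    """Stand-in for the trained CNN: maps a frame to a short caption.
    Here we just threshold mean brightness to pick a canned phrase."""
    mean = sum(sum(row) for row in frame) / 16
    return "bright open area ahead" if mean > 0.25 else "dark obstacle ahead"

def speak(caption):
    """Stand-in for the speaker actuator (later: LSTM + text-to-speech)."""
    return f"[speaker] {caption}"

frame = capture_frame()
print(speak(cnn_caption(frame)))   # [speaker] bright open area ahead
```

In the real device this loop runs continuously on incoming frames, with the CNN inference implemented in FPGA logic rather than software.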
Features of our system
Read sign boards
Read text in an image (in Phase 2)
Voice output that explains the scenario in the image frame (in Phase 2)
Challenges to face
A CNN involves a huge number of complex arithmetic computations on IEEE 754 floating-point numbers. For these calculations, we must design a Floating Point Unit (FPU) that performs addition, subtraction, multiplication, and division on IEEE 754 floating-point values; units for computing the activation functions are also required. All of these computation units must generate output with high accuracy and optimized latency for fast computation. The main challenge is to derive accurate results in real time in real-life scenarios.
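To make the FPU design concrete, the sketch below decomposes a number into the three IEEE 754 single-precision fields (sign, biased exponent, mantissa) that such a hardware unit operates on. This is an illustrative software model, not part of the FPGA design itself.

```python
import struct

def decompose_f32(x):
    """Split a float into its IEEE 754 single-precision (binary32) fields:
    1 sign bit, 8-bit biased exponent, 23-bit mantissa fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # biased by 127
    mantissa = bits & 0x7FFFFF        # 23-bit fraction field
    return sign, exponent, mantissa

# 1.0 encodes as sign=0, exponent=127 (the bias), mantissa=0
print(decompose_f32(1.0))    # (0, 127, 0)
# -2.5 = -1.25 * 2^1: sign=1, exponent=128, mantissa=0.25 * 2^23
print(decompose_f32(-2.5))   # (1, 128, 2097152)
```

A hardware FPU aligns exponents, operates on the mantissas, and renormalizes the result, which is why each of the four operations needs its own carefully verified datapath.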
Interfacing RAM and the SD card with the FPGA and HPS during training
Designing the LSTM architecture and generating speech output in real time
Progress so far
We have successfully designed an Artificial Neural Network with 5 layers and a parameterized number of neurons per layer. This ANN performs a forward pass in 4(L0+L1+L2+L3) clock cycles and backpropagation in 12(L0+L1+L2+L3)+4(L1+L2+L3+L4) clock cycles, where L0–L4 are the numbers of neurons in the respective layers.
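The cycle formulas above can be turned into a quick latency estimator. The layer sizes in the example are hypothetical values chosen for illustration, not our actual network configuration.

```python
# Cycle-count estimates for the 5-layer ANN, using the formulas quoted
# in the text; layer sizes here are illustrative placeholders.

def forward_cycles(layers):
    """Forward pass latency: 4 * (L0 + L1 + L2 + L3)."""
    l0, l1, l2, l3, l4 = layers
    return 4 * (l0 + l1 + l2 + l3)

def backward_cycles(layers):
    """Backpropagation latency: 12*(L0+L1+L2+L3) + 4*(L1+L2+L3+L4)."""
    l0, l1, l2, l3, l4 = layers
    return 12 * (l0 + l1 + l2 + l3) + 4 * (l1 + l2 + l3 + l4)

layers = (784, 128, 64, 32, 10)   # hypothetical 5-layer sizes
print(forward_cycles(layers))     # 4 * 1008 = 4032
print(backward_cycles(layers))    # 12 * 1008 + 4 * 234 = 13032
```

Dividing these counts by the FPGA clock frequency gives a per-image latency estimate, which is how we check the design against the real-time requirement.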
FPGAs suit our project best for the following reasons
1. They can easily be customized for an embedded system, e.g. a wearable device that detects undesirable changes in the body and decides whether the condition is worth reporting to a doctor.
2. Unlike processors (CPUs/GPUs), which need an OS as part of their software stack, along with memory management and juggling of processor capacity, an FPGA does not carry the extra baggage of an OS. An FPGA replaces software functionality with hardware, which boosts system performance.
3. FPGAs execute in parallel, providing a deterministic hardware circuit to all the sensors, each working on its own task simultaneously and at increased speed while the sensor data is processed.
4. The industry is moving towards heterogeneous computer architectures to increase performance and reduce computation time. An FPGA working in parallel with a CPU or GPU can drastically increase the performance of a machine. For example, Microsoft's use of FPGAs in its data centers has significantly accelerated tasks like search and data conversion.