Annual: 2019

AS023 »
Real time extraction of text from image using deep learning
📁Machine Learning
👤Priyanka Kondaparthi (San Jose State University)
📅Oct 09, 2019
Regional Final


Description

Purpose: Deep learning has made it possible to extract meaningful information from images. Real-time extraction of text from images helps visually impaired people read instructions on sign boards and get information about items while shopping, among other applications. In this project we aim to interface a camera with the OpenVINO Starter Kit and use its high computing performance to extract text from images in real time using convolutional neural networks.
Applications: This project has applications in autonomous vehicle navigation and in guiding visually impaired people. It can be extended to extract text in English and render it in any other language.

Demo Video

  • URL: https://youtu.be/iZKYMzaTxc8

  • Project Proposal

    1. High-level Project Description

    Introduction:

    Deep learning helps deliver technology that improves quality of life. There is a plethora of navigation aids available on the market for visually impaired people. However, these devices fail to assist in situations such as warning about a wet floor or an elevator failure, reading instructions written on walls inside buildings such as libraries, or shopping. Using the high-performance computing power of the OpenVINO Starter Kit, this project aims to develop a system that enables real-time text extraction from images. It delivers a useful and engaging product for visually impaired people that helps them with shopping, reading information on instruction and sign boards, and navigating inside public buildings. The aim is a product whose applications extend beyond mere recognition of obstacles for navigation assistance.

    Image Capture: Image data is captured dynamically from the camera and sent to a CNN implemented on the FPGA. Communication between the FPGA and the camera module is established over I2C. The image data is stored in SDRAM for use by the CNN to extract text.

    Text Extraction: We train a CNN to extract the text from the image. As an OpenCL high-performance computing platform, the OpenVINO Starter Kit helps implement a neural network model with a short response time.

    Text to Speech Conversion: The extracted text is converted into speech using an Emic 2 text-to-speech module. The text is transmitted to the module through the Arduino expansion header.
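    The hand-off to the text-to-speech module can be illustrated with a minimal sketch. It assumes the Emic 2 serial command format in which a say command is the letter S followed by the text and a newline terminator, with a limit of 1023 characters per message. The function name emic2_say_command is hypothetical, and the project itself sends the bytes through the Arduino expansion header rather than from Python.

```python
def emic2_say_command(text: str, max_chars: int = 1023) -> bytes:
    """Build the bytes for an Emic 2 'S' (say) command.

    Assumption: the module expects 'S' + text + newline over its serial
    link and accepts at most max_chars characters per message.
    """
    # Collapse newlines and repeated whitespace so that the recognized
    # text cannot terminate the command early.
    cleaned = " ".join(text.split())
    # Keep printable ASCII only; recognized text may contain stray symbols.
    cleaned = "".join(ch for ch in cleaned if 32 <= ord(ch) < 127)
    # Truncate to the assumed per-message limit.
    cleaned = cleaned[:max_chars]
    return b"S" + cleaned.encode("ascii") + b"\n"


if __name__ == "__main__":
    # Text as it might come out of the recognition stage.
    print(emic2_say_command("CAUTION\nWET  FLOOR"))
```

    Sanitizing the text before framing it matters because a line break inside the recognized text would otherwise be read as the end of the command.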

     

    OpenCL high-performance computing, the Arduino expansion header, and the larger available memory (64 MB SDRAM and 1 GB DDR3) for storing image buffer data, which speeds up computation, are the motivating factors for using the OpenVINO Starter Kit to implement this project efficiently.

    2. Block Diagram

    3. Intel FPGA virtues in Your Project

    • Arduino Uno Revision 3 expansion header to interface with an Arduino for converting text to speech
    • OpenCL high-performance platform to accelerate the training of convolutional neural networks (CNNs)
    • Larger available memory (64 MB SDRAM and 1 GB DDR3) helps store image buffer data from the camera module's frames. This enables real-time extraction of text from images, resulting in a neural network with a good response time.

    4. Design Introduction

    Introduction:

    Deep learning helps deliver technology that improves quality of life. There is a plethora of navigation aids available on the market for visually impaired people. However, these devices fail to assist in situations such as warning about a wet floor or an elevator failure, reading instructions written on walls inside buildings such as libraries, or shopping. Using the high-performance computing power of the OpenVINO Starter Kit, this project aims to develop a system that enables real-time text extraction from images. It delivers a useful and engaging product for visually impaired people that helps them with shopping and with navigating inside public buildings. The aim is a product whose applications extend beyond mere recognition of obstacles for navigation assistance.

    5. Function Description

    Functional Description:

    The design uses text detection and text recognition models to detect and extract text from the digital camera module feed.

    Scene capture:

    The D8M digital camera module is used to capture the surrounding scene. The video stream is fed to the text detection and recognition models.

    Text detection

    The text detection model flags whether text is present in the video frame. Its output is the count of isolated words, each of which can be delimited by a bounding box. The output of the text detection model is the input to the text recognition model.

    Text recognition

    The text recognition model takes its input from the text detection model and draws a rectangular box around the detected text.

    NN model.

    For text detection, the Open Model Zoo demo text_detection_demo has been used. The demo takes an image, a video, or a webcam feed as input and detects the text in the frames. It takes the text detection and text recognition model XML files as inputs.
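    The two-stage flow described in this section (detect word regions, then recognize each region) can be sketched as follows. This is an illustration under stated assumptions, not the demo's actual code: the names crop and extract_text are hypothetical, a frame is modeled as a plain list of pixel rows, and the detector and recognizer are passed in as callables that stand in for the two networks.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]    # (x, y, width, height) in pixels
Frame = Sequence[Sequence[int]]    # grayscale image as rows of pixel values


def crop(frame: Frame, box: Box) -> List[List[int]]:
    """Cut the region described by box out of the frame."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in frame[y:y + h]]


def extract_text(
    frame: Frame,
    detect: Callable[[Frame], List[Box]],
    recognize: Callable[[List[List[int]]], str],
) -> Tuple[int, List[str]]:
    """Run detection first, then recognition on every detected region.

    Returns the number of detected word boxes and the recognized words.
    """
    boxes = detect(frame)
    words = [recognize(crop(frame, box)) for box in boxes]
    return len(boxes), words


if __name__ == "__main__":
    # Stand-in models for illustration only; a real system would run the
    # text detection and text recognition networks here.
    frame = [[0] * 8 for _ in range(4)]
    fake_detect = lambda f: [(0, 0, 4, 2), (4, 2, 4, 2)]
    fake_recognize = lambda region: f"{len(region[0])}x{len(region)}"
    print(extract_text(frame, fake_detect, fake_recognize))
```

    Passing the models in as callables keeps the pipeline logic independent of the inference backend, so the same flow applies whether the networks run on the CPU or on the FPGA.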

     

     

    6. Performance Parameters

    Performance

    Running the neural network models on the OSK gives higher performance than running them on the CPU.

    The OSK FPGA gives a 2.5x speedup in comparison to the CPU.
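    For reference, a speedup figure of this kind is the ratio of the time per frame on the CPU to the time per frame on the FPGA. The sketch below shows the arithmetic only. The latencies in it are hypothetical placeholders chosen to give a ratio of 2.5; they are not measurements from this project.

```python
def speedup(cpu_ms_per_frame: float, fpga_ms_per_frame: float) -> float:
    """Speedup of the FPGA over the CPU as a ratio of per-frame latencies."""
    if cpu_ms_per_frame <= 0 or fpga_ms_per_frame <= 0:
        raise ValueError("latencies must be positive")
    return cpu_ms_per_frame / fpga_ms_per_frame


def frames_per_second(ms_per_frame: float) -> float:
    """Convert a per-frame latency in milliseconds to throughput."""
    return 1000.0 / ms_per_frame


if __name__ == "__main__":
    # Hypothetical latencies, for illustration only.
    cpu_ms, fpga_ms = 500.0, 200.0
    print(speedup(cpu_ms, fpga_ms))       # ratio of the two latencies
    print(frames_per_second(fpga_ms))     # throughput at the FPGA latency
```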

     

    7. Design Architecture

    Design Architecture:

    The D8M GPIO camera module interfaces with the GPIO of the OSK. The 1 GB DDR3 SDRAM allows a buffer large enough for efficient streaming of the digital camera module feed.
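    To give a sense of the buffer sizes involved, the following sketch estimates how many uncompressed frames fit in each memory. The frame format (640x480 pixels, 3 bytes per pixel) is an assumption made for illustration and is not stated in the project, and the 64 MB and 1 GB capacities are interpreted here as binary megabytes and gigabytes. The function names are hypothetical.

```python
MIB = 1024 * 1024

SDRAM_BYTES = 64 * MIB      # 64 MB SDRAM, read as 64 MiB
DDR3_BYTES = 1024 * MIB     # 1 GB DDR3, read as 1 GiB


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def frames_that_fit(memory_bytes: int, width: int, height: int,
                    bytes_per_pixel: int) -> int:
    """Number of whole frames that fit in the given amount of memory."""
    return memory_bytes // frame_bytes(width, height, bytes_per_pixel)


if __name__ == "__main__":
    # Assumed frame format: 640x480, 3 bytes per pixel (RGB).
    w, h, bpp = 640, 480, 3
    print(frame_bytes(w, h, bpp))                   # 921600
    print(frames_that_fit(SDRAM_BYTES, w, h, bpp))  # 72
    print(frames_that_fit(DDR3_BYTES, w, h, bpp))   # 1165
```

    Under these assumptions the DDR3 holds roughly sixteen times as many frames as the SDRAM, which is what leaves headroom for buffering a continuous camera stream.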

     

     

     

    Training model

     

     

    Test model

     

     


     



    16 Comments

    Sreeteja
    Interesting work Priya. All the best. Keep going
    🕒 Jul 06, 2019 08:32 AM
    Kalyan pinni
    Nice project.. keep rocking
    🕒 Jul 06, 2019 02:21 AM
    shashidhar
    Great work!
    🕒 Jul 05, 2019 08:59 PM
    M Balachander Reddy
    Very nice project to work on. This can aid visually challenged persons to a greater extent. All the best.
    🕒 Jul 05, 2019 08:38 PM
    Manu Chemudupati
    An interesting project .
    🕒 Jul 05, 2019 08:12 PM
    Julian Paris Ortiz Ortiz
    Nice project.
    🕒 Jul 03, 2019 09:07 PM
    Subhankar Satapathy
    A futuristic application that can be used in various detection appliances.
    🕒 Jul 02, 2019 08:52 PM
    Chanakya Kolla
    Useful topic, I hope you can do more projects like this. All the best!
    🕒 Jul 02, 2019 03:20 PM
    siri
    Excellent project, very useful. Of all of them, yours is the best.
    🕒 Jul 02, 2019 12:44 PM
    Sreeja kondaparthi
    Good innovative project !!
    🕒 Jul 02, 2019 05:31 AM
    Karan Sunchanakota
    Good Topic!
    🕒 Jul 01, 2019 07:46 PM
    shiva kondaparthi
    Good innovative project
    🕒 Jul 01, 2019 06:00 PM
    Sirisha
    Very useful project! Helps even for people who cannot read! All the best
    🕒 Jul 01, 2019 05:30 PM
    Priyatham Ganta
    Good project
    🕒 Jul 01, 2019 12:26 PM
    Ushasri
    Good topic
    🕒 Jul 01, 2019 04:47 AM
    Mandy Lei
    Good topic! Maybe you can submit the Block Diagram of your design.
    🕒 Jun 27, 2019 09:16 PM
