AS037 » CRIOS - Cruise speed controller with Deep Learning
📁Machine Learning
👤 Ânderson Ignacio da Silva (Mr.)
📅Jul 09, 2018
Regional Final




Description

The overall idea of this project is to develop a complete system that detects traffic signs, classifies them, and informs/helps the driver to control speed and movement along the route, using deep learning for the traffic signs and fast segmentation for the image processing.

Demo Video

  • URL: https://youtu.be/WBMqI9TFrvA

  • Project Proposal

    1. High-level Project Description

    Motivation

    The evolution of electronics was crucial to the advance of robotics, as most essential processing (i.e. the core of a robot) is done through digital computers, integrated sensors and smart actuators. This integration was only made possible by the constant growth of microelectronics, advanced microprocessor micro-architectures, robust programming techniques and low-power system designs. One of the major issues with autonomous robots is, besides the control system itself, making the robot aware of its surroundings. This includes the recognition of structures, such as roads, bridges and buildings; the recognition of mobile objects, such as pedestrians, cars, bikes and trucks; and especially of objects that change the paradigm of the robot's control, such as traffic signs and emergency signals. This work focuses on developing a system that can be used on a self-driving car, or another type of autonomous robot, to detect and interpret the traffic signs that the robot may encounter during its operation.

    The idea of using an FPGA in this framework is to allow faster processing through parallel system designs, while keeping the main control system focused on the most essential processing operations. This approach may increase the system's overall performance, allowing a more flexible system while keeping the basic functions intact.

    Design Goals

    The overall idea of this project is to develop a complete system to detect traffic signs, classify them and inform/help the driver to control speed and movements along the route. With the complete system running, the project aims to:

    1. Recognize traffic signs in real time;
    2. Classify each traffic sign with a previously trained convolutional neural network;
    3. Report to and help the driver with the driving task, through a set of tips.

    Application areas and Target users

    This design is intended to be used in many application areas, from educational self-driving car projects to autonomous platforms for testing on roads. Besides covering a wide area, we focus on the application of a guidance device that helps the driver by automatically mapping each traffic sign.

    2. Block Diagram

    Figure 1: System overview.

    The project is separated into five modules, where each module will be responsible for processing an image/data and outputting its results to the next. Once an image has been processed, the auxiliary peripherals (ARM/Arduino processors) will process the information with their built-in applications to interface the results with the external car environment sensors.

    Figure 2: Global perspective.

    Figure 2 above shows a global perspective of all the blocks that compose the entire project. In our approach, the image captured from the camera is first pre-processed, where the true-color value of each pixel is remapped to match the real colors of the traffic signs on the road. In the next step, an optimized fast segmentation algorithm takes the pre-processed image and maps each object in the frame, marking with squares where each object is recognized, and then hands it on to the next process. After classification, the objects in the image will be analyzed by a "pseudo-computer" block.
    The concept of this hardware is to understand the driver's context and guide him according to the traffic signs on the road, where the context is taken into account to present the driver with possible actions (i.e. reduce speed if a winding road is ahead, or be careful if there is a possibility of deer/cattle crossing).
    Once this is done, the FPGA will also be responsible for merging all the processed information into a single video output with the results and other highlighted information, such as the history of events, the speed delta (target from signs vs. actual from the vehicle), the objects recognized, and others.

    3. Intel FPGA virtues in Your Project

    • High-speed execution of parallel structures on the DE10-Nano, which makes it great for CNNs (deep learning) compared with classical x86 structures or GPUs;
    • The Altera device makes it possible to run the segmentation (extraction of ROIs - Regions of Interest) with a high degree of parallelism, executing the task in a shorter period than a dedicated traditional CPU;
    • The Arduino expansion header and the additional peripherals of the DE10-Nano bring a huge range of possibilities for input sensors to the CRIOS project, which creates a great structure for a complete vehicle speed controller;
    • The idea of creating a parallelized processing structure makes the project more flexible for future modifications, while keeping its main operational features intact within a composed system.

    4. Design Introduction

    The design aims to recognize traffic signs in a given picture. This is useful for real-time applications in robots, such as autonomous cars and field-driving robots, but it can be expanded to other applications, depending on the design requirements.
    The algorithm proposed to recognize traffic signs in a given picture is described as:

    • Expose the traffic sign in the frame (Thresholding);
    • Crop the area of the frame where the traffic sign was found (Segmentation);
    • Extract the actual traffic sign correspondence using a specific algorithm (Recognition);
    • Present the recognized traffic sign in the output video for the user.

    A graphical flow is shown in Figure 3:

    Figure 3: CRIOS algorithm flow.

     

    5. Function Description

    The next sections explain the operation of each individual process:

     

    Thresholding:

    Figure 4: Thresholding data flow.


    The thresholding process is the initial part of the proposed algorithm. It will be implemented on the FPGA portion of the device. The algorithm will process a single frame captured from the camera and send the processed data to the segmentation unit. Its purpose is, given a frame with random content (cars, people, dogs, buildings), to apply a filter based on the desired objects (in this case, traffic signs) that allows the segmentation algorithm to extract the relevant data from the frame. The portion of the frame with the desired data is called a "Region of Interest (ROI)", and it will be used later by the traffic sign recognition algorithm. The output data will be a gray-scale picture.
    The global view of the Thresholding data flow is shown in Figure 4.


    The actual algorithm is more complex, with its data flow described in Figure 5:

     

    Figure 5: Detailed Thresholding data flow.

     


    Each sub-block is described as follows:

     

    Camera


    This block is responsible for capturing the frame from the camera, and it will store the data in an On-Chip Random Access Memory (OCRAM) for access by the post-processing blocks. This memory is often called a "Frame Buffer", as it stores a full frame for later use.

     

    Frame configuration


    The frame data acquired from the camera will have the following configuration (a short sketch after the list shows the buffer arithmetic these numbers imply):

     

    • Resolution: 640 x 480 pixel standard "VGA";
    • Frame rate: 60 frames per second;
    • Data format: Bayer filter mosaic RAW (10-bit).
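
    As a back-of-the-envelope check, a minimal C++ sketch of the frame-buffer arithmetic this configuration implies; the 307200-entry figure reappears in the OCRAM section below.

    ```cpp
    #include <cstdio>

    // Frame-buffer sizing from the camera configuration above:
    // 640 x 480 pixels, 10-bit RAW Bayer data per pixel.
    constexpr int  kWidth      = 640;
    constexpr int  kHeight     = 480;
    constexpr int  kBitsPerPix = 10;
    constexpr int  kEntries    = kWidth * kHeight;  // 307200 buffer entries
    constexpr long kTotalBits  = static_cast<long>(kEntries) * kBitsPerPix;

    int main() {
        std::printf("frame buffer: %d x %d-bit entries (%ld bits total)\n",
                    kEntries, kBitsPerPix, kTotalBits);
    }
    ```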

     

    The configuration of the camera is done by a dedicated sub-system, whose details will not be covered in this document.

     

    On-Chip Random Access Memory

     

    The data captured from the camera will be stored in the FPGA's On-Chip Random Access Memory (OCRAM), where it will be accessed by the post-processing algorithms. As mentioned above, this memory acts as the "Frame Buffer", storing a full frame for later use.
    The "Frame Buffer" consists of a memory structured as 307200 x 10-bit elements, where the data captured from the camera is stored in the RAW data file format.
    As the thresholding algorithms use various data formats in their operations (i.e. RGB, HSV), the RAW data will need to be converted.
    Access to the OCRAM will be controlled by the Thresholding control block, which selects which sub-processing block may access and modify the content stored in the memory block.

     

    Data Converter

     

    The first frame-data processing sub-system (Binary Thresholding) will process data in the Hue-Saturation-Value (HSV) format. Thus, the data stored in the RAW format in the frame buffer memory will need to be converted to RGB, and then to HSV. This block performs that conversion.
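
    The FPGA implements this converter in hardware; as a software analogue, the same RAW -> RGB -> HSV chain can be sketched with the OpenCV calls already used elsewhere in the project (the sensor's exact Bayer pattern is an assumption):

    ```cpp
    #include <opencv2/opencv.hpp>

    // Software analogue of the Data Converter block: RAW Bayer -> RGB -> HSV.
    cv::Mat rawToHsv(const cv::Mat &raw10bit /* CV_16UC1 holding 10-bit data */) {
        cv::Mat bgr16, bgr8, hsv;

        // Demosaic the Bayer mosaic into a 3-channel image.
        // The BG pattern is an assumption; it depends on the sensor.
        cv::cvtColor(raw10bit, bgr16, cv::COLOR_BayerBG2BGR);

        // Scale 10-bit values (0..1023) down to 8 bits for the HSV stage.
        bgr16.convertTo(bgr8, CV_8UC3, 255.0 / 1023.0);

        // BGR -> HSV: H in [0,179], S and V in [0,255] (OpenCV convention).
        cv::cvtColor(bgr8, hsv, cv::COLOR_BGR2HSV);
        return hsv;
    }
    ```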


    Binary Thresholding

     

    This block will apply a filter based on the HUE of the frame stored in the OCRAM, and will store the processed data in a second OCRAM for use by the post-processing algorithms.
    The implementation of the algorithm is as described:

     

    • Use the desired traffic signs for HUE extraction; 
    • Define the HUE threshold for the Binary Thresholding using the HUE extracted from the traffic signs;
    • Read the HUE data from the frame buffer for each pixel;
    • Reconstruct the frame in a second OCRAM using binary numbers (i.e. if the pixel HUE from the frame buffer matches the defined threshold, the corresponding pixel value in the second OCRAM will be "1", else it will be "0");
    • Repeat the logic over the whole frame buffer.

    With this logic a second frame will be reconstructed, but as it will contain only binary data, it will be called the "Binary Mask".
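
    In software terms, this step amounts to the following OpenCV sketch; the hue window is left as a parameter, and concrete values appear later in the Recognition section:

    ```cpp
    #include <opencv2/opencv.hpp>

    // Sketch of the Binary Thresholding step: every pixel whose HSV value
    // falls inside the window extracted from the traffic signs becomes
    // "1" (255 here) in the binary mask, everything else becomes "0".
    cv::Mat binaryThreshold(const cv::Mat &hsv,
                            const cv::Scalar &lo,   // (H_min, S_min, V_min)
                            const cv::Scalar &hi) { // (H_max, S_max, V_max)
        cv::Mat mask;
        cv::inRange(hsv, lo, hi, mask); // CV_8UC1: 255 in range, 0 otherwise
        return mask;
    }

    // Example with the red/STOP window listed in the Recognition section:
    // cv::Mat mask = binaryThreshold(hsv, {0, 100, 100}, {10, 255, 255});
    ```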

     

    Low Pass Filter

     

    The data processed by the Binary Thresholding may leave noisy pixels in the second frame, which may interfere with the segmentation algorithm. Thus, a low-pass filter will be applied to the secondary OCRAM, minimizing the noise in the frame while keeping the desired data (i.e. the traffic sign region) intact, as the ROI forms a low-frequency data zone.
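
    A plausible software analogue of this filter is a median filter on the binary mask; the kernel size is an assumption, not taken from the text:

    ```cpp
    #include <opencv2/opencv.hpp>

    // Suppress isolated noise pixels in the binary mask while keeping the
    // large, low-frequency traffic-sign region intact.
    cv::Mat denoiseMask(const cv::Mat &binaryMask) {
        cv::Mat clean;
        cv::medianBlur(binaryMask, clean, 5); // 5x5 kernel is an assumption
        return clean;
    }
    ```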

     

    Thresholding Control

    This block manages the whole Thresholding process by asserting control signals on the processing blocks; its main features are listed below, followed by a simplified sequencing sketch:

    • Control when a new frame will be stored on the frame buffer;
    • Control which processing algorithm is accessing the secondary OCRAM;
    • Control when the processed data is ready to be outputted to the segmentation part of the main algorithm.
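
    Purely as an illustration of this sequencing (the real block is FPGA logic; the state names and ordering are assumptions based on the sub-blocks described above):

    ```cpp
    // Illustrative sequencing of the Thresholding Control block.
    enum class ThState { CaptureFrame, ConvertData, BinaryThreshold,
                         LowPassFilter, DataReady };

    ThState next(ThState s) {
        switch (s) {
            case ThState::CaptureFrame:    return ThState::ConvertData;     // new frame in frame buffer
            case ThState::ConvertData:     return ThState::BinaryThreshold; // RAW -> RGB -> HSV done
            case ThState::BinaryThreshold: return ThState::LowPassFilter;   // binary mask in 2nd OCRAM
            case ThState::LowPassFilter:   return ThState::DataReady;       // mask denoised
            case ThState::DataReady:       return ThState::CaptureFrame;    // mask handed to segmentation
        }
        return ThState::CaptureFrame;
    }
    ```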

     

    Direct Memory Access

    This block will transfer the data processed by the Thresholding algorithm to the Hard Processor System (HPS) Synchronous Dynamic Random Access Memory (SDRAM). It will execute the transfer process under direct command of the Protocol Control block.

     

    FPGA Protocol Control (slave)

    This block will trigger the Thresholding control and the DMA when the HPS (master) requests a new frame. The block will also output information about the region detected by the segmentation algorithm, which runs on the HPS, to the HDMI interface. The Lightweight HPS-to-FPGA interface was used to enable communication between the HPS and the FPGA fabric, and the protocol used to manage the communication is described in Figure 6:

    Figure 6: HPS-to-FPGA protocol bus definition.

    As the physical interface used is limited to 32 bits, the protocol was built to allow the transmission of all the needed information.

    The handshake used by the main protocol is shown in Figure 7:

    Figure 7: Handshake protocol for data exchange between FPGA fabric and HPS.
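
    The exact register layout lives in Figures 6 and 7; as a hedged HPS-side sketch, driving such a handshake over the lightweight bridge might look as follows. The bridge base address is the standard Cyclone V SoC value, but the REQ/ACK bit assignments here are assumptions:

    ```cpp
    #include <cstdint>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    constexpr uint32_t kLwBridgeBase = 0xFF200000; // Cyclone V LW HPS-to-FPGA bridge
    constexpr size_t   kSpan         = 0x1000;
    constexpr uint32_t kReqBit       = 0x1; // hypothetical "new frame request"
    constexpr uint32_t kAckBit       = 0x2; // hypothetical "frame ready" ack

    int main() {
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) return 1;

        void *base = mmap(nullptr, kSpan, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, kLwBridgeBase);
        if (base == MAP_FAILED) { close(fd); return 1; }
        volatile uint32_t *ctrl = static_cast<volatile uint32_t *>(base);

        // Handshake: the HPS (master) raises the request, then polls until
        // the FPGA fabric (slave) acknowledges that the frame data is ready.
        ctrl[0] |= kReqBit;
        while ((ctrl[0] & kAckBit) == 0) { /* wait for FPGA ack */ }

        munmap(base, kSpan);
        close(fd);
    }
    ```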

     

    Segmentation Algorithm


    The segmentation algorithm will be implemented on the HPS portion of the Cyclone V device. It will be executed in the embedded Linux environment, under specific instruction sets.
    The algorithm will process the data sent from the Thresholding process to the SDRAM, finding the region selected by the previous algorithm and cropping it.
    The process is done by applying a dedicated region-finding function from OpenCV 3.1 (installed on the LXDE Linux image), called "findContours". This function finds and outputs the contours present in a given input picture. As the picture sent from Thresholding is gray-scale data, the region found takes into account any variances the traffic sign may have, as the implemented algorithm minimizes geometry imperfections in the captured data. The output result will be the position (x, y) of the selected area in the picture, as shown in Figure 8.

    Figure 8: Region Boundary Selection.
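
    A minimal sketch of this step with OpenCV; the minimum-area filter is an assumption added to discard noise regions:

    ```cpp
    #include <opencv2/opencv.hpp>
    #include <vector>

    // Run cv::findContours on the thresholded frame delivered by the FPGA
    // and return one bounding rectangle (x, y, width, height) per region.
    std::vector<cv::Rect> segmentRegions(const cv::Mat &mask) {
        cv::Mat work = mask.clone(); // findContours modifies its input in OpenCV 3.1
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(work, contours, cv::RETR_EXTERNAL,
                         cv::CHAIN_APPROX_SIMPLE);

        std::vector<cv::Rect> regions;
        for (const auto &c : contours) {
            cv::Rect r = cv::boundingRect(c);
            if (r.area() > 400) // assumption: discard tiny noise regions
                regions.push_back(r);
        }
        return regions;
    }
    ```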

     

    Recognition Algorithm


    The boundary defined by the segmentation algorithm, together with the base frame captured at the very beginning of the processing flow, forms the input of the third and last step of the proposed algorithm. Selecting the ROI in the base frame using the boundaries defined in the segmentation process yields the input data for the recognition algorithm, which processes the selected data to output the traffic sign found.
    The process is as described:

    • Fetch base frame;
    • Fetch region boundaries;
    • Crop region;
    • Find the corresponding traffic sign.

    First, to get the pre-processed image, a protocol had to be defined for reading the image from the SDRAM, so that it could then be processed by the OpenCV program.

    Once the image has been read by the C++ program, the software starts the findContours procedure to search for regions in the thresholded image. At the time of development, it wasn't possible to implement a full CNN in the FPGA/HPS portion to classify the ROIs (Regions of Interest), so as the final classification step it was decided to recognize each ROI by its HSV (Hue, Saturation and Value) color. To classify each traffic sign, the range values for the respective colors RED, GREEN and YELLOW were defined as:

    • STOP - RED - H,S,V from (0, 100, 100)/(10, 255, 255) to (160, 100, 100)/(179, 255, 255)
    • STOP 2 - YELLOW - H,S,V from (15, 0, 50) to (40, 255, 220)
    • INFORMATIONS - GREEN - H,S,V from (44, 54, 63) to (71, 255, 255)

    Whenever a new frame was written by the FPGA portion, as described in the previous sections, the C++ software started the same procedure: looking for the ROIs, drawing the corresponding rectangles and then selecting the correct traffic sign by its H/S/V values.
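
    A sketch of this color-based classification using exactly the HSV windows listed above; the function name and the pixel-count voting are mine:

    ```cpp
    #include <opencv2/opencv.hpp>
    #include <string>

    // Return the label of the HSV window matching the most ROI pixels.
    std::string classifySign(const cv::Mat &roiHsv) {
        struct Window { std::string label; cv::Scalar lo, hi; };
        const Window windows[] = {
            {"STOP (red, low H)",   {  0, 100, 100}, { 10, 255, 255}},
            {"STOP (red, high H)",  {160, 100, 100}, {179, 255, 255}},
            {"STOP 2 (yellow)",     { 15,   0,  50}, { 40, 255, 220}},
            {"INFORMATIONS (green)",{ 44,  54,  63}, { 71, 255, 255}},
        };

        std::string best = "unknown";
        int bestCount = 0;
        for (const Window &w : windows) {
            cv::Mat mask;
            cv::inRange(roiHsv, w.lo, w.hi, mask);
            int count = cv::countNonZero(mask);
            if (count > bestCount) { bestCount = count; best = w.label; }
        }
        return best;
    }
    ```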

    As the delivery of the DE10-Nano dev. kit package was delayed, only three traffic signs were classified, as shown in the video uploaded above. Each traffic sign was defined as an input color metric for the algorithm that compares the HSV values in each frame.

    6. Performance Parameters

    The performance of the whole system can be characterized by the following metrics:

    • FPS - Frames per second of processing;
    • FPGA usage area;
    • HPS CPU consumption by segmentation/classification;
    • Time of processing by each step of the solution;

    As a starting point, the results measured for the metrics above are presented below:

    Metric                    | Min      | Max      | Typ      | Bottleneck
    FPS                       | 14       | 25       | 22       | No
    FPGA Logic Elements Usage | 15k/100k | 25k/100k | 20k/100k | No
    FPGA Memory Blocks        | 600 kbit | 600 kbit | 600 kbit | Few mem. blocks
    HPS CPU Usage (%)         | 75       | 100      | 89       | Yes
    Time of processing (ms)   | 18       | 35       | 25       | No

     

     

    The bottleneck metric indicates that in a future version we need to spend more time optimizing the procedure executed by the HPS/ARM Cortex-A9, since the programmed part of the solution consumes a high percentage of CPU, which slows down the results of the OpenCV segmentation/classification. Another important drawback was that the FPGA's memory blocks were too small to store two images for the pre-processing step of the project.

    7. Design Architecture

    The final design architecture is described on Figure 9:

    Figure 9: CRIOS design architecture.



    6 Comments

    Donald Bailey · Judge ★
    Have you tested the algorithms in software before trying to implement them on an FPGA?
    🕒 Jan 24, 2018 08:47 PM
    AS037🗸
    We're testing it at the moment, but as the software has high latency in image processing (compared to a dedicated hardware design), we had to accept some performance constraints.
    🕒 Jan 25, 2018 07:48 PM
    Mandy Lei · Judge ★
    Good project! Keep going!
    🕒 Jan 19, 2018 04:05 AM
    AS037🗸
    Thanks!
    🕒 Jan 25, 2018 07:46 PM
    Mateus Oliveira de Lima Soares
    Very interesting proposal; could you use parallelism in the classification procedure as well?
    🕒 Jan 02, 2018 07:23 AM
    AS037🗸
    Yes, we could use parallelism because the CNN has a natural parallel architecture, thanks =)
    🕒 Jan 25, 2018 07:48 PM
