AS037 » CRIOS - Cruise speed controller with Deep Learning
The overall idea of this project is to develop a complete system that detects traffic signs, classifies them, and informs/helps the driver to control speed and movements along the route, using deep learning for the traffic signs and fast segmentation for image processing.
The evolution of electronics was crucial to the advance of robotics, as most essential processing (i.e. the core of a robot) is done through digital computers, integrated sensors and smart actuators. This integration was only made possible by the constant growth of microelectronics, advanced microprocessor micro-architectures, robust programming techniques and low-power system designs. One of the major issues in autonomous robots, besides their control systems, is making them aware of their surroundings. This includes the recognition of structures, such as roads, bridges and buildings; the recognition of mobile objects, such as pedestrians, cars, bikes and trucks; and especially of objects that change the paradigm of the robot's control, such as traffic signs and emergency signals. This work is focused on developing a system that can be used on a self-driving car, or another type of autonomous robot, to detect and interpret traffic signs that the robot may encounter during its operation.
The idea of using an FPGA in this platform is to allow faster processing through parallel system designs, while keeping the main control system focused on the most essential processing operations. This approach may increase the system's overall performance, allowing a more flexible system while keeping the basic functions intact.
With the complete system running, the project aims to:
Application areas and Target users
This design is intended to be used in many application areas, from educational self-driving car projects to autonomous platforms for on-road testing. Beyond covering a wide area, we focus on a guidance device that helps the driver by automatically mapping each traffic sign.
Figure 1: System overview.
The project is separated into five modules, where each module is responsible for processing an image/data and outputting its results to the next. Once an image has been processed, the auxiliary peripherals (ARM/Arduino processors) will process the information with their built-in applications to interface the results with the external car environment sensors.
Figure 2: Global perspective.
Figure 2 shows a global perspective of all the blocks that compose the entire project. In our approach, the image captured from the camera is first pre-processed, where the true-color values of each pixel are replaced to match the real colors of the traffic signs on the road. In the next step, an optimized fast segmentation algorithm takes the pre-processed image to map each object in the frame, marking with squares where each object is recognized, and then hands it on to the next process. After being classified, the objects in the image are analyzed by a "pseudo-computer" block.
The concept of this hardware is to understand the driver's context in order to guide him according to the traffic signs on the road, where the context is taken into account to present the driver with possible actions (i.e. reduce speed if a winding road is ahead, or be careful if there is a possibility of deer/cattle crossing).
Once this is done, the FPGA will also be responsible for merging all the processed information into a single video output with the results and other highlighted information, such as the history of events, the speed delta (target from signs versus actual from the vehicle), the objects recognized, and others.
The design aims to recognize traffic signs in a given picture. This is useful for real-time applications on robots, such as autonomous cars and field-driving robots, but can be expanded to other applications, depending on the design requirements.
The algorithm proposed for recognizing traffic signs in a given picture is described as:
A graphical flow is shown on Figure 3:
Figure 3: Crios algorithm flow.
The next sections explain the operation of each individual process:
Figure 4: Thresholding data flow.
The thresholding process is the initial part of the proposed algorithm. It will be implemented on the FPGA portion of the device. The algorithm will process a single frame captured from the camera and send the processed data to the segmentation unit. Its purpose is, given a frame with random data (cars, people, dogs, buildings), to apply a filter based on the desired objects (in this case, traffic signs) so that the segmentation algorithm can extract the relevant data from the frame. The portion of the frame with the desired data is called the "Region of Interest" (ROI), and it will be used later in the traffic sign recognition algorithm. The output data will be a gray-scale picture.
The global view of the Thresholding data flow is shown on Figure 4.
The intended algorithm is more complex; its data flow is described on Figure 5:
Figure 5: Detailed Thresholding data flow.
Each sub-block is described as follows:
This block is responsible for the frame capture from the camera, and it will store the data in an On-Chip Random Access Memory (OCRAM) for data access by the post-processing blocks. This memory is often called a "Frame Buffer", as it stores a full frame for later use.
The frame data acquired from the camera will have the following configurations:
The configuration of the camera is done by a dedicated sub-system, whose details will not be covered in this document.
On-Chip Random Access Memory
The data captured from the camera will be stored in the FPGA's On-Chip Random Access Memory (OCRAM), the "Frame Buffer", where it will be accessed by the post-processing algorithms.
The "Frame Buffer" consists of a memory constructed with a 307200 x 10-bit elements structure, where the data stored captured from the camera is in the RAW data file format.
As the thresholding algorithms use various data formats in their operations (i.e. RGB, HSV), the RAW data will need to be converted.
The OCRAM data access will be controlled by the Thresholding control block, which selects which sub-processing block may access and modify the content stored in the memory block.
The first frame data processing sub-system (Binary Thresholding) will process data in the Hue-Saturation-Value (HSV) format. Thus, the data stored in RAW format in the frame buffer memory will need to be converted to RGB, and then to HSV. This block converts the stored data to the specified format.
This block will apply a filter based on the hue of the frame stored in the OCRAM, and will store the processed data in a second OCRAM for use by the post-processing algorithm.
The implementation of the algorithm is as described:
With this logic, a second frame will be reconstructed, but as it will contain only binary data, it will be called the "Binary Mask".
Low Pass Filter
The data processed by the Binary Thresholding may leave noise in the second frame, which may interfere with the segmentation algorithm. Thus, a low pass filter will be applied to the secondary OCRAM, minimizing the noise in the frame while keeping the desired data (i.e. the traffic sign region) intact, as the ROI will be a low-frequency data zone.
This block manages the whole Thresholding process by asserting control signals on the processing blocks, and its main features are:
Direct Memory Access
This block will transfer the data processed by the Thresholding algorithm to the Hard Processor System (HPS) Synchronous Dynamic Random Access Memory (SDRAM). It will execute the transfer process under direct command of the Protocol Control block.
FPGA Protocol Control (slave)
This block will trigger the Thresholding control and the DMA when the HPS (master) requests a new frame. The block will also output information about the region detected by the segmentation algorithm running on the HPS to the HDMI interface. The Lightweight HPS-to-FPGA interface is used to enable communication between the HPS and the FPGA fabric, and the protocol used to manage the communication is described on Figure 6:
Figure 6: HPS-to-FPGA protocol bus definition.
As the physical interface is limited to 32 bits, the protocol was built to allow the transmission of all the needed information.
The handshake protocol used on the main protocol is shown on Figure 7:
Figure 7: Handshake protocol for data exchange between FPGA fabric and HPS.
The segmentation algorithm will be implemented on the HPS portion of the Cyclone V device. It will be executed in the embedded Linux environment, under specific instruction sets.
The algorithm will process the data sent from the Thresholding process to the SDRAM, finding the region selected by the previous algorithm and cropping it.
The process is done using OpenCV 3.1 (installed on the LXDE Linux image) and its dedicated region-finding function, findContours. This function finds and outputs the contours present in a given input picture. As the picture sent from the Thresholding step is gray-scale data, the found region will take into account any variances the traffic sign may have, as the implemented algorithm minimizes geometry imperfections in the captured data. The output will be the position (x, y) of the selected area in the picture, as shown on Figure 8.
Figure 8: Region Boundary Selection.
The boundary defined by the segmentation algorithm will be, along with the base frame captured at the very beginning of the processing flow, the input to the third and last step of the proposed algorithm. Selecting the ROI from the base frame, using the boundaries defined in the segmentation process, will produce the input data for the recognition algorithm, which will process the selected data to output the found traffic sign.
The process will be as described:
First, to get the pre-processed image, it was necessary to define a protocol for reading the image from the SDRAM so it could then be processed by the OpenCV program.
Once the image has been read by the C++ program, the software starts the findContours procedure to search for regions in the thresholded image. By the time of development, it was not possible to implement a full CNN on the FPGA/HPS portion to classify the ROIs (Regions of Interest), so as the final classification step it was decided to use recognition by the HSV (Hue, Saturation, Value) color of the ROIs. To classify each traffic sign, the range values for the respective colors RED, GREEN and YELLOW were the values below:
Whenever a new frame is written by the FPGA portion, as described in the previous sections, the C++ software starts the same procedure: looking for the ROIs, drawing the corresponding rectangles, and then selecting the right traffic sign by its H/S/V values.
As the delivery of the DE10-Nano dev. kit was delayed, only three traffic signs were classified, as shown in the video uploaded above. Each traffic sign was assigned an input color metric by the algorithm, which compares the HSV values at each frame.
The performance of the whole system can be characterized by the following metrics:
Based on the metrics above, the following results were obtained:
| Metric | Min | Max | Avg | Bottleneck |
|---|---|---|---|---|
| FPGA Logic Elements Usage | 15k/100k | 25k/100k | 20k/100k | No |
| FPGA Memory Blocks | 600 kbit | 600 kbit | 600 kbit | Few mem. blocks |
| HPS CPU Usage (%) | 75 | 100 | 89 | Yes |
| Time of processing (ms) | 18 | 35 | 25 | No |
The bottleneck metric indicates that, in a future version, we need to spend more time optimizing the procedure executed by the HPS/ARM Cortex-A9, since the programmed logic consumes a high percentage of CPU usage, which slows down the OpenCV segmentation/classification results. Another important drawback was that the FPGA memory blocks were too small to store two images for the pre-processing step of the project.
The final design architecture is described on Figure 9:
Figure 9: CRIOS design architecture.