Annual: 2019

EM038 » CNN hardware accelerator for detection of distracted drivers
📁Machine Learning
👤Amr Adel
(Cairo University, Faculty of Engineering)
📅Jul 06, 2019
Regional Final





Description

Our project is about hardware accelerators for deep neural networks. The power consumption of current GPU implementations restricts the usage of neural networks in small devices such as cell phones. Our goal is to develop a digital design specific to CNNs; such a design can speed up inference and reduce area and power, thus providing a tiny but efficient deep learning processor for mobile devices and critical applications.

We will use the proposed design to detect distracted car drivers (more details below).

Project Proposal

1. High-level Project Description

 

AI is the new electricity - Andrew Ng

The rise of deep learning has affected fields all over the world, and digital design is one of them.

The limitations of CPUs pushed scientists to use GPUs for training and inference of computationally expensive neural networks. However, GPUs have their own limitations. This encouraged digital designers to start building processors specific to deep learning and its computation needs.

Low power consumption:

GPUs have high power consumption and a relatively large area, making them unable to fit in mobile devices.
We propose building a CNN accelerator architecture that can be used as the core of classification and object detection applications. This design will have much lower power consumption than GPUs, as shown in the following figure.

[Figure: power consumption comparison (reference)]

Low latency:

FPGA implementations of CNN architectures achieve higher speed than GPUs and CPUs thanks to the parallel nature of FPGAs, making them a better choice for critical applications such as autonomous cars.

[Figure: GPU vs FPGA (reference)]

 

RTL design:

While the common approach to building such systems is HLS, we are going to follow an RTL approach in order to have more control and make better use of the FPGA resources.

Changing the application:

CNNs have the remarkable ability to change application through transfer learning. The application of the device can easily be changed just by updating the parameters (weights) of the network while maintaining the same hardware architecture.
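As a purely software-side illustration of this point, the sketch below shows how a pretrained SqueezeNet can be retargeted to a new task in PyTorch by replacing only the final 1x1 convolution. The class count and file name are placeholders, not part of the hardware design.

```python
# Minimal transfer-learning sketch (PyTorch): the convolutional topology is
# unchanged, so the same hardware datapath can be reused; only the stored
# weights are replaced.  NUM_CLASSES = 10 is an assumption for illustration.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # e.g. safe driving, texting, talking on the phone, ...

model = models.squeezenet1_1(pretrained=True)   # ImageNet weights

# Freeze the feature extractor: only the classifier head is retrained.
for p in model.features.parameters():
    p.requires_grad = False

# SqueezeNet's classifier head is a 1x1 convolution; swapping it keeps every
# layer shape (and therefore the accelerator architecture) identical except
# for the output channel count.
model.classifier[1] = nn.Conv2d(512, NUM_CLASSES, kernel_size=1)

# After fine-tuning, only the weight memory of the accelerator is updated:
torch.save(model.state_dict(), "driver_distraction_weights.pt")
```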

2. Block Diagram

There are two approaches to implementing CNNs on an FPGA:
1- A pipelined architecture: higher speed, but higher power and area.
2- A non-pipelined architecture: lower speed, but lower power and area.
A back-of-envelope latency model contrasting the two options is sketched below.
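The cycle counts in the sketch are illustrative placeholders, not measurements of any real design.

```python
# Back-of-envelope throughput/latency model for the two options above.
layer_cycles = [120_000, 90_000, 60_000, 30_000]  # cycles per layer (assumed)
n_frames = 1000

# Non-pipelined: one shared compute engine processes layers sequentially,
# so a new frame can only start after the previous one has finished.
frame_latency = sum(layer_cycles)
nonpipelined_total = n_frames * frame_latency

# Pipelined: one dedicated stage per layer; after the pipeline fills,
# a frame completes every max(layer_cycles) cycles.
pipelined_total = frame_latency + (n_frames - 1) * max(layer_cycles)

print(f"non-pipelined: {nonpipelined_total:,} cycles "
      f"({frame_latency:,} per frame)")
print(f"pipelined:     {pipelined_total:,} cycles "
      f"(~{max(layer_cycles):,} per frame in steady state)")
# The price of the pipelined option is one compute stage plus feature-map
# buffering per layer, i.e. higher area and power.
```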

[Table: Comparison of the first layer of MobileNet between pipelined and non-pipelined architectures (reference)]

Application:

- The World Health Organization reported 1.25 million deaths yearly due to road accidents, with a fifth of these accidents caused by distracted drivers. We will implement a device that detects driver distraction using the output of a camera placed on the dashboard. Once the device detects a distraction, it will send a signal to alert the driver.
This task requires a high-speed device, so we will implement a pipelined CNN architecture.
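The intended behaviour can be summarised by the hypothetical loop below. read_frame, run_cnn and raise_alert stand in for the camera interface, the FPGA accelerator and the warning output; the class index, confidence threshold and streak length are placeholders.

```python
# Hypothetical host-side view of the intended system behaviour:
# camera frame -> CNN inference -> alert on sustained distraction.

SAFE_CLASS = 0          # assumed index of the "safe driving" class
ALERT_AFTER = 5         # consecutive distracted frames before alerting

def monitor(read_frame, run_cnn, raise_alert):
    distracted_streak = 0
    while True:
        frame = read_frame()                   # e.g. dashboard camera
        class_id, confidence = run_cnn(frame)  # accelerator inference
        if class_id != SAFE_CLASS and confidence > 0.8:
            distracted_streak += 1
        else:
            distracted_streak = 0
        if distracted_streak >= ALERT_AFTER:
            raise_alert(class_id)              # buzzer / LED / CAN message
            distracted_streak = 0
```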


However, the pipelined architecture has its own problems: each stage produces several feature maps, so the FPGA must have enough on-chip memory to store the results of each stage.
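A rough estimate of that storage requirement, assuming SqueezeNet-v1.1-like feature-map shapes and 8-bit activations (both are assumptions for illustration):

```python
# Rough estimate of the inter-stage buffering a pipelined design needs.
feature_maps = {                 # (height, width, channels) after each stage
    "conv1":   (111, 111, 64),
    "fire2/3": (55, 55, 128),
    "fire4/5": (27, 27, 256),
    "fire6/7": (13, 13, 384),
    "fire8/9": (13, 13, 512),
}
BYTES_PER_ACT = 1                # assumed 8-bit activations

total = 0
for name, (h, w, c) in feature_maps.items():
    size = h * w * c * BYTES_PER_ACT
    total += size
    print(f"{name:8s}: {size / 1024:8.1f} KiB")
print(f"total   : {total / 1024:8.1f} KiB of on-chip buffering (roughly)")
# Compare against the on-chip block RAM of the target device (about 5.5 Mbit
# on the DE10-Nano's Cyclone V) to decide how much tiling or streaming,
# quantization and pruning are needed.
```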

Area reduction techniques that we will follow (a small code sketch of quantization and pruning follows this list):
1- We will implement the SqueezeNet architecture: SqueezeNet is a tiny CNN architecture with about 50x fewer parameters than the famous AlexNet. For a 224x224 RGB image it requires only about 421,000 multiplications, which saves a lot of FPGA power and supports on-device inference.


2- Quantization: the SqueezeNet authors showed that quantizing the network weights from 32 bits to 8 or 6 bits does not affect accuracy but greatly reduces area.

3- Pruning: some feature maps do not contribute much to the final output; removing them reduces the amount of data that must be stored and the number of operations needed, thus reducing power as well.
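A minimal NumPy sketch of techniques 2 and 3 on a dummy weight tensor; the bit width, pruning ratio and L1-norm criterion are illustrative assumptions, not final design decisions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 3, 3, 3)).astype(np.float32)  # (out_ch, in_ch, kh, kw)

# --- Quantization: map float32 weights to signed 8-bit integers ----------
BITS = 8
qmax = 2 ** (BITS - 1) - 1
scale = np.abs(W).max() / qmax          # one symmetric scale per tensor
W_q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
W_deq = W_q.astype(np.float32) * scale  # what the accelerator effectively uses
print("max quantization error:", np.abs(W - W_deq).max())

# --- Pruning: drop the output channels with the smallest L1 norm ---------
PRUNE_RATIO = 0.25
l1 = np.abs(W_q.astype(np.int32)).reshape(W.shape[0], -1).sum(axis=1)
keep = np.argsort(l1)[int(PRUNE_RATIO * W.shape[0]):]   # indices of kept channels
W_pruned = W_q[np.sort(keep)]
print("channels kept:", W_pruned.shape[0], "of", W.shape[0])
# Fewer output channels means fewer stored feature maps and fewer MACs in
# the next layer, which is where the area and power savings come from.
```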

 

[Table: Comparison between different architectures (reference)]

3. Intel FPGA Virtues in Your Project

Function of Altera FPGA Device

The features of the Altera DE10-Nano kit appear to be sufficient to implement our classification architecture (a rough throughput budget based on these numbers follows the list):

- 110K logic elements

- 112 DSP blocks

- 224 18x18 multipliers

- 112 27x27 multipliers

- 336 9x9 multipliers

reference
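Assuming, for illustration only, a 100 MHz accelerator clock, one 8-bit MAC per 18x18 multiplier per cycle, and a few hundred million MACs per inference:

```python
# Rough performance budget from the DSP resources listed above.
DSP_MULTS_18x18 = 224        # from the device table above
CLOCK_HZ = 100e6             # assumed accelerator clock
MACS_PER_FRAME = 400e6       # assumed multiply-accumulates per inference

# If every 18x18 multiplier performs one MAC per cycle:
peak_macs_per_s = DSP_MULTS_18x18 * CLOCK_HZ
frames_per_s = peak_macs_per_s / MACS_PER_FRAME

print(f"peak MAC rate : {peak_macs_per_s / 1e9:.1f} GMAC/s")
print(f"upper bound   : {frames_per_s:.0f} frames/s "
      f"(before memory-bandwidth and utilization losses)")
```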

The DE10-Nano kit was sufficient for an implementation of the SqueezeNet architecture using HLS (see the reference below).

[Table: Resource utilization of the accelerator (reference)]

Our design is expected to have lower utilization thanks to the optimization techniques mentioned above and the RTL approach.

4. Design Introduction

5. Function Description

6. Performance Parameters

7. Design Architecture



2 Comments

Aleksandr Amerikanov
Your project lacks a description of the specific application for which you are developing it.

For example, here http://www.innovatefpga.com/cgi-bin/innovate/teams.pl?Id=EM018 - mushrooms are recognized.

And here, http://www.innovatefpga.com/cgi-bin/innovate/teams.pl?Id=EM031 - the presence of people is detected and counted in the frames of a video stream.

Here: http://www.innovatefpga.com/cgi-bin/innovate/teams.pl?Id=AP002 traffic is monitored.

So far your description looks too general.
🕒 Jul 06, 2019 04:10 PM
EM038🗸
Thank you Aleksandr for your comment. We modified the description.
Your feedback is highly appreciated.
🕒 Jul 06, 2019 05:52 PM
