AP002 » Vision Based Traffic Control with Custom CNN Accelerator for Object Detection
Traffic congestion is a widespread problem that costs billions of dollars annually, wastes citizens' valuable time and, in some cases, costs human lives. Using our custom-designed CNN accelerator, we propose an edge-computing solution to this problem that is both cost-effective and scalable. For developing countries like Sri Lanka, our vision-based traffic control on FPGA would be an ideal solution, as described below.
In most countries, traffic flow is controlled by traffic lights with pre-set timers. In Sri Lanka, this often causes congestion during peak hours because the system is not sensitive to the traffic level in each lane of an intersection. To work around this, traffic policemen usually turn off the lights and control the traffic manually during peak hours. However, from their vantage point close to the ground, the policemen cannot visually judge the level of traffic in each lane.
An automated solution to this problem would be vision-based traffic sensing. However, the neural networks that excel in machine vision tasks require powerful GPUs or dedicated hardware. Running them at a central location would mean laying cables along the roads to transmit video feeds to control centers, which requires expensive infrastructure that is infeasible for a developing country like Sri Lanka.
Therefore, we present an implementation of a traffic-sensing algorithm based on object detection on an FPGA as a cost-effective, scalable edge solution. We use YOLOv2, a state-of-the-art CNN for object detection, accelerated by our custom CNN accelerator, with post-processing done on the ARM processor.
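As a rough illustration of what the post-processing step might look like, the following Python sketch counts detected vehicles per lane from a list of bounding boxes. It is a minimal sketch under stated assumptions, not the project's actual algorithm: the `Detection` and `LaneRegion` types, the class label names, the rectangular lane regions and the 0.5 confidence threshold are all hypothetical and do not come from the text above.

```python
from dataclasses import dataclass

# Assumed label names; the real class list depends on how the network was trained.
VEHICLE_CLASSES = {"car", "bus", "truck", "motorbike"}


@dataclass(frozen=True)
class Detection:
    """One bounding box produced by the object detector (hypothetical layout)."""
    label: str
    confidence: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self):
        """Return the (x, y) centre of the bounding box."""
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


@dataclass(frozen=True)
class LaneRegion:
    """Axis-aligned image region assumed to cover one lane (a simplification)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, point):
        x, y = point
        return self.x_min <= x < self.x_max and self.y_min <= y < self.y_max


def count_vehicles_per_lane(detections, lanes, min_confidence=0.5):
    """Count confident vehicle detections whose box centre falls in each lane."""
    counts = {lane.name: 0 for lane in lanes}
    for det in detections:
        # Skip non-vehicle classes and low-confidence boxes.
        if det.label not in VEHICLE_CLASSES or det.confidence < min_confidence:
            continue
        centre = det.center()
        for lane in lanes:
            if lane.contains(centre):
                counts[lane.name] += 1
                break  # a vehicle is attributed to at most one lane
    return counts
```

Per-lane counts of this kind are one possible measure of the "level of traffic in each lane" that a signal controller could use in place of fixed timers.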
Custom CNN Accelerator Design:
A unique aspect of our project is that we design and implement a brand-new, highly parallelized CNN accelerator. A single core at 100 MHz can run a 384 x 384 RGB image through YOLOv2 (a 23-layer state-of-the-art object detection CNN with 2 billion floating-point multiplications, 6 million comparisons and 8 billion additions) within 0.2 seconds. Multiple such cores can be implemented in parallel or in series inside an FPGA to further improve throughput. The architecture can also be used to accelerate several other neural networks with slight modifications.
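To make the degree of parallelism concrete, the short Python sketch below works out what the quoted figures imply per clock cycle. The constants are taken from the paragraph above; the function names are illustrative, and the linear scaling with the number of cores is an idealized assumption rather than a measured result.

```python
# Figures quoted in the text: 100 MHz clock, at most 0.2 s (200 ms) per frame.
CLOCK_HZ = 100_000_000
LATENCY_MS = 200
OPS_PER_FRAME = {
    "multiplications": 2_000_000_000,
    "additions": 8_000_000_000,
    "comparisons": 6_000_000,
}


def cycles_per_frame(clock_hz=CLOCK_HZ, latency_ms=LATENCY_MS):
    """Clock cycles available to process one frame."""
    return clock_hz * latency_ms // 1000


def ops_per_cycle(ops_per_frame=OPS_PER_FRAME, clock_hz=CLOCK_HZ, latency_ms=LATENCY_MS):
    """Average operations the core must sustain per cycle to meet the latency."""
    cycles = cycles_per_frame(clock_hz, latency_ms)
    return {name: count / cycles for name, count in ops_per_frame.items()}


def frames_per_second(latency_ms=LATENCY_MS, cores=1):
    """Throughput, assuming (idealistically) that it scales linearly with cores."""
    return cores * 1000 / latency_ms
```

With the quoted figures, one frame takes 20 million cycles, so a single core must sustain on average at least 100 multiplications and 400 additions per clock cycle, which corresponds to about 5 frames per second per core. Because 0.2 seconds is an upper bound on the latency, these are lower bounds on the sustained rates.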