EM007 » Deep Argus - AI 4 Visually Handicapped People
For many visually disabled people it is challenging to identify other persons and classify objects correctly. A wearable infrared camera system with multiple real-time convolutional neural networks would help people affected by low vision and increase their confidence in rain, fog and darkness.
We would like to compare two established technology approaches and unite the individual advantages of the 110,000 reconfigurable logic elements and the sequential ARM Cortex-A9 hard processor on the Terasic DE10-Nano SoC.
Our project objective is a small, helpful bodycam for 217,000,000 visually handicapped people!
Index Terms: Field programmable gate arrays, Neural network hardware, Fixed-point arithmetic, 2D convolution
Based on the WHO Report of December 2018 [1], 217 million persons worldwide have moderate to severe vision impairment. Their presenting distance visual acuity is less than 30 percent (WHO category 1) or even less than 10 percent (WHO category 2) [2]. According to the Johnson Criteria [3], it is possible to recognize persons or objects but not to identify them reliably. This uncomfortable situation is amplified by reduced contrast in winter, rain or darkness. The following simulations give a first impression and visualize the limitation.
Augmented reality with low vision: https://simulator.seenow.org
Figure 1: Low Vision Simulation
In joint discussions with interested users and influential associations we collected common expectations and defined the project scope. The UML use case diagram shows how the different actors and requirements fit together.
Figure 2: UML Use Case Diagram
Product users would like to identify objects, people and text. The people and face recognition function in particular is highly individual, and we would like to support this cooperative step with a mobile app. The reconfigurable platform will be updated at home: new people can be trained, and new features and improvements will be implemented.
In reference to the PMI framework [4] we organized ourselves as a hybrid project team and started our project in March 2019. In the first phase we developed the Project Charter, identified stakeholders and created the Project Management Plan. We collected requirements and started the system evaluation. Two technology approaches were especially interesting for us, so we split our team:
- The Cyclone V SoC as the Hardware Platform with the heterogeneous programming language OpenCL
- The Raspberry Pi with an AI hardware accelerator (Intel® Neural Compute Stick 2, NCS 2 for short)
In a weekly jour fixe we shared our progress and lessons learned. Our Raspberry Pi hardware setup was well documented and we tested our first prototype within one month. However, the power consumption of the Raspberry Pi 3B+ with the NCS 2 USB stick was high, and the image processing took up to three seconds.
In parallel we started model-driven development in Matlab and programmed the Cyclone V SoC in Verilog. The final prototype will be implemented on the DE10-Nano Development Kit and compared against the golden reference in Matlab, the Raspberry Pi and real-life simulations.
Figure 3: Timeline
The complete value chain will be executed in independent sheltered workshops and in cooperation with experienced low-vision opticians for a professional product recommendation. We would like to create the groundwork for future "Deep Argus Teams" and share our technology worldwide for steady further development.
Figure 4: Value Stream Mapping
Our near infrared sensor (Camera in Figure 5) has a 3.5 mm wide-angle lens and is dimensioned based on the Johnson Criteria with a minimum of 26 pixels per object [5]. Near infrared (NIR) is similar to visible light in that photons are reflected or absorbed by an object, providing a strong contrast. Ambient starlight and background radiance are natural emitters of infrared and provide excellent illumination for outdoor imaging [6]. Water vapor and fog are largely transparent at these wavelengths, so their influence is filtered out.
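The pixels-per-object dimensioning above can be sanity-checked with a simple pinhole-camera estimate. The 3.5 mm focal length and the 26-pixel criterion come from the text; the 3.6 µm pixel pitch (typical for sensors of the OV7670 class) and the 0.5 m target width are illustrative assumptions, not measured figures:

```python
# Back-of-envelope range estimate for the Johnson-criteria dimensioning.
FOCAL_LENGTH_M = 3.5e-3   # 3.5 mm wide-angle lens (from the text)
PIXEL_PITCH_M  = 3.6e-6   # assumed sensor pixel pitch
MIN_PIXELS     = 26       # minimum pixels on target (Johnson Criteria)

def max_range_m(target_width_m: float) -> float:
    """Distance at which the target still spans MIN_PIXELS pixels.

    Pinhole model: pixels_on_target = f * W / (d * pitch), solved for d.
    """
    return FOCAL_LENGTH_M * target_width_m / (MIN_PIXELS * PIXEL_PITCH_M)

# Assumed 0.5 m critical dimension (roughly head and shoulders)
print(f"max identification range: {max_range_m(0.5):.1f} m")
```

Under these assumptions a person-sized target stays identifiable out to roughly 19 m, a plausible working range for a bodycam.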
Figure 5: High Level Block Diagram
The 640 x 480 pixel camera sensor is connected to the GPIO connector and controlled by the Camera Control Interface. The parallel image pipeline is directly connected to this interface. The optimized image is processed by the face detection VJA and the object classification CNN (Inference Engines in Figure 5) and stored in the on-chip RAM. Based on the identified object, face or object recognition is performed. The final result is converted to audio feedback (stored in ROM; future approach: text-to-speech) and transmitted via an audio codec and an AUX-to-Bluetooth interface to a Bluetooth speaker, such as a bone conduction headset [7].
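Assuming the face detection VJA stage refers to the Viola-Jones algorithm, its core data structure is the integral image, which lets any rectangular Haar feature be evaluated with at most four memory reads regardless of its size — one reason this stage maps well onto a streaming hardware pipeline. A minimal software sketch of the idea:

```python
from typing import List

def integral_image(img: List[List[int]]) -> List[List[int]]:
    """ii[y][x] = sum of all pixels above and left of (y, x), inclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum over the rectangle [top..bottom] x [left..right] via 4 reads."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

In hardware, the running row sum and the previous integral row can be kept in registers and line buffers, so the integral image grows one pixel per clock alongside the camera stream.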
Advantages for Convolutional Neural Networks
With the Cyclone V SoC, Intel delivers high-performance, low-power hardware that is an ideal platform for reaching our aims. By combining an FPGA with an embedded dual-core ARM Cortex-A9 hard processor system (HPS), the device is well equipped for powerful computation: it allows us to build an application with highly parallel throughput, complemented by efficient serial algorithms.
This parallelism greatly accelerates the computation of convolutional neural networks (CNNs), which makes the device an ideal platform for object classification and face recognition. Furthermore, the compute-intensive image processing can be accelerated inside the FPGA fabric.
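The index terms fixed-point arithmetic and 2D convolution meet exactly here: in the fabric, each DSP block multiplies quantized integers and a final shift rescales the accumulator. A minimal software model of one such convolution step (the Q8 format and the kernel values are illustrative, not the actual network weights):

```python
FRAC_BITS = 8  # illustrative Q8 fixed-point format

def to_fixed(x: float) -> int:
    """Quantize a real coefficient to a Q8 integer."""
    return round(x * (1 << FRAC_BITS))

def conv2d_fixed(img, kernel):
    """Valid-mode 2D convolution using integer MACs, as DSP blocks would."""
    kh, kw = len(kernel), len(kernel[0])
    q = [[to_fixed(v) for v in row] for row in kernel]  # quantize once
    out = []
    for y in range(len(img) - kh + 1):
        row = []
        for x in range(len(img[0]) - kw + 1):
            acc = 0  # wide accumulator, one MAC per kernel tap
            for i in range(kh):
                for j in range(kw):
                    acc += img[y + i][x + j] * q[i][j]
            row.append(acc >> FRAC_BITS)  # rescale back to pixel range
        out.append(row)
    return out
```

On the FPGA, the two inner loops unroll into parallel multipliers, so one output pixel per clock becomes feasible — the step that dominates runtime on the sequential Raspberry Pi setup.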
Prepared for Change
The configurability of the FPGA fabric makes it feasible to implement more sophisticated configurations afterwards, which allows us to get the best performance during the whole development process and future product updates.
For our wearable application, minimal size, weight and power (SWaP) are important. Compared to the Raspberry Pi, the final FPGA platform requires less dedicated hardware, such as the AI accelerator, and reduces all three parameters significantly.
The portable, cross-platform HDL supports an early time to market (TTM) and long-term flexibility for more advanced technologies in the growing Intel product portfolio [8].
Our product is developed with the aim to provide an "easy-to-use" application for visually handicapped people. The product demo video [9] shows the usage and result of the face detection and object recognition performed on the Intel FPGA.
Selection of the Development Board
Based on the performance required for the deep-learning-based product (mostly represented by the number of logic elements, LEs for short) and on a realistic budget for a system for visually handicapped people, two development boards were selected: the DE10-Nano for the final product and the DE10-Standard for development and debugging, according to the cost-performance ratio shown below:
Figure 7: System Platform Evaluation
These two selected development boards are based on the Cyclone V SX FPGA SoC containing 110,000 LEs and 112 DSP blocks [10].
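The 112 DSP blocks give a rough upper bound on CNN throughput. Assuming one multiply-accumulate per DSP block per clock cycle and an illustrative 100 MHz fabric clock (both assumptions, not measured figures for this design):

```python
DSP_BLOCKS = 112     # Cyclone V SX resource count stated above
CLOCK_HZ   = 100e6   # assumed fabric clock, for illustration only

# 1 MAC per DSP block per cycle gives the theoretical peak
macs_per_second = DSP_BLOCKS * CLOCK_HZ
print(f"peak: {macs_per_second / 1e9:.1f} GMAC/s")
```

Even this conservative estimate of about 11 GMAC/s explains why the fabric can sustain the convolution workload that took seconds on the sequential prototype.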
The camera chosen for this project uses the OmniVision OV7670 camera sensor [11], which has a maximum resolution of 640 x 480 pixels. The camera and a block diagram of the camera sensor are shown in Figure 8. This camera was chosen for the good resolution of the sensor and for the Digital Video Port (DVP) protocol, a data interface that is quite fast at accessing the whole picture and easy to handle in a hardware implementation. In addition, the camera can be adjusted for focus, image filters etc. if needed.
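The DVP interface is a simple parallel protocol: a pixel byte on D[7:0] is valid on each PCLK edge while HREF is high, and VSYNC separates frames. A simplified software model of the capture logic (the signal roles follow the OV7670-style DVP convention; the sample stream in the usage test is made up for illustration):

```python
def capture_frame(samples):
    """Collect pixel bytes from (vsync, href, data) tuples, one per PCLK edge.

    Returns a list of rows; a rising VSYNC edge marks the frame boundary.
    Real sensors have porch timing around HREF/VSYNC that is ignored here.
    """
    rows, current_row = [], []
    prev_vsync = 0
    for vsync, href, data in samples:
        if vsync and not prev_vsync:      # rising VSYNC: end of frame
            break
        if href:                          # data valid only while HREF high
            current_row.append(data)
        elif current_row:                 # falling HREF: line complete
            rows.append(current_row)
            current_row = []
        prev_vsync = vsync
    if current_row:
        rows.append(current_row)
    return rows
```

The hardware version of this logic is the Camera Control Interface of Figure 5: the same HREF/VSYNC gating, implemented as a small state machine that writes each completed line into the image pipeline.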