Annual: 2019

AS020 »
Project WatchDog: Smart home security system
📁Other: High-speed Video Processing/Artificial Intelligence/IoT
👤Neeloy Chakraborty
 (University of Illinois at Urbana-Champaign)
📅Oct 12, 2019
Regional Final


Description

The purpose of our project is to build a smart home security camera system. At a high level, the system focuses on particular objects in the room that the user wants to ensure are not handled by intruders. If an intruder enters the room and begins to handle an object, the user is notified. Aside from the notification, an HDMI display shows the camera feed with detected objects, and that feed is also transferred to the user's PC or phone over Wi-Fi.

We aim to build this system using the DE10-Nano development kit. We would run OpenCV on the ARM Cortex-A9 to interface with the camera and handle the frame differencing and Kalman filtering that identify and track moving objects. The Cyclone V SE would run a neural-network classifier that labels the moving objects detected by the ARM core. There would be a live video stream to a monitor via HDMI, a data dump via Ethernet, and a wireless status update to the cloud via the Arduino header.

Project Proposal

1. High-level Project Description

For project WatchDog, we propose an FPGA-based security camera that uses artificial intelligence to identify and categorize objects, people, and animals in an area of interest. The purpose of our project is to build a smart home security camera system. On a high level, the system focuses on particular objects in the room that the user wants to ensure are not handled by intruders; if an intruder enters the room and begins to handle an object, the user is notified. The system is primarily intended for affordable home security and can be easily extended to industrial surveillance. It addresses several shortcomings of existing solutions by not relying on an internet connection to classify objects. In addition, we offer a plethora of outputs, such as access from a computer as well as 5 GHz Wi-Fi access (most existing solutions support only 2.4 GHz networks). The project is aimed at home or business owners who would like to keep a closer eye on their valuables.

We have several reasons for implementing this with the Intel Cyclone V SE FPGA on the DE10-Nano board. The DE10-Nano hosts an ARM Cortex-A9 processor that runs Linux with OpenCV support, making it a natural candidate for video processing. Additionally, the board allows us to easily interface with HDMI and with an Arduino UNO, which would be used to send low-bandwidth IoT notifications to devices on the same network. For object recognition, we plan to implement a classifier on the Cyclone V SE that can efficiently identify the objects in the frame. The Arduino UNO can also drive a servo motor that swivels the camera in the direction the detected object is moving. Perhaps most importantly, video would be streamed over Ethernet to a PC, which provides a low-latency, high-bandwidth, secure connection.

Here is a link to a video showcasing our project: 

https://photos.google.com/share/AF1QipMP8xxDJmb0549J0CxsTrzAFsD3TRz5aiLseg6MYAVYPtctO5vnFTxomjRbiW_g1Q/photo/AF1QipNCoymHAuV1YNNBVggPf7SjD5DsYeD8XCQ_z5l-?key=U3UtY09zVlByRkp3dldsclZGd3JBb09hTVZLZzBn

2. Block Diagram

Figure: High-level block diagram of Project WatchDog

3. Intel FPGA Virtues in Your Project

Boosts Performance

FPGAs excel at speeding up processes that would otherwise execute sequentially on a processor. A neural network implemented in the fabric should dramatically increase image-processing speed compared to TensorFlow on a Raspberry Pi. We will run benchmarks comparing the speed and thermal performance of our OpenCV/neural-network design on the FPGA against an OpenCV/TensorFlow application on a Raspberry Pi; the FPGA should run both faster and cooler. Another performance metric we would like to measure is the frames per second of video data sent over Ethernet from the FPGA versus over Wi-Fi from the Raspberry Pi.

Expands I/O

With the addition of the Arduino, our project connects to the Internet of Things over Wi-Fi, allowing notifications to be sent to the user's phone and any other devices of interest. This showcases the virtue of expanding I/O: it lets us send meaningful data to the user in a way the board cannot normally do on its own. It is also what sets our solution apart from existing products, the vast majority of which limit the available outputs (usually just an app); our design exposes several outputs to be as convenient as possible for the user.

Adapts to Changes

Finally, our security system adapts to changes in its surroundings in two main ways. Even though the underlying hardware stays the same, the classifier is tuned to each customer's needs by feeding it training images of the valuable object (due to the size of the neural net, training is not done on the FPGA, but the trained weights can be loaded onto it). In addition, the Arduino can control servo motors that swivel the camera to adjust its field of view and keep the valuable object in frame.

Data Collected

For Project Watchdog, we compared our solution against a Raspberry Pi 3 running TensorFlow to perform classification on the Raspberry Pi camera. The following table details the benchmarks we obtained. Unfortunately, due to a lack of time (as well as corrupted Raspberry Pi files), we were not able to run the TensorFlow model on the Raspberry Pi 3 ourselves; we instead used published benchmarks of the same model (see the resource link below).

Device              Raspberry Pi 3    Project Watchdog
Temperature (℃)     74 - 78           -
Frames per second   1 - 2             -
Update speed        1.73 s            556.07 ms
Ethernet latency    -                 303 ms

Temperature: The temperature for Project Watchdog will be gathered using the Thermal Analyzer in Quartus Prime. Because the neural network has not been implemented yet, the value would not be accurate enough to report here. The Raspberry Pi 3 figures were taken from the published benchmark linked below, which measured the temperature directly.

Frames per Second: We had run the TensorFlow model on the Raspberry Pi a few months earlier, and it averaged 1-2 FPS on live video classification.

Update speed: By "update speed" we mean how long it takes to send the data packets over the internet to the web server. Since Watchdog was not yet linked to the Arduino/ESP8266, we simulated the data on a laptop and used Postman to time how long the data takes to reach the web server. The result is shown below.

 

Figure: Time of data packets sent from laptop to web server
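For reproducibility, the same measurement can be scripted rather than clicked through in Postman. Below is a minimal sketch; the host, port, /update route, and payload fields are placeholders, not our actual deployment.

```python
# Hypothetical sketch: time a status-update POST the way Postman does.
# The host, port, /update route, and payload fields are placeholders,
# not our real server or message format.
import time
import requests

payload = {"device": "watchdog-sim", "classification": "backpack"}

start = time.perf_counter()
response = requests.post("http://192.168.1.10:3000/update", json=payload, timeout=5)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"HTTP {response.status_code} in {elapsed_ms:.1f} ms")
```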

 

Since the Raspberry Pi stands in for existing solutions, we had the Pi send an image over Wi-Fi to the web server. The time is noted below.

Figure: Screenshot of time capture of images sent from Raspberry Pi 3 to web server

Figure: Sample image sent to web server

 

For reference, this is the sample photo that was sent from the Raspberry Pi. It is 121 by 91 pixels and took almost 2 seconds to send, whereas the byte-sized status update took significantly less time.

Ethernet latency: This test was performed like the update-speed test, but from a computer with an Ethernet connection to the same network. It is effectively a simulation of Watchdog dumping status information over Ethernet.

Resources: https://www.hackster.io/news/benchmarking-tensorflow-and-tensorflow-lite-on-the-raspberry-pi-43f51b796796

Unfortunately, due to time limitations we were not able to test the Ethernet speeds on the Raspberry Pi.

4. Design Introduction

As introduced in Section 1, Project WatchDog is an FPGA-based security camera that uses artificial intelligence to identify and categorize objects, people, and animals in an area of interest, notifying the user when a watched object is handled. It targets affordable home security, classifies objects without relying on an internet connection, and offers multiple outputs (HDMI, Ethernet to a PC, and 5 GHz Wi-Fi notifications).

Our reasons for choosing the Intel Cyclone V SE FPGA on the DE10-Nano board are likewise detailed in Section 1: the ARM Cortex-A9 HPS runs Linux with OpenCV for video processing, the board interfaces easily with HDMI and with an Arduino UNO for IoT notifications and servo control, the classifier runs in the FPGA fabric, and video streams to a PC over low-latency, high-bandwidth Ethernet.

5. Function Description

Our project’s proposed design can be explained by breaking it down into its fundamental components: the camera, the ARM Cortex-A9 processor, the artificial intelligence, Ethernet, HDMI, and the Arduino header.

Artificial Intelligence (Initial training)

The artificial intelligence was implemented as a deep neural network. Quite a few existing architectures are available, including pretrained ones, but implementing them on the FPGA is a large undertaking. We therefore chose the most simplistic architecture we could, even knowing we would have difficulty tuning its accuracy. We modeled our network heavily on the one commonly used to classify handwritten digits from the MNIST dataset, and chose the CIFAR-100 dataset for training because of its similarity to MNIST. To accelerate development, we did all of the training on our personal computers and then transferred the weights to the FPGA. However, because our network was not based on an existing architecture, training proved difficult: it ended up with exceedingly low accuracy (about 5%), though we were satisfied with the progress we made toward implementing it on the FPGA. Training used stochastic gradient descent coupled with backpropagation.
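To make the training procedure concrete, here is a minimal NumPy sketch of the kind of loop we ran on our personal computers: fully connected sigmoid layers (1024 inputs from the 32x32 grayscale image, 20 class outputs) trained with stochastic gradient descent and backpropagation. The hidden-layer width and learning rate are illustrative, not our exact hyperparameters.

```python
# Minimal sketch of the SGD + backpropagation loop for a fully connected
# sigmoid network (1024 inputs -> 64 hidden -> 20 classes). The hidden
# width and learning rate are illustrative, not our tuned values.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0.0, 0.1, (64, 1024)), np.zeros(64)
W2, b2 = rng.normal(0.0, 0.1, (20, 64)), np.zeros(20)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, lr=0.1):
    """One SGD step on a single (flattened 32x32 image, one-hot label) pair."""
    global W1, b1, W2, b2
    # Forward pass through both sigmoid layers.
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    # Backpropagation with a quadratic loss; sigmoid'(z) = s * (1 - s).
    delta2 = (y - target) * y * (1.0 - y)
    delta1 = (W2.T @ delta2) * h * (1.0 - h)
    # Gradient descent updates.
    W2 = W2 - lr * np.outer(delta2, h); b2 = b2 - lr * delta2
    W1 = W1 - lr * np.outer(delta1, x); b1 = b1 - lr * delta1
    return y

# Example step on a random image labeled as class 3.
train_step(rng.random(1024), np.eye(20)[3])
```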

Camera

The camera we used was the Logitech C310 USB web camera. OpenCV gives easy access to the camera's pixel data from C++ or Python. We used frame-differencing algorithms to perform image segmentation and identify changes between frames.

The OpenCV program operates by first performing frame differencing. The raw output is a very noisy segmented object, shown white against black. We then apply erosion, a 2D convolution-style pass in which the kernel slides over the image and each pixel is replaced with the minimum pixel value in the kernel's region. The dark zones of the image grow while the lighter zones thin out, in effect cleaning up the image; the result is a crisp frame-differenced image. Afterwards, the image is resized to 32x32 pixels and converted to grayscale.

The camera itself is running at 720p and 60 fps.
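A minimal OpenCV sketch of this pipeline follows (frame differencing, thresholding, erosion to clean up noise, then resizing to the 32x32 grayscale input the classifier expects); the threshold value and kernel size are illustrative choices, not our tuned parameters.

```python
# Sketch of the frame-differencing pipeline: difference consecutive frames,
# threshold, erode to remove noise, then produce the 32x32 grayscale input.
# The threshold value and kernel size here are illustrative.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # the C310 enumerates as a standard UVC device
kernel = np.ones((3, 3), np.uint8)

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Frame differencing: bright pixels mark motion between frames.
    diff = cv2.absdiff(gray, prev_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # Erosion replaces each pixel with the local minimum, thinning the
    # light (noisy) zones and cleaning up the segmented object.
    mask = cv2.erode(mask, kernel, iterations=1)

    # Normalize to the 32x32 grayscale image the classifier consumes.
    net_input = cv2.resize(mask, (32, 32))

    cv2.imshow("frame difference", mask)
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```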

Figure: A movement in front of the camera produces a frame difference

ARM Cortex-A9 Processor

The HPS of the DE10-Nano hosts the ARM processor running Linux with the LXDE desktop. This was chosen for easy initial programming of the image-transformation algorithms in C++ with OpenCV. When a change in the current frame is identified, the OpenCV program writes a 32x32 image to an instantiated FPGA on-chip RAM over the 32-bit HPS-to-FPGA bridge. Once the image is written to memory, the C++ program uses the lightweight HPS-to-FPGA bridge to write a ready signal into a PIO output, telling the FPGA the image is ready to classify.

HPS-FPGA Communication

A SoC was implemented in Platform Designer to instantiate the interconnects and memory needed between the HPS and FPGA. The following image shows the Platform Designer system.

Figure: SoC with HPS-FPGA Memory Sharing Interconnects

The HPS-to-FPGA bridge writes to one port of a dual-port RAM that holds 1024 bytes of data accessed as 32-bit words. The other port is exported to allow the FPGA fabric to access the memory contents. Finally, the lightweight HPS-to-FPGA bridge is connected directly to a PIO block that acts as the ready signal for classification to begin within the FPGA fabric.

Unfortunately, we encountered several problems in the communication from the HPS to the FPGA. To write to the memory from the HPS, we used the mmap() function in C, which maps virtual software addresses to physical hardware addresses. We believe the mmap() call in the embedded code did not address the proper location to write the image data, as test print statements showed the RAM holding miscellaneous values on reboot, even though the RAM was set to be uninitialized in Platform Designer. This is the first piece of code to debug to enable live video classification. On the other end, the issue could also be that the FSM instantiated to read from the memory reads at the wrong time (or from the wrong address). We will verify the issue in simulation as well as by instantiating a block for real-time viewing in Signal Tap.

Figure: The mmap() function mapping a virtual address to the physical address of the SRAM block
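For reference, a quick way to exercise this mapping during debugging is a script over /dev/mem. The sketch below uses Python's mmap; the bridge base addresses are the standard Cyclone V HPS values, but the RAM and PIO offsets (0x0 here) are assumptions that must match the Platform Designer address map.

```python
# Debugging sketch: write a 32x32 test image into the FPGA on-chip RAM and
# raise the ready PIO. The bridge base addresses are the standard Cyclone V
# HPS values; the RAM and PIO offsets (0x0 here) are assumptions that must
# match the Platform Designer address map. Run as root on the DE10-Nano.
import mmap
import os
import struct

H2F_BRIDGE_BASE = 0xC0000000   # HPS-to-FPGA bridge (on-chip RAM behind it)
LW_BRIDGE_BASE = 0xFF200000    # lightweight HPS-to-FPGA bridge (ready PIO)
PAGE = 4096

fd = os.open("/dev/mem", os.O_RDWR | os.O_SYNC)
ram = mmap.mmap(fd, PAGE, offset=H2F_BRIDGE_BASE)
pio = mmap.mmap(fd, PAGE, offset=LW_BRIDGE_BASE)

# Write 1024 bytes (one 32x32 grayscale image) into the dual-port RAM.
image = bytes(range(256)) * 4
ram[0:1024] = image

# Read back to verify the mapping before trusting it from the C++ side.
assert ram[0:1024] == image, "readback mismatch: check the address map"

# Raise the ready flag so the FPGA-side FSM starts classifying.
pio[0:4] = struct.pack("<I", 1)

os.close(fd)
```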

Artificial Intelligence (Classification)

With more time, we would have been able to instantiate the neural net on the FPGA. Ideally, this would be a series of sigmoid neurons linked together, essentially acting as one large multiply-accumulate structure. The net would have 1024 inputs (from the 32x32 image) and 20 outputs (each output representing a class the image could belong to). While we were unable to physically create the model in hardware, we researched the modules necessary to perform the task of an artificial neural network.

Figure: The inner workings of a general neuron using a computational design method

Reference: https://pdfs.semanticscholar.org/01df/944ffbb15b45d71397d568420f0eb35cfb8a.pdf?_ga=2.2497272.995983512.1570595674-1392114836.1570595674

A neuron's output depends on the sum, over all of its input connections, of each input neuron's output multiplied by the path weight, plus a bias for that individual neuron. That value is passed through an excitation function (in our case, the log-sigmoid). In our implementation we will need to instantiate a 2's-complement adder/subtractor, a multiplier, and an FSM to control the data exchange between the modules. The logsig module is one of the more difficult and power-hungry blocks, as it computes the excitation function 1/(1 + e^(-x)).
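A behavioral sketch of one such neuron may help: it models the fixed-point multiply-accumulate (2's complement in hardware) plus bias, followed by the log-sigmoid excitation. The Q8.8-style word format is an assumption, not a finalized bit width.

```python
# Behavioral model of a single neuron as planned for the FPGA: a fixed-point
# multiply-accumulate over inputs/weights plus a bias, followed by the
# log-sigmoid excitation. The Q8.8-style format is an assumption.
import math

FRAC_BITS = 8
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Quantize to a signed fixed-point integer (2's complement in hardware)."""
    return int(round(x * SCALE))

def neuron(inputs, weights, bias):
    """Fixed-point MAC + log-sigmoid; mirrors the adder/multiplier/logsig blocks."""
    acc = to_fixed(bias) << FRAC_BITS          # align bias with the products
    for x, w in zip(inputs, weights):
        acc += to_fixed(x) * to_fixed(w)       # each product carries 2*FRAC_BITS fraction
    z = acc / (SCALE * SCALE)                  # back to a real value for logsig
    return 1.0 / (1.0 + math.exp(-z))          # excitation: logsig(z)

# Example: a 3-input neuron.
print(neuron([0.5, -0.25, 1.0], [0.8, 0.4, -0.6], bias=0.1))
```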

We are excited to continue our project to implement this network with trained weights. 

Gigabit Ethernet 

For our project, Ethernet would be used to output the stream at near real-time speeds to another device, providing high-bandwidth, low-latency communication in a secure format.

The idea with Ethernet was to have a frame dump/status update that would be more secure, as well as slightly faster, than Wi-Fi. Unfortunately, we ran out of time to instantiate the Ethernet interface.

HDMI 

HDMI would be used to output the camera footage with an overlay of the objects the AI has identified in real time. HDMI is a widespread standard, so it is reasonable to assume most TVs and monitors support it. Due to time constraints, we were only able to output the OpenCV video stream over HDMI. Once the net is instantiated and able to classify objects, a name tag can be overlaid on top of each object, as sketched below.
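The overlay itself is straightforward once a classification and bounding box exist. A minimal sketch (the box coordinates and label are illustrative, supplied by the tracker/classifier):

```python
# Sketch: draw the classifier's name tag over the detected object before the
# frame goes out over HDMI. Box coordinates and the label are illustrative.
import cv2

def draw_name_tag(frame, box, label):
    """Overlay a bounding box and class label on a BGR frame in place."""
    x, y, w, h = box
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, label, (x, y - 8),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame
```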

Arduino header

Due to the amount of time spent on the neural net and FPGA memory management, we were not able to attempt the link between the Arduino and the DE10-Nano board via the built-in Arduino header. However, we did get the Arduino side of Project Watchdog up and running. We set up a web server using Express JS and had the ESP8266 send the outputs of the neural-net classifier to it. Since the neural net was neither accurate nor connected to the Arduino UNO, we simulated its output on the Arduino. Once the web server receives this simulated message, it passes it to the website it hosts; using a dynamic template engine, the webpage updates with what Watchdog has classified. Even though the link between Watchdog and the website is not in place yet, we are confident that with a little more time it could be accomplished.

Figure: Dynamic template of output from classification of objects in frame

6. Performance Parameters

We are aiming for the following performance parameters in the categories of thermals, power, frame rate, and latency:

  • FPGA core temperature less than 70℃ 

    • A Raspberry Pi running full TensorFlow has been measured at 74℃, and TensorFlow Lite at 78℃

  • Total power consumption (dynamic, static, and I/O) less than 450 mW 

    • Other video processing projects we have implemented on the Cyclone IV have had a total power consumption of 388 mW

  • Frame rate to the HDMI and Ethernet devices of at least 25 FPS 

    • HDMI is capable of 60 FPS without any hardware acceleration 

    • Ethernet here is gigabit Ethernet. Assuming our camera has 8-bit color depth, a 720p video stream at 30 FPS is 1280 × 720 pixels × 8 bits × 30 frames/s = 221,184,000 bits/s. Since 1 Gbit/s ÷ 221,184,000 bit/s ≈ 4.5, there is enough bandwidth to send the frames in real time, with wiggle room for events such as latency spikes

  • Sending a single processed frame via Ethernet to the device takes at most 5 ms

    • This is a generous estimate, and will be measured using packet tracing utilities.

The benefit of using the DE10-nano development board equipped with the Intel SoC FPGA is that all of the necessary interfaces are present for us to realize our idea. In particular, having high-speed DDR3, an HPS system, and a bridge between them enables us to effectively move high quantities of data around as we need to. In addition, hardware is inherently faster in execution than pure software, and we are looking forward to documenting the drastic differences in efficiency and speed between the FPGA and Raspberry Pi designs.

7. Design Architecture

Figure: Planned low-level block diagram of Project WatchDog

Figure: RTL of Project WatchDog as currently implemented

Conclusion

Even though Project Watchdog was ultimately a failure, we were very proud of the progress we made and all of the new skills we learned. 

We worked extensively with OpenCV and learned how to perform many convolution-based image-processing techniques, such as erosion. We also learned a great deal about networking and IoT development. Before this project, we had never set up any kind of web infrastructure like a web server. Working with the ESP8266 was very interesting, and we will definitely use it in future projects.

Far and away, the hardest part was training the neural net; we did not anticipate how difficult the training process would be.

We actually have plans to continue this project. One of our group members was recently granted access to the supercomputer cluster on campus (affectionately named HAL) that is primarily used for training complicated, TensorFlow-based neural nets. We would love to learn how to use HAL to train a neural net that can accurately classify images.

From there, after the weights are calculated, we will implement the net modules as described above, considering bit precision, power/energy, and speed to ensure that the neural network outputs correct classifications.

One of the other things we would like to learn is how to properly instantiate the memory in the FPGA, to allow for efficient and correct memory access and usage. 

We also plan to host a lot of the IoT infrastructure on Amazon Web Services (AWS), which is recognized very widely in the industry as being fast, secure, and scalable. 

Ultimately, we would say that Project Watchdog works in semi-developed independent pieces right now. We still believe in the vision that we set out with when we started, so we will be continuing development with Watchdog. 

We are incredibly thankful to the producers of this competition for giving us the opportunity to compete and gain experience with IoT applications of the FPGA. We also want to give a huge thank you to our sponsor, Professor Zuofu Cheng for giving us support and advice during the competition. We look forward to continuing this, and seeing through with the ultimate vision of Project WatchDog: an efficient, smart, home security solution.



2 Comments

Mandy Lei
This design focuses on smart home security; it is a good idea! Please submit the complete design document!
🕒 Jun 25, 2019 09:41 PM
AS020🗸
Thanks Mandy, we're looking forward to competing this year!
🕒 Jun 30, 2019 10:33 PM
