EM036 » Real-time objects remover in Full HD videos
Image segmentation and its real-time processing are major computer vision research areas. Segmentation of a video sequence is essential for object detection, motion parameter estimation, and the automation of big-data analysis.
Image segmentation of a Full HD video sequence in real time is impossible with CPU computing alone, so an accelerator is needed to perform the pipelined processing. An FPGA, with its low power consumption, high performance, and reconfigurable fabric, is an ideal platform for this task.
We offer a solution for real-time image segmentation of a Full HD (1920x1080) video sequence at a frame rate of 30 frames per second, with data arriving from a single camera. As an example application, we remove selected objects from the video sequence and recover the background in the deleted part of the image.
Many computer vision tasks require segmenting an image: for example, subtracting the background, detecting moving objects, or counting objects of a certain type. Image segmentation is an integral part of computer vision, especially when working with big data:
self-driving car control
automation of medical equipment data processing (MRI, ultrasound)
various security systems
surveillance of metro users and people in the street
road traffic analysis systems
production process automation
waste sorting automation
drone image processing automation
satellite image processing
We use a statistical algorithm based on a Gaussian Mixture Model (GMM) for row-by-row image segmentation. Our implementation of the algorithm estimates parameters of every pixel (intensity and speed of motion).
As of 2019 it is impossible to segment a Full HD (1920x1080) video with a rate of 30 frames per second with a true segmentation probability of at least 95% using only CPU.
The use of an Intel FPGA allows us to implement a pipelined processing scheme for estimating the distribution of pixel intensity and motion speed.
The ARM Cortex-A9 processor runs Linux and accelerates the analysis of input data using hardware interrupts; it also simplifies development of the user interface for the finished product. The DE10-Nano development kit allows connecting a Full HD camera to the GPIO connector, processing the video stream in the pipeline, and delivering the finished image to a monitor via the HDMI TX interface. Thus the development kit includes all the needed features and let us start development immediately.
We demonstrate the possibility of real-time Full HD (1920x1080) video stream processing by removing objects from a segmented image and replacing the removed area with the background. Such an application might be useful in film shooting (when the crew has to be removed from a shot), or in photo/video shooting in a crowded place when the crowd has to be removed.
Figure 1 demonstrates the block diagram of our device.
Figure 1. System flow diagram
The video stream is buffered frame by frame in external DDR3 memory. Then the color intensity and speed of each pixel are estimated in each frame. After that, the GMM module segments the image using the Gaussian Mixture Model algorithm. A segment that needs to be removed and replaced with background is selected from a laptop, after which the video stream is sent to the screen.
The input data stream from the Full HD (1920x1080) 24-bit RGB camera at a frame rate of 30 frames per second is 1.5 Gbps.
The GMM segmentation module receives two inputs:
The “Object Remover” module receives the segmented “Foreground” image (1920x1080, 24-bit RGB + 8 bits of segment info, 2 Gbps in total), removes the subtracted image from the input stream, and reconstructs the “Background”. Finally, the data is sent to the monitor at a rate of 1.5 Gbps.
Processing such a large data volume in real time is impossible even with a “powerful” CPU. The advantage of the algorithm is that it can be implemented as row-by-row image processing, which makes parallel processing in an FPGA possible.
A typical figure for computing with a CPU is 25 FPS at a resolution of 640x480, 184 Mbps in total (8 times slower than our solution)!
Advantages of using FPGA: low power consumption, high performance, pipelined parallel processing, and reconfigurable computing.
The project's aim is to develop an FPGA-based hardware-software complex that removes unwanted objects from video in real time. Unwanted objects are all dynamic objects in the footage. The device determines the presence of a moving object in a frame and replaces it with a background image. As a result, the device produces an adaptive background that end-users can use, for example, in video encoding tasks or as part of a computer vision system. The final form of the hardware-software complex includes a Full HD video camera for image capture, a monitor for image output, and a development board to which the camera and the monitor are connected and which processes the video stream.
We use an FPGA because it is impossible to process a Full HD video at 30 frames per second with a CPU or a GPU while removing unwanted objects from the image.
Automatic video stream analysis systems detect people, objects, or events of interest in various conditions to identify abnormal or dangerous situations. Segmentation and detection of moving objects in a video sequence is the first important step toward extracting information from a video stream. To segment a video stream with maximum performance, a statistical algorithm based on the Gaussian Mixture Model (GMM) is widely used. In this case, each frame of the video sequence is described as a set of independent pixels, each of which is a random process described by the following probability density:

P(x_t) = \sum_{k=1}^{K} \omega_{k,t} \, \eta(x_t, \mu_{k,t}, \sigma_{k,t})

where K is the number of independent Gaussian random processes whose quantity and parameters have to be estimated, x_t is the pixel value at time t, and \omega_{k,t}, \mu_{k,t}, \sigma_{k,t} are the weight, mean, and standard deviation of cluster k at time t.
Work phases for each pixel
1. Search for the best cluster
1.1. If the cluster is not found, a new cluster is created to replace the most inappropriate cluster.
1.2. If the cluster is found, the cluster parameters are updated.
2. Sorting of clusters in descending order of the ratio \omega_k / \sigma_k.
3. Determination of the minimum number of clusters B that describe the background, i.e. the smallest B satisfying \sum_{k=1}^{B} \omega_k > T. Here T is the minimum fraction of the time that a pixel's clusters must account for in the video sequence to be considered background.
4. Calculation of probabilities of a pixel belonging to each cluster.
5. The choice of the cluster k_opt to which the pixel belongs with maximum probability.
6. If k_opt > B, the pixel is defined as a dynamic object (foreground), otherwise it is defined as background.
Segmenting a Full HD (1920x1080) video stream at 30 frames per second is a demanding high-performance task: it requires a significant amount of memory (40 MByte) to store the per-pixel cluster parameters, and high throughput when working with that memory (16 Gbit/s).
The main data streams in the system are given below.
Input data from video camera (Terasic D8M): 1920x1080, 24-bit RGB, 30 FPS (1.5 Gbps)
Processing data with DDR3 Memory (WRITE mode)
Processing data with DDR3 Memory (READ mode)
Output data to FullHD monitor (HDMI PHY ADV7513): 1920x1080, 24-bit RGB, 30 FPS (1.5 Gbps)
Figure 2 shows the parameters of the system generated by Quartus Prime 18.1 Standard Edition. Thanks to the Intel HLS Compiler, 100% of the DSP blocks were used efficiently.
Figure 2. System parameters
The DE10-Nano development board, with DDR3 memory connected to the FPGA, is one of the few reasonably priced solutions on the market that provides high memory throughput. Moreover, its large number of built-in DSP blocks saves logic elements (LE).
The main computing device "GMM_FG_DETECTOR" (fig. 3) for segmenting dynamic objects in a video sequence was developed using the Intel HLS Compiler tool.
In order to insert a processing module into our system, a data arbiter was developed which consists of a reader finite state machine and a writer finite state machine.
At its input and output the GMM module supports the standard Intel Video and Image Processing protocol.
The GMM_FG_DETECTOR module stores cluster parameters for each pixel in external DDR3 memory via the Avalon-ST interfaces "src_mem" and "snk_mem".
Figure 3. GMM module architecture
Figure 4 shows the architecture of the system with the developed module "GMM".
1. Data is received from the camera via the demo "Terasic Camera IP" core.
2. Video stream undergoes binary segmentation (foreground/background) in the GMM module.
3. The GMM Visor module selects the output image depending on the position of the SW[1:0] switches.
4. The data is reformatted in the "Clocked Video IP" block and displayed on the monitor.
Figure 4. Architecture of the system
Project on GitHub: https://github.com/CatWithoutBoots/Realtime-objects-remover-in-FullHD-videos