PR023 »
Posture Recognition Based on Deep Learning
📁Machine Learning
👤Xudong Chen
 (Shanghai University)
📅Nov 06, 2017





Helping machines learn what we humans are doing through a camera is important: once it works, machines can respond differently to all kinds of human postures. But the process is also very difficult, because it is usually slow, power-consuming, and requires a large amount of memory. Here we focus on real-time posture recognition and try to make the machine "know" what posture we make. The posture recognition system consists of the DE10-Nano SoC FPGA Kit, a camera, and an HDMI monitor. The SoC FPGA captures video streams from the camera, recognizes human postures with a CNN model, and finally shows the original video and the classification result (standing, walking, waving, etc.) via the HDMI interface.

Project Proposal

1. High-level Project Description


    It is very important to teach machines to recognize human postures using computer vision technologies, because this is much more convenient than approaches that rely on extra devices (accelerometers, gyroscopes, and so on). So, which algorithm or model should we choose to realize posture recognition?

    Convolutional Neural Networks (CNNs) are used in many fields related to images and videos, such as image recognition, object classification, and target tracking. However, the computations of CNNs can be so complicated that CNNs running on ordinary CPUs are very slow, power-consuming, and require too much memory, which makes it difficult to realize CNNs on embedded devices.

    CNNs are usually realized on GPUs, TPUs, FPGAs, and so on. With the development of High-Level Synthesis (HLS) and OpenCL, FPGAs are increasingly used to accelerate CNN computation.

Proposal of the Design

    Here we propose a posture recognition system based on the DE10-Nano SoC FPGA Kit, which carries both an FPGA part and an HPS part. The FPGA part captures video streams from the camera and stores them into DDR3 through the AXI bridges or the FPGA-to-SDRAM interface. Using OpenCL and the HPS part, we can accelerate the process of building the skeleton and recognizing the posture based on CNNs. The result, again, is stored into the DDR3 SDRAM. Finally, we can see the original video, the classification result (standing, walking, waving, etc.), and how the system performs via the HDMI output.

Application Scope

    In consideration of the camera's performance, the proposed posture recognition system requires a bright environment. Once the machine learns how to recognize users' postures, the classification results can be used in many applications, such as remotely controlling a model car, motion sensing games, etc.

Targeted Users

    The proposed posture recognition system is targeted at people who are enthusiastic about interesting technologies, motion sensing games, and so on.

2. Block Diagram


       Posture recognition is a difficult task, since the machine must consider several consecutive frames of the video, capture the object or person correctly, and recognize whether the object is static or dynamic.

       Generally, the human body can be modeled as a system of bones and joints, so we can try to represent the body by its skeleton. Moreover, in the foreground-extraction field there is the so-called "optical flow" method, which can be used to measure the speed of an object.

       So, here we combine the two methods, and recognize the posture in the following way:

  1. Represent body based on skeleton;
  2. Measure the speed of the body using optical flow method;
  3. Arrange the results of last several frames together, and get a “feature fusion figure”;
  4. Use the CNN model to recognize the posture.
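The four steps above can be sketched in plain Python. This is only an illustration of the data flow, not the FPGA implementation: the function names and the toy "classifier" (a simple speed threshold standing in for the CNN of step 4) are our own assumptions, and tiny placeholder lists stand in for the skeleton and optical-flow results.

```python
def build_feature_fusion(skeleton_frames, flow_frames):
    """Steps 1-3: arrange the skeleton representation and the optical-flow
    speeds of the last N frames into one flat 'feature fusion figure'."""
    assert len(skeleton_frames) == len(flow_frames)
    fused = []
    for skel, flow in zip(skeleton_frames, flow_frames):
        fused.extend(skel)   # skeleton joints of this frame
        fused.extend(flow)   # optical-flow speed of this frame
    return fused

def classify(flow_frames, threshold=1.0):
    """Toy stand-in for step 4: the real design feeds the fused features to a
    CNN; here we just threshold the mean flow speed as static vs. dynamic."""
    speeds = [v for frame in flow_frames for v in frame]
    mean = sum(speeds) / len(speeds)
    return "dynamic" if mean > threshold else "static"

# Toy data: 3 frames, 2 skeleton-joint coordinates + 1 flow speed each
skeleton = [[0.1, 0.2], [0.1, 0.25], [0.1, 0.3]]
flow     = [[2.0], [2.1], [2.2]]
fused = build_feature_fusion(skeleton, flow)
print(len(fused))        # 9 fused features
print(classify(flow))    # "dynamic"
```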


       We train the CNN model offline, which means we must collect many labeled samples. After training, we can run real-time inference. The posture recognition system consists of a camera (D5M, D8M, or OV7670), the DE10-Nano SoC FPGA Kit, and an HDMI monitor.

       Firstly, the FPGA writes to and reads from the DDR3 on the HPS side through the AXI bridges or the FPGA-to-SDRAM (F2SDRAM) interface. The video stream is then transferred into the posture recognition module, which consists of an OpenCL kernel on the FPGA and a host program on the HPS. The posture recognition module computes the skeleton representation and the optical flow, combines the results into a feature figure, and evaluates the CNN model at the same time. Finally, the posture label and the original video are mixed in the image fusion module and shown together via the HDMI interface.

Connect Camera and FPGA Kit

       Since we focus on the posture recognition algorithm and its realization on SoC FPGAs, we chose an MT9D111 camera module named SF-MT9D111-Ver1.0 from Taobao; here are the Taobao link and its picture:

       We chose the DE0-Nano-SoC FPGA Kit as our first trial, but a problem occurred: the MT9D111 camera cannot be connected directly to the GPIOs on the DE0-Nano-SoC Kit. So we designed an adapter board to connect the MT9D111 to the DE0-Nano-SoC.

       Here is the posture recognition platform. Later, we will switch to the DE10-Nano.

Configure MT9D111

       Before receiving the pixel stream from the camera, we need to configure the MT9D111 via its I2C interface. First, we generate a 25 MHz MCLK for the MT9D111 (on its CLKIN pin) as its operating clock. The registers of the MT9D111 are divided into three pages: Page #0 for the sensor-core registers, and Page #1 and Page #2 for the IFP registers.

       The I2C interface timing diagram can be described as following:

       Moreover, we can describe a read/write operation on the MT9D111 as { R/W, Page, RegAddr, RegData }, where R/W selects read or write, Page is a 16-bit value selecting the page that contains the register, RegAddr is the 8-bit register address, and RegData is the 16-bit data to write; for a read, RegData == 0x0000.

       According to the MT9D111 datasheet, its I2C interface supports both 16-bit and 8-bit modes. For the portability of the I2C Verilog HDL code (since many sensors, such as the MPU6050, use 8-bit I2C), we choose the 8-bit mode. The whole operation goes as follows:

       Operation RW_MT9D111_REG (R/W, Page, RegAddr, RegData)

  1.        i2c_write_8b(SlaveAddr, 0xF0, Page>>8)            // Page high 8b into register 0xF0
  2.        i2c_write_8b(SlaveAddr, 0xF1, Page&0xFF)          // Page low 8b into register 0xF1
  3.        if READ:
  4.               i2c_read_8b(SlaveAddr, RegAddr, &D.high);  // Read register RegAddr, high 8b
  5.               i2c_read_8b(SlaveAddr, 0xF1, &D.low);      // Read register 0xF1, low 8b
  6.        else if WRITE:
  7.               i2c_write_8b(SlaveAddr, RegAddr, RegData>>8);   // Write RegAddr, high 8b
  8.               i2c_write_8b(SlaveAddr, 0xF1, RegData&0xFF);    // Write register 0xF1, low 8b
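To make the 8-bit access sequence above easier to check, here is the same operation modeled in plain Python. This is only a sketch: the function name `rw_mt9d111_reg` is ours, the slave address is omitted, and each I2C transaction is modeled as a tuple instead of a real bus operation.

```python
def rw_mt9d111_reg(write, page, reg_addr, reg_data=0x0000):
    """Expand one {R/W, Page, RegAddr, RegData} operation into the list of
    8-bit I2C transactions described in the pseudocode above."""
    ops = [
        ("write", 0xF0, (page >> 8) & 0xFF),   # page high byte -> register 0xF0
        ("write", 0xF1, page & 0xFF),          # page low byte  -> register 0xF1
    ]
    if write:
        ops.append(("write", reg_addr, (reg_data >> 8) & 0xFF))  # data high 8b
        ops.append(("write", 0xF1, reg_data & 0xFF))             # data low 8b
    else:
        ops.append(("read", reg_addr, None))   # returns data high 8b
        ops.append(("read", 0xF1, None))       # returns data low 8b
    return ops

# Example: write 0x1234 to register 0x20 on Page #0 (sensor core)
for op in rw_mt9d111_reg(True, 0x0000, 0x20, 0x1234):
    print(op)
```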

Video sample & display system

      In our video sample & display system, there are three main parts: video input & output, video storage, and video analysis. We use the 1/3.2-inch MT9D111 camera to sample video and the ADV7513 to output HDMI video; both chips can be configured to use the RGB565 pixel format. The MT9D111 data are stored into DDR3, and the ADV7513 reads its data from DDR3 via the FPGA-to-SDRAM interface on the HPS. Moreover, a Linux application can use mmap() and memcpy() to save the data in DDR3 into an image file, which can be copied to a PC via SCP, and we can then use MATLAB / TensorFlow to analyze the video data.
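For the offline analysis step, each 16-bit RGB565 word from the dump has to be unpacked into 8-bit R, G, B channels. A minimal sketch of that conversion (our own helper, not code from the project) is:

```python
def rgb565_to_rgb888(pixel):
    """Expand one 16-bit RGB565 word into an (R, G, B) tuple of 8-bit values."""
    r5 = (pixel >> 11) & 0x1F   # 5 red bits
    g6 = (pixel >> 5) & 0x3F    # 6 green bits
    b5 = pixel & 0x1F           # 5 blue bits
    # Replicate the high bits into the low bits so full scale maps to 255
    r = (r5 << 3) | (r5 >> 2)
    g = (g6 << 2) | (g6 >> 4)
    b = (b5 << 3) | (b5 >> 2)
    return (r, g, b)

print(rgb565_to_rgb888(0xFFFF))  # (255, 255, 255) - white
print(rgb565_to_rgb888(0xF800))  # (255, 0, 0)     - pure red
```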


       In order to test the correctness of the system, we use MATLAB to plot the image inside DDR3, and use an HDMI monitor to show the video output.

3. Intel FPGA Virtues in Your Project


The SoC FPGA chip contains both FPGA logic and an ARM processor. The resources on the DE10-Nano SoC FPGA Kit are as follows:

On FPGA side

  1. 5CSEBA6U23I7 device (110K LEs)
  2. EPCS64 Serial configuration device
  3. HDMI TX compatible with DVI 1.0 and HDCP v1.4

On HPS side

  1. 800MHz Dual-core ARM Cortex-A9 processor
  2. 1GB DDR3 SDRAM (32-bit data bus)
  3. 1 Gigabit Ethernet PHY with RJ45 connector
  4. Micro SD card socket
  5. UART to USB, USB Mini-B connector


       On the SoC FPGA there are three AXI bridges that make communication between the FPGA and the ARM much more convenient and faster: HPS2FPGA, FPGA2HPS, and Lightweight HPS2FPGA.


       The most interesting point is OpenCL, which can be used to realize heterogeneous computing with the CPU and FPGA. Using OpenCL, we can realize complicated algorithms on the FPGA in a very short time. Compared with a CPU, the computation on the FPGA can be parallelized and pipelined, which makes it much faster and less power-consuming as well.


       Using the DE10-Nano SoC FPGA Kit, the proposed posture recognition system can be realized on a tiny board with a camera. Moreover, the recognition process can be accelerated and optimized using OpenCL. Finally, the proposed system connects to Ethernet, making it convenient to monitor remotely.


Tadesse Amare
It's very important to be considered!
🕒 Jan 29, 2018 09:03 PM
Thank you very much! We are going on!
🕒 Jan 29, 2018 09:21 PM
Donald Bailey · Judge ★
How do you plan to do the skeleton recognition and optical flow tasks?
🕒 Jan 26, 2018 05:34 AM
Thank you! We previously reviewed some information about skeleton representation and optical flow; both are realized in OpenCV, and there are some demos of LK optical flow using OpenCL on the Altera website. But C/C++ isn't our goal: we plan to test the algorithms in C/C++, and finally realize them in Verilog HDL.
🕒 Jan 26, 2018 09:21 AM
berkay egerci
Keep going! I voted.
🕒 Jan 13, 2018 04:11 AM
Thank you very much! We are still going on, and now we focus on sampling video data to analyze the video offline.
🕒 Jan 26, 2018 09:14 AM
