Annual: 2019

PR050 »
Fast Neural Architecture Search for Edge AI with OpenVINO
📁 High Performance Computing
👤 益銓 梁 (National Taiwan University)
📅 Nov 21, 2019
Regional Final





Description

Neural architecture search (NAS) is a technique for finding a neural network architecture for domain-specific applications. The search tool is usually based on reinforcement learning, with a recurrent neural network generating candidate models, but it takes a long time to find good candidates by searching and testing all possible combinations of architectures, because each candidate has to be trained on data. Moreover, since latency, power consumption, and chip area are highly sensitive to the hardware, there is no guarantee that the results will meet the user's requirements unless hardware characteristics are modeled in the search process. To address these issues for Edge AI applications on the OpenVINO Starter Kit, we propose a NAS framework that uses the FPGA to accelerate the search process and finds models that meet the accuracy and latency required by the user. In addition, our framework further optimizes the candidate models after they are found. We will conduct case studies to demonstrate the effectiveness of our framework for Edge AI applications such as image classification.

Demo Video

  • URL: https://www.youtube.com/watch?v=YL1-kTxFabA

  • Project Proposal

    1. High-level Project Description

    Purpose of the Design

    Neural architecture search (NAS) is a technique for finding a neural network architecture for domain-specific applications. The search tool is usually based on reinforcement learning, with a recurrent neural network generating candidate models, but it takes a long time to find good candidates by searching and testing all possible combinations of architectures, because each candidate has to be trained on data. Moreover, since latency, power consumption, and chip area are highly sensitive to the hardware, there is no guarantee that the results will meet the user's requirements unless hardware characteristics are modeled in the search process. To address these issues for Edge AI applications on the OpenVINO Starter Kit, we propose a NAS framework that uses the FPGA to accelerate the search process and finds models that meet the accuracy and latency required by the user. In addition, our framework further optimizes the candidate models after they are found. We will conduct case studies to demonstrate the effectiveness of our framework for Edge AI applications such as image classification.

    Application Scope 

    The application scope of NAS is virtually unlimited: given any new data set, a user can run our FPGA-assisted neural architecture search to generate a suitable neural network. We are dedicated to exploiting the characteristics of the FPGA to improve performance. In this contest we focus on the search for image classification models and compare their performance and accuracy with hand-designed models and existing benchmarks.

    Targeted users

    This framework is ideal for general users and developers who have a specific data set but lack the computational resources or machine-learning expertise to develop a model themselves. For example, doctors in a hospital may hold a massive amount of medical image data but, due to strict regulations, are forbidden from disclosing patient information to parties outside the hospital, and they typically lack the compute and development capacity to train models in-house. This framework can quickly and effectively solve that problem and help the hospital build its own diagnosis tools.

    2. Block Diagram

      

    3. Intel FPGA Virtues in Your Project

    Following the block diagram above, we develop the two parts of the framework, the Training Framework and the Model Optimizer, in parallel before the full implementation is complete. On the left side, we accelerate NAS using the Cyclone V FPGA on the OpenVINO Starter Kit; on the right side, we study how to optimize the neural network models produced by NAS.

    To understand the procedure and workflow, we have been experimenting with Intel's OpenVINO toolkit and its existing optimization tools, and we will then feed the neural architectures generated by NAS into the same flow. The table below shows our current experimental results for ResNet 152, Inception V4, and SSD300; as it indicates, the Model Optimizer already provides a remarkable performance improvement, most notably for ResNet 152. We will go further and use profiling information and hardware-resource analysis to achieve higher performance.
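    To make this workflow concrete, the sketch below shows how a converted model could be loaded and timed on the Starter Kit with the 2019-era OpenVINO Python Inference Engine API. The file names, the dummy input, and the Model Optimizer invocation in the comment are illustrative assumptions, not our final configuration.

```python
# Minimal latency-measurement sketch with the OpenVINO (2019) Inference Engine.
# "model.xml"/"model.bin" are assumed to come from the Model Optimizer, e.g.
#   python mo.py --input_model frozen_model.pb --data_type FP16
import time
import numpy as np
from openvino.inference_engine import IENetwork, IECore

ie = IECore()
net = IENetwork(model="model.xml", weights="model.bin")   # hypothetical file names
input_blob = next(iter(net.inputs))
n, c, h, w = net.inputs[input_blob].shape

# HETERO lets layers unsupported by the FPGA bitstream fall back to the CPU.
exec_net = ie.load_network(network=net, device_name="HETERO:FPGA,CPU")

image = np.random.rand(n, c, h, w).astype(np.float32)     # dummy input, for timing only
runs = []
for _ in range(5):                                         # we average over 5 inference runs
    start = time.time()
    exec_net.infer(inputs={input_blob: image})
    runs.append((time.time() - start) * 1000.0)
print("average latency: %.2f ms" % (sum(runs) / len(runs)))
```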


       

     

    4. Design Introduction

    1. Markov Decision Processes

    In the neural architecture search process, the main goal is to train a learning agent to sequentially choose neural network layers, and we use a Markov decision process (MDP), a standard model from probability and statistics for decision making under partly random conditions, to perform this layer selection. More specifically, the agent is guided to find an optimal path in a finite-horizon environment, which ensures that it terminates after a finite number of time steps while maximizing the total reward over all possible trajectories. To achieve this, we restrict the environment to a discrete, finite state space S with an action space A: at each time step t, the agent in state s_i ∈ S takes an action a ∈ A, transitions to a state s_j with some transition probability, and receives a reward r_t. The agent's goal is to maximize the expected reward accumulated over a search trajectory of length T, which can be written as (1):

        $R_{\text{total}} = \mathbb{E}\left[\sum_{t=1}^{T} r_t\right]$    (1)

     

    2. Q-Learning

    The Q-learning algorithm operates on 3-tuples (state, action, reward), and the agent's ultimate goal is to maximize the total expected reward. The core of Q-learning is the Q-table, whose rows and columns correspond to the states s ∈ S and actions a ∈ A and whose entries are the Q-values Q(s, a). Q(s, a), known as the action-value function, measures how good it is to take action a in state s; the maximum total expected reward from state s_i under action a is denoted Q*(s_i, a). The recursive Bellman equation for this maximization can be written as an iterative update, the simplest form of Q-learning proposed by Watkins (1989), which we use to update the Q-table during training:

    $Q_{t+1}(s_i, a) = (1-\alpha)\,Q_t(s_i, a) + \alpha\left[r_t + \gamma \max_{a'} Q_t(s_j, a')\right]$    (2)

    where α is the learning rate, r_t is the reward, and γ is the discount factor.
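    As a minimal sketch, equation (2) can be implemented as a single update on a Q-table stored as a nested dictionary; the data structure and function name below are illustrative assumptions, not our actual implementation.

```python
# One Q-learning update, equation (2).
# q is assumed to be a nested dict: q[state][action] -> Q-value.
def update_q(q, s_i, a, r, s_j, alpha=0.01, gamma=1.0):
    next_q = q.get(s_j, {})
    max_next = max(next_q.values()) if next_q else 0.0   # max_a' Q(s_j, a'); 0 if s_j is unseen/terminal
    old = q.setdefault(s_i, {}).get(a, 0.0)
    q[s_i][a] = (1.0 - alpha) * old + alpha * (r + gamma * max_next)
    return q[s_i][a]
```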

    Algorithm (a minimal code sketch follows the list):

    1. Initialize the Q-table and check for the terminal state.

    2. Choose the next action based on the current state and the Q-table (optionally with added randomness, i.e., ε-greedy exploration).

    3. Take the chosen action and compute the reward for the resulting state.

    4. Update the Q-table.
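    The sketch below ties steps 1-4 together; the env object and its reset()/step() interface are assumptions standing in for the actual layer-selection environment, and update_q is the equation (2) sketch above.

```python
# Tabular Q-learning loop for steps 1-4 (illustrative sketch only).
import random

def choose_action(q, state, actions, epsilon):
    """Step 2: epsilon-greedy choice from the Q-table."""
    if random.random() < epsilon or not q.get(state):
        return random.choice(actions)                     # explore
    return max(q[state], key=q[state].get)                # exploit the best known action

def run_search(env, actions, episodes, epsilon, alpha=0.01, gamma=1.0):
    q = {}                                                # Step 1: initialize the Q-table
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            a = choose_action(q, state, actions, epsilon)             # Step 2
            next_state, reward, done = env.step(a)                    # Step 3: act and observe the reward
            update_q(q, state, a, reward, next_state, alpha, gamma)   # Step 4: equation (2)
            state = next_state
    return q
```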

     

    However, this method leads to a problem with the Q-table: in real circumstances the state space, and therefore the Q-table, can become extremely large. To resolve this, we add an inference-time limit and an accuracy threshold as constraints, which prune candidates early and accelerate the search.

     

    Training Process

     

    According to the iterative Q-learning equation (2) above, we set the learning rate α = 0.01 and the discount factor γ = 1, and decrease ε step by step from 1.0 to 0.1. At ε = 1.0 the agent explores a large number of models, sampling CNN architectures at random and training them with the same settings; exploration stops at ε = 0.1. If a model that has already been trained is sampled again, it is not retrained; its previously measured performance is returned to the agent directly. After each model is sampled and trained, the agent randomly draws models from the replay dictionary and updates their Q-values according to equation (2). The number of models trained at each value of ε is listed below.

    ε               1.0    0.9    0.8    0.7    0.6    0.5    0.4    0.3    0.2    0.1
    Models trained  1500   100    100    100    150    150    150    150    150    150
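    Expressed as data, the schedule above and the replay-dictionary shortcut for resampled models might look like the sketch below; sample_model and train_and_evaluate are hypothetical helpers, not our actual code.

```python
# Exploration schedule from the table above: (epsilon, number of models trained).
epsilon_schedule = [(1.0, 1500), (0.9, 100), (0.8, 100), (0.7, 100), (0.6, 150),
                    (0.5, 150), (0.4, 150), (0.3, 150), (0.2, 150), (0.1, 150)]

replay = {}                                    # model description -> (accuracy, latency)

for epsilon, n_models in epsilon_schedule:
    for _ in range(n_models):
        model = sample_model(epsilon)          # hypothetical: epsilon-greedy walk over layer choices
        key = str(model)
        if key in replay:                      # already trained: reuse the stored performance
            accuracy, latency = replay[key]    # instead of spending time retraining
        else:
            accuracy, latency = train_and_evaluate(model)   # hypothetical helper
            replay[key] = (accuracy, latency)
        # Q-values are then refreshed from models drawn out of the replay dictionary (equation (2)).
```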

     

     



     


     


     

    5. Function Description

    We first generate a model and train it for one iteration. After converting the model with OpenVINO, we run inference five times on the Starter Kit FPGA and take the average latency. If the inference time exceeds the requirement, we stop training that candidate and generate a new model. If it meets the requirement, we train for 10 more epochs to raise accuracy, feed the result (accuracy and latency) into the reward, recompute and update the Q-table, and then generate the next model, repeating the cycle.
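    A sketch of this per-candidate flow is shown below; every helper (train_one_iteration, convert_with_openvino, measure_latency_ms, train_epochs, evaluate_accuracy) and the exact reward formula are assumptions used for illustration only.

```python
# Per-candidate evaluation flow described above (sketch with hypothetical helpers).
MAX_LATENCY_MS = 30.0      # user-supplied inference-time constraint (e.g. 30 ms or 15 ms)

def evaluate_candidate(model):
    train_one_iteration(model)                           # quick first training pass
    ir = convert_with_openvino(model)                    # convert for the OpenVINO Starter Kit
    runs = [measure_latency_ms(ir) for _ in range(5)]    # five FPGA inference runs
    latency = sum(runs) / len(runs)                      # averaged latency
    if latency > MAX_LATENCY_MS:
        return None                                      # too slow: discard and generate a new model
    train_epochs(model, epochs=10)                       # keep training to raise accuracy
    accuracy = evaluate_accuracy(model)
    return accuracy - 0.01 * latency                     # reward from (accuracy, latency); weighting is an assumption

# A returned reward feeds the Q-table update; a None result triggers a fresh candidate.
```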

     
     

    6. Performance Parameters

    The table below shows experimental results obtained on the Terasic OpenVINO Starter Kit. Because of the contest time limit and hardware constraints, the comparison is restricted to epsilon_schedule = [40, 10, 10, 10, 10, 15, 15, 15, 15, 15], which is enough to show that the search can be accelerated within the same search space. The first row shows that the original QNN baseline, without any constraints, needs 53 hours for this search space. With FPGA-assisted search over the same space, inference-time limits of 30 ms and 15 ms reach comparable accuracy in 17 and 4 hours respectively, a speed-up of up to 13.25× (53 h / 4 h).

    If the inference-time limit is relaxed to 30 ms, an accuracy of 72.58% is reached in 38 hours.


     

    7. Design Architecture

     

     

    The experimental environment is Ubuntu 16.04 with Python 2.7, the OpenVINO Starter Kit FPGA, an Intel Core i7-7700 CPU, and an NVIDIA GTX 1070 Ti GPU. We fully exploit heterogeneous computing across the CPU, GPU, and FPGA to support fast search.

     

     

    References

    [1] Song, Mingcong, et al. "Towards pervasive and user satisfactory CNN across GPU microarchitectures." 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2017.

    [2] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).

    [3] Jiang, Weiwen, et al. "Accuracy vs. efficiency: Achieving both through fpga-implementation aware neural architecture search." arXiv preprint arXiv:1901.11211 (2019).

    [4] Baker, Bowen, et al. "Designing neural network architectures using reinforcement learning." arXiv preprint arXiv:1611.02167 (2016).


     

     

     



    2 Comments

    张泽
    Novel idea with broad applications. Looking forward to seeing your follow-up design!
    🕒 Jul 01, 2019 10:17 PM
    Zhou Wenyan
    Good proposal, looking forward to see your project!
    🕒 Jun 27, 2019 09:56 AM
