Annual: 2019

AS012 »
Real time streaming analytics Machine Learning
📁Machine Learning
👤Paulo Alonso
 (Phirmup Systems)
📅Oct 15, 2019
Regional Final




Real-time streaming analytics and machine learning on an FPGA accelerator card, using the OpenVINO/DLA FPGA AI engine. The approach is cloud-based and built on open source, targeting multi-spectrum, multi-RAT spectrum sharing between M2M and LTE-A in 5G networks. The "streaming" part runs on a secure framework that delivers firmware updates to network-device code bases built on Coreboot and the EFI Development Kit (TianoCore) for rapidly evolving embedded RTOS systems, with detailed coverage of requirements and optimization of the Boot Setting File (BSF). The software accesses multiple connections through secure remote-access protocols and learns from the "behavior" of your network, then activates the most probable event scene based on the usage pattern of the Java-based web interface. A multi-cloud hybrid platform uses a Python framework to automate data analytics, including edge-based gateway decisions, and improves the connection, monitoring, and authentication of intelligent sensors so that usage adapts automatically. Learning capabilities are extended to qualify components for specific tasks allocated to gateways, which can act as distributed instruments. The platform has a dedicated neural-network processing unit and an AI function API, and the FPGA tuning module handles SIM card data with a modular network topology spanning the data center (AS, peering, links) and the edge firewall. A smart layer for rules and automation uses "admin" links between data center networks and gateway controls, and allows additional "node to gateway" communications such as BACnet. Slicing data for the 5G evolution bridges the physical and virtual worlds for operators to tap into, drawing on our development experience in turning technical concepts into 5G transformations.
Processes such as profiling data sources can be automated: algorithms comb through all of the different data sources, including the data log, and select the ones that fall within a certain category, giving a clearer "frame" of the sources available. AI technology can tag them automatically, eliminating considerable manual work, and can use the links among different sources to detect anomalies in data traffic. For switching, explore Link Aggregation (802.3ad) or vPC-like protocols, and always consider stacking and VLANs (802.1Q). For Layer 2 VLAN extension, consider VPLS/VPWS, OTV, EVPN and PBB-EVPN. Define QoS, jumbo frames and the SAN network. The programmatic framework covers 5G security sensors and automated firmware updates over the air (FOTA). The software can connect industrial SCADA or DCS systems directly to the cloud using connectivity technologies and industrial protocols such as NB-IoT, Cat-M1, Modbus, OPC, ISA100 wireless and PROFIBUS, with CoAP/MQTT at the gateway. This addresses interoperability and M2M communication through the FPGA's internal flash memory, and supports synchronizing private files from an open cloud storage service to a safe, upgraded service. Differential correction parameters are implemented through the radio module, which is useful when there is no reliable network coverage. The design reduces the high cost of the DTLS handshake in the WSN and lowers latency compared with a standard DTLS use case, without requiring changes to the end hosts, in a multiple-input multiple-output (MIMO) network, using the RTM module in parallel across many more antennas.
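
As an illustration of detecting anomalies in streamed traffic data, the sketch below flags samples that deviate strongly from a running mean. It is a minimal example under stated assumptions, not the project's actual model: the class name `StreamingAnomalyDetector`, the z-score threshold, the warm-up length and the sample values are all hypothetical, and a running z-score (Welford's online algorithm) is used in place of whatever learned model the design employs.

```python
import math


class StreamingAnomalyDetector:
    """Running z-score detector (hypothetical helper, not a project API)."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score above which a sample is flagged
        self.warmup = warmup        # samples to observe before flagging anything
        self.n = 0                  # number of samples absorbed so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Return True if x looks anomalous against past samples, then absorb x."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Made-up traffic samples (for example, packets per second) with one spike.
detector = StreamingAnomalyDetector()
traffic = [100, 102, 98, 101, 99, 100, 103, 500, 101]
flags = [detector.update(value) for value in traffic]
```

Each sample is processed in constant time and memory, which is the property a streaming pipeline needs. Note that this simple version also absorbs flagged samples into its statistics, so a long burst of anomalies would gradually stop being flagged.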

Demo Video

  • URL:

  • Project Proposal

    1. High-level Project Description

    Purpose of the Design

    The design is intended to build the insight needed to tackle the project. The knowledge acquired brings the innovation required to adapt to rapid change and stay at the forefront, whether developing software applications or creating accelerated or embedded software systems at the hardware level.


    The application uses Kubernetes with continuous VM deployment in the cloud so that machine learning runs reliably in real-world operations. Logical functions are developed for serverless applications, where cloud data management scales workloads and connects the application to the platform using any on-premises, cloud, or SaaS integration standard.

    Machine Learning Analytics

    Cloud applications on the Kubernetes Container Orchestration Platform

    Cloud with Mesh services acceleration

    Dividing “monolithic” into microservices

    Testing microservices with Services Virtualization

    AI Inference Performance Optimization & Overlay IPs

    The design aims to meet performance targets for model inference in an end-to-end system, and uses the Intel platform to measure performance, find potential bottlenecks, and optimize inference performance in a number of ways.

    Deep Learning

    Work focuses on more efficient algorithms and on quick product development, where software is becoming ever more important. SoCs and FPGAs are relevant both for efficiently executing AI inference tasks and for accelerating the "whole application" to meet real-life end-to-end requirements.

    Deploying Data Center Accelerated Applications

    Taking a software application from hardware design out into the real world covers integration with third-party frameworks, Python APIs and bindings, and containerization for microservice deployment (including Docker and Kubernetes).

    Intel Software Runtime

    Acceleration runtimes fundamentally perform three related tasks: memory allocation, memory migration, and sequencing of computation. Within that scope, development explores the nuances of the Intel runtime and the expressive power of the "soft side" of hardware acceleration.
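
    The three runtime tasks named above can be modelled in a few lines. The sketch below is a toy, host-only model under stated assumptions: `ToyRuntime`, `Buffer` and the `double` kernel are hypothetical names invented for illustration and do not correspond to any Intel runtime API.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    name: str
    data: list
    location: str = "host"  # either "host" or "device"


@dataclass
class ToyRuntime:
    """Hypothetical model of an acceleration runtime (not a real API)."""
    buffers: dict = field(default_factory=dict)
    queue: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def allocate(self, name, size):
        # Task 1: memory allocation.
        self.buffers[name] = Buffer(name, [0] * size)
        self.log.append(f"alloc {name}")
        return self.buffers[name]

    def migrate(self, name, target):
        # Task 2: memory migration between host and device.
        self.buffers[name].location = target
        self.log.append(f"migrate {name} -> {target}")

    def enqueue(self, kernel, src, dst):
        # Task 3: sequencing of computation (kernels run in queue order).
        self.queue.append((kernel, src, dst))

    def run(self):
        for kernel, src, dst in self.queue:
            a, b = self.buffers[src], self.buffers[dst]
            if a.location != "device" or b.location != "device":
                raise RuntimeError("buffers must be on the device before a kernel runs")
            b.data = [kernel(x) for x in a.data]
            self.log.append(f"run {kernel.__name__}")
        self.queue.clear()


def double(x):
    return 2 * x


rt = ToyRuntime()
rt.allocate("in", 4).data = [1, 2, 3, 4]
rt.allocate("out", 4)
rt.migrate("in", "device")
rt.migrate("out", "device")
rt.enqueue(double, "in", "out")
rt.run()
rt.migrate("out", "host")
result = rt.buffers["out"].data
```

    The check inside `run` shows why the three tasks are related: a kernel cannot be sequenced correctly unless its buffers were allocated and migrated first.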

    Development Software Accelerated Libraries

    Library use focuses primarily on commonalities in structure and use model, and on three libraries in particular (OpenCV, FinTech, and Database Acceleration), to see how they provide acceleration for a number of different use cases.

    Application Scope

    In a unified software environment, the application is taken from initial hardware design out into the real world through integration with third-party frameworks, Python APIs and bindings, and containerization for microservice deployment (including Docker and Kubernetes).

    Machine Learning Hardware Accelerator FPGA

    The Cyclone V integrates a host processor with the FPGA fabric, with memory coherency between the Cyclone V processor and the FPGA, providing a heterogeneous compute solution for workload optimization. It aims to demonstrate performance and efficiency across many data-center workloads such as cloud services, analytics, genomics, security, packet processing, virtual switching, compression, and deduplication. Through its programming model (virtual addressing and data caching), the Intel Cyclone V integrated FPGA enables new classes of algorithms for acceleration.

    Hardware acceleration on integrated FPGA systems covers programming tools, operating systems, and applications for accelerator-based computing system architectures, programmed using RTL and OpenCL.

    Logic Design FPGA

    Hardware and Software Architecture

    Intel Cyclone V integrated FPGA

    Accelerator Abstraction Layer

    Core Cache Interface

    Accelerator Function Unit Design

    Memory Protocol Factory

    OpenCL Programming

    High Level Design (HLD)

    RTL, HLD, and OpenCL

    Hardware Accelerator

    Programming tools for hardware accelerators require reformulating algorithms for the FPGA, partitioning a design between FPGA and CPU, orchestrating data transfers between FPGA and CPU, and keeping the design portable across different CPU and FPGA systems for the target proposal.

    The Cyclone V FPGA incorporates the Intel Accelerator Abstraction Layer software to support development in computer architecture, operating systems, programming tools, and innovative applications for an accelerator-based Cyclone V FPGA system (DE10-Nano), managed via one centralized cluster.

    The project is developed on a framework that integrates design across varied platforms. One goal is to improve application portability while maintaining performance. A Reconfigurable Hardware Virtualization (RHV) framework shares the reconfigurable resources in High-Performance Computers (HPC) among all system microprocessors and/or processor cores.

    Trust and Assurance Approaches

    FPGAs offer a programmable circuit-design architecture. On a circuit board, IP (Intellectual Property) is available for a wide variety of functions that can be placed on the FPGA, including microprocessors, filters, phase-locked loops, and DSP.

    Persisting Streamed Data

    Streamed data is persisted in the open Apache Parquet format in object storage in a public cloud, private cloud, or on-premises as needed. A common SQL engine is shared with other data management solutions to access data throughout the hybrid data architecture.

    Targeted Users

    Targeted users are data centers, computational storage, telecom, industrial, vertical/system integrators, and network specialists (test, measurement and wireline). Other potential customers are cellular service providers and satellite operators.





    2. Block Diagram

    Board Layout




    SoC Diagram



    DE10 Nano Block Diagram



    Cyclone V Schematic Diagram



    Block Diagram



    Schematic Block Diagram




    3. Intel FPGA virtues in Your Project

    Intel FPGA Virtues

    FPGAs adapt to changing requirements in system mode or at run time in an SoC.

    DRC Computer co-processors for streaming analytics, graph databases, data deduplication, database queries and character-matching.

    Digital signal processors (DSPs) and graphics processing units (GPUs) are capable of executing an algorithm written in a high-level language, such as C, and have function-specific accelerators to improve execution.

    The Intel High-Level Synthesis compiler takes C/C++ as input. This compiler extracts parallelism, organizes memory, and connects multiple programs within an FPGA, working from operations, conditional statements, loops, functions, dynamic memory allocation and pointers.

    Compare Performance

    The FPGA provides data-rate optimization, with control-centric algorithms implementing FPGA network express control in C/C++ for User Datagram Protocol (UDP) packet processing.

    Verification techniques for the Intel HLS compiler address typical coding errors and possible solutions to each, and what to do when program behavior cannot be fully verified at the C level: software test bench, code coverage, uninitialized variables, out-of-bounds memory access, co-simulation, and cases where C/C++ verification is not possible.

    The OCT Intel FPGA IP allows I/O to be calibrated dynamically with reference to an external resistor, with design trade-offs by AXI and application.
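
    To make the UDP packet-processing step concrete, the sketch below extracts the four fixed 16-bit header fields (source port, destination port, length, checksum) from a datagram. It is a software-level illustration in Python, not the project's C/C++ HLS code. The function name `parse_udp_header` and the sample port numbers are hypothetical.

```python
import struct


def parse_udp_header(datagram: bytes) -> dict:
    """Split a raw UDP datagram into its 8-byte header fields and payload."""
    if len(datagram) < 8:
        raise ValueError("a UDP header needs at least 8 bytes")
    # Four unsigned 16-bit fields in network (big-endian) byte order.
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,      # header plus payload, in bytes
        "checksum": checksum,
        "payload": datagram[8:],
    }


# Build a sample datagram with made-up ports and a zero checksum.
payload = b"ping"
sample = struct.pack("!HHHH", 5000, 6000, 8 + len(payload), 0) + payload
header = parse_udp_header(sample)
```

    Because the header has a fixed layout, the same field extraction maps naturally onto fixed bit slices in hardware, which is one reason UDP processing is a common FPGA offload target.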

    Design Parameter

    Intel FPGA device functions generate hierarchical simulation scripts by referencing subsystems and IP components without traversing the system hierarchy, and use VHDL syntax to connect ports in Qsys with wire-level connectivity for IP components that use system interfaces, reducing IP upgrade and regeneration time.

    IP Core Generation Workflow

    The offsets can be found in the Register Address Mapping section of the IP Core Report. Using a JTAG to AXI Master is a way to interface with HDL Coder IP core registers in systems that do not have this feature; add clock interface hRD.

    Connect ARM Microcontroller to a FPGA using Extended Processor

    Use an ARM9 CPU core with hardware-based native Jazelle Java bytecode execution as the application processor, with an External Memory Interface. For the demo/test design, only two block RAMs are used when the address decoding is not finely tuned.

    Add interfaces not native to the Processor

    Memory controller interfaces add the remaining data sections to local memory, where the I/O does not fit into any of the other Avalon interface types. Add the VHDL code from the native stack with clock recovery at the receiver.

    Interfacing an External Processor to an Intel FPGA

    Interfaces to a PCI Express and PCI Lite endpoint inside the FPGA menu items to add any of the standard Avalon interfaces to custom bridge.

    4. Design Introduction

    Design Introduction

    Electronic companies design the hardware dedicated to their products with their standards and protocols which makes it challenging for the end users to reconfigure the hardware as per their needs. This requirement for hardware led to the growth of a new segment of customer-configurable field programmable integrated circuits called FPGAs.

    What is FPGA?

    The FPGA is Field Programmable Gate Array. It is a type of device that is widely used in electronic circuits. FPGAs are semiconductor devices which contain programmable logic blocks and interconnection circuits. It can be programmed or reprogrammed to the required functionality after manufacturing.

    Basics of FPGA

    When a circuit board containing an FPGA is manufactured, the FPGA is programmed during the manufacturing process and can be reprogrammed later to apply an update or make necessary changes.

    This feature distinguishes FPGAs from ASICs: Application Specific Integrated Circuits (ASICs) are custom-manufactured for a specific design task. Today FPGAs easily push the performance barrier up to 500 MHz.

    In microcontrollers, the chip is already designed for the customer, who writes the software and compiles it to a hex file to load onto the microcontroller. This software can be easily replaced as it is stored in flash memory.

    In FPGAs, there is no processor to run the software, and we are the ones designing the circuit. We can configure an FPGA as something as simple as an AND gate or as complex as a multi-core processor.

    To create a design we write it in a Hardware Description Language (HDL) such as VHDL. The HDL is then synthesized and converted into a bit file, using a tool such as BITGEN, to configure the FPGA.

    The FPGA stores the configuration in RAM, so the configuration is lost when power is removed. Hence, the FPGA must be configured every time power is supplied.

    FPGA Architecture

    FPGAs are prefabricated silicon chips that can be programmed electrically to implement digital designs. The first FPGAs were based on static memory (SRAM), which is used to configure both logic and interconnect using a stream of configuration bits. A modern FPGA contains approximately 330,000 logic blocks and around 1,100 inputs and outputs.



    FPGA Architecture

    FPGA Architecture consists of three major components:

    Programmable Logic Blocks, which implement logic functions

    Programmable Routing (interconnects), which connects the logic blocks and I/O blocks

    I/O blocks, which are used to make off-chip connections

    Programmable Logic Blocks

    The programmable logic block provides basic computation and storage elements used in digital systems. A basic logic element consists of programmable combinational logic, a flip-flop, and some fast carry logic to reduce area and delay cost. Modern FPGAs contain a heterogeneous mixture of different blocks like dedicated memory blocks, multiplexers. Configuration memory is used throughout the logic blocks to control the specific function of each element.
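
    The role of configuration memory in a logic element can be sketched in software. The example below assumes the combinational part is realized as a look-up table (LUT), a common choice, followed by an optional flip-flop. `LUT` and `LogicElement` are hypothetical illustrative classes, and fast carry logic is left out.

```python
class LUT:
    """n-input look-up table: the configuration bits are its truth table."""

    def __init__(self, n_inputs, config_bits):
        if len(config_bits) != 2 ** n_inputs:
            raise ValueError("need exactly 2**n_inputs configuration bits")
        self.n_inputs = n_inputs
        self.config_bits = list(config_bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        # The inputs form the address into the configuration memory,
        # with the first input as the most significant bit.
        index = 0
        for bit in inputs:
            index = (index << 1) | (1 if bit else 0)
        return self.config_bits[index]


class LogicElement:
    """A LUT followed by a D flip-flop that captures the LUT output."""

    def __init__(self, n_inputs, config_bits):
        self.lut = LUT(n_inputs, config_bits)
        self.q = 0  # flip-flop state

    def clock(self, *inputs):
        # On a clock edge the flip-flop stores the current LUT output.
        self.q = self.lut.evaluate(*inputs)
        return self.q


# The same hardware becomes a different gate purely through configuration.
and_gate = LUT(2, [0, 0, 0, 1])
xor_gate = LUT(2, [0, 1, 1, 0])
element = LogicElement(2, [0, 0, 0, 1])
stored = element.clock(1, 1)
```

    Reprogramming the device amounts to rewriting `config_bits`. The structure of the logic element itself never changes.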

    Programmable Routing

    The programmable routing establishes a connection between logic blocks and Input/Output blocks to complete a user-defined design unit. It consists of multiplexers, pass transistors and tri-state buffers. Pass transistors and multiplexers are used in a logic cluster to connect the logic elements.

    Programmable I/O

    The programmable I/O pads are used to interface the logic blocks and routing architecture to the external components. The I/O pad and the surrounding logic circuit form an I/O cell.

    These cells consume a large portion of the FPGA's area, and the design of programmable I/O blocks is complex, as there are great differences in the supply voltage and reference voltage.

    The selection of standards is important in I/O architecture design. Supporting a large number of standards can increase the silicon chip area required for I/O cells.

    With advancement, the basic FPGA Architecture has developed through the addition of more specialized programmable function blocks.

    The special functional blocks like ALUs, block RAM, multiplexers, DSP, and microprocessors have been added to the FPGA, due to the frequency of the need for such resources for applications.

    FPGA Design

    FPGA design solutions include design and development, RTL coding, test suite development, and testing and verification. They cover high-speed bus interfaces, video processing, and data acquisition for ASICs in FPGAs on the target platform. Customized solutions built around these IP cores follow defined implementation procedures, from requirement specification to target testing.

    FPGA IP Core

    FPGA IP cores are designed as custom IP and integrated with existing IP, including processor and peripheral cores, bus interface cores, video/multimedia cores, and storage cores. The FPGA IP cores are delivered as standard, optimized, synthesizable VHDL RTL or as an optimized netlist for FPGAs, together with a VHDL testbench for simulation, ready for system integration.

    FPGA Architecture Design Flow

    FPGA Architecture design comprises design entry, design synthesis, design implementation, device programming and design verification. Design verification includes functional verification and timing verification, which take place at different points in the design flow. The following flow shows the design process of the FPGA.

    Design Entry

    Design entry can be done using different techniques: schematic-based entry, a hardware description language (HDL), or a combination of both. If the designer wants to deal directly with the hardware, then schematic entry is the logical choice.

    If the designer thinks of the design in an algorithmic way, then HDL is the better choice. Schematic-based entry gives the designer greater visibility and control over the hardware.

    Design Synthesis

    This process translates VHDL code into a device netlist format, i.e., a complete circuit with logical elements. The design synthesis process checks the code syntax and analyzes the hierarchy of the design architecture. This ensures the design is optimized for the target architecture. The netlist is saved as a Native Generic Circuit (NGC) file.

    Design Implementation

    The implementation process consists of:

    • Translate

    • Map

    • Place and Route


    The translate process combines all the input netlists into the logic design file, which is saved as an NGD (Native Generic Database) file. Here the ports are assigned to physical elements in the design such as pins and switches. These assignments are stored in a file called the User Constraints File (UCF).



    Mapping divides the circuit into sub-blocks so that they fit into the FPGA logic blocks. This process fits the logic defined by the NGD file into the combinational logic blocks and input/output blocks, and then generates an NCD file, which represents the design mapped to the components of the FPGA.



    The place and route process places the sub-blocks from the mapping process into the logic blocks according to the constraints and then connects the logic blocks.


    Device Programming

    The routed design must be loaded into the FPGA, so it must first be converted into a format supported by the FPGA. The routed NCD file is given to the BITGEN program, which generates the BIT file. This BIT file is then used to configure the FPGA.
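
    The stages from synthesis to device programming form a chain in which each step consumes the artifact produced by the previous one. The sketch below checks that chain using the file types named in the text (NGC, NGD, UCF, NCD, BIT), which appear to follow Xilinx ISE naming. It is only a bookkeeping illustration and does not invoke any real tool: the file names and the `run_flow` helper are hypothetical.

```python
# Each entry: (stage name, required input files, produced output file).
# File names are hypothetical; the extensions are the ones named in the text.
FLOW = [
    ("synthesis",       ["design.vhd"],               "design.ngc"),
    ("translate",       ["design.ngc", "design.ucf"], "design.ngd"),
    ("map",             ["design.ngd"],               "design_map.ncd"),
    ("place_and_route", ["design_map.ncd"],           "design_routed.ncd"),
    ("bitgen",          ["design_routed.ncd"],        "design.bit"),
]


def run_flow(flow, sources):
    """Walk the stages in order, checking that every input is available."""
    available = set(sources)
    completed = []
    for stage, inputs, output in flow:
        missing = [name for name in inputs if name not in available]
        if missing:
            raise FileNotFoundError(f"{stage}: missing input(s) {missing}")
        available.add(output)  # this stage's output feeds later stages
        completed.append(stage)
    return completed, flow[-1][2]


stages, final_artifact = run_flow(FLOW, ["design.vhd", "design.ucf"])
```

    Leaving out the constraints file makes the translate stage fail, mirroring the dependency of port assignment on the UCF.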

    Design Verification

    Verification can be done at various stages of the process.

    1. Behavioral Simulation (RTL Simulation)

    Behavioral simulation is the first of all the steps that occur in the hierarchy of the design. It is performed before the synthesis process to verify the RTL code. In this process, the signals and variables are observed, the procedures and functions are traced, and breakpoints are set.

    2. Functional Simulation

    Functional simulation is performed after translation. It gives information about the logical operation of the circuit.

    3. Static Timing Analysis

    This is done after mapping. The post-map timing report gives the signal path delays. After place and route, the timing report incorporates the timing delay information, providing a complete timing summary of the design.
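
    The core calculation behind a timing summary is the longest (critical) path through the circuit. The sketch below computes it for a small acyclic netlist under stated assumptions: the node names and picosecond delays are made up, routing delay is folded into each node's delay, and `critical_path_delay` is a hypothetical helper rather than part of any vendor tool.

```python
def critical_path_delay(delays, edges):
    """Longest accumulated delay through an acyclic netlist.

    delays: node name -> delay in picoseconds
    edges:  list of (source, destination) connections
    """
    predecessors = {node: [] for node in delays}
    for src, dst in edges:
        predecessors[dst].append(src)

    arrival = {}

    def arrive(node):
        # Arrival time = latest-arriving input plus this node's own delay.
        if node not in arrival:
            latest_input = max((arrive(p) for p in predecessors[node]), default=0)
            arrival[node] = latest_input + delays[node]
        return arrival[node]

    return max(arrive(node) for node in delays)


# Made-up delays for a four-element path (picoseconds).
delays = {"lut_a": 500, "lut_b": 750, "carry": 250, "lut_c": 500}
edges = [("lut_a", "carry"), ("lut_b", "carry"), ("carry", "lut_c")]

worst_ps = critical_path_delay(delays, edges)
fmax_mhz = 1_000_000 / worst_ps  # period in ps -> frequency in MHz
```

    The slowest path (`lut_b`, `carry`, `lut_c`) sets the minimum clock period, so shortening any other path does not raise the achievable clock frequency.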

    Applications of FPGAs

    Wide range of applications like random logics, SPLDs, device controllers, communication encoding and filtering.

    Emulation of entire large hardware systems via the use of many interconnected FPGAs.

    Offer a powerful solution for meeting machine vision, industrial networking, motor control and video surveillance.

    FPGAs are used in custom computing machines.

    FPGAs provide a unique combination of highly parallel custom computation and low-cost computation.

    FPGA architecture provides a new generation in the programmable logic devices.

    The word Field in the name denotes the ability of the gate arrays to be programmed for a specific function by the user instead of by the manufacturer of the device. The word Array denotes a series of columns and rows of gates that can be programmed by the end user.


    Accelerator: The unit that can be assigned to an instance for offloading specific functionality. For non-FPGA devices, it is either the device itself or a virtualized version of it (vGPUs). For FPGAs, an accelerator is either the entire device, a region within the device or a function.

    Bitstream: An FPGA image, usually a binary file, possibly with vendor-specific metadata. A bitstream may implement one or more functions.

    Function: A specific functionality, such as matrix multiplication or video transcoding, usually represented as a string or UUID. This term may be used with multi-function devices, including FPGAs and other fixed function hardware like Intel QuickAssist.

    Region: A part of the FPGA which can be programmed without disrupting other parts of that FPGA. If an FPGA does not support Partial Reconfiguration, the entire device constitutes one region. A region may implement one or more functions.
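
    The four terms defined above relate to each other as a small data model. The sketch below is one possible encoding, offered only as an illustration: the class names, the `find_accelerator` helper, and the image and function identifiers are hypothetical and are not taken from any cloud or vendor API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bitstream:
    image_id: str
    functions: List[str]  # a bitstream may implement one or more functions


@dataclass
class Region:
    name: str
    loaded: Optional[Bitstream] = None

    def program(self, bitstream: Bitstream) -> None:
        # Programming one region does not disturb the other regions.
        self.loaded = bitstream

    def functions(self) -> List[str]:
        return list(self.loaded.functions) if self.loaded else []


@dataclass
class FPGADevice:
    name: str
    supports_partial_reconfiguration: bool
    regions: List[Region] = field(default_factory=list)

    def __post_init__(self):
        # Without Partial Reconfiguration the entire device is one region.
        if not self.supports_partial_reconfiguration:
            self.regions = [Region("whole-device")]

    def find_accelerator(self, function: str) -> Optional[Region]:
        """Return the region currently providing the requested function."""
        for region in self.regions:
            if function in region.functions():
                return region
        return None


device = FPGADevice("fpga0", True, [Region("r0"), Region("r1")])
device.regions[0].program(Bitstream("img-matmul", ["matrix-multiply"]))
device.regions[1].program(Bitstream("img-video", ["video-transcode"]))
match = device.find_accelerator("video-transcode")

legacy = FPGADevice("fpga1", False)
```

    Here an accelerator is identified at region granularity. A scheduler could equally treat the whole device or a single function as the assignable unit, as the definitions allow.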


    FPGA integrated development applications targeting Data Center accelerator cards and other FPGA-as-a-Service.

    Software Development Flow

    A profiler to guide application optimization

    Compilers for host and FPGA-accelerated code

    Emulation flows for development and debug

    Automatic communication between software and hardware

    Host application uses standard OpenCL API calls to interact with the FPGA accelerated functions which can be modeled in either RTL, C/C++, or OpenCL

    Quartus provides an optimized compiler for host applications, a cross compiler for the adaptable hardware, and debugging and profiling tools to identify performance bottlenecks and optimize the application

    The runtime and board-specific shells automatically manage communication between the FPGA accelerators and the host application

    Advantages of FPGA Acceleration

    Design-Specific Architectures for applications of high performance algorithms and workloads

    FPGA devices are adaptable to be built to accelerate specific parts of the code

    FPGA development provides a set of tools and reports to profile the performance of your host application, and determine acceleration

    Provide automated runtime instrumentation of cache, memory and bus usage to track real-time performance on the hardware

    Distributed Deep Learning

    FPGA platforms accelerate model training for machine learning with a high-performance AI software stack and ML/DL frameworks such as TensorFlow, Caffe and PyTorch, resulting in a performance increase of 10x to 1000x, depending on the application.


    A field programmable gate array is a semiconductor device that can take on the personality of a customer's design by programming it. Unlike a processor that executes a program, an FPGA configures itself to become an operating circuit that will then respond to inputs in the same way that a dedicated piece of hardware would behave. The schematic is where components are placed and wired and simulation probes are added.


    FPGA Project

    An FPGA (Field Programmable Gate Array) is a device programmable at the logic-gate level that serves to implement digital circuits.

    Internally, an FPGA consists of three types of elements:

    Input / Output Block (IOB). These circuits are responsible for interfacing between the internal blocks of the FPGA and its external pins.

    Configurable Logic Block (CLB). These are blocks composed of combinational elements (look-up tables - LUTs - used to implement the logic gate function) and flip-flops that can be programmatically configured.

    Switch Matrix (Switching or Interconnect Matrix). These are switches arranged in rows and columns, responsible for connecting the various CLBs with each other and with the inputs and outputs of the FPGA.

    FPGA projects are carried out according to a flow that includes:

    Register Transfer Level (RTL) circuit description in a hardware description language, the most commonly used being VHDL.

    Logic synthesis. A step that generates the netlist and signal delay information, both necessary for simulations.

    Behavioral and electrical simulations to verify circuit operation.

    Placement and routing. The step that maps the synthesized circuit to the physical elements of the FPGA.

    Power and timing analysis. To verify that the generated circuit has no race conditions between signals and that the timing and power specifications are being met.

    The flow defines the content of the interconnection matrix, that is, the routing between the various constituent elements of the FPGA. The contents of this matrix must be stored in non-volatile memory, which may be internal or external to the FPGA itself. In the second case, there must be some boot procedure, using for example an external microcontroller, which loads the stored contents into the FPGA at boot time.

    In complex systems involving microcontrollers and FPGAs, FPGA reconfiguration is possible during system operation, thus adding an extra degree of flexibility to the design without having to include additional hardware.

    At the other extreme, FPGA projects may include a microcontroller implemented from the elements of the FPGA itself. A single integrated circuit (the FPGA) then contains several more or less complex logic blocks together with the ability to execute software. That software can be changed by the design engineer, during the PCB assembly process, or even after the equipment has been shipped to customers out in the 'field', which benefits designers of data center engines.