EM045 » Hardware Accelerator for Text Classification
Text classification is the process of assigning a text to a category according to its meaning. It is applied in a wide range of fields, such as education, data mining and spam filtering. However, text classification is a computationally demanding task. This project will provide an FPGA-based hardware accelerator design that may be used to reduce the computational cost, and therefore the time spent on classification. The main purpose of the accelerator is to classify scientific articles into categories by topic.
Text classification is a complex task that requires significant computational power. One way to address this is to use a hardware accelerator that works alongside a conventional computing system. The goal of this project is to create an FPGA-based hardware accelerator for classifying scientific articles by topic. The accelerator will be used in conjunction with the Jupyter Notebook development environment for the Python language. For example, the k-nearest neighbors method is to be implemented on the FPGA. For the accelerator, it is planned to use a board from the Terasic OpenVino kit.
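To make the k-nearest neighbors method concrete, the following is a minimal software sketch of the classification it performs, written in plain Python. It is an illustration only and not the project's implementation: the bag-of-words features, the cosine similarity measure, the helper names (`vectorize`, `cosine`, `knn_classify`) and the toy corpus are all assumptions introduced here.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; a stand-in for a real feature extractor."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, train_texts, train_labels, k=3):
    """Label a query by majority vote among its k most similar training texts."""
    q = vectorize(query)
    scored = sorted(
        ((cosine(q, vectorize(t)), label)
         for t, label in zip(train_texts, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
TRAIN_TEXTS = [
    "fpga logic synthesis timing",
    "fpga pipeline hardware design",
    "protein gene expression cell",
    "gene sequencing cell biology",
]
TRAIN_LABELS = ["electronics", "electronics", "biology", "biology"]

predicted = knn_classify("hardware design on fpga", TRAIN_TEXTS, TRAIN_LABELS, k=3)
print(predicted)
```

Every query is compared against every training document, so the cost grows with the size of the training set. This similarity computation is the part an accelerator would be expected to take over.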
The popularity of FPGA-based hardware accelerators appears to be following the rise in the computational complexity of various tasks. Therefore, developing a system with a hardware accelerator could help meet the specific demands of cutting-edge scientific projects.
The objective of the work is to develop a hardware accelerator for the classification of scientific articles. The accelerator will be used in conjunction with the Jupyter Notebook development environment for the Python language. The classification methods will be taken from the scikit-learn library and adapted for the hardware platform. A Python program running on a desktop system sends the accelerator the choice of classifier together with the training and test samples. The hardware accelerator performs the classification and returns the result to the PC. It is planned to use the Terasic OpenVino kit as the accelerator.
The closest existing solution to the problem of developing a hardware accelerator for text classification algorithms is an empirical study by researchers at Iowa State University (Townsend K.R. et al. K-NN text classification using an FPGA-based sparse matrix vector multiplication accelerator // IEEE Int. Conf. Electro Inf. Technol. 2015. Vol. 2015-June. P. 257–263). The authors describe the acceleration of the k-nearest neighbors (kNN) method with FPGA-based hardware. According to the authors, the kNN text classification algorithm is among the most accurate classification approaches, but it is also among the most computationally expensive. To increase the performance and efficiency of the method, they implement an FPGA-based sparse matrix vector multiplication coprocessor and provide hardware acceleration for each stage of the classification process. The overall results of the study show that the FPGA implementation of the kNN method achieves a speedup factor of 15 over a single-threaded CPU and 1.5 over a 32-threaded CPU. These results demonstrate that the concept of an FPGA-based hardware accelerator is viable.
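The link between kNN and sparse matrix vector multiplication can be sketched as follows. If each training document is stored as an L2-normalized row of a sparse matrix, then multiplying that matrix by a normalized query vector gives the cosine similarity to every training document in one pass. The code below is a plain-Python illustration under that assumption. It is not the coprocessor design from the cited study, and the CSR layout, function names and toy matrix are introduced here for illustration only.

```python
import math


def to_csr(rows, n_cols):
    """Pack dense rows into CSR arrays: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        assert len(row) == n_cols
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv(values, col_idx, row_ptr, vec):
    """Compute y = A * vec for a CSR matrix A, touching only stored nonzeros."""
    out = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for p in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[p] * vec[col_idx[p]]
        out.append(acc)
    return out


def l2_normalize(row):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm else list(row)


# Toy term-count matrix (invented): 4 training documents x 5 vocabulary terms.
TRAIN = [
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 2, 1, 1],
]
LABELS = ["A", "A", "B", "B"]

csr = to_csr([l2_normalize(r) for r in TRAIN], n_cols=5)
query = l2_normalize([1, 1, 0, 0, 0])
scores = spmv(*csr, query)  # cosine similarity to every training document

k = 3
nearest = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
votes = [LABELS[i] for i in nearest]
predicted = max(set(votes), key=votes.count)
print(predicted)
```

The inner loop of `spmv` is a chain of multiply-accumulate operations over the stored nonzeros, which is the kind of regular arithmetic that maps naturally onto FPGA logic.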
To design an efficient hardware accelerator, the FPGA board must meet two criteria: high-speed data transfer and sufficient computational performance. Therefore, the Terasic OpenVino Starter Kit FPGA card has been chosen. Its PCI-e bus could satisfy the need to transfer large amounts of text classification data in a short time, and its Cyclone V GX FPGA provides 301K programmable logic elements, which is more than enough to implement several classification methods. The classification algorithms will be implemented in a system under the control of the HPS. The HPS could control the whole system and manage data transfers and calculation processes. In addition, DDR3 memory could be used to store training and test samples. All components of the system will be connected by the Avalon bus.
References used in the design of the project:
 Townsend K.R. et al. K-NN text classification using an FPGA-based sparse matrix vector multiplication accelerator // IEEE Int. Conf. Electro Inf. Technol. 2015. Vol. 2015-June. P. 257–263
 A.Y. Romanov, K.E. Lomotin, E.S. Kozlova, A.L. Kolesnichenko, Research of neural networks application efficiency in automatic scientific articles classification according to UDC, in: 2016 Int. Sib. Conf. Control Commun., IEEE, 2016. doi:10.1109/SIBCON.2016.7491783.
 A. Romanov, K. Lomotin, E. Kozlova, Automatization of scientific articles classification according to universal decimal classifier, in: CEUR Workshop Proc., 2017: pp. 122–133. http://ceur-ws.org/Vol-1975/paper14.pdf.
 Romanov A. Yu., Lomotin K. E., Kozlova E. S. Application of machine learning methods to the problem of automatic classification of articles by UDC // Informatsionnye Tekhnologii (Information Technologies). 2017. Vol. 23. No. 6. P. 418-423 (in Russian).
Figure 1 - Structure of the hardware accelerator
Figure 2 - System flow diagram