EM018 » Mushroom Recognition
Collecting mushrooms for human consumption is a very popular activity in Catalonia. It is so popular that the government often discusses measures to control access to the country's forests, including tolls. There are hundreds of different mushroom species; some are highly appreciated, while others are poisonous and can even cause death. Every year there are a number of fatalities caused by mushroom poisoning.
The exact identification of mushroom species is an important challenge that can save lives. The aim of this project is to build a machine learning system that identifies mushroom species from photos taken with mobile phones.
Mushroom and fungi gathering is a popular activity around the world. There are thousands of fungi species, and some of them closely resemble each other. Identifying and classifying species requires an expert, and a novice usually runs some risk when identifying collected fungi.
Traditionally, classifying objects in images required a thorough study of the problem and complicated hand-crafted algorithms. For complex classification tasks such as mushroom identification, with high diversity in color, shape and context, it was almost impossible. With the advent of Convolutional Neural Networks (CNNs), building applications for complex classification became feasible.
The problem in certain applications is the need to perform classification without a connection to a server. This requires porting heavy CNN structures to edge devices. FPGAs stand out as a well-suited and efficient hardware solution to this problem.
Nevertheless, CNNs are complicated, heavyweight networks that require specific frameworks to build and deploy. Combined with the traditionally difficult FPGA programming and design flow, this made porting such structures to FPGA hardware hard. Recently, however, dedicated software for porting neural networks to FPGAs has appeared. Among these tools, the Intel OpenVINO platform stands out as an excellent solution for this task.
For this reason, the CNN implemented will vary depending on the memory available and on the accuracy attained. A priori, the neural network used will be MobileNetV2. However, as the capacity of this network might not be enough, further experiments with other networks, such as a quantized Inception, are considered.
The main problem when training neural networks is the quantity and quality of the data. Our fungi dataset consists of around 300k labelled images. With this dataset, previous experiments with MobileNetV2 have shown a best accuracy of 40%. However, we expect to improve these results by adding more data, adding preprocessing and choosing the right network.
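An accuracy figure like the one above is simply the fraction of images whose predicted species matches the label. A minimal sketch in plain Python, with hypothetical labels standing in for real predictions:

```python
def top1_accuracy(predicted, actual):
    """Fraction of samples whose highest-scoring prediction matches the label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical labels for five images: two of the five predictions are right.
preds = ["boletus", "amanita", "russula", "amanita", "morchella"]
truth = ["boletus", "russula", "russula", "lactarius", "cantharellus"]
print(top1_accuracy(preds, truth))  # 0.4, i.e. 40% accuracy
```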
Our intention is to build a fungi classifier that is efficient, well-performing and portable. This will solve the problem mushroom pickers face when identifying dubious species in the field, and help spread knowledge of this field, which still requires expertise.
The pipeline will consist of:
The two outstanding capabilities of the OpenVINO kit and FPGAs are:
Mapping CNNs, or any other kind of neural network, to hardware is not an easy task. For this reason, a variety of frameworks have emerged over the last years: CMSIS-NN, Arm NN or TensorFlow Lite for Microcontrollers, among others. In all cases, two important points stand out: the level of human interaction needed to port the networks, and the hardware targets to which those frameworks can port them.
In the case of FPGAs, OpenVINO offers a simple, flexible and effective framework that allows porting different CNNs without varying the API calls. This flexibility enables excellent, fast prototyping in which different networks can be tested.
Traditionally, the target platforms for neural networks were GPUs. However, with the advent of the previously named frameworks, those networks can now be ported easily to resource-constrained platforms. But what happens if more power is needed? GPUs are not energy efficient, since they have high energy profiles, while FPGAs offer a good tradeoff between fast, powerful computation and a lower energy profile.
The purpose of the design is to carry out fungi classification through CNNs in an efficient and fast manner. For this, Intel FPGAs together with OpenVINO offer an excellent solution.
The application is devoted to fungi classification, and the targeted users are expert or novice fungi gatherers who want an in-field double check or first identification of a fungus.
Usually, GPUs are used for highly complex classification tasks. However, this hardware has a high energy profile. If a server were used to classify fungi images arriving over the internet, more efficient hardware would be desirable. Here, FPGAs offer a good tradeoff between energy profile and inference speed.
Fungi seeking is a popular activity in many countries. However, it is not an easy task, since around 75,000 identified fungi species exist. For this reason, a mechanism that can identify mushrooms or fungi when the seeker is doubtful is useful and can help avoid risky situations.
The main process consists of the following steps:
An improvement over this process would be to implement a web API on the computer to which the FPGA is connected. With this implementation, anyone with access could send a photo from their mobile phone and retrieve a result while standing in front of the mushroom.
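Such an API could be sketched with only the Python standard library. Everything here is illustrative: classify_image is a hypothetical stub standing in for the actual FPGA inference call, and the port is arbitrary.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify_image(image_bytes):
    # Hypothetical stub: a real server would forward the bytes to the
    # OpenVINO inference engine and return the predicted species.
    return {"species": "unknown", "confidence": 0.0,
            "bytes_received": len(image_bytes)}

class ClassifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the uploaded image from the request body and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        result = json.dumps(classify_image(body)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(result)

# To serve requests from phones on the local network:
# HTTPServer(("0.0.0.0", 8080), ClassifyHandler).serve_forever()
```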
To implement the simplest case, several components are needed: the dataset, the neural networks and the ONNX converter, the OpenVINO converters, and the FPGA OpenVINO Starter Kit.
The dataset is the 2018 FGVCx Fungi Classification Challenge dataset (https://www.kaggle.com/c/fungi-challenge-fgvc-2018/data). It contains around 100,000 fungi images covering around 1,500 different species. It is a challenging dataset due to the differences among fungi and the surroundings in which they are found. Another difficulty is the low number of images per species, which impedes robust training for some of them. The dataset is divided into training, validation and test sets following the percentages 60%, 20% and 20%, respectively.
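The 60/20/20 split can be done with a seeded shuffle over the image indices. A minimal sketch; the sample count and seed are illustrative:

```python
import random

def split_dataset(n_samples, train=0.6, val=0.2, seed=42):
    """Shuffle indices reproducibly and cut them into train/val/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train)
    n_val = int(n_samples * val)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

train_idx, val_idx, test_idx = split_dataset(100_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 60000 20000 20000
```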
For the neural networks, PyTorch is used as the framework. Four networks with different numbers of parameters and components are used: ResNet18, DenseNet121, SqueezeNet and MobileNetV2. The networks are fine-tuned by choosing which layers are frozen; usually 90% of the layers are frozen. Once the networks are trained, they are converted to the ONNX format.
Once in the ONNX format, the models can be converted to the OpenVINO .bin and .xml intermediate representations with the appropriate converters in the OpenVINO suite. Then the network can be loaded for inference on the FPGA with the appropriate API calls, such as the InferenceEngine::CNNNetReader class. For this, the classification demo is used to include the converted models.
There will be three main performance parameters:
For the system to work well, the network has to be as small as possible, to give good latency and to be portable to memory-constrained devices, while also performing well enough not to mislead mushroom seekers.
The results obtained are the following:
|Throughput (fps), GTX 1080|344|285|370|370|
|Throughput (fps), HETERO FPGA/CPU|45|9|25|75|
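The fps figures above can be converted to per-image latency in milliseconds, which is the number a mushroom picker waiting on a result actually cares about:

```python
def fps_to_ms(fps):
    """Convert a throughput figure in frames per second to milliseconds per frame."""
    return 1000.0 / fps

# Two figures taken from the results table above.
for device, fps in [("GTX 1080", 344), ("HETERO:FPGA,CPU", 45)]:
    print(f"{device}: {fps_to_ms(fps):.1f} ms per image")
```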
The results show the differences in efficiency among the networks. ResNet is the biggest, and it incorporates residual connections between consecutive convolutional blocks. Next is DenseNet, which is a little smaller. Its main characteristic is its dense connectivity: each layer is connected to all the others.
In contrast to these two networks, MobileNet and SqueezeNet stand out for their efficient use of resources. The depthwise convolutions and bottleneck layers in the first case, and the fire modules and efficient use of convolutions in the second, make these two networks more efficient. This can be seen in their number of parameters and the accuracy attained on the test set.
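The saving from MobileNet's depthwise separable convolutions can be quantified by comparing parameter counts against a standard convolution. The layer sizes below are illustrative, not taken from the actual networks:

```python
def standard_conv_params(k, c_in, c_out):
    # A standard convolution learns one k x k filter per (input, output) channel pair.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel.
    # Pointwise step: a 1 x 1 convolution mixing channels.
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3x3 kernels, 128 input channels, 256 output channels.
std = standard_conv_params(3, 128, 256)        # 294912 parameters
sep = depthwise_separable_params(3, 128, 256)  # 33920 parameters
print(std, sep, round(std / sep, 1))           # roughly 8.7x fewer parameters
```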
The main software flow is the following:
The networks are trained and fine-tuned in PyTorch, using models from the torchvision model zoo. Adapting them to the target dataset requires only one architecture change: the classification layer output. The code is simple; in the case of MobileNetV2 it is:
import torch.nn as nn
import torchvision

model_conv = torchvision.models.mobilenet_v2(pretrained=True)
# MobileNetV2's classifier is an nn.Sequential(Dropout, Linear),
# so the Linear layer sits at index 1.
num_ftrs = model_conv.classifier[1].in_features
model_conv.classifier[1] = nn.Linear(num_ftrs, classes)
The number of frozen parameters can be controlled by deactivating gradient tracking. For this, PyTorch offers a simple interface:
total_param = round(percentage_frozen*len(list(model_conv.parameters())))
for param in list(model_conv.parameters())[:total_param]:
param.requires_grad = False
where percentage_frozen is the fraction of layers to freeze.
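The freezing arithmetic can be checked without PyTorch by substituting a plain list for the model's parameters. The count of 100 parameter tensors here is a hypothetical example, not MobileNetV2's real figure:

```python
percentage_frozen = 0.9

# Hypothetical stand-in for list(model_conv.parameters()):
# a network with 100 parameter tensors.
params = [{"requires_grad": True} for _ in range(100)]

# Same logic as the PyTorch snippet: freeze the first 90% of tensors.
total_param = round(percentage_frozen * len(params))
for param in params[:total_param]:
    param["requires_grad"] = False

frozen = sum(1 for p in params if not p["requires_grad"])
print(frozen, len(params) - frozen)  # 90 frozen, 10 trainable
```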
Once the network is trained, conversion to ONNX is required in order to pass the networks to the FPGA. PyTorch networks can be easily exported to ONNX with the torch.onnx.export command. To convert the model, a sample batch or image must be run through the network, since the export traces the model with this dummy input.
Finally, to convert the ONNX model to the OpenVINO Intermediate Representation, the Model Optimizer script is run: python3 mo.py --input_model <model>.onnx. This produces the .bin and .xml files, which can then be loaded with the OpenVINO suite.
For implementing the demo, we use the classification demo and substitute the image and IR.
./my_classification_sample -i <path_to_image> \
    -m demo/my_ir/squeezenet1.1.xml -d "HETERO:FPGA,CPU"
The results are straightforward.