AS014 » Dermoscopic image processing for cancer detection
When skin cancer is not detected at an early stage, it can cause metastasis; consequently, the cancer spreads throughout the body. Based on this fact, this proposal combines image processing and machine learning so that a computer can assist in cancer detection in acquired images according to existing patterns.
The main idea is to provide an application, based on known medical methodologies, that does not require tissue extraction. Using computational tools, the system will be able to analyze an image and infer whether the captured spot is benign or whether treatment is needed.
First, the application targets physicians in training who seek a reference for developing image-analysis skills to infer a medical condition. Second, thanks to its modular implementation, the system can be used to teach digital image processing methodologies for a specific application based on its requirements. Finally, the approach shows how technological advances can be used to accelerate application development.
The diagram in Figure 1 outlines the system, which is divided into two CNN-based main processes: Segmentation and Classification.
Fig 1. Involved process to analyze the images
The first process isolates the region of interest in the image in order to discard pixels that provide no significant information to the classifier. This is achieved with machine learning approaches that recognize the relevant patterns and select the effective analysis area. The second process works with a masked image in which background pixels are removed, allowing the classifier to weight only the pixels that belong to the spot. Feature extraction then uses the region moments to calculate the Area and Eccentricity, providing additional information about the segmentation.
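The masking step between the two processes can be sketched as follows; this is a minimal illustration assuming a binary mask where 1 marks lesion pixels and 0 marks background:

```python
import numpy as np

def mask_background(image, mask):
    """Zero out background pixels so the classifier only weights lesion pixels.

    image: HxWx3 uint8 array; mask: HxW binary array (1 = lesion, 0 = background).
    """
    return image * mask[:, :, None].astype(image.dtype)

# Toy 2x2 image with the left column marked as lesion.
image = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[1, 0], [1, 0]], dtype=np.uint8)
masked = mask_background(image, mask)
```

In the real pipeline the mask comes from the segmentation network rather than being hand-made, but the pixel-wise product is the same.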
The deployment consists of collaboration between the host computer and the OpenVino Starter Kit, which provides a powerful and scalable tool for implementing machine learning models. As shown in Fig. 2, the main processor is the first contact with the data; the algorithm, implemented in the FPGA, requests the data from the processor, with the OpenVino platform acting as the integrator of the technologies, and the results are transferred from the FPGA back to the CPU for display.
Fig 2. Collaboration between the host computer and the OpenVino Starter Kit
There is a principal deep-learning architecture for this image processing task: the U-Net.
The U-Net is a proposal by Ronneberger et al. for biomedical image segmentation. The architecture exploits down-sampling and up-sampling to classify pixels according to the number of classes needed. The down-sampling, or contracting path, follows a typical Convolutional Neural Network (CNN) design; according to the original source, it consists of 3x3 unpadded convolutions, each followed by the Rectified Linear Unit activation function (ReLU), and a 2x2 max pooling, doubling the number of feature maps in each down-sampling block. On the other hand, to recover the classification mask, the expanding path starts with an up-sampling followed by a 2x2 "up-convolution" that reduces the number of feature channels, a concatenation with the corresponding feature map from the same level of the contracting path, and two 3x3 convolutions, each followed by a ReLU. The architecture is illustrated below.
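The spatial arithmetic of the contracting path can be checked with a short bookkeeping sketch (not a trainable model): each 3x3 unpadded convolution trims 2 pixels per dimension, and each 2x2 max pooling halves the map. For the original 572x572 input this reproduces the 32x32 lowest resolution mentioned in the figure caption.

```python
def contracting_path(size, blocks=4):
    """Spatial size entering each level of the U-Net contracting path.

    Each level applies two 3x3 unpadded convolutions (-2 pixels each)
    followed by a 2x2 max pooling (size halved).
    """
    trace = [size]
    for _ in range(blocks):
        size -= 4      # two 3x3 unpadded convolutions
        size //= 2     # 2x2 max pooling
        trace.append(size)
    return trace

print(contracting_path(572))  # -> [572, 284, 140, 68, 32]
```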
Fig 3. U-net architecture (example for 32x32 pixels in the lowest resolution).
A minimal change to the architecture is the addition of padding to the convolutional operations and the adjustment of the input resolution to 160x320 px images.
The first step is to define an architecture for the inference engine; in our case, the U-Net architecture was selected. In the first instance, U-Net is trained for a segmentation problem in order to determine which regions are of interest for the problem at hand. If the architecture proves feasible, a transfer learning approach is envisioned for a second stage, because the architecture can recognize all the features that define a dermoscopic lesion; consequently, the trained architecture enables skin problem detection.
The dataset used provides both the segmentation mask and the clinical diagnosis. Given this information, we can train U-Net, take the trained model, and reuse only the first half of the architecture to train a new model that solves the classification problem.
For the first training we used the PH2 dataset in order to test the training process with a small dataset, achieving accuracies of 97% and 92% for the training and validation subsets respectively; these results are measured with Intersection over Union (IoU). The evolution over the epochs is shown in the next figure.
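The IoU measure used above is the ratio between the overlap and the union of the predicted and ground-truth masks; a minimal NumPy version for binary masks:

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union for binary segmentation masks (1 = lesion)."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return np.logical_and(pred, target).sum() / union

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(iou(a, b))  # -> 0.5 (overlap of 1 pixel over a union of 2)
```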
Fig 4. Training evolution.
The training was done on a computer with the following specifications:
Processor: Intel i9-9900k
Memory: 32 GB RAM
Graphics: NVIDIA GTX 1080
Platform: TensorFlow 1.12
OS: Ubuntu 16.04
Since U-Net begins like a CNN architecture, a transfer learning approach is used to generate a classifier for the dermoscopic images. Only the first half of the architecture is used to extract the features, and the new output is constructed with a small Fully Connected Network (FCN) ending in a softmax over the three classes available in the PH2 dataset.
The first optimization step is model freezing, which removes hyperparameters from the model and ensures that each parameter has a static value with good precision. The second stage consists of generating the Intermediate Representation (IR) model, splitting the model into two files: parameters (.bin) and architecture (.xml). The IR needs two extra parameters: scale_values, because the data is normalized during the optimization process, and reverse_channels, to match OpenCV image handling.
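A sketch of how the Model Optimizer invocation could be assembled; the flag names follow the OpenVINO Model Optimizer CLI (`--scale_values`, `--reverse_input_channels`), while the file names, input shape, and scale value below are hypothetical:

```python
def mo_command(frozen_pb, input_shape, scale=255.0):
    """Build the Model Optimizer command that emits the IR (.xml/.bin) pair."""
    return [
        "mo_tf.py",
        "--input_model", frozen_pb,            # frozen TensorFlow graph
        "--input_shape", str(input_shape),
        "--scale_values", f"[{scale}]",        # undo training-time normalization
        "--reverse_input_channels",            # match OpenCV's BGR ordering
        "--data_type", "FP16",
    ]

cmd = mo_command("unet_frozen.pb", [1, 160, 320, 3])
```

The resulting list would be passed to `subprocess.run` on a machine with OpenVINO installed.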
The designed system performs the inference for segmentation and classification with the InferenceEngine libraries, while OpenCV processes the images for visualization, as the next three examples show.
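Before inference, each image must be adapted to the model input described later (576x767 px resized to 160x320 px, U8 precision, channels-first layout). A NumPy stand-in for the OpenCV resize/blob step, using nearest-neighbor sampling for brevity:

```python
import numpy as np

def preprocess(image, out_h=160, out_w=320):
    """Resize an HxWx3 uint8 image to the model input and NCHW (U8) layout."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h      # nearest-neighbor row indices
    cols = np.arange(out_w) * w // out_w      # nearest-neighbor column indices
    resized = image[rows][:, cols]            # (out_h, out_w, 3)
    blob = resized.transpose(2, 0, 1)[None]   # HWC -> NCHW with batch dim
    return blob.astype(np.uint8)

blob = preprocess(np.zeros((576, 767, 3), dtype=np.uint8))
```

In the actual system this role is played by OpenCV (`cv2.resize`) feeding the InferenceEngine input blob.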
Fig 5. System test examples
This optimization allows the use of a computer with restricted resources, because the heavy processing is offloaded to the FPGA, leaving the processor free for other tasks and avoiding a system crash. Another important use of the inference results is the statistical feature extraction from the mask obtained in the segmentation inference. For this, the OpenCV libraries are sufficient to obtain statistical data about the segmentation, providing more information for a computer-aided diagnosis. Among the features we can obtain are the area, eccentricity, moments, and more, in order to inform the user and show important information about the region of interest, such as the most symmetrical axis.
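The area and eccentricity can be derived from the second-order central moments of the mask, as `cv2.moments` would provide; a self-contained NumPy sketch of the same computation:

```python
import numpy as np

def region_features(mask):
    """Area and eccentricity of a binary mask via statistical moments."""
    ys, xs = np.nonzero(mask)
    area = xs.size                      # zeroth moment: pixel count
    cx, cy = xs.mean(), ys.mean()       # centroid
    mu20 = ((xs - cx) ** 2).mean()      # normalized central moments
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    # Eigenvalues of the covariance matrix give the ellipse axis lengths.
    common = np.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2) / 2
    lam_max = (mu20 + mu02) / 2 + common
    lam_min = (mu20 + mu02) / 2 - common
    ecc = np.sqrt(1 - lam_min / lam_max) if lam_max > 0 else 0.0
    return area, ecc

# A square region is symmetric, so its eccentricity is ~0.
area, ecc = region_features(np.ones((10, 10), dtype=np.uint8))
```

The eigenvectors of the same covariance matrix give the major and minor symmetry axes drawn over the region of interest.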
Fig 6. Region of Interest feature extraction and representation
In the first attempt at system development we had a computer with limited resources, achieving 75% accuracy in the segmentation engine. The equipment described above allowed us to increase the training computation and refine the model configuration; with that, we reached a model with 92% accuracy and a Cross Entropy Loss of 0.15.
The dataset has high variation in image acquisition, such as over-illumination, hair presence, lens coupling, and so on. These effects are handled by the architecture and result in an inference error controlled by the Cross Entropy and IoU measures, which could be more effective with smaller architectures in acquisition scenarios with less variation.
In addition, we achieved a segmentation time of 93.88 ms and a classification time of 22.39 ms with models using U8 and FP16 precisions for the input and output respectively. The input images are resized from 576x767 px to 160x320 px to fit the loaded models, and the segmentation is post-processed to obtain the major and minor symmetry axes, area, and eccentricity with statistical moments. The total processing time rises to 139.02 ms, which means we can process 7.19 FPS including all processing stages.
Fig 7. System implementation workflow
The system is an integral design because specific resources are needed for each stage. The CPU, Nvidia GPU, and TensorFlow platform work together in the training process. The trained architectures need an optimization step in order to obtain the best version of the generated model; for this, cooperative work between the TensorFlow and OpenVINO platforms must be established for the IR model generation. After that, the CPU, GPU, and FPGA hardware platforms can follow a heterogeneous approach: the CPU and GPU process the image for adaptation and visualization, while the CPU and FPGA perform the inference on the obtained data. The data to process consists of images acquired from a telemedicine perspective; consequently, a patient or physician only needs to take and send a photo to a standard computer to obtain a preliminary diagnosis.