Our paper, “Multi-resolution-Tract CNN with Hybrid Pretrained and Skin-Lesion Trained Layers”, was accepted and presented as an oral talk at the Machine Learning in Medical Imaging (MLMI) workshop (part of the MICCAI conference).
In this work, we used a convolutional neural network (CNN) to classify 10 different types of skin lesions, spanning both melanoma and non-melanoma conditions.
The key technical contribution was to use multiple tracts (or paths) within the neural network so that it could be trained (and tested) on an image at multiple resolutions simultaneously. Additionally, we extended a CNN pretrained on a single image resolution to work over multiple image resolutions.
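To make the multi-tract idea more concrete, here is a minimal sketch in PyTorch. This is not our implementation: the layer sizes, the two-resolution setup, and the fusion by concatenation are illustrative assumptions, not the exact architecture from the paper.

```python
# A minimal sketch of the multi-tract idea, assuming PyTorch and made-up
# layer sizes; this is NOT the exact architecture from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTractCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # A small conv stack acting as one "tract"; here the same tract
        # (i.e., shared weights) is applied at every resolution.
        self.tract = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # fixed-size vector per tract
        )
        self.classifier = nn.Linear(64 * 2, num_classes)

    def forward(self, x):
        # Run the image through the tract at full and half resolution,
        # then fuse both responses for classification.
        x_half = F.interpolate(x, scale_factor=0.5, mode="bilinear",
                               align_corners=False)
        f_full = self.tract(x).flatten(1)
        f_half = self.tract(x_half).flatten(1)
        return self.classifier(torch.cat([f_full, f_half], dim=1))

# Example: a batch of four RGB images -> scores over 10 lesion classes.
print(MultiTractCNN()(torch.randn(4, 3, 224, 224)).shape)  # (4, 10)
```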
Here are our slides presented at MLMI (thanks Aïcha!) showing our deep learning approach to classifying skin disease:
There were some good questions asked after the talk. As I wasn’t able to attend this workshop, I thought I would try to answer the questions here.
> How accurate are clinicians on the same task?
Not sure. I haven’t found statistics for the case of general skin disease classification. I’ll keep an eye out for this.
> Were images cropped around the lesion, and if yes how was the lesion detected?
The lesions were identified by a human. As far as I can tell from reading the related documentation, the person simply took an image of the lesion. Not sure if a rough lesion cropping was done.
See:
Ballerini, L., Fisher, R. B., Aldridge, B., & Rees, J. (2013). A Color and Texture Based Hierarchical K-NN Approach to the Classification of Non-melanoma Skin Lesions. In M. E. Celebi & G. Schaefer (Eds.), Color Medical Image Analysis (Vol. 6, pp. 63–86). Springer Netherlands.
http://link.springer.com/chapter/10.1007%2F978-94-007-5389-1_4
https://licensing.eri.ed.ac.uk/i/software/dermofit-image-library.html
> Did you try not sharing the network’s parameters in conv layers?
No, I did not, but this would be an interesting and worthwhile thing to try.
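For context: in our network, the conv-layer parameters are shared across the tracts, so every resolution is processed by the same filters; not sharing would mean each tract gets its own copy. A tiny, hypothetical PyTorch sketch of the difference:

```python
import copy
import torch.nn as nn

# One conv layer standing in for a whole tract's conv stack.
conv = nn.Conv2d(3, 32, kernel_size=5, padding=2)

# Shared parameters: both tracts point at the same module, so the same
# filters process every image resolution.
shared_tracts = nn.ModuleList([conv, conv])

# Unshared parameters (the suggestion): an independent copy per tract,
# letting the filters specialize to a single resolution.
unshared_tracts = nn.ModuleList([conv, copy.deepcopy(conv)])
```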
> Why is the reported validation accuracy lower than test?
There’s some natural variation between the folds, especially with these relatively small dataset sizes. Also, there’s probably some overfitting to the validation set, since the hyper-parameters were tweaked to work well on it.
> Why did you use AlexNet and did you try other pre-trained networks?
AlexNet is a very common network, and other works have shown that its learned features generalize particularly well. See:
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2014). DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In International Conference on Machine Learning (ICML) (Vol. 32, pp. 647–655).
http://arxiv.org/abs/1310.1531
Other networks could also be tried. In fact, I did quickly try a few other networks, but I didn’t notice a clear improvement (generally there was a slight decrease in overall accuracy), though I did not check this rigorously.
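For anyone who wants to experiment with this themselves, here is a minimal sketch of the DeCAF-style reuse of pretrained AlexNet conv layers. It uses torchvision purely as a stand-in (not what we used in the paper), and the 10-class head is just for illustration.

```python
# Sketch: reuse the convolutional layers of an ImageNet-pretrained AlexNet
# as a feature extractor, in the spirit of DeCAF. torchvision is only a
# stand-in here, not what the paper used.
import torch
import torch.nn as nn
from torchvision import models

alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Freeze the pretrained conv layers; only the new head would be trained.
for p in alexnet.features.parameters():
    p.requires_grad = False

# Replace the 1000-class ImageNet head with a 10-class skin-lesion head.
head = nn.Linear(256 * 6 * 6, 10)

x = torch.randn(1, 3, 224, 224)                      # one dummy RGB image
f = alexnet.avgpool(alexnet.features(x)).flatten(1)  # pretrained features
print(head(f).shape)                                 # torch.Size([1, 10])
```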