Johannes Ulén, Fredrik Kahl and I recently worked on a data set from the ICPR Contest on HEp-2 Cells Classification. The task is to classify images of cells into one of six possible categories. The image below shows examples from the data set:
This type of image classification, which takes place in a very controlled setting, is relatively easy, in the sense that it is possible to achieve good results. For these “easy” tasks, convolutional neural networks have proven extremely effective. See, for example, the MNIST database of handwritten digits and the German traffic sign recognition benchmark. In the latter benchmark, convolutional neural networks perform better than humans. They have big drawbacks, however, the biggest being training time. Even with small images and powerful GPUs, training can take weeks. The cell images are larger, so we decided to use random forests instead, which gave a total training time of a few seconds.
The interesting part is computing relevant features from the image to be classified. Our approach thresholds the image at many different intensity levels and computes shape features (area, perimeter, etc.) on each resulting binary image. We also compute gradients at various scales and texture information such as co-occurrences. The nice thing is that one does not have to worry too much about the relevance of each individual feature: the machine learning algorithm will take care of that for you.
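To make the thresholding idea concrete, here is a minimal sketch, assuming NumPy and a grayscale image stored as a 2-D array. The function name `shape_features`, the threshold values, and the definition of perimeter as the number of foreground pixels touching the background are illustrative assumptions and not necessarily what the released code does.

```python
import numpy as np


def shape_features(image, thresholds):
    """Return [area, perimeter] for each threshold, concatenated into one vector.

    Hypothetical helper for illustration only.
    """
    features = []
    for t in thresholds:
        mask = image > t  # binary image at this intensity level
        area = int(mask.sum())

        # Pad with background so pixels on the image border count as boundary.
        padded = np.pad(mask, 1, mode="constant", constant_values=False)
        # A foreground pixel is interior if all four of its 4-neighbours
        # are foreground as well.
        interior = (
            padded[1:-1, 1:-1]
            & padded[:-2, 1:-1]
            & padded[2:, 1:-1]
            & padded[1:-1, :-2]
            & padded[1:-1, 2:]
        )
        # Assumed perimeter measure: number of non-interior foreground pixels.
        perimeter = area - int(interior.sum())
        features.extend([area, perimeter])
    return np.array(features, dtype=float)


# Toy 8-bit "cell": a bright 4x4 square on a dark background.
image = np.zeros((8, 8), dtype=np.uint8)
image[2:6, 2:6] = 200
features = shape_features(image, thresholds=[50, 100, 250])
print(features)
```

Stacking one such vector per image gives a feature matrix that could be passed, together with the class labels, to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.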
The source code is available on GitHub.