Cell Classification

Johannes Ulén, Fredrik Kahl and I recently worked on a data set from the ICPR Contest on HEp-2 Cells Classification. The task is to classify images of cells into one of six categories. The image below shows examples from the data set:

Each row shows randomly selected images from one class.

This type of image classification, which takes place in a very controlled setting, is relatively easy in the sense that it is possible to achieve good results. For these “easy” tasks, convolutional neural networks have proven extremely effective; see, for example, the MNIST database of handwritten digits and the German Traffic Sign Recognition Benchmark. In the latter benchmark, convolutional neural networks perform better than humans. They have big drawbacks, however, the biggest being training time. Even with small images and powerful GPUs, training can take weeks. The cell images are larger, so we decided to use random forests instead, which gave a total training time of a few seconds.

The interesting part is computing relevant features from the image to be classified. Our approach thresholds the image at many different intensity levels and computes shape features (area, perimeter, etc.) on each resulting binary image. We also compute gradients at various scales and texture information such as co-occurrences. The nice thing is that one does not have to worry too much about the relevance of each individual feature: the machine learning algorithm will take care of that for you.
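To make the idea concrete, here is a minimal sketch of two of the feature types mentioned above: shape features from thresholding at several levels, and a co-occurrence matrix. It is not the code from our repository. The function names, the 4-connected perimeter definition, the horizontal-neighbor co-occurrence and the quantization to a few gray levels are all assumptions chosen to keep the example short.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of integer intensities


def shape_features(image: Image, threshold: int) -> List[float]:
    """Area and perimeter of the binary mask `image >= threshold` (hypothetical helper)."""
    h, w = len(image), len(image[0])
    mask = [[image[y][x] >= threshold for x in range(w)] for y in range(h)]
    area = sum(sum(row) for row in mask)
    # Perimeter (assumed definition): number of foreground pixel edges that
    # border background or the image boundary, using 4-connectivity.
    perimeter = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    perimeter += 1
    return [float(area), float(perimeter)]


def cooccurrence(image: Image, levels: int, max_value: int = 255) -> List[List[int]]:
    """Counts of quantized intensity pairs for horizontally adjacent pixels."""
    quantized = [
        [min(v * levels // (max_value + 1), levels - 1) for v in row]
        for row in image
    ]
    counts = [[0] * levels for _ in range(levels)]
    for row in quantized:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    return counts


def feature_vector(image: Image, thresholds: List[int], levels: int = 4) -> List[float]:
    """Concatenate shape features for every threshold with the co-occurrence counts."""
    features: List[float] = []
    for t in thresholds:
        features.extend(shape_features(image, t))
    for row in cooccurrence(image, levels):
        features.extend(float(c) for c in row)
    return features


# Tiny example image: a bright 2x2 blob with one dimmer pixel.
img = [
    [0, 0, 0, 0],
    [0, 200, 200, 0],
    [0, 200, 100, 0],
    [0, 0, 0, 0],
]
vec = feature_vector(img, thresholds=[50, 150], levels=4)
```

Each image becomes one fixed-length vector like `vec`, and the vectors for all training images form the input to the classifier. Adding more thresholds or more texture statistics only lengthens the vector; no feature has to be vetted by hand.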

The source code is available on GitHub.


One Response to Cell Classification

  1. vheinitz says:

    Great article! Very practical and pragmatic selection of features! I wonder if your classifier has found its way into a commercial product?
    Does the code work on Octave too?

    I’m currently working on a prototype for a commercial HEp-2 classifier. We started with 4 basic patterns, and the best accuracy we could achieve is > 98%. However, I’m analyzing entire images, so I can first select the cells that are most similar in size and histogram before starting the pattern analysis of the cells. That makes it easier.

    What I’ve put the most effort into so far is feature selection. I have been selecting features carefully, partly to keep recognition time down and partly because redundant or irrelevant features make recognition less accurate. I’m using about 10 features (Haralick + some from OpenCV’s feature2d-lib) and classifying with an SVM. However, I’ve never tried to use more than 50 features. Maybe increasing the number of features to 900, as in your case, will increase the accuracy further. I’m curious to test it.

    In any case, I’m going to implement and check the features you described in my system.
