Gluon CV: add image classification, detection and segmentation to your application in minutes

Apache MXNet is an open source library for Deep Learning, supporting both symbolic and imperative programming. The latter is implemented by the Gluon API, which we discussed before.

One of the cool features of Gluon is its extensive model zoo, where you can grab a large number of pre-trained image models as easily as this:

from mxnet.gluon.model_zoo import vision
net = vision.squeezenet1_1(pretrained=True)

Guess what? It just got better!

The Terminator: real-time classification and segmentation. He *did* come from the future!

Gluon CV

Gluon CV (Computer Vision) is a brand new project which extends the model zoo beyond image classification to image detection and image segmentation.

Similar models were previously available on GitHub (such as these), but using them wasn’t always straightforward. A while ago, I also showed you how to use pre-trained models with the symbolic API, but Gluon makes it much simpler.

Gluon CV also includes:

Let’s try this thing.

Installation

Gluon CV is still a very young project. If you want the latest features and bug fixes, I’d recommend installing the latest pre-release packages of MXNet and Gluon CV (that’s what the --pre flag does). You might want to do this in a virtual environment to avoid messing up your Python environment :)

$ virtualenv -p python3 ~/mxnet-gluoncv
$ source ~/mxnet-gluoncv/bin/activate
$ pip3 install mxnet gluoncv --pre --upgrade
$ python3
>>> import mxnet, gluoncv
>>> mxnet.__version__
'1.2.0'
>>> gluoncv.__version__
'0.2.0'
>>> exit()
$ git clone https://github.com/dmlc/gluon-cv

Good to go.

Image classification

Let’s first try to classify this image with demo_imagenet.py.

Illustration for Image classification
$ python3 demo_imagenet.py --model resnet50_v2 --input-pic kreator.jpg

The input picture is classified to be
[electric_guitar], with probability 0.671.
[drumstick], with probability 0.103.
[stage], with probability 0.076.
[banjo], with probability 0.024.
[acoustic_guitar], with probability 0.016.
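Those probabilities are presumably a softmax over the network’s 1,000 raw ImageNet scores, of which the script keeps the five largest (I’m assuming the usual approach here, not quoting demo_imagenet.py). Here’s a dependency-free sketch of that last step; the labels and scores are made-up toy values, not real model output.

```python
import math


def softmax(scores):
    """Turn raw network scores (logits) into probabilities that sum to 1."""
    # Subtract the max score first so math.exp() can't overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, labels, k=5):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]


# Hypothetical toy scores for a 6-class problem (a real ImageNet model outputs 1000).
labels = ["electric_guitar", "drumstick", "stage", "banjo", "acoustic_guitar", "cello"]
scores = [4.0, 2.1, 1.8, 0.7, 0.3, -1.0]
for label, p in top_k(scores, labels):
    print("[%s], with probability %.3f." % (label, p))
```

Sorting indices rather than values keeps each probability attached to its label, which is all you need to print a top-5 list like the one above.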

Looking at the script itself, all it really takes is about 5 lines of code!

  • Load a pre-trained model.
  • Read and transform the image (resize, crop, normalize colors).
  • Run the prediction and display the top 5 categories.
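For reference, step 2 usually follows the standard ImageNet evaluation recipe: resize so the shorter side is 256 pixels, take the central 224x224 crop, then scale each channel to [0, 1] and normalize it with the ImageNet mean and standard deviation. The sketch below computes those pieces in plain Python; the 256/224 sizes and the mean/std values are the conventional ones and an assumption about what the script actually uses.

```python
MEAN = (0.485, 0.456, 0.406)  # per-channel RGB mean conventionally used for ImageNet models
STD = (0.229, 0.224, 0.225)   # per-channel RGB standard deviation


def resize_short(width, height, short=256):
    """New (width, height) so that the shorter side equals `short`, keeping the aspect ratio."""
    if width < height:
        return short, int(round(height * short / width))
    return int(round(width * short / height)), short


def center_crop_box(width, height, size=224):
    """(left, top, right, bottom) of a size x size square in the middle of the image."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to the normalized floats the network expects."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


# Example: a 1280x720 photo.
w, h = resize_short(1280, 720)
print(w, h)                    # 455 256
print(center_crop_box(w, h))   # (115, 16, 339, 240)
print(normalize_pixel((124, 116, 104)))
```

Resizing before cropping keeps the whole subject at roughly the scale the model saw during training, instead of squashing a wide photo into a square.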

Let’s try image detection.

Image detection

The SSD models used below are trained on the Pascal VOC dataset, so they detect its 20 classes: person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor.
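A detector doesn’t output these names directly: it returns integer class IDs that index into a label list. Here’s a minimal lookup, assuming the alphabetical VOC ordering that VOC-trained models conventionally use (check your model’s own class list before relying on it):

```python
# The 20 Pascal VOC object classes in the conventional alphabetical order
# (an assumption for any specific model: check its own class list).
VOC_CLASSES = (
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)


def class_name(class_id):
    """Translate a detector's integer class ID into a readable label."""
    if not 0 <= class_id < len(VOC_CLASSES):
        raise ValueError("unknown class id: %d" % class_id)
    return VOC_CLASSES[class_id]


print(class_name(14))  # person
```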

Let’s grab a picture displaying some of these objects and run an SSD (Single Shot MultiBox Detector) model with demo_ssd.py.

$ python3 demo_ssd.py --network ssd_512_resnet101_v2_voc --images room.jpg
Illustration for Image detection

This is pretty good! The painting was even detected as a person, which makes sense I think.
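Why only one box per object? SSD predicts a large number of candidate boxes in a single pass, and several of them usually land on the same object. Like most detectors, it keeps the best one and drops heavily overlapping ones with non-maximum suppression (NMS). Here’s a simplified, class-agnostic sketch; real implementations typically run it per class, and the 0.45 overlap threshold is just a common default, not necessarily the one Gluon CV uses.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.45):
    """Greedy non-maximum suppression: return indices of the boxes kept, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it doesn't overlap too much with a better one already kept.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes around the same object, plus one somewhere else.
boxes = [(10, 10, 110, 210), (14, 12, 112, 205), (300, 50, 400, 150)]
scores = [0.92, 0.85, 0.70]
print(nms(boxes, scores))  # [0, 2]
```

The two first boxes overlap with an IoU of about 0.91, so the lower-scoring one is suppressed; the third box doesn’t overlap anything and survives.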

Looking at the script itself, once again this takes fewer than 10 lines of code.
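Whatever the exact code, a detector’s raw output needs a little post-processing before drawing. Assuming it comes back as three parallel arrays (class IDs, scores and boxes, with an ID of -1 marking unused slots), picking what to draw boils down to a confidence threshold. The helper name keep_confident, the 0.5 threshold and the sample values below are hypothetical, for illustration only:

```python
def keep_confident(class_ids, scores, boxes, class_names, threshold=0.5):
    """Turn raw detector output into (label, score, box) tuples worth drawing.

    Assumes three parallel sequences with one entry per candidate detection,
    where a class ID of -1 marks an empty (padding) slot.
    """
    results = []
    for cid, score, box in zip(class_ids, scores, boxes):
        if cid < 0 or score < threshold:
            continue  # padding slot, or not confident enough to display
        results.append((class_names[int(cid)], score, box))
    return results


# Hypothetical raw output for a living-room picture; IDs follow the VOC order.
names = {8: "chair", 14: "person", 17: "sofa"}  # just the subset needed here
class_ids = [14, 17, 8, 8, -1]
scores = [0.91, 0.88, 0.62, 0.31, -1.0]
boxes = [(120, 40, 260, 400), (300, 200, 620, 420), (20, 250, 110, 430),
         (640, 260, 700, 380), (0, 0, 0, 0)]
for label, score, box in keep_confident(class_ids, scores, boxes, names):
    print(label, score, box)
```

Raising the threshold trades recall for precision: fewer boxes on screen, but fewer false alarms like a painting labeled as a person.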

Image segmentation

Last but not least, let’s use the demo_fcn.py script, based on an FCN (Fully Convolutional Network) model, to segment this image.

Illustration for Image segmentation

Not bad at all! The three cars were picked up, as well as the most visible people. Not sure what the yellow thing in the lower left corner is, though :D
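If you’re wondering where the colors come from: an FCN outputs one score per class for every pixel (21 channels for Pascal VOC: background plus the 20 classes), the mask is the per-pixel argmax, and each class index is then looked up in a color palette. The sketch below assumes scores laid out as [class][row][column] and uses the standard Pascal VOC color map; whether the demo uses this exact palette is an assumption.

```python
def scores_to_mask(scores):
    """Per-pixel argmax. scores[c][y][x] is the score of class c at pixel (x, y)."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def voc_palette(num_classes=21):
    """Pascal VOC-style color map: spread the bits of each class index over R, G and B."""
    palette = []
    for index in range(num_classes):
        r = g = b = 0
        c = index
        for shift in range(7, -1, -1):
            r |= ((c >> 0) & 1) << shift
            g |= ((c >> 1) & 1) << shift
            b |= ((c >> 2) & 1) << shift
            c >>= 3
        palette.append((r, g, b))
    return palette


def colorize(mask, palette):
    """Replace every class index in the mask with its RGB color."""
    return [[palette[label] for label in row] for row in mask]


# Toy scores for a 3-class model on a 2x3 image (a VOC model would have 21 channels).
toy_scores = [
    [[0.90, 0.20, 0.10], [0.80, 0.10, 0.30]],  # class 0
    [[0.05, 0.70, 0.20], [0.10, 0.85, 0.20]],  # class 1
    [[0.05, 0.10, 0.70], [0.10, 0.05, 0.50]],  # class 2
]
print(scores_to_mask(toy_scores))  # [[0, 1, 2], [0, 1, 2]]

# With the VOC color map: 0 = background, 7 = car, 15 = person.
print(colorize([[0, 7, 15]], voc_palette()))
# [[(0, 0, 0), (128, 128, 128), (192, 128, 128)]]
```

Interleaving the index bits across channels keeps neighboring class indices visually distinct without storing a hand-picked color table.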

That’s it for today. Gluon CV really makes it very simple to use state-of-the-art pre-trained models. Please take a look at the code and try it with your own apps: it’s much easier than you probably think.

Happy to answer questions here or on Twitter. For more content, please feel free to check out my YouTube channel.


I approve this message. Give ’em hell, ladies \m/