Gluon CV: add image classification, detection and segmentation to your application in minutes
Apache MXNet is an open source library for Deep Learning, supporting both symbolic and imperative programming. The latter is implemented by the Gluon API, which we discussed before.
One of the cool features of Gluon is its extensive model zoo, where you can grab a large number of pre-trained image models as easily as this:
from mxnet.gluon.model_zoo import vision
net = vision.squeezenet1_1(pretrained=True)
Guess what? It just got better!

Gluon CV
Gluon CV (Computer Vision) is a brand new project which extends the model zoo to:
- More image classification models trained on ImageNet and CIFAR-10: ResNet v1 and v2, MobileNet v2, WideResNet and ResNeXt.
- Single-shot detection models trained on the Pascal VOC dataset: VGG16 300x300, VGG16 512x512 and ResNet 50 512x512.
- Segmentation models trained on Pascal VOC: ResNet 50 and ResNet 101.
Similar models were previously available on GitHub (such as these), but using them wasn’t always straightforward. A while ago, I also showed you how to use pre-trained models with the symbolic API, but Gluon makes it much simpler.
Gluon CV also includes:
- Utility APIs to transform and display images,
- Tutorials,
- Prediction, training and fine-tuning scripts!
Let’s try this thing.
Installation
Gluon CV is still a very young project. If you want to enjoy the latest features and bug fixes, I’d recommend installing the latest MXNet and Gluon CV packages. You might want to do this in a virtual environment to avoid messing up your Python environment :)
$ virtualenv ~/mxnet-gluoncv
$ source ~/mxnet-gluoncv/bin/activate
$ pip3 install mxnet gluoncv --pre --upgrade
$ python3
>>> import mxnet, gluoncv
>>> mxnet.__version__
'1.2.0'
>>> gluoncv.__version__
'0.2.0'
>>> exit()
$ git clone https://github.com/dmlc/gluon-cv
Good to go.
Image classification
Let’s first try to classify this image with demo_imagenet.py.

$ python3 demo_imagenet.py --model resnet50_v2 --input-pic kreator.jpg
The input picture is classified to be
[electric_guitar], with probability 0.671.
[drumstick], with probability 0.103.
[stage], with probability 0.076.
[banjo], with probability 0.024.
[acoustic_guitar], with probability 0.016.
Looking at the script itself, all it really takes is about 5 lines of code!
- Load a pre-trained model.
- Read and transform the image (resize, crop, normalize colors).
- Run the prediction and display the top 5 categories.
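To make the last two steps less magical, here is a minimal NumPy sketch of what they boil down to. This is not the code in demo_imagenet.py: the helper names are mine, the image and the scores are stand-in data rather than real network output, the resize step is left out because it needs an image library, and I'm assuming the usual ImageNet per-channel mean and standard deviation.

```python
import numpy as np

# Per-channel RGB statistics commonly used for ImageNet-trained models
# (assumed here, not read from the demo script).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def center_crop(image, size):
    """Cut a size x size window out of the middle of an HxWx3 image."""
    height, width = image.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]


def preprocess(image, size=224):
    """Crop, scale to [0, 1], normalize colors, reorder to a 1x3xHxW batch."""
    cropped = center_crop(image, size).astype(np.float32) / 255.0
    normalized = (cropped - IMAGENET_MEAN) / IMAGENET_STD
    return normalized.transpose(2, 0, 1)[np.newaxis]


def top_k(scores, labels, k=5):
    """Convert raw scores to softmax probabilities and return the k best."""
    exp = np.exp(scores - scores.max())  # subtract the max for stability
    probs = exp / exp.sum()
    best = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in best]


# Stand-in data: a flat gray "photo" and made-up scores for six labels.
image = np.full((256, 320, 3), 128, dtype=np.uint8)
batch = preprocess(image)

labels = ["electric_guitar", "drumstick", "stage",
          "banjo", "acoustic_guitar", "violin"]
scores = np.array([4.0, 2.1, 1.8, 0.7, 0.3, -1.0])

for name, prob in top_k(scores, labels):
    print("[%s], with probability %.3f." % (name, prob))
```

In the real script, the batch would be fed to the pre-trained network and the scores would come out of it, one per ImageNet category.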
Let’s try image detection.
Image detection
The 20 Pascal VOC classes are: person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor.
Let’s grab a picture displaying some of these objects and run an SSD model with demo_ssd.py.
$ python3 demo_ssd.py --network ssd_512_resnet101_v2_voc --images room.jpg

This is pretty good! The painting was even detected as a person, which makes sense I think.
Looking at the script itself, once again this takes fewer than 10 lines of code.
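A detector typically hands back three things per detected object: a class index, a confidence score and a bounding box. Turning that into something readable is mostly a matter of filtering on the score and looking up the class name. Here is a rough sketch in plain Python. The function name, the 0.5 threshold, the alphabetical class ordering and the sample detections are all assumptions of mine, not taken from demo_ssd.py.

```python
# The 20 Pascal VOC classes, in alphabetical order. The ordering is an
# assumption: check the class list exposed by the model you actually load.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "dining table", "dog", "horse", "motorbike", "person",
    "potted plant", "sheep", "sofa", "train", "tv/monitor",
]


def keep_confident(class_ids, scores, boxes, threshold=0.5):
    """Return (label, score, box) for detections scoring at least threshold."""
    kept = []
    for class_id, score, box in zip(class_ids, scores, boxes):
        # Negative ids are treated as empty slots here (an assumption).
        if class_id < 0 or score < threshold:
            continue
        kept.append((VOC_CLASSES[class_id], score, box))
    return kept


# Made-up detections: boxes are (xmin, ymin, xmax, ymax) in pixels.
class_ids = [14, 8, 17, 14, -1]
scores = [0.98, 0.91, 0.64, 0.31, -1.0]
boxes = [
    (40, 60, 180, 420),
    (210, 250, 330, 430),
    (300, 220, 500, 400),
    (20, 10, 90, 120),
    (0, 0, 0, 0),
]

detections = keep_confident(class_ids, scores, boxes)
for label, score, box in detections:
    print("%-12s %.2f %s" % (label, score, box))
```

Lowering the threshold shows more boxes, at the price of more false positives.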
Image segmentation
Last but not least, let’s use the demo_fcn.py script to segment this image.


Not bad at all! The three cars were picked up, as well as the most visible people. Not sure what the yellow thing in the lower left corner is, though :D
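If you're wondering where the colors come from: a segmentation network produces one score per class for every pixel, and the colored picture is simply the best-scoring class at each pixel, painted with a palette. A tiny NumPy sketch of that idea follows. The three classes, the palette and the scores are made up for illustration, and the function names are not from demo_fcn.py.

```python
import numpy as np

# Hypothetical 3-class palette (RGB): background, car, person.
PALETTE = np.array(
    [[0, 0, 0], [128, 128, 128], [192, 128, 128]], dtype=np.uint8
)


def scores_to_mask(scores):
    """(num_classes, H, W) scores -> (H, W) index of the best class per pixel."""
    return scores.argmax(axis=0)


def colorize(mask, palette):
    """(H, W) class indices -> (H, W, 3) RGB image using the palette."""
    return palette[mask]


# Made-up scores for a 2x3 "image".
scores = np.array([
    [[0.90, 0.20, 0.10], [0.80, 0.10, 0.30]],  # background
    [[0.05, 0.70, 0.20], [0.10, 0.80, 0.10]],  # car
    [[0.05, 0.10, 0.70], [0.10, 0.10, 0.60]],  # person
])

mask = scores_to_mask(scores)
colored = colorize(mask, PALETTE)
print(mask)
print(colored.shape)
```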
That’s it for today. Gluon CV makes it really simple to use state-of-the-art pre-trained models. Please take a look at the code and try it with your own apps: it’s much easier than you probably think.
Happy to answer questions here or on Twitter. For more content, please feel free to check out my YouTube channel.
I approve this message. Give’em hell, ladies \m/