Gluon CV: add image classification, detection and segmentation to your application in minutes

Apache MXNet is an open source library for Deep Learning, supporting both symbolic and imperative programming. The latter is implemented by the Gluon API, which we discussed before.

Related post: “Gluon: building blocks for your Deep Learning universe” (medium.com)

One of the cool features of Gluon is its extensive model zoo, where you can grab a large number of pre-trained image models as easily as this:

from mxnet.gluon.model_zoo import vision
net = vision.squeezenet1_1(pretrained=True)

Guess what? It just got better!

The Terminator: real-time classification and segmentation. He *did* come from the future!

Gluon CV

Gluon CV (Computer Vision) is a brand new project which extends the model zoo with:

More image classification models trained on ImageNet and CIFAR-10: ResNet v1 and v2, MobileNet v2, WideResNet and ResNeXt.
Single-shot detection models trained on the Pascal VOC dataset: VGG16 300x300, VGG16 512x512 and ResNet 50 512x512.
Segmentation models trained on Pascal VOC: ResNet 50 and ResNet 101.

Similar models were previously available on GitHub (such as these), but using them wasn’t always straightforward. A while ago, I also showed you how to use pre-trained models with the symbolic API, but Gluon makes it much simpler.

Gluon CV also includes:

Utility APIs to transform and display images,
Tutorials,
Prediction, training and fine-tuning scripts!

Let’s try this thing.

Installation

Gluon CV is still a very young project. If you want to enjoy the latest features and bug fixes, I’d recommend installing the latest MXNet and Gluon CV packages. You might want to do this in a virtual environment to avoid messing up your Python environment :)

$ virtualenv ~/mxnet-gluoncv
$ source ~/mxnet-gluoncv/bin/activate
$ pip3 install mxnet gluoncv --pre --upgrade
$ python3
>>> import mxnet, gluoncv
>>> mxnet.__version__
'1.2.0'
>>> gluoncv.__version__
'0.2.0'
>>> exit()
$ git clone https://github.com/dmlc/gluon-cv

Good to go.

Image classification

Let’s first try to classify this image with demo_imagenet.py.

$ python3 demo_imagenet.py --model resnet50_v2 --input-pic kreator.jpg

The input picture is classified to be
 [electric_guitar], with probability 0.671.
 [drumstick], with probability 0.103.
 [stage], with probability 0.076.
 [banjo], with probability 0.024.
 [acoustic_guitar], with probability 0.016.

Looking at the script itself, all it really takes is about 5 lines of code!

Load a pre-trained model.
Read and transform the image (resize, crop, normalize colors).
Predict the image and display the top 5 categories.
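
The last step can be sketched without any deep learning library. The snippet below is a minimal, hypothetical illustration and not the code in demo_imagenet.py: it assumes the network returns one raw score per class, applies a softmax, and keeps the five most probable labels. The helper names `softmax` and `top_k`, as well as the toy scores and labels, are made up for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Toy values, invented for illustration (a real ImageNet model outputs 1000 scores).
labels = ["electric_guitar", "drumstick", "stage", "banjo", "acoustic_guitar", "microphone"]
scores = [4.0, 2.1, 1.8, 0.7, 0.3, -1.0]

print("The input picture is classified to be")
for label, prob in top_k(scores, labels):
    print(" [%s], with probability %.3f." % (label, prob))
```

With a real model, the scores would come from the network's forward pass and the labels from the dataset's class list. Only the ranking logic is shown here.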

Let’s try image detection.

Image detection

The 20 Pascal VOC classes are: person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor.

Let’s grab a picture displaying some of these objects and run an SSD model on it with demo_ssd.py.

$ python3 demo_ssd.py --network ssd_512_resnet101_v2_voc --images room.jpg

This is pretty good! The painting was even detected as a person, which makes sense I think.

Looking at the script itself, once again this takes fewer than 10 lines of code.
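
To make the output of a detector more concrete, here is a small sketch of the post-processing such a script typically needs. It is not taken from demo_ssd.py: it assumes each detection is a (class id, score, bounding box) triple, and the function name `keep_confident`, the 0.5 threshold and the sample values are all hypothetical. The class order is simply the one listed above and may differ from the order the model actually uses.

```python
# The 20 Pascal VOC classes, in the order given in this post
# (assumption: the model's own index order may differ).
VOC_CLASSES = [
    "person", "bird", "cat", "cow", "dog", "horse", "sheep",
    "aeroplane", "bicycle", "boat", "bus", "car", "motorbike", "train",
    "bottle", "chair", "dining table", "potted plant", "sofa", "tv/monitor",
]

def keep_confident(detections, threshold=0.5):
    """Keep detections whose score reaches the threshold, best first.

    Each detection is (class_id, score, (xmin, ymin, xmax, ymax)).
    """
    kept = [d for d in detections if d[1] >= threshold]
    kept.sort(key=lambda d: d[1], reverse=True)
    return [(VOC_CLASSES[class_id], score, box) for class_id, score, box in kept]

# Made-up detections for a living room picture.
detections = [
    (18, 0.93, (20, 210, 310, 400)),   # sofa
    (0, 0.58, (350, 40, 430, 160)),    # person (perhaps a painting!)
    (15, 0.31, (450, 250, 500, 380)),  # chair, too uncertain to keep
    (19, 0.87, (330, 200, 440, 290)),  # tv/monitor
]

for name, score, box in keep_confident(detections):
    print("%-12s %.2f %s" % (name, score, box))
```

Raising the threshold trades missed objects for fewer false positives, which is one way a painting might stop being reported as a person.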

Image segmentation

Last but not least, let’s use the demo_fcn.py script to segment this image.

Not bad at all! The three cars were picked up, as well as the most visible people. Not sure what the yellow thing in the lower left corner is, though :D
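
Conceptually, a segmentation network scores every class for every pixel, and the mask is obtained by keeping the best class per pixel. The sketch below illustrates that idea on a tiny 2x3 "image" with three hypothetical classes. It is not the code in demo_fcn.py, and the function `class_mask`, the class names and the scores are invented for the example.

```python
def class_mask(scores):
    """Return, for each pixel, the index of its highest-scoring class.

    scores[c][y][x] is the score of class c at pixel (x, y).
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]

CLASSES = ["background", "car", "person"]  # hypothetical subset of labels

# One 2x3 score map per class (made-up values).
scores = [
    [[0.90, 0.2, 0.1], [0.8, 0.1, 0.3]],  # background
    [[0.05, 0.7, 0.8], [0.1, 0.6, 0.1]],  # car
    [[0.05, 0.1, 0.1], [0.1, 0.3, 0.6]],  # person
]

for row in class_mask(scores):
    print(" ".join(CLASSES[c] for c in row))
```

Coloring each class index differently gives the kind of overlay shown in the picture.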

That’s it for today. Gluon CV really makes it very simple to use state-of-the-art pre-trained models. Please take a look at the code and try it with your own apps: it’s much easier than you probably think.

Happy to answer questions here or on Twitter. For more content, please feel free to check out my YouTube channel.

I approve this message. Give’em hell, ladies \m/

About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at AWS and Chief Evangelist at Hugging Face, Julien has authored books on Amazon SageMaker and contributed to the open-source AI ecosystem. His mission is to make AI accessible, understandable, and controllable for everyone.