Hi everybody! Welcome to this new episode of SageMaker Fridays. If you've been following us so far, you'll notice I'm on my own today, and there's a really good reason for that. My dear friend Sego just had a baby, so she's pretty busy at the moment and has better things to do than worry about machine learning. So, congratulations Sego, and thanks again so much for all the help you've provided on SageMaker Fridays. I'll do my best on my own, and let's see how this goes.
This is episode 11. We're still working on AutoML. In the last couple of weeks, we studied SageMaker Autopilot, and we also looked at an open-source library called AutoGluon. Today, we'll spend the whole episode working with AutoGluon on a computer vision problem. So, AutoML for computer vision; let's see what this is all about. We'll reuse the example we've used a few times this season, starting from a computer vision dataset for cancer cell detection, specifically metastasis detection. We'll try to automatically build a model with AutoGluon, and I'm sure we'll learn a few things. Next week, we'll keep working with AutoGluon on a multimodal problem, a dataset that includes images, natural language, categorical features, etc. Make sure to catch that one too.
Before we jump into the example, this is where you'll find the code for these last few episodes. Go grab it, run it, and ping me if you have any questions. In case you missed the last few episodes, a quick word about AutoGluon. AutoGluon is an open-source project; you can find code, examples, and documentation on GitHub. There's also a really cool research paper, which is actually quite readable, as I've said a few times. I encourage you to spend some time reading it. It provides a good overview of what AutoML is and of some of the unique features that AutoGluon brings. It's called AutoGluon because it's based on the Gluon API, which is part of Apache MXNet. You don't need to know any MXNet; you can use AutoGluon as is. But just so you know, it's built on Apache MXNet, the Gluon API, and toolkits like GluonCV for computer vision and GluonNLP, which I'll mention later today.
The main features of AutoGluon include the ability to build models automatically for tabular data, text, images, and, as I said earlier, multimodal data, which combines these types. It also handles data processing and feature engineering automatically. It includes a selection of well-known algorithms: linear regression, k-nearest neighbors, tree-based algorithms, XGBoost, LightGBM, and deep learning models, meaning neural networks for tabular data that go beyond traditional feed-forward architectures; there are some clever tricks in the research paper. It also includes NLP models and computer vision models. AutoGluon automatically builds ensemble models using techniques like bagging and stacking. More importantly, it takes just a few lines of code. I'm lazy, so that's definitely a good thing. Just a few lines, and you can fire it up and wait for your model to train while you do something useful in between.
Okay, so enough slides for now. Let's close this thing and get started with our problem today. Let me quickly show you some of the images we want to work with, and then I'll backtrack a bit. The problem we're trying to solve today involves medical images that show cells. Some of these cells are healthy, and some show metastasis, which is not a good sign. It's a binary classification problem with two classes: no metastasis and metastasis. We're trying to automatically build a model that can accurately detect and classify these images.
The images come from a dataset called Camelyon16, which contains 40,000 images, each 96 by 96 pixels. The dataset is stored in an HDF5 file, a dense and convenient format. However, for this example, I had to extract the images from this packed file; in real life, you would have the images ready and could train on them directly. I wrote a few lines of Python code to extract the images and organize them into folders, one per class. This will be included in the GitHub repository. The process involves creating the folders, opening the HDF5 file, and looping over the images to store each one in the appropriate folder based on its label.
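The extraction step described above can be sketched like this. The HDF5 key names, file name, and folder names are assumptions; adjust them to match your copy of the dataset.

```python
import os


def class_folder(label):
    """Map a binary label to its class folder name."""
    return "metastasis" if label == 1 else "no_metastasis"


def extract_images(h5_path, out_dir, x_key="x", y_key="y"):
    """Unpack an HDF5 image file into one folder per class."""
    # Heavy dependencies are imported lazily so the helper above
    # stays importable without them.
    import h5py
    from PIL import Image

    # Create one folder per class.
    for name in ("no_metastasis", "metastasis"):
        os.makedirs(os.path.join(out_dir, name), exist_ok=True)

    # Loop over the packed images and save each one under its class folder.
    with h5py.File(h5_path, "r") as f:
        images, labels = f[x_key], f[y_key]
        for i in range(len(images)):
            folder = class_folder(int(labels[i]))
            Image.fromarray(images[i]).save(
                os.path.join(out_dir, folder, f"{i}.png"))

# Example usage:
# extract_images("camelyon16.h5", "dataset")
```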
Now, let's get started. I'm using SageMaker Studio with an MXNet kernel optimized for GPU, running on a GPU-powered instance. The reason is that we're training a computer vision model, and a single GPU will help with that. I'm using a g4dn.xlarge instance, which is quite cost-effective. You could use larger p3 instances with up to eight GPUs for faster training, but for cost and time reasons, the g4dn.xlarge is just fine.
I'm installing AutoGluon and some widgets for progress indicators. I'll need the ImageDataset object and the ImagePredictor object; these are the main objects we'll use. For hyperparameter optimization, there are additional objects, but we'll discuss that later. I'm checking that my dataset is ready. It's almost balanced, which is good, with almost as many images with metastasis as without. 40,000 images is a decent number for this type of problem.
Loading the data is simple: you just use the ImageDataset object and point it at the folder you created. We see the first few examples, the paths to the first few images, and the dataset's shape. The dataset has 14,000 samples and two classes, which looks good. I'm training on the full dataset, but you could split it with the random_split function and a split factor to create training and validation datasets.
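As a minimal sketch, loading and splitting looks like the following. The autogluon.vision API shown here matches the 0.x releases; argument names may differ in later versions, so check the documentation for yours.

```python
def load_data(root="dataset"):
    """Load a folder-per-class image dataset and hold out validation data."""
    from autogluon.vision import ImageDataset  # imported lazily

    # Each subfolder of `root` becomes one class.
    dataset = ImageDataset.from_folder(root)
    print(dataset.head())   # paths and labels of the first few images
    print(dataset.shape)    # (number of samples, number of columns)

    # Optionally hold out 10% for validation and 10% for test.
    train_data, valid_data, test_data = dataset.random_split(
        val_size=0.1, test_size=0.1)
    return train_data, valid_data, test_data
```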
Now, let's configure our predictor using the ImagePredictor object. It has several parameters, but we'll stick to the defaults for now. We call the fit method, passing the dataset and some parameters. I'm restricting training to five epochs and using transfer learning with pre-trained models. I'm also making sure we use a single GPU per trial, which matters if you have a multi-GPU instance. Finally, I've set a time limit of two hours, although five epochs will likely take much less time; this is just to prevent the job from running for days.
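Here is a sketch of that fit call with the settings just described (five epochs, one GPU per trial, a two-hour cap). Argument names follow the autogluon.vision 0.x API and may vary between releases.

```python
def train(train_data):
    """Train an image classifier with AutoGluon defaults, capped at 5 epochs."""
    from autogluon.vision import ImagePredictor  # imported lazily

    predictor = ImagePredictor()
    predictor.fit(
        train_data,
        hyperparameters={"epochs": 5},  # transfer learning from a pre-trained model
        ngpus_per_trial=1,              # one GPU per trial on multi-GPU instances
        time_limit=2 * 3600,            # hard stop after two hours
    )
    return predictor
```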
We see the training logs, the labels, and the data splitting. Hyperparameter optimization is not set, but we'll discuss that later. The default parameters include a batch size of 16, a learning rate of 0.01, and five epochs. The default model is ResNet-50 v1b, a good all-around model for classification with reasonable training times. You can use other models from the GluonCV model zoo, which includes ResNet, VGG, DenseNet, SqueezeNet, MobileNet, and more.
After one epoch, we already get 73% accuracy, and validation accuracy is 84%. This shows the power of transfer learning. Training from scratch would require many more epochs. After 12 minutes of training, we hit almost 90% accuracy with default settings. This is a great start for those new to machine learning and deep learning. We wrote very little code: just pass the dataset and parameters, and you get 90% accuracy.
For those who want to improve accuracy, you can try larger or more sophisticated models, run hyperparameter optimization, or use presets like best_quality, which will train for longer and explore more. I set a time limit of four hours and tried ResNet-101 v1. This improves results but takes longer to train, and predictions are slower too.
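Sketched out, such a longer run with a bigger backbone might look like this. The model name, epoch count, and preset string are illustrative, following the autogluon.vision 0.x conventions; verify them against your installed version.

```python
def train_best(train_data):
    """Trade training time for accuracy: bigger model, longer budget."""
    from autogluon.vision import ImagePredictor  # imported lazily

    predictor = ImagePredictor()
    predictor.fit(
        train_data,
        presets="best_quality",         # train longer, explore more
        hyperparameters={"model": "resnet101_v1", "epochs": 15},
        time_limit=4 * 3600,            # four-hour budget
    )
    return predictor
```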
Running jobs that span hours or days is better done as SageMaker Processing jobs. You can create a processor object and run your AutoGluon code as a script. For tabular datasets, the SKLearnProcessor works well on CPU instances. For deep learning jobs that need GPU instances, a ScriptProcessor with your own container is a better choice.
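A sketch of launching such a job with the SageMaker Python SDK follows. The container image, bucket, script name, and paths are placeholders you'd replace with your own.

```python
def run_processing_job(role, image_uri):
    """Run AutoGluon training as a SageMaker Processing job on a GPU instance."""
    from sagemaker.processing import (ProcessingInput, ProcessingOutput,
                                      ScriptProcessor)  # imported lazily

    processor = ScriptProcessor(
        role=role,
        image_uri=image_uri,           # a container with AutoGluon installed
        command=["python3"],
        instance_type="ml.g4dn.xlarge",
        instance_count=1,
    )
    processor.run(
        code="train_autogluon.py",     # the notebook code, moved to a script
        inputs=[ProcessingInput(
            source="s3://my-bucket/dataset",
            destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(
            source="/opt/ml/processing/output")],
    )
```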
You can also run this on EC2 instances or build a custom container for SageMaker to leverage managed spot training and other features. These are the different options: experiment in the notebook for quick demos, run on EC2 for more control, or use SageMaker processing jobs or custom containers for large-scale training.
That's pretty much what I wanted to show you today. Two lines of code, 90% accuracy out of the box, and you can improve results by tweaking parameters, trying different models, and running hyperparameter optimization. The code is in the GitHub repo, so go and try it out. Grab some images, create your folders, and write two lines of code. Ping me if you have questions.
That's the end of this solo episode. I hope it was okay, and I'll see you next week with the final episode, where we'll explore a multi-modal dataset with images and text. Until then, I hope you learned a few things. Thanks for tuning in, and I'll see you next week. Bye-bye.
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.