Let's create an EC2 instance. From the EC2 console, click on Launch Instance. We want to use the Deep Learning AMI, which comes with Jupyter and all the necessary tools preinstalled. We can find it in the marketplace. There are several versions; let's pick the latest one, v26, which has all the latest tools and frameworks. I want the Ubuntu edition, so that's what I select.
Next, we select an instance type. The Detectron2 model is compute-heavy, so let's grab a GPU instance for this. A p3.2xlarge sounds about right; it has one GPU, which should be enough. You can leave all the other settings at their defaults. I like to attach a power-user role that lets me do various tasks, but that's probably not needed here; just make sure you have the right permissions. Storage is important because we're going to download the COCO dataset, which is quite large, tens of gigabytes if I remember correctly, so let's make sure we have plenty of it. Why not use SSD storage as well? You don't have to, but I can tweak all those things since I'm not the one paying the bill.
For tags, let's call this "detectron-demo". For the security group, I only need SSH and Jupyter Notebook access, and mine is already created. If you haven't set one up, you just need SSH, TCP 22, and since Jupyter listens on port 8888, you need that one open as well. Review and launch. Select your key pair. Yes, that's the one I want. And launch. This will take a few minutes, so let's pause the video and I'll see you when the instance is up.
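If you prefer the command line, here's a hypothetical CLI equivalent of those console steps. All the IDs are placeholders, and the volume size is just an example; only the instance type and tag come from what we did above.

```bash
# Hypothetical CLI equivalent of the console launch (IDs are placeholders):
# Deep Learning AMI (Ubuntu) v26, p3.2xlarge (one GPU), 100 GB gp2 SSD volume
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type p3.2xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-xxxxxxxxxxxxxxxxx \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=100,VolumeType=gp2}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=detectron-demo}]'
```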
After a few minutes, the instance is up. Now let's connect to it. I'm using SSH, with port forwarding from the instance's port 8888 to local port 8000. This lets me use my Mac's local browser to reach the remote Jupyter Notebook running on the EC2 instance. Let's connect now. This is the EC2 instance I've started. Now I just need to start a notebook. If I grab the URL with the token and open it, remembering to change 8888 to 8000, I'm redirected to the notebook running on the EC2 instance.
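For reference, here's roughly what that tunnel looks like; the key file and hostname are placeholders.

```bash
# Forward local port 8000 to the instance's Jupyter port 8888
# (key file and hostname are placeholders)
ssh -i my-key-pair.pem -L 8000:localhost:8888 ubuntu@<instance-public-dns>

# On the instance, start Jupyter and copy the tokenized URL it prints:
jupyter notebook
# Then open that URL in the local browser, replacing :8888 with :8000
```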
The next step is to grab the Detectron2 notebook from the Colab website; just download it as an IPython notebook. In the interest of time, I've already done all of this and created a "detectron2" folder so that everything is in the right place. Here is the notebook. It's the exact one I downloaded from Colab, with a few changes to make it run on the Deep Learning AMI. I've highlighted my changes, and this notebook is available in one of my GitLab repos. The URL is in the description, so you can grab my version and run it directly on the Deep Learning AMI.
This cell confirms we're running on Deep Learning AMI v26 with the Python 3 kernel. I won't explain every cell; you can go through the tutorial for more information. I'll just highlight the changes I've made. For example, we need to install OpenCV, which isn't present by default on the Deep Learning AMI. We also need to clone the Detectron2 repo, and if you're trying this on your local machine, make sure you have a recent version of the GNU compiler (GCC), because the installation compiles some components. Another package I need to install is the google-colab package, because some cells use Colab-specific methods from it. It's good practice to restart the kernel so all the new imports are taken into account. Let's do that, clear everything, and resume from here.
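As a sketch, the extra setup cells look something like this; the exact package versions and install commands may differ from what's in my repo, and the repo URL shown is the official Detectron2 one.

```python
# Extra notebook cells (a sketch; exact packages/versions may differ):
!pip install opencv-python    # OpenCV isn't preinstalled on the Deep Learning AMI
!pip install google-colab     # some tutorial cells import Colab-specific helpers

# Clone and build Detectron2 (compiles C++/CUDA ops, hence the GCC requirement):
!git clone https://github.com/facebookresearch/detectron2 detectron2_repo
!pip install -e detectron2_repo
```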
The first step is to use a pre-trained Detectron2 model to segment an image, and we can visualize the results. There's a lot to explain here, but the tutorial has plenty of good information. Next, we fine-tune on a custom dataset: the balloon dataset, which contains lots of balloon pictures. We download it, unzip it, and write utility methods to load the images. Here's an image from the dataset, and here's one of the annotated images. Now we fine-tune the segmentation model on the balloon class. This is where a GPU instance helps, as it takes a while on a CPU. Training is fast here, about a minute. Once it's done, we predict on images from the validation dataset. The segmentation looks neat, and the model works well. We can also look at metrics like AP (average precision), a standard metric for segmentation models.
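To give you an idea, here's a minimal sketch of that pre-trained inference step using the detectron2 API. The config name and threshold come from the tutorial and may vary between versions; the image path is a placeholder.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Load a Mask R-CNN instance-segmentation config and its pre-trained weights
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # confidence threshold

predictor = DefaultPredictor(cfg)
image = cv2.imread("input.jpg")    # placeholder: any test image
outputs = predictor(image)         # dict with an "instances" field
print(outputs["instances"].pred_classes)  # predicted COCO class IDs
```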
We can also try keypoint detection, which detects key points on humans, such as arms, legs, and eyes. Panoptic segmentation goes further and tries to segment everything in a picture, associating every pixel with a class or instance. We can try this on a video, which is pretty cool. The video is grabbed from YouTube, and one change here is to install FFmpeg, which isn't present by default on the Deep Learning AMI. In the demo script, you also need to replace the x264 codec with an MPEG codec if your FFmpeg build doesn't support x264 due to licensing issues; a sketch of that edit follows below. Once you make this change, the script processes the video, running the segmentation model on each frame. This takes about one minute and forty seconds. After processing, we can check that the MPEG file is there. I commented out the last few lines and used scp to copy the file to my local machine. If I open it, we see the segmented video. This is just the beginning, but by tweaking the script, you can segment videos of arbitrary length.
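The relevant line is the OpenCV video writer inside the demo script. This is a hedged sketch assuming your version of detectron2/demo/demo.py builds its writer with cv2.VideoWriter_fourcc, as mine did; the exact line may differ in other versions.

```python
import cv2

# Original line in detectron2/demo/demo.py (approximate):
#   fourcc = cv2.VideoWriter_fourcc(*"x264")
# If your FFmpeg build lacks x264 support, switch to an MPEG-4 codec instead:
fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # MPEG-4, available in stock builds
```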
That's it for today. See you next time with another exciting topic. Bye.