Hey, hi everybody. This is Julien from Arcee and welcome to episode two of my podcast. It's a bit of a special episode today. I'm going to focus on TensorFlow 2.0 and how to run it on AWS. The reason for this is that TensorFlow 2.0 is now available on all compute platforms. So you can easily run it on EC2, Container Services, and SageMaker. It's a good opportunity to cover all bases. First, I'm going to give you a little background information on TensorFlow. Then I will explain how TensorFlow 2.0 is a step forward and how it's different from TensorFlow 1.x. After that, I'll show you how to get started with TensorFlow 2.0 on EC2, containers, and SageMaker. Let's get to work!
As you probably know, TensorFlow is an open-source library for machine learning and deep learning. The main API is in Python, and you have some additional support for languages like Java, for example. It came out a little more than four years ago, and the first version, called TensorFlow 1.x, has been extremely successful. I recently read a research report from an analyst company called Nucleus, and they're telling us that TensorFlow is used in 74% of deep learning research projects, with PyTorch a distant second at 43%. So TensorFlow is really the number one library out there. Over time, a lot of features have been added to TensorFlow, and TensorFlow 2.0 came out at the end of September. So how is that different from TensorFlow 1.x? I think it's time for the whiteboard.
TensorFlow 1.x uses a programming model called symbolic mode. Let me explain. Here we go. So let's say we're trying to compute A multiplied by B plus C. Of course, these are not integers or floating-point numbers; these are matrices, because when you're working with machine learning or deep learning, you're working with multi-dimensional arrays. The fancy word for these is tensors, which is why this library is called TensorFlow. Anyway, let's not get bogged down in vocabulary. When you're working with symbolic programming, you first define the execution graph. So we would need two variables, A and B, and a multiplication operator; then a third variable, C, and an addition operator. We would combine them in a graph just like this: A and B feed into the multiplication operator, and then the result of that, together with C, feeds into the plus operator, which gives us our result.
If we were writing symbolic code, it would look something like this. We would define three named variables, a, b, and c, storing no data at this point; they're really just names for data that we'll provide later on. Then we would define a new variable, d, which would be the multiplication of a and b. Again, at this point, this is just a definition; no actual processing is performed. Then e would be d plus c, and that would build the execution graph you see here. Once the graph is fully defined, we would compile it using a library function. This would give us a proper function, let's call it f, that we could then apply to actual values for a, b, and c. We would invoke f, passing values for a, b, and c, and that would give us our result, let's call it y. So we can clearly see here why this programming model is called define-then-run: first, we define the graph, and then we run the graph on the data that we provide.
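As a rough sketch, here is what this define-then-run style looks like with the TensorFlow 1.x API; the placeholder shapes and the input values are just for illustration:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

# Define the graph: a, b, c are placeholders, no data yet
a = tf.placeholder(tf.float32, shape=(2, 2), name="a")
b = tf.placeholder(tf.float32, shape=(2, 2), name="b")
c = tf.placeholder(tf.float32, shape=(2, 2), name="c")
d = tf.multiply(a, b)   # still just a definition, nothing is computed
e = tf.add(d, c)

# Run the graph: feed actual values and fetch the result
with tf.Session() as sess:
    y = sess.run(e, feed_dict={
        a: np.ones((2, 2)),
        b: np.full((2, 2), 2.0),
        c: np.full((2, 2), 3.0),
    })
    print(y)
```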
The problem with this is that as you compile the graph, it's transformed into an internal representation that is highly optimized for speed and memory consumption, and it probably looks nothing like your initial graph anymore. This makes it really difficult to debug and inspect the code, and to understand why it's not working the way you want it to. For example, if we look at this graph again, we can see that d is pretty useless. Sure, we need d to store the intermediate result, but that's the only thing it does; we only use it on the next line to compute e. So a memory optimization would be to reuse the memory allocated for d for the e tensor; there's no reason to allocate memory for both. This is just a very basic example, but it's the kind of thing that graph compilation does. The benefit is that you end up using less memory and, of course, you run the graph faster, so you train faster. So that's symbolic programming: fast, efficient, difficult to debug, and difficult to understand.
Now, let's talk about TensorFlow 2.0 and imperative mode. The main difference between TensorFlow 2.0 and TensorFlow 1.x is that we can now shift from symbolic mode to imperative mode, which TensorFlow calls eager mode. Let's see what this does, using the same calculation. The good news is you already know what imperative mode is, because imperative mode is just writing and running code the way we've been doing it forever: one line at a time. This is called define-by-run. There are no two stages here; we just run the code, and it builds and runs the graph line by line. If we were using NumPy, we would create three variables, a, b, and c, as three NumPy arrays with actual data, so data would be provided right there. Then we would create two more NumPy arrays: d is a multiplied by b, and e is d plus c, and of course we get our result. If we look at what's happening, it's really running line by line. Every time we run a line, we create a new NumPy object, and all of them exist in memory. So a, b, c, d, and e are all inspectable. This makes the code easier to understand and easier to debug. You know exactly what each line does; there is nothing happening magically. What you see is what you run and what you debug. That's the main difference. So it's easier to understand, a more natural and friendly way of writing code.
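For example, the NumPy version is just plain Python running line by line; the array values here are arbitrary:

```python
import numpy as np

# Each line runs immediately; every intermediate result is a real object
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])
c = np.array([[1.0, 1.0], [1.0, 1.0]])

d = a * b      # element-wise multiplication, computed right away
e = d + c      # a, b, c, d, e can all be printed or inspected at any point
print(e)
```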
Of course, the downside is that it's slower, because we have fewer opportunities, or possibly no opportunities, to optimize and do all the crazy stuff that we can do on graphs. The good news is you get symbolic mode as well. You can start by writing your code in the imperative fashion, which is great for experimentation, debugging, and so on. Then you can easily transform it, compile it to the symbolic form, and get the increased speed and optimization that goes with it. So you don't have to pick; you can have your cake and eat it too. Or as we say in France, you can have cheese and dessert, right? Which is nice. So that's the biggest difference with TensorFlow 2.0. The other one I want to mention is that the Keras API, which used to be a separate library running on top of TensorFlow, is now fully integrated with TensorFlow, and it's now the preferred API. You can use Keras at a very high level, or you can customize it much more heavily than you could in the past. Here, too, you get more opportunities to experiment quickly, as well as to optimize and write custom code: custom training loops, custom layers, and so on. So these two things, eager mode and full Keras integration, are really cool features.
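To illustrate, here is a minimal TensorFlow 2.0 sketch of the same computation: it runs eagerly by default, and wrapping the logic with `tf.function` compiles it into a graph for speed. The tensor values and the function name are just for illustration:

```python
import tensorflow as tf  # TensorFlow 2.0

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
c = tf.constant([[1.0, 1.0], [1.0, 1.0]])

# Eager mode: this runs immediately, just like NumPy
print(a * b + c)

# Wrap the same logic in tf.function to compile it into an optimized graph
@tf.function
def multiply_add(x, y, z):
    return x * y + z

print(multiply_add(a, b, c))
```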
Now, let's look at how you can run TensorFlow 2.0 on AWS. The analyst report I mentioned earlier also told us that 85% of cloud-based TensorFlow workloads run on AWS. That's a nice number, and I guess it gives us a responsibility to make sure TensorFlow runs nicely on AWS. Let's take a look at the different ways you can do that. The first one is to run it on an EC2 instance. To make it simple, we've built Deep Learning AMIs. If you've never heard of AMIs, that means Amazon Machine Image: it's basically the binary image used to create virtual machines on Amazon EC2. And yes, people will argue forever about how to pronounce "AMI". If you go to the AWS Marketplace, you'll find different AMIs already packaged. The one you want for TensorFlow 2.0 is version 26 or later. At the time of recording, this is the latest version, so don't go and pick something older, because you would miss TensorFlow 2.0. These are available for Amazon Linux 2 or Ubuntu 18, whatever suits you. You can just select this AMI and launch an instance, which I've already done; I've launched a G4 instance here. I can SSH to my instance and see the different environments that are available there. This is all managed by Conda, the package manager for Python. Let's select the TensorFlow environment for Python 3.6 and activate it. Now, if I run Python 3, import TensorFlow, and look at the version, it is TensorFlow 2.0. And you know what to do next. So this is one way of doing it: just fire up the latest version of the Deep Learning AMI, and it comes with TensorFlow 2.0 pre-installed. We update those AMIs very regularly, so you will also get future versions.
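For reference, once the environment is activated, a quick sanity check might look something like this (assuming a GPU instance such as the G4 I launched here):

```python
import tensorflow as tf

# Confirm we're running TensorFlow 2.0
print(tf.__version__)

# On a GPU instance, this should list at least one GPU device
print(tf.config.experimental.list_physical_devices("GPU"))
```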
Another way you can use TensorFlow 2.0 is with the Deep Learning Containers. These are exactly what you would think: AWS maintains containers that package the same deep learning libraries available in the Deep Learning AMI. So we have MXNet, PyTorch, and TensorFlow versions, and we have separate containers for training and for prediction. For the sake of simplicity, I'm going to keep working on this same instance, but of course, this would work exactly the same on one of our container services, ECS or EKS, or on any EC2 instance with Docker installed. The first step is to log in to Amazon ECR, the Docker registry service on AWS; make sure you provide the right region for that. Next, you can just pull the image. You'll find the list of image names in the Deep Learning Containers documentation. I already did that, because it's not really interesting to watch Docker images being pulled. So now my image is available, and I can easily run it just like that: `docker run`. And again, if I run Python and import TensorFlow, I should see that this is the proper version. Yes, version 2.0. So nothing fancy, just containers. But unless you really enjoy maintaining your own containers, why not give these a try? They might just save you some time, and of course, they come with optimized builds. We have a dedicated team working on optimizing TensorFlow on AWS, so this is not a vanilla version; it's actually a pretty fast one.
How do you use TensorFlow 2.0 on SageMaker? Just like you used the previous version. This is a very simple notebook with a simple TensorFlow 2.0 script, which I will put on GitLab, and you'll get all the information for that. How do you use it? Well, remember that when you're training a TensorFlow script on SageMaker, you use the `sagemaker.tensorflow.TensorFlow` estimator. This takes your script as the first parameter, your infrastructure requirements (how many instances you want, what instance type, hyperparameters, etc.), and the framework version. There's a parameter called `framework_version` where you say, "Hey, I want to use TensorFlow 2.0.0." And that's it. In case you're wondering, you need SageMaker SDK 1.49 or later, so make sure you update your SDK to this latest version; it was pushed yesterday. If you have 1.49 or later, you can now just say, "All right, please give me framework version 2.0.0." For the record, this hasn't officially been announced, but the code is out there, so the feature is available to all of you. Deploying is exactly the same as well: you call `deploy()` on your estimator to create an endpoint, and you'll be able to predict. So from a SageMaker perspective, the only difference is the new framework version.
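As a minimal sketch of what this looks like with SageMaker SDK 1.49, the script name, S3 path, and instance types below are hypothetical placeholders, not the ones from my notebook:

```python
import sagemaker
from sagemaker.tensorflow import TensorFlow

# 'mnist_tf2.py' and the S3 locations are placeholders for illustration
estimator = TensorFlow(
    entry_point="mnist_tf2.py",
    role=sagemaker.get_execution_role(),
    train_instance_count=1,
    train_instance_type="ml.p3.2xlarge",
    framework_version="2.0.0",   # requires SageMaker SDK 1.49 or later
    py_version="py3",
    script_mode=True,
)

# Train on data stored in S3
estimator.fit("s3://my-bucket/my-training-data")

# Deploying works exactly as before: this creates an endpoint you can predict against
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```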
Well, I think that's it. I think that's what I wanted to show you today. So remember, three ways you can use TensorFlow 2.0: Deep Learning AMI (make sure you use version 26 and up), Deep Learning Containers, and SageMaker (make sure you use SDK 1.49 and up). That's it for this episode. I hope you learned a few things. Merry Christmas and Happy Holidays to all of you out there, and I'll see you soon. Maybe I'll have a New Year's episode. Who knows? Anything's possible. It's AWS. It's machine learning. It's totally crazy. See you next time, and until then, keep rocking.
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.