AWS AI Machine Learning Podcast, Episode 3: Machine Learning from A to Z
December 30, 2019
In this episode, I explain in plain English 26 Machine Learning and Deep Learning terms that you need to know. If you're just getting started in the field, this should save you quite a bit of frustration!
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future episodes ⭐️⭐️⭐️
Audio samples have been generated with Amazon Polly :)
This podcast is also available in audio at https://julsimon.buzzsprout.com/
For more content, follow me on:
* Medium: https://medium.com/@julsimon
* Twitter: https://twitter.com/julsimon
Transcript
Hi everyone, this is Julien from AWS. This episode is called Machine Learning from A to Z. Not because I'm going to explain all of it in 15 minutes, but because I'm going to go from A to Z and explain some machine learning terminology that is frequently used and is often confusing and intimidating, especially if you're just starting with machine learning. Hopefully, this short episode will give you a better understanding of those important words. As usual, I will try to explain all of it with minimal jargon and minimal theory. Don't forget to subscribe to my channel or my podcast to get all the future episodes. Let's get started.
So here we go. Accuracy. Accuracy is important: it's one of the key metrics you will use to evaluate how well your model performs. The simple definition is this: run a number of predictions, count how many of them are correct, and divide by the total number of predictions. That's your accuracy. Simple as that, and easy to understand, especially for non-technical users. It's probably the number one metric people will want to hear about.
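Here's a minimal sketch of that computation in Python (the label values below are made up for the example):

```python
# Minimal accuracy computation: correct predictions / total predictions.
# The two lists below are made-up example data.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(f"Accuracy: {accuracy:.2f}")  # 6 correct out of 8 -> 0.75
```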
Backpropagation. Backpropagation is really central to deep learning. It's the algorithm that lets you update the weights in a neural network. Forward propagation means placing input data on the input layer of the neural network and letting the network compute activation values layer by layer. When you get to the end, you have some kind of prediction, and in the early stages of the training process, it's going to be wrong. Backpropagation goes back from the output layer to the input layer and updates weights as it goes. There's a bit of magic here, because the weights have to be updated in the right direction: they are numerical values, and each one needs to be adjusted just right so that every update actually reduces the prediction error. Backpropagation is the algorithm that does that.
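To make this concrete, here's a toy sketch: a single neuron with one weight, a squared-error loss, and a hand-computed gradient. All the numbers are made up, but it shows how each backward pass nudges the weight in the right direction:

```python
# Toy example: one neuron (y = w * x), squared-error loss.
# Forward pass, then backpropagation of the error to update w.
x, y_true = 2.0, 8.0   # made-up input and target
w = 1.0                # initial weight
lr = 0.05              # learning rate

for step in range(20):
    y_pred = w * x                    # forward propagation
    loss = (y_pred - y_true) ** 2     # prediction error
    grad = 2 * (y_pred - y_true) * x  # dloss/dw, the backpropagated gradient
    w -= lr * grad                    # update the weight in the right direction

print(w)  # converges towards 4.0, since 4.0 * 2.0 == 8.0
```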
Convolution. Convolution is a mathematical function. In the context of neural networks, convolution has been the breakthrough technology that made computer vision efficient with neural networks, thanks to researchers like Yann LeCun. In deep learning, convolution is how you extract patterns from images. You do this using convolution filters, also called kernels: small two- or three-dimensional arrays of numbers that you slide across the image, or batch of images, to extract patterns. This is what made many computer vision applications possible.
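Here's a sketch of that sliding operation with NumPy; the 3x3 kernel below is a classic edge-detection filter, chosen just for illustration:

```python
import numpy as np

# Slide a 3x3 convolution filter (kernel) over a small grayscale image.
image = np.arange(36, dtype=float).reshape(6, 6)   # made-up 6x6 image
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=float)     # edge-detection filter

h, w = image.shape
k = kernel.shape[0]
output = np.zeros((h - k + 1, w - k + 1))
for i in range(output.shape[0]):
    for j in range(output.shape[1]):
        patch = image[i:i + k, j:j + k]          # region under the filter
        output[i, j] = np.sum(patch * kernel)    # multiply and accumulate
print(output.shape)  # (4, 4): one value per filter position
```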
Dataset. You can't really do machine learning without data, and building a dataset is the main task. Once your data is ready, cleaned, and so on, you could say the hardest part of the job is done. If you have a poorly maintained dataset, you're never going to get good results. As I say all the time: garbage in, garbage out. Even the best possible algorithm won't perform well on bad data. Caring for your data, cleaning it, filling in missing values, adding new data, and so on, is really central. Data scientists spend a lot of time curating datasets, because that's the starting point for the whole machine learning story.
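As a tiny illustration of that cleaning work, here's a pandas sketch (the columns and values are made up):

```python
import pandas as pd

# Typical dataset cleanup steps on made-up data.
df = pd.DataFrame({
    "age": [34, None, 45, 29],
    "income": [52000, 48000, None, 39000],
})
df["age"] = df["age"].fillna(df["age"].median())        # fill missing values
df["income"] = df["income"].fillna(df["income"].mean())
df = df.drop_duplicates()                               # remove duplicate rows
print(df)
```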
Epoch. An epoch is just a complicated word for an iteration. In the context of deep learning, an epoch means pushing the dataset through the neural network once. You go through the dataset batch by batch, or sample by sample, and once you've reached the end of the dataset, that's one epoch. You typically train for hundreds of epochs, especially for large problems; computer vision models, for example, are very slow to train. So that's what an epoch is: going through the dataset once.
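In code, the structure looks something like this (the dataset, batch size, and the commented-out training call are placeholders):

```python
# Schematic training loop: one epoch = one full pass over the dataset.
dataset = list(range(1000))   # stand-in for 1000 training samples
batch_size = 32
num_epochs = 5

for epoch in range(num_epochs):
    for start in range(0, len(dataset), batch_size):
        batch = dataset[start:start + batch_size]
        # train_on_batch(batch)  # forward pass, loss, backpropagation
    print(f"Epoch {epoch + 1} done: the full dataset was seen once")
```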
Feature. After datasets, features are maybe the next most important thing. Features are high-level variables that the algorithm will use to train the model. If you have a well-defined dataset with data in columns, those columns are your features. However, some columns might not be expressive enough for the algorithm to learn from, so you might transform them, or build new features from them, to help the model learn. For example, transforming a street address into GPS coordinates can be much more useful to a machine learning algorithm than a string. Feature engineering is the set of techniques you apply to build features from the raw dataset, and it's a key skill for data scientists.
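Here's a small feature engineering sketch with pandas (columns and values are invented for the example):

```python
import pandas as pd

# Feature engineering: build new, more expressive features from raw columns.
df = pd.DataFrame({
    "price": [250000, 180000, 320000],
    "surface_m2": [100, 60, 150],
    "sale_date": pd.to_datetime(["2019-03-01", "2019-07-15", "2019-11-30"]),
})
df["price_per_m2"] = df["price"] / df["surface_m2"]  # ratio feature
df["sale_month"] = df["sale_date"].dt.month          # date-part feature
print(df[["price_per_m2", "sale_month"]])
```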
Gradient. Gradient is a complicated word for something simple. A gradient is a tiny update applied to a machine learning parameter. The term is mostly used in deep learning, where during backpropagation, an optimization algorithm decides how to update the weights: each individual weight is increased or decreased a little bit, and the update applied is called the gradient. This comes from math: for a function of a single variable, the rate of change is called the derivative, and for a function of multiple variables, the vector of derivatives across all dimensions is called the gradient. It's a complicated term for a simple concept: tiny updates iteratively applied to machine learning parameters.
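In code, a gradient step is a one-liner (the values are made up):

```python
# A gradient step: the parameter gets a tiny update in the direction
# that reduces the loss.
weight = 0.8
gradient = -0.25        # computed during backpropagation
learning_rate = 0.01

weight = weight - learning_rate * gradient  # tiny update: 0.8 -> 0.8025
print(weight)
```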
Hyperparameter. Hyperparameters are training parameters. When we say parameter in machine learning, we mean model parameters that are learned and updated during the training process. Hyperparameters, on the other hand, are parameters that you, the user or machine learning engineer, set for the training process. For example, how many epochs to run, the batch size, and so on. Every machine learning algorithm has specific hyperparameters. For a neural network, the number of layers and their width are hyperparameters. Finding the optimal set of hyperparameters is a hard problem, which is why many practitioners use hyperparameter optimization techniques to find the best values.
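Here's what hyperparameter optimization can look like with scikit-learn's grid search; the grid values below are arbitrary examples:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hyperparameter search: try several combinations, keep the best one.
X, y = load_iris(return_X_y=True)
grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3)
search.fit(X, y)
print(search.best_params_)  # the best combination found on this dataset
```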
Iteration. Iteration is central to machine learning. The training process slices the dataset into batches that are fed to the algorithm, and then an optimization process runs. Once an epoch is done, the process repeats. Even from a development perspective, machine learning is highly iterative. You try different algorithms, parameters, and tweaks. When you start, you might think you're done after training one model, but in practice, machine learning engineers typically train hundreds, if not thousands, of different models. You need the right mindset to try all kinds of things, use your intuition, and keep looking for the best possible combination. Your work is never really done when you're working with machine learning.
Just do it. I couldn't come up with a word starting with J, so here's my motivation. If you're listening to this, you're probably new to machine learning or quite new. Don't be afraid. Don't let the jargon or the elite mentality in the community bring you down. You can all do it. Machine learning is mostly code, with a little bit of theory. We need many more machine learning engineers, so just go and do it. Don't let anything stand in your way.
Keras. Keras is an open-source library for machine learning and deep learning. It's my favorite by far: it's the easiest to get started with, has great documentation, a good blog, and tons of tutorials. It's beginner-friendly, but it also lets you build extremely advanced models. Keras started as a high-level API on top of TensorFlow, and it's now tightly integrated with TensorFlow 2, so you can go from super high-level to super custom. I recommend starting with Keras.
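Here's what a minimal Keras model looks like; the layer sizes and settings are arbitrary examples:

```python
from tensorflow import keras

# A tiny Keras model: 20 input features, 3 output classes (arbitrary choices).
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=10, batch_size=32)  # with real data
```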
Loss. Loss is another complicated term: prediction loss simply means prediction error. It's central to many machine learning algorithms, especially in deep learning, where we measure the difference between predictions and reality, also called the ground truth. For example, if an image shows a dog, a cat, or an elephant, the model predicts a probability for each of the three classes, and we measure the distance between those predictions and the true answer. The loss function is what measures that distance. Loss functions come built into libraries like Keras, TensorFlow, and more, and you can implement your own if you need a different way to measure error.
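For instance, here's the cross-entropy loss for the dog/cat/elephant example, computed by hand with made-up probabilities:

```python
import math

# Cross-entropy loss: the true class is "dog", probabilities are made up.
y_true = [1.0, 0.0, 0.0]   # one-hot ground truth: dog, cat, elephant
y_pred = [0.7, 0.2, 0.1]   # predicted probabilities

loss = -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)
print(f"Loss: {loss:.3f}")  # -log(0.7) ~= 0.357; lower is better
```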
Model. A model is what we're trying to get. It starts from an algorithm applied to a dataset. The algorithm explores the dataset, updates its parameters, and when the training process is done, you have a model. A model is a combination of an algorithm, hyperparameters for that specific training job, and a dataset to learn from. You use the model to predict.
Neuron. In the context of deep learning, a neuron is a simple mathematical construct with inputs, which are floating-point values. Each input is assigned a weight, another floating-point value. The operation a neuron computes is called multiply and accumulate, and it's simple: multiply each input by its associated weight, and sum everything. This mimics the biological neuron, which fires based on the amount of electrical current it receives. The neuron is also associated with an activation function, which introduces non-linear behavior. The most popular one today is ReLU, which outputs zero if the multiply-and-accumulate value is negative, and the value itself if it's positive.
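In code, a single neuron fits in a few lines (inputs and weights are made-up values):

```python
# A single artificial neuron: multiply and accumulate, then ReLU.
inputs = [0.5, -1.0, 2.0]
weights = [0.8, 0.2, -0.5]

z = sum(x * w for x, w in zip(inputs, weights))  # multiply and accumulate
activation = max(0.0, z)  # ReLU: 0 if negative, the value itself if positive
print(z, activation)      # z = 0.4 - 0.2 - 1.0 = -0.8 -> ReLU outputs 0.0
```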
Optimizer. The optimizer is the function that updates the weights during backpropagation. Backpropagation starts from the output layer, looks at the loss, and goes layer by layer from the back to the front, updating the weights. The optimizer decides how weights are updated. There are many functions to do this, and we'll see one called SGD in a few minutes.
Python. Python is the number one language you should learn if you want to get into machine learning. R is another popular choice, and there are now libraries for Java and more, but Python is still the dominant language. Libraries like TensorFlow, Keras, PyTorch, and MXNet have Python APIs, and the Python ecosystem is rich with libraries like NumPy, pandas, and scikit-learn. Python is the one to start with.
Quantile. Quantiles are useful for probabilistic predictions. Instead of outputting a single value, you output ranges of values with probabilities. For example: there's an 80% chance that the value will be between 550 and 600, and a 95% chance it will be less than 680. Quantiles are referred to as P90, P50, and so on, where P90 means that 90% of predictions will be lower than this value. Comparing P10 to P90, 80% of predictions fall between those two values. When you want to build probabilistic predictions, quantiles are what you need.
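Here's a quick NumPy sketch that computes quantiles over a set of made-up predicted values:

```python
import numpy as np

# Quantiles over 1000 made-up predictions (mean 600, std dev 40).
predictions = np.random.default_rng(42).normal(600, 40, size=1000)

p10, p50, p90 = np.percentile(predictions, [10, 50, 90])
print(f"P10={p10:.0f}, P50={p50:.0f}, P90={p90:.0f}")
# 80% of predictions fall between P10 and P90.
```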
Regularization. Regularization is a technique that helps your model learn better by making it a little harder for the model to update weights. This is useful when the model learns too well, especially with neural networks. If the model learns the training set too well, it won't generalize well to new data. Regularization techniques make it harder for the model to overfit the training set, hoping it will perform better on real-life data.
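Here are two common regularization techniques expressed in Keras; the penalty and dropout rates are arbitrary examples:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# L2 weight penalty and dropout, two common ways to fight overfitting.
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,),
                 kernel_regularizer=regularizers.l2(0.01)),  # penalize large weights
    layers.Dropout(0.5),   # randomly drop units during training
    layers.Dense(1, activation="sigmoid"),
])
```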
SGD. SGD stands for Stochastic Gradient Descent and is the granddaddy of all optimizers. It's a very old technique, invented in 1951, and is still heavily used today. It's well understood and predictable. SGD is a good place to start before trying more advanced optimizers. Run a baseline using SGD to see what kind of accuracy you get, and then try more complex optimizers.
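The core SGD update rule fits in one line; the values below are made up, and in practice the gradients come from backpropagation on one batch of data (that's the "stochastic" part):

```python
import numpy as np

# The SGD update rule: parameters move a small step against the gradient.
weights = np.array([0.5, -1.2, 0.3])
gradients = np.array([0.1, -0.4, 0.05])   # gradients for the current batch
learning_rate = 0.01

weights = weights - learning_rate * gradients
print(weights)
```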
Training. The training process starts from a dataset. In deep learning, you slice the dataset into batches, push each batch through the neural network, compute the multiply-and-accumulate operations, and get to the output layer. You use the loss function to measure the difference between the truth and the predictions, run backpropagation from the output layer back to the front, and use an optimizer to update the model parameters. For traditional machine learning algorithms, the technique is a bit different, but it follows the same big picture: you start from a dataset, let the algorithm learn iteratively, and update parameters. At the end, you get a model. You repeat this process, measuring accuracy, until you get good results.
Underfitting. As we just saw, regularization helps fight overfitting, which means learning the training data too well. Underfitting is the opposite: the model has a hard time learning, even on the training set. Overfitting and underfitting can be caused by various issues, such as messy data, insufficient data, or incorrect hyperparameters. Regularization helps with overfitting by making the model work harder to learn the dataset, so that it generalizes better to new data.
Validation. Validation measures how well the model performs on data it hasn't seen. You split your dataset into a training set and a validation set. The training set is used to learn and update parameters, while the validation set is used to measure the model's accuracy on unseen data. This gives you a sense of how well the model generalizes. If you have very good training accuracy but low validation accuracy, you need to fix it. Validation accuracy should be on par with training accuracy for the model to be useful on real-life data.
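Splitting the dataset is a one-liner with scikit-learn (the data shapes are arbitrary examples):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for validation.
X = np.random.rand(1000, 20)        # made-up features
y = np.random.randint(0, 2, 1000)   # made-up labels

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_val.shape)   # (800, 20) (200, 20)
```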
Weights. Weights are the parameters inside a neural network. They are assigned to the connections between neurons and updated during the training process. Biases are additional parameters tied to individual neurons. Ideally, training keeps all of them well-behaved, but if weights collapse to zero or grow very large, you can run into problems like vanishing or exploding gradients. Sometimes you need to inspect the weights and gradients to understand what's happening inside the neural network.
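For example, here's how you could take a quick look at the weights of a Keras model (assuming a model built like the ones sketched above):

```python
# Inspect the weight arrays of each layer in a Keras model.
# Dense layers return [weights, biases]; layers like Dropout have none.
for layer in model.layers:
    for arr in layer.get_weights():
        print(layer.name, arr.shape,
              f"mean={arr.mean():.4f}", f"max={abs(arr).max():.4f}")
```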
Expectations. Setting expectations is important, especially in the early stages of a project. People with little understanding of machine learning might think it's simple, but it's a proper engineering domain. Things are not simple and rarely work on the first try. Set expectations and explain how you will tackle the problem and what kind of metrics are reasonable to expect.
Why. Why do you even do machine learning? The first question I ask customers is, "What's the business problem you're trying to solve?" Understanding the business problem is central. Many people embark on ML projects without a clear goal, just because it's trendy. This is a recipe for disaster. Make sure you understand the business problem and challenge business owners on it. Not all problems can be solved with machine learning, so sometimes you need to use something else.
Zero. We would like to have zero prediction errors, but that's never going to happen. You will always have prediction errors, even with a high-performing model. Getting to zero errors is not possible, and this is an expectation you need to set. Looking at prediction errors is a great way to improve your model. They may point you to specific samples that are not predicted well, and you can come up with solutions to improve the model.
That's the end of this episode. If you want to learn more and get started with machine learning, I recommend our machine learning classes, which you can find at aws.training. You can also read my blog and follow me on Twitter for more content. Don't forget to subscribe to my channel and podcast so you won't miss future episodes. If you have questions or comments, I'm happy to read them. I'll see you around. Bye-bye.
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.