Introducing Amazon SageMaker Clarify, part 2: Model explainability (AWS re:Invent 2020)

December 08, 2020
Following up on part 1 (https://youtu.be/jvcPZmnXaxo), I show how to use the model explainability capability in Amazon SageMaker Clarify, using SHAP values computed on a credit model. https://aws.amazon.com/sagemaker/ https://aws.amazon.com/blogs/aws/new-amazon-sagemaker-clarify-detects-bias-and-increases-the-transparency-of-machine-learning-models/ ⭐️⭐️⭐️ Don't forget to subscribe to be notified of future episodes ⭐️⭐️⭐️ For more content: * AWS blog: https://aws.amazon.com/blogs/aws/ * Medium blog: https://julsimon.medium.com/ * YouTube: https://youtube.com/juliensimonfr * Podcast: http://julsimon.buzzsprout.com * Twitter: https://twitter.com/@julsimon

Transcript

Let's move on to explainability now. If we click on model insights, we'll see feature importance, showing which features contribute most to the predicted outcome. The first one is maturity month, and these are global SHAP values for the dataset, as expected. That's pretty intuitive; the risk for the bank is different for long-term versus short-term credit. A14, which stands for no checking account, also plays into the decision. The intuition is that without a checking account, you can't write checks, so you're not spending your money as easily. Loan amount is also interesting, as borrowing a huge sum of money versus a small sum certainly impacts the decision. A61, which indicates the amount in your savings account, is important too. This information is available in a report that you can export. It's a notebook, shown here, with feature importance and all the metrics we computed, both pre-training and post-training. This is useful.

Now, what if you want to understand how individual SHAP values work for each data instance? In our analysis report, we can find these metrics, which we saw in Studio. There's also the actual imbalance in the dataset. Feature importance is shown here, and we can find individual SHAP values for each data instance in S3 as an output of the analysis job. We can plot these values. If you've seen this before, it makes sense; if not, let me explain.

Here we see feature importance. Maturity month is the top one, followed by A14 (no checking account), and then loan amount. Top features are the most important, and bottom features are the least important. Each dot represents the feature value for an individual data instance, and the color indicates whether the feature value is high or low. For example, with maturity month, all the low values have a positive contribution to the predicted output. In other words, if maturity month is low (short-term credit), the chances of credit approval increase strongly.
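To connect the two views: the global feature importance shown in Studio is simply an aggregate of the per-instance SHAP values that the analysis job writes to S3. Here is a minimal, hedged sketch in plain Python of that aggregation (mean absolute SHAP value per feature); the feature names match the video, but the numbers are illustrative, not the actual Clarify output.

```python
# Sketch: derive global feature importance from per-instance SHAP values.
# The values below are made up for illustration; in practice you would load
# the per-record SHAP values that Clarify writes to S3.

shap_values = {
    "maturity_month":  [0.30, 0.25, -0.28, 0.31],
    "A14_no_checking": [0.20, -0.18, 0.22, 0.19],
    "loan_amount":     [-0.10, 0.12, -0.09, 0.11],
}

def global_importance(per_instance):
    """Mean absolute SHAP value per feature, sorted most important first."""
    scores = {
        feature: sum(abs(v) for v in values) / len(values)
        for feature, values in per_instance.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for feature, score in global_importance(shap_values):
    print(f"{feature}: {score:.3f}")
```

With these toy numbers, maturity month comes out on top, matching the ranking seen in Studio.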
These blue dots represent individual instances where low maturity month values increase the predicted probability, while red values generally decrease it. To summarize, if an individual instance has a low feature value for maturity month, the probability of credit approval increases; conversely, a high value for maturity month decreases the probability of approval.

Let's look at loan amount. Very high values have a negative contribution to the predicted output. If you want to borrow a large sum of money, the bank is likely to say no; smaller loan amounts are more favorable. Interestingly, some red dots suggest the bank might be interested in loaning large sums, perhaps due to higher interest rates or greater profit. Generally, though, a high loan amount is detrimental. You can explore this further and look at all the other features.

In a nutshell, SHAP values provide global insights into which features are important for the dataset, and they let us plot individual feature values to see their contributions to positive or negative outcomes. To sum up, SageMaker Clarify allows you to compute pre-training and post-training metrics on your dataset and model. You can see the various metrics in Studio, in the notebook report, and in S3. Feature importance is visible in Studio, and you can fetch the CSV file from S3 to plot individual feature values. Thank you, and I'll be back soon with more videos.
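The per-instance view described above rests on SHAP's additivity property: for each record, a base value (the model's average output) plus the per-feature contributions reconstructs that record's prediction. A minimal sketch with made-up numbers, again using plain Python rather than the actual Clarify output:

```python
# Sketch of SHAP additivity: base value + per-feature contributions
# reconstruct the prediction for one instance. Numbers are illustrative.

base_value = 0.50  # assumed average model output over the dataset

# Contributions for a single applicant: short-term credit (low maturity
# month) pushes the approval score up, a large loan pushes it down.
contributions = {
    "maturity_month":  +0.22,
    "A14_no_checking": +0.08,
    "loan_amount":     -0.15,
}

def explain_instance(base, contribs):
    """Print each contribution and return the reconstructed prediction."""
    for feature, value in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
        direction = "raises" if value > 0 else "lowers"
        print(f"{feature} {direction} the score by {abs(value):.2f}")
    return base + sum(contribs.values())

score = explain_instance(base_value, contributions)
print(f"predicted approval score: {score:.2f}")
```

This is exactly what each dot in the summary plot encodes: one feature's signed contribution for one instance, colored by the underlying feature value.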

Tags

Feature Importance, SHAP Values, Model Explainability, Credit Risk Analysis, SageMaker Clarify

About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.

Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.