Hi everybody, this is Julien from Arcee. In this video, I would like to announce that we open-sourced the bias computation part of Amazon SageMaker Clarify, the service we just launched at AWS re:Invent for bias detection and model explainability. There's a package on GitHub that lets you compute the metrics on your own datasets outside of AWS or, of course, read the source code and understand how those metrics are actually computed. So let's do a quick demo.
Here's the repository on GitHub. We can start by cloning it. Let's clone this. Okay. And we can install it. All right, hopefully not too many dependencies. No. So now we're good. And we have examples here, right? Yes. So let's just run that. Okay, so we see a couple of examples. Let's look at the first one. Let's clear the cells. Okay, so we import the SDK, and this uses the same German credit dataset that I used in the Clarify how-to video that's already online. Okay, so import this. Then we load the dataset from the website directly. Set some columns. All right, a thousand rows. Check out some rows. Okay. Plot some feature pairs. Okay, well, that's pretty nice. Count foreign workers. And that's exactly the same thing I did in the other video.
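The exploration steps above (check the shape, look at some rows, count the facet values) can be sketched with plain pandas. This is only an illustration: the tiny DataFrame below is made up and is not the German credit dataset, and the column names `foreign_worker` and `housing` are assumptions about how the columns might be named.

```python
import pandas as pd

# Toy data, made up for illustration (NOT the German credit dataset).
# Column names are assumptions.
df = pd.DataFrame({
    "foreign_worker": ["yes", "yes", "no", "yes", "yes", "no", "yes", "yes"],
    "housing": ["own", "rent", "own", "own", "free", "rent", "own", "own"],
})

# How many rows and columns do we have?
print(df.shape)

# Check out some rows.
print(df.head())

# Count the values of the facet column, as absolute counts and as shares.
counts = df["foreign_worker"].value_counts()
shares = df["foreign_worker"].value_counts(normalize=True)
print(counts)
print(shares)
```

Looking at the counts first is useful because a heavily skewed facet is exactly what the class imbalance metric picks up later.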
Okay, now we can compute pre-training metrics, meaning metrics on the dataset itself. The facet we have here is foreign worker, and we use a group variable for housing. This lets us compute different metrics for the different values of that feature. Okay, that's another metric you can configure. So that's simple enough. The bias report takes the dataset, the facet, the label, the pre-training metrics, and the group variable. Just run this. This is very small, so it should be fast. Okay, and we see the same bias metrics that we saw in the other video: class imbalance, DPL, and all the ones we already discussed. And we can do the same for age and print out the report. Yes, okay. So these are the same values and the same examples as in the other video. I think this is useful if you want to work locally and add bias metrics to your local workflow, your local experiments. And then you can use the same metrics when you train with SageMaker, and it's going to be the same thing. So this is a really convenient way of doing it.
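To make two of those metrics concrete, here is a minimal pandas sketch of class imbalance and DPL (difference in proportions of labels), following their usual definitions: class imbalance is (n_a - n_d) / (n_a + n_d), and DPL is q_a - q_d, where q is the share of positive labels in each facet. This is not the package's code or API. The helper names, the toy data, the column names, and the choice of which facet value counts as facet d are all assumptions made for illustration.

```python
import pandas as pd

# Toy data, made up for illustration (NOT the German credit dataset).
df = pd.DataFrame({
    "foreign_worker": [1, 1, 1, 1, 1, 1, 0, 0],
    "good_credit":    [1, 1, 1, 0, 1, 0, 0, 1],
})


def class_imbalance(facet: pd.Series, facet_d_value) -> float:
    """(n_a - n_d) / (n_a + n_d): how unevenly the two facets are represented."""
    is_d = facet == facet_d_value
    n_d = int(is_d.sum())       # rows in facet d
    n_a = int((~is_d).sum())    # all other rows (facet a)
    return (n_a - n_d) / (n_a + n_d)


def dpl(facet: pd.Series, label: pd.Series, facet_d_value, positive_label=1) -> float:
    """q_a - q_d: gap in the share of positive labels between the two facets."""
    is_d = facet == facet_d_value
    is_positive = label == positive_label
    q_a = is_positive[~is_d].mean()
    q_d = is_positive[is_d].mean()
    return float(q_a - q_d)


# Treating value 0 as facet d is an arbitrary choice for this sketch.
ci = class_imbalance(df["foreign_worker"], facet_d_value=0)
dpl_value = dpl(df["foreign_worker"], df["good_credit"], facet_d_value=0)
print(f"class imbalance: {ci:.3f}")
print(f"DPL: {dpl_value:.3f}")
```

Both values lie between -1 and 1, and values near zero mean the facets are balanced on that metric. With a group variable such as housing, the same kind of computation can be repeated per group value.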
And of course, we could go and look at the code. Let's look at the code: src, smclarify, bias, metrics, pretraining. Aha, okay. And we see how class imbalance is computed. Yes, it's going to be lots of pandas. How DPL is computed, et cetera, et cetera. Pretty cool. So you can see exactly how we compute those. If you'd rather read code than equations, which is my case, this actually helps a lot. Okay, all right, so try it out. Let me know what you think. Feel free to contribute, send pull requests or open issues. The maintainers will certainly appreciate your contributions. And make sure to star this repo and keep an eye on it. I'm sure more features will be added to this. Okay, well, that's it for this nice open-source announcement on SageMaker Clarify. See you soon, bye bye.