Using a SageMaker XGBoost model in scikit-learn

This is a quick post answering a question I get a lot: “How can I use an XGBoost model that I trained on SageMaker in scikit-learn?”

Here goes. Once you’ve trained your XGBoost model in SageMaker (examples here), grab the training job name and the location of the model artifact.

I’m using the CLI here, but you can of course use any of the AWS language SDKs.
$ export TRAINING_JOB_NAME='xgboost-190511-0830-010-14f41137'
$ export MODEL_ARTIFACT=`aws sagemaker describe-training-job \
--training-job-name $TRAINING_JOB_NAME \
--query ModelArtifacts.S3ModelArtifacts \
--output text`

$ echo $MODEL_ARTIFACT
s3://sagemaker-eu-west-1-ACCOUNT_NUMBER/sagemaker/DEMO-hpo-xgboost-dm/output/xgboost-190511-0830-010-14f41137/output/model.tar.gz

Then, download the artifact and extract the model.

$ aws s3 cp $MODEL_ARTIFACT .
$ tar xvfz model.tar.gz
x xgboost-model

The model is a pickled Python object, so let’s now switch to Python and load it. One caveat: unpickling imports the xgboost package under the hood, so it must be installed locally, ideally at the same version SageMaker used for training.

$ python3
>>> import sklearn, pickle
>>> model = pickle.load(open("xgboost-model", "rb"))
>>> type(model)
<class 'xgboost.core.Booster'>

You’re done. From now on, you can use the model as if you’d trained it locally. For example, you can dump it and visualize it.

>>> model.dump_model('model.txt')
>>> exit()
$ head model.txt
booster[0]:
0:[f2<512] yes=1,no=2,missing=1
    1:[f1<3.5] yes=3,no=4,missing=3
        3:[f2<1.5] yes=7,no=8,missing=7
            7:[f42<0.5] yes=15,no=16,missing=15
                15:leaf=0.508301735
                16:leaf=1.51004589
            8:leaf=1.72906268
        4:[f52<0.5] yes=9,no=10,missing=9
            9:leaf=1.39554036

See? That was super easy :)

Thanks for reading. Happy to answer questions here or on Twitter.


About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at AWS and Chief Evangelist at Hugging Face, Julien has authored books on Amazon SageMaker and contributed to the open-source AI ecosystem. His mission is to make AI accessible, understandable, and controllable for everyone.