SageMaker Fridays Season 3 Episode 2 — Easy data preparation with SageMaker Data Wrangler

In this episode, we start from the popular Titanic survivor dataset. We import it into SageMaker Data Wrangler, where we build visualizations and apply built-in transforms (column operations, imputing missing values, one-hot encoding, normalization). Then, we export these transforms to a Jupyter notebook running a SageMaker Processing job. We run the notebook and take a look at the processed dataset, before training a model with XGBoost. We also take a quick look at other export options (Python code, SageMaker Pipelines, SageMaker Feature Store). As usual, 100% live, no slides :)
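Data Wrangler applies these transforms through its visual interface, but the underlying ideas are simple. As a rough stand-alone sketch, here is what median imputation, one-hot encoding, and min-max normalization look like in plain Python on a few Titanic-style rows (the column names follow the Titanic dataset's conventions; the values are made up for illustration):

```python
import statistics

def impute_median(rows, col):
    # Fill missing values with the column median (like Data Wrangler's impute transform)
    present = [r[col] for r in rows if r[col] is not None]
    med = statistics.median(present)
    return [{**r, col: med if r[col] is None else r[col]} for r in rows]

def one_hot(rows, col):
    # Replace a categorical column with one 0/1 indicator column per category
    cats = sorted({r[col] for r in rows})
    out = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != col}
        for c in cats:
            new[f"{col}_{c}"] = 1 if r[col] == c else 0
        out.append(new)
    return out

def min_max(rows, col):
    # Rescale a numeric column to the [0, 1] range
    vals = [r[col] for r in rows]
    lo, hi = min(vals), max(vals)
    return [{**r, col: (r[col] - lo) / (hi - lo)} for r in rows]

# Toy rows: 'Age', 'Sex', 'Fare' as in the Titanic dataset, values invented
rows = [
    {"Age": 22.0, "Sex": "male",   "Fare": 7.25},
    {"Age": None, "Sex": "female", "Fare": 71.28},
    {"Age": 26.0, "Sex": "female", "Fare": 7.92},
    {"Age": 35.0, "Sex": "male",   "Fare": 53.10},
]

processed = min_max(one_hot(impute_median(rows, "Age"), "Sex"), "Fare")
print(processed[1])  # imputed Age, Sex_* indicator columns, normalized Fare
```

In the episode itself, none of this is hand-written: Data Wrangler generates the equivalent processing code for you when you export the flow.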

Join us for future episodes at https://amazonsagemakerfridays.splashthat.com/


About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at AWS and Chief Evangelist at Hugging Face, Julien has authored books on Amazon SageMaker and contributed to the open-source AI ecosystem. His mission is to make AI accessible, understandable, and controllable for everyone.