SageMaker Fridays Season 3, Episode 3 — Managing engineered features with SageMaker Feature Store

In this episode, we build a sentiment analysis model starting from the Amazon Customer Reviews dataset. First, we import the dataset in Parquet format into Amazon Athena, then load it from Athena into SageMaker Data Wrangler for a quick look. Next, we move to a Jupyter notebook, engineer features with popular open-source libraries (NLTK and spaCy), and automate that processing with SageMaker Processing. We then load the processed dataset into SageMaker Feature Store, in both the offline and online stores. After that, we run Athena queries on the offline store to build a training set, which we use to train and deploy a sentiment analysis model with the built-in BlazingText algorithm. Finally, we show how to update and delete individual records in the online store, and how to use timestamps for feature versioning. 100% live, no slides :)
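To make the timestamp-based versioning idea more concrete, here is a minimal, purely illustrative Python model. It does not call the SageMaker API: ToyFeatureGroup, Record and review_id are hypothetical names. It assumes that the online store keeps only the most recent record per identifier (ordered by event time), while the offline store keeps every ingested version, so a point-in-time lookup can still recover an older value. Check the Feature Store documentation for the exact semantics of late-arriving and deleted records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    review_id: str                                # record identifier (hypothetical name)
    event_time: float                             # event time, e.g. a Unix timestamp
    features: dict = field(default_factory=dict)  # feature name -> value


class ToyFeatureGroup:
    """Hypothetical in-memory model of a feature group with offline and online stores."""

    def __init__(self) -> None:
        self.offline: List[Record] = []       # append-only history of every ingested record
        self.online: Dict[str, Record] = {}   # most recent record per identifier

    def put(self, record: Record) -> None:
        self.offline.append(record)
        current = self.online.get(record.review_id)
        # Overwrite the online copy only if the new record is at least as recent.
        if current is None or record.event_time >= current.event_time:
            self.online[record.review_id] = record

    def delete(self, review_id: str) -> None:
        # Toy behavior: remove the record from the online view, keep offline history.
        self.online.pop(review_id, None)

    def get_latest(self, review_id: str) -> Optional[Record]:
        return self.online.get(review_id)

    def as_of(self, review_id: str, t: float) -> Optional[Record]:
        # Point-in-time lookup: the latest version whose event time is <= t.
        versions = [r for r in self.offline
                    if r.review_id == review_id and r.event_time <= t]
        return max(versions, key=lambda r: r.event_time, default=None)


fg = ToyFeatureGroup()
fg.put(Record("r1", 100.0, {"star_rating": 2}))
fg.put(Record("r1", 200.0, {"star_rating": 5}))   # newer version
fg.put(Record("r1", 50.0, {"star_rating": 1}))    # late-arriving, older version
print(fg.get_latest("r1").features)    # {'star_rating': 5}
print(fg.as_of("r1", 150.0).features)  # {'star_rating': 2}
```

The late-arriving record with an older timestamp lands in the offline history but does not replace the newer online value, which is what makes the timestamp usable as a version marker in this sketch.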


About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at AWS and Chief Evangelist at Hugging Face, Julien has authored books on Amazon SageMaker and contributed to the open-source AI ecosystem. His mission is to make AI accessible, understandable, and controllable for everyone.