AWS AI Machine Learning Podcast Episode 7

Transcript

Hi everyone, this is Julien from Arcee. Welcome to episode 7 of my podcast. Don't forget to subscribe to be notified of future episodes. In this episode, I'm talking to my friend Francesco, an experienced data scientist who has worked for a number of different companies, including Amazon Kindle. He's also active in blogging about data science. We talk about getting started with machine learning, running your machine learning projects, interacting with business stakeholders, and a whole bunch of different things. I'm sure you'll enjoy this conversation and learn a few things. So let's not wait and let's listen to Francesco.

Francesco, thank you very much for taking the time to speak to us today. So let's start with a quick intro. Tell us about how you got started with data science and machine learning.

Hello, Julien. It was almost by chance for me. I started at the end of 2013, looking into Python mainly. By chance, I stumbled upon Andrew Ng's course on Coursera. On the front cover, there was a robot, and everyone was talking about this new thing called machine learning, which I knew nothing about. I decided to give it a try. Interestingly, the homework was in MATLAB, not Python. However, the instructor was so engaging that I stuck with it. The course lasted 12 weeks and was my first introduction to machine learning. After that, I thought the best way to get started was Kaggle, so that was my next step.

I recommend starting with Andrew Ng's course and doing the homework in Python. The course takes about 12 weeks, up to four months. Once you complete the course, move on to "The Elements of Statistical Learning." It's like the Bible of machine learning, though it's a huge book. There's also a more accessible version, "An Introduction to Statistical Learning," which uses R. Stick with that one. Then, definitely try Kaggle. Competing is great because it gives you an immediate benchmark.
It's crucial to compare your results with others to avoid getting stuck in a comfort zone. Another often overlooked skill is SQL. I rarely see this mentioned in blog posts or discussions about machine learning. In your day-to-day job, nobody will get your data for you. You need to be able to handle data from various sources, often undocumented, and without good SQL skills, you won't get far.

This leads to my next question: When you start working on a new project, what are the first few steps you take? Is there a general way of addressing a new machine learning project, or is it custom every time?

The very first step, and the hardest part, is understanding the business context. For example, when I worked for Amazon Kindle, we were asked to predict whether an ebook would sell well online. The challenge was to consider the broader business context. Is the ebook just part of the business, or should we also consider the paper side? If the ebook cannibalizes paper sales, the overall sales volume could decrease, making publishers unhappy. This is a critical aspect that isn't always obvious but must be incorporated into your model. Framing the business problem correctly is essential.

A machine learning algorithm is just a mathematical formula and doesn't understand business nuances. You need to incorporate this knowledge into your dataset or tweak your loss function. This knowledge comes from business people, and it's crucial to gather it early to avoid having to restart your project. A machine learning solution should solve a business problem. There's no machine learning solution just for fun. Business problems are complex, and you need to understand the context to ensure you're solving the right problem effectively.

Another important aspect is explaining your model to business stakeholders. When a product manager asks why something is happening, you can't just say you don't know.
You need to understand and explain the relationships between features and the dependent variable. This helps you deeply understand the problem and provide valuable insights to the business.

Interpretable machine learning is a huge area of progress, but it's often overlooked. As machine learning practitioners, we're not just solving problems; we're also providing advice to improve business processes. This requires a deep understanding of what your model is doing. Modeling the problem teaches you about the data and the business problem itself. It's not just about prediction; it's about understanding hidden relationships and making informed business decisions.

Democratizing machine learning is exciting. It should be used by business analysts who aren't necessarily experts in algorithms. For example, if a product manager asks about the relationship between house price and surface area, your answer should account for all the other features. This is where models and partial dependence plots come in, helping to understand the relationships accurately. Education is crucial at all levels. Start coding and using models to understand what's going on. They are extremely powerful tools.

There's one last point I want to stress. Whenever I talk to new people in this domain, I tell them to create a blog. Start from what you love. If you're interested in computer vision, train a cat versus dog classifier and write about it. This is an incredible experience and a great way to share your journey. Many people focus on learning every statistical method before writing code, but it's better to start coding now.

That's a great conclusion. Thank you very much for taking the time and sharing your knowledge. This is really invaluable. Thank you, Francesco. I hope to see you soon.

Thank you, Julien.

That's it for this episode. I hope you enjoyed it. Don't forget to subscribe to my channel, and I'll see you soon with more conversations and content. Until then, keep rocking.
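
To make the house price and surface area example from the conversation a little more concrete, here is a minimal sketch of how a partial dependence curve can be computed by hand. Everything in it is an assumption made for illustration and none of it comes from the episode: the four houses, the feature order, and `predict_price`, which is a hypothetical stand-in for whatever trained model you actually have. The idea is to force one feature to a fixed value for every row, keep the other features as they are, and average the model's predictions.

```python
from statistics import mean

# Hypothetical toy data: (surface_m2, rooms, age_years). Values are made up.
houses = [
    (50.0, 2, 30),
    (80.0, 3, 10),
    (120.0, 4, 5),
    (65.0, 2, 50),
]


def predict_price(surface_m2, rooms, age_years):
    """Hypothetical stand-in for a trained model's prediction function."""
    return 2000.0 * surface_m2 + 10000.0 * rooms - 500.0 * age_years


def partial_dependence(model, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value.

    All other features keep their observed values, so the curve reflects
    the effect of the chosen feature averaged over the rest of the data.
    """
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # override only the chosen feature
            predictions.append(model(*modified))
        curve.append((value, mean(predictions)))
    return curve


# Partial dependence of the predicted price on surface area (feature 0).
surface_curve = partial_dependence(predict_price, houses, 0, [50.0, 100.0, 150.0])
for surface, average_price in surface_curve:
    print(f"{surface:6.1f} m2 -> average predicted price {average_price:,.0f}")
```

Because the stand-in model is linear, the resulting curve is a straight line. With a non-linear model such as a gradient-boosted ensemble, the same procedure can reveal plateaus or thresholds, which is the kind of relationship a product manager usually wants explained.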
