Building AI Solutions that Matter | Julien Simon, Chief Evangelist @ Hugging Face | LWD 21

January 31, 2024
In this episode of Leading With Data, we interact with Julien Simon, Chief Evangelist at Hugging Face 🤗. Julien Simon is an engineer at heart and has a knack for explaining complex technologies in a simple, understandable format. Watch as he delves into:

👉 How getting hooked on computers as a young boy charted his journey in this field
👉 His Evangelist role at Hugging Face and, previously, AWS
👉 Why cutting-edge tools may not always be the solution for real-world problems
👉 The need for more and better open-source LLMs
👉 How he comes up with stunningly simple explanations of complex technologies
👉 Why it is important to drill down to the core of the customer's problem
👉 His bet on industry-leading open-source LLMs and local inference for the coming years

-------------------------------- Important Links 🔥 --------------------------------

✅ Watch & discuss on AV's Community Platform: https://bit.ly/48LDsuu
✅ Listen on the go across leading podcast platforms:
1️⃣ Spotify 🔗 https://tinyurl.com/2av72fpn
2️⃣ Apple Podcasts 🔗 https://tinyurl.com/bdcmzcae
3️⃣ Google Podcasts 🔗 https://tinyurl.com/mhe8977d

Transcript

Unfortunately, I see a lot of folks doing it the opposite way. And that's why I make fun of those LinkedIn influencers who post something mind-boggling about a new attention layer or something. And it's like, so what? I work in a Fortune 1000 company. I want to classify documents or build a customer support chatbot. Why do I care about FlashAttention 2 and vLLM? What's the point? And these are great, by the way. These are useful, and there are plenty more. But you have to turn it on its head and say, okay, what's the problem we have? Why would we even need faster attention layers and faster inference servers, etc.? What's the problem there? And then take it from there, dive deep, but never lose track of the original problem you're trying to solve. Otherwise, yeah, you're feeding your curiosity and you're feeding your brain, but something needs to come out for customers in plain English. And that's my obsession.

Hi friends, and welcome to this very exciting episode of Leading with Data. Today I have with me Julien Simon. Julien is an evangelist at Hugging Face, and he was an evangelist at AWS before that. He is someone who is very passionate about community and open source, and I think it's going to be so much fun talking to and learning from him about his experiences and how to build community. So, looking forward to the discussion. Julien, welcome to the show.

Thank you. Thank you very much. Always a pleasure to talk to you.

Great. So, Julien, let me start the discussion with what you mention on your LinkedIn: that you are an engineer at heart. You spend quite a bit of time with various CTOs, evangelizing analytics, AI, and ML solutions, and building community. Can you, in your own words, explain the journey and how you ended up in this role? What were the early days like for you? So, what's your story?

It's a long story, so I'll try to keep it short and entertaining. Honestly, I'm the poster child for the kid who got a computer really early on, a century ago, literally. I wasn't even in high school yet, and within an hour or two, I knew this was it for me. That was my love, my passion, my career eventually. At the root of this, it's just curiosity, trying to understand how things work. And I always thought computers were the most complex machines out there, the most complex devices that the human mind has ever built. Even a century ago, they were already quite complicated and interesting to a young boy, and well, they certainly got better, and more complicated, in the following decades. And that's what drives me. At the end of the day, I love to learn new things. And with AI, there's certainly no shortage of that, from hardware to frameworks to APIs to cloud services. There's just a never-ending list of things you can learn. So you have to focus, but I guess I found my niche, so to speak.

I've been working on deep learning since late 2015, I think. So it's been a few years; time flies. And I don't have any background in AI. I'm a software engineer. Electrical engineering and software engineering, I guess, because a century ago you had to do electrical engineering before you could do software, right? What a waste of time. No offense to electrical engineers, but it was a bit of a waste for me. That's where I come from. So over time, I worked with internet platforms as a CTO and VP of engineering in different startups. And data was always in the picture.
Whatever platform you run, whatever industry you're in, in the internet age there are going to be databases, and there's going to be structured data and unstructured data. And then big data shows up, and well, let's learn big data and try to build something with it. And then machine learning pops up. It's like, well, yeah, maybe you could do machine learning. And then deep learning pops up, and well, what is this thing? And now transformer models and generative models. So I think that's really what I love doing. And hopefully, I can do this for some more time and keep my brain active and very busy.

In a nutshell, the evangelist role, or developer relations role, is an interesting career because, in my opinion, it's about getting paid to learn stuff. Getting paid to sit at this computer for hours and days and weeks, banging my head, virtually of course, on the keyboard and the screen, trying to understand those novel techniques, and really understand them. Not just, okay, I ran Hello World or I read a blog post, and that's fine. Again, nothing wrong with that. But if I'm going to explain it to developers and customers and enterprise users and CTOs, et cetera, I need a very, very deep understanding of the tech. So anytime you see one of my videos on YouTube, which is, I guess, my main focus (I've been writing a little less these days; I should go back to that), those 30- to 40-minute videos are sometimes full-time weeks of just me trying to figure things out. So maybe I'm just slow, or maybe I'm just obsessing over details, but that's what it is. That's what this job is.

And joining AWS as a tech evangelist years ago, and now Hugging Face two and a half years ago as chief evangelist, is that same story, I guess: me learning stuff, being patient, overcoming my frustration at not understanding. So if you think everything comes easy to me, and sometimes people tell me that, like, oh, you make it sound easy. Well, yes, thank you, because I work very, very hard at making it sound easy. But believe me, nothing is easy, okay? I am the worst learner, the worst tester, the worst developer, the worst anything at everything, which makes me the perfect person to hit every single problem, write them all down on a piece of paper, then write the answers, hopefully, that I find to those problems, and then put everything together and explain it to everybody out there, whether in person or online. That's really what the job is. It is an oddball job, I have to say, but it works for me. The old kid still wanting to learn, getting paid for that, and then generally trying to be helpful and wasting a lot of time so that you don't have to. That's really what the job is about.

There is so much interesting material to unpack there, so let me start. One trait I notice, when I reflect on our journey or on anyone who has been successful, is the one you mentioned: machine learning popped up, then deep learning popped up, and then more recently transformers and generative AI. Personally, I've felt that being in the community gives you a bit of an advantage in seeing what is trending, what is gaining a lot of attention. But for you, how does that work? On one side, how do you keep yourself up to speed with what is happening? And then, do you feel that your evangelism role kind of puts you out there?

Yeah, it's a good question. I think it's actually one of the harder things, because sitting and learning stuff, I mean, yeah, I'm used to it.
I've learned, and I guess forgotten, so many things over time. My brain is plastic enough to do that and keep doing that for a while. The bigger question is: what should I be focusing on? And it's very tempting to keep chasing the latest trend. I mean, just open your LinkedIn feed, right? Do it now. All of you watching and listening to this, do it now. You'll probably see 10 different posts from 10 different people claiming whatever model is the best, whatever training technique is the best, whatever open source project is the best, blah, blah, blah. And again, there is nothing wrong with that. Hopefully, those folks know a little bit about what they're talking about, which is a completely different problem. But anyway, you see all these things, and the question is which ones you actually invest time and energy into, right? You could be spending two hours a day reading advanced research papers, but is that what is helping you progress as an individual? And I guess more importantly, is this what your organization, or whatever business you're working for, needs?

So I focus on things that move the needle in terms of business value, which has always been my focus. And when I say business value, I don't necessarily mean commercial stuff. If you work for an education organization, then business value means improving student outcomes. And if you work for a hospital or a medical organization, it means improving patient outcomes. So it doesn't have to be about generating dollars or any other currency. It's really about solving business problems. And all the bleeding-edge techniques are interesting, and I'd love to be able to read them all and try them all, but that's not really where customers are. When I'm not learning stuff, I spend most of my time with enterprise customers and sometimes public sector customers. And the problems they have are real-life problems. They're not moonshot projects. They're not trying to build the craziest, most advanced project in the world; they have real-life problems to solve. And so you need to find the right set of tools and techniques that will help them do that in a reasonable timeframe, with minimal risk, reasonable cost, etc. So you can't just pull the latest preprint from arXiv saying, oh, look at this, Meta did this, or Google did that, or Microsoft PhDs did this. It's like, well, we don't understand any of it, and it's not really applicable to what we do anyway. So thank you, but no thank you. That's where I focus. I keep an eye on the bleeding-edge stuff, but I spend most of my time on what's actionable and usable by real-life practitioners today.

And how do you do that? Do you constantly reference the problems you have come across, or is there a framework of sorts which you have built?

Well, if you've been around for a while, which is certainly the case for me, you get a sense of, yeah, this is a really cool thing. But first of all, is it even available in an open source library, or is it purely research? And congrats to whoever wrote the paper. I understand half of it, and I skipped the math and everything, and it's like, yeah, this is really good, but it's not available anyway. It's pure research, so it doesn't work for me and our customers.
If it's already something that's implemented, if it's a new model or a new training technique and the Transformers library supports it or PyTorch supports it, then, if it's something that a reasonable data science team could start looking at and testing in a matter of hours, okay, maybe I'll take a look. But more than anything, again, yes, I will start from what customers are asking. I spent six years at Amazon, at AWS specifically, so working backwards and customer obsession are still very, very strong things for me. And I believe there's a ton of value in just shutting up, listening to pain points from real-life people in industries you know nothing about, and trying to help them out. And most of the time, they're not looking for another deep learning optimizer or another insanely complicated distributed training technique; they have more pragmatic, down-to-earth questions. But if you solve those, then they're on their way to creating business value and being successful. So it's a balance between going crazy with engineering and research, feeding my hunger and my curiosity with the bleeding-edge stuff, and trying to be a little more reasonable, a little more pragmatic, saying, yeah, okay, don't spend too much time on this, because you won't get that question for another six months, maybe. Instead, why don't you go and dive into, I don't know, data cleaning or cost optimization or something, because that's really what people want to do right now.

And once you have zeroed in on some of the topics which you feel are relevant for customers, I'm presuming you spend time distilling them, because you mentioned that you spend a lot of time simplifying them. And then you obviously post on Medium and create videos. During this process, if you had to rate the level of clarity you yourself reach, let's say when you first saw it on LinkedIn it was zero, and when you finished the process it was, let's say, 100, how does that happen? What does that process look like?

So I always try, regardless of how much I may already know, to approach any new tech, any new thing I'm investigating, as, I would say, a vanilla customer. I'm trying to shut down part of my brain and part of my memory, because initially I want to take a look at this thing with a fresh set of eyes, and definitely not as somebody who's been obsessing about deep learning for the last seven, eight years. So initially, I am really playing Joe User: copying and pasting, running the Hello World example, and not trying to second-guess anything. Okay, let's run the Hello World thing. If it's not running, then something's wrong, and sometimes it's not running. Just start from there, then read the docs, read more examples, and absolutely do not jump to the GitHub repo and start reading code. Most of the time I do that too, but later, because at some point I think I have a mental model of how this works, and I want to validate my assumptions by reading code, so I go and do that. Just to get a little ahead of anybody else, because I probably won't get those questions, but knowing a little more is going to help me answer everything else with more clarity and more certainty. I never like to say, oh, I think it works that way, or I suppose it works that way, or maybe it works that way. I want to know. I want to know how it works.
And then, okay, if I get that question, I know. If I don't, I can use some of the low-level details to better answer the question in plain English. So again, it's trying to stay in sync with customers, what they need right now, how much depth they need, and working in layers. And if you look at my content, some of it is beginner-friendly, and I'll make it clear: if you're completely new to this, then great, welcome. Some of it will be deep dives, and that's definitely a few layers down. And sometimes I go hardcore, and I'll just say, well, if this is the first time you're watching anything or reading anything about this, this isn't the one for you. So I think you need to understand the problems at different levels. What's the purpose? What kind of business problem is this helping solve? What are the use cases? What's the developer experience like? What's the cost like? What's the performance like? You know, the initial questions that will help you, I would say, shortlist technologies or models or whatever. And once you understand that, yes, this works in this case and not in that case, and this is how I would use it, once you figure that out, which is really 90% of what customers will want to know, then you can spend more time diving deeper.

But unfortunately, I see a lot of folks doing it the opposite way. And that's why I make fun of those LinkedIn influencers who post something mind-boggling about a new attention layer or something. And it's like, so what? I work in a Fortune 1000 company. I want to classify documents or build a customer support chatbot. Why do I care about FlashAttention 2 and vLLM? What's the point? And these are great, by the way. These are useful, and there are plenty more. But you have to turn it on its head and say, okay, what's the problem we have? Why would we even need faster attention layers and faster inference servers, etc.? What's the problem there? And then take it from there, dive deep, but never lose track of the original problem you're trying to solve. Otherwise, yeah, you're feeding your curiosity and you're feeding your brain, but something needs to come out for customers in plain English. And that's my obsession.

And yeah, I think the other challenge is that you don't realize when you've moved from feeding your curiosity to feeding your dopamine kicks, and then you end up just scrolling through more and more information.

Yeah, it's a good point. For somebody like me, or folks in similar roles, the danger is going all in on personal satisfaction, and then building content or going on stage and basically showing off, saying, look how cool I am, look how smart I am, look how much I know, blah, blah, blah, and live coding and blah, blah, blah. That's the huge risk. And I think you need to shift the job satisfaction to getting input: in-person feedback, or online feedback, or an email saying, I watched this, or YouTube comments, which I love and try to answer as many of as I can. Folks telling me, oh, wow, I've been trying to understand this for six months, I watched your video, I got it, thank you. Or, I never saw it better explained, or something like that. It doesn't happen all the time, but it does happen from time to time. And that's me. That's my dopamine kick. It's like, okay, I hit the spot. I put in plain English something that's amazingly complex, and I tied it to business problems that folks have in real life. That's it.
That's the bull's eye for me. So you need to be aware of that, understand what people expect from you and what will make them happy, and focus on that.

In this process, have you started using any of the generative AI tools to accelerate your own research and your own shortlisting?

I don't use any of it for my writing, because I think generating stuff and just copying and pasting it into a blog post or something is just plain wrong. At the end of the day, my domain knowledge and my style are what make me successful. And if I just take a shortcut and copy-paste whatever an LLM spits out on a particular domain, even if it's a factually correct answer, it's too bland. It's generic. And I think if people like my content, or don't like my content, it's because of that. It's because of what I've learned over the years, and who I am, and how I tell it like it is and make funny jokes, or not-so-funny jokes, or use strong words sometimes. And that's it. You like it or you don't. Like a journalist, it's like anyone else, right? You feel a connection to that person. So using models, using AI for that doesn't really work. Sometimes, when I struggle with a particular sentence and I'm not quite sure it sounds okay, yes, maybe I'll say, can you simplify this? Can you make it sound more blah, blah, blah. But just a tiny bit. And generally, and you have to trust me on this, I use HuggingChat for that, our own chatbot. So go check it out at huggingface.co/chat. We have the latest models, and they're amazing. But yeah, that's really all I use. I guess I'm too old-fashioned. I'm a craftsman in a way. And I think the way I work is very personal, very lonely in a way. It needs to be the screen and me, and coffee or tea. And then it clicks. And once it clicks, writing is very fast for me. I never had any problem with writing. So I don't think I need a lot of assistance. I just need time to figure things out. Time and quiet.

Interesting. And yeah, I completely agree that the outcome comes from your own personal involvement in the creation process. Shifting gears a bit: you obviously spend a ton of time with different customers and their problems, and you are usually very busy. Yes, it was difficult to schedule this, and I forgot to apologize for that, but we made it. Along with that, on a macro level, there is so much action happening in generative AI. There is so much buzz, so many discussions happening. So when you step back and filter all this activity down to what's meaningful, what are some of the key trends, the key things you observe? And how do you see this entire development over the last year, year and a half?

Yeah, so if we look at 2023, the first quarter was really a lot of experimenting and just understanding what generative AI and LLMs could do and could mean in your organization. And the cool thing is, it wasn't only, I would say, engineers; it was literally anybody. Your HR manager, your marketing manager, your finance manager could go and ask ChatGPT stuff and see how well or how badly it would do. And, by the way, could we use this in our internal workflows and business processes, etc.? So very good. And then in Q2, we started seeing the first wave of open source LLMs, right? You know, LLaMA, Alpaca, Koala, all the animals starting to show up. And I think it was the first sign that, okay, maybe there's a choice here. It's not just OpenAI.
We're starting to see the open source community and research community figuring things out, training models, building datasets, etc. And of course, there was still a pretty large gap. And then during the summer, LLaMA 2 came out from Meta.

LLaMA 2 and then Falcon.

Yeah, and then very quickly Falcon. And those two, I think, were the wake-up call that open source LLMs are very serious contenders. And even though the quality gap is still there, it's closing very, very fast. And by then, I think some early-adopter customers realized that they would run into issues with OpenAI and other closed models, in terms of maybe compliance and certainly cost, et cetera. And so they saw those first few great LLMs out there and started experimenting. And I think Q3 and Q4 have just been insane in terms of acceleration: the bigger Falcon, and Mistral, and Mixtral, and fine-tuned variants, and it never stops, right?

And now, plus, you know, the OpenAI shenanigans, I think, convinced everybody that you can't put all your eggs in the same basket. And if you've been in tech long enough, you know there isn't one programming language to rule them all. Same for databases, same for storage systems, same for networking equipment, same for everything, right? So why would there be a silver bullet for AI when we never saw one anywhere in tech in the last 50, 60 years? Just like every other field in technology, you'll have a panel of solutions, and closed models are a great solution for some use cases. And I think the productivity side of things, the copilot-everything on the Google and Microsoft side, is certainly valuable, and I'll certainly be using it. But then look at enterprise use cases, where you want to create a competitive advantage by training models on your proprietary data, then deploying them internally and getting maximum ROI from that, because again, cost is everything, as a lot of folks are now figuring out, or have figured out in the last few months. There, you'll probably turn to open source LLMs that you can fine-tune and deploy and optimize and control and secure. So you have a whole spectrum of solutions, and it's a toolbox. And as a practitioner, as a professional, you need to understand what those tools are for and when to use what. And that's great. Competition is great. Customer choice is great. If anything, we need more. You know, we're getting close to 500,000 models on the Hugging Face Hub. By the time you're watching this, maybe we've hit that crazy number. And I think that's fine, because you'll certainly find something that's close enough to your business problem. You can start experimenting real quick, and you'll be on your way. So I think perception is more balanced now, because of that experimentation cycle and some of the early disillusionment that happened last year with closed models. So yeah, very, very interesting. And I think 2024 is going to be even crazier.

Yeah, and just double-clicking on this: let's say I represent a bank which has its own set of data, all the historical data, data about customers, risks, etc. And I'm having a discussion with you about how we can use generative AI for our use cases, and it could be risk, it could be customer management. How would you guide these customers? And what would be your framework to help them think through these challenges?

So first, I would ask a lot of questions, making sure I understand the use case.
Because, again, there is no technology or model to rule them all. And, well, I guess over the years I've learned a few things about use cases in finance or in manufacturing, but I know so little compared to you. If you've been working in the bank for 10 years, you know everything there is to know, and I hardly know anything. So I need to understand what you're trying to build. What are the key metrics that matter? Compliance would certainly be very high on the list, and non-negotiable. Domain adaptation would certainly be very important, because banks, like other companies, tend to have crazy jargon and internal policies and internal knowledge that could be quite different from one bank to the next. And cost would certainly be important. So, try to understand the use case, try to understand what matters to you. And then see if this is really a Gen AI problem or not. Because if we can downscale this to, I would say, a traditional transformer problem, or maybe a simpler machine learning problem, then perfect. I keep saying the best AI is no AI at all. AI is complicated. AI is risky in some ways. AI is expensive if you do it wrong. So if you can solve the same problem with a model that's 10x or 50x smaller, great, right? I want to make sure people are not jumping to conclusions and picking Gen AI because they're excited, or because so much is happening these days that they think it's the Swiss Army knife solution to their problem.

If it really is a Gen AI problem, then of course we need to talk about model selection. Which languages do you need to support? I travel a lot, and I'm not a native speaker, so I'm very well aware that English is not the universal language. There are parts of the world that don't speak English, that don't want to speak English, and that's totally fine. That's how it should be. So you need to find models that support the local languages, and you need to start experimenting. I push customers to start experimenting as early as possible, because generative AI is still a very new topic, and everybody has an opinion, and we shouldn't really care about opinions. We should care about facts, and we should make decisions based on data and KPIs. So instead of discussing things to death for weeks with governance committees and whatnot, please start experimenting as quickly as you can in your sandbox. You'll be safe. There's no risk in doing that. Start to figure out: okay, with these two or three LLMs that are shortlisted, and with this initial test set of prompts, what do the answers look like? What do I like? What don't I like? Can I do some early prompt engineering to tweak the output, to get the right tone, the right brevity, et cetera? And generally, how does the model behave on my domain? How much does the model know from its training? How much does it know about my domain? Is it knowledgeable about banking, or is it always saying stupid things? Maybe that helps you shorten the list again. And then you can start plugging in external sources of truth using technologies like retrieval-augmented generation, which is very, very easy to prototype if you use, let's say, LangChain as an orchestrator and a simple vector index. You don't need a fancy vector database; your Elasticsearch cluster or whatever is already there will do, right? Start plugging in your own knowledge and, again, start learning what works, what doesn't work, and which one of those models would be a better fit.
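To make that prototyping loop concrete, here is a minimal sketch of the kind of RAG experiment described above: LangChain as the orchestrator, with a plain in-memory FAISS index instead of a dedicated vector database. The import paths match LangChain as of early 2024 and may have moved since; the model names, file path, and question are illustrative placeholders, not recommendations.

```python
# Minimal RAG prototype: LangChain as orchestrator, a simple FAISS index
# as the vector store. Requires: langchain, faiss-cpu, transformers,
# sentence-transformers. All names below are illustrative placeholders.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

# 1. Load and chunk the internal documents (the "source of truth").
docs = TextLoader("internal_policies.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# 2. Embed the chunks and build a simple in-memory index.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.from_documents(chunks, embeddings)

# 3. Wire one of the shortlisted open LLMs to the retriever.
llm = HuggingFacePipeline.from_model_id(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",  # any shortlisted model
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 256},
)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=index.as_retriever())

# 4. Ask domain questions, judge the answers, swap models, and repeat.
print(qa.run("What is our internal policy on credit risk reporting?"))
```

Swapping the `model_id` lets you compare shortlisted models against the same prompt set, which is exactly the fast iteration loop he advocates.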
And then maybe at some point you select one, and you go into fine-tuning to make it even more relevant to your domain, et cetera. But it is a super-iterative process. You should be able to iterate within days, or even hours if you can. And within a few weeks, you should have a reasonable POC that shows the promise of business value. Then, okay, you can pause and start looking at how to make it even better. Keep iterating on it. What are the boxes I need to tick? Who are the folks I need to keep happy in terms of compliance and cost, etc.? That's a parallel track, but you should never stop experimenting. Experiment, experiment, experiment, until you know enough about what you like, what you don't like, what some of the risks are, and how to mitigate them. Then pause, launch the compliance and security and cost management and ops topics, because you will need these to get to production. And keep iterating and improving the model. At some point, things align, and you'll have a production-ready solution, but you'll probably never stop iterating on it. So the short version is: experiment, experiment, experiment, and make sure this is really a Gen AI problem.

Yeah, that is 100% true. So I think there are two or three things you have mentioned. One, obviously, make sure it's a generative AI problem and not one of the classical ML problems. You also mentioned a trend where open source is clearly gaining a lot of momentum; some of the pros of being open source are coming across, and people are realizing that. Small models are, I think, another thing you referred to that could be interesting going forward. But specifically, when you look forward to the next 12 to 24 months, where do you see a lot of development happening, a lot of action? What are some of the things you're looking forward to?

So I think, I mean, we're at the beginning of the year 2024, and we're at the stage where I can say with a straight face that the best open source LLMs are on par with the best closed models, right? And if you think I'm wrong, then go to HuggingChat, experiment with Mixtral, which is one of the models we support for the chatbot, and tell me what you think. It feels to me this is as good as anything else. So now we need to leave them in the rearview mirror. I'm not satisfied with parity. I really want open source models to be the best and to keep outperforming, not only in terms of business value and language quality, etc., but also in terms of cost performance. Because one thing that comes up a lot when I speak with enterprise customers is the nasty surprise they got when they moved their OpenAI POC to prod. The per-token thing is, I don't want to say deceiving, that's not the word I'm looking for. I'd rather say it's easy to miscalculate how much it is going to cost at the end of the month, especially if you plug in RAG systems, et cetera. And I've met a lot of customers who've really been bitten by this. It's not a cheap shot at OpenAI; they've done an amazing job. It's just that a lot of folks didn't really understand how much this was going to cost in the end, and it was way too much for them, and they had to pull the plug. So cost is important, and working with smaller open source models will obviously help. Model optimization is one of my favorite topics.
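As a concrete illustration of the kind of optimization he is alluding to, here is a hedged sketch of loading an open model with 4-bit quantization through the Hugging Face `transformers` library, so it fits on far more modest hardware. It assumes the `transformers`, `accelerate`, and `bitsandbytes` packages and a CUDA GPU (CPU-only deployment would use a different toolchain, such as GGUF runtimes); the model name and prompt are examples, not endorsements.

```python
# Sketch: shrinking an open LLM with 4-bit quantization so it runs on
# modest hardware. 4-bit weights take roughly a quarter of the memory
# of fp16, at a small quality cost.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder open model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16 for quality
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s)
)

prompt = "Summarize the main cost drivers of LLM inference in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same idea scales down further: quantize a model that is already small, and the cost-per-answer math he describes changes dramatically compared to per-token API pricing.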
I started with tiny computers with just a few kilobytes of RAM, and I believe small is beautiful in tech. Shrinking models while maintaining business value, and running them on commodity CPUs, is an exciting prospect.

Video generation is still in its early stages, and creating consistent outputs for longer durations is a challenge. If you had to put a timeframe on when a good open source model might tackle this, what would you say?

The pace of innovation on models will continue to accelerate. Models like Mixtral are getting very close to outperforming the best closed models, but they are still large and complex. There's a strong incentive to create models with just a few billion parameters that outperform the best closed models. We're seeing a lot of work on optimizing inference, such as FlashAttention and PagedAttention, to serve large models more efficiently. Fundamental advances in the transformer architecture and attention layers during training will likely lead to smaller models with the same quality. Applying optimization techniques to quantize and shrink these models will enable them to run on local machines. It's interesting how some people think AI is solved, but if you've been in tech long enough, you know nothing is ever truly solved. New programming languages continue to emerge, and there's always someone with a better idea. The AI competition is now global, with engineering talent everywhere. For example, Falcon came out of Abu Dhabi, which was a surprise. Local talent is crucial for language models, and we're seeing a lot of regional LLMs, such as those for Scandinavian languages and the SEA-LION model for Southeast Asian languages, developed by AI Singapore. These models are essential for building applications that understand local languages and cultures, bringing smart apps to people everywhere.

Towards the end of the show, we ask a few rapid-fire questions to get to know you better as a person. What's a dream holiday for you?

Being on holiday anywhere is good enough, but I'd love to visit the very south of South America, like Argentina and Chile. Antarctica would be awesome, but it's a long flight and not a common destination for AI conferences.

When you create content, you tend to be by yourself with coffee and music. Can you elaborate on that? What does that environment look like?

I try to block full weeks in my schedule to focus on content creation, which means not attending meetings or traveling. I have a short list of two or three things I want to understand by the end of the week. I immerse myself in reading docs, code samples, and writing code, while documenting every question I have. By the end of the first day, I usually have 65 questions to answer, and it takes the rest of the week to go through the list. Isolation is crucial for deep content creation.

What is your favorite pastime when you're on a flight?

I try to get some sleep, but I'm terrible at it. Jet lag is a killer for me. If I can't sleep, I read. I've seen all the movies multiple times, so I usually bring a book. I love fantasy, and I can read "The Lord of the Rings" 90 more times and still discover new things. It's a big book, so even on long flights, I won't finish it.

Thanks a lot, Julien, for sharing these insights and stories. It's been a pleasure talking to you.

Before you ask, I know I haven't been to India in a while. My last trip was pre-COVID. I'm dying to go back and spend time with the amazing tech community there. Hopefully, I can be back in 2024 and enjoy what I think is possibly the best food in the world.
Thanks a lot.

Tags

Generative AI, Open Source LLMs, Enterprise AI Adoption, AI Cost Optimization, Model Experimentation

About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.

Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.