EP01 Interviewing Julien Simon: The Future of AI, Small Models, Open Source Community
February 26, 2025
In this deep-dive conversation with Julien Simon, Chief Evangelist at Arcee AI, we explore the fascinating world of AI models, fine-tuning techniques, and how smaller models are revolutionizing the industry! 🤖✨
⏱️ Timeline:
0:00 - 2:30 Open source revolution in AI and how the landscape has evolved
2:30 - 5:45 "Mile Wide, Inch Deep" vs. "Inch Wide, Mile Deep" Models 🧠
5:45 - 8:58 Practical advice on fine-tuning models with proprietary data
8:58 - 10:55 Evolution of AI models and tone of voice capabilities
10:55 - 19:12 Role of a Chief Evangelist and building bridges between tech and business 🌉
19:12 - 24:17 Why small language models matter: cost efficiency & accessibility
24:35 - 27:14 The impressive evolution of small models (10B outperforming 72B in less than a year!) 💪
28:02 - 30:21 Democratizing AI access through running models on basic hardware
30:47 - 31:02 BREAKING: Arcee's new 7B reasoning model outperforming O1 Preview! 🚀
32:13 - 33:12 The crucial role of quality data in developing reasoning capabilities 📚
34:30 - 35:16 DeepSeek's approach: Reinforcement learning for better reasoning 🔄
38:24 - 41:00 The benchmark debate: When companies cherry-pick results 📊
41:56 - 43:48 Discussion on Gemini model's historical representation issues
43:48 - 46:34 AI Model Alignment: Cultural Perspectives and Local Values 🌏
46:34 - 51:37 The importance of developing local models for different regions 🌍
52:47 - 55:55 Building Local AI Communities: Collaboration & Getting Started 🤝
- The power of community-driven initiatives
- Leveraging open source and cloud resources
- Starting small and building momentum
- Importance of local champions and partnerships
🔍 Key Topics:
✅ How fine-tuning techniques are evolving
✅ Why small language models are disrupting the industry
✅ The ROI advantages of running smaller, efficient models
✅ Data privacy and keeping information within your "fortress"
✅ Practical applications of AI in various business contexts
✅ The exciting breakthrough of Arcee's reasoning model challenging larger competitors
✅ Building local AI communities and fostering collaboration
🌟 Closing Insights:
"If you're not going to do it, the West Coast people are not going to do it for you." - Julien Simon
Emphasis on community-driven initiatives
Importance of starting small and building momentum
Leveraging open source resources and cloud partnerships
The power of local champions and academic collaboration
Check out Julien Simon's YouTube channel to learn about the technical development of LLMs
🔗 Useful Links:
Arcee AI: https://acree.ai
Jungs AI: https://jungs.ai
Julien Simon's YouTube video on the Arcee advanced reasoning model: https://www.youtube.com/watch?v=QU782ZdZ0u0
Mistral 24B Benchmark Blog: https://huggingface.co/arcee-ai/Arcee-Blitz
Arcee advanced reasoning model 7B: https://www.arcee.ai/blog/arcee-maestro-7b-preview-arcee-blitz-advancing-reasoning-and-speed-in-smaller-models
FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs?: https://arxiv.org/abs/2411.05059
Subscribe for more AI insights and interviews with industry leaders! 🔔
#aitechnology #SmallLanguageModels #MachineLearning #JungsAI #ArceeAI #JulienSimon
Transcript
Hey everyone, I'm excited to announce that I'll be launching a new podcast, the Jungs AI Podcast. The idea behind this podcast is to have real conversations with industry leaders in AI and machine learning, diving into not just the tech, but the bigger picture: innovation, careers, youth development, opportunities, and the impact AI has on our lives. We'll be speaking with experts, founders, and pioneers who are shaping the future. For our very first episode, we'll be interviewing Julien from Arcee. Stay tuned.

He is a visionary leader and respected authority in artificial intelligence and machine learning, currently shaping the future of AI as Chief Evangelist at Arcee. With a career spanning over two decades, Julien has cultivated a reputation for bridging cutting-edge technology with real-world business value, earning him recognition as a trusted voice in the global tech community. His journey includes pivotal roles at industry giants and innovative startups alike. Prior to Arcee, Julien served as Chief Evangelist at Hugging Face, where he empowered Fortune 1000 companies to harness machine learning solutions, translating complex technologies into actionable strategies. Before that, as global technical evangelist for AI and machine learning at AWS, he guided organizations in unlocking the potential of cloud-based AI, simplifying scalability and innovation through his expertise.
Julien's leadership extends beyond evangelism. He has held executive roles such as Chief Technology Officer at Viadeo and Aldebaran Robotics, Vice President of Engineering at Criteo and Digiplug, and co-founder of Netattitude. These experiences honed his ability to lead global teams, drive R&D breakthroughs, and deliver transformative products from embedded systems to robotics and mobile communications. A passionate advocate for ethical AI, Julien champions accountability and collaboration in the field. His book, Learn Amazon SageMaker, demystifies machine learning for practitioners, reflecting his commitment to democratizing technology. He frequently addresses pressing challenges like algorithmic bias and sustainability, urging the industry to balance innovation with responsibility. With over 200,000 followers, he runs a successful YouTube channel delivering high-quality technical content on AI, machine learning, and deep learning. At his core, Julien remains an engineer with an insatiable curiosity. He thrives on being the voice of the customer, translating feedback into solutions that bridge technical possibilities with human needs. His blend of technical mastery, strategic vision, and principled leadership makes him not just a pioneer in AI, but a compelling storyteller for its potential to shape a better future.
Let's begin. I'm honestly privileged to have this conversation with you because I remember the last time we actually engaged was probably two years ago when you came to Africa for the AWS Roadshow and the Hugging Face roadshow. Ever since then, it was amazing. But how was Africa for you when you came here?

Well, it wasn't my first time. I actually visited Johannesburg years ago when I was at AWS for an event over there. But it was a really short trip. I think I stayed only for two days and didn't see too much. But this time was nicer because I got to go to Johannesburg again and to Cape Town. That was my first time in Cape Town. Nice. And you keep hearing how beautiful it is. And now I understand, Cape Town is truly great. And the weather was perfect. The most important thing for me, of course, is the quality of the interaction with the tech community, developers, customers, and enterprise users. There was a lot of curiosity about AI in general and open-source AI in particular. We did a really good meeting in Cape Town. It was a full room with tons of questions from students to professionals. That's all there is for me. That's the reason why I take those long flights. I want to help people figure things out. The AWS team over there was super nice to me. They took great care of me and fed me great food. So it was all nice. I would love to come back and maybe not just to South Africa, but also to other countries in the region. So if anybody here wants to invite me, get in touch.
Definitely. No, honestly, that's something we would love to do because Cape Town has the edge over Johannesburg in terms of startups. It's often considered the Silicon Valley of South Africa, though not exactly, but there are a lot of startups based in Cape Town. So it makes a lot of sense. When you were traveling in South Africa, you were working for Hugging Face. How was it working there? When you started, I assume it was during its inception, and it has since become a multi-billion dollar company, a unicorn. How was that journey?

So I joined from AWS in October 2021. Hugging Face was rising, with around 40 people, still very focused on open source and growing models and libraries. But they were also starting to look at cloud integrations and enterprise projects, which is my background. What really gets me going is meeting customers with real-life problems in real-life organizations, with all the complexity that entails, from compliance to budgeting to skills gaps. Despite these challenges, they manage to get things done and create business value, whether in healthcare, retail, or financial services. Hugging Face was rising, and it was a perfect storm. ChatGPT came out and did all the marketing for us, showing what generative AI could do. Within six months, we started seeing the first open-source generative AI models, like the original LLaMA and Vicuna. Then we saw even better models like LLaMA 2 and Mistral, and now LLaMA 3 and beyond. The path to AI was not just about calling OpenAI or Anthropic APIs; there was an alternative for enterprise use cases, especially where you need an expert model for a specific use case. If you're a mobile operator with millions of customer support calls, you don't want the model to talk about poetry or astronomy. You want it to understand how to change subscriptions, get new phones, or replace lost SIM cards. It's a narrow slice of human knowledge but needs to be very deep.
This is where the large models, which are a mile wide and sometimes an inch deep, fall short. Companies started realizing this in late 2022 and early 2023, which created more incentive for the open-source community to build cool small language models. Now, we're at a point where open-source models can outperform larger, closed models. We see this in benchmarks, and Arcee is also doing that. Just last night, we released two models distilled from DeepSeek models, and they are amazingly good for their size. This was a perfect storm of riding the AI wave and quickly showing that there was an alternative. Hugging Face has been great at growing the community, being the central place for models, maintaining libraries, and getting them integrated across clouds. They are important in the tech world today.
Definitely. In 2023, you spoke about the importance of training AI models with your particular data and IP, where you get true value. One thing I picked up is that within the industry, we understand open source is key. Having your own data in the model gives you more expandability and ensures data privacy. When organizations try to fine-tune models with their own IP, what advice would you give them, especially when they don't know where to start?

The story has evolved over the last year or year and a half. Back then, I remember the pyramid slide showing the need for industry data, language data, company data, and use case data. This was a simple way to explain it, and people liked that slide. But the next step is figuring out how to bring your data into the mix. The first step is to determine what kind of questions you want the model to answer. It sounds basic, but it's crucial. When chatting with a model, you want it to have the right knowledge about your company, products, documentation, procedures, and customer profiles. But you also need the model to have the right tone of voice and level of compliance. For a banking support chatbot, the tone should match what you'd hear from a bank employee. If the chatbot is too familiar, too cold, or uses too much financial jargon, it won't work. The answer could be factually correct, but the tone is wrong. You need to find all these properties: the knowledge, the tone, and the answer quality. You need to evaluate models. A year or year and a half ago, fine-tuning was often necessary. People would use retrieval-augmented generation (RAG) to get fresh data, but RAG doesn't turn a generalist model into an expert. It provides fresh data and some jargon, but the base knowledge remains the same. Fine-tuning was often required. Now, the models are so good in terms of reasoning and step-by-step explanation. The old models would give a blob of text that didn't sound natural. Today, models out of the box are good.
They've seen enough data, and reinforcement learning and alignment techniques have made significant progress. It's easier to control them through reasonable prompting. I advise starting with off-the-shelf models. See if you can get the right tone, safety, and quality. You will still need RAG for fresh data. If the model lacks fundamental knowledge or specific expertise, you might need fine-tuning. This involves writing your own Q&A pairs, which isn't as simple as it seems. A few hundred or a couple of thousand Q&A pairs can make a significant improvement. Sometimes, you need additional pre-training, especially in fields like life sciences with a lot of internal, confidential data and jargon. But this is not for everyone. The models have improved, and off-the-shelf models now offer better success. Start with the simple approach: off-the-shelf models, add RAG, and then evaluate. Fine-tuning and pre-training should be last resorts because they are not as easy as people think.
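To make the "off-the-shelf model plus RAG" starting point concrete, here is a minimal sketch in Python. It is not Arcee's or any vendor's method: the `retrieve` and `build_prompt` helpers, the toy documents, and the word-overlap scoring are all assumptions made for illustration. A real system would typically use an embedding model and a vector store, and would send the resulting prompt to whichever model is being evaluated.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=1):
    """Return up to top_k documents that share words with the question."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]


def build_prompt(question, documents, tone="polite and concise"):
    """Assemble a prompt with tone instructions and retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return (
        f"You are a support assistant. Answer in a {tone} tone, "
        "using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Hypothetical company documents (illustrative only).
DOCS = [
    "To replace a lost SIM card, visit a store with a valid ID.",
    "Subscriptions can be changed once per billing cycle in the app.",
    "Our offices are closed on public holidays.",
]

if __name__ == "__main__":
    print(build_prompt("How do I replace a lost SIM card?", DOCS))
```

The tone lives in the prompt and the fresh facts come from retrieval, so both can be evaluated and adjusted before anyone considers fine-tuning.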
Definitely. I agree. You mentioned reasoning capability, which has been a significant topic, especially with the DeepSeek R1 model. You also mentioned reinforcement learning techniques. Now, focusing on your organization, Arcee, you work as a chief evangelist, and many people are wondering what that entails. If you don't mind giving some context around that and the company.

Yes, it is a unique job title, but it's a good conversation starter. Meetings often start with, "Hey, before we get into the meeting, what do you do?" There are different aspects to the role. One is developer relations and technical evangelism, which is at the core of what I did at AWS, Hugging Face, and now at Arcee. I work with engineering and R&D to understand what they're building, give it a spin, and figure out what it does and how it can be used. I explain the value proposition to users and create demos, blog posts, and talks to show what can be built with it. It's about playing with the latest tools and putting myself in the shoes of potential users to show them examples and use cases that make sense. This requires being technical; otherwise, it's just PowerPoint slides. Some evangelists focus more on the business side, strategy, and consulting, which is not my approach. I dive deep into the details and then come back with findings that a wide technical audience can understand.
The other side is customer-focused. I engage with potential customers, dive into their use cases, and see how we can help. I bring product and technical expertise to the table and bridge the business problem with the technical solution. I love this part. Many DevRel people avoid customer meetings, but I never refuse them. I'm not part of the sales team; I'm here to help and explain. My lack of a sales quota allows me to be honest and say things that a sales team might not. This is valuable to customers, who respect that I'm on their side. There's also a partnership angle, where we work with AWS and other companies to integrate our models and platforms into their systems. You can use Arcee models on Amazon SageMaker, Amazon Bedrock, and the AWS Marketplace. This involves technical work and convincing people at AWS to include us. It's a mix of all these things: waving the flag, convincing people that Arcee models and platforms are worth trying, and proving it with demos, code, and talks. I travel and talk to people, essentially acting as a brand ambassador.
One thing I picked up is that Arcee focuses on small language models, and you also have distilled models. Distilled models involve using a large language model to train a smaller model. From that perspective, why are small language models important in the industry?

A year and a half or two years ago, the options were mainly OpenAI and Anthropic. Those models are good and sometimes show impressive abilities, but they may not be relevant to the problems we're trying to solve. They have a wow effect but don't necessarily translate into business value for real-life companies. There are concerns about privacy and compliance, especially in highly regulated industries. Companies are wary of sending data to APIs, regardless of the provider. Keeping data within the IT platform is always preferred. Larger models are also difficult to tailor or fine-tune effectively. Some fine-tuning APIs don't deliver significant adaptation, as shown in a recent paper. The cost is another issue. Larger models require more infrastructure and are more expensive to run. Privacy, compliance, lack of domain adaptation, and cost drive enterprise customers to look for alternatives. They want full privacy, a narrower use case, and cost efficiency. Small language models (SLMs) have gained traction in the last year and a half. Initially, customers saw better results with larger models, but now we're seeing small models that are amazing. Models like Qwen from Alibaba and their variants, and DeepSeek, are base models that companies like Arcee improve through techniques like distillation and model merging. We released a model based on Mistral 24B that is massively better than the original. We also have Virtuoso-Lite, a 10-billion-parameter model that outperforms our 72B model from last year. This 10-billion-parameter model can run on the smallest GPU instances on AWS, making it cost-effective. Even in regions with older GPU generations, you can still run state-of-the-art models.
You can even run them on CPU, which I'm a huge fan of. Projects like llama.cpp and MLX optimize models for CPU inference, allowing you to run them on laptops, industrial PCs, or remote locations with limited cloud connectivity. The cost-performance ratio is excellent, and you don't have to compromise on quality or reasoning. One of the models we released yesterday is a 7B reasoning model that outperforms larger models on math benchmarks. This year will see some incredible breakthroughs in SLMs, thanks to the ecosystem's rapid progress.
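A rough back-of-the-envelope calculation helps show why a 10B model fits on small GPUs or laptops while a 72B model does not. The sketch below is an illustration only: the 10B and 72B sizes come from the conversation, but the precisions (16, 8, and 4 bits per weight) and the helper name `weight_memory_gb` are assumptions. It counts the weights alone and ignores the KV cache, activations, and runtime overhead, so real memory use will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Model sizes mentioned in the conversation; precisions are assumed.
MODELS = [("10B", 10e9), ("72B", 72e9)]
PRECISIONS = (16, 8, 4)

if __name__ == "__main__":
    for name, params in MODELS:
        for bits in PRECISIONS:
            gb = weight_memory_gb(params, bits)
            print(f"{name} model at {bits}-bit weights: about {gb:.0f} GB")
```

Under these assumptions, a 10B model needs about 20 GB for its weights at 16 bits and about 5 GB at 4 bits, compared with about 144 GB and 36 GB for a 72B model.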
Reasoning is a critical area where AI models can significantly benefit, especially for agentic AI. When it comes to enhancing reasoning, techniques like chain of thought are fascinating. The DeepSeek R1 model provides insight into its thinking process, which is valuable. What techniques do you use to enhance generative AI models and get the right reasoning for complex tasks?

It comes back to the original question: what am I trying to do? Models like the Microsoft Phi series were trained on textbooks, which are structured to explain concepts step by step. The quality of the training data is critical. A model trained on well-structured data can be prompted to explain or analyze something step by step. I try to interact with the model naturally, as if I were asking an expert to explain something to me. For example, if you wrote a piece of code a year ago, and I need to understand it, I would ask you to explain it step by step and highlight the critical parts. This approach feels more natural than artificial prompt engineering techniques, which I dislike. A great model should understand what I'm after, and that means being trained on relevant data. Human preference and alignment are crucial, and reinforcement learning techniques, like those used in DeepSeek, are key to making models efficient and well-aligned. No two humans have the same education or fine-grained understanding of the same problem. Plus, there are infinite nuances in questions and answers. Models need to pick up on these nuances through training on the best possible data.
One thing to note is that when DeepSeek came along, OpenAI blamed them for using their data, but that data wasn't available for OpenAI to use in the first place. Now, looking at Grok, a lot of social media has been used, which is a hot topic. Benchmarking is another issue; companies or individuals tend to cherry-pick which benchmarks they want to compare, leaving out certain models to make their own look better. What advice would you give to address this issue in the industry, especially regarding standardized benchmarking?
Google has its own exclusive deal with Reddit for training purposes, but using social media data can introduce hate and toxicity into models. Grok 3, for example, was intended as an unfiltered model that could say whatever it wanted. Elon Musk has his own understanding of freedom of speech and wants an unfiltered model. There's a balance between zero filtering and having models that can say offensive things versus models aligned in the wrong way. The problem with filtering and alignment is who's in charge and who watches them. We've seen examples of aligned models going wrong, like the original Gemini model's take on some parts of European history. The model is just doing what it's told, and if it's aligned in a certain direction, it doesn't have a deep understanding of world history. It's generating text or images, multiplying matrices, but there's no reasoning, just the illusion of reasoning.
This is a philosophical question, similar to debates about freedom of speech or the press. In the US, they have the First Amendment, but in other countries, certain views are not allowed by law. It's not just an AI problem; it's a world problem. In countries like Singapore and Dubai, the chatbot's responses must align with how the local people want to raise their kids. It's not right or wrong; it's their country, their rules. This highlights the need for local models trained by local organizations, especially in places like Singapore, Indonesia, Dubai, and Africa, where language support is crucial. Local actors best understand the culture, history, and sensitivity on certain issues.
The world is complicated, and we need local models to account for the diversity of languages and cultures. OpenAI and others are not interested in supporting the hundreds of languages spoken in Africa or the dialects in Indonesia. We need decentralization, not just a few huge models that work well for the top 10 languages. Initiatives are happening to fine-tune models for local needs, and we need more of this.
Regarding advice for governments, especially in Africa, where there's a scarcity of resources, collaboration is key. Languages and cultures are more intricate than geographical boundaries. Why should Senegal, Kenya, and Nigeria each build their own LLM? They should collaborate. Europe is trying to do something, and Africa needs to do the same. Universities, research labs, and open source communities should start building bridges. Don't wait for governments; use the enthusiasm and energy of students and the open source community. Start small, and get more awareness and adoption. Local champions need to start building their own ecosystems. Reach out to tech companies for support, get cloud credits, and leverage open source. It's all in your hands. If you don't do it, the West Coast people won't do it for you.
Thank you, Marco. It's been a pleasure. I would love to have this conversation longer. We are in the infancy stage of developing an AI machine learning hackathon and are getting support from startups and cloud providers. If you have contacts or can help connect us, that would be great.

Hackathons are a good option, and cloud providers can offer credits and venues. Use guerrilla tactics to get things going.

Thank you, Chris. Have a fantastic day.
Tags
AI Podcast, Industry Leaders, Machine Learning, Ethical AI, Open Source AI
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.