Discussion with Mark McQuade, CEO and co-founder of Arcee.ai
May 22, 2024
In this insightful interview, I sit down with Mark McQuade, the founder and CEO of Arcee.ai, a forward-thinking startup revolutionizing the language model space. Mark and his team are dedicated to developing small, specialized language models that are secure and scalable, and that push the boundaries of natural language processing.
Mark shares his journey, the inspiration behind Arcee.ai, and his vision for the future of language models. With a focus on ethical and efficient AI, Mark and his team are creating waves in the industry.
Stay tuned to gain valuable insights and learn about the latest advancements in language model technology, and discover how Arcee.ai is leading the charge.
#LanguageModels #NLPNaturalLanguageProcessing #AIStartup #DataScience #MachineLearning
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at https://julsimon.medium.com or Substack at https://julsimon.substack.com. ⭐️⭐️⭐️
00:00 Introduction
00:45 Arcee.ai
02:10 Small Language Models (SLM)
05:20 Customizing Small Language Models
07:45 Where the Arcee.ai platform fits
09:10 Simplifying model customization
12:45 Model merging with mergekit
20:15 Merging and Mixture of Experts (MoE)
22:35 Franken-merging and resizing models
26:35 Arcee.ai cloud offering and conclusion
Useful resources:
* Arcee: https://www.arcee.ai/
* Mergekit on Github: https://github.com/arcee-ai/mergekit
* Mergekit Space on Hugging Face: https://huggingface.co/spaces/arcee-ai/mergekit-gui
Transcript
Hi, everybody. This is Julien from Arcee. Welcome to this podcast episode. I'm super happy to have my friend Mark all the way from the US. Mark, great to have you. I'll give Mark a chance to introduce himself, and then we'll go into an interesting discussion on small language models and what Mark's company has been up to. We'll dive into things like model merging and whatnot. So let's see where this goes. I don't have a plan. We don't need a plan. So Mark, very happy to have you. Would you please introduce yourself and tell us a little bit about Arcee, the company you started?
Yeah, great to be on, Julien. Great to see you again. Mark McQuade, and I've known you since Hugging Face, right? I was an early hire, I guess early enough. Learned a lot during my time at Hugging Face and used a lot of those learnings to start Arcee, which is a company I started about a year ago. Our mission is to enable organizations to better customize and utilize LLMs. That mission has brought us to where we are now, with a heavy focus on small language models (SLMs), model merging, and MergeKit. It's been quite the year, and I'm excited to be here and catch up with you.
Well, let's dive into everything you said. A good place to start would be small language models. By now, everyone's familiar with transformer models and large language models, which are amazing for various use cases like conversational apps, content generation, and so on. But tell me a little bit about small language models. What is a small language model? How does it compare to a large language model? What's the benefit of working with SLMs compared to their bigger counterparts?
Yeah, what counts as a small language model is subjective, but at Arcee, we consider models under 70 billion parameters as small. The sweet spot is probably in the range of 7 to 13 billion parameters. The power of small language models lies in their ability to be customized and grounded for specific use cases and tasks. We believe the future is a world of millions of smaller, specialized models, not one model to rule them all. A large Fortune 50 company won't just rely on Claude or GPT for everything. Instead, it will have hundreds, if not thousands, of SLMs focused on specific tasks, like financial risk analysis or customer support. These models are smaller, more efficient, less expensive, and less complex, making them highly beneficial. I think about 99% of business use cases can be solved with a smaller, specialized model rather than a large one.
I spend a lot of time meeting with enterprise customers, and most start with open-source models, particularly small ones, because they are faster, cheaper, and more scalable. They look at these models before considering large, closed-source models.
Just coming to customization, would you say a small model is easier to customize because it generally knows less? Is it easier to focus on a narrow enterprise use case where you want an inch-wide, mile-deep model?
Exactly. Smaller language models can stay tightly focused and are easier to inject domain knowledge into. They can be grounded and focused on specific domain data much more easily. And even though a model like LLaMA 3 8B is small, it has been trained on roughly 15 trillion tokens, so it still has strong general reasoning capabilities. The beauty of SLMs is that they are pre-trained on trillions of tokens and can then be specialized, giving them both domain-specific and general reasoning abilities. They are not toy models; you can get a lot done with high-quality 8B, 7B, and even smaller models. Microsoft's Phi-2 and Phi-3 models are great examples of this.
Now, let's talk about Arcee, the platform you are building, and how it helps customers work with small language models and customize them. Why should people take a look at Arcee's platform?
We built a platform that runs inside a customer's VPC, which resonates with ultra-enterprise companies that never allow their data to leave their environment. Our stack includes the ability to do continual pre-training, full fine-tuning, supervised fine-tuning, and model merging. Model merging is our core, and we provide MergeKit as a library. We understand that the power of merging is also in the ability to customize and fine-tune models. Our platform allows you to customize models through training and then merge them, injecting knowledge from other great models into your own. Currently, our platform is VPC-deployed, but we're expanding into a cloud offering soon. We differentiate with our model merging capabilities and our focus on making customization simple, even for non-machine learning engineers.
Fine-tuning has become more standard and commoditized, and the effort lies in building the dataset, not the code. We aim to make the process even simpler, with a few clicks or a drag-and-drop interface. We provide a core UI that allows you to upload a dataset, point it to your model, and click a button to run pre-training routines. Then, you move on to the merging piece and finally to DPO to polish the entire thing. This is unique because many platforms only offer fine-tuning through APIs and SDKs.
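To make the training step concrete, here is a minimal sketch of what continual pre-training on domain data can look like when wired up by hand with Hugging Face Transformers. The model ID, dataset file, and hyperparameters are placeholders for illustration, not Arcee's actual pipeline.

```python
# Minimal sketch: continual pre-training of a small base model on domain text.
# Model ID, data file, and hyperparameters are placeholders, not Arcee's pipeline.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder base SLM
data = load_dataset("text", data_files="financial_filings.txt")  # your domain corpus

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_set = data["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama3-8b-finance-cpt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=train_set,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
# The resulting checkpoint is what you would later merge and DPO-polish.
trainer.save_model("llama3-8b-finance-cpt")
```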
Let's talk about model merging. Most folks are familiar with fine-tuning, but model merging is still a novel technique. Tell us about MergeKit and its relationship with Arcee. Where does model merging fit in the toolbox of engineers and practitioners, and how would you recommend folks get started with it?
Model merging is a novel technique to fuse multiple models together. Charles Goddard, the creator of MergeKit, developed it to utilize open-source checkpoints more effectively. MergeKit started as a side project and gained traction, leading to its integration into Arcee. We see model merging as the next frontier of transfer learning, allowing you to combine thousands of open-source checkpoints. The true power lies in training a model on your domain data and then using MergeKit to merge it with another model, healing any catastrophic forgetting and boosting its capabilities.
For example, if you train a model on financial data, it might degrade a bit. Instead of adding more general tokens, you can merge it with another model great at general reasoning. This heals the degradation and boosts the model's performance. Model merging can be a healing and boosting mechanism, combining the best of multiple worlds.
I did an intro-level video on model merging, and it's popular. The research papers show interesting examples, like merging math and code models, or merging models trained on different computer vision datasets. Model merging can be the missing link, allowing you to leverage the compute already spent on fine-tuned models. You can control the merging process by setting weights, experimenting with different configurations, and finding the best evaluation score. Merging is efficient and can be done on a CPU, making it easy to experiment.
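To make the merging step concrete, here is a small MergeKit sketch in the spirit of the financial example above: a hypothetical domain checkpoint is merged with a general-purpose instruct model using the TIES method, with per-model weights you can experiment with. All model names are illustrative.

```python
# Sketch: heal-and-boost a domain checkpoint by merging it with a general model.
# "my-org/mistral-7b-finance" is a hypothetical domain-tuned checkpoint.
import subprocess

ties_config = """\
models:
  - model: my-org/mistral-7b-finance              # hypothetical domain checkpoint
    parameters:
      density: 0.5     # keep the top 50% of delta weights
      weight: 0.5      # contribution to the merge -- tune and re-evaluate
  - model: mistralai/Mistral-7B-Instruct-v0.2     # strong general-purpose counterpart
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  normalize: true
dtype: bfloat16
"""

with open("ties-merge.yaml", "w") as f:
    f.write(ties_config)

# Merging runs fine on CPU, so iterating on configs is cheap.
subprocess.run(["mergekit-yaml", "ties-merge.yaml", "./merged-model"], check=True)
```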
We recently released evolutionary model merging, which uses an evolutionary algorithm to find the best possible merge configuration. You define the evaluations, and the algorithm iterates through merge-eval cycles to produce an optimized model. This requires GPUs, but it produces a much better model.
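The sketch below is only a toy illustration of that merge-evaluate cycle: a simple random search over one merge weight, standing in for the actual evolutionary algorithm. It is not the mergekit-evolve API, and the evaluation function is a stub you would implement against your own benchmark.

```python
# Toy merge-evaluate loop: random search over a single merge weight.
# Illustrates the idea only; this is NOT the mergekit-evolve API.
import random
import subprocess

def write_config(weight: float, path: str = "candidate.yaml") -> str:
    # Same TIES recipe as before, with the blend weight as the search variable.
    config = f"""\
models:
  - model: my-org/mistral-7b-finance
    parameters: {{density: 0.5, weight: {weight:.2f}}}
  - model: mistralai/Mistral-7B-Instruct-v0.2
    parameters: {{density: 0.5, weight: {1.0 - weight:.2f}}}
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
"""
    with open(path, "w") as f:
        f.write(config)
    return path

def evaluate(model_dir: str) -> float:
    """Stub: score the merged model with the benchmark you care about."""
    raise NotImplementedError

best_weight, best_score = 0.5, float("-inf")
for generation in range(10):
    # Mutate the best weight found so far and clamp it to [0, 1].
    candidate = min(1.0, max(0.0, best_weight + random.uniform(-0.2, 0.2)))
    subprocess.run(["mergekit-yaml", write_config(candidate), "./candidate-model"], check=True)
    score = evaluate("./candidate-model")
    if score > best_score:
        best_weight, best_score = candidate, score

print(f"Best weight: {best_weight:.2f} (score {best_score:.3f})")
```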
A couple of recent additions to MergeKit are Mixture of Experts (MoE) merging and pass-through merging. MoEs are mergeable, but the challenge is the limited number of trained MoEs. We're working on efficient training of MoEs by adding experts to existing models. Pass-through merging, or franken-merging, involves chopping and stitching different model pieces together. While it's experimental and not recommended for production yet, it shows promise. We're heavily investing in using pass-through merging to extend smaller models efficiently, like extending LLaMA 3 8B to 11B parameters and fine-tuning only the added parameters.
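For reference, a pass-through ("franken-merge") config in that spirit could look like the sketch below, which self-stacks Llama 3 8B layers to produce a deeper model. The layer ranges are illustrative, not the exact recipe Mark describes, and the config runs through mergekit-yaml just like the earlier example.

```python
# Sketch: pass-through "franken-merge" that self-stacks Llama 3 8B layers.
# Layer ranges are illustrative; the duplicated layers are what you'd fine-tune.
passthrough_config = """\
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-8B
        layer_range: [0, 24]
  - sources:
      - model: meta-llama/Meta-Llama-3-8B
        layer_range: [8, 32]     # repeats layers 8-23, deepening the network
merge_method: passthrough
dtype: bfloat16
"""
# Write this to a file and run it with `mergekit-yaml`, as in the TIES example.
```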
The best place to start is the MergeKit repo. You can read the docs, run the examples, and take it from there. We also have a Hugging Face space with a MergeKit UI and another space for auto-generating config files. On June 21st, we're launching our hosted SaaS, Arcee Maestro, which will allow you to log in and merge models. We'll have a generous free tier for basic merges, and evolutionary model merging will be in the product as well.
Any last thing you want to add about Arcee or advice for developers out there?
Stay tuned for our cloud offering, launching on June 21st. It will provide continual pre-training and model merging as a service. Sign up for the waitlist now. Play with MergeKit to see the true power of model merging. It's not just about merging two models; it's about training your own model and then merging it. There's a huge appetite for companies that want to train and own their own models. Pairing this with model merging reveals its true potential. Let's stop gaming the leaderboard and start building useful stuff.
Mark, it's a pleasure. Thank you so much for taking the time. Everyone, keep an eye on Arcee and MergeKit. They have a lot of interesting stuff in store. Thanks for joining us. I hope you enjoyed the conversation. See you soon with more. Thank you very much, bye, Mark.
Tags
Small Language Models, Model Merging, Arcee, MergeKit, Customization of LLMs