Hi everybody, this is Julien from Arcee. In this video, I'd like to show you how you can use Arcee Conductor to automatically route your reasoning prompts to the best SLM or LLM. Let's get started. We've already discussed Conductor in a few videos, so I'll put all the links in the video description. If you haven't signed up yet, this is the URL, and you'll get $20 of free inference credit. Once you've signed up, you can go to the console and create an API key, which is the only thing you need to start using Conductor. Creating the API key happens here. Once you have your key, you're ready to use my notebook. So let's switch to that notebook and start running examples.
Enabling model routing for reasoning models in Conductor is super simple. As you can see here, we just need to set the model to auto-reasoning, just like we set auto-tool for function calling. Everything else is exactly the same. Of course, you can still use the official OpenAI clients and get to work. Okay, so let's just run this. Don't forget to enter your key here, which is the key you created when you signed up in the UI. Now we can list the available models and print a sorted list. We see the usual suspects, the Virtuosos, Blitz, and other models we've discussed in the past. We see Arcee Maestro, which is one of ours as well and which supports reasoning mode. And of course, we have auto-reasoning, which is the router mode for reasoning models. We'll run some examples and see that different models are called. So let's try a first example. This is, I guess, a simple one: write a friendly message for a new user of Arcee Conductor. Maybe that's you. Conductor routes your prompt to the best model, etc., etc. Get started with $20 and learn more at this URL. So we can just run this. Again, the model here is set to auto-reasoning. And because we are working with reasoning models exclusively here, we're going to get a two-part answer: we're going to see the reasoning of the model, and we're going to see the actual answer. We can see here that we invoked Arcee Maestro in reasoning mode. So we have some internal thinking, and this is very useful. It always looks a bit noisy, but if you read it, it's interesting to see how the model works, what it's considering to generate the answer, and how it's reasoning to build what hopefully is a good answer. And by reading that, maybe you realize, oh, okay, I didn't think about this parameter, this factor, that angle in writing the email or the meeting invite or whatever it is you're using the model for. And then, of course, you get the actual answer. And it's pretty cool. So simple prompt, small model. We'll look at pricing when we're done, but obviously this one is a smaller and less expensive model. So we see the router doing a good job here. Okay, let's try another one.
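For reference, here's a minimal sketch of what that setup looks like with the official OpenAI Python client. The base URL and the CONDUCTOR_API_KEY environment variable name are assumptions on my part; use the endpoint and key shown in the Conductor console and in the notebook.

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at Conductor
# (the base URL is an assumption; use the endpoint from the Conductor console)
client = OpenAI(
    base_url="https://conductor.arcee.ai/v1",
    api_key=os.environ["CONDUCTOR_API_KEY"],  # the key created in the console
)

# List the available models and print a sorted list
model_ids = sorted(m.id for m in client.models.list().data)
print(model_ids)

# Route a reasoning prompt: the only Conductor-specific part is model="auto-reasoning"
response = client.chat.completions.create(
    model="auto-reasoning",
    messages=[
        {"role": "user",
         "content": "Write a friendly message for a new user of Arcee Conductor."}
    ],
)
print(response.choices[0].message.content)
```

Everything else is the standard OpenAI client workflow, which is the whole point: you change one string and Conductor handles the routing.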
So this one is a personal interest of mine: suggest some productivity tips for frequent intercontinental flights between Paris and San Francisco. I usually fly midday or early afternoon. I need to be productive on the plane. I work on a Mac laptop. Wi-Fi may not be available (I'm looking at you, Air France), so I need to be able to work offline. My work involves reading, writing, and coding. Let's see if AI can help me stay productive on the plane. Okay, let's run this. So there are certainly more things to consider here than I listed: flight duration, my different tasks, health, a preparation checklist, etc. All good tips that are really worth reading, because you go, oh yeah, I never thought about that, the model's right, maybe I should use Git to work locally. Coding on the plane is always fun. And now here's the answer: pre-flight preparation, tools for offline work, time zone and sleep management, comfort and health, productivity workflow, emergency backup, mental prep. Pre-flight meditation? Why not. Anyway, it's a good list. It'll get you started, and we can definitely understand how the model came up with it. It's not a random page coming from Wikipedia or something. It's tailored to what I asked. So, pretty good. And again, we used Arcee Maestro for this. So, very cost-effective.
Let's try another example and keep doing the travel thing. So, a real-life example. I will attend two events in June: one in DC, June 9 to 13, and one in San Diego, June 10 to 12. I only need to spend the full June 10 in DC; they only need me for one day over there. And I like to fly in the day before, obviously, and I'll fly from Paris. What are my options? So that's a really messy prompt I just wrote; I didn't pay any attention. We have overlapping dates, we have a lot of ambiguity. I'm just saying, hey, I need to be in San Diego June 10 to 12. I don't say for how long. This is a horrible prompt, but that's why you want reasoning models: they should be able to figure it out. So let's run this and see if AI can solve my travel pain points. Okay, so... yep, there's a lot of thinking in there. You can read it if you like; I won't read it all. This is a bit of a scheduling conflict, thank you, AI. I don't know if AI can solve technical evangelism, let's see. Okay, so different options. A bit of a puzzle. Yes, my life is complicated, thank you. Okay, off it goes. Ah, see? Even a reasonably simple prompt like that takes a bit of work. The model is working hard here. All right. So, fly from Paris to DC on June 8th. Hmm, why not? This ensures you arrive the day before you're required for a full day in DC. Okay, makes sense. I mean, I could fly on June 9th and still be fine, I guess. Attend the DC event on June 10. Depart DC on the evening of June 10, yes, depending on how late that is, to arrive in San Diego the same night. Okay, yeah, because the time zones work in this case. Example flight... oh, is that a real flight? Let's check it out. I doubt it. So the flight actually exists, but it flies from Costa Rica to San Francisco, so not quite what I was looking for. A nice hallucination here, but nice try. So I'll miss the first day of the San Diego event, but I'll arrive in time for June 11 and 12, and I can spend the rest of the stay in San Diego. That looks okay, because I didn't say how long I had to stay in San Diego. I didn't put any kind of constraint there. And obviously, I cannot be in two different locations on June 10, so the model made the right call here. Key considerations, an alternative option: if the San Diego event allows late registration, I can still catch the key sessions on June 11 and 12. This itinerary works best; if not, consider skipping San Diego and focusing on DC. Hmm, why not? So as we can see, this looked like a basic question, but it took a bit of work from the model, mostly because my prompt was fuzzy and ambiguous. But hey, isn't that the name of the game? That's why we need models that can reason, or at least come up with options, look at their own decisions, and decide whether they're working or not, simply because we are ambiguous folks and it's very difficult to write very clear prompts. So pretty good here, and we used DeepSeek R1. We probably paid a little more, but we got a good answer. Okay, let's keep going.
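In the examples so far, the thinking and the final answer come back in the same response. Here's a hypothetical little helper for pulling them apart, assuming the routed model wraps its reasoning in <think>...</think> tags (DeepSeek-R1 style). Not every model Conductor can route to necessarily uses that convention, so treat this as a sketch rather than the notebook's actual code.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer). Reasoning is empty if no <think> block is found."""
    start_tag, end_tag = "<think>", "</think>"
    if start_tag in text and end_tag in text:
        # Everything between the tags is the model's internal thinking
        reasoning = text.split(start_tag, 1)[1].split(end_tag, 1)[0].strip()
        # Everything after the closing tag is the actual answer
        answer = text.split(end_tag, 1)[1].strip()
        return reasoning, answer
    return "", text.strip()

# Example with a made-up response string (not real model output)
sample = "<think>Dates overlap, so DC first, then San Diego.</think>Fly to DC on June 8..."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```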
Explain when I should use put options to protect my portfolio of US stocks against a market downturn (not that these things ever happen), and show me an example. Let's see. It doesn't look like a complicated question, but who knows? The model is thinking. And you can probably already tell it's not really a reasoning prompt; I guess a vanilla SLM would be able to answer it. Actually, we will give it a shot. Okay, so when to use put options, a concrete example, blah blah blah. Oh, o3-mini. Okay, interesting. Why don't we try Blitz, which is not a reasoning model? Well, it looks like a good answer too. So, yep, probably not a proper reasoning prompt here. Interesting. All right, let's try maybe one last example, a coding prompt. So here I have my streaming function for OpenAI text generation, and I'm asking the model to make this function more Pythonic, explain the possible trade-offs between memory efficiency, execution speed, and maintainability, and suggest what it thinks is the best approach. All right, let's give that a shot. Plenty of trade-offs here for sure. Looks like Sonnet to me. Okay, so here again: trade-offs, memory efficiency, execution speed, maintainability, what we think is the best version, and why it's best. A good answer, and yeah, a lot of coding problems go to Claude Sonnet. That's probably one of the best models for that right now. Okay, so we could continue for a while. You see how easy this is: just set the model to auto-reasoning. Now let's take a quick look at the API history, and as usual we'll see the price of those queries. We can see the Arcee Maestro queries are all under a cent. This one is actually a tenth of a cent; this one is half a cent because we ended up generating a little more. DeepSeek is a bit more expensive, and Sonnet is a bit more expensive again. And if you want to look at model prices, you can see them in the documentation or on this page. So you can see Claude Sonnet is $3 for 1 million input tokens and $15 for output. DeepSeek is cheaper for output, GPT is about the same as DeepSeek, o3-mini is quite a bit cheaper, and Arcee Maestro is much, much cheaper: $0.90 and $3.30 compared to $3 and $15. So, you know, four to five times cheaper generally. As usual, what this means is that if you send all your prompts to one of the big models, maybe GPT or maybe DeepSeek, you're certainly overpaying by a large amount for a lot of the simple prompts, and my guess is a lot of those prompts are simple enough to be handled by the smaller models. That's the very reason why we built Conductor.
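To make that price gap concrete, here's a quick back-of-the-envelope calculation using the per-million-token figures mentioned above. The token counts are made up for illustration, and you should check the Conductor pricing page for the current numbers.

```python
def query_cost(input_tokens: int, output_tokens: int,
               price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in dollars for a single query, given per-million-token prices."""
    return input_tokens / 1e6 * price_in_per_m + output_tokens / 1e6 * price_out_per_m

# Hypothetical reasoning query: ~500 input tokens, ~2,000 output tokens
maestro = query_cost(500, 2000, 0.90, 3.30)   # Arcee Maestro prices quoted above
sonnet = query_cost(500, 2000, 3.00, 15.00)   # Claude Sonnet prices quoted above
print(f"Maestro: ${maestro:.4f}  Sonnet: ${sonnet:.4f}  ratio: {sonnet / maestro:.1f}x")
```

With these assumed numbers, the Maestro query comes in well under a cent and the Sonnet query costs roughly four to five times more, which is consistent with what we saw in the API history.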
Well, that's it for today. I hope you enjoyed it. The link to the notebook, of course, will be in the video description, so give it a shot. And thank you so much for watching. Until next time, keep rocking.
Tags
Arcee Conductor, Auto-Reasoning, Model Routing, AI Productivity Tips, Cost-Effective AI Models
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.