Model routing with Open WebUI and Arcee Conductor

March 20, 2025
In this video, we show you how to add Arcee Conductor (https://www.arcee.ai/product/arcee-conductor) to the popular Open WebUI chat interface. This allows you to use all of Open WebUI's powerful features (files, RAG, and more) while making sure you use the best and most cost-effective SLM/LLM for each prompt. First, we install Open WebUI from scratch and configure it to send inference requests to Conductor. Of course, we will run some examples to demonstrate the seamless integration!

If you'd like to understand how Arcee AI can help your organization build scalable and cost-efficient AI solutions, don't hesitate to contact sales@arcee.ai or book a demo at https://www.arcee.ai/book-a-demo.

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. You can also follow me on Medium at https://julsimon.medium.com or Substack at https://julsimon.substack.com. ⭐️⭐️⭐️

* Introducing Arcee Conductor: https://youtu.be/1-SCHE9Idcs
* Arcee Conductor product page: https://www.arcee.ai/product/arcee-conductor
* Open WebUI: https://github.com/open-webui/open-webui

Transcript

Hi everybody, this is Julien from Arcee. In a previous video, I introduced you to Arcee Conductor, our new inference platform that automatically sends each prompt to the best SLM or LLM. In this video, we're going to use a very popular chat interface called Open WebUI, and I'm going to show you how easily you can add Conductor to it. Not only will you be able to use the best SLM or LLM for every prompt, but you'll also be able to use all the cool tools available in Open WebUI, like attaching files, using retrieval-augmented generation, and so on. Okay, sounds good? Let's get started. You will find installation instructions in the GitHub repository for Open WebUI. There's a pip install path, but I actually recommend using Docker, which is very straightforward and has worked for me every single time. You will need Docker Desktop, so if you don't have it, please go to the Docker website, download it, and install it. It should only take a few clicks. Once you've done that, there's a single Docker command to run to install Open WebUI on your machine: it downloads the image and starts the Open WebUI container locally, and you will see it running in Docker. Here, I've done this already. We see the image has been downloaded and the container has been started automatically. If we click on it, we can see it's running. This will only take a minute or two because you are downloading a fairly large Docker image. Once the container is running, open your browser and go to localhost port 3000, and you will see the sign-in window. If this is really the first time you run Open WebUI, you need to register and create your admin user. Just use your email address and a password, and you'll have that user created in no time.
I've done that already, so let me sign in and show you the configuration for Conductor. Here I'm signed in as the admin user. The first thing to check, in the admin panel under Settings and Connections, is that Direct Connections is on. It should be on by default. If you are not the admin for your Open WebUI deployment, ask them whether it's enabled or whether they can enable it for you. This is really important because it's what allows us to create connections to Conductor through its OpenAI-compatible API endpoint. We absolutely need this to be on. If you just installed Open WebUI and you're the admin user, it will be on. Now we're ready to configure Conductor. You need a Conductor account. If you haven't created one already, go to conductor.arcee.ai and register there; registration is free. Conductor is a pay-as-you-go service, so you can create your account and you won't pay anything until you start using the service. At the time of recording, we're giving away $200 of free inference credits, so enjoy them while they last because they won't last forever. Once you have registered, go into your account and create an API key. Obviously, I have one already. Just click on Create API Key, and you'll get your key. Make sure to save it because it will only be displayed once. As you can see, I cannot view the key again, so if you fail to save it, you'll need to delete it and create another one. Save your key because you're going to need it for the Open WebUI configuration. Now that we have an API key for Conductor, we can connect Open WebUI and Conductor. Go to Settings, Manage Direct Connections, and enter the Conductor endpoint URL. Don't forget the `/v1` suffix. Paste your API key, and add the `auto` model ID, which tells Conductor to use whatever model works best for each prompt. Don't forget to click the plus button, verify the connection, and save.
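Under the hood, that direct connection is just a standard OpenAI-style chat completions call. Here's a minimal sketch of the kind of request Open WebUI ends up sending once the connection is configured. The base URL below is a placeholder (use the exact Conductor endpoint shown in your account, including the `/v1` suffix); `auto` is the model ID we added, and the header names follow the OpenAI API convention.

```python
import json

# Assumption: placeholder base URL; copy the real endpoint from your
# Conductor account, and don't forget the /v1 suffix from the video.
CONDUCTOR_BASE_URL = "https://conductor.arcee.ai/v1"

def build_chat_request(prompt: str, api_key: str, model: str = "auto"):
    """Build URL, headers, and JSON body for an OpenAI-style chat completion.

    model="auto" is the Conductor router: it picks the best SLM/LLM per prompt.
    """
    url = f"{CONDUCTOR_BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # the API key created in the Conductor UI
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, json.dumps(body)
```

Because any OpenAI-compatible client can send this exact shape of request, the same URL-plus-key pair works not only in Open WebUI but in other tools as well.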
While we're at it, let's add the models from the Model Engine in case we want to call them individually. It's a different URL but the same key. Here, we don't need to enter any model ID; Open WebUI will list the models available on that endpoint automatically. Let's verify the connection again and save. Now, if we start a new chat and look at the model list, we see a lot of models. We see auto, which is the Conductor router, and then the models available on our platform, our own SLMs such as the Virtuoso models and Spotlight, a vision-language model. Let's give it a shot. Let's select the auto model and run a prompt: "Write a short welcome message for a new employee joining Arcee AI next week." Why not? As usual, we're sending the prompt to Conductor. The router will pick the best model for the job based on prompt complexity, domain, and cost-effectiveness. Here's the answer we get. That was a simple prompt, so chances are it went to a small model. Let's take a quick look at the Conductor UI to see what went on. In the API history, we see the prompt, the input tokens, the output tokens, and the price, which is very low because we likely used a small model. We also see the extra prompts that Open WebUI runs to generate tags, which are also very cheap. Let's run a more complex example: "What's the difference between logits distillation and hidden state distillation? Can I do both with PyTorch?" This is a much more complicated question, very domain-specific and code-related. We're getting a detailed answer with key differences and a bit of code. Clearly, we're not using the simplest model; we're using something more elaborate. If we go to the history and reload the page, we can see the cost of this particular query was noticeably higher, so we used one of the larger models, possibly one of the LLMs, to answer it. Simple prompts go to very cost-effective models, and more complicated prompts go to more capable but also more expensive models.
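The "list models automatically" step works because the Model Engine endpoint implements the standard OpenAI `GET /v1/models` route. A minimal sketch of how a client extracts model IDs from such a response; the sample payload below is purely illustrative of the response shape, not actual Conductor output, and the model IDs in it are made up.

```python
def list_model_ids(models_response: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]

# Illustrative response shape only; real model IDs may differ.
sample = {
    "object": "list",
    "data": [
        {"id": "auto", "object": "model"},
        {"id": "virtuoso-large", "object": "model"},
        {"id": "spotlight", "object": "model"},
    ],
}

print(list_model_ids(sample))  # -> ['auto', 'virtuoso-large', 'spotlight']
```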
That's exactly how the service should work, and we see the total cost here. Now, let's try adding a PDF file to Open WebUI and asking questions. Let's fetch a research article and upload it. Let's ask, "Does this article mention Arcee?" It mentions MergeKit, interesting. Tell me more. The article discusses a comprehensive evaluation of MergeKit, demonstrating its benefits and comparing its performance against other merging methods. Let's ask, "Does it mention Arcee Fusion?" Yes, it does: it mentions Arcee Fusion, one of the latest techniques our team added to MergeKit. Under the hood, a lot is happening, and depending on the complexity of each prompt, Conductor selects one model or another, so you're spending your money very efficiently. For reference, our smallest and most cost-effective model, Blitz, is about 300 times cheaper than Claude 3.7 Sonnet. For any prompt that Blitz could handle but that you send to Claude 3.7 Sonnet instead, you're paying 300 times too much. Let's do one last thing: I want to show you how to use individual models if you want to. Let's create a new chat. Because we also configured the Model Engine API, we see all the individual models. Let's try Spotlight, a vision-language model. Let's grab an image and say, "Describe the image in detail." The image shows a vibrant urban scene in front of the Nasdaq building; the billboard announces a funding round by Arcee AI; the building itself is modern, with large glass windows reflecting the bright sunlight. We got a description, so you can use any of these models individually if you want. Or you can set the model back to auto and let Conductor do its thing, selecting the right model for each individual prompt. Thanks to our OpenAI-compatible API, it's super easy to plug Conductor into all those nice open-source tools like Open WebUI and many more. It should always be the same process: enter the URL for Conductor and your key, and you should be good to go.
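The 300x figure translates into a simple back-of-the-envelope savings estimate. This sketch uses made-up per-million-token prices purely for illustration; only the 300x ratio comes from the video, and the 80% "simple prompts" share is a hypothetical workload mix.

```python
def routed_cost(n_prompts: int, cheap_fraction: float,
                cheap_price: float, expensive_price: float,
                tokens_per_prompt: int = 1_000) -> float:
    """Total cost when a router sends `cheap_fraction` of prompts to the
    cheap model and the rest to the expensive one.

    Prices are in dollars per 1M tokens; all numbers here are illustrative.
    """
    tokens = n_prompts * tokens_per_prompt
    cheap_tokens = tokens * cheap_fraction
    expensive_tokens = tokens - cheap_tokens
    return (cheap_tokens * cheap_price + expensive_tokens * expensive_price) / 1_000_000

# Hypothetical prices that preserve the 300x ratio mentioned in the video:
cheap, expensive = 0.01, 3.00  # $ per 1M tokens (made up)
always_big = routed_cost(10_000, 0.0, cheap, expensive)   # everything to the big model
routed = routed_cost(10_000, 0.8, cheap, expensive)       # 80% of prompts are simple
print(f"always big: ${always_big:.2f}, routed: ${routed:.2f}")
```

Even with most of the spend still going to the expensive model for hard prompts, routing the easy 80% to the cheap model cuts the bill by roughly 5x in this toy scenario.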
Now you can use all the cool stuff that Open WebUI comes with, like attaching files and running RAG. There's an endless list of tools, and it makes for a pretty cool user experience, I think. That's it for this one. I hope you liked it. If you have questions, please ask in the comments. Much more coming as usual. And until next time, keep rocking.

Tags

Arcee Conductor, Open WebUI, Model Selection, RAG, Docker Installation