Hi everybody, this is Julien from Arcee. Function calling is the ability to inject data coming from external APIs or external tools into the text generation process. In this video, I'm going to show you function calling with Arcee Conductor, enabling you to use external APIs or tools while making sure you automatically select the best SLM or LLM to run inference. Let's get started. If you haven't done so already, you should sign up for Arcee Conductor. Just go to conductor.arcee.ai; it takes a minute. Once you've signed up, you just need to create an API key, and that's the only thing we'll need to run the demo.
Okay, so of course the link to the notebook is in the video description. First, we need to install some dependencies. As always, I recommend creating a virtual environment. Here, we're going to invoke APIs from Yahoo Finance, which is why we need the yfinance package. And of course, we need the OpenAI client because, as you probably know by now, Conductor is compatible with the OpenAI API. Then we create the OpenAI client, pointing it to the Conductor URL and passing the key. So let's just run this.
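For reference, the setup could look something like this minimal sketch; the Conductor base URL and the environment variable name are assumptions, so check your Conductor account for the exact values:

```python
# pip install openai yfinance
import os

from openai import OpenAI

# The base URL and key variable below are assumptions; use the values
# from your Conductor account.
client = OpenAI(
    base_url="https://conductor.arcee.ai/v1",
    api_key=os.environ["CONDUCTOR_API_KEY"],
)
```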
And then you can see my three test functions. One retrieves the stock price for a particular company, one gets the CEO name, and the last one gets a company summary. And we can see the three corresponding Yahoo Finance calls. So feel free to tweak this, feel free to use your own APIs; this is just a simple example with a free public API. Okay, so let's just run this. We can double-check that, yes, those functions work. Here I'm calling the Yahoo API directly. And here's the most important part: we use the OpenAI tools definition. As you can see, we have a number of blocks describing each function, with a description and some examples. This is really useful because it will help the model match your query to similar queries that the function should be able to fulfill. Then we describe the parameters: we pass the company name, which is a string, so that's the human-readable name, and we pass the ticker. In fact, we'll see that most queries just mention company names, and the model will automatically infer what the correct ticker is, which is another benefit of working with a language model.
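Here's a minimal sketch of what the three helpers could look like, assuming the yfinance `info` dictionary keys shown below (field names can vary by ticker):

```python
import yfinance as yf

def get_stock_price(company_name: str, stock_symbol: str) -> str:
    # "currentPrice" is a common yfinance info key, but not guaranteed for every ticker
    price = yf.Ticker(stock_symbol).info.get("currentPrice")
    return f"The last price of {company_name} ({stock_symbol}) is {price}."

def get_ceo_name(company_name: str, stock_symbol: str) -> str:
    # Scan the officer list for a title containing "CEO"
    officers = yf.Ticker(stock_symbol).info.get("companyOfficers", [])
    ceo = next((o["name"] for o in officers if "CEO" in o.get("title", "")), "unknown")
    return f"The CEO of {company_name} ({stock_symbol}) is {ceo}."

def get_company_summary(company_name: str, stock_symbol: str) -> str:
    return yf.Ticker(stock_symbol).info.get("longBusinessSummary", "No summary found.")
```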
Here's the second function, the CEO name, so very similar examples and parameters, and then the summary. This information will be passed in every prompt, and this is how the model will try to match your query to the appropriate function. Okay, so then I'm writing a tool-calling function, which takes the user prompt and the max tokens, and as you can see here, I'm invoking the Conductor API, passing auto-tool as the model name. If you watched my previous videos on model routing, we used to pass auto, which would select the right SLM or LLM for plain text generation. Here, we want to do function calling, so we need to enable models that are capable of function calling. Make sure to use auto-tool. The rest is just the usual API, passing the tool definition that we just looked at. And tool choice is set to auto, meaning we let the model pick the appropriate tool.
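As a sketch, one tool definition and the routed call could look like this; the function name call_tools and the descriptions are illustrative, while auto-tool and tool_choice="auto" are the settings described above:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_ceo_name",
            "description": (
                "Get the name of a company's CEO. Example queries: "
                "'Who runs General Motors?', 'Who is the CEO of 3M?'"
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "company_name": {
                        "type": "string",
                        "description": "Human-readable company name, e.g. 'General Motors'",
                    },
                    "stock_symbol": {
                        "type": "string",
                        "description": "Stock ticker symbol, e.g. 'GM'",
                    },
                },
                "required": ["company_name", "stock_symbol"],
            },
        },
    },
    # get_stock_price and get_company_summary are defined the same way
]

def call_tools(user_prompt: str, max_tokens: int = 512):
    """Ask Conductor to route the query to a function-calling-capable model."""
    return client.chat.completions.create(
        model="auto-tool",   # Conductor picks the right function-calling model
        messages=[{"role": "user", "content": user_prompt}],
        tools=tools,
        tool_choice="auto",  # the selected model picks the appropriate tool
        max_tokens=max_tokens,
    )
```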
Okay, and this will display the actual model that Conductor selected. Don't confuse auto-tool, which is, "hey, pick the right model," with tool choice set to auto, which is, "hey, once Conductor has routed this query to a model, the model will pick the right tool automatically." Then, looking at the model's response, we can see if a tool was selected, which function was selected, and which parameters were selected, and we actually run the function call, which is what you see here: calling the actual function and displaying the results. And if no tool was called, then we say, "hey, no tool was called." This is a slightly more elaborate version where I'm running the function calling here, and then I'm passing the output from that previous function to another model to write a better story, because we're probably not just interested in the raw function output. The function output is just useful information, and now we want to pass this information to maybe another creative-writing model to make the answer look nice, to apply a tone of voice, to put extra context in there, etc. So that's what I'm doing here: I'm calling another model to generate a story.
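A minimal sketch of that dispatch logic, assuming the helper functions above (it handles zero, one, or several tool calls):

```python
import json

def run_tool_calls(response):
    """Execute whatever tool calls the model requested and return the results."""
    print(f"Selected model: {response.model}")  # the model Conductor routed to
    message = response.choices[0].message
    if not message.tool_calls:
        print("No tool was called.")
        return []
    available = {
        "get_stock_price": get_stock_price,
        "get_ceo_name": get_ceo_name,
        "get_company_summary": get_company_summary,
    }
    results = []
    for tool_call in message.tool_calls:
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
        print(f"Calling {name} with {args}")
        results.append(available[name](**args))
    return results
```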
So let's run this first prompt. We see that this was routed to Caller, which is our own function-calling model. Caller decided that, given this query, we should call get_ceo_name with company name set to General Motors and stock symbol set to GM. The query doesn't say GM; it says General Motors, but as the function expects the ticker, the stock symbol, the model automatically infers that the correct ticker for General Motors is GM, which is pretty cool. Then we actually call that function and display the result. That's the baseline, but now if we take this output and pass it to Blitz, let's see what we get. So we're going to run the function-calling prompt again, retrieve the result, and pass this output to Blitz. And hopefully, we'll see a slightly better story here. That's useful because we see a tool was called, so we can probably trust the answer a little more. Here's the answer, and there's some extra analysis based on Blitz's knowledge. So if you have models that do well in particular domains, you can use their knowledge to augment your answer. And we have some extra URLs in case we want to see more, and those were generated by Blitz too.
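Put together, the two-step flow could look like this sketch; the "blitz" model id is an assumption, so use whatever name your Conductor account lists for Arcee Blitz:

```python
def answer_with_story(user_prompt: str) -> str:
    """Run function calling, then have a second model turn the results into a nicer answer."""
    response = call_tools(user_prompt)
    results = run_tool_calls(response)
    story = client.chat.completions.create(
        model="blitz",  # assumed model id for Arcee Blitz
        messages=[{
            "role": "user",
            "content": (
                f"Question: {user_prompt}\n"
                f"Tool results: {' '.join(results)}\n"
                "Write a clear, friendly answer based on the tool results."
            ),
        }],
    )
    return story.choices[0].message.content

print(answer_with_story("Who is the CEO of General Motors?"))
```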
Okay, let's try another one now: what's the last price of McDonald's? This time, we used Sonnet, why not? Probably the router decided it was a slightly more complex question. We get the output from the API, and now if we ask Blitz to do a little more, we use the tool, and here's the answer. We got a little more information on McDonald's, again with some URLs. Let's try a slightly harder question: does 3M make filtration products for the automotive industry? This feels like a company-summary kind of thing. So we used Caller again, get_company_summary, 3M, MMM, and we get verbatim output from the Yahoo Finance API. There's got to be some information in there, but that's where using the additional model will probably be useful. So I use the tool. Yes, 3M does make filtration products, and we get an analysis: while the tool does not specifically mention filtration products, 3M's extensive involvement in the automotive sector suggests that the company offers filtration solutions, etc. Obviously, we could add RAG into the mix, retrieve information from 3M documents, and augment the quality of the answer here. But this just shows the model can be a source of knowledge too.
Let's try maybe this one: on which products do Procter & Gamble and Johnson & Johnson compete the most? Probably a little harder still. So we call GPT-4o on this one. Very interesting: we see two function calls, because I mentioned two company names, so the model automatically figures it out and calls those two APIs. We see the output for the two companies here. Let's see what Blitz can write about this. So Procter & Gamble and Johnson & Johnson compete most in the healthcare and personal care sector, and we get an analysis with overlapping areas and some additional resources. As you can see, we're working with well-known public companies, so it's kind of expected that Blitz would know what those companies are manufacturing and selling. And this is just a small model, the smallest and most cost-efficient model you could use in Conductor. So that's pretty cool.
In fact, if we want to have fun and combine automatic tool calling with automatic model routing, we can set the model to auto. So now we're going to ask Conductor to make two decisions: pick the right model for function calling, and then pick the right model for report generation. Let's see how this works. Running Conductor in full auto mode, this is what we get. If we look at the Conductor console, let me reload this to make sure, we can see that GPT-4o was selected for function calling, and Blitz was actually selected to write the story. We can see the tool results here, so that's pretty cool. Instead of using Blitz all along, you can see different models have been used along the way: Caller, Sonnet, GPT-4o, Blitz, etc. We'll keep adding more models here, and the obvious benefit is that we only use the larger models when we have to. Here, for story generation, we can get away with Blitz, which is again the simplest, smallest, and most cost-efficient model.
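In code, full auto mode could be as simple as this sketch, reusing the functions above: auto-tool routes the function-calling step, and auto routes the final report generation.

```python
# Full auto mode: let Conductor route both steps.
response = call_tools("On which products do Procter & Gamble and Johnson & Johnson compete the most?")
results = run_tool_calls(response)

report = client.chat.completions.create(
    model="auto",  # Conductor picks the right model for report generation
    messages=[{
        "role": "user",
        "content": "Write a short report based on this information:\n" + "\n".join(results),
    }],
)
print(f"Selected model: {report.model}")
print(report.choices[0].message.content)
```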
So, bottom line: if you're running function calling with GPT or Sonnet 100% of the time, well, I'm sorry to say, you're probably spending too much money, because as we can see here, we can get quality function-calling results and quality text-generation results from much more cost-efficient models. All right, well, that's what I wanted to show you. This just came out, and I think it's a pretty cool demonstration and use case for model routing. I'll see you soon with more content, and as always, keep rocking.
Tags
API Integration, Function Calling, Model Routing, Cost Efficiency, AI Text Generation
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.