Build a Reasoning AI Chatbot with Arcee AI Trinity Mini + Gradio + OpenRouter

December 22, 2025
I just built a chatbot that shows you exactly how the AI thinks. Using Arcee AI’s Trinity Mini model, I created a dual-panel interface in which the reasoning process is displayed in real time above the chat response. This makes the AI’s decision-making transparent—you can watch it work through problems step by step before it delivers the final answer. Read full post on Substack →

Transcript

Hi, Julien here. In this video, we're going to continue exploring the recently launched Arcee AI Trinity Mini model. And to do that, we're going to vibe code a simple chatbot. I'm going to use Python, Gradio, and the OpenAI API, and we're going to invoke the model hosted on OpenRouter. And of course, I'm going to use my favorite tools, Cursor and Claude Code. Okay, let's get started. I'm not going to go through the full model introduction again; I'll just include all the links in the video description.

Okay, so, elevator pitch: 26-billion-parameter model, mixture of experts, reasoning capabilities, and we'll see those in action. The model is available on Hugging Face under an Apache 2.0 license, which means you can use it even for commercial apps. It's also available on OpenRouter with a friendly OpenAI-compatible API, and that's what we're going to use today.

All right, let's get started, let's fire up Cursor. So there isn't much here, just an environment and a .env file with the OpenRouter API key, which is the usual way to do that instead of hard-coding it into your code. But then again, I'm showing it on screen, which is not what you should do, and obviously I will disable that key when I'm done.

Okay, so let's just prompt the model. I'm going to use plan mode for now and just let Cursor pick the right model. So here's my prompt: I want to build a simple chatbot based on the Arcee AI Trinity Mini model. You should use Gradio, the OpenAI client, and streaming mode. I also want to see the reasoning content. Add some simple prompts. And then I give it the model page on OpenRouter and point it at the API key.

Okay, let's try this. We'll look at the plan first, and if we like the plan, then we'll switch to building. There's a question: how would you like to display the reasoning content, in a separate panel, a tab below, inline, or a collapsible section? Okay, show me the reasoning content on top, then the answer below. Use different panels.
That's why I like to ask the models, you know, to ask me for clarification instead of just picking one of the options and then me realizing, ah, okay, no, that's not what I wanted. Okay, so it looks like we have our plan. Let's take a look: create the chatbot with Gradio and OpenAI, a streaming handler, implement reasoning extraction, add simple prompts. The layout: panels are stacked vertically. Simple prompts. Okay, well, that looks fairly straightforward. Okay, let's go build. That's probably going to take a little while, so I'll pause the video, and if there's anything fascinating, I'll show you.

Here's the code already. Okay, so we've made good progress. We have the basic structure. We have history management, which is nice because I didn't ask for that, so we should be able to ask follow-up questions. We have the streaming code, we have sample prompts, and then we have the app and the Gradio UI. All right, that should be done soon. Okay, so now we have extraction for the reasoning content, which is important. And yeah, that should be the last thing. It's fixing something somehow, which is fine. Let's give it a minute.

Okay, so now we have the code. One thing I think we're missing is the requirements file, so let's ask for that; I didn't specify it. Okay, done. So let's just keep all those updates and run the chatbot. Okay, there's a small error. Fine. Let's ask it to review the code for Gradio 6 compatibility, because that's the version it installed by default; I didn't specify any version here. Any deprecated keywords or inconsistencies, let's double-check that. That should save us some debugging. Okay, so apparently it did fix a few things. So let's keep all of those and run.

Okay, let's try and run it. I don't like the built-in browser so much. Good, good, good. Reasoning content, chat, examples. Let's give it a shot. Oh, yes. Man, this is fast. In the previous video I showed you how fast this was, it was like 250 tokens per second, and we get that here.
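For the reasoning extraction step, a minimal version could split the reasoning from the answer like this. I'm assuming the model wraps its chain of thought in `<think>...</think>` tags; the tag name is my assumption, and the generated app may instead read a dedicated reasoning field from the OpenRouter response.

```python
import re

# Assumed delimiter: some reasoning models emit <think>...</think>
# around the chain of thought before the final answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) so the UI can show them in
    separate panels, reasoning on top."""
    match = THINK_RE.search(text)
    if match is None:
        # No reasoning block: everything is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer
```

This is also the kind of place where the later code review flagged "hard-coded magic numbers for tag parsing": a compiled regex like this avoids slicing the text by fixed offsets.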
Okay, so we see the reasoning content, nice, and then we see the answer with a bit of math and everything. I like it; a bit of code too, good. Nice. Why don't we try another one: what are the key differences between supervised and unsupervised learning? Let's go. Wow. Isn't that cool? Seriously, that's the beauty of those mixture-of-experts models. It's 26 billion parameters, but only 3 billion active, meaning when you run inference, you're only computing, so to speak, with 3 billion parameters. That's why you get that kind of speed. Literally, you get the speed of a 3B model with the knowledge of a 26B model. Yeah, so that's pretty nice.

One thing I'd love to have here is the ability to set the number of tokens. So why don't we go back to Cursor and ask for that? Let's try this: add a slider to control the number of generated tokens. Okay, yeah, we can build that directly. It shouldn't be a big thing; it's just one more widget in the UI. It's passing max_tokens to the chat function. And what's this bit? Okay, it's passing the same to the completion API at OpenRouter. It's adding the slider, max 2K. Let's make that 4K maybe. And then it's passing it again. Okay, well, that looks good. Let's run that again. Reload. Ah, here's my slider. Okay, let's set it to 4K and try this one again. All right, that's pretty cool. You could keep playing with this, right? It's so easy.

Maybe one last thing I want to show you, one thing I tend to do: let's shut this down and run a code review with a different model. We could use Claude Code here, and there's an extension for Claude in Cursor, so you get the best of both worlds: you get all the cool models here, and you get to use your CLI. Let's check which one we have here. Let's use Opus. Okay: do a code review, list issues, don't fix them. Oops, don't fix them for now. And trust me, it will always...
Always find something. It's pretty crazy. No error handling for a missing API key, okay, that's a problem. No error handling for API calls, oh, that's bad. Global client initialization, okay. Hard-coded magic numbers for tag parsing, oh, that's ugly. Redundant... All right, there's a lot of stuff wrong about this thing. Critical issues, logic issues, oh, docstrings, oh my God. 15 issues. There you go, vibe coding.

Okay, let's fix the critical issues. I advise you to fix them one by one, not all at once. If you ask it to fix 15, it's going to rewrite everything and it's going to break a lot of things. And I like to tell it: apply the smallest possible change, and ask me if you're not sure. So if the API key is missing, just yell at the user. Yes, I like that one. Again, code reviews have this tendency to just rewrite everything, and they can just kill everything you've done so far. So what is it doing here? It's going to add a try/except block around the API call, which is otherwise unchanged. Yeah, okay, I like this one too. Okay, great, so that's a good start. And we could keep going, right? We could list the remaining issues.

One thing I like to do for a real project, not a toy demo like this, is to ask it to create GitHub issues automatically. I have a bunch of agents which live in a terminal which is way too tiny to see, oh, there you go: a GitHub issue creator, a GitHub issue resolver, et cetera. And those are really nice because you can use the GitHub CLI to get things done and focus the agent on a particular thing. That's really nice instead of doing everything in the same conversation, right? So that's pretty cool.

So should we try the chat again, just to see if we haven't broken anything? I'd be surprised, but you never know. Okay, it's still working. Let's try one of them again. Just bump this, submit. All right, and off it goes. Perfect.
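The two critical fixes, yelling at the user when the key is missing and wrapping the API call in try/except, can be sketched like this. The function names are mine, not from the generated code.

```python
import os

def get_api_key() -> str:
    """Fail fast with a clear message instead of a cryptic 401 later."""
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set; "
                           "add it to your .env file")
    return key

def safe_stream(create_completion, **kwargs):
    """Wrap the streaming API call so the UI shows an error message
    instead of crashing the app mid-chat."""
    try:
        yield from create_completion(**kwargs)
    except Exception as exc:
        yield f"API error: {exc}"
```

Both changes leave the surrounding code untouched, which is exactly the "smallest possible change" behavior you want out of a review fix.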
So, as you can see, you've got to use the right tool for the job, and the coding assistants are absolutely amazing. I love the combination of the built-in models in Cursor and Claude Code as well. Do the building first, check out the code, run it, get it to a decent place, and then use Claude Code and maybe your subagents to polish everything, from security to maybe the UI, maybe creating issues, et cetera. And working with different models, we'll cover more ground and hopefully find more issues, right?

And when it comes to solving business problems, I think working with a high-quality model like this one is just mind-blowing, right? The speed, as you can see, is just amazing, and it's super easy to use. You can use, again, the OpenAI API from OpenRouter; it will look exactly like any of the other models you're working with, except, of course, the speed and the cost are very, very different. We're using the free version, which has some kind of quota, I suppose, but if we look at the paid version, you're only paying 15 cents per million output tokens, which is, I think, at least 10 times cheaper than GPT-5 Mini, which is already fairly cost-effective. So if you're looking at the GPT-5s or the huge Anthropic models, you can probably divide your cost by 50, maybe 100, if you work with a model like this. And again, it's a reasoning model, and it's incredibly good. You'll see the benchmarks in the blog post. So, yeah, I would recommend that you take a look. You could get a lot of speedup, a lot of scalability, and a lot of savings, right? So what's not to like?

Okay, that's what I wanted to show you today: a little bit of vibe coding, a little bit of AI-assisted software development with large models, to build a simple, fast, and cost-effective application with a much smaller model. So, thank you for watching. I hope you liked it, and I'll see you soon with more content. Until next time, you know what to do. Keep rocking.

Tags

AI, Machine Learning, Technology