It’s Never Different This Time: LLM Reliability Without the Hype with Julien Simon
November 24, 2025
In this video, Sylvain Kalache (Rootly) and I chat about the reliability of GenAI systems:
➡️ Why the reliability fundamentals haven't changed
➡️ Why identical models behave differently across providers
➡️ The realities of fast-moving open-source stacks
➡️ The rising role of MLOps
➡️ How enterprises navigate the open-source vs. closed-source debate
As usual, zero BS and a few hot takes. You should enjoy this one 😉 Thanks again to Sylvain for the opportunity.
Read full post on Substack →
Transcript
Welcome, I'm your host for today. In this episode, we welcome Julien Simon, VP and Chief Evangelist at Arcee AI, an open intelligence lab. Let's dive in. Julien, from a reliability point of view, for people who are using hosted LLMs, what are the main challenges that they are facing?
You know, it's never, it's never different this time.
So if you're working, and you're going to hear me say that a lot, if you're working with hosted APIs, you know, like OpenAI or Anthropic or Amazon Bedrock, any of those, at the end of the day you're still working with a cloud-based API. So all the usual concerns will apply, and I guess more concerns will apply. But generally, you know, you need to worry about, of course, uptime, and maybe you get an SLA, maybe you don't. You need to worry about latency, so how long it takes to actually run one inference query. Throughput: how many parallel queries can you run?
How quickly do you get throttled? Do you have explicit throttling limits? Do you have soft limits? Do you have hard limits? Are they real?
Have you tested them? You know, load testing and all that good stuff.
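To make that concrete, here is a minimal sketch of that kind of load test, assuming an OpenAI-compatible chat completions endpoint and the Python requests library; the URL, API key, and model name are placeholders for whatever your provider actually exposes.

```python
# Minimal load-test sketch: measure latency and throughput against a hosted,
# OpenAI-compatible chat completions endpoint. URL, key, and model id are
# placeholders, not any specific provider's values.
import concurrent.futures
import statistics
import time

import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"                                  # placeholder credential
MODEL = "llama-3-8b-instruct"                             # placeholder model id


def one_request(prompt: str) -> float:
    """Send one chat completion and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 128,
        },
        timeout=30,  # a hung request is a reliability failure too
    )
    resp.raise_for_status()
    return time.perf_counter() - start


def load_test(concurrency: int = 8, total: int = 64) -> None:
    """Fire `total` identical requests with `concurrency` workers, print latency stats."""
    prompts = ["Summarize the benefits of load testing in one sentence."] * total
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(one_request, prompts))
    elapsed = time.perf_counter() - start
    print(f"requests: {total}  concurrency: {concurrency}")
    print(f"p50 latency: {statistics.median(latencies):.2f}s")
    print(f"p95 latency: {statistics.quantiles(latencies, n=20)[-1]:.2f}s")
    print(f"throughput: {total / elapsed:.2f} requests/s")


if __name__ == "__main__":
    load_test()
```

Run it at increasing concurrency levels and watch where latency degrades or requests start failing; that is where the soft and hard limits actually are.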
And of course, I would add security. And that's not really part of reliability, but it needs to be in the picture. How much can you trust that API? Because you're going to be sending it all your good data, your confidential data, customer data, probably with PII, and so on. Just generally, treating those as, I would say, vanilla APIs is a good place to start. Don't let the AI magic blind you. It is an API. It's running in the cloud.
And it can break and fail and slow down in every possible way. That's where you should start. Figure that out. Do some testing. Check out SLAs.
Hopefully you have one. If not, what's your backup strategy if that API fails, et cetera?
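On that backup-strategy point, here is a minimal sketch of one common pattern, retrying on throttling and then failing over to a second provider; it assumes OpenAI-compatible endpoints, and the URLs, keys, and model ids are placeholders rather than a recommendation of specific providers.

```python
# Minimal fallback sketch: retry throttled calls (HTTP 429) with exponential
# backoff, then fail over to a backup provider if the primary keeps failing.
# Endpoints, keys, and model ids are placeholders.
import time

import requests

PROVIDERS = [
    {   # primary (placeholder values)
        "url": "https://api.primary-provider.example/v1/chat/completions",
        "key": "PRIMARY_API_KEY",
        "model": "llama-3-8b-instruct",
    },
    {   # backup (placeholder values)
        "url": "https://api.backup-provider.example/v1/chat/completions",
        "key": "BACKUP_API_KEY",
        "model": "llama-3-8b-instruct",
    },
]


def chat(prompt: str, retries: int = 3) -> str:
    """Try each provider in order; retry throttled or failed calls before failing over."""
    for provider in PROVIDERS:
        for attempt in range(retries):
            try:
                resp = requests.post(
                    provider["url"],
                    headers={"Authorization": f"Bearer {provider['key']}"},
                    json={
                        "model": provider["model"],
                        "messages": [{"role": "user", "content": prompt}],
                    },
                    timeout=30,
                )
                if resp.status_code == 429:       # throttled: back off and retry
                    time.sleep(2 ** attempt)
                    continue
                resp.raise_for_status()
                return resp.json()["choices"][0]["message"]["content"]
            except requests.RequestException:     # network error or 5xx: back off, retry
                time.sleep(2 ** attempt)
    raise RuntimeError("All providers failed; time to alert and degrade gracefully.")


if __name__ == "__main__":
    print(chat("Give me one sentence on why fallback plans matter."))
```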
"'It's real. "'And, you know, you could think, well, "'if, let's take a basic example. "'Let's say, you know, "'You're calling, you know, Llama, "'three, eight billion "'or, you know, "'sevillion or, "'70 billion, "'or any one of them, okay? "'And you've tested, "'and you've tested, You they are The a Confeder right . Right hey There to � or Confeder are and go I like and I – and I So they you Well or be and maybe where again em The where it - I You Yeah Yeah There enn , ...
It T in And and you
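If you want to reproduce that kind of comparison yourself, here is a minimal sketch that sends the same prompt, with explicitly pinned sampling parameters, to the same open model on two providers; it assumes both expose OpenAI-compatible endpoints, and the provider names, URLs, keys, and parameter values are illustrative placeholders only.

```python
# Minimal comparison sketch: query the "same" model on two providers with
# identical, explicitly pinned sampling parameters. Endpoints, keys, and
# model ids are placeholders.
import requests

ENDPOINTS = {
    "provider_a": {
        "url": "https://api.provider-a.example/v1/chat/completions",
        "key": "PROVIDER_A_KEY",
        "model": "llama-3-8b-instruct",
    },
    "provider_b": {
        "url": "https://api.provider-b.example/v1/chat/completions",
        "key": "PROVIDER_B_KEY",
        "model": "llama-3-8b-instruct",
    },
}

PROMPT = "Explain retrieval-augmented generation in two sentences."


def ask(url: str, key: str, model: str) -> str:
    """Send the shared prompt with pinned sampling parameters and return the reply."""
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": PROMPT}],
            "temperature": 0.0,  # pin sampling instead of trusting provider defaults
            "top_p": 1.0,
            "max_tokens": 128,
            "seed": 42,          # honored by some providers, ignored by others
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    for name, cfg in ENDPOINTS.items():
        print(f"--- {name} ---")
        print(ask(cfg["url"], cfg["key"], cfg["model"]))
```

Pinning temperature, top_p, and max_tokens rules out your own request as the source of divergence; whatever differences remain come from each provider's serving stack and defaults.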