About¶
Olla was created by Thushan Fernando, building on his earlier work with LLM streaming proxies such as Sherpa (and before that, Scout). Those earlier tools had grown feature-rich, doing far more than just routing LLM requests. Most users who adopted Scout and Sherpa were primarily using the LLM Streaming Proxy component to combine multiple instances and unify models, though we also ensured models stayed in sync across instances.
Olla takes a step back to deliver a clean, focused and high-performance proxy with model unification, starting with a port of the Sherpa Proxy and refining it into something leaner, faster and easier to work with.
Core Principles¶
Olla is built with a few guiding principles in mind:
- Fast and memory-efficient - Designed for high throughput with minimal allocations, so it runs lean even under heavy load.
- Configuration-driven - Behaviour is defined in a config file, not buried behind endless CLI flags.
- Robust under pressure - Handles malformed requests, model failures and connection hiccups gracefully.
- Extensible - Easy to adapt for new LLM backends or routing logic without rewriting the core.
- Observable - Built-in metrics and status endpoints so you can see what’s happening in real time.
The Name "Olla"¶
The name Olla is a tribute to a dear friend and colleague who would often ask, "We could do it in Olla?", his playful twist on "We could do it in Ollama?". Shortening words is something of a national pastime in Australia, and as a newly minted citizen, he embraced it wholeheartedly.
Tragically, we lost our mate in a motorbike accident in early 2025. Naming this tool Olla felt like a fitting way to honour his memory.
In Latin, an olla is a ceramic pot used for cooking stews and soups (Wikipedia). We liked the idea that you can "cook something up" in this Olla, with LLMs as your ingredients.
And in some Nordic languages, olla carries yet more meanings, just the tip of the iceberg.