Model Aliases¶
Configuration

```yaml
model_aliases:
  gpt-oss-120b:
    - "gpt-oss:120b"           # Ollama format
    - gpt-oss-120b-MLX         # LM Studio MLX format
    - gguf_gpt_oss_120b.gguf   # llamacpp GGUF filename
```

Key Points:

- Aliases are defined at the top level of config.yaml
- Each alias maps a single virtual name to one or more actual model names
- The request body's "model" field is automatically rewritten for the selected backend
- Aliases take priority over standard model routing when both match
Overview¶
When running multiple LLM backends (Ollama, LM Studio, llamacpp, vLLM, etc.), the same underlying model often has different names on each platform. For example, Llama 3.1 8B might be known as:
- llama3.1:8b on Ollama
- llama-3.1-8b-instruct on LM Studio
- Meta-Llama-3.1-8B-Instruct.gguf on llamacpp
Without aliases, a client request for llama3.1:8b would only match the Ollama endpoint — even though the other backends have the same model.
Model aliases let you define a single virtual model name that maps to all of these variants, so any backend that has the model can serve the request.
How It Works¶
When a request arrives with a model name that matches a configured alias:
```text
Client request: "model": "my-llama"
          │
          ▼
┌─────────────────────┐
│  Alias Resolution   │  my-llama → [llama3.1:8b, llama-3.1-8b-instruct]
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ Endpoint Discovery  │  Find endpoints serving any of those model names
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│   Load Balancing    │  Select best endpoint (priority, health, etc.)
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│    Model Rewrite    │  Rewrite "model" field → "llama3.1:8b" (for Ollama)
└─────────┬───────────┘
          │
          ▼
       Backend
```
- Alias resolution — Olla checks whether the requested model name is a configured alias and looks up the list of actual model names.
- Endpoint discovery — For each actual model name, Olla queries the model registry to find endpoints that serve it. This builds an endpoint → actual model name mapping.
- Load balancing — The matched endpoints are filtered through the normal load balancing pipeline (priority, health checks, etc.).
- Model rewrite — Before the request is sent to the selected backend, Olla rewrites the "model" field in the JSON request body to the actual model name that backend expects.
Configuration¶
Aliases are defined under the model_aliases key in config.yaml:
```yaml
model_aliases:
  # Alias name → list of actual model names across backends
  my-llama:
    - "llama3.1:8b"                     # Ollama
    - llama-3.1-8b-instruct             # LM Studio
    - Meta-Llama-3.1-8B-Instruct.gguf   # llamacpp
  my-codegen:
    - "qwen2.5-coder:7b"                # Ollama
    - qwen2.5-coder-7b-instruct         # LM Studio
```
Clients can then use the alias name in their requests:
```bash
curl http://localhost:40114/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-llama",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
Olla will route to whichever backend has one of the listed models and rewrite "my-llama" to the correct name for that backend.
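Conceptually, the rewrite is a small JSON transformation: decode the body, replace the "model" field, re-encode. A minimal Go sketch of that step, assuming a plain (non-streaming) JSON body — the function name is illustrative, not Olla's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rewriteModel replaces the "model" field in a JSON request body with the
// actual model name the selected backend expects. Illustrative only: a real
// proxy also has to handle streaming bodies, content length, headers, etc.
func rewriteModel(body []byte, actual string) ([]byte, error) {
	var req map[string]any
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("request body is not JSON: %w", err)
	}
	req["model"] = actual
	return json.Marshal(req)
}

func main() {
	in := []byte(`{"model":"my-llama","messages":[{"role":"user","content":"Hello!"}]}`)
	out, err := rewriteModel(in, "llama3.1:8b") // routed to the Ollama endpoint
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```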
Self-Referencing Aliases¶
An alias name can also appear in its own list of actual model names. This is useful when the alias name is itself a real model name on one of the backends:
```yaml
model_aliases:
  gpt-oss-120b:
    - "gpt-oss:120b"   # Ollama knows it as gpt-oss:120b
    - gpt-oss-120b     # LM Studio knows it as gpt-oss-120b (same as alias)
```
In this case:
- An Ollama endpoint serving gpt-oss:120b will be included, and the request body will be rewritten to "gpt-oss:120b".
- An LM Studio endpoint serving gpt-oss-120b will also be included, and the request body keeps "gpt-oss-120b" (no unnecessary rewrite, since it already matches).
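Under the hood this just means the rewrite step is a no-op when the resolved name equals the requested one. A small Go illustration (the endpoint names here are made up):

```go
package main

import "fmt"

func main() {
	requested := "gpt-oss-120b"
	// endpoint → actual model name, as produced by alias resolution
	candidates := map[string]string{
		"ollama-box":  "gpt-oss:120b", // rewrite required
		"lmstudio-m2": "gpt-oss-120b", // same as the alias: no rewrite
	}
	for ep, actual := range candidates {
		if actual == requested {
			fmt.Printf("%s: body left as %q\n", ep, requested)
		} else {
			fmt.Printf("%s: body rewritten to %q\n", ep, actual)
		}
	}
}
```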
Alias Priority¶
When a model name matches both a configured alias and a real model known to the registry, the alias takes priority. This ensures consistent cross-backend routing.
If the alias resolves to zero endpoints (none of the actual model names are available), Olla falls back to standard model routing using the alias name as a regular model name.
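A minimal sketch of that precedence rule, with a hypothetical `endpointsFor` helper standing in for Olla's registry lookup:

```go
package main

import "fmt"

// routeModel applies the precedence described above: an alias match wins,
// but an alias that resolves to zero live endpoints falls back to treating
// the requested name as a regular model. Names are hypothetical.
func routeModel(aliases map[string][]string, endpointsFor func(string) []string, requested string) []string {
	if actuals, ok := aliases[requested]; ok {
		var eps []string
		for _, actual := range actuals {
			eps = append(eps, endpointsFor(actual)...)
		}
		if len(eps) > 0 {
			return eps // alias takes priority over a real model of the same name
		}
	}
	return endpointsFor(requested) // standard model routing
}

func main() {
	aliases := map[string][]string{"llama3": {"nonexistent-model"}}
	registry := map[string][]string{"llama3": {"ollama-rtx4090"}}
	endpointsFor := func(m string) []string { return registry[m] }
	// The alias resolves to zero endpoints, so routing falls back to "llama3" itself.
	fmt.Println(routeModel(aliases, endpointsFor, "llama3"))
}
```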
Interaction with Other Features¶
Model Routing¶
Alias resolution runs before the standard model routing pipeline (strict, optimistic, or discovery modes). Once alias endpoints are resolved, they go through the same load balancing and health filtering as any other request.
Model Unification¶
Aliases are separate from model unification. Unification merges model catalogues within a single provider type (e.g. multiple Ollama instances). Aliases map across provider types (e.g. Ollama ↔ LM Studio ↔ llamacpp).
Proxy Engines¶
Both the Olla and Sherpa proxy engines support model alias rewriting. The rewrite happens transparently before the request is forwarded to the backend.
Example Scenario¶
Consider a home lab with three backends:
```yaml
discovery:
  static:
    endpoints:
      - url: "http://workstation:11434"
        name: "ollama-rtx4090"
        type: "ollama"
        priority: 100
      - url: "http://macbook:1234"
        name: "lmstudio-m2"
        type: "lm-studio"
        priority: 75
      - url: "http://server:8080"
        name: "llamacpp-a100"
        type: "llamacpp"
        priority: 50

model_aliases:
  llama3:
    - "llama3.1:8b"
    - llama-3.1-8b-instruct
    - Meta-Llama-3.1-8B-Instruct.gguf
```
A request for "model": "llama3" will:
- Resolve to all three endpoints (each has the model under a different name)
- Prefer ollama-rtx4090 (highest priority)
- Rewrite the model name to llama3.1:8b if routed to Ollama, llama-3.1-8b-instruct if routed to LM Studio, etc.
- Fall back to the next endpoint if the primary is unhealthy
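To make the fallback order concrete, here is a small Go sketch that ranks the scenario's candidates by priority and skips unhealthy ones. The endpoint names and model mappings come from the config above; the health states are invented for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

type candidate struct {
	Endpoint string
	Model    string // actual name sent to this backend
	Priority int
	Healthy  bool
}

func main() {
	// The three endpoints from the scenario, with made-up health states.
	cands := []candidate{
		{"ollama-rtx4090", "llama3.1:8b", 100, false}, // assume the primary is down
		{"lmstudio-m2", "llama-3.1-8b-instruct", 75, true},
		{"llamacpp-a100", "Meta-Llama-3.1-8B-Instruct.gguf", 50, true},
	}
	// Highest priority first, then take the first healthy endpoint.
	sort.Slice(cands, func(i, j int) bool { return cands[i].Priority > cands[j].Priority })
	for _, c := range cands {
		if !c.Healthy {
			continue // fall back to the next endpoint
		}
		fmt.Printf("route to %s, rewrite model to %q\n", c.Endpoint, c.Model)
		break
	}
}
```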
Troubleshooting¶
Alias Not Resolving¶
Issue: Requests with an alias name return 404 or route incorrectly.
Possible Causes:
- Actual model names in the alias don't match what backends report
- Model discovery hasn't run yet
Solutions:
- Check discovered models: curl http://localhost:40114/olla/models
- Verify model names match exactly (including tags like :latest)
- Wait for model discovery to complete or trigger a refresh
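When names look identical but still fail to resolve, an exact string comparison usually exposes the difference (a missing tag, a stray suffix). A small Go sketch you can adapt: paste the names reported by /olla/models into `discovered` and check every alias target — the names below are examples only, and the check is deliberately strict because matching is exact, tags included:

```go
package main

import "fmt"

func main() {
	// Paste the names from: curl http://localhost:40114/olla/models
	discovered := map[string]bool{
		"llama3.1:8b":           true,
		"llama-3.1-8b-instruct": true,
	}
	// The entries from your alias definition in config.yaml.
	aliasTargets := []string{"llama3.1:8b", "llama-3.1-8b-instruct", "llama3.1"} // last one is missing its ":8b" tag
	for _, t := range aliasTargets {
		if !discovered[t] {
			fmt.Printf("alias target %q not reported by any backend\n", t)
		}
	}
}
```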
Wrong Model Name Sent to Backend¶
Issue: Backend receives the alias name instead of the actual model name.
Possible Causes:
- Actual model name in the alias list doesn't exactly match the model name reported by the backend
- Request body is not JSON
Solutions:
- Compare alias model names against discovered models: curl http://localhost:40114/olla/models
- Ensure requests use Content-Type: application/json
Alias Overriding a Real Model¶
Issue: An alias is intercepting requests meant for a real model with the same name.
This is by design — aliases always take priority. If you need to reach the real model directly, either:
- Remove the alias, or
- Include the real model name in the alias list (self-referencing) so it stays in the candidate pool