Model Routing¶
Default Configuration
model_registry:
  routing_strategy:
    type: "strict"  # strict, optimistic, or discovery
    options:
      fallback_behavior: "compatible_only"  # compatible_only, all, or none
      discovery_timeout: 2s
      discovery_refresh_on_miss: false
Routing Strategies:
- strict (default) - Only routes to endpoints with the model
- optimistic - Routes with configurable fallback behavior
- discovery - Refreshes model catalog before routing
Fallback Behaviors:
- compatible_only (default) - Rejects if model not found
- all - Routes to any healthy endpoint
- none - Always rejects if model not found
Olla implements intelligent model routing strategies to handle scenarios where requested models aren't available on all endpoints.
Overview¶
When a request specifies a model (e.g., phi3.5:latest), Olla needs to determine which endpoints can handle that request. Not all endpoints have all models, and endpoints can become unhealthy during operation.
Routing Strategies¶
Strict Mode (Default)¶
Only routes requests to endpoints known to have the model.
Characteristics:
- High reliability - requests only go to endpoints with the model
- Fails fast when model unavailable
- Returns 404 when model not found anywhere
- Returns 503 when model only on unhealthy endpoints
Use Case: Production environments where predictability is critical.
Optimistic Mode¶
Can route to any healthy endpoint when the model isn't found, depending on the configured fallback behavior.
Characteristics:
- Higher availability - tries all healthy endpoints
- May route to endpoints without the model
- Configurable fallback behavior
- Best effort approach
Use Case: Development environments or when models might be pulled on-demand.
model_registry:
  routing_strategy:
    type: optimistic
    options:
      fallback_behavior: compatible_only  # or "all", "none"
Discovery Mode¶
Refreshes model discovery before making routing decisions.
Characteristics:
- Most accurate model availability
- Adds latency for discovery refresh
- Configurable timeout
- Falls back to strict behavior after refresh
Use Case: When models are frequently added/removed from endpoints.
model_registry:
  routing_strategy:
    type: discovery
    options:
      discovery_refresh_on_miss: true
      discovery_timeout: 2s
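The refresh-on-miss idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not Olla's implementation: the function name `endpoints_for`, the catalog shape (a dict from model name to endpoint names) and the `refresh` callable are all hypothetical, and the fall-back-to-cache step on timeout or failure follows the behavior described for discovery timeouts in the tables below.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def endpoints_for(model, cached_catalog, refresh, timeout_s=2.0):
    """Return (endpoints, source) for `model`, refreshing the catalog on a miss.

    cached_catalog: dict mapping model name -> list of endpoint names (assumed shape).
    refresh: hypothetical callable that returns a fresh catalog of the same shape.
    """
    if model in cached_catalog:
        return cached_catalog[model], "cache"

    # Run the refresh in a worker thread so it can be abandoned on timeout.
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(refresh)
    try:
        fresh = future.result(timeout=timeout_s)
    except FutureTimeout:
        # Discovery took too long: fall back to whatever is cached.
        return cached_catalog.get(model, []), "cache_after_timeout"
    except Exception:
        # Discovery failed outright: also fall back to cached data.
        return cached_catalog.get(model, []), "cache_after_error"
    finally:
        # Do not block on a slow refresh; the worker finishes in the background.
        executor.shutdown(wait=False)
    return fresh.get(model, []), "refreshed"
```

An empty endpoint list here corresponds to the "model not found after refresh" case, at which point the configured fallback behavior decides the outcome.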
Fallback Behavior¶
Controls what happens when the requested model isn't available on any healthy endpoint:
- compatible_only: Reject the request with 404 - prevents routing to endpoints that don't have the model
- all: Route to any healthy endpoint even if it doesn't have the model
- none: Never fall back, always reject with 404 if model not found
Response Headers¶
Routing decisions are exposed via HTTP response headers for observability, such as X-Olla-Routing-Reason (see Routing Reasons).
Status Codes and Routing Decisions¶
Different scenarios result in specific HTTP status codes and routing behaviors:
Strict Mode Behavior¶
Scenario | Status Code | Routing Decision | Description |
---|---|---|---|
Model found on healthy endpoint | 200 OK | routed | Normal routing to endpoint with model |
Model not found anywhere | 404 Not Found | rejected | Model doesn't exist in the system |
Model exists but only on unhealthy endpoints | 503 Service Unavailable | rejected | Model unavailable due to endpoint health |
Optimistic Mode Behavior¶
Scenario | Fallback | Status Code | Routing Decision | Description |
---|---|---|---|---|
Model found on healthy endpoint | Any | 200 OK | routed | Normal routing to endpoint with model |
Model not found | none | 404 Not Found | rejected | Model doesn't exist, no fallback |
Model not found | compatible_only | 404 Not Found | rejected | Model doesn't exist, no fallback |
Model not found | all | 200 OK | fallback | Routes to any healthy endpoint |
Model on unhealthy endpoint only | none | 503 Service Unavailable | rejected | Model unavailable, no fallback |
Model on unhealthy endpoint only | compatible_only | 503 Service Unavailable | rejected | Model unavailable, no fallback |
Model on unhealthy endpoint only | all | 200 OK | fallback | Routes to any healthy endpoint |
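The strict and optimistic tables above reduce to a small decision function. The sketch below is a reading of those tables, not Olla's code: the name `route`, the `model_state` labels and the return shape are hypothetical, and it assumes at least one healthy endpoint exists whenever fallback is used.

```python
STRATEGIES = ("strict", "optimistic")
FALLBACKS = ("compatible_only", "all", "none")
MODEL_STATES = ("healthy", "unhealthy_only", "missing")


def route(strategy, model_state, fallback="compatible_only"):
    """Return (status_code, routing_decision) as listed in the tables above.

    model_state is a hypothetical label:
      "healthy"        - the model is on at least one healthy endpoint
      "unhealthy_only" - the model exists only on unhealthy endpoints
      "missing"        - the model is not known anywhere
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unsupported strategy: {strategy!r}")
    if fallback not in FALLBACKS:
        raise ValueError(f"unknown fallback behavior: {fallback!r}")
    if model_state not in MODEL_STATES:
        raise ValueError(f"unknown model state: {model_state!r}")

    if model_state == "healthy":
        return 200, "routed"
    # Only optimistic mode with fallback "all" sends the request elsewhere.
    if strategy == "optimistic" and fallback == "all":
        return 200, "fallback"
    if model_state == "missing":
        return 404, "rejected"
    return 503, "rejected"
```

For example, `route("optimistic", "unhealthy_only", "all")` gives `(200, "fallback")`, while the same request in strict mode gives `(503, "rejected")`.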
Discovery Mode Behavior¶
Scenario | Status Code | Routing Decision | Description |
---|---|---|---|
Model found after refresh | 200 OK | routed | Discovery found the model |
Model not found after refresh | Depends on fallback | rejected or fallback | Follows fallback behavior settings |
Discovery timeout | Depends on fallback | rejected or fallback | Falls back to cached data |
Routing Reasons¶
The X-Olla-Routing-Reason header provides detailed information about routing decisions:
Reason | Status | Description |
---|---|---|
model_found | 200 | Model found on healthy endpoints |
model_not_found | 404 | Model doesn't exist in the system |
model_not_found_fallback | 200 | Model not found but falling back to all endpoints |
model_unavailable_no_fallback | 503 | Model exists but unavailable, no fallback |
model_unavailable_compatible_only | 503 | Model exists but unavailable, compatible_only prevents fallback |
all_healthy_fallback | 200 | Using all healthy endpoints as fallback |
discovery_failed | Varies | Discovery refresh failed, using cached data |
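When scanning logs or test output, it can help to keep the table above as a lookup. The following sketch copies the reasons and statuses from the table; the names `ROUTING_REASONS` and `explain_reason` are hypothetical helpers, not part of Olla.

```python
# Reason -> (expected HTTP status, description), copied from the table above.
# None stands for the "Varies" status of discovery_failed.
ROUTING_REASONS = {
    "model_found": (200, "Model found on healthy endpoints"),
    "model_not_found": (404, "Model doesn't exist in the system"),
    "model_not_found_fallback": (200, "Model not found but falling back to all endpoints"),
    "model_unavailable_no_fallback": (503, "Model exists but unavailable, no fallback"),
    "model_unavailable_compatible_only": (
        503,
        "Model exists but unavailable, compatible_only prevents fallback",
    ),
    "all_healthy_fallback": (200, "Using all healthy endpoints as fallback"),
    "discovery_failed": (None, "Discovery refresh failed, using cached data"),
}


def explain_reason(reason):
    """Turn an X-Olla-Routing-Reason value into a one-line explanation."""
    if reason not in ROUTING_REASONS:
        return f"{reason}: unrecognized routing reason"
    status, description = ROUTING_REASONS[reason]
    label = "varies" if status is None else str(status)
    return f"{reason} [{label}]: {description}"
```

A call such as `explain_reason("model_not_found")` returns the reason, its expected status and the description on one line.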
Configuration Example¶
Complete routing configuration:
# Strict mode for production (default)
model_registry:
  type: "memory"
  enable_unifier: true
  routing_strategy:
    type: strict

# Optimistic with compatible fallback
model_registry:
  type: "memory"
  enable_unifier: true
  routing_strategy:
    type: optimistic
    options:
      fallback_behavior: compatible_only

# Discovery with timeout
model_registry:
  type: "memory"
  enable_unifier: true
  routing_strategy:
    type: discovery
    options:
      discovery_refresh_on_miss: true
      discovery_timeout: 2s
      fallback_behavior: none
Metrics and Monitoring¶
Routing decisions are tracked in metrics:
- Total routing decisions by strategy
- Success/failure rates per strategy
- Fallback usage statistics
- Discovery refresh latency
Access metrics via the /internal/status endpoint.
Best Practices¶
- Use strict mode in production for predictable behavior
- Enable discovery mode when models change frequently
- Monitor routing headers to understand request flow
- Set appropriate timeouts for discovery mode
- Choose fallback behavior carefully:
  - none or compatible_only for APIs that need model accuracy
  - all only when any endpoint can handle unknown models
Troubleshooting¶
Getting 404 for Known Models¶
Issue: Requests return 404 even though the model exists
Possible Causes:
- Model only exists on unhealthy endpoints
- Using compatible_only or none fallback when model isn't discovered yet
- Model discovery hasn't run yet
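Solutions:
1. Check endpoint health: curl http://localhost:40114/internal/status/endpoints
2. Verify model discovery: curl http://localhost:40114/internal/status/models
3. Try discovery mode with refresh: set the routing strategy type to discovery with discovery_refresh_on_miss: true, as in the Discovery Mode configuration above.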
Getting 503 vs 404¶
Understanding the Difference:
- 404: Model doesn't exist anywhere in the system
- 503: Model exists but no healthy endpoints have it
How to Debug:
# Check all models in the system
curl http://localhost:40114/olla/models
# Check endpoint health
curl http://localhost:40114/internal/status/endpoints
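# Look at routing headers
# (-D - prints the response headers; -I would send HEAD, which curl rejects when combined with -d)
curl -s -o /dev/null -D - http://localhost:40114/olla/ollama/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "test-model", "messages": []}'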
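For scripted checks, the same probe can be written in Python with only the standard library. This is a rough sketch: the helper names `olla_headers` and `probe` are hypothetical, the URL is the one used in the commands above, and filtering on an `X-Olla-` prefix assumes the other routing headers share the prefix of X-Olla-Routing-Reason.

```python
import json
import urllib.error
import urllib.request


def olla_headers(headers):
    """Keep only X-Olla-* headers from (name, value) pairs, lowercasing the names."""
    return {
        name.lower(): value
        for name, value in headers
        if name.lower().startswith("x-olla-")
    }


def probe(model, url="http://localhost:40114/olla/ollama/v1/chat/completions"):
    """POST a minimal chat request and return (status_code, routing_headers)."""
    body = json.dumps({"model": model, "messages": []}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, olla_headers(response.headers.items())
    except urllib.error.HTTPError as err:
        # 404 and 503 rejections are raised as HTTPError; their headers
        # still carry the routing information.
        return err.code, olla_headers(err.headers.items())
```

Calling `probe("test-model")` against a running instance returns the status code together with whatever routing headers the response carried, which can then be compared with the Routing Reasons table.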
Unexpected Fallback Behavior¶
Issue: Requests going to endpoints without the model
Check Your Configuration: