Health Checking¶
Default Configuration
Supported Settings:
- check_interval (default: 30s) - Time between health checks
- check_timeout (default: 5s) - Maximum time to wait for a response
- check_path (auto-detected) - Health check endpoint path
Environment Variables: Per-endpoint settings are not supported via environment variables.
Olla continuously monitors the health of all configured endpoints to ensure requests are only routed to available backends. The health checking system is automatic and requires minimal configuration.
Overview¶
Health checks serve multiple purposes:
- Availability Detection: Identify when endpoints come online or go offline
- Performance Monitoring: Track endpoint latency and response times
- Intelligent Routing: Ensure requests only go to healthy endpoints
- Automatic Recovery: Detect when failed endpoints recover
How It Works¶
Health Check Cycle¶
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Scheduler  │────▶│ Health Check │────▶│Update Status│
│  (interval) │     │   Request    │     │  & Route    │
└─────────────┘     └──────────────┘     └─────────────┘
       ▲                                        │
       └────────────────────────────────────────┘
- Scheduler triggers checks based on configured intervals
- Health Check sends HTTP request to endpoint's health URL
- Status Update marks endpoint as healthy/unhealthy
- Route Update adds/removes endpoint from routing pool
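The four steps above can be sketched as a single pass over the configured endpoints. This is an illustrative sketch only, not Olla's implementation: the `Endpoint` class, the `run_cycle` function and the injected `probe` callable are hypothetical names, and a real scheduler would call `run_cycle` on a timer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class Endpoint:
    """Hypothetical endpoint record used only for this sketch."""
    name: str
    health_url: str
    healthy: bool = False


def run_cycle(
    endpoints: Dict[str, Endpoint],
    routable: Set[str],
    probe: Callable[[str], bool],
) -> None:
    """Run one pass: probe each endpoint, update its status, update the routing pool."""
    for ep in endpoints.values():
        ep.healthy = probe(ep.health_url)  # health check request
        if ep.healthy:
            routable.add(ep.name)          # route update: add to pool
        else:
            routable.discard(ep.name)      # route update: remove from pool


# Example with a fake probe, so no network access is needed.
endpoints = {
    "local-ollama": Endpoint("local-ollama", "http://localhost:11434/"),
    "remote-ollama": Endpoint("remote-ollama", "http://remote:11434/"),
}
routable: Set[str] = set()
run_cycle(endpoints, routable, probe=lambda url: "localhost" in url)
print(sorted(routable))  # ['local-ollama']
```

Passing the probe in as a function keeps the cycle logic testable without real HTTP requests.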
Health States¶
Endpoints can be in one of these states:
State | Description | Routable | Behaviour |
---|---|---|---|
Healthy | Passing health checks | ✅ Yes | Normal routing |
Degraded | Slow but responding | ✅ Yes | Reduced traffic weight |
Recovering | Coming back online | ✅ Yes | Limited test traffic |
Unhealthy | Failing health checks | ❌ No | No traffic routed |
Unknown | Not yet checked | ❌ No | Awaiting first check |
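One way to picture the table is as an enumeration plus a set of routable states. This is a sketch under assumptions: the `HealthState` and `is_routable` names are hypothetical, and only the `healthy` and `unhealthy` string values appear in the status responses shown in this document; the others are guesses.

```python
from enum import Enum


class HealthState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"      # string value is an assumption
    RECOVERING = "recovering"  # string value is an assumption
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"        # string value is an assumption


# States that may receive traffic, following the "Routable" column.
ROUTABLE = {HealthState.HEALTHY, HealthState.DEGRADED, HealthState.RECOVERING}


def is_routable(state: HealthState) -> bool:
    """Return True if an endpoint in this state may receive requests."""
    return state in ROUTABLE
```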
Configuration¶
Basic Health Check Setup¶
discovery:
  static:
    endpoints:
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
        health_check_url: "/"   # Health endpoint
        check_interval: 5s      # How often to check
        check_timeout: 2s       # Timeout per check
Health Check URLs by Platform¶
Different platforms use different health endpoints:
Platform | Default Health URL | Expected Response |
---|---|---|
Ollama | / | 200 with "Ollama is running" |
LM Studio | /v1/models | 200 with model list |
vLLM | /health | 200 with JSON status |
OpenAI-compatible | /v1/models | 200 with model list |
Check Intervals¶
Configure how frequently health checks run:
endpoints:
  - url: "http://localhost:11434"
    check_interval: 5s   # Fast checks for local
  - url: "http://remote:11434"
    check_interval: 30s  # Slower for remote
Recommendations:
- Local endpoints: 2-5 seconds
- LAN endpoints: 5-10 seconds
- Remote/Cloud: 15-30 seconds
- Critical endpoints: 2-3 seconds
Check Timeouts¶
Set appropriate timeouts based on endpoint characteristics:
endpoints:
  - url: "http://fast-server:11434"
    check_timeout: 1s  # Fast server
  - url: "http://slow-server:11434"
    check_timeout: 5s  # Allow more time
Adaptive Health Checking¶
Backoff Strategy¶
When an endpoint fails, Olla implements exponential backoff:
- First failure: Check again after check_interval
- Second failure: Wait check_interval * 2
- Third failure: Wait check_interval * 4
- Max backoff: Capped at 5 minutes
This reduces load on failing endpoints while still detecting recovery.
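The schedule above amounts to doubling the interval for each consecutive failure, up to the cap. A minimal sketch of that arithmetic, assuming the doubling continues past the third failure; the function name `next_check_delay` is hypothetical and not part of Olla:

```python
MAX_BACKOFF_SECONDS = 300.0  # the 5 minute cap from the list above


def next_check_delay(check_interval: float, consecutive_failures: int) -> float:
    """Return the wait in seconds before the next check of a failing endpoint."""
    # 1 failure -> x1, 2 failures -> x2, 3 failures -> x4, and so on.
    # The exponent is clamped so very long failure streaks cannot overflow.
    exponent = min(max(consecutive_failures - 1, 0), 32)
    return min(check_interval * 2 ** exponent, MAX_BACKOFF_SECONDS)


print([next_check_delay(5.0, n) for n in range(1, 8)])
# [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0]
```

With a 5 second interval the cap is reached on the seventh consecutive failure, so a long outage settles at one check every 5 minutes.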
Fast Recovery Detection¶
When an unhealthy endpoint might be recovering:
- Half-Open State: Send limited test traffic
- Success Threshold: After 2 successful checks, mark healthy
- Full Traffic: Resume normal routing
Health Check Types¶
HTTP GET Health Checks¶
The default health check method:
endpoints:
  - url: "http://localhost:11434"
    health_check_url: "/"
    # Sends: GET http://localhost:11434/
    # Expects: 200-299 status code
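To make the GET check concrete, here is a standard-library sketch of the same idea: join the base URL with the health path, send a GET, and treat any 200-299 status as healthy. It is not Olla's code; `check_endpoint` and `is_success` are hypothetical names, and treating every connection error or timeout as unhealthy is an assumption.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def is_success(status: int) -> bool:
    """A check passes on any 2xx status code."""
    return 200 <= status <= 299


def check_endpoint(base_url: str, health_check_url: str = "/", timeout: float = 2.0) -> bool:
    """Send one GET health check and report whether it passed."""
    url = urljoin(base_url, health_check_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_success(resp.status)
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        return False
```

Calling `check_endpoint("http://localhost:11434", "/", timeout=2.0)` mirrors the configuration above.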
Model Discovery Health Checks¶
For endpoints that support model listing:
endpoints:
  - url: "http://localhost:11434"
    type: "ollama"
    model_url: "/api/tags"
    # Health check also validates model availability
Circuit Breaker Integration¶
Health checks work with the circuit breaker to prevent cascade failures:
Circuit States¶
Closed (Normal)
    │
    ├─── 3 failures ──▶ Open (No Traffic)
    │                        │
    │                        │ 30s timeout
    │                        ▼
    └──── 2 successes ─── Half-Open (Test Traffic)
- Closed: Normal operation, all requests pass through
- Open: Endpoint marked unhealthy, no requests sent
- Half-Open: Testing recovery with limited requests
Circuit Breaker Behaviour¶
The circuit breaker activates after consecutive failures:
- Failure Threshold: 3 consecutive failures trigger opening
- Open Duration: Circuit stays open for 30 seconds
- Half-Open Test: Send 3 test requests
- Recovery: 2 successful tests close the circuit
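The thresholds above describe a small state machine. The sketch below models it under stated assumptions: the `CircuitBreaker` class and its method names are hypothetical, a failure while half-open is assumed to re-open the circuit, and the limit of 3 test requests is not modelled. The clock is injectable so the example runs without waiting.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Illustrative closed/open/half-open breaker; not Olla's implementation."""

    def __init__(
        self,
        failure_threshold: int = 3,
        open_seconds: float = 30.0,
        success_threshold: int = 2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.success_threshold = success_threshold
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        """Move open -> half-open once the open duration has elapsed."""
        if self.state == "open" and self.clock() - self.opened_at >= self.open_seconds:
            self.state = "half-open"
            self.successes = 0
        return self.state != "open"

    def record_success(self) -> None:
        if self.state == "half-open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        elif self.state == "closed":
            self.failures = 0  # only consecutive failures count

    def record_failure(self) -> None:
        if self.state == "open":
            return
        if self.state == "half-open":
            self._open()  # assumption: any failed test re-opens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
        self.successes = 0


# Walk through the diagram with a fake clock.
now = [0.0]
cb = CircuitBreaker(clock=lambda: now[0])
for _ in range(3):
    cb.record_failure()
print(cb.state)            # open
now[0] = 30.0
print(cb.allow_request())  # True
cb.record_success()
cb.record_success()
print(cb.state)            # closed
```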
Monitoring Health Status¶
Health Status Endpoint¶
Check overall system health by querying /internal/health.
Response:
{
  "status": "healthy",
  "endpoints": {
    "healthy": 3,
    "unhealthy": 1,
    "total": 4
  },
  "uptime": "2h15m",
  "version": "1.0.0"
}
Endpoint Status¶
View detailed endpoint health by querying /internal/status/endpoints.
Response:
{
  "endpoints": [
    {
      "name": "local-ollama",
      "url": "http://localhost:11434",
      "status": "healthy",
      "last_check": "2024-01-15T10:30:45Z",
      "last_latency": "15ms",
      "consecutive_failures": 0,
      "uptime_percentage": 99.9
    },
    {
      "name": "remote-ollama",
      "status": "unhealthy",
      "last_check": "2024-01-15T10:30:40Z",
      "consecutive_failures": 6,
      "error": "connection timeout"
    }
  ]
}
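A response of this shape is straightforward to post-process in a monitoring script. The sketch below lists the endpoints that are not healthy, using only the field names visible in the sample response; the helper name `unhealthy_endpoints` is hypothetical, and fetching the payload from a running Olla instance is left out.

```python
import json
from typing import List


def unhealthy_endpoints(payload: str) -> List[str]:
    """Return 'name: error' for every endpoint whose status is not 'healthy'."""
    data = json.loads(payload)
    return [
        f'{ep["name"]}: {ep.get("error", "no error reported")}'
        for ep in data.get("endpoints", [])
        if ep.get("status") != "healthy"
    ]


# Trimmed copy of the sample response above.
sample = """
{
  "endpoints": [
    {"name": "local-ollama", "status": "healthy", "consecutive_failures": 0},
    {"name": "remote-ollama", "status": "unhealthy",
     "consecutive_failures": 6, "error": "connection timeout"}
  ]
}
"""
print(unhealthy_endpoints(sample))  # ['remote-ollama: connection timeout']
```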
Model Statistics¶
Monitor model performance across endpoints by querying /internal/stats/models.
Metrics include:
- Request counts per model
- Model availability across endpoints
- Average check latency
- Endpoints by status
Troubleshooting¶
Endpoint Always Unhealthy¶
Issue: Endpoint never becomes healthy
Diagnosis:
# Test health endpoint directly
curl -v http://localhost:11434/
# Check Olla logs (merge stderr so grep sees every log line)
docker logs olla 2>&1 | grep -i health
Solutions:
- Verify health check URL is correct
- Increase check_timeout for slow endpoints
- Check if endpoint requires authentication
- Verify network connectivity
Flapping Health Status¶
Issue: Endpoint rapidly switching between healthy/unhealthy
Solutions:
- Increase check_interval to reduce check frequency
- Increase check_timeout for variable latency
- Check endpoint logs for intermittent issues
High Health Check Load¶
Issue: Health checks consuming too many resources
Solutions:
- Increase intervals for stable endpoints
- Use different intervals for different endpoint types
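To see how much the interval matters, the steady-state check rate can be estimated from the configured intervals. This is back-of-the-envelope arithmetic only: the helper name `checks_per_minute` is hypothetical, and the estimate ignores backoff on failing endpoints.

```python
from typing import Iterable


def checks_per_minute(intervals_seconds: Iterable[float]) -> float:
    """Estimate total health checks per minute for a set of check intervals."""
    return sum(60.0 / seconds for seconds in intervals_seconds)


# Ten endpoints checked every 2s versus every 30s.
print(checks_per_minute([2] * 10))   # 300.0
print(checks_per_minute([30] * 10))  # 20.0
```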
False Positives¶
Issue: Endpoint marked healthy but requests fail
Solutions:
- Verify the health check URL actually validates the service
- Add model discovery to validate functionality
Best Practices¶
1. Use Appropriate Health Endpoints¶
Choose health check URLs that validate actual functionality:
- ❌ / - Only checks if server responds
- ✅ /api/tags - Verifies models are available
- ✅ /v1/models - Confirms API is operational
2. Set Realistic Timeouts¶
Balance between quick failure detection and false positives:
# Local endpoints - fast timeout
- url: "http://localhost:11434"
  check_timeout: 1s
# Remote endpoints - allow for network latency
- url: "https://api.example.com"
  check_timeout: 5s
3. Configure Check Intervals¶
Match check frequency to endpoint stability:
# Development - frequent checks
check_interval: 2s
# Production - balanced
check_interval: 10s
# Stable external APIs - less frequent
check_interval: 30s
4. Monitor Health Metrics¶
Track health check performance:
- Success rate should be > 95%
- Check latency should be consistent
- Watch for patterns in failures
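The 95% guideline can be tracked with a simple ratio over a window of recent check results. A minimal sketch, assuming results are recorded as booleans; the helper name `success_rate` is hypothetical and is not an Olla metric.

```python
from typing import Sequence


def success_rate(results: Sequence[bool]) -> float:
    """Return the percentage of passing checks in a window of results."""
    if not results:
        return 0.0
    return 100.0 * sum(1 for ok in results if ok) / len(results)


window = [True] * 97 + [False] * 3
print(success_rate(window))       # 97.0
print(success_rate(window) > 95)  # True
```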
5. Use Priority with Health¶
Combine health checking with priority routing:
endpoints:
  # Primary - check frequently
  - url: "http://primary:11434"
    priority: 100
    check_interval: 5s
  # Backup - check less often
  - url: "http://backup:11434"
    priority: 50
    check_interval: 15s
Advanced Configuration¶
Custom Health Check Headers¶
While Olla doesn't support custom headers in configuration, you can use a reverse proxy:
# nginx configuration
location /health {
    proxy_pass http://backend/health;
    proxy_set_header Authorization "Bearer token";
}
Health Check Scripting¶
For complex health validation, use an external script:
#!/bin/bash
# custom-health-check.sh
# Check if Ollama is running (-f makes curl fail on HTTP error responses)
curl -sf http://localhost:11434/ > /dev/null || exit 1
# Check if specific model is loaded
curl -sf http://localhost:11434/api/tags | grep -q "llama3" || exit 1
# Check disk space: fail if any filesystem is at 90% or more
df -h | grep -qE "(9[0-9]|100)%" && exit 1
exit 0
Run periodically and update Olla configuration based on results.
Integration with Monitoring¶
Olla provides health and status information through its internal endpoints:
- /internal/health - Overall system health
- /internal/status - Detailed status information
- /internal/status/endpoints - Endpoint health details
- /internal/stats/models - Model usage statistics
These can be integrated with external monitoring systems to track:
- Endpoint availability over time
- Health check latency trends
- Failure rates by endpoint
- Circuit breaker state changes
Next Steps¶
- Load Balancing - How health affects routing
- Circuit Breaker - Failure protection details
- Monitoring - Complete monitoring setup