Health Checking

Default Configuration
```
endpoints:
  - url: "http://localhost:11434"
    check_interval: 30s
    check_timeout: 5s
```
Supported Settings:

check_interval (default: 30s) - Time between health checks

check_timeout (default: 5s) - Maximum time to wait for response

check_path (auto-detected) - Health check endpoint path

Environment Variables: per-endpoint settings cannot be set through environment variables

Olla continuously monitors the health of all configured endpoints to ensure requests are only routed to available backends. The health checking system is automatic and requires minimal configuration.

Overview

Health checks serve multiple purposes:

Availability Detection: Identify when endpoints come online or go offline
Performance Monitoring: Track endpoint latency and response times
Intelligent Routing: Ensure requests only go to healthy endpoints
Automatic Recovery: Detect when failed endpoints recover

How It Works

Health Check Cycle

```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Scheduler │────▶│ Health Check │────▶│Update Status│
│  (interval) │     │   Request    │     │   & Route   │
└─────────────┘     └──────────────┘     └─────────────┘
        ▲                                        │
        └────────────────────────────────────────┘
```

Scheduler triggers checks based on configured intervals
Health Check sends HTTP request to endpoint's health URL
Status Update marks endpoint as healthy/unhealthy
Route Update adds/removes endpoint from routing pool
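The cycle above can be approximated by a short shell loop. This is a hypothetical sketch, not Olla's implementation (Olla runs its checks internally); the function names `probe`, `check_endpoint` and `run_checks` are invented for illustration, and the associative array assumes bash 4 or later. A connection failure shows up as curl's `000` status code and is treated like any other non-2xx response.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the health check cycle; not Olla's actual code.

declare -A STATUS   # endpoint URL -> "healthy" or "unhealthy"

# Health Check: send one GET and print the HTTP status code.
# curl prints "000" when no HTTP response arrives (refused, timeout, ...).
probe() {
  local url=$1 path=$2 timeout=$3
  curl -s -o /dev/null -w '%{http_code}' --max-time "$timeout" "${url}${path}"
}

# Status Update: any 2xx response counts as healthy.
check_endpoint() {
  local url=$1 path=$2 timeout=$3 code
  code=$(probe "$url" "$path" "$timeout")
  if [[ $code == 2[0-9][0-9] ]]; then
    STATUS[$url]=healthy
  else
    STATUS[$url]=unhealthy
  fi
}

# Scheduler: repeat the check every interval seconds until interrupted.
run_checks() {
  local url=$1 path=$2 interval=$3 timeout=$4
  while true; do
    check_endpoint "$url" "$path" "$timeout"
    # Route Update would add or remove the endpoint from the pool here.
    echo "$url is ${STATUS[$url]}"
    sleep "$interval"
  done
}
```

For example, `run_checks "http://localhost:11434" "/" 5 2` probes the local Ollama root every 5 seconds with a 2-second timeout and prints the result of each pass.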

Health States

Endpoints can be in one of these states:

| State | Description | Routable | Behaviour |
|-------|-------------|----------|-----------|
| Healthy | Passing health checks | ✅ Yes | Normal routing |
| Degraded | Slow but responding | ✅ Yes | Reduced traffic weight |
| Recovering | Coming back online | ✅ Yes | Limited test traffic |
| Unhealthy | Failing health checks | ❌ No | No traffic routed |
| Unknown | Not yet checked | ❌ No | Awaiting first check |
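The Routable column can be read as a simple filter over endpoint states. The sketch below is illustrative only; the lowercase state names and the helpers `is_routable` and `routable_endpoints` are assumptions, not part of Olla's interface.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the "Routable" column of the table above.

# is_routable STATE -> exit status 0 if the endpoint may receive traffic.
is_routable() {
  case $1 in
    healthy|degraded|recovering) return 0 ;;
    *) return 1 ;;   # unhealthy, unknown, or anything unrecognised
  esac
}

# Read "name state" lines on stdin and print the names that are routable.
routable_endpoints() {
  local name state
  while read -r name state; do
    if is_routable "$state"; then
      echo "$name"
    fi
  done
}
```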

Configuration

Basic Health Check Setup

```
discovery:
  static:
    endpoints:
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
        health_check_url: "/"        # Health endpoint
        check_interval: 5s           # How often to check
        check_timeout: 2s            # Timeout per check
```

Health Check URLs by Platform

Different platforms use different health endpoints:

| Platform | Default Health URL | Expected Response |
|----------|--------------------|-------------------|
| Ollama | `/` | 200 with "Ollama is running" |
| LM Studio | `/v1/models` | 200 with model list |
| vLLM | `/health` | 200 with JSON status |
| OpenAI-compatible | `/v1/models` | 200 with model list |
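As a sketch, the defaults in this table amount to a lookup on the endpoint type. Only `ollama` appears as a `type` value elsewhere on this page; the other type strings and the function name `default_health_path` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical lookup of the default health path per platform, based on
# the table above. Type strings other than "ollama" are assumed.
default_health_path() {
  case $1 in
    ollama)           echo "/" ;;
    lm-studio|openai) echo "/v1/models" ;;   # model list doubles as health
    vllm)             echo "/health" ;;
    *)                return 1 ;;            # unknown type: no default
  esac
}
```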

Check Intervals

Configure how frequently health checks run:

```
endpoints:
  - url: "http://localhost:11434"
    check_interval: 5s    # Fast checks for local

  - url: "http://remote:11434"
    check_interval: 30s   # Slower for remote
```

Recommendations:

Local endpoints: 2-5 seconds
LAN endpoints: 5-10 seconds
Remote/Cloud: 15-30 seconds
Critical endpoints: 2-3 seconds

Check Timeouts

Set appropriate timeouts based on endpoint characteristics:

```
endpoints:
  - url: "http://fast-server:11434"
    check_timeout: 1s     # Fast server

  - url: "http://slow-server:11434"
    check_timeout: 5s     # Allow more time
```

Adaptive Health Checking

Backoff Strategy

When an endpoint fails, Olla implements exponential backoff:

First failure: Check again after check_interval (no backoff)
Second failure: Wait check_interval * 2
Third failure: Wait check_interval * 4
Fourth failure: Wait check_interval * 8
Max backoff: Capped at check_interval * 12 or 60 seconds (whichever is lower)

This reduces load on failing endpoints, while the first retry still comes after a single check_interval so that brief outages are detected quickly.
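One reading of the schedule above, written as a shell function. It is a sketch rather than Olla's source: `backoff_delay` is a hypothetical name, and it assumes the cap applies only once backoff has started (from the second failure on).

```shell
#!/usr/bin/env bash
# Sketch of the backoff schedule described above (not Olla's code).
# Usage: backoff_delay CHECK_INTERVAL_SECONDS CONSECUTIVE_FAILURES
# Prints the delay in seconds before the next check.
backoff_delay() {
  local interval=$1 failures=$2
  # Cap: the lower of check_interval * 12 and 60 seconds.
  local cap=$(( interval * 12 < 60 ? interval * 12 : 60 ))
  local delay=$interval i

  # First failure: no backoff. Each further failure doubles the delay.
  for (( i = 1; i < failures; i++ )); do
    delay=$(( delay * 2 ))
    (( delay >= cap )) && break   # stop early so large counts cannot overflow
  done

  if (( failures > 1 && delay > cap )); then
    delay=$cap
  fi
  echo "$delay"
}
```

With `check_interval: 5s` this yields 5, 10, 20 and 40 seconds for the first four failures, then stays at the 60-second cap.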

Fast Recovery Detection

When an unhealthy endpoint might be recovering:

Half-Open State: Send limited test traffic
Success Threshold: After 2 successful checks, mark healthy
Full Traffic: Resume normal routing
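The success threshold can be sketched as a counter of consecutive passing checks. This is illustrative only; the variable and function names are hypothetical, and what happens on a failed check during recovery is not specified above, so the reset-to-unhealthy branch is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the recovery rule above; not Olla's code.
# A recovering (half-open) endpoint becomes healthy only after
# SUCCESS_THRESHOLD consecutive successful checks.
SUCCESS_THRESHOLD=2
state=recovering
successes=0

# record_check ok|fail : update the state after one health check.
record_check() {
  if [[ $1 == ok ]]; then
    successes=$(( successes + 1 ))
    if (( successes >= SUCCESS_THRESHOLD )); then
      state=healthy        # resume full traffic
    fi
  else
    # Assumption: a failure during recovery resets the count and
    # sends the endpoint back to unhealthy (and into backoff).
    successes=0
    state=unhealthy
  fi
}
```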

Automatic Model Discovery on Recovery

When an endpoint recovers from an unhealthy state, Olla automatically:

Detects Recovery: Health check transitions from unhealthy to healthy
Triggers Discovery: Automatically initiates model discovery
Updates Catalog: Refreshes the unified model catalog with latest models
Resumes Routing: Endpoint is immediately available for request routing

This keeps the model catalog up to date even if models were added or removed while the endpoint was down.
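The trigger condition is a single state transition. The hypothetical helpers below only illustrate the decision, since Olla performs discovery itself; `/api/tags` is Ollama's model-listing path (the same `model_url` used elsewhere on this page), while the function names are invented.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the recovery hook; Olla does this internally.

# needs_discovery PREVIOUS_STATE NEW_STATE -> true only on unhealthy -> healthy.
needs_discovery() {
  [[ $1 == unhealthy && $2 == healthy ]]
}

# on_status_change URL PREVIOUS_STATE NEW_STATE
on_status_change() {
  local url=$1 prev=$2 new=$3
  if needs_discovery "$prev" "$new"; then
    # Refresh the model list from the recovered Ollama endpoint.
    curl -s --max-time 5 "${url}/api/tags"
  fi
}
```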

Health Check Types

HTTP GET Health Checks

The default health check method:

```
endpoints:
  - url: "http://localhost:11434"
    health_check_url: "/"
    # Sends: GET http://localhost:11434/
    # Expects: 200-299 status code
```

Model Discovery Health Checks

For endpoints that support model listing:

```
endpoints:
  - url: "http://localhost:11434"
    type: "ollama"
    model_url: "/api/tags"
    # Health check also validates model availability
```

Connection Failure Handling

Automatic Retry on Connection Failures

When a request fails due to connection issues, Olla automatically:

Detects Failure: Identifies connection refused, reset, or timeout errors
Marks Unhealthy: Immediately updates endpoint status to unhealthy
Retries Request: Automatically tries the next available healthy endpoint
Updates Health: Triggers exponential backoff for the failed endpoint

This happens transparently, without dropping the user request. The retry behaviour is automatic and built in as of v0.0.16.

Connection errors that trigger automatic retry:

Connection Refused: Backend service is down
Connection Reset: Backend crashed or restarted
Connection Timeout: Backend is overloaded
Network Unreachable: Network connectivity issues
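From a client's point of view, the retry behaviour resembles the loop below. This sketch maps the listed error classes onto curl exit codes (7: failed to connect, which covers refused and unreachable hosts; 28: timeout; 56: failure receiving data, such as a reset connection). It is not how Olla's proxy is written, `send_with_retry` is a hypothetical name, and returning other errors immediately instead of retrying them is an assumption.

```shell
#!/usr/bin/env bash
# Client-side sketch of "retry on the next endpoint after a connection error".

# is_connection_error CURL_EXIT_CODE -> true for connection-level failures.
is_connection_error() {
  case $1 in
    7|28|56) return 0 ;;   # connect failed, timed out, receive failed/reset
    *) return 1 ;;
  esac
}

# send_with_retry PATH ENDPOINT... : try each endpoint in order.
send_with_retry() {
  local path=$1; shift
  local endpoint rc
  for endpoint in "$@"; do
    curl -sS --max-time 30 "${endpoint}${path}"
    rc=$?
    if (( rc == 0 )); then
      return 0
    fi
    if is_connection_error "$rc"; then
      echo "connection error ($rc) on $endpoint, trying next" >&2
      continue
    fi
    return "$rc"           # other errors are not retried
  done
  return 1                 # every endpoint failed
}
```

`send_with_retry /api/tags http://localhost:11434 http://remote:11434` tries the local endpoint first and falls back to the remote one only on a connection error.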

Circuit Breaker Integration

Health checks work with the circuit breaker to prevent cascade failures:

Circuit States

```
     Closed (Normal)
          │
          ├─── 3 failures ──▶ Open (No Traffic)
          │                        │
          │                        │ 30s timeout
          │                        ▼
          └──── 2 successes ◀── Half-Open (Test Traffic)
```

Closed: Normal operation, all requests pass through
Open: Endpoint marked unhealthy, no requests sent
Half-Open: Testing recovery with limited requests

Circuit Breaker Behaviour

The circuit breaker activates after consecutive failures:

Failure Threshold: 3 failures (health checker) or 5 failures (Olla proxy engine)
Open Duration: Circuit stays open for 30 seconds
Half-Open Test: Allows one test request through
Recovery: First successful request closes the circuit
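The behaviour listed above can be modelled as a small state machine. The sketch follows the values on this page (3 failures to open, 30 seconds open, one test request, first success closes) and takes the current time as an argument so it can be exercised without waiting. It is hypothetical: the `cb_*` names are invented, and it does not model the proxy engine's threshold of 5.

```shell
#!/usr/bin/env bash
# Hypothetical circuit breaker sketch; not Olla's implementation.
# Times are epoch seconds passed in by the caller.
FAILURE_THRESHOLD=3
OPEN_SECONDS=30
cb_state=closed
cb_failures=0
cb_opened_at=0

# cb_allow NOW -> exit status 0 if a request may be sent at time NOW.
cb_allow() {
  local now=$1
  case $cb_state in
    closed) return 0 ;;
    open)
      if (( now - cb_opened_at >= OPEN_SECONDS )); then
        cb_state=half-open   # let a single test request through
        return 0
      fi
      return 1 ;;
    half-open) return 1 ;;   # the test request is already in flight
  esac
}

# cb_record ok|fail NOW -> update the breaker after a request.
cb_record() {
  local result=$1 now=$2
  if [[ $result == ok ]]; then
    cb_state=closed          # first success closes the circuit
    cb_failures=0
  else
    cb_failures=$(( cb_failures + 1 ))
    if [[ $cb_state == half-open ]] || (( cb_failures >= FAILURE_THRESHOLD )); then
      cb_state=open
      cb_opened_at=$now
    fi
  fi
}
```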

Monitoring Health Status

Health Status Endpoint

Check overall system health:

```
curl http://localhost:40114/internal/health
```

Response:

```
{
  "status": "healthy",
  "endpoints": {
    "healthy": 3,
    "unhealthy": 1,
    "total": 4
  },
  "uptime": "2h15m",
  "version": "1.0.0"
}
```

Endpoint Status

View detailed endpoint health:

```
curl http://localhost:40114/internal/status/endpoints
```

Response:

```
{
  "endpoints": [
    {
      "name": "local-ollama",
      "url": "http://localhost:11434",
      "status": "healthy",
      "last_check": "2024-01-15T10:30:45Z",
      "last_latency": "15ms",
      "consecutive_failures": 0,
      "uptime_percentage": 99.9
    },
    {
      "name": "remote-ollama",
      "status": "unhealthy",
      "last_check": "2024-01-15T10:30:40Z",
      "consecutive_failures": 6,
      "error": "connection timeout"
    }
  ]
}
```

Model Statistics

Monitor model performance across endpoints:

```
curl http://localhost:40114/internal/stats/models
```

Metrics include:

Request counts per model
Model availability across endpoints
Average check latency
Endpoints by status

Troubleshooting

Endpoint Always Unhealthy

Issue: Endpoint never becomes healthy

Diagnosis:

```
# Test health endpoint directly
curl -v http://localhost:11434/

# Check Olla logs (docker logs writes the container's stderr to stderr, so merge it before grep)
docker logs olla 2>&1 | grep -i health
```

Solutions:

Verify health check URL is correct
Increase check_timeout for slow endpoints
Check if endpoint requires authentication
Verify network connectivity

Flapping Health Status

Issue: Endpoint rapidly switching between healthy/unhealthy

Solutions:

Increase check_interval to reduce check frequency:
```
check_interval: 10s  # From 2s
```
Increase check_timeout for variable latency:
```
check_timeout: 5s    # From 1s
```
Check endpoint logs for intermittent issues

High Health Check Load

Issue: Health checks consuming too many resources

Solutions:

Increase intervals for stable endpoints:

```
check_interval: 30s  # For very stable endpoints
```

Use different intervals for different endpoint types:

```
# Critical, local
- url: "http://localhost:11434"
  check_interval: 5s

# Stable, remote
- url: "http://remote:11434"
  check_interval: 60s
```

False Positives

Issue: Endpoint marked healthy but requests fail

Solutions:

Verify that the health check URL actually exercises the service:

```
# Bad: only checks that the server answers HTTP requests
health_check_url: "/"

# Good: checks that the model-listing API responds
health_check_url: "/api/tags"
```

Add model discovery to validate functionality:

```
model_url: "/api/tags"
# This ensures models are actually available
```

Best Practices

1. Use Appropriate Health Endpoints

Choose health check URLs that validate actual functionality:

❌ / - Only checks if server responds
✅ /api/tags - Verifies models are available
✅ /v1/models - Confirms API is operational

2. Set Realistic Timeouts

Balance between quick failure detection and false positives:

```
# Local endpoints - fast timeout
- url: "http://localhost:11434"
  check_timeout: 1s

# Remote endpoints - allow for network latency
- url: "https://api.example.com"
  check_timeout: 5s
```

3. Configure Check Intervals

Match check frequency to endpoint stability:

```
# Development - frequent checks
check_interval: 2s

# Production - balanced
check_interval: 10s

# Stable external APIs - less frequent
check_interval: 30s
```

4. Monitor Health Metrics

Track health check performance:

Success rate should be > 95%
Check latency should be consistent
Watch for patterns in failures
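The success-rate guideline reduces to a comparison that a monitoring script can run. This sketch uses integer arithmetic to avoid floating point in shell; where the success and total counts come from is left open, and `success_rate_ok` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check for the "> 95% success rate" guideline above.
# success/total > 95/100  is rewritten as  success*100 > total*95
# so that only integer arithmetic is needed.
success_rate_ok() {
  local success=$1 total=$2
  (( total > 0 && success * 100 > total * 95 ))
}
```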

5. Use Priority with Health

Combine health checking with priority routing:

```
endpoints:
  # Primary - check frequently
  - url: "http://primary:11434"
    priority: 100
    check_interval: 5s

  # Backup - check less often
  - url: "http://backup:11434"
    priority: 50
    check_interval: 15s
```

Advanced Configuration

Custom Health Check Headers

Olla does not support custom headers in its configuration, but you can place a reverse proxy in front of an endpoint that requires them:

```
# nginx configuration
location /health {
    proxy_pass http://backend/health;
    proxy_set_header Authorization "Bearer token";
}
```

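Health Check Scripting

For complex health validation, use an external script:

```
#!/bin/bash
# custom-health-check.sh
# Exits 0 when every check passes, 1 otherwise.

# Check if Ollama is running (-f fails on HTTP errors, --max-time avoids hanging)
curl -sf --max-time 5 http://localhost:11434/ > /dev/null || exit 1

# Check if a specific model is installed (/api/tags lists local models)
curl -s --max-time 5 http://localhost:11434/api/tags | grep -q "llama3" || exit 1

# Check disk space on / (adjust the path to the filesystem holding your models);
# fail at 90% or more, including 100%, which the old "9[0-9]%" pattern missed
usage=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
[ "$usage" -ge 90 ] && exit 1

exit 0
```

Run it periodically (for example from cron) and update the Olla configuration based on the result.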

Integration with Monitoring

Olla provides health and status information through its internal endpoints:

/internal/health - Overall system health
/internal/status - Detailed status information
/internal/status/endpoints - Endpoint health details
/internal/stats/models - Model usage statistics

These can be integrated with external monitoring systems to track:

Endpoint availability over time
Health check latency trends
Failure rates by endpoint
Circuit breaker state changes

Next Steps

Load Balancing - How health affects routing
Circuit Breaker - Failure protection details
Monitoring - Complete monitoring setup