Skip to content

Health Checking

πŸ“ Default Configuration

endpoints:
  - url: "http://localhost:11434"
    check_interval: 30s
    check_timeout: 5s
Supported Settings:

  • check_interval (default: 30s) - Time between health checks
  • check_timeout (default: 5s) - Maximum time to wait for response
  • check_path (auto-detected) - Health check endpoint path

Environment Variables: Per-endpoint settings not supported via env vars

Olla continuously monitors the health of all configured endpoints to ensure requests are only routed to available backends. The health checking system is automatic and requires minimal configuration.

Overview

Health checks serve multiple purposes:

  • Availability Detection: Identify when endpoints come online or go offline
  • Performance Monitoring: Track endpoint latency and response times
  • Intelligent Routing: Ensure requests only go to healthy endpoints
  • Automatic Recovery: Detect when failed endpoints recover

How It Works

Health Check Cycle

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Scheduler │────▢│ Health Check │────▢│Update Statusβ”‚
β”‚  (interval) β”‚     β”‚   Request    β”‚     β”‚   & Route   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β–²                                        β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  1. Scheduler triggers checks based on configured intervals
  2. Health Check sends HTTP request to endpoint's health URL
  3. Status Update marks endpoint as healthy/unhealthy
  4. Route Update adds/removes endpoint from routing pool

Health States

Endpoints can be in one of these states:

State Description Routable Behaviour
Healthy Passing health checks βœ… Yes Normal routing
Degraded Slow but responding βœ… Yes Reduced traffic weight
Recovering Coming back online βœ… Yes Limited test traffic
Unhealthy Failing health checks ❌ No No traffic routed
Unknown Not yet checked ❌ No Awaiting first check

Configuration

Basic Health Check Setup

discovery:
  static:
    endpoints:
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
        health_check_url: "/"        # Health endpoint
        check_interval: 5s           # How often to check
        check_timeout: 2s            # Timeout per check

Health Check URLs by Platform

Different platforms use different health endpoints:

Platform Default Health URL Expected Response
Ollama / 200 with "Ollama is running"
LM Studio /v1/models 200 with model list
vLLM /health 200 with JSON status
OpenAI-compatible /v1/models 200 with model list

Check Intervals

Configure how frequently health checks run:

endpoints:
  - url: "http://localhost:11434"
    check_interval: 5s    # Fast checks for local

  - url: "http://remote:11434"
    check_interval: 30s   # Slower for remote

Recommendations:

  • Local endpoints: 2-5 seconds
  • LAN endpoints: 5-10 seconds
  • Remote/Cloud: 15-30 seconds
  • Critical endpoints: 2-3 seconds

Check Timeouts

Set appropriate timeouts based on endpoint characteristics:

endpoints:
  - url: "http://fast-server:11434"
    check_timeout: 1s     # Fast server

  - url: "http://slow-server:11434"
    check_timeout: 5s     # Allow more time

Adaptive Health Checking

Backoff Strategy

When an endpoint fails, Olla implements exponential backoff:

  1. First failure: Check again after check_interval (no backoff)
  2. Second failure: Wait check_interval * 2
  3. Third failure: Wait check_interval * 4
  4. Fourth failure: Wait check_interval * 8
  5. Max backoff: Capped at check_interval * 12 or 60 seconds (whichever is lower)

This reduces load on failing endpoints while still detecting recovery quickly on the first failure.

Fast Recovery Detection

When an unhealthy endpoint might be recovering:

  1. Half-Open State: Send limited test traffic
  2. Success Threshold: After 2 successful checks, mark healthy
  3. Full Traffic: Resume normal routing

Automatic Model Discovery on Recovery

When an endpoint recovers from an unhealthy state, Olla automatically:

  1. Detects Recovery: Health check transitions from unhealthy to healthy
  2. Triggers Discovery: Automatically initiates model discovery
  3. Updates Catalog: Refreshes the unified model catalog with latest models
  4. Resumes Routing: Endpoint is immediately available for request routing

This ensures the model catalog stays up-to-date even if models were added/removed while the endpoint was down.

Health Check Types

HTTP GET Health Checks

The default health check method:

endpoints:
  - url: "http://localhost:11434"
    health_check_url: "/"
    # Sends: GET http://localhost:11434/
    # Expects: 200-299 status code

Model Discovery Health Checks

For endpoints that support model listing:

endpoints:
  - url: "http://localhost:11434"
    type: "ollama"
    model_url: "/api/tags"
    # Health check also validates model availability

Connection Failure Handling

Automatic Retry on Connection Failures

When a request fails due to connection issues, Olla automatically:

  1. Detects Failure: Identifies connection refused, reset, or timeout errors
  2. Marks Unhealthy: Immediately updates endpoint status to unhealthy
  3. Retries Request: Automatically tries the next available healthy endpoint
  4. Updates Health: Triggers exponential backoff for failed endpoint

This happens transparently without dropping the user request. The retry behaviour is automatic and built-in as of v0.0.16.

Connection errors that trigger automatic retry: - Connection Refused: Backend service is down - Connection Reset: Backend crashed or restarted - Connection Timeout: Backend is overloaded - Network Unreachable: Network connectivity issues

Circuit Breaker Integration

Health checks work with the circuit breaker to prevent cascade failures:

Circuit States

     Closed (Normal)
          β”‚
          β”œβ”€β”€β”€ 3 failures ──▢ Open (No Traffic)
          β”‚                        β”‚
          β”‚                        β”‚ 30s timeout
          β”‚                        β–Ό
          └──── 2 successes ◀── Half-Open (Test Traffic)
  • Closed: Normal operation, all requests pass through
  • Open: Endpoint marked unhealthy, no requests sent
  • Half-Open: Testing recovery with limited requests

Circuit Breaker Behaviour

The circuit breaker activates after consecutive failures:

  1. Failure Threshold: 3 failures (health checker) or 5 failures (Olla proxy engine)
  2. Open Duration: Circuit stays open for 30 seconds
  3. Half-Open Test: Allows one test request through
  4. Recovery: First successful request closes the circuit

Monitoring Health Status

Health Status Endpoint

Check overall system health:

curl http://localhost:40114/internal/health

Response:

{
  "status": "healthy",
  "endpoints": {
    "healthy": 3,
    "unhealthy": 1,
    "total": 4
  },
  "uptime": "2h15m",
  "version": "1.0.0"
}

Endpoint Status

View detailed endpoint health:

curl http://localhost:40114/internal/status/endpoints

Response:

{
  "endpoints": [
    {
      "name": "local-ollama",
      "url": "http://localhost:11434",
      "status": "healthy",
      "last_check": "2024-01-15T10:30:45Z",
      "last_latency": "15ms",
      "consecutive_failures": 0,
      "uptime_percentage": 99.9
    },
    {
      "name": "remote-ollama",
      "status": "unhealthy",
      "last_check": "2024-01-15T10:30:40Z",
      "consecutive_failures": 6,
      "error": "connection timeout"
    }
  ]
}

Model Statistics

Monitor model performance across endpoints:

curl http://localhost:40114/internal/stats/models

Metrics include:

  • Request counts per model
  • Model availability across endpoints
  • Average check latency
  • Endpoints by status

Troubleshooting

Endpoint Always Unhealthy

Issue: Endpoint never becomes healthy

Diagnosis:

# Test health endpoint directly
curl -v http://localhost:11434/

# Check Olla logs
docker logs olla | grep health

Solutions:

  1. Verify health check URL is correct
  2. Increase check_timeout for slow endpoints
  3. Check if endpoint requires authentication
  4. Verify network connectivity

Flapping Health Status

Issue: Endpoint rapidly switching between healthy/unhealthy

Solutions:

  1. Increase check_interval to reduce check frequency:

    check_interval: 10s  # From 2s
    

  2. Increase check_timeout for variable latency:

    check_timeout: 5s    # From 1s
    

  3. Check endpoint logs for intermittent issues

High Health Check Load

Issue: Health checks consuming too many resources

Solutions:

  1. Increase intervals for stable endpoints:

    check_interval: 30s  # For very stable endpoints
    

  2. Use different intervals for different endpoint types:

    # Critical, local
    - url: "http://localhost:11434"
      check_interval: 5s
    
    # Stable, remote  
    - url: "http://remote:11434"
      check_interval: 60s
    

False Positives

Issue: Endpoint marked healthy but requests fail

Solutions:

  1. Verify health check URL actually validates service:

    # Bad: Just checks if port is open
    health_check_url: "/"
    
    # Good: Checks if models are loaded
    health_check_url: "/api/tags"
    

  2. Add model discovery to validate functionality:

    model_url: "/api/tags"
    # This ensures models are actually available
    

Best Practices

1. Use Appropriate Health Endpoints

Choose health check URLs that validate actual functionality:

  • ❌ / - Only checks if server responds
  • βœ… /api/tags - Verifies models are available
  • βœ… /v1/models - Confirms API is operational

2. Set Realistic Timeouts

Balance between quick failure detection and false positives:

# Local endpoints - fast timeout
- url: "http://localhost:11434"
  check_timeout: 1s

# Remote endpoints - allow for network latency
- url: "https://api.example.com"
  check_timeout: 5s

3. Configure Check Intervals

Match check frequency to endpoint stability:

# Development - frequent checks
check_interval: 2s

# Production - balanced
check_interval: 10s

# Stable external APIs - less frequent
check_interval: 30s

4. Monitor Health Metrics

Track health check performance:

  • Success rate should be > 95%
  • Check latency should be consistent
  • Watch for patterns in failures

5. Use Priority with Health

Combine health checking with priority routing:

endpoints:
  # Primary - check frequently
  - url: "http://primary:11434"
    priority: 100
    check_interval: 5s

  # Backup - check less often
  - url: "http://backup:11434"
    priority: 50
    check_interval: 15s

Advanced Configuration

Custom Health Check Headers

While Olla doesn't support custom headers in configuration, you can use a reverse proxy:

# nginx configuration
location /health {
    proxy_pass http://backend/health;
    proxy_set_header Authorization "Bearer token";
}

Health Check Scripting

For complex health validation, use an external script:

#!/bin/bash
# custom-health-check.sh

# Check if Ollama is running
curl -s http://localhost:11434/ > /dev/null || exit 1

# Check if specific model is loaded
curl -s http://localhost:11434/api/tags | grep -q "llama3" || exit 1

# Check disk space
df -h | grep -q "9[0-9]%" && exit 1

exit 0

Run periodically and update Olla configuration based on results.

Integration with Monitoring

Olla provides health and status information through its internal endpoints:

  • /internal/health - Overall system health
  • /internal/status - Detailed status information
  • /internal/status/endpoints - Endpoint health details
  • /internal/stats/models - Model usage statistics

These can be integrated with external monitoring systems to track:

  1. Endpoint availability over time
  2. Health check latency trends
  3. Failure rates by endpoint
  4. Circuit breaker state changes

Next Steps