System Endpoints

Internal endpoints for health monitoring, system status, and process information.

Endpoints Overview

Method	URI	Description
GET	`/version`	Get Olla version information
GET	`/internal/health`	Health check endpoint
GET	`/internal/status`	System status and statistics
GET	`/internal/status/endpoints`	Detailed endpoint status
GET	`/internal/status/models`	Model registry status
GET	`/internal/stats/models`	Model usage statistics
GET	`/internal/process`	Process information and metrics
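The paths in the table above can be wrapped in a small client helper. The following is a minimal sketch, not part of Olla itself: the names `SYSTEM_PATHS`, `build_url`, and `fetch_json` are hypothetical, and the base URL `http://localhost:40114` is taken from the curl examples on this page.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Paths copied from the overview table; the tuple name is hypothetical.
SYSTEM_PATHS = (
    "/version",
    "/internal/health",
    "/internal/status",
    "/internal/status/endpoints",
    "/internal/status/models",
    "/internal/stats/models",
    "/internal/process",
)


def build_url(base_url: str, path: str) -> str:
    """Join a base URL with one of the documented system paths."""
    if path not in SYSTEM_PATHS:
        raise ValueError(f"not a documented system endpoint: {path}")
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def fetch_json(base_url: str, path: str, timeout: float = 5.0) -> dict:
    """Send a GET request to a system endpoint and decode the JSON body."""
    with urlopen(build_url(base_url, path), timeout=timeout) as resp:
        return json.load(resp)
```

With a running instance, `fetch_json("http://localhost:40114", "/internal/health")` would return the decoded health payload as a dictionary.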

GET /version

Get version information about the running Olla instance.

Request

curl -X GET http://localhost:40114/version

Response

{
  "version": "0.1.0",
  "build": {
    "version": "0.1.0",
    "commit": "abc123def",
    "date": "2024-01-15",
    "go_version": "go1.24.0"
  }
}
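A deployment script might use this payload to confirm that a minimum version is running. The sketch below assumes the `version` field is a plain dotted numeric string such as `0.1.0` (as in the sample response); prefixes like `v` or suffixes like `-rc1` are not handled. The function names are hypothetical.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Convert a dotted numeric version such as "0.1.0" into (0, 1, 0)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(payload: dict, minimum: str) -> bool:
    """Return True if the reported version is at least `minimum`."""
    return version_tuple(payload["version"]) >= version_tuple(minimum)


# Sample payload trimmed from the response shown above.
SAMPLE_VERSION = {"version": "0.1.0", "build": {"version": "0.1.0", "commit": "abc123def"}}
```

Comparing tuples rather than raw strings keeps `0.10.0` ordered after `0.9.0`.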

GET /internal/health

Health check endpoint for monitoring Olla's availability and backend connectivity.

Request

curl -X GET http://localhost:40114/internal/health

Response

{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "uptime": "2h30m15s",
  "endpoints": [
    {
      "name": "local-ollama",
      "url": "http://localhost:11434",
      "status": "healthy",
      "latency": "1.2ms",
      "last_check": "2024-01-15T10:29:45Z"
    },
    {
      "name": "local-lm-studio",
      "url": "http://localhost:11234",
      "status": "healthy",
      "latency": "0.8ms",
      "last_check": "2024-01-15T10:29:45Z"
    }
  ]
}

Response Fields

Field	Type	Description
`status`	string	Overall health status (healthy/degraded/unhealthy)
`timestamp`	string	Current timestamp in RFC 3339 format
`uptime`	string	Time since Olla started
`endpoints`	array	Status of each configured endpoint
`endpoints[].name`	string	Endpoint identifier
`endpoints[].url`	string	Backend URL
`endpoints[].status`	string	Endpoint health (healthy/unhealthy)
`endpoints[].latency`	string	Last health check latency
`endpoints[].last_check`	string	Timestamp of last health check
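A monitoring probe could reduce the health payload to an exit code plus a list of problem endpoints. The sketch below rests on several assumptions: the `uptime` and `latency` strings follow Go-style duration syntax (as the sample values suggest), the exit codes follow the common Nagios-style convention rather than anything defined by Olla, and the `max_latency_s` threshold is an arbitrary illustration.

```python
import re

# Nagios-style exit codes (an assumed convention, not defined by Olla).
EXIT_CODES = {"healthy": 0, "degraded": 1, "unhealthy": 2}
UNKNOWN_EXIT_CODE = 3

_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
                 "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Parse a duration such as "2h30m15s" or "1.2ms" into seconds."""
    pos, total = 0, 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:  # reject gaps between components
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # reject empty input and trailing junk
        raise ValueError(f"invalid duration: {text!r}")
    return total


def summarize_health(health: dict, max_latency_s: float = 0.5) -> dict:
    """Reduce a /internal/health payload to the values a probe cares about."""
    endpoints = health.get("endpoints", [])
    return {
        "exit_code": EXIT_CODES.get(health.get("status"), UNKNOWN_EXIT_CODE),
        "unhealthy": [ep["name"] for ep in endpoints if ep.get("status") != "healthy"],
        "slow": [ep["name"] for ep in endpoints
                 if parse_duration(ep["latency"]) > max_latency_s],
        "uptime_s": parse_duration(health["uptime"]),
    }


# Sample payload trimmed from the response shown above.
SAMPLE_HEALTH = {
    "status": "healthy",
    "uptime": "2h30m15s",
    "endpoints": [
        {"name": "local-ollama", "status": "healthy", "latency": "1.2ms"},
        {"name": "local-lm-studio", "status": "healthy", "latency": "0.8ms"},
    ],
}
```

For the sample payload, both problem lists are empty and the exit code is 0.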

GET /internal/status

Detailed system status including statistics, configuration, and model information.

Request

curl -X GET http://localhost:40114/internal/status

Response

{
  "version": "0.1.0",
  "build": {
    "version": "0.1.0",
    "commit": "abc123def",
    "date": "2024-01-15",
    "go_version": "go1.24.0"
  },
  "config": {
    "proxy_engine": "sherpa",
    "load_balancer": "least-connections",
    "endpoints_configured": 2,
    "models_discovered": 5
  },
  "statistics": {
    "requests_total": 1523,
    "requests_active": 3,
    "requests_failed": 12,
    "average_latency": "125ms",
    "p95_latency": "450ms",
    "p99_latency": "850ms"
  },
  "endpoints": {
    "local-ollama": {
      "healthy": true,
      "models": 3,
      "requests": 1200,
      "errors": 5,
      "average_latency": "120ms"
    },
    "local-lm-studio": {
      "healthy": true,
      "models": 2,
      "requests": 323,
      "errors": 7,
      "average_latency": "135ms"
    }
  },
  "models": {
    "total": 5,
    "by_provider": {
      "ollama": ["llama3.2:latest", "mistral:latest", "codellama:latest"],
      "lm-studio": ["phi-3-mini", "gemma-2b"]
    }
  }
}

Response Fields

Field	Type	Description
`version`	string	Olla version
`build`	object	Build information
`config`	object	Current configuration
`statistics`	object	Request statistics
`endpoints`	object	Per-endpoint statistics
`models`	object	Model information
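The `statistics` and `endpoints` objects carry enough information to derive error rates. The following sketch assumes the field names shown in the sample response (`requests_total`, `requests_failed`, and per-endpoint `requests` and `errors`); the helper names are hypothetical.

```python
def error_rate(requests: int, errors: int) -> float:
    """Return errors as a fraction of requests, or 0.0 when there were none."""
    return errors / requests if requests else 0.0


def summarize_status(status: dict) -> dict:
    """Compute overall and per-endpoint error rates from /internal/status."""
    stats = status["statistics"]
    per_endpoint = {
        name: error_rate(ep["requests"], ep["errors"])
        for name, ep in status["endpoints"].items()
    }
    return {
        "overall": error_rate(stats["requests_total"], stats["requests_failed"]),
        "per_endpoint": per_endpoint,
        # Endpoint with the highest error rate, or None if there are none.
        "worst": max(per_endpoint, key=per_endpoint.get) if per_endpoint else None,
    }


# Sample payload trimmed from the response shown above.
SAMPLE_STATUS = {
    "statistics": {"requests_total": 1523, "requests_failed": 12},
    "endpoints": {
        "local-ollama": {"requests": 1200, "errors": 5},
        "local-lm-studio": {"requests": 323, "errors": 7},
    },
}
```

In the sample data, `local-lm-studio` handles fewer requests but has the higher error rate (7 of 323 versus 5 of 1200), so it is reported as `worst`.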

GET /internal/process

Process information and resource metrics.

Request

curl -X GET http://localhost:40114/internal/process

Response

{
  "pid": 12345,
  "started_at": "2024-01-15T08:00:00Z",
  "uptime_seconds": 9015,
  "memory": {
    "alloc_mb": 45.2,
    "total_alloc_mb": 512.8,
    "sys_mb": 72.3,
    "heap_alloc_mb": 45.2,
    "heap_objects": 125432,
    "gc_runs": 42,
    "gc_pause_ms": 0.125
  },
  "cpu": {
    "goroutines": 28,
    "threads": 12,
    "cpu_percent": 2.5
  },
  "connections": {
    "active": 3,
    "idle": 12,
    "total_created": 1523
  },
  "runtime": {
    "go_version": "go1.24.0",
    "os": "linux",
    "arch": "amd64",
    "max_procs": 8
  }
}

Response Fields

Field	Type	Description
`pid`	integer	Process ID
`started_at`	string	Process start time
`uptime_seconds`	integer	Uptime in seconds
`memory`	object	Memory statistics
`memory.alloc_mb`	float	Currently allocated memory, in MB
`memory.total_alloc_mb`	float	Cumulative memory allocated since the process started, in MB
`memory.sys_mb`	float	Memory obtained from the operating system, in MB
`memory.heap_alloc_mb`	float	Heap allocated memory, in MB
`memory.heap_objects`	integer	Number of heap objects
`memory.gc_runs`	integer	Number of GC runs
`memory.gc_pause_ms`	float	Last GC pause duration, in milliseconds
`cpu`	object	CPU statistics
`cpu.goroutines`	integer	Active goroutines
`cpu.threads`	integer	OS threads
`cpu.cpu_percent`	float	CPU usage percentage
`connections`	object	Connection pool stats
`runtime`	object	Runtime information
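These metrics lend themselves to simple threshold checks. The sketch below is illustrative only: the function name and the default thresholds (`max_heap_mb`, `max_goroutines`) are hypothetical and should be tuned to the deployment.

```python
def process_warnings(proc: dict,
                     max_heap_mb: float = 512.0,
                     max_goroutines: int = 1000) -> list[str]:
    """Return human-readable warnings for metrics above the given thresholds."""
    warnings = []
    heap = proc["memory"]["heap_alloc_mb"]
    if heap > max_heap_mb:
        warnings.append(f"heap_alloc_mb {heap} exceeds {max_heap_mb}")
    goroutines = proc["cpu"]["goroutines"]
    if goroutines > max_goroutines:
        warnings.append(f"goroutines {goroutines} exceeds {max_goroutines}")
    return warnings


# Sample payload trimmed from the response shown above.
SAMPLE_PROCESS = {
    "pid": 12345,
    "uptime_seconds": 9015,
    "memory": {"alloc_mb": 45.2, "heap_alloc_mb": 45.2},
    "cpu": {"goroutines": 28, "threads": 12, "cpu_percent": 2.5},
}
```

With the default thresholds the sample payload produces no warnings; lowering `max_goroutines` to 10 would flag the 28 active goroutines.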

Rate Limits

System endpoints have elevated rate limits: 1000 requests per minute, with a burst size of 50 requests.

This ensures monitoring systems can poll frequently without being rate-limited.
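A polling client can throttle itself to stay inside these limits. The sketch below models them as a token bucket, which is an assumption: the page gives only the rate and burst figures and does not say how Olla enforces them. The class name is hypothetical, and the clock is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Client-side throttle mirroring a per-minute rate and a burst size."""

    def __init__(self, rate_per_minute: float = 1000, burst: int = 50):
        self.rate = rate_per_minute / 60.0  # tokens refilled per second
        self.capacity = float(burst)
        self.tokens = float(burst)          # start with a full bucket
        self.updated = 0.0                  # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request may be sent at time `now` (in seconds)."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under this model, a poller that fires 60 requests at once gets 50 through immediately, and the bucket then refills at roughly 16.7 requests per second. In real use, pass `time.monotonic()` offsets as `now`.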