API Reference¶
Olla exposes several API endpoints for proxy operations, health monitoring, and system status. All endpoints follow RESTful conventions and return JSON responses unless otherwise specified.
Base URL¶
API Sections¶
System Endpoints¶
Internal endpoints for health monitoring and system status.
- /internal/health: Health check endpoint
- /internal/status: System status and statistics
- /internal/process: Process information
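
As a minimal sketch of how a client might address these endpoints, the helper below builds their URLs and fetches one of them. The base URL `http://olla.example` is a placeholder, and the names `internal_url` and `fetch_internal` are hypothetical; only the three paths come from the list above.

```python
import json
import urllib.request

# Placeholder address (assumption); substitute your own Olla instance.
BASE_URL = "http://olla.example"

# Paths as listed in the System Endpoints section.
INTERNAL_ENDPOINTS = {
    "health": "/internal/health",
    "status": "/internal/status",
    "process": "/internal/process",
}


def internal_url(name, base_url=BASE_URL):
    """Return the full URL for one of the internal endpoints."""
    return base_url.rstrip("/") + INTERNAL_ENDPOINTS[name]


def fetch_internal(name, base_url=BASE_URL, timeout=5.0):
    """GET an internal endpoint and decode its JSON body.

    The shape of the returned JSON is not specified here, so the
    decoded value is returned as-is.
    """
    with urllib.request.urlopen(internal_url(name, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_internal("health")` against a running instance would return the decoded health payload.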
Unified Models API¶
Cross-provider model discovery and information.
- /olla/models: List all available models across providers
Ollama API¶
Proxy endpoints for Ollama instances.
- /olla/ollama/*: All Ollama API endpoints
- OpenAI-compatible endpoints included
LM Studio API¶
Proxy endpoints for LM Studio servers.
- /olla/lmstudio/*: All LM Studio API endpoints
- /olla/lm-studio/*: Alternative prefix
- /olla/lm_studio/*: Alternative prefix
OpenAI API¶
Proxy endpoints for OpenAI-compatible services.
- /olla/openai/*: OpenAI API endpoints
vLLM API¶
Proxy endpoints for vLLM servers.
- /olla/vllm/*: vLLM API endpoints
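
The provider prefixes listed in the sections above can be captured in a small client-side lookup table. The sketch below is illustrative only: `proxy_url` and `provider_for_path` are hypothetical helper names, the base URL is a placeholder, and the backend path `/api/tags` is just one example of a path that could follow the `/olla/ollama/` prefix.

```python
# Prefixes as documented; LM Studio accepts three spellings.
PROVIDER_PREFIXES = {
    "ollama": ["/olla/ollama"],
    "lmstudio": ["/olla/lmstudio", "/olla/lm-studio", "/olla/lm_studio"],
    "openai": ["/olla/openai"],
    "vllm": ["/olla/vllm"],
}


def proxy_url(base_url, provider, backend_path):
    """Build a proxied URL using the provider's first (primary) prefix."""
    prefix = PROVIDER_PREFIXES[provider][0]
    return base_url.rstrip("/") + prefix + "/" + backend_path.lstrip("/")


def provider_for_path(path):
    """Return the provider whose prefix matches the path, or None."""
    for provider, prefixes in PROVIDER_PREFIXES.items():
        for prefix in prefixes:
            # Match whole path segments only, so /olla/vllmx does not match vllm.
            if path == prefix or path.startswith(prefix + "/"):
                return provider
    return None
```

For example, `proxy_url("http://olla.example", "ollama", "/api/tags")` yields `http://olla.example/olla/ollama/api/tags`.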
Authentication¶
Currently, Olla does not implement authentication at the proxy level. Authentication should be handled by:
- Backend services (Ollama, LM Studio, etc.)
- Network-level security (firewalls, VPNs)
- Reverse proxy authentication (nginx, Traefik)
Rate Limiting¶
Global and per-IP rate limits are enforced:
Limit Type | Default Value |
---|---|
Global requests/minute | 1000 |
Per-IP requests/minute | 100 |
Health endpoint requests/minute | 1000 |
Burst size | 50 |
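
This page does not say which algorithm enforces these limits. One common way to combine a per-minute rate with a burst size is a token bucket, sketched below purely as an illustration; it is not taken from Olla's implementation, and the class name and `clock` parameter are hypothetical.

```python
import time


class TokenBucket:
    """Illustrative token bucket: refills at a steady rate, capped at `burst`."""

    def __init__(self, rate_per_minute, burst, clock=time.monotonic):
        self.rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = float(burst)        # maximum tokens held at once
        self.tokens = float(burst)          # start with a full bucket
        self.clock = clock                  # injectable for testing
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return whether the request passes."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With the per-IP defaults from the table (`TokenBucket(100, 50)`), a client could send 50 requests back to back, after which requests pass only as tokens refill at 100 per minute. A request rejected by such a limiter would correspond to the 429 status.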
Request Headers¶
Required Headers¶
- Content-Type: application/json for POST requests
Optional Headers¶
- X-Request-ID: Custom request ID for tracing
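
A minimal sketch of a POST request carrying both headers, using only the Python standard library. The helper name `build_post` is hypothetical, and generating a UUID when no request ID is supplied is an assumption about what makes a reasonable tracing ID, not a documented requirement.

```python
import json
import urllib.request
import uuid


def build_post(url, payload, request_id=None):
    """Build (but do not send) a JSON POST request with tracing headers."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        # Required for POST requests.
        "Content-Type": "application/json",
        # Optional; a random UUID is used when the caller supplies none (assumption).
        "X-Request-ID": request_id or str(uuid.uuid4()),
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

The request can then be sent with `urllib.request.urlopen(req)`. Note that `urllib` normalizes header-name capitalization internally; HTTP header names are case-insensitive, so this does not change the request's meaning.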
Response Headers¶
All responses include:
Header | Description |
---|---|
X-Olla-Request-ID | Unique request identifier |
X-Olla-Endpoint | Backend endpoint name |
X-Olla-Model | Model used (if applicable) |
X-Olla-Backend-Type | Provider type (ollama/lmstudio/openai/vllm) |
X-Olla-Response-Time | Total processing time (sent as an HTTP trailer) |
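
The sketch below collects these headers from a response into a plain dictionary. The helper name `olla_metadata` is hypothetical. Because `X-Olla-Response-Time` is listed as a trailer, it may be absent from the regular header block; Python's standard `http.client` discards trailers, so the function simply omits any header it does not find.

```python
# Header names as listed in the table above.
OLLA_HEADERS = (
    "X-Olla-Request-ID",
    "X-Olla-Endpoint",
    "X-Olla-Model",
    "X-Olla-Backend-Type",
    "X-Olla-Response-Time",
)


def olla_metadata(headers):
    """Extract the X-Olla-* headers from any mapping of name to value.

    Lookup is case-insensitive; headers that are missing are left out.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name.lower()]
        for name in OLLA_HEADERS
        if name.lower() in lowered
    }
```

With `urllib`, the mapping can be passed as `olla_metadata(response.headers)`.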
Error Responses¶
Standard HTTP status codes are used:
Status Code | Description |
---|---|
200 | Success |
400 | Bad Request |
404 | Not Found |
429 | Rate Limit Exceeded |
500 | Internal Server Error |
502 | Bad Gateway |
503 | Service Unavailable |
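
A client typically wants to distinguish errors worth retrying from those that are not. The sketch below treats 429, 502, and 503 as retryable and everything else as final; that split is a conventional assumption, not something this page prescribes, and the function names and backoff parameters are hypothetical.

```python
# Assumption: rate limiting and upstream failures are transient; 4xx client
# errors and 500 are not retried automatically.
RETRYABLE_STATUSES = {429, 502, 503}


def is_retryable(status):
    """Return True if a request with this status is worth retrying."""
    return status in RETRYABLE_STATUSES


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A caller would sleep for each delay in turn between attempts, stopping as soon as a response is not retryable.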
Error Response Format¶
Streaming Responses¶
For streaming endpoints (chat completions, text generation), responses use:
- Content-Type: text/event-stream for SSE streams
- Transfer-Encoding: chunked for HTTP streaming
- Line-delimited JSON for data chunks
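
Both stream styles can be consumed line by line. The sketch below accepts either SSE `data:` lines or bare line-delimited JSON and yields each decoded chunk. The function name is hypothetical, and stopping at a `[DONE]` sentinel is an assumption borrowed from the OpenAI streaming convention; backends that do not send it simply end the stream.

```python
import json


def iter_stream_json(lines):
    """Yield decoded JSON objects from an iterable of stream lines.

    Accepts bytes or str lines, in SSE form ("data: {...}") or as
    bare line-delimited JSON.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # blank separator or SSE comment (e.g. keep-alive)
        if line.startswith(("event:", "id:", "retry:")):
            continue  # other SSE fields carry no JSON payload
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        if line == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel (assumption)
        yield json.loads(line)
```

A response object from `urllib.request.urlopen` is itself iterable by line, so it can be passed directly as `lines`.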
CORS Support¶
CORS headers are included for browser-based clients:
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization