OpenWebUI Integration with OpenAI¶
OpenWebUI can speak to any OpenAI‑compatible endpoint. Olla sits in front as a smart proxy, exposing a single OpenAI API base that merges multiple backends (e.g. vLLM, SGLang) and handles load‑balancing + failover.
Set in OpenWebUI:
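For example, where `<olla-host>` is wherever Olla is reachable from OpenWebUI (it is the `olla` compose service name in the example below):

```
OPENAI_API_BASE_URL=http://<olla-host>:40114/olla/openai/v1
```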
What you get via Olla
- One stable OpenAI base URL for all backends
- Priority/least‑connections load‑balancing and health checks
- Streaming passthrough
- Unified `/v1/models` across providers
Overview¶
| Project | [github.com/open-webui/open-webui](https://github.com/open-webui/open-webui) |
|---|---|
| Integration Type | Frontend UI |
| Connection Method | OpenAI API compatibility |
| Features Supported (via Olla) | Load balancing, failover, health checks, streaming, unified model list |
| Configuration | Set `OPENAI_API_BASE_URL` to Olla's OpenAI endpoint |
| Example | See `examples/ollama-openwebui` for a full example (built around Ollama); remember to switch it to `OPENAI_API_BASE_URL`. |
Architecture¶
┌─────────────┐      ┌───────── Olla (40114) ──────┐      ┌─────────────────────┐
│  OpenWebUI  │ ───▶ │  /olla/openai/v1 (proxy)    │ ───▶ │ vLLM :8000 (/v1/*)  │
│   (3000)    │      │  • LB + failover            │      └─────────────────────┘
└─────────────┘      │  • health checks            │      ┌─────────────────────┐
                     │  • model unification (/v1)  │ ───▶ │ SGLang :30000 (/v1) │
                     └─────────────────────────────┘      └─────────────────────┘
Quick Start (Docker Compose)¶
Create `compose.yaml`:
services:
olla:
image: ghcr.io/thushan/olla:latest
container_name: olla
restart: unless-stopped
ports:
- "40114:40114"
volumes:
- ./olla.yaml:/app/config.yaml:ro
- ./logs:/app/logs
healthcheck:
test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:40114/internal/health"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
openwebui:
image: ghcr.io/open-webui/open-webui:main
container_name: openwebui
restart: unless-stopped
ports:
- "3000:8080"
volumes:
- openwebui_data:/app/backend/data
environment:
- OPENAI_API_BASE_URL=http://olla:40114/olla/openai/v1
- WEBUI_NAME=Olla + OpenWebUI
- WEBUI_URL=http://localhost:3000
depends_on:
olla:
condition: service_healthy
volumes:
openwebui_data:
driver: local
Create `olla.yaml` (static discovery example):
server:
host: 0.0.0.0
port: 40114
proxy:
engine: sherpa # or: olla (lower overhead), test both
load_balancer: priority # or: least-connections
# Service discovery of OpenAI-compatible backends
# (Each backend must expose /v1/*; Olla will translate as needed.)
discovery:
type: static
static:
endpoints:
- url: http://192.168.1.100:8000
name: gpu-vllm
type: vllm
priority: 100
- url: http://192.168.1.101:30000
name: gpu-sglang
type: sglang
priority: 50
# Optional timeouts & streaming profile
# proxy:
# response_timeout: 1800s
# read_timeout: 600s
# profile: streaming
Bring it up:
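From the directory containing `compose.yaml` and `olla.yaml` (assuming Docker Compose v2):

```bash
docker compose up -d
```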
OpenWebUI will be available at http://localhost:3000
Verifying via cURL¶
List unified models:
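For example, assuming Olla is listening on localhost:40114 as configured above:

```bash
curl -s http://localhost:40114/olla/openai/v1/models | jq
```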
Simple completion (non‑streaming):
curl -s http://localhost:40114/olla/openai/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-oss-120b",
"messages": [{"role":"user","content":"Hello from Olla"}]
}' | jq
Streaming (SSE):
curl -N http://localhost:40114/olla/openai/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "gpt-oss-120b",
"stream": true,
"messages": [{"role":"user","content":"Stream test"}]
}'
Inspect Olla headers (which backend served the call):
curl -s -D - -o /dev/null http://localhost:40114/olla/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-oss-120b","messages":[{"role":"user","content":"ping"}]}'
# Look for: X-Olla-Endpoint, X-Olla-Backend-Type, X-Olla-Response-Time
OpenWebUI Configuration Notes¶
- Env var: `OPENAI_API_BASE_URL` must point to Olla's `/olla/openai/v1`.
- Model picker: OpenWebUI's model list is sourced from `/v1/models` (via Olla). If it's empty, see Troubleshooting.
- API keys: if OpenWebUI prompts for an OpenAI key but your backends don't require one, leave it blank.
Multiple Backends (vLLM, SGLang, LM Studio)¶
Add as many OpenAI‑compatible servers as you like. Priorities control routing.
discovery:
static:
endpoints:
- url: http://vllm-a:8000
name: vllm-a
type: vllm
priority: 100
- url: http://sglang-b:30000
name: sglang-b
type: sglang
priority: 80
- url: http://lmstudio-c:1234
name: lmstudio-c
type: openai-compatible # generic OpenAI-compatible server
priority: 60
Tip: Use `least-connections` when all nodes are similar; use `priority` to prefer local/cheaper nodes.
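To confirm the merged catalogue once several backends are registered, you can list the model IDs Olla exposes (a quick check assuming Olla on localhost:40114 and the standard OpenAI response shape):

```bash
curl -s http://localhost:40114/olla/openai/v1/models | jq -r '.data[].id'
```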
Authentication (Front‑door keying via Nginx)¶
Olla doesn’t issue/validate API keys (yet). To expose Olla publicly, front it with Nginx to enforce simple static keys.
`/etc/nginx/conf.d/olla.conf`:
map $http_authorization $api_key_valid {
    default 0;
    "~*Bearer (sk-thushan-XXXXXXXX|sk-yolo-YYYYYYYY)" 1;
}
server {
    listen 80;
    server_name ai.example.com;

    location /api/ {
        if ($api_key_valid = 0) { return 401; }

        # Trailing slash strips the /api/ prefix, so Olla receives /olla/... paths
        proxy_pass http://localhost:40114/;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_buffering off;  # keep SSE streaming responsive
    }
}
Then point external users to `http://ai.example.com/api/olla/openai/v1` and give them a matching `Authorization: Bearer ...` header.
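An external client call through this front door might then look like the following (the key is one of the placeholder values from the `map` above):

```bash
curl -s http://ai.example.com/api/olla/openai/v1/chat/completions \
  -H 'Authorization: Bearer sk-thushan-XXXXXXXX' \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-oss-120b","messages":[{"role":"user","content":"Hello via the front door"}]}'
```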
For more robust auth (rate limits, per‑key quotas, logs), put an API gateway (Traefik/Envoy/Kong) ahead of Olla.
Monitoring & Health¶
Keep an eye on Olla's health endpoint, per-endpoint status, the unified model list, and the container logs — see the commands below.
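A sketch of these checks, assuming the Docker Compose setup from this guide (adjust hosts and container names if yours differ):

```bash
# Olla health
curl -s http://localhost:40114/internal/health

# Endpoint status (which backends are up)
curl -s http://localhost:40114/internal/status/endpoints | jq

# Unified models across all backends
curl -s http://localhost:40114/olla/openai/v1/models | jq

# Logs
docker logs -f olla
```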
Troubleshooting¶
Models not appearing in OpenWebUI
- Is Olla up?
- Are the backends discovered and healthy?
- Are models resolvable through Olla?
- Does OpenWebUI point at the correct base URL?

The Monitoring & Health commands above cover the first three; a container-side check for the last is sketched below.
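Assuming the compose setup above (OpenWebUI container named `openwebui`, Olla reachable as `olla` on the compose network):

```bash
# Confirm the base URL OpenWebUI was started with, and that Olla is reachable from inside the container
docker exec openwebui printenv OPENAI_API_BASE_URL
docker exec openwebui curl -sS http://olla:40114/olla/openai/v1/models
```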
Connection refused from OpenWebUI → Olla
- Verify compose service names and ports
- From the container: `docker exec openwebui curl -sS http://olla:40114/internal/health`
Slow responses
- Switch to `proxy.engine: olla` or `profile: streaming`
- Use `least-connections` for fairer distribution
- Increase `proxy.response_timeout` for long generations
Docker networking (Linux)
- To hit host services: `http://172.17.0.1:<port>`
- Remote nodes: use actual LAN IPs
Standalone (no compose)¶
Run Olla locally:
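For example, with Docker and the `olla.yaml` from above in the current directory:

```bash
docker run -d --name olla \
  -p 40114:40114 \
  -v "$(pwd)/olla.yaml:/app/config.yaml:ro" \
  ghcr.io/thushan/olla:latest
```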
Run OpenWebUI:
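And OpenWebUI pointed at it — replace `<olla-host>` with an address the container can reach (on Linux, the Docker bridge address `172.17.0.1` noted above works for services running on the host):

```bash
docker run -d --name openwebui \
  -p 3000:8080 \
  -v openwebui_data:/app/backend/data \
  -e OPENAI_API_BASE_URL=http://<olla-host>:40114/olla/openai/v1 \
  ghcr.io/open-webui/open-webui:main
```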