OpenWebUI Integration with Ollama

OpenWebUI is a powerful web interface for interacting with LLMs. Olla acts as a proxy between OpenWebUI and your Ollama backends, providing load balancing, failover and unified model management across multiple Ollama instances.

Set the Ollama base URL in OpenWebUI to point at Olla:

export OLLAMA_BASE_URL="http://localhost:40114/olla/ollama"

You can find an example integration of OpenWebUI with Olla and Ollama instances in examples/ollama-openwebui - see the latest version on GitHub.

Overview

Project              github.com/open-webui/open-webui
Integration Type     Frontend UI
Connection Method    Ollama API Compatibility
Features Supported   • Chat Interface
(via Olla)           • Model Selection
                     • Streaming Responses
Configuration        Set OLLAMA_BASE_URL to the Olla endpoint:
                     export OLLAMA_BASE_URL="http://localhost:40114/olla/ollama"
Example              See examples/ollama-openwebui

Architecture

┌──────────────┐    ┌──────────────┐    ┌───────────────────┐
│  OpenWebUI   │───▶│     Olla     │───▶│  Ollama Instance  │
│ (Port 3000)  │    │ (Port 40114) │    │     (Primary)     │
└──────────────┘    └───────┬──────┘    └───────────────────┘
                            │
                            ├──────────▶┌───────────────────┐
                            │           │ Ollama Instance 2 │
                            │           │    (Fallback)     │
                            │           └───────────────────┘
                            │
                            └──────────▶┌───────────────────┐
                                        │ Ollama Instance 3 │
                                        │       (GPU)       │
                                        └───────────────────┘

Quick Start

Docker Compose Setup

  1. Create compose.yaml:
services:
  # Olla proxy/load balancer
  olla:
    image: ghcr.io/thushan/olla:latest
    container_name: olla
    restart: unless-stopped
    ports:
      - "40114:40114"
    volumes:
      - ./olla.yaml:/app/config.yaml:ro
      - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:40114/internal/health"]
      timeout: 5s
      interval: 30s
      retries: 3
      start_period: 10s

  # OpenWebUI interface
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - openwebui_data:/app/backend/data
    environment:
      # Point to Olla instead of direct Ollama
      - OLLAMA_BASE_URL=http://olla:40114/olla/ollama
      - WEBUI_NAME=Olla + OpenWebUI
      - WEBUI_URL=http://localhost:3000
    depends_on:
      olla:
        condition: service_healthy

volumes:
  openwebui_data:
    driver: local
  2. Create the olla.yaml configuration - copy the full olla.yaml from the repository; the snippet below is trimmed for brevity:
server:
  host: "0.0.0.0"
  port: 40114

proxy:
  engine: "sherpa"
  load_balancer: "priority"

discovery:
  type: "static"
  static:
    endpoints:
      - url: "http://192.168.1.100:11434"
        name: "main-ollama"
        type: "ollama"
        priority: 100

      - url: "http://192.168.1.101:11434"
        name: "backup-ollama"
        type: "ollama"
        priority: 50
  3. Start the stack:
docker compose up -d
  4. Access OpenWebUI at http://localhost:3000 and verify the wiring with the quick checks below.
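
To confirm everything is wired up before using the UI, check Olla's health and the unified model list from the host (assuming the default ports above):

# Olla should report healthy
curl http://localhost:40114/internal/health

# Models from all configured Ollama endpoints should be listed
curl http://localhost:40114/olla/ollama/api/tags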

Configuration Options

Basic Configuration

The minimal configuration requires setting the Ollama base URL:

environment:
  - OLLAMA_BASE_URL=http://olla:40114/olla/ollama

Advanced Configuration

environment:
  # Olla connection
  - OLLAMA_BASE_URL=http://olla:40114/olla/ollama

  # OpenWebUI settings
  - WEBUI_NAME=My AI Assistant
  - WEBUI_URL=http://localhost:3000
  - WEBUI_SECRET_KEY=change-this-secret-key

  # Default models
  - DEFAULT_MODELS=llama3.2:latest,mistral:latest

  # User management
  - DEFAULT_USER_ROLE=user
  - ENABLE_SIGNUP=true

  # Features
  - ENABLE_RAG_WEB_SEARCH=true
  - RAG_WEB_SEARCH_ENGINE=duckduckgo

See the OpenWebUI documentation for details on these settings.
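
If you need a value for WEBUI_SECRET_KEY, one common approach (not specific to Olla) is to generate a random hex string:

# Generate a 32-byte random secret for WEBUI_SECRET_KEY
openssl rand -hex 32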

Using Multiple Backends

Olla enables OpenWebUI to use multiple backend types simultaneously:

Mixed Backend Configuration

discovery:
  static:
    endpoints:
      # Primary Ollama instance
      - url: "http://gpu-server:11434"
        name: "ollama-gpu"
        type: "ollama"
        priority: 100

      # LM Studio for specific models
      - url: "http://workstation:1234"
        name: "lmstudio"
        type: "lm-studio"
        priority: 80

      # vLLM for high throughput
      - url: "http://vllm-server:8000"
        name: "vllm"
        type: "vllm"
        priority: 60

Model Unification

OpenWebUI sees a unified model list across all backends:

# Check unified models
curl http://localhost:40114/olla/ollama/api/tags

# Response includes models from all Ollama-type endpoints
{
  "models": [
    {"name": "llama3.2:latest", "size": 2023547950, ...},
    {"name": "mistral:latest", "size": 4113487360, ...},
    {"name": "codellama:13b", "size": 7365960704, ...}
  ]
}
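
If you only want the unified model names, a small filter over the same endpoint works (assuming jq is installed):

# List just the model names exposed through Olla
curl -s http://localhost:40114/olla/ollama/api/tags | jq -r '.models[].name'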

Standalone Setup

Without Docker

  1. Start Olla:
olla --config olla.yaml
  2. Start OpenWebUI:
docker run -d \
  --name openwebui \
  -p 3000:8080 \
  -v openwebui_data:/app/backend/data \
  -e OLLAMA_BASE_URL=http://host.docker.internal:40114/olla/ollama \
  ghcr.io/open-webui/open-webui:main
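
On Linux, host.docker.internal is not resolvable by default; one option (Docker 20.10+) is to map it to the host gateway when starting the container:

docker run -d \
  --name openwebui \
  --add-host=host.docker.internal:host-gateway \
  -p 3000:8080 \
  -v openwebui_data:/app/backend/data \
  -e OLLAMA_BASE_URL=http://host.docker.internal:40114/olla/ollama \
  ghcr.io/open-webui/open-webui:main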

With Existing OpenWebUI

Update your existing OpenWebUI configuration:

# Stop and remove the existing container (the named volume keeps your data)
docker stop openwebui
docker rm openwebui

# Recreate with the Olla base URL
docker run -d \
  --name openwebui \
  -p 3000:8080 \
  -v openwebui_data:/app/backend/data \
  -e OLLAMA_BASE_URL=http://your-olla-host:40114/olla/ollama \
  ghcr.io/open-webui/open-webui:main

Monitoring

Check Health

# Olla health
curl http://localhost:40114/internal/health

# Endpoint status
curl http://localhost:40114/internal/status/endpoints

# Available models
curl http://localhost:40114/olla/ollama/api/tags

View Logs

# Olla logs
docker logs olla -f

# OpenWebUI logs
docker logs openwebui -f

Monitor Performance

Check response headers for routing information:

curl -I http://localhost:40114/olla/ollama/api/tags

# Headers show:
# X-Olla-Endpoint: main-ollama
# X-Olla-Backend-Type: ollama
# X-Olla-Response-Time: 45ms
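
To see which backend serves successive requests - handy when testing load balancing or failover - repeat the request and watch the routing header:

# Each response reports the endpoint that served it
for i in 1 2 3; do
  curl -sI http://localhost:40114/olla/ollama/api/tags | grep -i x-olla-endpoint
done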

Troubleshooting

Models Not Appearing

Issue: OpenWebUI doesn't show any models

Solution:

  1. Verify Olla is healthy:

    curl http://localhost:40114/internal/health
    

  2. Check endpoints are discovered:

    curl http://localhost:40114/internal/status/endpoints
    

  3. Verify models are available:

    curl http://localhost:40114/olla/ollama/api/tags
    

  4. Check OpenWebUI logs:

    docker logs openwebui | grep -i error
    

Connection Refused

Issue: OpenWebUI can't connect to Olla

Solution:

  1. Verify network connectivity:

    docker exec openwebui ping olla
    

  2. Check Olla is listening:

    netstat -an | grep 40114
    

  3. Verify environment variable:

    docker exec openwebui env | grep OLLAMA_BASE_URL
    

Slow Response Times

Issue: Chat responses are slow

Solution:

  1. Ensure the proxy profile is set correctly:

    proxy:
      profile: "auto"  # or "streaming"
    
  2. Switch to high-performance engine:

    proxy:
      engine: "olla"  # Instead of "sherpa"
    

  3. Use appropriate load balancer:

    proxy:
      load_balancer: "least-connections"
    

  4. Increase timeouts:

    proxy:
      response_timeout: 1200s  # 20 minutes
    
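Taken together, a tuned proxy block might look like this (a sketch combining the options above; adjust values to suit your models and hardware):

proxy:
  engine: "olla"                     # high-performance engine
  profile: "auto"                    # or "streaming"
  load_balancer: "least-connections"
  response_timeout: 1200s            # 20 minutes for long generations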

Docker Networking Issues

Issue: Containers can't communicate

Solution:

For Ollama on Docker host:

endpoints:
  - url: "http://host.docker.internal:11434"  # macOS/Windows
  - url: "http://172.17.0.1:11434"            # Linux

For remote instances:

endpoints:
  - url: "http://192.168.1.100:11434"  # Use actual IP

Advanced Features

GPU Support

Add GPU-enabled Ollama to the stack:

services:
  ollama-gpu:
    image: ollama/ollama:latest
    container_name: ollama-gpu
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  olla:
    # ... existing config
    depends_on:
      - ollama-gpu

Update olla.yaml:

endpoints:
  - url: "http://ollama-gpu:11434"
    name: "local-gpu"
    type: "ollama"
    priority: 100
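
Once the GPU container is running, pull a model into it so it appears in Olla's unified list (the model name here is just an example):

docker exec ollama-gpu ollama pull llama3.2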

Authentication

Authentication Not Supported

Olla does not currently support authentication headers for endpoints. If your API requires authentication, you'll need to:

  • Use a reverse proxy that adds authentication (see the nginx sketch below)
  • Wait for this feature to be implemented
  • Access only public/local endpoints
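
As an example of the reverse-proxy workaround, an nginx location can inject the required header before forwarding to the authenticated backend (a sketch - the upstream, path and token are placeholders):

location /secure-backend/ {
    proxy_pass http://authenticated-backend:8000/;
    proxy_set_header Authorization "Bearer YOUR_API_TOKEN";
}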

Custom Networks

Create isolated networks:

networks:
  olla-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16

services:
  olla:
    networks:
      - olla-net

  openwebui:
    networks:
      - olla-net
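
To confirm both containers joined the network, list it and inspect its members:

# Compose prefixes the network with the project name, e.g. myproject_olla-net
docker network ls | grep olla-net

# Inspect the full name reported above to see which containers are attached
docker network inspect <project>_olla-net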

Best Practices

1. Use Priority Load Balancing

Configure priorities based on cost and performance - higher values are preferred, and lower-priority endpoints only receive traffic when the preferred ones are unavailable:

endpoints:
  # Free/local first
  - url: "http://localhost:11434"
    priority: 100

  # Backup/cloud
  - url: "https://api.provider.com"
    priority: 10

2. Monitor Health

Set up health check alerts:

discovery:
  static:
    endpoints:
      - url: "http://ollama:11434"
        check_interval: 10s
        check_timeout: 2s
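
The health endpoints can feed a simple external alert. A minimal cron-able sketch (the exact status field names depend on your Olla version, so adjust the pattern to match):

#!/usr/bin/env bash
# Alert (here: just print) when Olla is unreachable or reports an unhealthy endpoint
STATUS=$(curl -sf http://localhost:40114/internal/status/endpoints) || {
  echo "ALERT: Olla status endpoint unreachable"
  exit 1
}
echo "$STATUS" | grep -qiE 'unhealthy|offline' && echo "ALERT: unhealthy endpoint detected"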

3. Configure Appropriate Timeouts

For large models:

proxy:
  response_timeout: 1800s  # 30 minutes
  read_timeout: 600s       # 10 minutes

4. Use Volumes for Persistence

volumes:
  - ./olla-config:/app/config:ro
  - ./olla-logs:/app/logs
  - openwebui_data:/app/backend/data

Integration with Other Tools

Nginx Reverse Proxy

server {
    listen 80;
    server_name ai.example.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
    }

    location /olla/ {
        proxy_pass http://localhost:40114;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
    }
}
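
If streamed chat responses stall or arrive in bursts behind nginx, disabling buffering for the Olla location usually helps (worth testing in your setup):

location /olla/ {
    proxy_pass http://localhost:40114;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_buffering off;       # pass streamed tokens through immediately
    proxy_read_timeout 600s;   # allow long generations
}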

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: olla
spec:
  replicas: 1
  selector:
    matchLabels:
      app: olla
  template:
    metadata:
      labels:
        app: olla
    spec:
      containers:
      - name: olla
        image: ghcr.io/thushan/olla:latest
        ports:
        - containerPort: 40114
        volumeMounts:
        - name: config
          mountPath: /app/config.yaml
          subPath: olla.yaml
      volumes:
      - name: config
        configMap:
          name: olla-config
---
apiVersion: v1
kind: Service
metadata:
  name: olla
spec:
  selector:
    app: olla
  ports:
  - port: 40114
    targetPort: 40114
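
The Deployment expects a ConfigMap named olla-config containing your olla.yaml. One way to create it and apply the manifests (manifest file name assumed):

kubectl create configmap olla-config --from-file=olla.yaml=./olla.yaml
kubectl apply -f olla.k8s.yaml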

Example Repository

A complete example is available at: github.com/thushan/olla/examples/ollama-openwebui

Next Steps