OpenWebUI Integration with Ollama¶
OpenWebUI is a powerful web interface for interacting with LLMs. Olla acts as a proxy between OpenWebUI and your Ollama backends, providing load balancing, failover and unified model management across multiple Ollama instances.
Set in OpenWebUI: `OLLAMA_BASE_URL=http://<olla-host>:40114/olla/ollama`
You can find an example integration of OpenWebUI with Olla and Ollama instances in examples/ollama-openwebui - see the latest on GitHub.
Overview¶
| Project | github.com/open-webui/open-webui |
|---|---|
| Integration Type | Frontend UI |
| Connection Method | Ollama API Compatibility |
| Features Supported (via Olla) | Load balancing, failover, unified model management |
| Configuration | Set OLLAMA_BASE_URL to the Olla endpoint |
| Example | examples/ollama-openwebui |
Architecture¶
┌──────────────┐     ┌────────────┐     ┌──────────────────┐
│  OpenWebUI   │────▶│    Olla    │────▶│ Ollama Instance  │
│ (Port 3000)  │     │(Port 40114)│     │    (Primary)     │
└──────────────┘     └────────────┘     └──────────────────┘
                           │
                           ├──────────▶ ┌──────────────────┐
                           │            │ Ollama Instance 2│
                           │            │   (Fallback)     │
                           │            └──────────────────┘
                           │
                           └──────────▶ ┌──────────────────┐
                                        │ Ollama Instance 3│
                                        │      (GPU)       │
                                        └──────────────────┘
Quick Start¶
Docker Compose Setup¶
- Create `compose.yaml`:
services:
# Olla proxy/load balancer
olla:
image: ghcr.io/thushan/olla:latest
container_name: olla
restart: unless-stopped
ports:
- "40114:40114"
volumes:
- ./olla.yaml:/app/config.yaml:ro
- ./logs:/app/logs
healthcheck:
test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:40114/internal/health"]
timeout: 5s
interval: 30s
retries: 3
start_period: 10s
# OpenWebUI interface
openwebui:
image: ghcr.io/open-webui/open-webui:main
container_name: openwebui
restart: unless-stopped
ports:
- "3000:8080"
volumes:
- openwebui_data:/app/backend/data
environment:
# Point to Olla instead of direct Ollama
- OLLAMA_BASE_URL=http://olla:40114/olla/ollama
- WEBUI_NAME=Olla + OpenWebUI
- WEBUI_URL=http://localhost:3000
depends_on:
olla:
condition: service_healthy
volumes:
openwebui_data:
driver: local
- Create the `olla.yaml` configuration - copy the existing `olla.yaml` from the repository; the snippet below is trimmed for brevity:
server:
host: "0.0.0.0"
port: 40114
proxy:
engine: "sherpa"
load_balancer: "priority"
discovery:
type: "static"
static:
endpoints:
- url: "http://192.168.1.100:11434"
name: "main-ollama"
type: "ollama"
priority: 100
- url: "http://192.168.1.101:11434"
name: "backup-ollama"
type: "ollama"
priority: 50
- Start the stack (command shown below):
- Access OpenWebUI at http://localhost:3000
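With both compose.yaml and olla.yaml in place, the "Start the stack" step above is a standard Compose invocation; following Olla's logs is optional but useful for confirming it discovers the configured endpoints:
# Start Olla and OpenWebUI in the background
docker compose up -d
# Optionally, watch Olla discover and health-check the Ollama endpoints
docker compose logs -f olla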
Configuration Options¶
Basic Configuration¶
The minimal configuration requires setting the Ollama base URL:
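For example, matching the Compose file above (the hostname olla is the Compose service name; substitute your own host and port if Olla runs elsewhere):
environment:
  - OLLAMA_BASE_URL=http://olla:40114/olla/ollama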
Advanced Configuration¶
environment:
# Olla connection
- OLLAMA_BASE_URL=http://olla:40114/olla/ollama
# OpenWebUI settings
- WEBUI_NAME=My AI Assistant
- WEBUI_URL=http://localhost:3000
- WEBUI_SECRET_KEY=change-this-secret-key
# Default models
- DEFAULT_MODELS=llama3.2:latest,mistral:latest
# User management
- DEFAULT_USER_ROLE=user
- ENABLE_SIGNUP=true
# Features
- ENABLE_RAG_WEB_SEARCH=true
- RAG_WEB_SEARCH_ENGINE=duckduckgo
See the OpenWebUI documentation for more details on these environment variables.
Using Multiple Backends¶
Olla enables OpenWebUI to use multiple backend types simultaneously:
Mixed Backend Configuration¶
discovery:
static:
endpoints:
# Primary Ollama instance
- url: "http://gpu-server:11434"
name: "ollama-gpu"
type: "ollama"
priority: 100
# LM Studio for specific models
- url: "http://workstation:1234"
name: "lmstudio"
type: "lm-studio"
priority: 80
# vLLM for high throughput
- url: "http://vllm-server:8000"
name: "vllm"
type: "vllm"
priority: 60
Model Unification¶
OpenWebUI sees a unified model list across all backends:
# Check unified models
curl http://localhost:40114/olla/ollama/api/tags
# Response includes models from all Ollama-type endpoints
{
"models": [
{"name": "llama3.2:latest", "size": 2023547950, ...},
{"name": "mistral:latest", "size": 4113487360, ...},
{"name": "codellama:13b", "size": 7365960704, ...}
]
}
Standalone Setup¶
Without Docker¶
- Start Olla:
- Start OpenWebUI:
docker run -d \
--name openwebui \
-p 3000:8080 \
-v openwebui_data:/app/backend/data \
-e OLLAMA_BASE_URL=http://host.docker.internal:40114/olla/ollama \
ghcr.io/open-webui/open-webui:main
With Existing OpenWebUI¶
Update your existing OpenWebUI configuration:
# Stop OpenWebUI
docker stop openwebui
# Update environment
docker run -d \
--name openwebui \
-p 3000:8080 \
-v openwebui_data:/app/backend/data \
-e OLLAMA_BASE_URL=http://your-olla-host:40114/olla/ollama \
ghcr.io/open-webui/open-webui:main
Monitoring¶
Check Health¶
# Olla health
curl http://localhost:40114/internal/health
# Endpoint status
curl http://localhost:40114/internal/status/endpoints
# Available models
curl http://localhost:40114/olla/ollama/api/tags
View Logs¶
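Assuming the container names from the Compose setup above (olla and openwebui):
# Follow Olla's logs
docker compose logs -f olla
# Follow OpenWebUI's logs
docker compose logs -f openwebui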
Monitor Performance¶
Check response headers for routing information:
curl -I http://localhost:40114/olla/ollama/api/tags
# Headers show:
# X-Olla-Endpoint: main-ollama
# X-Olla-Backend-Type: ollama
# X-Olla-Response-Time: 45ms
Troubleshooting¶
Models Not Appearing¶
Issue: OpenWebUI doesn't show any models
Solution (each check is sketched in the commands after this list):
- Verify Olla is healthy
- Check that endpoints are discovered
- Verify models are available
- Check OpenWebUI logs
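The commands below sketch each check using the Olla endpoints documented above; the container name openwebui comes from the Compose file, so adjust names and hostnames to match your setup:
# 1. Verify Olla is healthy
curl http://localhost:40114/internal/health
# 2. Check that endpoints are discovered and healthy
curl http://localhost:40114/internal/status/endpoints
# 3. Verify models are available through the unified list
curl http://localhost:40114/olla/ollama/api/tags
# 4. Check OpenWebUI logs for connection or model-list errors
docker logs openwebui 2>&1 | grep -i ollama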
Connection Refused¶
Issue: OpenWebUI can't connect to Olla
Solution (each check is sketched in the commands after this list):
- Verify network connectivity from OpenWebUI to Olla
- Check that Olla is listening
- Verify the OLLAMA_BASE_URL environment variable
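The commands below sketch each check; the first one assumes curl is available inside the OpenWebUI container, which may not hold for every image version:
# 1. Verify network connectivity from the OpenWebUI container to Olla
docker exec openwebui curl -s http://olla:40114/internal/health
# 2. Check that Olla is listening on the host
curl http://localhost:40114/internal/health
# 3. Verify the environment variable inside the OpenWebUI container
docker exec openwebui env | grep OLLAMA_BASE_URL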
Slow Response Times¶
Issue: Chat responses are slow
Solution (see the configuration sketch after this list):
- Ensure the proxy profile is set correctly
- Switch to the high-performance proxy engine
- Use an appropriate load balancer
- Increase timeouts for large models
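A sketch of where these settings live in olla.yaml; the values below are assumptions for illustration (engine and load balancer names vary by Olla version), so check the Configuration Reference for the options your build supports:
proxy:
  # assumption: a higher-throughput engine than the default "sherpa" is available
  engine: "olla"
  # assumption: a connection-aware strategy exists alongside "priority"
  load_balancer: "least-connections"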
Docker Networking Issues¶
Issue: Containers can't communicate
Solution:
For Ollama on Docker host:
endpoints:
- url: "http://host.docker.internal:11434" # macOS/Windows
- url: "http://172.17.0.1:11434" # Linux
For remote instances:
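For example, remote Ollama instances can be added by IP or resolvable hostname; both values below are placeholders, and they must be reachable from the Olla container, not just from the Docker host:
endpoints:
  - url: "http://192.168.1.100:11434"              # remote instance by LAN IP
  - url: "http://ollama-remote.example.com:11434"  # or by DNS name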
Advanced Features¶
GPU Support¶
Add GPU-enabled Ollama to the stack:
services:
ollama-gpu:
image: ollama/ollama:latest
container_name: ollama-gpu
restart: unless-stopped
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
olla:
# ... existing config
depends_on:
- ollama-gpu
Update olla.yaml:
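For example, register the new GPU container as the highest-priority endpoint; the service name ollama-gpu resolves because both containers share the default Compose network (and remember to declare the ollama_data volume at the top level of the Compose file):
discovery:
  static:
    endpoints:
      - url: "http://ollama-gpu:11434"
        name: "ollama-gpu"
        type: "ollama"
        priority: 100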
Authentication¶
Authentication Not Supported
Olla does not currently support authentication headers for endpoints. If your API requires authentication, you'll need to:
- Use a reverse proxy that adds authentication
- Wait for this feature to be implemented
- Access only public/local endpoints
Custom Networks¶
Create isolated networks:
networks:
olla-net:
driver: bridge
ipam:
config:
- subnet: 172.20.0.0/16
services:
olla:
networks:
- olla-net
openwebui:
networks:
- olla-net
Best Practices¶
1. Use Priority Load Balancing¶
Configure priorities based on cost and performance:
endpoints:
# Free/local first
- url: "http://localhost:11434"
priority: 100
# Backup/cloud
- url: "https://api.provider.com"
priority: 10
2. Monitor Health¶
Set up health check alerts:
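A minimal sketch using the health endpoint shown earlier; run it from cron or a monitoring agent, and replace the logging line with your real alerting hook (the log path is a placeholder):
#!/usr/bin/env bash
# Alert when Olla's health endpoint stops returning a success status
if ! curl -sf http://localhost:40114/internal/health > /dev/null; then
  echo "Olla health check failed at $(date)" >> /var/log/olla-alerts.log
  # e.g. send an email or post to a chat webhook here
fi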
3. Configure Appropriate Timeouts¶
For large models, increase the proxy timeouts in olla.yaml - see the Configuration Reference for the relevant settings.
4. Use Volumes for Persistence¶
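The Compose file above already does this for OpenWebUI - the named volume is what preserves chat history and settings when the container is recreated:
services:
  openwebui:
    volumes:
      - openwebui_data:/app/backend/data

volumes:
  openwebui_data:
    driver: local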
Integration with Other Tools¶
Nginx Reverse Proxy¶
server {
listen 80;
server_name ai.example.com;
location / {
proxy_pass http://localhost:3000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
}
location /olla/ {
proxy_pass http://localhost:40114;
proxy_http_version 1.1;
proxy_set_header Host $host;
}
}
Kubernetes Deployment¶
apiVersion: apps/v1
kind: Deployment
metadata:
name: olla
spec:
replicas: 1
selector:
matchLabels:
app: olla
template:
metadata:
labels:
app: olla
spec:
containers:
- name: olla
image: ghcr.io/thushan/olla:latest
ports:
- containerPort: 40114
volumeMounts:
- name: config
mountPath: /app/config.yaml
subPath: olla.yaml
volumes:
- name: config
configMap:
name: olla-config
---
apiVersion: v1
kind: Service
metadata:
name: olla
spec:
selector:
app: olla
ports:
- port: 40114
targetPort: 40114
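The Deployment mounts its configuration from a ConfigMap named olla-config, and the subPath: olla.yaml mount expects the key to be named olla.yaml; one way to create it from a local file:
kubectl create configmap olla-config --from-file=olla.yaml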
Example Repository¶
A complete example is available at: github.com/thushan/olla/examples/ollama-openwebui
Next Steps¶
- Configuration Reference - Complete Olla configuration
- Load Balancing - Configure load balancing strategies
- Model Unification - Understand model management