Security Best Practices¶
This guide covers security considerations and best practices for deploying Olla in production environments.
Default Security Configuration
server:
  rate_limits:
    global_requests_per_minute: 1000
    per_ip_requests_per_minute: 100
  request_limits:
    max_body_size: 52428800 # 50MB
    max_header_size: 524288 # 512KB
Key Settings:
- Rate limiting enabled by default
- Request size limits prevent abuse
- No authentication built-in (use reverse proxy)
Environment Variables: OLLA_SERVER_RATE_LIMITS_*
Security Principles¶
Olla follows these security principles:
- Defence in Depth: Multiple layers of security controls
- Least Privilege: Minimal permissions required
- Fail Secure: Safe defaults when errors occur
- Zero Trust: Verify all requests
Network Security¶
Bind Address Configuration¶
Control network exposure carefully:
# Development - local only
server:
  host: "localhost" # Only local connections
  port: 40114
# Production - controlled exposure
server:
  host: "0.0.0.0" # Accept from network
  port: 40114
Recommendations:
- Use localhost unless network access is required
- Deploy behind a reverse proxy for internet exposure
- Use firewall rules to restrict access
TLS/HTTPS Configuration¶
Olla doesn't handle TLS directly. Use a reverse proxy:
nginx example:
server {
    listen 443 ssl http2;
    server_name api.example.com;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    location / {
        proxy_pass http://localhost:40114;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
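Caddy example: Caddy provisions and renews certificates automatically, so a site block for api.example.com that uses the reverse_proxy directive to forward to localhost:40114 covers the same ground.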
Firewall Rules¶
Restrict access at the network level:
# Allow only from trusted networks
iptables -A INPUT -p tcp --dport 40114 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 40114 -j DROP
# Or use UFW
ufw allow from 10.0.0.0/8 to any port 40114
ufw deny 40114
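When reviewing logs or planning rules like the ones above, it can help to check which source addresses a trusted range actually admits. The following is a minimal sketch using Python's standard ipaddress module; the TRUSTED_NETWORKS list and the is_trusted helper are illustrative names, not part of Olla.

```python
import ipaddress

# Assumption: mirrors the 10.0.0.0/8 range used in the firewall rules above.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(addr: str) -> bool:
    """Return True if addr falls inside any trusted network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("10.20.30.40", "192.168.1.5"):
        print(candidate, "trusted" if is_trusted(candidate) else "blocked")
```

This only models the address match; the firewall itself remains the enforcement point.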
Rate Limiting¶
Protect against abuse and DoS attacks:
Global Rate Limits¶
Prevent system overload:
server:
  rate_limits:
    global_requests_per_minute: 5000 # Total system capacity
    health_requests_per_minute: 1000 # Monitoring endpoints
Per-Client Rate Limits¶
Prevent single client abuse:
server:
  rate_limits:
    per_ip_requests_per_minute: 60 # Strict for public APIs
    burst_size: 10 # Small burst allowance
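To build intuition for how a per-minute rate and a burst allowance interact, here is a small token-bucket sketch. It is an illustration only: this guide does not state which algorithm Olla uses internally, and the TokenBucket class is a hypothetical name introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative token bucket; not Olla's implementation."""
    rate_per_minute: float   # analogous to per_ip_requests_per_minute
    burst_size: int          # analogous to burst_size
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self) -> None:
        # Start full so a client can use its whole burst immediately.
        self.tokens = float(self.burst_size)

    def allow(self, now: float) -> bool:
        """Refill based on elapsed seconds, then try to spend one token."""
        elapsed = max(0.0, now - self.last)
        refill = elapsed * self.rate_per_minute / 60.0
        self.tokens = min(float(self.burst_size), self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_minute=60, burst_size=10)
    allowed = sum(bucket.allow(0.0) for _ in range(15))
    print(f"{allowed} of 15 simultaneous requests allowed")
```

Under this model a client can send burst_size requests at once, then is held to the steady per-minute rate. A real deployment would keep one bucket per client IP.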
Rate Limit Strategies¶
| Deployment Type | Global RPM | Per-IP RPM | Burst |
|---|---|---|---|
| Internal API | 10000 | 1000 | 100 |
| Public API | 5000 | 60 | 10 |
| Development | 1000 | 100 | 50 |
| High Security | 1000 | 20 | 5 |
Request Validation¶
Size Limits¶
Prevent resource exhaustion attacks:
server:
  request_limits:
    max_body_size: 10485760 # 10MB - adjust based on needs
    max_header_size: 131072 # 128KB - usually sufficient
Guidelines:
- Set limits based on legitimate use cases
- Smaller limits for public APIs
- Monitor rejected requests
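The limits are expressed in bytes, so it is easy to be off by a factor of 1024. A short sketch of the arithmetic behind the values above, plus a hypothetical within_limit helper that shows the comparison a size check performs (this is not Olla's code):

```python
KIB = 1024
MIB = 1024 * KIB

MAX_BODY_SIZE = 10 * MIB      # 10485760 bytes, as in the example above
MAX_HEADER_SIZE = 128 * KIB   # 131072 bytes, as in the example above


def within_limit(size_bytes: int, limit_bytes: int) -> bool:
    """Hypothetical helper: True if a declared size fits under the limit."""
    return 0 <= size_bytes <= limit_bytes


if __name__ == "__main__":
    print(MAX_BODY_SIZE, MAX_HEADER_SIZE)
    print(within_limit(50 * MIB, MAX_BODY_SIZE))
```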
Input Validation¶
Olla validates:
- Request size limits
- Header size limits
- URL format and structure
- HTTP method restrictions
Access Control¶
Internal Endpoints¶
Restrict access to internal endpoints:
# These endpoints should not be public:
# /internal/health
# /internal/status
# /internal/process
# /version
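nginx protection: add a location block for these paths that allows trusted networks and denies all other clients, using the nginx allow and deny directives.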
Backend Endpoint Security¶
Secure your LLM endpoints:
discovery:
  static:
    endpoints:
      # Use internal networks
      - url: "http://10.0.1.10:11434" # Internal IP
        name: "internal-ollama"
        type: "ollama"
      # Avoid public endpoints when possible
      # If required, ensure they have authentication
Deployment Security¶
Container Security¶
When using Docker:
# Run as non-root user
FROM golang:1.21-alpine AS builder
# ... build steps ...
FROM alpine:latest
RUN adduser -D -g '' appuser
USER appuser
COPY --from=builder /app/olla /usr/local/bin/
# docker-compose.yml
services:
  olla:
    image: ghcr.io/thushan/olla:latest
    user: "1000:1000" # Non-root UID/GID
    read_only: true # Read-only filesystem
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE # Only if needed for low ports
Kubernetes Security¶
Security policies for Kubernetes:
apiVersion: v1
kind: Pod
metadata:
  name: olla
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
  containers:
    - name: olla
      image: ghcr.io/thushan/olla:latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
      resources:
        limits:
          memory: "512Mi"
          cpu: "1000m"
        requests:
          memory: "256Mi"
          cpu: "100m"
Process Isolation¶
Run Olla with minimal privileges:
# systemd service
[Service]
User=olla
Group=olla
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
NoNewPrivileges=yes
ReadWritePaths=/var/log/olla
Logging and Auditing¶
Security Logging¶
Configure appropriate logging:
server:
  request_logging: true # Log all requests
logging:
  level: "info"
  format: "json" # Structured logs for analysis
  output: "stdout"
Log Sensitive Data¶
Never log:
- API keys or tokens
- Request/response bodies with sensitive data
- User credentials
- Internal IP addresses in public logs
Audit Trail¶
Monitor these security events:
- Rate limit violations
- Request size limit violations
- Circuit breaker trips
- Failed health checks
- Configuration changes
Upstream Response Header Stripping¶
Olla removes the following headers from upstream responses before returning them to clients:
- Authorization
- Proxy-Authorization
- Set-Cookie
- X-Api-Key
- X-Auth-Token
Any header named in an endpoint's auth.header field or in the headers: map is also stripped on the response side. This means operator-supplied custom auth header names are protected even when they do not appear in the list above. The reason: backends should not be able to set cookies on clients or reflect credentials back through Olla.
Custom header names
If you configure a non-standard credential header (e.g. auth.header: X-My-Token), Olla strips X-My-Token from responses as well. No additional configuration is needed.
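The stripping rule described above can be modelled in a few lines. This is a sketch of the behaviour, not Olla's source: strip_response_headers and its extra_names parameter are hypothetical, and the case-insensitive match is an assumption based on HTTP header names being case-insensitive.

```python
# Header names listed in this section.
DEFAULT_STRIPPED = (
    "Authorization",
    "Proxy-Authorization",
    "Set-Cookie",
    "X-Api-Key",
    "X-Auth-Token",
)


def strip_response_headers(headers, extra_names=()):
    """Return a copy of headers without credential-bearing entries.

    extra_names stands in for operator-configured names such as the value
    of auth.header or the keys of the headers: map.
    """
    blocked = {name.lower() for name in DEFAULT_STRIPPED}
    blocked |= {name.lower() for name in extra_names}
    return {k: v for k, v in headers.items() if k.lower() not in blocked}


if __name__ == "__main__":
    upstream = {
        "Content-Type": "application/json",
        "Set-Cookie": "sid=1",
        "X-My-Token": "s3cret",
    }
    print(strip_response_headers(upstream, extra_names=["X-My-Token"]))
```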
CORS¶
CORS is disabled by default. It is only relevant when a browser client (OpenWebUI, a custom dashboard, or a web app) connects directly to Olla. CLI tools, SDKs, and coding agents send no Origin header and pass through Olla completely untouched regardless of this setting.
When to enable¶
Enable CORS when:
- You are running a browser-based UI (e.g. OpenWebUI) that talks directly to Olla rather than through a reverse proxy that handles CORS itself.
- You have a custom JavaScript dashboard consuming Olla's API.
Do not enable CORS when all clients are server-side or CLI-based: it adds no value and broadens the attack surface.
Permissive configuration (development)¶
Suitable for local development where any origin should be allowed:
server:
  cors:
    enabled: true
    allowed_origins:
      - "*"
    allowed_methods:
      - "GET"
      - "POST"
      - "OPTIONS"
    allowed_headers:
      - "*"
    max_age: 300
Locked-down configuration (production)¶
Restrict to known origins and expose only the headers your UI needs to read:
server:
  cors:
    enabled: true
    allowed_origins:
      - "https://my-dashboard.example.com"
    allowed_methods:
      - "GET"
      - "POST"
      - "OPTIONS"
    allowed_headers:
      - "Authorization"
      - "Content-Type"
    allow_credentials: true
    max_age: 600
Credentials + wildcard origin
Setting allow_credentials: true alongside allowed_origins: ["*"] is forbidden by the CORS specification. Olla rejects this combination at startup with a fatal error. Always list explicit origins when enabling credentials.
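If you generate configuration programmatically, a pre-flight check can catch this combination before Olla does. The sketch below assumes the cors block has already been loaded into a Python dict with the field names shown above; validate_cors is a hypothetical helper, not an Olla API.

```python
def validate_cors(cors: dict) -> None:
    """Raise ValueError for credentials combined with a wildcard origin."""
    if not cors.get("enabled", False):
        return  # CORS is off, nothing to validate
    origins = cors.get("allowed_origins", [])
    if cors.get("allow_credentials", False) and "*" in origins:
        raise ValueError(
            "allow_credentials: true cannot be combined with a wildcard "
            "origin; list explicit origins instead"
        )


if __name__ == "__main__":
    validate_cors({
        "enabled": True,
        "allowed_origins": ["https://my-dashboard.example.com"],
        "allow_credentials": True,
    })
    print("configuration accepted")
```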
Exposed headers¶
When exposed_headers is left empty (the default), Olla automatically exposes the full X-Olla-* response header set to browser clients:
- X-Olla-Endpoint, X-Olla-Model, X-Olla-Backend-Type, X-Olla-Request-ID, X-Olla-Response-Time
- Routing headers: X-Olla-Routing-Strategy, X-Olla-Routing-Decision, X-Olla-Routing-Reason
- Sticky session headers: X-Olla-Sticky-Session, X-Olla-Sticky-Key-Source, X-Olla-Session-ID
This means browser JavaScript can read routing and model metadata without any additional configuration. Override by listing specific headers in exposed_headers if you want to restrict what the browser can access.
Environment variable overrides¶
| Variable | Type | Example |
|---|---|---|
| OLLA_SERVER_CORS_ENABLED | bool | true |
| OLLA_SERVER_CORS_ALLOWED_ORIGINS | comma-separated | https://app.example.com,https://admin.example.com |
| OLLA_SERVER_CORS_ALLOWED_METHODS | comma-separated | GET,POST,OPTIONS |
| OLLA_SERVER_CORS_ALLOWED_HEADERS | comma-separated | Authorization,Content-Type |
| OLLA_SERVER_CORS_EXPOSED_HEADERS | comma-separated | X-Olla-Model,X-Olla-Endpoint |
| OLLA_SERVER_CORS_ALLOW_CREDENTIALS | bool | true |
| OLLA_SERVER_CORS_MAX_AGE | int (seconds) | 600 |
No spaces in comma-separated values
Env var lists use commas with no surrounding spaces: https://a.com,https://b.com not https://a.com, https://b.com.
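A quick way to catch stray spaces before deploying is to lint the value yourself. This sketch does not reproduce how Olla parses the variable; entries_with_stray_whitespace is a hypothetical helper that simply flags list items with leading or trailing whitespace.

```python
def entries_with_stray_whitespace(raw: str):
    """Return the comma-separated items that carry surrounding whitespace."""
    return [item for item in raw.split(",") if item != item.strip()]


if __name__ == "__main__":
    good = "https://a.com,https://b.com"
    bad = "https://a.com, https://b.com"
    print(entries_with_stray_whitespace(good))
    print(entries_with_stray_whitespace(bad))
```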
See Configuration Reference for the full field reference.
Secrets Resolution¶
Credential values in auth: and headers: blocks support two forms:
- ${VAR}: resolved from the environment at startup. An unset variable with no :-default is a fatal error, so misconfigured auth surfaces before the server starts accepting traffic.
- _file fields (token_file, key_file, username_file, password_file): read the secret from a file path and trim whitespace. This is the standard pattern for Docker Secrets and Kubernetes mounted volumes.
Setting both the inline field and its _file sibling is a fatal startup error.
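The resolution rules above can be sketched as follows. This is a model of the documented behaviour under stated assumptions, not Olla's implementation: the expand and resolve_secret names are hypothetical, the variable-name pattern is an assumption, and the default is applied only when the variable is unset.

```python
import os
import re
from pathlib import Path

# Assumed syntax: ${NAME} or ${NAME:-default}
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand(value, env=None):
    """Expand ${VAR} references; fail if a variable is unset with no default."""
    env = os.environ if env is None else env

    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise ValueError(f"environment variable {name} is not set")

    return _VAR.sub(repl, value)


def resolve_secret(inline=None, file_path=None, env=None):
    """Resolve a credential from an inline value or its _file sibling."""
    if inline is not None and file_path is not None:
        raise ValueError("set the inline field or its _file sibling, not both")
    if file_path is not None:
        # File contents are trimmed, matching the description above.
        return Path(file_path).read_text().strip()
    if inline is not None:
        return expand(inline, env)
    return None


if __name__ == "__main__":
    print(expand("${REGION:-local}", env={}))
```

Failing on an unset variable at load time, rather than sending an empty credential later, is what makes misconfiguration visible before traffic arrives.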
Secrets Management¶
Configuration Files¶
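Protect configuration files: make them readable only by the user that runs Olla, and keep files that contain secrets out of version control.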
Environment Variables¶
Use environment variables for sensitive data, and reference them from configuration with the ${VAR} form described under Secrets Resolution.
Security Monitoring¶
Key Metrics¶
Monitor for security issues:
- Rate Limit Hits: Potential abuse
- Error Rates: Potential attacks
- Request Patterns: Unusual activity
- Circuit Breaker: Endpoint failures
- Response Times: DoS indicators
Alerting Thresholds¶
Set alerts for:
- Rate limit violations > 10/minute
- Error rate > 5%
- Circuit breaker trips > 3/hour
- Response time > 10s
- Memory usage > 80%
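These thresholds translate directly into a comparison table. The sketch below is one way to encode them; the metric names are hypothetical placeholders for whatever your monitoring system exports, and breached is an illustrative helper.

```python
# Thresholds from the list above; metric names are assumptions.
THRESHOLDS = {
    "rate_limit_violations_per_minute": 10,
    "error_rate_percent": 5,
    "circuit_breaker_trips_per_hour": 3,
    "response_time_seconds": 10,
    "memory_usage_percent": 80,
}


def breached(metrics: dict):
    """Return the sorted names of metrics that exceed their threshold."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0) > limit
    )


if __name__ == "__main__":
    sample = {"error_rate_percent": 7.5, "memory_usage_percent": 64}
    print(breached(sample))
```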
Common Security Issues¶
1. Exposed Internal Endpoints¶
Risk: Information disclosure
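Mitigation: Block /internal/* and /version at the reverse proxy or firewall so they are reachable only from trusted networks.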
2. No Rate Limiting¶
Risk: DoS attacks
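Mitigation: Set global_requests_per_minute and per_ip_requests_per_minute under server.rate_limits to match your expected traffic.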
3. Large Request Acceptance¶
Risk: Resource exhaustion
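Mitigation: Set max_body_size and max_header_size under server.request_limits to the smallest values your legitimate requests need.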
4. Public Bind Address¶
Risk: Unintended exposure
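Mitigation: Bind to localhost unless network access is required, and place a reverse proxy or firewall in front of any 0.0.0.0 binding.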
Security Checklist¶
Production deployment checklist:
- Configure rate limiting
- Set request size limits
- Use reverse proxy with TLS
- Restrict bind address appropriately
- Configure firewall rules
- Run as non-root user
- Protect configuration files
- Enable request logging
- Monitor security metrics
- Regular security updates
- Implement log rotation
- Set up alerting
- Document security procedures
- Enable CORS only if browser clients connect directly; use explicit origins in production
Incident Response¶
Rate Limit Violations¶
When detecting abuse:
- Check logs for source IPs
- Verify legitimate vs malicious traffic
- Adjust rate limits if needed
- Block malicious IPs at firewall
- Document the incident
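For the first step, structured JSON logs make it straightforward to rank source addresses. The sketch below assumes one JSON object per line and a client_ip field; that field name is hypothetical, so substitute whatever your request logs actually contain.

```python
import json
from collections import Counter


def top_sources(log_lines, field="client_ip", limit=5):
    """Count requests per source address in JSON-formatted log lines."""
    counts = Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict) and field in record:
            counts[str(record[field])] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        '{"client_ip": "203.0.113.7", "status": 429}',
        '{"client_ip": "203.0.113.7", "status": 429}',
        '{"client_ip": "198.51.100.2", "status": 200}',
    ]
    for ip, hits in top_sources(sample):
        print(ip, hits)
```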
Circuit Breaker Trips¶
When endpoints fail:
- Check endpoint health directly
- Review error logs
- Verify network connectivity
- Check for attacks on backends
- Implement additional monitoring
Compliance Considerations¶
Data Protection¶
- Olla doesn't store request/response data
- Logs may contain metadata
- Configure log retention appropriately
- Implement log encryption if required
Network Segmentation¶
- Deploy in appropriate network zones
- Use private networks for backend communication
- Implement network policies
- Regular security assessments
Next Steps¶
- Performance Best Practices - Optimise safely
- Monitoring Guide - Security monitoring
- Configuration Reference - Security settings