Security Best Practices

This guide covers security considerations and best practices for deploying Olla in production environments.

Default Security Configuration
server:
  rate_limits:
    global_requests_per_minute: 1000
    per_ip_requests_per_minute: 100
  request_limits:
    max_body_size: 104857600  # 100MB
    max_header_size: 1048576  # 1MB
Key Settings:

Rate limiting is enabled by default

Request size limits prevent abuse

No built-in client authentication (use a reverse proxy)

Environment Variables: OLLA_SERVER_GLOBAL_RATE_LIMIT, OLLA_SERVER_PER_IP_RATE_LIMIT, OLLA_SERVER_MAX_BODY_SIZE, OLLA_SERVER_MAX_HEADER_SIZE

Security Principles

Olla follows these security principles:

Defence in Depth: Multiple layers of security controls
Least Privilege: Minimal permissions required
Fail Secure: Safe defaults when errors occur
Zero Trust: Verify all requests

Network Security

Bind Address Configuration

Control network exposure carefully:

# Development - local only
server:
  host: "localhost"  # Only local connections
  port: 40114

# Production - controlled exposure
server:
  host: "0.0.0.0"   # Listen on all network interfaces
  port: 40114

Recommendations:

Use localhost unless network access is required
Deploy behind a reverse proxy for internet exposure
Use firewall rules to restrict access
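To make the difference between these host values concrete, here is a small Go sketch of a hypothetical pre-flight check that classifies a server.host value. It assumes the value is passed unchanged to Go's net.Listen, where an empty host also binds every interface. The exposure function is illustrative and is not part of Olla.

```go
package main

import (
	"fmt"
	"net"
)

// exposure describes which interfaces a listener bound to host would accept
// connections on. Hypothetical helper; it is not part of Olla.
func exposure(host string) string {
	switch host {
	case "":
		// net.Listen("tcp", ":40114") binds every interface.
		return "all interfaces"
	case "localhost":
		return "loopback only"
	}
	ip := net.ParseIP(host)
	switch {
	case ip == nil:
		return "hostname: depends on DNS resolution"
	case ip.IsUnspecified(): // 0.0.0.0 or ::
		return "all interfaces"
	case ip.IsLoopback(): // 127.0.0.0/8 or ::1
		return "loopback only"
	default:
		return "single interface " + ip.String()
	}
}

func main() {
	for _, h := range []string{"localhost", "127.0.0.1", "0.0.0.0", "10.0.1.5"} {
		fmt.Printf("%-10s -> %s\n", h, exposure(h))
	}
}
```

Binding to one specific internal address (the "single interface" case) can be a middle ground between localhost and 0.0.0.0 when only one network should reach Olla.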

TLS/HTTPS Configuration

Olla doesn't handle TLS directly. Use a reverse proxy:

nginx example:

server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    location / {
        proxy_pass http://localhost:40114;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Caddy example:

api.example.com {
    reverse_proxy localhost:40114
}

Firewall Rules

Restrict access at the network level:

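# Allow only from trusted networks (plus loopback, so a reverse proxy on the same host can still connect)
iptables -A INPUT -p tcp --dport 40114 -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 40114 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 40114 -j DROP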

# Or use UFW
ufw allow from 10.0.0.0/8 to any port 40114
ufw deny 40114

Rate Limiting

Protect against abuse and DoS attacks:

Global Rate Limits

Prevent system overload:

server:
  rate_limits:
    global_requests_per_minute: 5000  # Total system capacity
    health_requests_per_minute: 1000  # Monitoring endpoints

Per-Client Rate Limits

Prevent a single client from consuming a disproportionate share of capacity:

server:
  rate_limits:
    per_ip_requests_per_minute: 60  # Strict for public APIs
    burst_size: 10                  # Small burst allowance

Rate Limit Strategies

| Deployment Type | Global RPM | Per-IP RPM | Burst |
|---|---|---|---|
| Internal API | 10000 | 1000 | 100 |
| Public API | 5000 | 60 | 10 |
| Development | 1000 | 100 | 50 |
| High Security | 1000 | 20 | 5 |

Request Validation

Size Limits

Prevent resource exhaustion attacks:

server:
  request_limits:
    max_body_size: 10485760  # 10MB - adjust based on needs
    max_header_size: 131072  # 128KB - usually sufficient

Guidelines:

Set limits based on legitimate use cases
Smaller limits for public APIs
Monitor rejected requests
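In Go's net/http, limits like these are usually enforced with http.MaxBytesReader for the body and http.Server.MaxHeaderBytes for headers. The sketch below applies that pattern with the 10MB and 128KB values from the example above. It is an illustration under that assumption, not Olla's source, and limitBody, consume, newServer and statusFor are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

const (
	maxBodySize   = 10 << 20  // max_body_size: 10485760 (10MB)
	maxHeaderSize = 128 << 10 // max_header_size: 131072 (128KB)
)

// limitBody makes any read past maxBytes of the request body fail.
func limitBody(maxBytes int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, maxBytes)
		next.ServeHTTP(w, r)
	})
}

// consume reads the body and answers 413 if the limit was exceeded.
func consume(w http.ResponseWriter, r *http.Request) {
	n, err := io.Copy(io.Discard, r.Body)
	var tooLarge *http.MaxBytesError
	if errors.As(err, &tooLarge) {
		http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
		return
	}
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "read %d bytes", n)
}

// newServer shows where the header limit is applied; it is not started here.
func newServer(h http.Handler) *http.Server {
	return &http.Server{
		Addr:           "localhost:40114",
		Handler:        limitBody(maxBodySize, h),
		MaxHeaderBytes: maxHeaderSize,
	}
}

// statusFor sends a POST with bodySize bytes through a handler limited to limit bytes.
func statusFor(limit int64, bodySize int) int {
	h := limitBody(limit, http.HandlerFunc(consume))
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(strings.Repeat("x", bodySize)))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(16, 8))                                     // 200
	fmt.Println(statusFor(16, 64))                                    // 413
	fmt.Println(newServer(http.HandlerFunc(consume)).MaxHeaderBytes) // 131072
}
```

Rejected requests show up as 413 responses. These are the "rejected requests" worth tracking when tuning the limits.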

Input Validation

Olla validates:

Request size limits
Header size limits
URL format and structure
HTTP method restrictions

Access Control

Internal Endpoints

Restrict access to internal endpoints:

# These endpoints should not be public:
# /internal/health
# /internal/status
# /internal/process
# /version

nginx protection:

location /internal/ {
    allow 10.0.0.0/8;
    deny all;
    proxy_pass http://localhost:40114;
}

location = /version {
    allow 10.0.0.0/8;
    deny all;
    proxy_pass http://localhost:40114;
}
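If clients reach Olla directly rather than through nginx, the same allowlist can be written as application middleware. This Go sketch makes three assumptions: the trusted range is 10.0.0.0/8 plus loopback, r.RemoteAddr is the real client address, and r.URL.Path has already been cleaned. Behind a reverse proxy, RemoteAddr is the proxy itself, so enforce the rule at the proxy instead. guardInternal and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"net/netip"
	"strings"
)

// trusted mirrors the nginx allow list above, plus loopback.
var trusted = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("127.0.0.0/8"),
}

// isInternalPath matches the endpoints that should not be public.
func isInternalPath(p string) bool {
	return strings.HasPrefix(p, "/internal/") || p == "/version"
}

// clientTrusted reports whether the connection's remote address is allowed.
func clientTrusted(remoteAddr string) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return false
	}
	addr, err := netip.ParseAddr(host)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 as 10.0.0.1
	for _, p := range trusted {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

// guardInternal returns 403 for internal paths requested by untrusted clients.
func guardInternal(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if isInternalPath(r.URL.Path) && !clientTrusted(r.RemoteAddr) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// status runs one request through the guard and returns the response code.
func status(path, remoteAddr string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, path, nil)
	req.RemoteAddr = remoteAddr
	rec := httptest.NewRecorder()
	guardInternal(ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(status("/internal/status", "203.0.113.9:51000")) // 403
	fmt.Println(status("/internal/status", "10.1.2.3:51000"))    // 200
	fmt.Println(status("/version", "203.0.113.9:51000"))         // 403
}
```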

Backend Endpoint Security

Secure your LLM endpoints:

discovery:
  static:
    endpoints:
      # Use internal networks
      - url: "http://10.0.1.10:11434"  # Internal IP
        name: "internal-ollama"
        type: "ollama"

      # Avoid public endpoints when possible
      # If required, ensure they have authentication

Deployment Security

Container Security

When using Docker:

# Run as non-root user
FROM golang:1.24-alpine AS builder
# ... build steps ...

FROM alpine:latest
RUN adduser -D -g '' appuser
USER appuser
COPY --from=builder /app/olla /usr/local/bin/

# docker-compose.yml
services:
  olla:
    image: ghcr.io/thushan/olla:latest
    user: "1000:1000"  # Non-root UID/GID
    read_only: true     # Read-only filesystem
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if needed for low ports

Kubernetes Security

Security policies for Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: olla
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
  containers:
  - name: olla
    image: ghcr.io/thushan/olla:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
    resources:
      limits:
        memory: "512Mi"
        cpu: "1000m"
      requests:
        memory: "256Mi"
        cpu: "100m"

Process Isolation

Run Olla with minimal privileges:

# systemd service
[Service]
User=olla
Group=olla
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=yes
NoNewPrivileges=yes
ReadWritePaths=/var/log/olla

Logging and Auditing

Security Logging

Configure appropriate logging:

server:
  request_logging: true  # Log all requests

logging:
  level: "info"
  format: "json"  # Structured logs for analysis
  output: "stdout"

Avoid Logging Sensitive Data

Never log:

API keys or tokens
Request/response bodies with sensitive data
User credentials
Internal IP addresses in public logs

Audit Trail

Monitor these security events:

Rate limit violations
Request size limit violations
Circuit breaker trips
Failed health checks
Configuration changes

Upstream Response Header Stripping

Olla removes the following headers from upstream responses before returning them to clients:

Authorization
Proxy-Authorization
Set-Cookie
X-Api-Key
X-Auth-Token

Any header named in an endpoint's auth.header field or in its headers: map is also stripped from responses, so operator-supplied custom auth header names are protected even when they do not appear in the list above. Stripping these headers prevents backends from setting cookies on clients or reflecting credentials back through Olla.

Custom header names

If you configure a non-standard credential header (e.g. auth.header: X-My-Token), Olla strips X-My-Token from responses as well. No additional configuration is needed.

CORS

CORS is disabled by default. It only matters when a browser client (OpenWebUI, a custom dashboard, or a web app) connects directly to Olla. CLI tools, SDKs, and coding agents do not send an Origin header, so their requests are unaffected by this setting.

When to enable

Enable CORS when:

You are running a browser-based UI (e.g. OpenWebUI) that talks directly to Olla rather than through a reverse proxy that handles CORS itself.
You have a custom JavaScript dashboard consuming Olla's API.

Do not enable CORS when all clients are server-side or CLI-based: it adds no value and broadens the attack surface.

Permissive configuration (development)

Suitable for local development where any origin should be allowed:

server:
  cors:
    enabled: true
    allowed_origins:
      - "*"
    allowed_methods:
      - "GET"
      - "POST"
      - "OPTIONS"
    allowed_headers:
      - "*"
    max_age: 300

Locked-down configuration (production)

Restrict to known origins and allow only the request headers your UI actually sends:

server:
  cors:
    enabled: true
    allowed_origins:
      - "https://my-dashboard.example.com"
    allowed_methods:
      - "GET"
      - "POST"
      - "OPTIONS"
    allowed_headers:
      - "Authorization"
      - "Content-Type"
    allow_credentials: true
    max_age: 600

Credentials + wildcard origin

Setting allow_credentials: true alongside allowed_origins: ["*"] is forbidden by the CORS specification. Olla rejects this combination at startup with a fatal error. Always list explicit origins when enabling credentials.
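The startup check and the per-request origin decision can be sketched in Go as follows. CORSConfig mirrors a subset of the server.cors fields under hypothetical Go names. The logic follows the rules described here: exact origin matching, and a wildcard only without credentials. It is not Olla's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// CORSConfig mirrors part of server.cors (hypothetical Go field names).
type CORSConfig struct {
	Enabled          bool
	AllowedOrigins   []string
	AllowCredentials bool
}

// Validate rejects the combination the CORS specification forbids.
func (c CORSConfig) Validate() error {
	if c.Enabled && c.AllowCredentials && slices.Contains(c.AllowedOrigins, "*") {
		return errors.New(`cors: allow_credentials cannot be used with allowed_origins "*"`)
	}
	return nil
}

// AllowOrigin returns the Access-Control-Allow-Origin value for a request's
// Origin header, or "" to omit the header. When a specific origin is echoed,
// a real handler should also add "Vary: Origin" so caches key on it.
func (c CORSConfig) AllowOrigin(origin string) string {
	if !c.Enabled || origin == "" {
		return "" // non-browser clients send no Origin and are unaffected
	}
	for _, o := range c.AllowedOrigins {
		if o == "*" {
			return "*"
		}
		if o == origin {
			return origin
		}
	}
	return ""
}

func main() {
	bad := CORSConfig{Enabled: true, AllowedOrigins: []string{"*"}, AllowCredentials: true}
	fmt.Println(bad.Validate())

	prod := CORSConfig{Enabled: true, AllowedOrigins: []string{"https://my-dashboard.example.com"}, AllowCredentials: true}
	fmt.Printf("%q\n", prod.AllowOrigin("https://my-dashboard.example.com"))
	fmt.Printf("%q\n", prod.AllowOrigin("https://evil.example.net"))
}
```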

Exposed headers

When exposed_headers is left empty (the default), Olla automatically exposes the full X-Olla-* response header set to browser clients:

X-Olla-Endpoint, X-Olla-Model, X-Olla-Backend-Type, X-Olla-Request-ID, X-Olla-Response-Time
Routing headers: X-Olla-Routing-Strategy, X-Olla-Routing-Decision, X-Olla-Routing-Reason
Sticky session headers: X-Olla-Sticky-Session, X-Olla-Sticky-Key-Source, X-Olla-Session-ID

This means browser JavaScript can read routing and model metadata without any additional configuration. Override by listing specific headers in exposed_headers if you want to restrict what the browser can access.

Environment variable overrides

| Variable | Type | Example |
|---|---|---|
| `OLLA_SERVER_CORS_ENABLED` | bool | `true` |
| `OLLA_SERVER_CORS_ALLOWED_ORIGINS` | comma-separated | `https://app.example.com,https://admin.example.com` |
| `OLLA_SERVER_CORS_ALLOWED_METHODS` | comma-separated | `GET,POST,OPTIONS` |
| `OLLA_SERVER_CORS_ALLOWED_HEADERS` | comma-separated | `Authorization,Content-Type` |
| `OLLA_SERVER_CORS_EXPOSED_HEADERS` | comma-separated | `X-Olla-Model,X-Olla-Endpoint` |
| `OLLA_SERVER_CORS_ALLOW_CREDENTIALS` | bool | `true` |
| `OLLA_SERVER_CORS_MAX_AGE` | int (seconds) | `600` |

No spaces in comma-separated values

Env var lists use commas with no surrounding spaces: https://a.com,https://b.com not https://a.com, https://b.com.

See Configuration Reference for the full field reference.

Secrets Resolution

Credential values in auth: and headers: blocks support two forms:

${VAR}: resolved from the environment at startup. ${VAR:-default} supplies a fallback; an unset variable with no default is a fatal error, so misconfigured auth surfaces before the server starts accepting traffic.
_file fields (token_file, key_file, username_file, password_file): read the secret from a file path and trim surrounding whitespace. This is the standard pattern for Docker Secrets and Kubernetes mounted volumes.

Setting both the inline field and its _file sibling is a fatal startup error.

Secrets Management

Configuration Files

Protect configuration files:

# Restrict file permissions
chmod 600 config.yaml
chown olla:olla config.yaml

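Environment Variables

Keep secrets out of config.yaml: reference them as ${VAR} in auth: and headers: blocks (see Secrets Resolution) and supply the values through the environment. Non-secret settings can also be overridden with OLLA_* variables instead of editing config.yaml:

# Overrides server.port in config.yaml
export OLLA_SERVER_PORT=40114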

Security Monitoring

Key Metrics

Monitor for security issues:

Rate Limit Hits: Potential abuse
Error Rates: Potential attacks
Request Patterns: Unusual activity
Circuit Breaker: Endpoint failures
Response Times: DoS indicators

Alerting Thresholds

Set alerts for:

Rate limit violations > 10/minute
Error rate > 5%
Circuit breaker trips > 3/hour
Response time > 10s
Memory usage > 80%

Common Security Issues

1. Exposed Internal Endpoints

Risk: Information disclosure

Mitigation:

location /internal/ {
    return 403;  # Or restrict by IP
}

2. No Rate Limiting

Risk: DoS attacks

Mitigation:

server:
  rate_limits:
    global_requests_per_minute: 1000
    per_ip_requests_per_minute: 60

3. Large Request Acceptance

Risk: Resource exhaustion

Mitigation:

server:
  request_limits:
    max_body_size: 5242880  # 5MB

4. Public Bind Address

Risk: Unintended exposure

Mitigation:

server:
  host: "localhost"  # Local only

Security Checklist

Production deployment checklist:

Incident Response

Rate Limit Violations

When detecting abuse:

Check logs for source IPs
Verify legitimate vs malicious traffic
Adjust rate limits if needed
Block malicious IPs at firewall
Document the incident

Circuit Breaker Trips

When endpoints fail:

Check endpoint health directly
Review error logs
Verify network connectivity
Check for attacks on backends
Implement additional monitoring

Compliance Considerations

Data Protection

Olla doesn't store request/response data
Logs may contain metadata
Configure log retention appropriately
Implement log encryption if required

Network Segmentation

Deploy in appropriate network zones
Use private networks for backend communication
Implement network policies
Perform regular security assessments

Next Steps

Performance Best Practices - Optimise safely
Monitoring Guide - Security monitoring
Configuration Reference - Security settings