Endpoint Authentication¶

Olla can attach outbound authentication headers to requests forwarded to a backend endpoint. This is for authenticating Olla to the backend. It has no bearing on how clients authenticate to Olla.

When to Use It¶

Most local inference servers (Ollama, llama.cpp without --api-key) run without authentication. You need an auth: block when:

A backend is started with an API key flag (e.g. vllm --api-key, llama-server --api-key)
A backend sits behind a reverse proxy that requires credentials
A LiteLLM gateway has a master key configured

Supported Types¶

`bearer`¶

Sends Authorization: Bearer <token>.

discovery:
  static:
    endpoints:
      - url: "http://gpu-server:8000"
        name: "vllm-gpu"
        type: "vllm"
        auth:
          type: bearer
          token: "sk-my-secret-token"

`api_key`¶

Sends a custom header (default X-Api-Key). Use header: to override. The raw credential value is written to the header with no scheme prefix -- use bearer if the backend expects Authorization: Bearer <token>.

      - url: "http://analytics-llm:9000"
        name: "analytics-gw"
        type: "openai-compatible"
        auth:
          type: api_key
          key: "${ANALYTICS_API_KEY}"
          header: "X-Api-Key"   # optional, this is the default

`basic`¶

Sends Authorization: Basic <base64(user:pass)>.

      - url: "http://internal-llm:8080"
        name: "llamacpp-basic"
        type: "llamacpp"
        auth:
          type: basic
          username: "admin"
          password: "s3cr3t"
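
To make the three types concrete, here is a minimal Python sketch of the header each one produces. It illustrates the wire format described above and is not Olla's actual implementation; the function name build_auth_header and the dict-shaped auth block are assumptions made for the example.

```python
import base64


def build_auth_header(auth: dict) -> tuple[str, str]:
    """Return the (header name, header value) pair for an auth block.

    Hypothetical helper: mirrors the documented behaviour of each type,
    not Olla's internal code.
    """
    kind = auth["type"]
    if kind == "bearer":
        return "Authorization", f"Bearer {auth['token']}"
    if kind == "api_key":
        # Raw credential, no scheme prefix; header defaults to X-Api-Key.
        return auth.get("header", "X-Api-Key"), auth["key"]
    if kind == "basic":
        raw = f"{auth['username']}:{auth['password']}".encode("utf-8")
        return "Authorization", "Basic " + base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unknown auth type: {kind!r}")


if __name__ == "__main__":
    print(build_auth_header({"type": "bearer", "token": "sk-my-secret-token"}))
    print(build_auth_header({"type": "api_key", "key": "abc"}))
    print(build_auth_header({"type": "basic", "username": "admin", "password": "s3cr3t"}))
```

Comparing this output with what the backend logs for a rejected request is a quick way to confirm you picked the right type.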

Environment Variable Interpolation¶

Hardcoding credentials in config files is an antipattern. Use ${VAR} placeholders instead:

auth:
  type: bearer
  token: "${VLLM_API_KEY}"

Olla expands these at startup using ExpandStrict. If the variable is unset and has no default, the process exits with a clear error. This prevents silent misconfiguration.

Default Values¶

Use ${VAR:-default} for optional credentials or fallback values:

auth:
  type: api_key
  key: "${CUSTOM_API_KEY:-changeme}"

Defaults in production

:-default is useful for development. In production, prefer requiring the variable explicitly so a missing secret surfaces as a startup failure rather than silently using a fallback.
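
The expansion rules in this section can be sketched in Python as follows. This is an illustration of the documented behaviour, not the ExpandStrict code itself; the function name expand_strict, the regular expression, and the treatment of an empty variable (the shell convention, where :-default also applies to an empty value) are assumptions.

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_strict(text: str, env=None) -> str:
    """Expand placeholders, failing on an unset variable with no default."""
    env = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if value:
            return value
        if default is not None:
            # Assumption: like the shell, the default also covers an empty value.
            return default
        if value is None:
            raise ValueError(f"environment variable {name} is not set and has no default")
        return value  # set but empty, no default given

    return _PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    print(expand_strict("${CUSTOM_API_KEY:-changeme}", env={}))
```

The point of the strict variant is the raise: a missing secret stops the process instead of producing an empty credential.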

File-Based Secrets (`_file` suffix)¶

Each credential field has a _file sibling that reads the value from a file path. This is the standard pattern for Docker Secrets and Kubernetes mounted secrets, where a volume provides a file containing a single secret value.

auth:
  type: bearer
  token_file: "/run/secrets/vllm_api_key"

The file contents are trimmed of leading/trailing whitespace. Setting both the inline field and the _file field is a fatal startup error.
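
As a rough illustration of the inline-versus-file rule, the following Python sketch resolves one credential field. It is not Olla's code; resolve_secret and the dict-shaped auth block are hypothetical names used only for this example.

```python
from pathlib import Path


def resolve_secret(auth: dict, field: str) -> str:
    """Resolve a credential from `field` or its `<field>_file` sibling."""
    inline = auth.get(field)
    file_path = auth.get(f"{field}_file")

    if inline and file_path:
        raise ValueError(f"set either {field} or {field}_file, not both")
    if file_path:
        try:
            # Trim surrounding whitespace, e.g. a trailing newline.
            return Path(file_path).read_text(encoding="utf-8").strip()
        except OSError as exc:
            raise ValueError(f"cannot read {field}_file {file_path!r}: {exc}") from exc
    if inline:
        return inline
    raise ValueError(f"neither {field} nor {field}_file is set")
```

Trimming matters in practice because secret files written with echo or an editor usually end in a newline, which would otherwise become part of the header value.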

Available `_file` Fields¶

Auth type	Inline field	File field
`bearer`	`token`	`token_file`
`api_key`	`key`	`key_file`
`basic`	`username`	`username_file`
`basic`	`password`	`password_file`

Docker Compose Example¶

# docker-compose.yml
services:
  olla:
    image: ghcr.io/thushan/olla:latest
    secrets:
      - vllm_api_key
    volumes:
      - ./config.local.yaml:/app/config/config.local.yaml

secrets:
  vllm_api_key:
    file: ./secrets/vllm_api_key.txt

# config.local.yaml
discovery:
  static:
    endpoints:
      - url: "http://vllm:8000"
        name: "vllm"
        type: "vllm"
        auth:
          type: bearer
          token_file: "/run/secrets/vllm_api_key"

Kubernetes Secret Example¶

apiVersion: v1
kind: Secret
metadata:
  name: olla-backend-creds
stringData:
  vllm-token: "sk-my-token"
---
# In your Deployment's container spec, expose the secret as an env var:
env:
  - name: VLLM_API_KEY
    valueFrom:
      secretKeyRef:
        name: olla-backend-creds
        key: vllm-token

Then reference it from config:

auth:
  type: bearer
  token: "${VLLM_API_KEY}"

The `headers:` Escape Hatch¶

For backends that need authentication headers that don't fit bearer/api_key/basic, use the headers: map directly. Headers set here are copied verbatim on every forwarded request.

      - url: "http://custom-llm:9000"
        name: "custom"
        type: "openai-compatible"
        headers:
          X-Custom-Auth: "token abc123"
          X-Tenant-ID: "acme"

headers: and auth: can coexist. The auth: block sets the Authorization (or configured) header; headers: sets everything else.

Order of Precedence¶

When a forwarded request is assembled, headers are applied in this order:

1. Client request headers are stripped of hop-by-hop headers
2. headers: map values are set verbatim
3. auth: sets the credential header (overrides any headers: entry for the same name)

The auth: block intentionally wins over headers: for the credential header. This prevents an operator from accidentally overriding a resolved secret with a static headers: entry.
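
The ordering above can be sketched as three successive writes into one header map, where later writes win. This is a simplified Python model, not Olla's implementation: assemble_headers is a hypothetical name, the hop-by-hop set is the usual HTTP/1.1 list and may differ from what Olla strips, and any other inbound stripping is not modelled.

```python
# Assumption: the conventional HTTP/1.1 hop-by-hop headers.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def assemble_headers(client_headers: dict, static_headers: dict,
                     auth_header: tuple[str, str]) -> dict:
    """Apply client headers, then headers:, then auth: (last write wins)."""
    out = {}
    # Step 1: client headers, minus hop-by-hop headers.
    for name, value in client_headers.items():
        if name.lower() not in HOP_BY_HOP:
            out[name.lower()] = value
    # Step 2: the endpoint's headers: map, copied verbatim.
    for name, value in static_headers.items():
        out[name.lower()] = value
    # Step 3: the resolved credential from auth: overrides any earlier entry.
    name, value = auth_header
    out[name.lower()] = value
    return out
```

With a headers: entry of Authorization: static and a bearer auth: block, the forwarded request carries the bearer value, because step 3 runs last.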

Request and Response Headers¶

The precedence rules above apply to the request path (Olla to the backend). The response path (backend to your client) is handled separately, and the two do not interact.

Client request headers pass through to the backend untouched, with two exceptions: hop-by-hop headers are removed, and inbound Authorization and Cookie headers are stripped so that client credentials do not leak upstream. Configuring auth: or headers: on an endpoint does not strip or rewrite anything else a client sends.

On the response path, Olla strips a small set of headers the backend returns before forwarding to the client:

A static list: Authorization, Proxy-Authorization, Set-Cookie, X-Api-Key, X-Auth-Token
Any header name configured in that endpoint's auth: or headers: block

The second rule guards against reflection. If you inject X-Custom-Auth: <secret> toward a backend and that backend echoes the header back in its response, Olla removes it so the injected credential cannot leak back out. The strip is keyed on the header name you configured, not on anything the client sends, so custom client headers keep working as before.
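
The two response-side rules amount to a case-insensitive blocklist built from the static list plus the header names configured on the endpoint. A minimal Python sketch of that idea, under the assumption that matching is by lowercase name (strip_response_headers is a hypothetical helper, not an Olla function):

```python
# The static list from the section above, lowercased for comparison.
STATIC_STRIP = {
    "authorization", "proxy-authorization", "set-cookie",
    "x-api-key", "x-auth-token",
}


def strip_response_headers(response_headers: dict, configured_names) -> dict:
    """Drop statically blocked headers and any header name the endpoint injects."""
    blocked = STATIC_STRIP | {name.lower() for name in configured_names}
    return {
        name: value
        for name, value in response_headers.items()
        if name.lower() not in blocked
    }


if __name__ == "__main__":
    backend_response = {
        "Content-Type": "application/json",
        "X-Custom-Auth": "token abc123",  # echoed back by the backend
        "Set-Cookie": "session=1",
    }
    print(strip_response_headers(backend_response, ["X-Custom-Auth", "X-Tenant-ID"]))
```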

Fatal Startup Behaviour¶

Auth validation runs before the HTTP server starts. The process exits immediately on:

Unknown auth.type (must be bearer, api_key, or basic)
Both inline field and _file sibling set simultaneously
Neither inline nor _file set for a required field
${VAR} placeholder where VAR is unset and no :-default is provided
File in _file field that does not exist or cannot be read

This fail-fast behaviour is intentional: a proxy that silently starts without credentials and forwards unauthenticated requests to a protected backend is harder to debug than a startup error.
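
If you generate Olla configs from templates, a pre-flight check along these lines can catch the structural conditions before deployment. It is a hedged Python sketch covering only the first three conditions in the list; validate_auth and REQUIRED_FIELDS are hypothetical names, and Olla's own validation remains the authority.

```python
# Required credential fields per auth type, per the _file field table.
REQUIRED_FIELDS = {
    "bearer": ("token",),
    "api_key": ("key",),
    "basic": ("username", "password"),
}


def validate_auth(auth: dict) -> list[str]:
    """Return a list of problems; an empty list means the block looks valid."""
    kind = auth.get("type")
    if kind not in REQUIRED_FIELDS:
        return [f"unknown auth.type {kind!r} (must be bearer, api_key, or basic)"]

    errors = []
    for field in REQUIRED_FIELDS[kind]:
        has_inline = bool(auth.get(field))
        has_file = bool(auth.get(f"{field}_file"))
        if has_inline and has_file:
            errors.append(f"both {field} and {field}_file are set")
        elif not (has_inline or has_file):
            errors.append(f"neither {field} nor {field}_file is set")
    return errors
```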

Recipes¶

vLLM with `--api-key`¶

Start vLLM:

vllm serve meta-llama/Llama-3.1-8B-Instruct --api-key sk-my-key

Olla config:

      - url: "http://vllm-host:8000"
        name: "vllm-gpu"
        type: "vllm"
        auth:
          type: bearer
          token: "${VLLM_API_KEY}"

llama.cpp with `--api-key`¶

Start llama-server:

llama-server -m model.gguf --api-key sk-my-key

Olla config:

      - url: "http://llamacpp-host:8080"
        name: "llamacpp"
        type: "llamacpp"
        auth:
          type: bearer
          token: "${LLAMACPP_API_KEY}"

LiteLLM with Master Key¶

Start LiteLLM proxy:

litellm --config litellm_config.yaml --master_key sk-master

Olla config:

      - url: "http://litellm:4000"
        name: "litellm-gw"
        type: "litellm"
        auth:
          type: bearer
          token: "${LITELLM_MASTER_KEY}"

LiteLLM API key format

LiteLLM accepts the master key as a standard Authorization: Bearer header or as x-goog-api-key depending on the version and configuration. Use api_key auth with header: x-goog-api-key if bearer does not work for your deployment.
