Endpoint Authentication

Olla can attach outbound authentication headers to requests forwarded to a backend endpoint. This is for authenticating Olla to the backend. It has no bearing on how clients authenticate to Olla.

When to Use It

Most local inference servers (Ollama, llama.cpp without --api-key) run without authentication. You need endpoint auth when:

  • A backend is started with an API key flag (e.g. vllm serve --api-key, llama-server --api-key)
  • A backend sits behind a reverse proxy that requires credentials
  • A LiteLLM gateway has a master key configured

Supported Types

bearer

Sends Authorization: Bearer <token>.

discovery:
  static:
    endpoints:
      - url: "http://gpu-server:8000"
        name: "vllm-gpu"
        type: "vllm"
        auth:
          type: bearer
          token: "sk-my-secret-token"

api_key

Sends the credential in a custom header (default X-Api-Key). Use header: to override the header name. The raw credential value is written to the header with no scheme prefix. Use bearer instead if the backend expects Authorization: Bearer <token>.

      - url: "http://analytics-llm:9000"
        name: "analytics-gw"
        type: "openai-compatible"
        auth:
          type: api_key
          key: "${ANALYTICS_API_KEY}"
          header: "X-Api-Key"   # optional, this is the default

basic

Sends Authorization: Basic <base64(user:pass)>.

      - url: "http://internal-llm:8080"
        name: "llamacpp-basic"
        type: "llamacpp"
        auth:
          type: basic
          username: "admin"
          password: "s3cr3t"
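
To make the three header formats concrete, here is a minimal Python sketch of how each auth type maps to a header name and value. It is not Olla's implementation; the function name credential_header and the dict-shaped input are hypothetical, chosen only to mirror the YAML fields above.

```python
import base64


def credential_header(auth: dict) -> tuple[str, str]:
    """Return the (header name, header value) pair for one auth block.

    Hypothetical helper: it mirrors the documented header formats,
    not Olla's actual code.
    """
    kind = auth["type"]
    if kind == "bearer":
        return "Authorization", f"Bearer {auth['token']}"
    if kind == "api_key":
        # Raw value with no scheme prefix; header name defaults to X-Api-Key.
        return auth.get("header", "X-Api-Key"), auth["key"]
    if kind == "basic":
        pair = f"{auth['username']}:{auth['password']}".encode("utf-8")
        return "Authorization", "Basic " + base64.b64encode(pair).decode("ascii")
    raise ValueError(f"unknown auth type: {kind!r}")


basic = credential_header({"type": "basic", "username": "admin", "password": "s3cr3t"})
print(basic)  # ('Authorization', 'Basic YWRtaW46czNjcjN0')
```

Base64 is an encoding, not encryption, so basic credentials are only as private as the transport that carries them.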

Environment Variable Interpolation

Hardcoding credentials in config files is an antipattern. Use ${VAR} placeholders instead:

auth:
  type: bearer
  token: "${VLLM_API_KEY}"

Olla expands these at startup using ExpandStrict. If the variable is unset and has no default, the process exits with a clear error. This prevents silent misconfiguration.

Default Values

Use ${VAR:-default} for optional credentials or fallback values:

auth:
  type: api_key
  key: "${CUSTOM_API_KEY:-changeme}"

Defaults in production

:-default is useful for development. In production, prefer requiring the variable explicitly so a missing secret surfaces as a startup failure rather than silently using a fallback.
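
The expansion rules in this section can be sketched in a few lines of Python. This is an illustration of the documented behaviour, not the ExpandStrict implementation: the name expand_strict_sketch is hypothetical, and the sketch assumes the default applies only when the variable is unset (how a set-but-empty variable is treated is not covered here).

```python
import re

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_strict_sketch(value: str, env: dict) -> str:
    """Expand placeholders in value, failing loudly on unset variables."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            # Assumption: the default is used only when the variable is unset.
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _PLACEHOLDER.sub(substitute, value)


env = {"VLLM_API_KEY": "sk-123"}  # stand-in for os.environ
print(expand_strict_sketch("${VLLM_API_KEY}", env))              # sk-123
print(expand_strict_sketch("${CUSTOM_API_KEY:-changeme}", env))  # changeme
```

Calling expand_strict_sketch("${MISSING}", env) raises KeyError, which corresponds to the startup failure described above.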

File-Based Secrets (_file suffix)

Each credential field has a _file sibling that reads the value from a file path. This is the standard pattern for Docker Secrets and Kubernetes mounted secrets, where a volume provides a file containing a single secret value.

auth:
  type: bearer
  token_file: "/run/secrets/vllm_api_key"

The file contents are trimmed of leading/trailing whitespace. Setting both the inline field and the _file field is a fatal startup error.
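
As a rough illustration of the resolution rule, the Python sketch below picks between an inline field and its _file sibling and trims the file contents. The helper name resolve_secret and the dict input are hypothetical; Olla's own loader may differ in its details.

```python
from pathlib import Path


def resolve_secret(auth: dict, field: str) -> str:
    """Return the credential for field from the inline value or its _file sibling."""
    inline = auth.get(field)
    file_path = auth.get(f"{field}_file")
    if inline is not None and file_path is not None:
        raise ValueError(f"set either {field} or {field}_file, not both")
    if file_path is not None:
        # read_text raises OSError if the file is missing or unreadable.
        # strip() removes leading/trailing whitespace, including the trailing
        # newline most editors and secret mounts leave behind.
        return Path(file_path).read_text(encoding="utf-8").strip()
    if inline is not None:
        return inline
    raise ValueError(f"neither {field} nor {field}_file is set")
```

For example, resolve_secret({"type": "bearer", "token_file": "/run/secrets/vllm_api_key"}, "token") would return the trimmed contents of that file.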

Available _file Fields

| Auth type | Inline field | File field    |
|-----------|--------------|---------------|
| bearer    | token        | token_file    |
| api_key   | key          | key_file      |
| basic     | username     | username_file |
| basic     | password     | password_file |

Docker Compose Example

# docker-compose.yml
services:
  olla:
    image: ghcr.io/thushan/olla:latest
    secrets:
      - vllm_api_key
    volumes:
      - ./config.local.yaml:/app/config/config.local.yaml

secrets:
  vllm_api_key:
    file: ./secrets/vllm_api_key.txt

# config.local.yaml
discovery:
  static:
    endpoints:
      - url: "http://vllm:8000"
        name: "vllm"
        type: "vllm"
        auth:
          type: bearer
          token_file: "/run/secrets/vllm_api_key"

Kubernetes Secret Example

apiVersion: v1
kind: Secret
metadata:
  name: olla-backend-creds
stringData:
  vllm-token: "sk-my-token"
---
# In the Deployment's container spec, expose the Secret as an env var
# (or mount it as a volume and point token_file at the mounted path):
env:
  - name: VLLM_API_KEY
    valueFrom:
      secretKeyRef:
        name: olla-backend-creds
        key: vllm-token

Then reference it from config:

auth:
  type: bearer
  token: "${VLLM_API_KEY}"

The headers: Escape Hatch

For backends that need authentication headers that don't fit bearer/api_key/basic, use the headers: map directly. Headers set here are copied verbatim on every forwarded request.

      - url: "http://custom-llm:9000"
        name: "custom"
        type: "openai-compatible"
        headers:
          X-Custom-Auth: "token abc123"
          X-Tenant-ID: "acme"

headers: and auth: can coexist. The auth: block sets the Authorization (or configured) header; headers: sets everything else.

Order of Precedence

When a forwarded request is assembled, headers are applied in this order:

  1. Client request headers are stripped of hop-by-hop headers
  2. headers: map values are set verbatim
  3. auth: sets the credential header (overrides any headers: entry for the same name)

The auth: block intentionally wins over headers: for the credential header. This prevents an operator from accidentally overriding a resolved secret with a static headers: entry.
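
The three steps above can be sketched as a small Python function. This is an illustration only: build_upstream_headers is a hypothetical name, and the hop-by-hop set below is the standard HTTP/1.1 list, assumed rather than taken from Olla's source. Header names are compared case-insensitively by lower-casing them.

```python
# Standard HTTP/1.1 hop-by-hop headers (assumed; Olla's exact list may differ).
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def build_upstream_headers(client_headers, static_headers, credential=None):
    """Assemble outbound headers in the documented order of precedence."""
    # 1. Start from the client's headers, minus hop-by-hop headers.
    out = {k.lower(): v for k, v in client_headers.items()
           if k.lower() not in HOP_BY_HOP}
    # 2. Apply the endpoint's headers: map verbatim.
    for name, value in static_headers.items():
        out[name.lower()] = value
    # 3. Apply the auth: credential last so it wins on a name clash.
    if credential is not None:
        name, value = credential
        out[name.lower()] = value
    return out


result = build_upstream_headers(
    {"Connection": "keep-alive", "Accept": "application/json"},
    {"Authorization": "Bearer stale", "X-Tenant-ID": "acme"},
    ("Authorization", "Bearer sk-resolved"),
)
print(result["authorization"])  # Bearer sk-resolved
```

Because the credential is written last, a stale Authorization entry in headers: cannot shadow the secret resolved from auth:.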

Request and Response Headers

The precedence rules above apply to the request path (Olla to the backend). The response path (backend to your client) is handled separately, and the two do not interact.

Client request headers pass through to the backend unchanged, with two exceptions: hop-by-hop headers are removed, and inbound Authorization and Cookie headers are stripped so that client credentials do not leak upstream. Configuring auth: or headers: on an endpoint does not strip or rewrite anything else a client sends.

On the response path, Olla strips a small set of headers the backend returns before forwarding to the client:

  • A static list: Authorization, Proxy-Authorization, Set-Cookie, X-Api-Key, X-Auth-Token
  • Any header name configured in that endpoint's auth: or headers: block

The second rule guards against reflection. If you inject X-Custom-Auth: <secret> toward a backend and that backend echoes the header back in its response, Olla removes it so the injected credential cannot leak back out. The strip is keyed on the header name you configured, not on anything the client sends, so custom client headers keep working as before.
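
A minimal Python sketch of the response-side filter, under the assumption that matching is case-insensitive on the header name. The function name strip_response_headers is hypothetical; the static list is the one given above.

```python
# Static list from the documentation, lower-cased for comparison.
STATIC_STRIP = {
    "authorization", "proxy-authorization", "set-cookie",
    "x-api-key", "x-auth-token",
}


def strip_response_headers(backend_headers: dict, configured_names) -> dict:
    """Drop static and endpoint-configured header names from a backend response."""
    blocked = STATIC_STRIP | {name.lower() for name in configured_names}
    return {k: v for k, v in backend_headers.items() if k.lower() not in blocked}


cleaned = strip_response_headers(
    {"Content-Type": "application/json", "X-Custom-Auth": "token abc123",
     "Set-Cookie": "sid=1"},
    configured_names=["X-Custom-Auth", "X-Tenant-ID"],
)
print(cleaned)  # {'Content-Type': 'application/json'}
```

Here X-Custom-Auth is removed because the endpoint configured it, and Set-Cookie is removed because it is on the static list.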

Fatal Startup Behaviour

Auth validation runs before the HTTP server starts. The process exits immediately on:

  • Unknown auth.type (must be bearer, api_key, or basic)
  • Both inline field and _file sibling set simultaneously
  • Neither inline nor _file set for a required field
  • ${VAR} placeholder where VAR is unset and no :-default is provided
  • File in _file field that does not exist or cannot be read

This fail-fast behaviour is intentional: a proxy that silently starts without credentials and forwards unauthenticated requests to a protected backend is harder to debug than a startup error.
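
The first three checks in the list above can be approximated with the Python sketch below. It is a hypothetical validator, not Olla's code: validate_auth and REQUIRED_FIELDS are illustrative names, and the placeholder and file-readability checks are left out.

```python
# Required credential fields per auth type, as listed in the _file table.
REQUIRED_FIELDS = {
    "bearer": ["token"],
    "api_key": ["key"],
    "basic": ["username", "password"],
}


def validate_auth(auth: dict) -> list[str]:
    """Return a list of problems; an empty list means the block is valid."""
    kind = auth.get("type")
    if kind not in REQUIRED_FIELDS:
        return [f"unknown auth.type {kind!r} (must be bearer, api_key, or basic)"]
    problems = []
    for field in REQUIRED_FIELDS[kind]:
        has_inline = field in auth
        has_file = f"{field}_file" in auth
        if has_inline and has_file:
            problems.append(f"both {field} and {field}_file are set")
        elif not has_inline and not has_file:
            problems.append(f"neither {field} nor {field}_file is set")
    return problems


print(validate_auth({"type": "basic", "username": "admin"}))
# ['neither password nor password_file is set']
```

A fail-fast caller would run this for every endpoint before binding the listener and exit with a non-zero status if any list is non-empty.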

Recipes

vLLM with --api-key

Start vLLM:

vllm serve meta-llama/Llama-3.1-8B-Instruct --api-key sk-my-key

Olla config:

      - url: "http://vllm-host:8000"
        name: "vllm-gpu"
        type: "vllm"
        auth:
          type: bearer
          token: "${VLLM_API_KEY}"

llama.cpp with --api-key

Start llama-server:

llama-server -m model.gguf --api-key sk-my-key

Olla config:

      - url: "http://llamacpp-host:8080"
        name: "llamacpp"
        type: "llamacpp"
        auth:
          type: bearer
          token: "${LLAMACPP_API_KEY}"

LiteLLM with Master Key

Start LiteLLM proxy:

litellm --config litellm_config.yaml --master_key sk-master

Olla config:

      - url: "http://litellm:4000"
        name: "litellm-gw"
        type: "litellm"
        auth:
          type: bearer
          token: "${LITELLM_MASTER_KEY}"

LiteLLM API key format

LiteLLM accepts the master key as a standard Authorization: Bearer header or as x-goog-api-key depending on the version and configuration. Use api_key auth with header: x-goog-api-key if bearer does not work for your deployment.

See Also