Ollama API

Proxy endpoints for Ollama instances. All Ollama API endpoints are available through the /olla/ollama/ prefix.

Endpoints Overview

Method   URI                           Description
GET      /olla/ollama/                 Health check
POST     /olla/ollama/api/generate    Generate completion
POST     /olla/ollama/api/chat        Chat completion
POST     /olla/ollama/api/embeddings  Generate embeddings
GET      /olla/ollama/api/tags        List local models
POST     /olla/ollama/api/show        Show model information
POST     /olla/ollama/api/pull        Pull a model
POST     /olla/ollama/api/push        Push a model
POST     /olla/ollama/api/copy        Copy a model
DELETE   /olla/ollama/api/delete      Delete a model
POST     /olla/ollama/api/create      Create a model
GET      /olla/ollama/api/ps          List running models

OpenAI-Compatible Endpoints

Method   URI                                Description
GET      /olla/ollama/v1/models             List models (OpenAI format)
POST     /olla/ollama/v1/chat/completions   Chat completion (OpenAI format)
POST     /olla/ollama/v1/completions        Text completion (OpenAI format)
POST     /olla/ollama/v1/embeddings         Generate embeddings (OpenAI format)
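
Listing models in the OpenAI format is a plain GET. A minimal example against the same local Olla instance used throughout:

curl -X GET http://localhost:40114/olla/ollama/v1/models

The response should follow the OpenAI list shape, with one object per model under data.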

POST /olla/ollama/api/generate

Generate a text completion using Ollama's native API.

Request

curl -X POST http://localhost:40114/olla/ollama/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:latest",
    "prompt": "Why is the sky blue?",
    "stream": false,
    "options": {
      "temperature": 0.7,
      "top_p": 0.9,
      "max_tokens": 200
    }
  }'

Response

{
  "model": "llama3.2:latest",
  "created_at": "2024-01-15T10:30:00Z",
  "response": "The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters Earth's atmosphere, it collides with gas molecules. Blue light waves are shorter and scatter more than other colors, making the sky appear blue to our eyes.",
  "done": true,
  "context": [1234, 5678, ...],
  "total_duration": 5250000000,
  "load_duration": 1200000000,
  "prompt_eval_count": 8,
  "prompt_eval_duration": 350000000,
  "eval_count": 45,
  "eval_duration": 3700000000
}

Streaming Response

When "stream": true:

{"model":"llama3.2:latest","created_at":"2024-01-15T10:30:00Z","response":"The","done":false}
{"model":"llama3.2:latest","created_at":"2024-01-15T10:30:00Z","response":" sky","done":false}
{"model":"llama3.2:latest","created_at":"2024-01-15T10:30:00Z","response":" appears","done":false}
...
{"model":"llama3.2:latest","created_at":"2024-01-15T10:30:01Z","response":"","done":true,"total_duration":5250000000}

POST /olla/ollama/api/chat

Chat completion with conversation history.

Request

curl -X POST http://localhost:40114/olla/ollama/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:latest",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "stream": false,
    "options": {
      "temperature": 0.7
    }
  }'

Response

{
  "model": "llama3.2:latest",
  "created_at": "2024-01-15T10:30:00Z",
  "message": {
    "role": "assistant",
    "content": "The capital of France is Paris."
  },
  "done": true,
  "total_duration": 2150000000,
  "load_duration": 1200000000,
  "prompt_eval_count": 10,
  "prompt_eval_duration": 150000000,
  "eval_count": 8,
  "eval_duration": 800000000
}
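
Because the messages array carries the entire conversation, a follow-up turn resends the prior exchange. A sketch of a second request, where the assistant message is simply the response received above:

curl -X POST http://localhost:40114/olla/ollama/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:latest",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"},
      {"role": "assistant", "content": "The capital of France is Paris."},
      {"role": "user", "content": "What is its population?"}
    ],
    "stream": false
  }'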

POST /olla/ollama/v1/chat/completions

OpenAI-compatible chat completion endpoint.

Request

curl -X POST http://localhost:40114/olla/ollama/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:latest",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is 2+2?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 100,
    "stream": false
  }'

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1705334400,
  "model": "llama3.2:latest",
  "system_fingerprint": "fp_abc123",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "2 + 2 equals 4."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

Streaming Response

When "stream": true:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"llama3.2:latest","choices":[{"index":0,"delta":{"role":"assistant","content":"2"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"llama3.2:latest","choices":[{"index":0,"delta":{"content":" +"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1705334400,"model":"llama3.2:latest","choices":[{"index":0,"delta":{"content":" 2"},"finish_reason":null}]}

data: [DONE]
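
To turn the event stream back into plain text, strip the data: prefix and join the deltas. A sketch assuming GNU sed, grep, and jq are installed (-N keeps curl from buffering):

curl -sN -X POST http://localhost:40114/olla/ollama/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:latest", "messages": [{"role": "user", "content": "What is 2+2?"}], "stream": true}' \
  | sed -u 's/^data: //' \
  | grep --line-buffered -v '^\[DONE\]' \
  | jq --unbuffered -rj '.choices[0].delta.content // empty'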

POST /olla/ollama/api/embeddings

Generate embeddings for text input.

Request

curl -X POST http://localhost:40114/olla/ollama/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "prompt": "The quick brown fox jumps over the lazy dog"
  }'

Response

{
  "embedding": [0.123, -0.456, 0.789, ...],
  "model": "nomic-embed-text",
  "total_duration": 150000000,
  "load_duration": 50000000,
  "prompt_eval_count": 9
}
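
The OpenAI-compatible variant at /olla/ollama/v1/embeddings takes an input field rather than prompt. A sketch of the equivalent request; the response should follow the OpenAI embedding-list shape:

curl -X POST http://localhost:40114/olla/ollama/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": "The quick brown fox jumps over the lazy dog"
  }'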

GET /olla/ollama/api/tags

List all locally available models.

Request

curl -X GET http://localhost:40114/olla/ollama/api/tags

Response

{
  "models": [
    {
      "name": "llama3.2:latest",
      "model": "llama3.2:latest",
      "modified_at": "2024-01-15T08:00:00Z",
      "size": 4089352192,
      "digest": "sha256:abc123def456",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": ["llama"],
        "parameter_size": "3.2B",
        "quantization_level": "Q4_0"
      }
    },
    {
      "name": "mistral:latest",
      "model": "mistral:latest",
      "modified_at": "2024-01-14T12:00:00Z",
      "size": 4113859584,
      "digest": "sha256:def789ghi012",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "mistral",
        "families": ["mistral"],
        "parameter_size": "7B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}
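
For scripting, jq (assumed installed) extracts the model names directly:

curl -s http://localhost:40114/olla/ollama/api/tags | jq -r '.models[].name'

Given the response above, this prints llama3.2:latest and mistral:latest, one per line.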

POST /olla/ollama/api/show

Get detailed information about a specific model.

Request

curl -X POST http://localhost:40114/olla/ollama/api/show \
  -H "Content-Type: application/json" \
  -d '{
    "name": "llama3.2:latest"
  }'

Response

{
  "modelfile": "FROM llama3.2.Q4_0.gguf\nPARAMETER temperature 0.7\nPARAMETER top_p 0.9",
  "parameters": "temperature 0.7\ntop_p 0.9",
  "template": "{{ .System }}\nUser: {{ .Prompt }}\nAssistant:",
  "details": {
    "parent_model": "",
    "format": "gguf",
    "family": "llama",
    "families": ["llama"],
    "parameter_size": "3.2B",
    "quantization_level": "Q4_0"
  },
  "model_info": {
    "general.architecture": "llama",
    "general.file_type": "Q4_0",
    "general.parameter_count": 3200000000,
    "general.quantization_version": 2
  }
}

Model Management

The following endpoints are proxied straight through to the backend and are typically used for model management; a sample pull request follows the list:

  • POST /olla/ollama/api/pull - Download a model from Ollama registry
  • POST /olla/ollama/api/push - Upload a model to Ollama registry
  • POST /olla/ollama/api/copy - Create a copy of a model
  • DELETE /olla/ollama/api/delete - Remove a model
  • POST /olla/ollama/api/create - Create a model from a Modelfile
  • GET /olla/ollama/api/ps - List currently loaded models
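
For example, pulling a model through the proxy. A minimal sketch; with "stream": false the endpoint returns a single final status object instead of incremental progress lines:

curl -X POST http://localhost:40114/olla/ollama/api/pull \
  -H "Content-Type: application/json" \
  -d '{
    "name": "llama3.2:latest",
    "stream": false
  }'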

Request Headers

All requests are forwarded with these additional headers:

  • X-Olla-Request-ID - Unique request identifier
  • X-Forwarded-For - Client IP address
  • X-Forwarded-Host - Original host
  • X-Forwarded-Proto - Original protocol

Response Headers

All responses include the following headers; an inspection example follows the list:

  • X-Olla-Endpoint - Backend endpoint name (e.g., "local-ollama")
  • X-Olla-Model - Model used for the request
  • X-Olla-Backend-Type - Always "ollama" for these endpoints
  • X-Olla-Response-Time - Total processing time
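
A quick way to see these headers without the response body (a sketch; -D - dumps response headers to stdout):

curl -s -o /dev/null -D - http://localhost:40114/olla/ollama/api/tags | grep -i '^x-olla'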