
LiteLLM API

Proxy endpoints for the LiteLLM gateway. All LiteLLM API endpoints are available under the /olla/litellm/ prefix.

Endpoints Overview

Core Endpoints (Always Available)

These endpoints are available in all LiteLLM deployments:

Method  URI                                 Description
GET     /olla/litellm/health                Health check
GET     /olla/litellm/v1/models             List available models
POST    /olla/litellm/v1/chat/completions   Chat completion
POST    /olla/litellm/v1/completions        Text completion
POST    /olla/litellm/v1/embeddings         Generate embeddings

Optional Endpoints (Kubernetes/Advanced Deployments)

Method  URI                              Description
GET     /olla/litellm/health/readiness   Readiness probe (K8s)
GET     /olla/litellm/health/liveness    Liveness probe (K8s)
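
Kubernetes calls these probes directly, but they are also useful in deployment scripts. A minimal readiness poll, sketched in Python with the requests library (host and port are taken from the examples below):

import time
import requests

URL = "http://localhost:40114/olla/litellm/health/readiness"

def wait_until_ready(timeout_s: float = 60.0) -> bool:
    """Poll the readiness probe until it returns 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(URL, timeout=5).status_code == 200:
                return True
        except requests.ConnectionError:
            pass  # gateway not reachable yet
        time.sleep(2)
    return False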

Database-Required Endpoints

Note: These endpoints only work when LiteLLM is configured with a PostgreSQL database backend:

Method  URI                             Description        Requirements
POST    /olla/litellm/key/generate      Generate API key   Database + Admin
GET     /olla/litellm/key/info          Get key info       Database
GET     /olla/litellm/user/info         User information   Database
GET     /olla/litellm/team/info         Team information   Database
GET     /olla/litellm/spend/calculate   Calculate spend    Database
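
Key generation passes straight through to LiteLLM's management API. A hedged sketch with the requests library: the Bearer master-key auth and body fields follow common LiteLLM usage but can vary by version, and LITELLM_MASTER_KEY is a placeholder:

import requests

resp = requests.post(
    "http://localhost:40114/olla/litellm/key/generate",
    # Placeholder master key; LiteLLM's admin auth depends on your config.
    headers={"Authorization": "Bearer LITELLM_MASTER_KEY"},
    # Field names follow common LiteLLM usage and may differ by version.
    json={"models": ["gpt-4"], "duration": "30d"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())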

POST /olla/litellm/v1/chat/completions

Create a chat completion using the OpenAI-compatible request format. LiteLLM routes the request to any of its 100+ supported providers.

Request

curl -X POST http://localhost:40114/olla/litellm/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user", 
        "content": "Explain quantum computing in simple terms."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 500,
    "stream": false
  }'
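
The same request can be sent with the official OpenAI Python client by pointing its base_url at Olla's LiteLLM prefix. A minimal sketch; the api_key value is a placeholder, since whether LiteLLM enforces keys depends on your configuration:

from openai import OpenAI

# Point the client at Olla's LiteLLM prefix rather than api.openai.com.
client = OpenAI(
    base_url="http://localhost:40114/olla/litellm/v1",
    api_key="sk-placeholder",  # placeholder; required only if LiteLLM enforces keys
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    temperature=0.7,
    max_tokens=500,
)
print(response.choices[0].message.content)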

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1705320600,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is like having a super-powered calculator that can try many solutions at once..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 120,
    "total_tokens": 145
  }
}

Streaming Response

When "stream": true:

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1705320600,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1705320600,"model":"gpt-4","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: [DONE]
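
With the OpenAI Python client, the same stream can be consumed chunk by chunk. A sketch, with the same base_url and placeholder-key assumptions as the earlier example:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:40114/olla/litellm/v1", api_key="sk-placeholder")

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
    stream=True,
)
for chunk in stream:
    # Each SSE chunk carries an incremental delta; the [DONE] sentinel is
    # consumed by the client and never surfaces here.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()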

GET /olla/litellm/v1/models

List all available models from configured providers.

Request

curl http://localhost:40114/olla/litellm/v1/models

Response

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1698959748,
      "owned_by": "openai"
    },
    {
      "id": "claude-3-opus",
      "object": "model",
      "created": 1698959748,
      "owned_by": "anthropic"
    },
    {
      "id": "gemini-pro",
      "object": "model",
      "created": 1698959748,
      "owned_by": "google"
    },
    {
      "id": "llama-70b",
      "object": "model",
      "created": 1698959748,
      "owned_by": "together_ai"
    }
  ]
}
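
A short sketch that fetches the catalogue and groups model ids by their owning provider, using the requests library:

import requests

resp = requests.get("http://localhost:40114/olla/litellm/v1/models", timeout=10)
resp.raise_for_status()

# Group ids by provider for a quick overview of what the gateway exposes.
by_provider: dict[str, list[str]] = {}
for model in resp.json()["data"]:
    by_provider.setdefault(model["owned_by"], []).append(model["id"])

for provider, ids in sorted(by_provider.items()):
    print(f"{provider}: {', '.join(sorted(ids))}")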

POST /olla/litellm/v1/embeddings

Generate embeddings using available embedding models.

Request

curl -X POST http://localhost:40114/olla/litellm/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-ada-002",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283,
        -0.005336422,
        -0.00040876168,
        ...
      ]
    }
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
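
Embedding vectors are usually consumed by comparing them. A sketch that embeds two strings and computes their cosine similarity (requests library; the model name is taken from the example above):

import math
import requests

URL = "http://localhost:40114/olla/litellm/v1/embeddings"

def embed(text: str) -> list[float]:
    resp = requests.post(URL, json={"model": "text-embedding-ada-002", "input": text}, timeout=30)
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(embed("The quick brown fox"), embed("A fast auburn fox")))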

GET /olla/litellm/health

Check LiteLLM gateway health status.

Request

curl http://localhost:40114/olla/litellm/health

Response

{
  "status": "healthy",
  "models": 25,
  "providers": [
    "openai",
    "anthropic",
    "bedrock",
    "gemini",
    "together_ai"
  ]
}
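
A deployment script can gate traffic on this payload. A sketch, assuming the response shape shown above:

import requests

health = requests.get("http://localhost:40114/olla/litellm/health", timeout=5).json()

# Fail fast if the gateway is unhealthy or a provider we depend on is missing.
if health.get("status") != "healthy":
    raise SystemExit(f"gateway unhealthy: {health}")
if "anthropic" not in health.get("providers", []):
    raise SystemExit("required provider 'anthropic' is not configured")
print(f"healthy with {health.get('models', 0)} models")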

Provider-Specific Models

LiteLLM supports provider-prefixed model names; a request using one is sketched after the lists below:

OpenAI Models

  • gpt-4, gpt-4-turbo, gpt-3.5-turbo
  • text-embedding-ada-002, text-embedding-3-small

Anthropic Models

  • claude-3-opus, claude-3-sonnet, claude-3-haiku
  • claude-2.1, claude-2, claude-instant

Google Models

  • gemini-pro, gemini-pro-vision
  • palm-2, chat-bison

AWS Bedrock Models

  • bedrock/claude-3-opus, bedrock/claude-3-sonnet
  • bedrock/llama2-70b, bedrock/mistral-7b

Together AI Models

  • together_ai/llama-3-70b, together_ai/mixtral-8x7b
  • together_ai/qwen-72b, together_ai/deepseek-coder
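
A request using a prefixed id looks the same as any other chat completion. A sketch; which prefixed ids actually resolve depends on the providers configured in your LiteLLM instance:

import requests

resp = requests.post(
    "http://localhost:40114/olla/litellm/v1/chat/completions",
    json={
        # The provider prefix pins the request to Bedrock instead of letting
        # LiteLLM choose a route by bare model name.
        "model": "bedrock/claude-3-sonnet",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])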

Response Headers

All LiteLLM responses proxied through Olla include tracking headers:

X-Olla-Endpoint: litellm-gateway
X-Olla-Backend-Type: litellm
X-Olla-Model: gpt-4
X-Olla-Request-ID: req_abc123
X-Olla-Response-Time: 2.341s
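
The headers are readable from any HTTP client alongside the normal body. A sketch with the requests library:

import requests

resp = requests.post(
    "http://localhost:40114/olla/litellm/v1/chat/completions",
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]},
    timeout=60,
)

# Olla's tracking headers are useful for logging and request correlation.
for name in ("X-Olla-Endpoint", "X-Olla-Backend-Type", "X-Olla-Model",
             "X-Olla-Request-ID", "X-Olla-Response-Time"):
    print(f"{name}: {resp.headers.get(name)}")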

Error Handling

LiteLLM errors are passed through with additional context:

{
  "error": {
    "message": "Rate limit exceeded for model gpt-4",
    "type": "rate_limit_error",
    "code": 429,
    "provider": "openai"
  }
}
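
Clients can key retry logic off the passed-through error. A hedged sketch that backs off on 429 responses, using the error shape shown above; the retry schedule is an illustrative choice:

import time
import requests

def chat_with_retry(payload: dict, retries: int = 3) -> dict:
    """POST a chat completion, retrying with exponential backoff on 429s."""
    url = "http://localhost:40114/olla/litellm/v1/chat/completions"
    for attempt in range(retries + 1):
        resp = requests.post(url, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        err = resp.json().get("error", {})
        print(f"rate limited by {err.get('provider', 'unknown')}: {err.get('message')}")
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("rate limit did not clear after retries")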

See Also