LM Studio API¶
Proxy endpoints for LM Studio servers, available through multiple prefixes: /olla/lmstudio/, /olla/lm-studio/, and /olla/lm_studio/.
Endpoints Overview¶
| Method | URI | Description |
|---|---|---|
| GET | /olla/lmstudio/v1/models | List available models |
| POST | /olla/lmstudio/v1/chat/completions | Chat completion |
| POST | /olla/lmstudio/v1/completions | Text completion |
| POST | /olla/lmstudio/v1/embeddings | Generate embeddings |
| GET | /olla/lmstudio/api/v0/models | Legacy models endpoint |
Alternative Prefixes¶
All endpoints are available through these equivalent prefixes:
- /olla/lmstudio/*
- /olla/lm-studio/*
- /olla/lm_studio/*
GET /olla/lmstudio/v1/models¶
List all models available in LM Studio.
Request¶
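curl http://localhost:40114/olla/lmstudio/v1/models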
Response¶
{
"object": "list",
"data": [
{
"id": "TheBloke/phi-3-mini-4k-instruct-GGUF/phi-3-mini-4k-instruct.Q4_K_M.gguf",
"object": "model",
"created": 1705334400,
"owned_by": "TheBloke",
"permission": [],
"root": "phi-3-mini",
"parent": null,
"max_context_length": 4096
},
{
"id": "TheBloke/gemma-2b-instruct-GGUF/gemma-2b-instruct.Q4_K_M.gguf",
"object": "model",
"created": 1705334400,
"owned_by": "TheBloke",
"permission": [],
"root": "gemma-2b",
"parent": null,
"max_context_length": 8192
}
]
}
POST /olla/lmstudio/v1/chat/completions¶
OpenAI-compatible chat completion endpoint.
Request¶
curl -X POST http://localhost:40114/olla/lmstudio/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "phi-3-mini",
"messages": [
{
"role": "system",
"content": "You are a helpful coding assistant."
},
{
"role": "user",
"content": "Write a Python function to calculate fibonacci numbers"
}
],
"temperature": 0.7,
"max_tokens": 500,
"stream": false
}'
Response¶
{
"id": "chatcmpl-lmstudio-abc123",
"object": "chat.completion",
"created": 1705334400,
"model": "phi-3-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Here's a Python function to calculate Fibonacci numbers:\n\n```python\ndef fibonacci(n):\n if n <= 0:\n return []\n elif n == 1:\n return [0]\n elif n == 2:\n return [0, 1]\n \n fib_sequence = [0, 1]\n for i in range(2, n):\n fib_sequence.append(fib_sequence[-1] + fib_sequence[-2])\n \n return fib_sequence\n\n# Example usage\nprint(fibonacci(10)) # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]\n```\n\nThis function generates the first n Fibonacci numbers."
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 28,
"completion_tokens": 142,
"total_tokens": 170
}
}
Streaming Response¶
When "stream": true
:
data: {"id":"chatcmpl-lmstudio-abc123","object":"chat.completion.chunk","created":1705334400,"model":"phi-3-mini","choices":[{"index":0,"delta":{"role":"assistant","content":"Here's"},"finish_reason":null}]}
data: {"id":"chatcmpl-lmstudio-abc123","object":"chat.completion.chunk","created":1705334400,"model":"phi-3-mini","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}
data: {"id":"chatcmpl-lmstudio-abc123","object":"chat.completion.chunk","created":1705334400,"model":"phi-3-mini","choices":[{"index":0,"delta":{"content":" Python"},"finish_reason":null}]}
...
data: {"id":"chatcmpl-lmstudio-abc123","object":"chat.completion.chunk","created":1705334401,"model":"phi-3-mini","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
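To try streaming from the command line, curl's -N (--no-buffer) flag prints each chunk as it arrives:

curl -N -X POST http://localhost:40114/olla/lmstudio/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'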
POST /olla/lmstudio/v1/completions¶
Text completion endpoint for non-chat models.
Request¶
curl -X POST http://localhost:40114/olla/lmstudio/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemma-2b",
"prompt": "The meaning of life is",
"max_tokens": 100,
"temperature": 0.8,
"top_p": 0.9,
"stream": false
}'
Response¶
{
"id": "cmpl-lmstudio-xyz789",
"object": "text_completion",
"created": 1705334400,
"model": "gemma-2b",
"choices": [
{
"text": " a question that has puzzled philosophers, theologians, and thinkers throughout human history. While there is no single definitive answer, many perspectives suggest that meaning comes from personal growth, relationships, contribution to society, and the pursuit of happiness and fulfillment.",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 6,
"completion_tokens": 48,
"total_tokens": 54
}
}
POST /olla/lmstudio/v1/embeddings¶
Generate embeddings for text input.
Request¶
curl -X POST http://localhost:40114/olla/lmstudio/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "nomic-embed-text",
"input": "The quick brown fox jumps over the lazy dog"
}'
Response¶
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [0.123, -0.456, 0.789, ...],
"index": 0
}
],
"model": "nomic-embed-text",
"usage": {
"prompt_tokens": 9,
"total_tokens": 9
}
}
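As in the OpenAI embeddings API, the input field can typically also be an array of strings to embed several texts in one request (assuming the backend supports batched input); each embedding is returned in data with its own index:

curl -X POST http://localhost:40114/olla/lmstudio/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["The quick brown fox", "jumps over the lazy dog"]
  }'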
GET /olla/lmstudio/api/v0/models¶
Legacy models endpoint for backward compatibility.
Request¶
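curl http://localhost:40114/olla/lmstudio/api/v0/models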
Response¶
{
"models": [
{
"id": "phi-3-mini",
"object": "model",
"owned_by": "microsoft",
"permission": [],
"engines": {
"chat_completions": {
"context_length": 4096,
"max_tokens": 4096,
"tokenizer": "phi-3"
}
}
},
{
"id": "gemma-2b",
"object": "model",
"owned_by": "google",
"permission": [],
"engines": {
"completions": {
"context_length": 8192,
"max_tokens": 8192,
"tokenizer": "gemma"
}
}
}
]
}
Model Loading¶
LM Studio typically preloads models, resulting in:
- Fast initial response times
- A single model active at a time
- No model loading delays
Request Options¶
Common Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | float | 0.7 | Sampling temperature (0.0-2.0) |
| top_p | float | 0.95 | Nucleus sampling threshold |
| top_k | integer | 40 | Top-k sampling |
| max_tokens | integer | - | Maximum tokens to generate |
| stop | array | - | Stop sequences |
| presence_penalty | float | 0 | Penalize tokens already present (-2.0 to 2.0) |
| frequency_penalty | float | 0 | Penalize tokens by repetition frequency (-2.0 to 2.0) |
| repetition_penalty | float | 1.1 | Repetition penalty (0.0-2.0) |
| seed | integer | - | Random seed for reproducibility |
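For example, a request that pins several of these parameters, using a fixed seed and a stop sequence for reproducible, bounded output:

curl -X POST http://localhost:40114/olla/lmstudio/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3-mini",
    "messages": [{"role": "user", "content": "List three colours"}],
    "temperature": 0.2,
    "max_tokens": 100,
    "stop": ["\n\n"],
    "seed": 42
  }'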
LM Studio-Specific Parameters¶
| Parameter | Type | Description |
|---|---|---|
| n_predict | integer | Alternative to max_tokens |
| mirostat | integer | Mirostat sampling mode (0 = off, 1 = Mirostat, 2 = Mirostat 2.0) |
| mirostat_tau | float | Mirostat target entropy |
| mirostat_eta | float | Mirostat learning rate |
| grammar | string | BNF-style grammar for constrained generation |
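As a sketch, a grammar can constrain the model to a fixed set of answers; this example assumes the backend accepts llama.cpp-style GBNF grammars:

curl -X POST http://localhost:40114/olla/lmstudio/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-3-mini",
    "messages": [{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    "grammar": "root ::= \"yes\" | \"no\""
  }'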
Performance Characteristics¶
- Concurrency: Single request at a time (LM Studio limitation)
- Context Window: Model-dependent (typically 4K-32K)
- Response Time: Fast (models preloaded in memory)
- Streaming: Fully supported with low latency
Request Headers¶
All requests are forwarded with:
- X-Olla-Request-ID - Unique request identifier
- X-Forwarded-For - Client IP address
- X-Forwarded-Host - Original host
Response Headers¶
All responses include:
- X-Olla-Endpoint - Backend endpoint name (e.g., "local-lm-studio")
- X-Olla-Model - Model used for the request
- X-Olla-Backend-Type - Always "lm-studio" for these endpoints
- X-Olla-Response-Time - Total processing time
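To see these headers, include curl's -i flag, which prints response headers before the body:

curl -i http://localhost:40114/olla/lmstudio/v1/models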