LM Studio Integration

Home lmstudio.ai
Since Olla v0.0.12
Type lm-studio (use in endpoint configuration)
Profile lmstudio.yaml (see latest)
Features
  • Proxy Forwarding
  • Health Check (native)
  • Model Unification
  • Model Detection & Normalisation
  • OpenAI API Compatibility
  • Native Anthropic Messages API (v0.4.1+)
Unsupported
  • Model Management (loading/unloading)
  • Instance Management
  • Model Download
Attributes
  • OpenAI Compatible
  • Single Model Concurrency
  • Preloaded Models
Prefixes
  • /lmstudio
  • /lm-studio
  • /lm_studio
(see Routing Prefixes)
Endpoints See below

Configuration

Basic Setup

Add LM Studio to your Olla configuration:

discovery:
  static:
    endpoints:
      - url: "http://localhost:1234"
        name: "local-lm-studio"
        type: "lm-studio"
        priority: 90
        model_url: "/api/v0/models"
        health_check_url: "/v1/models"
        check_interval: 2s
        check_timeout: 1s
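
Once the endpoint is registered, you can confirm that Olla is proxying it by listing models through the LM Studio prefix. A minimal Python sketch using the requests library, assuming Olla is listening on port 40114 as in the examples on this page and that at least one model is loaded in LM Studio:

import requests

# List the models Olla exposes for LM Studio endpoints via the /olla/lmstudio prefix.
response = requests.get("http://localhost:40114/olla/lmstudio/v1/models", timeout=5)
response.raise_for_status()

# The response follows the OpenAI-style list format: {"object": "list", "data": [...]}
for model in response.json().get("data", []):
    print(model["id"])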

Multiple LM Studio Instances

Run multiple LM Studio servers on different ports:

discovery:
  static:
    endpoints:
      - url: "http://localhost:1234"
        name: "lm-studio-1"
        type: "lm-studio"
        priority: 100

      - url: "http://localhost:1235"
        name: "lm-studio-2"
        type: "lm-studio"
        priority: 90

      - url: "http://192.168.1.10:1234"
        name: "lm-studio-remote"
        type: "lm-studio"
        priority: 50

Anthropic Messages API Support

LM Studio v0.4.1+ natively supports the Anthropic Messages API, so Olla can forward Anthropic-format requests directly without translation overhead (passthrough mode). This was added specifically for Claude Code integration, which can now use LM Studio through Olla without any translation middleware.

When Olla detects that an LM Studio endpoint supports native Anthropic format (via the anthropic_support section in config/profiles/lmstudio.yaml), it will bypass the Anthropic-to-OpenAI translation pipeline and forward requests directly to /v1/messages on the backend.

Profile configuration (from config/profiles/lmstudio.yaml):

api:
  anthropic_support:
    enabled: true
    messages_path: /v1/messages
    token_count: false
    min_version: "0.4.1"

Key details:

  • Minimum LM Studio version: v0.4.1
  • Token counting (/v1/messages/count_tokens): Not supported
  • Passthrough mode is automatic; no client-side configuration is needed
  • Responses include the X-Olla-Mode: passthrough header when passthrough is active
  • Falls back to translation mode if passthrough conditions are not met

For more information, see API Translation and Anthropic API Reference.
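
As a quick check of passthrough mode, you can send an Anthropic-format request through Olla and inspect the response headers. A minimal Python sketch; the /olla/lmstudio/v1/messages path and the model name are assumptions for illustration, so adjust them to match your setup:

import requests

# Anthropic Messages API request forwarded through Olla's LM Studio prefix.
response = requests.post(
    "http://localhost:40114/olla/lmstudio/v1/messages",
    json={
        "model": "llama-3.2-3b-instruct",   # use a model that is loaded in LM Studio
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
response.raise_for_status()

# When passthrough is active, Olla adds the X-Olla-Mode: passthrough header.
print("Mode:", response.headers.get("X-Olla-Mode"))

# Anthropic-format responses carry a list of content blocks.
print(response.json()["content"][0]["text"])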

Endpoints Supported

The following endpoints are supported by the LM Studio integration profile:

Path                    Description
/v1/models              List Models & Health Check
/v1/chat/completions    Chat Completions (OpenAI format)
/v1/completions         Text Completions (OpenAI format)
/v1/embeddings          Generate Embeddings
/api/v0/models          Legacy Models Endpoint

Usage Examples

Chat Completion

curl -X POST http://localhost:40114/olla/lmstudio/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7
  }'

Streaming Response

curl -X POST http://localhost:40114/olla/lm-studio/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-7b-instruct",
    "messages": [
      {"role": "user", "content": "Write a short poem about coding"}
    ],
    "stream": true
  }'

List Available Models

curl http://localhost:40114/olla/lm_studio/v1/models

LM Studio Specifics

Model Loading Behaviour

LM Studio differs from other backends:

  • Preloaded Models: Models must be loaded in LM Studio before use
  • Single Concurrency: Only one request processed at a time
  • Fast Response: No model loading delay during requests

Resource Configuration

The LM Studio profile includes optimised resource settings:

characteristics:
  timeout: 3m
  max_concurrent_requests: 1  # LM Studio handles one at a time
  streaming_support: true

Memory Requirements

LM Studio uses quantised models with reduced memory requirements:

Model Size    Memory Required    Recommended Memory
70B           42GB               52GB
34B           20GB               25GB
13B           8GB                10GB
7B            5GB                6GB
3B            2GB                3GB

Profile Customisation

To customise LM Studio behaviour, create config/profiles/lmstudio-custom.yaml. See Profile Configuration for detailed explanations of each section.

Example Customisation

name: lm-studio
version: "1.0"

# Add custom prefixes
routing:
  prefixes:
    - lmstudio
    - lm-studio
    - lm_studio
    - studio      # Add custom prefix

# Adjust timeouts for slower hardware
characteristics:
  timeout: 5m     # Increase from 3m

# Modify resource limits
resources:
  concurrency_limits:
    - min_memory_gb: 0
      max_concurrent: 1  # Always single-threaded

See Profile Configuration for complete customisation options.

Troubleshooting

Models Not Appearing

Issue: Models don't show in Olla's model list

Solution:

  1. Ensure models are loaded in the LM Studio UI
  2. Check LM Studio is running on the configured port
  3. Verify with: curl http://localhost:1234/v1/models
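
If the direct check works but models still don't appear through Olla, compare the two model lists side by side. A small diagnostic sketch in Python, assuming the defaults used elsewhere on this page (LM Studio on port 1234, Olla on port 40114):

import requests

def model_ids(url):
    """Return the set of model IDs from an OpenAI-style /v1/models response."""
    data = requests.get(url, timeout=5).json()
    return {m["id"] for m in data.get("data", [])}

direct = model_ids("http://localhost:1234/v1/models")                    # LM Studio directly
via_olla = model_ids("http://localhost:40114/olla/lmstudio/v1/models")   # through Olla

print("Loaded in LM Studio:", sorted(direct))
print("Visible via Olla:   ", sorted(via_olla))
print("Missing via Olla:   ", sorted(direct - via_olla))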

Request Timeout

Issue: Requests timeout on large models

Solution: Increase timeout in profile:

characteristics:
  timeout: 10m  # Increase for large models

Connection Refused

Issue: Cannot connect to LM Studio

Solution:

  1. Verify LM Studio is running
  2. Check "Enable CORS" in LM Studio settings
  3. Ensure firewall allows the port
  4. Test direct connection: curl http://localhost:1234/v1/models

Single Request Limitation

Issue: Concurrent requests fail

Solution: LM Studio processes one request at a time. Use priority load balancing to route overflow to other endpoints:

proxy:
  load_balancer: "priority"

discovery:
  static:
    endpoints:
      - url: "http://localhost:1234"
        name: "lm-studio"
        type: "lm-studio"
        priority: 100

      - url: "http://localhost:11434"
        name: "ollama-backup"
        type: "ollama"
        priority: 50  # Fallback for concurrent requests
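
To see why the fallback matters, you can fire two requests at the LM Studio prefix at once: because LM Studio serves one request at a time, the second request's wall time includes the first one's generation. A rough Python sketch; the model name is an assumption, so use one that is actually loaded:

import threading
import time

import requests

URL = "http://localhost:40114/olla/lmstudio/v1/chat/completions"
PAYLOAD = {
    "model": "llama-3.2-3b-instruct",  # assumed model name
    "messages": [{"role": "user", "content": "Count to ten."}],
}

def send(label):
    start = time.time()
    requests.post(URL, json=PAYLOAD, timeout=300)
    print(f"{label} finished after {time.time() - start:.1f}s")

# Two concurrent requests: expect the second to take roughly twice as long.
threads = [threading.Thread(target=send, args=(f"request-{i}",)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

The priority configuration above is intended to absorb this kind of overflow on the lower-priority backend rather than queueing everything behind LM Studio.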

Best Practices

1. Use for Interactive Sessions

LM Studio excels at:

  • Development and testing
  • Interactive chat sessions
  • Quick model switching via UI

2. Configure Appropriate Timeouts

proxy:
  response_timeout: 600s  # 10 minutes for long generations
  read_timeout: 300s      # 5 minutes read timeout

3. Monitor Memory Usage

LM Studio shows real-time memory usage in its UI. Monitor this to:

  • Prevent out-of-memory errors
  • Choose appropriate model sizes
  • Optimise quantisation levels

4. Combine with Other Backends

Use LM Studio for development and Ollama/vLLM for production:

discovery:
  static:
    endpoints:
      # Development - high priority
      - url: "http://localhost:1234"
        name: "lm-studio-dev"
        type: "lm-studio"
        priority: 100

      # Production - lower priority
      - url: "http://localhost:11434"
        name: "ollama-prod"
        type: "ollama"
        priority: 50

Integration with Tools

OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:40114/olla/lmstudio/v1",
    api_key="not-needed"  # LM Studio doesn't require API keys
)

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
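
The same setup also supports streaming, mirroring the earlier curl streaming example. A short sketch with the standard OpenAI SDK streaming interface:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:40114/olla/lmstudio/v1",
    api_key="not-needed",
)

# Stream tokens as they are generated.
stream = client.chat.completions.create(
    model="llama-3.2-3b-instruct",
    messages=[{"role": "user", "content": "Write a short poem about coding"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()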

LangChain

from langchain.llms import OpenAI

llm = OpenAI(
    openai_api_base="http://localhost:40114/olla/lm-studio/v1",
    openai_api_key="not-needed",
    model_name="mistral-7b-instruct"
)

Continue.dev

Configure Continue to use Olla with LM Studio:

{
  "models": [{
    "title": "LM Studio via Olla",
    "provider": "openai",
    "model": "llama-3.2-3b-instruct",
    "apiBase": "http://localhost:40114/olla/lmstudio/v1"
  }]
}

Next Steps