LM Studio API: using the local chat completions endpoint

The LM Studio API is fully OpenAI-compatible. Point any existing SDK or HTTP client at http://localhost:1234/v1 and send requests exactly as you would to the hosted service — no cloud account, no per-token billing.

Top considerations

The LM Studio API mirrors the OpenAI Chat Completions contract at /v1/chat/completions. Set base_url="http://localhost:1234/v1" in the Python or Node.js SDK. The api_key field is required by the SDK but not validated by the local server — pass any non-empty string. The model field accepts any non-empty string because the server routes every request to the currently loaded model; for accurate logs, use the ID returned by GET /v1/models.

API contract overview

The LM Studio API implements the OpenAI Chat Completions schema at /v1/chat/completions. Request and response shapes are identical to the hosted OpenAI API, which means any existing SDK integration works after a single base URL change.

When LM Studio's server mode is active, it listens for HTTP requests at http://localhost:1234 by default. The /v1/ prefix on every route is intentional — it mirrors OpenAI's URL structure so that client libraries configured with a base_url parameter require no further adjustment. The request body for a chat completion is a JSON object with a model string, a messages array of role-content pairs, and optional sampling parameters. The response is a JSON object with a choices array containing the generated text. This is exactly the shape the OpenAI API returns.
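To see that contract without any SDK, the short sketch below builds the request with Python's standard library and reads choices[0].message.content from the response. It assumes the server is running on the default port with a model loaded; the placeholder model string and the prompt are illustrative.

import json
import urllib.request

# Build the same JSON body the OpenAI Chat Completions schema expects.
payload = {
    "model": "local-model",  # placeholder; the loaded model responds regardless
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."},
    ],
}

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The response mirrors the OpenAI shape: the text lives at choices[0].message.content.
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["choices"][0]["message"]["content"])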

One difference from the hosted API is that the LM Studio API does not enforce the api_key header. The OpenAI Python SDK requires an api_key at instantiation time and will raise an error if it is empty, so supply any non-empty string such as "lm-studio" or "not-needed". The local server reads but ignores the value. No request is rejected based on key content.

The model field in the request body nominally selects which model responds, but the LM Studio server always routes to the currently loaded model regardless of the value provided. The field is present to maintain schema compatibility. For clarity, retrieve the true model ID from GET /v1/models and use that value; this makes logs and audit trails accurate and prevents confusion when switching between models.
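To fetch that ID programmatically, here is a minimal sketch using the same openai Python SDK shown in the next section; the printed IDs depend on which models you have downloaded.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# GET /v1/models through the SDK; each entry's id is usable in the model field.
for model in client.models.list():
    print(model.id)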

Python example using the openai SDK

Three lines of setup — import, instantiate with a custom base_url, call create() — are all it takes for existing OpenAI Python code to talk to the LM Studio API locally.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",          # required by SDK; not validated locally
)

completion = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user",   "content": "What is quantization in the context of LLMs?"},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(completion.choices[0].message.content)

Replace the model string with the id from your GET /v1/models response. The rest of the call is identical to a production OpenAI SDK request. To stream the response, add stream=True and iterate the returned generator, printing each chunk.choices[0].delta.content fragment as it arrives.
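A minimal streaming sketch along those lines, reusing the client constructed above (the prompt is illustrative):

stream = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Explain quantization in two sentences."}],
    stream=True,
)

# Each chunk carries an incremental delta; content can be None on some chunks.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()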

Node.js example using the openai package

The Node.js openai package accepts a baseURL option at construction time. One configuration change routes all completions through the LM Studio API on localhost.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "lm-studio",
});

const response = await client.chat.completions.create({
  model: "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user",   content: "Show me a Python function that reverses a string." },
  ],
  temperature: 0.6,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);

For streaming in Node.js, pass stream: true and iterate the returned async iterable with for await (const chunk of stream), reading chunk.choices[0]?.delta?.content on each iteration. The openai package (version 4 and later) handles the SSE parsing for you.

Request parameter reference

Six parameters cover the vast majority of LM Studio API use cases. All are optional except model and messages; defaults shown below reflect the LM Studio server's built-in values.

LM Studio API /v1/chat/completions request parameters, types, defaults, and usage notes
Parameter    Type     Default         Notes
model        string   (required)      Any non-empty string routes to the loaded model; use the ID from /v1/models for clarity.
messages     array    (required)      Array of {"role": "...", "content": "..."} objects. Roles: system, user, assistant.
temperature  number   0.8             Sampling temperature, 0–2. Lower values produce more deterministic output; higher values, more varied output.
max_tokens   integer  -1 (unlimited)  Maximum tokens to generate. -1 means generate until the model’s context ceiling or a stop token.
top_p        number   0.95            Nucleus sampling cutoff. Values below 1.0 restrict sampling to the top-probability token mass.
stream       boolean  false           When true, the server returns Server-Sent Events instead of a single JSON response object.
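As an illustration of how these parameters combine, the hedged sketch below requests near-deterministic, length-capped output through the openai Python SDK client from the earlier example; the values and the prompt are illustrative.

completion = client.chat.completions.create(
    model="local-model",  # any non-empty string routes to the loaded model
    messages=[{"role": "user", "content": "List three common GGUF quantization levels."}],
    temperature=0.0,      # minimal sampling randomness
    top_p=1.0,            # no nucleus truncation
    max_tokens=128,       # hard cap instead of the -1 default
    stream=False,         # single JSON response rather than SSE
)
print(completion.choices[0].message.content)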

Error codes and troubleshooting

The LM Studio API returns standard HTTP status codes with OpenAI-shaped error bodies. The most common error is 404, which means no model is loaded — fix by loading a model in the Chat or Discover tab before sending requests.

A 400 Bad Request means the request body is malformed JSON or is missing a required field. Confirm that model and messages are present and that messages is a non-empty array. A 404 Not Found on the chat completions route almost always means no model is currently loaded in LM Studio — open the app, navigate to Chat or Discover, and load a model. A 422 Unprocessable Entity indicates an invalid parameter value, such as a temperature above 2.0 or a max_tokens that exceeds the model’s context ceiling. A 503 Service Unavailable appears when the model is still loading into memory — wait for the status bar to show the model as ready before sending requests.

If the connection is refused entirely (errno ECONNREFUSED or similar), the server is not running. Open the Server tab in LM Studio and press Start Server.
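A hedged sketch of how these failures surface through the openai Python SDK: the exception classes are the SDK's, while the 503 retry loop and the placeholder model string are illustrative choices, not behavior the server requires.

import time

from openai import OpenAI, APIConnectionError, APIStatusError

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def ask(prompt: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            completion = client.chat.completions.create(
                model="local-model",  # placeholder; the loaded model responds
                messages=[{"role": "user", "content": prompt}],
            )
            return completion.choices[0].message.content
        except APIConnectionError as err:
            # Connection refused: the server is not running; start it from the Server tab.
            raise RuntimeError("LM Studio server is not reachable") from err
        except APIStatusError as err:
            if err.status_code == 404:
                raise RuntimeError("No model is loaded in LM Studio") from err
            if err.status_code == 503 and attempt < retries - 1:
                time.sleep(2)  # model still loading into memory; wait and retry
                continue
            raise
    raise RuntimeError("Model did not become ready in time")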

Frequently asked questions

Answers to the five most common developer questions about integrating with the LM Studio API.