LM Studio API: using the local chat completions endpoint

The LM Studio API is fully OpenAI-compatible. Point any existing SDK or HTTP client at http://localhost:1234/v1 and send requests exactly as you would to the hosted service — no cloud account, no per-token billing.

Top considerations

The LM Studio API mirrors the OpenAI Chat Completions contract at /v1/chat/completions. Set base_url="http://localhost:1234/v1" in the Python or Node.js SDK. The api_key field is required by the SDK but not validated by the local server — pass any non-empty string. The model field accepts any non-empty string because the server routes every request to the currently loaded model; for accurate logs, use the ID returned by GET /v1/models.

API contract overview

The LM Studio API implements the OpenAI Chat Completions schema at /v1/chat/completions. Request and response shapes are identical to the hosted OpenAI API, which means any existing SDK integration works after a single base URL change.

When LM Studio's server mode is active, it listens for HTTP requests at http://localhost:1234 by default. The /v1/ prefix on every route is intentional — it mirrors OpenAI's URL structure so that client libraries configured with a base_url parameter require no further adjustment. The request body for a chat completion is a JSON object with a model string, a messages array of role-content pairs, and optional sampling parameters. The response is a JSON object with a choices array containing the generated text. This is exactly the shape the OpenAI API returns.
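To see that contract without any SDK, the short sketch below builds the request with Python's standard library and reads choices[0].message.content from the response. It assumes the server is running on the default port with a model loaded; the placeholder model string and the prompt are illustrative.

import json
import urllib.request

# Build the same JSON body the OpenAI Chat Completions schema expects.
payload = {
    "model": "local-model",  # placeholder; the loaded model responds regardless
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."},
    ],
}

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The response mirrors the OpenAI shape: the text lives at choices[0].message.content.
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["choices"][0]["message"]["content"])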

One difference from the hosted API is that the LM Studio API does not enforce the api_key header. The OpenAI Python SDK requires an api_key at instantiation time and will raise an error if it is empty, so supply any non-empty string such as "lm-studio" or "not-needed". The local server reads but ignores the value. No request is rejected based on key content.

The model field in the request body nominally selects which model responds, but the LM Studio server always routes to the currently loaded model regardless of the value provided. The field is present to maintain schema compatibility. For clarity, retrieve the true model ID from GET /v1/models and use that value; this makes logs and audit trails accurate and prevents confusion when switching between models.
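To fetch that ID programmatically, here is a minimal sketch using the same openai Python SDK shown in the next section; the printed IDs depend on which models you have downloaded.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# GET /v1/models through the SDK; each entry's id is usable in the model field.
for model in client.models.list():
    print(model.id)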

Python example using the openai SDK

Three lines of setup — import, instantiate with a custom base_url, call create() — are all it takes for existing OpenAI Python code to talk to the LM Studio API locally.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",          # required by SDK; not validated locally
)

completion = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user",   "content": "What is quantization in the context of LLMs?"},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(completion.choices[0].message.content)

Replace the model string with the id from your GET /v1/models response. The rest of the call is identical to a production OpenAI SDK request. To stream the response, add stream=True and iterate the returned generator, printing each chunk.choices[0].delta.content fragment as it arrives.
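A minimal streaming sketch along those lines, reusing the client constructed above (the prompt is illustrative):

stream = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[{"role": "user", "content": "Explain quantization in two sentences."}],
    stream=True,
)

# Each chunk carries an incremental delta; content can be None on some chunks.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()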

Node.js example using the openai package

The Node.js openai package accepts a baseURL option at construction time. One configuration change routes all completions through the LM Studio API on localhost.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "lm-studio",
});

const response = await client.chat.completions.create({
  model: "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user",   content: "Show me a Python function that reverses a string." },
  ],
  temperature: 0.6,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);

For streaming in Node.js, pass stream: true and iterate the returned async iterable with for await (const chunk of stream), reading chunk.choices[0]?.delta?.content on each iteration. The openai package (version 4 and later) handles the SSE parsing for you.

Request parameter reference

Six parameters cover the vast majority of LM Studio API use cases. All are optional except model and messages; defaults shown below reflect the LM Studio server's built-in values.

LM Studio API /v1/chat/completions request parameters, types, defaults, and usage notes
Parameter    Type     Default         Notes
model        string   (required)      Any non-empty string routes to the loaded model; use the ID from /v1/models for clarity.
messages     array    (required)      Array of {"role": "...", "content": "..."} objects. Roles: system, user, assistant.
temperature  number   0.8             Sampling temperature, 0–2. Lower values produce more deterministic output; higher values, more varied output.
max_tokens   integer  -1 (unlimited)  Maximum tokens to generate. -1 means generate until the model’s context ceiling or a stop token.
top_p        number   0.95            Nucleus sampling cutoff. Values below 1.0 restrict sampling to the top-probability token mass.
stream       boolean  false           When true, the server returns Server-Sent Events instead of a single JSON response object.
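As an illustration of how these parameters combine, the hedged sketch below requests near-deterministic, length-capped output through the openai Python SDK client from the earlier example; the values and the prompt are illustrative.

completion = client.chat.completions.create(
    model="local-model",  # any non-empty string routes to the loaded model
    messages=[{"role": "user", "content": "List three common GGUF quantization levels."}],
    temperature=0.0,      # minimal sampling randomness
    top_p=1.0,            # no nucleus truncation
    max_tokens=128,       # hard cap instead of the -1 default
    stream=False,         # single JSON response rather than SSE
)
print(completion.choices[0].message.content)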

Error codes and troubleshooting

The LM Studio API returns standard HTTP status codes with OpenAI-shaped error bodies. The most common error is 404, which means no model is loaded — fix by loading a model in the Chat or Discover tab before sending requests.

A 400 Bad Request means the request body is malformed JSON or is missing a required field. Confirm that model and messages are present and that messages is a non-empty array. A 404 Not Found on the chat completions route almost always means no model is currently loaded in LM Studio — open the app, navigate to Chat or Discover, and load a model. A 422 Unprocessable Entity indicates an invalid parameter value, such as a temperature above 2.0 or a max_tokens that exceeds the model’s context ceiling. A 503 Service Unavailable appears when the model is still loading into memory — wait for the status bar to show the model as ready before sending requests.

If the connection is refused entirely (errno ECONNREFUSED or similar), the server is not running. Open the Server tab in LM Studio and press Start Server.
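A hedged sketch of how these failures surface through the openai Python SDK: the exception classes are the SDK's, while the 503 retry loop and the placeholder model string are illustrative choices, not behavior the server requires.

import time

from openai import OpenAI, APIConnectionError, APIStatusError

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def ask(prompt: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            completion = client.chat.completions.create(
                model="local-model",  # placeholder; the loaded model responds
                messages=[{"role": "user", "content": prompt}],
            )
            return completion.choices[0].message.content
        except APIConnectionError as err:
            # Connection refused: the server is not running; start it from the Server tab.
            raise RuntimeError("LM Studio server is not reachable") from err
        except APIStatusError as err:
            if err.status_code == 404:
                raise RuntimeError("No model is loaded in LM Studio") from err
            if err.status_code == 503 and attempt < retries - 1:
                time.sleep(2)  # model still loading into memory; wait and retry
                continue
            raise
    raise RuntimeError("Model did not become ready in time")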

Frequently asked questions

Answers to the five most common developer questions about integrating with the LM Studio API.