LM Studio API: using the local chat completions endpoint
The LM Studio API is fully OpenAI-compatible. Point any existing SDK or HTTP client at http://localhost:1234/v1 and send requests exactly as you would to the hosted service — no cloud account, no per-token billing.
Top considerations
The LM Studio API mirrors the OpenAI Chat Completions contract at /v1/chat/completions. Set base_url="http://localhost:1234/v1" in the Python or Node.js SDK. The api_key field is required by the SDK but not validated by the local server, so pass any non-empty string. The model field accepts the ID returned by GET /v1/models; any non-empty placeholder also works, since the server routes every request to the currently loaded model.
API contract overview
The LM Studio API implements the OpenAI Chat Completions schema at /v1/chat/completions. Request and response shapes are identical to the hosted OpenAI API, which means any existing SDK integration works after a single base URL change.
When LM Studio's server mode is active, it listens for HTTP requests at http://localhost:1234 by default. The /v1/ prefix on every route is intentional — it mirrors OpenAI's URL structure so that client libraries configured with a base_url parameter require no further adjustment. The request body for a chat completion is a JSON object with a model string, a messages array of role-content pairs, and optional sampling parameters. The response is a JSON object with a choices array containing the generated text. This is exactly the shape the OpenAI API returns.
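As a concrete illustration of that shape, here is a minimal raw-HTTP sketch using only the Python standard library; the prompt and sampling values are arbitrary, and the base URL assumes the default port:

```python
import json
import urllib.error
import urllib.request

# Minimal chat completion body; the shape matches the hosted OpenAI schema.
payload = {
    "model": "local-model",  # routed to the loaded model regardless of value
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=120) as resp:
        data = json.load(resp)
        # The generated text sits in choices[0].message.content.
        print(data["choices"][0]["message"]["content"])
except urllib.error.URLError:
    print("LM Studio server is not reachable on localhost:1234")
```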
One difference from the hosted API is that the LM Studio API does not enforce the api_key header. The OpenAI Python SDK requires an api_key at instantiation time and will raise an error if it is empty, so supply any non-empty string such as "lm-studio" or "not-needed". The local server reads but ignores the value. No request is rejected based on key content.
The model field in the request body nominally selects which model responds, but the LM Studio server always routes to the currently loaded model regardless of the value provided. The field is present to maintain schema compatibility. For clarity, retrieve the true model ID from GET /v1/models and use that value; this makes logs and audit trails accurate and prevents confusion when switching between models.
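To fetch the real model ID programmatically, a small stdlib-only helper such as the following can be used; the base URL is the LM Studio default, and the function returns an empty list when the server is unreachable:

```python
import json
import urllib.error
import urllib.request

def list_model_ids(base_url: str = "http://localhost:1234") -> list[str]:
    """Return the IDs reported by GET /v1/models, or [] if the server is down."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
            body = json.load(resp)
        # Each entry's "id" is the exact string to use in the model field.
        return [entry["id"] for entry in body.get("data", [])]
    except urllib.error.URLError:
        return []

for model_id in list_model_ids():
    print(model_id)
```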
Python example using the openai SDK
Three lines of setup — import, instantiate with a custom base_url, call create() — are all that separates existing OpenAI Python code from talking to the LM Studio API locally.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # required by SDK; not validated locally
)

completion = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "What is quantization in the context of LLMs?"},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(completion.choices[0].message.content)
Replace the model string with the id from your GET /v1/models response. The rest of the call is identical to a production OpenAI SDK request. To stream the response, add stream=True and iterate the returned generator, printing each chunk.choices[0].delta.content fragment as it arrives.
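The streaming pattern can be sketched as follows; this is a hedged example that degrades gracefully when the openai package or the local server is absent, and the model string is a placeholder:

```python
try:
    from openai import OpenAI  # openai SDK v1+; guarded so the sketch runs anywhere
except ImportError:
    OpenAI = None

def stream_chat(prompt: str) -> str:
    """Stream a completion from the local server, returning the assembled text."""
    if OpenAI is None:
        return ""
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    try:
        stream = client.chat.completions.create(
            model="local-model",  # placeholder; server uses the loaded model
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        parts = []
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:  # some chunks carry no text, e.g. the final stop chunk
                print(delta, end="", flush=True)
                parts.append(delta)
        return "".join(parts)
    except Exception:  # no server running or no model loaded
        return ""
```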
Node.js example using the openai package
The Node.js openai package accepts a baseURL option at construction time. One configuration change routes all completions through the LM Studio API on localhost.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "lm-studio",
});

const response = await client.chat.completions.create({
  model: "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Show me a Python function that reverses a string." },
  ],
  temperature: 0.6,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);
For streaming in Node.js, pass stream: true and iterate the async generator with for await (const chunk of stream), reading chunk.choices[0]?.delta?.content on each iteration. The openai package (version 4 and later) handles the SSE parsing automatically.
Request parameter reference
Six parameters cover the vast majority of LM Studio API use cases. All are optional except model and messages; defaults shown below reflect the LM Studio server's built-in values.
| Parameter | Type | Default | Notes |
|---|---|---|---|
| model | string | — | Required. Any non-empty string routes to the loaded model; use the ID from /v1/models for clarity. |
| messages | array | — | Required. Array of {"role": "...", "content": "..."} objects. Roles: system, user, assistant. |
| temperature | number | 0.8 | Sampling temperature 0–2. Lower values produce more deterministic output; higher values more varied. |
| max_tokens | integer | -1 (unlimited) | Maximum tokens to generate. -1 means generate until the model's context ceiling or a stop token. |
| top_p | number | 0.95 | Nucleus sampling cutoff. Values below 1.0 restrict sampling to the top-probability token mass. |
| stream | boolean | false | When true, the server returns Server-Sent Events instead of a single JSON response object. |
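Putting the table together, a full request body with all six parameters spelled out; the values are the server defaults from the table, and the model string is illustrative:

```python
import json

# All six documented parameters in one request body.
payload = {
    "model": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    "messages": [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Define perplexity in one sentence."},
    ],
    "temperature": 0.8,   # server default
    "max_tokens": -1,     # -1 = generate until context ceiling or stop token
    "top_p": 0.95,        # server default
    "stream": False,      # set True for Server-Sent Events
}

print(json.dumps(payload, indent=2))
```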
Error codes and troubleshooting
The LM Studio API returns standard HTTP status codes with OpenAI-shaped error bodies. The most common error is 404, which means no model is loaded — fix by loading a model in the Chat or Discover tab before sending requests.
A 400 Bad Request means the request body is malformed JSON or is missing a required field. Confirm that model and messages are present and that messages is a non-empty array. A 404 Not Found on the chat completions route almost always means no model is currently loaded in LM Studio — open the app, navigate to Chat or Discover, and load a model. A 422 Unprocessable Entity indicates an invalid parameter value, such as a temperature above 2.0 or a max_tokens that exceeds the model’s context ceiling. A 503 Service Unavailable appears when the model is still loading into memory — wait for the status bar to show the model as ready before sending requests.
If the connection is refused entirely (errno ECONNREFUSED or similar), the server is not running. Open the Server tab in LM Studio and press Start Server.
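The failure modes above can be folded into one helper; a stdlib-only sketch, with hint strings paraphrasing this section:

```python
import json
import urllib.error
import urllib.request

def post_chat(payload: dict) -> dict:
    """POST to the local chat completions route, mapping common failures to hints."""
    req = urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:  # must come before URLError (its parent)
        hints = {
            400: "malformed body: check that model and messages are present",
            404: "no model loaded: load one in the Chat or Discover tab",
            422: "invalid parameter value: check temperature and max_tokens",
            503: "model still loading: wait until the status bar shows ready",
        }
        return {"error": hints.get(exc.code, f"HTTP {exc.code}")}
    except urllib.error.URLError:
        return {"error": "connection refused: start the server in the Server tab"}
```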
Frequently asked questions
Answers to the five most common developer questions about integrating with the LM Studio API.
Is the LM Studio API compatible with the OpenAI API?
Yes. The LM Studio API implements the OpenAI Chat Completions schema at /v1/chat/completions. Any client that already uses the OpenAI Python SDK, the Node.js openai package, or raw HTTP to api.openai.com works against the LM Studio API by changing only the base_url to http://localhost:1234/v1 and providing any non-empty string as the API key.
How do I connect the OpenAI Python SDK to the LM Studio API?
Install the openai package, then instantiate the client with base_url="http://localhost:1234/v1" and api_key="lm-studio". The api_key value is not validated by the local server; any non-empty string satisfies the SDK requirement. Then call client.chat.completions.create() exactly as you would against the hosted OpenAI API.
What should I put in the model field?
Use the id field from a GET /v1/models response. LM Studio also accepts any non-empty string and routes the request to the currently loaded model, so placeholder values like "local-model" work in practice. Using the real model ID keeps logs accurate and prevents confusion when you switch between models.
What error codes does the LM Studio API return?
The LM Studio API returns standard HTTP codes: 200 for success, 400 for malformed request bodies, 404 when no model is loaded, 422 for invalid parameter combinations, and 503 when the model is still initialising. Error response bodies follow the OpenAI error object shape with type and message fields.
Does the LM Studio API support tool calling?
Support for tool_calls depends on the loaded model. Models fine-tuned with tool-calling templates — such as Llama 3.1 Instruct or Qwen 2.5 Instruct — return structured tool_calls objects in the response. Models without tool-calling training will attempt to follow the schema but output reliability is not guaranteed. Check the model card on Hugging Face to confirm whether tool calling is supported before building a production integration.
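A sketch of what such a request body looks like, following the OpenAI function-calling schema; the get_weather function and its parameters are invented purely for illustration:

```python
import json

# One tool definition in the OpenAI function-calling shape.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "local-model",  # placeholder; server uses the loaded model
    "messages": [{"role": "user", "content": "What is the weather in Oslo?"}],
    "tools": tools,
}

# A tool-capable model replies with choices[0].message.tool_calls, where each
# entry carries function.name and a JSON-encoded function.arguments string.
print(json.dumps(payload, indent=2))
```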