"Redirecting our internal toolchain at the LM Studio server instead of the cloud API took an afternoon. The model ID field is the only thing that changed in our config. Since then we’ve cut inference costs for internal tooling to zero and the latency on our developer machines is actually lower."
LM Studio server: an OpenAI-compatible local endpoint
The LM Studio server turns any loaded model into a local REST API in one click. Any tool that speaks the OpenAI Chat Completions schema connects to it without code changes — just swap the base URL.
Spotlight Brief
The LM Studio server defaults to http://localhost:1234/v1. Enable it from the Server tab, load a model, and any OpenAI-compatible client — Python SDK, Node.js, curl, LangChain, or your own code — can send requests immediately. No API key. No network egress.
Enabling the LM Studio server
Switching on the LM Studio server takes three steps: open the Server tab, press Start Server, and confirm that the green status indicator appears with the active URL.
Open LM Studio and look for the plug icon in the left navigation rail — that is the Server tab. A large Start Server button dominates the view. Click it. The status indicator in the top strip turns green and displays the active base URL. By default this is http://localhost:1234/v1, though you can change the port in the settings panel on the right before you start.
You do not need to stop the server to switch models. Load a different model in the Chat or Discover tab, and the server automatically serves the newly loaded model. The /v1/models route updates immediately to reflect the change. If you run two instances of LM Studio (which the application does not officially support), use distinct ports to avoid binding conflicts.
The LM Studio server persists across restarts if you enable the "Start server on application launch" option in the settings panel. This is useful for headless use cases or for machines that run LM Studio as a background service for other local applications to consume.
Base URL and port configuration
The default base URL for the LM Studio server is http://localhost:1234/v1. Change the port number in the Server tab before starting if 1234 conflicts with another process.
Every route the LM Studio server exposes is prefixed with /v1 to mirror the OpenAI API path structure. Client libraries that accept a base_url or baseURL parameter — the Python openai SDK, the Node.js openai package, LangChain's OpenAI wrapper — all work by pointing that parameter at http://localhost:1234/v1. Aside from a placeholder API key for SDKs that refuse an empty one, no other client-side configuration is needed.
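As a minimal sketch of that client-side change, here is the Python openai SDK pointed at the local endpoint. The model ID matches the one used in the curl example later in this article but is still a placeholder: substitute whatever GET /v1/models returns on your machine. The api_key value is arbitrary, since the local server never checks it.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local LM Studio server.
# The SDK requires a non-empty api_key, but the local server ignores its value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",  # placeholder; use your loaded model's ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantization in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

The same single base_url change is all the Node.js SDK or a LangChain wrapper needs; the rest of the client code stays exactly as it would for the cloud API.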
If you want the server reachable from another machine on the local network — for example, to drive a tablet app that talks to a desktop running LM Studio — change the bind address from 127.0.0.1 to 0.0.0.0 in the Server settings. The server then accepts connections from any device that can route to the host machine. Keep in mind that there is no authentication on the LM Studio server endpoint; secure the port at the router or firewall level for any multi-device scenario.
Available endpoints
The LM Studio server exposes five routes that cover chat inference, text completion, embeddings, and model listing. All follow the OpenAI REST schema exactly.
| Endpoint | Method | Purpose | Notes |
|---|---|---|---|
| /v1/chat/completions | POST | Chat-format inference (messages array) | Supports stream: true for SSE |
| /v1/completions | POST | Raw text completion (prompt string) | Legacy format; chat/completions preferred |
| /v1/embeddings | POST | Generate embedding vectors | Requires an embedding-capable model loaded (sketch below) |
| /v1/models | GET | List currently loaded models | Returns model ID used in request bodies |
| /v1/models/{model_id} | GET | Retrieve a single model record | Mirrors OpenAI model-detail response shape |
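The embeddings route referenced in the table can be exercised with the same SDK setup. The sketch below assumes an embedding-capable model is already loaded in LM Studio; the model ID shown is a placeholder, so substitute whatever /v1/models reports for your embedding model.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# /v1/embeddings only responds while an embedding-capable model is loaded.
result = client.embeddings.create(
    model="text-embedding-nomic-embed-text-v1.5",  # placeholder; use your loaded embedding model's ID
    input="Quantization trades numeric precision for a smaller memory footprint.",
)

vector = result.data[0].embedding
print(len(vector))  # dimensionality of the returned embedding vector
```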
Calling the server with curl
A minimal curl command to the LM Studio server requires only the endpoint URL, a Content-Type header, and a JSON body with a model name and messages array. The model name must match the ID returned by /v1/models.
```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantization in two sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```
The model value should be copied from the id field of a GET /v1/models response. LM Studio accepts any non-empty string in that field and maps it to the currently loaded model, so clients that hard-code a model name like local-model also work — the server ignores the mismatch and routes to whatever is loaded.
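To see the exact id value to copy, a quick check against the models route looks like this (sketched with the Python SDK, though a plain GET with curl works just as well):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Print the ID of every model LM Studio currently has loaded.
for model in client.models.list():
    print(model.id)
```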
Streaming responses with SSE
Add "stream": true to the request body and the LM Studio server switches from a single JSON response to a Server-Sent Events stream. Each event carries one token delta; the stream closes with a [DONE] sentinel.
Streaming is the mode most chat UIs use because it lets the interface render tokens as they arrive rather than waiting for the full response. The wire format is identical to OpenAI's: each line starts with data: followed by a JSON object containing a choices[0].delta.content field with the partial text. The final line is data: [DONE]. Standard SSE parsing libraries in every major language handle this transparently. When using the Python openai SDK, pass stream=True to the chat.completions.create call and iterate the returned generator.
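A short sketch of that streaming path with the Python SDK, again with a placeholder model ID:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# stream=True returns an iterator of chunks; each chunk carries one token delta.
stream = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",  # placeholder
    messages=[{"role": "user", "content": "Explain quantization in two sentences."}],
    stream=True,
)

for chunk in stream:
    # Guard against chunks with no choices or an empty delta (e.g. the final chunk).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```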
CORS and authentication notes
The LM Studio server sends permissive CORS headers by default so that browser-based local apps can call it from localhost. No authentication is applied; the loopback bind is the security boundary.
The LM Studio server includes Access-Control-Allow-Origin: * headers on its responses, which allows JavaScript running in a browser tab on localhost to POST to the server without CORS preflight failures. This is intentional: developers building local web apps or browser extensions need to reach the server from a browser context. The real security boundary is the loopback bind, not CORS: with the default 127.0.0.1 binding, only software running on the local machine (including any page open in a local browser) can reach the port at all.
If you switch to 0.0.0.0 binding for a LAN setup, consider the open CORS policy more carefully. A browser on another device could theoretically make cross-origin requests to the server via a malicious page. For LAN deployments, restrict CORS origins in a reverse proxy or accept the trade-off for a private home network. See the W3C CORS specification and NIST’s AI Risk Management Framework for guidance on secure local AI service deployment.
Practitioner testimonial
"Pointing our internal toolchain at the LM Studio server instead of the cloud API took an afternoon. The model ID field is the only thing that changed in our config. Since then we've cut inference costs for internal tooling to zero and the latency on our developer machines is actually lower."
Frequently asked questions
Answers to the five questions asked most often about setting up and using the LM Studio server.
How do I enable the LM Studio server?
Click the Server tab (the plug icon in the left navigation rail), then press Start Server. The status strip turns green and shows the active URL, which defaults to http://localhost:1234/v1. Load a model through the Chat or Discover tab and the server begins accepting requests immediately.
What port and base URL does the server use?
The default port is 1234, giving a base URL of http://localhost:1234/v1. You can change the port in the Server tab settings panel before starting the server. If 1234 is occupied by another process on your machine, pick any unused port above 1024.
Does the server require an API key or authentication?
No. The LM Studio server applies no API key or token-based authentication. Access control relies entirely on the loopback bind (127.0.0.1), which prevents connections from other machines. If you expose the server on 0.0.0.0 for a LAN scenario, secure the port at the network level or place a reverse proxy with auth in front of it.
Which endpoints does the server expose?
The server exposes /v1/chat/completions for chat-format inference, /v1/completions for raw text completion, /v1/embeddings for vector embeddings, and /v1/models to list loaded models. All routes follow the OpenAI REST schema, so any client that uses the OpenAI API contract works without modification.
Yes. Include "stream": true in the JSON request body and the server switches to Server-Sent Events delivery. Each data: line contains a partial completion delta; the stream ends with data: [DONE]. The Python and Node.js OpenAI SDKs handle streaming transparently — pass stream=True or stream: true in the call and iterate the returned generator or async iterable.