"Redirecting our internal toolchain at the LM Studio server instead of the cloud API took an afternoon. The model ID field is the only thing that changed in our config. Since then we’ve cut inference costs for internal tooling to zero and the latency on our developer machines is actually lower."
LM Studio server: an OpenAI-compatible local endpoint
The LM Studio server turns any loaded model into a local REST API in one click. Any tool that speaks the OpenAI Chat Completions schema connects to it without code changes — just swap the base URL.
Spotlight Brief
The LM Studio server defaults to http://localhost:1234/v1. Enable it from the Server tab, load a model, and any OpenAI-compatible client — Python SDK, Node.js, curl, LangChain, or your own code — can send requests immediately. No API key. No network egress.
Enabling the LM Studio server
Switching on the LM Studio server takes three steps: open the Server tab, press Start Server, and confirm that the green status indicator appears with the active URL.
Open LM Studio and look for the plug icon in the left navigation rail — that is the Server tab. A large Start Server button dominates the view. Click it. The status indicator in the top strip turns green and displays the active base URL. By default this is http://localhost:1234/v1, though you can change the port in the settings panel on the right before you start.
You do not need to stop the server to switch models. Load a different model in the Chat or Discover tab, and the server automatically serves the newly loaded model. The /v1/models route updates immediately to reflect the change. If you run two instances of LM Studio (which the application does not officially support), use distinct ports to avoid binding conflicts.
The LM Studio server persists across restarts if you enable the "Start server on application launch" option in the settings panel. This is useful for headless use cases or for machines that run LM Studio as a background service for other local applications to consume.
Base URL and port configuration
The default base URL for the LM Studio server is http://localhost:1234/v1. Change the port number in the Server tab before starting if 1234 conflicts with another process.
Every route the LM Studio server exposes is prefixed with /v1 to mirror the OpenAI API path structure. Client libraries that accept a base_url or baseURL parameter — the Python openai SDK, the Node.js openai package, LangChain's OpenAI wrapper — all work by pointing that parameter at http://localhost:1234/v1. Aside from a placeholder API key for SDKs that refuse an empty one, no other client-side configuration is needed.
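As a minimal sketch of that client-side change, here is the Python openai SDK pointed at the local endpoint. The model ID matches the one used in the curl example later in this article but is still a placeholder: substitute whatever GET /v1/models returns on your machine. The api_key value is arbitrary, since the local server never checks it.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local LM Studio server.
# The SDK requires a non-empty api_key, but the local server ignores its value.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",  # placeholder; use your loaded model's ID
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantization in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

The same single base_url change is all the Node.js SDK or a LangChain wrapper needs; the rest of the client code stays exactly as it would for the cloud API.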
If you want the server reachable from another machine on the local network — for example, to drive a tablet app that talks to a desktop running LM Studio — change the bind address from 127.0.0.1 to 0.0.0.0 in the Server settings. The server then accepts connections from any device that can route to the host machine. Keep in mind that there is no authentication on the LM Studio server endpoint; secure the port at the router or firewall level for any multi-device scenario.
Available endpoints
The LM Studio server exposes five routes that cover chat inference, text completion, embeddings, and model listing. All follow the OpenAI REST schema exactly.
| Endpoint | Method | Purpose | Notes |
|---|---|---|---|
| /v1/chat/completions | POST | Chat-format inference (messages array) | Supports stream: true for SSE |
| /v1/completions | POST | Raw text completion (prompt string) | Legacy format; chat/completions preferred |
| /v1/embeddings | POST | Generate embedding vectors | Requires an embedding-capable model loaded (sketch below) |
| /v1/models | GET | List currently loaded models | Returns model ID used in request bodies |
| /v1/models/{model_id} | GET | Retrieve a single model record | Mirrors OpenAI model-detail response shape |
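The embeddings route referenced in the table can be exercised with the same SDK setup. The sketch below assumes an embedding-capable model is already loaded in LM Studio; the model ID shown is a placeholder, so substitute whatever /v1/models reports for your embedding model.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# /v1/embeddings only responds while an embedding-capable model is loaded.
result = client.embeddings.create(
    model="text-embedding-nomic-embed-text-v1.5",  # placeholder; use your loaded embedding model's ID
    input="Quantization trades numeric precision for a smaller memory footprint.",
)

vector = result.data[0].embedding
print(len(vector))  # dimensionality of the returned embedding vector
```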
Calling the server with curl
A minimal curl command to the LM Studio server requires only the endpoint URL, a Content-Type header, and a JSON body with a model name and messages array. The model name must match the ID returned by /v1/models.
```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantization in two sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
```
The model value should be copied from the id field of a GET /v1/models response. LM Studio accepts any non-empty string in that field and maps it to the currently loaded model, so clients that hard-code a model name like local-model also work — the server ignores the mismatch and routes to whatever is loaded.
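To see the exact id value to copy, a quick check against the models route looks like this (sketched with the Python SDK, though a plain GET with curl works just as well):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Print the ID of every model LM Studio currently has loaded.
for model in client.models.list():
    print(model.id)
```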
Streaming responses with SSE
Add "stream": true to the request body and the LM Studio server switches from a single JSON response to a Server-Sent Events stream. Each event carries one token delta; the stream closes with a [DONE] sentinel.
Streaming is the mode most chat UIs use because it lets the interface render tokens as they arrive rather than waiting for the full response. The wire format is identical to OpenAI's: each line starts with data: followed by a JSON object containing a choices[0].delta.content field with the partial text. The final line is data: [DONE]. Standard SSE parsing libraries in every major language handle this transparently. When using the Python openai SDK, pass stream=True to the chat.completions.create call and iterate the returned generator.
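A short sketch of that streaming path with the Python SDK, again with a placeholder model ID:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# stream=True returns an iterator of chunks; each chunk carries one token delta.
stream = client.chat.completions.create(
    model="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",  # placeholder
    messages=[{"role": "user", "content": "Explain quantization in two sentences."}],
    stream=True,
)

for chunk in stream:
    # Guard against chunks with no choices or an empty delta (e.g. the final chunk).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```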
CORS and authentication notes
The LM Studio server sends permissive CORS headers by default so that browser-based local apps can call it from localhost. No authentication is applied; the loopback bind is the security boundary.
The LM Studio server includes Access-Control-Allow-Origin: * headers on its responses, which allows JavaScript running in a browser tab on localhost to POST to the server without CORS preflight failures. This is intentional: developers building local web apps or browser extensions need to reach the server from a browser context. The real security boundary is the loopback bind, not CORS: with the default 127.0.0.1 binding, only software running on the local machine (including any page open in a local browser) can reach the port at all.
If you switch to 0.0.0.0 binding for a LAN setup, consider the open CORS policy more carefully. A browser on another device could theoretically make cross-origin requests to the server via a malicious page. For LAN deployments, restrict CORS origins in a reverse proxy or accept the trade-off for a private home network. See the W3C CORS specification and NIST’s AI Risk Management Framework for guidance on secure local AI service deployment.
Practitioner testimonial
"Pointing our internal toolchain at the LM Studio server instead of the cloud API took an afternoon. The model ID field is the only thing that changed in our config. Since then we've cut inference costs for internal tooling to zero and the latency on our developer machines is actually lower."
Frequently asked questions
Answers to the five questions asked most often about setting up and using the LM Studio server.
How do I enable the LM Studio server?
Click the Server tab (the plug icon in the left navigation rail), then press Start Server. The status strip turns green and shows the active URL, which defaults to http://localhost:1234/v1. Load a model through the Chat or Discover tab and the server begins accepting requests immediately.
What port and base URL does the server use?
The default port is 1234, giving a base URL of http://localhost:1234/v1. You can change the port in the Server tab settings panel before starting the server. If 1234 is occupied by another process on your machine, pick any unused port above 1024.
Does the server require an API key or authentication?
No. The LM Studio server applies no API key or token-based authentication. Access control relies entirely on the loopback bind (127.0.0.1), which prevents connections from other machines. If you expose the server on 0.0.0.0 for a LAN scenario, secure the port at the network level or place a reverse proxy with auth in front of it.
Which endpoints does the server expose?
The server exposes /v1/chat/completions for chat-format inference, /v1/completions for raw text completion, /v1/embeddings for vector embeddings, and /v1/models to list loaded models. All routes follow the OpenAI REST schema, so any client that uses the OpenAI API contract works without modification.
Yes. Include "stream": true in the JSON request body and the server switches to Server-Sent Events delivery. Each data: line contains a partial completion delta; the stream ends with data: [DONE]. The Python and Node.js OpenAI SDKs handle streaming transparently — pass stream=True or stream: true in the call and iterate the returned generator or async iterable.