LM Studio tutorial: a first-run walkthrough from install to first prompt

Six concrete steps that take you from a fresh machine to a working local model session — with server mode activated and verified.

Practical recap

This LM Studio tutorial covers the full first session: platform installer, model browser, download, load, chat, and server mode. Each step lists an estimated time and a concrete outcome so you know when to move on.

Before you begin

Two things to confirm before running this LM Studio tutorial: your machine has at least 8 GB of RAM available, and you have a reliable internet connection for the model download in step 3.

This tutorial targets a complete first session with LM Studio. It assumes no prior experience with local inference tools, GGUF models, or command-line API calls. The only prerequisite is a machine running Windows 10 or 11, macOS 13 or later, or a mainstream Linux distribution. If you are on Linux, you will need to make one file executable before launching — step 1 notes exactly how.

Hardware-wise, 8 GB of RAM is the practical floor for the 7B model used in steps 3 and 4. A GPU is not required: LM Studio falls back to CPU inference automatically, which is slower but fully functional. If your machine has an NVIDIA, AMD, or Apple Silicon GPU, LM Studio will detect it and surface an offload option during model loading.

Six-step tutorial: from install to server mode

Steps 1 through 4 cover setup and model loading; steps 5 and 6 cover interactive chat and the local API — the two things most people actually want to do.

LM Studio tutorial — step, estimated time, and expected outcome
| Step | Estimated time | Outcome |
| --- | --- | --- |
| 1. Download and install LM Studio | 3–5 min | Application launches and shows the home screen |
| 2. Open the model browser | 1 min | Discover tab shows a grid of available models |
| 3. Download a model | 5–20 min (network-dependent) | Progress bar completes; model file appears on disk |
| 4. Load the model and open chat | 1–3 min | Model loaded indicator appears; chat input is active |
| 5. Write a system prompt and first message | 2–5 min | Model returns a coherent response in the chat window |
| 6. Enable server mode and verify with curl | 2–3 min | curl returns a JSON completion response from localhost |

Step 1 — Download and install LM Studio

Navigate to the LM Studio download page and pick the installer that matches your platform. On Windows, run the .exe and follow the standard wizard. On macOS, open the .dmg and drag the application to your Applications folder. On Linux, save the AppImage to a convenient location, then open a terminal and run chmod +x LMStudio-*.AppImage before double-clicking to launch.
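
On Linux, the launch sequence is short enough to show in full. The exact filename depends on the release you downloaded, so adjust the glob if it does not match:

# Make the AppImage executable, then launch it
chmod +x LMStudio-*.AppImage
./LMStudio-*.AppImage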

When the application opens, you will see a home screen with a sidebar containing five icons: Discover, My Models, Chat, Server, and Settings. Take a moment to notice the status bar at the bottom of the window — it shows whether a model is loaded, the current server state, and basic hardware information.

Step 2 — Open the model browser

Click the Discover icon (the first item in the left sidebar, which looks like a compass or search symbol). The browser loads a grid of cards, each representing a model family. Each card shows the model name, parameter count, and a hardware-fit badge — green means the model is likely to run well on your hardware, yellow means it is a stretch, and red means it exceeds detected capacity.

Use the search bar at the top to filter by name. For this tutorial, type llama-3-8b-instruct or mistral-7b-instruct. Either will work. The browser will return a list of quantized variants for your chosen model, sorted by file size.

Step 3 — Download a model

From the variant list, select the row labeled Q4_K_M. This quantization level gives a good balance between response quality and memory footprint: a 7B Q4_K_M model sits around 4.1–4.4 GB on disk and needs roughly the same amount in RAM. Click the Download button next to that row. A progress bar appears at the bottom of the screen. This step takes roughly five minutes on a fast connection and up to twenty on a slower one. The file is saved to a local models directory — you can change this path in Settings if needed.
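
That on-disk figure is easy to sanity-check. Q4_K_M averages roughly 4.8 bits per weight (an approximation, not an exact specification), so for 7 billion parameters:

# parameters in billions x average bits per weight / 8 = gigabytes on disk
echo "7 * 4.8 / 8" | bc -l    # prints 4.2, close to the observed 4.1-4.4 GB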

Step 4 — Load the model and open chat

Once the download finishes, click the model's name in the My Models view (second sidebar icon) or simply click Load from the Discover card. A load dialog appears with an optional GPU offload slider. If LM Studio detected a compatible GPU, drag the slider to the right to offload as many layers as your VRAM allows — more layers on the GPU means faster token generation. Click Load and wait for the status bar to change from "No model loaded" to the model name with a green indicator.

Now click the Chat icon (third sidebar item). A session window opens with two input areas: the system prompt at the top and the user message field at the bottom.

Step 5 — Write a system prompt and send your first message

In the system prompt field, type something like: You are a helpful assistant. Answer concisely and accurately. This sets the behavior for the session. In the user message field below, type a question — for example: What are the main differences between GGUF and GGML model formats? Press Enter or click the Send button.

The model begins generating tokens almost immediately. You will see the response appear word by word in the chat transcript. For a Q4_K_M 7B model on a mid-range CPU, expect roughly 5–15 tokens per second; on a GPU, that number rises to 40–80 tokens per second or higher depending on the hardware. When the response finishes, the input field clears and is ready for your next message. This is a successful first chat session with LM Studio.
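
Those per-second rates map directly onto wait time. As a rough illustration, assuming a 200-token answer (an arbitrary length chosen for the arithmetic):

# Wait time in seconds for a ~200-token response
echo "200 / 10" | bc    # ~20 s at 10 tok/s (CPU)
echo "200 / 60" | bc    # ~3 s at 60 tok/s (GPU; bc truncates to an integer)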

Step 6 — Enable server mode and verify with curl

Click the Server icon (fourth sidebar item). You will see a toggle labeled Start server and a port field showing 1234 by default. Click Start server. The status in the panel turns green and shows the endpoint address: http://localhost:1234/v1.

Open a terminal on the same machine and run the following command:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role":"user","content":"Say hello in one sentence."}]
  }'

The terminal should print a JSON object with a choices array and a message.content field containing the model's response. That is the LM Studio server responding to an OpenAI-compatible API call. Any tool or application that already speaks the OpenAI Chat Completions schema — Python SDKs, JavaScript libraries, code editors with AI extensions — can now point its base URL at http://localhost:1234/v1 and work without further modification.
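
The raw JSON is verbose. If you have jq installed (a separate command-line tool, not part of LM Studio), the same call can be piped through it to print just the reply text:

# -s silences curl's progress meter; jq -r prints the raw string value
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"local-model","messages":[{"role":"user","content":"Say hello in one sentence."}]}' \
  | jq -r '.choices[0].message.content'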

What to explore after this tutorial

With server mode verified, the hardest part is done — from here, the workflow expands into presets, multi-turn sessions, and connecting external tools.

With a model loaded and server mode running, three directions are worth exploring. First, try chat presets: in the Chat settings panel you can save combinations of system prompt, temperature, top-p, and stop tokens as named presets, then switch between them without resetting the session. Second, experiment with quantization levels — load the same model family at Q5_K_M or Q8_0 and compare response quality versus generation speed for your specific use case. Third, wire up an external client: if you use a code editor with an AI extension or a notebook environment, point its configuration at http://localhost:1234/v1 and your local model becomes the backend.
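
Before wiring up an external client, a quick check confirms what the server exposes. The OpenAI-compatible surface includes a model-listing endpoint:

# List the models the local server currently serves
curl http://localhost:1234/v1/models

The response should include an id for your loaded model, which is the identifier an external client can send as its model parameter.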

For deeper background on how local inference works, the NIST AI resource hub covers evaluation frameworks that are relevant when you start assessing model quality systematically. The Stanford Human-Centered AI group publishes accessible research on language model behavior that can sharpen your intuitions about prompt design.

The documentation index maps every topic on this site. The troubleshooting page covers what to do if a step in this tutorial did not behave as described. The vs-Ollama comparison is worth reading once you have a feel for LM Studio's workflow.

Frequently asked questions

Five questions from readers who are working through this LM Studio tutorial for the first time.