LM Studio tutorial: a first-run walkthrough from install to first prompt
Six concrete steps that take you from a fresh machine to a working local model session — with server mode activated and verified.
Practical recap
This LM Studio tutorial covers the full first session: platform installer, model browser, download, load, chat, and server mode. Each step lists an estimated time and a concrete outcome so you know when to move on.
Before you begin
Two things to confirm before running this LM Studio tutorial: your machine has at least 8 GB of RAM available, and you have a reliable internet connection for the model download in step 3.
This tutorial targets a complete first session with LM Studio. It assumes no prior experience with local inference tools, GGUF models, or command-line API calls. The only prerequisite is a machine running Windows 10 or 11, macOS 13 or later, or a mainstream Linux distribution. If you are on Linux, you will need to make one file executable before launching — step 1 notes exactly how.
Hardware-wise, 8 GB of RAM is the practical floor for the 7B model used in steps 3 and 4. A GPU is not required: LM Studio falls back to CPU inference automatically, which is slower but fully functional. If your machine has an NVIDIA, AMD, or Apple Silicon GPU, LM Studio will detect it and surface an offload option during model loading.
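If you are not sure how much memory your machine has, a quick terminal check settles it before committing to the download. These are standard operating-system utilities, not LM Studio commands:

# Linux: total and available memory
free -h

# macOS: total physical memory, reported in bytes
sysctl hw.memsize

# Windows (Command Prompt): total physical memory
systeminfo | findstr /C:"Total Physical Memory"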
Six-step tutorial: from install to server mode
Steps 1 through 4 cover setup and model loading; steps 5 and 6 cover interactive chat and the local API — the two things most people actually want to do.
| Step | Estimated time | Outcome |
|---|---|---|
| 1. Download and install LM Studio | 3–5 min | Application launches and shows the home screen |
| 2. Open the model browser | 1 min | Discover tab shows a grid of available models |
| 3. Download a model | 5–20 min (network-dependent) | Progress bar completes; model file appears on disk |
| 4. Load the model and open chat | 1–3 min | Model loaded indicator appears; chat input is active |
| 5. Write a system prompt and first message | 2–5 min | Model returns a coherent response in the chat window |
| 6. Enable server mode and verify with curl | 2–3 min | curl returns a JSON completion response from localhost |
Step 1 — Download and install LM Studio
Navigate to the LM Studio download page and pick the installer that matches your platform. On Windows, run the .exe and follow the standard wizard. On macOS, open the .dmg and drag the application to your Applications folder. On Linux, save the AppImage to a convenient location, then open a terminal and run chmod +x LMStudio-*.AppImage before double-clicking to launch.
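For Linux specifically, the whole install step fits in three shell lines. The filename pattern below is a placeholder; match it to the AppImage you actually downloaded:

cd ~/Downloads                      # or wherever you saved the AppImage
chmod +x LMStudio-*.AppImage        # mark it executable (needed once)
./LMStudio-*.AppImage               # launch LM Studio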
When the application opens, you will see a home screen with a sidebar containing five icons: Discover, My Models, Chat, Server, and Settings. Take a moment to notice the status bar at the bottom of the window — it shows whether a model is loaded, the current server state, and basic hardware information.
Step 2 — Open the model browser
Click the Discover icon (the first item in the left sidebar, which looks like a compass or search symbol). The browser loads a grid of cards, each representing a model family. Each card shows the model name, parameter count, and a hardware-fit badge — green means the model is likely to run well on your hardware, yellow means it is a stretch, and red means it exceeds detected capacity.
Use the search bar at the top to filter by name. For this tutorial, type llama-3-8b-instruct or mistral-7b-instruct. Either will work. The browser will return a list of quantized variants for your chosen model, sorted by file size.
Step 3 — Download a model
From the variant list, select the row labeled Q4_K_M. This quantization level gives a good balance between response quality and memory footprint: a 7B Q4_K_M model sits around 4.1–4.4 GB on disk and needs roughly the same amount in RAM. Click the Download button next to that row. A progress bar appears at the bottom of the screen. This step takes roughly five minutes on a fast connection and up to twenty on a slower one. The file is saved to a local models directory; you can change this path in Settings if needed.
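To confirm the file actually landed on disk, you can list it from a terminal. The default directory shown below is an assumption that varies by LM Studio version and platform; the authoritative path is whatever your Settings screen shows:

# macOS / Linux (default models directory assumed; check Settings if yours differs)
find ~/.lmstudio/models -name "*.gguf"

# Windows (Command Prompt), same assumption about the default path
dir /s /b "%USERPROFILE%\.lmstudio\models\*.gguf"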
Step 4 — Load the model and open chat
Once the download finishes, click the model's name in the My Models view (second sidebar icon) or simply click Load from the Discover card. A load dialog appears with an optional GPU offload slider. If LM Studio detected a compatible GPU, drag the slider to the right to offload as many layers as your VRAM allows — more layers on the GPU means faster token generation. Click Load and wait for the status bar to change from "No model loaded" to the model name with a green indicator.
Now click the Chat icon (third sidebar item). A session window opens with two input areas: the system prompt at the top and the user message field at the bottom.
Step 5 — Write a system prompt and send your first message
In the system prompt field, type something like: You are a helpful assistant. Answer concisely and accurately. This sets the behavior for the session. In the user message field below, type a question — for example: What are the main differences between GGUF and GGML model formats? Press Enter or click the Send button.
The model begins generating tokens almost immediately. You will see the response appear word by word in the chat transcript. For a Q4_K_M 7B model on a mid-range CPU, expect roughly 5–15 tokens per second; on a GPU, that number rises to 40–80 tokens per second or higher depending on the hardware. When the response finishes, the input field clears and is ready for your next message. This is a successful first chat session with LM Studio.
Step 6 — Enable server mode and verify with curl
Click the Server icon (fourth sidebar item). You will see a toggle labeled Start server and a port field showing 1234 by default. Click Start server. The status in the panel turns green and shows the endpoint address: http://localhost:1234/v1.
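Before sending a chat request, it is worth checking that the endpoint is actually listening. Open a terminal on the same machine and list the models the server exposes; the /v1/models route follows the OpenAI layout, though the exact response shape may vary slightly between LM Studio versions:

curl http://localhost:1234/v1/models

The response should be a small JSON object whose data array includes the model you loaded in step 4.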
Then, in the same terminal, run the chat completion request below:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local-model",
        "messages": [{"role":"user","content":"Say hello in one sentence."}]
      }'
The terminal should print a JSON object with a choices array and a message.content field containing the model's response. That is the LM Studio server responding to an OpenAI-compatible API call. Any tool or application that already speaks the OpenAI Chat Completions schema — Python SDKs, JavaScript libraries, code editors with AI extensions — can now point its base URL at http://localhost:1234/v1 and work without further modification.
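If you only want the generated text rather than the full JSON object, piping the response through jq extracts the field mentioned above. This assumes jq is installed on your system; it is not part of LM Studio:

curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local-model",
        "messages": [{"role":"user","content":"Say hello in one sentence."}]
      }' | jq -r '.choices[0].message.content'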
What to explore after this tutorial
With server mode verified, the hardest part is done. From here, the workflow expands into presets, multi-turn sessions, and connecting external tools.
With a model loaded and server mode running, three directions are worth exploring. First, try chat presets: in the Chat settings panel you can save combinations of system prompt, temperature, top-p, and stop tokens as named presets, then switch between them without resetting the session. Second, experiment with quantization levels — load the same model family at Q5_K_M or Q8_0 and compare response quality versus generation speed for your specific use case. Third, wire up an external client: if you use a code editor with an AI extension or a notebook environment, point its configuration at http://localhost:1234/v1 and your local model becomes the backend.
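The sampling settings a chat preset stores (temperature, top-p, stop tokens) can also be passed per request over the local API, which matters once an external client is the one doing the talking. The field names below follow the OpenAI Chat Completions schema; the exact set of supported parameters can vary by LM Studio version, so treat this as a sketch:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local-model",
        "messages": [
          {"role":"system","content":"You are a helpful assistant. Answer concisely."},
          {"role":"user","content":"Explain GGUF quantization in two sentences."}
        ],
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 200,
        "stop": ["\n\n"]
      }'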
For deeper background on how local inference works, the NIST AI resource hub covers evaluation frameworks that are relevant when you start assessing model quality systematically. The Stanford Human-Centered AI group publishes accessible research on language model behavior that can sharpen your intuitions about prompt design.
The documentation index maps every topic on this site. The troubleshooting page covers what to do if a step in this tutorial did not behave as described. The vs-Ollama comparison is worth reading once you have a feel for LM Studio's workflow.
Frequently asked questions
Five questions from readers who are working through this LM Studio tutorial for the first time.
How long does this LM Studio tutorial take from start to finish?
The six steps typically take 20 to 35 minutes the first time through, depending on your internet speed for the model download in step 3 and how much time you spend exploring chat settings. Steps 1 through 3 usually take under 10 minutes combined on a fast connection. Step 6, the server mode verification, adds fewer than 5 minutes once you have a terminal open.
Which model should I download for a first session?
For a first session, a Q4_K_M quantization of a 7B-parameter instruct model is the safest choice. It loads on 8 GB of RAM, runs at a readable speed on CPU-only hardware, and produces coherent answers on standard tasks. Llama 3 8B Instruct and Mistral 7B Instruct are both reliable picks available in the in-app library.
Do I need a GPU to follow this tutorial?
No. The tutorial works on CPU-only hardware, though inference will be slower, roughly 5 to 10 tokens per second on a modern CPU. If you have an NVIDIA, AMD, or Apple Silicon GPU, LM Studio detects it automatically and the layer-offload slider in step 4 will speed things up considerably.
Does the tutorial work the same way on Linux?
Yes. The steps are identical across platforms. Linux users need to make the AppImage executable with chmod +x LMStudio-*.AppImage before launching; step 1 notes this. After that, the LM Studio UI behaves the same as on Windows and macOS, and server mode verification with curl works without changes.
What should I read after finishing this tutorial?
Good next steps are the server mode page for wiring LM Studio to external clients, the API page for making programmatic calls, and the performance page for improving inference speed. The vs-Ollama comparison is useful once you have a feel for LM Studio and want to understand where it sits relative to other local inference tools.