LM Studio troubleshooting: common issues and quick fixes

Five problems that account for the majority of LM Studio support requests — each with a cause, a diagnostic step, and a concrete fix.

Ground-Level Notes

Most LM Studio issues resolve with one of three actions: freeing RAM, updating a driver, or changing a port number. This page covers the five symptoms that appear most often, with the fix listed first so you can act without reading the whole explanation.

Quick-reference: symptom, cause, and fix

The table below maps each common LM Studio problem to its most likely cause and the fastest fix — scroll down for detailed explanations of each.

LM Studio troubleshooting — symptom, likely cause, and fix
| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Model fails to load; progress bar freezes or resets | Insufficient free RAM for the selected model and quantization | Close other applications, pick a smaller quantization (Q4_K_M), or try a smaller model |
| GPU not detected; inference runs on CPU only | Outdated or missing GPU driver; ROCm not installed on Linux AMD | Update NVIDIA/AMD driver; install ROCm on Linux; restart LM Studio |
| Inference is very slow; fewer than 2 tokens/s | Model running on CPU; GPU layer count too low; large quantization on limited VRAM | Increase the GPU layer-offload slider; switch to Q4_K_M; check that the GPU appears in the status bar |
| Server fails to start; "address already in use" error | Port 1234 occupied by another application | Change the port to 1235 or 8080 in the Server tab; update client base URLs |
| AppImage won't launch on Linux; permission denied | Execute bit not set on the AppImage file; FUSE not installed | Run chmod +x on the file; install libfuse2 if missing |
| Model loads but produces garbled or repetitive output | Wrong chat template applied for the model architecture | In model settings, manually select the correct chat template for the model family |

Model fails to load

A model that freezes on load is almost always a memory problem — either too little free RAM for the model size, or a corrupted download that produces an invalid file.

When LM Studio attempts to load a model and the progress bar stalls or the application throws an out-of-memory error, the first thing to check is how much RAM is actually free at the moment of the load attempt. The model size listed in the browser is the on-disk size; the in-memory footprint is usually similar but can be slightly larger depending on the context length and KV cache settings. A 7B Q4_K_M model typically needs 4.5–5.5 GB of free RAM; a 13B Q4_K_M needs 8.5–10 GB. If the free RAM on your machine is close to those numbers, close a browser, a video call application, or any other memory-heavy process before retrying the load.
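The rule of thumb above can be sketched as a quick back-of-envelope calculation. The 10% overhead factor and the KV cache figure below are assumptions for illustration, not values LM Studio reports:

```shell
# Rough in-memory footprint estimate for a GGUF model file.
# Rule of thumb (assumption): on-disk size plus ~10% overhead,
# plus a KV cache allowance that grows with context length.
MODEL_SIZE_GB=4.1   # example on-disk size of a 7B Q4_K_M file
KV_CACHE_GB=0.5     # rough KV cache allowance at 4K context (assumption)

ESTIMATE=$(awk -v s="$MODEL_SIZE_GB" -v k="$KV_CACHE_GB" \
  'BEGIN { printf "%.1f", s * 1.1 + k }')
echo "Estimated free RAM needed: ${ESTIMATE} GB"
```

Compare that estimate against the free (not total) RAM shown by your system monitor before retrying the load.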

If freeing RAM does not help, the download may be corrupted. Delete the model file from the My Models view or from the models directory on disk, return to the Discover tab, and re-download. Interrupted downloads occasionally produce a file that appears complete but fails a header check when LM Studio tries to read it. A fresh download from the same source resolves this in most cases.
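One way to spot a bad file without a full re-download is to check the GGUF magic bytes: a valid GGUF file begins with the ASCII characters "GGUF". A minimal sketch (the file path here is a deliberately bogus example):

```shell
# Quick integrity check: a valid GGUF file starts with the ASCII
# magic bytes "GGUF". A truncated or corrupted download can fail
# this check even when the file size looks right.
check_gguf() {
  if [ "$(head -c 4 "$1")" = "GGUF" ]; then
    echo "header looks valid"
  else
    echo "invalid GGUF header: re-download"
  fi
}

# Demonstrate against a deliberately broken file (example path):
printf 'XXXX' > /tmp/broken.gguf
check_gguf /tmp/broken.gguf
```

A valid header does not guarantee the rest of the file is intact, but an invalid one confirms the download is unusable.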

A third cause is attempting to load a model format that the installed version of LM Studio does not support. Check the release notes for the version you have installed to confirm that the architecture (e.g., Mamba, RWKV, or a very new transformer variant) is listed as supported.

GPU not detected

LM Studio detects GPUs through platform-specific libraries — CUDA on NVIDIA, Metal on Apple Silicon, ROCm on AMD Linux. Any of these failing silently produces CPU-only inference.

On Windows with an NVIDIA card, the most reliable fix is to install the latest Game Ready or Studio driver from NVIDIA directly, rather than relying on Windows Update. After installing, restart the machine and relaunch LM Studio. The GPU should appear in the hardware information panel in the bottom status bar. If it still does not appear, check that no other application has an exclusive lock on the GPU — some capture software and VM hypervisors can hold the device in a way that prevents detection.

On Linux with an AMD card, confirm that the installed ROCm version matches the requirements for the LM Studio build you are using. Also confirm that your user account is a member of the render and video groups (running groups $USER should list both). Adding the user to those groups requires a logout and login to take effect.
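The group check can be scripted so it prints the exact command to run for any missing group. This is a sketch, not part of LM Studio:

```shell
# Report whether a group appears in a space-separated group list.
in_group() {  # usage: in_group <group> <space-separated group list>
  echo "$2" | tr ' ' '\n' | grep -qx "$1"
}

# Check the current user against the groups ROCm typically needs,
# and print the usermod command for any that are missing.
USER_GROUPS=$(id -nG "$USER")
for grp in render video; do
  if in_group "$grp" "$USER_GROUPS"; then
    echo "$grp: ok"
  else
    echo "$grp: missing; run: sudo usermod -aG $grp $USER"
  fi
done
```

Remember that usermod changes only take effect after logging out and back in.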

On macOS with Apple Silicon, Metal acceleration is built into the operating system and requires no driver installation. If LM Studio is not using Metal, the most likely cause is a very old macOS version — LM Studio for Apple Silicon requires macOS 13 or later.

Slow inference

Slow token generation almost always traces back to the model running mostly on CPU — either because no GPU was detected, or because the layer-offload count in the load dialog was left at zero or a low value.

Open the model, click Eject, and reload it. In the load dialog, look for the GPU layers slider. If your GPU appears in the dropdown and the slider is at zero, push it to the maximum — LM Studio will show an estimate of how much VRAM each layer consumes. A 7B model with 32 layers on an 8 GB GPU can typically load all layers onto VRAM, which lifts inference from 5–10 tokens per second on CPU to 40–80 tokens per second on GPU.

If the GPU slider is already maxed and inference is still slower than expected, consider the quantization level. Q8_0 and FP16 produce higher quality but consume roughly twice the memory of Q4_K_M at the same parameter count. Switching from Q8_0 to Q4_K_M at 7B typically doubles tokens per second on the same hardware, at a modest quality cost that is undetectable on most everyday tasks.
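The memory trade-off follows directly from bits per weight: weight memory is roughly parameters times bits per weight divided by 8. The bits-per-weight figures below are approximations for GGUF quantizations, used here only for illustration:

```shell
# Back-of-envelope weight memory: params (billions) * bits-per-weight / 8
# gives gigabytes. The bpw figures are approximate, not exact GGUF values.
weight_gb() {  # usage: weight_gb <params-in-billions> <bits-per-weight>
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f", p * b / 8 }'
}

echo "7B at Q4_K_M (~4.5 bpw): $(weight_gb 7 4.5) GB"
echo "7B at Q8_0 (~8.5 bpw):   $(weight_gb 7 8.5) GB"
```

The roughly 2x gap between the two results is what makes the difference between a model fitting entirely in 8 GB of VRAM or spilling onto the CPU.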

Context length also affects speed. A context window set to 32K tokens occupies significantly more KV cache memory than 4K. If you do not need long context, reducing it in the model settings frees VRAM for more layers and improves throughput.
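The KV cache scales linearly with context length, which is why the 32K-versus-4K difference is so large. A sketch using illustrative Llama-2-7B-like dimensions (32 layers, 32 KV heads, head dimension 128, FP16 cache — assumptions, not values LM Studio reports):

```shell
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim
#                  * bytes_per_element * context_tokens.
# Dimensions below are illustrative: 32 layers, 32 KV heads,
# head dim 128, FP16 (2-byte) cache entries.
kv_cache_gb() {  # usage: kv_cache_gb <context-tokens>
  awk -v c="$1" 'BEGIN { printf "%.1f", 2*32*32*128*2*c / (1024^3) }'
}

echo "KV cache at 4K context:  $(kv_cache_gb 4096) GB"
echo "KV cache at 32K context: $(kv_cache_gb 32768) GB"
```

Models using grouped-query attention have fewer KV heads and therefore a smaller cache, but the linear growth with context length holds either way.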

Server port conflict

LM Studio's local server defaults to port 1234. If another process is already listening on that port, the server will fail to start with an "address in use" error — the fix is one setting change.

Navigate to the Server tab in the LM Studio sidebar. Below the Start server toggle is a port field. Change 1234 to any unused port — 1235, 8080, or 11435 are common choices. Click Start server. If it starts successfully, the green indicator confirms the server is listening. Update any client applications, configuration files, or environment variables that previously pointed at the old port number.
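After the change, a quick curl against the models endpoint of LM Studio's OpenAI-compatible API confirms the server is answering on the new port. Port 1235 here is just an example:

```shell
# Point clients at the new port and confirm the server responds.
PORT=1235   # example: whatever you set in the Server tab
BASE_URL="http://localhost:${PORT}/v1"
echo "New base URL: $BASE_URL"

# /v1/models lists loaded models; a JSON response confirms the
# server is listening on the new port.
curl -s "$BASE_URL/models" || echo "no response: is the server started?"
```

Any client application that stores a base URL needs the same update, or it will keep trying the old port.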

To find what is occupying port 1234 before changing it, run lsof -i :1234 on macOS or Linux, or netstat -ano | findstr :1234 on Windows. The output will show the process ID and name. If it is a process you can close, closing it and restarting LM Studio is an alternative to changing the port.

AppImage permissions on Linux

The LM Studio Linux AppImage ships without the execute bit set. One chmod command is all that is needed — but some distributions also require FUSE to be installed for AppImage execution to work at all.

Open a terminal, navigate to the directory where the AppImage was saved, and run:

chmod +x LMStudio-*.AppImage
./LMStudio-*.AppImage

If the file still refuses to execute and the error message mentions FUSE, install the required package. On Ubuntu and Debian-based systems: sudo apt install libfuse2. On Fedora: sudo dnf install fuse. After installing, run the AppImage again without rebooting — the FUSE module loads on demand.

On some hardened enterprise Linux configurations, AppImage execution is restricted at the filesystem level. If chmod succeeds but the file still will not run, check whether the filesystem is mounted with the noexec flag — mounting it without noexec or copying the AppImage to a directory that allows execution resolves this.
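The noexec check can be done with findmnt, which reports the mount options for whatever filesystem holds a given path. A sketch (checking the home directory as an example location):

```shell
# Report whether a comma-separated mount-option string contains noexec.
has_noexec() {  # usage: has_noexec <mount-option-string>
  echo "$1" | tr ',' '\n' | grep -qx "noexec"
}

# Look up the mount options for the filesystem holding $HOME
# (substitute the directory where the AppImage actually lives).
OPTS=$(findmnt -n -o OPTIONS -T "$HOME" 2>/dev/null || echo "")
if has_noexec "$OPTS"; then
  echo "mounted noexec: move the AppImage to an exec-allowed directory"
else
  echo "mount options allow execution"
fi
```

If the AppImage sits on a noexec mount, copying it to a directory on an exec-allowed filesystem is usually simpler than remounting.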

When these fixes do not resolve the problem

Problems that survive all five fixes above are best reported through the LM Studio GitHub issue tracker with platform details, hardware specs, and a log excerpt.

LM Studio writes a log file that captures detailed information about model loading, GPU detection, and server startup. The log location varies by platform but is accessible from the Settings panel inside the application under a "Logs" or "Diagnostics" link. Copy the relevant section — the lines around the timestamp when the problem occurred — and include it when filing a report. Well-documented issues with reproduction steps and log excerpts are triaged significantly faster than open-ended reports.


Practitioner testimonials

"The GPU layer slider was the thing nobody told me about. I had been running at 0 layers on GPU for a week before I found this page. Going to 32 layers on my RTX 3070 turned a painful experience into a genuinely fast one."
"Port 1234 was already taken by a monitoring tool on my dev machine. Ten seconds to find this page, thirty seconds to change the port. Saved me half an hour of head-scratching."

Frequently asked questions

Five questions that cover the troubleshooting scenarios LM Studio users encounter most frequently.