Browsing the model library inside LM Studio

The LM Studio model library lets you discover, compare, and download quantized GGUF models without leaving the application. Hardware-fit hints, file-size estimates, and quality indicators make the choice straightforward even on the first visit.

Brief Digest

The model library lives in the Discover tab of LM Studio. Search by name or family, read the hardware-fit badge before downloading, and pick a quantization that matches your RAM. Q4_K_M is the right default for most machines. Downloaded models appear in your local list instantly and can be loaded with a single click.

The Discover tab: searching the in-app catalog

The Discover tab is LM Studio's model browser. It connects to a curated index of Hugging Face repositories, surfaces quantized GGUF variants for each model, and shows estimated RAM requirements alongside each option.

When you open the Discover tab for the first time, a search field and a list of popular models greet you. The catalog draws from quantized repositories published on Hugging Face, primarily from community-maintained sources that specialize in producing GGUF-format files from official model releases. The search index covers model names, architecture families, and capability tags. Typing "llama" returns every Llama family variant in the catalog; typing "instruct" narrows results to instruction-tuned models across all families.

Each search result expands to show a list of quantization options, sorted by file size in descending order: the largest, highest-quality variants at the top, the most compressed at the bottom. Next to each row you see a file size, an estimated RAM or VRAM footprint, and a hardware-fit badge. The badge compares the system memory LM Studio detected at launch against the estimated footprint to produce a green (comfortable), yellow (marginal), or red (likely over-limit) rating.
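The badge logic can be approximated in a few lines. The Python sketch below is illustrative only: the estimation formula, the flat overhead allowance, and the thresholds are assumptions made for this example, not LM Studio's published internals.

    def estimate_footprint_gb(params_billion: float, bits_per_weight: float,
                              overhead_gb: float = 1.0) -> float:
        """Rough in-memory footprint: quantized weights plus a flat
        allowance for runtime buffers (both figures are assumptions)."""
        weights_gb = params_billion * bits_per_weight / 8  # 1e9 params x bits -> GB
        return weights_gb + overhead_gb

    def fit_badge(footprint_gb: float, available_gb: float) -> str:
        """Green / yellow / red rating with illustrative thresholds."""
        if footprint_gb <= 0.7 * available_gb:
            return "green"   # comfortable headroom
        if footprint_gb <= available_gb:
            return "yellow"  # marginal; may need a shorter context
        return "red"         # likely over-limit

    # A Q4_K_M 7B model (~5 GB with overhead) on a machine with 8 GB free:
    print(fit_badge(estimate_footprint_gb(7, 4.6), 8.0))  # -> green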

The search does not require an internet connection for models you have already downloaded. LM Studio maintains a local manifest of downloaded models and shows them in a separate "Downloaded" section even when offline. This is useful for air-gapped machines where the library is populated via an external drive rather than direct download.

Reading hardware-fit hints

Hardware-fit hints save you from downloading a model your machine cannot load. Green means comfortable; yellow means possible with slower performance; red means the model exceeds available memory.

LM Studio calculates hardware-fit estimates at start-up by reading available system RAM and, on GPU-accelerated builds, available VRAM. The estimate for each model variant is derived from its parameter count and quantization level: a Q4_K_M 7B model occupies roughly 5 GB in memory, so on a machine with 16 GB of total RAM and 8 GB available, that variant earns a green badge. The same model at Q8_0 (~7 GB) may still be green; a 13B Q4_K_M (~9 GB) would earn a yellow badge because the remaining headroom is marginal.

The badges are estimates, not guarantees. Actual footprint depends on context length, KV cache size, and operating-system memory pressure at load time. If a yellow-badged model fails to load, reduce the context length in the load dialog or close other applications to free memory. A red-badged model can still load if you use aggressive layer offloading to keep most layers in system RAM, but generation speed will drop significantly.
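Context length matters because the key/value cache grows linearly with it. As a worked example, the sketch below estimates the cache for a model with Llama-2-7B's published dimensions (32 layers, 32 KV heads, head dimension 128) and an FP16 cache; this is the standard formula for attention caches, not an LM Studio API.

    def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                    context_len: int, bytes_per_elem: int = 2) -> float:
        """KV cache size: two tensors (K and V) per layer, each holding
        context_len x n_kv_heads x head_dim elements."""
        elems = 2 * n_layers * n_kv_heads * head_dim * context_len
        return elems * bytes_per_elem / 1e9

    # Llama-2-7B-style dimensions at a 4096-token context, FP16 cache:
    print(kv_cache_gb(32, 32, 128, 4096))  # ~2.1 GB on top of the weights

Halving the context to 2048 tokens halves that figure, which is why shrinking the context in the load dialog is often enough to rescue a yellow-badged load.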

Choosing a quantization: Q4 vs Q5 vs Q6 vs Q8

The quantization number controls the bit depth of model weights. Lower means smaller files and faster inference; higher means closer to original quality. The K-quant variants use a mixed-precision strategy that improves the quality-per-bit ratio over naive quantization.

For most everyday tasks (chat, summarization, code completion, Q&A), Q4_K_M strikes the best balance. It compresses a 7B model to around 4.4 GB, loads quickly into a mid-range GPU, and produces output that is hard to distinguish from Q8_0 on typical prompts. The main cases where you will notice a difference are long, multi-step reasoning chains, precise arithmetic, and tasks where the model needs to maintain context across very long generations. For those tasks, Q5_K_M or Q6_K is worth the extra memory.

Q8_0 is the ceiling for local inference without going to full FP16. It is near-lossless on most benchmarks and the right choice when you are evaluating a model for quality rather than deploying it for throughput. Full FP16 files exist in the ecosystem but are uncommon in the LM Studio library because the VRAM requirements make them impractical on consumer hardware: a 7B model at FP16 occupies 14 GB of VRAM, leaving almost nothing for the operating system or KV cache.

Q2_K and Q3_K are compressed enough to run very large models on constrained hardware. A Q3 70B model fits in about 30 GB, which brings it within reach of a machine with 32 GB of RAM, though with little headroom. The cost is meaningful quality degradation on complex instructions. Use these variants when running a larger model at lower quality is preferable to running a smaller model at full quality, a judgment call that depends on the task.
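All of the file sizes quoted in this section follow from one multiplication: parameter count times bits per weight, divided by eight. The sketch below uses nominal bits-per-weight figures, which vary slightly by quantization scheme and model, and treats "7B" as an even 7 billion parameters.

    def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
        """Approximate GGUF file size: parameters x bits / 8 bits per byte."""
        return params_billion * bits_per_weight / 8

    print(gguf_size_gb(7, 5.0))   # Q4_K_M 7B  -> ~4.4 GB
    print(gguf_size_gb(7, 8.5))   # Q8_0 7B    -> ~7.4 GB
    print(gguf_size_gb(7, 16.0))  # FP16 7B    -> 14.0 GB
    print(gguf_size_gb(70, 3.4))  # Q3_K 70B   -> ~29.8 GB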

File sizes and the download process

Model downloads happen in the background inside LM Studio. A progress bar tracks each file, downloads can be paused, and the model appears in the local list immediately on completion.

File sizes span a wide range: a Q4_K_M 1B model may be under 1 GB, while a Q4_K_M 70B model is roughly 40 GB. LM Studio displays the exact size before you click download so you can plan for disk space. The default storage location is a folder inside your user directory: ~/Library/Application Support/LM Studio/models on macOS, %APPDATA%\LM Studio\models on Windows, and ~/.cache/lm-studio/models on Linux. The location is configurable in Settings.
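If you need the folder from a script, for backups or for the sideloading workflow described later in this section, the defaults above translate to roughly the following Python. The paths are the documented defaults; if you have moved the location in Settings, use that value instead.

    import os
    import platform
    from pathlib import Path

    def default_models_dir() -> Path:
        """Default LM Studio models folder per platform (configurable in Settings)."""
        system = platform.system()
        if system == "Darwin":  # macOS
            return Path.home() / "Library/Application Support/LM Studio/models"
        if system == "Windows":
            return Path(os.environ["APPDATA"]) / "LM Studio" / "models"
        return Path.home() / ".cache/lm-studio/models"  # Linux

    # List downloaded GGUF files with their sizes:
    for f in sorted(default_models_dir().rglob("*.gguf")):
        print(f"{f.stat().st_size / 1e9:6.2f} GB  {f.name}")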

Downloads run in parallel with the rest of the application. You can start a chat with an already-loaded model while a new one downloads in the background. The Discover tab shows live progress with a transfer speed indicator. If the download is interrupted, LM Studio resumes from where it left off the next time you open the app and reconnect to the network; partial files are preserved between sessions.
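LM Studio handles resumption internally, but the underlying mechanism is worth a sketch. The generic pattern below uses the standard HTTP Range header with the third-party requests package; it illustrates how a partial file can be continued and is not LM Studio's actual code.

    import os
    import requests  # third-party: pip install requests

    def resume_download(url: str, dest: str, chunk: int = 1 << 20) -> None:
        """Continue a download from wherever the partial file left off."""
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        headers = {"Range": f"bytes={offset}-"} if offset else {}
        with requests.get(url, headers=headers, stream=True, timeout=30) as r:
            r.raise_for_status()
            # 206 means the server honored the range; otherwise start over.
            mode = "ab" if r.status_code == 206 else "wb"
            with open(dest, mode) as f:
                for block in r.iter_content(chunk):
                    f.write(block)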

Major model families in the library

Five model families dominate the LM Studio library: Llama, Mistral, Qwen, Phi, and Gemma. Each has distinct strengths, licensing terms, and hardware profiles worth knowing before you download.

Major model families available in the LM Studio model library, with parameter sizes, licenses, and notes

| Family | Parameter sizes | License | Notes |
|---|---|---|---|
| Llama (Meta) | 1B, 3B, 8B, 70B, 405B | Llama community license | Most widely supported; largest ecosystem of fine-tunes and quantizations |
| Mistral | 7B, 8x7B (MoE), 22B | Apache 2.0 | Apache 2.0 permits commercial use; strong coding and instruction following |
| Qwen (Alibaba) | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Qwen license (varies by size) | Excellent multilingual coverage; strong coding models in the family |
| Phi (Microsoft) | 1.3B, 2.7B, 3.8B, 14B | MIT | Unusually capable for parameter count; MIT license permits broad use |
| Gemma (Google) | 2B, 7B, 9B, 27B | Gemma Terms of Use | Strong on reasoning benchmarks; check terms for commercial use cases |

License terms are a practical matter, not a formality. The Llama community license permits most personal and commercial use but includes attribution requirements and a user-count threshold above which additional terms apply. Mistral's Apache 2.0 license is the most permissive for commercial applications. Phi's MIT license is similarly broad. Qwen and Gemma licenses have model-specific terms that warrant a read before production deployment. The LM Studio library does not enforce license compliance; that responsibility rests with the user. See the FTC's guidance on AI products and NIST's AI Risk Management Framework for context on responsible AI deployment.

Loading models from outside the library

LM Studio loads any GGUF file from disk. Place the file in the models directory or drag it into the application window and it appears in the local model list without a separate import step.

Many practitioners source models directly from Hugging Face using the huggingface-cli download command or from a curated internal repository. The file just needs the .gguf extension and a valid GGUF header. Once it lands in the models directory, LM Studio picks it up automatically the next time the Discover tab refreshes, usually within a few seconds; there is no registration, manifest update, or restart required. This path is also how offline and air-gapped deployments work: download models on a network-connected machine, transfer the GGUF files to an external drive, copy them to the target machine's models directory, and LM Studio finds them immediately.
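A scripted version of that workflow might look like the sketch below, using the huggingface_hub Python package rather than the CLI. The repository and filename are real examples from the public hub but stand in for whatever model you actually want; the destination path assumes the default macOS location described earlier, and the "sideloaded" subfolder is just a convention for keeping manual copies separate from in-app downloads.

    import shutil
    from pathlib import Path
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    # Fetch one GGUF file from a Hugging Face repository (example repo/file).
    local_file = hf_hub_download(
        repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
        filename="llama-2-7b-chat.Q4_K_M.gguf",
    )

    # Copy it into LM Studio's models folder so it appears in the local list.
    models_dir = Path.home() / "Library/Application Support/LM Studio/models" / "sideloaded"
    models_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(local_file, models_dir / Path(local_file).name)
    print("Placed:", models_dir / Path(local_file).name)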

Frequently asked questions

Answers to the four questions asked most often about the LM Studio model library.