Browsing the model library inside LM Studio
The LM Studio model library lets you discover, compare, and download quantized GGUF models without leaving the application. Hardware-fit hints, file-size estimates, and quality indicators make the choice straightforward even on the first visit.
Brief Digest
The model library lives in the Discover tab of LM Studio. Search by name or family, read the hardware-fit badge before downloading, and pick a quantization that matches your RAM. Q4_K_M is the right default for most machines. Downloaded models appear in your local list instantly and can be loaded with a single click.
The Discover tab: searching the in-app catalog
The Discover tab is LM Studio's model browser. It connects to a curated index of Hugging Face repositories, surfaces quantized GGUF variants for each model, and shows estimated RAM requirements alongside each option.
When you open the Discover tab for the first time, a search field and a list of popular models greet you. The catalog draws from quantized repositories published on Hugging Face, primarily from community-maintained sources that specialize in producing GGUF-format files from official model releases. The search index covers model names, architecture families, and capability tags. Typing `llama` returns every Llama family variant in the catalog; typing `instruct` narrows results to instruction-tuned models across all families.
Each search result expands to show a list of quantization options. They are sorted by file size in descending order — the largest, highest-quality variants at the top, the most compressed at the bottom. Next to each row you see a file size, an estimated RAM or VRAM footprint, and a hardware-fit badge. The badge reads the system memory that LM Studio detected at launch and compares it against the estimated footprint to produce a green (comfortable), yellow (marginal), or red (likely over-limit) rating.
The search does not require an internet connection for models you have already downloaded. LM Studio maintains a local manifest of downloaded models and shows them in a separate "Downloaded" section even when offline. This is useful for air-gapped machines where the library is populated via external drive rather than direct download.
Reading hardware-fit hints
Hardware-fit hints save you from downloading a model your machine cannot load. Green means comfortable; yellow means possible with slower performance; red means the model exceeds available memory.
LM Studio calculates hardware-fit estimates at launch by reading available system RAM and, on GPU-accelerated builds, available VRAM. The estimate for each model variant is derived from its parameter count and quantization level: a Q4_K_M 7B model occupies roughly 5 GB in memory, so on a machine with 16 GB of total RAM and 8 GB available, that variant earns a green badge. The same model at Q8_0 (~7 GB) is a marginal fit against that 8 GB and may show yellow; a 13B Q4_K_M (~9 GB) would exceed the available memory and earn a red badge unless more RAM is freed first.
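LM Studio does not publish the exact heuristic it uses, but the shape of the calculation can be sketched: bits per weight times parameter count gives the weight footprint, plus an allowance for runtime overhead, compared against available memory. The bits-per-weight figures, overhead value, and badge thresholds below are illustrative assumptions, not the app's actual values.

```python
# Approximate average bits per weight for common GGUF quantization schemes.
# These are rough community figures, not values read from LM Studio.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.85,
    "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5, "FP16": 16.0,
}

def estimated_footprint_gb(params_billion: float, quant: str,
                           overhead_gb: float = 0.8) -> float:
    """Weight memory from bits/weight, plus a flat allowance for the
    KV cache and runtime overhead (the 0.8 GB figure is an assumption)."""
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb

def fit_badge(footprint_gb: float, available_gb: float) -> str:
    """Map an estimated footprint to a traffic-light rating.
    Thresholds are illustrative: green needs 25% headroom."""
    if footprint_gb <= 0.75 * available_gb:
        return "green"
    if footprint_gb <= available_gb:
        return "yellow"
    return "red"

# A Q4_K_M 7B model on a machine with 8 GB of RAM available:
footprint = estimated_footprint_gb(7, "Q4_K_M")   # roughly 5 GB
print(footprint, fit_badge(footprint, 8.0))        # green, matching the text
```

With these assumed numbers, the 7B Q4_K_M example from the paragraph above lands at about 5 GB and rates green against 8 GB of free memory, while a 13B Q4_K_M estimate exceeds it and rates red.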
The badges are estimates, not guarantees. Actual footprint depends on context length, KV cache size, and operating-system memory pressure at load time. If a yellow-badged model fails to load, reduce the context length in the load dialog or close other applications to free memory. A model that rates red against VRAM can sometimes still load if you offload fewer layers to the GPU and keep the remainder in system RAM, but generation speed will drop significantly.
Choosing a quantization: Q4 vs Q5 vs Q6 vs Q8
The quantization number controls the bit depth of model weights. Lower means smaller files and faster inference; higher means closer to original quality. The K-quant variants use a mixed-precision strategy that improves the quality-per-bit ratio over naive quantization.
For most everyday tasks — chat, summarization, code completion, Q&A — Q4_K_M strikes the best balance. It compresses a 7B model to around 4.4 GB, loads quickly into a mid-range GPU, and produces output that is indistinguishable from Q8 on typical prompts. The main cases where you will notice a difference are long, multi-step reasoning chains, precise arithmetic, and tasks where the model needs to maintain context across very long generations. For those tasks, Q5_K_M or Q6_K is worth the extra memory.
Q8_0 is the ceiling for local inference without going to full FP16. It is near-lossless on every benchmark and the right choice when you are evaluating a model for quality rather than deploying it for throughput. Full FP16 files exist in the ecosystem but are uncommon in the LM Studio library because the VRAM requirements make them impractical on consumer hardware — a 7B model at FP16 occupies 14 GB of VRAM, leaving nothing for the operating system or KV cache.
Q2_K and Q3_K are compressed enough to run very large models on constrained hardware. A Q3 70B model fits in about 30 GB, which is accessible on a machine with 32 GB of RAM. The cost is meaningful quality degradation on complex instructions. Use these variants when running a larger model at lower quality is preferable to running a smaller model at full quality — a judgment call that depends on the task.
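The file-size figures quoted above follow from simple arithmetic: parameter count times average bits per weight, divided by eight. A back-of-the-envelope calculator, using approximate per-scheme averages (exact averages vary slightly by model architecture):

```python
# Approximate average bits per weight for each quantization scheme.
# K-quants are mixed precision, so these are averages, not exact values.
BPW = {
    "Q2_K": 2.6, "Q3_K_S": 3.5, "Q4_K_M": 4.85,
    "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5, "FP16": 16.0,
}

def file_size_gb(params_billion: float, quant: str) -> float:
    """Estimated GGUF file size in GB: params * bits/weight / 8."""
    return round(params_billion * BPW[quant] / 8, 1)

for quant in BPW:
    print(f"7B {quant}: ~{file_size_gb(7, quant)} GB")
```

This reproduces the numbers in the text to within rounding: a 7B FP16 file is 14 GB, a 7B Q4_K_M lands near 4.2 GB, and a 70B Q3_K_S comes in around 30.6 GB.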
File sizes and the download process
Model downloads happen in the background inside LM Studio. A progress bar tracks each file, downloads can be paused, and the model appears in the local list immediately on completion.
File sizes span a wide range: a Q4_K_M 1B model may be under 1 GB; a Q4_K_M 70B model is roughly 40 GB. LM Studio displays the exact size before you click download so you can plan for disk space. The default storage location is a folder inside your user directory — on macOS this is ~/Library/Application Support/LM Studio/models; on Windows it is under %APPDATA%\LM Studio\models; on Linux it sits in ~/.cache/lm-studio/models. The location is configurable in Settings.
Downloads run in parallel with the rest of the application. You can start a chat with an already-loaded model while a new one downloads in the background. The Discover tab shows live progress with a transfer speed indicator. If the download is interrupted, LM Studio resumes from where it left off the next time you open the app and reconnect to the network — partial files are preserved between sessions.
Major model families in the library
Five model families dominate the LM Studio library: Llama, Mistral, Qwen, Phi, and Gemma. Each has distinct strengths, licensing terms, and hardware profiles worth knowing before you download.
| Family | Parameter sizes | License | Notes |
|---|---|---|---|
| Llama (Meta) | 1B, 3B, 8B, 70B, 405B | Llama community license | Most widely supported; largest ecosystem of fine-tunes and quantizations |
| Mistral | 7B, 8x7B (MoE), 22B | Apache 2.0 | Apache 2.0 permits commercial use; strong coding and instruction following |
| Qwen (Alibaba) | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Qwen license (varies by size) | Excellent multilingual coverage; strong coding models in the family |
| Phi (Microsoft) | 1.3B, 2.7B, 3.8B, 14B | MIT | Unusually capable for parameter count; MIT license permits broad use |
| Gemma (Google) | 2B, 7B, 9B, 27B | Gemma Terms of Use | Strong on reasoning benchmarks; check terms for commercial use cases |
License terms are a practical matter, not a formality. The Llama community license permits most personal and commercial use but includes attribution requirements and a user-count threshold above which additional terms apply. Mistral's Apache 2.0 license is the most permissive for commercial applications. Phi's MIT license is similarly broad. Qwen and Gemma licenses have model-specific terms that warrant a read before production deployment. The LM Studio library does not enforce license compliance — that responsibility rests with the user. See the FTC’s guidance on AI products and NIST’s AI Risk Management Framework for context on responsible AI deployment.
Loading models from outside the library
LM Studio loads any GGUF file from disk. Place the file in the models directory or drag it into the application window and it appears in the local model list without a separate import step.
Many practitioners source models directly from Hugging Face using the huggingface-cli download command or from a curated internal repository. The file just needs the .gguf extension and a valid GGUF header. Once it lands in the models directory, LM Studio picks it up automatically the next time the Discover tab refreshes, usually within a few seconds. No registration, manifest update, or restart is required. This path is also how offline and air-gapped deployments work: download models on a network-connected machine, transfer the GGUF files to an external drive, copy them to the target machine's models directory, and LM Studio finds them immediately.
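When copying files in by hand, it is cheap to verify the header before trusting the extension: every valid GGUF file begins with the four ASCII magic bytes `GGUF`. A minimal sanity check:

```python
GGUF_MAGIC = b"GGUF"  # first four bytes of every valid GGUF file

def looks_like_gguf(path: str) -> bool:
    """Cheap sanity check before copying a file into the models directory.
    Only inspects the magic bytes, not the full header or tensor data."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC
```

A renamed safetensors or pickle file will fail this check immediately, which is a faster failure mode than waiting for LM Studio to reject the file at load time.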
Frequently asked questions
Answers to the four questions asked most often about the LM Studio model library.
How do I find a model in LM Studio?
Open the Discover tab (magnifying-glass icon in the left navigation rail) and type a model name, family name, or capability keyword into the search field. Results update in real time. Each result expands to show all available quantization variants with hardware-fit badges, file sizes, and estimated RAM footprints.
What do the green, yellow, and red hardware-fit badges mean?
Green means the model variant's estimated RAM or VRAM footprint fits comfortably in your available memory. Yellow indicates a marginal fit where performance may be reduced due to layer offloading. Red means the model likely exceeds your memory and may fail to load or run very slowly. LM Studio derives these estimates from the model's parameter count, quantization, and the system memory detected at launch.
Which quantization should I choose?
Q4_K_M is the default recommendation for most machines and tasks. It halves the file size versus FP16 with minimal quality loss on chat, summarization, and coding tasks. Q5_K_M and Q6_K improve quality on reasoning-heavy tasks if you have the headroom. Q8_0 is near-lossless and best for evaluation or quality-sensitive workflows. Q2 and Q3 are aggressive compressions that trade quality for a much smaller footprint, useful for running very large models on constrained hardware.
Can I use GGUF models downloaded outside LM Studio?
Yes. LM Studio loads any GGUF file regardless of where it was downloaded. Place the file in your LM Studio models directory (shown in Settings › Models) and it appears in the local model list within seconds. You can also drag a GGUF file directly into the LM Studio window. No restart or import step is required.