A clear breakdown of the RAM, VRAM, CPU, and disk space needed to run LM Studio across different model sizes — from a compact 7B all the way up to a 70B parameter model.
LM Studio runs on 8 GB RAM (minimum) but is most comfortable on 16 GB+. GPU acceleration is optional — CPU-only works. Model size sets the real floor: 7B needs ~5 GB memory, 13B needs ~9 GB, 30B needs ~18 GB, 70B needs ~40 GB. Disk usage is dominated by models, not the app itself.
The LM Studio application itself is lightweight; the hardware floor is set by the models you choose to load, not by the app binary.
LM Studio the application consumes under 300 MB of disk space and roughly 200–400 MB of RAM at idle (before any model is loaded). That baseline runs on nearly any modern computer. The practical constraint is always the model: the moment you click "Load," the selected model's weights stream into memory, and the system requirements jump substantially depending on which model and quantization you chose.
The minimum supported operating systems are Windows 10 22H2 (x64), macOS 13 Ventura (Apple Silicon or Intel), and Linux via AppImage with kernel 5.15 or later. Any machine that meets those OS requirements and has 8 GB of total RAM can run the application and load a small quantized model. "Run" here means functional — usable token rates for practical work require more headroom than the absolute floor.
Quantization compresses model weights to lower bit depths, dramatically reducing the memory required to load a model at the cost of a small quality decrease that is often imperceptible in practice.
A 7B parameter model at full FP16 precision occupies roughly 14 GB. At Q4_K_M quantization, the same model drops to about 4.3 GB — a 3x reduction. That compression is why a 7B model fits comfortably on a 16 GB laptop while the same model at full precision would exceed available memory.
LM Studio's model browser shows the file size and a hardware-fit indicator before you commit to a download. Green means the model fits your detected hardware; yellow means it will work but is tight; red means you would need the layer offload slider to spill some layers to system RAM. Paying attention to those indicators before downloading saves time and frustration.
Q4_K_M is the most popular balance of size and quality. Q5_K_M and Q6_K trade slightly more disk and RAM for noticeably better output on instruction-following tasks. Q8_0 is nearly lossless against FP16 but doubles the memory footprint compared to Q4. Full FP16 is generally only practical for users with high-VRAM data center GPUs or very large unified-memory Mac configurations.
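To make these trade-offs concrete, here is a rough back-of-the-envelope size calculator. The bits-per-weight figures are approximations of typical GGUF quantization levels, not exact values; real file sizes vary by model architecture, and the loaded footprint runs higher than the file size once context buffers are allocated.

```python
# Rough GGUF footprint estimator. Bits-per-weight values are approximate
# averages for each quantization level; real files vary by architecture.
APPROX_BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.9,
}

def approx_size_gb(params_billions: float, quant: str) -> float:
    """Estimated weight size in decimal GB for the given quantization."""
    return params_billions * APPROX_BITS_PER_WEIGHT[quant] / 8

for quant in APPROX_BITS_PER_WEIGHT:
    print(f"7B at {quant}: ~{approx_size_gb(7, quant):.1f} GB")
# FP16 comes out to ~14 GB and Q4_K_M to ~4.3 GB, matching the figures
# above. Actual memory use while loaded is higher: the KV cache and
# runtime buffers add overhead on top of the weights.
```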
These figures represent practical experience across common hardware setups, not theoretical minimums — aim for the recommended column when choosing hardware for regular use.
For 7B models at Q4 quantization, 8 GB of RAM is the absolute floor and 16 GB is comfortable. On GPU-equipped machines, 6 GB of VRAM allows a fully GPU-resident load; 4 GB forces partial offload. Disk: 5 GB per model.
For 13B models at Q4, 16 GB of RAM is the floor and 24 GB is recommended. A 12 GB GPU loads the full model; 8 GB manages it with a few CPU layers. Disk: 9 GB per model.
For 30B models at Q4, 24 GB of RAM is the floor and 32 GB is recommended. A 24 GB GPU (RTX 3090 / 4090 class) handles them fully; below that, partial offload is needed. Disk: 18–20 GB per model.
For 70B models at Q4, 48 GB of RAM is the floor and 64 GB+ is recommended for comfortable operation. Dedicated GPU: a single 48 GB card works; consumer cards require substantial CPU offload. Disk: 40–45 GB per model.
NIST's AI Risk Management Framework (nist.gov) includes hardware evaluation as part of responsible AI deployment — relevant for organizations sizing infrastructure for production LM Studio setups.
Any reasonably modern x64 CPU works for CPU-only inference; AVX2 support is the practical baseline that determines CPU-mode speed.
LM Studio's CPU inference path uses AVX2 SIMD instructions on x64 processors. Virtually all Intel Core processors from 4th generation (Haswell, 2013) onward and AMD Ryzen processors from 1st generation onward support AVX2. Processors older than those can still run LM Studio but fall back to scalar code, which is substantially slower.
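If you are unsure whether an older machine has AVX2, you can check the CPU flags directly. The sketch below reads /proc/cpuinfo, so it only works on Linux; on Windows a tool such as CPU-Z reports the same flag, and on Intel Macs `sysctl machdep.cpu` does.

```python
# Linux-only AVX2 check: CPU feature flags are listed in /proc/cpuinfo.
def has_avx2() -> bool:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                # The flags line is identical across cores, so the
                # first one is enough.
                return "avx2" in line.split()
    return False

print("AVX2 supported:", has_avx2())
```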
For Apple Silicon, any M-series chip (M1 through M4 and their Pro/Max/Ultra variants) is well optimized. Inference runs on the GPU cores through the Metal backend, and the unified memory architecture gives the GPU access to the full RAM pool, so raw CPU core count matters less than on x64. An M1 MacBook Air with 8 GB of unified memory runs 7B models at rates many users find acceptable for casual use.
On Linux and Windows, a modern 8-core CPU significantly improves CPU-only inference over a 4-core chip. Hyperthreading adds less benefit than additional physical cores. If you plan to rely entirely on CPU inference for extended periods, an AMD Ryzen 7 or Intel Core i7 class processor from 2020 or later is a reasonable baseline.
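Since extra logical threads add little, a sensible starting thread count for CPU inference is the physical core count. One way to read it, assuming the third-party psutil package is installed:

```python
# Report physical vs. logical core counts; physical cores are the
# better guide for a CPU inference thread-count setting.
import os
import psutil  # third-party: pip install psutil

physical = psutil.cpu_count(logical=False)
logical = os.cpu_count()
print(f"physical cores: {physical}, logical cores: {logical}")
```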
Model files are large and benefit from fast SSD read speeds; spinning hard drives work but extend load times noticeably.
Model loading speed is directly proportional to storage throughput. An NVMe SSD loads a 7B model in under 5 seconds; a SATA SSD takes 10–20 seconds; a 5400 RPM hard drive can take 2–3 minutes for the same file. Since you often switch between models during exploration, fast storage has a real impact on workflow.
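One way to predict what loading will feel like is to time a sequential read of a model file you already have. A minimal sketch; the path is a placeholder, and the OS page cache can inflate the result if the file was read recently:

```python
# Time a sequential read of a large file to estimate storage throughput.
import time

path = "path/to/model.gguf"   # placeholder: any large model file
chunk = 16 * 1024 * 1024      # 16 MiB reads

start = time.perf_counter()
total = 0
with open(path, "rb") as f:
    while data := f.read(chunk):
        total += len(data)
elapsed = time.perf_counter() - start
print(f"read {total / 1e9:.1f} GB in {elapsed:.1f} s "
      f"({total / 1e9 / elapsed:.2f} GB/s)")
```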
Plan for at least 50 GB of free storage dedicated to models if you intend to keep several on hand. Power users keeping several model sizes downloaded should budget 200 GB or more. The LM Studio model directory can be pointed at any drive, including external SSDs — see the portable build page for notes on running entirely from external storage.
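To see how close you are to that budget, a short sketch that sums a models directory and reports free space on its drive (the directory path is a placeholder; the default location varies by OS):

```python
# Sum model storage and report free space on the same drive.
import shutil
from pathlib import Path

models_dir = Path("path/to/models")  # placeholder: your models folder

used = sum(p.stat().st_size for p in models_dir.rglob("*") if p.is_file())
free = shutil.disk_usage(models_dir).free
print(f"models on disk: {used / 1e9:.1f} GB, free: {free / 1e9:.1f} GB")
```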
| Model size | Min system RAM | Recommended VRAM | Approx disk per model |
|---|---|---|---|
| 7B parameters | 8 GB | 6 GB (4 GB with offload) | 4–5 GB |
| 13B parameters | 16 GB | 12 GB (8 GB with offload) | 8–9 GB |
| 30B parameters | 24 GB | 24 GB (16 GB with offload) | 18–20 GB |
| 70B parameters | 48 GB | 48 GB (24 GB with heavy offload) | 40–45 GB |
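The thresholds in the table can be turned into a simplified version of the fit indicator described earlier. The sketch below is an illustration of that idea, not LM Studio's actual logic; real fit also depends on quantization and context length.

```python
# Illustrative fit check built from the table above (NOT LM Studio's
# actual indicator logic). Values assume Q4 quantization.
# model size -> (min system RAM GB, VRAM GB for a full GPU load,
#                VRAM GB that still works with partial offload)
TABLE = {
    "7B":  (8, 6, 4),
    "13B": (16, 12, 8),
    "30B": (24, 24, 16),
    "70B": (48, 48, 24),
}

def fit(model: str, ram_gb: int, vram_gb: int) -> str:
    min_ram, full_vram, offload_vram = TABLE[model]
    if ram_gb < min_ram:
        return "below the minimum system RAM for this size"
    if vram_gb >= full_vram:
        return "fits fully in VRAM"
    if vram_gb >= offload_vram:
        return "loads with some layers offloaded to system RAM"
    return "mostly CPU-bound: expect slow token rates"

print(fit("13B", ram_gb=32, vram_gb=8))  # -> offloaded layers
```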
Common hardware questions from users evaluating LM Studio across different machine configurations.
How much RAM does LM Studio need?
LM Studio the application uses under 400 MB at idle. The model you load sets the real floor. A 7B model at Q4 quantization needs approximately 4–5 GB, so 8 GB of total system RAM is a functional minimum. 16 GB is strongly recommended for comfortable multitasking alongside a browser and other tools.
Do I need a dedicated GPU?
No. LM Studio runs on CPU-only hardware. A dedicated GPU with CUDA, ROCm, or Metal support speeds up inference considerably, but it is not required. On a modern quad-core CPU, a 7B Q4 model produces usable results — just at a slower token rate than GPU-accelerated setups.
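One way to sanity-check CPU-only throughput is to time a request against LM Studio's OpenAI-compatible local server. This assumes the server is running on its default port (1234) with a model already loaded, and that the response includes the usual usage token counts; the model name below is a placeholder, since the server answers with whichever model is loaded.

```python
# Rough tokens-per-second measurement against LM Studio's local server.
# Assumes the server is running on the default port with a model loaded.
import time
import requests

start = time.perf_counter()
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder; the loaded model responds
        "messages": [{"role": "user", "content": "Explain RAID 1 briefly."}],
        "max_tokens": 128,
    },
    timeout=300,
)
elapsed = time.perf_counter() - start
tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f} s (~{tokens / elapsed:.1f} tok/s)")
# Includes prompt-processing time, so treat the figure as a lower bound.
```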
How much VRAM does a 13B model need?
A 13B model at Q4_K_M quantization occupies roughly 8–9 GB of VRAM. A 12 GB GPU can host it fully. An 8 GB GPU can load most layers with a few spilling to system RAM using the layer offload slider. Q5_K_M at 13B comfortably fits on 12 GB VRAM with a small quality improvement over Q4.
How much disk space should I plan for?
The LM Studio application is under 300 MB. Model files dominate: a 7B Q4 model is 4–5 GB, a 13B Q4 is 8–9 GB, a 30B Q4 is 18–20 GB, and a 70B Q4 exceeds 40 GB. Plan for at least 50–100 GB free on the drive where you store models if you intend to keep several on hand.
Can I run LM Studio on a laptop?
Yes, and many users run LM Studio primarily on laptops. A 16 GB RAM laptop with a modern CPU handles 7B models well. Apple Silicon MacBooks are particularly capable — the unified memory architecture means 16 GB on an M2 MacBook Air drives 7B models at speeds that rival discrete GPU machines.