# Unlocking AI Potential: LM Studio Now Supports llama.cpp CPU Offload for MoE

The latest update to LM Studio (version 0.3.23, build 3) brings exciting news for AI enthusiasts: it now supports llama.cpp's CPU offload for Mixture of Experts (MoE) models, which moves the MoE expert weights to the CPU and frees up GPU VRAM for layer offload.

In practice, the CPU holds the bulky expert weights while the GPU accelerates the rest of the model. The result: large MoE models become usable on modest GPUs, with throughput well beyond what CPU-only inference can manage.
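Under the hood, LM Studio drives llama.cpp, and the same behavior can be reproduced when running llama.cpp directly. The sketch below is illustrative rather than LM Studio's own code: it assumes a local `llama-server` binary and a Qwen3 GGUF file (both paths are placeholders), and it uses the `--cpu-moe` flag that recent llama.cpp builds provide for pinning expert weights to the CPU.

```python
import subprocess

# Placeholder paths -- adjust these to your own setup.
LLAMA_SERVER = "./llama-server"                    # llama.cpp server binary
MODEL = "Qwen3-30B-A3B-Instruct-Q4_K_M.gguf"       # example GGUF file

# Offload the repeating layers to the GPU, but keep the per-expert
# FFN weights in system RAM so they don't consume VRAM.
subprocess.run([
    LLAMA_SERVER,
    "-m", MODEL,
    "-ngl", "99",     # offload as many layers as possible to the GPU...
    "--cpu-moe",      # ...while forcing MoE expert weights onto the CPU
    # On older builds, a tensor-override regex does the same job:
    # "-ot", r"\.ffn_.*_exps\.=CPU",
])
```

The same trade-off applies inside LM Studio: the expert weights live in system RAM, so generation speed leans heavily on memory bandwidth, while the GPU keeps the attention and KV-cache work fast.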

In fact, one user has already reported strong results running Qwen3 30B (both the thinking and instruct variants) on a 64 GB Ryzen 7 system with an RTX 3070 (8 GB VRAM). With the expert weights kept on the CPU and the remaining layers offloaded to the GPU, they achieved an impressive 15 tok/s.
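A rough back-of-envelope calculation shows why an 8 GB card can host a 30B-parameter MoE model this way. The parameter counts and quantization figures below are illustrative assumptions, not measured values:

```python
# Back-of-envelope VRAM estimate for a Qwen3-30B-class MoE model.
# All figures are assumptions for illustration, not measurements.
TOTAL_PARAMS = 30e9        # assumed total parameter count
NON_EXPERT_PARAMS = 2e9    # assumed attention/embedding/shared params
BITS_PER_PARAM = 4.5       # assumed Q4-class quantization

bytes_per_param = BITS_PER_PARAM / 8
expert_gb = (TOTAL_PARAMS - NON_EXPERT_PARAMS) * bytes_per_param / 1e9
gpu_gb = NON_EXPERT_PARAMS * bytes_per_param / 1e9

print(f"Expert weights kept in system RAM: ~{expert_gb:.1f} GB")
print(f"Non-expert weights on the GPU:     ~{gpu_gb:.1f} GB")
```

Under these assumptions, roughly 16 GB of expert weights sit in the 64 GB of system RAM, while the non-expert weights plus KV cache fit comfortably inside the RTX 3070's 8 GB of VRAM.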

This update has significant implications for AI development, enabling developers to push the boundaries of what’s possible with MoE models.

## What This Means for AI Development
- **Faster generation on limited hardware**: Keeping the bulky expert weights in system RAM lets the GPU accelerate the attention and dense layers, so MoE models run far faster than they would on the CPU alone.
- **More efficient resource utilization**: Freed GPU VRAM can go toward more offloaded layers or a larger context window instead of holding weights that fit comfortably in system RAM.
- **New possibilities for MoE models**: Large MoE models that previously demanded high-VRAM GPUs become practical on consumer hardware, opening up new applications and use cases.

## Get Started with LM Studio
If you’re interested in exploring the latest updates to LM Studio, head over to their website to learn more.

*Further reading: [LM Studio Documentation](https://lmstudio.ai/docs)*
