Imagine being able to process and understand massive amounts of text with ease. That's exactly what the latest update to the Qwen3 models delivers: Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 have been upgraded with ultra-long context support, extending their window to 1 million tokens. That is a significant leap in long-context capability.
But what makes this possible? Two complementary techniques: Dual Chunk Attention (DCA) and MInference. DCA is a length extrapolation method that splits long sequences into chunks so that the relative distances the model must encode stay within the range it saw during training, while preserving global coherence across chunks. MInference is a dynamic sparse attention mechanism that identifies the most important token interactions and computes only those, cutting the overhead of full quadratic attention and boosting inference speed. Both ideas are sketched below.
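To make those two ideas concrete, here is a minimal NumPy sketch of each effect. This is an illustration of the concepts, not the models' actual kernels: real DCA uses three distinct position maps (intra-chunk, inter-chunk, and successive-chunk), and MInference estimates its sparse pattern cheaply online per attention head rather than from a full score matrix. The function names and the chunk and block sizes below are ours, chosen for readability.

```python
import numpy as np

def dca_style_distances(seq_len: int, chunk_size: int) -> np.ndarray:
    """Toy version of DCA's core effect: cap query-key relative distances
    so they never exceed the range the model saw in training, no matter
    how long the input grows. (Causal masking is assumed to happen
    elsewhere; the real method uses separate chunk-wise position maps.)"""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    raw = q - k                             # grows without bound at 1M tokens
    return np.clip(raw, 0, chunk_size - 1)  # stays inside the trained window

def minference_style_block_mask(scores: np.ndarray, block: int, keep: int) -> np.ndarray:
    """Toy version of MInference's core effect: keep only the `keep`
    highest-scoring key blocks per query block and skip the rest, so cost
    scales with the sparse pattern instead of the full quadratic matrix.
    (Here we pool a full score matrix just to keep the demo short; in
    practice the pattern is estimated without materializing it.)"""
    nb = scores.shape[0] // block
    pooled = scores[: nb * block, : nb * block].reshape(nb, block, nb, block).mean(axis=(1, 3))
    top = np.argsort(pooled, axis=1)[:, -keep:]   # best key blocks per query block
    mask = np.zeros((nb, nb), dtype=bool)
    mask[np.arange(nb)[:, None], top] = True
    return mask

# With chunk_size=8, a key hundreds of thousands of positions away is still
# encoded with a distance of at most 7:
print(dca_style_distances(12, chunk_size=8)[-1])
rng = np.random.default_rng(0)
print(minference_style_block_mask(rng.standard_normal((12, 12)), block=4, keep=2))
```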
The result? Up to 3× faster inference on sequences approaching 1 million tokens. And the best part? The models are fully compatible with vLLM and SGLang, so the long-context configuration slots into existing deployments.
If you're interested in trying this out, the updated model cards explain how to enable 1M-token processing, and the Hugging Face and ModelScope repositories host the checkpoints. A rough deployment sketch follows below.
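For a rough picture of what deployment might look like, here is a sketch using vLLM's offline Python API. The repository id, context length, eager-mode setting, and the dual-chunk attention backend variable follow the pattern described in the model cards, but treat all of them as assumptions and defer to the card's exact recipe for your hardware and vLLM version.

```python
import os

# Assumption from the model-card recipe: select vLLM's dual-chunk flash
# attention backend before the engine is created. Verify the variable name
# and value against the model card for your vLLM version.
os.environ["VLLM_ATTENTION_BACKEND"] = "DUAL_CHUNK_FLASH_ATTN"

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-30B-A3B-Instruct-2507",  # assumed repo id; check Hugging Face
    max_model_len=1_010_000,   # headroom above 1M tokens, following the card's example
    tensor_parallel_size=4,    # scale to your GPU count
    enforce_eager=True,        # the long-context recipe runs in eager mode
)

outputs = llm.generate(
    ["<paste a very long document here>\n\nSummarize the key findings."],
    SamplingParams(temperature=0.7, max_tokens=512),
)
print(outputs[0].outputs[0].text)
```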
The possibilities are wide open with these upgraded models. What will you do with a million tokens of context?