Have you ever wondered how large language models (LLMs) process abstract, fluid concepts? One intriguing idea, often called ‘Soft Thinking’, is to let these models generate soft, abstract tokens so that reasoning unfolds in a continuous concept space rather than over discrete tokens. Surprisingly, though, research has found that these soft tokens often underperform traditional ‘hard’ tokens.
A recent paper delves into the ‘Soft Thinking’ capabilities of various LLMs, examining their internal behavior using a suite of probing techniques. The findings reveal that LLMs predominantly rely on the most influential component of the soft inputs during subsequent decoding steps, which hinders the exploration of different reasoning paths.
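To make the setup concrete, here is a minimal sketch (in PyTorch) of how a soft input is typically formed in Soft Thinking-style decoding: instead of sampling a single token, the next-token distribution is used to mix the corresponding token embeddings. The function and tensor names are illustrative assumptions, not taken from the paper.

```python
import torch

def soft_input_embedding(logits: torch.Tensor, embedding_table: torch.Tensor) -> torch.Tensor:
    """Build a soft input: a probability-weighted mixture of token embeddings.

    logits:          (vocab_size,) next-token logits at the current step
    embedding_table: (vocab_size, hidden_dim) the model's input embedding matrix
    returns:         (hidden_dim,) continuous embedding fed back as the next input
    """
    probs = torch.softmax(logits, dim=-1)   # the "concept" distribution over the vocabulary
    return probs @ embedding_table          # expected embedding under that distribution
```

The paper’s probing results suggest that later computation is dominated by the single largest component of this mixture, which is why vanilla Soft Thinking struggles to explore alternative reasoning paths.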
To tackle this issue, the paper explores sampling strategies that inject randomness into the soft inputs, such as Dirichlet resampling and the Gumbel-Softmax trick. The experiments show that this added randomness mitigates the limitations of vanilla Soft Thinking and unlocks its potential. Notably, the Gumbel-Softmax trick supplies sufficient randomness while keeping the smoothness of the mixture under control, yielding superior performance across eight reasoning benchmarks.
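The two strategies can be sketched as drop-in replacements for the plain softmax weights. The snippet below is a hedged illustration assuming access to the per-step logits; the temperature and concentration values are placeholders, not the paper’s tuned settings.

```python
import torch

def gumbel_softmax_weights(logits: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Sample mixture weights with the Gumbel-Softmax trick.

    Adding i.i.d. Gumbel(0, 1) noise to the logits before the softmax injects
    randomness into which components dominate, while the temperature tau controls
    how smooth (large tau) or near-one-hot (small tau) the weights are.
    """
    uniform = torch.rand_like(logits).clamp_min(1e-20)   # avoid log(0)
    gumbel_noise = -torch.log(-torch.log(uniform))       # Gumbel(0, 1) samples
    return torch.softmax((logits + gumbel_noise) / tau, dim=-1)


def dirichlet_resampled_weights(logits: torch.Tensor, concentration: float = 100.0) -> torch.Tensor:
    """Resample mixture weights from a Dirichlet centered on the model's distribution."""
    probs = torch.softmax(logits, dim=-1).clamp_min(1e-6)  # Dirichlet needs positive parameters
    return torch.distributions.Dirichlet(concentration * probs).sample()
```

Either way, the sampled weights replace the plain softmax probabilities in the mixture above, so successive soft inputs can wander across different reasoning paths instead of collapsing onto the single strongest component.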
This research has significant implications for the development of more advanced LLMs that can better handle abstract concepts and facilitate more effective reasoning.