The Secret Sauce to Achieving ASI: Grading Its Own Reasoning

Have you heard about the latest breakthrough in AI development? The new OpenAI model has achieved an incredible feat, winning an IMO gold medal. But what's truly remarkable is the secret behind its success: the model's ability to grade its own reasoning within its Chain of Thought (CoT). Each step in the CoT gets a confidence score, which serves as a reward signal for reinforcement learning (RL). This approach was met with skepticism initially, but it paid off spectacularly.
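
To make that concrete, here is a minimal sketch of the idea. The bracketed `[confidence: x]` annotation and the mean-confidence reward are both my inventions for illustration; OpenAI has not published its actual annotation format or reward shaping:

```python
import re

def extract_step_confidences(cot_text: str) -> list[float]:
    # Parse per-step self-grades from a CoT trace. The
    # '[confidence: 0.8]' format is invented for this sketch;
    # the real annotation scheme has not been published.
    return [float(m) for m in re.findall(r"\[confidence:\s*([01]?\.\d+)\]", cot_text)]

def reward_from_self_grades(confidences: list[float]) -> float:
    # One plausible reward shaping: average the model's own per-step
    # confidence so trajectories it believes in score higher. This is
    # a stand-in, not the disclosed reward function.
    if not confidences:
        return 0.0
    return sum(confidences) / len(confidences)

trace = (
    "Step 1: Factor the expression. [confidence: 0.9]\n"
    "Step 2: Apply the bound from step 1. [confidence: 0.6]\n"
)
print(reward_from_self_grades(extract_step_confidences(trace)))  # -> 0.75
```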

I believe this is a crucial stepping stone towards achieving Artificial Super Intelligence (ASI), or at least the next major milestone. The model learns to ask itself 'How solid is this step?' and uses that self-grade, plus parallel sampling, to search over long reasoning trajectories.
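
The exact search procedure is not public, but a best-of-N loop conveys the gist: sample many trajectories in parallel, self-grade each, keep the best. Everything below (`sample_trajectory`, `self_grade`, the choice of N) is a placeholder sketch, not the real system:

```python
import random

def sample_trajectory(prompt: str, steps: int = 4) -> list[str]:
    # Placeholder for the model generating one chain-of-thought rollout.
    return [f"{prompt} / step {i + 1}" for i in range(steps)]

def self_grade(trajectory: list[str]) -> float:
    # Placeholder self-grader: in the real system the model scores its
    # own steps; random values stand in so the search logic is runnable.
    return sum(random.random() for _ in trajectory) / max(len(trajectory), 1)

def best_of_n(prompt: str, n: int = 16) -> tuple[list[str], float]:
    # Sample n trajectories and keep the one the model grades highest:
    # the simplest form of self-grade-guided search.
    candidates = [sample_trajectory(prompt) for _ in range(n)]
    return max(((t, self_grade(t)) for t in candidates), key=lambda pair: pair[1])

best_traj, score = best_of_n("Prove the inequality holds for all n")
print(f"kept trajectory with self-grade {score:.2f}")
```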

This new technique also lets the model express its certainty in natural language as it thinks, often saying 'no answer' rather than hallucinating. It's fascinating to see the model's thought process: lots of 'good!' when it is confident, and question marks when it is not.
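
As a toy illustration of how such markers could drive abstention, here is a hypothetical heuristic that counts 'good!' against question marks and prefers 'no answer' when doubt wins. The real model's behavior is learned, not rule-based; this is only a sketch of the abstention idea:

```python
def abstain_or_answer(cot_text: str, answer: str) -> str:
    # Toy abstention rule built on natural-language certainty markers.
    # Counting 'good!' against '?' is a hypothetical heuristic, not the
    # model's actual mechanism; the point is preferring 'no answer'
    # over a confident-sounding guess when doubt dominates the trace.
    confident = cot_text.lower().count("good!")
    uncertain = cot_text.count("?")
    if "no answer" in cot_text.lower() or uncertain > confident:
        return "no answer"
    return answer

trace = "Try induction? Hmm, the base case fails? good!"
print(abstain_or_answer(trace, "n = 42"))  # -> 'no answer' (2 '?' vs 1 'good!')
```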

This development has significant implications for the AI community. If we can teach AI models to evaluate their own reasoning, we might be able to create more reliable and accurate AI systems. It’s an exciting time for AI research, and I’m eager to see where this breakthrough takes us.

If you’re interested in learning more, I recommend checking out the latest podcast episode that covers this topic in detail.
