GPT-5 Outperforms Other AI Models on a Probability Problem | Ranjan Kumar

Have you ever wondered how different AI models perform when faced with a tricky probability problem? A recent experiment put four free LLMs to the test on exactly that, and the results varied widely in both speed and correctness.

The problem: Jake has 200 black, 400 white, and 600 green marbles in a container. He draws the marbles one by one without putting any back. What is the probability that at least 1 white and 1 green marble remain in the container right after the last black marble is drawn?

The four models (DeepSeek R1, GPT-4o, Grok 3, and GPT-5) were given the same problem with identical wording. Here’s how they fared:

DeepSeek R1 took over 15 minutes to respond and eventually produced a 13,000-word essay that arrived at the wrong answer. GPT-4o responded quickly but also got the wrong answer. Grok 3 took three minutes and arrived at the correct answer, 7/12. And GPT-5? It took a mere 12 seconds and not only got the correct answer but also provided a clear, concise explanation using the inclusion-exclusion principle.
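The post does not reproduce GPT-5’s explanation, so here is a minimal sketch of one way an inclusion-exclusion argument can reach 7/12; it is not necessarily the model’s own wording. Let A be "the last white marble comes after the last black one" and B the same for green. The event in the problem is A and B together, and P(A and B) = P(A) + P(B) - P(A or B). The function names below are illustrative, and the brute-force check runs on a scaled-down container (1 black, 2 white, 3 green), which works because the formula depends only on the ratios of the counts.

```python
from fractions import Fraction
from itertools import permutations


def p_black_exhausted_first(black, white, green):
    """P(at least one white and one green remain when the last black is drawn)."""
    # A: the last white comes after the last black. Looking only at black and
    # white marbles, each of them is equally likely to be the last one drawn.
    p_a = Fraction(white, black + white)
    # B: the same argument for green versus black.
    p_b = Fraction(green, black + green)
    # "A or B" fails only when a black marble is the very last marble overall.
    p_a_or_b = 1 - Fraction(black, black + white + green)
    # Inclusion-exclusion: P(A and B) = P(A) + P(B) - P(A or B).
    return p_a + p_b - p_a_or_b


def brute_force(black, white, green):
    """Exact check by enumerating every draw order (small counts only)."""
    marbles = "B" * black + "W" * white + "G" * green
    orders = list(permutations(marbles))  # marbles treated as distinguishable
    hits = 0
    for order in orders:
        # Position of the last marble of each color in this draw order.
        last = {c: max(i for i, m in enumerate(order) if m == c) for c in "BWG"}
        if last["B"] < last["W"] and last["B"] < last["G"]:
            hits += 1
    return Fraction(hits, len(orders))


answer = p_black_exhausted_first(200, 400, 600)
small_check = brute_force(1, 2, 3)  # same 1:2:3 ratio, only 720 orders
print(answer, small_check)  # 7/12 7/12
```

For the original counts the three terms are 2/3, 3/4, and 5/6, so the result is 8/12 + 9/12 - 10/12 = 7/12, matching the answer Grok 3 and GPT-5 gave.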

This is a single problem, so the result is not meant to imply that one model is inherently better than the others. It does, however, demonstrate GPT-5’s capabilities in this specific context: a correct, clearly explained answer in 12 seconds. It’s also a reminder that AI models are constantly evolving and improving, and their potential applications are vast.

So, what do you think? Are you impressed by GPT-5’s performance, or do you think other models will catch up soon?
