Imagine 100+ large language models playing each other in a conversational version of the Prisoner’s Dilemma. Sounds like a wild experiment, right? Well, someone actually did it, and the results are fascinating.
The game was set up with a standard Prisoner’s Dilemma payoff matrix, and each model played 100 matches against every other model. What’s surprising is that as the models got larger, they became less likely to defect (choose the option that benefits them at their counterpart’s expense). You’d think that bigger models would be more selfish, but it seems they’re actually more cooperative.
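To make the setup concrete, here’s a minimal Python sketch of a standard payoff matrix and per-round scoring. The post doesn’t state the exact point values used, so the canonical 5/3/1/0 payoffs are assumed here purely for illustration:

```python
# Hypothetical payoff table for one round of the Prisoner's Dilemma.
# The actual values used in the experiment aren't given in the post;
# the canonical T=5, R=3, P=1, S=0 payoffs are assumed here.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),  # mutual cooperation (R, R)
    ("cooperate", "defect"):    (0, 5),  # sucker's payoff vs. temptation (S, T)
    ("defect",    "cooperate"): (5, 0),  # temptation vs. sucker's payoff (T, S)
    ("defect",    "defect"):    (1, 1),  # mutual defection (P, P)
}

def score_round(move_a: str, move_b: str) -> tuple[int, int]:
    """Return the points each player earns for one round."""
    return PAYOFFS[(move_a, move_b)]
```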
The data shows that smaller models performed better in the game, achieving higher ratings than their larger counterparts. That tracks with the payoff structure: a defector earns more against a cooperator than the cooperator does, so the defection-prone smaller models could exploit their more cooperative larger opponents. This raises interesting questions about how we design and train these models. Are we inadvertently promoting selfish behavior in our AI systems?
The experiment was built in Python around a conversational formulation of the Prisoner’s Dilemma. You can check out the full details, including the data and methods used, at dilemma.critique-labs.ai.
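For a rough feel of how a round-robin tournament like this could be driven, here’s a short sketch. The `ask_model` stub, the random moves, and the raw point totals are all stand-ins of my own; the actual conversational prompts, match format, and rating method are the ones documented at dilemma.critique-labs.ai:

```python
import itertools
import random
from collections import defaultdict

# Same canonical payoff values assumed in the sketch above.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def ask_model(model: str, opponent: str) -> str:
    """Placeholder for the conversational prompt-and-parse step: the real
    experiment queries each LLM and extracts a cooperate/defect decision.
    A random choice keeps this sketch runnable on its own."""
    return random.choice(["cooperate", "defect"])

def run_tournament(models: list[str], matches_per_pair: int = 100) -> dict[str, int]:
    """Round-robin: every pair of models plays `matches_per_pair` matches,
    with total points accumulating per model."""
    totals: dict[str, int] = defaultdict(int)
    for a, b in itertools.combinations(models, 2):
        for _ in range(matches_per_pair):
            move_a, move_b = ask_model(a, b), ask_model(b, a)
            pts_a, pts_b = PAYOFFS[(move_a, move_b)]
            totals[a] += pts_a
            totals[b] += pts_b
    return dict(totals)

if __name__ == "__main__":
    print(run_tournament(["model-small", "model-medium", "model-large"]))
```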
What do you think? Are you surprised by these results, or do you think there’s something more going on here?