Is Model Collapse Already Happening in Large Language Models?

Have you noticed that large language models (LLMs) seem to be getting worse? I’m not the only one who’s picked up on this trend. The telltale signs of AI slop are becoming increasingly easy to spot: constructions like ‘it’s not X, it’s Y’, overuse of em dashes, flowery corporate language, overformatting, and Markdown where none is needed.
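
These markers are mechanical enough that you could sketch a crude detector for them. Here is a minimal Python sketch of that idea; the patterns, the slop_score density metric, and the 2.0 threshold are all illustrative assumptions of mine, not a validated classifier:

```python
import re

# Hypothetical patterns for the slop markers named above; illustrative
# guesses, not a validated classifier.
SLOP_PATTERNS = [
    # the "it's not X, it's Y" construction (comma, semicolon, or em dash)
    re.compile(r"\bit[\u2019']?s not .{1,60}?[,;\u2014]\s*it[\u2019']?s\b", re.IGNORECASE),
    # em dashes
    re.compile("\u2014"),
    # a few flowery/corporate tells
    re.compile(r"\b(?:delve|tapestry|leverage|synergy)\b", re.IGNORECASE),
    # Markdown headings or bold in what should be plain prose
    re.compile(r"(?m)^#{1,6} |\*\*[^*\n]+\*\*"),
]

def slop_score(text: str) -> float:
    """Marker hits per 100 words: a rough density, not a probability."""
    words = max(len(text.split()), 1)
    hits = sum(len(p.findall(text)) for p in SLOP_PATTERNS)
    return 100.0 * hits / words

def looks_like_slop(text: str, threshold: float = 2.0) -> bool:
    # The threshold is an arbitrary assumption; a real pipeline would
    # tune it against human-labeled examples.
    return slop_score(text) >= threshold
```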

It’s as if LLMs are mimicking human writing but producing something unnatural and awkward to read. Why is this happening? Is it because data scrapers are ingesting ever-growing amounts of LLM-generated text, which is then fed back into training, so that each new generation of models learns partly from the output of the last while AI-generated content keeps becoming more prevalent online?

I suspect that model collapse, the degradation that sets in when models are trained recursively on their own outputs and the tails of the original human-written data distribution gradually disappear, might already be happening. If we don’t find an efficient way to filter AI slop out of training datasets, the writing quality and capabilities of LLMs will continue to decline. It’s a vicious cycle that could have serious implications for the future of AI-generated content.
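
Here is what that kind of filtering could look like at the dataset level, reusing the hypothetical looks_like_slop() heuristic from the sketch above on a toy two-document "corpus":

```python
# Toy illustration: applying the hypothetical looks_like_slop() filter
# from the earlier sketch to a small "scraped" corpus.
docs = [
    "It\u2019s not magic \u2014 it\u2019s a **game-changing** way to leverage synergy.",
    "I fixed the bug by clamping the index before the array lookup.",
]

def filter_corpus(documents):
    """Split documents into kept (probably human) and dropped (probably slop)."""
    kept = [d for d in documents if not looks_like_slop(d)]
    return kept, len(documents) - len(kept)

clean, n_dropped = filter_corpus(docs)
print(f"kept {len(clean)} of {len(docs)}, dropped {n_dropped} as likely slop")
# -> kept 1 of 2, dropped 1 as likely slop
```

Of course, heuristics this crude would be trivially gamed and would misfire on human writers who genuinely love em dashes; any real pipeline would need far more robust detection.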

What do you think? Have you noticed this trend, and do you think it’s a cause for concern?
