Imagine having access to 4.8 million real user-chatbot conversations. With the release of WildChat-4.8M, that is now possible. The dataset is a substantial resource for anyone working on conversational AI, language models, or chatbots.
The dataset comes in two versions: a public version with 3.2 million non-toxic conversations, and a gated full version that adds 1.5 million toxic conversations and is available for approved research use cases. WildChat-4.8M also includes 122,000 conversations with reasoning models, which show how people use these models on real-world problems.
The creators of WildChat-4.8M aimed to fill a gap in open datasets, where real user prompts are scarce. Large language model companies often have access to such data, but it’s rarely shared with the open-source community. With WildChat-4.8M, the conversational AI community can now tap into a wealth of real-world conversations.
Access the non-toxic public version at https://hf.co/datasets/allenai/WildChat-4.8M and the gated full version at https://hf.co/datasets/allenai/WildChat-4.8M-Full. Explore the dataset with the WildVisualizer tool at https://wildvisualizer.com.
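If you would rather poke at the public version from code than download it in full, here is a minimal sketch using the Hugging Face `datasets` library in streaming mode. The repo id comes from the URL above; the `train` split name and the `model` field are assumptions about the dataset's layout, so check the dataset card before relying on them. The helper names `stream_public` and `count_by_model` are made up for this example.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator

# Repo id taken from the public dataset URL in this post.
PUBLIC_REPO = "allenai/WildChat-4.8M"


def stream_public(split: str = "train") -> Iterator[dict]:
    """Lazily stream records from the public version.

    Needs `pip install datasets` and network access.
    The split name "train" is an assumption, not confirmed by the post.
    """
    # Imported here so count_by_model() works without the library installed.
    from datasets import load_dataset

    return iter(load_dataset(PUBLIC_REPO, split=split, streaming=True))


def count_by_model(records: Iterable[dict], limit: int = 1000) -> Counter:
    """Count the first `limit` records by their "model" field.

    The "model" field name is an assumption; records without it
    are counted under "unknown".
    """
    counts: Counter = Counter()
    for record in islice(records, limit):
        counts[record.get("model", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Look at a small sample instead of downloading millions of conversations.
    print(count_by_model(stream_public(), limit=1000))
```

Streaming keeps the first look cheap: only the records you iterate over are fetched. The gated full version will not load this way until your access request has been approved.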
WildChat-4.8M gives conversational AI researchers and developers a large body of real user conversations to learn from. Use it to study how people actually talk to chatbots, and to improve your own.