I recently put ChatGPT-5 to the test by asking it to spell some unusual words: rscheinlichkeit, enschappelijke, ziehungsweise, sprechpartner, and enschappelijk. It got every one of them wrong, which points to an ongoing issue with tokenization in language models like GPT-5.
The failure isn’t surprising. These models process text as tokens (multi-character chunks) rather than as individual letters, so they never directly see how a word is spelled. It’s a reminder that even the most advanced AI models have limitations, and that tokenization in particular can be a stumbling block for rare or obscure words.
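To make the tokenization point concrete, here is a minimal toy sketch in Python. It is not GPT-5's real tokenizer: the vocabulary `TOY_VOCAB` and the function `toy_tokenize` are hypothetical, and the greedy longest-match rule is a simplification standing in for the byte-pair encoding that production tokenizers use. It only illustrates the idea that, assuming a string like rscheinlichkeit is a single vocabulary entry, the model receives one opaque unit instead of fifteen letters.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
# Real tokenizers have vocabularies with on the order of 100k+ entries.
TOY_VOCAB = {"Wah", "rscheinlichkeit", "An", "sprechpartner"}


def toy_tokenize(text, vocab):
    """Split text by greedy longest-match against vocab.

    Characters that match no vocabulary entry become single-character tokens.
    This is a simplification, not real byte-pair encoding.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry starts here: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


whole = toy_tokenize("rscheinlichkeit", TOY_VOCAB)
spaced = toy_tokenize(" ".join("rscheinlichkeit"), TOY_VOCAB)

print(len(whole))   # the whole word is one token in this toy vocabulary
print(len(spaced))  # the spaced-out version yields one token per character
```

In this toy setup the unspaced word collapses into a single token, while the version with spaces between letters is split character by character. That is the intuition behind asking a model to work on a word with its letters separated, though whether that helps with a real model is something to test rather than assume.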
I’m curious to know if anyone else has encountered similar issues with GPT-5 or other language models. Have you found any workarounds or strategies to help them better handle unusual words?
Let’s discuss!