Hacker News

I don't think there's a direct link to the tokenizer - it's a higher-level capability. You can stitch together a nonsense word out of common "word fragment" tokens and see whether that impairs the LLM's ability to recognize the word as nonsense.
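A minimal stdlib sketch of that experiment - the fragment list here is illustrative, not actual tokenizer output, and a real test would pull fragments from the model's own vocabulary:

```python
import random

# Hypothetical list standing in for common BPE "word fragment" tokens;
# real tokenizer vocabularies differ.
fragments = ["ing", "tion", "pre", "ter", "con", "ble", "ment", "ous"]

random.seed(0)  # fixed seed so the example is reproducible
# Glue three fragments into a plausible-looking nonsense word,
# then ask the model whether it's a real English word.
nonsense = "".join(random.sample(fragments, 3))
print(nonsense)
```

Because every piece is a common subword, the tokenizer sees only familiar tokens, so any failure to flag the word as nonsense can't be blamed on unfamiliar token boundaries.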


That is wrong. I just generated 5 random letters in Python and sent them to GPT-5, and it totally failed to answer properly: it said "Got it, what's up :)" even though what I wrote isn't recognizable at all.

The "capability" you see is for the LLM to recognize its a human typed random string since human typed random strings are not very random. If you send it an actual random word then it typically fails.


I tried this four times, every time it recognized it as nonsense.


Same



