The AI Dungeon thing is a salient point - GPT-2 (which it started out with) could barely make a coherent paragraph, and GPT-3 could write about half a page of text that more or less made sense.
Modern LLMs still get tripped up occasionally, but you can go for pages without even a minor detail failing to make sense.
Something similar might happen with these game models, given enough time.
AI Dungeon was the text equivalent of the Minecraft thing in TFA. I still remember the distinct feeling I had after getting immersed in the interactive story experience for an hour - for the rest of the day, I felt like I had woken up from an intense fever dream.