Honestly, at this point my favorite format is JSONLines (one JSON object per line).
It instinctively feels horrible, but it's easy to create and parse in basically every language, easy to fully specify, recovers well from one broken line in large datasets, and chops up and concatenates easily.
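A minimal sketch of the "recovers from one broken line" and "concatenates easily" points, using only Python's standard `json` module. The helper names `parse_jsonl` and `to_jsonl` and the sample records are hypothetical, for illustration only. It splits on `\n` rather than `str.splitlines()`, because `splitlines()` also breaks on characters like U+2028 that `json.dumps(..., ensure_ascii=False)` can leave raw inside a string.

```python
import json

def parse_jsonl(text):
    """Parse JSON Lines text, skipping malformed lines.

    Returns (records, bad_line_numbers). Blank lines are ignored.
    """
    records, bad = [], []
    # Split on "\n" only; json.loads tolerates a trailing "\r" as whitespace.
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(lineno)  # one bad line doesn't poison the rest
    return records, bad

def to_jsonl(records):
    # json.dumps without indent never emits a raw newline, so each record
    # stays on one line; the trailing "\n" lets chunks be concatenated as-is.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)

# Two independently written chunks, with a truncated record between them.
part_a = to_jsonl([{"fi": "kissa", "en": "cat"}])
part_b = to_jsonl([{"fi": "koira", "en": "dog"}])
corrupted = part_a + '{"fi": "talo", "en": \n' + part_b

records, bad = parse_jsonl(corrupted)
print(records)  # [{'fi': 'kissa', 'en': 'cat'}, {'fi': 'koira', 'en': 'dog'}]
print(bad)      # [2]
```

Concatenation is plain string (or file) appending, and a corrupt line costs only that one record, which is the property a single large JSON array doesn't have.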
I second this. I'm using JSONL to bake in the data for my single-binary Finnish-to-English pocket dictionary ( https://github.com/hiAndrewQuinn/tsk ). It makes data transformations so easy, especially with jq.
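Part of why transformations are easy is that each line can be handled on its own, streaming, without loading the whole dataset. Below is a rough Python sketch of the kind of per-line projection one might otherwise write as a jq filter. The function name `transform_jsonl` and the fields `fi`, `en`, `pos` are hypothetical and are not taken from tsk's actual data format.

```python
import io
import json

def transform_jsonl(src, dst, keep_fields):
    """Stream JSON Lines from src to dst, keeping only keep_fields.

    Reads one line at a time, so memory use doesn't grow with input size.
    Records that contain none of the kept fields are dropped.
    """
    for line in src:
        if not line.strip():
            continue
        record = json.loads(line)
        projected = {k: record[k] for k in keep_fields if k in record}
        if projected:
            dst.write(json.dumps(projected, ensure_ascii=False) + "\n")

# In-memory stand-ins for real input/output files.
src = io.StringIO('{"fi": "kissa", "en": "cat", "pos": "noun"}\n{"pos": "verb"}\n')
dst = io.StringIO()
transform_jsonl(src, dst, ["fi", "en"])
print(dst.getvalue(), end="")  # {"fi": "kissa", "en": "cat"}
```

Because the output is JSONL too, steps like this chain naturally, which is the same property that makes piping through jq pleasant.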