Infrastructure is good, but some of the lag could be fixed in code. For example: the page could already know whether a given answer is correct and respond without needing to hit the server at all. That would give instantaneous feedback, which the user can digest while the next audio file is being cached. As it is, having to wait 5-10 seconds for my button presses to register is really killing the enjoyment of what is otherwise a very cool concept.
You don't want the correct answer on the client before the answer is submitted, because it would be trivial to cheat by extracting it with a little bit of reverse engineering.
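
To make that concrete, here is a rough sketch of keeping the answer on the server. I don't know how the site is actually built, so every name here (`Question`, `client_payload`, `check_answer`, the `q1` data) is made up for illustration. The point is only that the payload sent to the browser leaves out the correct choice, and the correct choice is revealed only in the response to a submission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    # Hypothetical server-side record; field names are assumptions.
    question_id: str
    audio_url: str
    choices: tuple
    correct_choice: str  # stays on the server until an answer is submitted


def client_payload(question):
    """Build what gets sent to the browser: everything except the answer."""
    return {
        "question_id": question.question_id,
        "audio_url": question.audio_url,
        "choices": list(question.choices),
    }


def check_answer(questions, question_id, submitted):
    """Grade a submission on the server and only then reveal the answer."""
    question = questions[question_id]
    return {
        "correct": submitted == question.correct_choice,
        "correct_choice": question.correct_choice,
    }


# Made-up example data, not taken from the real site.
QUESTIONS = {
    "q1": Question("q1", "/audio/q1.mp3", ("C major", "A minor"), "C major"),
}

if __name__ == "__main__":
    print(client_payload(QUESTIONS["q1"]))
    print(check_answer(QUESTIONS, "q1", "A minor"))
```

This still costs one round trip per answer, so it doesn't remove the lag on its own. It just shows why the check can't simply move into the page.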