You are right that ptrace is slow and nasty. The key problem, I believe, is the tracer-tracee model: it involves two host processes, and the switch between them is asynchronous (ptrace(PTRACE_SYSEMU), then waitpid).
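To make that round trip concrete, here is a minimal sketch of the pattern, not the platform's actual code. It assumes Linux with a libc that exposes PTRACE_SYSEMU (x86 is the safe case), and `count_sysemu_stops` is a hypothetical helper name. Each intercepted syscall costs one ptrace() call to resume the tracee plus one waitpid() to learn that it stopped again, with a scheduler hand-off between the two processes in each direction.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a tracee, resume it with PTRACE_SYSEMU and
 * count how many syscall-entry stops are observed, up to max_stops.
 * Returns the number of stops, or -1 if ptrace is unavailable or fails.
 */
int count_sysemu_stops(int max_stops)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Tracee: ask to be traced, then stop so the tracer can take over. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        for (;;)
            getppid(); /* any syscall will do; it is intercepted, not run */
    }

    int status;
    int stops = 0;

    /* Wait for the initial SIGSTOP of the tracee. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status)) {
        stops = -1;
        goto out;
    }

    while (stops < max_stops) {
        /* Half 1: resume the tracee until its next syscall entry. */
        if (ptrace(PTRACE_SYSEMU, child, NULL, NULL) < 0) {
            stops = -1;
            break;
        }
        /* Half 2: block until the kernel reports the tracee stopped. */
        if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status)) {
            stops = -1;
            break;
        }
        /* A real tracer would read the registers here (PTRACE_GETREGS),
         * emulate the syscall, and write the result back. */
        stops++;
    }

out:
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return stops;
}
```

Calling `count_sysemu_stops(3)` from a small `main` should report three stops on a system where ptrace is permitted. The point is the shape of the loop: two syscalls in the tracer and two context switches for every syscall the tracee attempts.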
We do have the KVM platform that offers the synchronous switch, which performs better if you have bare-metal virtualization support.