A year or few back, I did camera-passthru XR in VR HMDs, on a DIY stack with electron.js (which is Chromium) as the compositor. So, a couple of thoughts...
The full-screen Electron window apparently displaced the window-system compositor. So a specialized compositor that takes over compositing itself can be useful.
Chromium special-cased video, making it utterly trivial to do comfortable camera passthru, while a year-plus later mainstream stacks still considered that hard to impossible. So having the compositor architecture match the app is valuable.
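For concreteness, the "trivial" path was roughly this, a minimal sketch in a renderer process (constraints and sizes are illustrative): camera frames go from getUserMedia straight into a video element and stay on Chromium's GPU path.

    // Camera passthru via the video fast path. Illustrative constraints.
    const video = document.createElement('video');
    video.autoplay = true;
    document.body.appendChild(video);
    navigator.mediaDevices
      .getUserMedia({ video: { width: 1280, height: 720, frameRate: 60 } })
      .then((stream) => { video.srcObject = stream; })
      .catch((err) => console.error('camera open failed', err));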
That special case would collapse if the CPU needed to touch the camera frames for analysis. So carefully managing data movement between GPU and CPU is important. That's one design objective of Google's MediaPipe, for instance. Perhaps future compositors should permit similar pipeline specifications?
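And roughly where it collapses: sampling the video as a GPU texture is cheap, but any CPU readback for analysis forces a copy and a stall. A sketch, assuming the same video element as above (dimensions illustrative):

    // GPU-resident path: the current video frame becomes a texture, no CPU copy.
    const gl = document.createElement('canvas').getContext('webgl2');
    const tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
    // CPU-touching path: attaching the texture to a framebuffer and reading it
    // back is the step that breaks the cheap passthru.
    const fb = gl.createFramebuffer();
    gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);
    const pixels = new Uint8Array(1280 * 720 * 4);
    gl.readPixels(0, 0, 1280, 720, gl.RGBA, gl.UNSIGNED_BYTE, pixels);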
My XR focus was software-dev "office" work, not games. So there was no game-dev "oh, the horror! an immersion-shattering visual artifact occurred!" - people don't think "Excel, it's ghastly... it leaves me aware I'm sitting in an office!". Similarly, with the user's sense of balance anchored by nice video passthru, the rest of the rendering could be slow and jittery. Game-dev common wisdom was "VR means 90 fps, no jitters, no artifacts, or the user gets sick - and we're GPU-tech limited", while I was "meh, 30/20/10 fps, whatever" on laptop integrated graphics. So... games are hard - don't assume their constraints are yours without analysis. And different aspects of the rendered environment can have very different constraints.
My fuzzy recollection is that Chromium could be persuaded to do 120 Hz, though I didn't have a monitor to try it. Or higher? I fuzzily recall some switch to uncap the frame rate.
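If that "switch" was a Chromium command-line flag, in Electron it would be set something like this in the main process before 'ready'. The flag names here are from memory of roughly that Chromium era and may have changed or been removed since:

    // Hedged sketch: flag names may not exist in current Chromium builds.
    const { app } = require('electron');
    app.commandLine.appendSwitch('disable-gpu-vsync');
    app.commandLine.appendSwitch('disable-frame-rate-limit');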
I've used Linux evdev (the input-pipeline step between the kernel and libinput) directly from an Electron renderer process. Latency wasn't the motivation, so I didn't measure it, but that might save an extra ms or few. At 120 Hz, the budget might look like: whatever ms from HID to OS, 0-8 ms waiting for the next frame, 8 ms of processing and render, plus whatever ms to light. On electron.js.
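For the curious, reading evdev from a renderer is nothing exotic. A rough sketch (needs nodeIntegration, read permission on the device node; the device path is illustrative):

    // struct input_event on 64-bit Linux: 8-byte tv_sec, 8-byte tv_usec,
    // u16 type, u16 code, s32 value = 24 bytes per event.
    const fs = require('fs');
    const EVENT_SIZE = 24;
    const stream = fs.createReadStream('/dev/input/event3'); // hypothetical device
    stream.on('data', (buf) => {
      for (let off = 0; off + EVENT_SIZE <= buf.length; off += EVENT_SIZE) {
        const type = buf.readUInt16LE(off + 16);
        const code = buf.readUInt16LE(off + 18);
        const value = buf.readInt32LE(off + 20);
        if (type === 1) { // EV_KEY: key/button press (1) or release (0)
          console.log('key', code, value);
        }
      }
    });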
New interface tech like speech and gesture recognition may start guessing at what it's hearing/seeing many ms before it provides its final best guess at what happened. Here, low-latency responsiveness is perhaps more about app-system architecture than pipeline tuning, with app state and UI designed to support iterative speculative execution.
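A minimal sketch of what I mean, with made-up names: interim hypotheses render immediately, then get committed or visibly corrected when the final result lands.

    let committed = { text: '' };          // last confirmed app state
    let speculative = null;                // state rendered from interim guesses

    function onInterimHypothesis(guess) {  // e.g. a partial speech transcript
      speculative = { text: committed.text + guess };
      render(speculative);                 // user sees a response immediately
    }

    function onFinalResult(finalText) {
      committed = { text: committed.text + finalText };
      speculative = null;
      render(committed);                   // reconcile: commit or correct
    }

    function render(state) { /* draw state.text into the UI */ }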
Eye tracking changes things. "Don't bother rendering anything for the next 50 ms; the user is saccading and thus blind." "After this saccade the eye will be pointing at (x, y), so only that region will need a full-resolution render" (foveated rendering). Patents... but eventually.
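A sketch of how gaze state might drive per-frame work, assuming a hypothetical tracker that reports saccades and a predicted landing point (all names and numbers illustrative):

    function planFrame(gaze) {
      if (gaze.inSaccade) {
        // Saccadic suppression: the user is effectively blind mid-saccade,
        // so skip (or drastically cheapen) this frame.
        return { skip: true };
      }
      // Foveated rendering: full resolution only around the predicted gaze point.
      return {
        skip: false,
        foveaCenter: gaze.predictedLandingPoint, // e.g. {x, y} in screen px
        foveaRadiusPx: 200,                      // illustrative value
        peripheralScale: 0.25,                   // render periphery at 1/4 res
      };
    }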