XR Graphics Challenges and How Unity Managed Them

XR graphics rendering is different and challenging. In a traditional rendering environment, the scene is rendered from a single view. Unity takes our objects and transforms them into a space that’s appropriate for rendering. It does this by applying a series of transformations that take the objects from a locally defined space into a space that can be drawn on our screen.

This series of transformations is sometimes referred to as the “graphics transformation pipeline”, and is a classic technique in rendering.
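To make the pipeline concrete, here is a minimal sketch in plain Python of a point travelling from local space to screen pixels. The helper names (`translation`, `perspective`, `to_screen`) are hypothetical, the projection follows the OpenGL convention, and none of this is Unity's actual implementation.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    """Matrix that moves a point by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection (camera looks down -Z)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def to_screen(local_point, model, view, proj, width, height):
    """Local -> world -> view -> clip -> NDC -> pixel coordinates."""
    mvp = mat_mul(proj, mat_mul(view, model))
    x, y, z, w = mat_vec(mvp, [local_point[0], local_point[1], local_point[2], 1.0])
    ndc_x, ndc_y = x / w, y / w          # perspective divide
    # Map NDC [-1, 1] to pixels, with the screen origin at the top-left.
    return ((ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height)

model = translation(0.0, 0.0, -5.0)   # object placed 5 units in front of the camera
view = translation(0.0, 0.0, 0.0)     # camera sitting at the world origin
proj = perspective(90.0, 1.0, 0.1, 100.0)
pixel = to_screen((0.0, 0.0, 0.0), model, view, proj, 800, 600)
```

An object centered directly in front of the camera lands in the middle of the 800x600 screen. In a traditional setup there is exactly one `view` and one `proj` per frame.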

Enter the XRagon

Modern XR devices introduced the requirement of driving two views in order to provide the stereoscopic 3D effect that creates depth for the device wearer. Each view represents an eye. While the two eyes are viewing the same scene from a similar vantage point, each view does possess a unique set of view and projection matrices.
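As a rough illustration of why each eye needs its own view matrix, the sketch below builds two view matrices from a single head position. The 0.064 m eye separation, the unrotated head, and the function names are assumptions made for the example. A real device also supplies its own per-eye projection matrices, which are not modeled here.

```python
IPD_METERS = 0.064  # assumed interpupillary distance, for illustration only

def translation(tx, ty, tz):
    """Row-major 4x4 translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def eye_view_matrix(head_position, eye):
    """View matrix for one eye, assuming the head has no rotation."""
    sign = -1.0 if eye == "left" else 1.0
    eye_x = head_position[0] + sign * IPD_METERS / 2.0
    eye_y, eye_z = head_position[1], head_position[2]
    # A view matrix is the inverse of the eye's world transform. For a pure
    # translation, the inverse is a translation by the negated position.
    return translation(-eye_x, -eye_y, -eye_z)

head = (0.0, 1.6, 0.0)
left_view = eye_view_matrix(head, "left")
right_view = eye_view_matrix(head, "right")
```

The two matrices differ only in their horizontal translation, which is the "similar vantage point" described above expressed in numbers.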

The stages of the render loop are:

**Culling -> Shadows -> Opaque -> Transparent -> Post Processing -> Present**

Multi-Camera

To render the view for each eye, the simplest method is to run the render loop twice. Each eye configures and runs through its own iteration of the render loop. At the end, we have two images that we can submit to the display device. The underlying implementation uses two Unity cameras, one for each eye, and they run through the process of generating the stereo images. This was the initial method of XR support in Unity, and is still provided by third-party headset plugins.

While this method certainly works, Multi-Camera relies on brute force, and is the least efficient as far as the CPU and GPU are concerned. The CPU has to iterate twice through the render loop completely, and the GPU is likely not able to take advantage of any caching of objects drawn twice across the eyes.

Multi-Pass

Multi-Pass was Unity’s initial attempt to optimize the XR render loop. The core idea was to extract portions of the render loop that were view-independent. This means that any work that is not explicitly reliant on the XR eye viewpoints doesn’t need to be done per eye.
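The difference between the two approaches can be sketched by counting stage executions. This toy model assumes that culling and shadows are the view-independent stages (the article points to those two later on) and that every other stage still runs once per eye. The function names are hypothetical, and the real engine's scheduling is more involved.

```python
STAGES = ["Culling", "Shadows", "Opaque", "Transparent", "Post Processing", "Present"]
SHARED_IN_MULTI_PASS = ["Culling", "Shadows"]  # assumed view-independent stages
EYES = ["left", "right"]

def multi_camera():
    """Run the whole render loop once per eye."""
    log = []
    for eye in EYES:
        for stage in STAGES:
            log.append((stage, eye))
    return log

def multi_pass():
    """Run shared stages once, then the remaining stages once per eye."""
    log = [(stage, "both") for stage in SHARED_IN_MULTI_PASS]
    for eye in EYES:
        for stage in STAGES:
            if stage not in SHARED_IN_MULTI_PASS:
                log.append((stage, eye))
    return log

camera_log = multi_camera()
pass_log = multi_pass()
```

In this model Multi-Camera executes 12 stage runs per frame and Multi-Pass executes 10. The saving comes entirely from the stages that no longer repeat per eye.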

Single-Pass

Single-Pass Stereo Rendering means that Unity makes a single traversal of the entire render loop, instead of traversing it twice or repeating certain portions for each eye.

This isn’t exactly twice as fast as Multi-Pass. Unity had already optimized culling and shadows in Multi-Pass, and Single-Pass still dispatches a draw per eye and switches viewports, which incurs some CPU and GPU cost.
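A simplified picture of that remaining cost: the object list is walked once, but every object still produces a viewport switch and a draw for each eye. The sketch assumes both eyes share one double-wide render target. The command tuples and the function name are invented for illustration and are not a Unity API.

```python
def single_pass_commands(objects, eye_width, height):
    """Build a command list for one traversal that draws each object per eye."""
    # Assumed layout: left eye in the left half of the target, right eye in the right half.
    viewports = {
        "left": (0, 0, eye_width, height),
        "right": (eye_width, 0, eye_width, height),
    }
    commands = []
    for obj in objects:                      # single traversal of the scene
        for eye, viewport in viewports.items():
            commands.append(("set_viewport", viewport))
            commands.append(("draw", obj, eye))
    return commands

commands = single_pass_commands(["floor", "cube", "lamp"], 1024, 1024)
draw_count = sum(1 for c in commands if c[0] == "draw")
```

Three objects still produce six draws and six viewport switches, which is why the gain over Multi-Pass is real but short of a clean 2x.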

Stereo Instancing (Single-Pass Instanced)

Stereo Instancing is only available to users running their desktop VR experiences on Windows 10 or HoloLens.

High level performance overview

Single-Pass and Single-Pass Instancing represent a significant CPU advantage over Multi-Pass. However, the delta between Single-Pass and Single-Pass Instancing is relatively small.