Facebook advances VR rendering quality with neural 4×4 supersampling

Rendering 3D graphics for the latest high-resolution displays has never been easy, and the challenge multiplies for VR headsets, which drive twin displays at high refresh rates, something Oculus’ parent company Facebook knows all too well. Today, Facebook researchers revealed a new technique for upsampling real-time-rendered 3D content, using machine learning to transform low-resolution, computationally cheaper imagery in real time into a very close approximation of much higher-resolution reference materials.

The easiest way to understand Facebook’s innovation is to imagine the Mona Lisa rendered as only 16 colored squares, such as a 4×4 grid. A human looking at the grid would see an unforgivably jaggy, boxy image, perhaps recognizing the Mona Lisa’s famous outlines, but a trained computer could instantly identify the grid and replace it with the original piece of art. Employing three-layer convolutional neural networks, Facebook’s researchers have developed a technique that works not just for flat images but also for 3D rendered scenes, transforming “highly aliased input” into “high fidelity and temporally stable results in real-time,” taking color, depth, and temporal motion vectors into account.
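For contrast, here is a minimal sketch of the naive baseline, not Facebook’s method: plain nearest-neighbor upsampling, which copies each low-resolution pixel into a 4×4 block and so produces exactly the boxy look described above. The function name `upsample_nearest` and the tiny sample image are illustrative assumptions, not anything taken from the research.

```python
def upsample_nearest(image, factor=4):
    """Repeat every pixel `factor` times along both axes (nearest-neighbor).

    This is a naive baseline for illustration only; it is NOT the neural
    technique described in the article.
    """
    upscaled = []
    for row in image:
        # Stretch the row horizontally: each pixel appears `factor` times.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Stretch vertically: emit `factor` independent copies of the row.
        upscaled.extend(list(wide_row) for _ in range(factor))
    return upscaled


# Hypothetical 2x2 "image" whose pixels are just integer labels.
low_res = [[0, 1],
           [2, 3]]
high_res = upsample_nearest(low_res, factor=4)

for row in high_res:
    print(row)
```

Each source pixel becomes a flat 4×4 block, so no detail is added. The learned approach instead draws on depth and motion vectors from earlier frames to fill in what simple pixel copying cannot.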

From a computational standpoint, the research suggests that a 3D environment rendered similarly to the original Doom game could be upscaled, with advance training, to a VR experience that looks like Quake. This doesn’t mean any developer could just convert a primitive 3D engine into a rich VR experience, but rather that the technique could help a power-constrained VR device — think Oculus Quest — internally render fewer pixels (see “Input” in the photo above) while displaying beautiful output (“Ours” in the above photo), using machine learning as the shortcut to achieve near-reference quality results.

While the specifics of the training are complicated, the upshot is that the network is trained using images grabbed from 100 videos of a given 3D scene, as real users would have experienced it from various head angles. These images enable a full-resolution reference scene that would take 140.6 milliseconds to render at 1,600 by 900 pixels to instead be rendered in 26.4 milliseconds at 400 by 225 pixels, then 4×4 upsampled in 17.68 milliseconds, for a total of 44.08 milliseconds. That is a nearly 3.2x savings in rendering time for a very close approximation of the original image. In this way, a Quest VR headset wearer would benefit from the scenario already having been thoroughly explored on much more powerful computers.
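The arithmetic behind those figures can be checked in a few lines. This is only a sketch that reuses the numbers quoted above; the variable names are illustrative.

```python
# Timings and resolutions as quoted in the article (milliseconds, pixels).
REFERENCE_MS = 140.6        # full render at 1,600 by 900
LOW_RES_RENDER_MS = 26.4    # render at 400 by 225
UPSAMPLE_MS = 17.68         # 4x4 neural upsampling pass

total_ms = LOW_RES_RENDER_MS + UPSAMPLE_MS
speedup = REFERENCE_MS / total_ms

reference_pixels = 1600 * 900
low_res_pixels = 400 * 225
pixel_ratio = reference_pixels // low_res_pixels

print(f"total: {total_ms:.2f} ms")        # total: 44.08 ms
print(f"speedup: {speedup:.2f}x")         # speedup: 3.19x
print(f"pixel ratio: {pixel_ratio}x")     # pixel ratio: 16x
```

The engine renders 16 times fewer pixels, yet the overall saving is closer to 3.2x because the upsampling pass itself accounts for roughly 40% of the 44.08-millisecond total.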

The researchers say that their system dramatically outperforms the latest Unreal Engine’s temporal antialiasing upscaling technique, shown as Unreal TAAU above, by offering much greater accuracy of reconstructed details. They note that Nvidia’s deep-learning super sampling (DLSS) is closest to their solution, but DLSS relies on proprietary software and/or hardware that might not be available across all platforms. Facebook suggests that its solution won’t require special hardware or software and can be integrated easily into modern 3D engines, using their existing inputs to provide 4×4 supersampling at a time when common solutions use 2×2 upsampling at most.
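To put the 4×4 versus 2×2 comparison in concrete terms, the sketch below works out how many pixels an engine would actually have to render for each factor, assuming the 1,600 by 900 target quoted earlier in the article. The helper `rendered_size` is a hypothetical name used only for this illustration.

```python
def rendered_size(target_width, target_height, factor):
    """Resolution the engine must render so that factor-by-factor
    upsampling lands exactly on the target resolution."""
    if target_width % factor or target_height % factor:
        raise ValueError("target resolution must be divisible by the factor")
    return target_width // factor, target_height // factor


TARGET = (1600, 900)  # reference resolution quoted in the article

for factor in (2, 4):
    width, height = rendered_size(*TARGET, factor)
    share = 1 / (factor * factor)  # fraction of target pixels actually rendered
    print(f"{factor}x{factor}: render {width}x{height}, "
          f"{share:.2%} of target pixels")
```

At 2×2 the engine still renders a quarter of the target pixels, while at 4×4 it renders one sixteenth, which is why the jump in upsampling factor matters for power-constrained hardware.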

As promising as the new system is, it’s unsurprisingly not perfect. Despite all the advance training and the temporally stable smoothness of the resulting imagery, some fine details can be lost in the reconstruction process, such that text might not be readable on a sticky note (as shown above) if its presence wasn’t properly flagged within the last few frames of the low-resolution render. There are also still questions regarding the expense of implementation for “high-resolution display applications,” though more horsepower, better optimizations, and professional engineering are expected to improve the system’s performance.

The underlying research paper was published today as “Neural Supersampling for Real-Time Rendering,” attributed to Lei Xiao, Salah Nouri, Matt Chapman, Alexander Fix, Douglas Lanman, and Anton Kaplanyan of Facebook Reality Labs. It’s being presented at SIGGRAPH 2020 in mid-July.
