Battlefield 5 DXR ray-tracing: the DICE tech interview (Eurogamer)

Vyse

Extreme Poster
Mar 27, 2006
26,816
358
83
#1
This one's for the hardcore! With the arrival of DXR and our first look at a video game with real-time, hardware-accelerated ray tracing, we're moving into unknown territory here, discussing technology and techniques never seen before in a shipping game. There's been plenty of discussion about this early, initial work with ray tracing since the DXR patch for Battlefield 5 launched, and some criticism of the performance hit. In putting our coverage together, we wanted to understand the challenges faced by the developer, how its ray tracing implementation actually works and to get some idea of the behind-the-scenes work happening right now to improve game performance. And all of this starts by understanding what the four DXR quality presets actually do, and where the quality trades are made.

What are the real differences between low, medium, high, and ultra DXR settings?

Yasin Uludağ: Right now the differences are:

Low: 0.9 smoothness cut-off and 15.0 per cent of screen resolution as maximum ray count.
Med: 0.9 smoothness cut-off and 23.3 per cent of screen resolution as maximum ray count.
High: 0.5 smoothness cut-off and 31.6 per cent of screen resolution as maximum ray count.
Ultra: 0.5 smoothness cut-off and 40.0 per cent of screen resolution as maximum ray count.

[Note: The cut-off controls which surface materials are assigned ray traced reflections in the game world. Materials are either rough (wood, rocks) or smooth (metal/glass). Based upon how smooth and shiny they are (or conversely how rough), they are able to receive ray traced reflections. The point at which the reflection on a surface transitions from a traditional cube map reflection into a ray traced reflection is dictated by the threshold setting chosen. A 0.9 smoothness cut-off is conservative and covers polished metals, glass and water. A 0.5 value covers surfaces that are even just moderately shiny at glancing view angles. The "percentage of resolution as maximum ray count" describes the maximum share of the chosen screen resolution's pixels that can have a ray assigned to them at a 1:1 ratio (one ray per pixel). The total number of rays that can be cast, and with it the apparent clarity of reflections, goes up from low to ultra settings.]


I say maximum ray count here because we will try to distribute rays from this fixed pool onto those screen pixels that are prescribed to be reflective (based on their reflective properties) but we can never go beyond one ray per pixel in our implementation. So, if only a small percentage of the screen is reflective, we give all of those pixels one ray.
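As a rough illustration of the numbers above, here is a minimal Python sketch of how a preset turns into a ray budget. The preset values come from the interview; the function names, the per-mille encoding and the "smoothness at or above the cut-off gets a ray" comparison are assumptions made for this example, not DICE's code.

```python
# Preset -> (smoothness cut-off, max ray count in per-mille of screen pixels).
# The values are the ones quoted in the interview; everything else is illustrative.
PRESETS = {
    "low": (0.9, 150),
    "medium": (0.9, 233),
    "high": (0.5, 316),
    "ultra": (0.5, 400),
}


def ray_budget(width, height, preset):
    """Maximum number of reflection rays per frame for a given resolution."""
    _, permille = PRESETS[preset]
    return width * height * permille // 1000


def wants_ray(smoothness, preset):
    """Assumed rule: a pixel is eligible when its smoothness reaches the cut-off."""
    cutoff, _ = PRESETS[preset]
    return smoothness >= cutoff


def rays_cast(reflective_pixels, width, height, preset):
    """Never more than one ray per reflective pixel, never more than the budget."""
    return min(reflective_pixels, ray_budget(width, height, preset))
```

At 1920x1080 (2,073,600 pixels), low allows up to 311,040 rays and ultra up to 829,440. If only 50,000 pixels are reflective, all of them get one ray on any preset.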

We distribute rays where we think they are needed the most and drop the ones that didn't make it. We will never go beyond the maximum ray count: if your entire screen is covered in reflective water, the ray tracer will instead reduce the resolution on a 16x16 tile basis to accommodate. To do this it is necessary to integrate a full-screen buffer using fast on-chip memory, with atomic instructions for the last remaining parts, as that gives low contention at the hardware level and is super fast.

However, there are discussions internally to change what each individual setting does; we could do more, like play with LODs and cull distances as well as perhaps some settings for the new hybrid ray tracer that is coming in the future. We are thinking hard about these settings, and looking to have higher quality there as well.

You previously talked to us about optimisations made after Gamescom - which have made their way into the current build of the game?


Yasin Uludağ: The current launch build has a ray binning optimisation that re-orders rays based on so-called super tiles (which are large 2D tiles on the screen). Each super tile re-orders the rays within it based on their direction (angular binning). This is very good for both the texture cache and instruction cache because similar rays often hit similar triangles and execute the same shaders. On top of that, it is very good for the triangle traverser hardware (the RT core) because the rays take coherent paths while finding the closest intersection with the BVHs.
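The binning idea can be sketched on the CPU in a few lines of Python. This is only an illustration of the principle: the super tile size, the number of angular bins and the azimuth/elevation quantisation are all assumptions, and the real thing runs in compute shaders on the GPU.

```python
import math
from typing import NamedTuple


class Ray(NamedTuple):
    x: int      # pixel column the ray was spawned from
    y: int      # pixel row the ray was spawned from
    dx: float   # normalised direction
    dy: float
    dz: float


SUPER_TILE = 64   # assumed super tile edge length in pixels
ANGLE_BINS = 8    # assumed number of bins per angle


def direction_bin(ray, bins=ANGLE_BINS):
    """Quantise a normalised direction into a coarse azimuth/elevation bin."""
    azimuth = math.atan2(ray.dy, ray.dx)                  # [-pi, pi]
    elevation = math.asin(max(-1.0, min(1.0, ray.dz)))    # [-pi/2, pi/2]
    a = min(bins - 1, int((azimuth + math.pi) / (2 * math.pi) * bins))
    e = min(bins - 1, int((elevation + math.pi / 2) / math.pi * bins))
    return e * bins + a


def bin_rays(rays, tile=SUPER_TILE):
    """Order rays by super tile first, then by direction bin inside the tile."""
    def key(ray):
        return (ray.y // tile, ray.x // tile, direction_bin(ray))
    return sorted(rays, key=key)
```

After sorting, rays that start in the same screen region and point roughly the same way sit next to each other, which is what makes the cache and traversal behaviour coherent.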

Another neat optimisation mentioned at Gamescom is how we deal with lighting performance. There are ways to use the built-in acceleration structures in DXR where you can make queries into DXR acceleration structures through ray-gen shaders but we preferred implementing it through compute for time reasons and to aid performance. We have a linked list of lights and cubemaps on the GPU in a grid-like acceleration structure - so there is a separate grid for non shadow lights, shadow casting lights, box cubemaps etc. These are the cubemaps applied inside the reflections. This grid is also camera aligned - this is faster as it grabs the nearest lights rapidly. Without this, the lighting was slow because it had to 'walk over' all the lights to guarantee no popping.
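A minimal sketch of the grid idea in Python, assuming each light is a position plus a radius and that positions are already expressed in camera space. The cell size, the dictionary-of-lists layout and the function names are invented for the example; the interview only says the real structure is a GPU-side grid with linked lists.

```python
import math
from collections import defaultdict

CELL_SIZE = 4.0  # assumed cell edge length in world units


def cell_of(point, cell=CELL_SIZE):
    """Integer grid coordinates of the cell containing a point."""
    return tuple(math.floor(c / cell) for c in point)


def build_light_grid(lights, cell=CELL_SIZE):
    """lights: list of (position, radius). Returns cell -> list of light indices."""
    grid = defaultdict(list)
    for index, (pos, radius) in enumerate(lights):
        lo = cell_of([c - radius for c in pos], cell)
        hi = cell_of([c + radius for c in pos], cell)
        # Register the light in every cell its bounding box touches.
        for ix in range(lo[0], hi[0] + 1):
            for iy in range(lo[1], hi[1] + 1):
                for iz in range(lo[2], hi[2] + 1):
                    grid[(ix, iy, iz)].append(index)
    return grid


def lights_near(grid, point, cell=CELL_SIZE):
    """Lights that can affect a hit point, without walking the full light list."""
    return grid.get(cell_of(point, cell), [])
```

A shading lookup then only touches the handful of lights stored in one cell instead of iterating over every light in the level.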

We use Nvidia intrinsics in almost every single compute shader that surrounds and manages ray tracing. Without the Nvidia intrinsics our shaders would be running slower. Another optimisation is partially exposed to the user with the quality settings we mentioned. We call this optimisation “variable rate ray tracing”. As mentioned, the ray tracer decides on a 16x16 tile basis how many rays we should have in that region. This can go all the way from 256 rays down to four rays. The deciding factors are the BRDF reflectance, how much is diffuse, how much is specular, whether the surface is in shadow or in sunlight, what the smoothness of the reflection is, etc. We are basically trying to be smart about where we place the rays with compute shaders, and how many of them to place. We are currently working on further improving this part as well. This should not be confused with the variable rate shading that Nvidia announced.
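To make the per-tile decision a bit more concrete, here is a hedged Python sketch. The 256 and 4 endpoints come from the interview; the intermediate steps (64 and 16), the single "importance" score per tile and the greedy "downgrade the least important tile first" policy are assumptions for illustration only.

```python
# Rays per 16x16 tile. 256 and 4 are from the interview; 64 and 16 are assumed
# in-between steps (halving the resolution along each axis).
RAY_LEVELS = (256, 64, 16, 4)


def allocate_tile_rays(importance, budget):
    """importance: one score per tile, 0 meaning the tile is not reflective.

    Returns the number of rays per tile, lowering the resolution of the least
    important tiles first until the total fits the budget (if it can).
    """
    level = [0 if score > 0 else None for score in importance]
    order = sorted(
        (i for i, lvl in enumerate(level) if lvl is not None),
        key=lambda i: importance[i],
    )

    def total():
        return sum(RAY_LEVELS[lvl] for lvl in level if lvl is not None)

    while total() > budget:
        candidates = [i for i in order if level[i] < len(RAY_LEVELS) - 1]
        if not candidates:
            break  # every reflective tile is already at four rays
        level[candidates[0]] += 1

    return [0 if lvl is None else RAY_LEVELS[lvl] for lvl in level]
```

With scores of 1.0, 0.5, 0.0 and 0.2 and a budget of 400 rays, the sketch keeps the most important tile at 256 rays, drops the second to 64 and the least important to 4, while the non-reflective tile gets none.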

What are planned optimisations for the future?

Yasin Uludağ: One of the optimisations built into our BVH building is the use of “overlapped” compute - multiple compute shaders running in parallel. This is not the same thing as async compute or simultaneous compute. It just means you can run multiple compute shaders in parallel. However, there is an implicit barrier injected by the driver that prevents these shaders running in parallel when we record our command lists in parallel for BVH building. This will be fixed in the future and we can expect quite a bit of performance here since it removes sync points and wait-for-idles on the GPU.

We also plan on running BVH building using simultaneous compute during the G-Buffer generation phase, allowing ray tracing to start much earlier in the frame, and the G-Buffer pass. Nsight traces show that this can be a big benefit. This will be done in the future.

Another optimisation we have in the pipe and that almost made launch was a hybrid ray trace/ray march system. This hybrid ray marcher creates a mip map on the entire depth buffer using a MIN filter. This means that every level takes the closest depth in 2x2 regions and keeps going all the way to the lowest mip map. Because this uses a so-called min filter, you know you can skip an entire region on the screen while traversing.
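The min-filtered depth pyramid is easy to show in miniature. The Python sketch below assumes a square, power-of-two depth buffer stored as a list of rows and that smaller depth values are closer to the camera; the function names are made up for the example.

```python
def downsample_min(depth):
    """Halve the resolution, keeping the closest (smallest) depth per 2x2 block."""
    height, width = len(depth), len(depth[0])
    return [
        [
            min(depth[y][x], depth[y][x + 1], depth[y + 1][x], depth[y + 1][x + 1])
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def build_min_pyramid(depth):
    """Full mip chain, from the original buffer down to a single texel."""
    levels = [depth]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(downsample_min(levels[-1]))
    return levels


def can_skip_region(ray_depth, levels, level, x, y):
    """A ray in front of the closest surface of a region cannot hit anything in it."""
    return ray_depth < levels[level][y][x]
```

When `can_skip_region` is true at a coarse level, the marcher can step over the whole block in one go instead of testing every pixel inside it.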

With this, ray binning then accelerates the hybrid ray traverser tremendously because rays are fetched from the same pixels down the same mip map thereby having super efficient cache utilisation. If your ray gets stuck behind objects as you find in classic screen-space reflections, this system then promotes the ray to become a ray trace/world space ray and continue from the failure point. We also get quality wins here as decals and grass strands will now be in reflections.

We have optimised the denoiser as well so it runs faster and we are also working on optimisations for our compute passes and filters that run throughout the ray tracing implementation.

We have applied for presenting our work/tech at GDC, so look out for that!
Interview is taken from Eurogamer's full ray-tracing analysis article: https://www.eurogamer.net/articles/digitalfoundry-2018-battlefield-5-rtx-ray-tracing-analysis

(1 of 2)
 

Vyse

#2
(2 of 2)

What are the current bottlenecks in the ray tracing implementation?

Yasin Uludağ: We have a few bugs in the launch build which prevent us from utilising the hardware efficiently, such as the bounding boxes expanding insanely far due to a feature implemented for the rasteriser that didn't play well with ray tracing. We only noticed this when it was too late. Basically, whenever an object has a feature for turning certain parts on and off, the turned-off parts would be skinned by our compute shader skinning system for ray tracing exactly like the vertex shader would do for the rasteriser. (Remember we have shader graphs and we convert every single vertex shader automatically to compute and every pixel shader to a hit shader. If the pixel shader has alpha testing, we also make an any-hit shader that can call IgnoreHit() instead of the clip() instruction that alpha testing would use.) The same problem also happens with destructible objects because that system collapses vertices too.

Following the API specifications, if, instead of collapsing them to (0, 0, 0), you collapse them to (NaN, NaN, NaN), the triangle will be omitted because it's “not a number”. This is what we did and it gave a lot of perf. This bug has been fixed and the fix will be shipping soon, and we can expect every game level and map to see large, significant performance improvements.
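Why the collapse target matters can be shown with a toy bounding box calculation. This Python sketch is not the DXR API: the "skip any triangle that has a NaN vertex" rule, the data layout and the function names are assumptions that only mimic the behaviour described above.

```python
import math

ORIGIN = (0.0, 0.0, 0.0)
NAN3 = (math.nan, math.nan, math.nan)


def collapse(vertices, hidden, target):
    """Move every hidden vertex to the given target position."""
    return [target if i in hidden else v for i, v in enumerate(vertices)]


def is_active(triangle, vertices):
    """Assumed rule for this sketch: a triangle with a NaN vertex is ignored."""
    return not any(math.isnan(c) for i in triangle for c in vertices[i])


def bounds(triangles, vertices):
    """Axis-aligned bounding box over the active triangles, or None if empty."""
    points = [vertices[i] for tri in triangles if is_active(tri, vertices) for i in tri]
    if not points:
        return None
    lo = tuple(min(p[axis] for p in points) for axis in range(3))
    hi = tuple(max(p[axis] for p in points) for axis in range(3))
    return lo, hi
```

For an object sitting around (100, 100, 100) with one hidden part, collapsing to the origin stretches the box all the way back to (0, 0, 0), while collapsing to NaN leaves a tight box around the visible part only.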

Another problem we are currently having in the launch build is with alpha tested geometry like vegetation. If you turn off every single alpha tested object, ray tracing is suddenly blazingly fast, as it is then only tracing opaque surfaces. Opaque-only ray tracing is also that much faster since we are binning rays, as diverging rays can still cost a lot. We are looking into optimisations for any-hit shaders to speed this up. We also had a bug that spawned rays off the leaves of vegetation, trees and the like. This compounded with the aforementioned bounding box stretching issue, where rays were trying to escape OUT while checking for self intersections of the tree and leaves. This caused a great performance dip. This has been fixed and significantly improves performance.
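A toy model of why alpha tested geometry costs more, written in Python. Each candidate hit on alpha tested geometry needs an extra shader call to decide whether the hit counts; the threshold value, the tuple layout and the call counter are assumptions for the example, and the `continue` stands in for what IgnoreHit() does in a real any-hit shader.

```python
ALPHA_THRESHOLD = 0.5  # assumed alpha test threshold


def closest_hit(candidates, alpha_threshold=ALPHA_THRESHOLD):
    """candidates: list of (distance, alpha); alpha is None for opaque geometry.

    Returns (closest accepted distance or None, number of any-hit invocations).
    """
    best = None
    any_hit_calls = 0
    for distance, alpha in candidates:
        if alpha is not None:
            any_hit_calls += 1          # alpha tested: the any-hit shader must run
            if alpha < alpha_threshold:
                continue                # plays the role of IgnoreHit()
        if best is None or distance < best:
            best = distance
    return best, any_hit_calls
```

A ray passing through two leaves before reaching a wall pays for two any-hit invocations, while the same ray against opaque-only geometry pays for none.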

We are also looking into reducing the LOD levels for alpha tested geometry like trees and vegetation, and we are also looking at reducing memory utilisation by the alpha shaders, like vertex attribute fetching (using our compute input assembler). All in all, it is too early to say where we are bottlenecking on the GPU as a whole. First, we need to fix all of our bugs and the known issues (like the aforementioned alpha testing problem and bounding box issue, among others). Once we get things together with all of our optimisations, then we can look at bottlenecks on the GPU itself and start talking about them.

How are you getting to the bottom of performance problems?

Yasin Uludağ: We were initially negatively affected in our QA testing and distributed performance testing due to the RS5 Windows update being delayed. But we have received a custom shader compiler from Nvidia that allows us to inject a “counter” into the shader that tracks cycles spent inside a TraceRay call per pixel. This allows us to narrow down where the performance drops are coming from. We can change to primary ray mode instead of reflection rays to see which objects are “bright”. We map high cycle counters to bright and low cycle counters to dark and then go in to fix those geometries. The trees and vegetation instantly popped out as being super-bright.
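The bright/dark mapping can be sketched as a simple normalisation in Python. The linear min-to-max scaling and the 0 to 255 range are assumptions; the interview only says high counters map to bright and low counters to dark.

```python
def cycles_to_brightness(cycles):
    """Map per-pixel cycle counts to 0..255, most expensive pixel = brightest."""
    lo, hi = min(cycles), max(cycles)
    if hi == lo:
        return [0] * len(cycles)  # uniform cost: nothing stands out
    return [(c - lo) * 255 // (hi - lo) for c in cycles]
```

Rendered as a greyscale image, geometry that is expensive to trace then stands out immediately.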

Having these metrics by default in D3D12 would be a great benefit, as they are currently not available. We would also love to see other exposed metrics for how 'good' a BVH refit was - i.e. whether the BVH has deteriorated from multiple refits and needs to be rebuilt. With characters running around, it can deteriorate rather fast!

In playing the game, looking at the order of complexity involved, the visuals, etc. we cannot help but recall other upheavals like Crysis, Quake, or the introduction of the pixel shader. Those took time to become more performant. Is DXR/RTX going down a similar path?

Yasin Uludağ: Yes! People can expect us to keep improving our ray tracing as time goes on, as both we at DICE and Nvidia have a bunch of optimisations coming in from the engine side and driver side and we are far from done. We have specialists from Nvidia and DICE working on our issues as we speak. From now on, it's only going to get better, and we have more data now too since the game released. By the time people read this, many of the improvements mentioned will already have been completed. As you mention Quake and Crysis - working on ray tracing and being the first out with it in this way is a privilege. We feel super-lucky to be part of this transition in the industry and we will do everything we can to deliver the best experience possible. Rest assured, our passion for ray tracing is burning hot!
Feel free to translate/dumb down some of the tech speak for us, @Fijiandoce :)
 

Fijiandoce

Administrator
Staff member
Oct 8, 2007
6,447
141
63
#3
I know review outlets are currently hammering Turing cards' performance with RTX on, but some perspective from these guys would have been nice. Real-time raytracing previously would have been measured in seconds-per-frame(SPF), but here we are running with tolerable frames-per-second(FPS).

I think this is more for @mynd though since it's more GPU stuff :geek:

The best I can say is that, by and large, there is a hell of a lot of math involved. First testing the scene for reflectivity, then determining what should be displayed on the reflective surface by testing what intersects with the rays that are projected...

Doesn't look very efficient though. The (NaN, NaN, NaN) looks like it was a bit of a hack for perf. gain though, so that's nice!
 

Vyse

#4
Yeah, what little I could understand made it sound like DXR ray-tracing is brand new and this game isn't developed from the ground up to fully realize the technology. Interested to see where it'll go in the next generation of games.