About a year ago I published an article on the Gamasutra blogs called Practical techniques for ray tracing in games in which I explained how developers can implement a series of hybrid rendering techniques on our PowerVR Wizard ray tracing GPUs to achieve some pretty impressive effects.
Even though the target applications for ray tracing are extremely varied, this post is focused mainly on shadows. Not only does ray tracing create more accurate shadows that are free from the artifacts of shadow maps, but ray traced shadows are also up to twice as efficient; they can be generated at comparable or better quality in half the GPU cycles and with half the memory traffic (more on that later).
In what follows, I’d like to take you through the process of implementing an efficient technique for soft shadows, optimizing the proposed algorithms, and analyzing the final results.
Firstly, let’s review cascaded shadow maps – the state-of-the-art technique used today to generate shadows for rasterized graphics. The idea behind cascaded shadow maps is to take the view frustum, divide it into a number of regions based on distance from the viewpoint, and render a shadow map for each region. This gives variable resolution for shadow maps: objects closer to the camera get higher resolution, while objects further into the distance get lower resolution per unit area.
In the diagram below, we can see an example of several objects in a scene. Each shadow map is rendered, one after another, and each covers an increasingly larger portion of the scene. Since all of the shadow maps have the same resolution, the density of shadow map pixels goes down as we move away from the viewpoint.
Finally, when rendering the objects again in the final scene using the camera’s perspective, we select the appropriate shadow maps based on each object’s distance from the viewpoint, and interpolate between those shadow maps to determine if the final pixel is lit or in shadow.
All of this complexity serves one purpose: to reduce the occurrence of resolution artifacts caused by a shadow map being too coarse. This works because an object further away from the viewpoint will occupy less space on the screen and therefore less shadow detail is needed. And it works nicely – although the cost in GPU cycles and memory traffic is significant.
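To make the cascade setup concrete, here is a minimal sketch of the common “practical” split scheme for choosing cascade boundaries, which blends uniform and logarithmic distributions. The lambda parameter and function names are my own illustration, not anything specific to the implementation discussed in this article.

```cpp
#include <cmath>
#include <vector>

// A sketch of the "practical" cascade split scheme: lambda blends a uniform
// distribution (lambda = 0) with a logarithmic one (lambda = 1).
std::vector<float> CascadeSplits(float nearZ, float farZ, int cascades, float lambda)
{
    std::vector<float> splits(cascades + 1);
    for (int i = 0; i <= cascades; ++i)
    {
        float f = static_cast<float>(i) / cascades;
        float logSplit = nearZ * std::pow(farZ / nearZ, f); // logarithmic term
        float uniSplit = nearZ + (farZ - nearZ) * f;        // uniform term
        splits[i] = lambda * logSplit + (1.0f - lambda) * uniSplit;
    }
    return splits;
}
```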
Ray traced shadows fundamentally operate in screen space: every visible pixel gets its own shadow query, so shadow detail automatically matches screen resolution. There is no shadow map to manage in the first place; the resolution problem just doesn’t exist.
The basic ray traced shadow algorithm works like this: for every point on a visible surface, we shoot one ray directly toward the light. If the ray reaches the light, then that surface is lit and we use a traditional lighting routine. If the ray hits anything before it reaches the light, then that ray is discarded because that surface is in shadow. This technique produces crisp, hard shadows like the ones we might see on a perfectly cloudless day.
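In code, the hard shadow test is a one-liner per pixel. Here is a minimal sketch; Occluded() is a hypothetical scene query (stubbed out here), standing in for the GPU ray tracer.

```cpp
// A minimal sketch of the one-ray-per-pixel hard shadow test. Occluded() is
// a hypothetical scene query; a real implementation would ask the ray tracer
// whether anything lies between the surface and the light.
struct Vec3 { float x, y, z; };

bool Occluded(const Vec3& point, const Vec3& lightPos)
{
    (void)point; (void)lightPos;
    return false; // placeholder: no scene to query in this sketch
}

float HardShadow(const Vec3& point, const Vec3& lightPos)
{
    // Clear path to the light: lit (1). Any hit along the way: shadowed (0).
    return Occluded(point, lightPos) ? 0.0f : 1.0f;
}
```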
However, most shadows in the real world have a gradual transition between lighter and darker areas – this soft edge is called a penumbra. Penumbras arise from the physics of area light sources: most games model a light as a dimensionless point, but real light sources have a surface, and that surface area is what softens the shadow. Within the penumbra region, part of the light is blocked by an occluding object while the remaining light has a clear path, which is why you see areas that are neither fully lit nor fully in shadow.
The diagram below shows how we can calculate the size of a penumbra based on three variables: the size of the light source (R), the distance to the light source (L), and the distance between the occluder and the surface on which the shadow is cast (O). Moving the occluder closer to the surface shrinks the penumbra.

Based on these variables, similar triangles give a simple formula for the size of the penumbra: Penumbra = (R × O) / (L − O).
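To make the relationship concrete with some hypothetical numbers: a light 20 cm across (R = 0.2 m) positioned 2 m from the receiving surface (L = 2 m), with an occluder 0.5 m above that surface (O = 0.5 m), gives a penumbra of 0.2 × 0.5 / (2 − 0.5) ≈ 6.7 cm; slide the occluder down to O = 0.1 m and the penumbra shrinks to about 1 cm.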
Using this straightforward relationship, we can formulate an algorithm to render accurate soft shadows using only one ray per pixel. We start with the hard shadows algorithm above, but when a ray intersects an object, we record the distance from the surface to that object in a screen-space buffer.
This algorithm can be extended to support semi-transparent surfaces. For example, when we intersect a surface, we can also record whether it is transparent; if the surface is transparent, we choose to continue the ray through the surface, noting its alpha value in a separate density buffer.
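Here is a minimal CPU-side sketch of that ray pass, populating the two screen-space buffers. TraceTowardLight() and the Hit structure are hypothetical stand-ins for the GPU ray tracer, and a full version would continue rays through transparent surfaces and accumulate alpha rather than stopping at the first hit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result of tracing one shadow ray from a visible surface.
struct Hit
{
    bool  blocked;     // did the ray hit anything on the way to the light?
    bool  transparent; // was the occluder semi-transparent?
    float alpha;       // occluder opacity, if transparent
    float distance;    // surface-to-occluder distance in world units
};

Hit TraceTowardLight(int x, int y)
{
    (void)x; (void)y;
    return Hit{false, false, 0.0f, 0.0f}; // placeholder: no scene in this sketch
}

void ShadowRayPass(int width, int height,
                   std::vector<float>& density,      // screen-space shadow density
                   std::vector<float>& occluderDist) // distance to occluder
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            const Hit hit = TraceTowardLight(x, y);
            if (!hit.blocked)
            {
                density[i] = 0.0f;      // fully lit, no occluder recorded
                occluderDist[i] = 0.0f;
            }
            else
            {
                // Opaque occluders give density 1; transparent ones
                // contribute their alpha value instead.
                density[i] = hit.transparent ? hit.alpha : 1.0f;
                occluderDist[i] = hit.distance;
            }
        }
}
```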
This method has several advantages over cascaded shadow maps or other common techniques:
There are no shadow map resolution issues since it is all based in screen space
There are no banding, noise or buzzing effects due to sampling errors
There are no biasing problems (sometimes called Peter-Panning) since you are shooting rays directly off geometry and therefore getting perfect contact between the shadow and the casting object
Below we show an example of the buffers that are generated by the ray tracing pass (we’ve uploaded the original frame here).
First, we have the ray tracing density buffer. Most of the objects in the scene are opaque, so they have a shadow density of 1. However, the fence region contains multiple pixels with values between 0 and 1.
Next up is the distance to occluder buffer. As we get further away from the occluding objects, the value of the red component increases, representing a greater distance between the shadow pixel and the occluder.
Finally we run a filter pass to calculate the shadow value for each pixel using these two buffers.
Firstly, we work out the size of the penumbra affecting each pixel, use that penumbra to choose a blur kernel radius, and then blur the screen-space shadow density buffer accordingly. For a pixel that has a populated value in the distance to occluder buffer, calculating the penumbra is easy. Since we have the distance to the occluder, we just need to project that value from world space into screen space, and use the projected penumbra to select a blur kernel radius.
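The projection from world-space penumbra to a pixel-space blur radius might look like the sketch below, assuming a simple pinhole camera model; all parameter names here are illustrative.

```cpp
#include <cmath>

// A minimal sketch of turning the world-space penumbra width into a
// screen-space blur kernel radius under a pinhole projection.
float PenumbraToKernelRadius(float penumbraWorld, // from R * O / (L - O)
                             float viewDepth,     // pixel depth along camera axis
                             float fovY,          // vertical field of view, radians
                             int screenHeight)
{
    // Pixels per world unit at this depth under a pinhole camera model.
    float pixelsPerUnit = screenHeight / (2.0f * viewDepth * std::tan(fovY * 0.5f));
    return 0.5f * penumbraWorld * pixelsPerUnit; // blur radius in pixels
}
```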
When the pixel is lit, we need to work a little harder. We use a cross search algorithm to locate another pixel with a ray that contributes to the penumbra. If we find any pixels on the X or Y axis that are in shadow (i.e. have valid distance values), we pick the maximum distance to occluder value and use it to calculate the penumbra region for this pixel; we then adjust for the distance of the located pixel and calculate the penumbra size.
From here on, the algorithm is the same: we take the size of the penumbra from world space and project it into screen space, then we figure out the region of pixels that penumbra covers, and finally perform a blur. In the case where no shadowed pixel is found, we assume our pixel is fully lit.
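A sketch of that cross search follows; the maxSteps limit and buffer layout are assumptions, and the adjustment for how far away the found pixel is (mentioned above) is omitted for brevity.

```cpp
#include <cstddef>
#include <vector>

// A minimal sketch of the cross search: from a lit pixel, walk outward along
// the X and Y axes looking for shadowed pixels with recorded occluder
// distances, and keep the largest one.
float CrossSearchOccluderDistance(const std::vector<float>& occluderDist,
                                  int width, int height,
                                  int x, int y, int maxSteps)
{
    static const int dx[4] = { 1, -1, 0, 0 };
    static const int dy[4] = { 0, 0, 1, -1 };
    float best = 0.0f;
    for (int step = 1; step <= maxSteps; ++step)
        for (int d = 0; d < 4; ++d)
        {
            const int nx = x + dx[d] * step;
            const int ny = y + dy[d] * step;
            if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                continue;
            const float dist = occluderDist[static_cast<std::size_t>(ny) * width + nx];
            if (dist > best)
                best = dist; // keep the maximum distance to occluder
        }
    return best; // 0 means no shadowed neighbor was found: treat as fully lit
}
```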
Below we have a diagram representing our final filter kernel.
We cover the penumbra region with a box filter and sample it while remaining aware of discontinuous surfaces. This is where depth rejection comes to our aid: to calculate the depth rejection values, we use local differencing to figure out the depth delta between the current pixel and its neighbors on the X and Y axes. The result tells us how the surface recedes as we travel in screen space, and as we sample the kernel, we expand the depth threshold based on how far we are from the center pixel.
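Below is a minimal sketch of such a depth-aware box filter, assuming linear depth values and per-pixel depth gradients estimated by local differencing; the base threshold constant is illustrative, not a value from the article.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A sketch of the depth-rejecting box filter over the shadow density buffer.
float FilterShadowDensity(const std::vector<float>& density,
                          const std::vector<float>& depth,
                          int width, int height, int x, int y, int radius)
{
    const std::size_t c = static_cast<std::size_t>(y) * width + x;
    const float centerDepth = depth[c];

    // Local differencing: how depth changes per pixel step in X and Y.
    const float ddx = (x + 1 < width)  ? depth[c + 1]     - centerDepth : 0.0f;
    const float ddy = (y + 1 < height) ? depth[c + width] - centerDepth : 0.0f;

    float sum = 0.0f, count = 0.0f;
    for (int j = -radius; j <= radius; ++j)
        for (int i = -radius; i <= radius; ++i)
        {
            const int sx = x + i, sy = y + j;
            if (sx < 0 || sx >= width || sy < 0 || sy >= height)
                continue;

            // Follow the surface slope and widen the accepted depth range
            // the further the sample is from the center pixel.
            const float expected  = centerDepth + ddx * i + ddy * j;
            const float threshold = 0.01f * (1.0f + std::abs(float(i)) + std::abs(float(j)));
            const float sampleDepth = depth[static_cast<std::size_t>(sy) * width + sx];
            if (std::abs(sampleDepth - expected) > threshold)
                continue; // reject samples from discontinuous surfaces

            sum += density[static_cast<std::size_t>(sy) * width + sx];
            count += 1.0f;
        }
    return (count > 0.0f) ? sum / count : density[c];
}
```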
In the example above, we have rejected all the samples marked in red because the corresponding area belongs to the fence and we are interested in sampling a spot on the ground. After the blurring pass, the resulting buffer represents an accurate estimate of the shadow density across the screen.
The images below compare an implementation of four-slice cascaded shadow maps at 2K resolution against ray traced shadows. In the ray traced case, we retain shadow definition and accuracy where the distance between the shadow-casting object and the shadow receiver is small; by contrast, cascaded shadow maps often overblur, ruining shadow detail.
Clicking on the full resolution images reveals the severe loss of image quality that occurs in cascaded shadow maps. I've embedded the low-res images below for reference:
In the second and third examples, we’ve removed the textures so we can highlight the shadowing.
The diagram below describes the initial implementation of the ray tracing hybrid model described in this article.
The first optimization we can make is to cast fewer rays. We can use the information provided by dot(N, L) to establish whether a surface faces away from the light. If dot(N, L) is less than or equal to zero, we don’t need to cast a ray at all: the pixel is shadowed by virtue of facing away from the light. A minimal version of this test is sketched below.
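```cpp
// A minimal sketch of the back-facing test: if N.L is less than or equal to
// zero, the surface faces away from the light and no ray is needed.
struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// N is the surface normal, L the normalized direction toward the light.
bool ShouldCastShadowRay(const Vec3& N, const Vec3& L)
{
    return Dot(N, L) > 0.0f; // only spend a ray on light-facing surfaces
}
```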
Looking at the rendering pipeline, there are further optimizations we can make. The diagram below shows the standard deferred rendering approach; this approach involves many read and write operations and costs bandwidth (and therefore power).
The next optimization reduces the amount of data in each buffer by using data types with no more bits than the bare minimum needed; for example, we can pack the distance to occluder buffer into just 8 bits by normalizing the distance value to the range 0 to 1, since it doesn’t require very high precision. The step after that is to collapse passes: if we use the framebuffer fetch extension, we can merge the ray tracing and G-Buffer passes into one, saving all of the bandwidth of reading the G-Buffer back during the ray emission pass.
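The 8-bit packing might look like the following sketch; maxDistance is an illustrative parameter (the largest occluder distance we care to represent), not a value from the article.

```cpp
#include <algorithm>
#include <cstdint>

// Pack the occluder distance into 8 bits by normalizing it into [0, 1].
std::uint8_t PackDistance(float distance, float maxDistance)
{
    const float norm = std::clamp(distance / maxDistance, 0.0f, 1.0f);
    return static_cast<std::uint8_t>(norm * 255.0f + 0.5f);
}

float UnpackDistance(std::uint8_t packed, float maxDistance)
{
    return (packed / 255.0f) * maxDistance;
}
```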
Memory bandwidth usage analysis
Before we look at the final numbers, let’s spend some time on memory traffic. Memory traffic is the amount of data moved to and from external memory, and moving it consumes bandwidth. Every time a developer codes a texture fetch, the shading cluster (USC) in a PowerVR Rogue GPU first looks for the data in cache memory; if it is not resident in cache, the USC accesses DRAM to get the value. Every access to external memory incurs significant latency and consumes extra power, so when optimizing a mobile application the developer’s goal is always to minimize memory accesses.
By using specialized instruments to look at bandwidth usage, we can compare cascaded shadow maps with ray traced soft shadows on a PowerVR Wizard GPU. In total, the cascaded shadow maps implementation generates about 233 MB of memory traffic while the same scene rendered with ray traced soft shadows requires only 164 MB. For ray tracing, there is also an initial one-time setup cost of 61 MB for building the scene’s acceleration structure.
This structure can be reused from frame to frame, so it isn’t part of the totals for a single frame. We’ve also measured the G-Buffer pass independently to see how much of the total cost it contributes: roughly 97 MB in both cases.
Therefore, by subtracting that G-Buffer value from the total memory traffic, shadowing using cascaded maps requires 136 MB while ray tracing needs only 67 MB – a roughly 50% reduction in memory traffic.
We notice similar results in other views of the scene, depending on how many rays we are able to reject and how much filtering we have to perform. Overall, we see an average 50% reduction in memory traffic using ray traced shadows.
Looking at total cycle counts, the picture is even better; we see an impressive speed boost from the ray traced shadows. Because the different rendering passes are pipelined in both apps (i.e. the ray traced shadows app and the cascaded shadow maps app) we are unable to separate how many clocks are used for which pass. This is because portions of the GPU are busy executing work for multiple passes at the same time.
Using ray tracing to implement soft shadows halves the GPU cycle count on a PowerVR Wizard GPU
However, the switch to ray traced shadows resulted in a doubling of the performance for the entire frame!
I hope you’ve enjoyed this article about ray tracing and PowerVR Wizard GPUs; we look forward to sharing some exciting news and real-world demonstrations in the near future – stay tuned to our blog and dedicated Twitter account for more details coming soon!
For those interested in finding out more information on our PowerVR Ray Tracing technology, here is a selection of available resources from our archives:
An introduction to ray tracing and the fundamentals of our PowerVR Wizard architecture (Ray tracing made easy)
A blog article about the PowerVR GR6500 ray tracing GPU (PowerVR GR6500: ray tracing is the future… and the future is now) and an in-depth article detailing how ray tracing works (Ray tracing: the future is now)
A featured post on Gamasutra describing practical techniques for ray tracing in games (Practical techniques for ray tracing in games)
A guide on how we’ve implemented hybrid rendering in an experimental version of the Unity game engine (Implementing hybrid ray tracing in a rasterized game engine)
A short preview of the PowerVR Ray Tracing-enabled Unity lightmap editor (PowerVR Ray Tracing behind the lightmaps in the Unity editor)
The story of how we taped out our PowerVR GR6500 ray tracing GPU in a test chip (Award-winning PowerVR GR6500 ray tracing GPU tapes out)
A big thank you to Justin DeCell for making this happen.