How to build a BVH – part 5: TLAS & BLAS

In this series we explore the wonderful world of the bounding volume hierarchy: the mysterious data structure that enables real-time ray tracing on recent GPUs, in complex, animated scenes. Despite the broad adoption of ray tracing in games, in-depth understanding of the underlying technology seems to be reserved to a select few. In this article we show how a proper BVH is implemented without support from DXR / RTX / OptiX / Vulkan, in easy-to-read ‘sane C++’. Whether this is actually useful for you, or just interesting as background information, depends on your needs, obviously.

This series consist of ten articles. For each article, full source code is available on Github, in easy-to-read ‘Sane C++’. Contents:

  1. Basics: data structure, simple construction and ray traversal.
  2. Faster rays: the Surface Area Heuristic and ordered traversal.
  3. Faster construction: binned BVH building.
  4. BVHs for animated models: refitting and rebuilding.
  5. BVHs for animated scenes: the top-level BVH (this article).
  6. TLAS & BLAS part 2 – all together now.
  7. Consolidating, part 1: ray tracing textured and shaded meshes.
  8. Consolidating, part 2: Whitted-style ray tracing.
  9. OpenCL: GPU ray tracing, part 1.
  10. OpenCL: GPU ray tracing, part 2 (series finale).

In the final two articles we explore how to build a BVH for a fully dynamic scene, using the same techniques employed by modern games: the top level acceleration structure and the bottom level acceleration structure.

Mise en Place

As usual we’ll start with some preparations – an approach, I recently learned, called ‘mise en place’ in the French cuisine, which apparently also works for programming. For today’s main dish we will need multiple BVHs, so it is time to bring a tiny bit of OOP to the code (just a pinch should do). Let’s add a proper BVH class to encapsulate the BVHNode pool, triangle indices and a triangle list:

A lot of functionality we implemented before is moved to this class, in general without (much) modifications. I made some changes to the names of the methods: BuildBVH() is now BVH::Build(), RefitBVH() is now BVH::Refit(), and IntersectBVH(...) is now BVH::Intersect(...). Methods that are not to be called outside the BVH class are declared private. The data is also private. Loading the scene is now handled by the constructor of the BVH class:

The global scope definition of N (the number of triangles in the mesh) is now passed to the constructor, because I insist on keeping file loading as basic as possible.

With the newly created BVH class, we can now reduce the global application data to:

…which we initialize in the TopLevelApp::Init() function:

OOP has its merits; this is tidy. Plus, we are now ready to add more meshes. Intersecting these efficiently is going to be the first step; after that, we will also animate them efficiently, all using some quite beautiful constructs, commonly referred to as the TLAS and BLAS.

Two Armadillos

Now that the BVH data and logic is properly stored in a class, we can easily add a second mesh:

This is not terribly useful: both armadillos will be at the same location. We could solve this by adding an offset to the vertices of each armadillo. The next issue is the calculation of an intersection. This is not too hard either: calling BVH::Intersect() with a ray updates the intersection t for that ray; we can thus call this function once for each armadillo. That does work, but it seems impractical for a scene with more than a couple armadillos… And while we are pointing out problems with the current plan: why do we store the armadillo twice, while all the polygons are the same?

Let’s start with that last question. To move an armadillo, we can add a translation to every vertex of the mesh. Or, and that is a tantalizing alternative: we can move the camera with the inverse of the same translation.

Is the train moving to the left, or is the landscape moving to the right? Doesn’t matter, it’s all the same.

So instead of loading the armadillo twice, and moving one of them, and intersecting them one at a time, we can do something clever:

Here, the same BVH is intersected twice, with the same ray even, but between invocations of bvh.Intersect(ray), we change the ray. By moving the ray a bit to the left, we effectively move the scene to the right. Then, by moving the ray to the right, we move the scene to the left. The result is a scene with two armadillos.

We can take this one step further. Moving the ray in the opposite direction of a mesh can be applied to translations as well as orientations. Concrete and formal:

Given a mesh with a BVH built in object space and 4×4 matrix that stores orientation and translation for the mesh, we can intersect its BVH in world space by applying the inverse transform to the rays.

Recall the race game scenario. The track is stationary (it uses the identity matrix), but the cars use matrices to zip around. And the car tires use matrices relative to the car.

This may sound complex, but it is very powerful. In any 3D engine, the meshes of the scene hierarchy come with their own 4×4 matrix. To ray trace these meshes, we simply transform rays using the inverse matrix. This still works if the matrix changes from one frame to another: mesh motion defined by a matrix does not require a rebuild or refit, and is essentially free.

Spinning Armadillos

It is time to add this functionality to the BVH class. We will need a few things:

  • Storage for the transformation matrix (4×4, so it includes rotation and translation, potentially also scaling and shearing).
  • Storage for the bounds of the BVH in world space. This will be important for constructing the TLAS later.

The new fields are named invTransform and bounds:

Note that we don’t actually store the transformation matrix: we store its inverse. Inverting a matrix is somewhat costly, so we don’t want to do this for every ray.

The new fields are filled using a new method:

This is some pretty dense code, so allow me to explain. First, as promised, the invTransform field is set using the inverted 4×4 matrix. After this we calculate the bounding box for the transformed bounding box of the object. This sounds strange, but it is quite useful: The only way to detect if a ray misses a BVH is by checking its root node against the ray. In the case of a transformed BVH, this already requires transformation of the ray. Storing (the AABB of) the transformed bounds of the BVH root node allows us to do an early ‘hit/miss’ test using the original ray. The AABB is likely going to be too large, but that’s OK: at least it lets us efficiently ignore BVHs that a ray does not even get close to. In the actual code, the for loop calculates the 8 unique combinations of x, y and z from aabbMin and aabbMax of the root node; these combinations represent the 8 corners of the root AABB.

Using the inverted transform we can now make the required modifications to BVH::Traverse.

The ray that is about to intersect the BVH first gets copied, so we can restore it later: we may wish to intersect the same ray with another BVH. After that we transform the ray origin and direction, and recalculate the reciprocals of the direction. With the transformed ray we then traverse the BVH in the usual manner.

Maths note: the ray origin is a position; to transform it using a matrix we should apply rotation and translation. We thus use a homogeneous coordinate with 1 for the w component. The ray direction on the other hand should be rotated but not translated; it is thus turned into a homogeneous coordinate with 0 for the w component. The TransformPosition and TransformVector functions reflect this difference.

At the end of the code we restore the original ray, except for one field: the t value, which represents the nearest intersection distance.

With the proposed modifications, the Tick function now becomes quite a bit more elegant:

Note how bvh[0] is transformed to the left using a translation matrix. For bvh[1] we use a concatenated matrix, consisting of a rotation around the y-axis followed by a translation to the right.

Many Armadillos

With the code we have so far we can conveniently spin and move models, without rebuilding or refitting their BVHs. But what if we have a lot of Armadillos? This is where the top level acceleration structure comes in.

It starts with the notion that a pair of BVHs can be combined into a single BVH, by simply adding one new node, that has the pair of BVHs as child nodes. We now have a new, valid BVH that we can traverse as if it were a regular BVH. If we combine this with the concept of transformed BVHs, we get something very powerful. The nodes that we use to combine a set of BVHs into a single BVH are referred to as the top level acceleration structure, or TLAS. It effectively helps us to hierarchically cull groups of objects, just like we culled groups of triangles using the BVH.

Let’s start with the definition of a TLAS node:

This looks like a regular BVH node. The leftFirst field, which originally stored either the index of the left node, or the ‘index of the index’ of the first triangle, now stores a BLAS index: in our simple case with two BVHs, this would be either 0 or 1. In the TLAS, we will never have more than one BLAS in a leaf node, so indirection is not needed here. The triCount field is also not needed. Since we still need to know if this is a leaf node or not, we now simply use a boolean, encoded as a uint.

Also analogous to the original BVH we get a TLAS, which stores data related to the top level acceleration structure:

The tlasNode field is a pool of nodes, analogous to the bvhNode array we used for constructing a BVH. The triangle array of the BVH is replaced by a BVH array. Each BVH is a bottom level acceleration structure, so we name this array blas.

Let’s build a minimal TLAS, before we get to more complex scenes.

This is the constructor for the TLAS. It takes an array of BVHs (and since I stubbornly evade the use of a convenient vector here, N stores the BVH count), which we store in the TLAS object. A TLAS over N BVHs will consist of 2N-1 nodes, so we pre-allocate this set of nodes. And as usual, for cache alignment reasons, we skip tlasNode[1], so the actual array has 2N elements. Element 0 is, as usual, reserved for the root.

For just two BVHs we can build the TLAS manually.

Here, two TLAS leaf nodes are created, which reference BVH 0 and 1. The nodes have massive AABBs, to ensure that a ray will always intersect them. We’ll need to refine that later. The root node of the TLAS references child node 2 (and thus also 3), which we just created. The root also has a massive AABB, for now.

The TLAS is traversed like a regular BVH:

The difference between TLAS traversal and BLAS traversal is in the leaf nodes. For the TLAS, a leaf contains a single BLAS, which we must intersect. After intersecting the BLAS, we still may need to intersect other BVHs (if they are closer than the nearest intersection so far), so the functionality that restores the ray after traversing a BVH comes in handy now.

Let’s put the TLAS/BLAS to the test. First, we create the TLAS in the ::Init function:

Then, in the ::Tick function, we replace the two calls to BVH::Intersect(...) by a single call to TLAS::Intersect(...):

And that is all. We now have two BVHs, nicely tucked away into a single structure, which we can intersect with a single call. We can still change the transforms for the BVHs, per frame if we wish, without any cost.

Loose Ends

A lot remains to be done, but this is already a pretty long article. Next time we will fill in the details:

  • Rebuilding the TLAS per frame
  • Building a TLAS for lots of armadillos
  • Proper AABBs in the TLAS
  • Instancing, so we don’t have to load the armadillo twice
  • Combining TLAS/BLAS with refitting and rebuilding

In short: everything will come together.

Closing Remarks

The complete code for this project (for Windows / Visual Studio) can be found on GitHub.

Enjoyed the article? Spotted an error?

Ping me on Twitter (@j_bikker), comment on this article, or send an email to bikker.j@gmail.com.
You can also find me on Github: https://github.com/jbikker.

Or read some more articles: https://jacco.ompf2.com. Some favorites:

Author: jbikker

3 thoughts on “How to build a BVH – part 5: TLAS & BLAS

  1. Maybe I misunderstand something, but shouldn’t ray.t inside BVH::Intersect() get converted from BLAS-space to TLAS-space? (by scaling by appropriate vector)

    Also, great series – it helped me tremendously in understanding BVHs 🙂

    1. I know it is very counter-intuitive, but no: you should *not* scale ray.t when leaving the BLAS. Note that ray.D is likewise not normalized when you *enter* BLAS space.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.