GPU-based visibility: empirical study

In this post I compare the performance of some GPU-based visibility techniques against traditional ones.

Visibility techniques covered

CPU-based Viewfrustum Culling (VFC)

VFC culls all objects that fall completely outside of the camera frustum. To compute this efficiently (and conservatively), axis-aligned bounding boxes (AABBs) are used to check whether models are inside or outside the camera frustum.
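
As an illustration of the AABB test described above, here is a minimal Python sketch. It is not the code used in the study: the plane representation (a normal plus an offset, with inside meaning n.p + d >= 0), the function name aabb_outside_frustum and the box-shaped UNIT_FRUSTUM are all assumptions made for this example. For each plane it checks only the box corner farthest along the plane normal; if even that corner is behind the plane, the whole box is outside.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
# Assumed plane format: (normal n, offset d); a point p is inside when n.p + d >= 0.
Plane = Tuple[Vec3, float]


def aabb_outside_frustum(box_min: Vec3, box_max: Vec3, planes: List[Plane]) -> bool:
    """Return True only if the box lies completely behind at least one plane."""
    for (nx, ny, nz), d in planes:
        # Pick the box corner that is farthest along the plane normal.
        px = box_max[0] if nx >= 0 else box_min[0]
        py = box_max[1] if ny >= 0 else box_min[1]
        pz = box_max[2] if nz >= 0 else box_min[2]
        if nx * px + ny * py + nz * pz + d < 0:
            return True  # even the farthest corner is behind this plane
    return False  # conservative: boxes that straddle a plane are kept


# Hypothetical box-shaped "frustum" covering [-1, 1] on every axis.
UNIT_FRUSTUM: List[Plane] = [
    ((1.0, 0.0, 0.0), 1.0), ((-1.0, 0.0, 0.0), 1.0),
    ((0.0, 1.0, 0.0), 1.0), ((0.0, -1.0, 0.0), 1.0),
    ((0.0, 0.0, 1.0), 1.0), ((0.0, 0.0, -1.0), 1.0),
]
```

The test is conservative in the sense used above: a box near a frustum corner can be reported as inside even though it is outside, but a box that is at least partly inside is never culled.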

Stop and Wait (SW or S&W)

S&W uses occlusion queries to determine whether an object is visible on screen and therefore should be rendered. For this to work, objects must be rendered front to back, so that front objects already rendered occlude the ones behind them; those fail the occlusion query test and consequently are not rendered.
It is important to notice that objects are processed strictly one at a time: an object is neither queried nor rendered until the previous one has been queried and then rendered (or discarded).
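
The S&W loop can be sketched with a toy CPU simulation. This is only an illustration under strong assumptions: the screen is a single row of columns, each object covers a column range at a constant depth, and the occlusion query is replaced by counting columns that would pass the depth test. The names Obj and stop_and_wait are made up for this example.

```python
from typing import List, NamedTuple


class Obj(NamedTuple):
    name: str
    depth: float  # distance to the camera
    x0: int       # first covered screen column (inclusive)
    x1: int       # last covered screen column (exclusive)


def stop_and_wait(objects: List[Obj], width: int) -> List[str]:
    """Return the names of the objects that end up being rendered."""
    depth_buffer = [float("inf")] * width
    rendered: List[str] = []
    for obj in sorted(objects, key=lambda o: o.depth):  # front to back
        # Stand-in for an occlusion query: count columns passing the depth test.
        # With real hardware queries, the CPU would stall here for the result.
        visible = sum(1 for x in range(obj.x0, obj.x1) if obj.depth < depth_buffer[x])
        if visible == 0:
            continue  # fully occluded by objects rendered earlier: discard
        for x in range(obj.x0, obj.x1):
            depth_buffer[x] = min(depth_buffer[x], obj.depth)
        rendered.append(obj.name)
    return rendered
```

For example, with a wall at depth 1 covering columns 0 to 7, an object at depth 2 covering columns 2 to 5 is discarded, while an object at depth 3 covering columns 6 to 9 is still rendered because two of its columns are not covered by the wall.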

Coherent Hierarchical Culling (CHC)

Approach based on "Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful" by Bittner et al.
CHC is a heuristic that tries to reduce the waiting times of the S&W approach. To do so, it uses a scene hierarchy and temporal information. In each branch of the scene hierarchy, only the topmost node that was hidden in the previous frame is queried. In the current implementation, the hierarchy is traversed depth-first (DFS). Leaf nodes that were visible in the previous frame are queried in the current frame and rendered without waiting for the query results.
At each iteration, the algorithm checks whether any query results are available. If a queried node that was hidden turns out to be visible in this frame, it is marked visible and traversed (i.e. its child nodes are queried in turn, and so on).
On the other hand, if a leaf node that was visible in the previous frame turns out to be hidden, we have a miss, because we have already rendered it. However, the invisibility is implicitly pulled up to the parent nodes if they are no longer visible either, since in each new frame every node is reset to hidden when it is visited and queried. This fixes the miss for the following frame.
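
To make the traversal above more concrete, here is a simplified Python sketch of one CHC frame. It is not the implementation used in this study, and it takes several shortcuts that are assumptions of the example: the GPU occlusion query is replaced by a query callback, query results are only read once the traversal stack is empty, and all names (Node, chc_frame, pull_up_visibility, leaves_visible, demo) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional, Set, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)
    visible: bool = False  # visibility state carried over from the previous frame

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def pull_up_visibility(node: Optional[Node]) -> None:
    # Mark the node and all of its still-hidden ancestors as visible.
    while node is not None and not node.visible:
        node.visible = True
        node = node.parent


def chc_frame(root: Node, query: Callable[[Node], bool]) -> List[str]:
    """Run one simplified CHC frame and return the rendered leaf names."""
    rendered: List[str] = []
    stack: List[Node] = [root]                   # depth-first traversal
    pending: Deque[Tuple[Node, bool]] = deque()  # issued queries not yet read

    def traverse(node: Node) -> None:
        if node.children:
            stack.extend(reversed(node.children))
        else:
            rendered.append(node.name)  # leaf node: render the model

    while stack or pending:
        if not stack:
            # Nothing left to traverse: read the oldest pending query result.
            node, was_visible = pending.popleft()
            if query(node):
                pull_up_visibility(node)
                if not was_visible:
                    traverse(node)  # previously hidden node became visible
            continue
        node = stack.pop()
        was_visible = node.visible
        node.visible = False  # reset; restored by pull-up if still visible
        if was_visible and node.children:
            traverse(node)  # previously visible inner node: skip the query
        else:
            pending.append((node, was_visible))  # issue the query
            if was_visible:
                traverse(node)  # previously visible leaf: render without waiting
    return rendered


def leaves_visible(node: Node, visible_leaves: Set[str]) -> bool:
    # Toy oracle: a node passes its query if any leaf below it is visible.
    if not node.children:
        return node.name in visible_leaves
    return any(leaves_visible(c, visible_leaves) for c in node.children)


def demo() -> List[List[str]]:
    root = Node("root", [Node("A", [Node("a1"), Node("a2")]),
                         Node("B", [Node("b1"), Node("b2")])])
    frames = [{"a1"}, {"b1"}, {"b1"}]  # visible leaves in three consecutive frames
    return [chc_frame(root, lambda n, v=v: leaves_visible(n, v)) for v in frames]
```

In demo(), leaf a1 is visible in the first frame and hidden afterwards. In the second frame it is still rendered once (the miss described above), because it was visible in the previous frame; in the third frame it is no longer rendered.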

Experiment set-up

A camera path has been recorded and reproduced using the following configurations:

No optimization
Viewfrustum Culling (VFC)
Stop and Wait (S&W)
Stop and Wait + Viewfrustum Culling (S&W + VFC)
Coherent Hierarchical Culling (CHC)
Coherent Hierarchical Culling + Viewfrustum Culling (CHC + VFC)

The model used is moai.ply, replicated in a 24 x 24 grid (key button 6).
In the current system, this configuration is heavily saturated in geometry, so real-time/interactive frame rates are not achieved in some cases, as can be observed in Figure 1. However, this is not relevant here, since the goal of this study is to compare the relative costs of the different techniques, and those are still clearly visible.
The camera path has been designed so that different scenarios appear (see Table 1).

Results

Table 1: Description of the path intervals and approximate average frame times obtained with no optimization, VFC and CHC (the most relevant techniques), with sample images belonging to each interval. Best measures for each scenario are highlighted in green.
Frame	Description	No OPT	VFC	CHC
0-350	Camera close to individual models.	195 ms	160 ms	40 ms
350-550	Camera frustum containing the whole scene at floor level	195 ms	195 ms	55 ms
550-800	Camera seeing the whole scene from above	195 ms	195 ms	370 ms
800-950	Camera navigating below the floor plane	195 ms	120 ms	5 ms
950-1200	Camera navigating inside the grid at floor level	195 ms	90 ms	30 ms
1200-1600	Camera looking at edges and corners of the grid (low amount of objects)	195 ms	45 ms	70 ms
1600-2000	Camera seeing most of the scene from above	195 ms	170 ms	330 ms

Figure 1: Time needed to compute each frame of the test sequence with each of the visibility techniques covered. Dashed lines represent the same techniques as the solid ones, but combined with viewfrustum culling (VFC).

Discussion

As can be observed in Table 1 and Figure 1, the choice of visibility technique can make a huge difference in performance.
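
To put rough numbers on that difference, the relative speedups can be derived from the approximate averages in Table 1. The short Python sketch below only does that arithmetic; the names FRAME_MS, speedup and slower_than_baseline are made up for this example, and the values are the rounded averages from the table, not new measurements.

```python
from typing import Dict, List, Tuple

# Approximate average frame times in ms, copied from Table 1:
# (no optimization, VFC, CHC) for each frame interval.
FRAME_MS: Dict[str, Tuple[float, float, float]] = {
    "0-350": (195, 160, 40),
    "350-550": (195, 195, 55),
    "550-800": (195, 195, 370),
    "800-950": (195, 120, 5),
    "950-1200": (195, 90, 30),
    "1200-1600": (195, 45, 70),
    "1600-2000": (195, 170, 330),
}


def speedup(baseline_ms: float, technique_ms: float) -> float:
    """How many times faster a technique is than the baseline (below 1 means slower)."""
    return baseline_ms / technique_ms


def slower_than_baseline(column: int) -> List[str]:
    """Intervals where the technique in the given column is slower than no optimization."""
    return [interval for interval, row in FRAME_MS.items() if row[column] > row[0]]


if __name__ == "__main__":
    for interval, (no_opt, vfc, chc) in FRAME_MS.items():
        print(f"{interval:>9}  VFC x{speedup(no_opt, vfc):.2f}  CHC x{speedup(no_opt, chc):.2f}")
```

For instance, in the 800-950 interval CHC is 195 / 5 = 39 times faster than no optimization, while in the 550-800 interval it takes about 1.9 times as long (370 ms against 195 ms).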

Viewfrustum Culling (VFC)

The first important difference to notice is the impact of VFC. Compared to no optimizations, VFC achieves much better performance in general, and almost identical performance in the worst cases. Therefore, it is safe to use VFC in this configuration.
Worst cases correspond to sequence intervals in which all elements of the grid are contained inside the camera frustum, so VFC cannot cull anything and the bottleneck remains.
Best cases are associated with frames that contain very little geometry in the camera frustum.

Stop and Wait (S&W)

The first occlusion query-based method introduced is S&W. The technique alone performs worse than using no optimization at all. As expected, this is because of the constant stalls on both the CPU and the GPU during the occlusion querying phase.
With VFC, the technique seems to perform better, but still worse than only using VFC.
We could conclude that it is not worth using S&W for almost any scenario.

Coherent Hierarchical Culling (CHC)

Finally, let's analyze the effect of CHC on performance. In general, it significantly reduces frame times with respect to the previous approaches. Looking at Figure 1, we can even see many frames in which the improvement is around one order of magnitude.
However, it can also be observed that this technique is not very stable in terms of frame rate. VFC may seem unstable too, but in every case it performs the same as or better than no optimization. This is not the case for CHC: there are some frames in which its performance is significantly worse than with no optimization at all. This happens in frames with no occlusions (or very few of them), because even if all leaf nodes end up being rendered, they still need to be queried as well. Since a leaf node is equivalent to a model, all the models are queried, and the frame rendering phase does not end until all queries have finished.
Also, notice that VFC has almost no impact when using CHC. This is because the hierarchical structure of the scene performs this step implicitly. If an inner node (containing a set of models) does not pass the query test, it is considered occluded, but it could also be the case that it was not projected onto the screen at all, i.e. it is outside the camera frustum.
Therefore, very few queries are avoided if we use VFC, to the point that there is no significant gain in using it (the VFC test is as expensive as launching the few queries it avoids).

Even though current techniques are more involved, the performance of the algorithms covered gives a rough idea of the pros and cons of each one. I learned a lot working on this, so I hope it has been useful to you too in some way.

See you in the next one!


Santi Gonzalez