In this post I am going to study the performance of some GPU-based visibility techniques against traditional ones.
Visibility techniques covered
CPU-based Viewfrustum Culling (VFC)
VFC culls all objects that fall completely outside of the camera frustum. To compute this efficiently (and conservatively), axis-aligned bounding boxes (AABBs) are tested against the frustum instead of the models themselves.
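To make the AABB test more concrete, here is a minimal sketch of a conservative box-versus-frustum check. It is not the code used in this post: the `Vec3`, `Plane`, `AABB` and `Frustum` types and the `isOutsideFrustum` function are hypothetical names, and it assumes the six frustum planes are already available with their normals pointing inwards.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Plane dot(n, p) + d = 0, with the normal n pointing towards the inside
// of the frustum (an assumption of this sketch).
struct Plane { Vec3 n; float d; };

// Axis-aligned bounding box given by its minimum and maximum corners.
struct AABB { Vec3 lo, hi; };

using Frustum = std::array<Plane, 6>;

// Returns true only when the box lies completely behind at least one plane.
// The test is conservative: a box near a frustum corner may be kept even
// though it is actually outside, but a visible box is never culled.
bool isOutsideFrustum(const AABB& box, const Frustum& frustum) {
    for (const Plane& p : frustum) {
        // Pick the box corner that lies farthest along the plane normal.
        const Vec3 v{ p.n.x >= 0.0f ? box.hi.x : box.lo.x,
                      p.n.y >= 0.0f ? box.hi.y : box.lo.y,
                      p.n.z >= 0.0f ? box.hi.z : box.lo.z };
        const float dist = p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
        // If even that corner is behind the plane, the whole box is outside.
        if (dist < 0.0f) return true;
    }
    return false;
}
```

Only one corner per plane is evaluated, which is what keeps the test cheap compared to checking the model geometry.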
Stop and Wait (SW or S&W)
S&W uses occlusion queries to determine whether an object is visible on
screen and should therefore be rendered. For this to work, we need to
render front to back, so that front objects that have already been
rendered occlude the objects behind them; those fail the occlusion query
test and consequently are not rendered.
It is important to note that objects are processed strictly one at a
time: an object is neither queried nor rendered until the previous one
has been queried and then rendered (or discarded).
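The control flow of S&W can be sketched as follows. This is only an illustration under stated assumptions: `Object`, `renderStopAndWait` and the two callbacks are hypothetical names, and the GPU work is hidden behind the callbacks so that the loop itself can be read (and run) without a graphics context. In OpenGL, the blocking query would typically be issued with `glBeginQuery`/`glEndQuery` and read back with `glGetQueryObjectuiv` using `GL_QUERY_RESULT`, which waits until the result is ready.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

struct Object {
    int id;
    float distanceToCamera;
};

// Hypothetical callbacks standing in for the GPU work:
//  - queryVisible: issues an occlusion query for the object and BLOCKS
//    until the result is available (this is the "wait" in Stop and Wait).
//  - draw: renders the object.
// Returns the number of objects that were actually rendered.
int renderStopAndWait(std::vector<Object> objects,
                      const std::function<bool(const Object&)>& queryVisible,
                      const std::function<void(const Object&)>& draw) {
    // Front-to-back order, so that near objects can occlude far ones.
    std::sort(objects.begin(), objects.end(),
              [](const Object& a, const Object& b) {
                  return a.distanceToCamera < b.distanceToCamera;
              });

    int rendered = 0;
    for (const Object& obj : objects) {
        // The next object is not touched until this one is resolved.
        if (queryVisible(obj)) {
            draw(obj);
            ++rendered;
        }
    }
    return rendered;
}
```

Because every iteration waits for its own query result, the CPU and the GPU keep stalling on each other, which is the cost that CHC tries to hide.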
Coherent Hierarchical Culling (CHC)
This approach is based on "Coherent Hierarchical Culling: Hardware Occlusion
Queries Made Useful" by Bittner et al.
CHC is a heuristic that tries to reduce the waiting times of the S&W
approach. To do so, it uses a scene hierarchy and temporal information.
At first, only the first level of nodes that were hidden in the previous
frame (in each branch of the scene hierarchy) is queried. In the current
implementation, the hierarchy is traversed depth-first (DFS). Leaf nodes
that were visible in the previous frame are queried in the current frame
and rendered without waiting for the query results. At each iteration,
the algorithm checks whether any query results are available. If
previously hidden nodes turn out to be visible in this frame, they are
marked visible and traversed (i.e. their child nodes are queried in turn,
and so on). On the other hand, if a leaf node that was visible in the
previous frame turns out to be hidden, we have a miss, because we have
already rendered it. However, the invisibility is implicitly pulled up
to its parent nodes if they are not visible either, since in each new
frame every node is reset to hidden when it is visited and queried. This
fixes the miss for the following frame.
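The visibility bookkeeping described above can be sketched in a few lines. This is a simplified illustration modelled on the pull-up step of the CHC paper, not the implementation used in this post; `Node`, `resetOnQuery` and `applyQueryResult` are hypothetical names, and the traversal and query queue are left out on purpose.

```cpp
struct Node {
    Node* parent = nullptr;
    bool visible = false;  // visibility classification kept between frames
};

// Called when a node is visited and its query is issued: the node is
// assumed hidden until a query result proves otherwise.
void resetOnQuery(Node& node) {
    node.visible = false;
}

// Called when the query result for a node becomes available.
void applyQueryResult(Node& node, bool queryPassed) {
    if (!queryPassed) {
        return;  // the node simply stays hidden
    }
    // Pull visibility up: mark the node and its ancestors visible,
    // stopping at the first ancestor that is already marked.
    for (Node* n = &node; n != nullptr && !n->visible; n = n->parent) {
        n->visible = true;
    }
}
```

Note that invisibility needs no explicit propagation: a parent whose children all fail their queries is never marked visible again, so it is treated as a hidden node in the next frame.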
Experiment set-up
A camera path has been recorded and reproduced using the following configurations:
- No optimization
- Viewfrustum Culling (VFC)
- Stop and Wait (S&W)
- Stop and Wait + Viewfrustum Culling (S&W + VFC)
- Coherent Hierarchical Culling (CHC)
- Coherent Hierarchical Culling + Viewfrustum Culling (CHC + VFC)
The model used is moai.ply, replicated in a 24 x 24 grid (key 6).
In the current system, this configuration is heavily geometry-bound, so real-time/interactive frame rates are not achieved in some cases, as can be observed in Figure 1. However, this is not relevant, since the goal of this study is to compare the relative costs
of the different techniques, which the results show clearly.
The camera path has been designed so that different scenarios appear
(see Table 1).
Results
Frames | Description | No OPT | VFC | CHC |
---|---|---|---|---|
0-350 | Camera close to individual models | 195 ms | 160 ms | 40 ms |
350-550 | Camera frustum containing the whole scene at floor level | 195 ms | 195 ms | 55 ms |
550-800 | Camera seeing the whole scene from above | 195 ms | 195 ms | 370 ms |
800-950 | Camera navigating below the floor plane | 195 ms | 120 ms | 5 ms |
950-1200 | Camera navigating inside the grid at floor level | 195 ms | 90 ms | 30 ms |
1200-1600 | Camera looking at edges and corners of the grid (few objects) | 195 ms | 45 ms | 70 ms |
1600-2000 | Camera seeing most of the scene from above | 195 ms | 170 ms | 330 ms |
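To compare the relative costs more directly, the frame times in Table 1 can be turned into speedups with respect to no optimization. The snippet below is just a small helper for that arithmetic; `Row`, `kTable1` and `speedup` are hypothetical names, and it assumes the three timing values of each row are, in order, No OPT, VFC and CHC.

```cpp
#include <vector>

// One row of Table 1; times are in milliseconds.
struct Row {
    const char* frames;
    double noOptMs;
    double vfcMs;
    double chcMs;
};

// Values copied from Table 1 (assumed column order: No OPT, VFC, CHC).
const std::vector<Row> kTable1 = {
    {"0-350",     195.0, 160.0,  40.0},
    {"350-550",   195.0, 195.0,  55.0},
    {"550-800",   195.0, 195.0, 370.0},
    {"800-950",   195.0, 120.0,   5.0},
    {"950-1200",  195.0,  90.0,  30.0},
    {"1200-1600", 195.0,  45.0,  70.0},
    {"1600-2000", 195.0, 170.0, 330.0},
};

// Speedup of a technique relative to the baseline:
// values above 1 mean faster than the baseline, below 1 mean slower.
double speedup(double baselineMs, double techniqueMs) {
    return baselineMs / techniqueMs;
}
```

For example, `speedup(195.0, 5.0)` for CHC in frames 800-950 gives 39, while `speedup(195.0, 370.0)` for CHC in frames 550-800 gives roughly 0.53, i.e. almost twice as slow as no optimization.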
Discussion
As can be observed in Table 1 and Figure 1, the choice of visibility technique can lead to a huge change in performance.
Viewfrustum Culling (VFC)
The first important difference to notice is the impact of VFC. Compared to no optimizations, VFC achieves much better performance in
general, and almost identical performance in the worst cases. Therefore, it is safe to use VFC in this configuration.
The worst cases correspond to sequence intervals in which all elements of the grid are inside the camera frustum, so VFC cannot cull anything and the bottleneck remains.
The best cases are associated with frames that contain very little geometry in
the camera frustum.
Stop and Wait (S&W)
The first occlusion query-based method introduced is S&W. The technique alone performs worse than using no optimization techniques. As
expected, this is because of the constant stalls in CPU and GPU during the occlusion querying phase.
Combined with VFC, the technique performs better, but still worse than VFC alone.
We can conclude that S&W is not worth using in almost any scenario.
Coherent Hierarchical Culling (CHC)
Finally, let's analyze the effect of CHC on performance. In general, it significantly reduces frame times with respect to the previous approaches. Looking at Figure 1, we can even see many frames in which the improvement is around one order of magnitude.
However, this technique is not very stable in terms of frame rate. VFC may seem unstable too, but in all cases it performs the same as or better than no optimization. This is not the case for CHC: there are some frames in which its performance is significantly worse than with no optimization at all. This happens in frames with no occlusion (or very little), because even if all leaf nodes end up being rendered, they still need to be queried. Since a leaf node is equivalent to a model, all the models are queried, and the frame rendering phase does not end until all queries have finished.
Also, notice that VFC has almost no impact when combined with CHC. This is because the hierarchical structure of the scene performs this step implicitly. If an inner node (containing a set of models) does not pass the query test, it is considered occluded, but it could also be the case that it was not projected onto the screen at all, i.e. it is outside the camera frustum.
Therefore, very few queries are avoided by using VFC, to the point that there is no significant gain in using it (the VFC test is as expensive as launching the few queries it avoids).
Even though current techniques are more involved, the performance of the algorithms covered here gives a brief idea of the pros and cons of each one. I learned a lot working on this, so I hope it has been useful to you too in some way.
See you in the next one!