C++ Graphics Rendering Modeling

Scalable Renderer

Academic SOLO project

Description

Real-time renderer built in C++ with scalability features and optimization algorithms for graphics and game engines. It includes LOD generation, time-critical rendering, and visibility preprocessing, among other techniques.

Final project for the UPC course Scalable Rendering for Graphics and Game Engines.

Demo


Implemented features

  • Compute and display framerate: The framerate is computed from the time elapsed during the previous F frames. This value is displayed alongside the framerate reported by ImGui (see the sketch below).
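
A minimal sketch of the rolling-average framerate computation; the helper class below is illustrative, not the project's actual code:

```cpp
#include <cstddef>
#include <deque>

// Illustrative sketch: averages the frame time over the last F frames.
class FrameCounter {
public:
    explicit FrameCounter(std::size_t F) : maxFrames(F) {}

    // Call once per frame with the elapsed time of that frame (in seconds).
    void addFrame(double deltaSeconds) {
        frameTimes.push_back(deltaSeconds);
        accumulated += deltaSeconds;
        if (frameTimes.size() > maxFrames) {
            accumulated -= frameTimes.front();
            frameTimes.pop_front();
        }
    }

    // Framerate computed from the time elapsed during the stored frames.
    double framerate() const {
        return frameTimes.empty() ? 0.0 : frameTimes.size() / accumulated;
    }

private:
    std::size_t maxFrames;
    std::deque<double> frameTimes;
    double accumulated = 0.0;
};
```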

  • Draw multiple copies of the same object: Implemented, although it is not particularly useful in the current state of the application. The functionality can be triggered via the numerical keys.

  • Complex museum: Tile-based museum generated with an external application (Tiled). A tile-map file parser has been implemented, so the museum can be edited in a user-friendly way directly in the Tiled editor (a parser sketch is shown after Figure 1). The current museum has 11 rooms filled with different content (see Figure 1).

    Figure 1: Tilemap.
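
As an illustration of the tile-file parsing, a minimal sketch that reads a Tiled layer exported as CSV into a grid of tile IDs (the actual parser and file format used in the project may differ):

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Illustrative sketch: reads a Tiled layer exported as CSV into a 2D grid of tile IDs.
std::vector<std::vector<int>> loadTileLayerCSV(const std::string &path) {
    std::vector<std::vector<int>> grid;
    std::ifstream file(path);
    std::string line;
    while (std::getline(file, line)) {
        std::vector<int> row;
        std::stringstream ss(line);
        std::string cell;
        while (std::getline(ss, cell, ','))
            if (!cell.empty()) row.push_back(std::stoi(cell));
        if (!row.empty()) grid.push_back(std::move(row));
    }
    return grid;
}
```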


  • Museum visualization: Floor and statue tiles are first filled with a flattened cube to represent the floor, and a statue is then added on top if needed. When a boundary between a floor tile and a non-floor tile is found, a flattened vertical cube is added on the corresponding tile face to represent a wall (see the sketch below).
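
A rough sketch of the wall-placement check over the parsed tile grid; the isFloor predicate and the WallRequest structure are hypothetical names:

```cpp
#include <vector>

// Illustrative sketch: whenever a floor tile borders a non-floor tile,
// a wall should be placed on the shared face.
struct WallRequest { int x, y; int dx, dy; }; // tile position and facing direction

std::vector<WallRequest> findWalls(const std::vector<std::vector<int>> &grid,
                                   bool (*isFloor)(int tileId)) {
    std::vector<WallRequest> walls;
    const int dirs[4][2] = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    for (int y = 0; y < (int)grid.size(); ++y)
        for (int x = 0; x < (int)grid[y].size(); ++x) {
            if (!isFloor(grid[y][x])) continue;
            for (const auto &d : dirs) {
                int nx = x + d[0], ny = y + d[1];
                bool outside = ny < 0 || ny >= (int)grid.size() ||
                               nx < 0 || nx >= (int)grid[ny].size();
                // A boundary between a floor tile and a non-floor (or missing) tile gets a wall.
                if (outside || !isFloor(grid[ny][nx]))
                    walls.push_back({x, y, d[0], d[1]});
            }
        }
    return walls;
}
```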

  • Movement: FPS-like movement in which WASD moves relative to the camera orientation, while E (up) and Q (down) move with respect to global coordinates (see the sketch below).
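
A minimal sketch of the camera-relative displacement, assuming GLM-style vectors (the project's actual math types may differ):

```cpp
#include <glm/glm.hpp>

// Illustrative sketch: WASD moves relative to the camera, E/Q along the global Y axis.
glm::vec3 computeDisplacement(const glm::vec3 &camForward, const glm::vec3 &camRight,
                              bool w, bool a, bool s, bool d, bool e, bool q,
                              float speed, float deltaTime) {
    glm::vec3 move(0.0f);
    if (w) move += camForward;
    if (s) move -= camForward;
    if (d) move += camRight;
    if (a) move -= camRight;
    if (e) move += glm::vec3(0.0f, 1.0f, 0.0f);  // up in global coordinates
    if (q) move -= glm::vec3(0.0f, 1.0f, 0.0f);  // down in global coordinates
    if (glm::length(move) > 0.0f) move = glm::normalize(move);
    return move * speed * deltaTime;
}
```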

  • Mesh loading: Meshes are loaded through a static (C++ static keyword) dictionary structure that acts as a cache. When an entity requests a mesh as its representation, the mesh is read from disk once and kept in memory; every entity that uses the same mesh simply stores a pointer to the cached instance, so the file never has to be read again (see the sketch below).
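
A minimal sketch of the static mesh cache; the Mesh type and loader below are illustrative placeholders:

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Illustrative placeholder for the project's mesh type.
struct Mesh {
    std::vector<float> vertices;
    std::vector<unsigned> indices;
};

// Placeholder for the actual disk loader (the real project parses the model file here).
std::unique_ptr<Mesh> readMeshFromDisk(const std::string &path) {
    (void)path;
    return std::make_unique<Mesh>();
}

// Illustrative sketch: a static dictionary keeps every loaded mesh in memory,
// so all entities sharing a model just store a pointer to the cached instance.
Mesh *getMesh(const std::string &path) {
    static std::map<std::string, std::unique_ptr<Mesh>> cache; // C++ `static` keyword
    auto it = cache.find(path);
    if (it == cache.end())
        it = cache.emplace(path, readMeshFromDisk(path)).first; // read from disk only once
    return it->second.get();
}
```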

  • Compute simplified versions of the loaded model using an octree: The octree is built by inserting each vertex of the model individually, generating new levels until the maximum desired depth is reached.
    The level-of-detail (LOD) meshes are then reconstructed by connecting the representatives of the octree nodes at the selected level (by default, the average of the clustered vertices) whenever the vertices they represent are connected in the detailed mesh.
    LODs are generated and stored to disk; if the targeted LOD already exists, it is loaded directly without recomputing the octree.
    With the current implementation of the octree class, very complex meshes cannot be simplified. For example, lucy.ply cannot be simplified with QEM using a maximum depth ≥ 9 without running into memory issues on a 16 GB RAM system (which is my case). Hence, the scene offers a set of models with LOD levels 6, 7, 8, and 9, but does not include lucy.ply.
    The octree implementation could be optimized, and it could also be split into different octree types depending on the desired representative and clustering techniques. A simplified clustering sketch is shown below.
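
A simplified sketch of the clustering idea at a fixed octree depth, implemented here as a uniform grid with averaged representatives (the project's actual octree class is more general and also supports QEM and normal-based clustering):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <map>
#include <vector>

struct Vec3 { float x, y, z; };

struct IndexedMesh {
    std::vector<Vec3> vertices;
    std::vector<std::array<unsigned, 3>> triangles;
};

// Simplified sketch: cluster vertices on the uniform grid corresponding to a fixed
// octree depth and use the average of each cluster as its representative.
IndexedMesh simplify(const IndexedMesh &in, const Vec3 &bbMin, const Vec3 &bbMax, int depth) {
    const int cells = 1 << depth;                 // grid resolution at this octree level
    auto cellOf = [&](const Vec3 &v) -> std::uint64_t {
        auto axis = [&](float p, float lo, float hi) {
            int c = (int)((p - lo) / (hi - lo) * cells);
            return std::min(std::max(c, 0), cells - 1);
        };
        std::uint64_t cx = axis(v.x, bbMin.x, bbMax.x);
        std::uint64_t cy = axis(v.y, bbMin.y, bbMax.y);
        std::uint64_t cz = axis(v.z, bbMin.z, bbMax.z);
        return (cx << 40) | (cy << 20) | cz;      // unique key per cell (valid for depth <= 20)
    };

    std::map<std::uint64_t, unsigned> cellToNewIndex;
    std::vector<unsigned> remap(in.vertices.size());
    std::vector<Vec3> sum;
    std::vector<unsigned> count;
    IndexedMesh out;

    for (std::size_t i = 0; i < in.vertices.size(); ++i) {
        std::uint64_t key = cellOf(in.vertices[i]);
        auto it = cellToNewIndex.find(key);
        if (it == cellToNewIndex.end()) {
            it = cellToNewIndex.emplace(key, (unsigned)sum.size()).first;
            sum.push_back({0, 0, 0});
            count.push_back(0);
        }
        remap[i] = it->second;
        sum[it->second].x += in.vertices[i].x;
        sum[it->second].y += in.vertices[i].y;
        sum[it->second].z += in.vertices[i].z;
        ++count[it->second];
    }
    for (std::size_t c = 0; c < sum.size(); ++c)  // representative = average of clustered vertices
        out.vertices.push_back({sum[c].x / count[c], sum[c].y / count[c], sum[c].z / count[c]});

    for (const auto &t : in.triangles) {
        std::array<unsigned, 3> r = {remap[t[0]], remap[t[1]], remap[t[2]]};
        if (r[0] != r[1] && r[1] != r[2] && r[0] != r[2]) // drop triangles that collapse
            out.triangles.push_back(r);
    }
    return out;
}
```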

    • Compute vertex representative using QEM: The sum of the fundamental error quadrics is accumulated on the fly as new vertices are evaluated. It is then used to obtain the point that minimizes the error metric, which becomes the node's representative (see the sketch below).
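
A compact sketch of the quadric accumulation and minimization in the Garland–Heckbert style; the fall-back to an average point when the quadric is near-singular is an assumption, not necessarily what the project does:

```cpp
#include <array>
#include <cmath>

// Illustrative sketch: fundamental error quadrics of planes p = (a, b, c, d),
// with a^2 + b^2 + c^2 = 1, are accumulated into a symmetric 4x4 quadric Q,
// and the representative is the point minimizing v^T Q v.
struct Quadric {
    // Upper-triangular storage of the symmetric 4x4 quadric.
    double a2 = 0, ab = 0, ac = 0, ad = 0;
    double b2 = 0, bc = 0, bd = 0;
    double c2 = 0, cd = 0;
    double d2 = 0;

    // Accumulate the fundamental quadric K_p = p p^T of plane p = (a, b, c, d).
    void addPlane(double a, double b, double c, double d) {
        a2 += a * a; ab += a * b; ac += a * c; ad += a * d;
        b2 += b * b; bc += b * c; bd += b * d;
        c2 += c * c; cd += c * d;
        d2 += d * d;
    }

    void add(const Quadric &q) {
        a2 += q.a2; ab += q.ab; ac += q.ac; ad += q.ad;
        b2 += q.b2; bc += q.bc; bd += q.bd;
        c2 += q.c2; cd += q.cd; d2 += q.d2;
    }

    // Representative = solution of the 3x3 system A x = -[ad, bd, cd]^T (Cramer's rule).
    // Falls back to the provided point (e.g. the cluster average) when near-singular.
    std::array<double, 3> minimizer(const std::array<double, 3> &fallback) const {
        double det = a2 * (b2 * c2 - bc * bc) - ab * (ab * c2 - bc * ac) + ac * (ab * bc - b2 * ac);
        if (std::fabs(det) < 1e-12) return fallback;
        double rx = -ad, ry = -bd, rz = -cd;
        double x = (rx * (b2 * c2 - bc * bc) - ab * (ry * c2 - bc * rz) + ac * (ry * bc - b2 * rz)) / det;
        double y = (a2 * (ry * c2 - rz * bc) - rx * (ab * c2 - bc * ac) + ac * (ab * rz - ry * ac)) / det;
        double z = (a2 * (b2 * rz - ry * bc) - ab * (ab * rz - ry * ac) + rx * (ab * bc - b2 * ac)) / det;
        return {x, y, z};
    }
};
```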
  • Preservation of thin features: In the vertex evaluation phase, instead of accumulating into a single cumulative value or quadric per node, 8 clusters are used. Each cluster covers the set of directions within one octant, and each vertex is assigned to the cluster that contains its normal (see the sketch below).
    When close enough to the entity, some flipped and distorted triangles can be observed; this is a drawback of the method, although a bug in the implementation may also be amplifying the effect.
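
A minimal sketch of the normal-based cluster selection:

```cpp
#include <cstddef>

struct Vec3 { float x, y, z; };

// Illustrative sketch: classify a vertex into one of 8 clusters according to the
// octant that contains its normal, so thin features with opposing normals are
// not merged into a single representative.
std::size_t normalCluster(const Vec3 &normal) {
    std::size_t index = 0;
    if (normal.x >= 0.0f) index |= 1;
    if (normal.y >= 0.0f) index |= 2;
    if (normal.z >= 0.0f) index |= 4;
    return index; // 0..7, one accumulator (value or quadric) per cluster
}
```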

  • Representative and clustering techniques can be combined: Average representative with voxel clustering (simple approach), QEM representative with voxel clustering, average representative with normal-based voxel clustering, and QEM representative with normal-based voxel clustering.

  • Time-critical rendering: Since triangles per second (TPS) is a GPU-dependent metric, a UI slider lets the user choose an appropriate value. Although it could be optimized, the current implementation recomputes the whole value queue from scratch every frame (some constant measures are stored in the mesh structure to avoid repeated computations). A simplified greedy sketch is shown below.
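
A simplified greedy sketch of the idea, assuming the per-frame triangle budget is the TPS divided by a target framerate and using distance to the camera as the value heuristic (the project's actual value function and queue handling may differ):

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Illustrative sketch of time-critical LOD selection: entities start at the
// coarsest LOD and are greedily upgraded while the triangle budget allows it.
struct EntityLOD {
    std::vector<unsigned> triangleCountPerLOD; // index 0 = coarsest LOD
    float distanceToCamera = 1.0f;
    int chosenLOD = 0;
};

void selectLODs(std::vector<EntityLOD> &entities, double tps, double targetFps) {
    double budget = tps / targetFps; // triangles allowed this frame

    // Start everything at the coarsest LOD and pay its cost.
    for (auto &e : entities) {
        e.chosenLOD = 0;
        budget -= e.triangleCountPerLOD[0];
    }

    // Value of an upgrade: closer entities benefit more from extra detail (assumed heuristic).
    auto value = [](const EntityLOD &e) { return 1.0f / (e.distanceToCamera + 1e-3f); };

    using Candidate = std::pair<float, std::size_t>; // (value, entity index)
    std::priority_queue<Candidate> queue;
    for (std::size_t i = 0; i < entities.size(); ++i) queue.push({value(entities[i]), i});

    while (!queue.empty() && budget > 0.0) {
        auto [v, i] = queue.top();
        queue.pop();
        EntityLOD &e = entities[i];
        int next = e.chosenLOD + 1;
        if (next >= (int)e.triangleCountPerLOD.size()) continue; // already at finest LOD
        double extra = (double)e.triangleCountPerLOD[next] - e.triangleCountPerLOD[e.chosenLOD];
        if (extra <= budget) {
            budget -= extra;
            e.chosenLOD = next;
            queue.push({v, i}); // may be upgraded again if budget remains
        }
    }
}
```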

  • Hysteresis transition: A blocked flag is assigned to each entity. When an entity transitions to a different LOD (driven by the time-critical algorithm), it is marked as blocked and the distance from the object to the camera is stored. The entity is not allowed to change its LOD again until the camera has moved closer or farther than a given distance. In the current implementation, this threshold can be defined in two different ways (both tunable by the user; a sketch follows the list):

    1. Relative distance: the threshold is proportional to the distance from the entity to the camera at the moment of blocking.

    2. Absolute distance: the threshold is fixed.
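
A small sketch of the blocking logic with both threshold modes; field and function names are illustrative:

```cpp
#include <cmath>

// Illustrative sketch of the hysteresis blocking logic.
struct HysteresisState {
    bool blocked = false;
    float blockedDistance = 0.0f; // camera distance stored at the moment of blocking
};

// Returns true when the entity is allowed to change its LOD again.
// `relative` selects between the two threshold modes exposed in the UI.
bool mayChangeLOD(const HysteresisState &s, float currentDistance,
                  bool relative, float relativeFactor, float absoluteThreshold) {
    if (!s.blocked) return true;
    float threshold = relative ? s.blockedDistance * relativeFactor : absoluteThreshold;
    return std::fabs(currentDistance - s.blockedDistance) > threshold;
}

// Called when the time-critical algorithm switches the entity's LOD.
void onLODChanged(HysteresisState &s, float currentDistance) {
    s.blocked = true;
    s.blockedDistance = currentDistance;
}
```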

  • Precompute cell-to-cell visibility: The precomputation is done in grid space by casting rays in random directions from each cell (see the sketch after the optimization item below).

  • Visibility precomputation optimization:

    1. Random offset for cell-center sampling: A technique that has provided good accuracy with a reduced number of rays is to use a randomized position within the cell bounds as the ray origin instead of the fixed cell center. Since this point defines where the ray passes through, it lets each ray explore the visibility of the whole cell rather than only the visibility from the fixed cell center (see the sketch below).
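
A simplified sketch combining the ray casting with the random-offset sampling; fixed-step ray marching is used here for brevity, whereas the real implementation may use an exact grid traversal:

```cpp
#include <cmath>
#include <random>
#include <vector>

// Illustrative sketch of cell-to-cell visibility precomputation in grid space.
// Rays are cast in random directions from a randomized point inside each cell
// (the "random offset" optimization); every cell traversed before hitting a wall
// is marked mutually visible with the origin cell.
// `visible` is assumed to be pre-sized to (width*height) x (width*height).
void precomputeVisibility(const std::vector<std::vector<bool>> &isWall,
                          std::vector<std::vector<bool>> &visible,
                          int width, int height, int raysPerCell, std::mt19937 &rng) {
    std::uniform_real_distribution<float> offset(0.0f, 1.0f);
    std::uniform_real_distribution<float> angle(0.0f, 6.2831853f);
    auto index = [width](int x, int y) { return y * width + x; };

    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            if (isWall[y][x]) continue;
            for (int r = 0; r < raysPerCell; ++r) {
                // Random offset inside the cell instead of the fixed cell center.
                float px = x + offset(rng), py = y + offset(rng);
                float a = angle(rng), dx = std::cos(a), dy = std::sin(a);
                while (true) {
                    px += 0.25f * dx; py += 0.25f * dy; // march in small steps
                    int cx = (int)std::floor(px), cy = (int)std::floor(py);
                    if (cx < 0 || cx >= width || cy < 0 || cy >= height || isWall[cy][cx]) break;
                    visible[index(x, y)][index(cx, cy)] = true;
                    visible[index(cx, cy)][index(x, y)] = true;
                }
            }
        }
}
```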