Optimizing an Open World

Understanding the Problem

Sons of Ryke is a massive open-world flight game in which the player can move hundreds of meters per second. This poses a few challenges we need to address. Primarily, we need to render this massive world efficiently, stream it in & out quickly, & shift it to re-center as the player moves. Traditional culling methods like portal & occlusion culling are also largely useless, as the player is viewing the world from above. All of these problems are primarily rendering problems, with the GPU as our bottleneck, so that'll be the focus of this article.

When optimizing for the GPU we’re really trying to accomplish 1 of 3 things:

  1. Reduce the overall number of computations required by reducing tri-counts & fragment shader complexity.

  2. Reduce the amount of data passed into GPU memory so we can avoid shuffling data around or waiting on data transfers.

  3. Parallelize computations as much as possible by reducing draw-calls & SetPass-calls.

 

Level of Detail Meshes

LODs are your friend

For the uninitiated, LODs are Level of Detail meshes. The basic idea is that the further away a mesh is, the smaller it appears on screen & the less detail is required. They're the mesh equivalent of mip-maps. It's not necessary to make LODs for everything, especially if the original mesh is already fairly simple, but for poly-dense items, or items you know you'll be rendering thousands of times, LODs are critical.

In Sons of Ryke all of the cliff & stone kit pieces use up to 4 LODs, & simply put, the game wouldn't run without them. We also use LODs heavily for scatter items like trees, which are rendered in the tens of thousands.
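
If you ever need to set LODs up from script (e.g. for generated content) rather than in the editor, Unity's LODGroup component is what drives the swapping. Below is a minimal sketch, with illustrative renderer fields & transition heights rather than the values we actually use:

    using UnityEngine;

    // Minimal sketch: wire three pre-made renderers into a LODGroup at runtime.
    public class LodSetupExample : MonoBehaviour
    {
        public Renderer lod0; // full-detail mesh
        public Renderer lod1; // reduced mesh
        public Renderer lod2; // lowest-detail mesh

        void Awake()
        {
            var group = gameObject.AddComponent<LODGroup>();
            group.SetLODs(new[]
            {
                // Screen-relative heights: LOD0 while the object covers >60% of the
                // screen height, LOD1 down to 25%, LOD2 down to 5%, culled below that.
                new LOD(0.60f, new[] { lod0 }),
                new LOD(0.25f, new[] { lod1 }),
                new LOD(0.05f, new[] { lod2 }),
            });
            group.RecalculateBounds();
        }
    }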

Hierarchical LODs

Hierarchical LODs serve to reduce draw-calls more than mesh complexity. At its simplest, you take all of the lowest LOD meshes in an area & combine them into a single mesh. Then, when the player is far enough away, you render this single low-detail mesh instead of potentially hundreds of LOD3/4 meshes. You can add additional sub-divisions higher up if you need more granularity, too.
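
To make that concrete, here's a rough sketch of a single HLOD "node" along those lines - baking the lowest LOD children into one proxy mesh with Mesh.CombineMeshes & swapping on distance. The field names, material handling & distance threshold are illustrative, not how our system is actually structured:

    using UnityEngine;

    // Rough sketch of an HLOD node: bake the lowest-detail children into one proxy
    // mesh, then show the proxy instead of the children once the camera is far away.
    public class HlodNodeExample : MonoBehaviour
    {
        public MeshFilter[] lowestLods;        // the LOD3/4 meshes belonging to this node
        public Material sharedAtlasMaterial;   // assumes the pieces share one material
        public float switchDistance = 500f;

        MeshRenderer proxyRenderer;
        Renderer[] lodRenderers;

        void Start()
        {
            // Bake every child's lowest LOD into one mesh, in this node's local space.
            var combine = new CombineInstance[lowestLods.Length];
            for (int i = 0; i < lowestLods.Length; i++)
            {
                combine[i].mesh = lowestLods[i].sharedMesh;
                combine[i].transform = transform.worldToLocalMatrix *
                                       lowestLods[i].transform.localToWorldMatrix;
            }

            var proxyMesh = new Mesh();
            proxyMesh.CombineMeshes(combine);

            var proxy = new GameObject("HLOD Proxy");
            proxy.transform.SetParent(transform, false);
            proxy.AddComponent<MeshFilter>().sharedMesh = proxyMesh;
            proxyRenderer = proxy.AddComponent<MeshRenderer>();
            proxyRenderer.sharedMaterial = sharedAtlasMaterial;

            lodRenderers = new Renderer[lowestLods.Length];
            for (int i = 0; i < lowestLods.Length; i++)
                lodRenderers[i] = lowestLods[i].GetComponent<Renderer>();
        }

        void Update()
        {
            // Far away: draw the single proxy. Close up: draw the individual LOD meshes.
            bool far = Vector3.Distance(Camera.main.transform.position, transform.position) > switchDistance;
            proxyRenderer.enabled = far;
            foreach (var r in lodRenderers)
                r.enabled = !far;
        }
    }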

In Sons of Ryke we're doing this per "node" (pictured right) and for each ship. The same system also doubles as a way to cull trails & VFX. This saves thousands of draw-calls each frame, doubly so in larger battles with nearly a hundred ships flying around.

Separate Shadows

Why use a 20k-poly mesh to render shadows when you already have lower LOD meshes that will serve the purpose? We're already separating out a low-density mesh for collision, so it makes perfect sense to do the same for shadows. Simply disable shadows on all of your LOD meshes & create a duplicate mesh that only renders shadows.
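
As a rough sketch of that setup (the shadowMesh & shadowMaterial fields here are hypothetical stand-ins for your simplified mesh & its material), you can stop the visible renderers from casting shadows & give the duplicate a ShadowsOnly casting mode:

    using UnityEngine;
    using UnityEngine.Rendering;

    // Sketch: visible LODs stop casting shadows; a low-poly proxy casts them instead.
    public class ShadowProxyExample : MonoBehaviour
    {
        public Mesh shadowMesh;         // a simplified mesh, e.g. your lowest LOD
        public Material shadowMaterial;

        void Awake()
        {
            // Visible LODs no longer cast shadows at all.
            foreach (var r in GetComponentsInChildren<MeshRenderer>())
                r.shadowCastingMode = ShadowCastingMode.Off;

            // A duplicate object that only exists in the shadow pass.
            var proxy = new GameObject("Shadow Proxy");
            proxy.transform.SetParent(transform, false);
            proxy.AddComponent<MeshFilter>().sharedMesh = shadowMesh;
            var proxyRenderer = proxy.AddComponent<MeshRenderer>();
            proxyRenderer.sharedMaterial = shadowMaterial;
            proxyRenderer.shadowCastingMode = ShadowCastingMode.ShadowsOnly;
        }
    }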

 
 

Draw-call & SetPass-call Reduction

Texture Atlassing

Atlassing is the act of combining the resources of multiple materials into a single resource, usually in the form of shared texture sheets, allowing a shared material to be used for multiple objects. By using the same material on multiple objects we can then batch those objects together, reducing SetPass-calls. Unity's dynamic batching handles this automatically for very low-poly objects, & the SRP Batcher reduces the per-object setup cost for objects sharing the same shader.

In Sons of Ryke most of the texture detail comes from combining multiple octaves of procedural 3D noise. It's really only the albedo, emissive, metallic & smoothness values that change per "sub-material". I've also added a noise scale parameter to break up the noise between different sub-materials.

I use a custom tool to build texture atlasses from a bunch of sub-materials specified in a Material Set object. This tool produces 2 textures, one for albedo/emission & another packed with everything else. By using a custom tool like this I can rapidly iterate on these properties without having to leave Unity. It also supports multiple “Palettes“, so the same texture atlas & material can be used across multiple biomes/characters/ship types.
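
My tool is specific to this project, but as a simplified sketch of the core idea, Unity's Texture2D.PackTextures can build an albedo atlas & hand back the rects needed to remap each sub-material's UVs. The packed metallic/smoothness/emission texture & palette support are left out here, & the function names are illustrative:

    using UnityEngine;

    // Sketch: pack several sub-material albedo textures into one atlas & remap UVs.
    public static class AtlasBuilderExample
    {
        public static Texture2D BuildAlbedoAtlas(Texture2D[] subMaterialAlbedos, out Rect[] uvRects)
        {
            var atlas = new Texture2D(2048, 2048);
            // Returns one rect per input texture describing where it landed in the atlas.
            uvRects = atlas.PackTextures(subMaterialAlbedos, 2, 2048);
            return atlas;
        }

        // Remap a mesh's UVs into its sub-material's region of the atlas.
        public static void RemapUvs(Mesh mesh, Rect region)
        {
            var uvs = mesh.uv;
            for (int i = 0; i < uvs.Length; i++)
                uvs[i] = new Vector2(region.x + uvs[i].x * region.width,
                                     region.y + uvs[i].y * region.height);
            mesh.uv = uvs;
        }
    }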

Mesh Combining

Once we’re sharing materials between objects we can optimize further by combining those objects into a single mesh. This increases memory requirements but reduces draw-calls. You need to be careful here & test for your use case, as this will not always be an optimization.

If you're working with a bunch of mid to low-poly objects, this is likely to benefit performance. There are a bunch of mesh combining tools online or on the Asset Store to handle this for you.

For Sons of Ryke, I use a custom tool that packs the combined meshes of prefab objects into a separate asset in the same directory as the prefab itself. This keeps these potentially large mesh files out of the scene file itself & makes sharing & cleaning up combined meshes easy & intuitive.
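
That tool is also project-specific, but a stripped-down, editor-only sketch of the same combine-&-save idea might look like this. It assumes every child shares one material & uses an illustrative output path rather than saving beside the prefab:

    #if UNITY_EDITOR
    using UnityEditor;
    using UnityEngine;

    // Sketch: combine the selected object's child meshes & store the result as an
    // asset, so the big combined mesh never lives inside the scene file.
    public static class CombineAndSaveExample
    {
        [MenuItem("Tools/Combine Selected Meshes")]
        static void CombineSelected()
        {
            var root = Selection.activeGameObject;
            if (root == null) return;

            var filters = root.GetComponentsInChildren<MeshFilter>();
            var combine = new CombineInstance[filters.Length];
            for (int i = 0; i < filters.Length; i++)
            {
                combine[i].mesh = filters[i].sharedMesh;
                combine[i].transform = root.transform.worldToLocalMatrix *
                                       filters[i].transform.localToWorldMatrix;
            }

            var combined = new Mesh { name = root.name + "_Combined" };
            // 32-bit indices so large combined meshes aren't capped at ~65k vertices.
            combined.indexFormat = UnityEngine.Rendering.IndexFormat.UInt32;
            combined.CombineMeshes(combine);

            AssetDatabase.CreateAsset(combined, "Assets/" + combined.name + ".asset");
            AssetDatabase.SaveAssets();
        }
    }
    #endif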

 

Static Batching

Static batching is mesh combining on steroids. It works by combining all meshes of a given material within a scene/hierarchy into a single giant mesh. It can then render parts of this giant mesh based on which sub-objects are visible, allowing it to work with LOD meshes. This is a huge boon & something not afforded by traditional mesh combining.

This comes with a few caveats, though:

  1. Memory requirement - even if only 10% of the combined mesh is actually in use, all of it is stored in memory, & this mesh is usually fairly large. Ideally, you want some method of dividing the world. If you're using scenes this is fairly easy; if you're generating the world yourself you'll need another method.

  2. Cannot be moved (Sort of). Once combined, meshes are stored in relation to the root object (Usually the scene) & cannot be moved. So for characters, ships & other moving objects, this is not an option (See Hierarchical LODs instead).

Batching, by default, is attached to the scene object, which cannot be moved. In Sons of Ryke, we need the ability to shift the world over as the player explores it, to avoid floating-point precision issues. We already have the world divided up into "tiles" which are streamed in and out as the player moves through the world. Each tile contains the base tile mesh & a bunch of "nodes". These tiles make the perfect parent for static batching, each acting as a container that can be loaded & discarded as needed.

Unity comes with a handy method for this - StaticBatchingUtility.Combine. This allows us to specify a root object (in our case, our tile) that acts as the root transform. It also allows us to specify an array of objects to combine, so we have complete control over which objects we're combining. Usually, a layer-mask will suffice here.

It's important to note that Unity will not automatically clean up combined meshes created this way. It's critical you call Resources.UnloadUnusedAssets to destroy them once you're done with them. There are even reports of memory leaks in the editor when they're not cleaned up.
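
Putting those pieces together, here's a sketch of how a streamed tile might batch itself on load & clean up on unload. The field names, layer filter & callback names are illustrative, not our actual tile code:

    using System.Collections.Generic;
    using UnityEngine;

    // Sketch: use the tile as the static batch root, so the batch loads, shifts &
    // unloads along with the tile.
    public class TileBatcherExample : MonoBehaviour
    {
        public LayerMask batchableLayers;

        public void OnTileLoaded()
        {
            var toBatch = new List<GameObject>();
            foreach (var r in GetComponentsInChildren<MeshRenderer>())
                if ((batchableLayers.value & (1 << r.gameObject.layer)) != 0)
                    toBatch.Add(r.gameObject);

            // The tile acts as the root transform for the combined geometry.
            StaticBatchingUtility.Combine(toBatch.ToArray(), gameObject);
        }

        public void OnTileUnloaded()
        {
            Destroy(gameObject);
            // Combined meshes aren't released automatically; without this they leak.
            Resources.UnloadUnusedAssets();
        }
    }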

 

GPU Instancing

GPU instancing, when combined with compute buffers/shaders, allows us to efficiently draw tens of thousands of copies of the same object in a single draw-call. This is incredibly useful when drawing things like grass, small stones, or other "scatter" objects. For this to be done efficiently at scale these objects cannot be GameObjects, so they cannot have collision or any other logic tied to them. Instead, you're passing a simple buffer of TRS matrices to the GPU to be rendered.

The kicker here is that all of those objects will be rendered whether they're on-screen or not. Even if they're immediately clipped before rasterization, this is still expensive, so we need to perform any culling or LODing ourselves.
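
As a condensed sketch of the underlying draw path (not the system mentioned below): fill a compute buffer with TRS matrices, point an instanced material at it, & issue a single indirect draw each frame. The buffer & property names are illustrative, & the material is assumed to use a shader that reads _InstanceMatrices by instance ID:

    using UnityEngine;

    // Sketch: one indirect instanced draw-call for thousands of scatter objects.
    public class ScatterRendererExample : MonoBehaviour
    {
        public Mesh mesh;
        public Material material;
        public int instanceCount = 10000;

        ComputeBuffer matrixBuffer;
        ComputeBuffer argsBuffer;
        Bounds bounds = new Bounds(Vector3.zero, Vector3.one * 10000f);

        void Start()
        {
            // One TRS matrix per instance; randomly scattered here just for the example.
            var matrices = new Matrix4x4[instanceCount];
            for (int i = 0; i < instanceCount; i++)
                matrices[i] = Matrix4x4.TRS(Random.insideUnitSphere * 1000f,
                                            Quaternion.identity, Vector3.one);

            matrixBuffer = new ComputeBuffer(instanceCount, sizeof(float) * 16);
            matrixBuffer.SetData(matrices);
            material.SetBuffer("_InstanceMatrices", matrixBuffer);

            // Indirect args: index count, instance count, start index, base vertex, start instance.
            uint[] args = { mesh.GetIndexCount(0), (uint)instanceCount, mesh.GetIndexStart(0), mesh.GetBaseVertex(0), 0 };
            argsBuffer = new ComputeBuffer(1, args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
            argsBuffer.SetData(args);
        }

        void Update()
        {
            Graphics.DrawMeshInstancedIndirect(mesh, 0, material, bounds, argsBuffer);
        }

        void OnDestroy()
        {
            matrixBuffer?.Release();
            argsBuffer?.Release();
        }
    }

In a real system the matrix buffer would be rebuilt or compacted by a compute shader doing the culling & LOD selection discussed above, rather than uploaded once from the CPU.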

In Sons of Ryke, I’ve developed a custom system I named “Expanse“ to handle all of this for me. You can read more about it here.

Next: Meaningful GPU Instanced Rendering