Quick Ref For Technical Artists
- Abby Karnstein
- Nov 21, 2024
- 7 min read
Updated: Nov 21, 2024
A collection of vaguely useful things to know.
A list of shading models from least to most expensive:
Unlit
Default Lit
Preintegrated Skin
Subsurface
Clearcoat
Subsurface Profile
Blend modes are the way the renderer combines an actor's material in the foreground over what has already been drawn in the background.
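As a rough illustration of what "combining foreground over background" means per pixel, here is a minimal sketch of the textbook blend equations. The function name `blend` and the mode strings are made up for this example, and the formulas are the standard ones rather than anything taken from the engine source.

```python
def blend(mode, src, dst, alpha=1.0):
    """Combine a foreground colour (src) with what is already drawn (dst).

    Colours are (r, g, b) tuples. The mode names are illustrative labels,
    not engine identifiers.
    """
    if mode == "opaque":
        # Foreground fully replaces the background.
        return tuple(src)
    if mode == "translucent":
        # Classic alpha blend: weighted mix of foreground and background.
        return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst))
    if mode == "additive":
        # Foreground is added on top, so the result can only get brighter.
        return tuple(s + d for s, d in zip(src, dst))
    if mode == "modulate":
        # Foreground multiplies the background, so the result can only get darker.
        return tuple(s * d for s, d in zip(src, dst))
    raise ValueError(f"unknown blend mode: {mode}")


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(blend("translucent", red, blue, alpha=0.5))  # (0.5, 0.0, 0.5)
```

Opaque is the only case that never reads the background, which is part of why it is the cheapest to draw.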
Instruction count can be useful when we compare directly to other materials, but the Shader Complexity view mode does not use the complexity of the shader nodes themselves, and so can be inaccurate. Nonetheless, it's a great way to quickly identify problem areas in a level. Shader instruction count is more useful for determining the real expense of a material when profilers aren't an option.
Anti-Aliasing Settings:
FXAA is the cheapest method of anti-aliasing: it only works on the rendered image and not on the geometry of the level.
SSAA, or supersampling, renders the scene at a higher resolution and then downsamples via an averaging filter.
MSAA is an optimised form of supersampling: instead of sampling the whole image at a higher rate, it only takes extra samples where the edges of geometry overlap a pixel. It is used primarily on mobile, and is only available if you forgo deferred rendering for forward.
TAA was the default in UE4. It uses custom algorithms alongside a temporal filter, which reuses samples from previous frames to smooth the current one. The quality exceeds that of FXAA, but it isn't as cheap.
TSR is the default in UE5. It is an upscaling system, similar in spirit to enlarging a photograph with little visible loss of quality: the scene is rendered at a lower internal resolution and reconstructed at a higher output resolution. This allows a 1080p render to be viewed at 4K in real time at a much lower cost than rendering natively at 4K.
Anti-aliasing levels above 4 are generally overkill, especially on UE5, but could be useful if you have a lot of thin objects that require heavier anti-aliasing.
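To make the supersampling idea concrete, here is a small sketch of the "render high, then average down" step using a plain box filter. The function name `downsample` and the tiny greyscale image are invented for the example; a real renderer does this on the GPU and may use a different filter.

```python
def downsample(image, factor):
    """Average each factor x factor block of a 2D list of grey values.

    Assumes the image dimensions are exact multiples of `factor`.
    """
    height = len(image) // factor
    width = len(image[0]) // factor
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gather every high-resolution sample covered by this output pixel.
            block = [
                image[y * factor + j][x * factor + i]
                for j in range(factor)
                for i in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


# A hard black/white edge rendered at 4x4, shown at 2x2.
high_res = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(downsample(high_res, 2))  # [[0.0, 0.75], [0.0, 1.0]]
```

The 0.75 in the output is the softened edge pixel: three of its four samples landed on the white side.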
Forward vs. Deferred Rendering
Forward rendering will take each object in a scene, starting with the farthest away, send the vertices and other information to the GPU, compute all the lighting, and move on to the next object. Computation of occluded areas is wasted this way, but it has been the typical rendering method for games up until recent history. Forward rendering works best when the lighting complexity is low, and few lights are being used.
Deferred rendering writes all the information we might want about a pixel's final color into a G-Buffer (the base color, the reflections, occlusion, whatever we might want) without lighting information. Once this buffer has the information for all objects in a scene, a lighting pass is done all at once at the end. Deferred is the default for Unreal as it works much better with complex and otherwise computationally demanding scenes, processing 1000 lights and 1000 objects much more easily than a forward renderer could.
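A deliberately simplified cost model shows why the 1000 lights and 1000 objects case favours deferred. It assumes forward shading evaluates every light for every object, while deferred draws each object once and then runs one lighting step per light over the G-Buffer. Both function names are hypothetical, and the model ignores pixel counts, bandwidth and light culling, so treat the numbers as a scaling argument only.

```python
def forward_light_evaluations(objects, lights):
    # Every object is lit against every light as it is drawn.
    return objects * lights


def deferred_work_items(objects, lights):
    # One G-Buffer write per object, then one lighting step per light.
    return objects + lights


print(forward_light_evaluations(1000, 1000))  # 1000000
print(deferred_work_items(1000, 1000))        # 2000
```

With a handful of lights the two figures are close, which matches the point that forward works best when lighting complexity is low.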
Lumen
Lumen works off of MDFs, or Mesh Distance Fields. These are merged into a Global Distance Field to work more efficiently with real time rendering. These both have viewport view modes. Only the first 2 metres of the ray use the Mesh Distance Field.
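For readers who have not met distance fields before, the sketch below shows the general technique of tracing a ray through one (often called sphere tracing). It is not Lumen's implementation: the sphere distance function, the step limits and all names are assumptions chosen to keep the example short.

```python
import math


def sphere_sdf(point, centre, radius):
    """Signed distance from `point` to the surface of a sphere."""
    return math.dist(point, centre) - radius


def sphere_trace(origin, direction, sdf, max_distance=100.0, epsilon=1e-3, max_steps=128):
    """March along a ray, stepping by the distance the field reports as safe.

    Returns the distance travelled to the hit, or None on a miss.
    `direction` is assumed to be normalised.
    """
    travelled = 0.0
    for _ in range(max_steps):
        point = tuple(o + d * travelled for o, d in zip(origin, direction))
        distance = sdf(point)
        if distance < epsilon:
            return travelled  # close enough to the surface to count as a hit
        travelled += distance  # nothing is nearer than `distance`, so this step is safe
        if travelled > max_distance:
            break
    return None


def scene(point):
    # A single sphere of radius 1, five units down the +Z axis.
    return sphere_sdf(point, (0.0, 0.0, 5.0), 1.0)


print(sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))  # 4.0
print(sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), scene))  # None
```

Because the field stores "distance to the nearest surface", a ray can take large steps through empty space, which is what makes distance fields attractive for real-time tracing.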
A surface cache is also generated offline for each mesh actor, creating simplified cards on geometry to tell the engine where light should bounce. A default of 12 cards is assigned to each mesh.
Lumen uses infinite rays to calculate the most accurate lighting scenario, allowing for offscreen reflections to influence the viewed scene. The standard is software raytracing, but for offline renders, a more robust hardware raytracing option is available. MDFs are packed into a single atlas to allow for this, and stream in and out depending on current distance from the camera and frustum culling.
Thin objects can require a higher resolution scale. Voxel Density can also be increased, but this setting is global and can have a major impact on performance. Similarly, large meshes work poorly with Lumen if Nanite is not enabled on them. Distance Field resolution is based on the imported dimensions of the mesh, and will not account for in-editor scaling.
A large indirect lighting scale is not correctly supported by screen traces past a certain threshold, and will cause view-dependent global illumination. Screen traces are enabled by default and, despite limitations with indirect lighting, typically provide a better lighting result. Lumen by default covers 200m from the camera while using software ray tracing.
Lumen does not support previous generation consoles, but is supported on Android Vulkan, and can be used with the mobile renderer.
Common Bottlenecks
Memory & Material Bottlenecks
Hard references should be reserved for assets that are critical to the game’s function and are always present; they will load the asset in memory when they are called. Conversely, soft references hold the path to the asset, and are only loaded when needed, allowing for dynamic loading and unloading of assets.
The default textures loaded into a material are hard references- they will always be loaded and always exist in memory, even when not used by any objects on screen. For this reason, it is much better practice to use very small, plain textures as your default maps within materials, to refrain from large textures being loaded into memory unnecessarily.
Textures that do not adhere to the power of two (128x128, 512x512, 4096x4096) cannot use mipmapping (texture LODs) and will often occupy much more space in memory than regularised counterparts. A texture at size 1023x1025 will be much more expensive at runtime and create issues that otherwise would not exist with a 1024x1024 texture. These textures can also be subject to the Moiré effect (aliasing).
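A quick way to sanity-check texture sizes before import is sketched below. `is_power_of_two` and `mip_chain` are helper names invented for this example; the mip chain simply halves each dimension until it reaches 1x1, which is the conventional layout.

```python
def is_power_of_two(n):
    # A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    return n > 0 and (n & (n - 1)) == 0


def mip_chain(width, height):
    """List the size of every mip level, halving down to 1x1."""
    sizes = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        sizes.append((width, height))
    return sizes


print(is_power_of_two(1024), is_power_of_two(1023), is_power_of_two(1025))  # True False False

chain = mip_chain(1024, 1024)
print(len(chain))  # 11

# The full chain costs about a third more texels than the top level alone.
texels = sum(w * h for w, h in chain)
print(round(texels / (1024 * 1024), 3))  # 1.333
```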
Default DXT compression is generally fine, but it can cause artefacting and leave textures larger than they need to be. This is because, among other things, DXT1 uses a fixed compression ratio. Greyscale compression is often heavier than RGB or masked compression, and should be reserved for specific use cases.
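The fixed ratio is easy to see with a little arithmetic. DXT1 (BC1) stores each 4x4 block of pixels in 8 bytes and DXT5 (BC3) in 16 bytes, so the compressed size depends only on the dimensions and never on how simple the image is. The helper below is a sketch for the top mip only; the function name is made up for this example.

```python
import math

# Bytes used per 4x4 pixel block by each format.
BLOCK_BYTES = {"DXT1": 8, "DXT5": 16}


def compressed_size(width, height, fmt):
    """Size in bytes of one mip level in a block-compressed format."""
    blocks_x = math.ceil(width / 4)
    blocks_y = math.ceil(height / 4)
    return blocks_x * blocks_y * BLOCK_BYTES[fmt]


uncompressed = 1024 * 1024 * 4                # 8-bit RGBA
print(uncompressed)                           # 4194304
print(compressed_size(1024, 1024, "DXT1"))    # 524288
print(compressed_size(1024, 1024, "DXT5"))    # 1048576
```

A flat grey 1024x1024 map and a highly detailed one therefore occupy the same 512 KB in DXT1, which is why a plain texture can end up both larger and noisier than it would be in a better suited format.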
Permutations
Usage Flag Permutations create additional shaders that Unreal must calculate, oftentimes inside the material graph with use of switches, static component mask parameters and switch parameters. Each change of a switch from true to false or vice versa will create what is essentially a new shader. Greyed out nodes within a material will not be compiled and will not create additional compilation queries, but active output nodes will. Particularly heavy are the ‘Used with (x)’ flags within the material, each of which compiles its own permutations for every switch, setting and feature combination.
Permutations can also be generated by the project’s global settings, as different code must be compiled for different material types. The shader has to compile different material versions for assets with both static lighting and dynamic lighting, if both are toggled, as well as static with fog and dynamic with fog should atmospheric or volumetric fog be enabled. There are new permutations for the blend modes of materials, and for different usages of materials. A static, skeletal and Niagara mesh using the same material will generate extra code to compensate for the different properties of each Actor. More on permutations here.
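To get a feel for how quickly this multiplies, here is a back-of-the-envelope estimate. It assumes every static switch doubles the count and every usage flag and lighting variant multiplies it again. That is an upper bound made up for illustration: the engine only compiles the combinations that are actually needed, and the function and parameter names are hypothetical.

```python
def permutation_upper_bound(static_switches, usage_flags=1, lighting_variants=1):
    """Worst-case number of shader variants for one material."""
    # Each switch can be true or false, so n switches give 2**n combinations.
    switch_combinations = 2 ** static_switches
    return switch_combinations * max(1, usage_flags) * max(1, lighting_variants)


# Three switches, used on static, skeletal and Niagara meshes.
print(permutation_upper_bound(3, usage_flags=3))                       # 24
# The same material with both static and dynamic lighting enabled.
print(permutation_upper_bound(3, usage_flags=3, lighting_variants=2))  # 48
```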
As with most things, UV seams on objects should be as many as necessary and as few as possible. Unreal will create extra geometry along hard edges and seams as part of its own internal computation to correctly render imported assets. This is becoming less and less relevant as rasterization is the cheapest thing to do in the modern graphics pipeline, even more so with the use of Nanite, but it is something to be aware of regardless.
Widget Switcher
Widget Switcher loads and holds in memory all children for the duration of the call; it can only be hard referenced.
Culling
Culling is a very useful tool, but should not be used as a replacement for level streaming. Occlusion and frustum culling are great but they also come at a cost, and we should aim to have as few things in the culling stream as possible to reduce the number of overall occlusion queries we have to do at runtime. Instanced meshes here should therefore be used for small groups of actors in close proximity to one another, as excessive instance merging will render both culling options unusable.
Auto instancing exists as of 4.22 to maximise performance, and can save on draws in the base and depth passes.
For each LOD on a static mesh, there will be a separate draw call on the GPU regardless of merging.
Each material slot that a mesh uses creates a new draw call, plus extra for each pass, meaning there is no performance loss or gain in having a mesh with 10 materials vs 10 meshes with 1 assigned material. If a mesh has the same material in a couple of slots, merge the materials to save a few more milliseconds. This does not apply to Nanite meshes.
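The slot arithmetic can be sketched as follows. The model assumes one draw call per material slot per pass, with a default of two passes standing in for the base and depth passes. Real counts vary with the passes a project enables, and the function names are invented for this example.

```python
def draw_calls(meshes, slots_per_mesh, passes=2):
    """Rough draw-call estimate: one call per material slot, per pass."""
    return meshes * slots_per_mesh * passes


def merge_duplicate_slots(slot_materials):
    """Collapse slots that use the same material, keeping the original order."""
    return list(dict.fromkeys(slot_materials))


# One mesh with ten materials costs the same as ten meshes with one each.
print(draw_calls(1, 10), draw_calls(10, 1))  # 20 20

slots = ["Metal", "Glass", "Metal"]
merged = merge_duplicate_slots(slots)
print(merged)  # ['Metal', 'Glass']
print(draw_calls(1, len(slots)), draw_calls(1, len(merged)))  # 6 4
```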
Stick to the dictated texel density upon engine import. The result is slightly better if you export at a higher resolution from Painter and then decrease the resolution in Photoshop. Unreal's compression is fine, but there are better options than DXT1, as the compression is universal and therefore dependent on the original texture used.
Always utilise texture groups to define and maintain texture properties across multiple maps. This allows for a much more efficient texture memory pool.
Modular creation is generally very good, but not always. Actors should be merged if you have a lot of heavy draw calls and split if your pixel shader is working overtime- there are more things to consider in the case of the pixel shader, but this is one factor.
Remember that draw calls are not equal and we should not measure against how many we have but how much they actually cost. A mesh with 1000 draw calls can be much cheaper than one with 100 if it is significantly smaller on screen. Rendering will take longer the closer the object is to the camera as it's using more pixels. Shader complexity should be gauged on level of detail as well as distance from the player.
Sometimes we can save on draw calls and milliseconds by hiding objects when they are not visible, which can be very useful with things like the trains, as they will disappear for all players at the same point.
Lighting
Each movable object creates two dynamic shadows from a stationary light, a shadow for the static world casting onto the object and one for the object casting on to the world. In these instances, switching to dynamic lighting can save a lot of milliseconds.
Movable lights with Cast Shadow enabled can have poor implications on performance, especially if they have a large radius, as each object within that radius must then be drawn into a shadow pass.
Generally we want to disable cast shadows where possible. Ctrl + L + LMB will create a point light under the cursor with whichever color is currently underneath the cursor, which is very useful for creating fill lights and cheaply aiding Global Illumination.
Resources:
Epic has some performance guidelines outlined here: https://dev.epicgames.com/community/learning/courses/eER/unreal-engine-technical-guide-to-linear-content-creation-production/k8pB/unreal-engine-performance-profiling-and-debugging