Instanced Rendering

Learn how to use GPU instancing to render thousands of objects with minimal draw calls for maximum performance.

GPU instancing is a rendering technique that draws multiple copies of the same mesh in a single draw call. This dramatically reduces CPU overhead and enables rendering of thousands of objects at high frame rates.

Performance Impact

Instancing can reduce draw calls from 1000+ to just 1, which can improve performance by 10-100x in draw-call-bound scenes with many identical objects such as trees, rocks, enemies, and particles.

What is Instancing#

Traditional rendering issues one draw call per object. Each draw call has overhead for state changes, GPU command submission, and driver validation. With 1000 objects, that's 1000 draw calls and a significant CPU bottleneck.

GPU instancing sends the mesh geometry once, along with per-instance data (position, rotation, scale, color) in a single batch. The GPU then draws all instances in one draw call using vertex shaders to position each instance.

Benefits#

  • Massive draw call reduction — 1000 objects = 1 draw call instead of 1000 (1000x reduction)
  • Lower CPU overhead — Less time spent on draw call submission and state changes
  • Better cache utilization — GPU can reuse geometry data across all instances
  • Scalability — Supports 10,000+ instances per mesh with minimal overhead
  • Per-instance customization — Each instance can have unique transform, color, and material index

When to Use Instancing#

Use Case               | Benefit                            | Example
Many identical objects | Maximum draw call reduction        | Trees, rocks, grass, props
Particle systems       | Thousands of particles efficiently | Fire, smoke, rain, sparks
Enemies/NPCs           | Large crowds and armies            | Zombies, soldiers, civilians
Procedural content     | Efficient generation               | Dungeons, cities, terrain details
Building blocks        | Voxel-like worlds                  | Minecraft-style games

GPU Instancing Setup#

STEM provides automatic GPU instancing through the Instanced component. Simply mark entities for instancing and the InstancedRenderSystem handles batching automatically.

Basic Setup#

import { addComponent, Instanced, Transform } from '@stem/core/ecs';
import { AssetRegistry } from '@stem/core/assets';

// Load a mesh asset
const treeAssetId = await AssetRegistry.loadGLTF('/models/tree.glb');

// Create 1000 trees
for (let i = 0; i < 1000; i++) {
  const entity = world.createEntity();

  // Add Transform component
  addComponent(world, Transform, entity);
  Transform.position[entity][0] = Math.random() * 200 - 100;
  Transform.position[entity][1] = 0;
  Transform.position[entity][2] = Math.random() * 200 - 100;
  Transform.scale[entity][0] = 1 + Math.random() * 0.5;
  Transform.scale[entity][1] = 1 + Math.random() * 0.5;
  Transform.scale[entity][2] = 1 + Math.random() * 0.5;

  // Add Instanced component - triggers automatic batching
  addComponent(world, Instanced, entity);
  Instanced.assetId[entity] = treeAssetId;
}
// Result: 1000 trees rendered in 1-2 draw calls instead of 1000!

Spatial Chunk Batching#

STEM uses spatial chunking to group nearby instances into batches. This enables efficient frustum culling and LOD transitions.

// STEM automatically organizes instances into 32x32 meter chunks
// Chunks are culled based on camera frustum
// Each chunk maintains separate batches per asset/material
import { getInstancedChunkTable } from '@stem/core/ecs/systems/InstancedRenderSystem';
const chunkTable = getInstancedChunkTable();
// Get chunk statistics
const stats = chunkTable.getStats();
console.log('Total chunks:', stats.totalChunks);
console.log('Visible chunks:', stats.visibleChunks);
console.log('Total buckets:', stats.totalBuckets);
console.log('Total instances:', stats.totalInstances);
// Each visible chunk renders its instances
// Hidden chunks are completely skipped (zero overhead)

Instance Attributes#

Each instance can have unique attributes sent to the GPU including transform, color, material index, and LOD offset.

Transform (Matrix)#

// Each instance has a 4x4 transform matrix (16 floats = 64 bytes)
// Automatically computed from Transform component
import { Transform } from '@stem/core/ecs';
// Set position
Transform.position[entity][0] = x;
Transform.position[entity][1] = y;
Transform.position[entity][2] = z;
// Set rotation (euler angles in radians)
Transform.rotation[entity][0] = pitch;
Transform.rotation[entity][1] = yaw;
Transform.rotation[entity][2] = roll;
// Or use quaternion directly
Transform.quaternion[entity][0] = qx;
Transform.quaternion[entity][1] = qy;
Transform.quaternion[entity][2] = qz;
Transform.quaternion[entity][3] = qw;
// Set scale
Transform.scale[entity][0] = sx;
Transform.scale[entity][1] = sy;
Transform.scale[entity][2] = sz;
// Matrix is automatically composed and uploaded to GPU
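
To make the "matrix is automatically composed" step concrete, here is a minimal sketch of a standard translation-rotation-scale composition into a column-major 4x4 matrix. It is not taken from STEM's source: `composeMatrix`, the `Vec3`/`Quat` types, and the column-major layout are assumptions made for illustration.

```typescript
type Vec3 = [number, number, number];
type Quat = [number, number, number, number]; // x, y, z, w (unit quaternion)

// Hypothetical helper: build a column-major 4x4 matrix from position,
// rotation (quaternion) and scale. 16 floats = 64 bytes per instance.
function composeMatrix(position: Vec3, quaternion: Quat, scale: Vec3): Float32Array {
  const [x, y, z, w] = quaternion;
  const [sx, sy, sz] = scale;

  const x2 = x + x, y2 = y + y, z2 = z + z;
  const xx = x * x2, xy = x * y2, xz = x * z2;
  const yy = y * y2, yz = y * z2, zz = z * z2;
  const wx = w * x2, wy = w * y2, wz = w * z2;

  const m = new Float32Array(16);
  // Column 0: rotated X axis, scaled by sx
  m[0] = (1 - (yy + zz)) * sx;
  m[1] = (xy + wz) * sx;
  m[2] = (xz - wy) * sx;
  // Column 1: rotated Y axis, scaled by sy
  m[4] = (xy - wz) * sy;
  m[5] = (1 - (xx + zz)) * sy;
  m[6] = (yz + wx) * sy;
  // Column 2: rotated Z axis, scaled by sz
  m[8] = (xz + wy) * sz;
  m[9] = (yz - wx) * sz;
  m[10] = (1 - (xx + yy)) * sz;
  // Column 3: translation
  m[12] = position[0];
  m[13] = position[1];
  m[14] = position[2];
  m[15] = 1;
  return m;
}
```

With an identity quaternion `[0, 0, 0, 1]` the result is a pure scale on the diagonal plus the translation in the last column.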

Color Attribute#

import { addComponent, MeshColor } from '@stem/core/ecs';
// Add color to instance
addComponent(world, MeshColor, entity);
MeshColor.r[entity] = 1.0; // Red
MeshColor.g[entity] = 0.5; // Green
MeshColor.b[entity] = 0.2; // Blue
// Color is multiplied with base material color in shader
// Useful for team colors, damage tinting, variety

Material Index#

// Material index allows per-instance material variants
// Useful for texture atlases or material arrays
import { Instanced } from '@stem/core/ecs';
// Each instance can reference a different material variant
// (stored in the asset's material array)
const materialIndex = 2; // Use material variant #2
// Material index is automatically managed by the system
// based on asset metadata and texture atlases

LOD Offset#

// LOD offset (0.0 - 1.0) for smooth LOD transitions
// Automatically managed by InstancedLODSystem
// Offset interpolates between LOD levels in shader:
// - 0.0 = Current LOD level
// - 1.0 = Next LOD level
// - 0.5 = 50% blend between LOD levels
// Example shader usage:
// vec3 position = mix(lodPosition0, lodPosition1, instanceLodOffset);
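
As an illustration of how such an offset could be derived, the sketch below computes a 0.0-1.0 blend factor from camera distance. `InstancedLODSystem`'s real logic is not shown in this page, so `computeLodOffset`, `switchDistance`, and `blendRange` are hypothetical names, and the linear ramp is an assumption.

```typescript
// Hypothetical helper: blend factor toward the next LOD level.
// Returns 0 before the blend band, 1 at or beyond switchDistance,
// and a linear ramp in between.
function computeLodOffset(distance: number, switchDistance: number, blendRange: number): number {
  if (blendRange <= 0) {
    // No blend band: hard switch at the threshold
    return distance >= switchDistance ? 1 : 0;
  }
  const t = (distance - (switchDistance - blendRange)) / blendRange;
  return Math.min(1, Math.max(0, t));
}

// Scalar equivalent of GLSL mix(a, b, t)
function mixScalar(a: number, b: number, t: number): number {
  return a * (1 - t) + b * t;
}
```

For example, with a switch distance of 20 m and a 4 m blend band, an instance 18 m away gets an offset of 0.5, matching the "50% blend" case above.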

Dynamic Instancing#

Instances can be added, removed, and modified at runtime with efficient GPU buffer updates.

Adding Instances#

import { addComponent, Instanced, Transform } from '@stem/core/ecs';

function spawnTree(x: number, z: number) {
  const entity = world.createEntity();
  addComponent(world, Transform, entity);
  Transform.position[entity][0] = x;
  Transform.position[entity][1] = 0;
  Transform.position[entity][2] = z;
  addComponent(world, Instanced, entity);
  Instanced.assetId[entity] = treeAssetId;
  // Instance is automatically added to appropriate chunk/bucket
  // GPU buffer is grown if needed (power-of-2 growth)
  return entity;
}

// Spawn 100 trees per frame efficiently
for (let i = 0; i < 100; i++) {
  spawnTree(Math.random() * 200, Math.random() * 200);
}
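
The "power-of-2 growth" mentioned in the comments can be sketched as a CPU-side buffer that doubles its capacity when full. This is a generic illustration, not STEM's buffer pool: `GrowableMatrixBuffer` and `nextPowerOfTwo` are hypothetical names.

```typescript
const FLOATS_PER_MATRIX = 16;

// Smallest power of two that is >= n (minimum 1).
function nextPowerOfTwo(n: number): number {
  let capacity = 1;
  while (capacity < n) capacity *= 2;
  return capacity;
}

// Hypothetical CPU-side mirror of an instance matrix buffer.
class GrowableMatrixBuffer {
  private data: Float32Array;
  private used = 0;

  constructor(initialCapacity = 16) {
    this.data = new Float32Array(nextPowerOfTwo(initialCapacity) * FLOATS_PER_MATRIX);
  }

  get capacity(): number {
    return this.data.length / FLOATS_PER_MATRIX;
  }

  get count(): number {
    return this.used;
  }

  // Append one 4x4 matrix and return its slot index.
  push(matrix: ArrayLike<number>): number {
    if (this.used === this.capacity) {
      // Double the capacity and copy existing matrices across
      const grown = new Float32Array(this.capacity * 2 * FLOATS_PER_MATRIX);
      grown.set(this.data);
      this.data = grown;
    }
    this.data.set(matrix, this.used * FLOATS_PER_MATRIX);
    return this.used++;
  }
}
```

Doubling keeps the number of reallocations logarithmic in the final instance count, which is why spawning in bursts stays cheap on average.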

Removing Instances#

import { removeEntity } from '@stem/core/ecs';

function despawnTree(entity: number) {
  removeEntity(world, entity);
  // Instance is automatically removed from chunk/bucket
  // GPU buffer is compacted (swap-remove for O(1) deletion)
  // No memory leaks or fragmentation
}
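
The swap-remove idea referenced above can be sketched as follows: the last instance is moved into the freed slot so the buffer stays densely packed. `PackedInstances` and its members are hypothetical names for illustration, and the fixed capacity is a simplifying assumption.

```typescript
const FLOATS_PER_MATRIX = 16;

// Hypothetical densely packed instance store with O(1) removal.
class PackedInstances {
  readonly matrices: Float32Array;
  readonly entityAtSlot: number[] = [];
  private readonly slotOfEntity = new Map<number, number>();

  constructor(capacity: number) {
    this.matrices = new Float32Array(capacity * FLOATS_PER_MATRIX);
  }

  get count(): number {
    return this.entityAtSlot.length;
  }

  slotOf(entity: number): number | undefined {
    return this.slotOfEntity.get(entity);
  }

  add(entity: number, matrix: ArrayLike<number>): number {
    const slot = this.entityAtSlot.length;
    if ((slot + 1) * FLOATS_PER_MATRIX > this.matrices.length) {
      throw new RangeError('instance buffer is full');
    }
    this.matrices.set(matrix, slot * FLOATS_PER_MATRIX);
    this.entityAtSlot.push(entity);
    this.slotOfEntity.set(entity, slot);
    return slot;
  }

  remove(entity: number): boolean {
    const slot = this.slotOfEntity.get(entity);
    if (slot === undefined) return false;

    const lastSlot = this.entityAtSlot.length - 1;
    if (slot !== lastSlot) {
      // Move the last instance into the hole left by the removed one
      const lastEntity = this.entityAtSlot[lastSlot];
      this.matrices.copyWithin(
        slot * FLOATS_PER_MATRIX,
        lastSlot * FLOATS_PER_MATRIX,
        (lastSlot + 1) * FLOATS_PER_MATRIX,
      );
      this.entityAtSlot[slot] = lastEntity;
      this.slotOfEntity.set(lastEntity, slot);
    }
    this.entityAtSlot.pop();
    this.slotOfEntity.delete(entity);
    return true;
  }
}
```

Note that swap-remove changes the order of instances, so anything that caches a slot index has to look it up again through the entity-to-slot map.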

Updating Instances#

import { addComponent, Transform, TransformDirty } from '@stem/core/ecs';
// Note: createTypedQuery, Instanced, IWorld, and the TreeWind component are
// assumed to be imported or defined elsewhere in your project.

function moveTree(entity: number, x: number, z: number) {
  // Update position
  Transform.position[entity][0] = x;
  Transform.position[entity][2] = z;
  // Mark as dirty to trigger GPU upload
  addComponent(world, TransformDirty, entity);
  // If moved to different chunk, automatically relocated
  // GPU matrix is updated in next frame
}

// Efficient update system
export const TreeWindSystem = (world: IWorld) => {
  const query = createTypedQuery([Transform, Instanced, TreeWind]);
  const entities = query(world);
  const time = performance.now() / 1000;
  for (let i = 0; i < entities.length; i++) {
    const eid = entities[i];
    // Animate tree swaying
    const offset = TreeWind.offset[eid];
    Transform.rotation[eid][2] = Math.sin(time + offset) * 0.05;
    // Mark dirty for GPU upload
    addComponent(world, TransformDirty, eid);
  }
  return world;
};
// System efficiently updates 1000s of instances per frame

Batching Strategies#

STEM uses multiple batching strategies to maximize performance.

Spatial Batching#

Instances are grouped into 32x32 meter chunks based on XZ position. This enables efficient frustum culling and reduces state changes.

// Chunk key computed from position
const CHUNK_SIZE = 32; // meters
const chunkX = Math.floor(x / CHUNK_SIZE);
const chunkZ = Math.floor(z / CHUNK_SIZE);
const chunkKey = (chunkX << 16) | (chunkZ & 0xFFFF);
// All instances in same chunk are rendered together
// Camera frustum culls entire chunks at once
// Result: Only visible chunks are processed
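
The key packing above can be wrapped in a pair of small functions to check that it behaves as expected for negative coordinates. `chunkKeyFor` and `decodeChunkKey` are hypothetical names. The decode step is an addition for illustration and assumes chunk indices fit in a signed 16-bit range (-32768 to 32767).

```typescript
const CHUNK_SIZE = 32; // meters

// Pack XZ chunk coordinates into one 32-bit integer key.
function chunkKeyFor(x: number, z: number): number {
  const chunkX = Math.floor(x / CHUNK_SIZE);
  const chunkZ = Math.floor(z / CHUNK_SIZE);
  return (chunkX << 16) | (chunkZ & 0xffff);
}

// Recover signed chunk coordinates from a key (hypothetical helper).
function decodeChunkKey(key: number): { chunkX: number; chunkZ: number } {
  return {
    chunkX: key >> 16,          // arithmetic shift keeps the sign of chunkX
    chunkZ: (key << 16) >> 16,  // sign-extend the low 16 bits
  };
}
```

For example, a position of (100, -40) falls in chunk (3, -2). Two positions inside the same 32 m cell always produce the same key, which is what allows a whole chunk to be culled or drawn as a unit.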

Material Batching#

Within each chunk, instances are further grouped by material signature (geometry + material + texture atlas).

// Render signature for batching
interface RenderSignature {
renderKey: number; // Hash of geometry + material + atlas
assetId: number; // Asset ID
materialIndex: number; // Material variant
atlasId: number; // Texture atlas ID
bindlessId: number; // Bindless texture ID
}
// Instances with same signature are batched
// Different materials = different draw calls
// Same material + different atlas = different draw calls
// Example:
// - 500 oak trees = 1 draw call
// - 300 pine trees = 1 draw call
// - 200 rocks = 1 draw call
// Total: 3 draw calls for 1000 instances
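
The grouping described above can be sketched as a simple bucket map, one bucket per chunk and signature. It uses a string key as a stand-in for the numeric `renderKey` hash. `InstanceRecord`, `batchKeyOf`, and `groupIntoBatches` are hypothetical names, not STEM APIs.

```typescript
interface InstanceRecord {
  entity: number;
  chunkKey: number;
  assetId: number;
  materialIndex: number;
  atlasId: number;
}

// Instances that share this key can be drawn together.
function batchKeyOf(instance: InstanceRecord): string {
  return `${instance.chunkKey}/${instance.assetId}/${instance.materialIndex}/${instance.atlasId}`;
}

// Group entities into batches; each map entry corresponds to one draw call.
function groupIntoBatches(instances: readonly InstanceRecord[]): Map<string, number[]> {
  const batches = new Map<string, number[]>();
  for (const instance of instances) {
    const key = batchKeyOf(instance);
    let entities = batches.get(key);
    if (entities === undefined) {
      entities = [];
      batches.set(key, entities);
    }
    entities.push(instance.entity);
  }
  return batches;
}
```

With 500 oaks, 300 pines, and 200 rocks in a single chunk this yields three batches. Spreading the same instances over several chunks increases the batch count, which is the trade-off for per-chunk culling.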

Buffer Batching#

GPU buffer updates are batched to minimize state changes and uploads.

// Instead of flagging the buffer after every single change:
// BAD: every change marks the entire instance buffer for upload
for (let i = 0; i < instances.length; i++) {
  mesh.setMatrixAt(i, matrix);
  mesh.instanceMatrix.needsUpdate = true; // Flags the whole buffer
}

// STEM batches updates efficiently:
// GOOD: 1 GPU upload limited to the dirty range
const dirty = bucket.getDirtyRange(); // e.g., instances 50 to 150
updateMatrixRange(bucket, dirty.start, dirty.end);
mesh.instanceMatrix.addUpdateRange(dirty.start * 16, dirty.count * 16);
mesh.instanceMatrix.needsUpdate = true; // Single upload
// Benefit: Only dirty instances uploaded (50-90% reduction)
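
A dirty range like the one returned by `bucket.getDirtyRange()` can be tracked with two indices. The sketch below is an assumption about how such tracking might work, not STEM's implementation: `DirtyRangeTracker`, `DirtySpan`, and `toFloatRange` are hypothetical names, and `end` is treated as inclusive.

```typescript
const FLOATS_PER_MATRIX = 16;

interface DirtySpan {
  start: number; // first dirty instance index
  end: number;   // last dirty instance index (inclusive)
  count: number; // number of instances covered by the span
}

class DirtyRangeTracker {
  private first = Infinity;
  private last = -Infinity;

  // Record that the instance at `index` changed this frame.
  mark(index: number): void {
    if (index < this.first) this.first = index;
    if (index > this.last) this.last = index;
  }

  // Return the merged span and reset, or null when nothing changed.
  flush(): DirtySpan | null {
    if (this.last < this.first) return null;
    const span: DirtySpan = {
      start: this.first,
      end: this.last,
      count: this.last - this.first + 1,
    };
    this.first = Infinity;
    this.last = -Infinity;
    return span;
  }
}

// Convert an instance span into float offsets for a 4x4 matrix buffer.
function toFloatRange(span: DirtySpan): { offset: number; length: number } {
  return {
    offset: span.start * FLOATS_PER_MATRIX,
    length: span.count * FLOATS_PER_MATRIX,
  };
}
```

A single merged span also re-uploads any clean instances that sit between two dirty ones. That is usually cheaper than issuing many small uploads, but it means widely scattered changes benefit less.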

When to Use Instancing#

Instancing is ideal for rendering many copies of the same mesh. Here are guidelines for when to use it.

Good Candidates#

Object Type | Instance Count | Benefit
Trees       | 100-10,000     | Massive forests with minimal overhead
Rocks       | 500-5,000      | Natural terrain scatter
Grass       | 1,000-100,000  | Dense grass fields
Particles   | 1,000-50,000   | Fire, smoke, rain effects
Enemies     | 50-1,000       | Large crowds and armies
Buildings   | 100-1,000      | Cities with repeated structures
Projectiles | 100-5,000      | Bullets, arrows, magic missiles

Poor Candidates#

  • Unique objects — Hero characters, bosses, unique props (< 5 copies each)
  • Skinned meshes — Animated characters (use skeletal animation batching instead)
  • Dynamic geometry — Meshes that change shape frequently (use custom shaders)
  • Low instance count — < 10 instances (overhead not worth it, use regular rendering)

Instancing Checklist#

When to Instance

Use instancing when:
• Same geometry used 10+ times
• Objects share the same material
• Per-instance data is simple (transform, color)
• Objects don't need unique animations
• Draw calls are a bottleneck (> 200 calls)

Performance Tips#

Minimize Materials#

// BAD: Different materials = separate batches
const oakMaterial = new MeshStandardMaterial({ map: oakTexture });
const pineMaterial = new MeshStandardMaterial({ map: pineTexture });
// GOOD: Shared material with texture atlas
const treeMaterial = new MeshStandardMaterial({ map: treeAtlas });
// Use UV offsets to select texture region
// or material index to select atlas region
// Result: All trees in one batch
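
One way to "use UV offsets to select a texture region" is to map a tile index to an offset and scale within a uniform grid atlas. The sketch below assumes such a grid layout with row-major tile numbering. `atlasTileRect` and `remapUv` are hypothetical helpers, and which row counts as the first depends on your texture orientation.

```typescript
interface UvRect {
  offsetU: number;
  offsetV: number;
  scaleU: number;
  scaleV: number;
}

// Hypothetical helper: UV region of tile `tileIndex` in a columns x rows grid atlas.
function atlasTileRect(tileIndex: number, columns: number, rows: number): UvRect {
  if (!Number.isInteger(tileIndex) || tileIndex < 0 || tileIndex >= columns * rows) {
    throw new RangeError(`tileIndex ${tileIndex} is outside a ${columns}x${rows} atlas`);
  }
  const column = tileIndex % columns;
  const row = Math.floor(tileIndex / columns);
  return {
    offsetU: column / columns,
    offsetV: row / rows,
    scaleU: 1 / columns,
    scaleV: 1 / rows,
  };
}

// Map a mesh UV in [0, 1] into the tile's region of the atlas.
function remapUv(u: number, v: number, rect: UvRect): [number, number] {
  return [rect.offsetU + u * rect.scaleU, rect.offsetV + v * rect.scaleV];
}
```

In practice the offset and scale would be passed as per-instance attributes and applied in the vertex shader, so oak and pine instances can share one material and one batch.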

Optimize Geometry#

// Keep instance geometry simple
// Target: 100-500 triangles per instance
// Maximum: 2,000 triangles per instance
// Use LOD for distant instances
// - LOD 0 (< 20m): 500 tris
// - LOD 1 (20-50m): 200 tris
// - LOD 2 (50-100m): 50 tris
// - LOD 3 (> 100m): Culled or billboard
// Good geometry for instancing:
// - Rocks: 50-200 tris
// - Trees: 200-1000 tris
// - Buildings: 500-2000 tris
// - Grass: 2-10 tris (billboard)
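
The distance bands listed above translate directly into a lookup function. This is a minimal sketch using the 20 m, 50 m, and 100 m thresholds from the comments. `selectLodLevel` and `LOD_DISTANCES` are hypothetical names, and treating a distance exactly on a threshold as the next level is an assumption.

```typescript
// Upper distance bounds (meters) for LOD 0, 1 and 2; beyond the last is LOD 3.
const LOD_DISTANCES: readonly number[] = [20, 50, 100];

// Return the LOD level for a camera distance.
// The highest level (thresholds.length) means "culled or billboard".
function selectLodLevel(distance: number, thresholds: readonly number[] = LOD_DISTANCES): number {
  for (let level = 0; level < thresholds.length; level++) {
    if (distance < thresholds[level]) return level;
  }
  return thresholds.length;
}
```

For example, an instance 75 m from the camera resolves to LOD 2 (about 50 triangles in the budget above), while one at 150 m resolves to LOD 3.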

Buffer Management#

import { getInstancedRenderMetrics } from '@stem/core/ecs/systems/InstancedRenderSystem';
// Monitor buffer pool stats
const metrics = getInstancedRenderMetrics();
console.log('Matrix updates:', metrics.matrixUpdateCount);
console.log('Buffer acquisitions:', metrics.bufferAcquisitionCount);
console.log('WASM path usage:', metrics.wasmPathUsage);
console.log('JS path usage:', metrics.jsPathUsage);
// Tips:
// - Batch additions/removals when possible
// - Avoid frequent chunk boundary crossings
// - Pre-allocate instances at scene load
// - Use dirty flags to minimize updates

Memory Usage#

Each instance uses 80 bytes of GPU memory:

  • Matrix — 64 bytes (16 floats)
  • Material index — 4 bytes (1 uint32)
  • LOD offset — 4 bytes (1 float)
  • Flags — 4 bytes (1 uint32)
  • Entity ID — 4 bytes (1 uint32)

// Memory calculation
const instanceCount = 10000;
const bytesPerInstance = 80;
const totalMemory = instanceCount * bytesPerInstance;
console.log('Memory for 10K instances:', (totalMemory / 1024 / 1024).toFixed(2), 'MB');
// Output: Memory for 10K instances: 0.76 MB
// Very memory efficient!
// 100K instances = 7.6 MB
// 1M instances = 76 MB

Instancing Best Practices

• Use same geometry and material for batching
• Keep per-instance data minimal (< 100 bytes)
• Target 100-1000 instances per batch
• Use spatial chunking for frustum culling
• Combine with LOD for distant instances
• Monitor draw calls (target < 100)