Instanced Rendering

Learn how to use GPU instancing to render thousands of objects with minimal draw calls for maximum performance.

GPU instancing is a rendering technique that draws multiple copies of the same mesh in a single draw call. This dramatically reduces CPU overhead and enables rendering of thousands of objects at high frame rates.

Performance Impact

Instancing can reduce draw calls from 1000+ to as few as 1. In draw-call-bound scenes with many identical objects (trees, rocks, enemies, particles), this can improve performance by 10-100x.

What is Instancing#

Traditional rendering issues one draw call per object. Each draw call has overhead for state changes, GPU command submission, and driver validation. With 1000 objects, that's 1000 draw calls and a significant CPU bottleneck.

GPU instancing sends the mesh geometry once, along with per-instance data (position, rotation, scale, color) in a single batch. The GPU then draws all instances in one draw call using vertex shaders to position each instance.

Benefits#

Massive draw call reduction — 1000 objects = 1 draw call instead of 1000 (1000x reduction)
Lower CPU overhead — Less time spent on draw call submission and state changes
Better cache utilization — GPU can reuse geometry data across all instances
Scalability — Supports 10,000+ instances per mesh with minimal overhead
Per-instance customization — Each instance can have unique transform, color, and material index

When to Use Instancing#

Use Case	Benefit	Example
Many identical objects	Maximum draw call reduction	Trees, rocks, grass, props
Particle systems	Thousands of particles efficiently	Fire, smoke, rain, sparks
Enemies/NPCs	Large crowds and armies	Zombies, soldiers, civilians
Procedural content	Efficient generation	Dungeons, cities, terrain details
Building blocks	Voxel-like worlds	Minecraft-style games

GPU Instancing Setup#

STEM provides automatic GPU instancing through the Instanced component. Simply mark entities for instancing and the InstancedRenderSystem handles batching automatically.

Basic Setup#

import { addComponent, Instanced, Transform } from '@stem/core/ecs';
import { AssetRegistry } from '@stem/core/assets';
 
// Load a mesh asset
const treeAssetId = await AssetRegistry.loadGLTF('/models/tree.glb');
 
// Create 1000 trees
for (let i = 0; i < 1000; i++) {
  const entity = world.createEntity();
 
  // Add Transform component
  addComponent(world, Transform, entity);
  Transform.position[entity][0] = Math.random() * 200 - 100;
  Transform.position[entity][1] = 0;
  Transform.position[entity][2] = Math.random() * 200 - 100;
  Transform.scale[entity][0] = 1 + Math.random() * 0.5;
  Transform.scale[entity][1] = 1 + Math.random() * 0.5;
  Transform.scale[entity][2] = 1 + Math.random() * 0.5;
 
  // Add Instanced component - triggers automatic batching
  addComponent(world, Instanced, entity);
  Instanced.assetId[entity] = treeAssetId;
}
 
// Result: 1000 trees rendered in far fewer draw calls
// (one per visible chunk and asset/material batch) instead of 1000

Spatial Chunk Batching#

STEM uses spatial chunking to group nearby instances into batches. This enables efficient frustum culling and LOD transitions.

// STEM automatically organizes instances into 32x32 meter chunks
// Chunks are culled based on camera frustum
// Each chunk maintains separate batches per asset/material
 
import { getInstancedChunkTable } from '@stem/core/ecs/systems/InstancedRenderSystem';
 
const chunkTable = getInstancedChunkTable();
 
// Get chunk statistics
const stats = chunkTable.getStats();
console.log('Total chunks:', stats.totalChunks);
console.log('Visible chunks:', stats.visibleChunks);
console.log('Total buckets:', stats.totalBuckets);
console.log('Total instances:', stats.totalInstances);
 
// Each visible chunk renders its instances
// Chunks outside the frustum are skipped entirely (no draw calls are issued for them)

Instance Attributes#

Each instance can have unique attributes sent to the GPU including transform, color, material index, and LOD offset.

Transform (Matrix)#

// Each instance has a 4x4 transform matrix (16 floats = 64 bytes)
// Automatically computed from Transform component
 
import { Transform } from '@stem/core/ecs';
 
// Set position
Transform.position[entity][0] = x;
Transform.position[entity][1] = y;
Transform.position[entity][2] = z;
 
// Set rotation (euler angles in radians)
Transform.rotation[entity][0] = pitch;
Transform.rotation[entity][1] = yaw;
Transform.rotation[entity][2] = roll;
 
// Or use quaternion directly
Transform.quaternion[entity][0] = qx;
Transform.quaternion[entity][1] = qy;
Transform.quaternion[entity][2] = qz;
Transform.quaternion[entity][3] = qw;
 
// Set scale
Transform.scale[entity][0] = sx;
Transform.scale[entity][1] = sy;
Transform.scale[entity][2] = sz;
 
// Matrix is automatically composed and uploaded to GPU
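
The composition step mentioned above can be sketched in plain TypeScript. This is a minimal illustration of the standard translation-rotation-scale composition into a column-major 4x4 matrix, not STEM's actual implementation; the `composeMatrix` name and the tuple types are assumptions made for this example.

```typescript
type Vec3 = [number, number, number];
type Quat = [number, number, number, number]; // x, y, z, w (unit quaternion)

// Hypothetical helper: composes a column-major 4x4 matrix (16 floats)
// from a position, a unit quaternion, and a per-axis scale.
function composeMatrix(
  position: Vec3,
  quaternion: Quat,
  scale: Vec3,
  out: Float32Array = new Float32Array(16),
): Float32Array {
  const [x, y, z, w] = quaternion;
  const x2 = x + x, y2 = y + y, z2 = z + z;
  const xx = x * x2, xy = x * y2, xz = x * z2;
  const yy = y * y2, yz = y * z2, zz = z * z2;
  const wx = w * x2, wy = w * y2, wz = w * z2;
  const [sx, sy, sz] = scale;

  // Column 0: rotated and scaled X axis
  out[0] = (1 - (yy + zz)) * sx;
  out[1] = (xy + wz) * sx;
  out[2] = (xz - wy) * sx;
  out[3] = 0;
  // Column 1: rotated and scaled Y axis
  out[4] = (xy - wz) * sy;
  out[5] = (1 - (xx + zz)) * sy;
  out[6] = (yz + wx) * sy;
  out[7] = 0;
  // Column 2: rotated and scaled Z axis
  out[8] = (xz + wy) * sz;
  out[9] = (yz - wx) * sz;
  out[10] = (1 - (xx + yy)) * sz;
  out[11] = 0;
  // Column 3: translation
  out[12] = position[0];
  out[13] = position[1];
  out[14] = position[2];
  out[15] = 1;
  return out;
}

// Usage: a tree at (10, 0, -5), unrotated, scaled 1.5x on every axis
const treeMatrix = composeMatrix([10, 0, -5], [0, 0, 0, 1], [1.5, 1.5, 1.5]);
```

Passing a reusable `out` array avoids allocating a new matrix per instance per frame, which matters when thousands of instances are updated.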

Color Attribute#

import { addComponent, MeshColor } from '@stem/core/ecs';
 
// Add color to instance
addComponent(world, MeshColor, entity);
MeshColor.r[entity] = 1.0; // Red
MeshColor.g[entity] = 0.5; // Green
MeshColor.b[entity] = 0.2; // Blue
 
// Color is multiplied with base material color in shader
// Useful for team colors, damage tinting, variety

Material Index#

// Material index allows per-instance material variants
// Useful for texture atlases or material arrays
 
// Each instance can reference a different material variant
// (stored in the asset's material array)
const materialIndex = 2; // Illustrative value: material variant #2
 
// Material index is automatically managed by the system
// based on asset metadata and texture atlases

LOD Offset#

// LOD offset (0.0 - 1.0) for smooth LOD transitions
// Automatically managed by InstancedLODSystem
 
// Offset interpolates between LOD levels in shader:
// - 0.0 = Current LOD level
// - 1.0 = Next LOD level
// - 0.5 = 50% blend between LOD levels
 
// Example shader usage:
// vec3 position = mix(lodPosition0, lodPosition1, instanceLodOffset);
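
The blend described above can be mirrored on the CPU for illustration. The sketch below assumes a simple linear transition band between two LOD levels; how InstancedLODSystem actually derives the offset is not documented here, so `lodOffset` and its parameters are hypothetical.

```typescript
type Vec3 = [number, number, number];

// Per-component linear interpolation, equivalent to GLSL mix(a, b, t).
function mix(a: Vec3, b: Vec3, t: number): Vec3 {
  return [
    a[0] + (b[0] - a[0]) * t,
    a[1] + (b[1] - a[1]) * t,
    a[2] + (b[2] - a[2]) * t,
  ];
}

// Hypothetical helper: maps camera distance to a 0..1 blend offset
// across a transition band [bandStart, bandEnd] in meters.
function lodOffset(distance: number, bandStart: number, bandEnd: number): number {
  const t = (distance - bandStart) / (bandEnd - bandStart);
  return Math.min(1, Math.max(0, t)); // clamp to [0, 1]
}

// Usage: halfway through a 40-50 m transition band
const offset = lodOffset(45, 40, 50);
const blended = mix([0, 0, 0], [2, 4, 6], offset);
```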

Dynamic Instancing#

Instances can be added, removed, and modified at runtime with efficient GPU buffer updates.

Adding Instances#

import { addComponent, Instanced, Transform } from '@stem/core/ecs';
 
function spawnTree(x: number, z: number) {
  const entity = world.createEntity();
 
  addComponent(world, Transform, entity);
  Transform.position[entity][0] = x;
  Transform.position[entity][1] = 0;
  Transform.position[entity][2] = z;
 
  addComponent(world, Instanced, entity);
  Instanced.assetId[entity] = treeAssetId;
 
  // Instance is automatically added to appropriate chunk/bucket
  // GPU buffer is grown if needed (power-of-2 growth)
 
  return entity;
}
 
// Spawn a batch of 100 trees (for example, once per frame)
for (let i = 0; i < 100; i++) {
  spawnTree(Math.random() * 200, Math.random() * 200);
}
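
The power-of-2 growth mentioned in the comments above can be sketched with a CPU-side buffer. This is an illustration of the general technique under stated assumptions, not STEM's buffer pool; `InstanceMatrixBuffer` and `nextPowerOfTwo` are hypothetical names.

```typescript
const FLOATS_PER_MATRIX = 16;

// Smallest power of two that is >= n (returns 1 for n <= 1).
function nextPowerOfTwo(n: number): number {
  let capacity = 1;
  while (capacity < n) capacity *= 2;
  return capacity;
}

// Hypothetical growable store for instance matrices.
class InstanceMatrixBuffer {
  data: Float32Array;
  count = 0;

  constructor(initialCapacity = 16) {
    this.data = new Float32Array(nextPowerOfTwo(initialCapacity) * FLOATS_PER_MATRIX);
  }

  get capacity(): number {
    return this.data.length / FLOATS_PER_MATRIX;
  }

  // Appends a matrix and returns its instance index.
  push(matrix: ArrayLike<number>): number {
    if (this.count === this.capacity) {
      // Double the capacity and copy existing matrices over.
      const grown = new Float32Array(this.capacity * 2 * FLOATS_PER_MATRIX);
      grown.set(this.data);
      this.data = grown;
    }
    this.data.set(matrix, this.count * FLOATS_PER_MATRIX);
    return this.count++;
  }
}
```

Doubling keeps the number of reallocations logarithmic in the instance count, so appending N instances costs O(N) copies in total rather than O(N²).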

Removing Instances#

import { removeEntity } from '@stem/core/ecs';
 
function despawnTree(entity: number) {
  removeEntity(world, entity);
 
  // Instance is automatically removed from chunk/bucket
  // GPU buffer is compacted (swap-remove for O(1) deletion)
  // No memory leaks or fragmentation
}
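
The swap-remove compaction mentioned above works by moving the last instance into the freed slot, so the buffer stays dense without shifting every later element. The sketch below assumes a fixed-capacity CPU-side matrix array and a hypothetical `InstanceSlots` class; STEM's internal bookkeeping may differ.

```typescript
const FLOATS_PER_MATRIX = 16;

// Hypothetical dense slot table: entity IDs map to contiguous matrix slots.
class InstanceSlots {
  readonly matrices: Float32Array;
  private entityAt: number[] = [];            // slot index -> entity ID
  private slotOf = new Map<number, number>(); // entity ID -> slot index

  constructor(capacity: number) {
    this.matrices = new Float32Array(capacity * FLOATS_PER_MATRIX);
  }

  get count(): number {
    return this.entityAt.length;
  }

  getSlot(entity: number): number | undefined {
    return this.slotOf.get(entity);
  }

  add(entity: number, matrix: ArrayLike<number>): number {
    const slot = this.entityAt.length;
    if ((slot + 1) * FLOATS_PER_MATRIX > this.matrices.length) {
      throw new Error('Instance buffer is full');
    }
    this.matrices.set(matrix, slot * FLOATS_PER_MATRIX);
    this.entityAt.push(entity);
    this.slotOf.set(entity, slot);
    return slot;
  }

  // O(1) removal: overwrite the removed slot with the last slot, then shrink.
  remove(entity: number): boolean {
    const slot = this.slotOf.get(entity);
    if (slot === undefined) return false;
    const last = this.entityAt.length - 1;
    if (slot !== last) {
      const moved = this.entityAt[last];
      this.matrices.copyWithin(
        slot * FLOATS_PER_MATRIX,
        last * FLOATS_PER_MATRIX,
        (last + 1) * FLOATS_PER_MATRIX,
      );
      this.entityAt[slot] = moved;
      this.slotOf.set(moved, slot);
    }
    this.entityAt.pop();
    this.slotOf.delete(entity);
    return true;
  }
}
```

The trade-off is that instance order is not stable: any code that caches a slot index must look it up again after a removal.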

Updating Instances#

import { addComponent, Instanced, Transform, TransformDirty } from '@stem/core/ecs';
 
function moveTree(entity: number, x: number, z: number) {
  // Update position
  Transform.position[entity][0] = x;
  Transform.position[entity][2] = z;
 
  // Mark as dirty to trigger GPU upload
  addComponent(world, TransformDirty, entity);
 
  // If moved to different chunk, automatically relocated
  // GPU matrix is updated in next frame
}
 
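// Efficient update system
// Note: createTypedQuery, IWorld, and the TreeWind component are not imported above;
// they are assumed to be available from your project or STEM's ECS module.
// The query is created once, outside the system, so it is not rebuilt every frame.
const treeWindQuery = createTypedQuery([Transform, Instanced, TreeWind]);
 
export const TreeWindSystem = (world: IWorld) => {
  const entities = treeWindQuery(world);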
  const time = performance.now() / 1000;
 
  for (let i = 0; i < entities.length; i++) {
    const eid = entities[i];
 
    // Animate tree swaying
    const offset = TreeWind.offset[eid];
    Transform.rotation[eid][2] = Math.sin(time + offset) * 0.05;
 
    // Mark dirty for GPU upload
    addComponent(world, TransformDirty, eid);
  }
 
  return world;
};
 
// System efficiently updates 1000s of instances per frame

Batching Strategies#

STEM uses multiple batching strategies to maximize performance.

Spatial Batching#

Instances are grouped into 32x32 meter chunks based on XZ position. This enables efficient frustum culling and reduces state changes.

// Chunk key computed from position
const CHUNK_SIZE = 32; // meters
const chunkX = Math.floor(x / CHUNK_SIZE);
const chunkZ = Math.floor(z / CHUNK_SIZE);
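// Pack two signed 16-bit chunk coordinates into one 32-bit key
// (unique for chunk indices in the range -32768 to 32767)
const chunkKey = (chunkX << 16) | (chunkZ & 0xFFFF);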
 
// All instances in same chunk are rendered together
// Camera frustum culls entire chunks at once
// Result: Only visible chunks are processed
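
As a self-contained illustration of the key packing above, the sketch below wraps it in a function, adds the inverse decode, and groups positions by chunk. The `chunkKey`, `decodeChunkKey`, and `groupByChunk` helpers are hypothetical names for this example and are not part of STEM's API.

```typescript
const CHUNK_SIZE = 32; // meters

// Packs chunk coordinates into one 32-bit integer key.
// Valid while both chunk indices fit in a signed 16-bit range.
function chunkKey(x: number, z: number): number {
  const chunkX = Math.floor(x / CHUNK_SIZE);
  const chunkZ = Math.floor(z / CHUNK_SIZE);
  return (chunkX << 16) | (chunkZ & 0xffff);
}

// Recovers the signed chunk coordinates from a key.
function decodeChunkKey(key: number): { chunkX: number; chunkZ: number } {
  return {
    chunkX: key >> 16,         // arithmetic shift keeps the sign
    chunkZ: (key << 16) >> 16, // sign-extend the low 16 bits
  };
}

// Groups instance indices by the chunk that contains their XZ position.
function groupByChunk(positions: Array<[number, number]>): Map<number, number[]> {
  const chunks = new Map<number, number[]>();
  positions.forEach(([x, z], index) => {
    const key = chunkKey(x, z);
    const bucket = chunks.get(key);
    if (bucket) bucket.push(index);
    else chunks.set(key, [index]);
  });
  return chunks;
}
```

`Math.floor` (rather than truncation) matters for negative coordinates: a position at x = -1 belongs to chunk -1, not chunk 0.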

Material Batching#

Within each chunk, instances are further grouped by material signature (geometry + material + texture atlas).

// Render signature for batching
interface RenderSignature {
  renderKey: number;      // Hash of geometry + material + atlas
  assetId: number;        // Asset ID
  materialIndex: number;  // Material variant
  atlasId: number;        // Texture atlas ID
  bindlessId: number;     // Bindless texture ID
}
 
// Instances with same signature are batched
// Different materials = different draw calls
// Same material + different atlas = different draw calls
 
// Example:
// - 500 oak trees = 1 draw call
// - 300 pine trees = 1 draw call
// - 200 rocks = 1 draw call
// Total: 3 draw calls for 1000 instances
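
The batch counting in the example above can be reproduced with a small sketch. It assumes each draw call corresponds to one unique (chunk, render key) pair, which follows from the description here but is not a confirmed detail of InstancedRenderSystem; `buildBatches` and the numeric render keys are hypothetical.

```typescript
interface BatchedInstance {
  chunkKey: number;  // spatial chunk the instance falls into
  renderKey: number; // hash of geometry + material + atlas
}

// Groups instances into batches; each map entry stands for one draw call
// and stores how many instances that call would render.
function buildBatches(instances: BatchedInstance[]): Map<string, number> {
  const batches = new Map<string, number>();
  for (const instance of instances) {
    const id = `${instance.chunkKey}:${instance.renderKey}`;
    batches.set(id, (batches.get(id) ?? 0) + 1);
  }
  return batches;
}

// Hypothetical render keys for three asset types
const OAK = 1, PINE = 2, ROCK = 3;

const make = (n: number, chunkKey: number, renderKey: number): BatchedInstance[] =>
  Array.from({ length: n }, () => ({ chunkKey, renderKey }));

// 500 oaks, 300 pines, and 200 rocks, all inside one chunk
const scene = [...make(500, 0, OAK), ...make(300, 0, PINE), ...make(200, 0, ROCK)];
const drawCalls = buildBatches(scene).size;
```

If the same three asset types are spread over several visible chunks, the count under this assumption grows to one batch per asset type per chunk.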

Buffer Batching#

GPU buffer updates are batched to minimize state changes and uploads.

// Instead of updating each instance separately:
// BAD: every change flags the entire instance buffer for re-upload
for (let i = 0; i < instances.length; i++) {
  mesh.setMatrixAt(i, matrix);
  mesh.instanceMatrix.needsUpdate = true; // Upload!
}
 
// STEM batches updates efficiently:
// GOOD: 1 GPU upload with range
const dirty = bucket.getDirtyRange(); // e.g., instances 50 through 150
updateMatrixRange(bucket, dirty.start, dirty.end);
mesh.instanceMatrix.addUpdateRange(dirty.start * 16, dirty.count * 16);
mesh.instanceMatrix.needsUpdate = true; // Single upload
 
// Benefit: Only dirty instances uploaded (50-90% reduction)
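
Dirty-range tracking of the kind described above can be sketched as follows. The example uses a second Float32Array as a stand-in for the GPU buffer and treats the range end as inclusive; both choices, and the `DirtyRange` and `uploadDirty` names, are assumptions for illustration.

```typescript
const FLOATS_PER_MATRIX = 16;

// Tracks the smallest contiguous span of instance indices that changed.
class DirtyRange {
  start = Infinity;
  end = -1; // inclusive

  mark(index: number): void {
    if (index < this.start) this.start = index;
    if (index > this.end) this.end = index;
  }

  get isEmpty(): boolean {
    return this.end < this.start;
  }

  get count(): number {
    return this.isEmpty ? 0 : this.end - this.start + 1;
  }

  clear(): void {
    this.start = Infinity;
    this.end = -1;
  }
}

// Copies only the dirty span from the CPU array into the GPU stand-in.
// Returns the number of floats copied.
function uploadDirty(cpu: Float32Array, gpu: Float32Array, dirty: DirtyRange): number {
  if (dirty.isEmpty) return 0;
  const from = dirty.start * FLOATS_PER_MATRIX;
  const to = (dirty.end + 1) * FLOATS_PER_MATRIX;
  gpu.set(cpu.subarray(from, to), from);
  dirty.clear();
  return to - from;
}
```

A single range is cheap to maintain but uploads untouched instances that lie between two distant changes; tracking several ranges is a possible refinement when updates are sparse.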

When to Use Instancing#

Instancing is ideal for rendering many copies of the same mesh. Here are guidelines for when to use it.

Good Candidates#

Object Type	Instance Count	Benefit
Trees	100-10,000	Massive forests with minimal overhead
Rocks	500-5,000	Natural terrain scatter
Grass	1,000-100,000	Dense grass fields
Particles	1,000-50,000	Fire, smoke, rain effects
Enemies	50-1,000	Large crowds and armies
Buildings	100-1,000	Cities with repeated structures
Projectiles	100-5,000	Bullets, arrows, magic missiles

Poor Candidates#

Unique objects — Hero characters, bosses, unique props (< 5 copies each)
Skinned meshes — Animated characters (use skeletal animation batching instead)
Dynamic geometry — Meshes that change shape frequently (use custom shaders)
Low instance count — < 10 instances (overhead not worth it, use regular rendering)

Instancing Checklist#

When to Instance

Use instancing when:
• Same geometry used 10+ times
• Objects share the same material
• Per-instance data is simple (transform, color)
• Objects don't need unique animations
• Draw calls are a bottleneck (> 200 calls)

Performance Tips#

Minimize Materials#

// BAD: Different materials = separate batches
const oakMaterial = new MeshStandardMaterial({ map: oakTexture });
const pineMaterial = new MeshStandardMaterial({ map: pineTexture });
 
// GOOD: Shared material with texture atlas
const treeMaterial = new MeshStandardMaterial({ map: treeAtlas });
 
// Use UV offsets to select texture region
// or material index to select atlas region
// Result: All trees in one batch
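
Selecting an atlas region with UV offsets, as suggested above, can be sketched like this. The example assumes a uniform grid atlas indexed in row-major order starting at UV origin; STEM's actual atlas layout is not specified here, and `atlasRegion` and `remapUv` are hypothetical helpers.

```typescript
interface AtlasRegion {
  offsetU: number;
  offsetV: number;
  scaleU: number;
  scaleV: number;
}

// Computes the UV offset and scale of one tile in a columns x rows grid atlas.
function atlasRegion(tileIndex: number, columns: number, rows: number): AtlasRegion {
  if (tileIndex < 0 || tileIndex >= columns * rows) {
    throw new RangeError(`Tile index ${tileIndex} is outside a ${columns}x${rows} atlas`);
  }
  const column = tileIndex % columns;
  const row = Math.floor(tileIndex / columns);
  return {
    offsetU: column / columns,
    offsetV: row / rows,
    scaleU: 1 / columns,
    scaleV: 1 / rows,
  };
}

// Maps a mesh UV in [0, 1] into the tile's sub-rectangle of the atlas.
function remapUv(u: number, v: number, region: AtlasRegion): [number, number] {
  return [region.offsetU + u * region.scaleU, region.offsetV + v * region.scaleV];
}

// Usage: tile 5 of a 4x4 atlas (for example, a pine bark texture)
const pineRegion = atlasRegion(5, 4, 4);
const [atlasU, atlasV] = remapUv(0.5, 0.5, pineRegion);
```

In a shader, the same offset and scale would typically be supplied per instance so that all tree variants share one material and one batch.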

Optimize Geometry#

// Keep instance geometry simple
// Target: 100-500 triangles per instance
// Maximum: 2,000 triangles per instance
 
// Use LOD for distant instances
// - LOD 0 (< 20m): 500 tris
// - LOD 1 (20-50m): 200 tris
// - LOD 2 (50-100m): 50 tris
// - LOD 3 (> 100m): Culled or billboard
 
// Good geometry for instancing:
// - Rocks: 50-200 tris
// - Trees: 200-1000 tris
// - Buildings: 500-2000 tris
// - Grass: 2-10 tris (billboard)
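
The distance bands listed above can be turned into a small selection function. This sketch uses the thresholds from the comments (20 m, 50 m, 100 m) and assigns a distance that lands exactly on a boundary to the coarser level, which is an assumption; `selectLod` is a hypothetical helper, not STEM's InstancedLODSystem.

```typescript
// Upper distance bound (in meters) for LOD 0, LOD 1, and LOD 2.
const LOD_DISTANCES = [20, 50, 100];

// Returns 0-2 for a mesh LOD, or 3 when the instance should be
// culled or replaced by a billboard.
function selectLod(distance: number): number {
  for (let level = 0; level < LOD_DISTANCES.length; level++) {
    if (distance < LOD_DISTANCES[level]) return level;
  }
  return LOD_DISTANCES.length;
}

// Usage: count how many instances fall into each LOD level
function lodHistogram(distances: number[]): number[] {
  const histogram = [0, 0, 0, 0];
  for (const distance of distances) histogram[selectLod(distance)]++;
  return histogram;
}
```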

Buffer Management#

import { getInstancedRenderMetrics } from '@stem/core/ecs/systems/InstancedRenderSystem';
 
// Monitor buffer pool stats
const metrics = getInstancedRenderMetrics();
console.log('Matrix updates:', metrics.matrixUpdateCount);
console.log('Buffer acquisitions:', metrics.bufferAcquisitionCount);
console.log('WASM path usage:', metrics.wasmPathUsage);
console.log('JS path usage:', metrics.jsPathUsage);
 
// Tips:
// - Batch additions/removals when possible
// - Avoid frequent chunk boundary crossings
// - Pre-allocate instances at scene load
// - Use dirty flags to minimize updates

Memory Usage#

Each instance uses 80 bytes of GPU memory:

Matrix — 64 bytes (16 floats)
Material index — 4 bytes (1 uint32)
LOD offset — 4 bytes (1 float)
Flags — 4 bytes (1 uint32)
Entity ID — 4 bytes (1 uint32)

// Memory calculation
const instanceCount = 10000;
const bytesPerInstance = 80;
const totalMemory = instanceCount * bytesPerInstance;
 
console.log('Memory for 10K instances:', (totalMemory / 1024 / 1024).toFixed(2), 'MB');
// Output: Memory for 10K instances: 0.76 MB
 
// Very memory efficient!
// 100K instances = 7.6 MB
// 1M instances = 76 MB
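
To make the 80-byte figure concrete, the sketch below writes one instance record into an ArrayBuffer using the field sizes listed above. The field order and byte offsets are assumptions chosen for illustration (the source gives sizes only), and `writeInstance` is a hypothetical helper.

```typescript
const INSTANCE_STRIDE = 80; // bytes per instance

// Assumed byte offsets within one instance record
const OFFSET_MATRIX = 0;          // 16 x float32 = 64 bytes
const OFFSET_MATERIAL_INDEX = 64; // uint32
const OFFSET_LOD_OFFSET = 68;     // float32
const OFFSET_FLAGS = 72;          // uint32
const OFFSET_ENTITY_ID = 76;      // uint32

interface InstanceRecord {
  matrix: ArrayLike<number>; // 16 floats
  materialIndex: number;
  lodOffset: number;
  flags: number;
  entityId: number;
}

// Writes one record at the given slot using little-endian byte order.
function writeInstance(view: DataView, slot: number, record: InstanceRecord): void {
  const base = slot * INSTANCE_STRIDE;
  for (let i = 0; i < 16; i++) {
    view.setFloat32(base + OFFSET_MATRIX + i * 4, record.matrix[i], true);
  }
  view.setUint32(base + OFFSET_MATERIAL_INDEX, record.materialIndex, true);
  view.setFloat32(base + OFFSET_LOD_OFFSET, record.lodOffset, true);
  view.setUint32(base + OFFSET_FLAGS, record.flags, true);
  view.setUint32(base + OFFSET_ENTITY_ID, record.entityId, true);
}

// Usage: a buffer with room for two instances (2 x 80 = 160 bytes)
const instanceBuffer = new ArrayBuffer(INSTANCE_STRIDE * 2);
const instanceView = new DataView(instanceBuffer);
writeInstance(instanceView, 1, {
  matrix: [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
  materialIndex: 2,
  lodOffset: 0.5,
  flags: 0,
  entityId: 42,
});
```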

Instancing Best Practices

• Use same geometry and material for batching
• Keep per-instance data minimal (< 100 bytes)
• Target 100-1000 instances per batch
• Use spatial chunking for frustum culling
• Combine with LOD for distant instances
• Monitor draw calls (target < 100)
