Instanced Rendering
Learn how to use GPU instancing to render thousands of objects with minimal draw calls for maximum performance.
GPU instancing is a rendering technique that draws multiple copies of the same mesh in a single draw call. This dramatically reduces CPU overhead and enables rendering of thousands of objects at high frame rates.
What is Instancing#
Traditional rendering issues one draw call per object. Each draw call carries overhead for state changes, GPU command submission, and driver validation. With 1000 objects, that's 1000 draw calls and a significant CPU bottleneck.
GPU instancing sends the mesh geometry once, along with per-instance data (position, rotation, scale, color) in a single batch. The GPU then draws all instances in one draw call using vertex shaders to position each instance.
Benefits#
- Massive draw call reduction — 1000 objects = 1 draw call instead of 1000 (1000x reduction)
- Lower CPU overhead — Less time spent on draw call submission and state changes
- Better cache utilization — GPU can reuse geometry data across all instances
- Scalability — Supports 10,000+ instances per mesh with minimal overhead
- Per-instance customization — Each instance can have unique transform, color, and material index
When to Use Instancing#
| Use Case | Benefit | Example |
|---|---|---|
| Many identical objects | Maximum draw call reduction | Trees, rocks, grass, props |
| Particle systems | Thousands of particles efficiently | Fire, smoke, rain, sparks |
| Enemies/NPCs | Large crowds and armies | Zombies, soldiers, civilians |
| Procedural content | Efficient generation | Dungeons, cities, terrain details |
| Building blocks | Voxel-like worlds | Minecraft-style games |
GPU Instancing Setup#
STEM provides automatic GPU instancing through the Instanced component. Simply mark entities for instancing and the InstancedRenderSystem handles batching automatically.
Basic Setup#
```typescript
import { addComponent, Instanced, Transform } from '@stem/core/ecs';
import { AssetRegistry } from '@stem/core/assets';

// Load a mesh asset
const treeAssetId = await AssetRegistry.loadGLTF('/models/tree.glb');

// Create 1000 trees
for (let i = 0; i < 1000; i++) {
  const entity = world.createEntity();

  // Add Transform component
  addComponent(world, Transform, entity);
  Transform.position[entity][0] = Math.random() * 200 - 100;
  Transform.position[entity][1] = 0;
  Transform.position[entity][2] = Math.random() * 200 - 100;
  Transform.scale[entity][0] = 1 + Math.random() * 0.5;
  Transform.scale[entity][1] = 1 + Math.random() * 0.5;
  Transform.scale[entity][2] = 1 + Math.random() * 0.5;

  // Add Instanced component - triggers automatic batching
  addComponent(world, Instanced, entity);
  Instanced.assetId[entity] = treeAssetId;
}

// Result: 1000 trees rendered in 1-2 draw calls instead of 1000!
```
Spatial Chunk Batching#
STEM uses spatial chunking to group nearby instances into batches. This enables efficient frustum culling and LOD transitions.
```typescript
// STEM automatically organizes instances into 32x32 meter chunks
// Chunks are culled based on camera frustum
// Each chunk maintains separate batches per asset/material
import { getInstancedChunkTable } from '@stem/core/ecs/systems/InstancedRenderSystem';

const chunkTable = getInstancedChunkTable();

// Get chunk statistics
const stats = chunkTable.getStats();
console.log('Total chunks:', stats.totalChunks);
console.log('Visible chunks:', stats.visibleChunks);
console.log('Total buckets:', stats.totalBuckets);
console.log('Total instances:', stats.totalInstances);

// Each visible chunk renders its instances
// Hidden chunks are completely skipped (zero overhead)
```
Instance Attributes#
Each instance can have unique attributes sent to the GPU including transform, color, material index, and LOD offset.
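To make the transform attribute concrete, here is a minimal sketch of how a position, quaternion, and scale can be composed into the column-major 4x4 matrix that instanced rendering typically uploads. The `composeMatrix` function and the `Vec3`/`Quat` types are hypothetical names used for illustration; STEM's internal composition may differ.

```typescript
type Vec3 = [number, number, number];
type Quat = [number, number, number, number]; // x, y, z, w

// Hypothetical helper (not STEM's API): writes a column-major
// translation * rotation * scale matrix into `out` at `offset`.
function composeMatrix(out: Float32Array, offset: number, p: Vec3, q: Quat, s: Vec3): void {
  const [x, y, z, w] = q;
  const x2 = x + x, y2 = y + y, z2 = z + z;
  const xx = x * x2, xy = x * y2, xz = x * z2;
  const yy = y * y2, yz = y * z2, zz = z * z2;
  const wx = w * x2, wy = w * y2, wz = w * z2;

  // Column 0: rotated X axis, scaled by s[0]
  out[offset + 0] = (1 - (yy + zz)) * s[0];
  out[offset + 1] = (xy + wz) * s[0];
  out[offset + 2] = (xz - wy) * s[0];
  out[offset + 3] = 0;
  // Column 1: rotated Y axis, scaled by s[1]
  out[offset + 4] = (xy - wz) * s[1];
  out[offset + 5] = (1 - (xx + zz)) * s[1];
  out[offset + 6] = (yz + wx) * s[1];
  out[offset + 7] = 0;
  // Column 2: rotated Z axis, scaled by s[2]
  out[offset + 8] = (xz + wy) * s[2];
  out[offset + 9] = (yz - wx) * s[2];
  out[offset + 10] = (1 - (xx + yy)) * s[2];
  out[offset + 11] = 0;
  // Column 3: translation
  out[offset + 12] = p[0];
  out[offset + 13] = p[1];
  out[offset + 14] = p[2];
  out[offset + 15] = 1;
}

// Two instances share one buffer: 16 floats (64 bytes) each.
const matrices = new Float32Array(2 * 16);
composeMatrix(matrices, 0, [0, 0, 0], [0, 0, 0, 1], [1, 1, 1]);
const half = Math.PI / 4; // half of a 90 degree yaw
composeMatrix(matrices, 16, [10, 0, -5], [0, Math.sin(half), 0, Math.cos(half)], [2, 2, 2]);
console.log(Array.from(matrices.subarray(28, 31))); // translation of the second instance: [10, 0, -5]
```

Writing every matrix into one contiguous `Float32Array` is what lets all instances of a batch be uploaded together.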
Transform (Matrix)#
```typescript
// Each instance has a 4x4 transform matrix (16 floats = 64 bytes)
// Automatically computed from Transform component
import { Transform } from '@stem/core/ecs';

// Set position
Transform.position[entity][0] = x;
Transform.position[entity][1] = y;
Transform.position[entity][2] = z;

// Set rotation (euler angles in radians)
Transform.rotation[entity][0] = pitch;
Transform.rotation[entity][1] = yaw;
Transform.rotation[entity][2] = roll;

// Or use quaternion directly
Transform.quaternion[entity][0] = qx;
Transform.quaternion[entity][1] = qy;
Transform.quaternion[entity][2] = qz;
Transform.quaternion[entity][3] = qw;

// Set scale
Transform.scale[entity][0] = sx;
Transform.scale[entity][1] = sy;
Transform.scale[entity][2] = sz;

// Matrix is automatically composed and uploaded to GPU
```
Color Attribute#
```typescript
import { addComponent, MeshColor } from '@stem/core/ecs';

// Add color to instance
addComponent(world, MeshColor, entity);
MeshColor.r[entity] = 1.0; // Red
MeshColor.g[entity] = 0.5; // Green
MeshColor.b[entity] = 0.2; // Blue

// Color is multiplied with base material color in shader
// Useful for team colors, damage tinting, variety
```
Material Index#
```typescript
// Material index allows per-instance material variants
// Useful for texture atlases or material arrays
import { Instanced } from '@stem/core/ecs';

// Each instance can reference a different material variant
// (stored in the asset's material array)
const materialIndex = 2; // Use material variant #2

// Material index is automatically managed by the system
// based on asset metadata and texture atlases
```
LOD Offset#
```typescript
// LOD offset (0.0 - 1.0) for smooth LOD transitions
// Automatically managed by InstancedLODSystem

// Offset interpolates between LOD levels in shader:
// - 0.0 = Current LOD level
// - 1.0 = Next LOD level
// - 0.5 = 50% blend between LOD levels

// Example shader usage:
// vec3 position = mix(lodPosition0, lodPosition1, instanceLodOffset);
```
Dynamic Instancing#
Instances can be added, removed, and modified at runtime with efficient GPU buffer updates.
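The two buffer techniques this section relies on, power-of-2 growth on add and swap-remove on delete, can be sketched in isolation. The `InstanceBucket` class below is a hypothetical illustration with made-up names, assuming one 4x4 matrix per instance; it is not STEM's actual bucket implementation.

```typescript
const FLOATS_PER_INSTANCE = 16; // one 4x4 matrix per instance

// Hypothetical bucket, for illustration only (not STEM's API).
class InstanceBucket {
  private matrices = new Float32Array(4 * FLOATS_PER_INSTANCE); // starts with 4 slots
  private entities: number[] = [];            // slot -> entity id
  private slots = new Map<number, number>();  // entity id -> slot

  get count(): number {
    return this.entities.length;
  }

  get capacity(): number {
    return this.matrices.length / FLOATS_PER_INSTANCE;
  }

  add(entity: number, matrix: ArrayLike<number>): number {
    if (this.count === this.capacity) {
      // Double the capacity so repeated adds stay amortized O(1).
      const grown = new Float32Array(this.matrices.length * 2);
      grown.set(this.matrices);
      this.matrices = grown;
    }
    const slot = this.count;
    this.matrices.set(matrix, slot * FLOATS_PER_INSTANCE);
    this.entities.push(entity);
    this.slots.set(entity, slot);
    return slot;
  }

  remove(entity: number): boolean {
    const slot = this.slots.get(entity);
    if (slot === undefined) return false;
    const last = this.count - 1;
    if (slot !== last) {
      // Swap-remove: move the last instance into the freed slot.
      const src = last * FLOATS_PER_INSTANCE;
      this.matrices.copyWithin(slot * FLOATS_PER_INSTANCE, src, src + FLOATS_PER_INSTANCE);
      const moved = this.entities[last];
      this.entities[slot] = moved;
      this.slots.set(moved, slot);
    }
    this.entities.pop();
    this.slots.delete(entity);
    return true;
  }

  slotOf(entity: number): number | undefined {
    return this.slots.get(entity);
  }
}

const bucket = new InstanceBucket();
const identity = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
for (let e = 100; e < 105; e++) bucket.add(e, identity); // the fifth add grows 4 -> 8 slots
bucket.remove(101); // entity 104 moves into slot 1
console.log(bucket.count, bucket.capacity, bucket.slotOf(104)); // 4 8 1
```

Swap-remove keeps the live instances densely packed, so the draw call can always render slots `0..count-1`. The cost is that slot order is not stable, which is why the sketch keeps an entity-to-slot map.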
Adding Instances#
```typescript
import { addComponent, Instanced, Transform } from '@stem/core/ecs';

function spawnTree(x: number, z: number) {
  const entity = world.createEntity();

  addComponent(world, Transform, entity);
  Transform.position[entity][0] = x;
  Transform.position[entity][1] = 0;
  Transform.position[entity][2] = z;

  addComponent(world, Instanced, entity);
  Instanced.assetId[entity] = treeAssetId;

  // Instance is automatically added to appropriate chunk/bucket
  // GPU buffer is grown if needed (power-of-2 growth)
  return entity;
}

// Spawn 100 trees per frame efficiently
for (let i = 0; i < 100; i++) {
  spawnTree(Math.random() * 200, Math.random() * 200);
}
```
Removing Instances#
```typescript
import { removeEntity } from '@stem/core/ecs';

function despawnTree(entity: number) {
  removeEntity(world, entity);

  // Instance is automatically removed from chunk/bucket
  // GPU buffer is compacted (swap-remove for O(1) deletion)
  // No memory leaks or fragmentation
}
```
Updating Instances#
```typescript
import { addComponent, Instanced, Transform, TransformDirty } from '@stem/core/ecs';
// createTypedQuery, IWorld, and the TreeWind component are assumed to be in scope;
// their import paths are not shown in this guide.

function moveTree(entity: number, x: number, z: number) {
  // Update position
  Transform.position[entity][0] = x;
  Transform.position[entity][2] = z;

  // Mark as dirty to trigger GPU upload
  addComponent(world, TransformDirty, entity);

  // If moved to different chunk, automatically relocated
  // GPU matrix is updated in next frame
}

// Efficient update system
export const TreeWindSystem = (world: IWorld) => {
  const query = createTypedQuery([Transform, Instanced, TreeWind]);
  const entities = query(world);
  const time = performance.now() / 1000;

  for (let i = 0; i < entities.length; i++) {
    const eid = entities[i];

    // Animate tree swaying
    const offset = TreeWind.offset[eid];
    Transform.rotation[eid][2] = Math.sin(time + offset) * 0.05;

    // Mark dirty for GPU upload
    addComponent(world, TransformDirty, eid);
  }

  return world;
};

// System efficiently updates 1000s of instances per frame
```
Batching Strategies#
STEM uses multiple batching strategies to maximize performance.
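As a rough illustration of how spatial and material grouping combine, the sketch below buckets instances first by 32-meter chunk and then by asset and material index, and counts the resulting batches. The `Placed` record and `buildBatches` function are hypothetical; the chunk key follows the formula used in this guide, and treating each group as one draw call is a simplifying assumption.

```typescript
const CHUNK_SIZE = 32; // meters, matching the chunk size described in this guide

// Hypothetical record describing one placed instance (illustration only).
interface Placed {
  x: number;
  z: number;
  assetId: number;
  materialIndex: number;
}

// Pack the chunk coordinates into one integer key.
function chunkKey(x: number, z: number): number {
  const cx = Math.floor(x / CHUNK_SIZE);
  const cz = Math.floor(z / CHUNK_SIZE);
  return (cx << 16) | (cz & 0xffff);
}

// Group by chunk, then by asset and material. Each group stands for one batch.
function buildBatches(items: Placed[]): Map<string, Placed[]> {
  const batches = new Map<string, Placed[]>();
  for (const item of items) {
    const key = `${chunkKey(item.x, item.z)}:${item.assetId}:${item.materialIndex}`;
    const batch = batches.get(key);
    if (batch) {
      batch.push(item);
    } else {
      batches.set(key, [item]);
    }
  }
  return batches;
}

const batches = buildBatches([
  { x: 1, z: 1, assetId: 7, materialIndex: 0 },   // chunk (0, 0), asset 7
  { x: 20, z: 30, assetId: 7, materialIndex: 0 }, // same chunk, same asset: joins the batch
  { x: 5, z: 5, assetId: 9, materialIndex: 0 },   // same chunk, different asset: new batch
  { x: 40, z: 1, assetId: 7, materialIndex: 0 },  // neighboring chunk: new batch
]);
console.log(batches.size); // 3 batches for 4 instances
```

The trade-off is visible even at this scale: smaller chunks cull more precisely but split the same asset across more batches.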
Spatial Batching#
Instances are grouped into 32x32 meter chunks based on XZ position. This enables efficient frustum culling and reduces state changes.
```typescript
// Chunk key computed from position
const CHUNK_SIZE = 32; // meters
const chunkX = Math.floor(x / CHUNK_SIZE);
const chunkZ = Math.floor(z / CHUNK_SIZE);
const chunkKey = (chunkX << 16) | (chunkZ & 0xFFFF);

// All instances in same chunk are rendered together
// Camera frustum culls entire chunks at once
// Result: Only visible chunks are processed
```
Material Batching#
Within each chunk, instances are further grouped by material signature (geometry + material + texture atlas).
```typescript
// Render signature for batching
interface RenderSignature {
  renderKey: number;     // Hash of geometry + material + atlas
  assetId: number;       // Asset ID
  materialIndex: number; // Material variant
  atlasId: number;       // Texture atlas ID
  bindlessId: number;    // Bindless texture ID
}

// Instances with same signature are batched
// Different materials = different draw calls
// Same material + different atlas = different draw calls

// Example:
// - 500 oak trees = 1 draw call
// - 300 pine trees = 1 draw call
// - 200 rocks = 1 draw call
// Total: 3 draw calls for 1000 instances
```
Buffer Batching#
GPU buffer updates are batched to minimize state changes and uploads.
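One simple way to batch uploads is to track the lowest and highest modified slot and upload that single span. The `DirtyRange` class below is a hypothetical sketch of that idea, with made-up method names; it is not STEM's implementation.

```typescript
// Hypothetical helper, for illustration only (not STEM's API).
class DirtyRange {
  private min = Infinity;
  private max = -1;

  // Record that the instance in `slot` changed this frame.
  mark(slot: number): void {
    if (slot < this.min) this.min = slot;
    if (slot > this.max) this.max = slot;
  }

  get isEmpty(): boolean {
    return this.max < 0;
  }

  // Return the smallest contiguous span covering every marked slot, then reset.
  flush(): { start: number; count: number } | null {
    if (this.isEmpty) return null;
    const span = { start: this.min, count: this.max - this.min + 1 };
    this.min = Infinity;
    this.max = -1;
    return span;
  }
}

const dirty = new DirtyRange();
[50, 120, 75, 150].forEach((slot) => dirty.mark(slot));
const span = dirty.flush();
// With 16 floats per matrix, the upload would start at float offset
// span.start * 16 and cover span.count * 16 floats.
console.log(span); // { start: 50, count: 101 }
```

A single span can include untouched slots between the marked ones, so it trades some redundant bytes for one upload instead of many.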
```typescript
// Instead of updating each instance separately:
// BAD: 1000 GPU uploads
for (let i = 0; i < instances.length; i++) {
  mesh.setMatrixAt(i, matrix);
  mesh.instanceMatrix.needsUpdate = true; // Upload!
}

// STEM batches updates efficiently:
// GOOD: 1 GPU upload with range
const dirty = bucket.getDirtyRange(); // e.g., [50, 150]
updateMatrixRange(bucket, dirty.start, dirty.end);
mesh.instanceMatrix.addUpdateRange(dirty.start * 16, dirty.count * 16);
mesh.instanceMatrix.needsUpdate = true; // Single upload

// Benefit: Only dirty instances uploaded (50-90% reduction)
```
When to Use Instancing#
Instancing is ideal for rendering many copies of the same mesh. Here are guidelines for when to use it.
Good Candidates#
| Object Type | Instance Count | Benefit |
|---|---|---|
| Trees | 100-10,000 | Massive forests with minimal overhead |
| Rocks | 500-5,000 | Natural terrain scatter |
| Grass | 1,000-100,000 | Dense grass fields |
| Particles | 1,000-50,000 | Fire, smoke, rain effects |
| Enemies | 50-1,000 | Large crowds and armies |
| Buildings | 100-1,000 | Cities with repeated structures |
| Projectiles | 100-5,000 | Bullets, arrows, magic missiles |
Poor Candidates#
- Unique objects — Hero characters, bosses, unique props (< 5 copies each)
- Skinned meshes — Animated characters (use skeletal animation batching instead)
- Dynamic geometry — Meshes that change shape frequently (use custom shaders)
- Low instance count — < 10 instances (overhead not worth it, use regular rendering)
Instancing Checklist#
When to Instance
• Same geometry used 10+ times
• Objects share the same material
• Per-instance data is simple (transform, color)
• Objects don't need unique animations
• Draw calls are a bottleneck (> 200 calls)
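The checklist can be read as a simple predicate. The sketch below encodes it with the thresholds quoted above (10 or more copies, more than 200 draw calls); the `Candidate` shape and `shouldInstance` function are hypothetical, and treating every item as a hard requirement is an assumption made for illustration.

```typescript
// Hypothetical description of an object type being considered for instancing.
interface Candidate {
  copies: number;                  // how many times the same geometry appears
  sharesMaterial: boolean;         // all copies use the same material
  simplePerInstanceData: boolean;  // only transform and color vary
  needsUniqueAnimation: boolean;   // e.g. skinned characters
}

// Returns true only when every checklist item holds.
function shouldInstance(c: Candidate, sceneDrawCalls: number): boolean {
  if (c.copies < 10) return false;
  if (!c.sharesMaterial) return false;
  if (!c.simplePerInstanceData) return false;
  if (c.needsUniqueAnimation) return false;
  return sceneDrawCalls > 200; // draw calls are actually the bottleneck
}

const rocks: Candidate = {
  copies: 800,
  sharesMaterial: true,
  simplePerInstanceData: true,
  needsUniqueAnimation: false,
};
const boss: Candidate = { ...rocks, copies: 1, needsUniqueAnimation: true };

console.log(shouldInstance(rocks, 350)); // true
console.log(shouldInstance(boss, 350));  // false
```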
Performance Tips#
Minimize Materials#
```typescript
// BAD: Different materials = separate batches
const oakMaterial = new MeshStandardMaterial({ map: oakTexture });
const pineMaterial = new MeshStandardMaterial({ map: pineTexture });

// GOOD: Shared material with texture atlas
const treeMaterial = new MeshStandardMaterial({ map: treeAtlas });

// Use UV offsets to select texture region
// or material index to select atlas region
// Result: All trees in one batch
```
Optimize Geometry#
```typescript
// Keep instance geometry simple
// Target: 100-500 triangles per instance
// Maximum: 2,000 triangles per instance

// Use LOD for distant instances
// - LOD 0 (< 20m): 500 tris
// - LOD 1 (20-50m): 200 tris
// - LOD 2 (50-100m): 50 tris
// - LOD 3 (> 100m): Culled or billboard

// Good geometry for instancing:
// - Rocks: 50-200 tris
// - Trees: 200-1000 tris
// - Buildings: 500-2000 tris
// - Grass: 2-10 tris (billboard)
```
Buffer Management#
```typescript
import { getInstancedRenderMetrics } from '@stem/core/ecs/systems/InstancedRenderSystem';

// Monitor buffer pool stats
const metrics = getInstancedRenderMetrics();
console.log('Matrix updates:', metrics.matrixUpdateCount);
console.log('Buffer acquisitions:', metrics.bufferAcquisitionCount);
console.log('WASM path usage:', metrics.wasmPathUsage);
console.log('JS path usage:', metrics.jsPathUsage);

// Tips:
// - Batch additions/removals when possible
// - Avoid frequent chunk boundary crossings
// - Pre-allocate instances at scene load
// - Use dirty flags to minimize updates
```
Memory Usage#
Each instance uses 80 bytes of GPU memory:
- Matrix — 64 bytes (16 floats)
- Material index — 4 bytes (1 uint32)
- LOD offset — 4 bytes (1 float)
- Flags — 4 bytes (1 uint32)
- Entity ID — 4 bytes (1 uint32)
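The five fields add up to the 80-byte stride: 64 + 4 + 4 + 4 + 4. As a sketch of what such a record could look like in memory, the code below writes one instance into an `ArrayBuffer` with a `DataView`. The field order, the `OFFSET` table, and `writeInstance` are assumptions made for illustration; the actual layout STEM uses is not documented here.

```typescript
const STRIDE = 80; // bytes per instance
// Assumed field order (hypothetical): matrix first, then the four 4-byte fields.
const OFFSET = { matrix: 0, materialIndex: 64, lodOffset: 68, flags: 72, entityId: 76 } as const;

interface InstanceRecord {
  matrix: ArrayLike<number>; // 16 floats, column-major
  materialIndex: number;
  lodOffset: number;
  flags: number;
  entityId: number;
}

// Write one record at slot `index`, little-endian.
function writeInstance(view: DataView, index: number, rec: InstanceRecord): void {
  const base = index * STRIDE;
  for (let i = 0; i < 16; i++) {
    view.setFloat32(base + OFFSET.matrix + i * 4, rec.matrix[i], true);
  }
  view.setUint32(base + OFFSET.materialIndex, rec.materialIndex, true);
  view.setFloat32(base + OFFSET.lodOffset, rec.lodOffset, true);
  view.setUint32(base + OFFSET.flags, rec.flags, true);
  view.setUint32(base + OFFSET.entityId, rec.entityId, true);
}

const identity = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
const buffer = new ArrayBuffer(3 * STRIDE); // room for 3 instances
const view = new DataView(buffer);
writeInstance(view, 2, { matrix: identity, materialIndex: 2, lodOffset: 0.5, flags: 1, entityId: 42 });

console.log(buffer.byteLength); // 240
console.log(view.getUint32(2 * STRIDE + OFFSET.entityId, true)); // 42
```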
```typescript
// Memory calculation
const instanceCount = 10000;
const bytesPerInstance = 80;
const totalMemory = instanceCount * bytesPerInstance;

console.log('Memory for 10K instances:', (totalMemory / 1024 / 1024).toFixed(2), 'MB');
// Output: Memory for 10K instances: 0.76 MB

// Very memory efficient!
// 100K instances = 7.6 MB
// 1M instances = 76 MB
```
Instancing Best Practices
• Keep per-instance data minimal (< 100 bytes)
• Target 100-1000 instances per batch
• Use spatial chunking for frustum culling
• Combine with LOD for distant instances
• Monitor draw calls (target < 100)