The Draw Call Problem

And How Universe Solved It Differently

1. The Problem (15 Years of Industry Pain)

456 draw calls · O(n) CPU traversal · 12ms per frame

Every frame, the CPU walks a scene graph. For every object, it sets uniforms, binds textures, issues a draw call. The GPU waits. The driver validates. State changes flush the pipeline.

456 draw calls. Each one crosses the CPU-to-GPU boundary. Each one pays driver validation. Each one risks a pipeline stall. Multiply by 60 frames per second. This was the #1 bottleneck in game engines for 15 years: not the GPU's ability to shade pixels, but the CPU's ability to tell the GPU what to shade.
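In code, the naive loop looks something like this. A sketch in TypeScript against the WebGPU API; SceneObject and its fields are hypothetical, not any particular engine's types:

    interface SceneObject {
      pipeline: GPURenderPipeline;  // shader + fixed-function state
      bindGroup: GPUBindGroup;      // this object's uniforms and textures
      vertices: GPUBuffer;
      vertexCount: number;
    }

    // One draw call per object: every iteration is CPU work the GPU waits behind.
    function renderFrame(pass: GPURenderPassEncoder, scene: SceneObject[]) {
      for (const obj of scene) {               // O(n) CPU traversal
        pass.setPipeline(obj.pipeline);        // state change
        pass.setBindGroup(0, obj.bindGroup);   // set uniforms, bind textures
        pass.setVertexBuffer(0, obj.vertices);
        pass.draw(obj.vertexCount);            // one of the 456
      }
    }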

The entire history of real-time rendering from 2010 to 2025 is the story of trying to get the CPU out of the GPU's way.

2. How Each Generation Solved It

2015–2020: GPU Instancing + Batching
CPU scene graph → instance groups → batched meshes (~100 draw calls) → GPU rasterize → pixels

2020–2023: GPU-Driven Rendering (UE5 Nanite)
CPU uploads once → GPU compute cull → GPU indirect draw → GPU rasterize 10B tris → pixels

2023+: Mesh Shaders (DX12 Ultimate, Vulkan)
CPU dispatch → GPU task shader → GPU mesh shader generates tris → GPU rasterize → pixels

Now: Cloud Gaming (GeForce NOW, Xbox Cloud)
client input → network → server GPU renders → JPEG/H.264 → network → canvas.drawImage

3. What They All Converge On

ONE dispatch per frame → GPU decides everything → pixels out
Every generation gets closer to this. The question is how you get there.
Approach             How it achieves "one dispatch"             Still needs
GPU instancing       Merge identical meshes                     CPU scene graph
GPU-driven (Nanite)  GPU compute culls, GPU indirect draw       CPU uploads objects once
Mesh shaders         GPU generates geometry on-chip             CPU uploads parameters
Cloud streaming      Server GPU renders, client decodes video   Server infrastructure
SDF raymarching      One fullscreen quad, one shader            Nothing: geometry IS math
Lithos megakernel    One AGX dispatch, font table → silicon     Nothing
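To make the middle rows concrete: in a GPU-driven renderer, the CPU's whole per-frame job can shrink to two commands. A hedged WebGPU sketch in TypeScript; cullPipeline, drawPipeline, sceneBindGroup, and OBJECT_COUNT are assumed to exist elsewhere, with sceneBindGroup exposing the argument buffer to the cull shader as storage:

    // 16 bytes = vertexCount, instanceCount, firstVertex, firstInstance (4 x u32).
    const indirectArgs = device.createBuffer({
      size: 16,
      usage: GPUBufferUsage.INDIRECT | GPUBufferUsage.STORAGE,
    });

    function renderFrame(encoder: GPUCommandEncoder, view: GPUTextureView) {
      // GPU decides visibility: the cull shader writes instanceCount itself.
      const cull = encoder.beginComputePass();
      cull.setPipeline(cullPipeline);
      cull.setBindGroup(0, sceneBindGroup);
      cull.dispatchWorkgroups(Math.ceil(OBJECT_COUNT / 64));
      cull.end();

      // The CPU issues exactly one draw; the arguments live on the GPU.
      const pass = encoder.beginRenderPass({
        colorAttachments: [{ view, loadOp: "clear", storeOp: "store" }],
      });
      pass.setPipeline(drawPipeline);
      pass.setBindGroup(0, sceneBindGroup);
      pass.drawIndirect(indirectArgs, 0);
      pass.end();
    }

The CPU never learns which objects survived culling; it only hands over the encoder.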

4. Where Universe Actually Sits

Universe's SDF path is already past where the AAA industry is going.

The industry spent a decade learning that the scene graph is the enemy. Nanite's genius is making the GPU manage the scene graph instead of the CPU. But they're still within the paradigm of "there are objects, and we must decide which ones to draw."

SDF raymarching doesn't manage objects. There are no objects. There's sceneSDF(vec3 p) — a function that returns a distance. No vertex buffers. No index buffers. No draw calls. No culling. No LOD chains. No scene graph. Just a function.
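A toy version of that function, to make "geometry IS math" concrete. Illustrative WGSL inside a TypeScript shader string, not Universe's actual shader; the scene is one sphere over a ground plane:

    // Illustrative only: the whole "scene" is two lines of math.
    const sdfWGSL = /* wgsl */ `
      fn sceneSDF(p: vec3f) -> f32 {
        let sphere = length(p - vec3f(0.0, 1.0, 0.0)) - 1.0;  // unit sphere at y=1
        let ground = p.y;                                     // plane y=0
        return min(sphere, ground);                           // union is min()
      }

      // Sphere tracing: step exactly as far as the field says is safe.
      fn raymarch(origin: vec3f, dir: vec3f) -> f32 {
        var t = 0.0;
        for (var i = 0; i < 128; i++) {
          let d = sceneSDF(origin + dir * t);
          if (d < 0.001) { return t; }  // hit a surface
          t += d;
          if (t > 100.0) { break; }     // left the scene
        }
        return -1.0;  // miss: sky
      }
    `;

One fullscreen triangle runs this per pixel. Adding an object means adding a term to the min(); there is no scene graph to delete because one never existed.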

Nanite: "Let the GPU manage the scene graph efficiently"
Lithos: "There is no scene graph"

Both get to 1 dispatch per frame. Lithos gets there with 69KB instead of a 30MB runtime.

5. What Universe Is Actually Missing

Not the rendering — the interactivity.

1. Dynamic Scenes

Objects that move, physics that collides. The SDF must be re-emitted when objects move. The bull breathes, but its position is baked. A game needs the bull to charge, knock over barrels, respond to the player. Per-frame SDF re-emission is the unsolved problem.
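One hedged partial answer, sketched here rather than claimed as Universe's plan: rigid motion doesn't strictly require re-emitting the SDF source, only re-parameterizing it. If transforms live in a uniform buffer, the function stays fixed and the per-frame CPU cost is one small write (buffer name and layout hypothetical):

    // Rigid motion without shader re-emission: the SDF reads a uniform.
    const bullTransform = device.createBuffer({
      size: 16,  // vec3f position + padding
      usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
    });

    function tick(timeMs: number) {
      // Per-frame CPU work: one buffer write, zero recompilation.
      const pos = new Float32Array([Math.sin(timeMs / 1000) * 5, 0, 0, 0]);
      device.queue.writeBuffer(bullTransform, 0, pos);
    }

    // WGSL side (illustrative):
    //   @group(0) @binding(0) var<uniform> bullPos: vec3f;
    //   fn sceneSDF(p: vec3f) -> f32 {
    //     return min(bullSDF(p - bullPos), terrainSDF(p));
    //   }

That covers a charging bull. Destruction, which changes the shape of the function itself rather than its parameters, remains the genuinely hard case.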

2. Collision & Physics

GJK/SAT/impulse solvers. Universe has no physics. The bull breathes but doesn't walk. A Nanite world has rigid body dynamics, ragdolls, destructible geometry. Universe has contemplation.
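Worth noting, though, that the distance field is unusually friendly to the physics Universe lacks: sceneSDF already answers the two core collision queries, "how deep am I?" and "which way is out?". A CPU-side sketch of that query layer in TypeScript, mirroring a toy sceneSDF; this is not an impulse solver, just the part GJK/SAT normally provide:

    // Toy CPU mirror of the scene function.
    type Vec3 = [number, number, number];
    function sceneSDF(p: Vec3): number {
      const sphere = Math.hypot(p[0], p[1] - 1, p[2]) - 1;  // unit sphere at y=1
      return Math.min(sphere, p[1]);                        // union with ground
    }

    // Contact normal: normalized gradient of the field, by central differences.
    function sdfNormal(p: Vec3, eps = 1e-3): Vec3 {
      const g: Vec3 = [
        sceneSDF([p[0] + eps, p[1], p[2]]) - sceneSDF([p[0] - eps, p[1], p[2]]),
        sceneSDF([p[0], p[1] + eps, p[2]]) - sceneSDF([p[0], p[1] - eps, p[2]]),
        sceneSDF([p[0], p[1], p[2] + eps]) - sceneSDF([p[0], p[1], p[2] - eps]),
      ];
      const len = Math.hypot(g[0], g[1], g[2]);
      return [g[0] / len, g[1] / len, g[2] / len];
    }

    // A sphere of radius r touches wherever sceneSDF(center) < r;
    // the push-out vector is (r - d) * sdfNormal(center). No mesh pairs.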

3. Networking

Multiplayer state sync. Universe is single-user. 100 players in a destructible building need server-authoritative tick rates, delta compression, client prediction. Universe needs none of this.

4. Asset Streaming

LOD chains, virtual textures, mipmap hierarchies. SDF doesn't need this — math is infinite resolution. You don't stream a polynomial. You evaluate it.

6. The Honest Architecture Map

What AAA Has That Universe Doesn't

  • GPU compute for scene management
  • Rigid body physics engine
  • Collision detection (GJK/SAT)
  • Asset pipeline (FBX/glTF import)
  • Animation state machines
  • Networking / multiplayer
  • Destructible geometry
  • Path-traced global illumination

What Universe Has That AAA Doesn't

  • Scene as pure function — sceneSDF(p)
  • Server-side shader specialization
  • Substrate-level compilation (Lithos)
  • Infinite resolution without LOD
  • 69KB full scene binary
  • Zero scene graph, zero GC
  • Inference in the same dispatch
  • Font table → AGX silicon path

What Both Converge Toward

  • One dispatch per frame
  • GPU does all work
  • CPU near-idle at render time
  • Pixel streaming to client
  • Compute-first architecture
  • No per-object CPU overhead

7. The Path Forward

The streaming mode (/virgo/stream) IS cloud gaming. Dawn WebGPU on M4 → JPEG over WebSocket → canvas.drawImage. That's GeForce NOW at home scale. No A100 datacenter required — an M4 Mac Mini renders the cosmos and streams it to any browser.

The Pipeline Today

lithos-emit.mjs (GLSL → WGSL) → Dawn WebGPU → M4 GPU render → JPEG → WebSocket → any browser
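The browser end of that pipeline is small enough to sketch in full. TypeScript; the host is hypothetical, the /virgo/stream path is the one above:

    const canvas = document.querySelector("canvas")!;
    const ctx = canvas.getContext("2d")!;
    const ws = new WebSocket("wss://example.local/virgo/stream");
    ws.binaryType = "blob";

    ws.onmessage = async (event) => {
      // Each message is one JPEG-encoded frame from the server GPU.
      const frame = await createImageBitmap(event.data as Blob);
      ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
      frame.close();  // release decoder memory promptly
    };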

The Lithos Endgame

scene.ls → font table compile → AGX megakernel → 40k threads on silicon → pixels

The Lithos endgame is more radical: instead of rendering through a graphics API, the megakernel dispatches 40k threads directly on AGX silicon through the font table. No API. No driver overhead. No runtime. The math IS the bytes on the GPU.

The industry converges on "GPU does everything"
Universe converges on "there's nothing for the GPU to manage — just a function to evaluate"

For a cosmos — for a contemplative universe of terrain and stars and zodiac homes — SDF is the right architecture. The industry's solutions are for games with 100 players shooting each other in destructible buildings. You're building a universe where someone walks through a meadow and listens to their sign's ambient soundscape.

Different problem, different solution. Yours is more elegant for what it does.