03 foundation
What a Frame Knows Before It Sees the Light
The G-Buffer and the geometry pass. Why deferred over forward, multi-render-target setup, G-Buffer layout, and writing scene data without a single light.
Deferred rendering begins with a question: what if the geometry pass and the lighting pass never had to meet?
The Triangle Was a Handshake. This Is a Conversation.
The last post ended with a triangle on screen and eight concepts in our toolkit. That was the proof — the CPU and GPU can talk. But the triangle pipeline was a straight line: vertices go in one end, pixels come out the other. One shader, one pass, one output.
A real frame is not a line. It is a graph of dependent transformations — passes that write data other passes need, resources that change meaning as the frame progresses, dependencies that must be resolved before anything executes. The moment you need two passes that share an intermediate result, the architecture has to answer questions the triangle pipeline never asked: who goes first? How does the second pass find the first pass’s output? Who makes sure the data is ready?
By the end of this post, we will write scene geometry into structured buffers without computing a single light. That sounds like a half-finished feature. It is actually the most important architectural decision in the pipeline — because deferred rendering is not a rendering technique. It is a separation of concerns.
Forward Rendering: The Simplest Thing That Works
Before we defer anything, let’s understand what we’re deferring from.
Forward rendering is the straightforward approach: for each object in the scene, run a shader that computes the final lit color in one pass. The vertex shader transforms the geometry. The fragment shader samples the material, loops over every light, accumulates their contributions, and writes the finished pixel.
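// Forward fragment shader (pseudocode, point lights assumed)
input:   worldPosition, worldNormal, albedo
uniform: lights[], cameraPos
main:
    normal  = normalize(worldNormal)
    viewDir = normalize(cameraPos - worldPosition)
    color   = vec3(0)
    for each light in lights:
        toLight  = light.position - worldPosition
        distance = length(toLight)
        lightDir = toLight / distance
        diffuse  = max(dot(normal, lightDir), 0) * light.color
        specular = blinnPhong(normal, lightDir, viewDir) * light.color
        atten    = 1.0 / (distance * distance)
        color   += (diffuse + specular) * atten
    output = vec4(albedo * color, 1.0)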
Simple. Direct. One pass, one output, done. And for many scenarios, this is the right answer.
Forward rendering has real advantages that are easy to overlook when you’re excited about deferred techniques. There are no intermediate buffers — no G-Buffer eating bandwidth with 16-bit float textures. There is no second fullscreen pass to read those buffers back. Every computation happens once, in place, for each visible fragment.
On tile-based GPUs (the architecture behind virtually every mobile GPU, and Apple Silicon), these advantages are amplified. Tile-based renderers split the screen into small tiles and process each tile entirely in on-chip memory. A forward shader that reads material data and writes a final color does everything inside the tile, never touching main memory for intermediates. The bandwidth savings are significant. This is why mobile engines and frameworks like Unity’s Universal Render Pipeline default to forward rendering on these platforms: not because deferred is too expensive to compute, but because the memory traffic it generates fights the hardware’s fundamental design.
The tradeoff appears when complexity grows. Forward rendering’s shading cost scales as O(fragments × lights): every rasterized fragment of every object evaluates every light. With one directional light and a simple scene, that’s cheap. With fifty point lights and overlapping geometry, the same pixel might be shaded three times by three different triangles that all land on the same screen position, and each time it evaluates all fifty lights. Overdraw multiplies the already-linear light cost.
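To put rough numbers on that, here is a back-of-the-envelope model. It is a sketch, not a profiler: it assumes no depth pre-pass and no light culling, it counts only light-loop evaluations, and the function names are made up for illustration.

```python
def forward_light_evals(width, height, overdraw, lights):
    """Forward: every rasterized fragment runs the full light loop."""
    return width * height * overdraw * lights


def deferred_light_evals(width, height, lights):
    """Deferred: the light loop runs once per screen pixel, after the
    depth test has already picked the visible surface."""
    return width * height * lights


# 1080p, three layers of overlapping geometry, fifty lights.
print(forward_light_evals(1920, 1080, overdraw=3, lights=50))  # 311040000
print(deferred_light_evals(1920, 1080, lights=50))             # 103680000
```

The deferred path still pays for overdraw, but only in cheap G-Buffer writes rather than in light evaluations.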
Forward rendering is not the wrong answer. It is the wrong answer for what we’re building. We want a pipeline where adding a new post-processing effect — ambient occlusion, screen-space reflections, bloom — means adding a new pass, not rewriting the lighting shader. That requires intermediate data that multiple passes can share. That requires deferred.
Worth noting: the industry hasn’t settled on a single winner. Tiled deferred, forward+, and clustered shading are hybrids that try to combine the bandwidth efficiency of forward with the scalability of deferred. They’re worth exploring, but not today. Today we commit to the deferred path and see where it leads.
Decomposing a Frame: Why Passes Exist
Here is the deeper question: why split a frame into independent stages at all? Why not write one enormous shader that handles geometry, lighting, ambient occlusion, tonemapping, and UI compositing in a single dispatch?
The answer is the same reason you decompose any program into functions.
Each pass has a single responsibility, declared inputs, declared outputs, and no knowledge of the passes around it. The geometry pass writes surface data. It does not know how many lights will illuminate that surface. The lighting pass reads surface data and computes illumination. It does not know how many triangles produced that data — or whether the data came from rasterization, ray tracing, or a neural network. The tonemap pass takes HDR values and maps them to displayable range. It doesn’t care whether those values came from deferred lighting, path tracing, or a lookup table.
Each pass is independently replaceable. You can swap the tonemapper without touching the lighting code. You can rewrite the geometry pass to support skinned meshes without changing anything downstream. The passes compose because they agree on an interface — the resources they share — and nothing else.
Here is a concrete example of what this buys you. Later in this series, we’ll implement screen-space ambient occlusion. SSAO reads the G-Buffer’s normal and depth data, computes an occlusion value per pixel, and writes it to a new texture. The lighting pass then reads that occlusion texture and darkens corners and crevices. Adding SSAO means:
- Declare a new pass with its inputs and outputs
- Write the shader
- Connect it to the graph
No existing pass changes. The geometry pass doesn’t know SSAO exists. The lighting pass gains one extra texture input. That’s it.
If this sounds familiar, it should. A pass is a function. Its inputs are the resources it reads — function arguments. Its outputs are the resources it writes — return values. A frame is a composition of these functions. And the system that describes which functions compose, in what order, with what dependencies? That’s a render graph. It is the composition operator for an entire frame.
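The pass-as-function idea fits in a few lines of Python. This is only a toy to make the analogy concrete, not the render graph the series builds later: `Pass`, `execution_order`, and the resource names are hypothetical, each resource is assumed to have exactly one producer, and `graphlib` requires Python 3.9 or newer.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Pass:
    name: str
    reads: tuple = ()   # resources consumed: the function's arguments
    writes: tuple = ()  # resources produced: the function's return values


def execution_order(passes):
    """Order passes so every resource is written before it is read."""
    producer = {res: p.name for p in passes for res in p.writes}
    sorter = TopologicalSorter()
    for p in passes:
        # A pass depends on whichever pass produces each resource it reads.
        deps = {producer[r] for r in p.reads if r in producer}
        sorter.add(p.name, *deps)
    return list(sorter.static_order())


# Declared in arbitrary order; the dependencies decide the schedule.
PASSES = [
    Pass("lighting", reads=("position", "normal", "albedo", "occlusion"), writes=("hdr",)),
    Pass("tonemap", reads=("hdr",), writes=("backbuffer",)),
    Pass("ssao", reads=("normal", "depth"), writes=("occlusion",)),
    Pass("geometry", writes=("position", "normal", "albedo", "depth")),
]

print(execution_order(PASSES))  # ['geometry', 'ssao', 'lighting', 'tonemap']
```

Removing the `ssao` entry (and the `occlusion` read) leaves the geometry and tonemap declarations untouched, which is the property the SSAO example above relies on.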
Post 2 said “the pipeline is immutable data.” This post takes the next step: each pass is a pure function, and a frame is a composition of pure functions. The render graph makes that composition explicit — but that’s the next post. First, let’s build the function that starts the frame.
The G-Buffer: A Frame’s Memory
The G-Buffer — geometry buffer — is a set of textures that store everything the lighting pass will need to compute light, without storing the light itself. Think of it as the geometry pass’s return value: a structured record of what the scene looks like from the camera’s perspective, broken into components that have independent physical meaning.
Each texture in the G-Buffer stores one aspect of the scene’s surfaces:
World Position (R16G16B16A16Sfloat): For every pixel, where in 3D space is the surface? This is the raw spatial information: the point in the world that this pixel “sees.” We use 16-bit floating point because it halves the bandwidth of a 32-bit format while still covering a large range (a scene might be hundreds of units wide). The cost is precision: a half float carries 11 significant bits, so between 256 and 512 units from the origin the representable values are 0.25 units apart. For a compact scene, that is accuracy we can live with at this stage; a large world would want 32-bit floats or the reconstruction trick. Note: you can reconstruct position from the depth buffer and the inverse view-projection matrix, which saves a whole render target. That’s an optimization for later. For now, the explicit position buffer keeps the lighting shader simple and the concepts clear.
World Normal (R16G16B16A16Sfloat): Which direction does the surface face at this pixel? The normal vector determines how light interacts with the surface: a surface facing the light receives full illumination, a surface facing away receives none. We use 16-bit float because normals live in the [-1, 1] range and need good precision across that range. Small errors in normals produce visible lighting artifacts, especially in specular highlights.
Albedo (R8G8B8A8Unorm): What color is this surface before any light hits it? This is the base reflectance (red brick, gray concrete, green grass), independent of illumination. Eight bits per channel is enough because albedo is bounded to [0, 1] and usually comes from 8-bit source textures anyway. You could argue for an sRGB format here, which spends more of those 8 bits on dark tones. Plain unsigned normalized keeps the shader math simple, and for this pipeline the precision loss is small enough to ignore.
Depth (implicit): How far from the camera is this surface? We get this for free from the hardware depth test, which every rendering pipeline needs anyway for correct occlusion. The depth buffer tells us which surfaces are in front of which, and later, when we implement SSAO, it will tell us how close surfaces are to each other.
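It helps to see what this layout costs and what a half float actually stores. The sketch below adds up bytes per pixel for the formats listed above and round-trips a few values through 16-bit float using Python’s `struct` module. The 4-byte depth format is an assumption (the post does not name one), and `gbuffer_bytes` and `to_half` are illustrative helpers.

```python
import struct

# Bytes per texel for each attachment. The depth format is assumed.
GBUFFER_LAYOUT = {
    "position": ("R16G16B16A16Sfloat", 8),
    "normal": ("R16G16B16A16Sfloat", 8),
    "albedo": ("R8G8B8A8Unorm", 4),
    "depth": ("D32Sfloat (assumed)", 4),
}


def gbuffer_bytes(width, height):
    """Total G-Buffer footprint in bytes at the given resolution."""
    per_pixel = sum(size for _, size in GBUFFER_LAYOUT.values())
    return per_pixel * width * height


def to_half(x):
    """Round x to the nearest 16-bit float and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(gbuffer_bytes(1920, 1080))  # 49766400 bytes, 24 per pixel
print(to_half(300.1))             # 300.0  (steps of 0.25 in this range)
print(to_half(1000.3))            # 1000.5 (steps of 0.5 in this range)
```

Roughly 50 MB written per frame at 1080p is the bandwidth bill the forward-rendering section warned about, and the rounding shows why half floats suit normals better than far-flung positions.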
The G-Buffer fragment shader is almost absurdly simple:
// G-Buffer fragment shader (pseudocode)
input:  worldPosition, worldNormal, uv
output: rt0 (position), rt1 (normal), rt2 (albedo)
main:
    // albedoColor is the material's base color, for example a texture lookup at uv
    rt0 = vec4(worldPosition, 1.0)
    rt1 = vec4(normalize(worldNormal), 0.0)
    rt2 = vec4(albedoColor, 1.0)
Three outputs. No lighting math. No light loop. No conditionals. The shader is trivially simple because it has been freed from lighting responsibility entirely. That simplicity is not a shortcut — it is the payoff of decomposition. The geometry pass does one thing, and it does it completely.
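To make the “return value” framing literal, here is the same shader written as a plain Python function. It is a CPU-side analogy, not engine code: `GBufferSample` and `gbuffer_fragment` are hypothetical names, vectors are ordinary tuples, and a zero-length normal is not handled.

```python
import math
from typing import NamedTuple, Tuple

Vec3 = Tuple[float, float, float]
Vec4 = Tuple[float, float, float, float]


class GBufferSample(NamedTuple):
    """One pixel's worth of G-Buffer data: the geometry pass's return value."""
    position: Vec4
    normal: Vec4
    albedo: Vec4


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def gbuffer_fragment(world_position: Vec3, world_normal: Vec3, albedo_color: Vec3) -> GBufferSample:
    # No lights, no loop, no branches: surface data in, surface data out.
    return GBufferSample(
        position=(*world_position, 1.0),
        normal=(*normalize(world_normal), 0.0),
        albedo=(*albedo_color, 1.0),
    )


sample = gbuffer_fragment((1.0, 2.0, 3.0), (0.0, 2.0, 0.0), (0.5, 0.5, 0.5))
print(sample.normal)  # (0.0, 1.0, 0.0, 0.0)
```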
Writing Geometry, Not Pixels
Setting up the G-Buffer pass means configuring a render pass with multiple outputs — something the triangle pipeline didn’t need.
The render pass declares three color attachments (position, normal, albedo) and one depth attachment. Each color attachment is cleared at the start and stored at the end. The depth attachment enables correct occlusion — closer surfaces overwrite farther ones.
The pipeline binds the vertex and fragment shaders, the vertex layout (position, normal, UV), and the rasterization state. Push constants carry the model matrix and the view-projection matrix: the minimum the vertex shader needs to transform geometry from object space to world space and from world space to clip space, which the rasterizer then maps to the screen.
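Two 4x4 float matrices come to exactly 128 bytes. If the backend is Vulkan (an assumption based on the format names used here), that is also the smallest `maxPushConstantsSize` the specification guarantees, so this block fits everywhere but leaves no room for anything else. A quick sketch of the packing, with `pack_push_constants` as a hypothetical helper:

```python
import struct

IDENTITY = (
    1.0, 0.0, 0.0, 0.0,
    0.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0,
    0.0, 0.0, 0.0, 1.0,
)


def pack_push_constants(model, view_proj):
    """Pack two 4x4 float matrices (16 floats each) into one contiguous block."""
    if len(model) != 16 or len(view_proj) != 16:
        raise ValueError("each matrix needs exactly 16 floats")
    # 32 little-endian 32-bit floats: model first, then view-projection.
    return struct.pack("<32f", *model, *view_proj)


block = pack_push_constants(IDENTITY, IDENTITY)
print(len(block))  # 128
```

Anything beyond these two matrices, such as a separate normal matrix, would have to move to a uniform buffer or rely on a device limit above the guaranteed minimum.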
The entire pass, in pseudocode:
beginRenderPass(cmd, gbufferPass, framebuffer,
    clearColors: [black, black, black],
    clearDepth: 1.0)
bindPipeline(cmd, gbufferPipeline)
bindVertexBuffer(cmd, meshVertexBuffer)
bindIndexBuffer(cmd, meshIndexBuffer)
pushConstants(cmd, { model: modelMatrix, viewProj: viewProjMatrix })
drawIndexed(cmd, indexCount)
endRenderPass(cmd)
Clear. Bind. Push. Draw. That is the entire geometry pass. No lighting uniforms, no light count, no shadow maps. The pass doesn’t know those concepts exist.
Now picture what comes out:
A box with typed inputs and typed outputs. A function signature drawn as a diagram.
In the engine, I can switch a debug visualization to show any of these buffers in real time. The position buffer renders as a colorful map of world-space coordinates: surfaces closer to the origin are dark, surfaces farther away are bright, each axis mapped to a color channel. The normal buffer shows surface directions as RGB: flat surfaces facing up are green, walls whose normals point along +Z (toward a camera looking down −Z) are blue, angled surfaces blend between channels. The albedo buffer shows the scene’s unlit color, the surface texture as if every light were turned off.
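The color mapping behind those debug views is small enough to sketch. This assumes the raw buffer values are simply clamped to [0, 1] for display, which matches the description above (an up-facing normal shows as pure green). The helper names are hypothetical, and `remap_normal` is an optional alternative, not something the engine is known to do.

```python
def clamp01(x):
    return max(0.0, min(1.0, x))


def to_rgb8(vec3):
    """Clamp each component to [0, 1] and quantize to an 8-bit channel."""
    return tuple(round(clamp01(c) * 255) for c in vec3)


def remap_normal(n):
    """Map [-1, 1] to [0, 1] so directions with negative components stay visible."""
    return tuple(c * 0.5 + 0.5 for c in n)


print(to_rgb8((0.0, 1.0, 0.0)))   # (0, 255, 0): facing up is green
print(to_rgb8((0.0, 0.0, 1.0)))   # (0, 0, 255): facing +Z is blue
print(to_rgb8((0.0, -1.0, 0.0)))  # (0, 0, 0): facing down clamps to black
```

With a raw clamp, every direction with only negative components collapses to black, which is why many debug views apply a remap like `remap_normal` first.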
Each buffer tells a different story about the scene. Together, they tell the whole story — except for light.
Everything the Light Needs, Nothing It Doesn’t
Let’s take stock of where we are.
We have a pass that transforms scene geometry into structured data. For every visible pixel, we now know three things: where the surface is in the world, which direction it faces, and what color it is before illumination. A lighting algorithm — any lighting algorithm — could take this data and produce a fully lit image. The information is complete.
But we haven’t written that algorithm yet. And beyond the algorithm itself, we haven’t answered a harder set of questions. How does the lighting pass find the G-Buffer? The textures exist in GPU memory, but nothing connects them to the shader that needs them. How does the system guarantee the geometry pass finishes before the lighting pass reads its output? On a GPU, execution is asynchronous — without explicit synchronization, the lighting shader might sample textures that are still being written. And how do we describe this dependency between passes without hard-coding the execution order into the application?
These are not rendering problems. They are composition problems. And they do not arise in a forward pipeline — because a forward pipeline has no composition. One pass, one output, no intermediates. The cost of decomposition is that you need a system to recompose.
The G-Buffer is a promise. It says: here is everything you need to light this frame. The surface data is there. The normal data is there. The albedo data is there. But a promise is not a program.
To turn these buffers into a lit image, we need two things: a lighting pass that reads the G-Buffer and computes illumination, and a system that composes passes — that figures out the execution order, inserts the synchronization, and manages the resource transitions between them. The lighting pass is a shader. The composition system is something more interesting — it is a render graph, and it is pure data.
That is the next post. We will pick up these buffers, light them, and discover that the architecture for composing passes is the same architecture we’ve been building all along: describe the work as immutable data, let a pure function figure out the order, and push the side effects to the boundary.
The data is ready. The screen is dark. The light comes next.