Draft. Work in progress. Wording, structure, and claims may still change. Feedback welcome.

06 lighting

Multiple Point Lights — The Payoff of Deferred

Scaling lights in deferred shading. A per-frame light buffer, a pure packing codec, and per-light accumulation in the lighting pass.

When One Light Becomes Many

A single light is a tutorial. Lots of lights is the reason deferred shading exists. This is the post where the G-Buffer finally pays for itself.


The Bill Comes Due

The last post left the engine with a Blinn-Phong lighting pass that knew about exactly one light, threaded through push constants. That post also promised that the G-Buffer was going to make this scale. Time to cash that promise.

The interesting question isn’t “how do I add a for loop.” It’s where the per-light data has to live, what shape it takes, and which part of the engine writes it. Push constants stop being the answer the moment you have more than a couple of lights: Vulkan’s spec only guarantees 128 bytes, which is eight vec4s in total. That covers a camera position, a few flags, about three lights, and not much else. The whole transport has to change.

This post swaps the push-constant light for a per-frame storage buffer of lights, threads a lightCount through, and lets the shader loop. The shader edit is trivial. The interesting part — as usual — is what happens on the CPU side, where an immutable Scene has to become bytes the GPU can read without anyone reaching across the purity boundary.


Why Push Constants Are A Dead End

Push constants are the cheapest transport Vulkan exposes. The driver packs them into command-buffer state and the shader reads them through dedicated registers — no descriptor set, no buffer upload, no synchronization to think about. For one light, they were perfect.

Two reasons the model breaks for many:

  • Capacity. Vulkan guarantees 128 bytes of push-constant space. A single point light at vec4 position + vec4 color is already 32 bytes. Three lights plus the camera position come to 112 bytes, which is nearly the entire budget on a spec-minimum device. Real scenes want tens of lights.
  • Shape. Push constants are a fixed-size struct, declared in the shader at compile time. A variable-length array of lights doesn’t fit the model. You can simulate one with a max-size array and a count, but you’ve now picked an arbitrary maximum, baked it into the pipeline, and have to recompile to change it.
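
To make the capacity arithmetic concrete, here is a small sketch of the budget on a spec-minimum device. The per-light size (two vec4s) and the camera vec4 come from the text above; the constant names are only illustrative.

```csharp
using System;

// Spec-minimum push-constant budget: Vulkan guarantees at least 128 bytes.
const int BudgetBytes = 128;
const int Vec4Bytes = 16;                  // 4 floats x 4 bytes
const int CameraBytes = Vec4Bytes;         // vec4 cameraPos
const int BytesPerLight = 2 * Vec4Bytes;   // vec4 position + vec4 color

// Integer division: how many whole lights fit after the camera position.
int maxLights = (BudgetBytes - CameraBytes) / BytesPerLight;
int used = CameraBytes + maxLights * BytesPerLight;

Console.WriteLine($"lights that fit: {maxLights}, bytes used: {used}, bytes left: {BudgetBytes - used}");
// lights that fit: 3, bytes used: 112, bytes left: 16
```

The 16 bytes left over are all that remains for mode flags and a light count, which is why the transport has to change rather than stretch.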

The standard escape hatch is a storage buffer — an SSBO, in GL parlance, or a VK_DESCRIPTOR_TYPE_STORAGE_BUFFER in Vulkan. Variable length, big enough for any reasonable scene, addressable as a flat array in the shader. The cost is one extra descriptor set and one extra buffer upload per frame. For lighting data that’s free.


The Shader Edit Is The Easy Part

Here is the lighting shader after the swap. I’m only showing what changed.

struct Light {
    vec4 positionType;    // xyz = world position, w = type tag
    vec4 directionPad;    // unused for point lights
    vec4 colorIntensity;  // rgb = color, a = intensity
};

layout(set = 1, binding = 0, std430) readonly buffer Lights {
    Light lights[];
};

layout(push_constant) uniform LightParams {
    vec4 cameraPos;
    int  shadingMode;
    int  lightingOnly;
    int  lightCount;
    // ...
} pc;

void main() {
    // ... sample G-Buffer the same way ...

    vec3 accum = vec3(0.0);
    for (int i = 0; i < pc.lightCount; i++) {
        accum += shadeLight(lights[i], fragPos, N, V, albedo,
                            specularStrength, shininess, stripAlbedo);
    }

    outColor = vec4(ambient + accum, 1.0);
}

Three things to notice:

  1. The buffer is variable length. Light lights[] is a runtime-sized array. The shader has no compile-time opinion about how many lights exist; it learns the count from pc.lightCount each frame.
  2. The light is already shaped for variants. positionType.w is a type tag. Right now there’s only one type, point, so the tag is always zero. The directionPad field is dead weight. That’s deliberate: the next post adds directional lights as a second variant, packed into the same buffer, and the layout already has the room. It’s the same trick as the G-Buffer alpha channels in the last post: find the slack in your data shape and let it work for you.
  3. shadeLight is a pure function. All per-light logic moves into a helper that takes a Light and the surface state and returns a color. The for loop in main is just a fold over the buffer.
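
One way to see the “pure function plus fold” shape is to mirror it on the CPU. The sketch below is hypothetical C#, not the post’s GLSL: the Light and Surface records, the ShadeLight name, and the intensity / (1 + d²) attenuation are all assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Numerics;

var lights = new[]
{
    new Light(new Vector3(0f, 2f, 0f), new Vector3(1f, 1f, 1f), 4f),
    new Light(new Vector3(3f, 1f, 0f), new Vector3(1f, 0f, 0f), 2f),
};
var surface = new Surface(
    Position: Vector3.Zero,
    Normal: Vector3.UnitY,
    ViewDir: Vector3.Normalize(new Vector3(0f, 1f, 1f)),
    Albedo: new Vector3(0.8f, 0.8f, 0.8f),
    SpecularStrength: 0.5f,
    Shininess: 32f);

// The shader's for loop as a fold: start from zero, add each light's contribution.
Vector3 accum = lights.Aggregate(Vector3.Zero, (sum, l) => sum + Shading.ShadeLight(l, surface));
Console.WriteLine(accum);

// Hypothetical, simplified stand-ins for the shader-side data.
public readonly record struct Light(Vector3 Position, Vector3 Color, float Intensity);
public readonly record struct Surface(
    Vector3 Position, Vector3 Normal, Vector3 ViewDir,
    Vector3 Albedo, float SpecularStrength, float Shininess);

public static class Shading
{
    // Pure: the result depends only on the arguments, nothing is mutated.
    public static Vector3 ShadeLight(Light light, Surface s)
    {
        Vector3 toLight = light.Position - s.Position;
        float distSq = MathF.Max(toLight.LengthSquared(), 1e-8f); // avoid dividing by zero
        Vector3 lightDir = toLight / MathF.Sqrt(distSq);

        // Assumed falloff; the post does not show its attenuation model.
        float attenuation = light.Intensity / (1f + distSq);

        // Blinn-Phong: Lambert diffuse plus a half-vector specular lobe.
        float nDotL = MathF.Max(Vector3.Dot(s.Normal, lightDir), 0f);
        Vector3 halfVec = Vector3.Normalize(lightDir + s.ViewDir);
        float spec = nDotL > 0f
            ? MathF.Pow(MathF.Max(Vector3.Dot(s.Normal, halfVec), 0f), s.Shininess)
            : 0f;

        Vector3 diffuse = s.Albedo * nDotL;
        Vector3 specular = new Vector3(s.SpecularStrength * spec);
        return (diffuse + specular) * light.Color * attenuation;
    }
}
```

Because ShadeLight has no hidden inputs, adding a light changes only the length of the sequence being folded, never the helper itself.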

Per-pixel cost goes from O(1) to O(L), where L is the light count. That’s exactly what we wanted: lighting cost now depends on the number of lights and nothing else. That decoupling was the whole motivation for going deferred. Geometry cost is paid once in the G-Buffer pass, and lighting cost scales independently of it.


The CPU Side Is Where The Functional Story Lives

The shader sees an array of Light structs in std430 layout. The scene sees an immutable record PointLight(position, color, intensity). Something has to translate.

The naive translator is a method on Scene that copies fields into a writeable buffer. That works, but it puts GPU layout knowledge inside the pure scene module, which is exactly where the project rules say it doesn’t belong. The boundary fix is a pure codec — a free function from domain types to a GPU-shaped struct — that lives by itself, has a single responsibility, and is testable without spinning up a Vulkan device.

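// RenderLab.Scene/GpuLight.cs
using System.Numerics;
using System.Runtime.InteropServices;

// Mirrors the shader's std430 Light struct: three vec4s, 48 bytes per light.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public readonly record struct GpuLight(
    Vector4 PositionType,
    Vector4 DirectionPad,
    Vector4 ColorIntensity)
{
    public const int TypePoint = 0;
}

// RenderLab.Scene/LightPacking.cs
using System;
using System.Numerics;
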
public static class LightPacking
{
    public static GpuLight Pack(PointLight light) => new(
        PositionType: new Vector4(light.Position, GpuLight.TypePoint),
        DirectionPad: Vector4.Zero,
        ColorIntensity: new Vector4(light.Color, light.Intensity.Value));

    public static int PackInto(ReadOnlySpan<Light> lights, Span<GpuLight> destination)
    {
        if (destination.Length < lights.Length)
            throw new ArgumentException(...);

        int written = 0;
        for (int i = 0; i < lights.Length; i++)
            if (lights[i] is PointLight p)
                destination[written++] = Pack(p);
        return written;
    }
}

A few moves worth naming:

  • Pack is a one-liner free function. Domain in, GPU layout out, no allocations, no side effects. This is the kind of code you can stare at and verify by eye.
  • PackInto writes into a span the caller owns. No allocation inside the codec — the imperative shell holds the buffer (it has to; the buffer is a Vulkan resource), and the codec just copies bytes into it. This is the right shape for a hot path: zero garbage, no hidden allocations, but the logic — the field mapping — stays pure.
  • It’s covered by tests. LightPackingTests verifies that the byte layout matches what the shader expects. Those tests run on a CI box with no GPU, no driver, no swapchain — they just compare struct fields. That’s the dividend pure code keeps paying.
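
A layout check along these lines can run without a GPU. This is a sketch, not the post’s actual LightPackingTests: it redeclares GpuLight locally so it compiles on its own, and the sample values are made up.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

var packed = new GpuLight(
    PositionType: new Vector4(1f, 2f, 3f, GpuLight.TypePoint),
    DirectionPad: Vector4.Zero,
    ColorIntensity: new Vector4(0.5f, 0.25f, 1f, 8f));

// Reinterpret the struct as raw floats, which is how the std430 buffer sees it.
ReadOnlySpan<GpuLight> one = new[] { packed };
ReadOnlySpan<float> floats = MemoryMarshal.Cast<GpuLight, float>(one);

Console.WriteLine(Unsafe.SizeOf<GpuLight>()); // 48: three vec4s, the stride the shader expects
Console.WriteLine(floats[3]);                 // 0: positionType.w, the type tag
Console.WriteLine(floats[8]);                 // 0.5: colorIntensity.r, at byte offset 32
Console.WriteLine(floats[11]);                // 8: colorIntensity.a, the intensity

// Local copy of the post's struct so the sketch is self-contained.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public readonly record struct GpuLight(
    Vector4 PositionType,
    Vector4 DirectionPad,
    Vector4 ColorIntensity)
{
    public const int TypePoint = 0;
}
```

In a real test project the Console.WriteLine calls would become assertions in whatever test framework the lab uses.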

The composition root then does the boring imperative-shell glue: ask the scene for its lights, get a mapped pointer to the per-frame SSBO, hand both to PackInto, write lightCount into the push constants. Side effects isolated to one place.
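
The glue step might look roughly like the following. This is a sketch under assumptions: a plain byte array stands in for the mapped SSBO memory, MaxLights is an invented capacity, and the domain types are simplified (intensity is a bare float here rather than the wrapper the codec above unwraps).

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

const int MaxLights = 64; // assumed buffer capacity, not from the post
byte[] mapped = new byte[MaxLights * Unsafe.SizeOf<GpuLight>()]; // stand-in for mapped SSBO memory

Light[] sceneLights =
{
    new PointLight(new Vector3(0f, 2f, 0f), new Vector3(1f, 1f, 1f), 4f),
    new PointLight(new Vector3(3f, 1f, 0f), new Vector3(1f, 0f, 0f), 2f),
};

// View the mapped bytes as GpuLight slots and let the pure codec fill them.
Span<GpuLight> slots = MemoryMarshal.Cast<byte, GpuLight>(mapped.AsSpan());
int lightCount = LightPacking.PackInto(sceneLights, slots);

// lightCount is the value that would go into the push constants for this frame.
Console.WriteLine(lightCount); // 2

// Simplified stand-ins for the post's domain and GPU types.
public abstract record Light;
public sealed record PointLight(Vector3 Position, Vector3 Color, float Intensity) : Light;

[StructLayout(LayoutKind.Sequential, Pack = 4)]
public readonly record struct GpuLight(Vector4 PositionType, Vector4 DirectionPad, Vector4 ColorIntensity);

public static class LightPacking
{
    public static int PackInto(ReadOnlySpan<Light> lights, Span<GpuLight> destination)
    {
        if (destination.Length < lights.Length)
            throw new ArgumentException("destination is too small", nameof(destination));

        int written = 0;
        foreach (Light light in lights)
        {
            if (light is PointLight p)
            {
                destination[written++] = new GpuLight(
                    new Vector4(p.Position, 0f), Vector4.Zero, new Vector4(p.Color, p.Intensity));
            }
        }
        return written;
    }
}
```

With a real mapped pointer the span would be constructed over that memory instead of a managed array; the codec call itself would not change.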


ImGui Catches Up

Adding lights at runtime needs UI. The lighting debug menu grew an “Add light” button, a per-light editor, and a remove button. The interesting part is how it stays compatible with the immutable scene.

The ImGui-facing function takes the current light list in, draws sliders, and returns a new list out. Edits are expressed with the record with expression: a modified copy of one light goes into a new immutable list with that one element replaced. Add and remove are ImmutableArray<T>.Add and RemoveAt. The pure UI module produces a UiMsg (the Elm-style discriminated union) for each operation, and the update function rebuilds the model. The pure scene snapshot for the next frame is a fresh value derived from the new model.
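
As a rough sketch of that update step, assuming a simplified PointLight and three hypothetical message cases (AddLight, SetIntensity, RemoveLight; the post’s real UiMsg cases are not shown), the whole thing reduces to a switch over messages that returns a new list:

```csharp
using System;
using System.Collections.Immutable;
using System.Numerics;

var model = ImmutableArray.Create(new PointLight(Vector3.Zero, Vector3.One, 1f));
var before = model;

model = LightUi.Update(model, new AddLight(new PointLight(new Vector3(1f, 2f, 3f), Vector3.One, 2f)));
model = LightUi.Update(model, new SetIntensity(Index: 1, Intensity: 5f));
model = LightUi.Update(model, new RemoveLight(Index: 0));

Console.WriteLine(model.Length);        // 1
Console.WriteLine(model[0].Intensity);  // 5
Console.WriteLine(before.Length);       // 1: the original list was never touched

public sealed record PointLight(Vector3 Position, Vector3 Color, float Intensity);

// Hypothetical message cases; an Elm-style union expressed as a record hierarchy.
public abstract record UiMsg;
public sealed record AddLight(PointLight Light) : UiMsg;
public sealed record SetIntensity(int Index, float Intensity) : UiMsg;
public sealed record RemoveLight(int Index) : UiMsg;

public static class LightUi
{
    // Pure update: old list and a message in, new list out.
    public static ImmutableArray<PointLight> Update(ImmutableArray<PointLight> lights, UiMsg msg) =>
        msg switch
        {
            AddLight m => lights.Add(m.Light),
            SetIntensity m => lights.SetItem(m.Index, lights[m.Index] with { Intensity = m.Intensity }),
            RemoveLight m => lights.RemoveAt(m.Index),
            _ => lights,
        };
}
```

SetItem is the ImmutableArray<T> call that returns a copy with one element replaced, which pairs naturally with the with expression.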

There is no light-list mutation anywhere in the engine. There is no observer pattern, no event bus, no “did this light change?” check. The renderer just consumes whatever scene it’s handed each frame and packs it into the SSBO. If the list is identical to last frame’s, the packed bytes are identical too, and the GPU does the same work; if it isn’t, the new state shows up next frame. State diffing is implicit in the “new value per frame” pipeline.

It’s not free: packing the whole list every frame costs cycles. For tens of lights, those cycles are nothing compared to the per-pixel shading work. For thousands of lights, the packing cost would become measurable, but at that scale the bottleneck is no longer the upload; it’s the per-pixel O(L) loop, which is the next problem and a different post.


Where The Loop Stops Being Cheap

Per-pixel cost is now O(L) for every covered pixel, regardless of whether a light contributes anything to that pixel. A point light a hundred meters away from everything visible, with attenuation that drops its contribution to effectively zero everywhere on screen, still costs one full Blinn-Phong evaluation per pixel.

That’s the next ceiling, and the standard answers to it are:

  • Light volumes. Render a sphere around each point light, only run the lighting shader on pixels covered by that sphere. Trades a per-pixel branch for some extra geometry work. Was the canonical trick from about 2005 to 2015.
  • Tiled deferred. Bin lights into screen-space tiles in a compute pre-pass, only iterate the bin’s lights per pixel. The current default for AAA engines from roughly 2012 onward.
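  • Clustered. Same idea as tiled but in 3D: bin lights into view-space frustum cells. Better for scenes where lights are spread out in depth, because a cell only collects the lights near its own depth range instead of every light along the tile’s view ray.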
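
All three schemes need a finite radius of influence per light, and that radius depends on the attenuation model. As a sketch only, assuming the falloff is intensity / (1 + d²) (the post does not show its actual attenuation) and an arbitrary cutoff threshold, the radius can be solved for directly:

```csharp
using System;

float intensity = 4f;
float threshold = 1f / 256f; // assumed cutoff: contributions below this are treated as zero

float radius = Culling.InfluenceRadius(intensity, threshold);
Console.WriteLine(radius); // roughly 32 for these values

public static class Culling
{
    // Solves intensity / (1 + d * d) = threshold for d.
    // Returns 0 when the light never rises above the threshold.
    public static float InfluenceRadius(float intensity, float threshold) =>
        intensity <= threshold ? 0f : MathF.Sqrt(intensity / threshold - 1f);
}
```

A sphere of that radius is what a light volume would rasterize, and what a tile or cluster binning pass would test against.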

I haven’t picked one yet. The lab’s roadmap puts ambient occlusion next, because the visible cost in the current scene is “ambient is a constant lie,” not “the light loop is too expensive.” When the demo scene grows enough lights for the loop to matter, a later post will pick a culling strategy. For now, the naive O(L) loop renders the demo at a healthy frame rate with ten or twenty lights. That “healthy frame rate” is measured honestly, because the swapchain is configured to tear rather than wait for vsync, so the number isn’t capped at the display’s refresh rate (see the field note on present modes).


What We Built, What We Didn’t

What ships in this post:

  • Per-frame lighting SSBO at set 1, binding 0, std430-laid-out, packed by a pure codec
  • Variable-length light array — the shader learns the count from a push constant
  • Pure LightPacking codec with a unit test that proves the byte layout
  • ImGui controls to add, edit, and remove lights at runtime, all going through the immutable scene
  • The same lighting math as the previous post, now folded over many lights

What I deliberately didn’t build:

  • A second light kind. Light is currently a discriminated union with one case (PointLight). The buffer’s directionPad field is dead weight, kept as slack for the next post.
  • Light culling. No volumes, no tiles, no clusters. Naive O(L) per pixel. This is a knowingly-deferred problem.
  • A smarter ambient term. Still 0.05 * albedo. The lie persists.
  • Shadows. Lights illuminate through walls. Real shadows are a separate, much larger arc.

The next post adds directional lights — the second variant the buffer was already shaped for — and replaces the constant ambient with a hemispheric one that at least responds to surface orientation. That’s the last visible piece of the basic lighting model before the lab’s narrative pivots from “make it lit” to “make the ambient stop lying,” which is what SSAO is for.