Picking a Swapchain Present Mode for a Lab

Technical note: why the engine uses FIFO_RELAXED with MinImageCount instead of the AAA default of Mailbox + MinImageCount + 1.

Status: Accepted. Code in VulkanSwapchain.cs reflects the decision.

Context

The default Vulkan swapchain advice — the one repeated in tutorials, in vendor samples, and in shipped AAA engines — is some flavour of:

prefer Mailbox, fall back to FIFO
imageCount = capabilities.MinImageCount + 1

The reasoning is sound when you are shipping a game. Mailbox lets the GPU produce frames as fast as it can and replaces any un-presented frame in the queue, so the player sees the freshest image without tearing. The extra image lets the CPU queue the next frame while the GPU works on the current one and the previous one is being scanned out — three-deep pipelining absorbs the occasional 20 ms spike and the player never sees a hitch.

The cost of that comfort is what made me stop and rethink. Both knobs exist to hide frame-time variance. A 25 ms frame that should be a visible failure becomes invisible because the buffer ahead of it papers over the gap. For a game that is exactly the goal. For a lab whose entire point is measuring and seeing what rendering techniques cost, it is the opposite of what I want.

The current engine uses Mailbox + MinImageCount + 1. Time to look at it again with the lab in mind, not the shipping game.

Options

Three configurations on the table. All three leave out IMMEDIATE, which never waits for vblank and can tear on any frame; it is useful for benchmarking max throughput but not for working on a feature.

1. Keep the AAA default — Mailbox + MinImageCount + 1. Smooth frames. Lowest perceived hitch on spikes. Hides frame-time overruns inside the extra buffering and inside the “replace-don’t-queue” semantics of Mailbox. Wrong tool when the overrun is the signal.

2. Strict FIFO + MinImageCount + 1. The vsync default, and the only present mode the Vulkan spec requires every implementation to support. A missed frame double-displays the previous one: a stutter rather than a tear. Still buffered, so a single late frame may still be hidden if the following frame catches up.

3. FIFO_RELAXED + MinImageCount (no +1). Take the minimum the surface gives you (often 2 on desktop) and present with FIFO_RELAXED. A missed frame tears on that frame instead of stuttering, because relaxed mode presents immediately when the previous vblank was missed. Less hidden buffering, tighter CPU/GPU coupling, and fewer queued frames between input and pixel than option 2.

What that buys for a lab:

Honest signal. Overruns are visible — a tear is a louder diagnostic than a stutter.
Lower latency. Fewer frames in flight between input and pixel.
Simpler mental model. A fixed budget per frame (16.67 ms at 60 Hz), hard pass/fail, no buffering tricks softening the edges.

What it costs:

Tighter CPU/GPU coupling. Without the +1 image, the CPU can’t always queue the next frame while the GPU is rendering and the previous is being presented. A 10 ms render technique can measure as ~16.67 ms wall-clock on a 60 Hz display, because an on-time frame still waits for the next vblank. Wall-clock frame time is no longer a clean technique-cost number; it becomes a budget gate.
Driver / compositor variance. Some compositors (notably Windows DWM in windowed mode) don’t honour FIFO_RELAXED and silently behave like FIFO. The lab still works; it just loses the relaxed behaviour on those paths.
MinImageCount is platform-dependent. Most desktop drivers return 2; some return 3. The “no +1” intent only really bites when the minimum is 2, so log the negotiated count on startup and don’t assume.
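
The wall-clock effect described above can be illustrated with a toy model. This is only a sketch under idealised assumptions: each frame starts exactly on a vblank, nothing else is queued, and the display runs at 60 Hz unless told otherwise. `PresentMode`, `PresentResult` and `ModelPresent` are hypothetical names for illustration, not engine code, and the sketch is written in C++ rather than the engine's C#.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for illustration only; not the engine's types.
enum class PresentMode { Fifo, FifoRelaxed };

struct PresentResult {
    double wallClockMs; // time from frame start until the image reaches the screen
    bool tears;         // true if the flip lands outside vertical blank
};

// Idealised model (assumption): the frame starts on a vblank and the
// presentation queue is otherwise empty.
PresentResult ModelPresent(PresentMode mode, double renderMs, double refreshHz = 60.0) {
    assert(renderMs >= 0.0 && refreshHz > 0.0);
    const double intervalMs = 1000.0 / refreshHz;

    // Number of refresh intervals needed to cover the render time (at least one).
    const double intervals = std::fmax(1.0, std::ceil(renderMs / intervalMs));
    const bool missedFirstVblank = renderMs > intervalMs;

    if (mode == PresentMode::FifoRelaxed && missedFirstVblank) {
        // Late frame: relaxed mode presents immediately, mid-scanout, so it tears.
        return {renderMs, true};
    }
    // On-time frame (either mode) or late frame under strict FIFO:
    // the image waits for the next vblank.
    return {intervals * intervalMs, false};
}
```

Under these assumptions a 10 ms frame reads as roughly 16.67 ms in either mode, while an 18 ms frame reads as roughly 33.33 ms under FIFO (a stutter) and as 18 ms with a tear under FIFO_RELAXED.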

Decision

Use FIFO_RELAXED with capabilities.MinImageCount (no +1). Fall back to FIFO if FIFO_RELAXED is not advertised.

The lab philosophy is the tiebreaker. This project exists to see and measure rendering techniques, not to ship the smoothest possible frame. Hiding overruns inside an extra image of latency is the kind of well-intentioned smoothing that makes a learning project lie to itself. A frame that takes 18 ms should look wrong, and with this configuration it does.

The asymmetry is the same shape as the VMA decision: the cost of the AAA default is silent and continuous (every overrun hidden); the cost of the lab default is loud and immediate (occasional tear when something goes wrong). Loud-and-immediate is the right cost structure for a place where the entire point is to learn.

Consequences

Code change. Three small edits in VulkanSwapchain.cs:

ChoosePresentMode prefers FIFO_RELAXED instead of Mailbox, with FIFO as the universal fallback.
imageCount = capabilities.MinImageCount (no +1), still clamped against MaxImageCount.
Log the negotiated (presentMode, imageCount, min, max) on startup so future-me isn’t surprised when a driver hands back 3.
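
A minimal sketch of those three edits, assuming a simplified model of the surface query results. The real code lives in VulkanSwapchain.cs (C#) and is not reproduced here; the enum, the `SurfaceCaps` struct and the `ChooseImageCount` and `DescribeSwapchain` helpers below are hypothetical stand-ins written in C++ that only mirror the shape of `VkPresentModeKHR` and `VkSurfaceCapabilitiesKHR`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins mirroring the Vulkan present modes and surface caps.
enum class PresentMode { Immediate, Mailbox, Fifo, FifoRelaxed };

struct SurfaceCaps {
    std::uint32_t minImageCount;
    std::uint32_t maxImageCount; // 0 means "no upper limit" in Vulkan
};

// Prefer FIFO_RELAXED; FIFO is always available, so it is the fallback.
PresentMode ChoosePresentMode(const std::vector<PresentMode>& available) {
    const bool hasRelaxed =
        std::find(available.begin(), available.end(), PresentMode::FifoRelaxed) != available.end();
    return hasRelaxed ? PresentMode::FifoRelaxed : PresentMode::Fifo;
}

// Request the surface minimum (no +1), still clamped against the maximum.
std::uint32_t ChooseImageCount(const SurfaceCaps& caps) {
    assert(caps.minImageCount >= 1);
    std::uint32_t count = caps.minImageCount;
    if (caps.maxImageCount != 0) {
        // Effectively a no-op when min <= max, kept so the clamp survives a later "+1".
        count = std::min(count, caps.maxImageCount);
    }
    return count;
}

const char* ToString(PresentMode mode) {
    switch (mode) {
        case PresentMode::Immediate:   return "IMMEDIATE";
        case PresentMode::Mailbox:     return "MAILBOX";
        case PresentMode::Fifo:        return "FIFO";
        case PresentMode::FifoRelaxed: return "FIFO_RELAXED";
    }
    return "UNKNOWN";
}

// Startup log line: (presentMode, imageCount, min, max).
std::string DescribeSwapchain(PresentMode mode, std::uint32_t imageCount, const SurfaceCaps& caps) {
    return std::string("presentMode=") + ToString(mode) +
           " imageCount=" + std::to_string(imageCount) +
           " min=" + std::to_string(caps.minImageCount) +
           " max=" + std::to_string(caps.maxImageCount);
}
```

One detail worth keeping in mind: Vulkan treats the requested count as a minimum, so the implementation may create more images than asked for. The count worth logging is the one queried back after swapchain creation, not only the one requested.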

Measurement discipline. Wall-clock frame time is now a budget gate, not a technique-cost metric. To measure how much a pass actually costs, use the GPU timestamp queries already available in RenderLab.Ui.ImGui (GpuTimestamps). Present-mode choice does not pollute timestamp deltas; it does pollute Stopwatch-style frame timing.
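
For reference, turning a pair of raw timestamp query values into milliseconds looks roughly like the sketch below. It is not the GpuTimestamps implementation, which is not shown in this note; `GpuTicksToMs` is a hypothetical helper in C++. It assumes the two values come from timestamps written around a pass, that `timestampPeriodNs` is the device's `VkPhysicalDeviceLimits::timestampPeriod` (nanoseconds per tick), and that `validBits` is the queue family's `timestampValidBits`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts two raw GPU timestamp values to milliseconds.
//   timestampPeriodNs : nanoseconds per timestamp tick (device limit)
//   validBits         : number of meaningful bits in each timestamp (1..64)
double GpuTicksToMs(std::uint64_t beginTicks, std::uint64_t endTicks,
                    double timestampPeriodNs, unsigned validBits = 64) {
    assert(timestampPeriodNs > 0.0 && validBits >= 1 && validBits <= 64);

    // Mask to the valid bits so a counter wrap between begin and end still
    // yields the correct (small) delta.
    const std::uint64_t mask =
        validBits == 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << validBits) - 1);
    const std::uint64_t delta = (endTicks - beginTicks) & mask;

    return static_cast<double>(delta) * timestampPeriodNs / 1.0e6; // ns -> ms
}
```

Because the delta is taken between two points on the GPU timeline, time spent waiting for vblank after the pass never enters the number, which is why it stays clean regardless of present mode.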

Tearing as feedback. A visible tear is now a pass-fail diagnostic. If a tear shows up while working on a new technique, the technique has overrun the budget. That’s the engine telling the truth, not a bug.

Compositor caveat. On Windows under DWM-composited windowed mode, FIFO_RELAXED may behave like FIFO. The lab will still run; overruns will manifest as stutters instead of tears in that mode. Documented but not worked around — fullscreen exclusive presentation is out of scope.

Reversibility. Single function, four lines of intent. If the project ever wants AAA-style smoothness for a public demo build, the swap is trivial and contained.

Functional angle

Same shape as VMA: this is a shell-layer choice, not an architectural one. The functional core (Graph, Scene, Functional) doesn’t know what present mode is in use. The render graph compiler doesn’t read it. Pass declarations are unchanged. The only thing that notices the change is the imperative shell inside RenderLab.Gpu — and the human looking at the screen.

That’s the test for a well-isolated decision: it lives behind the purity boundary, the pure half of the codebase is unaffected, and the cost of changing your mind later is one function.

Follow-ups (not in this note)

If/when the engine grows a “presentation mode” toggle for demos vs. benchmarks vs. lab work, this becomes one option among several. Today there is one mode, and it’s the lab mode.
Revisit if a future paper genuinely needs the latency profile of Mailbox to demonstrate a technique (e.g. low-latency reprojection work).
Revisit if the project ever exits windowed-mode DWM and gains a fullscreen-exclusive path where FIFO_RELAXED semantics are reliable.