Attaching final Playwright screenshots to PR comments
A useful PR review surface starts with one final screenshot per E2E test, a manifest the runner can consume, and selective artifact comments instead of raw report dumps.
Context
End-to-end UI tests often produce evidence that is technically available but operationally weak.
The report exists, screenshots exist, and traces exist, but the reviewer still has to leave the PR, find the artifact bundle, and reconstruct which image matters.
That is too much friction for a review loop that only needs quick answers to three questions:
- what did the final page state look like
- which spec produced it
- is the result relevant to the files that changed
Generating the artifacts is not the hard part. The hard part is turning Playwright output into a review surface that fits directly inside the PR flow.
Decision / Insight
Treat final UI screenshots as review artifacts with their own pipeline, not as incidental test byproducts.
The useful flow is:
Playwright final-state capture -> manifest -> runner-side selection -> optional public upload -> PR comment
That order matters. The test repo is only responsible for producing stable evidence. The PR runner is responsible for deciding whether that evidence belongs in the review conversation.
This keeps the boundary clean:
- the Playwright layer captures final screenshots consistently
- the manifest exposes them in a machine-readable shape
- the PR pipeline selects only artifacts relevant to the current change
- the comment stays compact instead of dumping the full report surface
Breakdown
Options considered
- Link only to the Playwright HTML report
  - Easy to produce.
  - Still forces the reviewer to navigate away from the PR and hunt for the meaningful screenshot.
- Attach every screenshot generated by the run
  - Maximizes visibility.
  - Bloats the PR comment and weakens the signal when only a small part of the E2E suite matters to the change.
- Capture one final screenshot per test and comment only the relevant subset
  - Requires a small artifact pipeline.
  - Produces a much better PR review surface.
Trade-offs
- Final full-page screenshots add storage and upload steps, but make visual review faster.
- Selecting artifacts from changed files reduces noise, but means the pipeline needs a reliable mapping from spec to screenshot.
- Public uploads make GitHub image embedding simple, but require retention and permission handling outside the repo.
- Fallback-to-local behavior keeps PR creation resilient, but produces a weaker review surface when upload fails.
Constraints
The pipeline stays useful only if it remains narrow:
- capture the last visible page state, not every intermediate transition
- write artifacts to a predictable repo-local path
- emit a manifest the outer runner can parse without inference
- select artifacts based on changed E2E specs or E2E infrastructure changes
- keep PR comments compact with a limited number of inline images
- do not block PR creation if artifact upload fails
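The "parse without inference" constraint is easiest to enforce on the runner side with a strict reader. The sketch below is a minimal TypeScript version, assuming the manifest is a JSON object with an `artifacts` array whose entries carry the six fields listed under Implementation. The field names (`screenshotPath`, `testTitle`, and so on) and `parseManifest` are hypothetical, not an established schema.

```typescript
// Hypothetical entry shape; the field names are illustrative, not a fixed schema.
interface ReviewArtifact {
  screenshotPath: string;
  testTitle: string;
  testFile: string;
  route: string;
  projectName: string;
  createdAt: string; // ISO-8601 timestamp
}

const REQUIRED_FIELDS: readonly (keyof ReviewArtifact)[] = [
  "screenshotPath",
  "testTitle",
  "testFile",
  "route",
  "projectName",
  "createdAt",
];

// Parse the manifest strictly: every entry must carry every field as a
// non-empty string, so the runner never guesses at missing data.
function parseManifest(json: string): ReviewArtifact[] {
  const data: unknown = JSON.parse(json);
  const artifacts = (data as { artifacts?: unknown } | null)?.artifacts;
  if (!Array.isArray(artifacts)) {
    throw new Error('manifest must be an object with an "artifacts" array');
  }
  return artifacts.map((raw: unknown, index: number) => {
    if (typeof raw !== "object" || raw === null) {
      throw new Error(`entry ${index} is not an object`);
    }
    const record = raw as Record<string, unknown>;
    const entry: Partial<ReviewArtifact> = {};
    for (const field of REQUIRED_FIELDS) {
      const value = record[field];
      if (typeof value !== "string" || value.length === 0) {
        throw new Error(`entry ${index} has a missing or empty "${field}"`);
      }
      entry[field] = value;
    }
    // Reject timestamps that Date cannot parse instead of guessing a value.
    if (Number.isNaN(Date.parse(entry.createdAt as string))) {
      throw new Error(`entry ${index} has an unparseable createdAt`);
    }
    return entry as ReviewArtifact;
  });
}
```

Rejecting an incomplete entry outright, rather than patching it, keeps a capture bug in the test repo visible instead of silently dropping screenshots from the PR comment.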
Implementation
In the site repo, Playwright captures one final full-page screenshot after each E2E test and records a structured artifact entry with:
- screenshot path
- test title
- test file
- route
- project name
- creation timestamp
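A capture hook could build each entry from Playwright's per-test metadata. This sketch assumes a hypothetical `buildArtifactEntry` helper and file-naming scheme. It only reads the `title`, `file`, and `project.name` properties of Playwright's `TestInfo`, modeled by a small interface so the logic runs without a browser; the `test.afterEach` wiring is shown as a comment.

```typescript
import * as path from "node:path";

// Only the parts of Playwright's TestInfo this sketch reads.
interface TestInfoLike {
  title: string;
  file: string;
  project: { name: string };
}

// Hypothetical entry shape; field names are illustrative.
interface ReviewArtifact {
  screenshotPath: string;
  testTitle: string;
  testFile: string;
  route: string;
  projectName: string;
  createdAt: string;
}

const OUTPUT_DIR = path.join("output", "playwright");

// Lowercase, collapse anything outside [a-z0-9] into single dashes,
// and trim dashes from both ends.
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Build the manifest entry for one finished test. `rootDir` makes the spec
// path repo-relative so the runner can match it against changed files.
function buildArtifactEntry(info: TestInfoLike, pageUrl: string, now: Date, rootDir: string): ReviewArtifact {
  const spec = path.basename(info.file).replace(/\.(spec|test)\.[cm]?[jt]sx?$/, "");
  const fileName = `${slugify(info.project.name)}--${slugify(spec)}--${slugify(info.title)}.png`;
  return {
    screenshotPath: path.join(OUTPUT_DIR, fileName),
    testTitle: info.title,
    testFile: path.relative(rootDir, info.file),
    route: new URL(pageUrl).pathname,
    projectName: info.project.name,
    createdAt: now.toISOString(),
  };
}

// Assumed wiring in a shared spec setup (not executed here):
//   test.afterEach(async ({ page }, testInfo) => {
//     const entry = buildArtifactEntry(testInfo, page.url(), new Date(), process.cwd());
//     await page.screenshot({ path: entry.screenshotPath, fullPage: true });
//     // ...then write `entry` to disk for global teardown to collect.
//   });
```

Deriving the file name from project, spec, and title keeps paths predictable across runs, but two tests with the same title in one spec would collide; folding in the full title path would avoid that. Each test's entry would then be persisted next to its screenshot.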
Those entries are collected under output/playwright/ and compiled into output/playwright/review-artifacts.json during global teardown.
That gives the outer runner a deterministic input instead of asking it to inspect Playwright internals or parse report HTML.
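The teardown step can stay small. The sketch below assumes each test wrote its entry as a hypothetical `<name>.artifact.json` file in the output directory; `compileManifest` (also hypothetical) reads those files in sorted order and writes them as the `artifacts` array of `review-artifacts.json`. Playwright's `globalTeardown` config option, which points at a module whose default export runs once after all tests, is one place to call it.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical naming convention for per-test entry files written during the run.
const ENTRY_SUFFIX = ".artifact.json";
const MANIFEST_NAME = "review-artifacts.json";

// Collect every per-test entry in `outputDir` into one manifest file and
// return the number of entries written.
function compileManifest(outputDir: string): number {
  // Create the directory if no test wrote anything, so an (empty) manifest
  // is still produced for the runner.
  fs.mkdirSync(outputDir, { recursive: true });
  const entries: unknown[] = fs
    .readdirSync(outputDir)
    .filter((name) => name.endsWith(ENTRY_SUFFIX))
    .sort() // deterministic order, independent of directory listing order
    .map((name) => JSON.parse(fs.readFileSync(path.join(outputDir, name), "utf8")));
  const manifest = { artifacts: entries };
  fs.writeFileSync(path.join(outputDir, MANIFEST_NAME), JSON.stringify(manifest, null, 2) + "\n");
  return entries.length;
}

// Possible globalTeardown module (not executed here):
//   export default async function globalTeardown(): Promise<void> {
//     compileManifest(path.join("output", "playwright"));
//   }
```

Writing an empty manifest when no test produced artifacts means the runner can always expect the file to exist and never has to distinguish "no screenshots" from "capture step missing".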
In Night Shift, the github-issues profile reads that manifest after the code change is complete and validations pass.
It then:
- loads all available review artifacts
- selects only the artifacts tied to changed E2E specs, or all artifacts when shared E2E infrastructure changed
- preserves the selected files inside the task directory
- optionally uploads them to a public droplet path
- comments on the new draft PR with one inline image plus links to the remaining uploaded artifacts
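The selection and preservation steps could look like the following sketch. It assumes changed files arrive as repo-relative paths (for example from `git diff --name-only`), that manifest `testFile` values are repo-relative, and that the prefixes counted as shared E2E infrastructure (`playwright.config.`, `e2e/fixtures/`, `e2e/support/`) are placeholders for whatever the site repo actually uses. `selectArtifacts` and `preserveArtifacts` are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface ReviewArtifact {
  screenshotPath: string; // repo-relative, as written in the manifest
  testFile: string;       // repo-relative spec path
  testTitle: string;
}

// Hypothetical markers for shared E2E infrastructure; adjust to the repo layout.
const DEFAULT_INFRA_PREFIXES = ["playwright.config.", "e2e/fixtures/", "e2e/support/"];

// Normalize separators and a leading "./" so git output and manifest paths compare equally.
function normalize(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}

function selectArtifacts(
  artifacts: ReviewArtifact[],
  changedFiles: string[],
  infraPrefixes: string[] = DEFAULT_INFRA_PREFIXES,
): ReviewArtifact[] {
  const changed = changedFiles.map(normalize);
  // A change to shared E2E infrastructure can affect every test, so keep them all.
  if (changed.some((file) => infraPrefixes.some((prefix) => file.startsWith(prefix)))) {
    return [...artifacts];
  }
  // Otherwise keep only artifacts whose spec file itself changed.
  const changedSet = new Set(changed);
  return artifacts.filter((artifact) => changedSet.has(normalize(artifact.testFile)));
}

// Copy selected screenshots into the task directory so they outlive the
// Playwright output folder; returns the preserved paths.
function preserveArtifacts(selected: ReviewArtifact[], repoDir: string, taskDir: string): string[] {
  const target = path.join(taskDir, "review-artifacts");
  fs.mkdirSync(target, { recursive: true });
  return selected.map((artifact) => {
    const destination = path.join(target, path.basename(artifact.screenshotPath));
    fs.copyFileSync(path.join(repoDir, artifact.screenshotPath), destination);
    return destination;
  });
}
```

Matching on exact spec paths is what makes the spec-to-screenshot mapping reliable; anything fuzzier would reintroduce noise. Copies are keyed by basename, so the capture side must keep screenshot names unique.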
If upload is not configured or fails, the PR still gets created. The comment falls back to local preserved paths instead of embedded public images.
That fallback is important. The screenshot pipeline improves review quality, but it is not allowed to become a write-path dependency for opening the PR itself.
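The fallback rule can be expressed through ordering: open the PR first, then treat uploads and the comment as best-effort. In this sketch `Uploader`, `openDraftPr`, and `postComment` are hypothetical callbacks standing in for the droplet upload and the GitHub API calls. A missing or failed upload leaves that artifact with its local preserved path, and only the first uploaded image is embedded inline.

```typescript
interface SelectedArtifact {
  localPath: string; // preserved copy inside the task directory
  testTitle: string;
  testFile: string;
}

// Hypothetical uploader: resolves to a public URL or rejects.
type Uploader = (localPath: string) => Promise<string>;

interface CommentLink {
  artifact: SelectedArtifact;
  url?: string; // undefined when upload is unconfigured or failed
}

// Try to upload each artifact; a failure only downgrades that artifact to its local path.
async function resolveLinks(artifacts: SelectedArtifact[], upload?: Uploader): Promise<CommentLink[]> {
  const links: CommentLink[] = [];
  for (const artifact of artifacts) {
    let url: string | undefined;
    if (upload) {
      try {
        url = await upload(artifact.localPath);
      } catch {
        url = undefined;
      }
    }
    links.push({ artifact, url });
  }
  return links;
}

const label = (artifact: SelectedArtifact): string => `${artifact.testFile}: ${artifact.testTitle}`;

// One inline image (the first uploaded artifact), then one line per remaining artifact.
function buildComment(links: CommentLink[]): string {
  if (links.length === 0) return "";
  const inline = links.find((link) => link.url !== undefined);
  const lines = ["Final UI screenshots", ""];
  if (inline) {
    lines.push(`**${label(inline.artifact)}**`, "", `![${inline.artifact.testTitle}](${inline.url})`, "");
  }
  for (const link of links) {
    if (link === inline) continue;
    lines.push(
      link.url
        ? `- [${label(link.artifact)}](${link.url})`
        : `- ${label(link.artifact)}: local copy at \`${link.artifact.localPath}\``,
    );
  }
  return lines.join("\n");
}

// The PR is opened before any artifact work, so uploads or commenting can
// only degrade the review surface, never block the PR itself.
async function publishReview(
  openDraftPr: () => Promise<number>,
  postComment: (prNumber: number, body: string) => Promise<void>,
  artifacts: SelectedArtifact[],
  upload?: Uploader,
): Promise<number> {
  const prNumber = await openDraftPr();
  try {
    const body = buildComment(await resolveLinks(artifacts, upload));
    if (body) await postComment(prNumber, body);
  } catch (error) {
    console.warn("review artifact comment skipped:", error);
  }
  return prNumber;
}
```

Falling back per artifact, rather than all-or-nothing, is a design choice: a partial upload still yields some clickable links, at the cost of a comment that mixes URLs and local paths.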
This follows the same architectural boundary Night Shift uses for GitHub issue backlogs: the runner owns the review surface, while the execution layer only produces bounded artifacts.
Reusable Takeaway
If Playwright screenshots are meant to help code review, treat them as first-class review artifacts.
A practical baseline is:
- capture one final screenshot per E2E test
- emit a manifest the outer pipeline can consume
- select artifacts from changed files instead of posting everything
- upload for inline PR embedding when available
- fall back gracefully when uploads fail
The non-obvious improvement is not better screenshots. It is better placement.
Once the final UI state is visible directly in the PR comment, the evidence stops behaving like test exhaust and starts behaving like review input.