DiffSense: A Local-First Pull Request Reviewer Built During Build Small
Abstract
DiffSense is a privacy-first pull request review assistant built for the Hugging Face Build Small hackathon. The app accepts either a unified diff or a public GitHub pull request URL, parses the changed files and hunks, runs a deterministic review engine for high-signal security and correctness risks, and renders the result as inline review comments with structured JSON output.
The core design choice is simple: the app must remain useful even when hosted model providers are unavailable, cold, rate-limited, or missing a particular model route. DiffSense therefore treats deterministic review as the always-on path and model inference as an enhancement layer. It exposes bridge points for JetBrains Mellum 2, NVIDIA Nemotron 3 Nano, NVIDIA Nemotron 3 Nano 4B, OpenBMB MiniCPM-V 4.6, and Modal, while also preparing persistent local checkpoint slots under the Space bucket mounted at /data.
Motivation
Code review is a daily workflow for engineering teams, but most AI review tools assume that source code can be sent to a third-party SaaS service. That assumption is often wrong. Teams working on customer data, unreleased products, internal APIs, regulated systems, or security-sensitive infrastructure may need review assistance without exporting private code.
DiffSense is aimed at that gap. It is not trying to replace a human reviewer with a black-box chat interface. Instead, it turns a diff into a concrete review artifact:
- severity-tagged findings,
- per-file and per-hunk locations,
- inline comments attached to changed lines,
- actionable fix suggestions,
- JSON output that can be copied into automation or a pull request workflow.
The hackathon constraint shaped the product in a useful way. Rather than building a large hosted reviewer that only works when every model endpoint is healthy, we built a small, inspectable workflow that starts from deterministic analysis and adds model passes where they make the product better.
Product Experience
The app is a Gradio Space with a three-part workspace:
- The left sidebar configures model and provider passes.
- The center pane accepts the diff or pull request URL and optional image uploads, and shows the summary and model trace after processing.
- The right pane shows the detailed inline review and structured JSON.
The user flow is intentionally short:
- Open the Space.
- Paste a unified diff or a public GitHub PR URL.
- Optionally upload PR screenshots, diagrams, or UI diffs.
- Click Review diff.
- Read inline comments and copy the structured JSON if needed.
For public GitHub PRs, DiffSense appends .diff to the pull request URL and fetches the public unified diff with a short timeout. Pasted diffs stay inside the app process unless a model/provider pass is explicitly enabled.
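The URL handling described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the names pr_diff_url and load_diff, the regular expression, and the 10-second timeout are assumptions chosen for the example.

```python
import re
import urllib.request
from typing import Optional

# Assumed pattern for https://github.com/<owner>/<repo>/pull/<number>[/...]
PR_URL_RE = re.compile(r"^https://github\.com/([^/\s]+)/([^/\s]+)/pull/(\d+)")


def pr_diff_url(text: str) -> Optional[str]:
    """Return the .diff URL for a GitHub PR URL, or None if text is not one."""
    match = PR_URL_RE.match(text.strip())
    if match is None:
        return None
    owner, repo, number = match.groups()
    return f"https://github.com/{owner}/{repo}/pull/{number}.diff"


def load_diff(text: str, timeout_seconds: float = 10.0) -> str:
    """Fetch the public diff for a PR URL; return pasted text unchanged."""
    url = pr_diff_url(text)
    if url is None:
        # Pasted diff: no network access happens on this path.
        return text
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read().decode("utf-8", errors="replace")
```

Only the PR URL branch touches the network, which matches the claim that pasted diffs stay inside the app process.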
Architecture
Unified diff or public GitHub PR URL
-> normalize input
-> fetch public .diff when needed
-> parse unified diff into files, hunks, and changed lines
-> run deterministic review rules
-> optionally summarize with Mellum bridge
-> optionally route/triage with Nemotron bridge
-> optionally sanity-check with Tiny Titan bridge
-> optionally process uploaded images with MiniCPM-V bridge
-> optionally POST to Modal endpoint
-> render summary, agent trace, inline diff review, and JSON
The app is implemented in a single app.py file to keep the Space easy to inspect during judging. The key pieces are:
- normalize_diff: accepts pasted diffs or public GitHub PR URLs.
- parse_unified_diff: converts unified diff text into file/hunk/line dataclasses.
- review_diff: applies deterministic code-review rules.
- summarize_with_model: narrows the model role to summarizing known findings.
- run_nemotron_router: produces routing/triage notes.
- run_tiny_titan_checker: produces a compact <=4B sanity-check path.
- run_minicpm_vision: accepts image uploads for PR screenshots and diagrams.
- render_review: renders a custom HTML diff view with inline findings.
- render_agent_trace: exposes model runtime and bridge status.
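To make the parsing step concrete, here is a hedged sketch of how a unified diff could be turned into file/hunk/line dataclasses. The dataclass names and fields are assumptions for illustration; the real parse_unified_diff in app.py may differ. The sketch uses the line counts in each hunk header to know where a hunk ends, so the next file's header lines are not mistaken for deletions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class DiffLine:
    kind: str                  # "add", "del", or "ctx"
    text: str
    new_lineno: Optional[int]  # None for deleted lines


@dataclass
class Hunk:
    header: str
    lines: List[DiffLine] = field(default_factory=list)


@dataclass
class FileDiff:
    path: str
    hunks: List[Hunk] = field(default_factory=list)


def parse_unified_diff(diff_text: str) -> List[FileDiff]:
    files: List[FileDiff] = []
    current_file: Optional[FileDiff] = None
    current_hunk: Optional[Hunk] = None
    old_left = new_left = new_lineno = 0
    for raw in diff_text.splitlines():
        if current_hunk is not None and (old_left > 0 or new_left > 0):
            # Inside a hunk body: classify by the first character.
            if raw.startswith("+"):
                current_hunk.lines.append(DiffLine("add", raw[1:], new_lineno))
                new_lineno += 1
                new_left -= 1
            elif raw.startswith("-"):
                current_hunk.lines.append(DiffLine("del", raw[1:], None))
                old_left -= 1
            elif raw.startswith(" ") or raw == "":
                current_hunk.lines.append(DiffLine("ctx", raw[1:], new_lineno))
                new_lineno += 1
                old_left -= 1
                new_left -= 1
            continue  # anything else, e.g. "\ No newline at end of file"
        if raw.startswith("+++ "):
            path = raw[4:].split("\t")[0].strip()
            if path.startswith("b/"):
                path = path[2:]
            current_file = FileDiff(path=path)
            files.append(current_file)
            current_hunk = None
            continue
        match = HUNK_RE.match(raw)
        if match and current_file is not None:
            current_hunk = Hunk(header=raw)
            current_file.hunks.append(current_hunk)
            old_left = int(match.group(2)) if match.group(2) is not None else 1
            new_lineno = int(match.group(3))
            new_left = int(match.group(4)) if match.group(4) is not None else 1
    return files
```

Keeping the new-file line number on every added line is what later allows findings to be attached to the exact changed line.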
Deterministic Review Engine
The deterministic path is the product's reliability layer. It parses added lines and checks for review risks that are common, high-signal, and easy to explain:
- hardcoded credentials,
- disabled TLS or JWT verification,
- unsafe pickle deserialization,
- dynamic execution via eval or exec,
- shell=True subprocess calls,
- SQL string interpolation,
- bare except:,
- temporary TODO, FIXME, or HACK markers,
- return-contract changes such as newly introduced return None,
- large behavior changes outside test files.
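A rule pass of this kind can be sketched as a table of regular expressions applied to added lines. Everything below is illustrative: the rule identifiers, the patterns, and the severity names other than "critical" are assumptions, and the actual rules in review_diff are likely more careful about false positives.

```python
import re

# Illustrative rules only: (rule id, severity, pattern, comment).
RULES = [
    ("hardcoded-credential", "critical",
     re.compile(r"(?i)\b(password|secret|api_key|token)\s*=\s*['\"][^'\"]+['\"]"),
     "Possible hardcoded credential."),
    ("disabled-verification", "critical",
     re.compile(r"verify\s*=\s*False|verify_signature['\"]?\s*:\s*False"),
     "Verification appears to be disabled."),
    ("dynamic-execution", "high",
     re.compile(r"\b(eval|exec)\s*\("),
     "Dynamic execution of a string."),
    ("shell-true", "high",
     re.compile(r"shell\s*=\s*True"),
     "subprocess call with shell=True."),
    ("bare-except", "medium",
     re.compile(r"^\s*except\s*:"),
     "Bare except hides unexpected errors."),
    ("temp-marker", "low",
     re.compile(r"\b(TODO|FIXME|HACK)\b"),
     "Temporary marker left in the change."),
]


def review_added_lines(path, added_lines):
    """Check added lines against RULES.

    added_lines is an iterable of (line_number, text) pairs for lines the
    diff adds; removed and context lines are not reviewed.
    """
    findings = []
    for lineno, text in added_lines:
        for rule_id, severity, pattern, comment in RULES:
            if pattern.search(text):
                findings.append({
                    "file": path,
                    "line": lineno,
                    "severity": severity,
                    "rule": rule_id,
                    "comment": comment,
                    "source": "deterministic",
                })
    return findings
```

Because each rule is a plain pattern with a fixed comment, every finding can be traced back to the line and rule that produced it.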
Each finding is normalized into this shape:
{
  "file": "src/auth.py",
  "hunk": "@@ -1,9 +1,13 @@",
  "line": 11,
  "severity": "critical",
  "category": "security",
  "comment": "The change disables a verification check, which can turn a trusted boundary into a bypass.",
  "suggestion": "Keep verification enabled and add a narrowly scoped test fixture for local development.",
  "source": "deterministic"
}
This made the app demoable under time pressure. Even if all hosted inference routes fail, the reviewer still produces useful output.
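The finding shape above could be mirrored in code with a small dataclass that serializes to the same JSON keys. This is a sketch under assumptions: the Finding class, the severity ordering, and findings_to_json are hypothetical names, and only "critical" is a severity value confirmed by the example.

```python
import json
from dataclasses import asdict, dataclass

# Assumed ordering; unknown severities sort last.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    file: str
    hunk: str
    line: int
    severity: str
    category: str
    comment: str
    suggestion: str
    source: str = "deterministic"


def findings_to_json(findings):
    """Serialize findings, most severe first, then by file and line."""
    ordered = sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER.get(f.severity, len(SEVERITY_ORDER)), f.file, f.line),
    )
    return json.dumps([asdict(f) for f in ordered], indent=2)
```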
Model and Provider Bridges
DiffSense integrates the hackathon model stack as optional bridge points rather than hard dependencies.
| Role | Model or Provider | Purpose |
|---|---|---|
| Code summary | JetBrains/Mellum2-12B-A2.5B-Instruct | Summarize deterministic findings and diff risk |
| Agentic routing | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Triage changed files, merge risk, and follow-up tests |
| Tiny checker | nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 | <=4B lightweight review sanity check |
| Visual context | openbmb/MiniCPM-V-4.6 | PR screenshot, UI diff, and diagram context |
| External runtime | Modal endpoint | Optional POST bridge via DIFFSENSE_MODAL_ENDPOINT |
The model prompts are intentionally constrained. For example, Mellum is asked to summarize deterministic findings rather than invent findings from scratch. This keeps the output auditable and prevents the model layer from undermining the review engine.
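As an illustration of what a constrained prompt might look like, the sketch below builds a summary prompt from already-computed findings. The wording is hypothetical and is not the prompt DiffSense actually sends; build_summary_prompt and max_findings are names invented for this example.

```python
import json


def build_summary_prompt(findings, max_findings=20):
    """Build a prompt that asks the model to summarize known findings only.

    findings is a list of JSON-serializable dicts produced by the
    deterministic pass. The cap keeps the prompt compact.
    """
    listed = findings[:max_findings]
    lines = [
        "You are summarizing a code review that has already been performed.",
        "Only restate and prioritize the findings listed below.",
        "Do not add findings that are not in the list.",
        "",
        "Findings (JSON):",
        json.dumps(listed, indent=2),
        "",
        "Write a short summary of overall merge risk in at most five sentences.",
    ]
    return "\n".join(lines)
```

Passing the findings as data, instead of passing the raw diff and asking for a review, is what keeps the model output checkable against the deterministic results.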
Local Checkpoint Strategy
The Space is configured with a read/write Hugging Face bucket mounted at /data. DiffSense creates and monitors these model slots:
/data/models/mellum2-instruct
/data/models/nemotron-3-nano-30b-a3b
/data/models/nemotron-3-nano-4b
/data/models/minicpm-v-4.6
Each slot is considered ready when it contains a config.json. Text-model bridge calls first check for local checkpoints before falling back to hosted Hugging Face Inference routes. This lets the app grow from a reliable deterministic demo into a local/ZeroGPU-backed model reviewer without committing checkpoints into the Space repo.
The app also reports model runtime status directly in the UI so judges can see the configured local-first paths.
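The readiness check lends itself to a short sketch. The slot directory names come from the list above; the role keys and the function names ensure_slots, slot_status, and choose_runtime are assumptions made for illustration.

```python
from pathlib import Path

# Role keys are hypothetical; directory names match the slots listed above.
MODEL_SLOTS = {
    "mellum": "mellum2-instruct",
    "nemotron-router": "nemotron-3-nano-30b-a3b",
    "tiny-checker": "nemotron-3-nano-4b",
    "minicpm-v": "minicpm-v-4.6",
}


def ensure_slots(models_root="/data/models"):
    """Create the slot directories if they do not exist yet."""
    for dirname in MODEL_SLOTS.values():
        (Path(models_root) / dirname).mkdir(parents=True, exist_ok=True)


def slot_status(models_root="/data/models"):
    """Map each role to True when its slot contains a config.json file."""
    root = Path(models_root)
    return {
        role: (root / dirname / "config.json").is_file()
        for role, dirname in MODEL_SLOTS.items()
    }


def choose_runtime(role, models_root="/data/models"):
    """Prefer a ready local checkpoint; otherwise use the hosted route."""
    return "local" if slot_status(models_root).get(role, False) else "hosted"
```

A dictionary like the one returned by slot_status is also a natural source for the runtime status shown in the UI.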
Privacy Model
DiffSense has three privacy tiers:
- Pasted diff with model toggles off: diff analysis stays in the app process.
- Public GitHub PR URL: the app fetches the public .diff document.
- Optional model/provider pass: compact diff context and deterministic findings are sent to the selected provider or local checkpoint path.
This is why the deterministic review path is not just a fallback. It is the privacy-preserving default that makes the tool useful for sensitive code.
Gradio UI Design
The UI uses gr.Blocks with custom CSS and HTML rendering rather than a chatbot layout. That choice matters because code review is a reading and scanning task. A chat transcript is the wrong shape for a diff.
The current layout is optimized for a demo and for actual use:
- configuration in the sidebar,
- input and summary in the center,
- detailed inline review in the larger right pane,
- JSON output beneath the detailed review.
Findings are rendered inside the diff with severity badges, file headers, hunk headers, line numbers, and suggested fixes. This makes the output feel like a review artifact rather than a model response.
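A minimal sketch of inline rendering follows, assuming one hunk at a time and hypothetical CSS class names. The real render_review produces richer markup, but the two points that matter are the same: diff text is escaped before it reaches the page, and each comment is emitted directly under the added line it refers to.

```python
import html


def render_hunk(header, lines, findings_by_line):
    """Render one hunk as HTML with findings under the lines they target.

    lines: list of (kind, new_lineno, text), kind in {"add", "del", "ctx"}.
    findings_by_line: dict mapping new_lineno to a list of finding dicts
    with "severity", "comment", and "suggestion" keys.
    """
    parts = [f'<div class="hunk-header">{html.escape(header)}</div>']
    for kind, lineno, text in lines:
        number = "" if lineno is None else str(lineno)
        parts.append(
            f'<div class="diff-line {kind}">'
            f'<span class="lineno">{number}</span>'
            f'<code>{html.escape(text)}</code></div>'
        )
        # Comments attach only to added lines.
        if kind != "add":
            continue
        for finding in findings_by_line.get(lineno, []):
            severity = html.escape(finding["severity"])
            parts.append(
                f'<div class="finding {severity}">'
                f'<span class="badge">{severity.upper()}</span> '
                f'{html.escape(finding["comment"])}'
                f'<div class="suggestion">{html.escape(finding["suggestion"])}</div>'
                f'</div>'
            )
    return "\n".join(parts)
```

In a Gradio app, a string like this could be displayed through an HTML component; escaping matters because a diff can contain arbitrary markup.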
Development Process
The project was built under a tight hackathon deadline with Codex as an active build partner.
The build sequence was:
- Analyze the hackathon constraints and sponsor badge criteria.
- Choose a real developer workflow that benefits from local AI: pull request review.
- Build a deterministic reviewer first so the demo could never be blocked by model availability.
- Add a custom Gradio UI for a non-chat, code-review-specific experience.
- Add public GitHub PR URL fetching.
- Add model/provider bridge toggles for Mellum, Nemotron, Tiny Titan, MiniCPM-V, and Modal.
- Add persistent /data checkpoint slots for ZeroGPU/local checkpoint readiness.
- Stabilize Space runtime by disabling experimental Gradio SSR.
- Rebalance the UI into configuration, input/summary, and detailed review panes.
- Iterate the visible model status copy so the app reads as local-first and resilient rather than broken when hosted providers are unavailable.
The most important engineering decision was to reduce risk early. A deterministic reviewer with a custom diff renderer is valuable on its own; model bridges then improve the experience rather than define it.
Failure Handling
The app is designed to stay useful across common hackathon failure modes:
- hosted model route unavailable,
- OAuth token missing,
- Space rebuild,
- provider rate limit,
- cold start,
- missing local checkpoints,
- public PR URL fetch failure.
For model failures, the UI reports that the bridge is armed and that deterministic fallback is active. The review still completes.
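One way to get this behavior is to wrap every optional model pass so that an exception becomes a status entry instead of aborting the review. The sketch below assumes a hypothetical run_bridge helper and status labels; it is not taken from app.py.

```python
def run_bridge(name, call, payload):
    """Run an optional model pass without letting it fail the review.

    name: label shown in the trace.
    call: any callable taking payload and returning model output.
    """
    try:
        return {"bridge": name, "status": "ok", "output": call(payload)}
    except Exception as exc:
        # Broad on purpose: timeouts, rate limits, missing tokens, and
        # missing checkpoints should all degrade to the same fallback.
        return {
            "bridge": name,
            "status": "fallback",
            "output": None,
            "detail": f"{type(exc).__name__}: {exc}",
        }
```

The deterministic findings are computed before any bridge runs, so a "fallback" entry changes only the trace, not the review itself.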
For rebuild persistence, model files belong under /data, not /app. The /app directory can be reset during rebuilds, but the mounted bucket persists as long as it remains attached to the Space.
Hackathon Fit
DiffSense targets the Backyard AI track because it is a practical local AI tool for a daily developer workflow.
It also maps cleanly to sponsor badges:
- Gradio app: implemented as a Hugging Face Space using Gradio.
- Best Use of Codex: Codex was used throughout design, implementation, debugging, deployment, and documentation.
- Best Agent: the app is a staged review pipeline with parsing, classification, summarization, routing, and rendering.
- Off Brand: custom diff UI instead of a stock chat interface.
- Best Demo: one-click sample or public PR URL produces clear review output quickly.
- Best MiniCPM Build: MiniCPM-V 4.6 image path is integrated for visual PR context.
- Nemotron Hardware Prize: Nemotron 3 Nano router bridge is integrated.
- Tiny Titan: Nemotron 3 Nano 4B checker path is integrated.
- Best Use of Modal: Modal endpoint bridge is included through DIFFSENSE_MODAL_ENDPOINT.
What We Would Build Next
The next product improvements are straightforward:
- Add a real Modal endpoint and set DIFFSENSE_MODAL_ENDPOINT.
- Stage quantized checkpoints under /data/models.
/data/models. - Add downloadable patch suggestions.
- Add GitHub comment export.
- Add per-rule enable/disable controls.
- Add a richer MiniCPM-V demo with screenshots and architecture diagrams.
Conclusion
DiffSense is small by design. It does not require a perfect model endpoint to be useful, and it does not ask teams to send private code to a SaaS reviewer. It turns a diff into a structured, inspectable review artifact and creates clear extension points for local checkpoints and sponsor models.
That combination, reliable deterministic review plus optional small-model intelligence, is the core idea: useful now, private by default, and ready to grow into a fully local AI code review workflow.