What if the question every AI-detection tool asks is the wrong one?
"Was this written by AI?" has quietly become unanswerable. The best engineers now use AI as a tool, and plenty of human-written text is hollow. So when the AI Slop Scan Hackathon asked builders to fight AI-generated slop, the strongest teams refused the obvious framing. Instead of building yet another authorship classifier, they built tools that measure something harder and more useful: whether a piece of work actually shows human thinking.
Forty-three teams shipped working detectors across code review, documentation, marketplace reviews, and general writing. What follows is a look at what they built and the engineering decisions that separated the top of the field.
Reframing the Problem
The clearest expression of this shift came from SlopGuard (Team Batman), which finished first overall. SlopGuard is a content-oversight scoring engine built around a single insight: "Was this AI-generated?" is both unanswerable and unfair, because great developers use AI while lazy writers produce slop by hand. Rather than guess at origin, SlopGuard scores writing for signs of human reasoning, turning a binary witch-hunt into a graded measurement of substance. That reframing, from detection to oversight, is what carried it to a 4.19 average across eight judges.
Signal-OSS, the runner-up, applied the same philosophy to a narrower, high-value target: code review. AI-generated pull request descriptions and review comments have a recognizable failure mode — they restate the diff in fluent prose without explaining anything. Signal-OSS is a zero-tolerance scanner that measures the actual information density of PRs, commit messages, and code comments. It does not ask whether an LLM wrote the text; it asks whether the text carries any signal at all. For teams drowning in auto-generated review noise, that is a directly deployable check, and the judges rewarded its focus with a 4.15.
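To make "restating the diff" concrete, here is a minimal sketch of one possible density check. It is not Signal-OSS's implementation, which this article does not describe in detail. The function names and the token-overlap heuristic are assumptions chosen for illustration: the idea is simply that a description whose vocabulary is entirely borrowed from the diff adds little that the diff did not already say.

```python
import re


def tokens(text: str) -> set[str]:
    """Return lowercased identifier-like tokens of length >= 3."""
    found = re.findall(r"[a-z_][a-z0-9_]*", text.lower())
    return {t for t in found if len(t) >= 3}


def restatement_ratio(description: str, diff: str) -> float:
    """Fraction of description tokens that already appear in the diff.

    1.0 means the description introduces no vocabulary beyond the diff;
    lower values mean it brings in outside context. (Hypothetical metric.)
    """
    desc_tokens = tokens(description)
    if not desc_tokens:
        return 1.0  # an empty description carries no extra information
    return len(desc_tokens & tokens(diff)) / len(desc_tokens)


# Illustrative inputs, invented for this sketch.
diff = "+ def retry_request(url, max_attempts):\n+    return fetch(url)"
hollow = "retry_request fetch url max_attempts"
informative = "Retry because the upstream gateway drops idle connections"

print(restatement_ratio(hollow, diff))
print(restatement_ratio(informative, diff))
```

A real scanner would need far more than vocabulary overlap (a description can reuse every identifier and still explain the reasoning), so treat the ratio as one weak signal rather than a verdict.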
Third place went to SlopLens (SlopLens labs), which took the density idea and productized it aggressively. SlopLens is a three-layer hybrid engine that scores any text for information density, filler, and naturalness, and — crucially — ships in four forms: a web app, a Chrome extension, a REST API, and a CLI, all free to run. The lesson the judges drew was that a good metric is only half the work; meeting developers where they already are is the other half.
The DELTA Between Reporting and Thinking
If there was a single conceptual thread running through the best submissions, it was the distinction between reporting and reasoning. Showreceipts (BharatShowreceipts-DELTA) made it explicit with DELTA, a metric for low-quality AI content in PR descriptions and documentation. Its framing was memorable: slop reports, humans think. A hollow description restates the diff; a quality description explains why an approach was chosen and what alternatives were rejected. By scoring that gap directly, Showreceipts gave teams a way to flag documentation that looks complete but says nothing — and earned a 4.10 in the process.
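The reporting-versus-reasoning gap can be approximated in a few lines. The sketch below is not the DELTA metric itself, whose formula is not given here. It assumes a hypothetical marker list (`RATIONALE_MARKERS`) and scores the share of sentences that contain at least one phrase signalling a rationale or a rejected alternative.

```python
import re

# Hypothetical phrases that tend to introduce a reason or a rejected option.
RATIONALE_MARKERS = (
    "because",
    "instead of",
    "rather than",
    "so that",
    "trade-off",
    "rejected",
    "alternative",
)


def reasoning_share(text: str) -> float:
    """Share of sentences containing at least one rationale marker."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    hits = sum(
        any(marker in sentence.lower() for marker in RATIONALE_MARKERS)
        for sentence in sentences
    )
    return hits / len(sentences)


# Illustrative inputs, invented for this sketch.
report = "Updated the parser. Added tests."
rationale = (
    "We switched to a streaming parser because the old one loaded whole files. "
    "A regex approach was rejected."
)

print(reasoning_share(report))
print(reasoning_share(rationale))
```

Keyword matching is easy to game, so a production metric would presumably look at structure as well as vocabulary. The sketch only shows that the distinction is measurable at all.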
Papyrus (OneAbove) pushed the same instinct into academic and research integrity. Rather than detect AI authorship, Papyrus audits the reference layer: it verifies that citations actually exist, classifies how they fail, and scores whether evidentiary claims align with their sources. In an era where fabricated citations are a known failure mode of language models, auditing the references is often more decisive than auditing the prose.
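A reference audit of this kind can be pictured as a lookup followed by a failure classification. The sketch below is an assumption-heavy illustration, not Papyrus's design: the `Citation` type, the failure labels, and the in-memory `index` standing in for a real bibliographic database are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    doi: str
    title: str
    year: int


def classify(cited: Citation, index: dict[str, Citation]) -> str:
    """Classify a citation against a trusted index keyed by lowercased DOI.

    Labels are hypothetical: "not_found", "title_mismatch",
    "year_mismatch", or "verified".
    """
    record = index.get(cited.doi.lower())
    if record is None:
        return "not_found"  # the cited work does not appear to exist
    if record.title.strip().lower() != cited.title.strip().lower():
        return "title_mismatch"  # real identifier, wrong work attached
    if record.year != cited.year:
        return "year_mismatch"
    return "verified"


# Placeholder records with made-up identifiers, for illustration only.
index = {
    "10.0000/example.1": Citation("10.0000/example.1", "A Study of Widgets", 2020),
}

print(classify(Citation("10.0000/example.1", "A Study of Widgets", 2020), index))
print(classify(Citation("10.0000/example.9", "Widgets Revisited", 2021), index))
```

Checking whether a claim actually aligns with its source is a much harder step that this sketch does not attempt.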
On-Device and Real-Time
Several teams competed on engineering constraints rather than novel metrics. SlopBlock (Blockers) is a Windows desktop application that detects and hides AI-generated text, images, and YouTube videos in real time as you browse — entirely on-device, with no cloud, no accounts, and no telemetry. Running multimodal detection locally and fast enough to keep up with a live browsing session is a real systems achievement, and it placed the project firmly in the excellence tier at 4.01.
BS-Meter (Momo) went wide instead of deep, scoring text across six domains — code review, content and SEO, social and news, academic, email, and general writing — using twelve deterministic signal categories to catch fluff, clickbait, keyword stuffing, and weak reasoning. Determinism was a deliberate choice: by avoiding a model-in-the-loop, BS-Meter stays cheap, fast, and explainable, three properties that matter more than raw accuracy when a tool needs to run on every piece of text a team produces.
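What a deterministic signal looks like in practice is easy to sketch. The two signals below (a filler-phrase rate and a keyword-stuffing share) are illustrative assumptions, not BS-Meter's twelve categories, and the phrase list is invented. The point is that the same input always produces the same numbers, and each number can be explained by pointing at the text.

```python
import re
from collections import Counter

# Hypothetical filler phrases; a real list would be much longer.
FILLER_PHRASES = (
    "in today's world",
    "it is important to note",
    "at the end of the day",
)


def signals(text: str) -> dict[str, float]:
    """Compute two rule-based signals with no model in the loop."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return {"filler": 0.0, "keyword_stuffing": 0.0}

    # Filler: occurrences of stock phrases per word of text.
    filler_hits = sum(lowered.count(phrase) for phrase in FILLER_PHRASES)

    # Keyword stuffing: share of longer words taken by the single most
    # frequent one.
    content = [w for w in words if len(w) > 3]
    top_share = Counter(content).most_common(1)[0][1] / len(content) if content else 0.0

    return {"filler": filler_hits / len(words), "keyword_stuffing": top_share}


print(signals("cheap shoes cheap shoes cheap shoes"))
print(signals("It is important to note that tests pass"))
```

Thresholds for flagging would need tuning per domain, which is presumably why a tool like this separates signal categories by the kind of text being scored.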
Detecting Coordination, Not Authorship
The most architecturally ambitious entries looked past individual documents entirely. Review Constellation (Nebula) is an explainable review-fraud platform built for the marketplace-reviews and code-review tracks. Instead of classifying whether any single review was AI-written, it detects coordinated behaviour patterns across many reviews — the signature of an organized campaign rather than a lone author. That shift, from per-item classification to network-level behaviour, mirrors how fraud detection actually works in production trust-and-safety systems, and it made Review Constellation one of the most forward-looking submissions of the event.
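The move from per-item classification to network-level behaviour can be illustrated with a small sketch. This is not Review Constellation's method, which is not detailed here. It assumes one simple behavioural signal (near-duplicate wording, measured by Jaccard similarity over word shingles) and groups reviews into clusters with a union-find structure. The function names and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coordinated_groups(reviews: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Group review IDs whose texts are linked by high pairwise similarity."""
    parent = {rid: rid for rid in reviews}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = {rid: shingles(text) for rid, text in reviews.items()}
    for a, b in combinations(reviews, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for rid in reviews:
        groups.setdefault(find(rid), set()).add(rid)
    return [g for g in groups.values() if len(g) > 1]


# Illustrative reviews, invented for this sketch.
reviews = {
    "r1": "great product fast shipping would buy again",
    "r2": "great product fast shipping would buy again today",
    "r3": "the zipper broke after two weeks of daily use",
}
print(coordinated_groups(reviews))
```

No single review in a flagged group needs to look suspicious on its own. The evidence is the link between them, which is also what makes the result explainable to a moderator.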
Taken together, these projects sketch a quiet consensus: the future of slop detection is not a better authorship classifier. It is a toolkit of information-density metrics, reference audits, and behavioural signals that measure substance directly and survive the fact that humans and machines now write side by side.
How the Field Was Judged
Every submission was evaluated on a weighted five-criterion rubric — detection accuracy (30%), practical usefulness (25%), technical execution (20%), innovation (15%), and presentation (10%). The panel brought senior engineering and product perspective from across the industry: Amr Arqoub, Innovation Director of Technology & Partnerships at Freshpet; Iuliia Kozlova, Lead Software Testing Expert at SimbirSoft and a CNCF Kubestronaut; Myroslav Mishov, Lead Enterprise Architect and KubeCon security-track reviewer; Sergii Demianchuk, Senior Software Engineering Technical Leader at Cisco; and Sushil Choubey, Principal Supply Chain Manager at Amazon — among others who reviewed across the four batches.
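As a worked illustration of the rubric arithmetic, the sketch below combines per-criterion scores using the published weights. The plain weighted sum and the 1-to-5 scale are assumptions, since the organizers' exact aggregation is not described here, and the sample scores are invented rather than taken from any team's results.

```python
# Weights as stated in the rubric; they sum to 1.0.
WEIGHTS = {
    "detection_accuracy": 0.30,
    "practical_usefulness": 0.25,
    "technical_execution": 0.20,
    "innovation": 0.15,
    "presentation": 0.10,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Weighted sum of criterion scores (assumed aggregation rule)."""
    if scores.keys() != WEIGHTS.keys():
        raise ValueError("scores must cover exactly the five rubric criteria")
    return sum(WEIGHTS[criterion] * scores[criterion] for criterion in WEIGHTS)


# Made-up scores on an assumed 1-to-5 scale.
sample = {
    "detection_accuracy": 5,
    "practical_usefulness": 3,
    "technical_execution": 4,
    "innovation": 4,
    "presentation": 2,
}
print(round(weighted_score(sample), 2))
```

With these weights, detection accuracy and practical usefulness together account for more than half of the total, so a polished presentation cannot compensate for a detector that does not work.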
Looking Forward
The teams that did best at AI Slop Scan succeeded by rejecting the premise of the genre. They stopped trying to catch machines and started trying to measure thought. That is a harder problem, and a more durable one: as generation models improve, authorship detection gets steadily less reliable, but the gap between work that reasons and work that merely fills space is not going anywhere. The tools built here — density scorers, reference auditors, behavioural fraud detectors — are early sketches of how teams will keep that gap visible. It turns out the best way to fight slop is not to ask who wrote it, but to ask whether it was worth writing at all.