System Architecture
Strategic Context
My goal with this project was to build something the team could implement quickly and that would make a real impact on the bottom line. I chose to focus on automating comparison pieces specifically because:
1. "X vs Y" queries are bottom-of-funnel by definition. They pull in searchers actively comparing two specific products, not exploring the category, which makes them very high-intent. The traffic these articles capture is among the most valuable available, and organic ranking on these terms is significantly cheaper than paying for the same intent through paid search.
2. Vercel has published a few of these, but the comparison surface area is much larger than what's covered. Existing KB pieces (vercel-vs-fastly, vercel-waf-vs-cloudflare-waf, vercel-vs-netlify) demonstrate that the strategy works. The search volume on uncovered terms is significant:
Roughly 1,600 monthly searches across uncovered terms, all bottom-of-funnel, with very low keyword difficulty across the board. A pipeline that produces these pieces on demand can capture evaluation-stage traffic on an ongoing basis, and the pieces compound in value over time.
What Was Built
The system is a four-agent pipeline that turns a curated source corpus into a publication-ready comparison article. Editorial discipline is single-sourced: every agent loads shared_rules.md (universal principles) and piece_brief.md (piece-specific spec) at runtime through a shared prompt_context.py, so voice and editorial judgment are never duplicated across agent code.
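As a rough illustration of the single-sourcing idea, a module like prompt_context.py might look something like the sketch below. Only the file names shared_rules.md and piece_brief.md come from the description above; the function names, the directory argument, and the section-header layout are assumptions made for the example.

```python
from pathlib import Path

# File names come from the description above; everything else is assumed.
SHARED_FILES = ("shared_rules.md", "piece_brief.md")


def load_editorial_context(base_dir: Path) -> str:
    """Concatenate the shared editorial files into one prompt preamble."""
    sections = []
    for name in SHARED_FILES:
        path = base_dir / name
        if not path.is_file():
            # Fail loudly so no agent runs without the shared rules.
            raise FileNotFoundError(f"missing editorial file: {path}")
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)


def build_system_prompt(agent_instructions: str, base_dir: Path) -> str:
    """Prefix agent-specific instructions with the shared editorial context."""
    context = load_editorial_context(base_dir)
    return f"{context}\n\n## agent instructions\n{agent_instructions.strip()}"
```

Each agent would call build_system_prompt with only its own task instructions, so a change to voice or editorial rules happens in one file and reaches all four agents on the next run.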
Extract. Fetches each source URL and runs it through Claude Sonnet 4.6 with handling rules looked up from a roles.json table — vendor docs are factually authoritative, competitor critiques are partisan and always attributed, community discussions become thematic signals rather than quotes. Output is a structured corpus of claims tagged by dimension, vendor, and disputed-claim references.
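The role lookup could be sketched roughly as follows. The actual schema of roles.json is not shown here, so the role keys and the authority, attribute, and quote fields below are hypothetical stand-ins for the three handling rules described above.

```python
import json

# Assumed shape of roles.json; the real keys and fields may differ.
ROLES_JSON = """
{
  "vendor_docs": {"authority": "factually authoritative", "attribute": false, "quote": true},
  "competitor_critique": {"authority": "partisan", "attribute": true, "quote": true},
  "community_discussion": {"authority": "a thematic signal", "attribute": false, "quote": false}
}
"""


def handling_for(role: str, roles: dict) -> dict:
    """Look up handling rules for a source role, rejecting unknown roles."""
    if role not in roles:
        raise ValueError(f"no handling rules for source role {role!r}")
    return roles[role]


def extraction_preamble(role: str, roles: dict) -> str:
    """Turn a role's handling rules into instructions for the extraction prompt."""
    rules = handling_for(role, roles)
    lines = [f"Treat this source as {rules['authority']}."]
    if rules["attribute"]:
        lines.append("Attribute every claim to its author.")
    if not rules["quote"]:
        lines.append("Summarize as thematic signals; do not quote.")
    return " ".join(lines)


roles = json.loads(ROLES_JSON)
```

Keeping the rules in a data table rather than in prompt text means a new source type is a one-line addition to the table, with no change to agent code.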
Compare. Two-pass synthesis on Claude Opus 4.7. A setup call identifies in-scope dimensions, editorial tensions (typed as between-subject disagreements, subject disclosures, or external critiques), and category-framing voices. A composition pass then populates each entry with positions, steel-mans, evidence references, draft-ready narrative payloads, and a verdict that defaults to "depends on workload." A composite weight built from three mechanical scores plus one inline judgment drives priority order. The matrix renders to a human-readable brief; the pipeline pauses here for editorial review.
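One way to picture the composite weight is a weighted sum over four normalized scores. The text above does not name the three mechanical scores or say how they are combined, so the field names, the 0-to-1 scale, and the equal weights in this sketch are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class MatrixEntry:
    name: str
    # Three hypothetical mechanical scores, each assumed to lie in [0, 1].
    evidence_count: float
    source_diversity: float
    dispute_level: float
    # The inline judgment from the model, also assumed to lie in [0, 1].
    editorial_judgment: float


# Assumed equal weighting; the real pipeline may weight these differently.
WEIGHTS = (0.25, 0.25, 0.25, 0.25)


def composite_weight(entry: MatrixEntry) -> float:
    """Combine the four scores into a single priority weight."""
    scores = (
        entry.evidence_count,
        entry.source_diversity,
        entry.dispute_level,
        entry.editorial_judgment,
    )
    return sum(w * s for w, s in zip(WEIGHTS, scores))


def prioritize(entries: list) -> list:
    """Order matrix entries from highest to lowest composite weight."""
    return sorted(entries, key=composite_weight, reverse=True)
```

Because three of the four inputs are mechanical, the resulting order stays mostly reproducible between runs, with the judgment score acting as a bounded adjustment.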
Draft. Writes against the matrix, the piece brief, and a voice anchor selected from a curated style-reference library. Selection is metadata-driven: each anchor is tagged with genre, subject domain, and register; the piece brief declares matching fields; the agent picks on exact match, falling back to the anchor marked primary_voice_anchor: true when nothing matches.
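The selection rule can be sketched as a small function. The primary_voice_anchor flag comes from the description above; the dictionary representation and the exact field names genre, subject_domain, and register are assumptions for illustration.

```python
# Assumed metadata keys shared by anchors and the piece brief.
MATCH_FIELDS = ("genre", "subject_domain", "register")


def select_voice_anchor(anchors: list, brief: dict) -> dict:
    """Pick the anchor whose metadata exactly matches the brief, else the primary anchor."""
    for anchor in anchors:
        # brief[field] raises KeyError if the brief omits a required field.
        if all(anchor.get(field) == brief[field] for field in MATCH_FIELDS):
            return anchor
    for anchor in anchors:
        if anchor.get("primary_voice_anchor") is True:
            return anchor
    raise LookupError("no matching anchor and no primary_voice_anchor defined")
```

An exact match on all three fields keeps selection predictable: a piece either gets an anchor written for its genre, domain, and register, or it gets the one default, never a partial match.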
Revision. Rereads the composed draft against a reader test (can the reader decide, explain the differentiation, name one strength of each product?) and an anti-pattern list, then revises. Produces revision notes alongside the final draft, catching false balance, duplicated paragraphs, and voice drift the composer missed.
The whole pipeline is wrapped by an orchestrator with progress indicators, cost estimates, and the human review checkpoint between Compare and Draft.
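A minimal sketch of the orchestration loop, assuming each agent is a function from state to state and that the review checkpoint is an approval callback. The stage names mirror the four agents; the signatures, the state dictionary, and the halted_after key are hypothetical, and cost estimation is left out.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(
    stages: list,
    state: dict,
    approve: Callable[[dict], bool],
    checkpoint_after: str = "compare",
) -> dict:
    """Run (name, stage) pairs in order, pausing for review after one stage."""
    total = len(stages)
    for index, (name, stage) in enumerate(stages, start=1):
        print(f"[{index}/{total}] {name}")  # simple progress indicator
        state = stage(state)
        if name == checkpoint_after and not approve(state):
            # The reviewer rejected the matrix, so stop before drafting.
            state["halted_after"] = name
            break
    return state
```

In an interactive run, approve might print the rendered brief and wait for a yes or no from the editor; in a test it can simply be a lambda that returns a fixed answer.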