Klem HQ · AI Workflow Roadmap · v1

The Content Review Pipeline Roadmap

A five-phase plan for moving from inconsistent, slow content review to a pipeline that runs in your voice — with the AI catching the easy mistakes and the human deciding everything that matters.

For content leads and solo marketers shipping 5 to 20 pieces a month · Reading time: 22 minutes · Phases: 5 · Verified against tool versions current as of 2026-05-15
Figure: messy draft pile through AI review gates into three lanes. A stack of unsorted drafts passes through an AI reviewer gate and is sorted into three lanes labelled revise, approve, and kill.
Three lanes after one quiet, rubric-bound pass.
Roadmap overview · five phases
All five phases at a glance: 1 Codify · 2 Intake · 3 First pass · 4 Editor focus · 5 Recalibrate

Before you start

If your content team is one person, or three people sharing one editor's calendar, the bottleneck is almost never writing. It is the review loop — the pings, the half-edits, the drafts that sit in someone's tab for four days waiting on a vibe check. This roadmap is for the moment when that loop has become the story of your month, and you are ready to put an AI reviewer in front of your editor without letting it make the calls only a human should make.

You end up with an AI that does the first pass — style flags, structural issues, fact-check candidates — so the editor reads each draft once, catches what only they can catch, and moves on.

What you will have at the end

  • A written style rubric the AI uses as its source of truth — not the model's defaults.
  • A draft intake that captures audience, channel, urgency before a reviewer sees the piece.
  • An automated first pass flagging style violations, structural issues, fact-check candidates.
  • An editor queue surfacing AI-flagged regions plus your tier-two concerns — not the whole draft.
  • A monthly recalibration that tracks false positives and style drift.

What you need before phase 1

  • A working {your CMS} or shared doc tool (WordPress, Webflow, Notion, Google Docs, Sanity, Ghost) with 90 days of drafts and published pieces.
  • A paid {your AI reviewer} account — ChatGPT, Claude, or similar — on a plan supporting custom instructions or projects. Free tiers lose your rubric every session.
  • Twelve to fifteen pieces of your own published writing you consider on-voice. The rubric is built from these.
  • Six to nine hours over two weeks. Four hours for phase 1 — the rubric is the load-bearing wall.
Figure: tool architecture for an AI-assisted content review pipeline. Drafts flow from your CMS or doc tool into an AI reviewer, then into an editor queue, and finally into a publisher.
Four nodes, one rubric, one decision point — everything else is wiring.
Before

Long cycle, scattered pings, uneven voice

~8 days
draft submitted to published
  • Three reviewer pings per piece on average — one editor, two cross-functional readers, often a brand check.
  • Style consistency drifts piece-to-piece; the rubric lives in someone's head.
  • Two of every ten pieces stall in the editor's tab for a week or longer.
  • Factual errors are caught after publish more often than before it.
After

Short cycle, focused review, consistent style

~3 days
same draft volume, voice held steady
  • One reviewer ping on average — editor reads AI-flagged regions, signs, done.
  • Style score on each piece, tracked against rubric — drift visible monthly.
  • Fewer pieces stall; the AI catches structural issues before the editor opens the doc.
  • Fact-check candidates surface inline with the draft, not after the link goes out.

Illustrative range, not benchmark — your numbers will vary by piece length and team size.

The roadmap

Five phases, sequenced so that the rubric comes before any AI touches a draft. If you stop after phase 1 you still come out ahead — a written style guide is more than most content teams have. Phases 2 through 5 layer in intake, automated review, focused editor handling, and the recalibration loop that keeps the rubric honest as your brand evolves.

Phase 01

Codify the rubric the AI will use to read your drafts

Before any model gets close to a working draft, you write down — in language the AI can apply mechanically — what good means to your team. This is the load-bearing wall of the pipeline. Get it wrong and every later phase amplifies the wrongness.

Most content teams already have a style guide. The honest assessment is that most of those guides are either too short to be useful or too long to read. The AI cannot apply either one. What it needs is a concrete, list-shaped document with three sections: voice attributes with examples, banned constructions with examples, structural conventions with examples. The model learns from examples, not adjectives.

Spend the first hour pulling your twelve to fifteen on-voice pieces into one document. Read them as if you had never seen them, and underline the recurring moves — the way you open, the way you close, the rhythm of your sentences. Write each move as one sentence followed by two examples lifted verbatim from the archive.

The second hour is the banned list: for each construction, paste the verbatim violation and a verbatim rewrite. The model recognises "do not write like this; write like this" far better than "be authentic". The third hour is structural conventions — opener length, H2 versus bold leads, digits versus words, em dashes versus en dashes. Small individually; together they are most of what readers register as voice.

The fourth hour is the cold read: apply the rubric to three unseen pieces one item at a time. If a rule cannot produce a clear yes-or-no, it is not yet operational — rewrite it.
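
If you would rather script the cold read than paste into a chat window, the sketch below does the same thing, assuming the OpenAI Python SDK. The file names, model string, and line format are placeholders, not part of the roadmap; any chat-capable client works the same way.

```python
# Cold-read sketch: grade one unseen piece against the rubric, item by item.
# Assumes the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY
# in the environment. File names and model string are placeholders.
from openai import OpenAI

client = OpenAI()

rubric = open("rubric_v0.1.md").read()   # your phase-1 rubric document
draft = open("unseen_piece.md").read()   # a piece the rubric was NOT mined from

system = (
    "You are a style checker. Apply each numbered rubric rule to the draft, "
    "one rule at a time. For every rule output exactly one line: "
    "'<rule number> | PASS or FAIL | quoted phrase if FAIL'. "
    "Do not score the draft overall and do not rewrite anything."
)

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any chat-capable model on your plan
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": f"RUBRIC:\n{rubric}\n\nDRAFT:\n{draft}"},
    ],
)
print(resp.choices[0].message.content)
```

Because the output is one line per rule, checking agreement against your own unaided read is a simple count, which is exactly what the decision gate below asks for.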

Figure: phase 1 flow, archive to rubric. Twelve to fifteen on-voice published pieces are read, the recurring moves and banned constructions are listed, structural conventions are noted, and the rubric is cold-tested against three unseen pieces.
Archive → underline → draft rubric → cold-test
Tools you will use
  • {your AI reviewer} — for the cold-read pass at the end. Paste the draft rubric, paste one unseen piece, ask the model to grade item-by-item.
  • {your CMS} or a flat folder — to surface the twelve to fifteen on-voice pieces you will mine.
  • A single document — Notion, Google Doc, Markdown file — that becomes the rubric itself.
Time + cost estimate
Three to four hours of focused work, ideally in one sitting. No additional cost beyond your existing AI subscription.
What you ship at the end
A rubric of 1,200 to 2,000 words in three sections: voice attributes (six to ten items, each with two verbatim examples), banned constructions (eight to twelve items, each with violation and rewrite), structural conventions (six to ten one-line rules). This document is the single source of truth for every later phase.
Common failure modes
  • Writing the rubric as adjectives. "Be warm, be direct, be human" tells the model nothing. Every rule needs two verbatim examples — the model learns from instances, not abstractions.
  • Letting the model write your rubric. Ask the model to summarise the moves in your archive and you get a generic copywriting checklist. The mining must be human; the model can phrase rules after you identify them, not before.
  • Confusing personal preference with style. If a rule exists because you, personally, hate semicolons, label it as preference. Otherwise the AI flags colleagues' on-voice writing as off-voice.
  • Treating the rubric as finished. The first draft is v0.1. It is wrong in places you cannot see until phase 3 runs against real drafts.
Decision gate before phase 2: Run the rubric against three unseen pieces. If the rubric returns a clear yes-or-no on every rule and 80 percent of those judgements match your unaided read, proceed. If it stalls or disagrees on more than two items per piece, rewrite the offending rules and re-test.
Phase 02

Capture drafts with the metadata that decides everything downstream

A draft that arrives in your queue with no metadata is a draft that costs an editor ten minutes of figuring out before they can even start reading. Phase 2 fixes that — every incoming draft brings its audience, channel, and urgency with it, by default.

The same 800-word draft is a different review job depending on whether it is going to the weekly newsletter, a sales-enablement landing page, or the founder's LinkedIn. Without intake metadata, the AI applies a generic rubric and the editor still has to retranslate every flag in their head.

Build a simple intake form — Notion template, Google Form, Typeform, Airtable record — with five mandatory fields: piece title, intended audience (in your own taxonomy, not "B2B SaaS marketers" but "founder-led mid-market SaaS readers, week-of-launch awareness"), publishing channel, target word count, urgency. Add one optional field: "what about this draft are you nervous about?" Writers fill it in surprisingly often, and it routes the AI's attention.
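
For reference, the record a submission should produce looks something like this; a minimal sketch in Python, with illustrative field names you should rename to match whatever your form tool exports.

```python
# Sketch of the intake record. Field names are illustrative only.
from typing import TypedDict

class IntakeRecord(TypedDict):
    title: str              # piece title
    audience: str           # in your own taxonomy, not deck-speak
    channel: str            # newsletter, landing page, founder's LinkedIn, ...
    target_word_count: int  # target word count
    urgency: str            # e.g. "this-week", "next-sprint", "evergreen"
    nervous_about: str      # the optional field; empty string when unused
```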

Wire submission to do three things at once: copy the draft body to a known location, create a queue record with metadata, trigger the phase-3 AI pass against the channel-specific rubric. If any step is manual copy-paste, the writer skips it under deadline and the pipeline collapses.

Have writers submit at v0.9, not v0.2. The AI reviewer is wasted on first-draft stream-of-consciousness. Add a checkbox: "I have read this draft end-to-end at least once and would not be embarrassed to publish it as-is." If the writer cannot honestly tick the box, the draft is not ready.
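
Wired together, the whole hook is small. The sketch below is plain Python standing in for a Zapier, Make, or n8n scenario; the file paths and queue format are illustrative, and the phase-3 trigger is stubbed until the next phase.

```python
# Sketch of the submission hook: validate, then do the three things at once.
# Paths and the JSONL queue format are illustrative stand-ins for your tools.
import json
from pathlib import Path

REQUIRED = ("title", "audience", "channel", "target_word_count", "urgency")

def handle_submission(form: dict) -> None:
    missing = [f for f in REQUIRED if not form.get(f)]
    if missing:  # the five required fields, enforced at the door
        raise ValueError(f"rejected, missing fields: {missing}")
    if not form.get("read_end_to_end"):  # the v0.9 checkbox gate
        raise ValueError("rejected, writer has not done a full read")

    # 1. Copy the draft body to a known location.
    drafts = Path("drafts")
    drafts.mkdir(exist_ok=True)
    (drafts / f"{form['title']}.md").write_text(form["draft_body"])

    # 2. Create a queue record carrying the metadata.
    record = {k: form[k] for k in REQUIRED}
    record["status"] = "awaiting-ai-pass"
    with open("queue.jsonl", "a") as q:
        q.write(json.dumps(record) + "\n")

    # 3. Trigger the phase-3 AI pass against the channel-specific rubric.
    #    Stubbed as a print here; in phase 3 it becomes the API call.
    print(f"phase 3 triggered for {form['title']} (rubric: {form['channel']})")
```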

Figure: phase 2 flow, writer submits with metadata. A writer fills an intake form with title, audience, channel, word count, and urgency; submission creates a queue record and triggers the AI first pass against the right channel rubric.
Writer → intake → queue record → AI first pass
Tools you will use
  • An intake form — Notion database template, Google Forms, Typeform, or Airtable form. Whichever your team already opens daily.
  • An automation layer — Zapier, Make, n8n, or native integrations between your form and your queue tool.
  • {your editor's queue} — Linear, Asana, Notion database, Trello board. Where editors already track work.
Time + cost estimate
Two to three hours to build the form, the automation, and the queue template. Ongoing cost is the marginal automation-tool seat — typically under $20 a month for a content team of one to five.
What you ship at the end
Every draft a writer submits arrives in your editor queue with five required fields and the body attached, ready for the phase-3 AI pass. Writers cannot submit without filling the form. Editors no longer have to ask "what's the audience for this one?" before reading.
Common failure modes
  • The form is too long. Twelve fields kill compliance within a week. Five required plus one optional is the ceiling.
  • Audience as marketing-deck-speak. "B2B decision-makers" is not an audience. "Founder-led mid-market SaaS readers" is. Generic audience fields produce generic reviews.
  • Manual copy-paste in the chain. If submitting the form does not auto-copy the body, writers will skip steps under deadline. Treat any manual step as a future failure scheduled for the worst possible moment.
  • Skipping the v0.9 gate. Letting v0.2 drafts in clogs the AI reviewer and trains writers to outsource their first-read to the model. The "I have read this end-to-end" checkbox is not optional.
Decision gate before phase 3: Run intake for one full week. At week's end, audit the queue: every record should have all five required fields populated, with no editor cleanup. If two or more records are missing fields, the form is wrong — shorten it before automating the AI pass against bad metadata.
Phase 03

Run the AI first-pass review against the rubric

Now the AI does what it is actually good at: scanning a draft against mechanical rules and surfacing candidates. Style violations, structural issues, fact-check candidates. Three outputs. No verdicts, no rewrites, just flags.

This is the place most teams overshoot. The temptation is to ask the AI to "improve" the draft, "rewrite this paragraph in our voice", or "tell me if this is good enough to ship". Resist all three — each one turns the AI from a useful first reader into a flawed second writer, and a flawed second writer flattens your voice over time.

The AI's job is a structured report in three categories. First, style violations: every instance of a banned construction, with line numbers and rule. Second, structural issues: missing H2s, oversized paragraphs, openings that bury the lede, weak conclusions. Third, fact-check candidates: any specific claim containing a number, date, company name, product name, or citation.

The third category is the one most teams overlook and it has the highest return. The AI cannot tell you whether a claim is true; it can tell you which sentences make claims, and that alone reduces fact-check time from "re-read the entire piece looking for things that might be wrong" to "verify these eleven sentences."

Configure the output as a fixed-shape document, not a chatty response: three sections, each a list, each item shaped {line number, quoted phrase, rule reference, suggested action}. The AI does not score the draft, does not suggest sentence-level rewrites, does not decide whether the piece is good enough. Those are explicitly out of scope; the prompt instructs the model to refuse. The editor decides; the AI surfaces candidates so the editor decides quickly.
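
One way to express that contract in code, assuming the OpenAI Python SDK and its JSON response format; the model name and function shape are placeholders for whatever client you actually run.

```python
# Sketch of the phase-3 first pass: fixed-shape JSON report, no verdicts.
# Assumes the OpenAI Python SDK; model name is a placeholder.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are a first-pass reviewer. Output ONLY valid JSON of the shape "
    '{"style_violations": [], "structural_issues": [], "fact_check_candidates": []}. '
    'Each item: {"line": int, "phrase": str, "rule": str, "action": str}. '
    "At most 15 items per list, ranked by importance. Refuse any request to "
    "score the draft, rewrite sentences, or judge whether it should ship."
)

def first_pass(rubric: str, draft: str, metadata: dict) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        response_format={"type": "json_object"},  # forces parseable output
        messages=[
            {"role": "system", "content": SYSTEM + "\n\nRUBRIC:\n" + rubric},
            {"role": "user",
             "content": "METADATA: " + json.dumps(metadata) + "\n\nDRAFT:\n" + draft},
        ],
    )
    report = json.loads(resp.choices[0].message.content)
    # Enforce the 15-flag cap even if the model ignores the instruction.
    return {section: items[:15] for section, items in report.items()}
```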

Figure: phase 3 flow, AI first pass produces a structured flag report. A submitted draft is read by the AI against the channel rubric; three categories of flag are produced: style violations, structural issues, and fact-check candidates.
Draft → AI pass → three flag categories → structured report
Tools you will use
  • {your AI reviewer} via API or projects — the rubric stays loaded as a system prompt; per-channel variants live as projects or templated prompts.
  • Automation layer from phase 2 — to trigger the API call and route the response.
  • A markdown destination — your queue tool's description field, a Notion subpage, a GitHub gist, anywhere the structured report can be rendered as a checklist.
Time + cost estimate
Three to four hours to write the system prompts, test them across two or three channels, and wire the output destination. Ongoing cost: roughly $0.05 to $0.30 per draft, depending on length and model. A team running 15 pieces a month spends under $5.
What you ship at the end
Within five minutes of submission, every draft arrives in the editor queue paired with a structured flag report. The editor sees the draft, the metadata, and a checklist of flagged regions in three categories — not a chatty assessment, not a rewrite. The report links to specific line numbers so the editor jumps directly to the spots that need attention.
Common failure modes
  • Asking the AI to score. "Rate this draft 1 to 10" is the most damaging prompt instruction in a content pipeline. It trains the editor to trust a meaningless number. Strip any scoring from the prompt.
  • Asking the AI to rewrite. The moment the AI offers suggested rewrites, the editor's job becomes accept-or-reject. Three weeks later the published voice has homogenised toward the model's defaults. Flag-only, no rewrites.
  • Skipping fact-check candidates. Many teams configure phase 3 as style-only because facts feel harder. They are the category with the most asymmetric upside: a missed style flag is small, a missed fact is a correction notice. Configure the third category from day one.
  • Letting the report sprawl. If the AI produces 80 flags on a 1,200-word piece, the rubric is too aggressive or the writer submitted too early. Cap at 15 ranked flags per category.
Decision gate before phase 4: Run phase 3 for ten drafts. The editor tags every flag as useful, false positive, or missed-the-point. If 70 percent of flags are useful, the rubric is operational — proceed. If false positives dominate, return to phase 1 rather than adding more rules to the prompt.
Phase 04

Surface the AI-flagged regions plus the editor's tier-two concerns

The editor does not read the whole draft. They read the AI's flagged regions and their own checklist of things the AI cannot see. That is the entire job — focused, fast, and reserved for the judgement only a human can supply.

Phase 4 is where the time savings actually land. Every preceding phase exists to make this one short. The editor opens the queue, picks the top item, sees the flag report next to the draft, works through it region by region, decides revise-approve-kill, and is done.

Reframe the editor's role as decision, not detection. Detection — finding violations, missing H2s, load-bearing claims — is now the AI's job. What the AI cannot do is decide whether a flagged construction is wrong on this piece, in this context, for this audience. That is judgement, and judgement scales by attention, not hours.

Build the editor's tier-two checklist alongside the AI report — five to seven yes-or-no questions on what the AI is honestly bad at: does the opening land for the specified audience? Does the piece say something the brand has not already said three times this quarter? Is the closing earned or tacked on? Does the writer sound like themselves on a good day?

Sequence matters. Work through the AI report first, resolving flags as you go, then run the tier-two checklist on the whole piece. The first 80 percent of attention is on resolving flags; the last 20 percent is on the questions only the editor can answer. Reverse the order and the AI report becomes a chore at the end of an already-finished read.

The decision is one of three: revise (back to the writer with notes), approve (schedule and publish), kill (archive with a written note). The kill option is non-negotiable — a pipeline that cannot kill a draft has no quality floor. Most teams kill 5 to 10 percent in a healthy pipeline. Log every editor decision next to the AI's flag count: pieces approved with few resolved flags show what the rubric already captures; pieces killed despite zero flags reveal what the AI cannot catch, and they feed phase 5.
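
The decision log needs no tooling beyond a CSV. A minimal sketch, with illustrative column order:

```python
# Sketch of the phase-4 decision log: one row per reviewed piece.
# Columns: date, piece, AI flag count, decision, minutes in review.
import csv
import datetime

def log_decision(piece: str, flag_count: int, decision: str, minutes: int) -> None:
    assert decision in ("revise", "approve", "kill")
    with open("decision_log.csv", "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.date.today().isoformat(), piece, flag_count, decision, minutes]
        )

log_decision("onboarding-emails-v2", flag_count=7, decision="approve", minutes=18)
```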

Figure: phase 4 flow, editor decides revise, approve, or kill. An editor opens the queue, reads the AI flag report and the tier-two checklist alongside the draft, and routes the piece into one of three lanes.
Open → resolve flags → tier-two pass → revise / approve / kill
Tools you will use
  • {your editor's queue} — Linear, Asana, Notion, Trello, or whatever already runs your team's work.
  • A tier-two checklist template — seven items max, embedded in the queue record so the editor cannot miss it.
  • A decision log — a spreadsheet or database that records, for each piece: AI flag count, editor decision, time spent in review.
Time + cost estimate
One to two hours to write the tier-two checklist and embed it in the queue template. Ongoing cost is the editor's time per piece — typically 15 to 25 minutes for a 1,000-word draft once the pipeline is running cleanly, down from 45 to 70 minutes before.
What you ship at the end
A working editor loop: the editor opens the queue, sees the flag report and the tier-two checklist next to the draft, makes a revise-approve-kill decision in a measurable window, and the decision is logged with the AI's flag count attached. Three lanes for every piece, no exceptions, including pieces from the founder.
Common failure modes
  • The editor reads the whole draft anyway. Three weeks in, old habits return and time savings evaporate. Pair the queue with a working time-per-piece metric and have the editor watch their own number for two weeks.
  • The tier-two checklist gets longer. Every miss feels like it deserves a new item. After two months the checklist is twelve items and editors skim it. Cap at seven; merge or drop quarterly.
  • The kill lane is empty. If your team has not killed a draft in two months, either writers are improbably consistent or — far more likely — the editor is approving things they would not have a year ago. The kill lane has to be used occasionally for the system to mean anything.
  • Auto-publishing on approve. Tempting and dangerous. Approve should schedule for publish, not publish immediately. The buffer between approve and publish is where someone catches what neither the AI nor the editor saw. Keep it.
Decision gate before phase 5: Run phase 4 for two weeks. Pull the decision log: distribution across revise / approve / kill, average editor time per piece. Healthy looks like 30–50 percent revise, 45–65 percent approve, 3–10 percent kill. If kill is zero or approve is over 80 percent, the quality floor is missing — recheck the rubric and tier-two checklist before measuring anything else.
Phase 05

Recalibrate the rubric monthly against false positives and style drift

A rubric written in May is not the same rubric you need in October. Your audience shifts, your products evolve, your team's writing matures. Phase 5 is the habit that keeps the rubric honest — track what the AI gets wrong, track what your team's voice does without anyone noticing, and rewrite once a month.

The biggest reason content pipelines decay is not that the AI gets worse. It is that the rubric stays static while everything else changes, and after six months the AI is reviewing this month's drafts against last spring's idea of good. Phase 5 catches drift early, while it is still cheap to fix.

Two streams feed recalibration. The first is the false-positive log — every flag an editor tags as false positive is recorded with the rule that fired and the line that triggered it. After a month, you have 40 to 120 entries, and the patterns are visible by eye: one rule fires on three pieces because the wording is too literal (rephrase), another fires almost exclusively on landing pages (channel-scope).
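
The readout is a grouping exercise, not analytics. A sketch, assuming each logged flag records the rule that fired and the editor's tag from the phase-3 gate:

```python
# Sketch of the monthly false-positive readout. Assumes each flag is a dict
# with illustrative keys "rule" and "editor_tag" exported from the decision
# database; tags follow the phase-3 gate: useful / false positive / missed-the-point.
from collections import Counter

def recalibration_readout(flags: list[dict]) -> None:
    fps = [f for f in flags if f["editor_tag"] == "false positive"]
    rate = len(fps) / len(flags) if flags else 0.0
    print(f"false-positive rate this month: {rate:.0%}")
    if rate > 0.35:  # the ongoing decision gate: rewrite, do not tweak
        print("above 35 percent: rewrite the offending rubric section")
    # The rules that fire wrongly most often are the ones to rephrase,
    # channel-scope, or retire in the 90-minute block.
    for rule, n in Counter(f["rule"] for f in fps).most_common(5):
        print(f"{rule}: {n} false positives")
```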

The second stream is the style-drift audit. Once a month, pull five recently published pieces that passed cleanly. Read them as one corpus. Ask: does this still sound like us? If the answer is yes-on-each-piece-but-something-feels-off-as-a-whole, the rubric is no longer capturing what the team values. This audit catches the slow flattening nobody notices day-to-day, and it is what most teams skip.

Schedule recalibration as a recurring 90-minute block, first Friday of the month. Open the false-positive log, five recent on-rubric pieces, and the rubric document. Make changes in three categories: rephrase, channel-scope, add-or-retire. Bump the rubric to v0.x+1 and redeploy the system prompt. Every six months, run a longer recalibration: phase 1's cold test against three unseen pieces. Disagreement above 25 percent means deeper revision than monthly tweaks will fix.

Figure: phase 5 flow, false positives and drift audit feed the rubric. The false-positive log from phase 4 and a monthly five-piece drift audit both feed a 90-minute recalibration block that updates the rubric.
Log + drift audit → recalibration → rubric revision → redeploy
Tools you will use
  • The false-positive log — a column in your decision database from phase 4. No extra tool needed.
  • The rubric document — version controlled. A Notion page with a change log column or a Markdown file in a repo.
  • A 90-minute recurring calendar block, first Friday of every month, in the editor's calendar.
Time + cost estimate
90 minutes a month, plus a half-day every six months for the deeper revision. No additional tool cost.
What you ship at the end
A monthly habit and a versioned rubric. After three months, the false-positive rate is visibly trending down. After six months, the rubric has gone from v0.1 to v0.7 and the editor trusts the AI's flags enough that they read flagged regions first by reflex, not by training.
Common failure modes
  • Skipping the drift audit. False positives are urgent and noisy; drift is slow and quiet. Teams that audit only false positives end up with a rubric that scores higher on its own terms while the actual voice flattens. The five-piece corpus read catches this.
  • Over-rotating on a single editor's read. If the false-positive log is dominated by one person's calls, the rubric tracks that one person's preferences rather than the team's voice. Have a second person classify a sample before recalibration.
  • Adding rules without retiring them. Every recalibration adds rules, none get retired, and after a year the rubric is 4,000 words and contradicts itself. Cap rules per section and force a retire decision before any addition.
  • Forgetting to redeploy. The document gets revised but the live system prompt still points at v0.3. Build the redeploy into the recalibration block as an explicit final step.
Ongoing decision gate: If the false-positive rate climbs above 35 percent for two months running, the rubric is broken in a way monthly tweaks cannot fix — block 90 minutes and rewrite the offending section from scratch. The pipeline is allowed to be honest about its own deterioration.

What the AI cannot do

These are specific limits as of 2026-05. Treat them as the failure modes you would otherwise discover at the worst possible moment — the morning after a piece publishes.

Honest limits

  • It cannot judge voice authenticity. A draft can be technically on-rubric and still read as a thin imitation of the writer on a good day. The model lacks the felt sense of "this person, on this topic, would not say it this way." Only an editor who knows the writer catches that, in phase 4's tier-two checklist.
  • It cannot catch a novel argument that contradicts the brand. The rubric encodes patterns from past writing. If a draft makes a claim the brand has never made — and would not, if you stopped to think — the rubric has no signal that anything is wrong. Tier-two question: "Does this piece say anything we have never said before, and if so, is that intentional?"
  • It cannot evaluate cultural or political nuance. A sentence that reads neutral in one context reads pointed in another. Models miss regional, generational, or community-specific signal that a culturally fluent editor catches in three seconds. Sensitive topics go through a second human reviewer regardless of AI flag count.
  • It cannot decide whether a piece is good enough to ship. Goodness is a judgement about whether this piece, on this day, advances what the brand is trying to say. The AI can tell you which constructions violate the rubric. It cannot tell you whether the piece is worth your audience's time. Phase 4's revise-approve-kill is reserved for the editor; phase 3 prompts explicitly refuse "should we ship this" requests.
  • It cannot verify a claim is true. The fact-check candidates from phase 3 are sentences containing claims — not correct or incorrect claims. The model will confidently confirm a hallucinated statistic if asked. Treat the list as a to-do; verify each claim against a primary source before approving.
  • It cannot remember what you decided last month. Without explicit retrieval, every review starts from zero. A draft that argues a position you debated and rejected at last quarter's offsite passes the AI's read. If a piece touches a known strategic landmine, route it manually.
Figure: decision tree, AI-flag-only versus AI-suggest-edits versus human-rewrite. Each draft is routed by three sequential questions (sensitive topic, writer fluent in voice, flag count above 15); the answer sends it to one of four lanes: AI flags only (editor decides), AI suggest edits (writer pairing), human rewrite (sensitive lane), or back to the writer (draft too early).
Three sequential checks; the answer decides how much the AI is allowed to touch.
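
The same tree as a routing function, with illustrative field names; the thresholds come straight from the figure.

```python
# Sketch of the routing tree: three checks run in order, first match wins.
# Field names are illustrative stand-ins for your queue record.
def route(draft: dict) -> str:
    if draft["sensitive_topic"]:             # check 1: sensitive lane, human rewrite
        return "human rewrite"
    if not draft["writer_fluent_in_voice"]:  # check 2: AI suggests, writer pairs
        return "ai suggest edits"
    if draft["flag_count"] > 15:             # check 3: over-flagged means too early
        return "send back to writer"
    return "ai flags only"                   # default: flag-only, editor decides

assert route({"sensitive_topic": False,
              "writer_fluent_in_voice": True,
              "flag_count": 6}) == "ai flags only"
```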

After you finish

The pipeline needs maintenance — not because AI is fragile but because your audience, your products, and the categories that describe your writing all change. The cadence below is what holds up over twelve months.

Maintenance cadence

  • Weekly — Skim the decision log. Note pieces that landed in the kill lane and pieces that flew through approve with very few resolved flags. Both extremes are signal.
  • Monthly — The 90-minute recalibration block on the first Friday. False-positive log review, five-piece drift audit, rubric revision, prompt redeploy.
  • Quarterly — Read one full month of published pieces in a single sitting, the way a reader who has never seen them would. Update the rubric if a pattern is visible.
  • Every six months — Re-run phase 1's cold test. If model and editor disagreement runs over 25 percent, allocate a half-day and rewrite the weakest rubric section from scratch.
  • On model upgrades — Re-run the phase 3 flag-utility check (ten drafts, editor tags each flag) before letting the new model go live. Behaviour changes across versions in ways the release notes rarely capture.