Multi-agent deliberation

A single language model answering a question is a brittle system. The model proposes a draft, the draft contains errors, the user reads the errors and trusts them. There is no second look.

OCC treats answering as a deliberation among multiple agents, each with a specialized role and a specialized prompt. Every answer is the product of three sequential calls: an Expert drafts, a Critic reviews, a Synthesizer integrates. This page explains how each role works, why the division of labor was designed this way, and how the network's collective hardware can take over the most demanding role.

Why three agents

Each role has a distinct cognitive task with distinct failure modes. Keeping them separate prevents one role's weakness from contaminating another's:

  • The Expert has the user's question and all the retrieved pages. It is best positioned to write a complete first draft. Its weakness: when the sources come close to a specific detail (a date, a citation, a name) without stating it, the Expert often fills the gap with a plausible-sounding fact from training data rather than admitting the source is silent.
  • The Critic has the same sources but a different job: not to write, but to verify. It examines the draft claim by claim. Its weakness as a writer becomes an asset as a reviewer — it does not get to propose alternatives, only to flag.
  • The Synthesizer sees the Expert's draft and the Critic's review and produces the final answer. To keep it honest, the Synth does not see the raw sources; if it did, it would just regenerate a plausible answer and risk overriding what the previous two roles found. Its job is integration, not reinvention.

Division of labor

Three roles cover what a single call cannot: drafting, verification, and integration, each with prompts and inputs tuned for that one task.
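The three sequential calls can be sketched as follows. This is a minimal illustration, not OCC's actual code: `call_model`, `Deliberation`, `deliberate`, and the prompt strings are hypothetical names, and the Critic's input is simplified to the whole context rather than a manifest plus excerpt.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (system_prompt, user_prompt) -> model output.
ModelCall = Callable[[str, str], str]


@dataclass
class Deliberation:
    draft: str   # Expert output
    review: str  # Critic output
    answer: str  # Synthesizer output, shown to the user


def deliberate(question: str, context: str, call_model: ModelCall) -> Deliberation:
    # 1. Expert: sees the question and the retrieved context.
    draft = call_model(
        "You are the Expert. Draft an answer from the context only.",
        f"QUESTION:\n{question}\n\nCONTEXT:\n{context}",
    )
    # 2. Critic: sees the sources and the draft; it flags, it does not rewrite.
    review = call_model(
        "You are the Critic. Verify the draft claim by claim.",
        f"CONTEXT:\n{context}\n\nDRAFT:\n{draft}",
    )
    # 3. Synthesizer: sees question, draft, and review, but never the raw context.
    answer = call_model(
        "You are the Synthesizer. Integrate the draft and the review.",
        f"QUESTION:\n{question}\n\nDRAFT:\n{draft}\n\nREVIEW:\n{review}",
    )
    return Deliberation(draft=draft, review=review, answer=answer)
```

Passing the model call in as a parameter keeps the sketch independent of any particular inference backend, and makes it easy to check that the third call never receives the sources.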

The Expert step

The Expert receives:

  • The user's question.
  • The retrieved knowledge context — typically four to twelve pages of pack content, capped at the tier's retrieval_chars budget.

Its system prompt establishes faithfulness rules that override stylistic defaults:

  1. Specific factual claims must come from the context, or be presented as general background clearly framed as such.
  2. If the context doesn't contain a specific the user asked about, the Expert writes "the source does not specify X" rather than inventing one.
  3. The Expert never produces "I don't have access to external sources" disclaimers — the context is the source.
  4. When uncertain, the Expert says so explicitly rather than producing a confident-sounding specific.

The output is a thorough draft that uses the available material, structured for the question's depth — not deliberately compressed. On rich source material the Expert produces a substantial draft (5–8 KB on a mid-tier machine, more on higher tiers); on thinner material it produces a focused short answer. The length is anchored to what the sources support, not to a fixed quota.
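As an illustration of how the Expert's input might be assembled, the sketch below concatenates retrieved pages until the character budget is spent and pairs the result with the faithfulness rules. The function names, the page layout, and the wording of `EXPERT_RULES` are assumptions; the real system prompt is not reproduced on this page.

```python
# Paraphrase of the four faithfulness rules listed above (hypothetical wording).
EXPERT_RULES = (
    "1. Take specific factual claims from the context, or frame them as general background.\n"
    '2. If the context lacks a requested specific, write "the source does not specify X".\n"'
    "3. Never say you lack access to external sources; the context is the source.\n"
    "4. When uncertain, say so explicitly."
)


def assemble_context(pages: list[tuple[str, str]], retrieval_chars: int) -> str:
    """Concatenate (title, body) pages until the character budget is used up."""
    parts: list[str] = []
    remaining = retrieval_chars
    for title, body in pages:
        if remaining <= 0:
            break
        block = f"## {title}\n{body}\n"
        parts.append(block[:remaining])  # the last page may be cut short
        remaining -= len(block)
    return "".join(parts)


def build_expert_prompt(
    question: str, pages: list[tuple[str, str]], retrieval_chars: int
) -> tuple[str, str]:
    """Return (system_prompt, user_prompt) for the Expert call."""
    context = assemble_context(pages, retrieval_chars)
    return EXPERT_RULES, f"QUESTION:\n{question}\n\nCONTEXT:\n{context}"
```

The assembled context never exceeds `retrieval_chars`, so the same function would serve a small and a large tier with only the budget changing.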

The Critic step

The Critic's job is verification. To do it well, the Critic needs to know what exists in the retrieval set — even if, for memory reasons, it cannot read every retrieved page in full.

The Critic receives three things:

  1. A sources manifest — a list of every retrieved page with its pack/file, title, and one-line summary. The manifest tells the Critic what knowledge was pulled in, even when only an excerpt of that knowledge is in the prompt.
  2. A knowledge excerpt — the first portion of the assembled context, sized to fit the tier's budget alongside the rest of the prompt.
  3. The Expert's draft — the answer to review.

The Critic also has a tool: fetch_full_page. When it doubts a specific claim (a date, a citation, an exact quote) and the excerpt does not contain enough text to verify it, the Critic calls this tool with a pack/file from the manifest and receives the full page body in return. The fetch is local: pages are already in memory from retrieval, so the tool incurs no network cost. The Critic typically makes zero or one fetch for ordinary questions, and more for questions with many specific factual claims.
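A local fetch of this kind can be as small as a dictionary lookup. The sketch below assumes an in-memory retrieval set keyed by pack/file; `Page`, `RetrievalSet`, and the error string are hypothetical, and only the tool name `fetch_full_page` comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    pack_file: str  # identifier shown in the manifest
    title: str
    summary: str    # one-line summary
    body: str       # full page text, already in memory after retrieval


class RetrievalSet:
    def __init__(self, pages: list[Page]) -> None:
        self._by_id = {p.pack_file: p for p in pages}

    def manifest(self) -> list[str]:
        """One line per retrieved page: pack/file, title, summary."""
        return [f"{p.pack_file} | {p.title} | {p.summary}" for p in self._by_id.values()]

    def fetch_full_page(self, pack_file: str) -> str:
        """Return the full body of a manifest page; no network access involved."""
        page = self._by_id.get(pack_file)
        if page is None:
            # Report the problem as tool output so the Critic can react to it.
            return f"error: {pack_file!r} is not in the sources manifest"
        return page.body
```

Returning an error string instead of raising is one possible design choice: the tool result goes back into the model's prompt, so a readable message is more useful there than an exception.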

The Critic operates under verification rules that complement the Expert's faithfulness rules:

  • For every specific claim in the Expert's draft, check whether the excerpt or fetched pages support it.
  • If a claim is not supported, flag it as "unsupported by source". Do not propose an alternative specific from training data.
  • If a claim contradicts the source, flag it as "contradicts source" and quote the relevant passage.
  • If a claim references a manifest page whose body wasn't in the excerpt and the Critic chose not to fetch it, flag it as "cannot verify in available excerpt" rather than as unsupported. The presence of a relevant page in the manifest is enough to consider the claim plausible.

Critic asymmetry

The Critic flags rather than rewrites. Substituting one hallucination for another would be worse than identifying the gap. A Critic that says "the citation is unsupported" is more useful than one that confidently asserts a different citation from its own training data.
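The verification rules map naturally onto a small set of verdicts. The sketch below is one possible encoding: `Verdict`, `Flag`, and `classify` are hypothetical names, and the order in which the checks are applied is an assumption. Note that `Flag` has no field for a replacement value, which mirrors the flag-only rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported by source"
    CONTRADICTS = "contradicts source"
    CANNOT_VERIFY = "cannot verify in available excerpt"


@dataclass(frozen=True)
class Flag:
    claim: str
    verdict: Verdict
    quote: Optional[str] = None  # source passage, set only for contradictions


def classify(
    claim: str,
    supported: bool,
    contradicting_quote: Optional[str],
    unread_manifest_page: bool,
) -> Flag:
    """Turn the Critic's findings about one claim into a flag."""
    if contradicting_quote is not None:
        return Flag(claim, Verdict.CONTRADICTS, contradicting_quote)
    if supported:
        return Flag(claim, Verdict.SUPPORTED)
    if unread_manifest_page:
        # A relevant page exists but was not read: plausible, not unsupported.
        return Flag(claim, Verdict.CANNOT_VERIFY)
    return Flag(claim, Verdict.UNSUPPORTED)
```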

The Synthesizer step

The Synthesizer receives:

  • The user's original question.
  • The Expert's draft.
  • The Critic's review.

It does not receive the raw context. This is intentional. With direct access to the sources, the Synth would become a second Expert with extra steps, re-engaging with the same material and potentially overriding what the previous two roles found. Given only the Expert's draft and the Critic's review, the Synth is forced to be an integrator: it preserves the Expert's substance where the Critic agreed, removes or softens what the Critic flagged, and addresses the gaps the Critic identified.

The Synth's system prompt instructs it to:

  • Match the answer's length and depth to the question and the available material, without artificially compressing a well-supported, multi-faceted topic.
  • Preserve the Expert's specifics (names, dates, quotes, examples) where the Critic validated them.
  • Remove or soften the claims the Critic flagged rather than carrying them through.
  • Stay strictly on topic — never comment on the knowledge context, never explain what the context contains or doesn't contain.
  • Reply in the language of the user's question, regardless of the language of the sources.

The output is the answer the user sees. It streams to the chat as it is generated.

Network mode — peer Critic

When a peer with stronger hardware is available on the network, the Critic step can run on that peer instead of locally.

The mechanics:

  1. The Node identifies a candidate peer via the broker — a Node on a stronger tier than its own, currently online.
  2. The Node encrypts the Critic payload (manifest, excerpt, Expert draft) using the peer's public key.
  3. The encrypted payload is routed through the broker. The broker handles delivery but cannot read the content — it has no key.
  4. The peer decrypts, runs the Critic step on its more capable model, encrypts the response with the requester's public key, and sends it back.
  5. The Node decrypts the Critic review and continues to the Synthesizer step locally.

The local Node always retains the Expert and Synthesizer roles. Only the Critic is borrowed. The user's query and the Expert's draft are visible to exactly two parties: the user's Node and the chosen peer. The broker sees ciphertext.
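Step 1 of the mechanics above, choosing a peer, might look like the sketch below. The `Peer` record, the numeric tier ordering, and the choice of the strongest available peer are all assumptions; the page only states that the peer must be on a stronger tier and currently online. Encryption and broker routing are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Peer:
    node_id: str
    tier: int     # assumption: a larger number means stronger hardware
    online: bool


def pick_critic_peer(own_tier: int, peers: list[Peer]) -> Optional[Peer]:
    """Return an online peer on a stronger tier, or None to run the Critic locally."""
    candidates = [p for p in peers if p.online and p.tier > own_tier]
    # Assumption: prefer the strongest candidate; ties keep the first one listed.
    return max(candidates, key=lambda p: p.tier, default=None)
```

When the function returns `None`, the Node would simply run the Critic step itself, so the pipeline never depends on a peer being available.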

Hardware as a shared resource

A modest Node still gets a thorough, peer-reviewed answer; a powerful Node lends its capacity in return. Hardware inequality becomes a resource the network can balance rather than a barrier to participation.

Tier-aware scaling

Every input to the deliberation pipeline is sized to the machine running it. The caps are derived proportionally from the tier's retrieval_chars budget:

  • Critic excerpt — about 40% of retrieval_chars.
  • Expert draft passed to the Critic — about 50% of retrieval_chars.
  • Expert draft passed to the Synthesizer — about 70% of retrieval_chars.
  • Critic review passed to the Synthesizer — about 25% of retrieval_chars.

A laptop at retrieval_chars = 8 KB works with caps that fit comfortably in an 8 K context window. A server at retrieval_chars = 65 KB lets each role consume tens of KB of input without truncating anything load-bearing. The same code paths serve both — only the numbers change.
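The proportional caps listed above reduce to a few lines of arithmetic. In this sketch, `Caps`, `caps_for`, and `clip` are hypothetical names, the budget is treated as a plain character count, and the "about" percentages are taken as exact with integer rounding down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caps:
    critic_excerpt: int    # about 40% of retrieval_chars
    draft_for_critic: int  # about 50%
    draft_for_synth: int   # about 70%
    review_for_synth: int  # about 25%


def caps_for(retrieval_chars: int) -> Caps:
    """Derive every per-role input cap from the tier's single budget."""
    def pct(percent: int) -> int:
        return retrieval_chars * percent // 100

    return Caps(
        critic_excerpt=pct(40),
        draft_for_critic=pct(50),
        draft_for_synth=pct(70),
        review_for_synth=pct(25),
    )


def clip(text: str, cap: int) -> str:
    """Truncate an input to its cap before it is placed in a prompt."""
    return text[:cap]
```

With a budget of 8,000 characters this yields caps of 3,200, 4,000, 5,600, and 2,000; with 65,000 characters the same function yields 26,000, 32,500, 45,500, and 16,250. No branch in the code depends on the tier.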

This scaling is invisible to the user. It is the reason OCC produces better answers on stronger hardware without the code branching by tier or features being silently disabled on smaller machines.

What the user sees

Once the Synthesizer has finished, the Node displays:

  • The final answer, streamed as it is produced.
  • A Sources panel containing:
    • The list of retrieved pages — pack path, title, snippet of the body — collapsed by default, expandable.
    • The Expert's draft.
    • The Critic's review.

Nothing in this exchange is hidden from the user. If a claim was removed from the final answer because the Critic flagged it as unsupported, the exchange that led to its removal is right there in the panel. The Sources panel is not a flourish — it is the user-facing surface of OCC's faithfulness guarantee.

