Anatomy of a pack

An OCC pack is a folder of markdown files. There is no database, no proprietary format, no compiled artifact. Anything in a pack can be opened in a text editor and read; anything can be edited by hand.

The conventions exist to make packs predictable to tools (the Node, the broker, Forge), readable by language models, and reviewable by humans.

This page describes the layout, the file roles, the page structure, and the qualities that distinguish a good pack from a confused one.

Directory layout

A pack lives at a single root path. Its structure:

my-pack/
├── manifest.yaml
├── raw/
│   ├── _index.md
│   └── articles/
│       ├── _index.md
│       └── 2026-05-10-some-source.md
└── wiki/
    ├── index.md
    ├── log.md
    ├── schema.md
    └── concepts/
        ├── concept-one.md
        ├── concept-two.md
        └── ...

Two top-level subdirectories define the two layers of the pack:

  • raw/ — the original source documents, immutable. Every source ingested into the pack has a copy here, dated, with frontmatter recording where it came from.
  • wiki/ — the LLM-generated knowledge layer: the pages that retrieval actually reads when answering questions.

A top-level manifest.yaml describes the pack itself.
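The layout above can be checked mechanically. The following is a minimal sketch, not part of Forge or the Node: the function name `missing_entries` is hypothetical, and treating every listed path as required is an assumption made for illustration.

```python
from pathlib import Path

# Paths taken from the layout shown above. Treating all of them as
# required is an assumption of this sketch, not a documented rule.
EXPECTED_FILES = ["manifest.yaml", "wiki/index.md", "wiki/log.md", "wiki/schema.md"]
EXPECTED_DIRS = ["raw", "wiki", "wiki/concepts"]


def missing_entries(pack_root):
    """Return the expected files and directories that are absent under pack_root."""
    root = Path(pack_root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing
```

An empty list means the pack has the skeleton described here; it says nothing about the quality of the pages inside.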

manifest.yaml

A minimal record of pack identity and provenance:

name: caesar
version: 1.0.0
domains:
  - history
  - rome
language: en
sources:
  - url: https://en.wikipedia.org/wiki/Julius_Caesar
    fetched: '2026-05-10'
    hash: 6f8cb28a83cad4b05edead944b1742527466733170923d84298a0888a722ac54
signature: null
min_node_version: 0.1.0

Each ingested source contributes one entry under sources, with the URL, the date it was fetched, and a content hash that lets a reviewer verify the pack was built from the source they expect. The signature field will hold a cryptographic signature when pack signing is rolled out (see Roadmap); it is null in the current state of the network.
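A reviewer could check a source against its manifest entry along these lines. This is a sketch under stated assumptions: the 64-character hex value in the example is consistent with SHA-256, but the hash algorithm and the exact bytes that are hashed are not specified on this page, and `matches_manifest_hash` is a hypothetical helper.

```python
import hashlib


def matches_manifest_hash(content: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 digest of content with a manifest hash entry.

    Assumption: the manifest stores a SHA-256 hex digest of the source bytes.
    """
    return hashlib.sha256(content).hexdigest() == expected_hex.strip().lower()
```

A reviewer would pass in the bytes of the source they expect the pack to be built from, together with the hash value from the matching sources entry.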

The wiki/ directory

This is the layer the Node reads from. Four kinds of file live here.

index.md

The retrieval entry point. A markdown table with one row per concept page:

# Wiki Index

Last updated: 2026-05-10
Total pages: 26

## Pages

| File | Title | Summary |
|------|-------|---------|
| concepts/julius-caesar.md | Gaius Julius Caesar | Roman general, statesman, and author whose conquest of Gaul, victory in civil war, and dictatorship precipitated the end of the Republic. |
| concepts/gallic-wars.md | Gallic Wars | Series of military campaigns (58–50/51 BC) led by Julius Caesar that brought much of Gaul under Roman control. |

The broker indexes every row of every pack's index.md into the central search index. The titles and summaries you see here are exactly what /search returns to the Node during retrieval. They are also what the Critic sees in its sources manifest.
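To make the row structure concrete, here is a small sketch that reads the table into records. It is an illustration only, not the broker's indexing code; `parse_index_rows` is a hypothetical name, and the sketch assumes no cell contains a literal pipe character.

```python
def parse_index_rows(index_text):
    """Return one dict per data row of the File / Title / Summary table."""
    rows = []
    for line in index_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # headings, metadata lines, blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3:
            continue  # not a three-column row (e.g. a pipe inside a cell)
        if cells[0] == "File" or set(cells[0]) <= {"-", ":"}:
            continue  # header row or the |------| separator row
        rows.append({"file": cells[0], "title": cells[1], "summary": cells[2]})
    return rows
```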

Highest-leverage decision

Keeping summaries accurate and informative is the single highest-leverage decision in pack quality. A vague summary produces bad retrieval; a precise one surfaces the page for exactly the questions it can answer.

The index is regenerated from disk after every Forge run, so it always reflects the actual frontmatter of the pages it lists.
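Deriving an index row from a page's frontmatter might look like the sketch below. This is not Forge's implementation: `read_frontmatter` and `index_row` are hypothetical helpers, and the parser handles only flat `key: value` lines, which is enough for title and summary but is not a full YAML parser.

```python
def read_frontmatter(text):
    """Parse flat 'key: value' lines between the two '---' markers.

    Indented lines and list items (such as the entries under sources)
    are skipped; this is a simplification, not real YAML parsing.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith((" ", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def index_row(rel_path, page_text):
    """Build one '| File | Title | Summary |' row for a concept page."""
    fm = read_frontmatter(page_text)
    # Escape pipes so a summary cannot break the table layout.
    summary = fm.get("summary", "").replace("|", "\\|")
    return f"| {rel_path} | {fm.get('title', '')} | {summary} |"
```

Because the row is built from the page itself, a stale or vague summary in the index can only be fixed by editing the page's frontmatter.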

log.md

A chronological, append-only record of what was ingested when:

## [2026-05-10] ingest | julius-caesar

- Pages written: 28
- Source: https://en.wikipedia.org/wiki/Julius_Caesar

Every Forge run appends one block. Reviewers and operators use the log to understand a pack's history without having to read git history.
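The block format is simple enough to sketch. The helpers below are hypothetical and only mirror the example entry shown above; the fields Forge actually writes may differ.

```python
from datetime import date


def log_block(day: date, name: str, pages_written: int, source_url: str) -> str:
    """Format one ingest entry in the shape of the example above."""
    return (
        f"## [{day.isoformat()}] ingest | {name}\n"
        "\n"
        f"- Pages written: {pages_written}\n"
        f"- Source: {source_url}\n"
    )


def append_log(log_path, block):
    """Append a block to log.md without touching earlier entries."""
    # Append mode keeps the log append-only: existing text is never rewritten.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write("\n" + block)
```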

schema.md

The pack's own conventions: how pages should be named, what counts as in-scope, the tone, the granularity. Different packs make different choices — a pack of biographies and a pack of API documentation are not structured the same way — so each pack documents its own conventions.

A schema is a short document. It exists so that whoever next contributes to the pack writes pages that fit.

concepts/<slug>.md

The concept pages. One page per topic. This is the substance of the pack. Each page follows a fixed structure.

Concept page structure

---
title: Gaius Julius Caesar
slug: julius-caesar
category: concept
aliases: ["Julius Caesar", "Caesar"]
sources:
  - raw/articles/2026-05-10-julius-caesar.md
confidence: high
created: 2026-05-10
updated: 2026-05-10
summary: Roman general, statesman, and author who dominated late Republican politics, led the Gallic Wars, crossed the Rubicon precipitating civil war, and was assassinated in 44 BC.
tags: [roman-republic, gallic-wars, dictator]
---

# Gaius Julius Caesar

> Gaius Julius Caesar (12/13 July 100 BC – 15 March 44 BC) was a Roman general, statesman, and author...

## Identity and chronology

- Full name: Gaius Julius Caesar; patrician of the gens Julia.
- Lifespan: 12/13 July 100 BC – 15 March 44 BC.

## Gallic Wars (58–51 BC)

...

## See Also

- [[gallic-wars|Gallic Wars]] — Caesar-led military campaigns 58–51 BC
- [[crossing-the-rubicon|Crossing the Rubicon]] — Caesar's 49 BC act of insurrection

## Sources

- [julius-caesar](../../raw/articles/2026-05-10-julius-caesar.md) — narrative of Caesar's life, campaigns, reforms, and legacy.

## Key Points

- Caesar's conquest of Gaul and political alliance with Pompey and Crassus propelled him to unparalleled prominence.
- His legislative program in 59 BC and Gallic command redefined Roman power projection and domestic politics.
- Assassinated in 44 BC; ensuing wars ended with Augustus and the Roman Empire.

The components:

Frontmatter — YAML at the top, the page's metadata. Minimum fields:

  • title — the human-readable page name. What appears in the search index.
  • slug — the page's identifier. Matches the filename without .md. Used by wikilinks.
  • summary — one or two sentences describing the page. What /search returns and what the Critic sees in the sources manifest.
  • sources — list of files in raw/ that this page draws from.
  • confidence — high, medium, or low. Reflects how well-supported the page is by its sources.
  • created and updated — dates, automatically maintained.
  • aliases — alternative names the page might be referenced by.
  • tags — short keyword tags.
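The field list above translates into a straightforward check. The sketch below is not Forge's lint pass; it assumes the frontmatter has already been parsed into a dict, and the name `frontmatter_problems` and the message wording are hypothetical.

```python
# Field names and confidence values as listed above.
REQUIRED_FIELDS = ("title", "slug", "summary", "sources", "confidence",
                   "created", "updated", "aliases", "tags")
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def frontmatter_problems(fields, filename):
    """Return human-readable problems for one page's parsed frontmatter."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    if "confidence" in fields and fields["confidence"] not in CONFIDENCE_LEVELS:
        problems.append(f"invalid confidence: {fields['confidence']}")
    # The slug is expected to match the filename without its .md extension.
    stem = filename[:-3] if filename.endswith(".md") else filename
    if "slug" in fields and fields["slug"] != stem:
        problems.append(f"slug {fields['slug']!r} does not match filename {filename!r}")
    return problems
```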

Abstract blockquote — a short paragraph (2–4 sentences) immediately after the H1 title. The first thing a reader (human or model) sees. The Critic uses it as a starting point when verifying claims.

Body sections — ## subheaders with structured content: bullet lists, prose, code blocks where relevant. Optimized for being read whole, not skimmed.

## See Also — wikilinks to directly related pages in the pack. Each entry is one bullet:

- [[slug|Display Name]] — short note describing the relationship

The slug must exist in the pack's index.md. Broken wikilinks are flagged by lint.

## Sources — bullets pointing to the raw/ files the page draws from.

## Key Points — three to five bullet takeaways. The condensed version of the page.
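The fixed sections make simple structural checks possible. As an illustration only (whether Forge's lint enforces the Key Points count is an assumption, and both function names are hypothetical), the bullets under a given ## heading can be collected and counted like this:

```python
def section_bullets(page_text, heading):
    """Return the top-level bullet lines under a '## heading' section."""
    bullets, inside = [], False
    for line in page_text.splitlines():
        if line.startswith("## "):
            # Entering a new section; track whether it is the one requested.
            inside = line[3:].strip() == heading
        elif inside and line.startswith("- "):
            bullets.append(line[2:].strip())
    return bullets


def key_points_ok(page_text):
    """True when the page has three to five Key Points bullets."""
    return 3 <= len(section_bullets(page_text, "Key Points")) <= 5
```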

Wikilinks

OCC packs use wikilinks for internal references:

[[slug]]                  unstyled
[[slug|Display Name]]     with custom display text

These are not standard markdown links — they are pack-internal references that the Node and tooling resolve against the pack's index.md. The retrieval pipeline parses them out of fetched pages to drive the See Also expansion (see Knowledge retrieval).

If a wikilink points to a slug that no longer exists in the pack, lint flags it as a broken link. There are no dangling references in a healthy pack.
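Both wikilink forms can be matched with one regular expression. The sketch below is not the Node's parser or Forge's lint; the function names are hypothetical, and `known_slugs` is assumed to be a set of slugs derived from the pack's index.md (for example, the filename stems in its File column).

```python
import re

# Matches [[slug]] and [[slug|Display Name]]; group 1 is the slug.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def wikilink_slugs(text):
    """Return the slugs of all wikilinks in text, in order of appearance."""
    return [m.group(1).strip() for m in WIKILINK.finditer(text)]


def broken_links(text, known_slugs):
    """Return the sorted slugs that are linked in text but not in known_slugs."""
    return sorted({s for s in wikilink_slugs(text) if s not in known_slugs})
```

The same extraction step could feed a See Also expansion: the slugs it returns are the candidate pages to fetch next.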

What "quality" means

A good OCC pack:

  • Has a clear scope. The schema says what is in-scope and what isn't. Pages don't drift into adjacent topics.
  • Has dense, factual pages. Each page is readable in five minutes and contains substantive material — not filler, not generic introductory text. Every section has a reason to exist.
  • Cites its sources. Every specific claim traces, through ## Sources and the sources frontmatter, back to a file in raw/ and from there to an external document.
  • Uses cross-links deliberately. Every wikilink connects pages that genuinely belong together. Linking everything to everything is worse than linking nothing.
  • Has a useful index. Titles describe what the page actually contains. Summaries are specific (date ranges, named entities, technical concepts) rather than vague.
  • Is internally consistent. Two pages don't contradict each other on the same fact, or if they do, both flag the contradiction explicitly.

Forge's lint pass catches many of these mechanically. Reviewer judgment catches the rest.

Versioning

Each pack is versioned in its manifest.yaml. The convention is semver: major version for breaking structural changes, minor for new sources or significant content additions, patch for fixes.
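The bump rules can be sketched as a small helper. `bump_version` is a hypothetical function, and it assumes plain MAJOR.MINOR.PATCH strings with no pre-release or build suffix.

```python
def bump_version(version, change):
    """Return the next version string for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # breaking structural change
        return f"{major + 1}.0.0"
    if change == "minor":   # new sources or significant content additions
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # fixes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

For example, adding a new source to the caesar pack at 1.0.0 would give `bump_version("1.0.0", "minor")`, which returns "1.1.0".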

The pack catalog tracks history through the public repository — every change to a pack is a normal git commit on a normal branch, reviewed as a pull request. Diffing two versions of a pack is just a git diff.

Future: signing

The current manifest.yaml includes a signature field that is null today. As pack signing is rolled out (see Roadmap), each published pack will carry a signature over the content hash of its files. Nodes will be able to verify that a pack fetched from any broker is the exact pack the maintainer signed — protecting against tampering anywhere along the distribution path.

Until signing is in place, integrity rests on the public review process and the immutability of the source repository.


A pack is markdown, organized for human review and machine consumption alike. The format is open, the structure is documented, the conventions are inspectable. Anyone can read a pack; anyone can write one. The discipline of the format is what allows OCC's small-machine retrieval and multi-agent deliberation to produce trustworthy answers.
