The broker and the network

OCC is a network of Nodes, but Nodes don't talk directly to each other. Between them sits the broker: a small server that hosts the published packs, runs a search index, and routes encrypted exchanges between Nodes that want to collaborate. This page explains what the broker does, why it exists in this shape rather than as pure peer-to-peer, and what it explicitly does not do.

What the broker is

A broker is a single FastAPI service running on a modest virtual machine. It does five things:

Serves pack files. Every approved pack lives on disk under /opt/occ-packs/<path>/wiki/. Nodes fetch index files and concept pages over HTTP.
Exposes a tree. A hierarchical view of the pack catalog, used for browsing and tooling.
Runs a search index. A SQLite FTS5 index over titles and summaries, served at /search. This is what powers retrieval — see Knowledge retrieval.
Maintains a peer registry. Nodes connect over WebSocket, register themselves with their tier and public key, and stay listed as long as the connection is alive.
Routes peer-to-peer exchanges. When one Node wants a Critic review from another, the encrypted payload passes through the broker. The broker delivers; it does not decrypt.
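
As an illustration of the "Exposes a tree" item above, the following is a minimal sketch of how a flat list of pack paths could be folded into a hierarchical view. The pack paths and the function names `build_tree` and `subtree` are hypothetical; the broker's actual tree format is not specified on this page.

```python
from typing import Dict, List

# A tree node maps a path segment to its children (empty dict = leaf).
Tree = Dict[str, "Tree"]


def build_tree(pack_paths: List[str]) -> Tree:
    """Fold slash-separated pack paths into a nested dict."""
    root: Tree = {}
    for path in pack_paths:
        node = root
        for part in path.strip("/").split("/"):
            # Create the child on first sight, then descend into it.
            node = node.setdefault(part, {})
    return root


def subtree(tree: Tree, path: str) -> Tree:
    """Return the branch under `path`; raises KeyError if it does not exist."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


# Hypothetical pack paths, for illustration only.
paths = ["science/physics/optics", "science/biology/genetics", "law/contracts"]
tree = build_tree(paths)
```

Here `subtree(tree, "science")` plays the role a per-path tree lookup would play: it returns only the branch below the given prefix.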

The current production broker runs on a modest cloud VM. The whole service is one Python file, around 400 lines including the search-index logic. There is nothing special about the host — the broker is replaceable.

What the broker is not

Note: what the broker doesn't do

The broker is not where inference happens. No language model runs on the broker. Queries do not go to the broker for "answering". Inference is local to each Node, or routed to a peer Node — never to the broker itself. The broker does not see your queries when peers collaborate: payloads are encrypted with the peer's public key before they leave your Node. The broker has no key. It moves bytes.

It helps to be explicit, because every part of OCC's architecture leans on this:

The broker is not the source of authority. The packs are. The broker hosts the packs the community has approved, but it does not decide what counts as approved knowledge. That role belongs to the registry and the governance process.
The broker does not store user data. No user accounts, no per-user state, no telemetry. Standard HTTP access logs exist on the server but contain no user identity beyond IP address. The peer registry is ephemeral — Nodes that disconnect are dropped.

Why a broker, not pure P2P

A reasonable question for an open-source network is: why a server at all?

The honest answer is that pure peer-to-peer imposes serious costs on every participant in exchange for ideological purity that doesn't buy more decentralization than the broker model already provides. A broker the size and shape of OCC's gives the network the properties people care about — replaceability, no proprietary core, no authority over content — without forcing every Node to deal with NAT traversal, peer discovery, distributed search, and the operational complexity that follows.

Specifically:

Discovery. A Node joining the network for the first time needs to find peers. Pure P2P solutions (DHTs, gossip protocols, bootstrap nodes) all introduce either a hidden central element or significant latency. A small broker resolves this trivially: connect, register, see who else is online.
Search at scale. Searching across thousands of packs from a Node's local machine would require either downloading the full pack metadata or querying every peer. A central full-text index serves the same purpose in milliseconds with negligible storage. As the catalog grows toward many thousands of packs, this is the difference between a system that is usable and one that is usable in name only.
NAT and connectivity. Most consumer machines are behind NAT or firewalls. Direct peer connections require hole-punching, relay servers, or both — which means some infrastructure regardless. A broker is that infrastructure made explicit and small.
Replaceability. What pure P2P advocates actually want is the absence of a single chokepoint. OCC achieves that differently: the broker is small, the protocol is documented, the packs live in public version-controlled repositories, and the broker code is open source.

Note: the architectural test

If this broker were turned off tomorrow, what would be permanently lost? The answer is nothing. The only state that disappears is the current peer registry, which rebuilds itself on the next set of connections. Pack content, history, and authority all live in repositories outside the broker.

The HTTP surface

The endpoints a Node interacts with are deliberately minimal:

GET /tree and GET /tree/{path} — browse the pack catalog as a hierarchical tree.
GET /packs — the flat list of pack paths.
GET /packs/<pack>/wiki/<file> — fetch any file inside a pack's wiki: the index, a concept page, the schema.
POST /search — full-text search across all packs. Body: {q, k}. Returns the top k (pack, file, title, summary, score) matches by BM25.
POST /admin/reindex — triggers a rebuild of the search index, scoped to a single pack or the whole catalog. Token-protected; called by the deploy admin tool after a pack is uploaded.

There is no authentication on the read endpoints — pack content is public by design.
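
As a rough sketch of how a Node might address these endpoints, the snippet below builds (but does not send) requests using only the standard library. The base URL `https://broker.example` and the helper names are hypothetical, and encoding the `{q, k}` body as JSON is an assumption based on the broker being a FastAPI service.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BROKER_URL = "https://broker.example"  # hypothetical; substitute a real broker


def pack_file_request(pack: str, file: str) -> Request:
    """GET /packs/<pack>/wiki/<file>; slashes in the pack path are preserved."""
    url = f"{BROKER_URL}/packs/{quote(pack)}/wiki/{quote(file)}"
    return Request(url, method="GET")


def search_request(q: str, k: int = 5) -> Request:
    """POST /search with body {q, k} (JSON encoding is an assumption)."""
    body = json.dumps({"q": q, "k": k}).encode("utf-8")
    return Request(
        f"{BROKER_URL}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical pack path and query, for illustration only.
index_req = pack_file_request("science/physics/optics", "index.md")
search_req = search_request("thin lens equation", k=3)
```

Passing either object to `urllib.request.urlopen` would perform the call; no credentials are attached because the read endpoints are unauthenticated.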

The WebSocket surface

A second surface, separate from HTTP, handles peer connectivity:

A Node opens a WebSocket to /ws and sends a register message with its tier, VRAM, and public key.
The broker keeps the connection open and updates the Node's last_seen on each ping.
When a Node wants a peer Critic, it sends a query message with the target peer's ID and the encrypted payload. The broker forwards it.
The peer responds with a response message; the broker routes it back to the original requester.
When a Node disconnects, it is removed from the active registry.

The broker tracks just enough state to route messages: which Node IDs are connected, their public keys, their declared tier. Nothing in the message bodies passes through the broker in cleartext.
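
The routing state described above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the broker's code: the message field names (`type`, `to`, `from`, `payload`), the tier labels, and the `Registry` class are all hypothetical, and a plain callable stands in for each Node's WebSocket.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Peer:
    tier: str
    vram_gb: int
    public_key: str
    send: Callable[[dict], None]  # stands in for the WebSocket's send method
    last_seen: float = field(default_factory=time.time)


class Registry:
    """Ephemeral registry: holds only what is needed to route messages."""

    def __init__(self) -> None:
        self.peers: Dict[str, Peer] = {}

    def register(self, node_id: str, peer: Peer) -> None:
        self.peers[node_id] = peer

    def ping(self, node_id: str) -> None:
        self.peers[node_id].last_seen = time.time()

    def disconnect(self, node_id: str) -> None:
        # A disconnected Node is simply dropped; nothing is persisted.
        self.peers.pop(node_id, None)

    def route(self, sender_id: str, message: dict) -> bool:
        """Forward a query or response to message['to'].

        The payload is treated as an opaque blob and is never inspected.
        """
        target = self.peers.get(message["to"])
        if target is None:
            return False
        target.send({
            "type": message["type"],
            "from": sender_id,
            "payload": message["payload"],
        })
        return True


# Two hypothetical Nodes; lists act as their inboxes.
inbox_a, inbox_b = [], []
reg = Registry()
reg.register("node-a", Peer("tier-1", 8, "pk-a", inbox_a.append))
reg.register("node-b", Peer("tier-2", 24, "pk-b", inbox_b.append))

# node-a asks node-b for a review; the payload is already encrypted for node-b.
reg.route("node-a", {"type": "query", "to": "node-b", "payload": "<ciphertext>"})
reg.route("node-b", {"type": "response", "to": "node-a", "payload": "<ciphertext>"})

reg.disconnect("node-b")
delivered_after_disconnect = reg.route(
    "node-a", {"type": "query", "to": "node-b", "payload": "<ciphertext>"}
)
```

The point of the sketch is that `route` needs only the target ID: the payload travels through untouched, which matches the claim that the broker delivers but does not decrypt.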

Indexing and reindexing

The search index lives in a single SQLite file alongside the broker process. On startup, the broker walks /opt/occ-packs and rebuilds the index from every pack's index.md — extracting (pack, page, title, summary) tuples for every row of every index. With several hundred packs this takes well under a second; with thousands, still single-digit seconds.

After the initial build, the index updates incrementally. When a new pack is deployed, the deploy tool calls /admin/reindex with the pack path. The broker re-reads that pack's index.md and replaces the corresponding rows.

Within the affected scope (one pack or the whole catalog), rows are rebuilt from scratch rather than patched one by one, because the cost is trivial and a clean rebuild guarantees consistency. The pack files on disk are the source of truth; the index is a cache that any operator can throw away and recreate.
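
The indexing behaviour in this section can be sketched with Python's built-in `sqlite3` module, assuming the local SQLite build includes FTS5. The table name `pages`, the column layout, and the function names are hypothetical; parsing of index.md is left out because its format is not described here.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # pack and file are stored but not tokenized; title and summary are searchable.
    db.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages "
        "USING fts5(pack UNINDEXED, file UNINDEXED, title, summary)"
    )
    return db


def reindex_pack(db: sqlite3.Connection, pack: str, rows) -> None:
    """Replace every row of one pack; rows are (file, title, summary) tuples."""
    with db:  # one transaction: delete the old rows, insert the new ones
        db.execute("DELETE FROM pages WHERE pack = ?", (pack,))
        db.executemany(
            "INSERT INTO pages (pack, file, title, summary) VALUES (?, ?, ?, ?)",
            [(pack, f, t, s) for f, t, s in rows],
        )


def search(db: sqlite3.Connection, q: str, k: int = 5):
    """Top-k matches by BM25. FTS5's bm25() is lower-is-better, so sort ascending."""
    return db.execute(
        "SELECT pack, file, title, summary, bm25(pages) AS score "
        "FROM pages WHERE pages MATCH ? ORDER BY score LIMIT ?",
        (q, k),
    ).fetchall()


# Hypothetical packs and pages, for illustration only.
db = open_index()
reindex_pack(db, "physics/optics", [
    ("refraction.md", "Refraction", "How light bends at a boundary"),
    ("lenses.md", "Lenses", "Thin lens equation and focal length"),
])
reindex_pack(db, "law/contracts", [
    ("offer.md", "Offer", "What counts as a binding offer"),
])
hits = search(db, "lens", k=3)

# Redeploying a pack replaces its rows wholesale.
reindex_pack(db, "physics/optics", [
    ("prisms.md", "Prisms", "Dispersion of light"),
])
hits_after = search(db, "light", k=3)
```

Deleting and reinserting a pack's rows in a single transaction keeps the index consistent with the files on disk without tracking per-row changes. A production version would also need to sanitize user queries, since FTS5 treats some characters as query syntax.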

Federation, planned

Today there is one broker. As the catalog grows and as different communities form their own pack collections, this single broker will need to give way to a federation of independent brokers — each maintained by a different organization, each potentially hosting a different curated subset of packs.

The protocol is being designed so that federation does not require changes on the Node side: a Node can be configured to talk to any broker, and brokers can mirror each other's pack catalogs. As pack signing is rolled out (see Roadmap), the trust model based on signatures over content hashes will let a Node verify that a pack fetched from one broker is bit-identical to the same pack at another.
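
To make the "bit-identical" check concrete, here is one way a content hash over a pack could be computed. This is purely illustrative: the pack signing format is not yet specified, the function name `pack_digest` and the length-prefixed layout are assumptions, and signature verification over the resulting hash is omitted.

```python
import hashlib
from typing import Dict


def pack_digest(files: Dict[str, bytes]) -> str:
    """SHA-256 over a pack's files, independent of the order they were fetched in.

    Each path and each file body is length-prefixed so that different
    file boundaries can never produce the same byte stream.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        name = path.encode("utf-8")
        data = files[path]
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# The same hypothetical pack as fetched from two different brokers.
from_broker_a = {"wiki/index.md": b"# Optics\n", "wiki/lenses.md": b"Thin lens.\n"}
from_broker_b = {"wiki/lenses.md": b"Thin lens.\n", "wiki/index.md": b"# Optics\n"}
tampered = {"wiki/index.md": b"# Optics\n", "wiki/lenses.md": b"Thin lens!\n"}

same = pack_digest(from_broker_a) == pack_digest(from_broker_b)
differs = pack_digest(from_broker_a) != pack_digest(tampered)
```

Under this scheme a Node would compare the digest of what it downloaded against a signed digest, so the broker serving the files would not need to be trusted.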

Federation is on the roadmap rather than implemented today. The current single-broker setup is sufficient for the early phase of the network.

Operating a broker

Running an OCC broker amounts to launching one Python file with uvicorn, with the pack catalog on disk under /opt/occ-packs and an SSH-based deploy pipeline for adding packs. The minimum requirements are modest: a few GB of RAM, a few cores, and enough disk for the pack catalog (each pack is small).

Anyone can operate a broker. For most users, joining the existing network is enough — the official broker handles the public catalog. Self-hosting a broker is appropriate when running OCC inside an organization on a private set of packs, or when participating in a future federated topology with custom curation.
