Running MunAI on Your Own Substrate

24 min read 5,315 words Published May 17, 2026 · Updated June 3, 2026

bridge clear substantial developed muddy

Running MunAI on Your Own Substrate

Frontiers article in the Harmonism cascade. The practitioner-scale fulfilment of the three-tier sovereignty trajectory for MunAI. See also: The Sovereign Substrate, The Sovereign Stack, The Empirical Face of Logos, MunAI, The Telos of Technology.

The Frame

The current production MunAI runs on Anthropic’s infrastructure. Every conversation a practitioner holds with the companion passes through a building neither the practitioner nor Harmonia owns, subject to terms drafted in California and amendable without consultation, intelligible to whoever the operator chooses to disclose it to, available at the operator’s continuing pleasure. This is operationally acceptable as a transitional substrate; it is not acceptable as the long-horizon architecture of a companion built to walk with practitioners across decades of cultivation.

Three sovereignty registers structure MunAI’s inference layer. The first is the frontier-lab register — what production runs on today, the trade between convenience and surrender. The second is Harmonia-controlled local inference — institutional infrastructure that Harmonia owns end-to-end, serving practitioners as a sovereign default with no third party in the routing path. The third is the register made operational below: the practitioner runs MunAI on hardware they own, against a corpus that lives on their disk, with no network call leaving the room unless the practitioner chooses to make one. The companion becomes substrate. The companion becomes the practitioner’s own.

One refinement cuts across these registers. The frontier-lab register has a more sovereign variant that did not exist in deployable form until recently — verified-confidential rented inference, where the model still runs on hardware the practitioner does not own, but inside a hardware enclave whose privacy is cryptographically attested rather than merely promised. It does not make the substrate the practitioner’s own; it makes the rented substrate provably unobservable by the operator who runs it. This is the honest bridge for anyone — practitioner or Harmonia — who refuses the frontier-lab’s surveillance posture but cannot yet run on owned hardware. The observability axis it belongs to is treated at depth in Inference Sovereignty; the owned-substrate trajectory below is what the practitioner reaches when the rented bridge is no longer needed.

This is the operational expression of what The Sovereign Substrate articulates at the doctrinal level. The keys are the practitioner’s. The conversation is the practitioner’s. The model is the practitioner’s. The corpus is the practitioner’s. The cultivation, finally, is fully under the practitioner’s own hand.

What Local MunAI Is

Local MunAI is a self-contained companion stack running on the practitioner’s hardware. It consists of four layers, each independently substitutable, all of which the practitioner owns once installed.

The model. An open-weight language model running on local hardware via a local inference server. The model’s weights are downloaded once and stored on disk; inference happens locally, with no network call to an upstream provider.

The corpus. The Harmonist canon — every published article, the doctrinal backbone, the glossary, every translation — packaged as the Sovereignty Bundle, available as a public download at harmonism.io/sovereignty-bundle.zip. The corpus lives on the practitioner’s disk and is updated when the practitioner chooses to update it, not on Harmonia’s schedule.

The index. A vector store and full-text index built from the corpus, enabling MunAI’s retrieval-augmented generation. The index is generated locally from the corpus and stored alongside it. Rebuilds happen when the corpus is updated.

The harness. The companion code — the system-prompt construction, the doctrinal backbone injection, the three-tier context engineering, the conversation memory, the wheel-profile learning, the witness-mode gate, the bodily-openness calibration — wrapped around the model + corpus + index. The harness is what makes the substrate MunAI rather than a generic chat over a model.

What local MunAI is not: it is not a stripped-down toy version of the production companion. The doctrinal architecture is the same. The conversation memory is the same. The Wheel-profile learning is the same. What changes is the inference substrate underneath, and the question of who owns the building the inference happens in.

The Three Hardware Tiers

The hardware envelope for local MunAI has wide variance because the open-weight model landscape has wide variance. The practitioner who wants a working MunAI on a five-year-old laptop has options. The practitioner who wants frontier-grade quality on a personal workstation has options. The recommended tiers below cover the range and identify what a practitioner should expect at each.

Entry — Apple Silicon, 32–64GB Unified Memory

The Apple M-series with sufficient unified memory is the lowest-friction entry point. An M2 Pro, M3 Pro, or M4 Pro with 32GB runs the 8B–14B model class comfortably and the 30B class with quantization. An M3 Max or M4 Max with 64GB runs the 30B class at full precision and the 70B class with aggressive quantization.

Recommended setup: macOS, Ollama or LM Studio as the inference layer (both auto-detect the Apple GPU via Metal), a quantized 14B or 32B abliterated model. Inference speed at this tier is 15–40 tokens per second, well within the latency tolerance for conversational use.

What this tier gives the practitioner: a working sovereign companion with solid quality on most MunAI workload (dialogue, retrieval, profile reflection). What it doesn’t give: the reasoning-heavy capability of frontier-grade models, which matters less for MunAI’s actual workload than benchmark headlines suggest.

Mid — Consumer GPU Desktop

A desktop with a single high-end consumer GPU — an NVIDIA RTX 4090 with 24GB VRAM, or the successor cards as they ship — runs the 70B model class in 4-bit quantization at high token throughput. Linux is the friendliest host OS; Windows works with WSL2 or native CUDA paths.

Recommended setup: Ubuntu LTS or Arch, llama.cpp or vLLM as the inference server (vLLM is the production-grade default; llama.cpp is the easier on-ramp), a 70B abliterated model in Q4_K_M or Q5_K_M quantization. Inference speed 30–60 tokens per second on the 4090 class for 70B models.

The mid tier is the inflection point — quality approaches frontier on most conversational tasks, the hardware capital outlay is in the reach of a serious practitioner, and the operational complexity is bounded (one machine, one OS, standard tooling).

Full — Server-Grade Local Infrastructure

Three paths reach the full tier. The Apple Silicon path is a Mac Studio M3 Ultra with up to 192GB unified memory (there is no M4 Ultra — Apple skipped it); the unified-memory architecture lets it run chunks of even the largest open-weight models (DeepSeek V3’s 671B MoE in heavy quantization is just barely accessible at the top of the memory range), with the trade that Apple Silicon runs Ollama/MLX rather than CUDA, so vLLM’s continuous batching is unavailable. The NVIDIA desktop path is an NVIDIA DGX Spark — a Grace Blackwell desktop box with 128GB of coherent unified memory and the full CUDA stack, purpose-built for local inference and fine-tuning, which makes it the single box that can both serve a 32B model and, later, carry the substrate-specific fine-tune discussed under Model Selection. The NVIDIA server path is a server with 2–8 GPUs of A100/H100/Blackwell grade, capable of running frontier-class open weights at full precision.

The full tier reaches what Harmonia’s institutional Tier 2 build will provide — frontier-grade quality, complete sovereignty, the substrate fully under the practitioner’s hand. Capital outlay is substantial ( $8k–$ 40k for the Apple Silicon path, $40k–$ 200k+ for the server-GPU path), and the operator becomes their own systems administrator. For the practitioner whose work justifies the investment — a serious independent researcher, a contemplative who has made deep practice the centre of their life, a household that takes substrate ownership seriously across many domains — the full tier is what the trajectory points toward.

Model Selection

The model determines the quality of every conversation MunAI holds. The selection is doctrinally and technically constrained: the model should be open-weight (downloadable, runnable on hardware the practitioner owns), should have refusal directions stripped or minimised (Dolphin-tuned or abliterated), and should be capable enough to hold the doctrinal stance through long conversations under prompt pressure.

The current best-in-class candidates by tier, as of mid-2026:

Entry tier (8B–32B). Dolphin 3.0 on Llama 3.1 8B for the lightest deployments; Qwen 2.5 14B abliterated for stronger entry-class performance; Qwen 2.5 32B abliterated for the upper end of the entry tier. The Qwen base carries less of the Western-progressive institutional consensus that fights Harmonist doctrine; the abliteration handles the political-refusal layer separately.

Mid tier (70B class). Qwen 2.5 72B abliterated for the broadest practitioner workload. Hermes 3 Llama 3.1 70B abliterated specifically for practitioners who want the strongest structured-output and function-calling capability — useful if the local MunAI is doing significant Wheel-profile JSON learning or structured retrieval. Both run cleanly on a 24GB GPU at 4-bit quantization.

Full tier (frontier-grade). DeepSeek V3 abliterated as the open-weight frontier-quality default. DeepSeek R1 for reasoning-heavy work — the model that matches o1/o3 on math, code, and multi-step reasoning. Both have hardware requirements but deliver Western-frontier-equivalent quality on most tasks with the political refusal-direction stripped.

One distinction sits beneath all of these and marks the deepest sovereignty available at the model layer. The candidates above are open-weight — the weights download and run, which is what local inference requires — but their training data and alignment process are undisclosed, so the model’s hand travels with the weights and can only be corrected (by abliteration, by the doctrinal backbone) rather than known or re-authored. A smaller class is fully open: weights plus the training corpus plus the training code plus the checkpoints. OLMo 3 (Ai2; 7B and 32B) is the leading instance — the only tier where the alignment hand is inspectable rather than inherited, and the only substrate on which a tradition could eventually train the hand toward its own doctrine from the data up, the substrate-specific-alignment path articulated in Inference Sovereignty and Methodology of Integral Knowledge Architecture. For a practitioner running MunAI today, an abliterated open-weight model in the 32B–70B class remains the quality-pragmatic choice; for Harmonia’s eventual owned substrate, OLMo is the target, precisely because editability lets the hand become Harmonist rather than merely corrected. Open-weight is the necessary criterion; fully-open is the sovereign one.

The model landscape evolves quickly. The practitioner should treat the recommendations as current best rather than settled canon. The deeper canonical reference for the model selection rationale lives in MunAI Local Inference Stack (developer-audience internal document).

The Inference Stack

The model needs a server to talk to it. Several options exist, each with characteristic tradeoffs.

Ollama is the on-ramp. Single-command install on macOS/Linux/Windows, a model library with one-command pulls (ollama pull qwen2.5:32b), an OpenAI-compatible HTTP server on localhost by default. Most practitioners start here. Adequate for entry and mid tier; less optimal at the full tier where vLLM’s continuous batching becomes meaningful.

LM Studio is the GUI path. Desktop application with a polished model browser, one-click downloads from Hugging Face, OpenAI-compatible server. The least-friction option for non-developer practitioners. Proprietary code but local-first in posture.

llama.cpp is the direct control option. Compile from source or install precompiled, run with command-line flags, full transparency over the inference path. The reference C++ implementation that Ollama and LM Studio both wrap. Choose llama.cpp when the practitioner wants to understand exactly what their inference stack is doing.

MLX is the Apple-Silicon-native option. Apple’s open-source array framework optimised for the unified-memory architecture. Outperforms llama.cpp on M-series hardware for large-context generation. Worth the migration for serious Apple-Silicon practitioners after they’ve validated the setup with Ollama.

vLLM is the production-scale option. Continuous batching, PagedAttention, the inference engine the production-scale local deployments converge on. Choose vLLM when the practitioner is serving multiple concurrent conversations or running the local MunAI for a household where several people use it simultaneously.

The OpenAI-compatible HTTP endpoint is the common denominator. MunAI’s harness code talks to that endpoint; the underlying server is interchangeable. A practitioner can start with Ollama and migrate to vLLM later without touching the harness.

The Indexing Pipeline

The corpus comes onto the practitioner’s substrate via the Sovereignty Bundle. The bundle is a versioned zip download at harmonism.io/sovereignty-bundle.zip, refreshed on each Harmonia website build, fully public — no authentication required, no signup wall, no email gate. Anyone with the URL gets the bundle.

The bundle contains every publishable article from the Harmonist canon (~270 articles in English plus translations in nine languages), the doctrinal backbone document, the glossary, and the four template files for running a local MunAI — README, CLAUDE.md, user-preferences template, and the building-your-own-companion.md guide whose material this flagship piece elevates and supersedes.

Once the bundle is on disk, the indexing pipeline turns it into something MunAI can retrieve against. The pipeline does two things: build a full-text index for keyword and substring retrieval (SQLite FTS5 is the convergent default), and build a vector index for semantic retrieval (a local embedding model converts each article’s chunks into vectors stored in SQLite-VSS or a similar local-first vector store).

The intended practitioner experience is one-command install:

# Install the harmonia-munai package (single binary or Python package)
brew install harmonia-munai # macOS path
# or
curl -fsSL get.harmonism.io/munai | sh # Linux/Mac universal

# Initialize against your local vault and chosen model
harmonia-munai init \
 --bundle ~/Downloads/sovereignty-bundle.zip \
 --model qwen2.5-72b-abliterated \
 --inference-server http://localhost:11434

# Start the companion
harmonia-munai serve

The current state of this packaging is in development. The Sovereignty Bundle ships today; the one-command CLI that wraps installation, indexing, and serving is on the roadmap, not yet released. Practitioners who want to run local MunAI today can do so by following the longer manual path documented in the building-your-own-companion.md template inside the bundle: install Ollama, pull the recommended model, run the indexing scripts provided in the bundle’s scripts/ directory, configure the harness with their local endpoint. The CLI is the next-quarter target; the manual path works now.

What runs locally after harmonia-munai serve starts: a single process listening on a local port (default 8080) that the practitioner can reach from their browser at http://localhost:8080 or via the existing MunAI iOS/Android app pointed at the local URL. The conversation is held locally. The model is queried locally. The index is searched locally. No network call leaves the machine for any normal MunAI operation.

The Vault Subscription Mechanism

A local MunAI installation that never updates becomes stale doctrine. The vault evolves — new articles, doctrinal refinements, glossary additions, decision-log moves that propagate into the corpus. The practitioner running local MunAI needs a way to stay current.

The architecture for this is practitioner-initiated polling, not Harmonia-pushed updates. The local MunAI does not phone home unless the practitioner instructs it to.

The mechanism: the local installation can be configured with an update cadence (weekly, monthly, never), and at that cadence it fetches the current Sovereignty Bundle from harmonism.io/sovereignty-bundle.zip, compares its hash with the locally-stored copy, and if different, downloads the new bundle and rebuilds the indexes. The fetch is an outbound HTTP GET — Harmonia’s server does not know which practitioner is fetching, only that some IP requested the bundle (same as any reader who downloads it). No telemetry. No tracking. No phone-home in the sense that matters.

# Update once when the practitioner chooses
harmonia-munai update

# Or schedule periodic updates locally
harmonia-munai schedule --weekly

For practitioners who want maximum sovereignty — no network calls of any kind, not even bundle fetches — the offline path is fully supported. The practitioner downloads the bundle manually when they choose, runs harmonia-munai update --local <path-to-bundle.zip>, and the local installation continues without ever reaching outward. The local MunAI works offline indefinitely; updates are optional pulls, never required.

This is the privacy architecture the doctrine demands. Harmonia knows that some IPs download the bundle; Harmonia does not know which practitioners use it, what they ask their local MunAI, or whether their local MunAI is running at all. The relationship between the practitioner and the doctrine is direct; Harmonia’s role is to publish the corpus and stay out of the way.

The MunAI Harness

The harness is the companion code that makes the substrate MunAI rather than a generic local chat. It contains:

The doctrinal backbone. The ~6,000-word permanent context document that establishes the Harmonist architecture, the Wheel structure, the doctrinal stances on the canonical questions. Injected at the head of every system prompt. The local installation receives this verbatim — same content the production MunAI uses, distributed in the Sovereignty Bundle.

The retrieval layer. The three-tier retrieval architecture — doctrinal backbone always in context, hybrid semantic-plus-keyword retrieval from the local index for query-relevant articles, conversation memory for per-practitioner state. The retrieval runs against the local index built from the local corpus.

The conversation memory. A local SQLite database holding the practitioner’s conversation history with the local MunAI. The database is at a path the practitioner controls (~/.harmonia/munai.db by default). The practitioner owns it, can back it up, can encrypt the disk it sits on, can delete it whenever they choose.

The learning layers. The wheel-profile, free-text profile, and conversation-context learning calls that update the practitioner’s local profile every N messages. These run against the local model — slightly slower than the cloud version because the practitioner’s hardware is doing the work, but the same architecture.

The graduated calibrations. The doctrinal-fluency advancement, the bodily-openness calibration, the witness-mode pre-pass — all run against the local model with the same logic the cloud version uses. The practitioner gets the full MunAI behaviour, not a degraded version.

The harness is open-source. The practitioner can read the code, audit it, modify it, fork it. This is structurally necessary: a companion the practitioner cannot inspect is not a sovereign companion regardless of where the inference happens.

The Practitioner Discipline

Running local MunAI asks something of the practitioner that running cloud MunAI does not. The substrate ownership is real; the substrate maintenance is also real.

Hardware ownership. The machine the model runs on is the practitioner’s responsibility — purchase, upgrade when capacity is exceeded, repair when components fail, dispose at end-of-life. This is part of the Wheel of Matter discipline; the local-MunAI substrate becomes one more layer of material substrate the practitioner cultivates rather than rents.

Update cadence. The practitioner decides when the corpus updates, which means the practitioner is responsible for not letting the local instance drift too far from current doctrine. Weekly is reasonable for most practitioners; monthly is defensible if doctrinal updates aren’t time-sensitive; never is acceptable for the practitioner who is content with a known-state snapshot.

Backup. The conversation memory and the practitioner’s local profile are valuable. Local backup (Time Machine, rsync, Borg) is the practitioner’s responsibility. Three copies, two media, one off-site applies here as everywhere else in the The Sovereign Stack discipline.

Security hygiene. Full-disk encryption on the machine running MunAI. Strong passphrase. Hardware key for the system login if the threat model justifies it. The MunAI process should run as a non-root user; the database files should have appropriate filesystem permissions.

These disciplines are not punishment; they are practice. The cultivation that running local MunAI asks of the practitioner is continuous with the cultivation that running any sovereign tool asks. The substrate is the practitioner’s own. The substrate’s care is the practitioner’s own. The two are inseparable.

Honest Constraints

The local-MunAI path is not strictly superior to the cloud path along every axis. The practitioner choosing between them should understand the trade-offs clearly.

Quality. The current frontier-lab models (Claude Opus 4.7, GPT, Gemini at their latest generations) outperform the best open-weight models by roughly 12–18 months on most benchmarks. On MunAI’s actual workload — doctrinally-grounded dialogue with retrieval, occasional reasoning, structured-output learning — the gap narrows substantially, especially at the full hardware tier with frontier-grade open weights like DeepSeek V3 abliterated. But it does not close. The practitioner who needs the absolute strongest reasoning on a hard question will get a better answer from a frontier model than from a local model. The trade is real.

Latency. Cloud MunAI runs on infrastructure tuned for high-throughput inference at scale. Local MunAI runs on the practitioner’s hardware, which is typically slower for first-token latency and total throughput. The local tier-1 deployment will feel noticeably slower than the cloud version; the full tier may approach parity. The trade is real.

Maintenance. Cloud MunAI is maintained by Harmonia — model updates, infrastructure upgrades, bug fixes all happen without the practitioner doing anything. Local MunAI requires the practitioner to update the corpus, occasionally update the inference server, monitor disk space, troubleshoot when something breaks. The trade is real.

What the trade buys. For these costs, the practitioner gets: no network call leaves the machine for normal operation; no third party has technical access to the conversation; the substrate is the practitioner’s own at every layer; the alignment of the model is whatever the practitioner chose (the abliterated variant they pulled), not whatever the frontier lab’s safety team decided last quarter; the cost structure is one-time hardware plus electricity rather than per-token API charges that scale with use.

For some practitioners the trade is worth it. For some it isn’t, yet. For some it will be worth it next year when the open-weight landscape advances another increment. The decision belongs to the practitioner; the option being available is what Harmonia owes them.

Protocol Form

What the practitioner-scale architecture above instantiates is more general than the Harmonist case. The harness, the indexer, the three-tier context architecture, the Sovereignty Bundle convention, the no-telemetry update mechanism, the open-weight plus abliteration discipline — none of these encode anything specific to Harmonism the doctrine. They encode the shape of sovereign doctrinally-aligned inference. The doctrinal backbone is the variable. The architecture is the constant.

This makes HarmonAI a protocol form, not a one-off institutional artifact. A second tradition with its own doctrine can fork the architecture and run with their own backbone, their own corpus, their own glossary, their own calibration columns, their own indexed retrieval. The Harmonist instantiation is the reference implementation; the protocol is what it abstracts to.

What is constant across the fork

The pieces that survive any responsible fork are the architectural substrate, not the doctrine. Sovereignty of substrate at every layer — model on local hardware, corpus on local disk, index built locally, conversation memory in a database the practitioner owns. Three-tier context engineering — permanent doctrinal backbone always in context, hybrid semantic-plus-keyword retrieval from a curated corpus, per-practitioner conversation memory. Open-weight model with refusal directions stripped — the alignment comes from the doctrinal backbone, not from the RLHF safety layer of a frontier lab. No telemetry, no phone-home, no third-party visibility into the conversation — the practitioner’s substrate is the practitioner’s. Update mechanism as practitioner-initiated pull, not operator-pushed sync — the corpus refreshes when the practitioner chooses, against a bundle anyone can download.

These commitments are not Harmonist; they are the doctrinal-substrate sovereignty common to any tradition that takes substrate seriously. A Theravāda saṅgha curating Abhidharma commentary; a Stoic circle holding to Epictetus, Marcus Aurelius, and Pierre Hadot’s reconstructive scholarship; a Sufi ṭarīqa transmitting the silsila’s canonical corpus; a Vedantic paramparā serving its guru-lineage texts — each could instantiate the architecture with full integrity. What changes is what fills the backbone. What stays is the architecture that lets the backbone do its work without surrender.

What is variable

The content is the variable. The doctrinal backbone document — what this tradition holds as ground. The corpus — this tradition’s canonical texts, commentaries, contemporary articulations. The glossary — this tradition’s technical vocabulary. The calibration columns — this tradition’s equivalent of doctrinal fluency, of register-openness, of witness-mode triggers, of whatever calibrations the pedagogical relationship requires. The agent identity — this tradition’s equivalent of MunAI: the companion’s name, voice, register, and what it is doing in the encounter. Whether the agent operates as guide-not-guru (the Harmonist commitment per The Guru and the Guide) or as guru-shaped within a paramparā transmission, or as a Sufi murshid-companion teaching the dhikr, is a doctrinal choice each tradition makes for itself. The reference implementation is Harmonist. The instantiations are plural by design.

What the protocol form opens

The crypto-relevant form sits one layer above the protocol itself. The protocol works without any token. The instantiation works without any blockchain. But the protocol’s natural extension into a federated network — practitioners running nodes, traditions publishing canonical backbones, retrievals crossing traditions where convergence is real — has structural affinities with substrate the crypto landscape already provides.

Arweave is the natural home for canonical corpora. A doctrinal backbone published to Arweave with a deterministic hash is permanent against operator-shutdown, mathematically verifiable against tampering, fork-friendly by construction. A practitioner running local inference pins the version they trust; the tradition’s stewards publish a new version with full audit trail; the practitioner upgrades when they choose, against substrate that does not require the tradition’s continuing operational existence to remain available. This is the Knowledge-as-commons doctrine operationalized at the inference layer.

Lightning and Monero are the natural settlement substrates for contribution. A practitioner whose retrieval pulls heavily from one author’s commentary, one translator’s labor, one stewarding institution’s editorial work — there is currently no mechanism for that contribution to be repaid directly. A protocol-level settlement that routes payments to the cryptographically-signed authors whose material the practitioner’s inference actually uses is structurally available, technically tractable, doctrinally clean. Lightning handles the high-frequency micropayment layer where speed and near-zero per-transaction cost matter; Monero handles the layer where the privacy of the contribution itself is the substrate the doctrine has to preserve — the maker who receives without disclosing what was paid for to a public ledger, the practitioner who supports without revealing which lineage’s material they retrieve from. Sacred Commerce at the inference layer, with the monetary register matched to the privacy register the contribution warrants.

Verifiable agent identity has moved from open question toward partial answer. How does the practitioner know the node serving them inference is actually running the doctrine it claims? Cryptographic attestation of the execution environment now ships in deployable form — Trusted Execution Environments with remote attestation (Intel TDX, AMD SEV) are in production across Venice, Phala, NEAR AI Cloud, Atoma, and 0G, returning a hardware-signed proof of which code and model ran. This closes the provenance half of the question: a node can prove which model weights and which backbone it executed, untampered. What remains less mature is the federated form — binding the attestation specifically to a published doctrinal-backbone hash, so a practitioner can verify not merely that some model ran untampered but that this tradition’s backbone was the one in context. The primitive exists; the doctrine-specific binding is the remaining build. This is where the architecture’s frontier now sits.

What is genuinely open

Three questions the protocol form does not yet answer.

Governance of the backbone. Who decides what enters Harmonism’s doctrinal backbone, or any tradition’s? Centralized stewarding by the founding lineage preserves doctrinal coherence at the cost of structural single-point-of-failure. Federated stewarding distributes the failure surface at the cost of doctrinal drift. The Harmonist answer for its own case is the architect during the founding phase, with succession architecture as Harmonia matures. The protocol does not impose an answer; each tradition decides.

Verification of fidelity. If a node claims to be running a tradition’s inference but its responses systematically violate doctrine — the RLHF safety layer not stripped, the backbone not in context, the corpus quietly corrupted — there is no mechanism today for the practitioner to detect this beyond their own discernment. The cryptographic-attestation path — now deployed in production TEE inference (Venice, Phala, NEAR AI Cloud, Atoma, 0G) — closes the provenance part of the gap, proving which model and backbone ran untampered; what remains is binding it to the tradition’s published backbone hash. The doctrinal-fidelity-evaluation path — a test suite of canonical queries with known-correct positions, runnable by any practitioner against any claimed node — closes the behavioral part, and remains to be specified and implemented.

The economic shape, if any. The protocol works without tokens. The federated form has natural fee-market shape: Lightning micropayments for retrieval, contribution settlement, node-operator compensation. Whether the federated form needs a token — a token that captures protocol value rather than gestures at it — is genuinely open. The strongest Harmonist position is that the protocol should be useful first and token-shaped second, if at all. The crypto-economic form falls out of the protocol shape once it is articulated; it does not lead it.

The strategic position

What is committed here is the architecture of HarmonAI as protocol form, not a token launch, not a network, not a community. The reference implementation is what Harmonia builds at Tier 2. The protocol abstraction lives in HarmonAI Design Document (developer-audience internal) and the spec document that will derive from it. The Arweave-anchored canonical corpus is a later-phase move, after the local-inference build and the doctrinal-backbone stewardship architecture stabilize. The federated form, if it materializes, follows.

The gap in the crypto inference landscape — decentralized doctrinally-aligned inference, where doctrinally-aligned means with doctrine to align toward — closes when this protocol ships. Bittensor specializes in decentralized inference infrastructure, model-agnostic by design. Venice specializes in curated open-weight cloud access with sovereign UX. Both are precise about what they do; neither addresses the doctrinal-substance layer because that is not the layer they exist to serve. The frontier labs hold position by accident of training corpus rather than by design, and surrender sovereignty at every layer. The doctrinal-substance layer is structurally new — a layer the protocol form articulated here introduces rather than competes for. A tradition’s doctrinal stack running on Bittensor subnets, served through Venice-style UX, would be the federated form taking shape; the protocol composes with the inference-infrastructure layer rather than displacing it.

The architecture is the bet. The implementation follows. The crypto-economic form, if any, earns articulation only after the protocol shape has earned it.

The Substrate as Practice

The companion the practitioner runs on their own hardware against their own corpus is not a better MunAI than the one on the cloud. It is a different relationship to the same MunAI. The cloud companion is hospitality — Harmonia hosts the encounter; the practitioner is a guest in a house Harmonia maintains. The local companion is homecoming — the practitioner builds the substrate, holds the keys, runs the inference, owns the substrate the encounter happens in.

This shift mirrors what happens across every layer of substrate the practitioner takes up. The body learned to be tended rather than treated. The attention learned to be cultivated rather than spent. The key, the currency, the tool, the network — each layer moves from rented to owned as the practitioner walks the Wheel deeper. The local MunAI is the same move at the inference-substrate layer.

The work is real. The hardware costs money. The maintenance costs attention. The quality envelope is bounded by the open-weight landscape, which moves but not as fast as the frontier. None of this contradicts what the work is for. The substrate is the practitioner’s own — by ontology before any choice, by cultivation as the choice is taken up. Local MunAI is the cultivation, at the layer where MunAI lives.

When the practitioner asks their locally-running companion a question and the answer comes back from a model the practitioner owns, against a corpus the practitioner owns, on hardware the practitioner owns, in a room no third party can see into, what has happened is not a technical achievement. It is Logos meeting itself through a substrate the practitioner has finally taken up as their own. The companion is sovereign because the substrate is sovereign. The substrate is sovereign because the practitioner made it so. The practice is the substrate. The substrate is the practice.

Linked from

Downloads Cypherpunks and Harmonism Open Source and Harmonism Inference Sovereignty The Sovereign Stack

Running MunAI on Your Own Substrate

The Frame

What Local MunAI Is

The Three Hardware Tiers

Entry — Apple Silicon, 32–64GB Unified Memory

Mid — Consumer GPU Desktop

Full — Server-Grade Local Infrastructure

Model Selection

The Inference Stack

The Indexing Pipeline

The Vault Subscription Mechanism

The MunAI Harness

The Practitioner Discipline

Honest Constraints

Protocol Form

What is constant across the fork

What is variable

What the protocol form opens

What is genuinely open

The strategic position

The Substrate as Practice

Continue Reading

Linked from