Cerberus

v.G.1.1 — Cerberus

Every managed box gets a baseline firewall posture pushed from the hub — presets, validated custom rules, drift detection, and a full audit trail.

Cerberus is the hub-driven UFW lifecycle for managed nodes, end to end. Before it, a box's firewall was whatever had been hand-typed on it; there was no single place to see the desired posture, no drift detection, and no audit of who changed what. Cerberus makes the hub the source of truth: an operator bootstraps a node's baseline, applies preset templates (stoa-prime, stoa-tunnel, mailcow, ipfs) or server-side-validated custom rules, and every system probe re-evaluates the live ruleset against the desired state and surfaces drift on the node card. The IPFS admin RPC port 5001/tcp is hard-blocked by the preset and gated behind an explicit unsafe-port confirmation when added by hand. Every mutation — bootstrap, preset apply/remove, custom-rule add/delete, reconcile, the unsafe-port warning — goes through admin_audit with the operator email, target node, action, and result, and a fleet bulk-apply page lets an Ancient Admin push a preset across many compatible nodes after a dry-run preview. Cerberus is the first patch of the Genesis post-launch line; it ships at the legacy coordinate v.G.1.1.

Why “Cerberus”

Cerberus is the three-headed hound that guards the gates of the underworld — the vigilant gatekeeper who lets none pass unsanctioned. The codename captures the shape of this release: before Cerberus a managed box’s perimeter was whatever had accumulated on it by hand, with no central desired state and no way to notice when it drifted. Cerberus puts a single watchful guard at every box’s edge — the hub declares the posture, the node enforces it, every probe re-checks it, and nothing changes without an audited reason.

Codename note: this node ships at the legacy coordinate v.G.1.1 — the first patch of the Genesis (Boreas) post-launch line. The firewall material was aligned out of the audit-cycle changelog section into its own ## Cerberus — v.G.1.1 section; Cerberus is the existing shipped Boreas forest node, not a new mint. Old /docs/releases/cerberus-firewall bookmarks keep working — they permanently redirect here, to the canonical /docs/releases/cerberus page.

Headline changes

  • Hub-managed UFW for every managed node. A hub-driven UFW lifecycle, end to end: four endpoints — bootstrap (initial install + ruleset apply), reconcile (re-apply desired state, sweep drift), state (read the live UFW snapshot), plus the drift-status surface.
  • Preset templates. Curated rule bundles pushed from the hub — stoa-prime, stoa-tunnel (3 ranges, 300 ports), mailcow (10 ports), ipfs (3-or-4 with an optional WebSocket), and a custom sentinel for operator-typed rules. Apply and remove are idempotent — a fully in-sync re-apply is a no-op, and a remove only touches preset-owned rows.
  • Validated custom rules with the unsafe-port guard. Operator-typed rules go through full server-side validation (port/range, protocol enum, printable-ASCII comment, IPv4/IPv6 CIDR, 5-tuple dedup). The IPFS admin RPC port 5001/tcp is hard-blocked by the IPFS preset; adding it by hand is gated behind an explicit unsafe-port confirmation and writes a dedicated warning audit row.
  • Probe-cycle drift detection. Every system probe re-evaluates the live firewall against the desired ruleset and surfaces drift on the node card, with an inline reconcile drawer offering the three drift actions — re-apply to the box, import from the box, or mark unmanaged.
  • Full audit trail + fleet bulk-apply. Every mutation (bootstrap, preset apply/remove, custom-rule add/delete, reconcile, the unsafe-port warning) goes through admin_audit with the operator email, target node, action, and result. An Ancient-Admin-only bulk-apply page pushes a preset across many compatible nodes after a dry-run preview, writing one umbrella audit row plus the per-node queued/completion contract.
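
The custom-rule checks listed above can be sketched roughly as follows. This is a minimal illustration, not the hub's actual validator: the CustomRule shape, the error codes, the tcp/udp protocol list, and the confirmUnsafe flag are all hypothetical. CIDR syntax checking is omitted, and the dedup key is a simplified stand-in for the real 5-tuple.

```typescript
// Hypothetical rule shape; the real schema is not shown in these notes.
interface CustomRule {
  port: string;      // single port "443" or range "22000:22100"
  protocol: string;  // checked against PROTOCOLS below
  cidr?: string;     // source CIDR; syntax validation omitted in this sketch
  comment?: string;
}

type Verdict = { ok: true } | { ok: false; error: string };

const PROTOCOLS: readonly string[] = ["tcp", "udp"]; // assumed enum members
const UNSAFE_TCP_PORT = 5001; // IPFS admin RPC

// Parse "N" or "N:M" into an inclusive [low, high] pair, or null if invalid.
function parsePorts(spec: string): [number, number] | null {
  const match = /^(\d{1,5})(?::(\d{1,5}))?$/.exec(spec);
  if (match === null) return null;
  const low = Number(match[1]);
  const high = match[2] === undefined ? low : Number(match[2]);
  if (low < 1 || high > 65535 || low > high) return null;
  return [low, high];
}

// Simplified dedup key (the notes describe a 5-tuple; three fields here).
const dedupKey = (r: CustomRule): string =>
  `${r.port}/${r.protocol}/${r.cidr ?? "any"}`;

function validateRule(
  rule: CustomRule,
  existing: CustomRule[],
  confirmUnsafe = false,
): Verdict {
  const ports = parsePorts(rule.port);
  if (ports === null) return { ok: false, error: "invalid_port" };
  if (!PROTOCOLS.includes(rule.protocol)) {
    return { ok: false, error: "invalid_protocol" };
  }
  // Printable ASCII only: space (0x20) through tilde (0x7e).
  if (rule.comment !== undefined && !/^[\x20-\x7e]*$/.test(rule.comment)) {
    return { ok: false, error: "invalid_comment" };
  }
  if (existing.some((r) => dedupKey(r) === dedupKey(rule))) {
    return { ok: false, error: "duplicate_rule" };
  }
  // Assumption: a range that covers 5001/tcp is treated like the port itself.
  const [low, high] = ports;
  const touchesUnsafe =
    rule.protocol === "tcp" && low <= UNSAFE_TCP_PORT && UNSAFE_TCP_PORT <= high;
  if (touchesUnsafe && !confirmUnsafe) {
    return { ok: false, error: "unsafe_port_confirmation_required" };
  }
  return { ok: true };
}
```

In this sketch a caller would retry with confirmUnsafe set to true only after the operator has acknowledged the warning, which is also the point where the dedicated warning audit row would be written.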

For operators

The per-node firewall lives on the node’s admin tab strip (gated to operator / Modern+ / Ancient): a bootstrap CTA for unbootstrapped nodes, the four preset apply/remove cards with the IPFS browser-WebSocket toggle, a custom-rule add form that mirrors the server-side validator, a rules table grouped by preset source, and the inline reconcile drawer for drift. The full audit trail for a node is one click away — the footer deep-links to the firewall namespace of the audit log.
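
The drift that the reconcile drawer reports can be thought of as a set difference between the desired ruleset and the live UFW snapshot. The sketch below assumes a hypothetical FirewallRule shape and key format; the actual probe and comparison logic are not shown in these notes.

```typescript
// Hypothetical rule shape used only for this illustration.
interface FirewallRule {
  port: string;
  protocol: string;
  from: string; // source CIDR or "any"
}

interface DriftReport {
  missing: FirewallRule[];    // desired by the hub but absent on the box
  unexpected: FirewallRule[]; // present on the box but not desired by the hub
  inSync: boolean;
}

const ruleKey = (r: FirewallRule): string => `${r.port}/${r.protocol}@${r.from}`;

// Compare the hub's desired rules with the live snapshot from a probe.
function detectDrift(desired: FirewallRule[], live: FirewallRule[]): DriftReport {
  const desiredKeys = new Set(desired.map(ruleKey));
  const liveKeys = new Set(live.map(ruleKey));
  const missing = desired.filter((r) => !liveKeys.has(ruleKey(r)));
  const unexpected = live.filter((r) => !desiredKeys.has(ruleKey(r)));
  return {
    missing,
    unexpected,
    inSync: missing.length === 0 && unexpected.length === 0,
  };
}
```

Under this framing, re-applying to the box would clear both lists by rewriting the live rules, importing from the box would adopt the unexpected rules into the desired state, and marking the node unmanaged would stop the comparison altogether.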

The operator-facing reference for hub-managed UFW lives at /docs/tools/firewall — baseline policy, the preset catalog, per-server overrides, drift detection, and the audit surface, the same reference the Tools pillar landing card links into.

Cerberus.1 — location groups

Cerberus.1 expands the codename from firewall control alone into cross-server port-allocation cooperation. Two physical servers behind the same LAN/NAT can now be tagged into the same location group so they share a single port-slot pool. Before Cerberus.1, adding a sibling server in the same NAT collided on the tunnelee SSH range (22000:22100) because each server’s allocator only saw its own per-server pool; the operator hit exactly that with AncientOne and AncientIntel. After Cerberus.1, the allocator scopes by group membership: AncientOne uses slots 0–3 (17891–17894 / 22001–22004 / …) and adding AncientIntel into the same group auto-picks slots 4–7 from the union, avoiding the collision.

  • Backward-compatible. Servers without a location_group_id keep the per-server pool behaviour (the dominant case at ship time). The widened SQL collapses to the byte-equivalent single-server WHERE clause for ungrouped nodes.
  • Adding a server. The add-server form (/hub/nodes/new) gains a Location step with three options: standalone (default), join an existing group, or create a new named group. The new group is created first via POST /api/admin/locations and its id is passed through to the node insert.
  • Per-server picker. The per-server detail page shows a Location: <group-name> row with an inline edit affordance (ancient-gated). Editing opens a picker that lists every group plus a “detach to standalone” option; committing PATCHes the node and reloads.
  • Bulk page. /hub/locations lists every group with its members, per-member container count, and slot range used (low/high). Each member links back to the server’s detail page.
  • Port-pool banner. The per-node port-pool card adds a banner above the slot table reading “Location pool — shared with N other server(s): <names>” when the node is grouped, and the slot table’s “Assigned to” column shows the owning server label for slots held by siblings.
  • Collision warnings. Moving a node into a group whose siblings already hold the same slot/purpose surfaces a warning list inline before commit; the join is emitted with a non-blocking warnings.affectedContainers array in the PATCH response and audited under the new node.location_group.set action kind alongside location_group.create / .rename / .delete.
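
The group-scoped allocation described in this section can be illustrated with a small in-memory sketch. The real allocator is SQL-backed and is not shown in these notes; the ManagedNode and SlotHolder shapes, the pool size of 100, and the slot-to-port mapping (inferred from the 22001–22004 example) are assumptions.

```typescript
// Hypothetical shapes for illustration only.
interface ManagedNode {
  id: string;
  locationGroupId: string | null; // null means standalone (per-server pool)
}

interface SlotHolder {
  nodeId: string;
  slot: number;
}

// Ids of every node that shares one slot pool with `node`.
function poolMembers(node: ManagedNode, nodes: ManagedNode[]): Set<string> {
  const members = new Set<string>([node.id]);
  if (node.locationGroupId === null) return members;
  for (const other of nodes) {
    if (other.locationGroupId === node.locationGroupId) members.add(other.id);
  }
  return members;
}

// Pick the `count` lowest slots not held by any member of the pool.
function allocateSlots(
  node: ManagedNode,
  nodes: ManagedNode[],
  holders: SlotHolder[],
  count: number,
  poolSize = 100, // assumed pool size
): number[] {
  const members = poolMembers(node, nodes);
  const taken = new Set(
    holders.filter((h) => members.has(h.nodeId)).map((h) => h.slot),
  );
  const picked: number[] = [];
  for (let slot = 0; slot < poolSize && picked.length < count; slot++) {
    if (!taken.has(slot)) picked.push(slot);
  }
  if (picked.length < count) throw new Error("slot pool exhausted");
  return picked;
}

// Assumed mapping, inferred from "slots 0–3" matching ports 22001–22004.
const sshPortForSlot = (slot: number): number => 22001 + slot;
```

With AncientOne holding slots 0 through 3 and AncientIntel tagged into the same group, this sketch hands AncientIntel slots 4 through 7; for a standalone node the pool collapses to that node alone, matching the backward-compatible behaviour.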

Codename note: Cerberus.1 stamps under the existing Cerberus codename as the second forest node (per-codename ordinal .1), legacy token v.G.1.1a under the letter-suffix convention. The v.G.1.2 integer slot stays reserved for the autonomic-commitment-management spec. The Cerberus subsystem now spans both firewall control (Cerberus.0) and port-allocation cooperation (Cerberus.1); future quicks on either layer route to the next Cerberus ordinal under the per-codename forest-ordinal model.

Cerberus.1-a: Owner-scoped gating

The first ship of Cerberus.1 made every group globally visible to every admin tier. Cerberus.1-a tightens the surface to a four-tier owner-scoped model so operators only ever see the groups they actually manage, while ancient admins retain full oversight.

The locked permissions table:

Action                          | non-ancient (baron / modern / client) | ancient
--------------------------------|---------------------------------------|----------------------------------
Self-create groups              | up to 1 (quota)                       | unlimited
List groups                     | only own                              | all (Personal / Foreign split)
Update own group                | yes                                   | yes
Update foreign group            | 403                                   | yes
Delete own group                | yes (409 if members)                  | yes (force=true bypasses)
Delete foreign group            | 403                                   | yes (force-delete auto-detaches)
Join own server → own group     | yes                                   | yes
Join own server → foreign group | 403                                   | yes (override)
Join foreign server → any group | 403                                   | yes (override)

  • Ancient-grant flow. An ancient admin can mint a group on behalf of a specific operator via POST /api/admin/locations with a targetUserEmail field. The new row carries granted_by_ancient_at (timestamp) so the operator sees a “Granted by ancient” badge on their list and the row does NOT consume the operator’s one-group quota slot.
  • Quota counter. Non-ancient operators see a “You have N of 1 location group” banner above the create form. The “+ New location group” button becomes disabled at quota with a tooltip pointing at the ancient-admin escalation path.
  • Force-delete + Revoke. When ancient deletes a group with members and ?force=true, every member node is auto-detached (location_group_id = NULL), each detach emits a node.location_group.auto_detach_on_group_delete audit row, and the group is removed. The terminal audit is location_group.force_deleted when the group was ancient’s own, and location_group.revoked when it belonged to another operator — the foreign-row “Revoke” button in the UI calls this path.
  • Per-owner UNIQUE name. Migration 095 adds a UNIQUE(created_by, name) index on location_groups so two different operators can each call their group “BasementLAN” without collision; a single operator still cannot duplicate. The 409 carries {error: 'name_taken'}.
  • Cross-owner join refusal. PATCH /api/admin/nodes/[id] with location_group_id enforces the new gate: non-ancient callers must own BOTH the node AND the target group, or the call returns 403 cross_owner_join_forbidden (audited at failure tier). Ancient bypasses both checks.
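
The join gate and the quota rule can be expressed as two small pure functions. This is a sketch under assumptions: the Caller, Owned, and GroupRow shapes are hypothetical, and only the 403 error code cross_owner_join_forbidden and the one-group quota come from the notes above.

```typescript
type Tier = "baron" | "modern" | "client" | "ancient";

// Hypothetical shapes for illustration.
interface Caller {
  email: string;
  tier: Tier;
}

interface Owned {
  ownerEmail: string;
}

interface GroupRow extends Owned {
  grantedByAncientAt: string | null; // set when an ancient minted the group
}

type JoinResult =
  | { status: 200 }
  | { status: 403; error: "cross_owner_join_forbidden" };

// Non-ancient callers must own both the node and the target group.
function checkJoin(caller: Caller, node: Owned, group: Owned): JoinResult {
  if (caller.tier === "ancient") return { status: 200 };
  const ownsBoth =
    node.ownerEmail === caller.email && group.ownerEmail === caller.email;
  return ownsBoth
    ? { status: 200 }
    : { status: 403, error: "cross_owner_join_forbidden" };
}

const SELF_CREATE_QUOTA = 1;

// Ancient-granted groups do not count against the operator's quota.
function canCreateGroup(caller: Caller, groups: GroupRow[]): boolean {
  if (caller.tier === "ancient") return true;
  const selfCreated = groups.filter(
    (g) => g.ownerEmail === caller.email && g.grantedByAncientAt === null,
  );
  return selfCreated.length < SELF_CREATE_QUOTA;
}
```

Keeping both rules as pure functions of the caller and the rows makes each cell of the permissions matrix straightforward to cover with a unit test.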

Codename note: Cerberus.1-a is a hotfix subsection appended under the existing Cerberus codename. The gating model is locked operator design — every admin tier sees the matrix above and nothing else.

Cerberus.1-b: Per-server location-edit confirm wrap

Cerberus.1-b closes the last raw admin_confirm_required string leak in the location-groups surface. Cerberus.1 (/hub/locations create) and Cerberus.1-a (Bundle 2, seven codex pages) wrapped every other mutation with the pre-confirm hook; the per-server Location row on the node detail page went unwrapped and continued to surface the raw key as red text whenever the 5-min fresh-confirm window had expired between page-load and Save.

  • Pre-confirm wrap. The edit affordance on the per-server detail page Location row now opens the standard useConfirmPassword modal before issuing the create-group POST /api/admin/locations (when the operator chose “create new”) and the node PATCH /api/admin/nodes/[id] that assigns the new location_group_id. The modal copy adapts to the selected mode (create / join / detach to standalone) so the operator sees an accurate prompt.
  • Friendly error translation. If the fresh-confirm window expires mid-flow, the response is translated from admin_confirm_required to “Admin confirmation expired. Click Save again to re-authorize.” — no raw snake_case key in the operator UI.
  • Draft-name pre-validation. The “Create new location group” mode now validates the name field before opening the password modal, so the operator isn’t asked to authorize a mutation that’s about to fail client-side anyway.
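
The friendly error translation amounts to a lookup from server error keys to operator-facing copy. A minimal sketch follows; the function name, the table layout, and the fallback message are hypothetical, and only the admin_confirm_required key and its translated text come from the notes above.

```typescript
// Known server error keys mapped to operator-facing copy.
const FRIENDLY_ERRORS = new Map<string, string>([
  [
    "admin_confirm_required",
    "Admin confirmation expired. Click Save again to re-authorize.",
  ],
]);

// Hypothetical fallback so an unknown snake_case key is never shown raw.
const GENERIC_ERROR = "Something went wrong. Please try again.";

function friendlyError(code: string): string {
  return FRIENDLY_ERRORS.get(code) ?? GENERIC_ERROR;
}
```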

Codename note: Cerberus.1-b is a single-file hotfix on components/admin/location/LocationRow.tsx — mirror of the same wrap pattern Cerberus.1 and Cerberus.1-a applied to every other mutation surface. lib/version.ts and lib/releases.ts are byte-untouched (letter-suffix hotfixes do not mint new forest nodes).

Cerberus.1-c: Trailing-slash dynamic-route capture fix

Save on the per-server Location row returned “Not found” red text even with a valid target group. Root cause: with trailingSlash: true in next.config.ts, a PATCH to /api/admin/nodes/<id> 308-redirects to /api/admin/nodes/<id>/; Next.js’s dynamic-route capture for the slashed URL includes the trailing slash in req.query.id, so the WHERE id=? SELECT silently misses and the auth guard 404s on a perfectly valid node. The trailing slash is now defensively stripped in both requireOwnedNodeApi (which covers every /api/admin/nodes/[id] method) and the /api/admin/locations/[id] handler. This is a letter-suffix hotfix; lib/version.ts + lib/releases.ts are byte-untouched.
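
A defensive strip of this kind might look like the sketch below. The helper name normalizeRouteId is hypothetical; the actual code inside requireOwnedNodeApi is not shown in these notes. The input type mirrors the string | string[] | undefined shape that Next.js Pages Router query values take.

```typescript
// Normalise a dynamic-route id before using it in a WHERE id = ? lookup.
function normalizeRouteId(raw: string | string[] | undefined): string | null {
  const value = Array.isArray(raw) ? raw[0] : raw;
  if (value === undefined) return null;
  // Drop any trailing slashes captured from the redirected URL.
  const stripped = value.replace(/\/+$/, "");
  return stripped.length > 0 ? stripped : null;
}
```

Returning null for an empty result lets the caller answer with a 400 or 404 deliberately instead of running a query that can never match.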

Cerberus.1-d: LocationPicker dropdown state-sync

Save still returned Not found after Cerberus.1-c — the actual cause was a controlled-form desync, not a route-capture issue. PM2 diagnostic logs showed the PATCH body arriving as {"location_group_id":""} — an empty string, not the AncientHeadquarters UUID — so the server’s WHERE id = ? correctly returned no row and the auth guard 404’d. (The Cerberus.1-c trailing-slash strip remains as harmless defensive code, but it was not the cause.)

The desync: the per-server edit panel fetches the groups list asynchronously, so LocationPicker renders first with props.groups === [] and useState initializes the internal groupId to "". When the fetch lands and the dropdown populates, the browser displays the first option as selected (because no option value matches the empty string), but the React state is still empty. Clicking “Same location as” followed by Save without an explicit dropdown click then submits the empty string.

Fix: a useEffect on props.groups in LocationPicker syncs groupId to props.groups[0].id whenever the groups prop arrives non-empty and the current groupId doesn’t resolve to one of them. If the operator had already clicked the “Same location as” radio before the fetch landed, the effect also re-fires props.onChange so the parent’s selection state catches up. Single-file hotfix; lib/version.ts + lib/releases.ts byte-untouched.
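
The decision the effect makes can be isolated as a pure function, which keeps it testable without rendering the component. The sketch below is an assumption about how the logic could be factored: resolveGroupId and the LocationGroup shape are hypothetical names, and the React wiring appears only as a comment.

```typescript
interface LocationGroup {
  id: string;
  name: string;
}

// Return the group id the picker state should hold once `groups` is known.
function resolveGroupId(groups: LocationGroup[], current: string): string {
  const first = groups[0];
  if (first === undefined) return current; // fetch has not landed yet
  if (groups.some((g) => g.id === current)) return current; // still valid
  return first.id; // matches the option the browser displays as selected
}

// Inside the component, the effect would then read roughly like this
// (assuming groupId / setGroupId come from useState and `mode` tracks the radio):
//
//   useEffect(() => {
//     const next = resolveGroupId(props.groups, groupId);
//     if (next !== groupId) {
//       setGroupId(next);
//       if (mode === "join") props.onChange(next); // let the parent catch up
//     }
//   }, [props.groups]);
```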

Cerberus.1-e: Remove chainweb container — phantom-state recovery

Operator-stated UI-first principle: every operational action must be doable from the hub UI. When a chainweb install aborts mid-flight (most commonly at the Let’s Encrypt cert obtain step), the host can end up with a real Docker container the hub never recorded — the bootstrap commits the node row only AFTER cert success, so a failed cert obtain leaves the box with a flapping stoa-node container the operator can SEE (the per-host status probe surfaces it as “starting / unreachable”) but can’t REMOVE through the UI, because the existing uninstall action 409’d with “no chainweb installation recorded for this node”. The only escape was SSH-and-clean-by-hand — which violates the principle.

  • Probe-first uninstall. When the hub DB has no recorded install, POST /api/admin/nodes/[id]/stoachain/uninstall now SSH-probes the host (key auth) for phantom artifacts: a stoa-node container, an /etc/letsencrypt/live/<host> tree, the hub sudoers file, or a stray StoaNode/ directory under /home, /mnt, or /srv. If any signal fires, the job is enqueued anyway with a forceClean: true flag (audited) and the existing handler’s standalone-container fallback sweeps it. Only when probe AND DB are BOTH empty does the original “nothing to uninstall” 409 fire.
  • LE cert artifact cleanup. The stoachain-uninstall handler script gains a step 4 that removes /etc/letsencrypt/{live,archive}/<node.host> and /etc/letsencrypt/renewal/<node.host>.conf. Without this, the next install attempt’s certbot would find the existing renewal config and refuse to re-issue, even on the clean container path.
  • Same UI button, more capability. No new affordance to learn — the existing Remove chainweb container action just succeeds in the phantom case where it previously returned the raw 409. The operator clicks once and the cleanup job streams its progress through the standard job-overlay UI.
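
The probe-first decision reduces to a three-way branch. The sketch below is illustrative only: the PhantomProbe and UninstallDecision shapes and the error string are hypothetical, and the SSH probe itself is abstracted behind a callback so that it runs only when the hub DB has no recorded install.

```typescript
// Signals an SSH probe might report; field names are hypothetical.
interface PhantomProbe {
  container: boolean;       // a stoa-node container exists
  letsEncryptTree: boolean; // /etc/letsencrypt/live/<host> exists
  sudoersFile: boolean;     // the hub sudoers file exists
  strayDirectory: boolean;  // a StoaNode/ dir under /home, /mnt, or /srv
}

type UninstallDecision =
  | { action: "enqueue"; forceClean: boolean }
  | { action: "reject"; status: 409; error: string };

function decideUninstall(
  hasRecordedInstall: boolean,
  probeHost: () => PhantomProbe,
): UninstallDecision {
  // Normal path: the hub knows about the install, no probe needed.
  if (hasRecordedInstall) return { action: "enqueue", forceClean: false };

  const probe = probeHost();
  const anySignal =
    probe.container ||
    probe.letsEncryptTree ||
    probe.sudoersFile ||
    probe.strayDirectory;

  // Phantom state: nothing recorded, but artifacts exist on the box.
  if (anySignal) return { action: "enqueue", forceClean: true };

  // Probe and DB are both empty.
  return { action: "reject", status: 409, error: "nothing to uninstall" };
}
```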

Codename note: Cerberus.1-e extends the install/uninstall surface rather than the location-groups subsystem proper, but rides under the Cerberus letter chain for continuity with the active v.G.1.1[a-d] hotfix series. lib/version.ts + lib/releases.ts byte-untouched.

Related

  • Cerberus chapter — the public operator reference for hub-managed UFW: bootstrap, presets, custom rules, the F12 unsafe-port confirmation, and the reconcile drift actions.
  • Asclepius — v.G.1.x — the audit-cycle codename for the Genesis post-launch line; Cerberus is the first patch in that line, and future audit cycles append under Asclepius.

Stamped against H.1.19.