Autonomic commitment management

vH.1.19 · phase: Chronos · The H.1.x benchmark/scoring rehaul arc

Overview

Autonomic commitment management is an opt-in, per-drive engine that keeps every chainweb container pinned to a drive committed to the maximum legal share of that drive’s free space, distributed equally at 3-decimal-GB precision and re-evaluated every probe tick (~60 s). When free disk grows, commits go up; when it shrinks, commits go down. There is no headroom buffer — the engine just keeps adjusting.

The motivating scenario: a kernel update consumes 700 MB of free disk on a host with three chainweb containers. Free space drops from 220.5 GB to 219.8 GB. The existing floor-to-10 commitment ceiling drops from 220 GB to 210 GB. In the manual flow, the operator has to notice this, open the Stoicism subtab, recalculate by hand, and PATCH each container. Autonomic mode absorbs the 700-MB shift on the next tick without a single human touch.
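The arithmetic behind the scenario can be checked with a short sketch. The helper name `floorTo10` is hypothetical; it only assumes that the manual ceiling is free space rounded down to the nearest 10 GB, as the text describes.

```typescript
// Hypothetical helper: manual-mode ceiling, assuming "floor-to-10" means
// rounding free space (in GB) down to the nearest multiple of 10.
function floorTo10(freeGb: number): number {
  return Math.floor(freeGb / 10) * 10;
}

const ceilingBefore = floorTo10(220.5); // 220
const ceilingAfter = floorTo10(219.8);  // 210
```

A 0.7 GB loss of free space therefore moves the manual ceiling by a full 10 GB, which is the step the operator would otherwise have to absorb by hand.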

The toggle is operator-driven only. There is no heuristic that auto-enables the flag based on container count or any other observation — the operator decides per drive.

Eligibility

The toggle is offered only on drives that satisfy a four-axis composite predicate. A drive is eligible when, and only when, ALL of the following hold simultaneously:

  • SSD: host_drives.is_ssd = 1. Mirrors rule 2/3 of the Medusa scoring requirement.
  • Capacity at least 50 GB: capacity_bytes ≥ 50e9. The existing 50-GB minimum carries over.
  • Not under /boot: mount_point NOT LIKE '/boot%'. The kernel and bootloader live there; the engine never touches it.
  • At least one chainweb child: countChainwebsOnDrive(driveId) ≥ 1. Autonomic mode is meaningless on an empty drive; there is nothing to allot.

Eligibility is evaluated dynamically at display time. If a drive becomes ineligible (for example, the last chainweb container is migrated away), the toggle hides itself but the stored flag is preserved untouched. If the drive later becomes eligible again, the prior flag value is honoured and the engine resumes on the next tick.
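As a rough illustration, the four-axis predicate could be expressed as below. The `HostDrive` shape, the axis labels, and both function names are assumptions made for this sketch; only the four conditions themselves come from the list above, and the chainweb count is passed in rather than looked up.

```typescript
// Hypothetical row shape; fields mirror the columns quoted in the list above.
interface HostDrive {
  isSsd: boolean;        // host_drives.is_ssd = 1
  capacityBytes: number; // capacity_bytes
  mountPoint: string;    // mount_point
}

const MIN_CAPACITY_BYTES = 50e9; // the 50-GB minimum

// Returns the axes that fail; an empty array means the drive is eligible.
function failedEligibilityAxes(drive: HostDrive, chainwebCount: number): string[] {
  const failed: string[] = [];
  if (!drive.isSsd) failed.push("not_ssd");
  if (drive.capacityBytes < MIN_CAPACITY_BYTES) failed.push("capacity_below_50gb");
  // Equivalent of mount_point LIKE '/boot%': a plain prefix match.
  if (drive.mountPoint.slice(0, 5) === "/boot") failed.push("under_boot");
  if (chainwebCount < 1) failed.push("no_chainweb_children");
  return failed;
}

function isAutonomicEligible(drive: HostDrive, chainwebCount: number): boolean {
  return failedEligibilityAxes(drive, chainwebCount).length === 0;
}
```

Because the predicate is a pure function of current state, it can be re-evaluated at display time without touching the stored flag.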

The toggle endpoint

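One endpoint owns the autonomic flag for any given drive. It is the single authorised mutation path for the flag. Manual scoring writes against an autonomic drive’s containers are not refused by the server; the read-only Stoicism subtab is the guard, and the engine’s next tick overwrites any manual value that does get written.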

PATCH /api/admin/nodes/[id]/host-drives/[driveId]/autonomic-commit

Request body

{ "enabled": true }   // or { "enabled": false }

Body must be a JSON object with exactly one boolean property. Anything else (missing body, {}, { "enabled": "yes" }) returns 400 with { error: "enabled must be boolean" }.
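A minimal validation sketch for the request body follows. The function name `parseToggleBody` and its result type are hypothetical, and the sketch assumes that "exactly one boolean property" means extra keys are rejected alongside non-boolean values.

```typescript
type ParsedBody =
  | { ok: true; enabled: boolean }
  | { ok: false; status: 400; error: string };

// Hypothetical validator: accepts only { "enabled": <boolean> }.
function parseToggleBody(raw: unknown): ParsedBody {
  const invalid: ParsedBody = { ok: false, status: 400, error: "enabled must be boolean" };
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return invalid;
  const keys = Object.keys(raw);
  // Assumption: any key other than "enabled" makes the body invalid.
  if (keys.length !== 1 || keys[0] !== "enabled") return invalid;
  const value = (raw as Record<string, unknown>)["enabled"];
  return typeof value === "boolean" ? { ok: true, enabled: value } : invalid;
}
```

A missing body, `{}`, and `{ "enabled": "yes" }` all fall through to the same 400 shape quoted above.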

Success response (200)

{
  "ok": true,
  "driveId": "drv_…",
  "enabled": true,
  "recomputed": true,
  "updates": 2,
  "warnings": 0
}
  • ok: always true on a 200.
  • driveId: the drive that was toggled (echoed for client convenience).
  • enabled: the flag’s state AFTER the call.
  • recomputed: true only on the OFF→ON transition (an immediate first recompute ran); false on ON→OFF, on idempotent no-op calls, and on the stale-state idempotent path.
  • updates: the number of per-container committed_gb rows changed by the immediate recompute. Always 0 when recomputed is false.
  • warnings: the number of floor-protection skips during the immediate recompute (see “Audit guarantees” below).

Status codes

| Code | Meaning | Body / notes |
| --- | --- | --- |
| 200 | Toggle accepted | { ok: true, driveId, enabled, recomputed, updates, warnings } |
| 400 | Validation / eligibility | { error: "enabled must be boolean" } or { error: "ineligible", message } when enabling a drive that fails the four-axis gate. |
| 404 | Not found / not yours | Returned when the node id, the drive id, or the (node, drive) pairing does not resolve. The same status is used for non-owner access: the admin-gate posture is "404 over 403", so the existence of an asset is never leaked. |
| 405 | Method not allowed | GET / POST / DELETE / PUT all return 405 with `Allow: PATCH`. Only PATCH is wired. |
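On the client side, the four documented status codes might be folded into a single outcome type, roughly as follows. `ToggleOutcome` and `classifyToggleResponse` are hypothetical names for this sketch; the 200 body is trusted as-is rather than validated.

```typescript
interface ToggleSuccess {
  ok: true;
  driveId: string;
  enabled: boolean;
  recomputed: boolean;
  updates: number;
  warnings: number;
}

type ToggleOutcome =
  | { kind: "accepted"; result: ToggleSuccess }
  | { kind: "invalid"; error: string }
  | { kind: "not_found" }
  | { kind: "method_not_allowed" }
  | { kind: "unexpected"; status: number };

// Hypothetical helper: maps an HTTP status plus parsed JSON body to an outcome.
function classifyToggleResponse(status: number, body: unknown): ToggleOutcome {
  switch (status) {
    case 200:
      return { kind: "accepted", result: body as ToggleSuccess };
    case 400: {
      const hasError = typeof body === "object" && body !== null && "error" in body;
      const error = hasError ? String((body as { error: unknown }).error) : "unknown";
      return { kind: "invalid", error };
    }
    case 404:
      return { kind: "not_found" }; // also covers non-owner access
    case 405:
      return { kind: "method_not_allowed" };
    default:
      return { kind: "unexpected", status };
  }
}
```

Since 404 deliberately hides whether an asset exists, a client cannot distinguish "missing" from "not yours" and should not try to.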

Behaviour on enable (OFF→ON)

When the toggle transitions from OFF to ON, the server runs an immediate first recompute as part of the same PATCH, so the operator does NOT wait up to 60 s for the next worker tick. The sequence inside the endpoint is:

  1. UPDATE host_drives SET autonomic_commit_enabled = 1.
  2. Emit one autonomic_commit.toggle audit row with permanent retention.
  3. Call runAutonomicCommitForDrive(driveId) synchronously. This reads the freshest mount_free_bytes, computes the equal-split distribution, and UPDATEs each chainweb child’s committed_gb.
  4. Return the response carrying the post-recompute updates and warnings counters.

From the operator’s perspective: by the time the PATCH returns, every container on the drive already carries its equal share. The next worker tick is a no-op unless free disk has actually moved.

Behaviour on disable (ON→OFF)

When the toggle transitions from ON to OFF, the engine simply stops touching the drive. The current per-container committed_gb values stay frozen at their last autonomic-derived 3-decimal values. There is no revert to a prior manual value and no recompute on disable: the response carries recomputed: false and updates: 0.

The operator is then free to PATCH the standard scoring endpoint with integer commitments from the frozen baseline. Disabling is always allowed regardless of eligibility — an ineligible drive (for example, one with zero chainweb children) can still have its flag flipped off.
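The enable, disable, and idempotent no-op paths can be sketched together as one function. Everything here is illustrative: `applyToggle`, `ToggleDeps`, and the return type of `runAutonomicCommitForDrive` are assumptions, the eligibility gate on enable is omitted, and `warnings` is assumed to be 0 whenever no recompute runs.

```typescript
interface RecomputeResult { updates: number; warnings: number; }

// Hypothetical seams for the database, the audit log, and the engine.
interface ToggleDeps {
  setFlag(driveId: string, enabled: boolean): void;
  writeToggleAudit(driveId: string, before: boolean, after: boolean): void;
  runAutonomicCommitForDrive(driveId: string): RecomputeResult;
}

interface ToggleResponse {
  ok: true; driveId: string; enabled: boolean;
  recomputed: boolean; updates: number; warnings: number;
}

function applyToggle(driveId: string, current: boolean, requested: boolean, deps: ToggleDeps): ToggleResponse {
  const noRecompute: ToggleResponse = {
    ok: true, driveId, enabled: requested, recomputed: false, updates: 0, warnings: 0,
  };
  // Idempotent re-PATCH: no write, no audit row, no recompute.
  if (current === requested) return noRecompute;

  deps.setFlag(driveId, requested);                   // step 1
  deps.writeToggleAudit(driveId, current, requested); // step 2

  // ON→OFF: the engine stops touching the drive; values stay frozen.
  if (!requested) return noRecompute;

  // OFF→ON: immediate first recompute inside the same request (step 3).
  const { updates, warnings } = deps.runAutonomicCommitForDrive(driveId);
  return { ok: true, driveId, enabled: true, recomputed: true, updates, warnings }; // step 4
}
```

Injecting the three dependencies keeps the ordering testable without a database or a worker.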

Audit guarantees

The autonomic engine writes three distinct audit-action keys, each with its own retention policy and forensic detail shape. Every entry lands in admin_audit with the standard envelope (actor email, target node id, timestamp).

| Action key | Retention | Emitted when | Detail capture |
| --- | --- | --- | --- |
| autonomic_commit.toggle | permanent | On every actual ON↔OFF transition. Idempotent re-PATCH (same state) writes no row. | nodeId, driveId, mountPoint, before (bool), after (bool), plus the actor email on the audit envelope. On enable the toggle row also captures `immediateRecomputeOk` and `immediateRecomputeError` so partial-failure forensics are preserved. |
| autonomic_commit.adjust | rolling 30 d | On every worker-tick recompute that actually changes one or more per-container committed-GB values. No-change ticks emit nothing. | nodeId (host), driveId, mountPoint, freeBytes, maxCommittableGb (3-decimal), containers: [{ nodeId, before, after }] for every chainweb child on the drive. |
| autonomic_commit.warning_floor | rolling 30 d | When the equal-split would drop a container's committed-GB below its current chainweb-data-used floor. The unsafe write is skipped; the other containers on the drive still receive their updates. | nodeId (host), driveId, containerNodeId, attemptedCommitGb, floorGb, freeBytes. |

The forensic detail is sized to reconstruct any past adjustment from first principles: the drive, the mount point, the free-bytes reading at the time, the computed total committable, and the before/after committed-GB for every affected container. Audit volume is capped at one adjust row per drive per tick, and only on actual change: the steady-state idempotent tick produces zero rows.
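One way to picture the adjust row and its "only on actual change" rule is the sketch below. The field names follow the detail list for autonomic_commit.adjust; the interface names and `buildAdjustDetail` are hypothetical.

```typescript
interface ContainerDelta { nodeId: string; before: number; after: number; }

// Detail shape for autonomic_commit.adjust, per the fields listed above.
interface AdjustDetail {
  nodeId: string;           // host node
  driveId: string;
  mountPoint: string;
  freeBytes: number;
  maxCommittableGb: number; // 3-decimal
  containers: ContainerDelta[];
}

// Hypothetical builder: returns null when nothing changed, so a steady-state
// tick writes no row; otherwise one row covering every chainweb child.
function buildAdjustDetail(
  base: Omit<AdjustDetail, "containers">,
  containers: ContainerDelta[],
): AdjustDetail | null {
  const changed = containers.some((c) => c.before !== c.after);
  return changed ? { ...base, containers } : null;
}
```

Unchanged containers are still listed in a row that does get written, which is what lets a partial update be reconstructed later.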

The sum-equality invariant

Every autonomic recompute holds an exact invariant: the sum of per-container committed-GB values on the drive equals the drive’s total committable amount, to 3-decimal-GB precision. There is no rounding loss across containers; the total is reconstructed from the integers actually written.

Worked example

Drive total committable: 61 GB. Three chainweb containers pinned to the drive.

  • Base share: floor(61_000 / 3) / 1000 = 20.333 GB.
  • Remainder: 61_000 − 20_333 × 3 = 1 thousandth.
  • Distribution (containers sorted by node UUID ascending, first 1 gets the +0.001 bump): [20.334, 20.333, 20.333].
  • Sum check: 20.334 + 20.333 + 20.333 = 61.000. ✓

All internal arithmetic happens on integer-thousandths to avoid floating-point drift; conversion to a real-number view happens only at storage and display boundaries. The committed_gb column already stores REAL values, so no schema rewrite is needed.
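The worked example can be reproduced with a small integer-thousandths sketch. `equalSplitThousandths` and `Share` are hypothetical names; the sketch assumes node UUIDs compare correctly under plain string ordering and ignores floor protection.

```typescript
interface Share { nodeId: string; thousandths: number; }

// Splits a total (in thousandths of a GB) equally across containers.
// The remainder is handed out as +1 thousandth to the first ids in
// ascending order, so the shares always sum back to the total.
function equalSplitThousandths(totalThousandths: number, nodeIds: string[]): Share[] {
  const n = nodeIds.length;
  if (n === 0) return [];
  const base = Math.floor(totalThousandths / n);
  const remainder = totalThousandths - base * n;
  const sorted = nodeIds.slice().sort(); // ascending, input left untouched
  return sorted.map((nodeId, i) => ({
    nodeId,
    thousandths: base + (i < remainder ? 1 : 0),
  }));
}

// Conversion to GB happens only at the storage/display boundary.
function toGb(thousandths: number): number {
  return thousandths / 1000;
}

const shares = equalSplitThousandths(61000, ["c", "a", "b"]);
// a: 20334, b: 20333, c: 20333 (thousandths), summing to 61000
```

Keeping the split in integers means the sum check is an exact integer comparison rather than a floating-point one.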

Floor protection

The engine refuses to set any container’s commitment below that container’s current chainweb-data-used floor, the existing protective invariant from the manual scoring path. When equal-split would require a sub-floor write, the engine skips that container, leaves its committed_gb unchanged, emits one autonomic_commit.warning_floor audit row, and continues with the rest of the drive. Skipped containers show up as before === after entries in the same tick’s autonomic_commit.adjust row, so the partial update is fully reconstructable from the audit trail.
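The skip-and-continue behaviour might look like the following. All names are hypothetical, values are in GB for readability, and the sketch does not redistribute a skipped container's share, since the text does not say whether the engine does.

```typescript
interface Candidate { nodeId: string; currentGb: number; targetGb: number; floorGb: number; }
interface Delta { nodeId: string; before: number; after: number; }
interface FloorWarning { containerNodeId: string; attemptedCommitGb: number; floorGb: number; }

// Applies equal-split targets, skipping any write that would land below
// the container's chainweb-data-used floor.
function applyFloorProtection(candidates: Candidate[]): { deltas: Delta[]; warnings: FloorWarning[] } {
  const deltas: Delta[] = [];
  const warnings: FloorWarning[] = [];
  for (const c of candidates) {
    if (c.targetGb < c.floorGb) {
      // Unsafe write: keep the current value and record one warning.
      warnings.push({ containerNodeId: c.nodeId, attemptedCommitGb: c.targetGb, floorGb: c.floorGb });
      deltas.push({ nodeId: c.nodeId, before: c.currentGb, after: c.currentGb });
    } else {
      deltas.push({ nodeId: c.nodeId, before: c.currentGb, after: c.targetGb });
    }
  }
  return { deltas, warnings };
}
```

The `warnings` array length corresponds to the `warnings` counter in the toggle response when this runs as the immediate recompute.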

Mount Capacity card — toggle, sub-lines, quasi-reserved bar

The Mount Capacity card on the per-node admin page is the primary cockpit for the autonomic flag. It enumerates every host drive on the box and renders the toggle inline with that drive’s capacity bar — one toggle per eligible drive, no global switch.

  • Toggle visibility — the toggle is rendered only on drives that satisfy the four-axis eligibility predicate above. On ineligible drives the toggle is hidden entirely; no greyed-out affordance, no tooltip explaining why — the row just doesn’t carry the control.
  • Per-container sub-lines — when autonomic is ON, the drive row expands with one sub-line per chainweb container pinned to it, formatted committed N.NNN GB / drive total Y.YYY GB. Both numbers render at 3-decimal precision. Re-evaluated on every probe tick — a kernel update that shaves 700 MB off free disk shows up here within ~60 s without an operator click.
  • Quasi-reserved bar treatment — when autonomic is ON, the committed area in the drive’s capacity bar is rendered the same way as actually-used space, communicating “intent to occupy”. The engine WILL fill that envelope as chainweb data lands; the bar reflects the post-convergence shape, not the current on-disk usage. On manual drives the bar keeps the conventional split between used (filled) and committed (outlined).
  • Stored-flag preservation — if a drive becomes ineligible while the flag is ON (e.g., the last chainweb container is migrated away), the toggle hides itself but the stored autonomic_commit_enabled = 1 row is preserved untouched. The moment a chainweb container is pinned back, eligibility returns and the engine resumes on the next tick — no operator re-toggle needed.

The toggle is the sole authorised mutation path for the flag. There is no global “enable autonomic on every drive” button and no API for cluster-wide flips — per-drive operator decision is the design.

Operator gotcha — disable-then-enable on an ineligible drive

Stored-flag preservation cuts both ways. If you disable autonomic on a drive that is currently ineligible (for example, you’ve already removed the last chainweb container in preparation for retiring the drive), the flag is written to 0 and the preservation guarantee no longer applies: you have just performed an operator-induced reset.

Re-enabling then requires the drive to first regain eligibility — a chainweb container has to be added back before the toggle re-appears in the Mount Capacity card. There is no “hidden re-enable” affordance on an ineligible drive; the four-axis predicate is the gate for every visible toggle, on or off.

The safe pattern: do NOT disable autonomic on a drive you intend to keep autonomic during an ineligible window. Wait until at least one chainweb container is pinned back, then disable from a fully-eligible state. This protects the “stored flag preserved” guarantee from operator-induced reset.

Stoicism subtab — read-only state on autonomic containers

The Stoicism subtab (NodeScoringCard) is the per-container scoring surface. When the container’s pinned drive has autonomic enabled, the card switches to a read-only presentation that prevents accidental manual writes against an engine-managed value.

  • Read-only commit input — the commitment number input is disabled. The current value renders at 3-decimal-GB precision (e.g. 20.334 GB) so the operator sees the engine’s exact assignment, not a rounded-to-integer view.
  • “Managed autonomously” sub-line — below the input the card shows “Managed autonomously on drive {label} — disable on Mount Capacity to edit”, where {label} falls back to the mount point if no human label is set. The operator always knows which drive owns the lock.
  • Disable affordance — a one-click Disable autonomic on this drive button sits inline with the sub-line. Clicking opens a confirmation dialog; on confirm, the card issues the same PATCH … /autonomic-commit with { enabled: false } described above. After the round-trip, the input flips back to writable and the operator can resume manual scoring from the frozen baseline.

Server-side, the manual scoring endpoint accepts the operator’s value even when the drive is autonomic (per the fractional-acceptance widening); the engine’s next tick simply overwrites it on convergence. The read-only UI is the operator-facing guard against this transient overwrite, not a server contract.

Install Wizard — auto-fill on autonomic target drives

The Install Wizard’s segregated-container flow walks the operator through choosing a target drive and a commitment slice for a new chainweb container. When the chosen target drive already has autonomic enabled, the slice slider is replaced by a read-only display showing the post-add equal share — the engine, not the operator, decides what the new container gets.

  • Auto-computed value: the wizard reads the drive’s current total committable amount, divides by N + 1 (existing chainweb children plus the one being added), rounds down to 3-decimal-GB, and displays that value as the slice. Worked example: a drive with 61 GB total committable and 2 existing chainweb children (post-add count 3) displays 20.333 GB as the conservative base share.
  • Score preview reflects auto-value: the live score preview at the bottom of the wizard updates against the auto-computed slice, not a slider position. The operator sees the same number the engine will write on first tick.
  • Persisted commit value: on submit, the new container is created with the exact 3-decimal value persisted as committed_gb on the host_drive_attachments row. The very next worker tick may bump the value by +0.001 if the new container’s UUID happens to sort first under the deterministic remainder distribution. This is expected convergence, not drift.

On manual (non-autonomic) target drives the wizard keeps the slider as before. The auto-fill is a per-target switch, not a global wizard mode.
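The wizard's auto-computed slice reduces to one integer division, sketched here under the assumption that the total committable amount arrives as a GB value with at most three decimals. `previewPostAddShareGb` is a hypothetical name.

```typescript
// Conservative base share for a container about to be added:
// floor(total / (N + 1)) at 3-decimal-GB precision.
function previewPostAddShareGb(totalCommittableGb: number, existingChainwebCount: number): number {
  const totalThousandths = Math.round(totalCommittableGb * 1000);
  const postAddCount = existingChainwebCount + 1;
  return Math.floor(totalThousandths / postAddCount) / 1000;
}

const previewGb = previewPostAddShareGb(61, 2); // 20.333
```

The preview deliberately ignores the remainder thousandths, which is why the first worker tick may add +0.001 to one container.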

Display formatting — 3-decimal vs integer GB

Every visible commitment value in the admin UI passes through one helper: formatCommittedGb(value, autonomic). The rule is mechanical:

  • When the drive is autonomic — render at 3-decimal precision (e.g. 20.334 GB). The engine’s exact value, no rounding loss at the display boundary.
  • When the drive is manual (autonomic OFF) — render at integer GB (e.g. 20 GB). Mirrors the existing manual-mode floor-to-10 commitment shape; no visible change for non-autonomic drives.

The helper is wired into NodeScoringCard, the Mount Capacity card’s drive-total + sub-lines, and the fleet view’s server-grouped-nodes commitment column — no view bypasses it. A drive flips from 20 GB to 20.000 GB the moment its autonomic flag flips ON; this precision shift is the visual cue that the engine has taken over.
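A body for the helper could be as small as the sketch below. Only the name and the two-argument signature come from the text; the implementation is illustrative, and rounding down in manual mode is an assumption, since the rounding rule for non-integer values on a manual drive is not stated.

```typescript
// Illustrative body for formatCommittedGb(value, autonomic).
function formatCommittedGb(value: number, autonomic: boolean): string {
  if (autonomic) {
    return value.toFixed(3) + " GB"; // engine's exact 3-decimal value
  }
  // Assumption: manual mode truncates toward the lower integer GB.
  return Math.floor(value) + " GB";
}

const manualLabel = formatCommittedGb(20, false);   // "20 GB"
const autonomicLabel = formatCommittedGb(20, true); // "20.000 GB"
```

The same stored value therefore renders differently on either side of the flag flip, which is the visual cue described above.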
