v.Chaos.Jason.0-a·v0.7.4q
Finalizes the role matrix that was nudged into existence by the real
handover flow (ancient admin sets up a hub, then hands accounts off to
modern admins who in turn manage their own clients).
### Admin console: per-link access tags
The `/admin` quick-links list now prefixes each entry with a compact
three-glyph role badge:
- ancient (gold)
- modern (blue)
- client (grey)
A glyph lights up when that role can access the page and is greyed out
otherwise. Unavailable links render disabled with a small
"(restricted)" tag: visible but unclickable, so modern/client admins
can see at a glance what exists but isn't theirs to touch.
### Role matrix tightened
- **Acolytes**: now ancient-only (page gate + API GET both locked to
ancient). Modern admins were never able to mutate the roster (already
ancient-only there), but they could browse. That's gone now: the
public-site team roster is an ancient concern.
- **Admins & Clients page**: no longer loads for clients. The page gate
now 404s any role below modern. (Quick-links already hid it, but a
direct URL hit would still render a stripped view.)
- **Client management** (`/api/admin/clients/...`): promote, revoke,
and reset-onboarding actions now accept both ancient and modern admins
via the new `requireFreshAdminNonClientConfirmApi` guard. Client role
itself is still rejected.
- **Admins roster API**: GET now rejects clients explicitly
(404, not 403, keeping the "not-admin" veneer).
- **Admins page UI**: Promote-to-Client form, Revoke, and Reset
Onboarding buttons now render for modern admins too.
Grant-Modern-Admin stays ancient-only.
- **Mailcow mailbox list**: modern admins now fetch it at page load so
the Promote-to-Client picker populates for them.
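
The gating rule in this list (a minimum role per page or route, with 404 rather than 403 for anyone below it) can be sketched as a small pure function. This is an illustrative sketch only: `ROLE_RANK` and `gateStatus` are hypothetical names, not the helpers in `lib/admin.ts`, and the real guards also handle sessions and fresh-confirm.

```typescript
// Hypothetical sketch: only the three role names come from the changelog.
type AdminRole = "ancient" | "modern" | "client";

// Higher number = more privilege (ancient > modern > client).
const ROLE_RANK: Record<AdminRole, number> = { client: 1, modern: 2, ancient: 3 };

// Returns the HTTP status a gate would answer with: 200 when the caller's
// role meets the minimum, 404 otherwise. Using 404 instead of 403 avoids
// confirming to under-privileged callers that the page exists.
function gateStatus(role: AdminRole | null, minimum: AdminRole): 200 | 404 {
  if (role === null) return 404; // not an admin at all
  return ROLE_RANK[role] >= ROLE_RANK[minimum] ? 200 : 404;
}
```

Under this sketch the Admins & Clients page would use `minimum = "modern"` and the Acolytes page `minimum = "ancient"`.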
### Files touched
- `lib/admin.ts`: added `requireFreshAdminNonClientConfirmApi`.
- `pages/admin/index.tsx`: new `<AccessTag>` + `<QuickLink>` helpers;
Quick-links rewritten to use them.
- `pages/admin/acolytes.tsx`: role gate tightened to ancient.
- `pages/admin/admins.tsx`: UI gates + effect dependency for Mailcow
fetch now include modern role.
- `pages/api/admin/acolytes/index.ts`: `requireAncientAdminApi`.
- `pages/api/admin/clients/index.ts`: POST guard swapped;
GET rejects clients.
- `pages/api/admin/clients/[email].ts`: DELETE + PATCH guards swapped.
- `pages/api/admin/admins/index.ts`: GET rejects clients.
---
v.Chaos.Jason.0-b·v0.7.4j
Five slices bundled in one release so v0.7.4 closes cleanly. Each
addresses a gap surfaced during today's real-world VPS onboarding.
Phase code: **CR3** (Client Role 3, onboarding end-to-end).
### v0.7.4e: Install Wizard Certificate step
Previously: fresh installs got a self-signed P-256 cert and the
operator had to manually go to the Identity tab, paste the DuckDNS
token, run Obtain-LE, restart. Long chain of clicks + context
switches with the DuckDNS dashboard.
Now: Identity step in the wizard expands when `p2pHostname` ends in
`.duckdns.org`. Extra fields: DuckDNS token (required for auto-LE)
+ email (optional). On install, after `docker compose up`, the
handler runs `certbot certonly --manual --preferred-challenges dns`
with auto/cleanup hooks that hit DuckDNS's update API. Cert files
land at `<tlsDir>/tls-{cert,key}.pem` (what compose mounts) →
container restarts → node emerges with CA-signed cert ready to peer.
Renewal deploy-hook installed at
`/etc/letsencrypt/renewal-hooks/deploy/stoa-inst.sh`; it re-copies
the renewed cert to the compose-mount paths and restarts
stoa-node automatically after each successful renewal run by certbot.timer.
Non-DuckDNS hostnames: LE step skipped; self-signed bootstrap
remains and operator can run Obtain-LE manually (existing flow,
unchanged).
### v0.7.4j: Seed-at-install option
Install Wizard Profile step grew a checkbox: "Install with current
hub seed (recommended)". Shows seed cut height + size + donor.
Defaults ON when a current seed exists. On apply, install handler
replaces empty chainweb boot with an inline call to the existing
`stoachainReseedHandler` (reuses v0.7.3c-f's rollback + cert-preserve
+ stream-plumbing logic). Net: install completes with chainweb at
the donor's cut-at-backup time, not cut=0. Minutes saved on stoa;
hours-to-days on Kadena-mainnet-sized chains later.
If no current seed on hub, the checkbox turns into a grey note
linking to `/admin/seeds` to produce one first.
### v0.7.4g: Already-managed detection at Add-Node
New preflight endpoint `POST /api/admin/nodes/already-managed-probe`.
SSHes with password auth to the target and runs 5 detection checks:
- `/etc/sudoers.d/ancientholdings-stoa` exists (hub-sudoers file)
- chainweb-node process running
- `stoa-node` container present
- `RunStoaNode.managed.sh` file under `/home`, `/mnt`, or `/srv`
- `ah-hub:` / `ancientholdings-hub` marker in authorized_keys
If any trigger, the Add-Node wizard's Bootstrap submit surfaces a
`window.confirm` listing detected signals before proceeding.
Operator can Cancel (safe default) or click OK to force-adopt
anyway (e.g. re-adding after accidental delete, or they've already
cleaned up another hub's leftover).
Non-destructive probe: purely read-only. Prevents the
"two hubs dueling for one server" footgun.
### v0.7.4h: Key-purge on unmanage + `/admin/orphans` page
Node DELETE endpoint rewritten. New flow:
1. SSH into the target, remove any line in `~/.ssh/authorized_keys`
(and `/root/.ssh/authorized_keys`) containing the `ah-hub:`
marker. Backup copy left as `*.bak.<timestamp>`.
2. Unconditionally delete vault secret + nodes row (the hub commits
to losing its SSH access regardless of whether step 1 succeeded).
3. If step 1 failed (network partition, target offline, auth
failure), write a row into `node_orphans` capturing what was
attempted + the error.
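
Step 1 of this flow boils down to filtering marker lines out of an `authorized_keys` file. The sketch below shows that filter as a pure function over the file content; `purgeHubKeys` and `PurgeResult` are hypothetical names, and the real handler additionally does the SSH round-trip and writes the `*.bak.<timestamp>` copy.

```typescript
// Hypothetical sketch of the marker-based key purge (pure string logic only).
interface PurgeResult {
  kept: string;    // new file content with hub keys removed
  removed: number; // number of lines dropped
}

// Marker named in the changelog; any line containing it is treated as a hub key.
const HUB_MARKER = "ah-hub:";

function purgeHubKeys(authorizedKeys: string): PurgeResult {
  const lines = authorizedKeys.split("\n");
  const keptLines = lines.filter((line) => !line.includes(HUB_MARKER));
  return {
    kept: keptLines.join("\n"),
    removed: lines.length - keptLines.length,
  };
}
```

A caller could treat `removed === 0` as "nothing to purge" and skip the write; whether the real endpoint does so is not stated here.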
New admin page `/admin/orphans` (ancient-only). Lists unresolved
orphans with clear "SSH in yourself and remove the ah-hub: line"
instruction + a "Mark resolved" button. Keeps resolved history
(last 20) with the operator's cleanup note.
### v0.7.4i: Onboarding transparency modal for clients
First time a `client`-role admin lands on `/admin`, modal appears:
- Names the hub's ancient admin (first in env list)
- States plainly: hub has full SSH access to their managed
servers, every action is audit-logged, client retains ownership,
unmanage removes the hub's key
- "I understand - continue" stamps `clients.accepted_transparency_at`
(one-way). "Cancel - sign me out" redirects to home.
Modal fires **only** for role=client. Ancient/modern admins see
nothing (they already know the game). Uses new endpoints:
- `GET /api/admin/clients/me`: role + acceptance stamp
- `POST /api/admin/clients/me`: stamp acceptance (no-op if
already accepted)
### Backlog
New `plans/BACKLOG.md` seeded with:
- **Storage-partition awareness per service** (user-requested
today): ability to see where each hub-hosted service lives
(partition + path + free space) and move services between
partitions. Matters once hub hosts multiple websites
(caduceus subdomain + others). Live server currently has a
480 GB partition at 29% that will eventually need management.
- A few smaller items surfaced in today's VPS-onboarding arc.
### Version bump
- `lib/version.ts` → **v0.7.4j**. Phase code **CR3** (Client Role
3, onboarding end-to-end). Closes the v0.7.4 phase as planned in
`plans/v0.7.4-client-role.md`.
With this release a fresh VPS → synced stoa peer is **one form in
the Install Wizard** (DuckDNS token being the only external dance
the operator still does manually: grab it once from DuckDNS
dashboard). Original 45-min ops slog compressed to ~10 min.
---
v.Chaos.Jason.0-c·v0.7.4k
User spotted an inconsistency on `/admin/seeds`: AncientMiner's row
showed `bytales.duckdns.org` (nice DNS name) while IonosFiveVPS
showed `82.165.48.252` (raw IP), even though IonosFive now has
`kjrkentolopon.duckdns.org` as its P2P identity.
Cause: the UI was showing `nodes.host` (SSH entry point from the
Add-Node wizard). AncientMiner happened to be added via its
DuckDNS name for SSH; IonosFive was added via raw IP. The two
don't have to match.
**Fix**:
- `ManagedNodeSeedRow` gains `p2pHostname: string | null`,
populated from live argv's `p2p-hostname` flag (skipping the
`0.0.0.0` placeholder).
- Seeds page prefers `p2pHostname` when displaying node identity,
falls back to `host` (SSH) if no p2p-hostname is set yet.
- Appends `(ssh: <host>)` in muted text when the two differ, so
the operator can still see the SSH entry point at a glance.
- Tooltip explains which is which on hover.
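
The display rule above can be sketched as two small helpers. Only `p2pHostname` and `host` come from the changelog; `SeedRowIdentity`, `normalizeP2pHostname`, and `displayIdentity` are hypothetical names for illustration, and the real page renders the SSH part in muted text rather than as one string.

```typescript
// Hypothetical sketch of the Seeds-page identity display rule.
interface SeedRowIdentity {
  host: string;               // SSH entry point from the Add-Node wizard
  p2pHostname: string | null; // from live argv's p2p-hostname flag, if set
}

// Treat a missing flag or the 0.0.0.0 placeholder as "no p2p hostname yet".
function normalizeP2pHostname(flag: string | undefined): string | null {
  if (!flag || flag === "0.0.0.0") return null;
  return flag;
}

// Prefer the P2P hostname; append the SSH host only when the two differ.
function displayIdentity(row: SeedRowIdentity): string {
  const primary = row.p2pHostname ?? row.host;
  return primary !== row.host ? `${primary} (ssh: ${row.host})` : primary;
}
```

With the IonosFive values from this entry, the row would read `kjrkentolopon.duckdns.org (ssh: 82.165.48.252)`.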
Behavior preserved: the backing data still uses `host` for SSH.
Only the display changed.
**Version bump**
- `lib/version.ts` → `v0.7.4k`. Phase stays `CR2`.
v.Chaos.Jason.0-d·v0.7.4f
**The real bug behind "fresh install won't sync."** User with a fresh
IonosFive VPS had:
- LE-signed cert ✓
- Real DNS hostname pointing at the box ✓
- `p2p-hostname` set to that hostname ✓
- Port 1789 reachable ✓
And still cut=0, no peers, no sync. Diagnosis:
`RECOMMENDED_PROFILE` (what the Install Wizard uses) does not
include `known-peer-info`. The `stoa` custom chainweb variant has
**no built-in bootstrap peer list**: that's a `mainnet01`-only
thing baked into upstream chainweb-node. So a fresh `stoa` node
with no `known-peer-info` has **zero peer-discovery seeds** and
sits at cut=0 forever waiting to be contacted, which can't happen
either since its hostname was just created seconds ago and nobody
in the network knows about it.
**Fix**: add `'known-peer-info': ['node1.stoachain.com:1789',
'node2.stoachain.com:1789']` to `RECOMMENDED_PROFILE`. Two entries
for redundancy: a fresh node survives one seed being
temporarily down. Once peer gossip discovers the broader graph
on first handshake, the seed entries become non-critical.
`ANCIENT_PROFILE` had one entry already; parity restored.
**For existing nodes that were affected**: add `known-peer-info`
manually via Flag Editor and Restart (takes 2 min). Or re-run
Install Wizard (cleanup auto-wipes and re-installs with the new
default).
**Version bump (CR2 continues)**
- `lib/version.ts` → `v0.7.4f`. Skipped `.e` because that slot is
reserved for the "certificate step in Install Wizard" slice
(still planned; this patch unblocks the current user first).
v.Chaos.Jason.0-e·v0.7.4d
Install handler's self-signed cert generator was still emitting
ECDSA P-384 / SHA-384, a copy-paste carryover from chainweb-node's
example script that never matched any production Stoa node.
node1 / node2 / AncientMiner all use ECDSA P-256 / SHA-256 per
`RunStoaNode.sh`. v0.7.3g fixed this for `stoachain-cert-rotate`
but missed `stoachain-install`: same class of miss as the
compose-plugin one.
**Fix**: install handler's `openssl req -x509` now uses
`-newkey ec -pkeyopt ec_paramgen_curve:P-256 -sha256`.
**Important note for the operator**: the curve change does NOT
make peers accept the node. Chainweb-node validates peer certs
against the **system CA bundle**; any self-signed cert (P-256 or
P-384) is rejected as "unknown CA" (verified fact from the
node2 TLS forensics). A fresh install thus produces a
chainweb-node that peers refuse. To get peer acceptance:
1. Point DNS for your chosen `p2p-hostname` at the new VPS.
2. Chainweb tab → Identity → "Obtain Let's Encrypt certificate"
(HTTP-01 if port 80 is free; DNS-01 for DuckDNS).
3. Restart the node.
v0.7.5 (planned) folds the certbot step into the Install Wizard
itself, so the wizard asks "enter hostname + obtain LE now?" at
install time. For now, it's a manual post-install click.
**Version bump**
- `lib/version.ts` → `v0.7.4d`. Phase stays `CR2`.
v.Chaos.Jason.0-f·v0.7.4c
Install failed on a fresh Ionos Ubuntu 24.04 VPS with empty error
"docker compose up failed:". This is the same class of bug that
convert-supervision had through v0.7.3r/t; the install handler
never got the fix. Two root causes hit simultaneously:
1. Ionos's default docker CLI is from `docker.io` apt package, which
**does not include the compose plugin**. `docker compose up -d`
parses `-d` as a top-level docker flag and chokes.
2. The install handler's `dockerComposeUp` used `2>&1` to merge
stderr into stdout but then only read `r.stderr` on failure, which
was always empty. The operator saw "docker compose up failed:" with
nothing after the colon.
**Fixes (`lib/handlers/stoachain-install.ts`)**
- New preflight step 5b: `ensureDockerComposePlugin(target)` runs
between `docker pull` and `docker compose up`. If `docker compose
version` fails, fetches the v2.29.1 compose plugin binary from
GitHub releases and drops it into
`/usr/libexec/docker/cli-plugins/docker-compose`. Architecture-
aware (x86_64 / aarch64 / armv7). Uses sudo + tee + chmod (all
already in the canonical sudoers list).
- `dockerComposeUp` now:
  - streams compose output live via `onChunk` (you see pull/create/
    start progress in the job log in real time)
  - captures the merged output and includes it in the error message
    (tail -600 chars, with explicit exit code)
  - adds a defensive `docker rm -f stoa-node` before compose up to
    survive stale containers from prior failed installs
  - bumped timeout 60s → 5min for cold image pulls
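
The "merged output tail plus exit code" error message can be sketched as below. `composeFailureMessage` is a hypothetical helper name and the exact wording of the real message is an assumption; the point is that the message is built from the merged stream, so it can no longer come out empty.

```typescript
// Hypothetical sketch: build the failure message from the merged
// stdout+stderr stream rather than from an always-empty stderr.
function composeFailureMessage(
  mergedOutput: string,
  exitCode: number,
  tailChars = 600, // the changelog keeps the last 600 characters
): string {
  const tail = mergedOutput.slice(-tailChars).trim();
  const detail = tail.length > 0 ? tail : "(no output captured)";
  return `docker compose up failed (exit ${exitCode}): ${detail}`;
}
```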
**Version bump**
- `lib/version.ts` → `v0.7.4c`. Phase code stays `CR2`.
v.Chaos.Jason.0-g·v0.7.4b
Second slice of v0.7.4. Pure-plumbing v0.7.4a now has a visible
surface β you can promote mailcow mailboxes to `client`, assign nodes
to those clients, and clients will see only their own nodes.
**Phase code**: v0.7.4b ships as `CR2` (Client Role 2, promotion +
ownership UI).
**Admin page: Clients section (`/admin/admins`)**
- New "Clients" roster section (parallel to the Ancient+Modern
roster). Shows each client's email, promote-date, promoter, and a
"pending onboarding" badge when `accepted_transparency_at` is null
(wired in v0.7.4e).
- New "Promote to Client" form (ancient-only): dropdown lists
Mailcow mailboxes that aren't already admins / clients.
- Revoke button on each client row (ancient-only; warns that nodes
owned by revoked client become stranded until reassigned).
- Page re-titled "Admins & Clients" with updated 3-tier intro copy.
**New API routes**
- `GET /api/admin/clients`: list clients.
- `POST /api/admin/clients`: promote an email (ancient + fresh-confirm).
Refuses if email is already ancient/modern (upgrade path has to go
through explicit tier removal first).
- `DELETE /api/admin/clients/[email]`: revoke (ancient + fresh-confirm).
**Nodes list (`/admin/nodes`)**
- Shows `owner:` line per node (email or "unowned · ancient-only").
- SSR filters the list by ownership: modern/client admins see only
nodes they own. Ancient sees all, including unowned.
**Node detail (`/admin/nodes/[id]`)**
- SSR returns 404 if caller can't `canAccessNode` (same behavior as
the API layer: no leak between "doesn't exist" and "not yours").
- New `OwnerRow` component under the SSH line. Shows the owner email
or "unowned · ancient-only". Ancient admins get a "change" link
that inline-edits the field, fresh-confirms via password modal,
PATCHes `/api/admin/nodes/[id]/owner`, reloads.
- New `PATCH /api/admin/nodes/[id]/owner` API route (ancient + fresh).
**Add-Node wizard (`/admin/nodes/new`)**
- New "Owner email" field at the bottom of the shared form. Defaults
to the admin doing the adding.
- Ancient admins can type any email. Modern/client admins see the
field but it's locked to their own email (the API also refuses
mismatched ownership for non-ancient callers).
- Both `POST /api/admin/nodes` (paste-key) and
`POST /api/admin/nodes/bootstrap` (password bootstrap) accept
`ownerEmail`, default to caller, validate.
**Schema changes**
- `CreateNodeInput` and `BootstrapInput` gain `ownerEmail?: string | null`.
- `NodeRow` and `PublicNode` gain `owner_email: string | null`.
- `bootstrapNode` persists `owner_email` at INSERT time; defaults to
`issuedBy` if caller didn't specify.
- `createNode` persists `owner_email` at INSERT time (lowercased).
**How to test after dev reload**
1. Log in as ancient admin → `/admin/admins` → Clients section
empty. Pick a non-admin mailbox → "Promote to Client". Confirm
it appears in the Clients roster.
2. `/admin/nodes/[id]` → click "change" next to Owner. Assign it
to the client you just promoted. Save.
3. Sign out. Sign in as the client's email. You land at `/admin`
with their node visible at `/admin/nodes`. No other admin pages
accessible (they'd 404).
**Version bump**
- `lib/version.ts` → `v0.7.4b`, phase `CR2`.
v.Chaos.Jason.0-h·v0.7.4a
Starts the v0.7.4 phase (client role + ownership) per
`plans/v0.7.4-client-role.md`. This slice is **pure plumbing**: no
user-visible changes yet. Subsequent slices (b-e) add the UI for
promotion, owner assignment, already-managed detection, key purge on
unmanage, and the onboarding transparency modal.
**Phase code**: v0.7.4a ships with phase code `CR1` (Client Role 1,
Ownership plumbing), replacing SC5.
**Migration 016**
- `nodes.owner_email TEXT`: nullable column. Pre-v0.7.4a rows keep
NULL = "unowned, ancient-only". Fresh Add-Node flows in v0.7.4b
will populate it explicitly.
- New `clients` table: mirrors `modern_admins` shape. Email +
created_at + created_by + accepted_transparency_at (null until
v0.7.4e's modal consent).
- New `node_orphans` table: audit trail for unmanage attempts where
the hub couldn't remove its SSH key from the target. v0.7.4d
populates it.
**`AdminRole` extended**
- Added `'client'` to the union. Priority: `ancient > modern > client`.
- `getAdminRole()` checks `clients` table when neither ancient-env nor
`modern_admins` matches.
**New helpers (`lib/admin.ts`)**
- `canAccessNode(caller, node)`: ancient always; modern/client only if
owner_email matches their email; null owner = false for non-ancient.
- `requireOwnedNodeApi(req, res, opts?)`: route guard combining
`requireAdminApi` + node lookup + ownership check. Returns
`{ email, role, session, nodeId, ownerEmail }`. Pass `{ fresh: true }`
for fresh-confirm routes. 404s uniformly on unauthorized or
not-found (no surface leak).
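
The ownership rule for `canAccessNode` is small enough to sketch directly. The types below (`Caller`, `NodeOwnership`) are hypothetical stand-ins for whatever `lib/admin.ts` actually uses, and the case-insensitive comparison is an assumption based on `owner_email` being stored lowercased.

```typescript
// Hypothetical sketch of the ownership check; type names are illustrative.
type AdminRole = "ancient" | "modern" | "client";

interface Caller {
  email: string;
  role: AdminRole;
}

interface NodeOwnership {
  owner_email: string | null; // null = unowned, ancient-only
}

function canAccessNode(caller: Caller, node: NodeOwnership): boolean {
  // Ancient admins see every node, including unowned ones.
  if (caller.role === "ancient") return true;
  // Unowned nodes are never visible to modern/client admins.
  if (node.owner_email === null) return false;
  // Assumption: compare case-insensitively, since stored emails are lowercased.
  return node.owner_email.toLowerCase() === caller.email.toLowerCase();
}
```

A route guard built on this would answer 404 both when the node is missing and when the check returns `false`, which is what keeps "doesn't exist" and "not yours" indistinguishable.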
**Node-route wiring (14 files updated, 8 skipped)**
- Updated (now ownership-scoped):
`[id].ts`, `apt-upgrade`, `backup`, `metrics/[...netdataPath]`,
`netdata-install`, `probe`, `stoachain/control`, `stoachain/docker-logs`,
`stoachain/flags` (GET only; PATCH stays ancient), `stoachain/logs`,
`stoachain/peer-activity`, `stoachain/preflight`, `stoachain/status`,
`test`.
- Skipped (ancient-only by design, bypass ownership):
`drive-benchmark`, `stoachain/cert-rotate`, `stoachain/certbot-obtain`,
`stoachain/convert-supervision`, `stoachain/install`,
`stoachain/peer-trust-reset`, `stoachain/reseed`, `sudoers-repair`.
**Master plan updated**
- `plans/control-hub.md` §16 Progress log: added the 2026-04-18 to 2026-04-21
SC-series build-out summary + the v0.7.4a entry.
**Next**: v0.7.4b, client-role promotion UI in `/admin/acolytes` +
owner-assignment UI on node detail.
v.Chaos.Jason.0-i·v0.7.3af
v0.7.3ae's resolver fixed node2-hardcoding but had a gap: adopted
docker nodes (never went through the Install wizard) have
`stoachain_runner_path = NULL` in the DB, so the resolver fell
through to "use live argv's `--database-directory`". For docker
nodes that value is the **container-internal** path (`/data`) because
chainweb-node runs inside the container. Resolver would have
returned `/data/backups` and tar would have failed again.
Hit on live for AncientMiner.
**Fix**: when supervision is docker AND we have no captured runner
path, run `docker inspect stoa-node` to read the host source of the
`/data` bind mount. That's the authoritative host data dir.
**Resolution flow (now)**
1. Hub-installed docker (runner_path ends compose.yml) → derive
from stoa-root convention
2. Adopted docker (runner_path NULL, supervision=docker) →
`docker inspect` the `/data` mount source
3. screen/systemd → live argv's `--database-directory` (host path)
4. Fallback → stored flags' database-directory
5. Throw with actionable error if nothing resolves
The `/data !== db` sanity guard now also prevents accidentally
treating a container-internal path as a host path in the later
fallbacks.
**Version bump**
- `lib/version.ts` → `v0.7.3af`.
v.Chaos.Jason.0-j·v0.7.3ae
Two bugs surfaced once v0.7.3ad stopped auto-promoting junk seeds
and forced the real failure into visibility:
**Bug 1: backup handler had the remote backup dir hardcoded**
(`/mnt/nvmedrive/StoaNodeData/backups`). Worked for node2 by
coincidence; every other node's tar ran against a non-existent
path and produced an empty archive. Seen in the wild on live's
AncientMiner attempt:
```
tar: /mnt/nvmedrive/StoaNodeData/backups/1776731165148056: Cannot open
tar: Error is not recoverable: exiting now
```
**Bug 2: donor eligibility threshold was 95% of the tallest
*candidate***. If only one node had `enable-backup-api` on, it was
always ≥95% of itself and passed, even when another managed node
(without backup-api) showed the network was miles ahead.
**Fixes**
- `lib/handlers/backup-stoachain.ts`:
  - New `resolveHostBackupDir(node, nodeId, log)` helper. For docker
    nodes: derive from `stoachain_runner_path` (compose dir →
    `<stoaRoot>/data/backups`). For screen/systemd: use live
    argv's `--database-directory` (host path directly) →
    `<db-dir>/backups`. Falls back to stored flags; throws with a
    clear operator message if neither source resolves.
  - The `du` baseline measurement also uses the derived data dir
    (not the hardcoded path).
- `lib/seeds.ts`:
  - Max cut is now tracked across ALL reachable managed nodes, not
    just backup-api-enabled candidates.
  - Eligibility threshold raised 95% → **999‰ (99.9%)**. Matches
    the "sync progress" green-zone threshold in the per-node Status
    card, so what the admin sees as "synced" is exactly what the
    donor picker accepts.
  - `cut-too-low` reason text now shows permille: "sync progress
    823.1‰ is below the 999‰ donor threshold".
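
The two `lib/seeds.ts` changes combine into one rule: measure every candidate against the tallest cut seen on any reachable node, and require at least 999‰ of it. The sketch below illustrates that rule; `NodeCut`, `Verdict`, and `pickEligibleDonors` are hypothetical names, and only the threshold and the reason text come from this entry.

```typescript
// Hypothetical sketch of the donor eligibility rule.
const DONOR_THRESHOLD_PERMILLE = 999;

interface NodeCut {
  name: string;
  cut: number;               // current cut height reported by the node
  backupApiEnabled: boolean; // only these nodes can act as donors
}

interface Verdict {
  name: string;
  eligible: boolean;
  reason?: string;
}

function pickEligibleDonors(reachable: NodeCut[]): Verdict[] {
  // Reference height comes from ALL reachable nodes, not just candidates.
  const maxCut = Math.max(0, ...reachable.map((n) => n.cut));
  return reachable
    .filter((n) => n.backupApiEnabled)
    .map((n): Verdict => {
      const permille = maxCut > 0 ? (n.cut / maxCut) * 1000 : 0;
      if (permille >= DONOR_THRESHOLD_PERMILLE) {
        return { name: n.name, eligible: true };
      }
      return {
        name: n.name,
        eligible: false,
        reason: `sync progress ${permille.toFixed(1)}‰ is below the ${DONOR_THRESHOLD_PERMILLE}‰ donor threshold`,
      };
    });
}
```

With the old rule, a lone backup-api node was compared against itself and always passed; here a taller node without backup-api still raises the bar.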
Belt-and-suspenders with v0.7.3ad's 1 GiB archive-size check:
size check catches empty archives at write time; sync check
catches partially-synced donors at pick time.
**Version bump**
- `lib/version.ts` → `v0.7.3ae`.
v.Chaos.Jason.0-k·v0.7.3ad
Hit on live: the auto-refresh job promoted a **714-byte archive**
from AncientMiner (manifest: `innerBytes: 20, remoteSizeBytes: 0`)
as the hub's current seed. Happens when the donor's chainweb backup
API returns a near-empty archive, most likely because the donor
wasn't ready (recent restart, still syncing, internal backup worker
uninitialized).
Without a guard, a reseed from this "seed" would replace target
nodes with an empty data dir. Real footgun: seed-refresh must
refuse to promote junk.
**Fix (`lib/handlers/seed-refresh.ts`)**
- After the backup sub-handler returns, cross-check `size_bytes`.
- If below `MIN_SEED_SIZE_BYTES = 1 GiB`, throw with a clear
operator message. The backup row is preserved (operator can
inspect or delete via `/admin/backups`); the existing current
seed (if any) is untouched.
- Threshold chosen to be generous enough that any healthy chainweb
donor clears it, strict enough that an empty-archive failure gets
caught (real stoa-chain data is ~50 GB by now).
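
The size guard itself is a one-line comparison. A minimal sketch, assuming the check runs on the `size_bytes` value returned by the backup sub-handler; `assertSeedLargeEnough` is a hypothetical name and the error wording is illustrative.

```typescript
// Hypothetical sketch of the minimum-size guard before promoting a seed.
const MIN_SEED_SIZE_BYTES = 1024 ** 3; // 1 GiB

function assertSeedLargeEnough(sizeBytes: number): void {
  if (sizeBytes < MIN_SEED_SIZE_BYTES) {
    // Throwing aborts promotion; the backup row and the existing
    // current seed are left untouched by the caller.
    throw new Error(
      `seed archive is ${sizeBytes} bytes, below the ` +
        `${MIN_SEED_SIZE_BYTES}-byte (1 GiB) minimum; refusing to promote`,
    );
  }
}
```

The 714-byte archive from this incident fails the check immediately, while an archive in the tens of gigabytes passes.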
**Cleanup on live**
- Deleted the bad seed_archives row + 714-byte .ahbk file on the
production hub (one-off SSH). Next scheduled seed-refresh will
produce a real seed once a healthy donor is available.
**Version bump**
- `lib/version.ts` → `v0.7.3ad`.
v.Chaos.Jason.0-l·v0.7.3ac
Follow-up to v0.7.3ab: seeds and client backups have different
semantics (hub infrastructure vs client-facing archives) and mixing
them in the Backups UI is confusing. Splits them cleanly.
**Changes**
- `listBackups(opts)` gains `excludeSeeds?: boolean`. The Backups
page + API both pass it to exclude seed-referenced rows.
- `/admin/backups` no longer shows seeds. Header paragraph now
points operators at `/admin/seeds` for hub-infrastructure archives.
- New endpoint `GET /api/admin/seeds/[id]/download`:
  - **Ancient admin + fresh-confirm required**
  - Serves the `.ahbk` file (HTTP Range supported, resumable)
  - **No auto-delete**: hub keeps its copy, operator gets a copy
  - Filename baked with seed status + promote date for cold-storage
    clarity (`stoa-seed-current-2026-04-21-<id8>.ahbk`)
- `/admin/seeds` History table gains a `Download` column with a
`↓ .ahbk` button per row. Button triggers the password modal
(stamps fresh-confirm on the session) then navigates to the
download URL.
- History section has an explanatory paragraph: seeds are
infrastructure, download is out-of-band only, no auto-delete.
**Use cases for the download**
- Cold/offline archive of the reseed baseline (disaster recovery)
- Manual reseed on a firewalled node that can't SSH to the hub
- Inspection / diagnostics of the archive content
**Version bump**
- `lib/version.ts` → `v0.7.3ac`.
v.Chaos.Jason.0-m·v0.7.3ab
User caught a real footgun: the hub's seed archive (the `.ahbk` used
for new-node installs + reseeds) shares the same `data/backups/`
directory and `backups` table as client-facing backups. Downloading
it via the normal backups page auto-deleted the file on completion
(standard behavior for client backups), which would orphan the seed
and break future reseeds.
**Fix: seed-referenced backups are now protected**
- New helpers in `lib/backups.ts`:
  - `getBackupSeedStatus(id)` → `'current' | 'archived' | null`
  - `listBackupSeedStatuses(ids)`: batch map for list endpoints
- `deleteBackup(id, opts)` now throws `BackupIsSeedError` if the
backup is seed-referenced. Pass `{ force: true }` only from
internal seed-management code (none currently; reserved for
future demotion flows).
- `DELETE /api/admin/backups/[id]` catches the new error, returns
**409 Conflict** with the seedStatus, and logs the refusal.
- `GET /api/admin/backups/[id]/download` auto-delete-on-completion
logic now skips seed-referenced backups. Staged `.tar.gz.ready`
is still cleaned up (it's disposable); only the `.ahbk` is the
seed archive and stays on disk.
- `GET /api/admin/backups` and `GET /api/admin/backups/[id]` now
include `seedStatus` in the response.
- `/admin/backups` UI surfaces this:
  - `HUB SEED · current` (orange) or `HUB SEED · archived` (grey)
    badge next to the label
  - Tooltip on Download explains auto-delete is skipped for seeds
  - Header paragraph mentions the HUB SEED exemption
Downloads of seeds now behave as: admin gets a copy of the file,
hub keeps the file, reseed remains possible. No more one-shot
"download → lose the seed" accident.
**Version bump**
- `lib/version.ts` → `v0.7.3ab`.
v.Chaos.Jason.0-n·v0.7.3aa
Node2 conversion succeeded (chainweb now runs inside
`stoa-node` container), but the Control tab still showed
`supervision=screen`. Two cooperating bugs:
1. Priority was `screen > docker > systemd`. Any screen session
present made detection short-circuit to screen.
2. Screen detection regex matched **any** session name: `\d+\.\w+`.
Node2 has unrelated screens on the box, `StoaMiner` (kadena
ASIC miner) and `cronoton`, and both matched. First one picked →
mis-reported.
**Authoritative fix**: use the **cgroup of the chainweb-node PID**.
A docker-supervised process lives in `/system.slice/docker-<hash>.scope`;
a systemd unit lives in `/system.slice/<unit>.service`. That's the
truth regardless of which other services happen to be on the box.
**Changes (lib/stoachain-live.ts)**
- Bash probe now captures `/proc/$PID/cgroup` in a new `---CGROUP---`
section.
- Supervision picker checks cgroup first (docker / systemd), falls
back to screen/docker/systemd blocks only if cgroup didn't resolve.
- Screen session detection regex tightened: `[0-9]+\.StoaNode` only,
so unrelated screens no longer trigger false positives.
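
The cgroup-first detection can be sketched as a classifier over the text of `/proc/$PID/cgroup`, with the tightened screen regex as fallback. This is a simplified sketch: the function names are hypothetical, the cgroup patterns assume the systemd cgroup layout quoted above (other docker cgroup drivers use different paths), and the real fallback also consults the docker and systemd blocks.

```typescript
// Hypothetical sketch of cgroup-first supervision detection.
type Supervision = "docker" | "systemd" | "screen" | "unknown";

// Classify from the contents of /proc/<pid>/cgroup.
function supervisionFromCgroup(cgroup: string): "docker" | "systemd" | null {
  // e.g. 0::/system.slice/docker-<hash>.scope
  if (/\/docker-[0-9a-f]+\.scope/.test(cgroup)) return "docker";
  // e.g. 0::/system.slice/<unit>.service
  if (/\/[^/\s]+\.service/.test(cgroup)) return "systemd";
  return null; // e.g. a user session scope: cgroup does not decide
}

// Tightened pattern from the changelog: only the StoaNode session counts.
const STOA_SCREEN = /[0-9]+\.StoaNode/;

function detectSupervision(cgroup: string, screenList: string): Supervision {
  const fromCgroup = supervisionFromCgroup(cgroup);
  if (fromCgroup !== null) return fromCgroup;
  return STOA_SCREEN.test(screenList) ? "screen" : "unknown";
}
```

Because the cgroup check runs first, unrelated sessions such as `StoaMiner` or `cronoton` can no longer influence the result for a containerised node.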
**Version bump**
- `lib/version.ts` → `v0.7.3aa`.
v.Chaos.Jason.0-o·v0.7.3z
Node2 screen → docker conversion failed:
```
error mounting ".../StoaNodeData.stoa/tls/tls-cert.pem" to rootfs at "/data/tls-cert.pem":
...not a directory: Are you trying to mount a directory onto a file (or vice-versa)?
```
Two distinct bugs:
**Bug 1 (root cause): cert path not translated after data-dir move.**
On nodes where the TLS cert lives inside the data dir (e.g.
`/mnt/nvmedrive/StoaNodeData/tls-cert.pem`), the `mv` of the data dir
moves the cert along with it. The flags loaded from live argv still
point at the pre-move path, so the `sudo cp` to copy cert+key into
the new `tls/` subdir silently fails. The handler didn't check cp's
exit code β it logged "copied cert+key" regardless. Docker's
bind-mount then auto-created the missing source path as a directory,
and `runc` rejected the mount because you can't bind-mount a dir
onto a file.
**Bug 2: dead-but-existing container poisons supervision detection.**
A compose-up that creates a container but fails to start it leaves
that container in "Created" state. `detectSupervisionLive` was using
`docker ps -a` (all containers), so a stopped stoa-node was reported
as docker-supervised even after rollback restarted screen/systemd.
**Fixes (lib/handlers/stoachain-convert-supervision.ts)**
- Before cp: if the cert/key paths were inside the old data dir,
translate them to the new (post-mv) location. Logs the translation
so it's visible.
- cp: check exit code and throw on failure. Also `test -f` the
resulting `tls-cert.pem` to make sure it's actually a regular file.
- detection: `docker ps` (running only), not `docker ps -a`.
- New rollback step pushed right after compose.yml is written:
`docker compose down` + `docker rm -f stoa-node`. LIFO order puts
this first on rollback (while compose.yml still exists), then
remove-intermediate / mv-back / restart-old. Prevents orphaned
container from blocking clean retry.
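
The cert-path translation in the first fix is a prefix rewrite. A minimal sketch, assuming plain POSIX-style absolute paths; `translateAfterMove` is a hypothetical name and the new directory in the usage note is made up for illustration.

```typescript
// Hypothetical sketch: rewrite a path that lived inside a directory
// which has since been moved. Paths outside the old directory are
// returned unchanged.
function translateAfterMove(path: string, oldDir: string, newDir: string): string {
  // Compare against "dir/" so that /data does not match /database/...
  const oldPrefix = oldDir.endsWith("/") ? oldDir : oldDir + "/";
  if (!path.startsWith(oldPrefix)) return path;
  const newPrefix = newDir.endsWith("/") ? newDir : newDir + "/";
  return newPrefix + path.slice(oldPrefix.length);
}
```

For example, with a hypothetical post-move directory `/mnt/nvmedrive/StoaNodeData.stoa/data`, the cert path `/mnt/nvmedrive/StoaNodeData/tls-cert.pem` maps into that directory, while a cert under `/etc/ssl` is left alone. The handler would still need the exit-code and `test -f` checks described above, since a translated path can be wrong too.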
**Scripts**
- `scripts/recover-node2-post-fail.ts`: one-off to clean up node2's
dead container + leftover .stoa dir after the v0.7.3y attempt.
**Version bump**
- `lib/version.ts` → `v0.7.3z`.
v.Chaos.Jason.0-p·v0.7.3y
Node2 benchmark: write succeeded at 210 MB/s, then cache-drop step
timed out at 10s. Root cause: `sync` blocks until RocksDB dirty
pages are flushed; on a busy chainweb node that's easily >10s.
Timeout killed the whole benchmark even though cache-drop is
strictly a "read test accuracy" nice-to-have.
**Fixes**
- Dropped the `sync` preamble. We care about clearing the page cache,
not durability; `drop_caches` handles what we need.
- Bumped timeout 10s → 30s for the drop itself.
- Wrapped the call in try/catch: if it still times out or fails for
any reason, log a warning and continue. The read test may show
inflated cached throughput in that case, but the write number is
the authoritative one anyway (RocksDB's bottleneck is writes).
Net effect: no more benchmark deaths from a busy node, and the
worst degradation is "read test optimistic".
**Version bump**
- `lib/version.ts` → `v0.7.3y`.
v.Chaos.Jason.0-q·v0.7.3x
v0.7.3w's auto-sudoers-repair worked (AncientLinux benchmark got past
the dd write step, 345 MB/s). Next failure was the **read** parse:
```
536870912 bytes (537 MB, 512 MiB) copied, 0,445005 s, 1,2 GB/s
```
Two issues packed into one line:
- Comma decimal separator (`0,445005`, `1,2`): AncientLinux is in a
German/Romanian locale
- GB/s (not MB/s): fast NVMe reads report in GB/s
The old regex `/,\s*([\d.]+)\s*MB\/s/` expected dot-decimals AND
MB/s. Missed both on this line.
**Fix**: new `parseDdThroughput(output)` helper that accepts
MB/s, GB/s, KB/s (with GB→MB and KB→MB normalization) and both
`.` and `,` as decimal separator. Returns null if unparseable so
the caller can throw an honest error.
Used for both write and read parsing in `drive-benchmark.ts`.
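
A parser with the behaviour described for `parseDdThroughput` might look like the sketch below. The function name comes from this entry, but the body is an illustration rather than the shipped code; it assumes decimal units (1 GB/s = 1000 MB/s) and takes the throughput figure that follows a comma in dd's summary line.

```typescript
// Sketch of a locale-tolerant dd throughput parser. Returns MB/s,
// or null when no throughput figure can be found.
function parseDdThroughput(output: string): number | null {
  // Number with "." or "," as decimal separator, then kB/s, KB/s, MB/s or GB/s.
  const match = /,\s*([0-9]+(?:[.,][0-9]+)?)\s*([kKMG])B\/s/.exec(output);
  if (!match) return null;
  const value = Number.parseFloat(match[1].replace(",", "."));
  if (!Number.isFinite(value)) return null;
  const unit = match[2].toUpperCase();
  // Assumption: decimal units for the normalization to MB/s.
  const factor = unit === "G" ? 1000 : unit === "K" ? 0.001 : 1;
  return value * factor;
}
```

On the line quoted above (`... copied, 0,445005 s, 1,2 GB/s`) the elapsed-time field is skipped because it is not followed by a throughput unit, and the `1,2 GB/s` field is what gets parsed.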
**Version bump**
- `lib/version.ts` → `v0.7.3x`.
v.Chaos.Jason.0-r·v0.7.3w
v0.7.3v's probe correctly identified AncientLinux's `/home/StoaNode/data`
as root-owned (docker runs chainweb as root), triggering the
`sudo -n dd` path. That path then failed because AncientLinux's sudoers
is from the pre-v0.7.3m template and doesn't include `/bin/dd`,
`/bin/sh`, or `/bin/sync`.
Rather than tell the operator "go click Sudoers Repair and retry",
the handler now auto-repairs sudoers on sudo-refusal. Every manual
fix becomes a UI feature.
**Changes**
- New `lib/sudoers.ts`: single source of truth for the canonical
NOPASSWD command list, with `repairSudoers(target, username)` and
`ensureSudoers(target, username, log)` helpers.
- `lib/handlers/drive-benchmark.ts`: on sudo refusal during dd write,
calls `ensureSudoers()` to refresh `/etc/sudoers.d/ancientholdings-stoa`
to the canonical list, then retries the dd once. If it still fails,
returns an actionable error ("check the sudoers file manually").
- Refactored `pages/api/admin/nodes/[id]/sudoers-repair.ts` to use
the shared primitive; previously the canonical list was duplicated
across three files.
- Also dropped the last remaining fake MB/s fallback: if dd exits 0
but output lacks the `MB/s` line, throw an error instead of
inventing a reading from wall-clock time.
- Added `/usr/bin/curl` to the canonical sudoers list (needed by
v0.7.3t's compose-plugin install and v0.7.3u's docker install).
**Version bump**
- `lib/version.ts` → `v0.7.3w`.
v.Chaos.Jason.0-s·v0.7.3v
Two bugs in one:
1. **Drive benchmark always used `sudo -n dd`**: fine for docker
installs (root-owned data dir) but failed on user-owned data dirs
(screen/systemd installs, e.g. AncientLinux's `/home/StoaNode/data`)
whose sudoers didn't have a `/bin/dd` entry. The dd never actually
ran; sudo refused with "a password is required".
2. **The handler fabricated a fake MB/s reading on failure.** Because
the error-handling ran AFTER the mbps calculation, and the
calculation fell back to `sizeMb / wall-clock` when the dd output
had no "MB/s" line, the job log showed a plausible-looking number
(the ssh round-trip time, e.g. "1802.8 MB/s") before throwing
the actual error. Misleading.
**Fixes (both in `lib/handlers/drive-benchmark.ts`)**
- Probe `benchDir` perms first via `[ -w ... ]`. If the ssh user can
write, skip `sudo` entirely. Only use sudo when the dir is
root-owned (docker case).
- Check dd exit code BEFORE parsing MB/s. If exit != 0 and stderr
indicates sudo refusal, return a clear "run Sudoers Repair" error
instead of trying to plot a fake reading.
- Same pattern for the read-test dd and the rm cleanup.
- Cache-drop (`/proc/sys/vm/drop_caches`) still needs sudo; left
best-effort with `|| true`. If cache-drop fails, the read number
is just inflated (cached), but the write number is still accurate.
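The two fixes above (probe before sudo, exit code before parsing) can be sketched as follows. All names here are hypothetical, and the sudo-refusal pattern is an assumption about typical sudo error text rather than the handler's actual check.

```typescript
// Hypothetical shapes and names, for illustration only.
interface DdRun {
  exitCode: number;
  stdout: string;
  stderr: string;
}

type DdOutcome = { ok: true; output: string } | { ok: false; error: string };

// Use `sudo -n` only when the ssh user cannot write to the benchmark dir.
function ddPrefix(benchDirWritable: boolean): string {
  return benchDirWritable ? "" : "sudo -n ";
}

// Look at the exit code first; only a clean exit is handed to the MB/s parser.
function interpretDdRun(run: DdRun): DdOutcome {
  if (run.exitCode !== 0) {
    // Assumed patterns for sudo refusing a non-interactive command.
    const sudoRefused = /a password is required|not allowed to execute/i.test(run.stderr);
    return {
      ok: false,
      error: sudoRefused
        ? "sudo refused dd: run Sudoers Repair and retry"
        : `dd failed with exit code ${run.exitCode}: ${run.stderr.trim()}`,
    };
  }
  // dd writes its summary line to stderr, so both streams go to the parser.
  return { ok: true, output: run.stdout + run.stderr };
}
```

Because a failed run never reaches the parser, there is no code path left that could produce a throughput figure from a command that did not run.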
**Version bump**
- `lib/version.ts` → `v0.7.3v`.
v.Chaos.Jason.0-t·v0.7.3u
Closes the "you want to convert to docker but docker isn't installed"
gap. v0.7.3t handled the compose plugin; v0.7.3u handles the whole
docker engine.
**Fix**: convert-supervision's docker preflight now runs Docker's
official `get.docker.com` convenience script if `command -v docker`
fails. That sets up the apt repo, installs `docker-ce` +
`docker-compose-plugin` + dependencies, and enables + starts
`docker.service`. After install, preflight re-verifies `docker --version`
and proceeds to the compose-plugin check (which should now pass since
get.docker.com includes the plugin).
Rationale: "if you're converting TO docker and docker is missing,
install it" is the obvious operator expectation. Failing with
"go run the install-wizard bootstrap step yourself" made the Upgrade
button a lie. The converter is now genuinely self-healing for the
docker-as-target case.
Every manual fix becomes a UI feature, in line with the operator
principle that production users won't have Claude to SSH in for them.
**Install flow (docker path)**
1. `command -v docker`: if missing, run `get.docker.com` (10-min timeout)
2. `docker --version`: sanity check after install
3. `docker compose version`: if missing, fetch v2 plugin binary from
GitHub (v0.7.3t code)
4. Proceed with conversion
Streamed output: the `[docker-install]` and `[compose]` lines show
pull/install progress in real time.
**Version bump**
- `lib/version.ts` → `v0.7.3u`.
v.Chaos.Jason.0-u·v0.7.3t
Real error surfaced by v0.7.3s's error-visibility + rollback: node1
had docker CLI 29.1.3 but **no compose plugin**. Ubuntu 22.04's
`docker.io` package ships the CLI without the plugin. Running
`docker compose up -d` then fails with
`unknown shorthand flag: 'd' in -d` because docker treats `compose`
as a positional arg and `-d` as a top-level docker flag.
**Fix**: convert-supervision's docker preflight now checks for
`docker compose version` and, if missing, downloads the official
v2 plugin binary (v2.29.1) from GitHub releases directly into
`/usr/libexec/docker/cli-plugins/docker-compose`. Single-binary
install; no apt repo, no GPG key, no Docker repo setup needed.
Architecture-aware (x86_64 / aarch64 / armv7). Uses sudo + tee
(already in sudoers).
This takes the "why doesn't my upgrade work" confusion off the
operator's plate when their distro's docker package is incomplete.
Can later be factored into a shared `ensureDockerCompose()` primitive
used by the install-wizard too.
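As a rough illustration of the architecture-aware part, the download URL can be derived from `uname -m`. The function name is hypothetical, and the URL pattern is an assumption based on how Docker Compose v2 release assets are usually named on GitHub; the handler's actual mapping may differ.

```typescript
// Version pinned in the notes above.
const COMPOSE_PLUGIN_VERSION = "v2.29.1";

// `uname -m` value -> release asset suffix (assumed asset naming).
const UNAME_TO_COMPOSE_ARCH: Record<string, string> = {
  x86_64: "x86_64",
  aarch64: "aarch64",
  armv7l: "armv7",
};

// Returns the plugin download URL, or null for an architecture we do not map.
function composePluginUrl(unameM: string): string | null {
  const arch = UNAME_TO_COMPOSE_ARCH[unameM.trim()];
  if (!arch) return null;
  return (
    "https://github.com/docker/compose/releases/download/" +
    `${COMPOSE_PLUGIN_VERSION}/docker-compose-linux-${arch}`
  );
}
```

Returning `null` for an unmapped architecture lets the preflight fail with a clear message instead of downloading a binary that cannot run.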
**Rollback proven end-to-end**
- Last failed attempt from v0.7.3s logs showed: `[compose] unknown
shorthand flag: 'd' in -d` → `[rollback] ✓ restored to systemd`.
Node never needed manual SSH recovery. That's the target state.
**Version bump**
- `lib/version.ts` → `v0.7.3t`.
v.Chaos.Jason.0-v·v0.7.3s
The big one. Previously "the old supervision never comes back up on
failure" was left to operators to fix manually (or a one-off recovery
script). v0.7.3s bakes **full rollback** into every conversion.
**How it works**
- Before any destructive step, `captureOldStartInfo()` records how to
restart the current mode:
- systemd: resolves the active unit name
- screen: captures the runner path from live argv or stored profile
- docker: captures the compose working dir via `docker inspect`
- The destructive section builds a `rollbackStack` of labelled undo
callbacks as it goes:
- after stop → "restart old mode" (registered first, runs last)
- if data dir was moved → "mv data back" (using `[ -d src ] && [ ! -e dest ]` guards)
- if data dir was newly created as part of layout → "remove intermediate"
- before writing systemd unit/wrapper → snapshots originals to
`.TS.bak`, records "restore systemd unit + wrapper" (stops + disables
+ restores backups + daemon-reload)
- before writing screen runner → snapshots to `.TS.bak`, records
"restore screen runner"
- On any failure in steps 4-7: run the stack in reverse (LIFO). Each
undo is wrapped in try/catch so one failing undo doesn't block the
rest. After rollback, re-runs supervision detection; logs whether
the old mode came back successfully.
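The LIFO unwind with per-undo isolation can be sketched like this. The class and method names are hypothetical, and the undos are synchronous here for brevity (the real ones run commands over SSH and would be async).

```typescript
// Hypothetical sketch of a labelled rollback stack.
interface UndoStep {
  label: string;
  run: () => void;
}

class RollbackStack {
  private readonly steps: UndoStep[] = [];

  // Register an undo right after the action it reverses has succeeded.
  push(label: string, run: () => void): void {
    this.steps.push({ label, run });
  }

  // Run undos newest-first. A failing undo is logged and collected,
  // and the remaining undos still run.
  unwind(log: (line: string) => void): string[] {
    const failed: string[] = [];
    for (;;) {
      const step = this.steps.pop();
      if (!step) break;
      try {
        step.run();
        log(`[rollback] ${step.label}: ok`);
      } catch (err) {
        failed.push(step.label);
        log(`[rollback] ${step.label}: failed (${String(err)})`);
      }
    }
    return failed;
  }
}
```

Since "restart old mode" is registered first, it runs last, after data dirs and unit files have been put back, which is the order the old supervision needs.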
**Verify is now inside the rollback scope**: if chainweb-node doesn't
come up within 3 min under the new mode, we revert to the known-good
old mode instead of leaving the node silent. (Previously the handler
explicitly skipped rollback for verify failures; that was exactly the
kind of half-broken state operators had to SSH in to fix.)
**Error visibility (from v0.7.3r, restated)**
- compose output is now streamed live to the job log via `onChunk`
(pull/create/start progress visible in real time)
- Combined (stderr + streamed stdout) is included in the error
message, tail -600 chars, with explicit exit code
- Same treatment for systemctl + screen start
- Defensive `docker rm -f stoa-node` before compose up (survives
a stale container collision from a prior failed attempt)
**Scripts**
- New `scripts/recover-node1-systemd.ts`: one-off recovery used to
restore node1 to systemd after the v0.7.3o/p/q chain of failed
conversions left it half-converted. Useful as a reference for
similar recoveries; not intended to be part of the regular ops path
now that rollback is built in.
**Version bump**
- `lib/version.ts` → `v0.7.3s`.
v.Chaos.Jason.0-w·v0.7.3r
Bugfix chain continuing from v0.7.3p/q. v0.7.3p unlocked the upgrade
for adopted nodes (node1); the actual `docker compose up` then failed
with only "docker compose up failed:" (empty stderr). Root cause: the
handler merged stderr into stdout via `2>&1` but then only reported
`r.stderr` on failure, dropping the real error on the floor.
**Fixes**
- Live-stream compose output to the job log (`onChunk`), so you see
the pull/create/start progress in real time.
- Error message now includes the combined captured output, tail
-600 chars, with explicit exit code.
- Defensive cleanup: `docker rm -f stoa-node` before compose up, so
a stale `stoa-node` container from a prior failed attempt doesn't
cause a name-conflict error on the next try.
- systemd-start + screen-start error paths: same treatment
(stdout + exit code surfaced).
- Docker-compose timeout raised from 3 min → 5 min to cover cold
image pulls on slow connections.
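A small sketch of the error-message shape described above: the combined output, trimmed to its last 600 characters, plus the explicit exit code. The function name and the placeholder text for empty output are assumptions.

```typescript
// Hypothetical helper; `combinedOutput` is whatever was captured from the
// merged stdout+stderr stream (the `2>&1` case that previously lost the error).
function formatCommandFailure(
  label: string,
  exitCode: number,
  combinedOutput: string,
  tailChars = 600,
): string {
  const trimmed = combinedOutput.trim();
  // Keep only the tail: the real error is usually at the end of the output.
  const tail = trimmed.length > tailChars ? trimmed.slice(-tailChars) : trimmed;
  return `${label} failed (exit ${exitCode}): ${tail || "<no output captured>"}`;
}
```

Reporting from the combined buffer, not from `r.stderr` alone, is what prevents the empty "docker compose up failed:" message when streams are merged.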
**Version bump**
- `lib/version.ts` → `v0.7.3r`.
v.Chaos.Jason.0-x·v0.7.3q
Bugfix: after running a drive benchmark that classified a drive as SSD,
the "Drive (sysfs)" row still rendered the red "HDD (discouraged)"
badge because the badge was hardcoded to sysfs; the empirical class
was only being applied to the Storage card's tone and the
HDD-discouragement warning.
Fix: new `effectiveClassBadge` that renders the benchmark class when
available, sysfs as the fallback. The row is renamed from "Drive
(sysfs)" to "Drive class", with a source note: "from empirical
benchmark - sysfs heuristic said hdd" when they disagree, or "from
/sys/block - heuristic (run benchmark below for empirical)" when
only sysfs is available. Drive model moved to its own KV row.
Also drops the now-dead `driveBadge` helper.
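The selection rule (benchmark class when available, sysfs as fallback) can be sketched as a pure function. The type and function names are hypothetical, and the note strings are paraphrased rather than copied from the UI.

```typescript
// Hypothetical types for illustration.
type DriveClass = "nvme" | "ssd" | "hdd" | "slow";

interface EffectiveClass {
  driveClass: DriveClass | null;
  source: "benchmark" | "sysfs" | "none";
  note: string;
}

// Prefer the empirical benchmark class; fall back to the sysfs heuristic.
function effectiveDriveClass(bench: DriveClass | null, sysfs: DriveClass | null): EffectiveClass {
  if (bench !== null) {
    const disagrees = sysfs !== null && sysfs !== bench;
    return {
      driveClass: bench,
      source: "benchmark",
      note: disagrees
        ? `from empirical benchmark (sysfs heuristic said ${sysfs})`
        : "from empirical benchmark",
    };
  }
  if (sysfs !== null) {
    return {
      driveClass: sysfs,
      source: "sysfs",
      note: "from /sys/block heuristic (run benchmark for empirical)",
    };
  }
  return { driveClass: null, source: "none", note: "unknown" };
}
```

Keeping the decision in one function means the badge, the card tone, and the warning cannot drift apart again.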
**Version bump**
- `lib/version.ts` → `v0.7.3q`.
v.Chaos.Jason.0-y·v0.7.3p
Bugfix: v0.7.3o's convert-supervision failed on adopted nodes (like
node1) because it required a pre-captured `stoachain_flags_json` in
the DB. Adopted systemd/screen nodes that never went through the
Install wizard never had stored flags; the handler blew up at step
2/8 with "no stored flag profile - trigger a Restart first".
Fix: source flags from **live argv first** (via `fetchLiveFlags`, which
SSHes + parses `ps` output), fall back to stored only if live parsing
fails. Since the handler already confirms the node is running in
step 1/8 (supervision detection), live always works in practice.
Affected path: `lib/handlers/stoachain-convert-supervision.ts` step 2/8.
No other behavior change; the hierarchy lock + UI + API route from
v0.7.3o are untouched.
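A sketch of the "live argv first, stored as fallback" rule. `fetchLiveFlags` is the real helper's name per the notes; everything below (the parser, its token rules, and `resolveFlags`) is a simplified, hypothetical stand-in that ignores shell quoting.

```typescript
// Flag values are strings; bare switches are stored as `true`.
type FlagMap = Record<string, string | true>;

// Parses `--key=value`, `--key value`, and bare `--switch` tokens.
function parseArgvFlags(argv: string): FlagMap {
  const tokens = argv.trim().split(/\s+/).filter((t) => t.length > 0);
  const flags: FlagMap = {};
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i] ?? "";
    if (!token.startsWith("--")) continue;
    const body = token.slice(2);
    const eq = body.indexOf("=");
    if (eq >= 0) {
      flags[body.slice(0, eq)] = body.slice(eq + 1);
      continue;
    }
    const next = tokens[i + 1];
    if (next !== undefined && !next.startsWith("-")) {
      flags[body] = next;
      i++; // the value token has been consumed
    } else {
      flags[body] = true;
    }
  }
  return flags;
}

// Live flags win; stored flags are used only when live parsing yields nothing.
function resolveFlags(
  liveArgv: string | null,
  stored: FlagMap | null,
): { flags: FlagMap; source: "live" | "stored" } | null {
  const live = liveArgv ? parseArgvFlags(liveArgv) : {};
  if (Object.keys(live).length > 0) return { flags: live, source: "live" };
  if (stored && Object.keys(stored).length > 0) return { flags: stored, source: "stored" };
  return null;
}
```

With this ordering an adopted node that never had a stored profile still converts, as long as its process is running.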
**Version bump**
- `lib/version.ts` → `v0.7.3p`.
v.Chaos.Jason.0-z·v0.7.3o
Turns the v0.7.3n any↔any converter into an **upgrade-only** ladder
along the hierarchy `docker > systemd > screen`. Screen is the worst
supervision mode for a production daemon (no restart policy, no boot
recovery, session death = node death), and the UI now surfaces that
so operators can't accidentally miss it.
**Hierarchy (correct ordering)**
- `docker` ★★★: image-pinned, isolated, reboot-safe via
`restart: unless-stopped`. Best.
- `systemd` ★★: proper lifecycle (`Restart=on-failure`), boot recovery
(`WantedBy=multi-user.target`), but binary lives on host. Upgrade
recommended.
- `screen` ★: no restart, no boot recovery. Upgrade highly
recommended.
**Hub-enforced upgrade-only conversions (3)**
- `screen → systemd`
- `screen → docker`
- `systemd → docker`
**Refused downgrades (3)** (reinstall under the lower mode instead):
- `systemd → screen`
- `docker → screen`
- `docker → systemd`
**Changes**
- New `lib/supervision.ts`: single source of truth for ranks,
star counts, labels, taglines, and reboot survivability. Exports
`canUpgradeTo(from, to)`, `upgradeTargetsFrom(from)`.
- `lib/handlers/stoachain-convert-supervision.ts`: enforces
`canUpgradeTo` at job start. Downgrade requests fail with a clear
message before any state changes.
- `pages/api/admin/nodes/[id]/stoachain/convert-supervision.ts`:
fetches live supervision, validates upgrade, rejects downgrades at
the API layer so a downgrade never even hits the worker.
- `components/admin/NodeTabs.tsx`: replaces `SupervisionConverterCard`
with `SupervisionCard`. Shows current mode with star rating, tagline
("Best - no upgrade needed" / "Upgrade recommended" / "Upgrade highly
recommended"), and an explicit "Survives hardware reboot: yes / no"
indicator. When the node isn't at the top, shows an Upgrade button with
a dropdown of valid targets. Placed at the top of the Control sub-tab.
- Tone: docker green, systemd amber, screen red.
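A minimal sketch of how the rank table and the two exported helpers could fit together. The function names are from the notes above; the bodies and the numeric ranks (mirroring the star counts) are assumptions.

```typescript
type SupervisionMode = "screen" | "systemd" | "docker";

// Rank doubles as the star count: screen 1, systemd 2, docker 3.
const SUPERVISION_RANK: Record<SupervisionMode, number> = {
  screen: 1,
  systemd: 2,
  docker: 3,
};

// An upgrade is any move to a strictly higher rank.
function canUpgradeTo(from: SupervisionMode, to: SupervisionMode): boolean {
  return SUPERVISION_RANK[to] > SUPERVISION_RANK[from];
}

// Valid dropdown targets for the Upgrade button, lowest rank first.
function upgradeTargetsFrom(from: SupervisionMode): SupervisionMode[] {
  return (Object.keys(SUPERVISION_RANK) as SupervisionMode[]).filter((to) =>
    canUpgradeTo(from, to),
  );
}
```

With a strict `>` comparison, same-mode requests and all three downgrades are rejected by the same check, so the API route and the worker can share it.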
**Auto-restart verification**
- Docker: `renderDockerCompose` already emits `restart: unless-stopped`
(verified `lib/stoachain-layout.ts:160`).
- Systemd: unit template already has `Restart=on-failure` +
`WantedBy=multi-user.target` + `systemctl enable` (verified
`stoachain-convert-supervision.ts:430-445`).
- Screen: no auto-restart (intentional; reinforces 1-star rating).
**Deferred to later**
- Install wizard 3-mode selector + binary-extract-from-image primitive.
Today only docker installs are wired; systemd/screen exist through
adoption of legacy nodes or manual bootstrap. Fresh systemd/screen
installs are a future slice; every node currently in the network can
already be upgraded along the hierarchy via this converter.
**Version bump**
- `lib/version.ts` → `v0.7.3o`.
v.Chaos.Jason.0-aa·v0.7.3n
Closes gaps in supervision handling so every node-op works regardless of
whether the node runs under screen / systemd / docker, and adds a
first-class migration path between the three.
**Seeds page: live backup-api detection**
- `listManagedNodeStatus` (lib/seeds.ts) now fetches live flags alongside
`/info`. The "Backup API" column in `/admin/seeds` no longer falls back
to stored flags when the node's running argv has been edited out-of-band
(node1 symptom before this fix).
- Stored flags remain the fallback when the node is unreachable.
**Unified logs endpoint**
- New `GET /api/admin/nodes/[id]/stoachain/logs?lines=N` dispatches on
detected supervision:
- `docker` → `docker logs --tail N stoa-node`
- `systemd` → `journalctl -u stoa-node.service --lines=N`
- `screen` → `tail -n N` of common runner log files (`/var/log/stoa-node.log`,
`/mnt/nvmedrive/StoaNodeData/chainweb.log`, etc.), or a
friendly "attach to the screen session" note when no log
file exists
- Old `/docker-logs` route kept as a back-compat alias.
- Peer-activity route (`/stoachain/peer-activity`) now uses the same
supervision-aware source, so "Peer Activity" works for systemd + screen
nodes too, not just docker.
- New `NodeLogsCard` in NodeTabs replaces the docker-only
`ContainerLogsCard` in the Control sub-tab. Title/source adapts:
"Container logs" / "Service logs (journalctl)" / "Screen logs".
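The dispatch on detected supervision can be sketched as a function that returns the remote command to run. The docker and systemd commands are the ones listed above; the function name, the clamping of `lines`, and the shell loop for the screen case are illustrative assumptions.

```typescript
type Supervision = "docker" | "systemd" | "screen";

// Candidate log files for screen-supervised nodes (the two named in the notes).
const SCREEN_LOG_CANDIDATES = [
  "/var/log/stoa-node.log",
  "/mnt/nvmedrive/StoaNodeData/chainweb.log",
];

// Builds the remote shell command that tails the node's logs.
function logCommandFor(mode: Supervision, lines: number): string {
  // Clamp to a sane integer so the value is safe to interpolate (assumed bounds).
  const n = Math.min(Math.max(Math.trunc(lines) || 100, 1), 5000);
  switch (mode) {
    case "docker":
      return `docker logs --tail ${n} stoa-node`;
    case "systemd":
      return `journalctl -u stoa-node.service --lines=${n}`;
    case "screen": {
      const files = SCREEN_LOG_CANDIDATES.map((f) => `'${f}'`).join(" ");
      // Tail the first candidate that exists; otherwise print a friendly note.
      return (
        `for f in ${files}; do [ -f "$f" ] && exec tail -n ${n} "$f"; done; ` +
        `echo 'no log file found: attach to the screen session'`
      );
    }
  }
}
```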
**Flag Editor Apply+Restart: systemd support**
- `stoachain-control` handler gained `rewriteSystemdWrapper`. When the
user Applies flag changes on a systemd-supervised node, the handler
inspects `systemctl cat stoa-node.service`, finds the wrapper script
referenced by `ExecStart=`, and overwrites it with the output of
`toRunnerScript(flags)` (base64 + tee, chmod 755). Then `daemon-reload`
+ `systemctl restart`.
- Matches the existing docker-compose rewrite and screen runner-script
rewrite paths; all three supervision modes now behave identically in
the Flag Editor.
**Any↔any supervision converter (NEW)**
- New `lib/handlers/stoachain-convert-supervision.ts` migrates a node
between any two supervision modes without losing chain data. Six
conversions covered (each pair in both directions):
- screen ↔ docker
- screen ↔ systemd
- docker ↔ systemd
- 8-step pipeline: detect current → load flags → preflight target
prerequisites → stop current → prepare new mode layout → start under
new mode → verify live `/info` → update stored state.
- Docker target: rearranges into canonical `<stoaRoot>/{chainweb, data, tls}`
layout, renders compose.yml via `renderDockerCompose`, mounts through
to container-internal paths (`/data`, `/data/tls-cert.pem`).
- Systemd target: writes `/usr/local/bin/run-stoa.sh` wrapper +
`/etc/systemd/system/stoa-node.service` unit, daemon-reload + enable.
- Screen target: writes `RunStoaNode.managed.sh` next to the data dir.
- Auto-rollback attempts to restart under the old mode if the
conversion fails after stop (not guaranteed: old-mode artifacts may
already be overwritten when rearranging into docker layout).
- New API `POST /api/admin/nodes/[id]/stoachain/convert-supervision`
with body `{toMode: 'docker' | 'systemd' | 'screen'}`. Ancient admin +
fresh-confirm required.
- **UI** new `SupervisionConverterCard` on Chainweb → Control sub-tab:
dropdown of available target modes, destructive confirmation dialog,
redirects to job log on submit.
**Registry**
- `lib/handlers/registry.ts` now registers 14 handler kinds (added
`stoachain-convert-supervision`).
**Version bump**
- `lib/version.ts` → `v0.7.3n`.
v.Chaos.Jason.0-ab·v0.7.3m
Three operator-requested items.
**Drive benchmark (empirical classification)**
- New `lib/handlers/drive-benchmark.ts`: `dd`-based sequential write + read
test against the node's data-dir filesystem. 512 MB default, `conv=fdatasync`
+ `oflag=dsync` to bypass cache. Caches dropped before read (via
`/proc/sys/vm/drop_caches`).
- Classifies by measured write throughput:
- ≥ 500 MB/s → `nvme`
- ≥ 150 MB/s → `ssd`
- ≥ 50 MB/s → `hdd`
- < 50 MB/s → `slow` (red warning; check for virtualized/network storage)
- Persists via inline ALTER TABLE (additive cols `drive_bench_*` on `nodes`).
- New API `POST /api/admin/nodes/[id]/drive-benchmark`: ancient admin +
fresh-confirm.
- Sudoers template updated: `/bin/dd`, `/bin/sh`, `/bin/sync` added.
- **UI** on Chainweb → Status → Storage card: "Empirical benchmark"
section alongside sysfs class. "Re-run benchmark" button. Shows
write + read MB/s, measured timestamp, highlights mismatch between
sysfs-heuristic and empirical-measured class.
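The classification thresholds above translate directly into a small function. The name is hypothetical; the cut-offs are the ones listed in this entry.

```typescript
type BenchClass = "nvme" | "ssd" | "hdd" | "slow";

// Classify a drive by measured sequential write throughput in MB/s.
function classifyByWriteMbps(writeMbps: number): BenchClass {
  if (writeMbps >= 500) return "nvme";
  if (writeMbps >= 150) return "ssd";
  if (writeMbps >= 50) return "hdd";
  return "slow"; // worth checking for virtualized or network storage
}
```

For example, a 345 MB/s write result lands in the `ssd` band.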
**`backup-directory` locked in Flag Editor**
- Added to `IMMUTABLE_FLAGS` client + server side. Chainweb auto-derives
it to `<database-directory>/backups` when omitted; setting it elsewhere
breaks RocksDB hardlink checkpointing.
- Operators now only toggle `enable-backup-api`; the dir is always correct
by default. Matches how node2 / node1 / AncientLinux all work in
practice.
**Node1 `--enable-backup-api` enabled (manual fix)**
- Out-of-band: SSH'd into node1, edited `/usr/local/bin/run-stoa.sh` to
add `--enable-backup-api`, reloaded via `systemctl restart
stoa-node.service`. Verified with POST to `/make-backup` β returned
backup id successfully.
- Old runner script archived with `.TS.old` suffix.
- Note: systemd-supervised nodes don't yet support Flag Editor's
Apply+Restart path; that's v0.7.4+ work (unit-file rewriting).
Manual fix for now; v0.7.4 ships proper support.
**Version bump**
- `lib/version.ts` → `v0.7.3m`.
v.Chaos.Jason.0-ac·v0.7.3l
Filling in three automation gaps the user called out:
**Certbot now detected in system-probe**
- `lib/handlers/system-probe.ts`: new `SVC_CERTBOT` section captures:
binary version, `certbot.timer` enabled/active state, next scheduled
run, list of installed deploy-hooks.
- `SystemProbe.services.certbot`: surfaces in the probe output so the
admin UI can show certbot alongside docker, nginx, etc. Install wizard's
certbot install (added in v0.7.3i) is now visibly confirmed by probe.
**Cert renewal deploy-hook**
- `stoachain-certbot-obtain` now installs a per-node deploy-hook at
`/etc/letsencrypt/renewal-hooks/deploy/stoa-<nodeId8>.sh`.
- When `certbot.timer` renews the cert (~60 days from now, automated),
the hook:
1. Copies the renewed cert files into chainweb's TLS paths
2. Fixes ownership + permissions
3. Detects supervision (docker / systemd / screen) at hook run-time
and restarts accordingly (docker compose up -d --force-recreate,
systemctl restart, or bail with instructions for screen)
- Previously: certbot renewed fine but the new cert never reached
chainweb's in-memory copy, which would have been a silent time-bomb ~60
days out.
**Daily seed auto-refresh (scheduled)**
- `worker/index.ts`: `maybeScheduleSeedRefresh()` runs on every main
loop iteration (throttled to 15 min between checks). If:
- auto-refresh isn't disabled (system_state flag)
- current seed is >23h old (or missing entirely)
- no seed-refresh job is already queued/running
- there's an eligible donor
→ enqueues a `seed-refresh` job automatically. Runs under actor email
`system:seed-auto-refresh` in the audit trail.
- `pages/api/admin/seeds/auto-refresh.ts`: POST endpoint to toggle
the scheduler on/off (fresh-confirm + ancient-admin).
- Admin UI on `/admin/seeds` gets a new "Auto-refresh schedule" panel:
green/gray Enabled/Disabled toggle, next ETA (based on current seed
age + 23h), last enqueue timestamp + job id, last skip reason (e.g.
"no donor available").
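The four scheduling conditions can be sketched as a pure decision function that also yields the "last skip reason" shown in the panel. The names and the skip-reason strings (other than "no donor available") are hypothetical; the real `maybeScheduleSeedRefresh()` reads these inputs from the DB and the job queue.

```typescript
// Hypothetical snapshot of what the scheduler looks at on each check.
interface SeedRefreshInputs {
  autoRefreshDisabled: boolean; // system_state flag
  seedAgeHours: number | null;  // null = no seed captured yet
  refreshJobActive: boolean;    // a seed-refresh job is queued or running
  donorAvailable: boolean;
}

type SeedRefreshDecision = { enqueue: true } | { enqueue: false; skipReason: string };

const MAX_SEED_AGE_HOURS = 23;

function decideSeedRefresh(s: SeedRefreshInputs): SeedRefreshDecision {
  if (s.autoRefreshDisabled) return { enqueue: false, skipReason: "auto-refresh disabled" };
  if (s.seedAgeHours !== null && s.seedAgeHours <= MAX_SEED_AGE_HOURS) {
    return { enqueue: false, skipReason: "seed is fresh" };
  }
  if (s.refreshJobActive) return { enqueue: false, skipReason: "seed-refresh already queued or running" };
  if (!s.donorAvailable) return { enqueue: false, skipReason: "no donor available" };
  return { enqueue: true };
}
```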
**Version bump**
- `lib/version.ts` → `v0.7.3l`.
v.Chaos.Jason.0-ad·v0.7.3k
Follow-on cleanup after v0.7.3j proved the LE flow works end-to-end.
**Cert-doctor logic inverted (critical bugfix)**
- `lib/cert-doctor.ts`: the old v0.7.3h logic said "CA-signed is bad,
self-signed is good, certbot auto-renew breaks the network". Every
claim was wrong, as verified today by restoring node2's LE cert and
watching all three nodes resume syncing.
- New logic:
- `severity='healthy'` (green): CA-signed + certbot auto-renew active
- `severity='warn'` (amber): CA-signed without auto-renewal configured
- `severity='error'` (red): self-signed (broken on public Stoa P2P)
- `severity='unknown'`: cert unreadable / ephemeral / missing
- Messages rewritten to match reality.
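The inverted logic reduces to a small mapping from cert kind and auto-renew state to severity. The severity values are the ones listed above; the type and function names are hypothetical.

```typescript
type CertKind = "ca-signed" | "self-signed" | "unknown";
type CertSeverity = "healthy" | "warn" | "error" | "unknown";

// Maps the cert-doctor findings to the severity shown in the UI.
function certSeverity(kind: CertKind, autoRenewActive: boolean): CertSeverity {
  switch (kind) {
    case "self-signed":
      return "error"; // rejected by peers on public Stoa P2P
    case "ca-signed":
      return autoRenewActive ? "healthy" : "warn";
    case "unknown":
      return "unknown"; // cert unreadable, ephemeral, or missing
  }
}
```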
**Identity card: positive confirmation when healthy**
- Green banner appears when TLS is set up right (LE + certbot timer).
*"Peer trust is unaffected by renewals because chainweb validates
via CA chain, not fingerprint pinning."*
- Amber banner when LE cert but no auto-renew.
- Red banner stays only for self-signed, the actually-broken case.
**Sync progress indicator on Status card**
- New KVs: **Target (tallest peer)** + **Sync progress**.
- Target = max cut height across all managed nodes the hub has
live-probed (parallel SSH, O(slowest node)). Null if this is the
tallest.
- Progress shown as permil with 3 decimals (e.g. `998.234 ‰`), colored:
- green `≥ 999 ‰`
- gold `≥ 950 ‰`
- amber `< 950 ‰`
- Shows "N blocks behind" when delta > 0; "at tip" when caught up.
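The progress calculation can be sketched as follows, assuming progress is the node's own cut height divided by the target height, scaled to permil. The function name, the result shape, and the treatment of a null target as "at tip" are assumptions consistent with the description above.

```typescript
interface SyncProgress {
  permil: string; // three decimals, without the permil sign
  tone: "green" | "gold" | "amber";
  note: string;
}

// targetHeight is null when this node is itself the tallest known peer.
function syncProgress(ownHeight: number, targetHeight: number | null): SyncProgress {
  const behind = targetHeight === null ? 0 : Math.max(0, targetHeight - ownHeight);
  const permilValue =
    targetHeight === null || targetHeight <= 0 || behind === 0
      ? 1000
      : (ownHeight * 1000) / targetHeight;
  const tone = permilValue >= 999 ? "green" : permilValue >= 950 ? "gold" : "amber";
  return {
    permil: permilValue.toFixed(3),
    tone,
    note: behind > 0 ? `${behind} blocks behind` : "at tip",
  };
}
```

A node at height 998,234 against a target of 1,000,000 reports `998.234`, gold, and "1766 blocks behind".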
**certbot handler: auto-resolves docker host paths**
- `stoachain-certbot-obtain`: if the node's stored runner_path is a
`docker-compose.yml`, the handler derives `<stoaRoot>/tls/...` host
paths from it automatically. No more manual `certPath` + `keyPath`
in the API payload for docker-supervised nodes.
**Add Node UI: docker-default signaling**
- "Easy setup (with password)" → **"Easy setup · docker"** + green
"recommended" badge.
- "Advanced (paste private key)" → **"Advanced · existing install"**.
- Explanatory line under the tabs: *"Docker supervision gives each node
a self-contained environment … what the hub recommends for any new
install."*
**Version bump**
- `lib/version.ts` → `v0.7.3k`.
v.Chaos.Jason.0-ae·v0.7.3j
Found three bugs in the v0.7.3i certbot handler while actually running
it against AncientLinux:
**1. Silent apt install "success"**
- `sudo -n apt-get install -y certbot 2>&1 | tail -5`: tail's exit code
masks apt-get's failure. Handler thought certbot was installed; it
wasn't.
- Fixed: wrap in `set -o pipefail`, then verify post-install with
`command -v certbot`.
**2. DEBIAN_FRONTEND=noninteractive rejected by sudo**
- sudo's `env_reset` strips non-whitelisted env vars. Setting
`DEBIAN_FRONTEND` in the sudo command failed: *"you are not allowed
to set the following environment variables"*.
- Fixed: dropped the env var. apt-get install is fine without it.
**3. DNS-01 hook scripts didn't land on disk**
- `echo ${JSON.stringify(script)} | tee ...` corrupted escape sequences.
The `\n` in the script source became literal `\n`, not a newline. The
file ended up as one giant first line that `sh` couldn't parse;
certbot reported `/bin/sh: 1: /etc/letsencrypt/duckdns-hooks/auth.sh:
not found`, even though the file existed.
- Fixed: base64-encode the script content, decode on the remote side
via `base64 -d | tee`. Same pattern used by `writeManagedRunner` etc.
- Added sanity check: `test -x && test -s` on the written file before
invoking certbot.
**UX cleanup**
- **Deleted** `CertRotateButton` / the self-signed rotate UI entirely.
Per feedback: *"just remove it all together... since its only noise
now."* The old `stoachain-cert-rotate` handler stays registered for
API-level compatibility but has no UI surface anymore.
- **Renamed** "Obtain Let's Encrypt cert (recommended)" → simply
**"Install TLS cert"**. No "recommended" hedge: LE is the only way
chainweb P2P works on public Stoa.
- **Auto-detected challenge** from the hostname: `.duckdns.org` →
DNS-01, everything else → HTTP-01. Removed the challenge dropdown.
Operator still needs to provide a DuckDNS token for NAT'd nodes; the
field auto-reveals only when DNS-01 is the auto-choice.
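The challenge auto-detection rule is simple enough to sketch in a few lines. The function and type names are hypothetical; the `.duckdns.org` rule is the one stated above.

```typescript
type AcmeChallenge = "dns-01" | "http-01";

// DuckDNS hostnames get DNS-01 (works behind NAT); everything else gets HTTP-01.
function detectChallenge(hostname: string): AcmeChallenge {
  // Normalize case and drop a trailing root dot before matching the suffix.
  const host = hostname.trim().toLowerCase().replace(/\.$/, "");
  return host.endsWith(".duckdns.org") ? "dns-01" : "http-01";
}
```

Matching on the `.duckdns.org` suffix, dot included, avoids treating an unrelated domain such as `notduckdns.org` as a DuckDNS host.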
**On "ancientholdings as its own CA" (future work)**
- Honest answer logged: technically possible (~weeks of work), but
creates network-splitting effect with operators who already trust LE.
Deferred to Phase 3+ as a consortium-CA option; main Stoa network
stays on LE.
**Version bump**
- `lib/version.ts` → `v0.7.3j`.
v.Chaos.Jason.0-af·v0.7.3i
**Root cause finally verified**: chainweb-node's P2P TLS validates
against the standard system CA bundle. Self-signed certs are rejected
with `HandshakeFailed "certificate has unknown CA"`. Let's Encrypt
certs (CA-signed) work fine; the original node1 + node2 setup used LE
for exactly this reason.
**The hub's old cert-rotate generated self-signed certs, which are broken for
real chainweb use.** Rotating node2 twice today (P-384 and P-256, both
self-signed) broke peer sync each time. **Restoring node2's original LE
cert from `/etc/letsencrypt/live/` immediately fixed sync network-wide**,
confirmed by: node2 1,622,533 → 1,624,038 in minutes; node1 unstuck
from 1,621,032 → 1,621,330; AncientLinux → 1,623,530.
**New handler: `stoachain-certbot-obtain`**
- `lib/handlers/stoachain-certbot-obtain.ts`:
1. Installs certbot via apt if missing.
2. Archives existing cert+key with `.TS.old` suffix.
3. Runs certbot:
- HTTP-01 (`--standalone`): certbot binds :80; nginx is briefly
stopped if active; chainweb keeps running.
- DNS-01 via DuckDNS: writes a small auth-hook script that updates
a TXT record via DuckDNS's API (works for NAT'd nodes like
AncientLinux on `bytales.duckdns.org`).
4. Copies `fullchain.pem` + `privkey.pem` from
`/etc/letsencrypt/live/<domain>/` to chainweb's configured paths.
5. chown to the chainweb user, chmod 600 on the key.
6. Optionally auto-restarts chainweb-node to load the new cert.
**Bootstrap: certbot now installed alongside docker**
- `lib/nodes.ts`: `prepareTarget` script now installs certbot via apt/dnf/yum.
- Canonical sudoers list gains `/usr/bin/certbot`, `/usr/bin/apt-get`,
`/usr/bin/cp`, `/bin/cp`. `sudoers-repair` endpoint updated to match.
**UI: Identity card**
- Primary action is now **"Obtain Let's Encrypt cert"** with challenge
method dropdown (HTTP-01 / DNS-01-DuckDNS), ACME email field, DuckDNS
token field (revealed when DNS-01 selected), auto-restart checkbox.
- The old **"Rotate (self-signed)"** button is tucked behind an
`Advanced` expand arrow with a warning that self-signed certs are
rejected by chainweb P2P on public networks.
**API**
- `POST /api/admin/nodes/[id]/stoachain/certbot-obtain`: fresh-confirm +
ancient-admin. Body accepts `domain`, `email`, `challenge`,
`duckdnsToken`, `restart`.
**Node2 cert restored out-of-band** via SSH; see above for heights
proving the fix. Next: user can obtain LE certs for AncientLinux (DNS-01
via DuckDNS) through the new UI action.
**Version bump**
- `lib/version.ts` → `v0.7.3i`.
v.Chaos.Jason.0-ag·v0.7.3h
**Systemd supervision in stoachain-control**
- `lib/handlers/stoachain-control.ts`: new `systemd` branch alongside
existing docker + screen paths. Resolves the unit name (prefers
`stoa-node.service`, falls back to any active `stoa*` / `chainweb*`
unit) and dispatches `systemctl start|stop|restart <unit>`.
- `waitForChainweb` reused for post-start liveness check.
- `detectSupervisionLive` now recognizes systemd (between docker and
screen in priority).
- Limitation documented: flag edits in the Flag Editor don't yet apply
to systemd-supervised nodes because the handler doesn't rewrite the
unit file's `ExecStart` line. Restart/Start/Stop work. Flag-driven
recomposes will land in v0.7.3i or later.
**Cert-doctor**
- New `lib/cert-doctor.ts`: inspects a node's TLS setup beyond just
"cert file exists":
- Issuer CN vs Subject CN: classifies `self-signed` / `ca-signed` /
`unknown`
- Scans for `certbot.timer` systemd unit + cron entries mentioning
`certbot` / `letsencrypt`
- Extracts last-run / next-run timestamps from the timer
- Status endpoint `/api/admin/nodes/[id]/stoachain/status` now returns
a `certDoctor` section.
**Identity card UI**
- New red warning banner at the top of the Identity card when:
- Cert is CA-signed (issuer ≠ subject)
- Certbot auto-renew is active (timer or cron)
- Warning explicitly lists the class of problem and suggests rotation
to self-signed ECDSA.
- New `Issuer` KV row shows the issuer CN + cert-kind tag (`self-signed`
/ `ca-signed`). Red styling when ca-signed.
**Why this lands as one feature**
- StoaNodeOne was just added to the hub: systemd-supervised, Let's
Encrypt cert with active certbot timer. Surfacing both problems
(can't control via hub; cert will periodically rotate) in one release
so operators see the full picture.
**Version bump**
- `lib/version.ts` → `v0.7.3h`.
v.Chaos.Jason.0-ah·v0.7.3g
Response to the feedback *"all of these manual help-ups, in production
you are not there to fix shiet"*: every manual fix I've done during
this session is now exposed as a UI action the operator can trigger
themselves.
**Peer activity card + auto-detection banner**
- New `lib/peer-activity.ts`: parses chainweb-node's docker logs into
per-peer summaries (error count, last success/error, dominant failure
tag: `unknown-ca`, `timeout`, `conn-refused`, etc.).
- New API `GET /api/admin/nodes/[id]/stoachain/peer-activity?minutes=N`:
SSHes to target, pulls last N minutes of container logs, returns
events + summaries + **auto-detected issues** with suggested actions.
- New `PeerActivityCard` on Chainweb → Status sub-tab. Polls every 15s.
Shows per-peer table. If the node isn't syncing AND a dominant tag
of `unknown-ca` is detected → **red banner with "Reset peer trust"
one-click button**.
**Reset peer trust (self-service)**
- New handler `peer-trust-reset` (`lib/handlers/peer-trust-reset.ts`):
composes `seed-refresh` + `stoachain-reseed` in one job. Refreshes
seed from a healthy donor (excludes the target), then reseeds the
target. As a side effect, the target's peer-DB is replaced with the
donor's current view, so stale fingerprints are cleared.
- Honest caveat documented in the handler: **this is a pragmatic proxy**
for a surgical peer-DB wipe. True surgical would be a rocksdb key
prefix delete; building that requires chainweb source reading we
haven't done.
- New API `POST /api/admin/nodes/[id]/stoachain/peer-trust-reset`.
Fresh-confirm + ancient admin.
**Cert-rotate now generates ECDSA P-256** (was P-384)
- `lib/handlers/stoachain-cert-rotate.ts`: switched curve to P-256 with
SHA-384 signatures. Matches the original working Stoa cert (the one
AncientLinux trusted pre-incident); evidence suggests P-384 may be
rejected by some chainweb-node builds. 128-bit security is still
ample for P2P identity; cert generation is faster.
**Force-fail stuck job**
- New API `POST /api/admin/jobs/[id]/force-fail` + button on
`/admin/jobs/[id]`. Marks a `running` or `queued` job as failed in
the DB immediately. Operator escape hatch for jobs that never complete
due to worker bugs or external issues (ssh2 half-open channels, dead
remote processes). Warns that side effects the handler was in the
middle of may persist.
**Sudoers repair**
- New API `POST /api/admin/nodes/[id]/sudoers-repair` + new
`SudoersRepairCard` on Chainweb → Control sub-tab. One-click rewrites
`/etc/sudoers.d/ancientholdings-stoa` with the current canonical
NOPASSWD command list. Idempotent. Uses existing `tee` NOPASSWD grant
so no password prompt.
- Fixes the pre-v0.7.3d installs that didn't include `tar`/`df`/`du`/
`find` in sudoers (AncientLinux, Node2).
**Version bump**
- `lib/version.ts` → `v0.7.3g`.
**To test**: if AncientLinux is still blocked by TLS, click Rotate on node2
again (it will get P-256 this time). If sync resumes, the curve was the
issue. If not, click "Reset peer trust" on AncientLinux; this takes
~20 min but should fully clear any stale-fingerprint pinning.
v.Chaos.Jason.0-ai·v0.7.3f
v0.7.3e proved the full reseed pipeline works end-to-end: AncientLinux
jumped from cut height ~79,000 to ~1,621,032 (StoaNodeTwo's height at
seed-capture time) in minutes, as designed. But the handler parked in
`running` state at 90% afterward because of an ssh2 quirk.
**Root cause**: when the remote `tar -xz` exits cleanly after consuming
all stdin, ssh2 sometimes emits only the `exit` event and NEVER the
`close` event. The handler was waiting on `close` to settle the
promise, so it parked indefinitely even though tar finished + data was
correct.
**Fix**: settle on whichever of `exit`, `close`, or a post-EOF 60s
timer fires first. `plaintextTarGz.pipe(stream).on('end')` triggers
the timer as belt-and-suspenders. One of:
- `exit` fires with the tar exit code → settle immediately based on it
- `close` fires without exit → settle based on stderr (empty = success)
- post-EOF 60s passes without either → settle based on stderr
All three paths guarantee deterministic resolution. No more 20-min
wait-on-timeout after successful reseeds.
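The three settle paths can be sketched as a settle-once gate. This is an illustration under assumptions, not the handler's code: `channel` stands in for the ssh2 exec stream, and `waitForRemoteExit`, `stderrText`, and `armPostEofTimer` are hypothetical names.

```typescript
import { EventEmitter } from "node:events";

type Outcome = { ok: boolean; via: "exit" | "close" | "timeout" };

// Resolve on whichever of `exit`, `close`, or the post-EOF timer fires first.
// `stderrText` returns whatever stderr has been collected so far.
function waitForRemoteExit(
  channel: EventEmitter,
  stderrText: () => string,
  postEofMs = 60_000,
): { done: Promise<Outcome>; armPostEofTimer: () => void } {
  let settled = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let resolveDone!: (outcome: Outcome) => void;
  const done = new Promise<Outcome>((resolve) => {
    resolveDone = resolve;
  });

  // Single gate: the first caller wins, later triggers are ignored.
  const settle = (outcome: Outcome): void => {
    if (settled) return;
    settled = true;
    if (timer !== undefined) clearTimeout(timer);
    resolveDone(outcome);
  };

  const stderrIsEmpty = (): boolean => stderrText().trim() === "";
  channel.once("exit", (code: number | null) => settle({ ok: code === 0, via: "exit" }));
  channel.once("close", () => settle({ ok: stderrIsEmpty(), via: "close" }));

  // Call this once the input stream has been fully written (post-EOF).
  const armPostEofTimer = (): void => {
    if (settled || timer !== undefined) return;
    timer = setTimeout(() => settle({ ok: stderrIsEmpty(), via: "timeout" }), postEofMs);
  };

  return { done, armPostEofTimer };
}
```

Because every trigger funnels through `settle()`, a late `close` after an early `exit` is harmless.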
**Recovery of the stuck job from v0.7.3e test**:
- Job `efabc16d-…` was stuck at 90% running. I killed the worker,
manually ran the remaining handler steps (mv staging → data, rm
data.old, docker compose up -d), and marked the job succeeded in
the DB so the UI reflects reality.
- AncientLinux verified running off the seed at height 1,621,032.
**Version bump**
- `lib/version.ts` → `v0.7.3f`.
v.Chaos.Jason.0-aj·v0.7.3e
v0.7.3d passed the sudoers preflight but failed during extraction with
`gzip: stdin: not in gzip format`, and the node got stranded again
(container stopped, data moved aside, extract dead). Two bugs, plus one hardening change:
**1. First chunks of the decrypted stream disappeared**
- `lib/handlers/stoachain-reseed.ts`: the progress-tracking `data`
listener was attached to `plaintextTarGz` BEFORE `.pipe()` was set up.
Adding a data listener puts a Node Readable into flowing mode
immediately; during the `await` gap before the SSH pipe attached, the
first chunks flowed into only the counter (no pipe yet) and vanished.
The remote tar received bytes starting mid-gzip → "not in gzip format".
- Fix: replace the separate `data` listener with an inline `Transform`
in the pipe chain. Every byte passes through counter → pipe → SSH
stdin, no losses.
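A minimal sketch of the inline-counter idea, using Node's `stream.Transform`. The `byteCounter` name and the `PassThrough` stand-ins for the decrypt stream and SSH stdin are illustrative assumptions:

```typescript
import { PassThrough, Transform } from "node:stream";

// Pass-through Transform that counts bytes on their way to the next stage.
// It sits inside the pipe chain, so it only sees chunks that are also
// forwarded downstream; nothing can be consumed by the counter alone.
function byteCounter(onProgress: (totalBytes: number) => void): Transform {
  let total = 0;
  return new Transform({
    transform(
      chunk: Buffer,
      _encoding: BufferEncoding,
      callback: (error?: Error | null, data?: Buffer) => void,
    ) {
      total += chunk.length;
      onProgress(total);
      callback(null, chunk); // forward the chunk unchanged
    },
  });
}

// Usage: source → counter → destination, wired as one chain.
const source = new PassThrough(); // stands in for the decrypted tar.gz stream
const destination = new PassThrough(); // stands in for the SSH stdin stream
let bytesSeen = 0;
source.pipe(byteCounter((n) => { bytesSeen = n; })).pipe(destination);
```

Unlike a separate `data` listener, the Transform cannot switch the source into flowing mode before the destination is attached.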
**2. Failed reseed stranded the node**
- When extract fails, the handler had already stopped the node + moved
data aside. No rollback meant the operator had to SSH in and move
things back manually.
- New `rollbackAfterExtractFailure()` helper fires on any extract throw:
remove the (partial) staging dir, `mv data.old.<ts>` back to live, and
`docker compose up -d` (for docker supervision). Best-effort: each
step is try/catch, any rollback failure is logged but doesn't mask the
original extract error.
- Tight-disk mode (no data.old kept) skips the restore step with a clear
log line; the operator must reseed or sync from genesis.
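The rollback contract (run every step, log failures, rethrow the original error) can be sketched as below. `extractWithRollback` and `RollbackStep` are hypothetical names for illustration; the real helper is `rollbackAfterExtractFailure()`, whose signature is not shown in these notes.

```typescript
interface RollbackStep {
  name: string;
  run: () => Promise<void>;
}

// Run `extract`; if it throws, attempt every rollback step in order.
// A failing rollback step is logged and skipped, and the ORIGINAL
// extract error is rethrown so it is never masked.
async function extractWithRollback(
  extract: () => Promise<void>,
  rollbackSteps: RollbackStep[],
  log: (message: string) => void,
): Promise<void> {
  try {
    await extract();
  } catch (extractError) {
    for (const step of rollbackSteps) {
      try {
        await step.run();
      } catch (rollbackError) {
        log(`rollback step "${step.name}" failed: ${String(rollbackError)}`);
      }
    }
    throw extractError;
  }
}
```

In tight-disk mode the restore step would simply be left out of `rollbackSteps`, since there is no `data.old` to move back.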
**3. Broader stderr pattern matching**
- `streamIntoTarExtract` now also settles immediately on:
- `not in gzip format` / `unexpected end of file` → stream plumbing / corrupt archive
- `error is not recoverable` / `child died with signal` → tar internal fatal
- `no space left on device` → disk full during extract
- Previously only sudo-denial patterns triggered early-settle; everything
else waited for the close event that ssh2 sometimes doesn't emit.
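A sketch of the early-settle classification, using the stderr patterns listed above. The category names and the `classifyStderr` helper are illustrative assumptions:

```typescript
type FatalKind = "corrupt-stream" | "tar-fatal" | "disk-full";

// Patterns are the ones quoted in the notes; the grouping labels are made up.
const FATAL_PATTERNS: ReadonlyArray<{ kind: FatalKind; pattern: RegExp }> = [
  { kind: "corrupt-stream", pattern: /not in gzip format|unexpected end of file/i },
  { kind: "tar-fatal", pattern: /error is not recoverable|child died with signal/i },
  { kind: "disk-full", pattern: /no space left on device/i },
];

// Returns the first matching fatal category, or null if stderr looks benign.
function classifyStderr(stderr: string): FatalKind | null {
  for (const { kind, pattern } of FATAL_PATTERNS) {
    if (pattern.test(stderr)) return kind;
  }
  return null;
}
```

A non-null result is a signal to settle immediately rather than wait for an SSH event that may never arrive.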
**Version bump**
- `lib/version.ts` → `v0.7.3e`.
v.Chaos.Jason.0-ak·v0.7.3d
First reseed on AncientLinux hung: the target's sudoers (written by the
install wizard) didn't include `tar`, so `sudo -n tar -xz` immediately
hit "a password is required" and the SSH stream closed in a way the
handler didn't catch. Job stuck at 22% "running" forever, and worse:
by the time we noticed, the node was already stopped with its data moved
aside.
Three fixes:
**1. Pre-flight `sudo -n tar` BEFORE stopping the node**
- `lib/handlers/stoachain-reseed.ts`: `preflightSudoTar()` runs a
harmless `sudo -n tar --version` as the first check, before anything destructive.
If sudo denies it, fail immediately with the exact sudoers line the
operator needs. Node stays running, data stays put: a recoverable state.
**2. Hang-safe stream plumbing**
- Handler's `streamIntoTarExtract` promise now has a single `settle()`
gate and hooks error / exit / close / stderr-pattern triggers, all of
which settle deterministically. Sudo denial patterns in stderr settle
immediately instead of waiting for the SSH `close` event that ssh2
sometimes doesn't emit when the remote process dies before receiving
any stdin bytes.
- 20-minute belt-and-suspenders timeout: any state where the SSH
channel goes half-open without firing events still fails cleanly.
**3. Install-template sudoers now includes tar + df + du + find**
- `lib/nodes.ts`: bootstrap writes `tar`, `df`, `du`, `find` into the
NOPASSWD list. Every NEW install gets the right sudoers.
- **Existing installs (AncientLinux, Node2)** need a one-time sudoers
patch; the handler's new preflight surfaces this as a clear error
with the exact fix.
**Recovery on the stuck install**
- Stuck job `86285db2-…` marked failed in the DB manually (it was
never going to complete on its own).
- AncientLinux's moved-aside data dir restored to `/home/StoaNode/data`;
container brought back up; syncing resumed.
- AncientLinux's sudoers patched directly via SSH to include the new
entries.
**Version bump**
- `lib/version.ts` → `v0.7.3d`.
v.Chaos.Jason.0-al·v0.7.3c
Consumer half of SC5 lands: a running-but-unsynced node can now jump
near head by pulling the hub's current seed instead of waiting days to
sync naturally. End-to-end streaming: hub decrypts the .ahbk in-memory,
pipes plaintext tar.gz over SSH into a `tar -xz` on the target. No
intermediate files anywhere; peak memory ≈ SSH channel buffer.
**New handler: `stoachain-reseed`**
- `lib/handlers/stoachain-reseed.ts`, an 8-step pipeline:
1. Preflight (current seed exists, target reachable, disk-space check)
2. Detect supervision (docker / screen), resolve host data dir via
docker inspect OR stored flags' `database-directory`
3. Stop node (`docker compose down` OR `screen quit` + pkill)
4. Move existing data aside: `mv data/ → data.old.<ts>/`
(or rm up front in tight-disk mode)
5. Stream decrypt from hub's `openArchiveStream()` → pipe into SSH
`sudo tar -xz -C <data.staging>`
6. Structural verify: `CURRENT` file present in staging
7. Atomic `mv data.staging/ → data/`
8. Restart node + delete `data.old/`
- Handles the 700 GB case cleanly: decrypt + extract happen as one
streaming pass; disk preflight requires 1.1× seed size free (or
~seed+existing in deleteOldFirst mode).
**Disk-space UX**
- Default mode keeps `data.old/` aside during extract, deletes on success;
peak disk ~2× for minutes only, rollback-safe.
- Tight-disk mode (operator-selected checkbox) deletes existing data
BEFORE extract; peak disk ~1×, zero rollback. UI warns explicitly:
"if extraction fails, you will have NO chain data."
**UI: Reseed card on Chainweb → Control sub-tab**
- `components/admin/NodeTabs.tsx`: new `ReseedCard` alongside
ControlCard + RunnerCard. Loads current seed via
`/api/admin/seeds`, shows donor / seed cut height / node cut height /
blocks-skipped-forward preview.
- Tight-disk checkbox. Confirm dialog explains destruction before
enqueue. Fresh-confirm required.
- Inline rollback/rewind warnings if seed height is behind node height.
**API**
- `POST /api/admin/nodes/[id]/stoachain/reseed`: fresh-confirm +
ancient-admin. Body `{deleteOldFirst?: boolean}`. Rejects with 409 if
no current seed exists on the hub.
**Not in this release** (deferred to v0.7.3d):
- Install wizard "seeded install" mode (for brand-new nodes, not
reseed). Needs install-handler extension + wizard UI; a distinct code
path.
- Chainweb-node boot test of staging dir before swap. Adds ~60s per
reseed and hasn't been needed for the common docker case; revisit if
post-swap failures become a pattern.
**Version bump**
- `lib/version.ts` → `v0.7.3c` · `SC5 Seeded install - reseed pipeline`.
v.Chaos.Jason.0-am·v0.7.3b
End-to-end test of v0.7.3a on localhost surfaced three UX gaps:
1. **"No eligible donors"** even though Node2 was running with
`--enable-backup-api`. The filter was reading `probe.chainweb.backupEnabled`
from `system_probe_json`, which the probe doesn't populate; the field is
under `probe.chainwebFlags` (and currently empty). Authoritative source
for backup-api flag is actually `nodes.stoachain_flags_json` (set at
install + every Start/Restart via the hub).
2. **Only eligible donors shown.** When nothing was eligible, the admin had
no way to see WHY each managed node was excluded.
3. **No indication of in-flight refresh.** If the admin bounced away from
the job log page, there was no way to tell from /admin/seeds that a
new seed was being built.
All three fixed.
**Fix: donor detection from stored flags + live cut height**
- `lib/seeds.ts`: `storedBackupApiEnabled()` reads the DB-stored flag
profile (authoritative) instead of the probe.
- New `listManagedNodeStatus()` surveys every managed node in parallel,
SSHing each via `fetchLiveStatus` for cut height + reachability. O(slowest
node) not O(sum). Returns eligibility status per node:
- `eligible` / `eligible-rotation`: can donate (latter skipped by
auto-pick; admin can still pick manually)
- `no-backup-api`, `not-reachable`, `not-running`, `cut-too-low`,
`unknown`: each with a human-readable reason
- `listDonorCandidates` + `pickDonor` converted to async; rebuilt on top of
`listManagedNodeStatus`.
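The "O(slowest node), not O(sum)" behavior comes from probing all nodes concurrently. A minimal sketch, assuming a hypothetical `probe` callback in place of `fetchLiveStatus` and a simplified result shape:

```typescript
interface LiveStatus {
  running: boolean;
  cutHeight: number;
}

type SurveyRow =
  | { nodeId: string; reachable: true; live: LiveStatus }
  | { nodeId: string; reachable: false; reason: string };

// Probe every node concurrently. Each probe handles its own failure, so one
// unreachable node neither rejects the batch nor delays the others.
async function surveyNodes(
  nodeIds: string[],
  probe: (nodeId: string) => Promise<LiveStatus>,
): Promise<SurveyRow[]> {
  return Promise.all(
    nodeIds.map((nodeId) =>
      probe(nodeId).then(
        (live): SurveyRow => ({ nodeId, reachable: true, live }),
        (err: unknown): SurveyRow => ({ nodeId, reachable: false, reason: String(err) }),
      ),
    ),
  );
}
```

The eligibility statuses listed above would then be derived per row from the reachability result plus the stored flags.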
**UI: full managed-nodes table + active refresh banner + download queue placeholder**
- `/admin/seeds` gets three new sections:
- **Active refresh banner** (top): shows when a `seed-refresh` job is
`queued` or `running`, with live progress % + step label. Link to full
job log. Polls every 5s.
- **Managed nodes table**: every node the hub is managing, eligibility
badge (one glyph per status), cut height, last-donated date + relative
age. The "why not eligible" reason shows inline beneath the badge.
- **Active downloads table**: structure in place, populated as [] until
v0.7.3c ships the consumer streaming pipeline.
- Refresh controls updated: auto-pick works even when only
`eligible-rotation` nodes exist (explicit "recent donor - override"
label). "⏳ refresh already in flight" notice blocks starting a second.
**APIs**
- `GET /api/admin/seeds` now returns `managedNodes`, `activeRefresh`, and
`activeDownloads` in addition to `current` + `archives`.
**Version bump**
- `lib/version.ts` → `v0.7.3b`.
v.Chaos.Jason.0-an·v0.7.3a
Start of phase SC5: the hub can now produce "seeds", promoted backup
archives intended for serving to new / unsynced nodes so they skip
weeks of syncing-from-genesis. This release only covers the PRODUCER
side (hub making + promoting seeds); consumer side (new installs +
reseed consuming a seed) lands in v0.7.3b.
**Schema**
- `db/migrations/015_seed_archives.sql`: adds `seed_archives` (metadata
for promoted backups; one 'current' row at a time) and
`seed_downloads` (future queue/progress for clients streaming the
seed down; consumed in 0.7.3b).
- A seed row references a `backups.id`; the archive file itself lives
where the backups system put it (`data/backups/<id>.ahbk`). No
duplicated bytes on the hub.
**New library**
- `lib/seeds.ts`: CRUD for seed_archives, atomic `promoteBackupToSeed`
(previous current → archived in one SQLite tx, new insert), and
`pickDonor` / `listDonorCandidates` with health filters:
- node must have chainweb-node running per latest probe
- `--enable-backup-api` enabled
- cut height within 5% of the tallest candidate (proxy for "synced")
- not donor in the last 3 days (rotation; skipped when only one
candidate is available)
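The health filters can be sketched as a pure function. The `Candidate` shape and `eligibleDonors` name are assumptions for illustration, as is the behavior when every synced candidate donated recently (this sketch returns an empty list):

```typescript
interface Candidate {
  id: string;
  running: boolean;
  backupApiEnabled: boolean;
  cutHeight: number;
  lastDonatedAt: Date | null;
}

const ROTATION_MS = 3 * 24 * 60 * 60 * 1000; // "not donor in the last 3 days"

function eligibleDonors(nodes: Candidate[], now: Date): Candidate[] {
  // Filters 1 and 2: chainweb-node running and backup API enabled.
  const healthy = nodes.filter((n) => n.running && n.backupApiEnabled);
  if (healthy.length === 0) return [];

  // Filter 3: cut height within 5% of the tallest candidate.
  const tallest = Math.max(...healthy.map((n) => n.cutHeight));
  const synced = healthy.filter((n) => n.cutHeight >= tallest * 0.95);

  // Filter 4: rotation, skipped when only one candidate is available.
  if (synced.length <= 1) return synced;
  return synced.filter(
    (n) => n.lastDonatedAt === null || now.getTime() - n.lastDonatedAt.getTime() >= ROTATION_MS,
  );
}
```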
**New handler**
- `lib/handlers/seed-refresh.ts`, the full 4-step flow:
1. Pick donor (explicit or auto-rotated)
2. Capture donor live status (cut height + chainweb-node version for
the seed manifest)
3. Run `stoachainBackupHandler` as a direct function call (same code
path as manual customer backups, same encryption)
4. Promote the resulting `backups` row as the new current seed;
previous current → archived
- Registered as kind `seed-refresh`.
**Admin panel**
- New page `/admin/seeds`: current seed card (donor, cut height, size,
sha256, age), eligible-donors picker, one-click "Refresh seed now"
button (fresh-confirm + ancient-admin), and a history table of past
promotions.
- Link added from `/admin/`.
**APIs**
- `GET /api/admin/seeds`: read-only (plain admin auth); returns
current + archives + donor candidates.
- `POST /api/admin/seeds/refresh`: enqueues a `seed-refresh` job;
fresh-confirm + ancient-admin; body `{donorNodeId?: string}`.
**Not in this release** (lands in 0.7.3b):
- Install wizard "seeded install" mode
- `stoachain-reseed` handler for existing nodes (stop → download →
verify → swap → start)
- Streaming download + extract pipeline (tar.zst + secretstream → staging
dir → boot test → atomic rename)
- Disk-space preflight with keep-old vs delete-old UX
**Version bump**
- `lib/version.ts` → `v0.7.3a` · `SC5 Seeded install - producer side`.
v.Chaos.Jason.0-ao·v0.7.2d
Restart on AncientLinux actually worked: chainweb-node came up inside
the recreated container with the new `p2p-hostname=bytales.duckdns.org`
and `cluster-id=AncientMiner`, syncing cuts from node1 at height 70490.
But the hub's job reported "failed" because `waitForContainerChainweb`
couldn't detect the process.
Root cause: `docker top stoa-node -eo comm` fails on this Docker /
kernel combo with `"Couldn't find PID field in ps output"`. The custom
`-eo comm` ps-options syntax isn't universally supported.
Fix: drop the custom ps format. Use the default `docker top` output
(which includes the full CMD with argv in the rightmost column) and
grep for the substring `chainweb-node`. Works whether the entrypoint
execs `/chainweb/chainweb-node` (current image) or any wrapper, and
doesn't rely on a specific ps format.
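The detection reduces to a substring check over the default `docker top` output. A minimal sketch; the `chainwebRunning` name is hypothetical, and the function takes the captured command output as a string rather than running docker itself:

```typescript
// `docker top <container>` prints a header row followed by one row per
// process. With the default format the command line is part of each row,
// so a substring match is enough and no custom ps format is needed.
function chainwebRunning(dockerTopOutput: string): boolean {
  const processRows = dockerTopOutput.split("\n").slice(1); // skip the header row
  return processRows.some((row) => row.includes("chainweb-node"));
}
```

Matching on the substring rather than an exact command name is what lets it work through an entrypoint wrapper.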
Also: backfilled the AncientLinux node row's
`stoachain_last_action=restart` + `stoachain_runner_path` to
`/home/StoaNode/chainweb/docker-compose.yml` so the UI reflects the
de-facto successful restart from the previous job attempt.
Known quirk surfaced by the logs: home node can't sync peers from
node2.stoachain.com due to `certificate has unknown CA`; this is a backlog item,
not new. node1 syncing works fine.
**Version bump**
- `lib/version.ts` → `v0.7.2d`.
v.Chaos.Jason.0-ap·v0.7.2c
v0.7.2b's docker branch was technically correct but didn't run: the worker
had booted before the code change and kept running the old screen-only
handler. When the user hit Apply + Restart on AncientLinux (docker-
supervised), the stale handler ran `screen -X quit` (no-op), then `pkill
-TERM chainweb-node`, which killed the chainweb-node process INSIDE the
container (PID-visible on the host), then tried to write the runner to
`/data/RunStoaNode.managed.sh` (the container-internal data dir path,
which doesn't exist on the host). Job failed. Container's restart policy
(`restart: unless-stopped`) brought chainweb-node back up.
Three preventive changes so this can't recur.
**Worker logs its VERSION on boot**
- `worker/index.ts`: import `VERSION`, `PHASE_CODE`, `PHASE_NAME` from
`lib/version.ts` and banner them on startup. Operators can now tell at
a glance (tmux scrollback, PM2 logs) whether the worker process is
running current code after a patch. Every suffix-bump (`v0.7.2b → c`)
changes the banner text.
**Screen-path stopNode refuses to pkill when a stoa-node container is running**
- `lib/handlers/stoachain-control.ts`: before the screen quit + SIGTERM
sequence, the handler checks `docker ps --filter name=stoa-node
--filter status=running`. If a live stoa-node container is found, it
throws a clear error asking the operator to restart the worker. The
supervision branch at the top of the handler should have caught this
earlier, so the only way the screen path ever reaches a docker node is
stale worker code.
**CLAUDE.md documents `npm run worker:watch`**
- The `package.json` already had `worker:watch` using `tsx watch`, which
auto-reloads on every `.ts` change. `CLAUDE.md` now recommends it as
the dev default; the plain `npm run worker` only makes sense when
you're debugging the worker itself and don't want auto-restart.
**Version bump**
- `lib/version.ts` → `v0.7.2c`. Suffix-ticks on every patch from now on
(`a`, `b`, `c`, …), as requested: the live badge shows it, the
worker banner shows it, the changelog cross-references it.
v.Chaos.Jason.0-aq·v0.7.2b
Follow-on from v0.7.2a after the user pointed out that the RunnerCard told
the same screen-based story regardless of supervision mode, and noticed
the description was wrong for the docker-supervised node (AncientLinux).
Turned out that wasn't just bad copy: the `stoachain-control` handler was
entirely screen-only. Clicking Apply + Restart on a docker-supervised node
would have tried `screen -dmS StoaNode` against a container, which is nonsense.
**`stoachain-control` is now supervision-aware**
- `lib/handlers/stoachain-control.ts`: live supervision detection at the
start of every run (`docker ps -a --filter name=stoa-node`, then
`screen -ls`). The handler dispatches to a docker branch or the
original screen branch.
- Docker Restart: `docker inspect stoa-node` to find
`com.docker.compose.project.working_dir` + current image tag →
`computeLayout()` from that dir → `renderDockerCompose(layout, imageTag,
flags)` → write `docker-compose.yml` over SSH → `docker compose up -d
--force-recreate`. `--force-recreate` ensures new env vars take effect
even when the image tag hasn't changed. Waits for `chainweb-node` to
appear in `docker top` output, up to 4 minutes.
- Docker Stop: `cd <composeDir> && docker compose down`. The container
is removed; next Start recreates from the compose file.
- Docker Start: same as Restart, but only runs the up/wait phase.
- `stoachain_runner_path` for docker nodes stores the compose file path
(that's the "hub-rewritten-on-every-Restart" thing for docker), so the
status endpoint continues to identify docker-managed nodes the same
way it did before.
- Compose dir not found (container `docker rm`'d manually) → clear
error message asking the operator to re-run Install.
**`RunnerCard`: honest per-supervision copy**
- `components/admin/NodeTabs.tsx`: split into `ScreenRunnerCard` and
`DockerRunnerCard` with correct fields + accurate procedure writeups.
Docker version surfaces: container name (`stoa-node`), the inside-
container binary path (`/chainweb/chainweb-node`), the host-side
compose path, and the three bind-mount pairs (data, cert, key). The
"How Start / Restart works" steps describe the actual flow: inspect →
renderDockerCompose → tee → up --force-recreate → poll docker top.
- Screen version keeps its original writeup but re-titled "Runner +
binary (screen)" for clarity, and clarifies that the legacy-runner
rollback path is screen-specific.
**Version bump**
- `lib/version.ts` → `v0.7.2b`.
**Known caveat**
- If Apply + Restart on a docker node is the first time the hub has
operated on it, the stored flags profile is already there (install
wizard wrote it). But if someone imported a docker node without
running the wizard (rare; no UI path), stoachain_flags_json is
empty, and the first Restart will fall through the live-capture
branch. The capture logic parses `ps -eo args` which on a docker
host includes the container's chainweb-node argv, so it should work.
v.Chaos.Jason.0-ar·v0.7.2a
End-to-end test on the live site with the home node surfaced several
usability issues addressed here.
**Chainweb tab: sub-tab navigation**
- `components/admin/NodeTabs.tsx`: the Chainweb tab's cards were stacked
vertically and scrolled 3+ screens deep. Split into sub-tabs:
`Status | Control | Flags | Identity | Backup`. URL hash format is
`#chainweb/<sub>` so links are bookmarkable.
- The Flags sub-tab has an inner toggle: `Edit config` (default) vs
`Current (live)`. Read-only live-parsed view is always one click away.
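A small sketch of parsing the `#chainweb/<sub>` hash back into a sub-tab. The lowercase slugs and the `parseChainwebHash` helper are assumptions; the notes only give the hash format and the tab labels:

```typescript
// Assumed slugs: lowercase versions of the five sub-tab labels.
const SUB_TABS = ["status", "control", "flags", "identity", "backup"] as const;
type SubTab = (typeof SUB_TABS)[number];

// Parse a location hash such as "#chainweb/flags" into a sub-tab name.
// Returns null for other tabs or unknown sub-tab names.
function parseChainwebHash(hash: string): SubTab | null {
  const match = /^#chainweb\/([a-z]+)$/.exec(hash);
  if (match === null) return null;
  const requested = match[1];
  return SUB_TABS.find((tab) => tab === requested) ?? null;
}
```

Returning `null` for anything unrecognized lets the caller fall back to a default sub-tab instead of rendering a blank panel.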
**Editor prefilled with live ("ghost") values**
- Every input is now pre-filled with the value chainweb-node is actually
running. The operator changes only what they want (e.g. `p2p-hostname`
from `ancientminer.home` to `bytales.duckdns.org`) and the rest stays
put.
- Previously the editor seeded from the stored profile JSON, which for
newly-added nodes was empty, so every input looked unset even though the
node had 35+ live flags. Moot for nodes that had been Restart-ed through
the hub once; fatal UX for nodes that hadn't.
- On Apply, the editor sends a SNAPSHOT of the full live profile + pending
edits as the new stored profile. "Save what's running, plus my changes,
into the DB." No more slowly-growing stored profile that lags behind
live.
**Flag validation tightened**
- `lib/stoachain-flags-catalog.ts`: `block-gas-limit` min bumped from 0
to 1_600_000 (the Stoa network production min). Clearing the field
still falls back to chainweb-node's compiled-in default (1.6M), which
now matches.
**GET /flags no longer requires fresh-confirm**
- `pages/api/admin/nodes/[id]/stoachain/flags.ts`: read-only GET
downgraded from `requireFreshAdminConfirmApi` → `requireAdminApi`.
Matches the other read-only endpoints (`status`, `docker-logs`,
`preflight`). Opening the Flags sub-tab without typing your password
in the last 5 minutes no longer 401s.
- PATCH still requires fresh-confirm + ancient-admin (restarting
chainweb-node interrupts P2P gossip, which is destructive).
**Row metadata swap**
- FlagRow now surfaces "stored differs from live" instead of the other
way around. Since the editor prefills from live, the interesting case
is "you hand-edited the runner without restarting through the hub,
so stored lags behind what's actually running." Apply still
snapshots-live-then-persists, which is the desired rebaseline.
**Version bump**
- `lib/version.ts` → `v0.7.2a`.
v.Chaos.Jason.0-as·v0.7.2
First live edit path for chainweb-node flags. Previously the hub could only
start / stop / restart the node with whatever profile was captured at install
time; to change a flag the operator had to SSH in, hand-edit the runner, and
hope they didn't mistype. Now every catalog-known flag has an input control
in the UI with validation, a pending-diff counter, and Apply + Restart.
**New API endpoints**
- `GET /api/admin/nodes/[id]/stoachain/flags`: returns the stored profile
JSON from the `nodes.stoachain_flags_json` column. Empty `{}` if the node
was never restarted via the hub (first Apply seeds it).
- `PATCH /api/admin/nodes/[id]/stoachain/flags`: accepts
`{ flags: Partial<ChainwebFlags>, restart?: boolean }`. Validates every
incoming key against `FLAGS_CATALOG` (type + range + enum + hex), rejects
immutable flags (`chainweb-version`, `database-directory`, cert paths),
merges into the stored profile, persists, optionally enqueues a
`stoachain-control` restart job. Value `null` for any key means "revert
to chainweb-node default" (key is dropped from stored JSON).
- Both routes require `requireFreshAdminConfirmApi` and ancient-admin; PATCH
requires both because restarting chainweb-node interrupts P2P gossip.
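The PATCH merge semantics (reject immutable keys, `null` drops a key, anything else overwrites) can be sketched as below. The types and `applyFlagPatch` are illustrative assumptions; catalog validation is omitted, and the cert-path flag names are left out because the notes don't spell them out:

```typescript
type FlagValue = string | number | boolean | string[];
type StoredProfile = Record<string, FlagValue>;
type FlagPatch = Record<string, FlagValue | null>;

// Immutable flags named in the notes (cert-path flags omitted here).
const IMMUTABLE_FLAGS = new Set(["chainweb-version", "database-directory"]);

// Merge a patch into the stored profile without mutating the input.
// A `null` value means "revert to chainweb-node default": the key is dropped.
function applyFlagPatch(stored: StoredProfile, patch: FlagPatch): StoredProfile {
  const next: StoredProfile = { ...stored };
  for (const [key, value] of Object.entries(patch)) {
    if (IMMUTABLE_FLAGS.has(key)) {
      throw new Error(`flag "${key}" is immutable`);
    }
    if (value === null) {
      delete next[key];
    } else {
      next[key] = value;
    }
  }
  return next;
}
```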
**New UI card: Flag editor** (below the read-only Flags card on the Chainweb tab)
- `components/admin/NodeTabs.tsx`: `<FlagEditorCard>` with per-flag inputs
grouped by category (Core / Data / TLS / P2P / Consensus / Mempool /
Service / Mining / Backup / Logging / Runtime / Debug).
- Input type per flag is driven by `FlagMeta`: switch-pair → checkbox,
enum → select, number → numeric input with `[min, max]` hint, repeatable
→ textarea (one entry per line), hex / string / path → text input.
- "Show all flags" toggle surfaces the full catalog (~40 flags); default
view shows only what's currently set in the stored profile plus the
always-visible immutable (locked) rows.
- Each row shows a `pending` badge when changed, `locked` badge on
immutable flags, `debug` badge on debug-only flags, the inline
description, the relevant catalog warning when the value deviates from
default, and the live value if it differs from stored.
- Per-row `revert` discards the pending change; `clear` sets to null
(chainweb-node falls back to its compiled-in default).
- Footer: pending count, `discard all`, `Save (no restart)`, and
`Apply + Restart`. Save-only persists to the DB so the next Restart
picks up the new flags; Apply + Restart does both, enqueues the
`stoachain-control` job, and polls inline (same pattern as
CertRotateButton; no nav-away required to watch progress).
- GHC runtime (`+RTS`) gets its own row at the bottom; it's not a flag
per se but the handler emits it in the runner.
**Integration**
- The flag-editor writes the same `stoachain_flags_json` column that
`stoachain-control restart` already reads when rebuilding the runner
script (for screen-supervised nodes) or the compose file (for
docker-supervised nodes). No new wiring needed: "edit flags → Apply
+ Restart" just happens.
- `FuturePhase` banner on the Chainweb tab updated: flag editor is no
longer future work. Listed forward: v0.7.3 seeded install, v0.8.x web
terminal / hub registry.
**Version bump**
- `lib/version.ts` → `v0.7.2` · `SC4 StoaChain flag editor`.
v.Chaos.Jason.0-at·v0.7.1b
Two quality-of-life additions on top of v0.7.1a's install flow, observed
during end-to-end test on a home Linux machine.
**Container logs card** (new)
- `components/admin/NodeTabs.tsx`: ContainerLogsCard shown on Chainweb
tab when the node is supervision=docker. Live tail of `docker logs
--tail N stoa-node` via a new API route.
- Configurable line count (100 / 200 / 500 / 1000 / 2000); auto-refresh
every 5s (toggle); Copy-all button for pasting into support convos.
- New endpoint: `GET /api/admin/nodes/[id]/stoachain/docker-logs?lines=N&container=X`.
Any-admin access; read-only; safe to poll.
**Auto-reprobe after mutating jobs** (new)
- `stoachain-install`, `stoachain-control` (start/stop/restart), and
`stoachain-cert-rotate` handlers now enqueue a `system-probe` job as
their last successful step. The probe runs within seconds of the
mutating job finishing, so the Docker tab's container listing +
Overview "Services detected" rows reflect the new state without the
operator needing to click Reprobe.
- Also wired on the UI side for CertRotate + Install β they POST a
probe from the browser on job completion as belt-and-braces (fires
even if somehow the handler-side enqueue fails).
- Fixes the UX mismatch we hit during testing where preflight showed
the target clean but Docker tab still showed a zombie stoa-node.
**Wizard UX fix: P2P hostname is OPTIONAL**
- Tested install on a home machine with no DNS name: the wizard
required a P2P hostname input, which when filled with a placeholder
(`ancientminer.home`) advertised an unresolvable name to peers.
node1 rejected sync with HTTP 400 and node2's TLS failed.
- Fix: P2P hostname field marked optional. Blank = the install sends
`0.0.0.0` (chainweb's auto-detect-via-peer-gossip sentinel). Peers
use the NAT-translated source address instead of the advertised
hostname. Home operators now work without DNS.
- Regex validation still applies when the field IS filled, for
operators with real DNS names.
- Updated helper text explicitly tells home operators to leave it
blank and tells validators/bootstraps to fill it in.
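The blank-means-auto-detect rule can be sketched as follows. The hostname regex here is a hypothetical stand-in (the wizard's real pattern is not shown in these notes), and `resolveP2pHostname` is an illustrative name:

```typescript
// Hypothetical pattern: dot-separated labels of letters, digits and hyphens,
// with no label starting or ending in a hyphen.
const HOSTNAME_RE = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$/i;

// Sentinel sent when the field is left blank, per the notes above.
const P2P_AUTO_DETECT = "0.0.0.0";

function resolveP2pHostname(input: string): string {
  const value = input.trim();
  if (value === "") return P2P_AUTO_DETECT; // blank = let peers auto-detect
  if (!HOSTNAME_RE.test(value)) {
    throw new Error(`invalid P2P hostname: ${value}`);
  }
  return value;
}
```

Validation still runs for non-empty input, so a mistyped real hostname is rejected rather than silently advertised.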
**End-to-end test result (v0.7.1a + b combined)**
- Fresh home Linux machine (AncientMiner, 32 GB RAM, NVMe): preflight
green, install wizard succeeds, container healthy, chainweb-node
begins receiving cut data from node1.stoachain.com. Cut height
climbing; node actively syncing.
- Two real chainweb quirks observed during test (tracked as separate
backlog items, NOT install-flow bugs):
- TLS handshake with node2.stoachain.com returns
"certificate has unknown CA" despite `_disablePeerValidation=True`
in Stoa version config.
- node1.stoachain.com returns HTTP 429 during aggressive initial
cut polling; self-resolves as peer relationship stabilizes.
---
v.Chaos.Jason.0-au·v0.7.1a
Incremental patch on v0.7.1. Extends the existing Easy-setup bootstrap
flow in `/admin/nodes/new` so that, in addition to installing the hub's
SSH key, it also prepares the target for container-based chainweb
management. Fills the gap v0.7.1's install wizard revealed: the
wizard assumed a "prepared" server, but the bootstrap flow wasn't
actually preparing anything beyond SSH auth.
**What Easy setup now does** (was: only steps 1-4 and 10-11)
1. Password-auth SSH into the target (password used in-memory, never stored)
2. Generate ed25519 SSH keypair
3. Install the public key in `~/.ssh/authorized_keys`
4. Reconnect with the new key to verify it works
5. **(new)** Install `docker.io` if missing (apt / dnf / yum auto-detected)
6. **(new)** Enable + start the docker daemon via systemctl
7. **(new)** Add SSH user to the `docker` group (skipped when user is root)
8. **(new)** Write `/etc/sudoers.d/ancientholdings-stoa` with NOPASSWD for
`docker, mkdir, chmod, chown, openssl, tee, systemctl, screen, pkill, mv`
(skipped when user is root; root doesn't need sudo)
9. **(new)** Verify `sudo -n docker --version` works, which proves sudoers took
effect before declaring bootstrap successful
10. Seal the private key in the vault
11. Show the private key to the operator once for external backup
All steps are idempotent: safe to re-run against an already-prepared box
without breaking anything.
**UI changes** (`pages/admin/nodes/new.tsx`)
- Expanded "Easy setup" helper block into a dropdown listing the full
11-step sequence with inline docs about password handling, distro
detection, and root-user semantics. Labeled as recommended.
- Added equivalent dropdown on "Advanced" explaining when to use it
(rare; only for pre-existing SSH key auth + manual prep) and what
it skips vs Easy setup.
- Clarified the root-SSH caveat (modern Linux distros block root
password login; use a non-root sudo user instead).
**What this enables**
A truly fresh Linux box (whether a home machine, a Hetzner VPS, or a
DigitalOcean droplet) can go from "ssh user + password" to "fully
hub-managed chainweb-node capable" in one click, with zero manual
prep. The install wizard from v0.7.1 now works end-to-end on any
blank Linux target.
**Code changes**
- `lib/nodes.ts`: new `prepareTarget()` function runs after SSH key
install. Constructs a distro-aware setup script executed in one
SSH round-trip under the existing password auth. Idempotent.
- `pages/admin/nodes/new.tsx`: copy expansion only; no logic change.
- Version bumped to `v0.7.1a`.
---
v.Chaos.Jason.0-av·v0.7.1
Hub can now provision a fresh chainweb-node container on any registered
server via a UI wizard: no SSH, no Haskell toolchain, no manual docker
commands. Uses the published `ghcr.io/stoachain/stoa-node:latest` image
(v2.32.0-stoa.1 or later).
**Pre-work shipped outside the hub repo**: the StoaChain repo now has
first-class container support:
- `docker/entrypoint.sh` expanded to cover the full production flag surface
(all ~30 flags, correct mining-coordination vs node-mining semantics,
ECDSA P-384 cert auto-detection).
- `cabal.project` pins crypton 1.0.4 / memory 0.18.0 / merkle-log 0.2.0
to survive Hackage drift from post-Kadena-shutdown dep churn.
- Published image at [ghcr.io/stoachain/stoa-node](https://github.com/StoaChain/stoa-chain/pkgs/container/stoa-node),
GitHub Release [v2.32.0-stoa.1](https://github.com/StoaChain/stoa-chain/releases/tag/v2.32.0-stoa.1)
with the raw binary attached for operators who don't use docker.
**On the hub side:**
- `lib/stoachain-layout.ts`: canonical `StoaNode/` layout generator.
Every install creates `<root>/StoaNode/{chainweb,data/backups,tls}/`
with docker-compose.yml + nginx.conf.example. Backups stay inside
data/ (hardlink-friendly for RocksDB checkpointing). Service API
defaults to `127.0.0.1` binding unless operator opts into public.
- `lib/stoachain-install-preflight.ts` (single-SSH-roundtrip env audit):
docker installed + running, RAM ≥ 4 GB, drive class (NVMe/SSD/HDD),
sudoers `NOPASSWD`, port 1789 / 1848 availability, whether any
chainweb-node is already running. Returns structured report; HDD
targets flagged red, warnings on low RAM, exact sudoers line included
when sudo is denied.
- `lib/handlers/stoachain-install.ts` (orchestrator job handler):
1. Create canonical layout (sudo mkdir + chown)
2. Generate ECDSA P-384 cert + key at `tls/`, chmod 600 on key
3. Render hub-managed docker-compose.yml from the chosen profile
4. Write nginx.conf.example as reference (not applied)
5. `docker pull ghcr.io/stoachain/stoa-node:latest`
6. `docker compose up -d` in `chainweb/`
7. Wait up to 120s for container's `/info` to respond
8. Persist stoachain_flags_json + runner_path + last_action in DB
Failure at any step aborts without auto-rollback; partial state
visible on target for manual inspection.
- API routes:
- `POST /api/admin/nodes/[id]/stoachain/preflight`: any admin; runs
the checks and returns the structured report.
- `POST /api/admin/nodes/[id]/stoachain/install`: Ancient-admin-only,
fresh-confirm required; validates body (root path, hostname format,
pubkey hex) then enqueues `stoachain-install` job.
- `components/admin/InstallWizard.tsx` (6-step wizard):
1. Preflight (auto-runs; advanced-override checkbox available if any
fail and operator knows better)
2. Storage (drive picker with size/class badges; auto-selects best)
3. Identity (P2P hostname + optional cluster-id)
4. Profile (Recommended vs Mining coordinator; pubkey field shown
for mining; backup-API + public-service toggles)
5. Review (shows full install plan + Apply button with
fresh-confirm modal)
6. Running (live progress bar + log tail, success banner with
post-install manual steps checklist, or failure with recovery
guidance)
Shown on Chainweb tab only when the node has no chainweb-node running
and no hub-managed runner path recorded, i.e. truly fresh targets.
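The preflight rules described above (HDD flagged red, a warning on low RAM, a sudoers hint when sudo is denied) could be expressed roughly as follows. This is only a sketch: the `PreflightFacts` and `Finding` shapes, the `evaluatePreflight` name, the mapping of "red" to a `fail` severity, and the sudoers line shown are all assumptions, not the real report format of `lib/stoachain-install-preflight.ts`.

```typescript
// Hypothetical shapes: the real structured report is not shown in these notes.
type DriveClass = "NVMe" | "SSD" | "HDD";
type Severity = "ok" | "warn" | "fail";

interface PreflightFacts {
  sshUser: string;
  dockerRunning: boolean;
  ramGb: number;
  driveClass: DriveClass;
  sudoNoPasswd: boolean;
  busyPorts: number[]; // ports already bound on the target
}

interface Finding {
  check: string;
  severity: Severity;
  detail: string;
}

function evaluatePreflight(f: PreflightFacts): Finding[] {
  const findings: Finding[] = [];
  const add = (check: string, severity: Severity, detail: string): void => {
    findings.push({ check, severity, detail });
  };

  add("docker", f.dockerRunning ? "ok" : "fail",
    f.dockerRunning ? "installed and running" : "not running");
  // Low RAM is a warning, not a hard failure.
  add("ram", f.ramGb >= 4 ? "ok" : "warn", `${f.ramGb} GB (4 GB or more expected)`);
  // Rotational drives are flagged red.
  add("drive", f.driveClass === "HDD" ? "fail" : "ok", f.driveClass);
  // The sudoers line below is an assumed example, not the hub's exact text.
  add("sudo", f.sudoNoPasswd ? "ok" : "fail",
    f.sudoNoPasswd ? "NOPASSWD present" : `add to sudoers: ${f.sshUser} ALL=(ALL) NOPASSWD: ALL`);
  for (const port of [1789, 1848]) {
    const busy = f.busyPorts.indexOf(port) !== -1;
    add(`port ${port}`, busy ? "fail" : "ok", busy ? "in use" : "free");
  }
  return findings;
}

const report = evaluatePreflight({
  sshUser: "stoa",
  dockerRunning: true,
  ramGb: 2,
  driveClass: "HDD",
  sudoNoPasswd: false,
  busyPorts: [1848],
});
console.log(report);
```

Returning a flat list of findings keeps the wizard's first step simple: it can render every row and enable the advanced-override checkbox whenever any row has severity `fail`.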
**What you can now do**
- Register a fresh Linux box in Nodes → click Install chainweb-node →
walk through 5 steps → have a syncing StoaChain node in ~2-5 minutes.
- Legacy node2 (screen-managed, pre-hub) is unaffected: the install
wizard only appears on nodes without an existing chainweb-node.
**Explicitly deferred** (next phases)
- Flag editor UI to change flags post-install → v0.7.2
- Seeded install from donor .ahbk (faster bootstrap) → v0.7.3
- Migrate existing screen-managed node2 to container mode → v0.7.4
- Hub-hosted container registry as backup to GHCR → v0.8.x
- Vendor deps into StoaChain repo for supply-chain resilience → v0.8.x
---
v.Chaos.Jason.0-aw·v0.7.0
First phase of the v0.7.x arc. The hub now understands chainweb-node
at the flag level, can read it live over SSH, display its identity /
storage / peer state, and Start/Stop/Restart it against the existing
screen-managed production node.
**What shipped**
- `docs/chainweb-reference.md`: ~7,900-word living reference covering
every chainweb-node flag we care about (defaults, roles, warnings,
ranges, citations into StoaChain Haskell source), the TLS certificate
system, the P2P discovery cascade, service-API endpoint catalog, and
an audit of the production runner script (14 of 35 flags are default
no-ops, and some, like `--mining-update-stream-limit 50`, are below
default; all are candidates for cleanup).
- `lib/stoachain-flags-catalog.ts`: 40-flag catalog with role /
category / type / default / recommended / warning / doc-anchor
metadata, plus two named profiles:
- **Ancient**: byte-for-byte reproduction of the production script
(~35 flags).
- **Recommended**: minimal-equivalent (~20 flags, same behavior).
- `lib/stoachain-flags.ts`: `fromPsArgs`, `fromScript`,
`toRunnerScript`, `toDockerEnv`, `diffFlags`. One model, two
materializations (bash runner for screen mode today, docker env
for v0.7.1 container mode).
- `lib/stoachain-live.ts` extended:
- `fetchLiveFlags`: parses the live `ps -eo args` into structured
`ChainwebFlags`, detects parent runner script, classifies which
named profile matches (or `custom`).
- `fetchLiveCert`: reads TLS cert + key over SSH, runs `openssl`
for SHA-256 fingerprint, subject, validity dates, key-file perms.
Distinguishes persistent / ephemeral / missing modes.
- `fetchLiveDrive`: walks data-directory → mount → block device →
`/sys/block/.../queue/rotational`. Classifies as NVMe / SSD /
HDD; used by the UI to flash a red warning when chainweb-node
lives on a rotational drive.
- `lib/handlers/stoachain-control.ts`: Start / Stop / Restart job
handler. On first touch, parses live argv into flags and saves them
as the stored profile. Thereafter renders a hub-managed runner
script (`<data-dir>/RunStoaNode.managed.sh`) from that profile on
every Start/Restart; the operator's legacy runner is never
overwritten. Stop sequence: `screen -X quit` → 10s grace → TERM →
10s → KILL. Missing `sudo -n` permissions surface with the exact
sudoers line to add.
- `lib/handlers/stoachain-cert-rotate.ts`: openssl-over-SSH cert
generation + rotation. Refuses to run while chainweb-node is
active. Two modes: `upgrade` (first-time cert from ephemeral) and
`rotate` (archive existing to `*.TS.old`). Handler wired; UI
button deferred to v0.7.1.
- Migration 014 adds 8 new columns on `nodes`:
`stoachain_flags_json`, `stoachain_flags_at`, `stoachain_profile`,
`stoachain_binary_path`, `stoachain_runner_path`,
`stoachain_last_action`, `stoachain_last_action_at`,
`stoachain_last_action_by`. Trimmed from an earlier design:
anything trivially queryable live over SSH (cert expiry, drive
usage, live flags) is NOT persisted, to keep the DB narrow and
avoid stale display.
- API endpoints:
- `GET /api/admin/nodes/[id]/stoachain/status`: single round-trip
payload (live status + cert + drive + flags + audit). Polled
every 10 s from the UI.
- `POST /api/admin/nodes/[id]/stoachain/control`: enqueues a
`stoachain-control` job. Requires fresh admin confirm.
- `ChainwebTab` UI rebuild (seven cards):
- **Status**: live tone badge, profile badge, per-chain height
grid (10 cells), peer count, auto-refresh timestamp.
- **Control**: Start / Stop / Restart buttons with
confirm-password modal; disabled states wire to the live
running flag; last-action audit line.
- **Peer identity (TLS)**: color-coded badge (Certified /
Ephemeral / Missing), fingerprint, subject, validity with
days-until-expiry (red <30d, amber <90d), key-perm warning.
- **Storage**: drive-class badge (NVMe / SSD / HDD), mount, fs
type, capacity + used %, red warning on HDD.
- **Flags**: grouped read-only table of every parsed flag,
matching-profile badge, catalog-gap collapsible for unknown
flags.
- **Runner + binary**: paths for both; the hub-generated runner
path is shown even before first Start so operators know where
the managed script will land.
- Existing **Backup** card unchanged.
- Profile classification: a simple equality check against Ancient and
Recommended. Everything else classifies as `custom`.
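To make the equality-based classification concrete, here is a small sketch that parses an argv line into a flag map and compares it against two profiles. The simplified `ChainwebFlags` shape, the helper `sameFlags`, the `classifyProfile` signature, and the two toy profiles are assumptions for illustration; the real profiles hold roughly 35 and 20 flags, and the actual `fromPsArgs` in `lib/stoachain-flags.ts` may parse differently.

```typescript
// Hypothetical, simplified model: flag name -> value (true for bare switches).
type ChainwebFlags = Record<string, string | true>;
type ProfileName = "ancient" | "recommended" | "custom";

const isFlag = (tok: string): boolean => tok.slice(0, 2) === "--";

// Parse `--flag value`, `--flag=value`, and bare `--flag` tokens.
function fromPsArgs(args: string): ChainwebFlags {
  const tokens = args.trim().split(/\s+/);
  const flags: ChainwebFlags = {};
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (!isFlag(tok)) continue; // skips the binary path and stray values
    const eq = tok.indexOf("=");
    if (eq !== -1) {
      flags[tok.slice(2, eq)] = tok.slice(eq + 1);
    } else if (i + 1 < tokens.length && !isFlag(tokens[i + 1])) {
      flags[tok.slice(2)] = tokens[++i];
    } else {
      flags[tok.slice(2)] = true;
    }
  }
  return flags;
}

// Order-insensitive equality: same flag names, same values.
function sameFlags(a: ChainwebFlags, b: ChainwebFlags): boolean {
  const ka = Object.keys(a).sort();
  const kb = Object.keys(b).sort();
  return ka.length === kb.length && ka.every((k, i) => k === kb[i] && a[k] === b[k]);
}

function classifyProfile(
  live: ChainwebFlags,
  profiles: { ancient: ChainwebFlags; recommended: ChainwebFlags }
): ProfileName {
  if (sameFlags(live, profiles.ancient)) return "ancient";
  if (sameFlags(live, profiles.recommended)) return "recommended";
  return "custom";
}

// Toy stand-ins only; these are NOT the real profile contents.
const toyProfiles = {
  ancient: { "bootstrap-reachability": "0", "mining-update-stream-limit": "50" },
  recommended: { "bootstrap-reachability": "0" },
};

const live = fromPsArgs("chainweb-node --bootstrap-reachability 0 --mining-update-stream-limit=50");
console.log(classifyProfile(live, toyProfiles));
```

Because the check is strict equality, a single extra or changed flag moves a node to `custom`, which matches the behavior described above.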
**Research inputs**
- StoaChain Haskell source at `d:\_Claude\StoaChain\` (branch
`AncientStoa`): ground truth for flag defaults + semantics.
- `chainweb-node --help` output captured from production binary
`StoaChain_2.32.0` (276 lines, `docs/research/chainweb-node-help.txt`).
- User's two runner scripts (`RunStoaNode.sh` and `.backupapi.sh`),
captured over SSH.
- Kadena upstream docs (archived since Kadena Inc. shutdown
2025-10-21; mainnet last block 2025-11-15).
**Explicitly deferred (come later in v0.7.x)**
- Flag editor UI (change profile / individual flags) → v0.7.2.
- Cert rotation UI (handler exists; button deferred) → v0.7.1.
- Container-mode detection (screen vs docker) → v0.7.1.
- Hub-driven install on fresh host → v0.7.2 (includes drive
auto-select + canonical `StoaNode/{chainweb,data,tls}/` layout).
- Seeded install (donor .ahbk → new node) → v0.7.3.
- Screen → container migration button → v0.7.4.
- GHCR image publish workflow → v0.7.2 (StoaChain repo addition).
**Known gaps from research (future-Claude TODOs)**
- `--enable-local-timeout` semantics: Bool or µs? Flagged in doc §8.
- Backup-API on port 1848 is unauthenticated. Documented in
research §4; production workaround is SSH+localhost; long-term plan
is firewall-to-loopback or nginx auth.
- `--bootstrap-reachability 0` silently masks a firewalled P2P port;
v0.7.0 shows an amber warning in flags view when this is set.
---