Documentation · Releases · Antikythera

Antikythera

v.G.1.3 — Antikythera

The hub measures CPU against the actual chainweb-node hot path, not an abstract sysbench number.

Antikythera is the cpu-benchmark-chainweb-tuning spec. The pre-v.G.1.3 bench ran sysbench (single + multi-thread) and called it done — a generic CPU number that didn't say much about whether your box would run chainweb well. The v.G.1.3 bench adds six workloads that mirror chainweb-node's hot path (Blake2s hashing, secp256k1 ECDSA verify, Ed25519 verify, random-memory access, single-thread Pact-proxy, hash-and-verify pipeline), promotes the IPC instructions-per-cycle counter from diagnostic to a scored signal at ~10% weight, blends the lot into a chainweb-aggregate score, and labels the result with one of four buckets — strong / good / adequate / weak — so operators can read at a glance whether a box is chainweb-friendly. The slice bench reaches full structural parity with the host bench: single-arbiter scoring formula via `computeServerScore` for both bench types. And along the way the rehaul caught a pre-existing bash-redirection bug that had been silently nulling IPC fleet-wide since the perf-stat code shipped — a detail that itself justified the entire rehaul.

What landed

Six chainweb-correlated workloads

Blake2s throughput — mirrors chainweb's hash-tree construction. Measured via openssl speed -evp blake2s256. Reported as MB/s.
secp256k1 ECDSA verify — mirrors transaction-signature verification. Measured via openssl speed ecdsap256. Reported as verifications/sec.
Ed25519 verify — mirrors block-signature verification. Measured via openssl speed ed25519. Reported as verifications/sec.
Random-memory access — mirrors RocksDB random-read latency. Measured via sysbench memory --memory-access-mode=rnd --memory-block-size=4K. Reported as MB/s.
Pact-proxy single-thread — mirrors Pact contract evaluation latency. Measured via sysbench cpu --threads=1 --cpu-max-prime=20000. Reported as events/sec.
Hash-and-verify pipeline — mirrors chainweb's block-validation hot path. Custom shell loop combining Blake2s hash + Ed25519 verify in a tight inner loop. Reported as ops/sec.

Total bench-time addition ≈ 30 seconds. No new apt dependencies: every workload uses openssl, sysbench, and b2sum — all already part of the existing bench deps phase.

Chainweb-aggregate score + bucket label

The new chainwebScore blends:

50% multi-thread sysbench events/sec (the legacy signal, retained)
10% IPC ratio (instructions-per-cycle from the perf hardware counter)
40% averaged chainweb-workload ratios (the six workloads above, equally weighted)

Each ratio is the measured value divided by the calibrated baseline for a reference 6c/12t modern desktop class (Ryzen 5 5600 / i5-12400). The blended score then maps to a four-state bucket per REQ-08:

★ Strong — score ≥ 1.0, at or above reference.
✓ Good— 0.75 ≤ score < 1.0, solid chainweb performance, slightly below reference.
~ Adequate— 0.5 ≤ score < 0.75, usable but not optimal.
! Weak— score < 0.5, significantly below reference; chainweb-node may struggle under load.

IPC promotion + the silent-null bug

The pre-v.G.1.3 perf-stat invocation contained a Bash redirection order-of-operations bug: (perf stat ... >/dev/null 2>&1) 2>&1 | tail -30 discarded perf's stderr (where the hardware-counter table lives) inside the subshell before the outer 2>&1 could capture it. Result: the IPC value has been silently NULL fleet-wide since this code shipped. v.G.1.3 fixes it via a capture-then-detect rewrite (no subshell-internal silencing) plus a sudoers refresh that adds /usr/bin/perf + /usr/lib/linux-tools-*/perf so sudo -n perf stat works on Ubuntu 22.04+ paranoid-3+ kernels. First re-bench after the upgrade may show a step-change in IPC for boxes that were silently failing before.

Slice bench full mirror — single-arbiter scoring

The pre-v.G.1.3 segregated-slice bench had its own parallel scoring path that bypassed computeServerScore — slice rows scored differently from host rows under the same hardware. v.G.1.3 retires the inline shortcut and routes both bench types through a single canonical formula. Score-formula bug fixes from now on affect both bench types simultaneously. The slice bench also gets the same six chainweb workloads (executed inside the slice's systemd-run --scope cgroup), the same IPC capture (with graceful fallback when perf-stat fails inside the cgroup), and the same phase markers that drive the new substep UI.

Operator UX — live substep list

The pre-v.G.1.3 bench-progress card showed a single CPU progress bar. v.G.1.3 replaces it with a vertical substep list (mirroring the v.G.1.2e bootstrap-stage UX): each of the nine substeps — cpu_single, cpu_multi, perf_stress, plus the six chainweb workloads — transitions ○ pending → ● in-progress → ✓ complete with the measured value rendered inline. Workloads that are unavailable (older OpenSSL without Ed25519, missing b2sum, etc.) render as ○ skipped (unavailable) rather than blocking the bench.

Bucket-label visibility

Operator's own server score card — bucket-label tile next to the CPU score, with a hover hint explaining the bucket semantics.
Admin cluster overview / hub nodes index — bucket pill in the ServerScore column.
Public node listings — only the raw chainweb score is visible, NOT the bucket label (audited and locked with a regression test).

Operator notes

No forced re-bench. Existing bench rows survive the migration with NULL on the new chainweb fields → legacy formula path keeps scoring them correctly. Operators who want the new bucket label run a fresh bench.
First re-bench surfaces the IPC fix. Boxes that were silently failing perf-stat capture before will now report real IPC values, potentially shifting the chainweb score by a few percent.
Calibrated baselines, not absolute thresholds. The seven baselines (sysbench, IPC, plus six chainweb workloads) are starting points calibrated against a reference modern desktop. The v.G.1.3 spec ships a runtime-configurable getBaseline(key) mechanism so recalibration is possible without redeploy via the system_state table.
REQ-11 reduced contention penalty. The IPC term and the six chainweb workload terms are single-thread by design; the contention penalty applied to them uses only steal- time + run-variance (drops the multi-thread scaling-efficiency component). The legacy multi-thread sysbench term keeps the full penalty.

Out of scope (deliberately)

No Geekbench / PassMark calibration. Early discussion explored translating the existing sysbench number into Geekbench/PassMark units. The operator explicitly rejected this mid-discovery: the goal is to measure chainweb's hot path directly, not to translate a generic CPU number into another generic CPU number.
No 7-Zip LZMA proxy and no Pact-evaluation runtime bench. Both were considered but deferred to later releases — the six workloads above cover chainweb's actual hot path; adding more would push the bench-time budget without proportionate signal gain.
No forced operator action. Re-benching is encouraged but never blocking.

Background — why "Antikythera"

The Antikythera mechanism is the world's first known analog computer. Constructed around 100 BCE, recovered in 1901 from a Greek shipwreck off the island of Antikythera, it used some 30 bronze gears to predict astronomical positions, eclipse cycles, and the timing of the ancient Olympic games — with a precision that wasn't matched until the mechanical clocks of ~1500 years later.

It is also literally the first device known to have measured computational behaviour: its gear-train computed celestial mechanics with the same kind of precision-against-a-known-reference contract that the v.G.1.3 chainweb bench applies to CPU hot paths. Multiple workloads, blended into a single output, calibrated against a reference. Two thousand years apart, the same underlying idea.

It also has an uncomfortable echo for this spec: the mechanism's true capability was hidden for centuries before being properly understood — not unlike the perf-stat IPC capture that v.G.1.3 finally surfaced as having been silently broken since the code shipped.