Chapter 7

Jobs & the worker

Background execution, parallelism, leases, SSH pool.

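Background work (SSH probes, benchmarks, installs, backups, certbot obtains, seed refreshes, scoring ticks, tip polls) runs in the worker process. Most of it goes through a single job queue, where every job is resumable, audit-logged, and visible on /admin/jobs. A few periodic tasks, such as the scoring tick and the tip poller, run directly in the worker loop instead (see Background ticks below).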

Worker lease

Only one worker executes jobs at a time. That worker (the leader) holds a lease stored in worker_lease in the SQLite DB and renews it every ~5 seconds. If the leader disappears (crash, deploy), it stops renewing; once the lease goes stale, a standby worker acquires it and carries on. Multiple worker instances can stand by without conflict.
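The acquire-or-renew rule can be sketched as below. This is a minimal illustration, assuming the lease is a single row holding a holder id and an expiry timestamp; the `LeaseRow`/`LeaseTable` shapes, the `tryAcquire` name, and the 15 s TTL (three missed renewals) are hypothetical, since the real worker_lease schema and timeout are not documented here.

```typescript
// Hypothetical shape of the worker_lease row (an assumption, not the real schema).
interface LeaseRow {
  holder: string;    // id of the worker holding the lease
  expiresAt: number; // epoch ms after which the lease counts as stale
}

// In-memory stand-in for the single-row table; null means nobody has taken it yet.
interface LeaseTable {
  row: LeaseRow | null;
}

const RENEW_EVERY_MS = 5_000;            // leader renews roughly every 5 s
const LEASE_TTL_MS = 3 * RENEW_EVERY_MS; // assumed TTL: ~3 missed renewals = stale

// Acquire-or-renew in one step. Succeeds if the lease is free or stale, or if
// `workerId` already holds it; returns true when the caller is (still) leader.
// Against SQLite this check-and-set would need to run inside one transaction.
function tryAcquire(table: LeaseTable, workerId: string, now: number): boolean {
  const row = table.row;
  const available = row === null || row.expiresAt <= now;
  const ours = row !== null && row.holder === workerId;
  if (!available && !ours) return false; // another worker holds a live lease
  table.row = { holder: workerId, expiresAt: now + LEASE_TTL_MS };
  return true;
}

// Example timeline: A leads, B stands by, A crashes after its 5 s renewal.
const lease: LeaseTable = { row: null };
console.log(tryAcquire(lease, "A", 0));      // true  (lease was free)
console.log(tryAcquire(lease, "B", 1_000));  // false (A's lease is live)
console.log(tryAcquire(lease, "A", 5_000));  // true  (renewal; now expires at 20 s)
console.log(tryAcquire(lease, "B", 20_000)); // true  (A stopped renewing)
```

Because every worker calls the same function on the same cadence, standbys need no coordination beyond the shared row: they simply keep failing until the leader's expiry passes.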

Parallel job execution

Inside the leader, up to MAX_PARALLEL_JOBS = 32 jobs run concurrently. A per-node lock stops two jobs from touching the same target at once, and per-kind caps stop one kind (e.g. backup-stoachain) from saturating every slot. As a result, different nodes make independent progress: a 6-minute benchmark on node A doesn’t block a quick probe on node B.
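The three limits combine into one admission check per scheduling pass. The sketch below picks which queued jobs may start, given what is already running: a global cap of 32, at most one running job per node, and a per-kind ceiling. The `Job` shape, the `pickRunnable` name, and the cap of 4 for backup-stoachain are illustrative assumptions, not the worker's real code or settings.

```typescript
interface Job {
  id: number;
  kind: string;   // e.g. "backup-stoachain"
  nodeId: string; // target node; the per-node lock is keyed on this
}

const MAX_PARALLEL_JOBS = 32;
// Hypothetical per-kind caps (illustrative value); kinds not listed here are
// bounded only by the global cap.
const KIND_CAPS = new Map<string, number>([["backup-stoachain", 4]]);

// Given the running set, return the queued jobs (in queue order) that may start now.
// Jobs that are skipped simply stay queued for the next pass.
function pickRunnable(queued: Job[], running: Job[]): Job[] {
  const busyNodes = new Set(running.map((j) => j.nodeId));
  const perKind = new Map<string, number>();
  for (const j of running) perKind.set(j.kind, (perKind.get(j.kind) ?? 0) + 1);

  const picked: Job[] = [];
  let freeSlots = MAX_PARALLEL_JOBS - running.length;
  for (const job of queued) {
    if (freeSlots <= 0) break;               // global cap reached
    if (busyNodes.has(job.nodeId)) continue; // node locked by another job
    const count = perKind.get(job.kind) ?? 0;
    if (count >= (KIND_CAPS.get(job.kind) ?? Infinity)) continue; // kind at its cap
    picked.push(job);
    busyNodes.add(job.nodeId); // take the node lock
    perKind.set(job.kind, count + 1);
    freeSlots--;
  }
  return picked;
}
```

A long benchmark on node A only removes node A and one global slot from consideration, which is why a probe on node B can still start in the same pass.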

Job kinds

Handlers are registered in lib/handlers/registry. There are currently 15 kinds:

  • apt-upgrade
  • backup-stoachain
  • benchmark-node
  • drive-benchmark
  • netdata-install
  • node-test
  • peer-trust-reset
  • seed-refresh
  • stoachain-cert-rotate
  • stoachain-certbot-obtain
  • stoachain-control
  • stoachain-convert-supervision
  • stoachain-install
  • stoachain-reseed
  • system-probe
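One way to keep the kind list and the registry in sync is a string-literal union plus a map the compiler checks for completeness. This is a sketch only: the `JobContext` and `Handler` types, the `stub` factory, and `dispatch` are hypothetical names and do not describe what lib/handlers/registry actually contains.

```typescript
// The 15 kinds listed above, kept as a readonly tuple so TypeScript can derive a union.
const JOB_KINDS = [
  "apt-upgrade", "backup-stoachain", "benchmark-node", "drive-benchmark",
  "netdata-install", "node-test", "peer-trust-reset", "seed-refresh",
  "stoachain-cert-rotate", "stoachain-certbot-obtain", "stoachain-control",
  "stoachain-convert-supervision", "stoachain-install", "stoachain-reseed",
  "system-probe",
] as const;
type JobKind = (typeof JOB_KINDS)[number];

// Hypothetical handler contract: receives the job, resolves when the work is done.
interface JobContext {
  jobId: number;
  nodeId: string;
}
type Handler = (ctx: JobContext) => Promise<void>;

// Placeholder handlers for the sketch; real ones would do the SSH work.
const stub = (kind: JobKind): Handler => async (ctx) => {
  console.log(`[${kind}] job ${ctx.jobId} on ${ctx.nodeId}`);
};

// Record<JobKind, Handler> turns a missing or misspelled kind into a compile error.
const registry: Record<JobKind, Handler> = {
  "apt-upgrade": stub("apt-upgrade"),
  "backup-stoachain": stub("backup-stoachain"),
  "benchmark-node": stub("benchmark-node"),
  "drive-benchmark": stub("drive-benchmark"),
  "netdata-install": stub("netdata-install"),
  "node-test": stub("node-test"),
  "peer-trust-reset": stub("peer-trust-reset"),
  "seed-refresh": stub("seed-refresh"),
  "stoachain-cert-rotate": stub("stoachain-cert-rotate"),
  "stoachain-certbot-obtain": stub("stoachain-certbot-obtain"),
  "stoachain-control": stub("stoachain-control"),
  "stoachain-convert-supervision": stub("stoachain-convert-supervision"),
  "stoachain-install": stub("stoachain-install"),
  "stoachain-reseed": stub("stoachain-reseed"),
  "system-probe": stub("system-probe"),
};

// Narrow an untrusted string (e.g. a kind read back from storage) to a known kind.
function isJobKind(s: string): s is JobKind {
  return (JOB_KINDS as readonly string[]).includes(s);
}

// Look up and run the handler; unknown kinds are rejected rather than ignored.
async function dispatch(kind: string, ctx: JobContext): Promise<void> {
  if (!isJobKind(kind)) throw new Error(`unknown job kind: ${kind}`);
  await registry[kind](ctx);
}
```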

Background ticks (not job-queued)

Some work isn’t queued; it runs directly in the worker loop on throttled cadences:

  • Chainweb tip poller: every 30 s with 8-way concurrency, SSH-probes every node’s cut height and writes it to node_chainweb_tip.
  • Scoring tick: every 60 s, runs the 7-gate eligibility engine for every node.
  • Rich-list materialisation: hourly, refreshes the rich_list_mv table.
  • Daily integer mint: once per UTC day at 06:00, sweeps Current → Redeemed for every account.
  • Hub-scores nightly backup: once per UTC day, after 03:00.
  • Job sweep: hourly, drops completed jobs older than 30 days.
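The cadences above reduce to two checks the loop can make on every iteration: has a fixed period elapsed since the last run, and has a once-per-UTC-day task not yet run today at or after its hour. The sketch below assumes last-run times are tracked in memory; `TickState`, `dueTicks`, and the tick names are hypothetical, and treating "at 06:00" as "at or after 06:00" (so a worker that was down at 06:00 catches up later that day) is an assumption.

```typescript
// Interval ticks: tick name mapped to its period in ms (cadences from the list above).
const INTERVAL_TICKS = new Map<string, number>([
  ["tip-poll", 30_000],
  ["scoring", 60_000],
  ["rich-list", 3_600_000],
  ["job-sweep", 3_600_000],
]);

// Daily ticks: tick name mapped to the earliest UTC hour at which it may run.
const DAILY_TICKS = new Map<string, number>([
  ["integer-mint", 6],
  ["hub-scores-backup", 3],
]);

interface TickState {
  lastRun: Map<string, number>; // epoch ms of each interval tick's last run
  lastDay: Map<string, string>; // UTC date ("YYYY-MM-DD") of each daily tick's last run
}

function utcDay(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// Called on every loop iteration: returns the ticks due now and marks them as run.
function dueTicks(state: TickState, now: number): string[] {
  const due: string[] = [];
  for (const [name, periodMs] of INTERVAL_TICKS) {
    const last = state.lastRun.get(name);
    if (last === undefined || now - last >= periodMs) {
      due.push(name);
      state.lastRun.set(name, now);
    }
  }
  const today = utcDay(now);
  const hour = new Date(now).getUTCHours();
  for (const [name, hourUtc] of DAILY_TICKS) {
    if (hour >= hourUtc && state.lastDay.get(name) !== today) {
      due.push(name);
      state.lastDay.set(name, today);
    }
  }
  return due;
}
```

Keying daily ticks by UTC date rather than by elapsed time keeps them pinned to the calendar, so a restart never causes a second mint on the same day as long as the date is persisted.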

SSH connection pool

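As of v0.7.8z14, runRemote is backed by a connection pool keyed by user@host:port. A connection is closed after 5 min idle or once it is 1 h old, whichever comes first; a reaper checks for such connections every 60 s. Because ssh2 multiplexes channels over a single connection, concurrent exec calls to the same host share one connection instead of each paying for its own handshake.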
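The pool's bookkeeping can be sketched without the ssh2 dependency by hiding the connection behind a `connect` factory. `Conn`, `SshPool`, `acquire`/`release`, and `reap` are hypothetical names, and the in-flight counter (so the reaper never closes a connection under a running exec) is an assumption about how the real pool behaves; error handling and reconnect-on-failure are omitted.

```typescript
// Minimal stand-in for an ssh2 Client; the pool only needs to be able to close it.
interface Conn { end(): void }

interface Entry {
  conn: Conn;
  createdAt: number;  // for the 1 h max-age limit
  lastUsedAt: number; // for the 5 min idle TTL
  inFlight: number;   // exec calls currently sharing this connection
  retired: boolean;   // aged out; close once the last user releases it
}

const IDLE_TTL_MS = 5 * 60_000;
const MAX_AGE_MS = 60 * 60_000;
const REAP_EVERY_MS = 60_000;

class SshPool {
  private entries = new Map<string, Entry>();
  private readonly connect: (key: string) => Conn;
  private readonly now: () => number;

  constructor(connect: (key: string) => Conn, now: () => number = Date.now) {
    this.connect = connect;
    this.now = now;
  }

  // Borrow the shared connection for user@host:port; call release() when done.
  acquire(user: string, host: string, port = 22): { conn: Conn; release: () => void } {
    const key = `${user}@${host}:${port}`;
    const t = this.now();
    let e = this.entries.get(key);
    if (e && t - e.createdAt >= MAX_AGE_MS) {
      this.retire(key, e); // too old: stop handing it out
      e = undefined;
    }
    if (!e) {
      e = { conn: this.connect(key), createdAt: t, lastUsedAt: t, inFlight: 0, retired: false };
      this.entries.set(key, e);
    }
    const entry = e;
    entry.inFlight++;
    entry.lastUsedAt = t;
    let released = false;
    return {
      conn: entry.conn,
      release: () => {
        if (released) return; // idempotent
        released = true;
        entry.inFlight--;
        entry.lastUsedAt = this.now();
        if (entry.retired && entry.inFlight === 0) entry.conn.end();
      },
    };
  }

  // Close idle or over-age connections; ones with calls in flight are left alone.
  reap(): number {
    const t = this.now();
    let closed = 0;
    for (const [key, e] of this.entries) {
      if (e.inFlight > 0) continue;
      if (t - e.lastUsedAt >= IDLE_TTL_MS || t - e.createdAt >= MAX_AGE_MS) {
        this.retire(key, e);
        closed++;
      }
    }
    return closed;
  }

  // Run reap() on the 60 s interval; the caller should clearInterval() on shutdown.
  startReaper(): ReturnType<typeof setInterval> {
    return setInterval(() => this.reap(), REAP_EVERY_MS);
  }

  get size(): number {
    return this.entries.size;
  }

  private retire(key: string, e: Entry): void {
    this.entries.delete(key);
    e.retired = true;
    if (e.inFlight === 0) e.conn.end();
  }
}
```

Injecting `now` keeps the TTL and max-age rules testable without real timers or real hosts.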

Where this grows

T3 (≥ 350 active nodes) adds composite indexes on the hot paths and moves job logs out of the DB into flat files. T4 swaps the in-process queue for BullMQ and the SQLite state for Postgres. See §11 Scaling plan.