Corner case observed: fire-reminder writes the run row with
status='pending' UP FRONT (so the Activity tab shows progress
mid-run), then flips to a terminal status once it's done. If the
bot is killed between those two writes (e.g. by a redeploy or crash),
the row sits at 'pending' forever. pg-boss has already marked the job
'completed', so it won't retry. The Activity views and the dashboard
counters then show a "stuck" run that never moves.
sweepStalePendingRuns runs at bot startup, finds any 'pending' run
older than 5 minutes, and:
• Flips the run to 'failed' with a clear error_summary so the UI
stops treating it as in-flight.
• Flips its still-'pending' run_target rows to 'skipped' with the
same reason so per-group counts remain coherent.
The 5-minute floor is generous enough that a run still legitimately
in progress (e.g. during a worker rebalance) isn't marked failed by
mistake.
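A minimal TypeScript sketch of the two-step sweep described above. It
is not the actual implementation. The table and column names (`runs`,
`run_targets`, `status`, `created_at`, `error_summary`, `run_id`), the
reason text, and the injected `QueryFn` (shaped loosely like a
node-postgres `client.query`) are all hypothetical.

```typescript
// Hypothetical query-executor shape: just enough of a Postgres client
// for the sweep, so a fake can stand in for it in tests.
interface QueryResult {
  rowCount: number;
  rows: Array<{ id: string }>;
}
type QueryFn = (sql: string, params: unknown[]) => Promise<QueryResult>;

interface SweepResult {
  runs: number; // runs flipped 'pending' -> 'failed'
  targets: number; // run_target rows flipped 'pending' -> 'skipped'
}

// The 5-minute floor from the commit description.
const STALE_AFTER_MS = 5 * 60 * 1000;

// Hypothetical reason text, shared by both UPDATEs.
const REASON = "Run interrupted before completion (bot restarted); swept at startup.";

async function sweepStalePendingRuns(
  query: QueryFn,
  now: Date = new Date(),
  staleAfterMs: number = STALE_AFTER_MS,
): Promise<SweepResult> {
  const cutoff = new Date(now.getTime() - staleAfterMs);

  // Step 1: fail every run still 'pending' past the cutoff and collect
  // their ids (assumed schema: runs.created_at, runs.error_summary).
  const runs = await query(
    `UPDATE runs
        SET status = 'failed', error_summary = $1
      WHERE status = 'pending' AND created_at < $2
      RETURNING id`,
    [REASON, cutoff],
  );

  // No stale runs: skip the second UPDATE entirely.
  if (runs.rows.length === 0) return { runs: 0, targets: 0 };

  // Step 2: mark those runs' still-pending targets 'skipped' with the
  // same reason so per-group counts stay coherent. This may touch zero
  // rows if a stale run had no pending targets left.
  const targets = await query(
    `UPDATE run_targets
        SET status = 'skipped', error_summary = $1
      WHERE status = 'pending' AND run_id = ANY($2)`,
    [REASON, runs.rows.map((r) => r.id)],
  );

  return { runs: runs.rows.length, targets: targets.rowCount };
}
```

Injecting the query function is what keeps the paths cheap to test: a
fake that records each call can check that the early return skips the
second UPDATE when nothing is stale.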
Tests:
* 4 sweep tests covering: no-stale path skips the second UPDATE;
with-stale path fires both UPDATEs; counts are forwarded; the
edge case where a stale run has zero pending targets.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>