Records the design decisions for the next planned work:
- Per-reminder delivery window (default 6am–6pm, operator timezone).
Window-close hard-stops the run; remaining targets become
skipped; status reports as partial with a clear "this account is
at capacity, consider another paired account" message.
- Per-account isolation via pg-boss teamSize ≥ N + an in-process
PerKeyMutex keyed by accountId. Different accounts run in
parallel; the same account serialises (no double-rate sends
that would risk a ban).
- Per-account token-bucket rate limiter (default 40 msg/min,
BOT_MAX_SEND_PER_MINUTE).
- Up-front media-upload cache via prepareWAMessageMedia: 1000
groups × 5 MB upload turns into 5 MB. Biggest single win for
text+picture reminders.
- Bounded group concurrency (default 3 in-flight per account);
parts-within-a-group stay serial for visible message order.
- Pre-fetched DB Maps (groups / messages / media), no inner-loop
round-trips.
- Replaces the rigid 1.5 s inter-part sleep with 200–500 ms
jitter; the per-account rate-limiter is the real gate.
Out of scope for v1 (documented under "v2 candidates"): cross-day
window resume, mid-restart resumability, multi-account auto-split,
adaptive rate-limit back-off, pause/resume mid-run.
Acceptance: 1000-group reminder + one image, established account
finishes in ~30–50 minutes, well inside a 6am–6pm window. Two
reminders on different accounts at the same wall-clock minute
both progress in parallel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Windowed, Pacing-Safe Reminder Fan-Out
Design spec for a faster, ban-safe, multi-account-friendly reminder delivery loop. Written 2026-05-10. Implementation tracked in a follow-up plan doc.
Goal
Deliver a reminder to many groups (target: 1000+) safely within a
per-reminder delivery window. If we cannot finish in the window, stop,
mark the run partial, and tell the operator the account is at
capacity for this fan-out.
Constraints
- WhatsApp's anti-spam is the dominant ceiling. For an established account (years of legit history), 30–60 sends/minute is the sustainable safe band; tighter for newer accounts.
- The system runs on a single bot process talking to multiple paired WhatsApp accounts. Each account's Baileys socket is independent.
- Two simultaneous fan-outs on the same WhatsApp account would double its effective send rate and risk a ban.
- The operator dropped multi-account fan-out (one reminder splitting across N accounts) earlier this week. We respect that decision — this design does not automatically split work across accounts.
Approach (selected: B)
A. Minimal pacing fix. Drop the rigid 1.5s sleep, add a token-bucket rate limiter, add window-end check, cache DB lookups. Wins ≈30% on text-only reminders; very little on media-heavy ones.
B. Pacing + media-upload cache + bounded concurrency. Everything
in A, plus upload each unique media file ONCE per run via Baileys'
prepareWAMessageMedia and reuse the resulting WAMediaUpload
payload for every group send. Run up to N groups in parallel within
one account (parts within a group stay serial so order is preserved).
Wins are massive on text + picture: 1000 groups × 5 MB = 5 GB of
upload turns into 5 MB. Recommended.
C. Multi-account fan-out — dropped per operator decision.
Per-account isolation (cross-account parallelism)
Today boss.work() is called with default teamSize=1, so a single
fan-out monopolises the whole bot. Two reminders on different
accounts queue serially, which surprises the operator.
The new model is per-account serialization, cross-account parallelism:
- teamSize raised so multiple reminders on different accounts run simultaneously.
- A per-key async mutex keyed by accountId wraps the inner work, so two reminders on the same account take turns.
- The token-bucket rate limiter is per-account too, so one account's pacing budget never throttles another.
pg-boss worker pool (teamSize = BOT_FIRE_CONCURRENCY, default 8)
├─ R1 (account A) ──┐
│ ├─ per-account-A mutex ──→ serialised within A
├─ R3 (account A) ──┘
│
├─ R2 (account B) ──── per-account-B mutex ──→ parallel with A's
│
└─ R4 (account C) ──── per-account-C mutex ──→ parallel with A and B
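A minimal sketch of the per-key mutex idea, assuming a promise-chain implementation. The class name PerKeyMutex comes from this spec; the method name runExclusive and the size getter are hypothetical and only illustrate the intended behaviour (same key serialises, different keys run in parallel, a throwing task still releases, empty entries are cleaned up).

```typescript
// Illustrative sketch only; the real per-key-mutex.ts may look different.
class PerKeyMutex {
  // For each key in use, the promise that resolves when the latest holder releases.
  private readonly tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const gate = new Promise<void>((resolve) => {
      release = resolve;
    });
    // Anyone arriving after us waits for everything before us plus our own gate.
    const tail = previous.then(() => gate);
    this.tails.set(key, tail);

    await previous; // wait for earlier holders of the same key
    try {
      return await task();
    } finally {
      release(); // runs even when the task throws
      // Nobody queued behind us: drop the entry so the map does not grow.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }

  // Number of keys that currently have a holder or waiters.
  get size(): number {
    return this.tails.size;
  }
}
```

In the worker, the fan-out for a reminder would be wrapped as something like `mutex.runExclusive(accountId, () => fireReminder(job))`, where `fireReminder` stands in for the existing handler.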
Delivery window
Each reminder gets a window in its operator timezone. If the run cannot finish inside the window, send what we can and stop.
- New columns on reminders:
  - delivery_window_start_hour int default 6
  - delivery_window_end_hour int default 18
  - Both interpreted in the row's existing timezone column.
- Validation: 0 ≤ start < end ≤ 24. Cross-midnight windows (e.g. 22 → 06) are rejected in v1 to keep the math obvious; can be added later if anyone needs them.
- UI uses two number inputs in the When step (and edit-when page).
- delivery-window.ts exports a pure helper: windowEndAt(timezone, endHour, fireAt) → Date. Returns the end-of-window timestamp for the calendar day fireAt falls on, in the given timezone. If fireAt is already past that day's end-hour, the returned timestamp is in the past: the run loop's first iteration sees now() >= windowEndAt, marks every target skipped, and the run resolves to failed (zero sent). That's the right behaviour: "we can't send after window close, even one message".
- Only the end hour is enforced at runtime in v1. The start hour is documented on the row but not gated; operators schedule fire times that fall in their band naturally (cron + the picker's default 09:00 time fields land inside 06–18). Enforcing the start too would mean holding messages from a 4am cron misfire until 6am, which is a v2 conversation.
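The helper and the validation rule above could be sketched as follows. This is one possible implementation using only the built-in Intl.DateTimeFormat, not the actual delivery-window.ts; isValidWindow and zoneOffsetMs are hypothetical names introduced for the illustration.

```typescript
// Illustrative sketch only; assumes IANA timezone names such as "Asia/Kuala_Lumpur".

// Mirrors the v1 rule 0 ≤ start < end ≤ 24 (cross-midnight windows are rejected).
function isValidWindow(startHour: number, endHour: number): boolean {
  return (
    Number.isInteger(startHour) &&
    Number.isInteger(endHour) &&
    0 <= startHour &&
    startHour < endHour &&
    endHour <= 24
  );
}

// Offset in ms between wall-clock time in `timeZone` and UTC at the given instant.
function zoneOffsetMs(instantMs: number, timeZone: string): number {
  const whole = Math.floor(instantMs / 1000) * 1000; // the formatter has no ms field
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit",
  }).formatToParts(new Date(whole));
  const get = (type: string): number =>
    Number(parts.find((p) => p.type === type)?.value ?? "0");
  // Some engines print midnight as hour 24 when hour12 is false, hence % 24.
  const wall = Date.UTC(
    get("year"),
    get("month") - 1,
    get("day"),
    get("hour") % 24,
    get("minute"),
    get("second"),
  );
  return wall - whole;
}

// End of the delivery window on the calendar day `fireAt` falls on, in `timeZone`.
// May be earlier than `fireAt`; the caller treats that as "window already closed".
function windowEndAt(timeZone: string, endHour: number, fireAt: Date): Date {
  const offsetAtFire = zoneOffsetMs(fireAt.getTime(), timeZone);
  const local = new Date(fireAt.getTime() + offsetAtFire);
  // endHour = 24 rolls over to the next local midnight.
  const targetWall = Date.UTC(
    local.getUTCFullYear(),
    local.getUTCMonth(),
    local.getUTCDate(),
    endHour,
  );
  // Re-read the offset near the target in case a DST change falls in between.
  const firstGuess = targetWall - offsetAtFire;
  return new Date(targetWall - zoneOffsetMs(firstGuess, timeZone));
}
```

For a reminder in Asia/Kuala_Lumpur (UTC+8) firing at 09:00 local with endHour 18, this returns 18:00 local the same day, which is 10:00 UTC.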
Run loop changes (fire-reminder.ts)
Up-front, once per run:
- Load all reminder.targets, reminder.messages, and referenced media_files rows into in-memory Maps. Drops ~3000 round-trips to ~3 round-trips for a 1000-group run.
- Pre-create every reminder_run_targets row with status = "pending" so progress is observable from the Activity tab while the fan-out is mid-flight.
- Pre-upload each unique media via Baileys' prepareWAMessageMedia. Cache the resulting WAMediaUpload payload keyed by mediaId for the duration of the run.
- Compute windowEndAt and stash it.
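The upload-once behaviour can be sketched as a small promise cache. To avoid guessing at Baileys' exact types, the uploader is injected as a plain function and the payload type is left generic; in the bot, that function would be the piece that calls prepareWAMessageMedia. The class and method names here are hypothetical.

```typescript
// Illustrative sketch only; the real media-upload-cache.ts may differ.
class MediaUploadCache<P> {
  // Stores the promise, not the result, so concurrent callers share one upload.
  private readonly pending = new Map<string, Promise<P>>();
  private readonly upload: (mediaId: string) => Promise<P>;

  constructor(upload: (mediaId: string) => Promise<P>) {
    this.upload = upload;
  }

  get(mediaId: string): Promise<P> {
    const cached = this.pending.get(mediaId);
    if (cached) return cached;

    const started = this.upload(mediaId);
    this.pending.set(mediaId, started);
    // Assumption: a failed upload should not poison the whole run,
    // so the entry is evicted and a later group can retry.
    started.catch(() => {
      this.pending.delete(mediaId);
    });
    return started;
  }
}
```

Caching the promise rather than the resolved payload matters once groups run in parallel: three groups asking for the same mediaId at the same moment still trigger a single upload.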
Per-target (limited to BOT_GROUP_CONCURRENCY parallel groups,
default 3):
- Window-end gate: if Date.now() >= windowEndAt, mark the target skipped with error="delivery window closed" and skip.
- Already-sent gate: if the run-target row is already sent (i.e. a retry is replaying), skip.
- Acquire a token from the per-account rate limiter (default 40 msg/min, configurable BOT_MAX_SEND_PER_MINUTE).
- assertSessions(group): call once per group, cache for the run.
- For each part in reminder.messages:
  - text → socket.sendMessage(jid, { text })
  - media → socket.sendMessage(jid, uploadedMediaCache[mediaId])
  - sleep jitter(200..500 ms) between parts (replaces the rigid 1.5 s wait; preserves per-chat ordering at WA's natural pace).
- Update the run-target row to sent with latency.
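A rough sketch of how the per-target steps could be driven with bounded concurrency. It covers the window-end gate, the rate-limiter acquire, and the in-flight cap; the already-sent gate and database writes are left out for brevity. fanOut and FanOutDeps are hypothetical names, and sendGroup stands in for "assertSessions plus all parts sent serially with jitter".

```typescript
// Illustrative sketch only; not the actual fire-reminder.ts loop.
type TargetStatus = "sent" | "skipped" | "failed";

interface FanOutDeps {
  windowEndAt: Date;
  concurrency: number; // BOT_GROUP_CONCURRENCY, default 3
  now: () => number; // injectable clock, e.g. Date.now
  acquireToken: () => Promise<void>; // per-account rate limiter
  sendGroup: (groupId: string) => Promise<void>; // sends every part, in order
}

async function fanOut(
  groupIds: readonly string[],
  deps: FanOutDeps,
): Promise<Map<string, TargetStatus>> {
  const results = new Map<string, TargetStatus>();
  let next = 0; // shared cursor; safe because JS runs one callback at a time

  const worker = async (): Promise<void> => {
    while (next < groupIds.length) {
      const groupId = groupIds[next++];
      // Window-end gate: once closed, every remaining target is skipped.
      if (deps.now() >= deps.windowEndAt.getTime()) {
        results.set(groupId, "skipped");
        continue;
      }
      await deps.acquireToken();
      try {
        await deps.sendGroup(groupId);
        results.set(groupId, "sent");
      } catch {
        results.set(groupId, "failed");
      }
    }
  };

  // Start `concurrency` workers that pull from the same cursor.
  const workers = Array.from({ length: Math.max(1, deps.concurrency) }, worker);
  await Promise.all(workers);
  return results;
}
```

A worker-pool shape is used here rather than chunking the list into batches of N, so a slow group never stalls the other slots.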
Final status:
- success: every target sent.
- partial: at least one sent, at least one not (window-close, failed, missing group). error_summary reads: "Delivery window closed at 18:00 (Asia/Kuala_Lumpur). 412 of 1000 groups delivered. This account is at capacity for this fan-out — consider sending the remainder from another paired account."
- failed: zero sent.
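The status rule above reduces to a small pure function. The sketch below is an illustration with hypothetical names (resolveRunStatus, RunOutcome); treating a run with zero targets as failed is an assumption that follows from "zero sent", not something the spec states.

```typescript
// Illustrative sketch only.
type TargetStatus = "pending" | "sent" | "skipped" | "failed";
type RunStatus = "success" | "partial" | "failed";

interface RunOutcome {
  status: RunStatus;
  sent: number;
  total: number;
}

function resolveRunStatus(targets: readonly TargetStatus[]): RunOutcome {
  const total = targets.length;
  const sent = targets.filter((s) => s === "sent").length;
  // Zero sent is always "failed" (assumption: this includes an empty target list).
  const status: RunStatus =
    sent === 0 ? "failed" : sent === total ? "success" : "partial";
  return { status, sent, total };
}
```

The returned sent and total counts are the same numbers the partial-status summary needs for its "412 of 1000 groups delivered" sentence.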
Notification body
The existing reminder.fired SSE event already carries
{ status }. The web's notification mapper already handles
partial with a "see activity" hint. The body extends to mention
"X of Y delivered" when status === "partial".
Components
| File | Role | LOC est. |
|---|---|---|
| migrations/0008_*.sql | add 2 int columns to reminders | <20 |
| packages/db/src/schema.ts | drizzle alignment | <10 |
| apps/bot/src/scheduler/per-key-mutex.ts (new) | accountId-keyed async mutex | ~40 |
| apps/bot/src/scheduler/rate-limiter.ts (new) | per-account token bucket | ~60 |
| apps/bot/src/scheduler/media-upload-cache.ts (new) | prepareWAMessageMedia results, keyed by mediaId | ~50 |
| apps/bot/src/scheduler/delivery-window.ts (new) | pure window-end calculator | ~30 |
| apps/bot/src/scheduler/fire-reminder.ts (rewrite) | new loop using all of the above | ~200 |
| apps/bot/src/scheduler/reminder-jobs.ts | teamSize config | <10 |
| apps/bot/src/env.ts | BOT_FIRE_CONCURRENCY, BOT_MAX_SEND_PER_MINUTE, BOT_GROUP_CONCURRENCY | <20 |
| apps/web/src/actions/reminders.ts | accept the two new fields | <30 |
| apps/web/src/components/reminder-wizard/when-form-client.tsx | "Delivery hours" inputs | <40 |
| apps/web/src/components/reminder-edit/edit-when-form.tsx | same | <30 |
| apps/web/src/lib/notifications.ts | partial-status body extension | <15 |
Tests
- delivery-window.test.ts: pure function. fireAt past that day's end hour → returned end is in the past; cross-midnight windows (start > end) are explicitly rejected in the schema; timezone offsets handled correctly.
- rate-limiter.test.ts: fake-clock token bucket. N tokens drained, then refill rate; backpressure via acquire() returning a Promise.
- per-key-mutex.test.ts: different keys do NOT block each other (parallelism); same key DOES (serialisation); a throwing handler releases the lock; cleanup removes empty entries.
- media-upload-cache.test.ts: mock socket; prepare called once per unique mediaId regardless of how many groups consume it.
- fire-reminder.test.ts (extend): window-end gate marks remaining targets skipped; partial-status error_summary includes account / delivered / total context.
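To make the fake-clock idea concrete, here is a minimal token-bucket sketch with an injectable clock. The Clock interface, the TokenBucket name, and the separate capacity parameter are assumptions for illustration; the spec only fixes the default rate (40 msg/min) and that acquire() returns a Promise.

```typescript
// Illustrative sketch only; the real rate-limiter.ts may differ.
interface Clock {
  now(): number; // ms
  sleep(ms: number): Promise<void>;
}

class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly perMinute: number;
  private readonly capacity: number;
  private readonly clock: Clock;
  // Waiters are chained so they are served in arrival order.
  private queue: Promise<void> = Promise.resolve();

  constructor(perMinute: number, capacity: number, clock: Clock) {
    this.perMinute = perMinute;
    this.capacity = capacity;
    this.clock = clock;
    this.tokens = capacity; // a fresh bucket starts full
    this.lastRefill = clock.now();
  }

  // Resolves once a token has been taken; callers simply await it (backpressure).
  acquire(): Promise<void> {
    const turn = this.queue.then(() => this.take());
    this.queue = turn;
    return turn;
  }

  private async take(): Promise<void> {
    this.refill();
    if (this.tokens < 1) {
      const msPerToken = 60000 / this.perMinute;
      await this.clock.sleep(Math.ceil((1 - this.tokens) * msPerToken));
      this.refill();
    }
    this.tokens -= 1;
  }

  private refill(): void {
    const now = this.clock.now();
    const earned = ((now - this.lastRefill) / 60000) * this.perMinute;
    this.tokens = Math.min(this.capacity, this.tokens + earned);
    this.lastRefill = now;
  }
}
```

In a test, the clock can be a plain object whose sleep() just advances a counter, so draining 40 tokens and waiting for the 41st takes no real time. In production the clock would wrap Date.now and setTimeout.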
Tuning knobs (env)
| Var | Default | Effect |
|---|---|---|
| BOT_FIRE_CONCURRENCY | 8 | pg-boss worker pool size; max accounts running simultaneously |
| BOT_GROUP_CONCURRENCY | 3 | per-account parallel group sends |
| BOT_MAX_SEND_PER_MINUTE | 40 | per-account token-bucket rate; loosen to 60 if no flags after weeks of running, tighten to 20 if any rate-limit response |
Per-reminder delivery_window_start_hour / delivery_window_end_hour
default to 6/18 and can be widened (e.g. 0/24) for a specific big run.
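One way the three env knobs might be read with their defaults is sketched below. The function and interface names are hypothetical, and rejecting non-positive or non-integer values is an assumption; the spec does not say how env.ts validates them.

```typescript
// Illustrative sketch only; the real apps/bot/src/env.ts may use a schema library.
interface BotTuning {
  fireConcurrency: number; // BOT_FIRE_CONCURRENCY
  groupConcurrency: number; // BOT_GROUP_CONCURRENCY
  maxSendPerMinute: number; // BOT_MAX_SEND_PER_MINUTE
}

type EnvLike = Record<string, string | undefined>;

function readPositiveInt(env: EnvLike, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 1) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadBotTuning(env: EnvLike): BotTuning {
  return {
    fireConcurrency: readPositiveInt(env, "BOT_FIRE_CONCURRENCY", 8),
    groupConcurrency: readPositiveInt(env, "BOT_GROUP_CONCURRENCY", 3),
    maxSendPerMinute: readPositiveInt(env, "BOT_MAX_SEND_PER_MINUTE", 40),
  };
}
```

At startup this would be called once as `loadBotTuning(process.env)`, so a typo such as BOT_MAX_SEND_PER_MINUTE=fast fails loudly instead of silently disabling the limiter.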
Out of scope (v2 candidates)
- Crash resumability across bot restarts. Today, if the bot dies mid-fan-out, pg-boss will retry the job; the loop will skip any rows already marked sent, but the in-memory rate-limiter and upload-cache state are gone, meaning the retry uploads media again and starts pacing from a full bucket. Acceptable for v1.
- Pause / resume mid-run controls.
- Cross-day window resume (current design hard-stops at window end and reports partial; doesn't queue the remainder for tomorrow).
- Multi-account auto-split of a single reminder.
- Adaptive rate limiting (auto-back-off on WA rate-limit response codes; today the operator tunes the env var).
Acceptance
- 1000-group reminder with one image, established account: completes in roughly 30–50 minutes, comfortably inside a 6am–6pm window.
- Two reminders on different accounts firing within seconds of each other: both progress simultaneously, neither blocks the other.
- A run that hits the window end: stops cleanly, marks remaining as skipped, surfaces the partial-status message in the Activity tab and via the browser notification.
- 355 existing tests still pass; ≈25 new tests cover the new helpers.