cm_whatsapp_bot_v1

Author	SHA1	Message	Date
yiekheng	2e6fbfa7a5	fix(bot): reschedule was silently dropped under stately policy Scheduled reminder for May 10 8:20 PM never fired. Bot logs showed "reminder.fire: scheduled" with jobId: null at 12:18 UTC — pg-boss returned null because the queue was on policy=stately, which dedupes sends across the (created/active/retry) state cone by singletonKey. A previous schedule for the same reminder (next recurring fire, created earlier) was still in 'created' state, so the new send for today 8:20 PM hit the dedupe and was silently rejected. Two fixes: 1. Switch the queue policy back to 'standard' (the default) and force-flip any existing 'stately' queue row on boot. Standard lets us enqueue across reschedules. 2. scheduleReminderFire now does a pre-send cancel: any 'created' job for this singletonKey is moved to 'cancelled' before the new boss.send. The new schedule wins; old stale jobs are tombstoned so the recurring/edit path produces exactly-one upcoming fire. Duplicate-fire safety (the 'qwerd msg three times' bug) is already covered at the handler level by the inner-mutex recent-run check inside fireReminderInner — that's what stately was guarding against, and the inner check works under standard too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:27:52 +08:00
yiekheng	a789b61e1f	fix(bot): triple-fire reminder bug — force pg-boss policy + close TOCTOU dedupe Repro: fire a reminder, message lands 2-3 times in WhatsApp (logs showed three 'fire-reminder: done' entries within 1.5 s for the same reminderId). Two interlocking root causes: 1. The queue was created at 'standard' policy (pre-dating the stately rollout). pg-boss's createQueue is idempotent and DOES NOT update the policy on an existing queue row, so re-deploying the code that requested policy=stately silently kept the standard policy. Standard accepts duplicate enqueues with the same singletonKey — three reminder.fire jobs for the same reminderId could all land at once. 2. The handler-level recent-run dedupe was TOCTOU. The check ran OUTSIDE the per-account mutex, so three concurrent invocations all read 'no recent run', then queued up on the mutex one at a time and each INSERTed a fresh run + sent the message. Fixes: - registerReminderJobs now forces the queue policy via direct SQL (UPDATE pgboss.queue SET policy = 'stately' WHERE name = ... AND policy <> 'stately') on every boot. Idempotent + survives pre-existing standard-policy rows. - fireReminderInner re-checks for a recent run AFTER the mutex is held but BEFORE the INSERT. By that point any concurrent winner has already inserted, so the duplicate sees the row and bails cleanly. New test in fire-reminder.test.ts (the TOCTOU repro): outer check returns no recent run, inner check returns a freshly-inserted one, asserts the mutex was acquired but the second findFirst was hit (i.e. we got past the outer check and the inner check stopped us). Verified live: pgboss.queue.policy is now 'stately' for reminder.fire. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:43:41 +08:00
yiekheng	4cb4015666	fix(bot): dedupe duplicate reminder.fire jobs (msg sent twice) Observed: reminder fired twice within ~2s. The bot logs showed two distinct pg-boss jobIds for the same reminder enqueued at the same scheduledAt — both ran fire-reminder, both sent the message. Root cause: pg-boss's `singletonKey` only deduplicates on queues with a 'singleton' / 'stately' / 'short' policy. Our queue was created without specifying a policy, defaulting to 'standard', which IGNORES the singletonKey. Two sends with the same key produced two jobs. Fix lives at two layers: * Layer 1 — queue policy. createQueue(REMINDER_FIRE_QUEUE) now passes `{ policy: 'stately' }`. With this, future fresh deploys fold a duplicate send (same singletonKey) into the existing 'created' job rather than producing a second one. This doesn't retroactively change an existing queue's policy (pg-boss doesn't support that), but new queues are correct from creation. * Layer 2 — defense-in-depth check inside fireReminder. Before acquiring the per-account mutex, query reminderRuns for any row with the same reminderId fired in the last 30s. If found, log + bail. This guards against: - Existing queues stuck on policy='standard'. - Race windows even within 'stately' policy. - The operator double-clicking Save in the wizard. - A jittery pg_notify('bot.command') replay. Resume jobs (payload.runId set) skip this check — they're meant to attach to an existing run. Tests: * New "BAILS OUT when a fresh fire collides with a recent run" case in fire-reminder.test.ts. * beforeEach now resets findExistingRunMock too, since both the resume and dedupe paths share that mock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 16:41:11 +08:00
yiekheng	376bbe595b	feat(web,bot): resumeReminderRunAction + cancelReminderRunAction Web actions: * resumeReminderRunAction({ runId }) → validates ownership and that the run is in 'paused' state, then publishes a reminder.resume command via pg_notify('bot.command'). The bot's command-consumer picks it up and enqueues a fresh pg-boss job at REMINDER_FIRE_QUEUE carrying { reminderId, runId }; fire-reminder's existing resume branch attaches to the row. * cancelReminderRunAction({ runId }) → flips remaining 'pending' targets to 'skipped' with error="canceled by operator", marks the run 'partial' with a clear errorSummary, and lifts the parent reminder out of 'paused' (recurring → active so the next occurrence fires; one-off → ended). Bot: * New BotCommand variant { type: "reminder.resume"; reminderId; runId } * command-consumer registers handleResumeReminder which calls enqueueReminderResume(boss, reminderId, runId) — a sibling of scheduleReminderFire that posts the job at REMINDER_FIRE_QUEUE with { reminderId, runId } and singletonKey "reminder:resume:<runId>" so the resume doesn't conflict with a future-occurrence schedule. Tests: * reminders.run-actions.test.ts (11 tests) — every guard rail (invalid uuid, missing run, missing reminder, foreign operator, wrong status) and the recurring/one-off lifecycle branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:54:21 +08:00
yiekheng	c9a7e6f089	feat(bot): cross-account parallel + same-account serial fan-out Replaces the single-threaded, 1.5s-sleep-per-part loop with a concurrency model that: * Wraps inner work in PerKeyMutex(accountId) so two reminders on the SAME account take turns (running them concurrently would double the effective send rate and risk a WhatsApp ban). Different accounts run in parallel. * Bumps pg-boss localConcurrency to BOT_FIRE_CONCURRENCY (default 8), so up to 8 different-account reminders can fire simultaneously. * Bulk-loads groups + media in 2 queries (drops ~3000 round-trips to ~3 for a 1000-group run) and pre-creates run_target rows so the Activity tab shows progress mid-run. * Pre-uploads each unique media via MediaUploadCache (one generateWAMessageContent call per mediaId, then relayMessage to every group). For 1000 groups × 5 MB image, this turns 5 GB of upload into 5 MB. * Runs BOT_GROUP_CONCURRENCY (default 3) groups in parallel within one account; parts within a group stay serial so chat order is preserved. * Gates every send on a per-account TokenBucket (BOT_MAX_SEND_PER_MINUTE, default 40). * Replaces the rigid 1.5s inter-part sleep with 200..499 ms jitter. Adds a unit test verifying accountMutex.run is called keyed by accountId for active reminders, and skipped for inactive / missing. Window enforcement, paused/resume, and ETA preview are deferred to later phases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:44:23 +08:00
yiekheng	01eb5752ee	feat(scheduler): add fire-reminder handler + job registration Also fix rrule default-import workaround so the shared package loads correctly under NodeNext ESM resolution (rrule@2.8.1 has no exports field).	2026-05-09 17:29:21 +08:00
yiekheng	113adc7edf	feat(scheduler): add pg-boss client + lifecycle Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 17:19:01 +08:00
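
Sketch for 2e6fbfa7a5 (pre-send cancel). The model below is a toy in-memory queue, not pg-boss: `ToyQueue`, `cancelCreated`, `scheduleFireSketch`, and the `reminder:<id>` key format are all hypothetical names chosen for illustration. It only mirrors the behaviour the commit message describes: a 'stately' queue returns null for a second send while a job with the same singletonKey is still 'created', whereas cancelling first and then sending on a 'standard' queue leaves one upcoming fire.

```typescript
// Toy in-memory queue. ToyQueue and its methods are hypothetical, not pg-boss's API.
type JobState = "created" | "active" | "cancelled";
type QueuePolicy = "standard" | "stately";

interface Job {
  id: string;
  singletonKey: string;
  startAfter: Date;
  state: JobState;
}

class ToyQueue {
  jobs: Job[] = [];
  policy: QueuePolicy;
  private nextId = 1;

  constructor(policy: QueuePolicy) {
    this.policy = policy;
  }

  // Returns the new job id, or null when the policy rejects the send as a duplicate.
  send(singletonKey: string, startAfter: Date): string | null {
    const clash = this.jobs.some(
      (j) => j.singletonKey === singletonKey && j.state === "created",
    );
    if (this.policy === "stately" && clash) return null;
    const id = `job-${this.nextId++}`;
    this.jobs.push({ id, singletonKey, startAfter, state: "created" });
    return id;
  }

  // Tombstones every not-yet-started job for the key and returns how many were cancelled.
  cancelCreated(singletonKey: string): number {
    let cancelled = 0;
    for (const job of this.jobs) {
      if (job.singletonKey === singletonKey && job.state === "created") {
        job.state = "cancelled";
        cancelled++;
      }
    }
    return cancelled;
  }
}

// Pre-send cancel: stale 'created' jobs are cancelled first, so the newest schedule wins.
// The "reminder:<id>" key format is an assumption made for this sketch.
function scheduleFireSketch(queue: ToyQueue, reminderId: string, fireAt: Date): string | null {
  const key = `reminder:${reminderId}`;
  queue.cancelCreated(key);
  return queue.send(key, fireAt);
}
```

In this model, rescheduling the same reminder twice on a 'standard' queue leaves one 'created' job (the later call) and one 'cancelled' tombstone. Duplicate fires of that single job are a separate concern and, per the commit, are handled inside the fire handler.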

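Sketch for a789b61e1f (closing the TOCTOU dedupe). This is a minimal illustration of re-checking for a recent run after the per-account mutex is held and before the insert. It is not the repository's code: `PerKeyMutexSketch`, the in-memory `runs` and `sent` arrays, and `fireReminderSketch` are hypothetical stand-ins for the real mutex, the reminderRuns table, and the WhatsApp send. The 30 s window comes from 4cb4015666.

```typescript
// Hypothetical stand-ins for illustration; none of these names are from the repo.
class PerKeyMutexSketch {
  private tails = new Map<string, Promise<void>>();

  // Runs fn only after every earlier task queued for the same key has settled.
  // (Entries are never evicted here; a real implementation would clean up.)
  run<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    const result = prev.then(fn);
    // Swallow the outcome so one failure does not block later tasks for the key.
    this.tails.set(key, result.then(() => undefined, () => undefined));
    return result;
  }
}

type RunRow = { reminderId: string; firedAt: number };
type FireResult = "sent" | "deduped";

const DEDUPE_WINDOW_MS = 30_000;
const runs: RunRow[] = []; // in-memory stand-in for the reminderRuns table
const sent: string[] = []; // one entry per simulated message send
const accountMutex = new PerKeyMutexSketch();

function hasRecentRun(reminderId: string, now: number): boolean {
  return runs.some(
    (r) => r.reminderId === reminderId && now - r.firedAt < DEDUPE_WINDOW_MS,
  );
}

// Simulates an awaited round-trip (database query or network send).
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

async function fireReminderSketch(accountId: string, reminderId: string): Promise<FireResult> {
  // Outer check: a cheap early exit, but racy on its own because concurrent
  // callers can all see "no recent run" before any of them inserts.
  await tick();
  if (hasRecentRun(reminderId, Date.now())) return "deduped";

  return accountMutex.run<FireResult>(accountId, async () => {
    // Inner check: repeated while holding the mutex and before the insert,
    // so a concurrent winner's row is visible by now.
    await tick();
    if (hasRecentRun(reminderId, Date.now())) return "deduped";
    runs.push({ reminderId, firedAt: Date.now() });
    await tick(); // simulated send
    sent.push(reminderId);
    return "sent";
  });
}
```

Calling `fireReminderSketch` three times concurrently for the same account and reminder should record one send and two "deduped" results. Without the inner check, all three calls pass the outer check and each sends once its turn on the mutex comes.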
7 Commits
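
Sketch for c9a7e6f089 (per-account send gate and inter-part jitter). The commit names a per-account TokenBucket capped by BOT_MAX_SEND_PER_MINUTE (default 40) and a 200..499 ms jitter, but does not show either implementation. The version below is one plausible shape under stated assumptions: the class name `TokenBucketSketch`, the `tryTake` method, the injectable clock, and the choice to let the bucket start full (allowing a burst up to the per-minute cap) are all illustrative.

```typescript
// Hypothetical token bucket; the repo's TokenBucket may differ in shape and behaviour.
class TokenBucketSketch {
  private tokens: number;
  private last: number;
  private capacity: number;
  private msPerToken: number;
  private now: () => number;

  // maxPerMinute plays the role of BOT_MAX_SEND_PER_MINUTE.
  // The clock is injectable so the behaviour can be tested deterministically.
  constructor(maxPerMinute: number, now: () => number = () => Date.now()) {
    this.capacity = maxPerMinute;
    this.msPerToken = 60_000 / maxPerMinute;
    this.now = now;
    this.tokens = maxPerMinute; // assumption: the bucket starts full
    this.last = now();
  }

  // Returns 0 when a token was taken, otherwise the number of milliseconds
  // to wait before the next token becomes available.
  tryTake(): number {
    const t = this.now();
    const refilled = this.tokens + (t - this.last) / this.msPerToken;
    this.tokens = Math.min(this.capacity, refilled);
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) * this.msPerToken);
  }
}

// Uniform integer jitter in the inclusive range 200..499 ms.
function jitterMs(rand: () => number = Math.random): number {
  return 200 + Math.floor(rand() * 300);
}
```

A caller would keep one bucket per accountId, sleep for the returned wait whenever `tryTake` is non-zero, and sleep `jitterMs()` between the parts of one group's message. With a cap of 40 per minute, the steady-state spacing is one send every 1500 ms.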