Repro: fire a reminder, message lands 2-3 times in WhatsApp (logs
showed three 'fire-reminder: done' entries within 1.5 s for the same
reminderId).
Two interlocking root causes:
1. The queue was created at 'standard' policy (pre-dating the
stately rollout). pg-boss's createQueue is idempotent and DOES
NOT update the policy on an existing queue row, so re-deploying
the code that requested policy=stately silently kept the
standard policy. Standard accepts duplicate enqueues with the
same singletonKey — three reminder.fire jobs for the same
reminderId could all land at once.
2. The handler-level recent-run dedupe was TOCTOU. The check ran
OUTSIDE the per-account mutex, so three concurrent invocations
all read 'no recent run', then queued up on the mutex one at a
time and each INSERTed a fresh run + sent the message.
Fixes:
- registerReminderJobs now forces the queue policy via direct SQL
(UPDATE pgboss.queue SET policy = 'stately' WHERE name = ...
AND policy <> 'stately') on every boot. Idempotent + survives
pre-existing standard-policy rows.
- fireReminderInner re-checks for a recent run AFTER the mutex is
held but BEFORE the INSERT. By that point any concurrent winner
has already inserted, so the duplicate sees the row and bails
cleanly.
New test in fire-reminder.test.ts (the TOCTOU repro): outer check
returns no recent run, inner check returns a freshly-inserted one,
asserts the mutex was acquired but the second findFirst was hit
(i.e. we got past the outer check and the inner check stopped us).
Verified live: pgboss.queue.policy is now 'stately' for reminder.fire.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observed: reminder fired twice within ~2s. The bot logs showed two
distinct pg-boss jobIds for the same reminder enqueued at the same
scheduledAt — both ran fire-reminder, both sent the message.
Root cause: pg-boss's `singletonKey` only deduplicates on queues with
a 'singleton' / 'stately' / 'short' policy. Our queue was created
without specifying a policy, defaulting to 'standard', which IGNORES
the singletonKey. Two sends with the same key produced two jobs.
Fix lives at two layers:
* Layer 1 — queue policy. createQueue(REMINDER_FIRE_QUEUE) now
passes `{ policy: 'stately' }`. With this, future fresh deploys
fold a duplicate send (same singletonKey) into the existing
'created' job rather than producing a second one. This doesn't
retroactively change an existing queue's policy (pg-boss doesn't
support that), but new queues are correct from creation.
* Layer 2 — defense-in-depth check inside fireReminder. Before
acquiring the per-account mutex, query reminderRuns for any row
with the same reminderId fired in the last 30s. If found, log
+ bail. This guards against:
- Existing queues stuck on policy='standard'.
- Race windows even within 'stately' policy.
- The operator double-clicking Save in the wizard.
- A jittery pg_notify('bot.command') replay.
Resume jobs (payload.runId set) skip this check — they're meant
to attach to an existing run.
Tests:
* New "BAILS OUT when a fresh fire collides with a recent run" case
in fire-reminder.test.ts.
* beforeEach now resets findExistingRunMock too, since both the
resume and dedupe paths share that mock.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'ended' label read like a terminal failure state ("the reminder
gave up") when in practice it just means "this reminder isn't going
to fire on its own — restart it if you want it back". 'inactive' is
the more accurate read.
* SQL migration 0009 backfills existing rows.
* Bot fire-reminder writes 'inactive' on one-off completion / no
further occurrences.
* Web actions, queries, filters, and reminder lifecycle gates updated.
* Dashboard counter card label "Active / Paused / Ended / Total"
becomes "Active / Paused / Inactive / Total".
* Reminders list filter tab "Ended" becomes "Inactive".
* Status pill style key renamed to match.
* Tests updated alongside the runtime changes.
Also: the "Pause sending by" deadline opt-in now renders as a
visible card-shaped row with hover state + Set/Off label on the
right, so the toggle is discoverable instead of a tiny native
checkbox tucked next to the label.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fire-reminder.ts now:
* Computes windowEnd via @cmbot/shared/windowEndAt(timezone, endHour,
now). Per-target loop trips the gate before sending; pending rows
are LEFT pending (not flipped to skipped) so the run is resumable.
* Accepts an optional runId on the FireReminderPayload. When set,
the handler ATTACHES to that existing run instead of creating a
new one and only re-tries pending targets. Resume is allowed even
when the reminder.status is 'paused' (otherwise we couldn't drag
it back into delivery).
* Final-status logic adds a 'paused' branch (window closed mid-run
with at least one row still pending AND something delivered);
failed when window closed before any send; partial / success
otherwise.
* Lifecycle: a paused run flips the reminder row to status='paused'
and skips the recurring re-arm. Resuming or completing later
flips it back to 'active'.
* SSE event payload gains optional sent/total counts.
reminderFiredToNotification picks up:
* New 'paused' headline + 'X of Y groups delivered. Tap to resume
or cancel.' body.
* 'partial' body uses sent/total when present.
WebEventMap and the bot's WebEvent union match the new shape.
Tests:
* fire-reminder.test.ts gains a "resume against paused reminder
acquires mutex" case.
* notifications.test.ts gains 3 paused/partial-sent body cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the single-threaded, 1.5s-sleep-per-part loop with a
concurrency model that:
* Wraps inner work in PerKeyMutex(accountId) so two reminders on the
SAME account take turns (running them concurrently would double the
effective send rate and risk a WhatsApp ban). Different accounts run
in parallel.
* Bumps pg-boss localConcurrency to BOT_FIRE_CONCURRENCY (default 8),
so up to 8 different-account reminders can fire simultaneously.
* Bulk-loads groups + media in 2 queries (drops ~3000 round-trips to
~3 for a 1000-group run) and pre-creates run_target rows so the
Activity tab shows progress mid-run.
* Pre-uploads each unique media via MediaUploadCache (one
generateWAMessageContent call per mediaId, then relayMessage to
every group). For 1000 groups × 5 MB image, this turns 5 GB of
upload into 5 MB.
* Runs BOT_GROUP_CONCURRENCY (default 3) groups in parallel within
one account; parts within a group stay serial so chat order is
preserved.
* Gates every send on a per-account TokenBucket
(BOT_MAX_SEND_PER_MINUTE, default 40).
* Replaces the rigid 1.5s inter-part sleep with 200..499 ms jitter.
Adds a unit test verifying accountMutex.run is called keyed by
accountId for active reminders, and skipped for inactive / missing.
Window enforcement, paused/resume, and ETA preview are deferred to
later phases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In-tab notification bridge so the operator gets a system notification
when a reminder fires successfully (or partly / fails) and when a
send-test message lands. Foundation for true background push later
(VAPID + service-worker subscription); this lands the wiring so
behaviour is testable today.
Pieces
------
- `lib/notifications.ts` — pure helper module:
* notificationSupport / getPermission — feature detection that
treats the SSR / unsupported-browser case as "denied" so callers
don't have to handle a third state.
* isOptedIn / setOptedIn — localStorage-backed opt-in flag
(key `cmbot.notifications.optedIn`). Survives gracefully when
window is missing or storage throws (private mode / quota).
* showNotification(opts) — gated dispatch returning a discriminated
result ({ ok: true, tag } | { ok: false, reason }) so callers
can fall back to a UI toast on opt-out / unsupported / error.
* reminderFiredToNotification + sendTestDoneToNotification —
pure mappers from the bot's SSE events into notification args.
Skips bookkeeping noise (status === "skipped") and failures
that the in-page toast already shows verbatim.
- `components/notification-manager.tsx` — client component mounted
once at the app shell. Subscribes to `reminder.fired` and
`send_test.done` via useEvents and forwards each through the pure
mappers. Renders no DOM.
- `components/notifications-toggle.tsx` — settings-page card with
three states (unsupported / not-granted / granted+opted-in).
"Send test" button fires a sample notification so the operator
can verify the wiring without waiting for a real reminder. The
blocked-by-browser path points them at site settings instead of
silently doing nothing.
- `app/settings/page.tsx` — new "Notifications" card sits above
the Appearance card.
- `app/layout.tsx` — `<NotificationManager />` rendered alongside
`<Toaster />` inside ThemeProvider so the SSE subscription is
active across all routes.
Bot side
--------
- `apps/bot/src/scheduler/fire-reminder.ts` — emits
`pgNotifyWeb({ type: "reminder.fired", reminderId, runId, status })`
after every run regardless of success/partial/failed. The web
side decides whether to surface it as a notification (skipped is
filtered out client-side).
- send_test.done was already emitted by `ipc/send-test-handler.ts`.
PWA service-worker tests (the original ask before this thread)
--------------------------------------------------------------
- Extracted the Serwist config into `pwa/config.ts` so the choices
(skipWaiting, clientsClaim, navigationPreload, runtimeCaching,
precacheEntries) are pinnable without booting a worker scope.
- 6 tests in `pwa/config.test.ts` lock the surface (no extra keys
appear silently, the manifest passes through unchanged, the
pinned booleans stay where production expects them).
- 6 tests in `app/manifest.webmanifest/route.test.ts` cover the
manifest contract (display=standalone, start_url=/, dark theme
colors match the OS, both icons are PNG + maskable, paths
match committed PNGs in public/).
Test counts
-----------
281 web + 31 shared + 26 bot = 338 total (was 306).
- +6 pwa/config (service-worker config pinning)
- +6 app/manifest.webmanifest (PWA manifest contract)
- +20 lib/notifications (full coverage of mappers + dispatch
gates + SSR / unsupported / blocked / opted-out paths)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Old behaviour: HEIC/AVIF photos, .mov / .webm / .mkv videos, and niche
audio (FLAC, etc.) got rejected outright at upload with "Images are
not supported" / "Videos are not supported" errors. Strict but
unfriendly — recipients could still receive these as a downloadable
file via WhatsApp's document path; we just weren't using it.
New behaviour: anything not playable inline gets routed through the
document path automatically. The recipient downloads the file and
opens it in their default app. The 100 MB document cap applies
instead of the inline 5 / 16 / 16 MB caps. Only oversized uploads
get rejected.
Where the policy lives
----------------------
The classifier moved into a new `@cmbot/shared/whatsapp-media`
module so the web upload validator AND the bot's fire-reminder send
path share one source of truth:
- resolveDeliveryKind(mime, bytes?) → "image" | "video" | "audio"
| "document". Native types stay as-is; HEIF / AVIF / QuickTime /
WebM / Matroska / non-MP3-or-M4A audio all collapse to "document".
- Bytes argument is optional but recommended — sniffing the first
12 bytes of the file catches iOS Safari's habit of labelling
a HEIC as image/jpeg or a .mov as video/mp4. Bytes win when they
disagree with the mime.
Web side
--------
- `lib/whatsapp-media.ts` re-exports the shared helpers and keeps
only the validator + byte-formatter. `validateForWhatsApp` calls
resolveDeliveryKind internally; the size cap it returns is for the
RESOLVED kind (so a HEIC routes to document and gets the 100 MB
cap). The "Images are not supported" / "Videos are not supported"
rejection messages are gone — there's no format rejection anymore.
- `actions/media.ts` collapses the previous explicit-mime + byte-sniff
pair into a single `validateForWhatsApp(mime, size, bytes)` call.
- Compose-step upload-zone hint updated to spell out the per-kind
caps: "JPEG/PNG up to 5 MB · MP4/3GP up to 16 MB · MP3/M4A/OGG
up to 16 MB · documents up to 100 MB".
Bot side
--------
- `fire-reminder.ts` reads the first 12 bytes of the file before
dispatching and calls `resolveDeliveryKind(mimeType, head)` to
pick the senderKind. So a HEIC on disk (whose mime claims
image/jpeg) gets sent via Baileys' document path — no failed
thumbnail extraction, message arrives as a downloadable .heic.
- New `readHeadBytes(filePath, n)` helper opens, reads N bytes,
closes — no full-file slurp.
Tests
-----
249 web + 31 shared + 26 bot = 306 passing total.
Web (`lib/whatsapp-media.test.ts`):
- "HEIC at 30 MB allowed: routes to document (100 MB cap)"
- "HEIC at 110 MB rejects: exceeds the document cap"
- "MOV at 50 MB allowed (would be 16 MB cap as video, 100 MB as
document)"
- "MOV pretending to be mp4 demotes to document (50 MB allowed)"
- "FLAC audio routes to document path"
- "genuine MP4 byte-sniff path keeps it as video"
Shared (`packages/shared/src/whatsapp-media.test.ts`, new):
- The cross-package contract: 11 tests covering size limits,
classifyMediaKind, resolveDeliveryKind for native + demoted +
byte-sniff cases, plus the underlying helpers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom
-------
Click Unpair on a connected account (or Delete Account). The web
action runs:
DELETE FROM whatsapp_groups WHERE account_id = ?
and Postgres rejects it:
error: update or delete on table "whatsapp_groups" violates
foreign key constraint
"reminder_run_targets_group_id_whatsapp_groups_id_fk"
on table "reminder_run_targets"
Cause
-----
\`reminder_run_targets.group_id\` had a non-null FK to
whatsapp_groups.id with no ON DELETE rule (defaults to NO ACTION /
RESTRICT). So any reminder that had ever fired pinned the group rows
in place. Unpair couldn't wipe the synced groups, the action threw,
and the row never reached \`status='unpaired'\`.
Fix
---
Mirror the pattern \`reminder_runs.reminder_id\` already uses
(migration 0005): nullable column + ON DELETE SET NULL + a
denormalised label snapshot, so historical fan-out records survive a
group wipe but stay readable.
Migration 0006:
- Drop the composite \`(run_id, group_id)\` PK; add a surrogate
\`id uuid pk default gen_random_uuid()\` since \`group_id\` can no
longer be part of the PK once it's nullable.
- Make \`group_id\` nullable.
- Re-create the FK with ON DELETE SET NULL.
- Add \`group_label text\` for the snapshot.
fire-reminder.ts now writes the group's name into \`group_label\`
on every insert path (success / failed / skipped /
account-not-connected / group-missing) so the Activity tab can keep
showing "Sent to <Group Name>" even after the group is gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dashboard
- Stat cards are now clickable: Accounts → /accounts, Active reminders →
/reminders?filter=active, Recent runs → /reminders.
- Recent activity rows link to the underlying reminder when it still
exists. Runs whose reminder has been deleted render with a "(deleted)"
marker and stay non-clickable.
- New "Clear history" action wipes all run rows the operator owns plus
any orphan rows (reminderId=NULL).
Run history persists after reminder delete
- reminder_runs.reminder_id is now nullable with ON DELETE SET NULL, so
deleting a reminder no longer cascade-erases its history.
- New reminder_runs.reminder_name column snapshots the name at fire
time so history rows stay readable even after the reminder is gone.
- Fire-reminder records the snapshot.
- Dashboard query LEFT JOINs and COALESCEs name from the live reminder,
the snapshot, or "(deleted reminder)" as last resort.
QR
- Drop the 25 s server-side throttle. With listener accumulation already
fixed (previous commit), the payload-equality dedupe is enough.
Symptom: after the first QR expired the throttle blocked the next
emit, and the QR never refreshed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reminders
- Add recurrence to wizard step 3 (None / Daily / Weekly+weekday picker /
Monthly / Yearly). Build the RRULE client-side and thread it through
the wizard URL state.
- Action stores rrule + scheduleKind="recurring" on insert.
- Bot reschedules the next occurrence after firing a recurring reminder
using the existing rrule helpers in @cmbot/shared. One-off behavior
unchanged.
- Add reminders.last_fired_at column to track last fire.
Pairing
- Move QR PNG out of the pg_notify payload (the 8000-byte limit was
silently truncating it; QR never reached the web → "QR hang"). PNG
now lives on whatsapp_accounts.last_qr_png; NOTIFY just signals
{type: session.qr, accountId, ts}. Web fetches the bytes from a new
read-only /api/qr/[accountId] route (allowed via middleware).
- handleStartPairing now stops any in-flight session before starting a
fresh one — fixes Re-pair where session.start was a silent no-op and
Baileys never re-emitted QR.
- Pair-live: countdown moved out from over the QR (it was overlapping
the scan area); shown as a discrete progress bar above the QR.
- Add a "Save QR" download button.
Account detail page
- Pair / Unpair / Delete cards are themselves the trigger (form submit
or DialogTrigger) — no inline buttons, whole card is clickable.
- Sync Groups Now card removed earlier; bot already auto-syncs.
Account list page
- Cards are the link target. A small floating Delete trigger (top-right
trash icon) opens the destructive confirm dialog without blocking
navigation on the rest of the card.
Tests
- recurrence.test.ts: 10 tests for buildRrule / kindFromRrule /
describeRecurrence (incl. weekly day combos and BYDAY ordering).
- reminders.schema.test.ts: regression for the "Invalid datetime" bug —
proves strict Zod .datetime() rejected luxon's offset ISO and the
{ offset: true } option accepts both forms.
Migration: 0004_next_prowler.sql
- whatsapp_accounts.last_qr_png (text)
- reminders.last_fired_at (timestamptz)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A fired one-off reminder was staying active forever in the DB and showing
🟢 in the Reminders list. Update reminders.status to 'ended' once a one-off
has fired (regardless of run outcome — one-off is done after one attempt).
Recurring reminders stay 'active' — they have more occurrences pending.