Three connected bits of paired-account hygiene:
1. Duplicate-pair guard (apps/bot/src/ipc/pair-handler.ts)
Operator scans the QR with a phone that's already linked to
another account row → both rows would fight over the same
WhatsApp device and sends become a coin flip. After Baileys'
`open` event the bot now queries siblings of the same operator,
passes them through findDuplicateExistingAccount() (a pure
helper extracted to pair-state.ts), and on a hit:
- stops the new session (intentional; keeps the original's
session intact)
- scrubs the partial auth blob from disk
- resets the row's status to unpaired and clears phone_number
- emits a new session.duplicate event with the existing row's
label so PairLive can render a clear message
New PairLive 'duplicate' phase: amber icon + "Phone already
linked, unpair the existing account first or scan with a
different phone".
2. Logout-before-delete (apps/bot/src/ipc/unpair-handler.ts +
apps/bot/src/whatsapp/session-manager.ts)
Delete used to call account.unpair which only closes the local
socket — the operator's phone kept showing a phantom "linked
device" pointing at a row that no longer exists. Added:
- new account.delete command type (web side and bot side)
- sessionManager.logoutAndStop(): calls socket.logout() so
WhatsApp drops the device on the server side, THEN closes
the local socket. Best-effort; logout RPC failure doesn't
strand the delete.
- new handleDelete() handler that calls logoutAndStop, removes
session files, audits, and notifies.
- deleteAccountAction now sends account.delete instead of
account.unpair.
Unpair stays unchanged — re-pair-friendly, no logout.
3. Tests (bot 77 → 88, web 477 → 480)
- findDuplicateExistingAccount: 6 cases covering match, no-match,
self-exclusion, null/empty/whitespace handling, whitespace
normalisation, deterministic-pick when (defensively) two
siblings share a phone.
- handleUnpair / handleDelete: handleDelete calls logoutAndStop
BEFORE rm; handleUnpair never touches logoutAndStop (regression
guard for a refactor that swaps them); audit log payload
includes the row's label; audit lookup throwing doesn't strand
the delete.
- listAccounts ordering: static guard against the rename-
reshuffles-list regression. Pins `asc(a.createdAt)` + `asc(a.id)`
and rejects `asc(a.label)` in the function body.
Bot restarted with the new flow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After a successful QR scan Baileys closes with status 515 and the
session-manager schedules a reconnect via setTimeout(stop().then(start)).
That cleanup stop emits a SECOND close event which arrived at our
pair-handler listener with warmingUp already cleared (the first qr
cleared it). The decision then resolved to 'treat-as-timeout',
detaching the listener and pushing session.timeout to the UI right at
the moment the user actually paired successfully — pairing then
silently completed in the DB but the UI never got session.connected.
Fix: re-arm pairingWarmingUp inside the post-pair-restart branch so
the cleanup-stop's close is swallowed too. Cleared again by the
following qr/open from the freshly-reopened socket, which then emits
session.connected to the UI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extract the pair-handler's close-event decision into a pure helper
decidePairListenerOnClose(warmingUp, restartRequired) returning one of
ignore-leaked-close / post-pair-restart / treat-as-timeout. Refactor
pair-handler to call the helper instead of the inline if-chain.
New tests in pair-state.test.ts:
- warmingUp=true → ignore-leaked-close (regression: prior session's
close racing the new listener)
- warmingUp=true + restartRequired=true → still ignore (defense in
depth — a stale 515 must not hand control to the reconnect path)
- warmingUp=false + restartRequired=true → post-pair-restart
- warmingUp=false → treat-as-timeout
Bot suite goes from 60 → 64 tests, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Repro: scan QR window once → click Back → click Pair again → instantly
see 'Pairing timed out' (sometimes for several attempts in a row).
Root cause: when handleStartPairing hits a still-running session it
calls await sessionManager.stop(accountId) and immediately attaches a
fresh listener. session.close() resolves before sessionManager broadcasts
the close event to listeners (handleEvent has several awaits between
close arriving and the listener fan-out). The new listener was already
attached by then and saw the OLD session's close as if it were the new
session timing out — flipped the row to unpaired and pushed
session.timeout to the UI.
Fix: track a per-account 'pairingWarmingUp' Set. The new attempt enters
warming-up the moment its listener attaches; clears on the first qr or
open (those events can only come from the freshly-started session). A
close that arrives while still warming is logged and ignored. abandonPair
also clears the flag for safety.
Also drop the redundant Admin card from /settings — the Admin nav entry
on the sidebar/drawer already routes admins to /settings/users, the
extra card was duplicate UI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Web actions:
* resumeReminderRunAction({ runId }) → validates ownership and that
the run is in 'paused' state, then publishes a reminder.resume
command via pg_notify('bot.command'). The bot's command-consumer
picks it up and enqueues a fresh pg-boss job at REMINDER_FIRE_QUEUE
carrying { reminderId, runId }; fire-reminder's existing resume
branch attaches to the row.
* cancelReminderRunAction({ runId }) → flips remaining 'pending'
targets to 'skipped' with error="canceled by operator", marks the
run 'partial' with a clear errorSummary, and lifts the parent
reminder out of 'paused' (recurring → active so the next
occurrence fires; one-off → ended).
Bot:
* New BotCommand variant { type: "reminder.resume"; reminderId; runId }
* command-consumer registers handleResumeReminder which calls
enqueueReminderResume(boss, reminderId, runId) — a sibling of
scheduleReminderFire that posts the job at REMINDER_FIRE_QUEUE
with { reminderId, runId } and singletonKey "reminder:resume:<runId>"
so the resume doesn't conflict with a future-occurrence schedule.
Tests:
* reminders.run-actions.test.ts (11 tests) — every guard rail
(invalid uuid, missing run, missing reminder, foreign operator,
wrong status) and the recurring/one-off lifecycle branches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fire-reminder.ts now:
* Computes windowEnd via @cmbot/shared/windowEndAt(timezone, endHour,
now). Per-target loop trips the gate before sending; pending rows
are LEFT pending (not flipped to skipped) so the run is resumable.
* Accepts an optional runId on the FireReminderPayload. When set,
the handler ATTACHES to that existing run instead of creating a
new one and only re-tries pending targets. Resume is allowed even
when the reminder.status is 'paused' (otherwise we couldn't drag
it back into delivery).
* Final-status logic adds a 'paused' branch (window closed mid-run
with at least one row still pending AND something delivered);
failed when window closed before any send; partial / success
otherwise.
* Lifecycle: a paused run flips the reminder row to status='paused'
and skips the recurring re-arm. Resuming or completing later
flips it back to 'active'.
* SSE event payload gains optional sent/total counts.
reminderFiredToNotification picks up:
* New 'paused' headline + 'X of Y groups delivered. Tap to resume
or cancel.' body.
* 'partial' body uses sent/total when present.
WebEventMap and the bot's WebEvent union match the new shape.
Tests:
* fire-reminder.test.ts gains a "resume against paused reminder
acquires mutex" case.
* notifications.test.ts gains 3 paused/partial-sent body cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom
-------
Click "Unpair" on a connected account. The web action sets
\`status='unpaired'\`, but the account detail page often still shows
"Disconnected" — and on accounts that had been previously connected,
the QR pair flow restarts a few seconds later all on its own.
Cause
-----
Two races inside the session manager:
1. The web's \`unpairAccountAction\` notifies the bot via \`pg_notify\`
and then writes \`status='unpaired'\` to the row. The bot's
\`handleUnpair\` calls \`sessionManager.stop()\` which closes the
Baileys socket; Baileys eventually fires a \`connection: close\`
event which the manager's \`handleEvent\` translates into a
\`status='disconnected'\` UPDATE. Whichever write lands second wins.
The user clicks Unpair and sees Disconnected.
2. The same close-event handler schedules a 5-second
\`stop().then(start())\` reconnect for accounts whose
\`lastConnectedAt\` is set. Five seconds after unpair, the bot
silently re-opens the socket, the row flips to \`pending\`, and the
QR carousel restarts.
Fix
---
\`stop(accountId, { intentional: true })\` marks the account in a new
\`intentionalStops\` Set. When the close event lands, \`handleEvent\`
drains the flag (with \`Set.delete()\` returning whether the key was
present, so it's exactly-once and a stale flag can't bleed into a
later session) and skips both the DB UPDATE and the reconnect
schedule. The caller — only \`handleUnpair\` for now — is the one
choosing the row's next state, so we step out of its way.
The flag is set ONLY when callers ask for it. Internal recoveries
(restartRequired auto re-open, ephemeral-close back-off) keep the
default behaviour and continue to write \`disconnected\` + reschedule.
Drive-bys
---------
- Refresh the stale "the row is gone by the time we run" comment in
unpair-handler — the row stays alive now (the operator can re-pair
without retyping the label). Look up the account first so the
audit log carries the real \`operatorId\` instead of \`null\`. The
delete-account flow really does delete the row before notifying us;
the lookup tolerates that and falls back to \`null\`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The send-test form was stuck on "Sending to <Group>…" because the
server action returns the moment it publishes the IPC NOTIFY; the bot
processed the actual WhatsApp send out-of-band and the form had no way
to learn whether it succeeded.
Round-trip now wired end-to-end:
- New WebEvent variant `send_test.done` { groupId, ok, error }.
- bot/src/ipc/send-test-handler emits it on every exit path:
- missing group → ok=false, "Group not found"
- account offline → ok=false, "Account not connected — re-pair first"
- send threw → ok=false, error message
- send succeeded → ok=true, null
- web/src/hooks/use-events declares the new event in its type map.
- web SendTestForm subscribes via useEvents, filters by its own
groupId so a parallel send-test on another group can't move our
state, and renders one of three pills:
* Sending… (in-flight — Loader2 spinner)
* Sent ✓ (success — emerald CheckCircle2)
* <error message> (failure — destructive AlertCircle)
The "Send Test" button stays disabled while in-flight.
Tests (+5; 110 web tests total):
send-test-form.test.tsx
- SSR markup: textarea, submit button, hidden groupId, no premature
pill on first render.
- useEvents wiring: form registers a `send_test.done` handler.
- Handler safely accepts:
* matching success event
* matching failure event
* mismatched groupId (must not throw)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Found from the live bot log: after the user scans the QR, Baileys
receives `pair-success`, logs "pairing configured successfully, expect
to restart the connection...", and then closes the websocket with
status 515 (DisconnectReason.restartRequired) so it can reopen with
the new credentials. The next `open` event finishes the pairing.
The previous code path treated ANY close during pairing as a failure:
it parked the row as `unpaired`, wiped the QR, and emitted
session.timeout to the UI. The user was greeted with "Pairing timed
out — The QR window closed before a device was linked" at the exact
moment they had successfully paired.
Three changes:
- session.ts emits `restartRequired: boolean` on the SessionEvent close
payload (true when reason === DisconnectReason.restartRequired).
- pair-handler treats the restart-required close as a no-op: keeps the
listener attached and the DB row in `pending` so the upcoming `open`
event flips it to `connected`.
- session-manager always reconnects on restart-required (250 ms after
the close — no `lastConnectedAt` gate, no 5 s back-off).
Pure helpers (`pair-state.ts`) updated to model the new branch:
- decideOnPairClose returns null when restartRequired (don't touch DB).
- shouldAutoReconnect returns true on restartRequired regardless of
whether the account has ever connected before.
Tests (+1; 26 bot tests, 104 web tests = 130 green):
- pair-state.test.ts gains explicit cases:
* restart-required close → null
* shouldAutoReconnect always true on restart-required (incl.
first-time pair, where hasEverConnected is false — the exact
case that broke in production).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bot/src/ipc/pair-state.ts (NEW)
Pure helpers for the pairing-lifecycle decisions, lifted out of
pair-handler so the rules are testable without Baileys / Postgres:
- decideOnPairClose({ current, loggedOut })
- decideOnPairTimeout({ current })
- shouldAutoReconnect({ loggedOut, hasEverConnected })
bot/src/ipc/pair-state.test.ts (NEW, 7 tests)
Locks in the regressions we just fixed:
- Non-loggedOut close from `pending` MUST settle as `unpaired`
(the row used to stay `pending` and disappear from the overview).
- logged_out close → `logged_out`.
- pair-window timeout parks still-`pending` rows; ignores rows
that already moved on.
- Auto-reconnect only kicks in for accounts that have been linked
at least once — guards against the 5-second QR refresh loop on
a fresh pair.
web/src/components/accounts-list-view.test.tsx
+ Test that the overview renders accounts in transient states
(pending, unpaired, disconnected) alongside connected ones — the
`pending` row was being hidden by listAccounts before this fix.
Bot: 24 tests passing (+7).
Web: 99 tests passing (+1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Accounts list was hiding any row in the transient `pending` status
(originally meant only for an active QR scan). When a pair attempt
failed (timeout, transient connection error, page closed mid-scan)
the row was left in `pending` and silently disappeared from the
overview — the operator's "I created an account but it's gone" bug.
Two-part fix:
- listAccounts no longer filters by status. The status badge tells
the operator what state each row is in; hiding rows just hides
bugs.
- Pairing lifecycle no longer leaves rows in `pending` after failure.
When the pair-handler sees a close (Baileys exhausting QR refs, or
the pair-window timeout firing), it now sets `status='unpaired'`
and clears `last_qr_png`. The row settles into a state that the
detail page can act on (Re-pair / Delete) and remains visible on
the list.
- The bot startup sweep used to DELETE stale pending rows older than
1 hour. It now parks them as `unpaired` instead, keeping them
visible so the operator notices and can retry.
Stuck `haha` row in the live DB also flipped to `unpaired` so it
reappears on the list immediately.
98 tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The session-manager's auto-reconnect (5 s after a non-logged-out close)
was firing during initial pairing. Baileys closes the socket whenever it
exhausts its QR refs (or transient handshake errors); the auto-reconnect
then opened a brand-new socket → new QR pool → another close 5 s later.
The web saw a fresh QR every ~5 s and the user could never link, because
WhatsApp invalidates each QR as soon as Baileys cycles to the next.
Fix: only auto-reconnect for accounts that have been linked before
(`whatsapp_accounts.last_connected_at IS NOT NULL`). For brand-new
pairing attempts the pair-handler's 5-minute window is now the single
authority; on close we just stop the session and let the operator
retry. With auto-reconnect off, Baileys uses its default QR cadence:
60 s for the first QR, 20 s for each subsequent rotation, ~6 refs total
(~3 minutes of valid scanning) — plenty of time to scan.
Pair-handler now also surfaces ANY close as `session.timeout` to the
web (was only emitting on `loggedOut`). Without this the user would be
left staring at the last QR after Baileys gives up, with no way to know
pairing failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dashboard
- Stat cards are now clickable: Accounts → /accounts, Active reminders →
/reminders?filter=active, Recent runs → /reminders.
- Recent activity rows link to the underlying reminder when it still
exists. Runs whose reminder has been deleted render with a "(deleted)"
marker and stay non-clickable.
- New "Clear history" action wipes all run rows the operator owns plus
any orphan rows (reminderId=NULL).
Run history persists after reminder delete
- reminder_runs.reminder_id is now nullable with ON DELETE SET NULL, so
deleting a reminder no longer cascade-erases its history.
- New reminder_runs.reminder_name column snapshots the name at fire
time so history rows stay readable even after the reminder is gone.
- Fire-reminder records the snapshot.
- Dashboard query LEFT JOINs and COALESCEs name from the live reminder,
the snapshot, or "(deleted reminder)" as last resort.
QR
- Drop the 25 s server-side throttle. With listener accumulation already
fixed (previous commit), the payload-equality dedupe is enough.
Symptom: after the first QR expired the throttle blocked the next
emit, and the QR never refreshed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reminders
- Reminder list / detail show recurrence summary ("Every Mon, Wed, Fri",
"Every 2 weeks until 2027-01-01", etc).
- Detail page reorganised: each section (Account / Message / When /
Groups) is itself a clickable card that deep-links into the wizard
step in edit mode (editReminderId URL param). No standalone Edit
button. Run history stays read-only.
- New /reminders/[id]/edit shell loads the row, encodes its state into
wizard URL params, and forwards to /reminders/new. The wizard
threads editReminderId through every step.
- updateReminderAction: validates ownership of both the existing
reminder and the (possibly changed) target account, replaces targets
+ messages wholesale, re-arms the pg-boss job (singleton key picks
up the new fire time).
- Wizard submit branches to updateReminderAction when editReminderId
is set; button reads "Save changes" / "Saving…".
- Wizard default first-fire is now the current minute in the operator
zone (not now+1h). Same-minute clicks bump silently to next minute
via a 60 s grace window so the user isn't punished.
- /reminders empty state is filter-aware: "No failed reminders yet."
when ?filter=failed and there are reminders in other states.
Recurrence
- Spec is now a structured object: { kind, interval, weeklyDays,
monthDay, end }. Builder produces RRULEs with INTERVAL, BYDAY,
BYMONTHDAY, COUNT, UNTIL as appropriate. specFromRrule round-trips
for resuming/edit.
- When-step UI: frequency pills, "Every N days/weeks/…" interval,
weekday picker (weekly), day-of-month input (monthly), end picker
(Never / After N occurrences / On date), live human-readable
summary preview.
QR pairing
- Throttle QR refresh to once per 25 s and detach the previous
per-account session listener on Re-pair so listeners can't
accumulate. The UI countdown was flicking every ~5 s because each
Re-pair attached an extra listener — every Baileys QR event then
triggered a fresh DB write + NOTIFY.
Tests (60 green total, +33 in this batch)
- recurrence.test.ts: extended to 25 tests covering interval,
monthday, end conditions (COUNT/UNTIL), and round-trip parsing.
- date-picker.test.ts: 14 tests for splitDateTime / combineDateTime /
validateScheduledAt (incl. the "click-too-fast" same-minute grace)
and defaultFirstFireIso.
- /api/qr/[accountId] route.test.ts: 4 tests — 404 when no QR yet,
404 on missing row, 200 with image/png + no-store + correct PNG
bytes, and verifies the where-clause queries by accountId.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reminders
- Add recurrence to wizard step 3 (None / Daily / Weekly+weekday picker /
Monthly / Yearly). Build the RRULE client-side and thread it through
the wizard URL state.
- Action stores rrule + scheduleKind="recurring" on insert.
- Bot reschedules the next occurrence after firing a recurring reminder
using the existing rrule helpers in @cmbot/shared. One-off behavior
unchanged.
- Add reminders.last_fired_at column to track last fire.
Pairing
- Move QR PNG out of the pg_notify payload (the 8000-byte limit was
silently truncating it; QR never reached the web → "QR hang"). PNG
now lives on whatsapp_accounts.last_qr_png; NOTIFY just signals
{type: session.qr, accountId, ts}. Web fetches the bytes from a new
read-only /api/qr/[accountId] route (allowed via middleware).
- handleStartPairing now stops any in-flight session before starting a
fresh one — fixes Re-pair where session.start was a silent no-op and
Baileys never re-emitted QR.
- Pair-live: countdown moved out from over the QR (it was overlapping
the scan area); shown as a discrete progress bar above the QR.
- Add a "Save QR" download button.
Account detail page
- Pair / Unpair / Delete cards are themselves the trigger (form submit
or DialogTrigger) — no inline buttons, whole card is clickable.
- Sync Groups Now card removed earlier; bot already auto-syncs.
Account list page
- Cards are the link target. A small floating Delete trigger (top-right
trash icon) opens the destructive confirm dialog without blocking
navigation on the rest of the card.
Tests
- recurrence.test.ts: 10 tests for buildRrule / kindFromRrule /
describeRecurrence (incl. weekly day combos and BYDAY ordering).
- reminders.schema.test.ts: regression for the "Invalid datetime" bug —
proves strict Zod .datetime() rejected luxon's offset ISO and the
{ offset: true } option accepts both forms.
Migration: 0004_next_prowler.sql
- whatsapp_accounts.last_qr_png (text)
- reminders.last_fired_at (timestamptz)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reshape the account lifecycle to match how operators actually want to
work the system:
- Add Account → creates a row with status='unpaired'. No QR yet; the
operator lands on the detail page.
- Pair / Re-pair → transitions an unpaired account to status='pending'
and opens the live QR flow. Works for first-time pair AND for re-pair
of an account that was previously unpaired.
- Unpair → asks the bot to stop the live Baileys session and clean
session files; sets status='unpaired' but KEEPS the row (and its
reminders) so the operator can re-pair without retyping anything.
- Delete → permanently removes the account and cascades to its groups,
reminders, run history.
Schema:
- whatsapp_groups.account_id and reminders.account_id now have
ON DELETE CASCADE so deleting an account fans out cleanly.
UI:
- /accounts list shows everything except the transient 'pending' state.
- /accounts/[id] shows state-aware buttons: Pair (when unpaired/banned/
disconnected), Sync + Unpair (when connected), Delete (always).
- /accounts/new is now an "Add Account" form (label only).
Other fixes:
- next.config.ts: allowedDevOrigins includes 192.168.0.253 +
test/rexwa subdomains so Server Actions work across the LAN.
- packages/shared/src/rrule.ts: rrule@2.8.1 has no exports field and
ships ESM that some bundlers can't resolve via default OR named
import. Use createRequire to bridge — works under both NodeNext
(bot runtime) and Turbopack (web SSR).