Symptom
-------
Click "Unpair" on a connected account. The web action sets
`status='unpaired'`, but the account detail page often still shows
"Disconnected", and on accounts that had been previously connected
the QR pair flow restarts a few seconds later all on its own.
Cause
-----
Two races inside the session manager:
1. The web's `unpairAccountAction` notifies the bot via `pg_notify`
and then writes `status='unpaired'` to the row. The bot's
`handleUnpair` calls `sessionManager.stop()`, which closes the
Baileys socket; Baileys eventually fires a `connection: close`
event, which the manager's `handleEvent` translates into a
`status='disconnected'` UPDATE. Whichever write lands second wins:
when the bot's UPDATE lands last, the user clicks Unpair and sees
Disconnected.
2. The same close-event handler schedules a 5-second
`stop().then(start())` reconnect for accounts whose
`lastConnectedAt` is set. Five seconds after unpair, the bot
silently re-opens the socket, the row flips to `pending`, and the
QR carousel restarts.
Fix
---
`stop(accountId, { intentional: true })` marks the account in a new
`intentionalStops` Set. When the close event lands, `handleEvent`
drains the flag (with `Set.delete()` returning whether the key was
present, so it's exactly-once and a stale flag can't bleed into a
later session) and skips both the DB UPDATE and the reconnect
schedule. The caller (only `handleUnpair` for now) is the one
choosing the row's next state, so we step out of its way.

The flag is set ONLY when callers ask for it. Internal recoveries
(restartRequired auto re-open, ephemeral-close back-off) keep the
default behaviour and continue to write `disconnected` + reschedule.
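
The drain-on-close idea can be sketched roughly as below. This is a minimal illustration, not the manager's actual code: the class name `IntentionalStopTracker`, the method names, and the `CloseOutcome` type are assumptions made up for the example. Only the `intentionalStops` Set and the use of `Set.delete()` come from the description above.

```typescript
// Hypothetical, simplified stand-in for the session manager's flag handling.
type CloseOutcome = "skipped" | "handled";

class IntentionalStopTracker {
  private readonly intentionalStops = new Set<string>();

  // What stop(accountId, { intentional: true }) would do before closing.
  markIntentional(accountId: string): void {
    this.intentionalStops.add(accountId);
  }

  // What the close-event handler would do. Set.delete() returns true only
  // if the key was present, so the flag is consumed exactly once.
  onClose(accountId: string): CloseOutcome {
    const wasIntentional = this.intentionalStops.delete(accountId);
    // "skipped": no DB UPDATE, no reconnect schedule.
    // "handled": default path (write `disconnected`, maybe reschedule).
    return wasIntentional ? "skipped" : "handled";
  }
}

const tracker = new IntentionalStopTracker();
tracker.markIntentional("acct-1");
const first = tracker.onClose("acct-1"); // flag present: close is skipped
const second = tracker.onClose("acct-1"); // flag already drained: default path
```

Because the check and the removal are a single `delete()` call, a later close for the same account cannot see a leftover flag.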
Drive-bys
---------
- Refresh the stale "the row is gone by the time we run" comment in
unpair-handler: the row stays alive now (the operator can re-pair
without retyping the label).
- Look up the account first so the audit log carries the real
`operatorId` instead of `null`. The delete-account flow really does
delete the row before notifying us; the lookup tolerates that and
falls back to `null`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Found from the live bot log: after the user scans the QR, Baileys
receives `pair-success`, logs "pairing configured successfully, expect
to restart the connection...", and then closes the websocket with
status 515 (DisconnectReason.restartRequired) so it can reopen with
the new credentials. The next `open` event finishes the pairing.
The previous code path treated ANY close during pairing as a failure:
it parked the row as `unpaired`, wiped the QR, and emitted
session.timeout to the UI. The user was greeted with "Pairing timed
out — The QR window closed before a device was linked" at the exact
moment they had successfully paired.
Three changes:
- session.ts emits `restartRequired: boolean` on the SessionEvent close
payload (true when reason === DisconnectReason.restartRequired).
- pair-handler treats the restart-required close as a no-op: keeps the
listener attached and the DB row in `pending` so the upcoming `open`
event flips it to `connected`.
- session-manager always reconnects on restart-required (250 ms after
the close — no `lastConnectedAt` gate, no 5 s back-off).
Pure helpers (`pair-state.ts`) updated to model the new branch:
- decideOnPairClose returns null when restartRequired (don't touch DB).
- shouldAutoReconnect returns true on restartRequired regardless of
whether the account has ever connected before.
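
A rough sketch of how the two pure helpers might look with the new branch. The parameter shapes (`CloseInfo`, the `loggedOut` field, the `PairCloseDecision` return type) are assumptions for illustration; the real signatures in `pair-state.ts` may differ.

```typescript
// Hypothetical shapes; only the function names and the restartRequired
// behaviour are taken from the commit message.
interface CloseInfo {
  restartRequired: boolean;
  loggedOut: boolean;
}

// null means "do not touch the DB row".
type PairCloseDecision = { status: "unpaired"; emit: "session.timeout" } | null;

function decideOnPairClose(info: CloseInfo): PairCloseDecision {
  // Restart-required is part of a successful pairing: leave the row alone.
  if (info.restartRequired) return null;
  return { status: "unpaired", emit: "session.timeout" };
}

function shouldAutoReconnect(info: CloseInfo, hasEverConnected: boolean): boolean {
  // Always reopen after restart-required, even on a first-time pair.
  if (info.restartRequired) return true;
  if (info.loggedOut) return false;
  // Otherwise only reconnect accounts that have been linked before.
  return hasEverConnected;
}
```

Keeping these as pure functions means the first-time-pair case can be tested without a socket or a database.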
Tests (+1; 26 bot tests, 104 web tests = 130 green):
- pair-state.test.ts gains explicit cases:
* restart-required close → null
* shouldAutoReconnect always true on restart-required (incl.
first-time pair, where hasEverConnected is false — the exact
case that broke in production).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier "QR refreshes every 5 s" bug was the session-manager
auto-reconnect loop (commit 4d10c72), not the QR cadence. Baileys'
default QR rotation (60 s first ref, then ~20 s per subsequent ref) is
the correct native behaviour — each rotation just refreshes the
displayed QR via SSE. Forcing qrTimeout=60s suppressed those legitimate
rotations and made the QR feel stuck.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
next-themes hydration mismatch
- Removed the next-themes wrapper, ThemeProvider component, and the
Settings appearance card. There's no theme-toggle UI anywhere in
the app, so the library was just adding a pre-hydration `<script>`
that triggered React 19's "script tag while rendering" warning,
while its `<html>` class swap caused the hydration mismatch.
- Sonner Toaster now uses a fixed `theme="light"` instead of useTheme.
- Layout drops `suppressHydrationWarning` on `<html>` since we no
longer mutate it on mount.
QR refs exhausted before the user could scan
- Pass `qrTimeout: 60_000` to makeWASocket so each QR (first AND
subsequent) lasts a full minute. Default was 60 s for the first and
20 s for each subsequent, so ~6 refs ran out after roughly
60 + 5 × 20 = 160 s (under 3 min) and Baileys gave up. With 60 s
flat, the refs outlast the full ~5 min window of pair-handler's
PAIR_TIMEOUT_MS.
Pairing-timed-out screen
- "Try again" used to link to /accounts/new, which creates a new
account instead of re-pairing the existing one. The link now points
to the existing /accounts/[id] detail page, where the operator can
hit Re-pair.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The session-manager's auto-reconnect (5 s after a non-logged-out close)
was firing during initial pairing. Baileys closes the socket whenever it
exhausts its QR refs (or hits a transient handshake error); the
auto-reconnect then opened a brand-new socket → new QR pool → another
close 5 s later. The web saw a fresh QR every ~5 s and the user could
never link, because WhatsApp invalidates each QR as soon as Baileys
cycles to the next.
Fix: only auto-reconnect for accounts that have been linked before
(`whatsapp_accounts.last_connected_at IS NOT NULL`). For brand-new
pairing attempts the pair-handler's 5-minute window is now the single
authority; on close we just stop the session and let the operator
retry. With auto-reconnect off, Baileys uses its default QR cadence:
60 s for the first QR, 20 s for each subsequent rotation, ~6 refs total
(~3 minutes of valid scanning) — plenty of time to scan.
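
The scan-window arithmetic can be checked with a small helper. This is only a worked calculation: `qrWindowMs` is a hypothetical name, and the ref count of 6 is the approximate figure quoted above, not a value read from Baileys.

```typescript
// Total time during which some QR is valid, given a longer first ref
// and a fixed duration for every ref after it.
function qrWindowMs(refs: number, firstMs: number, subsequentMs: number): number {
  if (refs <= 0) return 0;
  return firstMs + (refs - 1) * subsequentMs;
}

// Default cadence from the text: 60 s first, 20 s each subsequent, ~6 refs.
const defaultWindowMs = qrWindowMs(6, 60_000, 20_000); // 160_000 ms
const defaultWindowMin = defaultWindowMs / 60_000; // about 2.7 minutes
```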
Pair-handler now also surfaces ANY close as `session.timeout` to the
web (was only emitting on `loggedOut`). Without this the user would be
left staring at the last QR after Baileys gives up, with no way to know
pairing failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously syncGroupsForAccount only upserted, so groups removed from
WhatsApp (deleted, bot was kicked, etc.) lingered in the DB.
Now compute the diff: any whatsapp_groups row for this account whose
wa_group_jid is not in the live fetch result is deleted. Skip the delete
sweep when the fetch returns empty — that's more likely transient than
a genuine "every group gone" signal, and we don't want to nuke valid
data on a hiccup.
Return shape gains a `removed` count alongside `synced`.
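
The diff step can be sketched as a pure function. The name `groupsToRemove` and the plain string-array inputs are assumptions for illustration; the real syncGroupsForAccount works against the whatsapp_groups table.

```typescript
// Returns the stored JIDs that no longer appear in the live fetch and
// should therefore be deleted.
function groupsToRemove(
  storedJids: readonly string[],
  liveJids: readonly string[],
): string[] {
  // An empty live result is treated as a probable transient failure,
  // so the delete sweep is skipped entirely.
  if (liveJids.length === 0) return [];
  const live = new Set(liveJids);
  return storedJids.filter((jid) => !live.has(jid));
}

const removedJids = groupsToRemove(
  ["a@g.us", "b@g.us", "c@g.us"],
  ["a@g.us", "c@g.us"],
);
// removedJids contains only "b@g.us"; its length would feed the `removed` count.
```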
Group sync previously only ran once at pairing time, so groups created in
WhatsApp afterwards never showed up.
Two complementary fixes:
- 🔄 Refresh button in the groups list view triggers
syncGroupsForAccount() on demand and re-renders the menu
- session.ts now subscribes to Baileys 'groups.upsert' and 'groups.update'
events and re-syncs (debounced 1.5s) so new groups appear without
manual action
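
A minimal trailing-edge debounce, as one way the event-driven re-sync could be coalesced. The `debounce` helper and the `scheduleResync` name are illustrative assumptions; the actual wiring in session.ts is not shown in this log.

```typescript
// Collapses a burst of calls into a single invocation that runs
// waitMs after the last call in the burst.
function debounce(fn: () => void, waitMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn();
    }, waitMs);
  };
}

// Hypothetical usage: both group events would call the same debounced
// function, so several events in quick succession trigger one sync.
let resyncCount = 0;
const scheduleResync = debounce(() => {
  resyncCount += 1; // stand-in for calling syncGroupsForAccount()
}, 1500);
```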
WhatsApp's pre-key endpoint returns 406 not-acceptable if ANY single JID
in the batch is in a broken state (deleted account, deactivated, etc.).
With Baileys' default behavior of asking for the whole participant list at
once, one stale member poisons the whole group send.
Chunk participant JIDs into batches of 5 and tolerate per-chunk failures.
The send fan-out then works for the participants whose sessions did land,
which covers the vast majority of real-world groups.
Also adds explicit pino logging so we can see which chunks failed during
diagnosis.
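
The batching can be sketched as below. `assertChunk` is a hypothetical callback standing in for whatever call establishes sessions for one batch, and the `ok`/`failed` return shape is an assumption; logging is left out to keep the example self-contained.

```typescript
// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Runs assertChunk on each batch in turn. A failing batch is recorded
// and skipped, so one bad JID only costs the (at most) 5 JIDs in its batch.
async function assertInChunks(
  jids: readonly string[],
  assertChunk: (batch: string[]) => Promise<void>,
  size = 5,
): Promise<{ ok: string[]; failed: string[] }> {
  const ok: string[] = [];
  const failed: string[] = [];
  for (const batch of chunk(jids, size)) {
    try {
      await assertChunk(batch);
      ok.push(...batch);
    } catch {
      failed.push(...batch);
    }
  }
  return { ok, failed };
}
```

The trade-off is that healthy JIDs sharing a batch with a broken one are also reported as failed; a smaller batch size narrows that blast radius at the cost of more round trips.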
groupMetadata alone wasn't enough: Baileys won't establish individual
libsignal sessions lazily during sendMessage, so the first send to a
freshly-paired group fails per-participant. Cast the socket to reach
the internal assertSessions(jids, force=true) and call it on every
participant before attempting to send.
First send to a group after pairing fails with libsignal SessionError
"No sessions" because Baileys hasn't yet established encryption sessions
with all participants. Force-fetch group metadata before sendMessage so
Baileys populates its participant map; if the first send still races,
retry once after a 1.5s delay.
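
The retry-once behaviour can be sketched generically. `sendWithOneRetry` and its `send` callback are hypothetical names; in the bot the callback would wrap the metadata fetch and sendMessage call.

```typescript
// Tries send() once; if it rejects, waits delayMs and tries exactly one
// more time. A second failure propagates to the caller.
async function sendWithOneRetry<T>(
  send: () => Promise<T>,
  delayMs = 1500,
): Promise<T> {
  try {
    return await send();
  } catch {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    return send();
  }
}
```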
Each entry in the groups list is now a button. Tapping shows a group detail
view with [📝 Send Test Text]. Operator replies with the message body and
the bot sends it to the selected WhatsApp group via the live Baileys session,
records the action in audit_log, and shows success/failure inline.
This is a small forerunner of the full reminder send pipeline that plan 2
will build out (with media, scheduling, retries). Useful right now to
validate the end-to-end Telegram-to-WhatsApp send path during pairing tests.
Two pairing-flow fixes after live test:
- Connection Failure during pairing: Baileys announced a stale WhatsApp Web
version that the server rejected before the QR was emitted. Pull the
current version via fetchLatestBaileysVersion() at session start.
- Telegram mobile auto-converts straight quotes to curly quotes, so labels
like /pair "test 1" arrived as “test 1” and the curly quotes were never
stripped. Extend the quote-stripping regex on /pair, /unpair, /groups.
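
One possible shape for the extended quote stripping. The helper name `stripQuotes` and the exact character set are assumptions; the bot's real regex may cover more or fewer characters.

```typescript
// Straight quotes plus the curly variants that mobile keyboards insert:
// U+201C, U+201D (double) and U+2018, U+2019 (single).
const QUOTE_CHARS = "\"'\u201C\u201D\u2018\u2019";

// Matches a run of quote characters at the start or at the end of the label.
const EDGE_QUOTES = new RegExp(`^[${QUOTE_CHARS}]+|[${QUOTE_CHARS}]+$`, "g");

function stripQuotes(label: string): string {
  return label.trim().replace(EDGE_QUOTES, "");
}

const fromDesktop = stripQuotes('"test 1"');
const fromMobile = stripQuotes("\u201Ctest 1\u201D");
// Both values are the bare label: test 1
```

Quotes inside the label are left alone, since only the edges are matched.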