18 Commits

Author SHA1 Message Date
2fe8459d25 feat: duplicate-pair detection + logout-before-delete + ordering tests
Three connected bits of paired-account hygiene:

1. Duplicate-pair guard (apps/bot/src/ipc/pair-handler.ts)

   Operator scans the QR with a phone that's already linked to
   another account row → both rows would fight over the same
   WhatsApp device and sends become a coin flip. After Baileys'
   `open` event the bot now queries siblings of the same operator,
   passes them through findDuplicateExistingAccount() (a pure
   helper extracted to pair-state.ts), and on a hit:
     - stops the new session (intentional; keeps the original's
       session intact)
     - scrubs the partial auth blob from disk
     - resets the row's status to unpaired and clears phone_number
     - emits a new session.duplicate event with the existing row's
       label so PairLive can render a clear message
   New PairLive 'duplicate' phase: amber icon + "Phone already
   linked, unpair the existing account first or scan with a
   different phone".

2. Logout-before-delete (apps/bot/src/ipc/unpair-handler.ts +
   apps/bot/src/whatsapp/session-manager.ts)

   Delete used to call account.unpair which only closes the local
   socket — the operator's phone kept showing a phantom "linked
   device" pointing at a row that no longer exists. Added:
     - new account.delete command type (web side and bot side)
     - sessionManager.logoutAndStop(): calls socket.logout() so
       WhatsApp drops the device on the server side, THEN closes
       the local socket. Best-effort; logout RPC failure doesn't
       strand the delete.
     - new handleDelete() handler that calls logoutAndStop, removes
       session files, audits, and notifies.
     - deleteAccountAction now sends account.delete instead of
       account.unpair.
   Unpair stays unchanged — re-pair-friendly, no logout.

3. Tests (bot 77 → 88, web 477 → 480)

   - findDuplicateExistingAccount: 6 cases covering match, no-match,
     self-exclusion, null/empty/whitespace handling, whitespace
     normalisation, deterministic-pick when (defensively) two
     siblings share a phone.
   - handleUnpair / handleDelete: handleDelete calls logoutAndStop
     BEFORE rm; handleUnpair never touches logoutAndStop (regression
     guard for a refactor that swaps them); audit log payload
     includes the row's label; audit lookup throwing doesn't strand
     the delete.
   - listAccounts ordering: static guard against the rename-
     reshuffles-list regression. Pins `asc(a.createdAt)` + `asc(a.id)`
     and rejects `asc(a.label)` in the function body.

Bot restarted with the new flow.
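
The logout-before-delete ordering described in item 2 can be sketched
as below. `SocketLike`, the `log` callback, and `removeSessionFiles`
are illustrative stand-ins rather than the bot's real interfaces, and
the audit and notify steps are left out for brevity.

```typescript
// Stand-in for the live session socket (an assumption for this sketch).
interface SocketLike {
  logout(): Promise<void>; // asks the server to drop the linked device
  end(): void; // closes the local socket only
}

async function logoutAndStop(
  socket: SocketLike,
  log: (msg: string) => void,
): Promise<void> {
  try {
    await socket.logout();
  } catch (err) {
    // Best-effort: a failed logout RPC must not strand the delete.
    log(`logout failed, continuing: ${String(err)}`);
  } finally {
    socket.end();
  }
}

async function handleDelete(
  socket: SocketLike,
  removeSessionFiles: () => Promise<void>,
  log: (msg: string) => void,
): Promise<void> {
  await logoutAndStop(socket, log); // runs BEFORE the files are removed
  await removeSessionFiles();
}
```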

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:26:58 +08:00
731d6d66a6 fix(unpair): stop session manager from racing the web's status write
Symptom
-------
Click "Unpair" on a connected account. The web action sets
`status='unpaired'`, but the account detail page often still shows
"Disconnected", and on accounts that had been previously connected,
the QR pair flow restarts a few seconds later all on its own.

Cause
-----
Two races inside the session manager:

1. The web's `unpairAccountAction` notifies the bot via `pg_notify`
   and then writes `status='unpaired'` to the row. The bot's
   `handleUnpair` calls `sessionManager.stop()` which closes the
   Baileys socket; Baileys eventually fires a `connection: close`
   event which the manager's `handleEvent` translates into a
   `status='disconnected'` UPDATE. Whichever write lands second wins.
   The user clicks Unpair and sees Disconnected.

2. The same close-event handler schedules a 5-second
   `stop().then(start())` reconnect for accounts whose
   `lastConnectedAt` is set. Five seconds after unpair, the bot
   silently re-opens the socket, the row flips to `pending`, and the
   QR carousel restarts.

Fix
---
`stop(accountId, { intentional: true })` marks the account in a new
`intentionalStops` Set. When the close event lands, `handleEvent`
drains the flag (with `Set.delete()` returning whether the key was
present, so it's exactly-once and a stale flag can't bleed into a
later session) and skips both the DB UPDATE and the reconnect
schedule. The caller (only `handleUnpair` for now) is the one
choosing the row's next state, so we step out of its way.

The flag is set ONLY when callers ask for it. Internal recoveries
(restartRequired auto re-open, ephemeral-close back-off) keep the
default behaviour and continue to write `disconnected` + reschedule.
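
The drain-on-close idea can be shown in isolation. In this sketch only
the `intentionalStops` name comes from the commit; the `StopFlags`
class, its method names, and the string outcomes are assumptions used
to keep the example self-contained.

```typescript
type CloseOutcome = "skipped" | "disconnected-and-rescheduled";

class StopFlags {
  private readonly intentionalStops = new Set<string>();

  // What stop(accountId, { intentional: true }) would record.
  markIntentional(accountId: string): void {
    this.intentionalStops.add(accountId);
  }

  // Called when the close event lands for accountId.
  onClose(accountId: string): CloseOutcome {
    // Set.delete() returns true only if the key was present, so the
    // flag is consumed exactly once and cannot leak into a later session.
    const wasIntentional = this.intentionalStops.delete(accountId);
    return wasIntentional ? "skipped" : "disconnected-and-rescheduled";
  }
}
```

A second close for the same account, with no new intentional stop in
between, takes the default path again because the flag is already gone.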

Drive-bys
---------
- Refresh the stale "the row is gone by the time we run" comment in
  unpair-handler: the row stays alive now (the operator can re-pair
  without retyping the label). Look up the account first so the
  audit log carries the real `operatorId` instead of `null`. The
  delete-account flow really does delete the row before notifying us;
  the lookup tolerates that and falls back to `null`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 11:39:56 +08:00
c95b9658d1 fix(bot): treat post-pair "restart required" close as success, not timeout
Found from the live bot log: after the user scans the QR, Baileys
receives `pair-success`, logs "pairing configured successfully, expect
to restart the connection...", and then closes the websocket with
status 515 (DisconnectReason.restartRequired) so it can reopen with
the new credentials. The next `open` event finishes the pairing.

The previous code path treated ANY close during pairing as a failure:
it parked the row as `unpaired`, wiped the QR, and emitted
session.timeout to the UI. The user was greeted with "Pairing timed
out — The QR window closed before a device was linked" at the exact
moment they had successfully paired.

Three changes:

- session.ts emits `restartRequired: boolean` on the SessionEvent close
  payload (true when reason === DisconnectReason.restartRequired).
- pair-handler treats the restart-required close as a no-op: keeps the
  listener attached and the DB row in `pending` so the upcoming `open`
  event flips it to `connected`.
- session-manager always reconnects on restart-required (250 ms after
  the close — no `lastConnectedAt` gate, no 5 s back-off).

Pure helpers (`pair-state.ts`) updated to model the new branch:
- decideOnPairClose returns null when restartRequired (don't touch DB).
- shouldAutoReconnect returns true on restartRequired regardless of
  whether the account has ever connected before.
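
A rough model of the two helpers, under stated assumptions: the
function names and the `restartRequired` flag come from the commit,
while the `CloseInfo` and `PairCloseDecision` types, the `loggedOut`
field, and the exact signatures are guesses made for illustration.

```typescript
interface CloseInfo {
  restartRequired: boolean;
  loggedOut: boolean; // assumed field for the "logged out" close reason
}

// null means "leave the DB row and the listener alone".
type PairCloseDecision =
  | { status: "unpaired"; emit: "session.timeout" }
  | null;

function decideOnPairClose(close: CloseInfo): PairCloseDecision {
  // Post-pair restart: the upcoming `open` event finishes the pairing.
  if (close.restartRequired) return null;
  return { status: "unpaired", emit: "session.timeout" };
}

function shouldAutoReconnect(
  close: CloseInfo,
  hasEverConnected: boolean,
): boolean {
  // Always reconnect on restart-required, even on a first-time pair.
  if (close.restartRequired) return true;
  if (close.loggedOut) return false;
  // Otherwise only accounts that have been linked before reconnect.
  return hasEverConnected;
}
```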

Tests (+1; 26 bot tests, 104 web tests = 130 green):
- pair-state.test.ts gains explicit cases:
  * restart-required close → null
  * shouldAutoReconnect always true on restart-required (incl.
    first-time pair, where hasEverConnected is false — the exact
    case that broke in production).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 09:45:37 +08:00
65f4d2d099 fix(bot): revert qrTimeout — keep Baileys' native 60/20s rotation
The earlier "QR refreshes every 5 s" bug was the session-manager
auto-reconnect loop (commit 4d10c72), not the QR cadence. Baileys'
default QR rotation (60 s first ref, then ~20 s per subsequent ref) is
the correct native behaviour — each rotation just refreshes the
displayed QR via SSE. Forcing qrTimeout=60s suppressed those legitimate
rotations and made the QR feel stuck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 08:58:35 +08:00
234e8aa690 fix(web,bot): drop next-themes, extend QR validity, fix retry CTA
next-themes hydration mismatch
- Removed the next-themes wrapper, ThemeProvider component, and the
  Settings appearance card. There's no theme-toggle UI anywhere in
  the app, so the library was just adding a pre-hydration `<script>`
  that triggered React 19's "script tag while rendering" warning, and
  its `<html>` class swap caused the hydration mismatch.
- Sonner Toaster now uses a fixed `theme="light"` instead of useTheme.
- Layout drops `suppressHydrationWarning` on `<html>` since we no
  longer mutate it on mount.

QR refs exhausted before the user could scan
- Pass `qrTimeout: 60_000` to makeWASocket so each QR (first AND
  subsequent) lasts a full minute. Default was 60 s for the first and
  20 s for each subsequent, so ~6 refs lasted only ~2.5 min before
  Baileys gave up. With 60 s flat, the user has the full ~5 min
  window matching pair-handler's PAIR_TIMEOUT_MS.

Pairing-timed-out screen
- "Try again" used to link to /accounts/new (creates a new account
  instead of re-pairing the existing one). Link now points to the
  existing /accounts/[id] detail page where the operator can hit
  Re-pair.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 08:57:13 +08:00
4d10c72551 fix(bot): stop reconnect loop during fresh pairing — root cause of QR rotation every 5s
The session-manager's auto-reconnect (5 s after a non-logged-out close)
was firing during initial pairing. Baileys closes the socket whenever it
exhausts its QR refs (or hits a transient handshake error); the
auto-reconnect then opened a brand-new socket → new QR pool → another
close 5 s later.
The web saw a fresh QR every ~5 s and the user could never link, because
WhatsApp invalidates each QR as soon as Baileys cycles to the next.

Fix: only auto-reconnect for accounts that have been linked before
(`whatsapp_accounts.last_connected_at IS NOT NULL`). For brand-new
pairing attempts the pair-handler's 5-minute window is now the single
authority; on close we just stop the session and let the operator
retry. With auto-reconnect off, Baileys uses its default QR cadence:
60 s for the first QR, 20 s for each subsequent rotation, ~6 refs total
(~3 minutes of valid scanning) — plenty of time to scan.
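
As a quick check of the "~3 minutes" figure, the scanning window can
be computed from the cadence quoted above. The function name is made
up for this sketch, and the ref count of 6 is the approximate number
from this message, not a value read from Baileys.

```typescript
// Total seconds of valid QR time: one long first ref, then shorter ones.
function qrWindowSeconds(
  refCount: number,
  firstSec = 60,
  nextSec = 20,
): number {
  if (refCount <= 0) return 0;
  return firstSec + (refCount - 1) * nextSec;
}

// 60 + 5 * 20 = 160 seconds, a little under 3 minutes.
const defaultWindow = qrWindowSeconds(6);
```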

Pair-handler now also surfaces ANY close as `session.timeout` to the
web (was only emitting on `loggedOut`). Without this the user would be
left staring at the last QR after Baileys gives up, with no way to know
pairing failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 08:45:47 +08:00
d9a5f5a5e2 feat(bot): extend sender with image/video/document support 2026-05-09 17:23:06 +08:00
9062ba7e7f fix(bot): drop removed groups during sync
Previously syncGroupsForAccount only upserted, so groups removed from
WhatsApp (deleted, bot was kicked, etc.) lingered in the DB.

Now compute the diff: any whatsapp_groups row for this account whose
wa_group_jid is not in the live fetch result is deleted. Skip the delete
sweep when the fetch returns empty — that's more likely transient than
a genuine "every group gone" signal, and we don't want to nuke valid
data on a hiccup.

Return shape gains a `removed` count alongside `synced`.
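
The diff rule, including the empty-fetch guard, fits in a small pure
function. This is a sketch only: `jidsToRemove` and its parameters are
hypothetical names, and the real syncGroupsForAccount works against
the database rather than plain arrays.

```typescript
// Returns the stored JIDs that should be deleted after a live fetch.
function jidsToRemove(
  storedJids: readonly string[],
  liveJids: readonly string[],
): string[] {
  // An empty fetch is treated as a transient hiccup, not as
  // "every group is gone", so nothing is removed.
  if (liveJids.length === 0) return [];
  const live = new Set(liveJids);
  return storedJids.filter((jid) => !live.has(jid));
}
```

The length of the returned array would correspond to the `removed`
count in the return shape.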
2026-05-09 17:08:11 +08:00
43882d5a1b feat(bot): refresh groups list — manual button + auto event listener
Group sync previously only ran once at pairing time, so groups created in
WhatsApp afterwards never showed up.

Two complementary fixes:
- 🔄 Refresh button in the groups list view triggers
  syncGroupsForAccount() on demand and re-renders the menu
- session.ts now subscribes to Baileys 'groups.upsert' and 'groups.update'
  events and re-syncs (debounced 1.5s) so new groups appear without
  manual action
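
A trailing-edge debounce of the kind described in the second bullet
might look like this. It is a generic sketch, not the code in
session.ts; `debounce` and the wiring comment are illustrative.

```typescript
// Collapses a burst of calls into one call, waitMs after the last one.
function debounce(fn: () => void, waitMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn();
    }, waitMs);
  };
}

// Hypothetical wiring: both group events would call the same
// debounced function, e.g. debounce(() => resync(accountId), 1500).
```

With this shape, a flurry of group events within the wait window
triggers a single re-sync instead of one per event.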
2026-05-09 16:54:55 +08:00
5259f88776 fix(bot): chunk participant pre-key fetches to survive broken JIDs
WhatsApp's pre-key endpoint returns 406 not-acceptable if ANY single JID
in the batch is in a broken state (deleted account, deactivated, etc.).
With Baileys' default behavior of asking for the whole participant list at
once, one stale member poisons the whole group send.

Chunk participant JIDs into batches of 5 and tolerate per-chunk failures.
The send fan-out then works for the participants whose sessions did land,
which covers the vast majority of real-world groups.

Also adds explicit pino logging so we can see which chunks failed during
diagnosis.
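
The chunk-and-tolerate approach can be sketched as follows. The
`assertBatch` callback is a stand-in for the real pre-key fetch, and
both function names are assumptions; only the batch size of 5 and the
per-chunk tolerance come from this message.

```typescript
// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size <= 0) throw new RangeError("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function assertInChunks(
  jids: readonly string[],
  assertBatch: (batch: string[]) => Promise<void>, // hypothetical fetch
  size = 5,
): Promise<{ ok: string[]; failed: string[] }> {
  const ok: string[] = [];
  const failed: string[] = [];
  for (const batch of chunk(jids, size)) {
    try {
      await assertBatch(batch);
      ok.push(...batch);
    } catch {
      // One broken JID only costs its own chunk, not the whole group.
      failed.push(...batch);
    }
  }
  return { ok, failed };
}
```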
2026-05-09 16:52:48 +08:00
2fdcdb6202 fix(bot): explicit assertSessions before group send
groupMetadata alone wasn't enough — Baileys won't establish individual
libsignal sessions lazily during sendMessage, so the first send to a
freshly-paired group fails per-participant. Cast to the internal
assertSessions(jids, force=true) and call it on every participant before
attempting to send.
2026-05-09 16:50:51 +08:00
99cece16c0 fix(bot): pre-fetch group metadata + retry sender on libsignal race
First send to a group after pairing fails with libsignal SessionError
"No sessions" because Baileys hasn't yet established encryption sessions
with all participants. Force-fetch group metadata before sendMessage so
Baileys populates its participant map; if the first send still races,
retry once after a 1.5s delay.
2026-05-09 16:48:42 +08:00
3c4eedff03 feat(bot): tap-to-send test message from groups menu
Each entry in the groups list is now a button. Tapping shows a group detail
view with [📝 Send Test Text]. Operator replies with the message body and
the bot sends it to the selected WhatsApp group via the live Baileys session,
records the action in audit_log, and shows success/failure inline.

This is a small forerunner of the full reminder send pipeline that plan 2
will build out (with media, scheduling, retries). Useful right now to
validate the end-to-end Telegram-to-WhatsApp send path during pairing tests.
2026-05-09 16:46:22 +08:00
1e3173424a fix(bot): pin Baileys to latest WA Web version + handle smart quotes
Two pairing-flow fixes after live test:
- Connection Failure during pairing: Baileys announced a stale WhatsApp Web
  version that the server rejected before the QR was emitted. Pull the
  current version via fetchLatestBaileysVersion() at session start.
- Telegram mobile auto-converts straight quotes to curly quotes, so labels
  like /pair "test 1" arrived as “test 1” and the curly quotes were never
  stripped. Extend the quote-stripping regex on /pair, /unpair, /groups.
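
A sketch of quote stripping that covers both straight and curly
quotes. The regex and the `stripQuotes` name are illustrative; the
actual pattern used on /pair, /unpair, and /groups is not shown in
this log.

```typescript
// Leading or trailing runs of straight and curly, single and double quotes.
const EDGE_QUOTES =
  /^["'\u201C\u201D\u2018\u2019]+|["'\u201C\u201D\u2018\u2019]+$/g;

function stripQuotes(label: string): string {
  return label.trim().replace(EDGE_QUOTES, "").trim();
}
```

Quotes inside the label are left alone; only the ones wrapping it are
removed.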
2026-05-09 16:28:01 +08:00
f8bd20184f feat(bot): add group sync upsert 2026-05-09 16:21:01 +08:00
c2ee793ae6 feat(bot): add session manager with state machine + reconnect 2026-05-09 16:20:20 +08:00
fc05a8b459 feat(bot): add Baileys session wrapper
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 16:18:11 +08:00
dd1eb711df feat(bot): add QR PNG renderer 2026-05-09 16:16:09 +08:00