12 Commits

Author SHA1 Message Date
731d6d66a6 fix(unpair): stop session manager from racing the web's status write
Symptom
-------
Click "Unpair" on a connected account. The web action sets
\`status='unpaired'\`, but the account detail page often still shows
"Disconnected" — and on accounts that had been previously connected,
the QR pair flow restarts a few seconds later all on its own.

Cause
-----
Two races inside the session manager:

1. The web's \`unpairAccountAction\` notifies the bot via \`pg_notify\`
   and then writes \`status='unpaired'\` to the row. The bot's
   \`handleUnpair\` calls \`sessionManager.stop()\` which closes the
   Baileys socket; Baileys eventually fires a \`connection: close\`
   event which the manager's \`handleEvent\` translates into a
   \`status='disconnected'\` UPDATE. Whichever write lands second wins.
   The user clicks Unpair and sees Disconnected.

2. The same close-event handler schedules a 5-second
   \`stop().then(start())\` reconnect for accounts whose
   \`lastConnectedAt\` is set. Five seconds after unpair, the bot
   silently re-opens the socket, the row flips to \`pending\`, and the
   QR carousel restarts.

Fix
---
\`stop(accountId, { intentional: true })\` marks the account in a new
\`intentionalStops\` Set. When the close event lands, \`handleEvent\`
drains the flag (with \`Set.delete()\` returning whether the key was
present, so it's exactly-once and a stale flag can't bleed into a
later session) and skips both the DB UPDATE and the reconnect
schedule. The caller — only \`handleUnpair\` for now — is the one
choosing the row's next state, so we step out of its way.

The flag is set ONLY when callers ask for it. Internal recoveries
(restartRequired auto re-open, ephemeral-close back-off) keep the
default behaviour and continue to write \`disconnected\` + reschedule.

Drive-bys
---------
- Refresh the stale "the row is gone by the time we run" comment in
  unpair-handler — the row stays alive now (the operator can re-pair
  without retyping the label). Look up the account first so the
  audit log carries the real \`operatorId\` instead of \`null\`. The
  delete-account flow really does delete the row before notifying us;
  the lookup tolerates that and falls back to \`null\`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 11:39:56 +08:00
ec57a78853 feat(send-test): close the loop — bot reports done back to the form
The send-test form was stuck on "Sending to <Group>…" because the
server action returns the moment it publishes the IPC NOTIFY; the bot
processed the actual WhatsApp send out-of-band and the form had no way
to learn whether it succeeded.

Round-trip now wired end-to-end:

- New WebEvent variant `send_test.done` { groupId, ok, error }.
- bot/src/ipc/send-test-handler emits it on every exit path:
  - missing group   → ok=false, "Group not found"
  - account offline → ok=false, "Account not connected — re-pair first"
  - send threw      → ok=false, error message
  - send succeeded  → ok=true,  null
- web/src/hooks/use-events declares the new event in its type map.
- web SendTestForm subscribes via useEvents, filters by its own
  groupId so a parallel send-test on another group can't move our
  state, and renders one of three pills:
    * Sending…           (in-flight — Loader2 spinner)
    * Sent ✓             (success — emerald CheckCircle2)
    * <error message>    (failure — destructive AlertCircle)
  The "Send Test" button stays disabled while in-flight.

Tests (+5; 110 web tests total):
  send-test-form.test.tsx
  - SSR markup: textarea, submit button, hidden groupId, no premature
    pill on first render.
  - useEvents wiring: form registers a `send_test.done` handler.
  - Handler safely accepts:
    * matching success event
    * matching failure event
    * mismatched groupId (must not throw)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 10:04:33 +08:00
c95b9658d1 fix(bot): treat post-pair "restart required" close as success, not timeout
Found from the live bot log: after the user scans the QR, Baileys
receives `pair-success`, logs "pairing configured successfully, expect
to restart the connection...", and then closes the websocket with
status 515 (DisconnectReason.restartRequired) so it can reopen with
the new credentials. The next `open` event finishes the pairing.

The previous code path treated ANY close during pairing as a failure:
it parked the row as `unpaired`, wiped the QR, and emitted
session.timeout to the UI. The user was greeted with "Pairing timed
out — The QR window closed before a device was linked" at the exact
moment they had successfully paired.

Three changes:

- session.ts emits `restartRequired: boolean` on the SessionEvent close
  payload (true when reason === DisconnectReason.restartRequired).
- pair-handler treats the restart-required close as a no-op: keeps the
  listener attached and the DB row in `pending` so the upcoming `open`
  event flips it to `connected`.
- session-manager always reconnects on restart-required (250 ms after
  the close — no `lastConnectedAt` gate, no 5 s back-off).

Pure helpers (`pair-state.ts`) updated to model the new branch:
- decideOnPairClose returns null when restartRequired (don't touch DB).
- shouldAutoReconnect returns true on restartRequired regardless of
  whether the account has ever connected before.

Tests (+1; 26 bot tests, 104 web tests = 130 green):
- pair-state.test.ts gains explicit cases:
  * restart-required close → null
  * shouldAutoReconnect always true on restart-required (incl.
    first-time pair, where hasEverConnected is false — the exact
    case that broke in production).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 09:45:37 +08:00
1c9cb75111 test: pairing-state transitions + accounts overview shows pending rows
bot/src/ipc/pair-state.ts (NEW)
  Pure helpers for the pairing-lifecycle decisions, lifted out of
  pair-handler so the rules are testable without Baileys / Postgres:
  - decideOnPairClose({ current, loggedOut })
  - decideOnPairTimeout({ current })
  - shouldAutoReconnect({ loggedOut, hasEverConnected })

bot/src/ipc/pair-state.test.ts (NEW, 7 tests)
  Locks in the regressions we just fixed:
  - Non-loggedOut close from `pending` MUST settle as `unpaired`
    (the row used to stay `pending` and disappear from the overview).
  - logged_out close → `logged_out`.
  - pair-window timeout parks still-`pending` rows; ignores rows
    that already moved on.
  - Auto-reconnect only kicks in for accounts that have been linked
    at least once — guards against the 5-second QR refresh loop on
    a fresh pair.

web/src/components/accounts-list-view.test.tsx
  + Test that the overview renders accounts in transient states
    (pending, unpaired, disconnected) alongside connected ones — the
    `pending` row was being hidden by listAccounts before this fix.

Bot: 24 tests passing (+7).
Web: 99 tests passing (+1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 09:36:45 +08:00
fe135cdef5 fix: don't hide accounts in 'pending' state; park failed pairs as 'unpaired'
Accounts list was hiding any row in the transient `pending` status
(originally meant only for an active QR scan). When a pair attempt
failed (timeout, transient connection error, page closed mid-scan)
the row was left in `pending` and silently disappeared from the
overview — the operator's "I created an account but it's gone" bug.

Two-part fix:

- listAccounts no longer filters by status. The status badge tells
  the operator what state each row is in; hiding rows just hides
  bugs.

- Pairing lifecycle no longer leaves rows in `pending` after failure.
  When the pair-handler sees a close (Baileys exhausting QR refs, or
  the pair-window timeout firing), it now sets `status='unpaired'`
  and clears `last_qr_png`. The row settles into a state that the
  detail page can act on (Re-pair / Delete) and remains visible on
  the list.

- The bot startup sweep used to DELETE stale pending rows older than
  1 hour. It now parks them as `unpaired` instead, keeping them
  visible so the operator notices and can retry.

Stuck `haha` row in the live DB also flipped to `unpaired` so it
reappears on the list immediately.

98 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 09:34:46 +08:00
4d10c72551 fix(bot): stop reconnect loop during fresh pairing — root cause of QR rotation every 5s
The session-manager's auto-reconnect (5 s after a non-logged-out close)
was firing during initial pairing. Baileys closes the socket whenever it
exhausts its QR refs (or transient handshake errors); the auto-reconnect
then opened a brand-new socket → new QR pool → another close 5 s later.
The web saw a fresh QR every ~5 s and the user could never link, because
WhatsApp invalidates each QR as soon as Baileys cycles to the next.

Fix: only auto-reconnect for accounts that have been linked before
(`whatsapp_accounts.last_connected_at IS NOT NULL`). For brand-new
pairing attempts the pair-handler's 5-minute window is now the single
authority; on close we just stop the session and let the operator
retry. With auto-reconnect off, Baileys uses its default QR cadence:
60 s for the first QR, 20 s for each subsequent rotation, ~6 refs total
(~3 minutes of valid scanning) — plenty of time to scan.

Pair-handler now also surfaces ANY close as `session.timeout` to the
web (was only emitting on `loggedOut`). Without this the user would be
left staring at the last QR after Baileys gives up, with no way to know
pairing failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 08:45:47 +08:00
ba9e50fec0 feat: dashboard navigation, preserve run history, QR refresh fix
Dashboard
- Stat cards are now clickable: Accounts → /accounts, Active reminders →
  /reminders?filter=active, Recent runs → /reminders.
- Recent activity rows link to the underlying reminder when it still
  exists. Runs whose reminder has been deleted render with a "(deleted)"
  marker and stay non-clickable.
- New "Clear history" action wipes all run rows the operator owns plus
  any orphan rows (reminderId=NULL).

Run history persists after reminder delete
- reminder_runs.reminder_id is now nullable with ON DELETE SET NULL, so
  deleting a reminder no longer cascade-erases its history.
- New reminder_runs.reminder_name column snapshots the name at fire
  time so history rows stay readable even after the reminder is gone.
- Fire-reminder records the snapshot.
- Dashboard query LEFT JOINs and COALESCEs name from the live reminder,
  the snapshot, or "(deleted reminder)" as last resort.

QR
- Drop the 25 s server-side throttle. With listener accumulation already
  fixed (previous commit), the payload-equality dedupe is enough.
  Symptom: after the first QR expired the throttle blocked the next
  emit, and the QR never refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 01:27:53 +08:00
f19ea03e0d feat: edit reminders, mature recurrence, QR throttle, more tests
Reminders
- Reminder list / detail show recurrence summary ("Every Mon, Wed, Fri",
  "Every 2 weeks until 2027-01-01", etc).
- Detail page reorganised: each section (Account / Message / When /
  Groups) is itself a clickable card that deep-links into the wizard
  step in edit mode (editReminderId URL param). No standalone Edit
  button. Run history stays read-only.
- New /reminders/[id]/edit shell loads the row, encodes its state into
  wizard URL params, and forwards to /reminders/new. The wizard
  threads editReminderId through every step.
- updateReminderAction: validates ownership of both the existing
  reminder and the (possibly changed) target account, replaces targets
  + messages wholesale, re-arms the pg-boss job (singleton key picks
  up the new fire time).
- Wizard submit branches to updateReminderAction when editReminderId
  is set; button reads "Save changes" / "Saving…".
- Wizard default first-fire is now the current minute in the operator
  zone (not now+1h). Same-minute clicks bump silently to next minute
  via a 60 s grace window so the user isn't punished.
- /reminders empty state is filter-aware: "No failed reminders yet."
  when ?filter=failed and there are reminders in other states.

Recurrence
- Spec is now a structured object: { kind, interval, weeklyDays,
  monthDay, end }. Builder produces RRULEs with INTERVAL, BYDAY,
  BYMONTHDAY, COUNT, UNTIL as appropriate. specFromRrule round-trips
  for resuming/edit.
- When-step UI: frequency pills, "Every N days/weeks/…" interval,
  weekday picker (weekly), day-of-month input (monthly), end picker
  (Never / After N occurrences / On date), live human-readable
  summary preview.

QR pairing
- Throttle QR refresh to once per 25 s and detach the previous
  per-account session listener on Re-pair so listeners can't
  accumulate. The UI countdown was flicking every ~5 s because each
  Re-pair attached an extra listener — every Baileys QR event then
  triggered a fresh DB write + NOTIFY.

Tests (60 green total, +33 in this batch)
- recurrence.test.ts: extended to 25 tests covering interval,
  monthday, end conditions (COUNT/UNTIL), and round-trip parsing.
- date-picker.test.ts: 14 tests for splitDateTime / combineDateTime /
  validateScheduledAt (incl. the "click-too-fast" same-minute grace)
  and defaultFirstFireIso.
- /api/qr/[accountId] route.test.ts: 4 tests — 404 when no QR yet,
  404 on missing row, 200 with image/png + no-store + correct PNG
  bytes, and verifies the where-clause queries by accountId.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 01:22:22 +08:00
2b738383e4 feat: recurring reminders, fix QR pairing, account UX polish, tests
Reminders
- Add recurrence to wizard step 3 (None / Daily / Weekly+weekday picker /
  Monthly / Yearly). Build the RRULE client-side and thread it through
  the wizard URL state.
- Action stores rrule + scheduleKind="recurring" on insert.
- Bot reschedules the next occurrence after firing a recurring reminder
  using the existing rrule helpers in @cmbot/shared. One-off behavior
  unchanged.
- Add reminders.last_fired_at column to track last fire.

Pairing
- Move QR PNG out of the pg_notify payload (the 8000-byte limit was
  silently truncating it; QR never reached the web → "QR hang"). PNG
  now lives on whatsapp_accounts.last_qr_png; NOTIFY just signals
  {type: session.qr, accountId, ts}. Web fetches the bytes from a new
  read-only /api/qr/[accountId] route (allowed via middleware).
- handleStartPairing now stops any in-flight session before starting a
  fresh one — fixes Re-pair where session.start was a silent no-op and
  Baileys never re-emitted QR.
- Pair-live: countdown moved out from over the QR (it was overlapping
  the scan area); shown as a discrete progress bar above the QR.
- Add a "Save QR" download button.

Account detail page
- Pair / Unpair / Delete cards are themselves the trigger (form submit
  or DialogTrigger) — no inline buttons, whole card is clickable.
- Sync Groups Now card removed earlier; bot already auto-syncs.

Account list page
- Cards are the link target. A small floating Delete trigger (top-right
  trash icon) opens the destructive confirm dialog without blocking
  navigation on the rest of the card.

Tests
- recurrence.test.ts: 10 tests for buildRrule / kindFromRrule /
  describeRecurrence (incl. weekly day combos and BYDAY ordering).
- reminders.schema.test.ts: regression for the "Invalid datetime" bug —
  proves strict Zod .datetime() rejected luxon's offset ISO and the
  { offset: true } option accepts both forms.

Migration: 0004_next_prowler.sql
- whatsapp_accounts.last_qr_png (text)
- reminders.last_fired_at (timestamptz)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 01:01:31 +08:00
9437df74ee feat(web): split Add Account from Pair; add Unpair/Re-pair/Delete actions
Reshape the account lifecycle to match how operators actually want to
work the system:

- Add Account → creates a row with status='unpaired'. No QR yet; the
  operator lands on the detail page.
- Pair / Re-pair → transitions an unpaired account to status='pending'
  and opens the live QR flow. Works for first-time pair AND for re-pair
  of an account that was previously unpaired.
- Unpair → asks the bot to stop the live Baileys session and clean
  session files; sets status='unpaired' but KEEPS the row (and its
  reminders) so the operator can re-pair without retyping anything.
- Delete → permanently removes the account and cascades to its groups,
  reminders, run history.

Schema:
- whatsapp_groups.account_id and reminders.account_id now have
  ON DELETE CASCADE so deleting an account fans out cleanly.

UI:
- /accounts list shows everything except the transient 'pending' state.
- /accounts/[id] shows state-aware buttons: Pair (when unpaired/banned/
  disconnected), Sync + Unpair (when connected), Delete (always).
- /accounts/new is now an "Add Account" form (label only).

Other fixes:
- next.config.ts: allowedDevOrigins includes 192.168.0.253 +
  test/rexwa subdomains so Server Actions work across the LAN.
- packages/shared/src/rrule.ts: rrule@2.8.1 has no exports field and
  ships ESM that some bundlers can't resolve via default OR named
  import. Use createRequire to bridge — works under both NodeNext
  (bot runtime) and Turbopack (web SSR).
2026-05-10 00:27:33 +08:00
af21bc5599 feat(bot): add IPC handlers for pair / unpair / sync / send-test / schedule 2026-05-09 22:34:01 +08:00
abcf19b71a feat(bot): add IPC notify helper + command consumer skeleton 2026-05-09 22:31:40 +08:00