Manual end-to-end runbook (v1)
Smoke checklist for verifying a fresh deploy. Unit tests don't catch the live-Baileys / live-Postgres / browser-gesture path; this is what you run before declaring a release good.
Time budget: ~10 minutes if everything works, ~30 if a step fails.
Pre-flight
- Stack up. `docker ps | grep cmbot` → expect `cmbot-tools`, `cmbot-bot`, `cmbot-web` all `Up`.
- Migrations clean. `NO_SUDO=1 scripts/db.sh migrate` → "Migrations applied." (and not "Refusing to run drizzle migrate"; that's the journal monotonicity guard tripping).
- Web reachable. `curl -sf http://localhost:9000/api/health` → 200.
- Bot reachable. `curl -sf http://localhost:8081/health` → 200.
If any pre-flight fails, fix before continuing.
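
The container and health checks above can be bundled into one function. This is a minimal sketch, not a script that ships with the repo: the function names `containers_up`, `health_ok`, and `preflight` are hypothetical, and the container names and URLs are copied from the checklist. The migration step is left out on purpose because `scripts/db.sh migrate` changes state.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; names and URLs mirror the checklist above.

# Succeeds only if every named container is listed by `docker ps` with an "Up" status.
containers_up() {
  local listing name state
  listing=$(docker ps --format '{{.Names}} {{.Status}}')
  for name in "$@"; do
    # Pick the status text of the container whose name matches exactly.
    state=$(printf '%s\n' "$listing" |
      awk -v n="$name" '$1 == n { $1 = ""; sub(/^ /, ""); print; exit }')
    case "$state" in
      Up*) echo "ok   $name ($state)" ;;
      *)   echo "FAIL $name (${state:-not running})"; return 1 ;;
    esac
  done
}

# Succeeds only if the URL answers with HTTP 200.
health_ok() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
  [ "$code" = "200" ]
}

preflight() {
  containers_up cmbot-tools cmbot-bot cmbot-web || return 1
  health_ok http://localhost:9000/api/health || { echo "FAIL web health"; return 1; }
  health_ok http://localhost:8081/health || { echo "FAIL bot health"; return 1; }
  echo "pre-flight ok"
}
```

Source the file and call `preflight`; a non-zero return means stop and fix before continuing.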
1. Auth bootstrap
- `scripts/db.sh seed` (idempotent; only inserts the `admin` operator if missing).
- `echo 'change-me-now' | scripts/set-password.sh admin` → "Password updated."
- Open `http://localhost:9000/login` → enter `admin` / the password → redirected to `/`.
- Wrong password three times in a row still rate-limits but with the generic "Too many attempts" message; no leak about which limit (IP / username / global) tripped.
- Hit `/admin` URL while signed out → redirected to `/login` with `?next=/admin`. After a successful login, lands back on `/admin`.
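
The signed-out redirect in the last item can also be checked without a browser. The sketch below assumes the app answers with a plain HTTP redirect (not a client-side one) and that `next` may arrive either raw or percent-encoded; `redirect_target` and `redirects_to_login` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; BASE_URL defaults to the web address used in this runbook.
BASE_URL="${BASE_URL:-http://localhost:9000}"

# Prints the URL that an unauthenticated request to path $1 would be redirected to.
# curl's %{redirect_url} is empty when the response is not a redirect.
redirect_target() {
  curl -s -o /dev/null -w '%{redirect_url}' "${BASE_URL}$1"
}

# Succeeds when the redirect points at /login and carries the original path in ?next=.
redirects_to_login() {
  local path="$1" target encoded
  target=$(redirect_target "$path")
  encoded=$(printf '%s' "$path" | sed 's|/|%2F|g')
  case "$target" in
    *"/login?next=${path}" | *"/login?next=${encoded}") return 0 ;;
    *) echo "unexpected redirect for ${path}: ${target:-<none>}"; return 1 ;;
  esac
}
```

Call `redirects_to_login /admin` with no session cookie. The post-login bounce back to `/admin` still needs the browser check.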
2. User management (admin-only)
- Sidebar / drawer: only one nav entry highlights at a time. On `/settings/users`, only `Admin` lights up; `Settings` does not.
- `/settings/users` → Add user → username `alice`, password `alpha7!`, role `user` → "User created."
- `alice` row shows: username + `you` chip if applicable, role pill, Promote / Reset / Delete buttons on row 2.
- Promote `alice` to admin → page revalidates, badge flips to `admin`.
- Demote back to `user`.
- Last-admin guard: Demote / Delete on the only remaining admin row are both disabled.
- Delete `alice` via the confirm dialog (Cancel + Delete user buttons; no third "Close" button. The static guard test catches that regression but eyeball it anyway).
3. Account pairing
- `/accounts` → New Account → label `WaBot Test` → Pair WhatsApp. Land on the live QR page within ~2 s.
- Login screen header is JUST the centered brand mark: no nav, no menu drawer.
- Scan with WhatsApp → "Linked Devices" → "Link a device".
- Connection success. Page transitions through `qr` → (brief `restart-required` close handled silently) → `connected` with a green check and `+60xxx` phone number → auto-redirect to `/accounts/<id>` after 3 s.
- Refresh Groups button on `/accounts/<id>/groups` → spinner during the sync, page auto-refreshes when the bot pushes `groups.synced` over SSE. No manual reload needed.
Pair regression checks (these caught real bugs)
- Back → Re-pair: from a live QR, click ← Back → Pair again from the account detail page. Should NOT instantly flash "Pairing timed out". A new QR appears and the countdown restarts at 5:00.
- Duplicate phone: with one phone already paired, scan its QR from a second account row → see the amber "Phone already linked" panel naming the existing account. The original account's session stays intact.
4. Reminder lifecycle
- `/reminders` → New Reminder → walk the wizard:
  - Step 1: pick `WaBot Test`.
  - Step 2: enter a short text message ("smoke test <timestamp>").
  - Step 3: pick `Daily` recurrence, fire ~2 minutes from now. Confirm "Pause sending by" checkbox is unchecked by default.
  - Step 4: select 1 group.
  - Step 5: review → Save.
- Reminder appears on `/reminders` with status `Active`. Recurrence column shows the human-readable description; long descriptions truncate with `…`.
- Wait for the fire window. When the time hits, the message lands in the WhatsApp group exactly once.
- `/activity` → the run shows under `Success`. Default tab is Success (no `All` tab).
- Swipe-left a row → Delete shelf appears. Swipe-right → Pause / Restart shelf. Tapping a row navigates to its detail; dragging does NOT navigate (6-px threshold).
- Pause the reminder → status flips to `Paused` immediately and the next-fire-time disappears.
- Restart → fires on the next scheduled occurrence.
Reminder regression checks
- Triple-fire repro (only if you have a tame group): edit the reminder repeatedly within microseconds of each other (e.g. the wizard Save button hammered three times). The message must land exactly once. The bot logs should show "duplicate fire detected inside mutex" warnings on the second and third attempts.
- Reschedule under existing job: edit a recurring reminder's schedule to a NEW time before its next-fire arrives. The new time must fire (the old `created` job is now `cancelled` in `pgboss.job`; verify with `select state, count(*) from pgboss.job where name='reminder.fire' group by state`).
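
The verification query from the reschedule check can be wrapped so the result is judged by a script rather than by eye. This sketch assumes `psql` is installed, that a `DATABASE_URL` variable points at the stack's Postgres, and that only one reminder is active during the smoke run; none of these come from the runbook, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; DATABASE_URL and the single-active-reminder rule are assumptions.

# Prints one "state count" line per job state for the reminder.fire queue.
reminder_job_states() {
  psql "$DATABASE_URL" -At -F ' ' -c \
    "select state, count(*) from pgboss.job where name='reminder.fire' group by state order by state"
}

# Reads "state count" lines on stdin. Succeeds when exactly one job is waiting
# in state `created` and at least one older job has been `cancelled`.
rescheduled_cleanly() {
  awk '
    $1 == "created"   { created = $2 }
    $1 == "cancelled" { cancelled = $2 }
    END { exit !(created == 1 && cancelled >= 1) }
  '
}
```

Usage after editing the schedule: `reminder_job_states | rescheduled_cleanly && echo "reschedule ok"`. With more than one active reminder, the `created == 1` condition has to be relaxed.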
5. Account lifecycle
- Unpair the account from `/accounts/<id>`. Confirm dialog (Cancel + Yes, unpair). The account row stays in the list with "Unpaired" status; groups disappear from the picker (they're soft-archived, not deleted).
- Re-pair the same account → groups come back via the on-conflict upsert flipping `is_archived` back to false.
- Delete the account from `/accounts/<id>` → Confirm dialog → the account vanishes from `/accounts`. Check the phone's WhatsApp Linked Devices list: the entry is gone (the logout-before-stop flow tells WhatsApp to drop it).
6. Sign-out + session lifetime
- Sign out from the sidebar / drawer footer → land on `/login`.
- Hit any protected URL → redirected to login.
- Token-version kill switch: set `OPERATOR_TOKEN_VERSION=2` in `.env.development`, restart the web container. Every previously-issued cookie is now invalid; every authenticated request bounces to `/login`. Reset to `1` after.
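
Flipping the token version by hand is easy to get wrong (a duplicated key, a forgotten reset). The helper below is a minimal sketch of the file edit only; `set_env_var` is a hypothetical name, and restarting the web container is left to whatever the stack normally uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sets KEY=VALUE in an env file, replacing the existing
# line if the key is present and appending a new line otherwise.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  if grep -q "^${key}=" "$file"; then
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    # Add a newline first if the file does not already end with one.
    if [ -n "$(tail -c 1 "$file")" ]; then
      echo >> "$tmp"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
  fi
  # Write back through cat so the original file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For the kill-switch check: `set_env_var .env.development OPERATOR_TOKEN_VERSION 2`, restart the web container, confirm the old cookie bounces to `/login`, then run the same command with `1` and restart again.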
7. Cross-tenant isolation
- Sign in as `admin`. Note dashboard counter values.
- As admin, create a second user `bob` and give them a fresh account / reminder / fire it once.
- Sign out, sign in as `bob`. Dashboard counters MUST show only bob's numbers (not admin's). `/reminders` lists only bob's reminders. `/accounts` only bob's accounts.
8. Sweep
- `docker logs cmbot-web --since 10m | grep -iE 'error|⨯'`: no output (or only Baileys "Stream Errored (restart required)" noise; that's upstream).
- `docker logs cmbot-bot --since 10m | grep -iE 'error|fatal'`: no output beyond the same Baileys upstream noise.
- `git status` clean (no leftover `_check.ts` or temp files).
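
The two log greps can be turned into a pass/fail check that already filters the known Baileys noise. This is a sketch under two assumptions: the noise line always contains the literal text "Stream Errored (restart required)", and container stderr should be searched too (`docker logs` replays the container's stderr on its own stderr, so `2>&1` is needed for grep to see it). `sweep_logs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints suspicious log lines from the last 10 minutes.
# Returns 0 when nothing is left after filtering, 1 otherwise.
sweep_logs() {
  local container="$1" pattern="$2" hits
  hits=$(docker logs --since 10m "$container" 2>&1 |
    grep -iE "$pattern" |
    grep -vF 'Stream Errored (restart required)')
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
}
```

Usage: `sweep_logs cmbot-web 'error|⨯' && sweep_logs cmbot-bot 'error|fatal' && echo "logs clean"`.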
When a step fails
- Migration refused with "Refusing to run drizzle migrate": open `packages/db/migrations/meta/_journal.json` and bump the flagged entry's `when` to the suggested value. Re-run.
- Pair shows immediate timeout: bot logs should mention "ignoring close from previous attempt while warming up". That's the fix working, but check a stale Baileys session isn't gummed up. Last resort: `rm -rf dev-data/sessions/<accountId>` and re-pair.
- Reminder fires twice: check `pgboss.queue.policy` for `reminder.fire`; it must be `standard`, not `stately` (stately drops reschedules silently). The `registerReminderJobs` boot hook force-flips this on every bot start.
- Delete didn't remove the linked-device entry on the phone: the bot's `socket.logout()` is best-effort. If the socket was already disconnected when delete fired, the operator removes the entry manually from WhatsApp's UI.
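
For the first item, it can help to see which journal entry is out of order before editing the file. The sketch below assumes the guard requires each entry's `when` to be strictly greater than the previous one and that `_journal.json` stores `when` as a plain integer; the guard's real rule lives in `scripts/db.sh` and may differ. `journal_monotonic` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the `when` values of a drizzle journal in file order
# and fails at the first value that is not greater than its predecessor.
journal_monotonic() {
  grep -oE '"when"[[:space:]]*:[[:space:]]*[0-9]+' "$1" |
    grep -oE '[0-9]+$' |
    awk '
      NR > 1 && $1 + 0 <= prev + 0 {
        printf "position %d: when=%s is not after %s\n", NR - 1, $1, prev
        bad = 1
        exit
      }
      { prev = $1 }
      END { exit bad }
    '
}
```

Usage: `journal_monotonic packages/db/migrations/meta/_journal.json`. Positions are zero-based and follow the order of entries in the file.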
If any of the regression checks (Back→Re-pair, duplicate phone, triple-fire, reschedule) fail, that's a real bug — capture the bot log and file an issue before shipping.