cm_whatsapp_bot_v1/README.md
yiekheng c906a9fa3a docs: refresh README + add docs/runbook.md for v1 sign-off
- README rewritten to reflect v1 reality: auth bootstrap, AES-GCM
  cookies, three-layer rate limit, duplicate-pair detection,
  logout-before-delete, journal-monotonic guard, the new test
  counts (482 web + 88 bot), and the right scripts (set-password,
  create-user). Drops the telegram-era 'Status' paragraph and the
  earlier 'Auth deferred' bullet.
- docs/runbook.md is a new manual end-to-end smoke checklist
  organised by section: pre-flight, auth bootstrap, user
  management, account pairing (incl. back→re-pair + duplicate-phone
  regression checks), reminder lifecycle (incl. triple-fire +
  reschedule regression checks), account lifecycle, sign-out +
  token-version kill, cross-tenant isolation, log sweep, plus a
  troubleshooting cheatsheet.

Closes P3/T23 + P3/T24.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:45:03 +08:00

# cm WhatsApp Reminder Bot
Self-hosted WhatsApp reminder bot. Pair multiple WhatsApp accounts via
a browser-based PWA, schedule recurring reminders to groups, and watch
the run history, all from a phone home-screen icon.
## Status
**v1 production-ready.** The web app at `wabot.04080616.xyz` is the
primary control surface; the Telegram bot has been removed.

What's working today:
- **Username + password auth** with role-based access (admin / user).
HttpOnly + Secure session cookies, encrypted with AES-256-GCM (so a
leaked cookie reveals nothing about userId / role) and bound to the
`OPERATOR_TOKEN_VERSION` env so a single env bump kills every
outstanding session.
- **Three-layer login rate limit** — per-IP + per-username (lower-cased
so case-rotation doesn't help) + a global backstop, so a
residential-proxy attacker can't brute-force one account by hopping IPs.
- **Self-hosted Next.js 16 PWA** — installable on a phone home screen.
Mobile-first single-row header with a slide-out drawer; desktop
sidebar. Login lives outside the shell on a bare-header surface.
- **Live QR pairing** — server-side Baileys session feeds the QR
payload directly into the browser via Server-Sent Events. Scan,
see "✅ Connected" within seconds, auto-redirect.
- **Duplicate-pair detection** — scanning a QR with a phone already
linked to another account row surfaces a clear "already paired as
&lt;label&gt;" message instead of fighting Baileys for the device.
- **Multi-account, multi-group reminders** — 5-step wizard
(Account → Message → When → Groups → Review) plus per-section edit
pages so you don't have to walk the wizard end-to-end to fix one
field. Recurrence picker covers Daily / Weekly / Monthly / Yearly
with multi-rule support and per-rule fire-time pickers; the rendered
description reads as plain English ("Every week on Mon, Wed, Fri at
09:00") not raw cron. Optional "Pause sending by" deadline that
defaults OFF — operators have to opt in explicitly.
- **Multi-message stacks** — a reminder can carry multiple ordered
parts (text + media), fired in sequence with a 1.5 s gap. Media
files can be swapped at any time from the Edit Message page.
- **Smart media handling** — per-kind WhatsApp size caps (5 MB image,
16 MB video/audio, 100 MB document). HEIC photos and `.mov` videos
fall back to the document delivery path so they reach the recipient
as a downloadable file instead of failing silently.
- **Swipe-to-act rows** — on mobile, swipe a reminder or activity
row left for Delete or right for Pause/Restart/Archive. iOS-Mail
style. Click vs drag is disambiguated by a 6-px tap threshold so a
swipe doesn't accidentally trigger the row's link.
- **Activity tab** — last 200 runs with status filters (Success /
Paused / Failed / Archived). Partial runs surface under both Paused
and Failed; Skipped runs collapse into Archived. Hard-delete and
archive both available; run history survives a reminder deletion.
- **Auto-reconnect on transient drops; restart-survival via Baileys
session persistence.** Pair once and the device stays linked across
container restarts. Logout-on-delete cleans the operator's
linked-devices list on the WhatsApp side too.
- **Hardened pg-boss scheduling** — three-tier dedupe so a triple-
click Save or microsecond-spaced enqueue doesn't fire a reminder
multiple times. Reschedule cancels stale jobs by singletonKey first
so a recurring next-fire never gets silently dropped.
- **Drizzle journal monotonicity guard** — `pnpm migrate` refuses to
run if the `_journal.json` `when` timestamps aren't strictly
increasing (a recurring foot-gun where drizzle would silently skip
a freshly-generated migration). Both the CI tests and the migrate
runner enforce this.
- **All actions audited.** Per-run target results (sent / failed /
skipped) preserved even when the underlying group is removed.

Test count: **482 web + 88 bot = 570** passing.
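
The journal monotonicity guard listed above can be sketched roughly as follows. This is an illustration of the check, not the repo's actual migrate runner: the `JournalEntry` shape is a trimmed assumption about drizzle's `_journal.json` (only `idx`, `when`, and `tag` are modelled), and `findNonMonotonicEntries` is a hypothetical name.

```typescript
// Trimmed, assumed shape of one entry in drizzle's _journal.json.
interface JournalEntry {
  idx: number;
  when: number; // generation timestamp (assumed to be epoch milliseconds)
  tag: string; // migration name (assumed to match the SQL file name)
}

interface Journal {
  entries: JournalEntry[];
}

// Hypothetical helper: returns one message per entry whose `when` is not
// strictly greater than the entry before it. An empty array means the
// journal is safe to migrate.
function findNonMonotonicEntries(journal: Journal): string[] {
  const problems: string[] = [];
  for (let i = 1; i < journal.entries.length; i++) {
    const prev = journal.entries[i - 1];
    const cur = journal.entries[i];
    if (cur.when <= prev.when) {
      problems.push(
        `${cur.tag} (when=${cur.when}) is not later than ${prev.tag} (when=${prev.when})`,
      );
    }
  }
  return problems;
}

// Example: the third entry carries an older timestamp than the second.
const journal: Journal = {
  entries: [
    { idx: 0, when: 1_746_000_000_000, tag: "0000_init" },
    { idx: 1, when: 1_746_100_000_000, tag: "0001_add_users" },
    { idx: 2, when: 1_746_050_000_000, tag: "0002_add_runs" },
  ],
};

const problems = findNonMonotonicEntries(journal);
if (problems.length > 0) {
  console.error("Refusing to migrate:\n" + problems.join("\n"));
}
```

A runner built this way would abort before applying anything when the list is non-empty, which matches the "refuses to run" behaviour described above.
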
## Host requirements
Only Docker. No host Node, pnpm, or any other language toolchain —
everything runs in containers via the long-lived `tools` sidecar.
## Architecture in one paragraph
Two app containers and one external dependency. `bot` (Node.js) holds
the live Baileys WhatsApp sessions, the pg-boss scheduler, and a
Postgres `LISTEN bot.command` consumer. `web` (Next.js 16 App Router
+ React 19) is stateless UI: Server Components for reads, Server
Actions for mutations, an SSE endpoint for live updates,
`@serwist/next` for the PWA shell. `tools` is a long-running
Node 22 + pnpm sidecar used for installs / tests / typechecks /
migrations so the host doesn't need a Node toolchain. Postgres lives
external at `192.168.0.210` in a `wabot` database. All cross-service
communication goes through Postgres (`LISTEN/NOTIFY` for events,
table writes for state).

Full design spec:
[`docs/superpowers/specs/2026-05-09-web-app-design.md`](docs/superpowers/specs/2026-05-09-web-app-design.md)
## Quick start (dev)
Prerequisites: Docker, the `wabot` database + `waBot` role on
`192.168.0.210` (with a `pg_hba.conf` line permitting
`192.168.0.0/24`).
```bash
# 1. Configure env
cp envs/.env.example .env.development
# edit .env.development: real DATABASE_URL, plus the LAN host to expose
scripts/gen_auth_secret.sh --write # writes AUTH_SECRET to .env.development
# 2. Bring up the stack, install deps
NO_SUDO=1 scripts/dev.sh up
NO_SUDO=1 scripts/dev.sh pnpm install
# 3. Apply migrations and seed the bootstrap operator row
NO_SUDO=1 scripts/db.sh migrate
NO_SUDO=1 scripts/db.sh seed
# 4. Set the bootstrap admin password (NO password is set by seed)
echo 'change-me-now' | scripts/set-password.sh admin
# 5. Open the web app and sign in as `admin` with the password above
# Local: http://localhost:9000
# LAN: http://<host-ip>:9000
# Public: https://wabot.04080616.xyz
```
Inside the app: `/settings/users` → Add user → invite teammates with
`user` role; promote / demote / reset password / delete from the same
page. The "Admin" nav entry is admin-only.

PWA install: phone Chrome → menu → "Install App" / "Add to Home
Screen". Launches fullscreen.

`NO_SUDO=1` is the right setting if your user is in the `docker`
group (the default for this repo). Drop it if you need `sudo docker`.
## Manual test runbook
End-to-end checks that unit tests can't cover (live Baileys,
WhatsApp delivery, swipe gestures):
[`docs/runbook.md`](docs/runbook.md).

The earlier wizard-only checklist still lives at
[`docs/superpowers/specs/manual-test-web.md`](docs/superpowers/specs/manual-test-web.md).
## Layout
- `apps/bot/` — Baileys WhatsApp + pg-boss scheduler + LISTEN/NOTIFY
command consumer
- `apps/web/` — Next.js 16 App Router PWA
- `packages/db/` — Drizzle schema and migrations
- `packages/shared/` — cross-app helpers (rrule, media paths,
timezones, WhatsApp media classifier)
- `docs/runbook.md` — manual end-to-end smoke checklist
- `docs/superpowers/specs/` — design specs and earlier manual test
runbooks
- `docs/superpowers/plans/` — implementation plans
- `docker/` — Dockerfiles (`tools.Dockerfile`, `bot.Dockerfile`,
`web.Dockerfile`)
- `scripts/` — `dev.sh`, `db.sh`, `gen_auth_secret.sh`,
`set-password.sh`, `create-user.sh`
## Scripts
All `pnpm`/`tsx`/`drizzle-kit` invocations run inside the `tools`
container, so no host Node is needed.

| Script | Purpose |
|---|---|
| `scripts/dev.sh up\|down\|logs\|status\|build\|exec\|pnpm\|shell\|restart-bot` | Stack lifecycle and tools-container shell |
| `scripts/db.sh migrate\|generate\|studio\|seed\|reset` | Drizzle migration helper |
| `scripts/gen_auth_secret.sh [--write]` | Generate `AUTH_SECRET` (host-only, no Node needed) |
| `scripts/set-password.sh <username>` | Set / reset a user's password (reads stdin) |
| `scripts/create-user.sh <username> <role>` | Create a user from CLI (admin / user) |

Set `NO_SUDO=1` if your user is in the docker group (recommended).
## Auth + admin model
- One bootstrap operator (`admin`) is created by the seed; its
password is set via `scripts/set-password.sh admin` on first launch.
- Two roles: `admin` (full access including user management) and
`user` (everything except `/settings/users`). Role-based nav
filtering is enforced in middleware + the AppShell + every server
action that mutates user state.
- Every user gets an isolated workspace — accounts, reminders,
groups, and run history all scope by `operator_id`. The admin
panel is the only cross-tenant surface.
- Sessions: AES-256-GCM-encrypted cookie keyed off `AUTH_SECRET`,
HttpOnly + Secure-in-prod + SameSite=Lax, 30-day TTL. The
`OPERATOR_TOKEN_VERSION` env (defaults to `"1"`) is the kill switch
— bumping it invalidates every outstanding cookie globally on the
next request.
- Login rate limits: 10 / 5 min per-IP + 5 / 15 min per-username + a
100 / min global backstop. The error message is identical for all
three, so a caller can't tell which limit tripped.
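
To make the session model above concrete, here is a minimal sketch of an AES-256-GCM cookie bound to a token version, using Node's built-in `crypto` module. Everything beyond what the list states is an assumption for illustration: the payload fields, the `iv | tag | ciphertext` layout, the SHA-256 key derivation from `AUTH_SECRET`, and the names `sealSession` / `openSession` are hypothetical and may differ from the repo's code.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Assumed payload shape; the real cookie may carry different fields.
interface SessionPayload {
  userId: string;
  role: "admin" | "user";
  tokenVersion: string; // copy of OPERATOR_TOKEN_VERSION at sign-in time
  expiresAt: number; // epoch milliseconds
}

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Assumption: derive a 32-byte key by hashing the secret with SHA-256.
function deriveKey(secret: string): Buffer {
  return createHash("sha256").update(secret).digest();
}

// Encrypts the payload; output layout is iv (12 bytes) | tag (16 bytes) | ciphertext.
function sealSession(payload: SessionPayload, secret: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", deriveKey(secret), iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(payload), "utf8"),
    cipher.final(),
  ]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64url");
}

// Returns the payload, or null if the cookie is malformed, tampered with,
// expired, or was issued under a different token version.
function openSession(
  cookie: string,
  secret: string,
  currentTokenVersion: string,
  now: number = Date.now(),
): SessionPayload | null {
  try {
    const raw = Buffer.from(cookie, "base64url");
    if (raw.length < 28) return null; // too short to hold iv + tag
    const decipher = createDecipheriv("aes-256-gcm", deriveKey(secret), raw.subarray(0, 12));
    decipher.setAuthTag(raw.subarray(12, 28));
    const json = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
    const payload = JSON.parse(json.toString("utf8")) as SessionPayload;
    if (payload.tokenVersion !== currentTokenVersion) return null; // kill switch
    if (payload.expiresAt <= now) return null; // TTL elapsed
    return payload;
  } catch {
    return null; // authentication failure or invalid JSON
  }
}

// Example: a cookie issued under version "1" stops working once the env is bumped to "2".
const secret = "dev-only-secret";
const cookie = sealSession(
  { userId: "u1", role: "admin", tokenVersion: "1", expiresAt: Date.now() + THIRTY_DAYS_MS },
  secret,
);
console.log(openSession(cookie, secret, "1")?.role); // admin
console.log(openSession(cookie, secret, "2")); // null
```

Because the version is compared on every request, no server-side session store is needed to revoke cookies; the trade-off is that revocation is global rather than per user.
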
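
The three login rate limits above could be combined along the following lines. This is an in-memory sketch under stated assumptions, not the repo's implementation: it counts every allowed attempt rather than only failures, it uses a sliding window, and the `SlidingWindow` and `LoginRateLimiter` names are hypothetical. A production version would also need to evict idle keys.

```typescript
interface Limit {
  max: number;
  windowMs: number;
}

// Thresholds as listed above.
const LIMITS = {
  perIp: { max: 10, windowMs: 5 * 60_000 },
  perUsername: { max: 5, windowMs: 15 * 60_000 },
  global: { max: 100, windowMs: 60_000 },
};

class SlidingWindow {
  private readonly limit: Limit;
  private readonly hits = new Map<string, number[]>();

  constructor(limit: Limit) {
    this.limit = limit;
  }

  // Drops expired timestamps for `key` and returns the ones still in the window.
  private live(key: string, now: number): number[] {
    const kept = (this.hits.get(key) ?? []).filter((t) => now - t < this.limit.windowMs);
    this.hits.set(key, kept);
    return kept;
  }

  isFull(key: string, now: number): boolean {
    return this.live(key, now).length >= this.limit.max;
  }

  record(key: string, now: number): void {
    this.live(key, now).push(now);
  }
}

class LoginRateLimiter {
  private readonly byIp = new SlidingWindow(LIMITS.perIp);
  private readonly byUsername = new SlidingWindow(LIMITS.perUsername);
  private readonly overall = new SlidingWindow(LIMITS.global);

  // Returns true if the attempt may proceed. On false the caller shows one
  // generic error, regardless of which layer tripped.
  allow(ip: string, username: string, now: number = Date.now()): boolean {
    const name = username.toLowerCase(); // defeats case rotation
    if (
      this.byIp.isFull(ip, now) ||
      this.byUsername.isFull(name, now) ||
      this.overall.isFull("*", now)
    ) {
      return false;
    }
    this.byIp.record(ip, now);
    this.byUsername.record(name, now);
    this.overall.record("*", now);
    return true;
  }
}

// Example: six attempts on one account from six different IPs, with case rotation.
const limiter = new LoginRateLimiter();
const results: boolean[] = [];
for (let i = 0; i < 6; i++) {
  results.push(limiter.allow(`203.0.113.${i}`, i % 2 === 0 ? "Admin" : "admin", 1_000 + i));
}
console.log(results); // [ true, true, true, true, true, false ]
```

The per-username layer is what stops the IP-hopping attacker in the example; the per-IP layer covers one address spraying many usernames, and the global layer bounds total load.
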
## Deferred
- **Standalone media library** browser (currently media is uploaded
per-reminder).
- **E2E browser tests** (Playwright) on the swipe and pairing flows.
- **Search-as-you-type in the wizard's groups picker** — at 3 000+
groups per account the picker still loads the alphabetical
top-200; operators with >200 groups need to use the list page's
search to find anything past 'L'.
- **Composite index on `(account_id, name)`** for the groups list
page's `ORDER BY name LIMIT 200` query — currently a sort + limit;
the GIN trigram on `name` plus the unique on `(account_id,
wa_group_jid)` already cover most cases.
- **Self-service password reset** (email link, etc.) — out of scope
for v1; admins use the Users page.