# cm WhatsApp Reminder Bot
Self-hosted WhatsApp reminder bot. Pair multiple WhatsApp accounts via
a browser-based PWA, schedule recurring reminders to groups, and watch
the run history, all from a phone home-screen icon.
## Status
**v1 production-ready.** The web app at `wabot.04080616.xyz` is the
primary control surface; the Telegram bot has been removed.

What's working today:

- **Username + password auth** with role-based access (admin / user).
HttpOnly + Secure session cookies, encrypted with AES-256-GCM (so a
leaked cookie reveals nothing about userId / role) and bound to the
`OPERATOR_TOKEN_VERSION` env so a single env bump kills every
outstanding session.
- **Three-layer login rate limit** — per-IP + per-username (lower-cased
so case-rotation doesn't help) + a global backstop, so a
residential-proxy attacker can't brute-force one account by hopping IPs.
- **Self-hosted Next.js 16 PWA** — installable on a phone home screen.
Mobile-first single-row header with a slide-out drawer; desktop
sidebar. Login lives outside the shell on a bare-header surface.
- **Live QR pairing** — server-side Baileys session feeds the QR
payload directly into the browser via Server-Sent Events. Scan,
see "✅ Connected" within seconds, auto-redirect.
- **Duplicate-pair detection** — scanning a QR with a phone already
linked to another account row surfaces a clear "already paired as
&lt;label&gt;" message instead of fighting Baileys for the device.
- **Multi-account, multi-group reminders** — 5-step wizard
(Account → Message → When → Groups → Review) plus per-section edit
pages so you don't have to walk the wizard end-to-end to fix one
field. Recurrence picker covers Daily / Weekly / Monthly / Yearly
with multi-rule support and per-rule fire-time pickers; the rendered
description reads as plain English ("Every week on Mon, Wed, Fri at
09:00") not raw cron. Optional "Pause sending by" deadline that
defaults OFF — operators have to opt in explicitly.
- **Multi-message stacks** — a reminder can carry multiple ordered
parts (text + media), fired in sequence with a 1.5 s gap. Media
files can be swapped at any time from the Edit Message page.
- **Smart media handling** — per-kind WhatsApp size caps (5 MB image,
16 MB video/audio, 100 MB document). HEIC photos and `.mov` videos
fall back to the document delivery path so they reach the recipient
as a downloadable file instead of failing silently.
- **Swipe-to-act rows** — on mobile, swipe a reminder or activity
row left for Delete or right for Pause/Restart/Archive. iOS-Mail
style. Click vs drag is disambiguated by a 6-px tap threshold so a
swipe doesn't accidentally trigger the row's link.
- **Activity tab** — last 200 runs with status filters (Success /
Paused / Failed / Archived). Partial runs surface under both Paused
and Failed; Skipped runs collapse into Archived. Hard-delete and
archive both available; run history survives a reminder deletion.
- **Auto-reconnect on transient drops; restart-survival via Baileys
session persistence.** Pair once and the device stays linked across
container restarts. Logout-on-delete cleans the operator's
linked-devices list on the WhatsApp side too.
- **Hardened pg-boss scheduling** — three-tier dedupe so a
triple-click Save or microsecond-spaced enqueue doesn't fire a reminder
multiple times. Reschedule cancels stale jobs by singletonKey first
so a recurring next-fire never gets silently dropped.
- **Drizzle journal monotonicity guard** — `pnpm migrate` refuses to
run if the `_journal.json` `when` timestamps aren't strictly
increasing (a recurring foot-gun where drizzle would silently skip
a freshly-generated migration). CI tests and the migrate runner both
enforce this.
- **All actions audited.** Per-run target results (sent / failed /
skipped) preserved even when the underlying group is removed.

Test count: **482 web + 88 bot = 570** passing.
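
The size caps and document fallback from the smart media handling item above could be expressed as a small classifier. This is only a sketch: `classifyMedia`, `SIZE_CAP_BYTES`, the extension-based HEIC / `.mov` detection, and the 1 MB = 1024 × 1024 bytes convention are all assumptions for illustration, not the project's actual `packages/shared/` code.

```typescript
// Hypothetical names throughout; only the caps and the HEIC/.mov fallback
// come from the feature list above.
type MediaKind = "image" | "video" | "audio" | "document";

// Assumption: 1 MB is treated as 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Per-kind size caps, in bytes.
const SIZE_CAP_BYTES: Record<MediaKind, number> = {
  image: 5 * MB,
  video: 16 * MB,
  audio: 16 * MB,
  document: 100 * MB,
};

// Assumption: these formats are recognised by file extension alone.
const FORCE_DOCUMENT_EXTENSIONS = new Set(["heic", "mov"]);

type Classification =
  | { ok: true; kind: MediaKind }
  | { ok: false; kind: MediaKind; reason: string };

function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
}

function kindFromMime(mimeType: string): MediaKind {
  if (mimeType.startsWith("image/")) return "image";
  if (mimeType.startsWith("video/")) return "video";
  if (mimeType.startsWith("audio/")) return "audio";
  return "document";
}

function classifyMedia(fileName: string, mimeType: string, sizeBytes: number): Classification {
  // HEIC photos and .mov videos are routed to the document path up front,
  // so they are checked against the document cap rather than image/video.
  const kind: MediaKind = FORCE_DOCUMENT_EXTENSIONS.has(extensionOf(fileName))
    ? "document"
    : kindFromMime(mimeType);
  const cap = SIZE_CAP_BYTES[kind];
  if (sizeBytes > cap) {
    return { ok: false, kind, reason: `${kind} exceeds the ${cap / MB} MB cap` };
  }
  return { ok: true, kind };
}
```

Under these assumptions a 50 MB `.mov` is accepted as a document, while a 50 MB `.mp4` is rejected against the 16 MB video cap.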
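
The journal monotonicity guard from the list above boils down to one pass over the journal entries. A minimal sketch, assuming each entry in `_journal.json` exposes a numeric `when` and a `tag`; the function names are hypothetical and the real check in the migrate runner may differ.

```typescript
// Only the fields this check needs; a real journal entry carries more.
interface JournalEntry {
  idx: number;
  when: number; // creation timestamp in milliseconds
  tag: string; // migration file name without extension
}

interface Journal {
  entries: JournalEntry[];
}

// Returns one message per entry whose `when` is not strictly greater than
// the entry before it. An empty array means the journal is safe to run.
function findNonMonotonicEntries(journal: Journal): string[] {
  const problems: string[] = [];
  for (let i = 1; i < journal.entries.length; i++) {
    const prev = journal.entries[i - 1];
    const cur = journal.entries[i];
    if (cur.when <= prev.when) {
      problems.push(
        `${cur.tag} (when=${cur.when}) is not later than ${prev.tag} (when=${prev.when})`,
      );
    }
  }
  return problems;
}

// Hypothetical entry point: throw before any migration is applied.
function assertJournalMonotonic(journal: Journal): void {
  const problems = findNonMonotonicEntries(journal);
  if (problems.length > 0) {
    throw new Error(`Refusing to migrate, journal is not monotonic:\n${problems.join("\n")}`);
  }
}
```

Note that equal timestamps are rejected as well, matching the "strictly increasing" wording above.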
## Host requirements
Only Docker. No host Node, pnpm, or any other language toolchain —
everything runs in containers via the long-lived `tools` sidecar.
## Architecture in one paragraph
Two app containers, a tooling sidecar, and one external dependency. `bot` (Node.js) holds
the live Baileys WhatsApp sessions, the pg-boss scheduler, and a
Postgres `LISTEN bot.command` consumer. `web` (Next.js 16 App Router
+ React 19) is stateless UI: Server Components for reads, Server
Actions for mutations, an SSE endpoint for live updates,
`@serwist/next` for the PWA shell. `tools` is a long-running
Node 22 + pnpm sidecar used for installs / tests / typechecks /
migrations so the host doesn't need a Node toolchain. Postgres lives
external at `192.168.0.210` in a `wabot` database. All cross-service
communication goes through Postgres (`LISTEN/NOTIFY` for events,
table writes for state).

Full design spec:
[`docs/superpowers/specs/2026-05-09-web-app-design.md`](docs/superpowers/specs/2026-05-09-web-app-design.md)
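
To make the `LISTEN/NOTIFY` hand-off concrete, here is a rough sketch of how a command might travel from `web` to `bot`. The command types and helper names are hypothetical; only the channel name comes from the architecture paragraph. It relies on two Postgres details: a `NOTIFY` payload must stay under 8000 bytes in the default configuration, and a channel name containing a dot has to be double-quoted when used in a `LISTEN` statement.

```typescript
const CHANNEL = "bot.command";

// Postgres rejects NOTIFY payloads of 8000 bytes or more by default.
const MAX_PAYLOAD_BYTES = 7999;

// Hypothetical command shapes, for illustration only.
type BotCommand =
  | { type: "pair-account"; accountId: string }
  | { type: "reschedule-reminder"; reminderId: string };

// Builds a parameterised query; the { text, values } shape is what a
// node-postgres style client would accept (an assumption about the driver).
function buildNotify(command: BotCommand): { text: string; values: [string, string] } {
  const payload = JSON.stringify(command);
  const bytes = new TextEncoder().encode(payload).length;
  if (bytes > MAX_PAYLOAD_BYTES) {
    throw new Error(`NOTIFY payload is ${bytes} bytes; write it to a table instead`);
  }
  return { text: "SELECT pg_notify($1, $2)", values: [CHANNEL, payload] };
}

// The bot side would issue this once per connection. The quotes matter:
// an unquoted bot.command is not a valid identifier.
const LISTEN_SQL = `LISTEN "${CHANNEL}"`;

// Validates an incoming payload; anything unrecognised is dropped as null.
function parseCommand(payload: string): BotCommand | null {
  try {
    const value: unknown = JSON.parse(payload);
    if (typeof value !== "object" || value === null) return null;
    const record = value as Record<string, unknown>;
    if (record.type === "pair-account" && typeof record.accountId === "string") {
      return { type: "pair-account", accountId: record.accountId };
    }
    if (record.type === "reschedule-reminder" && typeof record.reminderId === "string") {
      return { type: "reschedule-reminder", reminderId: record.reminderId };
    }
    return null;
  } catch {
    return null;
  }
}
```

Keeping the payload to identifiers and leaving the actual state in tables fits the split described above: `NOTIFY` carries the event, table writes carry the state.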
## Quick start (dev)
Prerequisites: Docker, the `wabot` database + `waBot` role on
`192.168.0.210` (with a `pg_hba.conf` line permitting
`192.168.0.0/24`).
```bash
# 1. Configure env
cp envs/.env.example .env.development
# edit .env.development: real DATABASE_URL, plus the LAN host to expose
scripts/gen_auth_secret.sh --write # writes AUTH_SECRET to .env.development
# 2. Bring up the stack, install deps
NO_SUDO=1 scripts/dev.sh up
NO_SUDO=1 scripts/dev.sh pnpm install
# 3. Apply migrations and seed the bootstrap operator row
NO_SUDO=1 scripts/db.sh migrate
NO_SUDO=1 scripts/db.sh seed
# 4. Set the bootstrap admin password (NO password is set by seed)
echo 'change-me-now' | scripts/set-password.sh admin
# 5. Open the web app and sign in as `admin` with the password above
# Local: http://localhost:9000
# LAN: http://<host-ip>:9000
# Public: https://wabot.04080616.xyz
```
Inside the app: `/settings/users` → Add user → invite teammates with
`user` role; promote / demote / reset password / delete from the same
page. The "Admin" nav entry is admin-only.

PWA install: phone Chrome → menu → "Install App" / "Add to Home
Screen". Launches fullscreen.

`NO_SUDO=1` is the right setting if your user is in the `docker`
group (the default for this repo). Drop it if you need `sudo docker`.
## Deploying
- **Local dev** — `NO_SUDO=1 scripts/dev.sh up` (described in Quick
start above).
- **Portainer** — push images with `scripts/publish.sh`, then deploy
the [`docker-compose.portainer.yml`](docker-compose.portainer.yml)
stack via the Portainer UI. Full walk-through:
[`docs/deploy-portainer.md`](docs/deploy-portainer.md).
## Manual test runbook
End-to-end checks that unit tests can't cover (live Baileys,
WhatsApp delivery, swipe gestures):
[`docs/runbook.md`](docs/runbook.md).

The earlier wizard-only checklist still lives at
[`docs/superpowers/specs/manual-test-web.md`](docs/superpowers/specs/manual-test-web.md).
## Layout
- `apps/bot/` — Baileys WhatsApp + pg-boss scheduler + LISTEN/NOTIFY
command consumer
- `apps/web/` — Next.js 16 App Router PWA
- `packages/db/` — Drizzle schema and migrations
- `packages/shared/` — cross-app helpers (rrule, media paths,
timezones, WhatsApp media classifier)
- `docs/runbook.md` — manual end-to-end smoke checklist
- `docs/superpowers/specs/` — design specs and earlier manual test
runbooks
- `docs/superpowers/plans/` — implementation plans
- `docker/` — Dockerfiles (`tools.Dockerfile`, `bot.Dockerfile`,
`web.Dockerfile`)
- `scripts/` — `dev.sh`, `db.sh`, `gen_auth_secret.sh`,
`set-password.sh`, `create-user.sh`
## Scripts
All `pnpm`/`tsx`/`drizzle-kit` invocations run inside the `tools`
container, so no host Node is needed.
| Script | Purpose |
|---|---|
| `scripts/dev.sh up\|down\|logs\|status\|build\|exec\|pnpm\|shell\|restart-bot` | Stack lifecycle and tools-container shell |
| `scripts/db.sh migrate\|generate\|studio\|seed\|reset` | Drizzle migration helper |
| `scripts/gen_auth_secret.sh [--write]` | Generate `AUTH_SECRET` (host-only, no Node needed) |
| `scripts/set-password.sh <username>` | Set / reset a user's password (reads stdin) |
| `scripts/create-user.sh <username> <role>` | Create a user from CLI (admin / user) |

Set `NO_SUDO=1` if your user is in the docker group (recommended).
## Auth + admin model
- One bootstrap operator (`admin`) is created by the seed; its
password is set via `scripts/set-password.sh admin` on first launch.
- Two roles: `admin` (full access including user management) and
`user` (everything except `/settings/users`). Role-based nav
filtering is enforced in middleware + the AppShell + every server
action that mutates user state.
- Every user gets an isolated workspace — accounts, reminders,
groups, and run history all scope by `operator_id`. The admin
panel is the only cross-tenant surface.
- Sessions: AES-256-GCM-encrypted cookie keyed off `AUTH_SECRET`,
HttpOnly + Secure-in-prod + SameSite=Lax, 30-day TTL. The
`OPERATOR_TOKEN_VERSION` env (defaults to `"1"`) is the kill switch
— bumping it invalidates every outstanding cookie globally on the
next request.
- Login rate limits: 10 / 5 min per-IP + 5 / 15 min per-username + a
100 / min global backstop. The error message is identical for all
three, so a client can't tell which limit tripped.
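
As an illustration of the session bullet above, the sketch below seals a session into an AES-256-GCM cookie value and ties it to the token version. Several things here are assumptions rather than the app's actual scheme: the key is derived by hashing `AUTH_SECRET` with SHA-256, the token version is bound as additional authenticated data, the cookie layout is `iv | tag | ciphertext` in base64url, and the names `sealSession` / `openSession` are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

interface Session {
  userId: string;
  role: "admin" | "user";
  expiresAt: number; // epoch milliseconds; the app's TTL is 30 days
}

// Assumption: a plain SHA-256 of the secret yields the 32-byte key.
function deriveKey(authSecret: string): Buffer {
  return createHash("sha256").update(authSecret).digest();
}

function sealSession(session: Session, authSecret: string, tokenVersion: string): string {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every cookie
  const cipher = createCipheriv("aes-256-gcm", deriveKey(authSecret), iv);
  // The version is authenticated but not stored: a different version on
  // the server makes the auth tag check fail.
  cipher.setAAD(Buffer.from(tokenVersion, "utf8"));
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(session), "utf8"),
    cipher.final(),
  ]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64url");
}

// Returns null for anything tampered with, expired, or sealed under a
// different secret or token version.
function openSession(
  cookie: string,
  authSecret: string,
  tokenVersion: string,
  now: number = Date.now(),
): Session | null {
  try {
    const raw = Buffer.from(cookie, "base64url");
    if (raw.length < 28) return null; // 12-byte IV + 16-byte tag at minimum
    const decipher = createDecipheriv("aes-256-gcm", deriveKey(authSecret), raw.subarray(0, 12));
    decipher.setAAD(Buffer.from(tokenVersion, "utf8"));
    decipher.setAuthTag(raw.subarray(12, 28));
    const plaintext = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
    const session = JSON.parse(plaintext.toString("utf8")) as Session;
    return session.expiresAt > now ? session : null;
  } catch {
    return null;
  }
}
```

With this layout, bumping `OPERATOR_TOKEN_VERSION` from `"1"` to `"2"` makes every existing cookie fail authentication on the next request, without the server having to track individual sessions.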
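
The three login limits in the last bullet can be pictured as three counters that every attempt passes through. The following is a minimal in-memory sketch with a sliding window; the storage, the window algorithm, the helper names, and the error wording are assumptions, and only the numeric limits and the lower-casing of usernames come from this README.

```typescript
interface Limit {
  max: number;
  windowMs: number;
}

const MINUTE = 60000;

// Limits as listed above.
const LIMITS = {
  ip: { max: 10, windowMs: 5 * MINUTE },
  username: { max: 5, windowMs: 15 * MINUTE },
  global: { max: 100, windowMs: 1 * MINUTE },
} as const;

// Hypothetical wording; the point is that all three layers share it.
const RATE_LIMIT_MESSAGE = "Too many attempts. Try again later.";

// Records a hit for `key` and reports whether the key is still within its limit.
function createWindow(limit: Limit): (key: string, now: number) => boolean {
  const hits = new Map<string, number[]>();
  return (key, now) => {
    const cutoff = now - limit.windowMs;
    const recent = (hits.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(now);
    hits.set(key, recent);
    return recent.length <= limit.max;
  };
}

type LoginCheck = { allowed: true } | { allowed: false; message: string };

function createLoginLimiter(): (ip: string, username: string, now?: number) => LoginCheck {
  const byIp = createWindow(LIMITS.ip);
  const byUsername = createWindow(LIMITS.username);
  const everyone = createWindow(LIMITS.global);
  return (ip, username, now = Date.now()) => {
    // Evaluate all three layers so each attempt is counted everywhere,
    // even when an earlier layer has already tripped.
    const ipOk = byIp(ip, now);
    const userOk = byUsername(username.toLowerCase(), now);
    const globalOk = everyone("*", now);
    return ipOk && userOk && globalOk
      ? { allowed: true }
      : { allowed: false, message: RATE_LIMIT_MESSAGE };
  };
}
```

Because the username key is lower-cased and independent of the IP key, five attempts on `alice` spread across five addresses still exhaust the per-username budget, which is the IP-hopping case the limits are meant to cover.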
## Deferred
- **Standalone media library** browser (currently media is uploaded
per-reminder).
- **E2E browser tests** (Playwright) on the swipe and pairing flows.
- **Search-as-you-type in the wizard's groups picker** — at 3 000+
groups per account the picker still loads the alphabetical
top-200; operators with >200 groups need to use the list page's
search to find anything past 'L'.
- **Self-service password reset** (email link, etc.) — out of scope
for v1; admins use the Users page.