cm_whatsapp_bot_v1/README.md
yiekheng 49f5c16b19 fix(docker): reuse node user instead of creating gid 1000 — unblocks publish
2026-05-10 22:09:12 +08:00


cm WhatsApp Reminder Bot

Self-hosted WhatsApp reminder bot. Pair multiple WhatsApp accounts via a browser-based PWA, schedule recurring reminders to groups, and watch the run history, all from a phone home-screen icon.

Status

v1 production-ready. The web app at wabot.04080616.xyz is the primary control surface; the Telegram bot has been removed.

What's working today:

  • Username + password auth with role-based access (admin / user). HttpOnly + Secure session cookies, encrypted with AES-256-GCM (so a leaked cookie reveals nothing about userId / role) and bound to the OPERATOR_TOKEN_VERSION env so a single env bump kills every outstanding session.
  • Three-layer login rate limit: per-IP + per-username (lower-cased so case-rotation doesn't help) + a global backstop, so a residential-proxy attacker can't brute-force one account by hopping IPs.
  • Self-hosted Next.js 16 PWA — installable on a phone home screen. Mobile-first single-row header with a slide-out drawer; desktop sidebar. Login lives outside the shell on a bare-header surface.
  • Live QR pairing: a server-side Baileys session feeds the QR payload directly into the browser via Server-Sent Events. Scan, see "Connected" within seconds, auto-redirect.
  • Duplicate-pair detection — scanning a QR with a phone already linked to another account row surfaces a clear "already paired as <label>" message instead of fighting Baileys for the device.
  • Multi-account, multi-group reminders — 5-step wizard (Account → Message → When → Groups → Review) plus per-section edit pages so you don't have to walk the wizard end-to-end to fix one field. Recurrence picker covers Daily / Weekly / Monthly / Yearly with multi-rule support and per-rule fire-time pickers; the rendered description reads as plain English ("Every week on Mon, Wed, Fri at 09:00") not raw cron. Optional "Pause sending by" deadline that defaults OFF — operators have to opt in explicitly.
  • Multi-message stacks — a reminder can carry multiple ordered parts (text + media), fired in sequence with a 1.5 s gap. Media files swap at any time from the Edit Message page.
  • Smart media handling — per-kind WhatsApp size caps (5 MB image, 16 MB video/audio, 100 MB document). HEIC photos and .mov videos fall back to the document delivery path so they reach the recipient as a downloadable file instead of failing silently.
  • Swipe-to-act rows — on mobile, swipe a reminder or activity row left for Delete or right for Pause/Restart/Archive. iOS-Mail style. Click vs drag is disambiguated by a 6-px tap threshold so a swipe doesn't accidentally trigger the row's link.
  • Activity tab — last 200 runs with status filters (Success / Paused / Failed / Archived). Partial runs surface under both Paused and Failed; Skipped runs collapse into Archived. Hard-delete and archive both available; run history survives a reminder deletion.
  • Auto-reconnect on transient drops; restart-survival via Baileys session persistence. Pair once, the device stays linked across container restarts. Logout-on-delete cleans the operator's linked-devices list on the WhatsApp side too.
  • Hardened pg-boss scheduling: three-tier dedupe so a triple-click Save or microsecond-spaced enqueue doesn't fire a reminder multiple times. Reschedule cancels stale jobs by singletonKey first so a recurring next-fire never gets silently dropped.
  • Drizzle journal monotonicity guard: pnpm migrate refuses to run if the "when" timestamps in _journal.json aren't strictly increasing (a recurring foot-gun where drizzle would silently skip a freshly-generated migration). Both the CI tests and the migrate runner enforce this.
  • All actions audited. Per-run target results (sent / failed / skipped) preserved even when the underlying group is removed.
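
The encrypted, version-bound session cookie from the first bullet could look roughly like the sketch below. It is an illustration, not the app's actual code: the Session shape, the seal / unseal names, and deriving the key by hashing AUTH_SECRET with SHA-256 are all assumptions.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Hypothetical payload shape; the real session fields are not shown in this README.
interface Session { userId: string; role: "admin" | "user"; v: string }

// Assumption: the 32-byte key is derived by hashing AUTH_SECRET with SHA-256.
const keyFrom = (secret: string): Buffer => createHash("sha256").update(secret).digest();

export function seal(s: Session, secret: string): string {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every cookie
  const cipher = createCipheriv("aes-256-gcm", keyFrom(secret), iv);
  const body = Buffer.concat([cipher.update(JSON.stringify(s), "utf8"), cipher.final()]);
  // Layout: 12-byte IV, 16-byte auth tag, then ciphertext.
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64url");
}

export function unseal(cookie: string, secret: string, currentVersion: string): Session | null {
  try {
    const raw = Buffer.from(cookie, "base64url");
    const decipher = createDecipheriv("aes-256-gcm", keyFrom(secret), raw.subarray(0, 12));
    decipher.setAuthTag(raw.subarray(12, 28));
    const json = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
    const s = JSON.parse(json) as Session;
    // A cookie sealed under an older token version is treated as logged out.
    return s.v === currentVersion ? s : null;
  } catch {
    return null; // tampered, truncated, or sealed with a different key
  }
}
```

Because the version is checked on every read, changing the value passed as currentVersion (here standing in for OPERATOR_TOKEN_VERSION) rejects every previously issued cookie without touching a session store.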

Test count: 482 web + 88 bot = 570 passing.

Host requirements

Only Docker. No host Node, pnpm, or any other language toolchain — everything runs in containers via the long-lived tools sidecar.

Architecture in one paragraph

Two app containers and one external dependency. bot (Node.js) holds the live Baileys WhatsApp sessions, the pg-boss scheduler, and a Postgres LISTEN bot.command consumer. web (Next.js 16 App Router + React 19) is a stateless UI: Server Components for reads, Server Actions for mutations, an SSE endpoint for live updates, @serwist/next for the PWA shell. tools is a long-running Node 22 + pnpm sidecar used for installs / tests / typechecks / migrations so the host doesn't need a Node toolchain. Postgres lives external at 192.168.0.210 in a wabot database. All cross-service communication goes through Postgres (LISTEN/NOTIFY for events, table writes for state).
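
As a rough illustration of the LISTEN/NOTIFY path, the sketch below builds a pg_notify query on the web side and parses the payload on the bot side. Only the bot.command channel name comes from this README; the BotCommand shape and the buildNotify / parseCommand helpers are hypothetical, and the result is a plain text + values pair that any Postgres client could execute.

```typescript
// Hypothetical command shape; the real bot.command payloads are not documented here.
type BotCommand =
  | { kind: "pair"; accountId: string }
  | { kind: "logout"; accountId: string };

const CHANNEL = "bot.command"; // channel name taken from the architecture paragraph
const MAX_PAYLOAD_BYTES = 7999; // by default a NOTIFY payload must be shorter than 8000 bytes

// Producer side: build a parameterised query instead of interpolating into NOTIFY.
export function buildNotify(cmd: BotCommand): { text: string; values: [string, string] } {
  const payload = JSON.stringify(cmd);
  if (new TextEncoder().encode(payload).length > MAX_PAYLOAD_BYTES) {
    throw new Error("payload too large for NOTIFY; write a row and send its id instead");
  }
  return { text: "SELECT pg_notify($1, $2)", values: [CHANNEL, payload] };
}

// Consumer side: parse defensively and drop anything that is not a known command.
export function parseCommand(payload: string): BotCommand | null {
  try {
    const c = JSON.parse(payload) as Partial<BotCommand>;
    const ok = (c.kind === "pair" || c.kind === "logout") && typeof c.accountId === "string";
    return ok ? (c as BotCommand) : null;
  } catch {
    return null; // not JSON, or JSON null
  }
}
```

Since the channel name contains a dot, a consumer has to double-quote it when subscribing: LISTEN "bot.command".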

Full design spec: docs/superpowers/specs/2026-05-09-web-app-design.md

Quick start (dev)

Prerequisites: Docker, the wabot database + waBot role on 192.168.0.210 (with a pg_hba.conf line permitting 192.168.0.0/24).

# 1. Configure env
cp envs/.env.example .env.development
# edit .env.development: real DATABASE_URL, plus the LAN host to expose
scripts/gen_auth_secret.sh --write       # writes AUTH_SECRET to .env.development

# 2. Bring up the stack, install deps
NO_SUDO=1 scripts/dev.sh up
NO_SUDO=1 scripts/dev.sh pnpm install

# 3. Apply migrations and seed the bootstrap operator row
NO_SUDO=1 scripts/db.sh migrate
NO_SUDO=1 scripts/db.sh seed

# 4. Set the bootstrap admin password (NO password is set by seed)
echo 'change-me-now' | scripts/set-password.sh admin

# 5. Open the web app and sign in as `admin` with the password above
#    Local:  http://localhost:9000
#    LAN:    http://<host-ip>:9000
#    Public: https://wabot.04080616.xyz

Inside the app: /settings/users → Add user → invite teammates with the user role; promote, demote, reset passwords, or delete users from the same page. The "Admin" nav entry is admin-only.

PWA install: phone Chrome → menu → "Install App" / "Add to Home Screen". Launches fullscreen.

NO_SUDO=1 is the right setting if your user is in the docker group (the default for this repo). Drop it if you need sudo docker.

Deploying

Portainer stack (docker-compose.portainer.yml, which pulls prebuilt images instead of building from source): docs/deploy-portainer.md.

Manual test runbook

End-to-end checks that unit tests can't cover (live Baileys, WhatsApp delivery, swipe gestures): docs/runbook.md.

The earlier wizard-only checklist still lives at docs/superpowers/specs/manual-test-web.md.

Layout

  • apps/bot/ — Baileys WhatsApp + pg-boss scheduler + LISTEN/NOTIFY command consumer
  • apps/web/ — Next.js 16 App Router PWA
  • packages/db/ — Drizzle schema and migrations
  • packages/shared/ — cross-app helpers (rrule, media paths, timezones, WhatsApp media classifier)
  • docs/runbook.md — manual end-to-end smoke checklist
  • docs/superpowers/specs/ — design specs and earlier manual test runbooks
  • docs/superpowers/plans/ — implementation plans
  • docker/ — Dockerfiles (tools.Dockerfile, bot.Dockerfile, web.Dockerfile)
  • scripts/dev.sh, db.sh, gen_auth_secret.sh, set-password.sh, create-user.sh

Scripts

All pnpm/tsx/drizzle-kit invocations run inside the tools container, so no host Node is needed.

Script                                                                Purpose
scripts/dev.sh up|down|logs|status|build|exec|pnpm|shell|restart-bot  Stack lifecycle and tools-container shell
scripts/db.sh migrate|generate|studio|seed|reset                      Drizzle migration helper
scripts/gen_auth_secret.sh [--write]                                  Generate AUTH_SECRET (host-only, no Node needed)
scripts/set-password.sh <username>                                    Set / reset a user's password (reads stdin)
scripts/create-user.sh <username> <role>                              Create a user from CLI (admin / user)

Set NO_SUDO=1 if your user is in the docker group (recommended).
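
The journal check that the migrate step performs (see Status) could be sketched as follows. This is a minimal illustration rather than the repo's runner: the interfaces keep only the fields of drizzle-kit's _journal.json that the check reads, and outOfOrderTags / assertMonotonic are hypothetical names.

```typescript
// Reduced view of meta/_journal.json: only the fields this check needs.
interface JournalEntry { idx: number; when: number; tag: string }
interface Journal { entries: JournalEntry[] }

// Returns the tags of entries whose `when` is not strictly greater than the previous entry's.
export function outOfOrderTags(journal: Journal): string[] {
  const bad: string[] = [];
  for (let i = 1; i < journal.entries.length; i++) {
    const prev = journal.entries[i - 1];
    const cur = journal.entries[i];
    if (cur.when <= prev.when) bad.push(cur.tag);
  }
  return bad;
}

// Hypothetical guard a migrate runner could call before applying anything.
export function assertMonotonic(journal: Journal): void {
  const bad = outOfOrderTags(journal);
  if (bad.length > 0) {
    throw new Error(`_journal.json timestamps not strictly increasing at: ${bad.join(", ")}`);
  }
}
```

Running the same function from a unit test and from the migrate entry point is one way to get the "CI plus runner" double enforcement described under Status.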

Auth + admin model

  • One bootstrap operator (admin) is created by the seed; its password is set via scripts/set-password.sh admin on first launch.
  • Two roles: admin (full access including user management) and user (everything except /settings/users). Role-based nav filtering is enforced in middleware + the AppShell + every server action that mutates user state.
  • Every user gets an isolated workspace — accounts, reminders, groups, and run history all scope by operator_id. The admin panel is the only cross-tenant surface.
  • Sessions: AES-256-GCM-encrypted cookie keyed off AUTH_SECRET, HttpOnly + Secure-in-prod + SameSite=Lax, 30-day TTL. The OPERATOR_TOKEN_VERSION env (defaults to "1") is the kill switch — bumping it invalidates every outstanding cookie globally on the next request.
  • Login rate limits: 10 / 5 min per-IP + 5 / 15 min per-username + a 100 / min global backstop. The error message is identical for all three, so the response doesn't reveal which limit tripped.
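
The three login limits above can be sketched with a small sliding-window counter. The limits are the ones listed in the last bullet; the in-memory Map storage, the SlidingWindow class, and the allowLogin helper are assumptions made for illustration, since the README does not describe how the real limiter stores its counts.

```typescript
// In-memory sliding-window counter (assumed storage; a real deployment may differ).
class SlidingWindow {
  private hits = new Map<string, number[]>();
  private max: number;
  private windowMs: number;

  constructor(max: number, windowMs: number) {
    this.max = max;
    this.windowMs = windowMs;
  }

  // Records the attempt if the key is under its limit; returns false once the limit is hit.
  allow(key: string, now: number): boolean {
    const recent = (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
    const ok = recent.length < this.max;
    if (ok) recent.push(now);
    this.hits.set(key, recent);
    return ok;
  }
}

const MINUTE = 60_000;
const perIp = new SlidingWindow(10, 5 * MINUTE);
const perUsername = new SlidingWindow(5, 15 * MINUTE);
const backstop = new SlidingWindow(100, MINUTE);

export function allowLogin(ip: string, username: string, now: number = Date.now()): boolean {
  // Evaluate all three layers so each one counts the attempt.
  const ipOk = perIp.allow(ip, now);
  const userOk = perUsername.allow(username.toLowerCase(), now); // case-rotation maps to one key
  const globalOk = backstop.allow("*", now);
  // The caller shows one generic error regardless of which layer said no.
  return ipOk && userOk && globalOk;
}
```

Returning a single boolean keeps the caller from branching on which layer failed, which is what makes the identical error message straightforward to guarantee.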

Deferred

  • Standalone media library browser (currently media is uploaded per-reminder).
  • E2E browser tests (Playwright) on the swipe and pairing flows.
  • Search-as-you-type in the wizard's groups picker — at 3 000+ groups per account the picker still loads the alphabetical top-200; operators with >200 groups need to use the list page's search to find anything past 'L'.
  • Self-service password reset (email link, etc.) — out of scope for v1; admins use the Users page.