# WhatsApp Reminder Bot — Design

**Status:** Draft
**Date:** 2026-05-03
**Author:** yiekheng (developer); operator: brother (single end-user)

## 1. Purpose

Self-hosted WhatsApp reminder bot. The operator manages 10+ WhatsApp accounts (each tied to a different business responsibility), schedules one-off and recurring reminder messages (text, photos, videos) to specific WhatsApp groups, and receives login QR codes through a private Telegram bot.

The system runs 24/7 on the operator's home Docker server behind a reverse proxy, with container images published to a self-hosted Gitea registry.

## 2. Stakeholders & access

- **Developer (you):** builds and maintains. Has full access to the dev environment (mock WA account, dev Telegram bot).
- **Operator (brother):** the single end-user in production. Pairs all real WA accounts, creates and manages reminders, receives QR codes via Telegram. Holds all production credentials.
- **Customers (in WA groups):** unaware of the bot; they just receive messages from the WA accounts the operator owns.

Access to the bot is gated by a **Telegram user ID whitelist** (configured in env). Web UI access requires a Telegram-issued magic link, so only whitelisted Telegram users can sign in to the dashboard.

## 3. Constraints accepted up front

- **Unofficial WhatsApp protocol.** Built on Baileys (`@whiskeysockets/baileys`), which violates WhatsApp's ToS. Account ban risk is non-zero, especially for spam-pattern usage. Acceptable for this customer-reminder use case, where messages go to known groups.
- **Self-hosted infrastructure.** Postgres at `192.168.0.210` (already running). The home Docker server runs Portainer; the reverse proxy is aaPanel. Domain `04080616.xyz` is available for the web UI subdomain.
- **Self-hosted Gitea.** Git remote at `http://192.168.0.215:3000/yiekheng/cm_whatsapp_bot_v1.git`. Container registry at `gitea.04080616.xyz/yiekheng`.
- **Single-operator threat model.** No tenant isolation. Both developer and operator are effectively admins.
The repo is private to the developer. `.env` files **may** be committed to the private Gitea (operator's choice — documented trade-off below).

## 4. High-level architecture

Two app containers + one external dependency. Communication between apps goes through Postgres only.

```
┌─────────────────────────────────────────────────────────────────────┐
│ Home Docker server                                                  │
│                                                                     │
│   ┌─────────────┐         ┌─────────────┐                           │
│   │ web         │         │ bot         │                           │
│   │ (Next.js)   │◄───────►│ (Node.js)   │                           │
│   │             │   via   │             │                           │
│   │ PWA         │ Postgres│ Baileys     │                           │
│   │ Dashboard   │ LISTEN/ │ Telegram    │                           │
│   │ API routes  │ NOTIFY  │ pg-boss     │                           │
│   └──────┬──────┘         └──────┬──────┘                           │
│          │                       │                                  │
│          │ shared volume:        │                                  │
│          │ /data/media           │ /data/sessions//                 │
│          │                       │ (Baileys auth state)             │
│          │                       │                                  │
│          └───────────┬───────────┘                                  │
│                      │                                              │
│                      ▼                                              │
│          aaPanel reverse proxy ─► bot.04080616.xyz                  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
            │                                     │
            ▼                                     ▼
  Postgres at 192.168.0.210             Telegram Bot API (cloud)
  (whatsapp_bot_dev /                   via grammy long-polling
   whatsapp_bot_prod)                   (or webhook later)
```

### Service responsibilities

| Service | Stateless | Owns |
|---|---|---|
| `web` | Yes (can restart freely) | UI, auth, server actions, media upload, SSE for live updates |
| `bot` | No (long-lived sessions) | All Baileys WhatsApp sessions, Telegram bot, pg-boss scheduler, reminder firing, group sync |

**Why split:** Next.js is built for stateless request/response; WhatsApp sessions are long-lived stateful WebSockets that must survive web deploys. Splitting lets us redeploy `web` (frontend changes) without dropping any active WA sessions.

### Why Postgres-as-bus instead of Redis or HTTP

- One less service to run, one less dependency to monitor.
- All bot↔web communication shares the same transactional boundary as data writes — if a write commits, downstream listeners see it.
- pg-boss provides BullMQ-equivalent functionality (delayed jobs, recurring jobs, retries, dead-letter) at a scale (10+ accounts, hundreds of reminders/day max) where Redis's throughput advantages are irrelevant.
- LISTEN/NOTIFY covers live UI updates (e.g., a "session connected" toast).

## 5. Tech stack

- **Language:** TypeScript everywhere.
- **Frontend:** Next.js 16 (App Router), Server Components + Server Actions, PWA-installable (manifest + service worker for an offline app shell). Visual design will be built using the `frontend-design:frontend-design` skill during implementation.
- **Backend:** Node.js 22 + TypeScript for the `bot` service.
- **WhatsApp:** `@whiskeysockets/baileys` (no browser; pure WebSocket).
- **Telegram:** `grammy` framework (long-polling in dev; can switch to webhook in prod).
- **Database:** PostgreSQL (external, at `192.168.0.210`). Drizzle ORM. Migrations in `packages/db/migrations`.
- **Job queue:** `pg-boss` (Postgres-native).
- **QR rendering:** `qrcode` library (string → PNG buffer).
- **Recurrence:** `rrule` library (RFC 5545).
- **Logging:** `pino` (JSON to stdout).
- **Validation:** `zod` (env, request bodies, server actions).
- **Build orchestration:** pnpm workspaces + Turborepo.

## 6. Repository layout

```
cm_whatsapp_bot_v1/
├── apps/
│   ├── web/                  Next.js — UI + API routes + PWA
│   └── bot/                  Node — Baileys + Telegram + scheduler
├── packages/
│   ├── db/                   Drizzle schema, migrations, queries
│   └── shared/               cross-app types, rrule helpers, paths
├── docker/
│   ├── web.Dockerfile
│   └── bot.Dockerfile
├── docker-compose.base.yml   service definitions, networks
├── docker-compose.dev.yml    dev overrides: hot reload, exposed ports
├── docker-compose.prod.yml   prod: registry images, named volumes
├── scripts/
│   ├── dev.sh                up | down | logs | status | reset-db
│   ├── publish.sh            build & push to Gitea registry
│   ├── gen_auth_secret.sh    generate AUTH_SECRET
│   ├── db.sh                 migrate | rollback | seed | studio | reset
│   └── link-account.sh       CLI helper to pair the dev WA mock account
├── envs/
│   ├── .env.example          documented template
│   ├── .env.development      dev TG bot, mock WA account, dev DB
│   └── .env.production       prod TG bot, real accounts, prod DB
└── docs/
    └── superpowers/specs/
        └── 2026-05-03-whatsapp-bot-design.md
```

## 7. Environment separation

| Concern | Local dev | Production |
|---|---|---|
| Compose file | `base + dev.yml` | `base + prod.yml` |
| Image source | local build | `gitea.04080616.xyz/yiekheng/cm-whatsapp-{web,bot}:${IMAGE_TAG}` |
| Postgres database | `whatsapp_bot_dev` on `192.168.0.210` | `whatsapp_bot_prod` on `192.168.0.210` |
| Postgres role | dev role with limited grants | prod role |
| Telegram bot | separate dev bot (`@..._dev_bot`), so dev QR codes never reach the production chat | production bot |
| WhatsApp accounts | mock/test phone | operator's real 10+ accounts |
| Web URL | `http://localhost:3000` | `https://bot.04080616.xyz` (subdomain to be confirmed) |
| Hot reload | yes (Next.js HMR + tsx watch) | no |
| Volumes | `./dev-data/{media,sessions}` bind mounts | named volumes |

The `bot` service listens on its own internal port (8081) for health checks only; the port is not exposed externally in either environment.

Env validation runs at startup via zod.
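A minimal sketch of that startup check, to make the fail-fast behaviour concrete. It is deliberately dependency-free so it runs on its own; in the real services the same rules would be expressed as a zod schema. The variable names (`DATABASE_URL`, `TELEGRAM_BOT_TOKEN`, `TELEGRAM_ALLOWED_USER_IDS`, `BOT_HEALTH_PORT`) and the helpers `parseEnv` / `loadEnvOrExit` are hypothetical; the authoritative list belongs in `envs/.env.example`.

```typescript
// Sketch only: variable names are hypothetical examples, not the final env contract.
type Env = {
  DATABASE_URL: string;
  TELEGRAM_BOT_TOKEN: string;
  TELEGRAM_ALLOWED_USER_IDS: number[]; // Telegram user ID whitelist
  BOT_HEALTH_PORT: number; // internal health-check port (8081 in this design)
};

type ParseResult = { ok: true; env: Env } | { ok: false; errors: string[] };

// Collects every problem instead of stopping at the first one,
// so a single failed start shows the operator the full list.
export function parseEnv(raw: Record<string, string | undefined>): ParseResult {
  const errors: string[] = [];
  const required = (key: string): string => {
    const value = raw[key]?.trim();
    if (!value) errors.push(`${key} is missing`);
    return value ?? "";
  };

  const databaseUrl = required("DATABASE_URL");
  if (databaseUrl && !/^postgres(ql)?:\/\//.test(databaseUrl)) {
    errors.push("DATABASE_URL must start with postgres:// or postgresql://");
  }

  const botToken = required("TELEGRAM_BOT_TOKEN");

  const idsRaw = required("TELEGRAM_ALLOWED_USER_IDS");
  const ids = idsRaw.split(",").map((s) => s.trim()).filter(Boolean).map(Number);
  if (idsRaw && (ids.length === 0 || ids.some((n) => !Number.isSafeInteger(n) || n <= 0))) {
    errors.push("TELEGRAM_ALLOWED_USER_IDS must be a comma-separated list of positive integers");
  }

  const port = Number(raw.BOT_HEALTH_PORT ?? "8081");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push("BOT_HEALTH_PORT must be an integer between 1 and 65535");
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    env: {
      DATABASE_URL: databaseUrl,
      TELEGRAM_BOT_TOKEN: botToken,
      TELEGRAM_ALLOWED_USER_IDS: ids,
      BOT_HEALTH_PORT: port,
    },
  };
}

// Fast-fail entry point: print every problem, then exit non-zero.
export function loadEnvOrExit(): Env {
  const result = parseEnv(process.env);
  if (!result.ok) {
    console.error("Invalid environment:\n" + result.errors.map((e) => `  - ${e}`).join("\n"));
    process.exit(1);
  }
  return result.env;
}
```

Keeping parsing (`parseEnv`) separate from the exit (`loadEnvOrExit`) lets the unit tests listed under "env validation" exercise the rules without killing the test process.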
Missing or malformed env values cause an immediate fast-fail exit with a clear message; both services refuse to come up half-configured.

## 8. Deploy flow

```
dev machine                  Gitea (192.168.0.215)          home server (Portainer)
───────────                  ─────────────────────          ───────────────────────
git push ──────────────────► cm_whatsapp_bot_v1.git

scripts/publish.sh v1.0.0 ─► gitea.04080616.xyz/
                               yiekheng/
                                 cm-whatsapp-web:v1.0.0
                                 cm-whatsapp-bot:v1.0.0

                                                            Portainer stack →
                                                              docker-compose.prod.yml
                                                              IMAGE_TAG=v1.0.0
                                                              pulls images, runs containers

                                                            aaPanel proxy →
                                                              bot.04080616.xyz → web:3000
```

Image tags:

- `latest` — current main HEAD (manual publish for now; CI later if needed).
- `vX.Y.Z` — release tags for production rollouts; pin in `.env.production`.
- `dev-` — ad-hoc images for testing on the home server before cutting a release.

Rollback = change `IMAGE_TAG` in `.env.production` and recreate the stack in Portainer.

## 9. Data model

ORM: Drizzle. Migrations versioned in `packages/db/migrations/`.

### Tables

```
operators — people who can use the bot
─────────────────────────────────
id                  uuid pk
telegram_user_id    bigint unique         — primary identity (whitelist key)
display_name        text
role                text                  — 'admin' (only role for v1)
default_timezone    text                  — IANA, e.g. 'Asia/Kuala_Lumpur'
created_at          timestamptz

whatsapp_accounts — each WA account the operator manages
─────────────────────────────────
id                  uuid pk
operator_id         uuid fk → operators
label               text                  — operator-defined, e.g. "Sales 1"
phone_number        text nullable         — populated after pairing
status              text                  — pending | connecting | connected | disconnected | logged_out | banned
last_connected_at   timestamptz nullable
last_qr_at          timestamptz nullable
created_at          timestamptz
unique(operator_id, label)

whatsapp_groups — groups discovered per account
─────────────────────────────────
id                  uuid pk
account_id          uuid fk → whatsapp_accounts
wa_group_jid        text                  — WhatsApp's group JID
name                text
participant_count   int
is_archived         bool default false
last_synced_at      timestamptz
unique(account_id, wa_group_jid)

media_files — uploaded photos/videos/documents
─────────────────────────────────
id                  uuid pk
operator_id         uuid fk → operators
filename_original   text
mime_type           text
size_bytes          bigint
sha256              text
storage_path        text                  — relative to /data/media/
created_at          timestamptz

reminders — scheduled sends
─────────────────────────────────
id                  uuid pk
account_id          uuid fk → whatsapp_accounts
name                text
schedule_kind       text                  — 'one_off' | 'recurring'
scheduled_at        timestamptz nullable  — for one_off
rrule               text nullable         — RFC 5545 rrule string
timezone            text                  — IANA
ends_at             timestamptz nullable
max_runs            int nullable
status              text                  — 'active' | 'paused' | 'ended'
created_by          uuid fk → operators
created_at          timestamptz
updated_at          timestamptz

reminder_targets — groups a reminder fires into
─────────────────────────────────
reminder_id         uuid fk → reminders
group_id            uuid fk → whatsapp_groups
position            int
pk(reminder_id, group_id)

reminder_messages — message parts in send order
─────────────────────────────────
id                  uuid pk
reminder_id         uuid fk → reminders
position            int
kind                text                  — 'text' | 'image' | 'video' | 'document'
text_content        text nullable         — text body or media caption
media_id            uuid fk → media_files nullable

reminder_runs — execution records
─────────────────────────────────
id                  uuid pk
reminder_id         uuid fk → reminders
fired_at            timestamptz
status              text                  — 'pending' | 'success' | 'partial' | 'failed' | 'skipped'
error_summary       text nullable

reminder_run_targets — per-target outcomes
─────────────────────────────────
run_id              uuid fk → reminder_runs
group_id            uuid fk → whatsapp_groups
status              text                  — 'sent' | 'failed' | 'skipped'
wa_message_id       text nullable
error               text nullable
latency_ms          int nullable
pk(run_id, group_id)

audit_log — append-only action history
─────────────────────────────────
id                  uuid pk
operator_id         uuid fk → operators nullable
source              text                  — 'web' | 'telegram' | 'system'
action              text                  — 'reminder.create' | 'account.pair' | ...
target_type         text nullable
target_id           uuid nullable
payload             jsonb
created_at          timestamptz

auth_sessions — web UI cookies
─────────────────────────────────
id                  uuid pk
operator_id         uuid fk → operators
token_hash          text unique           — SHA-256 of cookie value
created_at          timestamptz
expires_at          timestamptz
last_used_at        timestamptz
ip_address          inet nullable
user_agent          text nullable
```

`pg-boss` creates and owns its own `pgboss.*` schema in the same database — namespace-isolated, no manual setup required beyond initial migration.

### Key model decisions

- **Recurring schedules use RRULE (RFC 5545), not cron.** RRULE expresses "every Monday and Wednesday at 9am, 20 occurrences" naturally; cron can express the weekly pattern but has no notion of an occurrence count or end date. Library: `rrule` on Node.
- **Timezone is per-reminder, not per-account.** The operator may run accounts spanning markets in different time zones. The default is filled in from the operator's `default_timezone`.
- **Baileys auth state on disk, not in Postgres.** Path `/data/sessions//`, using Baileys' `useMultiFileAuthState`. That is the file-based auth-state helper Baileys ships with; the file set is many small, frequently mutating files (Signal protocol keys), which is the wrong shape for Postgres. The volume is part of the host backup strategy.
- **Audit log is append-only.** Never updated, only inserted. Powers "who created this", "when did this account get paired", etc.
- **Media in object-store-like layout on disk.** Path `/data/media/{yyyy/mm}/{uuid}.{ext}`. Postgres holds metadata only.
A sweeper deletes media unreferenced by any reminder after a configurable retention period (default 90 days). Migration path to MinIO/S3 later: only the storage adapter changes.
- **Web auth via Telegram magic link.** Operator types `/login` to the Telegram bot → bot replies with a one-time URL → clicking it sets a session cookie backed by `auth_sessions`. No passwords. The operator pool is exactly the Telegram-whitelisted set.

### Out of v1 (YAGNI; easy to add later)

- Templates with variable substitution (`{customer_name}`, `{day}`).
- Multi-tenant operator isolation beyond the existing whitelist.
- Per-customer message personalization.
- Conversation threads beyond the reminder firing log.
- A/B testing of reminder content.
- Web push notifications (Telegram already pushes alerts).

## 10. QR pairing flow (headline UX)

```
1. Operator (Telegram): /pair "Sales Account 3"
2. Bot inserts whatsapp_accounts row { status: 'pending', label: 'Sales Account 3' }
3. Bot starts Baileys session for that account_id
   ├─ session dir: /data/sessions//
   └─ uses useMultiFileAuthState (auto-persists creds + signal keys)
4. Baileys emits connection.update { qr: '...' }
5. Bot renders QR string → PNG, sends to operator's TG chat:
     "📱 Scan with WhatsApp on Sales Account 3. Expires in 30s."
   (Baileys re-emits QR every ~20s; bot edits the same TG message via editMessageMedia)
6. Operator scans → Baileys emits connection.update { connection: 'open' }
7. Bot updates row { status: 'connected', phone_number: '+60xxx', last_connected_at: now }
   Bot sends TG: "✅ Sales Account 3 connected as +60xxxxxxx"
   Bot pgNotify('web.event', { type: 'session.connected', account_id })
8. Bot triggers group-sync → upserts whatsapp_groups
   Bot sends TG: "Synced 12 groups. Ready to send."
```

### Pairing edge cases

| Situation | Behavior |
|---|---|
| QR expires (no scan in ~30s) | Baileys re-emits; bot edits the same TG message with the new QR. After 5 cycles (~2.5 min): timeout, mark account `pending`, TG: "Pairing timed out — try `/pair` again." |
| Bot container restart mid-pairing | Startup sweeper drops any `pending` accounts with a stale `last_qr_at`; operator re-runs `/pair`. |
| `/pair` on an already-connected label | Reject: "Account 'X' already connected. Use `/unpair X` first." |
| WA logout from phone (linked device removed) | Baileys emits `connection.update` with `connection: 'close'` and disconnect reason `DisconnectReason.loggedOut`. Bot marks the account `logged_out` and sends a TG alert with re-pair instructions. Reminders for that account are skipped with reason `account_logged_out`. |
| Network drop on connected session | Session manager reconnects automatically by opening a new Baileys socket (Baileys leaves reconnection to the caller). Alert only if downtime >5 min. |
| Web-initiated pair | Same flow; the QR PNG is also streamed to the open web modal via SSE, so the operator can scan from the web UI instead of from Telegram on the phone. |

## 11. Reminder execution flow

```
On reminder create/edit (from web or Telegram):
  → DB row inserted/updated (transaction with reminder_targets, reminder_messages)
  → pgNotify('bot.command', { type: 'reminder.upsert', id })
  → bot.scheduler upserts the reminder into pg-boss:
      one_off   → schedule a single delayed job at scheduled_at
      recurring → compute next occurrence from rrule, schedule a delayed job;
                  on completion, fire-reminder schedules the next occurrence

When pg-boss fires the job:
  fire-reminder.handler:
    1. Load reminder + targets + messages from DB
    2. Insert reminder_runs { status: 'pending', fired_at: now }
    3. Acquire account session from session-manager
       - If not connected: mark all targets 'skipped', update run status, exit
    4. For each target group:
       a. For each message part in position order:
          - text  → sendTextMessage
          - media → load /data/media/, sendMedia with optional caption
       b. Insert reminder_run_targets { status, wa_message_id, latency_ms }
       c. Throttle: jitter between targets to stay under WA rate limits
    5. Roll up reminder_runs.status:
       all sent → 'success'; all failed → 'failed'; mix → 'partial'
    6. pgNotify('web.event', { type: 'reminder.fired', run_id })
    7. If recurring and not at ends_at / max_runs: schedule next occurrence in pg-boss
       Else if at end: update reminder.status = 'ended'
```

## 12. Error handling

| Failure | Detection | Response |
|---|---|---|
| WA send transient (timeout, network) | Baileys throws / promise rejects | Retry via pg-boss with exponential backoff (3 tries: 30s / 2m / 10m). Final failure → mark the `reminder_run_targets` row `failed`; dashboard + TG alert. |
| WA send permanent (group not found, banned account) | Specific error codes | No retry. Mark target `failed` with reason. If account banned → set `whatsapp_accounts.status='banned'`, urgent TG alert. |
| WA session disconnect | `connection.update` event | Auto-reconnect. Downtime >5 min → TG alert. Reminders during downtime → `skipped`. |
| WA logout | reason `loggedOut` | `status='logged_out'`. Stop reconnect attempts. TG: "Account X logged out — re-pair." |
| Telegram delivery failure | grammy throws | Retry once, then log to `audit_log` only; don't recurse via TG (TG itself might be down). |
| Postgres connection lost | Drizzle errors | Both services exit non-zero (Docker restarts them). Health checks fail loudly during the outage. |
| Media file missing on disk | `fs.stat` fails before send | Mark target `failed` with error `media_missing`; don't send a placeholder. TG alert. |
| pg-boss job lost / corrupted | pg-boss's own retry → dead-letter | Surface in an admin "failed jobs" view with a manual retry button. |
| WA rate limit | Specific error | Throttle sender to 1 send / 3 s per account, with jitter between sends. Back off for longer before retrying. |
| Unauthorized Telegram user | Whitelist middleware | Reply: "Sorry, this bot is private." Log to `audit_log`. No state change. |
| Web session expired | Cookie validation fails | Redirect to `/login`. |

### Observability

- **Logs:** `pino` JSON to stdout, captured by Docker.
- **Health endpoints:**
  - `web`: `GET /api/health` — DB ping + uptime + commit SHA.
  - `bot`: internal port 8081, `GET /health` — DB ping + per-session counts (`{ connected: 8, disconnected: 1, pending: 0 }`).
- **Per-reminder audit trail:** `reminder_runs` + `reminder_run_targets` history, exposed in the dashboard. Every fire is fully reconstructable.

## 13. Testing strategy

| Layer | Tool | Scope |
|---|---|---|
| Unit | Vitest | rrule helpers, message-part assembly, audit log builders, env validation, error classifiers. No I/O. |
| Integration (DB) | Vitest + local dev Postgres (or Testcontainers) | Drizzle queries, pg-boss schedule sync, LISTEN/NOTIFY round-trip. Per-test schema with teardown. |
| Bot session logic | Vitest with mocked Baileys | Session-manager state transitions, QR rendering, group-sync upsert. No real WA connection. |
| Telegram | Vitest with mocked grammy | Command parsing, whitelist middleware, error responses. |
| Web E2E | Playwright (deferred) | Login (stubbed magic link), reminder create wizard, dashboard. Add when CI exists. |
| Pairing flow | Manual checklist | Real WA pairing requires a real phone; documented in `docs/superpowers/specs/manual-test-pairing.md`. Run before each release. |

### CI

Out of scope for v1. `pnpm test` and `pnpm lint` will run via husky + lint-staged on `git push`. Gitea Actions can be wired later.

## 14. Scripts

All scripts live in `scripts/`, patterned on `cm_bot_v2`.

| Script | Purpose |
|---|---|
| `dev.sh` | `up \| down \| logs \| status \| reset-db` against `docker-compose.dev.yml`. Pre-flight checks for `.env.development`. Honors `NO_SUDO=1`. `reset-db` truncates only `whatsapp_bot_dev`, after a confirmation prompt. |
| `publish.sh` | Build + push images to `gitea.04080616.xyz/yiekheng/cm-whatsapp-{web,bot}:`. Default tag `latest`. Same auth-error guidance as the `cm_bot_v2` reference. |
| `gen_auth_secret.sh` | Generate `AUTH_SECRET` (32 random bytes, hex-encoded). `--write [path]` mode appends or replaces the value in an env file. |
| `db.sh` | Drizzle migration wrapper: `migrate \| rollback \| seed \| studio \| reset`. `reset` is dev-only and refuses to run if the env points at the prod DB. |
| `link-account.sh` | CLI helper to start a WA pairing flow without going through Telegram. Prints the QR straight to the terminal. Useful for the dev mock account. |
| `local_build.sh` | One-liner foreground compose up. Convenience. |

## 15. Open questions for implementation phase

- Confirm subdomain choice: `bot.04080616.xyz` vs `whatsapp.04080616.xyz` vs other.
- Confirm that Postgres connectivity from the Docker bridge network (`172.16.0.0/12`) is allowed in the existing `pg_hba.conf` on `192.168.0.210`. If not, add the entry before the first deploy.
- Confirm the operator's IANA timezone for the `default_timezone` seed value.
- Decide the media retention default (proposing 90 days; the sweeper job runs daily).
- Decide whether to enforce a minimum interval between recurring fires (proposing 5 minutes).

These don't block design approval; they're settled during the writing-plans phase or the first implementation step.
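If the minimum-interval rule is adopted, the check itself is small. The sketch below assumes the caller already has a sample of concrete occurrence dates for the reminder (for example from the `rrule` library's `RRule#between()` over a test window, or the first N results of `RRule#all()` with its iterator callback); the helper names and the `MIN_INTERVAL_MINUTES` constant are hypothetical, and 5 minutes is only the proposed value.

```typescript
// Proposed default from the open question; not a settled value.
const MIN_INTERVAL_MINUTES = 5;

// Smallest gap, in minutes, between consecutive occurrences.
// Input order does not matter; returns null when there are fewer than two dates.
export function minGapMinutes(occurrences: Date[]): number | null {
  if (occurrences.length < 2) return null;
  const times = occurrences.map((d) => d.getTime()).sort((a, b) => a - b);
  let minGapMs = Infinity;
  for (let i = 1; i < times.length; i++) {
    minGapMs = Math.min(minGapMs, times[i] - times[i - 1]);
  }
  return minGapMs / 60_000;
}

// True when any two consecutive fires are closer than the allowed minimum.
export function violatesMinInterval(
  occurrences: Date[],
  minMinutes: number = MIN_INTERVAL_MINUTES,
): boolean {
  const gap = minGapMinutes(occurrences);
  return gap !== null && gap < minMinutes;
}
```

Checking a finite sample rather than the rule text keeps the validation independent of how the RRULE was written (`FREQ=MINUTELY`, a dense `BYHOUR` list, and so on); the trade-off is that the sample window has to be long enough to contain the densest part of the schedule.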