docs: web app design (Telegram-free pivot, plan 3 spec)

After live-testing the Telegram bot we hit limits that don't go away with
more menu polish (Markdown fragility, callback_data limits, no native
date pickers, awkward media UX). Pivot to a Next.js PWA installable on
the operator's phone; remove Telegram entirely.

Spec covers: service topology with bot codebase shrunk, no-auth access
stance with rate limiting + reverse-proxy gating, Server Actions
replacing public REST mutation endpoints, SSE for live updates, the new
web-side pair flow with live QR display, multi-step reminder wizard
backed by URL state, mobile-first shadcn/ui visual layer, PWA service
worker via @serwist/next, and a step-by-step plan to delete the existing
Telegram code first.

Inherits all confirmed values from the 2026-05-03 master spec.
yiekheng 2026-05-09 22:15:51 +08:00
parent 97099bf28a
commit 3e2bc8c7ee

@ -0,0 +1,314 @@
# Web App Design — Telegram-Free Pivot
**Status:** Draft
**Date:** 2026-05-09
**Supersedes:** Sections of `2026-05-03-whatsapp-bot-design.md` that describe Telegram as the primary control surface.
## 1. Why this exists
After live-testing the Telegram bot we hit limits that don't go away with more menu polish:
- Markdown parsing is fragile; user content breaks rendering.
- callback_data is capped at 64 bytes; complex flows need stateful workarounds.
- No native date/time picker; we rebuilt a year/month/day grid by hand.
- Media UX (uploading photos, previewing video) is awkward in chat.
- Keyboard navigation through deep menus is slow for daily use.
The operator wants to install the controls **as a Progressive Web App on his phone** so they look and feel native. This document describes the migration: the Telegram bot is removed entirely and replaced by a Next.js PWA.
## 2. Stakeholders & access
- **Operator (brother):** sole end-user. Uses the PWA daily.
- **Developer (you):** builds, deploys, occasionally debugs.
- **No login.** The web app is reachable only at `https://wabot.04080616.xyz`. Whoever resolves that hostname and reaches port 443 has full control. Single seeded `operators` row in Postgres represents the brother for audit purposes; the app does not authenticate the request — it trusts the network perimeter (aaPanel reverse proxy + HTTPS).
This is an explicit trade-off. Risk: a leaked URL = full access. Mitigation: rotate by changing the subdomain. Defense in depth comes from rate limiting and the Server Action origin check described in §5.
## 3. Tech stack
- **Next.js 16 (App Router)**, TypeScript end-to-end.
- **Tailwind CSS v4** + **shadcn/ui** components (latest registry).
- **Geist** font via `next/font`.
- **react-hook-form** + **zod** for forms (same zod schemas validated client-side and re-validated in server actions).
- **`@serwist/next`** for PWA service worker.
- **Drizzle ORM** (already in `packages/db`).
- **`pg`** for `LISTEN` in the SSE endpoint.
No new database tables; all reads/writes hit the existing schema from `2026-05-03-whatsapp-bot-design.md` §9.
## 4. Service topology
```
┌─────────────────────────────────────────────────────┐
│ Home Docker server │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ web │ │ bot │ │
│ │ Next.js 16 │◄───────►│ Node.js │ │
│ │ PWA + UI │ via │ Baileys │ │
│ │ Server │ Postgres│ pg-boss │ │
│ │ Components │ (LISTEN/│ sender │ │
│ │ + Server │ NOTIFY)│ ipc │ │
│ │ Actions │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ │ shared volume: │ /data/sessions │
│ │ /data/media │ │
│ │ │ │
│ └────────────┬───────────┘ │
│ │ │
│ ▼ │
│ aaPanel reverse proxy ─► wabot.04080616.xyz │
└─────────────────────────────────────────────────────┘
Postgres at 192.168.0.210
```
### Container responsibilities (post-pivot)
| Container | Role |
|---|---|
| `web` | All UI (Server Components for reads; Server Actions for mutations); SSE endpoint for live events; PWA service worker; QR PNG rendering for pair flow. |
| `bot` | Baileys WhatsApp sessions; pg-boss scheduler; fire-reminder; sender; group sync; IPC consumer that listens to `bot.command` Postgres notifications and dispatches to the right module. **No more Telegram code.** |
### Web ↔ bot channel
Same as before — Postgres `LISTEN/NOTIFY`. Web writes a row + `pgNotify('bot.command', ...)`; bot consumes; bot writes results back + `pgNotify('web.event', ...)`; web's SSE endpoint relays to the open browser.
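A minimal sketch of the notify side of this channel, assuming a `pgNotify` helper that wraps Postgres' built-in `pg_notify()` function. The `Queryable` interface, the `buildNotify` name, and the size guard are illustrative assumptions rather than the project's actual code; the guard reflects PostgreSQL's default rule that a NOTIFY payload must be shorter than 8000 bytes.

```typescript
// Minimal client shape (assumption); a node-postgres Client or Pool fits it.
interface Queryable {
  query(text: string, values: unknown[]): Promise<unknown>;
}

// PostgreSQL's default configuration requires payloads shorter than 8000 bytes.
const MAX_PAYLOAD_BYTES = 7999;

type Channel = "bot.command" | "web.event";

// Build the parameterised statement. pg_notify() takes the channel as a text
// value, so the dot in "bot.command" needs no identifier quoting here
// (a raw LISTEN statement would need: LISTEN "bot.command").
function buildNotify(
  channel: Channel,
  payload: object,
): { text: string; values: string[] } {
  const json = JSON.stringify(payload);
  const bytes = new TextEncoder().encode(json).length;
  if (bytes > MAX_PAYLOAD_BYTES) {
    throw new Error(`NOTIFY payload is ${bytes} bytes; limit is ${MAX_PAYLOAD_BYTES}`);
  }
  return { text: "SELECT pg_notify($1, $2)", values: [channel, json] };
}

// Send one notification on the given channel.
async function pgNotify(db: Queryable, channel: Channel, payload: object): Promise<void> {
  const { text, values } = buildNotify(channel, payload);
  await db.query(text, values);
}
```

The size guard matters most for `session.qr`, where a base64 PNG travels inside the payload. If a rendered QR ever approached the limit, one option would be to store the image in a row and notify with only its ID.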
## 5. Access stance (no auth)
- No login screen, no session cookies, no CSRF tokens (server actions handle their own).
- aaPanel is the only ingress. Bot's `:8081` health port is unreachable from outside.
- **Rate limit at Next.js middleware** (the `proxy.ts` file convention in Next.js 16, formerly `middleware.ts`): 30 requests / 10 sec per source IP. Anything more = 429.
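- **Origin check on Server Actions**: Next.js rejects a Server Action request whose `Origin` header does not match the request host (`Host` or `X-Forwarded-Host`). This is on by default and we leave it on, so the aaPanel proxy must forward the original host header.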
- **Rotate the subdomain** if the URL ever leaks.
- **Audit log** is still written for every action, with `operator_id` set to the single seeded operator row.
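The 30 requests / 10 sec limit above could be implemented as a fixed-window counter. This is a sketch under assumptions: the `allowRequest` name and the in-memory `Map` are hypothetical, and the counters live in a single process, which fits the single `web` container described here.

```typescript
const WINDOW_MS = 10_000; // 10-second window
const MAX_REQUESTS = 30;  // allowed requests per window per IP

interface Bucket {
  windowStart: number; // ms timestamp when the current window opened
  count: number;       // requests seen in the current window
}

// In-memory counters keyed by source IP (per-process only).
const buckets = new Map<string, Bucket>();

// Returns true if the request may proceed, false if it should receive a 429.
function allowRequest(ip: string, now: number = Date.now()): boolean {
  const bucket = buckets.get(ip);
  if (bucket === undefined || now - bucket.windowStart >= WINDOW_MS) {
    // First request from this IP, or the previous window has expired.
    buckets.set(ip, { windowStart: now, count: 1 });
    return true;
  }
  bucket.count += 1;
  return bucket.count <= MAX_REQUESTS;
}
```

A fixed window is the simplest option but lets a client burst close to twice the limit across a window boundary, and this sketch never evicts idle IPs. Because all traffic arrives through the reverse proxy, the source IP would have to come from a forwarded header rather than the socket address.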
## 6. Routes
```
/ Dashboard (overview cards)
/accounts Accounts list
/accounts/new Pair new account (live QR)
/accounts/[id] Account detail
/accounts/[id]/groups Groups list (paginated/searchable)
/accounts/[id]/pairing Live pair flow page (QR + status)
/groups/[id] Group detail + send test
/reminders Reminders list
/reminders/new Reminder wizard (?step=1..5)
/reminders/[id] Reminder detail (history, edit, delete)
/settings Profile (display name, default timezone)
# Server-side only — no public REST API for mutations:
GET /api/events Single SSE stream (read-only, public-safe)
```
All other `/api/*` paths return 404 (configured at aaPanel and as middleware match-all).
### Why no REST API for mutations
- Server Actions in Next.js 16 are first-class. They post to the page's own URL with an encrypted action ID, run on the server, return a serializable result, and integrate with `revalidatePath` / `revalidateTag` automatically.
- The browser never sees `/api/reminders` etc. as discoverable URLs.
- Type-safe end-to-end: the server action's return type flows through to the calling component without manual fetch boilerplate.
## 7. Live updates (SSE)
`GET /api/events` streams Server-Sent Events. The handler:
1. Connects to Postgres with a dedicated client (`LISTEN web.event`).
2. Forwards each notification's payload to the client as an SSE message:
```
event: session.qr
data: {"accountId":"...", "qrPng":"<base64>"}
event: session.connected
data: {"accountId":"...", "phoneNumber":"+60..."}
event: groups.synced
data: {"accountId":"...", "count":12}
event: reminder.fired
data: {"reminderId":"...", "runId":"...", "status":"success"}
event: reminder.failed
data: {"reminderId":"...", "error":"..."}
event: session.disconnected
data: {"accountId":"..."}
```
3. On disconnect, releases the PG client.
Client side: a single `useEvents()` hook opens the stream once at app mount. Each event triggers `queryClient.invalidateQueries` for the relevant key — the React Query cache stays fresh without polling.
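On the wire, each forwarded notification becomes one SSE frame: an `event:` line, a `data:` line, and a blank line that terminates the message. A small sketch of that formatting step follows; the `WebEvent` shape (a `type` plus arbitrary fields) and the function names are assumptions based on the payloads listed above.

```typescript
// Assumed shape of a web.event notification: a type plus arbitrary fields.
interface WebEvent {
  type: string;
  [key: string]: unknown;
}

// Format one notification as an SSE frame. JSON.stringify emits no raw
// newlines by default, so the payload always fits on a single data: line.
function toSseFrame(event: WebEvent): string {
  const { type, ...data } = event;
  return `event: ${type}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Lines starting with ":" are SSE comments and are ignored by EventSource;
// sending one periodically keeps idle proxy connections from timing out.
function heartbeatFrame(): string {
  return ": ping\n\n";
}
```

The heartbeat is not part of the spec above. It is included because reverse proxies commonly close connections that stay silent for too long.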
## 8. Pair flow (replaces Telegram QR delivery)
```
Operator on /accounts → tap "Pair New Account" → /accounts/new
Form: { label: string }
Submit → server action: pairAccountAction(label)
├─ Insert whatsapp_accounts row { status:'pending', label }
└─ pgNotify('bot.command', { type:'account.start_pairing', accountId })
Server action returns { accountId } and the page redirects to /accounts/[id]/pairing
/accounts/[id]/pairing (server component renders shell + client island for SSE)
Shows label, account ID, "Waiting for QR…" shimmer
Client component subscribes to SSE
Bot's IPC consumer picks up notification:
sessionManager.start(accountId)
Listens for Baileys events:
qr → render PNG (base64) → pgNotify('web.event', { type:'session.qr', qrPng })
open → update DB + sync groups → pgNotify('web.event', { type:'session.connected', … }), then pgNotify('web.event', { type:'groups.synced', … })
close (loggedOut) → pgNotify('web.event', { type:'session.timeout', … })
Browser:
On 'session.qr' — replace shimmer with <img src="data:image/png;base64,..."> + 30s countdown ring
On 'session.connected' — show ✅ Connected as +60xxx + auto-redirect to /accounts/[id] after 3s
On 'session.timeout' or 5-min server timer — show "Pairing timed out" + "Try again" button
```
The 5-min server-side timeout from plan 2 stays (in `bot`). On timeout the bot deletes the pending row and pgNotifies `session.timeout`.
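The browser side of the flow above can be modelled as a small state machine driven by SSE events. The sketch below assumes a reducer named `reducePairing` and four phases. The names, and the choice to ignore late events once a terminal phase is reached, are illustrative assumptions rather than settled design.

```typescript
type PairingState =
  | { phase: "waiting" }                          // shimmer, no QR yet
  | { phase: "qr"; qrPng: string }                // QR image displayed
  | { phase: "connected"; phoneNumber: string }   // success, redirect pending
  | { phase: "timeout" };                         // "Try again" view

type PairingEvent =
  | { type: "session.qr"; qrPng: string }
  | { type: "session.connected"; phoneNumber: string }
  | { type: "session.timeout" };

function reducePairing(state: PairingState, event: PairingEvent): PairingState {
  // Assumption: connected and timeout are terminal, so late events are ignored.
  if (state.phase === "connected" || state.phase === "timeout") return state;
  switch (event.type) {
    case "session.qr":
      // A fresh QR code replaces the previous one.
      return { phase: "qr", qrPng: event.qrPng };
    case "session.connected":
      return { phase: "connected", phoneNumber: event.phoneNumber };
    case "session.timeout":
      return { phase: "timeout" };
  }
}
```

Keeping the transitions in a pure function lets the pairing page be tested without a live SSE stream or a WhatsApp session.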
## 9. Reminder wizard (replaces Telegram menu)
`/reminders/new` is one page that uses URL search params for state (`?step=N&...`). Five steps, each rendered server-side, with a server action per step that validates and redirects to the next step's URL.
| Step | Inputs | Notes |
|---|---|---|
| 1 — Account | radio list of paired accounts | shown as cards: label, phone, last connected status |
| 2 — Groups | checkbox list with search | **multi-target** — gain over plan 2's single-group constraint |
| 3 — Compose | textarea + file upload | drag-drop on desktop, native picker on mobile; file uploads go to `/data/media` via server action `uploadMediaAction` |
| 4 — When | `<input type="datetime-local">` + quick-pick chips | native iOS/Android datetime picker; chips for Now / Tomorrow 9 AM / Next Mon 9 AM |
| 5 — Review | rendered summary + [Schedule] | server action `createReminderAction` writes DB + schedules pg-boss job |
Edit-on-the-fly: each step has a "← Edit account / groups / body / time" link that navigates back to that step with the data preserved in URL.
URL-state is sufficient for v1 — small enough to fit in a query string. If we ever need to support multi-MB body content (drafts), we move to a `reminder_drafts` table.
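As a sketch of how the wizard state might round-trip through the query string: the parameter names (`account`, `group`, `body`, `at`) and the `WizardState` shape are assumptions for illustration, and only `step` appears in the spec.

```typescript
interface WizardState {
  step: number;             // 1..5
  accountId: string | null;
  groupIds: string[];       // multi-target: repeated ?group= params
  body: string | null;
  sendAt: string | null;    // datetime-local value, e.g. "2026-05-10T09:00"
}

// Serialise the wizard state into the URL for a given step.
function encodeWizard(state: WizardState): string {
  const p = new URLSearchParams();
  p.set("step", String(state.step));
  if (state.accountId) p.set("account", state.accountId);
  for (const g of state.groupIds) p.append("group", g);
  if (state.body) p.set("body", state.body);
  if (state.sendAt) p.set("at", state.sendAt);
  return `/reminders/new?${p.toString()}`;
}

// Parse the query string (with or without a leading "?") back into state.
// Out-of-range or malformed steps fall back to step 1.
function decodeWizard(search: string): WizardState {
  const p = new URLSearchParams(search);
  const raw = Number(p.get("step") ?? "1");
  const step = Number.isInteger(raw) && raw >= 1 && raw <= 5 ? raw : 1;
  return {
    step,
    accountId: p.get("account"),
    groupIds: p.getAll("group"),
    body: p.get("body"),
    sendAt: p.get("at"),
  };
}
```

Because the URL is user-editable, each step's server action would still need to re-validate the decoded values, for example checking that the account and groups exist.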
## 10. Visual & layout
- **Mobile-first**. Tailwind breakpoints: `sm:` and up = "desktop layout"; below = single-column with comfortable tap targets (≥44px).
- **shadcn/ui** components throughout. Latest registry: Sidebar, Dialog, Form, DataTable, Sonner (toast), Sheet (mobile drawer), Tabs, Skeleton, Card.
- **Light + dark mode** auto-follows system; manual toggle in `/settings`.
- **Spacing rhythm**: 4 / 8 / 16 / 24 / 32 px.
- **Typography**: Geist (default).
- **Status colors**: green (connected/success), amber (pending/disconnected), red (banned/failed), neutral (ended).
- **Production-grade visual layer is delegated to the `frontend-design:frontend-design` skill during implementation** — it handles spacing, hierarchy, and feel.
### Layout shape
| Viewport | Shell |
|---|---|
| Mobile (<640px) | Top app bar (title + back) + bottom nav (Dashboard / Accounts / Reminders / Settings). Sheets for filters, dialogs for confirms. |
| Desktop (≥640px) | Left sidebar (collapsible) with same nav items + secondary nav for "New Account" / "New Reminder". Main content area with breadcrumbs at top. |
## 11. PWA
- `app/manifest.webmanifest` — name, short name, theme color, 192px + 512px icons, `display: standalone`, `start_url: /`, `background_color`.
- Service worker via **`@serwist/next`** (Workbox successor designed for App Router):
- Cache app shell (HTML for navigation routes, CSS, JS, fonts) — instant launch after first visit.
- Network-first for data routes (so live data still wins).
- Static assets cache-first.
- Offline fallback page rendered if no network.
- iOS install via `apple-mobile-web-app-capable` + `apple-touch-icon` meta tags.
- "Install on home screen" prompt rendered on the dashboard if `beforeinstallprompt` fires.
## 12. Telegram removal (must happen first in implementation)
The plan-3 implementation **starts** with deleting Telegram-related code so the bot container builds clean afterward.
### Files / modules deleted
- `apps/bot/src/telegram/` — entire directory (bot.ts, callbacks.ts, menus.ts, state.ts, commands/*, middleware/*)
- `apps/bot/src/media/ingest.ts`: the Telegram-side download (replaced by the web upload action)
- Telegram-specific tests in `apps/bot/src/telegram/**/*.test.ts`
### Files modified
- `apps/bot/src/index.ts`: drop createTelegramBot / tg.start / shutdown.tg.stop. Replace with `startCommandConsumer(boss)` from a new `apps/bot/src/ipc/command-consumer.ts`.
- `apps/bot/package.json`: remove `grammy`, keep `qrcode` (still needed for QR PNG rendering, but **moves usage** — see below).
- `apps/bot/src/whatsapp/qr-renderer.ts` stays (called from the new IPC consumer's pair-handler).
### New modules
- `apps/bot/src/ipc/command-consumer.ts` — subscribes to Postgres `LISTEN bot.command`, dispatches:
- `account.start_pairing` → starts Baileys session, wires QR/open/close events to `pgNotify('web.event', …)`
- `account.unpair` → existing unpair logic
- `account.sync_groups` → group sync
- `group.send_test` → existing send-test
- `apps/bot/src/ipc/notify.ts` — typed `pgNotify(event, payload)` helper.
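The dispatch step of the command consumer could be sketched as a discriminated union plus a validating parser. Only the four command `type` strings come from the list above. The payload fields (`accountId`, `groupId`), the `Handlers` interface, and the function names are assumptions.

```typescript
type BotCommand =
  | { type: "account.start_pairing"; accountId: string }
  | { type: "account.unpair"; accountId: string }
  | { type: "account.sync_groups"; accountId: string }
  | { type: "group.send_test"; groupId: string };

// Hypothetical handler surface; each would wrap an existing bot module.
interface Handlers {
  startPairing(accountId: string): Promise<void>;
  unpair(accountId: string): Promise<void>;
  syncGroups(accountId: string): Promise<void>;
  sendTest(groupId: string): Promise<void>;
}

// Parse and validate a raw NOTIFY payload; returns null for anything malformed.
function parseCommand(raw: string): BotCommand | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const { type, accountId, groupId } = value as Record<string, unknown>;
  switch (type) {
    case "account.start_pairing":
    case "account.unpair":
    case "account.sync_groups":
      return typeof accountId === "string" ? { type, accountId } : null;
    case "group.send_test":
      return typeof groupId === "string" ? { type, groupId } : null;
    default:
      return null;
  }
}

// Route a validated command to the matching handler.
function dispatch(cmd: BotCommand, h: Handlers): Promise<void> {
  switch (cmd.type) {
    case "account.start_pairing":
      return h.startPairing(cmd.accountId);
    case "account.unpair":
      return h.unpair(cmd.accountId);
    case "account.sync_groups":
      return h.syncGroups(cmd.accountId);
    case "group.send_test":
      return h.sendTest(cmd.groupId);
  }
}
```

Separating parsing from dispatch keeps the dispatch tests mentioned in this plan simple: a fake `Handlers` object records calls and no Postgres client is needed.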
### Env keys removed
- `TELEGRAM_BOT_TOKEN`
- `TELEGRAM_OPERATOR_WHITELIST`
- `TELEGRAM_QR_CHAT_ID`
`SEED_OPERATOR_TELEGRAM_ID` still exists for backwards-compat with the seed script but the value loses its meaning; we keep the seeded operators row for audit log foreign keys.
### Cleanup tests
- All vitest tests under `apps/bot/src/telegram/` deleted along with the source.
- New tests: IPC consumer dispatch tests (mocked PG client), web's pair-flow server action tests (against a real test DB).
## 13. Error handling
| Failure | Detection | Response |
|---|---|---|
| WA send transient | sender throws | pg-boss retries 3× with backoff (already in plan 2). On final failure, reminder_run_targets row gets `status='failed'`. SSE pushes `reminder.failed` → toast in UI. |
| WA session lost | Baileys close event | account row → `disconnected`. SSE pushes `session.disconnected` → status badge in /accounts goes amber. Auto-reconnect after 5 sec. |
| Pair timeout | bot's 5-min timer | Account row deleted. SSE pushes `session.timeout` → page navigates to "try again" view. |
| Server action validation | zod parse fails | Returns `{ ok:false, errors: { field: msg } }`. Form re-renders with field-level errors. |
| Postgres unavailable | drizzle throws | Both containers log error, restart via Docker. UI shows a banner "Reconnecting…" if the SSE channel drops. |
| Media upload exceeds limit (50MB) | server action rejects | Returns error; UI shows "File too large". |
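| SSE channel drops | EventSource fires `error` | `EventSource` reconnects automatically after a short fixed delay (adjustable by the server through the `retry:` field). It does not back off exponentially on its own; if backoff is wanted, the client closes the stream and reopens it on a timer. |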
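The validation and upload-limit rows above share one result shape. A minimal sketch, using plain TypeScript checks in place of zod so that it stays self-contained: the `ActionResult` type, the success branch (`{ ok: true, data }`), and the `checkUpload` helper are assumptions, as is reading "50MB" as 50 × 1024 × 1024 bytes.

```typescript
// Failure branch matches the table above; the success branch is an assumption.
type ActionResult<T> =
  | { ok: true; data: T }
  | { ok: false; errors: Record<string, string> };

// Assumption: "50MB" is interpreted as 50 MiB.
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024;

interface UploadInput {
  fileName: string;
  sizeBytes: number;
}

// Collect field-level errors so the form can re-render with all of them at once.
function checkUpload(input: UploadInput): ActionResult<UploadInput> {
  const errors: Record<string, string> = {};
  if (input.fileName.trim() === "") {
    errors["fileName"] = "File name is required";
  }
  if (input.sizeBytes > MAX_UPLOAD_BYTES) {
    errors["file"] = "File too large";
  }
  return Object.keys(errors).length > 0
    ? { ok: false, errors }
    : { ok: true, data: input };
}
```

Next.js caps Server Action request bodies (1 MB by default, raised through the `serverActions.bodySizeLimit` config option), so an upload action accepting files up to the 50MB limit would likely need that setting increased as well.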
## 14. Observability
- **Logs**: pino JSON to stdout, captured by Docker (unchanged).
- **Health endpoints**:
- Web: `GET /api/health` — DB ping + commit SHA + uptime.
- Bot: internal `:8081/health` — DB ping + per-WA-session counts.
- **Per-reminder audit trail** stays in DB.
- **Sentry hookup** deferred (out of scope for this design).
## 15. Build, deploy, and dev experience
- New Dockerfile: `docker/web.Dockerfile` (currently a placeholder). Multi-stage: deps → build (`pnpm --filter @cmbot/web build` → `.next/standalone`) → runtime (`node apps/web/.next/standalone/server.js`).
- New service in compose: `web` (replaces the existing placeholder).
- `apps/web/` package: `@cmbot/web`, depends on `@cmbot/db` and `@cmbot/shared` workspaces.
- aaPanel reverse proxy: existing config block updated to forward to `web:3000` and pass through SSE headers; deny `/api/*` except `/api/events`.
Local dev:
- `scripts/dev.sh up` brings web alongside tools + bot.
- Hot reload: web mounts `apps/web/src` (and dependent packages) into the container; `next dev` watches.
## 16. Out of scope (for this plan)
- Recurring reminders (RRULE) — same plan-2 deferral; web wizard supports one-off only for now.
- Standalone media library page — media is attached to reminders, not browseable separately yet.
- E2E browser tests (Playwright) — manual test runbook in plan 3 covers verification.
- Sentry / external error tracking.
- WebPush notifications (the operator already gets WhatsApp messages on his phone; PWA badging is enough).
- Multi-operator (still single-tenant).
- Passkeys / WebAuthn (only relevant if we add auth later).
## 17. Confirmed values
Inherits from the master spec:
- Subdomain: `wabot.04080616.xyz`
- Default timezone: `Asia/Kuala_Lumpur`
- Postgres: `192.168.0.210` / `wabot`
- Media retention: 90 days
New for this design:
- Component library: shadcn/ui
- Visual style: clean utility / admin dashboard
- Auth: none (URL is the secret)
- Real-time: SSE
- Service worker: `@serwist/next`