diff --git a/docs/superpowers/specs/2026-05-02-b-auth-design.md b/docs/superpowers/specs/2026-05-02-b-auth-design.md
new file mode 100644
index 0000000..27562ab
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-02-b-auth-design.md
@@ -0,0 +1,310 @@
+# B-auth: Login + WebAuthn Passkeys Design
+
+**Date:** 2026-05-02
+**Status:** Approved (design)
+**Sequel to:** [2026-05-02-b2-b3-ui-port-pwa-design.md](2026-05-02-b2-b3-ui-port-pwa-design.md)
+**Followed by:** B4 cutover (delete `app/cm_web_view.py`, retire `cm-web` Flask service, rename `cm-web-next` → `cm-web`).
+
+## Problem
+
+The Next.js dashboard (`cm-web-next`) currently has zero auth. Anyone who can reach `https://heng.04080616.xyz/` (the public vhost) lands directly on the accounts table. The plan was for aaPanel basic auth (C3) to gate the URL, and that remains a fine outer defense, but the user wants:
+
+1. **In-PWA Face ID / fingerprint sign-in.** Once the PWA is installed, opening it should hit a real WebAuthn flow, not an OS-mediated basic-auth dialog. Passkeys feel native; basic auth in a chromeless PWA feels jarring.
+2. **A password fallback** for first-time login on a new device, or when biometrics aren't available.
+
+The existing `CM_AGENT_ID` / `CM_AGENT_PASSWORD` env vars already define an operator identity per deployment (rex-cm has an agent, siong-cm has an agent). Reusing those as the dashboard credentials, instead of building a separate user table, keeps B-auth scope small and avoids duplicating identity state.
+
+## Goal
+
+Add an in-app login flow to `cm-web-next`:
+
+- A `/login` page that offers two options: a "Sign in with passkey" button (preferred when one is enrolled on this device), and a username + password form (fallback).
+- Password sign-in compares against the existing `CM_AGENT_ID` and `CM_AGENT_PASSWORD` env vars using a constant-time compare.
+- WebAuthn passkey enrollment (after first password sign-in, on a settings page) lets the operator add a Face ID / Touch ID / fingerprint credential bound to the device. Subsequent visits skip the password.
+- Session state: a sealed (encrypted and signed) `httpOnly` cookie via `iron-session`. 30-day rolling expiry; refreshes on activity.
+- All auth state lives in `cm-web-next`: no api-server changes, no mysql schema change. Passkeys are stored as JSON in a docker volume mounted into the container.
+- Middleware gates every dashboard route except `/login`. The password and passkey sign-in Server Actions remain reachable while logged out.
+
+## Non-Goals
+
+- **No mysql schema change.** Passkeys live in a JSON file in a docker volume. For one operator with maybe 2-4 devices total, a real DB table is overkill.
+- **No separate identity service** (Authelia, Keycloak, Cloudflare Access). All auth lives in `cm-web-next`. Authelia remains an out-of-scope upgrade path if multi-tenant or multi-deployment SSO ever becomes a need.
+- **No multi-user support.** One operator per deployment, identified by `CM_AGENT_ID`. The passkey JSON is keyed by `CM_AGENT_ID` so that if a deployment ever swaps identity, the passkeys for the old identity stay scoped to the old identity.
+- **No "forgot password" flow.** The password is the env var. If the operator can't remember it, they look it up in the deployment's `.env`. There is no recovery email, no reset token, none of that.
+- **No api-server-side auth.** api-server stays internal-only (per C5), reached only from inside the docker network by web-view and web-next. Auth is a `cm-web-next` concern, not an api-server concern.
+- **No public `/api/*` routes for the auth flow.** WebAuthn challenge/response goes through Server Actions, preserving the "no scrapable JSON surface" architecture.
+- **B4 cutover is not in this scope.** Legacy Flask `cm_web_view.py` keeps running with no auth (gated only by aaPanel basic auth on its `https://...` vhost) until B4 retires it.
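
The constant-time password compare called for in the Goal list could be sketched roughly as follows. This is an illustration, not the spec's implementation: `safeEqual` and `credentialsMatch` are hypothetical names, and the sketch hashes both sides with SHA-256 to get equal-length buffers, which is a substitution for padding.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash both inputs to fixed-length digests so timingSafeEqual never sees
// buffers of different lengths (it throws in that case).
function safeEqual(a: string, b: string): boolean {
  const da = createHash("sha256").update(a, "utf8").digest();
  const db = createHash("sha256").update(b, "utf8").digest();
  return timingSafeEqual(da, db);
}

// Hypothetical helper: evaluates both fields before combining the results,
// so the work done does not depend on which field is wrong.
export function credentialsMatch(
  input: { username: string; password: string },
  expected: { username: string; password: string },
): boolean {
  const userOk = safeEqual(input.username, expected.username);
  const passOk = safeEqual(input.password, expected.password);
  return userOk && passOk;
}
```

In a `loginWithPassword` action, `expected` would presumably be built from `process.env.CM_AGENT_ID` and `process.env.CM_AGENT_PASSWORD`, and a `false` result would map to the generic "invalid credentials" error.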
+
+## Architecture
+
+### Identity model
+
+One operator per `cm-web-next` instance, identified by `CM_AGENT_ID`. The same env var the bots use to log into cm99.net is reused as the dashboard username. The "session" is a cookie that says "the holder has authenticated as `CM_AGENT_ID`." Nothing more granular.
+
+When `CM_AGENT_ID` changes (rex-cm gets a new agent, say), all existing passkeys for the old `CM_AGENT_ID` become inaccessible, by design. The passkey JSON is keyed by username, so swapping identities means re-enrolling from scratch.
+
+### Login flow: password
+
+1. Browser hits `/` → middleware sees no session cookie → redirect to `/login?next=%2F` (`NextResponse.redirect` defaults to a 307, not a 302).
+2. `/login` page is a Server Component (form is a Client Component for state).
+3. User types `CM_AGENT_ID` and `CM_AGENT_PASSWORD`, submits.
+4. Client calls `loginWithPassword(username, password)` Server Action.
+5. Server Action:
+   - Reads `CM_AGENT_ID` and `CM_AGENT_PASSWORD` from env.
+   - **Constant-time compare** both fields using `crypto.timingSafeEqual` over equal-length buffers (`timingSafeEqual` throws on a length mismatch, so both sides are brought to the same length first).
+   - If both match: sets the session cookie with `{ username: CM_AGENT_ID, authenticatedAt: Date.now() }`.
+   - If either doesn't: returns `{ ok: false, error: "invalid credentials" }` (no leakage about which one).
+6. Browser redirects to `next` (default `/`).
+
+### Login flow: passkey
+
+1. `/login` page detects (client-side) whether `PublicKeyCredential.isUserVerifyingPlatformAuthenticatorAvailable()` returns true and whether at least one passkey is enrolled (server-supplied flag in the page payload).
+2. If both true: render a "Sign in with passkey" button as the primary CTA, password form below.
+3. Click triggers `beginAuthentication()` Server Action → returns `PublicKeyCredentialRequestOptionsJSON` with a fresh server-generated challenge.
+4. Client invokes `@simplewebauthn/browser`'s `startAuthentication()`, which prompts Face ID / fingerprint.
+5. Browser returns signed assertion → client passes to `finishAuthentication(response)` Server Action.
+6. Server looks up the matching credential by ID, verifies the assertion via `@simplewebauthn/server`'s `verifyAuthenticationResponse`, stores the new signature counter, and sets the session cookie.
+7. Browser redirects to `next`.
+
+### Passkey enrollment flow
+
+1. Once authenticated (via password), user visits `/settings/passkeys`.
+2. "Add passkey" button → `beginRegistration()` Server Action returns `PublicKeyCredentialCreationOptionsJSON`.
+3. Client invokes `@simplewebauthn/browser`'s `startRegistration()`, and Face ID / fingerprint enrolls a new credential.
+4. Client sends attestation to `finishRegistration(response, deviceName)` Server Action.
+5. Server verifies via `verifyRegistrationResponse`, persists `{ id, publicKey, counter, transports, name, createdAt }` to the JSON file.
+6. Page revalidates, the new passkey appears in the list.
+
+The settings page lists existing passkeys with their device names + a "Remove" button. Removing a passkey deletes its record from the JSON file.
+
+### Session
+
+| Concern | Choice |
+|---|---|
+| Library | `iron-session` (single small dep, hooks into Next.js cleanly via App Router cookies API) |
+| Cookie name | `cm_auth` |
+| Cookie attrs | `httpOnly`, `secure` (when `NODE_ENV=production`), `sameSite=lax`, `path=/` |
+| Expiry | 30-day rolling; refreshed on every request that touches a page |
+| Secret | `CM_AUTH_SECRET` env var. ≥32 chars random. Operator generates with `openssl rand -hex 32`. |
+| Body | `{ username: string, authenticatedAt: number }`, plus a transient `pendingChallenge` while a WebAuthn flow is in progress. Kept minimal so a stale session doesn't carry stale state. |
+
+### Passkey storage
+
+JSON file at `/data/auth/passkeys.json` inside the container. Mounted from a named volume `${CM_DEPLOY_NAME:-cm}-web-next-auth-data` so it persists across container restarts and image rebuilds.
+
+Schema:
+
+```json
+{
+  "<CM_AGENT_ID>": [
+    {
+      "id": "base64url-credential-id",
+      "publicKey": "base64url-public-key",
+      "counter": 42,
+      "transports": ["internal", "hybrid"],
+      "name": "iPhone 15 Pro",
+      "createdAt": "2026-05-02T12:34:56Z"
+    }
+  ]
+}
+```
+
+Top-level keys are `CM_AGENT_ID` values; values are arrays of credential records. The JSON file is read on every WebAuthn flow (small file, no caching needed) and written atomically (write to `passkeys.json.tmp`, fsync, rename).
+
+A small wrapper module `web/lib/auth-store.ts` owns reads and writes, and serializes writes through a single in-process mutex so concurrent writes cannot race.
+
+### Server Actions inventory
+
+All in `web/app/auth-actions.ts` with `"use server"`:
+
+| Action | Purpose |
+|---|---|
+| `loginWithPassword({ username, password })` | Constant-time compare → set cookie → return `{ ok }` |
+| `logout()` | Clear cookie → return `{ ok: true }` |
+| `beginRegistration()` | Generate registration options, store challenge in session, return options. Requires authenticated session. |
+| `finishRegistration({ response, deviceName })` | Verify attestation, persist credential to JSON. Requires authenticated session. |
+| `beginAuthentication()` | Generate authentication options, store challenge in session, return options. NO auth required (this IS the login). |
+| `finishAuthentication({ response })` | Verify assertion, set cookie, return `{ ok }`. NO auth required. |
+| `removePasskey({ credentialId })` | Delete from JSON. Requires authenticated session. |
+
+The challenge for register/authenticate is stored in the session cookie (small, signed, transient). On the next call (`finishRegistration` / `finishAuthentication`) the server retrieves it from the cookie and clears it.
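
The retrieve-and-clear step for the stored challenge might look like the sketch below. It assumes a `pendingChallenge` field with `kind`, `challenge`, and `expiresAt`; `takeChallenge` and `SessionLike` are hypothetical names introduced for illustration.

```typescript
type PendingChallenge = {
  kind: "register" | "authenticate";
  challenge: string;
  expiresAt: number; // epoch milliseconds
};

// Minimal stand-in for the session object; only the field used here.
type SessionLike = { pendingChallenge?: PendingChallenge };

// Hypothetical helper: returns the stored challenge if it belongs to the
// requested flow and has not expired. The challenge is cleared on every call,
// including failures, so it can be used at most once.
export function takeChallenge(
  session: SessionLike,
  kind: PendingChallenge["kind"],
  now: number = Date.now(),
): string | null {
  const pending = session.pendingChallenge;
  delete session.pendingChallenge;
  if (!pending) return null;
  if (pending.kind !== kind) return null;
  if (pending.expiresAt <= now) return null;
  return pending.challenge;
}
```

A `finishAuthentication` action would presumably call `takeChallenge(session, "authenticate")`, reject the request on `null`, and write the cleared session back to the cookie either way.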
+
+### Middleware
+
+`web/middleware.ts` runs on every request:
+
+```typescript
+import { NextRequest, NextResponse } from "next/server";
+import { getSessionFromCookie } from "@/lib/auth";
+
+// Paths that render without a session.
+const PUBLIC_PATHS = new Set(["/login"]);
+
+export async function middleware(req: NextRequest) {
+  const path = req.nextUrl.pathname;
+  if (PUBLIC_PATHS.has(path)) return NextResponse.next();
+
+  const session = await getSessionFromCookie(req.cookies);
+  if (!session) {
+    // Send the visitor to /login and remember where they were headed.
+    const url = req.nextUrl.clone();
+    url.pathname = "/login";
+    url.searchParams.set("next", path);
+    return NextResponse.redirect(url);
+  }
+  return NextResponse.next();
+}
+
+export const config = {
+  // Skip _next, static, favicon, manifest, icon endpoints, etc.
+  matcher: ["/((?!_next|icon|apple-icon|manifest.webmanifest|favicon.ico).*)"],
+};
+```
+
+Server Actions are not separate routes. Next.js sends each invocation as a POST to the URL of the page that called it, identified by a `Next-Action` header, so the matcher only ever sees the calling page's path. The sign-in actions are invoked from `/login`, which is public, so they remain reachable while logged out. Auth-required actions check the session manually inside the action body, because the middleware decision depends on the calling page and not on which action is being invoked.
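
One way to keep the in-action session check uniform is a small wrapper, sketched below. `withSession` and `ActionResult` are hypothetical names; the session reader is injected so the sketch stays self-contained, whereas real code would presumably pass the `getSession` helper from `web/lib/auth.ts`.

```typescript
type Session = { username: string; authenticatedAt: number };
type ActionResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical wrapper: runs `action` only when `readSession` yields a
// session, and otherwise returns a uniform failure result.
export function withSession<A, T>(
  readSession: () => Promise<Session | null>,
  action: (session: Session, args: A) => Promise<T>,
): (args: A) => Promise<ActionResult<T>> {
  return async (args: A) => {
    const session = await readSession();
    if (!session) return { ok: false, error: "unauthenticated" };
    return { ok: true, value: await action(session, args) };
  };
}
```

With this shape, an action such as `removePasskey` cannot reach its body without a session, regardless of which page issued the POST.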
+
+### Files Created / Modified
+
+| File | Operation | Purpose |
+|---|---|---|
+| `web/middleware.ts` | Create | Route gate |
+| `web/lib/auth.ts` | Create | Session create/read/destroy helpers (iron-session wrapper) |
+| `web/lib/auth-store.ts` | Create | JSON-file CRUD for passkeys with in-process write lock |
+| `web/app/auth-actions.ts` | Create | All Server Actions listed above |
+| `web/app/login/page.tsx` | Create | Login UI (Server Component shell) |
+| `web/app/login/login-form.tsx` | Create | Client Component for the form + passkey button |
+| `web/app/settings/passkeys/page.tsx` | Create | Passkey list + add/remove (Server Component) |
+| `web/app/settings/passkeys/passkey-list.tsx` | Create | Client Component handling enrollment + removal |
+| `web/components/nav.tsx` | Modify | Add Settings link + Sign-out button (account menu) |
+| `web/package.json` | Modify | Add `iron-session`, `@simplewebauthn/server`, `@simplewebauthn/browser` |
+| `docker-compose.yml` | Modify | Add `web-next-auth-data` named volume + mount in `web-next` service |
+| `docker-compose.override.yml` | Modify | Same volume mount in dev override |
+| `envs/dev/.env.example` | Modify | Add `CM_AUTH_SECRET=devsecret-32-bytes-or-more-please-rotate` |
+| `envs/rex/.env.example` | Modify | Same with placeholder, operator generates real value |
+| `envs/siong/.env.example` | Modify | Same |
+| `AGENTS.md` | Modify | Add an "Auth" subsection documenting `CM_AUTH_SECRET` and the passkey JSON volume |
+
+No file deletions. No changes outside `web/`, the two compose files, the per-deployment env templates, and `AGENTS.md`.
+
+### `web/lib/auth.ts` shape
+
+```typescript
+import "server-only";
+import { cookies } from "next/headers";
+import { sealData, unsealData } from "iron-session";
+
+const COOKIE_NAME = "cm_auth";
+const COOKIE_TTL_SECONDS = 30 * 24 * 60 * 60;
+
+type Session = {
+  username: string;
+  authenticatedAt: number;
+  // Transient WebAuthn state (challenge, type) lives here too while a flow is in progress.
+  pendingChallenge?: { kind: "register" | "authenticate"; challenge: string; expiresAt: number };
+};
+
+function secret(): string {
+  const s = process.env.CM_AUTH_SECRET;
+  if (!s || s.length < 32) {
+    throw new Error("CM_AUTH_SECRET missing or shorter than 32 chars");
+  }
+  return s;
+}
+
+export async function getSession(): Promise<Session | null> { /* read cookie, unseal */ }
+export async function setSession(s: Session): Promise<void> { /* seal, write cookie */ }
+export async function clearSession(): Promise<void> { /* delete cookie */ }
+export async function requireSession(): Promise<Session> { /* throws if no session */ }
+```
+
+`server-only` ensures this never bundles into client code (a poison import: the build fails if a client component imports it).
+
+### `web/lib/auth-store.ts` shape
+
+```typescript
+import "server-only";
+import { promises as fs } from "node:fs";
+import path from "node:path";
+// Type exported by @simplewebauthn/server v13+; earlier versions ship it in @simplewebauthn/types.
+import type { AuthenticatorTransportFuture } from "@simplewebauthn/server";
+
+const FILE_PATH = process.env.CM_AUTH_STORE_PATH ?? "/data/auth/passkeys.json";
+
+export type PasskeyRecord = {
+  id: string;
+  publicKey: string;
+  counter: number;
+  transports: AuthenticatorTransportFuture[];
+  name: string;
+  createdAt: string;
+};
+
+// Tail of a promise chain; each write waits for the previous one to settle.
+let writeLock: Promise<void> = Promise.resolve();
+
+export async function readPasskeys(username: string): Promise<PasskeyRecord[]> { /* ... */ }
+export async function appendPasskey(username: string, rec: PasskeyRecord): Promise<void> { /* lock, read, append, atomic-write */ }
+export async function removePasskey(username: string, credentialId: string): Promise<void> { /* lock, read, filter, atomic-write */ }
+export async function bumpCounter(username: string, credentialId: string, counter: number): Promise<void> { /* same */ }
+```
+
+The `writeLock` chain serializes writes within a single Node process. With one container (no clustering) this is sufficient. If we ever scale `cm-web-next` horizontally, switch to a real lock file or move to mysql.
+
+### Login page UI brief
+
+frontend-design generates `login/page.tsx` shell + `login-form.tsx` client component matching the SaaS aesthetic of the rest of the dashboard. Concrete requirements:
+
+- Centered card on the workbench backdrop, white with `ring-1 ring-zinc-200/60`, rounded-2xl.
+- Brand mark (small "CM" tile) + "Sign in" heading.
+- **Primary CTA:** "Sign in with passkey" button (large, dark zinc-900), rendered only if the page payload says a passkey is enrolled AND the browser supports `isUserVerifyingPlatformAuthenticatorAvailable()`.
+- **Below it:** "or username + password" divider, then two inputs (username, password) with a smaller "Sign in" button.
+- Error state: inline red below the form if `loginWithPassword` returns `{ ok: false }`.
+- All inputs use `text-base sm:text-[13px]` (the existing iOS auto-zoom fix).
+- No "remember me": the cookie is rolling 30 days by default.
+- "Forgot your password? Check the deployment's `.env` file" as a small zinc-500 footer (matter-of-fact, internal-tool tone).
+
+### Settings/passkeys page UI brief
+
+- Standard dashboard layout (Nav, page heading "Passkeys").
+- List of enrolled passkeys: name, created date, "Remove" button. Empty state: "No passkeys enrolled yet."
+- "Add passkey" button at the top: opens a modal with a single text input ("Device name", e.g., "iPhone 15"), then triggers `startRegistration`.
+- After successful enrollment: row appears, success toast fires (matches existing toast pattern).
+
+### Nav modification
+
+Add a small account menu on the right side (next to the existing Accounts/Users tab pills):
+
+- A subtle button showing `CM_AGENT_ID` (truncated if long).
+- On click: dropdown with "Passkey settings" → `/settings/passkeys`, and "Sign out" → calls `logout()` Server Action → redirect to `/login`.
+
+The dropdown uses the same modal/sheet primitive style; no new component primitive.
+
+## Verification
+
+1. **Cold start.** `bash scripts/dev.sh up`. Open `http://localhost:8010/`. Redirected to `/login?next=%2F`.
+2. **Password sign-in.** Type `CM_AGENT_ID` and `CM_AGENT_PASSWORD` from the dev `.env`. Submit. Redirect to `/`. Accounts table renders.
+3. **Cookie set.** DevTools → Application → Cookies → `cm_auth` present, `httpOnly`, `secure` (in prod) / not (in dev because `NODE_ENV=development`), `sameSite=lax`, expires ~30 days.
+4. **Wrong password.** Type wrong password. Form shows red "invalid credentials". No success toast. No cookie set.
+5. **Sign out.** Click the user menu → Sign out. Redirected to `/login`. Cookie cleared.
+6. **Passkey enrollment** (Chrome desktop with Touch ID, or iPhone). Sign in with password → settings/passkeys → Add passkey → name "MacBook" → Touch ID prompt → success toast → row appears in list.
+7. **Passkey login.** Sign out. `/login` now shows "Sign in with passkey" as primary CTA. Click → Touch ID → redirect to `/`.
+8. **Passkey persistence.** `bash scripts/dev.sh down && bash scripts/dev.sh up`. Sign-in flow still recognizes the previously enrolled passkey (volume persisted).
+9. **Passkey removal.** Sign in → settings/passkeys → Remove. Row disappears, JSON file no longer contains it.
+10. **Middleware coverage.** While signed out: `/`, `/users/`, `/settings/passkeys` all redirect to `/login`. `/login` itself does not redirect.
+11. **Server Actions auth.** Calling `removePasskey` from a client without a valid session returns an error (auth-action body checks `getSession()` and throws/returns 401-equivalent).
+12. **Constant-time compare.** Manually inspect the `loginWithPassword` source: it uses `crypto.timingSafeEqual` over zero-padded buffers of equal length. (No timing-channel leak about which field is wrong.)
+13. **Volume preserved across rebuild.** `sudo docker compose -f docker-compose.yml -f docker-compose.override.yml build --no-cache web-next` then `up`. Passkey JSON survives.
+
+## Risk
+
+Medium.
+
+- **JSON-file write durability.** A crash mid-write could corrupt the file. Mitigation: atomic write (`tmp` + `rename`), single in-process mutex. For one operator with low write frequency (passkey adds/removes are rare), this is sufficient. If we ever need multi-writer guarantees, switch to mysql.
+- **`CM_AUTH_SECRET` rotation invalidates all sessions.** Expected behavior: operators understand a secret rotation logs everyone out. Document this.
+- **Passkeys aren't multi-user.** If two operators ever need to share a deployment, they'd share the same `CM_AGENT_ID` identity and the same passkey list. Fine for now, but a hard scaling cliff. Captured as out-of-scope.
+- **Browser support.** WebAuthn is supported in all modern browsers (iOS 16+, Chrome, Edge, Firefox, Safari). On unsupported browsers the password flow is the only path; we feature-detect and hide the passkey CTA.
+- **iOS PWA standalone WebAuthn.** Apple has had platform bugs in earlier iOS versions where standalone PWAs couldn't trigger WebAuthn. iOS 17+ is reliable. Document the minimum version.
+- **Server Action surface.** Server Actions ARE network-callable (Next.js routes them). They aren't "private functions": anyone who reverse-engineers the Next.js wire format can call them. Mitigation: every action that requires auth checks the session inside the action body. The cost of reverse-engineering Next.js's encoding is much higher than calling an open `/api/foo` endpoint, so the practical attack surface is similar to a per-route auth-required `/api/*` proxy.
+
+## Out-of-Scope Follow-Ups
+
+- **B4 cutover:** separate cycle. Delete `app/cm_web_view.py`, retire `cm-web` (Flask) service, rename `cm-web-next` → `cm-web`. After B4, the legacy Flask UI (which has no auth) goes away entirely.
+- **Authelia / SSO:** if multi-deployment SSO ever becomes a need, swap the in-app auth for an Authelia container. No timeline; revisit if/when.
+- **Session listing / revocation:** show "active sessions" on settings, allow remote logout. Useful for "I lost a phone" recovery if you want stricter than "rotate `CM_AUTH_SECRET`". YAGNI for now.
+- **CSRF review for Server Actions:** Next.js already restricts Server Actions to POST and compares the `Origin` header against the `Host` header, but reviewing the framework's CSRF posture for our specific deployment (behind the aaPanel vhost) is an exercise we can do separately.
+- **Failed-login lockout:** a small per-IP counter that returns 429 after N bad password attempts. Defense-in-depth; aaPanel C4 rate-limit also helps.
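
The atomic write and in-process mutex named under Risk (and in the `auth-store.ts` shape) could be sketched as below. `withWriteLock` and `atomicWriteJson` are hypothetical names, and the temp-file naming follows the `passkeys.json.tmp` convention from the spec; treat it as one possible arrangement, not the final module.

```typescript
import { promises as fs } from "node:fs";

// Tail of the promise chain; each queued task waits for the previous one.
let writeLock: Promise<void> = Promise.resolve();

// Runs `task` after all previously queued tasks have settled. The chain
// itself never rejects, so one failed write does not block later ones.
function withWriteLock<T>(task: () => Promise<T>): Promise<T> {
  const run = writeLock.then(task);
  writeLock = run.then(() => undefined, () => undefined);
  return run;
}

// Write to a sibling temp file, fsync it, then rename over the target.
// A rename within one filesystem replaces the target in a single step,
// so readers see either the old file or the new one.
export function atomicWriteJson(filePath: string, data: unknown): Promise<void> {
  return withWriteLock(async () => {
    const tmpPath = `${filePath}.tmp`;
    const handle = await fs.open(tmpPath, "w");
    try {
      await handle.writeFile(JSON.stringify(data, null, 2), "utf8");
      await handle.sync();
    } finally {
      await handle.close();
    }
    await fs.rename(tmpPath, filePath);
  });
}
```

Functions like `appendPasskey` would presumably need the read-modify-write cycle inside the same lock, not just the final write, so that two concurrent appends cannot both start from the same stale snapshot.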