cm_bot_v2/docs/superpowers/specs/2026-05-02-b-auth-design.md
yiekheng 43533c3485 fix(spec): rename auth routes to /cm-auth and /cm-passkeys
Avoids the well-known /login path that scanners hit by default.
The cm- prefix matches the rest of the project's namespacing
(cm-web-next, cm-api, etc.) and isn't on standard scanner wordlists.

Settings page moves to flat /cm-passkeys (was /settings/passkeys)
to drop the simple 'settings' word — same scanner-noise reasoning.

File paths follow: web/app/cm-auth/, web/app/cm-passkeys/.
2026-05-03 08:16:36 +08:00

# B-auth: Login + WebAuthn Passkeys Design
**Date:** 2026-05-02
**Status:** Approved (design)
**Sequel to:** [2026-05-02-b2-b3-ui-port-pwa-design.md](2026-05-02-b2-b3-ui-port-pwa-design.md)
**Followed by:** B4 cutover (delete `app/cm_web_view.py`, retire `cm-web` Flask service, rename `cm-web-next` → `cm-web`).
## Problem
The Next.js dashboard (`cm-web-next`) currently has zero auth. Anyone who can reach `https://heng.04080616.xyz/` (the public vhost) lands directly on the accounts table. The plan was for aaPanel basic auth (C3) to gate the URL — and that's a fine outer defense — but the user wants:
1. **In-PWA Face ID / fingerprint sign-in.** Once the PWA is installed, opening it should hit a real WebAuthn flow, not an OS-mediated basic-auth dialog. Passkeys feel native; basic auth in a chromeless PWA feels jarring.
2. **A password fallback** for first-time login on a new device, or when biometric isn't available.
The existing `CM_AGENT_ID` / `CM_AGENT_PASSWORD` env vars already define an operator identity per deployment (rex-cm has an agent, siong-cm has an agent). Reusing those as the dashboard password — instead of building a separate user table — keeps B-auth scope small and avoids duplicating identity state.
## Goal
Add an in-app login flow to `cm-web-next`:
- A `/cm-auth` page that shows two options side-by-side: a "Sign in with passkey" button (preferred when one is enrolled on this device), and a username + password form (fallback).
- Password sign-in compares against the existing `CM_AGENT_ID` and `CM_AGENT_PASSWORD` env vars using a constant-time compare.
- WebAuthn passkey enrollment (after first password sign-in, on a settings page) lets the operator add a Face ID / Touch ID / fingerprint credential bound to the device. Subsequent visits skip the password.
- Session state: a signed `httpOnly` cookie via `iron-session`. 30-day rolling expiry; refreshes on activity.
- All auth state lives in `cm-web-next` — no api-server changes, no mysql schema change. Passkeys are stored as JSON in a docker volume mounted into the container.
- Middleware gates every dashboard route except `/cm-auth` and the WebAuthn Server Actions, which are reachable while logged out.
## Non-Goals
- **No mysql schema change.** Passkeys live in a JSON file in a docker volume. For one operator with maybe 2-4 devices total, a real DB table is overkill.
- **No separate identity service** (Authelia, Keycloak, Cloudflare Access). All auth lives in `cm-web-next`. Authelia remains an out-of-scope upgrade path if multi-tenant or multi-deployment SSO ever becomes a need.
- **No multi-user support.** One operator per deployment, identified by `CM_AGENT_ID`. The passkey JSON is keyed by `CM_AGENT_ID` so that if a deployment ever swaps identity, the passkeys for the old identity stay scoped to the old identity.
- **No "forgot password" flow.** The password is the env var. If the operator can't remember it, they look it up in the deployment's `.env`. There is no recovery email, no reset token, none of that.
- **No api-server-side auth.** api-server stays internal-only (per C5), reached only from inside the docker network by web-view and web-next. Auth is a `cm-web-next` concern, not an api-server concern.
- **No public `/api/*` routes for the auth flow.** WebAuthn challenge/response goes through Server Actions, preserving the "no scrapable JSON surface" architecture.
- **B4 cutover is not in this scope.** Legacy Flask `cm_web_view.py` keeps running with no auth (gated only by aaPanel basic auth on its `https://...` vhost) until B4 retires it.
## Architecture
### Identity model
One operator per `cm-web-next` instance, identified by `CM_AGENT_ID`. The same env var the bots use to log into cm99.net is reused as the dashboard username. The "session" is a cookie that says "the holder has authenticated as `CM_AGENT_ID`." Nothing more granular.
When `CM_AGENT_ID` changes (rex-cm gets a new agent, say), all existing passkeys for the old `CM_AGENT_ID` become inaccessible — by design. The passkey JSON is keyed by username, so swapping identities re-enrolls from scratch.
### Login flow — password
1. Browser hits `/` → middleware sees no session cookie → 302 to `/cm-auth?next=/`.
2. `/cm-auth` page is a Server Component (form is a Client Component for state).
3. User types `CM_AGENT_ID` and `CM_AGENT_PASSWORD`, submits.
4. Client calls the `loginWithPassword({ username, password })` Server Action.
5. Server Action:
   - Reads `CM_AGENT_ID` and `CM_AGENT_PASSWORD` from env.
   - **Constant-time compare** both fields using `crypto.timingSafeEqual` over equal-length buffers.
   - If both match: sets the session cookie with `{ username: CM_AGENT_ID, authenticatedAt: Date.now() }`.
   - If either doesn't: returns `{ ok: false, error: "invalid credentials" }` (no leakage about which one).
6. Browser redirects to `next` (default `/`).
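
The constant-time comparison in step 5 could look roughly like the sketch below. It assumes the zero-padding approach mentioned in the verification list; the helper names `safeEqual` and `credentialsMatch` are hypothetical and not part of the spec.

```typescript
import { timingSafeEqual } from "node:crypto";

// Compare two strings without short-circuiting on the first differing byte.
// Both inputs are copied into zero-filled buffers of the same length, because
// timingSafeEqual throws when the buffer lengths differ.
export function safeEqual(supplied: string, expected: string): boolean {
  const a = Buffer.from(supplied, "utf8");
  const b = Buffer.from(expected, "utf8");
  const len = Math.max(a.length, b.length, 1);
  const paddedA = Buffer.alloc(len);
  const paddedB = Buffer.alloc(len);
  a.copy(paddedA);
  b.copy(paddedB);
  const sameBytes = timingSafeEqual(paddedA, paddedB);
  // Padding alone would treat "abc" and "abc\0" as equal, so lengths must match too.
  return sameBytes && a.length === b.length;
}

// Hypothetical helper: both fields are evaluated before the results are
// combined, so a wrong username costs the same work as a wrong password.
export function credentialsMatch(
  supplied: { username: string; password: string },
  expected: { username: string; password: string },
): boolean {
  const userOk = safeEqual(supplied.username, expected.username);
  const passOk = safeEqual(supplied.password, expected.password);
  return userOk && passOk;
}
```

A real action would presumably also reject the attempt outright when either env var is unset or empty, since two empty strings compare equal here.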
### Login flow — passkey
1. `/cm-auth` page detects (client-side) whether `PublicKeyCredential.isUserVerifyingPlatformAuthenticatorAvailable()` returns true and whether at least one passkey is enrolled (server-supplied flag in the page payload).
2. If both true: render a "Sign in with passkey" button as the primary CTA, password form below.
3. Click triggers `beginAuthentication()` Server Action → returns `PublicKeyCredentialRequestOptionsJSON` with a fresh server-generated challenge.
4. Client invokes `@simplewebauthn/browser`'s `startAuthentication()`, which prompts Face ID / fingerprint.
5. Browser returns signed assertion → client passes to `finishAuthentication(response)` Server Action.
6. Server looks up the matching credential by ID, verifies the assertion via `@simplewebauthn/server`'s `verifyAuthenticationResponse`, stores the new signature counter, and sets the session cookie.
7. Browser redirects to `next`.
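
Steps 1 and 2 amount to a small decision function on the client. The sketch below is one way to write it, assuming the server-supplied flag arrives as a boolean; `shouldShowPasskeyCta` and `PlatformAuthProbe` are hypothetical names, and the probe is passed in so the logic can run outside a browser.

```typescript
// Minimal structural type for the part of the WebAuthn browser API used here.
type PlatformAuthProbe = {
  isUserVerifyingPlatformAuthenticatorAvailable?: () => Promise<boolean>;
};

// Show the passkey button only when a passkey is enrolled AND the browser
// reports a user-verifying platform authenticator (Face ID, Touch ID, etc.).
export async function shouldShowPasskeyCta(
  hasEnrolledPasskey: boolean,
  probe: PlatformAuthProbe | undefined,
): Promise<boolean> {
  if (!hasEnrolledPasskey) return false;
  if (!probe || typeof probe.isUserVerifyingPlatformAuthenticatorAvailable !== "function") {
    return false; // WebAuthn not available: fall back to the password form
  }
  try {
    return await probe.isUserVerifyingPlatformAuthenticatorAvailable();
  } catch {
    return false; // treat a failing probe as "not available"
  }
}
```

In the browser the probe would be `typeof PublicKeyCredential !== "undefined" ? PublicKeyCredential : undefined`.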
### Passkey enrollment flow
1. Once authenticated (via password), user visits `/cm-passkeys`.
2. "Add passkey" button → `beginRegistration()` Server Action returns `PublicKeyCredentialCreationOptionsJSON`.
3. Client invokes `@simplewebauthn/browser`'s `startRegistration()` — Face ID / fingerprint enrolls a new credential.
4. Client sends attestation to `finishRegistration(response, deviceName)` Server Action.
5. Server verifies via `verifyRegistrationResponse`, persists `{ id, publicKey, counter, transports, name, createdAt }` to the JSON file.
6. Page revalidates, the new passkey appears in the list.
The settings page lists existing passkeys with their device names + a "Remove" button. Removing a passkey deletes its row from the JSON file.
### Session
| Concern | Choice |
|---|---|
| Library | `iron-session` (single small dep, hooks into Next.js cleanly via App Router cookies API) |
| Cookie name | `cm_auth` |
| Cookie attrs | `httpOnly`, `secure` (when `NODE_ENV=production`), `sameSite=lax`, `path=/` |
| Expiry | 30-day rolling — refresh on every request that touches a page |
| Secret | `CM_AUTH_SECRET` env var. ≥32 chars random. Operator generates with `openssl rand -hex 32`. |
| Body | `{ username: string, authenticatedAt: number }` — kept minimal so a stale session doesn't carry stale state. |
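
The cookie attributes in the table translate into a small options object. This is a sketch only: `sessionCookieOptions` is a hypothetical helper, and the assumption is that the same options are re-applied whenever the cookie is refreshed, which is what makes the 30-day expiry rolling.

```typescript
const COOKIE_TTL_SECONDS = 30 * 24 * 60 * 60; // 30 days

type CookieOptions = {
  httpOnly: boolean;
  secure: boolean;
  sameSite: "lax";
  path: string;
  maxAge: number; // seconds
};

// Build the attributes for the cm_auth cookie. `secure` is only set in
// production so the cookie still works over plain http://localhost in dev.
export function sessionCookieOptions(nodeEnv: string | undefined): CookieOptions {
  return {
    httpOnly: true,
    secure: nodeEnv === "production",
    sameSite: "lax",
    path: "/",
    maxAge: COOKIE_TTL_SECONDS,
  };
}
```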
### Passkey storage
JSON file at `/data/auth/passkeys.json` inside the container. Mounted from a named volume `${CM_DEPLOY_NAME:-cm}-web-next-auth-data` so it persists across container restarts and image rebuilds.
Schema:
```json
{
  "<CM_AGENT_ID>": [
    {
      "id": "base64url-credential-id",
      "publicKey": "base64url-public-key",
      "counter": 42,
      "transports": ["internal", "hybrid"],
      "name": "iPhone 15 Pro",
      "createdAt": "2026-05-02T12:34:56Z"
    }
  ]
}
```
Top-level keys are `CM_AGENT_ID` values; values are arrays of credential records. The JSON file is read on every WebAuthn flow (small file, no caching needed) and written atomically (write to `passkeys.json.tmp`, fsync, rename).
A small wrapper module `web/lib/auth-store.ts` owns the read/write and locks via a single in-process mutex to prevent concurrent writes from racing.
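
The write-to-temp, fsync, rename sequence could be sketched as follows. The function names and the loose `Store` type are assumptions for illustration; the real module would use `PasskeyRecord`.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

type Store = Record<string, unknown[]>;

// Read the whole store; a missing file is treated as an empty store.
export async function readStore(filePath: string): Promise<Store> {
  try {
    return JSON.parse(await fs.readFile(filePath, "utf8")) as Store;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return {};
    throw err;
  }
}

// Write to a sibling temp file, flush it to disk, then rename over the target.
// rename() within one filesystem is atomic on POSIX, so a reader sees either
// the old file or the new one, never a half-written file.
export async function writeStoreAtomic(filePath: string, store: Store): Promise<void> {
  await fs.mkdir(path.dirname(filePath), { recursive: true });
  const tmpPath = `${filePath}.tmp`;
  const handle = await fs.open(tmpPath, "w", 0o600);
  try {
    await handle.writeFile(JSON.stringify(store, null, 2));
    await handle.sync(); // fsync before the rename
  } finally {
    await handle.close();
  }
  await fs.rename(tmpPath, filePath);
}
```

Because the temp file name is fixed, two overlapping writers would clobber each other's temp file, which is why the spec pairs this with an in-process lock.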
### Server Actions inventory
All in `web/app/auth-actions.ts` with `"use server"`:
| Action | Purpose |
|---|---|
| `loginWithPassword({ username, password })` | Constant-time compare → set cookie → return `{ ok }` |
| `logout()` | Clear cookie → return `{ ok: true }` |
| `beginRegistration()` | Generate registration options, store challenge in session, return options. Requires authenticated session. |
| `finishRegistration({ response, deviceName })` | Verify attestation, persist credential to JSON. Requires authenticated session. |
| `beginAuthentication()` | Generate authentication options, store challenge in session, return options. NO auth required (this IS the login). |
| `finishAuthentication({ response })` | Verify assertion, set cookie, return `{ ok }`. NO auth required. |
| `removePasskey({ credentialId })` | Delete from JSON. Requires authenticated session. |
The challenge for register/authenticate is stored in the session cookie (small, signed, transient). On the next call (`finishRegistration` / `finishAuthentication`) the server retrieves it from the cookie and clears it.
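
A minimal sketch of the retrieve-and-clear step, assuming the `pendingChallenge` shape from the `web/lib/auth.ts` session type. `consumeChallenge` is a hypothetical helper; the caller would still need to re-seal the session so the cleared slot reaches the cookie.

```typescript
type PendingChallenge = {
  kind: "register" | "authenticate";
  challenge: string;
  expiresAt: number; // epoch milliseconds
};

type ChallengeHolder = { pendingChallenge?: PendingChallenge };

// Return the stored challenge if it belongs to the expected flow and has not
// expired. The pending slot is cleared in every case, so a challenge can be
// consumed at most once.
export function consumeChallenge(
  session: ChallengeHolder,
  kind: PendingChallenge["kind"],
  now: number = Date.now(),
): string | null {
  const pending = session.pendingChallenge;
  delete session.pendingChallenge;
  if (!pending) return null;
  if (pending.kind !== kind) return null;
  if (pending.expiresAt <= now) return null;
  return pending.challenge;
}
```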
### Middleware
`web/middleware.ts` runs on every request:
```typescript
import { NextRequest, NextResponse } from "next/server";
import { getSessionFromCookie } from "@/lib/auth";

const PUBLIC_PATHS = new Set(["/cm-auth"]);

export async function middleware(req: NextRequest) {
  const path = req.nextUrl.pathname;
  if (PUBLIC_PATHS.has(path)) return NextResponse.next();

  const session = await getSessionFromCookie(req.cookies);
  if (!session) {
    const url = req.nextUrl.clone();
    url.pathname = "/cm-auth";
    url.searchParams.set("next", path);
    return NextResponse.redirect(url);
  }
  return NextResponse.next();
}

export const config = {
  // Skip _next, static, favicon, manifest, icon endpoints, etc.
  matcher: ["/((?!_next|icon|apple-icon|manifest.webmanifest|favicon.ico).*)"],
};
```
Server Actions are not separate routes. Next.js handles each one as a POST request to the page that invoked it, so the matcher covers an action only as far as it covers that page (the login actions run against the public `/cm-auth` path). Auth-required actions therefore check the session inside the action body instead of relying on the middleware.
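
One way to keep that in-body check uniform is a small wrapper around each auth-required action. This is a sketch, not the spec's API: `withSession` and `ActionResult` are hypothetical, and the session getter is injected so the example runs without Next.js.

```typescript
type Session = { username: string; authenticatedAt: number };
type ActionResult<T> = { ok: true; data: T } | { ok: false; error: string };

// Wrap an action body so it only runs when a session is present.
// In the app, getSession would be the helper from web/lib/auth.ts.
export function withSession<Args extends unknown[], T>(
  getSession: () => Promise<Session | null>,
  body: (session: Session, ...args: Args) => Promise<T>,
): (...args: Args) => Promise<ActionResult<T>> {
  return async (...args: Args) => {
    const session = await getSession();
    if (!session) return { ok: false, error: "unauthenticated" };
    return { ok: true, data: await body(session, ...args) };
  };
}
```

With this shape, `removePasskey`, `beginRegistration`, and `finishRegistration` would each be declared through the wrapper, while the login actions stay unwrapped.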
### Files Created / Modified
| File | Operation | Purpose |
|---|---|---|
| `web/middleware.ts` | Create | Route gate |
| `web/lib/auth.ts` | Create | Session create/read/destroy helpers (iron-session wrapper) |
| `web/lib/auth-store.ts` | Create | JSON-file CRUD for passkeys with in-process write lock |
| `web/app/auth-actions.ts` | Create | All Server Actions listed above |
| `web/app/cm-auth/page.tsx` | Create | Login UI (Server Component shell) |
| `web/app/cm-auth/auth-form.tsx` | Create | Client Component for the form + passkey button |
| `web/app/cm-passkeys/page.tsx` | Create | Passkey list + add/remove (Server Component) |
| `web/app/cm-passkeys/passkey-list.tsx` | Create | Client Component handling enrollment + removal |
| `web/components/nav.tsx` | Modify | Add Settings link + Sign-out button (account menu) |
| `web/package.json` | Modify | Add `iron-session`, `@simplewebauthn/server`, `@simplewebauthn/browser` |
| `docker-compose.yml` | Modify | Add `web-next-auth-data` named volume + mount in `web-next` service |
| `docker-compose.override.yml` | Modify | Same volume mount in dev override |
| `envs/dev/.env.example` | Modify | Add `CM_AUTH_SECRET=devsecret-32-bytes-or-more-please-rotate` |
| `envs/rex/.env.example` | Modify | Same with placeholder, operator generates real value |
| `envs/siong/.env.example` | Modify | Same |
| `AGENTS.md` | Modify | Add an "Auth" subsection documenting `CM_AUTH_SECRET` and the passkey JSON volume |
No file deletions. No changes outside `web/`, the two compose files, the per-deployment env templates, and `AGENTS.md`.
### `web/lib/auth.ts` shape
```typescript
import "server-only";
import { cookies } from "next/headers";
import { sealData, unsealData } from "iron-session";

const COOKIE_NAME = "cm_auth";
const COOKIE_TTL_SECONDS = 30 * 24 * 60 * 60;

type Session = {
  username: string;
  authenticatedAt: number;
  // Transient WebAuthn state (challenge, type) lives here too while a flow is in progress.
  pendingChallenge?: { kind: "register" | "authenticate"; challenge: string; expiresAt: number };
};

function secret(): string {
  const s = process.env.CM_AUTH_SECRET;
  if (!s || s.length < 32) {
    throw new Error("CM_AUTH_SECRET missing or shorter than 32 chars");
  }
  return s;
}

export async function getSession(): Promise<Session | null> { /* read cookie, unseal */ }
export async function setSession(s: Session): Promise<void> { /* seal, write cookie */ }
export async function clearSession(): Promise<void> { /* delete cookie */ }
export async function requireSession(): Promise<Session> { /* throws if no session */ }
```
`server-only` ensures this never bundles into client code (poison import — fails the build if imported from a client component).
### `web/lib/auth-store.ts` shape
```typescript
import "server-only";
import { promises as fs } from "node:fs";
import path from "node:path";
// Type import location depends on the SimpleWebAuthn version: recent releases
// export it from the server package, older ones from @simplewebauthn/types.
import type { AuthenticatorTransportFuture } from "@simplewebauthn/server";

const FILE_PATH = process.env.CM_AUTH_STORE_PATH ?? "/data/auth/passkeys.json";

export type PasskeyRecord = {
  id: string;
  publicKey: string;
  counter: number;
  transports: AuthenticatorTransportFuture[];
  name: string;
  createdAt: string;
};

let writeLock: Promise<void> = Promise.resolve();

export async function readPasskeys(username: string): Promise<PasskeyRecord[]> { /* ... */ }
export async function appendPasskey(username: string, rec: PasskeyRecord): Promise<void> { /* lock, read, append, atomic-write */ }
export async function removePasskey(username: string, credentialId: string): Promise<boolean> { /* lock, read, filter, atomic-write */ }
export async function bumpCounter(username: string, credentialId: string, counter: number): Promise<void> { /* same */ }
```
The `writeLock` chain serializes writes within a single Node process. With one container (no clustering) this is sufficient. If we ever scale `cm-web-next` horizontally, switch to a real lock file or move to mysql.
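
The promise-chain lock can be sketched in a few lines. `withWriteLock` is a hypothetical name; the assumption is that every mutating function in the store runs its read-modify-write cycle through it.

```typescript
let writeLock: Promise<void> = Promise.resolve();

// Queue `task` behind every previously queued task. A failed task rejects for
// its own caller but does not block the tasks queued after it.
export function withWriteLock<T>(task: () => Promise<T>): Promise<T> {
  const result = writeLock.then(task);
  // Swallow the outcome on the chain itself so one failure cannot poison it.
  writeLock = result.then(
    () => undefined,
    () => undefined,
  );
  return result;
}
```

The chain lives in module state, so it only orders writers inside one Node process, which matches the single-container assumption above.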
### Login page UI brief
frontend-design generates the `cm-auth/page.tsx` shell + `cm-auth/auth-form.tsx` client component matching the SaaS aesthetic of the rest of the dashboard. Concrete requirements:
- Centered card on the workbench backdrop, white with `ring-1 ring-zinc-200/60`, rounded-2xl.
- Brand mark (small "CM" tile) + "Sign in" heading.
- **Primary CTA:** "Sign in with passkey" button (large, dark zinc-900) — only rendered if the page payload says a passkey is enrolled AND the browser supports `isUserVerifyingPlatformAuthenticatorAvailable()`.
- **Below it:** "or username + password" divider, then two inputs (username, password) with a smaller "Sign in" button.
- Error state: inline red below the form if `loginWithPassword` returns `{ ok: false }`.
- All inputs use `text-base sm:text-[13px]` (the existing iOS auto-zoom fix).
- No "remember me" — cookie is rolling 30 days by default.
- "Forgot your password? Check the deployment's `.env` file" — small zinc-500 footer (matter-of-fact, internal-tool tone).
### Settings/passkeys page UI brief
- Standard dashboard layout (Nav, page heading "Passkeys").
- List of enrolled passkeys: name, created date, "Remove" button. Empty state: "No passkeys enrolled yet."
- "Add passkey" button at the top: opens a modal with a single text input ("Device name", e.g., "iPhone 15"), then triggers `startRegistration`.
- After successful enrollment: row appears, success toast fires (matches existing toast pattern).
### Nav modification
Add a small account menu on the right side (next to the existing Accounts/Users tab pills):
- A subtle button showing `CM_AGENT_ID` (truncated if long).
- On click: dropdown with "Passkey settings" → `/cm-passkeys`, and "Sign out" → calls `logout()` Server Action → redirect to `/cm-auth`.
The dropdown uses the same modal/sheet primitive style — no new component primitive.
## Verification
1. **Cold start.** `bash scripts/dev.sh up`. Open `http://localhost:8010/`. Redirected to `/cm-auth?next=%2F`.
2. **Password sign-in.** Type `CM_AGENT_ID` and `CM_AGENT_PASSWORD` from the dev `.env`. Submit. Redirect to `/`. Accounts table renders.
3. **Cookie set.** DevTools → Application → Cookies → `cm_auth` present, `httpOnly`, `secure` (in prod) / not (in dev because `NODE_ENV=development`), `sameSite=lax`, expires ~30 days.
4. **Wrong password.** Type wrong password. Form shows red "invalid credentials". No success toast. No cookie set.
5. **Sign out.** Click the user menu → Sign out. Redirected to `/cm-auth`. Cookie cleared.
6. **Passkey enrollment** (Chrome desktop with Touch ID, or iPhone). Sign in with password → `/cm-passkeys` → Add passkey → name "MacBook" → Touch ID prompt → success toast → row appears in list.
7. **Passkey login.** Sign out. `/cm-auth` now shows "Sign in with passkey" as primary CTA. Click → Touch ID → redirect to `/`.
8. **Passkey persistence.** `bash scripts/dev.sh down && bash scripts/dev.sh up`. Sign-in flow still recognizes the previously enrolled passkey (volume persisted).
9. **Passkey removal.** Sign in → `/cm-passkeys` → Remove. Row disappears, JSON file no longer contains it.
10. **Middleware coverage.** While signed out: `/`, `/users/`, `/cm-passkeys` all redirect to `/cm-auth`. `/cm-auth` itself does not redirect.
11. **Server Actions auth.** Calling `removePasskey` from a client without a valid session returns an error (auth-action body checks `getSession()` and throws/returns 401-equivalent).
12. **Constant-time compare.** Manually inspect `loginWithPassword` source — uses `crypto.timingSafeEqual` over zero-padded buffers of equal length. (No timing-channel leak about which field is wrong.)
13. **Volume preserved across rebuild.** `sudo docker compose -f docker-compose.yml -f docker-compose.override.yml build --no-cache web-next` then `up`. Passkey JSON survives.
## Risk
Medium.
- **JSON-file write durability.** A crash mid-write could corrupt the file. Mitigation: atomic write (`tmp` + `rename`), single in-process mutex. For one operator with low write frequency (passkey adds/removes are rare), this is sufficient. If we ever need multi-writer guarantees, switch to mysql.
- **`CM_AUTH_SECRET` rotation invalidates all sessions.** Expected behavior — operators understand a secret rotation logs everyone out. Document this.
- **Passkeys aren't multi-user.** If two operators ever need to share a deployment, they'd share the same `CM_AGENT_ID` identity and the same passkey list — fine for now but a hard scaling cliff. Captured as out-of-scope.
- **Browser support.** WebAuthn is supported in all modern browsers (iOS 16+, Chrome, Edge, Firefox, Safari). On unsupported browsers the password flow is the only path; we feature-detect and hide the passkey CTA.
- **iOS PWA standalone WebAuthn.** Apple has had platform bugs in earlier iOS versions where standalone PWAs couldn't trigger WebAuthn. iOS 17+ is reliable. Document the minimum version.
- **Server Action surface.** Server Actions ARE network-callable (Next.js routes them). They aren't "private functions" — anyone who reverse-engineers the Next.js wire format can call them. Mitigation: every action that requires auth checks the session inside the action body. The cost of reverse-engineering Next.js's encoding is much higher than calling an open `/api/foo` endpoint, so the practical attack surface is similar to a per-route auth-required `/api/*` proxy.
## Out-of-Scope Follow-Ups
- **B4 cutover:** separate cycle: delete `app/cm_web_view.py`, retire `cm-web` (Flask) service, rename `cm-web-next` → `cm-web`. After B4, the legacy Flask UI (which has no auth) goes away entirely.
- **Authelia / SSO:** if multi-deployment SSO ever becomes a need, swap the in-app auth for an Authelia container. No timeline; revisit if/when.
- **Session listing / revocation:** show "active sessions" on settings, allow remote logout. Useful for "I lost a phone" recovery if you want stricter than "rotate `CM_AUTH_SECRET`". YAGNI for now.
- **CSRF review for Server Actions:** Next.js already restricts Server Actions to POST and compares the request's `Origin` header against the host, but reviewing the framework's CSRF posture for our specific deployment is an exercise we can do separately.
- **Failed-login lockout:** a small per-IP counter that returns 429 after N bad password attempts. Defense-in-depth; aaPanel C4 rate-limit also helps.