B-auth: Login + WebAuthn Passkeys Design

Date: 2026-05-02
Status: Approved (design)
Sequel to: 2026-05-02-b2-b3-ui-port-pwa-design.md
Followed by: B4 cutover (delete app/cm_web_view.py, retire the cm-web Flask service, rename cm-web-next → cm-web).

Problem

The Next.js dashboard (cm-web-next) currently has zero auth. Anyone who can reach https://heng.04080616.xyz/ (the public vhost) lands directly on the accounts table. The plan was for aaPanel basic auth (C3) to gate the URL — and that's a fine outer defense — but the user wants:

  1. In-PWA Face ID / fingerprint sign-in. Once the PWA is installed, opening it should hit a real WebAuthn flow, not an OS-mediated basic-auth dialog. Passkeys feel native; basic auth in a chromeless PWA feels jarring.
  2. A password fallback for first-time login on a new device, or when biometric isn't available.

The existing CM_AGENT_ID / CM_AGENT_PASSWORD env vars already define an operator identity per deployment (rex-cm has an agent, siong-cm has an agent). Reusing those as the dashboard password — instead of building a separate user table — keeps B-auth scope small and avoids duplicating identity state.

Goal

Add an in-app login flow to cm-web-next:

  • A /login page that shows two options side-by-side: a "Sign in with passkey" button (preferred when one is enrolled on this device), and a username + password form (fallback).
  • Password sign-in compares against the existing CM_AGENT_ID and CM_AGENT_PASSWORD env vars using a constant-time compare.
  • WebAuthn passkey enrollment (after first password sign-in, on a settings page) lets the operator add a Face ID / Touch ID / fingerprint credential bound to the device. Subsequent visits skip the password.
  • Session state: a signed httpOnly cookie via iron-session. 30-day rolling expiry; refreshes on activity.
  • All auth state lives in cm-web-next — no api-server changes, no mysql schema change. Passkeys are stored as JSON in a docker volume mounted into the container.
  • Middleware gates every dashboard route except /login. The login-time Server Actions (loginWithPassword, beginAuthentication, finishAuthentication) are invoked from /login, so they stay reachable while logged out.

Non-Goals

  • No mysql schema change. Passkeys live in a JSON file in a docker volume. For one operator with maybe 2-4 devices total, a real DB table is overkill.
  • No separate identity service (Authelia, Keycloak, Cloudflare Access). All auth lives in cm-web-next. Authelia remains an out-of-scope upgrade path if multi-tenant or multi-deployment SSO ever becomes a need.
  • No multi-user support. One operator per deployment, identified by CM_AGENT_ID. The passkey JSON is keyed by CM_AGENT_ID so that if a deployment ever swaps identity, the passkeys for the old identity stay scoped to the old identity.
  • No "forgot password" flow. The password is the env var. If the operator can't remember it, they look it up in the deployment's .env. There is no recovery email, no reset token, none of that.
  • No api-server-side auth. api-server stays internal-only (per C5), reached only from inside the docker network by web-view and web-next. Auth is a cm-web-next concern, not an api-server concern.
  • No public /api/* routes for the auth flow. WebAuthn challenge/response goes through Server Actions, preserving the "no scrapable JSON surface" architecture.
  • B4 cutover is not in this scope. Legacy Flask cm_web_view.py keeps running with no auth (gated only by aaPanel basic auth on its https://... vhost) until B4 retires it.

Architecture

Identity model

One operator per cm-web-next instance, identified by CM_AGENT_ID. The same env var the bots use to log into cm99.net is reused as the dashboard username. The "session" is a cookie that says "the holder has authenticated as CM_AGENT_ID." Nothing more granular.

When CM_AGENT_ID changes (rex-cm gets a new agent, say), all existing passkeys for the old CM_AGENT_ID become inaccessible — by design. The passkey JSON is keyed by username, so swapping identities re-enrolls from scratch.

Login flow — password

  1. Browser hits / → middleware sees no session cookie → redirects (307, the NextResponse.redirect default) to /login?next=%2F.
  2. /login page is a Server Component (form is a Client Component for state).
  3. User types CM_AGENT_ID and CM_AGENT_PASSWORD, submits.
  4. Client calls the loginWithPassword({ username, password }) Server Action.
  5. Server Action:
    • Reads CM_AGENT_ID and CM_AGENT_PASSWORD from env.
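    • Compares both fields in constant time using crypto.timingSafeEqual. That function throws if the two buffers differ in length, so each input is first padded to an equal-length buffer. Both comparisons run before their results are combined.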
    • If both match: sets the session cookie with { username: CM_AGENT_ID, authenticatedAt: Date.now() }.
    • If either doesn't: returns { ok: false, error: "invalid credentials" } (no leakage about which one).
  6. Browser redirects to next (default /).
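
The comparison in step 5 can be sketched as follows. This is a minimal illustration, not the final loginWithPassword body: safeEqual and checkCredentials are hypothetical helper names, and the zero-padding approach assumes that leaking the longer input's length through allocation size is acceptable for this tool.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper: compare two strings without exiting early on the
// first differing byte. Both inputs are copied into zero-filled buffers of
// the same length because timingSafeEqual throws on unequal lengths.
function safeEqual(a: string, b: string): boolean {
  const rawA = Buffer.from(a, "utf8");
  const rawB = Buffer.from(b, "utf8");
  const len = Math.max(rawA.length, rawB.length, 1);
  const paddedA = Buffer.alloc(len);
  const paddedB = Buffer.alloc(len);
  rawA.copy(paddedA);
  rawB.copy(paddedB);
  const sameBytes = timingSafeEqual(paddedA, paddedB);
  // Padding alone would treat "abc" and "abc\0" as equal, so the original
  // lengths must match as well.
  return sameBytes && rawA.length === rawB.length;
}

// Hypothetical helper: evaluate both fields before combining, so a wrong
// username costs the same work as a wrong password.
function checkCredentials(
  input: { username: string; password: string },
  expected: { username: string; password: string },
): boolean {
  const userOk = safeEqual(input.username, expected.username);
  const passOk = safeEqual(input.password, expected.password);
  return userOk && passOk;
}
```

In the real action, expected would come from CM_AGENT_ID and CM_AGENT_PASSWORD, and a false result maps to { ok: false, error: "invalid credentials" }.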

Login flow — passkey

  1. /login page detects (client-side) whether PublicKeyCredential.isUserVerifyingPlatformAuthenticatorAvailable() resolves to true and whether at least one passkey is enrolled (server-supplied flag in the page payload).
  2. If both true: render a "Sign in with passkey" button as the primary CTA, password form below.
  3. Click triggers beginAuthentication() Server Action → returns PublicKeyCredentialRequestOptionsJSON with a fresh server-generated challenge.
  4. Client invokes @simplewebauthn/browser's startAuthentication(), which prompts Face ID / fingerprint.
  5. Browser returns signed assertion → client passes it to the finishAuthentication({ response }) Server Action.
  6. Server looks up the matching credential by ID, verifies the assertion via @simplewebauthn/server's verifyAuthenticationResponse, stores the new signature counter reported by the authenticator, and sets the session cookie.
  7. Browser redirects to next.
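
Both login flows end by redirecting to the next query parameter, which arrives from the URL and is therefore attacker-controllable. A minimal sketch of a guard that accepts only same-origin paths follows; sanitizeNext is a hypothetical helper name, not something the design already defines.

```typescript
// Hypothetical helper: return a safe redirect target, falling back to "/".
function sanitizeNext(next: string | null | undefined): string {
  if (!next) return "/";
  // Only absolute paths on this origin. "//host" is protocol-relative and
  // some browsers treat "\" like "/", so both are rejected.
  if (!next.startsWith("/") || next.startsWith("//") || next.includes("\\")) {
    return "/";
  }
  // Reject control characters (for example CR/LF) outright.
  if (/[\u0000-\u001f]/.test(next)) return "/";
  return next;
}
```

The login form would call this on the next value before redirecting, so /login?next=https://example.com lands on / instead of leaving the site.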

Passkey enrollment flow

  1. Once authenticated (via password), user visits /settings/passkeys.
  2. "Add passkey" button → beginRegistration() Server Action returns PublicKeyCredentialCreationOptionsJSON.
  3. Client invokes @simplewebauthn/browser's startRegistration() — Face ID / fingerprint enrolls a new credential.
  4. Client sends the attestation to the finishRegistration({ response, deviceName }) Server Action.
  5. Server verifies via verifyRegistrationResponse, persists { id, publicKey, counter, transports, name, createdAt } to the JSON file.
  6. Page revalidates, the new passkey appears in the list.

The settings page lists existing passkeys with their device names + a "Remove" button. Removing a passkey deletes its row from the JSON file.
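
As a rough sketch of step 5, the verified credential can be mapped onto the stored record like this. VerifiedCredential is an assumed input shape (an ID already in base64url form plus raw public-key bytes), and toPasskeyRecord and the 64-character name cap are illustrative choices rather than part of the design.

```typescript
type PasskeyRecord = {
  id: string;
  publicKey: string;
  counter: number;
  transports: string[];
  name: string;
  createdAt: string;
};

// Assumed shape of what the verification step hands back; adjust to the
// actual return type of the WebAuthn library version in use.
type VerifiedCredential = {
  id: string;
  publicKey: Uint8Array;
  counter: number;
  transports?: string[];
};

// Hypothetical helper: build the JSON-serializable record for the store.
function toPasskeyRecord(
  cred: VerifiedCredential,
  deviceName: string,
  now: Date = new Date(),
): PasskeyRecord {
  // Trim and cap the operator-supplied label; fall back if it is empty.
  const name = deviceName.trim().slice(0, 64) || "Unnamed device";
  return {
    id: cred.id,
    // Raw key bytes are stored as base64url so the file stays plain JSON.
    publicKey: Buffer.from(cred.publicKey).toString("base64url"),
    counter: cred.counter,
    transports: cred.transports ?? [],
    name,
    // ISO 8601 in UTC; note toISOString() includes milliseconds.
    createdAt: now.toISOString(),
  };
}
```

Reading a record back reverses the encoding with Buffer.from(record.publicKey, "base64url").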

Session

| Concern | Choice |
| --- | --- |
| Library | iron-session (single small dep, hooks into Next.js cleanly via App Router cookies API) |
| Cookie name | cm_auth |
| Cookie attrs | httpOnly, secure (when NODE_ENV=production), sameSite=lax, path=/ |
| Expiry | 30-day rolling; refreshed on every request that touches a page (re-sealed in middleware, since Server Components cannot set cookies) |
| Secret | CM_AUTH_SECRET env var, ≥32 random chars. Operator generates with openssl rand -hex 32. |
| Body | { username: string, authenticatedAt: number }, kept minimal so a stale session doesn't carry stale state. |
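
The cookie attributes above translate into a small options object. The sketch below is illustrative only: sessionCookieOptions is a hypothetical helper, and the returned shape assumes the usual Next.js cookie option names.

```typescript
const COOKIE_TTL_SECONDS = 30 * 24 * 60 * 60; // 30 days

type SessionCookieOptions = {
  httpOnly: true;
  secure: boolean;
  sameSite: "lax";
  path: "/";
  maxAge: number;
};

// Hypothetical helper: attributes for the cm_auth cookie. Re-issuing the
// cookie with the same maxAge on each request is what makes the expiry roll.
function sessionCookieOptions(nodeEnv: string | undefined): SessionCookieOptions {
  return {
    httpOnly: true,
    // Plain http://localhost in dev would drop a Secure cookie.
    secure: nodeEnv === "production",
    sameSite: "lax",
    path: "/",
    maxAge: COOKIE_TTL_SECONDS,
  };
}
```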

Passkey storage

JSON file at /data/auth/passkeys.json inside the container. Mounted from a named volume ${CM_DEPLOY_NAME:-cm}-web-next-auth-data so it persists across container restarts and image rebuilds.

Schema:

{
  "<CM_AGENT_ID>": [
    {
      "id": "base64url-credential-id",
      "publicKey": "base64url-public-key",
      "counter": 42,
      "transports": ["internal", "hybrid"],
      "name": "iPhone 15 Pro",
      "createdAt": "2026-05-02T12:34:56Z"
    }
  ]
}

Top-level keys are CM_AGENT_ID values; values are arrays of credential records. The JSON file is read on every WebAuthn flow (small file, no caching needed) and written atomically (write to passkeys.json.tmp, fsync, rename).

A small wrapper module web/lib/auth-store.ts owns the read/write and locks via a single in-process mutex to prevent concurrent writes from racing.

Server Actions inventory

All in web/app/auth-actions.ts with "use server":

| Action | Purpose |
| --- | --- |
| loginWithPassword({ username, password }) | Constant-time compare → set cookie → return { ok } |
| logout() | Clear cookie → return { ok: true } |
| beginRegistration() | Generate registration options, store challenge in session, return options. Requires authenticated session. |
| finishRegistration({ response, deviceName }) | Verify attestation, persist credential to JSON. Requires authenticated session. |
| beginAuthentication() | Generate authentication options, store challenge in session, return options. NO auth required (this IS the login). |
| finishAuthentication({ response }) | Verify assertion, set cookie, return { ok }. NO auth required. |
| removePasskey({ credentialId }) | Delete from JSON. Requires authenticated session. |

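The challenge for register/authenticate is stored in the session cookie (small, signed, transient). On the next call (finishRegistration / finishAuthentication) the server retrieves it from the cookie and clears it. Because beginAuthentication runs before login, the cookie it writes carries only the pending challenge and no username. Middleware and requireSession must treat such a cookie as unauthenticated.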
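
The store-then-consume lifecycle described above might look like the following sketch. rememberChallenge, consumeChallenge, and the five-minute TTL are assumptions made for illustration; the challenge string itself would come from the options generated by the WebAuthn library.

```typescript
type ChallengeKind = "register" | "authenticate";

type PendingChallenge = {
  kind: ChallengeKind;
  challenge: string;
  expiresAt: number;
};

type ChallengeHolder = { pendingChallenge?: PendingChallenge };

const CHALLENGE_TTL_MS = 5 * 60 * 1000; // assumed lifetime, not from the spec

// Hypothetical helper: record the challenge the begin* action just issued.
function rememberChallenge(
  session: ChallengeHolder,
  kind: ChallengeKind,
  challenge: string,
  now: number = Date.now(),
): void {
  session.pendingChallenge = { kind, challenge, expiresAt: now + CHALLENGE_TTL_MS };
}

// Hypothetical helper: return the challenge exactly once. It is cleared on
// every call, so a failed or replayed finish* call cannot reuse it.
function consumeChallenge(
  session: ChallengeHolder,
  kind: ChallengeKind,
  now: number = Date.now(),
): string | null {
  const pending = session.pendingChallenge;
  delete session.pendingChallenge;
  if (!pending || pending.kind !== kind || pending.expiresAt <= now) return null;
  return pending.challenge;
}
```

After either helper mutates the session object, the caller still has to re-seal it into the cookie; a null from consumeChallenge should fail the finish* action.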

Middleware

web/middleware.ts runs on every request:

import { NextRequest, NextResponse } from "next/server";
import { getSessionFromCookie } from "@/lib/auth";

const PUBLIC_PATHS = new Set(["/login"]);

export async function middleware(req: NextRequest) {
  const path = req.nextUrl.pathname;
  if (PUBLIC_PATHS.has(path)) return NextResponse.next();

  const session = await getSessionFromCookie(req.cookies);
  // A pre-login cookie may hold only a pending WebAuthn challenge, so require a username.
  if (!session?.username) {
    const url = req.nextUrl.clone();
    url.pathname = "/login";
    url.searchParams.set("next", path);
    return NextResponse.redirect(url);
  }
  return NextResponse.next();
}

export const config = {
  // Skip _next, static, favicon, manifest, icon endpoints, etc.
  matcher: ["/((?!_next|icon|apple-icon|manifest.webmanifest|favicon.ico).*)"],
};

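Server Actions are not a separate endpoint: Next.js sends each one as a POST to the URL of the page that invoked it, identified by a Next-Action header. The login-time actions are invoked from /login, which is in PUBLIC_PATHS, so they stay reachable while logged out. Actions invoked from gated pages do pass through the middleware, but a redirect is not an authorization check, so every auth-required action also verifies the session inside the action body.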
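
One way to keep that in-body check uniform is a small wrapper. This is only a sketch: withAuth and ActionResult are hypothetical names, and the session getter is injected so the example stays independent of the real web/lib/auth.ts.

```typescript
type Session = { username: string; authenticatedAt: number };

type ActionResult<T> = { ok: true; data: T } | { ok: false; error: string };

// Hypothetical wrapper: run the action only when the cookie holds a fully
// authenticated session. A cookie carrying just a pending challenge has no
// username and is rejected.
function withAuth<A, T>(
  getSession: () => Promise<Partial<Session> | null>,
  action: (session: Session, args: A) => Promise<T>,
): (args: A) => Promise<ActionResult<T>> {
  return async (args: A) => {
    const s = await getSession();
    if (!s || !s.username || typeof s.authenticatedAt !== "number") {
      return { ok: false, error: "unauthenticated" };
    }
    const session: Session = { username: s.username, authenticatedAt: s.authenticatedAt };
    return { ok: true, data: await action(session, args) };
  };
}
```

removePasskey, beginRegistration, and finishRegistration would be wrapped this way, while the three login-time actions would not.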

Files Created / Modified

| File | Operation | Purpose |
| --- | --- | --- |
| web/middleware.ts | Create | Route gate |
| web/lib/auth.ts | Create | Session create/read/destroy helpers (iron-session wrapper) |
| web/lib/auth-store.ts | Create | JSON-file CRUD for passkeys with in-process write lock |
| web/app/auth-actions.ts | Create | All Server Actions listed above |
| web/app/login/page.tsx | Create | Login UI (Server Component shell) |
| web/app/login/login-form.tsx | Create | Client Component for the form + passkey button |
| web/app/settings/passkeys/page.tsx | Create | Passkey list + add/remove (Server Component) |
| web/app/settings/passkeys/passkey-list.tsx | Create | Client Component handling enrollment + removal |
| web/components/nav.tsx | Modify | Add Settings link + Sign-out button (account menu) |
| web/package.json | Modify | Add iron-session, @simplewebauthn/server, @simplewebauthn/browser |
| docker-compose.yml | Modify | Add web-next-auth-data named volume + mount in web-next service |
| docker-compose.override.yml | Modify | Same volume mount in dev override |
| envs/dev/.env.example | Modify | Add CM_AUTH_SECRET=devsecret-32-bytes-or-more-please-rotate |
| envs/rex/.env.example | Modify | Same with placeholder, operator generates real value |
| envs/siong/.env.example | Modify | Same |
| AGENTS.md | Modify | Add an "Auth" subsection documenting CM_AUTH_SECRET and the passkey JSON volume |

No file deletions. No changes outside web/, the two compose files, the per-deployment env templates, and AGENTS.md.

web/lib/auth.ts shape

import "server-only";
import { cookies } from "next/headers";
import { sealData, unsealData } from "iron-session";

const COOKIE_NAME = "cm_auth";
const COOKIE_TTL_SECONDS = 30 * 24 * 60 * 60;

type Session = {
  username: string;
  authenticatedAt: number;
  // Transient WebAuthn state (challenge, type) lives here too while a flow is in progress.
  pendingChallenge?: { kind: "register" | "authenticate"; challenge: string; expiresAt: number };
};

function secret(): string {
  const s = process.env.CM_AUTH_SECRET;
  if (!s || s.length < 32) {
    throw new Error("CM_AUTH_SECRET missing or shorter than 32 chars");
  }
  return s;
}

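// Used by middleware, which reads cookies from the request instead of next/headers.
export async function getSessionFromCookie(cookies: { get(name: string): { value: string } | undefined }): Promise<Session | null> { /* read cookie, unseal */ }
export async function getSession(): Promise<Session | null> { /* read cookie, unseal */ }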
export async function setSession(s: Session): Promise<void> { /* seal, write cookie */ }
export async function clearSession(): Promise<void> { /* delete cookie */ }
export async function requireSession(): Promise<Session> { /* throws if no session */ }

server-only ensures this never bundles into client code (poison import — fails the build if imported from a client component).

web/lib/auth-store.ts shape

import "server-only";
import { promises as fs } from "node:fs";
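import path from "node:path";
// Exported from @simplewebauthn/server in v13+; older releases ship it in @simplewebauthn/types.
import type { AuthenticatorTransportFuture } from "@simplewebauthn/server";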

const FILE_PATH = process.env.CM_AUTH_STORE_PATH ?? "/data/auth/passkeys.json";

export type PasskeyRecord = {
  id: string;
  publicKey: string;
  counter: number;
  transports: AuthenticatorTransportFuture[];
  name: string;
  createdAt: string;
};

let writeLock: Promise<void> = Promise.resolve();

export async function readPasskeys(username: string): Promise<PasskeyRecord[]> { /* ... */ }
export async function appendPasskey(username: string, rec: PasskeyRecord): Promise<void> { /* lock, read, append, atomic-write */ }
export async function removePasskey(username: string, credentialId: string): Promise<boolean> { /* lock, read, filter, atomic-write */ }
export async function bumpCounter(username: string, credentialId: string, counter: number): Promise<void> { /* same */ }

The writeLock chain serializes writes within a single Node process. With one container (no clustering) this is sufficient. If we ever scale cm-web-next horizontally, switch to a real lock file or move to mysql.
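
A minimal sketch of the promise-chain lock combined with the tmp + fsync + rename write is shown below. withWriteLock and atomicWriteJson are hypothetical helper names, and the 0o600 file mode is an assumption; the real module may structure this differently.

```typescript
import { promises as fs } from "node:fs";

let writeLock: Promise<void> = Promise.resolve();

// Hypothetical helper: queue fn behind every earlier writer. The stored
// chain swallows failures so one rejected write does not block later ones,
// while the caller still sees its own rejection.
function withWriteLock<T>(fn: () => Promise<T>): Promise<T> {
  const run = writeLock.then(() => fn());
  writeLock = run.then(
    () => undefined,
    () => undefined,
  );
  return run;
}

// Hypothetical helper: write JSON so readers see either the old file or
// the complete new one, never a partial write.
async function atomicWriteJson(filePath: string, data: unknown): Promise<void> {
  const tmpPath = `${filePath}.tmp`;
  const handle = await fs.open(tmpPath, "w", 0o600);
  try {
    await handle.writeFile(JSON.stringify(data, null, 2));
    await handle.sync(); // flush contents to disk before the rename
  } finally {
    await handle.close();
  }
  // rename replaces the target atomically when both paths share a filesystem.
  await fs.rename(tmpPath, filePath);
}
```

appendPasskey, removePasskey, and bumpCounter would each do their read-modify-write inside withWriteLock so that two concurrent enrollments cannot overwrite each other.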

Login page UI brief

frontend-design generates login/page.tsx shell + login-form.tsx client component matching the SaaS aesthetic of the rest of the dashboard. Concrete requirements:

  • Centered card on the workbench backdrop, white with ring-1 ring-zinc-200/60, rounded-2xl.
  • Brand mark (small "CM" tile) + "Sign in" heading.
  • Primary CTA: "Sign in with passkey" button (large, dark zinc-900), rendered only if the page payload says a passkey is enrolled AND isUserVerifyingPlatformAuthenticatorAvailable() resolves to true in the browser.
  • Below it: "or username + password" divider, then two inputs (username, password) with a smaller "Sign in" button.
  • Error state: inline red below the form if loginWithPassword returns { ok: false }.
  • All inputs use text-base sm:text-[13px] (the existing iOS auto-zoom fix).
  • No "remember me" — cookie is rolling 30 days by default.
  • "Forgot your password? Check the deployment's .env file" — small zinc-500 footer (matter-of-fact, internal-tool tone).

Settings/passkeys page UI brief

  • Standard dashboard layout (Nav, page heading "Passkeys").
  • List of enrolled passkeys: name, created date, "Remove" button. Empty state: "No passkeys enrolled yet."
  • "Add passkey" button at the top: opens a modal with a single text input ("Device name", e.g., "iPhone 15"), then triggers startRegistration.
  • After successful enrollment: row appears, success toast fires (matches existing toast pattern).

Nav modification

Add a small account menu on the right side (next to the existing Accounts/Users tab pills):

  • A subtle button showing CM_AGENT_ID (truncated if long).
  • On click: dropdown with "Passkey settings" → /settings/passkeys, and "Sign out" → calls logout() Server Action → redirect to /login.

The dropdown uses the same modal/sheet primitive style — no new component primitive.

Verification

  1. Cold start. bash scripts/dev.sh up. Open http://localhost:8010/. Redirected to /login?next=%2F.
  2. Password sign-in. Type CM_AGENT_ID and CM_AGENT_PASSWORD from the dev .env. Submit. Redirect to /. Accounts table renders.
  3. Cookie set. DevTools → Application → Cookies → cm_auth present, httpOnly, secure (in prod) / not (in dev because NODE_ENV=development), sameSite=lax, expires ~30 days.
  4. Wrong password. Type wrong password. Form shows red "invalid credentials". No success toast. No cookie set.
  5. Sign out. Click the user menu → Sign out. Redirected to /login. Cookie cleared.
  6. Passkey enrollment (Chrome desktop with Touch ID, or iPhone). Sign in with password → settings/passkeys → Add passkey → name "MacBook" → Touch ID prompt → success toast → row appears in list.
  7. Passkey login. Sign out. /login now shows "Sign in with passkey" as primary CTA. Click → Touch ID → redirect to /.
  8. Passkey persistence. bash scripts/dev.sh down && bash scripts/dev.sh up. Sign-in flow still recognizes the previously enrolled passkey (volume persisted).
  9. Passkey removal. Sign in → settings/passkeys → Remove. Row disappears, JSON file no longer contains it.
  10. Middleware coverage. While signed out: /, /users/, /settings/passkeys all redirect to /login. /login itself does not redirect.
  11. Server Actions auth. Calling removePasskey from a client without a valid session returns an error (auth-action body checks getSession() and throws/returns 401-equivalent).
  12. Constant-time compare. Manually inspect loginWithPassword source — uses crypto.timingSafeEqual over zero-padded buffers of equal length. (No timing-channel leak about which field is wrong.)
  13. Volume preserved across rebuild. sudo docker compose -f docker-compose.yml -f docker-compose.override.yml build --no-cache web-next then up. Passkey JSON survives.

Risk

Medium.

  • JSON-file write durability. A crash mid-write could corrupt the file. Mitigation: atomic write (tmp + rename), single in-process mutex. For one operator with low write frequency (passkey adds/removes are rare), this is sufficient. If we ever need multi-writer guarantees, switch to mysql.
  • CM_AUTH_SECRET rotation invalidates all sessions. Expected behavior — operators understand a secret rotation logs everyone out. Document this.
  • Passkeys aren't multi-user. If two operators ever need to share a deployment, they'd share the same CM_AGENT_ID identity and the same passkey list — fine for now but a hard scaling cliff. Captured as out-of-scope.
  • Browser support. WebAuthn is supported in all modern browsers (iOS 16+, Chrome, Edge, Firefox, Safari). On unsupported browsers the password flow is the only path; we feature-detect and hide the passkey CTA.
  • iOS PWA standalone WebAuthn. Apple has had platform bugs in earlier iOS versions where standalone PWAs couldn't trigger WebAuthn. iOS 17+ is reliable. Document the minimum version.
  • Server Action surface. Server Actions ARE network-callable (Next.js routes them). They aren't "private functions" — anyone who reverse-engineers the Next.js wire format can call them. Mitigation: every action that requires auth checks the session inside the action body. The cost of reverse-engineering Next.js's encoding is much higher than calling an open /api/foo endpoint, so the practical attack surface is similar to a per-route auth-required /api/* proxy.

Out-of-Scope Follow-Ups

  • B4 cutover (separate cycle): delete app/cm_web_view.py, retire the cm-web (Flask) service, rename cm-web-next → cm-web. After B4, the legacy Flask UI (which has no auth) goes away entirely.
  • Authelia / SSO — if multi-deployment SSO ever becomes a need, swap the in-app auth for an Authelia container. No timeline; revisit if/when.
  • Session listing / revocation — show "active sessions" on settings, allow remote logout. Useful for "I lost a phone" recovery if you want stricter than "rotate CM_AUTH_SECRET". YAGNI for now.
  • CSRF review for Server Actions: Next.js accepts Server Actions only as POST requests and compares the Origin header against the Host (or X-Forwarded-Host) header rather than issuing a per-form token. Reviewing that posture for our specific deployment is an exercise we can do separately.
  • Failed-login lockout — a small per-IP counter that returns 429 after N bad password attempts. Defense-in-depth; aaPanel C4 rate-limit also helps.