cm_bot_v2/docs/aapanel-hardening.md
yiekheng ebccad2094 B4 cutover: retire Flask cm-web, rename cm-web-next → cm-web
End-state: a single web service (Next.js dashboard) per deployment, no
side-by-side Flask UI. The image name 'cm-web' now points at the Next.js
build; the legacy 'cm-web-next' tag is no longer published.

Changes:
- Delete app/cm_web_view.py and the Flask docker/web/Dockerfile.
- Rename docker/web-next/ → docker/web/ (Next.js Dockerfile takes the
  cm-web slot).
- docker-compose.yml: drop the web-view service. Rename web-next → web,
  container ${CM_DEPLOY_NAME}-web-next → ${CM_DEPLOY_NAME}-web, image
  cm-web-next → cm-web, named volume web-next-auth-data → web-auth-data.
  transfer-bot's depends_on no longer references web-view (vestigial
  startup ordering, never a runtime dependency).
- docker-compose.override.yml: same rename, dockerfile path updated.
- envs: drop CM_WEB_NEXT_HOST_PORT. Repurpose CM_WEB_HOST_PORT for the
  Next.js port (8010 dev, 8011 rex, 8012 siong) — same numeric values
  formerly held by CM_WEB_NEXT_HOST_PORT, so aaPanel routes don't move.
- scripts/dev.sh: drops web-view + web-next from up/reset-db/logs;
  --remove-orphans still cleans up legacy containers from before cutover.
- scripts/publish.sh: drop the cm-web-next build target.
- tests/test_debug_enabled.py: drop app.cm_web_view from the helper
  matrix (cm_api is now the only Flask entrypoint with _debug_enabled).
- AGENTS.md / README.md / docs/aapanel-hardening.md: rewrite Flask-era
  references; add migration steps for existing stacks; update aaPanel
  port references (8000/8001/8005 → 8010/8011/8012).
- .gitignore: add .env, .venv/, .playwright-mcp/, node_modules/, .next/
  so 'git add -A' can't accidentally stage secrets or build artifacts.

Operator action required to upgrade an existing deployment:
  1. .env: drop CM_WEB_NEXT_HOST_PORT line. Set CM_WEB_HOST_PORT to
     what CM_WEB_NEXT_HOST_PORT was. Make sure CM_AUTH_SECRET is set.
  2. aaPanel: if proxy_pass pointed at the legacy Flask port
     (8000/8001/8005), switch it to the new one (8010/8011/8012).
  3. Pull the new cm-web image (Next.js) and redeploy the stack. The
     old ${CM_DEPLOY_NAME}-web-view and ${CM_DEPLOY_NAME}-web-next
     containers will be replaced by a single ${CM_DEPLOY_NAME}-web.

Verified locally: docker-compose YAML parses; transfer-bot runtime is
unchanged (only depends_on tidied); 38-test python suite passes.
2026-05-03 10:12:20 +08:00


# aaPanel Hardening Guide (Operator)
This is the hand-over guide for the C3 (auth), C4 (rate-limit + scanner deflection), and C7 (host firewall) slices of the prod hardening cycle. None of this is implemented in the repo; it lives in your aaPanel configuration and on the host(s) that run each stack.
Companion spec: [superpowers/specs/2026-05-02-prod-hardening-c1-c5-c6-design.md](superpowers/specs/2026-05-02-prod-hardening-c1-c5-c6-design.md).
## Threat model
aaPanel terminates TLS for `https://<rex-domain>`, `https://<siong-domain>`, and `https://heng.04080616.xyz` (the dev tier; see "Dev vhost" below) and proxies to LAN-reachable Next.js dashboard ports on each host (8011 rex, 8012 siong, 8010 dev). Every request from a scanner on the public internet travels scanner → aaPanel → app. Without these mitigations, every `/.env`, `/.git/config`, `/.aws/config`, `/.htpasswd`, or `/php.php` probe round-trips through the proxy. With them, aaPanel returns 444 immediately and the app never sees the request.
> **Post-B4 update.** The dashboard now has built-in `/cm-auth` (password + WebAuthn passkey) that gates every route via Next.js middleware. C3 (basic auth at the proxy) is no longer the *primary* defense — it's optional belt-and-braces. Keep it only if you want a second factor at the edge before the Next.js middleware sees a request. The C4 (scanner deflection + rate limit) and C7 (host firewall) sections still apply unchanged in spirit; only the port numbers moved.
## C3 — (Optional) Basic auth on the rex/siong/dev vhosts
Goal: an extra password challenge at the edge before requests reach `/cm-auth`. Skip this if `/cm-auth` is enough for your threat model.
Generate an htpasswd file (one per deployment is cleaner):
```bash
# On the aaPanel host, as root:
htpasswd -c /www/server/panel/data/htpasswd-rex rex-operator
htpasswd -c /www/server/panel/data/htpasswd-siong siong-operator
htpasswd -c /www/server/panel/data/htpasswd-dev dev-operator
chmod 640 /www/server/panel/data/htpasswd-*
chown www:www /www/server/panel/data/htpasswd-*
```
Add to the rex vhost's `server { ... }` block (aaPanel: site → settings → "Configuration File"):
```nginx
auth_basic "rex restricted";
auth_basic_user_file /www/server/panel/data/htpasswd-rex;
```
Same shape for siong (`htpasswd-siong`) and dev (`htpasswd-dev`). Use a different password per deployment — reusing the same one means a leaked dev credential exposes prod. Reload nginx (aaPanel does this automatically on save).
### Phone UX note
Basic auth works with the iOS/Android keychain and Face ID / Touch ID: on first login, save the password into the OS keychain when prompted ("Save password to iCloud Keychain" on iOS, "Save to Google Password Manager" on Android). Subsequent visits trigger Face ID / fingerprint to autofill the basic-auth dialog. Caveats:
- **Safari (iOS):** integration is reliable. Face ID prompts almost every visit unless you tick "Remember me on this device" in Safari's password autofill settings.
- **Chrome (Android):** Google Password Manager autofills basic-auth in newer Chrome versions; biometric prompt appears.
- **In-app browsers (Telegram, WhatsApp link previews):** often *don't* autofill basic-auth and force you to type. If this matters, share `https://...` URLs and ask people to open in their default browser.
If autofill behavior is choppy, the upgrade path is Authelia + WebAuthn passkeys — its own future cycle, not in this one.
## C4 — Rate limit + scanner deflection
### Scanner deflection (444 on known probe paths)
In each vhost's `server { ... }`:
```nginx
# Deflect generic web vulnerability scanners. Return 444 (no response,
# closes connection) instead of letting them reach the app.
location ~* "^/(\.env|\.env\..*|\.git/.*|\.aws/.*|\.dockerenv|\.htpasswd|\.npmrc|.+\.php|i\.php|test\.php|php\.php|wp-(login|admin|content)/)" {
    access_log off;
    return 444;
}
# Robots: tell well-behaved crawlers to leave us alone.
location = /robots.txt {
    # default_type sets the response Content-Type; add_header would add a
    # second Content-Type header next to the one nginx already sends.
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}
```
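Before pasting the pattern into a vhost, it can help to dry-run it against a few paths. The sketch below reuses the regex from the `location` block above and approximates nginx's case-insensitive `~*` match with `grep -E -i`; the function name `would_deflect` and the sample paths are illustrative, and nginx's PCRE engine may differ from `grep` in edge cases.

```bash
#!/usr/bin/env bash
# Same pattern as the nginx location block; grep -E -i stands in for "~*".
PROBE_RE='^/(\.env|\.env\..*|\.git/.*|\.aws/.*|\.dockerenv|\.htpasswd|\.npmrc|.+\.php|i\.php|test\.php|php\.php|wp-(login|admin|content)/)'

# would_deflect PATH: succeeds if the pattern matches the request path.
would_deflect() {
  printf '%s\n' "$1" | grep -Eiq "$PROBE_RE"
}

# Sample paths (hypothetical): three probes and two legitimate routes.
for path in /.env /.git/config /wp-login.php /cm-auth /api/users; do
  if would_deflect "$path"; then
    echo "444  $path"
  else
    echo "pass $path"
  fi
done
```

The pattern has no end anchor, so it matches by prefix: any path that starts with `/.env` is deflected, not only `/.env` itself. Check that none of the dashboard's real routes fall under it.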
### Rate limit (per source IP)
In the `http { ... }` block (one level above `server`; in aaPanel typically lives in the global nginx config or in a snippet):
```nginx
# 10MB shared zone, 30 requests/sec per source IP.
limit_req_zone $binary_remote_addr zone=cm_general:10m rate=30r/s;
```
Then inside each vhost's `server { ... }`:
```nginx
# Serve bursts of up to 60 requests above the rate immediately (nodelay);
# anything beyond the burst is rejected rather than queued.
limit_req zone=cm_general burst=60 nodelay;
limit_req_status 429;
```
30 r/s per source IP is generous for legitimate UI traffic and tight enough to slow a scanner down to nuisance levels.
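To get a feel for how `rate=30r/s` and `burst=60 nodelay` interact, here is a simplified leaky-bucket model in bash. It is a sketch of the general technique, not nginx's source: the function name `simulate_limit_req`, the millisecond clock, and the evenly spaced arrivals are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# simulate_limit_req RATE BURST COUNT INTERVAL_MS
# Simplified leaky bucket: "excess" is counted in thousandths of a request.
# It drains at RATE requests per second and grows by one request per arrival.
simulate_limit_req() {
  local rate=$1 burst=$2 count=$3 interval_ms=$4
  local excess=0 last=0 now=0 accepted=0 rejected=0 i candidate
  for ((i = 0; i < count; i++)); do
    now=$((i * interval_ms))
    if ((i == 0)); then
      candidate=0                       # first request from a new client
    else
      candidate=$((excess - rate * (now - last) + 1000))
      if ((candidate < 0)); then candidate=0; fi
    fi
    if ((candidate > burst * 1000)); then
      rejected=$((rejected + 1))        # over the burst allowance: rejected
    else
      excess=$candidate
      last=$now
      accepted=$((accepted + 1))
    fi
  done
  echo "accepted=$accepted rejected=$rejected"
}

simulate_limit_req 30 60 200 0     # 200 requests arriving at the same instant
simulate_limit_req 30 60 200 40    # one request every 40 ms (25 r/s)
```

In this model the instantaneous flood yields `accepted=61 rejected=139` (the first request plus the burst of 60), while the 25 r/s stream yields `accepted=200 rejected=0`. Real traffic is less regular, so treat the numbers as a guide to the shape of the behavior only.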
## Dev vhost — `heng.04080616.xyz` → dev PC
The dev tier (sub-project A) runs on a dev PC: `bash scripts/dev.sh up` → Next.js dashboard on `0.0.0.0:8010`. Routing aaPanel to it adds public reach (with `/cm-auth` gating) so you can hand someone a URL to test against without giving them VPN.
aaPanel vhost for `heng.04080616.xyz` (in addition to the C4 blocks above, plus C3 if you use it):
```nginx
location / {
    proxy_pass http://<dev-pc-lan-ip>:8010;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $host;
    proxy_read_timeout 60s;
}
```
`X-Forwarded-Host` and `X-Forwarded-Proto` are required so WebAuthn passkey enrollment uses the public hostname (`heng.04080616.xyz`) as the relying-party ID, not the LAN IP — passkeys enrolled at one rpID can't authenticate at another, so a misconfigured proxy will silently break passkey login.
Replace `<dev-pc-lan-ip>` with the dev PC's address on your LAN.
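The effect of the forwarded headers on the relying-party ID can be illustrated with a small sketch. This is not the dashboard's code: the function names `effective_origin` and `rp_id_from_origin`, the fallback order, and the LAN address `192.168.1.50` are hypothetical, chosen only to show why a missing `X-Forwarded-Host` leads to a different rpID.

```bash
#!/usr/bin/env bash
# effective_origin X_FORWARDED_PROTO X_FORWARDED_HOST HOST
# Prefer the forwarded values; fall back to plain http and the Host header.
effective_origin() {
  local proto=${1:-http} host=${2:-$3}
  printf '%s://%s\n' "$proto" "$host"
}

# rp_id_from_origin ORIGIN: hostname with scheme and port stripped.
# (Does not handle IPv6 literals.)
rp_id_from_origin() {
  local hostport=${1#*://}
  printf '%s\n' "${hostport%%:*}"
}

# With the headers from the vhost above:
rp_id_from_origin "$(effective_origin https heng.04080616.xyz 192.168.1.50:8010)"
# Without them (hypothetical misconfigured proxy):
rp_id_from_origin "$(effective_origin '' '' 192.168.1.50:8010)"
```

The first call prints `heng.04080616.xyz` and the second prints `192.168.1.50`. A passkey enrolled under one of those values will not authenticate under the other.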
⚠️ **Important: keep `CM_DEBUG=false` in the dev `.env` whenever aaPanel proxies the dev PC publicly.** Setting `CM_DEBUG=true` does two things:
1. The api-server (Flask) exposes the Werkzeug debugger — RCE if reachable.
2. The Next.js dashboard drops the `Secure` flag on the session cookie so phone-on-LAN HTTP testing works.
Both are dev-only conveniences. With aaPanel proxying through HTTPS, leave `CM_DEBUG=false` and use the in-app `/cm-auth` flow.
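A small pre-flight check can catch a stray `CM_DEBUG=true` before the dev PC is exposed. The sketch below is an assumption-laden helper, not part of the repo: the function name `debug_is_off` is hypothetical, and the set of values treated as "enabled" (`true`, `1`, `yes`) is a guess at how the app parses the flag.

```bash
#!/usr/bin/env bash
# debug_is_off ENV_FILE
# Returns 0 if no uncommented CM_DEBUG line enables debug, 1 if one does,
# and 2 if the file cannot be read.
debug_is_off() {
  local env_file=$1
  if [ ! -r "$env_file" ]; then
    echo "cannot read $env_file" >&2
    return 2
  fi
  # Assumed truthy spellings: true, 1, yes (case-insensitive, optional quotes).
  if grep -Eiq '^[[:space:]]*CM_DEBUG[[:space:]]*=[[:space:]]*"?(true|1|yes)"?[[:space:]]*$' "$env_file"; then
    return 1
  fi
  return 0
}
```

For example, `debug_is_off envs/dev/.env || echo "CM_DEBUG is enabled; do not proxy this stack publicly"` could run before starting the stack.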
## C7 — Host firewall on each web host
Restrict the LAN-reachable Next.js dashboard ports to only aaPanel's IP. Without this, anyone else on the LAN can hit the app directly and bypass everything in C4. Apply on each host that runs a stack: rex, siong, *and* the dev PC.
Replace `<aapanel-host-ip>` with the address of your aaPanel box.
On rex/siong hosts (ports 8011 / 8012):
```bash
sudo ufw allow from <aapanel-host-ip> to any port 8011 proto tcp comment 'rex web ← aaPanel only'
sudo ufw allow from <aapanel-host-ip> to any port 8012 proto tcp comment 'siong web ← aaPanel only'
sudo ufw deny 8011/tcp
sudo ufw deny 8012/tcp
sudo ufw reload
sudo ufw status numbered
```
On the dev PC (port 8010 — match `CM_WEB_HOST_PORT` from `envs/dev/.env`):
```bash
sudo ufw allow from <aapanel-host-ip> to any port 8010 proto tcp comment 'dev web ← aaPanel only'
sudo ufw allow from 127.0.0.1 to any port 8010 proto tcp comment 'dev web ← localhost'
sudo ufw deny 8010/tcp
sudo ufw reload
```
The localhost rule on the dev PC is so you can still load `http://localhost:8010` directly while iterating, without going through aaPanel.
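Verify from a third machine on the LAN. Do not skip this if the dashboard port is published by Docker: Docker manages its own iptables chains, and published ports can bypass ufw's INPUT rules, so confirm the ports really are filtered: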
```bash
nmap -p 8010,8011,8012 <web-host-ip>
# All three ports should show 'filtered' from anywhere except the aaPanel host
# (and except localhost on the dev PC).
```
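If `nmap` is not installed on the test machine, a rough equivalent can be sketched with bash's built-in `/dev/tcp` and GNU `timeout`. The function names and the 3-second timeout are illustrative assumptions; the mapping treats a timeout as "filtered" (packets silently dropped) and a fast failure as "closed" (typically connection refused).

```bash
#!/usr/bin/env bash
# classify_probe_exit CODE: map the probe's exit status to a port state.
classify_probe_exit() {
  case "$1" in
    0)   echo open ;;       # TCP connection succeeded
    124) echo filtered ;;   # timeout(1) gave up: no reply at all
    *)   echo closed ;;     # fast failure, typically connection refused
  esac
}

# probe_port HOST PORT: attempt a TCP connect and print the state.
probe_port() {
  local host=$1 port=$2 rc=0
  timeout 3 bash -c ": </dev/tcp/$host/$port" 2>/dev/null || rc=$?
  classify_probe_exit "$rc"
}
```

With the web host's address in a variable such as `WEB_HOST`, running `for p in 8010 8011 8012; do echo "$p $(probe_port "$WEB_HOST" "$p")"; done` from a non-aaPanel machine should report `filtered` for each port once the firewall rules are active.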
If you don't run ufw and prefer iptables directly, the equivalent rules are:
```bash
iptables -A INPUT -p tcp --dport 8011 -s <aapanel-host-ip> -j ACCEPT
iptables -A INPUT -p tcp --dport 8012 -s <aapanel-host-ip> -j ACCEPT
iptables -A INPUT -p tcp --dport 8010 -s <aapanel-host-ip> -j ACCEPT
iptables -A INPUT -p tcp --dport 8010 -s 127.0.0.1 -j ACCEPT
iptables -A INPUT -p tcp --dport 8011 -j DROP
iptables -A INPUT -p tcp --dport 8012 -j DROP
iptables -A INPUT -p tcp --dport 8010 -j DROP
```
(Persist via `iptables-save > /etc/iptables/rules.v4` or your distro's preferred mechanism.)
## Verification (after all blocks applied)
1. Hit any UI without a session: `curl -sI https://<rex-domain>/` → `307` redirect to `/cm-auth?next=/`. Same shape for siong and `https://heng.04080616.xyz/`. (If C3 basic auth is also configured, you get `401` first.)
2. After signing in via `/cm-auth`: subsequent requests return `200 OK`. Use the browser; curl alone won't carry the cookie unless you `-c`/`-b` it.
3. Scanner path: `curl -i https://<rex-domain>/.env` → connection closed (444 → curl shows "Empty reply from server"). The app logs show no entry for this request.
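4. Hammer-test rate limit: `seq 1 200 | xargs -P 20 -I{} curl -s -o /dev/null -w "%{http_code}\n" https://<rex-domain>/ | sort | uniq -c` → `307`s while the burst allowance lasts, then `429`s. The requests run in parallel because a sequential loop may stay under 30 r/s (each HTTPS request pays for a TLS handshake) and never trip the limit.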
5. From a non-aaPanel host on the LAN: `nmap -p 8010,8011,8012 <web-host-ip>` → all three ports `filtered` (localhost on dev PC still allowed).
6. **Dev-specific check.** On the dev PC, `bash scripts/dev.sh logs api-server | grep "Debugger PIN"` should return nothing once `CM_DEBUG=false`. Sign in via the browser at `https://heng.04080616.xyz/cm-auth` and confirm the dashboard renders.
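Checks 1 and 3 can be scripted with a thin wrapper around `curl`. This is a minimal sketch: the helper name `expect_status` and the `DOMAIN` variable in the usage note are hypothetical, and it relies on curl reporting `000` as the status code when the server closes the connection without a response (the 444 case).

```bash
#!/usr/bin/env bash
# expect_status URL CODE
# Fetch URL without following redirects and compare the HTTP status code.
expect_status() {
  local url=$1 want=$2 got
  # curl exits non-zero on an empty reply; keep the "000" it still prints.
  got=$(curl -s -o /dev/null -w '%{http_code}' "$url") || true
  if [ "$got" = "$want" ]; then
    echo "ok   $url -> $got"
  else
    echo "FAIL $url -> $got (wanted $want)"
    return 1
  fi
}
```

With `DOMAIN` set to one of the vhost names, `expect_status "https://$DOMAIN/" 307` covers the unauthenticated redirect and `expect_status "https://$DOMAIN/.env" 000` covers the scanner path. If C3 basic auth is enabled, expect `401` instead of `307`.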