cm_bot_v2/docs/superpowers/specs/2026-05-02-debug-mode-hotfix-design.md
yiekheng 97dbb79977 Add design spec for debug-mode hotfix (env-driven CM_DEBUG)
Documents the env-driven debug toggle that replaces the hardcoded
debug=True in cm_api.py and cm_web_view.py. Default off so the
Werkzeug debugger isn't reachable in rex/siong containers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 16:15:43 +08:00

# Debug-Mode Hotfix: Env-Driven `CM_DEBUG`
**Date:** 2026-05-02
**Status:** Approved (design)
**Scope:** Hotfix only. Larger security hardening (real WSGI server, reverse proxy, auth, scanner deflection) is tracked separately under the security-hardening sub-project.
## Problem
Both Flask entrypoints currently start with the Werkzeug debugger enabled:
- `app/cm_web_view.py:748`: `app.run(host='0.0.0.0', port=8000, debug=True)`
- `app/cm_api.py:160`: `def run(self, port=3000, debug=True)`, then `self.app.run(host='0.0.0.0', port=port, debug=debug)`
Container logs confirm the debugger is active in deployed containers (`* Debug mode: on`, `Debugger PIN: 702-685-302`). The Werkzeug debugger gives remote code execution to anyone who can reach the port and supply the PIN, and the same containers are receiving public-style scanner probes (`/.env`, `/.git/config`, `/.aws/config`, `/.htpasswd`). This is the highest-priority issue in the codebase right now.
The user wants to keep debug mode available locally (local = dev tier) while ensuring it is off in the rex and siong production deployments.
## Goal
Make debug mode opt-in via the `CM_DEBUG` environment variable. Default off. No other behavior changes.
## Non-Goals
- Switching from `app.run` to a production WSGI server (gunicorn/uvicorn). Belongs to security hardening.
- Adding a reverse proxy, TLS, auth, or rate limiting.
- Changing `app/cm_bot_hal.py` hardcoded credentials.
- Touching `cm_telegram.py` or `cm_transfer_credit.py` — neither runs a Flask server.
- Adding `robots.txt` or scanner deflection.
## Design
### `_debug_enabled()` helper
Both Flask modules add the same small helper. Defined locally in each file (no new shared module — only two call sites, and `app/__init__.py` is currently a near-empty package marker).
```python
def _debug_enabled() -> bool:
    # True only for 1/true/yes (case-insensitive, trimmed); unset means off.
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")
```
Accepts `1`, `true`, `yes` (case-insensitive, whitespace-trimmed) as truthy. Anything else, including unset, is false. This matches the lenient parsing pattern already used for env-driven config in the recent refactor (commit `45303d0`).
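To make the parsing rule concrete, here is a minimal sketch that runs the helper against a handful of values. The `check` function is a hypothetical test harness introduced only for this illustration; it is not part of the planned change.

```python
import os


def _debug_enabled() -> bool:
    # Same helper as in the design above.
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")


def check(value):
    """Hypothetical harness: evaluate _debug_enabled() with CM_DEBUG set to
    `value` (None means unset), then restore the previous environment."""
    saved = os.environ.pop("CM_DEBUG", None)
    try:
        if value is not None:
            os.environ["CM_DEBUG"] = value
        return _debug_enabled()
    finally:
        os.environ.pop("CM_DEBUG", None)
        if saved is not None:
            os.environ["CM_DEBUG"] = saved


# Unset, empty, and unrecognised values are all treated as "off".
for raw in (None, "", "0", "false", "on", "1", " TRUE ", "Yes"):
    print(repr(raw), "->", check(raw))
```

Note that values such as `on` are deliberately outside the accepted set, so a typo or an unexpected spelling leaves debug mode disabled.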
### `app/cm_web_view.py`
Replace the bottom `__main__` block:
```python
if __name__ == '__main__':
    print("Starting CM Web View...")
    print("Web interface will be available at: http://localhost:8000")
    print("Make sure the API server is running on port 3000")
    app.run(host='0.0.0.0', port=8000, debug=_debug_enabled())
```
`os` is already imported at the top of the file (line 10) — no new import needed.
### `app/cm_api.py`
Three changes:
0. Add `import os` at the top of the file (currently absent — only `threading`, Flask, and `.db` are imported).
1. Change the `run` signature default so callers can still force-override, but unspecified means "read the env":
```python
def run(self, port=3000, debug=None):
    # None means "not specified": fall back to the CM_DEBUG env variable.
    if debug is None:
        debug = _debug_enabled()
    ...
    self.app.run(host='0.0.0.0', port=port, debug=debug)
```
2. Leave `run_in_thread(self, port=3000, debug=False)` alone. It is only used internally and its `debug=False` default is already safe; passing `debug=None` would break that contract.
The `__main__` block stays as `api.run(port=3000)` — by passing nothing it now picks up the env-driven default.
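The `None` sentinel behaviour can be sketched without Flask. `FakeAPI` below is a hypothetical stand-in for the real API class; it only records the resolved flag instead of starting a server, so it illustrates the resolution order and nothing else.

```python
import os


def _debug_enabled() -> bool:
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")


class FakeAPI:
    """Hypothetical stand-in for the API class (no Flask involved)."""

    def __init__(self):
        self.started_with = None

    def run(self, port=3000, debug=None):
        # An explicit True/False from the caller wins; None defers to the env.
        if debug is None:
            debug = _debug_enabled()
        # The real method would call self.app.run(...) here.
        self.started_with = (port, debug)
        return debug


os.environ["CM_DEBUG"] = "true"
print(FakeAPI().run())             # env-driven
print(FakeAPI().run(debug=False))  # explicit override beats the env
os.environ.pop("CM_DEBUG")
print(FakeAPI().run())             # unset falls back to off
```

Using `None` rather than `False` as the default is what lets the method tell "caller asked for off" apart from "caller did not say".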
### `docker-compose.yml`
Add `CM_DEBUG: ${CM_DEBUG:-false}` to the `environment:` blocks of `api-server` and `web-view` (the only Flask services). The `${CM_DEBUG:-false}` form ensures the variable is *always* defined inside the container, even if the operator forgot to set it in their `.env`. Telegram and transfer services do not need it.
`docker-compose.override.yml` does not need changes — it inherits `environment:` from the base file.
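For readers unfamiliar with the `${CM_DEBUG:-false}` form, the following sketch mimics its substitution rule in Python: the default applies when the variable is unset or empty. `compose_default` is a hypothetical name used only for illustration; Compose performs this substitution itself.

```python
def compose_default(env: dict, name: str, default: str) -> str:
    """Mimic ${NAME:-default}: use `default` when NAME is unset or empty."""
    value = env.get(name)
    return value if value else default


# Operator forgot to set CM_DEBUG: the container still receives "false".
print(compose_default({}, "CM_DEBUG", "false"))
# An empty assignment (CM_DEBUG=) is treated the same as unset.
print(compose_default({"CM_DEBUG": ""}, "CM_DEBUG", "false"))
# A local-dev .env that opts in is passed through unchanged.
print(compose_default({"CM_DEBUG": "true"}, "CM_DEBUG", "false"))
```

Combined with the helper's own default, this gives two independent layers that both resolve to "off" when nothing is configured.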
### `.env.example`
Add a new section near the top:
```
# === Runtime ===
# Set to true ONLY in local dev. Werkzeug debugger = RCE if exposed.
CM_DEBUG=false
```
### `envs/rex/.env` and `envs/siong/.env`
These files are intentionally not in git (the directories are committed empty). The operator's existing prod env files do not set `CM_DEBUG`, so the default (`false`) applies automatically. No edit needed; the `AGENTS.md` update below documents the convention for any new deployment.
### Documentation
- `AGENTS.md` — add a one-line entry under "Build, Test, and Development Commands" or "Security & Configuration Tips" noting `CM_DEBUG=true` is the local-dev override and **must** stay unset in published env files.
## Files Changed
| File | Change |
|---|---|
| `app/cm_web_view.py` | Add `_debug_enabled()` helper; pass it to `app.run(debug=...)`. |
| `app/cm_api.py` | Add `import os`; add `_debug_enabled()` helper; change `run()` default to `debug=None` and resolve from env when `None`. |
| `docker-compose.yml` | Add `CM_DEBUG: ${CM_DEBUG:-false}` to `api-server` and `web-view` `environment:` blocks. |
| `.env.example` | New `Runtime` section documenting `CM_DEBUG`. |
| `AGENTS.md` | One-line note about `CM_DEBUG`. |
No new dependencies. No version bumps.
## Verification
1. **Local, debug on.** Set `CM_DEBUG=true` in repo-root `.env`, run `bash scripts/local_build.sh`. Web-view log shows `* Debug mode: on` and a `Debugger PIN: ...` line. API log shows the same.
2. **Local, debug off.** Set `CM_DEBUG=false` (or remove the line). Rebuild. Logs show `* Debug mode: off` and **no PIN line**. Hitting `/api/acc/` and `/api/user/` still returns 200 with valid JSON.
3. **Prod parity check.** With `CM_DEBUG` unset in the deploy env (matches rex/siong today), confirm container logs show debug off. Confirm the existing `192.168.0.210` scanner probes for `/.env` and `/.git/config` still 404 with no traceback or debugger response.
4. **Override path.** From a Python REPL inside the api container, calling `CM_API().run(port=3001, debug=True)` still honors the explicit override (regression check on the `debug=None` sentinel).
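The log checks in steps 1 to 3 could be automated with a small helper like the one below. This is a sketch under assumptions: `debugger_active` is a hypothetical function, and it expects the container log text to have been captured already by whatever means the operator prefers. The two markers it looks for are the ones quoted in the Problem section.

```python
import re

# Matches the PIN banner format seen in the container logs, e.g. 702-685-302.
PIN_LINE = re.compile(r"Debugger PIN: \d{3}-\d{3}-\d{3}")


def debugger_active(log_text: str) -> bool:
    """Return True if the log shows either debugger marker."""
    return "* Debug mode: on" in log_text or bool(PIN_LINE.search(log_text))


bad_log = " * Debug mode: on\n * Debugger PIN: 702-685-302\n"
good_log = " * Debug mode: off\n"
print(debugger_active(bad_log))   # debugger markers present
print(debugger_active(good_log))  # no markers
```

A check like this only confirms what the startup banner reports; it does not replace step 4, which exercises the explicit override path.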
## Risk
Minimal. The Werkzeug `debug=False` path is the framework default and is the mode production Flask deployments are expected to run in. The only user-visible behavior loss is the in-browser traceback page and auto-reloader, both of which should never have been on in containers in the first place.
The one edge case worth naming: the existing `cm_api.py:run()` signature lets a caller pass `debug=False` explicitly and still get debug-off behavior; changing the default to `None` preserves that. Nothing in the repo calls `run()` with a positional `debug` argument (verified via grep before implementation), so the signature change is safe.
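The spec relies on grep for this check. As an alternative illustration (not the method actually used), the same question can be asked of the syntax tree. `positional_debug_calls` is a hypothetical helper, and it is a heuristic: it flags any `<obj>.run(...)` call with two or more positional arguments, regardless of what `<obj>` is.

```python
import ast


def positional_debug_calls(source: str) -> list:
    """Return line numbers of `<obj>.run(a, b, ...)` calls, i.e. calls where
    a second positional argument would land in the `debug` slot."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "run"
            and len(node.args) >= 2
        ):
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    "api.run(3000, True)\n"             # positional debug: flagged
    "api.run(port=3000)\n"              # keyword only: fine
    "api.run(3000)\n"                   # port only: fine
    "api.run(port=3000, debug=True)\n"  # keyword debug: fine
)
print(positional_debug_calls(sample))
```

Because it matches on the method name alone, unrelated calls such as other objects' `run` methods may also be reported and would need a manual look.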
## Out-of-Scope Follow-Ups (for the security-hardening spec)
Captured here so they aren't forgotten:
- Replace `app.run` with gunicorn (or waitress) in both `cm_api` and `cm_web_view` Dockerfiles.
- Put a reverse proxy (Caddy/Traefik/nginx) in front of `web-view` with TLS, basic auth or token auth, and rate limiting.
- Add `robots.txt` returning `Disallow: /` and a 410/444 default for unknown paths to deflect noisy scanners.
- Audit `app/cm_bot_hal.py` hardcoded credentials/PIN — already flagged in `AGENTS.md` "Security & Configuration Tips".
- Confirm whether `192.168.0.210` is a NAT hop for public traffic (router/firewall question) and decide whether the host port should be bound only to a private interface.