yiekheng 97dbb79977 Add design spec for debug-mode hotfix (env-driven CM_DEBUG)

Documents the env-driven debug toggle that replaces the hardcoded
debug=True in cm_api.py and cm_web_view.py. Default off so the
Werkzeug debugger isn't reachable in rex/siong containers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 16:15:43 +08:00

7.2 KiB


Debug-Mode Hotfix: Env-Driven `CM_DEBUG`

Date: 2026-05-02

Status: Approved (design)

Scope: Hotfix only. Larger security hardening (real WSGI server, reverse proxy, auth, scanner deflection) is tracked separately under the security-hardening sub-project.

Problem

Both Flask entrypoints currently start with the Werkzeug debugger enabled:

app/cm_web_view.py:748 — app.run(host='0.0.0.0', port=8000, debug=True)
app/cm_api.py:160 — def run(self, port=3000, debug=True), then self.app.run(host='0.0.0.0', port=port, debug=debug)

Container logs confirm the debugger is active in deployed containers (* Debug mode: on, Debugger PIN: 702-685-302). The Werkzeug debugger gives remote code execution to anyone who can reach the port and supply the PIN, and the same containers are receiving public-style scanner probes (/.env, /.git/config, /.aws/config, /.htpasswd). This is the highest-priority issue in the codebase right now.

The user wants to keep debug mode available locally (local = dev tier) while ensuring it is off in the rex and siong production deployments.

Goal

Make debug mode opt-in via the CM_DEBUG environment variable. Default off. No other behavior changes.

Non-Goals

Switching from app.run to a production WSGI server (gunicorn/uvicorn). Belongs to security hardening.
Adding a reverse proxy, TLS, auth, or rate limiting.
Changing app/cm_bot_hal.py hardcoded credentials.
Touching cm_telegram.py or cm_transfer_credit.py — neither runs a Flask server.
Adding robots.txt or scanner deflection.

Design

`_debug_enabled()` helper

Both Flask modules add the same small helper, defined locally in each file. There is no new shared module: there are only two call sites, and app/__init__.py is currently a near-empty package marker.

```python
import os  # already imported in cm_web_view.py; new in cm_api.py

def _debug_enabled() -> bool:
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")
```

Accepts 1, true, yes (case-insensitive, whitespace-trimmed) as truthy. Anything else, including unset, is false. This matches the lenient parsing pattern already used for env-driven config in the recent refactor (commit 45303d0).
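To sanity-check the parsing rules, a small standalone script can run the helper against representative values. The sketch below re-declares `_debug_enabled()` exactly as above so it runs on its own. `check_values` is a hypothetical name used only for this illustration, and it restores any pre-existing `CM_DEBUG` value when it finishes.

```python
import os


def _debug_enabled() -> bool:
    # Same helper as in the spec: lenient, case-insensitive, whitespace-trimmed.
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")


def check_values(values):
    """Return {raw_value: parsed_bool} for each value; None stands for 'unset'."""
    results = {}
    saved = os.environ.get("CM_DEBUG")
    try:
        for raw in values:
            if raw is None:
                os.environ.pop("CM_DEBUG", None)
            else:
                os.environ["CM_DEBUG"] = raw
            results[raw] = _debug_enabled()
    finally:
        # Put back whatever the caller had, so the check leaves no side effects.
        if saved is None:
            os.environ.pop("CM_DEBUG", None)
        else:
            os.environ["CM_DEBUG"] = saved
    return results


if __name__ == "__main__":
    for raw, parsed in check_values([None, "", "1", "TRUE", " yes ", "on", "0", "false"]).items():
        print(f"{raw!r:>8} -> {parsed}")
```

Values such as `on` or `enabled` are not in the accepted set, so they resolve to false. That is the safe direction for a flag whose "on" state exposes the debugger.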

`app/cm_web_view.py`

Replace the bottom __main__ block:

```python
if __name__ == '__main__':
    print("Starting CM Web View...")
    print("Web interface will be available at: http://localhost:8000")
    print("Make sure the API server is running on port 3000")
    app.run(host='0.0.0.0', port=8000, debug=_debug_enabled())
```

os is already imported at the top of the file (line 10) — no new import needed.

`app/cm_api.py`

Three changes:

Add import os at the top of the file. It is currently absent: only threading, Flask, and .db are imported.

Add the _debug_enabled() helper described above.

Change the run signature default so callers can still force an override, but an unspecified value means "read the env":

```python
def run(self, port=3000, debug=None):
    # None means the caller did not choose: fall back to CM_DEBUG (default off).
    if debug is None:
        debug = _debug_enabled()
    ...
    self.app.run(host='0.0.0.0', port=port, debug=debug)
```

Leave run_in_thread(self, port=3000, debug=False) alone. It is only used internally, and its debug=False default is already safe. Switching it to debug=None would make the threaded path depend on CM_DEBUG and break that contract.

The __main__ block stays as api.run(port=3000) — by passing nothing it now picks up the env-driven default.
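The sentinel logic in `run()` can be isolated as a pure function, which makes the precedence easy to test without starting Flask. This is a sketch only: `resolve_debug` is a hypothetical name, not something the spec adds. In the actual change the same check lives inline at the top of `run()`.

```python
import os
from typing import Optional


def _debug_enabled() -> bool:
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")


def resolve_debug(debug: Optional[bool] = None) -> bool:
    """Hypothetical helper mirroring the debug=None sentinel in CM_API.run()."""
    if debug is None:
        # Caller expressed no preference: defer to the environment (default off).
        return _debug_enabled()
    # An explicit True/False from the caller always wins over CM_DEBUG.
    return bool(debug)
```

With `CM_DEBUG=true` exported, `resolve_debug()` returns `True`, but `resolve_debug(False)` still returns `False`. The override-path verification step checks this same precedence.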

`docker-compose.yml`

Add CM_DEBUG: ${CM_DEBUG:-false} to the environment: blocks of api-server and web-view (the only Flask services). The ${CM_DEBUG:-false} form ensures the variable is always defined inside the container, even if the operator forgot to set it in their .env. Telegram and transfer services do not need it.

docker-compose.override.yml does not need changes — it inherits environment: from the base file.
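For readers less familiar with Compose interpolation, `${CM_DEBUG:-false}` substitutes `false` when `CM_DEBUG` is unset *or* set to an empty string. The `${CM_DEBUG-false}` form would cover only the unset case. The Python sketch below mimics both rules. `compose_default` and `compose_default_unset_only` are hypothetical names for illustration, not part of Compose or this repo.

```python
from typing import Mapping


def compose_default(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic Compose's ${name:-default}: fall back when name is unset OR empty."""
    value = env.get(name)
    # Both None (unset) and "" (set but empty) are falsy, so both take the default.
    return value if value else default


def compose_default_unset_only(env: Mapping[str, str], name: str, default: str) -> str:
    """Mimic Compose's ${name-default}: fall back only when name is unset."""
    return env[name] if name in env else default
```

Either way the container sees a concrete value. Because `_debug_enabled()` treats anything other than `1`/`true`/`yes` as off, a missing or blank operator setting ends up as debug off.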

`.env.example`

Add a new section near the top:

```
# === Runtime ===
# Set to true ONLY in local dev. Werkzeug debugger = RCE if exposed.
CM_DEBUG=false
```

`envs/rex/.env` and `envs/siong/.env`

These files are intentionally not in git (the directories are committed empty). The operator's existing prod env files do not set CM_DEBUG, so the default (false) applies automatically. No edit is needed; the AGENTS.md update below documents the convention for any new deployment.

Documentation

AGENTS.md: add a one-line entry under "Build, Test, and Development Commands" or "Security & Configuration Tips" noting that CM_DEBUG=true is the local-dev override and must stay unset (or false) in published env files.

Files Changed

| File | Change |
| --- | --- |
| `app/cm_web_view.py` | Add `_debug_enabled()` helper; pass it to `app.run(debug=...)`. |
| `app/cm_api.py` | Add `import os`; add `_debug_enabled()` helper; change `run()` default to `debug=None` and resolve from env when `None`. |
| `docker-compose.yml` | Add `CM_DEBUG: ${CM_DEBUG:-false}` to `api-server` and `web-view` `environment:` blocks. |
| `.env.example` | New `Runtime` section documenting `CM_DEBUG`. |
| `AGENTS.md` | One-line note about `CM_DEBUG`. |

No new dependencies. No version bumps.

Verification

Local, debug on. Set CM_DEBUG=true in the repo-root .env and run bash scripts/local_build.sh. The web-view log shows * Debug mode: on and a Debugger PIN: ... line; the API log shows the same.
Local, debug off. Set CM_DEBUG=false (or remove the line). Rebuild. Logs show * Debug mode: off and no PIN line. Hitting /api/acc/ and /api/user/ still returns 200 with valid JSON.
Prod parity check. With CM_DEBUG unset in the deploy env (matching rex/siong today), confirm that the container logs show debug off. Confirm that the scanner probes already arriving from 192.168.0.210 for /.env and /.git/config still return 404 with no traceback or debugger response.
Override path. From a Python REPL inside the api container, calling CM_API().run(port=3001, debug=True) still honors the explicit override (regression check on the debug=None sentinel).
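The log checks in the first three steps can be scripted. This is a minimal sketch that assumes the container log text has already been captured, for example with `docker compose logs api-server`. `debugger_active` and `check_log` are hypothetical names. The two marker strings are the log lines quoted in the steps above.

```python
# Startup lines quoted in the verification steps: the debug banner and the PIN line.
DEBUG_ON_MARKERS = ("Debug mode: on", "Debugger PIN:")


def debugger_active(log_text: str) -> bool:
    """Return True if the captured startup log shows the Werkzeug debugger enabled."""
    return any(marker in log_text for marker in DEBUG_ON_MARKERS)


def check_log(log_text: str) -> str:
    """Summarize the check as a one-line verdict suitable for a CI or smoke-test log."""
    return "FAIL: debugger enabled" if debugger_active(log_text) else "OK: debug mode off"
```

A plain substring check is deliberately loose. If either marker appears anywhere in the output, the container should be treated as exposed.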

Risk

Minimal. Debug off is Flask's default for app.run and the expected setting for any production deployment. The only user-visible behavior loss is the in-browser traceback page and the auto-reloader, neither of which should ever have been enabled in containers.

The one edge case worth naming: the existing cm_api.py:run() signature lets a caller pass debug=False explicitly and still get debug-off behavior; changing the default to None preserves that. Nothing in the repo calls run() with a positional debug argument (verified via grep before implementation), so the signature change is safe.

Out-of-Scope Follow-Ups (for the security-hardening spec)

Captured here so they aren't forgotten:

Replace app.run with gunicorn (or waitress) in both cm_api and cm_web_view Dockerfiles.
Put a reverse proxy (Caddy/Traefik/nginx) in front of web-view with TLS, basic auth or token auth, and rate limiting.
Add robots.txt returning Disallow: / and a 410/444 default for unknown paths to deflect noisy scanners.
Audit app/cm_bot_hal.py hardcoded credentials/PIN — already flagged in AGENTS.md "Security & Configuration Tips".
Confirm whether 192.168.0.210 is a NAT hop for public traffic (router/firewall question) and decide whether the host port should be bound only to a private interface.

