cm_bot_v2/docs/superpowers/specs/2026-05-02-debug-mode-hotfix-design.md
yiekheng 97dbb79977 Add design spec for debug-mode hotfix (env-driven CM_DEBUG)
Documents the env-driven debug toggle that replaces the hardcoded
debug=True in cm_api.py and cm_web_view.py. Default off so the
Werkzeug debugger isn't reachable in rex/siong containers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 16:15:43 +08:00

Debug-Mode Hotfix: Env-Driven CM_DEBUG

Date: 2026-05-02
Status: Approved (design)
Scope: Hotfix only. Larger security hardening (real WSGI server, reverse proxy, auth, scanner deflection) is tracked separately under the security-hardening sub-project.

Problem

Both Flask entrypoints currently start with the Werkzeug debugger enabled:

  • app/cm_web_view.py:748: app.run(host='0.0.0.0', port=8000, debug=True)
  • app/cm_api.py:160: def run(self, port=3000, debug=True), then self.app.run(host='0.0.0.0', port=port, debug=debug)

Container logs confirm the debugger is active in deployed containers (* Debug mode: on, Debugger PIN: 702-685-302). The Werkzeug debugger gives remote code execution to anyone who can reach the port and supply the PIN, and the same containers are receiving probes typical of public internet scanners (/.env, /.git/config, /.aws/config, /.htpasswd). This is the highest-priority issue in the codebase right now.

The user wants to keep debug mode available locally (local = dev tier) while ensuring it is off in the rex and siong production deployments.

Goal

Make debug mode opt-in via the CM_DEBUG environment variable. Default off. No other behavior changes.

Non-Goals

  • Switching from app.run to a production WSGI server (gunicorn/uvicorn). Belongs to security hardening.
  • Adding a reverse proxy, TLS, auth, or rate limiting.
  • Changing app/cm_bot_hal.py hardcoded credentials.
  • Touching cm_telegram.py or cm_transfer_credit.py — neither runs a Flask server.
  • Adding robots.txt or scanner deflection.

Design

_debug_enabled() helper

Both Flask modules add the same small helper. Defined locally in each file (no new shared module — only two call sites, and app/__init__.py is currently a near-empty package marker).

def _debug_enabled() -> bool:
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")

Accepts 1, true, yes (case-insensitive, whitespace-trimmed) as truthy. Anything else, including unset, is false. This matches the lenient parsing pattern already used for env-driven config in the recent refactor (commit 45303d0).
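
As a quick illustration of the parsing rules just described, the sketch below evaluates the same expression against a handful of values. The debug_for wrapper is a hypothetical test aid, not part of the planned change; it only sets and restores CM_DEBUG around each call.

```python
import os


def _debug_enabled() -> bool:
    # Same expression as the helper above: unset falls back to "false".
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")


def debug_for(value):
    """Evaluate the helper with CM_DEBUG set to `value` (None means unset)."""
    saved = os.environ.pop("CM_DEBUG", None)
    try:
        if value is not None:
            os.environ["CM_DEBUG"] = value
        return _debug_enabled()
    finally:
        # Restore whatever the process had before the check.
        os.environ.pop("CM_DEBUG", None)
        if saved is not None:
            os.environ["CM_DEBUG"] = saved


for raw in (None, "", "1", " TRUE ", "Yes", "on", "0", "false"):
    print(repr(raw), "->", debug_for(raw))
```

Note that values such as "on" are treated as false under this rule, so an operator who writes CM_DEBUG=on gets debug off, which is the safe direction for a typo.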

app/cm_web_view.py

Replace the bottom __main__ block:

if __name__ == '__main__':
    print("Starting CM Web View...")
    print("Web interface will be available at: http://localhost:8000")
    print("Make sure the API server is running on port 3000")
    app.run(host='0.0.0.0', port=8000, debug=_debug_enabled())

os is already imported at the top of the file (line 10) — no new import needed.

app/cm_api.py

Three changes:

  1. Add import os at the top of the file (currently absent — only threading, Flask, and .db are imported).

  2. Change the run signature default so callers can still force-override, but unspecified means "read the env":

    def run(self, port=3000, debug=None):
        if debug is None:
            debug = _debug_enabled()
        ...
        self.app.run(host='0.0.0.0', port=port, debug=debug)
    
  3. Leave run_in_thread(self, port=3000, debug=False) alone. It is only used internally and its debug=False default is already safe; passing debug=None would break that contract.

The __main__ block stays as api.run(port=3000) — by passing nothing it now picks up the env-driven default.
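
The None-sentinel behavior can be checked in isolation. The following is a minimal sketch, assuming a stand-in class (ApiSketch and resolve_debug are hypothetical names) that only resolves the flag and never starts a Flask server.

```python
import os


def _debug_enabled() -> bool:
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")


class ApiSketch:
    """Hypothetical stand-in for the class in cm_api.py; it never starts Flask."""

    def resolve_debug(self, debug=None):
        # None means "caller did not choose", so fall back to the environment.
        if debug is None:
            debug = _debug_enabled()
        return debug


os.environ["CM_DEBUG"] = "true"
api = ApiSketch()
print(api.resolve_debug())             # env decides
print(api.resolve_debug(debug=False))  # explicit False wins over the env
os.environ.pop("CM_DEBUG")
print(api.resolve_debug())             # unset env means off
print(api.resolve_debug(debug=True))   # explicit True wins over the env
```

The check is written as `debug is None` rather than `not debug` so that an explicit debug=False is never replaced by a truthy environment value.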

docker-compose.yml

Add CM_DEBUG: ${CM_DEBUG:-false} to the environment: blocks of api-server and web-view (the only Flask services). The ${CM_DEBUG:-false} form ensures the variable is always defined inside the container, even if the operator forgot to set it in their .env. Telegram and transfer services do not need it.

docker-compose.override.yml does not need changes — it inherits environment: from the base file.
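
To make the ${CM_DEBUG:-false} form concrete, here is a small Python imitation of how that interpolation resolves. It is only an illustration of the documented Compose rule (the default applies when the variable is unset or empty); compose_default is a hypothetical helper and not Compose code.

```python
def compose_default(host_env: dict, name: str, default: str) -> str:
    """Imitate ${NAME:-default}: use the default when NAME is unset or empty."""
    value = host_env.get(name)
    return value if value else default


# What the container would receive as CM_DEBUG for three operator setups.
print(compose_default({}, "CM_DEBUG", "false"))                    # variable missing
print(compose_default({"CM_DEBUG": ""}, "CM_DEBUG", "false"))      # present but empty
print(compose_default({"CM_DEBUG": "true"}, "CM_DEBUG", "false"))  # explicit opt-in
```

In the first two cases the container sees CM_DEBUG=false, so the in-app default and the Compose default agree.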

.env.example

Add a new section near the top:

# === Runtime ===
# Set to true ONLY in local dev. Werkzeug debugger = RCE if exposed.
CM_DEBUG=false

envs/rex/.env and envs/siong/.env

These files are intentionally not in git (the directories are committed empty). The operator's existing prod env files do not set CM_DEBUG, so the default (false) applies automatically. No edit needed; the AGENTS.md update below documents the convention for any new deployment.

Documentation

  • AGENTS.md — add a one-line entry under "Build, Test, and Development Commands" or "Security & Configuration Tips" noting CM_DEBUG=true is the local-dev override and must stay unset in published env files.

Files Changed

| File | Change |
| --- | --- |
| app/cm_web_view.py | Add _debug_enabled() helper; pass it to app.run(debug=...). |
| app/cm_api.py | Add import os; add _debug_enabled() helper; change run() default to debug=None and resolve from env when None. |
| docker-compose.yml | Add CM_DEBUG: ${CM_DEBUG:-false} to api-server and web-view environment: blocks. |
| .env.example | New Runtime section documenting CM_DEBUG. |
| AGENTS.md | One-line note about CM_DEBUG. |

No new dependencies. No version bumps.

Verification

  1. Local, debug on. Set CM_DEBUG=true in repo-root .env, run bash scripts/local_build.sh. Web-view log shows * Debug mode: on and a Debugger PIN: ... line. API log shows the same.
  2. Local, debug off. Set CM_DEBUG=false (or remove the line). Rebuild. Logs show * Debug mode: off and no PIN line. Hitting /api/acc/ and /api/user/ still returns 200 with valid JSON.
  3. Prod parity check. With CM_DEBUG unset in the deploy env (matches rex/siong today), confirm container logs show debug off. Confirm the existing 192.168.0.210 scanner probes for /.env and /.git/config still 404 with no traceback or debugger response.
  4. Override path. From a Python REPL inside the api container, calling CM_API().run(port=3001, debug=True) still honors the explicit override (regression check on the debug=None sentinel).
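
The log checks in steps 1 to 3 could be scripted rather than eyeballed. The sketch below assumes the container log has already been captured as text by some other means; debugger_active is a hypothetical helper, and the marker strings are the ones quoted from the logs earlier in this document.

```python
def debugger_active(log_text: str) -> bool:
    """Return True if the log shows either sign of an enabled debugger."""
    markers = ("Debug mode: on", "Debugger PIN:")
    return any(marker in log_text for marker in markers)


# Sample excerpts modeled on the log lines quoted in the Problem section.
debug_on_log = " * Debug mode: on\n * Debugger PIN: 702-685-302\n"
debug_off_log = " * Debug mode: off\n"

print(debugger_active(debug_on_log))
print(debugger_active(debug_off_log))
```

Checking for the PIN line as well as the mode line guards against a log excerpt that was truncated before the mode banner.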

Risk

Minimal. debug=False is the framework default and the expected setting for any production Flask deployment. The only user-visible behavior loss is the in-browser traceback page and auto-reloader, both of which should never have been on in containers in the first place.

The one edge case worth naming: the existing cm_api.py:run() signature lets a caller pass debug=False explicitly and still get debug-off behavior; changing the default to None preserves that. Nothing in the repo calls run() with a positional debug argument (verified via grep before implementation), so the signature change is safe.
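
The grep check mentioned above could also be done syntactically. This is a rough sketch, not the check that was actually run: positional_debug_calls is a hypothetical helper that flags any attribute call named run with two or more positional arguments, so it would also match unrelated calls such as subprocess.run and needs manual review.

```python
import ast


def positional_debug_calls(source: str):
    """Return line numbers of x.run(a, b, ...) calls with two or more positional args."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "run"
            and len(node.args) >= 2
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Illustrative input only; these lines are not taken from the repo.
sample = (
    "api.run(port=3000)\n"
    "api.run(3000, True)\n"
    "api.run(3000, debug=False)\n"
)
print(positional_debug_calls(sample))
```

Only the second sample line is reported, because it is the only one that passes debug positionally.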

Out-of-Scope Follow-Ups (for the security-hardening spec)

Captured here so they aren't forgotten:

  • Replace app.run with gunicorn (or waitress) in both cm_api and cm_web_view Dockerfiles.
  • Put a reverse proxy (Caddy/Traefik/nginx) in front of web-view with TLS, basic auth or token auth, and rate limiting.
  • Add robots.txt returning Disallow: / and a 410/444 default for unknown paths to deflect noisy scanners.
  • Audit app/cm_bot_hal.py hardcoded credentials/PIN — already flagged in AGENTS.md "Security & Configuration Tips".
  • Confirm whether 192.168.0.210 is a NAT hop for public traffic (router/firewall question) and decide whether the host port should be bound only to a private interface.