cm_bot_v2/docs/superpowers/specs/2026-05-02-debug-mode-hotfix-design.md

# Debug-Mode Hotfix: Env-Driven `CM_DEBUG`

**Date:** 2026-05-02
**Status:** Approved (design)
**Scope:** Hotfix only. Larger security hardening (real WSGI server, reverse proxy, auth, scanner deflection) is tracked separately under the security-hardening sub-project.

## Problem

Both Flask entrypoints currently start with the Werkzeug debugger enabled:

- `app/cm_web_view.py:748` — `app.run(host='0.0.0.0', port=8000, debug=True)`
- `app/cm_api.py:160` — `def run(self, port=3000, debug=True)`, then `self.app.run(host='0.0.0.0', port=port, debug=debug)`

Container logs confirm the debugger is active in deployed containers (`* Debug mode: on`, `Debugger PIN: 702-685-302`). The Werkzeug debugger gives remote code execution to anyone who can reach the port and supply the PIN, and the same containers are receiving public-style scanner probes (`/.env`, `/.git/config`, `/.aws/config`, `/.htpasswd`). This is the highest-priority issue in the codebase right now.

The user wants to keep debug mode available locally (local = dev tier) while ensuring it is off in the rex and siong production deployments.

## Goal

Make debug mode opt-in via the `CM_DEBUG` environment variable. Default off. No other behavior changes.

## Non-Goals

- Switching from `app.run` to a production WSGI server (gunicorn/uvicorn). Belongs to security hardening.
- Adding a reverse proxy, TLS, auth, or rate limiting.
- Changing `app/cm_bot_hal.py` hardcoded credentials.
- Touching `cm_telegram.py` or `cm_transfer_credit.py` — neither runs a Flask server.
- Adding `robots.txt` or scanner deflection.

## Design

### `_debug_enabled()` helper

Both Flask modules add the same small helper. It is defined locally in each file rather than in a new shared module: there are only two call sites, and `app/__init__.py` is currently a near-empty package marker.

```python
def _debug_enabled() -> bool:
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")
```

Accepts `1`, `true`, `yes` (case-insensitive, whitespace-trimmed) as truthy. Anything else, including unset, is false. This matches the lenient parsing pattern already used for env-driven config in the recent refactor (commit `45303d0`).
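
As a quick illustration of the parsing rule, the sketch below exercises the helper against a handful of values. The `check` function is a hypothetical test aid, not part of the planned change; it only sets or unsets `CM_DEBUG` before calling the helper.

```python
import os


def _debug_enabled() -> bool:
    # Same helper as above: only "1", "true", "yes" count as truthy.
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")


def check(value):
    """Hypothetical test aid: set CM_DEBUG (or unset it when value is None), then evaluate."""
    if value is None:
        os.environ.pop("CM_DEBUG", None)
    else:
        os.environ["CM_DEBUG"] = value
    return _debug_enabled()


if __name__ == "__main__":
    for raw in (None, "", "false", "0", "no", "on", "1", "true", " TRUE ", "Yes"):
        print(repr(raw), "->", check(raw))
```

Note that values such as `on` or an empty string fall through to false, so a typo in an env file fails closed rather than enabling the debugger.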

### `app/cm_web_view.py`

Replace the bottom `__main__` block:

```python
if __name__ == '__main__':
    print("Starting CM Web View...")
    print("Web interface will be available at: http://localhost:8000")
    print("Make sure the API server is running on port 3000")
    app.run(host='0.0.0.0', port=8000, debug=_debug_enabled())
```

`os` is already imported at the top of the file (line 10) — no new import needed.

### `app/cm_api.py`

Two changes, plus one deliberate non-change:

1. Add `import os` at the top of the file (currently absent; only `threading`, Flask, and `.db` are imported).

2. Change the `run` signature default so callers can still force-override, but unspecified means "read the env":

   ```python
   def run(self, port=3000, debug=None):
       if debug is None:
           debug = _debug_enabled()
       ...
       self.app.run(host='0.0.0.0', port=port, debug=debug)
   ```

3. Leave `run_in_thread(self, port=3000, debug=False)` alone. It is only used internally, and its `debug=False` default is already safe. Switching it to `debug=None` would let `CM_DEBUG` turn the debugger on for threaded callers, which would break that contract.

The `__main__` block stays as `api.run(port=3000)` — by passing nothing it now picks up the env-driven default.
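
The `None` sentinel behavior can be checked without starting a server. The sketch below is a minimal stand-in under stated assumptions: `FakeApp`, `ApiSketch`, and `resolved_debug` are hypothetical names invented for illustration, and `FakeApp.run` only records its keyword arguments instead of calling Flask.

```python
import os


def _debug_enabled() -> bool:
    return os.getenv("CM_DEBUG", "false").strip().lower() in ("1", "true", "yes")


class FakeApp:
    """Hypothetical stand-in for the Flask app: records run() kwargs instead of serving."""

    def __init__(self):
        self.calls = []

    def run(self, **kwargs):
        self.calls.append(kwargs)


class ApiSketch:
    """Hypothetical stand-in for the API class in app/cm_api.py."""

    def __init__(self):
        self.app = FakeApp()

    def run(self, port=3000, debug=None):
        # None means "the caller did not decide": fall back to the environment.
        if debug is None:
            debug = _debug_enabled()
        self.app.run(host="0.0.0.0", port=port, debug=debug)


def resolved_debug(env_value, **run_kwargs):
    """Run the sketch with CM_DEBUG set to env_value (None = unset); return the debug flag used."""
    if env_value is None:
        os.environ.pop("CM_DEBUG", None)
    else:
        os.environ["CM_DEBUG"] = env_value
    api = ApiSketch()
    api.run(**run_kwargs)
    return api.app.calls[-1]["debug"]


if __name__ == "__main__":
    print("unset, no arg:       ", resolved_debug(None))
    print("true, no arg:        ", resolved_debug("true"))
    print("true, debug=False:   ", resolved_debug("true", debug=False))
    print("unset, debug=True:   ", resolved_debug(None, debug=True))
```

An explicit `debug=False` or `debug=True` always wins over the environment; only the omitted argument consults `CM_DEBUG`.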

### `docker-compose.yml`

Add `CM_DEBUG: ${CM_DEBUG:-false}` to the `environment:` blocks of `api-server` and `web-view` (the only Flask services). The `${CM_DEBUG:-false}` form ensures the variable is *always* defined inside the container, even if the operator forgot to set it in their `.env`. Telegram and transfer services do not need it.

`docker-compose.override.yml` does not need changes — it inherits `environment:` from the base file.
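
To make the `${CM_DEBUG:-false}` semantics concrete, here is a small Python sketch that mimics the interpolation rule: the default applies when the variable is unset or empty. `compose_default` and `debug_enabled_from` are hypothetical helpers written for this illustration; Compose itself performs the substitution, not application code.

```python
def compose_default(env: dict, name: str, default: str) -> str:
    """Mimic ${NAME:-default}: fall back to default when NAME is unset or empty."""
    value = env.get(name)
    return value if value else default


def debug_enabled_from(value: str) -> bool:
    # Same parsing rule as _debug_enabled(), applied to an explicit string.
    return value.strip().lower() in ("1", "true", "yes")


if __name__ == "__main__":
    # Each dict models what the operator's shell or .env provides to Compose.
    for host_env in ({}, {"CM_DEBUG": ""}, {"CM_DEBUG": "true"}, {"CM_DEBUG": "0"}):
        inside = compose_default(host_env, "CM_DEBUG", "false")
        print(host_env, "-> container CM_DEBUG =", repr(inside),
              "-> debug", debug_enabled_from(inside))
```

Under this rule an operator who leaves `CM_DEBUG=` blank in `.env` gets the same result as one who omits the line entirely.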

### `.env.example`

Add a new section near the top:

```
# === Runtime ===
# Set to true ONLY in local dev. Werkzeug debugger = RCE if exposed.
CM_DEBUG=false
```

### `envs/rex/.env` and `envs/siong/.env`

These files are intentionally not in git (the directories are committed empty). The operator's existing prod env files do not set `CM_DEBUG`, so the default (`false`) applies automatically. No edit needed; the `AGENTS.md` update below documents the convention for any new deployment.

### Documentation

- `AGENTS.md` — add a one-line entry under "Build, Test, and Development Commands" or "Security & Configuration Tips" noting `CM_DEBUG=true` is the local-dev override and **must** stay unset in published env files.

## Files Changed

| File | Change |
|---|---|
| `app/cm_web_view.py` | Add `_debug_enabled()` helper; pass it to `app.run(debug=...)`. |
| `app/cm_api.py` | Add `import os`; add `_debug_enabled()` helper; change `run()` default to `debug=None` and resolve from env when `None`. |
| `docker-compose.yml` | Add `CM_DEBUG: ${CM_DEBUG:-false}` to `api-server` and `web-view` `environment:` blocks. |
| `.env.example` | New `Runtime` section documenting `CM_DEBUG`. |
| `AGENTS.md` | One-line note about `CM_DEBUG`. |

No new dependencies. No version bumps.

## Verification

1. **Local, debug on.** Set `CM_DEBUG=true` in repo-root `.env`, run `bash scripts/local_build.sh`. Web-view log shows `* Debug mode: on` and a `Debugger PIN: ...` line. API log shows the same.
2. **Local, debug off.** Set `CM_DEBUG=false` (or remove the line). Rebuild. Logs show `* Debug mode: off` and **no PIN line**. Hitting `/api/acc/` and `/api/user/` still returns 200 with valid JSON.
3. **Prod parity check.** With `CM_DEBUG` unset in the deploy env (matches rex/siong today), confirm container logs show debug off. Confirm the existing `192.168.0.210` scanner probes for `/.env` and `/.git/config` still 404 with no traceback or debugger response.
4. **Override path.** From a Python REPL inside the api container, calling `CM_API().run(port=3001, debug=True)` still honors the explicit override (regression check on the `debug=None` sentinel).
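
The log checks in steps 1 through 3 could be scripted. The following is a minimal sketch, assuming the container logs are available as text on stdin (for example, piped from the Compose logs for one service); `debugger_active` is a hypothetical helper, and it matches only the two log markers quoted in this document.

```python
import sys


def debugger_active(log_text: str) -> bool:
    """Return True if the log shows debug mode on or a debugger PIN line."""
    for line in log_text.splitlines():
        # Substring matching tolerates service-name prefixes added by the log collector.
        if "Debug mode: on" in line or "Debugger PIN:" in line:
            return True
    return False


if __name__ == "__main__":
    text = sys.stdin.read()
    if debugger_active(text):
        print("FAIL: Werkzeug debugger appears to be active")
        sys.exit(1)
    print("OK: no debugger markers found")
```

A non-zero exit code makes the script usable as a gate in a deploy checklist, though it only inspects whatever log text it is given.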

## Risk

Minimal. `debug=False` is the framework default and the only setting appropriate outside local development. The only user-visible behavior loss is the in-browser traceback page and auto-reloader, both of which should never have been on in containers in the first place.

The one edge case worth naming: the existing `cm_api.py:run()` signature lets a caller pass `debug=False` explicitly and still get debug-off behavior; changing the default to `None` preserves that. Nothing in the repo calls `run()` with a positional `debug` argument (verified via grep before implementation), so the signature change is safe.

## Out-of-Scope Follow-Ups (for the security-hardening spec)

Captured here so they aren't forgotten:

- Replace `app.run` with gunicorn (or waitress) in both `cm_api` and `cm_web_view` Dockerfiles.
- Put a reverse proxy (Caddy/Traefik/nginx) in front of `web-view` with TLS, basic auth or token auth, and rate limiting.
- Add `robots.txt` returning `Disallow: /` and a 410/444 default for unknown paths to deflect noisy scanners.
- Audit `app/cm_bot_hal.py` hardcoded credentials/PIN — already flagged in `AGENTS.md` "Security & Configuration Tips".
- Confirm whether `192.168.0.210` is a NAT hop for public traffic (router/firewall question) and decide whether the host port should be bound only to a private interface.