fix(docker): reuse node user instead of creating gid 1000 — unblocks publish

The bot and web Dockerfiles ran `addgroup -g 1000 app` on top of
node:22-alpine, which already ships a `node` group at gid 1000, so
the build aborted in the runtime stage (step 5/5) with:
  addgroup: gid '1000' in use

Drop the addgroup/adduser pair from both images. Instead, chown to
and run as the existing node user (`USER node`). The runtime stays
non-root, with one less moving part. The stock node user's login
shell is /bin/sh rather than the old /sbin/nologin, which is moot in
a container where nothing performs a login. The compose dev overlay's
`user: ${HOST_UID:-1000}:${HOST_GID:-1000}` matches uid 1000 either way.

Plus:
- New docker-compose.portainer.yml: pulls cm-whatsapp-{bot,web}
  from gitea.04080616.xyz/yiekheng instead of building from
  source. Named volumes for sessions / media so the operator
  doesn't need shell access to manage state. Healthchecks on
  both services so Portainer's UI surfaces unhealthy containers.
- New docs/deploy-portainer.md walking through registry auth,
  stack creation, env vars, migrations, first sign-in, future
  redeploys, rollbacks.
- README links the Portainer guide alongside the dev path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yiekheng 2026-05-10 22:09:12 +08:00
parent 954d382b54
commit 49f5c16b19
5 changed files with 303 additions and 9 deletions

README

@@ -130,6 +130,15 @@ Screen". Launches fullscreen.
`NO_SUDO=1` is the right setting if your user is in the `docker`
group (the default for this repo). Drop it if you need `sudo docker`.

## Deploying

- **Local dev**: `NO_SUDO=1 scripts/dev.sh up` (described in Quick
start above).
- **Portainer**: push images with `scripts/publish.sh`, then deploy
the [`docker-compose.portainer.yml`](docker-compose.portainer.yml)
stack via the Portainer UI. Full walk-through:
[`docs/deploy-portainer.md`](docs/deploy-portainer.md).

## Manual test runbook
End-to-end checks that unit tests can't cover (live Baileys,

docker-compose.portainer.yml (new file)

@@ -0,0 +1,111 @@
# Portainer-ready stack. Pulls cm-whatsapp-{web,bot} from
# gitea.04080616.xyz/yiekheng instead of building from source. Drop
# this file into a Portainer "Stack" (Repository or Web editor) and
# fill the env vars in the Portainer UI.
#
# Differences vs docker-compose.base.yml:
#   - No `build:` blocks (Portainer pulls only).
#   - Named volumes (cmbot-sessions, cmbot-media) instead of host
#     bind-mounts so the operator doesn't need shell access to manage
#     persistent state.
#   - Ports section on `web` so the operator can route a reverse
#     proxy / Cloudflare Tunnel directly at the container.
#   - `restart: unless-stopped` on both services.
#
# Required env vars (set in Portainer → Stack → Environment variables):
#   DATABASE_URL             postgres://USER:PASS@HOST:5432/wabot
#   AUTH_SECRET              32-byte random hex (use scripts/gen_auth_secret.sh
#                            on any machine and copy the output)
#   WEB_PORT                 host port for the web container (default 9000)
#
# Optional:
#   DOCKER_IMAGE_TAG         registry tag to deploy (default: latest)
#   OPERATOR_TOKEN_VERSION   session-cookie kill switch (default: 1)
#   BOT_FIRE_CONCURRENCY     pg-boss workers (default: 8)
#   BOT_GROUP_CONCURRENCY    per-account parallel sends (default: 3)
#   BOT_MAX_SEND_PER_MINUTE  per-account token-bucket rate (default: 40)
#   BOT_LOG_LEVEL            pino log level (default: info)
#
# Registry auth: Portainer needs a pull credential for
# gitea.04080616.xyz before you start the stack:
#   Portainer → Registries → Add registry
#     Name:     gitea.04080616.xyz
#     URL:      gitea.04080616.xyz
#     Username: <gitea user>
#     Token:    <gitea personal access token, read:packages>
# After adding, edit each service in the stack and set "Registry" to
# the one you just added so the pull resolves.

services:
  bot:
    image: gitea.04080616.xyz/yiekheng/cm-whatsapp-bot:${DOCKER_IMAGE_TAG:-latest}
    container_name: cmbot-bot
    restart: unless-stopped
    environment:
      NODE_ENV: production
      DATABASE_URL: ${DATABASE_URL}
      DATA_DIR: /data
      SESSIONS_DIR: /data/sessions
      MEDIA_DIR: /data/media
      BOT_HEALTH_PORT: 8081
      BOT_LOG_LEVEL: ${BOT_LOG_LEVEL:-info}
      BOT_FIRE_CONCURRENCY: ${BOT_FIRE_CONCURRENCY:-8}
      BOT_GROUP_CONCURRENCY: ${BOT_GROUP_CONCURRENCY:-3}
      BOT_MAX_SEND_PER_MINUTE: ${BOT_MAX_SEND_PER_MINUTE:-40}
    volumes:
      - cmbot-sessions:/data/sessions
      - cmbot-media:/data/media
    healthcheck:
      test:
        - "CMD-SHELL"
        - "wget -qO- --timeout=2 http://127.0.0.1:8081/health >/dev/null || exit 1"
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 20s
    networks:
      - cmbot

  web:
    image: gitea.04080616.xyz/yiekheng/cm-whatsapp-web:${DOCKER_IMAGE_TAG:-latest}
    container_name: cmbot-web
    restart: unless-stopped
    depends_on:
      - bot
    environment:
      NODE_ENV: production
      DATABASE_URL: ${DATABASE_URL}
      DATA_DIR: /data
      MEDIA_DIR: /data/media
      WEB_PORT: 3000
      AUTH_SECRET: ${AUTH_SECRET}
      OPERATOR_TOKEN_VERSION: ${OPERATOR_TOKEN_VERSION:-1}
    volumes:
      # Web reads media from the same persistent volume the bot wrote.
      - cmbot-media:/data/media:ro
    ports:
      # Maps the Next.js port (3000 inside the container) to whatever
      # WEB_PORT the operator set. The reverse proxy / Cloudflare Tunnel
      # in front of this host points at <host>:${WEB_PORT}.
      - "${WEB_PORT:-9000}:3000"
    healthcheck:
      test:
        - "CMD-SHELL"
        - "wget -qO- --timeout=2 http://127.0.0.1:3000/api/health >/dev/null || exit 1"
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
    networks:
      - cmbot

volumes:
  cmbot-sessions:
    name: cmbot-sessions
  cmbot-media:
    name: cmbot-media

networks:
  cmbot:
    driver: bridge
    name: cmbot

Bot Dockerfile

@@ -26,11 +26,13 @@ COPY --from=build /app/node_modules /app/node_modules
 COPY --from=build /app/apps/bot /app/apps/bot
 COPY --from=build /app/packages/db /app/packages/db
 COPY --from=build /app/packages/shared /app/packages/shared
-RUN addgroup -g 1000 app && \
-    adduser -D -u 1000 -G app -s /sbin/nologin app && \
-    mkdir -p /data/sessions /data/media /app && \
-    chown -R app:app /app /data && \
+# Reuse the `node` user (UID/GID 1000) that node:alpine ships with;
+# `addgroup -g 1000 app` failed in CI because gid 1000 was already
+# taken by the node group. Same non-root posture, one less moving
+# part.
+RUN mkdir -p /data/sessions /data/media /app && \
+    chown -R node:node /app /data && \
     chmod 700 /data/sessions
-USER app
+USER node
 EXPOSE 8081
 CMD ["node", "apps/bot/dist/index.js"]

Web Dockerfile

@@ -29,9 +29,9 @@ ENV HOSTNAME=0.0.0.0
 COPY --from=build /app/apps/web/.next/standalone ./
 COPY --from=build /app/apps/web/.next/static ./apps/web/.next/static
 COPY --from=build /app/apps/web/public ./apps/web/public
-RUN addgroup -g 1000 app && \
-    adduser -D -u 1000 -G app -s /sbin/nologin app && \
-    chown -R app:app /app
-USER app
+# Reuse the `node` user (UID/GID 1000) that node:alpine ships with;
+# `addgroup -g 1000 app` collided with the pre-existing node group.
+RUN chown -R node:node /app
+USER node
 EXPOSE 3000
 CMD ["node", "apps/web/server.js"]

docs/deploy-portainer.md (new file)

@@ -0,0 +1,172 @@
# Deploying via Portainer
End-to-end deploy steps for a fresh Portainer-managed host. Targets
the standard cm-whatsapp-bot pair of images published by
`scripts/publish.sh`.
## 0. Prerequisites
- Portainer 2.x running on the target host (CE or EE).
- A PostgreSQL server reachable from that host, with a `wabot`
database that has the `pgcrypto` and `pg_trgm` extensions enabled.
Migrations are run separately (step 5) from any machine that can
reach the DB, before the stack is brought up.
- A pull credential for `gitea.04080616.xyz` — a Gitea personal
access token with the `read:packages` scope. Generate one in
Gitea → User Settings → Applications.
- A reverse proxy / Cloudflare Tunnel pointing at
`http://<portainer-host>:<WEB_PORT>` if the deploy needs to be
reachable on the public domain (e.g. `wabot.04080616.xyz`).
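The extension prerequisite can be satisfied ahead of time. This is a minimal sketch. It assumes `psql` is installed on a machine that can reach the database. It also assumes the connecting role is allowed to create extensions; on PostgreSQL 13+ both are trusted extensions, so CREATE privilege on the database is usually enough. `ensure_extensions` is a hypothetical helper, not something in `scripts/`.

```bash
# ensure_extensions DATABASE_URL
# Hypothetical helper (not in scripts/): creates the extensions the wabot
# schema expects. IF NOT EXISTS makes it safe to run more than once.
ensure_extensions() {
  local url="$1"
  if [ -z "$url" ]; then
    echo "usage: ensure_extensions DATABASE_URL" >&2
    return 2
  fi
  # ON_ERROR_STOP=1 makes psql exit non-zero if either statement fails.
  psql "$url" -v ON_ERROR_STOP=1 \
    -c 'CREATE EXTENSION IF NOT EXISTS pgcrypto' \
    -c 'CREATE EXTENSION IF NOT EXISTS pg_trgm'
}
```

For example: `ensure_extensions 'postgres://wabot:PASS@192.168.0.210:5432/wabot'`.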
## 1. Add the registry to Portainer
Portainer → **Registries****+ Add registry** → Custom registry.
| Field | Value |
|---------------|-----------------------------|
| Name | `gitea.04080616.xyz` |
| Registry URL | `gitea.04080616.xyz` |
| Authentication | enabled |
| Username | your Gitea username |
| Password | the read:packages PAT |
Save. The registry must show as connected before continuing — if the
test pull fails, the stack will hang on `pull` later.
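If Portainer's registry check is inconclusive, the same token can be tried from a shell on the Portainer host. This sketch uses only the stock `docker login --password-stdin` and `docker pull`. `check_registry_pull` is a hypothetical helper, and the user and token are whatever you entered in the table above.

```bash
# check_registry_pull USER TOKEN [TAG]
# Hypothetical helper: logs in with the read:packages PAT and pulls the
# bot image. If this fails, Portainer's pull will most likely fail too.
check_registry_pull() {
  local user="$1" token="$2" tag="${3:-latest}"
  local registry="gitea.04080616.xyz"
  # --password-stdin keeps the token off docker's command line.
  printf '%s' "$token" | docker login "$registry" -u "$user" --password-stdin || return 1
  docker pull "$registry/yiekheng/cm-whatsapp-bot:$tag"
}
```

Note that this stores a credential for the host's own Docker CLI. That credential is separate from the one Portainer keeps for the stack.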
## 2. Push the images (on your dev machine)
```bash
# Log in once, as the same user publish.sh will run docker as: plain
# `docker login` with NO_SUDO=1, `sudo docker login` otherwise. Docker
# by default stores the credential in that user's ~/.docker/config.json.
docker login gitea.04080616.xyz
# Push :latest. For production, prefer a pinned release tag
# (e.g. `./scripts/publish.sh v1.x.y`) and set DOCKER_IMAGE_TAG to the
# same value in the stack; avoid deploying `latest` where you can.
NO_SUDO=1 ./scripts/publish.sh latest
```
`publish.sh` builds + pushes both images:
- `gitea.04080616.xyz/yiekheng/cm-whatsapp-bot:<tag>`
- `gitea.04080616.xyz/yiekheng/cm-whatsapp-web:<tag>`
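Before a stack is pointed at a tag, it is worth confirming that both images really exist under it, both after a publish and before a rollback. This is a sketch. It assumes a recent Docker CLI with `docker manifest inspect` and a shell that is already logged in to the registry. `verify_tag` is a hypothetical helper, not part of `scripts/publish.sh`.

```bash
# verify_tag TAG
# Hypothetical helper (not part of scripts/publish.sh): checks that both
# images exist in the registry under TAG, without pulling the layers.
verify_tag() {
  local tag="$1" app ref missing=0
  for app in bot web; do
    ref="gitea.04080616.xyz/yiekheng/cm-whatsapp-$app:$tag"
    # manifest inspect only fetches the manifest; it needs a prior docker login.
    if docker manifest inspect "$ref" >/dev/null 2>&1; then
      echo "ok      $ref"
    else
      echo "missing $ref" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `verify_tag v1.4.2 && echo "safe to set DOCKER_IMAGE_TAG=v1.4.2"`.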
## 3. Create the Portainer stack
Portainer → **Stacks****+ Add stack**.
**Name:** `cm-whatsapp-bot`
**Build method:** "Web editor" or "Repository". Either is fine —
"Repository" pointing at this repo's `master` and the file
`docker-compose.portainer.yml` is the cleanest path because future
deploys are just "Pull and redeploy" inside Portainer.
**Web editor path:** copy the contents of
[`docker-compose.portainer.yml`](../docker-compose.portainer.yml)
into the editor verbatim.
**Repository path:**
| Field | Value |
|------------------|-------------------------------------------------------------|
| Repository URL | http://192.168.0.215:3000/yiekheng/cm_whatsapp_bot_v1.git |
| Reference | refs/heads/master |
| Compose path | docker-compose.portainer.yml |
| Authentication | enabled (same Gitea PAT as step 1) |
| Auto-update      | optional; when enabled, Portainer redeploys when it detects a new commit (polling or webhook) |
## 4. Set environment variables
In the same stack form, scroll to **Environment variables** and add:
| Key | Value |
|---------------------------|------------------------------------------------|
| `DATABASE_URL` | `postgres://wabot:PASS@192.168.0.210:5432/wabot` |
| `AUTH_SECRET` | output of `scripts/gen_auth_secret.sh` |
| `WEB_PORT` | host port (e.g. `9000`) |
| `DOCKER_IMAGE_TAG` | `latest` (or a pinned `v1.x.y`) |
| `OPERATOR_TOKEN_VERSION` | `1` (bump only when you want to invalidate every existing session) |
| `BOT_LOG_LEVEL` | `info` |
Optional tuning (defaults are fine for most installs):
| Key | Default | When to bump |
|---------------------------|---------|--------------|
| `BOT_FIRE_CONCURRENCY` | `8` | More accounts firing in parallel |
| `BOT_GROUP_CONCURRENCY` | `3` | More groups per fire — but careful with WhatsApp rate caps |
| `BOT_MAX_SEND_PER_MINUTE` | `40` | Aged accounts can push toward 60 |
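If `scripts/gen_auth_secret.sh` isn't at hand, any 32 random bytes rendered as hex should do. That rests on the assumption that the app only needs an opaque 64-character hex string; the script itself isn't shown here. The sketch below also sanity-checks the required keys before they are pasted into Portainer. Both function names are hypothetical.

```bash
# gen_secret: 32 bytes from /dev/urandom printed as 64 lowercase hex chars.
# Assumed equivalent in shape to the output of scripts/gen_auth_secret.sh.
gen_secret() {
  od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}

# check_stack_env: hypothetical pre-flight for the values you are about to
# paste into Portainer. Reads them from the current shell.
check_stack_env() {
  local rc=0
  [ -n "$DATABASE_URL" ] || { echo "missing: DATABASE_URL" >&2; rc=1; }
  [ -n "$AUTH_SECRET" ] || { echo "missing: AUTH_SECRET" >&2; rc=1; }
  case "$DATABASE_URL" in
    ''|postgres://*|postgresql://*) ;;
    *) echo "DATABASE_URL should start with postgres://" >&2; rc=1 ;;
  esac
  # WEB_PORT may be left empty; the compose file then falls back to 9000.
  case "${WEB_PORT:-9000}" in
    *[!0-9]*) echo "WEB_PORT is not a number: $WEB_PORT" >&2; rc=1 ;;
    *)
      if [ "${WEB_PORT:-9000}" -lt 1 ] || [ "${WEB_PORT:-9000}" -gt 65535 ]; then
        echo "WEB_PORT out of range: $WEB_PORT" >&2; rc=1
      fi ;;
  esac
  return "$rc"
}
```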
## 5. Run database migrations
The stack does NOT auto-migrate on boot. Apply migrations from any
machine that can reach the same Postgres:
```bash
DATABASE_URL='postgres://...' \
./scripts/db.sh migrate
```
If the journal is non-monotonic, the migrate runner refuses with a
clear error and prints which `_journal.json` entry to bump (see the
guard added in commit 47d7c53 and the CI test in
`apps/web/src/test/drizzle-journal-monotonic.test.ts`).
Then seed the bootstrap operator + set its password:
```bash
DATABASE_URL='postgres://...' SEED_OPERATOR_USERNAME=admin \
./scripts/db.sh seed
DATABASE_URL='postgres://...' \
./scripts/set-password.sh admin # reads the password from stdin
```
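To confirm migrations actually landed (the `column "email" does not exist` symptom in step 6 means they didn't), the bookkeeping table can be queried directly. This sketch assumes `scripts/db.sh migrate` uses drizzle-orm's migrator with its default `drizzle.__drizzle_migrations` table, which records one row per applied journal entry. If so, the count should match the number of entries in `_journal.json`. That migrator also skips any entry whose `when` is not newer than the latest recorded row, which is presumably why the monotonic guard exists. `migration_status` is a hypothetical helper.

```bash
# migration_status DATABASE_URL
# Hypothetical helper. Prints one line, "<applied count>|<newest created_at>",
# from drizzle-orm's default bookkeeping table. Adjust the schema/table name
# if the runner in scripts/db.sh was configured differently.
migration_status() {
  local url="$1"
  # -A: unaligned output, -t: rows only (no header or row-count footer).
  psql "$url" -v ON_ERROR_STOP=1 -At \
    -c 'SELECT count(*), max(created_at) FROM drizzle.__drizzle_migrations'
}
```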
## 6. Deploy the stack
In Portainer → click **Deploy the stack**. Watch the container list
in **Containers**:
- `cmbot-bot` should show *running, healthy* within ~20 s.
- `cmbot-web` should show *running, healthy* within ~30 s (Next.js
cold boot is the bottleneck).
If a container shows *unhealthy*, check **Logs**:
| Symptom | Likely cause |
|----------------------------------------------|--------------|
| `column "email" does not exist` | Migrations weren't applied. Run step 5. |
| `Server is not configured for sign-in` | `AUTH_SECRET` blank or missing. Set it in stack env. |
| `pg-boss: queue policy ...standard` | Harmless first-boot log; the bot force-flips it. |
| `Stream Errored (restart required)` (Baileys) | Upstream noise; ignore unless pairing fails. |
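The health state Portainer shows can also be polled from a shell on the Docker host, for example at the end of a redeploy script. This sketch assumes such shell access, which the Portainer flow itself doesn't require. It reads the status produced by the compose `healthcheck:` blocks through `docker inspect`. `wait_healthy` is a hypothetical helper.

```bash
# wait_healthy CONTAINER [TIMEOUT_SECONDS]
# Hypothetical helper: polls the container's healthcheck status until it is
# healthy, unhealthy, or the timeout expires. Returns 0 only when healthy.
wait_healthy() {
  local name="$1" timeout="${2:-120}" waited=0 status=""
  while [ "$waited" -lt "$timeout" ]; do
    status="$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)"
    case "$status" in
      healthy) echo "$name: healthy"; return 0 ;;
      unhealthy) echo "$name: unhealthy (check Logs)" >&2; return 1 ;;
    esac
    # "starting", or empty if the container isn't up yet: keep waiting.
    sleep 2
    waited=$((waited + 2))
  done
  echo "$name: still ${status:-unknown} after ${timeout}s" >&2
  return 1
}
```

For example: `wait_healthy cmbot-bot && wait_healthy cmbot-web`. Docker reports *unhealthy* only after the configured `retries` have failed, so returning on that state early loses nothing.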
## 7. First sign-in
Visit `https://<your-domain>/login`, sign in as `admin` with the
password set in step 5, and walk the
[`docs/runbook.md`](runbook.md) smoke checklist before declaring
the deploy good.
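Before opening the browser, a quick probe through the public hostname confirms that the reverse proxy or tunnel reaches the container. This sketch assumes `curl` is available. It also assumes `/api/health`, the path the compose healthcheck uses for `web`, is reachable through the proxy. `probe_public` is a hypothetical helper.

```bash
# probe_public DOMAIN
# Hypothetical helper: checks the web health endpoint through the public
# hostname, i.e. through the reverse proxy / tunnel rather than the host port.
probe_public() {
  local url="https://$1/api/health"
  # -f: fail on HTTP >= 400; -sS: no progress bar, but still show errors;
  # --max-time: give up instead of hanging on a dead tunnel.
  if curl -fsS --max-time 10 -o /dev/null "$url"; then
    echo "reachable: $url"
  else
    echo "not reachable through the proxy: $url" >&2
    return 1
  fi
}
```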
## 8. Future redeploys
Two paths depending on how you set up step 3:
**Web editor flow:**
1. Run `scripts/publish.sh <tag>` on your dev machine.
2. In Portainer → Stack, set `DOCKER_IMAGE_TAG` to the new tag (skip
this if you republished `latest`), then "Update the stack" → "Re-pull
image and redeploy".
**Repository flow:**
1. Run `scripts/publish.sh <tag>`.
2. Commit any compose changes to master, and set `DOCKER_IMAGE_TAG` to
the new tag in the stack's environment variables.
3. Portainer → Stack → "Pull and redeploy". (With auto-update on you
can skip this; Portainer redeploys once it picks up the new commit.)
Always pin a tag (`v1.4.2`) instead of `latest` for production —
makes rollback a one-field stack edit instead of a republish.
## Rolling back
In Portainer → Stack → set `DOCKER_IMAGE_TAG=v1.4.1` (or whatever
the previous good tag was) → Re-pull and redeploy. The cmbot-* data
volumes (sessions, media) are preserved across image swaps, so a
rollback doesn't lose pairings or uploaded media.
If the previous release also expects the older schema, run the
corresponding `down` SQL by hand; drizzle's migrator only goes
forward, by design.
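Because the revert SQL is applied by hand, it helps to apply it atomically. This is a sketch. It assumes `psql`, a hand-written revert file (the `revert.sql` name is purely hypothetical) and drizzle-orm's default `drizzle.__drizzle_migrations` bookkeeping table. `revert_migration` is also hypothetical. It deletes the bookkeeping row for the reverted entry in the same transaction. If that row were left in place, a later `./scripts/db.sh migrate` would treat the migration as already applied.

```bash
# revert_migration DATABASE_URL REVERT_SQL_FILE WHEN
# Hypothetical helper. Applies a hand-written revert file and deletes the
# matching bookkeeping row in one transaction. WHEN is the reverted entry's
# "when" value from _journal.json; the DELETE assumes drizzle-orm's default
# drizzle.__drizzle_migrations table.
revert_migration() {
  local url="$1" file="$2" when="$3"
  case "$when" in
    ''|*[!0-9]*) echo "WHEN must be the journal entry's numeric \"when\"" >&2; return 2 ;;
  esac
  # --single-transaction wraps every -f/-c in one BEGIN/COMMIT; with
  # ON_ERROR_STOP=1 a failing statement stops psql and nothing is committed.
  # The revert file itself must not contain BEGIN/COMMIT.
  psql "$url" -v ON_ERROR_STOP=1 --single-transaction \
    -f "$file" \
    -c "DELETE FROM drizzle.__drizzle_migrations WHERE created_at = $when"
}
```

Run it only after confirming that the older image really expects the older schema.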