# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Manga downloader and uploader toolkit. Currently supports m.happymh.com; designed for future multi-site support.

- `manga.py` — Single interactive CLI. Downloads, uploads, and syncs manga. Launches real Chrome via subprocess, connects via CDP, bypasses Cloudflare. Uploads to R2 + PostgreSQL.

## Architecture

### Anti-bot Strategy

- Chrome is launched via `subprocess.Popen` (not Playwright) to avoid automation detection
- Playwright connects via CDP (`connect_over_cdp`) for scripting only
- A persistent browser profile in `.browser-data/` preserves Cloudflare sessions
- All navigation uses JS (`window.location.href`) or `page.goto` with `wait_until="commit"`
- Images are downloaded via `response.body()` from network interception (no base64)

### Data Flow

1. **Input**: `manga.json` — JSON array of manga URLs
2. **Download**: Chrome navigates to the manga page → API fetches the chapter list → navigates to reader pages → intercepts image URLs from the API → downloads via browser fetch
3. **Local storage**: `manga-content/<slug>/` with `cover.jpg`, `detail.json`, and chapter folders
4. **Upload**: Converts JPG → WebP → uploads to R2 → creates DB records

### Key APIs (happymh)

- Chapter list: `GET /v2.0/apis/manga/chapterByPage?code=<code>&lang=cn&order=asc&page=<n>`
- Chapter images: `GET /v2.0/apis/manga/reading?code=<code>&cid=<chapter_id>` (intercepted from the reader page)
- Cover: captured from page-load traffic (`/mcover/` responses)

## Directory Convention

```
manga-content/
  <slug>/
    detail.json   # metadata (title, author, genres, description, cover URL)
    cover.jpg     # cover image captured from page traffic
    1/            # chapter folder (ordered by API sequence)
      1.jpg
      2.jpg
      ...
```

## R2 Storage Layout

```
manga/<slug>/cover.webp
manga/<slug>/chapters/<chapter>/<page>.webp
```

## Environment Variables (.env)

```
R2_ACCOUNT_ID=
R2_ACCESS_KEY=
R2_SECRET_KEY=
R2_BUCKET=
R2_PUBLIC_URL=
DATABASE_URL=postgresql://...
```

## Future: Multi-site Support

Current code is specific to happymh.com. To add new sites:

- Extract site-specific logic (chapter fetching, image URL extraction, CF handling) into per-site modules
- Keep shared infrastructure (Chrome management, image download, upload) in common modules
- Each site module implements: `fetch_chapters(page, slug)`, `get_chapter_images(page, slug, chapter_id)`, `fetch_metadata(page)`
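The per-site interface above could be pinned down with a `typing.Protocol`, so the shared infrastructure stays site-agnostic. This is a sketch under assumptions: the `MangaSite` protocol and the `HappymhSite` adapter class are hypothetical names that do not exist in the repo — only the three method signatures come from this document.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class MangaSite(Protocol):
    """Hypothetical interface a per-site module would satisfy."""

    def fetch_chapters(self, page: Any, slug: str) -> list[dict]:
        """Return the ordered chapter list for a manga."""
        ...

    def get_chapter_images(self, page: Any, slug: str, chapter_id: str) -> list[str]:
        """Return the image URLs for one chapter."""
        ...

    def fetch_metadata(self, page: Any) -> dict:
        """Return title/author/genres/description for the current manga page."""
        ...


class HappymhSite:
    """Hypothetical adapter that would wrap the existing happymh logic."""

    def fetch_chapters(self, page: Any, slug: str) -> list[dict]:
        # Would hit /v2.0/apis/manga/chapterByPage through the CDP-connected page.
        return []

    def get_chapter_images(self, page: Any, slug: str, chapter_id: str) -> list[str]:
        # Would intercept /v2.0/apis/manga/reading responses from the reader page.
        return []

    def fetch_metadata(self, page: Any) -> dict:
        # Would scrape the manga detail page and captured /mcover/ traffic.
        return {}
```

Shared code can then accept any `MangaSite` implementation, and `@runtime_checkable` lets the CLI verify an adapter structurally (`isinstance(site, MangaSite)`) before dispatching to it.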