yiekheng fab3b413b8 Merge download.py and upload.py into unified manga.py with TUI
- Single interactive script (arrow-key TUI via simple-term-menu) replaces
  download.py, upload.py, and export_cookies.py
- Add sync command: streams new chapters site -> R2 directly without
  saving locally (uses RAM as cache)
- Add R2/DB management submenu (status, delete specific, clear all)
- Multi-select chapter picker; already-downloaded chapters are grayed out
- Chapter list fetched via /v2.0/apis/manga/chapterByPage with pagination
- Cover image captured from page network traffic (no extra fetch)
- Filter prefetched next-chapter images via DOM container count
- Chrome runs hidden via AppleScript on macOS (except setup mode)
- DB records only created after R2 upload succeeds (no orphan rows)
- Parallel R2 uploads (8 workers) with WebP method=6 quality=75
- Update CLAUDE.md to reflect new architecture
- Add requirements.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:56:05 +08:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Manga downloader and uploader toolkit. Currently supports m.happymh.com, designed for future multi-site support.
- `manga.py` — Single interactive CLI. Download, upload, and sync manga. Launches real Chrome via subprocess, connects via CDP, bypasses Cloudflare. Uploads to R2 + PostgreSQL.
## Architecture
### Anti-bot Strategy
- Chrome launched via `subprocess.Popen` (not Playwright) to avoid automation detection
- Playwright connects via CDP (`connect_over_cdp`) for scripting only
- Persistent browser profile in `.browser-data/` preserves Cloudflare sessions
- All navigation uses JS (`window.location.href`) or `page.goto` with `wait_until="commit"`
- Images downloaded via `response.body()` from network interception (no base64)
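The launch-then-attach split above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the Chrome binary path, debugging port, and function names are assumptions; only the `.browser-data/` profile directory and the `connect_over_cdp` call come from the document.

```python
# Sketch of the anti-bot flow: launch a plain Chrome subprocess with a
# persistent profile, then let Playwright attach over CDP afterwards.
import subprocess

# Assumed macOS binary path -- adjust per platform.
CHROME_MAC = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"

def chrome_launch_args(profile_dir=".browser-data", port=9222):
    """Build argv for a CDP-debuggable Chrome with a persistent profile."""
    return [
        CHROME_MAC,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",  # keeps Cloudflare session between runs
        "--no-first-run",
        "--no-default-browser-check",
    ]

def launch_and_connect(port=9222):
    """Start Chrome ourselves, then attach Playwright to it over CDP."""
    proc = subprocess.Popen(chrome_launch_args(port=port))
    from playwright.sync_api import sync_playwright  # deferred import
    pw = sync_playwright().start()
    # Playwright only attaches -- it never launches the browser itself,
    # so the usual automation fingerprints are absent.
    browser = pw.chromium.connect_over_cdp(f"http://127.0.0.1:{port}")
    return proc, pw, browser
```

Because Playwright attaches to an externally launched browser, detection signals tied to Playwright-launched Chromium (e.g. default automation flags) never appear.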
### Data Flow
1. **Input**: `manga.json` — JSON array of manga URLs
2. **Download**: Chrome navigates to manga page → API fetches chapter list → navigates to reader pages → intercepts image URLs from API → downloads via browser fetch
3. **Local storage**: `manga-content/<slug>/` with cover.jpg, detail.json, and chapter folders
4. **Upload**: Converts JPG→WebP → uploads to R2 → creates DB records
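The upload step's concurrency and ordering guarantee (8 parallel workers, DB rows written only after every R2 upload succeeds) can be sketched with injected stand-in functions. `upload_chapter`, `upload_fn`, and `record_fn` are hypothetical names for illustration, not the script's real API.

```python
# Sketch of the upload step: push pages to R2 from a thread pool and
# only record the chapter in the DB once every upload has succeeded,
# so no orphan rows are created.
from concurrent.futures import ThreadPoolExecutor

def upload_chapter(pages, upload_fn, record_fn, workers=8):
    """pages: list of (r2_key, payload) tuples.

    upload_fn(key, payload) -> bool and record_fn(pages) are injected,
    standing in for the real R2 client and DB insert."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: upload_fn(*p), pages))
    if all(results):
        record_fn(pages)  # DB record only after all uploads succeeded
        return True
    return False
```

Injecting the upload and record callables keeps the ordering rule (upload first, record second) testable without touching R2 or PostgreSQL.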
### Key APIs (happymh)
- Chapter list: `GET /v2.0/apis/manga/chapterByPage?code=<slug>&lang=cn&order=asc&page=<n>`
- Chapter images: `GET /v2.0/apis/manga/reading?code=<slug>&cid=<chapter_id>` (intercepted from reader page)
- Cover: Captured from page load traffic (`/mcover/` responses)
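The paginated chapter-list endpoint can be consumed with a loop like the one below. The URL and query parameters come from the endpoint above; the stop condition (an empty batch ends pagination) and the injected `fetch` callable are assumptions for illustration.

```python
# Sketch: page through chapterByPage until the API returns no more
# chapters. `fetch(url) -> list` is injected so the same loop works
# whether requests go through the browser context or plain HTTP.
def iter_chapters(slug, fetch):
    page = 1
    while True:
        url = (
            "https://m.happymh.com/v2.0/apis/manga/chapterByPage"
            f"?code={slug}&lang=cn&order=asc&page={page}"
        )
        batch = fetch(url)
        if not batch:          # assumed sentinel: empty page ends pagination
            return
        yield from batch
        page += 1
```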
## Directory Convention
```
manga-content/
  <slug>/
    detail.json        # metadata (title, author, genres, description, cover URL)
    cover.jpg          # cover image captured from page traffic
    1 <chapter-name>/  # chapter folder (ordered by API sequence)
      1.jpg
      2.jpg
      ...
```
## R2 Storage Layout
```
manga/<slug>/cover.webp
manga/<slug>/chapters/<number>/<page>.webp
```
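The layout above reduces to a small key-building helper. The function name is hypothetical; the key strings themselves are taken directly from the layout.

```python
# Build R2 object keys matching the documented storage layout.
def r2_key(slug, chapter=None, page=None):
    """Cover key when chapter is None, otherwise a chapter-page key."""
    if chapter is None:
        return f"manga/{slug}/cover.webp"
    return f"manga/{slug}/chapters/{chapter}/{page}.webp"
```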
## Environment Variables (.env)
```
R2_ACCOUNT_ID=
R2_ACCESS_KEY=
R2_SECRET_KEY=
R2_BUCKET=
R2_PUBLIC_URL=
DATABASE_URL=postgresql://...
```
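These variables typically wire up to an S3-compatible client as sketched below. The `r2.cloudflarestorage.com` endpoint format is Cloudflare's documented R2 S3 endpoint; the use of boto3 and the helper names are assumptions, not something this file specifies.

```python
# Sketch: turn the R2_* environment variables into an S3-compatible
# client for Cloudflare R2.
import os

def r2_endpoint(account_id):
    """Cloudflare R2's S3-compatible endpoint for a given account."""
    return f"https://{account_id}.r2.cloudflarestorage.com"

def make_client():
    import boto3  # deferred so the module imports without boto3 installed
    return boto3.client(
        "s3",
        endpoint_url=r2_endpoint(os.environ["R2_ACCOUNT_ID"]),
        aws_access_key_id=os.environ["R2_ACCESS_KEY"],
        aws_secret_access_key=os.environ["R2_SECRET_KEY"],
    )
```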
## Future: Multi-site Support
Current code is specific to happymh.com. To add new sites:
- Extract site-specific logic (chapter fetching, image URL extraction, CF handling) into per-site modules
- Keep shared infrastructure (Chrome management, image download, upload) in common modules
- Each site module implements: `fetch_chapters(page, slug)`, `get_chapter_images(page, slug, chapter_id)`, `fetch_metadata(page)`
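One way to pin down the per-site interface listed above is a `typing.Protocol`; this is a sketch of a possible shape, not something the repo has committed to, and the `page` parameter is typed loosely because it would be a Playwright page object.

```python
# Sketch: structural interface every site module would satisfy.
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class MangaSite(Protocol):
    def fetch_chapters(self, page: Any, slug: str) -> list: ...
    def get_chapter_images(self, page: Any, slug: str, chapter_id: str) -> list: ...
    def fetch_metadata(self, page: Any) -> dict: ...

class Happymh:
    """Example conforming module for the current site (stub bodies)."""
    def fetch_chapters(self, page, slug):
        return []
    def get_chapter_images(self, page, slug, chapter_id):
        return []
    def fetch_metadata(self, page):
        return {}
```

A `Protocol` keeps site modules decoupled from shared infrastructure: any module with these three methods plugs in, with no inheritance required.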