yiekheng fab3b413b8 Merge download.py and upload.py into unified manga.py with TUI
- Single interactive script (arrow-key TUI via simple-term-menu) replaces
  download.py, upload.py, and export_cookies.py
- Add sync command: streams new chapters site -> R2 directly without
  saving locally (uses RAM as cache)
- Add R2/DB management submenu (status, delete specific, clear all)
- Multi-select chapter picker with already-downloaded chapters grayed out
- Chapter list fetched via /v2.0/apis/manga/chapterByPage with pagination
- Cover image captured from page network traffic (no extra fetch)
- Filter prefetched next-chapter images via DOM container count
- Chrome runs hidden via AppleScript on macOS (except setup mode)
- DB records only created after R2 upload succeeds (no orphan rows)
- Parallel R2 uploads (8 workers) with WebP method=6 quality=75
- Update CLAUDE.md to reflect new architecture
- Add requirements.txt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:56:05 +08:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Manga downloader and uploader toolkit. Currently supports m.happymh.com; designed with future multi-site support in mind.

  • manga.py — Single interactive CLI. Download, upload, and sync manga. Launches real Chrome via subprocess, connects via CDP, bypasses Cloudflare. Uploads to R2 + PostgreSQL.

Architecture

Anti-bot Strategy

  • Chrome launched via subprocess.Popen (not Playwright) to avoid automation detection
  • Playwright connects via CDP (connect_over_cdp) for scripting only
  • Persistent browser profile in .browser-data/ preserves Cloudflare sessions
  • All navigation uses JS (window.location.href) or page.goto with wait_until="commit"
  • Images downloaded via response.body() from network interception (no base64)
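The launch-then-attach flow above can be sketched roughly as below. The Chrome path, profile directory, and debugging port are illustrative assumptions, not values taken from manga.py:

```python
import subprocess

CDP_PORT = 9222  # assumed debugging port; the real script may use another


def chrome_command(chrome_path: str, profile_dir: str, port: int = CDP_PORT) -> list[str]:
    """Build the argv for launching real Chrome (not a Playwright-managed
    browser) with a persistent profile and a CDP endpoint exposed."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",
        "--no-first-run",
        "--no-default-browser-check",
    ]


def launch_and_attach(chrome_path: str, profile_dir: str):
    """Launch Chrome via subprocess.Popen, then attach Playwright over CDP
    for scripting only (requires the playwright package and a real Chrome)."""
    from playwright.sync_api import sync_playwright  # lazy: only for live use

    subprocess.Popen(chrome_command(chrome_path, profile_dir))
    pw = sync_playwright().start()
    browser = pw.chromium.connect_over_cdp(f"http://localhost:{CDP_PORT}")
    return browser.contexts[0].pages[0]
```

Because Chrome owns its own process and profile, closing the Playwright connection leaves the Cloudflare session in `.browser-data/` intact.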

Data Flow

  1. Input: manga.json — JSON array of manga URLs
  2. Download: Chrome navigates to manga page → API fetches chapter list → navigates to reader pages → intercepts image URLs from API → downloads via browser fetch
  3. Local storage: manga-content/<slug>/ with cover.jpg, detail.json, and chapter folders
  4. Upload: Converts JPG→WebP → uploads to R2 → creates DB records
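Step 4's parallel upload (8 workers, DB write only after success) might look roughly like this; `upload_fn` is a stand-in for the real R2 put call, and the Pillow JPG→WebP conversion is omitted:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def upload_chapter(pages: Iterable[tuple[str, bytes]],
                   upload_fn: Callable[[str, bytes], None],
                   workers: int = 8) -> int:
    """Upload (key, webp_bytes) pairs in parallel; returns the page count.
    The caller should create the chapter's DB record only after this
    returns without raising, so a failed upload leaves no orphan rows."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload_fn, key, data) for key, data in pages]
        for f in futures:
            f.result()  # re-raise the first upload error, if any
    return len(futures)
```

Collecting every future's result before returning is what makes the "no orphan rows" guarantee cheap to enforce at the call site.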

Key APIs (happymh)

  • Chapter list: GET /v2.0/apis/manga/chapterByPage?code=<slug>&lang=cn&order=asc&page=<n>
  • Chapter images: GET /v2.0/apis/manga/reading?code=<slug>&cid=<chapter_id> (intercepted from reader page)
  • Cover: Captured from page load traffic (/mcover/ responses)
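Since chapterByPage is paginated, the full list has to be accumulated page by page. A sketch with an injected fetcher, assuming a `{"data": {"items": [...]}}` response shape (an assumption, not a documented contract):

```python
from typing import Callable


def fetch_all_chapters(fetch_page: Callable[[int], dict]) -> list[dict]:
    """Walk /v2.0/apis/manga/chapterByPage with page=1, 2, ... until a
    page comes back empty, concatenating the chapter items in order."""
    chapters: list[dict] = []
    page = 1
    while True:
        items = fetch_page(page).get("data", {}).get("items", [])
        if not items:
            break  # an empty page marks the end of the list
        chapters.extend(items)
        page += 1
    return chapters
```

Injecting `fetch_page` keeps the pagination logic independent of whether the request goes through the browser or a plain HTTP client.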

Directory Convention

manga-content/
  <slug>/
    detail.json          # metadata (title, author, genres, description, cover URL)
    cover.jpg            # cover image captured from page traffic
    1 <chapter-name>/    # chapter folder (ordered by API sequence)
      1.jpg
      2.jpg
      ...

R2 Storage Layout

manga/<slug>/cover.webp
manga/<slug>/chapters/<number>/<page>.webp
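The layout above maps to two trivial key builders (hypothetical helper names, shown only to pin down the format):

```python
def cover_key(slug: str) -> str:
    """R2 object key for a manga's cover image."""
    return f"manga/{slug}/cover.webp"


def page_key(slug: str, chapter: int, page: int) -> str:
    """R2 object key for one page of one chapter."""
    return f"manga/{slug}/chapters/{chapter}/{page}.webp"
```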

Environment Variables (.env)

R2_ACCOUNT_ID=
R2_ACCESS_KEY=
R2_SECRET_KEY=
R2_BUCKET=
R2_PUBLIC_URL=
DATABASE_URL=postgresql://...
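Reading these variables and building an S3-compatible client against R2's endpoint could look like this sketch (load `.env` first, e.g. with python-dotenv; the helper names are illustrative):

```python
import os


def r2_config() -> dict:
    """Collect the R2 settings from the environment; raises KeyError
    if a required variable is missing."""
    return {
        "account_id": os.environ["R2_ACCOUNT_ID"],
        "access_key": os.environ["R2_ACCESS_KEY"],
        "secret_key": os.environ["R2_SECRET_KEY"],
        "bucket": os.environ["R2_BUCKET"],
    }


def r2_client(cfg: dict):
    """Build an S3-compatible boto3 client pointed at the R2 endpoint."""
    import boto3  # lazy: only needed when actually uploading

    return boto3.client(
        "s3",
        endpoint_url=f"https://{cfg['account_id']}.r2.cloudflarestorage.com",
        aws_access_key_id=cfg["access_key"],
        aws_secret_access_key=cfg["secret_key"],
    )
```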

Future: Multi-site Support

Current code is specific to happymh.com. To add new sites:

  • Extract site-specific logic (chapter fetching, image URL extraction, CF handling) into per-site modules
  • Keep shared infrastructure (Chrome management, image download, upload) in common modules
  • Each site module implements: fetch_chapters(page, slug), get_chapter_images(page, slug, chapter_id), fetch_metadata(page)
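One way to pin down that per-site interface is a `typing.Protocol`; the method names mirror the list above, but the exact signatures here are a proposal, not existing code:

```python
from typing import Any, Protocol


class Site(Protocol):
    """Structural interface each per-site module would satisfy."""

    def fetch_chapters(self, page: Any, slug: str) -> list[dict]: ...
    def get_chapter_images(self, page: Any, slug: str, chapter_id: str) -> list[str]: ...
    def fetch_metadata(self, page: Any) -> dict: ...


class HappymhSite:
    """Stub showing structural conformance (no real scraping logic)."""

    def fetch_chapters(self, page: Any, slug: str) -> list[dict]:
        return []

    def get_chapter_images(self, page: Any, slug: str, chapter_id: str) -> list[str]:
        return []

    def fetch_metadata(self, page: Any) -> dict:
        return {}


def chapter_count(site: Site, page: Any, slug: str) -> int:
    """Shared code depends only on the Site protocol, not on happymh."""
    return len(site.fetch_chapters(page, slug))
```

Because `Protocol` checks structure rather than inheritance, existing happymh functions could be grouped into a class like this without touching the shared Chrome/upload infrastructure.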