Lazy config, cross-platform support, session recovery, doc accuracy

Code:
- Defer boto3 client and DATABASE_URL reads to first use via _ensure_config(). Missing .env now prints a friendly "Missing env vars" list and exits instead of KeyError on import.
- Auto-detect Chrome binary from CHROME_CANDIDATES (macOS/Linux/Windows paths). Friendly error listing tried paths if none found.
- Guard termios/tty imports; EscListener becomes a no-op on Windows.
- hide_chrome() is a no-op on non-macOS (osascript only works on Darwin).
- with_browser catches target-closed/disconnected errors, resets the session singleton, and retries once before raising.

Docs:
- Fix claim that page.goto is never used — manga listing uses page.goto, only reader pages use window.location.href.
- Correct AppleScript command (full tell-application form).
- Clarify "Check missing pages" flow — re-upload is inline; dim-only fix reads bytes from R2 without re-upload.
- Add CREATE TABLE statements for Manga/Chapter/Page so schema contract is explicit.
- Add "Where to change what" table mapping tasks to code locations.
- Document lazy config, cross-platform constraints, and anti-patterns (headless, thread parallelism).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent 051b2e191f
commit 9cb9b8c7fd

CLAUDE.md: 173 lines changed
@@ -4,64 +4,157 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project Overview
 
-Manga downloader and uploader toolkit. Currently supports m.happymh.com, designed for future multi-site support.
-
-- `manga.py` — Single interactive CLI. Download, upload, and sync manga. Launches real Chrome via subprocess, connects via CDP, bypasses Cloudflare. Uploads to R2 + PostgreSQL.
+Single-file interactive toolkit (`manga.py`) that downloads manga from m.happymh.com, stores images in Cloudflare R2 as WebP, and writes metadata to PostgreSQL. Runs as an arrow-key TUI backed by a persistent Chrome session.
+
+## Commands
+
+```bash
+pip install -r requirements.txt   # playwright, boto3, psycopg2-binary, Pillow, python-dotenv, simple-term-menu
+python manga.py                   # launch the TUI (no CLI args)
+```
+
+No tests, no lint config, no build step. Requires Google Chrome or Chromium installed. The script auto-detects from `CHROME_CANDIDATES` (macOS/Linux/Windows paths). R2 and DB credentials load lazily — see `.env` section below.
 
 ## Architecture
 
-### Anti-bot Strategy
-- Chrome launched via `subprocess.Popen` (not Playwright) to avoid automation detection
-- Playwright connects via CDP (`connect_over_cdp`) for scripting only
-- Persistent browser profile in `.browser-data/` preserves Cloudflare sessions
-- All navigation uses JS (`window.location.href`) or `page.goto` with `wait_until="commit"`
-- Images downloaded via `response.body()` from network interception (no base64)
-
-### Data Flow
-1. **Input**: `manga.json` — JSON array of manga URLs
-2. **Download**: Chrome navigates to manga page → API fetches chapter list → navigates to reader pages → intercepts image URLs from API → downloads via browser fetch
-3. **Local storage**: `manga-content/<slug>/` with cover.jpg, detail.json, and chapter folders
-4. **Upload**: Converts JPG→WebP → uploads to R2 → creates DB records
-
-### Key APIs (happymh)
-- Chapter list: `GET /v2.0/apis/manga/chapterByPage?code=<slug>&lang=cn&order=asc&page=<n>`
-- Chapter images: `GET /v2.0/apis/manga/reading?code=<slug>&cid=<chapter_id>` (intercepted from reader page)
-- Cover: Captured from page load traffic (`/mcover/` responses)
-
-## Directory Convention
+### Anti-bot: real Chrome + CDP + persistent profile
+
+Cloudflare fingerprints both the TLS handshake and the browser process. The anti-detection chain matters — changing any link breaks downloads:
+
+1. **`subprocess.Popen(CHROME_PATH, ...)`** launches the user's real Chrome binary, not Playwright's Chromium. This gives a genuine TLS fingerprint.
+2. **`connect_over_cdp`** attaches Playwright to Chrome via DevTools Protocol. Playwright never *launches* Chrome — only sends CDP commands to a separately-running process.
+3. **Persistent `--user-data-dir=.browser-data`** preserves `cf_clearance` cookies between runs. After the user solves Cloudflare once (Setup menu), subsequent runs skip the challenge.
+4. **Single session (`_session_singleton`)** — Chrome is lazy-started on first operation and reused across all commands in one `python manga.py` run. Closed only on Quit. `with_browser(func)` catches "target closed" / "disconnected" errors, resets the singleton, and retries once.
+5. **`hide_chrome()`** runs `osascript -e 'tell application "System Events" to set visible of process "Google Chrome" to false'` after launch so the window doesn't steal focus. No-op on non-macOS.
+
+**Do not switch to headless mode.** Tried — Cloudflare blocks it because the fingerprint differs from real Chrome. **Do not parallelize chapter work across threads** with Playwright's sync API — each thread would need its own event loop and crashes with "no running event loop".
 
+### Cloudflare handling
+
+`wait_for_cloudflare(session)` polls `page.title()` and `page.url` for the "Just a moment" / `/challenge` markers. Recovery is manual: the user is shown the browser window and solves the CAPTCHA. The Setup menu (`cmd_setup`) is the dedicated flow for this. During sync/check-missing, if the reading API returns 403, the script prints "CF blocked — run Setup" and stops.
+
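The polling check described above reduces to a small predicate over the page title and URL. A minimal sketch, assuming the "Just a moment" / `/challenge` markers named in the doc; the function names and the injected `get_state` callable are illustrative, not the script's actual code:

```python
def looks_like_cf_challenge(title: str, url: str) -> bool:
    """True when the title or URL carries a Cloudflare challenge marker."""
    return "just a moment" in title.lower() or "/challenge" in url


def wait_until_clear(get_state, poll=lambda: None, max_polls=10):
    """Poll (title, url) pairs until the challenge markers disappear."""
    for _ in range(max_polls):
        title, url = get_state()
        if not looks_like_cf_challenge(title, url):
            return True
        poll()  # the real loop sleeps between page.title()/page.url reads
    return False
```

In the real flow the loop never solves anything itself; it only detects when the user has cleared the challenge in the visible browser window.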
+### Navigation: `page.goto` vs JS assignment
+
+- **Manga listing page** (`/manga/<slug>`) uses `page.goto(..., wait_until="commit")`. Works because Cloudflare on this route is lenient.
+- **Reader page** (`/mangaread/<slug>/<id>`) uses `page.evaluate("window.location.href = '...'")` — bypasses CF's detection of CDP `Page.navigate` for the stricter reader route.
+
+### Image pipeline (happymh)
+
+Per chapter (in `_try_get_chapter_images`):
+
+1. Register a response listener that matches `/apis/manga/reading` **AND** `cid=<chapter_id>` in the URL **AND** validates that `data.id` in the response body matches. Drops pre-fetched neighbouring chapters.
+2. Navigate the reader URL via `window.location.href` assignment.
+3. DOM-count sanity check: `[class*="imgContainer"]` total minus `[class*="imgNext"]` gives the current chapter's actual page count. Trim the captured list if it includes next-chapter previews.
+4. `fetch_image_bytes(page, img)` runs `fetch(url)` via `page.evaluate` inside a `page.expect_response(...)` block. The body is read via CDP (`response.body()`) — zero base64 overhead. Fallback strips the `?q=50` query if the original URL fails.
+5. `fetch_all_pages(page, images, max_attempts=3)` retries each failed page up to 3 times with 2s backoff between rounds. Returns `{page_num: bytes}` for successful fetches.
+
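The retry shape in step 5 can be sketched independently of Playwright. Here `fetch_one` stands in for the real per-page browser fetch; the function name and injected-callable design are illustrative, not the script's exact signature:

```python
import time


def fetch_all_pages_sketch(fetch_one, page_urls, max_attempts=3, backoff=2.0):
    """Retry failed pages in rounds; return {page_num: bytes} for successes."""
    results = {}
    pending = dict(enumerate(page_urls, start=1))  # 1-based page numbers
    for attempt in range(max_attempts):
        if not pending:
            break
        if attempt:                    # back off between rounds, not pages
            time.sleep(backoff)
        failed = {}
        for num, url in pending.items():
            data = fetch_one(url)      # bytes on success, None on failure
            if data is None:
                failed[num] = url
            else:
                results[num] = data
        pending = failed
    return results
```

Retrying in whole rounds (rather than per page) is what makes the 2s backoff a single pause between sweeps instead of a delay on every failed page.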
+### R2 + DB write ordering
+
+**Page rows are inserted into the DB only after the R2 upload succeeds.** This prevents orphan DB records pointing to missing R2 objects. Every `INSERT INTO "Page"` includes `width` and `height` read from the JPEG/WebP bytes via PIL (`Image.open(...).width`).
+
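A minimal sketch of that ordering rule, with a stub uploader in place of R2 and a plain callback in place of the DB insert (all names hypothetical):

```python
def upload_then_record(pages, upload, insert_row):
    """Insert a Page row only after its upload succeeds, so a failed
    upload leaves no orphan DB record pointing at a missing object."""
    for num, data in pages.items():
        try:
            url = upload(f"chapters/1/{num}.webp", data)  # may raise
        except Exception:
            continue        # no row at all when the upload failed
        insert_row({"number": num, "imageUrl": url})
```

The inverse ordering (insert first, upload second) is exactly what produces the orphan rows the doc warns about.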
+### Storage layouts
+
 ```
-manga-content/
-  <slug>/
-    detail.json   # metadata (title, author, genres, description, cover URL)
-    cover.jpg     # cover image captured from page traffic
-    1 <chapter-name>/   # chapter folder (ordered by API sequence)
-      1.jpg
-      2.jpg
-      ...
-```
-
-## R2 Storage Layout
-
-```
+# Local (download command)
+manga-content/<slug>/detail.json   # title, author, genres, description, mg-cover URL
+manga-content/<slug>/cover.jpg     # captured from page load traffic
+manga-content/<slug>/<N> <chapter>/<page>.jpg
+
+# R2 (upload / sync)
 manga/<slug>/cover.webp
-manga/<slug>/chapters/<number>/<page>.webp
+manga/<slug>/chapters/<N>/<page>.webp
 ```
 
-## Environment Variables (.env)
+Chapter order is the API's ascending index (1-based). Chapter names can repeat (announcements, extras) so the DB `Chapter.number` column uses this index, not parsed chapter titles.
+
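The R2 key scheme and the 1-based chapter index can be captured in two tiny helpers (hypothetical names; the script builds these strings inline):

```python
def r2_page_key(slug, chapter_number, page_number):
    """R2 object key for one page: manga/<slug>/chapters/<N>/<page>.webp"""
    return f"manga/{slug}/chapters/{chapter_number}/{page_number}.webp"


def number_chapters(titles):
    """Chapter.number is the API's ascending index (1-based), because
    titles may repeat (announcements, extras) and can't be parsed safely."""
    return [(i, t) for i, t in enumerate(titles, start=1)]
```

Keying on the positional index rather than the title is what keeps `UNIQUE ("mangaId", number)` safe even when two chapters share a name.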
+### Menu actions
+
+- **Setup** (`cmd_setup`) → brings Chrome to front, user solves CF, validates the `cf_clearance` cookie.
+- **Download** (`cmd_download`) → picks a URL from `manga.json`, optional chapter multi-select; saves JPGs locally.
+- **Upload** (`cmd_upload` → `upload_manga_to_r2`) → converts local JPGs → WebP, uploads to R2, writes DB rows.
+- **Sync** (`cmd_sync`) → combined download+upload via RAM (no local files), refreshes `Manga` row metadata, only inserts chapters missing from the DB.
+- **R2 / DB management** submenu (`tui_r2_manage`):
+  - **Status** — single-pass R2 object count grouped by slug, plus DB row counts
+  - **Edit manga info** (`tui_edit_manga`) — title/description/genre/status/coverUrl
+  - **Delete specific manga** — R2 prefix + cascade DB delete
+  - **Delete specific chapter** (`tui_delete_chapter`) — multi-select or "All chapters"
+  - **Check missing pages** (`tui_check_missing_pages`) — for each chapter: if site page count ≠ R2 count, re-upload **inline** (browser still on that reader page); if counts match but DB `width`/`height` are NULL or 0, fix by reading WebP bytes from R2 (no re-upload)
+  - **Clear ALL (R2 + DB)**
+  - **Recompress manga** (`r2_recompress`) — re-encodes every WebP under `manga/<slug>/` at quality=65, overwrites in place
+
+### WebP encoding
+
+`_to_webp_bytes(img, quality=WEBP_QUALITY=75, method=6)` — method=6 is the slowest/smallest preset. Covers use quality 80 via `make_cover` (crops to a 400×560 aspect, then resizes). Resize-during-encode was explicitly removed — page originals' dimensions are preserved.
+
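The encoder described above is roughly the following. This is a sketch assuming Pillow; `WEBP_QUALITY` comes from the doc, the underscore-free function name and the rest are illustrative:

```python
import io

from PIL import Image

WEBP_QUALITY = 75


def to_webp_bytes(img: Image.Image, quality: int = WEBP_QUALITY) -> bytes:
    """Encode a PIL image as WebP. method=6 is the slowest/smallest preset.
    No resizing here: page originals keep their dimensions."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="WEBP", quality=quality, method=6)
    return buf.getvalue()
```

Pillow's `method` parameter trades encode time for file size (0 fastest, 6 smallest), which is why a batch uploader can afford the slow preset.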
+### ESC to stop
+
+`EscListener` puts stdin in cbreak mode (POSIX `termios`+`tty`) and runs a daemon thread listening for `\x1b`. Download/Upload/Sync check `esc.stop.is_set()` between chapters and exit cleanly. The terminal mode is restored on `__exit__`. No-op on Windows (no termios) and when stdin isn't a TTY.
+
+### Lazy config loading
+
+`_ensure_config()` is called at the start of each R2/DB helper. It reads the required env vars and constructs the boto3 client on first use. If env vars are missing, it prints the missing list and calls `sys.exit(1)` — no KeyError traceback on import. `s3`, `BUCKET`, `PUBLIC_URL`, `DATABASE_URL` are module globals set by that call.
+
+## Environment variables (.env)
+
 ```
-R2_ACCOUNT_ID=
+R2_ACCOUNT_ID=      # cloudflare account id
 R2_ACCESS_KEY=
 R2_SECRET_KEY=
 R2_BUCKET=
-R2_PUBLIC_URL=
-DATABASE_URL=postgresql://...
+R2_PUBLIC_URL=      # e.g. https://pub-xxx.r2.dev (trailing slash stripped)
+DATABASE_URL=       # postgresql://user:pass@host:port/dbname
 ```
 
-## Future: Multi-site Support
-
-Current code is specific to happymh.com. To add new sites:
-- Extract site-specific logic (chapter fetching, image URL extraction, CF handling) into per-site modules
-- Keep shared infrastructure (Chrome management, image download, upload) in common modules
-- Each site module implements: `fetch_chapters(page, slug)`, `get_chapter_images(page, slug, chapter_id)`, `fetch_metadata(page)`
+Missing any of these produces a friendly error on first R2/DB operation, not on import.
+
+## DB schema expectations
+
+The script reads/writes but does **not** create tables. Create them externally:
+
+```sql
+CREATE TABLE "Manga" (
+  id          SERIAL PRIMARY KEY,
+  slug        TEXT UNIQUE NOT NULL,
+  title       TEXT NOT NULL,
+  description TEXT,
+  "coverUrl"  TEXT,
+  genre       TEXT,           -- comma-joined list of all genres
+  status      TEXT NOT NULL,  -- PUBLISHED | DRAFT | HIDDEN
+  "createdAt" TIMESTAMPTZ NOT NULL,
+  "updatedAt" TIMESTAMPTZ NOT NULL
+);
+
+CREATE TABLE "Chapter" (
+  id        SERIAL PRIMARY KEY,
+  "mangaId" INTEGER NOT NULL REFERENCES "Manga"(id),
+  number    INTEGER NOT NULL,  -- 1-based index from the API order
+  title     TEXT NOT NULL,
+  UNIQUE ("mangaId", number)
+);
+
+CREATE TABLE "Page" (
+  id          SERIAL PRIMARY KEY,
+  "chapterId" INTEGER NOT NULL REFERENCES "Chapter"(id),
+  number      INTEGER NOT NULL,  -- 1-based page number
+  "imageUrl"  TEXT NOT NULL,
+  width       INTEGER,
+  height      INTEGER,
+  UNIQUE ("chapterId", number)
+);
+```
+
+Column identifiers are camelCase with double quotes — matches Prisma's default naming.
+
+## Where to change what
+
+| Task | Location |
+|---|---|
+| Add a new site | Extract happymh-specific bits: `fetch_chapters_via_api`, `fetch_chapters_from_dom`, `fetch_metadata`, `_try_get_chapter_images`, the `/mcover/` cover capture in `load_manga_page`, the reader URL shape. Keep Chrome/R2/DB/TUI as common. |
+| New menu item | Add to the `show_menu` list in `main` and dispatch in the `if idx == N:` ladder. For R2/DB ops, add to `tui_r2_manage`. |
+| Tweak CF detection | `wait_for_cloudflare` / `_wait_for_cf_on_page` — edit the title/URL heuristics carefully; both ops check the same signals. |
+| Change image quality | `WEBP_QUALITY` at top of file; cover quality is hard-coded to 80 in `make_cover`. |
+| Add a new Page-table column | Update all three `INSERT INTO "Page"` sites (`upload_manga_to_r2`, `cmd_sync`, the `tui_check_missing_pages` re-upload branch) and the `SELECT ... FROM "Page"` in the dim-check query. |
+| Change parallelism | `UPLOAD_WORKERS` for R2 uploads; do **not** introduce chapter-level threading (sync Playwright breaks). |
+
+## Future: multi-site support
+
+Current code is happymh-specific (selectors, API paths, URL patterns). To generalise, a site module would implement `fetch_chapters(page, slug)`, `get_chapter_images(page, slug, chapter_id)`, and `fetch_metadata(page)`, keeping the Chrome/R2/DB/TUI layer common.
manga.py: 114 lines changed
@@ -8,15 +8,26 @@ Usage:
 import io
 import json
 import os
+import platform
 import re
 import select
 import sys
 import time
 import socket
 import subprocess
-import termios
 import threading
-import tty
+
+IS_MACOS = platform.system() == "Darwin"
+
+# POSIX-only TTY modules; EscListener is a no-op on Windows.
+try:
+    import termios
+    import tty
+    _HAS_TERMIOS = True
+except ImportError:
+    termios = None
+    tty = None
+    _HAS_TERMIOS = False
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from pathlib import Path
 from urllib.parse import urlparse
@@ -40,19 +51,58 @@ BROWSER_DATA = ROOT_DIR / ".browser-data"
 CDP_PORT = 9333
 REQUEST_DELAY = 1.5
 UPLOAD_WORKERS = 8
-CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
-
-# R2
-s3 = boto3.client(
-    "s3",
-    endpoint_url=f"https://{os.environ['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
-    aws_access_key_id=os.environ["R2_ACCESS_KEY"],
-    aws_secret_access_key=os.environ["R2_SECRET_KEY"],
-    region_name="auto",
-)
-BUCKET = os.environ["R2_BUCKET"]
-PUBLIC_URL = os.environ["R2_PUBLIC_URL"].rstrip("/")
-DATABASE_URL = os.environ["DATABASE_URL"]
+
+CHROME_CANDIDATES = [
+    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",  # macOS
+    "/usr/bin/google-chrome",  # Linux
+    "/usr/bin/google-chrome-stable",
+    "/usr/bin/chromium",
+    "/usr/bin/chromium-browser",
+    r"C:\Program Files\Google\Chrome\Application\chrome.exe",  # Windows
+    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
+]
+
+
+def _find_chrome():
+    for p in CHROME_CANDIDATES:
+        if Path(p).exists():
+            return p
+    return None
+
+
+CHROME_PATH = _find_chrome()
+
+
+# R2/DB config loaded lazily so missing .env gives a friendly error, not KeyError on import.
+_REQUIRED_ENV = ("R2_ACCOUNT_ID", "R2_ACCESS_KEY", "R2_SECRET_KEY", "R2_BUCKET", "R2_PUBLIC_URL", "DATABASE_URL")
+s3 = None
+BUCKET = None
+PUBLIC_URL = None
+DATABASE_URL = None
+_config_loaded = False
+
+
+def _ensure_config():
+    global s3, BUCKET, PUBLIC_URL, DATABASE_URL, _config_loaded
+    if _config_loaded:
+        return
+    missing = [k for k in _REQUIRED_ENV if not os.environ.get(k)]
+    if missing:
+        print("Missing env vars (check .env):")
+        for k in missing:
+            print(f"  {k}")
+        sys.exit(1)
+    s3 = boto3.client(
+        "s3",
+        endpoint_url=f"https://{os.environ['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
+        aws_access_key_id=os.environ["R2_ACCESS_KEY"],
+        aws_secret_access_key=os.environ["R2_SECRET_KEY"],
+        region_name="auto",
+    )
+    BUCKET = os.environ["R2_BUCKET"]
+    PUBLIC_URL = os.environ["R2_PUBLIC_URL"].rstrip("/")
+    DATABASE_URL = os.environ["DATABASE_URL"]
+    _config_loaded = True
+
+
 # ── ESC listener ───────────────────────────────────────────
@@ -68,7 +118,7 @@ class EscListener:
         self._fd = None
 
     def __enter__(self):
-        if not sys.stdin.isatty():
+        if not _HAS_TERMIOS or not sys.stdin.isatty():
             return self
         self._fd = sys.stdin.fileno()
         try:
@@ -105,7 +155,9 @@ class EscListener:
 
 
 def hide_chrome():
-    """Hide Chrome window on macOS."""
+    """Hide Chrome window (macOS only; no-op elsewhere)."""
+    if not IS_MACOS:
+        return
     try:
         subprocess.Popen(
             ["osascript", "-e",
@@ -124,8 +176,11 @@ def is_port_open(port):
 def launch_chrome(start_url=None):
     if is_port_open(CDP_PORT):
         return None
-    if not Path(CHROME_PATH).exists():
-        print(f"  Chrome not found at: {CHROME_PATH}")
+    if not CHROME_PATH or not Path(CHROME_PATH).exists():
+        print("  Chrome not found. Install Google Chrome or Chromium.")
+        print("  Searched:")
+        for p in CHROME_CANDIDATES:
+            print(f"    {p}")
         return None
     cmd = [
         CHROME_PATH,
@@ -198,8 +253,18 @@ def close_session():
 
 
 def with_browser(func):
-    """Run func(session) using the persistent Chrome session."""
-    return func(get_session())
+    """Run func(session) using the persistent Chrome session.
+
+    If the session crashed (target closed etc.), reset and retry once."""
+    session = get_session()
+    try:
+        return func(session)
+    except Exception as e:
+        msg = str(e).lower()
+        if "target" in msg or "browser" in msg or "closed" in msg or "disconnected" in msg:
+            print("  Browser session lost, restarting...")
+            close_session()
+            return func(get_session())
+        raise
 
 
 # ── Cloudflare ─────────────────────────────────────────────
@@ -674,11 +739,13 @@ def make_cover(source, width=400, height=560):
 
 
 def upload_to_r2(key, data, content_type="image/webp"):
+    _ensure_config()
     s3.put_object(Bucket=BUCKET, Key=key, Body=data, ContentType=content_type)
     return f"{PUBLIC_URL}/{key}"
 
 
 def r2_key_exists(key):
+    _ensure_config()
     try:
         s3.head_object(Bucket=BUCKET, Key=key)
         return True
@@ -687,6 +754,7 @@ def r2_key_exists(key):
 
 
 def get_db():
+    _ensure_config()
     conn = psycopg2.connect(DATABASE_URL)
     conn.set_client_encoding("UTF8")
     return conn
@@ -1242,6 +1310,7 @@ def cmd_sync(manga_url=None):
 
 def r2_list_prefixes():
     """List manga slugs in R2 by scanning top-level prefixes under manga/."""
+    _ensure_config()
     slugs = set()
     paginator = s3.get_paginator("list_objects_v2")
     for pg in paginator.paginate(Bucket=BUCKET, Prefix="manga/", Delimiter="/"):
@@ -1255,6 +1324,7 @@ def r2_list_prefixes():
 
 def r2_count_by_prefix(prefix):
     """Count objects under a prefix."""
+    _ensure_config()
     total = 0
     for pg in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
         total += len(pg.get("Contents", []))
@@ -1263,6 +1333,7 @@ def r2_count_by_prefix(prefix):
 
 def r2_delete_prefix(prefix):
     """Delete all objects under a prefix."""
+    _ensure_config()
     total = 0
     batches = []
     for pg in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
@@ -1284,6 +1355,7 @@ def r2_delete_prefix(prefix):
 
 def r2_recompress(slug, quality=65):
     """Download all webp images for a manga, re-encode at lower quality, re-upload."""
+    _ensure_config()
     prefix = f"manga/{slug}/"
     keys = []
     for pg in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
@@ -1865,7 +1937,7 @@ def tui_r2_manage():
             break
 
         elif idx == 0:
-            # Count R2 objects in single pass
+            _ensure_config()
             slug_counts = {}
             total = 0
             for pg in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET):