# R3: Scraper Resilience Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace the bare `soup.find(...)['value']` pattern in `app/cm_bot.py` with a helper that raises a typed `ScraperError` and dumps the failing HTML to `logs/scraper-failures/` for postmortem.

**Architecture:** Add a module-level `ScraperError` exception and two `CM_BOT` methods, `_dump_html` and `_find_input_value`; convert the six existing call sites that use the `<input ... value="...">` pattern; extend the `get_register_link` and `get_user_credit` failure paths to dump HTML. Tests live in a new `tests/test_cm_bot_scraper.py`.

**Tech Stack:** Python 3.9 (containers) / 3.12 (local venv), `unittest` + `unittest.mock` (stdlib), `BeautifulSoup` (existing dep). No new dependencies.

**Spec:** [docs/superpowers/specs/2026-05-02-r3-scraper-resilience-design.md](../specs/2026-05-02-r3-scraper-resilience-design.md)

---

## File Map

| File | Operation | Purpose |
|---|---|---|
| `tests/test_cm_bot_scraper.py` | Create | Unit tests for `ScraperError`, `_dump_html`, `_find_input_value`. |
| `app/cm_bot.py` | Modify | Add `ScraperError` and the helpers; convert the six input-value extractions (four by `name="token"`, two by `id`); extend `get_register_link` and `get_user_credit`. |

The helpers are instance methods on `CM_BOT` for consistency with the existing class-based methods, even though neither `_dump_html` nor `_find_input_value` needs any instance state. Keeping them as instance methods keeps the API uniform with everything else in `CM_BOT`.

---

## Task 1: Add `ScraperError`, `_dump_html`, `_find_input_value` (TDD)

**Files:**
- Create: `tests/test_cm_bot_scraper.py`
- Modify: `app/cm_bot.py`

- [ ] **Step 1: Write the failing tests**

Create `tests/test_cm_bot_scraper.py`:

```python
"""Tests for the cm_bot scraper resilience helpers.

The CM_BOT class currently uses bare `soup.find(...)['value']` calls
that throw cryptic TypeErrors when cm99.net returns an unexpected
response. R3 introduces three pieces:
  - ScraperError: typed exception so callers can distinguish scraper
    failures from network errors.
  - _dump_html(context, content): writes the failing response to
    logs/scraper-failures/<context>-<ts>.html and returns the path.
  - _find_input_value(soup, name, *, context, raw): the dominant
    extraction pattern. Returns the value on success, dumps + raises
    ScraperError on miss.

These tests do NOT exercise the live cm99.net integration. They use
small inline HTML fixtures and run inside a throwaway working
directory so the tests stay hermetic.
"""

import os
import shutil
import tempfile
import unittest
from unittest import mock

from bs4 import BeautifulSoup

from app.cm_bot import CM_BOT, ScraperError


class ScraperHelpersTests(unittest.TestCase):
    def setUp(self):
        # CM_BOT.__init__ reads CM_BOT_BASE_URL from the env (raises
        # otherwise). Set a placeholder so the class is instantiable;
        # nothing here touches the network. The patch is started here
        # rather than applied as a class decorator because a class-level
        # mock.patch.dict only wraps test_* methods, not setUp.
        env = mock.patch.dict(os.environ, {"CM_BOT_BASE_URL": "https://example.invalid"})
        env.start()
        self.addCleanup(env.stop)
        # Each test gets a fresh tmpdir so the dump helper writes
        # somewhere predictable. We chdir into it for the duration of
        # the test because _dump_html writes to a relative
        # logs/scraper-failures path.
        self._old_cwd = os.getcwd()
        self._tmp = tempfile.mkdtemp(prefix="r3-test-")
        os.chdir(self._tmp)
        self.bot = CM_BOT()

    def tearDown(self):
        os.chdir(self._old_cwd)
        shutil.rmtree(self._tmp, ignore_errors=True)

    # ---- _dump_html ----

    def test_dump_html_creates_dir_and_writes_bytes(self):
        path = self.bot._dump_html("ctx-test", b"<html>hi</html>")
        self.assertTrue(os.path.isfile(path), f"file should exist: {path}")
        with open(path, "rb") as f:
            self.assertEqual(f.read(), b"<html>hi</html>")
        # The file lives under the expected (newly created) directory.
        self.assertTrue(path.startswith(os.path.join("logs", "scraper-failures")))

    def test_dump_html_accepts_str_content(self):
        path = self.bot._dump_html("ctx-test", "<html>hi</html>")
        with open(path, "rb") as f:
            self.assertEqual(f.read(), b"<html>hi</html>")

    def test_dump_html_includes_context_and_timestamp_in_filename(self):
        path = self.bot._dump_html("register_form_token", b"x")
        basename = os.path.basename(path)
        self.assertTrue(basename.startswith("register_form_token-"), basename)
        self.assertTrue(basename.endswith(".html"), basename)

    # ---- _find_input_value ----

    def test_find_input_value_returns_value_when_present(self):
        html = '<form><input name="token" value="abc123"></form>'
        soup = BeautifulSoup(html, "html.parser")
        result = self.bot._find_input_value(
            soup, "token", context="happy_path", raw=html.encode()
        )
        self.assertEqual(result, "abc123")

    def test_find_input_value_raises_and_dumps_when_missing(self):
        html = '<form><input name="other" value="x"></form>'
        soup = BeautifulSoup(html, "html.parser")
        with self.assertRaises(ScraperError) as cm:
            self.bot._find_input_value(
                soup, "token", context="missing_input", raw=html.encode()
            )
        msg = str(cm.exception)
        self.assertIn("missing_input", msg)
        self.assertIn("token", msg)
        # The path mentioned in the message must actually exist. It
        # appears in parentheses at the end: "(response saved to <path>)".
        # We check by listing the dump dir.
        dumped = os.listdir(os.path.join("logs", "scraper-failures"))
        self.assertEqual(len(dumped), 1, f"expected one dump, got {dumped}")
        self.assertTrue(dumped[0].startswith("missing_input-"))

    def test_find_input_value_raises_when_input_has_no_value_attr(self):
        html = '<form><input name="token"></form>'
        soup = BeautifulSoup(html, "html.parser")
        with self.assertRaises(ScraperError):
            self.bot._find_input_value(
                soup, "token", context="no_value_attr", raw=html.encode()
            )

    def test_find_input_value_does_not_dump_on_success(self):
        html = '<form><input name="token" value="abc"></form>'
        soup = BeautifulSoup(html, "html.parser")
        self.bot._find_input_value(
            soup, "token", context="should_not_dump", raw=html.encode()
        )
        # logs/scraper-failures may not even exist on the happy path.
        self.assertFalse(
            os.path.isdir(os.path.join("logs", "scraper-failures")),
            "happy path should not create the failure dir",
        )


if __name__ == "__main__":
    unittest.main()
```

- [ ] **Step 2: Run tests to verify they fail**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  .venv/bin/python -m unittest tests.test_cm_bot_scraper -v 2>&1 | tail -10
```

Expected: `ImportError: cannot import name 'ScraperError' from 'app.cm_bot'` (or similar), because the `ScraperError` class does not exist yet.

- [ ] **Step 3: Add `ScraperError`, `_dump_html`, `_find_input_value` to `app/cm_bot.py`**

In `app/cm_bot.py`, the top of the file currently has:

```python
import datetime
import requests, re
from bs4 import BeautifulSoup
import os
```

Add `ScraperError` immediately after the imports (before `class CM_BOT:`):

```python
class ScraperError(Exception):
    """A cm99.net response did not contain the field we expected.

    The raw response is saved to logs/scraper-failures/ before this is
    raised; the message identifies which method failed and what was
    being looked for.
    """
```

Then add the two helper methods inside `class CM_BOT:`. A natural placement is right after `_setup_headers` and before `get_register_data` (around line 204):

```python
    def _dump_html(self, context: str, content) -> str:
        """Save a failing cm99.net response to logs/scraper-failures/.

        Returns the path written to so callers can include it in error
        messages.
        """
        ts = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        out_dir = os.path.join("logs", "scraper-failures")
        os.makedirs(out_dir, exist_ok=True)
        path = os.path.join(out_dir, f"{context}-{ts}.html")
        if isinstance(content, (bytes, bytearray)):
            data = bytes(content)
        else:
            data = str(content).encode("utf-8", "replace")
        with open(path, "wb") as f:
            f.write(data)
        print(f"[scraper-failure] dumped {context} response to {path}")
        return path

    def _find_input_value(self, soup, name: str, *, context: str, raw) -> str:
        """Extract <input name=NAME value=...>'s value or raise ScraperError.

        Saves the raw response to logs/scraper-failures/ before raising
        so the operator can postmortem.
        """
        el = soup.find("input", {"name": name})
        if el is None or "value" not in el.attrs:
            path = self._dump_html(context, raw)
            raise ScraperError(
                f"{context}: input[name={name!r}] missing or has no value attribute "
                f"(response saved to {path})"
            )
        return el["value"]
```

- [ ] **Step 4: Run tests to verify they pass**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  .venv/bin/python -m unittest tests.test_cm_bot_scraper -v 2>&1 | tail -10
```

Expected: 7 tests, `OK`.

- [ ] **Step 5: Confirm prior tests still pass**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  .venv/bin/python -m unittest tests.test_debug_enabled tests.test_bot_cli tests.test_cm_bot_scraper -v 2>&1 | tail -8
```

Expected: combined `OK`. Total: 2 (debug) + 28 (bot_cli) + 7 (scraper) = 37 tests passing.

- [ ] **Step 6: Commit**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  git add tests/test_cm_bot_scraper.py app/cm_bot.py && \
  git -c user.name='yiekheng' -c user.email='yiekheng@04080616.xyz' \
    commit -m "feat(scraper): add ScraperError + _dump_html + _find_input_value helpers"
```

---

## Task 2: Convert the six input-value extractions to use the helper

**Files:**
- Modify: `app/cm_bot.py` (`get_register_form_token`, `get_security_pin_form_token`, `get_transfer_token`, and three lines inside `transfer_credit`)

The dominant pattern in `cm_bot.py` is `soup.find('input', {'name': 'token'})['value']`. Replacing each call site is mechanical: keep the request, change the extraction.

- [ ] **Step 1: Convert `get_register_form_token`**

Find (around line 344-354):

```python
    def get_register_form_token(self):
        try:
            response = self.session.post(
                f'{self.base_url}/cm/loadUserAccount',
                headers=self.get_register_form_headers
            )
            soup = BeautifulSoup(response.content, 'html.parser')
            return soup.find('input', {'name' : "token"})['value']
        except requests.exceptions.RequestException as e:
            print(f"Error getting register form: {e}")
            return None
```

Replace the `soup.find(...)['value']` line with the helper:

```python
    def get_register_form_token(self):
        try:
            response = self.session.post(
                f'{self.base_url}/cm/loadUserAccount',
                headers=self.get_register_form_headers
            )
            soup = BeautifulSoup(response.content, 'html.parser')
            return self._find_input_value(
                soup, "token",
                context="register_form_token",
                raw=response.content,
            )
        except requests.exceptions.RequestException as e:
            print(f"Error getting register form: {e}")
            return None
```

The `except requests.exceptions.RequestException` clause only catches network errors. `ScraperError` inherits from `Exception`, so it passes that clause and propagates up to the caller in `cm_bot_hal.py`, which catches `except Exception as e`. That is the same path a parse failure took before, just with a useful message instead of a `TypeError`.

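The propagation path described above can be sketched without the real dependencies. This is a minimal illustration, not the project's code: `NetworkError` is a hypothetical stand-in for `requests.exceptions.RequestException`, and `get_token` and `caller` are invented names that only mimic the try/except layout of `get_register_form_token` and a `cm_bot_hal.py`-style caller.

```python
class NetworkError(Exception):
    """Hypothetical stand-in for requests.exceptions.RequestException."""


class ScraperError(Exception):
    """Same shape as the ScraperError added in Task 1."""


def get_token(page_has_token: bool):
    """Mimics the try/except layout of get_register_form_token."""
    try:
        if not page_has_token:
            raise ScraperError("register_form_token: input[name='token'] missing")
        return "abc123"
    except NetworkError as e:
        # Only network failures are handled at this level.
        print(f"Error getting register form: {e}")
        return None


def caller(page_has_token: bool) -> str:
    """Mimics a caller with a broad `except Exception` handler."""
    try:
        return f"token={get_token(page_has_token)}"
    except Exception as e:
        # ScraperError arrives here because the inner handler ignored it.
        return f"failed: {e}"


print(caller(True))
print(caller(False))
```

The second call reaches the broad handler with the descriptive message intact, which is the behavior the conversion relies on.
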
- [ ] **Step 2: Convert `get_security_pin_form_token`**

Find (around line 357-360):

```python
    def get_security_pin_form_token(self):
        response = self.session.get(f'{self.base_url}/cm/setSecurityPin')
        soup = BeautifulSoup(response.content, 'html.parser')
        return soup.find('input', {'name' : "token"})['value']
```

Replace with:

```python
    def get_security_pin_form_token(self):
        response = self.session.get(f'{self.base_url}/cm/setSecurityPin')
        soup = BeautifulSoup(response.content, 'html.parser')
        return self._find_input_value(
            soup, "token",
            context="security_pin_form_token",
            raw=response.content,
        )
```

- [ ] **Step 3: Convert `get_transfer_token`**

Find (around line 463-466):

```python
    def get_transfer_token(self):
        response = self.session.get(f'{self.base_url}/cm/transfer')
        soup = BeautifulSoup(response.content, 'html.parser')
        return soup.find('input', {'name' : "token"})['value']
```

Replace with:

```python
    def get_transfer_token(self):
        response = self.session.get(f'{self.base_url}/cm/transfer')
        soup = BeautifulSoup(response.content, 'html.parser')
        return self._find_input_value(
            soup, "token",
            context="transfer_token",
            raw=response.content,
        )
```

- [ ] **Step 4: Convert the three extractions inside `transfer_credit`**

Find (around line 426-446):

```python
    def transfer_credit(self, t_username: str, t_password: str, amount: float):
        token = self.get_transfer_token()
        transfer_search_data = self.get_transfer_search_data(token, t_username)
        response = self.session.post(
            f'{self.base_url}/cm/searchTransferUser',
            data=transfer_search_data,
            headers=self.transfer_search_headers
        )
        soup = BeautifulSoup(response.content, 'html.parser')
        name = soup.find('input', {'id': "name"})['value']
        token = soup.find('input', {'name': "token"})['value']
        toUserId = soup.find('input', {'id': "toUserId"})['value']
```

This block uses two different finders: `{'name': X}` for `token`, and `{'id': X}` for `name` and `toUserId`. The `_find_input_value` helper as written only handles `{'name': X}`. We have two options:

- **Option A: extend the helper.** Add an optional `by` parameter (`'name'` or `'id'`).
- **Option B: keep `_find_input_value` narrow** and write inline checks for the `id`-based ones.

We pick Option A. It's a one-parameter widening with a default of `"name"`, so existing call sites are unchanged.

In `app/cm_bot.py`, update the helper signature:

```python
    def _find_input_value(self, soup, ident: str, *, context: str, raw, by: str = "name") -> str:
        """Extract <input {by}=IDENT value=...>'s value or raise ScraperError.

        `by` selects between matching <input name=...> (default) and
        <input id=...>. Saves the raw response to logs/scraper-failures/
        before raising so the operator can postmortem.
        """
        el = soup.find("input", {by: ident})
        if el is None or "value" not in el.attrs:
            path = self._dump_html(context, raw)
            raise ScraperError(
                f"{context}: input[{by}={ident!r}] missing or has no value attribute "
                f"(response saved to {path})"
            )
        return el["value"]
```

The helper's first parameter is now called `ident` instead of `name`. Add a test for the `by="id"` path by appending to `tests/test_cm_bot_scraper.py` inside `ScraperHelpersTests`:

```python
    def test_find_input_value_supports_by_id(self):
        html = '<form><input id="toUserId" value="42"></form>'
        soup = BeautifulSoup(html, "html.parser")
        result = self.bot._find_input_value(
            soup, "toUserId", context="by_id", raw=html.encode(), by="id",
        )
        self.assertEqual(result, "42")
```

The four existing test methods that look up `"token"` need no changes: they pass the identifier positionally, so the `name` → `ident` rename does not affect them.

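Why a positional rename is safe can be shown in isolation. The sketch below uses two hypothetical stand-in functions, `find_v1` and `find_v2`, that copy only the parameter layout of the helper before and after the change; they do no HTML parsing.

```python
def find_v1(soup, name, *, context):
    """Parameter layout of the Task 1 helper (stand-in, no parsing)."""
    return f"{context}:{name}"


def find_v2(soup, ident, *, context, by="name"):
    """Parameter layout after the Task 2 widening (stand-in, no parsing)."""
    return f"{context}:{by}={ident}"


def keyword_call_breaks() -> bool:
    """A caller that used the old keyword `name=` fails after the rename."""
    try:
        find_v2(None, name="token", context="t")
    except TypeError:
        return True
    return False


# The same positional call text is valid against both layouts.
print(find_v1(None, "token", context="t"))
print(find_v2(None, "token", context="t"))
print(keyword_call_breaks())
```

The one case to watch for is a call site that passes `name=` as a keyword; a quick grep for `_find_input_value(` with `name=` would confirm none exist.
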
Now replace the body of `transfer_credit`:

```python
    def transfer_credit(self, t_username: str, t_password: str, amount: float):
        token = self.get_transfer_token()
        transfer_search_data = self.get_transfer_search_data(token, t_username)
        response = self.session.post(
            f'{self.base_url}/cm/searchTransferUser',
            data=transfer_search_data,
            headers=self.transfer_search_headers
        )
        soup = BeautifulSoup(response.content, 'html.parser')
        name = self._find_input_value(
            soup, "name", context="transfer_search_name", raw=response.content, by="id",
        )
        token = self._find_input_value(
            soup, "token", context="transfer_search_token", raw=response.content,
        )
        toUserId = self._find_input_value(
            soup, "toUserId", context="transfer_search_toUserId", raw=response.content, by="id",
        )
        transfer_data = self.get_transfer_data(token, t_username, name, toUserId, amount, t_password)
        response = self.session.post(
            f'{self.base_url}/cm/saveTransfer',
            data=transfer_data,
            headers=self.transfer_credit_headers
        )
        return True if re.search(r'Successfully saved the record\.', response.text) else False
```

The rest of `transfer_credit` (the second POST and the success-string check) stays identical. The commented-out `# with open('transfer_credit.html', ...)` block at the end can be deleted as part of this edit, since the dump now happens automatically on a parse miss.

- [ ] **Step 5: Run tests to verify everything still passes**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  .venv/bin/python -m unittest tests.test_cm_bot_scraper -v 2>&1 | tail -10
```

Expected: 8 tests, `OK` (seven original + one new for `by="id"`).

- [ ] **Step 6: Confirm full suite still green**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  .venv/bin/python -m unittest tests.test_debug_enabled tests.test_bot_cli tests.test_cm_bot_scraper -v 2>&1 | tail -8
```

Expected: total 38 tests, `OK`.

- [ ] **Step 7: Commit**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  git add tests/test_cm_bot_scraper.py app/cm_bot.py && \
  git -c user.name='yiekheng' -c user.email='yiekheng@04080616.xyz' \
    commit -m "refactor(scraper): convert input-value extractions to helper"
```

---

## Task 3: Make `get_register_link` and `get_user_credit` failure paths informative

**Files:**
- Modify: `app/cm_bot.py` (`get_register_link`, `get_user_credit`)

These two methods don't fit the input-value helper. `get_register_link` extracts an `<a href="...">` from a specific form; `get_user_credit` does multi-step text-content navigation through a table. We add explicit failure handling to each: dump+raise for `get_register_link`, dump+log for `get_user_credit`.

- [ ] **Step 1: Update `get_register_link`**

Find (around line 402-406):

```python
    def get_register_link(self):
        response = self.session.get(f"{self.base_url}/cm/showQrCode")
        soup = BeautifulSoup(response.content, 'html.parser')
        soup = soup.find('form', {'id': 'qrCodeForm'})
        return soup.find('a')['href']
```

Replace with:

```python
    def get_register_link(self):
        response = self.session.get(f"{self.base_url}/cm/showQrCode")
        soup = BeautifulSoup(response.content, 'html.parser')
        form = soup.find('form', {'id': 'qrCodeForm'})
        if form is None:
            path = self._dump_html("register_link_form", response.content)
            raise ScraperError(
                f"register_link: form#qrCodeForm not found "
                f"(response saved to {path})"
            )
        anchor = form.find('a')
        if anchor is None or 'href' not in anchor.attrs:
            path = self._dump_html("register_link_anchor", response.content)
            raise ScraperError(
                f"register_link: <a href> inside form#qrCodeForm not found "
                f"(response saved to {path})"
            )
        return anchor['href']
```

- [ ] **Step 2: Update `get_user_credit`'s except block**

Find (around line 448-461):

```python
    def get_user_credit(self):
        response = self.session.post(
            f'{self.base_url}/cm/userProfile',
            headers=self.get_user_credit_headers
        )
        soup = BeautifulSoup(response.content, 'html.parser')
        try:
            return round(float(soup.find('table', {'class': 'generalContent'}).find(text=re.compile('Credit Available')).parent.parent.find_all('td')[2].text.replace(",","")), 2)
        except:
            print(f"Error getting credit.")
            now = datetime.datetime.now().strftime('%Y%m%d_%H%M')
            # with open(f'credit-{now}.html', 'wb') as f:
            #     f.write(response.content)
            return 0
```

Replace the `except:` block so it actively dumps the HTML (uncomment the previously-commented dump and route it through the helper):

```python
    def get_user_credit(self):
        response = self.session.post(
            f'{self.base_url}/cm/userProfile',
            headers=self.get_user_credit_headers
        )
        soup = BeautifulSoup(response.content, 'html.parser')
        try:
            return round(float(soup.find('table', {'class': 'generalContent'}).find(text=re.compile('Credit Available')).parent.parent.find_all('td')[2].text.replace(",","")), 2)
        except Exception as exc:
            self._dump_html("get_user_credit", response.content)
            print(f"Error getting credit: {exc}")
            return 0
```

Three changes inside the `except`: catch `Exception as exc` (was a bare `except`), call `_dump_html` (was a commented-out `with open(...)`), and drop the now-unused `now = datetime.datetime.now()...` line. Narrowing the bare `except` to `Exception as exc` is intentional: the original bare `except` also caught `KeyboardInterrupt` and `SystemExit`, which a credit read should not swallow.

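The difference between the two handlers is standard Python behavior and can be checked in isolation. In this sketch, `swallow_everything`, `swallow_exceptions_only`, and `outcome` are illustrative names, not part of `cm_bot.py`.

```python
def swallow_everything(exc):
    """Old style: a bare except catches every BaseException subclass."""
    try:
        raise exc
    except:  # noqa: E722 (bare except shown only for comparison)
        return "swallowed"


def swallow_exceptions_only(exc):
    """New style: KeyboardInterrupt and SystemExit are not Exception subclasses."""
    try:
        raise exc
    except Exception:
        return "swallowed"


def outcome(handler, exc):
    """Report whether the handler swallowed the exception or let it escape."""
    try:
        return handler(exc)
    except BaseException as escaped:
        return f"propagated {type(escaped).__name__}"


for exc in (ValueError("bad"), KeyboardInterrupt(), SystemExit(1)):
    print(
        type(exc).__name__,
        outcome(swallow_everything, exc),
        outcome(swallow_exceptions_only, exc),
        sep=" | ",
    )
```

With `except Exception`, an operator pressing Ctrl-C during a credit read interrupts the process instead of producing a spurious `0` and an HTML dump.
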
The function still returns `0` on failure to preserve the existing contract (callers in `cm_bot_hal.py:transfer_credit_api` check `amount <= 0.01` and short-circuit). We do not change that.

- [ ] **Step 3: Run all tests**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  .venv/bin/python -m unittest tests.test_debug_enabled tests.test_bot_cli tests.test_cm_bot_scraper -v 2>&1 | tail -8
```

Expected: 38 tests, `OK`. (No new tests in this task: the changed methods are integration-level and would need live cm99.net or HTML fixtures to exercise. The two methods' happy paths are unchanged; their failure paths are dump+raise/log, and the dump step is independently exercised by Task 1's helper tests.)

- [ ] **Step 4: Commit**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  git add app/cm_bot.py && \
  git -c user.name='yiekheng' -c user.email='yiekheng@04080616.xyz' \
    commit -m "refactor(scraper): make get_register_link and get_user_credit dump on failure"
```

---

## Task 4: Final verification

**Files:** none modified.

- [ ] **Step 1: All tests green**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  .venv/bin/python -m unittest tests.test_debug_enabled tests.test_bot_cli tests.test_cm_bot_scraper -v 2>&1 | tail -8
```

Expected: 38 tests, `OK`.

- [ ] **Step 2: Sanity-grep for the old pattern**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  grep -n "soup.find('input'.*\['value'\]" app/cm_bot.py && echo "STILL THERE" || echo "OK: no bare input-value extractions"
```

Expected: `OK: no bare input-value extractions`.

- [ ] **Step 3: ScraperError is exported from `app.cm_bot`**

```bash
cd /home/yiekheng/projects/cm_bot_v2 && \
  .venv/bin/python -c "
from app.cm_bot import CM_BOT, ScraperError
assert issubclass(ScraperError, Exception)
assert hasattr(CM_BOT, '_dump_html')
assert hasattr(CM_BOT, '_find_input_value')
print('ScraperError + helpers OK')
"
```

Expected: `ScraperError + helpers OK`.

- [ ] **Step 4: Real-call smoke (deferred to operator)**

Trigger an actual bot operation against cm99.net (e.g., from the dev tier with real agent creds: `bash scripts/bot_cli.sh credit <username> <password>`). On success: behavior unchanged. On a parse failure that previously would have raised a `TypeError`: a `ScraperError` propagates with a clear message and a file appears under `logs/scraper-failures/<context>-<timestamp>.html`.

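After a smoke run, the operator may want to see which dumps were written most recently. The snippet below is an optional convenience sketch, not part of the plan's deliverables; `recent_dumps` is a hypothetical helper name, and it assumes it is run from the project root so the relative `logs/scraper-failures` path resolves.

```python
import os


def recent_dumps(dump_dir, limit=5):
    """Return up to `limit` .html dump paths, newest first by mtime."""
    if not os.path.isdir(dump_dir):
        # No failures have been dumped yet.
        return []
    paths = [
        os.path.join(dump_dir, name)
        for name in os.listdir(dump_dir)
        if name.endswith(".html")
    ]
    return sorted(paths, key=os.path.getmtime, reverse=True)[:limit]


# Prints nothing when the directory is absent or empty.
for path in recent_dumps(os.path.join("logs", "scraper-failures")):
    print(path)
```
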
---

## Spec Coverage Check (self-review)

| Spec requirement | Task |
|---|---|
| `ScraperError` class | Task 1 |
| `_dump_html` instance method | Task 1 |
| `_find_input_value` instance method, matching by `name` | Task 1 |
| `_find_input_value` extension to support `by="id"` (default `by="name"`) for `transfer_credit` | Task 2 |
| Convert `get_register_form_token` | Task 2 step 1 |
| Convert `get_security_pin_form_token` | Task 2 step 2 |
| Convert `get_transfer_token` | Task 2 step 3 |
| Convert three extractions inside `transfer_credit` (`name`, `token`, `toUserId`) | Task 2 step 4 |
| `get_register_link` failure path dumps + raises | Task 3 step 1 |
| `get_user_credit` failure path dumps + logs (returns 0 unchanged) | Task 3 step 2 |
| Unit tests in `tests/test_cm_bot_scraper.py` | Task 1 + Task 2 |
| `logs/` already gitignored, no .gitignore change | (existing; verified pre-flight) |
| No CSRF token caching | (intentionally not in plan) |

No gaps. No placeholders. `ScraperError`, `_dump_html`, `_find_input_value` names consistent across tasks. `by` parameter introduced in Task 2 with a default that preserves Task 1's API contract.