The selector-rot problem
Every scraping team has felt it: a single nightly cron starts returning empty rows, and someone has to spend an afternoon spelunking through DevTools to fix three brittle selectors. Multiply that by 40 sources and you have a part-time job nobody wants.
The pattern: AI as the selector layer
Instead of hard-coding CSS selectors, ask a model to find the data:
```python
# pseudocode
html = page.accessibility_snapshot()
result = claude.extract(
    schema={'price': 'number', 'sku': 'string', 'in_stock': 'boolean'},
    page=html,
)
```

You pay a few cents per page but get a scraper that survives most redesigns. Cache the AI-derived selectors, and fall back to the model only when the cache misses.
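For concreteness, here is roughly what that call looks like against the Anthropic Python SDK. The claude.extract() helper above is shorthand; the model id, prompt wording, and extract() function below are assumptions for illustration, not a fixed recipe.

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SCHEMA = {"price": "number", "sku": "string", "in_stock": "boolean"}

def extract(page_text: str) -> dict:
    """Ask the model for schema-shaped JSON pulled from a page snapshot."""
    prompt = (
        "Extract these fields from the page content and reply with JSON only, "
        f"matching this schema: {json.dumps(SCHEMA)}\n\n"
        f"Page content:\n{page_text}"
    )
    message = client.messages.create(
        model="claude-haiku-4-5",  # assumed model id; use whichever Haiku alias you run
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(message.content[0].text)
```

Feeding it an accessibility snapshot or stripped-down text rather than raw HTML keeps token counts, and therefore the per-page cents, in check; validating the parsed JSON against the schema before it lands anywhere is cheap insurance.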
What we use in production
- Playwright for the browser, with persistent contexts and rotating residential proxies to get past anti-bot checks.
- Claude Haiku 4.5 as the cheap extraction brain.
- SQLite/Postgres for cached selectors and last-good-value backstops (sketched below).
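Wired together, the cache-plus-backstop flow looks roughly like the sketch below. The table layout is illustrative and ask_model_for_selector() is a hypothetical stand-in for the AI layer (it would return a CSS selector for the requested field); the Playwright sync API and stdlib sqlite3 calls are real.

```python
import sqlite3
from playwright.sync_api import Page

db = sqlite3.connect("selectors.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS selectors ("
    "source TEXT, field TEXT, css TEXT, last_good TEXT, "
    "PRIMARY KEY (source, field))"
)

def get_field(page: Page, source: str, field: str) -> str | None:
    row = db.execute(
        "SELECT css, last_good FROM selectors WHERE source = ? AND field = ?",
        (source, field),
    ).fetchone()

    # Fast path: the cached CSS selector still matches something on the page.
    if row and row[0]:
        node = page.query_selector(row[0])
        if node:
            value = node.inner_text().strip()
            db.execute(
                "UPDATE selectors SET last_good = ? WHERE source = ? AND field = ?",
                (value, source, field),
            )
            db.commit()
            return value

    # Slow path: cache miss or the selector rotted; ask the model for a new one.
    css = ask_model_for_selector(page.content(), field)  # hypothetical AI call
    node = page.query_selector(css) if css else None
    if node:
        value = node.inner_text().strip()
        db.execute(
            "INSERT OR REPLACE INTO selectors VALUES (?, ?, ?, ?)",
            (source, field, css, value),
        )
        db.commit()
        return value

    # Backstop: fall back to the last value that ever extracted cleanly.
    return row[1] if row else None
```

The slow path only fires when a cached selector stops matching, so steady-state cost stays close to a plain CSS scraper's.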
Honest tradeoffs
This isn't free. AI extraction is 20–100x slower than a tuned CSS selector, and it adds API cost. Use it for resilience-critical sources, not for high-volume firehoses where the page shape is stable.
We build production scraping pipelines for clients who need data, not maintenance. See our scraping services.