Browser automation
When a page needs a real browser (for JavaScript rendering, single-page-app frameworks, or content that depends on a login session), Han AI drives headless Chromium through Playwright on the VPS.
What it does
Loads a page in a real browser, waits for it to render, and returns either the visible page content or a clean structured scrape.
| Field | Value |
|---|---|
| Schema names | browse, scrape_clean |
| Powered by | Playwright plus Chromium |
| Browser path | /var/hanai/playwright-browsers |
| API key required | No |
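As a rough illustration of the two behaviours described above, here is a minimal Python sketch. It is not Han AI's actual implementation: the function names `render_html` and `strip_chrome`, the 30-second timeout, and the list of dropped tags are all assumptions made for the example. The only values taken from this page are the Playwright plus Chromium stack and the browser path, which Playwright reads from its `PLAYWRIGHT_BROWSERS_PATH` environment variable.

```python
import os
from html.parser import HTMLParser

# Browser path from the table above; Playwright looks for browsers here
# when PLAYWRIGHT_BROWSERS_PATH is set.
BROWSERS_PATH = "/var/hanai/playwright-browsers"


def render_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load a page in headless Chromium and return the rendered HTML.

    Hypothetical helper: the name and the timeout are assumptions.
    """
    os.environ.setdefault("PLAYWRIGHT_BROWSERS_PATH", BROWSERS_PATH)
    # Imported here so the rest of the module works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network activity has settled.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


class _ContentText(HTMLParser):
    """Collect text, skipping scripts, styles, and common page chrome."""

    # Assumed list of elements to drop; a real cleaner would likely do more.
    DROP = {"script", "style", "noscript", "nav", "header", "footer", "aside"}

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0  # how many dropped elements we are currently inside
        self.lines: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.DROP and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._depth == 0:
            self.lines.append(text)


def strip_chrome(html: str) -> str:
    """Return the text of a page without navigation, footers, or scripts."""
    parser = _ContentText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Under these assumptions, `strip_chrome(render_html(url))` would give a cleaned-up text view of a rendered page, while `render_html(url)` alone returns the full markup after JavaScript has run.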
When Han AI uses it
- A static page fetch returned mostly empty markup because the page is a single-page app.
- A site requires interaction (a click, a scroll, accepting a banner) before its content is visible.
- A scrape must be free of navigation chrome, ads, and pop-overs.
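The first case in the list, a static fetch that comes back as a near-empty shell, can be approximated with a simple check on how much visible text the markup contains. The sketch below is one possible heuristic, not the rule Han AI applies; the function name `looks_like_empty_shell` and the 200-character threshold are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text that a reader would see, ignoring scripts and styles."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def looks_like_empty_shell(html: str, min_visible_chars: int = 200) -> bool:
    """Return True when fetched markup carries almost no visible text.

    The threshold is an assumed value; tune it for the sites you read.
    """
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    visible = " ".join(parser.chunks)
    return len(visible) < min_visible_chars
```

A caller could try the cheap static fetch first and fall back to the browser only when this check returns `True`, which keeps the slower path for pages that need it.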
Examples
- “Pull the live listings from that broker’s portal page.”
- “Get the current room availability from the property management site.”
- “Open this page, accept the cookie banner, and read me the article body.”
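To make the last request concrete, here is a hedged sketch of how such a flow might look with Playwright's Python API. It is illustrative only: `read_article`, `is_accept_label`, the button-label pattern, and the `article` selector are assumptions, and real sites vary in how they label consent buttons and mark up article bodies.

```python
import re

# Assumed wording for consent buttons; real banners use many variants.
ACCEPT_LABEL = re.compile(r"accept|agree|allow all", re.IGNORECASE)

# Assumed selector for the main content; many sites use a different element.
ARTICLE_SELECTOR = "article"


def is_accept_label(text: str) -> bool:
    """Return True when a button label looks like a consent button."""
    return ACCEPT_LABEL.search(text) is not None


def read_article(url: str, timeout_ms: int = 30_000) -> str:
    """Open a page, dismiss a cookie banner if one appears, return the article text."""
    # Imported here so the helpers above work without Playwright installed.
    from playwright.sync_api import TimeoutError as PlaywrightTimeout
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)

            # Try to click the first button whose accessible name matches.
            accept = page.get_by_role("button", name=ACCEPT_LABEL).first
            try:
                accept.click(timeout=3_000)
            except PlaywrightTimeout:
                pass  # No matching banner showed up in time; carry on.

            return page.locator(ARTICLE_SELECTOR).first.inner_text(timeout=timeout_ms)
        finally:
            browser.close()
```

The short click timeout keeps pages without a banner from stalling the whole turn; the trade-off is that a banner which loads slowly may be missed.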
Limits
- Slower than a plain page fetch: expect seconds, not milliseconds.
- Heavily protected sites (Cloudflare, hCaptcha, fingerprinting) may still block headless Chromium.
- Long-running interactive flows should be modeled as recurring tasks rather than packed into a single live turn.
Why this stack
Playwright with Chromium is a widely used open-source browser automation stack. Running it on your VPS means dynamic pages can be read without paying for a hosted scraping service and without a third party logging your queries.
See also
- Page fetch (for static pages)
- Web search
- Capabilities overview