Automation Capabilities
When you launch a browser runtime, you get access to two layers of functionality:
- Runtime base — methods available on every runtime regardless of driver: execute steps, stop, live view, recording, events, captcha, and AI agent namespaces.
- Driver surface — the full method set specific to your chosen driver (Playwright, Puppeteer, Stagehand, or Selenium).
One endpoint, every method
All driver automation goes through a single HTTP endpoint.
Each step in the request names a method in its call field and passes that method’s arguments as a JSON array in args. You can batch multiple steps in one request for efficiency.
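As a rough illustration of the call/args structure, the sketch below builds a batched request body. Only the call and args field names come from the text above. The steps wrapper key, the "page.goto" style of call name, and the buildExecuteBody helper are assumptions made for the example, not the documented wire format.

```typescript
// One step: `call` names the method, `args` is the JSON array of its arguments.
interface Step {
  call: string;
  args: unknown[];
}

// Hypothetical helper: wraps a batch of steps in a single JSON body.
// The top-level `steps` key is an assumption; the real payload may differ.
function buildExecuteBody(steps: Step[]): string {
  return JSON.stringify({ steps });
}

// Two steps batched into one request body.
const body = buildExecuteBody([
  { call: "page.goto", args: ["https://example.com"] },
  { call: "page.title", args: [] },
]);

// `body` would then be sent to the execute endpoint in one HTTP request.
// The endpoint URL is deliberately left out because it is not given here.
```

Batching this way means one round trip covers several steps, which is the efficiency gain the text refers to.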
The SDK handles this automatically: when you write await runtime.page.goto('https://example.com'), it translates the call into a structured step and sends it to the execute endpoint. You only need to construct the HTTP payload yourself if you choose to bypass the SDK.
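One way such a translation could work is a proxy that records each method call as a step instead of running it locally. This is a minimal sketch of the idea, not the SDK's actual implementation. The createRecorder function, the PageLike interface, and the "page." prefix on call names are all hypothetical.

```typescript
// One recorded step: `call` names the method, `args` holds its arguments.
interface Step {
  call: string;
  args: unknown[];
}

// Hypothetical subset of a page surface, used only to type the example.
interface PageLike {
  goto(url: string): void;
  click(selector: string): void;
}

// Hypothetical recorder: any method invoked on the returned object is
// captured as a Step in `sink` rather than executed locally.
function createRecorder<T extends object>(namespace: string, sink: Step[]): T {
  return new Proxy({} as T, {
    get(_target, prop) {
      return (...args: unknown[]) => {
        sink.push({ call: `${namespace}.${String(prop)}`, args });
      };
    },
  });
}

const steps: Step[] = [];
const page = createRecorder<PageLike>("page", steps);

// These read like ordinary driver calls but only record structured steps.
page.goto("https://example.com");
page.click("#submit");
```

A real client would also send the recorded steps to the execute endpoint and return the results to the caller. The sketch stops at the recording stage to keep the mapping from method call to step visible.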
Runtime base
Every browser runtime, regardless of driver, exposes these capabilities: execute steps, stop, live view, recording, events, captcha, and AI agent namespaces.
See Runtime Reference for full documentation.
Driver references
Each driver exposes the native API surface you’d expect, running remotely:
- Playwright: Page, Locator, BrowserContext, Browser, Frame, ElementHandle, Mouse, Keyboard, Touchscreen.
- Puppeteer: Page, Locator, Frame, ElementHandle, Mouse, Keyboard, Touchscreen.
- Stagehand: Page, Context, Locator, plus AI agent methods (act, extract, observe).
- Selenium: WebDriver and WebElement, following the standard Selenium WebDriver protocol.

