Automation Capabilities


When you launch a browser runtime, you get access to two layers of functionality:

  1. Runtime base — methods available on every runtime regardless of driver: execute steps, stop, live view, recording, events, captcha, and AI agent namespaces.
  2. Driver surface — the full method set specific to your chosen driver (Playwright, Puppeteer, Stagehand, or Selenium).

One endpoint, every method

All driver automation goes through a single HTTP endpoint:

POST /v1/workspaces/{workspaceId}/execute

{
  "runtime": "my-browser",
  "steps": [
    { "call": "page.goto", "args": ["https://example.com"] },
    { "call": "page.screenshot" }
  ]
}

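The call field names the method to invoke, qualified by its namespace (for example, page.goto). args is a JSON array of the method’s arguments and can be omitted when the method takes none, as with page.screenshot in the request above. You can batch multiple steps in one request instead of sending one request per call.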
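As a rough illustration of the request shape described above, the following TypeScript sketch assembles the path and JSON body for the execute endpoint without sending anything. The helper name buildExecuteRequest and the workspace ID ws_123 are hypothetical and exist only for this example; the field names (runtime, steps, call, args) are taken from the request shown above.

```typescript
// One automation step: a method name plus optional arguments.
interface Step {
  call: string;
  args?: unknown[];
}

interface ExecuteRequest {
  path: string;
  body: { runtime: string; steps: Step[] };
}

// Hypothetical helper (not part of any SDK): builds the endpoint path
// and the JSON body for a batch of steps.
function buildExecuteRequest(
  workspaceId: string,
  runtime: string,
  steps: Step[],
): ExecuteRequest {
  return {
    path: `/v1/workspaces/${encodeURIComponent(workspaceId)}/execute`,
    body: { runtime, steps },
  };
}

// "ws_123" is a placeholder workspace ID for illustration only.
const request = buildExecuteRequest("ws_123", "my-browser", [
  { call: "page.goto", args: ["https://example.com"] },
  { call: "page.screenshot" },
]);

console.log(request.path);
console.log(JSON.stringify(request.body, null, 2));
```

Actually sending the request would also need the API's base URL and authentication headers, which this section does not specify, so they are left out here.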

The SDK handles this automatically. When you write await runtime.page.goto('https://example.com'), the SDK translates it into a structured step and sends it to the execute endpoint. You never construct the HTTP payload yourself unless you want to.
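To make the translation concrete, here is a minimal sketch of how a method call could be turned into a structured step using a JavaScript Proxy. This is an assumption about one possible mechanism, not the SDK's actual implementation; recordingNamespace is a hypothetical name, and the sketch only records steps locally rather than sending them to the execute endpoint.

```typescript
interface Step {
  call: string;
  args: unknown[];
}

type RecordedMethod = (...args: unknown[]) => Promise<void>;

// Hypothetical illustration: returns an object on which any method call
// is captured as a { call, args } step and appended to `sink`.
function recordingNamespace(
  namespace: string,
  sink: Step[],
): Record<string, RecordedMethod> {
  return new Proxy<Record<string, RecordedMethod>>({}, {
    get(_target, prop): RecordedMethod {
      return (...args: unknown[]) => {
        // A real client would send the step to the execute endpoint here.
        sink.push({ call: `${namespace}.${String(prop)}`, args });
        return Promise.resolve();
      };
    },
  });
}

const recorded: Step[] = [];
const page = recordingNamespace("page", recorded);

async function demo(): Promise<void> {
  await page.goto("https://example.com");
  await page.screenshot();
  // `recorded` now holds the same step objects as the JSON request shown earlier,
  // except that args is always present (empty when no arguments were passed).
  console.log(JSON.stringify(recorded, null, 2));
}

void demo();
```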

Runtime base

Every browser runtime — regardless of driver — exposes these capabilities:

| Method | Description |
| --- | --- |
| runtime.run(steps) | Execute structured automation steps directly |
| runtime.stop() | Stop the runtime and release infrastructure |
| runtime.live(options?) | Get a live interactive view URL |
| runtime.recording() | Get a recording replay URL |
| runtime.state() | Query runtime state |
| runtime.events.list() | List runtime events |
| runtime.events.wait() | Wait for a specific event |
| runtime.captcha.detect() | Detect captchas on the page |
| runtime.captcha.solve() | Solve a detected captcha |
| runtime.stagehand | Stagehand AI agent namespace (act, extract, observe) |
| runtime.browserUse | Browser-use AI agent namespace |

See Runtime Reference for full documentation.
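Because runtime.stop() releases infrastructure, it can be useful to guarantee it runs even when automation fails. The sketch below shows one way to do that with try/finally. It assumes stop() returns a promise, which this page does not state; withRuntime and the fake runtime are hypothetical and stand in for a real runtime object.

```typescript
// Minimal shape assumed for this sketch. Only the method name stop()
// comes from the table above; the Promise<void> return type is an assumption.
interface StoppableRuntime {
  stop(): Promise<void>;
}

// Hypothetical helper: runs `work`, then always stops the runtime,
// whether `work` resolved or threw.
async function withRuntime<R extends StoppableRuntime, T>(
  runtime: R,
  work: (runtime: R) => Promise<T>,
): Promise<T> {
  try {
    return await work(runtime);
  } finally {
    await runtime.stop();
  }
}

// A fake runtime so the sketch runs without any SDK or network access.
const state = { stopped: false };
const fakeRuntime: StoppableRuntime = {
  stop: async () => {
    state.stopped = true;
  },
};

const outcome = withRuntime(fakeRuntime, async () => "done");
void outcome.then((value) => console.log(value, state.stopped));
```

With a real runtime, the work callback would hold the driver calls (for example runtime.page.goto) in place of the placeholder that returns "done".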

Driver references

Each driver exposes the native API surface you’d expect, running remotely: