---
title: Automation Capabilities
description: Complete method reference for browser runtime automation across all drivers.
---

When you launch a browser runtime, you get access to two layers of functionality:

1. **Runtime base**: methods available on every runtime regardless of driver, covering step execution, stop, live view, recording, state, events, captcha handling, and AI agent namespaces.
2. **Driver surface**: the full method set specific to your chosen driver (Playwright, Puppeteer, Stagehand, or Selenium).

## One endpoint, every method

All driver automation goes through a single HTTP endpoint:

```
POST /v1/workspaces/{workspaceId}/execute
```

```json
{
  "runtime": "my-browser",
  "steps": [
    { "call": "page.goto", "args": ["https://example.com"] },
    { "call": "page.screenshot" }
  ]
}
```

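The `call` field names the method to invoke, and `args` is a JSON array of that method's arguments. As the `page.screenshot` step shows, `args` can be left out when a method takes no arguments. Batching several steps into one request saves an HTTP round trip per step.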
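If you do want to build the request by hand, the following is a minimal sketch of assembling the same payload in TypeScript. The `Step` and `ExecuteRequest` types, the `buildExecuteRequest` helper, the `https://api.example.com` host, and the `ws_123` workspace ID are all illustrative assumptions, not part of the documented API. Authentication is not covered here.

```typescript
// Shape of one step, mirroring the JSON example above.
interface Step {
  call: string;     // method name, e.g. "page.goto"
  args?: unknown[]; // positional arguments; left out when there are none
}

interface ExecuteRequest {
  runtime: string;
  steps: Step[];
}

// Hypothetical helper: builds the URL and fetch-style options for one
// batched execute request. `baseUrl` is a placeholder for your API host.
function buildExecuteRequest(
  baseUrl: string,
  workspaceId: string,
  runtime: string,
  steps: Step[],
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  const payload: ExecuteRequest = { runtime, steps };
  return {
    url: `${baseUrl}/v1/workspaces/${encodeURIComponent(workspaceId)}/execute`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Two steps batched into a single request.
const executeRequest = buildExecuteRequest("https://api.example.com", "ws_123", "my-browser", [
  { call: "page.goto", args: ["https://example.com"] },
  { call: "page.screenshot" },
]);
```

Sending it would then be a matter of `await fetch(executeRequest.url, executeRequest.init)`, with whatever authentication headers your workspace requires added to `init.headers`.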

**The SDK handles this automatically.** When you write `await runtime.page.goto('https://example.com')`, the SDK translates it into a structured step and sends it to the execute endpoint. You never construct the HTTP payload yourself unless you want to.
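To make that translation concrete, here is a rough sketch of how a method call can be turned into a structured step using a JavaScript `Proxy`. This is an illustration of the idea only, not the SDK's actual implementation; `makeNamespace`, `Send`, and `recordingSend` are hypothetical names, and the transport simply records steps instead of calling the execute endpoint.

```typescript
interface Step {
  call: string;
  args?: unknown[];
}

// Hypothetical transport: receives the steps that would be sent to the server.
type Send = (steps: Step[]) => Promise<unknown>;

type RemoteMethods = Record<string, (...args: unknown[]) => Promise<unknown>>;

// Returns an object on which `ns.method(...args)` becomes the step
// { call: "<namespace>.method", args } and is handed to `send`.
function makeNamespace(namespace: string, send: Send): RemoteMethods {
  return new Proxy({} as RemoteMethods, {
    get(_target, prop) {
      return (...args: unknown[]) => {
        const step: Step = { call: `${namespace}.${String(prop)}` };
        // Leave `args` out entirely when the method was called without arguments.
        if (args.length > 0) step.args = args;
        return send([step]);
      };
    },
  });
}

// Stand-in transport that records steps locally.
const sentSteps: Step[] = [];
const recordingSend: Send = async (steps) => {
  sentSteps.push(...steps);
  return null;
};

const page = makeNamespace("page", recordingSend);
```

With this in place, `page.goto("https://example.com")` hands the step `{ call: "page.goto", args: ["https://example.com"] }` to the transport, which is the same shape as the JSON payload shown earlier.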

## Runtime base

Every browser runtime — regardless of driver — exposes these capabilities:

| Method                     | Description                                          |
| -------------------------- | ---------------------------------------------------- |
| `runtime.run(steps)`       | Execute structured automation steps directly         |
| `runtime.stop()`           | Stop the runtime and release infrastructure          |
| `runtime.live(options?)`   | Get a live interactive view URL                      |
| `runtime.recording()`      | Get a recording replay URL                           |
| `runtime.state()`          | Query runtime state                                  |
| `runtime.events.list()`    | List runtime events                                  |
| `runtime.events.wait()`    | Wait for a specific event                            |
| `runtime.captcha.detect()` | Detect captchas on the page                          |
| `runtime.captcha.solve()`  | Solve a detected captcha                             |
| `runtime.stagehand`        | Stagehand AI agent namespace (act, extract, observe) |
| `runtime.browserUse`       | Browser-use AI agent namespace                       |

See [Runtime Reference](/sdk/browser-capabilities/runtime) for full documentation.
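Because `runtime.stop()` releases infrastructure, it is usually worth making sure it runs even when a step fails. The sketch below shows one way to do that with `try`/`finally`. It assumes that `run` and `stop` return promises, which this page does not state; `RuntimeLike`, `AutomationStep`, and `runThenStop` are hypothetical names introduced for the example.

```typescript
interface AutomationStep {
  call: string;
  args?: unknown[];
}

// Only the two methods this sketch needs. Return types are assumptions.
interface RuntimeLike {
  run(steps: AutomationStep[]): Promise<unknown>;
  stop(): Promise<unknown>;
}

// Run the steps, then always stop the runtime, even if a step throws.
async function runThenStop(runtime: RuntimeLike, steps: AutomationStep[]): Promise<unknown> {
  try {
    return await runtime.run(steps);
  } finally {
    await runtime.stop();
  }
}
```

Any object that exposes `run` and `stop` with these shapes can be passed in, which also makes the helper easy to exercise with a fake runtime in tests.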

## Driver references

Each driver exposes the native API surface you'd expect, running remotely:

<CardGroup cols={2}>
  <Card title="Playwright" href="/sdk/drivers/playwright/page">
    Page, Locator, BrowserContext, Browser, Frame, ElementHandle, Mouse, Keyboard, Touchscreen.
  </Card>

  <Card title="Puppeteer" href="/sdk/drivers/puppeteer/page">
    Page, Locator, Frame, ElementHandle, Mouse, Keyboard, Touchscreen.
  </Card>

  <Card title="Stagehand" href="/sdk/drivers/stagehand/page">
    Page, Context, Locator — plus AI agent methods (act, extract, observe).
  </Card>

  <Card title="Selenium" href="/sdk/drivers/selenium">
    WebDriver, WebElement — standard Selenium WebDriver protocol.
  </Card>
</CardGroup>