# Extract Structured Data

> Point a hosted agent at a page and get schema-validated JSON back - no parsing, no selectors.

Describe what you want and the shape you want it in. The `extract` [invocation](/sdk/invocations) runs inside the runtime, reads the page, and returns output validated against your schema: a Zod schema in TypeScript, a Pydantic model in Python.

```ts TypeScript
import { Bctrl } from "@bctrl/sdk";
import { z } from "zod";

const bctrl = new Bctrl({ apiKey: process.env.BCTRL_API_KEY! });

const runtime = await bctrl.runtimes.create({ type: "browser", name: "extract-recipe" });
await bctrl.runtimes.start(runtime.id);

try {
  // Point the active tab at the page you want to read.
  await bctrl.runtimes.targets.create(runtime.id, {
    uri: "https://news.ycombinator.com",
    activate: true,
  });

  const invocation = await bctrl.runtimes.invocations.createAndWait(
    runtime.id,
    {
      action: "extract",
      instruction: "Extract the top 5 stories.",
      schema: z.object({
        stories: z.array(
          z.object({
            title: z.string(),
            points: z.number(),
            commentCount: z.number(),
          })
        ),
      }),
    },
    { timeoutMs: 120_000 }
  );

  console.log(invocation.output); // already matches the schema
} finally {
  // Stop the runtime even if navigation or extraction throws.
  await bctrl.runtimes.stop(runtime.id);
}
```

```python Python
from bctrl import Bctrl
from pydantic import BaseModel, ConfigDict


class Story(BaseModel):
    model_config = ConfigDict(extra="forbid")
    title: str
    points: int
    commentCount: int


class TopStories(BaseModel):
    model_config = ConfigDict(extra="forbid")
    stories: list[Story]


bctrl = Bctrl()

with bctrl.runtimes.started_browser(name="extract-recipe") as rt:
    # Point the active tab at the page you want to read.
    bctrl.runtimes.targets.create(
        rt.runtime_id, uri="https://news.ycombinator.com", activate=True
    )

    invocation = bctrl.runtimes.invocations.create_and_wait(
        rt.runtime_id,
        action="extract",
        instruction="Extract the top 5 stories.",
        output_model=TopStories,
    )

    top: TopStories = invocation["parsed_output"]
    for story in top.stories:
        print(story.points, story.title)
```

The schema is enforced server-side: if the model produces output that doesn't validate, the invocation fails with `invocation.output_validation_failed` instead of handing you malformed JSON. In Python, `parsed_output` is the instantiated Pydantic model, not a dict.
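Because model output can vary between runs, one reasonable response to `invocation.output_validation_failed` is to run the same extraction again. The sketch below shows that retry pattern in isolation. It makes a loud assumption: `InvocationFailed` and its `code` attribute are hypothetical stand-ins, since this page does not say how the SDK reports a failed invocation. Swap in whatever exception or status field the SDK actually exposes. `extract_with_retry` and `fake_run` are likewise illustrative names, not SDK functions.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Failure code named in the text above.
VALIDATION_FAILED = "invocation.output_validation_failed"


class InvocationFailed(Exception):
    """Hypothetical stand-in for however the SDK reports a failed invocation."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def extract_with_retry(run: Callable[[], T], attempts: int = 2) -> T:
    """Call run() up to `attempts` times, retrying only schema-validation failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return run()
        except InvocationFailed as err:
            # Any other failure code, or the final attempt, is re-raised unchanged.
            if err.code != VALIDATION_FAILED or attempt == attempts:
                raise
    raise AssertionError("unreachable")


# Demo with a fake runner: the first call fails validation, the second succeeds.
calls = {"count": 0}


def fake_run() -> dict:
    calls["count"] += 1
    if calls["count"] == 1:
        raise InvocationFailed(VALIDATION_FAILED)
    return {"stories": []}


result = extract_with_retry(fake_run)
print(result, calls["count"])
```

In real use, `run` would be a small function or lambda wrapping the `create_and_wait` call from the Python example. Keeping the attempt count low limits cost when the instruction and the schema genuinely disagree, in which case rewording the instruction or loosening the schema is likely a better fix than retrying.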

You can also navigate with your own CDP code first and call `extract` on whatever page the browser is on - the invocation always acts on the active target.

## Next

* [Invocations](/sdk/invocations) - all hosted actions
* [Stagehand on BCTRL](/cookbook/stagehand) - act + observe + extract as a flow
* [Run a Hosted Agent](/cookbook/hosted-agent) - multi-step tasks, not single reads