Concepts
Core abstractions and building blocks of the Scraper platform.
Flows
A Flow is the core unit of work in Scraper. It defines what to scrape or automate and how. Every Flow operates in one of three modes:
- Extract — Pull structured data from a webpage. Define the fields you need, point it at a URL, and receive clean JSON.
- Interact — Automate browser actions like logging in, submitting forms, clicking through paginated results, or downloading files.
- Monitor — Watch a page for changes over time. Scraper compares outputs across runs and alerts you when something differs.
Flows contain ordered steps, output schemas, and optional schedules. They are versioned, so you can roll back to a previous configuration at any time. Flows can also be exported and imported as JSON, making it easy to share configurations across teams or environments.
```json
{
  "name": "HN Top Stories",
  "mode": "extract",
  "url": "https://news.ycombinator.com",
  "steps": [...],
  "schema": {
    "title": "string",
    "url": "string",
    "points": "number"
  },
  "schedule": "0 */6 * * *"
}
```

Steps
Steps are the individual actions within a Flow. They execute sequentially and define the exact browser operations Scraper performs during a run. Each step has a type and associated configuration.
- Navigate — Go to a URL. Supports dynamic URLs with variable interpolation.
- Click — Click an element by CSS selector or AI-mapped target.
- Fill — Enter text into a form field. Supports input, textarea, and contenteditable elements.
- Extract — Pull structured data from the current page using selectors and extraction rules.
- Wait — Pause execution until an element appears, a timeout elapses, or a network request completes.
- Scroll — Scroll the page to trigger lazy loading or infinite scroll content.
- Screenshot — Capture the current page state as a PNG. Useful for debugging or visual verification.
- Condition — Branch based on element presence, text content, or extracted values.
- Loop — Repeat a set of steps for pagination, list iteration, or retry logic.
```json
{
  "type": "loop",
  "selector": ".next-page",
  "maxIterations": 10,
  "steps": [
    { "type": "extract", "selector": ".product-card", "fields": ["name", "price"] },
    { "type": "click", "selector": ".next-page" },
    { "type": "wait", "timeout": 2000 }
  ]
}
```

Runs
A Run is a single execution of a Flow. Every time a Flow is triggered — manually, via API, on a schedule, or by a webhook — it creates a Run. Runs track everything that happens during execution.
Each run has a status:
- Queued — The run is waiting for an available worker.
- Running — The flow is actively executing steps.
- Completed — All steps finished successfully and data was extracted.
- Failed — An error occurred. Logs contain the failure reason and the step that caused it.
- Cancelled — The run was manually stopped before completion.
Every run produces detailed logs, extracted data (if applicable), total duration, and cost metrics. Runs can be triggered manually from the dashboard, via the REST API, on a cron schedule, or in response to an incoming webhook.
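In practice, a client that triggers a run usually polls it until it reaches one of the terminal statuses above. The sketch below is illustrative, not the SDK's implementation; `fetchRun` is a hypothetical callback standing in for a GET request to the runs endpoint.

```typescript
// Run statuses as documented above.
type RunStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

// A run stops changing once it reaches a terminal status.
const TERMINAL = new Set<RunStatus>(["completed", "failed", "cancelled"]);

function isTerminal(status: RunStatus): boolean {
  return TERMINAL.has(status);
}

// Illustrative polling loop; `fetchRun` is a hypothetical helper that
// retrieves the current run record (e.g. GET /v1/runs/{id}).
async function waitForRun(
  fetchRun: (id: string) => Promise<{ status: RunStatus }>,
  runId: string,
  intervalMs = 2000
): Promise<RunStatus> {
  for (;;) {
    const { status } = await fetchRun(runId);
    if (isTerminal(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The official SDKs wrap this pattern for you (see `waitForCompletion` in the API section below), but the same loop works against the raw REST API.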
Extraction Rules
Extraction Rules define how raw page content gets transformed into structured data. They map CSS selectors to output fields and apply transformations to normalize the results.
Each rule consists of a selector, a target field name, and an optional transform. The available transform types are:
- text — Extract the visible text content of the element.
- html — Extract the inner HTML.
- number — Parse the text as a number, stripping currency symbols and commas.
- date — Parse the text as an ISO 8601 date string.
- url — Resolve relative URLs to absolute URLs.
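To make the transforms concrete, the `number` transform amounts to stripping currency symbols and thousands separators before parsing. A rough sketch of that normalization (illustrative only, not the platform's actual implementation):

```typescript
// Illustrative version of the `number` transform: drop everything that
// isn't a digit, decimal point, or minus sign, then parse the remainder.
function toNumber(raw: string): number | null {
  const cleaned = raw.replace(/[^0-9.\-]/g, ""); // strips "$", "€", commas, whitespace
  const parsed = Number.parseFloat(cleaned);
  return Number.isNaN(parsed) ? null : parsed;
}
```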
Output schemas define the shape of the final extracted data. Each field in the schema corresponds to an extraction rule. When a run completes, the extracted data is validated against the schema before being stored or delivered.
```json
{
  "rules": [
    { "selector": "h1.title", "field": "title", "transform": "text" },
    { "selector": ".price", "field": "price", "transform": "number" },
    { "selector": "time[datetime]", "field": "publishedAt", "transform": "date" },
    { "selector": "a.product-link", "field": "link", "transform": "url" }
  ]
}
```

Monitoring & Alerts
Monitoring flows automatically compare extracted data across runs to detect changes. When a difference is found, Scraper generates an alert and delivers it through your configured notification channels.
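Conceptually, change detection is a field-by-field comparison of the latest extraction against the previous run's output. A simplified sketch of that idea (the platform's actual comparison logic is not documented here):

```typescript
type Extracted = Record<string, string | number>;

// Return the names of fields whose values differ between two runs' outputs,
// including fields present in only one of the two.
function changedFields(previous: Extracted, current: Extracted): string[] {
  const keys = new Set([...Object.keys(previous), ...Object.keys(current)]);
  return [...keys].filter((key) => previous[key] !== current[key]);
}
```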
Alert types:
- change_detected — A monitored field has changed since the last run.
- threshold — A numeric field crossed a defined threshold (e.g., price dropped below $50).
- error — A scheduled run failed.
- schedule_missed — A scheduled run did not execute within its expected window.
Each alert has a severity level:
- info — Informational, no action needed.
- warning — Something may need attention.
- critical — Immediate action recommended.
Notifications can be delivered via email, Slack, Discord, or any custom webhook endpoint. You can configure different channels for different severity levels.
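For a custom webhook endpoint, an alert delivery might carry a payload along these lines. The field names here are illustrative, not a documented schema; the type and severity values come from the lists above, and the flow/run IDs reuse the example IDs from the API section.

```json
{
  "type": "threshold",
  "severity": "warning",
  "flow": "f_3kx9m2",
  "run": "r_8jx2k1",
  "message": "price dropped below 50"
}
```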
API & Integration
Every feature in Scraper is accessible programmatically through the REST API. All requests are authenticated with API keys that you generate from the dashboard.
Each Flow gets a unique API endpoint. You can trigger runs, retrieve extracted data, and manage flow configuration entirely through the API without ever touching the dashboard.
```shell
# Trigger a flow run
curl -X POST https://api.scraper.bot/v1/flows/f_3kx9m2/run \
  -H "Authorization: Bearer scr_live_..." \
  -H "Content-Type: application/json"

# Get run results
curl https://api.scraper.bot/v1/runs/r_8jx2k1 \
  -H "Authorization: Bearer scr_live_..."
```
Official SDKs are available for TypeScript and Python, providing typed clients with built-in retry logic and error handling.
```typescript
import { Scraper } from "@scraper/sdk"

const scraper = new Scraper({ apiKey: "scr_live_..." })
const run = await scraper.flows.run("f_3kx9m2")
const data = await run.waitForCompletion()
console.log(data.results)
```

Webhooks allow Scraper to push data to your systems in real time. Configure a webhook URL on any flow, and Scraper will POST the extracted data to your endpoint as soon as a run completes. Webhook triggers also let you start runs from external systems by sending a POST request to a unique trigger URL.
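On the receiving side, the webhook endpoint is just an HTTP handler that accepts Scraper's POST. A minimal Node.js sketch, assuming a JSON body (the exact payload shape depends on your flow's output schema; `flow` and `results` are illustrative field names):

```typescript
import { createServer } from "node:http";

// Parse a delivered webhook body. Field names here are illustrative.
function handlePayload(body: string): { flow?: string; results?: unknown[] } {
  return JSON.parse(body);
}

const server = createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const payload = handlePayload(body);
    console.log("run completed for flow", payload.flow);
    res.writeHead(204).end(); // acknowledge receipt with no body
  });
});

server.listen(3000); // point the flow's webhook URL at this endpoint
```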