web-scraper

This API executes Apify web-scraper Actors and returns either the run metadata or the Actor’s outputs directly. Conceptually there are three things you work with: the Actor you want to run, the Run that executes that Actor, and the Outputs that the Actor writes (the Key-Value store and/or the Dataset). Choose the operation that matches which of those you need back immediately.

How the domain is organized

Actors are the runnable units (a scraper Actor in this case). A run is a single execution of an Actor. Runs can produce two common output destinations: a Key-Value store (individual keys) and a Dataset (an ordered collection of items).
These operations are launch-and-return primitives: two of them run the Actor, wait for it to finish, and return one of the Actor’s outputs; the third starts a run and returns the metadata for that run.
Every call requires an Apify token (passed in the token argument) and a request body that identifies which Actor to run plus any Actor input/configuration. The exact body fields depend on the Actor and its expected input schema; supply the Actor identifier and the input payload the Actor expects.

Entry points — which operation to call first

Pick the operation based on the output you need immediately:

If you need the value the Actor stored under the Key-Value store key named OUTPUT, call run-sync-apify-web-scraper. It executes the Actor, waits for completion, and returns the value stored at OUTPUT.
If you need the Actor’s dataset items (an array of scraped items), call run-sync-get-dataset-items-apify-web-scraper. It executes the Actor, waits for completion, and returns the dataset items produced by that run.
If you only need run metadata (run id, status, timestamps, the started run record) and do not need Actor outputs right away, call runs-sync-apify-web-scraper. It returns information about the initiated run rather than the Actor outputs.

Start with the operation above that matches the user’s request. These operations execute the Actor using the token and the body you supply; they do not require any pre-existing run IDs.
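
The mapping above can be written down as a small lookup. This is a minimal sketch: the operation names come from this guide, while the need labels ("output", "dataset", "metadata") and the choose_operation helper are hypothetical names introduced only for illustration.

```python
# Hypothetical labels for what the caller wants back immediately,
# mapped to the operation names listed in this guide.
OPERATION_FOR_NEED = {
    "output": "run-sync-apify-web-scraper",
    "dataset": "run-sync-get-dataset-items-apify-web-scraper",
    "metadata": "runs-sync-apify-web-scraper",
}


def choose_operation(need: str) -> str:
    """Return the operation name for the requested kind of result."""
    try:
        return OPERATION_FOR_NEED[need]
    except KeyError:
        raise ValueError(
            f"unknown need {need!r}; expected one of {sorted(OPERATION_FOR_NEED)}"
        ) from None


print(choose_operation("dataset"))
```

Rejecting unknown labels up front keeps an ambiguous user request from silently launching the wrong kind of run.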

Common user requests and how to fulfill them

"Run the scraper and give me the scraped items": use run-sync-get-dataset-items-apify-web-scraper with a body that identifies the Actor and provides the Actor input (search parameters, start URLs, etc.). The response body will contain the dataset items from that run.
"Run the scraper and give me the OUTPUT value": use run-sync-apify-web-scraper. This returns whatever was written under the Key-Value store key OUTPUT by the Actor.
"Start a run and return its run id/status so I can track it later": use runs-sync-apify-web-scraper. The response contains the initiated run record (id and run metadata).
"I want both the dataset items and the OUTPUT from the same single run": there is no single provided operation that returns both outputs together. To get both without rerunning would require a separate API that can fetch outputs by run id — that is not available here. Options are: (a) if the Actor can be modified to write its desired combined output to one destination, request that from the user; or (b) accept rerunning the Actor and call the appropriate run-and-wait operation twice (note: that creates two separate runs).

Output differences and trade‑offs

Key-Value OUTPUT vs Dataset: the Key-Value OUTPUT is a single value (often a JSON object or string) keyed by OUTPUT. The dataset is an array of items (scraped records). Choose the operation that matches which destination the Actor writes to.
Synchronous wait: the two run-sync... operations block until the Actor completes and then return the requested output. If the run is long or produces a very large dataset, expect a large response and possibly platform-side time limits. Use the run-metadata operation (runs-sync-apify-web-scraper) if you want to start the run without waiting for the full result.

Practical checklist for composing the request body

Before calling, ensure the body contains at minimum:

An identifier for which Actor to run (actor id, actor name, or run configuration the Actor expects).
The Actor input payload — the JSON the Actor needs to perform the scrape (start URLs, filters, selectors, etc.). Put the input under whichever property the Actor expects (commonly input), matching the Actor’s input schema.
Any run options the Actor supports (build tag, memory, timeout overrides) if the user requests them.

Include the Apify token in the token parameter at the top level of the call.
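
The checklist above could be applied with a helper along these lines. This is a sketch under stated assumptions, not the operations' actual schema: build_call is a hypothetical helper, the body property names "actor" and "options" are placeholders, and the sample input and option values are illustrative only. Check the real property names against the Actor's input schema before use.

```python
from typing import Optional


def build_call(
    token: str,
    actor_id: str,
    actor_input: dict,
    run_options: Optional[dict] = None,
    input_key: str = "input",
) -> dict:
    """Assemble the token and body for one call, applying the checklist."""
    if not token:
        raise ValueError("an Apify token is required")
    if not actor_id:
        raise ValueError("the body must identify the Actor to run")
    if not actor_input:
        raise ValueError("the body must carry the Actor input payload")

    # "actor" and "options" are placeholder property names (assumptions);
    # use whatever names the operation and the Actor's input schema define.
    body = {"actor": actor_id, input_key: dict(actor_input)}
    if run_options:
        body["options"] = dict(run_options)

    # The token sits at the top level of the call, next to the body.
    return {"token": token, "body": body}


call = build_call(
    token="<APIFY_TOKEN>",
    actor_id="apify/web-scraper",
    actor_input={"startUrls": [{"url": "https://example.com"}]},  # illustrative input
    run_options={"timeout": 300},  # illustrative option name and value
)
print(call["body"])
```

Run options are added only when the user asked for them, so the default call carries nothing beyond the Actor identifier and its input.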

Non‑obvious gotchas and constraints

The run-sync operations each return only one destination’s output: either the OUTPUT key or the dataset items. They do not return both. If a user asks for both from a single run, plan around that limitation.
runs-sync-apify-web-scraper returns run metadata but not outputs. If you need outputs later you cannot fetch them with the operations listed here — the provided operations either return outputs immediately while waiting, or return just the run metadata.
Permissions: the token must have permission to start runs and to read the Actor’s outputs. Lack of permissions will result in an error rather than partial data.
Large datasets or long-running scrapes may hit platform timeouts or produce very large responses. Prefer obtaining run metadata (so you can start the run and arrange for downstream retrieval outside a single synchronous response) when a scrape is expected to be extensive.
Idempotency: none of these operations is idempotent. Every invocation launches a new run, so repeating a call to collect a different output produces a separate run each time.
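
When a user wants both the dataset items and the OUTPUT value, the single-destination limitation and the new-run-per-call behaviour above combine as in the sketch below. The invoke callable is a hypothetical stand-in for whatever client actually calls an operation by name, and fake_invoke only demonstrates the call pattern without contacting any service.

```python
from typing import Any, Callable

# Hypothetical signature of a client that invokes an operation by name.
Invoke = Callable[[str, str, dict], Any]


def fetch_both_by_rerunning(invoke: Invoke, token: str, body: dict) -> dict:
    """Call both run-and-wait operations with the same token and body.

    Each call launches its own run, so the two results come from two
    separate executions of the Actor.
    """
    items = invoke("run-sync-get-dataset-items-apify-web-scraper", token, body)
    output = invoke("run-sync-apify-web-scraper", token, body)
    return {"dataset_items": items, "output": output, "runs_launched": 2}


def fake_invoke(operation: str, token: str, body: dict) -> Any:
    """Stand-in invoker: no network request, placeholder return values."""
    return [] if "dataset" in operation else None


result = fetch_both_by_rerunning(fake_invoke, "<APIFY_TOKEN>", {"input": {}})
print(result["runs_launched"])
```

Reporting the number of runs launched makes the cost of this approach visible to the caller, who may prefer to change the Actor so that it writes everything to one destination.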

Decision flow (short)

Determine whether the user wants the dataset items, the Key-Value OUTPUT, or only run metadata.
Ensure the body identifies the Actor and includes the Actor input payload and any run options the user requested.
Provide the Apify token in token.
Call the operation that matches the desired immediate return: dataset items -> run-sync-get-dataset-items-apify-web-scraper; OUTPUT key -> run-sync-apify-web-scraper; run metadata only -> runs-sync-apify-web-scraper.

Following this mapping ensures the call returns the data in the form the user expects, without needing extra fetches that this set of operations does not support.
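
The four steps of the decision flow can be sketched as one function. The fulfill helper, the need labels, and the invoke callable are hypothetical names used for illustration; only the operation names are taken from this guide.

```python
from typing import Any, Callable

OPERATIONS = {
    "dataset": "run-sync-get-dataset-items-apify-web-scraper",
    "output": "run-sync-apify-web-scraper",
    "metadata": "runs-sync-apify-web-scraper",
}


def fulfill(
    need: str,
    token: str,
    body: dict,
    invoke: Callable[[str, str, dict], Any],
) -> Any:
    """Walk the decision flow in order, then make exactly one call."""
    # Step 1: the request must resolve to one of the three result kinds.
    if need not in OPERATIONS:
        raise ValueError(f"cannot map {need!r} to an operation")
    # Step 2: the body must be present (Actor identifier, input, options).
    if not isinstance(body, dict) or not body:
        raise ValueError("body must be a non-empty mapping")
    # Step 3: the token must be supplied.
    if not token:
        raise ValueError("token is required")
    # Step 4: call the single operation that matches the desired return.
    return invoke(OPERATIONS[need], token, body)
```

All checks run before the call because every invocation launches a run; failing early avoids starting a run that cannot return what the user asked for.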