puppeteer-scraper

These operations run a Puppeteer-based Apify actor and return results in one of three ways: wait for completion and return the value stored under the actor’s OUTPUT key, wait for completion and return the actor’s dataset items, or start a run and return the run metadata immediately. The operations differ in how results are delivered and in whether the call blocks until the actor finishes.

How the domain is organized

This API revolves around a single actor execution concept: a run. A run can produce two common result containers on Apify:

a key-value store entry named OUTPUT (typically a single JSON object or small summary), and
a dataset of items (an array of records produced over time).

Each operation executes the actor and returns either results (for the two waiting operations) or run metadata (for the non-waiting operation). All three operations take the same actor input JSON in the body. The token is the Apify API token that authenticates the run request.
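As a rough illustration of that shared request shape, the sketch below builds (but does not send) a POST request for any of the three operations. The base URL, the path segments, the Bearer-token header, and the startUrls input field are assumptions drawn from Apify's public REST API rather than details stated in this document; build_request and PATHS are hypothetical names.

```python
import json
import urllib.request

# Assumption: public Apify REST base URL for this actor (not stated in this document).
API_BASE = "https://api.apify.com/v2/acts/apify~puppeteer-scraper"

# Assumption: how each operation name maps onto a REST path segment.
PATHS = {
    "run-sync-apify-puppeteer-scraper": "run-sync",
    "run-sync-get-dataset-items-apify-puppeteer-scraper": "run-sync-get-dataset-items",
    "runs-sync-apify-puppeteer-scraper": "runs",
}


def build_request(operation: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request carrying the same actor input JSON for any operation."""
    return urllib.request.Request(
        f"{API_BASE}/{PATHS[operation]}",
        data=json.dumps(actor_input).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the token is sent as a Bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical input; the real fields depend on the actor's input schema.
example_input = {"startUrls": [{"url": "https://example.com"}]}
request = build_request(
    "run-sync-get-dataset-items-apify-puppeteer-scraper", "MY_TOKEN", example_input
)
```

Only the path segment changes between operations; the body and the token stay the same, which is why switching from a waiting call to a non-waiting one needs no change to the input.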

Entry points — what to call first

When the user asks to run the actor or fetch scraped data, pick one of the three operations depending on the desired outcome:

If the user wants the actor run started and only needs the run ID or immediate run info (no waiting), call runs-sync-apify-puppeteer-scraper. Despite the “sync” in its name, this is the non-waiting operation: it returns run metadata so you can reference the run later.
If the user wants the final single JSON output that many Apify actors put under the OUTPUT key, call run-sync-apify-puppeteer-scraper. That call waits for completion and returns the OUTPUT value in the response body.
If the user wants the scraped records (the dataset items), call run-sync-get-dataset-items-apify-puppeteer-scraper. That call waits for completion and returns the dataset items in the response body.

Choose the entry point by asking which form of result the user expects: a single JSON summary (OUTPUT), a list of item records (dataset), or just a started run with its run ID and metadata.

Common capabilities and workflows

Run-and-return OUTPUT (single-object result): use run-sync-apify-puppeteer-scraper when the actor stores its final result under the OUTPUT key of its key-value store. Typical user requests that map to this operation are: “Scrape this page and return the final JSON object,” or “Run the scraper and return the summary object.” On success, the response body contains that object.

Run-and-return dataset items (records): use run-sync-get-dataset-items-apify-puppeteer-scraper when the actor emits many records to a dataset and the user expects a list of items. Typical requests: “Give me all scraped products,” or “Return the dataset of scraped articles.” The response body contains the dataset items.

Start a run and get metadata: use runs-sync-apify-puppeteer-scraper when the user only needs the run initiated (for background processing or when they will check status later). Typical requests: “Start a crawl and give me the run id” or “Schedule this job and return the run record.” The response provides the run identifier and status fields that you can use in follow-up actions.

If the user is unsure which container the actor uses, ask whether they expect a single summary object or many item records. If they still don’t know, prefer dataset items for record-oriented scrapers and OUTPUT for single-result scrapers; if uncertain and the run may be long, prefer starting the run (non-waiting) and clarify how they want results retrieved later.
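The selection rules above can be sketched as a small function. This is only one possible encoding of the guidance; pick_operation and its parameters are hypothetical names, and returning None is this sketch's way of saying "ask the user a clarifying question".

```python
from typing import Optional

OUTPUT_OP = "run-sync-apify-puppeteer-scraper"
DATASET_OP = "run-sync-get-dataset-items-apify-puppeteer-scraper"
START_OP = "runs-sync-apify-puppeteer-scraper"


def pick_operation(
    expected: Optional[str],
    record_oriented: Optional[bool] = None,
    may_run_long: bool = False,
) -> Optional[str]:
    """Pick an operation name, or return None when the user must be asked.

    expected: "output", "dataset", "run_info", or None if the user is unsure.
    record_oriented: True for list-like scrapes, False for single-result
        scrapes, None if that is unknown too.
    """
    explicit = {"output": OUTPUT_OP, "dataset": DATASET_OP, "run_info": START_OP}
    if expected is not None:
        if expected not in explicit:
            raise ValueError(f"unknown result shape: {expected!r}")
        return explicit[expected]

    # The user is unsure: fall back on the kind of scraper, if known.
    if record_oriented is True:
        return DATASET_OP
    if record_oriented is False:
        return OUTPUT_OP

    # Still uncertain: start a long run without waiting, otherwise ask.
    return START_OP if may_run_long else None
```

For example, pick_operation(None, record_oriented=True) selects the dataset operation, while pick_operation(None) returns None to signal that a clarifying question is still needed.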

Response handling notes (practical observation)

All three responses include a status string. For the two waiting operations, a successful run includes the requested results in the body; a non-success status or an error-shaped body indicates that the run did not produce the expected output. For runs-sync-apify-puppeteer-scraper, the returned run metadata includes the run id and status, so you can identify the run in follow-up actions (for example, other APIs that accept a run id).

Because these operations return either results or run metadata directly, inspect the status and the body to decide the next step: return the body to the user when it contains data, or report the run id/status when launching background runs.
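A minimal sketch of that inspection step follows. It assumes the response arrives as a dict with status and body keys, that an error-shaped body is a dict with a top-level error key, and that run metadata sits either directly in the body or under a data key. None of these shapes are confirmed here. The set of success statuses is supplied by the caller because this document does not list the actual values; next_step is a hypothetical name.

```python
from typing import Any, Tuple


def next_step(response: dict, ok_statuses: set, started_only: bool = False) -> Tuple[str, Any]:
    """Decide what to tell the user based on the status and body of a response."""
    status = str(response.get("status", ""))
    body = response.get("body")

    # Assumption: an error-shaped body is a dict with a top-level "error" key.
    if isinstance(body, dict) and "error" in body:
        return ("report_error", body["error"])

    if status not in ok_statuses:
        return ("report_status", status)

    if started_only:
        # Assumption: run metadata is the body itself or nested under "data".
        run = body.get("data", body) if isinstance(body, dict) else {}
        if not isinstance(run, dict):
            run = {}
        return ("report_run", {"id": run.get("id"), "status": run.get("status")})

    # A completed run with nothing in the body is "empty", not "failed".
    if not body:
        return ("empty_result", None)

    return ("return_data", body)
```

Keeping "empty_result" separate from "report_error" lets the caller follow up about inputs and result containers instead of reporting a failure.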

Choosing between dataset vs OUTPUT (non-obvious but important)

Many actors write one or the other, not both. Picking the wrong operation yields empty or unexpected results even though the run completed successfully.

If the actor was written to produce incremental records (a stream of items), the dataset call returns those items; the OUTPUT key may be absent or contain only a small summary.
If the actor was written to assemble a single result (one JSON object), the OUTPUT call returns that object; the dataset call may return an empty list.

Ask the user which shape they expect. If you must pick without guidance, prefer dataset items for list-like scraping tasks (multiple products, pages, articles) and OUTPUT for one-off conversions or aggregated summaries.

Practical gotchas and quirks

Long runs: the two run-sync operations block until completion. Runs that take a long time may delay responses; if the user expects background execution or wants immediate acknowledgement, use runs-sync-apify-puppeteer-scraper instead and return the run id.
Result emptiness vs failure: a run can complete successfully but produce an empty dataset or no OUTPUT if the actor’s input didn’t match expectations. An empty body does not always mean the run failed; inspect the status and any error fields in the body.
Authorization and token scope: the token is the Apify API token. If the token lacks access to the actor or the user’s Apify resources, runs will fail with authorization errors. Ensure the user supplies a token with the necessary permissions.
Large datasets and payload size: dataset responses may be large. If the user asks for an entire large scrape, confirm whether they want sampling, field selection, or a full export. If the user explicitly wants everything, return it but warn about size and delivery time.
Actor-specific inputs: the body must match the actor’s expected input schema. The operations accept arbitrary actor input JSON; they do not validate domain semantics for you. When the user provides parameters (start URLs, selectors, limits), pass them through exactly as the actor expects. If the user cannot specify the input, ask clarifying questions about the target pages, selectors, and limits.
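For the large-dataset case, a client-side trim is one way to honor a request for a limited set of items or fields. The sketch below assumes the dataset items arrive as a list of dicts; trim_items and the field names in the example are hypothetical. It takes the first items rather than a random sample.

```python
from typing import Iterable, List, Optional, Sequence


def trim_items(
    items: Sequence[dict],
    fields: Optional[Iterable[str]] = None,
    limit: Optional[int] = None,
) -> List[dict]:
    """Return at most `limit` items, keeping only `fields` when given."""
    selected = list(items) if limit is None else list(items[:limit])
    if fields is None:
        return selected
    wanted = list(fields)
    # Fields missing from an item are skipped rather than filled in.
    return [{key: item[key] for key in wanted if key in item} for item in selected]


# Hypothetical scraped records.
scraped = [
    {"title": "A", "price": 1, "html": "<p>a</p>"},
    {"title": "B", "price": 2, "html": "<p>b</p>"},
    {"title": "C"},
]
preview = trim_items(scraped, fields=["title", "price"], limit=2)
```

Trimming after the fact does not shorten the run or the download; it only reduces what is passed on to the user.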

Typical decision checklist (use before calling any operation)

Before executing, confirm three things with the user: the authentication token to use, whether they want the single OUTPUT object or dataset items, and any run-specific input (start URL(s), limits, or options). If they want immediate run metadata rather than waiting for results, choose runs-sync-apify-puppeteer-scraper.
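The checklist can be expressed as a pre-flight check that lists what still needs to be confirmed. This is a sketch only; missing_confirmations and the three shape labels are hypothetical names, not part of the operations themselves.

```python
from typing import List, Optional

VALID_SHAPES = ("output", "dataset", "run_info")


def missing_confirmations(
    token: Optional[str],
    result_shape: Optional[str],
    actor_input: Optional[dict],
) -> List[str]:
    """Return the checklist items that still need to be confirmed with the user."""
    missing = []
    if not token:
        missing.append("token")
    if result_shape not in VALID_SHAPES:
        missing.append("result_shape")
    if not actor_input:
        missing.append("actor_input")
    return missing
```

An empty list means all three items are confirmed and the chosen operation can be called.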

Troubleshooting hints

If body is empty but status shows completion, ask the user whether the actor input correctly targeted content (URLs and selectors), and whether the actor normally emits dataset items or a single OUTPUT.
If runs time out or take too long, offer to start the run (non-waiting) and return the run id so the user can inspect it later.
If authorization errors occur, request a token with broader access or a different account token.

Use these patterns to map user intents (scrape now and return data, start job and return id, retrieve dataset items) to the correct operation and to know what to inspect in the response to confirm success or explain failure.