apify

The Apify actor collection is organized around three operations that repeat for every actor: a synchronous run that returns the actor’s key-value OUTPUT, a synchronous run that returns the actor’s dataset items, and a run-start operation that returns run metadata immediately. The API surface is actor-centric: each actor in the collection exposes the same trio, and each operation accepts an Apify token and a request body containing the actor’s input (described by its InputSchema). Understanding those three roles, and how a given actor emits its results, is the most important operational pattern to get right.

How the domain is structured

Actors are the primary entity. Each actor has a fixed slug (visible in the operation name) and exposes three kinds of operations: run-and-wait-for-OUTPUT, run-and-wait-for-dataset-items, and start-run-and-return-metadata. There is no generic dataset or key-value fetch operation in this collection; results are returned only by actor-specific run endpoints.
Runs are the unit of execution. A run can write to two common output locations: a key-value store (usually under the OUTPUT key) and a dataset (a list of items). Which one an actor uses is actor-specific; some actors write only to a dataset, some only to OUTPUT, and some to both.
The InputSchema body is the actor’s run configuration and varies by actor. Typical fields (filtering, target URLs/ids, limits, auth tokens for target services) appear in that schema but are actor-defined. Inspect the actor’s InputSchema before calling to know which options the actor supports.
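
To make the trio concrete, here is a minimal sketch of the three operations for one actor. It assumes the collection's operations wrap the public Apify API v2 paths (`/acts/{actorId}/run-sync`, `/run-sync-get-dataset-items`, and `/runs`) with the token passed as a `token` query parameter. The actor id `apify~instagram-profile-scraper`, the role names, and the helper `operation_urls` are illustrative and not part of the collection.

```python
from urllib.parse import quote, urlencode

# Assumption: the collection's operations wrap the public Apify API v2 paths.
API_BASE = "https://api.apify.com/v2"

# Role of each operation -> assumed path suffix under /acts/{actor_id}/.
ROLE_TO_PATH = {
    "output": "run-sync",                   # waits, returns key-value OUTPUT
    "items": "run-sync-get-dataset-items",  # waits, returns dataset items
    "start": "runs",                        # returns run metadata immediately
}


def operation_urls(actor_id: str, token: str) -> dict:
    """Return the three operation URLs for one actor, keyed by role."""
    actor = quote(actor_id, safe="~")
    query = urlencode({"token": token})
    return {
        role: f"{API_BASE}/acts/{actor}/{path}?{query}"
        for role, path in ROLE_TO_PATH.items()
    }


# Hypothetical actor id and placeholder token, for illustration only.
urls = operation_urls("apify~instagram-profile-scraper", "MY_APIFY_TOKEN")
```

Every actor yields the same three keys; only the actor id in the path changes.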

Entry points — what to call first

If you already know the actor slug and you want the final data immediately, call the actor’s synchronous run that matches the output you expect:
- Use the run that returns dataset items when the actor produces structured items (scrapers, crawlers). That operation waits for completion and returns dataset items in the response.
- Use the run that returns key-value OUTPUT when the actor writes its result to the key-value store (often under the OUTPUT key).
If you want to start the run and only need the run identifier or metadata (for logging, separate monitoring, or a long-running job you do not want to block on), call the actor’s run-start operation that returns run metadata. The run metadata typically includes the run id and references to where outputs will be stored (datasetId, keyValueStoreId) if present.
Always inspect the actor’s InputSchema before constructing the body. The schema tells which fields the actor accepts (for example: target handles, URL lists, limit, or format flags). If the user provides a human request ("scrape X profiles"), translate it to the actor’s input fields per that schema.
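
As a rough illustration of turning a human request into a call, the sketch below builds (but does not send) the POST request for a synchronous dataset run. The `usernames` field, the actor id in the URL, and the helper `build_run_request` are assumptions made for the example; the real field names come from the actor's InputSchema.

```python
import json
from urllib.request import Request


def build_run_request(operation_url: str, actor_input: dict) -> Request:
    """Build (but do not send) the POST request that runs an actor."""
    body = json.dumps(actor_input).encode("utf-8")
    return Request(
        operation_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# "Scrape these two profiles" mapped onto a hypothetical `usernames` field.
request = build_run_request(
    "https://api.apify.com/v2/acts/apify~instagram-profile-scraper"
    "/run-sync-get-dataset-items?token=MY_APIFY_TOKEN",
    {"usernames": ["example_one", "example_two"]},
)
```

Passing `request` to `urllib.request.urlopen` would execute the run and block until it finishes; the sketch stops short of that so it can be inspected without a token.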

Common user requests and the right sequence

"Scrape this profile and return items": pick the actor whose purpose matches (e.g., instagram-profile-scraper), build the body according to that actor’s InputSchema, then call the actor’s run-sync-get-dataset-items operation so the call waits and returns the scraped items.
"Run this actor and give me the summary or a single JSON output file": choose the actor’s run-sync operation, which returns key-value OUTPUT. Many actors put a single JSON summary under the OUTPUT key; inspect the response body for that key.
"Start the job but don’t wait": call the actor’s run-start operation (the one that returns run metadata). Use the returned run id or any datasetId/keyValueStoreId references to report where results will appear. Note that this collection does not provide a separate generic dataset fetch operation: dataset contents are returned only by actor-specific run endpoints that run the actor to completion.
"I only want N results or a preview": look for a limit, pageSize, or equivalent field on the actor’s InputSchema. If such a field exists, pass it. If it does not, run the actor with its default input and post-filter the returned items (the call will return whatever the actor produced).

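The preview case ("I only want N results") can be sketched as follows. The sketch assumes the InputSchema is a JSON object with a `properties` map, and it only looks for the two field names mentioned above (`limit`, `pageSize`); actors may use other names, so the candidate list is a placeholder to extend per actor. Both helper names are hypothetical.

```python
from itertools import islice

# Assumption: only the field names mentioned in the text; extend per actor.
LIMIT_FIELD_CANDIDATES = ("limit", "pageSize")


def with_preview_limit(input_schema: dict, actor_input: dict, n: int):
    """Return (input, limited_by_actor).

    If the schema declares a known limit field, set it to n. Otherwise
    return the input unchanged and signal that post-filtering is needed.
    """
    properties = input_schema.get("properties", {})
    for name in LIMIT_FIELD_CANDIDATES:
        if name in properties:
            return {**actor_input, name: n}, True
    return dict(actor_input), False


def first_n(items, n: int) -> list:
    """Post-filter: keep only the first n returned items."""
    return list(islice(items, n))
```

When `limited_by_actor` is false, the actor still runs with its default input, so `first_n` trims the response but does not shorten the run.
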
Non‑obvious patterns and gotchas

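Three-operation pattern. For every actor you’ll see the same trio of operations, each named with the actor slug and one of: run-sync (returns key-value OUTPUT), run-sync-get-dataset-items (returns dataset items), or runs-sync (the run-start operation, which returns run metadata without waiting, despite the "sync" in its name). Choose the variant that matches the output you need rather than assuming one operation will always contain the data.
Output location is actor-defined. Do not assume dataset vs key-value output. If a synchronous run returns an empty payload, the actor may have written to the other location; the run metadata returned by the run-start operation lists datasetId or keyValueStoreId and shows where the actor writes results. Keep in mind that calling the run-start operation launches a new run.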
No generic dataset fetch in this collection. There is no separate operation that takes a datasetId and returns items. To retrieve items through this API surface you must use the actor’s dataset-returning run endpoint (which re-runs the actor and returns the items) or rely on run metadata to learn where outputs will be stored externally.
Inspect responses for run status and result fields. Synchronous run endpoints wait for completion, but the returned body may still indicate errors or partial results. Check status fields and look for items, dataset, OUTPUT, or similarly named keys in the response body to find the actual data.
The token parameter is required for every call. The caller’s Apify token determines access and the account under which the run executes; ensure the token provided has the permissions the user expects.
InputSchema variability. Many actors accept startUrls, usernames, query, limit, or service-specific authentication fields, but names and semantics differ. Always map the user’s intent to the exact field names the actor expects.
Long-running actors. Synchronous run endpoints block until completion; if the target actor is long-running or the user expects immediate acknowledgement, prefer the run-start operation and report the run id so results can be checked or retrieved via other channels.
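
One way to apply the "inspect the response" advice is a small classifier over the parsed JSON body. The shapes it recognizes are assumptions, not a documented contract: a bare list is treated as dataset items, the keys `items`, `dataset`, and `OUTPUT` are the ones named above, and run metadata is assumed to sit either at the top level or under a `data` wrapper with an `id` field. The `defaultDatasetId` and `defaultKeyValueStoreId` names are checked alongside `datasetId` and `keyValueStoreId` because the exact field names depend on the response the collection returns.

```python
def locate_data(body):
    """Classify a parsed JSON response as items, OUTPUT, run metadata, or unknown."""
    if isinstance(body, list):
        return "items", body
    if not isinstance(body, dict):
        return "unknown", body
    for key in ("items", "dataset", "OUTPUT"):
        if key in body:
            return key, body[key]
    # Assumption: run metadata may be wrapped in a "data" object.
    run = body.get("data", body)
    if isinstance(run, dict) and "id" in run:
        return "metadata", {
            "runId": run["id"],
            "status": run.get("status"),
            "datasetId": run.get("defaultDatasetId", run.get("datasetId")),
            "keyValueStoreId": run.get(
                "defaultKeyValueStoreId", run.get("keyValueStoreId")
            ),
        }
    return "unknown", body
```

An "unknown" result is the cue to fall back to the troubleshooting steps rather than report empty data.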

Quick decision checklist (when converting a user request to calls)

Identify the actor that matches the task (match by actor purpose in the operation name).
Inspect that actor’s InputSchema to learn the exact input fields to populate.
Decide whether you need immediate results or just a run id:
- Immediate results → pick the run-sync variant that matches whether results are in the dataset or key-value store.
- Run id / asynchronous → pick the runs-sync (run-start) operation.
Provide the user’s Apify token and the populated body.
When the call returns, locate the data by checking for items/dataset fields or the OUTPUT key in the response body; if the response is only metadata, report the run id and any datasetId/keyValueStoreId the run produced.
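
The "immediate results or just a run id" decision in the checklist can be written as a small function. This is a sketch: the function name and the `output_channel` labels are hypothetical, while the returned strings are the operation variants named in this document.

```python
def pick_operation(wait_for_results: bool, output_channel: str = "dataset") -> str:
    """Map the two checklist decisions onto one of the three operation variants."""
    if not wait_for_results:
        # Run id / asynchronous: the run-start operation.
        return "runs-sync"
    if output_channel == "dataset":
        return "run-sync-get-dataset-items"
    if output_channel == "key-value":
        return "run-sync"
    raise ValueError(f"unknown output channel: {output_channel!r}")


# A scraper whose results are item lists, awaited by the caller:
operation = pick_operation(wait_for_results=True, output_channel="dataset")
```

The output channel is ignored when the caller does not wait, since the run-start operation returns only metadata either way.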

When you get stuck

If a synchronous run returns an empty or unexpected payload, check that the actor actually writes to the output location you queried. Use the run-start response to confirm datasetId or keyValueStoreId presence.
If the user asks to fetch a dataset by id, note that this collection does not offer a generic dataset-get-by-id operation. Report the run metadata and use the appropriate actor run endpoint to obtain items instead.
If the user’s request requires a field not present in InputSchema, explain the actor’s supported inputs and ask the user to confirm how to approximate their request with available fields.
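
A sketch of the last check: compare the fields a request needs against the actor's InputSchema and produce a message to show the user. It assumes the schema exposes its fields under a `properties` map; the helper names and the `sinceDate` field in the usage note are made up for illustration.

```python
def unsupported_fields(input_schema: dict, requested: dict) -> list:
    """Return the requested field names the schema does not declare, sorted."""
    supported = set(input_schema.get("properties", {}))
    return sorted(name for name in requested if name not in supported)


def explain_mismatch(input_schema: dict, requested: dict) -> str:
    """Build a user-facing note, or an empty string if every field is supported."""
    missing = unsupported_fields(input_schema, requested)
    if not missing:
        return ""
    supported = ", ".join(sorted(input_schema.get("properties", {})))
    return (
        f"Unsupported input field(s): {', '.join(missing)}. "
        f"Supported fields: {supported}."
    )
```

For example, with a schema declaring `usernames` and `limit`, a request that also sets a hypothetical `sinceDate` field would be reported as unsupported, and the user can then decide how to approximate it with the available fields.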

Final note on mapping intents to operations

When a user asks to "run an actor" or "get scraping results," treat the problem as two decisions: which actor (domain match) and which output channel (dataset vs OUTPUT). Select the actor-slug operation that matches both decisions: the dataset-returning synchronous run for item lists, the key-value-returning synchronous run for a single JSON or file output, or the run-start operation when the run should be started but not awaited.