instagram-post-scraper

This actor runs an Instagram post scraper and exposes two kinds of outputs: a structured dataset of scraped items and a key-value store entry named OUTPUT. Each invocation creates a Run (with an ID and status) and then writes results into the actor's dataset and key-value store according to the actor's internal logic. The three operations available map to three practical workflows: start-and-wait-for-key-value, start-and-wait-for-dataset, and start-and-return-run-metadata.

How the domain is organized

Actor run: a single execution of the scraper. A run has metadata (ID, status, timestamps). The runs-sync-apify-instagram-post-scraper call returns that metadata immediately.
Dataset: the actor writes structured, itemized results (one scraped post per item) into a dataset. The run-sync-get-dataset-items-apify-instagram-post-scraper call executes the actor and returns those dataset items in the response.
Key-value store (key OUTPUT): the actor also writes to the key-value store; the key OUTPUT holds a blob the actor considers the primary output (often combined or raw data). The run-sync-apify-instagram-post-scraper call executes the actor and returns the OUTPUT value.

These three resources relate as: input -> run -> {dataset items, key-value OUTPUT}.

Entry points — which operation to call first

Choose based on what the user expects back and whether you want the call to wait for completion:

If the user wants the run ID and run metadata immediately (start now, check later), call runs-sync-apify-instagram-post-scraper. It returns a response conforming to RunsResponseSchema, which contains the run identifier and status.
If the user wants the actor to complete and needs the structured scraped items (ready-to-use JSON posts), call run-sync-get-dataset-items-apify-instagram-post-scraper. The call waits for completion and returns the dataset items directly in the response body.
If the user wants whatever the actor places into the key-value OUTPUT (often an aggregated or raw output) and expects the call to wait until that value is available, call run-sync-apify-instagram-post-scraper. It waits for completion and returns the OUTPUT value.

All calls require a valid token and a body that conforms to the actor's InputSchema (set targets, limits, and other scraping options supported by the actor).

Common user tasks and the exact workflow to run

"Scrape N recent posts for this username and return the posts as JSON"

Action: call run-sync-get-dataset-items-apify-instagram-post-scraper with body fields that select the username and set the limit (e.g., max items or since/until). The response body will contain dataset items representing posts.
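
A minimal sketch of building the body for this task. The field names "username" and "resultsLimit" are assumptions chosen for illustration, and build_recent_posts_body is a hypothetical helper; take the real field names from the actor's InputSchema.

```python
from typing import Any


def build_recent_posts_body(username: str, max_posts: int) -> dict[str, Any]:
    """Build a request body for 'N recent posts for this username'.

    NOTE: "username" and "resultsLimit" are assumed field names; replace
    them with whatever the actor's InputSchema actually defines.
    """
    cleaned = username.strip().lstrip("@")  # accept "@name" or "name"
    if not cleaned:
        raise ValueError("username must not be empty")
    if max_posts < 1:
        raise ValueError("max_posts must be at least 1")
    return {"username": [cleaned], "resultsLimit": max_posts}


if __name__ == "__main__":
    # Pass this dict as the body of the dataset-items operation.
    print(build_recent_posts_body("@natgeo", 10))
```

Validating the limit and the username before the call avoids starting a run that can only fail or return nothing.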

"Run the scraper and return raw/combined output (HTML or bundle)"

Action: call run-sync-apify-instagram-post-scraper. The response body contains the actor's OUTPUT key-value content; use this when the actor bundles or post-processes results before writing them out.

"Start a scraping run now and give me the run ID so I can track it later"

Action: call runs-sync-apify-instagram-post-scraper and report the returned run metadata (run ID, status, timestamps). This operation returns no scraped data. If the user later asks for the data itself, use one of the two synchronous data operations; because each invocation creates a new run, that call starts a fresh execution rather than reading results from the earlier one. The run ID only documents which execution was started.

"Compare two runs or report run status"

Action: start both runs with runs-sync-apify-instagram-post-scraper (or reuse run IDs saved from earlier calls) and compare the status and timing fields in their run metadata. To compare the scraped results themselves, use the dataset items or the OUTPUT value returned by the synchronous data operations, since run metadata contains no post data.
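
The metadata comparison can be sketched as follows. The keys "id", "status", "startedAt", and "finishedAt" are assumptions about what RunsResponseSchema contains, and the sample runs are made-up values; check the schema for the actual field names.

```python
from datetime import datetime
from typing import Optional


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_run(run: dict) -> dict:
    """Reduce run metadata to ID, status, and duration in seconds."""
    started, finished = run.get("startedAt"), run.get("finishedAt")
    duration: Optional[float] = None
    if started and finished:
        duration = (parse_ts(finished) - parse_ts(started)).total_seconds()
    return {"id": run.get("id"), "status": run.get("status"), "duration_s": duration}


def compare_runs(first: dict, second: dict) -> dict:
    """Summarize both runs and report how much longer the second one took."""
    a, b = summarize_run(first), summarize_run(second)
    delta = None
    if a["duration_s"] is not None and b["duration_s"] is not None:
        delta = b["duration_s"] - a["duration_s"]
    return {"first": a, "second": b, "duration_delta_s": delta}


if __name__ == "__main__":
    # Hypothetical metadata for two finished runs.
    run_a = {"id": "runA", "status": "SUCCEEDED",
             "startedAt": "2024-01-01T10:00:00.000Z",
             "finishedAt": "2024-01-01T10:01:30.000Z"}
    run_b = {"id": "runB", "status": "SUCCEEDED",
             "startedAt": "2024-01-01T11:00:00.000Z",
             "finishedAt": "2024-01-01T11:02:00.000Z"}
    print(compare_runs(run_a, run_b))
```

A run that has not finished yet has no end timestamp, so its duration and the delta are reported as None instead of raising.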

For every task that requires specific input options (profile vs. hashtag vs. URL, maximum items, date range, include/exclude comments), read the actor's InputSchema and set the matching fields in the body. The InputSchema determines which selectors and limits exist, so set only fields it defines.
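
One way to enforce "set only supported fields" before calling is a small pre-flight check. This sketch assumes the InputSchema is available as a JSON-Schema-style dict with "properties" and "required" keys; check_body and the sample schema are hypothetical and only illustrate the idea.

```python
def check_body(body: dict, input_schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the body looks usable.

    ASSUMPTION: input_schema follows the JSON Schema convention of a
    "properties" mapping and an optional "required" list.
    """
    properties = input_schema.get("properties", {})
    required = input_schema.get("required", [])
    problems = [f"unsupported field: {name}" for name in body if name not in properties]
    problems += [f"missing required field: {name}" for name in required if name not in body]
    return problems


if __name__ == "__main__":
    # Hypothetical schema fragment; the real one comes from the actor.
    schema = {
        "properties": {"username": {"type": "array"}, "resultsLimit": {"type": "integer"}},
        "required": ["username"],
    }
    print(check_body({"username": ["natgeo"], "maxItems": 5}, schema))
```

The check only compares field names. It does not validate value types, which the actor itself may still reject.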

Non-obvious patterns and gotchas

Naming confusion: the API exposes three similarly named operations. Remember the difference by return type and whether they block for outputs:
- runs-sync-* = returns run metadata immediately (use when you only need the run ID/status).
- run-sync-get-dataset-items-* = waits and returns the dataset items (structured posts).
- run-sync-* = waits and returns the key-value OUTPUT (actor-defined primary output).
OUTPUT vs dataset: the actor may put different content in the dataset and in OUTPUT. Don’t assume they are identical—choose the operation that returns the form you actually need.
Long runs and waiting: both run-sync-apify-instagram-post-scraper and run-sync-get-dataset-items-apify-instagram-post-scraper block until the actor finishes. For large or slow scrapes the response can take a long time to arrive. If the user only needs confirmation that the run started, use runs-sync-apify-instagram-post-scraper and return the run ID.
Inspect the schemas before calling: the actor's InputSchema defines exactly how to specify targets (username, hashtag, start URLs) and limits. The RunsResponseSchema shows which run fields (ID, status, timestamps) will be returned. Read those schemas to know which body fields to set and what to expect in the response bodies.
Output formats vary by actor configuration: dataset item structure (which fields are present for each post) and the shape of OUTPUT depend on the actor’s internal mapping. Do not assume field names in the returned items—examine the actual response and map fields to user-facing output.
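
Because item field names are not guaranteed, a defensive mapping step can help when presenting results. In this sketch every key in FIELD_CANDIDATES ("url", "postUrl", "caption", "text", "likesCount", "likes") is a guess for illustration only; adjust the candidates after inspecting an actual response.

```python
from typing import Any, Iterable


def first_present(item: dict, candidates: Iterable[str], default: Any = None) -> Any:
    """Return the value of the first candidate key that exists and is not None."""
    for key in candidates:
        if item.get(key) is not None:
            return item[key]
    return default


# ASSUMPTION: these candidate names are illustrative, not taken from the actor.
FIELD_CANDIDATES = {
    "url": ("url", "postUrl"),
    "caption": ("caption", "text"),
    "likes": ("likesCount", "likes"),
}


def normalize_item(item: dict) -> dict:
    """Map one dataset item onto stable, user-facing field names."""
    return {name: first_present(item, keys) for name, keys in FIELD_CANDIDATES.items()}


if __name__ == "__main__":
    sample = {"postUrl": "https://example.com/p/1", "caption": "hello", "likes": 3}
    print(normalize_item(sample))
```

Fields that none of the candidates match come back as None, which makes gaps visible instead of failing with a KeyError.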
Token is required: every operation accepts a token argument; provide a valid token in all calls.

Quick decision guide

Need structured posts right now -> run-sync-get-dataset-items-apify-instagram-post-scraper.
Need raw or aggregated actor output (single blob) -> run-sync-apify-instagram-post-scraper.
Only need run ID/status (don't wait for completion) -> runs-sync-apify-instagram-post-scraper.
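
The decision guide above can be written as a small lookup. The operation names come from this document; choose_operation and its two flags are hypothetical helpers for illustration.

```python
RUN_METADATA = "runs-sync-apify-instagram-post-scraper"
DATASET_ITEMS = "run-sync-get-dataset-items-apify-instagram-post-scraper"
KEY_VALUE_OUTPUT = "run-sync-apify-instagram-post-scraper"


def choose_operation(wants_data: bool, wants_structured_items: bool = True) -> str:
    """Pick the operation name from what the caller expects back.

    wants_data=False            -> run metadata only (does not wait)
    wants_structured_items=True -> dataset items (waits for completion)
    otherwise                   -> key-value OUTPUT (waits for completion)
    """
    if not wants_data:
        return RUN_METADATA
    return DATASET_ITEMS if wants_structured_items else KEY_VALUE_OUTPUT


if __name__ == "__main__":
    print(choose_operation(wants_data=True))
    print(choose_operation(wants_data=True, wants_structured_items=False))
    print(choose_operation(wants_data=False))
```

Keeping the three names as constants in one place reduces the risk of confusing runs-sync-* with run-sync-*.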

Final notes

When preparing the body, set the selection criteria (username, hashtag, URL), result limits, and any filtering options the InputSchema exposes. Use the run metadata returned by runs-sync-apify-instagram-post-scraper when you need to reference or document which execution produced results. Choose the synchronous dataset or key-value operation when the user expects the data itself in the response.