instagram-profile-scraper

This Actor package exposes three ways to run the Instagram profile scraper and get results. The domain revolves around three concepts: an Actor run (the execution), the Actor’s dataset (structured items the run produces), and the Actor’s key-value store (named blobs such as the OUTPUT JSON). Every operation requires a valid Apify token and an InputSchema body that configures what to scrape and how.

How the domain is organized

An Actor run is the unit of work. A run may produce a dataset (zero or more items) and write one or more key-value entries. The scraper stores a final summary or full JSON output under the OUTPUT key in the key-value store; it also emits structured records into the run's dataset.
Run metadata ties these outputs together: it contains the identifiers (or links) of the dataset and key-value store that the run wrote to. Use the run metadata to find where results were placed when you do not receive them inline.
The InputSchema controls the scrape: which profile(s) to fetch, limits, authentication/proxy options, and other scraper options. Populate the fields required by that schema to direct the Actor.
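The link from a run to its outputs can be sketched as follows. This is a minimal sketch that assumes the run metadata follows the Apify API v2 run object: it is wrapped in a `data` field and carries `id`, `defaultDatasetId` and `defaultKeyValueStoreId`. The helper name `output_locations` is hypothetical.

```python
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2"  # Apify API v2 base URL


def output_locations(run_response: Dict[str, Any]) -> Dict[str, str]:
    """Map run metadata to the URLs where that run's outputs live.

    Assumes the Apify v2 run object shape:
    {"data": {"id": ..., "defaultDatasetId": ..., "defaultKeyValueStoreId": ...}}
    """
    # Accept either the wrapped response or the bare run object.
    run = run_response.get("data", run_response)
    dataset_id = run["defaultDatasetId"]
    store_id = run["defaultKeyValueStoreId"]
    return {
        "run": f"{API_BASE}/actor-runs/{run['id']}",
        "dataset_items": f"{API_BASE}/datasets/{dataset_id}/items",
        # The scraper writes its summary/full JSON under the OUTPUT key.
        "output_record": f"{API_BASE}/key-value-stores/{store_id}/records/OUTPUT",
    }
```

Each URL still needs the token when it is fetched.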

Entry points — which operation to call first

Choose the entry point based on whether the user wants immediate final data or only to start/inspect a run:

For immediate final results returned inline (blocking until the Actor finishes):
- Use run-sync-apify-instagram-profile-scraper when the caller needs the Actor's key-value OUTPUT JSON returned in the response. The response body contains the OUTPUT content.
- Use run-sync-get-dataset-items-apify-instagram-profile-scraper when the caller needs the Actor's dataset items (structured results) returned in the response body.
To start a run without waiting for it to finish, use runs-sync-apify-instagram-profile-scraper. Despite "sync" in its name, it returns once the run has been initiated rather than when it completes. The response body contains run metadata (IDs, status, and references to the dataset and key-value store). Use that metadata to inspect or retrieve outputs later.

All three operations require the same token and an InputSchema body. Pick the operation whose response shape matches what the user asked for: dataset items vs key-value OUTPUT vs run metadata.

Common user requests and the sequence to satisfy them

"Scrape profile @username and return structured posts": call run-sync-get-dataset-items-apify-instagram-profile-scraper with the InputSchema field(s) that identify the profile (username, profile URL, or profile id) and any limits (max posts). The call will block until the Actor finishes and return the dataset items in the response body.
"Get the raw JSON output for this profile scrape": call run-sync-apify-instagram-profile-scraper with the same input; the call will wait for completion and return the key-value store entry placed under OUTPUT in the response body.
"Start a scrape now and give me the run id so I can check back later": call runs-sync-apify-instagram-profile-scraper. Read the run id, status, dataset id, and key-value store id from the returned run metadata. Use those identifiers in subsequent retrieval calls (or to surface run links to users).
"Scrape many profiles or long historical timelines": prefer runs-sync-apify-instagram-profile-scraper when runs may take long or when you plan to retrieve results asynchronously. A synchronous call only waits for a limited time, and long runs can exceed that limit; starting the run and querying its metadata avoids blocking the caller.

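For the "check back later" requests above, the caller polls the run until it finishes and then reads its outputs. This is a minimal sketch that assumes the Apify API v2 run endpoint `GET /v2/actor-runs/{runId}` and Apify's terminal run statuses (`SUCCEEDED`, `FAILED`, `TIMED-OUT`, `ABORTED`). The HTTP call is injected as `get_json`, which must also attach the token, so no client library is assumed. `wait_for_run` is a hypothetical helper name.

```python
import time
from typing import Any, Callable, Dict

API_BASE = "https://api.apify.com/v2"
# Terminal Apify run statuses; anything else (READY, RUNNING, TIMING-OUT, ABORTING) means keep waiting.
TERMINAL = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def wait_for_run(
    get_json: Callable[[str], Dict[str, Any]],
    run_id: str,
    poll_seconds: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the run until it reaches a terminal status and return the run object.

    get_json is injected: it should perform an authenticated GET and return parsed JSON.
    """
    url = f"{API_BASE}/actor-runs/{run_id}"
    for _ in range(max_polls):
        run = get_json(url)["data"]  # run metadata is wrapped in "data"
        if run["status"] in TERMINAL:
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} still not finished after {max_polls} polls")
```

Once the returned run reports `SUCCEEDED`, its `defaultDatasetId` and `defaultKeyValueStoreId` point at the outputs. Any other terminal status should be surfaced to the user rather than treated as an empty result.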
What to look for in responses

run-sync-* operations that return data inline put the final results in the response body. Confirm whether the body is the dataset items array (from the dataset-oriented call) or the key-value OUTPUT object (from the key-value call).
runs-sync-* responses return run metadata in the body (typed as RunsResponseSchema). Read the run id, status, and startedAt/finishedAt timestamps from it, along with the IDs of the dataset and key-value store, so you can locate outputs later.
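A defensive check on the response shape catches a call to the wrong operation early. This sketch assumes the dataset call returns a JSON array, the OUTPUT call returns whatever object the Actor stored under OUTPUT, and the run-metadata call returns an Apify run object wrapped in `data`. The function name and the heuristic are illustrative, not part of any API.

```python
from typing import Any


def response_kind(body: Any) -> str:
    """Guess which operation produced a parsed JSON body.

    Heuristic (an assumption, not an API guarantee):
    - a list is dataset items,
    - a dict whose "data" holds run fields is run metadata,
    - any other dict is the key-value OUTPUT record.
    """
    if isinstance(body, list):
        return "dataset_items"
    if isinstance(body, dict):
        data = body.get("data")
        if isinstance(data, dict) and "status" in data and "defaultDatasetId" in data:
            return "run_metadata"
        return "output"
    raise TypeError(f"unexpected body type: {type(body).__name__}")
```

If the Actor ever stores a list under OUTPUT, this heuristic misclassifies it, so tracking which operation you called remains the reliable signal.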

InputSchema: typical parameters and expectations

The InputSchema defines what the Actor accepts. Typical fields used in user requests are: a profile identifier (username or profile URL), results limits (how many posts to fetch), authentication or cookie inputs (when scraping private or login-gated content), and proxy settings. Populate the exact fields required by InputSchema for the scraper to work as intended.

Common practical points about InputSchema usage:

If the profile is public and only recent posts are needed, set a low results limit to speed up runs and reduce the chance of being blocked.
If the target profile requires login or is rate-limited by Instagram, include credentials/cookies and/or proxy configuration in the input if the schema supports them.
For bulk scraping, include batching parameters (if present) rather than submitting many single-profile runs.
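A minimal sketch of assembling an InputSchema body along these lines. The field names (`usernames`, `resultsLimit`, `proxy`) and the proxy object shape are assumptions for illustration; the authoritative names and types come from the Actor's InputSchema. Credentials and cookies are left out because their field names are schema-specific. `build_input` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List, Optional


def build_input(
    profiles: Iterable[str],
    results_limit: Optional[int] = None,
    use_proxy: bool = False,
) -> Dict[str, Any]:
    """Build a hypothetical InputSchema body; every field name here is an assumption."""
    # Accept "@name", "name" or a profile URL and reduce each to a bare username.
    usernames: List[str] = []
    for raw in profiles:
        name = raw.strip()
        if "instagram.com/" in name:
            name = name.split("instagram.com/", 1)[1].split("/", 1)[0].split("?", 1)[0]
        name = name.lstrip("@")
        if name and name not in usernames:
            usernames.append(name)
    if not usernames:
        raise ValueError("at least one profile identifier is required")

    # One body with many profiles, rather than many single-profile runs.
    body: Dict[str, Any] = {"usernames": usernames}  # assumed field name
    if results_limit is not None:
        if results_limit < 1:
            raise ValueError("results_limit must be positive")
        body["resultsLimit"] = results_limit  # assumed field name; keep low for quick runs
    if use_proxy:
        body["proxy"] = {"useApifyProxy": True}  # assumed shape of a proxy option
    return body
```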

Gotchas and non-obvious behaviors

Synchronous vs asynchronous: the run-sync-* operations wait for the Actor run to finish and return results inline. If a run is long-running or may time out, use runs-sync-apify-instagram-profile-scraper so you get run metadata immediately and can retrieve outputs when ready.
Two result locations: the Actor may write outputs both to the dataset and to the key-value store. Pick the run-sync operation that returns the data you want. If you call the wrong one, the response body may be present but not in the shape you expect.
Partial results and truncation: large scrapes can produce very large datasets. If an inline run-sync-get-dataset-items-apify-instagram-profile-scraper response looks truncated or smaller than expected, start the scrape with runs-sync-apify-instagram-profile-scraper instead, take the dataset id from the returned run metadata, and check whether the dataset holds more items.
Authentication and access-limited profiles: scraping private or login-required profiles will not succeed without supplying the appropriate credentials/cookies in InputSchema, if the Actor supports them. Expect empty or partial results when credentials are missing.
Anti-bot / proxy needs: Instagram often enforces rate-limits and blocking. If InputSchema supports proxy or session options, include those for reliability when scraping multiple profiles or deep histories.
Token requirement: every operation requires a valid Apify token supplied as the token parameter. A missing or invalid token will prevent any run from starting.
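Checking whether a dataset holds more items than an inline response showed amounts to paging through it. This sketch assumes the Apify API v2 endpoint `GET /v2/datasets/{datasetId}/items` with its `offset`, `limit` and `format` query parameters, and passes the token as the `token` query parameter. The HTTP GET is injected as `get_json` so the sketch stays transport-agnostic; `iter_dataset_items` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterator, List
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def iter_dataset_items(
    get_json: Callable[[str], List[Dict[str, Any]]],
    dataset_id: str,
    token: str,
    page_size: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield every item in a dataset, one page at a time.

    get_json performs the GET and returns the parsed JSON array for that page.
    """
    if page_size < 1:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        query = urlencode({"token": token, "offset": offset, "limit": page_size, "format": "json"})
        page = get_json(f"{API_BASE}/datasets/{dataset_id}/items?{query}")
        yield from page
        if len(page) < page_size:  # a short (or empty) page means we reached the end
            return
        offset += len(page)
```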

Practical checklist for common workflows

When a user asks to scrape a profile and return results:

Decide whether they need immediate results (use a run-sync-* operation) or just to start a run and retrieve later (use runs-sync-*).
Build the InputSchema body to include the profile identifier and any limits, credentials, or proxy settings required.
Call the chosen operation with token and the InputSchema body.
If you used runs-sync-*, read the run id, dataset id, and key-value store id from the returned run metadata so you can retrieve outputs later. If you used a run-sync-* operation, the body holds either the dataset items or the OUTPUT key-value content, depending on which operation you called.
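The checklist maps onto a small client function. This is a sketch under stated assumptions: it uses only the Python standard library, assumes the three operations correspond to the Apify API v2 endpoints `POST /v2/acts/apify~instagram-profile-scraper/run-sync`, `.../run-sync-get-dataset-items` and `.../runs`, and sends the token as the `token` query parameter. The `post` parameter lets you substitute another transport or a test stub; `scrape` and `http_post_json` are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR = "apify~instagram-profile-scraper"  # assumed Actor id in URL form
# Step 1: what the user wants -> assumed endpoint suffix.
SUFFIX = {"output": "run-sync", "dataset_items": "run-sync-get-dataset-items", "run_metadata": "runs"}


def http_post_json(url: str, body: Dict[str, Any]) -> Any:
    """POST a JSON body and return the parsed JSON response (raises HTTPError on non-2xx)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scrape(want: str, token: str, input_body: Dict[str, Any],
           post: Optional[Callable[[str, Dict[str, Any]], Any]] = None) -> Any:
    if want not in SUFFIX:
        raise ValueError(f"want must be one of {sorted(SUFFIX)}")
    if not token:
        raise ValueError("a valid Apify token is required")
    # Steps 2-3: send the InputSchema body to the chosen operation.
    url = f"{API_BASE}/acts/{ACTOR}/{SUFFIX[want]}?" + urlencode({"token": token})
    body = (post or http_post_json)(url, input_body)
    # Step 4: run metadata comes back wrapped in "data"; inline results are returned as-is.
    if want == "run_metadata":
        run = body["data"]
        return {"run_id": run["id"], "status": run["status"],
                "dataset_id": run["defaultDatasetId"],
                "key_value_store_id": run["defaultKeyValueStoreId"]}
    return body
```

With a real token, `scrape("dataset_items", token, {"usernames": ["nasa"]})` would block until the run finishes; `usernames` is an assumed field name that the Actor's InputSchema must confirm.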

Following these patterns keeps the response shape in line with what the user asked for (structured dataset vs key-value JSON vs run metadata), avoids blocking callers on long runs, and avoids empty results caused by missing credentials.