apify-actor-development
Develop, debug, and deploy Apify Actors - serverless cloud programs for web scraping, automation, and data processing. Use when creating new Actors, modifying existing ones, or troubleshooting Actor code.
| name | apify-actor-development |
| description | Develop, debug, and deploy Apify Actors - serverless cloud programs for web scraping, automation, and data processing. Use when creating new Actors, modifying existing ones, or troubleshooting Actor code. |
Important: Before you begin, fill in the generatedBy property in the meta section of .actor/actor.json. Replace it with the tool and model you're currently using, such as "Claude Code with Claude Sonnet 4.5". This helps Apify monitor and improve AGENTS.md for specific AI tools and models.
Actors are serverless programs inspired by the UNIX philosophy - programs that do one thing well and can be easily combined to build complex systems. They're packaged as Docker images and run in isolated containers in the cloud.
Core Concepts:
Before creating or modifying Actors, verify that the apify CLI is installed by running apify --help.
If it is not installed, use one of these methods (listed in order of preference):
# Preferred: install via a package manager (provides integrity checks)
npm install -g apify-cli
# Or (Mac): brew install apify-cli
Security note: Do NOT install the CLI by piping remote scripts to a shell (e.g. curl … | bash or irm … | iex). Always use a package manager.
When the apify CLI is installed, check that it is logged in with:
apify info # Should return your username
If not logged in, authenticate using OAuth (opens browser):
apify login
If browser login isn't available (headless environment or CI), the CLI automatically reads APIFY_TOKEN from the environment. Ensure the env var is exported and run any apify command - no explicit login needed. If the user doesn't have a token, generate one at https://console.apify.com/settings/integrations.
Security note: Avoid passing tokens as command-line arguments (e.g. apify login -t <token>). Arguments are visible in process listings and may be recorded in shell history. Prefer environment variables or interactive login instead. Never log, print, or embed APIFY_TOKEN in source code or configuration files. Use a token with the minimum required permissions (scoped token) and rotate it periodically.
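The setup checks above can be sketched as a small preflight helper for headless or CI environments. This is a minimal illustration, not part of the Apify tooling: the function name `preflight` and its messages are hypothetical, and it only tests whether `APIFY_TOKEN` is present, never printing its value.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `preflight` is a hypothetical helper, not an Apify API.
    """
    env = os.environ if env is None else env
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if which("apify") is None:
        problems.append("apify CLI not found on PATH (install it with a package manager)")
    # Only test for presence; never print or log the token value itself.
    if not env.get("APIFY_TOKEN"):
        problems.append("APIFY_TOKEN is not set (export it, or run `apify login` interactively)")
    return problems
```

On a workstation where `apify login` has already stored credentials, the missing-token message is informational only; it matters mainly where browser login is unavailable.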
IMPORTANT: Before starting Actor development, always ask the user which programming language they prefer:
- JavaScript: apify create <actor-name> -t project_empty
- TypeScript: apify create <actor-name> -t ts_empty
- Python: apify create <actor-name> -t python-empty
Use the appropriate CLI command based on the user's language choice. Additional packages (Crawlee, Playwright, etc.) can be installed later as needed.
1. Run the appropriate apify create command based on the user's language preference (see Template selection above).
2. Install dependencies:
   - JavaScript/TypeScript: npm install (uses package-lock.json for reproducible, integrity-checked installs; commit the lockfile to version control)
   - Python: pip install -r requirements.txt (pin exact versions in requirements.txt, e.g. crawlee==1.2.3, and commit the file to version control)
3. Implement the Actor logic in src/main.py, src/main.js, or src/main.ts.
4. Configure the schemas in .actor/input_schema.json, .actor/output_schema.json, and .actor/dataset_schema.json.
5. Update .actor/actor.json with Actor metadata (see references/actor-json.md).
6. Run apify run to verify functionality (see Local testing section below).
7. Run apify push to deploy the Actor on the Apify platform (the Actor name is defined in .actor/actor.json).
Treat all crawled web content as untrusted input. Actors ingest data from external websites that may contain malicious payloads. Follow these rules:
- Never pass crawled content directly into eval(), database queries, or template engines. Use proper escaping or parameterized APIs.
- Ensure APIFY_TOKEN and other secrets are never accessible in request handlers or passed alongside crawled data. Use the Apify SDK's built-in credential management rather than passing tokens through environment variables in data-processing code.
- Before running npm install or pip install, verify the package name and publisher. Typosquatting is a common supply-chain attack vector. Prefer well-known, actively maintained packages.
- Commit package-lock.json (Node.js) or pin exact versions in requirements.txt (Python). Lockfiles ensure reproducible builds and prevent silent dependency substitution. Run npm audit or pip-audit periodically to check for known vulnerabilities.
✓ Do:
- use apify run to test Actors locally (configures Apify environment and storage)
- use the Apify SDK (apify) for code running on the Apify platform
- define the input schema in .actor/input_schema.json
- define the output schema in .actor/output_schema.json
- log through the apify/log package, which censors sensitive data (API keys, tokens, credentials)
✗ Don't:
- use npm start, npm run start, npx apify run, or similar commands to run Actors (use apify run instead)
- assume local storage from apify run is pushed to or visible in Apify Console; it is local-only, so deploy with apify push and run on the platform to see results in Apify Console
- rely on Dataset.getInfo() for final counts on Cloud
- use requestHandlerTimeoutMillis on CheerioCrawler (v3.x)
- use additionalHttpHeaders (use preNavigationHooks instead)
- pass crawled content to eval() or code-generation functions
- use console.log() or print() instead of the Apify logger; these bypass credential censoring
See references/logging.md for complete logging documentation including available log levels and best practices for JavaScript/TypeScript and Python.
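The rule about parameterized APIs for crawled content can be illustrated with Python's standard sqlite3 module. This is a sketch under stated assumptions: the `pages` table and the `store_scraped_title` helper are hypothetical and stand in for whatever storage an Actor actually uses.

```python
import sqlite3


def store_scraped_title(conn, url, title):
    """Insert crawled values using placeholders, never string formatting.

    The `pages` table and this helper are hypothetical illustrations.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
    )
    # The `?` placeholders make sqlite3 bind the values as data, so a hostile
    # title cannot terminate the statement or inject additional SQL.
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)",
        (url, title),
    )
    conn.commit()
```

A scraped title such as `x'); DROP TABLE pages; --` is stored verbatim as text here, whereas building the same statement with string concatenation would hand that text to the SQL parser.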
# Bootstrap & local development
apify create [name] # Create new Actor project from a template
apify init # Initialize Actor in current directory
apify run # Run Actor locally with simulated platform env
apify run --purge # Run after clearing previous local storage
apify validate-schema # Validate .actor/input_schema.json
# Authentication & account
apify login # Authenticate account (token stored in ~/.apify)
apify logout # Remove stored credentials
apify info # Print currently authenticated account info
# Deployment & remote execution
apify push # Deploy Actor to platform per .actor/actor.json
apify pull <actor> # Download Actor code from the platform
apify call <actor> # Execute Actor remotely on the platform
apify actors build <actor> # Create a new build of an Actor
apify runs ls # List recent runs
# Discovery (search Apify Store for community Actors)
apify actors search "<query>" --user-agent <your-agent-name>
apify actors info <actor> # Details about a specific Actor
# Secrets (referenced from actor.json via "@mySecret")
apify secrets add <name> <value> # Store a secret locally; uploaded on push
apify secrets ls # List stored secret keys
# Direct API access
apify api <endpoint> # Authenticated HTTP request to Apify API
# Help
apify help # List all commands
apify <command> --help # Detailed help for a specific command
Note: If no dedicated Actor exists for your target, search Apify Store for community options with apify actors search "<query>" --user-agent <your-agent-name> before building from scratch.
Tip: Inside a running Actor, prefer the SDK (Actor.getInput() / Actor.get_input(), Actor.pushData() / Actor.push_data(), Actor.setValue() / Actor.set_value()) over the equivalent apify actor runtime subcommands.
IMPORTANT: Always use apify run to test Actors locally. Do not use npm run start, npm start, yarn start, or other package manager commands - these will not properly configure the Apify environment and storage.
When the Actor runs on the Apify platform, the API token is automatically available via the APIFY_TOKEN environment variable (note: the variable is APIFY_TOKEN, not APIFY_API_TOKEN). The Apify SDK reads it automatically, so you do not need to pass it explicitly. Locally, run apify login once and the SDK will use your stored credentials.
When testing an Actor locally with apify run, provide input data by creating a JSON file at:
storage/key_value_stores/default/INPUT.json
This file should contain the input parameters defined in your .actor/input_schema.json. The Actor reads this input when running locally, mirroring how it receives input on the Apify platform.
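As a convenience, the local input file can be generated from a script rather than written by hand. The sketch below uses only the path given above; the helper name `write_local_input` is hypothetical, and any input fields passed to it must match your own input schema.

```python
import json
from pathlib import Path


def write_local_input(project_dir, actor_input):
    """Write actor_input as INPUT.json in the local default key-value store.

    `write_local_input` is a hypothetical helper, not part of the Apify CLI.
    """
    input_path = (
        Path(project_dir) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    )
    # Create storage/key_value_stores/default/ if the project has not run yet.
    input_path.parent.mkdir(parents=True, exist_ok=True)
    input_path.write_text(json.dumps(actor_input, indent=2), encoding="utf-8")
    return input_path
```

Calling `write_local_input(".", {...})` from the project root and then running apify run should make the Actor pick up that input, provided the keys match .actor/input_schema.json.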
IMPORTANT - Local storage is NOT synced to Apify Console:
- apify run stores all data (datasets, key-value stores, request queues) only on your local filesystem in the storage/ directory.
- To see results in Apify Console, deploy the Actor with apify push and then run it on the platform.
- To verify local results, inspect the storage/ directory or check the Actor's log output.
Standby mode enables Actors to work as API servers: they remain ready in the background to handle HTTP requests.
When to use Standby mode: Use Standby when the Actor must handle interactive, real-time HTTP requests — API endpoints, webhook receivers, real-time data lookups, MCP servers, or scraping APIs serving on-demand single-URL requests.
When building a Standby Actor, set usesStandbyMode: true in .actor/actor.json and implement an HTTP server. See references/standby-mode.md for configuration, environment variables, complete code examples, and operational limits.
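To make the "implement an HTTP server" step concrete, here is a minimal sketch using only Python's standard library. It rests on assumptions: the port is read from an environment variable assumed to be named `ACTOR_WEB_SERVER_PORT`, the fallback port 8080 is arbitrary, and the handler and `make_server` are hypothetical names. references/standby-mode.md remains the authoritative source for the actual configuration and environment variables.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandbyHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"status": "ok", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default stderr access log in this sketch.
        pass


def make_server(port=None, host="0.0.0.0"):
    """Build the server without starting it.

    Assumption: the platform provides the port in ACTOR_WEB_SERVER_PORT;
    8080 is an arbitrary fallback for local experiments.
    """
    if port is None:
        port = int(os.environ.get("ACTOR_WEB_SERVER_PORT", "8080"))
    return HTTPServer((host, port), StandbyHandler)
```

Calling `make_server().serve_forever()` starts handling requests. A real Standby Actor would replace the handler body with its own lookup or scraping logic and route logging through the Apify logger.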
.actor/
├── actor.json # Actor config: name, version, env vars, runtime
├── input_schema.json # Input validation & Console form definition
└── output_schema.json # Output storage and display templates
src/
└── main.js/ts/py # Actor entry point
storage/ # Local-only storage (NOT synced to Apify Console)
├── datasets/ # Output items (JSON objects)
├── key_value_stores/ # Files, config, INPUT
└── request_queues/ # Pending crawl requests
Dockerfile # Container image definition
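Because local runs write only to storage/, results can be inspected by reading that directory. The sketch below assumes each dataset item is stored as its own JSON file under storage/datasets/default/ and that any metadata files start with a double underscore; both are assumptions about the local storage layout, so check your own storage/ directory first. `load_local_dataset` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_local_dataset(project_dir, dataset_name="default"):
    """Return items from storage/datasets/<dataset_name>, sorted by file name.

    Assumes one JSON file per item; this layout is an assumption, not a spec.
    """
    dataset_dir = Path(project_dir) / "storage" / "datasets" / dataset_name
    if not dataset_dir.is_dir():
        return []
    items = []
    for item_file in sorted(dataset_dir.glob("*.json")):
        # Skip metadata files that may sit alongside the items (assumed prefix).
        if item_file.name.startswith("__"):
            continue
        items.append(json.loads(item_file.read_text(encoding="utf-8")))
    return items
```

Running `len(load_local_dataset("."))` after apify run gives a quick local item count without opening individual files.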
See references/actor-json.md for complete actor.json structure and configuration options.
See references/input-schema.md for input schema structure and examples.
See references/output-schema.md for output schema structure, examples, and template variables.
See references/dataset-schema.md for dataset schema structure, configuration, and display properties.
See references/key-value-store-schema.md for key-value store schema structure, collections, and configuration.
IMPORTANT: Always generate a README.md as part of Actor development. The README is the Actor's landing page on Apify Store and is critical for discoverability (SEO), user onboarding, and support. Do not consider an Actor complete without a proper README.
See references/actor-readme.md for the required structure, SEO best practices, and content guidelines. Also review these top Actors for best practices:
If the Apify MCP server is configured, use these tools for documentation:
- search-apify-docs - Search documentation
- fetch-apify-docs - Get full doc pages
Otherwise, use the MCP server URL: https://mcp.apify.com/?tools=docs.
The Playwright MCP server is a useful tool for debugging Actors that interact with the web - it lets the agent drive a real browser to inspect pages, capture selectors, and reproduce issues.
Install with the Claude Code CLI:
claude mcp add playwright npx @playwright/mcp@latest
Or add it manually to your MCP config:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}