apify-generate-output-schema
Generate output schemas (dataset_schema.json, output_schema.json, key_value_store_schema.json) for an Apify Actor by analyzing its source code. Use when creating or updating Actor output schemas.
| Field | Value |
|---|---|
| name | apify-generate-output-schema |
| description | Generate output schemas (dataset_schema.json, output_schema.json, key_value_store_schema.json) for an Apify Actor by analyzing its source code. Use when creating or updating Actor output schemas. |
| user-invocable | false |
You are generating output schema files for an Apify Actor. The output schema tells Apify Console how to display run results. You will analyze the Actor's source code, create dataset_schema.json, output_schema.json, and key_value_store_schema.json (if the Actor uses a key-value store), and update actor.json.
Phase 1: Discovery
Goal: Locate the Actor and understand its output
Use the user's most recent request as the scope for this skill (which Actor to target, which subdirectory, any specific fields to focus on). If the scope is unclear, ask one clarifying question before continuing.
Actions:
- Find the .actor/ directory containing actor.json
- Read actor.json to understand the Actor's configuration
- Check whether dataset_schema.json, output_schema.json, and key_value_store_schema.json already exist
- Search the repository for other .actor/ directories or schema files (e.g., **/dataset_schema.json, **/output_schema.json, **/key_value_store_schema.json) to learn the repo's conventions, and match their description style, field naming, example formatting, and overall structure
- Find all dataset output calls:
  - JavaScript/TypeScript: Actor.pushData(, dataset.pushData(, Dataset.pushData(
  - Python: Actor.push_data(, dataset.push_data(, Dataset.push_data(
- Find all key-value store write calls:
  - JavaScript/TypeScript: Actor.setValue(, keyValueStore.setValue(, KeyValueStore.setValue(
  - Python: Actor.set_value(, key_value_store.set_value(, KeyValueStore.set_value(
- Check for existing output type definitions (e.g., in src/types/, src/types/output.ts). If an interface or type already defines the output shape, derive the schema fields from it; do not create a parallel definition
- If an inline storages.dataset or storages.keyValueStore config exists in actor.json, note it for migration

Present findings to user: list all discovered dataset output fields, key-value store keys, their types, and where they come from.
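The call-site search above can be approximated with a plain text scan. This is a minimal sketch, assuming the Actor's source files end in .js, .mjs, .ts, or .py; find_output_calls and the extension list are hypothetical, and a substring search will miss calls made through aliased variables, so each hit still needs to be read by hand to see which fields the pushed object carries.

```python
from pathlib import Path

# Call patterns from the Phase 1 actions (JS/TS and Python SDK spellings).
DATASET_CALLS = (
    "Actor.pushData(", "dataset.pushData(", "Dataset.pushData(",
    "Actor.push_data(", "dataset.push_data(", "Dataset.push_data(",
)
KV_CALLS = (
    "Actor.setValue(", "keyValueStore.setValue(", "KeyValueStore.setValue(",
    "Actor.set_value(", "key_value_store.set_value(", "KeyValueStore.set_value(",
)
# Hypothetical choice of which files count as Actor source.
SOURCE_SUFFIXES = {".js", ".mjs", ".ts", ".py"}


def find_output_calls(root):
    """Return {"dataset": [...], "kv": [...]} as (path, line_number, line) tuples."""
    hits = {"dataset": [], "kv": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        if "node_modules" in path.parts:
            continue  # skip installed dependencies
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if any(call in line for call in DATASET_CALLS):
                hits["dataset"].append((str(path), line_no, line.strip()))
            if any(call in line for call in KV_CALLS):
                hits["kv"].append((str(path), line_no, line.strip()))
    return hits
```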
Phase 2: dataset_schema.json
Goal: Create a complete dataset schema with field definitions and display views
{
"actorSpecification": 1,
"fields": {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
// ALL output fields here — every field the Actor can produce,
// not just the ones shown in the overview view
},
"required": [],
"additionalProperties": true
},
"views": {
"overview": {
"title": "Overview",
"description": "Most important fields at a glance",
"transformation": {
"fields": [
// 8-12 most important field names
]
},
"display": {
"component": "table",
"properties": {
// Display config for each overview field
}
}
}
}
}
If existing output schemas were found elsewhere in the repository during Phase 1, follow their conventions: match their description style, field naming, example formatting, and overall structure.
When the Actor code already has well-defined TypeScript interfaces or Python type classes, derive fields directly from those types rather than re-analyzing pushData/push_data calls from scratch. The type definition is the canonical source.
| Rule | Detail |
|---|---|
| All fields in properties | The fields.properties object must contain every field the Actor can output, not just the fields shown in the overview view. The views section selects a subset for display; the properties section must be the complete superset |
| "nullable": true | On every field, since APIs are unpredictable |
| "additionalProperties": true | On the top-level fields object AND on every nested object within properties. This is the most commonly missed rule: it must appear at both levels |
| "required": [] | Always an empty array, on the top-level fields object AND on every nested object within properties |
| Anonymized examples | No real user IDs, usernames, or content |
| "type" required with "nullable" | AJV rejects nullable without a type on the same field |
Warning: the most common mistakes are:
- Only including fields that appear in the overview view. The fields.properties object must list ALL output fields, even if they are not in the views section.
- Adding "required": [] and "additionalProperties": true only on nested object-type properties but forgetting them on the top-level fields object. Both levels need them.
Note: nullable is an Apify-specific extension to JSON Schema draft-07. It is intentional and correct.
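The rules in the table above are mechanical enough to check before presenting a schema. This is a minimal sketch; check_dataset_schema is a hypothetical helper that covers only the object-level rules, "nullable", "type"-with-"nullable", and overview columns being defined in fields.properties. It recurses only into fields whose type is exactly "object" (so union types like ["object", "string"] are not checked) and does not descend into array items.

```python
def check_dataset_schema(schema):
    """Return a list of rule violations for a dataset_schema.json dict."""
    problems = []

    def check_object(obj, path):
        # Object-level rules apply to the top-level "fields" and every nested object.
        if obj.get("additionalProperties") is not True:
            problems.append(f'{path}: missing "additionalProperties": true')
        if obj.get("required") != []:
            problems.append(f'{path}: "required" must be []')
        for name, field in obj.get("properties", {}).items():
            field_path = f"{path}.{name}"
            if field.get("nullable") is not True:
                problems.append(f'{field_path}: missing "nullable": true')
            if "nullable" in field and "type" not in field:
                problems.append(f'{field_path}: "nullable" without "type"')
            if field.get("type") == "object" or "properties" in field:
                check_object(field, field_path)

    check_object(schema.get("fields", {}), "fields")

    # Every column selected by a view must be defined in fields.properties.
    defined = set(schema.get("fields", {}).get("properties", {}))
    for view_name, view in schema.get("views", {}).items():
        for column in view.get("transformation", {}).get("fields", []):
            if column not in defined:
                problems.append(f"views.{view_name}: {column!r} not in fields.properties")
    return problems
```

An empty list means the schema passed these checks; anything else lists the exact field paths to fix.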
String field:
"title": {
"type": "string",
"description": "Title of the scraped item",
"nullable": true,
"example": "Example Item Title"
}
Number field:
"viewCount": {
"type": "number",
"description": "Number of views",
"nullable": true,
"example": 15000
}
Boolean field:
"isVerified": {
"type": "boolean",
"description": "Whether the account is verified",
"nullable": true,
"example": true
}
Array field:
"hashtags": {
"type": "array",
"description": "Hashtags associated with the item",
"items": { "type": "string" },
"nullable": true,
"example": ["#example", "#demo"]
}
Nested object field:
"authorInfo": {
"type": "object",
"description": "Information about the author",
"properties": {
"name": { "type": "string", "nullable": true },
"url": { "type": "string", "nullable": true }
},
"required": [],
"additionalProperties": true,
"nullable": true,
"example": { "name": "Example Author", "url": "https://example.com/author" }
}
Enum field:
"contentType": {
"type": "string",
"description": "Type of content",
"enum": ["article", "video", "image"],
"nullable": true,
"example": "article"
}
Union type (e.g., TypeScript ObjectType | string):
"metadata": {
"type": ["object", "string"],
"description": "Structured metadata object, or error string if unavailable",
"nullable": true,
"example": { "key": "value" }
}
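When no type definitions exist, a first draft of a field entry can be produced from one sample value the Actor pushes, following the patterns above. This is a minimal sketch; field_from_value and its placeholder description are hypothetical, sample values must be anonymized before being reused as examples, and a None sample yields no "type", which must then be filled in by hand (AJV rejects "nullable" without a type).

```python
def field_from_value(value, description="TODO: describe this field"):
    """Draft a dataset_schema.json field entry from one sample value."""
    # bool must be tested before int/float, because bool is a subclass of int.
    if isinstance(value, bool):
        field = {"type": "boolean"}
    elif isinstance(value, (int, float)):
        field = {"type": "number"}
    elif isinstance(value, str):
        field = {"type": "string"}
    elif isinstance(value, list):
        field = {"type": "array"}
        item = field_from_value(value[0]) if value else {}
        if "type" in item:
            field["items"] = {"type": item["type"]}
    elif isinstance(value, dict):
        field = {
            "type": "object",
            "properties": {k: field_from_value(v) for k, v in value.items()},
            # Nested objects need the same two rules as the top-level fields object.
            "required": [],
            "additionalProperties": True,
        }
    else:
        # None (or an unrecognized value) carries no type information; add "type" by hand.
        field = {}
    field["description"] = description
    field["nullable"] = True
    if value is not None:
        field["example"] = value
    return field
```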
Use realistic but generic values. Follow platform ID format conventions:
| Field type | Example approach |
|---|---|
| IDs | Match platform format and length (e.g., 11 chars for YouTube video IDs) |
| Usernames | "exampleuser", "sampleuser123" |
| Display names | "Example Channel", "Sample Author" |
| URLs | Use platform's standard URL format with fake IDs |
| Dates | "2025-01-15T12:00:00.000Z" (ISO 8601) |
| Text content | Generic descriptive text, e.g., "This is an example description." |
- transformation.fields: list the 8–12 most important field names (order = column order in the UI)
- display.properties: one entry per overview field, with a label and a format
- Available formats: "text", "number", "date", "link", "boolean", "image", "array", "object"

Pick fields that give users the most useful at-a-glance summary of the data.
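The overview view can be assembled from an ordered list of (field name, label, format) choices. This is a minimal sketch; build_overview_view is hypothetical, and it enforces only the constraints stated above: at most 12 columns (and at least 8 when the Actor has that many fields), one display entry per column, a format from the listed set, and every column defined in fields.properties.

```python
ALLOWED_FORMATS = {"text", "number", "date", "link", "boolean", "image", "array", "object"}


def build_overview_view(columns, defined_fields):
    """columns: ordered list of (field_name, label, format) tuples."""
    lower = min(8, len(defined_fields))  # Actors with few fields cannot fill 8 columns
    if not lower <= len(columns) <= 12:
        raise ValueError(f"overview should have {lower}-12 columns, got {len(columns)}")
    display = {}
    for name, label, fmt in columns:
        if name not in defined_fields:
            raise ValueError(f"{name!r} is not defined in fields.properties")
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported display format {fmt!r} for {name!r}")
        display[name] = {"label": label, "format": fmt}
    return {
        "title": "Overview",
        "description": "Most important fields at a glance",
        # The order of this list is the column order in the Console table.
        "transformation": {"fields": [name for name, _, _ in columns]},
        "display": {"component": "table", "properties": display},
    }
```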
Phase 3: key_value_store_schema.json (if applicable)
Goal: Define key-value store collections if the Actor stores data in the key-value store
Skip this phase if no Actor.setValue() / Actor.set_value() calls were found in Phase 1 (beyond the default INPUT key).
{
"actorKeyValueStoreSchemaVersion": 1,
"title": "<Descriptive title — what the key-value store contains>",
"description": "<One sentence describing the stored data>",
"collections": {
"<collectionName>": {
"title": "<Human-readable title>",
"description": "<What this collection contains>",
"keyPrefix": "<prefix->"
}
}
}
Group the discovered setValue / set_value calls by key pattern:
- Fixed keys (e.g., "RESULTS", "summary"): use "key" (exact match)
- Dynamic keys (e.g., "screenshot-${id}", f"image-{name}"): use "keyPrefix"

Each group becomes a collection.
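The grouping rule can be sketched as a function over the key expressions found in Phase 1. This is a minimal sketch, assuming each discovered key has already been reduced to a string in which the dynamic part appears as a JavaScript ${...} or Python f-string {...} placeholder; keys_to_collections and its collection-naming scheme are hypothetical, and the generated titles are drafts that still need descriptions written by hand.

```python
import re

# Matches a placeholder in a template key: ${id} (JS) or {name} (Python f-string).
PLACEHOLDER = re.compile(r"\$?\{[^}]*\}")


def keys_to_collections(keys):
    """Map discovered key expressions to key-value store schema collections."""
    collections = {}
    for key in keys:
        if key == "INPUT":
            continue  # the default input key does not get its own collection
        match = PLACEHOLDER.search(key)
        if match is None:
            # Fixed key: exact match.
            name, entry = key.lower(), {"key": key}
        else:
            # Dynamic key: everything before the first placeholder is the prefix.
            prefix = key[: match.start()]
            if not prefix:
                raise ValueError(f"key {key!r} has no static prefix to group by")
            name, entry = prefix.strip("-_").lower(), {"keyPrefix": prefix}
        entry = {"title": name.replace("-", " ").title(), **entry}
        # Keys that share a prefix collapse into one collection.
        collections.setdefault(name, entry)
    return collections
```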
| Property | Required | Description |
|---|---|---|
| title | Yes | Shown in UI tabs |
| description | No | Shown in UI tooltips |
| key | Conditional | Exact key for single-key collections (use key OR keyPrefix, not both) |
| keyPrefix | Conditional | Prefix for multi-key collections (use key OR keyPrefix, not both) |
| contentTypes | No | Restrict allowed MIME types (e.g., ["image/jpeg"], ["application/json"]) |
| jsonSchema | No | JSON Schema draft-07 for validating application/json content |
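The constraints in this table can be checked per collection. This is a minimal sketch; check_kv_collections is hypothetical and checks only the required title, the use of exactly one of key or keyPrefix, and that contentTypes, when present, is a list.

```python
def check_kv_collections(schema):
    """Return rule violations for the collections in key_value_store_schema.json."""
    problems = []
    for name, coll in schema.get("collections", {}).items():
        if not coll.get("title"):
            problems.append(f"{name}: missing title")
        has_key, has_prefix = "key" in coll, "keyPrefix" in coll
        if has_key == has_prefix:
            # Either both or neither were given; exactly one is expected.
            problems.append(f"{name}: use exactly one of key or keyPrefix")
        content_types = coll.get("contentTypes")
        if content_types is not None and not isinstance(content_types, list):
            problems.append(f"{name}: contentTypes must be a list of MIME types")
    return problems
```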
Single file output (e.g., a report):
{
"actorKeyValueStoreSchemaVersion": 1,
"title": "Analysis Results",
"description": "Key-value store containing analysis output",
"collections": {
"report": {
"title": "Report",
"description": "Final analysis report",
"key": "REPORT",
"contentTypes": ["application/json"]
}
}
}
Multiple files with prefix (e.g., screenshots):
{
"actorKeyValueStoreSchemaVersion": 1,
"title": "Scraped Files",
"description": "Key-value store containing downloaded files and screenshots",
"collections": {
"screenshots": {
"title": "Screenshots",
"description": "Page screenshots captured during scraping",
"keyPrefix": "screenshot-",
"contentTypes": ["image/png", "image/jpeg"]
},
"documents": {
"title": "Documents",
"description": "Downloaded document files",
"keyPrefix": "doc-",
"contentTypes": ["application/pdf", "text/html"]
}
}
}
Phase 4: output_schema.json
Goal: Create the output schema that tells Apify Console where to find results
For most Actors that push data to a dataset, this is a minimal file:
{
"actorOutputSchemaVersion": 1,
"title": "<Descriptive title — what the Actor returns>",
"description": "<One sentence describing the output data>",
"properties": {
"dataset": {
"type": "string",
"title": "Results",
"description": "Dataset containing all scraped data",
"template": "{{links.apiDefaultDatasetUrl}}/items"
}
}
}
Critical: Each property entry must include "type": "string". This is an Apify-specific convention: the Apify meta-validator rejects properties without it, and also rejects "type": "object" (only "string" is valid here).
If key_value_store_schema.json was generated in Phase 3, add a second property:
"files": {
"type": "string",
"title": "Files",
"description": "Key-value store containing downloaded files",
"template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
}
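Putting the two property entries together, the output schema can be generated from whether a key-value store schema was produced. This is a minimal sketch; build_output_schema is hypothetical, and the titles and descriptions are the placeholder texts from the examples above, to be replaced with Actor-specific wording.

```python
import json


def build_output_schema(title, description, has_kv_store):
    """Return an output_schema.json dict; add "files" only if a KV store schema exists."""
    properties = {
        "dataset": {
            "type": "string",  # required by the Apify meta-validator
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        }
    }
    if has_kv_store:
        properties["files"] = {
            "type": "string",
            "title": "Files",
            "description": "Key-value store containing downloaded files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys",
        }
    return {
        "actorOutputSchemaVersion": 1,
        "title": title,
        "description": description,
        "properties": properties,
    }


# Usage with placeholder text; the result would be written to .actor/output_schema.json.
print(json.dumps(build_output_schema("Scraped items", "Items collected from the target site", False), indent=4))
```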
Available template variables:
- {{links.apiDefaultDatasetUrl}}: API URL of the default dataset
- {{links.apiDefaultKeyValueStoreUrl}}: API URL of the default key-value store
- {{links.publicRunUrl}}: public run URL
- {{links.consoleRunUrl}}: Console run URL
- {{links.apiRunUrl}}: API run URL
- {{links.containerRunUrl}}: URL of the web server running inside the run
- {{run.defaultDatasetId}}: ID of the default dataset
- {{run.defaultKeyValueStoreId}}: ID of the default key-value store

Phase 5: actor.json
Goal: Wire the schema files into the Actor configuration
Actions:
- Read actor.json, then add or update the storages.dataset reference:
"storages": {
"dataset": "./dataset_schema.json"
}
- If key_value_store_schema.json was generated, add the reference:
"storages": {
"dataset": "./dataset_schema.json",
"keyValueStore": "./key_value_store_schema.json"
}
- Add the output reference:
"output": "./output_schema.json"
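The reference edits above, together with the inline-object migration described in the next step, can be scripted. This is a minimal sketch, assuming actor.json is plain JSON (no comments) in the .actor/ directory next to the schema files; wire_actor_json is hypothetical, it rewrites actor.json with 4-space indentation (which may not match the repo's formatting), and it returns any inline storage objects rather than writing them, so their content can be merged into the generated schema files instead of overwriting them.

```python
import json
from pathlib import Path


def wire_actor_json(actor_dir, has_kv_store):
    """Point actor.json at the schema files; return inline storage objects to migrate."""
    actor_path = Path(actor_dir) / "actor.json"
    config = json.loads(actor_path.read_text(encoding="utf-8"))
    storages = config.setdefault("storages", {})

    targets = {"dataset": "./dataset_schema.json"}
    if has_kv_store:
        targets["keyValueStore"] = "./key_value_store_schema.json"

    to_migrate = {}
    for storage, rel_path in targets.items():
        if isinstance(storages.get(storage), dict):
            # Inline object: keep its content so it can be merged into the schema file.
            to_migrate[storage] = storages[storage]
        storages[storage] = rel_path

    config["output"] = "./output_schema.json"
    actor_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return to_migrate
```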
- If actor.json had inline storages.dataset or storages.keyValueStore objects (not string paths), migrate their content into the respective schema files and replace the inline objects with file path strings

Phase 6: Validation
Goal: Ensure correctness and completeness
Checklist:
- All output fields are listed in dataset_schema.json fields.properties: not just the overview view fields, but ALL fields the Actor can produce
- Every field has "nullable": true
- The top-level fields object has both "additionalProperties": true and "required": []
- Every nested object in properties also has "additionalProperties": true and "required": []
- Every field has a "description" and an "example"
- "type" is present on every field that has "nullable"
- output_schema.json has "type": "string" on every property
- key_value_store_schema.json (if generated) has collections matching all setValue / set_value calls
- Every collection uses either key or keyPrefix (not both)
- actor.json references all generated schema files

Present the generated schemas to the user for review before writing them.
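The last checklist item can be checked against the files on disk. This is a minimal sketch, assuming the schema files sit in the same directory as actor.json; check_actor_references is hypothetical, it matches references by file name only, and it flags inline storage objects that were not migrated to file paths.

```python
import json
from pathlib import Path

SCHEMA_FILES = ("dataset_schema.json", "key_value_store_schema.json", "output_schema.json")


def check_actor_references(actor_dir):
    """Return problems with the schema references in .actor/actor.json."""
    actor_dir = Path(actor_dir)
    config = json.loads((actor_dir / "actor.json").read_text(encoding="utf-8"))
    refs = dict(config.get("storages", {}))
    if "output" in config:
        refs["output"] = config["output"]

    problems = []
    for name, ref in refs.items():
        if not isinstance(ref, str):
            problems.append(f"{name}: inline object, expected a file path")
        elif not (actor_dir / ref).is_file():
            problems.append(f"{name}: {ref} does not exist")

    # Every schema file present in the directory should be referenced.
    referenced = {Path(r).name for r in refs.values() if isinstance(r, str)}
    for schema_file in SCHEMA_FILES:
        if (actor_dir / schema_file).is_file() and schema_file not in referenced:
            problems.append(f"{schema_file} exists but is not referenced")
    return problems
```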
Phase 7: Summary
Goal: Document what was created
Report:
apify run, verify output tab in Console)