apify-generate-output-schema
Generate output schemas (dataset_schema.json, output_schema.json, key_value_store_schema.json) for an Apify Actor by analyzing its source code. Use when creating or updating Actor output schemas.
| name | apify-generate-output-schema |
| description | Generate output schemas (dataset_schema.json, output_schema.json, key_value_store_schema.json) for an Apify Actor by analyzing its source code. Use when creating or updating Actor output schemas. |
You are generating output schema files for an Apify Actor. The output schema tells Apify Console how to display run results. You will analyze the Actor's source code, create `dataset_schema.json`, `output_schema.json`, and `key_value_store_schema.json` (if the Actor uses the key-value store), and update `actor.json`.

## Phase 1

Goal: Locate the Actor and understand its output
Initial request: $ARGUMENTS
Actions:
- Find the `.actor/` directory containing `actor.json`
- Read `actor.json` to understand the Actor's configuration
- Check whether `dataset_schema.json`, `output_schema.json`, and `key_value_store_schema.json` already exist
- Search the repository for other `.actor/` directories or schema files (e.g., `**/dataset_schema.json`, `**/output_schema.json`, `**/key_value_store_schema.json`) to learn the repo's conventions: match their description style, field naming, example formatting, and overall structure
- Find all places where data is pushed to the dataset:
  - JavaScript/TypeScript: `Actor.pushData(`, `dataset.pushData(`, `Dataset.pushData(`
  - Python: `Actor.push_data(`, `dataset.push_data(`, `Dataset.push_data(`
- Find all places where data is written to the key-value store:
  - JavaScript/TypeScript: `Actor.setValue(`, `keyValueStore.setValue(`, `KeyValueStore.setValue(`
  - Python: `Actor.set_value(`, `key_value_store.set_value(`, `KeyValueStore.set_value(`
- Look for existing type definitions (e.g., `src/types/`, `src/types/output.ts`). If an interface or type already defines the output shape, derive the schema fields from it; do not create a parallel definition
- If inline `storages.dataset` or `storages.keyValueStore` config exists in `actor.json`, note it for migration

Present findings to user: list all discovered dataset output fields, key-value store keys, their types, and where they come from.
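
The call patterns listed above can be located with a small script. The following is a minimal Python sketch, not part of the skill itself: the names `find_output_calls` and `scan_project`, the set of file suffixes, and the skipped directory names are assumptions made for illustration. Regex matching is only a first pass and will miss calls made through renamed imports or wrapper functions.

```python
import re
from pathlib import Path

# Patterns mirror the call prefixes listed in the discovery actions.
DATASET_CALL = re.compile(r"\b(?:Actor|[Dd]ataset)\.(?:pushData|push_data)\(")
KVS_CALL = re.compile(
    r"\b(?:Actor|keyValueStore|KeyValueStore|key_value_store)\.(?:setValue|set_value)\("
)

# Assumptions for illustration: which files to scan and which folders to skip.
SOURCE_SUFFIXES = {".js", ".mjs", ".ts", ".py"}
SKIPPED_DIRS = {"node_modules", ".git", "dist", ".venv"}


def find_output_calls(source: str) -> dict[str, list[int]]:
    """Return 1-based line numbers of dataset and key-value store writes."""
    hits: dict[str, list[int]] = {"dataset": [], "key_value_store": []}
    for number, line in enumerate(source.splitlines(), start=1):
        if DATASET_CALL.search(line):
            hits["dataset"].append(number)
        if KVS_CALL.search(line):
            hits["key_value_store"].append(number)
    return hits


def scan_project(root: Path) -> dict[str, dict[str, list[int]]]:
    """Scan source files under root and keep only files that write output."""
    results = {}
    for path in sorted(root.rglob("*")):
        if any(part in SKIPPED_DIRS for part in path.parts):
            continue
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = find_output_calls(text)
            if hits["dataset"] or hits["key_value_store"]:
                results[str(path)] = hits
    return results
```

Calling `scan_project(Path("."))` from the Actor's root returns a mapping from file path to the matching line numbers, which is a convenient starting point for the findings presented to the user.
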
## Phase 2: `dataset_schema.json`

Goal: Create a complete dataset schema with field definitions and display views

    {
      "actorSpecification": 1,
      "fields": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
          // ALL output fields here: every field the Actor can produce,
          // not just the ones shown in the overview view
        },
        "required": [],
        "additionalProperties": true
      },
      "views": {
        "overview": {
          "title": "Overview",
          "description": "Most important fields at a glance",
          "transformation": {
            "fields": [
              // 8-12 most important field names
            ]
          },
          "display": {
            "component": "table",
            "properties": {
              // Display config for each overview field
            }
          }
        }
      }
    }

If existing output schemas were found in the repository during Phase 1 (step 5), follow their conventions: match their description style, field naming, example formatting, and overall structure.
When the Actor code already has well-defined TypeScript interfaces or Python type classes, derive fields directly from those types rather than re-analyzing pushData/push_data calls from scratch. The type definition is the canonical source.
| Rule | Detail |
|---|---|
| All fields in `properties` | The `fields.properties` object must contain every field the Actor can output, not just the fields shown in the overview view. The `views` section selects a subset for display; the `properties` section must be the complete superset |
| `"nullable": true` | On every field, because APIs are unpredictable |
| `"additionalProperties": true` | On the top-level `fields` object AND on every nested object within `properties`. This is the most commonly missed rule: it must appear at both levels |
| `"required": []` | Always an empty array, on the top-level `fields` object AND on every nested object within `properties` |
| Anonymized examples | No real user IDs, usernames, or content |
| `"type"` required with `"nullable"` | AJV rejects `nullable` without a `type` on the same field |

Warning: the most common mistakes are:

- Only including fields that appear in the overview view. The `fields.properties` must list ALL output fields, even if they are not in the `views` section.
- Only adding `"required": []` and `"additionalProperties": true` on nested object-type properties but forgetting them on the top-level `fields` object. Both levels need them.

Note: `nullable` is an Apify-specific extension to JSON Schema draft-07. It is intentional and correct.
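
The rules in the table above are mechanical enough to check automatically. Below is a minimal Python sketch of such a check; the function names `check_object_rules` and `check_dataset_schema` and the wording of the messages are hypothetical, and the sketch only recurses into nested objects that declare their own `properties`. It is a convenience check under those assumptions, not a replacement for Apify's own validation.

```python
def check_object_rules(schema: dict, path: str = "fields") -> list[str]:
    """Collect rule violations for an object schema and its nested objects."""
    problems = []
    if schema.get("required") != []:
        problems.append(f'{path}: "required" must be an empty array')
    if schema.get("additionalProperties") is not True:
        problems.append(f'{path}: "additionalProperties" must be true')
    for name, field in schema.get("properties", {}).items():
        field_path = f"{path}.properties.{name}"
        if field.get("nullable") is not True:
            problems.append(f'{field_path}: missing "nullable": true')
        if "type" not in field:
            problems.append(f'{field_path}: "type" is required with "nullable"')
        # Assumption: only nested objects that list properties are re-checked.
        if "properties" in field:
            problems.extend(check_object_rules(field, field_path))
    return problems


def check_dataset_schema(dataset_schema: dict) -> list[str]:
    """Apply the rules to the top-level fields object and to every view."""
    fields = dataset_schema.get("fields", {})
    problems = check_object_rules(fields)
    declared = set(fields.get("properties", {}))
    for view_name, view in dataset_schema.get("views", {}).items():
        for field_name in view.get("transformation", {}).get("fields", []):
            if field_name not in declared:
                problems.append(
                    f"views.{view_name}: {field_name!r} is not in fields.properties"
                )
    return problems
```

An empty list means none of the listed rules were violated. The view check catches the reverse of the first common mistake as well: a view that references a field missing from `fields.properties`.
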
String field:

    "title": {
      "type": "string",
      "description": "Title of the scraped item",
      "nullable": true,
      "example": "Example Item Title"
    }

Number field:

    "viewCount": {
      "type": "number",
      "description": "Number of views",
      "nullable": true,
      "example": 15000
    }

Boolean field:

    "isVerified": {
      "type": "boolean",
      "description": "Whether the account is verified",
      "nullable": true,
      "example": true
    }

Array field:

    "hashtags": {
      "type": "array",
      "description": "Hashtags associated with the item",
      "items": { "type": "string" },
      "nullable": true,
      "example": ["#example", "#demo"]
    }

Nested object field:

    "authorInfo": {
      "type": "object",
      "description": "Information about the author",
      "properties": {
        "name": { "type": "string", "nullable": true },
        "url": { "type": "string", "nullable": true }
      },
      "required": [],
      "additionalProperties": true,
      "nullable": true,
      "example": { "name": "Example Author", "url": "https://example.com/author" }
    }

Enum field:

    "contentType": {
      "type": "string",
      "description": "Type of content",
      "enum": ["article", "video", "image"],
      "nullable": true,
      "example": "article"
    }

Union type (e.g., TypeScript `ObjectType | string`):

    "metadata": {
      "type": ["object", "string"],
      "description": "Structured metadata object, or error string if unavailable",
      "nullable": true,
      "example": { "key": "value" }
    }

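
The field patterns above share one shape: a `type`, a `description`, `"nullable": true`, and an `example`, plus `required` and `additionalProperties` on nested objects. A hypothetical helper that drafts a field definition from an anonymized example value might look like the sketch below. The names `infer_type` and `field_from_example` are assumptions, and the inferred type is only a draft: enums, unions, and mixed arrays still need manual editing.

```python
def infer_type(value) -> str:
    """Map a Python example value to a JSON Schema type name."""
    if isinstance(value, bool):  # bool must be tested before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported example value: {value!r}")


def field_from_example(description: str, example) -> dict:
    """Draft a field definition that follows the rules for dataset fields."""
    field = {"type": infer_type(example), "description": description}
    if field["type"] == "array" and example:
        # Assumption: the first element is representative of all items.
        field["items"] = {"type": infer_type(example[0])}
    if field["type"] == "object":
        field["properties"] = {
            key: {"type": infer_type(value), "nullable": True}
            for key, value in example.items()
        }
        field["required"] = []
        field["additionalProperties"] = True
    field["nullable"] = True
    field["example"] = example
    return field
```

For instance, `field_from_example("Number of views", 15000)` produces the same definition as the number field shown above.
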
Use realistic but generic values. Follow platform ID format conventions:
| Field type | Example approach |
|---|---|
| IDs | Match platform format and length (e.g., 11 chars for YouTube video IDs) |
| Usernames | "exampleuser", "sampleuser123" |
| Display names | "Example Channel", "Sample Author" |
| URLs | Use platform's standard URL format with fake IDs |
| Dates | "2025-01-15T12:00:00.000Z" (ISO 8601) |
| Text content | Generic descriptive text, e.g., "This is an example description." |
- `transformation.fields`: List the 8–12 most important field names (order = column order in UI)
- `display.properties`: One entry per overview field with `label` and `format`
- Supported `format` values: `"text"`, `"number"`, `"date"`, `"link"`, `"boolean"`, `"image"`, `"array"`, `"object"`

Pick fields that give users the most useful at-a-glance summary of the data.
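
As an illustration of how `transformation.fields` and `display.properties` stay in sync, here is a small Python sketch that builds the overview view from one ordered list of columns. The function name `build_overview_view` and the tuple layout `(field, label, format)` are assumptions for this example; the format names come from the list above.

```python
ALLOWED_FORMATS = {
    "text", "number", "date", "link", "boolean", "image", "array", "object",
}


def build_overview_view(columns: list[tuple[str, str, str]]) -> dict:
    """Build the overview view from (field name, label, format) tuples.

    The order of the tuples becomes the column order in the UI.
    """
    for name, _label, fmt in columns:
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format {fmt!r} for field {name!r}")
    return {
        "title": "Overview",
        "description": "Most important fields at a glance",
        "transformation": {"fields": [name for name, _label, _fmt in columns]},
        "display": {
            "component": "table",
            "properties": {
                name: {"label": label, "format": fmt}
                for name, label, fmt in columns
            },
        },
    }
```

Deriving both sections from the same list avoids a view that names a field in `transformation.fields` but has no matching display entry.
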
## Phase 3: `key_value_store_schema.json` (if applicable)

Goal: Define key-value store collections if the Actor stores data in the key-value store

Skip this phase if no `Actor.setValue()` / `Actor.set_value()` calls were found in Phase 1 (beyond the default `INPUT` key).

    {
      "actorKeyValueStoreSchemaVersion": 1,
      "title": "<Descriptive title: what the key-value store contains>",
      "description": "<One sentence describing the stored data>",
      "collections": {
        "<collectionName>": {
          "title": "<Human-readable title>",
          "description": "<What this collection contains>",
          "keyPrefix": "<prefix->"
        }
      }
    }

Group the discovered `setValue` / `set_value` calls by key pattern:

- Fixed keys (e.g., `"RESULTS"`, `"summary"`): use `"key"` (exact match)
- Dynamic keys (e.g., `"screenshot-${id}"`, `f"image-{name}"`): use `"keyPrefix"`

Each group becomes a collection.

| Property | Required | Description |
|---|---|---|
| `title` | Yes | Shown in UI tabs |
| `description` | No | Shown in UI tooltips |
| `key` | Conditional | Exact key for single-key collections (use `key` OR `keyPrefix`, not both) |
| `keyPrefix` | Conditional | Prefix for multi-key collections (use `key` OR `keyPrefix`, not both) |
| `contentTypes` | No | Restrict allowed MIME types (e.g., `["image/jpeg"]`, `["application/json"]`) |
| `jsonSchema` | No | JSON Schema draft-07 for validating `application/json` content |
Single file output (e.g., a report):

    {
      "actorKeyValueStoreSchemaVersion": 1,
      "title": "Analysis Results",
      "description": "Key-value store containing analysis output",
      "collections": {
        "report": {
          "title": "Report",
          "description": "Final analysis report",
          "key": "REPORT",
          "contentTypes": ["application/json"]
        }
      }
    }

Multiple files with prefix (e.g., screenshots):

    {
      "actorKeyValueStoreSchemaVersion": 1,
      "title": "Scraped Files",
      "description": "Key-value store containing downloaded files and screenshots",
      "collections": {
        "screenshots": {
          "title": "Screenshots",
          "description": "Page screenshots captured during scraping",
          "keyPrefix": "screenshot-",
          "contentTypes": ["image/png", "image/jpeg"]
        },
        "documents": {
          "title": "Documents",
          "description": "Downloaded document files",
          "keyPrefix": "doc-",
          "contentTypes": ["application/pdf", "text/html"]
        }
      }
    }

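
The grouping step described in this phase (fixed keys become `key`, templated keys become `keyPrefix`) can be sketched in Python as follows. The function names, the placeholder detection, and the way collection names and titles are derived from the key are all assumptions made for illustration; descriptions and `contentTypes` still have to be filled in by hand.

```python
import re

# Matches the start of a JavaScript "${...}" or Python f-string "{...}" placeholder.
PLACEHOLDER = re.compile(r"\$\{|\{")


def matcher_for_key(key_template: str) -> dict:
    """Return {"key": ...} for a fixed key or {"keyPrefix": ...} for a templated one."""
    match = PLACEHOLDER.search(key_template)
    if match is None:
        return {"key": key_template}
    prefix = key_template[: match.start()]
    if not prefix:
        raise ValueError(f"{key_template!r} has no static prefix to group by")
    return {"keyPrefix": prefix}


def build_collections(key_templates: list[str]) -> dict:
    """Group key templates into collections, one per distinct key or prefix."""
    collections = {}
    for template in key_templates:
        matcher = matcher_for_key(template)
        value = next(iter(matcher.values()))
        base = value.strip("-_")
        # Hypothetical naming convention: lower-case name, title-cased label.
        name = base.lower()
        title = base.replace("_", " ").replace("-", " ").title()
        collections.setdefault(name, {"title": title, **matcher})
    return collections
```

Because each collection is built from a single matcher, the result never contains both `key` and `keyPrefix` in the same collection.
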
## Phase 4: `output_schema.json`

Goal: Create the output schema that tells Apify Console where to find results

For most Actors that push data to a dataset, this is a minimal file:

    {
      "actorOutputSchemaVersion": 1,
      "title": "<Descriptive title: what the Actor returns>",
      "description": "<One sentence describing the output data>",
      "properties": {
        "dataset": {
          "type": "string",
          "title": "Results",
          "description": "Dataset containing all scraped data",
          "template": "{{links.apiDefaultDatasetUrl}}/items"
        }
      }
    }

Critical: Each property entry must include `"type": "string"`. This is an Apify-specific convention. The Apify meta-validator rejects properties without it (and rejects `"type": "object"`; only `"string"` is valid here).
If `key_value_store_schema.json` was generated in Phase 3, add a second property:

    "files": {
      "type": "string",
      "title": "Files",
      "description": "Key-value store containing downloaded files",
      "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
    }

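
Putting the two property entries together, the whole output schema can be generated from a title, a description, and a flag saying whether a key-value store schema exists. The following Python sketch assumes a hypothetical function name `build_output_schema`; the property names, titles, and templates are copied from the examples above.

```python
def build_output_schema(
    title: str, description: str, has_key_value_store: bool = False
) -> dict:
    """Build output_schema.json content; every property uses "type": "string"."""
    properties = {
        "dataset": {
            "type": "string",
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        }
    }
    if has_key_value_store:
        # Only added when key_value_store_schema.json was generated.
        properties["files"] = {
            "type": "string",
            "title": "Files",
            "description": "Key-value store containing downloaded files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys",
        }
    return {
        "actorOutputSchemaVersion": 1,
        "title": title,
        "description": description,
        "properties": properties,
    }
```

Serializing the result with `json.dumps(schema, indent=2)` gives a file in the same shape as the minimal example.
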
Available template variables:

- `{{links.apiDefaultDatasetUrl}}`: API URL of default dataset
- `{{links.apiDefaultKeyValueStoreUrl}}`: API URL of default key-value store
- `{{links.publicRunUrl}}`: Public run URL
- `{{links.consoleRunUrl}}`: Console run URL
- `{{links.apiRunUrl}}`: API run URL
- `{{links.containerRunUrl}}`: URL of webserver running inside the run
- `{{run.defaultDatasetId}}`: ID of the default dataset
- `{{run.defaultKeyValueStoreId}}`: ID of the default key-value store

## Phase 5: `actor.json`

Goal: Wire the schema files into the Actor configuration

Actions:

- Read the current `actor.json`
- Add the `storages.dataset` reference:

      "storages": {
        "dataset": "./dataset_schema.json"
      }

- If `key_value_store_schema.json` was generated, add the reference:

      "storages": {
        "dataset": "./dataset_schema.json",
        "keyValueStore": "./key_value_store_schema.json"
      }

- Add the `output` reference:

      "output": "./output_schema.json"

- If `actor.json` had inline `storages.dataset` or `storages.keyValueStore` objects (not string paths), migrate their content into the respective schema files and replace the inline objects with file path strings

## Phase 6

Goal: Ensure correctness and completeness
Checklist:
- All output fields are listed in `dataset_schema.json` `fields.properties`: not just the overview view fields but ALL fields the Actor can produce
- Every field has `"nullable": true`
- The top-level `fields` object has both `"additionalProperties": true` and `"required": []`
- Every nested object within `properties` also has `"additionalProperties": true` and `"required": []`
- Every field has a `"description"` and an `"example"`
- `"type"` is present on every field that has `"nullable"`
- `output_schema.json` has `"type": "string"` on every property
- `key_value_store_schema.json` (if generated) has collections matching all `setValue`/`set_value` calls
- Each collection uses either `key` or `keyPrefix` (not both)
- `actor.json` references all generated schema files

Present the generated schemas to the user for review before writing them.
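
The cross-file items of the checklist (property types in `output_schema.json`, `key` versus `keyPrefix`, and the references in `actor.json`) can also be checked in code. This is a minimal Python sketch under the assumption that the three files have already been loaded into dictionaries; the function name `check_cross_file_rules` and the message texts are hypothetical.

```python
from typing import Optional


def check_cross_file_rules(
    actor_config: dict, output_schema: dict, kvs_schema: Optional[dict] = None
) -> list[str]:
    """Check checklist items that span output_schema, kvs schema, and actor.json."""
    problems = []

    # output_schema.json: every property must use "type": "string".
    for name, prop in output_schema.get("properties", {}).items():
        if prop.get("type") != "string":
            problems.append(f'output_schema.properties.{name}: "type" must be "string"')

    # actor.json: references must be file path strings, not inline objects.
    storages = actor_config.get("storages", {})
    if not isinstance(storages.get("dataset"), str):
        problems.append("actor.json: storages.dataset must be a file path string")
    if not isinstance(actor_config.get("output"), str):
        problems.append("actor.json: output must be a file path string")

    if kvs_schema is not None:
        # Each collection needs exactly one of key / keyPrefix.
        for name, collection in kvs_schema.get("collections", {}).items():
            if ("key" in collection) == ("keyPrefix" in collection):
                problems.append(f"collections.{name}: use key OR keyPrefix, not both")
        if not isinstance(storages.get("keyValueStore"), str):
            problems.append("actor.json: storages.keyValueStore must be a file path string")

    return problems
```

An empty result covers only the items encoded here; whether the collections match every `setValue`/`set_value` call still needs a manual comparison with the Phase 1 findings.
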
## Phase 7

Goal: Document what was created
Report:
- Next steps (e.g., `apify run`, verify the output tab in Console)