| name | programmatic-seo |
| description | How to design and run a programmatic SEO program that produces durable traffic instead of penalty-bait. Data source identification, template design, schema patterns, quality control at scale, internal linking architecture, crawl budget management, AEO/GEO for programmatic pages, refresh discipline, and the make-or-break question of whether pSEO is the right answer for your program at all. Triggers on programmatic SEO, pSEO, scaled content, page generation, template SEO, location pages, comparison pages, directory site, listing site, scaled landing pages, programmatic content. Also triggers when a content set is not ranking, has been hit by an algorithm update, or when a team is considering pSEO as a growth lever. |
| category | content |
| catalog_summary | Designing pSEO programs that work: data sources, template design, quality control at scale, internal linking, crawl budget, AEO/GEO patterns, refresh discipline, and when pSEO is and is not the right answer |
| display_order | 6 |
Programmatic SEO
A senior SEO strategist's playbook for designing and running programmatic SEO programs that produce durable traffic instead of penalty-bait.
Programmatic SEO has a complicated reputation. Sites like Zillow, Airbnb, TripAdvisor, Indeed, and Yelp have built billion-dollar traffic engines on pSEO. Other sites have built pSEO programs that got hit by Google's helpful-content updates and lost 80% of their traffic in a week. The difference is rarely the technique; it is the underlying data quality and the quality control discipline at scale.
This skill is the playbook for getting that distinction right. It assumes you have decided what keyword space to target (see seo-keyword) and how the broader content program is shaped (see content-strategy). It does not write individual editorial pieces (see content-and-copy for that) and does not architect editorial topic hubs (see pillar-content-architecture for that). What it does is teach the discipline of generating high-volume pages programmatically from structured data sources without producing thin content, duplicate content, or scale-without-substance pages that get penalized.
When to use this skill: deciding whether pSEO is a fit for the program at all (the most important question), designing a new pSEO system, auditing an existing pSEO set that is not ranking or has been hit by an algorithm update, or building quality-control discipline for a pSEO program that grew faster than its quality processes.
What this skill is for
This skill spans scaled-content programs from data source through quality control. It composes with five sister and adjacent skills, and the distinction between them is what keeps each one sharp.
- content-strategy is program-scope: editorial pillars, calendar, governance. Decides whether pSEO fits the program at all.
- pillar-content-architecture is hub-scope: one topic with 10 to 15 intentional editorial pieces. Editorial in nature.
- content-brief-authoring is per-piece scope: brief for one editorial artifact.
- content-and-copy is execution scope: writing individual editorial pieces.
- This skill is scaled scope: 100s to 100,000s of pages generated programmatically from structured data sources, each targeting a long-tail query.
The clean reading order: content-strategy decides whether pSEO is a fit, seo-keyword surfaces the long-tail keyword space, this skill designs the pSEO system, editorial-qa (forthcoming) provides the QA discipline for sampled quality control across the set. content-and-copy and content-brief-authoring are not in the loop for pSEO at scale; those skills are for editorial pieces, not data-driven generated pages.
The audience: SEO content strategists, content engineers, agencies running pSEO programs, in-house teams considering pSEO as a growth lever. The voice is senior SEO strategist to junior PM or marketer. Specific, opinionated, honest. The reputation problem is not pSEO; it is pSEO without underlying value.
When pSEO is the right answer
The keystone question. pSEO works when all of the following are true.
1. Real underlying data. A genuine structured data source with depth: 10+ fields per record, ideally 20+, with first-party data, expert-curated data, or licensed datasets. Not just scraped data or AI-generated facts dressed as structured records.
2. Long-tail query volume justifies the effort. The queries the program would target have meaningful aggregate volume even if individual queries are small. Real estate's "homes for sale in {neighborhood}" works at scale. "Blue widget reviews 2026" generated for every {adjective} × {widget} combination does not, because nobody actually searches those queries.
3. User intent is queryable. The user's question can be answered through structured data presented well. Not through narrative explanation, judgment, or analysis the data cannot supply.
4. Update cadence aligns with query volatility. Real estate listings update daily and the data refresh aligns. "Best {thing} for {year}" pages get stale annually and need a refresh discipline budgeted in.
5. Quality control is operationally feasible. The team has the capacity to sample-audit the set, fix failures, and maintain quality as the set grows. Not aspirationally; budgeted in headcount.
pSEO does NOT work when:
- The underlying data is shallow (3 to 5 fields, mostly AI-generated, no first-party signal)
- The query volume is illusory (long-tail keywords nobody actually searches)
- User intent requires narrative or judgment that data cannot supply
- Quality control is not budgeted (write a bunch of pages, ignore them)
- The program is scaling AI-generated thin pages without unique data
The honest framing. Most teams that ask "should we do pSEO?" should hear "probably not, unless the underlying data is unique or first-party expertise makes the pages actually useful." The reputation problem is not the technique; it is the technique applied without underlying value.
Detail in references/when-pseo-works-decision.md.
Data source identification
The data source is the pSEO program. Common sources with different defensibility profiles.
First-party data. Customer transactions, content database, user-generated content. Defensible because nobody else has it. Examples: Glassdoor's employee reviews, Yelp's user ratings, TripAdvisor's traveler reviews.
Licensed datasets. Industry databases, regulatory data, government datasets, licensed third-party feeds. Defensible by license terms and integration depth. Examples: Zillow's MLS partnerships, real estate brokerage feeds, sports statistics licenses.
Aggregated public data. Scraped, cleaned, enriched. Judgment call on legality (often gray area depending on robots.txt, terms of service, jurisdictional rules). Defensibility depends on the cleaning and enrichment work. Easy to copy if the cleaning is shallow.
Expert-curated content. The dataset is built by hiring experts to populate it. Slow, high-quality, defensible. Examples: Wirecutter's product testing, expert-reviewed medical content, curated editorial databases.
Synthesized data. Combining multiple sources into a unique view. "Neighborhoods times schools times prices" combining three datasets into a comparison view. Defensibility comes from the synthesis logic and the ongoing maintenance of multi-source pipelines.
The "moat" question. Could a competitor replicate the data source? If yes, the pSEO program has no defensibility; anyone can copy it. If no, the data source becomes a moat that compounds. Zillow's MLS partnerships, Glassdoor's employee reviews, and Crunchbase's funding data are moats. "Scraped Wikipedia plus AI rewrite" is not.
Detail in references/data-source-identification-patterns.md.
Template design
The template is the structure that data fills. Design principles.
Above-the-fold answer. The user's specific question answered in the first 200 words of the rendered page, structured for both human reading and AI extraction. The answer is what gets cited; the rest is supporting depth.
Variable density. The template accommodates records with sparse fields (some have 5 data points, others have 50) without looking broken. Sparse pages need fallback patterns; dense pages need progressive disclosure to avoid overwhelming.
Heading hierarchy that reflects data. H2s and H3s map to data sections (overview, details, comparisons, related). The heading structure is not decorative; it tells crawlers and AI engines what the page covers.
Schema markup as part of template. Structured data (JSON-LD, microdata, or RDFa depending on the stack) embedded in the template renders machine-readable signals at scale across the entire set. Without schema, the set is invisible to the structured-data extraction layer search and answer engines run.
Internal linking placeholders. The template includes link slots for related-record cross-references, parent-category links, and sibling-record links. 5 to 15 internal links per page is typical for a well-linked set.
Distinctive value per page. Each generated page must offer something the user could not get by going up to the parent or sideways to a sibling. If the page is just a re-pivot of the parent's data, the page is filler.
The template's quality bar. A randomly sampled page from the set, viewed in isolation, should answer the user's likely query competently. If a randomly sampled page is thin, the entire set is thin.
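The variable-density principle can be sketched as a render function. This is a minimal illustration with hypothetical field names (`listing_count`, `schools`, `price_history`) on a real-estate-style record, not a prescribed template engine; the point is that optional sections render only when their data exists, so sparse records degrade to a shorter page instead of empty headings or filler.

```python
def render_section(heading, body):
    # Emit nothing rather than an empty heading when data is absent.
    return f"<h2>{heading}</h2>\n{body}" if body else ""

def render_page(record):
    parts = []
    # Above-the-fold answer: always present, built from required fields.
    parts.append(
        f"<p>{record['name']} has {record['listing_count']} homes for sale, "
        f"with a median price of ${record['median_price']:,}.</p>"
    )
    # Optional sections fall back to nothing, never to boilerplate filler.
    if record.get("schools"):
        rows = "".join(f"<li>{s}</li>" for s in record["schools"])
        parts.append(render_section("Schools", f"<ul>{rows}</ul>"))
    if record.get("price_history"):
        parts.append(render_section("Price trends", record["price_history"]))
    return "\n".join(p for p in parts if p)

sparse = {"name": "Berkeley Park", "listing_count": 12, "median_price": 640_000}
dense = {**sparse, "schools": ["Centennial Elementary"], "price_history": "<p>Up 4% YoY.</p>"}
assert "Schools" not in render_page(sparse)
assert "Schools" in render_page(dense)
```

A dense record earns its extra sections; a sparse record still leads with a complete above-the-fold answer.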
Detail in references/template-design-patterns.md.
Schema design
The data shape that drives the template. Design principles.
Field count signals depth. 5 fields per record is thin. 15 to 20 is competent. 30+ is deep. The field count is not a vanity metric; it determines what the template can render at depth versus what falls back to filler.
Required vs optional fields. Define which fields must populate before a page ships, and give the template graceful degradation for optional gaps. A page with 3 of 30 optional fields populated should not ship; a page with 25 of 30 should.
Computed fields. Derived from source data. Average price per square foot computed from listings adds depth without requiring more source data. Computed fields are the pSEO equivalent of the writer adding analysis: the data does the analysis once, every page benefits.
Cross-record fields. Fields that reference other records in the set enable internal linking and comparison. The "5 most similar neighborhoods" field on every neighborhood page powers the sibling-link section without manual curation.
Update frequency tags. Which fields are static (geographic features, founding date) versus which need refresh tracking (current pricing, availability, current statistics). Without tagging, refresh becomes "audit everything" instead of "refresh the volatile fields."
The "schema-as-product" principle. The schema that drives pSEO IS the product surface. Treat it with the same rigor as a database schema for a customer-facing application. Versioned, reviewed, documented, breaking-change-aware.
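Treated as a product surface, the schema can be expressed in code: versioned, with required and optional fields separated, a computed field, a cross-record field, and volatility tags. This is a sketch under assumed field names for a neighborhood dataset, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Optional

SCHEMA_VERSION = "2.1.0"  # versioned and reviewed like a database schema

# Volatility tags drive targeted refresh instead of "audit everything".
STATIC_FIELDS = {"name"}
VOLATILE_FIELDS = {"median_price", "listing_count"}

@dataclass
class NeighborhoodRecord:
    # Required fields: a page does not ship without these.
    name: str
    median_price: int
    listing_count: int
    # Optional fields: the template degrades gracefully when absent.
    avg_sqft: Optional[int] = None
    walk_score: Optional[int] = None
    similar_neighborhoods: list[str] = field(default_factory=list)  # cross-record

    @property
    def price_per_sqft(self) -> Optional[int]:
        # Computed field: the analysis is done once, every page benefits.
        if not self.avg_sqft:
            return None
        return round(self.median_price / self.avg_sqft)

    def ready_to_publish(self) -> bool:
        return bool(self.name) and self.listing_count > 0

rec = NeighborhoodRecord("Berkeley Park", 640_000, 12, avg_sqft=1600)
assert rec.price_per_sqft == 400
```

The cross-record `similar_neighborhoods` field is what later powers sibling linking without manual curation.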
Detail in references/schema-design-patterns.md.
Quality control at scale
Auditing all 100,000 pages individually is infeasible. The discipline.
Sampling strategy. Random sample 50 to 200 pages per audit cycle, balanced across data shape (sparse, dense, recent, old, popular categories, niche categories). The sample is not "the latest 50 pages"; it is a stratified sample that exposes problems specific to particular data shapes.
Automated checks. Heading structure, schema validity, internal link count, word count thresholds, duplicate-content checks, broken-link checks. Run on every page on a cadence; surface failures as a queue.
Manual review checklist. For sampled pages, check the top-200-word answer quality, check whether the page would satisfy the user's query, check whether the page reads as distinctive versus templated. The manual review is the layer automated checks miss.
Failure thresholds. If more than 5% of sampled pages fail the manual review, halt new generation and fix the template or data before scaling further. The threshold is not negotiable; without it, quality drift compounds invisibly until the algorithm update reveals it.
Cohort tracking. If pages generated in one period rank differently from pages generated in another, that signals a template change or data-quality drift; investigate. Cohort tracking is the pSEO equivalent of the experimentation team's drift detection.
The team budget question. A 10,000-page pSEO set requires roughly 0.5 to 1.0 FTE of ongoing quality control discipline. If that is not budgeted, the program will decay. The set will look fine on launch and degrade over 6 to 12 months as data drifts, templates ship without QC, and the gap between the QC plan and the actual cadence widens.
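The sampling and threshold discipline above can be sketched as a small QC harness. The stratification keys (`field_count`, `category`) and the 5% gate follow this section; the page-metadata shape is a hypothetical, and real programs would stratify on more dimensions (recency, popularity).

```python
import random
from collections import defaultdict

FAILURE_THRESHOLD = 0.05  # more than 5% manual-review failures halts generation

def stratified_sample(pages, per_stratum=25, seed=42):
    # Stratify by data shape so sparse and dense records are both
    # represented, instead of sampling "the latest 50 pages".
    strata = defaultdict(list)
    for p in pages:
        shape = "sparse" if p["field_count"] < 10 else "dense"
        strata[(shape, p["category"])].append(p)
    rng = random.Random(seed)
    sample = []
    for bucket in strata.values():
        sample.extend(rng.sample(bucket, min(per_stratum, len(bucket))))
    return sample

def generation_gate(review_results):
    # review_results: pass/fail booleans from the manual-review checklist.
    failure_rate = review_results.count(False) / len(review_results)
    return "halt" if failure_rate > FAILURE_THRESHOLD else "continue"

assert generation_gate([True] * 95 + [False] * 5) == "continue"
assert generation_gate([True] * 90 + [False] * 10) == "halt"
```

The gate returning "halt" is the non-negotiable part: new generation stops until the template or data fix ships.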
Detail in references/quality-control-at-scale.md.
Internal linking across the set
pSEO sets need intentional internal linking architecture to compound.
Hub pages. Parent-level pages (e.g., "homes for sale in Denver") that link to all child pages (specific neighborhoods). Hub pages are linked from main navigation; they are the entry point for the set's traffic.
Sibling linking. Each page links to 5 to 15 related sibling pages: similar neighborhoods, similar price ranges, similar features. Sibling links are computed from cross-record fields in the schema; they are not manual curation at scale.
Reverse-direction linking. Child pages link up to the hub; sibling links go both directions. The graph is bidirectional; PageRank flows both ways.
Anchor text variation. Avoid every page using the same anchor text. Vary by record characteristic (the record name, a descriptive phrase, the parent category). Uniform anchor text at scale signals manipulation to ranking systems.
Crawl-friendly architecture. Hub pages are linked from main navigation. Child pages are reachable via hub or via segmented sitemaps. No orphan child pages.
The PageRank flow principle. Internal linking is what makes pSEO sets compound. A well-linked set distributes ranking signal across all pages; a poorly-linked set has hub pages ranking and child pages orphaned. The link architecture is half the program; the data is the other half.
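Sibling linking and anchor variation can be sketched together. The record fields here are hypothetical; the sketch assumes the schema's cross-record field already holds the related-record slugs, so link generation is a lookup plus anchor rotation rather than curation.

```python
import random

def sibling_links(record, index, max_links=10, seed=0):
    # Sibling links come from the cross-record field, not manual curation.
    rng = random.Random(seed)
    links = []
    for slug in record["similar_neighborhoods"][:max_links]:
        target = index[slug]
        # Vary anchor text by record characteristic; uniform anchors at
        # scale read as manipulation to ranking systems.
        anchors = [
            target["name"],
            f"homes for sale in {target['name']}",
            f"{target['name']} real estate",
        ]
        links.append({"href": f"/neighborhoods/{slug}", "anchor": rng.choice(anchors)})
    return links

index = {"sunnyside": {"name": "Sunnyside"}, "highland": {"name": "Highland"}}
page = {"similar_neighborhoods": ["sunnyside", "highland"]}
assert len(sibling_links(page, index)) == 2
```

Because both records carry each other in their cross-record fields, the sibling graph comes out bidirectional by construction.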
Detail in references/internal-linking-at-scale.md.
Crawl budget management
Search engines have crawl budgets. Large pSEO sets can exhaust them.
Sitemap segmentation. Split sitemaps by category or recency so crawlers prioritize active pages. A single sitemap with 100,000 entries is harder for crawlers to process than 10 segmented sitemaps with 10,000 entries each, organized by category or last-update.
Noindex on thin pages. Pages with insufficient data should noindex rather than ship as bait. The discipline: pages that fail the schema's required-field threshold do not get a public URL; they sit in the system as drafts until their data populates.
Canonical handling. Comparison pages (X vs Y) often have inverse pages (Y vs X). Canonicalize to one. Comparison-pivot duplicates without canonicalization split ranking signal.
Crawl rate signals. Monitor Google Search Console crawl stats. If crawl rate is plateauing while page count grows, the set is hitting budget limits. The fix is not "publish more pages"; it is "noindex thin pages and improve internal linking so crawlers prioritize better."
Pruning criteria. Pages that fail to attract clicks or impressions for 6+ months should noindex or 410. Pruning is hygiene, not failure. A 100,000-page set with 60% earning traffic is healthier than the same set with 40% earning traffic plus 60% dead weight crawlers waste budget on.
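Sitemap segmentation can be sketched as a generator that splits each category into fixed-size files plus a sitemap index. The 10,000-per-file figure follows this section and sits well under the sitemap protocol's 50,000-URL-per-file cap; URLs and filenames are hypothetical.

```python
from itertools import islice

MAX_URLS_PER_SITEMAP = 10_000  # well under the protocol's 50,000-URL cap

def segment_sitemaps(urls_by_category):
    # One file per category segment plus a sitemap index, so crawlers
    # can prioritize active segments over one monolithic file.
    files = {}
    for category, urls in urls_by_category.items():
        it, part = iter(urls), 0
        while chunk := list(islice(it, MAX_URLS_PER_SITEMAP)):
            body = "\n".join(f"  <url><loc>{u}</loc></url>" for u in chunk)
            files[f"sitemap-{category}-{part}.xml"] = (
                '<?xml version="1.0"?>\n<urlset>\n' + body + "\n</urlset>"
            )
            part += 1
    index = "\n".join(
        f"  <sitemap><loc>/{name}</loc></sitemap>" for name in sorted(files)
    )
    files["sitemap-index.xml"] = "<sitemapindex>\n" + index + "\n</sitemapindex>"
    return files
```

Segmenting by last-update instead of category is the same mechanics with a different grouping key.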
Detail in references/crawl-budget-management.md.
AEO and GEO for programmatic pages
Answer engines treat programmatic pages differently from editorial pages.
Direct-answer extraction. AI engines extract the top-200-word answer; templates that lead with the answer get cited. The opening of the page is the citation candidate; everything after it is supporting depth.
Structured data signals. JSON-LD with comprehensive schema is read by AI engines as authority signal at scale. Programmatic pages with minimal schema lose to programmatic pages with deep schema even when the underlying data is similar.
Citation patterns. AI engines tend to cite programmatic data pages for factual queries (prices, statistics, comparisons, factual details) and editorial pages for analytical queries (why, how, what should I think). Design templates to fit the factual-query lane; do not try to make programmatic pages compete for analytical queries that editorial content owns.
Quality crackdown sensitivity. AI engines have their own quality signals beyond search engines. Thin programmatic content gets deprioritized faster in AI surfaces than in traditional search. The pSEO penalty risk has migrated to AI surfaces; the same quality-at-scale discipline applies, with less tolerance.
The "two-engine optimization" framing applies. pSEO pages should serve both search engines and answer engines. Most pSEO programs were designed pre-AEO and need template updates for the new surface: stronger top-200-word answers, deeper schema markup, FAQPage schema where the page contains FAQ structure.
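Schema emission at scale can be sketched as part of the render step: every page gets a base block built from the record, and a FAQPage block only when the page actually contains FAQ structure. The schema.org types (`Place`, `FAQPage`, `Question`, `Answer`) are real; the record field names are hypothetical.

```python
import json

def jsonld_blocks(record):
    # One block describing the record itself; description carries the
    # top-200-word answer the AI engines extract.
    blocks = [{
        "@context": "https://schema.org",
        "@type": "Place",
        "name": record["name"],
        "description": record["answer_snippet"],
    }]
    # FAQPage only where FAQ structure genuinely exists on the page.
    if record.get("faqs"):
        blocks.append({
            "@context": "https://schema.org",
            "@type": "FAQPage",
            "mainEntity": [
                {"@type": "Question", "name": q,
                 "acceptedAnswer": {"@type": "Answer", "text": a}}
                for q, a in record["faqs"]
            ],
        })
    return "\n".join(
        f'<script type="application/ld+json">{json.dumps(b)}</script>'
        for b in blocks
    )
```

Because the blocks are generated from the same record that fills the template, schema depth scales with data depth automatically.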
Detail in references/aeo-geo-for-programmatic-pages.md.
Refresh and maintenance at scale
pSEO sets decay if not maintained.
Data refresh cadence. Quarterly minimum for most datasets. Daily or weekly for time-sensitive data (real estate listings, prices, availability, current statistics). The refresh cadence is set when the program launches; retrofitting is expensive.
Template version migration. When the template improves, all existing pages need migration. Cohort-by-cohort migration with monitoring beats big-bang migration. Each cohort gets the new template, sits for 30 days under monitoring, then the next cohort migrates.
Pruning lifecycle. Pages generated 2+ years ago that have not earned traffic should be evaluated for noindex or removal. The 24-month checkpoint is the standard cutoff; some categories warrant earlier or later pruning.
Set-level refresh. Occasionally the entire set's template needs a refresh as user expectations evolve, AI engine signals shift, or category conventions change. Set-level refreshes are 6-month projects, not weekend cleanups.
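The cohort-by-cohort migration with monitoring can be sketched as a small state machine. The 30-day bake follows this section; the 10% traffic-drop halt threshold and the metrics shape are hypothetical assumptions for illustration.

```python
from datetime import date, timedelta

BAKE_DAYS = 30           # each cohort sits under monitoring before the next
MAX_TRAFFIC_DROP = 0.10  # assumed halt threshold: >10% click loss on a cohort

def next_migration_action(cohorts, today, clicks_before, clicks_after):
    # cohorts are ordered; earlier entries migrated first.
    for cohort in cohorts:
        if cohort["migrated_on"] is None:
            return ("migrate", cohort["id"])
        drop = 1 - clicks_after[cohort["id"]] / clicks_before[cohort["id"]]
        if drop > MAX_TRAFFIC_DROP:
            return ("halt", cohort["id"])  # regression: stop the rollout
        if today - cohort["migrated_on"] < timedelta(days=BAKE_DAYS):
            return ("wait", cohort["id"])  # still baking
    return ("done", None)

cohorts = [
    {"id": "a", "migrated_on": date(2025, 1, 1)},
    {"id": "b", "migrated_on": None},
]
assert next_migration_action(cohorts, date(2025, 3, 1),
                             {"a": 1000}, {"a": 980}) == ("migrate", "b")
```

A big-bang migration has no equivalent of the "halt" state; that is the argument for cohorts.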
Detail in references/refresh-at-scale.md.
Common failure modes
Rapid-fire. Diagnoses in references/common-pseo-failures.md.
- "We generated 50,000 pages and got penalized." Thin content, scaled too fast, no QC discipline.
- "We have 10,000 pages but only 200 rank." Internal linking architecture missing; child pages are orphans.
- "Crawl rate plateaued." Crawl budget exhausted; noindex thin pages and improve sitemap segmentation.
- "Pages look identical." Template lacks variable density; data is too sparse to differentiate.
- "We cannot update the templates without 10,000 manual fixes." Template versioning was not designed for scale.
- "Our quality control is whoever has time." No sampling discipline, no thresholds, no ownership.
- "AI engines do not cite our pages." Top-200-word answer was not designed; structured data is thin.
- "We are at 100,000 pages and engagement is dropping." The set has outgrown its data depth.
- "Competitors copied our pages." The data source was not a moat.
- "Refresh is overwhelming." Cadence was not designed for scale; templates are not migration-friendly.
- "Our pSEO drives traffic but no conversions." Query intent does not match buyer intent; reconsider whether pSEO was the right channel for the program goal.
The framework: 12 considerations for programmatic SEO
When designing or auditing a programmatic SEO program, walk these 12 considerations.
1. Right answer for the program. pSEO needs real underlying data, queryable intent, and quality-control budget. Default to no if any is missing.
2. Data source as moat. Replicable data is not defensible; first-party, licensed, expert-curated, or synthesized data is.
3. Schema depth. 15+ fields per record minimum; 30+ for competitive depth.
4. Template variable density. Accommodates sparse and dense records gracefully.
5. Above-the-fold answer. First 200 words answer the user's specific query.
6. Schema markup at scale. Structured data on every page; this is the AEO/GEO signal.
7. Internal linking architecture. Hub-and-spoke plus sibling linking, no orphans.
8. Quality control sampling. 50 to 200 pages per audit, balanced across data shape.
9. Failure thresholds. More than 5% sample failure halts generation until template or data fixes ship.
10. Crawl budget discipline. Sitemap segmentation, noindex on thin pages, canonicalization on duplicates.
11. Refresh cadence. Quarterly minimum data refresh; cohort-by-cohort template migration.
12. Pruning lifecycle. Pages that fail to earn traffic in 12 to 24 months get noindexed or removed.
The output of the framework is a pSEO design document the team can reference at every stage: data source named, schema versioned, template specified, QC discipline budgeted, internal linking architecture mapped, refresh cadence set.
Reference files
references/when-pseo-works-decision.md - Five-criterion decision framework with worked examples of yes, no, and maybe.
references/data-source-identification-patterns.md - First-party, licensed, public, expert-curated, synthesized; defensibility analysis.
references/template-design-patterns.md - Variable density, above-the-fold answer, heading hierarchy, schema integration, link slots.
references/schema-design-patterns.md - Field count, computed fields, cross-record fields, update tags, schema-as-product.
references/quality-control-at-scale.md - Sampling strategy, automated checks, manual review checklist, failure thresholds, cohort tracking.
references/internal-linking-at-scale.md - Hub-and-spoke, sibling linking, anchor text variation, orphan prevention.
references/crawl-budget-management.md - Sitemap segmentation, noindex patterns, canonicalization, crawl rate monitoring, pruning criteria.
references/aeo-geo-for-programmatic-pages.md - Direct-answer extraction, structured data signals, citation patterns, two-engine optimization.
references/refresh-at-scale.md - Data refresh cadence, template versioning, cohort migration, pruning lifecycle.
references/common-pseo-failures.md - Eleven-plus failure patterns with diagnoses and fixes.
Closing: quality at scale or not at all
The only durable pSEO programs are the ones that hold quality at scale. The temptation to scale quantity at the cost of quality has produced the entire bad reputation pSEO carries. Sites that resist that temptation, maintaining real data depth, real quality control, and real refresh discipline, keep producing meaningful traffic for years. Sites that do not are penalized within months of an algorithm update.
The discipline is not optional and it is not free. Budget for it before generating the first page, or do not start at all.
When in doubt about whether a pSEO program is ready, ask: is the data source actually unique, is the schema deep, does the template lead with the answer, is the internal linking architecture mapped, is QC budgeted in headcount, is the refresh cadence set? If yes to all of those, ship the design document and let the engineering work begin. If no to any of them, fix the gap before any pages generate.