| name | gsc-pull |
| description | Pull and analyze Google Search Console data via the GSC API. Produces: weekly query × page performance with WoW/MoM/YoY deltas, top movers report (winners + losers ranked by absolute click change), click-loss attribution (lost rank? lost impressions? lost CTR?), query-page mismatch detection (page ranking for the wrong query), branded-vs-nonbranded split, position-bucket analysis (top 3, 4-10, 11-20, 21+), country / device / search-type segmentation, and the URL Inspection findings for sample URLs (indexed? canonical correct? mobile-friendly? Core Web Vitals?). All output is YOUR data in YOUR DB — no vendor lock-in. Matches the GSC integration that Visibly AI / Semrush / Ahrefs gate behind monthly subscriptions. TRIGGER on "GSC", "Google Search Console", "Search Console API", "rank tracking", "SEO position changes", "click loss", "impression loss", "top movers", "indexing report", "query analysis". |
GSC Pull & Analysis Pipeline
You generate a Google Search Console data pipeline + analyzer. The platform value of Semrush/Ahrefs/Visibly is mostly "we wrap the GSC API for you." You can do it better by giving the user the data + the queries + the analysis in their own stack.
============================================================
=== PRE-FLIGHT ===
Recovery:
- If service account isn't set up: generate the gcloud + IAM commands; the user runs them once.
- If property is URL prefix only: warn user; data will be incomplete vs domain property.
============================================================
=== PHASE 1: API CLIENT + AUTH ===
Generate gsc_client.py (Python preferred — Google's google-api-python-client is mature):
from google.oauth2 import service_account
from googleapiclient.discovery import build
class GSCClient:
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]
def __init__(self, service_account_path: str, site_url: str):
creds = service_account.Credentials.from_service_account_file(
service_account_path, scopes=self.SCOPES
)
self.service = build("searchconsole", "v1", credentials=creds)
self.site = site_url
def search_analytics(self, start, end, dimensions, filters=None, row_limit=25000):
"""
dimensions: ["date", "query", "page", "country", "device", "searchAppearance"]
Returns up to 25k rows per call. Use date partitioning for full coverage.
"""
body = {
"startDate": start.isoformat(),
"endDate": end.isoformat(),
"dimensions": dimensions,
"rowLimit": row_limit,
"dataState": "final",
}
if filters: body["dimensionFilterGroups"] = filters
...
def inspect_url(self, inspection_url: str, language_code="en-US"):
"""URL Inspection API — 2k requests per day per site. Use sparingly."""
return self.service.urlInspection().index().inspect(body={
"inspectionUrl": inspection_url,
"siteUrl": self.site,
"languageCode": language_code,
}).execute()
VALIDATION: Smoke test fetches recent 7-day query data. URL inspection returns IndexStatusResult for a known indexed page.
============================================================
=== PHASE 2: DATA SCHEMA ===
CREATE TABLE gsc_daily (
date DATE NOT NULL,
query TEXT,
page TEXT,
country TEXT,
device TEXT,
search_appearance TEXT,
impressions INTEGER,
clicks INTEGER,
position REAL,
ctr REAL GENERATED ALWAYS AS (CASE WHEN impressions > 0 THEN clicks * 1.0 / impressions ELSE 0 END),
PRIMARY KEY (date, query, page, country, device, search_appearance)
);
CREATE INDEX idx_gsc_query ON gsc_daily(query);
CREATE INDEX idx_gsc_page ON gsc_daily(page);
CREATE INDEX idx_gsc_date ON gsc_daily(date);
CREATE TABLE gsc_url_inspection (
url TEXT PRIMARY KEY,
last_inspected DATETIME,
index_status TEXT,
coverage_state TEXT,
google_canonical TEXT,
user_canonical TEXT,
mobile_usable BOOLEAN,
rich_results_eligible BOOLEAN,
last_crawl DATETIME,
crawled_as TEXT,
page_fetch_state TEXT,
raw_response JSON
);
VALIDATION: 16-month backfill loads without primary-key conflicts. Storage size estimate matches expectation (~10MB per 1k unique queries per month).
============================================================
=== PHASE 3: WEEKLY PULL JOB ===
Schedule (cron or scheduled skill):
daily 02:00 → pull yesterday's data, idempotent upsert
weekly Mon 03:00 → reconcile last 7 days, compute deltas
monthly 1st 04:00 → 16-month backfill check, fill gaps
Per pull:
- Fetch
[date, query, page, country, device] daily.
- Fetch
[date, query] and [date, page] separately (deduplicated rollups, smaller payload).
- Sample 50 URLs/day for URL Inspection (respect 2k/day cap).
VALIDATION: Daily cron runs without manual intervention for 30 days. No duplicate-key errors.
============================================================
=== PHASE 4: ANALYSIS — TOP MOVERS ===
Generate top_movers.py:
For each (query, page) pair appearing in either of two windows (current week vs comparison week), compute:
@dataclass
class Mover:
query: str
page: str
clicks_current: int
clicks_prior: int
clicks_delta: int
clicks_pct_change: float
impressions_delta: int
position_delta: float
ctr_delta: float
is_winner: bool
Winners: top 25 by clicks_delta (filter impressions_prior > 100 to filter noise).
Losers: bottom 25 by clicks_delta (same filter).
Click-loss attribution: for each loser, classify:
RANK_LOSS: position_delta > 1.0 and clicks_delta < 0
IMPRESSION_LOSS: position stable, impressions down (Google reduced surfacing)
CTR_LOSS: position stable, impressions stable, CTR down (SERP feature ate clicks)
SEASONAL: matches a YoY pattern at the same query
VALIDATION: Movers report identifies a known win/loss from a fixture.
============================================================
=== PHASE 5: ANALYSIS — QUERY-PAGE MISMATCH ===
A query-page mismatch happens when:
- A query ranks via the WRONG page on your site (target page is buried, intent page underperforms).
- Multiple pages compete for the same query (= cannibalization — chain to
/content-cannibalization).
Detect: for each query with > 100 impressions in the window, list all (page, impressions, clicks, avg position). Flag where the top-impression page differs from the page you intended to rank.
Output query_page_mismatch.csv with recommended action (consolidate, canonicalize, redirect, differentiate, or update internal links).
VALIDATION: Mismatches show concrete page paths + actionable recommendation.
============================================================
=== PHASE 6: BRANDED-VS-NONBRANDED SPLIT ===
For each query, classify branded or nonbranded based on the brand alias list:
def classify(query: str, brand_aliases: list[str]) -> str:
q = query.lower()
return "branded" if any(a.lower() in q for a in brand_aliases) else "nonbranded"
Compute weekly totals: branded_clicks, nonbranded_clicks, branded_share. Track trend over time.
Healthy growth profile: nonbranded growing faster than branded (you're acquiring new users, not just being searched for by existing ones).
VALIDATION: Branded/nonbranded sums equal total clicks. Trend chart renders.
============================================================
=== PHASE 7: REPORTS ===
gsc-pull/
├── README.md
├── data/
│ └── gsc.db # SQLite (or pointer to Postgres)
├── reports/
│ ├── weekly_summary.md
│ ├── top_movers.csv # winners + losers
│ ├── click_loss_attribution.md
│ ├── query_page_mismatch.csv
│ ├── branded_vs_nonbranded.csv
│ ├── position_bucket_analysis.csv # % of clicks from top 3 / 4-10 / 11-20 / 21+
│ └── url_inspection_findings.md
└── cron/
└── daily_pull.sh
Weekly summary structure:
# GSC Weekly — {site} — {week}
## TL;DR
- Total clicks: X (+/- Y vs prior week, +/- Z YoY)
- Total impressions: X
- Avg position: X
- Branded share: X%
- Top winner: {query} — {page} (+N clicks)
- Top loser: {query} — {page} (-N clicks)
## Top 5 Winners
{table}
## Top 5 Losers + Attribution
{table with rank-loss / impression-loss / CTR-loss labels}
## Query-Page Mismatches
{N detected — see csv}
## Recommended Actions
{ordered by impact}
VALIDATION: Summary fits one screen. Recommended actions reference specific queries + pages.
============================================================
=== SELF-REVIEW ===
- Complete: API client + schema + cron + 4 analyses + reports?
- Robust: Handles 2k/day URL inspection cap? Survives API rate limits?
- Clean: SQLite/Postgres choice transparent? CSVs Excel-friendly?
- GSC-credible: Would an SEO consultant who lives in GSC daily accept the analyses?
Common gap: forgetting dataState=final vs all and getting estimated data that shifts next day. Confirm which is used in each query.
============================================================
=== LEARNINGS CAPTURE ===
~/.claude/skills/gsc-pull/LEARNINGS.md.
============================================================
=== STRICT RULES ===
- Never query
dataState="all" and then declare the data stable. Use final for analytical work.
- Never hit URL Inspection > 2,000/day per property — it's a hard cap.
- Never lose history. Start persisting NOW even if the user isn't analyzing yet.
- Never analyze without WoW/MoM/YoY context. Absolute numbers without comparison are vanity.
- Always classify click loss into the four root-cause buckets. "Clicks down" without attribution is useless.