stci-dataops
// Research and onboard new data sources for STCI, producing source profiles, collector stubs, and validation rules
| name | stci-dataops |
| description | Research and onboard new data sources for STCI, producing source profiles, collector stubs, and validation rules |
This skill assists with researching, evaluating, and onboarding new data sources for the STCI (Standard Token Cost Index). Given a provider or aggregator, it produces structured outputs ready for implementation.
Use this skill when you need to:
Given a source URL or name, gather:
Check and document:
Decision Matrix:
| Condition | Action |
|---|---|
| API explicitly public + no ToS restrictions | Proceed to T1/T2 source |
| API available but ToS unclear | Request legal review before production |
| Scraping required + ToS prohibits | Do NOT proceed - use alternative |
| Manual collection only viable | Proceed as T4 source with caveats |
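The decision matrix can also be written as a small lookup function, which makes it easier to apply consistently across sources. This is only a sketch: the `LegalAssessment` type, its field names, and the fallback message are hypothetical and are not part of STCI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalAssessment:
    """Hypothetical summary of the legal check for one source."""
    has_public_api: bool
    tos_status: str            # assumed values: "permits", "unclear", "prohibits"
    requires_scraping: bool
    manual_only: bool = False


def decide(a: LegalAssessment) -> str:
    """Map an assessment onto the rows of the decision matrix."""
    # Check the blocking row first so it always wins.
    if a.requires_scraping and a.tos_status == "prohibits":
        return "Do NOT proceed - use alternative"
    if a.manual_only:
        return "Proceed as T4 source with caveats"
    if a.has_public_api and a.tos_status == "permits":
        return "Proceed to T1/T2 source"
    if a.has_public_api and a.tos_status == "unclear":
        return "Request legal review before production"
    # The matrix does not cover this combination; the message is an assumption.
    return "No matching row - needs manual decision"


print(decide(LegalAssessment(has_public_api=True, tos_status="unclear",
                             requires_scraping=False)))
```

Evaluating the prohibiting row first is a deliberate choice here, so that a source cannot pass on the strength of another matching condition.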
For API sources:
# Fetch sample data
curl -s [API_URL] | head -100
# Analyze structure
# - Identify model ID field
# - Identify pricing fields (input, output, per-request)
# - Identify metadata fields (context window, etc.)
# - Note rate units (per-token, per-1K, per-1M)
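To support the structure analysis above, a short script can list every leaf field in a sample response with its type, which helps locate the model ID and pricing fields. This is a minimal sketch: `field_paths` is a hypothetical helper, and the sample payload is a trimmed copy of the OpenRouter example in this document.

```python
import json


def field_paths(value, prefix=""):
    """Yield (dotted_path, type_name) for every leaf in a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from field_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        # Inspect only the first element; entries are assumed to share one shape.
        if value:
            yield from field_paths(value[0], f"{prefix}[]")
    else:
        yield prefix, type(value).__name__


sample = json.loads(
    '{"data": [{"id": "openai/gpt-4o", "pricing": {"prompt": "0.0000025"}}]}'
)
paths = dict(field_paths(sample))
for path, type_name in paths.items():
    print(f"{path}: {type_name}")
```

In this sample the price arrives as a string rather than a number, which is the kind of detail worth recording in the source profile.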
For HTML sources:
# Check page structure
# - Identify pricing table elements
# - Note update indicators (timestamps, version)
# - Assess scraping complexity
Output a completed source profile using the template:
000-docs/009-DR-TMPL-source-profile.md
Required sections:
Produce a Python module following this pattern:
# services/collector/sources/{source_id}.py
from datetime import date
from typing import List

from .base import BaseSource


class {SourceName}Source(BaseSource):
    """
    {Source description}

    See: {Source URL}
    """

    API_URL = "{api_url}"

    @property
    def source_id(self) -> str:
        return "{source_id}"

    @property
    def source_tier(self) -> str:
        return "{tier}"  # T1, T2, T3, or T4

    def fetch(self, target_date: date) -> List[dict]:
        # Implementation: return one dict per observed model for target_date
        pass
Specify source-specific rules:
validation:
  required_fields:
    - model_id
    - input_rate
    - output_rate
  rate_bounds:
    input_max: 100.0  # USD per 1M
    output_max: 500.0
  model_id_pattern: "^[a-z0-9-]+/[a-z0-9.-]+$"
  cross_reference:
    enabled: true
    tolerance: 0.10  # 10% tolerance
    reference_source: "openrouter"
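One way these rules could be applied to a single observation is sketched below. It rests on several assumptions: the rules are copied into a Python dict instead of being loaded from YAML, rates may not be negative, the tolerance is a relative difference against the reference rate, and the reference observation is passed in by the caller. The `validate` function is hypothetical and is not part of the STCI codebase.

```python
import re
from typing import List, Optional

# Mirror of the validation rules above (assumed to live under `validation:`).
RULES = {
    "required_fields": ["model_id", "input_rate", "output_rate"],
    "rate_bounds": {"input_max": 100.0, "output_max": 500.0},  # USD per 1M
    "model_id_pattern": r"^[a-z0-9-]+/[a-z0-9.-]+$",
    "cross_reference": {"enabled": True, "tolerance": 0.10,
                        "reference_source": "openrouter"},
}


def validate(obs: dict, reference: Optional[dict] = None,
             rules: dict = RULES) -> List[str]:
    """Return a list of rule violations; an empty list means the row passed."""
    missing = [f for f in rules["required_fields"] if obs.get(f) is None]
    if missing:
        # Later checks need these fields, so stop here.
        return [f"missing field: {f}" for f in missing]

    errors = []
    if not re.match(rules["model_id_pattern"], obs["model_id"]):
        errors.append(f"model_id does not match pattern: {obs['model_id']}")

    bounds = rules["rate_bounds"]
    for field, limit in (("input_rate", bounds["input_max"]),
                         ("output_rate", bounds["output_max"])):
        if not 0 <= obs[field] <= limit:  # lower bound of 0 is an assumption
            errors.append(f"{field} out of bounds: {obs[field]}")

    xref = rules["cross_reference"]
    if xref["enabled"] and reference is not None:
        for field in ("input_rate", "output_rate"):
            ref = reference[field]
            if ref > 0 and abs(obs[field] - ref) / ref > xref["tolerance"]:
                errors.append(f"{field} deviates from {xref['reference_source']}")
    return errors


row = {"model_id": "openai/gpt-4o", "input_rate": 2.5, "output_rate": 10.0}
print(validate(row))
```

Returning a list of messages instead of raising on the first failure lets a collector log every problem with a row in one pass.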
Create test data for the source:
// data/fixtures/{source_id}_sample.json
[
  {
    "observation_id": "obs-2026-01-01-{source_id}-{model}",
    "provider": "{provider}",
    ...
  }
]
After running this skill, you should have:
- 000-docs/0XX-DR-REFF-{source_id}-profile.md
- services/collector/sources/{source_id}.py
- data/fixtures/{source_id}_sample.json
- data/fixtures/methodology.yaml

Source: OpenRouter
URL: https://openrouter.ai
API: https://openrouter.ai/api/v1/models
Legal Assessment:
Data Structure:
{
  "data": [
    {
      "id": "openai/gpt-4o",
      "name": "GPT-4o",
      "pricing": {
        "prompt": "0.0000025",
        "completion": "0.00001"
      }
    }
  ]
}
Normalization:
- id → model_id
- pricing.prompt × 1M → input_rate_usd_per_1m
- pricing.completion × 1M → output_rate_usd_per_1m

Source Tier: T1 (public API, high confidence)
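The normalization mapping above can be sketched in a few lines of Python. The function name `normalize_openrouter` is hypothetical, and the payload is the sample shown under Data Structure. `Decimal` is used for the multiplication on the assumption that the per-token string prices should be scaled without binary floating-point rounding.

```python
from decimal import Decimal
from typing import List

PER_MILLION = Decimal(1_000_000)


def normalize_openrouter(payload: dict) -> List[dict]:
    """Convert per-token string prices into USD per 1M tokens."""
    rows = []
    for model in payload.get("data", []):
        pricing = model["pricing"]
        rows.append({
            "model_id": model["id"],
            "input_rate_usd_per_1m": float(Decimal(pricing["prompt"]) * PER_MILLION),
            "output_rate_usd_per_1m": float(Decimal(pricing["completion"]) * PER_MILLION),
        })
    return rows


payload = {
    "data": [
        {
            "id": "openai/gpt-4o",
            "name": "GPT-4o",
            "pricing": {"prompt": "0.0000025", "completion": "0.00001"},
        }
    ]
}
rows = normalize_openrouter(payload)
print(rows)
```

For the sample payload this gives an input rate of 2.5 and an output rate of 10.0 USD per 1M tokens.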
STCI Data Operations Skill Version: 1.0.0