تشغيل أي مهارة في Manus بنقرة واحدة

llm-prompting

النجوم٠

التفرعات٠

آخر تحديث٩ فبراير ٢٠٢٦ في ٠٢:١٦

Expert guidance on LLM prompting patterns used in this project. Covers versioned prompts, schema enforcement, category calibration, multi-provider fallback, and narrative generation. Activates when working on prompts, LLM integration, or evaluation logic.

التثبيت

التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.

تشغيل في Manus

المصدر

uabbasi

uabbasi/good-measure-giving

فتح مستودع GitHub عرض مستودعات المنشئ

تنزيل

تشغيل في Manus

المهن ذات الصلةSOC

استنادا إلى تصنيف SOC المهني

مطوّرو البرمجياتمهن الحاسوب والرياضيات·SOC 15-1252

SKILL.md

readonly

المزيد من هذا المستودع

نفس المستودع

frontend-design

uabbasi/good-measure-giving

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, artifacts, posters, or applications (examples include websites, landing pages, dashboards, React components, HTML/CSS layouts, or when styling/beautifying any web UI). Generates creative, polished code and UI design that avoids generic AI aesthetics.

2026-06-140

zakat-fiqh

uabbasi/good-measure-giving

Islamic jurisprudence expertise on zakat - the 8 categories of recipients (asnaf), scholarly interpretations across madhabs, and how to assess charity alignment with zakat eligibility. Activates when working on zakat classification, wallet tags, or donor guidance.

2026-06-070

analytics

uabbasi/good-measure-giving

Unified analytics (Cloudflare + Firestore + GA4). Pull traffic data, user activity, feature adoption, and giving metrics. Use when checking site performance, user behavior, or feature usage.

2026-04-040

content-research-writer

uabbasi/good-measure-giving

Write accessible, research-backed white papers and guides on complex topics. Target 8th grade readability while citing best-in-class scholarship. Output is 2-5 page PDFs that compete for attention in a TikTok world. NOT for charity narratives.

2026-02-090

data-pipeline

uabbasi/good-measure-giving

Run the charity evaluation pipeline - extraction, 4-stage V2 workflow, Supabase patterns, versioning. Use when working on collectors, scrapers, pipeline code, database queries, or debugging data flow.

2026-02-090

evidence-based-giving

uabbasi/good-measure-giving

Philanthropic expertise on charity evaluation methodologies, grantmaking best practices, cost-effectiveness analysis, and strategic giving. Activates when working on charity scoring, evaluation frameworks, donor guidance, or impact assessment.

2026-02-090

name	llm-prompting
description	Expert guidance on LLM prompting patterns used in this project. Covers versioned prompts, schema enforcement, category calibration, multi-provider fallback, and narrative generation. Activates when working on prompts, LLM integration, or evaluation logic.

LLM Prompting Expert

Deep expertise on the prompting infrastructure, patterns, and conventions used in this codebase.

When This Skill Activates

Writing or modifying prompts in data-pipeline/src/llm/prompts/
Working on NarrativeEvaluator, NarrativeJudge, or scoring evaluators
Debugging LLM outputs or schema validation
Adding new LLM-powered features
Optimizing prompt performance or cost

Prompt Infrastructure

Versioned Prompt System

Location: data-pipeline/src/llm/prompts/

Format: All prompts use frontmatter with version tracking:

# PROMPT: prompt_name
# VERSION: 1.0.0
# LAST_UPDATED: 2025-11-27
# DESCRIPTION: Brief description
# ---PROMPT_START---
[actual prompt content]

Key features:

SHA256 content hashing (16-char truncated) detects unintended changes
Version check modes: warn, strict, or off
Loader: prompt_loader.py provides load_prompt(), load_prompt_raw(), list_prompts()

Rule: Content changes without version bumps trigger warnings/errors.

Prompt Files

File	Version	Purpose
`baseline_narrative.txt`	v3.0.0	150-200 word summaries
`rich_narrative.txt`	v1.3.0	500-800 word deep dives
`zakat_alignment.txt`	v1.0.0	Zakat classification
`categories/*.txt`	varies	16 category calibrations

LLM Client Architecture

File: `data-pipeline/src/llm/llm_client.py`

Multi-provider via LiteLLM:

Primary: Gemini 3.0 Pro/Flash
Fallback: Claude 3.5 Sonnet, GPT-4

Task-based model selection (LLMTask enum):

Task	Model	Reason
`NARRATIVE_GENERATION`	Gemini 3 Flash	Best quality/cost
`WEBSITE_EXTRACTION`	Gemini 3 Flash	Fast, cheap
`PDF_EXTRACTION`	Gemini 3 Flash	Document handling
`PREMIUM_NARRATIVE`	Gemini 3 Flash	Top charities
`EVALUATION_SCORING`	Gemini 3 Flash	LLM-as-judge

Fallback logic:

Transient errors (429, 503, rate limits): Try fallback model
Permanent errors (auth, schema validation): Fail immediately
Max 3 attempts per chain

Safety settings disabled (charity content is not harmful):

safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    ...
]

Schema Enforcement

Location: `data-pipeline/src/llm/schemas/`

All LLM outputs use Pydantic models with Gemini JSON schema enforcement:

# In narrative_evaluator.py
response = llm_client.generate(
    prompt=formatted_prompt,
    json_schema={
        "name": "baseline_narrative",
        "schema": BaselineNarrative.model_json_schema()
    }
)

Key Schemas

BaselineNarrative (150-200 words):

class BaselineNarrative(BaseModel):
    charity_name: str
    summary: str  # 2-3 sentences
    tier_1_strategic_fit: Tier1StrategicFit
    tier_2_execution: Tier2Execution
    overall_score: int  # 0-100
    strengths: List[Strength]
    improvement_areas: List[ImprovementArea]
    zakat_guidance: ZakatGuidance
    confidence: Confidence

Sub-score models:

class SubScore(BaseModel):
    score: int
    narrative: str  # Explain reasoning

class UmmahGapSubScores(BaseModel):
    replaceability: SubScore  # 0-8
    ummah_relevance: SubScore  # 0-6
    funding_gap: SubScore  # 0-6

Validation: Pydantic ensures type safety + sub-score sums match totals.

Scoring Framework: AMAL Impact Matrix

Embedded in prompts as calibration tables

Tier 1: Strategic Fit (50 points)

Dimension	Range	Levels
Systemic Leverage	0-30	World-class → National → Regional → Local → Consumptive
Ummah Gap	0-20	3 sub-scores: replaceability (0-8), ummah_relevance (0-6), funding_gap (0-6)

Tier 2: Execution (50 points)

Dimension	Range	Sub-scores
Operational Capability	0-25	governance (0-8), program_efficiency (0-8), deployment_capacity (0-6), track_record (0-3)
Mission Delivery	0-25	delivery_evidence (0-12), cost_effectiveness (0-8), learning_adaptation (0-5)

Key pattern: Every score level has specific criteria with examples embedded in prompt.

Category-Specific Calibration

Location: `data-pipeline/src/llm/prompts/categories/`

16 category-specific guides inject world-class benchmarks:

Example: HUMANITARIAN.txt

## WORLD-CLASS BENCHMARKS
- MSF (Doctors Without Borders)
- ICRC (International Committee of the Red Cross)
- UNHCR
- World Food Programme
- Partners in Health

## SCORING CALIBRATION FOR HUMANITARIAN ORGS

### Systemic Leverage (0-30)
26-30: International treaty influence, paradigm-shifting models
20-25: Operating at scale ($100M+), policy influence
...

How it works:

category_classifier.py maps EIN → category via config/charity_categories.yaml
NarrativeEvaluator inserts category prompt before charity data
LLM calibrates scores against domain-specific benchmarks

Categories (16 total):

HUMANITARIAN, BASIC_NEEDS, MEDICAL_HEALTH
EDUCATION_K12_RELIGIOUS, EDUCATION_HIGHER_RELIGIOUS, EDUCATION_INTERNATIONAL
RELIGIOUS_CONGREGATION, RELIGIOUS_OUTREACH
CIVIL_RIGHTS_LEGAL, WOMEN'S_SERVICES, ADVOCACY_CIVIC
ENVIRONMENT_CLIMATE, RESEARCH_POLICY, MEDIA_JOURNALISM
PHILANTHROPY_GRANTMAKING, SOCIAL_SERVICES

Prompt Templating

Variable substitution pattern:

prompt_template = load_prompt_raw("baseline_narrative")

formatted_prompt = prompt_template.format(
    charity_data=self._format_charity_data(charity),
    deterministic_scores=self._format_deterministic_scores(charity),
    category_section=self._get_category_section(charity)
)

Variables:

{charity_data}: Name, EIN, mission, financials, raw scraped data
{deterministic_scores}: Pre-calculated Tier 2 sub-scores (LLM cannot change these)
{category_section}: Category-specific benchmarks and calibration

Intentional exclusions:

Does NOT include: amal_score, amal_rating, wallet_tag
Reason: Prevent anchoring; let LLM score fresh

Deterministic Score Override

Critical pattern: Some scores are calculated deterministically, not by LLM.

# In narrative_evaluator.py after LLM generation:

# Override LLM scores with deterministic values
narrative.tier_2_execution.operational_capability.governance = deterministic_governance
narrative.tier_2_execution.operational_capability.program_efficiency = deterministic_efficiency
narrative.tier_2_execution.operational_capability.deployment_capacity = deterministic_capacity
narrative.tier_2_execution.operational_capability.track_record = deterministic_track_record

# Recalculate all subtotals deterministically
narrative.recalculate_totals()

Why: Ensures reproducibility independent of LLM behavior. LLM provides rationales; code provides numbers.

Zakat Classification: Self-Assertion

Prompt: `zakat_alignment.txt`

Pattern: Trust what the charity claims, don't make independent rulings.

ZAKAT_ELIGIBLE: Charity explicitly claims zakat eligibility on website
SADAQAH_STRATEGIC: No zakat claim BUT tier_1_strategic_fit.subtotal > 35
SADAQAH_ONLY: No zakat claim AND tier_1_strategic_fit.subtotal <= 35

8 Asnaf embedded in prompt:

Al-Fuqara (the poor)
Al-Masakin (the destitute)
Al-Amileen (zakat administrators)
Al-Muallafatul Quloob (hearts to be reconciled)
Fi Al-Riqab (freeing from bondage)
Al-Gharimeen (those in debt)
Fi Sabilillah (in Allah's cause)
Ibn Al-Sabil (stranded travelers)

Required disclaimer: "This analysis is informational only and does NOT constitute a fatwa."

LLM-as-Judge Quality Gating

File: `narrative_judge.py`

5 dimensions (each 0-100):

Dimension	Weight	What it checks
Factual Accuracy	30%	Claims match source data?
Completeness	20%	Required fields populated?
Coherence	20%	Internally consistent?
Tone/Style	15%	Donor-facing voice?
Zakat Validity	15%	Classification defensible?

Routing thresholds:

≥ 85: Auto-approve
60-84: Human review
< 60: Auto-reject

Important: Judge provides quality metrics only, not routing decisions.

Evidence Citation Pattern

Every claim requires structured evidence:

class Evidence(BaseModel):
    claim: str           # The claim being made
    source: str          # "Form 990", "Charity Navigator", "Website"
    source_year: int     # Year of data
    field: str           # Specific field from source
    value: str           # Actual value
    confidence: str      # "HIGH", "MEDIUM", "LOW"

Prompt principle: "Every claim MUST cite source and year."

Retry & Validation

Generation retry loop (max 3 attempts):

for attempt in range(MAX_GENERATION_RETRIES):
    response = llm_client.generate(prompt, json_schema=schema)

    # Validation checks:
    narrative = BaselineNarrative.model_validate_json(response)
    validate_sub_score_sums(narrative)
    density = calculate_information_density(narrative)

    if density >= 0.80:
        return narrative

    # Retry with feedback
    prompt = add_density_feedback(prompt, density)

Information density check:

Counts populated fields vs total schema fields
Threshold: 0.80 (80% of fields must be populated)
Below threshold → human review or retry

Cost Tracking

Every LLM call logs:

class LLMResponse:
    model_version: str
    prompt_version: str
    prompt_hash: str
    db_snapshot_version: str
    timestamp: datetime
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float

Model registry in llm_client.py includes pricing for 30+ models.

Prompt Engineering Patterns

Pattern	How It's Used
Chain-of-thought	Each dimension has `narrative` field explaining reasoning
Scoring rubrics	Detailed 5-7 level tables embedded in prompts
Few-shot examples	Real org comparisons (MSF, Water.org, ACLU)
Low temperature	`temperature=0.3` for consistent output
Schema enforcement	JSON mode ensures exact structure
Category calibration	Domain-specific benchmarks injected
Self-assertion	Zakat based on charity's claim, not judgment
Deterministic fallback	Critical scores calculated outside LLM

Key Files Reference

File	Purpose
`prompt_loader.py`	Load/validate/version prompts
`llm_client.py`	Unified LLM interface + model registry
`narrative_evaluator.py`	Generate narratives with schema enforcement
`narrative_judge.py`	Quality gating (metrics only)
`baseline_narrative.txt`	Main prompt (v3.0.0)
`categories/*.txt`	16 category calibrations
`schemas/baseline.py`	BaselineNarrative Pydantic model
`schemas/judge.py`	JudgeResult model

Anti-Patterns

Don't:

Change prompt content without bumping version
Let LLM determine scores that should be deterministic
Skip schema validation
Use high temperature for scoring tasks
Embed specific charity data in prompt templates
Make fatwa-like zakat rulings (use self-assertion)

Do:

Version all prompts with frontmatter
Embed calibration tables directly in prompts
Use category-specific benchmarks
Validate sub-score sums match totals
Log all LLM calls with full metadata
Provide chain-of-thought fields for reasoning

llm-prompting

المزيد من هذا المستودع

المزيد من هذا المستودع

LLM Prompting Expert

When This Skill Activates

Prompt Infrastructure

Versioned Prompt System

Prompt Files

LLM Client Architecture

File: data-pipeline/src/llm/llm_client.py

Schema Enforcement

Location: data-pipeline/src/llm/schemas/

Key Schemas

Scoring Framework: AMAL Impact Matrix

Embedded in prompts as calibration tables

Category-Specific Calibration

Location: data-pipeline/src/llm/prompts/categories/

Prompt Templating

Variable substitution pattern:

Variables:

Intentional exclusions:

Deterministic Score Override

Zakat Classification: Self-Assertion

Prompt: zakat_alignment.txt

LLM-as-Judge Quality Gating

File: narrative_judge.py

Evidence Citation Pattern

Retry & Validation

Generation retry loop (max 3 attempts):

Information density check:

Cost Tracking

Prompt Engineering Patterns

Key Files Reference

Anti-Patterns

Don't:

Do:

LLM Prompting Expert

When This Skill Activates

Prompt Infrastructure

Versioned Prompt System

Prompt Files

LLM Client Architecture

File: data-pipeline/src/llm/llm_client.py

Schema Enforcement

Location: data-pipeline/src/llm/schemas/

Key Schemas

Scoring Framework: AMAL Impact Matrix

Embedded in prompts as calibration tables

Category-Specific Calibration

Location: data-pipeline/src/llm/prompts/categories/

Prompt Templating

Variable substitution pattern:

Variables:

Intentional exclusions:

Deterministic Score Override

Zakat Classification: Self-Assertion

Prompt: zakat_alignment.txt

LLM-as-Judge Quality Gating

File: narrative_judge.py

Evidence Citation Pattern

Retry & Validation

Generation retry loop (max 3 attempts):

Information density check:

Cost Tracking

Prompt Engineering Patterns

Key Files Reference

Anti-Patterns

Don't:

Do:

File: `data-pipeline/src/llm/llm_client.py`

Location: `data-pipeline/src/llm/schemas/`

Location: `data-pipeline/src/llm/prompts/categories/`

Prompt: `zakat_alignment.txt`

File: `narrative_judge.py`

File: `data-pipeline/src/llm/llm_client.py`

Location: `data-pipeline/src/llm/schemas/`

Location: `data-pipeline/src/llm/prompts/categories/`

Prompt: `zakat_alignment.txt`

File: `narrative_judge.py`