grpo-finetune

Stars: 35,959

Forks: 5,960

Last updated: June 3, 2026 at 05:24

Fine-tune a model with GRPO on Fireworks-managed GPUs from a plain-English task description and a dataset. Use this skill whenever the user wants to fine-tune, RL-tune, or GRPO-train a model on their own data — or says things like "train a model to extract/classify/score X", "fine-tune on this dataset", "set up a GRPO run", or describes a task plus a dataset plus a notion of what a good output looks like. Trigger even when the user does not name GRPO or Fireworks explicitly.

Source

patchy631

patchy631/ai-engineering-hub

GRPO Fine-Tune Skill

Keys (FIREWORKS_API_KEY, FIREWORKS_ACCOUNT_ID, OPENROUTER_API_KEY) are loaded from .env in the current directory. No extra setup is needed if the notebook has already been run.
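As a quick pre-flight check, the sketch below reads a .env file and reports which of the three keys are missing. It uses a small standard-library parser as a stand-in; the skill's own scripts may load the file differently, and the helper names `read_env_file` and `missing_keys` are hypothetical.

```python
from pathlib import Path

# Key names taken from the skill description above.
REQUIRED_KEYS = ("FIREWORKS_API_KEY", "FIREWORKS_ACCOUNT_ID", "OPENROUTER_API_KEY")


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines; ignore blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path=".env"):
    """Return the required keys that are absent or empty in the file."""
    values = read_env_file(path)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

`missing_keys()` returns an empty list when all three keys are present and non-empty, so it can gate the later steps.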

What you do when this skill triggers

1. Understand the task

Read the user's description. Sample 3-5 rows from their dataset (head the .jsonl) to see the prompt format and whether rows carry a gold answer field.
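One way to do the sampling in Python rather than with `head` is sketched below. The function names and the list of candidate gold-answer field names are assumptions for illustration; the real dataset may use different keys.

```python
import json
from itertools import islice


def sample_rows(path, n=5):
    """Return the first n non-empty rows of a JSONL file as parsed objects."""
    with open(path, encoding="utf-8") as f:
        lines = (line for line in f if line.strip())
        return [json.loads(line) for line in islice(lines, n)]


def summarize(rows, gold_candidates=("ground_truth", "answer", "label")):
    """Collect the field names seen and any that look like gold answers.

    gold_candidates is a guess at common names, not a fixed convention.
    """
    fields = sorted({key for row in rows if isinstance(row, dict) for key in row})
    gold = [name for name in gold_candidates if name in fields]
    return fields, gold
```

Calling `summarize(sample_rows("train.jsonl"))` gives the observed field names and the subset that might hold a gold answer, which is enough to describe the prompt format back to the user.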

2. Write reward.py

Use this exact reward — schema-only, same as the notebook. Do not add value matching, ground_truth comparison, or field-level scoring. Do not modify it.

import json
from jsonschema import validate, ValidationError

SCHEMA = {
    "type": "object",
    "required": ["vendor", "date", "amount", "currency"],
    "properties": {
        "vendor":   {"type": "string"},
        "date":     {"type": "string"},
        "amount":   {"type": "number"},
        "currency": {"type": "string"},
    },
    "additionalProperties": False,
}

def score(completion: str, row=None) -> float:
    try:
        parsed = json.loads(completion.strip())
    except (json.JSONDecodeError, ValueError):
        return 0.0
    try:
        validate(instance=parsed, schema=SCHEMA)
        return 1.0
    except ValidationError:
        return 0.5

SELF_TESTS = [
    ('{"vendor": "Acme", "date": "2024-01-15", "amount": 1250.0, "currency": "USD"}', None, 1.0),
    ('{"vendor": "Acme", "date": "2024-01-15"}', None, 0.5),
    ("not json", None, 0.0),
]

The score contract is: 1.0 = valid JSON with correct schema, 0.5 = valid JSON wrong shape, 0.0 = not JSON. This is the only reward logic needed.

3. Show it and offer the edit

Show the user reward.py and say: this is what training will optimize for — edit it if your notion of "good" differs. Wait for their go-ahead.

4. Validate

$PYTHON agent-skill/grpo-finetune/generate_reward.py --validate reward.py

Must print PASS before proceeding.
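The internals of generate_reward.py are not shown here. As a rough illustration of what such a check could involve, the sketch below imports a reward file by path and runs its SELF_TESTS against score. The helper names are hypothetical, and the real validator may do more or behave differently.

```python
import importlib.util


def load_reward(path):
    """Import a reward file as a module from its path."""
    spec = importlib.util.spec_from_file_location("reward", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def check_self_tests(path):
    """Run every (completion, row, expected) case; return the failures."""
    reward = load_reward(path)
    failures = []
    for completion, row, expected in reward.SELF_TESTS:
        got = reward.score(completion, row)
        if abs(got - expected) > 1e-9:
            failures.append((completion, expected, got))
    return failures


def report(path):
    """Print PASS, or one FAIL line per failing case; return True on success."""
    failures = check_self_tests(path)
    if not failures:
        print("PASS")
        return True
    for completion, expected, got in failures:
        print(f"FAIL: {completion!r} expected {expected}, got {got}")
    return False
```

Running `report("reward.py")` locally is a cheap way to catch an edited reward that no longer satisfies its own test cases before invoking the official validator.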

5. Run the pipeline

$PYTHON agent-skill/grpo-finetune/run_pipeline.py \
    --train <path-to-train.jsonl> \
    --eval  <path-to-eval.jsonl> \
    --task  <short-task-name> \
    --output-id <model-id>

Run this in the background immediately. Relay each checkpoint to the user as it lands — print it directly in your response, do not wait to batch them:

>>> Dataset ready · 200 prompts
>>> Training started on Fireworks GPUs
>>> Training complete · model deployed to ...
>>> Fine-tuned model · X% accuracy

The pipeline automatically runs the agent demo on sample invoices at the end.

Important: training takes 30 to 60 minutes or more. Use a timeout of at least 7200 seconds (2 hours); do not use the default 10-minute timeout.
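If the pipeline is launched from Python rather than from an agent's shell tool, the background run, the long timeout, and the checkpoint relay could be combined roughly as follows. This is a sketch under assumptions: `relay_checkpoints` is a hypothetical helper, and it assumes checkpoint lines begin with `>>>` as in the sample output above.

```python
import subprocess
import threading


def relay_checkpoints(cmd, timeout_s=7200, prefix=">>>"):
    """Run cmd and yield each output line that starts with prefix.

    The process is killed if it is still running after timeout_s seconds.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,
    )
    # Watchdog: kills the child if the timeout elapses first.
    timer = threading.Timer(timeout_s, proc.kill)
    timer.start()
    try:
        for line in proc.stdout:
            line = line.rstrip("\n")
            if line.startswith(prefix):
                yield line
        proc.wait()
    finally:
        timer.cancel()
        # Reached with a live process only if the consumer stopped early
        # or an error occurred; clean up so nothing is left running.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()
```

Each yielded line can be printed to the user immediately. A Python child process buffers stdout when writing to a pipe, so checkpoints may arrive late unless the child runs with `-u` or with `PYTHONUNBUFFERED=1` set.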
