| name | grpo-finetune |
| description | Fine-tune a model with GRPO on Fireworks-managed GPUs from a plain-English task description and a dataset. Use this skill whenever the user wants to fine-tune, RL-tune, or GRPO-train a model on their own data — or says things like "train a model to extract/classify/score X", "fine-tune on this dataset", "set up a GRPO run", or describes a task plus a dataset plus a notion of what a good output looks like. Trigger even when the user does not name GRPO or Fireworks explicitly.
|
GRPO Fine-Tune Skill
Keys (FIREWORKS_API_KEY, FIREWORKS_ACCOUNT_ID, OPENROUTER_API_KEY) are
loaded from .env in the current directory. No extra setup needed if the
notebook already ran.
What you do when this skill triggers
1. Understand the task
Read the user's description. Sample 3-5 rows from their dataset (head the
.jsonl) to see the prompt format and whether rows carry a gold answer field.
2. Write reward.py
Use this exact reward — schema-only, same as the notebook. Do not add value
matching, ground_truth comparison, or field-level scoring. Do not modify it.
import json
from jsonschema import validate, ValidationError
SCHEMA = {
"type": "object",
"required": ["vendor", "date", "amount", "currency"],
"properties": {
"vendor": {"type": "string"},
"date": {"type": "string"},
"amount": {"type": "number"},
"currency": {"type": "string"},
},
"additionalProperties": False,
}
def score(completion: str, row=None) -> float:
try:
parsed = json.loads(completion.strip())
except (json.JSONDecodeError, ValueError):
return 0.0
try:
validate(instance=parsed, schema=SCHEMA)
return 1.0
except ValidationError:
return 0.5
SELF_TESTS = [
('{"vendor": "Acme", "date": "2024-01-15", "amount": 1250.0, "currency": "USD"}', None, 1.0),
('{"vendor": "Acme", "date": "2024-01-15"}', None, 0.5),
("not json", None, 0.0),
]
The score contract is: 1.0 = valid JSON with correct schema, 0.5 = valid JSON
wrong shape, 0.0 = not JSON. This is the only reward logic needed.
3. Show it and offer the edit
Show the user reward.py and say: this is what training will optimize for —
edit it if your notion of "good" differs. Wait for their go-ahead.
4. Validate
$PYTHON agent-skill/grpo-finetune/generate_reward.py --validate reward.py
Must print PASS before proceeding.
5. Run the pipeline
$PYTHON agent-skill/grpo-finetune/run_pipeline.py \
--train <path-to-train.jsonl> \
--eval <path-to-eval.jsonl> \
--task <short-task-name> \
--output-id <model-id>
Run this in the background immediately. Relay each checkpoint to the user as it
lands — print it directly in your response, do not wait to batch them:
>>> Dataset ready · 200 prompts
>>> Training started on Fireworks GPUs
>>> Training complete · model deployed to ...
>>> Fine-tuned model · X% accuracy
The pipeline automatically runs the agent demo on sample invoices at the end.
Important: training takes 30-60+ minutes. Use a timeout of at least 7200
seconds. Do not use the default 10 minute timeout.