
Name: Cleanup Tasks
Author: PrimeIntellect-ai

// Diagnose and repair-or-remove broken tasks under environments/general_agent/tasks. Use after `general-agent validate` reports [FAIL] entries (post-synthesis, after a refactor, or on a periodic sweep).


stars:54

forks:18

updated: April 30, 2026 at 00:22


name	cleanup-tasks
description	Diagnose and repair-or-remove broken tasks under environments/general_agent/tasks. Use after `general-agent validate` reports [FAIL] entries (post-synthesis, after a refactor, or on a periodic sweep).

Clean up broken tasks

general-agent validate catches every task whose gold.json can't be replayed on its db.json, or whose verify(gold_db) != 1.0. Most failures are shallow data bugs that a targeted fix can salvage. A minority need deeper rewrites — delete those and move on.
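The checks described above can be pictured with a minimal sketch. This is not the project's implementation: the gold trace format (a list of `{"tool": ..., "args": ...}` steps), the `tools` registry, and the helper names `validate_task` and `set_status` are assumptions made for illustration. Only the failure messages are taken from this document.

```python
import copy


def validate_task(db, gold, tools, verify=None):
    """Return None if the task passes, else a failure message.

    Hypothetical sketch: `gold` is assumed to be a list of
    {"tool": name, "args": {...}} steps, `tools` a name -> callable map.
    """
    gold_db = copy.deepcopy(db)  # replay on a copy so the original stays intact
    for step in gold:
        tool = tools.get(step["tool"])
        if tool is None:
            return f"gold replay error: Unknown tool: {step['tool']}"
        try:
            tool(gold_db, **step["args"])
        except Exception as exc:  # any tool error aborts the replay
            return f"gold replay error: {exc}"
    if gold_db == db:
        return "gold solution did not change DB"
    if verify is not None:
        score = verify(gold_db)
        if score != 1.0:
            return f"verify(gold_db) = {score}, expected 1.0"
    return None


def set_status(db, app_id, status):
    """Toy tool used only to exercise the sketch."""
    for app in db["applications"]:
        if app["id"] == app_id:
            app["status"] = status
            return
    raise ValueError(f"Application {app_id} not found")
```

With a one-record db and `tools = {"set_status": set_status}`, a gold trace that targets an unknown id comes back as a `gold replay error: ... not found` message, and an empty trace comes back as `gold solution did not change DB`.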

Workflow

```
# 1. Snapshot the current failures
uv run general-agent validate 2>&1 | grep '\[FAIL\]' > /tmp/fails.txt
uv run general-agent validate --fail-only > /tmp/fails_only.txt
wc -l /tmp/fails_only.txt

# 2. Group by error class (most common → rarest)
grep '\[FAIL\]' /tmp/fails.txt \
  | awk -F'— \\[FAIL\\] ' '{print $2}' \
  | sort | uniq -c | sort -rn

# 3. Run the diagnose helper on the failing set; it prints one line per task
#    with gold trace length, verify score, and the first missing ref (if any)
uv run python environments/general_agent/skills/cleanup-tasks/diagnose.py $(cat /tmp/fails_only.txt)

# 4. Triage: fix the shallow ones, delete the rest (see heuristics below)

# 5. Re-validate; the output should be empty
uv run general-agent validate --fail-only
```
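Step 2 of the workflow above can also be done in Python when the counts need further processing. The sketch below assumes each line in `/tmp/fails.txt` looks like `<task> — [FAIL] <message>`, which is inferred from the `awk` separator and not confirmed elsewhere in this document. `group_failures` is a hypothetical helper name.

```python
from collections import Counter

# Same separator the awk command splits on (assumed line format).
SEPARATOR = "— [FAIL] "


def group_failures(lines):
    """Count identical failure messages, most common first."""
    counts = Counter()
    for line in lines:
        _, sep, message = line.partition(SEPARATOR)
        if sep:  # ignore lines that carry no [FAIL] marker
            counts[message.strip()] += 1
    return counts.most_common()


if __name__ == "__main__":
    with open("/tmp/fails.txt", encoding="utf-8") as handle:
        for message, count in group_failures(handle):
            print(f"{count:5d}  {message}")
```

Like the shell pipeline, this groups on the exact message text, so two `not found` errors that name different ids count as separate classes.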

Failure classes & fix heuristics

`gold replay error: Expecting property name enclosed in double quotes`

JSON syntax error in db.json. Always fixable. Run python3 -m json.tool < tasks/<name>/db.json; the tool points at the bad line. Usually a missing opening quote (id": "X" instead of "id": "X") or a stray comma.
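If the position needs to be picked up programmatically, the standard `json` module exposes it on the exception. A minimal sketch follows; `locate_json_error` and the `BROKEN` sample are illustrative names, not part of the repository.

```python
import json

# Sample with the missing opening quote described above (illustrative data).
BROKEN = '{\n  "fish": [\n    {id": "fish-1"}\n  ]\n}'


def locate_json_error(text):
    """Return (line, column, message) for the first syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.msg
    return None


if __name__ == "__main__":
    print(locate_json_error(BROKEN))
```

Both `lineno` and `colno` are 1-based, so they map directly onto what an editor shows.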

`gold replay error: <Entity> <id> not found`

gold.json references an entity not in db.json. Three sub-cases:

Entity just missing — the task was synthesised against a richer db. Check whether adding a reasonable record closes the gap:
```
# Copy an existing record as a template, override the id,
# then adjust the copied fields so the new record is realistic
template = db["fish"][0]
db["fish"].append({**template, "id": "fish-dwarf-gourami"})
```
Good when ≤2 entities need to be added and the schema is well understood (one existing record covers it).
ID naming drift — the gold uses TEAM-PHX while db has TEAM-001..TEAM-016. Always delete; rewriting either side is a full rescope.
Chain ordering — the gold calls a tool that depends on a prior step that isn't in the trace (e.g. "Animal X's trainer must be booked in ring Y before assigning the animal"). Delete unless the missing step is obvious and cheap.
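To decide which of the three sub-cases applies, it helps to list every id the gold references that the db lacks. The sketch below is a rough approximation under loud assumptions: db is taken to be a mapping of table name to a list of records with an `"id"` field, gold a list of `{"tool": ..., "args": ...}` steps, and any argument whose name ends in `_id` is treated as an entity reference. None of this is confirmed by the repository; the bundled `diagnose.py` is the authoritative tool.

```python
def collect_ids(db):
    """Gather every "id" value from the list-valued tables in db (assumed layout)."""
    return {
        record["id"]
        for table in db.values()
        if isinstance(table, list)
        for record in table
        if isinstance(record, dict) and "id" in record
    }


def missing_refs(db, gold):
    """List ids referenced by gold arguments named *_id that db does not contain."""
    known = collect_ids(db)
    missing = []
    for step in gold:
        for key, value in step.get("args", {}).items():
            if key.endswith("_id") and value not in known and value not in missing:
                missing.append(value)
    return missing
```

One or two missing ids that follow the db's naming scheme point at the "entity just missing" case. A long list, or ids in a different scheme such as TEAM-PHX against TEAM-001, points at naming drift.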

`gold replay error: Unknown tool: <name>`

Gold was written against an older/different tools.py. Delete unless the missing tool is a rename of something still present (rare).

`gold solution did not change DB`

Gold is a no-op. Always a task-authoring bug. Delete.

`verify(gold_db) = X, expected 1.0`

The gold replays cleanly but the task's own verify() function rejects the resulting state.

Fractional score (0.25/0.5/0.75) — a list-valued target has a stale entry. Typical fix: shrink the target list to match what the gold actually processes. Example: embassy_t4 had target_application_ids = ['VA-001','VA-002','VA-003','VA-004'] but the instruction + gold only cover three applicants; remove VA-004.
Exactly 0.0 with an obvious numeric constraint — e.g. "total 35 > budget 30". Raise the budget (or lower the price) so the gold's chosen items fit. Check the instruction to make sure the numeric change is faithful to it.
Exactly 0.0 with no shallow explanation — delete. These require working through the verify logic step-by-step against the gold chain; rarely worth the time unless the task is load-bearing for something downstream.
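The fractional case is easiest to see with the arithmetic written out. The sketch below uses a hypothetical `verify` that scores the share of target applications the gold processed. The real `verify()` functions live in each task's tools.py and may score differently; the record layout here is made up for the example.

```python
def verify(db, target_application_ids):
    """Hypothetical scorer: fraction of targets that left the "pending" state."""
    processed = {
        app["id"] for app in db["applications"] if app["status"] != "pending"
    }
    hits = sum(1 for app_id in target_application_ids if app_id in processed)
    return hits / len(target_application_ids)


# State after replaying a gold trace that covers only three applicants.
gold_db = {
    "applications": [
        {"id": "VA-001", "status": "approved"},
        {"id": "VA-002", "status": "rejected"},
        {"id": "VA-003", "status": "approved"},
    ]
}

stale_targets = ["VA-001", "VA-002", "VA-003", "VA-004"]
trimmed_targets = ["VA-001", "VA-002", "VA-003"]
```

With the stale list the score is 3/4 = 0.75. After removing VA-004 it is 3/3 = 1.0, which is why a fractional score so often traces back to one stale entry.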

`[WARN] missing verify()`

Not counted as failing by default, but these tasks can only be scored via DB-hash match (which requires the agent to replay the gold exactly). Either add a verify() to tools.py or accept the limitation — they won't block --fail-only.
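To see why hash matching is so strict, consider one plausible way to hash a db: serialise it canonically and digest the result. This is an illustration only. `canonical_db_hash` is a made-up name and is not claimed to match what `DBAssertRubric.db_hash` does.

```python
import hashlib
import json


def canonical_db_hash(db):
    """SHA-256 over a key-sorted, whitespace-free JSON dump of db."""
    canonical = json.dumps(db, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme key order does not matter, but any difference in values or in list order produces a different hash. An agent that reaches an equally valid end state by another route would therefore score zero, which is the limitation noted above.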

Rule of thumb

≤5 minutes to a clean validate → fix.
>5 minutes or unclear root cause → delete, commit, move on.

Task definitions are recoverable via git log -- environments/general_agent/tasks/<name> — deletion is not permanent if someone later wants to repair one.
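Recovery can be scripted along the following lines. The git options used (`--diff-filter=D`, `--format=%H`, `git checkout <commit>^ -- <path>`) are standard git. The helper names and the assumption that the task directory was removed in a single commit are illustrative.

```python
import subprocess

TASKS_DIR = "environments/general_agent/tasks"


def find_deletion_cmd(task_name):
    """git command that prints the most recent commit deleting files under the task."""
    return ["git", "log", "--diff-filter=D", "--format=%H", "-n", "1",
            "--", f"{TASKS_DIR}/{task_name}"]


def restore_cmd(task_name, deleting_commit):
    """git command that checks the task out from the parent of the deleting commit."""
    return ["git", "checkout", f"{deleting_commit}^", "--", f"{TASKS_DIR}/{task_name}"]


def restore_deleted_task(task_name):
    """Run both commands; raises if no deleting commit is found."""
    result = subprocess.run(find_deletion_cmd(task_name), check=True,
                            capture_output=True, text=True)
    sha = result.stdout.strip()
    if not sha:
        raise LookupError(f"no commit deletes {task_name}")
    subprocess.run(restore_cmd(task_name, sha), check=True)
```

Run `restore_deleted_task("<name>")` from the repository root, then re-run validation before committing the restored task.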

Quick-win fixes I've actually used

| Symptom | Fix |
| --- | --- |
| Bad quote in db.json | Edit the one line |
| ≤2 entity ids missing | Append realistic records to db.json using an existing one as template |
| Stale target id list | Trim to match what gold covers (usually matches the instruction) |
| Budget < gold trace total | Bump budget in db.json |

Bulk deletion

After triage, if you want to drop whole families when any tier fails:

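```
uv run general-agent validate --fail-only \
  | awk -F'_t[0-9]+' '{print $1}' | sort -u \
  | while IFS= read -r fam; do
      # Skip blank lines so the glob never expands to tasks/_t*
      [ -n "$fam" ] || continue
      rm -rf -- environments/general_agent/tasks/"$fam"_t*
    done
```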

But this is aggressive: it typically removes about 3.5× as many tasks as deleting only the failing tiers.
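Before deleting whole families, the cost can be estimated with a dry run. The sketch below assumes task directories are named `<family>_t<N>`, as the `awk` separator suggests. `family_of` and `deletion_impact` are hypothetical helpers. Unlike the `awk` command, the regex only strips a tier suffix at the end of the name.

```python
import re

# Tier suffix such as "_t4" at the end of a task name (assumed naming scheme).
TIER_SUFFIX = re.compile(r"_t[0-9]+$")


def family_of(task_name):
    """Strip the trailing tier suffix: "embassy_t4" -> "embassy"."""
    return TIER_SUFFIX.sub("", task_name)


def deletion_impact(all_tasks, failing):
    """Return (tasks lost if only failing tiers go, tasks lost if whole families go)."""
    families = {family_of(task) for task in failing}
    lost = [task for task in all_tasks if family_of(task) in families]
    return len(failing), len(lost)
```

For example, with two failing tiers spread over a four-tier family and a two-tier family, the result is `(2, 6)`: family deletion costs three times as many tasks in that case.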

What not to do

Don't "fix" by silencing an error in the rubric — real failures should keep surfacing. The rubric already swallows them cleanly (see DBAssertRubric.db_hash / verify) and logs a warning per bad task.
Don't delete from ~/.cache/general-agent/rlm-skills/<task_name>/ to force regeneration — that cache is task-tool-agnostic and doesn't need invalidation when you edit db.json.


package.json

"author": "PrimeIntellect-ai"

"repository": "PrimeIntellect-ai/research-environments"


