GitHub creator profile

hamelsmu

Repository-level view of 18 collected skills across 5 GitHub repositories, including approximate occupation coverage.

skills collected: 18
repositories: 5
occupation fields: 2
updated: 2026-04-17

Repositories and representative skills

#001
evals-skills
7 skills | 1.3k | 138 | updated 2026-03-03
39% of creator
build-review-interface
Web Developers

Build a custom browser-based annotation interface tailored to your data for reviewing LLM traces and collecting structured feedback. Use when you need to build an annotation tool, review traces, or collect human labels.

2026-03-03
error-analysis
Software Quality Assurance Analysts & Testers

Help the user systematically identify and categorize failure modes in an LLM pipeline by reading traces. Use when starting a new eval project, after significant pipeline changes (new features, model switches, prompt rewrites), when production metrics drop, or after incidents.

2026-03-03
eval-audit
Software Quality Assurance Analysts & Testers

Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead).

2026-03-03
evaluate-rag
Data Scientists

Guides evaluation of RAG pipeline retrieval and generation quality. Use when evaluating a retrieval-augmented generation system, measuring retrieval quality, assessing generation faithfulness or relevance, generating synthetic QA pairs for retrieval testing, or optimizing chunking strategies.

2026-03-03
generate-synthetic-data
Data Scientists

Create diverse synthetic test inputs for LLM pipeline evaluation using dimension-based tuple generation. Use when bootstrapping an eval dataset, when real user data is sparse, or when stress-testing specific failure hypotheses. Do NOT use when you already have 100+ representative real traces (use stratified sampling instead), or when the task is collecting production logs.

2026-03-03
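
As a rough illustration of what "dimension-based tuple generation" could look like, the sketch below enumerates every combination of a few dimension values. The dimension names and values (`persona`, `intent`, `tone`) and the helper `generate_tuples` are hypothetical and are not taken from the skill itself; each resulting tuple would then be turned into a natural-language test input in a later step.

```python
from itertools import product

# Hypothetical dimensions for illustration only; not from the skill.
DIMENSIONS = {
    "persona": ["new user", "power user"],
    "intent": ["refund", "billing question", "cancel account"],
    "tone": ["polite", "frustrated"],
}


def generate_tuples(dimensions):
    """Return every combination of dimension values as a list of dicts."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


tuples = generate_tuples(DIMENSIONS)
print(len(tuples))   # 2 * 3 * 2 combinations
print(tuples[0])
```

A full cross product grows quickly, so in practice one might sample from it or drop implausible combinations before writing inputs.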
validate-evaluator
Software Quality Assurance Analysts & Testers

Calibrate an LLM judge against human labels using data splits, TPR/TNR, and bias correction. Use after writing a judge prompt (write-judge-prompt) when you need to verify alignment before trusting its outputs. Do NOT use for code-based evaluators (those are deterministic; test with standard unit tests).

2026-03-03
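
The calibration idea in this description can be sketched as follows, assuming binary pass/fail labels. The function names are hypothetical, and the correction shown is the Rogan-Gladen estimator, which is one common way to adjust an observed rate for an imperfect classifier; the listing does not say which bias correction the skill actually uses.

```python
def judge_rates(human, judge):
    """Return (TPR, TNR) of judge labels against human labels (True = pass)."""
    tp = sum(h and j for h, j in zip(human, judge))
    tn = sum((not h) and (not j) for h, j in zip(human, judge))
    pos = sum(human)
    neg = len(human) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one passing and one failing human label")
    return tp / pos, tn / neg


def corrected_pass_rate(observed, tpr, tnr):
    """Rogan-Gladen estimate of the true pass rate from the judge's observed rate."""
    denom = tpr + tnr - 1
    if denom <= 0:
        raise ValueError("judge is no better than chance; correction is undefined")
    estimate = (observed + tnr - 1) / denom
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion


# Made-up labels for illustration only.
human = [True, True, True, True, False, False, False, False]
judge = [True, True, True, False, False, False, False, True]
tpr, tnr = judge_rates(human, judge)
print(tpr, tnr)
print(corrected_pass_rate(0.6, tpr, tnr))
```

The correction follows from writing the observed pass rate as TPR * p + (1 - TNR) * (1 - p) and solving for the true rate p. TPR and TNR should be measured on a held-out split that was not used to tune the judge prompt.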
write-judge-prompt
Software Developers

Design LLM-as-Judge evaluators for subjective criteria that code-based checks cannot handle. Use when a failure mode requires interpretation (tone, faithfulness, relevance, completeness). Do NOT use when the failure mode can be checked with code (regex, schema validation, execution tests). Do NOT use when you need to validate or calibrate the judge — use validate-evaluator instead.

2026-03-03
#002
hamel
5 skills | 575 | updated 2026-04-17
28% of creator
#003
hamelnb
3 skills | 441 | updated 2026-03-09
17% of creator
#004
website-to-api
2 skills | 80 | updated 2026-04-11
11% of creator