| Field | Value |
| --- | --- |
| name | dataclaw |
| description | Export Claude Code, Codex, and other coding-agent conversation history to Hugging Face. Use when the user asks about exporting conversations, uploading to Hugging Face, configuring DataClaw, reviewing PII/secrets in exports, or managing their dataset. |
| allowed-tools | Bash(dataclaw *), Bash(hf auth login *), Bash(pip install dataclaw*), Bash(grep *) |
DataClaw Skill
THE RULE
Every dataclaw command outputs next_steps. FOLLOW THEM.
Do not memorize the flow. Do not skip steps. Do not improvise.
Run the command -> read the output -> follow next_steps. That's it.
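The "read the output" step can be sketched as a small shell helper. This sketch makes three assumptions:
- python3 is on PATH (dataclaw is installed with pip).
- next_steps is a JSON array.
- The name dataclaw_next_steps is hypothetical and is not part of dataclaw.

The helper only prints the steps and does not run anything.

```shell
# Hypothetical helper (not part of dataclaw): print next_steps from the
# JSON a dataclaw command wrote to stdin, numbered in order.
# Assumes next_steps is a JSON array; non-string items are printed as JSON.
dataclaw_next_steps() {
  python3 -c '
import json, sys
payload = json.load(sys.stdin)
for i, step in enumerate(payload.get("next_steps") or [], start=1):
    text = step if isinstance(step, str) else json.dumps(step)
    print(f"{i}. {text}")
'
}

# Usage: dataclaw status | dataclaw_next_steps
```

For dataclaw export, feed the helper only the JSON after the ---DATACLAW_JSON--- line, because that command prints human-readable text first.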
Runtime guidance follows this checklist:
1. Install
2. Install skill
3. Prep
   - 3A. Choose source scope
   - 3B. Choose project scope
   - 3C. Set redacted strings
4. Export locally
5. Review and confirm
6. Publish
The CLI tracks your stage as 1-4: auth -> configure -> review -> done.
dataclaw export (push) is gated - you must run dataclaw confirm first or it will refuse.
Getting Started
Run dataclaw status (or dataclaw prep for full details) and follow the next_steps.
Output Format
- dataclaw prep, dataclaw config, dataclaw status, and dataclaw confirm output pure JSON.
- dataclaw export outputs human-readable text followed by ---DATACLAW_JSON--- and a JSON block.
- Always parse the JSON and act on next_steps.
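Because dataclaw export mixes text and JSON, the JSON has to be cut out before it can be parsed. This minimal sketch makes two assumptions:
- The ---DATACLAW_JSON--- marker sits on its own line.
- Everything after that line is the JSON block.

The name dataclaw_json_part is hypothetical. It prints nothing when the marker is absent, so use it only on export output.

```shell
# Hypothetical helper: keep only the lines that follow the
# ---DATACLAW_JSON--- marker line in dataclaw export output (stdin).
# Prints nothing when the marker line is missing.
dataclaw_json_part() {
  awk '
    found { print; next }                         # after the marker: pass through
    $0 == "---DATACLAW_JSON---" { found = 1 }     # start capturing on the next line
  '
}

# Usage: dataclaw export --no-push | dataclaw_json_part
```

The result is plain JSON and can be handed to any JSON parser.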
Key fields:
- stage / stage_number / total_stages - where you are
- next_steps - follow these in order
- next_command - the single most important command to run next (null if user input needed first)
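A script that acts on next_command has to handle the null case explicitly. This minimal sketch makes two assumptions:
- next_command is either a string or null.
- python3 is available.

The name dataclaw_next_command is hypothetical. The helper prints the command rather than executing it, so the agent still follows next_steps deliberately.

```shell
# Hypothetical helper: print next_command from dataclaw JSON on stdin.
# Exits with status 1 and prints nothing when next_command is null or
# absent, i.e. when user input is needed before the next command.
dataclaw_next_command() {
  python3 -c '
import json, sys
cmd = json.load(sys.stdin).get("next_command")
if cmd is None:
    sys.exit(1)
print(cmd)
'
}

# Usage:
#   if next=$(dataclaw status | dataclaw_next_command); then
#     echo "next: $next"
#   else
#     echo "user input needed first: read next_steps"
#   fi
```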
PII Audit (Step 5: Review and confirm)
After dataclaw export --no-push, follow the next_steps in the JSON output. The flow is:
- Ask the user for their full name, then grep the export for it.
- Run the pii_commands from the JSON output and review the results with the user.
- Ask the user what else to look for: company names, client names, private URLs, other people's names, custom domains.
- Deep manual scan - sample ~20 sessions (from the beginning, middle, and end) and look for anything sensitive the automated checks missed.
- If anything is found, fix it and re-export: dataclaw config --redact "string", then dataclaw export --no-push.
- Run dataclaw confirm with text attestations - pass --full-name, --attest-full-name, --attest-sensitive, and --attest-manual-scan. It runs a PII scan, verifies the attestations, shows a project breakdown, and unlocks pushing.
- Push only after explicit user confirmation: dataclaw export --publish-attestation "User explicitly approved publishing to Hugging Face."
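The first audit step (grep the export for the user's full name) can be sketched as follows. The sketch assumes you pass the path of the local export explicitly, for example the file written by dataclaw export -o /path/to/file.jsonl. It supplements the pii_commands from the JSON output rather than replacing them. The name pii_name_hits is hypothetical.

It counts case-insensitive literal matches for the full name and for each word of it, since a first or last name can appear on its own.

```shell
# Hypothetical helper: count case-insensitive literal matches of a name
# and of each of its words in an export file. Prints "term<TAB>count".
# The ( ... ) body runs in a subshell, so set -f and the variables stay local.
pii_name_hits() (
  name=$1
  file=$2
  set -f   # no glob expansion when $name is split into words below
  for term in "$name" $name; do
    # -F literal, -i ignore case, -o one output line per match
    count=$(grep -o -i -F -- "$term" "$file" | wc -l | tr -d ' ')
    printf '%s\t%s\n' "$term" "$count"
  done
)

# Usage: pii_name_hits "Jane Doe" /path/to/file.jsonl
```

A non-zero count is a prompt to open the matching sessions with the user; it does not decide what gets redacted.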
Commands Reference
dataclaw status
dataclaw prep
dataclaw prep --source all
dataclaw prep --source claude
dataclaw confirm --full-name "NAME" --attest-full-name "..." --attest-sensitive "..." --attest-manual-scan "..."
dataclaw confirm --file /path/to/file.jsonl --full-name "NAME" --attest-full-name "..." --attest-sensitive "..." --attest-manual-scan "..."
dataclaw list
dataclaw list --source all
dataclaw list --source claude
dataclaw config
dataclaw config --repo user/my-dataset
dataclaw config --source all
dataclaw config --exclude "a,b"
dataclaw config --redact "str1,str2"
dataclaw config --redact-usernames "u1,u2"
dataclaw config --confirm-projects
dataclaw export --publish-attestation "..."
dataclaw export --no-push
dataclaw export --source all --no-push
dataclaw export --source claude --no-push
dataclaw export --all-projects
dataclaw export --no-thinking
dataclaw export -o /path/to/file.jsonl
dataclaw update-skill claude
Gotchas
- Never run bare hf auth login - it's interactive and will hang. Always use --token.
- --exclude, --redact, and --redact-usernames APPEND - they never overwrite. Safe to call repeatedly.
- Source selection is REQUIRED before export - explicitly set dataclaw config --source <source|all> (for example claude or codex), or pass --source ... on export.
- dataclaw prep outputs pure JSON - parse it directly.
- Always export with --no-push first - review before publishing.
- dataclaw export (push) reuses the exact file reviewed by dataclaw confirm - if that file is missing, re-export locally and re-confirm before pushing.
- dataclaw export (push) requires dataclaw confirm first - it will refuse otherwise. Re-exporting with --no-push resets this.
- PII audit is critical - automated redaction is not foolproof.
- Large exports take time - 500+ sessions may take 1-3 minutes. Use a generous timeout.
Prerequisite
command -v dataclaw >/dev/null 2>&1 && echo "dataclaw: installed" || echo "NOT INSTALLED - run: pip install dataclaw"