terminal-agent-improvement-loop
// Run a benchmark-driven improvement loop for Con's terminal agent. Use when iterating on pane awareness, SSH/tmux behavior, coding-cli flows, benchmark scoring, or progress tracking across many runs.
| name | description |
| --- | --- |
| terminal-agent-improvement-loop | Run a benchmark-driven improvement loop for Con's terminal agent. Use when iterating on pane awareness, SSH/tmux behavior, coding-cli flows, benchmark scoring, or progress tracking across many runs. |
Use this skill when improving Con as a terminal-native agent, not just fixing a one-off bug.
Primary references:
- `benchmarks/terminal-agent/README.md`
- `docs/impl/terminal-agent-benchmark.md`
- `docs/impl/terminal-agent-improvement-loop.md`

Commands:

- `python3 benchmarks/terminal-agent/run.py --profile operator-local-codex-devloop --suite operator`
- `python3 benchmarks/terminal-agent/iterate.py ...`
- `python3 benchmarks/terminal-agent/score.py --profile ... --record ... --score ...`
- `python3 benchmarks/terminal-agent/judge_llm.py --profile ... --record ... --socket /tmp/con.sock`
- `python3 benchmarks/terminal-agent/score.py --profile ... --record ... --judge-file ...`
- `python3 benchmarks/terminal-agent/log_iteration.py --scorecard ... --change "..."`
- `python3 benchmarks/terminal-agent/report.py`

Use strict suites to protect the floor and operator suites to judge real workflows.

Keep `docs/impl/terminal-agent-improvement-log.md` useful to a human reader; it should explain what changed, not just repeat the numeric score.
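The commands listed above can be strung together into a single pass. The sketch below only builds the argument lists; it does not run anything. The helper name `loop_commands`, the step ordering (run, judge, score, log, report), and the reading of each `...` as a single caller-supplied value are assumptions, not something the benchmark docs confirm. `iterate.py` is left out because its arguments are not shown.

```python
import shlex
from typing import List

# Directory taken from the command list above.
BENCH_DIR = "benchmarks/terminal-agent"


def loop_commands(profile: str, record: str, judge_file: str,
                  scorecard: str, change: str,
                  suite: str = "operator",
                  socket: str = "/tmp/con.sock") -> List[List[str]]:
    """Build argv lists for one pass of the loop (hypothetical helper).

    Script names and flags come from the command list; the ordering and
    every value passed in are assumptions supplied by the caller.
    """
    def py(script: str, *args: str) -> List[str]:
        return ["python3", f"{BENCH_DIR}/{script}", *args]

    return [
        py("run.py", "--profile", profile, "--suite", suite),
        py("judge_llm.py", "--profile", profile, "--record", record,
           "--socket", socket),
        py("score.py", "--profile", profile, "--record", record,
           "--judge-file", judge_file),
        py("log_iteration.py", "--scorecard", scorecard, "--change", change),
        py("report.py"),
    ]


if __name__ == "__main__":
    # The record, judge, and scorecard paths are placeholders, not real files.
    for argv in loop_commands(
        profile="operator-local-codex-devloop",
        record="path/to/record.json",
        judge_file="path/to/judge.json",
        scorecard="path/to/scorecard.json",
        change="describe what changed and why",
    ):
        print(shlex.join(argv))
```

Printing the commands before running them makes it easy to check each one against the benchmark README, which remains the authority on flags and ordering.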