Execute qualquer Skill no Manus
com um clique

Execute qualquer Skill no Manus com um clique

$pwd:

llm-council

Name: Llm Council
Author: vishalsachdev

// Convene a 3-model council (Claude + GPT via codex CLI + Gemini CLI) on a high-stakes decision. Forces cross-critique between members and surfaces where they actually disagree, breaking Claude's default agreeableness. Use when the user asks to "convene a council", "get a second opinion", "ask GPT and Gemini", "what would other models say", or has an architecture / strategy / hiring / pricing decision where being wrong is expensive. Skip for factual questions, code with one right answer, or anything premortem-shaped (route to premortem skill instead).

Executar no Manus

$ git log --oneline --stat

stars:4

forks:1

updated:2 de maio de 2026 às 19:32

SKILL.md

readonly

related-skills.json

mesmo repositório

agentic-eval-first-development.md

from "vishalsachdev/claude-code-skills"

Architect, execute, and iterate on AI evaluations using the Data-Task-Score framework. Treats evals as the modern, quantifiable version of a PRD. Use when the user asks to "build an eval," "improve model quality," "test an agent workflow," "quantify product intuition," "move beyond vibe checks," "measure AI output," "score LLM responses," "benchmark a prompt," or "set up evaluation infrastructure." Also triggers on phrases like "how do I know if this is working," "is the model getting better," or "eval-driven development."

2026-05-024

codebase-singularity.md

from "vishalsachdev/claude-code-skills"

Apply the codebase singularity approach: reliable codebase understanding and change with a repeatable workflow, guardrails, and verification gates. Use for repo work (feature, bugfix, refactor, migration) when you need high trust, minimal diffs, and explicit validation and exit criteria.

2026-05-024

start-session.md

from "vishalsachdev/claude-code-skills"

Use when user says "let's get started", "where are we", or at beginning of a session. Reads project context from CLAUDE.md, checks git status and recent commits, and provides orientation for the session. Works across all repo types (code, research, mixed).

2026-05-024

tweet-series-extractor.md

from "vishalsachdev/claude-code-skills"

Extract tweet series from X/Twitter profiles with full content, links, and engagement metrics. Use when: (1) User provides a tweet URL and wants similar tweets by the same author, (2) User asks to "extract tweets", "collect tweet series", or "scrape tweets", (3) User wants to analyze a user's recurring tweet format (e.g., daily updates, weekly roundups), (4) User needs tweet content with embedded links preserved for analysis. Requires Chrome browser automation tools (mcp__claude-in-chrome__*).

2026-05-024

wrap-up-session.md

from "vishalsachdev/claude-code-skills"

Use when user says "let's wrap up", "close shop", "done for today", or wants to end a session. Handles session wrap-up including git operations, documentation updates, roadmap updates, and preparing for next session. Works across all repo types.

2026-05-024

premortem.md

from "vishalsachdev/claude-code-skills"

Run a premortem on a plan, launch, product, hire, strategy, or decision — assumes it failed 6 months from now and works backward to find every reason why, then produces a revised plan. Use when the user has a concrete plan or commitment with high cost-of-being-wrong and asks to "premortem", "stress test", "find blind spots", "poke holes", "what could kill this", or "what am I missing". Skip for vague ideas without a plan, simple feedback requests, factual questions, or already-irreversible decisions.

2026-05-024

package.json

"author": "vishalsachdev"

"repository": "vishalsachdev/claude-code-skills"

Abrir repositório GitHub Ver repositórios do creator

$ install --global

$ download --local

Executar no Manus

$ useful --forSOC

Desenvolvedores de softwareInformática e Matemática15-1252L4

name	llm-council
description	Convene a 3-model council (Claude + GPT via codex CLI + Gemini CLI) on a high-stakes decision. Forces cross-critique between members and surfaces where they actually disagree, breaking Claude's default agreeableness. Use when the user asks to "convene a council", "get a second opinion", "ask GPT and Gemini", "what would other models say", or has an architecture / strategy / hiring / pricing decision where being wrong is expensive. Skip for factual questions, code with one right answer, or anything premortem-shaped (route to premortem skill instead).
allowed-tools	Read, Glob, Grep, Write, Bash

LLM Council

Three frontier models, one decision. Output is not three answers side-by-side — it's a structured map of agreement, disagreement, and which disagreements matter for this user's call.

Members

Member	Invocation	Bias to expect
Claude (this session)	direct reasoning	Pragmatic, agreeable, code-rooted
GPT-5.5	`codex exec`	Skeptical, hedge-prone, edge-case sensitive
Gemini	`gemini -p`	Structural, taxonomical, categorization-heavy

When to run

Good fit:

Architecture/design with long-lived consequences
Strategy calls (pricing, positioning, hiring) where Claude's agreeableness is a real risk
"Am I thinking about this right?" — challenge the framing, not the plan
Any decision the user explicitly flags as high-stakes

Bad fit:

Factual lookups
Code with one right answer
Premortem-shaped questions ("what could go wrong") — route to premortem
Speed > depth

CLI invocations (verified)

Codex (GPT-5.5):

TMPF=$(mktemp)
codex exec --skip-git-repo-check -o "$TMPF" "<prompt>"
ANSWER=$(cat "$TMPF") && rm "$TMPF"

--skip-git-repo-check is required outside trusted dirs
-o <file> writes just the final message (avoids verbose preamble in stdout)

Gemini:

gemini -p "<prompt>"

Outputs the answer directly to stdout
Add -m <model> to pin a model

Run in parallel via two simultaneous Bash tool calls in a single message. Don't sequence them.

Process (3 rounds)

Round 1 — Independent answers

Send the question to all three members in parallel. None see the others' answers.

Each member gets the same prompt: the user's question + relevant context + this directive:

"Answer in under 250 words. Be direct. State your strongest position, not a hedged one. End with the single biggest risk you see in your own answer."

The "biggest risk in your own answer" line is load-bearing — it primes the model to flag its own weakness, which makes round 2 more productive.

Round 2 — Cross-critique (anonymized)

Send each member the other two answers, with peer identities anonymized as "Model A" and "Model B". Each model gets a different random A/B mapping so it can't infer who's who across runs. The model never learns which family produced which answer.

Why anonymize: prevents brand bias — deferring to known-strong models, attacking known-weak ones, or refusing to disagree with one's own family. Without this, R2 critiques drift toward politics instead of substance. Credit: Karpathy's llm-council.

Prompt:

"Two other models answered the same question. Their identities are anonymized.

Model A: ... Model B: ...

Where are they wrong? Where are they right and your original answer was wrong? Be specific. Don't be polite. Under 200 words."

In the transcript, record the anonymization mapping so you can de-anonymize for the synthesis step. The user-facing report uses real names; only the cross-critique itself is blind.

This is the round that produces the value. Without it the skill is theatre.

Round 3 — Synthesis (Claude, this session)

Read all 6 outputs (3 answers + 3 critiques) and produce:

Where the council agrees — 1-3 points. If they all converge, say so plainly. Do not manufacture disagreement.
Where the council disagrees — the actual deltas. Each side's strongest argument in 1-2 sentences.
Which disagreement matters most — for this user's specific decision, which delta should drive the call?
Recommendation — Claude's call. Explicitly name which member you're siding with on the load-bearing disagreement and why.

Self-bias check (mandatory before finalizing): Claude is both a member (R1 answer) and the chairman (this synthesis). That's a structural conflict — Claude will systematically over-weight its own R1 because (a) it has session context the others lack, and (b) it "feels right" to itself. Before finalizing the recommendation, ask: am I siding with my own R1 answer because it's actually better, or because I wrote it? If the only reason it's winning is "I have more context," that's not a real reason — GPT and Gemini may have caught a blind spot Claude doesn't see. State the bias check explicitly in the synthesis output (one sentence) so the user can audit it.

Output

Chat: the 4-section synthesis. Concise. End with a clear recommendation.

File (only if the question is consequential or the user asks): council-YYYYMMDD-HHMMSS.md in CWD with full transcript — the question, R1 answers, R2 critiques, R3 synthesis. Use date +%Y%m%d-%H%M%S for the timestamp.

Context passing

By default the CLIs don't see this conversation, but you control what goes in the prompt. Context is transferable — it just costs tokens. Don't treat it as a hard constraint.

Three mechanisms, in order of cost:

Point the CLI at files on disk. Both codex exec and gemini -p run in a workdir and can read files themselves. Cheapest. Use this when the relevant context is already in files (a CLAUDE.md, a brief, a code file).
Paste excerpts inline. When context lives in the conversation (decisions made, things ruled out, user's revealed preferences), paste the relevant chunks into the prompt. Medium cost.
Summarize the conversation. For long sessions where pasting verbatim is too expensive, write a 1-paragraph brief: what's been decided, what's been rejected, what the user actually cares about. Lossy but tractable.

Build a self-contained prompt for each member that includes:

The question itself
Relevant context (pasted, summarized, or via file pointer)
Specific constraints (budget, timeline, audience, existing tech) — explicit, not assumed
Response length cap (250 words R1, 200 R2)

Rule of thumb: the question's cost of being wrong sets the context budget. A $50 decision doesn't justify packaging 5KB of context for three CLI calls. A $50K decision does.

What's genuinely hard to transfer: implicit reasoning the conversation built up, dead ends already explored, the user's tone. Summarize these explicitly when they matter — don't leave them implicit.

Chairman selection (advanced)

By default Claude (this session) chairs because it has the richest session context with no transfer cost. But Claude is also a member, and that's a structural conflict — Claude will systematically over-weight its own R1 answer in synthesis.

If the self-bias check (Round 3) keeps surfacing real over-weighting, rotate the chairman:

Package the full context (R1 answers, R2 critiques, original question, relevant brief) into a single prompt
Send to Codex or Gemini with directive: "You are the chairman. Synthesize the council's findings into the 4-section output (agreement / disagreement / what matters most / recommendation). Be direct."
Use the rotated synthesis as the basis for the user-facing report; Claude still adds final commentary

Rotating chairman costs more tokens (full context goes to a cold CLI) but eliminates the member-as-chairman conflict for high-stakes decisions where you've caught Claude's bias before.

Failure handling

One CLI fails: proceed with two members. Note the missing voice in synthesis. Don't retry more than once.
All three converge: say so plainly. "The council agrees on X. Recommendation: do X." Manufactured drama is worse than agreement.
Council disagrees with the user's framing entirely: flag this explicitly. Sometimes the right output is "all three models think you're solving the wrong problem."
Timeout: wrap each call in a 90s timeout. Members are independent; one slow call shouldn't block synthesis.

Anti-patterns

Showing raw R1/R2 output without synthesis — the synthesis is the product
Weighting all three equally on every topic without packaging context for the others — if you didn't give them what Claude has, their answers will be cold and generic. Either package context, or weight accordingly
Burying the recommendation in hedged language — end with a clear call
Running the council on simple questions — hard-gate this skill
Skipping R2 to save tokens — R2 is the value; without it just ask Claude

Compose with premortem

If the council's recommendation is a concrete plan with steps (not just a yes/no or a framing), end with one line: "Want me to run a premortem on this plan to surface failure modes before you commit?" Skip if the recommendation is reversible/cheap or the user signaled closure.

Smoke test

Before using on a real decision, verify both CLIs work:

gemini -p "say hi in 5 words"
codex exec --skip-git-repo-check -o /tmp/codex-test "say hi in 5 words" && cat /tmp/codex-test

Both should return one short sentence.

llm-council

Mais deste repositório

Mais deste repositório

LLM Council

Members

When to run

CLI invocations (verified)

Process (3 rounds)

Round 1 — Independent answers

Round 2 — Cross-critique (anonymized)

Round 3 — Synthesis (Claude, this session)

Output

Context passing

Chairman selection (advanced)

Failure handling

Anti-patterns

Compose with premortem

Smoke test

LLM Council

Members

When to run

CLI invocations (verified)

Process (3 rounds)

Round 1 — Independent answers

Round 2 — Cross-critique (anonymized)

Round 3 — Synthesis (Claude, this session)

Output

Context passing

Chairman selection (advanced)

Failure handling

Anti-patterns

Compose with premortem

Smoke test