---
name: codex-peer-review
description: "[CLAUDE CODE ONLY] Leverage Codex CLI for AI peer review, second opinions on architecture and design decisions, cross-validation of implementations, security analysis, and alternative approach generation. Requires terminal access to execute Codex CLI commands. Use when making high-stakes decisions, reviewing complex architecture, or when explicitly requested for a second AI perspective. Must be explicitly invoked using skill syntax."
---
# Codex Peer Review Skill

**🖥️ Claude Code Only.** Requires terminal access to execute Codex CLI commands.
Enable Claude Code to leverage OpenAI's Codex CLI for collaborative AI reasoning, peer review, and multi-perspective analysis of code architecture, design decisions, and implementations.
## Core Philosophy

**Two AI perspectives are better than one for high-stakes decisions.**

This skill enables strategic collaboration between Claude Code (Anthropic) and Codex CLI (OpenAI) for:
- Architecture validation and critique
- Design decision cross-validation
- Alternative approach generation
- Security, performance, and testing analysis
- Learning from different AI reasoning patterns
Not a replacement—a second opinion.
## When to Use Codex Peer Review

### High-Value Scenarios

**DO use when:**
- Making high-stakes architecture decisions
- Choosing between significant design alternatives
- Reviewing security-critical code
- Validating complex refactoring plans
- Exploring unfamiliar domains or patterns
- User explicitly requests second opinion
- Significant disagreement about approach
- Performance-critical optimization decisions
- Testing strategy validation
**DON'T use when:**
- Simple, straightforward implementations
- Already confident in singular approach
- Time-sensitive quick fixes
- No significant trade-offs exist
- Low-impact tactical changes
- Codex CLI is not available/installed
## How to Invoke This Skill

**Important:** This skill requires explicit invocation. It is not automatically triggered by natural language.

To use this skill, Claude must explicitly invoke it using:

```
skill: "codex-peer-review"
```

User phrases that indicate this skill would be valuable:
- "Get a second opinion on..."
- "What would Codex think about..."
- "Review this architecture with Codex"
- "Use Codex to validate this approach"
- "Are there better alternatives to..."
- "Get Codex peer review for this"
- "Security review with Codex needed"
- "Ask Codex about this design"
When these phrases appear, Claude should suggest using this skill and invoke it explicitly if appropriate.
## Codex vs Gemini: Which Peer Review Skill?

Both Codex and Gemini peer review skills provide valuable second opinions, but excel in different scenarios.

**Use Codex Peer Review when:**
- Code size < 500 LOC (focused reviews)
- Need precise, line-level bug detection
- Want fast analysis with concise output
- Reviewing single modules or functions
- Need tactical implementation feedback
- Performance bottleneck identification (specific issues)
- Quick validation of design decisions
**Use Gemini Peer Review when:**
- Code size > 5k LOC (large codebase analysis)
- Need full codebase context (up to 1M tokens)
- Reviewing architecture across multiple modules
- Analyzing diagrams + code together (multimodal)
- Want research-grounded recommendations (current best practices)
- Cross-module security analysis (attack surface mapping)
- Systemic performance patterns
- Design consistency checking
**For mid-range codebases (500-5k LOC):**
- Use Codex if: Focused review, single module, speed priority, specific bugs
- Use Gemini if: Cross-module patterns, holistic view, diagram analysis, research grounding
- Consider Both for: Critical decisions requiring maximum confidence
**For maximum value on high-stakes decisions:** use both skills sequentially and apply the synthesis framework (see `references/synthesis-framework.md`).
## Core Workflow

### 1. Recognize Need for Peer Review

Assess if peer review adds value.

Questions to consider:
- Is this a high-stakes decision with significant impact?
- Are there multiple valid approaches to consider?
- Is the architecture complex or unfamiliar?
- Does this involve security, performance, or scalability concerns?
- Has the user explicitly requested a second opinion?
- Would different AI reasoning perspectives help?
**If yes to 2+ questions:** proceed with the peer review workflow.
### 2. Prepare Context for Codex

Extract and structure relevant information.

Load `references/context-preparation.md` for detailed guidance on:
- What code/files to include
- How to frame questions effectively
- Context boundaries (what to include/exclude)
- Expectation setting for output format
Key preparation steps:

1. **Identify the core question:** What specifically do we want Codex to review?
2. **Extract relevant code:** Include only the necessary files, not the entire codebase.
3. **Provide context:** Project type, constraints, requirements, and concerns.
4. **Frame clearly:** Ask specific questions, not vague requests.
5. **Set expectations:** State what kind of response we need.
Context structure template:

```
[CONTEXT]
Project: [type, purpose]
Current situation: [what exists]
Constraints: [technical, business, time]

[CODE/ARCHITECTURE]
[relevant code or architecture description]

[QUESTION]
[specific question or review request]

[EXPECTED OUTPUT]
[format: analysis, alternatives, recommendations, etc.]
```
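A minimal shell sketch of passing a prepared context to Codex, assuming the context has already been written out with the template above; the file name `review-context.txt` is hypothetical:

```bash
# Read the prepared context (structured with the template above) and send it to Codex
# via the non-interactive exec pattern documented in this skill.
CONTEXT="$(cat review-context.txt)"
codex exec "$CONTEXT"
```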
### 3. Invoke Codex CLI

Execute the appropriate Codex command.

Load `references/codex-commands.md` for the complete command reference.

Common patterns:

Non-interactive review (recommended):

```
codex exec "[prepared context and question]"
```

Architecture review with diagram:

```
codex --image architecture-diagram.png "Analyze this architecture: [question]"
```

Security-focused review:

```
codex exec "Security review focus: [context and code]"
```

Full auto for implementation suggestions:

```
codex --full-auto "Suggest improvements to: [context]"
```

Key flags:

- `--full-auto`: Unattended mode with minimal prompts
- `--image` / `-i`: Attach architecture diagrams or screenshots
- `exec`: Non-interactive execution streaming to stdout

Error handling:

- If Codex CLI is not installed, inform the user and provide installation instructions
- If API limits are reached, note the limitation and proceed with Claude-only analysis
- If Codex returns an unclear response, reformulate the question and retry once (a shell sketch of this fallback follows below)
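A minimal shell sketch of the error-handling fallback described above, assuming only the documented `codex exec` invocation and standard shell tooling; the reformulated prompt wording is illustrative:

```bash
#!/usr/bin/env bash
# Run a Codex peer review with the fallbacks described above.
PROMPT="$1"

# 1. If Codex CLI is not installed, report it and fall back to Claude-only analysis.
if ! command -v codex >/dev/null 2>&1; then
  echo "Codex CLI not installed (npm i -g @openai/codex); continuing with Claude-only analysis." >&2
  exit 1
fi

# 2. Run the review; if the first attempt fails, retry once with a reformulated question.
if ! codex exec "$PROMPT"; then
  codex exec "Answer concretely and specifically: $PROMPT"
fi
```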
### 4. Synthesize Perspectives

Compare and integrate both AI perspectives.

Load `references/synthesis-framework.md` for detailed synthesis patterns.

Analysis framework:
1. **Agreement Analysis**
   - Where do both perspectives align?
   - What shared concerns exist?
   - What validates confidence in approach?

2. **Disagreement Analysis**
   - Where do perspectives diverge?
   - Why might approaches differ?
   - What assumptions differ?

3. **Complementary Insights**
   - What does Codex see that Claude missed?
   - What does Claude see that Codex missed?
   - How do perspectives complement each other?

4. **Trade-off Identification**
   - What trade-offs does each perspective reveal?
   - Which concerns are prioritized differently?
   - What constraints drive different conclusions?

5. **Insight Extraction**
   - What are the key actionable insights?
   - What alternatives emerge from both perspectives?
   - What risks are highlighted by either perspective?
Synthesis output structure:

```markdown
## Perspective Comparison

**Claude's Analysis:**
[key points from Claude's initial analysis]

**Codex's Analysis:**
[key points from Codex's review]

**Points of Agreement:**
- [shared insights]

**Points of Divergence:**
- [different perspectives and why]

**Complementary Insights:**
- [unique value from each perspective]

## Synthesis & Recommendations

[integrated analysis incorporating both perspectives]

**Recommended Approach:**
[action plan based on both perspectives]

**Rationale:**
[why this approach balances both perspectives]

**Remaining Considerations:**
[open questions or concerns to address]
```
### 5. Present Balanced Analysis

Deliver integrated insights to the user.

Presentation principles:
- Be transparent about which AI said what
- Acknowledge disagreements honestly
- Don't force false consensus
- Explain reasoning behind each perspective
- Give user enough context to make informed decision
- Present alternatives clearly
- Indicate confidence levels appropriately
**When perspectives align:**
"Both Claude and Codex agree that [approach] is preferable because [reasons]. This alignment increases confidence in the recommendation."

**When perspectives diverge:**
"Claude favors [approach A] prioritizing [factors], while Codex suggests [approach B] emphasizing [factors]. This divergence reveals an important trade-off: [explanation]. Consider [factors] to decide which approach better fits your context."

**When one finds issues the other missed:**
"Codex identified [concern] that wasn't initially apparent. This adds [insight] to our analysis..."
## Use Case Patterns

Load `references/use-case-patterns.md` for detailed examples of each scenario.

### 1. Architecture Review

**Scenario:** Reviewing system design before major implementation

Process:
- Document current architecture or proposed design
- Prepare context: system requirements, constraints, scale expectations
- Ask Codex: "Review this architecture for scalability, maintainability, and potential issues"
- Synthesize: Compare architectural concerns and recommendations
- Present: Integrated architecture assessment with both perspectives
Example question: "Review this microservices architecture. Are there concerns with service boundaries, data consistency, or deployment complexity?"
2. Design Decision Validation
Scenario: Choosing between multiple implementation approaches
Process:
- Document the decision point and alternatives
- Prepare context: requirements, constraints, trade-offs known
- Ask Codex: "Compare approaches A, B, and C for [criteria]"
- Synthesize: Create trade-off matrix from both perspectives
- Present: Clear comparison showing strengths/weaknesses
Example question: "Should we use event sourcing or traditional CRUD for this domain? Consider complexity, auditability, and team expertise."
3. Security Review
Scenario: Validating security-critical code before deployment
Process:
- Extract security-relevant code sections
- Prepare context: threat model, security requirements, compliance needs
- Ask Codex: "Security review: identify vulnerabilities, attack vectors, and hardening opportunities"
- Synthesize: Combine security concerns from both analyses
- Present: Comprehensive security assessment with prioritized issues
Example question: "Review this authentication implementation. Are there vulnerabilities in session management, token handling, or access control?"
### 4. Performance Analysis

**Scenario:** Optimizing performance-critical code
Process:
- Extract performance-critical sections
- Prepare context: performance requirements, current bottlenecks, constraints
- Ask Codex: "Analyze for performance bottlenecks and optimization opportunities"
- Synthesize: Combine optimization suggestions from both perspectives
- Present: Prioritized optimization recommendations with trade-offs
Example question: "This query endpoint is slow under load. Identify bottlenecks in the database access pattern, caching strategy, and N+1 issues."
5. Testing Strategy
Scenario: Improving test coverage and quality
Process:
- Document current testing approach and coverage
- Prepare context: critical paths, known gaps, testing constraints
- Ask Codex: "Review testing strategy and suggest improvements"
- Synthesize: Combine testing recommendations from both perspectives
- Present: Comprehensive testing improvement plan
Example question: "Review our testing approach. Are there coverage gaps, missing edge cases, or better testing strategies for this complex state machine?"
6. Code Review & Learning
Scenario: Understanding unfamiliar code or patterns
Process:
- Extract relevant code sections
- Prepare context: what's unclear, specific questions, learning goals
- Ask Codex: "Explain this code: patterns used, design decisions, potential concerns"
- Synthesize: Combine explanations and identify patterns both AIs recognize
- Present: Clear explanation with multiple perspectives on design
Example question: "Explain this recursive backtracking algorithm. What patterns are used, and are there clearer alternatives?"
7. Alternative Approach Generation
Scenario: Stuck on a problem or exploring better approaches
Process:
- Document current approach and why it's unsatisfactory
- Prepare context: problem constraints, what's been tried, goals
- Ask Codex: "Generate alternative approaches to [problem]"
- Synthesize: Combine creative alternatives from both perspectives
- Present: Multiple vetted alternatives with trade-off analysis
Example question: "We're stuck on real-time conflict resolution for collaborative editing. What alternative CRDT or operational transform approaches could work better?"
Command Reference
Load references/codex-commands.md for complete command documentation.
Quick reference:
| Use Case | Command Pattern | Flags |
|---|---|---|
| Architecture review | `codex --reasoning high exec "[context]"` | `--reasoning high` for complex analysis |
| Security analysis | `codex --reasoning xhigh exec "Security: [code]"` | `--reasoning xhigh` for critical security |
| Review with diagram | `codex --image diagram.png "[question]"` | `--image` for visual context |
| Implementation suggestions | `codex --full-auto "[context]"` | `--full-auto` for unattended runs |
| Quick validation | `codex "[question]"` | Interactive mode |
| Resume analysis | `codex /resume` | Continue previous session |
Reasoning effort flags for peer review:

```
codex --reasoning high exec "[context]"
codex --reasoning xhigh exec "[context]"
```
## Integration Points

### With Other Skills

**With concept-forge skill:**
- Forge architectural concepts → validate with Codex peer review
- Use the @builder and @strategist archetypes to prepare questions
**With prose-polish skill:**
- Ensure technical documentation is clear and professional
- Polish architecture decision records (ADRs)
**With claimify skill:**
- Map architectural arguments and assumptions
- Analyze decision rationale structure
### With Claude Code Workflows

**Pre-implementation:**
- Use peer review before starting major features
- Validate architecture before building
**Post-implementation:**
- Use peer review to validate completed work
- Cross-check refactoring results
**During implementation:**
- Use peer review when stuck or uncertain
- Validate critical decisions in real-time
## Quality Signals

### Peer Review is Valuable When:
- Both perspectives identify same concerns (high confidence)
- Perspectives reveal complementary insights
- Trade-offs become clearer through different lenses
- Alternative approaches emerge that weren't initially visible
- Security or performance concerns are validated independently
- User gains clarity on decision through multi-perspective analysis
### Peer Review Needs Refinement When:
- Responses are too vague or generic
- Question wasn't specific enough
- Context was insufficient
- Both perspectives say obvious things
- No new insights emerge
- Codex response misunderstands the question
**Action:** Reformulate the question with better context and specificity.
### Skip Peer Review When:
- Codex CLI is unavailable and waiting for it would block progress
- Decision is time-sensitive and low-risk
- Approach is straightforward with no trade-offs
- User doesn't value second opinion for this decision
- Context is too large to prepare efficiently
## Best Practices

### Effective Peer Review

**DO:**
- Frame specific, answerable questions
- Provide sufficient context for informed analysis
- Use for high-stakes decisions where second opinion adds value
- Be transparent about which AI provided which insight
- Acknowledge disagreements and explain them
- Synthesize perspectives rather than just concatenating them
- Give user enough context to make informed decision
**DON'T:**
- Use for every trivial decision
- Ask vague questions without context
- Force false consensus when perspectives diverge
- Hide which AI said what
- Ignore one perspective in favor of the other
- Present peer review as authoritative truth
- Over-rely on peer review for basic decisions
### Context Preparation

**Effective context:**
- Focused on specific decision or area of code
- Includes relevant constraints and requirements
- Provides enough background without overwhelming
- Frames clear questions
- Sets expectations for output
**Ineffective context:**
- Dumps entire codebase
- No clear question or focus
- Missing critical constraints
- Vague or overly broad
- No guidance on what kind of response is useful
### Question Framing

**Good questions:**
- "Review this microservices architecture. Are service boundaries well-defined? Any concerns with data consistency or deployment complexity?"
- "Compare these three caching strategies for our use case. Consider memory overhead, invalidation complexity, and cold-start performance."
- "Security review this authentication flow. Focus on session management, token expiration, and refresh token handling."
**Poor questions:**
- "Is this code good?" (too vague)
- "Review everything" (too broad)
- "What do you think?" (no specific focus)
## Installation Requirements

Codex CLI must be installed to use this skill.

### Installation

```
npm i -g @openai/codex
brew install openai/codex/codex
```

### Authentication

```
codex auth login
codex auth api-key [your-api-key]
```

### Verification

```
codex --version
codex /status
```
## CRITICAL: Always Use Latest Model

**Before invoking peer review, ALWAYS verify you're using the latest Codex model.**
### Version Check Protocol

**Step 1:** Check current Codex CLI and model version

```
codex --version
codex /status
```

**Step 2:** Web search for latest available models

```
WebSearch: "OpenAI Codex CLI latest model [current month year]"
```

**Step 3:** Update if outdated

```
npm update -g @openai/codex
brew upgrade codex
```
### Current Latest Models (as of December 2025)

| Model | Use Case | Model ID |
|---|---|---|
| GPT-5.1-Codex-Max | Complex reasoning, architecture, multi-window context (default) | `gpt-5.1-codex-max` |
| GPT-5.1-Codex | Standard analysis, faster response | `gpt-5.1-codex` |
### Reasoning Effort Levels

Codex supports different reasoning effort levels for different task complexity:

| Level | Use Case | Flag |
|---|---|---|
| low | Simple queries, quick checks | `--reasoning low` |
| medium | Daily driver, most tasks | `--reasoning medium` |
| high | Complex architecture, security review | `--reasoning high` |
| xhigh | Maximum reasoning, critical decisions | `--reasoning xhigh` |

For peer review, use `high` or `xhigh` for best results.
### New Features (CLI v0.65.0+)

- **`/resume` command:** Resume previous analysis sessions for continuity
- **Codex Max as default:** Best model is now the default for signed-in users
- **Extended context:** Process millions of tokens in a single task via compaction
- **Enhanced markdown tooltips:** Better formatted output
### Why This Matters
- Newer models have dramatically improved reasoning
- GPT-5.1-Codex-Max can work across multiple context windows
- Security analysis accuracy improves significantly
- Outdated models may miss critical issues
- This skill is only as good as the model powering it
### Automated Version Check (Recommended)

Add this check to the beginning of any peer review workflow:

```
codex --version && codex /status
```

If using an older model (e.g., `codex-1` or `gpt-4-*`), update before proceeding.
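A minimal shell sketch of automating this check, using only the `codex --version` and `codex /status` commands shown above; the outdated-model patterns it looks for are the examples named in this section, and the exact `/status` output format is an assumption:

```bash
#!/usr/bin/env bash
# Automated pre-review version check, as recommended above.
if ! command -v codex >/dev/null 2>&1; then
  echo "Codex CLI not found; install with: npm i -g @openai/codex" >&2
  exit 1
fi

codex --version
STATUS_OUTPUT="$(codex /status)"
echo "$STATUS_OUTPUT"

# Warn if the reported model matches the outdated examples (codex-1, gpt-4-*).
if echo "$STATUS_OUTPUT" | grep -Eq 'codex-1|gpt-4'; then
  echo "Outdated model detected; run 'npm update -g @openai/codex' before proceeding." >&2
fi
```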
**If Codex CLI is not available:**
- Inform the user that peer review requires Codex CLI
- Provide installation instructions
- Continue with Claude-only analysis if the user can't install it
- Note that a second opinion isn't available
## Configuration

Optional configuration in `~/.codex/config.toml`:

```toml
model = "gpt-5.1-codex-max"
ask_for_approval = "suggest"
sandbox = "workspace-read"
reasoning_effort = "high"
```
For peer review, recommended settings:

- `model = "gpt-5.1-codex-max"` for best reasoning
- `reasoning_effort = "high"` or `"xhigh"` for complex architecture
- `sandbox = "workspace-read"` for read-only safety
- `ask_for_approval = "suggest"` for transparency
## Limitations & Considerations

### Technical Limitations
- Requires Codex CLI installation and authentication
- Subject to OpenAI API rate limits
- May have different context windows than Claude
- Responses may vary in quality based on prompt
- No real-time communication between AIs (sequential only)
### Philosophical Considerations
- Different training data and approaches may lead to different perspectives
- Neither AI is objectively "correct"—both offer perspectives
- User judgment is ultimate arbiter
- Peer review adds time to workflow
- Over-reliance on peer review can slow decision-making
### When to Trust Which Perspective

**Trust convergence:**
- When both AIs agree, confidence increases
**Trust divergence:**
- Reveals important trade-offs and assumptions
- Neither is necessarily "right"—different priorities
**Trust specialized knowledge:**
- Codex may have different strengths in certain domains
- Claude may have different strengths in others
- Consider which AI's reasoning aligns better with your context
## Example Workflows

### Example: Architecture Decision

**User:** "I'm designing a multi-tenant SaaS architecture. Should I use separate databases per tenant or a shared database with row-level security?"

**Claude initial analysis:** [Provides analysis of trade-offs]

**Invoke peer review:**
```
codex --reasoning xhigh exec "Review multi-tenant SaaS architecture decision:

CONTEXT:
- B2B SaaS with 100-500 tenants expected
- Varying data volumes per tenant (small to large)
- Strong data isolation requirements
- Team familiar with PostgreSQL
- Cloud deployment (AWS)

OPTIONS:
A) Separate database per tenant
B) Shared database with row-level security (RLS)

QUESTION:
Analyze trade-offs for scalability, operational complexity, data isolation, and cost. Which approach is recommended for this context?"
```
Note: Using `--reasoning xhigh` for maximum analysis on this high-stakes architectural decision.

**Synthesis:**
Compare Claude's and Codex's trade-off analyses, extract the key insights, and present a balanced recommendation.
## Anti-Patterns

**Don't:**
- Use peer review for every trivial decision (wastes time)
- Blindly follow one AI's recommendation over the other
- Ask vague questions without context
- Expect perfect agreement between AIs
- Force implementation when both AIs raise concerns
- Use peer review as decision-avoidance mechanism
- Over-engineer simple problems by seeking too many opinions
**Do:**
- Use strategically for high-stakes decisions
- Synthesize both perspectives thoughtfully
- Frame clear, specific questions with context
- Embrace disagreement as revealing trade-offs
- Use peer review to inform, not replace, judgment
- Make timely decisions based on integrated analysis
- Balance peer review with velocity
## Success Metrics

**Peer review succeeds when:**
- User gains clarity on decision through multi-perspective analysis
- Important trade-offs are revealed that weren't initially apparent
- Alternative approaches emerge that are genuinely valuable
- Risks are identified by at least one AI perspective
- User makes more informed decision than without peer review
- Confidence increases (when perspectives align)
- Trade-offs become explicit (when perspectives diverge)
**Peer review fails when:**
- No new insights emerge (obvious analysis)
- Takes too long relative to decision impact
- Perspectives are confusing rather than clarifying
- User is more confused after peer review than before
- Blocks forward progress unnecessarily
- Becomes crutch for simple decisions
## Skill Improvement

This skill improves through:
- Better question framing patterns
- More effective context preparation
- Refined synthesis techniques
- Pattern recognition for when peer review adds value
- Learning which types of questions work best with Codex
- Understanding Codex's strengths and limitations
- Calibrating when peer review is worth the time investment
**Feedback loop:**
- Track which peer reviews provided valuable insights
- Note which question patterns work well
- Identify scenarios where peer review was or wasn't valuable
- Refine use case patterns based on experience
## Related Resources
- Codex CLI Documentation: https://developers.openai.com/codex/cli/
- Architecture Decision Records (ADR) patterns
- Design pattern catalogs
- Security review checklists
- Performance optimization frameworks
- Testing strategy guides