---
name: codex-peer-review
description: "[CLAUDE CODE ONLY] Leverage Codex CLI for AI peer review, second opinions on architecture and design decisions, cross-validation of implementations, security analysis, and alternative approach generation. Requires terminal access to execute Codex CLI commands. Use when making high-stakes decisions, reviewing complex architecture, or when explicitly requested for a second AI perspective. Must be explicitly invoked using skill syntax."
---
# Codex Peer Review Skill

**🖥️ Claude Code Only.** Requires terminal access to execute Codex CLI commands.
Enable Claude Code to leverage OpenAI's Codex CLI for collaborative AI reasoning, peer review, and multi-perspective analysis of code architecture, design decisions, and implementations.
## Core Philosophy

**Two AI perspectives are better than one for high-stakes decisions.**

This skill enables strategic collaboration between Claude Code (Anthropic) and Codex CLI (OpenAI) for:
- Architecture validation and critique
- Design decision cross-validation
- Alternative approach generation
- Security, performance, and testing analysis
- Learning from different AI reasoning patterns
Not a replacement—a second opinion.
## When to Use Codex Peer Review

### High-Value Scenarios

**DO use when:**
- Making high-stakes architecture decisions
- Choosing between significant design alternatives
- Reviewing security-critical code
- Validating complex refactoring plans
- Exploring unfamiliar domains or patterns
- User explicitly requests second opinion
- Significant disagreement about approach
- Performance-critical optimization decisions
- Testing strategy validation
**DON'T use when:**
- Simple, straightforward implementations
- Already confident in singular approach
- Time-sensitive quick fixes
- No significant trade-offs exist
- Low-impact tactical changes
- Codex CLI is not available/installed
## How to Invoke This Skill

**Important:** This skill requires explicit invocation. It is not automatically triggered by natural language.

To use this skill, Claude must explicitly invoke it using:

```
skill: "codex-peer-review"
```

User phrases that indicate this skill would be valuable:
- "Get a second opinion on..."
- "What would Codex think about..."
- "Review this architecture with Codex"
- "Use Codex to validate this approach"
- "Are there better alternatives to..."
- "Get Codex peer review for this"
- "Security review with Codex needed"
- "Ask Codex about this design"
When these phrases appear, Claude should suggest using this skill and invoke it explicitly if appropriate.
## Codex vs Gemini: Which Peer Review Skill?

Both Codex and Gemini peer review skills provide valuable second opinions, but excel in different scenarios.

**Use Codex Peer Review when:**
- Code size < 500 LOC (focused reviews)
- Need precise, line-level bug detection
- Want fast analysis with concise output
- Reviewing single modules or functions
- Need tactical implementation feedback
- Performance bottleneck identification (specific issues)
- Quick validation of design decisions
**Use Gemini Peer Review when:**
- Code size > 5k LOC (large codebase analysis)
- Need full codebase context (up to 1M tokens)
- Reviewing architecture across multiple modules
- Analyzing diagrams + code together (multimodal)
- Want research-grounded recommendations (current best practices)
- Cross-module security analysis (attack surface mapping)
- Systemic performance patterns
- Design consistency checking
**For mid-range codebases (500-5k LOC):**
- Use Codex if: Focused review, single module, speed priority, specific bugs
- Use Gemini if: Cross-module patterns, holistic view, diagram analysis, research grounding
- Consider Both for: Critical decisions requiring maximum confidence
**For maximum value on high-stakes decisions:** use both skills sequentially and apply the synthesis framework (see `references/synthesis-framework.md`).
## Core Workflow

### 1. Recognize Need for Peer Review

Assess if peer review adds value.

Questions to consider:
- Is this a high-stakes decision with significant impact?
- Are there multiple valid approaches to consider?
- Is the architecture complex or unfamiliar?
- Does this involve security, performance, or scalability concerns?
- Has the user explicitly requested a second opinion?
- Would different AI reasoning perspectives help?
**If yes to 2+ questions:** proceed with the peer review workflow.
### 2. Prepare Context for Codex

Extract and structure relevant information.

Load `references/context-preparation.md` for detailed guidance on:
- What code/files to include
- How to frame questions effectively
- Context boundaries (what to include/exclude)
- Expectation setting for output format
Key preparation steps:

1. **Identify the core question:** What specifically do we want Codex to review?
2. **Extract relevant code:** Include only the necessary files, not the entire codebase.
3. **Provide context:** Project type, constraints, requirements, and concerns.
4. **Frame clearly:** Ask specific questions, not vague requests.
5. **Set expectations:** State what kind of response we need.
Context structure template:

```
[CONTEXT]
Project: [type, purpose]
Current situation: [what exists]
Constraints: [technical, business, time]

[CODE/ARCHITECTURE]
[relevant code or architecture description]

[QUESTION]
[specific question or review request]

[EXPECTED OUTPUT]
[format: analysis, alternatives, recommendations, etc.]
```
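A minimal shell sketch of passing a prepared context to Codex, assuming the context has already been written out with the template above; the file name `review-context.txt` is hypothetical:

```bash
# Read the prepared context (structured with the template above) and send it to Codex
# via the non-interactive exec pattern documented in this skill.
CONTEXT="$(cat review-context.txt)"
codex exec "$CONTEXT"
```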
### 3. Invoke Codex CLI

Execute the appropriate Codex command.

Load `references/codex-commands.md` for the complete command reference.

Common patterns:

Non-interactive review (recommended):

```
codex exec "[prepared context and question]"
```

Architecture review with diagram:

```
codex --image architecture-diagram.png "Analyze this architecture: [question]"
```

Security-focused review:

```
codex exec "Security review focus: [context and code]"
```

Full auto for implementation suggestions:

```
codex --full-auto "Suggest improvements to: [context]"
```

Key flags:

- `--full-auto`: Unattended mode with minimal prompts
- `--image` / `-i`: Attach architecture diagrams or screenshots
- `exec`: Non-interactive execution streaming to stdout

Error handling:

- If Codex CLI is not installed, inform the user and provide installation instructions
- If API limits are reached, note the limitation and proceed with Claude-only analysis
- If Codex returns an unclear response, reformulate the question and retry once (a shell sketch of this fallback follows below)
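A minimal shell sketch of the error-handling fallback described above, assuming only the documented `codex exec` invocation and standard shell tooling; the reformulated prompt wording is illustrative:

```bash
#!/usr/bin/env bash
# Run a Codex peer review with the fallbacks described above.
PROMPT="$1"

# 1. If Codex CLI is not installed, report it and fall back to Claude-only analysis.
if ! command -v codex >/dev/null 2>&1; then
  echo "Codex CLI not installed (npm i -g @openai/codex); continuing with Claude-only analysis." >&2
  exit 1
fi

# 2. Run the review; if the first attempt fails, retry once with a reformulated question.
if ! codex exec "$PROMPT"; then
  codex exec "Answer concretely and specifically: $PROMPT"
fi
```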
### 4. Synthesize Perspectives

Compare and integrate both AI perspectives.

Load `references/synthesis-framework.md` for detailed synthesis patterns.

Analysis framework:
1. **Agreement Analysis**
   - Where do both perspectives align?
   - What shared concerns exist?
   - What validates confidence in approach?

2. **Disagreement Analysis**
   - Where do perspectives diverge?
   - Why might approaches differ?
   - What assumptions differ?

3. **Complementary Insights**
   - What does Codex see that Claude missed?
   - What does Claude see that Codex missed?
   - How do perspectives complement each other?

4. **Trade-off Identification**
   - What trade-offs does each perspective reveal?
   - Which concerns are prioritized differently?
   - What constraints drive different conclusions?

5. **Insight Extraction**
   - What are the key actionable insights?
   - What alternatives emerge from both perspectives?
   - What risks are highlighted by either perspective?
Synthesis output structure:

```markdown
## Perspective Comparison

**Claude's Analysis:**
[key points from Claude's initial analysis]

**Codex's Analysis:**
[key points from Codex's review]

**Points of Agreement:**
- [shared insights]

**Points of Divergence:**
- [different perspectives and why]

**Complementary Insights:**
- [unique value from each perspective]

## Synthesis & Recommendations

[integrated analysis incorporating both perspectives]

**Recommended Approach:**
[action plan based on both perspectives]

**Rationale:**
[why this approach balances both perspectives]

**Remaining Considerations:**
[open questions or concerns to address]
```
### 5. Present Balanced Analysis

Deliver integrated insights to the user.

Presentation principles:
- Be transparent about which AI said what
- Acknowledge disagreements honestly
- Don't force false consensus
- Explain reasoning behind each perspective
- Give user enough context to make informed decision
- Present alternatives clearly
- Indicate confidence levels appropriately
**When perspectives align:**
"Both Claude and Codex agree that [approach] is preferable because [reasons]. This alignment increases confidence in the recommendation."

**When perspectives diverge:**
"Claude favors [approach A] prioritizing [factors], while Codex suggests [approach B] emphasizing [factors]. This divergence reveals an important trade-off: [explanation]. Consider [factors] to decide which approach better fits your context."

**When one finds issues the other missed:**
"Codex identified [concern] that wasn't initially apparent. This adds [insight] to our analysis..."
## Use Case Patterns

Load `references/use-case-patterns.md` for detailed examples of each scenario.

### 1. Architecture Review

**Scenario:** Reviewing system design before major implementation

Process:
- Document current architecture or proposed design
- Prepare context: system requirements, constraints, scale expectations
- Ask Codex: "Review this architecture for scalability, maintainability, and potential issues"
- Synthesize: Compare architectural concerns and recommendations
- Present: Integrated architecture assessment with both perspectives
Example question: "Review this microservices architecture. Are there concerns with service boundaries, data consistency, or deployment complexity?"
2. Design Decision Validation
Scenario: Choosing between multiple implementation approaches
Process:
- Document the decision point and alternatives
- Prepare context: requirements, constraints, trade-offs known
- Ask Codex: "Compare approaches A, B, and C for [criteria]"
- Synthesize: Create trade-off matrix from both perspectives
- Present: Clear comparison showing strengths/weaknesses
Example question: "Should we use event sourcing or traditional CRUD for this domain? Consider complexity, auditability, and team expertise."
3. Security Review
Scenario: Validating security-critical code before deployment
Process:
- Extract security-relevant code sections
- Prepare context: threat model, security requirements, compliance needs
- Ask Codex: "Security review: identify vulnerabilities, attack vectors, and hardening opportunities"
- Synthesize: Combine security concerns from both analyses
- Present: Comprehensive security assessment with prioritized issues
Example question: "Review this authentication implementation. Are there vulnerabilities in session management, token handling, or access control?"
### 4. Performance Analysis

**Scenario:** Optimizing performance-critical code
Process:
- Extract performance-critical sections
- Prepare context: performance requirements, current bottlenecks, constraints
- Ask Codex: "Analyze for performance bottlenecks and optimization opportunities"
- Synthesize: Combine optimization suggestions from both perspectives
- Present: Prioritized optimization recommendations with trade-offs
Example question: "This query endpoint is slow under load. Identify bottlenecks in the database access pattern, caching strategy, and N+1 issues."
5. Testing Strategy
Scenario: Improving test coverage and quality
Process:
- Document current testing approach and coverage
- Prepare context: critical paths, known gaps, testing constraints
- Ask Codex: "Review testing strategy and suggest improvements"
- Synthesize: Combine testing recommendations from both perspectives
- Present: Comprehensive testing improvement plan
Example question: "Review our testing approach. Are there coverage gaps, missing edge cases, or better testing strategies for this complex state machine?"
6. Code Review & Learning
Scenario: Understanding unfamiliar code or patterns
Process:
- Extract relevant code sections
- Prepare context: what's unclear, specific questions, learning goals
- Ask Codex: "Explain this code: patterns used, design decisions, potential concerns"
- Synthesize: Combine explanations and identify patterns both AIs recognize
- Present: Clear explanation with multiple perspectives on design
Example question: "Explain this recursive backtracking algorithm. What patterns are used, and are there clearer alternatives?"
7. Alternative Approach Generation
Scenario: Stuck on a problem or exploring better approaches
Process:
- Document current approach and why it's unsatisfactory
- Prepare context: problem constraints, what's been tried, goals
- Ask Codex: "Generate alternative approaches to [problem]"
- Synthesize: Combine creative alternatives from both perspectives
- Present: Multiple vetted alternatives with trade-off analysis
Example question: "We're stuck on real-time conflict resolution for collaborative editing. What alternative CRDT or operational transform approaches could work better?"
Command Reference
Load references/codex-commands.md for complete command documentation.
Quick reference:
| Use Case | Command Pattern | Flags |
|---|---|---|
| Architecture review | `codex --reasoning high exec "[context]"` | `--reasoning high` for complex analysis |
| Security analysis | `codex --reasoning xhigh exec "Security: [code]"` | `--reasoning xhigh` for critical security |
| Review with diagram | `codex --image diagram.png "[question]"` | `--image` for visual context |
| Implementation suggestions | `codex --full-auto "[context]"` | `--full-auto` for unattended runs |
| Quick validation | `codex "[question]"` | Interactive mode |
| Resume analysis | `codex /resume` | Continue previous session |
Reasoning effort flags for peer review:

```
codex --reasoning high exec "[context]"
codex --reasoning xhigh exec "[context]"
```
## Integration Points

### With Other Skills

**With concept-forge skill:**
- Forge architectural concepts → validate with Codex peer review
- Use the @builder and @strategist archetypes to prepare questions
**With prose-polish skill:**
- Ensure technical documentation is clear and professional
- Polish architecture decision records (ADRs)
**With claimify skill:**
- Map architectural arguments and assumptions
- Analyze decision rationale structure
### With Claude Code Workflows

**Pre-implementation:**
- Use peer review before starting major features
- Validate architecture before building
**Post-implementation:**
- Use peer review to validate completed work
- Cross-check refactoring results
**During implementation:**
- Use peer review when stuck or uncertain
- Validate critical decisions in real-time
## Quality Signals

### Peer Review is Valuable When:
- Both perspectives identify same concerns (high confidence)
- Perspectives reveal complementary insights
- Trade-offs become clearer through different lenses
- Alternative approaches emerge that weren't initially visible
- Security or performance concerns are validated independently
- User gains clarity on decision through multi-perspective analysis
### Peer Review Needs Refinement When:
- Responses are too vague or generic
- Question wasn't specific enough
- Context was insufficient
- Both perspectives say obvious things
- No new insights emerge
- Codex response misunderstands the question
**Action:** Reformulate the question with better context and specificity.
### Skip Peer Review When:
- Codex CLI is unavailable and waiting for it would block progress
- Decision is time-sensitive and low-risk
- Approach is straightforward with no trade-offs
- User doesn't value second opinion for this decision
- Context is too large to prepare efficiently
## Best Practices

### Effective Peer Review

**DO:**
- Frame specific, answerable questions
- Provide sufficient context for informed analysis
- Use for high-stakes decisions where second opinion adds value
- Be transparent about which AI provided which insight
- Acknowledge disagreements and explain them
- Synthesize perspectives rather than just concatenating them
- Give user enough context to make informed decision
**DON'T:**
- Use for every trivial decision
- Ask vague questions without context
- Force false consensus when perspectives diverge
- Hide which AI said what
- Ignore one perspective in favor of the other
- Present peer review as authoritative truth
- Over-rely on peer review for basic decisions
### Context Preparation

**Effective context:**
- Focused on specific decision or area of code
- Includes relevant constraints and requirements
- Provides enough background without overwhelming
- Frames clear questions
- Sets expectations for output
**Ineffective context:**
- Dumps entire codebase
- No clear question or focus
- Missing critical constraints
- Vague or overly broad
- No guidance on what kind of response is useful
### Question Framing

**Good questions:**
- "Review this microservices architecture. Are service boundaries well-defined? Any concerns with data consistency or deployment complexity?"
- "Compare these three caching strategies for our use case. Consider memory overhead, invalidation complexity, and cold-start performance."
- "Security review this authentication flow. Focus on session management, token expiration, and refresh token handling."
**Poor questions:**
- "Is this code good?" (too vague)
- "Review everything" (too broad)
- "What do you think?" (no specific focus)
## Installation Requirements

Codex CLI must be installed to use this skill.

### Installation

```
npm i -g @openai/codex
brew install openai/codex/codex
```

### Authentication

```
codex auth login
codex auth api-key [your-api-key]
```

### Verification

```
codex --version
codex /status
```
## CRITICAL: Always Use Latest Model

**Before invoking peer review, ALWAYS verify you're using the latest Codex model.**
### Version Check Protocol

**Step 1:** Check current Codex CLI and model version

```
codex --version
codex /status
```

**Step 2:** Web search for latest available models

```
WebSearch: "OpenAI Codex CLI latest model [current month year]"
```

**Step 3:** Update if outdated

```
npm update -g @openai/codex
brew upgrade codex
```
### Current Latest Models (as of December 2025)

| Model | Use Case | Model ID |
|---|---|---|
| GPT-5.1-Codex-Max | Complex reasoning, architecture, multi-window context (default) | `gpt-5.1-codex-max` |
| GPT-5.1-Codex | Standard analysis, faster response | `gpt-5.1-codex` |
### Reasoning Effort Levels

Codex supports different reasoning effort levels for different task complexity:

| Level | Use Case | Flag |
|---|---|---|
| low | Simple queries, quick checks | `--reasoning low` |
| medium | Daily driver, most tasks | `--reasoning medium` |
| high | Complex architecture, security review | `--reasoning high` |
| xhigh | Maximum reasoning, critical decisions | `--reasoning xhigh` |

For peer review, use `high` or `xhigh` for best results.
### New Features (CLI v0.65.0+)

- **`/resume` command:** Resume previous analysis sessions for continuity
- **Codex Max as default:** Best model is now the default for signed-in users
- **Extended context:** Process millions of tokens in a single task via compaction
- **Enhanced markdown tooltips:** Better formatted output
### Why This Matters
- Newer models have dramatically improved reasoning
- GPT-5.1-Codex-Max can work across multiple context windows
- Security analysis accuracy improves significantly
- Outdated models may miss critical issues
- This skill is only as good as the model powering it
### Automated Version Check (Recommended)

Add this check to the beginning of any peer review workflow:

```
codex --version && codex /status
```

If using an older model (e.g., `codex-1` or `gpt-4-*`), update before proceeding.
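A minimal shell sketch of automating this check, using only the `codex --version` and `codex /status` commands shown above; the outdated-model patterns it looks for are the examples named in this section, and the exact `/status` output format is an assumption:

```bash
#!/usr/bin/env bash
# Automated pre-review version check, as recommended above.
if ! command -v codex >/dev/null 2>&1; then
  echo "Codex CLI not found; install with: npm i -g @openai/codex" >&2
  exit 1
fi

codex --version
STATUS_OUTPUT="$(codex /status)"
echo "$STATUS_OUTPUT"

# Warn if the reported model matches the outdated examples (codex-1, gpt-4-*).
if echo "$STATUS_OUTPUT" | grep -Eq 'codex-1|gpt-4'; then
  echo "Outdated model detected; run 'npm update -g @openai/codex' before proceeding." >&2
fi
```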
**If Codex CLI is not available:**
- Inform the user that peer review requires Codex CLI
- Provide installation instructions
- Continue with Claude-only analysis if the user can't install it
- Note that a second opinion isn't available
## Configuration

Optional configuration in `~/.codex/config.toml`:

```toml
model = "gpt-5.1-codex-max"
ask_for_approval = "suggest"
sandbox = "workspace-read"
reasoning_effort = "high"
```
For peer review, recommended settings:

- `model = "gpt-5.1-codex-max"` for best reasoning
- `reasoning_effort = "high"` or `"xhigh"` for complex architecture
- `sandbox = "workspace-read"` for read-only safety
- `ask_for_approval = "suggest"` for transparency
## Limitations & Considerations

### Technical Limitations
- Requires Codex CLI installation and authentication
- Subject to OpenAI API rate limits
- May have different context windows than Claude
- Responses may vary in quality based on prompt
- No real-time communication between AIs (sequential only)
### Philosophical Considerations
- Different training data and approaches may lead to different perspectives
- Neither AI is objectively "correct"—both offer perspectives
- User judgment is ultimate arbiter
- Peer review adds time to workflow
- Over-reliance on peer review can slow decision-making
### When to Trust Which Perspective

**Trust convergence:**
- When both AIs agree, confidence increases
**Trust divergence:**
- Reveals important trade-offs and assumptions
- Neither is necessarily "right"—different priorities
**Trust specialized knowledge:**
- Codex may have different strengths in certain domains
- Claude may have different strengths in others
- Consider which AI's reasoning aligns better with your context
## Example Workflows

### Example: Architecture Decision

**User:** "I'm designing a multi-tenant SaaS architecture. Should I use separate databases per tenant or a shared database with row-level security?"

**Claude initial analysis:** [Provides analysis of trade-offs]

**Invoke peer review:**
```
codex --reasoning xhigh exec "Review multi-tenant SaaS architecture decision:

CONTEXT:
- B2B SaaS with 100-500 tenants expected
- Varying data volumes per tenant (small to large)
- Strong data isolation requirements
- Team familiar with PostgreSQL
- Cloud deployment (AWS)

OPTIONS:
A) Separate database per tenant
B) Shared database with row-level security (RLS)

QUESTION:
Analyze trade-offs for scalability, operational complexity, data isolation, and cost. Which approach is recommended for this context?"
```
Note: Using `--reasoning xhigh` for maximum analysis on this high-stakes architectural decision.

**Synthesis:**
Compare Claude's and Codex's trade-off analyses, extract the key insights, and present a balanced recommendation.
## Anti-Patterns

**Don't:**
- Use peer review for every trivial decision (wastes time)
- Blindly follow one AI's recommendation over the other
- Ask vague questions without context
- Expect perfect agreement between AIs
- Force implementation when both AIs raise concerns
- Use peer review as decision-avoidance mechanism
- Over-engineer simple problems by seeking too many opinions
**Do:**
- Use strategically for high-stakes decisions
- Synthesize both perspectives thoughtfully
- Frame clear, specific questions with context
- Embrace disagreement as revealing trade-offs
- Use peer review to inform, not replace, judgment
- Make timely decisions based on integrated analysis
- Balance peer review with velocity
## Success Metrics

**Peer review succeeds when:**
- User gains clarity on decision through multi-perspective analysis
- Important trade-offs are revealed that weren't initially apparent
- Alternative approaches emerge that are genuinely valuable
- Risks are identified by at least one AI perspective
- User makes more informed decision than without peer review
- Confidence increases (when perspectives align)
- Trade-offs become explicit (when perspectives diverge)
**Peer review fails when:**
- No new insights emerge (obvious analysis)
- Takes too long relative to decision impact
- Perspectives are confusing rather than clarifying
- User is more confused after peer review than before
- Blocks forward progress unnecessarily
- Becomes crutch for simple decisions
## Skill Improvement

This skill improves through:
- Better question framing patterns
- More effective context preparation
- Refined synthesis techniques
- Pattern recognition for when peer review adds value
- Learning which types of questions work best with Codex
- Understanding Codex's strengths and limitations
- Calibrating when peer review is worth the time investment
**Feedback loop:**
- Track which peer reviews provided valuable insights
- Note which question patterns work well
- Identify scenarios where peer review was or wasn't valuable
- Refine use case patterns based on experience
## Related Resources
- Codex CLI Documentation: https://developers.openai.com/codex/cli/
- Architecture Decision Records (ADR) patterns
- Design pattern catalogs
- Security review checklists
- Performance optimization frameworks
- Testing strategy guides