| Field | Value |
|---|---|
| name | external-model-selection |
| description | Choose optimal external AI models for code analysis, bug investigation, and architectural decisions. Use when consulting multiple LLMs via claudish, comparing model perspectives, or investigating complex Go/LSP/transpiler issues. Provides empirically validated model rankings (91/100 for MiniMax M2, 83/100 for Grok Code Fast) and proven consultation strategies based on real-world testing. |
Purpose: Select the best external AI models for your specific task based on empirical performance data from production bug investigations.
When Claude invokes this Skill: When you need to consult external models, choose between different LLMs, or want diverse perspectives on architectural decisions, code bugs, or design choices.
1. MiniMax M2 (minimax/minimax-m2)
2. Grok Code Fast (x-ai/grok-code-fast-1)
3. GPT-5.1 Codex (openai/gpt-5.1-codex)
4. Sherlock Think Alpha (openrouter/sherlock-think-alpha) FREE
5. Gemini 3 Pro Preview (google/gemini-3-pro-preview) ⭐ NEW
6. Gemini 2.5 Flash (google/gemini-2.5-flash)
7. GLM-4.6 (z-ai/glm-4.6)
Qwen3 Coder (qwen/qwen3-coder-30b-a3b-instruct)
Models: minimax/minimax-m2 + x-ai/grok-code-fast-1
```
# Launch 2 models in parallel (single message, multiple Task calls)
Task 1: golang-architect (PROXY MODE) → MiniMax M2
Task 2: golang-architect (PROXY MODE) → Grok Code Fast
```
Time: ~4 minutes total | Success Rate: 95%+ | Cost: $$ (moderate)
Use for:
Benefits:
Models: minimax/minimax-m2 + openai/gpt-5.1-codex + x-ai/grok-code-fast-1
```
# Launch 3 models in parallel
Task 1: golang-architect (PROXY MODE) → MiniMax M2
Task 2: golang-architect (PROXY MODE) → GPT-5.1 Codex
Task 3: golang-architect (PROXY MODE) → Grok Code Fast
```
Time: ~5 minutes total | Success Rate: 99%+ | Cost: $$$ (high but justified)
Use for:
Benefits:
Models: minimax/minimax-m2 + google/gemini-2.5-flash + x-ai/grok-code-fast-1
```
# Launch 3 models in parallel
Task 1: golang-architect (PROXY MODE) → MiniMax M2
Task 2: golang-architect (PROXY MODE) → Gemini 2.5 Flash
Task 3: golang-architect (PROXY MODE) → Grok Code Fast
```
Time: ~6 minutes total | Success Rate: 90%+ | Cost: $$ (moderate)
Use for:
Benefits:
Models: openrouter/sherlock-think-alpha + google/gemini-3-pro-preview
```
# Launch 2 models in parallel
Task 1: golang-architect (PROXY MODE) → Sherlock Think Alpha
Task 2: golang-architect (PROXY MODE) → Gemini 3 Pro Preview
```
Time: ~5 minutes total | Success Rate: TBD (new strategy) | Cost: $$$ (one free, one paid = moderate overall)
Use for:
Benefits:
Prompt Strategy:
```
Analyze the entire Dingo codebase focusing on [specific aspect].
Context provided:
- All files in pkg/ (50+ files)
- All tests in tests/ (60+ files)
- Documentation in ai-docs/
- Total: ~200k lines of code
Your task: [specific analysis goal]
```
Models: openrouter/sherlock-think-alpha + x-ai/grok-code-fast-1
```
# Launch 2 models in parallel
Task 1: golang-architect (PROXY MODE) → Sherlock Think Alpha (FREE!)
Task 2: golang-architect (PROXY MODE) → Grok Code Fast
```
Time: ~5 minutes total | Success Rate: 85%+ | Cost: $$ (Sherlock is FREE, only pay for Grok!)
Use for:
Benefits:
```
START: Need external model consultation
│
[What type of task?]
│
├─ Bug Investigation (90% of cases)
│  → Strategy 1: MiniMax M2 + Grok Code Fast
│  → Time: 4 min | Cost: $$ | Success: 95%+
│
├─ Critical Bug / Architectural Decision
│  → Strategy 2: MiniMax M2 + GPT-5.1 + Grok
│  → Time: 5 min | Cost: $$$ | Success: 99%+
│
├─ Ambiguous / Multi-faceted Problem
│  → Strategy 3: MiniMax M2 + Gemini + Grok
│  → Time: 6 min | Cost: $$ | Success: 90%+
│
└─ Cost-Sensitive / Exploratory
   → Strategy 4: Gemini + Grok
   → Time: 6 min | Cost: $ | Success: 85%+
```
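If the choice is scripted rather than made by hand, the tree reduces to a switch. This Go sketch is illustrative only: `TaskType`, `Strategy`, and `selectStrategy` are hypothetical names, the model IDs come from the strategy definitions above, and for Strategy 4 ("Gemini + Grok") it assumes the Gemini model is `google/gemini-2.5-flash`, which the tree itself does not state.

```go
package main

import "fmt"

// TaskType enumerates the branches of the decision tree.
type TaskType int

const (
	BugInvestigation        TaskType = iota // ~90% of cases
	CriticalOrArchitectural                 // critical bug / architectural decision
	Ambiguous                               // ambiguous / multi-faceted problem
	CostSensitive                           // cost-sensitive / exploratory
)

// Strategy records the models and expected figures for one branch.
type Strategy struct {
	Name    string
	Models  []string
	Minutes int
	Cost    string
	Success string
}

// selectStrategy returns the strategy the decision tree prescribes for t.
func selectStrategy(t TaskType) Strategy {
	switch t {
	case CriticalOrArchitectural:
		return Strategy{"Strategy 2", []string{"minimax/minimax-m2", "openai/gpt-5.1-codex", "x-ai/grok-code-fast-1"}, 5, "$$$", "99%+"}
	case Ambiguous:
		return Strategy{"Strategy 3", []string{"minimax/minimax-m2", "google/gemini-2.5-flash", "x-ai/grok-code-fast-1"}, 6, "$$", "90%+"}
	case CostSensitive:
		// ASSUMPTION: the tree only says "Gemini"; gemini-2.5-flash is a guess.
		return Strategy{"Strategy 4", []string{"google/gemini-2.5-flash", "x-ai/grok-code-fast-1"}, 6, "$", "85%+"}
	default: // BugInvestigation, the common case
		return Strategy{"Strategy 1", []string{"minimax/minimax-m2", "x-ai/grok-code-fast-1"}, 4, "$$", "95%+"}
	}
}

func main() {
	s := selectStrategy(BugInvestigation)
	fmt.Printf("%s: %v (~%d min, cost %s, success %s)\n", s.Name, s.Models, s.Minutes, s.Cost, s.Success)
}
```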
CRITICAL: External models take 5-10 minutes to respond. The Bash tool's default 2-minute timeout WILL cut them off.
When delegating to agents in PROXY MODE (Task tool → golang-architect), include this in the prompt:

**CRITICAL - Timeout Configuration**:
When executing claudish via Bash tool, ALWAYS use:
```bash
Bash(
  command='cat prompt.md | claudish --model [model-id] > output.md 2>&1',
  timeout=600000,  # 10 minutes (REQUIRED!)
  description='External consultation via [model-name]'
)
```
Why: Qwen3 Coder failed because it hit the 2-minute default timeout. A 10-minute timeout prevents this.
CORRECT (6-8x speedup):
```
# Single message with multiple Task calls
Task 1: golang-architect (PROXY MODE) → Model A
Task 2: golang-architect (PROXY MODE) → Model B
Task 3: golang-architect (PROXY MODE) → Model C
# All execute simultaneously
```
WRONG (sequential, slow):
```
# Multiple messages
Message 1: Task → Model A (wait...)
Message 2: Task → Model B (wait...)
Message 3: Task → Model C (wait...)
# Takes 3x longer
```
Agents in PROXY MODE MUST return MAX 3 lines:
```
[Model-name] analysis complete
Root cause: [one-line summary]
Full analysis: [file-path]
```
DO NOT return full analysis in agent response (causes context bloat).
Input: Write investigation prompt to file `ai-docs/sessions/[timestamp]/input/investigation-prompt.md`
Output: Agents write full analysis to files `ai-docs/sessions/[timestamp]/output/[model-name]-analysis.md`
Main chat: Reads ONLY summaries, not full files
Based on LSP Source Mapping Bug Investigation (Session 20251118-223538):
MiniMax M2 (91/100):
- Root cause: `qPos` calculation produces column 15 instead of 27
- Fix: change `strings.Index()` to `strings.LastIndex()`
Grok Code Fast (83/100):
GPT-5.1 Codex (80/100):
Gemini 2.5 Flash (73/100):
GLM-4.6 (70/100):
Sherlock Think (65/100):
Qwen3 Coder (0/100):
Test: LSP Source Mapping Bug (diagnostic underlining wrong code)
Methodology: 8 models tested in parallel on real production bug
| Model | Time | Accuracy | Solution | Cost-Value |
|---|---|---|---|---|
| MiniMax M2 | 3 min | ✅ Exact | Simple fix | ⭐⭐⭐⭐⭐ |
| Grok Code Fast | 4 min | ✅ Correct | Good validation | ⭐⭐⭐⭐ |
| GPT-5.1 Codex | 5 min | ⚠️ Partial | Complex design | ⭐⭐⭐⭐ |
| Gemini 2.5 Flash | 6 min | ⚠️ Missed | Overanalyzed | ⭐⭐⭐ |
| GLM-4.6 | 7 min | ❌ Wrong | Overengineered | ⭐⭐ |
| Sherlock Think | 5 min | ❌ Secondary | Wrong cause | ⭐⭐ |
| Qwen3 Coder | 8+ min | ❌ Failed | Timeout | ⚠️ |
Key Finding: Faster models (3-5 min) delivered better results than slower ones (6-8 min).
Correlation: speed tracks simplicity (faster models prioritize simple explanations first).
```bash
SESSION=$(date +%Y%m%d-%H%M%S)
mkdir -p ai-docs/sessions/$SESSION/{input,output}
# Write clear, self-contained prompt
echo "Problem: LSP diagnostic underlining wrong code..." > \
  ai-docs/sessions/$SESSION/input/investigation-prompt.md
```
Based on decision tree:
Single message with 2 Task calls:
```
Task 1 → golang-architect (PROXY MODE):
  You are operating in PROXY MODE to investigate bug using MiniMax M2.

  INPUT FILES:
  - ai-docs/sessions/$SESSION/input/investigation-prompt.md

  YOUR TASK (PROXY MODE):
  1. Read investigation prompt
  2. Use claudish to consult minimax/minimax-m2
  3. Write full response to output file

  **CRITICAL - Timeout**:
  Bash(timeout=600000)  # 10 minutes!

  OUTPUT FILES:
  - ai-docs/sessions/$SESSION/output/minimax-m2-analysis.md

  RETURN (MAX 3 lines):
  MiniMax M2 analysis complete
  Root cause: [one-line]
  Full analysis: [file-path]

Task 2 → golang-architect (PROXY MODE):
  [Same structure for Grok Code Fast]
```
After receiving both summaries:
Last Validated: 2025-11-18 (Session 20251118-223538)
Next Review: 2026-05 (6 months)
Test Task: LSP Source Mapping Bug
Re-validation Schedule:
Track:
Most common use case (90%):
→ Use Strategy 1: MiniMax M2 + Grok Code Fast
→ Time: 4 min | Cost: $$ | Success: 95%+

Critical issues:
→ Use Strategy 2: MiniMax M2 + GPT-5.1 + Grok
→ Time: 5 min | Cost: $$$ | Success: 99%+

Ambiguous problems:
→ Use Strategy 3: MiniMax M2 + Gemini + Grok
→ Time: 6 min | Cost: $$ | Success: 90%+

Cost-sensitive:
→ Use Strategy 4: Gemini + Grok
→ Time: 6 min | Cost: $ | Success: 85%+
Remember:
Full Reports:
- `ai-docs/sessions/20251118-223538/01-planning/comprehensive-model-comparison.md`
- `ai-docs/sessions/20251118-223538/01-planning/model-ranking-analysis.md`