local-llm-setup

Stars: 3

Forks: 2

Updated: June 7, 2026 at 04:06

Cross-platform setup wizard for the local Gemma 4 12B inference stack. Automates llama-server installation (binary download or Metal/CUDA/Vulkan/ROCm compile), model download, routing proxy daemon install (launchd/systemd/NSSM), and Mode A/B validation. Covers Day 1 bootstrap and Day 2+ reconfiguration.

Source

richfrem

richfrem/agent-plugins-skills

SKILL.md


name	local-llm-setup
plugin	cli-agents
description	Cross-platform setup wizard for the local Gemma 4 12B inference stack. Automates llama-server installation (binary download or Metal/CUDA/Vulkan/ROCm compile), model download, routing proxy daemon install (launchd/systemd/NSSM), and Mode A/B validation. Covers Day 1 bootstrap and Day 2+ reconfiguration.
allowed-tools	Bash, Read, Write

User wants to set up local Gemma 4 for the first time on a Mac.
User: Set up local LLM with Gemma 4 on my M1 Mac
Agent: Detects Metal GPU, compiles llama-server from source, downloads gemma-4-12b-UD-Q4_K_XL.gguf, starts server, installs routing proxy via launchd, validates with a Mode B timing test (~2s).

User wants to test Mode B task delegation speed vs Mode A proxy.
User: Compare Mode B vs Mode A speed for local Gemma
Agent: Runs `time python3 scripts/run_agent.py /dev/null /dev/null /tmp/t.md "hello" --cli llama` (~2s), then `time claude --model gemma-4-12b -p "hello"` (~30–60s cold), reports the delta.
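
The speed comparison in the second example can be scripted rather than run by hand with `time`. The following is a minimal sketch, not part of the skill itself: `time_command` and `report_delta` are hypothetical helper names, and the demo at the bottom times a trivial Python invocation so the block runs without the local stack.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return completed.returncode, time.perf_counter() - start


def report_delta(mode_b_seconds, mode_a_seconds):
    """Format the wall-clock difference between a Mode B and a Mode A run."""
    delta = mode_a_seconds - mode_b_seconds
    return (f"Mode B: {mode_b_seconds:.1f}s, "
            f"Mode A: {mode_a_seconds:.1f}s, delta: {delta:.1f}s")


# Demo with a harmless command; swap in the two commands from the example
# above once llama-server and the proxy are running.
exit_code, seconds = time_command([sys.executable, "-c", "pass"])
print(f"exit={exit_code} elapsed={seconds:.3f}s")
```

To compare the two modes, pass each command from the example as an argument list to `time_command` and feed both timings to `report_delta`.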

Primary Use Case: Mode B Task Delegation

Mode B is the fast path. `run_agent.py` sends a lean prompt directly to llama-server, with no proxy overhead and no ~29K-token system prompt. Measured: ~2s wall clock for a typical bounded task.

# Start llama-server (required for cli=llama)
python3 scripts/run_server.py
curl http://localhost:8089/health   # must return {"status":"ok"}

# Mode B task delegation — fast path (~2s)
time python3 scripts/run_agent.py agents/refactor-expert.md target.py output.md \
  "List the top 3 issues." --cli llama

# Mode B with custom max tokens
python3 scripts/run_agent.py /dev/null /dev/null /tmp/out.md \
  "Summarize this architecture decision." --cli llama --max-tokens 300

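The health check and the delegation call above can also be driven from Python. This is a sketch under stated assumptions: the helper names (`is_healthy`, `server_ready`, `build_agent_command`) are hypothetical, and the argument order simply mirrors the commands shown above (persona file, target, output file, prompt, then `--cli llama` and an optional `--max-tokens`).

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8089/health"  # endpoint from the curl check above


def is_healthy(body):
    """Return True when the response body is JSON with "status" equal to "ok"."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def server_ready(url=HEALTH_URL, timeout=2.0):
    """Fetch the health endpoint; a network error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8", errors="replace"))
    except OSError:
        return False


def build_agent_command(persona, target, output, prompt, max_tokens=None):
    """Assemble a Mode B run_agent.py invocation as an argument list."""
    cmd = ["python3", "scripts/run_agent.py", persona, target, output,
           prompt, "--cli", "llama"]
    if max_tokens is not None:
        cmd += ["--max-tokens", str(max_tokens)]
    return cmd


print(build_agent_command("agents/refactor-expert.md", "target.py",
                          "output.md", "List the top 3 issues."))
```

A caller would typically check `server_ready()` first and pass the list from `build_agent_command` to `subprocess.run` only when it returns True.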
Available agent personas (pass as PERSONA_FILE):

| Persona | Role |
| --- | --- |
| `agents/refactor-expert.md` | Code quality: SOLID/DRY smell taxonomy |
| `agents/security-auditor.md` | OWASP vulnerability audit |
| `agents/architect-review.md` | C4/SOLID structural review |
| `agents/red-team-reviewer.md` | Adversarial exploit analysis |
| `agents/compliance-reviewer.md` | Coding standards drift detection |
| `agents/pr-reviewer.md` | Diff review: ship/hold decision |
| `agents/test-writer.md` | Unit test generation |
| `agents/debate-synthesizer.md` | Multi-perspective synthesis |
| `agents/output-validator.md` | Output guardrail / hallucination check |
| `agents/self-critic.md` | Reflection loop: task-fit check |
| `agents/performance-analyst.md` | Bottleneck and scale analysis |
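
Because every persona is just a file passed as the first argument, running several reviews over one target amounts to building one command per persona. The sketch below illustrates this; the `reviews/` output directory and the helper names are assumptions for illustration, not something the skill defines.

```python
from pathlib import PurePosixPath

# A subset of the personas listed in the table above.
PERSONAS = [
    "agents/refactor-expert.md",
    "agents/security-auditor.md",
    "agents/performance-analyst.md",
]


def output_path_for(persona, out_dir="reviews"):
    """Map agents/<name>.md to <out_dir>/<name>.md (out_dir is hypothetical)."""
    return str(PurePosixPath(out_dir) / PurePosixPath(persona).name)


def plan_reviews(target, prompt, personas, out_dir="reviews"):
    """Return one Mode B run_agent.py argument list per persona."""
    return [
        ["python3", "scripts/run_agent.py", persona, target,
         output_path_for(persona, out_dir), prompt, "--cli", "llama"]
        for persona in personas
    ]


for cmd in plan_reviews("target.py", "List the top 3 issues.", PERSONAS):
    print(" ".join(cmd[:5]), "...")
```

Each planned command can then be executed in turn with `subprocess.run`, giving one output file per persona.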

Mode A (Optional — Interactive Proxy)

Mode A routes Claude Code itself through Gemma via a proxy. Each session carries ~29K tokens of system prompt overhead, so the first turn takes 30–60s. It is not recommended for task delegation; use Mode B instead.

python3 scripts/enable_global_routing.py   # install launchd/systemd/NSSM daemon
python3 scripts/disable_global_routing.py  # remove daemon
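
The Mode A figures quoted in this document (a ~29K-token system prompt and a 30–60s first turn) can be read as a rough prompt-processing rate. The sketch below does that arithmetic under one loud assumption: that the whole first-turn latency is spent processing the system prompt, which the source does not state. The function name is hypothetical.

```python
def implied_prefill_rate(prompt_tokens, seconds):
    """Tokens per second if all of `seconds` went to processing the prompt.

    ASSUMPTION: ignores generation time, proxy latency, and model load time.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return prompt_tokens / seconds


SYSTEM_PROMPT_TOKENS = 29_000  # "~29K" from the Mode A description

for first_turn_seconds in (30, 60):
    rate = implied_prefill_rate(SYSTEM_PROMPT_TOKENS, first_turn_seconds)
    print(f"{first_turn_seconds}s first turn -> about {rate:.0f} tokens/s")
```

Under that assumption the quoted range corresponds to roughly 480 to 970 prompt tokens per second, which is why a lean Mode B prompt finishes so much sooner.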

Co-located Scripts (`scripts/`)

| Script | Purpose |
| --- | --- |
| `run_server.py` | Start llama-server (authoritative params) |
| `run_agent.py` | Task router: Mode B, 6 backends |
| `enable_global_routing.py` | Install Mode A proxy daemon |
| `disable_global_routing.py` | Remove Mode A proxy daemon |
| `routing_proxy.py` | Mode A API compatibility proxy (port 4000) |
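
A quick way to see which parts of the stack are up is to probe the two ports this document mentions: 8089 for llama-server (from the health check URL) and 4000 for the routing proxy. The following is a minimal sketch with hypothetical helper names; it only tests whether something accepts TCP connections on each port, not whether that process is the expected one.

```python
import socket

# Ports taken from this document; the service labels are for display only.
PORTS = {"llama-server": 8089, "routing_proxy": 4000}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def stack_status(host="127.0.0.1", ports=PORTS):
    """Map each service label to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}


for name, is_up in stack_status().items():
    print(f"{name}: {'listening' if is_up else 'not reachable'}")
```

For Mode B only the llama-server port needs to be listening; the proxy port matters only when Mode A is enabled.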

