| name | new-boost-module |
| description | Create new Harbor Boost modules — the Python plugins that run inside Harbor's LLM proxy. Use this skill whenever the user wants to build a Boost module, write a custom module for Harbor Boost, add a new feature to the Boost proxy pipeline, or create any kind of middleware that transforms, augments, or intercepts LLM chat completions in Harbor. Also triggers when the user mentions "boost module", "boost plugin", "custom module for boost", or wants to add prompt engineering, reasoning chains, or output transforms to Harbor's proxy layer.
|
Creating Harbor Boost Modules
Harbor Boost is an optimizing LLM proxy with an OpenAI-compatible API. Modules are Python
files that hook into the completion pipeline — they can rewrite prompts, inject system messages,
chain multiple LLM calls, stream artifacts, or replace the completion entirely.
A module is activated by prefixing a model name with the module's ID_PREFIX. For example,
a module with ID_PREFIX = "mymod" is invoked via model mymod-llama3.1.
Before You Write Code
- Read the Custom Modules guide:
docs/5.2.1.-Harbor-Boost-Custom-Modules.md
- Scan existing modules in
services/boost/src/modules/ to find similar functionality
you can reuse or learn from. There are 17+ built-in modules covering reasoning chains,
prompt rewriting, structured output, artifacts, and more.
- Read the Built-in Modules reference:
docs/5.2.3-Harbor-Boost-Modules.md
Understanding the available primitives (chat, llm, config, selection, log) saves
you from reinventing patterns that already exist.
Module Structure
Every module is a single .py file with two required exports:
ID_PREFIX = 'my_module'
async def apply(chat: 'Chat', llm: 'LLM'):
pass
Required Exports
| Export | Type | Purpose |
|---|
ID_PREFIX | str | Unique identifier. Becomes the model prefix (e.g., mymod-llama3.1). Use lowercase, short, memorable names. |
apply | async def(chat, llm) | Entry point called for every matching completion request. |
Recommended Exports
| Export | Type | Purpose |
|---|
DOCS | str | Markdown documentation shown in the Boost modules reference. Include a description, parameters, and usage examples. |
logger | Logger | Created via log.setup_logger(ID_PREFIX) for consistent, filterable logging. |
Core Primitives
chat — The Conversation
Chat is a linked list of ChatNode objects. The tail is the most recent message.
chat.tail.content
chat.text()
chat.history()
chat.user("...")
chat.assistant("...")
chat.tail.ancestor().add_parent(
ch.ChatNode(role="system", content="You are helpful.")
)
side = ch.Chat.from_conversation([
{"role": "user", "content": "Summarize this."}
])
llm — The Downstream Model
llm talks to the actual LLM backend configured in Boost.
Output to the client (streamed in real time):
await llm.emit_message("text")
await llm.emit_status("Working...")
await llm.emit_artifact("<html>...</html>")
Internal completions (not streamed, returned to you):
result = await llm.chat_completion(prompt="...", resolve=True)
result = await llm.chat_completion(messages=[...], resolve=True)
result = await llm.chat_completion(chat=some_chat, resolve=True)
result = await llm.chat_completion(prompt="...", schema=MyModel, resolve=True)
Streamed completions (sent to client AND returned):
text = await llm.stream_chat_completion(prompt="...")
text = await llm.stream_chat_completion(messages=[...])
Final completion (always reaches the client, even with intermediate output disabled):
await llm.stream_final_completion()
await llm.stream_final_completion(prompt="Custom prompt")
Boost parameters — clients can pass @boost_* params in the request body:
llm.boost_params
config — Service Configuration
For configurable modules, define config entries in services/boost/src/config.py using
the same pattern as existing modules. Config keys follow the HARBOR_BOOST_<MODULE>_<PARAM>
convention and are exposed via harbor boost <module> <param> CLI.
selection — Message Selection
For modules that operate on specific messages rather than the whole chat, use the selection
module. It provides strategy-based filtering (all, first, last, match, etc.) — see
klmbr, eli5, or rcn modules for usage patterns.
Writing Good DOCS
The DOCS string is the user-facing documentation. It should include:
- A brief description of what the module does and why it's useful
- Parameter descriptions if the module is configurable
- Usage examples showing
harbor boost CLI commands and/or standalone Docker usage
Format example:
DOCS = """
Explains complex questions simply before answering them, improving accuracy
on nuanced prompts.
**Parameters**
- `strat` - message selection strategy. Default: `match`
- `strat_params` - selection filter. Default: matches last user message
```bash
harbor boost modules add eli5
harbor boost eli5 strat match
harbor boost eli5 strat_params role=user,index=-1
Standalone
docker run \\
-e "HARBOR_BOOST_MODULES=eli5" \\
-e "HARBOR_BOOST_OPENAI_URLS=http://172.17.0.1:11434/v1" \\
-p 34131:8000 \\
ghcr.io/av/harbor-boost:latest
"""
## Common Patterns
**Prompt injection** — add context before the final completion:
```python
async def apply(chat, llm):
chat.tail.add_parent(ch.ChatNode(role="system", content="Extra context"))
await llm.stream_final_completion()
Multi-step reasoning — chain internal completions, then produce a final answer:
async def apply(chat, llm):
await llm.emit_status("Analyzing...")
analysis = await llm.chat_completion(prompt="Analyze: {q}", q=chat.tail.content, resolve=True)
await llm.stream_final_completion(prompt="Given analysis:\n{analysis}\n\nAnswer: {q}",
analysis=analysis, q=chat.tail.content)
Pass-through with side effects — do something extra while preserving normal behavior:
async def apply(chat, llm):
logger.info(f"Processing: {chat.tail.content[:50]}")
await llm.stream_final_completion()
Checklist
Before considering the module done: