rag-skills

Stars: 832

Forks: 57

Updated: January 14, 2026, 06:01

RAG-specific best practices for LlamaIndex, ChromaDB, and Celery workers. Covers ingestion, retrieval, embeddings, and performance.

Source: llama-farm/llamafarm

RAG Skills for LlamaFarm

Framework-specific patterns and code review checklists for the RAG component.

Extends: python-skills - All Python best practices apply here.

Component Overview

Aspect	Technology	Version
Python	Python	3.11+
Document Processing	LlamaIndex	0.13+
Vector Storage	ChromaDB	1.0+
Task Queue	Celery	5.5+
Embeddings	Universal/Ollama/OpenAI	Multiple

Directory Structure

rag/
├── api.py                 # Search and database APIs
├── celery_app.py          # Celery configuration
├── main.py                # Entry point
├── core/
│   ├── base.py            # Document, Component, Pipeline ABCs
│   ├── factories.py       # Component factories
│   ├── ingest_handler.py  # File ingestion with safety checks
│   ├── blob_processor.py  # Binary file processing
│   ├── settings.py        # Pydantic settings
│   └── logging.py         # RAGStructLogger
├── components/
│   ├── embedders/         # Embedding providers
│   ├── extractors/        # Metadata extractors
│   ├── parsers/           # Document parsers (LlamaIndex)
│   ├── retrievers/        # Retrieval strategies
│   └── stores/            # Vector stores (ChromaDB, FAISS)
├── tasks/                 # Celery tasks
│   ├── ingest_tasks.py    # File ingestion
│   ├── search_tasks.py    # Database search
│   ├── query_tasks.py     # Complex queries
│   ├── health_tasks.py    # Health checks
│   └── stats_tasks.py     # Statistics
└── utils/
    └── embedding_safety.py  # Circuit breaker, validation

Quick Reference

Topic	File	Key Points
LlamaIndex	llamaindex.md	Document parsing, chunking, node conversion
ChromaDB	chromadb.md	Collections, embeddings, distance metrics
Celery	celery.md	Task routing, error handling, worker config
Performance	performance.md	Batching, caching, deduplication

Core Patterns

Document Dataclass

import uuid
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Document:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    source: str | None = None
    embeddings: list[float] | None = None

Component Abstract Base Class

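from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any

# Document and RAGStructLogger come from core/base.py and core/logging.py
# in the tree above; ProcessingResult is assumed to be defined alongside them.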

class Component(ABC):
    def __init__(
        self,
        name: str | None = None,
        config: dict[str, Any] | None = None,
        project_dir: Path | None = None,
    ):
        self.name = name or self.__class__.__name__
        self.config = config or {}
        self.logger = RAGStructLogger(__name__).bind(name=self.name)
        self.project_dir = project_dir

    @abstractmethod
    def process(self, documents: list[Document]) -> ProcessingResult:
        pass

Retrieval Strategy Pattern

class RetrievalStrategy(Component, ABC):
    @abstractmethod
    def retrieve(
        self,
        query_embedding: list[float],
        vector_store,
        top_k: int = 5,
        **kwargs
    ) -> RetrievalResult:
        pass

    @abstractmethod
    def supports_vector_store(self, vector_store_type: str) -> bool:
        pass
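
The following sketch shows one way a concrete strategy could satisfy this interface. It is not the project's retriever: `CosineTopK`, `InMemoryStore`, the `min_score` keyword, and the shape of `RetrievalResult` are all hypothetical, and the base class is trimmed so the block is self-contained.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RetrievalResult:
    # Hypothetical shape; the real RetrievalResult may carry more fields.
    ids: list[str]
    scores: list[float]


class InMemoryStore:
    """Hypothetical stand-in for a vector store: id -> embedding."""

    def __init__(self, vectors: dict[str, list[float]]):
        self.vectors = vectors


class RetrievalStrategy(ABC):
    # Trimmed: the class in the source also inherits from Component.
    @abstractmethod
    def retrieve(self, query_embedding, vector_store, top_k=5, **kwargs) -> RetrievalResult: ...

    @abstractmethod
    def supports_vector_store(self, vector_store_type: str) -> bool: ...


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return sum(x * y for x, y in zip(a, b)) / norm


class CosineTopK(RetrievalStrategy):
    def retrieve(self, query_embedding, vector_store, top_k=5, **kwargs):
        min_score = kwargs.get("min_score", -1.0)
        scored = [
            (cosine(query_embedding, vec), doc_id)
            for doc_id, vec in vector_store.vectors.items()
        ]
        scored = [pair for pair in scored if pair[0] >= min_score]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = scored[:top_k]
        return RetrievalResult(
            ids=[doc_id for _, doc_id in top],
            scores=[score for score, _ in top],
        )

    def supports_vector_store(self, vector_store_type: str) -> bool:
        return vector_store_type == "in_memory"


store = InMemoryStore({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
result = CosineTopK().retrieve([1.0, 0.0], store, top_k=2)
print(result.ids)  # ['a', 'c']
```

Strategy-specific options travel through `**kwargs`, and `supports_vector_store` lets a caller reject an incompatible pairing before running a query.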

Embedder with Circuit Breaker

class Embedder(Component):
    DEFAULT_FAILURE_THRESHOLD = 5
    DEFAULT_RESET_TIMEOUT = 60.0

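    def __init__(self, *args, **kwargs):
        # Constructor arguments are elided in the source; they are forwarded
        # unchanged to Component.__init__ here.
        super().__init__(*args, **kwargs)
        # Component.__init__ normalizes config to a dict on self.config.
        self._circuit_breaker = CircuitBreaker(
            failure_threshold=self.config.get(
                "failure_threshold", self.DEFAULT_FAILURE_THRESHOLD
            ),
            reset_timeout=self.config.get(
                "reset_timeout", self.DEFAULT_RESET_TIMEOUT
            ),
        )
        self._fail_fast = self.config.get("fail_fast", True)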

    def embed_text(self, text: str) -> list[float]:
        self.check_circuit_breaker()
        try:
            embedding = self._call_embedding_api(text)
            self.record_success()
            return embedding
        except Exception as e:
            self.record_failure(e)
            if self._fail_fast:
                raise EmbedderUnavailableError(str(e)) from e
            return [0.0] * self.get_embedding_dimension()
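
The `CircuitBreaker` class itself is not shown here. As a rough sketch of what the two parameters could control, the block below implements a minimal breaker: it opens after `failure_threshold` consecutive failures and allows a trial call once `reset_timeout` seconds have passed. The class body, the `CircuitOpenError` name, and the injectable `clock` are assumptions; the real implementation in `utils/embedding_safety.py` may work differently.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when calls are rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker, not the project's implementation."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def check(self) -> None:
        """Raise if the circuit is open and the reset timeout has not elapsed."""
        if self._opened_at is None:
            return
        if self._clock() - self._opened_at >= self.reset_timeout:
            # Half-open: allow one trial call; a single failure re-opens.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
            return
        raise CircuitOpenError("embedding backend unavailable")

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


# Fake clock so the example runs instantly and deterministically.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

breaker.record_failure()
breaker.record_failure()   # threshold reached, circuit opens at t=0
try:
    breaker.check()
    state_while_open = "closed"
except CircuitOpenError:
    state_while_open = "open"

now[0] = 10.0              # reset timeout elapsed
breaker.check()            # no exception: one trial call is allowed
breaker.record_success()   # trial succeeded, circuit closes fully
print(state_while_open)    # open
```

Failing fast while the circuit is open keeps workers from queuing up behind a backend that is already down, which is the same motivation as the `fail_fast` flag above.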

Review Checklist Summary

When reviewing RAG code:

LlamaIndex (Medium priority)
- Proper chunking configuration
- Metadata preservation during parsing
- Error handling for unsupported formats
ChromaDB (High priority)
- Thread-safe client access
- Proper distance metric selection
- Metadata type compatibility
Celery (High priority)
- Task routing to correct queue
- Error logging with context
- Proper serialization
Performance (Medium priority)
- Batch processing for embeddings
- Deduplication enabled
- Appropriate caching
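
As an illustration of the two Performance items on batching and deduplication, the sketch below drops repeated chunks by content hash and then embeds the rest in fixed-size batches. All function names are hypothetical, and `fake_embed_batch` stands in for a real embedding call; `performance.md` describes the project's actual approach.

```python
import hashlib
from collections.abc import Callable, Iterator


def content_hash(text: str) -> str:
    # Normalize whitespace so trivially different copies hash the same.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(texts: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct text, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for text in texts:
        digest = content_hash(text)
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 32,
) -> list[list[float]]:
    """Call embed_batch once per batch instead of once per text."""
    embeddings: list[list[float]] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings


calls: list[int] = []


def fake_embed_batch(batch: list[str]) -> list[list[float]]:
    # Hypothetical embedder: records the batch size, returns 1-d vectors.
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


chunks = ["alpha", "beta", "alpha", "gamma  delta", "gamma delta"]
unique_chunks = deduplicate(chunks)
vectors = embed_all(unique_chunks, fake_embed_batch, batch_size=2)
print(unique_chunks)  # ['alpha', 'beta', 'gamma  delta']
print(calls)          # [2, 1]
```

Five chunks turn into three embeddings and two backend calls. Deduplicating before batching means duplicates never consume a slot in a batch.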

See individual topic files for detailed checklists with grep patterns.

name: rag-skills
description: RAG-specific best practices for LlamaIndex, ChromaDB, and Celery workers. Covers ingestion, retrieval, embeddings, and performance.
allowed-tools: Read, Grep, Glob
user-invocable: false
