embeddings-and-search

Stars: 5

Forks: 0

Updated: June 20, 2026, 10:09

Use when generating embeddings, calling the 12 web-search providers, or running OCR over documents with the 4 OCR providers through liter-llm. Covers embed, search, and ocr methods plus reranking.

Source

xberg-io

xberg-io/plugins

SKILL.md

name	embeddings-and-search
description	Use when generating embeddings, calling the 12 web-search providers, or running OCR over documents with the 4 OCR providers through liter-llm. Covers embed, search, and ocr methods plus reranking.

Embeddings and Search

liter-llm exposes embeddings, web search (12 providers), OCR (4 providers), and reranking through the same provider/model routing convention.
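The provider/model convention can be pictured as a string split at the first slash. The sketch below is illustrative only: `ModelRoute` and `split_model_id` are hypothetical names, not part of liter-llm, and the library's own parsing may differ.

```python
from typing import NamedTuple


class ModelRoute(NamedTuple):
    provider: str  # e.g. "openai", "brave", "mistral"
    model: str     # everything after the first slash


def split_model_id(model_id: str) -> ModelRoute:
    """Split a "provider/model" identifier at the first slash.

    Hypothetical helper for illustration; not a liter-llm function.
    """
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return ModelRoute(provider, model)


if __name__ == "__main__":
    for model_id in (
        "openai/text-embedding-3-small",
        "brave/web-search",
        "mistral/mistral-ocr-latest",
    ):
        print(split_model_id(model_id))
```

Splitting at the first slash only means a model name that itself contains a slash stays intact in the `model` field.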

Embeddings

import asyncio, os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.embed(
        model="openai/text-embedding-3-small",
        input=["first document", "second document"],
    )
    for item in response.data:
        print(len(item.embedding))

asyncio.run(main())
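Once vectors come back, a common next step is to compare them. The following is a minimal, dependency-free sketch of cosine similarity; `cosine_similarity` is a hypothetical helper, not something liter-llm is known to ship, and it assumes each embedding is a plain sequence of floats.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)
```

With the example above, this could be called as `cosine_similarity(response.data[0].embedding, response.data[1].embedding)`, provided the embeddings were requested in the default float format.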

Many embedding models support dimension selection and base64 output; pass dimensions / encoding_format where the provider allows it.
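When base64 output is requested, the embedding arrives as a string rather than a list of floats. The sketch below assumes the payload is a packed array of little-endian 32-bit floats, which is the layout OpenAI's embeddings endpoint uses; other providers may differ, so check before relying on it. Both function names are hypothetical.

```python
import base64
import struct
from typing import List, Sequence


def decode_base64_embedding(payload: str) -> List[float]:
    """Decode a base64 string of little-endian float32 values (assumed layout)."""
    raw = base64.b64decode(payload)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def encode_base64_embedding(values: Sequence[float]) -> str:
    """Inverse of decode_base64_embedding, handy for tests and caching."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")
```

Base64 output is mainly useful for reducing response size; decode it once and keep the floats if the vectors are used repeatedly.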

Web search (12 providers)

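import asyncio, os
from liter_llm import LlmClient

async def main() -> None:
    # Each search provider needs its own key; Brave's is read from the environment.
    client = LlmClient(api_key=os.environ["BRAVE_API_KEY"])
    response = await client.search(
        model="brave/web-search",
        query="What is the Rust programming language?",
        max_results=5,
    )
    for result in response.results:
        print(result.title, result.url)

asyncio.run(main())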
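When results from several search providers are merged, the same page often appears more than once. The sketch below removes duplicates by normalized URL. `SearchHit` is a hypothetical stand-in that carries only the `title` and `url` fields used above; the real result objects may hold more.

```python
from dataclasses import dataclass
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical stand-in for one search result."""
    title: str
    url: str


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def dedupe_hits(hits: Iterable[SearchHit]) -> List[SearchHit]:
    """Keep the first hit for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for hit in hits:
        key = normalize_url(hit.url)
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique
```

Keeping the first occurrence preserves whatever ranking the earlier provider assigned.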

OCR (4 providers)

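import asyncio, os
from liter_llm import LlmClient

async def main() -> None:
    client = LlmClient(api_key=os.environ["MISTRAL_API_KEY"])
    response = await client.ocr(
        model="mistral/mistral-ocr-latest",
        document={"type": "document_url", "url": "https://example.com/invoice.pdf"},
    )
    for page in response.pages:
        # Print the page index and the first 100 characters of its Markdown.
        print(page.index, page.markdown[:100])

asyncio.run(main())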
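The per-page Markdown usually needs to be stitched back into one document before indexing or embedding. A minimal sketch follows, assuming each page exposes `index` and `markdown` as in the example above; `OcrPage` and `join_pages` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class OcrPage:
    """Hypothetical stand-in for one OCR page."""
    index: int
    markdown: str


def join_pages(pages: Iterable[OcrPage], separator: str = "\n\n---\n\n") -> str:
    """Concatenate page Markdown in page order, separated by a rule."""
    ordered = sorted(pages, key=lambda page: page.index)
    return separator.join(page.markdown.strip() for page in ordered)
```

Sorting by `index` guards against pages arriving out of order; drop the sort if the provider already guarantees ordering.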

Reranking

Use rerank(...) to score and order candidate documents against a query in retrieval pipelines. Combine it with embed for hybrid retrieval: shortlist candidates by embedding similarity, then rerank the shortlist. Routing follows the same provider/model convention.
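The exact signature of rerank(...) is not shown here, so the sketch below keeps the reranker abstract: `score_fn` is a hypothetical callable that would wrap a rerank call, and `overlap_score` is a toy placeholder rather than a real reranking model. It illustrates the two-stage shape only: a cheap shortlist by embedding similarity, then a rescoring pass.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlap_score(query: str, doc: str) -> float:
    """Toy placeholder scorer: fraction of query words found in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: Sequence[str],
    doc_vecs: Sequence[Sequence[float]],
    score_fn: Callable[[str, str], float] = overlap_score,
    shortlist_size: int = 10,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    # Stage 1: shortlist the documents whose embeddings are closest to the query.
    by_similarity = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    shortlist = by_similarity[:shortlist_size]
    # Stage 2: rescore only the shortlist with the (more expensive) reranker.
    rescored = [(docs[i], score_fn(query, docs[i])) for i in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]
```

Keeping the shortlist small bounds the number of reranker calls, which is usually the more expensive stage.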

Notes

Search and OCR providers each need their own API key (e.g. BRAVE_API_KEY, MISTRAL_API_KEY); read them from environment variables rather than hard-coding them.
See the upstream provider reference for the full list of the 12 search and 4 OCR backends and their model identifiers.
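Reading keys with os.environ[...] raises a bare KeyError when a variable is missing. A small wrapper can produce a clearer message instead; `require_env` and `MissingApiKeyError` below are hypothetical names, shown as one possible approach.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised when a required API key is absent from the environment."""


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingApiKeyError(
            f"environment variable {name} is not set; export it before running"
        )
    return value
```

The result can then be passed as `api_key=require_env("BRAVE_API_KEY")` when constructing the client.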
