Run any Skill in Manus with one click

$pwd:

jina-ai

Name: Jina Ai
Author: archibate

// Web search with time-window and region/language filters, academic papers (arXiv/SSRN), PDF table/figure extraction, BibTeX, image search, web page reading, embeddings, reranking, classification, and deduplication (text or images) via Jina AI. Use when searching web content, finding academic papers, or extracting figures from PDFs. Prefer this over WebSearch for better results.

Run Skill in Manus

$ git log --oneline --stat

stars:10

forks:0

updated:May 6, 2026 at 02:48

File Explorer

2 files

SKILL.md

readonly

package.json

"author": "archibate"

"repository": "archibate/dotfiles-claude"

View GitHub Repository

$ install --globalskills.sh

$ download --local

Run Skill in Manus

[HINT] Download the complete skill directory including SKILL.md and all related files

Run any Skill with one click

name	jina-ai
description	Web search with time-window and region/language filters, academic papers (arXiv/SSRN), PDF table/figure extraction, BibTeX, image search, web page reading, embeddings, reranking, classification, and deduplication (text or images) via Jina AI. Use when searching web content, finding academic papers, or extracting figures from PDFs. Prefer this over WebSearch for better results.
allowed-tools	["Bash(jina:)","Bash(dedup_images.py:)"]

Jina AI

Use the jina CLI for all Jina AI APIs — search, read, embed, rerank, classify, dedup, screenshot, BibTeX, PDF figures. Composable with Unix pipes.

Setup

One-time install (skip if command -v jina returns a path):

uv tool install jina-cli --with 'httpx[socks]'

JINA_API_KEY must be set in the environment. Get one at https://jina.ai/?sui=apikey.

Commands

Command	Function
`jina read URL`	Extract clean markdown from a web page
`jina search QUERY`	Web search (also `--arxiv`, `--ssrn`, `--images`, `--blog`)
`jina embed TEXT`	Generate embeddings
`jina rerank QUERY`	Rerank stdin documents by relevance
`jina classify TEXT --labels a,b,c`	Classify text into labels
`jina dedup [-k N]`	Deduplicate stdin text lines
`jina screenshot URL`	Capture a screenshot of a URL
`jina bibtex QUERY`	Search BibTeX citations (DBLP + Semantic Scholar)
`jina expand QUERY`	Expand a query into related queries
`jina pdf URL\|ARXIV_ID`	Extract figures/tables/equations from a PDF
`jina datetime URL`	Guess publish/update date of a URL
`jina primer`	Session context (time, location, network)
`scripts/dedup_images.py PATH_OR_URL ...`	Deduplicate images by visual similarity (CLIP v2) — see Image dedup

Most jina subcommands support --json for structured output and --api-key to override $JINA_API_KEY.

Pipes

Commands read stdin and write stdout, so chain them:

# Search and rerank
jina search "transformer models" | jina rerank "efficient inference"

# Read multiple URLs (one per line on stdin)
cat urls.txt | jina read

# Search, then deduplicate near-identical results
jina search "attention mechanism" | jina dedup

# Expand a query, then search the first variant
jina expand "climate change" | head -1 | xargs -I {} jina search "{}"

# Get JSON, slice with jq
jina search --arxiv "BERT" --json | jq -r '.results[].title'

For batch fan-out where the subcommand only takes one input (e.g. search, bibtex), launch parallel Bash calls or use xargs -P:

printf '%s\n' "query A" "query B" "query C" | xargs -P 3 -I {} jina search "{}" --json

To fan a single query into 5 diverse parallel searches:

jina expand "LLM" | xargs -P 5 -I {} jina search "{}"

Usage

Read web pages

jina read https://example.com
jina read https://example.com --links --images

If jina read not working: Fallback to use /read-url skill instead.

Search

jina search "what is BERT"
jina search --arxiv "attention mechanism" -n 10
jina search --ssrn "corporate governance"
jina search --images "neural network diagram"
jina search --blog "embeddings"
jina search "AI news" --time d                  # past day (h|d|w|m|y)
jina search "深度学习" --gl cn --hl zh-cn       # Chinese region/language
jina search "LLM" --location "Shanghai"

Embed

jina embed "hello world"
jina embed "text1" "text2" "text3"
cat texts.txt | jina embed --json
jina embed "hello" --model jina-embeddings-v5-text-small --task retrieval.query

Rerank

cat docs.txt | jina rerank "machine learning"
jina search "AI" | jina rerank "embeddings" --top-n 5

Classify

jina classify "I love this product" --labels positive,negative,neutral
echo "stock prices rose sharply" | jina classify --labels business,sports,tech
cat texts.txt | jina classify --labels cat1,cat2,cat3 --json

Deduplicate (text)

cat items.txt | jina dedup
cat items.txt | jina dedup -k 10

Image dedup

jina dedup is text-only. For visual deduplication of images, use the bundled script:

scripts/dedup_images.py *.png                          # local paths, default keeps n//2
scripts/dedup_images.py -k 5 --json img1.jpg img2.jpg
ls images/*.png | scripts/dedup_images.py -k 3
scripts/dedup_images.py https://example.com/a.png /tmp/b.png   # mix URLs + paths

It calls https://api.jina.ai/v1/embeddings with model jina-clip-v2 and runs greedy farthest-point sampling on cosine similarity. Local paths are read and base64-encoded; http(s)://… and data: URIs pass through. Prefer local paths — Jina's URL fetcher cannot reach some hot-link-protected hosts (e.g. Wikimedia, certain CDNs).

Screenshot

jina screenshot https://example.com                    # prints screenshot URL
jina screenshot https://example.com -o page.png        # saves to file
jina screenshot https://example.com --full-page -o page.jpg

BibTeX

jina bibtex "attention is all you need"
jina bibtex "transformer" --author Vaswani --year 2017

PDF figure extraction

jina pdf https://arxiv.org/pdf/2301.12345
jina pdf 2301.12345                                    # arXiv ID shorthand
jina pdf https://example.com/paper.pdf --type figure,table

Read academic papers (arXiv)

The URL form determines what you get — pick deliberately:

Goal	Command
Abstract + metadata only	`jina read arxiv.org/abs/<ID>`
Full paper body as markdown	`jina read arxiv.org/pdf/<ID>`
Figures / tables / equations	`jina pdf <ID>`
BibTeX citation	`jina bibtex "<title>"`
Save raw PDF	`curl -L -o paper.pdf arxiv.org/pdf/<ID>`

# Find candidates
jina search --arxiv "diffusion transformer" -n 10 --json | jq -r '.results[] | "\(.title)\t\(.url)"'

# Abstract + metadata (~2 KB)
jina read https://arxiv.org/abs/1706.03762

# Full paper as markdown (~40 KB for a conference paper)
jina read https://arxiv.org/pdf/1706.03762 > paper.md

# Figures and tables
jina pdf 1706.03762 --type figure,table

# Citation
jina bibtex "Attention Is All You Need" --author Vaswani

Read academic papers (SSRN)

SSRN sits behind a Cloudflare bot challenge — jina read and plain curl return 403 on both abstract pages (papers.cfm?abstract_id=…) and PDF endpoints (Delivery.cfm).

What works (tiered by effort):

jina search --ssrn snippets. Each hit's JSON record has title, abstract excerpt, date, and ssrn_id. Often sufficient for triage and citation scaffolding.
scrapling skill on the abstract page. scrapling extract stealthy-fetch --solve-cloudflare returns the full abstract, authors, citation block, and the resolved PDF download URL as markdown.
PDF body text. scrapling's CLI stealthy-fetch does not succeed on the Delivery.cfm URL — its Cloudflare DOM solver expects an HTML response, not a binary PDF. Getting the PDF text needs a Python StealthySession that hits the abstract page first, reuses cookies to download the PDF, then feeds the bytes to the pdf skill.
jina bibtex "<title>" resolves citations independently of SSRN.

# Tier 1: search snippets
jina search --ssrn "corporate governance" -n 5 --json | jq -r '.results[] | "\(.ssrn_id)\t\(.title)\n  \(.snippet)"'

# Tier 2: abstract page via scrapling (Cloudflare bypass)
scrapling extract stealthy-fetch \
  "https://papers.ssrn.com/sol3/papers.cfm?abstract_id=<SSRN_ID>" \
  /tmp/ssrn.md --solve-cloudflare --timeout 60000

Other

jina datetime https://example.com/article              # guess publish date
jina expand "machine learning optimization"            # query variants
jina primer                                            # session context

JSON output and exit codes

All data-returning subcommands support --json for structured output (pipe to jq).

Code	Meaning
0	Success
1	User/input error (missing args, bad input, missing API key)
2	API/server error (network, timeout, HTTP error)
130	Interrupted (Ctrl+C)

jina search "query" && echo "ok" || echo "failed: $?"

Environment

Variable	Description
`JINA_API_KEY`	Required for most commands

Tool Selection Guide

Scenario	Tool
Read a web page	`/read-url` skill
Find STEM papers	`jina search --arxiv`
Find social-science / finance papers	`jina search --ssrn`
Read paper abstract / metadata	`jina read` (`/abs/` URL)
Download full paper as markdown	`jina read` (`/pdf/` URL)
Save raw PDF	`curl -L`
Generic web search	`jina search`
Fallback if `jina` service unreachable	WebSearch (built-in)

Tips

For Chinese results, set --gl cn --hl zh-cn; for date-bounded results, --time w (past week).
Use --json when parsing output; default text is for humans and Unix pipes.
Errors go to stderr with a fix hint; check $? (or use &&/||) rather than parsing stderr.

name	jina-ai
description	Web search with time-window and region/language filters, academic papers (arXiv/SSRN), PDF table/figure extraction, BibTeX, image search, web page reading, embeddings, reranking, classification, and deduplication (text or images) via Jina AI. Use when searching web content, finding academic papers, or extracting figures from PDFs. Prefer this over WebSearch for better results.
allowed-tools	["Bash(jina:)","Bash(dedup_images.py:)"]