sciverse-academic-retrieval

Citation-grade academic literature retrieval (search, semantic chunks, byte-range read, figure fetch) over Sciverse, an open scientific platform indexing peer-reviewed and preprint papers.

Install command

npx skills add https://github.com/InternScience/scp --skill sciverse-academic-retrieval

Copy and paste this command into Claude Code to install the skill

Source

InternScience/scp

Stars: 146

Forks: 10

Updated: June 3, 2026 at 07:34

name: sciverse-academic-retrieval
description: Citation-grade academic literature retrieval (search, semantic chunks, byte-range read, figure fetch) over Sciverse, an open scientific platform indexing peer-reviewed and preprint papers.
license: Apache-2.0
metadata: {"skill-author":"OpenDataLab"}

Sciverse Academic Retrieval

Connects to the Sciverse SCP Server via the SCP Hub MCP gateway to perform citation-grade scientific literature retrieval over a corpus that includes peer-reviewed papers (Nature, Cell, …), preprints (arXiv, bioRxiv, …) and other academic sources.

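The server exposes 5 tools designed for RAG by autonomous research agents: structured metadata search (search_papers), natural-language semantic chunk retrieval (semantic_search), byte-range source-text reading (read_content), figure/table image fetching (get_resource), and a schema catalog for building filters (list_catalog). The two search tools return stable doc_id / chunk_id identifiers for reproducible citation.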

Usage

import asyncio
import json
import base64
from mcp.client.streamable_http import streamablehttp_client
from mcp import ClientSession


class SciverseClient:
    """Sciverse SCP Server client (5 academic-retrieval tools).

    All requests are transparently proxied by the SCP Hub to the Sciverse backend.
    Authentication uses the SCP-HUB-API-KEY header (your SCP Platform key).
    """

    def __init__(self, server_url: str, api_key: str):
        self.server_url = server_url
        self.api_key = api_key
        self.session = None

    async def connect(self):
        try:
            self.transport = streamablehttp_client(
                url=self.server_url,
                headers={"SCP-HUB-API-KEY": self.api_key},
            )
            self.read, self.write, self.get_session_id = await self.transport.__aenter__()
            self.session_ctx = ClientSession(self.read, self.write)
            self.session = await self.session_ctx.__aenter__()
            await self.session.initialize()
            return True
        except Exception as e:
            print(f"[sciverse] connect failed: {e}")
            return False

    async def disconnect(self):
        if self.session:
            await self.session_ctx.__aexit__(None, None, None)
        if hasattr(self, "transport"):
            await self.transport.__aexit__(None, None, None)

    def parse_text_result(self, result):
        """Extract concatenated text from a tool result's content blocks.

        Works for: search_papers, semantic_search, read_content, list_catalog.
        Returns: str (the tool's JSON payload as text).
        """
        if isinstance(result, dict):
            content_list = result.get("content") or []
        else:
            content_list = getattr(result, "content", []) or []
        texts = []
        for item in content_list:
            if isinstance(item, dict):
                if item.get("type") == "text":
                    texts.append(item.get("text") or "")
            else:
                if getattr(item, "type", None) == "text":
                    texts.append(getattr(item, "text", "") or "")
        return "".join(texts)

    def parse_image_result(self, result):
        """Extract a figure/table image (used by get_resource).

        Returns: dict with keys 'mime_type' (e.g. 'image/png') and 'bytes'
                 (decoded binary). Returns None if the result is not an image.
        """
        if isinstance(result, dict):
            content_list = result.get("content") or []
        else:
            content_list = getattr(result, "content", []) or []
        for item in content_list:
            data = item.get("data") if isinstance(item, dict) else getattr(item, "data", None)
            mime = item.get("mimeType") if isinstance(item, dict) else getattr(item, "mimeType", None)
            type_ = item.get("type") if isinstance(item, dict) else getattr(item, "type", None)
            if type_ == "image" and data:
                return {"mime_type": mime, "bytes": base64.b64decode(data)}
        return None

Initialize and use

SERVER_URL = "https://scp.intern-ai.org.cn/api/v1/mcp/43/Sciverse"
API_KEY = "<YOUR_SCP_HUB_API_KEY>"


async def main():
    client = SciverseClient(SERVER_URL, API_KEY)
    if not await client.connect():
        print("connect failed")
        return
    try:
        # 1. Structured search: recent transformer papers
        result = await client.session.call_tool(
            "search_papers",
            arguments={
                "query": "transformer attention",   # BM25 over title/abstract/journal
                "year_from": 2023,
                "page_size": 5,
            },
        )
        papers = json.loads(client.parse_text_result(result))
        print(f"search_papers hits: {len(papers.get('hits', []))}")

        # 2. Semantic search: RAG-style chunk retrieval
        result = await client.session.call_tool(
            "semantic_search",
            arguments={"query": "How does transformer attention work?", "top_k": 3},
        )
        chunks = json.loads(client.parse_text_result(result))
        for hit in chunks.get("hits", []):
            print(f"  - {hit['title']} (score={hit['score']:.3f}, doc_id={hit['doc_id']})")

        # 3. Read content: expand context around a known offset
        if chunks.get("hits"):
            first = chunks["hits"][0]
            result = await client.session.call_tool(
                "read_content",
                arguments={"doc_id": first["doc_id"], "offset": first["offset"], "limit": 4096},
            )
            text_window = json.loads(client.parse_text_result(result))
            print(f"read_content next_offset={text_window.get('next_offset')} more={text_window.get('more')}")

        # 4. List catalog: discover available filter fields and operators
        result = await client.session.call_tool(
            "list_catalog", arguments={"include_sample_values": False},
        )
        catalog = json.loads(client.parse_text_result(result))
        print(f"available filter fields: {len(catalog.get('fields', []))}")

        # 5. Get resource: fetch a figure referenced inside read_content's Markdown
        # (Only call after read_content returned a Markdown snippet with ![alt](file_name).)
        # result = await client.session.call_tool(
        #     "get_resource", arguments={"file_name": "figures/fig-3.png"},
        # )
        # image = client.parse_image_result(result)
        # if image:
        #     from pathlib import Path
        #     Path("fig-3.png").write_bytes(image["bytes"])
    finally:
        await client.disconnect()


asyncio.run(main())
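
The example above passes the output of parse_text_result straight to json.loads, which raises on an empty string (for instance when a result carries no text content blocks). The following is a minimal sketch of a more defensive decoder. `decode_payload` is a hypothetical helper, not part of the skill, and it assumes each tool returns a JSON object as text, as the tool sections below describe.

```python
import json


def decode_payload(text: str, tool_name: str = "tool") -> dict:
    """Decode a tool's text payload into a dict, with clearer errors.

    Hypothetical helper: assumes the tool returns a JSON object as text.
    """
    if not text.strip():
        raise ValueError(f"{tool_name}: empty text payload (no text content blocks?)")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        # Keep a short preview so plain-text error messages stay visible.
        raise ValueError(f"{tool_name}: non-JSON payload: {text[:200]!r}") from exc
    if not isinstance(payload, dict):
        raise ValueError(
            f"{tool_name}: expected a JSON object, got {type(payload).__name__}"
        )
    return payload
```

In the example above it would replace the bare json.loads calls, e.g. `papers = decode_payload(client.parse_text_result(result), "search_papers")`.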

Tool: `search_papers`

Structured metadata search by author, journal, year, subject, etc. Use when the user knows specific filter values ("Hinton's papers from 2020-2023", "Nature papers on CRISPR"). Do not use for free-text Q&A — that's semantic_search.

Args:
- query (str, optional) — BM25 keyword over title/abstract/journal
- title_contains (str, optional) — Substring match on title
- abstract_contains (str, optional) — Substring match on abstract
- authors (list[str], optional) — Any of these authors matches
- year_from / year_to (int, optional) — Publication year range (inclusive)
- journals (list[str], optional) — Journal names (any match)
- subjects (list[str], optional) — Subject classification (e.g. "biology")
- sort_by_year (str, default "desc") — desc / asc / none
- page (int, default 1), page_size (int, default 10, max 50)
- filters_advanced (list, optional) — Escape hatch with full operator set (FILTER_OP_EQ, IN, CONTAINS, GTE, LTE, …) for fields not surfaced above

Returns: JSON {hits: [...], total: int} where each hit has doc_id, title, author, abstract, publication_venue_name, publication_published_year.
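
As a sketch of how the arguments above could be assembled, the helper below builds a search_papers arguments dict and checks the documented limits before the call is made. `build_search_args` is a hypothetical name; the `page >= 1` check and the exact author string the index expects ("Hinton") are assumptions, not confirmed behaviour of the server.

```python
from typing import Any, Optional

MAX_PAGE_SIZE = 50                      # documented maximum for page_size
SORT_ORDERS = {"desc", "asc", "none"}   # documented sort_by_year values


def build_search_args(
    query: Optional[str] = None,
    authors: Optional[list[str]] = None,
    journals: Optional[list[str]] = None,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    sort_by_year: str = "desc",
    page: int = 1,
    page_size: int = 10,
) -> dict[str, Any]:
    """Build a search_papers arguments dict, omitting unset filters."""
    if sort_by_year not in SORT_ORDERS:
        raise ValueError(f"sort_by_year must be one of {sorted(SORT_ORDERS)}")
    if page < 1:  # assumption: pages are numbered from 1 (the default)
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    if year_from is not None and year_to is not None and year_from > year_to:
        raise ValueError("year_from must not exceed year_to")

    args: dict[str, Any] = {
        "sort_by_year": sort_by_year,
        "page": page,
        "page_size": page_size,
    }
    optional = {
        "query": query,
        "authors": authors,
        "journals": journals,
        "year_from": year_from,
        "year_to": year_to,
    }
    # Only send filters the caller actually set.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


# "Hinton's papers from 2020-2023", as in the description above.
hinton_args = build_search_args(authors=["Hinton"], year_from=2020, year_to=2023)
```

The resulting dict can be passed as `arguments=` to `client.session.call_tool("search_papers", ...)`.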

Tool: `semantic_search`

Natural-language semantic search returning relevant paper chunks for RAG-style answering. Use for free-text questions ("How does attention work?"). Typical chain: semantic_search → pick chunk → read_content.

Args:
- query (str, required) — Natural-language query, 1-200 words optimal
- top_k (int, default 10, max 30)
- source_types (list[str], optional) — Filter by web / pdf
- mode (str, default "balanced") — fast (~200ms keyword only) / balanced (~600ms hybrid) / quality (~2-4s LLM-rewrite + hybrid)

Returns: JSON {hits: [...]} where each hit has chunk_id, doc_id, chunk (the matched text), score, title, offset (byte offset into the source doc; pass it to read_content for expansion).
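
Several hits can come from the same paper, so the "pick chunk" step of the chain may start by keeping one chunk per document. The sketch below does that using only the hit fields listed above. `best_hit_per_doc` is a hypothetical helper, the sample hits are invented, and it assumes a higher score means a more relevant chunk.

```python
def best_hit_per_doc(hits: list[dict]) -> list[dict]:
    """Keep the highest-scoring chunk for each doc_id, best first.

    Assumption: a higher 'score' means a more relevant chunk.
    """
    best: dict[str, dict] = {}
    for hit in hits:
        current = best.get(hit["doc_id"])
        if current is None or hit["score"] > current["score"]:
            best[hit["doc_id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


# Invented sample hits, shaped like the semantic_search payload.
sample_hits = [
    {"chunk_id": "c1", "doc_id": "d1", "score": 0.71, "offset": 0},
    {"chunk_id": "c2", "doc_id": "d2", "score": 0.64, "offset": 2048},
    {"chunk_id": "c3", "doc_id": "d1", "score": 0.83, "offset": 8192},
]
picked = best_hit_per_doc(sample_hits)
print([hit["chunk_id"] for hit in picked])  # ['c3', 'c2']
```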

Tool: `read_content`

Read a UTF-8 byte range of a paper's source text. Typically called with a doc_id/offset returned by semantic_search to expand context (read more bytes before or after a chunk for fuller answers).

Args:
- doc_id (str, required) — Paper ID from search_papers / semantic_search
- offset (int, default 0) — Byte offset to start reading
- limit (int, default 4096, max 16384) — Bytes to read

Returns: JSON {text: str, bytes_returned: int, next_offset: int, more: bool}. Markdown text may contain figure references like ![alt](file_name); pass file_name to get_resource to fetch the image.
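
The next_offset and more fields suggest a simple pagination loop for reading beyond a single window. The sketch below follows them until the document ends or a byte budget is reached. `read_until` and `fake_read` are hypothetical names; the reader is injected as a function so the loop can be tried against an in-memory stand-in, and `max_bytes` is a soft cap (the last window may overshoot it).

```python
import asyncio
import json
from typing import Awaitable, Callable

# Takes read_content arguments, returns the tool's JSON payload as text.
ReadFn = Callable[[dict], Awaitable[str]]


async def read_until(read: ReadFn, doc_id: str, offset: int = 0,
                     limit: int = 4096, max_bytes: int = 65536) -> str:
    """Follow next_offset while 'more' is true, up to roughly max_bytes."""
    parts, total = [], 0
    while total < max_bytes:
        window = json.loads(
            await read({"doc_id": doc_id, "offset": offset, "limit": limit})
        )
        parts.append(window["text"])
        total += window["bytes_returned"]
        # Stop at end of document, or if the server returned nothing.
        if not window["more"] or window["bytes_returned"] == 0:
            break
        offset = window["next_offset"]
    return "".join(parts)


# In-memory stand-in for read_content (illustrative only, ASCII text).
DOC = b"attention is all you need. " * 10


async def fake_read(arguments: dict) -> str:
    start = arguments["offset"]
    data = DOC[start:start + arguments["limit"]]
    end = start + len(data)
    return json.dumps({
        "text": data.decode("utf-8"),
        "bytes_returned": len(data),
        "next_offset": end,
        "more": end < len(DOC),
    })


full_text = asyncio.run(read_until(fake_read, "demo-doc", limit=100))
```

With the client from the Usage section, `read` would instead wrap `client.session.call_tool("read_content", arguments=...)` followed by `client.parse_text_result`.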

Tool: `get_resource`

Fetch the binary bytes of a paper figure / table image referenced inside read_content's Markdown. Use when the user asks to see / describe a figure and read_content output contains an image reference.

Args:
- file_name (str, required) — Relative path from the Markdown ![alt](file_name). Must not contain .. or start with /.

Returns: Image content block with data (base64) and mimeType (image/*). Multimodal agents (Claude, GPT-4V, Gemini, …) can read it directly.
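
Before calling get_resource, the image references have to be pulled out of the Markdown returned by read_content and checked against the file_name rule above. A minimal sketch follows. `figure_refs` and `is_safe_file_name` are hypothetical helpers, the regular expression only covers the plain `![alt](file_name)` form without titles or spaces in the path, and the sample string is invented.

```python
import re

# Markdown image syntax ![alt](target); the target stops at ")" or whitespace.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def is_safe_file_name(file_name: str) -> bool:
    """Mirror the documented rule: no '..' anywhere and no leading '/'."""
    return bool(file_name) and ".." not in file_name and not file_name.startswith("/")


def figure_refs(markdown: str) -> list[str]:
    """Return unique, safe image targets in order of first appearance."""
    seen: list[str] = []
    for target in IMAGE_REF.findall(markdown):
        if is_safe_file_name(target) and target not in seen:
            seen.append(target)
    return seen


sample = (
    "See ![Fig. 3](figures/fig-3.png) and ![bad](../secret.png), "
    "then ![Fig. 3](figures/fig-3.png) again."
)
refs = figure_refs(sample)
print(refs)  # ['figures/fig-3.png']
```

Each entry in `refs` can then be passed as `file_name` to get_resource and decoded with `parse_image_result`.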

Tool: `list_catalog`

Returns the schema catalog for search_papers: every field name, type, whether it's filterable / sortable / default-returned, human description, and applicable filter operators. Use when constructing precise search_papers filters or when it is unclear which field to filter on.

Args:
- include_sample_values (bool, default false) — If true, also fetch top-20 values for enum-like fields (24h cached, ~100s of ms first call).

Returns: JSON {fields: [...]} where each field has name, type (string/integer/list[string]/…), filterable, sortable, default_return, description, applicable_operators, and optionally sample_values.
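
One way to use the catalog when building filters is to index it by field name first. The sketch below maps every filterable field to its applicable operators. `filterable_operators` is a hypothetical helper and the sample catalog is invented for illustration; only the key names come from the return shape described above.

```python
def filterable_operators(catalog: dict) -> dict[str, list[str]]:
    """Map each filterable field name to its applicable operators."""
    return {
        field["name"]: list(field.get("applicable_operators", []))
        for field in catalog.get("fields", [])
        if field.get("filterable")
    }


# Invented sample, shaped like the list_catalog payload.
sample_catalog = {
    "fields": [
        {"name": "publication_published_year", "type": "integer",
         "filterable": True, "applicable_operators": ["FILTER_OP_EQ"]},
        {"name": "abstract", "type": "string", "filterable": False,
         "applicable_operators": []},
    ]
}
operators = filterable_operators(sample_catalog)
print(operators)  # {'publication_published_year': ['FILTER_OP_EQ']}
```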

Use Cases

- Drug discovery / pharmacology: literature scoping for a target before triggering wet-lab skills; RAG context for ADMET / MoA reasoning.
- Protein science: gather structure/function papers around a UniProt ID before predicting mutations or binding sites.
- Genomics & rare disease: pull recent papers on a variant / phenotype for evidence-grade reasoning, then cite by doc_id.
- Chemistry / materials: find prior art around a SMILES or reaction before computing properties.
- Cross-domain literature review: agentic survey writing that chains semantic_search → read_content to assemble citation-grounded summaries with stable doc_id references for verifiability.
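
For the citation-grounded summaries mentioned in the last use case, retrieved chunks can be rendered as numbered passages that carry their identifiers, so every claim in a generated summary can point back to a doc_id and chunk_id. The sketch below is one possible layout, not a format defined by the skill. `cited_context` is a hypothetical helper and the sample hits are invented.

```python
def cited_context(hits: list[dict], max_chars: int = 500) -> str:
    """Render chunks as numbered passages tagged with doc_id and chunk_id."""
    passages = []
    for number, hit in enumerate(hits, start=1):
        # Collapse whitespace so each snippet stays on one line, then truncate.
        snippet = " ".join(hit["chunk"].split())[:max_chars]
        header = (
            f"[{number}] {hit['title']} "
            f"(doc_id={hit['doc_id']}, chunk_id={hit['chunk_id']})"
        )
        passages.append(f"{header}\n{snippet}")
    return "\n\n".join(passages)


# Invented sample hits, shaped like the semantic_search payload.
context = cited_context([
    {"title": "Paper A", "doc_id": "d1", "chunk_id": "c3",
     "chunk": "Attention weights\n  tokens."},
    {"title": "Paper B", "doc_id": "d2", "chunk_id": "c2",
     "chunk": "Heads attend in parallel."},
])
print(context)
```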
