pysam

Updated: February 1, 2026, 04:41

Python module for reading, manipulating and writing genomic alignment formats (SAM/BAM/CRAM) and variant files (VCF/BCF). Wrapper for htslib.

Source

tondevrel

tondevrel/scientific-agent-skills

Pysam - Genomic Alignments

Pysam is used in high-throughput sequencing pipelines. It provides efficient access to the billions of DNA fragments (reads) that such experiments align to a reference genome.

When to Use

Processing next-generation sequencing (NGS) data.
Analyzing genomic variants (SNPs, indels).
Extracting reads from specific genomic regions.
Building custom bioinformatics pipelines.
Quality control of sequencing data.

Core Principles

Indexed Access

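A BAM file must be coordinate-sorted and indexed (a .bai file alongside it) before fetch() can jump straight to a genomic region. Without an index, the only option is a sequential scan of the whole file.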

Coordinate System

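Pysam follows Python conventions: coordinates are 0-based and intervals are half-open, so fetch("chr1", 10000, 10100) covers 0-based positions 10000 through 10099. Two common exceptions are 1-based: samtools-style region strings such as "chr1:10001-10100", and the pos attribute of VCF records (rec.start holds the 0-based equivalent).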
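A minimal sketch of converting between the two conventions: 0-based, half-open intervals (as passed to fetch() as separate arguments) and 1-based, inclusive region strings of the form contig:start-end. The helper names to_region_string and from_region_string are hypothetical and not part of pysam, and the parser assumes plain digits without thousands separators.

```python
import re


def to_region_string(contig, start, stop):
    """Convert a 0-based, half-open interval to a 1-based, inclusive region string."""
    if not 0 <= start < stop:
        raise ValueError("expected 0 <= start < stop")
    # Only the start shifts by one; the half-open stop equals the inclusive end.
    return f"{contig}:{start + 1}-{stop}"


def from_region_string(region):
    """Parse 'contig:start-end' (1-based, inclusive) into a 0-based, half-open tuple."""
    match = re.fullmatch(r"(.+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"not a contig:start-end region: {region!r}")
    contig, first, last = match.group(1), int(match.group(2)), int(match.group(3))
    return contig, first - 1, last


region = to_region_string("chr1", 10000, 10100)
print(region)                      # chr1:10001-10100
print(from_region_string(region))  # ('chr1', 10000, 10100)
```

Both forms describe the same 100 bases, which is why mixing them silently shifts results by one position.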

Read Attributes

Each read contains sequence, quality scores, alignment position, and flags.

Quick Reference

Standard Imports

import pysam

Basic Patterns

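# 1. Open a BAM file (fetching by region needs a coordinate-sorted, indexed BAM)
samfile = pysam.AlignmentFile("aligned_reads.bam", "rb")

# 2. Iterate over reads overlapping a region (0-based, half-open interval)
for read in samfile.fetch("chr1", 10000, 10100):
    print(f"Read: {read.query_name}, Mapping quality: {read.mapping_quality}")
    print(f"Sequence: {read.query_sequence}")
    print(f"Position: {read.reference_start}")
samfile.close()

# 3. Variant analysis (region queries need a bgzip-compressed, tabix-indexed VCF
#    or an indexed BCF; a plain-text .vcf cannot be fetched by region)
vcf = pysam.VariantFile("mutations.vcf.gz")
for rec in vcf.fetch("chr1", 10000, 10100):
    print(f"Pos: {rec.pos}, Ref: {rec.ref}, Alt: {rec.alts}")  # rec.pos is 1-based
    # 'sample1' must match a sample name in the VCF header
    print(f"Genotype: {rec.samples['sample1']['GT']}")
vcf.close()

# 4. Write reads with mapping quality above 30 to a new BAM.
#    A full pass is intended here; until_eof=True also visits unmapped reads.
with pysam.AlignmentFile("aligned_reads.bam", "rb") as infile, \
        pysam.AlignmentFile("output.bam", "wb", template=infile) as outfile:
    for read in infile.fetch(until_eof=True):
        if read.mapping_quality > 30:
            outfile.write(read)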

Critical Rules

✅ DO

Always use indexed files - Create the index with pysam.index("file.bam") for fast access; the BAM must be coordinate-sorted first.
Check read flags - Use read.is_paired, read.is_unmapped to filter reads.
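Handle unmapped reads - Test read.is_unmapped rather than the position. An unmapped read with no placement has reference_start = -1, but an unmapped read with a mapped mate is usually stored at the mate's position.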
Close files explicitly - Use context managers or .close() to avoid resource leaks.
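The flag checks above can be combined into one predicate. This is a minimal sketch: keep_read is a hypothetical helper, the default threshold of 30 is an assumption, and the SimpleNamespace objects are stand-ins so the example runs without a BAM file. The attribute names match those on pysam's AlignedSegment.

```python
from types import SimpleNamespace


def keep_read(read, min_mapq=30):
    """Return True for mapped, primary, non-duplicate reads that pass QC and the MAPQ cutoff."""
    if read.is_unmapped or read.is_secondary or read.is_supplementary:
        return False
    if read.is_duplicate or read.is_qcfail:
        return False
    return read.mapping_quality >= min_mapq


def fake_read(**overrides):
    """Build a stand-in object with the attributes keep_read() inspects (demo only)."""
    fields = dict(is_unmapped=False, is_secondary=False, is_supplementary=False,
                  is_duplicate=False, is_qcfail=False, mapping_quality=60)
    fields.update(overrides)
    return SimpleNamespace(**fields)


reads = [
    fake_read(),                     # clean primary alignment
    fake_read(is_unmapped=True),     # unmapped
    fake_read(is_duplicate=True),    # marked duplicate
    fake_read(mapping_quality=5),    # ambiguous placement
]
kept = [keep_read(r) for r in reads]
print(kept)  # [True, False, False, False]
```

With a real file the same function can be applied as `(r for r in samfile.fetch(contig, start, stop) if keep_read(r))`.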

❌ DON'T

Don't iterate over an entire BAM when you only need part of it - Use fetch() with a region so only the relevant reads are decoded.
Don't ignore quality scores - Low-quality bases can cause false variants.
Don't mix coordinate systems - Be consistent with 0-based vs 1-based indexing.
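One way to act on the base-quality rule is to mask low-quality bases before calling variants. The sketch below is an illustration, not a pysam function: mask_low_quality and error_probability are hypothetical helpers, and the cutoff of Phred 20 is an assumed threshold. It takes the kind of values found in read.query_sequence and read.query_qualities (a sequence of integer Phred scores, which can be None when a file stores no qualities).

```python
def error_probability(phred):
    """Convert a Phred score Q to the error probability 10 ** (-Q / 10)."""
    return 10 ** (-phred / 10)


def mask_low_quality(sequence, qualities, min_quality=20):
    """Replace bases whose Phred score is below min_quality with 'N'."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    return "".join(
        base if quality >= min_quality else "N"
        for base, quality in zip(sequence, qualities)
    )


# Demo values standing in for read.query_sequence and read.query_qualities.
masked = mask_low_quality("ACGTAC", [35, 12, 30, 8, 40, 20])
print(masked)  # ANGNAC
```

A Phred score of 20 corresponds to an expected error rate of about 1 in 100, and 30 to about 1 in 1000, which is why cutoffs in that range are common.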

Advanced Patterns

Counting Reads per Gene

# Count reads overlapping each gene.
# Assumes samfile is an open, indexed pysam.AlignmentFile.
# Fill genes from an annotation file: gene_name -> (contig, start, end), 0-based half-open
genes = {}
counts = {}
for gene, (contig, start, end) in genes.items():
    # count() uses the index instead of testing every read against every gene
    counts[gene] = samfile.count(contig, start, end)

Variant Filtering

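# Filter variants by quality and depth.
# Assumes vcf is an open pysam.VariantFile with a sample named 'sample1'.
for rec in vcf.fetch():
    depth = rec.samples['sample1']['DP']  # None when DP is missing for this sample
    qual = rec.qual                       # None when QUAL is '.'
    if depth is not None and qual is not None and depth > 10 and qual > 20:
        # Process high-quality variant
        pass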

Pysam provides the low-level, record-by-record access to alignment and variant files that custom genomic data processing needs.

name	pysam
description	Python module for reading, manipulating and writing genomic alignment formats (SAM/BAM/CRAM) and variant files (VCF/BCF). Wrapper for htslib.
version	0.22
license	MIT
