pysam
Python module for reading, manipulating and writing genomic alignment formats (SAM/BAM/CRAM) and variant files (VCF/BCF). Wrapper for htslib.
| name | pysam |
| description | Python module for reading, manipulating and writing genomic alignment formats (SAM/BAM/CRAM) and variant files (VCF/BCF). Wrapper for htslib. |
| version | 0.22 |
| license | MIT |
pysam is used in high-throughput sequencing pipelines, where it provides efficient access to alignment files that can hold billions of reads aligned to a reference genome.
BAM files must be indexed (.bai) for efficient random access to genomic regions.
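pysam uses 0-based, half-open coordinates (Python-style) for positions and intervals, so fetch("chr1", 10000, 10100) covers 1-based positions 10001 to 10100. Two things remain 1-based: samtools-style region strings such as "chr1:10001-10100", and VariantRecord.pos (VariantRecord.start is its 0-based equivalent).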
Each read contains sequence, quality scores, alignment position, and flags.
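Because both coordinate conventions appear in practice, off-by-one errors are easy to introduce. The following is a minimal sketch of converting a samtools-style region string (1-based, inclusive) into the 0-based, half-open triple that fetch(contig, start, stop) expects. The helper name parse_region is hypothetical and not part of pysam.

```python
from typing import Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Convert 'contig:start-end' (1-based, inclusive) to (contig, start, end)
    in 0-based, half-open coordinates. Hypothetical helper, not a pysam function."""
    # Split on the last colon so contig names that contain ':' still work.
    contig, _, span = region.rpartition(":")
    start_str, _, end_str = span.partition("-")
    start_1based = int(start_str.replace(",", ""))
    end_1based = int(end_str.replace(",", ""))
    # Subtract 1 from the start; the inclusive end becomes the exclusive end unchanged.
    return contig, start_1based - 1, end_1based


contig, start, end = parse_region("chr1:10001-10100")
print(contig, start, end, end - start)  # chr1 10000 10100 100
```

The interval length is simply end - start in the half-open form, which is one reason the convention is convenient for slicing and overlap tests.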
import pysam

# 1. Open the BAM file (region queries need a .bai index next to it)
samfile = pysam.AlignmentFile("aligned_reads.bam", "rb")

# 2. Iterate over reads in a specific genomic region (0-based, half-open)
for read in samfile.fetch("chr1", 10000, 10100):
    print(f"Read: {read.query_name}, Quality: {read.mapping_quality}")
    print(f"Sequence: {read.query_sequence}")
    print(f"Position: {read.reference_start}")

# 3. Variant analysis (VCF/BCF)
# Region queries need an index, so a plain-text .vcf must first be
# bgzip-compressed and tabix-indexed (the file name here is a placeholder).
vcf = pysam.VariantFile("mutations.vcf.gz")
for rec in vcf.fetch("chr1", 10000, 10100):
    print(f"Pos: {rec.pos}, Ref: {rec.ref}, Alt: {rec.alts}")
    print(f"Genotype: {rec.samples['sample1']['GT']}")

# 4. Write reads with mapping quality above 30 to a new BAM file
outfile = pysam.AlignmentFile("output.bam", "wb", template=samfile)
samfile.reset()  # rewind: the region query above moved the file position
for read in samfile:
    if read.mapping_quality > 30:
        outfile.write(read)
outfile.close()
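Besides mapping quality, reads are commonly filtered by their flags. In the SAM format the flags are stored as a single integer bit field (read.flag in pysam), and properties such as read.is_paired and read.is_unmapped test individual bits. The sketch below decodes that integer in plain Python, using the bit values from the SAM specification. The label strings and the helper name decode_flag are illustrative choices, not pysam identifiers.

```python
# Bit values as defined in the SAM specification; labels are illustrative.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qcfail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(99))  # ['paired', 'proper_pair', 'mate_reverse', 'read1']
print(decode_flag(4))   # ['unmapped']
```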
- Index BAM files with pysam.index("file.bam") for fast access.
- Use read.is_paired and read.is_unmapped to filter reads.
- Handle unmapped reads, which can have reference_start = -1.
- Call .close() to avoid resource leaks.
- Use fetch() with regions for efficiency.

# Using a gene annotation file
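genes = {}   # gene_name -> (contig, start, end), 0-based half-open
counts = {}  # gene_name -> number of reads that start inside the gene
for read in samfile.fetch():
    # Check whether the read starts inside any gene
    for gene, (contig, start, end) in genes.items():
        if read.reference_name == contig and start <= read.reference_start < end:
            counts[gene] = counts.get(gene, 0) + 1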
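The loop above counts a read only when its start position lies inside a gene, so a read that begins upstream and extends into the gene is missed. A fuller overlap test compares both ends of the two intervals. The sketch below shows that test on plain tuples, assuming 0-based half-open coordinates such as those given by read.reference_start and read.reference_end. The function name count_overlaps and the sample data are made up for illustration.

```python
from collections import Counter


def count_overlaps(reads, genes):
    """Count reads overlapping each gene.

    reads: iterable of (contig, start, end) tuples, 0-based half-open
    genes: dict of gene_name -> (contig, start, end), 0-based half-open
    """
    counts = Counter()
    for contig, start, end in reads:
        for gene, (g_contig, g_start, g_end) in genes.items():
            # Two half-open intervals overlap when each starts before the other ends.
            if contig == g_contig and start < g_end and g_start < end:
                counts[gene] += 1
    return counts


genes = {"geneA": ("chr1", 10000, 10100)}
reads = [
    ("chr1", 9990, 10040),   # starts upstream, extends into geneA: counted
    ("chr1", 10100, 10150),  # starts exactly at the gene's exclusive end: not counted
    ("chr2", 10000, 10050),  # same coordinates on a different contig: not counted
]
counts = count_overlaps(reads, genes)
print(counts["geneA"])  # 1
```

The nested loop is quadratic in reads times genes. For large annotations, an interval tree or per-gene fetch() calls would scale better.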
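# Filter variants by quality and depth
for rec in vcf.fetch():
    depth = rec.samples['sample1']['DP']  # None when the value is missing
    qual = rec.qual  # None when QUAL is '.'
    if depth is not None and qual is not None and depth > 10 and qual > 20:
        # Process high-quality variant
        pass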
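Thresholds such as mapping_quality > 30 and qual > 20 are easier to choose when read as probabilities. Both MAPQ and the VCF QUAL field are Phred-scaled, Q = -10 * log10(P), where P is the estimated probability that the call is wrong. A short sketch of the conversion follows. The function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred-scaled score."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

A score of 20 corresponds to an estimated 1% chance of error and 30 to 0.1%, so each 10-point increase reduces the error probability tenfold.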
pysam provides the low-level, record-by-record access to alignment and variant files that genomic data processing requires.