Run any Skill in Manus with one click

$pwd:

bio-sequence-slicing

Name: Bio Sequence Slicing
Author: GPTomics

// Slice, extract, and concatenate biological sequences using Biopython. Use when extracting subsequences, joining sequences, or manipulating sequence regions by position.

Run Skill in Manus

$ git log --oneline --stat

stars:583

forks:108

updated:February 13, 2026 at 03:01

File Explorer

5 files

SKILL.md

readonly

package.json

"author": "GPTomics"

"repository": "GPTomics/bioSkills"

View GitHub Repository

$ install --globalskills.sh

$ download --local

Run Skill in Manus

[HINT] Download the complete skill directory including SKILL.md and all related files

Run any Skill with one click

name	bio-sequence-slicing
description	Slice, extract, and concatenate biological sequences using Biopython. Use when extracting subsequences, joining sequences, or manipulating sequence regions by position.
tool_type	python
primary_tool	Bio.Seq

Version Compatibility

Reference examples tested with: BioPython 1.83+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Sequence Slicing

Extract, slice, and concatenate sequences using Biopython's Seq objects.

"Extract a subsequence" → Use Python slicing on Seq objects with 0-based half-open coordinates.

Python: seq[start:end] (BioPython Seq)

"Join exons into mRNA" → Extract multiple regions and concatenate them.

Python: sum((seq[s:e] for s, e in coords), Seq('')) or + operator

Required Import

from Bio.Seq import Seq

Core Operations

Indexing (Single Position)

seq = Seq('ATGCGATCG')
seq[0]      # 'A' - first base (0-indexed)
seq[-1]     # 'G' - last base
seq[3]      # 'C' - fourth base

Slicing (Extract Region)

seq = Seq('ATGCGATCGATCG')
seq[0:3]     # Seq('ATG') - first 3 bases
seq[3:6]     # Seq('CGA') - positions 3-5
seq[:5]      # Seq('ATGCG') - first 5
seq[-5:]     # Seq('GATCG') - last 5
seq[::2]     # Seq('AGGTGTG') - every 2nd base
seq[::-1]    # Seq('GCTAGCTAGCGTA') - reversed

Note: Slicing returns a Seq object, not a string.

Concatenation

seq1 = Seq('ATGC')
seq2 = Seq('GGGG')
combined = seq1 + seq2  # Seq('ATGCGGGG')

Can also concatenate with strings:

seq = Seq('ATGC')
extended = seq + 'NNNN'  # Seq('ATGCNNNN')

Code Patterns

Extract CDS by Coordinates

genome = Seq('NNNNATGCGATCGATCGTAANNN')
cds_start, cds_end = 4, 21
cds = genome[cds_start:cds_end]

Extract with 1-Based Coordinates

Biology often uses 1-based coordinates. Convert to 0-based:

def extract_1based(seq, start, end):
    '''Extract using 1-based inclusive coordinates'''
    return seq[start - 1:end]

genome = Seq('ATGCGATCGATCG')
region = extract_1based(genome, 1, 3)  # Seq('ATG')

Split Sequence into Codons

def split_codons(seq):
    return [seq[i:i+3] for i in range(0, len(seq) - len(seq) % 3, 3)]

seq = Seq('ATGCGATCGATCG')
codons = split_codons(seq)  # [Seq('ATG'), Seq('CGA'), ...]

Split into Fixed-Length Chunks

def chunk_sequence(seq, size):
    return [seq[i:i+size] for i in range(0, len(seq), size)]

seq = Seq('ATGCGATCGATCGATCGATCG')
chunks = chunk_sequence(seq, 10)

Join Sequences with Linker

seqs = [Seq('ATGC'), Seq('GGGG'), Seq('TTTT')]
linker = Seq('NNN')
joined = linker.join(seqs)  # Seq('ATGCNNNGGGGNNTTTT')

Or manually:

linker = 'NNN'
joined = Seq(linker.join(str(s) for s in seqs))

Extract Multiple Regions

Goal: Splice non-contiguous regions (e.g., exons) into a single continuous sequence.

Approach: Extract each region by coordinates and concatenate with + or sum().

def extract_regions(seq, regions):
    '''Extract and concatenate multiple regions'''
    return sum((seq[start:end] for start, end in regions), Seq(''))

exon_coords = [(0, 50), (100, 150), (200, 250)]
mrna = extract_regions(genomic_seq, exon_coords)

Extract Flanking Regions

def get_flanking(seq, position, flank_size):
    '''Get sequence around a position'''
    start = max(0, position - flank_size)
    end = min(len(seq), position + flank_size + 1)
    return seq[start:end]

seq = Seq('ATGCGATCGATCGATCGATCG')
flanking = get_flanking(seq, 10, 5)  # 5 bp on each side of position 10

Tile Sequence into Overlapping Windows

def sliding_windows(seq, window_size, step=1):
    for i in range(0, len(seq) - window_size + 1, step):
        yield seq[i:i + window_size]

seq = Seq('ATGCGATCGATCG')
for window in sliding_windows(seq, 5, 2):
    print(window)

Extract Feature from SeqRecord

from Bio import SeqIO

for record in SeqIO.parse('sequence.gb', 'genbank'):
    for feature in record.features:
        if feature.type == 'CDS':
            cds_seq = feature.extract(record.seq)
            print(f'{feature.qualifiers.get("gene", ["?"])[0]}: {cds_seq[:30]}...')

Create New SeqRecord from Slice

from Bio.SeqRecord import SeqRecord

original = SeqRecord(Seq('ATGCGATCGATCGATCG'), id='full', description='Full sequence')
subset = SeqRecord(original.seq[5:15], id='subset', description=f'Positions 5-15 of {original.id}')

Coordinate Systems

System	Position 1	Example
0-based (Python)	Index 0	`seq[0:3]` gets positions 0, 1, 2
1-based (Biology)	Index 1	Position 1-3 = `seq[0:3]`
0-based half-open	Start inclusive, end exclusive	Standard Python slicing

Common Errors

Error	Cause	Solution
`IndexError`	Index out of range	Check sequence length first
Unexpected length	Off-by-one error	Remember end index is exclusive
Empty result	Start >= end	Check coordinate order
Wrong positions	1-based vs 0-based confusion	Convert coordinates explicitly

Decision Tree

Need to extract or combine sequences?
├── Single position?
│   └── Use indexing: seq[i]
├── Contiguous region?
│   └── Use slicing: seq[start:end]
├── Multiple non-contiguous regions?
│   └── Extract each, concatenate with +
├── Join sequences?
│   ├── No linker: seq1 + seq2
│   └── With linker: linker.join(seqs)
├── Split into parts?
│   └── List comprehension with slicing
└── From GenBank features?
    └── Use feature.extract(record.seq)

Related Skills

seq-objects - Create Seq and SeqRecord objects
sequence-io/read-sequences - Parse GenBank files with features to extract
transcription-translation - Translate extracted CDS regions
alignment-files - Extract sequences from BAM using samtools fasta/fastq

name	bio-sequence-slicing
description	Slice, extract, and concatenate biological sequences using Biopython. Use when extracting subsequences, joining sequences, or manipulating sequence regions by position.
tool_type	python
primary_tool	Bio.Seq