with one click
bio-sequence-slicing
// Slice, extract, and concatenate biological sequences using Biopython. Use when extracting subsequences, joining sequences, or manipulating sequence regions by position.
// Slice, extract, and concatenate biological sequences using Biopython. Use when extracting subsequences, joining sequences, or manipulating sequence regions by position.
[HINT] Download the complete skill directory including SKILL.md and all related files
| name | bio-sequence-slicing |
| description | Slice, extract, and concatenate biological sequences using Biopython. Use when extracting subsequences, joining sequences, or manipulating sequence regions by position. |
| tool_type | python |
| primary_tool | Bio.Seq |
Reference examples tested with: BioPython 1.83+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signaturesIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Extract, slice, and concatenate sequences using Biopython's Seq objects.
"Extract a subsequence" ā Use Python slicing on Seq objects with 0-based half-open coordinates.
seq[start:end] (BioPython Seq)"Join exons into mRNA" ā Extract multiple regions and concatenate them.
sum((seq[s:e] for s, e in coords), Seq('')) or + operatorfrom Bio.Seq import Seq
seq = Seq('ATGCGATCG')
seq[0] # 'A' - first base (0-indexed)
seq[-1] # 'G' - last base
seq[3] # 'C' - fourth base
seq = Seq('ATGCGATCGATCG')
seq[0:3] # Seq('ATG') - first 3 bases
seq[3:6] # Seq('CGA') - positions 3-5
seq[:5] # Seq('ATGCG') - first 5
seq[-5:] # Seq('GATCG') - last 5
seq[::2] # Seq('AGGTGTG') - every 2nd base
seq[::-1] # Seq('GCTAGCTAGCGTA') - reversed
Note: Slicing returns a Seq object, not a string.
seq1 = Seq('ATGC')
seq2 = Seq('GGGG')
combined = seq1 + seq2 # Seq('ATGCGGGG')
Can also concatenate with strings:
seq = Seq('ATGC')
extended = seq + 'NNNN' # Seq('ATGCNNNN')
genome = Seq('NNNNATGCGATCGATCGTAANNN')
cds_start, cds_end = 4, 21
cds = genome[cds_start:cds_end]
Biology often uses 1-based coordinates. Convert to 0-based:
def extract_1based(seq, start, end):
'''Extract using 1-based inclusive coordinates'''
return seq[start - 1:end]
genome = Seq('ATGCGATCGATCG')
region = extract_1based(genome, 1, 3) # Seq('ATG')
def split_codons(seq):
return [seq[i:i+3] for i in range(0, len(seq) - len(seq) % 3, 3)]
seq = Seq('ATGCGATCGATCG')
codons = split_codons(seq) # [Seq('ATG'), Seq('CGA'), ...]
def chunk_sequence(seq, size):
return [seq[i:i+size] for i in range(0, len(seq), size)]
seq = Seq('ATGCGATCGATCGATCGATCG')
chunks = chunk_sequence(seq, 10)
seqs = [Seq('ATGC'), Seq('GGGG'), Seq('TTTT')]
linker = Seq('NNN')
joined = linker.join(seqs) # Seq('ATGCNNNGGGGNNTTTT')
Or manually:
linker = 'NNN'
joined = Seq(linker.join(str(s) for s in seqs))
Goal: Splice non-contiguous regions (e.g., exons) into a single continuous sequence.
Approach: Extract each region by coordinates and concatenate with + or sum().
def extract_regions(seq, regions):
'''Extract and concatenate multiple regions'''
return sum((seq[start:end] for start, end in regions), Seq(''))
exon_coords = [(0, 50), (100, 150), (200, 250)]
mrna = extract_regions(genomic_seq, exon_coords)
def get_flanking(seq, position, flank_size):
'''Get sequence around a position'''
start = max(0, position - flank_size)
end = min(len(seq), position + flank_size + 1)
return seq[start:end]
seq = Seq('ATGCGATCGATCGATCGATCG')
flanking = get_flanking(seq, 10, 5) # 5 bp on each side of position 10
def sliding_windows(seq, window_size, step=1):
for i in range(0, len(seq) - window_size + 1, step):
yield seq[i:i + window_size]
seq = Seq('ATGCGATCGATCG')
for window in sliding_windows(seq, 5, 2):
print(window)
from Bio import SeqIO
for record in SeqIO.parse('sequence.gb', 'genbank'):
for feature in record.features:
if feature.type == 'CDS':
cds_seq = feature.extract(record.seq)
print(f'{feature.qualifiers.get("gene", ["?"])[0]}: {cds_seq[:30]}...')
from Bio.SeqRecord import SeqRecord
original = SeqRecord(Seq('ATGCGATCGATCGATCG'), id='full', description='Full sequence')
subset = SeqRecord(original.seq[5:15], id='subset', description=f'Positions 5-15 of {original.id}')
| System | Position 1 | Example |
|---|---|---|
| 0-based (Python) | Index 0 | seq[0:3] gets positions 0, 1, 2 |
| 1-based (Biology) | Index 1 | Position 1-3 = seq[0:3] |
| 0-based half-open | Start inclusive, end exclusive | Standard Python slicing |
| Error | Cause | Solution |
|---|---|---|
IndexError | Index out of range | Check sequence length first |
| Unexpected length | Off-by-one error | Remember end index is exclusive |
| Empty result | Start >= end | Check coordinate order |
| Wrong positions | 1-based vs 0-based confusion | Convert coordinates explicitly |
Need to extract or combine sequences?
āāā Single position?
ā āāā Use indexing: seq[i]
āāā Contiguous region?
ā āāā Use slicing: seq[start:end]
āāā Multiple non-contiguous regions?
ā āāā Extract each, concatenate with +
āāā Join sequences?
ā āāā No linker: seq1 + seq2
ā āāā With linker: linker.join(seqs)
āāā Split into parts?
ā āāā List comprehension with slicing
āāā From GenBank features?
āāā Use feature.extract(record.seq)