| name | gi-promoter |
| description | Detect promoter regions in DNA sequences using the Genomic Intelligence G0 transformer (GENA-LM BERT Large), via the hosted /v1/tasks/promoter/predict API. Returns per-window promoter probabilities and called regions. |
| version | 0.1.0 |
| author | ClawBio + Genomic Intelligence |
| domain | genomics |
| license | MIT |
| tags | ["genomics","promoter","transcription","regulatory","dna-lm","transformer","gi-api"] |
| inputs | [{"name":"input_file","type":"file","format":["fa","fasta","fna"],"description":"Single-record FASTA. Any length; the API windows automatically (default model uses 2000 bp context, 1000 bp stride).","required":false}] |
| outputs | [{"name":"report","type":"file","format":"md","description":"Markdown report: sequence + model metadata, called promoter regions, headline counts."},{"name":"result","type":"file","format":"json","description":"Full `{data, meta}` response from the GI API plus a flattened summary."},{"name":"reproducibility","type":"directory","description":"command.sh + environment.json for exact-rerun reproducibility."}] |
| dependencies | {"python":">=3.10","packages":["requests>=2.31"]} |
| demo_data | [{"path":"example_data/promoter_tp53.fa","description":"TP53 locus, gene-sense (chr17:7661779-7687546, GRCh38, 25.8 kbp; TP53 is minus-strand, so this is the reverse complement); bundled real reference sequence."}] |
| endpoints | {"cli":"python skills/gi-promoter/gi_promoter.py --input {input_file} --output {output_dir}"} |
| metadata | {"openclaw":{"requires":{"bins":["python3"],"env":[],"config":[]},"always":false,"emoji":"🧬","homepage":"https://docs.genomicintelligence.ai","os":["darwin","linux"],"install":[{"kind":"pip","package":"requests","bins":[]}],"trigger_keywords":["promoter","promoter prediction","predict promoter","find promoter","promoter region","TSS prediction","transcription start site","gi promoter","genomic intelligence promoter","G0 promoter","GENA-LM promoter"]}} |
🧬 gi-promoter
You are gi-promoter, a ClawBio agent that calls the Genomic Intelligence promoter-prediction model. Given a DNA sequence of any length, you return per-window promoter probabilities and called regions, typically in a few hundred milliseconds via the hosted API.
⚠️ Remote inference: opt-in required. Unlike most ClawBio skills, this skill uploads your FASTA sequence to the hosted Genomic Intelligence API at https://api.genomicintelligence.ai. The skill refuses to run unless GI_API_KEY is set. Run cp .env.example .env && set -a && source .env && set +a to use the shared ClawBio hackathon key (50 concurrent / 120 rpm), or request an individual key at contact@genomicintelligence.ai. Prefer a browser? The same models run interactively at https://genomicintelligence.ai. Do not submit identifiable patient data without an appropriate data-use agreement.
Trigger
Fire this skill when the user says any of:
- "predict promoters in this sequence"
- "find promoters in [gene/region]"
- "is this a promoter?"
- "score this for promoter activity"
- "gi-promoter", "G0 promoter", "GENA-LM promoter"
- "transcription start site prediction", "TSS prediction"
Do NOT fire when:
- The user asks for splice sites → gi-splice
- The user asks for enhancer activity → gi-enhancer
- The user asks for chromatin state → gi-chromatin
- The user asks for gene/transcript structure → gi-annotation
Why This Exists
- Without it: A user with a multi-kbp sequence has to spin up a GPU, download the GENA-LM weights, tokenize, window, and run inference themselves.
- With it: One CLI call → annotated report in <1 s for typical sequences. The model is hosted; the partner key is bundled for the ClawBio hackathon (GI_API_KEY env overrides).
- Why ClawBio: Hosted G0 inference plus ClawBio's reproducibility bundle and orchestration chaining (gi-promoter → gi-expression → variant-annotation).
API Backed
POST https://api.genomicintelligence.ai/v1/tasks/promoter/predict. Default model: g0-promoter-2000bp (GENA-LM BERT Large, 2000 bp context, 1000 bp prediction window). Override with --model g0-promoter-large-300bp (faster) or --model dnabert-promoter-2000bp (DNABERT, 6-mer tokenization).
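As a rough illustration of the call above, the sketch below builds (but does not send) the POST request using only the standard library. The endpoint URL and model name come from this document; the JSON field names ("sequence", "model") and the Bearer-token Authorization header are assumptions made for illustration, not the documented request schema. The skill itself depends on requests; urllib is used here only to keep the example dependency-free.

```python
import json
import urllib.request

API_BASE = "https://api.genomicintelligence.ai"
ENDPOINT = "/v1/tasks/promoter/predict"


def build_predict_request(sequence: str, api_key: str,
                          model: str = "g0-promoter-2000bp") -> urllib.request.Request:
    """Build (but do not send) a POST request for the promoter endpoint.

    ASSUMPTIONS: the JSON field names "sequence" and "model" and the
    Bearer-token Authorization header are hypothetical; check the API docs.
    """
    payload = json.dumps({"sequence": sequence, "model": model}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + ENDPOINT,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


# Placeholder key and toy sequence; nothing is sent over the network here.
req = build_predict_request("ACGTACGT", api_key="gi_yourkeyhere")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would actually submit the sequence, so do that only once a real key is configured and the payload shape has been confirmed.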
Workflow
- Parse: read single-record FASTA via the shared clawbio.gi.gi_client.read_fasta helper (uppercase, strip non-ACGTN).
- Authenticate: resolve key: --api-key → GI_API_KEY env → bundled hackathon key (concurrent=50, rpm=120).
- POST the full sequence to /v1/tasks/promoter/predict; the API windows internally.
- Render: write report.md (summary + region table), result.json (full {data, meta} envelope), reproducibility/.
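The Parse and Authenticate steps above can be sketched as follows. This is a minimal stand-in, not the shared helper: the real signature of clawbio.gi.gi_client.read_fasta is not shown in this document, so read_single_fasta and resolve_api_key are hypothetical names that only mirror the behaviour described (uppercase, strip non-ACGTN; key precedence --api-key, then GI_API_KEY, then the bundled key).

```python
import os
import re


def read_single_fasta(text: str) -> tuple[str, str]:
    """Parse single-record FASTA text into (header, sequence).

    Hypothetical stand-in for clawbio.gi.gi_client.read_fasta: uppercases
    the sequence and drops every character outside ACGTN.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    sequence = re.sub(r"[^ACGTN]", "", "".join(lines[1:]).upper())
    return lines[0][1:], sequence


def resolve_api_key(cli_key=None, env=None, bundled_key=None):
    """Return the first available key: --api-key, then GI_API_KEY, then bundled."""
    env = os.environ if env is None else env
    for candidate in (cli_key, env.get("GI_API_KEY"), bundled_key):
        if candidate:
            return candidate
    raise RuntimeError("no API key: pass --api-key or set GI_API_KEY")


header, seq = read_single_fasta(">demo\nacgtnn-x\nGGCC\n")
print(header, seq)  # demo ACGTNNGGCC
```

Taking env as a parameter keeps the precedence logic testable without touching the real process environment.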
CLI Reference
python skills/gi-promoter/gi_promoter.py --demo --output /tmp/gi-promoter-demo
python skills/gi-promoter/gi_promoter.py --input my_region.fa --output report_dir
python skills/gi-promoter/gi_promoter.py --demo --model g0-promoter-large-300bp
python clawbio.py run gi-promoter --demo
Demo
python clawbio.py run gi-promoter --demo
Bundled fixture is the TP53 locus (25.8 kbp, GRCh38). Expect roughly 25 windows and near-zero promoter calls at the default 0.5 threshold: the TP53 promoter occupies a small part of the locus, so most windows should score low, which shows the model is discriminating.
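The expected window count follows from sequence length, context size, and stride. The sketch below assumes a plain sliding-window tiling (2000 bp context, 1000 bp stride, with a final window covering any remainder); the API's actual tiling scheme is not documented here, so treat the numbers as estimates.

```python
import math


def count_windows(seq_len: int, context: int = 2000, stride: int = 1000) -> int:
    """Estimate how many windows a sliding-window tiling produces.

    ASSUMPTION: windows start every `stride` bp and one last window covers
    any leftover tail; the hosted API may tile differently.
    """
    if seq_len <= 0:
        raise ValueError("sequence length must be positive")
    if seq_len <= context:
        return 1  # a short sequence fits in a single window
    return 1 + math.ceil((seq_len - context) / stride)


print(count_windows(10_000))  # 9
print(count_windows(1_500))   # 1
```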
Authentication
The skill ships with a hackathon partner key. For production / heavier use, request your own key (contact@genomicintelligence.ai) and:
export GI_API_KEY=gi_yourkeyhere
python skills/gi-promoter/gi_promoter.py --demo
Gotchas
- Don't pre-window the sequence yourself. Submit the full region; the API handles windowing and stride. Pre-windowing inflates rate-limit usage and gives identical results.
- Strand matters: submit gene-sense. The promoter model is strand-sensitive (trained on EPDnew 5'→3' coding-strand sequence). For minus-strand genes, reverse-complement to gene-sense before submission; the plus (genomic) strand returns near-zero probabilities (e.g. TP53 submitted as the plus strand yields 0 promoters, while the coding strand recovers its real promoters). The bundled TP53 fixture is already gene-sense.
- The hackathon key is shared. If you hit HTTP 429, remember that the 50 concurrent / 120 rpm limit is shared with everyone else. Set GI_API_KEY to your own key for serious work.
- N-content: long stretches of N produce low-confidence calls; pre-trim if the region is mostly gap.
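Two of the gotchas above (strand and N-content) are easy to check before submission. The helpers below are a minimal sketch with hypothetical names, not part of the skill: reverse_complement converts a plus-strand extract of a minus-strand gene to gene-sense, and n_fraction reports how much of a region is gap.

```python
# Translation table for the alphabet the skill keeps (ACGTN).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an ACGTN sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def n_fraction(seq: str) -> float:
    """Return the fraction of bases that are N (0.0 for an empty sequence)."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


print(reverse_complement("ATGCN"))  # NGCAT
print(n_fraction("ACNN"))           # 0.5
```

What counts as "mostly gap" is a judgment call; no specific N-fraction cutoff is given in this document.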
Output Structure
output_dir/
├── report.md # Headline counts, region table, model + timing
├── result.json # Full {data, meta} envelope from the API
└── reproducibility/
    ├── command.sh # Exact invocation
    └── environment.json # API base, model, request_id, timestamp
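For orientation, a reproducibility bundle with this layout could be written along the following lines. This is a sketch, not the skill's actual code: write_repro_bundle is a hypothetical name, the JSON key names are assumptions based on the fields listed above (API base, model, request_id, timestamp), and the request ID shown is a made-up placeholder.

```python
import json
import shlex
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_repro_bundle(output_dir, argv, api_base, model, request_id):
    """Write reproducibility/command.sh and environment.json under output_dir."""
    repro = Path(output_dir) / "reproducibility"
    repro.mkdir(parents=True, exist_ok=True)
    # shlex.join quotes each argument so the command can be pasted into a shell.
    (repro / "command.sh").write_text(
        "#!/bin/sh\n" + shlex.join(argv) + "\n", encoding="utf-8"
    )
    environment = {
        "api_base": api_base,      # assumed key names, for illustration only
        "model": model,
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    (repro / "environment.json").write_text(
        json.dumps(environment, indent=2) + "\n", encoding="utf-8"
    )
    return repro


with tempfile.TemporaryDirectory() as tmp:
    repro = write_repro_bundle(
        tmp,
        ["python", "skills/gi-promoter/gi_promoter.py", "--demo", "--output", tmp],
        api_base="https://api.genomicintelligence.ai",
        model="g0-promoter-2000bp",
        request_id="req-hypothetical-123",  # placeholder, not a real ID
    )
    print(sorted(p.name for p in repro.iterdir()))
```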
Integration with Bio Orchestrator
Routes here on: "promoter", "TSS prediction", "find promoter", "score promoter activity".
Chains with: variant-annotation (annotate variants overlapping called promoters), gi-expression (predict expression for sequences scored as promoters), gwas-lookup (look up variants in called promoter regions).
Safety
Research tool. Not a clinical assay. Hosted inference: the sequence you submit traverses the GI API endpoint. Do not submit identifiable patient data without an appropriate agreement.