
animalclap-taxonomy-aware-pretraining

Build taxonomy-aware audio-text pretraining systems for species recognition from animal vocalizations. Train contrastive models that augment text prompts with hierarchical taxonomic structure (scientific/common names, phylogenetic sequences), evaluate on unseen species via rare-species test sets, and predict ecological traits directly from audio.


Overview

Install command

npx skills add https://github.com/ADu2021/skillXiv --skill animalclap-taxonomy-aware-pretraining

Copy this command and paste it into Claude Code to install the skill.

Source

ADu2021/skillXiv

Stars: 3 · Forks: 0

Updated: March 26, 2026, 15:00

SKILL.md


name	animalclap-taxonomy-aware-pretraining
title	AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition
version	0.0.3
engine	skillxiv-v0.0.3-claude-opus-4.6
license	MIT
url	https://arxiv.org/abs/2603.22053
keywords	["Taxonomy-Aware Learning","Audio Classification","Contrastive Learning","Species Recognition","Ecological Traits"]
description	Build taxonomy-aware audio-text pretraining systems for species recognition from animal vocalizations. Train contrastive models that augment text prompts with hierarchical taxonomic structure (scientific/common names, phylogenetic sequences), evaluate on unseen species via rare-species test sets, and predict ecological traits directly from audio.

AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining

Problem Statement

Conventional audio classifiers fail to recognize unseen animal species because they have no notion of the biological relationships between species. Baselines that ignore this structure (e.g., a standard CLAP model) reach only 1.61% top-1 accuracy on unseen species, even though species form a natural hierarchy (6 classes → 66 orders → 341 families → 2,152 genera) that could guide learning.

Core Innovation: Taxonomy-Aware Pretraining

Integrate hierarchical biological structure into contrastive language-audio pretraining by augmenting text representations with taxonomic metadata.

Dataset: 4,225 hours of recordings covering 6,823 species, annotated with 22 ecological traits (e.g., diet, activity pattern, habitat, climate distribution, social behavior).

Text Augmentation Strategy: For each species, generate prompts combining:

Common name (e.g., "American Robin")
Scientific name (e.g., Turdus migratorius)
Taxonomic sequence (e.g., "Aves → Passeriformes → Turdidae → Turdus → migratorius")
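A minimal Python sketch of how the three prompt components can be derived from one species record. The `Taxon` class and its field names are hypothetical; in practice the ranks would come from the annotation workflow in the Deployment Recipe. `build_prompt` fills in the prompt template given in that recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    """Hypothetical per-species record; field names are illustrative."""
    common_name: str
    taxon_class: str  # "class" is a Python keyword, hence the prefix
    order: str
    family: str
    genus: str
    epithet: str  # specific epithet, e.g. "migratorius"

    @property
    def scientific_name(self) -> str:
        # Binomial name: genus plus specific epithet.
        return f"{self.genus} {self.epithet}"

    def taxonomic_sequence(self, sep: str = " → ") -> str:
        # Ranks from class down to species, in hierarchy order.
        ranks = [self.taxon_class, self.order, self.family, self.genus, self.epithet]
        return sep.join(ranks)


def build_prompt(t: Taxon) -> str:
    # Prompt template from the Deployment Recipe.
    return (
        f"A recording of {t.common_name}, scientifically known as "
        f"{t.scientific_name}, belonging to the sequence {t.taxonomic_sequence()}."
    )


robin = Taxon("American Robin", "Aves", "Passeriformes", "Turdidae", "Turdus", "migratorius")
print(build_prompt(robin))
```

Keeping the ranks as separate fields, rather than storing a pre-joined string, makes it easy to build ablation variants of the same prompt.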

Architecture: HTS-AT audio encoder + RoBERTa text encoder with contrastive loss.
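The contrastive loss can be sketched as the symmetric InfoNCE objective used by CLIP. This is a pure-Python illustration, not the AnimalCLAP training code: the HTS-AT and RoBERTa encoders are not shown, embeddings are plain lists standing in for their projected outputs, and the temperature is fixed at 0.07 (CLIP learns it, starting from that value).

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def cross_entropy(logits, target):
    # -log softmax(logits)[target], computed stably via log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def clip_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE; audio_emb[i] and text_emb[i] form the i-th matched pair."""
    a = [l2_normalize(v) for v in audio_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(a)
    # logits[i][j] = cosine(audio_i, text_j) / temperature
    logits = [[sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]
    audio_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    text_to_audio = sum(cross_entropy([logits[j][i] for j in range(n)], i) for i in range(n)) / n
    return (audio_to_text + text_to_audio) / 2


matched = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < mismatched)
```

Each row of the similarity matrix is a classification over the batch's texts (audio-to-text), each column a classification over the batch's clips (text-to-audio); averaging both directions gives the symmetric loss.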

Key Results

Unseen Species Classification: AnimalCLAP achieves 27.6% top-1 accuracy versus 1.61% for the baseline CLAP, roughly a 17× improvement (27.6 / 1.61 ≈ 17.1). This indicates that the hierarchical structure helps the model generalize to novel species.
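Assuming unseen-species classification follows the usual CLIP/CLAP zero-shot protocol (embed one prompt per candidate species and pick the prompt most similar to the audio), top-1 accuracy can be computed as below. The toy embeddings are made up, and `zero_shot_predict` and `top1_accuracy` are illustrative names.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def zero_shot_predict(audio_vec, prompt_vecs):
    """Return the species whose prompt embedding is most similar to the audio embedding."""
    return max(prompt_vecs, key=lambda species: cosine(audio_vec, prompt_vecs[species]))


def top1_accuracy(examples, prompt_vecs):
    """examples: list of (audio_embedding, true_species) pairs."""
    hits = sum(zero_shot_predict(vec, prompt_vecs) == label for vec, label in examples)
    return hits / len(examples)


# Toy 2-D embeddings, made up for illustration.
prompts = {"Turdus migratorius": [1.0, 0.1], "Cardinalis cardinalis": [0.1, 1.0]}
clips = [
    ([0.9, 0.2], "Turdus migratorius"),
    ([0.2, 0.8], "Cardinalis cardinalis"),
    ([0.7, 0.6], "Cardinalis cardinalis"),  # misclassified as the robin
]
print(top1_accuracy(clips, prompts))
```

Because the candidate set is just a dictionary of prompt embeddings, species never seen in training can be added at evaluation time without retraining.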

Taxonomy Ablation: Randomizing the order of the taxonomic sequence reduces accuracy substantially, confirming that the hierarchical ordering (not merely the presence of the names) drives generalization.
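One way to build the shuffled-order control for this ablation is to keep the same names but randomize their order. The per-prompt shuffle with a seeded generator is an assumption; the exact randomization protocol is not specified here.

```python
import random


def shuffled_sequence(ranks, rng, sep=" → "):
    """Ablation control: the same rank names joined in a random order."""
    shuffled = list(ranks)  # copy so the original hierarchy order is preserved
    rng.shuffle(shuffled)
    return sep.join(shuffled)


ranks = ["Aves", "Passeriformes", "Turdidae", "Turdus", "migratorius"]
rng = random.Random(0)  # fixed seed for a reproducible ablation
print("ordered: ", " → ".join(ranks))
print("shuffled:", shuffled_sequence(ranks, rng))
```

Because the names are unchanged, any accuracy drop can be attributed to the ordering rather than to missing information.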

Trait Prediction: The model predicts ecological traits directly from audio:

Activity patterns: 83.7% accuracy
Predator classification: 92.6% accuracy
Broader environmental traits: lower, but still significant, accuracy
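The predictor used for traits is not described here. As a hypothetical stand-in, a nearest-centroid probe on frozen audio embeddings shows how a trait such as activity pattern can be read off the learned representation. The labels and vectors are toy data.

```python
import math
from collections import defaultdict


def fit_centroids(train):
    """train: list of (audio_embedding, trait_label); returns the mean embedding per label."""
    grouped = defaultdict(list)
    for vec, label in train:
        grouped[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def predict_trait(vec, centroids):
    # Assign the label of the nearest centroid (Euclidean distance).
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Toy frozen audio embeddings labelled with an activity-pattern trait.
train = [
    ([1.0, 0.0], "diurnal"), ([0.9, 0.1], "diurnal"),
    ([0.0, 1.0], "nocturnal"), ([0.1, 0.8], "nocturnal"),
]
cents = fit_centroids(train)
print(predict_trait([0.8, 0.3], cents), predict_trait([0.2, 0.7], cents))
```

A probe this simple has almost no capacity of its own, so good trait accuracy mostly reflects information already present in the audio embeddings.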

Test Set Design: The unseen-species test set consists of 300 rare species, each with fewer than 15 recordings. Holding these species out of training prevents data leakage and measures true out-of-distribution generalization.
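A sketch of the rare-species split under stated assumptions: recordings are `(recording_id, species)` pairs, the 300 test species are drawn at random (with a fixed seed) from all species with fewer than 15 recordings, and every recording of a test species is removed from training so those species are genuinely unseen. The function name and random selection are illustrative.

```python
import random
from collections import Counter


def rare_species_split(recordings, max_recordings=15, n_test_species=300, seed=0):
    """recordings: list of (recording_id, species). Returns (train, test) with disjoint species."""
    counts = Counter(species for _, species in recordings)
    rare = sorted(s for s, c in counts.items() if c < max_recordings)
    rng = random.Random(seed)
    test_species = set(rng.sample(rare, min(n_test_species, len(rare))))
    train = [r for r in recordings if r[1] not in test_species]
    test = [r for r in recordings if r[1] in test_species]
    return train, test


# Toy data: species "A" is common, "B" and "C" are rare.
recs = ([(f"a{i}", "A") for i in range(20)]
        + [(f"b{i}", "B") for i in range(3)]
        + [(f"c{i}", "C") for i in range(5)])
train, test = rare_species_split(recs, n_test_species=1)
print(len(train), len(test))
```

Splitting by species rather than by recording is what guarantees zero overlap: no clip of a test species can appear in training.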

Deployment Recipe

Data Collection Criteria: Use only Creative Commons-licensed recordings from iNaturalist and Xeno-canto; verify habitat/temporal diversity.
Annotation Workflow: Obtain species labels, aggregate taxonomic names from open databases (e.g., NCBI Taxonomy), and compute taxonomic sequences programmatically.
Text Prompt Construction: For species S, generate: "A recording of [common_name], scientifically known as [scientific_name], belonging to the sequence [path_from_class_to_species]."
Training: Use the standard CLIP contrastive objective; sample in a balanced way across taxonomic levels to prevent overfitting to well-represented genera and families.
Evaluation: Always construct test sets using rare species (<15 recordings) to measure generalization to unseen taxa. Include both accuracy and trait prediction metrics.
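"Balanced sampling across taxonomic levels" is not defined further. One plausible reading, sketched here as an assumption, is hierarchical sampling: choose a family uniformly, then a genus within it, then a species within that genus, so that large families cannot dominate training batches. `build_tree` and `sample_species` are hypothetical helpers.

```python
import random
from collections import defaultdict


def build_tree(species_to_taxa):
    """{species: (family, genus)} -> {family: {genus: [species, ...]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for species, (family, genus) in species_to_taxa.items():
        tree[family][genus].append(species)
    return tree


def sample_species(tree, rng):
    # Uniform over families, then over genera in that family, then over species in that genus.
    family = rng.choice(sorted(tree))
    genus = rng.choice(sorted(tree[family]))
    return rng.choice(tree[family][genus])


taxa = {
    "Turdus migratorius": ("Turdidae", "Turdus"),
    "Turdus merula": ("Turdidae", "Turdus"),
    "Turdus philomelos": ("Turdidae", "Turdus"),
    "Turdus viscivorus": ("Turdidae", "Turdus"),
    "Cardinalis cardinalis": ("Cardinalidae", "Cardinalis"),
}
tree = build_tree(taxa)
rng = random.Random(0)
draws = [sample_species(tree, rng) for _ in range(10_000)]
print(draws.count("Cardinalis cardinalis") / len(draws))
```

In this toy example Turdidae contributes four species and Cardinalidae one, yet each family is drawn about half the time; uniform sampling over species would draw Cardinalidae only about a fifth of the time.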

Practical Implications

Scaling: The hierarchy enables few-shot learning for data-scarce species, with potential economic benefits for conservation monitoring.
Trait Transfer: Learned audio representations capture ecological properties, enabling downstream tasks (habitat prediction, behavior classification) without additional annotation.
Generalization Principle: Metadata-informed contrastive learning outperforms brute-force scaling in domains with natural hierarchies.

Other Skills in This Repository

Same repository

meaningful-kebab-case-name

ADu2021/skillXiv

Convert arXiv papers into ready-to-use agent skills using category-aware extraction. First classifies the paper into one or more of 11 research categories, then applies a specialized extraction pipeline for each category — because different types of papers produce different types of usable knowledge. A single paper can yield multiple skills if it spans categories. Use this skill whenever the user wants to turn a paper into a skill, extract practical techniques from research, build a skill library from papers, convert arXiv papers into reusable agent instructions, or batch-process multiple papers into skills. Also trigger when someone asks about extracting actionable knowledge from papers, making research practical for LLM agents, or systematically converting academic contributions into structured agent capabilities.

2026-03-26 · 3 stars

action-quantization-behavior-cloning

ADu2021/skillXiv

Establish regret bounds for behavior cloning with discretized actions combining statistical error and quantization error terms. Prove smoothness requirements for safe quantizer design, show that learning-based quantizers fail these requirements, and propose model-based augmentation to reduce error dependence from H² to H.

2026-03-26 · 3 stars

adaptive-lora-personalized-ranks

ADu2021/skillXiv

Dynamically allocate LoRA ranks per-layer during fine-tuning instead of using fixed uniform ranks. Learn optimal rank for each layer and subject via variational framework with discretized exponential distribution, reducing memory footprint while maintaining fidelity and text-alignment.

2026-03-26 · 3 stars

additivellm2-domain-adaptation

ADu2021/skillXiv

Adapt general LLMs to specialized manufacturing domains via domain-adaptive pretraining on open-access journals and visual instruction tuning. Extract 50M tokens and 24K images from peer-reviewed papers, achieve >90% accuracy on domain knowledge tasks, and enable real-time defect identification from manufacturing images.

2026-03-26 · 3 stars

agentic-ai-intelligence-explosion

ADu2021/skillXiv

Future intelligence explosions will be plural, social, and entangled with humanity through distributed collaborative systems rather than singular superintelligence. Intelligence is inherently social, demanding infrastructure matching agent development; integrate governance, institutional frameworks, and constitutional checks across hierarchies of autonomous agents and human-AI centaurs in shifting configurations.

2026-03-26 · 3 stars

bubblerag-evidence-driven-graphs

ADu2021/skillXiv

Address hallucinations in LLM QA over black-box knowledge graphs using evidence-driven retrieval. Formalize Optimal Informative Subgraph Retrieval and employ bubble expansion to discover candidate evidence graphs, achieving state-of-the-art multi-hop QA performance.

2026-03-26 · 3 stars


Useful for (SOC)

Data Scientists · Computer and Mathematical Occupations · 15-2051 · L4
