Ejecuta cualquier Skill en Manus
con un clic

Ejecuta cualquier Skill en Manus con un clic

Comenzar

scientific-active-learning

アクティブラーニング (能動学習) スキル。不確実性サンプリング・ Query-by-Committee・期待モデル変化・プール型/ストリーム型・バッチアクティブラーニング・停止基準判定・モデル改善パイプライン。

Ejecutar en Manus

Resumen

Comando de instalación

npx skills add https://github.com/nahisaho/satori --skill scientific-active-learning

Copia y pega este comando en Claude Code para instalar la habilidad

Fuente

nahisaho/satori

Estrellas3

Forks1

Actualizado15 de febrero de 2026, 02:08

SKILL.md

readonly

Más de este repositorio

mismo repositorio

scientific-audit-report

nahisaho/satori

実験の監査レポート・データ来歴（プロベナンス）生成スキル。データ変換履歴・使用ツールのバージョン・データ整合性チェックを含むトレーサビリティレポートを自動生成する。「監査レポート作成」「データ来歴を記録」「トレーサビリティ」で発火。

2026-03-203

scientific-experiment-fork

nahisaho/satori

派生実験設計スキル。既存の実験をベースに条件を変更した派生実験を設計する。実験計画法（DOE）に基づくパラメータ探索を支援。「派生実験を設計して」「条件を変えて実験」「パラメータ探索」で発火。

2026-03-203

scientific-experiment-template

nahisaho/satori

実験テンプレート生成スキル。研究目的・仮説・手法・実験条件・評価基準・スケジュールを構造化した実験計画書を自動作成する。「実験テンプレート作成して」「実験計画を立てて」「実験プロトコルを作成」で発火。

2026-03-203

scientific-latex-export

nahisaho/satori

実験結果を論文形式（LaTeX / IMRaD）にエクスポートするスキル。 Introduction・Materials & Methods・Results・Discussion の構造で出版準備用の原稿を自動生成する。「論文にして」「LaTeX出力」「出版準備」で発火。

2026-03-203

scientific-peer-review

nahisaho/satori

実験結果の査読・レビュースキル。再現性・統計的妥当性・方法論の健全性を体系的に評価し、構造化されたレビューレポートを生成する。「レビューして」「査読して」「実験結果を評価して」で発火。

2026-03-203

scientific-academic-writing

nahisaho/satori

科学技術・学術論文の執筆スキル。IMRaD 標準、Nature/Science 系、ACS 系、IEEE 系、 Elsevier 系のジャーナル形式に対応した論文構成・セクション設計・文章パターンを提供。「論文を書いて」「Abstract を作成して」「Methods セクションを書いて」で発火。 assets/ に主要ジャーナル形式の Markdown テンプレートを同梱。

2026-02-153

Fuente

nahisaho

nahisaho/satori

Abrir repositorio de GitHub Ver repositorios del creador

Comando de instalación

Descarga

Ejecutar en Manus

Útil paraSOC

Científicos de datosOcupaciones informáticas y matemáticas15-2051L4

name	scientific-active-learning
description	アクティブラーニング (能動学習) スキル。不確実性サンプリング・ Query-by-Committee・期待モデル変化・プール型/ストリーム型・バッチアクティブラーニング・停止基準判定・モデル改善パイプライン。
tu_tools	[{"key":"openml","name":"OpenML","description":"能動学習データセット・評価指標"}]

Scientific Active Learning

ラベル付けコストを最小化しながらモデル精度を最大化するアクティブラーニング戦略の設計・実行・評価パイプラインを提供する。

When to Use

少量のラベル付きデータで効率的にモデルを改善するとき
ラベル付けコストが高い実験データを扱うとき
不確実性サンプリングでモデルの弱点を特定するとき
Query-by-Committee で意見が分かれるサンプルを選択するとき
バッチアクティブラーニングで複数サンプルを同時取得するとき
停止基準 (パフォーマンス収束) を判定するとき

Quick Start

1. 不確実性サンプリング

import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def uncertainty_sampling(model, X_pool, strategy="entropy"):
    """
    不確実性ベース サンプリング。

    Parameters:
        model: fitted sklearn classifier
        X_pool: np.ndarray — ラベルなしプール
        strategy: str — "entropy" / "margin" / "least_confident"
    Returns:
        indices: np.ndarray — 不確実性降順のインデックス
    """
    proba = model.predict_proba(X_pool)

    if strategy == "entropy":
        scores = -np.sum(proba * np.log(proba + 1e-10), axis=1)
    elif strategy == "margin":
        sorted_p = np.sort(proba, axis=1)
        scores = 1.0 - (sorted_p[:, -1] - sorted_p[:, -2])
    elif strategy == "least_confident":
        scores = 1.0 - np.max(proba, axis=1)
    else:
        raise ValueError(f"Unknown strategy: {strategy}")

    indices = np.argsort(scores)[::-1]
    print(f"Uncertainty ({strategy}): top score = {scores[indices[0]]:.4f}")
    return indices


def query_by_committee(models, X_pool, n_members=5):
    """
    Query-by-Committee サンプリング。

    Parameters:
        models: list — 学習済み分類器リスト
        X_pool: np.ndarray — ラベルなしプール
        n_members: int — 委員会メンバ数
    """
    predictions = np.array([m.predict(X_pool) for m in models[:n_members]])
    # Vote entropy
    n_samples = X_pool.shape[0]
    n_classes = len(np.unique(predictions))
    scores = np.zeros(n_samples)

    for i in range(n_samples):
        votes = predictions[:, i]
        _, counts = np.unique(votes, return_counts=True)
        proba = counts / len(votes)
        scores[i] = -np.sum(proba * np.log(proba + 1e-10))

    indices = np.argsort(scores)[::-1]
    print(f"QBC: top disagreement = {scores[indices[0]]:.4f}")
    return indices

2. バッチアクティブラーニング

def batch_active_learning(model, X_pool, batch_size=10,
                          strategy="entropy", diversity_weight=0.5):
    """
    多様性を考慮したバッチアクティブラーニング。

    Parameters:
        model: fitted classifier
        X_pool: np.ndarray — ラベルなしプール
        batch_size: int — バッチサイズ
        strategy: str — 不確実性戦略
        diversity_weight: float — 多様性重み (0-1)
    """
    from sklearn.metrics.pairwise import euclidean_distances

    # 不確実性スコア
    indices = uncertainty_sampling(model, X_pool, strategy)

    # 候補プール (上位 batch_size * 3)
    candidate_size = min(batch_size * 3, len(indices))
    candidates = indices[:candidate_size]

    # 多様性ベース選択 (k-center greedy)
    selected = [candidates[0]]
    for _ in range(batch_size - 1):
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break

        dists = euclidean_distances(
            X_pool[remaining], X_pool[selected])
        min_dists = dists.min(axis=1)

        # 不確実性ランクを正規化
        uncertainty_ranks = np.array([
            np.where(indices == r)[0][0] for r in remaining])
        uncertainty_scores = 1.0 - uncertainty_ranks / len(indices)

        # 複合スコア
        combined = (diversity_weight * min_dists / (min_dists.max() + 1e-10)
                    + (1 - diversity_weight) * uncertainty_scores)

        best_idx = remaining[np.argmax(combined)]
        selected.append(best_idx)

    print(f"Batch AL: selected {len(selected)} diverse-uncertain samples")
    return np.array(selected)

3. アクティブラーニングループ

def active_learning_loop(X_labeled, y_labeled, X_pool, y_pool_true,
                         X_test, y_test,
                         model=None, n_rounds=20, batch_size=10,
                         strategy="entropy"):
    """
    アクティブラーニング実験ループ。

    Parameters:
        X_labeled: np.ndarray — 初期ラベル付きデータ
        y_labeled: np.ndarray — 初期ラベル
        X_pool: np.ndarray — ラベルなしプール
        y_pool_true: np.ndarray — プールの真ラベル (Oracle)
        X_test: np.ndarray — テストデータ
        y_test: np.ndarray — テストラベル
        model: sklearn classifier (default: RF)
        n_rounds: int — ラウンド数
        batch_size: int — バッチサイズ
        strategy: str — サンプリング戦略
    """
    if model is None:
        model = RandomForestClassifier(n_estimators=100, random_state=42)

    X_l = X_labeled.copy()
    y_l = y_labeled.copy()
    X_p = X_pool.copy()
    y_p = y_pool_true.copy()

    history = []
    for rnd in range(n_rounds):
        m = clone(model).fit(X_l, y_l)
        acc = accuracy_score(y_test, m.predict(X_test))
        history.append({
            "round": rnd,
            "n_labeled": len(y_l),
            "accuracy": round(acc, 4),
            "pool_size": len(y_p),
        })

        if len(X_p) == 0:
            break

        # バッチ選択
        selected = batch_active_learning(
            m, X_p, batch_size, strategy)

        # Oracle に問い合わせ
        X_l = np.vstack([X_l, X_p[selected]])
        y_l = np.concatenate([y_l, y_p[selected]])

        mask = np.ones(len(X_p), dtype=bool)
        mask[selected] = False
        X_p = X_p[mask]
        y_p = y_p[mask]

    df = pd.DataFrame(history)
    improvement = df["accuracy"].iloc[-1] - df["accuracy"].iloc[0]
    print(f"AL loop: {n_rounds} rounds, "
          f"acc {df['accuracy'].iloc[0]:.3f} → {df['accuracy'].iloc[-1]:.3f} "
          f"(+{improvement:.3f})")
    return df


def compare_strategies(X_labeled, y_labeled, X_pool, y_pool_true,
                       X_test, y_test, n_rounds=20, batch_size=10):
    """
    複数アクティブラーニング戦略の比較。

    Parameters: (同上)
    """
    strategies = ["entropy", "margin", "least_confident"]
    results = {}

    for strat in strategies:
        history = active_learning_loop(
            X_labeled, y_labeled, X_pool, y_pool_true,
            X_test, y_test,
            n_rounds=n_rounds, batch_size=batch_size,
            strategy=strat)
        results[strat] = history

    # ランダムベースライン
    np.random.seed(42)
    random_history = active_learning_loop(
        X_labeled, y_labeled,
        X_pool[np.random.permutation(len(X_pool))],
        y_pool_true[np.random.permutation(len(y_pool_true))],
        X_test, y_test,
        n_rounds=n_rounds, batch_size=batch_size,
        strategy="least_confident")
    results["random"] = random_history

    print(f"Strategy comparison: {len(strategies) + 1} methods evaluated")
    return results

4. 停止基準判定

def stopping_criterion(history_df, patience=5, min_improvement=0.001):
    """
    アクティブラーニング停止基準判定。

    Parameters:
        history_df: pd.DataFrame — AL 履歴
        patience: int — 改善なしラウンド数
        min_improvement: float — 最小改善幅
    """
    accs = history_df["accuracy"].values
    if len(accs) < patience + 1:
        return False, "insufficient rounds"

    recent = accs[-patience:]
    best_before = accs[:-patience].max()
    improvement = recent.max() - best_before

    if improvement < min_improvement:
        return True, (f"converged: improvement {improvement:.5f} "
                      f"< threshold {min_improvement}")

    return False, f"continuing: improvement {improvement:.5f}"

パイプライン統合

eda-correlation → active-learning → ml-classification
  (データ探索)     (サンプル選択)     (モデル構築)
       │                │                ↓
  missing-data ─────────┘     ensemble-methods
    (欠損値処理)               (アンサンブル)
                                    ↓
                          uncertainty-quantification
                            (不確実性定量化)

パイプライン出力

ファイル	説明	次スキル
`al_history.csv`	AL ラウンド履歴	→ 停止判定
`selected_samples.csv`	選択サンプル	→ ラベル付け
`strategy_comparison.csv`	戦略比較	→ advanced-visualization

ToolUniverse 連携

TU Key	ツール名	連携内容
`openml`	OpenML	能動学習データセット・評価指標