learning-pipeline

Last updated: 26 February 2026 at 19:53

Documents the 9 learning feedback loops, SpreadBandit Thompson Sampling, adaptive ensemble, confidence tracking, and baseline tracker. Use when debugging learning behavior, tuning reward attribution, investigating model weight decay, or understanding how fills translate into parameter updates.

Source: trudumb/hyper_make

SKILL.md


name	learning-pipeline
description	Documents the 9 learning feedback loops, SpreadBandit Thompson Sampling, adaptive ensemble, confidence tracking, and baseline tracker. Use when debugging learning behavior, tuning reward attribution, investigating model weight decay, or understanding how fills translate into parameter updates.
requires	["measurement-infrastructure"]
user-invocable	false

Learning Pipeline Skill

Architecture Overview

Every fill is a labeled training example. Five core learning components each receive exactly one update per fill event (1:1 reward attribution):

FILL EVENT
  |
  +-> SpreadBandit.update_from_pending(reward)     # Spread selection learning
  +-> AdaptiveEnsemble.update_performance(ir,brier) # Model weight learning
  +-> KappaEstimator.on_own_fill(ts, price)        # Fill intensity learning
  +-> ReconcileOutcomeTracker.record_fill(oid,edge) # Reconcile action learning
  +-> PreFillClassifier.record_outcome(adverse,mag) # Adverse selection learning

The 9 Feedback Loops

| # | Loop | Trigger | Updates | File |
|---|------|---------|---------|------|
| 1 | Kappa from own fills | `on_own_fill()` | Fill intensity (Hawkes) | `estimator/mod.rs` |
| 2 | AS markout queue | Fill → pending outcomes | Pending fill list | `orchestrator/handlers.rs` |
| 3 | AS outcome feedback | Markout resolved | `record_outcome()` on classifier | `orchestrator/handlers.rs` |
| 4 | Calibration progress | Periodic | `calibration_controller` update | `orchestrator/handlers.rs` |
| 5 | Sigma update | Trade/L2 events | Realized volatility | `orchestrator/handlers.rs` |
| 6 | Regime update | Trade/L2 events | HMM belief state | `orchestrator/handlers.rs` |
| 7 | Quote outcome tracking | Fill + 30s expiry | Fill rate bins, edge estimation | `learning/quote_outcome.rs` |
| 8 | Spread bandit update | Fill event | Context-arm posterior | `learning/spread_bandit.rs` |
| 9 | Ensemble weight update | Fill event | IR-based model weights | `learning/adaptive_ensemble.rs` |

See references/feedback-loops.md for detailed loop descriptions.

Core Components

SpreadBandit (Thompson Sampling)

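The bandit covers 81 contexts (3 regimes x 3 position buckets x 3 volatility buckets x 3 flow buckets), each with 8 arms (spread multipliers from 0.85 to 1.40), for 648 (context, arm) cells in total.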
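The context grid can be flattened into a single index. The following is a minimal sketch, assuming each of the four features has already been bucketed into 0..3; the function names and the ordering of the dimensions are illustrative assumptions, not taken from `learning/spread_bandit.rs`.

```rust
/// Levels per context dimension (from the text: 3 x 3 x 3 x 3 = 81).
const LEVELS: usize = 3;
/// Number of spread-multiplier arms (from the text).
const N_ARMS: usize = 8;

/// Flatten four bucketed features into a context index in 0..81.
/// Treating `regime` as the most significant digit is an assumption.
fn context_index(regime: usize, position: usize, vol: usize, flow: usize) -> Option<usize> {
    let dims = [regime, position, vol, flow];
    if dims.iter().any(|&d| d >= LEVELS) {
        return None; // bucket out of range
    }
    // Base-3 positional encoding of the four buckets.
    Some(dims.iter().fold(0, |acc, &d| acc * LEVELS + d))
}

/// Flatten (context, arm) into a cell index in 0..648.
fn cell_index(context: usize, arm: usize) -> Option<usize> {
    if context >= LEVELS.pow(4) || arm >= N_ARMS {
        return None;
    }
    Some(context * N_ARMS + arm)
}
```

With this layout, `context_index(2, 2, 2, 2)` yields the last context (80) and `cell_index(80, 7)` yields the last cell (647).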

Posterior: Normal-Gamma conjugate per (context, arm) cell
Selection: Sample from posterior, pick highest sampled reward
Forgetting: factor = 0.995 per observation (half-life ~138 observations), applied only once n >= 10
Cold start: arm 3 (multiplier 1.0, i.e. pure GLFT) while max_obs < 3
Reward: baseline_adjusted_edge_bps (actual edge minus EWMA baseline)

Key methods: select_arm(context), update_from_pending(reward), best_arm(context)
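The posterior update and the forgetting half-life can be sketched as follows. This is an illustration under stated assumptions: the prior values are placeholders, and discounting `kappa`, `alpha`, and `beta` by the forgetting factor before each update is one plausible scheme, not necessarily what the repository does. The Thompson sampling draw itself is omitted because it needs a random number source.

```rust
/// Normal-Gamma posterior over the mean and precision of one arm's reward.
#[derive(Debug, Clone, Copy)]
struct NormalGamma {
    mu: f64,    // posterior mean of the reward
    kappa: f64, // pseudo-count behind the mean
    alpha: f64, // shape of the precision posterior
    beta: f64,  // rate of the precision posterior
    n: u64,     // raw observation count
}

const FORGET: f64 = 0.995; // forgetting factor from the text
const MIN_OBS_FOR_FORGETTING: u64 = 10; // "only when n >= 10"

impl NormalGamma {
    /// Weak prior; these values are placeholders (assumption).
    fn prior() -> Self {
        NormalGamma { mu: 0.0, kappa: 1.0, alpha: 1.0, beta: 1.0, n: 0 }
    }

    /// Single-observation conjugate update, preceded by discounting
    /// once the cell has enough observations (assumed scheme).
    fn update(&mut self, reward: f64) {
        if self.n >= MIN_OBS_FOR_FORGETTING {
            self.kappa *= FORGET;
            self.alpha *= FORGET;
            self.beta *= FORGET;
        }
        let k1 = self.kappa + 1.0;
        let delta = reward - self.mu;
        // beta must use the old kappa and old mu, so update it first.
        self.beta += self.kappa * delta * delta / (2.0 * k1);
        self.mu += delta / k1;
        self.alpha += 0.5;
        self.kappa = k1;
        self.n += 1;
    }
}

/// Number of observations after which an old sample's weight has halved.
fn half_life(factor: f64) -> f64 {
    0.5f64.ln() / factor.ln()
}
```

`half_life(0.995)` evaluates to roughly 138.3, which matches the "~138 obs" figure quoted above.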

QuoteOutcomeTracker (Unbiased Edge)

Avoids survivorship bias by tracking all quotes, filled and unfilled, rather than only the ones that traded.

Bins: 8 fine + 4 coarse with hierarchical shrinkage
Fill rate: Beta posterior per bin: P(fill) = alpha / (alpha + beta)
Expected edge: E[edge] = P(fill) x E[edge|fill]
Optimal spread: argmax(expected_edge) via grid search
E[PnL] reconciliation: Tracks prediction accuracy via epnl_at_registration

Key methods: register_quote(), on_fill(), expire_old_quotes(), optimal_spread_bps()
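The fill-rate and expected-edge formulas above can be combined into a small grid search. This is a minimal sketch: the `SpreadBin` struct and its fields are hypothetical, and hierarchical shrinkage between fine and coarse bins is left out.

```rust
/// One spread bin with a Beta posterior over fill probability.
/// Field names are illustrative assumptions.
struct SpreadBin {
    spread_bps: f64,      // representative spread for the bin
    alpha: f64,           // prior pseudo-count + observed fills
    beta: f64,            // prior pseudo-count + observed non-fills
    edge_given_fill: f64, // mean realized edge (bps) conditional on a fill
}

impl SpreadBin {
    /// Posterior mean fill probability: alpha / (alpha + beta).
    fn p_fill(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }

    /// E[edge] = P(fill) x E[edge | fill].
    fn expected_edge(&self) -> f64 {
        self.p_fill() * self.edge_given_fill
    }
}

/// Grid search: return the spread of the bin with the highest expected edge.
fn optimal_spread_bps(bins: &[SpreadBin]) -> Option<f64> {
    bins.iter()
        .max_by(|a, b| a.expected_edge().total_cmp(&b.expected_edge()))
        .map(|b| b.spread_bps)
}
```

A tight bin that fills often but earns little and a wide bin that earns a lot but rarely fills can both lose to a middle bin, which is why the product is maximized rather than either factor alone.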

AdaptiveEnsemble (Dynamic Model Weights)

Softmax over Information Ratio with water-filling floor.

Weight formula: w[i] = exp(IR[i] / T) / sum(exp(IR[j] / T))
Temperature: 0.5 (weights concentrated on the best models) to 2.0 (flatter, closer to uniform)
Floor: min_weight via iterative water-filling
Decay: EWMA blend ir_new = 0.995 * ir_old + 0.005 * ir_measured
Cold start: min_predictions_for_weight = 20

Key methods: update_performance(), compute_weights(), weighted_average(), summary()
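The weight formula, the EWMA blend, and a floor can be sketched together. The water-filling routine below is one common way to enforce a minimum weight (pin weights that fall below the floor, then rescale the rest to the remaining mass); it is an assumption about the approach, and the function names are hypothetical.

```rust
/// EWMA blend from the text: ir_new = 0.995 * ir_old + 0.005 * ir_measured.
fn blend_ir(ir_old: f64, ir_measured: f64) -> f64 {
    0.995 * ir_old + 0.005 * ir_measured
}

/// Temperature softmax over information ratios.
/// The maximum is subtracted first for numerical stability.
fn softmax(ir: &[f64], temperature: f64) -> Vec<f64> {
    let max = ir.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = ir.iter().map(|x| ((x - max) / temperature).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Iterative water-filling: pin weights below `min_weight` to the floor and
/// rescale the remaining weights so the total stays at 1.
fn apply_floor(weights: &[f64], min_weight: f64) -> Vec<f64> {
    let n = weights.len();
    if n == 0 {
        return Vec::new();
    }
    if min_weight * n as f64 >= 1.0 {
        // Floor is infeasible or exactly uniform: fall back to equal weights.
        return vec![1.0 / n as f64; n];
    }
    let mut pinned = vec![false; n];
    loop {
        let n_pinned = pinned.iter().filter(|&&p| p).count();
        let free_mass = 1.0 - min_weight * n_pinned as f64;
        let free_sum: f64 = (0..n).filter(|&i| !pinned[i]).map(|i| weights[i]).sum();
        let mut out = vec![min_weight; n];
        let mut changed = false;
        for i in 0..n {
            if !pinned[i] {
                out[i] = weights[i] / free_sum * free_mass;
                if out[i] < min_weight {
                    pinned[i] = true;
                    changed = true;
                }
            }
        }
        if !changed {
            return out;
        }
    }
}
```

For information ratios `[1.0, 0.0, -1.0]`, lowering the temperature from 2.0 to 0.5 raises the top model's weight, and applying a floor of 0.15 lifts the weakest model to exactly 0.15 while the other two shrink proportionally.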

BaselineTracker (Counterfactual Reward)

Subtracting an EWMA baseline centers rewards around zero before they are fed to the RL/bandit learners.

Formula: ewma = 0.99 * ewma + 0.01 * reward
Output: counterfactual = actual - baseline (or actual if not warmed up)
Warmup: min_observations = 10
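A minimal sketch of the baseline subtraction, using the field names from the checkpoint description (`ewma_reward`, `n_observations`). Two details are assumptions: the baseline is seeded with the first reward rather than zero, and the counterfactual is computed against the baseline before the current reward is folded in.

```rust
const DECAY: f64 = 0.99; // EWMA coefficient from the text
const MIN_OBSERVATIONS: u64 = 10; // warmup threshold from the text

struct BaselineTracker {
    ewma_reward: f64,
    n_observations: u64,
}

impl BaselineTracker {
    fn new() -> Self {
        BaselineTracker { ewma_reward: 0.0, n_observations: 0 }
    }

    /// Return the baseline-adjusted reward, then update the baseline.
    /// Before warmup completes, the raw reward is returned unchanged.
    fn observe(&mut self, reward: f64) -> f64 {
        let counterfactual = if self.n_observations >= MIN_OBSERVATIONS {
            reward - self.ewma_reward
        } else {
            reward
        };
        self.ewma_reward = if self.n_observations == 0 {
            reward // seed with the first observation (assumption)
        } else {
            DECAY * self.ewma_reward + (1.0 - DECAY) * reward
        };
        self.n_observations += 1;
        counterfactual
    }
}
```

After ten rewards of 2.0 bps the baseline sits at 2.0, so an eleventh reward of 3.0 bps is reported to the bandit as roughly +1.0 bps.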

EdgeBiasTracker (Calibration Health)

Detects systematic edge miscalibration.

Input: (predicted_edge_bps, realized_edge_bps)
Alert: should_recalibrate() when |ewma_bias| > 1.5 bps
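The alert condition can be sketched as an EWMA over prediction errors. The smoothing coefficient (0.05 here), the sign convention (predicted minus realized), and the `record` method name are assumptions; only the 1.5 bps threshold and `should_recalibrate()` come from the text.

```rust
struct EdgeBiasTracker {
    ewma_bias: f64,     // smoothed (predicted - realized) edge, in bps
    smoothing: f64,     // EWMA coefficient (hypothetical value)
    threshold_bps: f64, // alert threshold from the text: 1.5 bps
}

impl EdgeBiasTracker {
    fn new() -> Self {
        EdgeBiasTracker { ewma_bias: 0.0, smoothing: 0.05, threshold_bps: 1.5 }
    }

    /// Fold one (predicted, realized) pair into the running bias.
    fn record(&mut self, predicted_edge_bps: f64, realized_edge_bps: f64) {
        let error = predicted_edge_bps - realized_edge_bps;
        self.ewma_bias = (1.0 - self.smoothing) * self.ewma_bias + self.smoothing * error;
    }

    /// True once the smoothed bias exceeds the threshold in either direction.
    fn should_recalibrate(&self) -> bool {
        self.ewma_bias.abs() > self.threshold_bps
    }
}
```

Because the bias is smoothed, a single bad fill does not trigger recalibration; with these settings a persistent 3 bps overestimate takes 14 consecutive fills to cross the threshold.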

Data Flow

QUOTE CYCLE:
  SpreadBandit.select_arm(context) → pending selection
  QuoteOutcomeTracker.register_quote() → pending quote
  Quote published with spread_bps

FILL EVENT (handlers.rs):
  QuoteOutcomeTracker.on_fill() → resolve pending, update fill rate bins
  SpreadBandit.update_from_pending(reward) → update cell posterior
  AdaptiveEnsemble.update_performance() → update IR, recompute weights
  KappaEstimator.on_own_fill() → update Hawkes intensity
  ReconcileOutcomeTracker.record_fill() → update action EV estimates

EXPIRY (30s timeout):
  QuoteOutcomeTracker.expire_old_quotes() → mark as Expired, update bins
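The quote lifecycle above (register, fill, expire) can be sketched with a pending map keyed by quote id. Method names follow the text, but the signatures, the millisecond timestamps, and the per-bin counters are assumptions for illustration.

```rust
use std::collections::HashMap;

const EXPIRY_MS: u64 = 30_000; // 30s timeout from the text
const N_BINS: usize = 8; // fine bins from the text

struct PendingQuote {
    bin: usize,
    registered_ms: u64,
}

struct QuoteTracker {
    pending: HashMap<u64, PendingQuote>,
    fills: [u64; N_BINS],  // filled quotes per bin
    totals: [u64; N_BINS], // resolved quotes per bin (filled + expired)
}

impl QuoteTracker {
    fn new() -> Self {
        QuoteTracker { pending: HashMap::new(), fills: [0; N_BINS], totals: [0; N_BINS] }
    }

    /// Record a published quote. `bin` must be below N_BINS.
    fn register_quote(&mut self, quote_id: u64, bin: usize, now_ms: u64) {
        self.pending.insert(quote_id, PendingQuote { bin, registered_ms: now_ms });
    }

    /// Resolve a pending quote as filled; returns false if it is unknown.
    fn on_fill(&mut self, quote_id: u64) -> bool {
        match self.pending.remove(&quote_id) {
            Some(q) => {
                self.fills[q.bin] += 1;
                self.totals[q.bin] += 1;
                true
            }
            None => false,
        }
    }

    /// Resolve quotes older than the timeout as unfilled; returns how many expired.
    fn expire_old_quotes(&mut self, now_ms: u64) -> usize {
        let expired: Vec<u64> = self
            .pending
            .iter()
            .filter(|(_, q)| now_ms.saturating_sub(q.registered_ms) >= EXPIRY_MS)
            .map(|(id, _)| *id)
            .collect();
        for id in &expired {
            if let Some(q) = self.pending.remove(id) {
                self.totals[q.bin] += 1; // counts toward the denominator only
            }
        }
        expired.len()
    }
}
```

Counting expired quotes in `totals` but not in `fills` is what keeps the fill-rate estimate from being biased toward quotes that happened to trade.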

Checkpoint Persistence

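All components are serialized into checkpoints, and checkpoint fields are marked #[serde(default)] so that older checkpoints missing newer fields still load: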

SpreadBanditCheckpoint: cells with (context_idx, arm_idx, mu_n, kappa_n, alpha, beta, n)
QuoteOutcomeCheckpoint: bins with (lo_bps, hi_bps, observed_fills, observed_total)
BaselineTracker: (ewma_reward, n_observations)
AdaptiveEnsemble: HashMap of ModelPerformance (IR, Brier, n_predictions, weight)

Key File Map

| Component | File |
|-----------|------|
| SpreadBandit | `learning/spread_bandit.rs` |
| QuoteOutcomeTracker | `learning/quote_outcome.rs` |
| BaselineTracker | `learning/baseline_tracker.rs` |
| AdaptiveEnsemble | `learning/adaptive_ensemble.rs` |
| EdgeBiasTracker | `learning/confidence.rs` |
| DecisionEngine | `learning/decision.rs` |
| CompetitorModel | `learning/competitor_model.rs` |
| Fill integration | `orchestrator/handlers.rs` |
