تشغيل أي مهارة في Manus بنقرة واحدة

bio-workflows-clinical-trial-pipeline

النجوم٩٤٣

التفرعات١٦٥

آخر تحديث٢٠ يونيو ٢٠٢٦ في ٠١:٤٨

End-to-end clinical trial analysis workflow from CDISC SDTM/ADaM loading through ICH E9(R1) estimand-driven primary analysis to CONSORT 2025 regulatory-compliant reporting. Covers data preparation, FDA 2023 marginal vs conditional logistic regression, categorical tests with Boschloo, modern HTE/subgroup methods, missing-data sensitivity (MMRM, reference-based MI, Permutt tipping point), graphical multiplicity (Bretz-Maurer), survival analysis (Cox/RMST/competing risks) when applicable, and Table 1. Use when performing a complete analysis of clinical trial data.

التثبيت

التثبيت باستخدام Codex أو Claude انسخ هذا Prompt والصقه في Codex أو Claude أو مساعد آخر ليراجع صفحة Skill ويثبّتها لك.

تشغيل في Manus

المصدر

GPTomics

GPTomics/bioSkills

فتح مستودع GitHub عرض مستودعات المنشئ

تنزيل

تشغيل في Manus

المهن ذات الصلةSOC

استنادا إلى تصنيف SOC المهني

مطوّرو البرمجياتمهن الحاسوب والرياضيات·SOC 15-1252

مستكشف الملفات

3 ملفات

SKILL.md

readonly

المزيد من هذا المستودع

نفس المستودع

bio-ribo-seq-initiation-site-mapping

GPTomics/bioSkills

Map translation initiation sites, including non-AUG and alternative starts, from initiation-drug ribosome profiling (TI-seq). Use when locating start codons, detecting near-cognate or upstream initiation, or analyzing harringtonine, lactimidomycin (GTI-seq/QTI-seq), or retapamulin (Ribo-RET) data.

2026-06-20943

bio-ribo-seq-orf-detection

GPTomics/bioSkills

Detect and quantify translated ORFs from Ribo-seq using 3-nucleotide periodicity, including uORFs, internal ORFs, dORFs, and novel ORFs. Use when finding actively translated regions beyond annotated CDS, classifying ORFs by the 2022 community standard, quantifying ORF-level translation, or choosing between periodicity-based callers.

2026-06-20943

bio-ribo-seq-riboseq-preprocessing

GPTomics/bioSkills

Preprocess ribosome profiling reads with UMI handling, adapter trimming, contaminant/rRNA depletion, and footprint-aware alignment. Use when preparing Ribo-seq FASTQ for periodicity QC, ORF detection, translation efficiency, or stalling analysis, or when deciding how to deduplicate, which aligner to use, or how to size-select ribosome-protected fragments.

2026-06-20943

bio-ribo-seq-ribosome-periodicity

GPTomics/bioSkills

Validate Ribo-seq library quality by measuring 3-nucleotide periodicity and calibrating read-length-specific P-site offsets. Use when checking whether footprints capture genuine translation, determining P-site offsets for downstream ORF/TE/stalling analysis, or deciding which read lengths to keep.

2026-06-20943

bio-ribo-seq-ribosome-stalling

GPTomics/bioSkills

Detect ribosome pausing and stalling at codon resolution from Ribo-seq, using local-relative occupancy metrics and A-site assignment. Use when studying elongation dynamics, codon dwell times, pause motifs, or ribosome collisions, and when judging whether a pause is real biology or a cycloheximide artifact.

2026-06-20943

bio-ribo-seq-translation-efficiency

GPTomics/bioSkills

Quantify translation efficiency (TE) as ribosome occupancy relative to mRNA abundance and test for differential TE between conditions. Use when separating translational from transcriptional regulation, distinguishing genuine translational control from buffering, or choosing between riborex, Xtail, anota2seq, and DESeq2 interaction models.

2026-06-20943

name	bio-workflows-clinical-trial-pipeline
description	End-to-end clinical trial analysis workflow from CDISC SDTM/ADaM loading through ICH E9(R1) estimand-driven primary analysis to CONSORT 2025 regulatory-compliant reporting. Covers data preparation, FDA 2023 marginal vs conditional logistic regression, categorical tests with Boschloo, modern HTE/subgroup methods, missing-data sensitivity (MMRM, reference-based MI, Permutt tipping point), graphical multiplicity (Bretz-Maurer), survival analysis (Cox/RMST/competing risks) when applicable, and Table 1. Use when performing a complete analysis of clinical trial data.
tool_type	python
primary_tool	statsmodels
workflow	true
depends_on	["clinical-biostatistics/cdisc-data-handling","clinical-biostatistics/logistic-regression","clinical-biostatistics/categorical-tests","clinical-biostatistics/effect-measures","clinical-biostatistics/subgroup-analysis","clinical-biostatistics/trial-reporting","clinical-biostatistics/missing-data-sensitivity","clinical-biostatistics/multiplicity-graphical","clinical-biostatistics/survival-analysis","clinical-biostatistics/power-and-sample-size"]
qc_checkpoints	[{"after_estimand_definition":"ICH E9(R1) 5 attributes pre-specified in SAP: treatment, population, endpoint, summary measure, ICE handling strategy"},{"after_data_prep":"One row per USUBJID, no duplicate subjects, treatment arms balanced, DS domain tabulated for dropout patterns by arm"},{"after_primary_analysis":"Model converged, no separation warnings, marginal RD via g-computation reported as primary per FDA 2023; conditional OR as supportive"},{"after_subgroup":"Interaction tests run via single model (not per-subgroup p-comparisons), graphical multiplicity adjustment via gMCP, forest plot generated"},{"after_missing_data":"Per ICH E9(R1) ICE strategy: MMRM/MAR or reference-based MI (J2R/CR/CIR); Permutt tipping-point delta reported in residual SD units"},{"after_reporting":"Table 1 with SMD, missing data per CONSORT 2025 item 21c, harms per item 15, estimand statement per ICH E9(R1)"}]

Version Compatibility

Reference examples tested with: statsmodels 0.14+, scipy 1.12+, tableone 0.9+, pyreadstat 1.2+, pandas 2.1+, numpy 1.26+, matplotlib 3.8+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Clinical Trial Analysis Pipeline

"Analyze my clinical trial data end to end" -> Load CDISC domain tables, prepare a subject-level analysis dataset, run primary statistical models, perform subgroup analyses, and generate regulatory-compliant tables and figures.

Complete workflow for clinical trial statistical analysis from raw data to publication-ready results.

Scientific Reasoning Framework

Before executing any analysis step, establish the causal framework. For an RCT, randomization justifies causal interpretation of the primary analysis, but subgroup analyses and observational comparisons within the trial (e.g., adherence effects) do not inherit this protection. Key decisions requiring scientific judgment at each step: (1) data preparation -- which aggregation strategy matches the estimand, (2) covariate selection -- include confounders and prognostic factors from the SAP, exclude mediators and colliders, (3) subgroup analysis -- test only biologically motivated interactions, (4) missing data -- link DS domain reasons to the assumed mechanism before choosing a method. The workflow below provides the technical steps; the scientific reasoning at each decision point determines whether the results are valid.

Workflow Overview

CDISC Domain Files (DM, AE, EX, LB)
    |
    v
[1. Data Preparation] ----> Subject-level dataset with outcomes and covariates
    |
    v
[2. Table 1] ------------> Baseline characteristics by treatment arm
    |
    v
[3. Primary Analysis] ---> Logistic regression with OR extraction
    |
    v
[4. Categorical Tests] --> Chi-square / Fisher's exact for key associations
    |
    v
[5. Subgroup Analysis] --> Interaction terms, stratified ORs, forest plot
    |
    v
[6. Missing Data] -------> Multiple imputation sensitivity analysis
    |
    v
Results tables and figures

Step 1: Data Preparation

Goal: Create a single subject-level analysis dataset from CDISC domain tables.

Approach: Load domain files, aggregate event-level data to one row per subject, merge on USUBJID, and code the outcome variable.

import pandas as pd
import pyreadstat

dm, _ = pyreadstat.read_xport('dm.xpt')
ae, _ = pyreadstat.read_xport('ae.xpt')

# Aggregate: did each subject have the target adverse event?
target_ae = ae[ae['AEDECOD'] == 'COVID-19']
severity_map = {'MILD': 1, 'MODERATE': 2, 'SEVERE': 3, 'LIFE THREATENING': 4, 'FATAL': 5}
target_ae['AESEV_NUM'] = target_ae['AESEV'].map(severity_map)
had_event = target_ae.groupby('USUBJID')['AESEV_NUM'].max().reset_index()
had_event.columns = ['USUBJID', 'EVENT_SEVERITY']

analysis = dm[['USUBJID', 'ARM', 'ARMCD', 'AGE', 'SEX']].merge(had_event, on='USUBJID', how='left')
analysis['HAD_EVENT'] = analysis['EVENT_SEVERITY'].notna().astype(int)
analysis['TREATMENT'] = (analysis['ARMCD'] != 'PLACEBO').astype(int)

QC Checkpoint: Verify one row per USUBJID, no unexpected duplicates, treatment arms are present and reasonably balanced.

assert analysis['USUBJID'].is_unique, 'Duplicate subjects detected'
print(analysis['ARM'].value_counts())

Step 2: Table 1 Baseline Characteristics

Goal: Summarize demographics and baseline variables by treatment arm.

Approach: Use TableOne to generate a baseline table by arm with standardized mean differences and explicit missingness. Omit the baseline p-value column: in a randomized trial any imbalance is by definition due to chance, so a baseline p-value tests a null already known to be true (Senn 1994; CONSORT 2010/2025). Report SMD for balance instead. Table-construction and export mechanics (gtsummary/tableone, Word export, gene-symbol-safe supplements) live in reporting/publication-tables.

from tableone import TableOne

columns = ['AGE', 'SEX', 'RACE']
categorical = ['SEX', 'RACE']
table1 = TableOne(analysis, columns=columns, categorical=categorical,
                  groupby='ARM', pval=False, smd=True, missing=True)
print(table1.tabulate(tablefmt='github'))

Interpret SMD > 0.1 as meaningful imbalance. The response to a worrying imbalance on a prognostic covariate is to adjust for it (a pre-specified ANCOVA/model covariate), not to test it.

Step 3: Primary Analysis -- Logistic Regression

Goal: Estimate the treatment effect on the binary outcome as an adjusted odds ratio.

Approach: Fit a logistic regression with explicit reference category and clinically relevant covariates, then exponentiate coefficients to obtain ORs.

import statsmodels.formula.api as smf
import numpy as np

model = smf.logit(
    'HAD_EVENT ~ C(ARM, Treatment(reference="Placebo")) + AGE + C(SEX)',
    data=analysis
).fit()

or_table = pd.DataFrame({
    'OR': np.exp(model.params),
    'Lower_CI': np.exp(model.conf_int()[0]),
    'Upper_CI': np.exp(model.conf_int()[1]),
    'p_value': model.pvalues
})
print(or_table)
print(f'McFadden pseudo-R2: {model.prsquared:.4f}')

QC Checkpoint: Verify model converged (no warnings), check for separation (coefficients > 10 or SE > 100), report pseudo-R-squared (McFadden > 0.2 is excellent; do not compare across pseudo-R2 types).

Step 4: Categorical Tests

Goal: Test the crude association between treatment and outcome using contingency tables.

Approach: Build a 2x2 table, check expected cell counts, and choose chi-square or Fisher's exact accordingly.

from scipy.stats import chi2_contingency, fisher_exact

table = pd.crosstab(analysis['ARM'], analysis['HAD_EVENT'])
chi2, p, dof, expected = chi2_contingency(table, correction=False)

if (expected < 5).any():
    _, p = fisher_exact(table.values)
    print(f'Fisher exact p = {p:.4f}')
else:
    print(f'Chi-square p = {p:.4f} (chi2 = {chi2:.2f}, dof = {dof})')

Step 5: Subgroup Analysis

Goal: Test whether the treatment effect varies across pre-specified subgroups.

Approach: Fit a model with an interaction term, extract subgroup-specific ORs, adjust for multiplicity, and visualize with a forest plot.

import matplotlib.pyplot as plt

# Interaction test
interaction_model = smf.logit(
    'HAD_EVENT ~ C(ARM, Treatment(reference="Placebo")) * C(SUBGROUP)',
    data=analysis
).fit()

# Subgroup-specific ORs
labels, ors, lowers, uppers = [], [], [], []
for group in analysis['SUBGROUP'].unique():
    sub = analysis[analysis['SUBGROUP'] == group]
    sub_model = smf.logit(
        'HAD_EVENT ~ C(ARM, Treatment(reference="Placebo"))',
        data=sub
    ).fit(disp=0)
    or_val = np.exp(sub_model.params.iloc[1])
    ci = np.exp(sub_model.conf_int().iloc[1])
    labels.append(group)
    ors.append(or_val)
    lowers.append(ci[0])
    uppers.append(ci[1])

# Multiplicity correction for subgroup p-values
from statsmodels.stats.multitest import multipletests
sub_pvals = [smf.logit('HAD_EVENT ~ C(ARM, Treatment(reference="Placebo"))',
             data=analysis[analysis['SUBGROUP'] == g]).fit(disp=0).pvalues.iloc[1]
             for g in labels]
_, adjusted_pvals, _, _ = multipletests(sub_pvals, method='holm')

# Forest plot
fig, ax = plt.subplots(figsize=(8, 5))
y_pos = range(len(labels))
ax.errorbar(ors, y_pos,
            xerr=[np.array(ors) - np.array(lowers), np.array(uppers) - np.array(ors)],
            fmt='D', color='black', capsize=3, markersize=5)
ax.axvline(x=1.0, color='gray', linestyle='--', linewidth=0.8)
ax.set_yticks(y_pos)
ax.set_yticklabels(labels)
ax.set_xlabel('Odds Ratio (95% CI)')
ax.set_xscale('log')
plt.tight_layout()
plt.savefig('forest_plot.png', dpi=150)

QC Checkpoint: Interaction p-value reported. Multiplicity correction applied if testing multiple subgroups. Forest plot shows overall estimate for context.

Step 6: Missing Data Sensitivity Analysis (per ICH E9(R1) and clinical-biostatistics/missing-data-sensitivity)

Goal: Assess robustness of the primary result under the pre-specified ICE strategy with both MAR primary and MNAR sensitivity analyses.

Approach: First examine DS (Disposition) domain for differential dropout patterns; if dropout differs by arm, MAR is suspect and reference-based MI is required as primary. Otherwise, fit MMRM under MAR with Rubin's-rules pooling for continuous endpoints, or g-computation with bootstrap for binary. Always run Permutt 2016 tipping-point sensitivity in residual SD units.

from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

n_imputations = 20
covariate_cols = ['AGE']  # Only impute covariates, not treatment or outcome
mi_data = analysis.dropna(subset=['HAD_EVENT', 'TREATMENT']).copy()

results = []
for i in range(n_imputations):
    imputer = IterativeImputer(max_iter=10, random_state=i, sample_posterior=True)
    imputed_cov = pd.DataFrame(imputer.fit_transform(mi_data[covariate_cols]),
                               columns=covariate_cols, index=mi_data.index)
    imputed_cov['HAD_EVENT'] = mi_data['HAD_EVENT'].values
    imputed_cov['TREATMENT'] = mi_data['TREATMENT'].values
    model_imp = smf.logit('HAD_EVENT ~ TREATMENT + AGE', data=imputed_cov).fit(disp=0)
    results.append({'coef': model_imp.params['TREATMENT'], 'se': model_imp.bse['TREATMENT']})

pooled_coef = np.mean([r['coef'] for r in results])
within_var = np.mean([r['se']**2 for r in results])
between_var = np.var([r['coef'] for r in results], ddof=1)
total_var = within_var + (1 + 1/n_imputations) * between_var
pooled_or = np.exp(pooled_coef)
pooled_ci = (np.exp(pooled_coef - 1.96 * np.sqrt(total_var)),
             np.exp(pooled_coef + 1.96 * np.sqrt(total_var)))
print(f'Pooled OR: {pooled_or:.3f} ({pooled_ci[0]:.3f}-{pooled_ci[1]:.3f})')

QC Checkpoint: Compare pooled OR and CI with the complete-case primary analysis. Large discrepancies suggest missing data may not be MCAR. Document the comparison.

Result Reporting Checklist (CONSORT 2025 + ICH E9(R1) aligned)

When to Add Specialized Skills

This pipeline covers the typical binary-endpoint RCT workflow. For specific designs, add the corresponding specialized skill:

Time-to-event primary endpoint (OS, PFS, DOR): add clinical-biostatistics/survival-analysis for Cox PH diagnostics, RMST under non-PH, competing risks via Fine-Gray vs cause-specific Cox, MaxCombo for delayed effects, informative censoring handling
Continuous longitudinal endpoint (HbA1c at 24 weeks): clinical-biostatistics/missing-data-sensitivity for MMRM with Kenward-Roger via R mmrm; reference-based MI via R rbmi for MNAR sensitivity
Multiple primary or key secondary endpoints: clinical-biostatistics/multiplicity-graphical for Bretz-Maurer graphical procedures via gMCP
Trial design / sample-size justification: clinical-biostatistics/power-and-sample-size for Schoenfeld events, Lakatos under non-PH, FDA 2016 NI double discount, TOST equivalence
Adaptive trial (group-sequential, SSR, platform): clinical-biostatistics/adaptive-designs for rpact/gsDesign, Mehta-Pocock promising zone, ICH E20 considerations
Bayesian primary inference or RWE comparator: clinical-biostatistics/bayesian-trials for BOIN dose-finding, robust MAP priors via RBesT, EXNEX basket trials, psborrow2 for external controls

Related Skills

clinical-biostatistics/cdisc-data-handling - CDISC SDTM/ADaM, Pinnacle 21, Dataset-JSON, ADTTE CNSR conventions
clinical-biostatistics/logistic-regression - FDA 2023 marginal vs conditional, g-computation, Brant test, Firth, Hauck-Donner
clinical-biostatistics/categorical-tests - Boschloo, mid-p McNemar, Wilson/Newcombe/Miettinen-Nurminen CIs
clinical-biostatistics/effect-measures - NNT Bender 2002 convention, profile likelihood, modified Poisson for RR
clinical-biostatistics/subgroup-analysis - Causal forests, STEPP, SIDES, EXNEX, Yadlowsky RATE, EMA 2019 subgroup guideline
clinical-biostatistics/trial-reporting - ICH E9(R1) 5 estimand strategies, Cro/Bartlett variance debate, CONSORT 2025
clinical-biostatistics/missing-data-sensitivity - MMRM/Kenward-Roger, reference-based MI, Permutt tipping point
clinical-biostatistics/multiplicity-graphical - Bretz-Maurer graphs, Goeman closed-testing admissibility
clinical-biostatistics/survival-analysis - Cox/RMST/Fine-Gray/MaxCombo/recurrent events/interval censoring
clinical-biostatistics/power-and-sample-size - Schoenfeld/Lakatos, NI double discount, crossover, MCID
clinical-biostatistics/adaptive-designs - Group-sequential, SSR, RAR consensus, BOIN, platform trials
clinical-biostatistics/bayesian-trials - MAP/EXNEX/RWE, FDA Bayesian Jan 2026 draft, psborrow2
reporting/publication-tables - Table 1 construction (SMD not baseline p-values, show missingness) and Word/LaTeX export