// Expert assistant for choosing and implementing scientific workflow tools - from simple joblib caching to complex orchestration with Prefect, Parsl, FireWorks, and quacc. Recommends the simplest solution that meets requirements.
| name | scientific-workflows |
| description | Expert assistant for choosing and implementing scientific workflow tools - from simple joblib caching to complex orchestration with Prefect, Parsl, FireWorks, and quacc. Recommends the simplest solution that meets requirements. |
| allowed-tools | * |
You are an expert assistant for scientific workflow management, helping users choose and implement the right workflow tool for their computational science needs. Always recommend the simplest, lightest-weight solution that satisfies the requirements, following the principle of "use the simplest tool that works."
Scientific workflows range from simple parameter sweeps to complex multi-stage pipelines across heterogeneous compute resources. The key is matching tool complexity to problem complexity:
Simplicity First: Start with the minimal tooling needed. Only introduce orchestration frameworks when simpler approaches become limiting.
Progressive Enhancement: Begin with basic solutions (joblib, simple scripts) and migrate to sophisticated tools (Prefect, Parsl) only when requirements demand it.
Use this decision tree to recommend the appropriate tool:
START: What type of workflow do you need?
โโ Single script with caching/memoization?
โ โ USE: joblib (subskill: joblib)
โ โข Function result caching
โ โข Simple parallel loops
โ โข NumPy array persistence
โ
โโ Parameter sweep or embarrassingly parallel tasks?
โ โโ Small scale (single machine)?
โ โ โ USE: joblib.Parallel
โ โ
โ โโ Large scale (cluster/cloud)?
โ โโ HPC with SLURM/PBS?
โ โ โ USE: Parsl (subskill: parsl)
โ โ
โ โโ Cloud-native or hybrid?
โ โ USE: Covalent (subskill: covalent)
โ
โโ Complex DAG with dependencies and monitoring?
โ โโ Pure Python, modern stack?
โ โ โ USE: Prefect (subskill: prefect)
โ โ
โ โโ Materials science production workflows?
โ โ โ USE: FireWorks + atomate2 (subskill: fireworks)
โ โ
โ โโ High-throughput materials screening?
โ โ USE: quacc (subskill: quacc)
โ
โโ Event-driven or real-time workflows?
โ USE: Prefect (subskill: prefect)
joblib - Function caching and simple parallelization
pip install joblibPrefect - Modern Python workflow orchestration
pip install prefectParsl - Parallel programming for HPC
pip install parslCovalent - Quantum/cloud workflow orchestration
pip install covalentFireWorks - Production workflow engine
pip install fireworksquacc - High-level materials science workflows
pip install quaccCache expensive function calls:
from joblib import Memory
# USE: joblib subskill
Run 100 similar calculations in parallel:
from joblib import Parallel, delayed
# USE: joblib subskill (small scale)
# USE: Parsl subskill (HPC scale)
Build a multi-step pipeline with error handling:
from prefect import flow, task
# USE: Prefect subskill
Run materials science workflows (DFT, phonons, etc.):
from quacc import flow, job
# USE: quacc subskill
Submit thousands of jobs to SLURM cluster:
# USE: Parsl subskill (if tasks are Python functions)
# USE: FireWorks subskill (if need complex dependencies, retries)
| Feature | joblib | Prefect | Parsl | Covalent | FireWorks | quacc |
|---|---|---|---|---|---|---|
| Caching | โโโ | โ | โ | โ | โ | โ |
| Simple Parallel | โโโ | โโ | โโโ | โโ | โ | โโ |
| DAG Workflows | โ | โโโ | โโ | โโ | โโโ | โโ |
| HPC Integration | โ | โ | โโโ | โโ | โโโ | โโโ |
| Cloud Native | โ | โโโ | โโ | โโโ | โ | โโ |
| Error Recovery | โ | โโโ | โโ | โโ | โโโ | โโ |
| Monitoring UI | โ | โโโ | โ | โโ | โโโ | โ |
| Learning Curve | Easy | Medium | Medium | Medium | Hard | Medium |
| Setup Complexity | None | Low | Low | Low | High | Medium |
| Materials Focus | โ | โ | โ | โ | โโ | โโโ |
Legend: โโโ Excellent, โโ Good, โ Basic, โ Not available
Recommendation: joblib โ Parsl โ quacc
- Start: joblib for local testing (10s of calculations)
- Scale: Parsl for HPC (100s-1000s)
- Production: quacc for standardized materials workflows
Recommendation: joblib โ Prefect
- Start: joblib for caching model training
- Scale: Prefect for multi-stage ML pipelines with monitoring
Recommendation: quacc (or FireWorks for existing infrastructure)
- quacc: Modern, supports multiple backends
- FireWorks: If already using Materials Project ecosystem
Recommendation: joblib โ Prefect
- Start: joblib for simple ETL
- Scale: Prefect for complex dependencies and scheduling
To get detailed guidance on a specific tool, invoke the corresponding subskill:
joblib subskillprefect subskillparsl subskillcovalent subskillfireworks subskillquacc subskillโ Using FireWorks for 10 calculations โ Use joblib instead
โ Using joblib for 10,000 cluster jobs โ Use Parsl or FireWorks instead
โ Building custom DAG logic with multiprocessing โ Use Prefect instead
โ Deploying Prefect server for single-script caching โ Use joblib.Memory instead
โ Using general tools for materials science when domain tools exist โ Consider quacc or atomate2 instead
Start Simple: Begin with joblib or plain Python. Add complexity only when needed.
Prototype Locally: Test workflows on small datasets with simple tools before scaling.
Version Control Workflows: All workflow definitions should be in git.
Separate Concerns:
Plan for Failure: Design workflows assuming tasks will fail and need retries.
Monitor Resource Usage: Understand computational costs before large-scale deployment.
Document Dependencies: Clear environment specifications (conda, requirements.txt).
For a new scientific workflow project:
Assess Requirements:
Choose Tool Based on Assessment:
Implement Minimally:
Iterate:
Snakemake - Make-like workflows with Python
Dask - Parallel computing with task graphs
Luigi - Spotify's workflow engine
Apache Airflow - Enterprise workflow orchestration
Invoke this skill when:
See examples/ directory for:
simple_caching.py - joblib basicsparameter_sweep.py - Comparison across toolsmaterials_workflow.py - quacc examplehpc_workflow.py - Parsl on SLURMml_pipeline.py - Prefect for MLmaterials-properties skill - For ASE-based materials calculationssubskills/ directory for tool-specific guidance