
automation

Prepare dataset-curation and active-learning workflows around NepTrain and NepTrainKit. Use when the user needs perturbation-based sampling, representative-structure selection (FPS / max-min), automated NEP project scaffolding, interactive outlier inspection, or iterative retrain loops rather than a single manual NEP fit.


Updated April 10, 2026 at 06:05

Source: GodJh1n/GPUMD-skill

name	automation
description	Prepare dataset-curation and active-learning workflows around NepTrain and NepTrainKit. Use when the user needs perturbation-based sampling, representative-structure selection (FPS / max-min), automated NEP project scaffolding, interactive outlier inspection, or iterative retrain loops rather than a single manual NEP fit.
compatibility	Requires NepTrain (pip install neptrain) and/or NepTrainKit when those workflows are executed.
catalog-hidden	true
license	GPL-3.0-only
metadata	{"author":"Jhin","version":"0.2.0"}

NEP Automation

Use this subskill when the user wants to go beyond a single manual nep invocation and into iterative dataset curation, active-learning loops, or GUI-assisted inspection of training data.

Two tools dominate the NEP automation ecosystem:

Tool	Role
NepTrain	CLI-driven active-learning loop: perturb → sample → label → retrain
NepTrainKit	GUI / notebook tool for dataset inspection, outlier detection, and frame selection

Agent responsibilities

Confirm which tool is installed and its version.
Do not set up an active-learning loop until the user has a working baseline NEP (from the train or fine-tune subskill).
Explain the perturbation → selection → labeling → retrain cycle before generating config files.
Warn the user about loop stability: an active-learning loop that adds low-quality frames or frames outside the DFT convergence envelope will degrade the model, not improve it.
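For the first responsibility, you can read installed versions from package metadata without importing either tool. Below is a minimal sketch that uses only the standard library. The distribution names NepTrain and NepTrainKit are assumptions about how pip registered the packages; confirm them with pip list.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names are assumptions; adjust to what `pip list` shows.
    for pkg, ver in installed_versions(["NepTrain", "NepTrainKit"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```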

NepTrain workflow

Concept

NepTrain orchestrates an iterative loop:

┌──────────────────────────────────────────────┐
│  1. Start with a baseline NEP + train.xyz    │
│  2. Run short GPUMD MD at target conditions  │
│  3. Select representative frames (FPS)       │
│  4. Label selected frames with DFT           │
│  5. Append to train.xyz                      │
│  6. Retrain NEP                              │
│  7. Repeat until convergence                 │
└──────────────────────────────────────────────┘
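The cycle above can be sketched as a plain Python loop. This is a hypothetical skeleton, not NepTrain's implementation. The callables sample_md, select, label, and retrain are placeholders for GPUMD, the selection step, DFT, and nep. "Convergence" is approximated by requiring a minimum number of newly selected frames per iteration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence


@dataclass
class LoopState:
    train_set: List[Any]  # labeled frames (stands in for train.xyz)
    model: Any            # current model (stands in for nep.txt)
    history: List[int] = field(default_factory=list)  # frames selected per iteration


def active_learning_loop(
    state: LoopState,
    sample_md: Callable[[Any], Sequence[Any]],                # step 2: short MD with current model
    select: Callable[[Sequence[Any], List[Any]], List[Any]],  # step 3: pick diverse frames
    label: Callable[[List[Any]], List[Any]],                  # step 4: DFT labeling
    retrain: Callable[[List[Any], Any], Any],                 # step 6: warm-start retrain
    max_iterations: int = 10,
    min_new_frames: int = 3,
) -> LoopState:
    """Run the sample -> select -> label -> retrain cycle until few new frames appear."""
    for _ in range(max_iterations):
        candidates = sample_md(state.model)
        picked = select(candidates, state.train_set)
        state.history.append(len(picked))
        if len(picked) < min_new_frames:
            break  # step 7: selection has dried up, treat the loop as converged
        state.train_set.extend(label(picked))  # step 5: append to the training set
        state.model = retrain(state.train_set, state.model)
    return state


if __name__ == "__main__":
    # Toy stand-ins: "frames" are numbers and the "model" is the training-set size.
    pool = [float(x) for x in range(6)]
    result = active_learning_loop(
        LoopState(train_set=[0.0], model=None),
        sample_md=lambda model: pool,
        select=lambda cands, train: [c for c in cands if c not in train][:2],
        label=lambda frames: list(frames),
        retrain=lambda train, model: len(train),
        min_new_frames=2,
    )
    print(result.history, result.model)
```

The toy run prints `[2, 2, 1] 5`. The third iteration finds only one unseen frame, so the loop stops without labeling it.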

Project layout

project/
├── job.yaml           # NepTrain control file
├── nep.in             # NEP training input
├── nep.txt            # current best model
├── train.xyz          # growing training set
├── test.xyz           # held-out test set
├── structure/         # GPUMD model.xyz seed structures
│   └── model.xyz
├── run.in             # GPUMD sampling run.in
└── cache/             # NepTrain working directory
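The baseline nep.txt, the labeled data, and the control files all come from earlier steps. A scaffolding helper should therefore create only the directories and report what is missing, rather than write placeholder files. Below is a minimal sketch using pathlib. scaffold_project is a hypothetical helper, not a NepTrain command.

```python
import tempfile
from pathlib import Path
from typing import List

# Paths copied from the project layout above.
REQUIRED_FILES = ["job.yaml", "nep.in", "nep.txt", "train.xyz",
                  "test.xyz", "run.in", "structure/model.xyz"]
WORK_DIRS = ["structure", "cache"]


def scaffold_project(root: str) -> List[str]:
    """Create the directory skeleton; return required files the user must still supply."""
    base = Path(root)
    for name in WORK_DIRS:
        (base / name).mkdir(parents=True, exist_ok=True)
    # Existing files are never touched; missing ones are reported, not faked.
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(scaffold_project(tmp))  # a fresh directory lacks every required file
```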

job.yaml anatomy

Annotated example (see assets/examples/neptrain/job.yaml):

version: 2.0.0
work_path: ./cache
current_job: nep
init_train_xyz: ./train.xyz      # starting training set
init_nep_txt: ./nep.txt          # starting model

nep:
  nep_restart: true              # use nep.restart for warm-start
  nep_restart_step: 10000        # generations per retrain cycle
  nep_in_path: ./nep.in
  test_xyz_path: ./test.xyz
  machine:                       # execution backend
    context_type: LazyLocal
    batch_type: Shell
    local_root: ./
    remote_root: ./
  resources:
    number_node: 1
    gpu_per_node: 1
    group_size: 1

gpumd:
  step_times:                    # MD run length for each successive iteration
  - 10
  - 50
  - 100
  temperature_every_step:        # temperatures to sample (K)
  - 300
  - 600
  model_path: ./structure        # directory containing model.xyz
  run_in_path: ./run.in          # GPUMD run.in for sampling
  machine:
    context_type: LazyLocal
    batch_type: Shell
    local_root: ./
    remote_root: ./
  resources:
    number_node: 1
    gpu_per_node: 1
    group_size: 1

select:
  max_selected: 20               # max frames to select per iteration
  min_distance: 0.01             # FPS min-distance threshold
  filter: 0.6                    # descriptor-space diversity filter
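The select block maps onto greedy farthest-point (max-min) sampling in descriptor space. The sketch below illustrates that technique in pure Python; it is not NepTrain's code. It assumes each frame has already been reduced to a fixed-length descriptor vector. It also treats min_distance as a Euclidean stopping threshold in those descriptor units and does not model the filter parameter.

```python
import math
from typing import List, Sequence


def farthest_point_sampling(
    descriptors: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]] = (),
    max_selected: int = 20,
    min_distance: float = 0.01,
) -> List[int]:
    """Greedy max-min selection; returns indices into `descriptors`.

    `reference` holds descriptors already in the training set, so candidates
    close to existing data are treated as redundant from the start.
    """
    # Distance from each candidate to its nearest reference or selected point.
    nearest = [min((math.dist(d, r) for r in reference), default=math.inf)
               for d in descriptors]
    selected: List[int] = []
    while len(selected) < max_selected:
        best = max(range(len(descriptors)), key=nearest.__getitem__, default=None)
        # Stop when even the most isolated candidate is closer than min_distance.
        if best is None or nearest[best] < min_distance:
            break
        selected.append(best)
        for i, d in enumerate(descriptors):
            nearest[i] = min(nearest[i], math.dist(d, descriptors[best]))
        nearest[best] = -math.inf  # never pick the same frame twice
    return selected


if __name__ == "__main__":
    frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [10.0, 0.0]]
    print(farthest_point_sampling(frames, max_selected=10, min_distance=0.5))
```

The example prints `[0, 3, 2]`. The frame at 0.1 is rejected because it lies within min_distance of the frame at 0, so raising min_distance prunes more aggressively.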

Key parameters:

nep_restart_step: keep low (5000–20000) during exploration, increase for final refinement.
step_times: list of MD durations per cycle. Longer runs explore more phase space but risk model-driven artifacts.
temperature_every_step: temperatures to scan. Cover the range of interest for your target property.
max_selected: cap on new frames per iteration. Too many frames per cycle risks adding correlated data.
min_distance / filter: control descriptor-space diversity. Higher min_distance → more aggressive pruning (fewer, more widely spaced frames per iteration); higher filter → stricter outlier rejection.
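Once job.yaml is loaded into a dictionary, you can check the rules of thumb above before launching a loop. A loader such as PyYAML's yaml.safe_load would work; this sketch assumes one but does not import it. check_job_config is a hypothetical helper. The 5000–20000 window comes from this section, while the 20-frame default cap simply mirrors the example value and is an assumption.

```python
from typing import Any, Dict, List, Sequence


def check_job_config(cfg: Dict[str, Any], target_temps: Sequence[float],
                     max_selected_cap: int = 20) -> List[str]:
    """Return human-readable warnings for risky settings in a loaded job.yaml."""
    warnings: List[str] = []
    nep = cfg.get("nep", {})
    gpumd = cfg.get("gpumd", {})
    select = cfg.get("select", {})

    step = nep.get("nep_restart_step")
    if step is None or not 5000 <= step <= 20000:
        warnings.append(f"nep_restart_step={step}: use 5000-20000 while exploring")

    temps = gpumd.get("temperature_every_step") or []
    uncovered = [t for t in target_temps
                 if not temps or not min(temps) <= t <= max(temps)]
    if uncovered:
        warnings.append(f"target temperatures {uncovered} K are outside temperature_every_step")

    n_sel = select.get("max_selected")
    if n_sel is None or n_sel > max_selected_cap:
        warnings.append(f"max_selected={n_sel}: keep it at or below {max_selected_cap}")

    test_path = nep.get("test_xyz_path")
    if not test_path:
        warnings.append("nep.test_xyz_path is missing: hold out a test set")
    elif test_path == cfg.get("init_train_xyz"):
        warnings.append("test_xyz_path equals init_train_xyz: the test set is not held out")
    return warnings


if __name__ == "__main__":
    cfg = {
        "init_train_xyz": "./train.xyz",
        "nep": {"nep_restart_step": 10000, "test_xyz_path": "./test.xyz"},
        "gpumd": {"temperature_every_step": [300, 600]},
        "select": {"max_selected": 20},
    }
    print(check_job_config(cfg, target_temps=[300, 900]))
```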

Running NepTrain

# Initialize project
neptrain init

# Run the active-learning loop
neptrain run job.yaml

NepTrain manages the retrain/sample/select loop automatically. Monitor the cache/ directory for per-iteration outputs.

When to stop

Stop the loop when:

prediction RMSE on the test set plateaus across iterations
new iterations add fewer than 2–3 frames
the target physical observable converges (check with a downstream MD)

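Do not rely on training loss alone: it only measures the fit to frames already in train.xyz, so it can keep decreasing while the model still extrapolates poorly to new configurations.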
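The stop rules above can be encoded as a helper that reports which criteria currently hold. The helper deliberately ignores training loss and leaves the any-versus-all decision to you. The thresholds are illustrative assumptions, not NepTrain defaults: a 5% relative spread over the last three iterations, fewer than 3 new frames, and a 5% change in the observable.

```python
from typing import List, Sequence


def stop_criteria_met(test_rmse: Sequence[float], new_frames: Sequence[int],
                      observable: Sequence[float] = (), rel_tol: float = 0.05,
                      min_new_frames: int = 3, window: int = 3) -> List[str]:
    """List which of the three stop criteria hold for the latest iteration."""
    met: List[str] = []
    # 1. Test-set RMSE plateau: relative spread over the last `window` iterations.
    if len(test_rmse) >= window:
        recent = list(test_rmse)[-window:]
        top = max(recent)
        if top > 0 and (top - min(recent)) / top < rel_tol:
            met.append("test_rmse_plateau")
    # 2. Selection dried up: the last iteration added too few frames.
    if new_frames and new_frames[-1] < min_new_frames:
        met.append("few_new_frames")
    # 3. Downstream observable changed by less than rel_tol between checks.
    if len(observable) >= 2:
        prev, last = observable[-2], observable[-1]
        if abs(last - prev) <= rel_tol * abs(prev):
            met.append("observable_converged")
    return met


if __name__ == "__main__":
    print(stop_criteria_met(test_rmse=[10.0, 5.0, 4.9, 4.85, 4.8],
                            new_frames=[20, 12, 2],
                            observable=[1.00, 1.02]))
```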

NepTrainKit workflow

NepTrainKit provides an interactive GUI for:

visualizing descriptor space (PCA / t-SNE of NEP descriptors)
identifying outlier frames
manually accepting / rejecting frames before retraining
comparing parity across config types
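NepTrainKit performs outlier detection interactively. As a tool-independent illustration of one common rule, you can flag frames whose NEP-versus-DFT error sits far from the median error. Distance is measured in robust units of the median absolute deviation. This is not necessarily the rule NepTrainKit implements. The sketch assumes per-frame quantities (for example energies in eV/atom) have already been extracted for both the reference and the model.

```python
from statistics import median
from typing import List, Sequence


def parity_outliers(reference: Sequence[float], predicted: Sequence[float],
                    k: float = 5.0) -> List[int]:
    """Indices of frames whose error lies more than k robust std devs from the median error."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        return []
    errors = [p - r for r, p in zip(reference, predicted)]
    center = median(errors)
    # 1.4826 * MAD estimates the standard deviation for normally distributed errors.
    scale = 1.4826 * median(abs(e - center) for e in errors)
    if scale == 0:
        return []  # no spread to judge against
    return [i for i, e in enumerate(errors) if abs(e - center) > k * scale]


if __name__ == "__main__":
    dft = [0.0] * 6                              # reference values, e.g. eV/atom
    nep = [0.01, -0.01, 0.02, 0.0, -0.02, 1.0]   # last frame is badly predicted
    print(parity_outliers(dft, nep))
```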

Typical usage

from NepTrainKit import NepTrainKit

kit = NepTrainKit()
kit.load_dataset("train.xyz")
kit.load_model("nep.txt")
kit.visualize()          # opens interactive window
kit.export("curated.xyz")  # save after manual curation

Or via the CLI:

NepTrainKit

When to use NepTrainKit vs. NepTrain

Scenario	Recommended
Automated loop, many iterations	NepTrain
Manual inspection of a suspicious dataset	NepTrainKit
Post-loop cleanup before final production fit	NepTrainKit
Debugging why a model is unstable	NepTrainKit (outlier detection)

Stability warnings

Active-learning loops can diverge. Common failure modes:

Snowball effect: the model samples unphysical states, which get labeled and added, making the next model worse. Fix: set conservative step_times and validate each cycle's MD with a short NVE sanity check.
DFT inconsistency: different iterations use different DFT settings (k-points, functional, convergence). Fix: lock the DFT workflow before starting the loop.
Correlated frames: selecting too many frames from a single trajectory adds redundant data. Fix: keep max_selected low and min_distance high.
Under-constrained loop: no test set, no validation, just training loss. Fix: always hold out a test set and check a physical observable every few iterations.
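For the short NVE sanity check suggested under the snowball effect, one simple metric is the linear drift of the total energy. This minimal sketch assumes you have already extracted times and total energies (kinetic plus potential; GPUMD reports both in thermo.out). The acceptable drift is system-dependent, so it is deliberately not hard-coded.

```python
from typing import Sequence


def energy_drift_per_atom(times_ps: Sequence[float], total_energy_ev: Sequence[float],
                          n_atoms: int) -> float:
    """Least-squares slope of total energy versus time, in eV/atom/ps."""
    n = len(times_ps)
    if n < 2 or n != len(total_energy_ev):
        raise ValueError("need at least two time/energy samples of equal length")
    t_mean = sum(times_ps) / n
    e_mean = sum(total_energy_ev) / n
    cov = sum((t - t_mean) * (e - e_mean) for t, e in zip(times_ps, total_energy_ev))
    var = sum((t - t_mean) ** 2 for t in times_ps)
    if var == 0:
        raise ValueError("time samples must not all be equal")
    return cov / var / n_atoms


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0]
    energies = [-100.0, -99.9, -99.8, -99.7]  # total = kinetic + potential energy
    print(f"{energy_drift_per_atom(times, energies, n_atoms=10):.4f} eV/atom/ps")
```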

Agent checklist

baseline NEP is already working before starting automation
job.yaml covers the target temperature and pressure range
DFT labeling workflow is locked and documented
max_selected and min_distance are conservative
test set is held out and not contaminated by the loop
downstream observable checked every few iterations
final curated dataset inspected with NepTrainKit before production fit
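To confirm the test set is not contaminated by the loop, compare frame fingerprints between train.xyz and test.xyz. This minimal sketch assumes frames have already been read into (symbols, positions) pairs, for example with ASE's ase.io.read(path, index=':') (not shown). Rounding to four decimals is an arbitrary choice. The check catches duplicated frames, not merely similar ones, and it ignores the cell.

```python
import hashlib
from typing import Iterable, List, Sequence, Tuple

# A frame as (chemical symbols, Cartesian positions); reading extxyz is out of scope here.
Frame = Tuple[Sequence[str], Sequence[Sequence[float]]]


def frame_fingerprint(symbols: Sequence[str], positions: Sequence[Sequence[float]],
                      decimals: int = 4) -> str:
    """SHA-256 of species plus positions rounded to `decimals` places."""
    h = hashlib.sha256()
    for sym, xyz in zip(symbols, positions):
        coords = ",".join(f"{x:.{decimals}f}" for x in xyz)
        h.update(f"{sym}:{coords};".encode())
    return h.hexdigest()


def contaminated_test_frames(train: Iterable[Frame], test: Iterable[Frame]) -> List[int]:
    """Indices of test frames whose fingerprint also appears in the training set."""
    seen = {frame_fingerprint(s, p) for s, p in train}
    return [i for i, (s, p) in enumerate(test) if frame_fingerprint(s, p) in seen]


if __name__ == "__main__":
    train = [(["H", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]])]
    test = [(["H", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74000001]]),
            (["H", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.80]])]
    print(contaminated_test_frames(train, test))
```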

Read first

references/fine-tuning-playbook.md

Read when needed:

Bundled templates

assets/examples/neptrain/job.yaml

Expected output

a conservative automation plan with explicit selection and labeling stages
the required control files (job.yaml, nep.in, run.in)
warnings about unstable or under-constrained active-learning loops
a validation strategy for each iteration

References

NepTrain: https://github.com/aboys-cb/NepTrain
NepTrainKit: https://github.com/aboys-cb/NepTrainKit
GPUMD-Tutorials: active-learning examples