
accessing-mlflow

Name: Accessing MLflow
Author: NVIDIA



stars:2,749

forks:405

updated:April 27, 2026 at 04:23

SKILL.md


name	accessing-mlflow
description	Query and browse evaluation results stored in MLflow. Use when the user wants to look up runs by invocation ID, compare metrics across models, fetch artifacts (configs, logs, results), or set up the MLflow MCP server. ALWAYS triggers on mentions of MLflow, experiment results, run comparison, invocation IDs in the context of results, or MLflow MCP setup.
license	Apache-2.0

Accessing MLflow

MCP Server

mlflow-mcp gives agents direct access to MLflow: they can query runs, compare metrics, and browse artifacts, all through natural language.

ID Convention

When the user provides a hex ID (e.g. 71f3f3199ea5e1f0) without specifying what it is, assume it is an invocation_id (not an MLflow run_id). An invocation_id identifies a launcher invocation and is stored as both a tag and a param on MLflow runs. One invocation can produce multiple MLflow runs (one per task). You may need to search across multiple experiments if you don't know which experiment the run belongs to.
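Because one invocation can fan out into several runs, it can help to group search results by invocation_id before reading them. The sketch below assumes each run is a plain dict with a "tags" mapping; that shape, the function name, and the sample records are hypothetical and are not the documented output of the MCP tools.

```python
from collections import defaultdict


def group_tasks_by_invocation(runs):
    """Map each invocation_id to the sorted task names of its runs.

    `runs` is a list of dicts with a "tags" mapping. This shape is an
    assumption made for illustration, not the MCP server's real output.
    """
    grouped = defaultdict(list)
    for run in runs:
        tags = run.get("tags", {})
        invocation_id = tags.get("invocation_id")
        if invocation_id is None:
            continue  # no invocation_id tag on this run, so skip it
        grouped[invocation_id].append(tags.get("task_name", "<unknown>"))
    return {inv: sorted(tasks) for inv, tasks in grouped.items()}


# Made-up records for illustration only.
example_runs = [
    {"run_id": "a1", "tags": {"invocation_id": "71f3f3199ea5e1f0", "task_name": "task_b"}},
    {"run_id": "a2", "tags": {"invocation_id": "71f3f3199ea5e1f0", "task_name": "task_a"}},
    {"run_id": "a3", "tags": {"task_name": "task_c"}},
]
print(group_tasks_by_invocation(example_runs))
```

Runs collected from several experiments can be concatenated into one list before grouping, which covers the case where the experiment is not known up front.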

Querying Runs

# Find runs by invocation_id
MLflow:search_runs_by_tags(experiment_id, {"invocation_id": "<invocation_id>"})

# Query runs by model or task name
MLflow:query_runs(experiment_id, "tags.model LIKE '%<model>%'")
MLflow:query_runs(experiment_id, "tags.task_name LIKE '%<task_name>%'")

# Get a config from run's artifacts
MLflow:get_artifact_content(run_id, "config.yml")

# Get nested stats from run's artifacts
MLflow:get_artifact_content(run_id, "artifacts/eval_factory_metrics.json")

NOTE: You WILL NOT find PENDING, RUNNING, KILLED, or FAILED runs in MLflow! Only SUCCESSFUL runs are exported to MLflow.
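The arguments passed to the calls above are ordinary strings and mappings, so they can be assembled programmatically. The following is a minimal sketch; the helper names are hypothetical, the model fragment "llama" is a made-up value, and quoting rules for values that contain a single quote are deliberately left out rather than guessed.

```python
def like_filter(tag_name, fragment):
    """Build a filter string of the form tags.<tag_name> LIKE '%<fragment>%'."""
    if "'" in fragment:
        # Escaping rules are not covered here, so refuse rather than guess.
        raise ValueError("fragment must not contain a single quote")
    return f"tags.{tag_name} LIKE '%{fragment}%'"


def invocation_tags(invocation_id):
    """Build the tag mapping used to look up runs by invocation_id."""
    return {"invocation_id": invocation_id}


print(like_filter("model", "llama"))
print(invocation_tags("71f3f3199ea5e1f0"))
```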

Workflow Tips

When comparing metrics across runs, fetch the data via MCP, then run the computation in Python for exact results rather than doing math in-context:

uv run --with pandas python3 << 'EOF'
import pandas as pd
# ... compute deltas, averages, etc.
EOF
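As one possible shape for the computation step, the sketch below computes per-metric deltas and their average with the standard library only, so it runs without extra packages; the same logic carries over to a pandas DataFrame. The metric names and values are invented for illustration and do not come from any real run.

```python
from statistics import mean


def metric_deltas(baseline, candidate):
    """Return candidate minus baseline for every metric present in both runs."""
    shared = sorted(set(baseline) & set(candidate))
    return {name: candidate[name] - baseline[name] for name in shared}


# Invented numbers; real values would come from the fetched run metrics.
baseline = {"accuracy": 0.5, "f1": 0.25}
candidate = {"accuracy": 0.75, "f1": 0.5, "extra": 1.0}

deltas = metric_deltas(baseline, candidate)
print(deltas)
print(mean(list(deltas.values())))
```

Metrics that appear in only one of the two runs (here "extra") are dropped, which avoids silently comparing against a missing value.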

Artifacts Structure

<harness>.<task>/
├── artifacts/
│   ├── config.yml                # Fully resolved config used during the evaluation
│   ├── launcher_unresolved_config.yaml # Unresolved config passed to the launcher
│   ├── results.yml               # All results in YAML format
│   ├── eval_factory_metrics.json # Runtime stats (latency, token count, memory)
│   ├── report.html               # Request-response pair samples in HTML format (if enabled)
│   └── report.json               # Request-response pair samples in JSON format (if enabled)
└── logs/
    ├── client-*.log              # Evaluation client
    ├── server-*-N.log            # Deployment per node
    ├── slurm-*.log               # Slurm job
    └── proxy-*.log               # Request proxy
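When browsing a run's logs/ directory, the filename patterns in the tree above can be used to tell which component wrote a file. This is a small sketch using glob-style matching; the helper name is hypothetical, and reading the trailing N in server-*-N.log as an arbitrary node suffix is an assumption.

```python
from fnmatch import fnmatchcase

# Patterns copied from the logs/ listing. "server-*-*.log" stands in for
# "server-*-N.log", assuming N is a per-node suffix.
LOG_PATTERNS = [
    ("client-*.log", "Evaluation client"),
    ("server-*-*.log", "Deployment per node"),
    ("slurm-*.log", "Slurm job"),
    ("proxy-*.log", "Request proxy"),
]


def describe_log(filename):
    """Return the component label for a log filename, or None if unknown."""
    for pattern, label in LOG_PATTERNS:
        if fnmatchcase(filename, pattern):
            return label
    return None


for name in ["client-123.log", "server-abc-0.log", "report.json"]:
    print(name, "->", describe_log(name))
```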

Troubleshooting

If the MLflow MCP server fails to load or its tools are unavailable:

If uvx is not found, install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

If the MCP server is not configured, add the config and restart the agent:

For Claude Code, add the following to .claude/settings.json (project or user level), under "mcpServers":

"MLflow": {
  "command": "uvx",
  "args": ["mlflow-mcp"],
  "env": {
    "MLFLOW_TRACKING_URI": "https://<your-mlflow-server>/"
  }
}

For Cursor, edit ~/.cursor/mcp.json (Settings > Tools & MCP > New MCP Server):

{
  "mcpServers": {
    "MLflow": {
      "command": "uvx",
      "args": ["mlflow-mcp"],
      "env": {
        "MLFLOW_TRACKING_URI": "https://<your-mlflow-server>/"
      }
    }
  }
}
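If the config file already lists other servers, the MLflow entry should be merged in rather than pasted over them. The sketch below shows one way to do that on an in-memory dict and to check whether uvx is on the PATH; the function names and the "other" server entry are hypothetical, and reading or writing the actual settings file is left out.

```python
import copy
import json
import shutil


def add_mlflow_server(config, tracking_uri):
    """Return a copy of `config` with the MLflow server entry added."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = updated.setdefault("mcpServers", {})
    servers["MLflow"] = {
        "command": "uvx",
        "args": ["mlflow-mcp"],
        "env": {"MLFLOW_TRACKING_URI": tracking_uri},
    }
    return updated


def uvx_available():
    """Report whether a uvx executable can be found on the PATH."""
    return shutil.which("uvx") is not None


# "other" is a placeholder for a server that is already configured.
existing = {"mcpServers": {"other": {"command": "example"}}}
merged = add_mlflow_server(existing, "https://<your-mlflow-server>/")
print(json.dumps(merged, indent=2))
print("uvx on PATH:", uvx_available())
```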

related-skills.json

same repository

evaluation.md

from "NVIDIA/Model-Optimizer"

Evaluates accuracy of quantized or unquantized LLMs using NeMo Evaluator Launcher (NEL). Triggers on "evaluate model", "benchmark accuracy", "run MMLU", "evaluate quantized model", "run nel". Handles deployment, config generation, and evaluation execution. Not for quantizing models (use ptq), deploying/serving models (use deployment), or comparing completed baseline-vs-quantized results (use compare-results).

2026-05-22 · 2.7k

compare-results.md

from "NVIDIA/Model-Optimizer"

Establish baseline-vs-candidate evaluation plans, delegate missing evaluations, compare validated results, and decide quantization feasibility. Use when the user asks to compare baseline vs quantized runs, explain an accuracy drop/regression, verify whether a quantized checkpoint is acceptable, or compare NEL/MLflow evaluation outputs. Do NOT use for generic single-model evaluation without comparison intent (use evaluation), live NEL status/debugging (use launching-evals), or generic MLflow browsing without a comparison goal (use accessing-mlflow).

2026-05-21 · 2.7k

deployment.md

from "NVIDIA/Model-Optimizer"

Serve a quantized or unquantized LLM checkpoint as an OpenAI-compatible API endpoint using vLLM, SGLang, or TRT-LLM. Use when user says "deploy model", "serve model", "start vLLM server", "launch SGLang", "TRT-LLM deploy", "AutoDeploy", "benchmark throughput", "serve checkpoint", or needs an inference endpoint from a HuggingFace or ModelOpt-quantized checkpoint. Do NOT use for quantizing models (use ptq) or evaluating accuracy (use evaluation).

2026-05-21 · 2.7k

launching-evals.md

from "NVIDIA/Model-Optimizer"

Run, monitor, analyze, and debug LLM evaluations via nemo-evaluator-launcher. Covers running evaluations, checking status and live progress, debugging failed runs, exporting artifacts and logs, and analyzing results. ALWAYS triggers on mentions of running evaluations, checking progress, debugging failed evals, analyzing or analysing runs or results, run directories or artifact paths on clusters, Slurm job issues, invocation IDs, or inspecting logs (client logs, server logs, SSH to cluster, tail logs, grep logs). Do NOT use for creating or modifying evaluation configs.

2026-05-21 · 2.7k

monitor.md

from "NVIDIA/Model-Optimizer"

Monitor submitted jobs (PTQ, evaluation, deployment) on SLURM clusters. Use when the user asks "check job status", "is my job done", "monitor my evaluation", "what's the status of the PTQ", "check on job <slurm_job_id>", or after any skill submits a long-running job. Also triggers on "nel status", "squeue", or any request to check progress of a previously submitted job.

2026-05-21 · 2.7k

ptq.md

from "NVIDIA/Model-Optimizer"

This skill should be used when the user asks to "quantize a model", "run PTQ", "post-training quantization", "NVFP4 quantization", "FP8 quantization", "INT8 quantization", "INT4 AWQ", "quantize LLM", "quantize MoE", "quantize VLM", or needs to produce a quantized HuggingFace or TensorRT-LLM checkpoint from a pretrained model using ModelOpt.

2026-05-21 · 2.7k

package.json

"author": "NVIDIA"

"repository": "NVIDIA/Model-Optimizer"


