| name | arize-datasets |
| description | Manage datasets in Arize AI using the ax CLI. Use when users want to list datasets, get dataset details, create new datasets, delete datasets, export dataset data, or work with dataset examples. Triggers on "list datasets", "create dataset", "ax datasets", "export dataset", "delete dataset", or any request about managing Arize datasets via CLI. |
Arize AX Datasets
Manage datasets in the Arize AI platform using the ax CLI.
Prerequisites
The user must have:
- Arize AX CLI installed (pip install arize-ax-cli)
- CLI configured with valid credentials (ax config init)
Core Dataset Commands
List All Datasets
ax datasets list
Options:
--output <format> - Output format: table (default), json, csv, parquet
--profile <name> - Use specific configuration profile
--limit <n> - Limit number of results
--offset <n> - Skip first n results (pagination)
Examples:
ax datasets list
ax datasets list --output json
ax datasets list --limit 10 --offset 0
ax datasets list --profile production
Extracting Dataset IDs:
To find a specific dataset ID for use in other operations:
ax datasets list --output json | jq '.[] | {id: .id, name: .name}'
ax datasets list --output json | jq -r '.[] | select(.name == "Training Data") | .id'
DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "Training Data") | .id')
echo "Found dataset: $DATASET_ID"
ax datasets get "$DATASET_ID"
ax datasets delete "$DATASET_ID"
Without jq (using grep; fragile, because it depends on how the JSON output is split across lines):
ax datasets list --output json | grep -A 2 "Training Data" | grep "id"
ax datasets list --output json | grep -B 1 '"name": "Training Data"' | grep "id" | cut -d'"' -f4
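The name lookups above silently assume that exactly one dataset matches. The following is a minimal sketch of a guard for that assumption; `require_single_id` is a hypothetical helper defined here, not part of the ax CLI, and it only inspects the IDs it is given on stdin.

```shell
# Read candidate IDs (one per line) from stdin. Print the ID only when
# exactly one was supplied; otherwise explain the problem on stderr.
# Hypothetical helper for illustration, not an ax command.
require_single_id() {
  local ids count
  # Drop blank lines; "|| true" keeps an empty result from being an error here.
  ids=$(grep -v '^[[:space:]]*$' || true)
  if [ -z "$ids" ]; then
    echo "No dataset matched that name." >&2
    return 1
  fi
  count=$(printf '%s\n' "$ids" | wc -l | tr -d ' ')
  if [ "$count" -ne 1 ]; then
    echo "Expected one match, found $count; refine the name." >&2
    return 1
  fi
  printf '%s\n' "$ids"
}

# Usage (same jq filter as above):
#   DATASET_ID=$(ax datasets list --output json \
#     | jq -r '.[] | select(.name == "Training Data") | .id' \
#     | require_single_id) || exit 1
```

Failing early here matters most before a delete, where an empty or multi-line `$DATASET_ID` would otherwise be passed straight to the CLI.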
Get Dataset Details
Retrieve information about a specific dataset:
ax datasets get <dataset-id>
Options:
--output <format> - Output format
--profile <name> - Configuration profile to use
Examples:
ax datasets get ds_abc123xyz
ax datasets get ds_abc123xyz --output json
ax datasets get ds_abc123xyz --profile production
Create a New Dataset
Create a dataset from a file:
ax datasets create --file <path> [options]
Supported File Formats:
- CSV (.csv)
- JSON (.json, .jsonl)
- Parquet (.parquet)
Options:
--name <name> - Dataset name (required or inferred from filename)
--description <text> - Dataset description
--profile <name> - Configuration profile to use
Examples:
ax datasets create --file data.csv --name "Training Data" --description "Production training set"
ax datasets create --file examples.json --name "Test Examples"
ax datasets create --file dataset.parquet --name "Large Dataset"
ax datasets create --file data.csv --name "Test Data" --profile staging
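A local pre-check can catch a missing file or an unsupported extension before the upload starts. This is a sketch only: `create_dataset_checked` is a hypothetical wrapper, and the extension list simply mirrors the supported formats listed above.

```shell
# Validate the file locally, then hand off to "ax datasets create".
# Hypothetical wrapper; the extension list mirrors the formats named above.
create_dataset_checked() {
  local file="$1" name="$2"
  if [ ! -f "$file" ]; then
    echo "File not found: $file" >&2
    return 1
  fi
  case "$file" in
    *.csv|*.json|*.jsonl|*.parquet) ;;   # supported extensions
    *)
      echo "Unsupported extension: $file" >&2
      return 1
      ;;
  esac
  ax datasets create --file "$file" --name "$name"
}

# Usage:
#   create_dataset_checked data.csv "Training Data"
```

The check looks only at the file name, so a mislabeled file (for example, JSON content saved as .csv) would still reach the CLI.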
Delete a Dataset
Remove a dataset from Arize:
ax datasets delete <dataset-id>
Options:
--profile <name> - Configuration profile to use
--yes or -y - Skip confirmation prompt
Examples:
ax datasets delete ds_abc123xyz
ax datasets delete ds_abc123xyz --yes
ax datasets delete ds_abc123xyz --profile production
⚠️ Warning: Deletion is permanent. Always verify the dataset ID before deleting.
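One way to follow the warning above is to display the dataset first and only then call delete. The sketch below uses just the `get` and `delete` commands already shown; `show_then_delete` is a hypothetical name, and it deliberately omits `--yes` so the CLI's own confirmation prompt still appears.

```shell
# Show the dataset, then delete it. If the lookup fails, nothing is deleted.
# Hypothetical helper; --yes is intentionally not passed.
show_then_delete() {
  local id="$1"
  if [ -z "$id" ]; then
    echo "usage: show_then_delete <dataset-id>" >&2
    return 2
  fi
  ax datasets get "$id" || {
    echo "Could not fetch $id; nothing deleted." >&2
    return 1
  }
  ax datasets delete "$id"
}

# Usage:
#   show_then_delete ds_abc123xyz
```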
Export Dataset Data
Export dataset examples to various formats:
ax datasets get <dataset-id> --output <format>
Export Formats:
json - JSON format
csv - Comma-separated values
parquet - Apache Parquet format
Examples:
ax datasets get ds_abc123xyz --output json > dataset.json
ax datasets get ds_abc123xyz --output csv > dataset.csv
ax datasets get ds_abc123xyz --output parquet > dataset.parquet
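A plain `>` redirect creates or truncates the target file even when the command fails. A minimal sketch that avoids this by writing to a temporary name first; `export_dataset` and the `.partial` suffix are illustrative choices, not CLI features.

```shell
# Export to "<out>.partial" and rename only after a successful, non-empty
# export, so a failed run never clobbers an existing file.
# Hypothetical helper for illustration.
export_dataset() {
  local id="$1" format="$2" out="$3"
  case "$format" in
    json|csv|parquet) ;;
    *)
      echo "Unsupported export format: $format" >&2
      return 2
      ;;
  esac
  if ax datasets get "$id" --output "$format" > "$out.partial" \
     && [ -s "$out.partial" ]; then
    mv "$out.partial" "$out"
  else
    rm -f "$out.partial"
    echo "Export of $id failed; $out left untouched." >&2
    return 1
  fi
}

# Usage:
#   export_dataset ds_abc123xyz csv dataset.csv
```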
Working with Multiple Profiles
When working across different environments (dev, staging, production):
ax datasets list --profile production
ax datasets create --file test_data.csv --profile staging
ax datasets get ds_dev_123 --profile dev
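Profiles also make it possible to copy a dataset from one environment to another by chaining an export and a create. The sketch below adds basic failure handling around those two commands; `copy_dataset_between_profiles` is a hypothetical helper, and it assumes the JSON written by `get` is accepted as input by `create`.

```shell
# Export a dataset from one profile and create a copy under another.
# Hypothetical helper; assumes "get --output json" produces a file that
# "create --file" accepts.
copy_dataset_between_profiles() {
  local id="$1" src="$2" dst="$3" name="$4" tmpdir status=0
  tmpdir=$(mktemp -d) || return 1
  if ax datasets get "$id" --profile "$src" --output json > "$tmpdir/export.json" \
     && [ -s "$tmpdir/export.json" ]; then
    ax datasets create --file "$tmpdir/export.json" --name "$name" --profile "$dst" \
      || status=1
  else
    echo "Export of $id from profile '$src' failed or was empty." >&2
    status=1
  fi
  rm -rf "$tmpdir"   # always clean up the temporary export
  return "$status"
}

# Usage:
#   copy_dataset_between_profiles ds_prod_123 production staging "Production Copy"
```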
Pagination for Large Results
For accounts with many datasets, use pagination:
ax datasets list --limit 10 --offset 0
ax datasets list --limit 10 --offset 10
ax datasets list --limit 10 --offset 20
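Rather than typing each offset by hand, a loop can advance the offset until a page comes back empty. This is a sketch under one stated assumption: a page past the end is printed either as an empty JSON array (`[]`) or as nothing at all. `list_all_pages` is a hypothetical name.

```shell
# Print every page of "ax datasets list" as JSON, one page after another.
# Assumption: a page past the last dataset is "[]" or empty output.
list_all_pages() {
  local page_size="${1:-10}" offset=0 page compact
  while :; do
    page=$(ax datasets list --output json \
             --limit "$page_size" --offset "$offset") || return 1
    # Strip whitespace so "[ ]" and "[]" are treated the same.
    compact=$(printf '%s' "$page" | tr -d '[:space:]')
    if [ -z "$compact" ] || [ "$compact" = "[]" ]; then
      break
    fi
    printf '%s\n' "$page"
    offset=$((offset + page_size))
  done
}

# Usage:
#   list_all_pages 10 > all_pages.json
```

The output is a sequence of JSON arrays rather than a single array, so downstream tools need to handle multiple documents (jq does this by default).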
Common Workflows
Workflow 1: Find Dataset by Name and Get Details
ax datasets list --output json | jq '.[] | {id: .id, name: .name}'
DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "Production Data") | .id')
ax datasets get "$DATASET_ID"
ax datasets get "$DATASET_ID" --output csv > dataset_export.csv
Workflow 2: Create and Verify Dataset
ax datasets create --file data.csv --name "My Dataset"
DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "My Dataset") | .id')
echo "Created dataset: $DATASET_ID"
ax datasets get "$DATASET_ID"
Workflow 3: Export, Modify, and Re-upload
ax datasets get ds_abc123 --output csv > dataset.csv
ax datasets create --file dataset.csv --name "Updated Dataset v2"
Workflow 4: Migrate Dataset Between Environments
ax datasets get ds_prod_123 --profile production --output json > prod_data.json
ax datasets create --file prod_data.json --name "Production Copy" --profile staging
Workflow 5: Cleanup Old Datasets
ax datasets list --output json > all_datasets.json
ax datasets delete ds_old_001 --yes
ax datasets delete ds_old_002 --yes
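For longer cleanup lists, the IDs can be kept in a reviewed text file and deleted in a loop. The sketch below defaults to a dry run and deletes only when `--apply` is passed; `delete_listed` and the file name `old_ids.txt` are hypothetical.

```shell
# Delete every dataset ID listed in a file (one ID per line; blank lines
# and lines starting with "#" are skipped). Dry run unless --apply is given.
# Hypothetical helper for illustration.
delete_listed() {
  local list="$1" mode="${2:-dry-run}" id failed=0
  while IFS= read -r id || [ -n "$id" ]; do
    case "$id" in
      ''|'#'*) continue ;;
    esac
    if [ "$mode" = "--apply" ]; then
      # </dev/null keeps the CLI from consuming the rest of the ID list.
      ax datasets delete "$id" --yes </dev/null || {
        echo "Failed to delete $id" >&2
        failed=1
      }
    else
      echo "Would delete: $id"
    fi
  done < "$list"
  return "$failed"
}

# Usage:
#   delete_listed old_ids.txt           # preview only
#   delete_listed old_ids.txt --apply   # actually delete
```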
Output Format Examples
Table Format (Default)
Human-readable table with columns for ID, Name, Created, and Status.
JSON Format
Structured JSON with full dataset metadata:
{
"id": "ds_abc123xyz",
"name": "Training Data",
"description": "Production training set",
"created_at": "2024-01-15T10:30:00Z",
"num_examples": 1000,
"size_bytes": 52428800
}
CSV Format
Comma-separated values, useful for importing into spreadsheets or pandas.
Parquet Format
Efficient columnar format, ideal for large datasets and data processing.
Troubleshooting
"Dataset not found"
- Verify dataset ID: ax datasets list
- Check you're using the correct profile: ax config show
- Ensure the dataset exists in the current space/project
"Permission denied" or "Unauthorized"
- Check API key is valid: ax config show --expand
- Verify the key has dataset permissions in Arize
- Try re-authenticating: ax config init
"File format not supported"
Supported formats are CSV, JSON (including JSONL), and Parquet. Check:
- File extension is correct
- File is not corrupted
- File content matches the extension
Large dataset creation fails
For very large datasets:
- Check file size and network stability
- Try breaking into smaller chunks
- Use Parquet format for better compression
- Consider using the Arize Python SDK for programmatic uploads
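For the "smaller chunks" suggestion, a CSV can be split so that every chunk keeps the header row. This is a rough sketch: `split_csv` is a hypothetical helper, and it assumes no quoted field contains an embedded newline. Each chunk uploaded with `ax datasets create` would become its own dataset, since this document does not describe an append command.

```shell
# Split a CSV into chunks of at most ROWS data rows, repeating the header
# in each chunk. Output files are named <prefix>_001.csv, <prefix>_002.csv, ...
# Assumption: no quoted field spans multiple lines.
split_csv() {
  local file="$1" rows="$2" prefix="$3"
  if ! [ "$rows" -gt 0 ] 2>/dev/null; then
    echo "rows must be a positive integer" >&2
    return 2
  fi
  awk -v rows="$rows" -v prefix="$prefix" '
    NR == 1 { header = $0; next }          # remember the header row
    (NR - 2) % rows == 0 {                 # first data row of a new chunk
      if (out) close(out)
      out = sprintf("%s_%03d.csv", prefix, ++n)
      print header > out
    }
    { print > out }                        # data row goes to the current chunk
  ' "$file"
}

# Usage:
#   split_csv data.csv 50000 chunk
#   for f in chunk_*.csv; do ax datasets create --file "$f"; done
```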
Output is too large
For datasets with many examples:
- Use --limit to restrict output size
- Export to file instead of viewing in terminal: ax datasets get ds_abc123 --output json > dataset.json
- Use pagination with --limit and --offset
Tips
- Extract dataset IDs by name: DATASET_ID=$(ax datasets list --output json | jq -r '.[] | select(.name == "My Dataset") | .id')
- Use JSON output for scripting: ax datasets list --output json | jq '.[] | .id'
- List IDs and names together: ax datasets list --output json | jq '.[] | {id, name}'
- Redirect to files for export: Send large outputs to a file with > instead of printing them to the terminal
- Verify before delete: Use ax datasets get "$DATASET_ID" to confirm before deleting
- Profile naming: Use descriptive names like prod, staging, dev
- Save IDs to variables: Store dataset IDs in shell variables for reuse in scripts
- Check limits: Some operations may have rate limits or quotas
Next Steps
- View dataset details in Arize UI: https://app.arize.com
- Use datasets in experiments and evaluations
- Integrate with Arize Python SDK for programmatic access
- Set up CI/CD pipelines using the CLI
When to Use This Skill
Use this skill when users want to:
- ✅ List all datasets in their Arize account
- ✅ Get details about a specific dataset
- ✅ Create a new dataset from a local file
- ✅ Delete datasets they no longer need
- ✅ Export dataset data to different formats
- ✅ Work with datasets across multiple environments
- ✅ Troubleshoot dataset-related CLI issues
Don't use this skill for:
- ❌ GraphQL queries (use /arize-graphql-analytics instead)
- ❌ Installing/configuring the CLI (use /setup-arize-cli instead)
- ❌ Managing projects, models, or other Arize resources beyond datasets