| name | domo-data-generator |
| description | **Generating sample data for Domo** -- invoke when a user needs to create realistic sample datasets and upload them to a Domo instance. Primary signals: requests for sample data, demo data, test data, fake data for Domo; mentions of Salesforce, Google Analytics, QuickBooks, NetSuite, Google Ads, Facebook Ads, HubSpot, Marketo, or Health Portal sample data; questions about the datagen CLI or domo_data_generator. Covers: generating datasets, uploading to Domo, creating datasets in Domo, rolling dates, entity pools, connector icons, catalog management, and adding new dataset definitions. Skip for: real connector setup, production data pipelines, data transformations (Magic ETL), or Domo App Platform. |
Domo Sample Data Generator
Generate realistic, cross-referenced sample data for Domo using the datagen CLI.
Repository: https://github.com/brrink/domo_data_generator
Overview
The generator creates sample data mirroring major business platforms with consistent cross-source entity integrity, then uploads it to Domo. It includes:
- 18 pre-built datasets across 6 source categories (Salesforce, Google Analytics, Financial, Marketing, Health, AdPoint)
- YAML-driven catalog for easy dataset additions
- Shared entity pool (companies, people, products, sales reps, campaigns)
- Date rolling to keep data looking current
- Direct Domo integration (create datasets, upload, set connector icons)
- Structured JSON output by default (AI-agent-friendly)
- pipx installable -- runs from any directory
Setup
pipx install git+https://github.com/brrink/domo_data_generator.git
mkdir my-domo-data && cd my-domo-data
datagen init
If .env.example is missing or you want a clean start, create .env in the working directory with:
cat > .env <<'EOF'
DOMO_CLIENT_ID=your_client_id_here
DOMO_CLIENT_SECRET=your_client_secret_here
DOMO_API_HOST=api.domo.com
DOMO_INSTANCE=your_instance_name
DOMO_SET_CONNECTOR_TYPE=false
EOF
Required Environment Variables
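| Variable | Purpose |
|---|---|
| DOMO_CLIENT_ID | OAuth client identifier |
| DOMO_CLIENT_SECRET | OAuth client secret |
| DOMO_API_HOST | API endpoint hostname |
| DOMO_INSTANCE | Domo instance name |
| DOMO_DEVELOPER_TOKEN | Developer token, needed only for the connector icon commands (discover-types, set-type, set-type-all) |
| DOMO_SET_CONNECTOR_TYPE | Enable connector icon customization (optional, default: false) |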
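A quick pre-flight check can catch missing or placeholder credentials before a command fails against Domo. The sketch below is an illustration only and not part of datagen: the helpers parse_env and missing_vars are hypothetical names, and the command-to-variable mapping is taken from the requirements stated in this document.

```python
# Hypothetical pre-flight check; not part of the datagen CLI.
REQUIRED_BY_COMMAND = {
    "upload": ["DOMO_CLIENT_ID", "DOMO_CLIENT_SECRET"],
    "create-dataset": ["DOMO_CLIENT_ID", "DOMO_CLIENT_SECRET"],
    "set-type": ["DOMO_DEVELOPER_TOKEN", "DOMO_INSTANCE"],
    "discover-types": ["DOMO_DEVELOPER_TOKEN", "DOMO_INSTANCE"],
}

PLACEHOLDER_SUFFIX = "_here"  # template values such as your_client_id_here


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(command, env):
    """Return required variables that are unset, empty, or still placeholders."""
    missing = []
    for name in REQUIRED_BY_COMMAND.get(command, []):
        value = env.get(name, "")
        if not value or value.endswith(PLACEHOLDER_SUFFIX):
            missing.append(name)
    return missing


sample = """
DOMO_CLIENT_ID=abc123
DOMO_CLIENT_SECRET=your_client_secret_here
DOMO_INSTANCE=acme
"""
env = parse_env(sample)
print(missing_vars("upload", env))    # ['DOMO_CLIENT_SECRET']
print(missing_vars("set-type", env))  # ['DOMO_DEVELOPER_TOKEN']
print(missing_vars("generate", env))  # [] (offline command, nothing required)
```

In practice the text passed to parse_env would be the contents of the .env file in the working directory.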
Auth boundary note: domo_data_generator uses its own public-API/OAuth credential flow and does not run through community-domo-cli or ryuu session auth.
Current tooling boundary: most Product API automation should use community-domo-cli, but dataset creation and upload in this skill currently depend on the datagen CLI (python -m datagen) with OAuth credentials from .env (DOMO_CLIENT_ID / DOMO_CLIENT_SECRET).
CLI Reference
Entry point: datagen [OPTIONS] COMMAND [ARGS]
Global Options
| Option | Description |
|---|---|
| --verbose / -v | Enable verbose logging |
| --output / -o TEXT | Output format: json (default), table, yaml |
| --yes / -y | Skip confirmation prompts |
All commands emit structured JSON by default for easy machine parsing.
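Because the default output is JSON, a calling script can decode it directly. The following is a minimal sketch under two assumptions: that stdout contains only the JSON document, and that a non-zero exit code signals failure. run_json is a hypothetical helper, and the payload in the fake runner is invented purely to make the example runnable without a datagen install.

```python
import json
import subprocess


def run_json(args, runner=subprocess.run):
    """Run `datagen <args>` and decode its stdout as JSON."""
    result = runner(["datagen", *args], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "datagen failed")
    return json.loads(result.stdout)


class FakeResult:
    """Stand-in for subprocess.CompletedProcess in this demo."""
    returncode = 0
    stderr = ""
    stdout = '{"datasets": ["salesforce_accounts"]}'  # hypothetical payload shape


def fake_runner(cmd, **kwargs):
    # Ignores the command and returns a canned response.
    return FakeResult()


payload = run_json(["list"], runner=fake_runner)
print(payload["datasets"])
```

Dropping the runner argument makes the helper call the real CLI through subprocess.run.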
Init Command
init -- Initialize a working directory
datagen init
datagen init /path/to/dir
Copies the bundled catalog YAML files to ./catalog/, creates a .env template, and creates the ./data/ directory. Run this once before using the CLI in a new directory.
Core Commands
generate -- Generate sample data
datagen generate --all
datagen generate salesforce_opportunities
datagen generate --all --seed 42
datagen generate --all --dry-run
Requires an initialized entity pool. Run datagen pool regenerate before generate, even if your schema has no explicit entity_ref columns.
| Option | Description |
|---|---|
| name | Dataset name (YAML filename stem), optional |
| --all | Generate all datasets |
| --seed INTEGER | Random seed for reproducibility |
| --catalog-dir PATH | Catalog directory override |
| --data-dir PATH | Data directory override |
| --dry-run | Preview without writing files |
upload -- Upload data to Domo (full replace)
datagen upload --all
datagen upload salesforce_opportunities
Requires DOMO_CLIENT_ID and DOMO_CLIENT_SECRET.
| Option | Description |
|---|---|
| name | Dataset name, optional |
| --all | Upload all datasets |
| --catalog-dir PATH | Catalog directory override |
| --data-dir PATH | Data directory override |
create-dataset -- Create dataset(s) in Domo from catalog
datagen create-dataset --all --skip-existing
datagen create-dataset salesforce_opportunities
Requires DOMO_CLIENT_ID and DOMO_CLIENT_SECRET. The domo_id is persisted locally (in the catalog YAML if writable, otherwise in data/domo_ids.json).
| Option | Description |
|---|---|
| name | Dataset name, optional |
| --all | Create all datasets |
| --skip-existing | Skip datasets that already have a domo_id |
| --catalog-dir PATH | Catalog directory override |
roll-dates -- Shift rolling date columns to stay current
datagen roll-dates
datagen roll-dates --anchor-date 2026-04-01
| Option | Description |
|---|---|
| --anchor-date TEXT | Target date (YYYY-MM-DD), defaults to today |
| --catalog-dir PATH | Catalog directory override |
| --data-dir PATH | Data directory override |
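One way to picture date rolling: every rolling date is shifted by the gap between the date the data was generated against and the new anchor date, so the spacing between rows is preserved. The sketch below illustrates that idea only. It is not datagen's actual implementation, and roll_dates and its parameters are hypothetical names.

```python
from datetime import date


def roll_dates(values, old_anchor, new_anchor):
    """Shift every date by the gap between the two anchors."""
    offset = new_anchor - old_anchor  # a timedelta; zero if anchors match
    return [d + offset for d in values]


generated_on = date(2026, 1, 1)  # hypothetical original anchor
rows = [date(2025, 12, 25), date(2026, 1, 1)]

rolled = roll_dates(rows, generated_on, date(2026, 4, 1))
print(rolled)  # [datetime.date(2026, 3, 25), datetime.date(2026, 4, 1)]
```

The newest row lands on the anchor date and the older row stays exactly seven days behind it.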
Informational Commands
list -- List catalog dataset definitions
datagen list
datagen --output table list
datagen list --verbose
status -- Display generation status for all datasets
datagen status
Connector Icon Commands
These commands require DOMO_DEVELOPER_TOKEN and DOMO_INSTANCE.
discover-types -- Search Domo connector/provider types
datagen discover-types salesforce
set-type -- Set connector icon on a Domo dataset
datagen set-type salesforce_opportunities
datagen set-type salesforce_opportunities --provider-key custom_key
set-type-all -- Set connector icon on all datasets with a domo_id
datagen set-type-all
Entity Pool Commands
pool regenerate -- Regenerate the shared entity pool
datagen pool regenerate
datagen pool regenerate --seed 99
datagen pool regenerate --company-count 500 --person-count 1000
| Option | Default |
|---|---|
| --seed INTEGER | 42 |
| --company-count INTEGER | 200 |
| --person-count INTEGER | 500 |
| --product-count INTEGER | 50 |
| --sales-rep-count INTEGER | 20 |
| --campaign-count INTEGER | 30 |
pool show -- Display entity pool summary
datagen pool show
Common Workflows
Full setup for a new Domo instance
datagen init
datagen pool regenerate
datagen generate --all
datagen create-dataset --all
datagen upload --all
datagen set-type-all
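When the full setup is scripted, it helps to stop at the first failing step, because each later step depends on the one before it. The following is a rough sketch of such a wrapper, not something shipped with datagen. run_steps is a hypothetical helper, and the fake runner exists only so the example runs without credentials or a datagen install.

```python
import subprocess

# The full-setup commands, in the order listed above.
FULL_SETUP = [
    ["datagen", "init"],
    ["datagen", "pool", "regenerate"],
    ["datagen", "generate", "--all"],
    ["datagen", "create-dataset", "--all"],
    ["datagen", "upload", "--all"],
    ["datagen", "set-type-all"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order; stop at the first non-zero exit code."""
    completed = []
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            raise RuntimeError(f"step failed: {' '.join(cmd)}")
        completed.append(cmd)
    return completed


class Ok:
    """Minimal stand-in for a successful subprocess result."""
    returncode = 0


def fake_runner(cmd):
    # Prints instead of executing, so the demo has no side effects.
    print("would run:", " ".join(cmd))
    return Ok()


run_steps(FULL_SETUP, runner=fake_runner)
```

Calling run_steps(FULL_SETUP) without the runner argument would execute the real commands through subprocess.run.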
Daily refresh via cron
0 6 * * * cd /path/to/project && datagen roll-dates && datagen upload --all
Generate a single dataset end-to-end
datagen generate salesforce_opportunities
datagen create-dataset salesforce_opportunities
datagen upload salesforce_opportunities
datagen set-type salesforce_opportunities
Included Datasets
| Category | Dataset Name | Key | Rows |
|---|---|---|---|
| Salesforce | Salesforce - Accounts | salesforce_accounts | 500 |
| Salesforce | Salesforce - Contacts | salesforce_contacts | 1,500 |
| Salesforce | Salesforce - Opportunities | salesforce_opportunities | 2,500 |
| Google Analytics | Google Analytics - Sessions | ga_sessions | 5,000 |
| Google Analytics | Google Analytics - Page Views | ga_pageviews | 10,000 |
| Financial | QuickBooks - Invoices | financial_invoices | 3,000 |
| Financial | NetSuite - General Ledger | financial_gl_entries | 5,000 |
| Marketing | Google Ads - Campaign Performance | marketing_google_ads | 3,000 |
| Marketing | Facebook Ads - Campaign Performance | marketing_facebook_ads | 2,500 |
| Marketing | HubSpot - Contacts | marketing_hubspot_contacts | 2,000 |
| Marketing | Marketing - Market Leads | marketing_market_leads | 2,500 |
| Marketing | Marketo - Leads | marketing_marketo_leads | 3,000 |
| Health | Health Portal - Demographics | health_demographics | 15 |
| Health | Health Portal - Lab Results | health_lab_results | 1,470 |
| Health | Health Portal - Vitals | health_vitals | 5,250 |
| AdPoint | AdPoint - Orders | adpoint_orders | 150 |
| AdPoint | AdPoint - Line Items | adpoint_line_items | 500 |
| AdPoint | AdPoint - Flights | adpoint_flights | 2,000 |
Entity Pool
The shared entity pool provides consistent cross-dataset references. Entities are generated once and reused across all datasets.
| Entity Type | Default Count | Key Fields |
|---|---|---|
| company | 200 | id, account_id, name, domain, industry, size, city, state, annual_revenue, employee_count |
| person | 500 | id, contact_id, first_name, last_name, full_name, email, company_id, company_name, title, phone |
| product | 50 | id, name, category, unit_price, sku |
| sales_rep | 20 | id, rep_id, first_name, last_name, full_name, email, region |
| campaign | 30 | id, name, channel, budget, status |
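The value of a shared pool is that two datasets built independently still agree on which company is which. Here is a small sketch of that property, assuming a seeded pool and a simplified entity shape. The functions build_company_pool and make_rows are hypothetical and far simpler than the real generator.

```python
import random


def build_company_pool(count, seed=42):
    """Create a deterministic list of company entities (simplified shape)."""
    rng = random.Random(seed)
    industries = ["Retail", "Finance", "Healthcare"]
    return [
        {"id": f"C{i:04d}", "name": f"Company {i}", "industry": rng.choice(industries)}
        for i in range(1, count + 1)
    ]


def make_rows(pool, row_count, seed, extra_field):
    """Generate rows whose company columns all come from the shared pool."""
    rng = random.Random(seed)
    rows = []
    for n in range(row_count):
        company = rng.choice(pool)
        rows.append(
            {"company_id": company["id"], "company_name": company["name"], extra_field: n}
        )
    return rows


pool = build_company_pool(5)
opportunities = make_rows(pool, 10, seed=1, extra_field="opportunity_no")
invoices = make_rows(pool, 10, seed=2, extra_field="invoice_no")

names_by_id = {c["id"]: c["name"] for c in pool}
# Every reference in either dataset resolves to the same pool entity.
consistent = all(
    names_by_id[r["company_id"]] == r["company_name"] for r in opportunities + invoices
)
print(consistent)  # True
```

Rebuilding the pool with a different seed changes the entities, which is why previously generated datasets need to be regenerated afterward.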
Adding New Dataset Definitions
Dataset definitions live in the catalog/ directory as YAML files. Each YAML file defines metadata, columns, and generator configurations.
YAML Structure
dataset:
  name: My Custom Dataset
  domo_id: null
  source_type: custom
  description: "Description of the dataset"
  row_count: 1000
  tags:
    - custom
    - demo
schema:
  - name: id
    type: STRING
    generator: uuid4
  - name: company_name
    type: STRING
    generator: entity_ref
    entity: company
    field: name
  - name: amount
    type: DOUBLE
    generator: random_decimal
    min: 100.0
    max: 10000.0
    precision: 2
  - name: created_date
    type: DATE
    generator: date_range
    start_days_ago: 365
    end_days_ahead: 0
    rolling: true
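To make the column options concrete, the sketch below interprets the four generators used in the example schema. It is a simplified illustration and not datagen's code: the schema is written as Python data instead of YAML, POOL is a tiny invented stand-in for the entity pool, and generate_value and generate_rows are hypothetical names.

```python
import random
import uuid
from datetime import date, timedelta

# The example columns, written as Python data instead of YAML.
SCHEMA = [
    {"name": "id", "type": "STRING", "generator": "uuid4"},
    {"name": "company_name", "type": "STRING", "generator": "entity_ref",
     "entity": "company", "field": "name"},
    {"name": "amount", "type": "DOUBLE", "generator": "random_decimal",
     "min": 100.0, "max": 10000.0, "precision": 2},
    {"name": "created_date", "type": "DATE", "generator": "date_range",
     "start_days_ago": 365, "end_days_ahead": 0, "rolling": True},
]

# Tiny stand-in for the shared entity pool (invented contents).
POOL = {"company": [{"name": "Acme Corp"}, {"name": "Globex"}]}


def generate_value(col, rng, today):
    """Produce one value for a column definition."""
    kind = col["generator"]
    if kind == "uuid4":
        # Building the UUID from seeded bits keeps output reproducible.
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))
    if kind == "entity_ref":
        entity = rng.choice(POOL[col["entity"]])
        return entity[col["field"]]
    if kind == "random_decimal":
        return round(rng.uniform(col["min"], col["max"]), col["precision"])
    if kind == "date_range":
        start = today - timedelta(days=col["start_days_ago"])
        end = today + timedelta(days=col["end_days_ahead"])
        return start + timedelta(days=rng.randint(0, (end - start).days))
    raise ValueError(f"unsupported generator: {kind}")


def generate_rows(schema, row_count, seed, today):
    """Generate row dicts keyed by column name."""
    rng = random.Random(seed)
    return [
        {col["name"]: generate_value(col, rng, today) for col in schema}
        for _ in range(row_count)
    ]


rows = generate_rows(SCHEMA, row_count=3, seed=42, today=date(2026, 4, 1))
for row in rows:
    print(row)
```

Passing the same seed and the same value for today yields identical rows, which mirrors the purpose of the --seed option.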
Available Column Types
STRING, LONG, DOUBLE, DECIMAL, DATETIME, DATE
Available Generators
Generic: uuid4, random_choice, weighted_choice, random_int, random_decimal, date_range, entity_ref, compound, sequence, constant, derived_from_date, stage_derived, faker
Salesforce: sf_id, sf_opportunity_name, sf_case_subject, sf_lead_rating
Google Analytics: ga_session_id, ga_page_path, ga_source, ga_medium, ga_campaign, ga_browser, ga_device_category, ga_country, ga_bounce_rate, ga_session_duration, ga_pageviews, ga_landing_page
Financial: gl_account_code, gl_account_name, invoice_number, payment_terms, payment_method, invoice_status, journal_type, department, fiscal_period, debit_credit
Marketing/Ads: ad_platform, campaign_objective, ad_format, ad_headline, ad_keyword, targeting_type, impressions, clicks_from_impressions, ctr, cost_per_click, ad_spend, conversions_from_clicks, hubspot_lifecycle, hubspot_lead_status, ad_group_id
Health: health_lab_init, health_lab_field, health_vital_init, health_vital_field, health_demographics
Generator Column Options
| Option | Used With | Description |
|---|---|---|
| entity | entity_ref | Entity pool type to reference |
| field | entity_ref | Field to pull from the entity |
| choices | random_choice, weighted_choice | List of possible values |
| min / max | random_int, random_decimal | Value range |
| precision | random_decimal | Decimal places |
| template | compound | String template with {field} placeholders |
| refs | compound | Column references for template substitution. Must be a YAML list of column name strings (e.g. ["sku", "line_id"]), not a dict/object. A dict triggers a ValidationError (schema.N.refs: Input should be a valid list). |
| start_days_ago / end_days_ahead | date_range | Date range relative to today |
| rolling | date_range | Enable date rolling for freshness |
| mapping | stage_derived | Map source values to derived values |
| source_column | stage_derived, derived_from_date | Column to derive from |
| format | derived_from_date | Date format string |
| faker_method | faker | Faker library method name |
| faker_args | faker | Arguments for the Faker method |
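The refs requirement is the easiest option to get wrong, so a small lint step before generation can save a failed run. This is only a sketch of such a check, not datagen's validator. check_refs and render_compound are hypothetical helpers, and the rendering assumes that {field} placeholders behave like Python str.format fields, which the source does not confirm.

```python
def check_refs(col):
    """Return a list of problems with a compound column's refs option."""
    refs = col.get("refs")
    if not isinstance(refs, list):
        return ["refs must be a list of column names"]
    if not all(isinstance(name, str) for name in refs):
        return ["every refs entry must be a string"]
    return []


def render_compound(col, row):
    """Fill the template from the referenced columns of one row."""
    values = {name: row[name] for name in col["refs"]}
    return col["template"].format(**values)


good = {"generator": "compound", "template": "{sku}-{line_id}", "refs": ["sku", "line_id"]}
bad = {"generator": "compound", "template": "{sku}-{line_id}", "refs": {"sku": "sku"}}

print(check_refs(good))  # []
print(check_refs(bad))   # ['refs must be a list of column names']
print(render_compound(good, {"sku": "AB12", "line_id": 7}))  # AB12-7
```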
weighted_choice YAML format:
generator: weighted_choice
choices:
  "Tier 1": 0.40
  "Tier 2": 0.35
  "Tier 3": 0.25
compound refs vs formatted random strings: For values like PO-12345, prefer faker with bothify instead of abusing compound / refs:
- name: purchase_order_ref
  type: STRING
  generator: faker
  faker_method: bothify
  faker_args:
    text: "PO-#####"
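The two patterns above, weighted_choice and faker with bothify, can both be approximated with the standard library, which is handy for checking what values a definition should produce. The sketch below is an emulation under stated assumptions and not how datagen or Faker implement them: weighted_pick treats the mapping values as relative weights, and bothify_like mimics Faker's documented bothify rule that '#' becomes a digit and '?' becomes an ASCII letter.

```python
import random
import re
import string

TIERS = {"Tier 1": 0.40, "Tier 2": 0.35, "Tier 3": 0.25}


def weighted_pick(choices, rng):
    """Pick one key, using the mapping's values as relative weights."""
    labels = list(choices)
    weights = list(choices.values())
    return rng.choices(labels, weights=weights, k=1)[0]


def bothify_like(text, rng):
    """Replace each '#' with a digit and each '?' with an ASCII letter."""
    out = []
    for ch in text:
        if ch == "#":
            out.append(str(rng.randint(0, 9)))
        elif ch == "?":
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(7)

tiers = [weighted_pick(TIERS, rng) for _ in range(10_000)]
share = tiers.count("Tier 1") / len(tiers)
print(round(share, 2))  # near 0.40; the exact value depends on the seed

po = bothify_like("PO-#####", rng)
print(po)  # "PO-" followed by five digits
assert re.fullmatch(r"PO-\d{5}", po)
```

A string produced this way needs no other columns, which is why it suits values like purchase order numbers better than compound with refs.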
Rules
- Run datagen init first -- Initialize a working directory before using any other commands. This copies the catalog and creates .env.
- Always generate before uploading -- Run generate (or generate --all) before upload to ensure CSV data files exist.
- Create datasets before first upload -- Run create-dataset before upload for new datasets. The domo_id is persisted locally.
- Use --skip-existing -- When running create-dataset --all, use --skip-existing to avoid duplicating datasets that already have a domo_id.
- Entity pool consistency -- Regenerating the pool (pool regenerate) invalidates all previously generated data. Re-generate all datasets afterward.
- Date rolling -- Use roll-dates before upload to keep date columns current. Only columns with rolling: true are affected.
- Credentials -- DOMO_CLIENT_ID and DOMO_CLIENT_SECRET are required for upload and create-dataset. DOMO_DEVELOPER_TOKEN is required for set-type and discover-types. Offline commands need no credentials.
- Reproducibility -- Use --seed for reproducible data generation across runs.
- Output format -- Default output is JSON. Use --output table for human-readable Rich tables.
Checklist