// "Kailash DataFlow - zero-config database framework with automatic model-to-node generation. Use when asking about 'database operations', 'DataFlow', 'database models', 'CRUD operations', 'bulk operations', 'database queries', 'database migrations', 'multi-tenancy', 'multi-instance', 'database transactions', 'PostgreSQL', 'MySQL', 'SQLite', 'MongoDB', 'pgvector', 'vector search', 'document database', 'RAG', 'semantic search', 'existing database', 'database performance', 'database deployment', 'database testing', or 'TDD with databases'. DataFlow is NOT an ORM - it generates 11 workflow nodes per SQL model, 8 nodes for MongoDB, and 3 nodes for vector operations."
| name | dataflow |
| description | Kailash DataFlow - zero-config database framework with automatic model-to-node generation. Use when asking about 'database operations', 'DataFlow', 'database models', 'CRUD operations', 'bulk operations', 'database queries', 'database migrations', 'multi-tenancy', 'multi-instance', 'database transactions', 'PostgreSQL', 'MySQL', 'SQLite', 'MongoDB', 'pgvector', 'vector search', 'document database', 'RAG', 'semantic search', 'existing database', 'database performance', 'database deployment', 'database testing', or 'TDD with databases'. DataFlow is NOT an ORM - it generates 11 workflow nodes per SQL model, 8 nodes for MongoDB, and 3 nodes for vector operations. |
DataFlow is a zero-config database framework built on the Kailash Core SDK that automatically transforms database models into workflow nodes.
DataFlow provides comprehensive error enhancement across all database operations, strict mode validation for build-time error prevention, and an intelligent debug agent for automated error diagnosis.
What It Is: Automatic transformation of Python exceptions into rich, actionable error messages with context, root causes, and solutions.
All DataFlow errors include:
Example:
# Missing parameter error shows:
# - Error Code: DF-101
# - Missing parameter: "id"
# - 3 solutions with code examples
# - Link to documentation
workflow.add_node("UserCreateNode", "create", {
"name": "Alice" # Missing "id" - error enhanced automatically
})
Error Categories:
Architecture:
# BaseErrorEnhancer - Shared abstraction
# ├── CoreErrorEnhancer - KS-501 to KS-508 (Core SDK)
# └── DataFlowErrorEnhancer - DF-XXX codes (DataFlow)
What It Is: Build-time validation system with 4 layers to catch errors before workflow execution.
Validation Layers:
Configuration:
from dataflow import DataFlow
from dataflow.validation.strict_mode import StrictModeConfig
config = StrictModeConfig(
    enabled=True,
    validate_models=True,
    validate_parameters=True,
    validate_connections=True,
    validate_workflows=True,
    fail_fast=True,   # Stop on first error
    verbose=False     # Minimal output
)
db = DataFlow("postgresql://...", strict_mode_config=config)
When to Use:
Documentation:
- dataflow-strict-mode
- dataflow-validation-layers

What It Is: Intelligent error analysis system that automatically diagnoses errors and provides ranked, actionable solutions.
5-Stage Pipeline:
Usage:
from dataflow.debug.debug_agent import DebugAgent
from dataflow.debug.knowledge_base import KnowledgeBase
from dataflow.platform.inspector import Inspector
# Initialize once (singleton pattern)
kb = KnowledgeBase("patterns.yaml", "solutions.yaml")
inspector = Inspector(db)
debug_agent = DebugAgent(kb, inspector)
# Debug errors automatically
try:
    runtime.execute(workflow.build())
except Exception as e:
    report = debug_agent.debug(e, max_solutions=5, min_relevance=0.3)
    print(report.to_cli_format())  # Rich terminal output
Output Formats:
# CLI format (color-coded, ANSI)
print(report.to_cli_format())
# JSON format (machine-readable)
json_output = report.to_json()
# Dictionary format (programmatic)
data = report.to_dict()
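For CI jobs or log aggregation, the machine-readable form can be written straight to disk. A minimal sketch, assuming `to_json()` returns a JSON string as the comment above suggests:

```python
from pathlib import Path

# Persist the debug report so a CI step or dashboard can pick it up later.
Path("debug_report.json").write_text(report.to_json())
```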
Performance: 5-50ms per error, 92%+ confidence for known patterns
Documentation:
- dataflow-debug-agent
- docs/guides/debug-agent-user-guide.md
- docs/guides/debug-agent-developer-guide.md

Validation Modes: OFF, WARN (default), STRICT
Catch 80% of configuration errors at model registration time (not runtime):
from dataflow import DataFlow
db = DataFlow("postgresql://...")
# Default: Warn mode (backward compatible)
@db.model
class User:
    id: int  # Validates: primary key named 'id'
    name: str
    email: str

# Strict mode: Raises errors on validation failures
@db.model(strict=True)
class Product:
    id: int
    name: str
    price: float

# Skip validation (advanced users)
@db.model(skip_validation=True)
class Advanced:
    custom_pk: int  # Custom primary key allowed
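As a sketch of what strict registration guards against: a model whose primary key is not named `id` should be rejected when the decorator runs, well before any workflow executes. The exact exception type is not specified here, so a broad `except` is used:

```python
try:
    @db.model(strict=True)
    class Order:
        order_id: int  # assumed violation: strict mode expects a primary key named 'id'
        total: float
except Exception as exc:
    # Strict mode surfaces the problem at model registration time, not at runtime.
    print(f"Model rejected: {exc}")
```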
Validation Checks:
When to Use Each Mode:
Automatic error enhancement with context, root causes, and solutions:
from dataflow import DataFlow
from dataflow.core.error_enhancer import ErrorEnhancer
from kailash.workflow.builder import WorkflowBuilder

db = DataFlow("postgresql://...")
workflow = WorkflowBuilder()

# ErrorEnhancer automatically integrated into DataFlow engine
# Enhanced errors show:
# - Error code (DF-101, DF-102, etc.)
# - Context (node, parameters, workflow state)
# - Root causes with probability scores
# - Actionable solutions with code templates
# - Documentation links

try:
    # Missing parameter error
    workflow.add_node("UserCreateNode", "create", {})
except Exception as e:
    # ErrorEnhancer automatically catches and enriches
    # Shows: DF-101 with specific fixes
    pass
Key Features:
Common Errors Covered:
See: sdk-users/apps/dataflow/troubleshooting/top-10-errors.md
Introspection API for workflows, nodes, connections, and parameters:
from dataflow.platform.inspector import Inspector
inspector = Inspector(dataflow_instance)
inspector.workflow_obj = workflow.build()
# Connection Analysis
connections = inspector.connections() # List all connections
broken = inspector.find_broken_connections() # Find issues
validation = inspector.validate_connections() # Check validity
# Parameter Tracing
trace = inspector.trace_parameter("create_user", "data")
print(f"Source: {trace.source_node}")
dependencies = inspector.parameter_dependencies("create_user")
# Node Analysis
deps = inspector.node_dependencies("create_user") # Upstream
dependents = inspector.node_dependents("create_user") # Downstream
order = inspector.execution_order() # Topological sort
# Workflow Validation
report = inspector.workflow_validation_report()
if not report['is_valid']:
    print(f"Errors: {report['errors']}")
    print(f"Warnings: {report['warnings']}")
    print(f"Suggestions: {report['suggestions']}")
# High-Level Overview
summary = inspector.workflow_summary()
metrics = inspector.workflow_metrics()
Inspector Methods (18 total):
Use Cases:
Performance: <1ms per method call (cached operations)
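A minimal sketch of using the Inspector as a CI gate in a pytest-style test; `myapp.models` and `build_workflow()` are hypothetical project helpers (the latter returning `workflow.build()`), not part of DataFlow:

```python
from dataflow.platform.inspector import Inspector
from myapp.models import db, build_workflow  # hypothetical application module

def test_workflow_is_valid():
    inspector = Inspector(db)
    inspector.workflow_obj = build_workflow()  # same as assigning workflow.build()

    report = inspector.workflow_validation_report()
    assert report['is_valid'], f"Workflow errors: {report['errors']}"
    assert not inspector.find_broken_connections()
```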
Command-line tools matching pytest/mypy patterns for workflow validation and debugging:
# Validate workflow structure and connections
dataflow-validate workflow.py --output text
dataflow-validate workflow.py --fix # Auto-fix common issues
dataflow-validate workflow.py --output json > report.json
# Analyze workflow metrics and complexity
dataflow-analyze workflow.py --verbosity 2
dataflow-analyze workflow.py --format json
# Generate reports and documentation
dataflow-generate workflow.py report --output-dir ./reports
dataflow-generate workflow.py diagram # ASCII workflow diagram
dataflow-generate workflow.py docs --output-dir ./docs
# Debug workflows with breakpoints
dataflow-debug workflow.py --breakpoint create_user
dataflow-debug workflow.py --inspect-node create_user
dataflow-debug workflow.py --step # Step-by-step execution
# Profile performance and detect bottlenecks
dataflow-perf workflow.py --bottlenecks
dataflow-perf workflow.py --recommend
dataflow-perf workflow.py --format json > perf.json
CLI Commands (5 total):
- dataflow-validate - Validate workflow structure and connections
- dataflow-analyze - Analyze workflow metrics and complexity
- dataflow-generate - Generate reports, diagrams, and documentation
- dataflow-debug - Debug workflows with breakpoints and step execution
- dataflow-perf - Profile performance and detect bottlenecks
Use Cases:
Performance: Industry-standard CLI tool performance (<100ms startup)
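To fold these tools into an automated check, one option is to shell out and capture the JSON report; a sketch assuming the `--output json` flag shown above and that, like pytest/mypy, a non-zero exit code signals failures:

```python
import json
import subprocess

# Run the validator in JSON mode and capture its report from stdout.
proc = subprocess.run(
    ["dataflow-validate", "workflow.py", "--output", "json"],
    capture_output=True,
    text=True,
)

report = json.loads(proc.stdout)  # assumes the JSON report is printed to stdout
print(json.dumps(report, indent=2))

if proc.returncode != 0:  # assumption: non-zero exit on validation failures
    raise SystemExit("dataflow-validate reported problems")
```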
New: Comprehensive guides for common DataFlow mistakes
CreateNode vs UpdateNode (saves 1-2 hours):
- sdk-users/apps/dataflow/guides/create-vs-update.md

Top 10 Errors (saves 30-120 minutes per error):
- sdk-users/apps/dataflow/troubleshooting/top-10-errors.md

from dataflow import DataFlow
from kailash.workflow.builder import WorkflowBuilder
from kailash.runtime.local import LocalRuntime
# Initialize DataFlow
db = DataFlow(connection_string="postgresql://user:pass@localhost/db")
# Define model (generates 11 nodes automatically)
@db.model
class User:
    id: str  # String IDs preserved
    name: str
    email: str
# Use generated nodes in workflows
workflow = WorkflowBuilder()
workflow.add_node("User_Create", "create_user", {
"data": {"name": "John", "email": "john@example.com"}
})
# Execute
runtime = LocalRuntime()
results, run_id = runtime.execute(workflow.build())
user_id = results["create_user"]["result"] # Access pattern
DataFlow is NOT an ORM. It's a workflow framework that:
Each @db.model class generates 11 nodes:
- {Model}_Create - Create single record
- {Model}_Read - Read by ID
- {Model}_Update - Update record
- {Model}_Delete - Delete record
- {Model}_List - List with filters
- {Model}_Upsert - Insert or update (atomic)
- {Model}_Count - Efficient COUNT(*) queries
- {Model}_BulkCreate - Bulk insert
- {Model}_BulkUpdate - Bulk update
- {Model}_BulkDelete - Bulk delete
- {Model}_BulkUpsert - Bulk upsert

Access node output via results["node_id"]["result"]. When checking for an optional parameter, use if "filter" in kwargs instead of if kwargs.get("filter"), since an empty dict {} is falsy.

Use DataFlow when you need to:
from dataflow import DataFlow
from nexus import Nexus
db = DataFlow(connection_string="...")
@db.model
class User:
    id: str
    name: str
# Auto-generates API + CLI + MCP
nexus = Nexus(db.get_workflows())
nexus.run() # Instant multi-channel platform
from dataflow import DataFlow
from kailash.workflow.builder import WorkflowBuilder
db = DataFlow(connection_string="...")
# Use db-generated nodes in custom workflows
workflow = WorkflowBuilder()
workflow.add_node("User_Create", "user1", {...})
For DataFlow-specific questions, invoke:
- dataflow-specialist - DataFlow implementation and patterns
- testing-specialist - DataFlow testing strategies (NO MOCKING policy)
- framework-advisor - Choose between Core SDK and DataFlow