with one click
classification-demo
// Quick Start demo: Build a toxicity classifier and evaluate it — the fastest way to experience the core create → evaluate workflow.
// Quick Start demo: Build a toxicity classifier and evaluate it — the fastest way to experience the core create → evaluate workflow.
| name | classification-demo |
| description | Quick Start demo: Build a toxicity classifier and evaluate it — the fastest way to experience the core create → evaluate workflow. |
| parent_skill | demos |
Build an AI function that detects toxic content using the toxi-text-3M dataset.
This is a quick ~5 minute demo focused on the core create → evaluate workflow. It walks you through:
Explain to user:
Welcome to the Content Moderation Quick Start!
This ~5 minute demo covers the core create → evaluate workflow:
1. **Create Sample Data** - Sample English toxic/not_toxic examples from the toxi-text-3M dataset
2. **Build the Function** - Create a DEMO_DETECT_TOXICITY function using Cortex AI_COMPLETE
3. **Evaluate Performance** - Measure detection accuracy against ground truth labels
By the end, you'll have a working content moderation classifier and understand how to build
and evaluate any custom AI function.
**Objects created:** All prefixed with DEMO_ for easy cleanup
If {database} and {schema} are already known from the prerequisite flow, accept them silently and skip the prompt.
Otherwise, ask user:
Where would you like to create the demo objects?
Database: [e.g., TEMP]
Schema: [e.g., PUBLIC]
All objects will be prefixed with DEMO_ for easy cleanup.
Store the database and schema for use throughout the demo.
Explain to user:
I'll load real-world examples from the toxi-text-3M dataset, filtered to English text
under 500 characters for cleaner labels and more consistent evaluation.
The data includes:
- TEXT: The original text sample
- EXPECTED_OUTPUT: Either "toxic" or "not_toxic"
The dataset is balanced 50/50 between toxic and non-toxic examples to ensure
fair evaluation.
I'll load 150 examples into this table:
- {database}.{schema}.DEMO_TOXICITY_DATA
Run the data generation script:
PYTHONPATH=<SKILL_DIRECTORY>/src uv run --project <SKILL_DIRECTORY> python <SKILL_DIRECTORY>/src/generate_toxicity_data.py \
--connection <CONNECTION_NAME> \
--database {database} \
--schema {schema} \
--train 150 \
--test 0 \
--train-table DEMO_TOXICITY_DATA \
--language en \
--max-length 500
Note: Replace <SKILL_DIRECTORY> with the absolute path to the cortex-ai-function-studio skill directory, and <CONNECTION_NAME> with the active Snowflake connection.
Verify creation:
SELECT COUNT(*) FROM {database}.{schema}.DEMO_TOXICITY_DATA;
Show a few sample rows:
SELECT
LEFT(TEXT, 200) AS TEXT_PREVIEW,
EXPECTED_OUTPUT AS LABEL
FROM {database}.{schema}.DEMO_TOXICITY_DATA
LIMIT 5;
Present the function configuration to the user:
Now we'll create an AI function that detects toxic content.
Please confirm or modify any settings you'd like to change:
Function name: DEMO_DETECT_TOXICITY
Model: llama3.1-70b
Input: TEXT (VARCHAR) - The text to classify
Output: toxicity (string) - Either "toxic" or "not_toxic"
System prompt:
"Classify the given text as either toxic or not toxic.
Return exactly one label: toxic or not_toxic."
User prompt template: "{TEXT}"
⚠️ STOP: Wait for user confirmation or modifications before creating the function.
Load create/SKILL.md and follow it from Step 9 onward (Create UDF), passing all confirmed values as context:
database, schema: From Step 2function_name: DEMO_DETECT_TOXICITYfunction_intention: Detect toxic content in English text.model: llama3.1-70b (or user's choice)inputs: [{"name": "TEXT", "sql_type": "VARCHAR"}]outputs: [{"name": "toxicity", "json_type": "string", "description": "Either toxic or not_toxic"}]system_prompt: Confirmed system prompt from aboveuser_prompt_template: {TEXT}The create workflow will generate the SQL, show it for confirmation, execute it, and run a smoke test. Skip Step 10 (next steps) in the create workflow -- return here after the function is created and tested.
After the function is confirmed working, continue to Step 5.
Present the evaluation configuration to the user:
Let's evaluate how well the function detects toxic content on our data.
Since this is a binary classification task (toxic vs not_toxic), the built-in
exact_match metric works perfectly — no custom metric needed.
Please confirm or modify any settings you'd like to change:
Metric: exact_match
Experiment: auto-generated per evaluation (run_id)
⚠️ STOP: Wait for user confirmation or modifications before running evaluation.
Load evaluate/SKILL.md and follow its workflow from Step 4 onward (Run Evaluation), passing all values as context so the user is not re-asked for information already collected:
function_name: {database}.{schema}.DEMO_DETECT_TOXICITYfunction_model: llama3.1-70b (or user's choice from Step 4)test_table: {database}.{schema}.DEMO_TOXICITY_DATAinput_columns: ['TEXT']label_column: EXPECTED_OUTPUTmetric_name: exact_matchThe evaluation auto-creates an experiment named after its run_id. Capture experiment_name from the JSON output for the queries below.
The evaluate workflow will run the evaluation and present results. Skip Step 6 (next steps) in the evaluate workflow -- return here after results are presented.
Once evaluation is done, review the results. Show the scores to the user. Offer to see what cases did not match:
Would you like to see which cases the function got wrong?
If yes, query the per-row eval artifact (requires ENABLE_EXPERIMENT_SNOWURL_READ_PATH_RESOLUTION). First create the JSON file format (required — inline (TYPE => JSON) isn't supported on SnowURL):
CREATE OR REPLACE TEMPORARY FILE FORMAT eval_detail_json_fmt
TYPE = JSON
STRIP_OUTER_ARRAY = TRUE;
SELECT
LEFT($1:input_text::STRING, 150) AS TEXT_PREVIEW,
$1:expected::STRING AS EXPECTED_LABEL,
$1:predicted::STRING AS PREDICTED_LABEL,
$1:metric_score::FLOAT AS SCORE,
$1:metric_feedback::STRING AS FEEDBACK
FROM 'snow://experiment/{experiment_name}/versions/EVAL/eval_detail.json'
(FILE_FORMAT => eval_detail_json_fmt)
WHERE $1:metric_score::FLOAT < 1.0
ORDER BY SCORE
LIMIT 20;
Discuss common failure patterns (e.g., borderline cases, sarcasm misclassified, mild profanity in casual context, false positives on quoted speech).
After reviewing results, continue to Step 6.
The Content Moderation demo is complete!
Would you like to clean up the demo objects?
This will drop:
- {database}.{schema}.DEMO_TOXICITY_DATA
- {database}.{schema}.DEMO_DETECT_TOXICITY (function)
- The per-evaluation experiment ({experiment_name})
Options:
1. Yes - Clean up all demo objects
2. No - Keep objects for further exploration
⚠️ STOP: Wait for user selection before proceeding.
If yes, execute:
DROP TABLE IF EXISTS {database}.{schema}.DEMO_TOXICITY_DATA;
DROP FUNCTION IF EXISTS {database}.{schema}.DEMO_DETECT_TOXICITY(VARCHAR);
DROP EXPERIMENT IF EXISTS {database}.{schema}.{experiment_name};
Thanks for trying the Quick Start demo!
Here's what you learned:
- **Created** an AI function that detects toxic content in English text (binary classification)
- **Evaluated** accuracy using the built-in exact_match metric against real-world labeled data
This is the core create → evaluate loop. From here you can:
- **Optimize** your function's prompt (try the "Prompt Optimization" demo)
- **Build custom metrics** for richer evaluation (try the "Custom Evaluation Metrics" demo)
- **Process images or PDFs** with multimodal AI (try the multimodal demos)
Ready to build your own AI function? Just say "create an AI function" to get started.
Create, evaluate, and optimize custom AI functions using Snowflake Cortex AI Complete. Supports text, image, and document inputs. Use when: building LLM-powered functions, evaluating AI function performance, tuning prompts, selecting models, checking async job status. Triggers: ai function builder, custom ai function, user defined ai function, build my own llm function, evaluate ai function, tune ai function, optimize ai function, demo ai function, resume ai function job, image classification, document analysis, multimodal ai function.
Create a new custom AI function. Supports table-based or manual input specification, single or variant outputs. Direct AI_COMPLETE calls or additional pre- and post-processing.
Interactive demo: Generate pseudo-labels from a strong teacher model, build a cheap student function, and evaluate accuracy. Showcases pseudo-labeling and teacher-student distillation.
Interactive demo: Build a legal contract field extractor and create a weighted composite metric that scores 4 fields independently. Showcases custom evaluation metrics for multi-field AI functions.
Interactive demo: Extract structured fields from SEC 10-K filing PDFs using multimodal AI, create a custom composite metric for per-field scoring, and evaluate extraction accuracy with per-field analysis.
Interactive demo: Build a policy-conditioned ticket router where a seed prompt performs poorly, then watch prompt optimization close the accuracy gap through prompt evolution and Pareto cost/quality analysis. The canonical demo for prompt optimization.