| name | hypogenic |
| description | Automated hypothesis generation and testing using large language models. Use this skill when generating scientific hypotheses from datasets, combining literature insights with empirical data, testing hypotheses against observational data, or conducting systematic hypothesis exploration for research discovery in domains like deception detection, AI content detection, mental health analysis, or other empirical research tasks. |
Hypogenic provides automated hypothesis generation and testing using large language models to accelerate scientific discovery. The framework supports three approaches: HypoGeniC (data-driven hypothesis generation), HypoRefine (synergistic literature and data integration), and Union methods (mechanistic combination of literature and data-driven hypotheses).
Get started with Hypogenic in minutes:
# Install the package
uv pip install hypogenic
# Clone example datasets
git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# Run basic hypothesis generation
hypogenic_generation --config ./data/your_task/config.yaml --method hypogenic --num_hypotheses 20
# Run inference on generated hypotheses
hypogenic_inference --config ./data/your_task/config.yaml --hypotheses output/hypotheses.json
Or use the Python API:
from hypogenic import BaseTask
# Create task with your configuration
task = BaseTask(config_path="./data/your_task/config.yaml")
# Generate hypotheses
task.generate_hypotheses(method="hypogenic", num_hypotheses=20)
# Run inference
results = task.inference(hypothesis_bank="./output/hypotheses.json")
Use this skill when working on:
Automated Hypothesis Generation
Literature Integration
Performance Optimization
Flexible Configuration
Proven Results
Generate hypotheses solely from observational data through iterative refinement.
Process:
Best for: Exploratory research without existing literature, pattern discovery in novel datasets
Synergistically combine existing literature with empirical data through an agentic framework.
Process:
Best for: Research with established theoretical foundations, validating or extending existing theories
Mechanistically combine literature-only hypotheses with framework outputs.
Variants:
Best for: Comprehensive hypothesis coverage, eliminating redundancy while maintaining diverse perspectives
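The redundancy elimination mentioned above can be pictured as a merge-and-deduplicate step. The following is only a minimal sketch, assuming hypotheses are plain strings and that two hypotheses count as duplicates when they match after case and whitespace normalization. `union_hypotheses` is a hypothetical helper, not part of the hypogenic package, and the framework itself may use a different redundancy check.

```python
def union_hypotheses(literature_hyps, data_hyps):
    """Merge two hypothesis lists, dropping near-verbatim duplicates.

    Hypothetical helper for illustration; not part of hypogenic.
    """
    seen = set()
    merged = []
    for hyp in list(literature_hyps) + list(data_hyps):
        # Normalize case and whitespace so trivial variants collapse.
        key = " ".join(hyp.lower().split())
        if key not in seen:
            seen.add(key)
            merged.append(hyp)
    return merged


literature = ["Deceptive reviews use more superlatives."]
data_driven = [
    "deceptive reviews use  more superlatives.",
    "Truthful reviews mention concrete spatial details.",
]
bank = union_hypotheses(literature, data_driven)
print(len(bank))
```

Literature hypotheses are listed first here, so their wording wins when a duplicate is found; that ordering is a design choice of this sketch only.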
Install via pip:
uv pip install hypogenic
Optional dependencies:
Clone example datasets:
# For HypoGeniC examples
git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
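# For HypoRefine/Union examples
# (git clone fails if ./data already exists and is not empty; choose another target directory in that case)
git clone https://github.com/ChicagoHAI/Hypothesis-agent-datasets.git ./data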
Datasets must follow HuggingFace datasets format with specific naming conventions:
Required files:
<TASK>_train.json: Training data
<TASK>_val.json: Validation data
<TASK>_test.json: Test data
Required keys in JSON:
text_features_1 through text_features_n: Lists of strings containing feature values
label: List of strings containing ground truth labels
Example (headline click prediction):
{
  "headline_1": [
    "What Up, Comet? You Just Got *PROBED*",
    "Scientists Made a Breakthrough in Quantum Computing"
  ],
  "headline_2": [
    "Scientists Everywhere Were Holding Their Breath Today. Here's Why.",
    "New Quantum Computer Achieves Milestone"
  ],
  "label": [
    "Headline 2 has more clicks than Headline 1",
    "Headline 1 has more clicks than Headline 2"
  ]
}
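A quick structural check can catch malformed files before any LLM calls are made. The sketch below is not part of hypogenic; `validate_dataset` is a hypothetical helper, and it assumes that every key holds a list of strings and that index i in each list describes the same example, so all lists must share one length.

```python
import json


def validate_dataset(data: dict) -> int:
    """Check the list-per-key layout and return the number of examples.

    Hypothetical helper for illustration; not part of hypogenic.
    """
    if "label" not in data:
        raise ValueError("missing required key: 'label'")
    lengths = set()
    for key, column in data.items():
        if not isinstance(column, list) or not all(isinstance(v, str) for v in column):
            raise ValueError(f"{key!r} must be a list of strings")
        lengths.add(len(column))
    # Assumption: index i in every list describes the same example.
    if len(lengths) > 1:
        raise ValueError(f"columns differ in length: {sorted(lengths)}")
    return lengths.pop()


example = json.loads("""
{
  "headline_1": ["A", "B"],
  "headline_2": ["C", "D"],
  "label": ["Headline 2 has more clicks than Headline 1",
            "Headline 1 has more clicks than Headline 2"]
}
""")
print(validate_dataset(example))
```

To check a file on disk, load it with `json.load` and pass the resulting dict to the same function.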
Important notes:
Label values must match your extract_label() function output format
Feature keys can be customized for your domain (e.g., review_text, post_content, etc.)
Each task requires a config.yaml file specifying:
Required elements:
Template capabilities:
Placeholders for variable substitution (e.g., ${text_features_1}, ${num_hypotheses})
Configuration structure:
task_name: your_task_name
train_data_path: ./your_task_train.json
val_data_path: ./your_task_val.json
test_data_path: ./your_task_test.json
prompt_templates:
  # Extra keys for reusable prompt components
  observations: |
    Feature 1: ${text_features_1}
    Feature 2: ${text_features_2}
    Observation: ${label}
  # Required templates
  batched_generation:
    system: "Your system prompt here"
    user: "Your user prompt with ${num_hypotheses} placeholder"
  inference:
    system: "Your inference system prompt"
    user: "Your inference user prompt"
  # Optional templates for advanced features
  few_shot_baseline: {...}
  is_relevant: {...}
  adaptive_inference: {...}
  adaptive_selection: {...}
Refer to references/config_template.yaml for a complete example configuration.
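To see what the `${...}` placeholders in the configuration do, it can help to render one template by hand. The sketch below uses Python's standard `string.Template`, which happens to share the `${name}` syntax. Whether hypogenic uses this class internally is not stated here, so treat it as an illustration of the substitution idea rather than the package's mechanism.

```python
from string import Template

# Mirrors the `observations` template from the configuration above.
observations = Template(
    "Feature 1: ${text_features_1}\n"
    "Feature 2: ${text_features_2}\n"
    "Observation: ${label}\n"
)

example = {
    "text_features_1": "What Up, Comet? You Just Got *PROBED*",
    "text_features_2": "Scientists Everywhere Were Holding Their Breath Today. Here's Why.",
    "label": "Headline 2 has more clicks than Headline 1",
}

# substitute() raises KeyError if a placeholder has no value,
# which catches typos in feature names early.
rendered = observations.substitute(example)
print(rendered)
```

Rendering a template against one training example like this is a cheap way to confirm that the placeholder names in config.yaml match the keys in your dataset.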
To use literature-based hypothesis generation, you must preprocess PDF papers:
Step 1: Set up GROBID (first time only)
bash ./modules/setup_grobid.sh
Step 2: Add PDF files
Place research papers in literature/YOUR_TASK_NAME/raw/
Step 3: Process PDFs
# Start GROBID service
bash ./modules/run_grobid.sh
# Process PDFs for your task
cd examples
python pdf_preprocess.py --task_name YOUR_TASK_NAME
This converts PDFs to structured format for hypothesis extraction. Automated literature search will be supported in future releases.
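Before starting GROBID, it may be worth confirming that the raw directory actually contains PDFs. The following sketch is not part of hypogenic; `list_raw_pdfs` is a hypothetical helper that only assumes the `literature/YOUR_TASK_NAME/raw/` layout described in Step 2.

```python
from pathlib import Path


def list_raw_pdfs(task_name: str, root: str = "literature") -> list:
    """Return the PDFs under <root>/<task_name>/raw/, sorted by name.

    Hypothetical helper for illustration; not part of hypogenic.
    """
    raw_dir = Path(root) / task_name / "raw"
    if not raw_dir.is_dir():
        raise FileNotFoundError(f"expected directory {raw_dir}")
    # Match the extension case-insensitively (.pdf, .PDF).
    return sorted(p for p in raw_dir.iterdir() if p.suffix.lower() == ".pdf")


if __name__ == "__main__":
    try:
        pdfs = list_raw_pdfs("YOUR_TASK_NAME")
        print(f"{len(pdfs)} PDF(s) ready for preprocessing")
    except FileNotFoundError as err:
        print(err)
```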
hypogenic_generation --help
Key parameters:
hypogenic_inference --help
Key parameters:
For programmatic control and custom workflows, use Hypogenic directly in your Python code:
from hypogenic import BaseTask
# Clone example datasets first
# git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# Load your task with custom extract_label function
task = BaseTask(
    config_path="./data/your_task/config.yaml",
    # extract_your_label is your own parsing function
    extract_label=extract_your_label
)
# Generate hypotheses
task.generate_hypotheses(
    method="hypogenic",
    num_hypotheses=20,
    output_path="./output/hypotheses.json"
)
# Run inference
results = task.inference(
    hypothesis_bank="./output/hypotheses.json",
    test_data="./data/your_task/your_task_test.json"
)
# For literature-integrated approaches
# git clone https://github.com/ChicagoHAI/Hypothesis-agent-datasets.git ./data
# Generate with HypoRefine
task.generate_hypotheses(
    method="hyporefine",
    num_hypotheses=15,
    literature_path="./literature/your_task/",
    output_path="./output/"
)
# This generates 3 hypothesis banks:
# - HypoRefine (integrated approach)
# - Literature-only hypotheses
# - Literature∪HypoRefine (union)
from examples.multi_hyp_inference import run_multi_hypothesis_inference
# Test multiple hypotheses simultaneously
results = run_multi_hypothesis_inference(
    config_path="./data/your_task/config.yaml",
    hypothesis_bank="./output/hypotheses.json",
    test_data="./data/your_task/your_task_test.json"
)
The extract_label() function is critical for parsing LLM outputs. Implement it based on your task:
import re

def extract_label(llm_output: str) -> str:
    """Extract predicted label from LLM inference text.

    Default behavior: searches for 'final answer:\s+(.*)' pattern.
    Customize for your domain-specific output format.
    """
    match = re.search(r'final answer:\s+(.*)', llm_output, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    return llm_output.strip()
Important: Extracted labels must match the format of label values in your dataset for correct accuracy calculation.
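The effect of a format mismatch is easy to demonstrate. The sketch below re-implements the default pattern shown above and scores predictions by exact string comparison. The `accuracy` function is a hypothetical stand-in, and exact matching is an assumption about how labels are compared.

```python
import re


def extract_label(llm_output: str) -> str:
    """Same default pattern as above: text after 'final answer:'."""
    match = re.search(r"final answer:\s+(.*)", llm_output, re.IGNORECASE)
    return match.group(1).strip() if match else llm_output.strip()


def accuracy(llm_outputs, gold_labels) -> float:
    """Fraction of outputs whose extracted label equals the gold label exactly."""
    predictions = [extract_label(o) for o in llm_outputs]
    hits = sum(p == g for p, g in zip(predictions, gold_labels))
    return hits / len(gold_labels)


outputs = [
    "The second is more curious.\nFinal answer: Headline 2 has more clicks than Headline 1",
    "Final answer: headline 1 has more clicks than headline 2",  # casing differs from the dataset
]
gold = [
    "Headline 2 has more clicks than Headline 1",
    "Headline 1 has more clicks than Headline 2",
]
print(accuracy(outputs, gold))  # 0.5: the second prediction differs only in case
```

The second output names the right headline but is still scored as wrong, which is why the prompt should pin down the exact label wording or the extraction function should normalize it.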
Scenario: Detecting AI-generated content without prior theoretical framework
Steps:
Create config.yaml with appropriate prompt templates
hypogenic_generation --config config.yaml --method hypogenic --num_hypotheses 20
hypogenic_inference --config config.yaml --hypotheses output/hypotheses.json --test_data data/test.json
Scenario: Deception detection in hotel reviews building on existing research
Steps:
Create config.yaml with literature processing and data generation templates
hypogenic_generation --config config.yaml --method hyporefine --papers papers/ --num_hypotheses 15
Scenario: Mental stress detection maximizing hypothesis diversity
Steps:
hypogenic_generation --config config.yaml --method union --literature_hypotheses lit_hyp.json
Caching: Enable Redis caching to reduce API costs and computation time for repeated LLM calls
Parallel Processing: Leverage multiple workers for large-scale hypothesis generation and testing
Adaptive Refinement: Use challenging examples to iteratively improve hypothesis quality
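The first two ideas can be illustrated without any external service. The sketch below swaps Redis for in-memory memoization (`functools.lru_cache`) and uses a thread pool for parallel workers. `fake_llm` is a hypothetical stand-in for a real API call, and none of this reflects how hypogenic wires up its own cache.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

call_count = 0


def fake_llm(prompt: str) -> str:
    """Stand-in for a real (slow, billed) LLM request. Hypothetical."""
    global call_count
    call_count += 1
    return f"response to: {prompt}"


@lru_cache(maxsize=None)
def cached_llm(prompt: str) -> str:
    # In-memory memoization; a Redis-backed cache would persist across runs.
    return fake_llm(prompt)


prompts = ["hypothesis A", "hypothesis B", "hypothesis A"]

# First pass runs sequentially and fills the cache.
first = [cached_llm(p) for p in prompts]

# Later passes can fan out across workers; every prompt is now a cache hit.
with ThreadPoolExecutor(max_workers=4) as pool:
    second = list(pool.map(cached_llm, prompts))

print(call_count)
```

Three prompts are issued twice each, yet the underlying call runs only for the two distinct prompts, which is the cost saving that caching is meant to deliver.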
Research using hypogenic has demonstrated:
Issue: Generated hypotheses are too generic
Solution: Refine prompt templates in config.yaml to request more specific, testable hypotheses
Issue: Poor inference performance
Solution: Ensure dataset has sufficient training examples, adjust hypothesis generation parameters, or increase number of hypotheses
Issue: Label extraction failures
Solution: Implement custom extract_label() function for domain-specific output parsing
Issue: GROBID PDF processing fails
Solution: Ensure GROBID service is running (bash ./modules/run_grobid.sh) and PDFs are valid research papers
To add a new task or dataset to Hypogenic:
Create three JSON files following the required format:
your_task_train.json
your_task_val.json
your_task_test.json
Each file must have keys for text features (text_features_1, etc.) and label.
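If your raw data is a list of per-example records, it has to be pivoted into the list-per-key layout and split three ways. The sketch below shows one way to do that. `to_columnar` and `write_splits` are hypothetical helpers, not part of hypogenic, and the 60/20/20 split is an illustrative default rather than a recommendation from the source.

```python
import json
from pathlib import Path


def to_columnar(records, feature_keys):
    """Turn a list of per-example dicts into the list-per-key layout."""
    columns = {key: [str(r[key]) for r in records] for key in feature_keys}
    columns["label"] = [str(r["label"]) for r in records]
    return columns


def write_splits(records, feature_keys, task_name, out_dir, train=0.6, val=0.2):
    """Write <task>_train.json, <task>_val.json and <task>_test.json.

    Hypothetical helper; the split ratios are illustrative defaults.
    """
    n_train = round(len(records) * train)
    n_val = round(len(records) * val)
    splits = {
        "train": records[:n_train],
        "val": records[n_train:n_train + n_val],
        "test": records[n_train + n_val:],
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        path = out / f"{task_name}_{name}.json"
        path.write_text(json.dumps(to_columnar(rows, feature_keys), indent=2))
    return {name: len(rows) for name, rows in splits.items()}


records = [
    {"text_features_1": f"review {i}", "label": "truthful" if i % 2 else "deceptive"}
    for i in range(10)
]
print(to_columnar(records[:2], ["text_features_1"]))
```

Calling `write_splits(records, ["text_features_1"], "your_task", "./data/your_task")` would then produce the three files. Shuffle the records first if their order carries any signal.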
Define your task configuration with:
Placeholders for variable substitution (e.g., ${text_features_1}, ${num_hypotheses})
Create a custom label extraction function that parses LLM outputs for your domain:
import re

from hypogenic import BaseTask

def extract_my_label(llm_output: str) -> str:
    """Custom label extraction for your task.

    Must return labels in same format as dataset 'label' field.
    """
    # Example: Extract from specific format
    if "Final prediction:" in llm_output:
        return llm_output.split("Final prediction:")[-1].strip()
    # Fallback to default pattern
    match = re.search(r'final answer:\s+(.*)', llm_output, re.IGNORECASE)
    return match.group(1).strip() if match else llm_output.strip()

# Use your custom task
task = BaseTask(
    config_path="./your_task/config.yaml",
    extract_label=extract_my_label
)
For HypoRefine/Union methods:
Place PDFs in the literature/your_task_name/raw/ directory
Process them with pdf_preprocess.py
Run hypothesis generation and inference using CLI or Python API:
# CLI approach
hypogenic_generation --config your_task/config.yaml --method hypogenic --num_hypotheses 20
hypogenic_inference --config your_task/config.yaml --hypotheses output/hypotheses.json
# Or use Python API (see Python API Usage section)
Understanding the repository layout:
hypothesis-generation/
├── hypogenic/ # Core package code
├── hypogenic_cmd/ # CLI entry points
├── hypothesis_agent/ # HypoRefine agent framework
├── literature/ # Literature processing utilities
├── modules/ # GROBID and preprocessing modules
├── examples/ # Example scripts
│ ├── generation.py # Basic HypoGeniC generation
│ ├── union_generation.py # HypoRefine/Union generation
│ ├── inference.py # Single hypothesis inference
│ ├── multi_hyp_inference.py # Multiple hypothesis inference
│ └── pdf_preprocess.py # Literature PDF processing
├── data/ # Example datasets (clone separately)
├── tests/ # Unit tests
└── IO_prompting/ # Prompt templates and experiments
Key directories:
Liu, H., Huang, S., Hu, J., Zhou, Y., & Tan, C. (2025). HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation. arXiv preprint arXiv:2504.11524.
BibTeX:
@misc{liu2025hypobenchsystematicprincipledbenchmarking,
title={HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation},
author={Haokun Liu and Sicong Huang and Jingyu Hu and Yangqiaoyu Zhou and Chenhao Tan},
year={2025},
eprint={2504.11524},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2504.11524},
}
Liu, H., Zhou, Y., Li, M., Yuan, C., & Tan, C. (2024). Literature Meets Data: A Synergistic Approach to Hypothesis Generation. arXiv preprint arXiv:2410.17309.
BibTeX:
@misc{liu2024literaturemeetsdatasynergistic,
title={Literature Meets Data: A Synergistic Approach to Hypothesis Generation},
author={Haokun Liu and Yangqiaoyu Zhou and Mingxuan Li and Chenfei Yuan and Chenhao Tan},
year={2024},
eprint={2410.17309},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2410.17309},
}
Zhou, Y., Liu, H., Srivastava, T., Mei, H., & Tan, C. (2024). Hypothesis Generation with Large Language Models. In Proceedings of EMNLP Workshop of NLP for Science.
BibTeX:
@inproceedings{zhou2024hypothesisgenerationlargelanguage,
title={Hypothesis Generation with Large Language Models},
author={Yangqiaoyu Zhou and Haokun Liu and Tejes Srivastava and Hongyuan Mei and Chenhao Tan},
booktitle = {Proceedings of EMNLP Workshop of NLP for Science},
year={2024},
url={https://aclanthology.org/2024.nlp4science-1.10/},
}
Clone these repositories for ready-to-use examples:
# HypoGeniC examples (data-driven only)
git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
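# HypoRefine/Union examples (literature + data)
# (git clone fails if ./data already exists and is not empty; choose another target directory in that case)
git clone https://github.com/ChicagoHAI/Hypothesis-agent-datasets.git ./data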
For contributions or questions, visit the GitHub repository and check the issues page.
config_template.yaml - Complete example configuration file with all required prompt templates and parameters. This includes:
Scripts directory is available for:
Assets directory is available for: