ai-power-profiling

Measuring and modeling power consumption profiles of generative AI workloads for data center infrastructure planning. Use when: GPU power profiling, data center energy modeling, AI workload characterization, infrastructure planning, power measurement methodology, HPC facility design, generative AI training/inference power analysis, or energy-aware computing.

Install command

npx skills add https://github.com/hiyenwong/ai_collection --skill ai-power-profiling

Copy and paste this command into Claude Code to install the skill

Source

hiyenwong/ai_collection

Stars: 1

Forks: 0

Updated: June 4, 2026 at 02:00

AI Power Profiling for Data Center Infrastructure

Overview

This skill provides a methodology for measuring generative AI workload power profiles at high resolution (0.1 s) and scaling those measurements to whole-facility energy demand for infrastructure planning. It addresses the problem that power consumption data for AI workloads is mostly proprietary and reported inconsistently.

Key Innovation: Bridges the gap between high-resolution GPU power measurements and facility-level energy planning using standardized benchmarks and bottom-up modeling.

Core Problem

Current Challenges

Proprietary Data: Power consumption data is largely proprietary
Varying Resolutions: Data reported at inconsistent time granularities
Missing Context: Lack of workload characterization alongside power data
Planning Gap: Difficulty estimating whole-facility energy use
Reproducibility: No standardized benchmarking for power profiles

Impact

Grid connection planning uncertainty
On-site energy generation sizing
Microgrid design challenges
Operational cost estimation errors

Methodology

Step 1: High-Resolution Power Measurement

Equipment Requirements:

NVIDIA H100 GPUs (or equivalent high-performance GPUs)
Power monitoring infrastructure (0.1-second resolution)
HPC data center facility
Power measurement software/hardware

Measurement Resolution: 0.1 seconds (10 Hz sampling)

Key Metrics:

Instantaneous power consumption (W)
Average power over workload duration
Peak power consumption
Power variance/fluctuations
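A minimal sketch of how the metrics above could be derived from a list of instantaneous power samples, using only the standard library. The function name `summarize_power` and the five sample values are illustrative assumptions, not measurements from the dataset.

```python
import statistics

def summarize_power(samples_w):
    """Summarize a power trace (watts) sampled at a fixed interval."""
    mean_w = statistics.fmean(samples_w)
    return {
        "mean_w": mean_w,                            # average power over the run
        "peak_w": max(samples_w),                    # highest single sample
        "stdev_w": statistics.pstdev(samples_w),     # spread of the fluctuations
        "peak_to_mean": max(samples_w) / mean_w,     # how spiky the trace is
    }

# Hypothetical 0.5 s trace at 10 Hz
summary = summarize_power([450.0, 452.0, 455.0, 460.0, 448.0])
```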

Step 2: Workload Characterization

Use standardized benchmarks for reproducibility:

MLCommons Benchmarks:

Training benchmarks
Fine-tuning benchmarks
Standardized model architectures
Reproducible dataset specifications

vLLM Benchmarks:

Inference workload characterization
Latency vs throughput analysis
Different inference scenarios
Batch size variations

Workload Types:

AI Training: Full model training cycles
Fine-tuning: Pre-trained model adaptation
Inference: Real-time or batch inference

Step 3: Create Power Profile Dataset

Dataset Components:

Time-series power measurements (0.1s resolution)
Workload metadata (model type, size, batch size)
GPU utilization metrics
Memory usage profiles
Duration information

Data Format:

timestamp    power_watts  gpu_util%  memory_gb  workload_type  model_info
0.0          450          95         40         training       LLM-7B
0.1          452          94         41         training       LLM-7B
0.2          455          96         42         training       LLM-7B
...
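As a sketch, rows in the whitespace-aligned layout above could be parsed into typed records with the standard library alone. The helper name `parse_profile` is hypothetical, and the choice of which columns are numeric is an assumption based on the example rows.

```python
NUMERIC_COLUMNS = {"timestamp", "power_watts", "gpu_util%", "memory_gb"}

def parse_profile(text):
    """Parse whitespace-aligned rows into a list of dicts, one per sample."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip truncated rows or a literal "..." continuation marker
        record = dict(zip(header, fields))
        for column in NUMERIC_COLUMNS & record.keys():
            record[column] = float(record[column])
        records.append(record)
    return records

SAMPLE = """
timestamp    power_watts  gpu_util%  memory_gb  workload_type  model_info
0.0          450          95         40         training       LLM-7B
0.1          452          94         41         training       LLM-7B
0.2          455          96         42         training       LLM-7B
...
"""

records = parse_profile(SAMPLE)
```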

Step 4: Whole-Facility Energy Modeling

Bottom-Up Modeling Approach:

Scale GPU power to server power (include CPU, memory, storage)
Scale server power to rack power (networking, cooling overhead)
Scale rack power to facility power (HVAC, lighting, infrastructure)

Event-Driven Model:

User behavior patterns drive workload arrivals
Temporal fluctuations from AI workload mix
Realistic facility-level energy profiles
Peak demand estimation
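One way to picture an event-driven model is a small simulation in which jobs arrive at random, occupy a GPU for a random duration, and the busy-GPU count sets the power draw. The sketch below is an assumption-laden illustration, not the model from the paper: the diurnal arrival pattern, the idle and active wattages, and the function name `simulate_gpu_power` are all hypothetical.

```python
import math
import random

IDLE_W = 70.0     # assumed idle draw per GPU, illustrative only
ACTIVE_W = 450.0  # assumed draw per busy GPU, illustrative only

def simulate_gpu_power(n_gpus, hours, base_arrivals_per_hour, mean_job_hours, seed=0):
    """Return total GPU power (W) at the end of each simulated hour."""
    rng = random.Random(seed)
    busy_until = [0.0] * n_gpus  # time (hours) at which each GPU becomes free
    profile_w = []
    for hour in range(hours):
        # Arrival rate follows a daily cycle: 1.5x the base at noon, 0.5x at midnight
        phase = 2.0 * math.pi * ((hour % 24) - 6) / 24.0
        rate = base_arrivals_per_hour * (1.0 + 0.5 * math.sin(phase))
        # Exponential gaps between arrivals give a Poisson process within the hour
        t = float(hour)
        while True:
            t += rng.expovariate(rate)
            if t >= hour + 1:
                break
            # Place the job on the first free GPU; it is dropped if none is free
            for gpu in range(n_gpus):
                if busy_until[gpu] <= t:
                    busy_until[gpu] = t + rng.expovariate(1.0 / mean_job_hours)
                    break
        busy = sum(1 for end in busy_until if end > hour + 1)
        profile_w.append(busy * ACTIVE_W + (n_gpus - busy) * IDLE_W)
    return profile_w

profile_w = simulate_gpu_power(n_gpus=8, hours=48, base_arrivals_per_hour=2.0, mean_job_hours=3.0)
```

A measured power trace per workload type could replace the two constant wattages, which is where the high-resolution dataset would enter a model like this.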

Scaling Factors:

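Server Power = GPU Power × GPU_count + CPU_power + Memory_power + Storage_power + Overhead
Rack Power = Σ(Server Power) + Network_power + Cooling_overhead
Facility Power = Σ(Rack Power) + HVAC + Lighting + Infrastructure

PUE (Power Usage Effectiveness): The ratio of total facility power to IT equipment power, typically 1.2-1.6 for modern data centers. When the non-IT loads are not itemized, estimate Facility Power = IT Power × PUE instead. PUE already accounts for cooling, lighting, and other overheads, so it is a multiplier applied in place of those terms, not an extra term added to them.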

Step 5: Infrastructure Planning Applications

Grid Connection Planning:

Peak demand estimation
Average demand calculation
Capacity requirements
Connection sizing

On-Site Energy Generation:

Solar/wind sizing
Battery storage requirements
Peak shaving strategies
Renewable integration
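For battery sizing and peak shaving, a rough first estimate can be read off a facility power trace: the battery must cover every excursion above the grid limit and can recharge from the headroom below it. The sketch below assumes a lossless battery and a hypothetical hourly trace; the function name `battery_for_peak_shaving` is illustrative.

```python
def battery_for_peak_shaving(power_w, grid_limit_w, interval_s):
    """Return (energy_wh, discharge_w) needed to hold grid draw at or below the limit."""
    deficit_wh = 0.0        # energy currently drawn out of the battery
    max_deficit_wh = 0.0    # worst-case depth of discharge, i.e. required capacity
    max_discharge_w = 0.0   # largest instantaneous shortfall, i.e. required power rating
    for p in power_w:
        excess_w = p - grid_limit_w
        # Positive excess discharges the battery; negative excess recharges it
        deficit_wh = max(0.0, deficit_wh + excess_w * interval_s / 3600.0)
        max_deficit_wh = max(max_deficit_wh, deficit_wh)
        max_discharge_w = max(max_discharge_w, excess_w)
    return max_deficit_wh, max_discharge_w

# Hypothetical hourly facility demand in watts, with a 1 MW grid connection
demand_w = [800e3, 1200e3, 1500e3, 900e3, 1300e3, 700e3]
energy_wh, discharge_w = battery_for_peak_shaving(demand_w, grid_limit_w=1000e3, interval_s=3600)
```

A real design would add round-trip losses, depth-of-discharge limits, and a safety margin on top of these figures.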

Distributed Microgrids:

Multiple facility coordination
Load balancing strategies
Backup power sizing
Grid independence analysis

Key Findings

Power Consumption Characteristics

Training Workloads:

High sustained power (450-700W per H100 GPU)
Longer duration (hours to weeks)
Higher total energy consumption
More predictable power profiles

Fine-tuning Workloads:

Medium sustained power (400-600W)
Moderate duration (hours)
Variable power based on fine-tuning approach
Adaptive power profiles

Inference Workloads:

Variable power (300-500W per GPU)
Short duration (milliseconds to seconds)
Bursty power profiles
Request-rate dependent

Temporal Fluctuations

User Behavior Impact:

Workload arrivals follow user patterns
Peak hours vs off-peak variations
Geographic distribution effects
Seasonal demand variations

Realistic Facility Profiles:

Not constant power draw
Significant temporal variation
Peak-to-average ratio matters for planning
Duration curves for capacity sizing
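The peak-to-average ratio and a load duration curve can both be computed directly from a facility power trace. The following is a minimal sketch with a hypothetical eight-sample trace in megawatts; the function names are illustrative.

```python
import math

def load_duration_curve(power):
    """Sort a power trace from highest to lowest value."""
    return sorted(power, reverse=True)

def peak_to_average(power):
    """Ratio of the highest sample to the mean of the trace."""
    return max(power) / (sum(power) / len(power))

def power_exceeded_for(power, fraction):
    """Power level that is met or exceeded for the given fraction of samples."""
    curve = load_duration_curve(power)
    index = min(len(curve) - 1, max(0, math.ceil(fraction * len(curve)) - 1))
    return curve[index]

# Hypothetical facility trace in MW
trace_mw = [1.0, 1.2, 2.0, 1.8, 1.1, 0.9, 1.5, 1.6]
ratio = peak_to_average(trace_mw)
top_quarter_mw = power_exceeded_for(trace_mw, 0.25)
```

Reading the curve at a small fraction shows how much capacity is needed only for rare peaks, which is the part that storage or scheduling might cover instead of a larger grid connection.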

Implementation Workflow

Phase 1: Setup Measurement Infrastructure

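# Example: GPU power monitoring setup
import time

import pynvml

SAMPLE_INTERVAL_S = 0.1  # target 10 Hz; the driver may refresh its reading less often
NUM_SAMPLES = 1000       # about 100 seconds

pynvml.nvmlInit()
try:
    gpu_count = pynvml.nvmlDeviceGetCount()
    # Look up device handles once instead of on every sample
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(gpu_count)]

    def get_power_sample(handle):
        # NVML reports power in milliwatts; convert to watts
        return pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0

    # Sample at 0.1s resolution
    power_data = []
    next_tick = time.monotonic()
    for _ in range(NUM_SAMPLES):
        power_data.append({
            'timestamp': time.time(),
            'power': [get_power_sample(h) for h in handles],
        })
        # Sleep until the next scheduled tick so query time does not accumulate as drift
        next_tick += SAMPLE_INTERVAL_S
        time.sleep(max(0.0, next_tick - time.monotonic()))
finally:
    pynvml.nvmlShutdown()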

Phase 2: Run Benchmarks

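# Placeholder commands: the real invocation depends on the MLCommons reference
# implementation and the vLLM version in use, so check their documentation.

# MLCommons training benchmark
mlperf_training --model bert_large --batch_size 32

# vLLM inference benchmark
vllm_benchmark --model llama-7b --requests 1000 --batch_size 16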

Phase 3: Collect Power Data

While the benchmark runs, collect power samples:

Record at 0.1s intervals
Tag with workload metadata
Store in structured format
Include GPU utilization metrics
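A minimal sketch of collecting samples concurrently with a workload: a background thread reads power at a fixed interval and tags each sample with metadata until the workload returns. The power reader and the workload are passed in as callables, so the stand-ins below (a constant 450 W and a short sleep) are placeholders for an NVML read and a real benchmark launch.

```python
import threading
import time

def sample_while_running(read_power_w, run_workload, metadata, interval_s=0.1):
    """Call read_power_w() every interval_s seconds until run_workload() returns."""
    samples = []
    stop = threading.Event()

    def sampler():
        start = time.monotonic()
        next_tick = start
        while not stop.is_set():
            samples.append({"t_s": time.monotonic() - start,
                            "power_w": read_power_w(), **metadata})
            next_tick += interval_s
            # Event.wait acts as a sleep that ends early once stop is set
            stop.wait(max(0.0, next_tick - time.monotonic()))

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        run_workload()
    finally:
        stop.set()
        thread.join()
    return samples

# Stand-ins so the sketch runs without a GPU or a benchmark installed
rows = sample_while_running(
    read_power_w=lambda: 450.0,
    run_workload=lambda: time.sleep(0.35),
    metadata={"workload_type": "training", "model": "bert_large", "batch_size": 32},
)
```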

Phase 4: Create Power Profile Dataset

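import pandas as pd

# Organize power data: expand the per-sample list into one column per GPU
df = pd.DataFrame(power_data)
per_gpu = pd.DataFrame(df.pop('power').tolist()).add_prefix('gpu_power_')
df = pd.concat([df, per_gpu], axis=1)
df['gpu_power'] = per_gpu.mean(axis=1)  # mean per-GPU power (W), used for facility scaling

# Tag with workload metadata
df['workload_type'] = 'training'
df['model'] = 'bert_large'
df['batch_size'] = 32

# Save to dataset
df.to_csv('power_profile_training_bert.csv', index=False)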

Phase 5: Scale to Facility Level

def estimate_facility_power(gpu_profiles, facility_config):
    """
    Scale GPU power to facility power

    Args:
        gpu_profiles: DataFrame with a 'gpu_power' column (W per GPU)
        facility_config: Dict with facility parameters

    Returns:
        Series of facility power estimates (W), one per input row
    """
    # Server-level scaling
    server_power = (
        gpu_profiles['gpu_power'] * facility_config['gpu_per_server'] +
        facility_config['cpu_power'] +
        facility_config['memory_power'] +
        facility_config['storage_power'] +
        facility_config['server_overhead']
    )

    # Rack-level IT load (servers plus networking)
    rack_it_power = (
        server_power * facility_config['servers_per_rack'] +
        facility_config['network_power']
    )
    it_power = rack_it_power * facility_config['racks']

    # Facility-level scaling. PUE = total facility power / IT power, so it
    # already covers cooling, lighting, and other non-IT loads. Apply either
    # PUE or the itemized overheads, never both.
    if facility_config.get('pue') is not None:
        return it_power * facility_config['pue']

    return (
        it_power +
        facility_config['rack_cooling'] * facility_config['racks'] +
        facility_config['hvac'] +
        facility_config['lighting'] +
        facility_config['infrastructure']
    )

# Example facility configuration
facility_config = {
    'gpu_per_server': 8,
    'servers_per_rack': 10,
    'racks': 50,
    'cpu_power': 200,  # W per server
    'memory_power': 50,  # W per server
    'storage_power': 30,  # W per server
    'server_overhead': 20,  # W per server
    'network_power': 500,  # W per rack
    'rack_cooling': 1000,  # W per rack (itemized path only)
    'hvac': 50000,  # W (itemized path only)
    'lighting': 10000,  # W (itemized path only)
    'infrastructure': 20000,  # W (itemized path only)
    'pue': 1.4  # set to None to use the itemized overheads instead
}

Research Applications

Capacity Planning

Questions Answered:

What peak demand should the grid connection support?
How much on-site generation is needed?
What battery storage capacity is required?
How many GPUs can the facility support?
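The last question can be approached by inverting the bottom-up model: divide the grid capacity by PUE to get an IT power budget, then count how many whole servers fit at peak draw. This is a simplified sketch that ignores networking and rack-level loads; the 5 MW capacity is hypothetical, and the 300 W of non-GPU server power is the sum of the CPU, memory, storage, and overhead values from the example configuration.

```python
def max_gpus_supported(grid_capacity_w, pue, gpu_peak_w, gpus_per_server, server_other_w):
    """Largest whole-server GPU count whose peak facility draw fits the grid capacity."""
    it_budget_w = grid_capacity_w / pue                      # power left for IT equipment
    server_peak_w = gpus_per_server * gpu_peak_w + server_other_w
    servers = int(it_budget_w // server_peak_w)              # whole servers only
    return servers * gpus_per_server

gpus = max_gpus_supported(
    grid_capacity_w=5_000_000,  # hypothetical 5 MW grid connection
    pue=1.4,
    gpu_peak_w=700,             # H100 peak power from the hardware reference
    gpus_per_server=8,
    server_other_w=300,         # CPU + memory + storage + overhead per server
)
```

Sizing against peak GPU power is conservative; a measured peak-to-average ratio would indicate how much of that headroom is actually used.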

Energy Optimization

Use Cases:

Workload scheduling to minimize peak demand
Renewable energy integration timing
Cooling system optimization
Power-aware job scheduling
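As an illustration of scheduling to limit peak demand, the sketch below tries every start hour for one deferrable, constant-power job and keeps the hour that yields the lowest facility peak. The baseline profile and job size are hypothetical, and a production scheduler would handle many jobs and constraints at once.

```python
def best_start_hour(baseline_w, job_w, job_hours):
    """Return (start_hour, resulting_peak_w) with the lowest peak for one deferrable job."""
    best = None
    for start in range(len(baseline_w) - job_hours + 1):
        # Add the job's power to the hours it would occupy
        combined = list(baseline_w)
        for hour in range(start, start + job_hours):
            combined[hour] += job_w
        peak = max(combined)
        # Strict comparison keeps the earliest start when several tie
        if best is None or peak < best[1]:
            best = (start, peak)
    return best

# Hypothetical hourly baseline in kW and a 400 kW job lasting two hours
baseline_kw = [900, 1000, 600, 700, 1200, 1100]
start_hour, peak_kw = best_start_hour(baseline_kw, job_w=400, job_hours=2)
```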

Cost Estimation

Benefits:

Accurate energy cost predictions
Operational cost modeling
Infrastructure investment sizing
ROI calculations for efficiency measures
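Energy cost follows from integrating the power trace over time and multiplying by a tariff. A minimal sketch, assuming a flat price per kWh (the 0.10 figure is a placeholder, not a quoted rate) and a fixed sampling interval:

```python
def energy_cost(power_w, interval_s, price_per_kwh):
    """Return (energy_kwh, cost) for a power trace sampled at a fixed interval."""
    # Each sample is held for one interval; 3.6e6 joules make one kWh
    energy_kwh = sum(power_w) * interval_s / 3.6e6
    return energy_kwh, energy_kwh * price_per_kwh

# Hypothetical day at a constant 2 MW, sampled hourly
energy_kwh, cost = energy_cost([2_000_000] * 24, interval_s=3600, price_per_kwh=0.10)
```

Time-of-use tariffs or demand charges would need the per-interval prices and the peak value as well, which the same trace provides.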

Dataset Availability

Public Dataset: Power profile measurements made publicly available

Dataset Contents:

Training workload power profiles
Fine-tuning power profiles
Inference power profiles
Timestamps and metadata
GPU utilization data

Reproducibility: Benchmarks and methods fully documented

GPU Hardware Reference

NVIDIA H100 GPU:

Peak power: ~700W (SXM form factor)
Typical training power: 450-600W
Typical inference power: 300-500W
Memory: 80GB HBM3
Architecture: Hopper

Power Measurement Tools:

nvidia-smi (utility)
pynvml (Python library)
dcgm (Data Center GPU Manager)
Power meters (hardware)
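For a quick check without NVML bindings, `nvidia-smi` can report power draw in CSV form. The sketch below wraps that query and parses the output; the helper names are illustrative. Launching a process per reading is likely too slow for 0.1-second sampling, so this suits spot checks more than profiling, and GPUs that report `[N/A]` for power would need extra handling.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,power.draw", "--format=csv,noheader,nounits"]

def parse_power_csv(text):
    """Parse 'index, watts' lines into {gpu_index: watts}."""
    readings = {}
    for line in text.strip().splitlines():
        index, watts = (field.strip() for field in line.split(","))
        readings[int(index)] = float(watts)
    return readings

def read_gpu_power():
    """Run nvidia-smi once and return per-GPU power draw in watts."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_power_csv(result.stdout)

# Parsing a captured example line pair; read_gpu_power() requires an NVIDIA driver
example = parse_power_csv("0, 451.23\n1, 448.90\n")
```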

Facility Infrastructure Components

Power Infrastructure

UPS Systems: Uninterruptible power supply
PDU: Power distribution units
Transformers: Voltage conversion
Switchgear: Power switching

Cooling Infrastructure

HVAC: Heating, ventilation, air conditioning
Chillers: Chilled-water plants that reject heat from the cooling loop
CRAC: Computer room air conditioning
Liquid cooling: Direct-to-chip cooling

Networking Infrastructure

Switches: Network switches
Routers: Network routers
Cabling: Fiber and copper cables
Load balancers: Traffic distribution

Research Paper Reference

Paper: "Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning"

Authors: Roberto Vercellino, Jared Willard, Gustavo Campos, et al.
arXiv ID: 2604.07345
Published: April 8, 2026
Categories: eess.SY, cs.DC, cs.LG
Link: https://arxiv.org/abs/2604.07345

Related Skills

data-center-operations: Facility management
energy-aware-computing: Power optimization
gpu-optimization: GPU performance tuning
benchmarking: Workload characterization
