Run any Skill in Manus with one click

$pwd:

hologres-schema-generator

Name: Hologres Schema Generator
Author: aliyun

// Hologres DDL schema design and table creation expert. Use for generating CREATE TABLE statements, choosing storage formats (column/row/row-column), configuring table properties (distribution_key, clustering_key, bitmap_columns, event_time_column), designing partition tables, selecting data types, and optimizing table schemas for different query patterns. Triggers: "hologres建表", "hologres create table", "hologres schema", "hologres DDL", "distribution_key", "clustering_key", "bitmap_columns", "event_time_column", "列存", "行存", "行列共存", "分区表", "hologres table design", "表设计", "hologres数据类型"

Run Skill in Manus

$ git log --oneline --stat

stars:13

forks:0

updated:May 14, 2026 at 07:56

File Explorer

6 files

SKILL.md

readonly

related-skills.json

same repository

hologres-instance-health-analyse.md

from "aliyun/hologres-ai-plugins"

Hologres 实例健康诊断与巡检分析。当用户提到实例健康诊断、实例巡检、实例使用状况分析、慢查询分析、报错分析、Warehouse 资源分析、连接数分析、CPU 内存使用分析、查询失败排查等场景时使用。覆盖 Warehouse 资源巡检、FAILED 报错归类分析、CPU/内存粒度慢查询分析三大核心模块，输出结构化诊断报告和优化建议。所有 SQL 通过 hologres-cli 执行，享有安全护栏、结构化 JSON 输出和自动错误重试能力。

2026-05-1513

hologres-cli.md

from "aliyun/hologres-ai-plugins"

AI-agent-friendly Hologres CLI with safety guardrails and structured JSON output. Use for database operations, schema inspection, SQL execution, data import/export, Dynamic Table lifecycle management (V3.1+ syntax), and GUC parameter management. Triggers: "hologres cli", "hologres command", "hologres database", "dynamic table", "hologres查询", "hologres guc", "GUC parameter"

2026-05-1413

hologres-ad-campaign.md

from "aliyun/hologres-ai-plugins"

通过 Hologres AI Function 生成广告素材并模拟投放效果分析。当用户提到广告视频生成、宣传视频制作、素材转视频、Hologres AI Function 生成视频、虚拟投放模拟、ROI 分析、投放效果分析、广告素材生成等场景时使用。全 SQL 链路覆盖从素材管理、主题图片生成、分镜脚本、视频合成、虚拟投放到实时 ROI 分析的完整闭环。

2026-05-1413

hologres-bsi-profile-analysis.md

from "aliyun/hologres-ai-plugins"

Hologres BSI（位切片索引）画像分析 Skill，用于用户画像和标签计算。适用于 BSI 表设计、数据导入、属性标签+行为标签联合人群圈选、 GMV 分析、标签分布统计、Top K 查询、分桶并行计算等场景。 Triggers: "BSI", "位切片索引", "画像分析", "用户画像", "标签计算", "人群圈选", "行为标签", "bsi_build", "bsi_sum", "bsi_filter", "bsi_stat", "bsi_topk", "roaring bitmap profile", "标签圈人", "分桶计算"

2026-05-1413

hologres-privileges.md

from "aliyun/hologres-ai-plugins"

Hologres privilege management using PostgreSQL standard authorization model (expert permission model). Use for creating users, granting/revoking Schema/table/column/view privileges, configuring default privileges for future objects, diagnosing permission issues, and planning role-based access control. Triggers: "hologres权限", "hologres grant", "hologres revoke", "permission denied", "权限管理", "hologres privileges", "hologres authorization", "default privileges", "角色权限", "授权"

2026-05-1413

hologres-query-optimizer.md

from "aliyun/hologres-ai-plugins"

Hologres Query Execution Plan Analyzer and Optimizer. Use for analyzing SQL performance issues, understanding EXPLAIN/EXPLAIN ANALYZE output, interpreting query operators, and providing optimization recommendations for Hologres queries. Triggers: "hologres explain", "query plan", "execution plan", "sql optimization", "query performance", "hologres performance", "slow query", "query optimizer", "explain analyze"

2026-05-1413

package.json

"author": "aliyun"

"repository": "aliyun/hologres-ai-plugins"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Database ArchitectsComputer and Mathematical Occupations15-1243L4

name

hologres-schema-generator

description

Hologres DDL schema design and table creation expert. Use for generating CREATE TABLE statements, choosing storage formats (column/row/row-column), configuring table properties (distribution_key, clustering_key, bitmap_columns, event_time_column), designing partition tables, selecting data types, and optimizing table schemas for different query patterns. Triggers: "hologres建表", "hologres create table", "hologres schema", "hologres DDL", "distribution_key", "clustering_key", "bitmap_columns", "event_time_column", "列存", "行存", "行列共存", "分区表", "hologres table design", "表设计", "hologres数据类型"

Prerequisites

This skill requires hologres-cli to be installed first:

pip install hologres-cli
export HOLOGRES_SKILL=hologres-schema-generator

All SQL execution depends on hologres-cli commands (hologres sql run --write, hologres table create).

Hologres Schema Design & Table Creation

Generate optimized Hologres DDL statements with proper storage format, indexing, and distribution strategies.

Information Gathering (IMPORTANT)

Before generating DDL, check whether the user has provided enough context. Many Hologres table properties are immutable after creation (orientation, distribution_key, clustering_key, event_time_column, primary key, partition column) — getting them wrong means recreating the table.

Required Information

If ANY of the following are unclear, ask the user before generating DDL:

Dimension	Why Critical	Example Question
Query pattern	Determines `orientation` (column/row/row-column)	"This table is mainly used for OLAP analytics, KV point lookups, or both?"
Column definitions	Core schema structure	"Can you list the main columns and their rough types (text, number, timestamp, etc.)?"
Primary key / unique key	Determines UPSERT capability; immutable	"Does this table need deduplication or updates by a unique key? If so, which column(s)?"
Main query conditions	Determines distribution_key, clustering_key, bitmap	"What columns are most often used in WHERE, JOIN, or GROUP BY?"

Conditionally Required

Ask these only when relevant signals appear:

Signal	Follow-up Question
Table looks like time-series or log data	"What is the estimated daily data volume? (determines whether to use partitioning)"
User mentions JOIN with other tables	"Which column will this table JOIN on, and what is the distribution_key of the other table?"
Large data volume or mentions retention	"How long should data be retained? (recommend dynamic partition management for lifecycle)"
User mentions both analytics and serving	"What is the read pattern — batch scans, point lookups by ID, or both?"

Can Use Defaults (No Need to Ask)

Property	Default Behavior
`bitmap_columns`	Auto-enabled for TEXT columns; can ALTER later
`dictionary_encoding_columns`	Use `:auto` mode; can ALTER later
`storage_mode`	Default `hot`; can ALTER later
`time_to_live_in_seconds`	Not recommended. Deletion time is non-deterministic. Use dynamic partition management instead

Gathering Flow

User request → Check available info
  |
  +-- Columns + query pattern + key info all clear?
  |     → Generate DDL directly
  |
  +-- Missing critical info?
  |     → Ask 1-3 focused questions (batch them, don't ask one at a time)
  |
  +-- Very vague request (e.g., "help me create a user table")?
        → Ask: 1) main columns  2) query pattern  3) key/dedup needs

Principle: Batch questions into a single round. Never ask more than 3-4 questions at once. If the user provides partial info, fill in reasonable defaults for the rest and explain your assumptions in comments.

Storage Formats

Choose the storage format based on your primary query pattern.

Format	Syntax	Best For	Primary Key
Column store	`orientation = 'column'`	OLAP, aggregation, scan-heavy queries	Optional
Row store	`orientation = 'row'`	Point lookups, high-QPS KV queries	Required
Row-column store	`orientation = 'row,column'`	Mixed workloads (OLAP + point lookup)	Required

Default recommendation: Use row,column (row-column store) when the workload is unclear. It handles both OLAP and point queries well.

Quick Start

-- Column store table (OLAP / analytics)
CREATE TABLE orders (
  order_id BIGINT NOT NULL,
  user_id BIGINT,
  amount DECIMAL(18,2),
  status TEXT,
  order_time TIMESTAMPTZ NOT NULL,
  ds TEXT NOT NULL,
  PRIMARY KEY (order_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'column',
  distribution_key = 'order_id',
  clustering_key = 'order_time:asc',
  event_time_column = 'order_time',
  bitmap_columns = 'status,user_id'
);

-- Row store table (high-QPS point lookup)
CREATE TABLE user_profile (
  user_id BIGINT NOT NULL,
  name TEXT,
  email TEXT,
  tags JSONB,
  updated_at TIMESTAMPTZ,
  PRIMARY KEY (user_id)
)
WITH (
  orientation = 'row',
  distribution_key = 'user_id',
  clustering_key = 'user_id'
);

CREATE TABLE Syntax (V2.1+)

CREATE TABLE [IF NOT EXISTS] [schema.]table_name (
  column_name data_type [NOT NULL] [DEFAULT expr],
  ...
  [PRIMARY KEY (col1 [, col2, ...])]
)
[PARTITION BY LIST (partition_column)]
WITH (
  orientation = '{column | row | row,column}',
  distribution_key = 'col1[,col2]',
  clustering_key = 'col1[:asc|:desc][,col2[:asc|:desc]]',
  event_time_column = 'col',
  bitmap_columns = 'col1[,col2,...]',
  dictionary_encoding_columns = 'col1[:auto|:on|:off][,...]',
  time_to_live_in_seconds = 'N',          -- NOT recommended, see below
  storage_mode = '{hot | cold}',
  table_group = 'group_name'
);

Legacy Syntax (All Versions)

BEGIN;
CREATE TABLE table_name (...);
CALL set_table_property('table_name', 'orientation', 'column');
CALL set_table_property('table_name', 'distribution_key', 'col1');
CALL set_table_property('table_name', 'clustering_key', 'col1:asc');
CALL set_table_property('table_name', 'event_time_column', 'col1');
CALL set_table_property('table_name', 'bitmap_columns', 'col1,col2');
COMMIT;

Table Properties Reference

distribution_key (Distribution Key)

Controls how data is hash-distributed across shards. Critical for JOIN and GROUP BY performance.

Rule	Description
Choose high-cardinality columns	Avoid data skew
Use JOIN/GROUP BY columns	Enable local computation, avoid shuffle
Max 2 columns	More columns reduce distribution effectiveness
Must be subset of PK	If table has a primary key
Immutable after creation	Cannot ALTER, must recreate table

-- Single column
distribution_key = 'user_id'

-- Two columns (join on both)
distribution_key = 'order_id,user_id'

clustering_key (Clustering Key / Sorted Index)

Physically sorts data within files. Accelerates range queries and filters.

Rule	Description
Use range-query columns	e.g., timestamp, date columns
Max 2-3 columns	More columns dilute sort benefit
Put high-selectivity column first	Most-filtered column goes first
Specify sort order	`:asc` (default) or `:desc`
Column store only	Row store uses PK as sort key

clustering_key = 'order_time:asc'
clustering_key = 'ds:asc,order_time:asc'

event_time_column (Segment Key)

Organizes data files by time ranges. Enables file-level pruning for time-range queries.

Rule	Description
Use time/date columns	Timestamp of data ingestion or event time
At most 1 column	Only one segment key per table
Column store only	Not applicable to row store
Combine with partition	Partition for coarse pruning, segment key for fine pruning

event_time_column = 'event_time'

bitmap_columns (Bitmap Index)

Builds bitmap indexes for fast equality filtering on low-to-medium cardinality columns.

Rule	Description
Use filter columns	Columns frequently in WHERE clause
Low-medium cardinality	status, type, region — NOT user_id
TEXT columns auto-enabled	Default bitmap for text columns
Can be added after creation	`ALTER TABLE ... SET (bitmap_columns = ...)`

bitmap_columns = 'status,payment_type,region'

dictionary_encoding_columns (Dictionary Encoding)

Compresses text columns by mapping values to integers. Speeds up GROUP BY and aggregations.

Rule	Description
Use `:auto` mode	Let Hologres decide based on cardinality
Good for low-cardinality text	country, status, category
Avoid high-cardinality columns	user_id, order_id — no compression benefit
Don't set if unsure	Incorrect setting may hurt performance

dictionary_encoding_columns = 'country:auto,status:auto'

time_to_live_in_seconds (TTL) — NOT RECOMMENDED

Do NOT use time_to_live_in_seconds. The actual deletion time is non-deterministic — data will be deleted at an arbitrary time after the specified TTL, not at a precise point. This makes it unreliable for data lifecycle management.

Recommended alternative: Use dynamic partition management — create daily/hourly partitions and drop old partitions on a schedule (via cron or scheduling system). This gives you precise, predictable data lifecycle control.

-- BAD: TTL-based lifecycle (deletion time unpredictable)
-- time_to_live_in_seconds = '2592000'

-- GOOD: Partition-based lifecycle (precise control)
-- 1. Create table with daily partitions
CREATE TABLE events (
  event_id BIGINT NOT NULL,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, event_id)
) PARTITION BY LIST (ds)
WITH (orientation = 'column', distribution_key = 'event_id');

-- 2. Drop old partitions on schedule (e.g., retain 30 days)
DROP TABLE IF EXISTS events_20251101;  -- drop partition older than 30 days

Primary Key Design

Storage Format	PK Requirement	Notes
Column store	Optional	Add PK only if needed for UPSERT
Row store	Required	PK drives the row-store index
Row-column store	Required	PK serves both point lookup and analytics

Rules:

Max 32 columns in composite PK
PK columns must be NOT NULL and UNIQUE
Prohibited types: FLOAT, DOUBLE, NUMERIC, ARRAY, JSON, JSONB
PK cannot be altered after creation — must recreate table
Avoid SERIAL as PK — causes table-level locks on write

Partition Table Design

Use partitions for large tables with time-based or categorical data.

-- Parent table
CREATE TABLE events (
  event_id BIGINT NOT NULL,
  user_id BIGINT,
  event_type TEXT,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, event_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'column',
  distribution_key = 'event_id',
  clustering_key = 'ds:asc',
  event_time_column = 'ds'
);

-- Child partitions
CREATE TABLE events_20251201 PARTITION OF events FOR VALUES IN ('20251201');
CREATE TABLE events_20251202 PARTITION OF events FOR VALUES IN ('20251202');

Rules:

Only LIST partitioning is supported
Partition column must be part of PK (if PK exists)
Supported partition column types: TEXT, VARCHAR, INT, DATE (V1.3.22+)
Skip daily partitions if daily data < 100M rows — use event_time_column instead
Always filter on partition column in queries for pruning

Scenario-Based Templates

1. High-QPS Point Lookup (KV)

CREATE TABLE user_kv (
  user_id BIGINT NOT NULL PRIMARY KEY,
  profile JSONB,
  updated_at TIMESTAMPTZ
)
WITH (
  orientation = 'row',
  distribution_key = 'user_id'
);

2. Prefix Range Scan

CREATE TABLE order_lines (
  order_id BIGINT NOT NULL,
  line_no INT NOT NULL,
  product_id BIGINT,
  quantity INT,
  PRIMARY KEY (order_id, line_no)
)
WITH (
  orientation = 'row',
  distribution_key = 'order_id',
  clustering_key = 'order_id'
);

3. Time-Range Analytics (Partitioned)

CREATE TABLE page_views (
  view_id BIGINT NOT NULL,
  user_id BIGINT,
  page_url TEXT,
  view_time TIMESTAMPTZ,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, view_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'column',
  distribution_key = 'view_id',
  clustering_key = 'view_time:asc',
  event_time_column = 'view_time',
  bitmap_columns = 'user_id,page_url'
);

4. Dimension Filtering (Non-Time)

CREATE TABLE product_stats (
  product_id BIGINT NOT NULL PRIMARY KEY,
  category TEXT,
  brand TEXT,
  sales_count BIGINT,
  revenue DECIMAL(18,2)
)
WITH (
  orientation = 'column',
  distribution_key = 'product_id',
  clustering_key = 'category:asc',
  bitmap_columns = 'category,brand'
);

5. JOIN-Optimized Fact + Dimension

-- Fact table: distribute by join key
CREATE TABLE fact_orders (
  order_id BIGINT NOT NULL PRIMARY KEY,
  customer_id BIGINT,
  amount DECIMAL(18,2),
  order_date TEXT
)
WITH (
  orientation = 'column',
  distribution_key = 'customer_id'
);

-- Dimension table: same distribution key
CREATE TABLE dim_customers (
  customer_id BIGINT NOT NULL PRIMARY KEY,
  name TEXT,
  region TEXT
)
WITH (
  orientation = 'row,column',
  distribution_key = 'customer_id'
);
-- JOIN on customer_id → local join, no shuffle

6. Mixed Workload (OLAP + Serving)

CREATE TABLE realtime_metrics (
  metric_id BIGINT NOT NULL,
  device_id BIGINT NOT NULL,
  value DOUBLE PRECISION,
  ts TIMESTAMPTZ NOT NULL,
  ds TEXT NOT NULL,
  PRIMARY KEY (ds, device_id, metric_id)
)
PARTITION BY LIST (ds)
WITH (
  orientation = 'row,column',
  distribution_key = 'device_id',
  clustering_key = 'ts:asc',
  event_time_column = 'ts',
  bitmap_columns = 'device_id'
);

References

Document	Content
data-types.md	Complete data type reference
table-properties.md	Detailed table property guide with selection flowcharts
partition-guide.md	Partition table design and management

Best Practices

Choose storage format first — column for OLAP, row for KV, row-column when unsure
Set distribution_key to JOIN/GROUP BY columns — avoids cross-shard shuffle
Set clustering_key for range-query columns — improves time-range and filter queries
Set event_time_column for time-series data — enables file-level pruning
Use bitmap_columns for low-cardinality filter columns — status, type, region
Partition by date only if daily data > 100M rows — otherwise use segment key alone
PK must include partition column — required by Hologres for partition tables
Align distribution_key across JOINed tables — same column enables local join
Use WITH syntax (V2.1+) — cleaner than CALL set_table_property in transactions
Avoid SERIAL as primary key — causes table-level write locks

hologres-schema-generator

More from this repository

More from this repository

Prerequisites

Hologres Schema Design & Table Creation

Information Gathering (IMPORTANT)

Required Information

Conditionally Required

Can Use Defaults (No Need to Ask)

Gathering Flow

Storage Formats

Quick Start

CREATE TABLE Syntax (V2.1+)

Legacy Syntax (All Versions)

Table Properties Reference

distribution_key (Distribution Key)

clustering_key (Clustering Key / Sorted Index)

event_time_column (Segment Key)

bitmap_columns (Bitmap Index)

dictionary_encoding_columns (Dictionary Encoding)

time_to_live_in_seconds (TTL) — NOT RECOMMENDED

Primary Key Design

Partition Table Design

Scenario-Based Templates

1. High-QPS Point Lookup (KV)

2. Prefix Range Scan

3. Time-Range Analytics (Partitioned)

4. Dimension Filtering (Non-Time)

5. JOIN-Optimized Fact + Dimension

6. Mixed Workload (OLAP + Serving)

References

Best Practices

Prerequisites

Hologres Schema Design & Table Creation

Information Gathering (IMPORTANT)

Required Information

Conditionally Required

Can Use Defaults (No Need to Ask)

Gathering Flow

Storage Formats

Quick Start

CREATE TABLE Syntax (V2.1+)

Legacy Syntax (All Versions)

Table Properties Reference

distribution_key (Distribution Key)

clustering_key (Clustering Key / Sorted Index)

event_time_column (Segment Key)

bitmap_columns (Bitmap Index)

dictionary_encoding_columns (Dictionary Encoding)

time_to_live_in_seconds (TTL) — NOT RECOMMENDED

Primary Key Design

Partition Table Design

Scenario-Based Templates

1. High-QPS Point Lookup (KV)

2. Prefix Range Scan

3. Time-Range Analytics (Partitioned)

4. Dimension Filtering (Non-Time)

5. JOIN-Optimized Fact + Dimension

6. Mixed Workload (OLAP + Serving)

References

Best Practices