rerun-parquet

Updated: 2026-06-17 23:42

Ingest tabular Parquet files into Rerun chunk streams with rerun.experimental.ParquetReader. Read when converting trajectory or sensor tables (LeRobot-style parquet, exported logs) into entities and components — column grouping, timeline/index columns, static columns, and ColumnRules that assemble typed components (Transform3D, Scalars) from flat columns. Builds on rerun-chunk-processing and rerun-data-model.

Source

rerun-io

rerun-io/trossen-oss

SKILL.md

name	rerun-parquet
description	Ingest tabular Parquet files into Rerun chunk streams with rerun.experimental.ParquetReader. Read when converting trajectory or sensor tables (LeRobot-style parquet, exported logs) into entities and components — column grouping, timeline/index columns, static columns, and ColumnRules that assemble typed components (Transform3D, Scalars) from flat columns. Builds on rerun-chunk-processing and rerun-data-model.
user_invocable	true
allowed-tools	Read, Grep, Bash, WebFetch

Rerun Parquet Ingestion

ParquetReader maps a flat table onto the Rerun model: column-name prefixes become entities, suffixes become components, designated columns become timelines. The whole job is configuration; fill in the rerun-data-model mapping table first, then express it through the constructor. Stream mechanics after .stream() are in rerun-chunk-processing.

Verified against rerun-sdk 0.34.0a1.

The API

```python
from rerun.experimental import ColumnRule, ParquetReader

reader = ParquetReader(
    table_path,
    entity_path_prefix="/world",         # prepended to every entity path
    column_grouping="prefix",            # "prefix" | "individual" | "explicit_prefixes"
    delimiter="_",                       # split for column_grouping="prefix"
    prefixes=None,                       # required for "explicit_prefixes"
    use_structs=True,                    # pack grouped columns into one struct component
    static_columns=["robot_type"],       # constant-per-file values, logged static
    index_columns=[("timestamp", "timestamp", "us"), ("frame_index", "sequence")],
    column_rules=[...],                  # typed-component assembly, below
)
stream = reader.stream()
```

Column grouping: which columns share an entity

"prefix" (default): split each column name on delimiter, group by the first segment. gripper_pos_x, gripper_pos_y → entity gripper.
"explicit_prefixes": group by the strings in prefixes, tried longest-first; the prefix is stripped from the component name. Use this when names contain the delimiter ambiguously (observation.state vs observation.images.top: pass the full prefixes).
"individual": every column is its own chunk/entity. Rarely the model you want; reach for it only as a debugging baseline.

use_structs=True (default) packs a group's columns into a single Arrow struct component; False emits one component per column, a flat layout that queries see as separate columns.
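The two grouping modes can be illustrated with a small pure-Python emulation. This is only a sketch of the behaviour described above, not the reader's implementation: the function names are hypothetical, unmatched columns are simply skipped, and trimming a leading separator from the stripped component name is an assumption of the sketch.

```python
from collections import defaultdict


def group_by_first_segment(columns, delimiter="_"):
    """Emulate "prefix" grouping: split on the delimiter, group by the first segment."""
    groups = defaultdict(list)
    for name in columns:
        prefix, _, rest = name.partition(delimiter)
        # A column without the delimiter forms its own group and keeps its full name.
        groups[prefix].append(rest if rest else name)
    return dict(groups)


def group_by_explicit_prefixes(columns, prefixes, delimiter="_"):
    """Emulate "explicit_prefixes": try the longest prefix first, strip it from the name."""
    ordered = sorted(prefixes, key=len, reverse=True)
    groups = defaultdict(list)
    for name in columns:
        for prefix in ordered:
            if name.startswith(prefix):
                # Assumption of this sketch: drop the separator left after stripping.
                groups[prefix].append(name[len(prefix):].lstrip(delimiter))
                break
    return dict(groups)


columns = ["observation.state.x", "observation.state.y", "observation.images.top.width"]

# First-segment grouping lumps everything under "observation".
print(group_by_first_segment(columns, delimiter="."))

# Explicit prefixes keep state and camera columns apart.
print(group_by_explicit_prefixes(
    columns, ["observation.state", "observation.images.top"], delimiter="."
))
```

Running both calls on your real column names before constructing the reader is a cheap way to see whether "prefix" grouping already gives the entities you planned in the mapping table.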

Timelines: `index_columns`

Each entry is (name, type) or (name, type, unit):

type: "timestamp" (since epoch), "duration" (elapsed), "sequence" (ordinal int).
unit describes what the raw integers in the column are ("ns" default, "us", "ms", "s"); Rerun rescales to ns internally. Ignored for "sequence".

If omitted, a synthetic row_index sequence timeline is generated. That is almost never the timeline you want to query or align against; always name the real time columns. Stamp both a timestamp and a sequence timeline when the table has both (multi-rate alignment, see rerun-data-model).
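To make the unit semantics concrete, here is a minimal standard-library sketch of the entry shapes and the rescaling to nanoseconds. The helper names are hypothetical and the validation is an assumption for illustration; the reader performs its own checks and conversion internally.

```python
from datetime import datetime, timezone

NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}
VALID_TYPES = {"timestamp", "duration", "sequence"}


def normalize_index_entry(entry):
    """Expand (name, type) or (name, type, unit) into a full (name, type, unit) triple."""
    name, kind, *rest = entry
    if kind not in VALID_TYPES:
        raise ValueError(f"unknown index type: {kind!r}")
    if kind == "sequence":
        return (name, kind, None)  # the unit is ignored for sequence timelines
    unit = rest[0] if rest else "ns"  # "ns" is the documented default
    if unit not in NS_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return (name, kind, unit)


def to_nanoseconds(raw_value, unit="ns"):
    """Rescale a raw integer from its declared unit to nanoseconds."""
    return raw_value * NS_PER_UNIT[unit]


def as_utc(ns):
    """Interpret nanoseconds since the epoch as a UTC datetime (whole seconds only)."""
    return datetime.fromtimestamp(ns // 1_000_000_000, tz=timezone.utc)


raw_us = 1_750_000_000_000_000  # a microsecond timestamp from mid-2025

correct = to_nanoseconds(raw_us, "us")  # declared with the column's real unit
wrong = to_nanoseconds(raw_us, "ns")    # same integers, mis-declared as nanoseconds

print(as_utc(correct).year)  # the intended year
print(as_utc(wrong).year)    # collapses to the start of the epoch
```

The mis-declared value is 1000 times smaller than intended, which is why a wrong unit shows up as recordings that appear to start in January 1970.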

Typed components: `column_rules`

Without rules, grouped columns stay generic struct/scalar data. Rules combine suffix-matched columns into real Rerun components so the viewer and transform system understand them:

ColumnRule.translation3d([sx, sy, sz]) → Translation3D
ColumnRule.rotation_quat([sx, sy, sz, sw]) → RotationQuat
ColumnRule.rotation_axis_angle([ax, ay, az, angle]) → RotationAxisAngle
ColumnRule.scale3d([sx, sy, sz]) → Scale3D
ColumnRule.scalars(suffixes, names=[...]) → Scalars with named series
ColumnRule.transform(translation_suffixes, rotation_suffixes) → Transform3D (3 + 4 columns; both suffix sets must match under the same sub-prefix)

```python
column_rules=[
    ColumnRule.translation3d(["_pos_x", "_pos_y", "_pos_z"], field_name_override="_pos"),
    ColumnRule.rotation_quat(["_quat_x", "_quat_y", "_quat_z", "_quat_w"], field_name_override="_quat"),
    ColumnRule.scalars(["_x", "_y", "_z"], names=["x", "y", "z"]),
]
```

Rules are tried in list order; first match wins — put specific rules before broad catch-alls (a scalars rule on ["_x", "_y", "_z"] placed first would swallow the position columns meant for translation3d).
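The ordering effect can be reproduced with a simplified pure-Python model of suffix matching. It assumes a rule claims a set of columns when every suffix is present under the same sub-prefix and that claimed columns are unavailable to later rules; the function name and the (label, suffixes) rule shape are hypothetical and do not mirror the ColumnRule internals.

```python
def assign_rules(columns, rules):
    """Apply (label, suffixes) rules in order; the first rule to match claims the columns.

    Returns (claimed, leftover), where claimed maps (sub_prefix, label) to the
    matched column names and leftover lists the columns no rule matched.
    """
    remaining = set(columns)
    claimed = {}
    for label, suffixes in rules:
        first = suffixes[0]
        for col in sorted(remaining):
            if col not in remaining or not col.endswith(first):
                continue
            stem = col[: -len(first)]
            members = [stem + suffix for suffix in suffixes]
            # The rule only fires when every suffix exists under the same sub-prefix.
            if all(member in remaining for member in members):
                claimed[(stem, label)] = members
                remaining.difference_update(members)
    return claimed, sorted(remaining)


columns = ["gripper_pos_x", "gripper_pos_y", "gripper_pos_z", "imu_x", "imu_y", "imu_z"]
translation = ("translation3d", ["_pos_x", "_pos_y", "_pos_z"])
scalars = ("scalars", ["_x", "_y", "_z"])

specific_first, _ = assign_rules(columns, [translation, scalars])
broad_first, _ = assign_rules(columns, [scalars, translation])

print(sorted(specific_first))  # position columns reach translation3d
print(sorted(broad_first))     # the scalars rule has taken them first
```

In the second ordering the position columns are matched under the sub-prefix gripper_pos by the scalars rule, so the translation rule finds nothing left to claim.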

Gotchas

No index_columns → synthetic row_index timeline only. Queries that expect a timestamp timeline find nothing.
The unit is the raw column's unit, not a desired output unit; a microsecond column declared "ns" is read as values 1000x too small, so timestamps collapse toward the epoch (early 1970) and durations shrink 1000-fold.
static_columns raises if a listed column actually varies; that error is a data-quality signal, not a reason to drop the static declaration.
Rule order: first matching rule wins.
Quaternion column order is x, y, z, w in rotation_quat; check the source's convention before wiring suffixes.
Anything the reader cannot express (per-row entity routing, derived values, unit conversion) belongs in lenses downstream, not in pandas munging before ingestion; keep the pipeline columnar (rerun-chunk-processing).
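Since a varying static column is a data-quality signal, it can help to inspect the candidates before handing them to the reader. The sketch below is a hypothetical pre-flight check over rows held as plain dicts; it only reads the data and changes nothing, and in practice the rows would come from whatever Parquet library you already use.

```python
def varying_columns(rows, static_columns):
    """Return {column: distinct values} for static candidates that hold more than one value."""
    offenders = {}
    for col in static_columns:
        distinct = {row[col] for row in rows}
        if len(distinct) > 1:
            # Sort by repr so mixed value types do not break the ordering.
            offenders[col] = sorted(distinct, key=repr)
    return offenders


rows = [
    {"robot_type": "arm_a", "operator": "alice", "frame_index": 0},
    {"robot_type": "arm_a", "operator": "bob", "frame_index": 1},
]

# "robot_type" is constant across the file; "operator" is not and should be investigated.
print(varying_columns(rows, ["robot_type", "operator"]))
```

An empty result means every listed column is constant for the file; anything else names the columns, and the conflicting values, to look into before ingestion.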

References

API source with full docstrings: rerun/experimental/_parquet_reader.py in the installed rerun-sdk package, or python -c "from rerun.experimental import ParquetReader; help(ParquetReader)"
rerun-lerobot — LeRobot datasets store episodes as parquet; that skill covers the built-in importer route vs reading the parquet directly with this reader.
rerun-data-model (mapping decisions), rerun-chunk-processing (stream mechanics after .stream())

More Skills from the same repository

rerun-blueprint

rerun-io/trossen-oss

Design a Rerun blueprint from the data, then iterate on it from headless screenshots. Read this when laying out a recording or dataset in the viewer, designing a default blueprint, or deciding which views show which entities. Covers archetype-to-view mapping, layout reasoning, the rrb construction API, the contents grammar, and the screenshot loop.

2026-06-17

rerun-catalog-queries

rerun-io/trossen-oss

Performance patterns and gotchas for querying a Rerun catalog from Python. Reach for this when a CatalogClient/dataset query is unexpectedly slow, or when shaping a per-segment / per-episode pipeline that hits the catalog from many places.

2026-06-17

rerun-chunk-processing

rerun-io/trossen-oss

Core mechanics of the Rerun Chunk Processing API (rerun.experimental) — LazyChunkStream pipelines, Chunk, lenses (MutateLens/DeriveLens/Selector), RrdReader, writing optimized RRDs. Read when building or reviewing any ingestion, conversion, or RRD preprocessing pipeline. Source-specific knowledge lives in the importer skills (rerun-mcap, rerun-urdf, rerun-parquet, rerun-lerobot); read rerun-data-model first to decide what the data should become.

2026-06-17

rerun-data-model

rerun-io/trossen-oss

How raw multimodal robot data maps onto the Rerun data model. Read FIRST, before modeling or converting a dataset. Resolves the entity-vs-component, property-vs-component-vs-layer, and static-vs-temporal decisions, then points at the mechanism skills (rerun-chunk-processing and the importer skills rerun-mcap, rerun-urdf, rerun-parquet, rerun-lerobot) for the how.

2026-06-17

rerun-lerobot

rerun-io/trossen-oss

Ingest a LeRobot (HuggingFace) dataset into Rerun. Read when converting a LeRobot dataset to RRDs, splitting it into per-episode segments, or registering it on a Rerun catalog. Covers the built-in directory importer (log_file_from_path), the RrdReader + send_chunks per-episode split, and when to drop to ParquetReader for custom control.

2026-06-17

rerun-mcap

rerun-io/trossen-oss

Ingest MCAP files into Rerun chunk streams with rerun.experimental.McapReader. Read when converting an MCAP recording, selecting topics or decoders, fixing protobuf schemas that ship without compiled descriptors, or when an MCAP-derived stream comes out empty. Builds on rerun-chunk-processing (stream mechanics) and rerun-data-model (what the topics should become).

2026-06-17
