rerun-lerobot

Last updated: June 17, 2026 at 23:42

Ingest a LeRobot (HuggingFace) dataset into Rerun. Read when converting a LeRobot dataset to RRDs, splitting it into per-episode segments, or registering it on a Rerun catalog. Covers the built-in directory importer (log_file_from_path), the RrdReader + send_chunks per-episode split, and when to drop to ParquetReader for custom control.

Source: rerun-io/trossen-oss

SKILL.md

name	rerun-lerobot
description	Ingest a LeRobot (HuggingFace) dataset into Rerun. Read when converting a LeRobot dataset to RRDs, splitting it into per-episode segments, or registering it on a Rerun catalog. Covers the built-in directory importer (log_file_from_path), the RrdReader + send_chunks per-episode split, and when to drop to ParquetReader for custom control.
user_invocable	true
allowed-tools	Read, Grep, Bash, WebFetch

Rerun LeRobot Ingestion

Rerun has a built-in LeRobot importer: point log_file_from_path (or the viewer, or rerun <dir> on the CLI) at the dataset directory and it ingests episodes, camera videos, and state/action tables with no conversion code. There is no chunk-level LeRobotReader; the chunk-processing route is to import first, then reprocess the resulting RRD with RrdReader. The complete working pipeline this skill follows is prepare_dataset.py in https://github.com/rerun-io/rerun/tree/main/examples/python/dataloader.

Verified against rerun-sdk 0.34.0a1. The download step needs huggingface_hub.

Step 1: dataset → one combined RRD

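from pathlib import Path

from huggingface_hub import snapshot_download
import rerun as rr

# Placeholder paths; adjust to your environment.
dest = Path("data/so101-pick-and-place")   # local copy of the dataset
combined_rrd = Path("data/combined.rrd")   # output: all episodes in one RRD

dataset_dir = snapshot_download(repo_id="rerun/so101-pick-and-place",
                                repo_type="dataset", local_dir=dest)

with rr.RecordingStream("lerobot") as rec:
    rec.save(str(combined_rrd))
    rec.log_file_from_path(str(dataset_dir))   # the built-in importer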

The importer emits one recording per episode (recording ids like episode_1), plus a metadata-only root recording, all into the single RRD.

Step 2: split into per-episode RRDs

Catalog segments are one-recording-per-file, and recording_id becomes the segment id on registration. Split with RrdReader:

# zero_pad is a user-defined helper (not part of the SDK);
# rrd_dir is the output directory as a pathlib.Path.
rrd_dir.mkdir(parents=True, exist_ok=True)

reader = rr.experimental.RrdReader(str(combined_rrd))
for entry in reader.recordings():
    store = reader.store(store=entry)
    if not store.schema().entity_paths():       # skip the metadata-only root recording
        continue
    episode_id = zero_pad(entry.recording_id)   # episode_1 -> episode_00001
    with rr.RecordingStream("lerobot", recording_id=episode_id, send_properties=False) as rec:
        rec.save(str(rrd_dir / f"{episode_id}.rrd"))
        rec.send_chunks(store)                  # copy the chunks under the new recording_id

Two non-obvious moves, both from prepare_dataset.py:

Zero-pad the episode id. episode_10 sorts before episode_2 lexicographically; segment tables and viewers sort lexicographically. Pad to a fixed width when re-assigning recording_id.
send_properties=False on the new stream, so the copy doesn't inject fresh recording properties on top of the copied chunks.

send_chunks does not preserve the source store's identity; the new stream's recording_id wins, which is exactly what makes the rename work.
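
The zero-padding step above can be sketched as a small standalone helper. This is an illustration, not the zero_pad from prepare_dataset.py: the regex, the default width of 5 (chosen to match the episode_1 -> episode_00001 example), and the pass-through for ids without a trailing number are all assumptions.

```python
import re


def zero_pad(recording_id: str, width: int = 5) -> str:
    """Pad the trailing number of an id to a fixed width.

    Hypothetical helper: ids without a trailing number are returned unchanged.
    """
    match = re.fullmatch(r"(.*?)(\d+)", recording_id)
    if match is None:
        return recording_id
    prefix, number = match.groups()
    return f"{prefix}{int(number):0{width}d}"


ids = ["episode_1", "episode_2", "episode_10"]

# Lexicographic sort puts episode_10 before episode_2.
unpadded_order = sorted(ids)

# After padding, lexicographic order matches episode order.
padded_order = sorted(zero_pad(i) for i in ids)
```

Pick a width that covers the largest episode count you expect; changing the width later changes the segment ids, which matters if the episodes are already registered.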

If episodes need cleanup (drop topics, fix data, add derived components), run the store through lenses between read and write: reader.stream(store=entry).drop(...).lenses(...), then collect(optimize=OptimizationProfile.OBJECT_STORE).write_rrd(..., recording_id=episode_id). See rerun-chunk-processing for the stream mechanics. Use the OBJECT_STORE profile whenever the RRDs are headed for a catalog.

Step 3: register on a catalog

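# client: an already-connected catalog client
dataset = client.create_dataset("my_lerobot_set")
# Path.as_uri() raises ValueError on relative paths, so resolve first.
dataset.register_prefix(rrd_dir.resolve().as_uri())   # base segments, one per episode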

Computed layers and per-episode properties then follow the standard patterns in rerun-data-model (layer recording_id must equal the episode segment id).

When to bypass the importer

The importer decides the entity layout for you. Reach for ParquetReader (see rerun-parquet) on the dataset's data/*.parquet files when you need a different model: custom entity paths, selected columns only, typed Transform3D assembly from flat columns, or ingestion without materializing videos. You then own video handling and episode boundaries yourself; prefer the importer unless its layout actually blocks you.

Gotchas

log_file_from_path must target the dataset root directory, not a file inside it.
Unpadded episode ids mis-sort downstream; pad before registering.
The combined RRD contains a metadata-only root recording; skip stores with no entity paths or you register an empty segment.
Re-registering after a fix: the catalog keys segments by recording_id; reuse the same ids to update in place rather than minting new ones.
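
The first gotcha can be guarded against before calling log_file_from_path. The sketch below walks upward from whatever path was supplied until it finds the dataset root. It assumes the root is the directory containing meta/info.json, which is the usual LeRobot layout but is not stated in this skill; find_dataset_root and the marker parameter are hypothetical names.

```python
from pathlib import Path


def find_dataset_root(path: Path, marker: str = "meta/info.json") -> Path:
    """Return the nearest directory at or above `path` that contains `marker`.

    Assumption: a LeRobot dataset root is identified by meta/info.json.
    """
    start = path if path.is_dir() else path.parent
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no {marker} found at or above {path}")
```

Passing the result to log_file_from_path means a path to a parquet or video file inside the dataset still resolves to the directory the importer expects.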

References

https://github.com/rerun-io/rerun/tree/main/examples/python/dataloader — prepare_dataset.py (download → import → split → register, complete and runnable) and train.py (training-side consumption via rerun.experimental.dataloader)
rerun-parquet (direct parquet route), rerun-chunk-processing (RrdReader, lenses, OBJECT_STORE), rerun-data-model (segments, layers, properties)
