rerun-lerobot

Stars: 10,976

Forks: 772

Last updated: June 17, 2026 at 01:14

Ingest a LeRobot (HuggingFace) dataset into Rerun. Read when converting a LeRobot dataset to RRDs, splitting it into per-episode segments, or registering it on a Rerun catalog. Covers the built-in directory importer (log_file_from_path), the RrdReader + send_chunks per-episode split, and when to drop to ParquetReader for custom control.

Source: rerun-io/rerun

SKILL.md

name	rerun-lerobot
description	Ingest a LeRobot (HuggingFace) dataset into Rerun. Read when converting a LeRobot dataset to RRDs, splitting it into per-episode segments, or registering it on a Rerun catalog. Covers the built-in directory importer (log_file_from_path), the RrdReader + send_chunks per-episode split, and when to drop to ParquetReader for custom control.
user_invocable	true
allowed-tools	Read, Grep, Bash, WebFetch

Rerun LeRobot ingestion

Rerun has a built-in LeRobot importer: point log_file_from_path (or the viewer, or rerun <dir> on the CLI) at the dataset directory and it ingests episodes, camera videos, and state/action tables with no conversion code. There is no chunk-level LeRobotReader; the chunk-processing route is to import first, then reprocess the resulting RRD with RrdReader.

The download step needs huggingface_hub.

Step 1: dataset -> one combined RRD

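from pathlib import Path

from huggingface_hub import snapshot_download
import rerun as rr

# Placeholder paths (not specified in the source); adjust to your layout.
dest = Path("data/so101-pick-and-place")
combined_rrd = Path("data/combined.rrd")

# Download the dataset snapshot into dest and get back its local directory.
dataset_dir = snapshot_download(repo_id="rerun/so101-pick-and-place", repo_type="dataset", local_dir=dest)

with rr.RecordingStream("rerun_example_lerobot") as rec:
    rec.save(str(combined_rrd))
    rec.log_file_from_path(str(dataset_dir))  # the built-in importer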

The importer emits one recording per episode (recording ids like episode_1), plus a metadata-only root recording, all into the single RRD.
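Because the importer is pointed at a directory rather than a file, a small pre-flight check before Step 1 can catch a wrong path early. The sketch below is not part of Rerun: `find_dataset_root` is a hypothetical helper, and it assumes the dataset follows the LeRobot layout with a `meta/info.json` file at its root.

```python
from pathlib import Path


def find_dataset_root(path: Path) -> Path:
    """Return the nearest directory at or above `path` containing meta/info.json.

    Assumption: a LeRobot dataset root is identified by meta/info.json.
    Raises FileNotFoundError if no such directory is found.
    """
    # If a file was passed (e.g. a parquet file inside the dataset), start at its folder.
    start = path if path.is_dir() else path.parent
    for candidate in (start, *start.parents):
        if (candidate / "meta" / "info.json").is_file():
            return candidate
    raise FileNotFoundError(f"no LeRobot dataset root at or above {path}")
```

Passing `find_dataset_root(Path(dataset_dir))` to `log_file_from_path` would then hand the importer the root even if a nested path was supplied by mistake.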

Step 2: split into per-episode RRDs

Catalog segments are one-recording-per-file, and recording_id becomes the segment id on registration. Split with RrdReader:

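# Assumes combined_rrd from Step 1. rrd_dir (a pathlib.Path output directory) and
# zero_pad (a user-supplied helper, not part of rerun) must be defined by the caller.
rrd_dir.mkdir(parents=True, exist_ok=True)

reader = rr.experimental.RrdReader(str(combined_rrd))
for entry in reader.recordings():
    store = reader.store(store=entry)
    if not store.schema().entity_paths():  # skip the metadata-only root recording
        continue
    episode_id = zero_pad(entry.recording_id)  # episode_1 -> episode_00001
    with rr.RecordingStream("rerun_example_lerobot", recording_id=episode_id, send_properties=False) as rec:
        rec.save(str(rrd_dir / f"{episode_id}.rrd"))
        rec.send_chunks(store)  # copy the episode's chunks under the new recording_id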

Two non-obvious moves:

Zero-pad the episode id. Segment tables and viewers sort lexicographically, so episode_10 sorts before episode_2. Pad to a fixed width when re-assigning recording_id.
Pass send_properties=False to the new stream, so the copy doesn't inject fresh recording properties on top of the copied chunks.
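The split snippet calls `zero_pad` without defining it. One possible version is sketched below; the width of 5 is inferred from the `episode_1 -> episode_00001` comment, and the helper assumes ids end in an integer. It is an illustration, not a Rerun API.

```python
import re


def zero_pad(recording_id: str, width: int = 5) -> str:
    """Pad the trailing integer of an id to a fixed width: episode_1 -> episode_00001."""
    match = re.fullmatch(r"(.*?)(\d+)", recording_id)
    if match is None:
        raise ValueError(f"no trailing integer in {recording_id!r}")
    prefix, number = match.groups()
    return f"{prefix}{int(number):0{width}d}"


ids = ["episode_10", "episode_2", "episode_1"]
print(sorted(ids))                       # unpadded: episode_10 lands before episode_2
print(sorted(zero_pad(i) for i in ids))  # padded: numeric order is restored
```

A width of 5 covers up to 99,999 episodes; larger datasets would need a wider pad chosen before any segment is registered.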

send_chunks does not preserve the source store's identity; the new stream's recording_id wins, which is exactly what makes the rename work.

If episodes need cleanup (drop topics, fix data, add derived components), run the store through lenses between read and write: reader.stream(store=entry).drop(...).lenses(...) then collect().write_rrd(..., recording_id=episode_id) (see rerun-chunk-processing).

Computed layers and per-episode properties then follow the standard patterns in rerun-data-model (layer recording_id must equal the episode segment id).

Gotchas

log_file_from_path must target the dataset root directory, not a file inside it.
Unpadded episode ids sort incorrectly downstream; pad before registering.
The combined RRD contains a metadata-only root recording; skip stores with no entity paths, or you will register an empty segment.
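The last two gotchas can be checked mechanically before registration. The sketch below works on a plain mapping of recording id to entity-path count, which the caller would collect from the reader loop; `registration_problems` is a hypothetical helper and assumes episode ids share a prefix and end in an integer.

```python
import re


def registration_problems(segments: dict[str, int]) -> list[str]:
    """Report gotchas for a {recording_id: entity_path_count} mapping."""
    # Gotcha: a recording with no entity paths would become an empty segment.
    problems = [
        f"{rid}: no entity paths (empty segment)"
        for rid, count in segments.items()
        if count == 0
    ]

    def trailing_int(rid: str) -> int:
        match = re.search(r"(\d+)$", rid)
        return int(match.group(1)) if match else -1

    # Gotcha: lexicographic order must agree with numeric episode order.
    ids = [rid for rid, count in segments.items() if count > 0]
    if sorted(ids) != sorted(ids, key=trailing_int):
        problems.append("ids do not sort numerically; zero-pad them to a fixed width")
    return problems
```

An empty return value means neither gotcha was detected; it says nothing about whether the dataset root path was correct.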

References

https://github.com/rerun-io/rerun/tree/main/examples/python/dataloader: prepare_dataset.py (download → import → split → register, complete and runnable) and train.py (training-side consumption via rerun.experimental.dataloader).

More from this repository

rerun-chunk-processing

rerun-io/rerun

Core mechanics of the Rerun Chunk Processing API (rerun.experimental) — LazyChunkStream pipelines, Chunk, lenses (MutateLens/DeriveLens/Selector), RrdReader, writing optimized RRDs. Read when building or reviewing any ingestion, conversion, or RRD preprocessing pipeline. Source-specific knowledge lives in the importer skills (rerun-mcap, rerun-urdf, rerun-parquet, rerun-lerobot); read rerun-data-model first to decide what the data should become.

2026-06-17 · 11.0k

rerun-mcap

rerun-io/rerun

Ingest MCAP files into Rerun chunk streams with rerun.experimental.McapReader. Read when converting an MCAP recording, selecting topics or decoders, decoding custom protobuf messages, or when an MCAP-derived stream comes out empty. Builds on rerun-chunk-processing (stream mechanics) and rerun-data-model (what the topics should become).

2026-06-17 · 11.0k

rerun-urdf

rerun-io/rerun

Drive the Rerun URDF API (rerun.urdf.UrdfTree) to ingest a URDF as a Transform3D layer on a robot recording. Read when logging a robot model, running forward kinematics from joint states, composing a fixed chain for sensor extrinsics, or when the transform tree will not connect from the data alone. Builds on rerun-chunk-processing (stream/lens mechanics) and rerun-data-model (entity paths, timeline, base-vs-layer).

2026-06-17 · 11.0k

rerun-blueprint

rerun-io/rerun

Design a Rerun blueprint from the data, then iterate on it from headless screenshots. Read this when laying out a recording or dataset in the viewer, designing a default blueprint, or deciding which views show which entities. Covers archetype-to-view mapping, layout reasoning, the rrb construction API, the contents grammar, and the screenshot loop.

2026-06-16 · 11.0k

rerun-data-model

rerun-io/rerun

How raw multimodal robot data maps onto the Rerun data model. Read FIRST, before modeling or converting a dataset. Resolves the entity-vs-component, property-vs-component-vs-layer, and static-vs-temporal decisions, then points at the mechanism skills (rerun-chunk-processing and the importer skills rerun-mcap, rerun-urdf, rerun-parquet, rerun-lerobot) for the how.

2026-06-16 · 11.0k
