convert-to-marimo

Name: Convert To Marimo
Author: pinecone-io

// This skill should be used when the user asks to "convert a notebook to marimo", "migrate a Jupyter notebook to marimo", "rewrite a notebook in marimo", or wants to modernize an existing .ipynb file into a high-quality marimo .py notebook.

$ git log --oneline --stat

stars: 3,024

forks: 1,072

updated: May 21, 2026 at 20:52

SKILL.md

package.json

"author": "pinecone-io"

"repository": "pinecone-io/examples"

Useful for: Software Developers, Computer and Mathematical Occupations (SOC 15-1252, L4)

name	convert-to-marimo
description	This skill should be used when the user asks to "convert a notebook to marimo", "migrate a Jupyter notebook to marimo", "rewrite a notebook in marimo", or wants to modernize an existing .ipynb file into a high-quality marimo .py notebook.
version	0.1.0

Convert Jupyter Notebook to Marimo

Convert an existing .ipynb Jupyter notebook into a high-quality marimo .py notebook, updating dependencies, adopting marimo affordances, improving code quality, and revising prose to meet the Pinecone examples writing guidelines.

Writing guidelines reference: See .ai/writing-guidelines.md for voice, tone, and style.

Phase 1: Initial Conversion

Convert with marimo

uv run marimo convert path/to/notebook.ipynb -o docs/notebook-name.py

Start in sandbox mode for development

uvx marimo edit --sandbox docs/notebook-name.py --no-token

Sandbox mode creates an isolated environment from the notebook's # /// script inline metadata — none of the project's root dependencies bleed in. Always develop in sandbox mode so the dependency list stays honest.

Explore the code_mode API at the start of each session

import marimo._code_mode as cm
help(cm)

The API can change between marimo versions; verify it before using it.

Phase 2: Dependencies

Update the `# /// script` metadata block

The converted file will have a metadata block at the top. This is the source of truth for the notebook's dependencies when running in sandbox mode.

# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "marimo>=0.23.6",
#     "pinecone==9.0.1",
#     "datasets==3.5.1",
# ]
# ///

Rules:

Pin every dependency except marimo to a specific version with ==
Include marimo with a lower bound, marimo>=0.23.6 (or the current version)
Pin the Pinecone SDK to 9.0.1 (or latest)
Keep this block in the notebook file — never add notebook deps to the root pyproject.toml
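The pinning rules above can be checked mechanically. The following is a minimal sketch, not part of the skill itself: it reads the `# /// script` block out of a notebook's source text and lists any requirement that is not pinned with `==`. The helper names `declared_dependencies` and `unpinned` are hypothetical, and the parsing is deliberately simplified (it assumes no `]` appears inside a requirement string).

```python
import re

# Example notebook source; the metadata block is copied from the text above.
NOTEBOOK_SOURCE = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "marimo>=0.23.6",
#     "pinecone==9.0.1",
#     "datasets==3.5.1",
# ]
# ///

import marimo
'''

# Matches one "# /// script" ... "# ///" block made of comment lines.
SCRIPT_BLOCK = re.compile(
    r"^# /// script\n(?P<body>(?:#(?: .*)?\n)+?)# ///$",
    re.MULTILINE,
)


def declared_dependencies(source: str) -> list[str]:
    """Return the requirement strings listed in the inline metadata block."""
    block = SCRIPT_BLOCK.search(source)
    if block is None:
        return []
    # Drop the leading "# " from every line to recover the TOML text.
    toml_text = "\n".join(line[2:] for line in block.group("body").splitlines())
    # Simplification: assumes no "]" inside a requirement (for example, no extras).
    array = re.search(r"dependencies\s*=\s*\[(.*?)\]", toml_text, re.DOTALL)
    return re.findall(r'"([^"]+)"', array.group(1)) if array else []


def unpinned(requirements: list[str], allow_ranges: tuple[str, ...] = ("marimo",)) -> list[str]:
    """Return requirements not pinned with ==, skipping packages allowed a range."""
    flagged = []
    for requirement in requirements:
        name = re.split(r"[<>=!~ ;\[]", requirement, maxsplit=1)[0]
        if name not in allow_ranges and "==" not in requirement:
            flagged.append(requirement)
    return flagged


print(unpinned(declared_dependencies(NOTEBOOK_SOURCE)))
```

An empty list means every dependency other than marimo is pinned. A full TOML parser would be more robust than the regular expressions used here.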

Remove unused dependencies

After conversion, audit the declared deps against what's actually imported. Common removals:

tqdm — replaced by mo.status.progress_bar()
numpy — often imported by Jupyter cells that don't need it directly
pinecone-notebooks — Colab-only authentication widget, not needed in marimo
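One way to run this audit is to compare the declared package names against the modules the file actually imports. The sketch below uses only the standard library; the function names are hypothetical, and it assumes a package's import name equals its distribution name with `-` replaced by `_`, which does not hold for every package.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect top-level module names imported anywhere in the source."""
    modules: set[str] = set()
    # ast.walk also visits imports nested inside functions.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def unused_dependencies(declared: list[str], source: str) -> list[str]:
    """Return declared package names with no matching import in the source."""
    used = imported_modules(source)
    # Assumption: import name == distribution name with "-" replaced by "_".
    return [name for name in declared if name.replace("-", "_") not in used]


SAMPLE_SOURCE = "import marimo as mo\nfrom pinecone import Pinecone\nimport os.path\n"
print(unused_dependencies(["marimo", "pinecone", "tqdm", "pinecone-notebooks"], SAMPLE_SOURCE))
```

Treat the result as a list of candidates to review by hand, since a dependency can be required at runtime without being imported directly.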

Watch for library compatibility breaks

Check whether newer versions of dependencies break with the data sources used. A known example: datasets>=4.0 dropped support for custom loading scripts (e.g. Helsinki-NLP/tatoeba). Pin to the last working version and note why in a comment.

Phase 3: Remove Jupyter/Colab Artifacts

Delete or replace:

Colab/nbviewer badges — strip from the header markdown cell
!pip install cells — dependencies are declared in # /// script, not installed at runtime
Colab authentication cells — pinecone_notebooks.colab.Authenticate() and similar widgets
"Note: pip install is formatted for Jupyter" markdown — not relevant in marimo
## Installation section headings — no installation step needed
trust_remote_code=True notes — keep the argument but remove surrounding Jupyter-specific explanation
References to "this notebook", "run this cell", "Jupyter" — rewrite as plain prose
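Before converting, it can help to list which cells of the original .ipynb contain these artifacts. The following is a rough sketch that reads the notebook's JSON and flags cells by substring match. The marker list and the sample notebook (including its badge text) are illustrative assumptions, not an exhaustive rule set.

```python
import json

# Substrings that suggest a Jupyter/Colab artifact (illustrative, not exhaustive).
ARTIFACT_MARKERS = ("!pip install", "%pip install", "colab", "nbviewer")


def find_artifact_cells(notebook_json: str) -> list[tuple[int, str]]:
    """Return (cell position, first matching marker) for each flagged cell."""
    notebook = json.loads(notebook_json)
    flagged = []
    for position, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # In .ipynb files, "source" is either a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        lowered = text.lower()
        for marker in ARTIFACT_MARKERS:
            if marker in lowered:
                flagged.append((position, marker))
                break
    return flagged


# Hypothetical three-cell notebook used only to exercise the function.
SAMPLE_NOTEBOOK = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Semantic Search\n", "[![Open In Colab](badge.svg)](link)"]},
        {"cell_type": "code", "source": ["!pip install -qU pinecone datasets"]},
        {"cell_type": "code", "source": ["print('hello')"]},
    ]
})
print(find_artifact_cells(SAMPLE_NOTEBOOK))
```

Substring matching will also flag legitimate mentions, so each hit still needs a human decision about whether to delete or rewrite the cell.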

Phase 4: Update the Pinecone SDK to 9.0.1

Replace deprecated method calls with the pc.indexes.* namespace:

Old	New
`pc.has_index(name=x)`	`pc.indexes.exists(name=x)`
`pc.create_index(name=x, ...)`	`pc.indexes.create(name=x, ...)`
`pc.describe_index(name=x)`	`pc.indexes.describe(name=x)`
`pc.delete_index(name=x)`	`pc.indexes.delete(name=x)`
`pc.Index(host=desc.host)`	`pc.index(name=x)`
`index.search(namespace=ns, query={"top_k": k, "inputs": {...}})`	`index.search(namespace=ns, top_k=k, inputs={...})`
`results["result"]["hits"]` / `result["_score"]`	`results.result.hits` / `hit.score`

Always use keyword argument names in all Pinecone API calls — positional args are harder to read and more fragile across SDK versions.
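To find the call sites that need updating, a plain text scan is often enough. The sketch below looks for the old call prefixes from the table and reports the suggested replacement for each. It assumes the client variable is named `pc`, as in the table, and it only reports matches; it does not rewrite code, because some replacements (such as `pc.Index(host=...)`) also change their arguments.

```python
# Old call prefix -> new call prefix, taken from the table above.
# Assumption: the client variable is named "pc".
RENAMED_CALLS = {
    "pc.has_index(": "pc.indexes.exists(",
    "pc.create_index(": "pc.indexes.create(",
    "pc.describe_index(": "pc.indexes.describe(",
    "pc.delete_index(": "pc.indexes.delete(",
    "pc.Index(": "pc.index(",  # arguments change too: host=... becomes name=...
}


def find_old_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, old prefix, suggested prefix) for each old call found."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for old, new in RENAMED_CALLS.items():
            if old in line:
                findings.append((line_number, old, new))
    return findings


SAMPLE_SOURCE = (
    "if not pc.has_index(name=index_name):\n"
    "    pc.create_index(name=index_name)\n"
    "index = pc.Index(host=desc.host)\n"
)
for finding in find_old_calls(SAMPLE_SOURCE):
    print(finding)
```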

Phase 5: Adopt Marimo Affordances

Replace `print()` output with tables

# Before
for result in results:
    print(f"{result['text']} (score: {result['score']})")

# After
mo.ui.table([{"text": r["text"], "score": r["score"]} for r in results])

Use mo.vstack() to combine a heading with a table:

mo.vstack([
    mo.md(f"**Query:** {query}"),
    mo.ui.table(data, show_column_summaries=False),
])

Replace `tqdm` with `mo.status.progress_bar()`

# Before
for batch in tqdm(batches):
    index.upsert(batch)

# After
for batch in mo.status.progress_bar(batches, title="Upserting", show_rate=True, show_eta=True):
    index.upsert(batch)

When passing a range, omit total — marimo infers it from len(range(...)).

Wrap destructive operations in `mo.ui.run_button()`

Marimo's reactive model means all cells run automatically. A cleanup cell that deletes an index will fire immediately — gate it with a button:

# Cell 1 — display the button
delete_button = mo.ui.run_button(label="Delete index")
delete_button

# Cell 2 — action (separate cell — can't read .value in the same cell that creates it)
mo.stop(not delete_button.value)
pc.indexes.delete(name=index_name)

Use `mo.callout()` for status messages

mo.callout(mo.md("API key loaded from environment."), kind="success")
mo.callout(mo.md("Enter your API key to continue."), kind="info")
mo.callout(mo.md("**Error:** index not found."), kind="danger")

Kinds: neutral, info, warn, success, danger.

Handle API key input

Users running locally can set PINECONE_API_KEY in their environment or a .env file (marimo reads .env on startup). Users in molab need a password input:

# Cell 0: imports
import os

import marimo as mo
from pinecone import Pinecone

# Cell 1: input (hide_code=True)
env_key = os.environ.get("PINECONE_API_KEY", "")
api_key_input = mo.ui.text(
    kind="password",
    placeholder="pcsk_...",
    label="Pinecone API Key",
    value=env_key,
    full_width=True,
)
(
    mo.callout(mo.md("API key loaded from environment."), kind="success")
    if env_key
    else mo.vstack([
        mo.callout(mo.md("Enter your Pinecone API key. Get a free key at [app.pinecone.io](https://app.pinecone.io)."), kind="info"),
        api_key_input,
    ])
)

# Cell 2: validate the key and halt downstream cells if it is missing (hide_code=True)
api_key = api_key_input.value
mo.stop(
    not api_key,
    mo.callout(mo.md("**API key required.** Enter your key above to continue."), kind="danger"),
)

# Cell 3: instantiate the client (visible)
pc = Pinecone(api_key=api_key, source_tag="pinecone_examples:...")

Display data with `mo.ui.table()`

HuggingFace datasets and lists of dicts both work directly:

mo.ui.table(dataset, page_size=10)
mo.ui.table(records, page_size=10)

Add interactive inputs for exploration

At the end of the notebook, add a "Try It Yourself" section:

query_input = mo.ui.text(value="default query", full_width=True)
lang_select = mo.ui.radio(
    options={"All": None, "English": "en", "Spanish": "es"},
    value="All",
)
mo.vstack([query_input, lang_select])

Then in the next cell:

search(query_input.value, lang=lang_select.value)

Results update when the user changes either input.

Phase 6: Code Quality

Name things to document intent

Well-named functions and variables replace comments. If you find yourself writing a comment to explain what a block of code does, that is a signal to extract it into a named function instead.

Before:

# Filter sentences containing our keyword and build records for Pinecone
results = []
for i, row in enumerate(dataset.filter(lambda x: any(k in x["text"] for k in keywords))):
    results.append({"id": str(i), "chunk_text": row["text"], "lang": row["lang"]})

After:

def filter_by_keywords(dataset, keywords):
    return dataset.filter(lambda x: any(k in x["text"] for k in keywords))

def to_records(sentences, id_prefix=""):
    return [
        {"id": f"{id_prefix}{i}", "chunk_text": s["text"], "lang": s["lang"]}
        for i, s in enumerate(sentences)
    ]

filtered = filter_by_keywords(dataset, keywords)
records = to_records(filtered)

The second version reads like a description of what is happening. The function names are the documentation.

Decompose monolithic functions by stage

Jupyter notebooks often have one large function that loads, filters, transforms, and formats data all at once. Split it along its natural stages — each stage becomes a function with a clear name and a clear input/output contract.

Identify stages by asking: at what points does the data change shape or purpose?

Example decomposition:

prepare_sentences(dataset, keywords)  →  one big function doing everything

becomes:

filter_pairs(dataset, keywords)       →  returns filtered HF dataset (pairs)
extract_sentences(pairs, lang)        →  returns single-language HF dataset
to_records(sentences, column)         →  returns list of Pinecone record dicts

Each stage can be shown, inspected, and explained independently. Each can be reused or replaced without touching the others.
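As an illustration of the three stages, here is a runnable sketch that uses plain lists of dicts in place of a HuggingFace dataset. The function names come from the decomposition above, but the bodies, the sample data, the `sentence` column layout, and the `id` format are assumptions made for the example.

```python
def filter_pairs(pairs, keywords):
    """Stage 1: keep translation pairs whose English side mentions a keyword."""
    return [p for p in pairs if any(k in p["translation"]["en"] for k in keywords)]


def extract_sentences(pairs, lang):
    """Stage 2: reduce each pair to a single-language row."""
    return [{"sentence": p["translation"][lang], "lang": lang} for p in pairs]


def to_records(sentences, column):
    """Stage 3: reshape rows into record dicts (id format is illustrative)."""
    return [
        {"id": f"{s['lang']}-{i}", "chunk_text": s[column], "lang": s["lang"]}
        for i, s in enumerate(sentences)
    ]


# Hypothetical sample data standing in for the real dataset.
PAIRS = [
    {"translation": {"en": "Let's walk in the park.", "es": "Caminemos por el parque."}},
    {"translation": {"en": "I need to park the car.", "es": "Necesito estacionar el coche."}},
    {"translation": {"en": "It is raining.", "es": "Está lloviendo."}},
]

filtered_pairs = filter_pairs(PAIRS, keywords=["park"])
english = extract_sentences(filtered_pairs, lang="en")
records = to_records(english, column="sentence")
print(records)
```

Because each function takes its input explicitly and returns a new value, any stage can be displayed or swapped out without rerunning the others.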

Split large cells to make intermediate results visible

Marimo cells produce output. A single cell that does five things produces one output — or none. Splitting at stage boundaries lets each step show its result, which helps readers understand what changed and why.

Rule of thumb: if a cell produces a value worth seeing (a filtered dataset, a record list, a search result), that value should be the last expression in its own cell.

# Too much in one cell — intermediate state invisible
filtered = filter_pairs(tatoeba, keywords)
english = extract_sentences(filtered, lang="en")
records = to_records(english, column="sentence")
index.upsert_records(records=records, namespace=namespace)

# Split: each step's output is inspectable
# Cell 1
filtered_pairs = filter_pairs(tatoeba, keywords=keywords)

# Cell 2 — reader can see what was extracted
english = extract_sentences(filtered_pairs, lang="en")
mo.ui.table(english, page_size=5)

# Cell 3 — reader can see the record format before upserting
records = to_records(english, column="sentence")
mo.ui.table(records, page_size=5)

# Cell 4 — upsert is its own step
for start in mo.status.progress_bar(range(0, len(records), batch_size)):
    index.upsert_records(records=records[start:start + batch_size], namespace=namespace)
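The batching in Cell 4 relies on `range(0, len(records), batch_size)` producing the start offset of each slice. The small sketch below isolates that slicing so it can be checked without an index. `batch_slices` is a hypothetical helper name, not part of marimo or the Pinecone SDK.

```python
def batch_slices(records, batch_size):
    """Split records into consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        records[start:start + batch_size]
        for start in range(0, len(records), batch_size)
    ]


# Seven records in batches of three: the final batch is shorter.
print(batch_slices(list(range(7)), batch_size=3))
```

The last slice may be shorter than `batch_size`, and an empty record list yields no batches, so the upsert loop needs no special cases.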

Extract reusable helpers into their own cells

If a function is called more than once, or could reasonably be called with different arguments, give it its own cell. Readers can read the definition once, then see it used cleanly at each call site.

The search notebook pattern is a good model:

# One cell defines the helper
def search(query, top_k=10, lang=None):
    results = index.search(
        namespace=namespace,
        top_k=top_k,
        inputs={"text": query},
        filter={"lang": {"$eq": lang}} if lang else None,
    )
    return print_results(query, results)

# Subsequent cells are just clean call sites
search("I want to go to the park and relax")
search("Quiero ir al parque a relajarme")
search("The park is crowded today", lang="en")

Parameterize functions — avoid globals

Converted notebooks often have functions that silently close over global variables (keywords, index, namespace). This makes the function hard to reuse and hides dependencies.

Before (globals):

keywords = ["park"]

def prepare_sentences(dataset):
    return dataset.filter(lambda x: any(k in x["translation"]["en"] for k in keywords))

After (explicit parameter):

def prepare_sentences(dataset, keywords=None):
    if keywords:
        return dataset.filter(lambda x: any(k in x["translation"]["en"] for k in keywords))
    return dataset

The exception: functions that close over index and namespace in a "search" helper are reasonable — they're scoped to the notebook, and the closure reads naturally.

Remove over-explaining comments

Only comment on the non-obvious WHY — not on what the code does. Delete comments like:

# Initialize client
# convert to record format
# flatten and shuffle for ease of use
# Here, we create a record for each sentence in the dataset

Keep comments that explain constraints, workarounds, or non-obvious choices — especially when a behaviour might surprise a reader (e.g. why a version is pinned, why a parameter is omitted).

Avoid multiply-defined variables across cells

Marimo's static analysis flags top-level variables defined in more than one cell. When two cells assign the same top-level variable name, either:

Use different names
Inline the computation (no assignment)
Consolidate both cells into one
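To spot these conflicts before opening the notebook, the cell sources can be scanned for names assigned at the top level of more than one cell. This is a rough approximation written with the standard library, not marimo's own analysis: it considers only simple assignments, function and class definitions, and imports, and it skips names starting with an underscore, which marimo treats as local to their cell.

```python
import ast
from collections import defaultdict


def top_level_names(cell_source: str) -> set[str]:
    """Names a cell defines at its top level (simplified)."""
    names: set[str] = set()
    for node in ast.parse(cell_source).body:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            names.update((a.asname or a.name).split(".")[0] for a in node.names)
    # Underscore-prefixed names are cell-local, so they never conflict.
    return {name for name in names if not name.startswith("_")}


def multiply_defined(cells: list[str]) -> dict[str, list[int]]:
    """Map each conflicting name to the positions of the cells that define it."""
    defined_in = defaultdict(list)
    for position, source in enumerate(cells):
        for name in top_level_names(source):
            defined_in[name].append(position)
    return {name: cells_ for name, cells_ in defined_in.items() if len(cells_) > 1}


# Hypothetical cell sources: "results" is assigned in two cells.
CELLS = [
    "results = search('park')\nresults",
    "results = search('parque')",
    "_tmp = 1",
    "_tmp = 2",
]
print(multiply_defined(CELLS))
```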

Watch for marimo cell configuration issues

Cells created with code_mode default to hide_code=True. Always explicitly set hide_code=False for code cells that should be visible. Verify with:

for cell in ctx.cells:
    kind = "md  " if cell.config.hide_code else "code"
    print(f"[{cell.id}] {kind}: {cell.code[:60]!r}")

Phase 7: Prose and Structure

Follow .ai/writing-guidelines.md. Key points for marimo conversion:

Voice and tone

Use "we" throughout (collaborative tutorial voice)
Factual and collegial — no "super helpful!", "Neat!", "magic", "Congrats"
No superlatives, no marketing language
No time references ("recently added", "new feature")

Structure

Intersperse explanations between code cells — don't dump all prose at the top
Put "why" before the code it motivates (e.g. explain why a keyword is ambiguous just before the filter that uses it)
After showing data, explain what you see before proceeding
Use ### subheadings within sections for skimmability

Merge adjacent text cells

When two or more markdown cells appear next to each other with no code between them, consolidate them into one unless they serve structurally distinct purposes. A section heading followed by its body text does not count as distinct, so those two can be merged.

Remove Jupyter-specific prose

"Run the cell below" → remove or rewrite
"This notebook will..." → "This example demonstrates..."
References to Colab, Google Colab, nbviewer → remove entirely
"In this notebook" → rewrite without the word "notebook"

Section heading guidelines

Headings should be short noun phrases, not full sentences
"Meaning Over Keywords" not "Semantic Search considers the meaning of the query"
"How It Works" not "Wait, how is this working?"
"Cleanup" not "Demo Cleanup"

Phase 8: Final Checks

Run ruff

uv run ruff check docs/notebook-name.py
uv run ruff format docs/notebook-name.py

The CI pipeline runs ruff check and ruff format --check on changed .py files. Fix all issues before committing.

Verify sandbox runs

uvx marimo edit --sandbox docs/notebook-name.py --no-token

Run through the notebook end-to-end to confirm all cells execute correctly in the isolated environment.

Verify no root pyproject.toml changes

Notebook dependencies belong in the # /// script block only. If marimo's package manager added anything to pyproject.toml during development, revert those changes and restore uv.lock from main:

git checkout origin/main -- uv.lock

Common Pitfalls

Problem	Fix
`mo.ui.run_button().value` read in same cell	Split button creation and value access into separate cells
Multiply-defined variable names across cells	Inline the call or use distinct names
Cells created with `code_mode` are hidden	Explicitly set `hide_code=False`
marimo package manager edits `pyproject.toml`	Revert — deps belong in `# /// script` only
`datasets>=4` breaks dataset loading scripts	Pin to last working version (e.g. `datasets==3.5.1`)
Old SDK calls (`pc.has_index`, `pc.Index(host=...)`)	Replace with `pc.indexes.*` namespace
`tqdm` still imported but unused	Remove it — use `mo.status.progress_bar()`
`source_tag` in `pc = Pinecone(...)`	Keep it, but note in prose it's for internal Pinecone analytics — users should not include it in their own apps
Index deletion cell auto-fires on notebook load	Wrap in `mo.ui.run_button()`

