| name | convert-to-marimo |
| description | This skill should be used when the user asks to "convert a notebook to marimo", "migrate a Jupyter notebook to marimo", "rewrite a notebook in marimo", or wants to modernize an existing .ipynb file into a high-quality marimo .py notebook. |
| version | 0.1.0 |
Convert Jupyter Notebook to Marimo
Convert an existing .ipynb Jupyter notebook into a high-quality marimo .py notebook, updating dependencies, adopting marimo affordances, improving code quality, and revising prose to meet the Pinecone examples writing guidelines.
Writing guidelines reference: See .ai/writing-guidelines.md for voice, tone, and style.
Phase 1: Initial Conversion
Convert with marimo
uv run marimo convert path/to/notebook.ipynb -o docs/notebook-name.py
Start in sandbox mode for development
uvx marimo edit --sandbox docs/notebook-name.py --no-token
Sandbox mode creates an isolated environment from the notebook's # /// script inline metadata — none of the project's root dependencies bleed in. Always develop in sandbox mode so the dependency list stays honest.
Explore the code_mode API at the start of each session
import marimo._code_mode as cm
help(cm)
The API can change between marimo versions; verify it before using it.
Phase 2: Dependencies
Update the # /// script metadata block
The converted file will have a metadata block at the top. This is the source of truth for the notebook's dependencies when running in sandbox mode.
Rules:
- Pin every dependency to a specific version with
==
- Include
marimo>=0.23.6 (or current version)
- Pin the Pinecone SDK to
9.0.1 (or latest)
- Keep this block in the notebook file — never add notebook deps to the root
pyproject.toml
Remove unused dependencies
After conversion, audit the declared deps against what's actually imported. Common removals:
tqdm — replaced by mo.status.progress_bar()
numpy — often imported by Jupyter cells that don't need it directly
pinecone-notebooks — Colab-only authentication widget, not needed in marimo
Watch for library compatibility breaks
Check whether newer versions of dependencies break with the data sources used. A known example: datasets>=4.0 dropped support for custom loading scripts (e.g. Helsinki-NLP/tatoeba). Pin to the last working version and note why in a comment.
Phase 3: Remove Jupyter/Colab Artifacts
Delete or replace:
- Colab/nbviewer badges — strip from the header markdown cell
!pip install cells — dependencies are declared in # /// script, not installed at runtime
- Colab authentication cells —
pinecone_notebooks.colab.Authenticate() and similar widgets
- "Note: pip install is formatted for Jupyter" markdown — not relevant in marimo
## Installation section headings — no installation step needed
trust_remote_code=True notes — keep the argument but remove surrounding Jupyter-specific explanation
- References to "this notebook", "run this cell", "Jupyter" — rewrite as plain prose
Phase 4: Update the Pinecone SDK to 9.0.1
Replace deprecated method calls with the pc.indexes.* namespace:
| Old | New |
|---|
pc.has_index(name=x) | pc.indexes.exists(name=x) |
pc.create_index(name=x, ...) | pc.indexes.create(name=x, ...) |
pc.describe_index(name=x) | pc.indexes.describe(name=x) |
pc.delete_index(name=x) | pc.indexes.delete(name=x) |
pc.Index(host=desc.host) | pc.index(name=x) |
index.search(namespace=ns, query={"top_k": k, "inputs": {...}}) | index.search(namespace=ns, top_k=k, inputs={...}) |
results["result"]["hits"] / result["_score"] | results.result.hits / hit.score |
Always use keyword argument names in all Pinecone API calls — positional args are harder to read and more fragile across SDK versions.
Phase 5: Adopt Marimo Affordances
Replace print() output with tables
for result in results:
print(f"{result['text']} (score: {result['score']})")
mo.ui.table([{"text": r["text"], "score": r["score"]} for r in results])
Use mo.vstack() to combine a heading with a table:
mo.vstack([
mo.md(f"**Query:** {query}"),
mo.ui.table(data, show_column_summaries=False),
])
Replace tqdm with mo.status.progress_bar()
for batch in tqdm(batches):
index.upsert(batch)
for batch in mo.status.progress_bar(batches, title="Upserting", show_rate=True, show_eta=True):
index.upsert(batch)
When passing a range, omit total — marimo infers it from len(range(...)).
Wrap destructive operations in mo.ui.run_button()
Marimo's reactive model means all cells run automatically. A cleanup cell that deletes an index will fire immediately — gate it with a button:
delete_button = mo.ui.run_button(label="Delete index")
delete_button
mo.stop(not delete_button.value)
pc.indexes.delete(name=index_name)
Use mo.callout() for status messages
mo.callout(mo.md("API key loaded from environment."), kind="success")
mo.callout(mo.md("Enter your API key to continue."), kind="info")
mo.callout(mo.md("**Error:** index not found."), kind="danger")
Kinds: neutral, info, warn, success, danger.
Handle API key input
Users running locally can set PINECONE_API_KEY in their environment or a .env file (marimo reads .env on startup). Users in molab need a password input:
env_key = os.environ.get("PINECONE_API_KEY", "")
api_key_input = mo.ui.text(
kind="password",
placeholder="pcsk_...",
label="Pinecone API Key",
value=env_key,
full_width=True,
)
(
mo.callout(mo.md("API key loaded from environment."), kind="success")
if env_key
else mo.vstack([
mo.callout(mo.md("Enter your Pinecone API key. Get a free key at [app.pinecone.io](https://app.pinecone.io)."), kind="info"),
api_key_input,
])
)
api_key = api_key_input.value
mo.stop(
not api_key,
mo.callout(mo.md("**API key required.** Enter your key above to continue."), kind="danger"),
)
pc = Pinecone(api_key=api_key, source_tag="pinecone_examples:...")
Display data with mo.ui.table()
HuggingFace datasets and lists of dicts both work directly:
mo.ui.table(dataset, page_size=10)
mo.ui.table(records, page_size=10)
Add interactive inputs for exploration
At the end of the notebook, add a "Try It Yourself" section:
query_input = mo.ui.text(value="default query", full_width=True)
lang_select = mo.ui.radio(
options={"All": None, "English": "en", "Spanish": "es"},
value="All",
)
mo.vstack([query_input, lang_select])
Then in the next cell:
search(query_input.value, lang=lang_select.value)
Results update when the user changes either input.
Phase 6: Code Quality
Name things to document intent
Well-named functions and variables replace comments. If you find yourself writing a comment to explain what a block of code does, that is a signal to extract it into a named function instead.
Before:
results = []
for i, row in enumerate(dataset.filter(lambda x: any(k in x["text"] for k in keywords))):
results.append({"id": str(i), "chunk_text": row["text"], "lang": row["lang"]})
After:
def filter_by_keywords(dataset, keywords):
return dataset.filter(lambda x: any(k in x["text"] for k in keywords))
def to_records(sentences, id_prefix=""):
return [
{"id": f"{id_prefix}{i}", "chunk_text": s["text"], "lang": s["lang"]}
for i, s in enumerate(sentences)
]
filtered = filter_by_keywords(dataset, keywords)
records = to_records(filtered)
The second version reads like a description of what is happening. The function names are the documentation.
Decompose monolithic functions by stage
Jupyter notebooks often have one large function that loads, filters, transforms, and formats data all at once. Split it along its natural stages — each stage becomes a function with a clear name and a clear input/output contract.
Identify stages by asking: at what points does the data change shape or purpose?
Example decomposition:
prepare_sentences(dataset, keywords) → one big function doing everything
becomes:
filter_pairs(dataset, keywords) → returns filtered HF dataset (pairs)
extract_sentences(pairs, lang) → returns single-language HF dataset
to_records(sentences, column) → returns list of Pinecone record dicts
Each stage can be shown, inspected, and explained independently. Each can be reused or replaced without touching the others.
Split large cells to make intermediate results visible
Marimo cells produce output. A single cell that does five things produces one output — or none. Splitting at stage boundaries lets each step show its result, which helps readers understand what changed and why.
Rule of thumb: if a cell produces a value worth seeing (a filtered dataset, a record list, a search result), that value should be the last expression in its own cell.
filtered = filter_pairs(tatoeba, keywords)
english = extract_sentences(filtered, lang="en")
records = to_records(english, column="sentence")
index.upsert_records(records=records, namespace=namespace)
filtered_pairs = filter_pairs(tatoeba, keywords=keywords)
english = extract_sentences(filtered_pairs, lang="en")
mo.ui.table(english, page_size=5)
records = to_records(english, column="sentence")
mo.ui.table(records, page_size=5)
for start in mo.status.progress_bar(range(0, len(records), batch_size)):
index.upsert_records(records=records[start:start + batch_size], namespace=namespace)
Extract reusable helpers into their own cells
If a function is called more than once, or could reasonably be called with different arguments, give it its own cell. Readers can read the definition once, then see it used cleanly at each call site.
The search notebook pattern is a good model:
def search(query, top_k=10, lang=None):
results = index.search(
namespace=namespace,
top_k=top_k,
inputs={"text": query},
filter={"lang": {"$eq": lang}} if lang else None,
)
return print_results(query, results)
search("I want to go to the park and relax")
search("Quiero ir al parque a relajarme")
search("The park is crowded today", lang="en")
Parameterize functions — avoid globals
Converted notebooks often have functions that silently close over global variables (keywords, index, namespace). This makes the function hard to reuse and hides dependencies.
Before (globals):
keywords = ["park"]
def prepare_sentences(dataset):
return dataset.filter(lambda x: any(k in x["translation"]["en"] for k in keywords))
After (explicit parameter):
def prepare_sentences(dataset, keywords=None):
if keywords:
return dataset.filter(lambda x: any(k in x["translation"]["en"] for k in keywords))
return dataset
The exception: functions that close over index and namespace in a "search" helper are reasonable — they're scoped to the notebook, and the closure reads naturally.
Remove over-explaining comments
Only comment on the non-obvious WHY — not on what the code does. Delete comments like:
# Initialize client
# convert to record format
# flatten and shuffle for ease of use
# Here, we create a record for each sentence in the dataset
Keep comments that explain constraints, workarounds, or non-obvious choices — especially when a behaviour might surprise a reader (e.g. why a version is pinned, why a parameter is omitted).
Avoid multiply-defined variables across cells
Marimo's static analysis flags top-level variables defined in more than one cell. When two cells have the same local variable names, either:
- Use different names
- Inline the computation (no assignment)
- Consolidate both cells into one
Watch for marimo cell configuration issues
Cells created with code_mode default to hide_code=True. Always explicitly set hide_code=False for code cells that should be visible. Verify with:
for cell in ctx.cells:
kind = "md " if cell.config.hide_code else "code"
print(f"[{cell.id}] {kind}: {cell.code[:60]!r}")
Phase 7: Prose and Structure
Follow .ai/writing-guidelines.md. Key points for marimo conversion:
Voice and tone
- Use "we" throughout (collaborative tutorial voice)
- Factual and collegial — no "super helpful!", "Neat!", "magic", "Congrats"
- No superlatives, no marketing language
- No time references ("recently added", "new feature")
Structure
- Intersperse explanations between code cells — don't dump all prose at the top
- Put "why" before the code it motivates (e.g. explain why a keyword is ambiguous just before the filter that uses it)
- After showing data, explain what you see before proceeding
- Use
### subheadings within sections for skimmability
Merge adjacent text cells
When two or more markdown cells appear next to each other with no code between them, consolidate them into one unless they serve structurally distinct purposes (e.g. a section heading followed by body text can be merged).
Remove Jupyter-specific prose
- "Run the cell below" → remove or rewrite
- "This notebook will..." → "This example demonstrates..."
- References to Colab, Google Colab, nbviewer → remove entirely
- "In this notebook" → rewrite without the word "notebook"
Section heading guidelines
- Headings should be short noun phrases, not full sentences
- "Meaning Over Keywords" not "Semantic Search considers the meaning of the query"
- "How It Works" not "Wait, how is this working?"
- "Cleanup" not "Demo Cleanup"
Phase 8: Final Checks
Run ruff
uv run ruff check docs/notebook-name.py
uv run ruff format docs/notebook-name.py
The CI pipeline runs ruff check and ruff format --check on changed .py files. Fix all issues before committing.
Verify sandbox runs
uvx marimo edit --sandbox docs/notebook-name.py --no-token
Run through the notebook end-to-end to confirm all cells execute correctly in the isolated environment.
Verify no root pyproject.toml changes
Notebook dependencies belong in the # /// script block only. If marimo's package manager added anything to pyproject.toml during development, revert those changes and restore uv.lock from main:
git checkout origin/main -- uv.lock
Common Pitfalls
| Problem | Fix |
|---|
mo.ui.run_button().value read in same cell | Split button creation and value access into separate cells |
| Multiply-defined variable names across cells | Inline the call or use distinct names |
Cells created with code_mode are hidden | Explicitly set hide_code=False |
marimo package manager edits pyproject.toml | Revert — deps belong in # /// script only |
datasets>=4 breaks dataset loading scripts | Pin to last working version (e.g. datasets==3.5.1) |
Old SDK calls (pc.has_index, pc.Index(host=...)) | Replace with pc.indexes.* namespace |
tqdm still imported but unused | Remove it — use mo.status.progress_bar() |
source_tag in pc = Pinecone(...) | Keep it, but note in prose it's for internal Pinecone analytics — users should not include it in their own apps |
| Index deletion cell auto-fires on notebook load | Wrap in mo.ui.run_button() |