Run any Skill in Manus with one click

xlsx-parsing

Read Microsoft Excel (.xlsx) files robustly with `openpyxl` (or `pandas`). Covers multi-sheet workbooks, header rows, empty cells, merged cells, comma-separated list cells, and converting a sheet to a list-of-dicts the rest of your code can consume. Use when a task input or reference document is an `.xlsx` file rather than JSON/CSV.

Run Skill in Manus

Stars1,367

Forks317

UpdatedJune 11, 2026 at 19:23

Source

benchflow-ai

benchflow-ai/skillsbench

View GitHub Repository View Creator Repositories

Install command

Download

Run Skill in Manus

Useful forSOC

Software DevelopersComputer and Mathematical Occupations15-1252L4

SKILL.md

readonly

name	xlsx-parsing
description	Read Microsoft Excel (.xlsx) files robustly with `openpyxl` (or `pandas`). Covers multi-sheet workbooks, header rows, empty cells, merged cells, comma-separated list cells, and converting a sheet to a list-of-dicts the rest of your code can consume. Use when a task input or reference document is an `.xlsx` file rather than JSON/CSV.

xlsx-parsing

Excel workbooks are the lingua franca of operational documents that nobody bothered to put in a database — playbooks, rate cards, deviation policies, finance models, SLAs. They show up in tasks with three properties that trip up naive readers:

Multiple sheets, only one of which is the data you actually want.
Sparse cells — a row that uses a column may sit next to a row that doesn't, leaving None cells. Empty is meaningful (the rule does not apply), not an error.
Composite cells — a single cell that contains a comma-separated list, a JSON blob, or a sentence rather than an atomic value.

Treat the workbook as a typed table with declared columns, not a free-form spreadsheet. Read every sheet you need, normalise it to list[dict[str, Any]], then operate on that.

Reading with `openpyxl` (pure Python, no compiled dependencies)

import openpyxl

wb = openpyxl.load_workbook("workbook.xlsx", data_only=True, read_only=True)
print(wb.sheetnames)            # e.g., ['Metadata', 'Definitions', 'Rules']

ws = wb["Rules"]
rows = ws.iter_rows(values_only=True)
header = [str(c).strip() if c else "" for c in next(rows)]
records = [dict(zip(header, row)) for row in rows if any(cell is not None for cell in row)]

Notes:

data_only=True returns the cached value of formula cells instead of the formula expression. Without this you may get strings like "=A1+B2".
read_only=True is faster on big workbooks and avoids loading styles you don't need.
The any(cell is not None ...) filter drops entirely-blank rows that Excel preserves at the bottom of a sheet.
dict(zip(header, row)) handles trailing empty columns gracefully when a row is shorter than the header.

Reading with `pandas` (if it's installed)

import pandas as pd

# Multi-sheet read returns a dict of DataFrames
sheets = pd.read_excel("workbook.xlsx", sheet_name=None, dtype=object)
rules_df = sheets["Rules"]

# Drop fully-empty rows; keep partial rows
rules_df = rules_df.dropna(how="all")

# Iterate as dicts; NaN becomes None
records = rules_df.where(rules_df.notna(), None).to_dict(orient="records")

pandas is heavier but useful when you want grouping, joins, or numeric aggregation. Either library is fine; do not mix them in the same module.

Empty cells: `None` is the answer, not an error

A row that does not specify a numeric cap leaves that cell blank. The blank is part of the rule's shape — it means "no cap applies" or "this constraint is not engaged for this rule." Code defensively:

def get(rec, key, default=None):
    val = rec.get(key)
    return default if val is None or (isinstance(val, str) and not val.strip()) else val

Compare against is None or call .strip() rather than truthiness — 0 and False are valid values that fail truthy tests.

Composite cells

Authors often put list-valued data into a single cell. A cell containing "Delaware, New York, California" is one string, not three rows.

def split_list_cell(value):
    if value is None:
        return []
    return [item.strip() for item in str(value).split(",") if item.strip()]

If you see a cell with curly-brace text, it is probably an embedded JSON document; parse with json.loads. Try the simple split first.

Merged cells

Merged cells appear once in the underlying data; only the top-left cell holds the value, and the rest are None. If a column is intentionally merged for a "section header" effect, fill the value down to recover row-wise records:

last = None
for row in records:
    if row["section"] is None:
        row["section"] = last
    else:
        last = row["section"]

If you need to know whether a cell is in a merged range, ws.merged_cells.ranges gives you the list.

Multiple sheets

Use the metadata sheet (often named Metadata, Info, or README) for workbook-level fields, and the data sheet(s) for per-record rows. Read all sheets you need before processing — do not assume the schema of one sheet is described inside another sheet you have not opened.

Putting it together for a configuration-style workbook

import openpyxl

def load_sheet_as_records(wb, sheet_name):
    ws = wb[sheet_name]
    rows = ws.iter_rows(values_only=True)
    header = [str(c).strip() if c else "" for c in next(rows)]
    return [
        dict(zip(header, row))
        for row in rows
        if any(cell is not None for cell in row)
    ]

wb = openpyxl.load_workbook("workbook.xlsx", data_only=True, read_only=True)
metadata = {row[0]: row[1] for row in wb["Metadata"].iter_rows(min_row=2, values_only=True)}
defs  = load_sheet_as_records(wb, "Definitions")
rules = load_sheet_as_records(wb, "Rules")

After this, rules[0]["key"], rules[0]["rule_type"], etc. are plain Python values you can branch on. The rest of your code does not need to know the input was Excel.

When not to use this skill

The file is .csv — use csv.DictReader directly.
The file is .json or .jsonl — use json.loads.
The file is .xls (legacy binary) — openpyxl will refuse; use xlrd<2 or convert to .xlsx first.

More from this repository

same repository

skillsbench

benchflow-ai/skillsbench

SkillsBench contribution workflow. Use when: (1) Creating benchmark tasks, (2) Understanding repo structure, (3) Preparing PRs for task submission.

2026-06-161.4k

task-creator

benchflow-ai/skillsbench

SkillsBench task authoring — walk a contributor from idea to submission-ready task following CONTRIBUTING.md and the task-implementation rubric. Use when the user wants to create a new SkillsBench task, scaffold a task from an existing workflow (notebook, Excel workbook, document, dataset), convert a prompt or a benchmark item into a SkillsBench task, write skills for a task, or prepare a SkillsBench PR. Pairs with `task-review` (run that as a self-check before submitting).

2026-06-161.4k

task-review

benchflow-ai/skillsbench

SkillsBench task PR review — classifies the task track (standard / research / multimodal), runs static policy checks against the track-specific rubric, benchmarks the task across oracle plus Claude and Codex (with and without skills), audits trajectories for cheating and skill invocation, and produces a `pr-N-task-timestamp-run.txt` review report alongside a `prN.zip` bundle of trajectories. Use when reviewing a SkillsBench task PR (by number, branch, or local task path), when the user asks to review a task, run benchmarks on a PR, audit a submission, classify a task as research or multimodal track, or prepare a comment to post on a SkillsBench PR.

2026-06-161.4k

contract-playbook-review

benchflow-ai/skillsbench

Methodology for clause-by-clause review of a contract against a structured deviation policy ("playbook"). Covers how to walk a playbook, locate the matching provision in the contract, apply rule types (max-value, must-be-present, must-be-absent, acceptable-set, must-have-feature), classify the result (ok / risk / reject), choose the prescribed action, and ground each finding in a verbatim excerpt. Use whenever reviewing any contract — NDA, MSA, vendor DD questionnaire, lease, DPA — against a structured rules-based playbook.

2026-06-111.4k

nda-clause-taxonomy

benchflow-ai/skillsbench

Reference for the standard clauses found in commercial non-disclosure agreements (mutual and one-way) — what each clause does, the surface forms it appears in, and how to recognise it in unfamiliar drafting. Use when reviewing, comparing, or extracting provisions from any confidentiality / NDA / mutual NDA / standstill-and-confidentiality agreement.

2026-06-111.4k

hierarchical-taxonomy-clustering

benchflow-ai/skillsbench

Build unified multi-level category taxonomy from hierarchical product category paths from any e-commerce companies using embedding-based recursive clustering with intelligent category naming via weighted word frequency analysis.

2026-06-081.4k

name	xlsx-parsing
description	Read Microsoft Excel (.xlsx) files robustly with `openpyxl` (or `pandas`). Covers multi-sheet workbooks, header rows, empty cells, merged cells, comma-separated list cells, and converting a sheet to a list-of-dicts the rest of your code can consume. Use when a task input or reference document is an `.xlsx` file rather than JSON/CSV.

xlsx-parsing

Multiple sheets, only one of which is the data you actually want.
Sparse cells — a row that uses a column may sit next to a row that doesn't, leaving None cells. Empty is meaningful (the rule does not apply), not an error.
Composite cells — a single cell that contains a comma-separated list, a JSON blob, or a sentence rather than an atomic value.

Treat the workbook as a typed table with declared columns, not a free-form spreadsheet. Read every sheet you need, normalise it to list[dict[str, Any]], then operate on that.

Reading with `openpyxl` (pure Python, no compiled dependencies)

import openpyxl

wb = openpyxl.load_workbook("workbook.xlsx", data_only=True, read_only=True)
print(wb.sheetnames)            # e.g., ['Metadata', 'Definitions', 'Rules']

ws = wb["Rules"]
rows = ws.iter_rows(values_only=True)
header = [str(c).strip() if c else "" for c in next(rows)]
records = [dict(zip(header, row)) for row in rows if any(cell is not None for cell in row)]

Notes:

data_only=True returns the cached value of formula cells instead of the formula expression. Without this you may get strings like "=A1+B2".
read_only=True is faster on big workbooks and avoids loading styles you don't need.
The any(cell is not None ...) filter drops entirely-blank rows that Excel preserves at the bottom of a sheet.
dict(zip(header, row)) handles trailing empty columns gracefully when a row is shorter than the header.

Reading with `pandas` (if it's installed)

import pandas as pd

# Multi-sheet read returns a dict of DataFrames
sheets = pd.read_excel("workbook.xlsx", sheet_name=None, dtype=object)
rules_df = sheets["Rules"]

# Drop fully-empty rows; keep partial rows
rules_df = rules_df.dropna(how="all")

# Iterate as dicts; NaN becomes None
records = rules_df.where(rules_df.notna(), None).to_dict(orient="records")

pandas is heavier but useful when you want grouping, joins, or numeric aggregation. Either library is fine; do not mix them in the same module.

Empty cells: `None` is the answer, not an error

def get(rec, key, default=None):
    val = rec.get(key)
    return default if val is None or (isinstance(val, str) and not val.strip()) else val

Compare against is None or call .strip() rather than truthiness — 0 and False are valid values that fail truthy tests.

Composite cells

Authors often put list-valued data into a single cell. A cell containing "Delaware, New York, California" is one string, not three rows.

def split_list_cell(value):
    if value is None:
        return []
    return [item.strip() for item in str(value).split(",") if item.strip()]

If you see a cell with curly-brace text, it is probably an embedded JSON document; parse with json.loads. Try the simple split first.

Merged cells

last = None
for row in records:
    if row["section"] is None:
        row["section"] = last
    else:
        last = row["section"]

If you need to know whether a cell is in a merged range, ws.merged_cells.ranges gives you the list.

Multiple sheets

Putting it together for a configuration-style workbook

import openpyxl

def load_sheet_as_records(wb, sheet_name):
    ws = wb[sheet_name]
    rows = ws.iter_rows(values_only=True)
    header = [str(c).strip() if c else "" for c in next(rows)]
    return [
        dict(zip(header, row))
        for row in rows
        if any(cell is not None for cell in row)
    ]

wb = openpyxl.load_workbook("workbook.xlsx", data_only=True, read_only=True)
metadata = {row[0]: row[1] for row in wb["Metadata"].iter_rows(min_row=2, values_only=True)}
defs  = load_sheet_as_records(wb, "Definitions")
rules = load_sheet_as_records(wb, "Rules")

After this, rules[0]["key"], rules[0]["rule_type"], etc. are plain Python values you can branch on. The rest of your code does not need to know the input was Excel.

When not to use this skill

The file is .csv — use csv.DictReader directly.
The file is .json or .jsonl — use json.loads.
The file is .xls (legacy binary) — openpyxl will refuse; use xlrd<2 or convert to .xlsx first.

xlsx-parsing

xlsx-parsing

Reading with openpyxl (pure Python, no compiled dependencies)

Reading with pandas (if it's installed)

Empty cells: None is the answer, not an error

Composite cells

Merged cells

Multiple sheets

Putting it together for a configuration-style workbook

When not to use this skill

More from this repository

xlsx-parsing

Reading with openpyxl (pure Python, no compiled dependencies)

Reading with pandas (if it's installed)

Empty cells: None is the answer, not an error

Composite cells

Merged cells

Multiple sheets

Putting it together for a configuration-style workbook

When not to use this skill

More from this repository

Reading with `openpyxl` (pure Python, no compiled dependencies)

Reading with `pandas` (if it's installed)

Empty cells: `None` is the answer, not an error

Reading with `openpyxl` (pure Python, no compiled dependencies)

Reading with `pandas` (if it's installed)

Empty cells: `None` is the answer, not an error