ppt-analysis
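Full parsing of PPT (.pptx/.ppt) files. Covers: text, table, and chart extraction from every slide; captions for embedded images; render-and-recognize handling for image-only slides; and data label extraction.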
| name | ppt-analysis |
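| description | Full parsing of PPT (.pptx/.ppt) files. Covers: text, table, and chart extraction from every slide; captions for embedded images; render-and-recognize handling for image-only slides; and data label extraction. |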
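import json
import os
import subprocess

from pptx import Presentation

# python-pptx is available; it reads .pptx only.
# For .ppt (old binary format): convert to .pptx via LibreOffice first.
def load_pptx(path):
    """Open a presentation, converting legacy .ppt to .pptx when needed."""
    if path.lower().endswith('.ppt'):
        # Use an absolute directory so --outdir is never an empty string
        out_dir = os.path.dirname(os.path.abspath(path))
        subprocess.run(
            ['libreoffice', '--headless', '--convert-to', 'pptx', '--outdir', out_dir, path],
            check=True, capture_output=True
        )
        # LibreOffice writes <stem>.pptx into out_dir
        stem = os.path.splitext(os.path.basename(path))[0]
        path = os.path.join(out_dir, stem + '.pptx')
    return Presentation(path), path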
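The conversion above assumes a `libreoffice` executable on `PATH`. Some installations expose only `soffice`, so a lookup along the following lines can fail early with a clear message. This is a minimal sketch; `find_office_binary` is a hypothetical helper and not part of the original skill.

```python
import shutil

def find_office_binary(candidates=("libreoffice", "soffice")):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    return None

if __name__ == "__main__":
    binary = find_office_binary()
    if binary is None:
        print("LibreOffice not found on PATH; .ppt conversion and slide rendering are unavailable")
    else:
        print(f"Using {binary}")
```

The returned path could then replace the hard-coded `'libreoffice'` string in the `subprocess.run` calls.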
def extract_all_slides_text(pptx_path):
    """
    Extract text from every slide: text frames, tables, chart titles.
    For slides with no extractable text, flag them for image captioning.
    """
    prs, _ = load_pptx(pptx_path)
    slides_data = []
    for slide_num, slide in enumerate(prs.slides, start=1):
        slide_texts = []
        has_text = False
        for shape in slide.shapes:
            # Text frame (most common)
            if shape.has_text_frame:
                for para in shape.text_frame.paragraphs:
                    text = para.text.strip()
                    if text:
                        slide_texts.append(text)
                        has_text = True
            # Table
            if shape.has_table:
                tbl = shape.table
                for row in tbl.rows:
                    row_text = '\t'.join(cell.text.strip() for cell in row.cells)
                    if row_text.strip():
                        slide_texts.append(row_text)
                        has_text = True
            # Chart title (has_chart avoids reading shape_type, which can raise on unrecognized shapes)
            if getattr(shape, 'has_chart', False):
                try:
                    if shape.chart.has_title:
                        title = shape.chart.chart_title.text_frame.text
                        slide_texts.append(f"[Chart: {title}]")
                        has_text = True
                except Exception:
                    pass
        slides_data.append({
            'slide': slide_num,
            'text': '\n'.join(slide_texts),
            'has_text': has_text,
            'needs_caption': not has_text  # flag image-only slides
        })
    print(f"Total slides: {len(slides_data)}")
    image_only = sum(1 for s in slides_data if s['needs_caption'])
    print(f"Slides with text: {len(slides_data) - image_only}, image-only: {image_only}")
    return slides_data
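`slide.shapes` yields only top-level shapes, so text inside grouped shapes is not visited by the loop above. One possible extension is a recursive walk. The sketch below relies only on the `shape_type` and `shapes` attributes, with 6 being the value of `MSO_SHAPE_TYPE.GROUP`; `iter_shapes` is a hypothetical helper, and the stand-in objects exist only so the example runs without a real deck.

```python
from types import SimpleNamespace

GROUP = 6  # value of MSO_SHAPE_TYPE.GROUP in python-pptx

def iter_shapes(shapes):
    """Yield every shape, descending into group shapes recursively."""
    for shape in shapes:
        if getattr(shape, "shape_type", None) == GROUP:
            yield from iter_shapes(shape.shapes)
        else:
            yield shape

# Stand-in objects that mimic the two attributes used above (illustration only)
leaf_a = SimpleNamespace(shape_type=17, name="title")
leaf_b = SimpleNamespace(shape_type=17, name="inner text")
leaf_c = SimpleNamespace(shape_type=13, name="inner picture")
group = SimpleNamespace(shape_type=GROUP, shapes=[leaf_b, leaf_c])

names = [s.name for s in iter_shapes([leaf_a, group])]
print(names)  # ['title', 'inner text', 'inner picture']
```

With a real deck the call would be `iter_shapes(slide.shapes)`. Shapes whose `shape_type` cannot be determined may need extra handling, since `getattr` only guards against a missing attribute.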
import pandas as pd

def extract_pptx_tables(pptx_path):
    """Extract all tables from all slides as DataFrames."""
    prs, _ = load_pptx(pptx_path)
    all_tables = []
    for slide_num, slide in enumerate(prs.slides, start=1):
        for shape in slide.shapes:
            if not shape.has_table:
                continue
            tbl = shape.table
            rows = []
            for row in tbl.rows:
                rows.append([cell.text.strip() for cell in row.cells])
            if not rows:
                continue
            # Use first row as header
            try:
                df = pd.DataFrame(rows[1:], columns=rows[0])
            except Exception:
                df = pd.DataFrame(rows)
            all_tables.append({'slide': slide_num, 'df': df})
            print(f"  Slide {slide_num}: table {df.shape[0]}r × {df.shape[1]}c")
            print(df.head(3).to_string())
    return all_tables
python-pptx can read Chart data when it's stored as embedded Excel data.
If that fails, fall back to captioning the slide image.
def extract_chart_data(pptx_path):
    """
    Extract data series from Chart shapes.
    Returns list of {slide, chart_title, series, categories, values}.
    """
    prs, _ = load_pptx(pptx_path)
    charts = []
    for slide_num, slide in enumerate(prs.slides, start=1):
        for shape in slide.shapes:
            if not getattr(shape, 'has_chart', False):  # not a chart
                continue
            try:
                chart = shape.chart
                title = chart.chart_title.text_frame.text if chart.has_title else f"Chart_S{slide_num}"
                for plot in chart.plots:
                    # Category labels belong to the plot, not to each series
                    try:
                        categories = [str(c) for c in plot.categories]
                    except Exception:
                        categories = []
                    for series in plot.series:
                        try:
                            # series.values is a sequence of numbers (None for gaps)
                            values = list(series.values)
                        except Exception:
                            values = []
                        charts.append({
                            'slide': slide_num,
                            'chart_title': title,
                            'series': getattr(series, 'name', ''),
                            'categories': categories,
                            'values': values
                        })
            except Exception as e:
                print(f"  Slide {slide_num}: chart extraction failed ({e}); will use caption")
    return charts
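Each record returned by `extract_chart_data` holds parallel `categories` and `values` lists. For analysis it is often easier to flatten them into one row per data point. A minimal sketch, assuming the record layout described in the docstring above; `chart_records_to_rows` is a hypothetical helper and the sample record is made up for illustration.

```python
from itertools import zip_longest

def chart_records_to_rows(charts):
    """Flatten chart records into one dict per (series, category) data point."""
    rows = []
    for rec in charts:
        cats = rec.get("categories", [])
        vals = rec.get("values", [])
        # zip_longest pads the shorter list with None instead of dropping points
        for cat, val in zip_longest(cats, vals):
            rows.append({
                "slide": rec.get("slide"),
                "chart_title": rec.get("chart_title"),
                "series": rec.get("series"),
                "category": cat,
                "value": val,
            })
    return rows

# Made-up sample record, shaped like the output of extract_chart_data
sample = [{
    "slide": 3,
    "chart_title": "Demo chart",
    "series": "Series A",
    "categories": ["Q1", "Q2"],
    "values": [10.0, 12.5],
}]
rows = chart_records_to_rows(sample)
for row in rows:
    print(row["series"], row["category"], row["value"])
```

The resulting list of dicts can be passed directly to `pd.DataFrame(rows)` when pandas is available.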
When a slide has no extractable text (pure image/screenshot slides):
import fitz  # PyMuPDF: renders pages of the PDF that LibreOffice produces from the PPTX
CAPTION = "/path/to/skills/sn-da-image-caption/scripts/caption.py"
def caption_image_slides(pptx_path, slides_data, prompt=None):
    """
    For slides flagged as 'needs_caption', render to PNG and caption.
    Uses LibreOffice to convert PPTX to PDF first, then renders pages.
    """
    image_slides = [s for s in slides_data if s['needs_caption']]
    if not image_slides:
        print("No image-only slides to caption.")
        return slides_data
    # Convert PPTX → PDF (preserves slide visuals)
    out_dir = "/tmp"
    r = subprocess.run(
        ['libreoffice', '--headless', '--convert-to', 'pdf', '--outdir', out_dir, pptx_path],
        capture_output=True, text=True
    )
    pdf_name = os.path.splitext(os.path.basename(pptx_path))[0] + '.pdf'
    pdf_path = os.path.join(out_dir, pdf_name)
    if not os.path.exists(pdf_path):
        print(f"LibreOffice conversion failed: {r.stderr[:200]}")
        return slides_data
    # Default prompt (Chinese): "Extract all text, numbers and table content on the
    # slide, keep the structure, output as Markdown."
    p = prompt or "提取幻灯片中所有文字、数值和表格内容,保持结构,Markdown格式输出。"
    # Render each image-only slide
    doc = fitz.open(pdf_path)
    try:
        for s in image_slides:
            page_idx = s['slide'] - 1  # 0-indexed
            if page_idx >= len(doc):
                continue
            page = doc[page_idx]
            mat = fitz.Matrix(150/72, 150/72)  # scale 72 DPI up to 150 DPI
            pix = page.get_pixmap(matrix=mat)
            img_path = os.path.join(out_dir, f"slide_{s['slide']}.png")
            pix.save(img_path)
            # Caption the slide image
            cmd = ["python3", CAPTION, img_path, "--json", "--prompt", p]
            try:
                cr = subprocess.run(cmd, capture_output=True, text=True, timeout=90)
            except subprocess.TimeoutExpired:
                print(f"  Slide {s['slide']}: caption timed out")
                continue
            if cr.returncode == 0:
                desc = json.loads(cr.stdout).get("description", "")
                s['text'] = desc
                s['needs_caption'] = False
                print(f"  Slide {s['slide']}: captioned ({len(desc)} chars)")
            else:
                print(f"  Slide {s['slide']}: caption failed: {cr.stderr[:80]}")
    finally:
        doc.close()  # release the PDF even if a slide fails
    return slides_data
def find_in_pptx(pptx_path, keyword, slides_data=None):
    """Find keyword across all slides (after text extraction + captioning)."""
    if slides_data is None:
        slides_data = extract_all_slides_text(pptx_path)
    results = []
    for s in slides_data:
        if keyword in s.get('text', ''):
            idx = s['text'].find(keyword)
            # Keep 100 characters before and 200 after the first hit as context
            context = s['text'][max(0, idx-100):idx+200]
            results.append({'slide': s['slide'], 'context': context})
    print(f"'{keyword}' found in {len(results)} slides: {[r['slide'] for r in results]}")
    return results

def extract_timeline(pptx_path, date_pattern=r'\d{4}[年/\-]\d{1,2}'):
    """Extract date-tagged events from slide text."""
    import re
    slides_data = extract_all_slides_text(pptx_path)
    events = []
    for s in slides_data:
        for line in s['text'].split('\n'):
            if re.search(date_pattern, line):
                events.append({'slide': s['slide'], 'event': line.strip()})
    return events
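The events returned by `extract_timeline` come back in slide order, which is not necessarily chronological. A minimal sketch of one way to order them, assuming the default `date_pattern` (four-digit year, then `年`, `/` or `-`, then a month); `sort_events` is a hypothetical helper and the sample events are made up.

```python
import re

# Same shape as the default date_pattern, with capture groups for year and month
DATE_RE = re.compile(r'(\d{4})[年/\-](\d{1,2})')

def sort_events(events):
    """Attach (year, month) to each dated event and sort chronologically."""
    dated = []
    for ev in events:
        m = DATE_RE.search(ev["event"])
        if m:
            dated.append({**ev, "year": int(m.group(1)), "month": int(m.group(2))})
    return sorted(dated, key=lambda e: (e["year"], e["month"]))

# Made-up sample in the format produced by extract_timeline
sample = [
    {"slide": 5, "event": "2023/11 pilot launch"},
    {"slide": 2, "event": "2022年3月 project kickoff"},
    {"slide": 4, "event": "2023-01 first release"},
]
ordered = sort_events(sample)
print([(e["year"], e["month"]) for e in ordered])  # [(2022, 3), (2023, 1), (2023, 11)]
```

Events that do not match the pattern are dropped by this sketch; a caller who passes a custom `date_pattern` would need a matching regex here as well.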
def compute_ratio_from_pptx_table(pptx_path, numerator_col, denominator_col):
    """Example: compute ratio = col_A / col_B (as a percentage) for all rows."""
    tables = extract_pptx_tables(pptx_path)
    for item in tables:
        df = item['df']
        # Try to find columns (flexible substring matching on the header text)
        num_col = next((c for c in df.columns if numerator_col in str(c)), None)
        den_col = next((c for c in df.columns if denominator_col in str(c)), None)
        if num_col is None or den_col is None:
            continue
        # Strip the unit suffix '人' (persons) before numeric conversion
        for col in (num_col, den_col):
            df[col] = pd.to_numeric(
                df[col].astype(str).str.replace('人', '', regex=False).str.strip(),
                errors='coerce'
            )
        ratio = df[num_col] / df[den_col] * 100
        # Format as a whole-number percentage; leave unparseable rows blank
        df['ratio'] = ratio.map(lambda r: f"{r:.0f}%" if pd.notna(r) else '')
        print(f"  Slide {item['slide']}:")
        print(df[[df.columns[0], num_col, den_col, 'ratio']].to_string())
    return tables
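The ratio example strips only the unit `人`, so cells with thousands separators, percent signs, or other units are coerced to NaN. A more tolerant parser might look like the sketch below. `parse_number` is a hypothetical helper, and treating the first number in the cell as the value is an assumption that will not suit every table.

```python
import re

NUM_RE = re.compile(r'-?\d+(?:\.\d+)?')

def parse_number(cell):
    """Return the first number found in a table cell as a float, or None."""
    # Drop ASCII and full-width thousands separators before matching
    cleaned = str(cell).replace(',', '').replace('，', '')
    m = NUM_RE.search(cleaned)
    return float(m.group()) if m else None

for raw in ["1,234人", "85.5%", "约 300 人", "N/A"]:
    print(repr(raw), "->", parse_number(raw))
```

Applied to a DataFrame column, this could be used as `df[col].map(parse_number)` in place of the `str.replace` chain.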
pptx_path = "/mnt/data/report.pptx"
# 1. Extract text from all slides
slides_data = extract_all_slides_text(pptx_path)
# 2. Caption image-only slides
slides_data = caption_image_slides(pptx_path, slides_data)
# 3. Combine all text for analysis
all_text = '\n\n'.join(
    f"[Slide {s['slide']}]\n{s['text']}"
    for s in slides_data if s.get('text')
)
# 4. Search or analyze
results = find_in_pptx(pptx_path, '录用占比', slides_data)
# 5. Extract tables if needed
tables = extract_pptx_tables(pptx_path)
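Before analyzing `all_text`, it can help to confirm that no slide was silently dropped, for example when captioning failed and a slide still has empty text. A minimal sketch, assuming the `slides_data` layout produced by `extract_all_slides_text`; `coverage_report` is a hypothetical helper and the sample data is made up.

```python
def coverage_report(slides_data):
    """Summarize how many slides have text and which slide numbers are still empty."""
    missing = [s['slide'] for s in slides_data if not (s.get('text') or '').strip()]
    total = len(slides_data)
    return {'total': total, 'covered': total - len(missing), 'missing': missing}

# Made-up sample: slide 2 is image-only and its caption step failed
sample_slides = [
    {'slide': 1, 'text': 'Agenda', 'needs_caption': False},
    {'slide': 2, 'text': '', 'needs_caption': True},
    {'slide': 3, 'text': 'Summary', 'needs_caption': False},
]
report = coverage_report(sample_slides)
print(report)  # {'total': 3, 'covered': 2, 'missing': [2]}
```

A non-empty `missing` list is a signal to retry captioning for those slides or to mention the gap in the final answer.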
| Pitfall | Fix |
|---|---|
| Skipping slides with no text → chart data is missed | Flag `needs_caption`, then render and caption the slide (`caption_image_slides`) |
| `shape.chart.plots[0].series` fails → no data | Catch the exception and fall back to captioning the slide |
| Table columns misread (e.g. 企业名, company name, vs 岗位名, job title) | Print headers + first 3 rows before computing; verify what each column means |
| Only the first N slides are read | Always iterate `for slide in prs.slides`, with no index limit |
| `.ppt` format → python-pptx can't open it | Convert to `.pptx` via LibreOffice first |
| Overlapping text boxes → garbled reading order | Sort shapes by top-left position: `sorted(slide.shapes, key=lambda s: (s.top, s.left))` |
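The sort key in the last row compares `top` and `left` directly, which raises a `TypeError` in Python 3 if either value is `None` for some shape. A defensive variant is sketched below. `reading_order` is a hypothetical helper, and the stand-in objects only mimic the `top` and `left` attributes so the example runs without a deck.

```python
from types import SimpleNamespace

def reading_order(shapes):
    """Sort shapes top-to-bottom, then left-to-right; shapes without a position go last."""
    def key(shape):
        top = getattr(shape, "top", None)
        left = getattr(shape, "left", None)
        # The boolean flags push None positions to the end without comparing None to int
        return (top is None, top or 0, left is None, left or 0)
    return sorted(shapes, key=key)

# Stand-in shapes (illustration only)
boxes = [
    SimpleNamespace(name="body", top=200, left=50),
    SimpleNamespace(name="title", top=10, left=50),
    SimpleNamespace(name="sidebar", top=200, left=10),
    SimpleNamespace(name="floating", top=None, left=None),
]
order = [b.name for b in reading_order(boxes)]
print(order)  # ['title', 'sidebar', 'body', 'floating']
```

With a real deck the call would be `reading_order(slide.shapes)`, used in place of iterating `slide.shapes` directly.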