| name | ir-financials-extractor |
| description | Extract H1 2025 and H2 2024 financial data from any listed company's investor relations website using Playwright browser automation. Use when asked to get half-year financials, interim results, semi-annual reports, or compare H1/H2 periods for publicly traded companies. Handles cookie consent, navigation to IR sections, and derives H2 data from FY minus H1 when no standalone H2 report exists. |
IR Financials Extractor
Extract H1 2025 + H2 2024 financials from any listed company IR site (no login required).
Goal
Given a publicly listed company's website, navigate to Investor Relations and locate:
- H1 2025 report (or "Interim results 2025 / Half-year 2025")
- H2 2024 report (or a source that clearly represents the second half of FY 2024)
Then record key financial details for both periods in a two-column financials table.
Constraints
- No logins required
- ONLY use sources from the official company website - the URL will be provided by the user. Do NOT use third-party financial news sites, aggregators, or external databases. All data must come directly from the company's own IR pages, PDFs, or official announcements hosted on their domain.
- Be robust to different naming conventions and layouts
Guardrails (MANDATORY)
1. Domain Restriction Enforcement
CRITICAL: You MUST stay within the user-specified domain(s) at all times.
- Extract the domain from the user-provided URL (e.g., www.example.com from https://www.example.com/path)
- NEVER navigate to any URL outside the specified domain(s)
- If a link redirects to an external domain, DO NOT follow it - record it as "External link - not followed" in sources.md
- Before every navigation, verify the target URL is within the allowed domain(s)
- Subdomains of the main domain are permitted (e.g., investors.example.com is allowed if example.com was specified)
Domain validation checklist before each navigation:
✓ Is the URL on the user-specified domain or its subdomain?
✓ If not, DO NOT navigate - log the blocked URL and continue
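
The checklist above can be expressed as a small pre-navigation check. This is a minimal sketch using only the Python standard library; the function names `base_domain` and `is_allowed` are hypothetical, and stripping a leading `www.` so that sibling subdomains such as investors.example.com pass is an assumption about how the "main domain" should be derived. Relative links would need to be resolved to absolute URLs before being checked.

```python
from urllib.parse import urlparse


def base_domain(user_url: str) -> str:
    """Return the host of the user-provided URL, minus a leading 'www.' (assumption)."""
    host = (urlparse(user_url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_allowed(target_url: str, allowed_domain: str) -> bool:
    """True if target_url is on allowed_domain or one of its subdomains."""
    parsed = urlparse(target_url)
    if parsed.scheme not in ("http", "https"):
        return False  # blocks mailto:, javascript:, relative links, etc.
    host = (parsed.hostname or "").lower()
    # Require an exact match or a dot boundary, so 'notexample.com' is rejected.
    return host == allowed_domain or host.endswith("." + allowed_domain)


allowed = base_domain("https://www.example.com/path")
print(is_allowed("https://investors.example.com/results", allowed))
print(is_allowed("https://example.com.evil.net/", allowed))
```

The dot-boundary comparison matters: a plain suffix or substring test would accept look-alike hosts that merely contain the allowed domain.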
2. Tool Usage Restrictions
PROHIBITED TOOLS - DO NOT USE:
- ❌ WebFetch - Never use for any purpose during this skill
- ❌ WebSearch - Never use for any purpose during this skill
REQUIRED: Use Playwright browser automation tools for all data gathering.
All navigation, page reading, interactions, and evidence capture must be performed through Playwright MCP tools. This ensures:
- All navigation is visible and auditable
- Domain restrictions can be enforced
- Screenshots provide verifiable evidence
- No external sources can be accessed inadvertently
Rationale: WebFetch and WebSearch can access any URL on the internet, bypassing domain restrictions. Playwright browser automation ensures all navigation is constrained to user-approved domains.
3. Screenshot Evidence Requirements
MANDATORY: Every piece of extracted financial data must have screenshot evidence.
- Take a screenshot of each page containing financial data before extracting values
- Save ALL screenshots to the designated reports/<company_name>/screenshots/ directory
- Use descriptive, sequential filenames (e.g., 01-h1-2025-financials.png)
Required screenshot evidence (financial material only):
| Content Type | Screenshot Required |
|---|---|
| H1 2025 results/financials page | ✅ Yes |
| H2 2024 results/financials page | ✅ Yes |
| FY 2024 results page (if used for derivation) | ✅ Yes |
| H1 2024 results page (if used for derivation) | ✅ Yes |
| Any financial data table or summary | ✅ Yes |
| PDF pages containing extracted numbers | ✅ Yes |
Evidence chain: Every value in Table B must be traceable to a specific screenshot file referenced in the Evidence column.
4. Compliance Verification
Before completing the extraction, verify:
Output Directory Structure
All outputs must be saved in an organized directory structure. Before starting extraction, create the required directories.
reports/
└── <company_name>/
    ├── screenshots/
    │   ├── 01-h1-2025-results.png
    │   ├── 02-h1-2025-financials.png
    │   ├── 03-fy-2024-results.png
    │   ├── 04-fy-2024-financials.png
    │   ├── 05-h1-2024-results.png
    │   ├── 06-h1-2024-financials.png
    │   └── ...
    ├── report.md     # Final report with Tables A and B
    └── sources.md    # List of all source URLs used
Directory Setup
At the start of each extraction, run:
mkdir -p reports/<company_name>/screenshots
Replace <company_name> with a sanitized version of the company name (lowercase, hyphens instead of spaces, e.g., "Marks and Spencer" → "marks-and-spencer").
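
The sanitization rule can be sketched as follows. The function name `sanitize_company_name` is hypothetical, and collapsing punctuation (not just spaces) into hyphens is an assumption that goes slightly beyond the rule stated above.

```python
import re


def sanitize_company_name(name: str) -> str:
    """Lowercase the name and replace runs of non-alphanumerics with one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.strip().lower())
    return slug.strip("-")  # drop hyphens left at either end


print(sanitize_company_name("Marks and Spencer"))  # marks-and-spencer
```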
Screenshot Naming Convention
Save screenshots with sequential numbering and descriptive names. Only screenshot pages containing financial data:
01-h1-2025-results.png - H1 2025 results page
02-h1-2025-financials.png - H1 2025 financial highlights
03-fy-2024-results.png - FY 2024 results page
04-fy-2024-financials.png - FY 2024 financial highlights
05-h1-2024-results.png - H1 2024 results page (for derivation)
06-h1-2024-financials.png - H1 2024 financial highlights
Use the Playwright screenshot tool with the filename parameter pointing to the appropriate path:
filename: "reports/<company_name>/screenshots/01-h1-2025-results.png"
Screenshot Hygiene Rules
- Always save to the correct directory - Never save screenshots to the root or working directory
- Use sequential numbering - Prefix with two-digit numbers (01, 02, 03...) to maintain order
- Use descriptive names - Include what the screenshot shows (e.g., "h1-2025-financials" not "screenshot3")
- Capture financial evidence only - Take screenshots of:
  - Financial highlights/summary tables
  - Results announcement pages with key figures
  - Any page you extract numbers from
- PNG format preferred - Use PNG for clarity of text and numbers
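
One way to keep these rules consistent is to build every screenshot path through a single helper. This is a sketch only; `screenshot_path` is a hypothetical name, and the 1-99 range check is an assumption that follows from the two-digit prefix.

```python
from pathlib import PurePosixPath


def screenshot_path(company_slug: str, seq: int, label: str) -> str:
    """Build reports/<company_name>/screenshots/NN-<label>.png."""
    if not 1 <= seq <= 99:
        raise ValueError("seq must fit in a two-digit prefix")
    name = f"{seq:02d}-{label}.png"  # zero-padded so files sort in capture order
    return str(PurePosixPath("reports") / company_slug / "screenshots" / name)


print(screenshot_path("unilever", 1, "h1-2025-results"))
```

The returned string is what would be passed as the screenshot tool's filename parameter.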
Inputs (provided at runtime)
company_name (string) - Name of the company
company_website_url (string, required) - Company's official website URL. Mandatory, because all data must be sourced exclusively from the official company website.
Required Output (must produce two markdown tables)
Table A: Source metadata (columns = periods)
| Field | H1 2025 | H2 2024 |
|---|---|---|
| Document title | | |
| Period covered (exact dates if shown) | | |
| Currency + units (e.g. £m, $m) | | |
| Source URL (final page or PDF) | | |
| Evidence pointer (PDF page + table name, or HTML section heading) | | |
| Notes (reported vs adjusted, any caveats) | | |
Table B: Financials matrix (rows = line items, columns = periods)
| Line item | H1 2025 | H2 2024 | Notes (reported/adjusted/derived) | Evidence (page/section) |
|---|---|---|---|---|
| Revenue | | | | |
| Operating profit (or EBIT) | | | | |
| Profit before tax | | | | |
| Net profit attributable (or profit for period) | | | | |
| EPS basic | | | | |
| Free cash flow (or closest stated proxy) | | | | |
| Net debt / (net cash) | | | | |
| Dividend (declared/paid for period) | | | | |
Workflow
Step 0: Setup output directories
Before any browser navigation, create the output directory structure:
- Sanitize the company name (lowercase, replace spaces with hyphens)
- Create directories using Bash:
mkdir -p reports/<company_name>/screenshots
Step 1: Reach the official IR area
- Navigate to company_website_url
  - The URL must be provided by the user. If not provided, ask the user for the official company website URL before proceeding.
- Take a snapshot to understand the page structure
- Clear blockers:
  - Accept cookies if a consent banner appears (look for "Accept", "Accept all", "I agree" buttons)
  - Close modals or popups
  - Dismiss region/country prompts
- Find "Investors / Investor Relations / Results / Reports" via:
  - Top navigation bar
  - Hamburger/mobile menu (may need to click to expand)
  - Footer links
  - Site search functionality (keywords: investor, results, reports, financial)
- Click on the IR link to navigate to the Investor Relations section
Success check: You land on an IR section showing Results, Reports, Announcements, or Presentations.
Step 2: Find H1 2025
Within IR, look under sections like:
- "Results & presentations"
- "Reports"
- "Financial results"
- "Announcements"
- "Regulatory news"
Scan link titles for H1 2025 indicators:
- "H1 2025"
- "Half-year 2025"
- "Interim results 2025"
- "Six months ended [date] 2025"
- "Interim report 2025"
- "1H 2025"
- "First half 2025"
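
The indicators listed above can be folded into one case-insensitive pattern for scanning link titles. This is a rough sketch; `h1_pattern` and `looks_like_h1` are hypothetical names, and the exact spacing tolerated by each alternative is an assumption. A "six months ended" match still needs its date checked by hand, since for some fiscal years that phrase describes the second half.

```python
import re


def h1_pattern(year: int) -> re.Pattern:
    """Compile a pattern covering the H1 title variants for the given year."""
    y = str(year)
    alternatives = [
        rf"\bH1\s*{y}\b",
        rf"\b1H\s*{y}\b",
        rf"\bhalf[-\s]year\s+{y}\b",
        rf"\bfirst\s+half\s+{y}\b",
        rf"\binterim\s+(?:results|report)\s+{y}\b",
        rf"\bsix\s+months\s+ended\b.*\b{y}\b",
    ]
    return re.compile("|".join(alternatives), re.IGNORECASE)


def looks_like_h1(title: str, year: int = 2025) -> bool:
    """True if a link title matches one of the H1 indicators for the year."""
    return bool(h1_pattern(year).search(title))


print(looks_like_h1("Interim Results 2025"))
print(looks_like_h1("Annual Report 2025"))
```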
Open the most authoritative source:
Step 3: Find H2 2024 (handle the common "no standalone H2" case)
First, try to find an explicit H2/second-half document:
- "H2 2024"
- "Second half 2024"
- "Six months ended [date] 2024" (corresponding to H2 of their fiscal year)
- "2H 2024"
If no standalone H2 exists, use the derivation fallback:
- Find FY 2024 results (full-year results announcement or annual report)
  - Take screenshot: reports/<company_name>/screenshots/03-fy-2024-results.png
  - Take screenshot of financials: reports/<company_name>/screenshots/04-fy-2024-financials.png
- Find H1 2024 interim results
  - Take screenshot: reports/<company_name>/screenshots/05-h1-2024-results.png
  - Take screenshot of financials: reports/<company_name>/screenshots/06-h1-2024-financials.png
- Derive additive H2 metrics using the formula: H2 2024 = FY 2024 - H1 2024
- In Table A Notes and Table B Notes, clearly mark values as Derived and cite both FY and H1 sources
Important: Do NOT derive balance-sheet endpoints (like net debt) unless the document explicitly provides an H2 end-of-period figure. Balance sheet items are point-in-time, not cumulative.
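
The derivation rule can be sketched as a function that subtracts only flow metrics and refuses point-in-time items. The name `derive_h2` and the contents of `ADDITIVE_ITEMS` are assumptions for illustration: the skill does not enumerate which line items count as additive, and EPS is left out here because a simple subtraction ignores changes in share count. The figures in the usage lines are made up.

```python
from decimal import Decimal
from typing import Optional

# Assumed set of cumulative (flow) metrics; net debt is deliberately absent
# because it is a point-in-time balance-sheet figure.
ADDITIVE_ITEMS = {
    "Revenue",
    "Operating profit",
    "Profit before tax",
    "Net profit attributable",
    "Free cash flow",
}


def derive_h2(line_item: str, fy: Optional[Decimal], h1: Optional[Decimal]) -> Optional[Decimal]:
    """Return FY - H1 for additive items, or None when derivation is not valid."""
    if line_item not in ADDITIVE_ITEMS or fy is None or h1 is None:
        return None
    return fy - h1


# Illustrative numbers only, not taken from any real report.
print(derive_h2("Revenue", Decimal("1000.0"), Decimal("450.5")))         # 549.5
print(derive_h2("Net debt / (net cash)", Decimal("300"), Decimal("280")))  # None
```

Using `Decimal` rather than `float` keeps the subtraction exact for values transcribed from reports. FY and H1 values must share the same currency, units, and reported/adjusted basis before subtracting.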
Step 4: Extract numbers with evidence
For each period (or derived period):
- Identify the clearest source:
  - "Financial highlights" section
  - Consolidated income statement
  - Key figures / at a glance section
  - Results summary table
- For each line item, record:
  - Value (the actual number)
  - Currency and units (e.g., £m, $bn, EUR millions)
  - Type: Whether it is reported or adjusted/underlying (do not mix without labelling)
- For every captured number, store an evidence pointer:
  - PDF: page number + table name (e.g., "p.3 Financial highlights")
  - HTML: section heading + nearby table label (e.g., "Key figures section, Results summary table")
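
A small record type can keep the two pointer formats consistent. This is a sketch under assumptions: the `Evidence` class and its field names are hypothetical, chosen only to reproduce the two example formats given above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    table_or_section: str               # PDF table name or HTML section heading
    pdf_page: Optional[int] = None      # set for PDF sources
    nearby_label: Optional[str] = None  # set for HTML sources

    def pointer(self) -> str:
        """Format the pointer the way the Evidence column expects it."""
        if self.pdf_page is not None:
            return f"p.{self.pdf_page} {self.table_or_section}"
        if self.nearby_label:
            return f"{self.table_or_section}, {self.nearby_label}"
        return self.table_or_section


print(Evidence("Financial highlights", pdf_page=3).pointer())
print(Evidence("Key figures section", nearby_label="Results summary table").pointer())
```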
Step 5: Compile and save the output
- Create Table A with all source metadata
- Create Table B with all financial line items
- Ensure every value in Table B has an evidence pointer that traces back to a screenshot
- Mark derived values clearly with "Derived: FY 2024 - H1 2024" in the Notes column
- If any line items couldn't be found, note "Not disclosed" or "Not found"
- Save the final report to reports/<company_name>/report.md
- Save a sources file listing all URLs used to reports/<company_name>/sources.md
The final report.md should include:
- Company name and extraction date
- Table A (Source metadata)
- Table B (Financials matrix)
- Reference to screenshots folder for evidence
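
Table B can be rendered from collected rows with a short helper, which also guarantees the delimiter row has the right number of columns. This is a sketch; `render_table_b` and the row keys (`item`, `h1`, `h2`, `notes`, `evidence`) are hypothetical, and defaulting missing values to "Not found" is one of the two labels suggested above. The sample row uses made-up values.

```python
from typing import Dict, List, Optional

HEADER = "| Line item | H1 2025 | H2 2024 | Notes (reported/adjusted/derived) | Evidence (page/section) |"
DELIMITER = "|---|---|---|---|---|"


def render_table_b(rows: List[Dict[str, Optional[str]]]) -> str:
    """Render rows as a markdown table, marking missing values as 'Not found'."""
    lines = [HEADER, DELIMITER]
    for row in rows:
        cells = [
            row["item"] or "",
            row.get("h1") or "Not found",
            row.get("h2") or "Not found",
            row.get("notes") or "",
            row.get("evidence") or "",
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Illustrative row only, not real data.
print(render_table_b([
    {"item": "Revenue", "h1": "1,200", "h2": None,
     "notes": "Reported", "evidence": "p.3 Financial highlights"},
]))
```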
Browser Navigation Tips
Cookie Consent Patterns
Common button texts to look for:
- "Accept", "Accept all", "Accept cookies"
- "I agree", "Agree", "OK"
- "Allow", "Allow all"
- "Got it", "Understood"
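
These labels can be matched against button text taken from a page snapshot. The sketch below is an assumption about how one might do it: `is_consent_button` is a hypothetical name, and exact matching after whitespace and case normalization is a deliberate choice so that text such as "Do not accept" is never clicked by mistake.

```python
CONSENT_LABELS = {
    "accept", "accept all", "accept cookies",
    "i agree", "agree", "ok",
    "allow", "allow all",
    "got it", "understood",
}


def is_consent_button(text: str) -> bool:
    """True if the button text is exactly one of the known consent labels."""
    normalized = " ".join(text.split()).lower()  # collapse whitespace, ignore case
    return normalized in CONSENT_LABELS


print(is_consent_button("  Accept All "))
print(is_consent_button("Reject all"))
```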
Finding IR Links
Common navigation patterns:
- Top nav: "Investors", "Investor Relations", "Shareholders"
- Footer: "For Investors", "Investor Information"
- May be under "About" or "Company" dropdown
- Corporate sites often have a separate IR subdomain (e.g., investors.company.com)
Results Page Patterns
Look for:
- Tabbed interfaces (Results, Reports, Presentations)
- Year filters or dropdowns
- Date-sorted list of announcements
- "Archive" sections for older reports
PDF Handling
When you encounter a PDF:
- Note the URL for the Source URL field
- If the PDF viewer is embedded, try to identify page numbers
- Look for "Financial highlights", typically on pages 1-5
- The income statement/P&L is usually within the first 10-15 pages
Example Usage
User: "Extract H1 2025 and H2 2024 financials for Unilever from https://www.unilever.com"
Expected workflow:
- Create directories: mkdir -p reports/unilever/screenshots
- Navigate to unilever.com
- Find and click Investors link
- Navigate to Results section
- Locate H1 2025 interim results, take screenshots → 01-h1-2025-results.png, 02-h1-2025-financials.png
- Locate FY 2024 and H1 2024 (to derive H2 2024), take screenshots → 03-fy-2024-results.png, 04-fy-2024-financials.png, 05-h1-2024-results.png, 06-h1-2024-financials.png
- Extract all financial metrics with evidence
- Save reports/unilever/report.md with Tables A and B
- Save reports/unilever/sources.md with all source URLs
Error Handling
- If no URL provided: Ask the user to provide the official company website URL before proceeding. Do NOT search for or guess URLs.
- If H1 2025 not yet published: Report "H1 2025 results not yet available as of [date]"
- If company uses different fiscal year: Note the fiscal year end date and adjust period names accordingly
- If no IR section found on the provided website: Report the issue to the user rather than searching external sources
- If page requires login: Report "Login required - cannot access without authentication"
- If numbers appear in multiple currencies: Note primary reporting currency and any conversions