| name | attribute |
| version | 1 |
| description | Attribute leaked environment data to victim companies by analyzing ownership signals. Use when analyzing breach data, supply chain attack artifacts, or exfiltrated environment snapshots to identify which organization was compromised. Trigger for questions about victim identification, leaked credential attribution, or breach victimology. |
| author | ramimac |
| argument-hint | ["target-directory"] |
Leaked Data Attribution
Identify the victim organization from leaked environment data by analyzing ownership signals - platform metadata, private infrastructure, corporate domains, credentials, and unique identifiers.
Core Principle: Organizational vs Individual Signals
Attribution requires distinguishing between:
Organizational signals - Directly identify an organization:
- Platform organization names (GitHub org, Azure DevOps collection, GitLab namespace)
- Corporate infrastructure (private registries, self-hosted tools)
- Enterprise accounts (SSO redirects, tenant IDs)
Individual signals - Identify a person who may work for an organization:
- Personal tokens and credentials
- Email addresses
- Usernames and profile fields
The key question: Does this individual represent the organization, or are they incidental to the data?
Analysis Workflow
Step 1: Extract and Catalog All Signals
Before attributing, systematically extract everything that could identify ownership.
Domains to extract:
- Email domains (from any email address found)
- URL domains (from any URL - registries, APIs, webhooks, configs)
- Hostname patterns (machine names, DNS suffixes)
- Proxy/network configuration domains
Identifiers to extract:
- Platform org/user names (GitHub, GitLab, Azure DevOps, etc.)
- Tenant/account IDs (Azure tenant, AWS account, GCP project)
- Workspace/team IDs (Slack team ID, Atlassian org)
- Repository paths and namespaces
High-entropy strings to note:
- API keys and tokens (can be validated/enriched via APIs)
- Webhook URLs (contain embedded IDs)
- JWT tokens (contain issuer, claims)
- Connection strings (contain hostnames, accounts)
Unique patterns to flag:
- Custom environment variable prefixes (e.g., `ACME_*`)
- Asset naming conventions in hostnames
- Internal tool names or project codes
- Scoped package names (`@company/package`)
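The extraction pass above can be sketched as a handful of regular expressions run over the raw dump. This is a minimal sketch, not a complete extractor: the function name `catalog_signals`, the regexes, and the output keys are all assumptions made for illustration, and the patterns will miss hostnames, tenant IDs, and anything not written as `KEY=value`.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
URL_RE = re.compile(r"https?://[^\s'\"<>]+")
# "@scope/name" not preceded by a word character, so email addresses are skipped
SCOPE_RE = re.compile(r"(?<![\w.])@([a-z0-9][a-z0-9._-]*)/[a-z0-9]")
# First underscore-delimited segment of KEY=value lines, e.g. ACME in ACME_TOKEN=...
ENV_PREFIX_RE = re.compile(r"^([A-Z][A-Z0-9]*)_[A-Z0-9_]+=", re.MULTILINE)


def catalog_signals(text: str) -> dict:
    """Collect candidate ownership signals from a raw environment dump."""
    url_hosts = set()
    for url in URL_RE.findall(text):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if host:
            url_hosts.add(host.lower())
    return {
        "email_domains": sorted({d.lower() for d in EMAIL_RE.findall(text)}),
        "url_hosts": sorted(url_hosts),
        "package_scopes": sorted(set(SCOPE_RE.findall(text))),
        # Count how often each prefix occurs; rare custom prefixes stand out
        "env_prefixes": dict(Counter(ENV_PREFIX_RE.findall(text))),
    }
```

Common prefixes such as `GITHUB` or `NPM` will dominate the prefix counts; the ones worth flagging are those that match no known platform.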
Step 2: Analyze Organizational Platform Signals
Examine signals that directly identify an organization through platform metadata.
Platform organizations:
- GitHub repository owner → Verify it's an Organization (not User) via API
- Azure DevOps collection URI → Extract org from path
- GitLab project namespace → Root namespace = org
- CircleCI, Bitbucket, Buildkite, Drone, Travis → Org/workspace fields
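As a rough illustration of pulling the owner out of a repository remote, the sketch below handles HTTPS remotes, scp-style SSH remotes (`git@host:path`), and the Azure DevOps SSH form with its `v3/` prefix. The helper name `repo_owner` is hypothetical, and the result is only a candidate: it does not say whether the owner is an organization or a user.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# scp-style remote: user@host:path (no scheme)
SCP_RE = re.compile(r"^[\w.-]+@([\w.-]+):(.+)$")


def repo_owner(remote: str) -> Optional[Tuple[str, str]]:
    """Return (host, first path segment) for a Git remote, or None."""
    remote = remote.strip()
    scp = SCP_RE.match(remote)
    if scp:
        host, path = scp.group(1), scp.group(2)
    else:
        parts = urlsplit(remote)
        host, path = parts.hostname, parts.path
    if not host:
        return None
    segments = [s for s in path.split("/") if s]
    # Azure DevOps SSH remotes look like ssh.dev.azure.com:v3/{org}/{project}/{repo}
    if host.lower() == "ssh.dev.azure.com" and segments[:1] == ["v3"]:
        segments = segments[1:]
    if not segments:
        return None
    return host.lower(), segments[0]
```

For GitLab the first segment is the root namespace; nested subgroups after it are ignored here.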
Enterprise indicators:
- Self-hosted platform domains (e.g., `gitlab.acme.com` = Acme owns it)
- Enterprise SSO/SAML redirects (GitHub Enterprise Cloud detection)
- Tenant IDs that resolve to organization names
What to look for:
- Org names in platform-specific environment variables
- Repository URLs with org in path
- OIDC token claims containing owner/org fields
Confidence: HIGH when verified as organization
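One way to perform the Organization-versus-User check for GitHub is the public `GET /users/{login}` endpoint, whose response carries a `type` field. The sketch below assumes unauthenticated access (which is rate-limited); the function names and the `User-Agent` string are placeholders, and other platforms need their own lookups.

```python
import json
import urllib.request


def classify_owner(payload: dict) -> str:
    """Map a GitHub `GET /users/{login}` response to an attribution category."""
    kind = payload.get("type")
    if kind == "Organization":
        return "organization"
    if kind == "User":
        return "individual"
    return "unknown"  # e.g. bots, or an error payload with no type field


def fetch_owner(login: str, timeout: float = 10.0) -> dict:
    """Fetch the public profile for a login (unauthenticated request)."""
    req = urllib.request.Request(
        f"https://api.github.com/users/{login}",
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "attribution-sketch",  # placeholder client name
        },
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage would be `classify_owner(fetch_owner("some-login"))`; only an `"organization"` result supports the HIGH rating above.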
Step 3: Analyze Infrastructure Signals
Private or self-hosted infrastructure indicates organizational ownership.
Domains indicating ownership:
- Private registry domains (`npm.acme.com`, `artifactory.acme.io`)
- Self-hosted tool domains (`vault.acme.com`, `sentry.acme.internal`)
- Internal Git server domains
Identifiers in infrastructure URLs:
- Org names embedded in paths (`pkgs.dev.azure.com/{org}/...`)
- Tenant subdomains (`{company}.jfrog.io`)
- Account-specific endpoints
Package scopes:
- Scoped npm packages (`@acme/package`) pointing to private registries
- Private PyPI indexes
- Go private module patterns
Confidence: HIGH when on identifiable corporate domain
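The infrastructure patterns above can be sketched as three small parsers: scoped registry lines in an `.npmrc`, tenant subdomains on a multi-tenant host, and the org segment of an Azure Artifacts feed URL. The function names and the `TENANT_SUFFIXES` tuple are assumptions for illustration; the suffix list in particular would need to be extended for real use.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# .npmrc lines of the form "@scope:registry=https://host/"
SCOPED_REGISTRY_RE = re.compile(r"^\s*(@[\w.-]+):registry\s*=\s*(\S+)", re.MULTILINE)
# Hypothetical list of SaaS suffixes where the tenant is the subdomain
TENANT_SUFFIXES = ("jfrog.io",)


def scoped_registries(npmrc_text: str) -> dict:
    """Map each npm scope to the hostname of its configured registry."""
    result = {}
    for scope, url in SCOPED_REGISTRY_RE.findall(npmrc_text):
        host = urlsplit(url).hostname
        if host:
            result[scope] = host.lower()
    return result


def tenant_from_host(host: str) -> Optional[str]:
    """Return the tenant label of a host such as {company}.jfrog.io."""
    host = host.lower().rstrip(".")
    for suffix in TENANT_SUFFIXES:
        if host.endswith("." + suffix):
            return host[: -(len(suffix) + 1)].split(".")[-1]
    return None


def azure_feed_org(url: str) -> Optional[str]:
    """Return {org} from a pkgs.dev.azure.com/{org}/... feed URL."""
    parts = urlsplit(url)
    if parts.hostname != "pkgs.dev.azure.com":
        return None
    segments = [s for s in parts.path.split("/") if s]
    return segments[0] if segments else None
```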
Step 4: Analyze Domain Signals
Extract and analyze all domains found in the data.
High-value domain sources:
- Email addresses → Corporate email = strong signal
- Git commit metadata → Author/committer email domains
- Proxy bypass lists (`no_proxy`) → Often contain internal domains
- URL configurations → API endpoints, webhooks, service URLs
Domain extraction from:
- Full URLs → Parse out hostname
- Email addresses → Extract domain portion
- Hostnames → Extract DNS suffix patterns
- Configuration values → Look for embedded URLs/domains
Filtering - skip these:
| Category | Examples |
|---|---|
| Localhost | localhost, 127.0.0.1, .local |
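| Internal suffixes | .internal, .corp, .lan, .intranet (the suffix itself identifies no one; a company label before it, as in sentry.acme.internal, is still worth noting) |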
| Placeholders | localdomain.com, example.com |
| Cloud providers | amazonaws.com, azure.com, googleapis.com |
| Public platforms | github.com, gitlab.com, npmjs.org |
| Personal email | gmail.com, yahoo.com, outlook.com |
Confidence: MEDIUM for corporate domains; needs corroboration
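The filter table translates fairly directly into a suffix check. The sketch below copies the table's examples into a single tuple (`SKIP`) and treats each entry as both an exact match and a dot-delimited suffix; the names are illustrative, and the list is only as complete as the table. Hosts under an internal suffix are dropped entirely here, so any company label inside them has to be captured separately.

```python
# Entries taken from the filter table; each matches exactly or as a suffix
SKIP = (
    "localhost", "127.0.0.1", "local",
    "internal", "corp", "lan", "intranet",
    "localdomain.com", "example.com",
    "amazonaws.com", "azure.com", "googleapis.com",
    "github.com", "gitlab.com", "npmjs.org",
    "gmail.com", "yahoo.com", "outlook.com",
)


def is_noise(domain: str) -> bool:
    """True if the domain falls under one of the skip categories."""
    d = domain.lower().rstrip(".")
    # Require a dot boundary so "notgithub.com" is not treated as github.com
    return any(d == s or d.endswith("." + s) for s in SKIP)


def corporate_candidates(domains) -> list:
    """Deduplicate, normalize, and drop noise domains."""
    cleaned = {d.lower().rstrip(".") for d in domains}
    return sorted(d for d in cleaned if not is_noise(d))
```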
Step 5: Analyze High-Entropy Strings and Credentials
Tokens and secrets can be validated or enriched to reveal organizational context.
API tokens - validate and enrich:
- GitHub PATs → Call `/user` to get profile with company field
- Slack tokens → Call `auth.test` to get workspace name
- npm tokens → Profile lookup reveals org memberships
- Cloud credentials → Often return account metadata on validation
Webhook URLs - extract embedded IDs:
- Slack webhooks → Team ID in path (`/services/{TEAM_ID}/...`)
- Other webhooks → May contain account/workspace identifiers
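For Slack incoming webhooks, the team ID is the first path segment after `/services/` and can be read without calling the URL. A minimal sketch, assuming the usual `T...`/`B...` shape of the path; the function name is hypothetical.

```python
import re

# https://hooks.slack.com/services/{TEAM_ID}/{...}/{secret}
SLACK_WEBHOOK_RE = re.compile(
    r"https://hooks\.slack\.com/services/(T[A-Z0-9]+)/B[A-Z0-9]+/[A-Za-z0-9]+"
)


def slack_team_ids(text: str) -> list:
    """Return the distinct Slack team IDs embedded in webhook URLs."""
    return sorted({m.group(1) for m in SLACK_WEBHOOK_RE.finditer(text)})
```

The team ID is opaque on its own and still has to be resolved to a workspace name.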
JWT tokens - decode and examine:
- Issuer domain (`iss` claim) → May indicate organization
- Subject/audience claims → May contain org identifiers
- Custom claims → Platform-specific org information
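Reading the claims does not require the signing key: the payload is the middle segment of the token, base64url-encoded JSON. The sketch below decodes it without verifying the signature, which is why unvalidated claims only rate LOW; the function name is an assumption.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

For a CI OIDC token, claims such as `iss` and any owner or org field are the ones to record; which custom claims exist depends on the issuing platform.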
Connection strings - parse components:
- Database hostnames → May be corporate infrastructure
- Account names → May embed organization
- Server URLs → Domain analysis
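Connection strings come in two broad shapes: URL style (`scheme://user:pw@host/db`) and semicolon-separated `Key=Value` pairs. The sketch below handles both in a simplified way; the function name and the set of recognized keys (`server`, `host`, `data source`, `accountname`, `user id`, `uid`) are assumptions, and vendor-specific formats will need more cases.

```python
import re
from urllib.parse import urlsplit

URL_STYLE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def connection_signals(conn: str) -> dict:
    """Pull the hostname and account name out of a connection string."""
    conn = conn.strip()
    if URL_STYLE_RE.match(conn):
        parts = urlsplit(conn)
        return {"host": parts.hostname, "account": parts.username}
    # Key=Value style; split on the first "=" only, since values may contain "="
    fields = {}
    for pair in conn.split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key.strip().lower()] = value.strip()
    host = fields.get("server") or fields.get("host") or fields.get("data source")
    account = fields.get("accountname") or fields.get("user id") or fields.get("uid")
    return {"host": host, "account": account}
```

The extracted host then goes through the same domain analysis as any other hostname.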
Confidence: Varies - validated tokens = HIGH; unvalidated = LOW
Step 6: Evaluate Individual Signals
When signals are tied to individuals (not organizations), extra validation is needed.
Individual signal types:
- Personal tokens and API keys
- Email addresses (could be personal or corporate)
- Usernames and profile fields
- Personal tool configurations
The problem:
- Credentials may not belong to the victim
- Profile fields are self-reported and may be stale
- Individual context may not reflect organizational affiliation
Validation approaches:
- Does individual identity match other context in the data?
- Do multiple individual signals point to the same organization?
- Is there corroboration from organizational signals?
Confidence: LOW unless corroborated by organizational signals
Step 7: Cross-Reference and Corroborate
Combine signals to build confidence.
Cross-referencing:
- Does the email domain match the platform org?
- Does the private registry domain match other infrastructure?
- Do multiple independent signals point to the same company?
Alias resolution:
- Map cryptic org names to company names (e.g., `acme-dev` → Acme Corp)
- Look for company prefixes/suffixes in org names
- Cross-reference org names with email domains
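A first pass at alias resolution can be done mechanically: strip generic prefixes and suffixes from the org name, then check whether what remains matches the main label of any domain already found. The word list `GENERIC_PARTS` and the function names are hypothetical, and taking the second-to-last label as the company label is a simplification that breaks on multi-part suffixes such as `co.uk`.

```python
import re

# Hypothetical list of tokens that carry no company identity
GENERIC_PARTS = {"dev", "eng", "oss", "labs", "inc", "corp", "team", "org", "hq", "io"}


def name_stems(org_name: str) -> set:
    """Split an org name into tokens and drop generic or very short ones."""
    parts = re.split(r"[^a-z0-9]+", org_name.lower())
    return {p for p in parts if len(p) > 2 and p not in GENERIC_PARTS}


def company_label(domain: str) -> str:
    """Naive company label: the label just before the final suffix."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def corroborating_domains(org_name: str, domains) -> list:
    """Domains whose company label matches a stem of the org name."""
    stems = name_stems(org_name)
    return sorted(d for d in set(domains) if company_label(d) in stems)
```

A match is corroboration, not proof; confirmed pairs are what belong in an alias dictionary.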
Confidence boosting:
| Condition | Confidence Impact |
|---|---|
| Multiple corroborating signals | → HIGH |
| Enterprise/Fortune 500 match | → HIGH |
| API-verified organization | → HIGH |
| Single organizational signal | → MEDIUM |
| Single weak/individual signal | → LOW |
| Contradictory signals | → Manual review |
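The table can be read as a small decision procedure. The sketch below is one possible encoding, not a definitive scoring model: the `Signal` type, its fields, and the rule order are assumptions, the Enterprise/Fortune 500 row is not modeled, and any disagreement between candidates is sent to manual review without applying tie-breaking rules.

```python
from typing import NamedTuple, Sequence


class Signal(NamedTuple):
    candidate: str            # organization the signal points to
    organizational: bool      # False for individual signals
    api_verified: bool = False


def rate(signals: Sequence[Signal]) -> str:
    """Rate attribution confidence for a set of signals."""
    if not signals:
        return "NONE"
    if len({s.candidate for s in signals}) > 1:
        return "MANUAL_REVIEW"  # contradictory signals
    org = [s for s in signals if s.organizational]
    if any(s.api_verified for s in org):
        return "HIGH"  # API-verified organization
    if org and len(signals) >= 2:
        return "HIGH"  # organizational signal plus corroboration
    if org:
        return "MEDIUM"  # single organizational signal
    return "LOW"  # only individual signals, however many
```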
Step 8: Resolve Ambiguity
Contradictory signals:
- Prioritize organizational signals over individual signals
- Prioritize infrastructure domains over email domains
- Consider client/vendor relationships (one org using another's tools)
- Flag for manual review if unresolved
Personal accounts mistaken for organizations:
- If API lookup reveals personal account → Do not attribute as company
- Check user's company profile field (LOW confidence)
- Look for other organizational signals
No clear signals:
- Document what signals exist
- Note confidence as LOW or NONE
- Identify enrichment opportunities
Signal Reliability Reference
HIGH Confidence
- Verified platform organization (API confirmed)
- Self-hosted infrastructure on corporate domain
- Enterprise SSO/SAML configurations
- Multiple corroborating organizational signals
- Validated credentials returning org metadata
MEDIUM Confidence
- Corporate email domains
- Private registry domains (without full verification)
- Unverified organization names
- Workspace/team names from collaboration tools
LOW Confidence
- Hostname patterns without domain context
- Cloud resource naming conventions
- JWT token issuer domains (unvalidated)
- Individual profile fields without corroboration
- Single uncorroborated signal
Signals to AVOID
- Collection/exfiltration paths - Attacker's infrastructure, not victim
- Package author metadata - Package creator, not consumer
- Uncorroborated individual signals - May not represent the victim
Enrichment Tactics
Credential Validation
Validate tokens to extract organizational metadata:
- GitHub → `/user` endpoint for company field
- Slack → `auth.test` for workspace info
- npm → Profile lookup for org memberships
- Cloud platforms → Metadata APIs for account info
ID Resolution
Resolve opaque identifiers to names:
- Slack team IDs → API or browser lookup
- Azure tenant IDs → Organization name resolution
- AWS account IDs → Limited; hard to resolve to an owner without access to the account
- Platform org IDs → API lookups
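For Azure tenant IDs, one approach is to work backwards: take a candidate domain from the other signals, fetch its public OpenID Connect metadata from `login.microsoftonline.com`, and compare the tenant GUID in the `issuer` field with the leaked tenant ID. This is a sketch under that assumption; the function names are hypothetical, and it confirms a guess and does not discover the name from the ID alone.

```python
import json
import re
import urllib.request
from typing import Optional

# The issuer looks like https://login.microsoftonline.com/{tenant-guid}/v2.0
TENANT_RE = re.compile(
    r"https://login\.microsoftonline\.com/([0-9a-f-]{36})/", re.IGNORECASE
)


def tenant_id_from_metadata(metadata: dict) -> Optional[str]:
    """Extract the tenant GUID from an OpenID configuration document."""
    match = TENANT_RE.match(metadata.get("issuer", ""))
    return match.group(1).lower() if match else None


def fetch_openid_metadata(domain: str, timeout: float = 10.0) -> dict:
    """Fetch the public OpenID configuration for a candidate domain."""
    url = (
        f"https://login.microsoftonline.com/{domain}"
        "/v2.0/.well-known/openid-configuration"
    )
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def domain_matches_tenant(domain: str, leaked_tenant_id: str) -> bool:
    """True if the candidate domain resolves to the leaked tenant ID."""
    found = tenant_id_from_metadata(fetch_openid_metadata(domain))
    return found == leaked_tenant_id.lower()
```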
Org Name Mapping
Map cryptic names to companies:
- Look for company prefixes/suffixes
- Cross-reference with domain signals
- Build alias dictionary from confirmed mappings
Domain Intelligence
Enrich domains with context:
- WHOIS/DNS lookups for domain ownership
- Certificate transparency for related domains
- Known corporate domain databases
Quick Checklist
When analyzing leaked environment data: