ai-code-detection

Detect whether a piece of code or an entire software project was written by a human, by AI, or by some hybrid of the two. Use this skill whenever the user wants to audit a file, snippet, repo, or commit history for AI authorship signals. Trigger on phrases like "is this AI-generated", "was this written by ChatGPT", "detect LLM code", "human or AI?", "check for AI authorship", "is this vibe-coded", or on any request to judge, score, or explain the provenance of code. Also trigger when the user pastes code and asks "did a human write this?" or "does this look AI-generated?", even when casually phrased.

Install command

npx skills add https://github.com/daedalus/skills --skill ai-code-detection

Copy and paste this command into Claude Code to install the skill.

Source: daedalus/skills

Stars: 1

Forks: 0

Updated: May 19, 2026, 12:44

SKILL.md

AI Code Detection

Weight-of-evidence methodology. No single signal is conclusive; build a case across multiple axes. Prioritize evolutionary/temporal signals over aesthetic ones: style is increasingly gameable, while history is much harder to fake.

Core insight: The strongest signal of human authorship is historical inconsistency. Humans forget, pivot, leave scars, partially refactor, and change opinions mid-project. LLMs optimize toward coherence. A sufficiently coherent large codebase is itself suspicious.

Step 0 — Classify the target

Before collecting evidence, know the possible outcomes: the verdict is not binary. The rubric produces one of the following categories:

Category	Description
Human	No meaningful AI assistance
Human + AI-assisted	Human architecture/thought, AI drafting/refactoring
AI-generated + human-edited	AI produced bulk, human polished
AI-generated	Mostly machine-originated
Indeterminate	Insufficient evidence

A senior engineer using Claude for boilerplate can look "AI-ish" while remaining fundamentally human-driven. Collapsing these into a binary verdict loses information.

Axis 1 — Evolutionary signals (highest confidence)

Only applicable to repos with history. Weight this axis above all others.

Human indicators (history encodes process):

Bug-introducing commits followed by fixes
Refactors that partially migrate patterns — old and new coexist
Inconsistent architecture that reflects changing opinions over time
Deprecated code surviving longer than expected
Temporary instrumentation later removed (print, metrics, flags)
Performance hacks added after profiling (not speculatively)
Emotional or frustrated commit messages
Localized competence: one subsystem brilliant, another messy
Migration scars: evidence of abandoned approaches

AI indicators (history is synthetic):

Entire architecture appears fully formed in early commits
Uniform code quality across all modules from day one
Large feature landings with minimal iteration
No migration scars, no dead abstractions
No abandoned experiments
Perfectly synchronized style across contributors and files
Commit cadence aligned with prompting cycles (bursts, then silence)
Sudden competence jumps inconsistent with prior work in the repo
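A few of the evolutionary signals above can be roughly quantified from history. The sketch below is one way to do it, not part of the skill itself. It assumes the log was exported with `git log --numstat --format='@%H|%at|%s'` (the `@` prefix and `|` separator are choices made here so commit headers are easy to tell apart from numstat lines), and it reports the share of commits whose subject looks like a fix, the share of all added lines that landed in the single largest commit, and the share of inter-commit gaps shorter than a burst window. The fix keywords and the 10-minute window are illustrative assumptions, not calibrated thresholds.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword heuristic for "bug-introducing commits followed by fixes".
FIX_RE = re.compile(r"\b(fix|fixes|fixed|bug|hotfix|revert|oops|typo)\b", re.IGNORECASE)


@dataclass
class Commit:
    sha: str
    timestamp: int  # author date, UNIX seconds (%at)
    subject: str
    added: int = 0  # lines added, summed over all files in the commit


def parse_log(text: str) -> list[Commit]:
    """Parse `git log --numstat --format='@%H|%at|%s'` output into Commit records."""
    commits: list[Commit] = []
    for line in text.splitlines():
        if line.startswith("@"):
            sha, ts, subject = line[1:].split("|", 2)
            commits.append(Commit(sha, int(ts), subject))
        elif line.strip() and commits:
            added, _deleted, _path = line.split("\t", 2)
            if added != "-":  # binary files report "-" instead of a count
                commits[-1].added += int(added)
    return commits


def evolution_signals(commits: list[Commit], burst_window: int = 600) -> dict[str, float]:
    """Coarse history metrics; burst_window is in seconds (600 = 10 minutes, an arbitrary choice)."""
    if not commits:
        return {}
    ordered = sorted(commits, key=lambda c: c.timestamp)
    total_added = sum(c.added for c in ordered) or 1  # avoid dividing by zero
    fix_ratio = sum(bool(FIX_RE.search(c.subject)) for c in ordered) / len(ordered)
    largest_share = max(c.added for c in ordered) / total_added
    # Fraction of consecutive commit pairs landing within burst_window seconds of each other.
    gaps = [later.timestamp - earlier.timestamp for earlier, later in zip(ordered, ordered[1:])]
    burst_ratio = sum(gap <= burst_window for gap in gaps) / len(gaps) if gaps else 0.0
    return {"fix_ratio": fix_ratio, "largest_commit_share": largest_share, "burst_ratio": burst_ratio}
```

Read the numbers comparatively. A near-zero fix ratio combined with one commit carrying most of the code is consistent with "entire architecture appears fully formed in early commits", but a squashed import from another VCS produces the same shape.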

Axis 2 — Variance / entropy signals (strong)

Humans are uneven. AI tends toward statistical smoothness.

Measure (qualitatively or with tooling):

Function length variance: human repos have fat tails; AI clusters tightly
Comment density variance: humans have files with zero comments and files with excessive ones; AI is uniformly moderate
Cyclomatic complexity distribution: human hot spots vs. AI uniform medium
Abstraction depth: humans overengineer one thing and underengineer another; AI maintains consistent depth throughout
Naming quality: uniformly medium-good naming across a large repo is suspicious; humans have naming that reflects when they wrote something and how tired they were

Low variance across a large repo is a red flag, not a verdict.
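The variance measurements can be approximated for Python code with the standard library. This sketch assumes Python sources held in memory as a `{path: source}` dict, uses a function's line span as its length, counts only `#` comments (not docstrings) toward comment density, and summarizes spread with the coefficient of variation (standard deviation divided by mean). None of these choices comes from the skill, and there is no universal cutoff: a low value only means something when compared against human-written repos of similar size and domain.

```python
import ast
import io
import statistics
import tokenize


def function_lengths(source: str) -> list[int]:
    """Line span of every function or method definition in a Python source string."""
    tree = ast.parse(source)
    return [
        node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def comment_density(source: str) -> float:
    """Fraction of non-blank lines carrying a `#` comment (docstrings are not counted)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    comment_lines = {tok.start[0] for tok in tokens if tok.type == tokenize.COMMENT}
    non_blank = sum(1 for line in source.splitlines() if line.strip())
    return len(comment_lines) / non_blank if non_blank else 0.0


def coefficient_of_variation(values: list[float]) -> float:
    """Population standard deviation divided by the mean; 0.0 when undefined."""
    if len(values) < 2:
        return 0.0
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean if mean else 0.0


def variance_report(files: dict[str, str]) -> dict[str, float]:
    """Spread of function lengths (pooled across files) and of per-file comment density."""
    lengths = [n for src in files.values() for n in function_lengths(src)]
    densities = [comment_density(src) for src in files.values()]
    return {
        "function_length_cv": coefficient_of_variation(lengths),
        "comment_density_cv": coefficient_of_variation(densities),
    }
```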

Axis 3 — Operational scar signals (strong for production code)

LLMs are strongest at greenfield code and clean abstractions. They are weaker at encoding the trauma of production systems.

Human indicators (systems encode history):

Comments referencing incidents: # race condition seen in prod 2023-08
Compatibility hacks: if sys.version_info < (3, 8):
Weird retry logic that doesn't match textbook backoff
# don't touch this (with no further explanation)
Vendor-specific workarounds
Magic constants from production failures with no comment
Defensive paranoia around specific race conditions
# TODO: remove after $vendor fixes their shit

AI indicators (systems are idealized):

Textbook retry/timeout patterns
Clean abstractions where reality is usually ugly
No vendor-specific ugliness
Error handling that covers the predictable failure modes but not the weird ones
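Operational scars are often visible as plain text, so a line-oriented scan is enough to surface candidates for manual review. The patterns below are hypothetical, Python-flavored examples modeled on the indicators listed above; they will miss most real scars (vendor workarounds and odd retry logic rarely share a keyword) and should be tuned per codebase. A lack of hits does not by itself indicate AI authorship, since greenfield human code has no scars either.

```python
import re

# Illustrative patterns only, modeled on the human indicators above.
SCAR_PATTERNS = {
    "incident_reference": re.compile(r"#.*\b(prod|production|incident|outage)\b", re.IGNORECASE),
    "version_check": re.compile(r"sys\.version_info\s*[<>]=?"),
    "do_not_touch": re.compile(r"#\s*(don'?t|do not)\s+touch", re.IGNORECASE),
    "temporary_todo": re.compile(r"#\s*TODO\b.*\bremove\s+(after|once|when)\b", re.IGNORECASE),
}


def scan_scars(source: str) -> dict[str, list[int]]:
    """Map each scar pattern name to the 1-based line numbers where it matches."""
    hits: dict[str, list[int]] = {name: [] for name in SCAR_PATTERNS}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SCAR_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits
```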

Axis 4 — Style signals (weakest; treat as corroborating only)

Deprecated / decaying signals (these increasingly describe good engineers using Copilot, not AI authorship per se):

Type hints everywhere
Docstrings everywhere
set -euo pipefail
Proper error handling
Exhaustive happy-path tests

Still useful style signals (weight lightly):

Comments explain what, not why
Uniform naming: process_data, handle_request, validate_input everywhere
Symmetric structure: every if has else, every try has finally
Error messages that read like documentation
Over-parameterized abstractions: config=None, verbose=False, timeout=30 on everything, never varied by callers

Axis 5 — Adversarial awareness

If evasion is suspected, look for:

Authors disguising AI output as human work (synthetic mess):

Intentionally added TODOs that don't correspond to real gaps
Fake debug statements
Synthetic "wip" commits with no actual incremental progress
Artificially varied comment density

Humans accidentally exposing AI use:

Stylistic phase shifts mid-repo (before/after Copilot adoption)
Commit bursts aligned with prompting cycles
Architecture sophistication inconsistent with prior work
Two subsystems that look like they were written by different people — because one was prompted differently
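A stylistic phase shift can be looked for by tracking one style metric per commit in chronological order (for example, docstring or type-hint coverage of the files each commit touches; the choice of metric is an assumption) and finding the single split point where the mean changes most. This is a deliberately crude stand-in for proper change-point detection, and a shift can equally reflect a new contributor, a new linter, or a formatter being adopted.

```python
import statistics
from typing import List, Optional, Tuple


def best_phase_shift(values: List[float], min_segment: int = 3) -> Optional[Tuple[int, float]]:
    """Find the split with the largest jump in mean between 'before' and 'after'.

    `values` is one style metric per commit, in chronological order. Both segments must
    hold at least `min_segment` points. Returns (split_index, mean_after - mean_before),
    or None when the series is too short to split.
    """
    best: Optional[Tuple[int, float]] = None
    for i in range(min_segment, len(values) - min_segment + 1):
        delta = statistics.fmean(values[i:]) - statistics.fmean(values[:i])
        if best is None or abs(delta) > abs(best[1]):
            best = (i, delta)
    return best
```

For the series `[0.1, 0.1, 0.2, 0.1, 0.9, 0.8, 0.9, 0.9]` the best split is index 4 with a jump of about 0.75, i.e. the metric changes sharply between the fourth and fifth commits.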

Future trajectory: AI systems will increasingly reduce detectability via repo-conditioned generation, personalized fine-tuning, and simulated iterative development. Temporal/variance signals will outlast style signals.

Workflow

Step 1 — Determine scope

Snippet / single file: axis 4 only; flag low confidence explicitly.
File + neighbors: axes 2 and 4.
Full repo with history: all axes; weight 1 and 3 heavily.
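The scope rules above amount to a small lookup table. The sketch below encodes them in Python; the `Scope` names and the "heavy" / "normal" / "n/a" labels are hypothetical, and axis 4 keeps its corroborating-only status even where it is labeled normal.

```python
from enum import Enum


class Scope(Enum):
    SNIPPET = "snippet or single file"
    FILE_WITH_NEIGHBORS = "file + neighbors"
    REPO_WITH_HISTORY = "full repo with history"


AXES = {1: "Evolutionary", 2: "Variance", 3: "Operational", 4: "Style", 5: "Adversarial"}


def applicable_axes(scope: Scope) -> dict[str, str]:
    """Map each axis name to its role for the given scope, following Step 1."""
    if scope is Scope.SNIPPET:
        active, heavy = {4}, set()
    elif scope is Scope.FILE_WITH_NEIGHBORS:
        active, heavy = {2, 4}, set()
    else:
        active, heavy = set(AXES), {1, 3}
    return {
        name: "heavy" if n in heavy else "normal" if n in active else "n/a"
        for n, name in AXES.items()
    }
```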

Step 2 — Collect evidence

For each applicable axis, list concrete observations tagged [H] (human signal) or [AI] (AI signal). Note "not applicable" for axes requiring history when only a snippet is provided.

Step 3 — Verdict

Scope: <snippet | file | repo>

Evidence:
  Evolutionary:   [H: ...] / [AI: ...] / [N/A]
  Variance:       [H: ...] / [AI: ...]
  Operational:    [H: ...] / [AI: ...] / [N/A]
  Style:          [H: ...] / [AI: ...]
  Adversarial:    [signals if present]

Category: <Human | Human+AI-assisted | AI-generated+human-edited | AI-generated | Indeterminate>
Confidence: <Low | Medium | High>

Key tells: <top 2–3 observations that drove the verdict>
What would change it: <additional evidence that would shift the assessment>
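Steps 2 and 3 can be kept honest by recording each observation with its axis and tag and rendering the template from those records, so every line of the verdict traces back to evidence. The following is a minimal sketch; `Observation` and `render_verdict` are hypothetical names, and showing an axis with no observations as [N/A] is a simplification of the template.

```python
from dataclasses import dataclass

CATEGORIES = ("Human", "Human+AI-assisted", "AI-generated+human-edited", "AI-generated", "Indeterminate")
CONFIDENCE_LEVELS = ("Low", "Medium", "High")
AXIS_ORDER = ("Evolutionary", "Variance", "Operational", "Style", "Adversarial")


@dataclass(frozen=True)
class Observation:
    axis: str  # one of AXIS_ORDER
    tag: str   # "H" for a human signal, "AI" for an AI signal
    note: str  # the concrete observation itself

    def __post_init__(self) -> None:
        if self.axis not in AXIS_ORDER or self.tag not in ("H", "AI"):
            raise ValueError(f"bad observation: {self}")


def render_verdict(scope: str, observations: list[Observation], category: str,
                   confidence: str, key_tells: list[str], would_change: str) -> str:
    """Fill in the Step 3 template; axes without observations are shown as [N/A]."""
    if category not in CATEGORIES or confidence not in CONFIDENCE_LEVELS:
        raise ValueError("unknown category or confidence level")
    lines = [f"Scope: {scope}", "", "Evidence:"]
    for axis in AXIS_ORDER:
        parts = [f"[{o.tag}: {o.note}]" for o in observations if o.axis == axis]
        label = axis + ":"
        lines.append(f"  {label:<16}{' / '.join(parts) or '[N/A]'}")
    lines += [
        "",
        f"Category: {category}",
        f"Confidence: {confidence}",
        "",
        f"Key tells: {'; '.join(key_tells)}",
        f"What would change it: {would_change}",
    ]
    return "\n".join(lines)
```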

Uncertainty Handling

"Indeterminate" is a valid and often correct verdict. State what additional evidence would resolve it:

Git blame / timestamps / commit graph
Author's other known work for comparison
Gap between the author's stated and demonstrated competence level
Specialized tooling (GPTZero Code, DetectGPT variants) — treat as one signal, not ground truth
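Git blame is one of the cheapest of these to gather. With `git blame --line-porcelain <file>`, git repeats the commit header (including `author` and `author-time`) for every line, so surviving lines can be counted per author and dated. The parser below is a sketch under that assumption. How to read its output is a judgment call: nearly all surviving lines dating from within a few days fits "entire architecture appears fully formed in early commits", but it also fits a large rewrite or a squashed import.

```python
from collections import Counter


def blame_summary(porcelain: str) -> dict:
    """Summarize `git blame --line-porcelain` output: surviving lines per author and
    how many days separate the oldest and newest surviving line."""
    authors: Counter = Counter()
    times: list[int] = []
    for line in porcelain.splitlines():
        if line.startswith("\t"):
            continue  # tab-prefixed lines carry the file content itself, never metadata
        if line.startswith("author "):
            authors[line[len("author "):]] += 1
        elif line.startswith("author-time "):
            times.append(int(line.split()[1]))
    span_days = (max(times) - min(times)) / 86400 if times else 0.0
    return {"lines": sum(authors.values()), "authors": dict(authors), "span_days": span_days}
```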
