name	llm-binary-deobfuscation
description	Deobfuscating binary code remains a fundamental challenge in reverse engineering, as obfuscation is widely used to hinder analysis and conceal program logic. Although large language models (LLMs) have... Activation: LLM, reverse engineering

Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation

Overview

Deobfuscating binary code remains a fundamental challenge in reverse engineering, as obfuscation is widely used to hinder analysis and conceal program logic. Although large language models (LLMs) have shown promise in recovering semantics from obfuscated binaries, a systematic evaluation of their effectiveness is still lacking. In this work, we present BinDeObfBench, the first comprehensive benchmark for assessing LLM-based binary deobfuscation across diverse transformations spanning pre-compilation, compile-time, and post-compilation stages. Our evaluation shows that deobfuscation performance depends more on reasoning capability and domain expertise than on model scale, and that task-specific supervised fine-tuning consistently outperforms broad domain pre-training. Reasoning models remain robust under severe obfuscation and generalize across different instruction set architectures (ISAs) and optimization levels. In-context learning benefits standard models but yields limited gains for reasoning models. Overall, our study highlights the importance of task-specific fine-tuning and reasoning-driven strategies, and positions BinDeObfBench as a basis for future work in binary deobfuscation.

Source Paper

Title: Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation
Authors: Li Hu, Xiuwei Shang, Jieke Shi, Shaoyin Cheng, Junqi Zhang, Gangyang Li, Zhou Yang, Weiming Zhang, David Lo
arXiv: 2604.08083v1
Categories: cs.SE
Published: 2026-04-09
PDF: https://arxiv.org/pdf/2604.08083v1

Core Concepts

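Binary deobfuscation: recovering program semantics from obfuscated binaries
BinDeObfBench: a benchmark for assessing LLM-based binary deobfuscation
Obfuscation stages: pre-compilation, compile-time, and post-compilation transformations
Reasoning models compared with standard models
Task-specific supervised fine-tuning compared with broad domain pre-training
In-context learning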

Key Contributions

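BinDeObfBench, a benchmark for LLM-based binary deobfuscation covering pre-compilation, compile-time, and post-compilation transformations
Evidence that deobfuscation performance depends more on reasoning capability and domain expertise than on model scale
Evidence that task-specific supervised fine-tuning consistently outperforms broad domain pre-training
Analysis of robustness under severe obfuscation, generalization across ISAs and optimization levels, and the effect of in-context learning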

Practical Applications

Application 1: Research Implementation

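# Illustrative sketch only; the prompt wording is an assumption, not taken from the paper.
# See original paper for complete details
def apply_methodology(obfuscated_pseudocode, examples=()):
    """
    Build a prompt asking an LLM to deobfuscate decompiler pseudocode.

    `examples` is an optional sequence of (obfuscated, clean) pairs used
    as in-context demonstrations.
    """
    parts = ["Rewrite the obfuscated pseudocode as clean, equivalent code."]
    for obfuscated, clean in examples:
        parts.append(f"Obfuscated:\n{obfuscated}\nDeobfuscated:\n{clean}")
    parts.append(f"Obfuscated:\n{obfuscated_pseudocode}\nDeobfuscated:")
    return "\n\n".join(parts)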
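
The overview describes obfuscations grouped into pre-compilation, compile-time, and post-compilation stages. Below is a minimal sketch of how model outputs might be scored per stage. The record layout (`stage`, `output`, `reference`) and the function names are hypothetical. The similarity measure is `difflib.SequenceMatcher` from the Python standard library, used here only as a stand-in; it is not the metric used in the paper.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from statistics import mean

# Stage names follow the overview; the record layout is hypothetical.
STAGES = ("pre-compilation", "compile-time", "post-compilation")


def similarity(candidate: str, reference: str) -> float:
    """Stand-in metric: character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, candidate, reference).ratio()


def score_by_stage(records):
    """Average similarity per obfuscation stage.

    Each record is a dict with keys 'stage', 'output', and 'reference'.
    """
    scores = defaultdict(list)
    for record in records:
        if record["stage"] not in STAGES:
            raise ValueError(f"unknown stage: {record['stage']!r}")
        scores[record["stage"]].append(
            similarity(record["output"], record["reference"])
        )
    return {stage: mean(values) for stage, values in scores.items()}


records = [
    {"stage": "compile-time", "output": "return a + b;", "reference": "return a + b;"},
    {"stage": "post-compilation", "output": "return a;", "reference": "return a + b;"},
]
print(score_by_stage(records))
```

Reporting scores per stage rather than as one aggregate makes it easier to see which class of transformation a model struggles with. A textual similarity ratio does not establish semantic equivalence, so a real evaluation would need a stronger check.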

References

Li Hu et al. (2026). "Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation." arXiv:2604.08083v1.

Activation Keywords

LLM, reverse engineering
systems engineering
research paper

Instructions for Agents

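Follow this process when using this skill:

Understand the problem: analyze the input requirements and constraints
Choose a method: select a technical approach suited to the scenario
Execute: carry out the concrete steps according to the methodology
Verify the results: check that the results match expectations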

Examples

Example 1: Basic Usage

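User: Please help me apply this skill

Agent: I will follow the standard process...

Example 2: Advanced Usage

User: I have a more complex scenario to handle

Agent: For complex scenarios, I will adopt the following strategy...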

Tools Used

exec
read
write

