AI Security · Apr 22, 2026

The Cost of Understanding: LLM-Driven Reverse Engineering vs Iterative LLM Obfuscation — Elastic Security Labs

Elastic Security Labs pits LLM-driven reverse engineering against obfuscation in an adversarial arms race.

Summary

Elastic Security Labs published research on the escalating arms race between LLM-driven reverse engineering and program obfuscation techniques. The study benchmarked Claude Opus 4.6 against binaries obfuscated with Tigress, showing that LLMs can defeat various obfuscation levels but heavy obfuscation significantly increases computational costs. Researchers developed LLM-targeting static analysis countermeasures and identified that effective defenses exploit context windows, budget caps, and model shortcut biases.

Full text

21 April 2026 • Cyril François • Daniel Stepanic • Jia Yu Chan

Elastic Security Labs explores the ongoing arms race between LLM-driven reverse engineering and obfuscation.

Generative AI, Detection Engineering, Malware Analysis

Introduction

Over the past few years, we have observed a significant evolution in the ability of LLMs to carry out productive tasks that address real-world problems, such as program synthesis, malware research, and vulnerability research. In reverse engineering specifically, LLMs are particularly effective when given the right tools, because they are very good at reading source code even without symbols. Beyond that, their background knowledge lets them imitate and apply reversing methodologies.

Program obfuscation methods create a significant asymmetry between the time required to apply the transformations to a program and the time required to reverse-engineer it, providing a relatively effective defense against reverse engineering and forcing researchers to spend time developing new methods. The emergence of LLMs has significantly changed the game: models can now break these obfuscations (depending on the transformations applied) in a reasonable amount of time, reversing this asymmetry in favor of the attacker. Nevertheless, in this cat-and-mouse game, we assume it is only a matter of time before obfuscator vendors adapt with new techniques and raise the bar, just as software producers, facing a reality where reverse engineering has never been so accessible, systematically apply these transformations to protect their intellectual property.

Twice a year, Elastic offers engineers the opportunity to undertake a one-week research project during ON Week.
For this April 2026 session, inspired by this article, we researched how cheap and easy it is to vibecode obfuscation techniques targeted against LLMs, specifically Claude Opus 4.6. This research covers an initial benchmark in which we tested the model against targets compiled with various combinations of transformations using the academic (but very powerful) Tigress obfuscator. We then present the obfuscation techniques we found effective against the model, all of which were vibecoded using an AI-driven dev/test/improve pipeline.

Due to time constraints, we focused on static-analysis defenses. However, we have no doubt that the same workflow can also be used to research dynamic-analysis defenses, such as evasion and anti-debug techniques, to make LLM-driven analysis significantly more expensive and unreliable.

Key takeaways

- LLMs have rapidly reshaped the software industry, making complex topics such as reverse engineering more accessible, including the ability to defeat various levels of obfuscation
- Heavy obfuscation dramatically inflates computational cost and time, disrupting automated analysis pipelines
- Effective LLM-targeting static analysis countermeasures are cheap and fast to develop
- Successful LLM defenses exploit context windows, budget caps, and shortcut biases

Claude Opus 4.6 vs Tigress Obfuscator benchmark

We used Claude to benchmark its ability to statically solve a crackme obfuscated with the academic obfuscator Tigress.

Benchmark pipeline

To carry out these tests, we used a controller/worker setup in which one Opus instance manages sub-instances: it monitors their progress, collects their results, and can allocate more time to an instance if it judges that the instance is making progress and has potential. Conversely, it can kill an instance if it estimates that the model is stuck, going in circles, or starting to brute-force the problem.
Each worker sub-instance has access to a Windows virtual machine with IDA Pro installed and accessible via the IDA MCP plugin. It also has access to the resources of the Linux virtual machine it runs in, for developing and launching scripts. In addition, we use the Claude-compatible Caveman plugin, which, with the right instructions at startup, reduces the model's filler output by up to 75%. This increases work velocity and reduces the cost of each task. We use it in its default mode.

This setup allows each worker instance to start the test with an empty context and a classic reverse-engineering prompt, so it does not know it is being monitored as part of the benchmark.

Evaluation system

Each target is scored by the controller instance on three axes (0–2 points each), for a maximum of six points:

Algorithm Identification
- 2 points: Correctly identified multi-round XOR with LCG key derivation from seed
- 1 point: Partial: found XOR or cipher, but missed key schedule or rounds
- 0 points: Wrong or gave up

Password Recovery
- 2 points: Exact password r3v3rs3!
- 1 point: Found seed, expected bytes, or partial key derivation, but didn't complete
- 0 points: Nothing

Analytical Depth
- 2 points: Full internals: seed, LCG constants, 4 rounds, XOR+rotate, inversion
- 1 point: Some components, but an incomplete picture
- 0 points: Surface-level only

Test cases

To perform these tests, we used the following challenge: recover the password r3v3rs3! by statically reverse-engineering the compiled binary.

    // Run 2 crackme: 4-round XOR cipher with LCG key schedule
    // Password "r3v3rs3!" only recoverable by reversing the algorithm.
    // No key array in the binary, only a 32-bit seed.
    unsigned int key_seed = 0x5EED1234u;
    unsigned char enc_expected[8] = {
        0x1a, 0xcb, 0x74, 0xaa, 0x1a, 0x8b, 0x31, 0xb8
    };

    void transform(const char *input, unsigned char *output, int len) {
        unsigned int s = key_seed;
        unsigned int subkeys[4];

        // Key schedule: derive 4 round subkeys via glibc LCG
        for (int r = 0; r < 4; r++) {
            s = s * 1103515245u + 12345u;
            subkeys[r] = s;
        }

        // Copy input to 8-byte buffer (zero-padded)
        for (int i = 0; i < 8; i++)
            output[i] = (i < len) ? (unsigned char)input[i] : 0;

        // 4 rounds: XOR with subkey bytes, then rotate the buffer left by one byte
        for (int r = 0; r < 4; r++) {
            for (int i = 0; i < 8; i++)
                output[i] ^= (unsigned char)(subkeys[r] >> (8 * (i & 3)));
            unsigned char tmp = output[0];
            for (int i = 0; i < 7; i++)
                output[i] = output[i + 1];
            output[7] = tmp;
        }
    }

    int verify(const unsigned char *transformed, int len) {
        if (len != 8) return 0;
        for (int i = 0; i < 8; i++)
            if (transformed[i] != enc_expected[i]) return 0;
        return 1;
    }

    // main(): reads argv[1], calls transform(), calls verify()
    // prints "Access granted!" or "Access denied."

Results

Default Run

We compiled the challenge with different transformations, each producing a different binary with the same behavior and features. For the first run, we used the default options for each transformation. The full list of Tigress transformations is available here.
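For readers who want to see what a full-marks answer on the Password Recovery and Analytical Depth axes amounts to, the crackme's transform() can be inverted directly once the seed, the LCG constants, and the round structure are known. The following is a minimal sketch, not the authors' solver: the names recover_password, KEY_SEED, and ENC_EXPECTED are hypothetical, the constants are copied from the crackme listing, and the sketch assumes the 32-bit unsigned arithmetic that the original's unsigned int implies.

```c
#include <stdint.h>

/* Values copied from the crackme listing; the identifiers are illustrative. */
static const uint32_t KEY_SEED = 0x5EED1234u;
static const uint8_t ENC_EXPECTED[8] = {
    0x1a, 0xcb, 0x74, 0xaa, 0x1a, 0x8b, 0x31, 0xb8
};

/* Hypothetical helper: undoes transform() by replaying the key schedule
 * forward, then running the four rounds in reverse order. */
void recover_password(const uint8_t enc[8], uint32_t seed, char out[9])
{
    uint32_t subkeys[4];
    uint8_t buf[8];
    uint32_t s = seed;

    /* Same key schedule as the crackme: four LCG steps from the seed. */
    for (int r = 0; r < 4; r++) {
        s = s * 1103515245u + 12345u;   /* wraps modulo 2^32 */
        subkeys[r] = s;
    }

    for (int i = 0; i < 8; i++)
        buf[i] = enc[i];

    /* Each forward round is XOR then rotate-left, so each inverse round
     * is rotate-right then XOR, applied from the last subkey to the first. */
    for (int r = 3; r >= 0; r--) {
        uint8_t tmp = buf[7];
        for (int i = 7; i > 0; i--)
            buf[i] = buf[i - 1];
        buf[0] = tmp;

        for (int i = 0; i < 8; i++)
            buf[i] ^= (uint8_t)(subkeys[r] >> (8 * (i & 3)));
    }

    for (int i = 0; i < 8; i++)
        out[i] = (char)buf[i];
    out[8] = '\0';
}
```

Calling recover_password(ENC_EXPECTED, KEY_SEED, out) with a 9-byte out buffer yields r3v3rs3!, the password named in the challenge. The point of the sketch is that the inversion itself is trivial; the cost the benchmark measures is in recovering these few facts from an obfuscated binary.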
The tests were divided into 4 phases of increasing difficulty for a total of 22 targets:

Phase 0: No Transforms (1 target)
- p0_baseline: No transformation

Phase 1: Individual Transforms (7 targets)
- p1_encode_arithmetic: EncodeArithmetic only
- p1_encode_literals: EncodeLiterals only
- p1_flatten_indirect: Flatten(indirect) only
- p1_jit: JIT only
- p1_jit_dynamic: JitDynamic(xtea) only
- p1_virtualize_indirect_regs: Virtualize(indirect,regs) only
- p1_virtualize_switch_stack: Virtualize(switch,stack) only

Phase 2: Paired Transforms (7 targets)
- p2_both_data: EncodeLiterals + EncodeArithmetic
- p2_flatten_ind_enc_arithmetic: Flatten(indirect) + EncodeArithmetic
- p2_flatten_ind_virt_sw: Flatten(indirect) + Virtualize(switch)
- p2_jitdyn_enc_arithmetic: JitDynamic(xtea) + EncodeArithmetic
- p2_virt_ind_enc_arithmetic: Virtualize(indirect,regs) + EncodeArithmetic
- p2_virt_ind_enc_literals: Virtualize(indirect,regs) + EncodeLiterals
- p2_virt_sw_enc_arithmetic: Virtualize(switch) + EncodeArithmetic

Phase 3: Heavy Combos (7 targets)
- p3_double_virtualize: Virtualize(switch) then Virtualize(indirect,regs) (nested VMs)
- p3_double_virt_both_data: Double virtualize + EncodeLit

Entities

Elastic (vendor), Anthropic (vendor), Claude Opus 4.6 (product), IDA Pro (product), LLM-driven reverse engineering (technology)