Deobfuscation of JavaScript code and identification of security weaknesses through large language models

Publication type
Paper
Date
03.03.2026
Description

Advancements in Large Language Models (LLMs) allow many challenging software-security tasks to be solved automatically, e.g., the generation of test cases. An important aspect concerns the deobfuscation of source code, especially to improve its readability or to prevent the evasion of signature-based countermeasures. Although LLMs are increasingly deployed to reveal malicious payloads hidden within obfuscated software components, a comprehensive understanding of their potential and limitations is still missing. In this work, we evaluate the effectiveness of deobfuscating JavaScript code through an LLM-based pipeline. Specifically, we investigate whether LLMs can preserve structural properties of the software, especially to enhance the identification of weaknesses. Compared to two standard tools (i.e., JSNice and js-deobfuscator), our approach produces more readable JavaScript code according to several metrics, while retaining information on the Common Weakness Enumeration (CWE) entries affecting the software. To support the process of explaining issues within code, we performed tests on two general-purpose LLMs, i.e., ChatGPT and Google Gemini. Results indicate that advancing the security of JavaScript through LLMs requires addressing several challenges, which can largely be overcome via ad-hoc models.

Authors
Giacomo Benedetti, Luca Caviglione, Carmela Comito, Alberto Falcone, Massimo Guarascio