How OpenAI’s o3 found CVE-2025-37899: A Linux kernel zeroday hidden in plain sight
An AI model has discovered a new Linux kernel zeroday. Learn how CVE-2025-37899 was found using OpenAI’s o3, and what it means for cybersecurity’s future.
In a landmark development for cybersecurity research, independent vulnerability researcher Sean Heelan has disclosed a remote use-after-free flaw in the Linux kernel’s ksmbd module, assigned CVE-2025-37899, discovered with the assistance of OpenAI’s cutting-edge language model, o3. The vulnerability lies in the SMB session logoff handler and could allow remote kernel memory corruption or code execution.
What makes this disclosure especially significant is not only the nature of the bug—present in a critical piece of infrastructure—but also the method of its discovery. Without any auxiliary fuzzing tools, symbolic analysis, or agentic frameworks, the researcher was able to locate the flaw through natural language prompts and code ingestion using o3’s API. This sets a precedent for large language models as viable tools in real-world vulnerability discovery pipelines.

Where Is the Vulnerability and Why Is It Dangerous?
The use-after-free condition stems from concurrent handling of SMB2 session logoff requests across multiple threads. In ksmbd, which implements the SMB3 protocol natively in the Linux kernel, the sess->user pointer represents user session state. If one worker thread frees it upon session logoff while another is mid-operation on the same session via a separate transport binding, the pointer can be dereferenced after being invalidated.
The vulnerable sequence occurs in the function smb2_session_logoff, where the session’s user pointer is freed without verifying that no other concurrent threads are still accessing it. Since SMB3 allows multiple connections to bind to a single session, this leads to a classic race condition.
This is a textbook concurrency flaw, exacerbated by modern protocol features like session multiplexing. As cybercrime groups increasingly target kernel-level vulnerabilities for privilege escalation, any path leading to a use-after-free in kernel memory represents a severe security risk. In enterprise or server deployments, such a flaw could be weaponized into a remote denial-of-service or, worse, full remote code execution (RCE) in the most privileged context.
How o3 Redefined Vulnerability Research
Rather than stumbling upon the bug manually, Sean Heelan took a deliberate approach to evaluate LLMs like o3 for code analysis tasks. The researcher selected ksmbd—a module already known to contain non-trivial but approachable bugs—and prepared a methodical context input for o3 that included the session setup and teardown logic, request dispatchers, and specific command handlers, spread across 3,300 lines of code (LoC).
The key innovation was feeding o3 layered, semantically coherent prompts, with explicit instructions to find use-after-free vulnerabilities while discouraging false positives. Across batches of 100 runs, o3 not only rediscovered a known bug (CVE-2025-37778) but also independently identified the previously undocumented CVE-2025-37899.
From a tooling standpoint, this marks a leap beyond deterministic static analyzers or fuzzers. Traditional methods would have required precise symbolic constraints to replicate the vulnerability, whereas o3 was able to simulate human-style reasoning about thread scheduling and memory safety without instrumentation. This raises strategic questions about how enterprise red teams or national CERTs could deploy LLMs in advanced threat hunting.
Vulnerability Details: Deep Dive into CVE-2025-37899
The core issue stems from missing synchronization between worker threads handling different connections bound to the same session. The offending code inside smb2_session_logoff() looks innocent at first glance:
if (sess->user) {
    ksmbd_free_user(sess->user);
    sess->user = NULL;
}
This unguarded free does not account for the possibility that other threads, such as those handling SMB2 WRITE or CREATE requests arriving on a different connection (transport) bound to the same session, may still be using sess->user.
This kind of vulnerability is particularly insidious in kernelspace code, where even a single stray dereference after the kfree() call can result in memory corruption. Depending on the memory reuse pattern in the slab allocator, attackers could perform heap spraying to influence the post-free memory layout, making the issue potentially exploitable for code injection.
AI vs. Human Reasoning: A Convergence of Discovery Paths
Interestingly, the o3 model not only discovered CVE-2025-37899 but also suggested a more robust fix than the human-proposed patch for CVE-2025-37778. Sean Heelan had previously issued a patch that simply freed sess->user and set it to NULL. However, o3’s report pointed out that nullification does not address concurrent access from other bindings, something the researcher admitted he had overlooked.
This is an excellent case study in human-in-the-loop AI, where the model does not outperform the expert in isolation but complements human judgment with broader recall and codepath coverage. In fact, o3’s analysis mirrors the kind of peer-review oversight that is essential in critical patch review processes but often skipped due to time constraints or reviewer fatigue.
Signal vs. Noise: The Practical Challenges of LLM Integration
While the discovery is remarkable, the signal-to-noise ratio remains a concern. In the CVE-2025-37778 detection run, o3 produced 1 correct report for every 4.5 false positives. For the broader file-wide audit that led to CVE-2025-37899, the rate dropped further. Out of 100 runs with 100k-token input, only one output matched the zeroday.
For vulnerability triage teams, the key to operationalizing LLMs like o3 will be front-end filtering layers, perhaps involving simpler classifiers or metadata-based heuristics, before escalating to full LLM scans. With tooling such as Simon Willison’s llm CLI or custom CI hooks, research teams can begin to evaluate LLMs on curated modules—starting with network-facing code paths, which typically have the highest threat surface.
Implications for Security Engineering Teams
The CVE-2025-37899 disclosure signals a clear paradigm shift in vulnerability research and secure code review. Companies maintaining kernel-adjacent components—especially file-sharing, container runtime, or hypervisor code—should begin integrating LLMs like o3 not as replacements, but as scalable second-pass auditors.
The research also underscores that patches must be threat-model aware. A fix that works in single-threaded logic may be completely inadequate under real-world concurrency. Models that reason about shared-state systems could serve as early warning systems, flagging such blind spots before they enter production.
Future of AI-Augmented Security Research
The broader lesson here is about capability convergence. For the first time, LLMs are demonstrating attributes of flexibility, creativity, and contextual reasoning previously exclusive to human experts. Their utility is no longer hypothetical. Even with a high false positive rate, a single zeroday like CVE-2025-37899 discovered early can prevent catastrophic exploitation, especially in critical infrastructure deployments or cloud-native environments that rely on Linux kernel stability.
As Sean HN noted, if we were to stop AI model progress today, o3 would already be worth integrating. But with every iteration, the cost-benefit ratio tilts further in favour of adoption. For cybersecurity leaders, now is the time to start building bridges between code intelligence tools and LLM-assisted analysis, before attackers do.
Discover more from Business-News-Today.com