Cyber Daily



Machine learning can reduce false positives in application security by 96%

Opinion: Security teams constantly struggle with high levels of false positives, which make it difficult to prioritise risk, writes Dr Stuart Millar, senior data scientist at Rapid7.

Dr. Stuart Millar
Tue, 20 Dec 2022

As we look ahead to 2023, it’s an exciting time for artificial intelligence (AI) and machine learning (ML) cyber security innovations. To illustrate the possibilities, our multidisciplinary ML group has designed a novel deep learning model to automatically prioritise application security vulnerabilities and reduce false positive friction.

We developed the model in partnership with the Centre for Secure Information Technologies (CSIT) at Queen’s University Belfast, the UK’s Innovation and Knowledge Centre for cyber security, which is recognised by GCHQ and EPSRC as a centre of excellence for cyber security research. It is the first deep learning system to optimise vulnerability triage for dynamic application security testing (DAST).

Around the world, security teams constantly struggle with prioritising risk and managing a high level of false positive alerts, while the rise of the cloud post-COVID means web application security is more crucial than ever. With web attacks continuing to be the most common type of compromise, high levels of false positives generated by vulnerability scanners have become an industry-wide challenge.


To combat this, our ML architecture optimises vulnerability triage by using the structure of the traffic exchanged between a DAST scanner and a given web application. Leveraging convolutional neural networks and natural language processing, we designed a deep learning system that builds separate internal representations of the HTTP request and response traffic, then fuses them to predict whether a finding is a verified vulnerability or a false positive. The system learns from historical triage carried out by the subject matter experts (SMEs) in our managed services division.
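The data flow described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the published model: the tokeniser, the hashed vocabulary, the layer sizes, and the names `ConvBranch` and `FusionClassifier` are all assumptions, and the weights are random rather than learned from historical triage. It shows one convolutional branch per side of the exchange, with the two feature vectors concatenated before a single prediction.

```python
import math
import random
import re
import zlib

# All sizes below are assumed for illustration; the article does not give them.
EMBED_DIM = 8   # width of each token embedding
KERNEL = 3      # convolution window, in tokens
FILTERS = 4     # convolution filters per branch
VOCAB = 1000    # size of the hashed vocabulary


def tokenize(http_text):
    """Split raw HTTP text into lowercase word tokens and single symbols."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", http_text.lower())


class ConvBranch:
    """Embeds tokens, applies 1D convolutions with ReLU, then global max pooling."""

    def __init__(self, rng):
        self.embed = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                      for _ in range(VOCAB)]
        self.filters = [[[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                         for _ in range(KERNEL)]
                        for _ in range(FILTERS)]

    def encode(self, http_text):
        # crc32 gives a stable token-to-id mapping across runs.
        ids = [zlib.crc32(t.encode("utf-8")) % VOCAB for t in tokenize(http_text)]
        ids += [0] * max(0, KERNEL - len(ids))  # pad so one window always exists
        vectors = [self.embed[i] for i in ids]
        features = []
        for filt in self.filters:
            best = 0.0  # starting at zero applies ReLU before the max pool
            for start in range(len(vectors) - KERNEL + 1):
                activation = sum(filt[k][d] * vectors[start + k][d]
                                 for k in range(KERNEL)
                                 for d in range(EMBED_DIM))
                best = max(best, activation)
            features.append(best)
        return features


class FusionClassifier:
    """Two branches (request, response) fused by concatenation."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.request_branch = ConvBranch(rng)
        self.response_branch = ConvBranch(rng)
        # Untrained placeholder weights; a real system would learn these.
        self.weights = [rng.uniform(-1, 1) for _ in range(2 * FILTERS)]
        self.bias = 0.0

    def predict(self, request, response):
        """Return a score in [0, 1]: higher means more likely a real vulnerability."""
        fused = (self.request_branch.encode(request)
                 + self.response_branch.encode(response))
        logit = self.bias + sum(w * x for w, x in zip(self.weights, fused))
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    model = FusionClassifier(seed=0)
    request = "GET /search?q=<script>alert(1)</script> HTTP/1.1\nHost: example.test"
    response = "HTTP/1.1 200 OK\nContent-Type: text/html\n\n<p>No results</p>"
    print(model.predict(request, response))
```

Because the weights are random, the score printed here carries no meaning. The point is the shape of the computation: each side of the exchange is encoded separately, and only the fused vector reaches the classifier.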

Given the skillset, time, and cognitive effort required to review high volumes of DAST results by hand, the addition of this deep learning capability to a scanner creates a hybrid system that enables application security analysts to rank scan results, de-prioritise false positives, and concentrate on likely real vulnerabilities. With the system able to make hundreds of predictions per second, productivity is improved and remediation time reduced, resulting in stronger customer security postures. A rigorous evaluation of this ML architecture across multiple customers shows that 96 per cent of false positives on average can automatically be detected and filtered out.
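To make the ranking step concrete, here is a minimal sketch of how model scores might be used to order scan results and measure how many false positives are filtered out. The `Finding` structure, the 0.5 threshold, and the way the filter rate is computed are assumptions for illustration; the evaluation behind the 96 per cent figure may be defined differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    finding_id: str
    score: float                     # model's probability of a real vulnerability
    verified: Optional[bool] = None  # analyst label, if one exists


def triage(findings, threshold=0.5):
    """Rank findings by score and split them at an assumed threshold."""
    ranked = sorted(findings, key=lambda f: f.score, reverse=True)
    review_first = [f for f in ranked if f.score >= threshold]
    deprioritised = [f for f in ranked if f.score < threshold]
    return review_first, deprioritised


def false_positive_filter_rate(findings, threshold=0.5):
    """Share of analyst-labelled false positives that score below the threshold."""
    false_positives = [f for f in findings if f.verified is False]
    if not false_positives:
        return 0.0
    filtered = sum(1 for f in false_positives if f.score < threshold)
    return filtered / len(false_positives)
```

In practice the threshold is a trade-off: lowering it keeps more real vulnerabilities in the review queue at the cost of filtering out fewer false positives.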

Our deep learning model uses convolutional neural networks and natural language processing to represent the structure of client-server web traffic. Neither the model nor the scanner requires source code access. In this hybrid approach, a scan engine first finds potential vulnerabilities, and the model then classifies each finding as a real vulnerability or a false positive.

The resulting solution augments triage decisions by de-prioritising false positives. Considering that the average time to detect a web breach can be several months, these time savings are essential to both reduce exposure and harden security postures. A vulnerability can be rapidly discovered, verified, and remediated, which in turn makes the window of opportunity for an attacker much smaller. You can download a copy of the pre-print publication here.

Dr. Stuart Millar is a senior data scientist at Rapid7.
