Advanced Persistent Testing - the most comprehensive detection and response evaluation

Stefan Dumitrascu
Mar 21

How does your security solution stack up against a persistent threat actor? What capabilities are going to defend you when the inevitable happens?

Our new APT Evaluation is the most comprehensive independent test available, looking at both security efficacy and security capabilities. Read more about how we test.

How do we test?

We test like an attacker would, with the benefit of transparency after the attack. We use a full attack chain to correctly assess the capabilities of the solution under real circumstances.

Real IPs, real domains, and real human, hands-on-keyboard attackers empowered with the latest available tradecraft.

Environment & Execution

Figure: APT Evaluation Infrastructure

A target organisation is deployed with cloud infrastructure to reflect a real environment.

Three main sequences are used during our test case to understand what actions occurred during the attack: Intrusion, Infiltration and Propagation. This helps us communicate key points to address. We also assess the detective and preventative capabilities of the solution (depending on the configuration used) through what we call Blue Responses. Just as attacker actions are mapped to the MITRE ATT&CK framework, we try to map these capabilities to the MITRE D3FEND framework.
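To make the mapping concrete, here is a minimal sketch of how one test case could be recorded. It is an illustration, not our actual tooling: the names `Sequence`, `AttackStep` and `summarise` are hypothetical, and the assignment of each ATT&CK technique to a sequence, as well as the D3FEND labels used for the Blue Responses, are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sequence(Enum):
    """The three attack sequences named in the text."""
    INTRUSION = "Intrusion"
    INFILTRATION = "Infiltration"
    PROPAGATION = "Propagation"


@dataclass(frozen=True)
class AttackStep:
    """One attacker action and the solution's reaction (hypothetical schema)."""
    sequence: Sequence
    attack_technique: str          # MITRE ATT&CK technique ID
    blue_response: Optional[str]   # D3FEND technique label, or None if no response


def summarise(steps):
    """Count steps and Blue Responses per sequence."""
    summary = defaultdict(lambda: {"steps": 0, "responded": 0})
    for step in steps:
        bucket = summary[step.sequence.value]
        bucket["steps"] += 1
        if step.blue_response is not None:
            bucket["responded"] += 1
    return dict(summary)


# Illustrative data only: which sequence a technique falls under is an assumption.
steps = [
    AttackStep(Sequence.INTRUSION, "T1566", "Network Traffic Analysis"),
    AttackStep(Sequence.INFILTRATION, "T1059.001", "Process Spawn Analysis"),
    AttackStep(Sequence.PROPAGATION, "T1021.001", None),
]
summary = summarise(steps)
```

Grouping by sequence in this way shows at a glance where in the attack chain a solution responded and where it stayed silent.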

False Positives

We use legitimate scenarios for our false positive testing to reflect common behaviours in an organisation. These are provided during the deployment phase of the solution, as they would be in a normal customer/provider relationship. The tested solutions also have the right to request a two-week learning period to learn about common patterns in the targeted organisation. Any such learning period, along with the scenarios used, is always disclosed in detail in the final report in our public repository.

Capabilities

Protection capabilities are measured against the possible mitigations available for the tradecraft used, alongside the customisability of responses, which helps with future detection engineering.

Read more about it in our methodology here.

Test Scope & Transparency

For 2025 we have chosen APT29 as the main corpus. We selected it mainly because the group has been active for a long time and has constantly expanded its tradecraft, which gives us an extensive set of techniques in scope. Notably, its more recent operations show an increase in cloud-related techniques.

We always strive for transparency; as such, the technique scope is published on our website and on our GitHub.

Rating

It's important to put context behind each metric gained. There are always trade-offs when choosing deployments of solutions. Our rating system aims to provide enough information for CISOs to get an overview of the success stories of the evaluation backed by useful empirical data.

We don't encourage products to be too verbose, as alert fatigue is all too common in SOCs. Providing enough information to give clear context of what was attacked, while keeping detection rates high, should be sufficient.
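As a rough illustration of that trade-off, the sketch below computes two simple figures: the share of attack steps that were detected, and the average number of alerts raised per detected step. These are assumed example metrics with hypothetical function names, not the formulas used in our rating system.

```python
def detection_rate(detected_steps: int, total_steps: int) -> float:
    """Fraction of attack steps that the solution detected."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= detected_steps <= total_steps:
        raise ValueError("detected_steps must be between 0 and total_steps")
    return detected_steps / total_steps


def alerts_per_detected_step(alert_count: int, detected_steps: int) -> float:
    """Average number of alerts raised for each detected step (a verbosity proxy)."""
    if detected_steps <= 0:
        raise ValueError("detected_steps must be positive")
    if alert_count < 0:
        raise ValueError("alert_count must not be negative")
    return alert_count / detected_steps


# Made-up numbers: 9 of 10 steps detected, 27 alerts raised in total.
rate = detection_rate(9, 10)
verbosity = alerts_per_detected_step(27, 9)
```

A high detection rate paired with a low alerts-per-step figure would suggest a product that gives clear context without flooding analysts.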

We will publish a more detailed post about constructing an effective rating system and the key metrics we look for. In the meantime, you can read about our current rating system in our methodology here.

Take part in the most comprehensive detection and response evaluation to measure your solution against an APT. The call for participation is active now!