Elemendar’s READ. platform uses proprietary Machine Learning (ML) technology to extract entities in STIX and MITRE ATT&CK formats from unstructured text (check out this blog for more details). In plain terms, we teach machines to replicate a very specific human task: mapping text strings to more abstract categories such as malware families, threat actors, and MITRE ATT&CK techniques.
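To make that concrete, here is a purely illustrative example of the kind of mapping the model performs (this is not READ.’s actual API or output schema):

```python
# Purely illustrative - not READ.'s actual API or output schema.
# A sentence from an unstructured report...
text = "Lazarus Group deployed Destover via spearphishing attachments."

# ...is mapped by the extraction model to structured CTI entities.
extracted = [
    {"text": "Lazarus Group",             "type": "threat-actor"},
    {"text": "Destover",                  "type": "malware"},
    {"text": "spearphishing attachments", "type": "attack-pattern",
     "attack_id": "T1566.001"},  # MITRE ATT&CK: Spearphishing Attachment
]
```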
Now take a moment to reflect critically on ML and the science of AI as a whole. No human can be 100% correct, 100% of the time, and yet we imperfect humans are the teachers of machine intelligence (via human-generated data). We can therefore be 100% certain that the machine will make mistakes in its task. On reflection this is obvious, but how do you deal with that error factor when developing a tool for cyber threat intelligence (CTI) analysts that could be used in a number of high-impact situations?
After giving a lot of thought to this problem at Elemendar, we concluded that the best way to deal with confidence issues around ML entity extraction is to let the analyst filter results by confidence level. For example, in the upcoming version of READ., analysts can filter extractions within a document collection by confidence level: Almost certain, Very likely, or Probable. Applying such a scale to the entities we extract is straightforward – but how should it be used in real-world CTI practice?
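Before turning to that question, here is a minimal sketch of what such a filter might look like (the labels, ordering and function names are our illustrative assumptions, not READ.’s implementation):

```python
# Illustrative sketch of confidence-based filtering; names and ordering
# are assumptions, not READ.'s implementation.
CONFIDENCE_ORDER = ["Probable", "Very likely", "Almost certain"]

def filter_by_confidence(entities, minimum):
    """Keep only entities rated at or above the given confidence label."""
    floor = CONFIDENCE_ORDER.index(minimum)
    return [e for e in entities
            if CONFIDENCE_ORDER.index(e["confidence"]) >= floor]

extractions = [
    {"text": "Destover",      "type": "malware",        "confidence": "Almost certain"},
    {"text": "Lazarus Group", "type": "threat-actor",   "confidence": "Very likely"},
    {"text": "T1014",         "type": "attack-pattern", "confidence": "Probable"},
]
```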
Action bias vs opportunity cost
The recent paper Opportunity Cost of Action Bias in Cybersecurity Incident Response by Dykstra et al. presents a compelling case for counterbalancing the human tendency to respond immediately in the face of a crisis (action bias), which can sacrifice a better response option that might emerge later. The paper takes a deep dive into the 2014 compromise of Sony by North Korea to make the case for giving incident responders the opportunity to “wait and see”.
Dykstra’s paper is interesting on a number of levels, but within the context of READ. and the wider application of CTI tooling, it highlights the need for explicit confidence levels around the data that CTI analysts rely on during high-stress decision-making. Revisiting READ.’s confidence levels for extracted entities, Figure 1 below shows how each would apply to particular CTI scenarios.
Figure 1: Confidence level mapped to use case
Comparing the action bias vs opportunity cost scenario in Dykstra’s paper with the need for higher-confidence CTI data in a crisis, the rationale for the entity-extraction confidence levels in Figure 1 becomes apparent.
In a crisis situation, the decision maker requires a high degree of confidence in their data and has little time to critically evaluate detail; as such, they might set READ. to surface only entities with an “Almost certain” confidence rating. In practical terms, approaching MITRE ATT&CK techniques this way would fundamentally shape the threat-hunting aspect of a crisis response: analysts would only be sent to the areas of the victim network where the response manager was almost certain there was threat actor activity.
Conversely, the opportunity cost of action bias is higher in an exploratory CTI research scenario: closing off potential avenues of research too early, for example because of lower confidence ratings, is likely to produce blind spots and poorer situational awareness when the operating domain is larger than a single known-compromised victim network. In such a situation, this thinking suggests letting analysts run with a greater number of lower-confidence hypotheses, ideally coupled with a structured analytic technique such as Analysis of Competing Hypotheses, Generalised Intelligence Requirements or the cone of plausibility.
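Continuing the earlier illustrative sketch, the two postures differ only in where the confidence floor is set:

```python
# Same illustrative filter as above, two different postures.

# Crisis response: high floor, narrow scope - hunters are sent only
# where threat actor activity is almost certain.
crisis_view = filter_by_confidence(extractions, "Almost certain")

# Exploratory research: low floor, wide scope - weak signals are kept
# so they can seed competing hypotheses rather than be discarded.
research_view = filter_by_confidence(extractions, "Probable")
```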
Of course, all of this is academic if the tools and data on which analysts depend don’t make it easy to put such thinking into practice. We hope that implementing confidence ratings in READ., as supported by STIX and MISP, contributes to putting ML-derived CTI on an even footing with more traditional sources and augments the analyst’s arsenal across the spectrum of confidence-dependent use cases.
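For reference, STIX 2.1 expresses confidence as an optional integer from 0 to 100 on its objects, with suggested mappings to estimative-language scales; MISP covers similar ground with taxonomy tags such as `estimative-language:likelihood-probability="almost-certain"`. As a sketch using the python-stix2 library (the indicator content and score are our illustrative assumptions, not READ.’s output):

```python
# Sketch: carrying a confidence score on a STIX 2.1 object with the
# python-stix2 library. The indicator itself is illustrative.
from stix2.v21 import Indicator

indicator = Indicator(
    name="Destover dropper (illustrative)",
    pattern="[file:name = 'destover_dropper.exe']",  # placeholder pattern
    pattern_type="stix",
    confidence=90,  # STIX 2.1 confidence: integer 0-100; ~ "Almost certain"
)
print(indicator.serialize(pretty=True))
```

Mapping estimative labels like READ.’s onto that 0–100 range would be one way to keep ML-derived confidence interoperable with the rest of the analyst’s toolchain.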