Our first blog on the use of AI in the intelligence cycle covered the cycle’s first phase, direction. Today we turn to the next phase: collection.

Of all the phases, collection is easily the most routine, mechanical, or even atomic. Unlike direction, collection can’t be neatly divided into beginning, middle, and end stages. So we’ve adopted a different approach for determining AI’s utility in Cyber Threat Intelligence (CTI) collection.

Keep an open mind here, as we present our thoughts on whether the CTI analyst will want to apply AI at this phase.

Hard Questions about AI in Collection

The analyst acquired intelligence requirements (IRs) in the direction phase. Now their task is to accumulate sources and information to address those IRs. They’ll consider which sources and agencies (aka SANDA) are accessible. And they’ll need to account for the SANDA’s reliability and credibility.  
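
One way to picture the reliability and credibility check is the minimal sketch below. It assumes the team grades each SANDA on Admiralty-style scales (A to F for source reliability, 1 to 6 for information credibility) and, as a simplification, stores one typical credibility grade per source. The source names, the thresholds, and the `Sanda` and `shortlist` names are all hypothetical.

```python
from dataclasses import dataclass

# Admiralty-style scales (assumed here): best grade first.
RELIABILITY = "ABCDEF"   # F = reliability cannot be judged
CREDIBILITY = "123456"   # 6 = truth cannot be judged


@dataclass(frozen=True)
class Sanda:
    name: str          # hypothetical source name
    reliability: str   # one of A-F
    credibility: str   # one of 1-6 (simplified: a typical grade per source)
    accessible: bool   # can the analyst actually reach this source?


def shortlist(sources, min_reliability="C", min_credibility="3"):
    """Keep accessible sources graded at or above both thresholds, best first."""
    max_r = RELIABILITY.index(min_reliability)
    max_c = CREDIBILITY.index(min_credibility)
    kept = [
        s for s in sources
        if s.accessible
        and RELIABILITY.index(s.reliability) <= max_r
        and CREDIBILITY.index(s.credibility) <= max_c
    ]
    return sorted(kept, key=lambda s: (s.reliability, s.credibility, s.name))


sources = [
    Sanda("vendor threat report", "B", "2", True),
    Sanda("anonymous forum post", "E", "4", True),
    Sanda("national CERT advisory", "A", "1", True),
    Sanda("paid closed feed", "B", "2", False),
]
for s in shortlist(sources):
    print(s.name, s.reliability + s.credibility)
```

With the default thresholds, ungraded sources (F or 6) drop out of the shortlist, so an analyst would need to review those by hand rather than feed them straight into collection.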

Can AI help with this? Our previous blog offers a small clue. From the example of an IR being addressed by a generative-AI tool, we saw that the tool listed all the potential categories of cyber-threat actors that exist! It felt like an excessive, noisy answer instead of a direct one. Also, many AI-generated answers wrap up by noting that the answer is not exhaustive.

The question now is: Would the client, who approved the IRs, be satisfied with an excessive answer instead of a direct one? And could it give them the unsubstantiated worry that a whole swath of cyber-threat actors is after them, rather than the more likely few who target their industry and region? 

We know what you’re thinking…surely AI has no place in this collection phase. So, is that it? Well, yes, but there’s a related solution. 

The Automation Alternative

After running the whole collection phase from start to finish several times, we’re backing automation (tech that follows a set of predefined rules), rather than AI (tech primed to make its own decisions), to play the more useful role.

That’s because collection is based not only on IRs, but on whatever underlying framework, or rules, the analyst uses to help prioritise IRs and decide the order to answer them in. Once that framework’s parameters are defined, an automation tool can be configured to follow it and prove useful in collection activity.
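
To make "predefined rules" concrete, here is a minimal sketch of a rule-based ordering of IRs. It assumes a very simple framework: client priority first, then whether the IR concerns the client's industry, then whether it concerns their region. The `Requirement` fields and the example questions are hypothetical; a real framework would have its own parameters.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    question: str
    client_priority: int   # 1 = highest, as agreed with the client (assumed field)
    matches_sector: bool   # concerns the client's industry
    matches_region: bool   # concerns the client's region


def collection_order(requirements):
    """Order IRs by fixed rules: client priority, then sector match, then region match."""
    return sorted(
        requirements,
        # False sorts before True, so negate the flags to put matches first.
        key=lambda r: (r.client_priority, not r.matches_sector, not r.matches_region),
    )


irs = [
    Requirement("Which ransomware groups are active globally?", 2, False, False),
    Requirement("Which actors target our industry in our region?", 1, True, True),
    Requirement("Which actors target our industry elsewhere?", 2, True, False),
]
for position, ir in enumerate(collection_order(irs), start=1):
    print(position, ir.question)
```

Nothing here decides anything on its own. The tool only applies the ordering the analyst wrote down, which is the distinction between automation and AI that we're drawing.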

As an example, we’ve mapped collection onto another intelligence cycle: F3EAD (Find, Fix, Finish, Exploit, Analyse, and Disseminate). We’ll focus on Find and Fix, for their close relevance to initiating a collection phase. We can roughly define them as:

  • Find – Use of the ‘Who, What, When, Where, Why’ questions to identify likely adversaries who could target an organisation. These act as a primer for automation. When these key questions are answered, the wealth of data that the intelligence team would need to comb through is significantly reduced and becomes more directed. Directive tactics that could help include Boolean search strings or Google Dorking.
  • Fix – Use of the data points from the Find phase to ‘fix’-ate on which systems the adversaries will target and which potential problems the organisation could encounter from their actions. Here, selecting the SANDA best suited to gather intelligence would be useful. If any of the shortlisted SANDA are open source, and perhaps unstructured, they could be good candidates for automation: applying automated collection or web-scraping tools to the selected SANDA could save time by extracting relevant details from complex content.
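
As a rough illustration of the Find step, the sketch below turns answers to the key questions into a Boolean search string. The answers, the `build_query` helper, and the `example.org` site restriction are hypothetical, and Boolean syntax varies between search engines and intelligence platforms, so treat the output format as an assumption too.

```python
def or_group(terms):
    """Quote each term and join the terms with OR, wrapped in parentheses."""
    quoted = ['"{}"'.format(t) for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(answers, site=None):
    """AND together one OR-group per answered question; optionally restrict to a site."""
    groups = [or_group(terms) for terms in answers.values() if terms]
    query = " AND ".join(groups)
    if site:
        # 'site:' is a common search-engine operator used in Google Dorking.
        query += " site:" + site
    return query


# Hypothetical answers to some of the 'Who, What, When, Where, Why' questions.
find_answers = {
    "who": ["ransomware group", "initial access broker"],
    "what": ["VPN appliance"],
    "where": ["United Kingdom", "UK"],
    "why": [],  # unanswered questions are simply skipped
}
print(build_query(find_answers, site="example.org"))
```

The more of the questions the analyst can answer, the narrower the resulting query, which is how answering them cuts down the data the team has to comb through.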

See how there’s a potential for the IRs to be addressed more directly with the help of a framework that can be supported by technology? With the Find and Fix objectives, the analyst can decide which order to address IRs in, by determining priority; then, when finding the answers through SANDA, automated technology will help compile those answers.  
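
For the compiling part, here is a minimal sketch of automated extraction from unstructured text. It assumes the content has already been fetched from the selected SANDA, and it looks for only two kinds of detail, CVE identifiers and IPv4-looking strings, using deliberately simple patterns. The document names, the placeholder CVE ID, and the documentation-range IP addresses are all made up for illustration.

```python
import re

# Simple patterns for two kinds of detail; real reports need broader, better-tested ones.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_details(text):
    """Return the unique CVE IDs and IPv4-looking strings found in the text."""
    cves = sorted({m.upper() for m in CVE_PATTERN.findall(text)})
    ips = sorted(set(IPV4_PATTERN.findall(text)))
    return {"cves": cves, "ipv4": ips}


def compile_answers(documents):
    """Merge extracted details across documents, recording which source mentioned each."""
    compiled = {}
    for source_name, text in documents.items():
        for kind, values in extract_details(text).items():
            for value in values:
                compiled.setdefault((kind, value), []).append(source_name)
    return compiled


# Hypothetical, already-fetched content from two shortlisted sources.
documents = {
    "vendor blog": "The group exploited CVE-2099-0001 and called back to 203.0.113.7.",
    "forum thread": "Seen again: cve-2099-0001, traffic to 203.0.113.7 and 198.51.100.20.",
}
for (kind, value), seen_in in sorted(compile_answers(documents).items()):
    print(kind, value, seen_in)
```

Keeping the source names next to each extracted detail means the analyst can still weigh every answer against the reliability and credibility of the SANDA it came from.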

What’s the Verdict?

Collection is fundamentally about getting IRs answered. If you want to take this process to a level beyond using automated technology to siphon choice details about a potential adversary from SANDA, AI could also be an option.

Analysts could adapt learning technology (i.e. AI) to understand the entire workflow of the collection process described above, and teach it to provide direct answers to the IRs. This AI assistance would come with its own set of caveats – ranging from conditional reliability of the tech-produced answers to limitations in adaptability from one intelligence case to the next. 

Whether you’re curious about applying automation or AI technology to help in the collection phase, there’s a fair bit of preparation, and a fair few caveats, involved in navigating either route. That’s a wrap for our look at AI in the collection phase. Onwards to the next phase, in our coming blog about analysis.