Problem Statement

Organizations often underinvest in threat intelligence, and even when they do invest, SOC operators usually struggle with the volume of information gathered from the platforms. That volume can be overwhelming. It takes a lot of knowledge of the threat landscape and attack vectors to filter out the noise and narrow the results down to only the applicable TTPs.

So how can we develop a framework that helps solve this problem?

Automation – part of the answer

As demonstrated in the accompanying diagram, a threat intel feed consumption pipeline, if done right, surfaces attributes that can be correlated with internal data and fed into a model that calculates the applicability of the TTPs.
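To make the correlation idea concrete, here is a minimal sketch of what "applicability" could look like. It is not the model from this post: the `ThreatReport` class, the `applicability` function, the attribute names, and the overlap-ratio scoring are all assumptions chosen for illustration. A real model would likely weight attributes and draw on much richer internal data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatReport:
    """Hypothetical container for one item extracted from a threat feed."""
    ttp: str
    attributes: frozenset  # e.g. products, platforms, sectors named in the report


def applicability(report, internal_attributes):
    """Share of the report's attributes that also appear in internal data.

    This overlap ratio is an assumed, deliberately simple stand-in for a
    real applicability model.
    """
    if not report.attributes:
        return 0.0
    overlap = report.attributes & internal_attributes
    return len(overlap) / len(report.attributes)


# Assumed internal data: what the organization actually runs and does.
internal = frozenset({"windows", "office365", "finance"})

# Assumed feed output: TTPs with the attributes extracted alongside them.
reports = [
    ThreatReport("T1190 Exploit Public-Facing Application", frozenset({"linux", "apache"})),
    ThreatReport("T1566 Phishing", frozenset({"office365", "finance"})),
]

# Rank TTPs so the most applicable ones surface first.
ranked = sorted(reports, key=lambda r: applicability(r, internal), reverse=True)
for report in ranked:
    print(f"{applicability(report, internal):.2f}  {report.ttp}")
```

The point of the sketch is only that the score depends on attributes pulled out of the feed, which is why the extraction step matters.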

So how do we get the relevant attributes from the threat feeds?

In this post, we will explore a branch of machine learning called natural language processing (NLP), specifically custom entity extraction. Full working example code is here for reference.

Again, the goal of this post is not to teach you ML or walk through the code, so we won’t go into the components. Instead, we will focus on the workflow that achieves the end result.

1. Create a labeled training dataset for the ML model.

2. Feed the training and test datasets into the machine learning models.

3. Take the TTPs output by the CRF or LSTM models and produce actionable items based on probability.
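The three steps can be sketched in plain Python. This is an illustration under stated assumptions, not the code from the repository: the BIO tagging scheme, the entity types (`ACTOR`, `TTP`, `PRODUCT`), the feature names, the example sentence, the prediction tuples, and the `0.8` threshold are all hypothetical. The model itself is left out; the feature dictionaries are simply the kind of per-token input a CRF implementation typically accepts.

```python
def bio_label(tokens, spans):
    """Step 1: turn annotated spans into BIO labels, one per token.

    spans is a list of (start, end_exclusive, entity_type) token ranges.
    """
    labels = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        labels[start] = f"B-{entity_type}"
        for i in range(start + 1, end):
            labels[i] = f"I-{entity_type}"
    return list(zip(tokens, labels))


def token_features(tokens, i):
    """Step 2: describe one token with simple features for a sequence model."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_upper": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def actionable(predictions, threshold=0.8):
    """Step 3: keep only entities the model is confident about.

    predictions is a list of (text, entity_type, probability) tuples.
    """
    return [p for p in predictions if p[2] >= threshold]


# Hypothetical sentence from a threat report, with hand-made annotations.
tokens = "ExampleGroup used spearphishing attachments against Exchange servers".split()
labeled = bio_label(tokens, [(0, 1, "ACTOR"), (2, 4, "TTP"), (5, 6, "PRODUCT")])
features = [token_features(tokens, i) for i in range(len(tokens))]

# Hypothetical model output; a trained CRF or LSTM would produce these.
predictions = [
    ("spearphishing attachments", "TTP", 0.93),
    ("Exchange", "PRODUCT", 0.55),
]
print(labeled)
print(actionable(predictions))
```

Where to set the threshold is a judgment call: a higher value means fewer false leads for the SOC, at the cost of missing entities the model was less sure about.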

POC

Some of the scripts I used for the steps above are available here:

https://github.com/tomwynn/nlp_threatfeeds