LAPSUS$ Hunting with simple Anomaly Detection

By tomwynn281 on April 4, 2022

Various methods to bypass OKTA MFA have been disclosed on the internet; I won’t discuss them in this blog post. Instead, this post covers how to detect whether credentials (authentication + authorization) have been compromised and a threat actor has already gained access to the systems.

Anomaly Detection with Isolation Forest

In this post, we will walk through a process that uses a Python script to process large amounts of log data and can be run periodically to detect anomalies in login geolocations.

Requirements: OKTA authentication logs and some Python

Gather data

There are various ways to pull OKTA logs: via API calls against OKTA itself, or via API calls against your SIEM. In this example, I used a query to pull the data from my SIEM and preprocess it so that it contains only the fields of interest.
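If you would rather pull the logs straight from Okta than from a SIEM, the sketch below pages through the Okta System Log API (`GET /api/v1/logs`) using only the Python standard library. It assumes an Okta API token sent in an `SSWS` authorization header; the function names, the `user.session.start` event filter and the page size of 1000 are my own illustrative choices, not the query used in this post.

```python
import json
import re
import urllib.parse
import urllib.request


def next_link(link_headers):
    """Return the URL marked rel="next" in a list of HTTP Link header values, or None."""
    match = re.search(r'<([^>]+)>\s*;\s*rel="next"', ", ".join(link_headers))
    return match.group(1) if match else None


def fetch_okta_logins(org_url, api_token, since, until):
    """Page through Okta System Log sign-in events between two ISO 8601 timestamps.

    Hypothetical helper: the event filter and page size are illustrative choices.
    """
    query = urllib.parse.urlencode({
        "since": since,
        "until": until,
        "filter": 'eventType eq "user.session.start"',
        "limit": 1000,
    })
    url = f"{org_url}/api/v1/logs?{query}"
    headers = {"Authorization": f"SSWS {api_token}", "Accept": "application/json"}
    events = []
    while url:
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request, timeout=30) as response:
            page = json.load(response)
            # Okta may send several Link headers (rel="self", rel="next").
            links = response.headers.get_all("Link") or []
        if not page:  # empty page: nothing left to read in this window
            break
        events.extend(page)
        url = next_link(links)
    return events
```

For example, `fetch_okta_logins("https://your-org.okta.com", api_token, "2022-03-01T00:00:00Z", "2022-04-01T00:00:00Z")` would return a list of raw event dictionaries, where `your-org` and `api_token` are placeholders. Stopping on an empty page, and not only on a missing `next` link, is a safeguard in case the API keeps returning a `next` link.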

I like to do as much preprocessing as possible in the SIEM first, because SIEM query languages are powerful and the SIEM is built to process large volumes of data efficiently. So I recommend always looking for optimization opportunities in the SIEM query before exporting the data to a DataFrame.
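When the raw events come from the API instead, the same reduction has to happen in Python. Here is a minimal pandas sketch, assuming Okta System Log event JSON: it flattens the nested fields with `pandas.json_normalize`, keeps successful sign-ins, and renames the fields of interest to short, hypothetical column names (`user`, `ip`, `city`, `country`) that stand in for whatever your SIEM query returns.

```python
import pandas as pd

# Okta System Log fields of interest, mapped to short column names (names are illustrative).
FIELDS = {
    "actor.alternateId": "user",
    "client.ipAddress": "ip",
    "client.geographicalContext.city": "city",
    "client.geographicalContext.country": "country",
}


def to_login_frame(events):
    """Flatten raw System Log events and keep only successful sign-ins."""
    flat = pd.json_normalize(events)  # nested keys become dotted column names
    # reindex adds any missing field as an all-NaN column instead of raising KeyError.
    flat = flat.reindex(columns=["outcome.result", *FIELDS])
    successful = flat[flat["outcome.result"] == "SUCCESS"]
    return successful[list(FIELDS)].rename(columns=FIELDS).reset_index(drop=True)
```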

Here are the results:

Now, let’s find the anomalies

We will use scikit-learn’s Isolation Forest algorithm to find the anomalies in the dataset.
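Isolation Forest works by building many random trees that split the data on randomly chosen features at random thresholds. Unusual points are isolated after only a few splits, so their average path length is short and scikit-learn gives them a lower `score_samples` value. The toy example below uses made-up two-dimensional numbers, not login data, to show the effect: a single far-away point next to a tight cluster is the one that gets flagged. The `contamination` value is an assumption that tells the model roughly what fraction of points to label as anomalies.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
cluster = rng.normal(loc=0.0, scale=0.5, size=(50, 2))  # 50 "normal" points near the origin
outlier = np.array([[10.0, 10.0]])                       # one point far from the rest
X = np.vstack([cluster, outlier])

model = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
labels = model.fit_predict(X)        # 1 = inlier, -1 = anomaly
scores = model.score_samples(X)      # lower = isolated in fewer splits = more anomalous

print(labels[-1], scores.argmin())
```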

Step 1:

The algorithm requires numeric (quantitative) features, but our dataset is categorical. So first, we need to encode our data using the following function:

Then we pull the data and call the encoding function from our main function:
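One way to write such an encoding function, plus a main function that loads the SIEM export and calls it, is sketched below. It assumes one scikit-learn `LabelEncoder` per column so the integer codes can be inverted later; the helper names, column names and CSV file name are hypothetical placeholders.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

COLUMNS = ["user", "ip", "city", "country"]  # assumed export columns (hypothetical)


def encode_columns(df, columns=COLUMNS):
    """Replace each categorical column with integer codes; keep the encoders for decoding."""
    encoded = df.copy()
    encoders = {}
    for col in columns:
        encoder = LabelEncoder()
        encoded[col] = encoder.fit_transform(df[col].astype(str))
        encoders[col] = encoder
    return encoded, encoders


def load_logins(path):
    """Read the SIEM export, keeping only the fields of interest and complete rows."""
    return pd.read_csv(path, usecols=COLUMNS).dropna().reset_index(drop=True)


def main(path="okta_logins.csv"):  # hypothetical export file name
    logins = load_logins(path)
    encoded, encoders = encode_columns(logins)
    print(encoded.head())
    return encoded, encoders
```

Call `main()` with the path to your own export. Keeping the fitted encoders around is what makes it possible to translate flagged rows back into readable users, IPs and locations later on.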

Now, our data will look something like this:

Step 2:

Now we can apply the Isolation Forest algorithm from the scikit-learn library.

In this step, we:

Defined the parameters for IsolationForest
Appended a column named “anomaly” to the DataFrame
Kept only the rows where anomaly = -1 (the rows flagged as anomalies)
Inverted the encoded values back to the original categorical values
Removed duplicates
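The steps above can be sketched as one function. This is an illustration under assumptions, not the exact code behind the results: it repeats the Step 1 label encoding so that it runs on its own, and the `contamination` and `random_state` values are arbitrary starting points to tune for your data.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import LabelEncoder


def find_login_anomalies(logins, columns, contamination=0.01, random_state=42):
    """Flag unusual login rows with Isolation Forest; return them decoded and de-duplicated."""
    # Encode each categorical column as integers, keeping the encoders for decoding later.
    encoded = logins[columns].copy()
    encoders = {}
    for col in columns:
        encoders[col] = LabelEncoder()
        encoded[col] = encoders[col].fit_transform(logins[col].astype(str))

    # 1. Define the IsolationForest parameters (illustrative defaults) and fit the model.
    model = IsolationForest(n_estimators=100, contamination=contamination,
                            random_state=random_state)
    model.fit(encoded[columns])

    # 2. Append an "anomaly" column: 1 = normal, -1 = anomaly.
    encoded["anomaly"] = model.predict(encoded[columns])

    # 3. Keep only the rows flagged as anomalies.
    flagged = encoded[encoded["anomaly"] == -1].copy()

    # 4. Map the integer codes back to the original categorical values.
    for col in columns:
        flagged[col] = encoders[col].inverse_transform(flagged[col])

    # 5. Drop duplicate rows so each suspicious combination is reported once.
    return flagged.drop_duplicates().reset_index(drop=True)
```

A lower `contamination` flags fewer rows. With label-encoded data it is worth reviewing the flagged rows by hand, because the integer codes impose an arbitrary ordering on the categories.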

That gives us:

Step 3:

Analysis: for validation purposes, we reviewed the results above against the full dataset for the last six months. The user had never logged in from UK or French IPs before, only from US IPs, and there was no record of travel either. So why now?
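The same validation can be scripted: for each flagged login, check whether that user has ever signed in from that country during a longer history window. Here is a minimal pandas sketch, assuming both frames share hypothetical `user` and `country` columns and that `history` covers, say, the previous six months excluding the logins being hunted (otherwise every anomaly would match itself).

```python
import pandas as pd


def new_countries(anomalies, history):
    """Return anomalous logins whose (user, country) pair never appears in the history window."""
    seen = history[["user", "country"]].drop_duplicates()
    # A left merge with indicator=True marks rows that found no match as "left_only".
    merged = anomalies.merge(seen, on=["user", "country"], how="left", indicator=True)
    unseen = merged[merged["_merge"] == "left_only"]
    return unseen.drop(columns="_merge").reset_index(drop=True)
```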

Next steps

In the next blog posts, we will review how to automate this entire process, from finding anomalies as described above to fully automated hunts against threat intel within our SIEM tools.
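That automation is for later posts, but the core matching step of a threat-intel hunt can already be sketched: check each flagged IP against indicator addresses or CIDR ranges with Python’s `ipaddress` module. The function name is hypothetical and the indicator list is a placeholder assumption; in practice it would come from your threat-intel feed or a SIEM lookup table.

```python
import ipaddress


def match_indicators(ips, indicators):
    """Return {ip: matching indicator} for every IP inside an indicator address or CIDR block."""
    # strict=False lets single addresses and host bits pass (e.g. "203.0.113.9" -> /32).
    networks = [ipaddress.ip_network(ioc, strict=False) for ioc in indicators]
    hits = {}
    for ip in ips:
        address = ipaddress.ip_address(ip)
        for network in networks:
            if address in network:  # IPv4 vs IPv6 mismatches simply evaluate to False
                hits[ip] = str(network)
                break
    return hits
```

For example, `match_indicators(["198.51.100.7"], ["198.51.100.0/24"])` returns `{"198.51.100.7": "198.51.100.0/24"}`.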

Stay tuned.

Sneak peek: the IPs from France and the UK are both associated with LAPSUS$.