Security log analysis: How to group by two fields in Splunk
Splunk is a powerful tool, but with so many available functions and hit-and-miss coverage on forums it can sometimes take some trial and error to get queries right. Here’s what I pieced together to perform a count on a subset of events and group the data by two fields…
As an example, let’s imagine that we’re investigating some incoming traffic from a certain IP address and want to see its targets according to destination IP and port. A query to retrieve that information and present it in a readable format would look like this:
src_ip=”126.96.36.199″ stats count by dest_port dest_ip stats list(dest_port), list(count) by dest_ip sort -list(count)
Let’s look in a little bit more detail at how that query is put together…
This is our base query, which works in much the same way as any standard search you’d run in Splunk. In this case, I’m looking for all events that have the source IP 188.8.131.52, but this can be any search that returns the data you want to perform the other operations on.
stats count by dest_port dest_ip
Next we run stats, which is Splunk’s aggregation function and allows us to generate various statistics from our data. We list the fields we want Splunk to return at the end of the function – here, I want to see the results by destination IP address and destination port.
stats list(dest_port), list(count) by dest_ip
Now it’s time to arrange the results so we can see the counts on a per-IP basis. First we list by destination port, and then we list the event counts by destination IP, which leaves us with our (almost) final list of properly arranged destination IP addresses and ports.
Finally, let’s sort our results so we can see what the most common destination IP addresses are. This is achieved using Splunk’s sort function, which defaults to ascending order. The hyphen before the word list makes it descending.
After all of that, Splunk will give us something that looks like this:
That’s pretty useful to see what’s happening. We have a clear view of which systems are being targeted and which services on those systems may be affected. However, if we need to do any further analysis, Splunk throws us something of a curve ball.
Unfortunately, asking Splunk to export an Excel file containing this data will give us an inconsistently space-delimited sheet that’s very difficult to use without some manual rearrangement. It looks something like this:
|21 20 80
|12 7 4
For now, this is just a case of working through the sheet and rearranging the data into rows and columns that make more sense, but it’s something I may write some Python for in future, because it feels like it would be handy to have a script tucked away ready for when I need to do some analysis in a rush when investigating an incident.