Saturday, November 9, 2013

Data Wrangling – Splunk & CIM

Data… there is a lot of it. We can store it now (well, we’ve been able to for a while), but it’s catching on that lots of data is good, and making it useful is awesome!

I get to play with a little data. It’s minuscule in comparison with some, but it’s what I get to play with, so I am learning about the things I can do with it. One of the tools I get to use is Splunk, and I’ve had the opportunity to shape and mold the “data” so that it’s more than just unstructured data. With some tuning it becomes usable data, but as far as I can tell, you really do have to put time and effort into Splunk to “train it”. Let me be clear: I’ve yet to see Splunk do something that looks intelligent other than key:value pair extraction out of data… and whether that is useful is debatable.

I have a vision, and right now I am trying to understand whether it’s a commonly shared one, using what the Splunk people call the Common Information Model (link here). My vision, helped along by a friend and mentor as well as by seeing what people have done in more advanced correlation systems, is to build essentially a web of linked data points across all of the various events and log types I have access to.

Think of this from a security analyst standpoint:

An event comes in, they see it in the SIEM or the ES App, and they start to dig in, asking: what else does this show up in, or as? They build a search based on src_ip="IP in question" OR dest_ip="IP in question"… (Two points on this: we are still pre-Splunk 6, and let’s say we cleverly specify the index via config magic for the role in the app.) What do you think will happen?

What I am pushing for is that every source, sourcetype, and sub-sourcetype with a component that is a “source IP address” or a “destination IP address” is checked for this IP. If it has hits, it shows up in the search. Yes, this is not a super rare term search; that’s the point, it’s not supposed to be. It does, however, give the analyst the ability to dig into all of the sourcetypes that have hits, drilling down through further searches and pivoting through the data across various periods of time to see where this IP has interacted with the network.
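To make the idea concrete outside of Splunk, here is a rough Python sketch of what that normalization buys you. It is not how Splunk or the CIM actually implements anything; the sourcetype names, the vendor field names (client_addr, src, remote_host, and so on), and the sample events are all made up for illustration. The only point is that once every sourcetype maps its own field names onto src_ip and dest_ip, one search covers all of them.

```python
from collections import defaultdict

# Hypothetical per-sourcetype aliases: vendor field name -> common field name.
# Real field names depend entirely on your own log sources.
FIELD_ALIASES = {
    "vpn": {"client_addr": "src_ip", "gateway_addr": "dest_ip"},
    "firewall": {"src": "src_ip", "dst": "dest_ip"},
    "webapp": {"remote_host": "src_ip"},
}


def normalize(event):
    """Return a copy of the event with vendor field names mapped to common ones."""
    aliases = FIELD_ALIASES.get(event.get("sourcetype"), {})
    return {aliases.get(key, key): value for key, value in event.items()}


def hits_by_sourcetype(events, ip):
    """Group events whose src_ip or dest_ip equals `ip` by sourcetype."""
    hits = defaultdict(list)
    for event in map(normalize, events):
        if ip in (event.get("src_ip"), event.get("dest_ip")):
            hits[event.get("sourcetype", "unknown")].append(event)
    return dict(hits)


# Made-up sample events, one per hypothetical sourcetype.
SAMPLE_EVENTS = [
    {"sourcetype": "vpn", "client_addr": "10.1.2.3", "gateway_addr": "192.0.2.10"},
    {"sourcetype": "firewall", "src": "10.9.9.9", "dst": "10.1.2.3"},
    {"sourcetype": "webapp", "remote_host": "10.5.5.5"},
]

if __name__ == "__main__":
    for sourcetype, matched in hits_by_sourcetype(SAMPLE_EVENTS, "10.1.2.3").items():
        print(sourcetype, len(matched))
```

Without the alias table, the same question would need a separate search per sourcetype, each using that source's own field names.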

Like:

User VPNs into company -> AV events occur -> Logs into a meeting -> Logs into an application server -> etc.

Being able to correlate a single user’s path through various log sources is key to seeing everything that user has done in the period of time visible to the security analyst. It makes it easier for them to pick up a bread crumb, whether in the middle of the trail or at any other point, and work out the who|what|why|when needed to decide whether it’s an incident or not.
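The bread crumb trail can be sketched the same way. This is a minimal Python illustration, assuming every log source has already been normalized to carry a common user field and an ISO-formatted _time field; the events, sourcetype names, and user names are invented for the example.

```python
from datetime import datetime

# Made-up events from different sources, deliberately out of order.
TRAIL_EVENTS = [
    {"_time": "2013-11-09T08:20:00", "sourcetype": "meeting", "user": "alice"},
    {"_time": "2013-11-09T08:00:00", "sourcetype": "vpn", "user": "alice"},
    {"_time": "2013-11-09T08:10:00", "sourcetype": "vpn", "user": "bob"},
    {"_time": "2013-11-09T08:30:00", "sourcetype": "appserver", "user": "alice"},
    {"_time": "2013-11-09T08:05:00", "sourcetype": "antivirus", "user": "alice"},
]


def user_trail(events, user):
    """Return one user's events across all sources, oldest first."""
    matching = [event for event in events if event.get("user") == user]
    return sorted(matching, key=lambda event: datetime.fromisoformat(event["_time"]))


if __name__ == "__main__":
    for event in user_trail(TRAIL_EVENTS, "alice"):
        print(event["_time"], event["sourcetype"])
```

Whichever event the analyst starts from, pulling the rest of the trail is one lookup on the shared user field rather than a different query per log source.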

I have no idea how many people/organizations are leveraging the power of CIM (the Common Information Model), or if I am just being slow to get on board with this.

My experience with CIM so far is having it pointed out by a friend/mentor, and then trying to hold the people working on Splunk to it.
