
// >> April 2012

Cyber Threat Readiness Webinar – May 3rd, 2012
| 25. April, 2012

I’m presenting on Cyber Threat Readiness in a webinar on May 3rd with Mike Rothman (President at Securosis) and Chris Petersen (Founder and CTO at LogRhythm).

Register Here

Most IT security professionals readily acknowledge that it is only a matter of time before their organizations experience a breach, if they haven’t already. And, according to the recent Cyber Threat Readiness Survey, few are confident in their ability to detect a breach when it happens.

In this webcast, three industry experts will discuss the current state of cyber threats and what’s required to optimize an organization’s Cyber Threat Readiness.  Given that it’s “when” not “if” a breach will occur, would you know when it happens to you?  Attend this webcast and increase your confidence in answering “Yes”.

Featured Speakers

Mike Rothman, President, Securosis

Deron Grzetich, SIEM COE Leader, KPMG

Chris Petersen, Founder & CTO, LogRhythm

 

Is Cloud-based SIEM Any Better?
| 20. April, 2012

In flipping through some articles from the various publications I read (wow, did I just sound like Sarah Palin?) I came across this comment in an article on SIEM in the cloud:

“Another problem with pushing SIEM into the cloud is that targeted attack detection requires in-depth knowledge of internal systems, the kind found in corporate security teams. Cloud-based SIEM services may have trouble with recognizing the low-and-slow attacks, said Mark Nicolett, vice president with research firm Gartner.” (http://searchcloudsecurity.techtarget.com/news/2240147704/More-companies-eyeing-SIEM-in-the-cloud)

To give some context, the article was more about leveraging “the cloud” to provide SIEM services for the SMB market, which doesn’t have the staff on hand to manage full-blown SIEM deployments, than it was about detecting attacks, but I digress…

I agree and disagree. I agree, and have said this before in my arguments for and against using an MSSP for monitoring. While the context of the article was using the cloud to host the data (and the usual data protection arguments came up), isn’t this just another case of outsourcing to a 3rd-party provider and calling it cloud? Data security issues aside, it doesn’t matter whether the MSSP uses its own infrastructure or some cloud provider’s infrastructure; the monitoring service is what I’m paying for. I’ve said this before, and I’ll say it again: an MSSP is “a SOC”, not “your SOC”. They do a fair job of detecting events, but may fail to put them into a business context that makes sense. Again, this is something you can try to get them to do, but personal experience has taught (more like biased) me to believe that it can’t be done.

But I’m also going to disagree and say that it isn’t only cloud-based SIEM providers who miss the low-and-slow attacks. I’d argue internal security teams are just as likely to miss them, based on the maturity of monitoring I see at organizations and the surrounding IR processes. I don’t mean to sound negative, but few organizations have built a solid detective capability that gets down to the level of very carefully crafted attacks, which may not generate much traffic or many alerts. In addition, the alerts as defined by your SOC/IR team may not be suited to catch these attacks, and even if they are, we still need to ensure we have the right trigger sources and thresholds without overwhelming the analysts who deal with the output.

Either way, my point is that we aren’t very good at this…yet. What I often see lacking is the level of knowledge, in the analysts who review the console and even in some of the program architects who define the alerts, needed to catch the “low and slow” attacks. In terms of maturity, we are still struggling to get the highly visible alerts configured correctly for our environment, or to get the SIEM we purchased 2-3 years ago to do what we want it to do. Vendors are doing their part to make deployment and configuration simpler while still allowing flexibility in alert creation and correlation. But I don’t think that will get us to the level of maturity needed to identify the stealthy attacks…I do think it is going to come down to us providing “security intelligence” versus a monitoring service, but I’ll hold on to that for a future post.

To answer the question in the title: no, not yet. Again, I think what we are talking about here is just outsourcing using the “C” word, and I’d argue the same points I would if I just said MSSP in place of cloud. Business context issues aside, it is better than doing nothing and may serve a purpose to fill a void, especially if the organization is small enough that it will never bring this function in house. One thing the attackers understand is that although the SMB market may not be as juicy a target as the large orgs, it still has some good data that is worth the effort…and at even less risk, since SMBs rarely have solid security programs. So, is it better than nothing? Sure. Is it the correct answer today? Maybe. Will it detect a low-and-slow attack? No, but your chances with an internal program aren’t that much better today…and they need to be.

SIEM Deployments Don’t Fail…
| 10. April, 2012

Let me restate the title: SIEM deployments don’t fail. The technology to accept logs, parse them, and correlate events has existed in a mature state for some time now. The space is so mature that we even have a slight divide between SIEMs that are better suited to user tracking and compliance and those that are better at pure security events, depending on how the technology “grew up”. So the purely technical aspects of the deployment are generally not the reason your SIEM deployment fails to bring the value you, or your organization, envisioned (no pun intended).

Remember that old ISO thing about people, process, AND technology? It seems we often forget the first two and focus too much on the technology itself. While I’d like to say this is limited to smaller organizations, the fact is that it is not. The technology simply supports the people who deal with the output (read: security or compliance alerts) and the process they follow to ensure that the response is consistent, repeatable, tracked, and reported. That being said, we also seem to forget to plan out a few things before we start down the SIEM path in the first place. This post aims to provide you with the “lessons learned” from both my own journey and what I see my clients go through, in a Q and A format.

Question 1. Why are we deploying SIEM or a log management/monitoring solution?

The answer to this is most likely going to drive the initial development of your overall strategy.  The drivers tend to vary but generally fall into the following categories or issues (can be one or more):

  1. The company is afraid of seeing their name in the paper as the latest “breached” company (i.e. is afraid of Anonymous due to their “ethicalness” or possibly afraid of what is left of Lulzsec)
  2. A knee-jerk reaction to being breached recently and the checkbook is open, time to spend some money…
  3. Had some failure of a compliance requirement (i.e. PCI, e-Banking) that monitoring solves (from a checkbox perspective)
  4. Have finally graduated from simply deploying “preventative” controls and realize they need to detect the failure (which happens more than we know) of those controls

Just as important are the goals of the overall program. Are we more concerned with network or system security events? Are we focused on user activity or compliance monitoring? Is it both? What do we need to get out of this program at a minimum, and what would be nice to have? Where does this program need to be in the next 12 months? The next 3 years? Answering these questions helps answer the question of “why”. The purpose and mission must be defined before we even think about looking at the technology to support the program. While this seems like a logical first step, most people start by evaluating technology solutions and then back into the purpose and mission based on the tool they like the most. Remember, technology is rarely the barrier.

Question 2. Now that we are moving forward with the program, how do we start?

The answer to this one will obviously depend on the answers to some of the questions above. Let’s assume for a moment, and for simplicity of this post, that you have chosen security monitoring as the emphasis of the program. Your first step is NOT to run out to every system, application, security control, and network device and point all of the logs, at the highest (i.e. debugging) level, at the SIEM. Sure, during a response having every log imaginable to sort through may be of great benefit; at this stage, however, I’m more concerned that I have the “right” logs as opposed to “all” logs. This “throw everything at the SIEM and see what sticks” idea may be partially driven by the vendors themselves or by an overzealous security guy. I could imagine a sales rep saying “yes, point everything at us and we’ll tell you what is important, as we have magical gnomes under the hood who correlate 10 times faster and better than our competition”. Great, as long as what is important to you lines up exactly with what the vendor thinks, go for it (joking, of course).

The step that seems most logical here is to define which events, if they occur, are most important given your organization, business, structure, and the type and criticality of the data you store or find most valuable. If we define our top 10, 20, 30, etc. and rank these events by criticality, we have started to define a few things about our program without even knowing it. First, with a list of events we can match these up to the log sources we would need in order to trigger an alert in the system. Do we need one event source and a threshold to trigger? Or is it multiple sources that we can correlate? Don’t be surprised if your list is a mixture of both types. Vendors would love for us to believe that all events are the result of their correlation magic, but in reality that just isn’t true. We can take that one step further and define the logs we would need to further investigate an alert as well. Second, we have started to define an order of criticality for both investigation and response. Given the number of potential events per day and a lack of staff to investigate every one, we need to get to what matters first, which should be our critical or higher-risk events.
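
To make that a bit more concrete, here is a rough sketch (in Python, with completely made-up event names and fields, not any vendor’s schema) of what a risk-ranked event catalog might look like once each event is tied to its trigger sources and the logs needed to investigate it:

# A minimal, hypothetical event catalog: each entry ties a monitored event
# to the log sources that trigger it and the logs needed to investigate it.
EVENT_CATALOG = [
    {
        "id": "EVT-001",
        "name": "Multiple failed admin logons followed by a success",
        "criticality": 1,                       # 1 = highest risk
        "trigger_sources": ["Active Directory security log"],
        "trigger_logic": ">= 10 failures then 1 success, same account, 10 min",
        "investigation_sources": ["VPN logs", "workstation event logs"],
    },
    {
        "id": "EVT-002",
        "name": "Outbound traffic to known-bad host from server segment",
        "criticality": 2,
        "trigger_sources": ["firewall", "web proxy"],   # needs two correlated sources
        "trigger_logic": "destination matches threat list",
        "investigation_sources": ["DNS logs", "host AV logs"],
    },
]

# Work the highest-risk events first, and derive the list of log sources
# the SIEM must actually be receiving for the catalog to function.
for event in sorted(EVENT_CATALOG, key=lambda e: e["criticality"]):
    print(event["id"], event["name"])

required_sources = {s for e in EVENT_CATALOG for s in e["trigger_sources"]}
print("Log sources we must collect:", sorted(required_sources))

Even something this simple produces two useful byproducts: a prioritized work queue for the analysts and a definitive list of log sources the SIEM actually has to receive.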

One thing to keep in mind here is not to develop your top “x” list in a vacuum. As part of good project planning you should have identified the necessary business units, lines, and resources that need to be involved in this process. Security people are good at thinking about security, but maybe not so much about how someone could misuse a mainframe, SAP, our financial apps, and so on. Those who are closer to the application, BU, or function may end up being a great resource during this phase.

And finally, events shouldn’t be confined to perimeter systems only. If we look at security logging and are concerned about attacks, we need to build signatures for the entire attack process, not just our perimeter defenses, which fail us 50% of the time. Ask yourself: if we missed the attack at the perimeter, how long would the attacker have access to our network and systems until we noticed? If the Verizon DBIR report is any indication, the answer may be weeks to months.
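
As a rough illustration of what “signatures for the entire attack process” can mean, the sketch below (simplified, with invented event names and hosts) flags a host that shows activity at more than one stage of an attack within a time window, rather than relying on a single perimeter alert:

from collections import defaultdict
from datetime import datetime, timedelta

# Simplified, hypothetical normalized events: (timestamp, host, stage).
# The stages stand in for post-perimeter activity we should also be watching.
events = [
    (datetime(2012, 4, 10, 9, 5),   "ws-114", "exploit_delivery"),   # missed at the perimeter
    (datetime(2012, 4, 10, 9, 7),   "ws-114", "outbound_beacon"),
    (datetime(2012, 4, 10, 11, 30), "ws-114", "internal_recon"),
    (datetime(2012, 4, 10, 10, 0),  "ws-201", "outbound_beacon"),    # single stage, lower concern
]

WINDOW = timedelta(hours=4)

# Group events by host and flag hosts showing two or more distinct stages
# inside the window -- evidence the attack progressed past the perimeter.
by_host = defaultdict(list)
for ts, host, stage in events:
    by_host[host].append((ts, stage))

for host, items in by_host.items():
    items.sort()
    start = items[0][0]
    stages = {stage for ts, stage in items if ts - start <= WINDOW}
    if len(stages) >= 2:
        print(f"ALERT: {host} shows multi-stage activity: {sorted(stages)}")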

Question 3. I’ve defined my events, prioritized them, and linked them to both trigger log sources and investigation log requirements.  Now what?

Hate to say it, but this may be the hardest part of the process. Hard because it assumes your company has asset management under control. And I don’t mean being able to answer where a particular piece of hardware is at a given moment. I mean being able to match an asset up to its business function, use, application, support, and ownership information, from both the underlying services layer (i.e. OS, web server, etc.) and the application owner. All of this is in addition to the standard tracking of a decent asset management program, such as location, status, network addressing, etc. If you lack this information you may be able to start gathering the necessary asset metadata from various sources that may (hopefully) already exist. Most companies have some rudimentary asset tracking system, but you could also leverage output from a recent business impact analysis (BIA), or even the output from the vulnerability assessment process…assuming you perform periodic discovery of assets. Tedious? Yes.
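
If you do have to cobble it together, a first pass can be as unglamorous as joining whatever sources exist on a common key. A minimal sketch, assuming hostname is usable as that key and with field names invented for illustration:

# Hypothetical exports from three imperfect sources, keyed by hostname.
asset_tracking = {"fin-app-01": {"location": "DC-2", "ip": "10.1.4.20"}}
bia_output     = {"fin-app-01": {"business_function": "AP/AR processing", "criticality": "high"}}
va_discovery   = {"fin-app-01": {"os": "Windows Server 2008", "services": ["IIS", "MSSQL"]},
                  "unknown-77": {"os": "Linux", "services": ["ssh"]}}   # found only by scanning

# Merge per host; anything discovered by the VA scan but absent from asset
# tracking is exactly the gap we want surfaced.
all_hosts = set(asset_tracking) | set(bia_output) | set(va_discovery)
merged = {}
for host in all_hosts:
    record = {}
    for source in (asset_tracking, bia_output, va_discovery):
        record.update(source.get(host, {}))
    record["in_asset_tracking"] = host in asset_tracking
    merged[host] = record

for host, record in merged.items():
    if not record["in_asset_tracking"]:
        print(f"Untracked asset discovered: {host} -> {record}")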

Let’s assume we were able to cobble something together that is reasonable for asset management.  Using our top “x” list we can identify all of the log sources and match those up to the required assets.  Once we know all of the sources we need to:

  1. Ensure that all assets that are required to log, based on our events, have logging enabled and to the correct level, and;
  2. That new assets which match a log source type from our event list go through step 1 above, and;
  3. The assets we do have logging to the SIEM continue to log until they are decommissioned. If they stop logging, we can investigate why.

One client of mine called this a Monitored Asset Management program, or something to that effect, which I thought was a fitting way to describe the process. This isn’t as difficult as one may think, given that the systems logging to our SIEM tend to be noisy, so a system that goes dead quiet for a period of time is an indicator of a potential issue (i.e. it was decommissioned and we didn’t know, someone changed the logging configuration, or it is live yet has an issue sending (or us receiving) the logs). One thing that does slip by this process is if someone changes the logging level to less than what is required for our event to trigger, thus blinding the SIEM until the level is changed back to the required setting.
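
A minimal sketch of that “monitored asset” check, assuming you can export a last-seen timestamp per expected log source from the SIEM or collector (the input format here is invented):

from datetime import datetime, timedelta

# Hypothetical input: last time each expected log source was heard from,
# e.g. exported from the SIEM/collector as "hostname,last_seen".
expected_sources = {
    "dc-01":      datetime(2012, 4, 10, 8, 55),
    "proxy-01":   datetime(2012, 4, 10, 9, 1),
    "fin-app-01": datetime(2012, 4, 8, 23, 14),   # quiet for almost two days
}

now = datetime(2012, 4, 10, 9, 5)
MAX_SILENCE = timedelta(hours=6)   # tune per source type; AD is noisier than an app server

# Flag any source that has gone quiet longer than its allowed silence window
# so someone can find out whether it was decommissioned, reconfigured, or broken.
for host, last_seen in sorted(expected_sources.items()):
    silence = now - last_seen
    if silence > MAX_SILENCE:
        print(f"INVESTIGATE: {host} has not logged for {silence} (last seen {last_seen})")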

In addition to the asset management, we should test our events for correctness at this point. We should be able to manually trigger each event type and watch as it comes in to the SIEM or dashboard. I can admit I have made this mistake in the past, believing that there was no way we could have screwed up a query or correlation so badly that the event would never trigger…but we did. You should also have a plan to test these periodically, especially the low-volume, high-impact types of events, to ensure that nothing has changed and the system is working as designed.
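
One lightweight way to make that periodic test repeatable is to inject a synthetic log line that matches each rule’s trigger condition and then confirm the alert actually fires. The sketch below only covers the injection half over plain syslog; how you verify the alert showed up depends entirely on your SIEM, so that part is left as a comment. The collector address and test cases are invented:

import socket
from datetime import datetime

SIEM_COLLECTOR = ("10.1.1.50", 514)   # hypothetical syslog collector address

def send_test_event(message: str) -> None:
    """Send a synthetic syslog message (UDP, facility local0, severity warning)."""
    payload = f"<132>{datetime.now().strftime('%b %d %H:%M:%S')} alert-test: {message}"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("utf-8"), SIEM_COLLECTOR)
    finally:
        sock.close()

# Each entry pairs a rule with a log line known to satisfy its trigger logic.
TEST_CASES = {
    "EVT-001 failed-then-successful admin logon": "TEST user=admin result=failure count=10",
    "EVT-002 outbound to known-bad host":         "TEST dst=198.51.100.23 action=allowed",
}

for rule, line in TEST_CASES.items():
    send_test_event(line)
    # Verification is SIEM-specific: query the alert queue, confirm the rule
    # fired within a few minutes, and page someone if it did not.
    print(f"Injected test event for: {rule}")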

Question 4. To MSSP, or not to MSSP, that is the question.  Do you need an MSSP and if so what is their value?

This is also a tough question to answer, as it always “depends”. Most companies don’t have the necessary people, skills, or availability to monitor the environment in a way which accomplishes the mission we set for ourselves in step 1. That tends to lead to the MSSP discussion of outsourcing this to a 3rd party who has the people and time (well, you’re paying for it, so they’d better) to watch the events pop up on the console and then do “something”.

Let me start with the positive aspects of using an MSSP before I say anything negative. First, they do offer “staff on demand”, which may be a good way to get the program off the ground, assuming you require a 24×7 capability. That is a question that needs to be answered in step 1 as well: if we received an alert at 3am, do we have the capability to respond, or would it be handled by the first security analyst on our team in the morning? 24×7 monitoring is great, assuming you have the response capability as well. Second, they do offer some level of comfort in “having someone to call” during an event or incident. They tend to offer not only monitoring services but may also have response capabilities, threat intelligence information (I’ll leave the value of that one up to you), and investigation.

Now on to the negatives of using an MSSP. First, they are “a SOC looking at a SIEM console”, not “your SOC who cares about your business”. The MSSP doesn’t view the events in the same business context as you do unless you give them that context and then demand that they care. Believe me, I’ve tried this route and it leads to frustrating phone calls with MSSP SOC managers and then the sales guy who offers some “money back” for your troubles. Even if you provide the context of the system, the network architecture, and all the necessary information, there is no guarantee they will use it. To give you a personal example, we used an unnamed MSSP and would constantly receive alerts from them stating that a “system” was infected, as it was seen browsing and then downloading something bad (i.e. a JavaScript NOOP sled or an infected PDF). That “system” turned out to be the web proxy 99.9% of the time. To show how ridiculous this issue was, all you had to do was look in the actual proxy log record, which was sent to them, to determine the network address (and host name) of the internal system that was involved in the event. Side note: they had a copy of the network diagram and a system list which showed each system by name, network address, and function. Any analyst who has ever worked in a corporate environment would understand the stupidity of telling us that the web proxy was potentially infected. Second, MSSPs, unless contractually obligated, may not be storing all of the logs you need during an incident or investigation. Think back to the answer to question 2 for a moment, where we defined our events, trigger logs, and the logs required to further investigate an event. What happens if you receive an event from the MSSP and go back to the sources to pull the necessary logs to investigate, only to find they were overwritten? As an example from my past (and this depends on traffic and log settings), Active Directory logs at my previous employer rolled over every 4 hours. If I wasn’t storing those elsewhere I may have been missing a necessary piece of information. There are ways around this issue which I plan on addressing in a follow-up post on SOC/SIEM/IR design.
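
The proxy example is also a good illustration of how little analysis was actually required. A rough sketch, assuming a Squid-style access log line (the exact format will differ by proxy and configuration), that pulls out the internal client hiding behind the “infected system”:

import re

# A Squid-style access log line (format varies by proxy/configuration):
# timestamp elapsed client_ip action/status bytes method url user hierarchy type
log_line = ("1334050000.123   312 10.20.31.44 TCP_MISS/200 51234 GET "
            "http://badsite.example/evil.pdf - DIRECT/203.0.113.9 application/pdf")

match = re.match(r"^\S+\s+\d+\s+(?P<client>\S+)\s+\S+\s+\d+\s+(?P<method>\S+)\s+(?P<url>\S+)", log_line)
if match:
    # The "system" the MSSP flagged was the proxy; the host that actually
    # fetched the suspicious object is the client recorded in the proxy log.
    print(f"Internal client: {match.group('client')} requested {match.group('url')}")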

Question 5. Anything else that I need to consider?  What do others miss the first time around, or even after deploying a SIEM?

To close this post I’ll offer some additional suggestions beyond the (what I feel are obvious) ones above. People are very important in this process, so regardless of the technology, you’re going to need some solid security analysts with skills ranging from log management to forensics and investigations. One of the initial barriers to launching this type of program tends to be a lack of qualified resources in this area. It may be in your best interest to go the MSSP route and keep a 3rd party on retainer to scale your team during an actual verified incident. Also, one other key aspect of the program must be a way to measure the success, or failure, of the program and processes. Most companies start with the obvious metric of “acknowledged” time…the time between receiving the event and acknowledging that someone saw it and said “hey, I see that”. While that is a start, I’d be more concerned that the resolution of the event was within the SLAs we defined as part of the program in the early stages. There is a lot more I could, but won’t, go into here on metrics, which I’ll save for a follow-up post. In my next post I’ll also talk about “tiering” the events so that events with a defined response can take an alternate workflow, and more interesting events which require analysis will be routed to those best equipped to deal with them. And finally, ensure that the development, or modification, of the overall incident response process is considered when implementing a SIEM program. Questions such as how SIEM will differ from DLP monitoring, and how SIEM complements (or doesn’t) our existing investigative or forensics tool kit, will need to be answered.
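
To make the acknowledgment-versus-resolution point concrete, here is a minimal sketch (field names and SLA numbers invented) that measures both against resolution SLAs defined per event tier:

from datetime import datetime, timedelta

# Hypothetical closed events exported from the ticketing/alert workflow.
closed_events = [
    {"id": "INC-101", "tier": "critical",
     "raised":   datetime(2012, 4, 9, 3, 2),
     "acked":    datetime(2012, 4, 9, 3, 20),
     "resolved": datetime(2012, 4, 9, 13, 45)},
    {"id": "INC-102", "tier": "routine",
     "raised":   datetime(2012, 4, 9, 14, 0),
     "acked":    datetime(2012, 4, 9, 16, 5),
     "resolved": datetime(2012, 4, 10, 10, 0)},
]

# Resolution SLAs per tier -- these are the numbers that actually matter,
# not just how fast someone said "hey, I see that".
RESOLUTION_SLA = {"critical": timedelta(hours=8), "routine": timedelta(hours=48)}

for e in closed_events:
    time_to_ack = e["acked"] - e["raised"]
    time_to_resolve = e["resolved"] - e["raised"]
    breached = time_to_resolve > RESOLUTION_SLA[e["tier"]]
    print(f"{e['id']}: ack {time_to_ack}, resolve {time_to_resolve}, "
          f"SLA {'BREACHED' if breached else 'met'}")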

Conclusion

To recap the simple steps presented here:

  1. Define your program with a focus on process and people as opposed to a “technology first” approach
  2. Define the events, risk ranked, that matter to your organization and link those to both the required trigger log sources as well as logs required to investigate the event
  3. Ensure that the required logs from the previous step are available, and continue to be available to the SIEM system
  4. Consider the use of an MSSP carefully, weighing the benefits and drawbacks of such an approach
  5. Lots of other items in terms of design, workflow, tracking and the like need to be considered (hopefully I’ll motivate myself to post again with thoughts on SOC/SIEM/IR design considerations)

While I think the list above and this post are quite rudimentary, I can admit that I made some of the mistakes I mentioned the first time I went through this process myself. My excuse is that we tried this for ourselves back in 2007, but I find little excuse for larger organizations making these mistakes some 5 years later. Hopefully you can take something from this post, even if it is to disagree with my views…I just hope it encourages some thought around these programs before starting on the deployment of one.
