Normally I’d be posting something related to the Sony hack, why it doesn’t really matter whether it was or was not the DPRK, and why everyone is so focused on the destructive malware rather than the fact that Sony made many mistakes in the years prior that ultimately led to what we are witnessing in the media today, but I digress…
I’ve been doing this teaching thing for quite some time, 7 and a half years to be exact, and recently went back in time and added up all of the students that have had a course with me…which totals out to just over 800 as of last quarter. I started thinking about not only how I got here but also how my advice and teaching style have changed over these years. In thinking back, the one question that inevitably comes up every quarter in class is “what does it take to be successful in this field?” to which my answers are almost always the same. I’m not basing my answer to this question solely on what I did or how I got to where I am; my answer is based on a combination of my own experiences along with traits I’ve observed in others that I’ve worked with throughout my career who I consider to be great security folks in general. I also think there are traits that need to be shed by some folks in our field, which I’m also happy to convey to those who ask.
So why am I writing this post? Well, when I first started teaching I attempted to cater to everyone in the class and tried not to offend anyone. Perhaps with age, and after many years of trying to make every student happy, I realized I was doing the students a disservice. I wasn’t exactly treating them in a way that would clue them in on the general expectations of those in our field or in a way that would force them to learn traits that would make them successful. However, and NOT to my surprise, I noted some students just don’t like this new method. Some students just don’t like to be told that they are going to have to work hard, some just don’t like the fact that they may have to research something on their own, and some worry more about the grade than what they take away from the course that they can apply in their current roles or in the future. At the end of every quarter the school asks (it used to be mandatory) students to provide course evaluations back to the instructor along with comments on strengths, weaknesses, what they liked, didn’t like and so on. And every quarter I’m both thrilled and dismayed after reading the comments. Of course some people really enjoyed the course, some disliked it, and I expect that…but some disliked it for reasons you wouldn’t think. And what is most dismaying is that some of these reasons fly directly in the face of what I think it takes to be successful in our field…and that worries me.
So what does it take? For me, I boiled it down to three main required traits: curiosity, perseverance, and being able to learn new things on your own.
Let me start with curiosity – inquisitiveness, interest, and imagination…all synonyms of curiosity. If you don’t find yourself asking “how or why does this work?” very often then you’re probably not curious by nature. Curious people are fascinated by things, which generally leads to a burning desire to know why and how things in this world work. Curious people have been known to take things apart (sorry mom) and sort of put them back together. Curious people like to build things as well, because starting with the pieces helps them to understand how something works; then they are likely to take what they just built apart and try to make it better. I pair imagination with this trait as curious people tend to find themselves thinking of new ways of doing things or finding unique solutions to problems. We often find ourselves needing to ask how something works, or coming up with new or novel solutions in our field…so I put curiosity as one of my top 3.
The second is perseverance. I feel this is a strong trait of people I see as being successful across many different fields, but also feel it is critical to carry this trait if your field is information security. It is easy to get frustrated in our field just based on the breadth of knowledge it requires. I don’t mean to say you need to know everything, but as my career progressed I found myself leaning on knowledge that wasn’t directly related to security that allowed me to see the bigger picture of things and how what I was doing was going to affect things that I previously wasn’t thinking of. We also need perseverance as we are often beaten down and blamed for “IT not working like it should because of security” or being the “team that says no all the time”. I’m not advocating that you take a strong stance each and every time someone challenges you (being stubborn is not the same as perseverance); you need to pick your battles, and when you do you will need to stand your ground. This trait leads me to the last one…where perseverance is required to not give up on something just because it is hard.
Learn how you learn, so that you can teach yourself new things in the future. When I was an undergrad I hadn’t yet learned how I learn best, or better yet, how to teach myself. By the time I was a grad student I had figured it out and learning, and courses in general, became much easier and more fun, which allowed me to focus on how to apply my newly gained knowledge to both past and future problems I would encounter. I find myself learning all the time just to stay current, so if you’re going to stay current in our field then expect to become a lifelong learner. Yes, I know this also applies to other fields as well, but I feel it is critical in our field given the pace of change. I used to think this was a “young” student’s problem or a generational issue, but over time I found that to be a misconception on my part. This seems to affect all generations equally based on my observations. My advice is generally pretty simple: if you’re not curious or inquisitive and you can’t stick it out through perseverance it is unlikely that you’ll find yourself being a lifelong learner. One thing that I get in the evaluations each and every quarter since I’ve changed my style is that “I can’t believe you said that we should use Google to answer our own questions”…which blows my mind. When I started in this field I had Hacking Exposed and IRC…today’s new students of the field have Google, hundreds of published and unpublished security books, and way more educational resources than ever before. So why not use them to your advantage? Why is being asked to learn something on your own such a bad thing?
To conclude, I realize there are other traits that are required in order to be successful, but I feel that without the three I boiled it down to, the others don’t matter. Finally, I realize that some people who aren’t strong in these three traits will still stay in the field and be satisfied with putting years into the field only to push buttons every day. Which reminds me, I was watching Caddyshack the other day and there’s a line where Judge Smails tells Danny that “the world needs ditch diggers too”…relating that to this post I always make it “the world needs firewall admins too” in my mind.
In September of this year a large dump, approximately 5 million, of what was claimed to be Google account usernames and passwords was dumped onto the internet. While Google’s own analysis of the dump showed that only 2% of the accounts would have worked to allow access into Google/Gmail accounts (see: http://googleonlinesecurity.blogspot.com/2014/09/cleaning-up-after-password-dumps.html) I still found it interesting to analyze the password dump as it offers a glimpse into how people choose their passwords…I guess I can’t shed the pen tester in me or my love of breaking and analyzing passwords.
In the past (circa 2006-7) I completed a similar analysis on ~100k cracked Active Directory passwords that were harvested through a few pen tests in 3 different industry verticals (energy, oil/gas, and healthcare). The idea was to get a view into not only how people chose their passwords but also to understand how to break them more easily. At the time the only available password dumps and research I could find were based on passwords from websites (e.g. MySpace phishing). I wanted to see how corporate users were choosing passwords rather than analyze how some 12 year old chose a password of “blink182”.
The main take-aways from my research on ~100k corporate passwords were:
- 99% of the passwords were less than 14 characters in length – and a majority were 8 characters in length
- Passwords generally had a root that consisted of a pronounceable (read: dictionary) word, or combination of words, followed by an appendage
- The appendage was:
  - A 2 digit combination
  - 4 digit dates (from 1900-2007)
  - 3 digit combinations
  - Special characters or special plus a digit
- The root or pronounceable word was rarely preceded by the appendages I listed above, but in ~5% of the cases the passwords either started with the appendage or were wrapped in appendages
- Through examination of password histories and phishing attacks I was running back then, I noticed that when users are required to change their password they generally change the appendage and not the root (e.g. password01 becomes password02, and so on)
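To make the root + appendage idea concrete, here is a minimal Python sketch of how a password could be split and its appendage bucketed into the categories listed above. This is not the tooling I used back then; the regular expressions, bucket names, and the `root_of` and `classify_appendage` helpers are all assumptions made up for illustration, and the sketch only handles ASCII letters and trailing appendages.

```python
import re

# Assumption: the "root" ends at the last ASCII letter and everything after
# it is the appendage. Leading or wrapped appendages are not handled here.
SPLIT = re.compile(r"^(?P<root>.*?[A-Za-z])(?P<tail>[^A-Za-z]*)$")


def split_password(password):
    """Return (root, appendage), or (None, None) if there is no letter at all."""
    match = SPLIT.match(password)
    if not match:
        return None, None
    return match.group("root"), match.group("tail")


def root_of(password):
    """Convenience helper: just the root part of the password."""
    return split_password(password)[0]


def classify_appendage(password):
    """Bucket the trailing appendage into the categories from the list above."""
    root, tail = split_password(password)
    if root is None:
        return "no_root"
    if tail == "":
        return "none"
    if re.fullmatch(r"\d{2}", tail):
        return "2_digits"
    if re.fullmatch(r"\d{3}", tail):
        return "3_digits"
    if re.fullmatch(r"19\d\d|200[0-7]", tail):  # dates from 1900 to 2007
        return "4_digit_date"
    if re.fullmatch(r"[^A-Za-z0-9]+\d?", tail):
        return "special_or_special_plus_digit"
    return "other"


if __name__ == "__main__":
    for pw in ["password01", "Summer2007", "hello123", "hello!1", "12345"]:
        print(pw, split_password(pw), classify_appendage(pw))
```

Comparing `root_of()` across a user's password history is one simple way to spot the "change the appendage, keep the root" habit: password01 and password02 both reduce to the same root.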
It also became quite obvious why the majority of the passwords were 8 characters in length once I reviewed the GPO applied to the domain (or users OU), which dictated a minimum length of 8 characters with complexity enabled. Why a length of 8 and complexity enabled you may ask? Well, likely because Gartner recommended this length, since the amount of time needed to brute-force an 8 character complex password using CPU-based methods was considered adequate, unless of course we are dealing with LM hashes as was the case in my initial research. GPU-based systems, as we are well aware, have no problems brute-forcing their way through 8 character complex passwords (and this was as of a few years ago).
So, on to the current password dump of Google usernames and passwords. While I realize I discounted research on website passwords earlier I made an assumption here that at some point all of those corporate policies and awareness trainings would make their way into our everyday “password” lives and into our website passwords. Plus, I already had a copy of the plaintext passwords, there were a lot of them, and I didn’t need to spend any time breaking them.
First, some raw numbers and my technique:
- There were just under 5 million passwords in the dump, of which I sampled approximately 20% (1,045,757 to be exact) for analysis. I made the assumption that this would be a statistically relevant sample and the results indicative of analysis of all ~5 million passwords
- I removed all 1-2 character passwords, leaving me with passwords of 3-64 characters in length
- I had to spend some time cleaning up the dump, removing passwords that appeared to be hashes (and not plaintext passwords) as well as removing lines that were obviously mis-prints (i.e. website addresses versus passwords, extra line breaks and the like…)
- Prior to removing duplicate passwords I did some quick counts on both common and funny passwords, and here’s what I found:
  - A very small number (~3,700) contained a swear word. I’m not telling you what I searched for, but the reason I did this was I found these to generally have the funniest combinations both in this current analysis as well as the corporate passwords mentioned above
  - A smaller than expected number of passwords (~10,500) contained “pass” or “password” in the password
  - The use of 1337 and 31337 is still a thing (albeit small) with ~2,500 passwords containing those strings
  - Nietzsche may have been correct and God may be dead as ~1,900 passwords contained the word god
- There were 255,284 duplicates which resulted in a final (de-duped) password count of 790,462
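For anyone wanting to repeat the clean-up and counting steps above, a rough Python sketch follows. It is not the exact process I ran: the `clean` and `summarize` functions, the hex-length heuristic for spotting hashes, and the URL check are assumptions for illustration, and a real dump will need more hand-tuning than this.

```python
import re

# Assumption: lines that are exactly 32 or 40 hex characters are treated as
# MD5/SHA-1 style hashes rather than plaintext passwords.
HEX_HASH = re.compile(r"^(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40})$")
# Assumption: obvious web addresses are mis-prints, not passwords.
URL_LIKE = re.compile(r"^(?:https?://|www\.)", re.IGNORECASE)


def clean(lines, min_len=3, max_len=64):
    """Drop too-short/too-long entries, hash-looking lines, and URLs."""
    kept = []
    for line in lines:
        password = line.rstrip("\r\n")
        if not min_len <= len(password) <= max_len:
            continue
        if HEX_HASH.match(password) or URL_LIKE.match(password):
            continue
        kept.append(password)
    return kept


def summarize(passwords, needles=("pass", "1337", "god")):
    """Count substring hits before de-duplication, then count duplicates."""
    hits = {
        needle: sum(1 for p in passwords if needle in p.lower())
        for needle in needles
    }
    unique = set(passwords)
    duplicates = len(passwords) - len(unique)
    return hits, duplicates, len(unique)


if __name__ == "__main__":
    sample = [
        "ab\n",                                  # too short, removed
        "password1\n",
        "password1\n",                           # duplicate
        "d41d8cd98f00b204e9800998ecf8427e\n",    # looks like a hash, removed
        "http://example.com\n",                  # web address, removed
        "godzilla\n",
    ]
    print(summarize(clean(sample)))
```

Note that the substring counts are taken before de-duplication, matching the order of the steps in the list above.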
Another reason I wanted to complete this analysis was that it gave me a reason to try out the Password Analysis and Cracking Kit (PACK). PACK is a Python-based wordlist (in my case password) analysis script and I would recommend this tool for both password analysis as well as taking a peek into your large wordlists to see if they are valuable or not based on what you’re trying to crack (e.g. 7 character passwords don’t make sense for WPA-PSK breaking). The script not only shows you the distribution by length (which would be trivial in Excel, I know), but it also analyzes the character sets, complexity, and shows what Hashcat masks would work best to break X percentage of the passwords in the list.
Here’s what PACK had to say about my Google sample of 790,462 passwords:
- Length distribution (in the graphic below) showed that the largest percentage of passwords was in the 6-10 character range which is expected. What I didn’t expect was that 8-10 would make up a majority of the passwords by length (~59.6% of all passwords)
- Complexity by percentage was highest for lower alpha-numeric followed by lower alpha-only (~79% of all passwords)
- The simple masks show that a majority of the passwords are a string followed by digit (root + appendage as mentioned above) and a simple string (~66% of all passwords)
- The advanced masks for Hashcat that break the most passwords (~12% of all passwords) are ?l?l?l?l?l?l?l?l and ?l?l?l?l?l?l, which means 12% of the passwords are 6 or 8 character lowercase alpha-only passwords
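If the mask notation is new to you, the sketch below shows the basic idea of turning each password into a Hashcat-style mask (?l lowercase, ?u uppercase, ?d digit, ?s special) and counting the most common ones. This is my own simplified illustration and not PACK's code; the `to_mask` and `top_masks` names are made up, and lumping every non-alphanumeric character (including non-ASCII) into ?s is a simplification.

```python
import string
from collections import Counter


def to_mask(password):
    """Map each character to a Hashcat-style charset placeholder."""
    parts = []
    for ch in password:
        if ch in string.ascii_lowercase:
            parts.append("?l")
        elif ch in string.ascii_uppercase:
            parts.append("?u")
        elif ch in string.digits:
            parts.append("?d")
        else:
            parts.append("?s")  # simplification: everything else is "special"
    return "".join(parts)


def top_masks(passwords, n=3):
    """Return (mask, count, percent of list) for the n most common masks."""
    counts = Counter(to_mask(p) for p in passwords)
    total = len(passwords)
    return [
        (mask, count, 100.0 * count / total)
        for mask, count in counts.most_common(n)
    ]


if __name__ == "__main__":
    sample = ["password", "letmeinn", "monkey", "Summer07", "abc123"]
    for mask, count, pct in top_masks(sample):
        print(f"{mask}  {count}  {pct:.1f}%")
```

A password like Summer07 comes out as ?u?l?l?l?l?l?d?d, while an all-lowercase 8 character password comes out as ?l?l?l?l?l?l?l?l.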
So what did the results show? Well, I think they showed what I thought they would show…passwords that people choose are both predictable and unchanged over the last 7+ years. In addition to my research, KoreLogic Security had a good presentation on password topology histograms from corporate pen tests that showed results that were similar to those above. Two differences stood out from my research: first, I was breaking LM passwords back in 2006-7 so I wasn’t as concerned with case-permutations (I didn’t care where the upper alpha character was as that was trivial to determine if needed), and second, the Google sample I analyzed seemed to favor all lower case roots. For comparison, my top 3 masks and the Kore top 3 masks were:
(Kore* – My Sample)
- ?u?l?l?l?l?l?d?d – ?l?l?l?l?l?l?l?l
- ?u?l?l?l?l?l?l?d?d – ?l?l?l?l?l?l
- ?u?l?l?l?d?d?d?d – ?l?l?l?l?l?l?l?l?l
*Note I’m only using the Kore Fortune 100 sample, the second sample in their set simply added a special character to the last position of the mask.
What this comparison showed was that corporate passwords appear to be more complex but just as long as the passwords people generally choose for websites (i.e. Gmail, Ymail, etc.) for the top 3 masks and that the techniques for breaking non-corporate and corporate passwords may need to be adjusted depending on the target at-hand. What this also showed was that corporate users tend to hover around a few topologies of passwords and my sample showed a more even distribution of masks. For example, the top 3 masks from the Kore presentation represent ~36% of all passwords from the sample and mine represent only ~16%. I’d have to get 10 masks deep to get to 36% of all passwords. It also showed that people tend to choose a path of least resistance to creating a password. Many websites and web services generally do not require password complexity, hence what I believe is driving the higher number of all lower alpha passwords in my sample versus my early research and the more recent Kore research into corporate passwords.
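The "how many masks deep do I need to go" question is just a cumulative sum over the mask counts, sorted from most to least common. Here is a small Python sketch of that calculation; the `masks_needed` function is a hypothetical helper of mine, and the counts in the example are toy numbers, not figures from either sample.

```python
from collections import Counter


def masks_needed(mask_counts, target_pct):
    """Return how many of the most common masks cover target_pct of passwords.

    mask_counts is a Counter mapping mask -> number of passwords with that mask.
    Returns None if the target cannot be reached (for example, an empty Counter).
    """
    total = sum(mask_counts.values())
    covered = 0
    for depth, (_mask, count) in enumerate(mask_counts.most_common(), start=1):
        covered += count
        if 100.0 * covered / total >= target_pct:
            return depth
    return None


if __name__ == "__main__":
    # Toy distribution for illustration only.
    toy = Counter({"?l?l?l?l?l?l?l?l": 50, "?l?l?l?l?l?l": 30, "?l?l?l?l?d?d": 20})
    print(masks_needed(toy, 36))  # how deep to reach 36% coverage
    print(masks_needed(toy, 60))
```

A concentrated distribution reaches a given coverage in a handful of masks, while a flatter one needs many more, which is the difference between the two samples described above.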
Finally, I’d like to end this with an explanation of the title of the post. While in the “less than a 10th of a percent area of the passwords in my sample”, some interesting passphrases emerged especially in the 20+ character range. Besides “ilovejustinbeiber” I thought the “passwordispassword” password spoke volumes about how users view their passwords…as a hindrance to getting what they want or where they need to go.
Tl;dr – passwords still suck as an authentication mechanism and our users aren’t going to go out of their way to increase security.
Asset management is a foundational IT service that many organizations continue to struggle to provide. Worse yet, and from the security perspective, this affects all of the secondary and tertiary services that rely on this foundation such as vulnerability management, security monitoring, analysis, and response to name a few. While it is very rare that an organization has an up-to-date or accurate picture of all of their IT assets, it is even rarer (think rainbow-colored unicorn that pukes Skittles) that an organization has an accurate picture of the criticality of their assets. While some do a decent job when standing up a CMDB of mapping applications to supporting infrastructure and ranking their criticality (although many tend to use a binary critical/not critical ranking), these criticality rankings are statically assigned and if not updated over time may turn stale. Manual re-evaluation of assets and applications in a CMDB is a time-consuming task, and many organizations, after the pain of setting up the CMDB in the first iteration, are not willing or likely to make re-certification of assets and criticality rankings a priority…and it is easy to understand why. My other issue is that many CMDBs take into account the “availability” factor of an asset over the criticality of the asset from a security perspective. For example, it is not uncommon to see a rigorous change management process for assets in the CMDB with a slightly less rigorous (or non-existent) change management process for non-CMDB assets. But I digress…to summarize my problem:
- Asset criticality often does not exist or is assigned upon the asset’s entry into a central tracking mechanism or CMDB
- The effort to manually determine and recertify asset criticality is often so great that manual processes fail or produce inaccurate data
- In order for asset criticality data to be useful we may need near real time views of the criticality that change in concert with the asset’s usage
- Without accurate asset inventories and criticalities we cannot accurately represent overall risk or risk posture of an organization
The impact of inaccurate asset inventories and lack of up-to-date criticality rankings got me thinking that there has to be a better way. Being that I spend a majority of my time in the security monitoring space, and now what seems to be the threat intel and security/data analytics space, I kept thinking of possible solutions. The one factor I found to be in common with every possible solution was data. And why not? We used to talk about the problems of “too much data” and how we were drowning in it…so why not use it to infer the criticality of assets and to update their criticality in an automated fashion? Basically, make the data work for us for a change.
To start I looked for existing solutions but couldn’t find one. Yes, some vendors have pieces of what I was looking for (i.e. identity analytics), but no one vendor had a solution that fit my needs. In general, my thought process was:
- We may be starting with a statically defined criticality rating for certain assets and applications (i.e. CMDB), and I’m fine with that as a starting point
- I need a way to gather and process data that would support, or reject, the statically assigned ratings
- I also need a way to assign ratings to assets outside of what has been statically assigned (i.e. critical assets not included in CMDB)
- The rating system shouldn’t be binary (yes/no) but more flexible and take into account real-world factors such as the type/sensitivity of the data stored or processed, usage, and network accessibility factors
- Asset criticalities could be inferred and updated on a periodic (e.g. monthly) or real-time basis through data collection and processing
- The side-benefit of all of this would also include a more accurate asset inventory and picture that could be used to support everything from IT BAU processes (e.g. license management) to security initiatives (e.g. VM, security monitoring, response, etc.)
These 6 thoughts guided the drafting of a research paper, posted here (http://www.malos-ojos.com/wp-content/uploads/2014/08/DGRZETICH-Ideas-on-Asset-Criticality-Inference.pdf), that I’ve been ever so slowly working on. Keep in mind that the paper is a draft and still a work in progress; it attempts to start to solve the problem using data and the idea that we should be able to infer the criticality of an asset based on models and data analytics. I’ve been thinking about this for a while now (the paper was dated 6/26/2013) and even last year attempted to gather a sample data set and to work with the M.S. students from DePaul in the Predictive Analytics concentration to solve it, but that never came to fruition. Maybe this year…
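To give a feel for what a non-binary, data-driven rating might look like, here is a deliberately simple Python sketch. It is not the model from the paper: the `AssetObservation` fields, the weights, and the blending of a static CMDB rating with observed factors are all assumptions I made up for illustration, with every factor normalized to a 0.0-1.0 scale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssetObservation:
    name: str
    static_rating: Optional[float]  # 0.0-1.0 from the CMDB, None if not in the CMDB
    data_sensitivity: float         # 0.0-1.0, type/sensitivity of data stored or processed
    usage: float                    # 0.0-1.0, how heavily the asset is used
    network_exposure: float         # 0.0-1.0, how reachable the asset is


# Hypothetical weights; in practice these would be tuned or learned from data.
WEIGHTS = {"data_sensitivity": 0.5, "usage": 0.3, "network_exposure": 0.2}


def inferred_criticality(obs, static_weight=0.3):
    """Blend observed factors with the static rating (when one exists)."""
    observed = sum(weight * getattr(obs, factor) for factor, weight in WEIGHTS.items())
    if obs.static_rating is None:
        return observed  # asset outside the CMDB: rely on observed data only
    return static_weight * obs.static_rating + (1.0 - static_weight) * observed


if __name__ == "__main__":
    db = AssetObservation("hr-db", static_rating=0.2, data_sensitivity=0.9,
                          usage=0.8, network_exposure=0.4)
    rogue = AssetObservation("shadow-share", static_rating=None, data_sensitivity=0.7,
                             usage=0.6, network_exposure=0.9)
    for asset in (db, rogue):
        print(asset.name, round(inferred_criticality(asset), 2))
```

Re-running something like this on a monthly or near real-time basis is what would let the rating follow the asset's actual usage instead of the value someone typed into the CMDB on day one.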
While this isn’t a post on what threat intelligence is or is not, I’d be negligent if I didn’t at least begin to put some scope and context around this term as the focus of this post is on making threat data and intelligence actionable. Not to mention, every vendor and their grandmother is trying to use this phrase to sell products and services without fully understanding or defining its meaning.
First, it is important to understand that there is a difference between data and threat intelligence. There are providers of data, which is generally some type of atomic indicator (i.e. IOC) that comes in the form of an IP address, URL, domain, meta data, email address or hash. This data, in its least useful form, is a simple listing of indicators without including attribution of the threat actor(s) or campaigns with which they are associated. Some providers include the malware/malware family/RAT that was last associated with the indicator (e.g. Zeus, CryptoLocker, PlugX, njRAT, etc.) and the date associated with the last activity. Some other providers focus on telemetry data about the indicator (e.g. who registered the domain, geolocated IP, AS numbers, and so on). Moving up the maturity scale and closer to real intel are providers that track a series of indicators such as IPs, domains/subdomains, email addresses and meta data to a campaign and threat actor or group. If we add the atomic indicators plus the tactics (e.g. phishing campaigns that include a weaponized PDF that installs a backdoor that connects to C2 infrastructure associated with a threat actor or group) used by the threat actors we start to build a more holistic view of the threat. Once we understand the tactics, techniques and procedures (TTPs) and capability of our adversaries, we focus on the intent of the actors/groups or personas and how their operations are, or are not, potentially being directed at our organization. The final piece of the equation, which is partially the focus of this post, is understanding how we take these data feeds, enrich them, and then use them in the context of our own organization and move towards providing actual threat intelligence – but that is a post on its own.
Many organizations think that building a threat intelligence capability is a large undertaking. To some extent they are correct in the long term/strategic view for a mature threat intel program that may be years down the road. However, the purpose of this post is to argue that even with just a few data and intel sources we can enable or enhance our current capabilities such as security monitoring/analysis/response and vulnerability management services. I honestly chose these services as they fit nicely in my reference model for a threat monitoring and response program as well as threat intel which is at the center of this reference model. So let’s walk through a few examples…
Enrichment of Vulnerability Data
Vulnerability assessment programs have been around for what seems like forever, but mature vulnerability management programs are few and far between. Why is this? It seems we, as security professionals, are good at buying an assessment technology and running it and that’s about it. What we aren’t very good at is setting up a full cycle vulnerability management program to assign and track vulnerability status throughout the lifecycle. Some of the reasons are due to historical challenges (outlined in more detail in a research paper I posted here: http://goo.gl/yzXB4r) such as poor asset management/ownership information, history of breaking the infrastructure with your scans (real or imagined by IT), or way too many vulnerabilities identified to remediate. Let’s examine that last challenge of having too many vulnerabilities and see if our data and intel feeds can help.
Historically what have security groups done when they were faced with a large number of vulnerabilities? The worst action I’ve seen is to take the raw number of vulnerabilities and present them as a rolling line graph/bar chart over time. This type of reporting does nothing to expose the true risk, which should be one of the main outputs of the vulnerability management program, and infuriates IT by making them look bad. Not to mention these “raw” numbers generally tend to include the low severity vulnerabilities. Do I really care that someone can tell what timezone my laptop is set to? I don’t know about you but I doubt that is going to lead to the next Target breach. Outside of raw numbers, the next type of action usually taken is to assign some remediation order or preference to the assessment results. While a good start, most security teams go into “let’s just look at sev 4 and sev 5 vulnerabilities” mode which may result in what amounts to a still very large list. Enter our threat data…
What if we were able to subscribe to a data feed where the provider tracked current and historical usage of exploits, matched the exploit with the associated vulnerabilities, and hence the required remediation action (e.g. apply patch, change configuration, etc.)? This data, when put into the context of our current set of vulnerabilities, becomes intelligence and allows us to prioritize remediation of the vulnerabilities that impose the greatest risk due to their active use in attack kits as well as non 0-day exploits being used by nation state actors. As a side note, among a few vendors there is a myth being spread that almost all nation-state attacks utilize 0-days, which I find to be an odd statement given that we are so bad at securing our infrastructure through patch and configuration management that it is likely that an Adobe exploit from 2012 is going to be effective in most environments. But I digress.
So how much does using threat data to prioritize remediation really help the program in reducing risk? In my research paper (here: http://goo.gl/yzXB4r) I noted that by limiting to sev 4 and sev 5 as well as using threat data it is possible to reduce the number of systems that require remediation by ~60% and the discrete number of patches that need to be applied by ~80%. While one may argue that this may still result in a high number of patches and/or systems requiring treatment, I would counter-argue that I’d rather address 39,000 systems versus 100,000 and apply 180 discrete patches over 1000 any day. At least I’m making more manageable chunks of work and the work that I am assigning results in a more meaningful reduction of risk.
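The prioritization logic itself is simple enough to sketch in a few lines of Python. This is an illustration and not the method from the paper: the finding fields (`host`, `cve`, `severity`, `patch`), the function names, and the placeholder CVE and patch identifiers are all assumptions, and the "feed" is just a set of vulnerability IDs the provider has seen exploited.

```python
def prioritize(findings, exploited_cves, min_severity=4):
    """Keep findings that are sev 4/5 AND tied to an exploit seen in the feed."""
    return [
        f for f in findings
        if f["severity"] >= min_severity and f["cve"] in exploited_cves
    ]


def workload(findings):
    """Return (number of distinct systems, number of discrete patches)."""
    hosts = {f["host"] for f in findings}
    patches = {f["patch"] for f in findings}
    return len(hosts), len(patches)


def pct_reduction(before, after):
    """Percentage drop from before to after."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    findings = [
        {"host": "srv1", "cve": "CVE-0000-0001", "severity": 5, "patch": "patch-a"},
        {"host": "srv1", "cve": "CVE-0000-0002", "severity": 2, "patch": "patch-b"},
        {"host": "srv2", "cve": "CVE-0000-0003", "severity": 4, "patch": "patch-c"},
        {"host": "srv3", "cve": "CVE-0000-0001", "severity": 5, "patch": "patch-a"},
    ]
    feed = {"CVE-0000-0001"}  # vulnerabilities with exploits observed in use

    print(workload(findings))                    # everything the scanner found
    print(workload(prioritize(findings, feed)))  # what to fix first

    # The numbers quoted above: 100,000 -> 39,000 systems, 1000 -> 180 patches.
    print(pct_reduction(100_000, 39_000), pct_reduction(1000, 180))
```

Plugging in the figures quoted above gives a 61% reduction in systems and an 82% reduction in discrete patches, in line with the ~60% and ~80% numbers.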
Integrating Your Very First Threat Feed – How Special
In addition to creating a reference model for a security monitoring, analysis and response program (which includes threat intel) I also built out a model for implementing the threat intel service which includes a 4 step flow of: 1. Threat Risk Analysis, 2. Acquisition, 3. Centralization, and 4. Utilization. I’ll detail this model in a future post, along with the fact that in a mature service there would be a level of automation, but for now I’d like to point out that it is perfectly acceptable to build a threat intel program as a series of iterative steps. By simply performing a threat risk assessment and understanding or defining the data and intel needs an organization should then be able to choose a data or intel provider that is suitable to their goals. Ironically I’ve witnessed a few organizations that went out and procured a feed, or multiple feeds, without understanding how it was going to benefit them or how it would be operationalized…I’ll save those stories for another day. And while I’m not going to cover the differences between finished intel versus indicators/data in this post, it is possible for an organization to procure feeds (open source and commercial feeds) and instrument their network to prevent activity, or at a minimum, detect the presence of the activity.
As an example, let’s say that we have a set of preventive controls in our environment – firewalls, web/email proxies, network-based intrusion prevention systems, and end point controls such as AV, app whitelisting, and host-based firewalls. Let’s also say we have a set of detective controls that includes a log management system and/or security information and event management (SIEM) which is being fed by various network infrastructure components, systems and applications, and our preventive controls mentioned above. For the sake of continuing the example let’s also say that I’m in an industry vertical that performs R&D and would likely be targeted by nation state actors (e.g. this Panda, that Kitten) in addition to the standard crimeware groups and hacktivists. With this understanding I should be able to evaluate and select a threat intel/data provider that could then be used to instrument my network (preventive and detective controls) to highlight activity by these groups. At this point you would start asking yourself if you need a provider that covers all of the types of threat actors/groups, if you need vertical-specific feeds, and if you need to ensure that you have a process to take the feeds and instrument your environment. The answer to all three is likely to be yes.
Continuing with the example, let’s say I selected a provider that provides both analyst-derived/proprietary intel in addition to cultivating widely available open source information. This information should be centralized so that an operator can assess the validity and applicability of the information being shared and determine the course of action on how to integrate this into the preventive and/or detective controls. A simple example of this may be validating the list of known-bad IPs and updating the firewall (network and possibly host-based) with blocks/denies for these destinations. Or, updating the web proxy to block traffic to known bad URLs or domains/sub-domains. One thing that shouldn’t be overlooked here would be that we trigger an alert on this activity for later reporting on the efficacy of our controls and/or the type of activity we are seeing on our network. This type of data is often lacking in many organizations and they struggle to create management-level intel reports that are specific to the organization and that highlight the current and historical activity being observed. In addition, we could also take the indicators and implement detection rules in our log management/SIEM to detect and alert on this activity. Again, keep in mind that for an organization just standing up a threat intel service these may be manual processes that have the possibility of being partially automated in a later or more mature version of the service.
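As a rough idea of what the detection side could look like before you get anywhere near a SIEM rule, here is a small Python sketch that matches log events against a centralized indicator list and produces alerts. Everything here is hypothetical: the feed and event field names (`type`, `value`, `category`, `dest_ip`, `domain`, `action`) and the `build_lookup` and `match_events` functions are assumptions for illustration, not any vendor's format, and the addresses come from documentation-only ranges.

```python
def build_lookup(feed_rows):
    """Index validated indicators by (type, value) for fast matching."""
    return {(row["type"], row["value"].lower()): row for row in feed_rows}


def match_events(events, lookup):
    """Return one alert per event field that matches a known indicator."""
    alerts = []
    for event in events:
        for field, ioc_type in (("dest_ip", "ip"), ("domain", "domain")):
            value = event.get(field)
            if value is None:
                continue
            hit = lookup.get((ioc_type, value.lower()))
            if hit:
                alerts.append({
                    "indicator": hit["value"],
                    "category": hit.get("category", "unknown"),
                    # Keep the control's action so reports can show efficacy.
                    "action": event.get("action", "unknown"),
                    "event": event,
                })
    return alerts


if __name__ == "__main__":
    feed = [
        {"type": "ip", "value": "203.0.113.7", "category": "crimeware"},
        {"type": "domain", "value": "bad.example.com", "category": "nation-state"},
    ]
    events = [
        {"dest_ip": "203.0.113.7", "action": "blocked"},
        {"dest_ip": "192.0.2.10", "domain": "BAD.example.com", "action": "allowed"},
        {"dest_ip": "192.0.2.11", "action": "allowed"},
    ]
    for alert in match_events(events, build_lookup(feed)):
        print(alert["category"], alert["indicator"], alert["action"])
```

Carrying the control's action (blocked versus allowed) into the alert is what makes the later reporting on control efficacy possible.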
As a side note, one thing I’ve noticed from many of the SIEM vendors is how they try to sell everyone on the “intel feeds” that their product has and how they are “already integrated”. The problem I have with this “integration” is that you are receiving the vendor’s feed and not one of your choosing. If SIEM vendors were smart they would not only offer their own feeds but also open up integrations with customer-created feeds that are being generated from their intel program. As it stands today this integration is not as straight-forward as it should be; then again, we also aren’t doing a very good job of standardizing the format of our intel despite STIX/CybOX/TAXII, OpenIOC, IODEF, etc. and the transfer mechanisms (API, JSON, XML, etc.) being around for a while now.
To round out this example, it is also important to note that as we instrument our environment we track the alerts generated based on our indicators back to the category or type (e.g. nation-state, crimeware, hacktivist, etc.) and if possible track back to the specific origin of the threat (e.g. Ukrainian crimeware groups, Deep Panda, Anonymous, etc.). This is going to be key in monitoring for and reporting on threat activity so we can track historical changes and better predict future activity. We can also use this information to re-evaluate our control set as we map the attacks by kind/type/vector and effectiveness (i.e. was the attack blocked at delivery) or ineffectiveness (i.e. was a system compromised and only detected through monitoring) and map these against the kill chain. This type of information translates into both overall security architecture and funding requests very well.
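As a rough sketch of the tracking described above, the Python below tallies alerts by threat category/actor and by kill chain phase, splitting each phase into "blocked" versus "detected only". The alert field names (category, actor, phase, blocked) and the function name are assumptions I made up for the example; the phase names follow the commonly cited Lockheed Martin kill chain.

```python
from collections import Counter

# Kill chain phases, in order (Lockheed Martin model).
KILL_CHAIN = [
    "reconnaissance", "weaponization", "delivery", "exploitation",
    "installation", "command_and_control", "actions_on_objectives",
]


def summarize_alerts(alerts):
    """Count alerts per (category, actor) and per kill chain phase outcome.

    Each alert is a dict with hypothetical keys: category, actor (optional),
    phase, and blocked (True if a preventative control stopped it).
    """
    by_category = Counter()
    outcome_by_phase = {phase: Counter() for phase in KILL_CHAIN}
    for alert in alerts:
        if alert["phase"] not in outcome_by_phase:
            raise ValueError(f"unknown kill chain phase: {alert['phase']}")
        by_category[(alert["category"], alert.get("actor", "unknown"))] += 1
        outcome = "blocked" if alert["blocked"] else "detected_only"
        outcome_by_phase[alert["phase"]][outcome] += 1
    return by_category, outcome_by_phase
```

A phase where "detected_only" keeps outnumbering "blocked" is the kind of gap that belongs in the architecture and funding conversation.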
While this is a new and seemingly complex area for information security professionals it really isn’t that difficult to get started. This post highlighted only a few simple examples and there are many more that could be part of your phase 1 deployment of a threat intel service. My only parting advice would be to make sure you have a strategy and mission statement for the service, determine your threat landscape, define what you need out of your feeds and acquire them, centralize the information and utilize it by instrumenting and monitoring your environment. While some of the vendors in this space have great sales pitches and even really cool logos, you had better understand your requirements (and limitations) prior to scheduling a bunch of vendor demos.
Who likes dependencies anyway??? Not me…so here is a shell script to get Cuckoo Sandbox v1.1 installed
I realized that I was spending an inordinate amount of time when rebuilding Cuckoo Sandbox (http://cuckoosandbox.org) in my home lab just because I was starting from a fresh Ubuntu install which does not ship with all of the dependencies and packages that are required by Cuckoo. I also break this system quite often and in such spectacular ways that the only recovery mechanism is to rebuild the system from the OS up. This, unfortunately, also leads back to spending way too much time post-OS install in rebuilding Cuckoo. There has to be a better way…and so there is, using a shell script I wrote to get me up and running in no time after a rebuild.
So what do I need to run this script?
The script (located here: cuckoo_install – right-click and save as, rename to .sh) assumes you have a base install of Ubuntu 12.04LTS and that you have updated through an apt-get update and an apt-get dist-upgrade. It was also created to work specifically for Cuckoo Sandbox v1.1. Beyond that you’re on your own to set networking and the user accounts as you see fit. In my case I use the account created during OS install for everything on this system and I have a physically and logically segmented network just for the sandbox and the virtual machines used to detonate the malware. These systems are directly connected to the internet and sit behind a Cisco ASA which is logging all accepts and denies to a Splunk instance and the connection is tapped using a VSS 12×4 distributed tap and the traffic is captured using the free version of NetWitness Investigator. I’m also running a VM instance of INetSim (http://www.inetsim.org) that supplies DNS, FTP, and other services that may be required by the malware (i.e. through faking a DNS response to point the malware to a system I control).
What happens when I run the script?
Assuming your base Ubuntu system has connectivity to the internet, the script will proceed to download and install all of the dependencies and packages required to run Cuckoo Sandbox v1.1 (again, this assumes you’re on 12.04LTS as a base OS). There is a built-in check at the start that verifies your version and errors out if you’re on something other than 12.04LTS. If you think this will work even if you’re not on 12.04LTS you can, at your own risk, comment out this section and force the script to run. The script runs in sections and requires that you hit enter before proceeding to the next section. I put this in so you could review the status of a section (i.e. no errors) before continuing on to the next section of the script. If you find that annoying simply comment out all of the “read” commands in the script and it will run start to end, however it becomes difficult to identify any install errors given the length of the output. Other than that the script will install what is required for Cuckoo, and after running you can address any errors or issues with the installed components to ensure everything is installed correctly.
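The script itself is a shell script and isn't reproduced here, but the two behaviors described above (the release check up front and the pause between sections) can be sketched in a few lines of Python. This is my own illustration, not the script's code: the function names are made up, and it assumes the release information comes from the DISTRIB_ID and DISTRIB_RELEASE fields that Ubuntu writes to /etc/lsb-release.

```python
def parse_lsb_release(text):
    """Parse KEY=value lines (the /etc/lsb-release format) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def is_supported_release(text, required_id="Ubuntu", required_release="12.04"):
    """Return True only for the release the installer was written for."""
    fields = parse_lsb_release(text)
    return (fields.get("DISTRIB_ID") == required_id
            and fields.get("DISTRIB_RELEASE") == required_release)


def run_sections(sections, pause=input):
    """Run each (name, action) pair, pausing for review after every section.

    Passing a no-op as `pause` is the equivalent of commenting out the
    "read" commands: everything runs start to end without stopping.
    """
    for name, action in sections:
        print(f"== {name} ==")
        action()
        pause("Review the output above, then press Enter to continue...")
```

In use you would read /etc/lsb-release, bail out if is_supported_release() returns False, and otherwise hand your install steps to run_sections().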
What do I need to do following the script to get Cuckoo up and running?
This is going to be highly dependent on your individual setup, however you need to get your virtual machines built and/or transferred into VirtualBox and set the snapshots that will be used (plenty of good info on the net on this step such as http://santi-bassett.blogspot.com/2013/01/installing-cuckoo-sandbox-on-virtualbox.html). You also need to add your user account to the virtualbox group, download the malware.py file if you plan on using Volatility, and set up your network for your particular needs.
Can I modify the script and/or what if it doesn’t work?
I’m posting this script as-is. It works for my needs in my lab environment which may not be the same as yours. Feel free to mod it as required, however all I ask is if you make significant improvements to the script that you share it back to the community. I’m not going to actively maintain the script or make modifications in the future as this is a one shot deal (I have a $dayjob that actually pays the bills).
Note: If you’re new to Cuckoo or Ubuntu I’d actually recommend trying a manual install if you have time. I realized I learned quite a lot about the required packages and how the system functions when I struggled to get Cuckoo up and running a few years ago. It makes troubleshooting issues I encounter now much easier.
Research paper on Snort rule development for the major fault attack on Allen Bradley MicroLogix 1400 controllers
As part of a course I took last quarter at DePaul University on critical infrastructure security I drew the straw on one of our group labs which required that we write a Snort signature for an attack on the Allen Bradley MicroLogix 1400 series controllers. The attack, written by Matt Luallen of Cybati in September of last year for Metasploit, sets a bit in a data file on the controller which indicates to the controller that there is a major logical fault. This attack stops the running program on the controller, and the fault must be manually cleared (either through physical interaction with the controller or by clearing the fault using the RSMicroLogix application).
The results of this research project will likely be published in the future in a more formal fashion, but until then I wanted to post a sneak peek at the report for those who may be interested. Note that I wrote this a few months ago and held off on publishing it as it was being copy edited for publication. As I assume that process has died I am left with no choice but to publish this work…no sense in holding on to something that could be of value to someone else.
A link to the PDF is here.
Since I don’t have time to actually write the articles I want to, I thought I’d add a post to share my collection of photos of broken systems. These are systems I find in public places, like hotels, airports, and grocery stores and I take a picture. So here’s my collection:
The photos above are pretty old, but as I was walking past the Chase ATM at the local Dominick’s I noticed that a start menu was displayed on the screen. And being curious I had to touch it to see what was installed. Oddly enough it had Windows Movie Maker, which I find to be a strange application to have installed on an ATM. Also curious that ActiveState Perl and Acrobat Reader were installed…would seem to me that the image for the ATMs was bloated.
Above is the airport collection, although I could swear that I had more of these…Flight notification screens, baggage claim, and an internet kiosk. By the way, who in the hell uses this system? I have a feeling that some sucker bought this in a “get rich overnight” scam where they “own” the system and sit at home and make it big! Suckers.
As much as I enjoy staying at the Cosmopolitan in Vegas they always seem to have an issue with that cool display system they run in the lobby, throughout the casino, and the elevators. The first one was just a licensing issue with the software, which I originally took so I could remember the name of the app. Surprisingly, it isn’t that expensive. The second photo is from the LCD screens in the elevator.
Keeping with the elevator track, the above two photos are from the elevators in the Aon Center where my office is…I think. I’m not really sure since I only go there once a month or so. I wasn’t sure if I should be worried if I’d get stuck in the elevator, but then I remembered that the screen on the left shows the “elevator” news so no one needs to make eye contact on the long elevator rides up to the 59th floor. Assuming that the IP is a true public address, it comes back as owned by Savvis in Missouri somewhere just south of Chesterfield. Oh, and the kernel version is from 2007 and the SSID is bay15.
The last three are randoms. First is from DePaul University in the lobby where the app running the kiosk crashed…why? In the second one I think Best Buy needs to dispatch the Geek Squad, although this seems to be a Flash issue if watchdog.sys is causing the BSOD. Finally, a bar with a broken poker machine….running Linux.
JSPSpy is an interesting tool that once uploaded to a server that supports JSP pages gives you a user interface on the web server itself. Its power comes from the ability to upload/download, zip, and delete files at will on the web server as well as spawn a command prompt. In addition, if you are able to gain credentials to a database server serving the web application (say through an unencrypted database connection string) it has a database connection component as well which would allow one to crawl a backend database server for information.
There is one issue with the code, which I find odd given that it was created in 2009, in that the SQL driver and URL for the connection using JDBC is incorrect. Well, not incorrect, the issue is that it supports SQL Server 2000. Starting with SQL 2005 the driver and URL were changed…and the code for JSPSpy which is easily accessible on the internet has an old connection string.
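For reference, the difference between the two Microsoft JDBC drivers comes down to the driver class name and the URL prefix. The small Python sketch below just lays the two side by side and formats a URL. The dictionary keys, the function name, and the default port of 1433 are my own choices for illustration, and none of this is taken from the JSPSpy source.

```python
# Driver class and URL template for the SQL Server 2000 era JDBC driver
# versus the driver that shipped for SQL Server 2005 and later.
JDBC_SETTINGS = {
    "sqlserver2000": {
        "driver": "com.microsoft.jdbc.sqlserver.SQLServerDriver",
        "url": "jdbc:microsoft:sqlserver://{host}:{port};DatabaseName={database}",
    },
    "sqlserver2005": {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "url": "jdbc:sqlserver://{host}:{port};databaseName={database}",
    },
}


def build_jdbc_url(version, host, database, port=1433):
    """Return the JDBC URL for the given driver generation (hypothetical helper)."""
    return JDBC_SETTINGS[version]["url"].format(host=host, port=port, database=database)
```

Note how easy it is to miss: the class names differ only in the order of "jdbc" and "sqlserver", and the URL loses the "microsoft:" segment.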
In addition, there are a few more UI’s for crawling a SQL backend using JSP floating around as well. I’ve included one in this demo as well.
The video demonstrates the power of JSPSpy in my demo environment consisting of Java 1.6, Tomcat 6.0, SQL 2005 and Windows 2003 Server. UPDATE: I updated the video on this as it appears it didn’t convert correctly and only shows in SD, not HD so the text is very hard to read. The new video below is in HD.
As a security professional it’s not often that people try to socially engineer me, especially over the phone. But, I thought the call I received was worthy of both a big laugh as well as a post. This got me thinking as well…is the going hourly rate for a person to sit and call people on the phone now low enough that it beats out automated malware and drive-bys? While I doubt that is the case I have to assume that since it is still a running scam, and I saw articles on this from August of this year, that they are making money. It also made me laugh as I took a trip down memory lane of having to do this as a consultant in a prior life, although I’d like to think my version was more convincing.
If you get it, here’s how the scam goes:
In my case it was a blocked call, and the person on the other end of the phone states they are with Microsoft. My guy’s name was Victor Dias (Indian accent) which didn’t quite make sense given his difficulty with spelling it when I asked. I’m kicking myself for not having a Win7 VM running at the time and following through on his instructions to see how this all ends, but I digress. He asked me to do some rudimentary things, such as go to Start, search for “ev”, and open the event viewer. Then he asked me if I have any errors or warnings in the Application logs, or if I have had any pop-ups stating that an application had crashed. Next, he asked if I had AV running (which of course I said no to) so he said “your computer is probably infected with the malwares (sp) and junks (sp), can you open Remote Assistance and allow me to connect so I can run a scan to remove the junks (sp)?”
Awesome! Going back to why I wanted to kick myself: I didn’t have a Windows 7 system in front of me…I so wanted to see what he was going to do, and in hindsight what I may have been able to do to him (disclaimer: I’m not advocating offensive operations, wink wink). At this point I was done with the scam and started to ask him a series of questions. What is your name? Can you spell that? What is your MS employee ID number? BTW, he answered with 44398…ummm, pretty sure they are 6 digits and not 5, to which he said “oh yes, mine is 5 digits”. In fact, you can find this info online, so a little research prior to the scam never hurts (you’re welcome for the free advice, Victor). What finally broke him was when I asked where he was calling from. Manvil, TX, or Manville, TX…he couldn’t spell the name of the city he was in. Then I asked which major city in Texas was closest to his location…he couldn’t answer. So when I gave him options of cities he simply hung up, knowing he wasn’t getting anywhere with me.
So, I have a Win7 VM, my copy of NetWitness, and some surprises ready in case Victor calls back. Here’s hoping to hear from you, Victor.
I attended an ISACA presentation at DePaul the other evening given by Eric Karshiev from Deloitte on the Zeus malware family and had a few thoughts that I wanted to post (link to the event is at the end).
First, kudos to Eric for a decent presentation even though, self-admittedly, he hasn’t done much public speaking in his career….all I can say is that it only gets easier the more you force yourself to do it.
Second, while the presentation was at the right level of technical detail for an ISACA meeting, and I don’t mean that in a derogatory way ISACA, there were also some really good questions from the students in attendance, which was very encouraging. I do believe an important first step in defending your organization comes from a thorough understanding of the threats you face as well as your risk profile based on what your company does, how it does it, and your likelihood of being targeted by attackers in addition to the general opportunistic attacks we see on a daily basis.
That being said, I think there were some great questions that may not have been fully answered during the course of the presentation, and I’d like to list those here and take a shot at answering. I took the liberty of paraphrasing some questions and consolidating them where it made sense…so here we go:
1. What is the number one attack vector for malware in the recent past?
I made this question more broad and vague than it was asked in the presentation, but I did that on purpose so I could answer it a few different ways. First, social engineering and targeting the users is nothing new, so that has been and will be an attack vector that is used. More specifically, the vector is client-side exploits utilizing vulnerabilities in the browser and, more likely, the plug-ins and 3rd party apps such as Adobe and Java (as an example, the new Adobe X 0-day that was, or will be, released soon). I think this has been standard operating procedure for attackers for the past 4 years given how insecure and under-patched many of these applications are. We are pretty good at patching the OS layer, but not so good at patching 3rd party applications, especially as they exist on mobile laptops that aren’t always connected to the corporate network. One thing to keep an eye on in this space is HTML5. If it ends up being as popular as Java/Flash look for an increase in vulnerability identification and use in attacks. Don’t believe me? Look at all of the exploit kits out there (last time I looked at my list I had 34 of them) and look at the CVEs related to each of the exploit kits…they range from 2004-2011 and most target Java, Flash, and PDFs.
Want to see how insecure your 3rd party apps may be? Download and run Secunia PSI (free for personal use) and review the report.
2. Is Zeus targeted or opportunistic? Do I need to be more concerned about protecting a C-level exec, the rest of our users, or both?
Zeus, as a MITM banking Trojan, is by necessity an opportunistic attack. If it can steal $5 or $5000 it doesn’t really matter. The more systems I have compromised the more money I can make, therefore from an attacker’s perspective it makes sense to spread this as far and wide as I can. I don’t mean to generalize here, but my advice is to protect all of your users’ systems in the same way when it comes to opportunistic threats. On the other hand, you do need to be concerned about targeted attacks against executives and ensure they, and their admins, understand that they may be targeted. For example, we trained the execs at the law firm to help them proactively identify a targeted phishing attack. One day we received a call from an exec stating that they received an email, it didn’t look legit, and had a PDF attachment that they didn’t open. We immediately reviewed the attached PDF and it was weaponized (although poorly) to infect the system with a dropper and connect back to C2 to get a binary. When we looked at the content of the email message we noticed that it was unique enough to comb through all received mail messages for the same email and attachment. What we noticed was that 5 other messages like the one we had in our possession were sent, but only to executives of the firm. On top of that, each had a weaponized PDF attachment that was different from the others but had the same dropper functionality. The polymorphism was likely in place to evade IDS, mail filters, and AV…all of which were bypassed without issue.
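The "comb through received mail" step in that story can be sketched roughly as follows. This is not what we ran at the time, just a minimal Python illustration with made-up field names (sender, subject, recipient, attachment): it groups messages that share a sender and subject and records a SHA-256 of each attachment, which makes the "same lure, different PDF every time" pattern easy to spot.

```python
import hashlib
from collections import defaultdict


def group_related_messages(messages):
    """Group messages by (sender, subject) and hash each attachment.

    Each message is a dict with hypothetical keys: sender, subject,
    recipient, and attachment (raw bytes). Only groups with more than
    one message are returned.
    """
    groups = defaultdict(list)
    for msg in messages:
        key = (msg["sender"].strip().lower(), msg["subject"].strip().lower())
        digest = hashlib.sha256(msg["attachment"]).hexdigest()
        groups[key].append((msg["recipient"], digest))
    return {key: hits for key, hits in groups.items() if len(hits) > 1}
```

If every recipient in a group is an executive and every digest in the group is different, a hash-based block on the first sample would not have caught the rest.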
3. You said AV isn’t effective given that it is signature based. What else can we do to protect users from being infected, and if we can’t protect them how can we detect malware?
This was a great question, and the one that actually spurred me to write this post, that went unanswered (at least to my satisfaction). Yes, part of AV detection is signature based, but so are mail filters and IDS/IPS systems. It is true that these commodity controls can protect us from the “known” malware that is floating around the internet, but they can’t protect us from new malware…I think this is an obvious statement given the number of systems that are compromised on a regular basis.
That being said, there are some controls we can implement that aren’t signature based that can detect malware based on behavior. Since I mentioned social engineering, it may be helpful to give our users a hand in determining the “goodness or badness” of emails they receive by ranking them. Email analytics is a good start, and products have now sprung up that play in this space. ProofPoint is an example of a tool that may empower your users and allow them to make better decisions about emails they receive and what to do with them. It isn’t full-blown security data analytics, but it is a start. Another example of a vendor in this space is FireEye with their email and web products, which can identify executable attachments in email and those received from clicking on internet links (or drive-by downloaded), analyze them in a sandbox, and make a determination as to their “badness”. Damballa is also another product focused on behavior analysis of malware as it uses the network…this makes sense as malware which doesn’t communicate to its owner isn’t very valuable. Their technology makes use of the known C2 systems as well as DGA-based malware generating many resolution requests and getting a bunch of NX’s back. Finally, Netwitness is an invaluable tool in both monitoring and incident response as it gives the visibility into the network that we have been lacking for so many years. And yes, there is a lot of overlap in these tools, so expect some consolidation in the coming years.
I don’t mean to push vendors as a solution and would never throw technology at a situation to fix the underlying root causes – unpatched OS, browsers, and 3rd party applications open a nice attack surface for the bad guys. Why do we allow our users full control of their system? Do they all need to be admin? We also don’t seem to be doing a great job of monitoring the network and all of the systems we own…what bothers me most is that the attackers are attacking us on home turf. We own the battlefield and keep getting our a$$es handed to us.
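On the monitoring point, the DGA behavior mentioned above (lots of resolution requests, a bunch of NX responses back) is something you can look for in your own DNS logs without buying anything. The Python below is a generic heuristic of my own, not how Damballa or any other product does it; the event format and both thresholds are assumptions you would tune for your network.

```python
from collections import defaultdict


def flag_possible_dga_hosts(dns_events, min_queries=50, nx_ratio=0.8):
    """Flag clients whose DNS lookups are mostly NXDOMAIN.

    dns_events is an iterable of (client_ip, query_name, rcode) tuples,
    a hypothetical format for parsed DNS logs. Returns a sorted list of
    (client_ip, total_queries, nxdomain_count).
    """
    totals = defaultdict(int)
    nx_counts = defaultdict(int)
    for client, _query_name, rcode in dns_events:
        totals[client] += 1
        if rcode == "NXDOMAIN":
            nx_counts[client] += 1
    flagged = []
    for client, total in totals.items():
        # Require a minimum volume so one mistyped URL doesn't trip the alarm.
        if total >= min_queries and nx_counts[client] / total >= nx_ratio:
            flagged.append((client, total, nx_counts[client]))
    return sorted(flagged)
```

Expect false positives from misconfigured software and the like, so treat a hit as a reason to go look at the host, not as a verdict.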
4. There was a comment on the use of Palo Alto and Wildfire in relation to the use of the cloud and how that may help.
Most all of the technologies mentioned above use the same mechanism, and this is nothing new as AV vendors have been doing this since they realized they could get good intel from all of their customers. My only caution is that the benefit realized from sending all bad binaries to a cloud service for analysis depends on how good that analysis is.
So to close, my suggestion to anyone interested in malware prevention, detection, and analysis is that there are some great resources on the internet as well as some decent classes you can take to better understand this threat. If analysis is your thing then I’d recommend Hacking – The Art of Exploitation and Practical Malware Analysis as some good reads. Set up a lab at home and experiment with some of the tools and techniques used by past and current malware…nothing beats hands-on work in this space as the more you know the better you are at malware identification and response.
Link to the presentation site – http://events.depaul.edu/event/zeus_malware_family_the_dark_industry#.UJ1baoXLDEs