Security and Big Data: Two Sore Points and Seven Questions
There’s no question that the intersections of Big Data and security have grabbed a lot of attention in the year-plus that I’ve been blogging on this topic.
Indeed, it seems that Big Data will be one of the major focus areas at the RSA Conference a month hence:
- I count at least three sessions and who knows how many vendors capitalizing thereupon…
- …including <unabashedPlug> my own 8 AM (!) Thursday panel on managing advanced security problems using Big Data analytics (session SPO1-301). </unabashedPlug> This panel will be hosted by Eddie Schwartz, a like-minded adventurer whom many of you know from NetWitness…and who has also been around the security block more than a few times.
- The session I’m looking forward to most is that of Preston Wood and the Zions Bancorporation team, who were early adopters in building a security data warehouse based on Hadoop. If you want to hear about hands-on experience, I can recommend this bunch. I’ve heard Preston present on this, and welcome the opportunity to learn more from the Zions crew directly involved in this effort (Thursday at 10:40, Session TECH-303).
Clearly there is a lot of opportunity in applying the tools and practices of Big Data in information security. And, given the heady effervescence of such a transformational concept, this has also led to some abuse. As Andrew Hay pointed out earlier this week, “if you’re a SIEM/LM vendor, you can’t just slap ‘big data’ on some materials and be anointed as a player in that space.”
The situation is not helped by the difficulty encountered in looking for a consensus definition of “Big Data” to begin with. Searching for one on the Intertubes yields a variety of results which often invoke the (now) familiar 3 V’s of volume, variety and velocity (to which IBM’s Arvind Krishna has added a fourth: veracity). Typically cited are the problems that arise when data becomes more than a relational database system can handle, which inspired a move toward so-called “NoSQL” techniques. Many see two distinct turning points – the first when Google began to talk about MapReduce, the second when Doug Cutting developed the Hadoop framework not long thereafter – the implication being that Big Data is necessarily about distributed processing across a large-scale data repository.
But these definitions fall short, and not just because they overlook cases where SQL-based data mining and warehousing techniques not only suffice but are supported by mature tools for database management. Consider as well examples such as IBM’s approach to what it calls “stream computing,” which Computerworld’s Jon Brodkin summed up as “analyzing data in continuously updated streams of information from multiple sources, rather than static files pre-loaded into a data warehouse.” In other words:
Big Data isn’t just about NoSQL warehousing
The “streams” approach could have valuable application to security, in bringing data processing closer to real time and updating understandings accordingly in response to fast-moving changes. The concept is illustrated in an often-seen IBM TV commercial that describes how pointless it would be to cross an intersection here and now based on the traffic that was there five minutes ago. This highlights how readily the Big Data concept can be stretched, which, I think, is a good thing. But this will, of course, also lead to abuse.
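To make the traffic analogy a little more concrete, here is a minimal sketch of the general idea of keeping a running, time-bounded view of events as they arrive rather than querying a pre-loaded store after the fact. It is not IBM’s implementation or any product’s API; the class name `SlidingWindowCounter`, the five-minute window, and the sample source addresses are all assumptions made purely for illustration, and the sketch assumes events arrive in time order.

```python
from collections import deque


class SlidingWindowCounter:
    """Hypothetical helper: counts events per key over the last `window_seconds`.

    Assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key) pairs in arrival order
        self.counts = {}       # key -> number of events still inside the window

    def add(self, timestamp, key):
        """Record one event and return the current in-window count for its key."""
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._expire(timestamp)
        return self.counts.get(key, 0)

    def _expire(self, now):
        """Drop events that have aged out of the window."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]


# Illustrative stream of (seconds, source address) events, e.g. failed logins.
counter = SlidingWindowCounter(window_seconds=300)
stream = [(0, "10.0.0.5"), (60, "10.0.0.5"), (120, "10.0.0.9"), (400, "10.0.0.5")]
results = [counter.add(ts, src) for ts, src in stream]
```

By the time the last event arrives, the first two have aged out of the five-minute window, so the count for that source reflects only what is happening now, not the traffic that was there five minutes ago.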
Regardless of how it does it, if a tool or technique fails to help you get beyond past or existing constraints on
- finding data that really matters,
- making practical use of it,
- keeping up with the pace of this data, or
- making use of a wider range of information that would make a difference,
no amount of arm-waving about “[b|B]ig [d|D]ata” is going to make a particle of difference to you, regardless of whether it’s new technology or a re-labeling of the stuff that failed to do it for you up to now.
Which raises another point – and one more significant to me personally:
Data-driven security isn’t just about Big Data
It’s true that technologies designed to process data at scale can be powerful enablers of both tactics and strategy. But these are not the only ways that making the most of data can have a transformational impact on security. Moreover, a focus on the tactics of large-scale data management may also distract from what should be the objective: shaping a strategy and improving defense based on what you know, not what you guess, assume or simply fear.
Hardly anyone knew this better than the “quants” at @stake, who made a case for security measurement well before Doug Cutting’s son even had a stuffed elephant, and whose alumni continue to have an impact on the evolution of infosec – from visionary Dan Geer to Veracode’s Chris Wysopal and Perimeter e-Security’s Andrew Jaquith, who literally wrote the book on security metrics, just to name a few.
One recent item that accentuated this point was the article “Why Infosec Forced Me to Get an MBA,” by Don Turnblade. While those highly focused on metrics or risk may tussle over some of the specifics in this piece, the thought of calling on just such a practical approach to business logic would have been considered nothing short of radical when I was a CISO. Back then, many were still flirting with ideas such as ROSI (“Return on Security Investment”), a concept whose limitations were deftly and mirthfully dissected by Andrew Jaquith’s “assume a spherical cow” illustration in Security Metrics. The fact that articles like Don’s can still turn heads suggests just how far we still have to go.
Consider also the strides made by services vendors such as Vigilant, whose business revolves around helping organizations make the most of both new and existing approaches to security information management, including getting more out of SIEM. Having recently augmented its offerings with a distinctive approach to intelligence delivery, Vigilant offers a model of what information management directly relevant to enterprise security should look like.
The tools that are taking us beyond the data management constraints of the past are just that: tools, albeit pretty significant ones that have the potential to open entirely new vistas to us, in security and beyond. But they are only part of the larger picture.
To me, data-driven security revolves around seven key questions, regardless of whether “Big Data” (however defined) is part of the answer or not:
- Can we better defend that which is worth defending through better insight?
- If so, how and where can we best make use of that insight?
- Where can intelligence better inform the tactics of prevention and defense?
- How can we use it to better determine where our efforts can have the greatest impact?
- What are the data sources we can call upon to answer these questions?
- How can we make this data more usefully available?
- And how can we act responsibly with what we know?