In the first two installments in this series, I looked at the rise of data-driven security and the first two aspects of its emergence.

In this post, I’ll look at the third aspect of data-driven security emerging today, and one of the most important in its implications for security management: techniques for enabling data to better inform security strategy and direction.

Strategy in this sense primarily has to do with two things:

  • Awareness of the current security posture and effectiveness of tactics, and
  • Informed decision-making regarding the use of resources. This may range from short-term decisions on deploying investigative or forensic resources to longer-range investments based on current performance and changing tactical requirements.

Of course, log and event management have been at the heart of tactical management in many organizations for some time, but this is only part of the landscape of data-driven strategy and management tools I expect to see emerge as organizations strive to make the most of large and growing volumes of data. Even within the realm of log and event management, technologies will not only have to adapt to the data deluge, but must become more flexible in how they are used, managed and deployed – not least to maintain their effectiveness as that deluge mounts. Already, we see SIEM and log management vendors seeking to make activities such as rule and query building more intuitive and natural, reflecting in a way the impact of the English-like query techniques, built from elemental “building blocks,” that transformed the database with the rise of SQL years ago.
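
To make the “building block” idea concrete, here is a minimal sketch in Python of how elemental, composable predicates can be assembled into a readable detection rule over normalized event records. The field names, values and thresholds are hypothetical, not any particular vendor’s schema or rule syntax.

```python
# A minimal sketch of "building block" rule construction over event records.
# Field names ("src_ip", "severity", "action") and values are illustrative.

events = [
    {"src_ip": "10.0.0.5", "severity": 7, "action": "login_failed"},
    {"src_ip": "10.0.0.9", "severity": 2, "action": "login_ok"},
    {"src_ip": "10.0.0.5", "severity": 8, "action": "login_failed"},
]

# Elemental building blocks: each returns a predicate over a single event.
def field_equals(name, value):
    return lambda event: event.get(name) == value

def field_at_least(name, value):
    return lambda event: event.get(name, 0) >= value

def all_of(*predicates):
    return lambda event: all(p(event) for p in predicates)

# Compose the blocks into a rule, much as a SIEM rule builder might.
suspicious = all_of(field_equals("action", "login_failed"),
                    field_at_least("severity", 7))

print([e for e in events if suspicious(e)])  # the two high-severity failures
```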

This, however, implies the structure needed to analyze and correlate event data. Much of the information collected by log and event management already has its own structure, but that structure can be highly inconsistent from one technology – or even one vendor within a technology – to the next. Data normalization is thus a primary emphasis of many log and event management tools, but some vendors, such as ArcSight with its Common Event Format (CEF), have sought to define a cross-platform format for event data to ease the burdens of normalization and rationalization. More vendor-neutral efforts include those of the SCAP community and the Common Event Expression (CEE) initiative.
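
As a simplified illustration of the normalization problem these formats address, the sketch below (in Python, with entirely hypothetical vendor formats and field names) maps two differently structured log records into one common event schema – essentially what CEF and CEE aim to standardize at industry scale.

```python
# A simplified sketch of event normalization: two hypothetical vendor
# formats mapped into a common schema. Real formats such as CEF and CEE
# are far richer; the fields here are illustrative only.

from datetime import datetime, timezone

def normalize_vendor_a(record):
    # Vendor A (hypothetical): pipe-delimited "epoch|host|event_name"
    ts, host, name = record.split("|")
    return {
        "time": datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat(),
        "host": host,
        "event": name.lower(),
    }

def normalize_vendor_b(record):
    # Vendor B (hypothetical): a mapping with differently named keys
    return {
        "time": record["timestamp"],
        "host": record["source_host"],
        "event": record["msg_type"].lower(),
    }

print(normalize_vendor_a("1295049600|fw01|PORT_SCAN"))
print(normalize_vendor_b({"timestamp": "2011-01-15T00:00:00+00:00",
                          "source_host": "ids02",
                          "msg_type": "Port_Scan"}))
# Both yield the same schema, so downstream correlation can treat them alike.
```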

Normalizing widely disparate data types is no easy task, however – and it is not limited simply to questions of format. A given bit of data may have meaning in many different senses (witness the recent non-event concerning the differences between Ophiuchus as a constellation in astronomy and Ophiuchus as a potential astrological sign) – but there may be substantial differences between tools as to the meaning of this information. (Disconnects between security, compliance and operational contexts are one such example within IT.) Enabling individual data elements to have relevance in a number of different contexts is at the heart of initiatives such as the Semantic Web and the field of Web ontology, which seek to provide a system for modeling, sharing and giving meaning to data in many different contexts through techniques such as the Resource Description Framework (RDF), the Web Ontology Language (“OWL,” go figure) and the related definition of datatypes for the XML Schema Definition Language (XSD). The Semantic Web has high potential for closing gaps between data-centric technologies, but implementation would be a major undertaking for many current tools, and could further expand the sheer volume of data generated and consumed by management systems.
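
For a taste of what RDF looks like in practice, here is a minimal sketch using the Python rdflib library to describe a security event as triples. The “ex:” vocabulary is invented for illustration; in a real deployment, shared ontologies would supply the terms.

```python
# A minimal sketch of describing a security event as RDF triples with
# rdflib (pip install rdflib). The "ex:" vocabulary is hypothetical.

from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/security#")

g = Graph()
g.bind("ex", EX)

event = EX["event/1234"]
g.add((event, RDF.type, EX.SecurityEvent))
g.add((event, EX.sourceHost, Literal("fw01")))
g.add((event, EX.severity, Literal(8, datatype=XSD.integer)))

# The same triples can be merged or queried alongside data from any other
# tool that shares (or maps to) the vocabulary.
print(g.serialize(format="turtle"))
```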

What about data with little or no structure, or whose nature may be very different from log or event data, but which may nonetheless be relevant to security strategy and management? Already, we have seen the technologies of search applied to monitoring and log data by vendors such as Splunk, giving organizations greater flexibility in deepening their operational insight – and not just in security. This suggests the larger value of such data management techniques across the spectrum of IT operations.
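
Under the hood, search over unstructured machine data rests on structures like the inverted index. Here is a toy sketch in Python with made-up log lines; production tools such as Splunk add time-based indexing, field extraction, ranking and much more.

```python
# A toy inverted index over raw log lines: the basic mechanism behind
# searching unstructured machine data.

from collections import defaultdict

logs = [
    "Jan 15 00:01:02 fw01 DROP tcp 10.0.0.5 -> 192.168.1.1:22",
    "Jan 15 00:01:07 ids02 ALERT port scan from 10.0.0.5",
    "Jan 15 00:02:11 app01 user alice login ok",
]

index = defaultdict(set)
for line_no, line in enumerate(logs):
    for token in line.lower().split():
        index[token].add(line_no)

def search(*terms):
    """Return log lines containing all of the given terms."""
    hits = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [logs[i] for i in sorted(hits)]

print(search("10.0.0.5"))           # the firewall and IDS lines
print(search("10.0.0.5", "alert"))  # just the IDS alert
```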

Note the two main types of data handled and generated by the techniques described above:

  • Textual data that largely communicates values in a verbal sense (event data is one example), and
  • Quantitative data, or quantitative insight derived from analysis of a wide range of data types, both verbal and non-verbal.

There is at least one other data type that also has value to IT security and risk management:

  • Object data, which may or may not fit into either of the above categories. This may include what DBAs think of as “BLOBs” (binary large objects), which often have no meaning unless translated from a digital format into something human-usable, such as images or audio. Or they may be documents or message files whose rendered format adds context to their textual content (see the sketch below).
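
As a rough illustration, this Python sketch models the three categories side by side; the types and fields are purely illustrative, not a proposed standard.

```python
# An illustrative (not prescriptive) model of the three data categories.

from dataclasses import dataclass

@dataclass
class TextualDatum:       # e.g., a log or event message
    text: str

@dataclass
class QuantitativeDatum:  # e.g., a count or metric derived from analysis
    name: str
    value: float

@dataclass
class ObjectDatum:        # e.g., a BLOB: image, audio, document
    content: bytes
    media_type: str       # how to render it into something human-usable

records = [
    TextualDatum("user alice login failed"),
    QuantitativeDatum("failed_logins_per_hour", 42.0),
    ObjectDatum(b"\x89PNG...", "image/png"),
]
for r in records:
    print(type(r).__name__, r)
```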

There are some obvious disconnects between these data types that still limit their use in “connecting the dots” across security-relevant data analysis. For one thing, the quantitative data sources available to security professionals are still evolving. Outcome data, for example – particularly in the form of breach incident data – could become a powerful enabler of security management and give security at least a foothold on analytic concepts long established in other domains of risk management. However, the data available today is fairly limited, and for the most part not yet widely available for direct analysis by security strategists. Increasing the volume and availability of this data is the objective of efforts ranging from Verizon’s data breach investigations and its recently introduced VERIS application to organizations such as the Open Security Foundation and proponents of the New School of Information Security.
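
To see why such outcome data matters, consider annualized loss expectancy (ALE), one of the classic quantitative measures from other domains of risk management. The numbers in this minimal sketch are invented; the point of efforts like VERIS is to let observed breach data replace such guesses.

```python
# Annualized Loss Expectancy (ALE), a classic quantitative risk measure:
#   ALE = SLE * ARO
# where SLE (single loss expectancy) is the expected cost of one incident
# and ARO (annualized rate of occurrence) is how often it occurs per year.
# The figures below are invented for illustration only.

def ale(single_loss_expectancy, annual_rate_of_occurrence):
    return single_loss_expectancy * annual_rate_of_occurrence

# Hypothetical: a breach type costing $250,000 per incident, seen 0.3x/year.
print(f"ALE = ${ale(250_000, 0.3):,.0f} per year")  # ALE = $75,000 per year
```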

But data availability is only one roadblock today. Making this data useful is another. Textual data, for example, is highly searchable using current search techniques – but quantitative data largely is not. Moreover, data analysis can yield entirely new sets of derivative data (which I think of as “second order” data sets), which, in turn, could be exposed as meaningful data in their own right – if they can be communicated effectively…and if their underlying data sources can be protected (no small matter in the infosec world).
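
A simple illustration of such a “second order” data set, with hypothetical inputs: summary statistics derived from raw (and possibly sensitive) event records become a new, analyzable data set in their own right, without exposing the underlying sources.

```python
# Deriving a "second order" data set: aggregate statistics computed from
# raw event records. The summary can be shared and analyzed on its own,
# protecting the underlying (possibly sensitive) sources.

from collections import Counter

raw_events = [  # hypothetical raw records
    {"dept": "finance", "incident": "phishing"},
    {"dept": "finance", "incident": "phishing"},
    {"dept": "hr",      "incident": "malware"},
    {"dept": "finance", "incident": "malware"},
]

# Second-order data: incident counts by department and type.
derived = Counter((e["dept"], e["incident"]) for e in raw_events)
for (dept, incident), count in sorted(derived.items()):
    print(f"{dept:8s} {incident:9s} {count}")
```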

This is the realm of what I think of as “synthesis platforms” that unite and give meaning to textual, quantitative and object data. Such platforms could also “liberate,” if you will, the findings of data analysis to give further meaning to strategically valuable information in the form of second-order data sets.

This suggests how technologies from digital forensics and investigation to Business Intelligence will have a growing impact on data-driven security, which I’ll examine in the next post in this series.
