A lot of ink (digital and otherwise) has been spilled over Wikileaks this year, but there is one central aspect of the recent “cablegate” case that I wonder if we really get in infosec: Simply put, information has gotten huge – and this doesn’t just mean the content we must protect. The sheer volume of information we can call upon today in security and risk management could bury us – but it could also introduce entirely new opportunities if we learn how to seize them.
Last February, The Economist described the growing “data deluge”: In 2005, mankind created 150 exabytes (billion gigabytes) of data. This year, it will create 1,200. To put these numbers in perspective: according to Google CEO Eric Schmidt, 5 exabytes of data were created between the dawn of civilization and 2003. Now, says Schmidt, that much information is created every two days.
As Wikileaks has made embarrassingly clear, few sectors suffer as much from this deluge as government, which not only generates vast amounts of information, but must also wrestle with some difficult paradoxes in its management:
- Millions of people work in government – yet the most sensitive information must be restricted to a few. Exercise for the reader: Define “few” in the context of an organization such as the US federal government and, as we would say in industry, “its partners and suppliers.”
- When meaningful information is available that could help the work of many, it should be shared – or so the theory goes. This was, of course, one of the key recommendations of the 9/11 Commission. However, this information should also be controlled if considered sensitive. But how does one balance “sharing” with “control” in something as enormous and sprawling as government – and who, in a democracy, decides?
The result is that a lot of information has been deemed sensitive, a lot of people have access to it – and a lot more want access in the name of open government. This past July, the Washington Post ran a revealing series of articles on the state of what it called “the top-secret world that government has created in response to the terrorist attacks of Sept. 11, 2001.” Among the findings of the Post’s two-year investigation:
- Some 1,271 government organizations and 1,931 private companies work on programs related to counterterrorism, homeland security and intelligence in about 10,000 locations across the United States.
- Intelligence analysts alone publish 50,000 reports each year (“a volume so large that many are routinely ignored,” says the Post).
- Approximately 854,000 people hold top-secret security clearances (“nearly 1.5 times as many people as live in Washington, D.C.”).
The actual number of people having access to sensitive government information is likely much larger. Consider, for example, that as many as three million people may have had delegated access to classified US government data as long ago as the late 1990s, according to a 1997 report by a bipartisan congressional committee on government secrecy, chaired by Senator Daniel Patrick Moynihan of New York.
While Wikileaks has precipitated much debate over how much of what kind of information government should keep to itself, anyone who files a tax return, let alone anyone who has a loved one deployed with troops in the field, would likely argue that governments (like businesses and individuals) must keep some information confidential. But how do we deal with sensitivity at this scale, and with the number of people who have access to it?
Some (myself included) have suggested that better monitoring of information access and use, along with better analysis of access privileges, are two keys to getting a handle on this challenge. In principle, this may well be true – but what we’re talking about here is generating yet another avalanche of information. Monitoring alone at this scale would generate enormous volumes of data that would have to be analyzed for likely indicators of risk. Add to that one of the biggest challenges that information security has always faced: correlation and integration. It’s not that we don’t have enough tools – far from it. A good many of our challenges stem from the fact that what we do have is not only siloed and hampered by bondage to legacy approaches, but was never really designed for this scale of demand. Nor does it integrate well enough in many cases to provide the level of correlation and insight needed – particularly when what we are after is what one former Secretary of Defense referred to as “unknown unknowns.”
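To make the correlation point a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any real product: it assumes two hypothetical siloed sources, an entitlement list and an access log, and every name, record, and threshold in it is invented. It joins the two to flag access with no matching privilege, and separately flags users whose activity is far above the median.

```python
from collections import Counter
from statistics import median

# Hypothetical entitlement records (who is allowed to read what).
ENTITLEMENTS = {
    "alice": {"cables", "reports"},
    "bob": {"reports"},
    "carol": {"cables"},
}

# Hypothetical access log as (user, resource) events from a separate system.
ACCESS_LOG = [
    ("alice", "cables"), ("alice", "reports"),
    ("bob", "reports"), ("bob", "cables"),
    ("carol", "cables"),
] + [("carol", "cables")] * 40


def unentitled_access(log, entitlements):
    """Return sorted (user, resource) pairs in the log with no matching entitlement."""
    return sorted(
        {(user, res) for user, res in log if res not in entitlements.get(user, set())}
    )


def high_volume_users(log, factor=5):
    """Return users whose event count exceeds `factor` times the median count.

    The factor of 5 is an arbitrary illustrative threshold, not a recommendation.
    """
    counts = Counter(user for user, _ in log)
    threshold = factor * median(counts.values())
    return sorted(user for user, n in counts.items() if n > threshold)


if __name__ == "__main__":
    print("No matching privilege:", unentitled_access(ACCESS_LOG, ENTITLEMENTS))
    print("Unusually high volume:", high_volume_users(ACCESS_LOG))
```

Note that the second check flags a user who is fully entitled to everything she touched. Neither source alone tells the whole story, and at government scale both the join and the baseline become big data problems in their own right.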
Others have pointed out that a good deal of the problem in government is that the public sector simply over-classifies information as sensitive, effectively rendering none of it truly confidential (an argument that factored into the Supreme Court’s 1971 decision that enabled the New York Times and Washington Post to publish the Pentagon Papers). I get the argument, but I think it misses the fact that organizations of all kinds are generating enormous amounts of information and making it available to large numbers of consumers. These, I think, are problems we now need to grasp in terms of what it means to manage sensitivity at scale.
Security professionals, welcome to the world of Big Data.
What does this mean for IT security and risk management? Just as important, if not more so given its impact on effective management: how can we make better use of the mountains of information we already generate and collect in the course of protecting information, mountains that will undoubtedly multiply along with the information we need to protect? What kinds of data and data sources would help us get a better handle on these challenges, and how are they being made available – today as well as in the future?
Better still: How can we turn big data into an advantage? How can the techniques of large-scale data analysis and Business Intelligence make massive information more useful in security? And how will IT security and risk management be transformed by the opportunity – or will it change at all?
Stay tuned to this channel in the coming year…