Does the NSA want to collect everything? Let’s check the facts and then decide.

National Security Agency Seal (Photo credit: DonkeyHotey)

Over the past several months, we have read headlines and stories about the new surveillance state. We have been told that the United States government, in particular the NSA, intends to “collect, monitor, and store every telephone and internet communication that takes place inside the US and the world.” Such claims have made the public rightfully anxious, fearful, and distrustful of their elected government. But are they correct? We need to check the claims of the stories to see if they are true.

The claim: they want to collect it all

The claims come from Glenn Greenwald’s article in the Guardian, “The crux of the NSA story in one phrase: ‘collect it all’”. The article reported that the NSA wanted to “collect, monitor, and store every telephone and internet communication that takes place inside the US and the world.” The evidence for this claim was a Washington Post article that refers to General Alexander’s desire, while in Iraq, to “collect it all”. Here is the quotation.

Numerous NSA documents we’ve already published demonstrate that the NSA’s goal is to collect, monitor and store every telephone and internet communication that takes place inside the US and on the earth. It already collects billions of calls and emails every single day. Still another former NSA whistleblower, the mathematician William Binney, has said that the NSA has “assembled on the order of 20 trillion transactions about US citizens with other US citizens” and that “estimate only was involving phone calls and emails.”

When we look at the evidence, we find it does not support his claims. The document in question refers to a programme called Boundless Informant. Mr. Greenwald and Mr. MacAskill described the programme in their article “Boundless Informant: the NSA’s secret tool to track global surveillance data”. However, Boundless Informant is not designed to collect everything, as the article explains:

The focus of the internal NSA tool is on counting and categorizing the records of communications, known as metadata, rather than the content of an email or instant message.

The Boundless Informant documents show the agency collecting almost 3 billion pieces of intelligence from US computer networks over a 30-day period ending in March 2013.

Even though 3 billion pieces of intelligence is a large number, it is small compared with what the internet generates every day. The document also shows the volumes taken from specific countries. As we can see, these are countries that are of interest to the United States because of national security issues.

Another claim comes from the title of the article “XKeyscore: NSA tool collects ‘nearly everything a user does on the internet’”. However, the article itself is not about collecting everything.

A top secret National Security Agency program allows analysts to search with no prior authorization through vast databases containing emails, online chats and the browsing histories of millions of individuals, according to documents provided by whistleblower Edward Snowden.

You can read the slides here: Boundless Informant NSA data-mining tool – four key slides. The article describes the ability to look at particular users in detail, not at all users all the time.

The facts do not support the claim

In its statement “The National Security Agency: Missions, Authorities, Oversight and Partnerships”, the NSA explained how much of the internet it tracks. From those figures we can see why the attempt is not possible.

According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day. In its foreign intelligence mission, NSA touches about 1.6% of that. However, of the 1.6% of the data, only 0.025% is actually selected for review. The net effect is that NSA analysts look at 0.00004% of the world’s traffic in conducting their mission.

A petabyte (PB) is defined as 1 PB = 1,000,000,000,000,000 bytes = 10^15 bytes = 1,000 terabytes. On the NSA’s figures, it touches about 29 petabytes a day (1.6% of 1,826 petabytes), every day. By way of comparison, Google processed about 24 petabytes a day in 2009, and AT&T transfers about 30 petabytes of data through its systems every day.
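For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the three figures quoted in the NSA statement; the variable and function names are my own and purely illustrative.

```python
# Figures quoted in the NSA statement; the names below are illustrative only.
INTERNET_PB_PER_DAY = 1826      # petabytes the internet carries per day
TOUCHED_SHARE = 1.6 / 100       # share of traffic the NSA says it "touches"
REVIEWED_SHARE = 0.025 / 100    # share of the touched data selected for review


def touched_pb_per_day() -> float:
    """Petabytes per day that the NSA touches, on its own figures."""
    return INTERNET_PB_PER_DAY * TOUCHED_SHARE


def reviewed_share_of_all_traffic() -> float:
    """Fraction of all internet traffic that ends up selected for review."""
    return TOUCHED_SHARE * REVIEWED_SHARE


if __name__ == "__main__":
    print(f"Touched: {touched_pb_per_day():.1f} PB per day")
    print(f"Reviewed: {reviewed_share_of_all_traffic() * 100:.5f}% of all traffic")
```

The first result is about 29 petabytes a day. Multiplying the two published percentages gives about 0.0004% of all traffic, which is ten times the 0.00004% in the statement. The statement does not show its working, so the gap may come from rounding or from figures it does not give. Either way, the share selected for review is tiny.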

The amount of material created every day means that no one, not even the NSA, has the technological capacity to do what Mr. Greenwald claimed they intended to do. IBM is reported to have built the largest storage array to date, with a capacity of 120 petabytes; at 1,826 petabytes a day, it would hold less than two hours of the internet’s traffic. Collecting and storing it all is technically impossible. You may as well try to run to the horizon or draw a full-size map of America. Such an attempt should remind us of Stephen Crane’s poem I Saw a Man:

I saw a man pursuing the horizon;
Round and round they sped.
I was disturbed at this;
I accosted the man.
“It is futile,” I said,
“You can never — ”

“You lie,” he cried,
And ran on. 

An alternative view rebuts the counterclaim, but is it any better?

Jeff Jarvis claims that the NSA’s “small” amount is very large when properly understood. He suggests that it may be 50% of the communication on the net.

Keep in mind that most of the data passing on the net is not email or web pages. It’s media. According to Sandvine data (pdf) for the US fixed net from 2013, real-time entertainment accounted for 62% of net traffic, P2P file-sharing for 10.5%.

The NSA needn’t watch all those episodes of Homeland (or maybe they should) or listen to all that Coldplay – though, I’m sure the RIAA and MPAA are dying to know what the NSA knows about who’s “stealing” what, since that “stealing” allegedly accounts for 23.8% of net traffic.

So, by very rough, beer-soaked-napkin numbers, the NSA’s 1.6% of net traffic would be half of the communication on the net. That’s one helluva lot of “touching”.

On the surface, his argument sounds persuasive. The NSA must be collecting too much data because most of the internet is video, at least in the US, and the NSA should not be interested in videos. The comments make a good sound bite, but do they help us understand the issue? What his analysis misses is the challenge of steganography, the art and science of hiding data within data. Secrets can be hidden in videos and uploaded to YouTube or other platforms. The NSA’s work is to find these types of threats and to decide whether they come from a terrorist. They have no choice, because Al Qaeda has reportedly hidden secret transmissions in pornographic videos.

Panning for gold dust in the internet’s waterfall

The NSA is not looking to find out what the average person is doing. They do not have the time or the capacity for that work. More to the point, it is not in their legal remit. A better way to understand the problem is that the NSA is trying to pan for gold dust (terrorists) in a data waterfall. They have to find a target, and then they have to find out whether the target material hides encrypted data. Once they have found it, they have to unlock it. With so much data generated and encrypted, the NSA is on the back foot. The false positives from ordinary people encrypting documents are like golden-coloured sand thrown into the waterfall. The NSA has to stop and look at the golden sand to make sure it is not a threat before it can move on to other tasks.

The NSA has to collect widely because the terrorists have changed their approach. Al Qaeda and related terrorist groups have improved their security to avoid being detected by the NSA. They take various countermeasures to communicate and maintain their networks. Terrorist groups use social networks, videos, and the internet to radicalise, recruit, and train people. The work to find the threats is difficult. Encryption is only one problem. Others include spoof accounts and false names. In other cases, they may create false identities, such as sleepers, to hide their organisation. (Perhaps we should watch Homeland more.)

Conclusion: more reasoned information, less opinion or fearmongering

The reports do not help the average citizen understand the issues. Instead of trying to frighten the public by claiming the NSA intends or has the capacity to collect it all, or trying to debunk the NSA’s claims, journalists could help us to understand the issues. If we understood the context, rather than being given the journalist’s opinions, we would be equipped to make an informed decision about what should be done. If we are to go only by opinions or by fear, are we any better informed?

As readers, we experience the NSA’s challenge. We want to make sense of the world amid all the confusion, but we are distracted by headlines that generate fear rather than understanding and by opinions that generate uncertainty rather than clarity.


About Lawrence Serewicz

An American living and working in the UK trying to understand the American idea and explain it to others. The views in this blog are my own for better or worse.
2 Responses to Does the NSA want to collect everything? Let’s check the facts and then decide.

  1. American Kulak says:

    Once again, the problem as stated by former Senate Intelligence Committee senior aide Angelo Codevilla is that the NSA has no hope of decrypting all encrypted traffic in that ‘data waterfall’, or even a fraction of it. Thus by definition as Codevilla points out in the “Ruling Class Consensus on Domestic Spying” it is precisely the unencrypted communications of the 99% of the people on this planet that pose no threat whatsoever to American national security that wind up in the ‘haystacks’ or ‘pool’ to extend your digital waterfall analogy.


    • Thanks for the response. I edited it down because of length and it was repeating the points you made in the first paragraph.
      I have met Professor Codevilla a couple of times and he presented a couple of papers at my graduate school. We went to the same graduate school although many years apart. He is a very bright guy and he knows a lot of things. He and I will likely disagree on some of these issues and still find common ground in our desire to protect and nurture the American regime.
      As I explained in an earlier response, the terrorists will hide in plain sight, which is part of the reason why the NSA has to hoover up large amounts of data. However, most of it is not stored for very long, simply because they do not have that capacity. No one does. They also want to protect the programme because the overall programme is useful even if they have to change its parameters.

      What underpins this whole discussion is whether the American public want to be safe and whether they have a government to keep them safe. If the NSA does not do that, then it would cease to be useful. However, it does a very good job at what it does do, which is why people want to keep it even if they do not realize it. To put it differently but directly, even theocracies have spy agencies.
      Thanks again for the stimulating comment and the reference to Professor Codevilla.

