
Saturday, 22 June 2013

Mastering the internet: how GCHQ set out to spy on the world wide web

Detail of a top secret briefing to GCHQ intelligence analysts who were to have access to Tempora



Project Tempora – the evolution of a secret programme to capture vast amounts of web and phone data


The memo was finished at 9.32am on Tuesday 19 May 2009, and was written jointly by the director in charge of GCHQ's top-secret Mastering the Internet (MTI) project and a senior member of the agency's cyber-defence team.

The internal email, seen by the Guardian, was a "prioritisation and tasking initiative" to another senior member of staff about the problems facing GCHQ during a period when technology seemed to be racing ahead of the intelligence community.

The authors wanted new ideas – and fast.

"It is becoming increasingly difficult for GCHQ to acquire the rich sources of traffic needed to enable our support to partners within HMG [Her Majesty's government], the armed forces, and overseas," they wrote.

"The rapid development of different technologies, types of traffic, service providers and networks, and the growth in sheer volumes that accompany particularly the expansion and use of the internet, present an unprecedented challenge to the success of GCHQ's mission. Critically we are not currently able to prioritise and task the increasing range and scale of our accesses at the pace, or with the coherence demanded of the internet age: potentially available data is not accessed, potential benefit for HMG is not delivered."

The memo continued: "We would like you to lead a small team to fully define this shortfall in tasking capability [and] identify all the necessary changes needed to rectify it." The two chiefs said they wanted an initial report within a month, and every month thereafter, on what one of them described as "potential quick-win solutions not currently within existing programme plans".

Though this document only offers a snapshot, at the time it was written four years ago some senior officials at GCHQ were clearly anxious about the future, and casting around for ideas among senior colleagues about how to address a variety of problems.

According to the papers leaked by the US National Security Agency (NSA) whistleblower Edward Snowden, Cheltenham's overarching project to "master the internet" was under way, but one of its core programmes, Tempora, was still being tested and developed, and the agency's principal customers, the government, MI5 and MI6, remained hungry for more and better-quality information.

There was America's NSA to consider too. The Americans had been pushing GCHQ to provide better intelligence to support Nato's military effort in Afghanistan; a reflection, perhaps, of wider US frustration that information sharing between the US and the UK had become too lopsided over the past 20 years.

In the joint instruction from 2009, the director had twice mentioned the necessity to fulfil GCHQ's "mission", but the academics and commentators who follow Britain's intelligence agencies are unsure exactly what this means, and politicians rarely try to define it in any detail.

The "mission" has certainly changed and the agency, currently run by Sir Iain Lobban, may be under more pressure now than it has ever been.

The issues, and the enemies, have become more complex, and are quite different from the comparatively simple world into which Britain's secret services were born in 1909.

At the time, concern about German spies living in the UK led to the establishment of a Secret Service Bureau and, at the start of the first world war, two embryonic security organisations began to focus on "signals intelligence" (Sigint), which remains at the heart of GCHQ's work.

The codebreakers of Bletchley Park became heroes of the second world war as they mastered the encryption systems used by the Nazis. And the priority during the cold war was Moscow.

During these periods GCHQ's focus was clear, and the priorities of the "mission" easier to establish.

There was no parliamentary scrutiny of its work so the agency, which moved from Milton Keynes to Cheltenham in the early 1950s, existed in a peculiar limbo.

That changed, and with it the boundaries of its work, with the 1994 Intelligence Services Act (Isa), which gave a legal underpinning to the agency for the first time. The act kept the powers and objectives of GCHQ broad and vague.

The agency was tasked with working "in the interests of national security, with particular reference to the defence and foreign policies of Her Majesty's government; in the interests of the economic wellbeing of the United Kingdom; and in support of the prevention and the detection of serious crime".

Reviewing the legislation at the time, the human rights lawyer John Wadham, then legal director of Liberty, highlighted the ambiguities of the expressions used, and warned that the lack of clarity would cause problems and concern.

"National security is used without further definition. It is true the courts themselves have found it impossible to decide what is or what is not in the interests of national security. The reality is that 'national security' can mean whatever the government of the day chooses it to mean." The same could be said for the clause referring to "economic wellbeing".

Arguably, GCHQ's responsibilities have broadened even further over the past decade: it has become the UK's lead agency for cyber-security – identifying the hackers, criminal gangs and state actors who are stealing ideas, information and blueprints from British firms.

Alarmed by the increase in these cyber-attacks, and faced with billions of pounds' worth of intellectual property being stolen every year, the government made the issue a tier-one priority in the 2010 strategic defence and security review. In a time of cuts across Whitehall, the coalition found an extra £650m for cyber-security initiatives, and more than half was given to GCHQ. It has left the agency with a vast array of responsibilities, which were set out in a pithy internal GCHQ memo dated October 2011: "[Our] targets boil down to diplomatic/military/commercial targets/terrorists/organised criminals and e-crime/cyber actors".

All this has taken place during an era in which it has become harder, the intelligence community claims, for analysts to access the information they believe they need. The exponential growth in the number of mobile phone users during the noughties, and the rise of a new breed of independent-minded internet service providers, conspired to make their work more difficult, particularly as many of the new firms were based abroad, outside the jurisdiction of British law.

Struggling to cope with increased demands, a more complex environment, and working within laws that critics say are hopelessly outdated, GCHQ started casting around for new, innovative ideas. Tempora was one of them.

Though the documents are not explicit, it seems the Mastering the Internet programme began life in early 2007 and, a year later, work began on an experimental research project, run out of GCHQ's outpost at Bude in Cornwall.

Its aim was to establish the practical uses of an "internet buffer", the first of which was referred to as CPC, or Cheltenham Processing Centre.

By March 2010, analysts from the NSA had been allowed some preliminary access to the project, which, at the time, appears to have been codenamed TINT, and was being referred to in official documents as a "joint GCHQ/NSA research initiative".

TINT, the documents explain, "uniquely allows retrospective analysis for attribution" – a storage system of sorts, which allowed analysts to capture traffic on the internet and then review it.

The papers seen by the Guardian make clear that at some point – it is not clear when – GCHQ began to plug into the cables that carry internet traffic into and out of the country, and garner material in a process repeatedly referred to as SSE. This is thought to mean special source exploitation.

The capability, which was authorised by legal warrants, gave GCHQ access to a vast amount of raw information, and the TINT programme a potential way of being able to store it.

A year after the plaintive email asking for new ideas, GCHQ reported significant progress on a number of fronts.

One document described how there were 2 billion users of the internet worldwide, how Facebook had more than 400 million regular users and how there had been a 600% growth in mobile internet traffic the year before. "But we are starting to 'master' the internet," the author claimed. "And our current capability is quite impressive."

The report said the UK now had the "biggest internet access in Five Eyes" – the group of intelligence organisations from the US, UK, Canada, New Zealand and Australia. "We are in the golden age," the report added.

There were caveats. The paper warned that American internet service providers were moving to Malaysia and India, and the NSA was "buying up real estate in these places".

"We won't see this traffic crossing the UK. Oh dear," the author said. He suggested Britain should do the same and play the "US at [their] own game … and buy facilities overseas".

GCHQ's mid-year 2010-11 review revealed another startling fact about Mastering the Internet.

"MTI delivered the next big step in the access, processing and storage journey, hitting a new high of more than 39bn events in a 24-hour period, dramatically increasing our capability to produce unique intelligence from our targets' use of the internet and made major contributions to recent operations."

This appears to suggest GCHQ had managed to record 39bn separate pieces of information during a single day. The report noted there had been "encouraging innovation across all of GCHQ".

The NSA remarked on the success of GCHQ in a "Joint Collaboration Activity" report in February 2011. In a startling admission, it said Cheltenham now "produces larger amounts of metadata collection than the NSA", metadata being the bare details of calls made and messages sent rather than the content within them.

The close working relationship between the two agencies was underlined later in the document, with a suggestion that this was a necessity to process such a vast amount of raw information.

"GCHQ analysts effectively exploit NSA metadata for intelligence production, target development/discovery purposes," the report explained.

"NSA analysts effectively exploit GCHQ metadata for intelligence production, target development/discovery purposes. GCHQ and NSA avoid processing the same data twice and proactively seek to converge technical solutions and processing architectures."

The documents appear to suggest the two agencies had come to rely on each other; with Tempora's "buffering capability", and Britain's access to the cables that carry internet traffic in and out of the country, GCHQ has been able to collect and store a huge amount of information.

The NSA, however, had provided GCHQ with the tools necessary to sift through the data and get value from it.

By May last year, the volume of information available to them had grown again, with GCHQ reporting that it now had "internet buffering" capability running from its headquarters in Cheltenham, its station in Bude, and a location abroad, which the Guardian will not identify. The programme was now capable of collecting, a memo explained with excited understatement, "a lot of data!"

Referring to Tempora's "deep dive capability", it explained: "It builds upon the success of the TINT experiment and will provide a vital unique capability.

"This gives over 300 GCHQ and 250 NSA analysts access to huge amounts of data to support the target discovery mission. The MTI programme would like to say a big thanks to everyone who has made this possible … a true collaborative effort!"

Tempora, the document said, had shown that "every area of ops can get real benefit from this capability, especially for target discovery and target development".

But while the ingenuity of the Tempora programme is not in doubt, its existence may trouble anyone who sends and receives an email, or makes an internet phone call, or posts a message on a social media site, and expects the communication to remain private.

Campaigners and human rights lawyers will doubtless want to know how Britain's laws have been applied to allow this vast collection of data. They will ask questions about the oversight of the programme by ministers, MPs and the intelligence interception commissioner, none of whom have spoken in public about it. 




How does GCHQ's internet surveillance work?

What is an internet buffer?

In essence, an internet buffer is a little like Sky+, but on an almost unimaginably large scale. GCHQ, assisted by the NSA, intercepts and collects a large fraction of internet traffic coming into and out of the UK. This is then filtered to get rid of uninteresting content, and what remains is stored for a period of time – three days for content and 30 days for metadata.

The result is that GCHQ and NSA analysts have a vast pool of material to look back on if they are not watching a particular person in real time – just as you can use TV catch-up services to watch a programme you hadn't heard about when it was broadcast.
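The two retention windows described above (three days for content, 30 days for metadata) can be pictured as a rolling store that discards anything older than its window. What follows is only an illustrative sketch, not a description of how the real system is built. The names `Record`, `RollingBuffer` and `lookback` are invented for this example and do not come from the documents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Retention periods as reported in the article.
RETENTION = {
    "content": timedelta(days=3),
    "metadata": timedelta(days=30),
}


@dataclass
class Record:
    kind: str            # "content" or "metadata"
    stored_at: datetime  # when the record entered the buffer
    payload: str         # placeholder for whatever was stored


class RollingBuffer:
    """Hypothetical rolling store: records expire once their window has passed."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def add(self, record: Record) -> None:
        self._records.append(record)

    def expire(self, now: datetime) -> None:
        # Keep only records that are still inside their retention window.
        self._records = [
            r for r in self._records
            if now - r.stored_at <= RETENTION[r.kind]
        ]

    def lookback(self, now: datetime, kind: str) -> list[Record]:
        # Retrospective query: what of this kind is still available at `now`?
        self.expire(now)
        return [r for r in self._records if r.kind == kind]


start = datetime(2013, 1, 1)
buffer = RollingBuffer()
buffer.add(Record("content", start, "body of a message"))
buffer.add(Record("metadata", start, "sender, recipient, time"))

day4 = start + timedelta(days=4)
day31 = start + timedelta(days=31)
content_on_day4 = len(buffer.lookback(day4, "content"))     # 0: content has expired
metadata_on_day4 = len(buffer.lookback(day4, "metadata"))   # 1: metadata is still held
metadata_on_day31 = len(buffer.lookback(day31, "metadata")) # 0: metadata has expired
```

The point of the sketch is the asymmetry: after four days the content record is gone, while the metadata record remains searchable for most of a month.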


How is it done?

GCHQ appears to have intercepts placed on most of the fibre-optic communications cables in and out of the country. This seems to involve some degree of co-operation – voluntary or otherwise – from companies operating either the cables or the stations at which they come into the country.

These agreements, and the exact identities of the companies that have signed up, are regarded as extremely sensitive, and classified as top secret. Staff are instructed to be very careful about sharing information that could reveal which companies are "special source" providers, for fear of "high-level political fallout". In one document, the companies are described as "intercept partners".


How does it operate?

The system seems to operate by allowing GCHQ to survey internet traffic flowing through different cables at regular intervals, and then automatically detecting which are most interesting, and harvesting the information from those.

The documents suggest GCHQ was able to survey about 1,500 of the 1,600 or so high-capacity cables in and out of the UK at any one time, and aspired to harvest information from 400 or so at once – a quarter of all traffic.

As of last year, the agency had gone halfway, attaching probes to 200 fibre-optic cables, each with a capacity of 10 gigabits per second. In theory, that gave GCHQ access to a flow of 21.6 petabytes in a day, equivalent to 192 times the British Library's entire book collection.
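The 21.6 petabyte figure follows directly from the numbers quoted, assuming every probed cable runs at its full 10 gigabits per second around the clock and taking a petabyte as 10^15 bytes. A quick check of the arithmetic (the variable names are ours):

```python
CABLES = 200                      # probed cables, as reported
GIGABITS_PER_SECOND = 10          # capacity of each cable, as reported
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

bits_per_day = CABLES * GIGABITS_PER_SECOND * 10**9 * SECONDS_PER_DAY
bytes_per_day = bits_per_day // 8
petabytes_per_day = bytes_per_day / 10**15

# The article equates this to 192 British Library book collections,
# which implies a collection size of roughly this many terabytes.
implied_library_terabytes = petabytes_per_day * 1000 / 192

print(petabytes_per_day)          # 21.6
print(implied_library_terabytes)  # about 112.5
```

This is a theoretical ceiling on what the tapped links could carry, not a measurement of what was actually collected.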

GCHQ documents say efforts are made to automatically filter out UK-to-UK communications, but it is unclear how this would be defined, or whether it would even be possible in many cases.

For example, an email sent using Gmail or Yahoo from one UK citizen to another would be very likely to travel through servers outside the UK. Distinguishing these from communications between people in the UK and outside would be a difficult task.


What does this let GCHQ do?

GCHQ and NSA analysts, who share direct access to the system, are repeatedly told they need a justification to look for information on targets in the system and can't simply go on fishing trips – under the Human Rights Act, searches must be necessary and proportionate. However, when they do search the data, they have lots of specialist tools that let them obtain a huge amount of information from it: details of email addresses, IP addresses, who people communicate with, and what search terms they use.


What's the difference between content and metadata?

The simple analogy for content and metadata is that content is a letter, and metadata is the envelope. However, internet metadata can reveal much more than that: where you are, what you are searching for, who you are messaging and more.

One of the documents seen by the Guardian sets out how GCHQ defines metadata in detail, noting that "we lean on legal and policy interpretations that are not always intuitive". It notes that in an email, the "to", "from" and "cc" fields are metadata, but the subject line is content. The document also sets out how, in some circumstances, even passwords can be regarded as metadata.

The distinction is a very important one to GCHQ with regard to the law, the document explains: "There are extremely stringent legal and policy constraints on what we can do with content, but we are much freer in how we can store metadata. Moreover, there is obviously a much higher volume of content than metadata.

"For these reasons, metadata feeds will usually be unselected – we pull everything we see; on the other hand, we generally only process content that we have a good reason to target."




GCHQ taps fibre-optic cables for secret access to world's communications


Exclusive: British spy agency collects and stores vast quantities of global email messages, Facebook posts, internet histories and calls, and shares them with NSA, latest documents from Edward Snowden reveal


Britain's spy agency GCHQ has secretly gained access to the network of cables which carry the world's phone calls and internet traffic and has started to process vast streams of sensitive personal information which it is sharing with its American partner, the National Security Agency (NSA).

The sheer scale of the agency's ambition is reflected in the titles of its two principal components: Mastering the Internet and Global Telecoms Exploitation, aimed at scooping up as much online and telephone traffic as possible. This is all being carried out without any form of public acknowledgement or debate.

One key innovation has been GCHQ's ability to tap into and store huge volumes of data drawn from fibre-optic cables for up to 30 days so that it can be sifted and analysed. That operation, codenamed Tempora, has been running for some 18 months.

GCHQ and the NSA are consequently able to access and process vast quantities of communications between entirely innocent people, as well as targeted suspects.

This includes recordings of phone calls, the content of email messages, entries on Facebook and the history of any internet user's access to websites – all of which is deemed legal, even though the warrant system was supposed to limit interception to a specified range of targets.

The existence of the programme has been disclosed in documents shown to the Guardian by the NSA whistleblower Edward Snowden as part of his attempt to expose what he has called "the largest programme of suspicionless surveillance in human history".

"It's not just a US problem. The UK has a huge dog in this fight," Snowden told the Guardian. "They [GCHQ] are worse than the US."

However, on Friday a source with knowledge of intelligence argued that the data was collected legally under a system of safeguards, and had provided material that had led to significant breakthroughs in detecting and preventing serious crime.

Britain's technical capacity to tap into the cables that carry the world's communications – referred to in the documents as special source exploitation – has made GCHQ an intelligence superpower.

By 2010, two years after the project was first trialled, it was able to boast it had the "biggest internet access" of any member of the Five Eyes electronic eavesdropping alliance, comprising the US, UK, Canada, Australia and New Zealand.

UK officials could also claim GCHQ "produces larger amounts of metadata than NSA". (Metadata describes basic information on who has been contacting whom, without detailing the content.)

By May last year 300 analysts from GCHQ, and 250 from the NSA, had been assigned to sift through the flood of data.

The Americans were given guidelines for its use, but were told in legal briefings by GCHQ lawyers: "We have a light oversight regime compared with the US".

When it came to judging the necessity and proportionality of what they were allowed to look for, would-be American users were told it was "your call".

The Guardian understands that a total of 850,000 NSA employees and US private contractors with top secret clearance had access to GCHQ databases.

The documents reveal that by last year GCHQ was handling 600m "telephone events" each day, had tapped more than 200 fibre-optic cables and was able to process data from at least 46 of them at a time.

Each of the cables carries data at a rate of 10 gigabits per second, so the tapped cables had the capacity, in theory, to deliver more than 21 petabytes a day – equivalent to sending all the information in all the books in the British Library 192 times every 24 hours.

And the scale of the programme is constantly increasing as more cables are tapped and GCHQ data storage facilities in the UK and abroad are expanded with the aim of processing terabits (thousands of gigabits) of data at a time.

For the 2 billion users of the world wide web, Tempora represents a window on to their everyday lives, sucking up every form of communication from the fibre-optic cables that ring the world.

The NSA has meanwhile opened a second window, in the form of the Prism operation, revealed earlier this month by the Guardian, from which it secured access to the internal systems of global companies that service the internet.

The GCHQ mass tapping operation has been built up over five years by attaching intercept probes to transatlantic fibre-optic cables where they land on British shores carrying data to western Europe from telephone exchanges and internet servers in north America.

This was done under secret agreements with commercial companies, described in one document as "intercept partners".

The papers seen by the Guardian suggest some companies have been paid for the cost of their co-operation and GCHQ went to great lengths to keep their names secret. They were assigned "sensitive relationship teams" and staff were urged in one internal guidance paper to disguise the origin of "special source" material in their reports for fear that the role of the companies as intercept partners would cause "high-level political fallout".

The source with knowledge of intelligence said on Friday the companies were obliged to co-operate in this operation. They are forbidden from revealing the existence of warrants compelling them to allow GCHQ access to the cables.

"There's an overarching condition of the licensing of the companies that they have to co-operate in this. Should they decline, we can compel them to do so. They have no choice."

The source said that although GCHQ was collecting a "vast haystack of data" what they were looking for was "needles".

"Essentially, we have a process that allows us to select a small number of needles in a haystack. We are not looking at every piece of straw. There are certain triggers that allow you to discard or not examine a lot of data so you are just looking at needles. If you had the impression we are reading millions of emails, we are not. There is no intention in this whole programme to use it for looking at UK domestic traffic – British people talking to each other," the source said.

He explained that when such "needles" were found a log was made and the interception commissioner could see that log.

"The criteria are security, terror, organised crime. And economic well-being. There's an auditing process to go back through the logs and see if it was justified or not. The vast majority of the data is discarded without being looked at … we simply don't have the resources."

However, the legitimacy of the operation is in doubt. According to GCHQ's legal advice, it was given the go-ahead by applying old law to new technology. The 2000 Regulation of Investigatory Powers Act (Ripa) requires the tapping of defined targets to be authorised by a warrant signed by the home secretary or foreign secretary.

However, an obscure clause allows the foreign secretary to sign a certificate for the interception of broad categories of material, as long as one end of the monitored communications is abroad. But the nature of modern fibre-optic communications means that a proportion of internal UK traffic is relayed abroad and then returns through the cables.

Parliament passed the Ripa law to allow GCHQ to trawl for information, but it did so 13 years ago with no inkling of the scale on which GCHQ would attempt to exploit the certificates, enabling it to gather and process data regardless of whether it belongs to identified targets.

The categories of material have included fraud, drug trafficking and terrorism, but the criteria at any one time are secret and are not subject to any public debate. GCHQ's compliance with the certificates is audited by the agency itself, but the results of those audits are also secret.

An indication of how broad the dragnet can be was laid bare in advice from GCHQ's lawyers, who said it would be impossible to list the total number of people targeted because "this would be an infinite list which we couldn't manage".

There is an investigatory powers tribunal to look into complaints that the data gathered by GCHQ has been improperly used, but the agency reassured NSA analysts in the early days of the programme, in 2009: "So far they have always found in our favour".

Historically, the spy agencies have intercepted international communications by focusing on microwave towers and satellites. The NSA's intercept station at Menwith Hill in North Yorkshire played a leading role in this. One internal document quotes the head of the NSA, Lieutenant General Keith Alexander, on a visit to Menwith Hill in June 2008, asking: "Why can't we collect all the signals all the time? Sounds like a good summer project for Menwith."

By then, however, satellite interception accounted for only a small part of the network traffic. Most of it now travels on fibre-optic cables, and the UK's position on the western edge of Europe gave it natural access to cables emerging from the Atlantic.

The data collected provides a powerful tool in the hands of the security agencies, enabling them to sift for evidence of serious crime. According to the source, it has allowed them to discover new techniques used by terrorists to avoid security checks and to identify terrorists planning atrocities. It has also been used against child exploitation networks and in the field of cyberdefence.

It was claimed on Friday that it directly led to the arrest and imprisonment of a cell in the Midlands who were planning co-ordinated attacks; to the arrest of five Luton-based individuals preparing acts of terror, and to the arrest of three London-based people planning attacks prior to the Olympics.

As the probes began to generate data, GCHQ set up a three-year trial at the GCHQ station in Bude, Cornwall. By the summer of 2011, GCHQ had probes attached to more than 200 internet links, each carrying data at 10 gigabits a second. "This is a massive amount of data!" as one internal slideshow put it. That summer, it brought NSA analysts into the Bude trials. In the autumn of 2011, it launched Tempora as a mainstream programme, shared with the Americans.

The intercept probes on the transatlantic cables gave GCHQ access to its special source exploitation. Tempora allowed the agency to set up internet buffers so it could not only watch the data live but also store it – for three days in the case of content and 30 days for metadata.

"Internet buffers represent an exciting opportunity to get direct access to enormous amounts of GCHQ's special source data," one document explained.

The processing centres apply a series of sophisticated computer programmes in order to filter the material through what is known as MVR – massive volume reduction. The first filter immediately rejects high-volume, low-value traffic, such as peer-to-peer downloads, which reduces the volume by about 30%. Others pull out packets of information relating to "selectors" – search terms including subjects, phone numbers and email addresses of interest. Some 40,000 of these were chosen by GCHQ and 31,000 by the NSA. Most of the information extracted is "content", such as recordings of phone calls or the substance of email messages. The rest is metadata.
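The two filtering stages described above can be illustrated with a toy example: first discard traffic classed as high-volume and low-value, then keep only items that match a selector. This is a deliberately simplified sketch built on made-up data. The `Item` type, the traffic class labels and the selectors are all invented here, and nothing in the documents describes the real filters at this level of detail.

```python
from dataclasses import dataclass

# Hypothetical label for traffic the first filter rejects outright.
LOW_VALUE_CLASSES = {"peer-to-peer"}


@dataclass(frozen=True)
class Item:
    traffic_class: str      # e.g. "web", "email", "peer-to-peer"
    identifiers: frozenset  # addresses or numbers seen in the item


def reduce_volume(items, selectors):
    """Return (after_first_filter, after_selector_match)."""
    # Stage 1: drop high-volume, low-value traffic.
    stage1 = [i for i in items if i.traffic_class not in LOW_VALUE_CLASSES]
    # Stage 2: keep only items sharing at least one identifier with a selector.
    stage2 = [i for i in stage1 if i.identifiers & selectors]
    return stage1, stage2


# Ten invented items: three peer-to-peer, five unmatched, two matching a selector.
items = (
    [Item("peer-to-peer", frozenset()) for _ in range(3)]
    + [Item("web", frozenset({"bob@example.org"})) for _ in range(5)]
    + [
        Item("email", frozenset({"alice@example.org"})),
        Item("voice", frozenset({"+44 20 7946 0000"})),
    ]
)
selectors = frozenset({"alice@example.org", "+44 20 7946 0000"})

stage1, stage2 = reduce_volume(items, selectors)
dropped_by_first_filter = len(items) - len(stage1)  # 3 of 10, i.e. 30%
matched = len(stage2)                               # 2
```

In this made-up sample the first stage removes three items in ten, mirroring the roughly 30% reduction reported, and only the two selector matches survive the second stage.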

The GCHQ documents that the Guardian has seen illustrate a constant effort to build up storage capacity at the stations at Cheltenham, Bude and at one overseas location, as well as a search for ways to maintain the agency's comparative advantage as the world's leading communications companies increasingly route their cables through Asia to cut costs. Meanwhile, technical work is ongoing to expand GCHQ's capacity to ingest data from new super cables carrying data at 100 gigabits a second. As one training slide told new users: "You are in an enviable position – have fun and make the most of it."



Ewen MacAskill [the Guardian's Washington DC bureau chief]
Julian Borger [the Guardian's diplomatic editor]
Nick Hopkins [the Guardian's defence and security correspondent]
Nick Davies [author and a former Journalist of the Year]
James Ball [data journalist]

 The Guardian