Privately gathering statistics and training simple models

Last week, Luca Melis has presented our NDSS16 paper “Efficient Private Statistics with Succinct Sketches“, where we show how to privately and efficiently aggregate data from many sources and/or large streams, and then use the aggregate to extract useful statistics and train simple machine learning models.

Our work is motivated by a few “real-world” problems:

  • Media broadcasting providers like the BBC (with which we collaborate) routinely collect data from their users about videos they have watched (e.g., on BBC’s iPlayer) in order to provide users with personalized suggestions for other videos, based on recommender systems like Item k-Nearest Neighbor (ItemKNN)
  • Urban and transport planning committees, such as London’s mass transport operators, need to gather statistics about people’s movements and commutes, e.g., to improve transportation services and predict near-future trends and anomalies on a short notice.
  • Network infrastructures like the Tor network need to gather traffic statistics, like the number of, and traffic generated by, Tor hidden services, in order to tune design decisions as well as convince their founders the infrastructure is used for the intended purposes.

While different in their application, these examples exhibit a common feature: the need for providers to aggregate large amounts of sensitive information from large numbers of data sources, in order to produce aggregate statistics and possibly train machine learning models.

Prior work has proposed a few cryptographic tools for privacy-enhanced computation that could be use for private collection of statistics. For instance, by relying on homomorphic encryption and/or secret sharing, an untrusted aggregator can receive encrypted readings from users and only decrypt their sum. However, these require users to perform a number of cryptographic operations, and transmit a number of ciphertexts, linear in the size of their inputs, which makes it impractical for the scenarios discussed above, whereby inputs to be aggregated are quite large. For instance, if we use ItemKNN for the recommendations, we would need to aggregate values for “co-views” (i.e., videos that have been watched by the same user) of hundreds of videos at the time – thus, each user would have to encrypt and transfer hundreds of thousands of values at the time.

Scaling private aggregation

We tackle the problem from two points of view: an “algorithmic” one and a “system” one. That is, we have worked both on the design of the necessary cryptographic and data structure tools, as well as on making it easy for application developers to easily support these tools in web and mobile applications.

Our intuition is that, in many scenarios, it might be enough to collect estimates of statistics and trade off an upper-bounded error with significant efficiency gains. For instance, the accuracy of a recommender system might not be really affected if the statistics we need to train the model are approximated with a small error.

Continue reading Privately gathering statistics and training simple models

Jens Groth – Non-interactive zero knowledge proofs, efficient enough to be used in practice

The UCL information security group’s Jens Groth, a cryptographer, is one of 17 UCL researchers who have been awarded a Starting Grant by the European Research Council. The five-year grant will fund his work on the cryptographic building block known as “zero-knowledge proofs”, a widely applicable technique that underpins both security and trust. ERC Starting Grants are intended to support up-and-coming research leaders who are beginning to set up a research team and conduct independent research. Groth’s focus is on making zero- knowledge proofs more efficient so that they can become cheap enough to become a commonly used, standard security technology. Groth is also the recipient of a second grant from the Engineering and Physical Sciences Research Council to fund his work on another related topic, structure-preserving pairing-based cryptography.

https://www.youtube.com/watch?v=1XIekTniX00

“My line of thinking,” says Groth, “is that there’s been a lot of research into zero-knowledge proofs, but I don’t know of any groups taking entire systems from theory through to very practical implementations. I am hoping to build a group that will cover this entire span, and by covering it thoroughly get some very significant gains in efficiency.” Covering that entire spectrum from the purely abstract to the built system is important, he says, because “Practice can influence theory and give us some insight into what we should be looking at. Also, when you start implementing things, lots of surprising discoveries can come up.”

Unlike other types of cryptographic tools, such as public key cryptography, used in such widely used mass-market applications as SSL (used to secure data passed over the Web while in transit), Groth notes that zero-knowledge proofs are more likely to be a behind-the-scenes technology that end users will never touch directly.

“It will be hidden inside the system,” he says. “The main properties we want are completeness, soundness – and zero-knowledge.” Completeness means the prover can convince the verifier when a statement is true. Soundness means the prover cannot convince the verifier when the statement is false. Finally, zero-knowledge means that there is no leakage of information even if the prover is interacting with a fraudulent verifier.

Continue reading Jens Groth – Non-interactive zero knowledge proofs, efficient enough to be used in practice

Our contributions to the UK Distributed Ledger Technology report

The UK Government Office for Science, has published its report on “Distributed ledger technology: beyond block chain” to which UCL’s Sarah Meiklejohn, Angela Sasse and myself (George Danezis) contributed parts of the security and privacy material. The review, looks largely at economic, innovation and social aspects of these technologies. Our part discusses potential threats to ledgers, as well as opportunities to build robust security systems using ledgers (Certificate Transparency & CONIKS), and overcome privacy challenges, including a mention of the z.cash technology.

You can listen to the podcast interview Sarah gave on the report’s use cases, recommendations, but also more broadly future research directions for distributed ledgers, such as better privacy protection.

In terms of recommendation, I personally welcome the call for the Government Digital Services, and other innovation bodies to building capacity around distributed ledger technologies. The call for more research for efficient and secure ledgers (and the specific mention of cryptography research) is also a good idea, and an obvious need. When it comes to the specific security and privacy recommendation, it simply calls for standards to be established and followed. Sadly this is mildly vague: a standards based approach to designing secure and privacy-friendly systems has not led to major successes. Instead openness in the design, a clear focus on key end-to-end security properties, and the involvement of a wide community of experts might be more productive (and less susceptible to subversion).

The report is well timed: our paper on “Centrally Banked Crypto-Currencies” will be presented in February at a leading security conference, NDSS 2016, by Sarah Meiklejohn, largely inspired by the research agenda published by the Bank of England. It provides some answers to the problems of scalability and eco-friendliness of current proof-of-work based ledger design.

Insecure by design: protocols for encrypted phone calls

The MIKEY-SAKKE protocol is being promoted by the UK government as a better way to secure phone calls. The reality is that MIKEY-SAKKE is designed to offer minimal security while allowing undetectable mass surveillance, through the introduction a backdoor based around mandatory key-escrow. This weakness has implications which go further than just the security of phone calls.

The current state of security for phone calls leaves a lot to be desired. Land-line calls are almost entirely unencrypted, and cellphone calls are also unencrypted except for the radio link between the handset and the phone network. While the latest cryptography standards for cellphones (3G and 4G) are reasonably strong it is possible to force a phone to fall back to older standards with easy-to-break cryptography, if any. The vast majority of phones will not reveal to their user whether such an attack is under way.

The only reason that eavesdropping on land-line calls is not commonplace is that getting access to the closed phone networks is not as easy compared to the more open Internet, and cellphone cryptography designers relied on the equipment necessary to intercept the radio link being only affordable by well-funded government intelligence agencies, and not by criminals or for corporate espionage. That might have been true in the past but it certainly no longer the case with the necessary equipment now available for $1,500. Governments, companies and individuals are increasingly looking for better security.

A second driver for better phone call encryption is the convergence of Internet and phone networks. The LTE (Long-Term Evolution) 4G cellphone standard – under development by the 3rd Generation Partnership Project (3GPP) – carries voice calls over IP packets, and desktop phones in companies are increasingly carrying voice over IP (VoIP) too. Because voice calls may travel over the Internet, whatever security was offered by the closed phone networks is gone and so other security mechanisms are needed.

Like Internet data encryption, voice encryption can broadly be categorised as either link encryption, where each intermediary may encrypt data before passing it onto the next, or end-to-end encryption, where communications are encrypted such that only the legitimate end-points can have access to the unencrypted communication. End-to-end encryption is preferable for security because it avoids intermediaries being able to eavesdrop on communications and gives the end-points assurance that communications will indeed be encrypted all the way to their other communication partner.

Current cellphone encryption standards are link encryption: the phone encrypts calls between it and the phone network using cryptographic keys stored on the Subscriber Identity Module (SIM). Within the phone network, encryption may also be present but the network provider still has access to unencrypted data, so even ignoring the vulnerability to fall-back attacks on the radio link, the network providers and their suppliers are weak points that are tempting for attackers to compromise. Recent examples of such attacks include the compromise of the phone networks of Vodafone in Greece (2004) and Belgacom in Belgium (2012), and the SIM card supplier Gemalto in France (2010). The identity of the Vodafone Greece hacker remains unknown (though the NSA is suspected) but the attacks against Belgacom and Gemalto were carried out by the UK signals intelligence agency – GCHQ – and only publicly revealed from the Snowden leaks, so it is quite possible there are others attacks which remain hidden.

Email is typically only secured by link encryption, if at all, with HTTPS encrypting access to most webmail and Transport Layer Security (TLS) sometimes encrypting other communication protocols that carry email (SMTP, IMAP and POP). Again, the fact that intermediaries have access to plaintext creates a vulnerability, as demonstrated by the 2009 hack of Google’s Gmail likely originating from China. End-to-end email encryption is possible using the OpenPGP or S/MIME protocols but their use is not common, primarily due to their poor usability, which in turn is at least partially a result of having to stay compatible with older insecure email standards.

In contrast, instant messaging applications had more opportunity to start with a clean-slate (because there is no expectation of compatibility among different networks) and so this is where much innovation in terms of end-to-end security has taken place. Secure voice communication however has had less attention than instant messaging so in the remainder of the article we shall examine what should be expected of a secure voice communication system, and in particular see how one of the latest and up-coming protocols, MIKEY-SAKKE, which comes with UK government backing, meets these criteria.

MIKEY-SAKKE and Secure Chorus

MIKEY-SAKKE is the security protocol behind the Secure Chorus voice (and also video) encryption standard, commissioned and designed by GCHQ through their information security arm, CESG. GCHQ have announced that they will only certify voice encryption products through their Commercial Product Assurance (CPA) security evaluation scheme if the product implements MIKEY-SAKKE and Secure Chorus. As a result, MIKEY-SAKKE has a monopoly over the vast majority of classified UK government voice communication and so companies developing secure voice communication systems must implement it in order to gain access to this market. GCHQ can also set requirements of what products are used in the public sector and as well as for companies operating critical national infrastructure.

UK government standards are also influential in guiding purchase decisions outside of government and we are already seeing MIKEY-SAKKE marketed commercially as “government-grade security” and capitalising on their approval for use in the UK government. For this reason, and also because GCHQ have provided implementers a free open source library to make it easier and cheaper to deploy Secure Chorus, we can expect wide use MIKEY-SAKKE in industry and possibly among the public. It is therefore important to consider whether MIKEY-SAKKE is appropriate for wide-scale use. For the reasons outlined in the remainder of this article, the answer is no – MIKEY-SAKKE is designed to offer minimal security while allowing undetectable mass surveillance though key-escrow, not to provide effective security.

Continue reading Insecure by design: protocols for encrypted phone calls

Nicolas Courtois – Algebraic cryptanalysis is not the best way to break something, but sometimes it is the only option

Nicolas Courtois, a mathematician and senior lecturer in computer science at UCL, working with Daniel Hulme and Theodosis Mourouzis, has won the 2012 best paper award from the International Academy, Research, and Industry Association for their work on using SAT solvers to study various problems in algebra and circuit optimization. The research was funded by the European Commission under the FP7 project number 242497, “Resilient Infrastructure and Building Security (RIBS)” and by the UK Technology Strategy Board under project 9626-58525. The paper, Multiplicative Complexity and Solving Generalized Brent Equations with SAT Solvers, was presented at Computation Tools 2012, the third International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking, held in Nice, France in July.

https://www.youtube.com/watch?v=Lfen7ZOIkYw

SAT (short for “satisfiability”) solvers are algorithms used to analyse logical problems composed of multiple statements such as “A is true OR not-B is true or C is true” for the purpose of determining whether the whole system can be true – that is, whether all the statements it’s composed of can be satisfied. SAT solvers also are used to determine how to assign the variables to make the set of statements true. In 2007, Bard and Courtois realised they could be used to test the security of cryptographic functions and measure their complexity, and today they are important tools in cryptanalysis; they have already been used for a long time in other applications such as verifying hardware and software. In this particular paper, Courtois, Hulme, and Mourouzis focused on optimising S-boxes for industrial block ciphers; the paper reports the results of applying their methodology to the PRESENT and GOST block ciphers. Reducing the complexity and hardware cost of these ciphers is particularly important to build so-called secure implementations of cryptography. These are particularly costly because they need to protect against additional threats such as side-channel attacks, in which the attacker exploits additional information leaked from the physical system – for example, by using an oscilloscope to observe a smart card’s  behaviour.

“It’s more a discovery than an invention,” says Courtois. “One of the amazing things SAT solvers can do is give you proof that something is not true.” The semiconductor industry provides one application of the work in this paper: these techniques promise to provide a way to test whether a circuit has been built with the greatest possible efficiency by proving that the chip design uses the smallest possible number of logic gates.

“You’ll get optimal designs and be able to prove they cannot be done better,” he says.

Classical cryptanalysis proceeds by finding approximations to the way a cipher works. Many successful academic attacks have been mounted using such techniques, but they rely on having a relatively large amount of data available for study. That works for large archives of stored data – such as, for example, the communications stored and kept by the Allies after World War II for later cryptanalysis. But in many real-world applications, it is more common to have only very small amounts of data.

“The more realistic scenario is that you’ll just have one or a few messages,” says Courtois. Bluetooth, for example, encrypts only 1,500 bits with a single key. “Most attacks are useless because they won’t work with this quantity of data.” Algebraic cryptanalysis, which he explained in New Frontier in Symmetric Cryptanalysis, an invited talk at Indocrypt 2008, by contrast, is one of the few techniques that can be hoped to work in such difficult situations.

Continue reading Nicolas Courtois – Algebraic cryptanalysis is not the best way to break something, but sometimes it is the only option

Program obfuscation

I had the pleasure of visiting the Simons Institute for the Theory of Computing at UC Berkeley over the Summer. One of the main themes of the programme was obfuscation.

Recently there has been a lot of exciting research on developing cryptographic techniques for program obfuscation. Obfuscation is of course not a new thing, you may already be familiar with the obfuscated C contest. But the hope underlying this research effort is to replace ad hoc obfuscation, which may or may not be possible to reverse engineer, with general techniques that can be applied to obfuscate any program and that satisfy a rigorous definition of what it means for a program to be obfuscated.

Even defining what it means for a program to be obfuscated is not trivial. Ideally, we would like an obfuscator to be a compiler that turns any computer program into a virtual black-box. By a virtual black-box we mean that the obfuscated program should preserve functionality while not leaking any other information about the original code, i.e., it should have the same input-output behaviour as the original program and you should not be able to learn anything beyond what you could learn by executing the original program. Unfortunately it turns out that this is too ambitious a goal: there are special programs, which are impossible to virtual black-box obfuscate.

Instead cryptographers have been working towards developing something called indistinguishability obfuscation. Here the goal is that given two functionally equivalent programs P1 and P2 of the same size and an obfuscation of one of them O(Pi), it should not be possible to tell which one has been obfuscated. Interestingly, even though this is a weaker notion than virtual black-box obfuscation it has already found many applications in the construction of cryptographic schemes. Furthermore, it has been proved that indistinguishability obfuscation is the best possible obfuscation in the sense that any information leaked by an obfuscated program is also leaked by any other program computing the same functionality.

So, when will you see obfuscation algorithms that you can use to obfuscate your code? Well, current obfuscation algorithms have horrible efficiency and are far from practical applicability so there is still a lot of research to be done on improving them. Moreover, current obfuscation proposals are based on something called graded encoding schemes. At the moment, there is a tug of war going on between cryptographers proposing graded encoding schemes and cryptanalysts breaking them. Breaking graded encoding schemes does not necessarily break the obfuscation algorithms building on them but it is fair to say that right now the situation is a mess. Which from a researcher’s perspective makes the field very exciting since there is still a lot to discover!

If you want to learn more about obfuscation I recommend the watching some of the excellent talks from the programme.

Sarah Meiklejohn – Security and Cryptography

Sarah Meiklejohn As a child, Sarah Meiklejohn thought she might become a linguist, largely because she was so strongly interested in the work being done to decode the ancient Greek writing systems Linear A and Linear B.

“I loved all that stuff,” she says. “And then I started doing mathematics.” At that point, with the help of Simon Singh’s The Code Book, she realised the attraction was codebreaking rather than human languages themselves. Simultaneously, security and privacy were increasingly in the spotlight.

“I’m a very private person, and so privacy is near and dear to my heart,” she says. “It’s an important right that a lot of people don’t seem interested in exercising, but it’s still a right. Even if no one voted we would still agree that it was important for people to be able to vote.”

It was during her undergraduate years at Brown, which included a fifth-year Masters degree, that she made the transition from mathematics to cryptography and began studying computer science. She went on to do her PhD at the University of California at San Diego. Her appointment at UCL, which is shared between the Department of Computer Science and the Department of Crime Science, is her first job.

Probably her best-known work is A Fistful of Bitcoins: Characterizing Payments Among Men with No Names (PDF), written with Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and Stefan Savage and presented at USENIX 2013, which studied the question of how much anonymity bitcoin really provides.

“The main thing I was trying to focus on in that paper is what bitcoin is used for,” she says. The work began with buying some bitcoin (in 2012, at about £3 each), and performing some transactions with them over a period of months. Using the data collected this way allowed her to uncover some “ground truth” data.

“We developed these clustering techniques to get down to single users and owners.” The result was that they could identify which addresses belonged to which exchanges and enabled them to get a view of what was going on in the network. “So we could say this many bitcoins passed through this exchange per month, or how many were going to underground services like Silk Road.”

Continue reading Sarah Meiklejohn – Security and Cryptography

Just how sophisticated will card fraud techniques become?

In late 2009, my colleagues and I discovered a serious vulnerability in EMV, the most widely used standard for smart card payments, known as “Chip and PIN” in the UK. We showed that it was possible for criminals to use a stolen credit or debit card without knowing the PIN, by tricking the terminal into thinking that any PIN is correct. We gave the banking industry advance notice of our discovery in early December 2009, to give them time to fix the problem before we published our research. After this period expired (two months, in this case) we published our paper as well explaining our results to the public on BBC Newsnight. We demonstrated that this vulnerability was real using a proof-of-concept system built from equipment we had available (off-the shelf laptop and card reader, FPGA development board, and hand-made card emulator).

No-PIN vulnerability demonstration

After the programme aired, the response from the banking industry dismissed the possibility that the vulnerability would be successfully exploited by criminals. The banking trade body, the UK Cards Association, said:

“We believe that this complicated method will never present a real threat to our customers’ cards. … Neither the banking industry nor the police have any evidence of criminals having the capability to deploy such sophisticated attacks.”

Similarly, EMVCo, who develop the EMV standards said:

“It is EMVCo’s view that when the full payment process is taken into account, suitable countermeasures to the attack described in the recent Cambridge Report are already available.”

It was therefore interesting to see that in May 2011, criminals were caught having stolen cards in France then exploiting a variant of this vulnerability to buy over €500,000 worth of goods in Belgium (which were then re-sold). At the time, not many details were available, but it seemed that the techniques the criminals used were much more sophisticated than our proof-of-concept demonstration.

We now know more about what actually happened, as well as the banks’ response, thanks to a paper by the researchers who performed the forensic analysis that formed part of the criminal investigation of this case. It shows just how sophisticated criminals could be, given sufficient motivation, contrary to the expectations in the original banking industry response.

Continue reading Just how sophisticated will card fraud techniques become?

UCL Code Breaking Competition

6689260_sModern security systems frequently rely on complex cryptography to fulfil their goals and so it is important for security practitioners to have a good understanding of how cryptographic systems work and how they can fail. The Cryptanalysis (COMPGA18/COMPM068) module in UCL’s MSc Information Security provides students with the foundational knowledge to analyse cryptographic systems whether as part of system development in industry or as academic research.

To give students a more realistic (and enjoyable) experience there is no written exam for this module; instead the students are evaluated based on coursework and a code breaking competition.

UCL has a strong tradition of experimental research and we have been running many student competitions and hacking events in the past. In March 2013 a team directed by Dr Courtois won the UK University Cipher Challenge 2013 award, held as part of the UK Cyber Security Challenge.

This year the competition has been about finding cryptographically significant events in a real-life financial system. The competition (open both to UCL students and those of other London universities) requires the study of random number generators, elliptic curve cryptography, hash functions, exploration of large datasets, programming and experimentation, data visualisation, graphs and statistics.

We are pleased to announce the winners of the competition:

  • Joint 1st prize: Gemma Bartlett. Grade obtained 92/100.
  • Joint 1st prize: Vasileios Mavroudis.  Grade obtained 92/100.
  • 2nd prize: David Kohan Marzagão.  Grade obtained 82/100.

About the winners:

gemmb vasmdavm

  • Gemma Bartlett (left) is in her final year at UCL studying for an M.Eng. in Mathematical Computation with a focus on Information Security. Her particular interests include digital forensics. She will be starting a job in this field after graduation.
  • Vasilios Mavroudis (middle) received his B.Sc. in Applied Informatics from the University of Macedonia, Greece in 2012.  He is currently pursuing an M.Sc. in Information Security at UCL. In the past, he has worked as a security researcher in Deutsche Bank, University of California Santa Barbara and at the Centre for Research and Technology Hellas (CERTH). His research interests include network and systems security, malware, and applied cryptography.
  • David Kohan Marzagão (right) is currently undertaking a PhD in Computer Science under the supervision of Peter McBurney at King’s College London.  In 2014, he received his BSc in Mathematics at the University of São Paulo, Brazil. His research interests include cryptography, multi-agent systems, graph theory, and random walks.

Teaching cybersecurity to criminologists

I recently had the pleasure of teaching my first module at UCL, an introduction to cybersecurity for students in the SECReT doctoral training centre.

The module had been taught before, but always from a fairly computer-science-heavy perspective. Given that the students had largely no background in computer science, and that my joint appointment in the Department of Security and Crime Science has given me at least some small insight into what aspects of cybersecurity criminologists might find interesting, I chose to design the lecture material largely from scratch. I tried to balance the technical components of cybersecurity that I felt everyone needed to know (which, perhaps unsurprisingly, included a fair amount of cryptography) with high-level design principles and the overarching question of how we define security. Although I say I designed the curriculum from scratch, I of course ended up borrowing heavily from others, most notably from the lecture and exam material of my former supervisor’s undergraduate cybersecurity module (thanks, Stefan!) and from George’s lecture material for Introduction to Computer Security. If anyone’s curious, the lecture material is available on my website.

As I said, the students in the Crime Science department (and in particular the ones taking this module) had little to no background in computer science.  Instead, they had a diverse set of academic backgrounds: psychology, political science, forensics, etc. One of the students’ proposed dissertation titles was “Using gold nanoparticles on metal oxide semiconducting gas sensors to increase sensitivity when detecting illicit materials, such as explosives,” so it’s an understatement to say that we were approaching cybersecurity from different directions!

With that in mind, one of the first things I did in my first lecture was to take a poll on who was familiar with certain concepts (e.g., SSH, malware, the structure of the Internet), and what people were interested in learning about (e.g., digital forensics, cryptanalysis, anonymity). I don’t know what I was expecting, but the responses really blew me away! The students overwhelmingly wanted to hear about how to secure themselves on the Internet, both in terms of personal security habits (e.g., using browser extensions) and in terms of understanding what and how things might go wrong. Almost the whole class specifically requested Tor, and a few had even used it before.

This theme of being (pleasantly!) surprised continued throughout the term.  When I taught certificates, the students asked not for more details on how they work, but if there was a body responsible for governing certificate authorities and if it was possible to sue them if they misbehave. When I taught authentication, we played a Scattergories-style game to weigh the pros and cons of various authentication mechanisms, and they came up with answers like “a con of backup security questions is that they reveal cultural trends that may then be used to reveal age, ethnicity, gender, etc.”

There’s still a month and a half left until the students take the exam, so it’s too soon to say how effective it was at teaching them cybersecurity, but for me the experience was a clear success and one that I look forward to repeating and refining in the future.