Bentham’s Gaze

On the security and privacy of the ultrasound tracking ecosystem

In April 2016, the US Federal Trade Commission (FTC) sent warning letters to 12 Google Play app developers. The letters were addressed to those who incorporated the SilverPush framework in their apps, and reminded developers who used tracking software to explicitly inform their users (as seen in Section 5 of the Federal Trade Commission Act). The incident was covered by popular press and privacy concerns were raised. Shortly after, SilverPush claimed no active partnerships in the US and the buzz subsided.

Unfortunately, as the incident was resolved relatively fast, very few technical details of the technology were made public. To fill in this gap and understand the potential security implications, we conducted an in-depth study of the SilverPush framework and all the associated technologies.

The development of the framework was motivated by a fast-increasing interest of the marketing industry in products performing high-accuracy user tracking, and their derivative monetization schemes. This resulted in a high demand for cross-device tracking techniques with increased accuracy and reduced prerequisites.

The SilverPush framework fulfilled both of these requirements, as it provided a novel way to track users between their devices (e.g., TV, smartphone), without any user actions (e.g., login to a single platform from all their devices). To achieve that, the framework realized a previously unseen cross-device tracking technique (i.e., ultrasound cross-device tracking, uXDT) that offered high tracking accuracy, and came with various desirable features (e.g., easy to deploy, imperceptible by users). What differentiated that framework from existing ones was the use of high-frequency, inaudible ultrasonic beacons (uBeacons) as a medium/channel for identifier transmission between the user’s devices. This is also offered a major advantage to uXDT, against other competing technologies, as uBeacons can be emitted by most commercial speakers and captured by most commercial microphones. This eliminates the need for specialized emission and/or capturing equipment.

Aspects of a little-known ecosystem

The low deployment cost of the technology fueled the growth of a whole ecosystem of frameworks and applications that use uBeacons for various purposes, such as proximity marketing, audience analytics, and device pairing. The ecosystem is built around the near-ultrasonic transmission channel, and enables marketers to profile users.

Unfortunately, users are often given limited information on the ecosystem’s inner workings. This lack of transparency has been the target of great criticism from the users, the security community and the regulators. Moreover, our security analysis revealed a false assumption in the uBeacon threat model that can be exploited by state-level adversaries to launch complex attacks, including one that de-anonymizes the users of anonymity networks (e.g., Tor).

On top of these, a more fundamental shortcoming of the ecosystem is the violation of the least privilege principle, as a consequence of the access to the device’s microphone. More specifically, any app that wants to employ ultrasound-based mechanisms needs to gain full access to the device’s microphone, as there is currently no way to gain access only to the ultrasound spectrum. This clearly violates the least privilege principle, as the app has now access to all audible frequencies and allows a potentially malicious developer to request access to the microphone for ultrasound-pairing purposes, and then use it to eavesdrop the user’s discussions. This also results in any ultrasound-enabled apps to risk being perceived as “potentially malicious” by the users.

Mitigation

To address these shortcomings, we developed a set of countermeasures aiming to provide protection to the users in the short and medium term. The first one is an extension for the Google Chrome browser, which filters out all ultrasounds from the audio output of the websites loaded by the user. The extension actively prevents web pages from emitting inaudible sounds, and thus completely thwarts any unsolicited ultrasound-tracking attempts. Furthermore, we developed a patch for the Android permission system that allows finer-grained control over the audio channel, and forces applications to declare their intention to capture sound from the inaudible spectrum. This will properly separate the permissions for listening to audible sound and sound in the high-frequency spectrum, and will enable the end users to selectively filter the ultrasound frequencies out of the signal acquired by the smartphone microphone.

More importantly, we argue that the ultrasound ecosystem can be made secure only with the standardization of the ultrasound beacon format. During this process, the threat model will be revised and the necessary security features for uBeacons will be specified. Once this process is completed, APIs for handling uBeacons can be implemented in all major operating systems. Such an API would provide methods for uBeacon discovery, processing, generation and emission, similar to those found in the Bluetooth Low Energy APIs. Thereafter, all ultrasound-enabled apps will need access only to this API, and not to the device’s microphone. Thus, solving the problem of over-privileging that exposed the user’s sensitive data to third-party apps.

Discussion

Our work provides an early warning on the risks looming in the ultrasound ecosystem, and lays the foundations for the secure use of this set of technologies. However, it also raises several questions regarding the security of the audio channel. For instance, in a recent incident a journalist accidentally injected commands to several amazon echo devices, which then allegedly tried to order products online. This underlines the need for security features in the audio channel. Unfortunately, due to the variety of use cases, a universal solution that could be applied to the lower communication layers seems unlikely. Instead, solutions must be sought in the higher communications layers (e.g., application layers), and should be the outcome of careful threat modeling.

Inaugural Lecture: Zero-Knowledge Proofs

We held our annual ACE-CSR event in November 2016. The last talk was my inaugural lecture to full professor. I did not write the summary below myself, hence the use of third person, not because I now consider myself royalty! 🙂

In introducing Jens Groth, professor of cryptology, George Danezis, head of the information security group, commented that Groth’s work provided the only viable solutions to many of the hard privacy problems he himself was tackling. To most qualified engineers, he said, the concept of zero-knowledge proofs seems impossible: the idea is to show the properties of a secret without revealing them. A zero-knowledge proof could, for example, verify the result of a computation on some data without revealing the data itself. Most engineers believe that you must choose between integrity and confidentiality; Groth has proved this is not true. In addition, Danezis praised Groth’s work as highly creative, characterised by great mathematical depth and subtlety, and admired Groth’ willingness to speak his mind fearlessly even to government funders. Angela Sasse, head of the department, called Groth’s work “security tools we’re going to need for future generations”, and noted that simultaneously with these other accomplishments Groth helped put in place the foundation for the group as it is today.

Groth, jokingly opted to structure his talk around papers he’s had rejected to illustrate how hard it can be to publish innovative research. The concept of zero-knowledge proofs originated with a 1985 paper by Shafi Goldwasser, Silvio Micali, and Charles Rackoff. Zero-knowledge proofs have three characteristics: completeness (the prover can convince the verifier that the statement is true); soundness (the claiming prover cannot convince the verifier when the statement is false); and secrecy (no information other than the truth of the statement is leaked, even when the prover is interacting with a verifier who cheats). Groth illustrated the latter idea with a simple card trick: he asked an audience member to choose a card and then say whether the card was a heart or not. If the respondent shows all the cards that are not hearts, counting these proves that the selected card must be a heart without revealing what it is.

Zero-knowledge proofs can be extended to think about more complicated statements. Groth listed some examples:

Assert that a logical formula has an assignment to the variables that makes it true
Verify that a graph is Hamiltonian – that is, there is a path that touches each vertex exactly once
That a set of inputs into a Boolean circuit will produce an output of 1
Any statement of the general form U belongs to some NP-language L

Groth could see many possible applications for these proofs: signatures, encryption, electronic cash, electronic auctions, internet voting, multiparty computation, and verifiable cloud computing. His overall career has focused on building versatile and efficient proofs with the goal of moving them from being expensive and slow to being just a fraction of the cost of the task that’s being executed so that people would stop thinking about the cost and just toss them in as a standard part of any transaction.

Continue reading Inaugural Lecture: Zero-Knowledge Proofs

Security intrusions as mechanisms

The practice of security often revolves around figuring out what (malicious act) happened to a system. This historical inquiry is the focus of forensics, specifically when the inquiry regards a policy violation (such as a law). The results of forensic investigation might be used to fix the impacted system, attribute the attack to adversaries, or build more resilient systems going forwards. However, to execute any of these purposes, the investigator first must discover the mechanism of the intrusion.

As discussed at an ACE seminar last October, one common framework for this discovery task is the intrusion kill chain. Mechanisms, mechanistic explanation, and mechanism discovery have highly-developed meanings in the biological and social sciences, but the word is not often used in information security. In a recent paper, we argue that incident response and forensics investigators would be well-served to make use of the existing literature on mechanisms, as thinking about intrusion kill chains as mechanisms is a productive and useful way to frame the work.

To some extent, thinking mechanistically is a description of what (certain) scientists do. But the mechanisms literature within philosophy of science is not merely descriptive. The normative benefits extolled include that thinking mechanistically is an effective heuristic for searching out useful explanations; mechanisms provide the most coherent unity to complex fields of study; and that mechanistic explanation is necessary to guide selection among potential studies given limited experimental resources, experiment design decisions, and interpretation of statistical results. I previously argued that capricious use of biological metaphors is bad for information security. We are keenly aware that these benefits of mechanistic explanation need to apply to security as and for security, not merely because they work in other sciences.

Our paper demonstrates how we can cast the intrusion kill chain, the diamond model, and other models of security intrusions as mechanistic models. This casting begins to demonstrate the mosaic unity of information security. Campaigns are made up of attacks. Attacks, as modeled by the kill chain, have multiple steps. In a specific attack, the delivery step might be accomplished by a drive-by-download. So we demonstrate how drive-by-downloads are a mechanism, one among many possible delivery mechanisms. This description is a schema to be filled in during a particular drive-by download incident with a specific URL and specific javascript, etc. The mechanistic schema of the delivery mechanism informs the investigator because it indicates what types of network addresses to look for, and how to fit them into the explanation quickly. This process is what Lindley Darden calls schema instantiation in the mechanism discovery literature.

Our argument is not that good forensics investigators do not do such mechanism discovery strategies. Rather, it is precisely that good investigators do do them. But we need to describe what it is good investigators in fact do. We do not currently, and that lack makes teaching new investigators particularly difficult. Thinking about intrusions as mechanisms unlocks an expansive literature on good ways to do mechanism discovery. This literature will make it easier to codify what good investigators do, which among other benefits allows us to better teach sound methodological practices to incoming investigators.

Our paper on this topic was published in the open-access Journal of Cybersecurity, as Thinking about intrusion kill chains as mechanisms, by Jonathan M. Spring and Eric Hatleback.

Privacy analysis of the W3C Proximity Sensor specification

Mobile developers are familiar with proximity sensors. These provide information about the distance of an object from the device providing access to the sensor (i.e. smartphone, tablet, laptop, or another Web of Things device). This object is usually the user’s head or hand.

Proximity sensors provide binary values such as far (from the object) or near (respectively), or a more verbose readout, in centimeters.

There are many useful applications for proximity. For example, an application can turn off the screen if the user holds the device close to his/her head (face detection). It is also handy for avoiding the execution of undesired actions, possibly arising from scratching the head with the phone’s screen.

The Proximity Sensor API specification is being standardized by W3C. Every web site will be able to access this information. Let’s focus on the privacy engineering point of view.

Proximity is the distance between an object and device. The latest version of W3C Proximity Sensors API takes advantage of a soon-to-be standard Generic Sensors API. The sensor provides the proximity distance in centimeters.

The use of Proximity Sensors is simple; an example displaying the current distance in centimeters using the modern syntax of Generic Sensors API is as follows:

let sensor = new ProximitySensor();

sensor.start();

sensor.onchange = function(){console.log(event.reading.distance)};

This syntax might later allow requesting various proximity sensors (if the device has more than one), including those based on the sensor’s position (such as “rear” or “front”), but the current implementations generally still use the legacy version from the previous edition of the spec:

window.addEventListener('deviceproximity', function(event) {  
console.log(event.value)  
});

From the perspective of privacy considerations, there is currently no significant difference between those two API versions.

Privacy analysis of Proximity Sensor API

Proximity sensor readout is not providing much data, so it may appear there are no privacy implications, but there are good reasons for performing such analysis. Let’s list two:

Firstly, designing new standards and systems with privacy in mind — privacy by design — is a required practice and a good idea. Secondly, in some circumstances even potentially insignificant mechanisms can still bring consequences from privacy point of view. For example, multiple identifiers might be helpful in deanonymizing users. Non-obvious data leaks can surface, as well.

So let’s discuss some of the possible issues.

The average distance from the device to the user’s face (i.e. object) could be used to differentiate and discriminate between users. While the severity is hard to establish, future changes (i.e. influencing with other external data) cannot be ruled out.

If proximity patterns would be individually-attributable, this would offer a possibility to enhance user profiling based on the analysis of device use patterns. For example, the following could be obtained and analyzed:

Frequency of the user’s patterns of device use (i.e. waving a hand in front of the proximity sensor)
Frequency of the user’s zooming in and out (i.e. device close to the user’s head)
Patterns of use. Does the way the user hold the device vary during the day? How?
What are the mechanics of holding the device close to the user’s head? Can the distance vary?

Some users may use the mobile operating system’s zoom capability to increase the font or images, but others might casually prefer to hold the device closer to their eyes. Such behavioral differences can also be of note.

Proximity sensors can provide the following data: the current proximity distance and max, the maximum distance readout supported by the device. In case the value of max would differ between various implementations (e.g. among browsers, devices), it could form an identifier. And the two (max, distance) combined could also be used as short-lived identifiers. Moreover, it is not so difficult to imagine a situation where those identifiers are actually not so short-lived.

Recommendations

First of all, is there a need to provide a verbose proximity readout at all? For example, is providing readouts of proximity (distance) value up to 150 cm necessary?

In general, a device (browser, a Web of Things device) should be capable of informing the user when a web site accesses proximity information. The user should also be able to inspect which web sites – and how frequently – accessed the API.

Finally, the sensors should provide an adequately verbose readout of the distance.

Proximity Sensors should also be subject to permissions.

Demonstration

At the moment, the implementations of proximity sensors are limited. But a demonstration is running on my SensorsPrivacy research project (tested against Firefox Mobile on Android). In my case (Nexus 5), Firefox offered data in centimeters, but the verbosity was limited to 5 centimeters on my device. Again, this is a matter of hardware and software. In principle, the accuracy could be much higher and one can imagine an accuracy a sensor providing more accurate readouts (even up to 1 meter).

The screen dump below shows how the example demonstrates the use of proximity data (and whether it works on a browser). The site changes its background color if an object is placed near the proximity sensor of a device.

The readout from the demonstration shows a relative time between events (first column), and the proximity distance (second column).

The sensor on my device is reporting two values of distance: 0 (near) and 5 (far), in centimeters. In general, the granularity is subject to the following: standard, implementation and hardware.

And this suggests something, of course. So let’s complement the privacy analysis from the previous section to incorporate a timing analysis.

Even such a limited readout can help performing behavioural analysis, simply by profiling based on time series. As we can see, the sensor readout is quite sensitive in respect to time and can measure with sub-200 ms granularity. The demonstration proof of principle is also showing the minimum relative detected time between events. Feel free to test the performance of your fingers and/or hardware!

Summary

When designing a standard project or an implementation, paying attention to details is imperative. This includes consideration of even potential risks. This is especially the case if the software or systems will be used by millions or hundreds of millions of users, i.e. you are aiming for success.

Finally, standards and specifications are ideal for issuing guidance and good practices

This post originally appeared on Security, Privacy & Tech Inquiries, the blog of Lukasz Olejnik. An accompanying demonstration is available on SensorsPrivacy (a project studing privacy of web sensors).

Blockchains and Why We Need Privacy

The introduction of ZCash, and subsequent articles explaining the technology, brought (slightly more) mainstream attention to a situation that people interested in blockchain technology have been aware of for quite a while: Bitcoin doesn’t provide users with a lot of anonymity.

Bitcoin instead provides a property referred to as pseudonymity (derived from pseudonym, meaning ‘false name’). Spending and receiving bitcoin can be done without transacting parties ever learning each other’s off-chain (real world) identities. Users have a private key with which to spend their funds, a public key so that they can receive funds, and an address where the funds are stored, on the blockchain. However, even if addresses are frequently changed, very revealing analysis can be performed on the information stored in the blockchain, as all transactions are stored in the clear.

For example, imagine if you worked at a small company and you were all paid in bitcoin – based on your other purchases, your colleagues could have a very easy time linking your on-chain addresses to you. This is more revealing than using a bank account, and we want cryptocurrencies to offer better properties than bank accounts in every way. Higher speed, lower cost and greater privacy of transfers are all essential.

ZCash is a whole different ball game. Instead of sending transactions in the clear, with the transaction value and sender and recipient address stored on the blockchain for all to see, ZCash transactions instead produce a zero-knowledge proof that the sender owns an amount of ZEC greater than or equal to the amount that they are trying to spend.

Put more simply, you can submit a proof that you have formed a transaction properly, rather than submitting the actual transaction to be stored forever on the blockchain. These proofs take around 40 seconds to generate, and the current supply of ZCash is a very limited 7977 ZEC. So I’m going to take you through some privacy enhancing methods that work on top of Ethereum, a cryptocurrency with approximately 15 second blocktimes and a current market cap of nearly $1 billion. If we do everything right, our anonymous transactions might even be mined before the ZCash proof has even finished generating.

Statistics from Zchain — Currently 229/(7748+229)=2.87% of ZEC is stored in ‘shielded’ accounts, the rest (97.13%) is in a state with Bitcoin-like privacy properties. Stats and graph: Zchain.

Blockchains

If a lot of the words above meant nothing to you, here’s a blockchain/Bitcoin/Ethereum/smart contract primer. If you already have your blockchain basics down, feel free to skip to part 2.

Bitcoin

Satoshi Nakamoto introduced the world to the proof-of-work blockchain, through the release of the bitcoin whitepaper in 2008, allowing users and interested parties to consider for the first time a trustless system, with which it is possible to securely transfer money to untrusted and unknown recipients. Since its launch, the success of bitcoin has motivated the creation of many other cryptocurrencies, both those built upon Bitcoin’s underlying structure, and those built entirely independently.

Cryptocurrencies are most simply described as ‘blockchains’ with a corresponding token or coin, with which you can create transactions that are then verified and stored in a block on the underlying blockchain. Put even more simply, they look a little like this:

detailedblockchain

Transactions in bitcoin are generally simple transfers of tokens from one account to another, and are stored in a block in the clear, with transaction value, sender, and recipient all available for any curious individual to view, analyse, or otherwise trace.

The blockchain has a consensus algorithm. This is essential for construction of one agreed-upon set of transactions, rather than multiple never-converging views of who currently owns which bitcoins.

diverge

The most common consensus algorithms in play at the moment are ‘proof of work’ algorithms, which require a computationally hard ‘puzzle’ to be solved by miners (a collection of competing participants) in order to make the cost of subverting or reversing transactions prohibitively expensive. Transactions are verified and mined as follows:

animine

To really understand how the privacy enhancing protocol conceived and implemented during my Master’s thesis works, we need to go a little deeper into the inner workings of a blockchain platform called Ethereum.

Ethereum: Programmable Money

In response to the limitations of Bitcoin’s restricted scripting language (you can pretty much only transfer money from A to B with Bitcoin, for security reasons), the Ethereum platform was created, offering an (almost) Turing-complete distributed virtual machine atop the Ethereum blockchain, along with a currency called Ether. The increased scripting ability of the system enables developers to create ‘smart contracts’ on the blockchain, programs with rich functionality and the ability to operate on the blockchain state. The blockchain state records current ownership of money and of the local, persistent storage offered by Ethereum. Smart contracts are limited only by the amount of gas they consume. Gas is a sub-currency of the Ethereum system, existing to impose a limit on the amount of computational time an individual contract can use.

The arrows along the top show how to produce each piece of data from the previous. The labels on the bottom arrows are 'known hard problems', which cannot feasibly be solved with today's computing power. — The arrows along the top show how to produce each piece of data from the previous. The labels on the bottom arrows are ‘known hard problems’, which cannot feasibly be solved with today’s computing power.

Ethereum accounts take one of two forms – either they are ‘externally owned’ accounts, controlled with a private key (like all accounts in the bitcoin system), or ‘contracts’, controlled by the code that resides in the specific address in question. Contracts have immutable code stored at the contract address, and additional storage which can be read from and written to by the contract. An Ethereum transaction contains the destination address, optional data, the gas limit, the sequence number and signature authorising the transaction. If the destination address corresponds to a contract, the contract code is then executed, subject to the gas limit as specified in the transaction, which is used to allow a certain number of computational steps before halting.

The ability to form smart contracts suggests some quite specific methods of addressing the lack of privacy and corresponding potential lack of fungibility of coins in the Ethereum system. Although cryptocurrencies provide some privacy with the absence of identity related checks required to buy, mine, or spend coins, the full transaction history is public, enabling any motivated individual to track and link users’ purchases. This concept heavily decreases the fungibility of cryptocurrencies, allowing very revealing taint analysis of coins to be performed, and leading to suggestions of blacklisting coins which were once flagged as stolen.

With smart contracts, we can do magical things, starting in part 2

A Longitudinal Measurement Study of 4chan’s Politically Incorrect Forum and its Effect on the Web

The discussion board site 4chan has been a part of the Internet’s dark underbelly since its creation, in 2003, by ‘moot’ (Christopher Poole). But recent events have brought it under the spotlight, making it a central figure in the outlandish 2016 US election campaign, with its links to the “alt-right” movement and its rhetoric of hate and racism. However, although 4chan is increasingly “covered” by the mainstream media, we know little about how it actually operates and how instrumental it is in spreading hate on other social platforms. A new study, with colleagues at UCL, Telefonica, and University of Rome now sheds light on 4chan and in particular, on /pol/, the “politically incorrect” board.

What is 4chan anyway?

4chan is an imageboard site, built around a typical bulletin-board model. An “original poster” creates a new thread by making a post, with one single image attached, to a board with a particular focus of interest. Other users can reply, with or without images. Some of 4chan’s most important aspects are anonymity (there is no identity associated with posts) and ephemerality (inactive threads are routinely deleted).

4chan currently features 69 boards, split into 7 high level categories, e.g. Japanese Culture or Adult. In our study, we focused on the /pol/ board, whose declared intended purpose is “discussion of news, world events, political issues, and other related topics”. Arguably, there are two main characteristics of /pol/ threads. One is its racist connotation, with the not-so-unusual aggressive tone, offensive and derogatory language, and links to the “alt-right” movement—a segment of right-wing ideologies supporting Donald Trump and rejecting mainstream conservatism as well as immigration, multiculturalism, and political correctness. The other characteristic is the fact that it generates a substantial amount of original content and “online” culture, ranging from the “lolcats” memes to “pepe the frog.”

This figure below shows four examples of typical /pol/ threads:

/pol/ example threads — Examples of typical /pol/ threads. Thread (A) illustrates the derogatory use of “cuck” in response to a Bernie Sanders image, (B) a casual call for genocide with an image of a woman’s cleavage and a “humorous” response, (C) /pol/’s fears that a withdrawal of Hillary Clinton would guarantee Donald Trump’s loss, and (D) shows Kek the “god” of memes via which /pol/ believes influences reality.

Raids towards other services

Another aspect of /pol/ is its reputation for coordinating and organizing so-called “raids” on other social media platforms. Raids are somewhat similar to Distributed Denial of Service (DDoS) attacks, except that rather than aiming to interrupt the service at a network level, they attempt to disrupt the community by actively harassing users and/or taking over the conversation.

Continue reading A Longitudinal Measurement Study of 4chan’s Politically Incorrect Forum and its Effect on the Web

Understanding the Use of Leaked Webmail Credentials in the Wild

Online accounts enable us to store and access documents, make purchases, and connect to new friends, among many other capabilities. Even though online accounts are convenient to use, they also expose users to risks such as inadvertent disclosure of private information and fraud. In recent times, data breaches and subsequent exposure of users to attacks have become commonplace. For instance, over the last four years, account credentials of millions of users from Dropbox, Yahoo, and LinkedIn have been stolen in massive attacks conducted by cybercriminals.

After online accounts are compromised by cybercriminals, what happens to the accounts? In our paper, presented today at the 2016 ACM Internet Measurement Conference, we answer this question. To do so, we needed to monitor the compromised accounts. This is hard to do, since only large online service providers have access to data from such compromised accounts, for instance Google or Yahoo. As a result, there is sparse research literature on the use of compromised online accounts. To address this problem, we developed an infrastructure to monitor the activity of attackers on Gmail accounts. We did this to enable researchers to understand what happens to compromised webmail accounts in the wild, despite the lack of access to proprietary data on compromised accounts.

Cybercriminals usually sell the stolen credentials on the underground black market or use them privately, depending on the value of the compromised accounts. Such accounts can be used to send spam messages to other online online accounts, or to retrieve sensitive personal or corporate information from the accounts, among a myriad of malicious uses. In the case of compromised webmail accounts, it is not uncommon to find password reset links, financial information, and authentication credentials of other online accounts inside such webmail accounts. This makes webmail accounts particularly attractive to cybercriminals, since they often contain a lot of sensitive information that could potentially be used to compromise other accounts. For this reason, we focus on webmail accounts.

Our infrastructure works as follows. We embed scripts based on Google Apps Script in Gmail accounts, so that the accounts send notifications of activity to us. Such activity includes the opening of email messages, creation of email drafts, sending of email messages, and “starring” of email messages. We also record details of accesses including IP addresses, browser information, and access times of visitors to the accounts. Since we designed the Gmail accounts to lure cybercriminals to interact with them (in the sense of a honeypot system), we refer to the accounts as honey accounts.

To study webmail accounts stolen via malware, we also developed a malware sandbox infrastructure that executes information-stealing malware samples inside virtual machines (VMs). We supply honey credentials to the VMs, which drive web browsers and login to the honey accounts automatically. The login action triggers the malware in the VMs to steal and exfiltrate the honey credentials to Command-and-Control servers under the control of botmasters.

Continue reading Understanding the Use of Leaked Webmail Credentials in the Wild

Strong Customer Authentication in the Payment Services Directive 2

Within the European Union, since 2007, banks are regulated by the Payment Services Directive. This directive sets out which types of institutions can offer payment services, and what rules they must follow. Importantly for customers, these rules include in what circumstances a fraud victim is entitled to a refund. In 2015 the European Parliament adopted a substantial revision to the directive, the Payment Services Directive 2 (PSD2), and it will soon be implemented by EU member states. One of the major changes in PSD2 is the requirement for banks to implement Strong Customer Authentication (SCA) for transactions, more commonly known as two-factor authentication – authentication codes based on two or more elements selected from something only the user knows, something only the user possesses, and something the user is. Moreover, the authentication codes must be linked to the recipient and amount of the transaction, which the customer must be made aware of.

The PSD2 does not detail the requirements of Strong Customer Authentication, nor the permitted exemptions to this rule. Instead, these decisions are to be made by the European Banking Authority (EBA) through Regulatory Technical Standards (RTS). As part of the development of these technical standards the EBA opened an initial discussion, to which we submitted a response based on our research on the security usability of banking authentication. Based on the discussion, the EBA produced a consultation paper incorporating a set of draft technical standards. In our response to this consultation paper, included below, we detailed how research both on security usability and banking authentication more broadly should guide the assessment of Strong Customer Authentication. Specifically we point out that there is an incorrect assumption of an inherent tradeoff between security and usability, that for a system to be secure it must be usable, and that evaluation of Strong Customer Authentication systems should be independent, transparent, and follow principles developed from latest research.

False trade-off between security and usability

In the reasoning presented in the consultation paper there is an assumption that a trade-off must be made between security and usability, e.g. paragraph 6 “Finally, the objective of ensuring a high degree of security and safety would suggest that the [European Banking Authority’s] Technical Standards should be onerous in terms of authentication, whereas the objective of user-friendliness would suggest that the [Regulatory Technical Standards] should rather promote the competing aim of customer convenience, such as one-click payments.”

This security/usability trade-off is not inherent to Strong Customer Authentication (SCA), and in fact the opposite is more commonly true: in order for SCA to be secure it must also be usable “because if the security is usable, users will do the security tasks, rather than ignore or circumvent them”. Also, SCA that is usable will make it more likely that customers will detect fraud because they will not have to expend their limited attention on just performing the actions required to make the SCA work. A small subset (10–15%) of participants in some studies reasoned that the fact that a security mechanism required a lot of effort from them meant it was secure. But that is a misconception that must not be used as an excuse for effortful authentication procedures.

Continue reading Strong Customer Authentication in the Payment Services Directive 2

A Privacy Enhancing Architecture for Secure Wearable Devices

Why do we need security on wearable devices? The primary reason comes from the fact that, being in direct contact with the user, wearable devices have access to very private and sensitive user’s information more often than traditional technologies. The huge and increasing diversity of wearable technologies makes almost any kind of information at risk, going from medical records to personal habits and lifestyle. For that reason, when considering wearables, it is particularly important to introduce appropriate technologies to protect these data, and it is primary that both the user and the engineer are aware of the exact amount of collected information as well as the potential threats pending on the user’s privacy. Moreover, it has also to be considered that the privacy of the wearable’s user is not the only one at risk. In fact, more and more devices are not limited to record the user’s activity, but can also gather information about people standing around.

This blog post presents a flexible privacy enhancing system from its architecture to the prototyping level. The system takes advantage from anonymous credentials and is based on the protocols developed by M. Chase, S. Meiklejohn, and G. Zaverucha in Algebraic MACs and Keyed-Verification Anonymous Credentials. Three main entities are involved in this multi-purpose system: a main server, wearable devices and localisation beacons. In this multipurpose architecture, the server firstly issue some anonymous credentials to the wearables. Then, each time a wearable reach a particular physical location (gets close to a localisation beacon) where it desires to perform an action, it starts presenting its credentials in order to ask the server the execution of that a particular action. Both the design of the wearable and the server remain generic and scalable in order to encourage further enhancements and easy integration into real-world applications; i.e., the central server can manage an arbitrary number of devices, each device can posses an arbitrary number of credentials and the coverage area of the localisation system is arbitrarily extendable.

Architecture

The complete system’s architecture can be modelled as depicted in the figure below. Roughly speaking, a web interface is used to manage and display the device’s functions. Each user and admin access the system from that interface.

During the setup phase, the server issues the credentials to a selected device (according to the algorithms presented in Algebraic MACs and Keyed-Verification Anonymous Credentials) granting it a given privilege level. The credentials’ issuance is a short-range process. In fact, the wearable needs to be physically close to the server to allow the admins to physically verify, once and for all, the identity of the wearables’ users. In order to improve security and battery life, the wearable only communicates using extremely low-power and short-range radio waves (dotted line on the figure). The server beacons can be seen as continuities of the main server and have essentially two roles: the first is to operate as an interface between the wearables and the server, and the second is to act as a RF localisation system. Each time the wearable granted with enough privileges reaches some particular physical location (gets close to a localisation beacon), it starts presenting its credentials in order to prove to the server that it possesses credentials over some attributes (without revealing them), and that these credentials have been previously issued by the server itself. Note that the system preserves anonymity only if many users are involved (for each privilege level), but this is a classic requirement of anonymous systems. Finally, once the credentials have been successfully verified by the server, the server issues a signed request to an external entity (which can be, for instance, an automatic door, an alarm system or any compatible IoT entity) to perform the requested action.

Continue reading A Privacy Enhancing Architecture for Secure Wearable Devices

How to do Zero-Knowledge from Discrete-Logs in under 7kB

We recently had our annual conference for the Academic Centres of Excellence in the UK and I am proud that Jonathan Bootle from UCL won the PhD student elevator pitch competition. I’ll now hand over the rest of the blog post to Jonathan to explain the communication reduction technique for zero-knowledge arguments he presented.

— Jens Groth

In zero-knowledge arguments, a verifier wants to check that a prover possesses secret knowledge satisfying some stated conditions. The prover wants to convince the verifier without disclosing her secrets. Security is based on cryptographic assumptions, like the discrete logarithm assumption. In 1997, Cramer and Damgård produced a discrete log argument requiring the prover to send N pieces of data to the verifier, for a statement of size N. Groth improved this to √N in 2009 and for five years, this was thought to be the best possible. That is, until cryptographers at UCL, including myself, discovered an incredible trick, which shattered the previous record. So how does it work?

The first step is to compile the statement into the correct format for our protocol. Existing compilers take practical statements, such as verifying the correct execution of a C program, and convert them into a type of circuit. We then convert this circuit into a vector equation for the verifier to check. A circuit of size N produces a vector of size roughly N. So far, so good.

We also need use a commitment scheme, which works like a magician’s envelope. By committing to a hidden value, it can be fixed, and revealed later.

In previous works in the discrete logarithm setting, the prover commits to the vector using a single commitment.

Later the prover reveals the whole vector. This gives high communication costs, and was a major bottleneck.

We discovered a rare and extraordinary property of the commitment scheme which allowed us to cut a vector in half and compress the two halves together. Applying the same trick repeatedly produces a single value which is easy to send, and drastically reduces the communication costs of the protocol.

We begin with a vector of roughly N elements. If we repeatedly cut the vector in half, it takes log(N) cuts to produce a vector containing only a single element. Each time that the prover cuts a vector in half, they must make new commitments to enable the verifier to check the shorter vectors produced. Overall, roughly log(N) new commitments are produced, so the prover must send about log(N) values to the verifier. This is a dramatic improvement from √N. To put this in perspective, doubling the size of the statement only adds a constant amount of data for the prover to send.

Of course, it is possible to base security on other cryptographic assumptions, but other efficient protocols often use very strong and untested assumptions which have yet to receive a high level of scrutiny. By contrast, the discrete logarithm assumption has been used and studied for over three decades.

Is this just a theoretical advance? Far from it. Our protocol has a working Python implementation, which has been extensively benchmarked. By combining our code with the compilers that I mentioned earlier, we verified the correct execution of C programs. By sending under 7kB of data each time, we can verify matrix multiplications, simulations of the motion of gas molecules, and SHA-1 hashes. It’s nice to see theory that compiles and works.

The full paper is by Jonathan Bootle, Andrea Cerulli, Pyrros Chaidos, Jens Groth and Christophe Petit and was published at EUROCRYPT 2016.

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31