On Location, Time, and Membership: Studying How Aggregate Location Data Can Harm Users’ Privacy

The increasing availability of location and mobility data enables a number of applications, e.g., enhanced navigation services and parking, context-based recommendations, or waiting time predictions at restaurants, which have great potential to improve the quality of life in modern cities. However, the large-scale collection of location data also raises privacy concerns, as mobility patterns may reveal sensitive attributes about users, e.g., home and work places, lifestyles, or even political or religious inclinations.

Service providers, i.e., companies with access to location data, often use aggregation as a privacy-preserving strategy when releasing these data to third parties for various analytics tasks. The idea is that, by grouping users’ traces together, the data no longer contains information that enables inferences about individuals (such as those mentioned above), while it can still be used to obtain useful insights about crowds. For instance, Waze constructs aggregate traffic models to improve navigation within cities, while Uber provides aggregate data for urban planning purposes. Similarly, CityMapper’s Smart Ride application aims to identify gaps in transportation networks based on traces and rankings collected from users’ mobile devices, while Telefonica monetizes aggregate location statistics through advertising as part of its Smart Steps project.

That’s great, right? Well, our paper, “Knock Knock, Who’s There? Membership Inference on Aggregate Location Data”, published at NDSS 2018, shows that aggregate location time-series can in fact be used to infer information about individual users. In particular, we demonstrate that aggregate locations are prone to a privacy attack known as membership inference: a malicious entity aims to identify whether a specific individual contributed her data to the aggregation. We demonstrate the feasibility of this type of privacy attack both in a proof-of-concept setting designed for an academic evaluation, and in a real-world setting where we apply membership inference attacks in the context of the Aircloak challenge, the first bounty program for anonymized data re-identification.

Membership Inference Attacks on Aggregate Location Time-Series

Our NDSS’18 paper studies membership inference attacks on aggregate location time-series indicating the number of people transiting in a certain area at a given time. That is, we show that an adversary with some “prior knowledge” about users’ movements is able to train a machine learning classifier and use it to infer the presence of a specific individual’s data in the aggregates.

We experiment with different types of prior knowledge. On the one hand, we simulate powerful adversaries that know the actual locations of a subset of users in the database during the aggregation period (e.g., telco providers that hold location information about their customers); on the other hand, we model weaker adversaries that only know past statistics about user groups (i.e., reproducing a setting of continuous data release). Overall, we find that the adversary’s prior knowledge significantly influences the effectiveness of the attack.
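To make the setting concrete, here is a minimal, self-contained sketch of how such an attack can be mounted with off-the-shelf tools. This is not the code or data used in the paper: the synthetic traces, the group sizes, the aggregation granularity, and the choice of a random forest classifier are all illustrative assumptions.

```python
# Illustrative sketch of a membership inference attack on aggregate location
# time-series (not the paper's code). Assumption: the adversary's prior
# knowledge is a set of per-user traces; the released data are counts of
# users per (region, epoch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_users, n_regions, n_epochs = 200, 10, 24

# Synthetic prior knowledge: traces[u, t] = region visited by user u at epoch t.
traces = rng.integers(0, n_regions, size=(n_users, n_epochs))

def aggregate(user_ids):
    """Aggregate the traces of the given users into per-(region, epoch) counts."""
    counts = np.zeros((n_regions, n_epochs))
    for u in user_ids:
        for t in range(n_epochs):
            counts[traces[u, t], t] += 1
    return counts.flatten()  # flattened time-series used as classifier features

target = 0                      # the user whose membership we want to infer
others = np.arange(1, n_users)

# Training set: aggregates over random groups that do ("in") or do not ("out")
# contain the target, labelled accordingly.
X, y = [], []
for _ in range(300):
    group_in = np.append(rng.choice(others, size=49, replace=False), target)
    group_out = rng.choice(others, size=50, replace=False)
    X.append(aggregate(group_in))
    y.append(1)
    X.append(aggregate(group_out))
    y.append(0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# The "released" aggregate under attack: did the target contribute to it?
released = aggregate(np.append(rng.choice(others, size=49, replace=False), target))
print("P(target in aggregate):", clf.predict_proba([released])[0, 1])
```

With real mobility data, the distinguishing power of such a classifier depends heavily on the adversary’s prior knowledge, which is exactly what the paper evaluates.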

Continue reading On Location, Time, and Membership: Studying How Aggregate Location Data Can Harm Users’ Privacy

Measuring and Modeling the Vivino Wine Social Network

Over the past few years, food and drink have become an essential part of our social media footprints. This shouldn’t come as a surprise – after all, eating and drinking were social activities long before the first #foodporn hashtag on Instagram. In fact, scientific studies have shown that what we gobble up or gulp down is shaped by social and regional influences, and that we tend to mirror the habits of people we share social connections with.

Nowadays, we have an unprecedented opportunity to study eating and drinking habits at scale, as people share more and more of them online, not only on popular social networks like Instagram, Twitter, and Facebook, but also on “dedicated” apps like Yummly or Untappd.

Along these lines is our paper, “Of Wines and Reviews: Measuring and Modeling the Vivino Wine Social Network,” recently presented at the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018) in Barcelona. The study – co-authored by former UCL undergraduate student Neema Kotonya, Italian wine journalist Paolo De Cristofaro, and UCL faculty member Emiliano De Cristofaro – presents a preliminary big-data and social network analysis of how users worldwide consume, rate, and review wines. We did so through the lens of Vivino, a popular wine social network. (And, yes, Paolo is Emiliano’s brother! 😊)

What is Vivino?

Vivino.com is an online community for wine enthusiasts, available both as a web and a mobile application. It was founded in 2009 by Heini Zachariassen, with his colleague Theis Sondergaard joining in 2010. In a nutshell, Vivino allows users to review and purchase wines through third-party vendors. The mobile app also provides a “wine scanner” functionality – i.e., users can upload pictures of wine labels and access reviews and details about the wine/winery.

But Vivino is actually a social network, as it allows wine enthusiasts to communicate with and follow each other, as well as share reviews and recommendations. As of September 2018, it had 32 million users, 9.7 million wines covering a multitude of wine styles, grapes, and geographical regions, as well as 103.7 million ratings and almost 35 million reviews.

Continue reading Measuring and Modeling the Vivino Wine Social Network

Stronger Password, Longer Lifetime: Studying UCL’s password policy

In October 2016, UCL’s Information Services Division (ISD) implemented a new password policy to encourage users to choose stronger passwords. The policy links password lifetime (the time before the password expires) to password strength: The stronger the password, the longer the lifetime.

We (Ingolf Becker, Simon Parkin and M. Angela Sasse) decided to collaborate with the Information Services Division to study the effect of this policy change, and the results were published at USENIX Security this week. We find that users appreciate the choice and respond to the policy by choosing stronger passwords when they change them. Even after 16 months, the mean password lifetime at UCL continues to increase, yet stronger passwords also lead to more password resets.

The new policy

In the new policy, passwords with Shannon Information Entropy of 50 bits receive a lifetime of 100 days, and passwords with 120 bits receive a lifetime of 350 days:

Password expiry by entropy

Additionally, the new policy penalises the lifetime of passwords containing words from a large dictionary.
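To give a feel for the numbers, here is a small sketch of the mapping. It assumes the policy interpolates linearly between the two published points and clamps lifetimes to that range; this is our reading of the figure above rather than ISD’s actual implementation, which also applies the dictionary penalty.

```python
# Sketch of the entropy-to-lifetime mapping, ASSUMING linear interpolation
# between the two published points (50 bits -> 100 days, 120 bits -> 350 days).
# The real ISD policy may differ, e.g. in clamping or the dictionary penalty.

def lifetime_days(entropy_bits: float) -> float:
    slope = (350 - 100) / (120 - 50)      # = 25/7, roughly 3.6 extra days per bit
    return max(100.0, min(350.0, 100 + (entropy_bits - 50) * slope))

def bits_for_lifetime(days: float) -> float:
    """Inverse mapping: entropy corresponding to a given lifetime."""
    return 50 + (days - 100) * (120 - 50) / (350 - 100)

print(lifetime_days(50), lifetime_days(120))            # 100.0 350.0
# The rise from 145 to 170 mean days reported below corresponds to roughly
# 7 bits of extra entropy under this linear assumption:
print(bits_for_lifetime(170) - bits_for_lifetime(145))  # ~7.0
```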

Users play the game

We analysed the password lifetime – which we will refer to from here on as the ‘password strength’ – of all password change and reset events of all pseudonymised users at UCL. The following figure shows the mean password expiration of all users over time, smoothed by a 31-day moving average:

Password expiration over time for all users and new users.

A small drop in password strength was observed between November ’16 and February ’17, as users were moved onto, and gradually became accustomed to, the new system; the kinds of passwords they had been used to choosing were at that point no longer earning them as many days as before (hence the drop). After February ’17, the mean strength increases from 145 days to 170 days over 12 months – an increase of 6.9 bits of entropy. This strongly suggests that users have gradually adapted to the new password policy, and eventually make use of the ability to increase password lifetime by lengthening and strengthening their passwords.

Continue reading Stronger Password, Longer Lifetime: Studying UCL’s password policy

What can infosec learn from strategic theory?

Antonio Roque, of MIT Lincoln Labs, has posted some provocative papers to arXiv over the last year. These include one on cybersecurity meta-methodology and one on making predictions in cybersecurity. These papers ask some good questions. The one I want to focus on in this short space is what cybersecurity can learn from Carl von Clausewitz’s treatise On War.

This might seem a bit odd to modern computer scientists, but I think it’s a plausible question. Cybersecurity is about winning conflicts, at least sometimes. And as I and others have written, one of the interesting challenges of generating knowledge with a science of security is the fact that we have active adversaries. As Roque tells us, generating knowledge in the face of adversaries is also one of the things On War is about.

One important question for me is whether Clausewitz interestingly presaged our current problems (and has since been overtaken), or whether On War makes contributions to thinking about cybersecurity that are new and comparable to those from the fields of economics, mathematics, philosophy of science, etc. After a close reading of these papers, my stance is: I have more questions that need answers.

Continue reading What can infosec learn from strategic theory?

An untapped resource to reproduce studies

Science is generally accepted to operate by conducting specially-designed structured observations (such as experiments and case studies) and then interpreting the results to build generalised knowledge (sometimes called theories or models). An important, nay necessary, feature of the social operation of science is transparency in the design, conduct, and interpretation of these structured observations. We’re going to work from the view that security research is science just like any other, though of course as its own discipline it has its own tools, topics, and challenges. This means that studies in security should be replicable, reproducible, or at least able to be corroborated. Spring and Hatleback argue that transparency is just as important for computer science as it is for experimental biology. Rossow et al. also persuasively argue that transparency is a key feature for malware research in particular. But how can we judge whether a paper is transparent enough? The natural answer would seem to be if it is possible to make a replication attempt from the materials and information in the paper. Forget how often the replications succeed for now, although we know that there are publication biases and other factors that mess with that.

So how many security papers published in major conferences contain enough information to attempt a reproduction? In short, we don’t know. As anecdotal evidence, Jono and a couple of students looked through the IEEE S&P 2012 proceedings in 2013, and the results were pretty grim. But heroic effort from a few interested parties is not a sustainable answer to this question. We’re here to propose a slightly more robust solution: master’s students in security should attempt to reproduce published papers as their capstone thesis work. This has several benefits, and several challenges. In the following we hope to convince you that the challenges can be mitigated and the benefits are worth it.

This should be a choice, but one that master’s students should want to make. If anyone has a great new idea to pursue, they should be encouraged to do so. However, here in the UK, the dissertation process is compressed into the summer and there’s not always time to prototype and pilot study designs. Selecting a paper to reproduce, with a documented methodology in place, lets the student get to work faster. There is still a start-up cost; students will likely have to read several abstracts to shortlist a few workable papers, and then read these few papers in detail to select a good candidate. But learning to read, shortlist, and study academic papers is an important skill that all master’s students should be attempting to, well, master. This style of project would provide them with an opportunity to practice these skills.

Briefly, let’s be clear what we mean by reproduction of published work. Reproduction isn’t just one thing: there’s reproduce and replicate and corroborate and controlled variation (see Feitelson for details). Not everything is amenable to reproduction. For example, case studies (such as attack papers) or natural experiments are often interesting because they are unique. Corroborating some aspect of the case may be possible with a new study, and such a study is also valuable. But this is not the sort of reproduction we have in mind to advocate here.

Continue reading An untapped resource to reproduce studies

Managing conflicts between ethical principles and job duties

Despite its international context, discussion of the social implications of technology is surprisingly parochial. For example, the idea that individuals should have control over how their data is used is considered radical and innovative in the US, despite it being commonly accepted in Europe since the early 1980s. The same applies to including professional and ethical training as part of computer science curricula – while a recent move in many US institutions, it has been mandatory for BCS-accredited courses in the UK for as long as I can remember. One lesson that comes from the UK’s experience here, and that I think would help institutions following its lead, is that making students aware of ethics is not enough to protect society and individuals. There also need to be strong codes of conduct, built on ethical principles, which practitioners are expected to follow.

For most computer science practitioners in the UK, the codes of conduct of relevance are those of the field’s professional bodies – the BCS and the IET. They say roughly what you might expect – do a good job, follow instructions, avoid conflicts of interest, and consider the public interest. I’ve always found these a bit unsatisfactory, as they treat ethical decisions as the uncontroversial product of applying consistent rules of professional conduct. These rules, however, don’t help with reality, where practitioners face decisions in which every option comes at substantial personal or financial cost, where rules are inconsistent with each other and with ethical principles, all while there is substantial uncertainty as to the consequences of their actions.

That’s why I am pleased to see that the ACM ethical code released today goes some way towards acknowledging the complex interaction between technology and society, and provides tools to help practitioners navigate the challenges. In particular, it gives some guidance on a topic I have long felt is sorely lacking in the BCS and IET codes – what to do when instructions from your employer conflict with the public interest. At best, the BCS and IET codes are silent on how to handle such situations – if anything, the BCS code puts more emphasis on acting “in accordance” with employer instructions than on requiring that members only “have due regard” for the public interest. In contrast, the ACM code is clear “that the public good is the paramount consideration”.

The ACM code also is clear that ethical practices are the responsibility of all. Management should enact rules that require ethical practices – they “should pursue clearly defined organizational policies that are consistent with the Code and effectively communicate them to relevant stakeholders. In addition, leaders should encourage and reward compliance with those policies, and take appropriate action when policies are violated.” But also, the code puts the duty on employees, through individual or collective action, to follow ethical practices even if management has not discharged their duty – “rules that are judged unethical should be challenged”.

Courses of action discussed in the ACM code are not limited to challenging rules; they also include actively disrupting unethical practices – “consider challenging the rule through existing channels before violating the rule. A computing professional who decides to violate a rule because it is unethical, or for any other reason, must consider potential consequences and accept responsibility for that action”.

One specific example of such disruptive action is whistleblowing, which the code recognizes as a legitimate course of action in the right circumstances – “if leaders do not act to curtail or mitigate such risks, it may be necessary to ‘blow the whistle’ to reduce potential harm”. However, my one disappointment in the code is that such disclosures are restricted to being made only through the “appropriate authorities” even though such authorities are often ineffective at instituting organizational change or protecting whistleblowers.

Implementing ethical policies is not without cost, and when doing so runs against business opportunities, profit often wins. It is nevertheless helpful that the code suggests that “in cases where misuse or harm are predictable or unavoidable, the best option may be to not implement the system”. The UK banks currently saying they can’t prevent push-payment fraud, resulting in life-changing losses to their customers, would do well to consider this principle. The current situation, where customers are held liable despite taking a normal level of care, is not an ethical practice.

Overall, I think this code is helpful and I am impressed at the breadth and depth of thought that clearly went into it. The code is also timely, as practitioners are now discovering their power to disrupt unethical practices through collective action and could take advantage of being given the permission to do so. The next task will be how to support and encourage the adoption of ethical principles and counteract the powerful forces that run into conflict with their practice.

JavaCard: The execution environment you didn’t know you were using

This is the story of the most popular execution environment, its shortcomings, and how open source and hacking saved the day.

According to recent revelations, the MINIX operating system is embedded in the Management Engine of all Intel CPUs released after 2015. A side-effect of this is that MINIX became known as the most widespread operating system in the world almost overnight. However, in the last decade another tiny OS has silently pushed itself into even more devices around the world while remaining unknown to most: JavaCard.

Your SIM card, credit cards, and loyalty cards are all most likely JavaCards.

With more than six billion JavaCards sold last year, and approximately 20 billion estimated to have been purchased in total, JavaCard is the winner no one knows about. The execution environment was designed in 1996 for devices with limited memory and processing capabilities and was the first smartcard platform to give developers the ability to execute the same applet on cards produced by different manufacturers. This was a breakthrough for the industry that established JavaCard as the default platform for applications in need of a secure, tamper-proof element.

*While JavaCard is technically not an OS in the standard sense (smart cards do have their own proprietary OSes), in practice it provides very similar functionality to modern embedded OSes. For instance, unless granted permission by the vendor, app developers are strictly confined to the JavaCard runtime environment; this distinguishes JavaCard from, e.g., the classic Java VM, where developers can also execute native applications in addition to those run in the JVM.

A malfunctioning operating system

An operating system is much more than the sum of its source code: it’s also the ecosystem built around it, including the specifications, support, and most importantly the application developers and the user community.

For JavaCard, the specification part is handled formally by Oracle and the Java Card Forum, who make periodic releases of the platform’s virtual machine (JCVM) specification, runtime library (JCRL), and application programming interface (JC API). All of these contribute towards homogeneity between cards from different manufacturers, aiming to ensure applet interoperability and a minimum level of support for basic cryptographic algorithms – at least in theory. In practice, our research has shown that no product on the market implements the JC API completely, and different cards support different sets of features. This severely hinders the interoperability of applications and constrains developers to the limited subset of JavaCard features supported by most manufacturers. Developers who choose to use all the features provided by a specific card will, with high likelihood, sacrifice the interoperability of their applets. Furthermore, some of those specifications are also inadvertently limiting the scope of the platform. For instance, the API specification, which lists all the calls potentially available to smart card applications, ended up acting as an evolutionary bottleneck. This is due to:

  1. Approximately three-year-long API revision cycles that severely delay support for newer cryptographic algorithms (the last API revision was released in 2015).
  2. More complex cryptographic operations (especially asymmetric cryptography) requiring the design and production of dedicated hardware accelerators before newly added cryptographic algorithms can actually be supported.
  3. A business model geared towards large corporations: newer cards are available only to those buying in large quantities, while smaller development houses and researchers are forced to work with five-year-old, or even older, cards.

Continue reading JavaCard: The execution environment you didn’t know you were using

Will new UK rules reduce the harm of push-payment fraud?

On Friday’s Rip Off Britain I’ll be talking about new attempts by UK banks to prevent fraud, and the upcoming scheme for reimbursing the victims. While these developments have the potential to better protect customers, the changes could equally leave customers in a more vulnerable situation than before. What will decide between these two extremes is how well designed the rules surrounding these new schemes turn out to be.

The beginning of this story is September 2016, when the consumer association – Which? – submitted a super-complaint to the UK Payment System Regulator (PSR) regarding push payment fraud – where a customer is tricked into transferring money into a criminal’s account. Such bank transfers are known as push payments because they are initiated by the bank sending the money, as opposed to pull payments, like credit and debit cards, where it is the receiving bank that starts the process. Banks claim that since the customer was involved in the process, they “authorised” the transaction, and so under UK and EU law, the customer is not entitled to a refund. I’ve argued that this interpretation doesn’t match any reasonable definition of the word “authorised” but nevertheless the term “authorised push payment scams” seems to have stuck as the commonly used terminology for this type of fraud, I’m sure much to the banks’ delight.

The Which? super-complaint asked for banks to be held liable for such frauds, and so reimburse the victims unless the bank can demonstrate the customer has acted with gross negligence. Which? argued that this approach would protect the customers from a fraud that exists as a consequence of bank design decisions, and provides banks with both a short-term incentive to prevent frauds that they can stop, as well as a medium-to-long term incentive for the banks to enhance payment systems to be resistant to fraud. The response from the PSR was disappointing, recognising that banks should do more, but rejecting the recommendation to hold banks liable for this fraud and requesting only that the banks collect more data. Nevertheless, the data collected proved useful in understanding the scale of the problem – £236 million stolen from over 42,000 victims in 2017, with banks only being able to recover 26% of the losses. This revelation led to Parliament asking difficult questions of the PSR.

The PSR’s alternative to holding banks liable for push payment fraud is for victims to be reimbursed if they can demonstrate they have acted with an appropriate level of care and that the bank has not. The precise definition of each level of care was the subject of a consultation, and will now be decided by a steering group consisting of representatives of the banking industry and consumers. In my response to this consultation, I explained my reasons for recommending that banks be liable for fraud, including that fairly deciding whether customers met a level of care is a process fraught with difficulties. This is particularly the case due to the inequality in power between a bank and its customer, and because taking a banking dispute to court is ruinously expensive for most people, since the option of customers spreading the cost through collective actions was removed from the Financial Services Act. More generally, banks – as the designers of payment systems, with real-world understanding of their use – have the greatest capacity to mitigate the risks these systems introduce.

Nevertheless, if the rules for the reimbursement scheme are set up well, it would be a substantial improvement over the current situation. On the other hand, if the process is bad then it could entrench the worst of current practices. Because the PSR has decided that reimbursement should depend on compliance to a level of care, my response also included what should be the process for defining these levels, and for adjudicating disputes.

Continue reading Will new UK rules reduce the harm of push-payment fraud?

Attack papers are case studies

We should treat attack papers like case studies: when we read them, review them, use them for evidence, and learn from them. This claim is not derogatory. Case studies are useful. But like anything, to be useful case studies need to be done and used appropriately.

Let’s be clear what I mean by attack paper. Any paper that reports how to attack some system. Any paper that includes details of an exploit, discloses a vulnerability, or demonstrates a proof-of-concept for breaching the security of a system. The efail paper that Steven discussed recently is an example. Security conferences are full of these; the ratio of attack papers to total papers varies per conference. USENIX Security tends to contain a fair few.

Let’s be clear what I mean by case study. I mean a scientific report that details a specific occurrence of interest as observed by the author. Case studies can be active, and include interviews or other questioning. They can be solely passive observation. Case studies can follow just one case in isolation, or might follow a series of related cases in similar ways for comparison. Case studies usually do not involve a planned intervention by the observer, otherwise we start to call them experiments. But they may track changes as the result of interventions outside the observer’s control.

What might change if we think about attack papers as case studies? We can apply our scientific experience from other disciplines. I’ve argued before that security is a science. We need to adapt scientific techniques, and other sciences might learn from what we do in security. But we need to be in a dialogue there. Calling attack papers what they are opens up this dialogue in several ways.

Continue reading Attack papers are case studies

Security code AutoFill: is this new iOS feature a security risk for online banking?

A new feature for iPhones in iOS 12 – Security Code AutoFill – is supposed to improve the usability of Two Factor Authentication but could place users at risk of falling victim to online banking fraud.

Two Factor Authentication (2FA), which is often referred to as Two Step Verification, is an essential element of many security systems, especially those that are online and accessed remotely. In most cases, it provides extended security by checking whether the user has access to a device. In SMS-based 2FA, for example, a user registers their phone number with an online service. When this service sees a login attempt for the corresponding user account, it sends a One Time Password (OTP), e.g. four to six digits, to the registered phone number. The legitimate user receives this code and can quote it during the login process, but an impersonator cannot.
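To illustrate the protocol, here is a minimal sketch of what the service side of SMS-based 2FA might look like. It is illustrative only: `sms_gateway_send` is a hypothetical stand-in for whatever SMS provider a service uses, and a real deployment would add rate limiting, retry limits, and persistent, secure storage of pending codes.

```python
# Minimal sketch of server-side SMS-based 2FA (illustrative only).
import hmac
import secrets
import time

_pending = {}  # username -> (otp, expiry); a real service uses a proper datastore

def sms_gateway_send(phone_number, message):
    """Hypothetical stand-in for an SMS provider's API; prints instead of sending."""
    print(f"[SMS to {phone_number}] {message}")

def send_otp(username, phone_number, ttl_seconds=300):
    otp = f"{secrets.randbelow(10**6):06d}"           # random 6-digit OTP
    _pending[username] = (otp, time.time() + ttl_seconds)
    sms_gateway_send(phone_number, f"Your login code is {otp}")

def verify_otp(username, submitted):
    otp, expiry = _pending.pop(username, (None, 0))
    if otp is None or time.time() > expiry:
        return False                                  # no code pending, or expired
    return hmac.compare_digest(otp, submitted)        # constant-time comparison
```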

In a recent development announced at its developer conference WWDC18, Apple is set to automate this last step to improve the user experience of 2FA, with a new feature to be introduced in iOS 12. The Security Code AutoFill feature, currently available to developers in a beta version, allows the mobile device to scan incoming SMS messages for such codes and suggest them at the top of the default keyboard.

Description of new iOS 12 Security Code AutoFill feature (source: Apple)
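As a rough illustration of what “scanning incoming SMS messages for such codes” involves (this is our own sketch, not Apple’s implementation, which is not public), a heuristic along the following lines can pull a candidate code out of a message body:

```python
# Illustrative heuristic for extracting a security code from an SMS body
# (our own sketch; Apple's actual parsing logic is not public).
import re
from typing import Optional

def extract_security_code(sms_body: str) -> Optional[str]:
    # Look for a 4-8 digit number near words that usually announce an OTP.
    match = re.search(r"(?:code|passcode|password|OTP)\D{0,20}(\d{4,8})",
                      sms_body, flags=re.IGNORECASE)
    return match.group(1) if match else None

print(extract_security_code("Your login code is 123456"))  # -> 123456
```

Note that a heuristic like this has no notion of what the extracted code is actually for, which is where the potential risk for online banking, flagged above, comes in.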

Currently, these SMS codes rely on the user actively switching apps and memorising the code, which can take a couple of seconds. Some users resort to alternative strategies, such as memorising the code from the preview banner and hastily typing it in. Apple’s new iOS feature will require only a single tap from the user. This will make the login process faster and less error-prone, a significant improvement to the usability of 2FA. It could also translate into an increased uptake of 2FA among iPhone users.

Example of Security Code AutoFill feature in operation on iPhone (source: Apple)

If users synchronise SMS with their MacBook or iMac, the existing Text Message Forwarding feature will push codes from their iPhone and enable Security Code AutoFill in Safari.

Example of Security Code AutoFill feature synchronised with macOS Mojave (source: Apple)

Reducing friction in user interaction to improve technology uptake among new users, and to increase usability and satisfaction for existing users, is not a new concept. It has not only been discussed at length in academia but is also a common goal within industry, e.g. in banking. This is evident in how the financial and payment industry has encouraged contactless (Near Field Communication – NFC) payments, which make transactions below a certain threshold much quicker than traditional Chip and PIN payments.

Continue reading Security code AutoFill: is this new iOS feature a security risk for online banking?