The increasing availability of location and mobility data enables a number of applications, e.g., enhanced navigation services and parking, context-based recommendations, or waiting time predictions at restaurants, which have great potential to improve the quality of life in modern cities. However, the large-scale collection of location data also raises privacy concerns, as mobility patterns may reveal sensitive attributes about users, e.g., home and work places, lifestyles, or even political or religious inclinations.
Service providers, i.e., companies with access to location data, often use aggregation as a privacy-preserving strategy when releasing these data to third parties for various analytics tasks. The idea is that, once users’ traces are grouped together, the data no longer contains the information needed to draw inferences about individuals such as the ones mentioned above, while it can still be used to obtain useful insights about crowds. For instance, Waze constructs aggregate traffic models to improve navigation within cities, while Uber provides aggregate data for urban planning purposes. Similarly, CityMapper’s Smart Ride application aims at identifying gaps in transportation networks based on traces and rankings collected by users’ mobile devices, while Telefonica monetizes aggregate location statistics through advertising as part of the Smart Steps project.
That’s great, right? Well, our paper, “Knock Knock, Who’s There? Membership Inference on Aggregate Location Data”, published at NDSS 2018, shows that aggregate location time-series can in fact be used to infer information about individual users. In particular, we demonstrate that aggregate locations are prone to a privacy attack known as membership inference, in which a malicious entity aims to identify whether a specific individual contributed her data to the aggregation. We demonstrate the feasibility of this type of attack in a proof-of-concept setting designed for an academic evaluation, and in a real-world setting where we apply membership inference attacks in the context of the Aircloak challenge, the first bounty program for anonymized data re-identification.
Membership Inference Attacks on Aggregate Location Time-Series
Our NDSS’18 paper studies membership inference attacks on aggregate location time-series indicating the number of people transiting in a certain area at a given time. That is, we show that an adversary with some “prior knowledge” about users’ movements is able to train a machine learning classifier and use it to infer the presence of a specific individual’s data in the aggregates.
We experiment with different types of prior knowledge. On the one hand, we simulate powerful adversaries that know the real locations of a subset of users in a database during the aggregation period (e.g., telco providers, which have location information about their clients). On the other hand, we simulate weaker adversaries that only know past statistics about user groups (i.e., reproducing a setting of continuous data release). Overall, we find that the adversary’s prior knowledge significantly influences the effectiveness of the attack.
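To make the attack more concrete, here is a minimal Python sketch of the stronger adversary described above, run on synthetic data. It is an illustration under stated assumptions, not the pipeline from the paper: the toy sizes, the trace generator, and all function names (make_trace, aggregate, sample_aggregate, run_attack) are hypothetical. The adversary is assumed to know the traces of every user in the toy population, and a simple nearest-centroid classifier stands in for whatever model one would actually train. The weaker adversary, which only knows past statistics, is not covered here.

```python
import random

# Toy sizes: all of these are illustrative assumptions, not values from the paper.
NUM_USERS, NUM_LOCATIONS, NUM_SLOTS = 30, 15, 12
GROUP_SIZE = 5   # number of users contributing to each aggregate
TARGET = 0       # user whose membership the adversary tries to infer


def make_trace(user, rng):
    """One location per time slot: mostly a 'home' location, sometimes random."""
    home = user % NUM_LOCATIONS
    return [home if rng.random() < 0.7 else rng.randrange(NUM_LOCATIONS)
            for _ in range(NUM_SLOTS)]


def aggregate(group, traces):
    """Flattened location x time count matrix for the users in `group`."""
    counts = [0] * (NUM_LOCATIONS * NUM_SLOTS)
    for user in group:
        for slot, loc in enumerate(traces[user]):
            counts[loc * NUM_SLOTS + slot] += 1
    return counts


def sample_aggregate(include_target, traces, rng):
    """Aggregate over a random group that does or does not contain TARGET."""
    others = [u for u in range(NUM_USERS) if u != TARGET]
    if include_target:
        group = [TARGET] + rng.sample(others, GROUP_SIZE - 1)
    else:
        group = rng.sample(others, GROUP_SIZE)
    return aggregate(group, traces)


def centroid(rows):
    """Per-feature mean of a list of equally long feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def run_attack(seed=0, n_train=200, n_test=200):
    rng = random.Random(seed)
    traces = [make_trace(u, rng) for u in range(NUM_USERS)]
    # Training: the adversary uses its prior knowledge (the known traces) to
    # build labelled aggregates with and without the target.
    c_in = centroid([sample_aggregate(True, traces, rng) for _ in range(n_train)])
    c_out = centroid([sample_aggregate(False, traces, rng) for _ in range(n_train)])
    # Testing: guess membership for fresh aggregates and score the guesses.
    correct = 0
    for i in range(n_test):
        is_member = i % 2 == 0
        x = sample_aggregate(is_member, traces, rng)
        guess = sq_dist(x, c_in) < sq_dist(x, c_out)
        correct += guess == is_member
    return correct / n_test


accuracy = run_attack()
print(f"membership inference accuracy: {accuracy:.2f}")
```

In this toy setup the groups are small and each user has a fairly distinctive routine, so the target's trace stands out in the counts. Larger groups or less regular users should make the two classes harder to separate, which is why the numbers printed here say nothing about real deployments.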