Jono and I recently presented our joint paper with Simon and Angela at the Learning from Authoritative Security Experiment Results (LASER) Workshop in San Jose, CA, USA. The workshop was co-located with the IEEE Security and Privacy Symposium. LASER has a different focus each year; in 2016, presented papers explored new approaches to computer security experiments that are repeatable and can be shared across communities.
Through our LASER paper, “Towards robust experimental design for user studies in security and privacy”, we wanted to advance the quest for better experiment design and execution. We proposed the following five principles for conducting robust experiments into usable security and privacy:
- Give participants a primary task
- Ensure participants experience realistic risk
- Avoid priming the participants
- Perform experiments double-blind whenever possible
- Define these elements precisely: threat model, security, privacy, and usability
Understanding users and their interaction with security is a blind spot for many security practitioners and designers. Learning from prior studies within and outside our research group, we have defined principles for conducting robust experiments into usable security and privacy. These principles are informed by efforts in other fields such as biology, qualitative research methods and medicine, and are guided by four overarching experiment-design factors:
Internal validity – The experiment is of “suitable scope to achieve the reported results” and is not “susceptible to systematic error”.
External validity – The result of the experiment “is not solely an artifact of the laboratory setting”.
Containment – There are no “confounds” in the results, and no experimental “effects are a threat to safety” of the participants, the environment, or society generally.
Transparency – “There are no explanatory gaps in the experimental mechanism” and the explanatory “diagram for the experimental mechanism is complete”, in that it covers all relevant entities and activities.
The following user-centred principles then emerged:
Primary task
In real life, people use computers to accomplish some task, be it sending an email, making a purchase or searching for information; security is a secondary task, subordinate to that main purpose. By giving participants a primary task in a study, we put them in a realistic situation. A primary task also adds external validity in another way: human attention and other mental resources are bounded, and exceeding those bounds can affect personal and corporate security.
The body of empirical evidence based on studies with true primary tasks is still small, but growing. For instance, participants have been invited to use online services, such as a tool for summarising academic articles or a Web search engine. Some experiments into the behavioural economics of privacy have used shopping as the primary task: participants have been invited to buy goods as diverse as digital cameras, cinema tickets, DVDs or gourmet food from online retailers.
Realistic risk
Like the primary task, realistic risk is part of the design principle of providing a realistic task environment for study participants: the potential for real consequences is part of a realistic experience of security. Participants are under “realistic risk” when they perceive there is a threat to the security and privacy of their own information.
In a study we conducted, participants brought their own laptops to the laboratory; if they downloaded a file despite a security warning, it could potentially have infected their own computer with a virus. In interviews afterwards, a few participants stressed that if they had to download something from an untrusted source, they would do it on a public shared computer so as not to put their own machine in jeopardy. Similarly, in a phishing study by Schechter et al., participants who used their own credentials for online banking exhibited less risky behaviour than those who were provided with fake credentials to use.
Avoidance of priming
Priming is exposing participants to information that might influence how they subsequently behave in the experiment. Not priming participants helps avoid biases such as demand characteristics, where a participant gives answers based on what they believe the experimenter expects of them. Avoiding priming is an issue of internal validity, but also of containment if the researcher comes into possession of personal or otherwise sensitive information. It can be achieved simply by not telling participants much about the purpose of the study; approaches range from keeping the study description general to actively lying to participants. A common way to avoid priming is to deceive participants about the actual purpose of the study. Deception has been used in our field of research; Egelman et al. advocate deception in security and privacy user studies to produce generalisable findings. In one of our studies, we told participants we were examining a summary tool for academic papers when in reality we were studying their reactions to download warnings.
Double-blind
In a double-blind experiment, neither the participant nor the person running the experiment knows the details of the study; this limits the capacity for either party to influence the outcomes through knowledge of the study design itself. The approach is traditionally used in medicine, where the person administering the treatment is not told whether a patient is receiving an active medicine or a placebo. In this way, the designers of a medical trial hope to avoid a situation where an experimenter administering medicine treats the subject differently or influences the results in any other way. Double-blind experiment design can improve internal validity and containment by preventing accidental transmission of biases, research goals or sensitive information between the researcher and the participants.
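To make this concrete, here is a minimal sketch (ours, not from the paper) of how condition assignments might be blinded in practice: the study designer generates opaque session codes, the sheet given to the experimenter contains only those codes, and the mapping from code to condition stays sealed until analysis. The condition names, sample size and file names below are illustrative assumptions.

```python
import csv
import random
import secrets

# Hypothetical sketch: blinding condition assignments so the session
# experimenter never sees which condition a participant is in.
CONDITIONS = ["warning_design_A", "warning_design_B"]  # assumed example conditions
N_PARTICIPANTS = 40  # assumed sample size (divisible by the number of conditions)

def blinded_assignments(conditions, n):
    """Balance participants across conditions and pair each with an opaque session code."""
    rng = random.SystemRandom()
    pool = conditions * (n // len(conditions))  # balanced allocation
    rng.shuffle(pool)
    return [
        (participant_id, secrets.token_hex(4), condition)
        for participant_id, condition in enumerate(pool, start=1)
    ]

assignments = blinded_assignments(CONDITIONS, N_PARTICIPANTS)

# Sheet handed to the experimenter: participant IDs and session codes only.
with open("experimenter_sheet.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["participant_id", "session_code"])
    writer.writerows((pid, code) for pid, code, _ in assignments)

# Sealed key kept by the study designer, opened only at analysis time.
with open("condition_key.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["session_code", "condition"])
    writer.writerows((code, cond) for _, code, cond in assignments)
```

The design choice here is simply separation of knowledge: whoever interacts with participants works from the first file, and the code-to-condition key is only consulted once data collection is complete.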
To the best of our knowledge, a double-blind procedure has been used only once so far in usable security and privacy research. Malheiros et al. studied individuals’ willingness to disclose information in a credit card application. They employed three undergraduate psychology students to conduct the experimental sessions. The students were told that the study was exploring individuals’ willingness to disclose different types of information on a loan application for an actual loan provider; in reality, the study was looking at participants’ privacy perceptions.
Define: threat model, security, privacy, usability
There are two important ways in which the researcher must carefully attend to how meaning is assigned to terms: during explanation and during execution. Firstly, terms must have precise and well-defined meanings when articulating the design, protocol and results of an experiment to colleagues; secondly, and more subtly, the researcher should be careful not to bias participants by priming them with definitions provided during the course of a study.
The terms we find to be most commonly impacted by definitional problems are threat model, security, privacy, and usability. These words are central to all research in the field, so it is both unsurprising and troubling that the terms are hard to define.
Often the definitions themselves are what we are studying, because security and privacy mean different things to different people. The gap between how security architects and users define security is demonstrated by an example study on CAPTCHAs. In our study on CAPTCHA alternatives, we asked participants to make purchases on a ticket-selling website, and part of the check-out process was to solve a CAPTCHA. After the purchase, participants were interviewed about their experience. In the security community, the security of CAPTCHAs is considered in terms of their being solvable by humans and not by robots; this protects the system from automated attacks that could, for example, make the service unavailable to actual users. In the study, when participants mentioned security they did not speak about how well the CAPTCHAs protected the service; instead, they worried about the security of their own accounts and personal data.
The format of each LASER session is a 10-minute presentation followed by a 45-minute discussion. Our session prompted plenty of discussion amongst attendees. We wish to extend the spirit of the session and of LASER: if you know of experiments which support – or even contradict – these principles for experiments in security and privacy usability, we encourage you to provide details in the comments section.
Further details can be found in the full text of our LASER paper.