Internet-wide scanning (or probing) has emerged as a key measurement technique to study a diverse set of the Internet’s properties, including address space utilization, host reachability, topology, service availability, vulnerabilities, and service discrimination. But despite its widespread use and critical importance for Internet measurement, we still lack a clear understanding of IP liveness—whether a target IP address responds to a probe packet. What type of probe packets should we send if we, for example, want to maximize the responding host population? What type of responses can we expect and which factors determine such responses? What degree of consistency can we expect when probing the same host with different probe packets?
In our recent paper Scanning the Internet for Liveness, we presented a systematic analysis of liveness and how it shows up in active scanning campaigns. We developed a taxonomy of liveness which we employed to develop a method to perform concurrent IPv4 scans using ICMP, five TCP-based, and two UDP-based protocols, capturing all responses to our probes. Our key findings are:
- Responsive host populations are highly sensitive to the choice of probe. While ICMP discovers the highest number of raw IPs, our TCP and UDP measurements exclusively contribute a fifth to the total population of responsive hosts.
- Collecting ICMP Error messages for TCP and UDP scans increases the responsive population by more than 13%, and provides new opportunities to interpret scan results.
- At the transport layer, our concurrent measurements reveal that the majority of hosts exhibit inconsistent behaviour when probed on different ports, and that capturing negative responses significantly improves scanning completeness.
- Our concurrent scans allow us to identify nearly 2M tarpits (IPs masquerading as fake hosts) that would bias measurements that do not take them into account.
- Our study of cross-protocol liveness shows that responsiveness for some protocols is correlated, suggesting that the same seed set of responsive IP addresses can be potentially used to bootstrap multiple highly-correlated target populations to reduce scan traffic.
This work recently appeared in the April 2018 issue of ACM SIGCOMM Computer Communication Review (CCR), and was conducted in collaboration with Philipp Richter (MIT), Mobin Javed (LUMS Pakistan, ICSI Berkeley), Srikanth Sundaresan (Princeton University), Zakir Durumeric (Stanford University), Steven J. Murdoch (University College London), Richard Mortier (University of Cambridge) and Vern Paxson (UC Berkeley, ICSI Berkeley). Overall, this study yields practical insights and methodological improvements for the design and the execution of active Internet measurement studies. We released the code and data of this work as open source to allow for reproducibility of the results, and to enable further research.