Robust Classifier for TCP/IP Fingerprinting

Previous measurement studies of Internet packet header traces demonstrate the wealth of information, such as traffic characteristics and host behavior, available to a passive monitor. In this work we use simple probabilistic learning methods to perform a maximum-likelihood inference of a host's operating system from packet headers. Drawing upon previous TCP/IP ``fingerprinting'' techniques \cite{p0f}, we exploit the subtle differences in network stack implementations that uniquely identify each operating system. Whereas previous tools rely on an exact match against an exhaustive list of TCP settings, we develop a naive Bayesian classifier that provides a continuous degree of identification confidence without deep-packet inspection. Rule-based approaches fail to identify as many as 5\% of the hosts in traces we collected from an Internet exchange point, likely because users modify their TCP parameters or employ stack ``scrubbers'' \cite{smart-scrub}. In contrast, our classifier can intelligently disambiguate under uncertainty.
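
The decision rule described above can be sketched as follows. The notation is an assumption introduced here for illustration, not taken from the paper: let $O$ be the set of candidate operating systems and $f_1, \dots, f_n$ the header fields observed for a host (hypothetical examples are the initial TTL, the advertised window size, and the don't-fragment bit). A naive Bayesian classifier treats the fields as conditionally independent given the operating system, so the maximum-likelihood estimate is
\[
  \hat{o} = \arg\max_{o \in O} \prod_{i=1}^{n} P(f_i \mid o).
\]
One plausible way to obtain a continuous confidence, again an assumption rather than the paper's stated definition, is to normalize the winning likelihood over all candidates, which equals the posterior probability under a uniform prior:
\[
  C(\hat{o}) = \frac{\prod_{i=1}^{n} P(f_i \mid \hat{o})}{\sum_{o' \in O} \prod_{i=1}^{n} P(f_i \mid o')}.
\]
Under this sketch, a single altered field lowers the score of the true operating system rather than eliminating it, whereas an exact-match rule would report no match at all.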

As an application of our classifier, we improve upon previous approaches \cite{smb-technique} to inferring the frequency and behavior of Internet clients behind NAT devices. Understanding the prevalence of hosts masquerading behind NAT devices has important implications \cite{RFC2993,RFC3235,RFC3027} for the Internet address registries, developers of end-to-end applications, and designers of next-generation protocols.

Our results are set to appear at PAM 2004.
The paper and presentation are available now.
