Exploiting Transport-Level Characteristics of Spam

Robert Beverly and Karen Sollins.
Proceedings of the Fifth Conference on Email and Anti-Spam (CEAS 2008),
Mountain View, CA, August 2008.

We present a spam detection technique that relies on neither content nor reputation analysis. Instead, this work investigates the discriminatory power of the email TCP packet stream. From a corpus of packet flows and their corresponding messages, we extract per-email \emph{transport-layer} features. While legitimate mail traffic is well-behaved, we observe small congestion windows, retransmissions, loss and large latencies in spam flows. To identify the most selective flow properties, thereby adapting to different networks and users, we build ``SpamFlow.'' On our data, SpamFlow achieves greater than 90\% classification accuracy while correctly identifying 78\% of the false negatives from a popular content filter. By capitalizing on spam's fundamental requirement to source large quantities of mail, often from resource-constrained hosts and networks, SpamFlow promises a unique and difficult-to-subvert complement to existing spam defenses.
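
To illustrate what per-email transport-layer features might look like, the following minimal Python sketch derives three of them from the segments of a single SMTP flow: a handshake round-trip estimate, a retransmission count, and the largest inter-packet gap. The `Packet` record, the function name `flow_features`, its parameters, and the choice of these three features are assumptions made for this example; they are not the feature set or implementation used by SpamFlow. The retransmission test is deliberately simplified and ignores sequence-number wraparound.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    """One TCP segment received from the remote mail host (hypothetical record)."""
    timestamp: float   # seconds, relative to the start of the capture
    seq: int           # sequence number of the first payload byte
    payload_len: int   # number of payload bytes in the segment


def flow_features(packets: List[Packet],
                  synack_sent: float,
                  ack_received: float) -> Dict[str, float]:
    """Summarize one flow as a small dictionary of transport-layer features.

    Assumption: a data segment that starts below the highest byte already
    seen is counted as a retransmission (wraparound is ignored).
    """
    retransmissions = 0
    highest_end = None   # one past the highest payload byte seen so far
    max_gap = 0.0        # longest pause between consecutive segments
    prev_time = None

    for pkt in sorted(packets, key=lambda p: p.timestamp):
        if pkt.payload_len > 0:
            end = pkt.seq + pkt.payload_len
            if highest_end is not None and pkt.seq < highest_end:
                retransmissions += 1
            highest_end = end if highest_end is None else max(highest_end, end)
        if prev_time is not None:
            max_gap = max(max_gap, pkt.timestamp - prev_time)
        prev_time = pkt.timestamp

    return {
        # Time between our SYN-ACK and the sender's ACK approximates the RTT.
        "handshake_rtt": ack_received - synack_sent,
        "retransmissions": float(retransmissions),
        "max_idle_gap": max_gap,
    }
```

Feature dictionaries of this kind, one per message, could then be fed to any standard supervised classifier together with spam or non-spam labels to learn which flow properties are most selective on a given network.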

