TARA - Comparable Flows from Live Data

Before attempting any correlation among flows, we first sought to understand how the number of active flows in our trace data that share common source and destination end points varies over time. The trace we analyze here, 20030509-180820.eth1.tr, contains an hour's worth of LCS egress traffic. With all source addresses fixed to a common end point (LCS), we observe the active flow count as we vary the IP prefix mask used to aggregate destination points. As we increase the mask length, each prefix covers fewer destination addresses, so we expect fewer active flows sharing common end points. Figures 1 and 2 show active flow counts versus time for /8 and /12 prefix masks.
Figure 1. Total number of active flows as a function of time and /8 prefix.
Figure 2. Total number of active flows as a function of time and /12 prefix.
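
The aggregation described above can be sketched as follows. This is a minimal illustration, not the analysis code used to produce the figures: the packet record layout (timestamp, destination address, flow identifier), the function names, and the choice to call a flow "active" in a time bin when it has at least one packet in that bin are all assumptions made for the example.

```python
import ipaddress
from collections import defaultdict


def prefix_key(dst_ip, mask_len):
    """Return the destination network of dst_ip truncated to mask_len bits."""
    # strict=False lets us pass a host address and have the host bits zeroed.
    return str(ipaddress.ip_network(f"{dst_ip}/{mask_len}", strict=False))


def active_flows_per_bin(packets, mask_len, bin_seconds=60):
    """Count distinct flows per time bin, grouped by destination prefix.

    packets: iterable of (timestamp_seconds, dst_ip, flow_id) tuples
             (hypothetical record layout, not the trace file format).
    Returns {bin_index: {prefix: number_of_distinct_flow_ids}}.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for ts, dst_ip, flow_id in packets:
        time_bin = int(ts // bin_seconds)
        seen[time_bin][prefix_key(dst_ip, mask_len)].add(flow_id)
    return {
        time_bin: {prefix: len(ids) for prefix, ids in prefixes.items()}
        for time_bin, prefixes in seen.items()
    }


# Two flows that share a /8 destination prefix but not a /12 prefix.
sample = [
    (0.5, "10.1.2.3", "a"),
    (10.0, "10.200.0.1", "b"),
    (70.0, "10.1.2.3", "a"),
]
by_slash8 = active_flows_per_bin(sample, 8)
by_slash12 = active_flows_per_bin(sample, 12)
```

In the sample, both flows fall under 10.0.0.0/8 in the first bin, while under a /12 mask they separate into 10.0.0.0/12 and 10.192.0.0/12, which mirrors the expected drop in flows sharing common end points as the mask grows.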

In practice, once a flow expires, we want to run our flow rate correlation routine between that flow and every other flow. For this analysis, we expire flows that have been inactive for 60 seconds or more. We further refine our criteria for a potential flow rate comparison by considering only flows that contain at least 20 overlapping packets in the same time period. Figures 3 and 4 illustrate the number of pair-wise flow rate comparisons possible over time. Note that the large spike of comparisons at the end of each plot represents the long-lived flows being expired once the trace capture ends.
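
The expiry and comparability criteria can be sketched as below. The 60-second timeout and the 20-packet threshold come from the text; everything else is an assumption for illustration, in particular the reading of "overlapping packets" as each flow having at least 20 packets inside the interval where the two flows' lifetimes intersect, and the function names themselves.

```python
from bisect import bisect_left, bisect_right

IDLE_TIMEOUT = 60.0        # seconds of inactivity before a flow expires (from the text)
MIN_OVERLAP_PACKETS = 20   # minimum packets in the shared time period (from the text)


def is_expired(last_packet_ts, now, idle_timeout=IDLE_TIMEOUT):
    """A flow expires once it has been idle for idle_timeout seconds or more."""
    return now - last_packet_ts >= idle_timeout


def packets_in_window(timestamps, start, end):
    """Count packets with start <= timestamp <= end; timestamps must be sorted."""
    return bisect_right(timestamps, end) - bisect_left(timestamps, start)


def comparable(ts_a, ts_b, min_packets=MIN_OVERLAP_PACKETS):
    """Assumed test: both flows have min_packets packets in their shared interval.

    ts_a, ts_b: sorted lists of packet timestamps for the two flows.
    """
    if not ts_a or not ts_b:
        return False
    start = max(ts_a[0], ts_b[0])
    end = min(ts_a[-1], ts_b[-1])
    if start > end:
        return False  # the flows never coexist in time
    return (packets_in_window(ts_a, start, end) >= min_packets
            and packets_in_window(ts_b, start, end) >= min_packets)


# One packet per second: flow_a spans 0..29 s, flow_b 10..39 s, flow_c 15..44 s.
flow_a = [float(t) for t in range(30)]
flow_b = [float(t) for t in range(10, 40)]
flow_c = [float(t) for t in range(15, 45)]
```

Here flow_a and flow_b share the interval from 10 s to 29 s with 20 packets each, so they qualify; flow_a and flow_c share only 15 packets each and do not. A prefix match at the chosen mask length would be applied in addition to this test.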


Figure 3. Percentage of comparable pairs out of all N^2 pair-wise flow comparisons in 60-second intervals.
Comparable flows have at least 20 packets in the same time period and match IP prefixes to a specified precision.
Figure 4. Number of flows from pair-wise set that are comparable in each 60 second expiration interval.
Figure 5. Scatter plot of pair-wise correlation results.
Figure 6. CDF comparison of flow rate correlations for different prefix masks (log x scale)