Basic flow reassembly algorithm:

After we've assembled a complete datagram out of the fragments in the
trace file, it goes for further processing.  The first stage is to
identify the stream to which the packet belongs.

The clever trick here is to define the A and B ends of a session in a
way which means that every packet in the session has the same stream
id (sid), but this is easy: just sort the fields.  In particular, if
you have the 5-tuple ((hosta,porta),(hostb,portb),protocol), you know
that (porta,hosta)<(portb,hostb) in a lexicographic ordering.

(For protocols which don't have port numbers, like ICMP, or for which
we can't parse enough of the header to find the port number, like ddp,
we define the port numbers to be 0.)

This means that we can easily and unambiguously map from packets to
stream ids, without having to search the flow table, and that if we
have to break a stream due to low memory (see later) we get two
streams with the same name (which should be relatively easy to spot)
rather than one stream called (a,b,proto) and one called (b,a,proto)
(which wouldn't be).
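
The sorting trick can be sketched in a few lines of Python.  This is
only an illustration: make_sid, StreamId and the use of strings for
hosts are names and choices made up for the example, not taken from
the real code.

```python
from typing import NamedTuple, Optional, Tuple

Endpoint = Tuple[int, str]  # (port, host), the ordering used in the text


class StreamId(NamedTuple):
    a: Endpoint      # the lexicographically smaller end
    b: Endpoint      # the lexicographically larger end
    protocol: str


def make_sid(src_host: str, src_port: Optional[int],
             dst_host: str, dst_port: Optional[int],
             protocol: str) -> StreamId:
    """Build a direction-independent stream id (hypothetical helper)."""
    # Protocols without ports, or with unparseable headers, use port 0.
    src = (src_port or 0, src_host)
    dst = (dst_port or 0, dst_host)
    # Sorting the two ends gives the same sid for both directions.
    a, b = sorted((src, dst))
    return StreamId(a, b, protocol)
```

Because the result is a plain tuple, it can be used directly as a
hash table key.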

Once we've found the sid, we can just look up the stream in a big hash
table.  The hash table lookup will also allocate a new stream if
there's nothing currently in the table which matches that sid; the
rules for what counts as a match therefore control when new
connections are reported (more later).

Once a flow has been found or created, the datagram is passed to the
state machine:

State machine actions:

-- Datagram has SYN set:
	 -- If it doesn't have ack set, then this direction is marked
	    as being client->server
	 -- The syn sequence number for this direction is set to
	    this packet's sequence number
	 -- The transmitted sequence number is set to this packet's
	    sequence number plus one.
-- Datagram contains data:
	 -- Transmitted sequence number is set to the packet sequence number
	    plus the number of bytes of data in the packet
-- Datagram has fin set:
	 -- Transmitted sequence number is set to the packet sequence number,
	    plus the number of bytes of data in the packet, plus 1.
	 -- fin sequence number is set to the new transmitted sequence
	    number
-- Datagram has ack set:
	 -- If the ack sequence number is equal to the other direction's
	    fin sequence number, set that this direction has acked a fin.
	 -- If both directions have acked fins, mark the connection as
	    being comatose.
-- Datagram has rst set:
	 -- Connection is marked as aborted
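
The actions above could be written roughly as follows.  This is a
sketch, not the real code: the class and field names are invented,
the wrap-around at 2**32 is an assumption based on TCP sequence
numbers being 32 bits wide, and the check that a fin has actually
been seen before comparing against it is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SEQ_MOD = 2 ** 32  # assumed: sequence numbers wrap at 32 bits


@dataclass
class Direction:
    client_to_server: bool = False
    syn_seq: Optional[int] = None
    tx_seq: Optional[int] = None
    fin_seq: Optional[int] = None
    acked_fin: bool = False


@dataclass
class Flow:
    dirs: List[Direction] = field(
        default_factory=lambda: [Direction(), Direction()])
    comatose: bool = False
    aborted: bool = False


@dataclass
class Segment:
    seq: int = 0
    ack_seq: int = 0
    data_len: int = 0
    syn: bool = False
    ack: bool = False
    fin: bool = False
    rst: bool = False


def update(flow: Flow, d: int, seg: Segment) -> None:
    """Apply one segment travelling in direction d (0 or 1)."""
    me, other = flow.dirs[d], flow.dirs[1 - d]
    if seg.syn:
        if not seg.ack:
            me.client_to_server = True
        me.syn_seq = seg.seq
        me.tx_seq = (seg.seq + 1) % SEQ_MOD
    if seg.data_len > 0:
        me.tx_seq = (seg.seq + seg.data_len) % SEQ_MOD
    if seg.fin:
        me.tx_seq = (seg.seq + seg.data_len + 1) % SEQ_MOD
        me.fin_seq = me.tx_seq
    if seg.ack:
        # Assumed guard: only compare once the other side has sent a fin.
        if other.fin_seq is not None and seg.ack_seq == other.fin_seq:
            me.acked_fin = True
        if me.acked_fin and other.acked_fin:
            flow.comatose = True
    if seg.rst:
        flow.aborted = True
```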


In addition, any packet which has SYN set but not ACK marks that
direction of the flow as being client to server; this is used when
writing out the senses file.  If we have a flow A/B/ts, then the table
looks like this:


                      A's c2s
                 0               1

B's c2s   0      unknown         client/server

          1      server/client   unknown
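
The table boils down to a two-flag lookup.  In the sketch below the
function name classify is invented, and reading "client/server" as
"A is the client, B is the server" is an assumption about what the
senses file means.

```python
def classify(a_c2s: bool, b_c2s: bool) -> str:
    """Return the sense of a flow from the two per-direction c2s flags."""
    if a_c2s and not b_c2s:
        return "client/server"   # assumed: A is client, B is server
    if b_c2s and not a_c2s:
        return "server/client"   # assumed: A is server, B is client
    # Neither end, or both ends, sent a SYN without ACK.
    return "unknown"
```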


Creating new connections
------------------------

A packet is considered to match a connection if the stream ids match, and:

-- If the connection is comatose, the packet must be either a fin
   with the same sequence number as the previous fin in that direction, or
   an ack which acknowledges a fin in the opposite direction, or a RST
   segment.

-- If the connection is aborted, the packet must not have syn set

-- If the connection is neither comatose nor aborted, then no further
   conditions are necessary.

If a packet matches a stream, then it gets added to that stream.  If
there is no such stream, a new one gets created.
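
A rough sketch of the matching rule and the find-or-create lookup
follows.  All names here (Seg, Conn, matches, find_or_create) are
invented for the example.  It assumes that the stored fin sequence
number is the transmitted sequence number after the fin, so a repeated
fin is recognised by seq + data length + 1, and that a connection which
is both comatose and aborted must pass both checks.  What happens to a
displaced connection is not covered here; the sketch simply replaces
it in the table.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Optional, Tuple


@dataclass
class Seg:
    seq: int = 0
    ack_seq: int = 0
    data_len: int = 0
    syn: bool = False
    ack: bool = False
    fin: bool = False
    rst: bool = False


@dataclass
class Conn:
    # fin sequence number per direction, None if no fin seen yet
    fin_seq: Tuple[Optional[int], Optional[int]] = (None, None)
    comatose: bool = False
    aborted: bool = False


def matches(conn: Conn, d: int, seg: Seg) -> bool:
    """Does a segment travelling in direction d belong to conn?"""
    if conn.comatose:
        mine, theirs = conn.fin_seq[d], conn.fin_seq[1 - d]
        end_seq = (seg.seq + seg.data_len + 1) % 2 ** 32
        repeat_fin = seg.fin and mine is not None and end_seq == mine
        acks_fin = (seg.ack and theirs is not None
                    and seg.ack_seq == theirs)
        if not (repeat_fin or acks_fin or seg.rst):
            return False
    if conn.aborted and seg.syn:
        return False
    return True


def find_or_create(table: Dict[Hashable, Conn], sid: Hashable,
                   d: int, seg: Seg) -> Conn:
    """Look up sid; allocate a new connection if nothing matches."""
    conn = table.get(sid)
    if conn is None or not matches(conn, d, seg):
        conn = Conn()
        table[sid] = conn  # any old entry is simply replaced here
    return conn
```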


There is a slight complication: we only have a finite amount of memory
for streams.  When memory starts running low, we discard the
longest-idle stream; if the stream was neither aborted nor comatose,
and had packets in both directions, we also write a message to the
logfile.  At the end of the run, the critical idle threshold is
printed: this is the idle time of the least-idle discarded stream.
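
The discard policy might look something like this.  It is a sketch
under several assumptions: a simple stream count (max_streams) stands
in for the real memory limit, the clock is whatever timestamp the
caller passes in as now, and the names Stream, StreamTable and
evict_if_needed are invented.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Hashable, Optional

log = logging.getLogger("reassembly")


@dataclass
class Stream:
    last_seen: float
    aborted: bool = False
    comatose: bool = False
    seen_a_to_b: bool = False
    seen_b_to_a: bool = False


class StreamTable:
    def __init__(self, max_streams: int) -> None:
        self.max_streams = max_streams          # stand-in for a memory cap
        self.streams: Dict[Hashable, Stream] = {}
        self.critical_idle: Optional[float] = None

    def evict_if_needed(self, now: float) -> None:
        """Discard longest-idle streams until we are under the limit."""
        while len(self.streams) > self.max_streams:
            sid = min(self.streams,
                      key=lambda s: self.streams[s].last_seen)
            victim = self.streams.pop(sid)
            idle = now - victim.last_seen
            live = not (victim.aborted or victim.comatose)
            if live and victim.seen_a_to_b and victim.seen_b_to_a:
                log.warning("discarded live stream %r, idle %.3fs",
                            sid, idle)
            # Critical idle threshold: idle time of the least-idle
            # stream discarded so far.
            if self.critical_idle is None or idle < self.critical_idle:
                self.critical_idle = idle
```

A linear scan for the oldest stream keeps the example short; anything
handling a large table would want an ordered structure instead.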


Other tricks
------------

You can change the amount of memory used for buffering by changing
max_buffer_mb.  This can safely be done at runtime by attaching with a
debugger if e.g. you find you've started thrashing three days into a
run.  Note that buffers are managed LRU, and so interactions with the
operating system's swap algorithm are likely to be very nasty.

Changing the buffer size should never affect the final results.

Bugs
----

The memory used for storing out-of-order fragments isn't properly
accounted for; if you get too many, you run out of memory and die.  It
should be fairly obvious when this happens.

