\chapter{Message Flow Networks}
\label{messageflow}

The network flow-based trust metrics presented in previous chapters
measure the flow of an abstract quantity through a trust graph, and
use the presence of sufficient flow to a target node to decide whether
that node should be accepted. In this chapter, we consider flow
networks in which messages themselves are the unit of flow.

Such networks are especially useful for spam-resistant messaging.
Indeed, using a trust metric for access control is vulnerable to
attacks involving huge volumes of spam messages. The goal of a message
flow network is to tightly bound the {\em volume} of spam messages
which may be succesfully delivered by bad nodes.

\section{The basic message flow design}

\subsection{Assumptions}

We assume a trust network with similar properties as in previous
chapters. Each node in the network represents a participant in the
overall system. Each user manually enters {\em trust edges} into the
graph. Each edge is annotated with a message volume associated with
the peer.

If a user $s$ wants to receive no more than $k$ messages per time unit
from peer $t$, then $s$ enters a trust edge pointing from $t$ to $s$,
annotated with $k$ units of flow. Note that the direction of this
trust edge is reversed with respect to the trust metric presented in
Chapter~\ref{groupmetric}.

We also assume a central server which knows the entire trust graph,
and also keeps track of dynamic information based on the actual flow
of messages sent through the system.

\subsection{Message flow}

In particular, the server maintains a {\em residual capacity} for each
trust edge in the network, named in analogy to the standard algorithm
for computing maximum network flows. However, instead of attempting to
saturate the graph with the maximal number of augmenting paths,
residual capacity is consumed only when messages are actually sent.

In more detail, when a node $x$ wishes to send a message to node $s$,
the server attempts to find a path from $x$ to $s$ through the
residual capacity graph. If no such path exists (in other words, if
there exists a partition of the graph with $x$ on one side and $s$ on
the other in which all trust edges have zero residual capacity), then
the message is blocked. Otherwise, the message is sent, and the
residual capacity for each edge along the path is decremented by one.

As usual in computing augmenting paths, it's a good idea to use the
heuristic of choosing a shortest path through the residual graph,
thereby minimizing total capacity consumed for an individual message.

\subsection{Repleneshing the flow}

Edge capacity is measured in units of number of messages per unit of
time. In addition to consuming capacity when messages are sent, there
must also be a mechanism for replenishing this capacity, so that the
network is roughly in equilibrium when there is a steady stream of
messages.

There are a number of approaches to this repleneshing. Perhaps the
simplest is to replenish {\em all} residual capacities to match the
maximum capacity specified in the trust graph, once every time unit.
However, this approach has a number of disadvantages, including an
uneven probability of rejecting messages over time, depending on the
fractional time since the last increment of the time unit.

It would also be desirable to accommodate bursty traffic patterns.  If
a user specifies seven messages per week from another peer, it would
be unusual to expect messages sent at intervals of exactly 24
hours. Rather, it's more likely that several days would pass without a
message, interspersed with bursts of several messages in quick
succession.

Thus, we propose a computationally simple and efficient method
modelling a process by which flow is replenished with exponential
decay. In addition to storing a scalar for each edge representing the
residual capacity, the server also stores a timestamp representing the
last moment that the capacity was updated.

Then, each time the capacity is queried or modified, it is updated
according to the following simple algorithm:

\[
\begin{array}{l}
cap = cap + (max - cap) \cdot (1 - \exp (timestamp - now))\\
timestamp = now
\end{array}
\]

It should be clear that, over a period of one time unit, the total
flow across such an edge will never exceed the maximum specified. At
the same time, burstiness is penalized, but only somewhat. For
example, if messages are sent in bursts once every time unit,
interspersed by inactivity, then the total flow is $1-e^{-1} \approx
.632$ of the maximum specified.

\section{Security Analysis}

The goal of the message flow network is to deliver ``good'' messages
successfully, while blocking ``bad'' messages.  Thus, the most logical
analysys of the success of such a network is to measure the number of
messages in each category.

Following the treatment in previous chapters (TODO: cite), we
partition the network into nodes labelled ``good'' and ``bad'' nodes.
Intuitively, these notions correspond to legitimate users and spammers
or other attackers, respectively, but from a mathematical point of
view, the partition is entirely arbitrary.

Given this framework, the first security result is appealing in both
its strength and its simplicity:

\begin{theorem}
The total flow of messages from bad nodes to good nodes
is bounded by the sum of capacities of edges from bad nodes to good.
\end{theorem}

The proof of this assertion is nearly trivial; it is essentially a
restatement of the fundamental property of network flows. (TODO: in
fairness, this is the multi-source, multi-sink case. Let's cite a
source)

Of course, a network which blocks bad messages is worthless unless it
also passes good ones. The analysis here is a bit trickier, because we
are measuring the \emph{potential} for one good node to send a message
to another. It is not valid to assume that good nodes are always
trying to saturate the network with messages to other good nodes, in
contrast to the familiar reality of spammers (bad nodes) trying to
send messages.


(TODO: cite social networking literature here?)

Outline: partition network into good \& bad.


Total message flow across partition is bounded by total capacity of
edges across the partition.

Consider two cases of partition: 

1. There is one bad node. Then he can't send more spam than the total
cap of edges from good nodes linking in.

2. There is one good node. Then you can't receive more spam than
the total cap of edges to you. Have a heuristic that favors short
paths when multiple messages are competing for scarce flow. Then,
hopefully all the good mail from nearby neighbors gets through, and at
worst the spammer blocks only mail from good peers farther away.

\section{Pragmatics}

The design of this chapter is unimplemented as of this writing.
However, it should be straightforward to implement as a centralized
web service.

Obviously, having a centralized server is limits both scaling and
security (all users are vulnerable to an attack on the central
server). A completely decentralized peer-to-peer network implementing
the core concepts of the message flow network presented above is the
topic of the next chapter.

\section{Application Notes}

Outline: email is killer app for this.

A similar app is posting comments and backlinks in blogs. Since \#
nodes is so much smaller, a centralized server may be appropriate.

Don't need to actually send messages through net. Don't need to
replace SMTP. You can just send tokens {\em authorizing} the sending
of a message. Then, the sending email client can query the server for
such a token, and insert it into the headers of the mail. The
recepient email client extracts the token from the headers, and
verifies it.