1 files changed, 100 insertions, 109 deletions
diff --git a/doc/spec/proposals/158-microdescriptors.txt b/doc/spec/proposals/158-microdescriptors.txt
index f478a3c83..e6966c0ce 100644
--- a/doc/spec/proposals/158-microdescriptors.txt
+++ b/doc/spec/proposals/158-microdescriptors.txt
@@ -1,11 +1,20 @@
 Filename: 158-microdescriptors.txt
 Title: Clients download consensus + microdescriptors
-Version: $Revision$
-Last-Modified: $Date$
 Author: Roger Dingledine
 Created: 17-Jan-2009
 Status: Open
 
+0. History
+
+  15 May 2009: Substantially revised based on discussions on or-dev
+  from late January.  Removed the notion of voting on how to choose
+  microdescriptors; made it just a function of the consensus method.
+  (This lets us avoid the possibility of "desynchronization.")
+  Added suggestion to use a new consensus flavor.  Specified use of
+  SHA256 for new hashes. -nickm
+
+  15 June 2009: Cleaned up based on comments from Roger. -nickm
+
 1. Overview
 
   This proposal replaces section 3.2 of proposal 141, which was
@@ -13,9 +22,7 @@ Status: Open
   circuit-building protocol to fetch a server descriptor inline at each
   circuit extend, we instead put all of the information that clients need
   either into the consensus itself, or into a new set of data about each
-  relay called a microdescriptor. The microdescriptor is a direct
-  transform from the relay descriptor, so relays don't even need to know
-  this is happening.
+  relay called a microdescriptor.
 
   Descriptor elements that are small and frequently changing should go
   in the consensus itself, and descriptor elements that are small and
@@ -24,6 +31,10 @@ Status: Open
   them, we'll need to resume considering some design like the one in
   proposal 141.
 
+  Note also that any descriptor element which clients need to use to
+  decide which servers to fetch info about, or which servers to fetch
+  info from, needs to stay in the consensus.
+
 2. Motivation
 
   See
@@ -36,99 +47,91 @@ Status: Open
 3. Design
 
   There are three pieces to the proposal. First, authorities will list in
-  their votes (and thus in the consensus) what relay descriptor elements
-  are included in the microdescriptor, and also list the expected hash
-  of microdescriptor for each relay. Second, directory mirrors will serve
-  microdescriptors. Third, clients will ask for them and cache them.
+  their votes (and thus in the consensus) the expected hash of the
+  microdescriptor for each relay. Second, authorities will serve
+  microdescriptors, and directory mirrors will cache and serve
+  them. Third, clients will ask for them and cache them.
 
 3.1. Consensus changes
 
-  V3 votes should include a new line:
-    microdescriptor-elements bar baz foo
-  listing each descriptor element (sorted alphabetically) that authority
-  included when it calculated its expected microdescriptor hashes.
+  If the authorities choose a consensus method of a given version or
+  later, a microdescriptor format is implicit in that version.
+  A microdescriptor should in every case be a pure function of the
+  router descriptor and the consensus method.
+
+  In votes, we need to include the hash of each expected microdescriptor
+  in the routerstatus section. I suggest a new "m" line for each stanza,
+  with the base64 of the SHA256 hash of the router's microdescriptor.
+
+  For every consensus method that an authority supports, it includes a
+  separate "m" line in each router section of its vote, containing:
+    "m" SP methods 1*(SP AlgorithmName "=" digest) NL
+  where methods is a comma-separated list of the consensus methods
+  that the authority believes will produce "digest".
 
-  We also need to include the hash of each expected microdescriptor in
-  the routerstatus section. I suggest a new "m" line for each stanza,
-  with the base64 of the hash of the elements that the authority voted
-  for above.
+  (As with base64 encoding of SHA1 hashes in consensuses, let's
+  omit the trailing =s.)
 
   The consensus microdescriptor-elements and "m" lines are then computed
   as described in Section 3.1.2 below.
 
-  I believe that means we need a new consensus-method "6" that knows
-  how to compute the microdescriptor-elements and add "m" lines.
+  (This means we need a new consensus-method that knows
+  how to compute the microdescriptor-elements and add "m" lines.)
 
-3.1.1. Descriptor elements to include for now
+  The microdescriptor consensus uses the directory-signature format from
+  proposal 162, with the "sha256" algorithm.
 
-  To start, the element list that authorities suggest should be
-    family onion-key
 
-  (Note that the or-dev posts above only mention onion-key, but if
-  we don't also include family then clients will never learn it. It
-  seemed like it should be relatively static, so putting it in the
-  microdescriptor is smarter than trying to fit it into the consensus.)
+3.1.1. Descriptor elements to include for now
 
-  We could imagine a config option "family,onion-key" so authorities
-  could change their voted preferences without needing to upgrade.
+  In the first version, the microdescriptor should contain the
+  onion-key element, and the family element from the router descriptor,
+  and the exit policy summary as currently specified in dir-spec.txt.
 
 3.1.2. Computing consensus for microdescriptor-elements and "m" lines
 
-  One approach is for the consensus microdescriptor-elements line to
-  include every element listed by a majority of authorities, sorted. The
-  problem here is that it will no longer be deterministic what the correct
-  hash for the "m" line should be. We could imagine telling the authority
-  to go look in its descriptor and produce the right hash itself, but
-  we don't want consensus calculation to be based on external data like
-  that. (Plus, the authority may not have the descriptor that everybody
-  else voted to use.)
-
-  The better approach is to take the exact set that has the most votes
-  (breaking ties by the set that has the most elements, and breaking
-  ties after that by whichever is alphabetically first). That will
-  increase the odds that we actually get a microdescriptor hash that
-  is both a) for the descriptor we're putting in the consensus, and b)
-  over the elements that we're declaring it should be for.
-
-  Then the "m" line for a given relay is the one that gets the most votes
-  from authorities that both a) voted for the microdescriptor-elements
-  line we're using, and b) voted for the descriptor we're using.
-
-  (If there's a tie, use the smaller hash. But really, if there are
-  multiple such votes and they differ about a microdescriptor, we caught
-  one of them lying or being buggy. We should log it to track down why.)
-
-  If there are no such votes, then we leave out the "m" line for that
-  relay. That means clients should avoid it for this time period. (As
-  an extension it could instead mean that clients should fetch the
-  descriptor and figure out its microdescriptor themselves. But let's
-  not get ahead of ourselves.)
-
-  It would be nice to have a more foolproof way to agree on what
-  microdescriptor hash each authority should vote for, so we can avoid
-  missing "m" lines. Just switching to a new consensus-method each time
-  we change the set of microdescriptor-elements won't help though, since
-  each authority will still have to decide what hash to vote for before
-  knowing what consensus-method will be used.
-
-  Here's one way we could do it. Each vote / consensus includes
-  the microdescriptor-elements that were used to compute the hashes,
-  and also a preferred-microdescriptor-elements set. If an authority
-  has a consensus from the previous period, then it should use the
-  consensus preferred-microdescriptor-elements when computing its votes
-  for microdescriptor-elements and the appropriate hashes in the upcoming
-  period. (If it has no previous consensus, then it just writes its
-  own preferences in both lines.)
-
-3.2. Directory mirrors serve microdescriptors
-
-  Directory mirrors should then read the microdescriptor-elements line
-  from the consensus, and learn how to answer requests. (Directory mirrors
-  continue to serve normal relay descriptors too, a) to serve old clients
-  and b) to be able to construct microdescriptors on the fly.)
-
-  The microdescriptors with hashes <D1>,<D2>,<D3> should be available at:
-    http://<hostname>/tor/micro/d/<D1>+<D2>+<D3>.z
+  When we are generating a consensus, we use whichever "m" line
+  unambiguously corresponds to the descriptor digest that will be
+  included in the consensus.
+
+  (If different votes have different microdescriptor digests for a
+  single <descriptor-digest, consensus-method> pair, then at least one
+  of the authorities is broken.  If this happens, the consensus should
+  contain whichever microdescriptor digest is most common.  If there is
+  no winner, we break ties in favor of the lexically earliest.
+  Either way, we should log a warning: there is definitely a bug.)
+
+  The "m" lines in a consensus contain only the digest, not a list of
+  consensus methods.
+
+3.1.3. A new flavor of consensus
+
+  Rather than inserting "m" lines in the current consensus format,
+  they should be included in a new consensus flavor (see proposal
+  162).
+
+  This flavor can safely omit descriptor digests.
+
+  When we implement this voting method, we can remove the exit policy
+  summaries from the current "ns" flavor of consensus, since no current
+  clients use them, and they take up about 5% of the compressed
+  consensus.
+
+  This new consensus flavor should be signed with the sha256 signature
+  format as documented in proposal 162.
+
+3.2. Directory mirrors fetch, cache, and serve microdescriptors
+
+  Directory mirrors should fetch, cache, and serve each microdescriptor
+  from the authorities.  (They need to continue to serve normal relay
+  descriptors too, to handle old clients.)
+
+  The microdescriptors with base64 hashes <D1>,<D2>,<D3> should be
+  available at:
+    http://<hostname>/tor/micro/d/<D1>-<D2>-<D3>.z
+  (We use base64 for size and for consistency with the consensus
+  format. We use -s instead of +s to separate these items, since
+  the + character is used in base64 encoding.)
 
   All the microdescriptors from the current consensus should also be
   available at:
@@ -136,24 +139,9 @@ Status: Open
   so a client that's bootstrapping doesn't need to send a 70KB URL just
   to name every microdescriptor it's looking for.
 
-  The format of a microdescriptor is the header line
-  "microdescriptor-header"
-  followed by each element (keyword and body), alphabetically. There's
-  no need to mention what hash it's for, since it's self-identifying:
-  you can hash the elements to learn this.
-
-  (Do we need a footer line to show that it's over, or is the next
-  microdescriptor line or EOF enough of a hint? A footer line wouldn't
-  hurt much. Also, no fair voting for the microdescriptor-element
-  "microdescriptor-header".)
-
+  Microdescriptors have no header or footer.
   The hash of the microdescriptor is simply the hash of the concatenated
-  elements -- not counting the header line or hypothetical footer line.
-  Unless you prefer that?
-
-  Is there a reasonable way to version these things? We could say that
-  the microdescriptor-header line can contain arguments which clients
-  must ignore if they don't understand them. Any better ways?
+  elements.
 
   Directory mirrors should check to make sure that the microdescriptors
   they're about to serve match the right hashes (either the hashes from
@@ -170,10 +158,14 @@ Status: Open
   When a client gets a new consensus, it looks to see if there are any
   microdescriptors it needs to learn. If it needs to learn more than
   some threshold of the microdescriptors (half?), it requests 'all',
-  else it requests only the missing ones.
+  else it requests only the missing ones.  Clients MAY try to
+  determine whether the upload bandwidth for listing the
+  microdescriptors they want is more or less than the download
+  bandwidth for the microdescriptors they do not want.
 
   Clients maintain a cache of microdescriptors along with metadata like
-  when it was last referenced by a consensus. They keep a microdescriptor
+  when it was last referenced by a consensus, and which identity key
+  it corresponds to.  They keep a microdescriptor
   until it hasn't been mentioned in any consensus for a week. Future
   clients might cache them for longer or shorter times.
 
@@ -190,18 +182,17 @@ Status: Open
   Another future option would be to fetch some of the microdescriptors
   anonymously (via a Tor circuit).
 
+  Another crazy option (Roger's phrasing) is to do decoy fetches as
+  well.
+
 4. Transition and deployment
 
   Phase one, the directory authorities should start voting on
-  microdescriptors and microdescriptor elements, and putting them in the
-  consensus. This should happen during the 0.2.1.x series, and should
-  be relatively easy to do.
+  microdescriptors, and putting them in the consensus.
 
   Phase two, directory mirrors should learn how to serve them, and learn
-  how to read the consensus to find out what they should be serving. This
-  phase could be done either in 0.2.1.x or early in 0.2.2.x, depending
-  on how messy it turns out to be and how quickly we get around to it.
+  how to read the consensus to find out what they should be serving.
 
   Phase three, clients should start fetching and caching them instead
-  of normal descriptors. This should happen post 0.2.1.x.
+  of normal descriptors.
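
Note on the revised text: the digest encoding (3.1), the tie-breaking rule
(3.1.2), and the download URL (3.2) can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the Tor
implementation. The function names are hypothetical, the microdescriptor bytes
are placeholders, and the sketch uses the standard base64 alphabet without
addressing how "/" or "+" inside a digest should be escaped in a URL.

```python
import base64
import hashlib
from collections import Counter
from typing import List, Optional


def microdesc_digest(microdesc: bytes) -> str:
    """Base64 of the SHA256 of the concatenated elements, trailing '=' omitted."""
    raw = hashlib.sha256(microdesc).digest()
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def pick_consensus_digest(voted: List[str]) -> Optional[str]:
    """Choose the digest for one <descriptor-digest, consensus-method> pair.

    The most common digest wins; ties go to the lexically earliest one.
    Returns None when no authority voted a digest for this pair.
    """
    if not voted:
        return None
    counts = Counter(voted)
    top = max(counts.values())
    return min(d for d, n in counts.items() if n == top)


def micro_url(hostname: str, digests: List[str]) -> str:
    """Build the request URL, joining digests with '-' (never '+')."""
    return "http://%s/tor/micro/d/%s.z" % (hostname, "-".join(digests))


if __name__ == "__main__":
    # Placeholder content; a real microdescriptor holds onion-key, family, etc.
    d = microdesc_digest(b"onion-key\n...placeholder...\n")
    print(d)
    print(pick_consensus_digest([d, d, "other"]))
    print(micro_url("example.com", [d]))
```

Because a SHA256 digest is 32 bytes, its base64 form always ends in exactly one
"=", so stripping the padding leaves a fixed 43-character string and loses no
information.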