-rw-r--r-- | doc/TODO | 1
-rw-r--r-- | doc/tor-design.tex | 92
2 files changed, 46 insertions, 47 deletions
@@ -5,6 +5,7 @@ rename ACI to CircID rotate tls-level connections -- make new ones, expire old ones. dirserver shouldn't put you in running-routers list if you haven't uploading a descriptor recently +look at having smallcells and largecells Legend: SPEC!! - Not specified diff --git a/doc/tor-design.tex b/doc/tor-design.tex index 0e75b1bac..466a28a50 100644 --- a/doc/tor-design.tex +++ b/doc/tor-design.tex @@ -52,7 +52,7 @@ \begin{abstract} We present Tor, a circuit-based low-latency anonymous communication system. Tor is the successor to Onion Routing -and addresses many limitations in the original Onion Routing design. +and addresses various limitations in the original Onion Routing design. Tor works in a real-world Internet environment, requires no special privileges such as root- or kernel-level access, requires little synchronization or coordination between nodes, and @@ -388,7 +388,8 @@ they avoid the well-known inefficiencies of tunneling TCP over TCP Distributed-trust anonymizing systems need to prevent attackers from adding too many servers and thus compromising too many user paths. -Tor relies on a centrally maintained set of well-known servers. Tarzan +Tor relies on a small set of well-known servers to make +decisions about which nodes can join. Tarzan and MorphMix allow unknown users to run servers, and limit an attacker from becoming too much of the network based on a limited resource such as number of IPs controlled. Crowds suggests requiring written, notarized @@ -440,13 +441,13 @@ so that it can serve as a test-bed for future research in low-latency anonymity systems. Many of the open problems in low-latency anonymity networks, such as generating dummy traffic or preventing Sybil attacks \cite{sybil}, may be solvable independently from the issues solved by -Tor. Hopefully future systems will not need to reinvent Tor's design -decisions. (But note that while a flexible design benefits researchers, +Tor. 
Hopefully future systems will not need to reinvent Tor's design. +(But note that while a flexible design benefits researchers, there is a danger that differing choices of extensions will make users distinguishable. Experiments should be run on a separate network.) -\textbf{Conservative design:} The protocol's design and security -parameters must be conservative. Additional features impose implementation +\textbf{Simple design:} The protocol's design and security +parameters must be well-understood. Additional features impose implementation and complexity costs; adding unproven techniques to the design threatens deployability, readability, and ease of security analysis. Tor aims to deploy a simple and stable system that integrates the best well-understood @@ -454,14 +455,15 @@ approaches to protecting anonymity. \SubSection{Non-goals} \label{subsec:non-goals} -In favoring conservative, deployable designs, we have explicitly deferred +In favoring simple, deployable designs, we have explicitly deferred a number of goals, either because they are solved elsewhere, or because they are an open research question. \textbf{Not Peer-to-peer:} Tarzan and MorphMix aim to scale to completely decentralized peer-to-peer environments with thousands of short-lived servers, many of which may be controlled by an adversary. This approach -is appealing, but still has many open problems. +is appealing, but still has many open problems +\cite{tarzan:ccs02,morphmix:fc04}. \textbf{Not secure against end-to-end attacks:} Tor does not claim to provide a definitive solution to end-to-end timing or intersection @@ -522,9 +524,10 @@ network and correlating traffic entering and leaving the network---either because of relationships in packet timing; relationships in the volume of data sent; or relationships in any externally visible user-selected options. 
The adversary can also mount active attacks by compromising -routers or keys; by replaying traffic; by selectively DoSing trustworthy -routers to encourage users to send their traffic through compromised -routers, or DoSing users to see if the traffic elsewhere in the +routers or keys; by replaying traffic; by selectively denying service +to trustworthy routers to encourage users to send their traffic through +compromised routers, or denying service to users to see if the traffic +elsewhere in the network stops; or by introducing patterns into traffic that can later be detected. The adversary might attack the directory servers to give users differing views of network state. Additionally, he can try to decrease @@ -587,8 +590,10 @@ fairness issues. % I think we should describe connections before cells. -NM Traffic passes from one OR to another, or between a user's OP and an OR, -in fixed-size cells. Each cell is 256 -bytes, and consists of a header and a payload. The header includes an +in fixed-size cells. Each cell is 256 bytes (but see +Section~\ref{sec:conclusion} +for a discussion of allowing large cells and small cells on the same +network), and consists of a header and a payload. The header includes an anonymous circuit identifier (ACI) that specifies which circuit the % Should we replace ACI with circID ? What is this 'anonymous circuit' % thing anyway? -RD @@ -611,7 +616,8 @@ be multiplexed over a circuit); an end-to-end checksum for integrity checking; the length of the relay payload; and a relay command. 
Relay commands can be one of: \emph{relay data} (for data flowing down the stream), \emph{relay begin} (to open a -stream), \emph{relay end} (to close a stream), \emph{relay connected} +stream), \emph{relay end} (to close a stream cleanly), \emph{relay +teardown} (to close a broken stream), \emph{relay connected} (to notify the OP that a relay begin has succeeded), \emph{relay extend} and \emph{relay extended} (to extend the circuit by a hop, and to acknowledge), \emph{relay truncate} and \emph{relay truncated} @@ -621,9 +627,6 @@ implement long-range dummies). We describe each of these cell types in more detail below. -% Nick: should there have been a table here? -RD -% Maybe. -NM - \SubSection{Circuits and streams} \label{subsec:circuits} @@ -638,8 +641,9 @@ open many TCP streams. In Tor, each circuit can be shared by many TCP streams. To avoid delays, users construct circuits preemptively. To limit linkability among the streams, users rotate connections by building a new circuit -periodically (currently every minute) if the previous one has been -used, and expire old used circuits that are no longer in use. Thus +periodically if the previous one has been used, +and expire old used circuits that are no longer in use. Tor considers +making a new circuit once a minute: thus even heavy users spend a negligible amount of time and CPU in building circuits, but only a limited number of requests can be linked to each other by a given exit node. Also, because circuits are built @@ -745,25 +749,25 @@ applications like Mozilla and ssh have this flaw. In the case of Mozilla, we're fine: the filtering web proxy called Privoxy does the SOCKS call safely, and Mozilla talks to Privoxy safely. But a -portable general solution, such as for ssh, is an open problem. We could +portable general solution, such as for ssh, is an open problem. We can modify the local nameserver, but this approach is invasive, brittle, and -not portable. 
We could encourage the resolver library to do resolution +not portable. We can encourage the resolver library to do resolution via TCP rather than UDP, but this approach is hard to do right, and also -has portability problems. Our current answer is to encourage the use of -privacy-aware proxies like Privoxy wherever possible, and also provide -a tool similar to \emph{dig} that can do a private lookup through the -Tor network. +has portability problems. We can provide a tool similar to \emph{dig} that +can do a private lookup through the Tor network. Our current answer is to +encourage the use of privacy-aware proxies like Privoxy wherever possible, Ending a Tor stream is analogous to ending a TCP stream: it uses a two-step handshake for normal operation, or a one-step handshake for errors. If one side of the stream closes abnormally, that node simply sends a relay teardown cell, and tears down the stream. If one side -% Nick: mention relay teardown in 'cell' subsec? good enough name? -RD of the stream closes the connection normally, that node sends a relay end cell down the circuit. When the other side has sent back its own relay end, the stream can be torn down. This two-step handshake allows for TCP-based applications that, for example, close a socket for writing -but are still willing to read. +but are still willing to read. Remember that all relay cells use layered +encryption, so only the destination OR knows what type of relay cell +it is. \SubSection{Integrity checking on streams} @@ -815,6 +819,7 @@ that Alice or Bob tear down the circuit if they receive a bad hash. Volunteers are generally more willing to run services that can limit their bandwidth usage. To accomodate them, Tor servers use a token bucket approach to limit the number of bytes they +% XXX cite token bucket? receive. Tokens are added to the bucket each second (when the bucket is full, new tokens are discarded.) 
Each token represents permission to receive one byte from the network---to receive a byte, the connection @@ -947,17 +952,6 @@ to slow down other users when they build new circuits. % What about link-to-link rate limiting? -More worrisome are distributed denial of service attacks wherein an -attacker uses a large number of compromised hosts throughout the network -to consume the Tor network's resources. Although these attacks are not -new to the networking literature, some proposed approaches are a poor -fit to anonymous networks. For example, solutions based on backtracking -harmful traffic \cite{XXX} could allow an anonymity-breaking -adversary to exploit the backtracking mechanism. -% XXX I don't see how you would do DDoS through Tor. And even if you -% did, it seems ok to track you down. Should we remove this -% paragraph? -RD - Attackers also have an opportunity to attack the Tor network by mounting attacks on its hosts and network links. Disrupting a single circuit or link breaks all currently open streams passing along that part of the @@ -1001,7 +995,7 @@ network. (Using a private exit (if one exists) is a more secure way for a client to connect to a given host or network---an external adversary cannot eavesdrop traffic between the private exit and the final destination, and so is less sure of Alice's destination and -activities.) is less sure of Alice's destination. More generally, +activities.) is less sure of Alice's destination. In general, nodes can require a variety of forms of traffic authentication \cite{or-discex00}. @@ -1187,7 +1181,7 @@ but refuses to relay traffic from other routers, the directory servers must build circuits and use them to anonymously test router reliability \cite{mix-acc}. -When a client Alice retrieves a consensus directory, she uses it if it +When Alice retrieves a consensus directory, she uses it if it is signed by a majority of the directory servers she knows. 
Using directory servers rather than flooding provides simplicity and @@ -1221,8 +1215,9 @@ Our design for location-hidden servers has the following properties: simply by sending many requests to talk to Bob. Thus, Bob needs a way to filter incoming requests. \item[Robust:] Bob should be able to maintain a long-term pseudonymous - identity even in the presence of router failure. Thus, Bob's identity - must not be tied to a single OR. + identity even in the presence of router failure. Thus, Bob's service + must not be tied to a single OR, and Bob must be able to tie his service + to new ORs. \item[Smear-resistant:] An attacker should not be able to use rendezvous points to smear an OR. That is, if a social attacker tries to host a location-hidden service that is illegal or disreputable, it should not @@ -1327,8 +1322,8 @@ remains a SOCKS proxy. Thus we must encode all of the necessary information into the fully qualified domain name Alice uses when establishing her connections. Location-hidden services use a virtual top level domain called `.onion': thus hostnames take the form -x.y.onion where x encodes the hash of PK, and y is the authentication -cookie. Alice's onion proxy examines hostnames and recognizes when +x.y.onion where x is the authentication cookie, and y encodes the hash +of PK. Alice's onion proxy examines hostnames and recognizes when they're destined for a hidden server. If so, it decodes the PK and starts the rendezvous as described in the table above. @@ -1342,7 +1337,7 @@ self-authenticating, and so the client can recognize the same service with confidence later on. His design also differs from ours in the following ways: First, Goldberg suggests that the client should manually hunt down a current location of the service via Gnutella; -whereas our use of the DHT makes lookup faster, more robust, and +whereas our use of CFS makes lookup faster, more robust, and transparent to the user. 
Second, in Tor the client and server negotiate ephemeral keys via Diffie-Hellman, so at no point in the path is the plaintext exposed. Third, our design tries to minimize the @@ -1546,7 +1541,9 @@ them. traffic once the circuits have been closed.) Additionally, building circuits that cross jurisdictions can make legal coercion harder---this phenomenon is commonly called ``jurisdictional - arbitrage.'' + arbitrage.'' The JAP project recently experienced this issue, when + the German government successfully ordered them to add a backdoor to + all of their nodes. \item \emph{Run a recipient.} By running a Web server, an adversary @@ -1890,7 +1887,8 @@ issues remaining to be ironed out. In particular: %% commented out for anonymous submission %\Section{Acknowledgments} -% Peter Palfrader for editing +% Peter Palfrader, Geoff Goodell, Adam Shostack, Joseph Sokol-Margolis +% for editing and comments % Bram Cohen for congestion control discussions % Adam Back for suggesting telescoping circuits |