author Roger Dingledine <arma@torproject.org> 2003-11-05 01:29:36 +0000
committer Roger Dingledine <arma@torproject.org> 2003-11-05 01:29:36 +0000
commit 868b3c9724c59ae0e5719a81b56b4f02419f3f9a
tree ff2472dcdc4fd3fad622742f9a4fe0ec0505d0d9 /doc
parent 03925bc60b0f36409f3664d25160a427f2646e34
more edits and compression
svn:r764
Diffstat (limited to 'doc')
-rw-r--r-- doc/TODO 8
-rw-r--r-- doc/tor-design.tex 133
2 files changed, 79 insertions, 62 deletions
diff --git a/doc/TODO b/doc/TODO
index f73922572..01596b0e1 100644
--- a/doc/TODO
+++ b/doc/TODO
@@ -9,6 +9,14 @@ look at having smallcells and largecells
separate trying to rebuild a circuit because you have none from trying to rebuild a
circuit because the current one is stale
+<nickm> If I compromise a node, and streamIDs are sequential, I learn
+how many streams have been open and closed on this circuit at this point.
+> hm. you learn this for circuits too, do you not?
+<nickm> True. But how-many-circuits-from-A-to-B only leaks how long
+the connection from A to B has been alive and how much use it's seen.
+> ok. needs more investigation.
+
+
Legend:
SPEC!! - Not specified
SPEC - Spec not finalized
diff --git a/doc/tor-design.tex b/doc/tor-design.tex
index 136f88033..138730d4b 100644
--- a/doc/tor-design.tex
+++ b/doc/tor-design.tex
@@ -123,9 +123,10 @@ Routing originally called for batching and reordering cells as they arrived,
assumed padding between ORs, and in
later designs added padding between onion proxies (users) and ORs
\cite{or-ih96,or-jsac98}. Tradeoffs between padding protection
-and cost were discussed, but no general padding scheme was suggested.
-It was theorized \cite{or-pet00} but not described how \emph{traffic
- shaping} would be used. Recent research \cite{econymics}
+and cost were discussed, and \emph{traffic shaping} algorithms were
+theorized \cite{or-pet00} that provide good security without expensive
+padding, but no concrete padding scheme was suggested.
+Recent research \cite{econymics}
and deployment experience \cite{freedom21-security} suggest that this
level of resource use is not practical or economical; and even full
link padding is still vulnerable \cite{defensive-dropping}. Thus,
@@ -157,7 +158,7 @@ congestion control uses end-to-end acks to maintain anonymity
while allowing nodes at the edges of the network to detect congestion
or flooding and send less data until the congestion subsides.
-\textbf{Directory servers:} Earlier Onion Routing designs
+\textbf{Directory servers:} The earlier Onion Routing design
planned to flood link-state information through the network---an approach
that can be unreliable and open to partitioning attacks or
deception. Tor takes a simplified view toward distributing link-state
@@ -189,7 +190,7 @@ while building circuits and route around them. Additionally, liveness
information from directories allows users to avoid unreliable nodes in
the first place.
-\textbf{Rendezvous points and location-protected servers:}
+\textbf{Rendezvous points and hidden services:}
Tor provides an integrated mechanism for responder anonymity via
location-protected servers. Previous Onion Routing designs included
long-lived ``reply onions'' that could be used to build circuits
@@ -228,7 +229,7 @@ Modern anonymity systems date to Chaum's {\bf Mix-Net} design
\cite{chaum-mix}. Chaum
proposed hiding the correspondence between sender and recipient by
wrapping messages in layers of public-key cryptography, and relaying them
-through a path composed of ``mixes.'' Each of these in turn
+through a path composed of ``mixes.'' Each mix in turn
decrypts, delays, and re-orders messages, before relaying them toward
their destinations.
@@ -388,15 +389,15 @@ main goal, however, several considerations have directed
Tor's evolution.
\textbf{Deployability:} The design must be implemented,
-deployed, and used in the real world. This requirement precludes designs
-that are expensive to run (for example, by requiring more bandwidth
-than volunteers are willing to provide); designs that place a heavy
+deployed, and used in the real world. Thus it
+must not be expensive to run (for example, by requiring more bandwidth
+than volunteers are willing to provide); must not place a heavy
liability burden on operators (for example, by allowing attackers to
-implicate onion routers in illegal activities); and designs that are
+implicate onion routers in illegal activities); and must not be
difficult or expensive to implement (for example, by requiring kernel
-patches, or separate proxies for every protocol). This requirement also
-precludes systems in which non-anonymous parties (such as websites)
-must run our software. (Our rendezvous point design does not meet
+patches, or separate proxies for every protocol). We also cannot
+require non-anonymous parties (such as websites)
+to run our software. (Our rendezvous point design does not meet
this goal for non-anonymous users talking to hidden servers,
however; see Section~\ref{sec:rendezvous}.)
@@ -413,8 +414,8 @@ to be anonymous. (The current Tor implementation runs on Windows and
assorted Unix clones including Linux, FreeBSD, and MacOS X.)
\textbf{Flexibility:} The protocol must be flexible and well-specified,
-so that it can serve as a test-bed for future research in low-latency
-anonymity systems. Many of the open problems in low-latency anonymity
+so Tor can serve as a test-bed for future research.
+Many of the open problems in low-latency anonymity
networks, such as generating dummy traffic or preventing Sybil attacks
\cite{sybil}, may be solvable independently from the issues solved by
Tor. Hopefully future systems will not need to reinvent Tor's design.
@@ -426,7 +427,7 @@ distinguishable. Experiments should be run on a separate network.)
parameters must be well-understood. Additional features impose implementation
and complexity costs; adding unproven techniques to the design threatens
deployability, readability, and ease of security analysis. Tor aims to
-deploy a simple and stable system that integrates the best well-understood
+deploy a simple and stable system that integrates the best accepted
approaches to protecting anonymity.\\
\noindent{\large\bf Non-goals}\label{subsec:non-goals}\\
@@ -443,8 +444,7 @@ is appealing, but still has many open problems
\textbf{Not secure against end-to-end attacks:} Tor does not claim
to provide a definitive solution to end-to-end timing or intersection
attacks. Some approaches, such as running an onion router, may help;
-see Section~\ref{sec:attacks} for more discussion.
-% XXX P2P issues here. -NM
+see Section~\ref{sec:maintaining-anonymity} for more discussion.
\textbf{No protocol normalization:} Tor does not provide \emph{protocol
normalization} like Privoxy or the Anonymizer. If anonymization from
@@ -460,6 +460,7 @@ tunneling for non-stream-based protocols like UDP; this too must be
provided by an external service.
\textbf{Does not provide untraceability:} Tor does not try to conceal
+%XXX untraceability, unobservability, unlinkability? -RD
which users are
sending or receiving communications; it only tries to conceal with whom
they communicate.
@@ -487,16 +488,16 @@ we aim to prevent \emph{traffic
analysis} attacks, where the adversary uses traffic patterns to learn
which points in the network he should attack.
-Our adversary might try to link an initiator Alice with any of her
-communication partners, or he might try to build a profile of Alice's
-behavior. He might mount passive attacks by observing the edges of the
-network and correlating traffic entering and leaving the network---either
-by relationships in packet timing; relationships in the volume
-of data sent; or relationships in any externally visible user-selected
+Our adversary might try to link an initiator Alice with her
+communication partners, or try to build a profile of Alice's
+behavior. He might mount passive attacks by observing the network edges
+and correlating traffic entering and leaving the network---either
+by relationships in packet timing; relationships in volume;
+or relationships in externally visible user-selected
options. The adversary can also mount active attacks by compromising
routers or keys; by replaying traffic; by selectively denying service
-to trustworthy routers to encourage users to send their traffic through
-compromised routers, or denying service to users to see if the traffic
+to trustworthy routers to move users to
+compromised routers, or denying service to users to see if traffic
elsewhere in the
network stops; or by introducing patterns into traffic that can later be
detected. The adversary might subvert the directory servers to give users
@@ -516,7 +517,7 @@ each of these attacks.
The Tor network is an overlay network; each onion router (OR)
runs as a normal
-user-level processes without any special privileges.
+user-level process without any special privileges.
Each onion router maintains a long-term TLS \cite{TLS}
connection to every other onion router.
%(We discuss alternatives to this clique-topology assumption in
@@ -545,7 +546,7 @@ link keys are used by the TLS protocol when communicating between
onion routers. Each short-term key is rotated periodically and
independently, to limit the impact of key compromise.
-Section~\ref{subsec:cells} discusses the fixed-size
+Section~\ref{subsec:cells} presents the fixed-size
\emph{cells} that are the unit of communication in Tor. We describe
in Section~\ref{subsec:circuits} how circuits are
built, extended, truncated, and destroyed. Section~\ref{subsec:tcp}
@@ -616,8 +617,8 @@ among their streams, users' OPs build a new circuit
periodically if the previous one has been used,
and expire old used circuits that no longer have any open streams.
OPs consider making a new circuit once a minute: thus
-even heavy users spend a negligible amount of time
-building circuits, but only a limited number of requests can be linked
+even heavy users spend negligible time
+building circuits, but a limited number of requests can be linked
to each other through a given exit node. Also, because circuits are built
in the background, OPs can recover from failed circuit creation
without delaying streams and thereby harming user experience.\\
@@ -663,8 +664,8 @@ This circuit-level handshake protocol achieves unilateral entity
authentication (Alice knows she's handshaking with the OR, but
the OR doesn't care who is opening the circuit---Alice uses no public key
and is trying to remain anonymous) and unilateral key authentication
-(Alice and the OR agree on a key, and Alice knows the OR is the
-only other entity who knows it). It also achieves forward
+(Alice and the OR agree on a key, and Alice knows only the OR learns
+it). It also achieves forward
secrecy and key freshness. More formally, the protocol is as follows
(where $E_{PK_{Bob}}(\cdot)$ is encryption with Bob's public key,
$H$ is a secure hash function, and $|$ is concatenation):
@@ -675,7 +676,7 @@ $H$ is a secure hash function, and $|$ is concatenation):
\end{aligned}
\end{equation*}
-In the second step, Bob proves that it was he who received $g^x$,
+\noindent In the second step, Bob proves that it was he who received $g^x$,
and who chose $y$. We use PK encryption in the first step
(rather than, say, using the first two steps of STS, which has a
signature in the second step) because a single cell is too small to
@@ -745,7 +746,7 @@ truncate} cell to a single OR on the circuit. That OR then sends a
\emph{destroy} cell forward, and acknowledges with a
\emph{relay truncated} cell. Alice can then extend the circuit to
different nodes, all without signaling to the intermediate nodes (or
-an observer) that she has changed her circuit.
+a limited observer) that she has changed her circuit.
Similarly, if a node on the circuit goes down, the adjacent
node can send a \emph{relay truncated} cell back to Alice. Thus the
``break a node and see which circuits go down'' attack
@@ -759,7 +760,7 @@ address and port, it asks the OP (via SOCKS) to make the
connection. The OP chooses the newest open circuit (or creates one if
none is available), and chooses a suitable OR on that circuit to be the
exit node (usually the last node, but maybe others due to exit policy
-conflicts; see Section~\ref{subsec:exitpolicies}. The OP then opens
+conflicts; see Section~\ref{subsec:exitpolicies}.) The OP then opens
the stream by sending a \emph{relay begin} cell to the exit node,
using a streamID of zero (so the OR will recognize it), containing as
its relay payload a new randomly generated streamID, the destination
@@ -960,7 +961,7 @@ can't send a \emph{relay sendme} cell when its packaging window is empty.
\SubSection{Resource management and denial-of-service}
\label{subsec:dos}
-Providing Tor as a public service provides many opportunities for
+Providing Tor as a public service creates many opportunities for
denial-of-service attacks against the network. While
flow control and rate limiting (discussed in
Section~\ref{subsec:congestion}) prevent users from consuming more
@@ -1043,11 +1044,14 @@ between the private exit and the final destination, and so is less sure of
Alice's destination and activities. Most onion routers will function as
\emph{restricted exits} that permit connections to the world at large,
but prevent access to certain abuse-prone addresses and services.
+In general, nodes could require the user to authenticate before
+being allowed to exit \cite{or-discex00}.
% XXX This next sentence makes no sense to me in context; must
% XXX revisit. -NM
-In
-general, nodes can require a variety of forms of traffic authentication
-\cite{or-discex00}.
+% Does this help? It's for the enclave OR model. -RD
+%In
+%general, nodes can require a variety of forms of traffic authentication
+%\cite{or-discex00}.
%The abuse issues on closed (e.g. military) networks are different
%from the abuse on open networks like the Internet. While these IP-based
@@ -1122,12 +1126,12 @@ exit policies. Each such \emph{directory server} acts as an HTTP
server, so participants can fetch current network state and router
lists, and so other ORs can upload
state information. Onion routers periodically publish signed
-statements of their state to each directory server, which combines this
-state information with its own view of network liveness, and generates
-a signed description (a \emph{directory}) of the entire network
-state. Client software is
-pre-loaded with a list of the directory servers and their keys; it uses
-this information to bootstrap each client's view of the network.
+statements of their state to each directory server. The directory servers
+combine this state information with their own views of network liveness,
+and generate a signed description (a \emph{directory}) of the entire
+network state. Client software is
+pre-loaded with a list of the directory servers and their keys,
+to bootstrap each client's view of the network.
When a directory server receives a signed statement for an OR, it
checks whether the OR's identity key is recognized. Directory
@@ -1230,7 +1234,7 @@ points. He may do this on any robust efficient
key-value lookup system with authenticated updates, such as a
distributed hash table (DHT) like CFS \cite{cfs:sosp01}\footnote{
Rather than rely on an external infrastructure, the Onion Routing network
-can run the DHT itself. At first, we can simply run a simple lookup
+can run the DHT itself. At first, we can run a simple lookup
system on the
directory servers.} Alice, the client, chooses an OR as her
\emph{rendezvous point}. She connects to one of Bob's introduction
@@ -1239,6 +1243,7 @@ to connect to the rendezvous point. This extra level of indirection
helps Bob's introduction points avoid problems associated with serving
unpopular files directly (for example, if Bob serves
material that the introduction point's neighbors find objectionable,
+%XXX neighbors is a technical term
or if Bob's service tends to get attacked by network vandals).
The extra level of indirection also allows Bob to respond to some requests
and ignore others.
@@ -1252,6 +1257,8 @@ application integration is described more fully below.
the DHT. He can add more later.
\item Bob builds a circuit to each of his introduction points,
and waits. No data is yet transmitted.
+% XXX what do we mean No data? Bob obviously tells the IP about
+% his hash-of-public key, auth scheme, etc
\item Alice learns about Bob's service out of band (perhaps Bob told her,
or she found it on a website). She retrieves the details of Bob's
service from the DHT.
@@ -1380,7 +1387,7 @@ ours in three ways. First, Goldberg suggests that Alice should manually
hunt down a current location of the service via Gnutella; our approach
makes lookup transparent to the user, as well as faster and more robust.
Second, in Tor the client and server negotiate session keys
-via Diffie-Hellman, so plaintext is not exposed at the rendezvous point. Third,
+via Diffie-Hellman, so plaintext is not exposed even at the rendezvous point. Third,
our design tries to minimize the exposure associated with running the
service, to encourage volunteers to offer introduction and rendezvous
point services. Tor's introduction points do not output any bytes to the
@@ -1647,17 +1654,18 @@ appropriate. The tradeoffs of a similar approach are discussed in
\noindent{\large\bf Attacks against rendezvous points}\\
\emph{Make many introduction requests.} An attacker could
try to deny Bob service by flooding his introduction points with
-requests. Because the Introduction point can block requests that
+requests. Because the introduction points can block requests that
lack authentication tokens, however, Bob can restrict the volume of
requests he receives, or require a certain amount of computation for
every request he receives.
-\emph{Attack an introduction point.} An attacker could try to disrupt
-Bob's location-hidden service by disabling its introduction points.
-But because a Bob's identity is attached to his public key, Bob
-service can simply re-advertise himself at a different introduction
-point. If an attacker is able to disable all of Bob's introduction
-points, he can block access to Bob. However, re-advertisement of new
+\emph{Attack an introduction point.} An attacker could
+disrupt a location-hidden service by disabling its introduction
+points. But because a service's identity is attached to its public
+key, not its introduction point, the service can simply re-advertise
+itself at a different introduction point.
+An attacker who disables all the introduction points for a given
+service can block access to the service. However, re-advertisement of
introduction points can still be done secretly so that only
high-priority clients know the address of Bob's introduction
points. (These selective secret authorizations can also be issued
@@ -1715,7 +1723,7 @@ three nodes unrelated to herself and her destination.
%three nodes, but if she is running an OR and her destination is on an OR,
%she uses five.
Should Alice choose a nondeterministic path length (say,
-increasing it a geometric distribution) to foil an attacker who
+increasing it from a geometric distribution) to foil an attacker who
uses timing to learn that he is the fifth hop and thus concludes that
both Alice and the responder are on ORs?
@@ -1825,18 +1833,19 @@ schemes that offer provable protection against our chosen adversary.
\emph{Caching at exit nodes:} Perhaps each exit node should run a
caching web proxy, to improve anonymity for cached pages (Alice's request never
leaves the Tor network), to improve speed, and to reduce bandwidth cost.
-%XXX and to have a layer to block to block funny stuff out of port 80.
-% is that a useful thing to say?
-% No; we already said it in the exit abuse section. - NM.
On the other hand, forward security is weakened because caches
constitute a record of retrieved files. We must find the right
balance between usability and security.
-\emph{Better directory distribution:} Directory retrieval presents
-a scaling problem, since clients currently download a description of
+\emph{Better directory distribution:} %Directory retrieval presents
+%a scaling problem, since
+Clients currently download a description of
the entire network state every 15 minutes. As the state grows larger
-and clients more numerous, we may need to move to a solution in which
-clients only receive incremental updates to directory state.
+and clients more numerous, we may need a solution in which
+clients receive incremental updates to directory state.
+More generally, we must find more
+scalable yet practical ways to distribute up-to-date snapshots of
+network status without introducing new attacks.
\emph{Implement location-hidden services:} The design in
Section~\ref{sec:rendezvous} has not yet been implemented. While doing