-rw-r--r--  doc/FAQ      | 111
-rw-r--r--  doc/HACKING  | 117
2 files changed, 228 insertions, 0 deletions
diff --git a/doc/FAQ b/doc/FAQ
new file mode 100644
index 000000000..d4d7a46a2
--- /dev/null
+++ b/doc/FAQ
@@ -0,0 +1,111 @@
The Onion Routing (TOR) Frequently Asked Questions
--------------------------------------------------

1. General.

1.1. What is tor?

Tor is an implementation of version 2 of Onion Routing.

Onion Routing is a connection-oriented anonymizing communication
service. Users build a layered block of asymmetric encryptions
(an "onion") which describes a source-routed path through a set of
nodes. Those nodes build a "virtual circuit" through the network, in which
each node knows its predecessor and successor, but no other nodes. At each
node, traffic flowing down the circuit is unwrapped with a symmetric key,
which reveals the downstream node.

Basically, tor provides a distributed network of servers ("onion
routers"). Users bounce their TCP streams (web traffic, FTP, SSH, etc.)
around the routers, and recipients, observers, and even the routers
themselves have difficulty tracking the source of the stream.

1.2. Why's it called tor?

Because tor is the onion routing system. I kept telling people I was
working on onion routing, and they said "Neat. Which one?" Even if onion
routing has become a standard household term, this is the actual onion
routing project, started out of the Naval Research Lab.

(Theories about recursive acronyms are ok too.)


2. Compiling and installing.

[Read the README file for now; check back here once we've got packages etc.
for you.]


3. Running tor.

3.1. What's this about roles? What kind of server should I run?

The same executable ("or") functions as both client and server, depending
on the value of the config variable named 'Role'. Role encodes the
combination of tasks that this particular tor server will perform.
The default Role (role 15) is an onion router: it listens for onion
routers, listens for onion proxies, listens for application proxies, and
connects to all other onion routers it learns about. A directory server
(role 63) does all of the above and also serves directory requests. A
simple onion proxy (role 8), on the other hand, only listens for
application proxies. See part 3.1 of the HACKING document for more
technical details.

3.2. So I can just run a full onion router and join the network?

No. Users should run only an onion proxy (use the 'oprc' config file).
If you start up a full onion router, the rest of the routers in the
system won't recognize you, so they will reject your handshake attempts.

3.3. How do I join the network then?

If you just want to use the onion routing network, you can run a proxy
and you're all set. If you want to run a router, you must convince
the directory server operators (currently arma@mit.edu) that you're a
trustworthy person. Once convinced, the operators add you to the
directory, which propagates out to the rest of the network. All nodes
will know about you within an hour.

3.4. I want to run a directory server too.

If you run a very reliable node, you plan to be around for a long time,
and you want to spend some time ensuring that router operators are
people we know and like, we may want you to run a directory server
too. We must manually add you to the 'dirservers' file that's part of
the distribution; users will only learn about you once they upgrade to
a new version. Of course, you can always just start up your router as a
directory server too --- but users won't know to ask you for directories,
and more importantly, you'll never learn from the real directory servers
about recently joined routers.


4. Development.

4.1. Who's doing this?

4.2. Can I help?

4.3. I've got a bug.


5. Anonymity.

5.1. So I'm totally anonymous if I use tor?

5.2. Where can I learn more about anonymity?


6. Comparison to related projects.

6.1. Onion Routing.

Tor *is* onion routing.

6.2. Freedom.


7. Protocol and application support.

7.1. http? ftp? udp? socks? mozilla?


diff --git a/doc/HACKING b/doc/HACKING
new file mode 100644
index 000000000..421b32f90
--- /dev/null
+++ b/doc/HACKING
@@ -0,0 +1,117 @@

0. Intro.
Onion Routing is still very much in the development stage. This document
aims to get you started in the right direction if you want to understand
the code, add features, fix bugs, etc.

Read the README file first, so you can get familiar with the basics.

1. The programs.

1.1. "or". This is the main program here. It functions as both a server
and a client, depending on which config file you give it. ...

2. The pieces.

2.1. Routers. Onion routers, as far as the 'or' program is concerned,
are a bunch of data items that are loaded into the router_array when
the program starts. Once loaded, the router information never
changes. When a new OR connection is started (see below), the relevant
information is copied from the router struct to the connection struct.

2.2. Connections. A connection is a long-lived TCP socket between
nodes. A connection is named after what it's connected to -- an "OR
connection" has an onion router on the other end, an "OP connection" has
an onion proxy on the other end, an "exit connection" has a website or
other server on the other end, and an "AP connection" has an application
proxy (and thus a user) on the other end.

2.3. Circuits. A circuit is a single conversation between two
participants over the onion routing network. One end of the circuit has
an AP connection, and the other end has an exit connection. AP and exit
AP and exit +connections have only one circuit associated with them (and thus these +connection types are closed when the circuit is closed), whereas OP and +OR connections multiplex many circuits at once, and stay standing even +when there are no circuits running over them. + +2.4. Cells. Some connections, specifically OR and OP connections, speak +"cells". This means that data over that connection is bundled into 128 +byte packets (8 bytes of header and 120 bytes of payload). Each cell has +a type, or "command", which indicates what it's for. + + +3. Important parameters in the code. + +3.1. Role. + + +4. Robustness features. + +4.1. Bandwidth throttling. Each cell-speaking connection has a maximum +bandwidth it can use, as specified in the routers.or file. Bandwidth +throttling occurs on both the sender side and the receiving side. The +sending side sends cells at regularly spaced intervals (e.g., a connection +with a bandwidth of 12800B/s would queue a cell every 10ms). The receiving +side protects against misbehaving servers that send cells more frequently, +by using a simple token bucket: + +Each connection has a token bucket with a specified capacity. Tokens are +added to the bucket each second (when the bucket is full, new tokens +are discarded.) Each token represents permission to receive one byte +from the network --- to receive a byte, the connection must remove a +token from the bucket. Thus if the bucket is empty, that connection must +wait until more tokens arrive. The number of tokens we add enforces a +longterm average rate of incoming bytes, yet we still permit short-term +bursts above the allowed bandwidth. Currently bucket sizes are set to +ten seconds worth of traffic. + +The bandwidth throttling uses TCP to push back when we stop reading. +We extend it with token buckets to allow more flexibility for traffic +bursts. + +4.2. Data congestion control. 
Even with the above bandwidth throttling, +we still need to worry about congestion, either accidental or intentional. +If a lot of people make circuits into same node, and they all come out +through the same connection, then that connection may become saturated +(be unable to send out data cells as quickly as it wants to). An adversary +can make a 'put' request through the onion routing network to a webserver +he owns, and then refuse to read any of the bytes at the webserver end +of the circuit. These bottlenecks can propagate back through the entire +network, mucking up everything. + +To handle this congestion, each circuit starts out with a receive +window at each node of 100 cells -- it is willing to receive at most 100 +cells on that circuit. (It handles each direction separately; so that's +really 100 cells forward and 100 cells back.) The edge of the circuit +is willing to create at most 100 cells from data coming from outside the +onion routing network. Nodes in the middle of the circuit will tear down +the circuit if a data cell arrives when the receive window is 0. When +data has traversed the network, the edge node buffers it on its outbuf, +and evaluates whether to respond with a 'sendme' acknowledgement: if its +outbuf is not too full, and its receive window is less than 90, then it +queues a 'sendme' cell backwards in the circuit. Each node that receives +the sendme increments its window by 10 and passes the cell onward. + +In practice, all the nodes in the circuit maintain a receive window +close to 100 except the exit node, which stays around 0, periodically +receiving a sendme and reading 10 more data cells from the webserver. +In this way we can use pretty much all of the available bandwidth for +data, but gracefully back off when faced with multiple circuits (a new +sendme arrives only after some cells have traversed the entire network), +stalled network connections, or attacks. 

We don't need to reimplement full TCP windows, with sequence numbers,
the ability to drop cells when we're full, etc., because the TCP streams
already guarantee in-order delivery of each cell. Rather than trying
to build some sort of TCP-on-TCP scheme, we implement this minimal data
congestion control; so far it's enough.

4.3. Router twins. In many cases when we ask for a router with a given
address and port, we really mean a router that knows a given key. Router
twins are two or more routers that all share the same private key. We thus
give routers extra flexibility in choosing the next hop in the circuit: if
some of the twins are down or slow, the router can choose the more
available ones.

Currently the code tries the primary router first and, if it's down,
chooses the first available twin.