author | Roger Dingledine <arma@torproject.org> | 2007-11-24 15:28:08 +0000 |
---|---|---|
committer | Roger Dingledine <arma@torproject.org> | 2007-11-24 15:28:08 +0000 |
commit | 17393b835927cec5db32dce7af8f2f4721e9a71b | |
tree | 6744681cf49fd14a9cc27d01496bc2e6d133a418 | |
parent | 5b3cc6cd7e5fa083b32ab9a286f9eec05b831801 | |
draft of a proposal: Fetching GeoIP databases for clients, relays, and bridges
svn:r12566
-rw-r--r-- | doc/spec/proposals/000-index.txt | 2 |
-rw-r--r-- | doc/spec/proposals/123-autonaming.txt | 3 |
-rw-r--r-- | doc/spec/proposals/126-geoip-reporting.txt | 124 |
3 files changed, 128 insertions, 1 deletion
diff --git a/doc/spec/proposals/000-index.txt b/doc/spec/proposals/000-index.txt
index 811de30f3..b45de7fc4 100644
--- a/doc/spec/proposals/000-index.txt
+++ b/doc/spec/proposals/000-index.txt
@@ -48,6 +48,7 @@ Proposals by number:
 123 Naming authorities automatically create bindings [OPEN]
 124 Blocking resistant TLS certificate usage [ACCEPTED]
 125 Behavior for bridge users, bridge relays, and bridge authorities [OPEN]
+126 Fetching GeoIP databases for clients, relays, and bridges [OPEN]
 
 Proposals by status:
 
@@ -63,6 +64,7 @@ Proposals by status:
   121 Hidden Service Authentication
   123 Naming authorities automatically create bindings
   125 Behavior for bridge users, bridge relays, and bridge authorities
+  126 Fetching GeoIP databases for clients, relays, and bridges
 ACCEPTED:
   105 Version negotiation for the Tor protocol
   124 Blocking resistant TLS certificate usage
diff --git a/doc/spec/proposals/123-autonaming.txt b/doc/spec/proposals/123-autonaming.txt
index 7ab7d3ece..988b4b96c 100644
--- a/doc/spec/proposals/123-autonaming.txt
+++ b/doc/spec/proposals/123-autonaming.txt
@@ -1,4 +1,4 @@
-Filename: xxx-autonaming.txt
+Filename: 123-autonaming.txt
 Title: Naming authorities automatically create bindings
 Version: $Revision$
 Last-Modified: $Date$
@@ -52,3 +52,4 @@ Proposal:
 
   This automaton does not necessarily need to live in the Tor code, it
   can do its job just as well when it's an external tool.
+
diff --git a/doc/spec/proposals/126-geoip-reporting.txt b/doc/spec/proposals/126-geoip-reporting.txt
new file mode 100644
index 000000000..5f9858140
--- /dev/null
+++ b/doc/spec/proposals/126-geoip-reporting.txt
@@ -0,0 +1,124 @@
+Filename: 126-geoip-fetching.txt
+Title: Fetching GeoIP databases for clients, relays, and bridges
+Version: $Revision: 11988 $
+Last-Modified: $Date: 2007-10-16 12:59:42 -0400 (Tue, 16 Oct 2007) $
+Author: Roger Dingledine
+Created: 2007-11-24
+Status: Open
+
+1. Background and motivation
+
+  Right now we can keep a rough count of Tor users, both total and by
+  country, by watching connections to a single directory mirror. Being
+  able to get usage estimates is useful both for our funders (to
+  demonstrate progress) and for our own development (so we know how
+  quickly we're scaling and can design accordingly, and so we know which
+  countries and communities to focus on more). This need for information
+  is the only reason we haven't deployed "directory guards" (think of
+  them like entry guards but for directory information; in practice,
+  it would seem that Tor clients should simply use their entry guards
+  as their directory guards).
+
+  With the move toward bridges, we will no longer be able to track Tor
+  clients that use bridges, since they use their bridges as directory
+  guards. Further, we need to be able to learn which bridges stop seeing
+  use from certain countries (and are thus likely blocked), so we can
+  avoid giving them out to other users in those countries.
+
+  Right now we support GeoIP lookups through Vidalia: Vidalia draws relays
+  and circuits on its 'network map', and it performs anonymized GeoIP
+  lookups to its central servers to know where to put the dots. Vidalia
+  caches answers it gets -- to reduce delay, to reduce overhead on
+  the network, and to reduce anonymity issues where users reveal their
+  behavior through which IP addresses they ask about.
+
+  But with the advent of bridges, Tor clients are asking about IP
+  addresses that aren't in the main directory. In particular, bridge
+  users tell the central Vidalia servers about each bridge as they
+  discover it, and their Vidalia tries to map it.
+
+  Also, we wouldn't mind letting Vidalia do a GeoIP lookup on the client's
+  own IP address, so it can provide a more useful map.
+
+  Also, Vidalia's central servers leave users open to partitioning
+  attacks, even if they can't target specific users. Further, as we
+  start using GeoIP results for more operational or security-relevant
+  goals, such as avoiding or including particular countries in circuits,
+  it becomes more important that users can't be singled out in terms of
+  their IP-to-country mapping beliefs.
+
+  This proposal describes a way for Tor relays, bridges, and clients to
+  download a local copy of a GeoIP database, so they can do local private
+  queries. Thus we can avoid sending detailed queries to central servers.
+
+2. Publishing and caching the GeoIP database
+
+  We assume that we use a free GeoIP db, like ip2country. We will need
+  to standardize on its format; see Section 5.
+
+  Each v3 directory authority should put a copy of the "geoip" file in
+  its DataDirectory. Then its votes should include a hash of this file,
+  and the resulting consensus directory should specify the consensus hash.
+
+  There should be a new URL for fetching this geoip db (by "current.z"
+  for testing purposes, and by hash.z for typical downloads). Authorities
+  should fetch and serve the one listed in the consensus, even when they
+  vote for their own. This would argue for storing the cached version
+  under a better filename than "geoip".
+
+  Directory mirrors should keep a copy of this file available via the
+  same URLs.
+
+  We assume that the file would change at most a few times a month. Should
+  Tor ship with a bootstrap geoip file?
+
+3. Clients use it for Vidalia
+
+  Tor fetches the geoip file as above, and puts it in Tor's DataDirectory.
+  Then we could have a status event that tells controllers that a new
+  geoip file has arrived.
+
+  Then Vidalia would either read the file directly, or we would add
+  a control protocol interface for querying. Since Tor probably needs
+  to parse the file itself (see Section 4 below), offering the control
+  interface is probably cleanest.
+
+  There should be a config option to disable updating the geoip file,
+  in case users want to use their own file (e.g. they have a proprietary
+  GeoIP file they prefer to use). In that case we leave it up to the
+  user to update his geoip file out-of-band.
+
+4. Bridges use it for usage summaries
+
+  Once bridges have a GeoIP database locally, they can start to publish
+  sanitized summaries of client usage -- how many users they see and from
+  what countries. This might also be a more useful way for ordinary Tor
+  relays to convey the level of usage they see.
+
+  But how to safely summarize this information without opening too many
+  anonymity leaks seems hard, so I'm going to leave it for a different
+  proposal.
+
+5. Which db to use?
+
+  A recent ip-to-country.csv is 3421362 bytes. Compressed, it is 564252
+  bytes. This isn't so bad. But we can easily cut it down further; some
+  sample lines are:
+    "205500992","208605279","US","USA","UNITED STATES"
+    "208605280","208605311","CA","CAN","CANADA"
+    "208605312","210784255","US","USA","UNITED STATES"
+  My guess is the compression will solve most of the redundancy, so we
+  can stick with the default format.
+  http://ip-to-country.webhosting.info/node/view/5
+
+  The MaxMind GeoLite Country database is also about 500KB compressed.
+  http://www.maxmind.com/app/geolitecountry
+
+  The MaxMind GeoLite City database gives more fine-grained detail, such
+  as geo coordinates and city name. Vidalia currently makes use of this
+  information. On the other hand, it's 16MB compressed, which would seem
+  to be out of our reach.
+  http://www.maxmind.com/app/geolitecity
+
+  What other options are there?
+
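Section 5 suggests keeping ip-to-country.csv in its default format, and Section 1 motivates answering GeoIP queries locally instead of asking central servers. The Python below is a minimal sketch of such a local, private lookup. It parses rows shaped like the three sample lines and answers IPv4-to-country queries by binary search. It assumes the columns are range start, range end, two-letter code, three-letter code, and country name. It also assumes ranges are inclusive and non-overlapping, as the contiguous sample ranges suggest. `GeoIPTable` and `lookup` are hypothetical names, not part of Tor or Vidalia.

```python
import bisect
import csv
import io
import ipaddress

# The three sample rows quoted in Section 5, in ip-to-country.csv's format:
# range start, range end, 2-letter code, 3-letter code, country name.
SAMPLE_CSV = '''"205500992","208605279","US","USA","UNITED STATES"
"208605280","208605311","CA","CAN","CANADA"
"208605312","210784255","US","USA","UNITED STATES"
'''


class GeoIPTable:
    """Hypothetical in-memory table for local IPv4-to-country lookups.

    Assumes each row covers an inclusive range of IPv4 addresses written
    as unsigned 32-bit integers, and that ranges do not overlap.
    """

    def __init__(self, csv_text):
        rows = []
        for record in csv.reader(io.StringIO(csv_text)):
            if not record:
                continue  # tolerate blank lines
            rows.append((int(record[0]), int(record[1]), record[2]))
        rows.sort()  # bisect below needs ranges ordered by their start
        self._starts = [start for start, _, _ in rows]
        self._rows = rows

    def lookup(self, address):
        """Return the 2-letter country code for an IPv4 address, or None."""
        n = int(ipaddress.IPv4Address(address))
        # Position of the last range whose start is <= n, if any.
        i = bisect.bisect_right(self._starts, n) - 1
        if i >= 0:
            start, end, code = self._rows[i]
            if start <= n <= end:
                return code
        return None  # address falls in a gap or outside the table


table = GeoIPTable(SAMPLE_CSV)
print(table.lookup("12.111.16.100"))  # CA
print(table.lookup("12.111.16.128"))  # US
```

Under the non-overlap assumption, each query is a single O(log n) search with no network traffic. A client can therefore map bridges and its own address without revealing which IPs it is asking about.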
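Section 2 has each v3 authority include a hash of its geoip file in its vote. The consensus then names one hash, which mirrors serve as hash.z. The proposal specifies neither the digest algorithm nor how the consensus hash is chosen. The sketch below assumes SHA-256 and a strict-majority rule purely for illustration. `geoip_digest`, `consensus_geoip_digest`, and `download_name` are hypothetical helpers. The URL prefix in front of `<digest>.z` is left unspecified, as in the proposal.

```python
import hashlib
from collections import Counter


def geoip_digest(data):
    """Hex digest of a geoip file's bytes, as an authority might list it in
    its vote. SHA-256 is an assumption; the proposal names no algorithm."""
    return hashlib.sha256(data).hexdigest()


def consensus_geoip_digest(votes):
    """Return the digest listed by a strict majority of votes, else None.

    `votes` has one entry per voting authority: a hex digest, or None for
    an authority with no geoip file. The strict-majority rule is a
    hypothetical choice for illustration only.
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    digest, n = counts.most_common(1)[0]
    # Compare against all voters, so abstentions count against a digest.
    return digest if 2 * n > len(votes) else None


def download_name(digest):
    """Final path component for a by-hash download ("hash.z" in Section 2).
    The URL prefix in front of it is not specified by the proposal."""
    return digest + ".z"


# Hypothetical file contents standing in for two versions of the geoip file.
old = geoip_digest(b"geoip file, version 1\n")
new = geoip_digest(b"geoip file, version 2\n")
winner = consensus_geoip_digest([new, new, old])
print(download_name(winner))
```

A strict majority admits at most one winning digest, so every authority computing the consensus independently reaches the same answer. When no digest has a majority, this sketch returns None and leaves the fallback to the caller.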