Highlights
==========

- Re-use the same socket connection for multiple requests
  (``HTTPConnectionPool`` and ``HTTPSConnectionPool``,
  with optional client-side certificate verification).
- File posting (``encode_multipart_formdata``); see the sketch after this list.
- Built-in redirection and retries (optional).
- Supports gzip and deflate decoding.
- Thread-safe and sanity-safe.
- Tested on Python 2.6+ and Python 3.2+, 99% unit test coverage.
- Small and easy to understand codebase perfect for extending and building upon.
  For a more comprehensive solution, have a look at
  `Requests <http://python-requests.org/>`_ which is also powered by urllib3.
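
A minimal sketch of the file posting mentioned above (the URL, field name, and
file name are made up for illustration; ``fields`` values of the form
``(filename, data)`` are encoded with ``encode_multipart_formdata`` under the
hood)::

  import urllib3

  http = urllib3.PoolManager()

  # Read a hypothetical local file and post it as a multipart form field.
  with open('report.csv') as fp:
      r = http.request('POST', 'http://example.com/upload',
                       fields={'file': ('report.csv', fp.read())})

  print(r.status)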

What's wrong with urllib and urllib2?
=====================================

There are two critical features missing from the Python standard library:
connection re-use/pooling and file posting. It's not terribly hard to
implement these yourself, but it's much easier to use a module that has
already done the work for you.

The Python standard libraries ``urllib`` and ``urllib2`` have little to do
with each other. They were designed to be independent and standalone, each
solving a different scope of problems, and ``urllib3`` follows in a similar
vein.

Why do I want to reuse connections?
===================================

Performance. When you make a normal urllib call, a separate socket
connection is created with each request. By reusing existing sockets
(supported since HTTP 1.1), the requests take up fewer resources on the
server's end and get a faster response time on the client's end.
With some simple benchmarks (see `test/benchmark.py
<https://github.com/shazow/urllib3/blob/master/test/benchmark.py>`_
), downloading 15 URLs from google.com is about twice as fast when using
HTTPConnectionPool (which reuses 1 connection) as when using plain urllib
(which opens 15 connections).
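
A rough sketch of that comparison (the host and paths are illustrative;
``maxsize=1`` keeps a single socket that every request reuses)::

  import urllib3

  # One pool, one socket, reused for all 15 requests to this host.
  pool = urllib3.HTTPConnectionPool('www.google.com', maxsize=1)

  for _ in range(15):
      r = pool.request('GET', '/')
      print(r.status)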

This library is perfect for:

- Talking to an API
- Crawling a website
- Any situation where being able to post files, handle redirects, and retry
  requests is useful. It's relatively lightweight, so it can be used for
  anything! (A redirect/retry sketch follows this list.)
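
A minimal sketch of per-request redirect and retry handling (the URL is
illustrative; ``redirect`` and ``retries`` are keyword arguments passed
through to ``urlopen``)::

  import urllib3

  http = urllib3.PoolManager()

  # Follow redirects (the default) and retry up to 3 times on failure.
  r = http.request('GET', 'http://google.com/', redirect=True, retries=3)

  # Or keep the raw 30x response instead of following it.
  r = http.request('GET', 'http://google.com/', redirect=False)
  print(r.status)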

Examples
========

Go to `urllib3.readthedocs.org <http://urllib3.readthedocs.org>`_
for more nice syntax-highlighted examples.

But, long story short::

  import urllib3

  http = urllib3.PoolManager()

  r = http.request('GET', 'http://google.com/')

  print(r.status, r.data)

The ``PoolManager`` will take care of reusing connections for you whenever
you request the same host. For more fine-grained control of your connection
pools, you should look at
`ConnectionPool <http://urllib3.readthedocs.org/#connectionpool>`_.
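
For example, a dedicated pool lets you pin options such as certificate
verification to a single host (a sketch; the hostname and CA bundle path
below are placeholders)::

  import urllib3

  # One pool for one host, with server certificate verification enabled.
  pool = urllib3.HTTPSConnectionPool('example.com',
                                     cert_reqs='CERT_REQUIRED',
                                     ca_certs='/path/to/ca_bundle.crt')

  r = pool.request('GET', '/')
  print(r.status)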


Run the tests
=============

We use some external dependencies to run the urllib3 test suite. The easiest
way to run the tests is from the urllib3 source root::

  $ pip install -r test-requirements.txt
  $ nosetests
  .....................................................

Success! You could also ``pip install coverage`` to get code coverage reporting.


Contributing
============

#. `Check for open issues <https://github.com/shazow/urllib3/issues>`_ or open
   a fresh issue to start a discussion around a feature idea or a bug. There is
   a *Contributor Friendly* tag for issues that should be ideal for people who
   are not very familiar with the codebase yet.
#. Fork the `urllib3 repository on GitHub <https://github.com/shazow/urllib3>`_
   to start making your changes.
#. Write a test which shows that the bug was fixed or that the feature works
   as expected.
#. Send a pull request and bug the maintainer until it gets merged and published.
   :) Make sure to add yourself to ``CONTRIBUTORS.txt``.