Metadata-Version: 1.1
Name: urllib3
Version: 1.8.3
Summary: HTTP library with thread-safe connection pooling, file post, and more.
Home-page: http://urllib3.readthedocs.org/
Author: Andrey Petrov
Author-email: andrey.petrov@shazow.net
License: MIT
Description: =======
urllib3
=======
.. image:: https://travis-ci.org/shazow/urllib3.png?branch=master
        :target: https://travis-ci.org/shazow/urllib3
Highlights
==========
- Re-use the same socket connection for multiple requests
(``HTTPConnectionPool`` and ``HTTPSConnectionPool``)
(with optional client-side certificate verification).
- File posting (``encode_multipart_formdata``).
- Built-in redirection and retries (optional).
- Supports gzip and deflate decoding.
- Thread-safe and sanity-safe.
- Works with AppEngine, gevent, and eventlib.
- Tested on Python 2.6+ and Python 3.2+, 100% unit test coverage.
- Small and easy to understand codebase perfect for extending and building upon.
For a more comprehensive solution, have a look at
`Requests <http://python-requests.org/>`_ which is also powered by ``urllib3``.
You might already be using urllib3!
===================================
``urllib3`` powers `many great Python libraries <https://sourcegraph.com/search?q=package+urllib3>`_,
including ``pip`` and ``requests``.
What's wrong with urllib and urllib2?
=====================================
There are two critical features missing from the Python standard library:
Connection re-using/pooling and file posting. It's not terribly hard to
implement these yourself, but it's much easier to use a module that already
did the work for you.
The Python standard libraries ``urllib`` and ``urllib2`` have little to do
with each other. They were designed to be independent and standalone, each
solving a different scope of problems, and ``urllib3`` follows in a similar
vein.
Why do I want to reuse connections?
===================================
Performance. Normally, each urllib call creates a separate socket
connection. By reusing existing sockets (supported since HTTP 1.1), requests
take up fewer resources on the server's end and get a faster response time at
the client's end.
With some simple benchmarks (see `test/benchmark.py
<https://github.com/shazow/urllib3/blob/master/test/benchmark.py>`_
), downloading 15 URLs from google.com is about twice as fast when using
HTTPConnectionPool (which uses 1 connection) as when using plain urllib (which
uses 15 connections).
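To make the idea of socket reuse concrete, here is a minimal sketch that sends several requests through a single ``HTTPConnectionPool``. It is not the benchmark mentioned above: the local test server, the ``HelloHandler`` class, and the ``fetch_with_pool`` helper are assumptions made up for illustration, and Python 3.7+ is assumed for ``ThreadingHTTPServer``.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import urllib3


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for a throwaway local server."""

    # HTTP/1.1 keeps the socket open between requests (keep-alive).
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_with_pool(paths):
    """Fetch every path through one pool that holds a single connection."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    pool = urllib3.HTTPConnectionPool(
        "127.0.0.1", port=server.server_port, maxsize=1
    )
    try:
        statuses = [pool.request("GET", path).status for path in paths]
        # num_connections counts connections the pool had to create;
        # num_requests counts requests sent through the pool.
        return statuses, pool.num_connections, pool.num_requests
    finally:
        pool.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_with_pool(["/a", "/b", "/c"]))
```

If the pool behaves as described, the three requests should be served by a single pooled connection rather than three separate ones.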
This library is perfect for:
- Talking to an API
- Crawling a website
- Any situation where being able to post files, handle redirection, and
retry requests is useful. It's relatively lightweight, so it can be used for
anything!
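As a rough illustration of the file-posting support, the sketch below builds a multipart request body with ``encode_multipart_formdata`` without sending anything over the network. The field names, file name, contents, and the fixed boundary are made up for this example.

```python
from urllib3.filepost import encode_multipart_formdata

fields = {
    "comment": "plain form value",
    # (filename, file contents); both values are made up for this sketch.
    "upload": ("hello.txt", b"file contents"),
}

# A fixed boundary keeps the output reproducible; one is generated when
# the argument is omitted.
body, content_type = encode_multipart_formdata(fields, boundary="demo-boundary")

print(content_type)
```

The returned ``body`` is the encoded payload and ``content_type`` is the value to send as the ``Content-Type`` header of the request.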
Examples
========
Go to `urllib3.readthedocs.org <http://urllib3.readthedocs.org>`_
for more nice syntax-highlighted examples.
But, long story short::

    import urllib3

    http = urllib3.PoolManager()
    r = http.request('GET', 'http://google.com/')
    print(r.status, r.data)
The ``PoolManager`` will take care of reusing connections for you whenever
you request the same host. For more fine-grained control of your connection
pools, you should look at
`ConnectionPool <http://urllib3.readthedocs.org/#connectionpool>`_.
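The per-host reuse described above can be observed without sending any request. This is a minimal sketch, assuming that ``connection_from_url`` only looks up (or creates) the pool for a URL. The host names are placeholders used purely as pool keys.

```python
import urllib3

# Placeholder hosts, used only as pool keys; no request is sent.
http = urllib3.PoolManager(num_pools=10)

pool_a = http.connection_from_url("http://example.com/first")
pool_b = http.connection_from_url("http://example.com/second")
pool_c = http.connection_from_url("http://example.org/")

same_host_shares_pool = pool_a is pool_b
other_host_gets_own_pool = pool_a is not pool_c
print(same_host_shares_pool, other_host_gets_own_pool)
```

Both values should be ``True``: URLs on the same host map to one pool object, while a different host gets its own pool.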
Run the tests
=============
We use some external dependencies, multiple interpreters, and code coverage
analysis while running the test suite. The easiest way to run the tests is
with the ``tox`` utility::

    $ tox
    # [..]
    py26: commands succeeded
    py27: commands succeeded
    py32: commands succeeded
    py33: commands succeeded
    py34: commands succeeded

Note that code coverage less than 100% is regarded as a failing run.
Contributing
============
#. `Check for open issues <https://github.com/shazow/urllib3/issues>`_ or open
a fresh issue to start a discussion around a feature idea or a bug. There is
a *Contributor Friendly* tag for issues that should be ideal for people who
are not very familiar with the codebase yet.
#. Fork the `urllib3 repository on Github <https://github.com/shazow/urllib3>`_
to start making your changes.
#. Write a test which shows that the bug was fixed or that the feature works
as expected.
#. Send a pull request and bug the maintainer until it gets merged and published.
:) Make sure to add yourself to ``CONTRIBUTORS.txt``.
Changes
=======
1.8.3 (2014-06-23)
++++++++++++++++++
* Fix TLS verification when using a proxy in Python 3.4.1. (Issue #385)
* Add ``disable_cache`` option to ``urllib3.util.make_headers``. (Issue #393)
* Wrap ``socket.timeout`` exception with
``urllib3.exceptions.ReadTimeoutError``. (Issue #399)
* Fixed proxy-related bug where connections were being reused incorrectly.
(Issues #366, #369)
* Added ``socket_options`` keyword parameter which allows defining the
``setsockopt`` configuration of new sockets. (Issue #397)
* Removed ``HTTPConnection.tcp_nodelay`` in favor of
``HTTPConnection.default_socket_options``. (Issue #397)
* Fixed ``TypeError`` bug in Python 2.6.4. (Issue #411)
1.8.2 (2014-04-17)
++++++++++++++++++
* Fix ``urllib3.util`` not being included in the package.
1.8.1 (2014-04-17)
++++++++++++++++++
* Fix AppEngine bug of HTTPS requests going out as HTTP. (Issue #356)
* Don't install ``dummyserver`` into ``site-packages`` as it's only needed
for the test suite. (Issue #362)
* Added support for specifying ``source_address``. (Issue #352)
1.8 (2014-03-04)
++++++++++++++++
* Improved url parsing in ``urllib3.util.parse_url`` (properly parse '@' in
username, and blank ports like 'hostname:').
* New ``urllib3.connection`` module which contains all the HTTPConnection
objects.
* Several ``urllib3.util.Timeout``-related fixes. Also changed constructor
signature to a more sensible order. [Backwards incompatible]
(Issues #252, #262, #263)
* Use ``backports.ssl_match_hostname`` if it's installed. (Issue #274)
* Added ``.tell()`` method to ``urllib3.response.HTTPResponse`` which
returns the number of bytes read so far. (Issue #277)
* Support for platforms without threading. (Issue #289)
* Expand default-port comparison in ``HTTPConnectionPool.is_same_host``
to allow a pool with no specified port to be considered equal to an
HTTP/HTTPS url with port 80/443 explicitly provided. (Issue #305)
* Improved default SSL/TLS settings to avoid vulnerabilities.
(Issue #309)
* Fixed ``urllib3.poolmanager.ProxyManager`` not retrying on connect errors.
(Issue #310)
* Disable Nagle's Algorithm on the socket for non-proxies. A subset of requests
will send the entire HTTP request ~200 milliseconds faster; however, some of
the resulting TCP packets will be smaller. (Issue #254)
* Increased maximum number of SubjectAltNames in ``urllib3.contrib.pyopenssl``
from the default 64 to 1024 in a single certificate. (Issue #318)
* Headers are now passed and stored as a custom
``urllib3.collections_.HTTPHeaderDict`` object rather than a plain ``dict``.
(Issue #329, #333)
* Headers no longer lose their case on Python 3. (Issue #236)
* ``urllib3.contrib.pyopenssl`` now uses the operating system's default CA
certificates on inject. (Issue #332)
* Requests with ``retries=False`` will immediately raise any exceptions without
wrapping them in ``MaxRetryError``. (Issue #348)
* Fixed open socket leak with SSL-related failures. (Issue #344, #348)
1.7.1 (2013-09-25)
++++++++++++++++++
* Added granular timeout support with new ``urllib3.util.Timeout`` class.
(Issue #231)
* Fixed Python 3.4 support. (Issue #238)
1.7 (2013-08-14)
++++++++++++++++
* More exceptions are now pickle-able, with tests. (Issue #174)
* Fixed redirecting with relative URLs in Location header. (Issue #178)
* Support for relative urls in ``Location: ...`` header. (Issue #179)
* ``urllib3.response.HTTPResponse`` now inherits from ``io.IOBase`` for bonus
file-like functionality. (Issue #187)
* Passing ``assert_hostname=False`` when creating a HTTPSConnectionPool will
skip hostname verification for SSL connections. (Issue #194)
* New method ``urllib3.response.HTTPResponse.stream(...)`` which acts as a
generator wrapped around ``.read(...)``. (Issue #198)
* IPv6 url parsing enforces brackets around the hostname. (Issue #199)
* Fixed thread race condition in
``urllib3.poolmanager.PoolManager.connection_from_host(...)`` (Issue #204)
* ``ProxyManager`` requests now include non-default port in ``Host: ...``
header. (Issue #217)
* Added HTTPS proxy support in ``ProxyManager``. (Issue #170 #139)
* New ``RequestField`` object can be passed to the ``fields=...`` param which
can specify headers. (Issue #220)
* Raise ``urllib3.exceptions.ProxyError`` when connecting to proxy fails.
(Issue #221)
* Use international headers when posting file names. (Issue #119)
* Improved IPv6 support. (Issue #203)
1.6 (2013-04-25)
++++++++++++++++
* Contrib: Optional SNI support for Py2 using PyOpenSSL. (Issue #156)
* ``ProxyManager`` automatically adds ``Host: ...`` header if not given.
* Improved SSL-related code. ``cert_req`` now optionally takes a string like
"REQUIRED" or "NONE". Likewise, ``ssl_version`` takes strings like "SSLv23".
The string values reflect the suffix of the respective constant variable.
(Issue #130)
* Vendored ``socksipy`` now based on Anorov's fork which handles unexpectedly
closed proxy connections and larger read buffers. (Issue #135)
* Ensure the connection is closed if no data is received, fixes connection leak
on some platforms. (Issue #133)
* Added SNI support for SSL/TLS connections on Py32+. (Issue #89)
* Tests fixed to be compatible with Py26 again. (Issue #125)
* Added ability to choose SSL version by passing an ``ssl.PROTOCOL_*`` constant
to the ``ssl_version`` parameter of ``HTTPSConnectionPool``. (Issue #109)
* Allow an explicit content type to be specified when encoding file fields.
(Issue #126)
* Exceptions are now pickleable, with tests. (Issue #101)
* Fixed default headers not getting passed in some cases. (Issue #99)
* Treat "content-encoding" header value as case-insensitive, per RFC 2616
Section 3.5. (Issue #110)
* "Connection Refused" SocketErrors will get retried rather than raised.
(Issue #92)
* Updated vendored ``six``, no longer overrides the global ``six`` module
namespace. (Issue #113)
* ``urllib3.exceptions.MaxRetryError`` contains a ``reason`` property holding
the exception that prompted the final retry. If ``reason is None`` then it
was due to a redirect. (Issue #92, #114)
* Fixed ``PoolManager.urlopen()`` not redirecting more than once.
(Issue #149)
* Don't assume ``Content-Type: text/plain`` for multi-part encoding parameters
that are not files. (Issue #111)
* Pass ``strict`` param down to ``httplib.HTTPConnection``. (Issue #122)
* Added mechanism to verify SSL certificates by fingerprint (md5, sha1) or
against an arbitrary hostname (when connecting by IP or for misconfigured
servers). (Issue #140)
* Streaming decompression support. (Issue #159)
1.5 (2012-08-02)
++++++++++++++++
* Added ``urllib3.add_stderr_logger()`` for quickly enabling STDERR debug
logging in urllib3.
* Native full URL parsing (including auth, path, query, fragment) available in
``urllib3.util.parse_url(url)``.
* Built-in redirect will switch method to 'GET' if status code is 303.
(Issue #11)
* ``urllib3.PoolManager`` strips the scheme and host before sending the request
uri. (Issue #8)
* New ``urllib3.exceptions.DecodeError`` exception for when automatic decoding,
based on the Content-Type header, fails.
* Fixed bug with pool depletion and leaking connections (Issue #76). Added
explicit connection closing on pool eviction. Added
``urllib3.PoolManager.clear()``.
* 99% -> 100% unit test coverage.
1.4 (2012-06-16)
++++++++++++++++
* Minor AppEngine-related fixes.
* Switched from ``mimetools.choose_boundary`` to ``uuid.uuid4()``.
* Improved url parsing. (Issue #73)
* IPv6 url support. (Issue #72)
1.3 (2012-03-25)
++++++++++++++++
* Removed pre-1.0 deprecated API.
* Refactored helpers into a ``urllib3.util`` submodule.
* Fixed multipart encoding to support list-of-tuples for keys with multiple
values. (Issue #48)
* Fixed multiple Set-Cookie headers in response not getting merged properly in
Python 3. (Issue #53)
* AppEngine support with Py27. (Issue #61)
* Minor ``encode_multipart_formdata`` fixes related to Python 3 strings vs
bytes.
1.2.2 (2012-02-06)
++++++++++++++++++
* Fixed packaging bug of not shipping ``test-requirements.txt``. (Issue #47)
1.2.1 (2012-02-05)
++++++++++++++++++
* Fixed another bug related to when ``ssl`` module is not available. (Issue #41)
* Location parsing errors now raise ``urllib3.exceptions.LocationParseError``
which inherits from ``ValueError``.
1.2 (2012-01-29)
++++++++++++++++
* Added Python 3 support (tested on 3.2.2)
* Dropped Python 2.5 support (tested on 2.6.7, 2.7.2)
* Use ``select.poll`` instead of ``select.select`` for platforms that support
it.
* Use ``Queue.LifoQueue`` instead of ``Queue.Queue`` for more aggressive
connection reusing. Configurable by overriding ``ConnectionPool.QueueCls``.
* Fixed ``ImportError`` during install when ``ssl`` module is not available.
(Issue #41)
* Fixed ``PoolManager`` redirects between schemes (such as HTTP -> HTTPS) not
completing properly. (Issue #28, uncovered by Issue #10 in v1.1)
* Ported ``dummyserver`` to use ``tornado`` instead of ``webob`` +
``eventlet``. Removed extraneous unsupported dummyserver testing backends.
Added socket-level tests.
* More tests. Achievement Unlocked: 99% Coverage.
1.1 (2012-01-07)
++++++++++++++++
* Refactored ``dummyserver`` to its own root namespace module (used for
testing).
* Added hostname verification for ``VerifiedHTTPSConnection`` by vendoring in
Py32's ``ssl_match_hostname``. (Issue #25)
* Fixed cross-host HTTP redirects when using ``PoolManager``. (Issue #10)
* Fixed ``decode_content`` being ignored when set through ``urlopen``. (Issue
#27)
* Fixed timeout-related bugs. (Issues #17, #23)
1.0.2 (2011-11-04)
++++++++++++++++++
* Fixed typo in ``VerifiedHTTPSConnection`` which would only present as a bug if
you're using the object manually. (Thanks pyos)
* Made RecentlyUsedContainer (and consequently PoolManager) more thread-safe by
wrapping the access log in a mutex. (Thanks @christer)
* Made RecentlyUsedContainer more dict-like (corrected ``__delitem__`` and
``__getitem__`` behaviour), with tests. Shouldn't affect core urllib3 code.
1.0.1 (2011-10-10)
++++++++++++++++++
* Fixed a bug where the same connection would get returned into the pool twice,
causing extraneous "HttpConnectionPool is full" log warnings.
1.0 (2011-10-08)
++++++++++++++++
* Added ``PoolManager`` with LRU expiration of connections (tested and
documented).
* Added ``ProxyManager`` (needs tests, docs, and confirmation that it works
with HTTPS proxies).
* Added optional partial-read support for responses when
``preload_content=False``. You can now make requests and just read the headers
without loading the content.
* Made response decoding optional (default on, same as before).
* Added optional explicit boundary string for ``encode_multipart_formdata``.
* Convenience request methods are now inherited from ``RequestMethods``. Old
helpers like ``get_url`` and ``post_url`` should be abandoned in favour of
the new ``request(method, url, ...)``.
* Refactored code to be even more decoupled, reusable, and extendable.
* License header added to ``.py`` files.
* Embiggened the documentation: Lots of Sphinx-friendly docstrings in the code
and docs in ``docs/`` and on urllib3.readthedocs.org.
* Embettered all the things!
* Started writing this file.
0.4.1 (2011-07-17)
++++++++++++++++++
* Minor bug fixes, code cleanup.
0.4 (2011-03-01)
++++++++++++++++
* Better unicode support.
* Added ``VerifiedHTTPSConnection``.
* Added ``NTLMConnectionPool`` in contrib.
* Minor improvements.
0.3.1 (2010-07-13)
++++++++++++++++++
* Added ``assert_host_name`` optional parameter. Now compatible with proxies.
0.3 (2009-12-10)
++++++++++++++++
* Added HTTPS support.
* Minor bug fixes.
* Refactored; broke backwards compatibility with 0.2.
* API to be treated as stable from this version forward.
0.2 (2008-11-17)
++++++++++++++++
* Added unit tests.
* Bug fixes.
0.1 (2008-11-16)
++++++++++++++++
* First release.
Keywords: urllib httplib threadsafe filepost http https ssl pooling
Platform: UNKNOWN
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries