diff options
116 files changed, 13032 insertions, 0 deletions
diff --git a/CHANGES.rst b/CHANGES.rst new file mode 100644 index 0000000..dd2cd2d --- /dev/null +++ b/CHANGES.rst @@ -0,0 +1,458 @@ +Changes +======= + +1.9.1 (2014-09-13) +++++++++++++++++++ + +* Apply socket arguments before binding. (Issue #427) + +* More careful checks if fp-like object is closed. (Issue #435) + +* Fixed packaging issues of some development-related files not + getting included. (Issue #440) + +* Allow performing *only* fingerprint verification. (Issue #444) + +* Emit ``SecurityWarning`` if system clock is waaay off. (Issue #445) + +* Fixed PyOpenSSL compatibility with PyPy. (Issue #450) + +* Fixed ``BrokenPipeError`` and ``ConnectionError`` handling in Py3. + (Issue #443) + + + +1.9 (2014-07-04) +++++++++++++++++ + +* Shuffled around development-related files. If you're maintaining a distro + package of urllib3, you may need to tweak things. (Issue #415) + +* Unverified HTTPS requests will trigger a warning on the first request. See + our new `security documentation + <https://urllib3.readthedocs.org/en/latest/security.html>`_ for details. + (Issue #426) + +* New retry logic and ``urllib3.util.retry.Retry`` configuration object. + (Issue #326) + +* All raised exceptions should now wrapped in a + ``urllib3.exceptions.HTTPException``-extending exception. (Issue #326) + +* All errors during a retry-enabled request should be wrapped in + ``urllib3.exceptions.MaxRetryError``, including timeout-related exceptions + which were previously exempt. Underlying error is accessible from the + ``.reason`` propery. (Issue #326) + +* ``urllib3.exceptions.ConnectionError`` renamed to + ``urllib3.exceptions.ProtocolError``. (Issue #326) + +* Errors during response read (such as IncompleteRead) are now wrapped in + ``urllib3.exceptions.ProtocolError``. (Issue #418) + +* Requesting an empty host will raise ``urllib3.exceptions.LocationValueError``. + (Issue #417) + +* Catch read timeouts over SSL connections as + ``urllib3.exceptions.ReadTimeoutError``. (Issue #419) + +* Apply socket arguments before connecting. (Issue #427) + + +1.8.3 (2014-06-23) +++++++++++++++++++ + +* Fix TLS verification when using a proxy in Python 3.4.1. (Issue #385) + +* Add ``disable_cache`` option to ``urllib3.util.make_headers``. (Issue #393) + +* Wrap ``socket.timeout`` exception with + ``urllib3.exceptions.ReadTimeoutError``. (Issue #399) + +* Fixed proxy-related bug where connections were being reused incorrectly. + (Issues #366, #369) + +* Added ``socket_options`` keyword parameter which allows to define + ``setsockopt`` configuration of new sockets. (Issue #397) + +* Removed ``HTTPConnection.tcp_nodelay`` in favor of + ``HTTPConnection.default_socket_options``. (Issue #397) + +* Fixed ``TypeError`` bug in Python 2.6.4. (Issue #411) + + +1.8.2 (2014-04-17) +++++++++++++++++++ + +* Fix ``urllib3.util`` not being included in the package. + + +1.8.1 (2014-04-17) +++++++++++++++++++ + +* Fix AppEngine bug of HTTPS requests going out as HTTP. (Issue #356) + +* Don't install ``dummyserver`` into ``site-packages`` as it's only needed + for the test suite. (Issue #362) + +* Added support for specifying ``source_address``. (Issue #352) + + +1.8 (2014-03-04) +++++++++++++++++ + +* Improved url parsing in ``urllib3.util.parse_url`` (properly parse '@' in + username, and blank ports like 'hostname:'). + +* New ``urllib3.connection`` module which contains all the HTTPConnection + objects. + +* Several ``urllib3.util.Timeout``-related fixes. Also changed constructor + signature to a more sensible order. [Backwards incompatible] + (Issues #252, #262, #263) + +* Use ``backports.ssl_match_hostname`` if it's installed. (Issue #274) + +* Added ``.tell()`` method to ``urllib3.response.HTTPResponse`` which + returns the number of bytes read so far. (Issue #277) + +* Support for platforms without threading. (Issue #289) + +* Expand default-port comparison in ``HTTPConnectionPool.is_same_host`` + to allow a pool with no specified port to be considered equal to to an + HTTP/HTTPS url with port 80/443 explicitly provided. (Issue #305) + +* Improved default SSL/TLS settings to avoid vulnerabilities. + (Issue #309) + +* Fixed ``urllib3.poolmanager.ProxyManager`` not retrying on connect errors. + (Issue #310) + +* Disable Nagle's Algorithm on the socket for non-proxies. A subset of requests + will send the entire HTTP request ~200 milliseconds faster; however, some of + the resulting TCP packets will be smaller. (Issue #254) + +* Increased maximum number of SubjectAltNames in ``urllib3.contrib.pyopenssl`` + from the default 64 to 1024 in a single certificate. (Issue #318) + +* Headers are now passed and stored as a custom + ``urllib3.collections_.HTTPHeaderDict`` object rather than a plain ``dict``. + (Issue #329, #333) + +* Headers no longer lose their case on Python 3. (Issue #236) + +* ``urllib3.contrib.pyopenssl`` now uses the operating system's default CA + certificates on inject. (Issue #332) + +* Requests with ``retries=False`` will immediately raise any exceptions without + wrapping them in ``MaxRetryError``. (Issue #348) + +* Fixed open socket leak with SSL-related failures. (Issue #344, #348) + + +1.7.1 (2013-09-25) +++++++++++++++++++ + +* Added granular timeout support with new ``urllib3.util.Timeout`` class. + (Issue #231) + +* Fixed Python 3.4 support. (Issue #238) + + +1.7 (2013-08-14) +++++++++++++++++ + +* More exceptions are now pickle-able, with tests. (Issue #174) + +* Fixed redirecting with relative URLs in Location header. (Issue #178) + +* Support for relative urls in ``Location: ...`` header. (Issue #179) + +* ``urllib3.response.HTTPResponse`` now inherits from ``io.IOBase`` for bonus + file-like functionality. (Issue #187) + +* Passing ``assert_hostname=False`` when creating a HTTPSConnectionPool will + skip hostname verification for SSL connections. (Issue #194) + +* New method ``urllib3.response.HTTPResponse.stream(...)`` which acts as a + generator wrapped around ``.read(...)``. (Issue #198) + +* IPv6 url parsing enforces brackets around the hostname. (Issue #199) + +* Fixed thread race condition in + ``urllib3.poolmanager.PoolManager.connection_from_host(...)`` (Issue #204) + +* ``ProxyManager`` requests now include non-default port in ``Host: ...`` + header. (Issue #217) + +* Added HTTPS proxy support in ``ProxyManager``. (Issue #170 #139) + +* New ``RequestField`` object can be passed to the ``fields=...`` param which + can specify headers. (Issue #220) + +* Raise ``urllib3.exceptions.ProxyError`` when connecting to proxy fails. + (Issue #221) + +* Use international headers when posting file names. (Issue #119) + +* Improved IPv6 support. (Issue #203) + + +1.6 (2013-04-25) +++++++++++++++++ + +* Contrib: Optional SNI support for Py2 using PyOpenSSL. (Issue #156) + +* ``ProxyManager`` automatically adds ``Host: ...`` header if not given. + +* Improved SSL-related code. ``cert_req`` now optionally takes a string like + "REQUIRED" or "NONE". Same with ``ssl_version`` takes strings like "SSLv23" + The string values reflect the suffix of the respective constant variable. + (Issue #130) + +* Vendored ``socksipy`` now based on Anorov's fork which handles unexpectedly + closed proxy connections and larger read buffers. (Issue #135) + +* Ensure the connection is closed if no data is received, fixes connection leak + on some platforms. (Issue #133) + +* Added SNI support for SSL/TLS connections on Py32+. (Issue #89) + +* Tests fixed to be compatible with Py26 again. (Issue #125) + +* Added ability to choose SSL version by passing an ``ssl.PROTOCOL_*`` constant + to the ``ssl_version`` parameter of ``HTTPSConnectionPool``. (Issue #109) + +* Allow an explicit content type to be specified when encoding file fields. + (Issue #126) + +* Exceptions are now pickleable, with tests. (Issue #101) + +* Fixed default headers not getting passed in some cases. (Issue #99) + +* Treat "content-encoding" header value as case-insensitive, per RFC 2616 + Section 3.5. (Issue #110) + +* "Connection Refused" SocketErrors will get retried rather than raised. + (Issue #92) + +* Updated vendored ``six``, no longer overrides the global ``six`` module + namespace. (Issue #113) + +* ``urllib3.exceptions.MaxRetryError`` contains a ``reason`` property holding + the exception that prompted the final retry. If ``reason is None`` then it + was due to a redirect. (Issue #92, #114) + +* Fixed ``PoolManager.urlopen()`` from not redirecting more than once. + (Issue #149) + +* Don't assume ``Content-Type: text/plain`` for multi-part encoding parameters + that are not files. (Issue #111) + +* Pass `strict` param down to ``httplib.HTTPConnection``. (Issue #122) + +* Added mechanism to verify SSL certificates by fingerprint (md5, sha1) or + against an arbitrary hostname (when connecting by IP or for misconfigured + servers). (Issue #140) + +* Streaming decompression support. (Issue #159) + + +1.5 (2012-08-02) +++++++++++++++++ + +* Added ``urllib3.add_stderr_logger()`` for quickly enabling STDERR debug + logging in urllib3. + +* Native full URL parsing (including auth, path, query, fragment) available in + ``urllib3.util.parse_url(url)``. + +* Built-in redirect will switch method to 'GET' if status code is 303. + (Issue #11) + +* ``urllib3.PoolManager`` strips the scheme and host before sending the request + uri. (Issue #8) + +* New ``urllib3.exceptions.DecodeError`` exception for when automatic decoding, + based on the Content-Type header, fails. + +* Fixed bug with pool depletion and leaking connections (Issue #76). Added + explicit connection closing on pool eviction. Added + ``urllib3.PoolManager.clear()``. + +* 99% -> 100% unit test coverage. + + +1.4 (2012-06-16) +++++++++++++++++ + +* Minor AppEngine-related fixes. + +* Switched from ``mimetools.choose_boundary`` to ``uuid.uuid4()``. + +* Improved url parsing. (Issue #73) + +* IPv6 url support. (Issue #72) + + +1.3 (2012-03-25) +++++++++++++++++ + +* Removed pre-1.0 deprecated API. + +* Refactored helpers into a ``urllib3.util`` submodule. + +* Fixed multipart encoding to support list-of-tuples for keys with multiple + values. (Issue #48) + +* Fixed multiple Set-Cookie headers in response not getting merged properly in + Python 3. (Issue #53) + +* AppEngine support with Py27. (Issue #61) + +* Minor ``encode_multipart_formdata`` fixes related to Python 3 strings vs + bytes. + + +1.2.2 (2012-02-06) +++++++++++++++++++ + +* Fixed packaging bug of not shipping ``test-requirements.txt``. (Issue #47) + + +1.2.1 (2012-02-05) +++++++++++++++++++ + +* Fixed another bug related to when ``ssl`` module is not available. (Issue #41) + +* Location parsing errors now raise ``urllib3.exceptions.LocationParseError`` + which inherits from ``ValueError``. + + +1.2 (2012-01-29) +++++++++++++++++ + +* Added Python 3 support (tested on 3.2.2) + +* Dropped Python 2.5 support (tested on 2.6.7, 2.7.2) + +* Use ``select.poll`` instead of ``select.select`` for platforms that support + it. + +* Use ``Queue.LifoQueue`` instead of ``Queue.Queue`` for more aggressive + connection reusing. Configurable by overriding ``ConnectionPool.QueueCls``. + +* Fixed ``ImportError`` during install when ``ssl`` module is not available. + (Issue #41) + +* Fixed ``PoolManager`` redirects between schemes (such as HTTP -> HTTPS) not + completing properly. (Issue #28, uncovered by Issue #10 in v1.1) + +* Ported ``dummyserver`` to use ``tornado`` instead of ``webob`` + + ``eventlet``. Removed extraneous unsupported dummyserver testing backends. + Added socket-level tests. + +* More tests. Achievement Unlocked: 99% Coverage. + + +1.1 (2012-01-07) +++++++++++++++++ + +* Refactored ``dummyserver`` to its own root namespace module (used for + testing). + +* Added hostname verification for ``VerifiedHTTPSConnection`` by vendoring in + Py32's ``ssl_match_hostname``. (Issue #25) + +* Fixed cross-host HTTP redirects when using ``PoolManager``. (Issue #10) + +* Fixed ``decode_content`` being ignored when set through ``urlopen``. (Issue + #27) + +* Fixed timeout-related bugs. (Issues #17, #23) + + +1.0.2 (2011-11-04) +++++++++++++++++++ + +* Fixed typo in ``VerifiedHTTPSConnection`` which would only present as a bug if + you're using the object manually. (Thanks pyos) + +* Made RecentlyUsedContainer (and consequently PoolManager) more thread-safe by + wrapping the access log in a mutex. (Thanks @christer) + +* Made RecentlyUsedContainer more dict-like (corrected ``__delitem__`` and + ``__getitem__`` behaviour), with tests. Shouldn't affect core urllib3 code. + + +1.0.1 (2011-10-10) +++++++++++++++++++ + +* Fixed a bug where the same connection would get returned into the pool twice, + causing extraneous "HttpConnectionPool is full" log warnings. + + +1.0 (2011-10-08) +++++++++++++++++ + +* Added ``PoolManager`` with LRU expiration of connections (tested and + documented). +* Added ``ProxyManager`` (needs tests, docs, and confirmation that it works + with HTTPS proxies). +* Added optional partial-read support for responses when + ``preload_content=False``. You can now make requests and just read the headers + without loading the content. +* Made response decoding optional (default on, same as before). +* Added optional explicit boundary string for ``encode_multipart_formdata``. +* Convenience request methods are now inherited from ``RequestMethods``. Old + helpers like ``get_url`` and ``post_url`` should be abandoned in favour of + the new ``request(method, url, ...)``. +* Refactored code to be even more decoupled, reusable, and extendable. +* License header added to ``.py`` files. +* Embiggened the documentation: Lots of Sphinx-friendly docstrings in the code + and docs in ``docs/`` and on urllib3.readthedocs.org. +* Embettered all the things! +* Started writing this file. + + +0.4.1 (2011-07-17) +++++++++++++++++++ + +* Minor bug fixes, code cleanup. + + +0.4 (2011-03-01) +++++++++++++++++ + +* Better unicode support. +* Added ``VerifiedHTTPSConnection``. +* Added ``NTLMConnectionPool`` in contrib. +* Minor improvements. + + +0.3.1 (2010-07-13) +++++++++++++++++++ + +* Added ``assert_host_name`` optional parameter. Now compatible with proxies. + + +0.3 (2009-12-10) +++++++++++++++++ + +* Added HTTPS support. +* Minor bug fixes. +* Refactored, broken backwards compatibility with 0.2. +* API to be treated as stable from this version forward. + + +0.2 (2008-11-17) +++++++++++++++++ + +* Added unit tests. +* Bug fixes. + + +0.1 (2008-11-16) +++++++++++++++++ + +* First release. diff --git a/CONTRIBUTORS.txt b/CONTRIBUTORS.txt new file mode 100644 index 0000000..97f3014 --- /dev/null +++ b/CONTRIBUTORS.txt @@ -0,0 +1,131 @@ +# Contributions to the urllib3 project + +## Creator & Maintainer + +* Andrey Petrov <andrey.petrov@shazow.net> + + +## Contributors + +In chronological order: + +* victor.vde <http://code.google.com/u/victor.vde/> + * HTTPS patch (which inspired HTTPSConnectionPool) + +* erikcederstrand <http://code.google.com/u/erikcederstrand/> + * NTLM-authenticated HTTPSConnectionPool + * Basic-authenticated HTTPSConnectionPool (merged into make_headers) + +* niphlod <niphlod@gmail.com> + * Client-verified SSL certificates for HTTPSConnectionPool + * Response gzip and deflate encoding support + * Better unicode support for filepost using StringIO buffers + +* btoconnor <brian@btoconnor.net> + * Non-multipart encoding for POST requests + +* p.dobrogost <http://code.google.com/u/@WBRSRlBZDhBFXQB6/> + * Code review, PEP8 compliance, benchmark fix + +* kennethreitz <me@kennethreitz.com> + * Bugfixes, suggestions, Requests integration + +* georgemarshall <http://github.com/georgemarshall> + * Bugfixes, Improvements and Test coverage + +* Thomas Kluyver <thomas@kluyver.me.uk> + * Python 3 support + +* brandon-rhodes <http://rhodesmill.org/brandon> + * Design review, bugfixes, test coverage. + +* studer <theo.studer@gmail.com> + * IPv6 url support and test coverage + +* Shivaram Lingamneni <slingamn@cs.stanford.edu> + * Support for explicitly closing pooled connections + +* hartator <hartator@gmail.com> + * Corrected multipart behavior for params + +* Thomas Weißschuh <thomas@t-8ch.de> + * Support for TLS SNI + * API unification of ssl_version/cert_reqs + * SSL fingerprint and alternative hostname verification + * Bugfixes in testsuite + +* Sune Kirkeby <mig@ibofobi.dk> + * Optional SNI-support for Python 2 via PyOpenSSL. + +* Marc Schlaich <marc.schlaich@gmail.com> + * Various bugfixes and test improvements. + +* Bryce Boe <bbzbryce@gmail.com> + * Correct six.moves conflict + * Fixed pickle support of some exceptions + +* Boris Figovsky <boris.figovsky@ravellosystems.com> + * Allowed to skip SSL hostname verification + +* Cory Benfield <http://lukasa.co.uk/about/> + * Stream method for Response objects. + * Return native strings in header values. + * Generate 'Host' header when using proxies. + +* Jason Robinson <jaywink@basshero.org> + * Add missing WrappedSocket.fileno method in PyOpenSSL + +* Audrius Butkevicius <audrius.butkevicius@elastichosts.com> + * Fixed a race condition + +* Stanislav Vitkovskiy <stas.vitkovsky@gmail.com> + * Added HTTPS (CONNECT) proxy support + +* Stephen Holsapple <sholsapp@gmail.com> + * Added abstraction for granular control of request fields + +* Martin von Gagern <Martin.vGagern@gmx.net> + * Support for non-ASCII header parameters + +* Kevin Burke <kev@inburke.com> and Pavel Kirichenko <juanych@yandex-team.ru> + * Support for separate connect and request timeouts + +* Peter Waller <p@pwaller.net> + * HTTPResponse.tell() for determining amount received over the wire + +* Nipunn Koorapati <nipunn1313@gmail.com> + * Ignore default ports when comparing hosts for equality + +* Danilo @dbrgn <http://dbrgn.ch/> + * Disabled TLS compression by default on Python 3.2+ + * Disabled TLS compression in pyopenssl contrib module + * Configurable cipher suites in pyopenssl contrib module + +* Roman Bogorodskiy <roman.bogorodskiy@ericsson.com> + * Account retries on proxy errors + +* Nicolas Delaby <nicolas.delaby@ezeep.com> + * Use the platform-specific CA certificate locations + +* Josh Schneier <https://github.com/jschneier> + * HTTPHeaderDict and associated tests and docs + * Bugfixes, docs, test coverage + +* Tahia Khan <http://tahia.tk/> + * Added Timeout examples in docs + +* Arthur Grunseid <http://grunseid.com> + * source_address support and tests (with https://github.com/bui) + +* Ian Cordasco <graffatcolmingov@gmail.com> + * PEP8 Compliance and Linting + * Add ability to pass socket options to an HTTP Connection + +* Erik Tollerud <erik.tollerud@gmail.com> + * Support for standard library io module. + +* Krishna Prasad <kprasad.iitd@gmail.com> + * Google App Engine documentation + +* [Your name or handle] <[email or website]> + * [Brief summary of your changes] diff --git a/LICENSE.txt b/LICENSE.txt new file mode 100644 index 0000000..2a02593 --- /dev/null +++ b/LICENSE.txt @@ -0,0 +1,19 @@ +This is the MIT license: http://www.opensource.org/licenses/mit-license.php + +Copyright 2008-2014 Andrey Petrov and contributors (see CONTRIBUTORS.txt) + +Permission is hereby granted, free of charge, to any person obtaining a copy of this +software and associated documentation files (the "Software"), to deal in the Software +without restriction, including without limitation the rights to use, copy, modify, merge, +publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons +to whom the Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, +INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR +PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE +FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR +OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS IN THE SOFTWARE. diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 0000000..4edfedd --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1,5 @@ +include README.rst CHANGES.rst LICENSE.txt CONTRIBUTORS.txt dev-requirements.txt Makefile +recursive-include dummyserver * +recursive-include test * +recursive-include docs * +recursive-exclude docs/_build * diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..b692b12 --- /dev/null +++ b/Makefile @@ -0,0 +1,48 @@ +REQUIREMENTS_FILE=dev-requirements.txt +REQUIREMENTS_OUT=dev-requirements.txt.log +SETUP_OUT=*.egg-info + + +all: setup requirements + +virtualenv: +ifndef VIRTUAL_ENV + $(error Must be run inside of a virtualenv) +endif + +setup: virtualenv $(SETUP_OUT) + +$(SETUP_OUT): setup.py setup.cfg + python setup.py develop + touch $(SETUP_OUT) + +requirements: setup $(REQUIREMENTS_OUT) + +piprot: setup + pip install piprot + piprot -x $(REQUIREMENTS_FILE) + +$(REQUIREMENTS_OUT): $(REQUIREMENTS_FILE) + pip install -r $(REQUIREMENTS_FILE) | tee -a $(REQUIREMENTS_OUT) + python setup.py develop + +clean: + find . -name "*.py[oc]" -delete + find . -name "__pycache__" -delete + rm -f $(REQUIREMENTS_OUT) + rm -rf docs/_build + +test: requirements + nosetests + +test-all: requirements + tox + +docs: + cd docs && pip install -r doc-requirements.txt && make html + +release: + ./release.sh + + +.PHONY: docs diff --git a/PKG-INFO b/PKG-INFO new file mode 100644 index 0000000..964cd4b --- /dev/null +++ b/PKG-INFO @@ -0,0 +1,625 @@ +Metadata-Version: 1.1 +Name: urllib3 +Version: 1.9.1 +Summary: HTTP library with thread-safe connection pooling, file post, and more. +Home-page: http://urllib3.readthedocs.org/ +Author: Andrey Petrov +Author-email: andrey.petrov@shazow.net +License: MIT +Description: ======= + urllib3 + ======= + + .. image:: https://travis-ci.org/shazow/urllib3.png?branch=master + :target: https://travis-ci.org/shazow/urllib3 + + .. image:: https://www.bountysource.com/badge/tracker?tracker_id=192525 + :target: https://www.bountysource.com/trackers/192525-urllib3?utm_source=192525&utm_medium=shield&utm_campaign=TRACKER_BADGE + + + Highlights + ========== + + - Re-use the same socket connection for multiple requests + (``HTTPConnectionPool`` and ``HTTPSConnectionPool``) + (with optional client-side certificate verification). + - File posting (``encode_multipart_formdata``). + - Built-in redirection and retries (optional). + - Supports gzip and deflate decoding. + - Thread-safe and sanity-safe. + - Works with AppEngine, gevent, and eventlib. + - Tested on Python 2.6+, Python 3.2+, and PyPy, with 100% unit test coverage. + - Small and easy to understand codebase perfect for extending and building upon. + For a more comprehensive solution, have a look at + `Requests <http://python-requests.org/>`_ which is also powered by ``urllib3``. + + + You might already be using urllib3! + =================================== + + ``urllib3`` powers `many great Python libraries + <https://sourcegraph.com/search?q=package+urllib3>`_, including ``pip`` and + ``requests``. + + + What's wrong with urllib and urllib2? + ===================================== + + There are two critical features missing from the Python standard library: + Connection re-using/pooling and file posting. It's not terribly hard to + implement these yourself, but it's much easier to use a module that already + did the work for you. + + The Python standard libraries ``urllib`` and ``urllib2`` have little to do + with each other. They were designed to be independent and standalone, each + solving a different scope of problems, and ``urllib3`` follows in a similar + vein. + + + Why do I want to reuse connections? + =================================== + + Performance. When you normally do a urllib call, a separate socket + connection is created with each request. By reusing existing sockets + (supported since HTTP 1.1), the requests will take up less resources on the + server's end, and also provide a faster response time at the client's end. + With some simple benchmarks (see `test/benchmark.py + <https://github.com/shazow/urllib3/blob/master/test/benchmark.py>`_ + ), downloading 15 URLs from google.com is about twice as fast when using + HTTPConnectionPool (which uses 1 connection) than using plain urllib (which + uses 15 connections). + + This library is perfect for: + + - Talking to an API + - Crawling a website + - Any situation where being able to post files, handle redirection, and + retrying is useful. It's relatively lightweight, so it can be used for + anything! + + + Examples + ======== + + Go to `urllib3.readthedocs.org <http://urllib3.readthedocs.org>`_ + for more nice syntax-highlighted examples. + + But, long story short:: + + import urllib3 + + http = urllib3.PoolManager() + + r = http.request('GET', 'http://google.com/') + + print r.status, r.data + + The ``PoolManager`` will take care of reusing connections for you whenever + you request the same host. For more fine-grained control of your connection + pools, you should look at `ConnectionPool + <http://urllib3.readthedocs.org/#connectionpool>`_. + + + Run the tests + ============= + + We use some external dependencies, multiple interpreters and code coverage + analysis while running test suite. Our ``Makefile`` handles much of this for + you as long as you're running it `inside of a virtualenv + <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_:: + + $ make test + [... magically installs dependencies and runs tests on your virtualenv] + Ran 182 tests in 1.633s + + OK (SKIP=6) + + Note that code coverage less than 100% is regarded as a failing run. Some + platform-specific tests are skipped unless run in that platform. To make sure + the code works in all of urllib3's supported platforms, you can run our ``tox`` + suite:: + + $ make test-all + [... tox creates a virtualenv for every platform and runs tests inside of each] + py26: commands succeeded + py27: commands succeeded + py32: commands succeeded + py33: commands succeeded + py34: commands succeeded + + Our test suite `runs continuously on Travis CI + <https://travis-ci.org/shazow/urllib3>`_ with every pull request. + + + Contributing + ============ + + #. `Check for open issues <https://github.com/shazow/urllib3/issues>`_ or open + a fresh issue to start a discussion around a feature idea or a bug. There is + a *Contributor Friendly* tag for issues that should be ideal for people who + are not very familiar with the codebase yet. + #. Fork the `urllib3 repository on Github <https://github.com/shazow/urllib3>`_ + to start making your changes. + #. Write a test which shows that the bug was fixed or that the feature works + as expected. + #. Send a pull request and bug the maintainer until it gets merged and published. + :) Make sure to add yourself to ``CONTRIBUTORS.txt``. + + + Sponsorship + =========== + + If your company benefits from this library, please consider `sponsoring its + development <http://urllib3.readthedocs.org/en/latest/#sponsorship>`_. + + + Changes + ======= + + 1.9.1 (2014-09-13) + ++++++++++++++++++ + + * Apply socket arguments before binding. (Issue #427) + + * More careful checks if fp-like object is closed. (Issue #435) + + * Fixed packaging issues of some development-related files not + getting included. (Issue #440) + + * Allow performing *only* fingerprint verification. (Issue #444) + + * Emit ``SecurityWarning`` if system clock is waaay off. (Issue #445) + + * Fixed PyOpenSSL compatibility with PyPy. (Issue #450) + + * Fixed ``BrokenPipeError`` and ``ConnectionError`` handling in Py3. + (Issue #443) + + + + 1.9 (2014-07-04) + ++++++++++++++++ + + * Shuffled around development-related files. If you're maintaining a distro + package of urllib3, you may need to tweak things. (Issue #415) + + * Unverified HTTPS requests will trigger a warning on the first request. See + our new `security documentation + <https://urllib3.readthedocs.org/en/latest/security.html>`_ for details. + (Issue #426) + + * New retry logic and ``urllib3.util.retry.Retry`` configuration object. + (Issue #326) + + * All raised exceptions should now wrapped in a + ``urllib3.exceptions.HTTPException``-extending exception. (Issue #326) + + * All errors during a retry-enabled request should be wrapped in + ``urllib3.exceptions.MaxRetryError``, including timeout-related exceptions + which were previously exempt. Underlying error is accessible from the + ``.reason`` propery. (Issue #326) + + * ``urllib3.exceptions.ConnectionError`` renamed to + ``urllib3.exceptions.ProtocolError``. (Issue #326) + + * Errors during response read (such as IncompleteRead) are now wrapped in + ``urllib3.exceptions.ProtocolError``. (Issue #418) + + * Requesting an empty host will raise ``urllib3.exceptions.LocationValueError``. + (Issue #417) + + * Catch read timeouts over SSL connections as + ``urllib3.exceptions.ReadTimeoutError``. (Issue #419) + + * Apply socket arguments before connecting. (Issue #427) + + + 1.8.3 (2014-06-23) + ++++++++++++++++++ + + * Fix TLS verification when using a proxy in Python 3.4.1. (Issue #385) + + * Add ``disable_cache`` option to ``urllib3.util.make_headers``. (Issue #393) + + * Wrap ``socket.timeout`` exception with + ``urllib3.exceptions.ReadTimeoutError``. (Issue #399) + + * Fixed proxy-related bug where connections were being reused incorrectly. + (Issues #366, #369) + + * Added ``socket_options`` keyword parameter which allows to define + ``setsockopt`` configuration of new sockets. (Issue #397) + + * Removed ``HTTPConnection.tcp_nodelay`` in favor of + ``HTTPConnection.default_socket_options``. (Issue #397) + + * Fixed ``TypeError`` bug in Python 2.6.4. (Issue #411) + + + 1.8.2 (2014-04-17) + ++++++++++++++++++ + + * Fix ``urllib3.util`` not being included in the package. + + + 1.8.1 (2014-04-17) + ++++++++++++++++++ + + * Fix AppEngine bug of HTTPS requests going out as HTTP. (Issue #356) + + * Don't install ``dummyserver`` into ``site-packages`` as it's only needed + for the test suite. (Issue #362) + + * Added support for specifying ``source_address``. (Issue #352) + + + 1.8 (2014-03-04) + ++++++++++++++++ + + * Improved url parsing in ``urllib3.util.parse_url`` (properly parse '@' in + username, and blank ports like 'hostname:'). + + * New ``urllib3.connection`` module which contains all the HTTPConnection + objects. + + * Several ``urllib3.util.Timeout``-related fixes. Also changed constructor + signature to a more sensible order. [Backwards incompatible] + (Issues #252, #262, #263) + + * Use ``backports.ssl_match_hostname`` if it's installed. (Issue #274) + + * Added ``.tell()`` method to ``urllib3.response.HTTPResponse`` which + returns the number of bytes read so far. (Issue #277) + + * Support for platforms without threading. (Issue #289) + + * Expand default-port comparison in ``HTTPConnectionPool.is_same_host`` + to allow a pool with no specified port to be considered equal to to an + HTTP/HTTPS url with port 80/443 explicitly provided. (Issue #305) + + * Improved default SSL/TLS settings to avoid vulnerabilities. + (Issue #309) + + * Fixed ``urllib3.poolmanager.ProxyManager`` not retrying on connect errors. + (Issue #310) + + * Disable Nagle's Algorithm on the socket for non-proxies. A subset of requests + will send the entire HTTP request ~200 milliseconds faster; however, some of + the resulting TCP packets will be smaller. (Issue #254) + + * Increased maximum number of SubjectAltNames in ``urllib3.contrib.pyopenssl`` + from the default 64 to 1024 in a single certificate. (Issue #318) + + * Headers are now passed and stored as a custom + ``urllib3.collections_.HTTPHeaderDict`` object rather than a plain ``dict``. + (Issue #329, #333) + + * Headers no longer lose their case on Python 3. (Issue #236) + + * ``urllib3.contrib.pyopenssl`` now uses the operating system's default CA + certificates on inject. (Issue #332) + + * Requests with ``retries=False`` will immediately raise any exceptions without + wrapping them in ``MaxRetryError``. (Issue #348) + + * Fixed open socket leak with SSL-related failures. (Issue #344, #348) + + + 1.7.1 (2013-09-25) + ++++++++++++++++++ + + * Added granular timeout support with new ``urllib3.util.Timeout`` class. + (Issue #231) + + * Fixed Python 3.4 support. (Issue #238) + + + 1.7 (2013-08-14) + ++++++++++++++++ + + * More exceptions are now pickle-able, with tests. (Issue #174) + + * Fixed redirecting with relative URLs in Location header. (Issue #178) + + * Support for relative urls in ``Location: ...`` header. (Issue #179) + + * ``urllib3.response.HTTPResponse`` now inherits from ``io.IOBase`` for bonus + file-like functionality. (Issue #187) + + * Passing ``assert_hostname=False`` when creating a HTTPSConnectionPool will + skip hostname verification for SSL connections. (Issue #194) + + * New method ``urllib3.response.HTTPResponse.stream(...)`` which acts as a + generator wrapped around ``.read(...)``. (Issue #198) + + * IPv6 url parsing enforces brackets around the hostname. (Issue #199) + + * Fixed thread race condition in + ``urllib3.poolmanager.PoolManager.connection_from_host(...)`` (Issue #204) + + * ``ProxyManager`` requests now include non-default port in ``Host: ...`` + header. (Issue #217) + + * Added HTTPS proxy support in ``ProxyManager``. (Issue #170 #139) + + * New ``RequestField`` object can be passed to the ``fields=...`` param which + can specify headers. (Issue #220) + + * Raise ``urllib3.exceptions.ProxyError`` when connecting to proxy fails. + (Issue #221) + + * Use international headers when posting file names. (Issue #119) + + * Improved IPv6 support. (Issue #203) + + + 1.6 (2013-04-25) + ++++++++++++++++ + + * Contrib: Optional SNI support for Py2 using PyOpenSSL. (Issue #156) + + * ``ProxyManager`` automatically adds ``Host: ...`` header if not given. + + * Improved SSL-related code. ``cert_req`` now optionally takes a string like + "REQUIRED" or "NONE". Same with ``ssl_version`` takes strings like "SSLv23" + The string values reflect the suffix of the respective constant variable. + (Issue #130) + + * Vendored ``socksipy`` now based on Anorov's fork which handles unexpectedly + closed proxy connections and larger read buffers. (Issue #135) + + * Ensure the connection is closed if no data is received, fixes connection leak + on some platforms. (Issue #133) + + * Added SNI support for SSL/TLS connections on Py32+. (Issue #89) + + * Tests fixed to be compatible with Py26 again. (Issue #125) + + * Added ability to choose SSL version by passing an ``ssl.PROTOCOL_*`` constant + to the ``ssl_version`` parameter of ``HTTPSConnectionPool``. (Issue #109) + + * Allow an explicit content type to be specified when encoding file fields. + (Issue #126) + + * Exceptions are now pickleable, with tests. (Issue #101) + + * Fixed default headers not getting passed in some cases. (Issue #99) + + * Treat "content-encoding" header value as case-insensitive, per RFC 2616 + Section 3.5. (Issue #110) + + * "Connection Refused" SocketErrors will get retried rather than raised. + (Issue #92) + + * Updated vendored ``six``, no longer overrides the global ``six`` module + namespace. (Issue #113) + + * ``urllib3.exceptions.MaxRetryError`` contains a ``reason`` property holding + the exception that prompted the final retry. If ``reason is None`` then it + was due to a redirect. (Issue #92, #114) + + * Fixed ``PoolManager.urlopen()`` from not redirecting more than once. + (Issue #149) + + * Don't assume ``Content-Type: text/plain`` for multi-part encoding parameters + that are not files. (Issue #111) + + * Pass `strict` param down to ``httplib.HTTPConnection``. (Issue #122) + + * Added mechanism to verify SSL certificates by fingerprint (md5, sha1) or + against an arbitrary hostname (when connecting by IP or for misconfigured + servers). (Issue #140) + + * Streaming decompression support. (Issue #159) + + + 1.5 (2012-08-02) + ++++++++++++++++ + + * Added ``urllib3.add_stderr_logger()`` for quickly enabling STDERR debug + logging in urllib3. + + * Native full URL parsing (including auth, path, query, fragment) available in + ``urllib3.util.parse_url(url)``. + + * Built-in redirect will switch method to 'GET' if status code is 303. + (Issue #11) + + * ``urllib3.PoolManager`` strips the scheme and host before sending the request + uri. (Issue #8) + + * New ``urllib3.exceptions.DecodeError`` exception for when automatic decoding, + based on the Content-Type header, fails. + + * Fixed bug with pool depletion and leaking connections (Issue #76). Added + explicit connection closing on pool eviction. Added + ``urllib3.PoolManager.clear()``. + + * 99% -> 100% unit test coverage. + + + 1.4 (2012-06-16) + ++++++++++++++++ + + * Minor AppEngine-related fixes. + + * Switched from ``mimetools.choose_boundary`` to ``uuid.uuid4()``. + + * Improved url parsing. (Issue #73) + + * IPv6 url support. (Issue #72) + + + 1.3 (2012-03-25) + ++++++++++++++++ + + * Removed pre-1.0 deprecated API. + + * Refactored helpers into a ``urllib3.util`` submodule. + + * Fixed multipart encoding to support list-of-tuples for keys with multiple + values. (Issue #48) + + * Fixed multiple Set-Cookie headers in response not getting merged properly in + Python 3. (Issue #53) + + * AppEngine support with Py27. (Issue #61) + + * Minor ``encode_multipart_formdata`` fixes related to Python 3 strings vs + bytes. + + + 1.2.2 (2012-02-06) + ++++++++++++++++++ + + * Fixed packaging bug of not shipping ``test-requirements.txt``. (Issue #47) + + + 1.2.1 (2012-02-05) + ++++++++++++++++++ + + * Fixed another bug related to when ``ssl`` module is not available. (Issue #41) + + * Location parsing errors now raise ``urllib3.exceptions.LocationParseError`` + which inherits from ``ValueError``. + + + 1.2 (2012-01-29) + ++++++++++++++++ + + * Added Python 3 support (tested on 3.2.2) + + * Dropped Python 2.5 support (tested on 2.6.7, 2.7.2) + + * Use ``select.poll`` instead of ``select.select`` for platforms that support + it. + + * Use ``Queue.LifoQueue`` instead of ``Queue.Queue`` for more aggressive + connection reusing. Configurable by overriding ``ConnectionPool.QueueCls``. + + * Fixed ``ImportError`` during install when ``ssl`` module is not available. + (Issue #41) + + * Fixed ``PoolManager`` redirects between schemes (such as HTTP -> HTTPS) not + completing properly. (Issue #28, uncovered by Issue #10 in v1.1) + + * Ported ``dummyserver`` to use ``tornado`` instead of ``webob`` + + ``eventlet``. Removed extraneous unsupported dummyserver testing backends. + Added socket-level tests. + + * More tests. Achievement Unlocked: 99% Coverage. + + + 1.1 (2012-01-07) + ++++++++++++++++ + + * Refactored ``dummyserver`` to its own root namespace module (used for + testing). + + * Added hostname verification for ``VerifiedHTTPSConnection`` by vendoring in + Py32's ``ssl_match_hostname``. (Issue #25) + + * Fixed cross-host HTTP redirects when using ``PoolManager``. (Issue #10) + + * Fixed ``decode_content`` being ignored when set through ``urlopen``. (Issue + #27) + + * Fixed timeout-related bugs. (Issues #17, #23) + + + 1.0.2 (2011-11-04) + ++++++++++++++++++ + + * Fixed typo in ``VerifiedHTTPSConnection`` which would only present as a bug if + you're using the object manually. (Thanks pyos) + + * Made RecentlyUsedContainer (and consequently PoolManager) more thread-safe by + wrapping the access log in a mutex. (Thanks @christer) + + * Made RecentlyUsedContainer more dict-like (corrected ``__delitem__`` and + ``__getitem__`` behaviour), with tests. Shouldn't affect core urllib3 code. + + + 1.0.1 (2011-10-10) + ++++++++++++++++++ + + * Fixed a bug where the same connection would get returned into the pool twice, + causing extraneous "HttpConnectionPool is full" log warnings. + + + 1.0 (2011-10-08) + ++++++++++++++++ + + * Added ``PoolManager`` with LRU expiration of connections (tested and + documented). + * Added ``ProxyManager`` (needs tests, docs, and confirmation that it works + with HTTPS proxies). + * Added optional partial-read support for responses when + ``preload_content=False``. You can now make requests and just read the headers + without loading the content. + * Made response decoding optional (default on, same as before). + * Added optional explicit boundary string for ``encode_multipart_formdata``. + * Convenience request methods are now inherited from ``RequestMethods``. Old + helpers like ``get_url`` and ``post_url`` should be abandoned in favour of + the new ``request(method, url, ...)``. + * Refactored code to be even more decoupled, reusable, and extendable. + * License header added to ``.py`` files. + * Embiggened the documentation: Lots of Sphinx-friendly docstrings in the code + and docs in ``docs/`` and on urllib3.readthedocs.org. + * Embettered all the things! + * Started writing this file. + + + 0.4.1 (2011-07-17) + ++++++++++++++++++ + + * Minor bug fixes, code cleanup. + + + 0.4 (2011-03-01) + ++++++++++++++++ + + * Better unicode support. + * Added ``VerifiedHTTPSConnection``. + * Added ``NTLMConnectionPool`` in contrib. + * Minor improvements. + + + 0.3.1 (2010-07-13) + ++++++++++++++++++ + + * Added ``assert_host_name`` optional parameter. Now compatible with proxies. + + + 0.3 (2009-12-10) + ++++++++++++++++ + + * Added HTTPS support. + * Minor bug fixes. + * Refactored, broken backwards compatibility with 0.2. + * API to be treated as stable from this version forward. + + + 0.2 (2008-11-17) + ++++++++++++++++ + + * Added unit tests. + * Bug fixes. + + + 0.1 (2008-11-16) + ++++++++++++++++ + + * First release. + +Keywords: urllib httplib threadsafe filepost http https ssl pooling +Platform: UNKNOWN +Classifier: Environment :: Web Environment +Classifier: Intended Audience :: Developers +Classifier: License :: OSI Approved :: MIT License +Classifier: Operating System :: OS Independent +Classifier: Programming Language :: Python +Classifier: Programming Language :: Python :: 2 +Classifier: Programming Language :: Python :: 3 +Classifier: Topic :: Internet :: WWW/HTTP +Classifier: Topic :: Software Development :: Libraries diff --git a/README.rst b/README.rst new file mode 100644 index 0000000..fc6bccf --- /dev/null +++ b/README.rst @@ -0,0 +1,145 @@ +======= +urllib3 +======= + +.. image:: https://travis-ci.org/shazow/urllib3.png?branch=master + :target: https://travis-ci.org/shazow/urllib3 + +.. image:: https://www.bountysource.com/badge/tracker?tracker_id=192525 + :target: https://www.bountysource.com/trackers/192525-urllib3?utm_source=192525&utm_medium=shield&utm_campaign=TRACKER_BADGE + + +Highlights +========== + +- Re-use the same socket connection for multiple requests + (``HTTPConnectionPool`` and ``HTTPSConnectionPool``) + (with optional client-side certificate verification). +- File posting (``encode_multipart_formdata``). +- Built-in redirection and retries (optional). +- Supports gzip and deflate decoding. +- Thread-safe and sanity-safe. +- Works with AppEngine, gevent, and eventlib. +- Tested on Python 2.6+, Python 3.2+, and PyPy, with 100% unit test coverage. +- Small and easy to understand codebase perfect for extending and building upon. + For a more comprehensive solution, have a look at + `Requests <http://python-requests.org/>`_ which is also powered by ``urllib3``. + + +You might already be using urllib3! +=================================== + +``urllib3`` powers `many great Python libraries +<https://sourcegraph.com/search?q=package+urllib3>`_, including ``pip`` and +``requests``. + + +What's wrong with urllib and urllib2? +===================================== + +There are two critical features missing from the Python standard library: +Connection re-using/pooling and file posting. It's not terribly hard to +implement these yourself, but it's much easier to use a module that already +did the work for you. + +The Python standard libraries ``urllib`` and ``urllib2`` have little to do +with each other. They were designed to be independent and standalone, each +solving a different scope of problems, and ``urllib3`` follows in a similar +vein. + + +Why do I want to reuse connections? +=================================== + +Performance. When you normally do a urllib call, a separate socket +connection is created with each request. By reusing existing sockets +(supported since HTTP 1.1), the requests will take up less resources on the +server's end, and also provide a faster response time at the client's end. +With some simple benchmarks (see `test/benchmark.py +<https://github.com/shazow/urllib3/blob/master/test/benchmark.py>`_ +), downloading 15 URLs from google.com is about twice as fast when using +HTTPConnectionPool (which uses 1 connection) than using plain urllib (which +uses 15 connections). + +This library is perfect for: + +- Talking to an API +- Crawling a website +- Any situation where being able to post files, handle redirection, and + retrying is useful. It's relatively lightweight, so it can be used for + anything! + + +Examples +======== + +Go to `urllib3.readthedocs.org <http://urllib3.readthedocs.org>`_ +for more nice syntax-highlighted examples. + +But, long story short:: + + import urllib3 + + http = urllib3.PoolManager() + + r = http.request('GET', 'http://google.com/') + + print r.status, r.data + +The ``PoolManager`` will take care of reusing connections for you whenever +you request the same host. For more fine-grained control of your connection +pools, you should look at `ConnectionPool +<http://urllib3.readthedocs.org/#connectionpool>`_. + + +Run the tests +============= + +We use some external dependencies, multiple interpreters and code coverage +analysis while running test suite. Our ``Makefile`` handles much of this for +you as long as you're running it `inside of a virtualenv +<http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_:: + + $ make test + [... magically installs dependencies and runs tests on your virtualenv] + Ran 182 tests in 1.633s + + OK (SKIP=6) + +Note that code coverage less than 100% is regarded as a failing run. Some +platform-specific tests are skipped unless run in that platform. To make sure +the code works in all of urllib3's supported platforms, you can run our ``tox`` +suite:: + + $ make test-all + [... tox creates a virtualenv for every platform and runs tests inside of each] + py26: commands succeeded + py27: commands succeeded + py32: commands succeeded + py33: commands succeeded + py34: commands succeeded + +Our test suite `runs continuously on Travis CI +<https://travis-ci.org/shazow/urllib3>`_ with every pull request. + + +Contributing +============ + +#. `Check for open issues <https://github.com/shazow/urllib3/issues>`_ or open + a fresh issue to start a discussion around a feature idea or a bug. There is + a *Contributor Friendly* tag for issues that should be ideal for people who + are not very familiar with the codebase yet. +#. Fork the `urllib3 repository on Github <https://github.com/shazow/urllib3>`_ + to start making your changes. +#. Write a test which shows that the bug was fixed or that the feature works + as expected. +#. Send a pull request and bug the maintainer until it gets merged and published. + :) Make sure to add yourself to ``CONTRIBUTORS.txt``. + + +Sponsorship +=========== + +If your company benefits from this library, please consider `sponsoring its +development <http://urllib3.readthedocs.org/en/latest/#sponsorship>`_. diff --git a/dev-requirements.txt b/dev-requirements.txt new file mode 100644 index 0000000..8010704 --- /dev/null +++ b/dev-requirements.txt @@ -0,0 +1,7 @@ +nose==1.3.3 +mock==1.0.1 +coverage==3.7.1 +tox==1.7.1 + +# Tornado 3.2.2 makes our tests flaky, so we stick with 3.1 +tornado==3.1.1 diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 0000000..135c543 --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,130 @@ +# Makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +PAPER = +BUILDDIR = _build + +# Internal variables. +PAPEROPT_a4 = -D latex_paper_size=a4 +PAPEROPT_letter = -D latex_paper_size=letter +ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . + +.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest + +help: + @echo "Please use \`make <target>' where <target> is one of" + @echo " html to make standalone HTML files" + @echo " dirhtml to make HTML files named index.html in directories" + @echo " singlehtml to make a single large HTML file" + @echo " pickle to make pickle files" + @echo " json to make JSON files" + @echo " htmlhelp to make HTML files and a HTML help project" + @echo " qthelp to make HTML files and a qthelp project" + @echo " devhelp to make HTML files and a Devhelp project" + @echo " epub to make an epub" + @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" + @echo " latexpdf to make LaTeX files and run them through pdflatex" + @echo " text to make text files" + @echo " man to make manual pages" + @echo " changes to make an overview of all changed/added/deprecated items" + @echo " linkcheck to check all external links for integrity" + @echo " doctest to run all doctests embedded in the documentation (if enabled)" + +clean: + -rm -rf $(BUILDDIR)/* + +html: + $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." + +dirhtml: + $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." + +singlehtml: + $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml + @echo + @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." + +pickle: + $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle + @echo + @echo "Build finished; now you can process the pickle files." + +json: + $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json + @echo + @echo "Build finished; now you can process the JSON files." + +htmlhelp: + $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp + @echo + @echo "Build finished; now you can run HTML Help Workshop with the" \ + ".hhp project file in $(BUILDDIR)/htmlhelp." + +qthelp: + $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp + @echo + @echo "Build finished; now you can run "qcollectiongenerator" with the" \ + ".qhcp project file in $(BUILDDIR)/qthelp, like this:" + @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/urllib3.qhcp" + @echo "To view the help file:" + @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/urllib3.qhc" + +devhelp: + $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp + @echo + @echo "Build finished." + @echo "To view the help file:" + @echo "# mkdir -p $$HOME/.local/share/devhelp/urllib3" + @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/urllib3" + @echo "# devhelp" + +epub: + $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub + @echo + @echo "Build finished. The epub file is in $(BUILDDIR)/epub." + +latex: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo + @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." + @echo "Run \`make' in that directory to run these through (pdf)latex" \ + "(use \`make latexpdf' here to do that automatically)." + +latexpdf: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo "Running LaTeX files through pdflatex..." + $(MAKE) -C $(BUILDDIR)/latex all-pdf + @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." + +text: + $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text + @echo + @echo "Build finished. The text files are in $(BUILDDIR)/text." + +man: + $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man + @echo + @echo "Build finished. The manual pages are in $(BUILDDIR)/man." + +changes: + $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes + @echo + @echo "The overview file is in $(BUILDDIR)/changes." + +linkcheck: + $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck + @echo + @echo "Link check complete; look for any errors in the above output " \ + "or in $(BUILDDIR)/linkcheck/output.txt." + +doctest: + $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest + @echo "Testing of doctests in the sources finished, look at the " \ + "results in $(BUILDDIR)/doctest/output.txt." diff --git a/docs/README b/docs/README new file mode 100644 index 0000000..9126c73 --- /dev/null +++ b/docs/README @@ -0,0 +1,14 @@ +# Building the Docs + +First install Sphinx: + + pip install sphinx + +Install pyopenssl and certifi dependencies, to avoid some build errors. (Optional) + + # This step is optional + pip install ndg-httpsclient pyasn1 certifi + +Then build: + + cd docs && make html diff --git a/docs/collections.rst b/docs/collections.rst new file mode 100644 index 0000000..b348140 --- /dev/null +++ b/docs/collections.rst @@ -0,0 +1,13 @@ +Collections +=========== + +These datastructures are used to implement the behaviour of various urllib3 +components in a decoupled and application-agnostic design. + +.. automodule:: urllib3._collections + + .. autoclass:: RecentlyUsedContainer + :members: + + .. autoclass:: HTTPHeaderDict + :members: diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 0000000..7ac8393 --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,232 @@ +# -*- coding: utf-8 -*- +# +# urllib3 documentation build configuration file, created by +# sphinx-quickstart on Wed Oct 5 13:15:40 2011. +# +# This file is execfile()d with the current directory set to its containing dir. +# +# Note that not all possible configuration values are present in this +# autogenerated file. +# +# All configuration values have a default; values that are commented out +# serve to show the default. + +from datetime import date +import os +import sys + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. + +root_path = os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) +sys.path.insert(0, root_path) + +import urllib3 + + +# -- General configuration ----------------------------------------------------- + +# If your documentation needs a minimal Sphinx version, state it here. +#needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. They can be extensions +# coming with Sphinx (named 'sphinx.ext.*') or your custom ones. +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.doctest', + 'sphinx.ext.intersphinx', +] + +# Test code blocks only when explicitly specified +doctest_test_doctest_blocks = '' + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# The suffix of source filenames. +source_suffix = '.rst' + +# The encoding of source files. +#source_encoding = 'utf-8-sig' + +# The master toctree document. +master_doc = 'index' + +# General information about the project. +project = u'urllib3' +copyright = u'{year}, Andrey Petrov'.format(year=date.today().year) + +# The version info for the project you're documenting, acts as replacement for +# |version| and |release|, also used in various other places throughout the +# built documents. +# +# The short X.Y version. +version = urllib3.__version__ +# The full version, including alpha/beta/rc tags. +release = version + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +#language = None + +# There are two options for replacing |today|: either, you set today to some +# non-false value, then it is used: +#today = '' +# Else, today_fmt is used as the format for a strftime call. +#today_fmt = '%B %d, %Y' + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +exclude_patterns = ['_build'] + +# The reST default role (used for this markup: `text`) to use for all documents. +#default_role = None + +# If true, '()' will be appended to :func: etc. cross-reference text. +#add_function_parentheses = True + +# If true, the current module name will be prepended to all description +# unit titles (such as .. function::). +#add_module_names = True + +# If true, sectionauthor and moduleauthor directives will be shown in the +# output. They are ignored by default. +#show_authors = False + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = 'sphinx' + +# A list of ignored prefixes for module index sorting. +#modindex_common_prefix = [] + + +# -- Options for HTML output --------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +html_theme = 'nature' + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +#html_theme_options = {} + +# Add any paths that contain custom themes here, relative to this directory. +#html_theme_path = [] + +# The name for this set of Sphinx documents. If None, it defaults to +# "<project> v<release> documentation". +#html_title = None + +# A shorter title for the navigation bar. Default is the same as html_title. +#html_short_title = None + +# The name of an image file (relative to this directory) to place at the top +# of the sidebar. +#html_logo = None + +# The name of an image file (within the static path) to use as favicon of the +# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 +# pixels large. +#html_favicon = None + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +#html_static_path = ['_static'] + +# If not '', a 'Last updated on:' timestamp is inserted at every page bottom, +# using the given strftime format. +#html_last_updated_fmt = '%b %d, %Y' + +# If true, SmartyPants will be used to convert quotes and dashes to +# typographically correct entities. +#html_use_smartypants = True + +# Custom sidebar templates, maps document names to template names. +#html_sidebars = {} + +# Additional templates that should be rendered to pages, maps page names to +# template names. +#html_additional_pages = {} + +# If false, no module index is generated. +#html_domain_indices = True + +# If false, no index is generated. +#html_use_index = True + +# If true, the index is split into individual pages for each letter. +#html_split_index = False + +# If true, links to the reST sources are added to the pages. +#html_show_sourcelink = True + +# If true, "Created using Sphinx" is shown in the HTML footer. Default is True. +#html_show_sphinx = True + +# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. +#html_show_copyright = True + +# If true, an OpenSearch description file will be output, and all pages will +# contain a <link> tag referring to it. The value of this option must be the +# base URL from which the finished HTML is served. +#html_use_opensearch = '' + +# This is the file name suffix for HTML files (e.g. ".xhtml"). +#html_file_suffix = None + +# Output file base name for HTML help builder. +htmlhelp_basename = 'urllib3doc' + + +# -- Options for LaTeX output -------------------------------------------------- + +# The paper size ('letter' or 'a4'). +#latex_paper_size = 'letter' + +# The font size ('10pt', '11pt' or '12pt'). +#latex_font_size = '10pt' + +# Grouping the document tree into LaTeX files. List of tuples +# (source start file, target name, title, author, documentclass [howto/manual]). +latex_documents = [ + ('index', 'urllib3.tex', u'urllib3 Documentation', + u'Andrey Petrov', 'manual'), +] + +# The name of an image file (relative to this directory) to place at the top of +# the title page. +#latex_logo = None + +# For "manual" documents, if this is true, then toplevel headings are parts, +# not chapters. +#latex_use_parts = False + +# If true, show page references after internal links. +#latex_show_pagerefs = False + +# If true, show URL addresses after external links. +#latex_show_urls = False + +# Additional stuff for the LaTeX preamble. +#latex_preamble = '' + +# Documents to append as an appendix to all manuals. +#latex_appendices = [] + +# If false, no module index is generated. +#latex_domain_indices = True + + +# -- Options for manual page output -------------------------------------------- + +# One entry per manual page. List of tuples +# (source start file, name, description, authors, manual section). +man_pages = [ + ('index', 'urllib3', u'urllib3 Documentation', + [u'Andrey Petrov'], 1) +] + +intersphinx_mapping = {'python': ('http://docs.python.org/2.7', None)} diff --git a/docs/contrib.rst b/docs/contrib.rst new file mode 100644 index 0000000..99c5492 --- /dev/null +++ b/docs/contrib.rst @@ -0,0 +1,14 @@ +.. _contrib-modules: + +Contrib Modules +=============== + +These modules implement various extra features, that may not be ready for +prime time. + +.. _pyopenssl: + +SNI-support for Python 2 +------------------------ + +.. automodule:: urllib3.contrib.pyopenssl diff --git a/docs/doc-requirements.txt b/docs/doc-requirements.txt new file mode 100644 index 0000000..b7b6d66 --- /dev/null +++ b/docs/doc-requirements.txt @@ -0,0 +1,12 @@ +ndg-httpsclient==0.3.2 +pyasn1==0.1.7 +Sphinx==1.2.2 +Jinja2==2.7.3 +MarkupSafe==0.23 +Pygments==1.6 +cryptography==0.4 +six==1.7.2 +cffi==0.8.2 +docutils==0.11 +pycparser==2.10 +certifi==14.05.14
\ No newline at end of file diff --git a/docs/exceptions.rst b/docs/exceptions.rst new file mode 100644 index 0000000..f9e0553 --- /dev/null +++ b/docs/exceptions.rst @@ -0,0 +1,7 @@ +Exceptions +========== + +Custom exceptions defined by urllib3 + +.. automodule:: urllib3.exceptions + :members: diff --git a/docs/helpers.rst b/docs/helpers.rst new file mode 100644 index 0000000..79f268b --- /dev/null +++ b/docs/helpers.rst @@ -0,0 +1,55 @@ +Helpers +======= + +Useful methods for working with :mod:`httplib`, completely decoupled from +code specific to **urllib3**. + + +Timeouts +-------- + +.. automodule:: urllib3.util.timeout + :members: + +Retries +------- + +.. automodule:: urllib3.util.retry + :members: + +URL Helpers +----------- + +.. automodule:: urllib3.util.url + :members: + +Filepost +-------- + +.. automodule:: urllib3.filepost + :members: + +.. automodule:: urllib3.fields + :members: + +Request +------- + +.. automodule:: urllib3.request + :members: + +.. automodule:: urllib3.util.request + :members: + +Response +-------- + +.. automodule:: urllib3.response + :members: + :undoc-members: + +SSL/TLS Helpers +--------------- + +.. automodule:: urllib3.util.ssl_ + :members: diff --git a/docs/index.rst b/docs/index.rst new file mode 100644 index 0000000..1fc8a9c --- /dev/null +++ b/docs/index.rst @@ -0,0 +1,340 @@ +===================== +urllib3 Documentation +===================== + +.. toctree:: + :hidden: + + pools + managers + security + helpers + collections + contrib + + +Highlights +========== + +- Re-use the same socket connection for multiple requests, with optional + client-side certificate verification. See: + :class:`~urllib3.connectionpool.HTTPConnectionPool` and + :class:`~urllib3.connectionpool.HTTPSConnectionPool` + +- File posting. See: + :func:`~urllib3.filepost.encode_multipart_formdata` + +- Built-in redirection and retries (optional). + +- Supports gzip and deflate decoding. See: + :func:`~urllib3.response.decode_gzip` and + :func:`~urllib3.response.decode_deflate` + +- Thread-safe and sanity-safe. + +- Tested on Python 2.6+ and Python 3.2+, 100% unit test coverage. + +- Works with AppEngine, gevent, eventlib, and the standard library :mod:`io` module. + +- Small and easy to understand codebase perfect for extending and building upon. + For a more comprehensive solution, have a look at + `Requests <http://python-requests.org/>`_ which is also powered by urllib3. + + +Getting Started +=============== + +Installing +---------- + +``pip install urllib3`` or fetch the latest source from +`github.com/shazow/urllib3 <https://github.com/shazow/urllib3>`_. + +Usage +----- + +.. doctest :: + + >>> import urllib3 + >>> http = urllib3.PoolManager() + >>> r = http.request('GET', 'http://example.com/') + >>> r.status + 200 + >>> r.headers['server'] + 'ECS (iad/182A)' + >>> 'data: ' + r.data + 'data: ...' + + +**By default, urllib3 does not verify your HTTPS requests**. +You'll need to supply a root certificate bundle, or use `certifi +<https://certifi.io/>`_ + +.. doctest :: + + >>> import urllib3, certifi + >>> http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where()) + >>> r = http.request('GET', 'https://insecure.com/') + Traceback (most recent call last): + ... + SSLError: hostname 'insecure.com' doesn't match 'svn.nmap.org' + +For more on making secure SSL/TLS HTTPS requests, read the :ref:`Security +section <security>`. + + +urllib3's responses respect the :mod:`io` framework from Python's +standard library, allowing use of these standard objects for purposes +like buffering: + +.. doctest :: + + >>> http = urllib3.PoolManager() + >>> r = http.urlopen('GET','http://example.com/', preload_content=False) + >>> b = io.BufferedReader(r, 2048) + >>> firstpart = b.read(100) + >>> # ... your internet connection fails momentarily ... + >>> secondpart = b.read() + + +Components +========== + +:mod:`urllib3` tries to strike a fine balance between power, extendability, and +sanity. To achieve this, the codebase is a collection of small reusable +utilities and abstractions composed together in a few helpful layers. + + +PoolManager +----------- + +The highest level is the :doc:`PoolManager(...) <managers>`. + +The :class:`~urllib3.poolmanagers.PoolManager` will take care of reusing +connections for you whenever you request the same host. This should cover most +scenarios without significant loss of efficiency, but you can always drop down +to a lower level component for more granular control. + +.. doctest :: + + >>> import urllib3 + >>> http = urllib3.PoolManager(10) + >>> r1 = http.request('GET', 'http://example.com/') + >>> r2 = http.request('GET', 'http://httpbin.org/') + >>> r3 = http.request('GET', 'http://httpbin.org/get') + >>> len(http.pools) + 2 + +A :class:`~urllib3.poolmanagers.PoolManager` is a proxy for a collection of +:class:`ConnectionPool` objects. They both inherit from +:class:`~urllib3.request.RequestMethods` to make sure that their API is +similar, so that instances of either can be passed around interchangeably. + + +ProxyManager +------------ + +The :class:`~urllib3.poolmanagers.ProxyManager` is an HTTP proxy-aware +subclass of :class:`~urllib3.poolmanagers.PoolManager`. It produces a single +:class:`~urllib3.connectionpool.HTTPConnectionPool` instance for all HTTP +connections and individual per-server:port +:class:`~urllib3.connectionpool.HTTPSConnectionPool` instances for tunnelled +HTTPS connections: + +:: + + >>> proxy = urllib3.ProxyManager('http://localhost:3128/') + >>> r1 = proxy.request('GET', 'http://google.com/') + >>> r2 = proxy.request('GET', 'http://httpbin.org/') + >>> len(proxy.pools) + 1 + >>> r3 = proxy.request('GET', 'https://httpbin.org/') + >>> r4 = proxy.request('GET', 'https://twitter.com/') + >>> len(proxy.pools) + 3 + + +ConnectionPool +-------------- + +The next layer is the :doc:`ConnectionPool(...) <pools>`. + +The :class:`~urllib3.connectionpool.HTTPConnectionPool` and +:class:`~urllib3.connectionpool.HTTPSConnectionPool` classes allow you to +define a pool of connections to a single host and make requests against this +pool with automatic **connection reusing** and **thread safety**. + +When the :mod:`ssl` module is available, then +:class:`~urllib3.connectionpool.HTTPSConnectionPool` objects can be configured +to check SSL certificates against specific provided certificate authorities. + +.. doctest :: + + >>> import urllib3 + >>> conn = urllib3.connection_from_url('http://httpbin.org/') + >>> r1 = conn.request('GET', 'http://httpbin.org/') + >>> r2 = conn.request('GET', '/user-agent') + >>> r3 = conn.request('GET', 'http://example.com') + Traceback (most recent call last): + ... + urllib3.exceptions.HostChangedError: HTTPConnectionPool(host='httpbin.org', port=None): Tried to open a foreign host with url: http://example.com + +Again, a ConnectionPool is a pool of connections to a specific host. Trying to +access a different host through the same pool will raise a ``HostChangedError`` +exception unless you specify ``assert_same_host=False``. Do this at your own +risk as the outcome is completely dependent on the behaviour of the host server. + +If you need to access multiple hosts and don't want to manage your own +collection of :class:`~urllib3.connectionpool.ConnectionPool` objects, then you +should use a :class:`~urllib3.poolmanager.PoolManager`. + +A :class:`~urllib3.connectionpool.ConnectionPool` is composed of a collection +of :class:`httplib.HTTPConnection` objects. + + +Timeout +------- + +A timeout can be set to abort socket operations on individual connections after +the specified duration. The timeout can be defined as a float or an instance of +:class:`~urllib3.util.timeout.Timeout` which gives more granular configuration +over how much time is allowed for different stages of the request. This can be +set for the entire pool or per-request. + +.. doctest :: + + >>> from urllib3 import PoolManager, Timeout + + >>> # Manager with 3 seconds combined timeout. + >>> http = PoolManager(timeout=3.0) + >>> r = http.request('GET', 'http://httpbin.org/delay/1') + + >>> # Manager with 2 second timeout for the read phase, no limit for the rest. + >>> http = PoolManager(timeout=Timeout(read=2.0)) + >>> r = http.request('GET', 'http://httpbin.org/delay/1') + + >>> # Manager with no timeout but a request with a timeout of 1 seconds for + >>> # the connect phase and 2 seconds for the read phase. + >>> http = PoolManager() + >>> r = http.request('GET', 'http://httpbin.org/delay/1', timeout=Timeout(connect=1.0, read=2.0)) + + >>> # Same Manager but request with a 5 second total timeout. + >>> r = http.request('GET', 'http://httpbin.org/delay/1', timeout=Timeout(total=5.0)) + +See the :class:`~urllib3.util.timeout.Timeout` definition for more details. + + +Retry +----- + +Retries can be configured by passing an instance of +:class:`~urllib3.util.retry.Retry`, or disabled by passing ``False``, to the +``retries`` parameter. + +Redirects are also considered to be a subset of retries but can be configured or +disabled individually. + +:: + + >>> from urllib3 import PoolManager, Retry + + >>> # Allow 3 retries total for all requests in this pool. These are the same: + >>> http = PoolManager(retries=3) + >>> http = PoolManager(retries=Retry(3)) + >>> http = PoolManager(retries=Retry(total=3)) + + >>> r = http.request('GET', 'http://httpbin.org/redirect/2') + >>> # r.status -> 200 + + >>> # Disable redirects for this request. + >>> r = http.request('GET', 'http://httpbin.org/redirect/2', retries=Retry(3, redirect=False)) + >>> # r.status -> 302 + + >>> # No total limit, but only do 5 connect retries, for this request. + >>> r = http.request('GET', 'http://httpbin.org/', retries=Retry(connect=5)) + + +See the :class:`~urllib3.util.retry.Retry` definition for more details. + + +Foundation +---------- + +At the very core, just like its predecessors, :mod:`urllib3` is built on top of +:mod:`httplib` -- the lowest level HTTP library included in the Python +standard library. + +To aid the limited functionality of the :mod:`httplib` module, :mod:`urllib3` +provides various helper methods which are used with the higher level components +but can also be used independently. + +.. toctree:: + + helpers + exceptions + + +Contrib Modules +--------------- + +These modules implement various extra features, that may not be ready for +prime time. + +.. toctree:: + + contrib + + +Contributing +============ + +#. `Check for open issues <https://github.com/shazow/urllib3/issues>`_ or open + a fresh issue to start a discussion around a feature idea or a bug. There is + a *Contributor Friendly* tag for issues that should be ideal for people who + are not very familiar with the codebase yet. +#. Fork the `urllib3 repository on Github <https://github.com/shazow/urllib3>`_ + to start making your changes. +#. Write a test which shows that the bug was fixed or that the feature works + as expected. +#. Send a pull request and bug the maintainer until it gets merged and published. + :) Make sure to add yourself to ``CONTRIBUTORS.txt``. + + +Sponsorship +=========== + +Please consider sponsoring urllib3 development, especially if your company +benefits from this library. + +* **Project Grant**: A grant for contiguous full-time development has the + biggest impact for progress. Periods of 3 to 10 days allow a contributor to + tackle substantial complex issues which are otherwise left to linger until + somebody can't afford to not fix them. + + Contact `@shazow <https://github.com/shazow>`_ to arrange a grant for a core + contributor. + +* **One-off**: Development will continue regardless of funding, but donations help move + things further along quicker as the maintainer can allocate more time off to + work on urllib3 specifically. + + .. raw:: html + + <a href="https://donorbox.org/personal-sponsor-urllib3" style="background-color:#1275ff;color:#fff;text-decoration:none;font-family:Verdana,sans-serif;display:inline-block;font-size:14px;padding:7px 16px;border-radius:5px;margin-right:2em;vertical-align:top;border:1px solid rgba(160,160,160,0.5);background-image:linear-gradient(#7dc5ee,#008cdd 85%,#30a2e4);box-shadow:inset 0 1px 0 rgba(255,255,255,0.25);">Sponsor with Credit Card</a> + + <a class="coinbase-button" data-code="137087702cf2e77ce400d53867b164e6" href="https://coinbase.com/checkouts/137087702cf2e77ce400d53867b164e6">Sponsor with Bitcoin</a><script src="https://coinbase.com/assets/button.js" type="text/javascript"></script> + +* **Recurring**: You're welcome to `support the maintainer on Gittip + <https://www.gittip.com/shazow/>`_. + + +Recent Sponsors +--------------- + +Huge thanks to all the companies and individuals who financially contributed to +the development of urllib3. Please send a PR if you've donated and would like +to be listed. + +* `Stripe <https://stripe.com/>`_ (June 23, 2014) + +.. * [Company] ([optional tagline]), [optional description of grant] ([date]) diff --git a/docs/make.bat b/docs/make.bat new file mode 100644 index 0000000..41aa35b --- /dev/null +++ b/docs/make.bat @@ -0,0 +1,170 @@ +@ECHO OFF + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set BUILDDIR=_build +set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . +if NOT "%PAPER%" == "" ( + set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% +) + +if "%1" == "" goto help + +if "%1" == "help" ( + :help + echo.Please use `make ^<target^>` where ^<target^> is one of + echo. html to make standalone HTML files + echo. dirhtml to make HTML files named index.html in directories + echo. singlehtml to make a single large HTML file + echo. pickle to make pickle files + echo. json to make JSON files + echo. htmlhelp to make HTML files and a HTML help project + echo. qthelp to make HTML files and a qthelp project + echo. devhelp to make HTML files and a Devhelp project + echo. epub to make an epub + echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter + echo. text to make text files + echo. man to make manual pages + echo. changes to make an overview over all changed/added/deprecated items + echo. linkcheck to check all external links for integrity + echo. doctest to run all doctests embedded in the documentation if enabled + goto end +) + +if "%1" == "clean" ( + for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i + del /q /s %BUILDDIR%\* + goto end +) + +if "%1" == "html" ( + %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The HTML pages are in %BUILDDIR%/html. + goto end +) + +if "%1" == "dirhtml" ( + %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. + goto end +) + +if "%1" == "singlehtml" ( + %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. + goto end +) + +if "%1" == "pickle" ( + %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can process the pickle files. + goto end +) + +if "%1" == "json" ( + %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can process the JSON files. + goto end +) + +if "%1" == "htmlhelp" ( + %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can run HTML Help Workshop with the ^ +.hhp project file in %BUILDDIR%/htmlhelp. + goto end +) + +if "%1" == "qthelp" ( + %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; now you can run "qcollectiongenerator" with the ^ +.qhcp project file in %BUILDDIR%/qthelp, like this: + echo.^> qcollectiongenerator %BUILDDIR%\qthelp\urllib3.qhcp + echo.To view the help file: + echo.^> assistant -collectionFile %BUILDDIR%\qthelp\urllib3.ghc + goto end +) + +if "%1" == "devhelp" ( + %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. + goto end +) + +if "%1" == "epub" ( + %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The epub file is in %BUILDDIR%/epub. + goto end +) + +if "%1" == "latex" ( + %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex + if errorlevel 1 exit /b 1 + echo. + echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. + goto end +) + +if "%1" == "text" ( + %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The text files are in %BUILDDIR%/text. + goto end +) + +if "%1" == "man" ( + %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man + if errorlevel 1 exit /b 1 + echo. + echo.Build finished. The manual pages are in %BUILDDIR%/man. + goto end +) + +if "%1" == "changes" ( + %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes + if errorlevel 1 exit /b 1 + echo. + echo.The overview file is in %BUILDDIR%/changes. + goto end +) + +if "%1" == "linkcheck" ( + %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck + if errorlevel 1 exit /b 1 + echo. + echo.Link check complete; look for any errors in the above output ^ +or in %BUILDDIR%/linkcheck/output.txt. + goto end +) + +if "%1" == "doctest" ( + %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest + if errorlevel 1 exit /b 1 + echo. + echo.Testing of doctests in the sources finished, look at the ^ +results in %BUILDDIR%/doctest/output.txt. + goto end +) + +:end diff --git a/docs/managers.rst b/docs/managers.rst new file mode 100644 index 0000000..f9cab03 --- /dev/null +++ b/docs/managers.rst @@ -0,0 +1,62 @@ +PoolManager +=========== + +.. automodule:: urllib3.poolmanager + +A pool manager is an abstraction for a collection of +:doc:`ConnectionPools <pools>`. + +If you need to make requests to multiple hosts, then you can use a +:class:`.PoolManager`, which takes care of maintaining your pools +so you don't have to. + +.. doctest :: + + >>> from urllib3 import PoolManager + >>> manager = PoolManager(10) + >>> r = manager.request('GET', 'http://example.com') + >>> r.headers['server'] + 'ECS (iad/182A)' + >>> r = manager.request('GET', 'http://httpbin.org/') + >>> r.headers['server'] + 'gunicorn/18.0' + >>> r = manager.request('POST', 'http://httpbin.org/headers') + >>> r = manager.request('HEAD', 'http://httpbin.org/cookies') + >>> len(manager.pools) + 2 + >>> conn = manager.connection_from_host('httpbin.org') + >>> conn.num_requests + 3 + +The API of a :class:`.PoolManager` object is similar to that of a +:doc:`ConnectionPool <pools>`, so they can be passed around interchangeably. + +The PoolManager uses a Least Recently Used (LRU) policy for discarding old +pools. That is, if you set the PoolManager ``num_pools`` to 10, then after +making requests to 11 or more different hosts, the least recently used pools +will be cleaned up eventually. + +Cleanup of stale pools does not happen immediately. You can read more about the +implementation and the various adjustable variables within +:class:`~urllib3._collections.RecentlyUsedContainer`. + +API +--- + + .. autoclass:: PoolManager + :inherited-members: + +ProxyManager +============ + +:class:`.ProxyManager` is an HTTP proxy-aware subclass of :class:`.PoolManager`. +It produces a single +:class:`~urllib3.connectionpool.HTTPConnectionPool` instance for all HTTP +connections and individual per-server:port +:class:`~urllib3.connectionpool.HTTPSConnectionPool` instances for tunnelled +HTTPS connections. + +API +--- + .. autoclass:: ProxyManager + diff --git a/docs/pools.rst b/docs/pools.rst new file mode 100644 index 0000000..63cb7d1 --- /dev/null +++ b/docs/pools.rst @@ -0,0 +1,71 @@ +ConnectionPools +=============== + +.. automodule:: urllib3.connectionpool + +A connection pool is a container for a collection of connections to a specific +host. + +If you need to make requests to the same host repeatedly, then you should use a +:class:`.HTTPConnectionPool`. + +.. doctest :: + + >>> from urllib3 import HTTPConnectionPool + >>> pool = HTTPConnectionPool('ajax.googleapis.com', maxsize=1) + >>> r = pool.request('GET', '/ajax/services/search/web', + ... fields={'q': 'urllib3', 'v': '1.0'}) + >>> r.status + 200 + >>> r.headers['content-type'] + 'text/javascript; charset=utf-8' + >>> 'data: ' + r.data # Content of the response + 'data: ...' + >>> r = pool.request('GET', '/ajax/services/search/web', + ... fields={'q': 'python', 'v': '1.0'}) + >>> 'data: ' + r.data # Content of the response + 'data: ...' + >>> pool.num_connections + 1 + >>> pool.num_requests + 2 + +By default, the pool will cache just one connection. If you're planning on using +such a pool in a multithreaded environment, you should set the ``maxsize`` of +the pool to a higher number, such as the number of threads. You can also control +many other variables like timeout, blocking, and default headers. + +Helpers +------- + +There are various helper functions provided for instantiating these +ConnectionPools more easily: + + .. autofunction:: connection_from_url + +API +--- + +:mod:`urllib3.connectionpool` comes with two connection pools: + + .. autoclass:: HTTPConnectionPool + :members: + :inherited-members: + + .. autoclass:: HTTPSConnectionPool + + +All of these pools inherit from a common base class: + + .. autoclass:: ConnectionPool + +.. module:: urllib3.connection + +Related Classes +--------------- + +urllib3 implements its own :class:`HTTPConnection` object to allow for more +flexibility than the standard library's implementation. + +.. autoclass:: HTTPConnection + :members: diff --git a/docs/security.rst b/docs/security.rst new file mode 100644 index 0000000..5321e24 --- /dev/null +++ b/docs/security.rst @@ -0,0 +1,161 @@ +.. _security: + +Security: Verified HTTPS with SSL/TLS +===================================== + +Very important fact: **By default, urllib3 does not verify HTTPS requests.** + +The historic reason for this is that we rely on ``httplib`` for some of the +HTTP protocol implementation, and ``httplib`` does not verify requests out of +the box. This is not a good reason, but here we are. + +Luckily, it's not too hard to enable verified HTTPS requests and there are a +few ways to do it. + + +Python with SSL enabled +----------------------- + +First we need to make sure your Python installation has SSL enabled. Easiest +way to check is to simply open a Python shell and type `import ssl`:: + + >>> import ssl + Traceback (most recent call last): + ... + ImportError: No module named _ssl + +If you got an ``ImportError``, then your Python is not compiled with SSL support +and you'll need to re-install it. Read +`this StackOverflow thread <https://stackoverflow.com/questions/5128845/importerror-no-module-named-ssl>`_ +for details. + +Otherwise, if ``ssl`` imported cleanly, then we're ready to setup our certificates: +:ref:`certifi-with-urllib3`. + + +Enabling SSL on Google AppEngine +++++++++++++++++++++++++++++++++ + +If you're using Google App Engine, you'll need to add ``ssl`` as a library +dependency to your yaml file, like this:: + + libraries: + - name: ssl + version: latest + +If it's still not working, you may need to enable billing on your account +to `enable using sockets +<https://developers.google.com/appengine/docs/python/sockets/>`_. + + +.. _certifi-with-urllib3: + +Using Certifi with urllib3 +-------------------------- + +`Certifi <http://certifi.io/>`_ is a package which ships with Mozilla's root +certificates for easy programmatic access. + +1. Install the Python ``certifi`` package:: + + $ pip install certifi + +2. Setup your pool to require a certificate and provide the certifi bundle:: + + import urllib3 + import certifi + + http = urllib3.PoolManager( + cert_reqs='CERT_REQUIRED', # Force certificate check. + ca_certs=certifi.where(), # Path to the Certifi bundle. + ) + + # You're ready to make verified HTTPS requests. + try: + r = http.request('GET', 'https://example.com/') + except urllib3.exceptions.SSLError as e: + # Handle incorrect certificate error. + ... + +Make sure to update your ``certifi`` package regularly to get the latest root +certificates. + + +Using your system's root certificates +------------------------------------- + +Your system's root certificates may be more up-to-date than maintaining your +own, but the trick is finding where they live. Different operating systems have +them in different places. + +For example, on most Linux distributions they're at +``/etc/ssl/certs/ca-certificates.crt``. On Windows and OS X? `It's not so simple +<https://stackoverflow.com/questions/10095676/openssl-reasonable-default-for-trusted-ca-certificates>`_. + +Once you find your root certificate file:: + + import urllib3 + + ca_certs = "/etc/ssl/certs/ca-certificates.crt" # Or wherever it lives. + + http = urllib3.PoolManager( + cert_reqs='CERT_REQUIRED', # Force certificate check. + ca_certs=ca_certs, # Path to your certificate bundle. + ) + + # You're ready to make verified HTTPS requests. + try: + r = http.request('GET', 'https://example.com/') + except urllib3.exceptions.SSLError as e: + # Handle incorrect certificate error. + ... + + +OpenSSL / PyOpenSSL +------------------- + +By default, we use the standard library's ``ssl`` module. Unfortunately, there +are several limitations which are addressed by PyOpenSSL: + +- (Python 2.x) SNI support. +- (Python 2.x-3.2) Disabling compression to mitigate `CRIME attack + <https://en.wikipedia.org/wiki/CRIME_(security_exploit)>`_. + +To use the Python OpenSSL bindings instead, you'll need to install the required +packages:: + + $ pip install pyopenssl ndg-httpsclient pyasn1 + +Once the packages are installed, you can tell urllib3 to switch the ssl backend +to PyOpenSSL with :func:`~urllib3.contrib.pyopenssl.inject_into_urllib3`:: + + import urllib3.contrib.pyopenssl + urllib3.contrib.pyopenssl.inject_into_urllib3() + +Now you can continue using urllib3 as you normally would. + +For more details, check the :mod:`~urllib3.contrib.pyopenssl` module. + + +InsecureRequestWarning +---------------------- + +.. versionadded:: 1.9 + +Unverified HTTPS requests will trigger a warning:: + + urllib3/connectionpool.py:736: InsecureRequestWarning: Unverified HTTPS + request is being made. Adding certificate verification is strongly advised. + See: https://urllib3.readthedocs.org/en/latest/security.html + (This warning will only appear once by default.) + +This would be a great time to enable HTTPS verification: +:ref:`certifi-with-urllib3`. + +If you know what you're doing and would like to disable this and other warnings, +you can use :func:`~urllib3.disable_warnings`:: + + import urllib3 + urllib3.disable_warnings() + +Making unverified HTTPS requests is strongly discouraged. ˙ ͜ʟ˙ diff --git a/dummyserver/__init__.py b/dummyserver/__init__.py new file mode 100644 index 0000000..e69de29 --- /dev/null +++ b/dummyserver/__init__.py diff --git a/dummyserver/__init__.pyc b/dummyserver/__init__.pyc Binary files differnew file mode 100644 index 0000000..24e9f56 --- /dev/null +++ b/dummyserver/__init__.pyc diff --git a/dummyserver/certs/cacert.key b/dummyserver/certs/cacert.key new file mode 100644 index 0000000..fc8be6e --- /dev/null +++ b/dummyserver/certs/cacert.key @@ -0,0 +1,15 @@ +-----BEGIN RSA PRIVATE KEY----- +MIICXgIBAAKBgQDKz8a9X2SfNms9TffyNaFO/K42fAjUI1dAM1G8TVoj0a81ay7W +z4R7V1zfjXFT/WoRW04Y6xek0bff0OtsW+AriooUy7+pPYnrchpAW0p7hPjH1DIB +Vab01CJMhQ24er92Q1dF4WBv4yKqEaV1IYz1cvqvCCJgAbsWn1I8Cna1lwIDAQAB +AoGAPpkK+oBrCkk9qFpcYUH0W/DZxK9b+j4+O+6bF8e4Pr4FmjNO7bZ3aap5W/bI +N+hLyLepzz8guRqR6l8NixCAi+JiVW/agh5o4Jrek8UJWQamwSL4nJ36U3Iw/l7w +vcN1txfkpsA2SB9QFPGfDKcP3+IZMOZ7uFLzk/gzgLYiCEECQQD+M5Lj+e/sNBkb +XeIBxWIrPfEeIkk4SDkqImzDjq1FcfxZkvfskqyJgUvcLe5hb+ibY8jqWvtpvFTI +5v/tzHvPAkEAzD8fNrGz8KiAVTo7+0vrb4AebAdSLZUvbp0AGs5pXUAuQx6VEgz8 +opNKpZjBwAFsZKlwhgDqaChiAt9aKUkzuQJBALlai9I2Dg7SkjgVRdX6wjE7slRB +tdgXOa+SeHJD1+5aRiJeeu8CqFJ/d/wtdbOQsTCVGwxfmREpZT00ywrvXpsCQQCU +gs1Kcrn5Ijx2PCrDFbfyUkFMoaIiXNipYGVkGHRKhtFcoo8YGfNUry7W7BTtbNuI +8h9MgLvw0nQ5zHf9jymZAkEA7o4uA6XSS1zUqEQ55bZRFHcz/99pLH35G906iwVb +d5rd1Z4Cf5s/91o5gwL6ZP2Ig34CCn+NSL4avgz6K0VUaA== +-----END RSA PRIVATE KEY----- diff --git a/dummyserver/certs/cacert.pem b/dummyserver/certs/cacert.pem new file mode 100644 index 0000000..38d32dc --- /dev/null +++ b/dummyserver/certs/cacert.pem @@ -0,0 +1,23 @@ +-----BEGIN CERTIFICATE----- +MIIDzDCCAzWgAwIBAgIJALPrscov4b/jMA0GCSqGSIb3DQEBBQUAMIGBMQswCQYD +VQQGEwJGSTEOMAwGA1UECBMFZHVtbXkxDjAMBgNVBAcTBWR1bW15MQ4wDAYDVQQK +EwVkdW1teTEOMAwGA1UECxMFZHVtbXkxETAPBgNVBAMTCFNuYWtlT2lsMR8wHQYJ +KoZIhvcNAQkBFhBkdW1teUB0ZXN0LmxvY2FsMB4XDTExMTIyMjA3NTYxNVoXDTIx +MTIxOTA3NTYxNVowgYExCzAJBgNVBAYTAkZJMQ4wDAYDVQQIEwVkdW1teTEOMAwG +A1UEBxMFZHVtbXkxDjAMBgNVBAoTBWR1bW15MQ4wDAYDVQQLEwVkdW1teTERMA8G +A1UEAxMIU25ha2VPaWwxHzAdBgkqhkiG9w0BCQEWEGR1bW15QHRlc3QubG9jYWww +gZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAMrPxr1fZJ82az1N9/I1oU78rjZ8 +CNQjV0AzUbxNWiPRrzVrLtbPhHtXXN+NcVP9ahFbThjrF6TRt9/Q62xb4CuKihTL +v6k9ietyGkBbSnuE+MfUMgFVpvTUIkyFDbh6v3ZDV0XhYG/jIqoRpXUhjPVy+q8I +ImABuxafUjwKdrWXAgMBAAGjggFIMIIBRDAdBgNVHQ4EFgQUGXd/I2JiQllF+3Wd +x3NyBLszCi0wgbYGA1UdIwSBrjCBq4AUGXd/I2JiQllF+3Wdx3NyBLszCi2hgYek +gYQwgYExCzAJBgNVBAYTAkZJMQ4wDAYDVQQIEwVkdW1teTEOMAwGA1UEBxMFZHVt +bXkxDjAMBgNVBAoTBWR1bW15MQ4wDAYDVQQLEwVkdW1teTERMA8GA1UEAxMIU25h +a2VPaWwxHzAdBgkqhkiG9w0BCQEWEGR1bW15QHRlc3QubG9jYWyCCQCz67HKL+G/ +4zAPBgNVHRMBAf8EBTADAQH/MBEGCWCGSAGG+EIBAQQEAwIBBjAJBgNVHRIEAjAA +MCsGCWCGSAGG+EIBDQQeFhxUaW55Q0EgR2VuZXJhdGVkIENlcnRpZmljYXRlMA4G +A1UdDwEB/wQEAwICBDANBgkqhkiG9w0BAQUFAAOBgQBnnwtO8onsyhGOvS6cS8af +IRZyAXgouuPeP3Zrf5W80iZcV23u94969sPEIsD8Ujv5u0hUSrToGl4ahOMEOFNL +R5ndQOkh3VsepJnoE+RklZzbHWxU8onWlVzsNBFbclxidzaU3UHmdgXJAJL5nVSd +Zpn44QSS0UXsaC0mBimVNw== +-----END CERTIFICATE----- diff --git a/dummyserver/certs/client.csr b/dummyserver/certs/client.csr new file mode 100644 index 0000000..703d351 --- /dev/null +++ b/dummyserver/certs/client.csr @@ -0,0 +1,23 @@ +-----BEGIN CERTIFICATE----- +MIID1TCCAz6gAwIBAgIBAjANBgkqhkiG9w0BAQUFADCBgTELMAkGA1UEBhMCRkkx +DjAMBgNVBAgTBWR1bW15MQ4wDAYDVQQHEwVkdW1teTEOMAwGA1UEChMFZHVtbXkx +DjAMBgNVBAsTBWR1bW15MREwDwYDVQQDEwhTbmFrZU9pbDEfMB0GCSqGSIb3DQEJ +ARYQZHVtbXlAdGVzdC5sb2NhbDAeFw0xMTEyMjIwNzU5NTlaFw0yMTEyMTgwNzU5 +NTlaMH8xCzAJBgNVBAYTAkZJMQ4wDAYDVQQIEwVkdW1teTEOMAwGA1UEBxMFZHVt +bXkxDjAMBgNVBAoTBWR1bW15MQ4wDAYDVQQLEwVkdW1teTEPMA0GA1UEAxMGY2xp +ZW50MR8wHQYJKoZIhvcNAQkBFhBjbGllbnRAbG9jYWxob3N0MIGfMA0GCSqGSIb3 +DQEBAQUAA4GNADCBiQKBgQDaITA/XCzviqjex+lJJP+pgmQQ+ncUf+PDaFw86kWh +cWuI2eSBVaIaP6SsxYgIODQTjqYGjRogsd1Nvx3gRdIMEagTfVQyVwfDfNp8aT8v +SY/wDYFjsD07asmjGvwiu0sLp4t/tMz+x5ELlU4+hGnmPInH6hLK150DqgbNmJus +3wIDAQABo4IBXDCCAVgwCQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBLAwKwYJ +YIZIAYb4QgENBB4WHFRpbnlDQSBHZW5lcmF0ZWQgQ2VydGlmaWNhdGUwHQYDVR0O +BBYEFG71FCU2yisH1GyrcqYaPKVeTWxBMIG2BgNVHSMEga4wgauAFBl3fyNiYkJZ +Rft1ncdzcgS7MwotoYGHpIGEMIGBMQswCQYDVQQGEwJGSTEOMAwGA1UECBMFZHVt +bXkxDjAMBgNVBAcTBWR1bW15MQ4wDAYDVQQKEwVkdW1teTEOMAwGA1UECxMFZHVt +bXkxETAPBgNVBAMTCFNuYWtlT2lsMR8wHQYJKoZIhvcNAQkBFhBkdW1teUB0ZXN0 +LmxvY2FsggkAs+uxyi/hv+MwCQYDVR0SBAIwADAbBgNVHREEFDASgRBjbGllbnRA +bG9jYWxob3N0MAsGA1UdDwQEAwIFoDANBgkqhkiG9w0BAQUFAAOBgQDEwZmp3yE8 +R4U9Ob/IeEo6O3p0T4o7GNvufGksM/mELmzyC+Qh/Ul6fNn+IhdKWpo61sMZou+n +eOufXVouc8dGhQ1Qi5s0i51d/ouhfYNs+AGRcpwEieVjZhgE1XfrNwvvjIx3yPtK +m9LSmCtVKcTWqOHQywKn+G83a+7bsh835Q== +-----END CERTIFICATE----- diff --git a/dummyserver/certs/client.key b/dummyserver/certs/client.key new file mode 100644 index 0000000..0d1c343 --- /dev/null +++ b/dummyserver/certs/client.key @@ -0,0 +1,15 @@ +-----BEGIN RSA PRIVATE KEY----- +MIICWwIBAAKBgQDaITA/XCzviqjex+lJJP+pgmQQ+ncUf+PDaFw86kWhcWuI2eSB +VaIaP6SsxYgIODQTjqYGjRogsd1Nvx3gRdIMEagTfVQyVwfDfNp8aT8vSY/wDYFj +sD07asmjGvwiu0sLp4t/tMz+x5ELlU4+hGnmPInH6hLK150DqgbNmJus3wIDAQAB +AoGAKMMg+AYqo4z+57rl/nQ6jpu+RWn4zMzlbEPZUMzavEOsu8M0L3MoOs1/4YV8 +WUTffnQe1ISTyF5Uo82+MIX7rUtfJITFSQrIWe7AGdm6Nir8TQQ7fD97modXyAUx +69I9SQjQlseg5PCRCp/DfcBncvHeYuf8gAJK5FfC1VW1cQECQQDvzFNoGrwnsrtm +4gj1Kt0c20jkIYFN6iQ6Sjs/1fk1cXDeWzjPaa92zF+i+02Ma/eWJ0ZVrhisw6sv +zxGp+ByBAkEA6N4SpuGWytJqCRfwenQZ4Oa8mNcVo5ulGf/eUHVXvHewWxQ7xWRi +iWUj/z1byR9+yno8Yfd04kaNCPYN/ICZXwJAAf5//xCh2e6pkkx06J0Ho7LLI2KH +8b7tuDJf1cMQxHoCB0dY7JijZeiDLxbJ6U4IjA4djp7ZA67I4KfnLLOsgQJARLZS +dp+WKR7RXwGLWfasNCqhd8/veKlSnEtdxAv76Ya/qQBdaq9mS/hmGMh4Lu52MTTE +YHvuJ159+yjvk5Q2rQJABjlU1+GZqwv/7QM7GxfJO+GPI4PHv5Yji5s7LLu2c6dL +XY2XiTHQL9PnPrKp3+qDDzxjyej30lfz4he6E5pI+g== +-----END RSA PRIVATE KEY----- diff --git a/dummyserver/certs/client.pem b/dummyserver/certs/client.pem new file mode 100644 index 0000000..29aea38 --- /dev/null +++ b/dummyserver/certs/client.pem @@ -0,0 +1,22 @@ +-----BEGIN CERTIFICATE----- +MIIDqDCCAxGgAwIBAgIBATANBgkqhkiG9w0BAQUFADCBgTELMAkGA1UEBhMCRkkx +DjAMBgNVBAgTBWR1bW15MQ4wDAYDVQQHEwVkdW1teTEOMAwGA1UEChMFZHVtbXkx +DjAMBgNVBAsTBWR1bW15MREwDwYDVQQDEwhTbmFrZU9pbDEfMB0GCSqGSIb3DQEJ +ARYQZHVtbXlAdGVzdC5sb2NhbDAeFw0xMTEyMjIwNzU4NDBaFw0yMTEyMTgwNzU4 +NDBaMGExCzAJBgNVBAYTAkZJMQ4wDAYDVQQIEwVkdW1teTEOMAwGA1UEBxMFZHVt +bXkxDjAMBgNVBAoTBWR1bW15MQ4wDAYDVQQLEwVkdW1teTESMBAGA1UEAxMJbG9j +YWxob3N0MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDXe3FqmCWvP8XPxqtT ++0bfL1Tvzvebi46k0WIcUV8bP3vyYiSRXG9ALmyzZH4GHY9UVs4OEDkCMDOBSezB +0y9ai/9doTNcaictdEBu8nfdXKoTtzrn+VX4UPrkH5hm7NQ1fTQuj1MR7yBCmYqN +3Q2Q+Efuujyx0FwBzAuy1aKYuwIDAQABo4IBTTCCAUkwCQYDVR0TBAIwADARBglg +hkgBhvhCAQEEBAMCBkAwKwYJYIZIAYb4QgENBB4WHFRpbnlDQSBHZW5lcmF0ZWQg +Q2VydGlmaWNhdGUwHQYDVR0OBBYEFBvnSuVKLNPEFMAFqHw292vGHGJSMIG2BgNV +HSMEga4wgauAFBl3fyNiYkJZRft1ncdzcgS7MwotoYGHpIGEMIGBMQswCQYDVQQG +EwJGSTEOMAwGA1UECBMFZHVtbXkxDjAMBgNVBAcTBWR1bW15MQ4wDAYDVQQKEwVk +dW1teTEOMAwGA1UECxMFZHVtbXkxETAPBgNVBAMTCFNuYWtlT2lsMR8wHQYJKoZI +hvcNAQkBFhBkdW1teUB0ZXN0LmxvY2FsggkAs+uxyi/hv+MwCQYDVR0SBAIwADAZ +BgNVHREEEjAQgQ5yb290QGxvY2FsaG9zdDANBgkqhkiG9w0BAQUFAAOBgQBXdedG +XHLPmOVBeKWjTmaekcaQi44snhYqE1uXRoIQXQsyw+Ya5+n/uRxPKZO/C78EESL0 +8rnLTdZXm4GBYyHYmMy0AdWR7y030viOzAkWWRRRbuecsaUzFCI+F9jTV5LHuRzz +V8fUKwiEE9swzkWgMpfVTPFuPgzxwG9gMbrBfg== +-----END CERTIFICATE----- diff --git a/dummyserver/certs/client_bad.pem b/dummyserver/certs/client_bad.pem new file mode 100644 index 0000000..e9402fb --- /dev/null +++ b/dummyserver/certs/client_bad.pem @@ -0,0 +1,17 @@ +-----BEGIN CERTIFICATE----- +MIICsDCCAhmgAwIBAgIJAL63Nc6KY94BMA0GCSqGSIb3DQEBBQUAMEUxCzAJBgNV +BAYTAkFVMRMwEQYDVQQIEwpTb21lLVN0YXRlMSEwHwYDVQQKExhJbnRlcm5ldCBX +aWRnaXRzIFB0eSBMdGQwHhcNMTExMDExMjMxMjAzWhcNMjExMDA4MjMxMjAzWjBF +MQswCQYDVQQGEwJBVTETMBEGA1UECBMKU29tZS1TdGF0ZTEhMB8GA1UEChMYSW50 +ZXJuZXQgV2lkZ2l0cyBQdHkgTHRkMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKB +gQC8HGxvblJ4Z0i/lIlG8jrNsFrCqYRAXtj3xdnnjfUpd/kNhU/KahMsG6urAe/4 +Yj+Zqf1sVnt0Cye8FZE3cN9RAcwJrlTCRiicJiXEbA7cPfMphqNGqjVHtmxQ1OsU +NHK7cxKa9OX3xmg4h55vxSZYgibAEPO2g3ueGk7RWIAQ8wIDAQABo4GnMIGkMB0G +A1UdDgQWBBSeeo/YRpdn5DK6bUI7ZDJ57pzGdDB1BgNVHSMEbjBsgBSeeo/YRpdn +5DK6bUI7ZDJ57pzGdKFJpEcwRTELMAkGA1UEBhMCQVUxEzARBgNVBAgTClNvbWUt +U3RhdGUxITAfBgNVBAoTGEludGVybmV0IFdpZGdpdHMgUHR5IEx0ZIIJAL63Nc6K +Y94BMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEFBQADgYEAOntoloMGt1325UR0 +GGEKQJbiRhLXY4otdgFjEvCG2RPZVLxWYhLMu0LkB6HBYULEuoy12ushtRWlhS1k +6PNRkaZ+LQTSREj6Do4c4zzLxCDmxYmejOz63cIWX2x5IY6qEx2BNOfmM4xEdF8W +LSGGbQfuAghiEh0giAi4AQloDlY= +-----END CERTIFICATE----- diff --git a/dummyserver/certs/server.crt b/dummyserver/certs/server.crt new file mode 100644 index 0000000..29aea38 --- /dev/null +++ b/dummyserver/certs/server.crt @@ -0,0 +1,22 @@ +-----BEGIN CERTIFICATE----- +MIIDqDCCAxGgAwIBAgIBATANBgkqhkiG9w0BAQUFADCBgTELMAkGA1UEBhMCRkkx +DjAMBgNVBAgTBWR1bW15MQ4wDAYDVQQHEwVkdW1teTEOMAwGA1UEChMFZHVtbXkx +DjAMBgNVBAsTBWR1bW15MREwDwYDVQQDEwhTbmFrZU9pbDEfMB0GCSqGSIb3DQEJ +ARYQZHVtbXlAdGVzdC5sb2NhbDAeFw0xMTEyMjIwNzU4NDBaFw0yMTEyMTgwNzU4 +NDBaMGExCzAJBgNVBAYTAkZJMQ4wDAYDVQQIEwVkdW1teTEOMAwGA1UEBxMFZHVt +bXkxDjAMBgNVBAoTBWR1bW15MQ4wDAYDVQQLEwVkdW1teTESMBAGA1UEAxMJbG9j +YWxob3N0MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDXe3FqmCWvP8XPxqtT ++0bfL1Tvzvebi46k0WIcUV8bP3vyYiSRXG9ALmyzZH4GHY9UVs4OEDkCMDOBSezB +0y9ai/9doTNcaictdEBu8nfdXKoTtzrn+VX4UPrkH5hm7NQ1fTQuj1MR7yBCmYqN +3Q2Q+Efuujyx0FwBzAuy1aKYuwIDAQABo4IBTTCCAUkwCQYDVR0TBAIwADARBglg +hkgBhvhCAQEEBAMCBkAwKwYJYIZIAYb4QgENBB4WHFRpbnlDQSBHZW5lcmF0ZWQg +Q2VydGlmaWNhdGUwHQYDVR0OBBYEFBvnSuVKLNPEFMAFqHw292vGHGJSMIG2BgNV +HSMEga4wgauAFBl3fyNiYkJZRft1ncdzcgS7MwotoYGHpIGEMIGBMQswCQYDVQQG +EwJGSTEOMAwGA1UECBMFZHVtbXkxDjAMBgNVBAcTBWR1bW15MQ4wDAYDVQQKEwVk +dW1teTEOMAwGA1UECxMFZHVtbXkxETAPBgNVBAMTCFNuYWtlT2lsMR8wHQYJKoZI +hvcNAQkBFhBkdW1teUB0ZXN0LmxvY2FsggkAs+uxyi/hv+MwCQYDVR0SBAIwADAZ +BgNVHREEEjAQgQ5yb290QGxvY2FsaG9zdDANBgkqhkiG9w0BAQUFAAOBgQBXdedG +XHLPmOVBeKWjTmaekcaQi44snhYqE1uXRoIQXQsyw+Ya5+n/uRxPKZO/C78EESL0 +8rnLTdZXm4GBYyHYmMy0AdWR7y030viOzAkWWRRRbuecsaUzFCI+F9jTV5LHuRzz +V8fUKwiEE9swzkWgMpfVTPFuPgzxwG9gMbrBfg== +-----END CERTIFICATE----- diff --git a/dummyserver/certs/server.csr b/dummyserver/certs/server.csr new file mode 100644 index 0000000..29aea38 --- /dev/null +++ b/dummyserver/certs/server.csr @@ -0,0 +1,22 @@ +-----BEGIN CERTIFICATE----- +MIIDqDCCAxGgAwIBAgIBATANBgkqhkiG9w0BAQUFADCBgTELMAkGA1UEBhMCRkkx +DjAMBgNVBAgTBWR1bW15MQ4wDAYDVQQHEwVkdW1teTEOMAwGA1UEChMFZHVtbXkx +DjAMBgNVBAsTBWR1bW15MREwDwYDVQQDEwhTbmFrZU9pbDEfMB0GCSqGSIb3DQEJ +ARYQZHVtbXlAdGVzdC5sb2NhbDAeFw0xMTEyMjIwNzU4NDBaFw0yMTEyMTgwNzU4 +NDBaMGExCzAJBgNVBAYTAkZJMQ4wDAYDVQQIEwVkdW1teTEOMAwGA1UEBxMFZHVt +bXkxDjAMBgNVBAoTBWR1bW15MQ4wDAYDVQQLEwVkdW1teTESMBAGA1UEAxMJbG9j +YWxob3N0MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDXe3FqmCWvP8XPxqtT ++0bfL1Tvzvebi46k0WIcUV8bP3vyYiSRXG9ALmyzZH4GHY9UVs4OEDkCMDOBSezB +0y9ai/9doTNcaictdEBu8nfdXKoTtzrn+VX4UPrkH5hm7NQ1fTQuj1MR7yBCmYqN +3Q2Q+Efuujyx0FwBzAuy1aKYuwIDAQABo4IBTTCCAUkwCQYDVR0TBAIwADARBglg +hkgBhvhCAQEEBAMCBkAwKwYJYIZIAYb4QgENBB4WHFRpbnlDQSBHZW5lcmF0ZWQg +Q2VydGlmaWNhdGUwHQYDVR0OBBYEFBvnSuVKLNPEFMAFqHw292vGHGJSMIG2BgNV +HSMEga4wgauAFBl3fyNiYkJZRft1ncdzcgS7MwotoYGHpIGEMIGBMQswCQYDVQQG +EwJGSTEOMAwGA1UECBMFZHVtbXkxDjAMBgNVBAcTBWR1bW15MQ4wDAYDVQQKEwVk +dW1teTEOMAwGA1UECxMFZHVtbXkxETAPBgNVBAMTCFNuYWtlT2lsMR8wHQYJKoZI +hvcNAQkBFhBkdW1teUB0ZXN0LmxvY2FsggkAs+uxyi/hv+MwCQYDVR0SBAIwADAZ +BgNVHREEEjAQgQ5yb290QGxvY2FsaG9zdDANBgkqhkiG9w0BAQUFAAOBgQBXdedG +XHLPmOVBeKWjTmaekcaQi44snhYqE1uXRoIQXQsyw+Ya5+n/uRxPKZO/C78EESL0 +8rnLTdZXm4GBYyHYmMy0AdWR7y030viOzAkWWRRRbuecsaUzFCI+F9jTV5LHuRzz +V8fUKwiEE9swzkWgMpfVTPFuPgzxwG9gMbrBfg== +-----END CERTIFICATE----- diff --git a/dummyserver/certs/server.key b/dummyserver/certs/server.key new file mode 100644 index 0000000..89ab057 --- /dev/null +++ b/dummyserver/certs/server.key @@ -0,0 +1,15 @@ +-----BEGIN RSA PRIVATE KEY----- +MIICXgIBAAKBgQDXe3FqmCWvP8XPxqtT+0bfL1Tvzvebi46k0WIcUV8bP3vyYiSR +XG9ALmyzZH4GHY9UVs4OEDkCMDOBSezB0y9ai/9doTNcaictdEBu8nfdXKoTtzrn ++VX4UPrkH5hm7NQ1fTQuj1MR7yBCmYqN3Q2Q+Efuujyx0FwBzAuy1aKYuwIDAQAB +AoGBANOGBM6bbhq7ImYU4qf8+RQrdVg2tc9Fzo+yTnn30sF/rx8/AiCDOV4qdGAh +HKjKKaGj2H/rotqoEFcxBy05LrgJXxydBP72e9PYhNgKOcSmCQu4yALIPEXfKuIM +zgAErHVJ2l79fif3D4hzNyz+u5E1A9n3FG9cgaJSiYP8IG2RAkEA82GZ8rBkSGQQ +ZQ3oFuzPAAL21lbj8D0p76fsCpvS7427DtZDOjhOIKZmaeykpv+qSzRraqEqjDRi +S4kjQvwh6QJBAOKniZ+NDo2lSpbOFk+XlmABK1DormVpj8KebHEZYok1lRI+WiX9 +Nnoe9YLgix7++6H5SBBCcTB4HvM+5A4BuwMCQQChcX/eZbXP81iQwB3Rfzp8xnqY +icDf7qKvz9Ma4myU7Y5E9EpaB1mD/P14jDpYcMW050vNyqTfpiwB8TFL0NZpAkEA +02jkFH9UyMgZV6qo4tqI98l/ZrtyF8OrxSNSEPhVkZf6EQc5vN9/lc8Uv1vESEgb +3AwRrKDcxRH2BHtv6qSwkwJAGjqnkIcEkA75r1e55/EF2chcZW1+tpwKupE8CtAH +VXGd5DVwt4cYWkLUj2gF2fJbV97uu2MAg5CFDb+vQ6p5eA== +-----END RSA PRIVATE KEY----- diff --git a/dummyserver/certs/server.key.org b/dummyserver/certs/server.key.org new file mode 100644 index 0000000..709082e --- /dev/null +++ b/dummyserver/certs/server.key.org @@ -0,0 +1,12 @@ +-----BEGIN RSA PRIVATE KEY----- +Proc-Type: 4,ENCRYPTED +DEK-Info: DES-EDE3-CBC,8B3708EAD53963D4 + +uyLo4sFmSo7+K1uVgSENI+85JsG5o1JmovvxD/ucUl9CDhDj4KgFzs95r7gjjlhS +kA/hIY8Ec9i6T3zMXpAswWI5Mv2LE+UdYR5h60dYtIinLC7KF0QIztSecNWy20Bi +/NkobZhN7VZUuCEoSRWj4Ia3EuATF8Y9ZRGFPNsqMbSAhsGZ1P5xbDMEpE+5PbJP +LvdF9yWDT77rHeI4CKV4aP/yxtm1heEhKw5o6hdpPBQajPpjSQbh7/V6Qd0QsKcV +n27kPnSabsTbbc2IR40il4mZfHvXAlp4KoHL3RUgaons7q0hAUpUi+vJXbEukGGt +3dlyWwKwEFS7xBQ1pQvzcePI4/fRQxhZNxeFZW6n12Y3X61vg1IsG7usPhRe3iDP +3g1MXQMAhxaECnDN9b006IeoYdaktd4wrs/fn8x6Yz4= +-----END RSA PRIVATE KEY----- diff --git a/dummyserver/handlers.py b/dummyserver/handlers.py new file mode 100644 index 0000000..72faa1a --- /dev/null +++ b/dummyserver/handlers.py @@ -0,0 +1,242 @@ +from __future__ import print_function + +import collections +import gzip +import json +import logging +import sys +import time +import zlib + +from io import BytesIO +from tornado.wsgi import HTTPRequest + +try: + from urllib.parse import urlsplit +except ImportError: + from urlparse import urlsplit + +log = logging.getLogger(__name__) + + +class Response(object): + def __init__(self, body='', status='200 OK', headers=None): + if not isinstance(body, bytes): + body = body.encode('utf8') + + self.body = body + self.status = status + self.headers = headers or [("Content-type", "text/plain")] + + def __call__(self, environ, start_response): + start_response(self.status, self.headers) + return [self.body] + + +class WSGIHandler(object): + pass + + +class TestingApp(WSGIHandler): + """ + Simple app that performs various operations, useful for testing an HTTP + library. + + Given any path, it will attempt to load a corresponding local method if + it exists. Status code 200 indicates success, 400 indicates failure. Each + method has its own conditions for success/failure. + """ + def __call__(self, environ, start_response): + """ Call the correct method in this class based on the incoming URI """ + req = HTTPRequest(environ) + + req.params = {} + for k, v in req.arguments.items(): + req.params[k] = next(iter(v)) + + path = req.path[:] + if not path.startswith('/'): + path = urlsplit(path).path + + target = path[1:].replace('/', '_') + method = getattr(self, target, self.index) + resp = method(req) + + if dict(resp.headers).get('Connection') == 'close': + # FIXME: Can we kill the connection somehow? + pass + + return resp(environ, start_response) + + def index(self, _request): + "Render simple message" + return Response("Dummy server!") + + def source_address(self, request): + """Return the requester's IP address.""" + return Response(request.remote_ip) + + def set_up(self, request): + test_type = request.params.get('test_type') + test_id = request.params.get('test_id') + if test_id: + print('\nNew test %s: %s' % (test_type, test_id)) + else: + print('\nNew test %s' % test_type) + return Response("Dummy server is ready!") + + def specific_method(self, request): + "Confirm that the request matches the desired method type" + method = request.params.get('method') + if method and not isinstance(method, str): + method = method.decode('utf8') + + if request.method != method: + return Response("Wrong method: %s != %s" % + (method, request.method), status='400 Bad Request') + return Response() + + def upload(self, request): + "Confirm that the uploaded file conforms to specification" + # FIXME: This is a huge broken mess + param = request.params.get('upload_param', 'myfile').decode('ascii') + filename = request.params.get('upload_filename', '').decode('utf-8') + size = int(request.params.get('upload_size', '0')) + files_ = request.files.get(param) + + if len(files_) != 1: + return Response("Expected 1 file for '%s', not %d" %(param, len(files_)), + status='400 Bad Request') + file_ = files_[0] + + data = file_['body'] + if int(size) != len(data): + return Response("Wrong size: %d != %d" % + (size, len(data)), status='400 Bad Request') + + if filename != file_['filename']: + return Response("Wrong filename: %s != %s" % + (filename, file_.filename), + status='400 Bad Request') + + return Response() + + def redirect(self, request): + "Perform a redirect to ``target``" + target = request.params.get('target', '/') + headers = [('Location', target)] + return Response(status='303 See Other', headers=headers) + + def keepalive(self, request): + if request.params.get('close', b'0') == b'1': + headers = [('Connection', 'close')] + return Response('Closing', headers=headers) + + headers = [('Connection', 'keep-alive')] + return Response('Keeping alive', headers=headers) + + def sleep(self, request): + "Sleep for a specified amount of ``seconds``" + seconds = float(request.params.get('seconds', '1')) + time.sleep(seconds) + return Response() + + def echo(self, request): + "Echo back the params" + if request.method == 'GET': + return Response(request.query) + + return Response(request.body) + + def encodingrequest(self, request): + "Check for UA accepting gzip/deflate encoding" + data = b"hello, world!" + encoding = request.headers.get('Accept-Encoding', '') + headers = None + if encoding == 'gzip': + headers = [('Content-Encoding', 'gzip')] + file_ = BytesIO() + zipfile = gzip.GzipFile('', mode='w', fileobj=file_) + zipfile.write(data) + zipfile.close() + data = file_.getvalue() + elif encoding == 'deflate': + headers = [('Content-Encoding', 'deflate')] + data = zlib.compress(data) + elif encoding == 'garbage-gzip': + headers = [('Content-Encoding', 'gzip')] + data = 'garbage' + elif encoding == 'garbage-deflate': + headers = [('Content-Encoding', 'deflate')] + data = 'garbage' + return Response(data, headers=headers) + + def headers(self, request): + return Response(json.dumps(request.headers)) + + def successful_retry(self, request): + """ Handler which will return an error and then success + + It's not currently very flexible as the number of retries is hard-coded. + """ + test_name = request.headers.get('test-name', None) + if not test_name: + return Response("test-name header not set", + status="400 Bad Request") + + if not hasattr(self, 'retry_test_names'): + self.retry_test_names = collections.defaultdict(int) + self.retry_test_names[test_name] += 1 + + if self.retry_test_names[test_name] >= 2: + return Response("Retry successful!") + else: + return Response("need to keep retrying!", status="418 I'm A Teapot") + + def shutdown(self, request): + sys.exit() + + +# RFC2231-aware replacement of internal tornado function +def _parse_header(line): + r"""Parse a Content-type like header. + + Return the main content-type and a dictionary of options. + + >>> d = _parse_header("CD: fd; foo=\"bar\"; file*=utf-8''T%C3%A4st")[1] + >>> d['file'] == 'T\u00e4st' + True + >>> d['foo'] + 'bar' + """ + import tornado.httputil + import email.utils + from urllib3.packages import six + if not six.PY3: + line = line.encode('utf-8') + parts = tornado.httputil._parseparam(';' + line) + key = next(parts) + # decode_params treats first argument special, but we already stripped key + params = [('Dummy', 'value')] + for p in parts: + i = p.find('=') + if i >= 0: + name = p[:i].strip().lower() + value = p[i + 1:].strip() + params.append((name, value)) + params = email.utils.decode_params(params) + params.pop(0) # get rid of the dummy again + pdict = {} + for name, value in params: + value = email.utils.collapse_rfc2231_value(value) + if len(value) >= 2 and value[0] == '"' and value[-1] == '"': + value = value[1:-1] + pdict[name] = value + return key, pdict + +# TODO: make the following conditional as soon as we know a version +# which does not require this fix. +# See https://github.com/facebook/tornado/issues/868 +if True: + import tornado.httputil + tornado.httputil._parse_header = _parse_header diff --git a/dummyserver/handlers.pyc b/dummyserver/handlers.pyc Binary files differnew file mode 100644 index 0000000..22aedc3 --- /dev/null +++ b/dummyserver/handlers.pyc diff --git a/dummyserver/proxy.py b/dummyserver/proxy.py new file mode 100755 index 0000000..aca92a7 --- /dev/null +++ b/dummyserver/proxy.py @@ -0,0 +1,137 @@ +#!/usr/bin/env python +# +# Simple asynchronous HTTP proxy with tunnelling (CONNECT). +# +# GET/POST proxying based on +# http://groups.google.com/group/python-tornado/msg/7bea08e7a049cf26 +# +# Copyright (C) 2012 Senko Rasic <senko.rasic@dobarkod.hr> +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. + +import sys +import socket + +import tornado.httpserver +import tornado.ioloop +import tornado.iostream +import tornado.web +import tornado.httpclient + +__all__ = ['ProxyHandler', 'run_proxy'] + + +class ProxyHandler(tornado.web.RequestHandler): + SUPPORTED_METHODS = ['GET', 'POST', 'CONNECT'] + + @tornado.web.asynchronous + def get(self): + + def handle_response(response): + if response.error and not isinstance(response.error, + tornado.httpclient.HTTPError): + self.set_status(500) + self.write('Internal server error:\n' + str(response.error)) + self.finish() + else: + self.set_status(response.code) + for header in ('Date', 'Cache-Control', 'Server', + 'Content-Type', 'Location'): + v = response.headers.get(header) + if v: + self.set_header(header, v) + if response.body: + self.write(response.body) + self.finish() + + req = tornado.httpclient.HTTPRequest(url=self.request.uri, + method=self.request.method, body=self.request.body, + headers=self.request.headers, follow_redirects=False, + allow_nonstandard_methods=True) + + client = tornado.httpclient.AsyncHTTPClient() + try: + client.fetch(req, handle_response) + except tornado.httpclient.HTTPError as e: + if hasattr(e, 'response') and e.response: + self.handle_response(e.response) + else: + self.set_status(500) + self.write('Internal server error:\n' + str(e)) + self.finish() + + @tornado.web.asynchronous + def post(self): + return self.get() + + @tornado.web.asynchronous + def connect(self): + host, port = self.request.uri.split(':') + client = self.request.connection.stream + + def read_from_client(data): + upstream.write(data) + + def read_from_upstream(data): + client.write(data) + + def client_close(data=None): + if upstream.closed(): + return + if data: + upstream.write(data) + upstream.close() + + def upstream_close(data=None): + if client.closed(): + return + if data: + client.write(data) + client.close() + + def start_tunnel(): + client.read_until_close(client_close, read_from_client) + upstream.read_until_close(upstream_close, read_from_upstream) + client.write(b'HTTP/1.0 200 Connection established\r\n\r\n') + + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) + upstream = tornado.iostream.IOStream(s) + upstream.connect((host, int(port)), start_tunnel) + + +def run_proxy(port, start_ioloop=True): + """ + Run proxy on the specified port. If start_ioloop is True (default), + the tornado IOLoop will be started immediately. + """ + app = tornado.web.Application([ + (r'.*', ProxyHandler), + ]) + app.listen(port) + ioloop = tornado.ioloop.IOLoop.instance() + if start_ioloop: + ioloop.start() + +if __name__ == '__main__': + port = 8888 + if len(sys.argv) > 1: + port = int(sys.argv[1]) + + print ("Starting HTTP proxy on port %d" % port) + run_proxy(port) diff --git a/dummyserver/proxy.pyc b/dummyserver/proxy.pyc Binary files differnew file mode 100644 index 0000000..23fa01d --- /dev/null +++ b/dummyserver/proxy.pyc diff --git a/dummyserver/server.py b/dummyserver/server.py new file mode 100755 index 0000000..99f0835 --- /dev/null +++ b/dummyserver/server.py @@ -0,0 +1,181 @@ +#!/usr/bin/env python + +""" +Dummy server used for unit testing. +""" +from __future__ import print_function + +import errno +import logging +import os +import random +import string +import sys +import threading +import socket + +from tornado.platform.auto import set_close_exec +import tornado.wsgi +import tornado.httpserver +import tornado.ioloop +import tornado.web + + +log = logging.getLogger(__name__) + +CERTS_PATH = os.path.join(os.path.dirname(__file__), 'certs') +DEFAULT_CERTS = { + 'certfile': os.path.join(CERTS_PATH, 'server.crt'), + 'keyfile': os.path.join(CERTS_PATH, 'server.key'), +} +DEFAULT_CA = os.path.join(CERTS_PATH, 'cacert.pem') +DEFAULT_CA_BAD = os.path.join(CERTS_PATH, 'client_bad.pem') + + +# Different types of servers we have: + + +class SocketServerThread(threading.Thread): + """ + :param socket_handler: Callable which receives a socket argument for one + request. + :param ready_event: Event which gets set when the socket handler is + ready to receive requests. + """ + def __init__(self, socket_handler, host='localhost', port=8081, + ready_event=None): + threading.Thread.__init__(self) + + self.socket_handler = socket_handler + self.host = host + self.ready_event = ready_event + + def _start_server(self): + sock = socket.socket(socket.AF_INET6) + if sys.platform != 'win32': + sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) + sock.bind((self.host, 0)) + self.port = sock.getsockname()[1] + + # Once listen() returns, the server socket is ready + sock.listen(0) + + if self.ready_event: + self.ready_event.set() + + self.socket_handler(sock) + sock.close() + + def run(self): + self.server = self._start_server() + + +# FIXME: there is a pull request patching bind_sockets in Tornado directly. +# If it gets merged and released we can drop this and use +# `tornado.netutil.bind_sockets` again. +# https://github.com/facebook/tornado/pull/977 + +def bind_sockets(port, address=None, family=socket.AF_UNSPEC, backlog=128, + flags=None): + """Creates listening sockets bound to the given port and address. + + Returns a list of socket objects (multiple sockets are returned if + the given address maps to multiple IP addresses, which is most common + for mixed IPv4 and IPv6 use). + + Address may be either an IP address or hostname. If it's a hostname, + the server will listen on all IP addresses associated with the + name. Address may be an empty string or None to listen on all + available interfaces. Family may be set to either `socket.AF_INET` + or `socket.AF_INET6` to restrict to IPv4 or IPv6 addresses, otherwise + both will be used if available. + + The ``backlog`` argument has the same meaning as for + `socket.listen() <socket.socket.listen>`. + + ``flags`` is a bitmask of AI_* flags to `~socket.getaddrinfo`, like + ``socket.AI_PASSIVE | socket.AI_NUMERICHOST``. + """ + sockets = [] + if address == "": + address = None + if not socket.has_ipv6 and family == socket.AF_UNSPEC: + # Python can be compiled with --disable-ipv6, which causes + # operations on AF_INET6 sockets to fail, but does not + # automatically exclude those results from getaddrinfo + # results. + # http://bugs.python.org/issue16208 + family = socket.AF_INET + if flags is None: + flags = socket.AI_PASSIVE + binded_port = None + for res in set(socket.getaddrinfo(address, port, family, + socket.SOCK_STREAM, 0, flags)): + af, socktype, proto, canonname, sockaddr = res + try: + sock = socket.socket(af, socktype, proto) + except socket.error as e: + if e.args[0] == errno.EAFNOSUPPORT: + continue + raise + set_close_exec(sock.fileno()) + if os.name != 'nt': + sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) + if af == socket.AF_INET6: + # On linux, ipv6 sockets accept ipv4 too by default, + # but this makes it impossible to bind to both + # 0.0.0.0 in ipv4 and :: in ipv6. On other systems, + # separate sockets *must* be used to listen for both ipv4 + # and ipv6. For consistency, always disable ipv4 on our + # ipv6 sockets and use a separate ipv4 socket when needed. + # + # Python 2.x on windows doesn't have IPPROTO_IPV6. + if hasattr(socket, "IPPROTO_IPV6"): + sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1) + + # automatic port allocation with port=None + # should bind on the same port on IPv4 and IPv6 + host, requested_port = sockaddr[:2] + if requested_port == 0 and binded_port is not None: + sockaddr = tuple([host, binded_port] + list(sockaddr[2:])) + + sock.setblocking(0) + sock.bind(sockaddr) + binded_port = sock.getsockname()[1] + sock.listen(backlog) + sockets.append(sock) + return sockets + + +def run_tornado_app(app, io_loop, certs, scheme, host): + if scheme == 'https': + http_server = tornado.httpserver.HTTPServer(app, ssl_options=certs, + io_loop=io_loop) + else: + http_server = tornado.httpserver.HTTPServer(app, io_loop=io_loop) + + sockets = bind_sockets(None, address=host) + port = sockets[0].getsockname()[1] + http_server.add_sockets(sockets) + return http_server, port + + +def run_loop_in_thread(io_loop): + t = threading.Thread(target=io_loop.start) + t.start() + return t + + +def get_unreachable_address(): + while True: + host = ''.join(random.choice(string.ascii_lowercase) + for _ in range(60)) + sockaddr = (host, 54321) + + # check if we are really "lucky" and hit an actual server + try: + s = socket.create_connection(sockaddr) + except socket.error: + return sockaddr + else: + s.close() diff --git a/dummyserver/server.pyc b/dummyserver/server.pyc Binary files differnew file mode 100644 index 0000000..b997d0e --- /dev/null +++ b/dummyserver/server.pyc diff --git a/dummyserver/testcase.py b/dummyserver/testcase.py new file mode 100644 index 0000000..335b2f2 --- /dev/null +++ b/dummyserver/testcase.py @@ -0,0 +1,131 @@ +import unittest +import socket +import threading +from nose.plugins.skip import SkipTest +from tornado import ioloop, web, wsgi + +from dummyserver.server import ( + SocketServerThread, + run_tornado_app, + run_loop_in_thread, + DEFAULT_CERTS, +) +from dummyserver.handlers import TestingApp +from dummyserver.proxy import ProxyHandler + + + +class SocketDummyServerTestCase(unittest.TestCase): + """ + A simple socket-based server is created for this class that is good for + exactly one request. + """ + scheme = 'http' + host = 'localhost' + + @classmethod + def _start_server(cls, socket_handler): + ready_event = threading.Event() + cls.server_thread = SocketServerThread(socket_handler=socket_handler, + ready_event=ready_event, + host=cls.host) + cls.server_thread.start() + ready_event.wait() + cls.port = cls.server_thread.port + + @classmethod + def tearDownClass(cls): + if hasattr(cls, 'server_thread'): + cls.server_thread.join(0.1) + + +class HTTPDummyServerTestCase(unittest.TestCase): + """ A simple HTTP server that runs when your test class runs + + Have your unittest class inherit from this one, and then a simple server + will start when your tests run, and automatically shut down when they + complete. For examples of what test requests you can send to the server, + see the TestingApp in dummyserver/handlers.py. + """ + scheme = 'http' + host = 'localhost' + host_alt = '127.0.0.1' # Some tests need two hosts + certs = DEFAULT_CERTS + + @classmethod + def _start_server(cls): + cls.io_loop = ioloop.IOLoop() + app = wsgi.WSGIContainer(TestingApp()) + cls.server, cls.port = run_tornado_app(app, cls.io_loop, cls.certs, + cls.scheme, cls.host) + cls.server_thread = run_loop_in_thread(cls.io_loop) + + @classmethod + def _stop_server(cls): + cls.io_loop.add_callback(cls.server.stop) + cls.io_loop.add_callback(cls.io_loop.stop) + cls.server_thread.join() + + @classmethod + def setUpClass(cls): + cls._start_server() + + @classmethod + def tearDownClass(cls): + cls._stop_server() + + +class HTTPSDummyServerTestCase(HTTPDummyServerTestCase): + scheme = 'https' + host = 'localhost' + certs = DEFAULT_CERTS + + +class HTTPDummyProxyTestCase(unittest.TestCase): + + http_host = 'localhost' + http_host_alt = '127.0.0.1' + + https_host = 'localhost' + https_host_alt = '127.0.0.1' + https_certs = DEFAULT_CERTS + + proxy_host = 'localhost' + proxy_host_alt = '127.0.0.1' + + @classmethod + def setUpClass(cls): + cls.io_loop = ioloop.IOLoop() + + app = wsgi.WSGIContainer(TestingApp()) + cls.http_server, cls.http_port = run_tornado_app( + app, cls.io_loop, None, 'http', cls.http_host) + + app = wsgi.WSGIContainer(TestingApp()) + cls.https_server, cls.https_port = run_tornado_app( + app, cls.io_loop, cls.https_certs, 'https', cls.http_host) + + app = web.Application([(r'.*', ProxyHandler)]) + cls.proxy_server, cls.proxy_port = run_tornado_app( + app, cls.io_loop, None, 'http', cls.proxy_host) + + cls.server_thread = run_loop_in_thread(cls.io_loop) + + @classmethod + def tearDownClass(cls): + cls.io_loop.add_callback(cls.http_server.stop) + cls.io_loop.add_callback(cls.https_server.stop) + cls.io_loop.add_callback(cls.proxy_server.stop) + cls.io_loop.add_callback(cls.io_loop.stop) + cls.server_thread.join() + + +class IPv6HTTPDummyServerTestCase(HTTPDummyServerTestCase): + host = '::1' + + @classmethod + def setUpClass(cls): + if not socket.has_ipv6: + raise SkipTest('IPv6 not available') + else: + super(IPv6HTTPDummyServerTestCase, cls).setUpClass() diff --git a/dummyserver/testcase.pyc b/dummyserver/testcase.pyc Binary files differnew file mode 100644 index 0000000..29cc06a --- /dev/null +++ b/dummyserver/testcase.pyc diff --git a/setup.cfg b/setup.cfg new file mode 100644 index 0000000..b140696 --- /dev/null +++ b/setup.cfg @@ -0,0 +1,15 @@ +[nosetests] +logging-clear-handlers = true +with-coverage = true +cover-package = urllib3 +cover-min-percentage = 100 +cover-erase = true + +[flake8] +max-line-length = 99 + +[egg_info] +tag_build = +tag_date = 0 +tag_svn_revision = 0 + diff --git a/setup.py b/setup.py new file mode 100644 index 0000000..f638377 --- /dev/null +++ b/setup.py @@ -0,0 +1,57 @@ +#!/usr/bin/env python + +from distutils.core import setup + +import os +import re + +try: + import setuptools +except ImportError: + pass # No 'develop' command, oh well. + +base_path = os.path.dirname(__file__) + +# Get the version (borrowed from SQLAlchemy) +fp = open(os.path.join(base_path, 'urllib3', '__init__.py')) +VERSION = re.compile(r".*__version__ = '(.*?)'", + re.S).match(fp.read()).group(1) +fp.close() + + +version = VERSION + +setup(name='urllib3', + version=version, + description="HTTP library with thread-safe connection pooling, file post, and more.", + long_description=open('README.rst').read() + '\n\n' + open('CHANGES.rst').read(), + classifiers=[ + 'Environment :: Web Environment', + 'Intended Audience :: Developers', + 'License :: OSI Approved :: MIT License', + 'Operating System :: OS Independent', + 'Programming Language :: Python', + 'Programming Language :: Python :: 2', + 'Programming Language :: Python :: 3', + 'Topic :: Internet :: WWW/HTTP', + 'Topic :: Software Development :: Libraries', + ], + keywords='urllib httplib threadsafe filepost http https ssl pooling', + author='Andrey Petrov', + author_email='andrey.petrov@shazow.net', + url='http://urllib3.readthedocs.org/', + license='MIT', + packages=['urllib3', + 'urllib3.packages', 'urllib3.packages.ssl_match_hostname', + 'urllib3.contrib', 'urllib3.util', + ], + requires=[], + tests_require=[ + # These are a less-specific subset of dev-requirements.txt, for the + # convenience of distro package maintainers. + 'nose', + 'mock', + 'tornado', + ], + test_suite='test', + ) diff --git a/test/__init__.py b/test/__init__.py new file mode 100644 index 0000000..d56a4d3 --- /dev/null +++ b/test/__init__.py @@ -0,0 +1,92 @@ +import warnings +import sys +import errno +import functools +import socket + +from nose.plugins.skip import SkipTest + +from urllib3.exceptions import MaxRetryError, HTTPWarning +from urllib3.packages import six + +# We need a host that will not immediately close the connection with a TCP +# Reset. SO suggests this hostname +TARPIT_HOST = '10.255.255.1' + +VALID_SOURCE_ADDRESSES = [('::1', 0), ('127.0.0.1', 0)] +# RFC 5737: 192.0.2.0/24 is for testing only. +# RFC 3849: 2001:db8::/32 is for documentation only. +INVALID_SOURCE_ADDRESSES = [('192.0.2.255', 0), ('2001:db8::1', 0)] + + +def clear_warnings(cls=HTTPWarning): + new_filters = [] + for f in warnings.filters: + if issubclass(f[2], cls): + continue + new_filters.append(f) + warnings.filters[:] = new_filters + +def setUp(): + clear_warnings() + warnings.simplefilter('ignore', HTTPWarning) + + +def onlyPy26OrOlder(test): + """Skips this test unless you are on Python2.6.x or earlier.""" + + @functools.wraps(test) + def wrapper(*args, **kwargs): + msg = "{name} only runs on Python2.6.x or older".format(name=test.__name__) + if sys.version_info >= (2, 7): + raise SkipTest(msg) + return test(*args, **kwargs) + return wrapper + +def onlyPy27OrNewer(test): + """Skips this test unless you are on Python 2.7.x or later.""" + + @functools.wraps(test) + def wrapper(*args, **kwargs): + msg = "{name} requires Python 2.7.x+ to run".format(name=test.__name__) + if sys.version_info < (2, 7): + raise SkipTest(msg) + return test(*args, **kwargs) + return wrapper + +def onlyPy3(test): + """Skips this test unless you are on Python3.x""" + + @functools.wraps(test) + def wrapper(*args, **kwargs): + msg = "{name} requires Python3.x to run".format(name=test.__name__) + if not six.PY3: + raise SkipTest(msg) + return test(*args, **kwargs) + return wrapper + +def requires_network(test): + """Helps you skip tests that require the network""" + + def _is_unreachable_err(err): + return getattr(err, 'errno', None) in (errno.ENETUNREACH, + errno.EHOSTUNREACH) # For OSX + + @functools.wraps(test) + def wrapper(*args, **kwargs): + msg = "Can't run {name} because the network is unreachable".format( + name=test.__name__) + try: + return test(*args, **kwargs) + except socket.error as e: + # This test needs an initial network connection to attempt the + # connection to the TARPIT_HOST. This fails if you are in a place + # without an Internet connection, so we skip the test in that case. + if _is_unreachable_err(e): + raise SkipTest(msg) + raise + except MaxRetryError as e: + if _is_unreachable_err(e.reason): + raise SkipTest(msg) + raise + return wrapper diff --git a/test/__init__.pyc b/test/__init__.pyc Binary files differnew file mode 100644 index 0000000..38b9317 --- /dev/null +++ b/test/__init__.pyc diff --git a/test/benchmark.py b/test/benchmark.py new file mode 100644 index 0000000..242e72f --- /dev/null +++ b/test/benchmark.py @@ -0,0 +1,77 @@ +#!/usr/bin/env python + +""" +Really simple rudimentary benchmark to compare ConnectionPool versus standard +urllib to demonstrate the usefulness of connection re-using. +""" +from __future__ import print_function + +import sys +import time +import urllib + +sys.path.append('../') +import urllib3 + + +# URLs to download. Doesn't matter as long as they're from the same host, so we +# can take advantage of connection re-using. +TO_DOWNLOAD = [ + 'http://code.google.com/apis/apps/', + 'http://code.google.com/apis/base/', + 'http://code.google.com/apis/blogger/', + 'http://code.google.com/apis/calendar/', + 'http://code.google.com/apis/codesearch/', + 'http://code.google.com/apis/contact/', + 'http://code.google.com/apis/books/', + 'http://code.google.com/apis/documents/', + 'http://code.google.com/apis/finance/', + 'http://code.google.com/apis/health/', + 'http://code.google.com/apis/notebook/', + 'http://code.google.com/apis/picasaweb/', + 'http://code.google.com/apis/spreadsheets/', + 'http://code.google.com/apis/webmastertools/', + 'http://code.google.com/apis/youtube/', +] + + +def urllib_get(url_list): + assert url_list + for url in url_list: + now = time.time() + r = urllib.urlopen(url) + elapsed = time.time() - now + print("Got in %0.3f: %s" % (elapsed, url)) + + +def pool_get(url_list): + assert url_list + pool = urllib3.PoolManager() + for url in url_list: + now = time.time() + r = pool.request('GET', url, assert_same_host=False) + elapsed = time.time() - now + print("Got in %0.3fs: %s" % (elapsed, url)) + + +if __name__ == '__main__': + print("Running pool_get ...") + now = time.time() + pool_get(TO_DOWNLOAD) + pool_elapsed = time.time() - now + + print("Running urllib_get ...") + now = time.time() + urllib_get(TO_DOWNLOAD) + urllib_elapsed = time.time() - now + + print("Completed pool_get in %0.3fs" % pool_elapsed) + print("Completed urllib_get in %0.3fs" % urllib_elapsed) + + +""" +Example results: + +Completed pool_get in 1.163s +Completed urllib_get in 2.318s +""" diff --git a/test/contrib/__init__.py b/test/contrib/__init__.py new file mode 100644 index 0000000..e69de29 --- /dev/null +++ b/test/contrib/__init__.py diff --git a/test/contrib/__init__.pyc b/test/contrib/__init__.pyc Binary files differnew file mode 100644 index 0000000..2d2fd5d --- /dev/null +++ b/test/contrib/__init__.pyc diff --git a/test/contrib/test_pyopenssl.py b/test/contrib/test_pyopenssl.py new file mode 100644 index 0000000..5d57527 --- /dev/null +++ b/test/contrib/test_pyopenssl.py @@ -0,0 +1,23 @@ +from nose.plugins.skip import SkipTest +from urllib3.packages import six + +if six.PY3: + raise SkipTest('Testing of PyOpenSSL disabled on PY3') + +try: + from urllib3.contrib.pyopenssl import (inject_into_urllib3, + extract_from_urllib3) +except ImportError as e: + raise SkipTest('Could not import PyOpenSSL: %r' % e) + + +from ..with_dummyserver.test_https import TestHTTPS, TestHTTPS_TLSv1 +from ..with_dummyserver.test_socketlevel import TestSNI, TestSocketClosing + + +def setup_module(): + inject_into_urllib3() + + +def teardown_module(): + extract_from_urllib3() diff --git a/test/contrib/test_pyopenssl.pyc b/test/contrib/test_pyopenssl.pyc Binary files differnew file mode 100644 index 0000000..6441273 --- /dev/null +++ b/test/contrib/test_pyopenssl.pyc diff --git a/test/port_helpers.py b/test/port_helpers.py new file mode 100644 index 0000000..e818a9b --- /dev/null +++ b/test/port_helpers.py @@ -0,0 +1,100 @@ +# These helpers are copied from test_support.py in the Python 2.7 standard +# library test suite. + +import socket + + +# Don't use "localhost", since resolving it uses the DNS under recent +# Windows versions (see issue #18792). +HOST = "127.0.0.1" +HOSTv6 = "::1" + +def find_unused_port(family=socket.AF_INET, socktype=socket.SOCK_STREAM): + """Returns an unused port that should be suitable for binding. This is + achieved by creating a temporary socket with the same family and type as + the 'sock' parameter (default is AF_INET, SOCK_STREAM), and binding it to + the specified host address (defaults to 0.0.0.0) with the port set to 0, + eliciting an unused ephemeral port from the OS. The temporary socket is + then closed and deleted, and the ephemeral port is returned. + + Either this method or bind_port() should be used for any tests where a + server socket needs to be bound to a particular port for the duration of + the test. Which one to use depends on whether the calling code is creating + a python socket, or if an unused port needs to be provided in a constructor + or passed to an external program (i.e. the -accept argument to openssl's + s_server mode). Always prefer bind_port() over find_unused_port() where + possible. Hard coded ports should *NEVER* be used. As soon as a server + socket is bound to a hard coded port, the ability to run multiple instances + of the test simultaneously on the same host is compromised, which makes the + test a ticking time bomb in a buildbot environment. On Unix buildbots, this + may simply manifest as a failed test, which can be recovered from without + intervention in most cases, but on Windows, the entire python process can + completely and utterly wedge, requiring someone to log in to the buildbot + and manually kill the affected process. + + (This is easy to reproduce on Windows, unfortunately, and can be traced to + the SO_REUSEADDR socket option having different semantics on Windows versus + Unix/Linux. On Unix, you can't have two AF_INET SOCK_STREAM sockets bind, + listen and then accept connections on identical host/ports. An EADDRINUSE + socket.error will be raised at some point (depending on the platform and + the order bind and listen were called on each socket). + + However, on Windows, if SO_REUSEADDR is set on the sockets, no EADDRINUSE + will ever be raised when attempting to bind two identical host/ports. When + accept() is called on each socket, the second caller's process will steal + the port from the first caller, leaving them both in an awkwardly wedged + state where they'll no longer respond to any signals or graceful kills, and + must be forcibly killed via OpenProcess()/TerminateProcess(). + + The solution on Windows is to use the SO_EXCLUSIVEADDRUSE socket option + instead of SO_REUSEADDR, which effectively affords the same semantics as + SO_REUSEADDR on Unix. Given the propensity of Unix developers in the Open + Source world compared to Windows ones, this is a common mistake. A quick + look over OpenSSL's 0.9.8g source shows that they use SO_REUSEADDR when + openssl.exe is called with the 's_server' option, for example. See + http://bugs.python.org/issue2550 for more info. The following site also + has a very thorough description about the implications of both REUSEADDR + and EXCLUSIVEADDRUSE on Windows: + http://msdn2.microsoft.com/en-us/library/ms740621(VS.85).aspx) + + XXX: although this approach is a vast improvement on previous attempts to + elicit unused ports, it rests heavily on the assumption that the ephemeral + port returned to us by the OS won't immediately be dished back out to some + other process when we close and delete our temporary socket but before our + calling code has a chance to bind the returned port. We can deal with this + issue if/when we come across it.""" + tempsock = socket.socket(family, socktype) + port = bind_port(tempsock) + tempsock.close() + del tempsock + return port + +def bind_port(sock, host=HOST): + """Bind the socket to a free port and return the port number. Relies on + ephemeral ports in order to ensure we are using an unbound port. This is + important as many tests may be running simultaneously, especially in a + buildbot environment. This method raises an exception if the sock.family + is AF_INET and sock.type is SOCK_STREAM, *and* the socket has SO_REUSEADDR + or SO_REUSEPORT set on it. Tests should *never* set these socket options + for TCP/IP sockets. The only case for setting these options is testing + multicasting via multiple UDP sockets. + + Additionally, if the SO_EXCLUSIVEADDRUSE socket option is available (i.e. + on Windows), it will be set on the socket. This will prevent anyone else + from bind()'ing to our host/port for the duration of the test. + """ + if sock.family == socket.AF_INET and sock.type == socket.SOCK_STREAM: + if hasattr(socket, 'SO_REUSEADDR'): + if sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) == 1: + raise ValueError("tests should never set the SO_REUSEADDR " \ + "socket option on TCP/IP sockets!") + if hasattr(socket, 'SO_REUSEPORT'): + if sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT) == 1: + raise ValueError("tests should never set the SO_REUSEPORT " \ + "socket option on TCP/IP sockets!") + if hasattr(socket, 'SO_EXCLUSIVEADDRUSE'): + sock.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1) + + sock.bind((host, 0)) + port = sock.getsockname()[1] + return port diff --git a/test/port_helpers.pyc b/test/port_helpers.pyc Binary files differnew file mode 100644 index 0000000..7a1c425 --- /dev/null +++ b/test/port_helpers.pyc diff --git a/test/test_collections.py b/test/test_collections.py new file mode 100644 index 0000000..4d173ac --- /dev/null +++ b/test/test_collections.py @@ -0,0 +1,180 @@ +import unittest + +from urllib3._collections import ( + HTTPHeaderDict, + RecentlyUsedContainer as Container +) +from urllib3.packages import six +xrange = six.moves.xrange + + +class TestLRUContainer(unittest.TestCase): + def test_maxsize(self): + d = Container(5) + + for i in xrange(5): + d[i] = str(i) + + self.assertEqual(len(d), 5) + + for i in xrange(5): + self.assertEqual(d[i], str(i)) + + d[i+1] = str(i+1) + + self.assertEqual(len(d), 5) + self.assertFalse(0 in d) + self.assertTrue(i+1 in d) + + def test_expire(self): + d = Container(5) + + for i in xrange(5): + d[i] = str(i) + + for i in xrange(5): + d.get(0) + + # Add one more entry + d[5] = '5' + + # Check state + self.assertEqual(list(d.keys()), [2, 3, 4, 0, 5]) + + def test_same_key(self): + d = Container(5) + + for i in xrange(10): + d['foo'] = i + + self.assertEqual(list(d.keys()), ['foo']) + self.assertEqual(len(d), 1) + + def test_access_ordering(self): + d = Container(5) + + for i in xrange(10): + d[i] = True + + # Keys should be ordered by access time + self.assertEqual(list(d.keys()), [5, 6, 7, 8, 9]) + + new_order = [7,8,6,9,5] + for k in new_order: + d[k] + + self.assertEqual(list(d.keys()), new_order) + + def test_delete(self): + d = Container(5) + + for i in xrange(5): + d[i] = True + + del d[0] + self.assertFalse(0 in d) + + d.pop(1) + self.assertFalse(1 in d) + + d.pop(1, None) + + def test_get(self): + d = Container(5) + + for i in xrange(5): + d[i] = True + + r = d.get(4) + self.assertEqual(r, True) + + r = d.get(5) + self.assertEqual(r, None) + + r = d.get(5, 42) + self.assertEqual(r, 42) + + self.assertRaises(KeyError, lambda: d[5]) + + def test_disposal(self): + evicted_items = [] + + def dispose_func(arg): + # Save the evicted datum for inspection + evicted_items.append(arg) + + d = Container(5, dispose_func=dispose_func) + for i in xrange(5): + d[i] = i + self.assertEqual(list(d.keys()), list(xrange(5))) + self.assertEqual(evicted_items, []) # Nothing disposed + + d[5] = 5 + self.assertEqual(list(d.keys()), list(xrange(1, 6))) + self.assertEqual(evicted_items, [0]) + + del d[1] + self.assertEqual(evicted_items, [0, 1]) + + d.clear() + self.assertEqual(evicted_items, [0, 1, 2, 3, 4, 5]) + + def test_iter(self): + d = Container() + + self.assertRaises(NotImplementedError, d.__iter__) + + +class TestHTTPHeaderDict(unittest.TestCase): + def setUp(self): + self.d = HTTPHeaderDict(A='foo') + self.d.add('a', 'bar') + + def test_overwriting_with_setitem_replaces(self): + d = HTTPHeaderDict() + + d['A'] = 'foo' + self.assertEqual(d['a'], 'foo') + + d['a'] = 'bar' + self.assertEqual(d['A'], 'bar') + + def test_copy(self): + h = self.d.copy() + self.assertTrue(self.d is not h) + self.assertEqual(self.d, h) + + def test_add(self): + d = HTTPHeaderDict() + + d['A'] = 'foo' + d.add('a', 'bar') + + self.assertEqual(d['a'], 'foo, bar') + self.assertEqual(d['A'], 'foo, bar') + + def test_getlist(self): + self.assertEqual(self.d.getlist('a'), ['foo', 'bar']) + self.assertEqual(self.d.getlist('A'), ['foo', 'bar']) + self.assertEqual(self.d.getlist('b'), []) + + def test_delitem(self): + del self.d['a'] + self.assertFalse('a' in self.d) + self.assertFalse('A' in self.d) + + def test_equal(self): + b = HTTPHeaderDict({'a': 'foo, bar'}) + self.assertEqual(self.d, b) + c = [('a', 'foo, bar')] + self.assertNotEqual(self.d, c) + + def test_len(self): + self.assertEqual(len(self.d), 1) + + def test_repr(self): + rep = "HTTPHeaderDict({'A': 'foo, bar'})" + self.assertEqual(repr(self.d), rep) + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_collections.pyc b/test/test_collections.pyc Binary files differnew file mode 100644 index 0000000..d1ecd73 --- /dev/null +++ b/test/test_collections.pyc diff --git a/test/test_compatibility.py b/test/test_compatibility.py new file mode 100644 index 0000000..05ee4de --- /dev/null +++ b/test/test_compatibility.py @@ -0,0 +1,23 @@ +import unittest +import warnings + +from urllib3.connection import HTTPConnection + + +class TestVersionCompatibility(unittest.TestCase): + def test_connection_strict(self): + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + + # strict=True is deprecated in Py33+ + conn = HTTPConnection('localhost', 12345, strict=True) + + if w: + self.fail('HTTPConnection raised warning on strict=True: %r' % w[0].message) + + def test_connection_source_address(self): + try: + # source_address does not exist in Py26- + conn = HTTPConnection('localhost', 12345, source_address='127.0.0.1') + except TypeError as e: + self.fail('HTTPConnection raised TypeError on source_adddress: %r' % e) diff --git a/test/test_compatibility.pyc b/test/test_compatibility.pyc Binary files differnew file mode 100644 index 0000000..2dfdf75 --- /dev/null +++ b/test/test_compatibility.pyc diff --git a/test/test_connectionpool.py b/test/test_connectionpool.py new file mode 100644 index 0000000..28fb89b --- /dev/null +++ b/test/test_connectionpool.py @@ -0,0 +1,211 @@ +import unittest + +from urllib3.connectionpool import ( + connection_from_url, + HTTPConnection, + HTTPConnectionPool, +) +from urllib3.util.timeout import Timeout +from urllib3.packages.ssl_match_hostname import CertificateError +from urllib3.exceptions import ( + ClosedPoolError, + EmptyPoolError, + HostChangedError, + LocationValueError, + MaxRetryError, + ProtocolError, + SSLError, +) + +from socket import error as SocketError +from ssl import SSLError as BaseSSLError + +try: # Python 3 + from queue import Empty + from http.client import HTTPException +except ImportError: + from Queue import Empty + from httplib import HTTPException + + +class TestConnectionPool(unittest.TestCase): + """ + Tests in this suite should exercise the ConnectionPool functionality + without actually making any network requests or connections. + """ + def test_same_host(self): + same_host = [ + ('http://google.com/', '/'), + ('http://google.com/', 'http://google.com/'), + ('http://google.com/', 'http://google.com'), + ('http://google.com/', 'http://google.com/abra/cadabra'), + ('http://google.com:42/', 'http://google.com:42/abracadabra'), + # Test comparison using default ports + ('http://google.com:80/', 'http://google.com/abracadabra'), + ('http://google.com/', 'http://google.com:80/abracadabra'), + ('https://google.com:443/', 'https://google.com/abracadabra'), + ('https://google.com/', 'https://google.com:443/abracadabra'), + ] + + for a, b in same_host: + c = connection_from_url(a) + self.assertTrue(c.is_same_host(b), "%s =? %s" % (a, b)) + + not_same_host = [ + ('https://google.com/', 'http://google.com/'), + ('http://google.com/', 'https://google.com/'), + ('http://yahoo.com/', 'http://google.com/'), + ('http://google.com:42', 'https://google.com/abracadabra'), + ('http://google.com', 'https://google.net/'), + # Test comparison with default ports + ('http://google.com:42', 'http://google.com'), + ('https://google.com:42', 'https://google.com'), + ('http://google.com:443', 'http://google.com'), + ('https://google.com:80', 'https://google.com'), + ('http://google.com:443', 'https://google.com'), + ('https://google.com:80', 'http://google.com'), + ('https://google.com:443', 'http://google.com'), + ('http://google.com:80', 'https://google.com'), + ] + + for a, b in not_same_host: + c = connection_from_url(a) + self.assertFalse(c.is_same_host(b), "%s =? %s" % (a, b)) + c = connection_from_url(b) + self.assertFalse(c.is_same_host(a), "%s =? %s" % (b, a)) + + + def test_max_connections(self): + pool = HTTPConnectionPool(host='localhost', maxsize=1, block=True) + + pool._get_conn(timeout=0.01) + + try: + pool._get_conn(timeout=0.01) + self.fail("Managed to get a connection without EmptyPoolError") + except EmptyPoolError: + pass + + try: + pool.request('GET', '/', pool_timeout=0.01) + self.fail("Managed to get a connection without EmptyPoolError") + except EmptyPoolError: + pass + + self.assertEqual(pool.num_connections, 1) + + def test_pool_edgecases(self): + pool = HTTPConnectionPool(host='localhost', maxsize=1, block=False) + + conn1 = pool._get_conn() + conn2 = pool._get_conn() # New because block=False + + pool._put_conn(conn1) + pool._put_conn(conn2) # Should be discarded + + self.assertEqual(conn1, pool._get_conn()) + self.assertNotEqual(conn2, pool._get_conn()) + + self.assertEqual(pool.num_connections, 3) + + def test_exception_str(self): + self.assertEqual( + str(EmptyPoolError(HTTPConnectionPool(host='localhost'), "Test.")), + "HTTPConnectionPool(host='localhost', port=None): Test.") + + def test_retry_exception_str(self): + self.assertEqual( + str(MaxRetryError( + HTTPConnectionPool(host='localhost'), "Test.", None)), + "HTTPConnectionPool(host='localhost', port=None): " + "Max retries exceeded with url: Test. (Caused by redirect)") + + err = SocketError("Test") + + # using err.__class__ here, as socket.error is an alias for OSError + # since Py3.3 and gets printed as this + self.assertEqual( + str(MaxRetryError( + HTTPConnectionPool(host='localhost'), "Test.", err)), + "HTTPConnectionPool(host='localhost', port=None): " + "Max retries exceeded with url: Test. " + "(Caused by %r)" % err) + + + def test_pool_size(self): + POOL_SIZE = 1 + pool = HTTPConnectionPool(host='localhost', maxsize=POOL_SIZE, block=True) + + def _raise(ex): + raise ex() + + def _test(exception, expect): + pool._make_request = lambda *args, **kwargs: _raise(exception) + self.assertRaises(expect, pool.request, 'GET', '/') + + self.assertEqual(pool.pool.qsize(), POOL_SIZE) + + # Make sure that all of the exceptions return the connection to the pool + _test(Empty, EmptyPoolError) + _test(BaseSSLError, SSLError) + _test(CertificateError, SSLError) + + # The pool should never be empty, and with these two exceptions being raised, + # a retry will be triggered, but that retry will fail, eventually raising + # MaxRetryError, not EmptyPoolError + # See: https://github.com/shazow/urllib3/issues/76 + pool._make_request = lambda *args, **kwargs: _raise(HTTPException) + self.assertRaises(MaxRetryError, pool.request, + 'GET', '/', retries=1, pool_timeout=0.01) + self.assertEqual(pool.pool.qsize(), POOL_SIZE) + + def test_assert_same_host(self): + c = connection_from_url('http://google.com:80') + + self.assertRaises(HostChangedError, c.request, + 'GET', 'http://yahoo.com:80', assert_same_host=True) + + def test_pool_close(self): + pool = connection_from_url('http://google.com:80') + + # Populate with some connections + conn1 = pool._get_conn() + conn2 = pool._get_conn() + conn3 = pool._get_conn() + pool._put_conn(conn1) + pool._put_conn(conn2) + + old_pool_queue = pool.pool + + pool.close() + self.assertEqual(pool.pool, None) + + self.assertRaises(ClosedPoolError, pool._get_conn) + + pool._put_conn(conn3) + + self.assertRaises(ClosedPoolError, pool._get_conn) + + self.assertRaises(Empty, old_pool_queue.get, block=False) + + def test_pool_timeouts(self): + pool = HTTPConnectionPool(host='localhost') + conn = pool._new_conn() + self.assertEqual(conn.__class__, HTTPConnection) + self.assertEqual(pool.timeout.__class__, Timeout) + self.assertEqual(pool.timeout._read, Timeout.DEFAULT_TIMEOUT) + self.assertEqual(pool.timeout._connect, Timeout.DEFAULT_TIMEOUT) + self.assertEqual(pool.timeout.total, None) + + pool = HTTPConnectionPool(host='localhost', timeout=3) + self.assertEqual(pool.timeout._read, 3) + self.assertEqual(pool.timeout._connect, 3) + self.assertEqual(pool.timeout.total, None) + + def test_no_host(self): + self.assertRaises(LocationValueError, HTTPConnectionPool, None) + + + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_connectionpool.pyc b/test/test_connectionpool.pyc Binary files differnew file mode 100644 index 0000000..e87a3b3 --- /dev/null +++ b/test/test_connectionpool.pyc diff --git a/test/test_exceptions.py b/test/test_exceptions.py new file mode 100644 index 0000000..4190a61 --- /dev/null +++ b/test/test_exceptions.py @@ -0,0 +1,46 @@ +import unittest +import pickle + +from urllib3.exceptions import (HTTPError, MaxRetryError, LocationParseError, + ClosedPoolError, EmptyPoolError, + HostChangedError, ReadTimeoutError, + ConnectTimeoutError) +from urllib3.connectionpool import HTTPConnectionPool + + + +class TestPickle(unittest.TestCase): + + def verify_pickling(self, item): + return pickle.loads(pickle.dumps(item)) + + def test_exceptions(self): + assert self.verify_pickling(HTTPError(None)) + assert self.verify_pickling(MaxRetryError(None, None, None)) + assert self.verify_pickling(LocationParseError(None)) + assert self.verify_pickling(ConnectTimeoutError(None)) + + def test_exceptions_with_objects(self): + assert self.verify_pickling( + HTTPError('foo')) + + assert self.verify_pickling( + HTTPError('foo', IOError('foo'))) + + assert self.verify_pickling( + MaxRetryError(HTTPConnectionPool('localhost'), '/', None)) + + assert self.verify_pickling( + LocationParseError('fake location')) + + assert self.verify_pickling( + ClosedPoolError(HTTPConnectionPool('localhost'), None)) + + assert self.verify_pickling( + EmptyPoolError(HTTPConnectionPool('localhost'), None)) + + assert self.verify_pickling( + HostChangedError(HTTPConnectionPool('localhost'), '/', None)) + + assert self.verify_pickling( + ReadTimeoutError(HTTPConnectionPool('localhost'), '/', None)) diff --git a/test/test_exceptions.pyc b/test/test_exceptions.pyc Binary files differnew file mode 100644 index 0000000..3274e34 --- /dev/null +++ b/test/test_exceptions.pyc diff --git a/test/test_fields.py b/test/test_fields.py new file mode 100644 index 0000000..cdec68b --- /dev/null +++ b/test/test_fields.py @@ -0,0 +1,49 @@ +import unittest + +from urllib3.fields import guess_content_type, RequestField +from urllib3.packages.six import u + + +class TestRequestField(unittest.TestCase): + + def test_guess_content_type(self): + self.assertTrue(guess_content_type('image.jpg') in + ['image/jpeg', 'image/pjpeg']) + self.assertEqual(guess_content_type('notsure'), + 'application/octet-stream') + self.assertEqual(guess_content_type(None), 'application/octet-stream') + + def test_create(self): + simple_field = RequestField('somename', 'data') + self.assertEqual(simple_field.render_headers(), '\r\n') + filename_field = RequestField('somename', 'data', + filename='somefile.txt') + self.assertEqual(filename_field.render_headers(), '\r\n') + headers_field = RequestField('somename', 'data', + headers={'Content-Length': 4}) + self.assertEqual( + headers_field.render_headers(), 'Content-Length: 4\r\n\r\n') + + def test_make_multipart(self): + field = RequestField('somename', 'data') + field.make_multipart(content_type='image/jpg', + content_location='/test') + self.assertEqual( + field.render_headers(), + 'Content-Disposition: form-data; name="somename"\r\n' + 'Content-Type: image/jpg\r\n' + 'Content-Location: /test\r\n' + '\r\n') + + def test_render_parts(self): + field = RequestField('somename', 'data') + parts = field._render_parts({'name': 'value', 'filename': 'value'}) + self.assertTrue('name="value"' in parts) + self.assertTrue('filename="value"' in parts) + parts = field._render_parts([('name', 'value'), ('filename', 'value')]) + self.assertEqual(parts, 'name="value"; filename="value"') + + def test_render_part(self): + field = RequestField('somename', 'data') + param = field._render_part('filename', u('n\u00e4me')) + self.assertEqual(param, "filename*=utf-8''n%C3%A4me") diff --git a/test/test_fields.pyc b/test/test_fields.pyc Binary files differnew file mode 100644 index 0000000..4622899 --- /dev/null +++ b/test/test_fields.pyc diff --git a/test/test_filepost.py b/test/test_filepost.py new file mode 100644 index 0000000..390dbb3 --- /dev/null +++ b/test/test_filepost.py @@ -0,0 +1,133 @@ +import unittest + +from urllib3.filepost import encode_multipart_formdata, iter_fields +from urllib3.fields import RequestField +from urllib3.packages.six import b, u + + +BOUNDARY = '!! test boundary !!' + + +class TestIterfields(unittest.TestCase): + + def test_dict(self): + for fieldname, value in iter_fields(dict(a='b')): + self.assertEqual((fieldname, value), ('a', 'b')) + + self.assertEqual( + list(sorted(iter_fields(dict(a='b', c='d')))), + [('a', 'b'), ('c', 'd')]) + + def test_tuple_list(self): + for fieldname, value in iter_fields([('a', 'b')]): + self.assertEqual((fieldname, value), ('a', 'b')) + + self.assertEqual( + list(iter_fields([('a', 'b'), ('c', 'd')])), + [('a', 'b'), ('c', 'd')]) + + +class TestMultipartEncoding(unittest.TestCase): + + def test_input_datastructures(self): + fieldsets = [ + dict(k='v', k2='v2'), + [('k', 'v'), ('k2', 'v2')], + ] + + for fields in fieldsets: + encoded, _ = encode_multipart_formdata(fields, boundary=BOUNDARY) + self.assertEqual(encoded.count(b(BOUNDARY)), 3) + + + def test_field_encoding(self): + fieldsets = [ + [('k', 'v'), ('k2', 'v2')], + [('k', b'v'), (u('k2'), b'v2')], + [('k', b'v'), (u('k2'), 'v2')], + ] + + for fields in fieldsets: + encoded, content_type = encode_multipart_formdata(fields, boundary=BOUNDARY) + + self.assertEqual(encoded, + b'--' + b(BOUNDARY) + b'\r\n' + b'Content-Disposition: form-data; name="k"\r\n' + b'\r\n' + b'v\r\n' + b'--' + b(BOUNDARY) + b'\r\n' + b'Content-Disposition: form-data; name="k2"\r\n' + b'\r\n' + b'v2\r\n' + b'--' + b(BOUNDARY) + b'--\r\n' + , fields) + + self.assertEqual(content_type, + 'multipart/form-data; boundary=' + str(BOUNDARY)) + + + def test_filename(self): + fields = [('k', ('somename', b'v'))] + + encoded, content_type = encode_multipart_formdata(fields, boundary=BOUNDARY) + + self.assertEqual(encoded, + b'--' + b(BOUNDARY) + b'\r\n' + b'Content-Disposition: form-data; name="k"; filename="somename"\r\n' + b'Content-Type: application/octet-stream\r\n' + b'\r\n' + b'v\r\n' + b'--' + b(BOUNDARY) + b'--\r\n' + ) + + self.assertEqual(content_type, + 'multipart/form-data; boundary=' + str(BOUNDARY)) + + + def test_textplain(self): + fields = [('k', ('somefile.txt', b'v'))] + + encoded, content_type = encode_multipart_formdata(fields, boundary=BOUNDARY) + + self.assertEqual(encoded, + b'--' + b(BOUNDARY) + b'\r\n' + b'Content-Disposition: form-data; name="k"; filename="somefile.txt"\r\n' + b'Content-Type: text/plain\r\n' + b'\r\n' + b'v\r\n' + b'--' + b(BOUNDARY) + b'--\r\n' + ) + + self.assertEqual(content_type, + 'multipart/form-data; boundary=' + str(BOUNDARY)) + + + def test_explicit(self): + fields = [('k', ('somefile.txt', b'v', 'image/jpeg'))] + + encoded, content_type = encode_multipart_formdata(fields, boundary=BOUNDARY) + + self.assertEqual(encoded, + b'--' + b(BOUNDARY) + b'\r\n' + b'Content-Disposition: form-data; name="k"; filename="somefile.txt"\r\n' + b'Content-Type: image/jpeg\r\n' + b'\r\n' + b'v\r\n' + b'--' + b(BOUNDARY) + b'--\r\n' + ) + + self.assertEqual(content_type, + 'multipart/form-data; boundary=' + str(BOUNDARY)) + + def test_request_fields(self): + fields = [RequestField('k', b'v', filename='somefile.txt', headers={'Content-Type': 'image/jpeg'})] + + encoded, content_type = encode_multipart_formdata(fields, boundary=BOUNDARY) + + self.assertEqual(encoded, + b'--' + b(BOUNDARY) + b'\r\n' + b'Content-Type: image/jpeg\r\n' + b'\r\n' + b'v\r\n' + b'--' + b(BOUNDARY) + b'--\r\n' + ) diff --git a/test/test_filepost.pyc b/test/test_filepost.pyc Binary files differnew file mode 100644 index 0000000..ec54472 --- /dev/null +++ b/test/test_filepost.pyc diff --git a/test/test_poolmanager.py b/test/test_poolmanager.py new file mode 100644 index 0000000..754ee8a --- /dev/null +++ b/test/test_poolmanager.py @@ -0,0 +1,76 @@ +import unittest + +from urllib3.poolmanager import PoolManager +from urllib3 import connection_from_url +from urllib3.exceptions import ( + ClosedPoolError, + LocationValueError, +) + + +class TestPoolManager(unittest.TestCase): + def test_same_url(self): + # Convince ourselves that normally we don't get the same object + conn1 = connection_from_url('http://localhost:8081/foo') + conn2 = connection_from_url('http://localhost:8081/bar') + + self.assertNotEqual(conn1, conn2) + + # Now try again using the PoolManager + p = PoolManager(1) + + conn1 = p.connection_from_url('http://localhost:8081/foo') + conn2 = p.connection_from_url('http://localhost:8081/bar') + + self.assertEqual(conn1, conn2) + + def test_many_urls(self): + urls = [ + "http://localhost:8081/foo", + "http://www.google.com/mail", + "http://localhost:8081/bar", + "https://www.google.com/", + "https://www.google.com/mail", + "http://yahoo.com", + "http://bing.com", + "http://yahoo.com/", + ] + + connections = set() + + p = PoolManager(10) + + for url in urls: + conn = p.connection_from_url(url) + connections.add(conn) + + self.assertEqual(len(connections), 5) + + def test_manager_clear(self): + p = PoolManager(5) + + conn_pool = p.connection_from_url('http://google.com') + self.assertEqual(len(p.pools), 1) + + conn = conn_pool._get_conn() + + p.clear() + self.assertEqual(len(p.pools), 0) + + self.assertRaises(ClosedPoolError, conn_pool._get_conn) + + conn_pool._put_conn(conn) + + self.assertRaises(ClosedPoolError, conn_pool._get_conn) + + self.assertEqual(len(p.pools), 0) + + + def test_nohost(self): + p = PoolManager(5) + self.assertRaises(LocationValueError, p.connection_from_url, 'http://@') + self.assertRaises(LocationValueError, p.connection_from_url, None) + + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_poolmanager.pyc b/test/test_poolmanager.pyc Binary files differnew file mode 100644 index 0000000..077c2ac --- /dev/null +++ b/test/test_poolmanager.pyc diff --git a/test/test_proxymanager.py b/test/test_proxymanager.py new file mode 100644 index 0000000..e7b5c48 --- /dev/null +++ b/test/test_proxymanager.py @@ -0,0 +1,43 @@ +import unittest + +from urllib3.poolmanager import ProxyManager + + +class TestProxyManager(unittest.TestCase): + def test_proxy_headers(self): + p = ProxyManager('http://something:1234') + url = 'http://pypi.python.org/test' + + # Verify default headers + default_headers = {'Accept': '*/*', + 'Host': 'pypi.python.org'} + headers = p._set_proxy_headers(url) + + self.assertEqual(headers, default_headers) + + # Verify default headers don't overwrite provided headers + provided_headers = {'Accept': 'application/json', + 'custom': 'header', + 'Host': 'test.python.org'} + headers = p._set_proxy_headers(url, provided_headers) + + self.assertEqual(headers, provided_headers) + + # Verify proxy with nonstandard port + provided_headers = {'Accept': 'application/json'} + expected_headers = provided_headers.copy() + expected_headers.update({'Host': 'pypi.python.org:8080'}) + url_with_port = 'http://pypi.python.org:8080/test' + headers = p._set_proxy_headers(url_with_port, provided_headers) + + self.assertEqual(headers, expected_headers) + + def test_default_port(self): + p = ProxyManager('http://something') + self.assertEqual(p.proxy.port, 80) + p = ProxyManager('https://something') + self.assertEqual(p.proxy.port, 443) + + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_proxymanager.pyc b/test/test_proxymanager.pyc Binary files differnew file mode 100644 index 0000000..3696ee8 --- /dev/null +++ b/test/test_proxymanager.pyc diff --git a/test/test_response.py b/test/test_response.py new file mode 100644 index 0000000..7d67c93 --- /dev/null +++ b/test/test_response.py @@ -0,0 +1,398 @@ +import unittest + +from io import BytesIO, BufferedReader + +from urllib3.response import HTTPResponse +from urllib3.exceptions import DecodeError + + +from base64 import b64decode + +# A known random (i.e, not-too-compressible) payload generated with: +# "".join(random.choice(string.printable) for i in xrange(512)) +# .encode("zlib").encode("base64") +# Randomness in tests == bad, and fixing a seed may not be sufficient. +ZLIB_PAYLOAD = b64decode(b"""\ +eJwFweuaoQAAANDfineQhiKLUiaiCzvuTEmNNlJGiL5QhnGpZ99z8luQfe1AHoMioB+QSWHQu/L+ +lzd7W5CipqYmeVTBjdgSATdg4l4Z2zhikbuF+EKn69Q0DTpdmNJz8S33odfJoVEexw/l2SS9nFdi +pis7KOwXzfSqarSo9uJYgbDGrs1VNnQpT9f8zAorhYCEZronZQF9DuDFfNK3Hecc+WHLnZLQptwk +nufw8S9I43sEwxsT71BiqedHo0QeIrFE01F/4atVFXuJs2yxIOak3bvtXjUKAA6OKnQJ/nNvDGKZ +Khe5TF36JbnKVjdcL1EUNpwrWVfQpFYJ/WWm2b74qNeSZeQv5/xBhRdOmKTJFYgO96PwrHBlsnLn +a3l0LwJsloWpMbzByU5WLbRE6X5INFqjQOtIwYz5BAlhkn+kVqJvWM5vBlfrwP42ifonM5yF4ciJ +auHVks62997mNGOsM7WXNG3P98dBHPo2NhbTvHleL0BI5dus2JY81MUOnK3SGWLH8HeWPa1t5KcW +S5moAj5HexY/g/F8TctpxwsvyZp38dXeLDjSQvEQIkF7XR3YXbeZgKk3V34KGCPOAeeuQDIgyVhV +nP4HF2uWHA==""") + + +class TestLegacyResponse(unittest.TestCase): + def test_getheaders(self): + headers = {'host': 'example.com'} + r = HTTPResponse(headers=headers) + self.assertEqual(r.getheaders(), headers) + + def test_getheader(self): + headers = {'host': 'example.com'} + r = HTTPResponse(headers=headers) + self.assertEqual(r.getheader('host'), 'example.com') + + +class TestResponse(unittest.TestCase): + def test_cache_content(self): + r = HTTPResponse('foo') + self.assertEqual(r.data, 'foo') + self.assertEqual(r._body, 'foo') + + def test_default(self): + r = HTTPResponse() + self.assertEqual(r.data, None) + + def test_none(self): + r = HTTPResponse(None) + self.assertEqual(r.data, None) + + def test_preload(self): + fp = BytesIO(b'foo') + + r = HTTPResponse(fp, preload_content=True) + + self.assertEqual(fp.tell(), len(b'foo')) + self.assertEqual(r.data, b'foo') + + def test_no_preload(self): + fp = BytesIO(b'foo') + + r = HTTPResponse(fp, preload_content=False) + + self.assertEqual(fp.tell(), 0) + self.assertEqual(r.data, b'foo') + self.assertEqual(fp.tell(), len(b'foo')) + + def test_decode_bad_data(self): + fp = BytesIO(b'\x00' * 10) + self.assertRaises(DecodeError, HTTPResponse, fp, headers={ + 'content-encoding': 'deflate' + }) + + def test_decode_deflate(self): + import zlib + data = zlib.compress(b'foo') + + fp = BytesIO(data) + r = HTTPResponse(fp, headers={'content-encoding': 'deflate'}) + + self.assertEqual(r.data, b'foo') + + def test_decode_deflate_case_insensitve(self): + import zlib + data = zlib.compress(b'foo') + + fp = BytesIO(data) + r = HTTPResponse(fp, headers={'content-encoding': 'DeFlAtE'}) + + self.assertEqual(r.data, b'foo') + + def test_chunked_decoding_deflate(self): + import zlib + data = zlib.compress(b'foo') + + fp = BytesIO(data) + r = HTTPResponse(fp, headers={'content-encoding': 'deflate'}, + preload_content=False) + + self.assertEqual(r.read(3), b'') + self.assertEqual(r.read(1), b'f') + self.assertEqual(r.read(2), b'oo') + + def test_chunked_decoding_deflate2(self): + import zlib + compress = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS) + data = compress.compress(b'foo') + data += compress.flush() + + fp = BytesIO(data) + r = HTTPResponse(fp, headers={'content-encoding': 'deflate'}, + preload_content=False) + + self.assertEqual(r.read(1), b'') + self.assertEqual(r.read(1), b'f') + self.assertEqual(r.read(2), b'oo') + + def test_chunked_decoding_gzip(self): + import zlib + compress = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS) + data = compress.compress(b'foo') + data += compress.flush() + + fp = BytesIO(data) + r = HTTPResponse(fp, headers={'content-encoding': 'gzip'}, + preload_content=False) + + self.assertEqual(r.read(11), b'') + self.assertEqual(r.read(1), b'f') + self.assertEqual(r.read(2), b'oo') + + def test_body_blob(self): + resp = HTTPResponse(b'foo') + self.assertEqual(resp.data, b'foo') + self.assertTrue(resp.closed) + + def test_io(self): + import socket + try: + from http.client import HTTPResponse as OldHTTPResponse + except: + from httplib import HTTPResponse as OldHTTPResponse + + fp = BytesIO(b'foo') + resp = HTTPResponse(fp, preload_content=False) + + self.assertEqual(resp.closed, False) + self.assertEqual(resp.readable(), True) + self.assertEqual(resp.writable(), False) + self.assertRaises(IOError, resp.fileno) + + resp.close() + self.assertEqual(resp.closed, True) + + # Try closing with an `httplib.HTTPResponse`, because it has an + # `isclosed` method. + hlr = OldHTTPResponse(socket.socket()) + resp2 = HTTPResponse(hlr, preload_content=False) + self.assertEqual(resp2.closed, False) + resp2.close() + self.assertEqual(resp2.closed, True) + + #also try when only data is present. + resp3 = HTTPResponse('foodata') + self.assertRaises(IOError, resp3.fileno) + + resp3._fp = 2 + # A corner case where _fp is present but doesn't have `closed`, + # `isclosed`, or `fileno`. Unlikely, but possible. + self.assertEqual(resp3.closed, True) + self.assertRaises(IOError, resp3.fileno) + + def test_io_bufferedreader(self): + fp = BytesIO(b'foo') + resp = HTTPResponse(fp, preload_content=False) + br = BufferedReader(resp) + + self.assertEqual(br.read(), b'foo') + + br.close() + self.assertEqual(resp.closed, True) + + b = b'fooandahalf' + fp = BytesIO(b) + resp = HTTPResponse(fp, preload_content=False) + br = BufferedReader(resp, 5) + + br.read(1) # sets up the buffer, reading 5 + self.assertEqual(len(fp.read()), len(b) - 5) + + # This is necessary to make sure the "no bytes left" part of `readinto` + # gets tested. + while not br.closed: + br.read(5) + + def test_io_readinto(self): + # This test is necessary because in py2.6, `readinto` doesn't get called + # in `test_io_bufferedreader` like it does for all the other python + # versions. Probably this is because the `io` module in py2.6 is an + # old version that has a different underlying implementation. + + + fp = BytesIO(b'foo') + resp = HTTPResponse(fp, preload_content=False) + + barr = bytearray(3) + assert resp.readinto(barr) == 3 + assert b'foo' == barr + + # The reader should already be empty, so this should read nothing. + assert resp.readinto(barr) == 0 + assert b'foo' == barr + + def test_streaming(self): + fp = BytesIO(b'foo') + resp = HTTPResponse(fp, preload_content=False) + stream = resp.stream(2, decode_content=False) + + self.assertEqual(next(stream), b'fo') + self.assertEqual(next(stream), b'o') + self.assertRaises(StopIteration, next, stream) + + def test_streaming_tell(self): + fp = BytesIO(b'foo') + resp = HTTPResponse(fp, preload_content=False) + stream = resp.stream(2, decode_content=False) + + position = 0 + + position += len(next(stream)) + self.assertEqual(2, position) + self.assertEqual(position, resp.tell()) + + position += len(next(stream)) + self.assertEqual(3, position) + self.assertEqual(position, resp.tell()) + + self.assertRaises(StopIteration, next, stream) + + def test_gzipped_streaming(self): + import zlib + compress = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS) + data = compress.compress(b'foo') + data += compress.flush() + + fp = BytesIO(data) + resp = HTTPResponse(fp, headers={'content-encoding': 'gzip'}, + preload_content=False) + stream = resp.stream(2) + + self.assertEqual(next(stream), b'f') + self.assertEqual(next(stream), b'oo') + self.assertRaises(StopIteration, next, stream) + + def test_gzipped_streaming_tell(self): + import zlib + compress = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS) + uncompressed_data = b'foo' + data = compress.compress(uncompressed_data) + data += compress.flush() + + fp = BytesIO(data) + resp = HTTPResponse(fp, headers={'content-encoding': 'gzip'}, + preload_content=False) + stream = resp.stream() + + # Read everything + payload = next(stream) + self.assertEqual(payload, uncompressed_data) + + self.assertEqual(len(data), resp.tell()) + + self.assertRaises(StopIteration, next, stream) + + def test_deflate_streaming_tell_intermediate_point(self): + # Ensure that ``tell()`` returns the correct number of bytes when + # part-way through streaming compressed content. + import zlib + + NUMBER_OF_READS = 10 + + class MockCompressedDataReading(BytesIO): + """ + A ByteIO-like reader returning ``payload`` in ``NUMBER_OF_READS`` + calls to ``read``. + """ + + def __init__(self, payload, payload_part_size): + self.payloads = [ + payload[i*payload_part_size:(i+1)*payload_part_size] + for i in range(NUMBER_OF_READS+1)] + + assert b"".join(self.payloads) == payload + + def read(self, _): + # Amount is unused. + if len(self.payloads) > 0: + return self.payloads.pop(0) + return b"" + + uncompressed_data = zlib.decompress(ZLIB_PAYLOAD) + + payload_part_size = len(ZLIB_PAYLOAD) // NUMBER_OF_READS + fp = MockCompressedDataReading(ZLIB_PAYLOAD, payload_part_size) + resp = HTTPResponse(fp, headers={'content-encoding': 'deflate'}, + preload_content=False) + stream = resp.stream() + + parts_positions = [(part, resp.tell()) for part in stream] + end_of_stream = resp.tell() + + self.assertRaises(StopIteration, next, stream) + + parts, positions = zip(*parts_positions) + + # Check that the payload is equal to the uncompressed data + payload = b"".join(parts) + self.assertEqual(uncompressed_data, payload) + + # Check that the positions in the stream are correct + expected = [(i+1)*payload_part_size for i in range(NUMBER_OF_READS)] + self.assertEqual(expected, list(positions)) + + # Check that the end of the stream is in the correct place + self.assertEqual(len(ZLIB_PAYLOAD), end_of_stream) + + def test_deflate_streaming(self): + import zlib + data = zlib.compress(b'foo') + + fp = BytesIO(data) + resp = HTTPResponse(fp, headers={'content-encoding': 'deflate'}, + preload_content=False) + stream = resp.stream(2) + + self.assertEqual(next(stream), b'f') + self.assertEqual(next(stream), b'oo') + self.assertRaises(StopIteration, next, stream) + + def test_deflate2_streaming(self): + import zlib + compress = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS) + data = compress.compress(b'foo') + data += compress.flush() + + fp = BytesIO(data) + resp = HTTPResponse(fp, headers={'content-encoding': 'deflate'}, + preload_content=False) + stream = resp.stream(2) + + self.assertEqual(next(stream), b'f') + self.assertEqual(next(stream), b'oo') + self.assertRaises(StopIteration, next, stream) + + def test_empty_stream(self): + fp = BytesIO(b'') + resp = HTTPResponse(fp, preload_content=False) + stream = resp.stream(2, decode_content=False) + + self.assertRaises(StopIteration, next, stream) + + def test_mock_httpresponse_stream(self): + # Mock out a HTTP Request that does enough to make it through urllib3's + # read() and close() calls, and also exhausts and underlying file + # object. + class MockHTTPRequest(object): + self.fp = None + + def read(self, amt): + data = self.fp.read(amt) + if not data: + self.fp = None + + return data + + def close(self): + self.fp = None + + bio = BytesIO(b'foo') + fp = MockHTTPRequest() + fp.fp = bio + resp = HTTPResponse(fp, preload_content=False) + stream = resp.stream(2) + + self.assertEqual(next(stream), b'fo') + self.assertEqual(next(stream), b'o') + self.assertRaises(StopIteration, next, stream) + + def test_get_case_insensitive_headers(self): + headers = {'host': 'example.com'} + r = HTTPResponse(headers=headers) + self.assertEqual(r.headers.get('host'), 'example.com') + self.assertEqual(r.headers.get('Host'), 'example.com') + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_response.pyc b/test/test_response.pyc Binary files differnew file mode 100644 index 0000000..99e5c0e --- /dev/null +++ b/test/test_response.pyc diff --git a/test/test_retry.py b/test/test_retry.py new file mode 100644 index 0000000..7a3aa40 --- /dev/null +++ b/test/test_retry.py @@ -0,0 +1,156 @@ +import unittest + +from urllib3.packages.six.moves import xrange +from urllib3.util.retry import Retry +from urllib3.exceptions import ( + ConnectTimeoutError, + ReadTimeoutError, + MaxRetryError +) + + +class RetryTest(unittest.TestCase): + + def test_string(self): + """ Retry string representation looks the way we expect """ + retry = Retry() + self.assertEqual(str(retry), 'Retry(total=10, connect=None, read=None, redirect=None)') + for _ in range(3): + retry = retry.increment() + self.assertEqual(str(retry), 'Retry(total=7, connect=None, read=None, redirect=None)') + + def test_retry_both_specified(self): + """Total can win if it's lower than the connect value""" + error = ConnectTimeoutError() + retry = Retry(connect=3, total=2) + retry = retry.increment(error=error) + retry = retry.increment(error=error) + try: + retry.increment(error=error) + self.fail("Failed to raise error.") + except MaxRetryError as e: + self.assertEqual(e.reason, error) + + def test_retry_higher_total_loses(self): + """ A lower connect timeout than the total is honored """ + error = ConnectTimeoutError() + retry = Retry(connect=2, total=3) + retry = retry.increment(error=error) + retry = retry.increment(error=error) + self.assertRaises(MaxRetryError, retry.increment, error=error) + + def test_retry_higher_total_loses_vs_read(self): + """ A lower read timeout than the total is honored """ + error = ReadTimeoutError(None, "/", "read timed out") + retry = Retry(read=2, total=3) + retry = retry.increment(error=error) + retry = retry.increment(error=error) + self.assertRaises(MaxRetryError, retry.increment, error=error) + + def test_retry_total_none(self): + """ if Total is none, connect error should take precedence """ + error = ConnectTimeoutError() + retry = Retry(connect=2, total=None) + retry = retry.increment(error=error) + retry = retry.increment(error=error) + try: + retry.increment(error=error) + self.fail("Failed to raise error.") + except MaxRetryError as e: + self.assertEqual(e.reason, error) + + error = ReadTimeoutError(None, "/", "read timed out") + retry = Retry(connect=2, total=None) + retry = retry.increment(error=error) + retry = retry.increment(error=error) + retry = retry.increment(error=error) + self.assertFalse(retry.is_exhausted()) + + def test_retry_default(self): + """ If no value is specified, should retry connects 3 times """ + retry = Retry() + self.assertEqual(retry.total, 10) + self.assertEqual(retry.connect, None) + self.assertEqual(retry.read, None) + self.assertEqual(retry.redirect, None) + + error = ConnectTimeoutError() + retry = Retry(connect=1) + retry = retry.increment(error=error) + self.assertRaises(MaxRetryError, retry.increment, error=error) + + retry = Retry(connect=1) + retry = retry.increment(error=error) + self.assertFalse(retry.is_exhausted()) + + self.assertTrue(Retry(0).raise_on_redirect) + self.assertFalse(Retry(False).raise_on_redirect) + + def test_retry_read_zero(self): + """ No second chances on read timeouts, by default """ + error = ReadTimeoutError(None, "/", "read timed out") + retry = Retry(read=0) + try: + retry.increment(error=error) + self.fail("Failed to raise error.") + except MaxRetryError as e: + self.assertEqual(e.reason, error) + + def test_backoff(self): + """ Backoff is computed correctly """ + max_backoff = Retry.BACKOFF_MAX + + retry = Retry(total=100, backoff_factor=0.2) + self.assertEqual(retry.get_backoff_time(), 0) # First request + + retry = retry.increment() + self.assertEqual(retry.get_backoff_time(), 0) # First retry + + retry = retry.increment() + self.assertEqual(retry.backoff_factor, 0.2) + self.assertEqual(retry.total, 98) + self.assertEqual(retry.get_backoff_time(), 0.4) # Start backoff + + retry = retry.increment() + self.assertEqual(retry.get_backoff_time(), 0.8) + + retry = retry.increment() + self.assertEqual(retry.get_backoff_time(), 1.6) + + for i in xrange(10): + retry = retry.increment() + + self.assertEqual(retry.get_backoff_time(), max_backoff) + + def test_zero_backoff(self): + retry = Retry() + self.assertEqual(retry.get_backoff_time(), 0) + retry = retry.increment() + retry = retry.increment() + self.assertEqual(retry.get_backoff_time(), 0) + + def test_sleep(self): + # sleep a very small amount of time so our code coverage is happy + retry = Retry(backoff_factor=0.0001) + retry = retry.increment() + retry = retry.increment() + retry.sleep() + + def test_status_forcelist(self): + retry = Retry(status_forcelist=xrange(500,600)) + self.assertFalse(retry.is_forced_retry('GET', status_code=200)) + self.assertFalse(retry.is_forced_retry('GET', status_code=400)) + self.assertTrue(retry.is_forced_retry('GET', status_code=500)) + + retry = Retry(total=1, status_forcelist=[418]) + self.assertFalse(retry.is_forced_retry('GET', status_code=400)) + self.assertTrue(retry.is_forced_retry('GET', status_code=418)) + + def test_exhausted(self): + self.assertFalse(Retry(0).is_exhausted()) + self.assertTrue(Retry(-1).is_exhausted()) + self.assertEqual(Retry(1).increment().total, 0) + + def test_disabled(self): + self.assertRaises(MaxRetryError, Retry(-1).increment) + self.assertRaises(MaxRetryError, Retry(0).increment) diff --git a/test/test_retry.pyc b/test/test_retry.pyc Binary files differnew file mode 100644 index 0000000..398c010 --- /dev/null +++ b/test/test_retry.pyc diff --git a/test/test_util.py b/test/test_util.py new file mode 100644 index 0000000..1811dbd --- /dev/null +++ b/test/test_util.py @@ -0,0 +1,357 @@ +import warnings +import logging +import unittest +import ssl + +from mock import patch + +from urllib3 import add_stderr_logger, disable_warnings +from urllib3.util.request import make_headers +from urllib3.util.timeout import Timeout +from urllib3.util.url import ( + get_host, + parse_url, + split_first, + Url, +) +from urllib3.util.ssl_ import resolve_cert_reqs +from urllib3.exceptions import ( + LocationParseError, + TimeoutStateError, + InsecureRequestWarning, +) + +from urllib3.util import is_fp_closed + +from . import clear_warnings + +# This number represents a time in seconds, it doesn't mean anything in +# isolation. Setting to a high-ish value to avoid conflicts with the smaller +# numbers used for timeouts +TIMEOUT_EPOCH = 1000 + +class TestUtil(unittest.TestCase): + def test_get_host(self): + url_host_map = { + # Hosts + 'http://google.com/mail': ('http', 'google.com', None), + 'http://google.com/mail/': ('http', 'google.com', None), + 'google.com/mail': ('http', 'google.com', None), + 'http://google.com/': ('http', 'google.com', None), + 'http://google.com': ('http', 'google.com', None), + 'http://www.google.com': ('http', 'www.google.com', None), + 'http://mail.google.com': ('http', 'mail.google.com', None), + 'http://google.com:8000/mail/': ('http', 'google.com', 8000), + 'http://google.com:8000': ('http', 'google.com', 8000), + 'https://google.com': ('https', 'google.com', None), + 'https://google.com:8000': ('https', 'google.com', 8000), + 'http://user:password@127.0.0.1:1234': ('http', '127.0.0.1', 1234), + 'http://google.com/foo=http://bar:42/baz': ('http', 'google.com', None), + 'http://google.com?foo=http://bar:42/baz': ('http', 'google.com', None), + 'http://google.com#foo=http://bar:42/baz': ('http', 'google.com', None), + + # IPv4 + '173.194.35.7': ('http', '173.194.35.7', None), + 'http://173.194.35.7': ('http', '173.194.35.7', None), + 'http://173.194.35.7/test': ('http', '173.194.35.7', None), + 'http://173.194.35.7:80': ('http', '173.194.35.7', 80), + 'http://173.194.35.7:80/test': ('http', '173.194.35.7', 80), + + # IPv6 + '[2a00:1450:4001:c01::67]': ('http', '[2a00:1450:4001:c01::67]', None), + 'http://[2a00:1450:4001:c01::67]': ('http', '[2a00:1450:4001:c01::67]', None), + 'http://[2a00:1450:4001:c01::67]/test': ('http', '[2a00:1450:4001:c01::67]', None), + 'http://[2a00:1450:4001:c01::67]:80': ('http', '[2a00:1450:4001:c01::67]', 80), + 'http://[2a00:1450:4001:c01::67]:80/test': ('http', '[2a00:1450:4001:c01::67]', 80), + + # More IPv6 from http://www.ietf.org/rfc/rfc2732.txt + 'http://[FEDC:BA98:7654:3210:FEDC:BA98:7654:3210]:8000/index.html': ('http', '[FEDC:BA98:7654:3210:FEDC:BA98:7654:3210]', 8000), + 'http://[1080:0:0:0:8:800:200C:417A]/index.html': ('http', '[1080:0:0:0:8:800:200C:417A]', None), + 'http://[3ffe:2a00:100:7031::1]': ('http', '[3ffe:2a00:100:7031::1]', None), + 'http://[1080::8:800:200C:417A]/foo': ('http', '[1080::8:800:200C:417A]', None), + 'http://[::192.9.5.5]/ipng': ('http', '[::192.9.5.5]', None), + 'http://[::FFFF:129.144.52.38]:42/index.html': ('http', '[::FFFF:129.144.52.38]', 42), + 'http://[2010:836B:4179::836B:4179]': ('http', '[2010:836B:4179::836B:4179]', None), + } + for url, expected_host in url_host_map.items(): + returned_host = get_host(url) + self.assertEqual(returned_host, expected_host) + + def test_invalid_host(self): + # TODO: Add more tests + invalid_host = [ + 'http://google.com:foo', + 'http://::1/', + 'http://::1:80/', + ] + + for location in invalid_host: + self.assertRaises(LocationParseError, get_host, location) + + + def test_parse_url(self): + url_host_map = { + 'http://google.com/mail': Url('http', host='google.com', path='/mail'), + 'http://google.com/mail/': Url('http', host='google.com', path='/mail/'), + 'google.com/mail': Url(host='google.com', path='/mail'), + 'http://google.com/': Url('http', host='google.com', path='/'), + 'http://google.com': Url('http', host='google.com'), + 'http://google.com?foo': Url('http', host='google.com', path='', query='foo'), + + # Path/query/fragment + '': Url(), + '/': Url(path='/'), + '?': Url(path='', query=''), + '#': Url(path='', fragment=''), + '#?/!google.com/?foo#bar': Url(path='', fragment='?/!google.com/?foo#bar'), + '/foo': Url(path='/foo'), + '/foo?bar=baz': Url(path='/foo', query='bar=baz'), + '/foo?bar=baz#banana?apple/orange': Url(path='/foo', query='bar=baz', fragment='banana?apple/orange'), + + # Port + 'http://google.com/': Url('http', host='google.com', path='/'), + 'http://google.com:80/': Url('http', host='google.com', port=80, path='/'), + 'http://google.com:/': Url('http', host='google.com', path='/'), + 'http://google.com:80': Url('http', host='google.com', port=80), + 'http://google.com:': Url('http', host='google.com'), + + # Auth + 'http://foo:bar@localhost/': Url('http', auth='foo:bar', host='localhost', path='/'), + 'http://foo@localhost/': Url('http', auth='foo', host='localhost', path='/'), + 'http://foo:bar@baz@localhost/': Url('http', auth='foo:bar@baz', host='localhost', path='/'), + 'http://@': Url('http', host=None, auth='') + } + for url, expected_url in url_host_map.items(): + returned_url = parse_url(url) + self.assertEqual(returned_url, expected_url) + + def test_parse_url_invalid_IPv6(self): + self.assertRaises(ValueError, parse_url, '[::1') + + def test_request_uri(self): + url_host_map = { + 'http://google.com/mail': '/mail', + 'http://google.com/mail/': '/mail/', + 'http://google.com/': '/', + 'http://google.com': '/', + '': '/', + '/': '/', + '?': '/?', + '#': '/', + '/foo?bar=baz': '/foo?bar=baz', + } + for url, expected_request_uri in url_host_map.items(): + returned_url = parse_url(url) + self.assertEqual(returned_url.request_uri, expected_request_uri) + + def test_netloc(self): + url_netloc_map = { + 'http://google.com/mail': 'google.com', + 'http://google.com:80/mail': 'google.com:80', + 'google.com/foobar': 'google.com', + 'google.com:12345': 'google.com:12345', + } + + for url, expected_netloc in url_netloc_map.items(): + self.assertEqual(parse_url(url).netloc, expected_netloc) + + def test_make_headers(self): + self.assertEqual( + make_headers(accept_encoding=True), + {'accept-encoding': 'gzip,deflate'}) + + self.assertEqual( + make_headers(accept_encoding='foo,bar'), + {'accept-encoding': 'foo,bar'}) + + self.assertEqual( + make_headers(accept_encoding=['foo', 'bar']), + {'accept-encoding': 'foo,bar'}) + + self.assertEqual( + make_headers(accept_encoding=True, user_agent='banana'), + {'accept-encoding': 'gzip,deflate', 'user-agent': 'banana'}) + + self.assertEqual( + make_headers(user_agent='banana'), + {'user-agent': 'banana'}) + + self.assertEqual( + make_headers(keep_alive=True), + {'connection': 'keep-alive'}) + + self.assertEqual( + make_headers(basic_auth='foo:bar'), + {'authorization': 'Basic Zm9vOmJhcg=='}) + + self.assertEqual( + make_headers(proxy_basic_auth='foo:bar'), + {'proxy-authorization': 'Basic Zm9vOmJhcg=='}) + + self.assertEqual( + make_headers(disable_cache=True), + {'cache-control': 'no-cache'}) + + def test_split_first(self): + test_cases = { + ('abcd', 'b'): ('a', 'cd', 'b'), + ('abcd', 'cb'): ('a', 'cd', 'b'), + ('abcd', ''): ('abcd', '', None), + ('abcd', 'a'): ('', 'bcd', 'a'), + ('abcd', 'ab'): ('', 'bcd', 'a'), + } + for input, expected in test_cases.items(): + output = split_first(*input) + self.assertEqual(output, expected) + + def test_add_stderr_logger(self): + handler = add_stderr_logger(level=logging.INFO) # Don't actually print debug + logger = logging.getLogger('urllib3') + self.assertTrue(handler in logger.handlers) + + logger.debug('Testing add_stderr_logger') + logger.removeHandler(handler) + + def test_disable_warnings(self): + with warnings.catch_warnings(record=True) as w: + clear_warnings() + warnings.warn('This is a test.', InsecureRequestWarning) + self.assertEqual(len(w), 1) + disable_warnings() + warnings.warn('This is a test.', InsecureRequestWarning) + self.assertEqual(len(w), 1) + + def _make_time_pass(self, seconds, timeout, time_mock): + """ Make some time pass for the timeout object """ + time_mock.return_value = TIMEOUT_EPOCH + timeout.start_connect() + time_mock.return_value = TIMEOUT_EPOCH + seconds + return timeout + + def test_invalid_timeouts(self): + try: + Timeout(total=-1) + self.fail("negative value should throw exception") + except ValueError as e: + self.assertTrue('less than' in str(e)) + try: + Timeout(connect=2, total=-1) + self.fail("negative value should throw exception") + except ValueError as e: + self.assertTrue('less than' in str(e)) + + try: + Timeout(read=-1) + self.fail("negative value should throw exception") + except ValueError as e: + self.assertTrue('less than' in str(e)) + + # Booleans are allowed also by socket.settimeout and converted to the + # equivalent float (1.0 for True, 0.0 for False) + Timeout(connect=False, read=True) + + try: + Timeout(read="foo") + self.fail("string value should not be allowed") + except ValueError as e: + self.assertTrue('int or float' in str(e)) + + + @patch('urllib3.util.timeout.current_time') + def test_timeout(self, current_time): + timeout = Timeout(total=3) + + # make 'no time' elapse + timeout = self._make_time_pass(seconds=0, timeout=timeout, + time_mock=current_time) + self.assertEqual(timeout.read_timeout, 3) + self.assertEqual(timeout.connect_timeout, 3) + + timeout = Timeout(total=3, connect=2) + self.assertEqual(timeout.connect_timeout, 2) + + timeout = Timeout() + self.assertEqual(timeout.connect_timeout, Timeout.DEFAULT_TIMEOUT) + + # Connect takes 5 seconds, leaving 5 seconds for read + timeout = Timeout(total=10, read=7) + timeout = self._make_time_pass(seconds=5, timeout=timeout, + time_mock=current_time) + self.assertEqual(timeout.read_timeout, 5) + + # Connect takes 2 seconds, read timeout still 7 seconds + timeout = Timeout(total=10, read=7) + timeout = self._make_time_pass(seconds=2, timeout=timeout, + time_mock=current_time) + self.assertEqual(timeout.read_timeout, 7) + + timeout = Timeout(total=10, read=7) + self.assertEqual(timeout.read_timeout, 7) + + timeout = Timeout(total=None, read=None, connect=None) + self.assertEqual(timeout.connect_timeout, None) + self.assertEqual(timeout.read_timeout, None) + self.assertEqual(timeout.total, None) + + timeout = Timeout(5) + self.assertEqual(timeout.total, 5) + + + def test_timeout_str(self): + timeout = Timeout(connect=1, read=2, total=3) + self.assertEqual(str(timeout), "Timeout(connect=1, read=2, total=3)") + timeout = Timeout(connect=1, read=None, total=3) + self.assertEqual(str(timeout), "Timeout(connect=1, read=None, total=3)") + + + @patch('urllib3.util.timeout.current_time') + def test_timeout_elapsed(self, current_time): + current_time.return_value = TIMEOUT_EPOCH + timeout = Timeout(total=3) + self.assertRaises(TimeoutStateError, timeout.get_connect_duration) + + timeout.start_connect() + self.assertRaises(TimeoutStateError, timeout.start_connect) + + current_time.return_value = TIMEOUT_EPOCH + 2 + self.assertEqual(timeout.get_connect_duration(), 2) + current_time.return_value = TIMEOUT_EPOCH + 37 + self.assertEqual(timeout.get_connect_duration(), 37) + + def test_resolve_cert_reqs(self): + self.assertEqual(resolve_cert_reqs(None), ssl.CERT_NONE) + self.assertEqual(resolve_cert_reqs(ssl.CERT_NONE), ssl.CERT_NONE) + + self.assertEqual(resolve_cert_reqs(ssl.CERT_REQUIRED), ssl.CERT_REQUIRED) + self.assertEqual(resolve_cert_reqs('REQUIRED'), ssl.CERT_REQUIRED) + self.assertEqual(resolve_cert_reqs('CERT_REQUIRED'), ssl.CERT_REQUIRED) + + def test_is_fp_closed_object_supports_closed(self): + class ClosedFile(object): + @property + def closed(self): + return True + + self.assertTrue(is_fp_closed(ClosedFile())) + + def test_is_fp_closed_object_has_none_fp(self): + class NoneFpFile(object): + @property + def fp(self): + return None + + self.assertTrue(is_fp_closed(NoneFpFile())) + + def test_is_fp_closed_object_has_fp(self): + class FpFile(object): + @property + def fp(self): + return True + + self.assertTrue(not is_fp_closed(FpFile())) + + def test_is_fp_closed_object_has_neither_fp_nor_closed(self): + class NotReallyAFile(object): + pass + + self.assertRaises(ValueError, is_fp_closed, NotReallyAFile()) diff --git a/test/test_util.pyc b/test/test_util.pyc Binary files differnew file mode 100644 index 0000000..0500c3b --- /dev/null +++ b/test/test_util.pyc diff --git a/test/with_dummyserver/__init__.py b/test/with_dummyserver/__init__.py new file mode 100644 index 0000000..e69de29 --- /dev/null +++ b/test/with_dummyserver/__init__.py diff --git a/test/with_dummyserver/__init__.pyc b/test/with_dummyserver/__init__.pyc Binary files differnew file mode 100644 index 0000000..833be60 --- /dev/null +++ b/test/with_dummyserver/__init__.pyc diff --git a/test/with_dummyserver/test_connectionpool.py b/test/with_dummyserver/test_connectionpool.py new file mode 100644 index 0000000..7d54fbf --- /dev/null +++ b/test/with_dummyserver/test_connectionpool.py @@ -0,0 +1,706 @@ +import errno +import logging +import socket +import sys +import unittest +import time + +import mock + +try: + from urllib.parse import urlencode +except: + from urllib import urlencode + +from .. import ( + requires_network, + onlyPy3, onlyPy27OrNewer, onlyPy26OrOlder, + TARPIT_HOST, VALID_SOURCE_ADDRESSES, INVALID_SOURCE_ADDRESSES, +) +from ..port_helpers import find_unused_port +from urllib3 import ( + encode_multipart_formdata, + HTTPConnectionPool, +) +from urllib3.exceptions import ( + ConnectTimeoutError, + EmptyPoolError, + DecodeError, + MaxRetryError, + ReadTimeoutError, + ProtocolError, +) +from urllib3.packages.six import b, u +from urllib3.util.retry import Retry +from urllib3.util.timeout import Timeout + +import tornado +from dummyserver.testcase import HTTPDummyServerTestCase + +from nose.tools import timed + +log = logging.getLogger('urllib3.connectionpool') +log.setLevel(logging.NOTSET) +log.addHandler(logging.StreamHandler(sys.stdout)) + + +class TestConnectionPool(HTTPDummyServerTestCase): + + def setUp(self): + self.pool = HTTPConnectionPool(self.host, self.port) + + def test_get(self): + r = self.pool.request('GET', '/specific_method', + fields={'method': 'GET'}) + self.assertEqual(r.status, 200, r.data) + + def test_post_url(self): + r = self.pool.request('POST', '/specific_method', + fields={'method': 'POST'}) + self.assertEqual(r.status, 200, r.data) + + def test_urlopen_put(self): + r = self.pool.urlopen('PUT', '/specific_method?method=PUT') + self.assertEqual(r.status, 200, r.data) + + def test_wrong_specific_method(self): + # To make sure the dummy server is actually returning failed responses + r = self.pool.request('GET', '/specific_method', + fields={'method': 'POST'}) + self.assertEqual(r.status, 400, r.data) + + r = self.pool.request('POST', '/specific_method', + fields={'method': 'GET'}) + self.assertEqual(r.status, 400, r.data) + + def test_upload(self): + data = "I'm in ur multipart form-data, hazing a cheezburgr" + fields = { + 'upload_param': 'filefield', + 'upload_filename': 'lolcat.txt', + 'upload_size': len(data), + 'filefield': ('lolcat.txt', data), + } + + r = self.pool.request('POST', '/upload', fields=fields) + self.assertEqual(r.status, 200, r.data) + + def test_one_name_multiple_values(self): + fields = [ + ('foo', 'a'), + ('foo', 'b'), + ] + + # urlencode + r = self.pool.request('GET', '/echo', fields=fields) + self.assertEqual(r.data, b'foo=a&foo=b') + + # multipart + r = self.pool.request('POST', '/echo', fields=fields) + self.assertEqual(r.data.count(b'name="foo"'), 2) + + + def test_unicode_upload(self): + fieldname = u('myfile') + filename = u('\xe2\x99\xa5.txt') + data = u('\xe2\x99\xa5').encode('utf8') + size = len(data) + + fields = { + u('upload_param'): fieldname, + u('upload_filename'): filename, + u('upload_size'): size, + fieldname: (filename, data), + } + + r = self.pool.request('POST', '/upload', fields=fields) + self.assertEqual(r.status, 200, r.data) + + def test_timeout_float(self): + url = '/sleep?seconds=0.005' + # Pool-global timeout + pool = HTTPConnectionPool(self.host, self.port, timeout=0.001, retries=False) + self.assertRaises(ReadTimeoutError, pool.request, 'GET', url) + + def test_conn_closed(self): + pool = HTTPConnectionPool(self.host, self.port, timeout=0.001, retries=False) + conn = pool._get_conn() + pool._put_conn(conn) + try: + url = '/sleep?seconds=0.005' + pool.urlopen('GET', url) + self.fail("The request should fail with a timeout error.") + except ReadTimeoutError: + if conn.sock: + self.assertRaises(socket.error, conn.sock.recv, 1024) + finally: + pool._put_conn(conn) + + def test_nagle(self): + """ Test that connections have TCP_NODELAY turned on """ + # This test needs to be here in order to be run. socket.create_connection actually tries to + # connect to the host provided so we need a dummyserver to be running. + pool = HTTPConnectionPool(self.host, self.port) + conn = pool._get_conn() + pool._make_request(conn, 'GET', '/') + tcp_nodelay_setting = conn.sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) + assert tcp_nodelay_setting > 0, ("Expected TCP_NODELAY to be set on the " + "socket (with value greater than 0) " + "but instead was %s" % + tcp_nodelay_setting) + + def test_socket_options(self): + """Test that connections accept socket options.""" + # This test needs to be here in order to be run. socket.create_connection actually tries to + # connect to the host provided so we need a dummyserver to be running. + pool = HTTPConnectionPool(self.host, self.port, socket_options=[ + (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1) + ]) + s = pool._new_conn()._new_conn() # Get the socket + using_keepalive = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) > 0 + self.assertTrue(using_keepalive) + s.close() + + def test_disable_default_socket_options(self): + """Test that passing None disables all socket options.""" + # This test needs to be here in order to be run. socket.create_connection actually tries to + # connect to the host provided so we need a dummyserver to be running. + pool = HTTPConnectionPool(self.host, self.port, socket_options=None) + s = pool._new_conn()._new_conn() + using_nagle = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0 + self.assertTrue(using_nagle) + s.close() + + def test_defaults_are_applied(self): + """Test that modifying the default socket options works.""" + # This test needs to be here in order to be run. socket.create_connection actually tries to + # connect to the host provided so we need a dummyserver to be running. + pool = HTTPConnectionPool(self.host, self.port) + # Get the HTTPConnection instance + conn = pool._new_conn() + # Update the default socket options + conn.default_socket_options += [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)] + s = conn._new_conn() + nagle_disabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) > 0 + using_keepalive = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) > 0 + self.assertTrue(nagle_disabled) + self.assertTrue(using_keepalive) + + @timed(0.5) + def test_timeout(self): + """ Requests should time out when expected """ + url = '/sleep?seconds=0.002' + timeout = Timeout(read=0.001) + + # Pool-global timeout + pool = HTTPConnectionPool(self.host, self.port, timeout=timeout, retries=False) + + conn = pool._get_conn() + self.assertRaises(ReadTimeoutError, pool._make_request, + conn, 'GET', url) + pool._put_conn(conn) + + time.sleep(0.02) # Wait for server to start receiving again. :( + + self.assertRaises(ReadTimeoutError, pool.request, 'GET', url) + + # Request-specific timeouts should raise errors + pool = HTTPConnectionPool(self.host, self.port, timeout=0.1, retries=False) + + conn = pool._get_conn() + self.assertRaises(ReadTimeoutError, pool._make_request, + conn, 'GET', url, timeout=timeout) + pool._put_conn(conn) + + time.sleep(0.02) # Wait for server to start receiving again. :( + + self.assertRaises(ReadTimeoutError, pool.request, + 'GET', url, timeout=timeout) + + # Timeout int/float passed directly to request and _make_request should + # raise a request timeout + self.assertRaises(ReadTimeoutError, pool.request, + 'GET', url, timeout=0.001) + conn = pool._new_conn() + self.assertRaises(ReadTimeoutError, pool._make_request, conn, + 'GET', url, timeout=0.001) + pool._put_conn(conn) + + # Timeout int/float passed directly to _make_request should not raise a + # request timeout if it's a high value + pool.request('GET', url, timeout=1) + + @requires_network + @timed(0.5) + def test_connect_timeout(self): + url = '/sleep?seconds=0.005' + timeout = Timeout(connect=0.001) + + # Pool-global timeout + pool = HTTPConnectionPool(TARPIT_HOST, self.port, timeout=timeout) + conn = pool._get_conn() + self.assertRaises(ConnectTimeoutError, pool._make_request, conn, 'GET', url) + + # Retries + retries = Retry(connect=0) + self.assertRaises(MaxRetryError, pool.request, 'GET', url, + retries=retries) + + # Request-specific connection timeouts + big_timeout = Timeout(read=0.2, connect=0.2) + pool = HTTPConnectionPool(TARPIT_HOST, self.port, + timeout=big_timeout, retries=False) + conn = pool._get_conn() + self.assertRaises(ConnectTimeoutError, pool._make_request, conn, 'GET', + url, timeout=timeout) + + pool._put_conn(conn) + self.assertRaises(ConnectTimeoutError, pool.request, 'GET', url, + timeout=timeout) + + + def test_connection_error_retries(self): + """ ECONNREFUSED error should raise a connection error, with retries """ + port = find_unused_port() + pool = HTTPConnectionPool(self.host, port) + try: + pool.request('GET', '/', retries=Retry(connect=3)) + self.fail("Should have failed with a connection error.") + except MaxRetryError as e: + self.assertTrue(isinstance(e.reason, ProtocolError)) + self.assertEqual(e.reason.args[1].errno, errno.ECONNREFUSED) + + def test_timeout_reset(self): + """ If the read timeout isn't set, socket timeout should reset """ + url = '/sleep?seconds=0.005' + timeout = Timeout(connect=0.001) + pool = HTTPConnectionPool(self.host, self.port, timeout=timeout) + conn = pool._get_conn() + try: + pool._make_request(conn, 'GET', url) + except ReadTimeoutError: + self.fail("This request shouldn't trigger a read timeout.") + + @requires_network + @timed(5.0) + def test_total_timeout(self): + url = '/sleep?seconds=0.005' + + timeout = Timeout(connect=3, read=5, total=0.001) + pool = HTTPConnectionPool(TARPIT_HOST, self.port, timeout=timeout) + conn = pool._get_conn() + self.assertRaises(ConnectTimeoutError, pool._make_request, conn, 'GET', url) + + # This will get the socket to raise an EAGAIN on the read + timeout = Timeout(connect=3, read=0) + pool = HTTPConnectionPool(self.host, self.port, timeout=timeout) + conn = pool._get_conn() + self.assertRaises(ReadTimeoutError, pool._make_request, conn, 'GET', url) + + # The connect should succeed and this should hit the read timeout + timeout = Timeout(connect=3, read=5, total=0.002) + pool = HTTPConnectionPool(self.host, self.port, timeout=timeout) + conn = pool._get_conn() + self.assertRaises(ReadTimeoutError, pool._make_request, conn, 'GET', url) + + @requires_network + def test_none_total_applies_connect(self): + url = '/sleep?seconds=0.005' + timeout = Timeout(total=None, connect=0.001) + pool = HTTPConnectionPool(TARPIT_HOST, self.port, timeout=timeout) + conn = pool._get_conn() + self.assertRaises(ConnectTimeoutError, pool._make_request, conn, 'GET', + url) + + def test_timeout_success(self): + timeout = Timeout(connect=3, read=5, total=None) + pool = HTTPConnectionPool(self.host, self.port, timeout=timeout) + pool.request('GET', '/') + # This should not raise a "Timeout already started" error + pool.request('GET', '/') + + pool = HTTPConnectionPool(self.host, self.port, timeout=timeout) + # This should also not raise a "Timeout already started" error + pool.request('GET', '/') + + timeout = Timeout(total=None) + pool = HTTPConnectionPool(self.host, self.port, timeout=timeout) + pool.request('GET', '/') + + def test_tunnel(self): + # note the actual httplib.py has no tests for this functionality + timeout = Timeout(total=None) + pool = HTTPConnectionPool(self.host, self.port, timeout=timeout) + conn = pool._get_conn() + try: + conn.set_tunnel(self.host, self.port) + except AttributeError: # python 2.6 + conn._set_tunnel(self.host, self.port) + + conn._tunnel = mock.Mock(return_value=None) + pool._make_request(conn, 'GET', '/') + conn._tunnel.assert_called_once_with() + + # test that it's not called when tunnel is not set + timeout = Timeout(total=None) + pool = HTTPConnectionPool(self.host, self.port, timeout=timeout) + conn = pool._get_conn() + + conn._tunnel = mock.Mock(return_value=None) + pool._make_request(conn, 'GET', '/') + self.assertEqual(conn._tunnel.called, False) + + def test_redirect(self): + r = self.pool.request('GET', '/redirect', fields={'target': '/'}, redirect=False) + self.assertEqual(r.status, 303) + + r = self.pool.request('GET', '/redirect', fields={'target': '/'}) + self.assertEqual(r.status, 200) + self.assertEqual(r.data, b'Dummy server!') + + def test_bad_connect(self): + pool = HTTPConnectionPool('badhost.invalid', self.port) + try: + pool.request('GET', '/', retries=5) + self.fail("should raise timeout exception here") + except MaxRetryError as e: + self.assertTrue(isinstance(e.reason, ProtocolError), e.reason) + + def test_keepalive(self): + pool = HTTPConnectionPool(self.host, self.port, block=True, maxsize=1) + + r = pool.request('GET', '/keepalive?close=0') + r = pool.request('GET', '/keepalive?close=0') + + self.assertEqual(r.status, 200) + self.assertEqual(pool.num_connections, 1) + self.assertEqual(pool.num_requests, 2) + + def test_keepalive_close(self): + pool = HTTPConnectionPool(self.host, self.port, + block=True, maxsize=1, timeout=2) + + r = pool.request('GET', '/keepalive?close=1', retries=0, + headers={ + "Connection": "close", + }) + + self.assertEqual(pool.num_connections, 1) + + # The dummyserver will have responded with Connection:close, + # and httplib will properly cleanup the socket. + + # We grab the HTTPConnection object straight from the Queue, + # because _get_conn() is where the check & reset occurs + # pylint: disable-msg=W0212 + conn = pool.pool.get() + self.assertEqual(conn.sock, None) + pool._put_conn(conn) + + # Now with keep-alive + r = pool.request('GET', '/keepalive?close=0', retries=0, + headers={ + "Connection": "keep-alive", + }) + + # The dummyserver responded with Connection:keep-alive, the connection + # persists. + conn = pool.pool.get() + self.assertNotEqual(conn.sock, None) + pool._put_conn(conn) + + # Another request asking the server to close the connection. This one + # should get cleaned up for the next request. + r = pool.request('GET', '/keepalive?close=1', retries=0, + headers={ + "Connection": "close", + }) + + self.assertEqual(r.status, 200) + + conn = pool.pool.get() + self.assertEqual(conn.sock, None) + pool._put_conn(conn) + + # Next request + r = pool.request('GET', '/keepalive?close=0') + + def test_post_with_urlencode(self): + data = {'banana': 'hammock', 'lol': 'cat'} + r = self.pool.request('POST', '/echo', fields=data, encode_multipart=False) + self.assertEqual(r.data.decode('utf-8'), urlencode(data)) + + def test_post_with_multipart(self): + data = {'banana': 'hammock', 'lol': 'cat'} + r = self.pool.request('POST', '/echo', + fields=data, + encode_multipart=True) + body = r.data.split(b'\r\n') + + encoded_data = encode_multipart_formdata(data)[0] + expected_body = encoded_data.split(b'\r\n') + + # TODO: Get rid of extra parsing stuff when you can specify + # a custom boundary to encode_multipart_formdata + """ + We need to loop the return lines because a timestamp is attached + from within encode_multipart_formdata. When the server echos back + the data, it has the timestamp from when the data was encoded, which + is not equivalent to when we run encode_multipart_formdata on + the data again. + """ + for i, line in enumerate(body): + if line.startswith(b'--'): + continue + + self.assertEqual(body[i], expected_body[i]) + + def test_check_gzip(self): + r = self.pool.request('GET', '/encodingrequest', + headers={'accept-encoding': 'gzip'}) + self.assertEqual(r.headers.get('content-encoding'), 'gzip') + self.assertEqual(r.data, b'hello, world!') + + def test_check_deflate(self): + r = self.pool.request('GET', '/encodingrequest', + headers={'accept-encoding': 'deflate'}) + self.assertEqual(r.headers.get('content-encoding'), 'deflate') + self.assertEqual(r.data, b'hello, world!') + + def test_bad_decode(self): + self.assertRaises(DecodeError, self.pool.request, + 'GET', '/encodingrequest', + headers={'accept-encoding': 'garbage-deflate'}) + + self.assertRaises(DecodeError, self.pool.request, + 'GET', '/encodingrequest', + headers={'accept-encoding': 'garbage-gzip'}) + + def test_connection_count(self): + pool = HTTPConnectionPool(self.host, self.port, maxsize=1) + + pool.request('GET', '/') + pool.request('GET', '/') + pool.request('GET', '/') + + self.assertEqual(pool.num_connections, 1) + self.assertEqual(pool.num_requests, 3) + + def test_connection_count_bigpool(self): + http_pool = HTTPConnectionPool(self.host, self.port, maxsize=16) + + http_pool.request('GET', '/') + http_pool.request('GET', '/') + http_pool.request('GET', '/') + + self.assertEqual(http_pool.num_connections, 1) + self.assertEqual(http_pool.num_requests, 3) + + def test_partial_response(self): + pool = HTTPConnectionPool(self.host, self.port, maxsize=1) + + req_data = {'lol': 'cat'} + resp_data = urlencode(req_data).encode('utf-8') + + r = pool.request('GET', '/echo', fields=req_data, preload_content=False) + + self.assertEqual(r.read(5), resp_data[:5]) + self.assertEqual(r.read(), resp_data[5:]) + + def test_lazy_load_twice(self): + # This test is sad and confusing. Need to figure out what's + # going on with partial reads and socket reuse. + + pool = HTTPConnectionPool(self.host, self.port, block=True, maxsize=1, timeout=2) + + payload_size = 1024 * 2 + first_chunk = 512 + + boundary = 'foo' + + req_data = {'count': 'a' * payload_size} + resp_data = encode_multipart_formdata(req_data, boundary=boundary)[0] + + req2_data = {'count': 'b' * payload_size} + resp2_data = encode_multipart_formdata(req2_data, boundary=boundary)[0] + + r1 = pool.request('POST', '/echo', fields=req_data, multipart_boundary=boundary, preload_content=False) + + self.assertEqual(r1.read(first_chunk), resp_data[:first_chunk]) + + try: + r2 = pool.request('POST', '/echo', fields=req2_data, multipart_boundary=boundary, + preload_content=False, pool_timeout=0.001) + + # This branch should generally bail here, but maybe someday it will + # work? Perhaps by some sort of magic. Consider it a TODO. + + self.assertEqual(r2.read(first_chunk), resp2_data[:first_chunk]) + + self.assertEqual(r1.read(), resp_data[first_chunk:]) + self.assertEqual(r2.read(), resp2_data[first_chunk:]) + self.assertEqual(pool.num_requests, 2) + + except EmptyPoolError: + self.assertEqual(r1.read(), resp_data[first_chunk:]) + self.assertEqual(pool.num_requests, 1) + + self.assertEqual(pool.num_connections, 1) + + def test_for_double_release(self): + MAXSIZE=5 + + # Check default state + pool = HTTPConnectionPool(self.host, self.port, maxsize=MAXSIZE) + self.assertEqual(pool.num_connections, 0) + self.assertEqual(pool.pool.qsize(), MAXSIZE) + + # Make an empty slot for testing + pool.pool.get() + self.assertEqual(pool.pool.qsize(), MAXSIZE-1) + + # Check state after simple request + pool.urlopen('GET', '/') + self.assertEqual(pool.pool.qsize(), MAXSIZE-1) + + # Check state without release + pool.urlopen('GET', '/', preload_content=False) + self.assertEqual(pool.pool.qsize(), MAXSIZE-2) + + pool.urlopen('GET', '/') + self.assertEqual(pool.pool.qsize(), MAXSIZE-2) + + # Check state after read + pool.urlopen('GET', '/').data + self.assertEqual(pool.pool.qsize(), MAXSIZE-2) + + pool.urlopen('GET', '/') + self.assertEqual(pool.pool.qsize(), MAXSIZE-2) + + def test_release_conn_parameter(self): + MAXSIZE=5 + pool = HTTPConnectionPool(self.host, self.port, maxsize=MAXSIZE) + self.assertEqual(pool.pool.qsize(), MAXSIZE) + + # Make request without releasing connection + pool.request('GET', '/', release_conn=False, preload_content=False) + self.assertEqual(pool.pool.qsize(), MAXSIZE-1) + + def test_dns_error(self): + pool = HTTPConnectionPool('thishostdoesnotexist.invalid', self.port, timeout=0.001) + self.assertRaises(MaxRetryError, pool.request, 'GET', '/test', retries=2) + + def test_source_address(self): + for addr in VALID_SOURCE_ADDRESSES: + pool = HTTPConnectionPool(self.host, self.port, + source_address=addr, retries=False) + r = pool.request('GET', '/source_address') + assert r.data == b(addr[0]), ( + "expected the response to contain the source address {addr}, " + "but was {data}".format(data=r.data, addr=b(addr[0]))) + + def test_source_address_error(self): + for addr in INVALID_SOURCE_ADDRESSES: + pool = HTTPConnectionPool(self.host, self.port, + source_address=addr, retries=False) + self.assertRaises(ProtocolError, + pool.request, 'GET', '/source_address') + + @onlyPy3 + def test_httplib_headers_case_insensitive(self): + HEADERS = {'Content-Length': '0', 'Content-type': 'text/plain', + 'Server': 'TornadoServer/%s' % tornado.version} + r = self.pool.request('GET', '/specific_method', + fields={'method': 'GET'}) + self.assertEqual(HEADERS, dict(r.headers.items())) # to preserve case sensitivity + + +class TestRetry(HTTPDummyServerTestCase): + def setUp(self): + self.pool = HTTPConnectionPool(self.host, self.port) + + def test_max_retry(self): + try: + r = self.pool.request('GET', '/redirect', + fields={'target': '/'}, + retries=0) + self.fail("Failed to raise MaxRetryError exception, returned %r" % r.status) + except MaxRetryError: + pass + + def test_disabled_retry(self): + """ Disabled retries should disable redirect handling. """ + r = self.pool.request('GET', '/redirect', + fields={'target': '/'}, + retries=False) + self.assertEqual(r.status, 303) + + r = self.pool.request('GET', '/redirect', + fields={'target': '/'}, + retries=Retry(redirect=False)) + self.assertEqual(r.status, 303) + + pool = HTTPConnectionPool('thishostdoesnotexist.invalid', self.port, timeout=0.001) + self.assertRaises(ProtocolError, pool.request, 'GET', '/test', retries=False) + + def test_read_retries(self): + """ Should retry for status codes in the whitelist """ + retry = Retry(read=1, status_forcelist=[418]) + resp = self.pool.request('GET', '/successful_retry', + headers={'test-name': 'test_read_retries'}, + retries=retry) + self.assertEqual(resp.status, 200) + + def test_read_total_retries(self): + """ HTTP response w/ status code in the whitelist should be retried """ + headers = {'test-name': 'test_read_total_retries'} + retry = Retry(total=1, status_forcelist=[418]) + resp = self.pool.request('GET', '/successful_retry', + headers=headers, retries=retry) + self.assertEqual(resp.status, 200) + + def test_retries_wrong_whitelist(self): + """HTTP response w/ status code not in whitelist shouldn't be retried""" + retry = Retry(total=1, status_forcelist=[202]) + resp = self.pool.request('GET', '/successful_retry', + headers={'test-name': 'test_wrong_whitelist'}, + retries=retry) + self.assertEqual(resp.status, 418) + + def test_default_method_whitelist_retried(self): + """ urllib3 should retry methods in the default method whitelist """ + retry = Retry(total=1, status_forcelist=[418]) + resp = self.pool.request('OPTIONS', '/successful_retry', + headers={'test-name': 'test_default_whitelist'}, + retries=retry) + self.assertEqual(resp.status, 200) + + def test_retries_wrong_method_list(self): + """Method not in our whitelist should not be retried, even if code matches""" + headers = {'test-name': 'test_wrong_method_whitelist'} + retry = Retry(total=1, status_forcelist=[418], + method_whitelist=['POST']) + resp = self.pool.request('GET', '/successful_retry', + headers=headers, retries=retry) + self.assertEqual(resp.status, 418) + + def test_read_retries_unsuccessful(self): + headers = {'test-name': 'test_read_retries_unsuccessful'} + resp = self.pool.request('GET', '/successful_retry', + headers=headers, retries=1) + self.assertEqual(resp.status, 418) + + def test_retry_reuse_safe(self): + """ It should be possible to reuse a Retry object across requests """ + headers = {'test-name': 'test_retry_safe'} + retry = Retry(total=1, status_forcelist=[418]) + resp = self.pool.request('GET', '/successful_retry', + headers=headers, retries=retry) + self.assertEqual(resp.status, 200) + resp = self.pool.request('GET', '/successful_retry', + headers=headers, retries=retry) + self.assertEqual(resp.status, 200) + + +if __name__ == '__main__': + unittest.main() diff --git a/test/with_dummyserver/test_connectionpool.pyc b/test/with_dummyserver/test_connectionpool.pyc Binary files differnew file mode 100644 index 0000000..b8c38e9 --- /dev/null +++ b/test/with_dummyserver/test_connectionpool.pyc diff --git a/test/with_dummyserver/test_https.py b/test/with_dummyserver/test_https.py new file mode 100644 index 0000000..cf3eee7 --- /dev/null +++ b/test/with_dummyserver/test_https.py @@ -0,0 +1,374 @@ +import datetime +import logging +import ssl +import sys +import unittest +import warnings + +import mock +from nose.plugins.skip import SkipTest + +from dummyserver.testcase import HTTPSDummyServerTestCase +from dummyserver.server import DEFAULT_CA, DEFAULT_CA_BAD, DEFAULT_CERTS + +from test import ( + onlyPy26OrOlder, + requires_network, + TARPIT_HOST, + clear_warnings, +) +from urllib3 import HTTPSConnectionPool +from urllib3.connection import ( + VerifiedHTTPSConnection, + UnverifiedHTTPSConnection, + RECENT_DATE, +) +from urllib3.exceptions import ( + SSLError, + ReadTimeoutError, + ConnectTimeoutError, + InsecureRequestWarning, + SystemTimeWarning, +) +from urllib3.util.timeout import Timeout + + +log = logging.getLogger('urllib3.connectionpool') +log.setLevel(logging.NOTSET) +log.addHandler(logging.StreamHandler(sys.stdout)) + + + +class TestHTTPS(HTTPSDummyServerTestCase): + def setUp(self): + self._pool = HTTPSConnectionPool(self.host, self.port) + + def test_simple(self): + r = self._pool.request('GET', '/') + self.assertEqual(r.status, 200, r.data) + + def test_set_ssl_version_to_tlsv1(self): + self._pool.ssl_version = ssl.PROTOCOL_TLSv1 + r = self._pool.request('GET', '/') + self.assertEqual(r.status, 200, r.data) + + def test_verified(self): + https_pool = HTTPSConnectionPool(self.host, self.port, + cert_reqs='CERT_REQUIRED', + ca_certs=DEFAULT_CA) + + conn = https_pool._new_conn() + self.assertEqual(conn.__class__, VerifiedHTTPSConnection) + + with mock.patch('warnings.warn') as warn: + r = https_pool.request('GET', '/') + self.assertEqual(r.status, 200) + self.assertFalse(warn.called, warn.call_args_list) + + def test_invalid_common_name(self): + https_pool = HTTPSConnectionPool('127.0.0.1', self.port, + cert_reqs='CERT_REQUIRED', + ca_certs=DEFAULT_CA) + try: + https_pool.request('GET', '/') + self.fail("Didn't raise SSL invalid common name") + except SSLError as e: + self.assertTrue("doesn't match" in str(e)) + + def test_verified_with_bad_ca_certs(self): + https_pool = HTTPSConnectionPool(self.host, self.port, + cert_reqs='CERT_REQUIRED', + ca_certs=DEFAULT_CA_BAD) + + try: + https_pool.request('GET', '/') + self.fail("Didn't raise SSL error with bad CA certs") + except SSLError as e: + self.assertTrue('certificate verify failed' in str(e), + "Expected 'certificate verify failed'," + "instead got: %r" % e) + + def test_verified_without_ca_certs(self): + # default is cert_reqs=None which is ssl.CERT_NONE + https_pool = HTTPSConnectionPool(self.host, self.port, + cert_reqs='CERT_REQUIRED') + + try: + https_pool.request('GET', '/') + self.fail("Didn't raise SSL error with no CA certs when" + "CERT_REQUIRED is set") + except SSLError as e: + # there is a different error message depending on whether or + # not pyopenssl is injected + self.assertTrue('No root certificates specified' in str(e) or + 'certificate verify failed' in str(e), + "Expected 'No root certificates specified' or " + "'certificate verify failed', " + "instead got: %r" % e) + + def test_no_ssl(self): + pool = HTTPSConnectionPool(self.host, self.port) + pool.ConnectionCls = None + self.assertRaises(SSLError, pool._new_conn) + self.assertRaises(SSLError, pool.request, 'GET', '/') + + def test_unverified_ssl(self): + """ Test that bare HTTPSConnection can connect, make requests """ + pool = HTTPSConnectionPool(self.host, self.port) + pool.ConnectionCls = UnverifiedHTTPSConnection + + with mock.patch('warnings.warn') as warn: + r = pool.request('GET', '/') + self.assertEqual(r.status, 200) + self.assertTrue(warn.called) + + call, = warn.call_args_list + category = call[0][1] + self.assertEqual(category, InsecureRequestWarning) + + def test_ssl_unverified_with_ca_certs(self): + pool = HTTPSConnectionPool(self.host, self.port, + cert_reqs='CERT_NONE', + ca_certs=DEFAULT_CA_BAD) + + with mock.patch('warnings.warn') as warn: + r = pool.request('GET', '/') + self.assertEqual(r.status, 200) + self.assertTrue(warn.called) + + call, = warn.call_args_list + category = call[0][1] + self.assertEqual(category, InsecureRequestWarning) + + @requires_network + def test_ssl_verified_with_platform_ca_certs(self): + """ + We should rely on the platform CA file to validate authenticity of SSL + certificates. Since this file is used by many components of the OS, + such as curl, apt-get, etc., we decided to not touch it, in order to + not compromise the security of the OS running the test suite (typically + urllib3 developer's OS). + + This test assumes that httpbin.org uses a certificate signed by a well + known Certificate Authority. + """ + try: + import urllib3.contrib.pyopenssl + except ImportError: + raise SkipTest('Test requires PyOpenSSL') + if (urllib3.connection.ssl_wrap_socket is + urllib3.contrib.pyopenssl.orig_connection_ssl_wrap_socket): + # Not patched + raise SkipTest('Test should only be run after PyOpenSSL ' + 'monkey patching') + + https_pool = HTTPSConnectionPool('httpbin.org', 443, + cert_reqs=ssl.CERT_REQUIRED) + + https_pool.request('HEAD', '/') + + def test_assert_hostname_false(self): + https_pool = HTTPSConnectionPool('127.0.0.1', self.port, + cert_reqs='CERT_REQUIRED', + ca_certs=DEFAULT_CA) + + https_pool.assert_hostname = False + https_pool.request('GET', '/') + + def test_assert_specific_hostname(self): + https_pool = HTTPSConnectionPool('127.0.0.1', self.port, + cert_reqs='CERT_REQUIRED', + ca_certs=DEFAULT_CA) + + https_pool.assert_hostname = 'localhost' + https_pool.request('GET', '/') + + def test_assert_fingerprint_md5(self): + https_pool = HTTPSConnectionPool('127.0.0.1', self.port, + cert_reqs='CERT_REQUIRED', + ca_certs=DEFAULT_CA) + + https_pool.assert_fingerprint = 'CA:84:E1:AD0E5a:ef:2f:C3:09' \ + ':E7:30:F8:CD:C8:5B' + https_pool.request('GET', '/') + + def test_assert_fingerprint_sha1(self): + https_pool = HTTPSConnectionPool('127.0.0.1', self.port, + cert_reqs='CERT_REQUIRED', + ca_certs=DEFAULT_CA) + + https_pool.assert_fingerprint = 'CC:45:6A:90:82:F7FF:C0:8218:8e:' \ + '7A:F2:8A:D7:1E:07:33:67:DE' + https_pool.request('GET', '/') + + def test_assert_invalid_fingerprint(self): + https_pool = HTTPSConnectionPool('127.0.0.1', self.port, + cert_reqs='CERT_REQUIRED', + ca_certs=DEFAULT_CA) + + https_pool.assert_fingerprint = 'AA:AA:AA:AA:AA:AAAA:AA:AAAA:AA:' \ + 'AA:AA:AA:AA:AA:AA:AA:AA:AA' + + self.assertRaises(SSLError, https_pool.request, 'GET', '/') + https_pool._get_conn() + + # Uneven length + https_pool.assert_fingerprint = 'AA:A' + self.assertRaises(SSLError, https_pool.request, 'GET', '/') + https_pool._get_conn() + + # Invalid length + https_pool.assert_fingerprint = 'AA' + self.assertRaises(SSLError, https_pool.request, 'GET', '/') + + def test_verify_none_and_bad_fingerprint(self): + https_pool = HTTPSConnectionPool('127.0.0.1', self.port, + cert_reqs='CERT_NONE', + ca_certs=DEFAULT_CA_BAD) + + https_pool.assert_fingerprint = 'AA:AA:AA:AA:AA:AAAA:AA:AAAA:AA:' \ + 'AA:AA:AA:AA:AA:AA:AA:AA:AA' + self.assertRaises(SSLError, https_pool.request, 'GET', '/') + + def test_verify_none_and_good_fingerprint(self): + https_pool = HTTPSConnectionPool('127.0.0.1', self.port, + cert_reqs='CERT_NONE', + ca_certs=DEFAULT_CA_BAD) + + https_pool.assert_fingerprint = 'CC:45:6A:90:82:F7FF:C0:8218:8e:' \ + '7A:F2:8A:D7:1E:07:33:67:DE' + https_pool.request('GET', '/') + + @requires_network + def test_https_timeout(self): + timeout = Timeout(connect=0.001) + https_pool = HTTPSConnectionPool(TARPIT_HOST, self.port, + timeout=timeout, retries=False, + cert_reqs='CERT_REQUIRED') + + timeout = Timeout(total=None, connect=0.001) + https_pool = HTTPSConnectionPool(TARPIT_HOST, self.port, + timeout=timeout, retries=False, + cert_reqs='CERT_REQUIRED') + self.assertRaises(ConnectTimeoutError, https_pool.request, 'GET', '/') + + timeout = Timeout(read=0.001) + https_pool = HTTPSConnectionPool(self.host, self.port, + timeout=timeout, retries=False, + cert_reqs='CERT_REQUIRED') + https_pool.ca_certs = DEFAULT_CA + https_pool.assert_fingerprint = 'CC:45:6A:90:82:F7FF:C0:8218:8e:' \ + '7A:F2:8A:D7:1E:07:33:67:DE' + url = '/sleep?seconds=0.005' + self.assertRaises(ReadTimeoutError, https_pool.request, 'GET', url) + + timeout = Timeout(total=None) + https_pool = HTTPSConnectionPool(self.host, self.port, timeout=timeout, + cert_reqs='CERT_NONE') + https_pool.request('GET', '/') + + def test_tunnel(self): + """ test the _tunnel behavior """ + timeout = Timeout(total=None) + https_pool = HTTPSConnectionPool(self.host, self.port, timeout=timeout, + cert_reqs='CERT_NONE') + conn = https_pool._new_conn() + try: + conn.set_tunnel(self.host, self.port) + except AttributeError: # python 2.6 + conn._set_tunnel(self.host, self.port) + conn._tunnel = mock.Mock() + https_pool._make_request(conn, 'GET', '/') + conn._tunnel.assert_called_once_with() + + @onlyPy26OrOlder + def test_tunnel_old_python(self): + """HTTPSConnection can still make connections if _tunnel_host isn't set + + The _tunnel_host attribute was added in 2.6.3 - because our test runners + generally use the latest Python 2.6, we simulate the old version by + deleting the attribute from the HTTPSConnection. + """ + conn = self._pool._new_conn() + del conn._tunnel_host + self._pool._make_request(conn, 'GET', '/') + + @requires_network + def test_enhanced_timeout(self): + def new_pool(timeout, cert_reqs='CERT_REQUIRED'): + https_pool = HTTPSConnectionPool(TARPIT_HOST, self.port, + timeout=timeout, + retries=False, + cert_reqs=cert_reqs) + return https_pool + + https_pool = new_pool(Timeout(connect=0.001)) + conn = https_pool._new_conn() + self.assertRaises(ConnectTimeoutError, https_pool.request, 'GET', '/') + self.assertRaises(ConnectTimeoutError, https_pool._make_request, conn, + 'GET', '/') + + https_pool = new_pool(Timeout(connect=5)) + self.assertRaises(ConnectTimeoutError, https_pool.request, 'GET', '/', + timeout=Timeout(connect=0.001)) + + t = Timeout(total=None) + https_pool = new_pool(t) + conn = https_pool._new_conn() + self.assertRaises(ConnectTimeoutError, https_pool.request, 'GET', '/', + timeout=Timeout(total=None, connect=0.001)) + + def test_enhanced_ssl_connection(self): + fingerprint = 'CC:45:6A:90:82:F7FF:C0:8218:8e:7A:F2:8A:D7:1E:07:33:67:DE' + + conn = VerifiedHTTPSConnection(self.host, self.port) + https_pool = HTTPSConnectionPool(self.host, self.port, + cert_reqs='CERT_REQUIRED', ca_certs=DEFAULT_CA, + assert_fingerprint=fingerprint) + + https_pool._make_request(conn, 'GET', '/') + + def test_ssl_correct_system_time(self): + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter('always') + self._pool.request('GET', '/') + + self.assertEqual([], w) + + def test_ssl_wrong_system_time(self): + with mock.patch('urllib3.connection.datetime') as mock_date: + mock_date.date.today.return_value = datetime.date(1970, 1, 1) + + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter('always') + self._pool.request('GET', '/') + + self.assertEqual(len(w), 1) + warning = w[0] + + self.assertEqual(SystemTimeWarning, warning.category) + self.assertTrue(str(RECENT_DATE) in warning.message.args[0]) + + +class TestHTTPS_TLSv1(HTTPSDummyServerTestCase): + certs = DEFAULT_CERTS.copy() + certs['ssl_version'] = ssl.PROTOCOL_TLSv1 + + def setUp(self): + self._pool = HTTPSConnectionPool(self.host, self.port) + + def test_set_ssl_version_to_sslv3(self): + self._pool.ssl_version = ssl.PROTOCOL_SSLv3 + self.assertRaises(SSLError, self._pool.request, 'GET', '/') + + def test_ssl_version_as_string(self): + self._pool.ssl_version = 'PROTOCOL_SSLv3' + self.assertRaises(SSLError, self._pool.request, 'GET', '/') + + def test_ssl_version_as_short_string(self): + self._pool.ssl_version = 'SSLv3' + self.assertRaises(SSLError, self._pool.request, 'GET', '/') + + +if __name__ == '__main__': + unittest.main() diff --git a/test/with_dummyserver/test_https.pyc b/test/with_dummyserver/test_https.pyc Binary files differnew file mode 100644 index 0000000..6d85316 --- /dev/null +++ b/test/with_dummyserver/test_https.pyc diff --git a/test/with_dummyserver/test_poolmanager.py b/test/with_dummyserver/test_poolmanager.py new file mode 100644 index 0000000..52ff974 --- /dev/null +++ b/test/with_dummyserver/test_poolmanager.py @@ -0,0 +1,136 @@ +import unittest +import json + +from dummyserver.testcase import (HTTPDummyServerTestCase, + IPv6HTTPDummyServerTestCase) +from urllib3.poolmanager import PoolManager +from urllib3.connectionpool import port_by_scheme +from urllib3.exceptions import MaxRetryError, SSLError + + +class TestPoolManager(HTTPDummyServerTestCase): + + def setUp(self): + self.base_url = 'http://%s:%d' % (self.host, self.port) + self.base_url_alt = 'http://%s:%d' % (self.host_alt, self.port) + + def test_redirect(self): + http = PoolManager() + + r = http.request('GET', '%s/redirect' % self.base_url, + fields={'target': '%s/' % self.base_url}, + redirect=False) + + self.assertEqual(r.status, 303) + + r = http.request('GET', '%s/redirect' % self.base_url, + fields={'target': '%s/' % self.base_url}) + + self.assertEqual(r.status, 200) + self.assertEqual(r.data, b'Dummy server!') + + def test_redirect_twice(self): + http = PoolManager() + + r = http.request('GET', '%s/redirect' % self.base_url, + fields={'target': '%s/redirect' % self.base_url}, + redirect=False) + + self.assertEqual(r.status, 303) + + r = http.request('GET', '%s/redirect' % self.base_url, + fields={'target': '%s/redirect?target=%s/' % (self.base_url, self.base_url)}) + + self.assertEqual(r.status, 200) + self.assertEqual(r.data, b'Dummy server!') + + def test_redirect_to_relative_url(self): + http = PoolManager() + + r = http.request('GET', '%s/redirect' % self.base_url, + fields = {'target': '/redirect'}, + redirect = False) + + self.assertEqual(r.status, 303) + + r = http.request('GET', '%s/redirect' % self.base_url, + fields = {'target': '/redirect'}) + + self.assertEqual(r.status, 200) + self.assertEqual(r.data, b'Dummy server!') + + def test_cross_host_redirect(self): + http = PoolManager() + + cross_host_location = '%s/echo?a=b' % self.base_url_alt + try: + http.request('GET', '%s/redirect' % self.base_url, + fields={'target': cross_host_location}, + timeout=0.01, retries=0) + self.fail("Request succeeded instead of raising an exception like it should.") + + except MaxRetryError: + pass + + r = http.request('GET', '%s/redirect' % self.base_url, + fields={'target': '%s/echo?a=b' % self.base_url_alt}, + timeout=0.01, retries=1) + + self.assertEqual(r._pool.host, self.host_alt) + + def test_missing_port(self): + # Can a URL that lacks an explicit port like ':80' succeed, or + # will all such URLs fail with an error? + + http = PoolManager() + + # By globally adjusting `port_by_scheme` we pretend for a moment + # that HTTP's default port is not 80, but is the port at which + # our test server happens to be listening. + port_by_scheme['http'] = self.port + try: + r = http.request('GET', 'http://%s/' % self.host, retries=0) + finally: + port_by_scheme['http'] = 80 + + self.assertEqual(r.status, 200) + self.assertEqual(r.data, b'Dummy server!') + + def test_headers(self): + http = PoolManager(headers={'Foo': 'bar'}) + + r = http.request_encode_url('GET', '%s/headers' % self.base_url) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), 'bar') + + r = http.request_encode_body('POST', '%s/headers' % self.base_url) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), 'bar') + + r = http.request_encode_url('GET', '%s/headers' % self.base_url, headers={'Baz': 'quux'}) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), None) + self.assertEqual(returned_headers.get('Baz'), 'quux') + + r = http.request_encode_body('GET', '%s/headers' % self.base_url, headers={'Baz': 'quux'}) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), None) + self.assertEqual(returned_headers.get('Baz'), 'quux') + + def test_http_with_ssl_keywords(self): + http = PoolManager(ca_certs='REQUIRED') + + r = http.request('GET', 'http://%s:%s/' % (self.host, self.port)) + self.assertEqual(r.status, 200) + + +class TestIPv6PoolManager(IPv6HTTPDummyServerTestCase): + def setUp(self): + self.base_url = 'http://[%s]:%d' % (self.host, self.port) + + def test_ipv6(self): + http = PoolManager() + http.request('GET', self.base_url) + +if __name__ == '__main__': + unittest.main() diff --git a/test/with_dummyserver/test_poolmanager.pyc b/test/with_dummyserver/test_poolmanager.pyc Binary files differnew file mode 100644 index 0000000..26c52e9 --- /dev/null +++ b/test/with_dummyserver/test_poolmanager.pyc diff --git a/test/with_dummyserver/test_proxy_poolmanager.py b/test/with_dummyserver/test_proxy_poolmanager.py new file mode 100644 index 0000000..61eedf1 --- /dev/null +++ b/test/with_dummyserver/test_proxy_poolmanager.py @@ -0,0 +1,263 @@ +import unittest +import json +import socket + +from dummyserver.testcase import HTTPDummyProxyTestCase +from dummyserver.server import ( + DEFAULT_CA, DEFAULT_CA_BAD, get_unreachable_address) + +from urllib3.poolmanager import proxy_from_url, ProxyManager +from urllib3.exceptions import MaxRetryError, SSLError, ProxyError +from urllib3.connectionpool import connection_from_url, VerifiedHTTPSConnection + + +class TestHTTPProxyManager(HTTPDummyProxyTestCase): + + def setUp(self): + self.http_url = 'http://%s:%d' % (self.http_host, self.http_port) + self.http_url_alt = 'http://%s:%d' % (self.http_host_alt, + self.http_port) + self.https_url = 'https://%s:%d' % (self.https_host, self.https_port) + self.https_url_alt = 'https://%s:%d' % (self.https_host_alt, + self.https_port) + self.proxy_url = 'http://%s:%d' % (self.proxy_host, self.proxy_port) + + def test_basic_proxy(self): + http = proxy_from_url(self.proxy_url) + + r = http.request('GET', '%s/' % self.http_url) + self.assertEqual(r.status, 200) + + r = http.request('GET', '%s/' % self.https_url) + self.assertEqual(r.status, 200) + + def test_nagle_proxy(self): + """ Test that proxy connections do not have TCP_NODELAY turned on """ + http = proxy_from_url(self.proxy_url) + hc2 = http.connection_from_host(self.http_host, self.http_port) + conn = hc2._get_conn() + hc2._make_request(conn, 'GET', '/') + tcp_nodelay_setting = conn.sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) + self.assertEqual(tcp_nodelay_setting, 0, + ("Expected TCP_NODELAY for proxies to be set " + "to zero, instead was %s" % tcp_nodelay_setting)) + + def test_proxy_conn_fail(self): + host, port = get_unreachable_address() + http = proxy_from_url('http://%s:%s/' % (host, port), retries=1) + self.assertRaises(MaxRetryError, http.request, 'GET', + '%s/' % self.https_url) + self.assertRaises(MaxRetryError, http.request, 'GET', + '%s/' % self.http_url) + + try: + http.request('GET', '%s/' % self.http_url) + self.fail("Failed to raise retry error.") + except MaxRetryError as e: + self.assertEqual(type(e.reason), ProxyError) + + def test_oldapi(self): + http = ProxyManager(connection_from_url(self.proxy_url)) + + r = http.request('GET', '%s/' % self.http_url) + self.assertEqual(r.status, 200) + + r = http.request('GET', '%s/' % self.https_url) + self.assertEqual(r.status, 200) + + def test_proxy_verified(self): + http = proxy_from_url(self.proxy_url, cert_reqs='REQUIRED', + ca_certs=DEFAULT_CA_BAD) + https_pool = http._new_pool('https', self.https_host, + self.https_port) + try: + https_pool.request('GET', '/') + self.fail("Didn't raise SSL error with wrong CA") + except SSLError as e: + self.assertTrue('certificate verify failed' in str(e), + "Expected 'certificate verify failed'," + "instead got: %r" % e) + + http = proxy_from_url(self.proxy_url, cert_reqs='REQUIRED', + ca_certs=DEFAULT_CA) + https_pool = http._new_pool('https', self.https_host, + self.https_port) + + conn = https_pool._new_conn() + self.assertEqual(conn.__class__, VerifiedHTTPSConnection) + https_pool.request('GET', '/') # Should succeed without exceptions. + + http = proxy_from_url(self.proxy_url, cert_reqs='REQUIRED', + ca_certs=DEFAULT_CA) + https_fail_pool = http._new_pool('https', '127.0.0.1', self.https_port) + + try: + https_fail_pool.request('GET', '/') + self.fail("Didn't raise SSL invalid common name") + except SSLError as e: + self.assertTrue("doesn't match" in str(e)) + + def test_redirect(self): + http = proxy_from_url(self.proxy_url) + + r = http.request('GET', '%s/redirect' % self.http_url, + fields={'target': '%s/' % self.http_url}, + redirect=False) + + self.assertEqual(r.status, 303) + + r = http.request('GET', '%s/redirect' % self.http_url, + fields={'target': '%s/' % self.http_url}) + + self.assertEqual(r.status, 200) + self.assertEqual(r.data, b'Dummy server!') + + def test_cross_host_redirect(self): + http = proxy_from_url(self.proxy_url) + + cross_host_location = '%s/echo?a=b' % self.http_url_alt + try: + http.request('GET', '%s/redirect' % self.http_url, + fields={'target': cross_host_location}, + timeout=0.1, retries=0) + self.fail("We don't want to follow redirects here.") + + except MaxRetryError: + pass + + r = http.request('GET', '%s/redirect' % self.http_url, + fields={'target': '%s/echo?a=b' % self.http_url_alt}, + timeout=0.1, retries=1) + self.assertNotEqual(r._pool.host, self.http_host_alt) + + def test_cross_protocol_redirect(self): + http = proxy_from_url(self.proxy_url) + + cross_protocol_location = '%s/echo?a=b' % self.https_url + try: + http.request('GET', '%s/redirect' % self.http_url, + fields={'target': cross_protocol_location}, + timeout=0.1, retries=0) + self.fail("We don't want to follow redirects here.") + + except MaxRetryError: + pass + + r = http.request('GET', '%s/redirect' % self.http_url, + fields={'target': '%s/echo?a=b' % self.https_url}, + timeout=0.1, retries=1) + self.assertEqual(r._pool.host, self.https_host) + + def test_headers(self): + http = proxy_from_url(self.proxy_url,headers={'Foo': 'bar'}, + proxy_headers={'Hickory': 'dickory'}) + + r = http.request_encode_url('GET', '%s/headers' % self.http_url) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), 'bar') + self.assertEqual(returned_headers.get('Hickory'), 'dickory') + self.assertEqual(returned_headers.get('Host'), + '%s:%s'%(self.http_host,self.http_port)) + + r = http.request_encode_url('GET', '%s/headers' % self.http_url_alt) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), 'bar') + self.assertEqual(returned_headers.get('Hickory'), 'dickory') + self.assertEqual(returned_headers.get('Host'), + '%s:%s'%(self.http_host_alt,self.http_port)) + + r = http.request_encode_url('GET', '%s/headers' % self.https_url) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), 'bar') + self.assertEqual(returned_headers.get('Hickory'), None) + self.assertEqual(returned_headers.get('Host'), + '%s:%s'%(self.https_host,self.https_port)) + + r = http.request_encode_url('GET', '%s/headers' % self.https_url_alt) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), 'bar') + self.assertEqual(returned_headers.get('Hickory'), None) + self.assertEqual(returned_headers.get('Host'), + '%s:%s'%(self.https_host_alt,self.https_port)) + + r = http.request_encode_body('POST', '%s/headers' % self.http_url) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), 'bar') + self.assertEqual(returned_headers.get('Hickory'), 'dickory') + self.assertEqual(returned_headers.get('Host'), + '%s:%s'%(self.http_host,self.http_port)) + + r = http.request_encode_url('GET', '%s/headers' % self.http_url, headers={'Baz': 'quux'}) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), None) + self.assertEqual(returned_headers.get('Baz'), 'quux') + self.assertEqual(returned_headers.get('Hickory'), 'dickory') + self.assertEqual(returned_headers.get('Host'), + '%s:%s'%(self.http_host,self.http_port)) + + r = http.request_encode_url('GET', '%s/headers' % self.https_url, headers={'Baz': 'quux'}) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), None) + self.assertEqual(returned_headers.get('Baz'), 'quux') + self.assertEqual(returned_headers.get('Hickory'), None) + self.assertEqual(returned_headers.get('Host'), + '%s:%s'%(self.https_host,self.https_port)) + + r = http.request_encode_body('GET', '%s/headers' % self.http_url, headers={'Baz': 'quux'}) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), None) + self.assertEqual(returned_headers.get('Baz'), 'quux') + self.assertEqual(returned_headers.get('Hickory'), 'dickory') + self.assertEqual(returned_headers.get('Host'), + '%s:%s'%(self.http_host,self.http_port)) + + r = http.request_encode_body('GET', '%s/headers' % self.https_url, headers={'Baz': 'quux'}) + returned_headers = json.loads(r.data.decode()) + self.assertEqual(returned_headers.get('Foo'), None) + self.assertEqual(returned_headers.get('Baz'), 'quux') + self.assertEqual(returned_headers.get('Hickory'), None) + self.assertEqual(returned_headers.get('Host'), + '%s:%s'%(self.https_host,self.https_port)) + + def test_proxy_pooling(self): + http = proxy_from_url(self.proxy_url) + + for x in range(2): + r = http.urlopen('GET', self.http_url) + self.assertEqual(len(http.pools), 1) + + for x in range(2): + r = http.urlopen('GET', self.http_url_alt) + self.assertEqual(len(http.pools), 1) + + for x in range(2): + r = http.urlopen('GET', self.https_url) + self.assertEqual(len(http.pools), 2) + + for x in range(2): + r = http.urlopen('GET', self.https_url_alt) + self.assertEqual(len(http.pools), 3) + + def test_proxy_pooling_ext(self): + http = proxy_from_url(self.proxy_url) + hc1 = http.connection_from_url(self.http_url) + hc2 = http.connection_from_host(self.http_host, self.http_port) + hc3 = http.connection_from_url(self.http_url_alt) + hc4 = http.connection_from_host(self.http_host_alt, self.http_port) + self.assertEqual(hc1,hc2) + self.assertEqual(hc2,hc3) + self.assertEqual(hc3,hc4) + + sc1 = http.connection_from_url(self.https_url) + sc2 = http.connection_from_host(self.https_host, + self.https_port,scheme='https') + sc3 = http.connection_from_url(self.https_url_alt) + sc4 = http.connection_from_host(self.https_host_alt, + self.https_port,scheme='https') + self.assertEqual(sc1,sc2) + self.assertNotEqual(sc2,sc3) + self.assertEqual(sc3,sc4) + + +if __name__ == '__main__': + unittest.main() diff --git a/test/with_dummyserver/test_proxy_poolmanager.pyc b/test/with_dummyserver/test_proxy_poolmanager.pyc Binary files differnew file mode 100644 index 0000000..12c320c --- /dev/null +++ b/test/with_dummyserver/test_proxy_poolmanager.pyc diff --git a/test/with_dummyserver/test_socketlevel.py b/test/with_dummyserver/test_socketlevel.py new file mode 100644 index 0000000..e1ac1c6 --- /dev/null +++ b/test/with_dummyserver/test_socketlevel.py @@ -0,0 +1,544 @@ +# TODO: Break this module up into pieces. Maybe group by functionality tested +# rather than the socket level-ness of it. + +from urllib3 import HTTPConnectionPool, HTTPSConnectionPool +from urllib3.poolmanager import proxy_from_url +from urllib3.exceptions import ( + MaxRetryError, + ProxyError, + ReadTimeoutError, + SSLError, + ProtocolError, +) +from urllib3.util.ssl_ import HAS_SNI +from urllib3.util.timeout import Timeout +from urllib3.util.retry import Retry + +from dummyserver.testcase import SocketDummyServerTestCase +from dummyserver.server import ( + DEFAULT_CERTS, DEFAULT_CA, get_unreachable_address) + +from nose.plugins.skip import SkipTest +from threading import Event +import socket +import ssl + + +class TestCookies(SocketDummyServerTestCase): + + def test_multi_setcookie(self): + def multicookie_response_handler(listener): + sock = listener.accept()[0] + + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += sock.recv(65536) + + sock.send(b'HTTP/1.1 200 OK\r\n' + b'Set-Cookie: foo=1\r\n' + b'Set-Cookie: bar=1\r\n' + b'\r\n') + sock.close() + + self._start_server(multicookie_response_handler) + pool = HTTPConnectionPool(self.host, self.port) + r = pool.request('GET', '/', retries=0) + self.assertEqual(r.headers, {'set-cookie': 'foo=1, bar=1'}) + + +class TestSNI(SocketDummyServerTestCase): + + def test_hostname_in_first_request_packet(self): + if not HAS_SNI: + raise SkipTest('SNI-support not available') + + done_receiving = Event() + self.buf = b'' + + def socket_handler(listener): + sock = listener.accept()[0] + + self.buf = sock.recv(65536) # We only accept one packet + done_receiving.set() # let the test know it can proceed + sock.close() + + self._start_server(socket_handler) + pool = HTTPSConnectionPool(self.host, self.port) + try: + pool.request('GET', '/', retries=0) + except SSLError: # We are violating the protocol + pass + done_receiving.wait() + self.assertTrue(self.host.encode() in self.buf, + "missing hostname in SSL handshake") + + +class TestSocketClosing(SocketDummyServerTestCase): + + def test_recovery_when_server_closes_connection(self): + # Does the pool work seamlessly if an open connection in the + # connection pool gets hung up on by the server, then reaches + # the front of the queue again? + + done_closing = Event() + + def socket_handler(listener): + for i in 0, 1: + sock = listener.accept()[0] + + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf = sock.recv(65536) + + body = 'Response %d' % i + sock.send(('HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: %d\r\n' + '\r\n' + '%s' % (len(body), body)).encode('utf-8')) + + sock.close() # simulate a server timing out, closing socket + done_closing.set() # let the test know it can proceed + + self._start_server(socket_handler) + pool = HTTPConnectionPool(self.host, self.port) + + response = pool.request('GET', '/', retries=0) + self.assertEqual(response.status, 200) + self.assertEqual(response.data, b'Response 0') + + done_closing.wait() # wait until the socket in our pool gets closed + + response = pool.request('GET', '/', retries=0) + self.assertEqual(response.status, 200) + self.assertEqual(response.data, b'Response 1') + + def test_connection_refused(self): + # Does the pool retry if there is no listener on the port? + host, port = get_unreachable_address() + pool = HTTPConnectionPool(host, port) + self.assertRaises(MaxRetryError, pool.request, 'GET', '/', retries=0) + + def test_connection_read_timeout(self): + timed_out = Event() + def socket_handler(listener): + sock = listener.accept()[0] + while not sock.recv(65536).endswith(b'\r\n\r\n'): + pass + + timed_out.wait() + sock.close() + + self._start_server(socket_handler) + pool = HTTPConnectionPool(self.host, self.port, timeout=0.001, retries=False) + + try: + self.assertRaises(ReadTimeoutError, pool.request, 'GET', '/') + finally: + timed_out.set() + + def test_timeout_errors_cause_retries(self): + def socket_handler(listener): + sock_timeout = listener.accept()[0] + + # Wait for a second request before closing the first socket. + sock = listener.accept()[0] + sock_timeout.close() + + # Second request. + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += sock.recv(65536) + + # Now respond immediately. + body = 'Response 2' + sock.send(('HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: %d\r\n' + '\r\n' + '%s' % (len(body), body)).encode('utf-8')) + + sock.close() + + # In situations where the main thread throws an exception, the server + # thread can hang on an accept() call. This ensures everything times + # out within 1 second. This should be long enough for any socket + # operations in the test suite to complete + default_timeout = socket.getdefaulttimeout() + socket.setdefaulttimeout(1) + + try: + self._start_server(socket_handler) + t = Timeout(connect=0.001, read=0.001) + pool = HTTPConnectionPool(self.host, self.port, timeout=t) + + response = pool.request('GET', '/', retries=1) + self.assertEqual(response.status, 200) + self.assertEqual(response.data, b'Response 2') + finally: + socket.setdefaulttimeout(default_timeout) + + def test_delayed_body_read_timeout(self): + timed_out = Event() + + def socket_handler(listener): + sock = listener.accept()[0] + buf = b'' + body = 'Hi' + while not buf.endswith(b'\r\n\r\n'): + buf = sock.recv(65536) + sock.send(('HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: %d\r\n' + '\r\n' % len(body)).encode('utf-8')) + + timed_out.wait() + sock.send(body.encode('utf-8')) + sock.close() + + self._start_server(socket_handler) + pool = HTTPConnectionPool(self.host, self.port) + + response = pool.urlopen('GET', '/', retries=0, preload_content=False, + timeout=Timeout(connect=1, read=0.001)) + try: + self.assertRaises(ReadTimeoutError, response.read) + finally: + timed_out.set() + + def test_incomplete_response(self): + body = 'Response' + partial_body = body[:2] + + def socket_handler(listener): + sock = listener.accept()[0] + + # Consume request + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf = sock.recv(65536) + + # Send partial response and close socket. + sock.send(( + 'HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: %d\r\n' + '\r\n' + '%s' % (len(body), partial_body)).encode('utf-8') + ) + sock.close() + + self._start_server(socket_handler) + pool = HTTPConnectionPool(self.host, self.port) + + response = pool.request('GET', '/', retries=0, preload_content=False) + self.assertRaises(ProtocolError, response.read) + + def test_retry_weird_http_version(self): + """ Retry class should handle httplib.BadStatusLine errors properly """ + + def socket_handler(listener): + sock = listener.accept()[0] + # First request. + # Pause before responding so the first request times out. + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += sock.recv(65536) + + # send unknown http protocol + body = "bad http 0.5 response" + sock.send(('HTTP/0.5 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: %d\r\n' + '\r\n' + '%s' % (len(body), body)).encode('utf-8')) + sock.close() + + # Second request. + sock = listener.accept()[0] + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += sock.recv(65536) + + # Now respond immediately. + sock.send(('HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: %d\r\n' + '\r\n' + 'foo' % (len('foo'))).encode('utf-8')) + + sock.close() # Close the socket. + + self._start_server(socket_handler) + pool = HTTPConnectionPool(self.host, self.port) + retry = Retry(read=1) + response = pool.request('GET', '/', retries=retry) + self.assertEqual(response.status, 200) + self.assertEqual(response.data, b'foo') + + + +class TestProxyManager(SocketDummyServerTestCase): + + def test_simple(self): + def echo_socket_handler(listener): + sock = listener.accept()[0] + + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += sock.recv(65536) + + sock.send(('HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: %d\r\n' + '\r\n' + '%s' % (len(buf), buf.decode('utf-8'))).encode('utf-8')) + sock.close() + + self._start_server(echo_socket_handler) + base_url = 'http://%s:%d' % (self.host, self.port) + proxy = proxy_from_url(base_url) + + r = proxy.request('GET', 'http://google.com/') + + self.assertEqual(r.status, 200) + # FIXME: The order of the headers is not predictable right now. We + # should fix that someday (maybe when we migrate to + # OrderedDict/MultiDict). + self.assertEqual(sorted(r.data.split(b'\r\n')), + sorted([ + b'GET http://google.com/ HTTP/1.1', + b'Host: google.com', + b'Accept-Encoding: identity', + b'Accept: */*', + b'', + b'', + ])) + + def test_headers(self): + def echo_socket_handler(listener): + sock = listener.accept()[0] + + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += sock.recv(65536) + + sock.send(('HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: %d\r\n' + '\r\n' + '%s' % (len(buf), buf.decode('utf-8'))).encode('utf-8')) + sock.close() + + self._start_server(echo_socket_handler) + base_url = 'http://%s:%d' % (self.host, self.port) + + # Define some proxy headers. + proxy_headers = {'For The Proxy': 'YEAH!'} + proxy = proxy_from_url(base_url, proxy_headers=proxy_headers) + + conn = proxy.connection_from_url('http://www.google.com/') + + r = conn.urlopen('GET', 'http://www.google.com/', assert_same_host=False) + + self.assertEqual(r.status, 200) + # FIXME: The order of the headers is not predictable right now. We + # should fix that someday (maybe when we migrate to + # OrderedDict/MultiDict). + self.assertTrue(b'For The Proxy: YEAH!\r\n' in r.data) + + def test_retries(self): + def echo_socket_handler(listener): + sock = listener.accept()[0] + # First request, which should fail + sock.close() + + # Second request + sock = listener.accept()[0] + + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += sock.recv(65536) + + sock.send(('HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: %d\r\n' + '\r\n' + '%s' % (len(buf), buf.decode('utf-8'))).encode('utf-8')) + sock.close() + + self._start_server(echo_socket_handler) + base_url = 'http://%s:%d' % (self.host, self.port) + + proxy = proxy_from_url(base_url) + conn = proxy.connection_from_url('http://www.google.com') + + r = conn.urlopen('GET', 'http://www.google.com', + assert_same_host=False, retries=1) + self.assertEqual(r.status, 200) + + self.assertRaises(ProxyError, conn.urlopen, 'GET', + 'http://www.google.com', + assert_same_host=False, retries=False) + + def test_connect_reconn(self): + def proxy_ssl_one(listener): + sock = listener.accept()[0] + + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += sock.recv(65536) + s = buf.decode('utf-8') + if not s.startswith('CONNECT '): + sock.send(('HTTP/1.1 405 Method not allowed\r\n' + 'Allow: CONNECT\r\n\r\n').encode('utf-8')) + sock.close() + return + + if not s.startswith('CONNECT %s:443' % (self.host,)): + sock.send(('HTTP/1.1 403 Forbidden\r\n\r\n').encode('utf-8')) + sock.close() + return + + sock.send(('HTTP/1.1 200 Connection Established\r\n\r\n').encode('utf-8')) + ssl_sock = ssl.wrap_socket(sock, + server_side=True, + keyfile=DEFAULT_CERTS['keyfile'], + certfile=DEFAULT_CERTS['certfile'], + ca_certs=DEFAULT_CA) + + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += ssl_sock.recv(65536) + + ssl_sock.send(('HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: 2\r\n' + 'Connection: close\r\n' + '\r\n' + 'Hi').encode('utf-8')) + ssl_sock.close() + def echo_socket_handler(listener): + proxy_ssl_one(listener) + proxy_ssl_one(listener) + + self._start_server(echo_socket_handler) + base_url = 'http://%s:%d' % (self.host, self.port) + + proxy = proxy_from_url(base_url) + + url = 'https://{0}'.format(self.host) + conn = proxy.connection_from_url(url) + r = conn.urlopen('GET', url, retries=0) + self.assertEqual(r.status, 200) + r = conn.urlopen('GET', url, retries=0) + self.assertEqual(r.status, 200) + + +class TestSSL(SocketDummyServerTestCase): + + def test_ssl_failure_midway_through_conn(self): + def socket_handler(listener): + sock = listener.accept()[0] + sock2 = sock.dup() + ssl_sock = ssl.wrap_socket(sock, + server_side=True, + keyfile=DEFAULT_CERTS['keyfile'], + certfile=DEFAULT_CERTS['certfile'], + ca_certs=DEFAULT_CA) + + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += ssl_sock.recv(65536) + + # Deliberately send from the non-SSL socket. + sock2.send(( + 'HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: 2\r\n' + '\r\n' + 'Hi').encode('utf-8')) + sock2.close() + ssl_sock.close() + + self._start_server(socket_handler) + pool = HTTPSConnectionPool(self.host, self.port) + + self.assertRaises(SSLError, pool.request, 'GET', '/', retries=0) + + def test_ssl_read_timeout(self): + timed_out = Event() + + def socket_handler(listener): + sock = listener.accept()[0] + ssl_sock = ssl.wrap_socket(sock, + server_side=True, + keyfile=DEFAULT_CERTS['keyfile'], + certfile=DEFAULT_CERTS['certfile'], + ca_certs=DEFAULT_CA) + + buf = b'' + while not buf.endswith(b'\r\n\r\n'): + buf += ssl_sock.recv(65536) + + # Send incomplete message (note Content-Length) + ssl_sock.send(( + 'HTTP/1.1 200 OK\r\n' + 'Content-Type: text/plain\r\n' + 'Content-Length: 10\r\n' + '\r\n' + 'Hi-').encode('utf-8')) + timed_out.wait() + + sock.close() + ssl_sock.close() + + self._start_server(socket_handler) + pool = HTTPSConnectionPool(self.host, self.port) + + response = pool.urlopen('GET', '/', retries=0, preload_content=False, + timeout=Timeout(connect=1, read=0.001)) + try: + self.assertRaises(ReadTimeoutError, response.read) + finally: + timed_out.set() + + +def consume_socket(sock, chunks=65536): + while not sock.recv(chunks).endswith(b'\r\n\r\n'): + pass + + +def create_response_handler(response, num=1): + def socket_handler(listener): + for _ in range(num): + sock = listener.accept()[0] + consume_socket(sock) + + sock.send(response) + sock.close() + + return socket_handler + + +class TestErrorWrapping(SocketDummyServerTestCase): + + def test_bad_statusline(self): + handler = create_response_handler( + b'HTTP/1.1 Omg What Is This?\r\n' + b'Content-Length: 0\r\n' + b'\r\n' + ) + self._start_server(handler) + pool = HTTPConnectionPool(self.host, self.port, retries=False) + self.assertRaises(ProtocolError, pool.request, 'GET', '/') + + def test_unknown_protocol(self): + handler = create_response_handler( + b'HTTP/1000 200 OK\r\n' + b'Content-Length: 0\r\n' + b'\r\n' + ) + self._start_server(handler) + pool = HTTPConnectionPool(self.host, self.port, retries=False) + self.assertRaises(ProtocolError, pool.request, 'GET', '/') diff --git a/test/with_dummyserver/test_socketlevel.pyc b/test/with_dummyserver/test_socketlevel.pyc Binary files differnew file mode 100644 index 0000000..ba3b19e --- /dev/null +++ b/test/with_dummyserver/test_socketlevel.pyc diff --git a/urllib3.egg-info/PKG-INFO b/urllib3.egg-info/PKG-INFO new file mode 100644 index 0000000..964cd4b --- /dev/null +++ b/urllib3.egg-info/PKG-INFO @@ -0,0 +1,625 @@ +Metadata-Version: 1.1 +Name: urllib3 +Version: 1.9.1 +Summary: HTTP library with thread-safe connection pooling, file post, and more. +Home-page: http://urllib3.readthedocs.org/ +Author: Andrey Petrov +Author-email: andrey.petrov@shazow.net +License: MIT +Description: ======= + urllib3 + ======= + + .. image:: https://travis-ci.org/shazow/urllib3.png?branch=master + :target: https://travis-ci.org/shazow/urllib3 + + .. image:: https://www.bountysource.com/badge/tracker?tracker_id=192525 + :target: https://www.bountysource.com/trackers/192525-urllib3?utm_source=192525&utm_medium=shield&utm_campaign=TRACKER_BADGE + + + Highlights + ========== + + - Re-use the same socket connection for multiple requests + (``HTTPConnectionPool`` and ``HTTPSConnectionPool``) + (with optional client-side certificate verification). + - File posting (``encode_multipart_formdata``). + - Built-in redirection and retries (optional). + - Supports gzip and deflate decoding. + - Thread-safe and sanity-safe. + - Works with AppEngine, gevent, and eventlib. + - Tested on Python 2.6+, Python 3.2+, and PyPy, with 100% unit test coverage. + - Small and easy to understand codebase perfect for extending and building upon. + For a more comprehensive solution, have a look at + `Requests <http://python-requests.org/>`_ which is also powered by ``urllib3``. + + + You might already be using urllib3! + =================================== + + ``urllib3`` powers `many great Python libraries + <https://sourcegraph.com/search?q=package+urllib3>`_, including ``pip`` and + ``requests``. + + + What's wrong with urllib and urllib2? + ===================================== + + There are two critical features missing from the Python standard library: + Connection re-using/pooling and file posting. It's not terribly hard to + implement these yourself, but it's much easier to use a module that already + did the work for you. + + The Python standard libraries ``urllib`` and ``urllib2`` have little to do + with each other. They were designed to be independent and standalone, each + solving a different scope of problems, and ``urllib3`` follows in a similar + vein. + + + Why do I want to reuse connections? + =================================== + + Performance. When you normally do a urllib call, a separate socket + connection is created with each request. By reusing existing sockets + (supported since HTTP 1.1), the requests will take up less resources on the + server's end, and also provide a faster response time at the client's end. + With some simple benchmarks (see `test/benchmark.py + <https://github.com/shazow/urllib3/blob/master/test/benchmark.py>`_ + ), downloading 15 URLs from google.com is about twice as fast when using + HTTPConnectionPool (which uses 1 connection) than using plain urllib (which + uses 15 connections). + + This library is perfect for: + + - Talking to an API + - Crawling a website + - Any situation where being able to post files, handle redirection, and + retrying is useful. It's relatively lightweight, so it can be used for + anything! + + + Examples + ======== + + Go to `urllib3.readthedocs.org <http://urllib3.readthedocs.org>`_ + for more nice syntax-highlighted examples. + + But, long story short:: + + import urllib3 + + http = urllib3.PoolManager() + + r = http.request('GET', 'http://google.com/') + + print r.status, r.data + + The ``PoolManager`` will take care of reusing connections for you whenever + you request the same host. For more fine-grained control of your connection + pools, you should look at `ConnectionPool + <http://urllib3.readthedocs.org/#connectionpool>`_. + + + Run the tests + ============= + + We use some external dependencies, multiple interpreters and code coverage + analysis while running test suite. Our ``Makefile`` handles much of this for + you as long as you're running it `inside of a virtualenv + <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_:: + + $ make test + [... magically installs dependencies and runs tests on your virtualenv] + Ran 182 tests in 1.633s + + OK (SKIP=6) + + Note that code coverage less than 100% is regarded as a failing run. Some + platform-specific tests are skipped unless run in that platform. To make sure + the code works in all of urllib3's supported platforms, you can run our ``tox`` + suite:: + + $ make test-all + [... tox creates a virtualenv for every platform and runs tests inside of each] + py26: commands succeeded + py27: commands succeeded + py32: commands succeeded + py33: commands succeeded + py34: commands succeeded + + Our test suite `runs continuously on Travis CI + <https://travis-ci.org/shazow/urllib3>`_ with every pull request. + + + Contributing + ============ + + #. `Check for open issues <https://github.com/shazow/urllib3/issues>`_ or open + a fresh issue to start a discussion around a feature idea or a bug. There is + a *Contributor Friendly* tag for issues that should be ideal for people who + are not very familiar with the codebase yet. + #. Fork the `urllib3 repository on Github <https://github.com/shazow/urllib3>`_ + to start making your changes. + #. Write a test which shows that the bug was fixed or that the feature works + as expected. + #. Send a pull request and bug the maintainer until it gets merged and published. + :) Make sure to add yourself to ``CONTRIBUTORS.txt``. + + + Sponsorship + =========== + + If your company benefits from this library, please consider `sponsoring its + development <http://urllib3.readthedocs.org/en/latest/#sponsorship>`_. + + + Changes + ======= + + 1.9.1 (2014-09-13) + ++++++++++++++++++ + + * Apply socket arguments before binding. (Issue #427) + + * More careful checks if fp-like object is closed. (Issue #435) + + * Fixed packaging issues of some development-related files not + getting included. (Issue #440) + + * Allow performing *only* fingerprint verification. (Issue #444) + + * Emit ``SecurityWarning`` if system clock is waaay off. (Issue #445) + + * Fixed PyOpenSSL compatibility with PyPy. (Issue #450) + + * Fixed ``BrokenPipeError`` and ``ConnectionError`` handling in Py3. + (Issue #443) + + + + 1.9 (2014-07-04) + ++++++++++++++++ + + * Shuffled around development-related files. If you're maintaining a distro + package of urllib3, you may need to tweak things. (Issue #415) + + * Unverified HTTPS requests will trigger a warning on the first request. See + our new `security documentation + <https://urllib3.readthedocs.org/en/latest/security.html>`_ for details. + (Issue #426) + + * New retry logic and ``urllib3.util.retry.Retry`` configuration object. + (Issue #326) + + * All raised exceptions should now wrapped in a + ``urllib3.exceptions.HTTPException``-extending exception. (Issue #326) + + * All errors during a retry-enabled request should be wrapped in + ``urllib3.exceptions.MaxRetryError``, including timeout-related exceptions + which were previously exempt. Underlying error is accessible from the + ``.reason`` propery. (Issue #326) + + * ``urllib3.exceptions.ConnectionError`` renamed to + ``urllib3.exceptions.ProtocolError``. (Issue #326) + + * Errors during response read (such as IncompleteRead) are now wrapped in + ``urllib3.exceptions.ProtocolError``. (Issue #418) + + * Requesting an empty host will raise ``urllib3.exceptions.LocationValueError``. + (Issue #417) + + * Catch read timeouts over SSL connections as + ``urllib3.exceptions.ReadTimeoutError``. (Issue #419) + + * Apply socket arguments before connecting. (Issue #427) + + + 1.8.3 (2014-06-23) + ++++++++++++++++++ + + * Fix TLS verification when using a proxy in Python 3.4.1. (Issue #385) + + * Add ``disable_cache`` option to ``urllib3.util.make_headers``. (Issue #393) + + * Wrap ``socket.timeout`` exception with + ``urllib3.exceptions.ReadTimeoutError``. (Issue #399) + + * Fixed proxy-related bug where connections were being reused incorrectly. + (Issues #366, #369) + + * Added ``socket_options`` keyword parameter which allows to define + ``setsockopt`` configuration of new sockets. (Issue #397) + + * Removed ``HTTPConnection.tcp_nodelay`` in favor of + ``HTTPConnection.default_socket_options``. (Issue #397) + + * Fixed ``TypeError`` bug in Python 2.6.4. (Issue #411) + + + 1.8.2 (2014-04-17) + ++++++++++++++++++ + + * Fix ``urllib3.util`` not being included in the package. + + + 1.8.1 (2014-04-17) + ++++++++++++++++++ + + * Fix AppEngine bug of HTTPS requests going out as HTTP. (Issue #356) + + * Don't install ``dummyserver`` into ``site-packages`` as it's only needed + for the test suite. (Issue #362) + + * Added support for specifying ``source_address``. (Issue #352) + + + 1.8 (2014-03-04) + ++++++++++++++++ + + * Improved url parsing in ``urllib3.util.parse_url`` (properly parse '@' in + username, and blank ports like 'hostname:'). + + * New ``urllib3.connection`` module which contains all the HTTPConnection + objects. + + * Several ``urllib3.util.Timeout``-related fixes. Also changed constructor + signature to a more sensible order. [Backwards incompatible] + (Issues #252, #262, #263) + + * Use ``backports.ssl_match_hostname`` if it's installed. (Issue #274) + + * Added ``.tell()`` method to ``urllib3.response.HTTPResponse`` which + returns the number of bytes read so far. (Issue #277) + + * Support for platforms without threading. (Issue #289) + + * Expand default-port comparison in ``HTTPConnectionPool.is_same_host`` + to allow a pool with no specified port to be considered equal to to an + HTTP/HTTPS url with port 80/443 explicitly provided. (Issue #305) + + * Improved default SSL/TLS settings to avoid vulnerabilities. + (Issue #309) + + * Fixed ``urllib3.poolmanager.ProxyManager`` not retrying on connect errors. + (Issue #310) + + * Disable Nagle's Algorithm on the socket for non-proxies. A subset of requests + will send the entire HTTP request ~200 milliseconds faster; however, some of + the resulting TCP packets will be smaller. (Issue #254) + + * Increased maximum number of SubjectAltNames in ``urllib3.contrib.pyopenssl`` + from the default 64 to 1024 in a single certificate. (Issue #318) + + * Headers are now passed and stored as a custom + ``urllib3.collections_.HTTPHeaderDict`` object rather than a plain ``dict``. + (Issue #329, #333) + + * Headers no longer lose their case on Python 3. (Issue #236) + + * ``urllib3.contrib.pyopenssl`` now uses the operating system's default CA + certificates on inject. (Issue #332) + + * Requests with ``retries=False`` will immediately raise any exceptions without + wrapping them in ``MaxRetryError``. (Issue #348) + + * Fixed open socket leak with SSL-related failures. (Issue #344, #348) + + + 1.7.1 (2013-09-25) + ++++++++++++++++++ + + * Added granular timeout support with new ``urllib3.util.Timeout`` class. + (Issue #231) + + * Fixed Python 3.4 support. (Issue #238) + + + 1.7 (2013-08-14) + ++++++++++++++++ + + * More exceptions are now pickle-able, with tests. (Issue #174) + + * Fixed redirecting with relative URLs in Location header. (Issue #178) + + * Support for relative urls in ``Location: ...`` header. (Issue #179) + + * ``urllib3.response.HTTPResponse`` now inherits from ``io.IOBase`` for bonus + file-like functionality. (Issue #187) + + * Passing ``assert_hostname=False`` when creating a HTTPSConnectionPool will + skip hostname verification for SSL connections. (Issue #194) + + * New method ``urllib3.response.HTTPResponse.stream(...)`` which acts as a + generator wrapped around ``.read(...)``. (Issue #198) + + * IPv6 url parsing enforces brackets around the hostname. (Issue #199) + + * Fixed thread race condition in + ``urllib3.poolmanager.PoolManager.connection_from_host(...)`` (Issue #204) + + * ``ProxyManager`` requests now include non-default port in ``Host: ...`` + header. (Issue #217) + + * Added HTTPS proxy support in ``ProxyManager``. (Issue #170 #139) + + * New ``RequestField`` object can be passed to the ``fields=...`` param which + can specify headers. (Issue #220) + + * Raise ``urllib3.exceptions.ProxyError`` when connecting to proxy fails. + (Issue #221) + + * Use international headers when posting file names. (Issue #119) + + * Improved IPv6 support. (Issue #203) + + + 1.6 (2013-04-25) + ++++++++++++++++ + + * Contrib: Optional SNI support for Py2 using PyOpenSSL. (Issue #156) + + * ``ProxyManager`` automatically adds ``Host: ...`` header if not given. + + * Improved SSL-related code. ``cert_req`` now optionally takes a string like + "REQUIRED" or "NONE". Same with ``ssl_version`` takes strings like "SSLv23" + The string values reflect the suffix of the respective constant variable. + (Issue #130) + + * Vendored ``socksipy`` now based on Anorov's fork which handles unexpectedly + closed proxy connections and larger read buffers. (Issue #135) + + * Ensure the connection is closed if no data is received, fixes connection leak + on some platforms. (Issue #133) + + * Added SNI support for SSL/TLS connections on Py32+. (Issue #89) + + * Tests fixed to be compatible with Py26 again. (Issue #125) + + * Added ability to choose SSL version by passing an ``ssl.PROTOCOL_*`` constant + to the ``ssl_version`` parameter of ``HTTPSConnectionPool``. (Issue #109) + + * Allow an explicit content type to be specified when encoding file fields. + (Issue #126) + + * Exceptions are now pickleable, with tests. (Issue #101) + + * Fixed default headers not getting passed in some cases. (Issue #99) + + * Treat "content-encoding" header value as case-insensitive, per RFC 2616 + Section 3.5. (Issue #110) + + * "Connection Refused" SocketErrors will get retried rather than raised. + (Issue #92) + + * Updated vendored ``six``, no longer overrides the global ``six`` module + namespace. (Issue #113) + + * ``urllib3.exceptions.MaxRetryError`` contains a ``reason`` property holding + the exception that prompted the final retry. If ``reason is None`` then it + was due to a redirect. (Issue #92, #114) + + * Fixed ``PoolManager.urlopen()`` from not redirecting more than once. + (Issue #149) + + * Don't assume ``Content-Type: text/plain`` for multi-part encoding parameters + that are not files. (Issue #111) + + * Pass `strict` param down to ``httplib.HTTPConnection``. (Issue #122) + + * Added mechanism to verify SSL certificates by fingerprint (md5, sha1) or + against an arbitrary hostname (when connecting by IP or for misconfigured + servers). (Issue #140) + + * Streaming decompression support. (Issue #159) + + + 1.5 (2012-08-02) + ++++++++++++++++ + + * Added ``urllib3.add_stderr_logger()`` for quickly enabling STDERR debug + logging in urllib3. + + * Native full URL parsing (including auth, path, query, fragment) available in + ``urllib3.util.parse_url(url)``. + + * Built-in redirect will switch method to 'GET' if status code is 303. + (Issue #11) + + * ``urllib3.PoolManager`` strips the scheme and host before sending the request + uri. (Issue #8) + + * New ``urllib3.exceptions.DecodeError`` exception for when automatic decoding, + based on the Content-Type header, fails. + + * Fixed bug with pool depletion and leaking connections (Issue #76). Added + explicit connection closing on pool eviction. Added + ``urllib3.PoolManager.clear()``. + + * 99% -> 100% unit test coverage. + + + 1.4 (2012-06-16) + ++++++++++++++++ + + * Minor AppEngine-related fixes. + + * Switched from ``mimetools.choose_boundary`` to ``uuid.uuid4()``. + + * Improved url parsing. (Issue #73) + + * IPv6 url support. (Issue #72) + + + 1.3 (2012-03-25) + ++++++++++++++++ + + * Removed pre-1.0 deprecated API. + + * Refactored helpers into a ``urllib3.util`` submodule. + + * Fixed multipart encoding to support list-of-tuples for keys with multiple + values. (Issue #48) + + * Fixed multiple Set-Cookie headers in response not getting merged properly in + Python 3. (Issue #53) + + * AppEngine support with Py27. (Issue #61) + + * Minor ``encode_multipart_formdata`` fixes related to Python 3 strings vs + bytes. + + + 1.2.2 (2012-02-06) + ++++++++++++++++++ + + * Fixed packaging bug of not shipping ``test-requirements.txt``. (Issue #47) + + + 1.2.1 (2012-02-05) + ++++++++++++++++++ + + * Fixed another bug related to when ``ssl`` module is not available. (Issue #41) + + * Location parsing errors now raise ``urllib3.exceptions.LocationParseError`` + which inherits from ``ValueError``. + + + 1.2 (2012-01-29) + ++++++++++++++++ + + * Added Python 3 support (tested on 3.2.2) + + * Dropped Python 2.5 support (tested on 2.6.7, 2.7.2) + + * Use ``select.poll`` instead of ``select.select`` for platforms that support + it. + + * Use ``Queue.LifoQueue`` instead of ``Queue.Queue`` for more aggressive + connection reusing. Configurable by overriding ``ConnectionPool.QueueCls``. + + * Fixed ``ImportError`` during install when ``ssl`` module is not available. + (Issue #41) + + * Fixed ``PoolManager`` redirects between schemes (such as HTTP -> HTTPS) not + completing properly. (Issue #28, uncovered by Issue #10 in v1.1) + + * Ported ``dummyserver`` to use ``tornado`` instead of ``webob`` + + ``eventlet``. Removed extraneous unsupported dummyserver testing backends. + Added socket-level tests. + + * More tests. Achievement Unlocked: 99% Coverage. + + + 1.1 (2012-01-07) + ++++++++++++++++ + + * Refactored ``dummyserver`` to its own root namespace module (used for + testing). + + * Added hostname verification for ``VerifiedHTTPSConnection`` by vendoring in + Py32's ``ssl_match_hostname``. (Issue #25) + + * Fixed cross-host HTTP redirects when using ``PoolManager``. (Issue #10) + + * Fixed ``decode_content`` being ignored when set through ``urlopen``. (Issue + #27) + + * Fixed timeout-related bugs. (Issues #17, #23) + + + 1.0.2 (2011-11-04) + ++++++++++++++++++ + + * Fixed typo in ``VerifiedHTTPSConnection`` which would only present as a bug if + you're using the object manually. (Thanks pyos) + + * Made RecentlyUsedContainer (and consequently PoolManager) more thread-safe by + wrapping the access log in a mutex. (Thanks @christer) + + * Made RecentlyUsedContainer more dict-like (corrected ``__delitem__`` and + ``__getitem__`` behaviour), with tests. Shouldn't affect core urllib3 code. + + + 1.0.1 (2011-10-10) + ++++++++++++++++++ + + * Fixed a bug where the same connection would get returned into the pool twice, + causing extraneous "HttpConnectionPool is full" log warnings. + + + 1.0 (2011-10-08) + ++++++++++++++++ + + * Added ``PoolManager`` with LRU expiration of connections (tested and + documented). + * Added ``ProxyManager`` (needs tests, docs, and confirmation that it works + with HTTPS proxies). + * Added optional partial-read support for responses when + ``preload_content=False``. You can now make requests and just read the headers + without loading the content. + * Made response decoding optional (default on, same as before). + * Added optional explicit boundary string for ``encode_multipart_formdata``. + * Convenience request methods are now inherited from ``RequestMethods``. Old + helpers like ``get_url`` and ``post_url`` should be abandoned in favour of + the new ``request(method, url, ...)``. + * Refactored code to be even more decoupled, reusable, and extendable. + * License header added to ``.py`` files. + * Embiggened the documentation: Lots of Sphinx-friendly docstrings in the code + and docs in ``docs/`` and on urllib3.readthedocs.org. + * Embettered all the things! + * Started writing this file. + + + 0.4.1 (2011-07-17) + ++++++++++++++++++ + + * Minor bug fixes, code cleanup. + + + 0.4 (2011-03-01) + ++++++++++++++++ + + * Better unicode support. + * Added ``VerifiedHTTPSConnection``. + * Added ``NTLMConnectionPool`` in contrib. + * Minor improvements. + + + 0.3.1 (2010-07-13) + ++++++++++++++++++ + + * Added ``assert_host_name`` optional parameter. Now compatible with proxies. + + + 0.3 (2009-12-10) + ++++++++++++++++ + + * Added HTTPS support. + * Minor bug fixes. + * Refactored, broken backwards compatibility with 0.2. + * API to be treated as stable from this version forward. + + + 0.2 (2008-11-17) + ++++++++++++++++ + + * Added unit tests. + * Bug fixes. + + + 0.1 (2008-11-16) + ++++++++++++++++ + + * First release. + +Keywords: urllib httplib threadsafe filepost http https ssl pooling +Platform: UNKNOWN +Classifier: Environment :: Web Environment +Classifier: Intended Audience :: Developers +Classifier: License :: OSI Approved :: MIT License +Classifier: Operating System :: OS Independent +Classifier: Programming Language :: Python +Classifier: Programming Language :: Python :: 2 +Classifier: Programming Language :: Python :: 3 +Classifier: Topic :: Internet :: WWW/HTTP +Classifier: Topic :: Software Development :: Libraries diff --git a/urllib3.egg-info/SOURCES.txt b/urllib3.egg-info/SOURCES.txt new file mode 100644 index 0000000..2f0b5fc --- /dev/null +++ b/urllib3.egg-info/SOURCES.txt @@ -0,0 +1,115 @@ +CHANGES.rst +CONTRIBUTORS.txt +LICENSE.txt +MANIFEST.in +Makefile +README.rst +dev-requirements.txt +setup.cfg +setup.py +docs/Makefile +docs/README +docs/collections.rst +docs/conf.py +docs/contrib.rst +docs/doc-requirements.txt +docs/exceptions.rst +docs/helpers.rst +docs/index.rst +docs/make.bat +docs/managers.rst +docs/pools.rst +docs/security.rst +dummyserver/__init__.py +dummyserver/__init__.pyc +dummyserver/handlers.py +dummyserver/handlers.pyc +dummyserver/proxy.py +dummyserver/proxy.pyc +dummyserver/server.py +dummyserver/server.pyc +dummyserver/testcase.py +dummyserver/testcase.pyc +dummyserver/certs/cacert.key +dummyserver/certs/cacert.pem +dummyserver/certs/client.csr +dummyserver/certs/client.key +dummyserver/certs/client.pem +dummyserver/certs/client_bad.pem +dummyserver/certs/server.crt +dummyserver/certs/server.csr +dummyserver/certs/server.key +dummyserver/certs/server.key.org +test/__init__.py +test/__init__.pyc +test/benchmark.py +test/port_helpers.py +test/port_helpers.pyc +test/test_collections.py +test/test_collections.pyc +test/test_compatibility.py +test/test_compatibility.pyc +test/test_connectionpool.py +test/test_connectionpool.pyc +test/test_exceptions.py +test/test_exceptions.pyc +test/test_fields.py +test/test_fields.pyc +test/test_filepost.py +test/test_filepost.pyc +test/test_poolmanager.py +test/test_poolmanager.pyc +test/test_proxymanager.py +test/test_proxymanager.pyc +test/test_response.py +test/test_response.pyc +test/test_retry.py +test/test_retry.pyc +test/test_util.py +test/test_util.pyc +test/contrib/__init__.py +test/contrib/__init__.pyc +test/contrib/test_pyopenssl.py +test/contrib/test_pyopenssl.pyc +test/with_dummyserver/__init__.py +test/with_dummyserver/__init__.pyc +test/with_dummyserver/test_connectionpool.py +test/with_dummyserver/test_connectionpool.pyc +test/with_dummyserver/test_https.py +test/with_dummyserver/test_https.pyc +test/with_dummyserver/test_poolmanager.py +test/with_dummyserver/test_poolmanager.pyc +test/with_dummyserver/test_proxy_poolmanager.py +test/with_dummyserver/test_proxy_poolmanager.pyc +test/with_dummyserver/test_socketlevel.py +test/with_dummyserver/test_socketlevel.pyc +urllib3/__init__.py +urllib3/_collections.py +urllib3/connection.py +urllib3/connectionpool.py +urllib3/exceptions.py +urllib3/fields.py +urllib3/filepost.py +urllib3/poolmanager.py +urllib3/request.py +urllib3/response.py +urllib3.egg-info/PKG-INFO +urllib3.egg-info/SOURCES.txt +urllib3.egg-info/dependency_links.txt +urllib3.egg-info/top_level.txt +urllib3/contrib/__init__.py +urllib3/contrib/ntlmpool.py +urllib3/contrib/pyopenssl.py +urllib3/packages/__init__.py +urllib3/packages/ordered_dict.py +urllib3/packages/six.py +urllib3/packages/ssl_match_hostname/__init__.py +urllib3/packages/ssl_match_hostname/_implementation.py +urllib3/util/__init__.py +urllib3/util/connection.py +urllib3/util/request.py +urllib3/util/response.py +urllib3/util/retry.py +urllib3/util/ssl_.py +urllib3/util/timeout.py +urllib3/util/url.py
\ No newline at end of file diff --git a/urllib3.egg-info/dependency_links.txt b/urllib3.egg-info/dependency_links.txt new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/urllib3.egg-info/dependency_links.txt @@ -0,0 +1 @@ + diff --git a/urllib3.egg-info/top_level.txt b/urllib3.egg-info/top_level.txt new file mode 100644 index 0000000..a42590b --- /dev/null +++ b/urllib3.egg-info/top_level.txt @@ -0,0 +1 @@ +urllib3 diff --git a/urllib3/__init__.py b/urllib3/__init__.py new file mode 100644 index 0000000..3546d13 --- /dev/null +++ b/urllib3/__init__.py @@ -0,0 +1,66 @@ +""" +urllib3 - Thread-safe connection pooling and re-using. +""" + +__author__ = 'Andrey Petrov (andrey.petrov@shazow.net)' +__license__ = 'MIT' +__version__ = '1.9.1' + + +from .connectionpool import ( + HTTPConnectionPool, + HTTPSConnectionPool, + connection_from_url +) + +from . import exceptions +from .filepost import encode_multipart_formdata +from .poolmanager import PoolManager, ProxyManager, proxy_from_url +from .response import HTTPResponse +from .util.request import make_headers +from .util.url import get_host +from .util.timeout import Timeout +from .util.retry import Retry + + +# Set default logging handler to avoid "No handler found" warnings. +import logging +try: # Python 2.7+ + from logging import NullHandler +except ImportError: + class NullHandler(logging.Handler): + def emit(self, record): + pass + +logging.getLogger(__name__).addHandler(NullHandler()) + +def add_stderr_logger(level=logging.DEBUG): + """ + Helper for quickly adding a StreamHandler to the logger. Useful for + debugging. + + Returns the handler after adding it. + """ + # This method needs to be in this __init__.py to get the __name__ correct + # even if urllib3 is vendored within another package. + logger = logging.getLogger(__name__) + handler = logging.StreamHandler() + handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s')) + logger.addHandler(handler) + logger.setLevel(level) + logger.debug('Added a stderr logging handler to logger: %s' % __name__) + return handler + +# ... Clean up. +del NullHandler + + +# Set security warning to only go off once by default. +import warnings +warnings.simplefilter('module', exceptions.SecurityWarning) + +def disable_warnings(category=exceptions.HTTPWarning): + """ + Helper for quickly disabling all urllib3 warnings. + """ + warnings.simplefilter('ignore', category) diff --git a/urllib3/_collections.py b/urllib3/_collections.py new file mode 100644 index 0000000..d77ebb8 --- /dev/null +++ b/urllib3/_collections.py @@ -0,0 +1,199 @@ +from collections import Mapping, MutableMapping +try: + from threading import RLock +except ImportError: # Platform-specific: No threads available + class RLock: + def __enter__(self): + pass + + def __exit__(self, exc_type, exc_value, traceback): + pass + + +try: # Python 2.7+ + from collections import OrderedDict +except ImportError: + from .packages.ordered_dict import OrderedDict +from .packages.six import itervalues + + +__all__ = ['RecentlyUsedContainer', 'HTTPHeaderDict'] + + +_Null = object() + + +class RecentlyUsedContainer(MutableMapping): + """ + Provides a thread-safe dict-like container which maintains up to + ``maxsize`` keys while throwing away the least-recently-used keys beyond + ``maxsize``. + + :param maxsize: + Maximum number of recent elements to retain. + + :param dispose_func: + Every time an item is evicted from the container, + ``dispose_func(value)`` is called. Callback which will get called + """ + + ContainerCls = OrderedDict + + def __init__(self, maxsize=10, dispose_func=None): + self._maxsize = maxsize + self.dispose_func = dispose_func + + self._container = self.ContainerCls() + self.lock = RLock() + + def __getitem__(self, key): + # Re-insert the item, moving it to the end of the eviction line. + with self.lock: + item = self._container.pop(key) + self._container[key] = item + return item + + def __setitem__(self, key, value): + evicted_value = _Null + with self.lock: + # Possibly evict the existing value of 'key' + evicted_value = self._container.get(key, _Null) + self._container[key] = value + + # If we didn't evict an existing value, we might have to evict the + # least recently used item from the beginning of the container. + if len(self._container) > self._maxsize: + _key, evicted_value = self._container.popitem(last=False) + + if self.dispose_func and evicted_value is not _Null: + self.dispose_func(evicted_value) + + def __delitem__(self, key): + with self.lock: + value = self._container.pop(key) + + if self.dispose_func: + self.dispose_func(value) + + def __len__(self): + with self.lock: + return len(self._container) + + def __iter__(self): + raise NotImplementedError('Iteration over this class is unlikely to be threadsafe.') + + def clear(self): + with self.lock: + # Copy pointers to all values, then wipe the mapping + # under Python 2, this copies the list of values twice :-| + values = list(self._container.values()) + self._container.clear() + + if self.dispose_func: + for value in values: + self.dispose_func(value) + + def keys(self): + with self.lock: + return self._container.keys() + + +class HTTPHeaderDict(MutableMapping): + """ + :param headers: + An iterable of field-value pairs. Must not contain multiple field names + when compared case-insensitively. + + :param kwargs: + Additional field-value pairs to pass in to ``dict.update``. + + A ``dict`` like container for storing HTTP Headers. + + Field names are stored and compared case-insensitively in compliance with + RFC 7230. Iteration provides the first case-sensitive key seen for each + case-insensitive pair. + + Using ``__setitem__`` syntax overwrites fields that compare equal + case-insensitively in order to maintain ``dict``'s api. For fields that + compare equal, instead create a new ``HTTPHeaderDict`` and use ``.add`` + in a loop. + + If multiple fields that are equal case-insensitively are passed to the + constructor or ``.update``, the behavior is undefined and some will be + lost. + + >>> headers = HTTPHeaderDict() + >>> headers.add('Set-Cookie', 'foo=bar') + >>> headers.add('set-cookie', 'baz=quxx') + >>> headers['content-length'] = '7' + >>> headers['SET-cookie'] + 'foo=bar, baz=quxx' + >>> headers['Content-Length'] + '7' + + If you want to access the raw headers with their original casing + for debugging purposes you can access the private ``._data`` attribute + which is a normal python ``dict`` that maps the case-insensitive key to a + list of tuples stored as (case-sensitive-original-name, value). Using the + structure from above as our example: + + >>> headers._data + {'set-cookie': [('Set-Cookie', 'foo=bar'), ('set-cookie', 'baz=quxx')], + 'content-length': [('content-length', '7')]} + """ + + def __init__(self, headers=None, **kwargs): + self._data = {} + if headers is None: + headers = {} + self.update(headers, **kwargs) + + def add(self, key, value): + """Adds a (name, value) pair, doesn't overwrite the value if it already + exists. + + >>> headers = HTTPHeaderDict(foo='bar') + >>> headers.add('Foo', 'baz') + >>> headers['foo'] + 'bar, baz' + """ + self._data.setdefault(key.lower(), []).append((key, value)) + + def getlist(self, key): + """Returns a list of all the values for the named field. Returns an + empty list if the key doesn't exist.""" + return self[key].split(', ') if key in self else [] + + def copy(self): + h = HTTPHeaderDict() + for key in self._data: + for rawkey, value in self._data[key]: + h.add(rawkey, value) + return h + + def __eq__(self, other): + if not isinstance(other, Mapping): + return False + other = HTTPHeaderDict(other) + return dict((k1, self[k1]) for k1 in self._data) == \ + dict((k2, other[k2]) for k2 in other._data) + + def __getitem__(self, key): + values = self._data[key.lower()] + return ', '.join(value[1] for value in values) + + def __setitem__(self, key, value): + self._data[key.lower()] = [(key, value)] + + def __delitem__(self, key): + del self._data[key.lower()] + + def __len__(self): + return len(self._data) + + def __iter__(self): + for headers in itervalues(self._data): + yield headers[0][0] + + def __repr__(self): + return '%s(%r)' % (self.__class__.__name__, dict(self.items())) diff --git a/urllib3/connection.py b/urllib3/connection.py new file mode 100644 index 0000000..cebdd86 --- /dev/null +++ b/urllib3/connection.py @@ -0,0 +1,254 @@ +import datetime +import sys +import socket +from socket import timeout as SocketTimeout +import warnings +from .packages import six + +try: # Python 3 + from http.client import HTTPConnection as _HTTPConnection, HTTPException +except ImportError: + from httplib import HTTPConnection as _HTTPConnection, HTTPException + + +class DummyConnection(object): + "Used to detect a failed ConnectionCls import." + pass + + +try: # Compiled with SSL? + HTTPSConnection = DummyConnection + import ssl + BaseSSLError = ssl.SSLError +except (ImportError, AttributeError): # Platform-specific: No SSL. + ssl = None + + class BaseSSLError(BaseException): + pass + + +try: # Python 3: + # Not a no-op, we're adding this to the namespace so it can be imported. + ConnectionError = ConnectionError +except NameError: # Python 2: + class ConnectionError(Exception): + pass + + +from .exceptions import ( + ConnectTimeoutError, + SystemTimeWarning, +) +from .packages.ssl_match_hostname import match_hostname + +from .util.ssl_ import ( + resolve_cert_reqs, + resolve_ssl_version, + ssl_wrap_socket, + assert_fingerprint, +) + + +from .util import connection + +port_by_scheme = { + 'http': 80, + 'https': 443, +} + +RECENT_DATE = datetime.date(2014, 1, 1) + + +class HTTPConnection(_HTTPConnection, object): + """ + Based on httplib.HTTPConnection but provides an extra constructor + backwards-compatibility layer between older and newer Pythons. + + Additional keyword parameters are used to configure attributes of the connection. + Accepted parameters include: + + - ``strict``: See the documentation on :class:`urllib3.connectionpool.HTTPConnectionPool` + - ``source_address``: Set the source address for the current connection. + + .. note:: This is ignored for Python 2.6. It is only applied for 2.7 and 3.x + + - ``socket_options``: Set specific options on the underlying socket. If not specified, then + defaults are loaded from ``HTTPConnection.default_socket_options`` which includes disabling + Nagle's algorithm (sets TCP_NODELAY to 1) unless the connection is behind a proxy. + + For example, if you wish to enable TCP Keep Alive in addition to the defaults, + you might pass:: + + HTTPConnection.default_socket_options + [ + (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1), + ] + + Or you may want to disable the defaults by passing an empty list (e.g., ``[]``). + """ + + default_port = port_by_scheme['http'] + + #: Disable Nagle's algorithm by default. + #: ``[(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)]`` + default_socket_options = [(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)] + + #: Whether this connection verifies the host's certificate. + is_verified = False + + def __init__(self, *args, **kw): + if six.PY3: # Python 3 + kw.pop('strict', None) + + # Pre-set source_address in case we have an older Python like 2.6. + self.source_address = kw.get('source_address') + + if sys.version_info < (2, 7): # Python 2.6 + # _HTTPConnection on Python 2.6 will balk at this keyword arg, but + # not newer versions. We can still use it when creating a + # connection though, so we pop it *after* we have saved it as + # self.source_address. + kw.pop('source_address', None) + + #: The socket options provided by the user. If no options are + #: provided, we use the default options. + self.socket_options = kw.pop('socket_options', self.default_socket_options) + + # Superclass also sets self.source_address in Python 2.7+. + _HTTPConnection.__init__(self, *args, **kw) + + def _new_conn(self): + """ Establish a socket connection and set nodelay settings on it. + + :return: New socket connection. + """ + extra_kw = {} + if self.source_address: + extra_kw['source_address'] = self.source_address + + if self.socket_options: + extra_kw['socket_options'] = self.socket_options + + try: + conn = connection.create_connection( + (self.host, self.port), self.timeout, **extra_kw) + + except SocketTimeout: + raise ConnectTimeoutError( + self, "Connection to %s timed out. (connect timeout=%s)" % + (self.host, self.timeout)) + + return conn + + def _prepare_conn(self, conn): + self.sock = conn + # the _tunnel_host attribute was added in python 2.6.3 (via + # http://hg.python.org/cpython/rev/0f57b30a152f) so pythons 2.6(0-2) do + # not have them. + if getattr(self, '_tunnel_host', None): + # TODO: Fix tunnel so it doesn't depend on self.sock state. + self._tunnel() + # Mark this connection as not reusable + self.auto_open = 0 + + def connect(self): + conn = self._new_conn() + self._prepare_conn(conn) + + +class HTTPSConnection(HTTPConnection): + default_port = port_by_scheme['https'] + + def __init__(self, host, port=None, key_file=None, cert_file=None, + strict=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, **kw): + + HTTPConnection.__init__(self, host, port, strict=strict, + timeout=timeout, **kw) + + self.key_file = key_file + self.cert_file = cert_file + + # Required property for Google AppEngine 1.9.0 which otherwise causes + # HTTPS requests to go out as HTTP. (See Issue #356) + self._protocol = 'https' + + def connect(self): + conn = self._new_conn() + self._prepare_conn(conn) + self.sock = ssl.wrap_socket(conn, self.key_file, self.cert_file) + + +class VerifiedHTTPSConnection(HTTPSConnection): + """ + Based on httplib.HTTPSConnection but wraps the socket with + SSL certification. + """ + cert_reqs = None + ca_certs = None + ssl_version = None + assert_fingerprint = None + + def set_cert(self, key_file=None, cert_file=None, + cert_reqs=None, ca_certs=None, + assert_hostname=None, assert_fingerprint=None): + + self.key_file = key_file + self.cert_file = cert_file + self.cert_reqs = cert_reqs + self.ca_certs = ca_certs + self.assert_hostname = assert_hostname + self.assert_fingerprint = assert_fingerprint + + def connect(self): + # Add certificate verification + conn = self._new_conn() + + resolved_cert_reqs = resolve_cert_reqs(self.cert_reqs) + resolved_ssl_version = resolve_ssl_version(self.ssl_version) + + hostname = self.host + if getattr(self, '_tunnel_host', None): + # _tunnel_host was added in Python 2.6.3 + # (See: http://hg.python.org/cpython/rev/0f57b30a152f) + + self.sock = conn + # Calls self._set_hostport(), so self.host is + # self._tunnel_host below. + self._tunnel() + # Mark this connection as not reusable + self.auto_open = 0 + + # Override the host with the one we're requesting data from. + hostname = self._tunnel_host + + is_time_off = datetime.date.today() < RECENT_DATE + if is_time_off: + warnings.warn(( + 'System time is way off (before {0}). This will probably ' + 'lead to SSL verification errors').format(RECENT_DATE), + SystemTimeWarning + ) + + # Wrap socket using verification with the root certs in + # trusted_root_certs + self.sock = ssl_wrap_socket(conn, self.key_file, self.cert_file, + cert_reqs=resolved_cert_reqs, + ca_certs=self.ca_certs, + server_hostname=hostname, + ssl_version=resolved_ssl_version) + + if self.assert_fingerprint: + assert_fingerprint(self.sock.getpeercert(binary_form=True), + self.assert_fingerprint) + elif resolved_cert_reqs != ssl.CERT_NONE \ + and self.assert_hostname is not False: + match_hostname(self.sock.getpeercert(), + self.assert_hostname or hostname) + + self.is_verified = (resolved_cert_reqs == ssl.CERT_REQUIRED + or self.assert_fingerprint is not None) + + +if ssl: + # Make a copy for testing. + UnverifiedHTTPSConnection = HTTPSConnection + HTTPSConnection = VerifiedHTTPSConnection diff --git a/urllib3/connectionpool.py b/urllib3/connectionpool.py new file mode 100644 index 0000000..ac6e0ca --- /dev/null +++ b/urllib3/connectionpool.py @@ -0,0 +1,757 @@ +import errno +import logging +import sys +import warnings + +from socket import error as SocketError, timeout as SocketTimeout +import socket + +try: # Python 3 + from queue import LifoQueue, Empty, Full +except ImportError: + from Queue import LifoQueue, Empty, Full + import Queue as _ # Platform-specific: Windows + + +from .exceptions import ( + ClosedPoolError, + ProtocolError, + EmptyPoolError, + HostChangedError, + LocationValueError, + MaxRetryError, + ProxyError, + ReadTimeoutError, + SSLError, + TimeoutError, + InsecureRequestWarning, +) +from .packages.ssl_match_hostname import CertificateError +from .packages import six +from .connection import ( + port_by_scheme, + DummyConnection, + HTTPConnection, HTTPSConnection, VerifiedHTTPSConnection, + HTTPException, BaseSSLError, ConnectionError +) +from .request import RequestMethods +from .response import HTTPResponse + +from .util.connection import is_connection_dropped +from .util.retry import Retry +from .util.timeout import Timeout +from .util.url import get_host + + +xrange = six.moves.xrange + +log = logging.getLogger(__name__) + +_Default = object() + + +## Pool objects +class ConnectionPool(object): + """ + Base class for all connection pools, such as + :class:`.HTTPConnectionPool` and :class:`.HTTPSConnectionPool`. + """ + + scheme = None + QueueCls = LifoQueue + + def __init__(self, host, port=None): + if not host: + raise LocationValueError("No host specified.") + + # httplib doesn't like it when we include brackets in ipv6 addresses + self.host = host.strip('[]') + self.port = port + + def __str__(self): + return '%s(host=%r, port=%r)' % (type(self).__name__, + self.host, self.port) + +# This is taken from http://hg.python.org/cpython/file/7aaba721ebc0/Lib/socket.py#l252 +_blocking_errnos = set([errno.EAGAIN, errno.EWOULDBLOCK]) + + +class HTTPConnectionPool(ConnectionPool, RequestMethods): + """ + Thread-safe connection pool for one host. + + :param host: + Host used for this HTTP Connection (e.g. "localhost"), passed into + :class:`httplib.HTTPConnection`. + + :param port: + Port used for this HTTP Connection (None is equivalent to 80), passed + into :class:`httplib.HTTPConnection`. + + :param strict: + Causes BadStatusLine to be raised if the status line can't be parsed + as a valid HTTP/1.0 or 1.1 status line, passed into + :class:`httplib.HTTPConnection`. + + .. note:: + Only works in Python 2. This parameter is ignored in Python 3. + + :param timeout: + Socket timeout in seconds for each individual connection. This can + be a float or integer, which sets the timeout for the HTTP request, + or an instance of :class:`urllib3.util.Timeout` which gives you more + fine-grained control over request timeouts. After the constructor has + been parsed, this is always a `urllib3.util.Timeout` object. + + :param maxsize: + Number of connections to save that can be reused. More than 1 is useful + in multithreaded situations. If ``block`` is set to false, more + connections will be created but they will not be saved once they've + been used. + + :param block: + If set to True, no more than ``maxsize`` connections will be used at + a time. When no free connections are available, the call will block + until a connection has been released. This is a useful side effect for + particular multithreaded situations where one does not want to use more + than maxsize connections per host to prevent flooding. + + :param headers: + Headers to include with all requests, unless other headers are given + explicitly. + + :param retries: + Retry configuration to use by default with requests in this pool. + + :param _proxy: + Parsed proxy URL, should not be used directly, instead, see + :class:`urllib3.connectionpool.ProxyManager`" + + :param _proxy_headers: + A dictionary with proxy headers, should not be used directly, + instead, see :class:`urllib3.connectionpool.ProxyManager`" + + :param \**conn_kw: + Additional parameters are used to create fresh :class:`urllib3.connection.HTTPConnection`, + :class:`urllib3.connection.HTTPSConnection` instances. + """ + + scheme = 'http' + ConnectionCls = HTTPConnection + + def __init__(self, host, port=None, strict=False, + timeout=Timeout.DEFAULT_TIMEOUT, maxsize=1, block=False, + headers=None, retries=None, + _proxy=None, _proxy_headers=None, + **conn_kw): + ConnectionPool.__init__(self, host, port) + RequestMethods.__init__(self, headers) + + self.strict = strict + + if not isinstance(timeout, Timeout): + timeout = Timeout.from_float(timeout) + + if retries is None: + retries = Retry.DEFAULT + + self.timeout = timeout + self.retries = retries + + self.pool = self.QueueCls(maxsize) + self.block = block + + self.proxy = _proxy + self.proxy_headers = _proxy_headers or {} + + # Fill the queue up so that doing get() on it will block properly + for _ in xrange(maxsize): + self.pool.put(None) + + # These are mostly for testing and debugging purposes. + self.num_connections = 0 + self.num_requests = 0 + self.conn_kw = conn_kw + + if self.proxy: + # Enable Nagle's algorithm for proxies, to avoid packet fragmentation. + # We cannot know if the user has added default socket options, so we cannot replace the + # list. + self.conn_kw.setdefault('socket_options', []) + + def _new_conn(self): + """ + Return a fresh :class:`HTTPConnection`. + """ + self.num_connections += 1 + log.info("Starting new HTTP connection (%d): %s" % + (self.num_connections, self.host)) + + conn = self.ConnectionCls(host=self.host, port=self.port, + timeout=self.timeout.connect_timeout, + strict=self.strict, **self.conn_kw) + return conn + + def _get_conn(self, timeout=None): + """ + Get a connection. Will return a pooled connection if one is available. + + If no connections are available and :prop:`.block` is ``False``, then a + fresh connection is returned. + + :param timeout: + Seconds to wait before giving up and raising + :class:`urllib3.exceptions.EmptyPoolError` if the pool is empty and + :prop:`.block` is ``True``. + """ + conn = None + try: + conn = self.pool.get(block=self.block, timeout=timeout) + + except AttributeError: # self.pool is None + raise ClosedPoolError(self, "Pool is closed.") + + except Empty: + if self.block: + raise EmptyPoolError(self, + "Pool reached maximum size and no more " + "connections are allowed.") + pass # Oh well, we'll create a new connection then + + # If this is a persistent connection, check if it got disconnected + if conn and is_connection_dropped(conn): + log.info("Resetting dropped connection: %s" % self.host) + conn.close() + if getattr(conn, 'auto_open', 1) == 0: + # This is a proxied connection that has been mutated by + # httplib._tunnel() and cannot be reused (since it would + # attempt to bypass the proxy) + conn = None + + return conn or self._new_conn() + + def _put_conn(self, conn): + """ + Put a connection back into the pool. + + :param conn: + Connection object for the current host and port as returned by + :meth:`._new_conn` or :meth:`._get_conn`. + + If the pool is already full, the connection is closed and discarded + because we exceeded maxsize. If connections are discarded frequently, + then maxsize should be increased. + + If the pool is closed, then the connection will be closed and discarded. + """ + try: + self.pool.put(conn, block=False) + return # Everything is dandy, done. + except AttributeError: + # self.pool is None. + pass + except Full: + # This should never happen if self.block == True + log.warning( + "Connection pool is full, discarding connection: %s" % + self.host) + + # Connection never got put back into the pool, close it. + if conn: + conn.close() + + def _validate_conn(self, conn): + """ + Called right before a request is made, after the socket is created. + """ + pass + + def _get_timeout(self, timeout): + """ Helper that always returns a :class:`urllib3.util.Timeout` """ + if timeout is _Default: + return self.timeout.clone() + + if isinstance(timeout, Timeout): + return timeout.clone() + else: + # User passed us an int/float. This is for backwards compatibility, + # can be removed later + return Timeout.from_float(timeout) + + def _make_request(self, conn, method, url, timeout=_Default, + **httplib_request_kw): + """ + Perform a request on a given urllib connection object taken from our + pool. + + :param conn: + a connection from one of our connection pools + + :param timeout: + Socket timeout in seconds for the request. This can be a + float or integer, which will set the same timeout value for + the socket connect and the socket read, or an instance of + :class:`urllib3.util.Timeout`, which gives you more fine-grained + control over your timeouts. + """ + self.num_requests += 1 + + timeout_obj = self._get_timeout(timeout) + timeout_obj.start_connect() + conn.timeout = timeout_obj.connect_timeout + + # Trigger any extra validation we need to do. + self._validate_conn(conn) + + # conn.request() calls httplib.*.request, not the method in + # urllib3.request. It also calls makefile (recv) on the socket. + conn.request(method, url, **httplib_request_kw) + + # Reset the timeout for the recv() on the socket + read_timeout = timeout_obj.read_timeout + + # App Engine doesn't have a sock attr + if getattr(conn, 'sock', None): + # In Python 3 socket.py will catch EAGAIN and return None when you + # try and read into the file pointer created by http.client, which + # instead raises a BadStatusLine exception. Instead of catching + # the exception and assuming all BadStatusLine exceptions are read + # timeouts, check for a zero timeout before making the request. + if read_timeout == 0: + raise ReadTimeoutError( + self, url, "Read timed out. (read timeout=%s)" % read_timeout) + if read_timeout is Timeout.DEFAULT_TIMEOUT: + conn.sock.settimeout(socket.getdefaulttimeout()) + else: # None or a value + conn.sock.settimeout(read_timeout) + + # Receive the response from the server + try: + try: # Python 2.7+, use buffering of HTTP responses + httplib_response = conn.getresponse(buffering=True) + except TypeError: # Python 2.6 and older + httplib_response = conn.getresponse() + except SocketTimeout: + raise ReadTimeoutError( + self, url, "Read timed out. (read timeout=%s)" % read_timeout) + + except BaseSSLError as e: + # Catch possible read timeouts thrown as SSL errors. If not the + # case, rethrow the original. We need to do this because of: + # http://bugs.python.org/issue10272 + if 'timed out' in str(e) or \ + 'did not complete (read)' in str(e): # Python 2.6 + raise ReadTimeoutError( + self, url, "Read timed out. (read timeout=%s)" % read_timeout) + + raise + + except SocketError as e: # Platform-specific: Python 2 + # See the above comment about EAGAIN in Python 3. In Python 2 we + # have to specifically catch it and throw the timeout error + if e.errno in _blocking_errnos: + raise ReadTimeoutError( + self, url, "Read timed out. (read timeout=%s)" % read_timeout) + + raise + + # AppEngine doesn't have a version attr. + http_version = getattr(conn, '_http_vsn_str', 'HTTP/?') + log.debug("\"%s %s %s\" %s %s" % (method, url, http_version, + httplib_response.status, + httplib_response.length)) + return httplib_response + + def close(self): + """ + Close all pooled connections and disable the pool. + """ + # Disable access to the pool + old_pool, self.pool = self.pool, None + + try: + while True: + conn = old_pool.get(block=False) + if conn: + conn.close() + + except Empty: + pass # Done. + + def is_same_host(self, url): + """ + Check if the given ``url`` is a member of the same host as this + connection pool. + """ + if url.startswith('/'): + return True + + # TODO: Add optional support for socket.gethostbyname checking. + scheme, host, port = get_host(url) + + # Use explicit default port for comparison when none is given + if self.port and not port: + port = port_by_scheme.get(scheme) + elif not self.port and port == port_by_scheme.get(scheme): + port = None + + return (scheme, host, port) == (self.scheme, self.host, self.port) + + def urlopen(self, method, url, body=None, headers=None, retries=None, + redirect=True, assert_same_host=True, timeout=_Default, + pool_timeout=None, release_conn=None, **response_kw): + """ + Get a connection from the pool and perform an HTTP request. This is the + lowest level call for making a request, so you'll need to specify all + the raw details. + + .. note:: + + More commonly, it's appropriate to use a convenience method provided + by :class:`.RequestMethods`, such as :meth:`request`. + + .. note:: + + `release_conn` will only behave as expected if + `preload_content=False` because we want to make + `preload_content=False` the default behaviour someday soon without + breaking backwards compatibility. + + :param method: + HTTP request method (such as GET, POST, PUT, etc.) + + :param body: + Data to send in the request body (useful for creating + POST requests, see HTTPConnectionPool.post_url for + more convenience). + + :param headers: + Dictionary of custom headers to send, such as User-Agent, + If-None-Match, etc. If None, pool headers are used. If provided, + these headers completely replace any pool-specific headers. + + :param retries: + Configure the number of retries to allow before raising a + :class:`~urllib3.exceptions.MaxRetryError` exception. + + Pass ``None`` to retry until you receive a response. Pass a + :class:`~urllib3.util.retry.Retry` object for fine-grained control + over different types of retries. + Pass an integer number to retry connection errors that many times, + but no other types of errors. Pass zero to never retry. + + If ``False``, then retries are disabled and any exception is raised + immediately. Also, instead of raising a MaxRetryError on redirects, + the redirect response will be returned. + + :type retries: :class:`~urllib3.util.retry.Retry`, False, or an int. + + :param redirect: + If True, automatically handle redirects (status codes 301, 302, + 303, 307, 308). Each redirect counts as a retry. Disabling retries + will disable redirect, too. + + :param assert_same_host: + If ``True``, will make sure that the host of the pool requests is + consistent else will raise HostChangedError. When False, you can + use the pool on an HTTP proxy and request foreign hosts. + + :param timeout: + If specified, overrides the default timeout for this one + request. It may be a float (in seconds) or an instance of + :class:`urllib3.util.Timeout`. + + :param pool_timeout: + If set and the pool is set to block=True, then this method will + block for ``pool_timeout`` seconds and raise EmptyPoolError if no + connection is available within the time period. + + :param release_conn: + If False, then the urlopen call will not release the connection + back into the pool once a response is received (but will release if + you read the entire contents of the response such as when + `preload_content=True`). This is useful if you're not preloading + the response's content immediately. You will need to call + ``r.release_conn()`` on the response ``r`` to return the connection + back into the pool. If None, it takes the value of + ``response_kw.get('preload_content', True)``. + + :param \**response_kw: + Additional parameters are passed to + :meth:`urllib3.response.HTTPResponse.from_httplib` + """ + if headers is None: + headers = self.headers + + if not isinstance(retries, Retry): + retries = Retry.from_int(retries, redirect=redirect, default=self.retries) + + if release_conn is None: + release_conn = response_kw.get('preload_content', True) + + # Check host + if assert_same_host and not self.is_same_host(url): + raise HostChangedError(self, url, retries) + + conn = None + + # Merge the proxy headers. Only do this in HTTP. We have to copy the + # headers dict so we can safely change it without those changes being + # reflected in anyone else's copy. + if self.scheme == 'http': + headers = headers.copy() + headers.update(self.proxy_headers) + + # Must keep the exception bound to a separate variable or else Python 3 + # complains about UnboundLocalError. + err = None + + try: + # Request a connection from the queue. + conn = self._get_conn(timeout=pool_timeout) + + # Make the request on the httplib connection object. + httplib_response = self._make_request(conn, method, url, + timeout=timeout, + body=body, headers=headers) + + # If we're going to release the connection in ``finally:``, then + # the request doesn't need to know about the connection. Otherwise + # it will also try to release it and we'll have a double-release + # mess. + response_conn = not release_conn and conn + + # Import httplib's response into our own wrapper object + response = HTTPResponse.from_httplib(httplib_response, + pool=self, + connection=response_conn, + **response_kw) + + # else: + # The connection will be put back into the pool when + # ``response.release_conn()`` is called (implicitly by + # ``response.read()``) + + except Empty: + # Timed out by queue. + raise EmptyPoolError(self, "No pool connections are available.") + + except (BaseSSLError, CertificateError) as e: + # Release connection unconditionally because there is no way to + # close it externally in case of exception. + release_conn = True + raise SSLError(e) + + except (TimeoutError, HTTPException, SocketError, ConnectionError) as e: + if conn: + # Discard the connection for these exceptions. It will be + # be replaced during the next _get_conn() call. + conn.close() + conn = None + + stacktrace = sys.exc_info()[2] + if isinstance(e, SocketError) and self.proxy: + e = ProxyError('Cannot connect to proxy.', e) + elif isinstance(e, (SocketError, HTTPException)): + e = ProtocolError('Connection aborted.', e) + + retries = retries.increment(method, url, error=e, + _pool=self, _stacktrace=stacktrace) + retries.sleep() + + # Keep track of the error for the retry warning. + err = e + + finally: + if release_conn: + # Put the connection back to be reused. If the connection is + # expired then it will be None, which will get replaced with a + # fresh connection during _get_conn. + self._put_conn(conn) + + if not conn: + # Try again + log.warning("Retrying (%r) after connection " + "broken by '%r': %s" % (retries, err, url)) + return self.urlopen(method, url, body, headers, retries, + redirect, assert_same_host, + timeout=timeout, pool_timeout=pool_timeout, + release_conn=release_conn, **response_kw) + + # Handle redirect? + redirect_location = redirect and response.get_redirect_location() + if redirect_location: + if response.status == 303: + method = 'GET' + + try: + retries = retries.increment(method, url, response=response, _pool=self) + except MaxRetryError: + if retries.raise_on_redirect: + raise + return response + + log.info("Redirecting %s -> %s" % (url, redirect_location)) + return self.urlopen(method, redirect_location, body, headers, + retries=retries, redirect=redirect, + assert_same_host=assert_same_host, + timeout=timeout, pool_timeout=pool_timeout, + release_conn=release_conn, **response_kw) + + # Check if we should retry the HTTP response. + if retries.is_forced_retry(method, status_code=response.status): + retries = retries.increment(method, url, response=response, _pool=self) + retries.sleep() + log.info("Forced retry: %s" % url) + return self.urlopen(method, url, body, headers, + retries=retries, redirect=redirect, + assert_same_host=assert_same_host, + timeout=timeout, pool_timeout=pool_timeout, + release_conn=release_conn, **response_kw) + + return response + + +class HTTPSConnectionPool(HTTPConnectionPool): + """ + Same as :class:`.HTTPConnectionPool`, but HTTPS. + + When Python is compiled with the :mod:`ssl` module, then + :class:`.VerifiedHTTPSConnection` is used, which *can* verify certificates, + instead of :class:`.HTTPSConnection`. + + :class:`.VerifiedHTTPSConnection` uses one of ``assert_fingerprint``, + ``assert_hostname`` and ``host`` in this order to verify connections. + If ``assert_hostname`` is False, no verification is done. + + The ``key_file``, ``cert_file``, ``cert_reqs``, ``ca_certs`` and + ``ssl_version`` are only used if :mod:`ssl` is available and are fed into + :meth:`urllib3.util.ssl_wrap_socket` to upgrade the connection socket + into an SSL socket. + """ + + scheme = 'https' + ConnectionCls = HTTPSConnection + + def __init__(self, host, port=None, + strict=False, timeout=Timeout.DEFAULT_TIMEOUT, maxsize=1, + block=False, headers=None, retries=None, + _proxy=None, _proxy_headers=None, + key_file=None, cert_file=None, cert_reqs=None, + ca_certs=None, ssl_version=None, + assert_hostname=None, assert_fingerprint=None, + **conn_kw): + + HTTPConnectionPool.__init__(self, host, port, strict, timeout, maxsize, + block, headers, retries, _proxy, _proxy_headers, + **conn_kw) + self.key_file = key_file + self.cert_file = cert_file + self.cert_reqs = cert_reqs + self.ca_certs = ca_certs + self.ssl_version = ssl_version + self.assert_hostname = assert_hostname + self.assert_fingerprint = assert_fingerprint + + def _prepare_conn(self, conn): + """ + Prepare the ``connection`` for :meth:`urllib3.util.ssl_wrap_socket` + and establish the tunnel if proxy is used. + """ + + if isinstance(conn, VerifiedHTTPSConnection): + conn.set_cert(key_file=self.key_file, + cert_file=self.cert_file, + cert_reqs=self.cert_reqs, + ca_certs=self.ca_certs, + assert_hostname=self.assert_hostname, + assert_fingerprint=self.assert_fingerprint) + conn.ssl_version = self.ssl_version + + if self.proxy is not None: + # Python 2.7+ + try: + set_tunnel = conn.set_tunnel + except AttributeError: # Platform-specific: Python 2.6 + set_tunnel = conn._set_tunnel + + if sys.version_info <= (2, 6, 4) and not self.proxy_headers: # Python 2.6.4 and older + set_tunnel(self.host, self.port) + else: + set_tunnel(self.host, self.port, self.proxy_headers) + + # Establish tunnel connection early, because otherwise httplib + # would improperly set Host: header to proxy's IP:port. + conn.connect() + + return conn + + def _new_conn(self): + """ + Return a fresh :class:`httplib.HTTPSConnection`. + """ + self.num_connections += 1 + log.info("Starting new HTTPS connection (%d): %s" + % (self.num_connections, self.host)) + + if not self.ConnectionCls or self.ConnectionCls is DummyConnection: + # Platform-specific: Python without ssl + raise SSLError("Can't connect to HTTPS URL because the SSL " + "module is not available.") + + actual_host = self.host + actual_port = self.port + if self.proxy is not None: + actual_host = self.proxy.host + actual_port = self.proxy.port + + conn = self.ConnectionCls(host=actual_host, port=actual_port, + timeout=self.timeout.connect_timeout, + strict=self.strict, **self.conn_kw) + + return self._prepare_conn(conn) + + def _validate_conn(self, conn): + """ + Called right before a request is made, after the socket is created. + """ + super(HTTPSConnectionPool, self)._validate_conn(conn) + + # Force connect early to allow us to validate the connection. + if not getattr(conn, 'sock', None): # AppEngine might not have `.sock` + conn.connect() + + if not conn.is_verified: + warnings.warn(( + 'Unverified HTTPS request is being made. ' + 'Adding certificate verification is strongly advised. See: ' + 'https://urllib3.readthedocs.org/en/latest/security.html ' + '(This warning will only appear once by default.)'), + InsecureRequestWarning) + + +def connection_from_url(url, **kw): + """ + Given a url, return an :class:`.ConnectionPool` instance of its host. + + This is a shortcut for not having to parse out the scheme, host, and port + of the url before creating an :class:`.ConnectionPool` instance. + + :param url: + Absolute URL string that must include the scheme. Port is optional. + + :param \**kw: + Passes additional parameters to the constructor of the appropriate + :class:`.ConnectionPool`. Useful for specifying things like + timeout, maxsize, headers, etc. + + Example:: + + >>> conn = connection_from_url('http://google.com/') + >>> r = conn.request('GET', '/') + """ + scheme, host, port = get_host(url) + if scheme == 'https': + return HTTPSConnectionPool(host, port=port, **kw) + else: + return HTTPConnectionPool(host, port=port, **kw) diff --git a/urllib3/contrib/__init__.py b/urllib3/contrib/__init__.py new file mode 100644 index 0000000..e69de29 --- /dev/null +++ b/urllib3/contrib/__init__.py diff --git a/urllib3/contrib/ntlmpool.py b/urllib3/contrib/ntlmpool.py new file mode 100644 index 0000000..c6b266f --- /dev/null +++ b/urllib3/contrib/ntlmpool.py @@ -0,0 +1,114 @@ +""" +NTLM authenticating pool, contributed by erikcederstran + +Issue #10, see: http://code.google.com/p/urllib3/issues/detail?id=10 +""" + +try: + from http.client import HTTPSConnection +except ImportError: + from httplib import HTTPSConnection +from logging import getLogger +from ntlm import ntlm + +from urllib3 import HTTPSConnectionPool + + +log = getLogger(__name__) + + +class NTLMConnectionPool(HTTPSConnectionPool): + """ + Implements an NTLM authentication version of an urllib3 connection pool + """ + + scheme = 'https' + + def __init__(self, user, pw, authurl, *args, **kwargs): + """ + authurl is a random URL on the server that is protected by NTLM. + user is the Windows user, probably in the DOMAIN\\username format. + pw is the password for the user. + """ + super(NTLMConnectionPool, self).__init__(*args, **kwargs) + self.authurl = authurl + self.rawuser = user + user_parts = user.split('\\', 1) + self.domain = user_parts[0].upper() + self.user = user_parts[1] + self.pw = pw + + def _new_conn(self): + # Performs the NTLM handshake that secures the connection. The socket + # must be kept open while requests are performed. + self.num_connections += 1 + log.debug('Starting NTLM HTTPS connection no. %d: https://%s%s' % + (self.num_connections, self.host, self.authurl)) + + headers = {} + headers['Connection'] = 'Keep-Alive' + req_header = 'Authorization' + resp_header = 'www-authenticate' + + conn = HTTPSConnection(host=self.host, port=self.port) + + # Send negotiation message + headers[req_header] = ( + 'NTLM %s' % ntlm.create_NTLM_NEGOTIATE_MESSAGE(self.rawuser)) + log.debug('Request headers: %s' % headers) + conn.request('GET', self.authurl, None, headers) + res = conn.getresponse() + reshdr = dict(res.getheaders()) + log.debug('Response status: %s %s' % (res.status, res.reason)) + log.debug('Response headers: %s' % reshdr) + log.debug('Response data: %s [...]' % res.read(100)) + + # Remove the reference to the socket, so that it can not be closed by + # the response object (we want to keep the socket open) + res.fp = None + + # Server should respond with a challenge message + auth_header_values = reshdr[resp_header].split(', ') + auth_header_value = None + for s in auth_header_values: + if s[:5] == 'NTLM ': + auth_header_value = s[5:] + if auth_header_value is None: + raise Exception('Unexpected %s response header: %s' % + (resp_header, reshdr[resp_header])) + + # Send authentication message + ServerChallenge, NegotiateFlags = \ + ntlm.parse_NTLM_CHALLENGE_MESSAGE(auth_header_value) + auth_msg = ntlm.create_NTLM_AUTHENTICATE_MESSAGE(ServerChallenge, + self.user, + self.domain, + self.pw, + NegotiateFlags) + headers[req_header] = 'NTLM %s' % auth_msg + log.debug('Request headers: %s' % headers) + conn.request('GET', self.authurl, None, headers) + res = conn.getresponse() + log.debug('Response status: %s %s' % (res.status, res.reason)) + log.debug('Response headers: %s' % dict(res.getheaders())) + log.debug('Response data: %s [...]' % res.read()[:100]) + if res.status != 200: + if res.status == 401: + raise Exception('Server rejected request: wrong ' + 'username or password') + raise Exception('Wrong server response: %s %s' % + (res.status, res.reason)) + + res.fp = None + log.debug('Connection established') + return conn + + def urlopen(self, method, url, body=None, headers=None, retries=3, + redirect=True, assert_same_host=True): + if headers is None: + headers = {} + headers['Connection'] = 'Keep-Alive' + return super(NTLMConnectionPool, self).urlopen(method, url, body, + headers, retries, + redirect, + assert_same_host) diff --git a/urllib3/contrib/pyopenssl.py b/urllib3/contrib/pyopenssl.py new file mode 100644 index 0000000..8475eeb --- /dev/null +++ b/urllib3/contrib/pyopenssl.py @@ -0,0 +1,284 @@ +'''SSL with SNI_-support for Python 2. Follow these instructions if you would +like to verify SSL certificates in Python 2. Note, the default libraries do +*not* do certificate checking; you need to do additional work to validate +certificates yourself. + +This needs the following packages installed: + +* pyOpenSSL (tested with 0.13) +* ndg-httpsclient (tested with 0.3.2) +* pyasn1 (tested with 0.1.6) + +You can install them with the following command: + + pip install pyopenssl ndg-httpsclient pyasn1 + +To activate certificate checking, call +:func:`~urllib3.contrib.pyopenssl.inject_into_urllib3` from your Python code +before you begin making HTTP requests. This can be done in a ``sitecustomize`` +module, or at any other time before your application begins using ``urllib3``, +like this:: + + try: + import urllib3.contrib.pyopenssl + urllib3.contrib.pyopenssl.inject_into_urllib3() + except ImportError: + pass + +Now you can use :mod:`urllib3` as you normally would, and it will support SNI +when the required modules are installed. + +Activating this module also has the positive side effect of disabling SSL/TLS +compression in Python 2 (see `CRIME attack`_). + +If you want to configure the default list of supported cipher suites, you can +set the ``urllib3.contrib.pyopenssl.DEFAULT_SSL_CIPHER_LIST`` variable. + +Module Variables +---------------- + +:var DEFAULT_SSL_CIPHER_LIST: The list of supported SSL/TLS cipher suites. + Default: ``ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES: + ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS`` + +.. _sni: https://en.wikipedia.org/wiki/Server_Name_Indication +.. _crime attack: https://en.wikipedia.org/wiki/CRIME_(security_exploit) + +''' + +try: + from ndg.httpsclient.ssl_peer_verification import SUBJ_ALT_NAME_SUPPORT + from ndg.httpsclient.subj_alt_name import SubjectAltName as BaseSubjectAltName +except SyntaxError as e: + raise ImportError(e) + +import OpenSSL.SSL +from pyasn1.codec.der import decoder as der_decoder +from pyasn1.type import univ, constraint +from socket import _fileobject, timeout +import ssl +import select + +from .. import connection +from .. import util + +__all__ = ['inject_into_urllib3', 'extract_from_urllib3'] + +# SNI only *really* works if we can read the subjectAltName of certificates. +HAS_SNI = SUBJ_ALT_NAME_SUPPORT + +# Map from urllib3 to PyOpenSSL compatible parameter-values. +_openssl_versions = { + ssl.PROTOCOL_SSLv23: OpenSSL.SSL.SSLv23_METHOD, + ssl.PROTOCOL_SSLv3: OpenSSL.SSL.SSLv3_METHOD, + ssl.PROTOCOL_TLSv1: OpenSSL.SSL.TLSv1_METHOD, +} +_openssl_verify = { + ssl.CERT_NONE: OpenSSL.SSL.VERIFY_NONE, + ssl.CERT_OPTIONAL: OpenSSL.SSL.VERIFY_PEER, + ssl.CERT_REQUIRED: OpenSSL.SSL.VERIFY_PEER + + OpenSSL.SSL.VERIFY_FAIL_IF_NO_PEER_CERT, +} + +# A secure default. +# Sources for more information on TLS ciphers: +# +# - https://wiki.mozilla.org/Security/Server_Side_TLS +# - https://www.ssllabs.com/projects/best-practices/index.html +# - https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/ +# +# The general intent is: +# - Prefer cipher suites that offer perfect forward secrecy (DHE/ECDHE), +# - prefer ECDHE over DHE for better performance, +# - prefer any AES-GCM over any AES-CBC for better performance and security, +# - use 3DES as fallback which is secure but slow, +# - disable NULL authentication, MD5 MACs and DSS for security reasons. +DEFAULT_SSL_CIPHER_LIST = "ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:" + \ + "ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:" + \ + "!aNULL:!MD5:!DSS" + + +orig_util_HAS_SNI = util.HAS_SNI +orig_connection_ssl_wrap_socket = connection.ssl_wrap_socket + + +def inject_into_urllib3(): + 'Monkey-patch urllib3 with PyOpenSSL-backed SSL-support.' + + connection.ssl_wrap_socket = ssl_wrap_socket + util.HAS_SNI = HAS_SNI + + +def extract_from_urllib3(): + 'Undo monkey-patching by :func:`inject_into_urllib3`.' + + connection.ssl_wrap_socket = orig_connection_ssl_wrap_socket + util.HAS_SNI = orig_util_HAS_SNI + + +### Note: This is a slightly bug-fixed version of same from ndg-httpsclient. +class SubjectAltName(BaseSubjectAltName): + '''ASN.1 implementation for subjectAltNames support''' + + # There is no limit to how many SAN certificates a certificate may have, + # however this needs to have some limit so we'll set an arbitrarily high + # limit. + sizeSpec = univ.SequenceOf.sizeSpec + \ + constraint.ValueSizeConstraint(1, 1024) + + +### Note: This is a slightly bug-fixed version of same from ndg-httpsclient. +def get_subj_alt_name(peer_cert): + # Search through extensions + dns_name = [] + if not SUBJ_ALT_NAME_SUPPORT: + return dns_name + + general_names = SubjectAltName() + for i in range(peer_cert.get_extension_count()): + ext = peer_cert.get_extension(i) + ext_name = ext.get_short_name() + if ext_name != 'subjectAltName': + continue + + # PyOpenSSL returns extension data in ASN.1 encoded form + ext_dat = ext.get_data() + decoded_dat = der_decoder.decode(ext_dat, + asn1Spec=general_names) + + for name in decoded_dat: + if not isinstance(name, SubjectAltName): + continue + for entry in range(len(name)): + component = name.getComponentByPosition(entry) + if component.getName() != 'dNSName': + continue + dns_name.append(str(component.getComponent())) + + return dns_name + + +class WrappedSocket(object): + '''API-compatibility wrapper for Python OpenSSL's Connection-class. + + Note: _makefile_refs, _drop() and _reuse() are needed for the garbage + collector of pypy. + ''' + + def __init__(self, connection, socket, suppress_ragged_eofs=True): + self.connection = connection + self.socket = socket + self.suppress_ragged_eofs = suppress_ragged_eofs + self._makefile_refs = 0 + + def fileno(self): + return self.socket.fileno() + + def makefile(self, mode, bufsize=-1): + self._makefile_refs += 1 + return _fileobject(self, mode, bufsize, close=True) + + def recv(self, *args, **kwargs): + try: + data = self.connection.recv(*args, **kwargs) + except OpenSSL.SSL.SysCallError as e: + if self.suppress_ragged_eofs and e.args == (-1, 'Unexpected EOF'): + return b'' + else: + raise + except OpenSSL.SSL.WantReadError: + rd, wd, ed = select.select( + [self.socket], [], [], self.socket.gettimeout()) + if not rd: + raise timeout('The read operation timed out') + else: + return self.recv(*args, **kwargs) + else: + return data + + def settimeout(self, timeout): + return self.socket.settimeout(timeout) + + def sendall(self, data): + return self.connection.sendall(data) + + def close(self): + if self._makefile_refs < 1: + return self.connection.shutdown() + else: + self._makefile_refs -= 1 + + def getpeercert(self, binary_form=False): + x509 = self.connection.get_peer_certificate() + + if not x509: + return x509 + + if binary_form: + return OpenSSL.crypto.dump_certificate( + OpenSSL.crypto.FILETYPE_ASN1, + x509) + + return { + 'subject': ( + (('commonName', x509.get_subject().CN),), + ), + 'subjectAltName': [ + ('DNS', value) + for value in get_subj_alt_name(x509) + ] + } + + def _reuse(self): + self._makefile_refs += 1 + + def _drop(self): + if self._makefile_refs < 1: + self.close() + else: + self._makefile_refs -= 1 + + +def _verify_callback(cnx, x509, err_no, err_depth, return_code): + return err_no == 0 + + +def ssl_wrap_socket(sock, keyfile=None, certfile=None, cert_reqs=None, + ca_certs=None, server_hostname=None, + ssl_version=None): + ctx = OpenSSL.SSL.Context(_openssl_versions[ssl_version]) + if certfile: + ctx.use_certificate_file(certfile) + if keyfile: + ctx.use_privatekey_file(keyfile) + if cert_reqs != ssl.CERT_NONE: + ctx.set_verify(_openssl_verify[cert_reqs], _verify_callback) + if ca_certs: + try: + ctx.load_verify_locations(ca_certs, None) + except OpenSSL.SSL.Error as e: + raise ssl.SSLError('bad ca_certs: %r' % ca_certs, e) + else: + ctx.set_default_verify_paths() + + # Disable TLS compression to migitate CRIME attack (issue #309) + OP_NO_COMPRESSION = 0x20000 + ctx.set_options(OP_NO_COMPRESSION) + + # Set list of supported ciphersuites. + ctx.set_cipher_list(DEFAULT_SSL_CIPHER_LIST) + + cnx = OpenSSL.SSL.Connection(ctx, sock) + cnx.set_tlsext_host_name(server_hostname) + cnx.set_connect_state() + while True: + try: + cnx.do_handshake() + except OpenSSL.SSL.WantReadError: + select.select([sock], [], []) + continue + except OpenSSL.SSL.Error as e: + raise ssl.SSLError('bad handshake', e) + break + + return WrappedSocket(cnx, sock) diff --git a/urllib3/exceptions.py b/urllib3/exceptions.py new file mode 100644 index 0000000..7519ba9 --- /dev/null +++ b/urllib3/exceptions.py @@ -0,0 +1,156 @@ + +## Base Exceptions + +class HTTPError(Exception): + "Base exception used by this module." + pass + +class HTTPWarning(Warning): + "Base warning used by this module." + pass + + + +class PoolError(HTTPError): + "Base exception for errors caused within a pool." + def __init__(self, pool, message): + self.pool = pool + HTTPError.__init__(self, "%s: %s" % (pool, message)) + + def __reduce__(self): + # For pickling purposes. + return self.__class__, (None, None) + + +class RequestError(PoolError): + "Base exception for PoolErrors that have associated URLs." + def __init__(self, pool, url, message): + self.url = url + PoolError.__init__(self, pool, message) + + def __reduce__(self): + # For pickling purposes. + return self.__class__, (None, self.url, None) + + +class SSLError(HTTPError): + "Raised when SSL certificate fails in an HTTPS connection." + pass + + +class ProxyError(HTTPError): + "Raised when the connection to a proxy fails." + pass + + +class DecodeError(HTTPError): + "Raised when automatic decoding based on Content-Type fails." + pass + + +class ProtocolError(HTTPError): + "Raised when something unexpected happens mid-request/response." + pass + + +#: Renamed to ProtocolError but aliased for backwards compatibility. +ConnectionError = ProtocolError + + +## Leaf Exceptions + +class MaxRetryError(RequestError): + """Raised when the maximum number of retries is exceeded. + + :param pool: The connection pool + :type pool: :class:`~urllib3.connectionpool.HTTPConnectionPool` + :param string url: The requested Url + :param exceptions.Exception reason: The underlying error + + """ + + def __init__(self, pool, url, reason=None): + self.reason = reason + + message = "Max retries exceeded with url: %s" % url + if reason: + message += " (Caused by %r)" % reason + else: + message += " (Caused by redirect)" + + RequestError.__init__(self, pool, url, message) + + +class HostChangedError(RequestError): + "Raised when an existing pool gets a request for a foreign host." + + def __init__(self, pool, url, retries=3): + message = "Tried to open a foreign host with url: %s" % url + RequestError.__init__(self, pool, url, message) + self.retries = retries + + +class TimeoutStateError(HTTPError): + """ Raised when passing an invalid state to a timeout """ + pass + + +class TimeoutError(HTTPError): + """ Raised when a socket timeout error occurs. + + Catching this error will catch both :exc:`ReadTimeoutErrors + <ReadTimeoutError>` and :exc:`ConnectTimeoutErrors <ConnectTimeoutError>`. + """ + pass + + +class ReadTimeoutError(TimeoutError, RequestError): + "Raised when a socket timeout occurs while receiving data from a server" + pass + + +# This timeout error does not have a URL attached and needs to inherit from the +# base HTTPError +class ConnectTimeoutError(TimeoutError): + "Raised when a socket timeout occurs while connecting to a server" + pass + + +class EmptyPoolError(PoolError): + "Raised when a pool runs out of connections and no more are allowed." + pass + + +class ClosedPoolError(PoolError): + "Raised when a request enters a pool after the pool has been closed." + pass + + +class LocationValueError(ValueError, HTTPError): + "Raised when there is something wrong with a given URL input." + pass + + +class LocationParseError(LocationValueError): + "Raised when get_host or similar fails to parse the URL input." + + def __init__(self, location): + message = "Failed to parse: %s" % location + HTTPError.__init__(self, message) + + self.location = location + + +class SecurityWarning(HTTPWarning): + "Warned when perfoming security reducing actions" + pass + + +class InsecureRequestWarning(SecurityWarning): + "Warned when making an unverified HTTPS request." + pass + + +class SystemTimeWarning(SecurityWarning): + "Warned when system time is suspected to be wrong" + pass diff --git a/urllib3/fields.py b/urllib3/fields.py new file mode 100644 index 0000000..c853f8d --- /dev/null +++ b/urllib3/fields.py @@ -0,0 +1,177 @@ +import email.utils +import mimetypes + +from .packages import six + + +def guess_content_type(filename, default='application/octet-stream'): + """ + Guess the "Content-Type" of a file. + + :param filename: + The filename to guess the "Content-Type" of using :mod:`mimetypes`. + :param default: + If no "Content-Type" can be guessed, default to `default`. + """ + if filename: + return mimetypes.guess_type(filename)[0] or default + return default + + +def format_header_param(name, value): + """ + Helper function to format and quote a single header parameter. + + Particularly useful for header parameters which might contain + non-ASCII values, like file names. This follows RFC 2231, as + suggested by RFC 2388 Section 4.4. + + :param name: + The name of the parameter, a string expected to be ASCII only. + :param value: + The value of the parameter, provided as a unicode string. + """ + if not any(ch in value for ch in '"\\\r\n'): + result = '%s="%s"' % (name, value) + try: + result.encode('ascii') + except UnicodeEncodeError: + pass + else: + return result + if not six.PY3: # Python 2: + value = value.encode('utf-8') + value = email.utils.encode_rfc2231(value, 'utf-8') + value = '%s*=%s' % (name, value) + return value + + +class RequestField(object): + """ + A data container for request body parameters. + + :param name: + The name of this request field. + :param data: + The data/value body. + :param filename: + An optional filename of the request field. + :param headers: + An optional dict-like object of headers to initially use for the field. + """ + def __init__(self, name, data, filename=None, headers=None): + self._name = name + self._filename = filename + self.data = data + self.headers = {} + if headers: + self.headers = dict(headers) + + @classmethod + def from_tuples(cls, fieldname, value): + """ + A :class:`~urllib3.fields.RequestField` factory from old-style tuple parameters. + + Supports constructing :class:`~urllib3.fields.RequestField` from + parameter of key/value strings AND key/filetuple. A filetuple is a + (filename, data, MIME type) tuple where the MIME type is optional. + For example:: + + 'foo': 'bar', + 'fakefile': ('foofile.txt', 'contents of foofile'), + 'realfile': ('barfile.txt', open('realfile').read()), + 'typedfile': ('bazfile.bin', open('bazfile').read(), 'image/jpeg'), + 'nonamefile': 'contents of nonamefile field', + + Field names and filenames must be unicode. + """ + if isinstance(value, tuple): + if len(value) == 3: + filename, data, content_type = value + else: + filename, data = value + content_type = guess_content_type(filename) + else: + filename = None + content_type = None + data = value + + request_param = cls(fieldname, data, filename=filename) + request_param.make_multipart(content_type=content_type) + + return request_param + + def _render_part(self, name, value): + """ + Overridable helper function to format a single header parameter. + + :param name: + The name of the parameter, a string expected to be ASCII only. + :param value: + The value of the parameter, provided as a unicode string. + """ + return format_header_param(name, value) + + def _render_parts(self, header_parts): + """ + Helper function to format and quote a single header. + + Useful for single headers that are composed of multiple items. E.g., + 'Content-Disposition' fields. + + :param header_parts: + A sequence of (k, v) typles or a :class:`dict` of (k, v) to format + as `k1="v1"; k2="v2"; ...`. + """ + parts = [] + iterable = header_parts + if isinstance(header_parts, dict): + iterable = header_parts.items() + + for name, value in iterable: + if value: + parts.append(self._render_part(name, value)) + + return '; '.join(parts) + + def render_headers(self): + """ + Renders the headers for this request field. + """ + lines = [] + + sort_keys = ['Content-Disposition', 'Content-Type', 'Content-Location'] + for sort_key in sort_keys: + if self.headers.get(sort_key, False): + lines.append('%s: %s' % (sort_key, self.headers[sort_key])) + + for header_name, header_value in self.headers.items(): + if header_name not in sort_keys: + if header_value: + lines.append('%s: %s' % (header_name, header_value)) + + lines.append('\r\n') + return '\r\n'.join(lines) + + def make_multipart(self, content_disposition=None, content_type=None, + content_location=None): + """ + Makes this request field into a multipart request field. + + This method overrides "Content-Disposition", "Content-Type" and + "Content-Location" headers to the request parameter. + + :param content_type: + The 'Content-Type' of the request body. + :param content_location: + The 'Content-Location' of the request body. + + """ + self.headers['Content-Disposition'] = content_disposition or 'form-data' + self.headers['Content-Disposition'] += '; '.join([ + '', self._render_parts( + (('name', self._name), ('filename', self._filename)) + ) + ]) + self.headers['Content-Type'] = content_type + self.headers['Content-Location'] = content_location diff --git a/urllib3/filepost.py b/urllib3/filepost.py new file mode 100644 index 0000000..0fbf488 --- /dev/null +++ b/urllib3/filepost.py @@ -0,0 +1,93 @@ +import codecs + +from uuid import uuid4 +from io import BytesIO + +from .packages import six +from .packages.six import b +from .fields import RequestField + +writer = codecs.lookup('utf-8')[3] + + +def choose_boundary(): + """ + Our embarassingly-simple replacement for mimetools.choose_boundary. + """ + return uuid4().hex + + +def iter_field_objects(fields): + """ + Iterate over fields. + + Supports list of (k, v) tuples and dicts, and lists of + :class:`~urllib3.fields.RequestField`. + + """ + if isinstance(fields, dict): + i = six.iteritems(fields) + else: + i = iter(fields) + + for field in i: + if isinstance(field, RequestField): + yield field + else: + yield RequestField.from_tuples(*field) + + +def iter_fields(fields): + """ + .. deprecated:: 1.6 + + Iterate over fields. + + The addition of :class:`~urllib3.fields.RequestField` makes this function + obsolete. Instead, use :func:`iter_field_objects`, which returns + :class:`~urllib3.fields.RequestField` objects. + + Supports list of (k, v) tuples and dicts. + """ + if isinstance(fields, dict): + return ((k, v) for k, v in six.iteritems(fields)) + + return ((k, v) for k, v in fields) + + +def encode_multipart_formdata(fields, boundary=None): + """ + Encode a dictionary of ``fields`` using the multipart/form-data MIME format. + + :param fields: + Dictionary of fields or list of (key, :class:`~urllib3.fields.RequestField`). + + :param boundary: + If not specified, then a random boundary will be generated using + :func:`mimetools.choose_boundary`. + """ + body = BytesIO() + if boundary is None: + boundary = choose_boundary() + + for field in iter_field_objects(fields): + body.write(b('--%s\r\n' % (boundary))) + + writer(body).write(field.render_headers()) + data = field.data + + if isinstance(data, int): + data = str(data) # Backwards compatibility + + if isinstance(data, six.text_type): + writer(body).write(data) + else: + body.write(data) + + body.write(b'\r\n') + + body.write(b('--%s--\r\n' % (boundary))) + + content_type = str('multipart/form-data; boundary=%s' % boundary) + + return body.getvalue(), content_type diff --git a/urllib3/packages/__init__.py b/urllib3/packages/__init__.py new file mode 100644 index 0000000..37e8351 --- /dev/null +++ b/urllib3/packages/__init__.py @@ -0,0 +1,4 @@ +from __future__ import absolute_import + +from . import ssl_match_hostname + diff --git a/urllib3/packages/ordered_dict.py b/urllib3/packages/ordered_dict.py new file mode 100644 index 0000000..4479363 --- /dev/null +++ b/urllib3/packages/ordered_dict.py @@ -0,0 +1,259 @@ +# Backport of OrderedDict() class that runs on Python 2.4, 2.5, 2.6, 2.7 and pypy. +# Passes Python2.7's test suite and incorporates all the latest updates. +# Copyright 2009 Raymond Hettinger, released under the MIT License. +# http://code.activestate.com/recipes/576693/ +try: + from thread import get_ident as _get_ident +except ImportError: + from dummy_thread import get_ident as _get_ident + +try: + from _abcoll import KeysView, ValuesView, ItemsView +except ImportError: + pass + + +class OrderedDict(dict): + 'Dictionary that remembers insertion order' + # An inherited dict maps keys to values. + # The inherited dict provides __getitem__, __len__, __contains__, and get. + # The remaining methods are order-aware. + # Big-O running times for all methods are the same as for regular dictionaries. + + # The internal self.__map dictionary maps keys to links in a doubly linked list. + # The circular doubly linked list starts and ends with a sentinel element. + # The sentinel element never gets deleted (this simplifies the algorithm). + # Each link is stored as a list of length three: [PREV, NEXT, KEY]. + + def __init__(self, *args, **kwds): + '''Initialize an ordered dictionary. Signature is the same as for + regular dictionaries, but keyword arguments are not recommended + because their insertion order is arbitrary. + + ''' + if len(args) > 1: + raise TypeError('expected at most 1 arguments, got %d' % len(args)) + try: + self.__root + except AttributeError: + self.__root = root = [] # sentinel node + root[:] = [root, root, None] + self.__map = {} + self.__update(*args, **kwds) + + def __setitem__(self, key, value, dict_setitem=dict.__setitem__): + 'od.__setitem__(i, y) <==> od[i]=y' + # Setting a new item creates a new link which goes at the end of the linked + # list, and the inherited dictionary is updated with the new key/value pair. + if key not in self: + root = self.__root + last = root[0] + last[1] = root[0] = self.__map[key] = [last, root, key] + dict_setitem(self, key, value) + + def __delitem__(self, key, dict_delitem=dict.__delitem__): + 'od.__delitem__(y) <==> del od[y]' + # Deleting an existing item uses self.__map to find the link which is + # then removed by updating the links in the predecessor and successor nodes. + dict_delitem(self, key) + link_prev, link_next, key = self.__map.pop(key) + link_prev[1] = link_next + link_next[0] = link_prev + + def __iter__(self): + 'od.__iter__() <==> iter(od)' + root = self.__root + curr = root[1] + while curr is not root: + yield curr[2] + curr = curr[1] + + def __reversed__(self): + 'od.__reversed__() <==> reversed(od)' + root = self.__root + curr = root[0] + while curr is not root: + yield curr[2] + curr = curr[0] + + def clear(self): + 'od.clear() -> None. Remove all items from od.' + try: + for node in self.__map.itervalues(): + del node[:] + root = self.__root + root[:] = [root, root, None] + self.__map.clear() + except AttributeError: + pass + dict.clear(self) + + def popitem(self, last=True): + '''od.popitem() -> (k, v), return and remove a (key, value) pair. + Pairs are returned in LIFO order if last is true or FIFO order if false. + + ''' + if not self: + raise KeyError('dictionary is empty') + root = self.__root + if last: + link = root[0] + link_prev = link[0] + link_prev[1] = root + root[0] = link_prev + else: + link = root[1] + link_next = link[1] + root[1] = link_next + link_next[0] = root + key = link[2] + del self.__map[key] + value = dict.pop(self, key) + return key, value + + # -- the following methods do not depend on the internal structure -- + + def keys(self): + 'od.keys() -> list of keys in od' + return list(self) + + def values(self): + 'od.values() -> list of values in od' + return [self[key] for key in self] + + def items(self): + 'od.items() -> list of (key, value) pairs in od' + return [(key, self[key]) for key in self] + + def iterkeys(self): + 'od.iterkeys() -> an iterator over the keys in od' + return iter(self) + + def itervalues(self): + 'od.itervalues -> an iterator over the values in od' + for k in self: + yield self[k] + + def iteritems(self): + 'od.iteritems -> an iterator over the (key, value) items in od' + for k in self: + yield (k, self[k]) + + def update(*args, **kwds): + '''od.update(E, **F) -> None. Update od from dict/iterable E and F. + + If E is a dict instance, does: for k in E: od[k] = E[k] + If E has a .keys() method, does: for k in E.keys(): od[k] = E[k] + Or if E is an iterable of items, does: for k, v in E: od[k] = v + In either case, this is followed by: for k, v in F.items(): od[k] = v + + ''' + if len(args) > 2: + raise TypeError('update() takes at most 2 positional ' + 'arguments (%d given)' % (len(args),)) + elif not args: + raise TypeError('update() takes at least 1 argument (0 given)') + self = args[0] + # Make progressively weaker assumptions about "other" + other = () + if len(args) == 2: + other = args[1] + if isinstance(other, dict): + for key in other: + self[key] = other[key] + elif hasattr(other, 'keys'): + for key in other.keys(): + self[key] = other[key] + else: + for key, value in other: + self[key] = value + for key, value in kwds.items(): + self[key] = value + + __update = update # let subclasses override update without breaking __init__ + + __marker = object() + + def pop(self, key, default=__marker): + '''od.pop(k[,d]) -> v, remove specified key and return the corresponding value. + If key is not found, d is returned if given, otherwise KeyError is raised. + + ''' + if key in self: + result = self[key] + del self[key] + return result + if default is self.__marker: + raise KeyError(key) + return default + + def setdefault(self, key, default=None): + 'od.setdefault(k[,d]) -> od.get(k,d), also set od[k]=d if k not in od' + if key in self: + return self[key] + self[key] = default + return default + + def __repr__(self, _repr_running={}): + 'od.__repr__() <==> repr(od)' + call_key = id(self), _get_ident() + if call_key in _repr_running: + return '...' + _repr_running[call_key] = 1 + try: + if not self: + return '%s()' % (self.__class__.__name__,) + return '%s(%r)' % (self.__class__.__name__, self.items()) + finally: + del _repr_running[call_key] + + def __reduce__(self): + 'Return state information for pickling' + items = [[k, self[k]] for k in self] + inst_dict = vars(self).copy() + for k in vars(OrderedDict()): + inst_dict.pop(k, None) + if inst_dict: + return (self.__class__, (items,), inst_dict) + return self.__class__, (items,) + + def copy(self): + 'od.copy() -> a shallow copy of od' + return self.__class__(self) + + @classmethod + def fromkeys(cls, iterable, value=None): + '''OD.fromkeys(S[, v]) -> New ordered dictionary with keys from S + and values equal to v (which defaults to None). + + ''' + d = cls() + for key in iterable: + d[key] = value + return d + + def __eq__(self, other): + '''od.__eq__(y) <==> od==y. Comparison to another OD is order-sensitive + while comparison to a regular mapping is order-insensitive. + + ''' + if isinstance(other, OrderedDict): + return len(self)==len(other) and self.items() == other.items() + return dict.__eq__(self, other) + + def __ne__(self, other): + return not self == other + + # -- the following methods are only used in Python 2.7 -- + + def viewkeys(self): + "od.viewkeys() -> a set-like object providing a view on od's keys" + return KeysView(self) + + def viewvalues(self): + "od.viewvalues() -> an object providing a view on od's values" + return ValuesView(self) + + def viewitems(self): + "od.viewitems() -> a set-like object providing a view on od's items" + return ItemsView(self) diff --git a/urllib3/packages/six.py b/urllib3/packages/six.py new file mode 100644 index 0000000..27d8011 --- /dev/null +++ b/urllib3/packages/six.py @@ -0,0 +1,385 @@ +"""Utilities for writing code that runs on Python 2 and 3""" + +#Copyright (c) 2010-2011 Benjamin Peterson + +#Permission is hereby granted, free of charge, to any person obtaining a copy of +#this software and associated documentation files (the "Software"), to deal in +#the Software without restriction, including without limitation the rights to +#use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of +#the Software, and to permit persons to whom the Software is furnished to do so, +#subject to the following conditions: + +#The above copyright notice and this permission notice shall be included in all +#copies or substantial portions of the Software. + +#THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +#IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS +#FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR +#COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER +#IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN +#CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +import operator +import sys +import types + +__author__ = "Benjamin Peterson <benjamin@python.org>" +__version__ = "1.2.0" # Revision 41c74fef2ded + + +# True if we are running on Python 3. +PY3 = sys.version_info[0] == 3 + +if PY3: + string_types = str, + integer_types = int, + class_types = type, + text_type = str + binary_type = bytes + + MAXSIZE = sys.maxsize +else: + string_types = basestring, + integer_types = (int, long) + class_types = (type, types.ClassType) + text_type = unicode + binary_type = str + + if sys.platform.startswith("java"): + # Jython always uses 32 bits. + MAXSIZE = int((1 << 31) - 1) + else: + # It's possible to have sizeof(long) != sizeof(Py_ssize_t). + class X(object): + def __len__(self): + return 1 << 31 + try: + len(X()) + except OverflowError: + # 32-bit + MAXSIZE = int((1 << 31) - 1) + else: + # 64-bit + MAXSIZE = int((1 << 63) - 1) + del X + + +def _add_doc(func, doc): + """Add documentation to a function.""" + func.__doc__ = doc + + +def _import_module(name): + """Import module, returning the module after the last dot.""" + __import__(name) + return sys.modules[name] + + +class _LazyDescr(object): + + def __init__(self, name): + self.name = name + + def __get__(self, obj, tp): + result = self._resolve() + setattr(obj, self.name, result) + # This is a bit ugly, but it avoids running this again. + delattr(tp, self.name) + return result + + +class MovedModule(_LazyDescr): + + def __init__(self, name, old, new=None): + super(MovedModule, self).__init__(name) + if PY3: + if new is None: + new = name + self.mod = new + else: + self.mod = old + + def _resolve(self): + return _import_module(self.mod) + + +class MovedAttribute(_LazyDescr): + + def __init__(self, name, old_mod, new_mod, old_attr=None, new_attr=None): + super(MovedAttribute, self).__init__(name) + if PY3: + if new_mod is None: + new_mod = name + self.mod = new_mod + if new_attr is None: + if old_attr is None: + new_attr = name + else: + new_attr = old_attr + self.attr = new_attr + else: + self.mod = old_mod + if old_attr is None: + old_attr = name + self.attr = old_attr + + def _resolve(self): + module = _import_module(self.mod) + return getattr(module, self.attr) + + + +class _MovedItems(types.ModuleType): + """Lazy loading of moved objects""" + + +_moved_attributes = [ + MovedAttribute("cStringIO", "cStringIO", "io", "StringIO"), + MovedAttribute("filter", "itertools", "builtins", "ifilter", "filter"), + MovedAttribute("input", "__builtin__", "builtins", "raw_input", "input"), + MovedAttribute("map", "itertools", "builtins", "imap", "map"), + MovedAttribute("reload_module", "__builtin__", "imp", "reload"), + MovedAttribute("reduce", "__builtin__", "functools"), + MovedAttribute("StringIO", "StringIO", "io"), + MovedAttribute("xrange", "__builtin__", "builtins", "xrange", "range"), + MovedAttribute("zip", "itertools", "builtins", "izip", "zip"), + + MovedModule("builtins", "__builtin__"), + MovedModule("configparser", "ConfigParser"), + MovedModule("copyreg", "copy_reg"), + MovedModule("http_cookiejar", "cookielib", "http.cookiejar"), + MovedModule("http_cookies", "Cookie", "http.cookies"), + MovedModule("html_entities", "htmlentitydefs", "html.entities"), + MovedModule("html_parser", "HTMLParser", "html.parser"), + MovedModule("http_client", "httplib", "http.client"), + MovedModule("BaseHTTPServer", "BaseHTTPServer", "http.server"), + MovedModule("CGIHTTPServer", "CGIHTTPServer", "http.server"), + MovedModule("SimpleHTTPServer", "SimpleHTTPServer", "http.server"), + MovedModule("cPickle", "cPickle", "pickle"), + MovedModule("queue", "Queue"), + MovedModule("reprlib", "repr"), + MovedModule("socketserver", "SocketServer"), + MovedModule("tkinter", "Tkinter"), + MovedModule("tkinter_dialog", "Dialog", "tkinter.dialog"), + MovedModule("tkinter_filedialog", "FileDialog", "tkinter.filedialog"), + MovedModule("tkinter_scrolledtext", "ScrolledText", "tkinter.scrolledtext"), + MovedModule("tkinter_simpledialog", "SimpleDialog", "tkinter.simpledialog"), + MovedModule("tkinter_tix", "Tix", "tkinter.tix"), + MovedModule("tkinter_constants", "Tkconstants", "tkinter.constants"), + MovedModule("tkinter_dnd", "Tkdnd", "tkinter.dnd"), + MovedModule("tkinter_colorchooser", "tkColorChooser", + "tkinter.colorchooser"), + MovedModule("tkinter_commondialog", "tkCommonDialog", + "tkinter.commondialog"), + MovedModule("tkinter_tkfiledialog", "tkFileDialog", "tkinter.filedialog"), + MovedModule("tkinter_font", "tkFont", "tkinter.font"), + MovedModule("tkinter_messagebox", "tkMessageBox", "tkinter.messagebox"), + MovedModule("tkinter_tksimpledialog", "tkSimpleDialog", + "tkinter.simpledialog"), + MovedModule("urllib_robotparser", "robotparser", "urllib.robotparser"), + MovedModule("winreg", "_winreg"), +] +for attr in _moved_attributes: + setattr(_MovedItems, attr.name, attr) +del attr + +moves = sys.modules[__name__ + ".moves"] = _MovedItems("moves") + + +def add_move(move): + """Add an item to six.moves.""" + setattr(_MovedItems, move.name, move) + + +def remove_move(name): + """Remove item from six.moves.""" + try: + delattr(_MovedItems, name) + except AttributeError: + try: + del moves.__dict__[name] + except KeyError: + raise AttributeError("no such move, %r" % (name,)) + + +if PY3: + _meth_func = "__func__" + _meth_self = "__self__" + + _func_code = "__code__" + _func_defaults = "__defaults__" + + _iterkeys = "keys" + _itervalues = "values" + _iteritems = "items" +else: + _meth_func = "im_func" + _meth_self = "im_self" + + _func_code = "func_code" + _func_defaults = "func_defaults" + + _iterkeys = "iterkeys" + _itervalues = "itervalues" + _iteritems = "iteritems" + + +try: + advance_iterator = next +except NameError: + def advance_iterator(it): + return it.next() +next = advance_iterator + + +if PY3: + def get_unbound_function(unbound): + return unbound + + Iterator = object + + def callable(obj): + return any("__call__" in klass.__dict__ for klass in type(obj).__mro__) +else: + def get_unbound_function(unbound): + return unbound.im_func + + class Iterator(object): + + def next(self): + return type(self).__next__(self) + + callable = callable +_add_doc(get_unbound_function, + """Get the function out of a possibly unbound function""") + + +get_method_function = operator.attrgetter(_meth_func) +get_method_self = operator.attrgetter(_meth_self) +get_function_code = operator.attrgetter(_func_code) +get_function_defaults = operator.attrgetter(_func_defaults) + + +def iterkeys(d): + """Return an iterator over the keys of a dictionary.""" + return iter(getattr(d, _iterkeys)()) + +def itervalues(d): + """Return an iterator over the values of a dictionary.""" + return iter(getattr(d, _itervalues)()) + +def iteritems(d): + """Return an iterator over the (key, value) pairs of a dictionary.""" + return iter(getattr(d, _iteritems)()) + + +if PY3: + def b(s): + return s.encode("latin-1") + def u(s): + return s + if sys.version_info[1] <= 1: + def int2byte(i): + return bytes((i,)) + else: + # This is about 2x faster than the implementation above on 3.2+ + int2byte = operator.methodcaller("to_bytes", 1, "big") + import io + StringIO = io.StringIO + BytesIO = io.BytesIO +else: + def b(s): + return s + def u(s): + return unicode(s, "unicode_escape") + int2byte = chr + import StringIO + StringIO = BytesIO = StringIO.StringIO +_add_doc(b, """Byte literal""") +_add_doc(u, """Text literal""") + + +if PY3: + import builtins + exec_ = getattr(builtins, "exec") + + + def reraise(tp, value, tb=None): + if value.__traceback__ is not tb: + raise value.with_traceback(tb) + raise value + + + print_ = getattr(builtins, "print") + del builtins + +else: + def exec_(code, globs=None, locs=None): + """Execute code in a namespace.""" + if globs is None: + frame = sys._getframe(1) + globs = frame.f_globals + if locs is None: + locs = frame.f_locals + del frame + elif locs is None: + locs = globs + exec("""exec code in globs, locs""") + + + exec_("""def reraise(tp, value, tb=None): + raise tp, value, tb +""") + + + def print_(*args, **kwargs): + """The new-style print function.""" + fp = kwargs.pop("file", sys.stdout) + if fp is None: + return + def write(data): + if not isinstance(data, basestring): + data = str(data) + fp.write(data) + want_unicode = False + sep = kwargs.pop("sep", None) + if sep is not None: + if isinstance(sep, unicode): + want_unicode = True + elif not isinstance(sep, str): + raise TypeError("sep must be None or a string") + end = kwargs.pop("end", None) + if end is not None: + if isinstance(end, unicode): + want_unicode = True + elif not isinstance(end, str): + raise TypeError("end must be None or a string") + if kwargs: + raise TypeError("invalid keyword arguments to print()") + if not want_unicode: + for arg in args: + if isinstance(arg, unicode): + want_unicode = True + break + if want_unicode: + newline = unicode("\n") + space = unicode(" ") + else: + newline = "\n" + space = " " + if sep is None: + sep = space + if end is None: + end = newline + for i, arg in enumerate(args): + if i: + write(sep) + write(arg) + write(end) + +_add_doc(reraise, """Reraise an exception.""") + + +def with_metaclass(meta, base=object): + """Create a base class with a metaclass.""" + return meta("NewBase", (base,), {}) diff --git a/urllib3/packages/ssl_match_hostname/__init__.py b/urllib3/packages/ssl_match_hostname/__init__.py new file mode 100644 index 0000000..dd59a75 --- /dev/null +++ b/urllib3/packages/ssl_match_hostname/__init__.py @@ -0,0 +1,13 @@ +try: + # Python 3.2+ + from ssl import CertificateError, match_hostname +except ImportError: + try: + # Backport of the function from a pypi module + from backports.ssl_match_hostname import CertificateError, match_hostname + except ImportError: + # Our vendored copy + from ._implementation import CertificateError, match_hostname + +# Not needed, but documenting what we provide. +__all__ = ('CertificateError', 'match_hostname') diff --git a/urllib3/packages/ssl_match_hostname/_implementation.py b/urllib3/packages/ssl_match_hostname/_implementation.py new file mode 100644 index 0000000..52f4287 --- /dev/null +++ b/urllib3/packages/ssl_match_hostname/_implementation.py @@ -0,0 +1,105 @@ +"""The match_hostname() function from Python 3.3.3, essential when using SSL.""" + +# Note: This file is under the PSF license as the code comes from the python +# stdlib. http://docs.python.org/3/license.html + +import re + +__version__ = '3.4.0.2' + +class CertificateError(ValueError): + pass + + +def _dnsname_match(dn, hostname, max_wildcards=1): + """Matching according to RFC 6125, section 6.4.3 + + http://tools.ietf.org/html/rfc6125#section-6.4.3 + """ + pats = [] + if not dn: + return False + + # Ported from python3-syntax: + # leftmost, *remainder = dn.split(r'.') + parts = dn.split(r'.') + leftmost = parts[0] + remainder = parts[1:] + + wildcards = leftmost.count('*') + if wildcards > max_wildcards: + # Issue #17980: avoid denials of service by refusing more + # than one wildcard per fragment. A survey of established + # policy among SSL implementations showed it to be a + # reasonable choice. + raise CertificateError( + "too many wildcards in certificate DNS name: " + repr(dn)) + + # speed up common case w/o wildcards + if not wildcards: + return dn.lower() == hostname.lower() + + # RFC 6125, section 6.4.3, subitem 1. + # The client SHOULD NOT attempt to match a presented identifier in which + # the wildcard character comprises a label other than the left-most label. + if leftmost == '*': + # When '*' is a fragment by itself, it matches a non-empty dotless + # fragment. + pats.append('[^.]+') + elif leftmost.startswith('xn--') or hostname.startswith('xn--'): + # RFC 6125, section 6.4.3, subitem 3. + # The client SHOULD NOT attempt to match a presented identifier + # where the wildcard character is embedded within an A-label or + # U-label of an internationalized domain name. + pats.append(re.escape(leftmost)) + else: + # Otherwise, '*' matches any dotless string, e.g. www* + pats.append(re.escape(leftmost).replace(r'\*', '[^.]*')) + + # add the remaining fragments, ignore any wildcards + for frag in remainder: + pats.append(re.escape(frag)) + + pat = re.compile(r'\A' + r'\.'.join(pats) + r'\Z', re.IGNORECASE) + return pat.match(hostname) + + +def match_hostname(cert, hostname): + """Verify that *cert* (in decoded format as returned by + SSLSocket.getpeercert()) matches the *hostname*. RFC 2818 and RFC 6125 + rules are followed, but IP addresses are not accepted for *hostname*. + + CertificateError is raised on failure. On success, the function + returns nothing. + """ + if not cert: + raise ValueError("empty or no certificate") + dnsnames = [] + san = cert.get('subjectAltName', ()) + for key, value in san: + if key == 'DNS': + if _dnsname_match(value, hostname): + return + dnsnames.append(value) + if not dnsnames: + # The subject is only checked when there is no dNSName entry + # in subjectAltName + for sub in cert.get('subject', ()): + for key, value in sub: + # XXX according to RFC 2818, the most specific Common Name + # must be used. + if key == 'commonName': + if _dnsname_match(value, hostname): + return + dnsnames.append(value) + if len(dnsnames) > 1: + raise CertificateError("hostname %r " + "doesn't match either of %s" + % (hostname, ', '.join(map(repr, dnsnames)))) + elif len(dnsnames) == 1: + raise CertificateError("hostname %r " + "doesn't match %r" + % (hostname, dnsnames[0])) + else: + raise CertificateError("no appropriate commonName or " + "subjectAltName fields were found") diff --git a/urllib3/poolmanager.py b/urllib3/poolmanager.py new file mode 100644 index 0000000..515dc96 --- /dev/null +++ b/urllib3/poolmanager.py @@ -0,0 +1,265 @@ +import logging + +try: # Python 3 + from urllib.parse import urljoin +except ImportError: + from urlparse import urljoin + +from ._collections import RecentlyUsedContainer +from .connectionpool import HTTPConnectionPool, HTTPSConnectionPool +from .connectionpool import port_by_scheme +from .exceptions import LocationValueError +from .request import RequestMethods +from .util.url import parse_url +from .util.retry import Retry + + +__all__ = ['PoolManager', 'ProxyManager', 'proxy_from_url'] + + +pool_classes_by_scheme = { + 'http': HTTPConnectionPool, + 'https': HTTPSConnectionPool, +} + +log = logging.getLogger(__name__) + +SSL_KEYWORDS = ('key_file', 'cert_file', 'cert_reqs', 'ca_certs', + 'ssl_version') + + +class PoolManager(RequestMethods): + """ + Allows for arbitrary requests while transparently keeping track of + necessary connection pools for you. + + :param num_pools: + Number of connection pools to cache before discarding the least + recently used pool. + + :param headers: + Headers to include with all requests, unless other headers are given + explicitly. + + :param \**connection_pool_kw: + Additional parameters are used to create fresh + :class:`urllib3.connectionpool.ConnectionPool` instances. + + Example:: + + >>> manager = PoolManager(num_pools=2) + >>> r = manager.request('GET', 'http://google.com/') + >>> r = manager.request('GET', 'http://google.com/mail') + >>> r = manager.request('GET', 'http://yahoo.com/') + >>> len(manager.pools) + 2 + + """ + + proxy = None + + def __init__(self, num_pools=10, headers=None, **connection_pool_kw): + RequestMethods.__init__(self, headers) + self.connection_pool_kw = connection_pool_kw + self.pools = RecentlyUsedContainer(num_pools, + dispose_func=lambda p: p.close()) + + def _new_pool(self, scheme, host, port): + """ + Create a new :class:`ConnectionPool` based on host, port and scheme. + + This method is used to actually create the connection pools handed out + by :meth:`connection_from_url` and companion methods. It is intended + to be overridden for customization. + """ + pool_cls = pool_classes_by_scheme[scheme] + kwargs = self.connection_pool_kw + if scheme == 'http': + kwargs = self.connection_pool_kw.copy() + for kw in SSL_KEYWORDS: + kwargs.pop(kw, None) + + return pool_cls(host, port, **kwargs) + + def clear(self): + """ + Empty our store of pools and direct them all to close. + + This will not affect in-flight connections, but they will not be + re-used after completion. + """ + self.pools.clear() + + def connection_from_host(self, host, port=None, scheme='http'): + """ + Get a :class:`ConnectionPool` based on the host, port, and scheme. + + If ``port`` isn't given, it will be derived from the ``scheme`` using + ``urllib3.connectionpool.port_by_scheme``. + """ + + if not host: + raise LocationValueError("No host specified.") + + scheme = scheme or 'http' + port = port or port_by_scheme.get(scheme, 80) + pool_key = (scheme, host, port) + + with self.pools.lock: + # If the scheme, host, or port doesn't match existing open + # connections, open a new ConnectionPool. + pool = self.pools.get(pool_key) + if pool: + return pool + + # Make a fresh ConnectionPool of the desired type + pool = self._new_pool(scheme, host, port) + self.pools[pool_key] = pool + + return pool + + def connection_from_url(self, url): + """ + Similar to :func:`urllib3.connectionpool.connection_from_url` but + doesn't pass any additional parameters to the + :class:`urllib3.connectionpool.ConnectionPool` constructor. + + Additional parameters are taken from the :class:`.PoolManager` + constructor. + """ + u = parse_url(url) + return self.connection_from_host(u.host, port=u.port, scheme=u.scheme) + + def urlopen(self, method, url, redirect=True, **kw): + """ + Same as :meth:`urllib3.connectionpool.HTTPConnectionPool.urlopen` + with custom cross-host redirect logic and only sends the request-uri + portion of the ``url``. + + The given ``url`` parameter must be absolute, such that an appropriate + :class:`urllib3.connectionpool.ConnectionPool` can be chosen for it. + """ + u = parse_url(url) + conn = self.connection_from_host(u.host, port=u.port, scheme=u.scheme) + + kw['assert_same_host'] = False + kw['redirect'] = False + if 'headers' not in kw: + kw['headers'] = self.headers + + if self.proxy is not None and u.scheme == "http": + response = conn.urlopen(method, url, **kw) + else: + response = conn.urlopen(method, u.request_uri, **kw) + + redirect_location = redirect and response.get_redirect_location() + if not redirect_location: + return response + + # Support relative URLs for redirecting. + redirect_location = urljoin(url, redirect_location) + + # RFC 7231, Section 6.4.4 + if response.status == 303: + method = 'GET' + + retries = kw.get('retries') + if not isinstance(retries, Retry): + retries = Retry.from_int(retries, redirect=redirect) + + kw['retries'] = retries.increment(method, redirect_location) + kw['redirect'] = redirect + + log.info("Redirecting %s -> %s" % (url, redirect_location)) + return self.urlopen(method, redirect_location, **kw) + + +class ProxyManager(PoolManager): + """ + Behaves just like :class:`PoolManager`, but sends all requests through + the defined proxy, using the CONNECT method for HTTPS URLs. + + :param proxy_url: + The URL of the proxy to be used. + + :param proxy_headers: + A dictionary contaning headers that will be sent to the proxy. In case + of HTTP they are being sent with each request, while in the + HTTPS/CONNECT case they are sent only once. Could be used for proxy + authentication. + + Example: + >>> proxy = urllib3.ProxyManager('http://localhost:3128/') + >>> r1 = proxy.request('GET', 'http://google.com/') + >>> r2 = proxy.request('GET', 'http://httpbin.org/') + >>> len(proxy.pools) + 1 + >>> r3 = proxy.request('GET', 'https://httpbin.org/') + >>> r4 = proxy.request('GET', 'https://twitter.com/') + >>> len(proxy.pools) + 3 + + """ + + def __init__(self, proxy_url, num_pools=10, headers=None, + proxy_headers=None, **connection_pool_kw): + + if isinstance(proxy_url, HTTPConnectionPool): + proxy_url = '%s://%s:%i' % (proxy_url.scheme, proxy_url.host, + proxy_url.port) + proxy = parse_url(proxy_url) + if not proxy.port: + port = port_by_scheme.get(proxy.scheme, 80) + proxy = proxy._replace(port=port) + + assert proxy.scheme in ("http", "https"), \ + 'Not supported proxy scheme %s' % proxy.scheme + + self.proxy = proxy + self.proxy_headers = proxy_headers or {} + + connection_pool_kw['_proxy'] = self.proxy + connection_pool_kw['_proxy_headers'] = self.proxy_headers + + super(ProxyManager, self).__init__( + num_pools, headers, **connection_pool_kw) + + def connection_from_host(self, host, port=None, scheme='http'): + if scheme == "https": + return super(ProxyManager, self).connection_from_host( + host, port, scheme) + + return super(ProxyManager, self).connection_from_host( + self.proxy.host, self.proxy.port, self.proxy.scheme) + + def _set_proxy_headers(self, url, headers=None): + """ + Sets headers needed by proxies: specifically, the Accept and Host + headers. Only sets headers not provided by the user. + """ + headers_ = {'Accept': '*/*'} + + netloc = parse_url(url).netloc + if netloc: + headers_['Host'] = netloc + + if headers: + headers_.update(headers) + return headers_ + + def urlopen(self, method, url, redirect=True, **kw): + "Same as HTTP(S)ConnectionPool.urlopen, ``url`` must be absolute." + u = parse_url(url) + + if u.scheme == "http": + # For proxied HTTPS requests, httplib sets the necessary headers + # on the CONNECT to the proxy. For HTTP, we'll definitely + # need to set 'Host' at the very least. + headers = kw.get('headers', self.headers) + kw['headers'] = self._set_proxy_headers(url, headers) + + return super(ProxyManager, self).urlopen(method, url, redirect=redirect, **kw) + + +def proxy_from_url(url, **kw): + return ProxyManager(proxy_url=url, **kw) diff --git a/urllib3/request.py b/urllib3/request.py new file mode 100644 index 0000000..51fe238 --- /dev/null +++ b/urllib3/request.py @@ -0,0 +1,135 @@ +try: + from urllib.parse import urlencode +except ImportError: + from urllib import urlencode + +from .filepost import encode_multipart_formdata + + +__all__ = ['RequestMethods'] + + +class RequestMethods(object): + """ + Convenience mixin for classes who implement a :meth:`urlopen` method, such + as :class:`~urllib3.connectionpool.HTTPConnectionPool` and + :class:`~urllib3.poolmanager.PoolManager`. + + Provides behavior for making common types of HTTP request methods and + decides which type of request field encoding to use. + + Specifically, + + :meth:`.request_encode_url` is for sending requests whose fields are + encoded in the URL (such as GET, HEAD, DELETE). + + :meth:`.request_encode_body` is for sending requests whose fields are + encoded in the *body* of the request using multipart or www-form-urlencoded + (such as for POST, PUT, PATCH). + + :meth:`.request` is for making any kind of request, it will look up the + appropriate encoding format and use one of the above two methods to make + the request. + + Initializer parameters: + + :param headers: + Headers to include with all requests, unless other headers are given + explicitly. + """ + + _encode_url_methods = set(['DELETE', 'GET', 'HEAD', 'OPTIONS']) + + def __init__(self, headers=None): + self.headers = headers or {} + + def urlopen(self, method, url, body=None, headers=None, + encode_multipart=True, multipart_boundary=None, + **kw): # Abstract + raise NotImplemented("Classes extending RequestMethods must implement " + "their own ``urlopen`` method.") + + def request(self, method, url, fields=None, headers=None, **urlopen_kw): + """ + Make a request using :meth:`urlopen` with the appropriate encoding of + ``fields`` based on the ``method`` used. + + This is a convenience method that requires the least amount of manual + effort. It can be used in most situations, while still having the + option to drop down to more specific methods when necessary, such as + :meth:`request_encode_url`, :meth:`request_encode_body`, + or even the lowest level :meth:`urlopen`. + """ + method = method.upper() + + if method in self._encode_url_methods: + return self.request_encode_url(method, url, fields=fields, + headers=headers, + **urlopen_kw) + else: + return self.request_encode_body(method, url, fields=fields, + headers=headers, + **urlopen_kw) + + def request_encode_url(self, method, url, fields=None, **urlopen_kw): + """ + Make a request using :meth:`urlopen` with the ``fields`` encoded in + the url. This is useful for request methods like GET, HEAD, DELETE, etc. + """ + if fields: + url += '?' + urlencode(fields) + return self.urlopen(method, url, **urlopen_kw) + + def request_encode_body(self, method, url, fields=None, headers=None, + encode_multipart=True, multipart_boundary=None, + **urlopen_kw): + """ + Make a request using :meth:`urlopen` with the ``fields`` encoded in + the body. This is useful for request methods like POST, PUT, PATCH, etc. + + When ``encode_multipart=True`` (default), then + :meth:`urllib3.filepost.encode_multipart_formdata` is used to encode + the payload with the appropriate content type. Otherwise + :meth:`urllib.urlencode` is used with the + 'application/x-www-form-urlencoded' content type. + + Multipart encoding must be used when posting files, and it's reasonably + safe to use it in other times too. However, it may break request + signing, such as with OAuth. + + Supports an optional ``fields`` parameter of key/value strings AND + key/filetuple. A filetuple is a (filename, data, MIME type) tuple where + the MIME type is optional. For example:: + + fields = { + 'foo': 'bar', + 'fakefile': ('foofile.txt', 'contents of foofile'), + 'realfile': ('barfile.txt', open('realfile').read()), + 'typedfile': ('bazfile.bin', open('bazfile').read(), + 'image/jpeg'), + 'nonamefile': 'contents of nonamefile field', + } + + When uploading a file, providing a filename (the first parameter of the + tuple) is optional but recommended to best mimick behavior of browsers. + + Note that if ``headers`` are supplied, the 'Content-Type' header will + be overwritten because it depends on the dynamic random boundary string + which is used to compose the body of the request. The random boundary + string can be explicitly set with the ``multipart_boundary`` parameter. + """ + if encode_multipart: + body, content_type = encode_multipart_formdata( + fields or {}, boundary=multipart_boundary) + else: + body, content_type = (urlencode(fields or {}), + 'application/x-www-form-urlencoded') + + if headers is None: + headers = self.headers + + headers_ = {'Content-Type': content_type} + headers_.update(headers) + + return self.urlopen(method, url, body=body, headers=headers_, + **urlopen_kw) diff --git a/urllib3/response.py b/urllib3/response.py new file mode 100644 index 0000000..e69de95 --- /dev/null +++ b/urllib3/response.py @@ -0,0 +1,333 @@ +import zlib +import io +from socket import timeout as SocketTimeout + +from ._collections import HTTPHeaderDict +from .exceptions import ProtocolError, DecodeError, ReadTimeoutError +from .packages.six import string_types as basestring, binary_type +from .connection import HTTPException, BaseSSLError +from .util.response import is_fp_closed + + + +class DeflateDecoder(object): + + def __init__(self): + self._first_try = True + self._data = binary_type() + self._obj = zlib.decompressobj() + + def __getattr__(self, name): + return getattr(self._obj, name) + + def decompress(self, data): + if not self._first_try: + return self._obj.decompress(data) + + self._data += data + try: + return self._obj.decompress(data) + except zlib.error: + self._first_try = False + self._obj = zlib.decompressobj(-zlib.MAX_WBITS) + try: + return self.decompress(self._data) + finally: + self._data = None + + +def _get_decoder(mode): + if mode == 'gzip': + return zlib.decompressobj(16 + zlib.MAX_WBITS) + + return DeflateDecoder() + + +class HTTPResponse(io.IOBase): + """ + HTTP Response container. + + Backwards-compatible to httplib's HTTPResponse but the response ``body`` is + loaded and decoded on-demand when the ``data`` property is accessed. This + class is also compatible with the Python standard library's :mod:`io` + module, and can hence be treated as a readable object in the context of that + framework. + + Extra parameters for behaviour not present in httplib.HTTPResponse: + + :param preload_content: + If True, the response's body will be preloaded during construction. + + :param decode_content: + If True, attempts to decode specific content-encoding's based on headers + (like 'gzip' and 'deflate') will be skipped and raw data will be used + instead. + + :param original_response: + When this HTTPResponse wrapper is generated from an httplib.HTTPResponse + object, it's convenient to include the original for debug purposes. It's + otherwise unused. + """ + + CONTENT_DECODERS = ['gzip', 'deflate'] + REDIRECT_STATUSES = [301, 302, 303, 307, 308] + + def __init__(self, body='', headers=None, status=0, version=0, reason=None, + strict=0, preload_content=True, decode_content=True, + original_response=None, pool=None, connection=None): + + self.headers = HTTPHeaderDict() + if headers: + self.headers.update(headers) + self.status = status + self.version = version + self.reason = reason + self.strict = strict + self.decode_content = decode_content + + self._decoder = None + self._body = None + self._fp = None + self._original_response = original_response + self._fp_bytes_read = 0 + + if body and isinstance(body, (basestring, binary_type)): + self._body = body + + self._pool = pool + self._connection = connection + + if hasattr(body, 'read'): + self._fp = body + + if preload_content and not self._body: + self._body = self.read(decode_content=decode_content) + + def get_redirect_location(self): + """ + Should we redirect and where to? + + :returns: Truthy redirect location string if we got a redirect status + code and valid location. ``None`` if redirect status and no + location. ``False`` if not a redirect status code. + """ + if self.status in self.REDIRECT_STATUSES: + return self.headers.get('location') + + return False + + def release_conn(self): + if not self._pool or not self._connection: + return + + self._pool._put_conn(self._connection) + self._connection = None + + @property + def data(self): + # For backwords-compat with earlier urllib3 0.4 and earlier. + if self._body: + return self._body + + if self._fp: + return self.read(cache_content=True) + + def tell(self): + """ + Obtain the number of bytes pulled over the wire so far. May differ from + the amount of content returned by :meth:``HTTPResponse.read`` if bytes + are encoded on the wire (e.g, compressed). + """ + return self._fp_bytes_read + + def read(self, amt=None, decode_content=None, cache_content=False): + """ + Similar to :meth:`httplib.HTTPResponse.read`, but with two additional + parameters: ``decode_content`` and ``cache_content``. + + :param amt: + How much of the content to read. If specified, caching is skipped + because it doesn't make sense to cache partial content as the full + response. + + :param decode_content: + If True, will attempt to decode the body based on the + 'content-encoding' header. + + :param cache_content: + If True, will save the returned data such that the same result is + returned despite of the state of the underlying file object. This + is useful if you want the ``.data`` property to continue working + after having ``.read()`` the file object. (Overridden if ``amt`` is + set.) + """ + # Note: content-encoding value should be case-insensitive, per RFC 7230 + # Section 3.2 + content_encoding = self.headers.get('content-encoding', '').lower() + if self._decoder is None: + if content_encoding in self.CONTENT_DECODERS: + self._decoder = _get_decoder(content_encoding) + if decode_content is None: + decode_content = self.decode_content + + if self._fp is None: + return + + flush_decoder = False + + try: + try: + if amt is None: + # cStringIO doesn't like amt=None + data = self._fp.read() + flush_decoder = True + else: + cache_content = False + data = self._fp.read(amt) + if amt != 0 and not data: # Platform-specific: Buggy versions of Python. + # Close the connection when no data is returned + # + # This is redundant to what httplib/http.client _should_ + # already do. However, versions of python released before + # December 15, 2012 (http://bugs.python.org/issue16298) do + # not properly close the connection in all cases. There is + # no harm in redundantly calling close. + self._fp.close() + flush_decoder = True + + except SocketTimeout: + # FIXME: Ideally we'd like to include the url in the ReadTimeoutError but + # there is yet no clean way to get at it from this context. + raise ReadTimeoutError(self._pool, None, 'Read timed out.') + + except BaseSSLError as e: + # FIXME: Is there a better way to differentiate between SSLErrors? + if not 'read operation timed out' in str(e): # Defensive: + # This shouldn't happen but just in case we're missing an edge + # case, let's avoid swallowing SSL errors. + raise + + raise ReadTimeoutError(self._pool, None, 'Read timed out.') + + except HTTPException as e: + # This includes IncompleteRead. + raise ProtocolError('Connection broken: %r' % e, e) + + self._fp_bytes_read += len(data) + + try: + if decode_content and self._decoder: + data = self._decoder.decompress(data) + except (IOError, zlib.error) as e: + raise DecodeError( + "Received response with content-encoding: %s, but " + "failed to decode it." % content_encoding, e) + + if flush_decoder and decode_content and self._decoder: + buf = self._decoder.decompress(binary_type()) + data += buf + self._decoder.flush() + + if cache_content: + self._body = data + + return data + + finally: + if self._original_response and self._original_response.isclosed(): + self.release_conn() + + def stream(self, amt=2**16, decode_content=None): + """ + A generator wrapper for the read() method. A call will block until + ``amt`` bytes have been read from the connection or until the + connection is closed. + + :param amt: + How much of the content to read. The generator will return up to + much data per iteration, but may return less. This is particularly + likely when using compressed data. However, the empty string will + never be returned. + + :param decode_content: + If True, will attempt to decode the body based on the + 'content-encoding' header. + """ + while not is_fp_closed(self._fp): + data = self.read(amt=amt, decode_content=decode_content) + + if data: + yield data + + @classmethod + def from_httplib(ResponseCls, r, **response_kw): + """ + Given an :class:`httplib.HTTPResponse` instance ``r``, return a + corresponding :class:`urllib3.response.HTTPResponse` object. + + Remaining parameters are passed to the HTTPResponse constructor, along + with ``original_response=r``. + """ + + headers = HTTPHeaderDict() + for k, v in r.getheaders(): + headers.add(k, v) + + # HTTPResponse objects in Python 3 don't have a .strict attribute + strict = getattr(r, 'strict', 0) + return ResponseCls(body=r, + headers=headers, + status=r.status, + version=r.version, + reason=r.reason, + strict=strict, + original_response=r, + **response_kw) + + # Backwards-compatibility methods for httplib.HTTPResponse + def getheaders(self): + return self.headers + + def getheader(self, name, default=None): + return self.headers.get(name, default) + + # Overrides from io.IOBase + def close(self): + if not self.closed: + self._fp.close() + + @property + def closed(self): + if self._fp is None: + return True + elif hasattr(self._fp, 'closed'): + return self._fp.closed + elif hasattr(self._fp, 'isclosed'): # Python 2 + return self._fp.isclosed() + else: + return True + + def fileno(self): + if self._fp is None: + raise IOError("HTTPResponse has no file to get a fileno from") + elif hasattr(self._fp, "fileno"): + return self._fp.fileno() + else: + raise IOError("The file-like object this HTTPResponse is wrapped " + "around has no file descriptor") + + def flush(self): + if self._fp is not None and hasattr(self._fp, 'flush'): + return self._fp.flush() + + def readable(self): + # This method is required for `io` module compatibility. + return True + + def readinto(self, b): + # This method is required for `io` module compatibility. + temp = self.read(len(b)) + if len(temp) == 0: + return 0 + else: + b[:len(temp)] = temp + return len(temp) diff --git a/urllib3/util/__init__.py b/urllib3/util/__init__.py new file mode 100644 index 0000000..8becc81 --- /dev/null +++ b/urllib3/util/__init__.py @@ -0,0 +1,24 @@ +# For backwards compatibility, provide imports that used to be here. +from .connection import is_connection_dropped +from .request import make_headers +from .response import is_fp_closed +from .ssl_ import ( + SSLContext, + HAS_SNI, + assert_fingerprint, + resolve_cert_reqs, + resolve_ssl_version, + ssl_wrap_socket, +) +from .timeout import ( + current_time, + Timeout, +) + +from .retry import Retry +from .url import ( + get_host, + parse_url, + split_first, + Url, +) diff --git a/urllib3/util/connection.py b/urllib3/util/connection.py new file mode 100644 index 0000000..2156993 --- /dev/null +++ b/urllib3/util/connection.py @@ -0,0 +1,97 @@ +import socket +try: + from select import poll, POLLIN +except ImportError: # `poll` doesn't exist on OSX and other platforms + poll = False + try: + from select import select + except ImportError: # `select` doesn't exist on AppEngine. + select = False + + +def is_connection_dropped(conn): # Platform-specific + """ + Returns True if the connection is dropped and should be closed. + + :param conn: + :class:`httplib.HTTPConnection` object. + + Note: For platforms like AppEngine, this will always return ``False`` to + let the platform handle connection recycling transparently for us. + """ + sock = getattr(conn, 'sock', False) + if sock is False: # Platform-specific: AppEngine + return False + if sock is None: # Connection already closed (such as by httplib). + return True + + if not poll: + if not select: # Platform-specific: AppEngine + return False + + try: + return select([sock], [], [], 0.0)[0] + except socket.error: + return True + + # This version is better on platforms that support it. + p = poll() + p.register(sock, POLLIN) + for (fno, ev) in p.poll(0.0): + if fno == sock.fileno(): + # Either data is buffered (bad), or the connection is dropped. + return True + + +# This function is copied from socket.py in the Python 2.7 standard +# library test suite. Added to its signature is only `socket_options`. +def create_connection(address, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, + source_address=None, socket_options=None): + """Connect to *address* and return the socket object. + + Convenience function. Connect to *address* (a 2-tuple ``(host, + port)``) and return the socket object. Passing the optional + *timeout* parameter will set the timeout on the socket instance + before attempting to connect. If no *timeout* is supplied, the + global default timeout setting returned by :func:`getdefaulttimeout` + is used. If *source_address* is set it must be a tuple of (host, port) + for the socket to bind as a source address before making the connection. + An host of '' or port 0 tells the OS to use the default. + """ + + host, port = address + err = None + for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM): + af, socktype, proto, canonname, sa = res + sock = None + try: + sock = socket.socket(af, socktype, proto) + + # If provided, set socket level options before connecting. + # This is the only addition urllib3 makes to this function. + _set_socket_options(sock, socket_options) + + if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT: + sock.settimeout(timeout) + if source_address: + sock.bind(source_address) + sock.connect(sa) + return sock + + except socket.error as _: + err = _ + if sock is not None: + sock.close() + + if err is not None: + raise err + else: + raise socket.error("getaddrinfo returns an empty list") + + +def _set_socket_options(sock, options): + if options is None: + return + + for opt in options: + sock.setsockopt(*opt) diff --git a/urllib3/util/request.py b/urllib3/util/request.py new file mode 100644 index 0000000..bc64f6b --- /dev/null +++ b/urllib3/util/request.py @@ -0,0 +1,71 @@ +from base64 import b64encode + +from ..packages.six import b + +ACCEPT_ENCODING = 'gzip,deflate' + + +def make_headers(keep_alive=None, accept_encoding=None, user_agent=None, + basic_auth=None, proxy_basic_auth=None, disable_cache=None): + """ + Shortcuts for generating request headers. + + :param keep_alive: + If ``True``, adds 'connection: keep-alive' header. + + :param accept_encoding: + Can be a boolean, list, or string. + ``True`` translates to 'gzip,deflate'. + List will get joined by comma. + String will be used as provided. + + :param user_agent: + String representing the user-agent you want, such as + "python-urllib3/0.6" + + :param basic_auth: + Colon-separated username:password string for 'authorization: basic ...' + auth header. + + :param proxy_basic_auth: + Colon-separated username:password string for 'proxy-authorization: basic ...' + auth header. + + :param disable_cache: + If ``True``, adds 'cache-control: no-cache' header. + + Example:: + + >>> make_headers(keep_alive=True, user_agent="Batman/1.0") + {'connection': 'keep-alive', 'user-agent': 'Batman/1.0'} + >>> make_headers(accept_encoding=True) + {'accept-encoding': 'gzip,deflate'} + """ + headers = {} + if accept_encoding: + if isinstance(accept_encoding, str): + pass + elif isinstance(accept_encoding, list): + accept_encoding = ','.join(accept_encoding) + else: + accept_encoding = ACCEPT_ENCODING + headers['accept-encoding'] = accept_encoding + + if user_agent: + headers['user-agent'] = user_agent + + if keep_alive: + headers['connection'] = 'keep-alive' + + if basic_auth: + headers['authorization'] = 'Basic ' + \ + b64encode(b(basic_auth)).decode('utf-8') + + if proxy_basic_auth: + headers['proxy-authorization'] = 'Basic ' + \ + b64encode(b(proxy_basic_auth)).decode('utf-8') + + if disable_cache: + headers['cache-control'] = 'no-cache' + + return headers diff --git a/urllib3/util/response.py b/urllib3/util/response.py new file mode 100644 index 0000000..45fff55 --- /dev/null +++ b/urllib3/util/response.py @@ -0,0 +1,22 @@ +def is_fp_closed(obj): + """ + Checks whether a given file-like object is closed. + + :param obj: + The file-like object to check. + """ + + try: + # Check via the official file-like-object way. + return obj.closed + except AttributeError: + pass + + try: + # Check if the object is a container for another file-like object that + # gets released on exhaustion (e.g. HTTPResponse). + return obj.fp is None + except AttributeError: + pass + + raise ValueError("Unable to determine whether fp is closed.") diff --git a/urllib3/util/retry.py b/urllib3/util/retry.py new file mode 100644 index 0000000..eb560df --- /dev/null +++ b/urllib3/util/retry.py @@ -0,0 +1,279 @@ +import time +import logging + +from ..exceptions import ( + ProtocolError, + ConnectTimeoutError, + ReadTimeoutError, + MaxRetryError, +) +from ..packages import six + + +log = logging.getLogger(__name__) + + +class Retry(object): + """ Retry configuration. + + Each retry attempt will create a new Retry object with updated values, so + they can be safely reused. + + Retries can be defined as a default for a pool:: + + retries = Retry(connect=5, read=2, redirect=5) + http = PoolManager(retries=retries) + response = http.request('GET', 'http://example.com/') + + Or per-request (which overrides the default for the pool):: + + response = http.request('GET', 'http://example.com/', retries=Retry(10)) + + Retries can be disabled by passing ``False``:: + + response = http.request('GET', 'http://example.com/', retries=False) + + Errors will be wrapped in :class:`~urllib3.exceptions.MaxRetryError` unless + retries are disabled, in which case the causing exception will be raised. + + + :param int total: + Total number of retries to allow. Takes precedence over other counts. + + Set to ``None`` to remove this constraint and fall back on other + counts. It's a good idea to set this to some sensibly-high value to + account for unexpected edge cases and avoid infinite retry loops. + + Set to ``0`` to fail on the first retry. + + Set to ``False`` to disable and imply ``raise_on_redirect=False``. + + :param int connect: + How many connection-related errors to retry on. + + These are errors raised before the request is sent to the remote server, + which we assume has not triggered the server to process the request. + + Set to ``0`` to fail on the first retry of this type. + + :param int read: + How many times to retry on read errors. + + These errors are raised after the request was sent to the server, so the + request may have side-effects. + + Set to ``0`` to fail on the first retry of this type. + + :param int redirect: + How many redirects to perform. Limit this to avoid infinite redirect + loops. + + A redirect is a HTTP response with a status code 301, 302, 303, 307 or + 308. + + Set to ``0`` to fail on the first retry of this type. + + Set to ``False`` to disable and imply ``raise_on_redirect=False``. + + :param iterable method_whitelist: + Set of uppercased HTTP method verbs that we should retry on. + + By default, we only retry on methods which are considered to be + indempotent (multiple requests with the same parameters end with the + same state). See :attr:`Retry.DEFAULT_METHOD_WHITELIST`. + + :param iterable status_forcelist: + A set of HTTP status codes that we should force a retry on. + + By default, this is disabled with ``None``. + + :param float backoff_factor: + A backoff factor to apply between attempts. urllib3 will sleep for:: + + {backoff factor} * (2 ^ ({number of total retries} - 1)) + + seconds. If the backoff_factor is 0.1, then :func:`.sleep` will sleep + for [0.1s, 0.2s, 0.4s, ...] between retries. It will never be longer + than :attr:`Retry.MAX_BACKOFF`. + + By default, backoff is disabled (set to 0). + + :param bool raise_on_redirect: Whether, if the number of redirects is + exhausted, to raise a MaxRetryError, or to return a response with a + response code in the 3xx range. + """ + + DEFAULT_METHOD_WHITELIST = frozenset([ + 'HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE']) + + #: Maximum backoff time. + BACKOFF_MAX = 120 + + def __init__(self, total=10, connect=None, read=None, redirect=None, + method_whitelist=DEFAULT_METHOD_WHITELIST, status_forcelist=None, + backoff_factor=0, raise_on_redirect=True, _observed_errors=0): + + self.total = total + self.connect = connect + self.read = read + + if redirect is False or total is False: + redirect = 0 + raise_on_redirect = False + + self.redirect = redirect + self.status_forcelist = status_forcelist or set() + self.method_whitelist = method_whitelist + self.backoff_factor = backoff_factor + self.raise_on_redirect = raise_on_redirect + self._observed_errors = _observed_errors # TODO: use .history instead? + + def new(self, **kw): + params = dict( + total=self.total, + connect=self.connect, read=self.read, redirect=self.redirect, + method_whitelist=self.method_whitelist, + status_forcelist=self.status_forcelist, + backoff_factor=self.backoff_factor, + raise_on_redirect=self.raise_on_redirect, + _observed_errors=self._observed_errors, + ) + params.update(kw) + return type(self)(**params) + + @classmethod + def from_int(cls, retries, redirect=True, default=None): + """ Backwards-compatibility for the old retries format.""" + if retries is None: + retries = default if default is not None else cls.DEFAULT + + if isinstance(retries, Retry): + return retries + + redirect = bool(redirect) and None + new_retries = cls(retries, redirect=redirect) + log.debug("Converted retries value: %r -> %r" % (retries, new_retries)) + return new_retries + + def get_backoff_time(self): + """ Formula for computing the current backoff + + :rtype: float + """ + if self._observed_errors <= 1: + return 0 + + backoff_value = self.backoff_factor * (2 ** (self._observed_errors - 1)) + return min(self.BACKOFF_MAX, backoff_value) + + def sleep(self): + """ Sleep between retry attempts using an exponential backoff. + + By default, the backoff factor is 0 and this method will return + immediately. + """ + backoff = self.get_backoff_time() + if backoff <= 0: + return + time.sleep(backoff) + + def _is_connection_error(self, err): + """ Errors when we're fairly sure that the server did not receive the + request, so it should be safe to retry. + """ + return isinstance(err, ConnectTimeoutError) + + def _is_read_error(self, err): + """ Errors that occur after the request has been started, so we can't + assume that the server did not process any of it. + """ + return isinstance(err, (ReadTimeoutError, ProtocolError)) + + def is_forced_retry(self, method, status_code): + """ Is this method/response retryable? (Based on method/codes whitelists) + """ + if self.method_whitelist and method.upper() not in self.method_whitelist: + return False + + return self.status_forcelist and status_code in self.status_forcelist + + def is_exhausted(self): + """ Are we out of retries? + """ + retry_counts = (self.total, self.connect, self.read, self.redirect) + retry_counts = list(filter(None, retry_counts)) + if not retry_counts: + return False + + return min(retry_counts) < 0 + + def increment(self, method=None, url=None, response=None, error=None, _pool=None, _stacktrace=None): + """ Return a new Retry object with incremented retry counters. + + :param response: A response object, or None, if the server did not + return a response. + :type response: :class:`~urllib3.response.HTTPResponse` + :param Exception error: An error encountered during the request, or + None if the response was received successfully. + + :return: A new ``Retry`` object. + """ + if self.total is False and error: + # Disabled, indicate to re-raise the error. + raise six.reraise(type(error), error, _stacktrace) + + total = self.total + if total is not None: + total -= 1 + + _observed_errors = self._observed_errors + connect = self.connect + read = self.read + redirect = self.redirect + + if error and self._is_connection_error(error): + # Connect retry? + if connect is False: + raise six.reraise(type(error), error, _stacktrace) + elif connect is not None: + connect -= 1 + _observed_errors += 1 + + elif error and self._is_read_error(error): + # Read retry? + if read is False: + raise six.reraise(type(error), error, _stacktrace) + elif read is not None: + read -= 1 + _observed_errors += 1 + + elif response and response.get_redirect_location(): + # Redirect retry? + if redirect is not None: + redirect -= 1 + + else: + # FIXME: Nothing changed, scenario doesn't make sense. + _observed_errors += 1 + + new_retry = self.new( + total=total, + connect=connect, read=read, redirect=redirect, + _observed_errors=_observed_errors) + + if new_retry.is_exhausted(): + raise MaxRetryError(_pool, url, error) + + log.debug("Incremented Retry for (url='%s'): %r" % (url, new_retry)) + + return new_retry + + + def __repr__(self): + return ('{cls.__name__}(total={self.total}, connect={self.connect}, ' + 'read={self.read}, redirect={self.redirect})').format( + cls=type(self), self=self) + + +# For backwards compatibility (equivalent to pre-v1.9): +Retry.DEFAULT = Retry(3) diff --git a/urllib3/util/ssl_.py b/urllib3/util/ssl_.py new file mode 100644 index 0000000..9cfe2d2 --- /dev/null +++ b/urllib3/util/ssl_.py @@ -0,0 +1,132 @@ +from binascii import hexlify, unhexlify +from hashlib import md5, sha1 + +from ..exceptions import SSLError + + +try: # Test for SSL features + SSLContext = None + HAS_SNI = False + + import ssl + from ssl import wrap_socket, CERT_NONE, PROTOCOL_SSLv23 + from ssl import SSLContext # Modern SSL? + from ssl import HAS_SNI # Has SNI? +except ImportError: + pass + + +def assert_fingerprint(cert, fingerprint): + """ + Checks if given fingerprint matches the supplied certificate. + + :param cert: + Certificate as bytes object. + :param fingerprint: + Fingerprint as string of hexdigits, can be interspersed by colons. + """ + + # Maps the length of a digest to a possible hash function producing + # this digest. + hashfunc_map = { + 16: md5, + 20: sha1 + } + + fingerprint = fingerprint.replace(':', '').lower() + digest_length, odd = divmod(len(fingerprint), 2) + + if odd or digest_length not in hashfunc_map: + raise SSLError('Fingerprint is of invalid length.') + + # We need encode() here for py32; works on py2 and p33. + fingerprint_bytes = unhexlify(fingerprint.encode()) + + hashfunc = hashfunc_map[digest_length] + + cert_digest = hashfunc(cert).digest() + + if not cert_digest == fingerprint_bytes: + raise SSLError('Fingerprints did not match. Expected "{0}", got "{1}".' + .format(hexlify(fingerprint_bytes), + hexlify(cert_digest))) + + +def resolve_cert_reqs(candidate): + """ + Resolves the argument to a numeric constant, which can be passed to + the wrap_socket function/method from the ssl module. + Defaults to :data:`ssl.CERT_NONE`. + If given a string it is assumed to be the name of the constant in the + :mod:`ssl` module or its abbrevation. + (So you can specify `REQUIRED` instead of `CERT_REQUIRED`. + If it's neither `None` nor a string we assume it is already the numeric + constant which can directly be passed to wrap_socket. + """ + if candidate is None: + return CERT_NONE + + if isinstance(candidate, str): + res = getattr(ssl, candidate, None) + if res is None: + res = getattr(ssl, 'CERT_' + candidate) + return res + + return candidate + + +def resolve_ssl_version(candidate): + """ + like resolve_cert_reqs + """ + if candidate is None: + return PROTOCOL_SSLv23 + + if isinstance(candidate, str): + res = getattr(ssl, candidate, None) + if res is None: + res = getattr(ssl, 'PROTOCOL_' + candidate) + return res + + return candidate + + +if SSLContext is not None: # Python 3.2+ + def ssl_wrap_socket(sock, keyfile=None, certfile=None, cert_reqs=None, + ca_certs=None, server_hostname=None, + ssl_version=None): + """ + All arguments except `server_hostname` have the same meaning as for + :func:`ssl.wrap_socket` + + :param server_hostname: + Hostname of the expected certificate + """ + context = SSLContext(ssl_version) + context.verify_mode = cert_reqs + + # Disable TLS compression to migitate CRIME attack (issue #309) + OP_NO_COMPRESSION = 0x20000 + context.options |= OP_NO_COMPRESSION + + if ca_certs: + try: + context.load_verify_locations(ca_certs) + # Py32 raises IOError + # Py33 raises FileNotFoundError + except Exception as e: # Reraise as SSLError + raise SSLError(e) + if certfile: + # FIXME: This block needs a test. + context.load_cert_chain(certfile, keyfile) + if HAS_SNI: # Platform-specific: OpenSSL with enabled SNI + return context.wrap_socket(sock, server_hostname=server_hostname) + return context.wrap_socket(sock) + +else: # Python 3.1 and earlier + def ssl_wrap_socket(sock, keyfile=None, certfile=None, cert_reqs=None, + ca_certs=None, server_hostname=None, + ssl_version=None): + return wrap_socket(sock, keyfile=keyfile, certfile=certfile, + ca_certs=ca_certs, cert_reqs=cert_reqs, + ssl_version=ssl_version) diff --git a/urllib3/util/timeout.py b/urllib3/util/timeout.py new file mode 100644 index 0000000..ea7027f --- /dev/null +++ b/urllib3/util/timeout.py @@ -0,0 +1,240 @@ +# The default socket timeout, used by httplib to indicate that no timeout was +# specified by the user +from socket import _GLOBAL_DEFAULT_TIMEOUT +import time + +from ..exceptions import TimeoutStateError + +# A sentinel value to indicate that no timeout was specified by the user in +# urllib3 +_Default = object() + +def current_time(): + """ + Retrieve the current time. This function is mocked out in unit testing. + """ + return time.time() + + +class Timeout(object): + """ Timeout configuration. + + Timeouts can be defined as a default for a pool:: + + timeout = Timeout(connect=2.0, read=7.0) + http = PoolManager(timeout=timeout) + response = http.request('GET', 'http://example.com/') + + Or per-request (which overrides the default for the pool):: + + response = http.request('GET', 'http://example.com/', timeout=Timeout(10)) + + Timeouts can be disabled by setting all the parameters to ``None``:: + + no_timeout = Timeout(connect=None, read=None) + response = http.request('GET', 'http://example.com/, timeout=no_timeout) + + + :param total: + This combines the connect and read timeouts into one; the read timeout + will be set to the time leftover from the connect attempt. In the + event that both a connect timeout and a total are specified, or a read + timeout and a total are specified, the shorter timeout will be applied. + + Defaults to None. + + :type total: integer, float, or None + + :param connect: + The maximum amount of time to wait for a connection attempt to a server + to succeed. Omitting the parameter will default the connect timeout to + the system default, probably `the global default timeout in socket.py + <http://hg.python.org/cpython/file/603b4d593758/Lib/socket.py#l535>`_. + None will set an infinite timeout for connection attempts. + + :type connect: integer, float, or None + + :param read: + The maximum amount of time to wait between consecutive + read operations for a response from the server. Omitting + the parameter will default the read timeout to the system + default, probably `the global default timeout in socket.py + <http://hg.python.org/cpython/file/603b4d593758/Lib/socket.py#l535>`_. + None will set an infinite timeout. + + :type read: integer, float, or None + + .. note:: + + Many factors can affect the total amount of time for urllib3 to return + an HTTP response. + + For example, Python's DNS resolver does not obey the timeout specified + on the socket. Other factors that can affect total request time include + high CPU load, high swap, the program running at a low priority level, + or other behaviors. + + In addition, the read and total timeouts only measure the time between + read operations on the socket connecting the client and the server, + not the total amount of time for the request to return a complete + response. For most requests, the timeout is raised because the server + has not sent the first byte in the specified time. This is not always + the case; if a server streams one byte every fifteen seconds, a timeout + of 20 seconds will not trigger, even though the request will take + several minutes to complete. + + If your goal is to cut off any request after a set amount of wall clock + time, consider having a second "watcher" thread to cut off a slow + request. + """ + + #: A sentinel object representing the default timeout value + DEFAULT_TIMEOUT = _GLOBAL_DEFAULT_TIMEOUT + + def __init__(self, total=None, connect=_Default, read=_Default): + self._connect = self._validate_timeout(connect, 'connect') + self._read = self._validate_timeout(read, 'read') + self.total = self._validate_timeout(total, 'total') + self._start_connect = None + + def __str__(self): + return '%s(connect=%r, read=%r, total=%r)' % ( + type(self).__name__, self._connect, self._read, self.total) + + @classmethod + def _validate_timeout(cls, value, name): + """ Check that a timeout attribute is valid. + + :param value: The timeout value to validate + :param name: The name of the timeout attribute to validate. This is + used to specify in error messages. + :return: The validated and casted version of the given value. + :raises ValueError: If the type is not an integer or a float, or if it + is a numeric value less than zero. + """ + if value is _Default: + return cls.DEFAULT_TIMEOUT + + if value is None or value is cls.DEFAULT_TIMEOUT: + return value + + try: + float(value) + except (TypeError, ValueError): + raise ValueError("Timeout value %s was %s, but it must be an " + "int or float." % (name, value)) + + try: + if value < 0: + raise ValueError("Attempted to set %s timeout to %s, but the " + "timeout cannot be set to a value less " + "than 0." % (name, value)) + except TypeError: # Python 3 + raise ValueError("Timeout value %s was %s, but it must be an " + "int or float." % (name, value)) + + return value + + @classmethod + def from_float(cls, timeout): + """ Create a new Timeout from a legacy timeout value. + + The timeout value used by httplib.py sets the same timeout on the + connect(), and recv() socket requests. This creates a :class:`Timeout` + object that sets the individual timeouts to the ``timeout`` value + passed to this function. + + :param timeout: The legacy timeout value. + :type timeout: integer, float, sentinel default object, or None + :return: Timeout object + :rtype: :class:`Timeout` + """ + return Timeout(read=timeout, connect=timeout) + + def clone(self): + """ Create a copy of the timeout object + + Timeout properties are stored per-pool but each request needs a fresh + Timeout object to ensure each one has its own start/stop configured. + + :return: a copy of the timeout object + :rtype: :class:`Timeout` + """ + # We can't use copy.deepcopy because that will also create a new object + # for _GLOBAL_DEFAULT_TIMEOUT, which socket.py uses as a sentinel to + # detect the user default. + return Timeout(connect=self._connect, read=self._read, + total=self.total) + + def start_connect(self): + """ Start the timeout clock, used during a connect() attempt + + :raises urllib3.exceptions.TimeoutStateError: if you attempt + to start a timer that has been started already. + """ + if self._start_connect is not None: + raise TimeoutStateError("Timeout timer has already been started.") + self._start_connect = current_time() + return self._start_connect + + def get_connect_duration(self): + """ Gets the time elapsed since the call to :meth:`start_connect`. + + :return: Elapsed time. + :rtype: float + :raises urllib3.exceptions.TimeoutStateError: if you attempt + to get duration for a timer that hasn't been started. + """ + if self._start_connect is None: + raise TimeoutStateError("Can't get connect duration for timer " + "that has not started.") + return current_time() - self._start_connect + + @property + def connect_timeout(self): + """ Get the value to use when setting a connection timeout. + + This will be a positive float or integer, the value None + (never timeout), or the default system timeout. + + :return: Connect timeout. + :rtype: int, float, :attr:`Timeout.DEFAULT_TIMEOUT` or None + """ + if self.total is None: + return self._connect + + if self._connect is None or self._connect is self.DEFAULT_TIMEOUT: + return self.total + + return min(self._connect, self.total) + + @property + def read_timeout(self): + """ Get the value for the read timeout. + + This assumes some time has elapsed in the connection timeout and + computes the read timeout appropriately. + + If self.total is set, the read timeout is dependent on the amount of + time taken by the connect timeout. If the connection time has not been + established, a :exc:`~urllib3.exceptions.TimeoutStateError` will be + raised. + + :return: Value to use for the read timeout. + :rtype: int, float, :attr:`Timeout.DEFAULT_TIMEOUT` or None + :raises urllib3.exceptions.TimeoutStateError: If :meth:`start_connect` + has not yet been called on this object. + """ + if (self.total is not None and + self.total is not self.DEFAULT_TIMEOUT and + self._read is not None and + self._read is not self.DEFAULT_TIMEOUT): + # In case the connect timeout has not yet been established. + if self._start_connect is None: + return self._read + return max(0, min(self.total - self.get_connect_duration(), + self._read)) + elif self.total is not None and self.total is not self.DEFAULT_TIMEOUT: + return max(0, self.total - self.get_connect_duration()) + else: + return self._read diff --git a/urllib3/util/url.py b/urllib3/util/url.py new file mode 100644 index 0000000..487d456 --- /dev/null +++ b/urllib3/util/url.py @@ -0,0 +1,171 @@ +from collections import namedtuple + +from ..exceptions import LocationParseError + + +url_attrs = ['scheme', 'auth', 'host', 'port', 'path', 'query', 'fragment'] + + +class Url(namedtuple('Url', url_attrs)): + """ + Datastructure for representing an HTTP URL. Used as a return value for + :func:`parse_url`. + """ + slots = () + + def __new__(cls, scheme=None, auth=None, host=None, port=None, path=None, + query=None, fragment=None): + return super(Url, cls).__new__(cls, scheme, auth, host, port, path, + query, fragment) + + @property + def hostname(self): + """For backwards-compatibility with urlparse. We're nice like that.""" + return self.host + + @property + def request_uri(self): + """Absolute path including the query string.""" + uri = self.path or '/' + + if self.query is not None: + uri += '?' + self.query + + return uri + + @property + def netloc(self): + """Network location including host and port""" + if self.port: + return '%s:%d' % (self.host, self.port) + return self.host + + +def split_first(s, delims): + """ + Given a string and an iterable of delimiters, split on the first found + delimiter. Return two split parts and the matched delimiter. + + If not found, then the first part is the full input string. + + Example:: + + >>> split_first('foo/bar?baz', '?/=') + ('foo', 'bar?baz', '/') + >>> split_first('foo/bar?baz', '123') + ('foo/bar?baz', '', None) + + Scales linearly with number of delims. Not ideal for large number of delims. + """ + min_idx = None + min_delim = None + for d in delims: + idx = s.find(d) + if idx < 0: + continue + + if min_idx is None or idx < min_idx: + min_idx = idx + min_delim = d + + if min_idx is None or min_idx < 0: + return s, '', None + + return s[:min_idx], s[min_idx+1:], min_delim + + +def parse_url(url): + """ + Given a url, return a parsed :class:`.Url` namedtuple. Best-effort is + performed to parse incomplete urls. Fields not provided will be None. + + Partly backwards-compatible with :mod:`urlparse`. + + Example:: + + >>> parse_url('http://google.com/mail/') + Url(scheme='http', host='google.com', port=None, path='/', ...) + >>> parse_url('google.com:80') + Url(scheme=None, host='google.com', port=80, path=None, ...) + >>> parse_url('/foo?bar') + Url(scheme=None, host=None, port=None, path='/foo', query='bar', ...) + """ + + # While this code has overlap with stdlib's urlparse, it is much + # simplified for our needs and less annoying. + # Additionally, this implementations does silly things to be optimal + # on CPython. + + if not url: + # Empty + return Url() + + scheme = None + auth = None + host = None + port = None + path = None + fragment = None + query = None + + # Scheme + if '://' in url: + scheme, url = url.split('://', 1) + + # Find the earliest Authority Terminator + # (http://tools.ietf.org/html/rfc3986#section-3.2) + url, path_, delim = split_first(url, ['/', '?', '#']) + + if delim: + # Reassemble the path + path = delim + path_ + + # Auth + if '@' in url: + # Last '@' denotes end of auth part + auth, url = url.rsplit('@', 1) + + # IPv6 + if url and url[0] == '[': + host, url = url.split(']', 1) + host += ']' + + # Port + if ':' in url: + _host, port = url.split(':', 1) + + if not host: + host = _host + + if port: + # If given, ports must be integers. + if not port.isdigit(): + raise LocationParseError(url) + port = int(port) + else: + # Blank ports are cool, too. (rfc3986#section-3.2.3) + port = None + + elif not host and url: + host = url + + if not path: + return Url(scheme, auth, host, port, path, query, fragment) + + # Fragment + if '#' in path: + path, fragment = path.split('#', 1) + + # Query + if '?' in path: + path, query = path.split('?', 1) + + return Url(scheme, auth, host, port, path, query, fragment) + + +def get_host(url): + """ + Deprecated. Use :func:`.parse_url` instead. + """ + p = parse_url(url) + return p.scheme or 'http', p.hostname, p.port |