From 38dc3b8f231cf36bcc771001318556d9e84c2889 Mon Sep 17 00:00:00 2001 From: Per Andersson Date: Fri, 7 Sep 2012 02:45:18 +0200 Subject: Imported Upstream version 1.1.5 --- README.rst | 159 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100644 README.rst (limited to 'README.rst') diff --git a/README.rst b/README.rst new file mode 100644 index 0000000..08dfc50 --- /dev/null +++ b/README.rst @@ -0,0 +1,159 @@ +====== +Bleach +====== + +Bleach is an HTML sanitizing library that escapes or strips markup and +attributes based on a white list. Bleach can also linkify text safely, applying +filters that Django's ``urlize`` filter cannot, and optionally setting ``rel`` +attributes, even on links already in the text. + +Bleach is intended for sanitizing text from *untrusted* sources. If you find +yourself jumping through hoops to allow your site administrators to do lots of +things, you're probably outside the use cases. Either trust those users, or +don't. + +Because it relies on html5lib_, Bleach is as good as modern browsers at dealing +with weird, quirky HTML fragments. And *any* of Bleach's methods will fix +unbalanced or mis-nested tags. + +The version on `github `_ is the most +up-to-date and contains the latest bug fixes. + + +Basic Use +========= + +The simplest way to use Bleach is:: + + >>> import bleach + + >>> bleach.clean('an example') + u'an <script>evil()</script> example' + + >>> bleach.linkify('an http://example.com url') + u'an http://example.com url + + >>> bleach.delinkify('a link') + u'a link' + +*NB*: Bleach always returns a ``unicode`` object, whether you give it a +bytestring or a ``unicode`` object, but Bleach does not attempt to detect +incoming character encodings, and will assume UTF-8. If you are using a +different character encoding, you should convert from a bytestring to +``unicode`` before passing the text to Bleach. + + +Customizing Bleach +================== + +``clean()``, ``linkify()`` and ``delinkify()`` can take several optional +keyword arguments to customize their behavior. + + +``clean()`` +----------- + +``bleach.clean()`` is the primary tool in Bleach. It uses html5lib_ to parse a +document fragment into a tree and does the sanitization during tokenizing, +which is incredibly powerful and has several advantages over regular +expression-based sanitization. + +``tags`` + A whitelist of HTML tags. Must be a list. Defaults to + ``bleach.ALLOWED_TAGS``. +``attributes`` + A whitelist of HTML attributes. Either a list, in which case all attributes + are allowed on all elements, or a dict, with tag names as keys and lists of + allowed attributes as values ('*' is a wildcard key to allow an attribute on + any tag). Or it is possible to pass a callable instead of a list that accepts + name and value of attribute and returns True of False. Defaults to + ``bleach.ALLOWED_ATTRIBUTES``. +``styles`` + A whitelist of allowed CSS properties within a ``style`` attribute. (Note + that ``style`` attributes are not allowed by default.) Must be a list. + Defaults to ``[]``. +``strip`` + Strip disallowed HTML instead of escaping it. A boolean. Defaults to + ``False``. +``strip_comments`` + Strip HTML comments. A boolean. Defaults to ``True``. + + +``linkify()`` +------------- + +``bleach.linkify()`` turns things that look like URLs or (optionally) email +addresses and turns them into links. It does this smartly, only looking in text +nodes, and never within ```` tags. + +There are options that affect output, and some of these are also applied to +links already found in the text. These are designed to allow you to set +attributes like ``rel="nofollow"`` or ``target``, or push outgoing links +through a redirection URL, and do this to links already in the text, as well. + +``nofollow`` + Add ``rel="nofollow"`` to non-relative links (both created by ``linkify()`` + and those already present in the text). Defaults to ``True``. +``filter_url`` + A callable through which the ``href`` attribute of links (both created by + ``linkify()`` and already present in the text) will be passed. Must accept a + single argument and return a string. +``filter_text`` + A callable through which the text of links (only those created by + ``linkify``) will be passed. Must accept a single argument and return a + string. +``skip_pre`` + Do not create new links inside ``
`` sections. Still follows
+  ``nofollow``. Defaults to ``False``.
+``parse_email``
+  Linkify email addresses with ``mailto:``. Defaults to ``False``.
+``target``
+  Set a ``target`` attribute on links. Like ``nofollow``, if ``target`` is not
+  ``None``, will set the attribute on links already in the text, as well.
+  Defaults to ``None``.
+
+
+``delinkify()``
+---------------
+
+``bleach.delinkify()`` is basically the opposite of ``linkify()``. It strips
+links out of text except, optionally, relative links, or links to domains
+you've whitelisted.
+
+``allow_domains``
+  Allow links to the domains in this list. Set to ``None`` or an empty list to
+  disallow all non-relative domains. See below for wildcards. Defaults to
+  ``None``.
+``allow_relative``
+  Allow relative links (i.e. those with no hostname). Defaults to ``False``.
+
+
+Wildcards
+^^^^^^^^^
+
+To allow links to a domain and its subdomains, ``allow_domains`` accepts two
+types of wildcard arguments in domains:
+
+``*``
+  Allow a single level of subdomain. This can be anywhere in the hostname, even
+  the TLD. This allows you to, for example, allow links to ``example.*``.
+  ``*.example.com`` will match both ``foo.example.com`` and ``example.com``.
+  ::
+    >>> delinkify('bar', \
+    ... allow_domains=['*.ex.*'])
+    u'bar'
+    >>> delinkify('bar', allow_domains=['*.ex.mp'])
+    u'bar
+``**``
+  To allow any number of *preceding* subdomains, you can start a hostname with
+  ``**``. Note that unlike ``*``, ``**`` may only appear once, and only at the
+  beginning of a hostname.
+  ::
+    >>> delinkify('t', \
+    ... allow_domains=['**.ex.mp'])
+    u't'
+  If ``**`` appears anywhere but the beginning of a hostname, ``delinkify``
+  will throw ``bleach.ValidationError`` (which is a ``ValueError`` subclass,
+  for easy catching).
+
+.. _html5lib: http://code.google.com/p/html5lib/
-- 
cgit v1.2.3