diff options
author | Daniel Axtens <dja@axtens.net> | 2019-05-14 16:11:25 +1000 |
---|---|---|
committer | Stephen Finucane <stephen@that.guru> | 2019-05-14 15:17:25 +0100 |
commit | 0c7d45e35fe409dd056ccc4558808a8e97ffaf0e (patch) | |
tree | 4fec14bcbedb57daab0c7840d81157eb4fa4133e | |
parent | b538d05f430226629fcbbe1a92c5333a5c46a59d (diff) | |
download | patchwork-0c7d45e35fe409dd056ccc4558808a8e97ffaf0e.tar patchwork-0c7d45e35fe409dd056ccc4558808a8e97ffaf0e.tar.gz |
Revert "parser: Ensure whitespace is stripped for long headers"
This reverts commit 841f966b8d54b2f51ab1c498eed6e5391f2546a9.
In July 2018, we received a report of OzLabs patchwork mangling
emails that have subjects containing words with internal commas,
like "Insert DT binding for foo,bar" (#197).
Stephen took a look and came up with the comment this reverts. Quoting
the commit message:
RFC2822 states that long headers can be wrapped using CRLF followed by
WSP [1]. For example:
Subject: Foo bar,
baz
Should be parsed as:
Foo bar,baz
As it turns out, this is not the case. Journey with me to
section 2.2.3 of RFC 2822:
2.2.3. Long Header Fields
Each header field is logically a single line of characters comprising
the field name, the colon, and the field body. For convenience
however, and to deal with the 998/78 character limitations per line,
the field body portion of a header field can be split into a multiple
line representation; this is called "folding". The general rule is
that wherever this standard allows for folding white space (not
simply WSP characters), a CRLF may be inserted before any WSP. For
example, the header field:
Subject: This is a test
can be represented as:
Subject: This
is a test
So the issue with the example in the reverted commit is that there is no
folding white space in "bar,baz", so it's not valid to split it.
These are valid:
Subject: Foo bar,baz
Subject: Foo
bar,baz
but splitting "bar,baz" into "bar,\n baz" is not valid.
What then is correct unfolding behaviour? Quoting the RFC again:
The process of moving from this folded multiple-line representation
of a header field to its single line representation is called
"unfolding". Unfolding is accomplished by simply removing any CRLF
that is immediately followed by WSP. Each header field should be
treated in its unfolded form for further syntactic and semantic
evaluation.
In other words, the unfolding rule requires you to strip the CRLF, but
it does not permit you to strip the WSP. Indeed, if "bar,\n baz" is
received, the correct unfolding is "bar, baz".
If you do strip the WSP, you end up mashing words together, such as in
https://patchwork.ozlabs.org/patch/1097852/
So revert the commit, restoring original behaviour, but keep a corrected
version of the test.
This presents a big question though: how did Rob's email up with a
mangled subject line?
To answer this question, you end up having to learn about OzLabs
Patchwork and how it differs from Patchwork the project.
OzLabs Patchwork (patchwork.ozlabs.org) is an installation of Patchwork.
Part of what makes it so useful for so many projects is a little
intervening layer that can massage some mail to make it end up in the
right project. Email that lands in the device tree project is an example
of email that goes through this process. I only learned about this
today and I haven't looked in any detail at precisely what is done to
the mail. The script is not part of the Patchwork project.
This intervening filter is a Python script that runs - and this is an
important detail - in Python 2.7.
Ignoring all the details, the filter basically operates in a pipe
between the mail program and patchwork's parsemail, like
(mail from system) | filter.py | parsemail
At it's very simplest, filter.py acts as follows:
import email
import sys
mail = email.parse_from_file(sys.stdin)
sys.stdout.write(mail.as_string())
Fascinatingly, if you take Rob's email from #197 and put it through this
process, you can see that it is getting mangled:
Before:
Subject: [PATCH v2 3/4] dt-bindings: sound: wm8994: document wlf,csnaddr-pd property
After:
Subject: [PATCH v2 3/4] dt-bindings: sound: wm8994: document wlf,
csnaddr-pd property
You can see that python27 has incorrectly wrapped the header, breaking
where there is not a foldable space. Python3 does not have this issue.
To summarise:
- part of the magic of OzLabs PW is a filter to make sure mail gets to
the right place. This isn't part of the Patchwork project and so is
usually invisible to patchwork developers.
- the filter is written in python27. The email module in py27 has a bug
that incorrectly breaks subjects around commas within words.
- patchwork correctly unfolds those broken subjects with a space after
the comma.
- the extra space was interpreted as a bug in patchwork, leading to a
misinterpretation of the spec to strip out the whitespace that was
believed to be in error.
- that broke other wrapped subjects.
To solve this, revert the commit and I'll work with jk to get the filter
script into py3 compatibility. (Given that py27 sunsets in ~7mo, trying
to fix it is not worth it.)
Closes: #273
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Stephen Finucane <stephen@that.guru>
[stephenfin: Use a new release note instead of editing the original one]
-rw-r--r-- | patchwork/parser.py | 1 | ||||
-rw-r--r-- | patchwork/tests/test_parser.py | 2 | ||||
-rw-r--r-- | releasenotes/notes/issue-273-2bb8d2bf5fa9a57e.yaml | 8 |
3 files changed, 9 insertions, 2 deletions
diff --git a/patchwork/parser.py b/patchwork/parser.py index 712780a..91e9920 100644 --- a/patchwork/parser.py +++ b/patchwork/parser.py @@ -47,7 +47,6 @@ class DuplicateMailError(Exception): def normalise_space(value): - value = ''.join(re.split(r'\n\s+', value)) whitespace_re = re.compile(r'\s+') return whitespace_re.sub(' ', value).strip() diff --git a/patchwork/tests/test_parser.py b/patchwork/tests/test_parser.py index ddbcf5b..f182202 100644 --- a/patchwork/tests/test_parser.py +++ b/patchwork/tests/test_parser.py @@ -838,7 +838,7 @@ class SubjectTest(TestCase): self.assertEqual(clean_subject("[PATCH] meep \n meep"), ('meep meep', [])) self.assertEqual(clean_subject("[PATCH] meep,\n meep"), - ('meep,meep', [])) + ('meep, meep', [])) self.assertEqual(clean_subject('[PATCH RFC] meep'), ('[RFC] meep', ['RFC'])) self.assertEqual(clean_subject('[PATCH,RFC] meep'), diff --git a/releasenotes/notes/issue-273-2bb8d2bf5fa9a57e.yaml b/releasenotes/notes/issue-273-2bb8d2bf5fa9a57e.yaml new file mode 100644 index 0000000..506de0d --- /dev/null +++ b/releasenotes/notes/issue-273-2bb8d2bf5fa9a57e.yaml @@ -0,0 +1,8 @@ +--- +fixes: + - | + `#197`__ was the result of a issue with OzLabs instance and not Patchwork + itself, and the fix included actually ended up corrupting subjects for + everyone. It has now been reverted. + + __ https://github.com/getpatchwork/patchwork/issues/197 |