This is an archive of the discontinued Mercurial Phabricator instance.

convert: don't pass bytes to, or expect bytes from, emailparser
AbandonedPublic

Authored by Kwan on Oct 11 2019, 1:12 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

Kwan created this revision. Oct 11 2019, 1:12 PM
martinvonz added inline comments.
hgext/convert/gnuarch.py
301

Is it right to depend on the user's preferred encoding (as I think unifromlocal() does)? Would it make sense to instead initialize self.catlogparser = emailparser.BytesParser()?

Kwan added inline comments. Oct 12 2019, 10:36 AM
hgext/convert/gnuarch.py
301

Hmm, I wasn't aware of that drawback of unifromlocal(), if it's true. Is there a canonical "give me unicode from these Mercurial bytes" function? Regardless, BytesParser does sound handy, and is available as far back as 3.5, but isn't present in 2.7. Would doing it conditionally be all right, together with a conditional alias for parsebytes on Python 2?

self.catlogparser = (
    emailparser.BytesParser()
    if pycompat.ispy3
    else emailparser.Parser()
)
if not pycompat.ispy3:
    self.catlogparser.parsebytes = self.catlogparser.parsestr
-            catlog = self.catlogparser.parsestr(data)
+            catlog = self.catlogparser.parsebytes(data)
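
For reference, a minimal Python 3 sketch of the distinction being discussed: Parser.parsestr() expects str, while BytesParser.parsebytes() accepts bytes directly, so no locale-dependent decoding is needed before parsing. It assumes emailparser is an alias for the standard library's email.parser; the sample data is a hypothetical ASCII-only log entry, not real tla output.

```python
from email import parser as emailparser

# Hypothetical RFC 822-style log entry (ASCII only); not real tla output.
data = b"Summary: fix parser handling\nCreator: Example User\n\nbody text\n"

# BytesParser consumes bytes directly; the caller does no decoding.
from_bytes = emailparser.BytesParser().parsebytes(data)

# Parser needs str, so the caller has to choose an encoding first.
from_str = emailparser.Parser().parsestr(data.decode("ascii"))

summary = from_bytes["Summary"]
body = from_bytes.get_payload()
```

For ASCII input the two parsers produce the same headers and payload. They only diverge once the input contains non-ASCII bytes, which is where the choice of encoding raised above starts to matter.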
durin42 added inline comments.
hgext/convert/gnuarch.py
301

It depends on which bytes, basically. Some bytes in hg are known to be UTF-8, but any time we have file contents or filenames, we don't know the encoding.

I'm not sure of the context here, but maybe the output from tla is in some known encoding?

(I'm also open to the idea of dropping tla convert support as we move to Python 3, as tla has been obsolete for a _long_ time.)

Kwan abandoned this revision. Nov 4 2019, 2:28 PM

Obsoleted by a better fix in cf3bf3b03445.