This is an archive of the discontinued Mercurial Phabricator instance.

stringutil: add a new function to do minimal regex escaping
ClosedPublic

Authored by durin42 on Jun 26 2018, 11:22 AM.

Details

Summary

Per https://bugs.python.org/issue29995, re.escape() used to
over-escape regular expression strings, but in Python 3.7 that's been
fixed, which also improved the performance of re.escape(). Since it's
both an output change for us *and* a performance win, let's just
effectively backport the new behavior to hg on all Python versions.
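
For readers without the full diff at hand, here is a minimal stand-alone sketch of the idea described above: escape only the characters that are special in a regular expression, using a translation table. It is not the code from this revision. The names `_SPECIAL`, `_ESCAPE_MAP` and `minimal_reescape` are hypothetical, the sketch targets Python 3 only, and it assumes the character set used by Python 3.7's re.escape().

```python
import re

# Hypothetical stand-alone sketch, not the function added by this patch.
# Character set assumed from Python 3.7's re.escape().
_SPECIAL = b'()[]{}?*+-|^$\\.&~# \t\n\r\v\f'

# Iterating over bytes on Python 3 yields ints, which is the key type
# that str.translate() expects.
_ESCAPE_MAP = {c: '\\' + chr(c) for c in _SPECIAL}


def minimal_reescape(pat):
    """Escape only regex-special characters in a str or bytes pattern."""
    if isinstance(pat, bytes):
        # latin1 maps each byte to exactly one code point and back, so
        # the round trip is lossless for arbitrary byte strings.
        return pat.decode('latin1').translate(_ESCAPE_MAP).encode('latin1')
    return pat.translate(_ESCAPE_MAP)


if __name__ == '__main__':
    print(minimal_reescape(b'foo.bar'))
    print(minimal_reescape('path/with_underscore'))
```

Characters such as '/' and '_' pass through unchanged, which is the output change the summary refers to; older re.escape() versions put a backslash in front of most non-alphanumeric characters.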

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

durin42 created this revision. Jun 26 2018, 11:22 AM
pulkit accepted this revision. Jun 26 2018, 1:26 PM
This revision was automatically updated to reflect the committed changes.
yuja added a subscriber: yuja. Jun 27 2018, 9:47 AM

+# regex special chars pulled from https://bugs.python.org/issue29995
+# which was part of Python 3.7.
+_respecial = pycompat.bytestr(b'()[]{}?*+-|^$\\.# \t\n\r\v\f')
+_regexescapemap = {ord(i): (b'\\' + i).decode('latin1') for i in _respecial}

The Py3.7 version also includes '&' and '~'.

https://github.com/python/cpython/blob/v3.7.0rc1/Lib/re.py#L248

In D3841#60110, @yuja wrote:

+# regex special chars pulled from https://bugs.python.org/issue29995
+# which was part of Python 3.7.
+_respecial = pycompat.bytestr(b'()[]{}?*+-|^$\\.# \t\n\r\v\f')
+_regexescapemap = {ord(i): (b'\\' + i).decode('latin1') for i in _respecial}

The Py3.7 version also includes '&' and '~'.
https://github.com/python/cpython/blob/v3.7.0rc1/Lib/re.py#L248

Nice catch, mailed D3850 as a follow-up.
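
The discrepancy raised in this thread can be checked mechanically. The sketch below is an illustration only, not part of D3841 or D3850. It assumes it is run on Python 3.7 or later, and the names `patch_special`, `stdlib_special`, `missing` and `extra` are hypothetical. It compares the character set quoted from the patch against the ASCII characters that the interpreter's own re.escape() actually escapes.

```python
import re

# Character set as quoted from the patch under review (as text).
patch_special = set('()[]{}?*+-|^$\\.# \t\n\r\v\f')

# ASCII characters that the running interpreter's re.escape() changes.
# Assumption: Python 3.7 or later, where re.escape() only escapes
# characters that are special in a regular expression.
stdlib_special = {chr(i) for i in range(128) if re.escape(chr(i)) != chr(i)}

# Characters the standard library escapes but the patch's set omits.
missing = sorted(stdlib_special - patch_special)

# Characters the patch escapes but the standard library does not.
extra = sorted(patch_special - stdlib_special)

if __name__ == '__main__':
    print('missing from patch:', missing)
    print('extra in patch:', extra)
```

On such an interpreter, `missing` should contain '&' and '~', matching the comment above.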