This is an archive of the discontinued Mercurial Phabricator instance.

stringutil: add a new function to do minimal regex escaping
ClosedPublic

Authored by durin42 on Jun 26 2018, 11:22 AM.

Details

Summary

Per https://bugs.python.org/issue29995, re.escape() used to
over-escape regular expression strings, but in Python 3.7 that's been
fixed, which also improved the performance of re.escape(). Since it's
both an output change for us *and* a performance win, let's just
effectively backport the new behavior to hg on all Python versions.
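
For readers without the full diff, the idea can be sketched in a few lines of standalone Python. This is only an illustration under assumptions: the name minimal_escape and the str/bytes handling are hypothetical and do not use Mercurial's pycompat helpers. The character list is the one quoted from the diff further down this page.

```python
import re

# Special characters as listed in the diff under review
# (note: this list does not contain '&' or '~').
_RESPECIAL = '()[]{}?*+-|^$\\.# \t\n\r\v\f'

# Map each special code point to its backslash-escaped form.
_ESCAPE_MAP = {ord(c): '\\' + c for c in _RESPECIAL}


def minimal_escape(pat):
    """Escape only regex-special characters in pat (hypothetical helper).

    Accepts str or bytes; bytes are round-tripped through latin1 so that
    every byte value maps to exactly one code point and back.
    """
    if isinstance(pat, bytes):
        return pat.decode('latin1').translate(_ESCAPE_MAP).encode('latin1')
    return pat.translate(_ESCAPE_MAP)


if __name__ == '__main__':
    sample = 'path/to/file.txt (copy)'
    escaped = minimal_escape(sample)
    print(escaped)
    # The escaped pattern should match the original string literally.
    print(bool(re.fullmatch(escaped, sample)))
```

Characters such as '/' and '_' pass through untouched, which is the output change the summary refers to: older re.escape() implementations put a backslash in front of them as well.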

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

durin42 created this revision. Jun 26 2018, 11:22 AM
pulkit accepted this revision. Jun 26 2018, 1:26 PM
This revision was automatically updated to reflect the committed changes.
yuja added a subscriber: yuja. Jun 27 2018, 9:47 AM

+# regex special chars pulled from https://bugs.python.org/issue29995
+# which was part of Python 3.7.
+_respecial = pycompat.bytestr(b'()[]{}?*+-|^$\\.# \t\n\r\v\f')
+_regexescapemap = {ord(i): (b'\\' + i).decode('latin1') for i in _respecial}

The Py3.7 version also includes '&' and '~'.

https://github.com/python/cpython/blob/v3.7.0rc1/Lib/re.py#L248

In D3841#60110, @yuja wrote:

+# regex special chars pulled from https://bugs.python.org/issue29995
+# which was part of Python 3.7.
+_respecial = pycompat.bytestr(b'()[]{}?*+-|^$\\.# \t\n\r\v\f')
+_regexescapemap = {ord(i): (b'\\' + i).decode('latin1') for i in _respecial}

The Py3.7 version also includes '&' and '~'.
https://github.com/python/cpython/blob/v3.7.0rc1/Lib/re.py#L248

Nice catch, mailed D3850 as a follow-up.
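
The gap yuja points out can be checked against whatever interpreter is at hand. The snippet below is a rough sketch, not part of the change: it compares the character list quoted from the diff with the ASCII characters that the running interpreter's re.escape() actually escapes. The variable names are made up for illustration, and the result depends on the Python version: on 3.7 or later the difference should be just '&' and '~', while older versions escape far more characters.

```python
import re

# Character list quoted from the diff in this review.
diff_special = set('()[]{}?*+-|^$\\.# \t\n\r\v\f')

# ASCII characters that the running interpreter's re.escape() changes.
stdlib_special = {
    c for c in (chr(i) for i in range(128)) if re.escape(c) != c
}

# Characters the standard library escapes but the diff's list omits.
missing = sorted(stdlib_special - diff_special)
print(missing)
```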