This is an archive of the discontinued Mercurial Phabricator instance.

narrow: escape includepats/excludepats when sending over the wire
Abandoned, Public

Authored by spectral on Mar 13 2020, 12:12 AM.

Details

Reviewers

durin42
martinvonz
marmoute

Group Reviewers

hg-reviewers

Summary

The escaping chosen was RFC 4180-like in that it wraps the values that are to be
escaped in double quote (") characters, and doubles any embedded double quote
characters. It differs from RFC 4180 in what is escaped; the escaping is only
applied if the item contains a , character or starts with a " character. RFC
4180 states that " characters embedded in a value that isn't escaped are not
allowed, and it also states that \r and \n characters should trigger an item to
be escaped.

The differences from RFC 4180 are intentional:

the include and exclude patterns can't contain a , character today (that's what's being fixed), so surrounding them in " characters won't really change anything.
the include and exclude patterns today can't start with a " character, so by *not* escaping all strings that have an embedded " unless they start with a " character (for future compatibility), we maintain BC.
the include and exclude patterns aren't multi-line, so \r and \n are irrelevant (and technically allowed in the patterns, though Mercurial is generally unable to reason about filenames that contain newlines).
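As a rough illustration of the rule described above, the following sketch contrasts the relaxed quoting with a strict RFC 4180-style rule. The helper names `quote_item` and `rfc4180_quote_item` are hypothetical and chosen for this example only; the patch itself defines `encode_qcsv` in mercurial/wireprototypes.py.

```python
def quote_item(item: bytes) -> bytes:
    """Relaxed rule: quote only if the item has a comma or starts with '"'."""
    if item.startswith(b'"') or b',' in item:
        return b'"' + item.replace(b'"', b'""') + b'"'
    return item


def rfc4180_quote_item(item: bytes) -> bytes:
    """Stricter RFC 4180-style rule, shown only for contrast."""
    if any(c in item for c in (b',', b'"', b'\r', b'\n')):
        return b'"' + item.replace(b'"', b'""') + b'"'
    return item


# A pattern with an embedded quote that an old client could already send.
legacy = b'path/with"quote'

# The relaxed rule leaves it untouched, so an old server still understands it.
assert quote_item(legacy) == legacy
# The strict rule rewrites it, which an old server would take literally.
assert rfc4180_quote_item(legacy) == b'"path/with""quote"'
# Items containing a comma are quoted under both rules.
assert quote_item(b'a,b') == rfc4180_quote_item(b'a,b') == b'"a,b"'
```

The only inputs whose wire form changes under the relaxed rule are ones that could not be sent correctly before (those containing a comma) or that cannot occur today (those starting with a double quote).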

Alternatives considered:

Full RFC 4180 encoding and decoding:

this would have been a breaking change for any include/exclude pattern that includes a " in the name if a new client talks to an old server. This is quite unlikely to cause problems in practice, but was easy to avoid.
this would have required the decoder to handle \r and \n characters specially, which would complicate it significantly.
the csv module in Python would have provided this to us, but it requires str on python3, and we only have bytes. It's not generically safe to convert these patterns to str, though perhaps roundtripping the string through fsencode and fsdecode would have been acceptable?
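To make the last alternative concrete, here is a hedged sketch of what roundtripping through `os.fsdecode` and `os.fsencode` around the standard `csv` module might look like. It assumes a POSIX system, where these functions use the `surrogateescape` error handler and so preserve arbitrary bytes; the function names `csv_encode` and `csv_decode` are hypothetical and this is not what the patch does.

```python
import csv
import io
import os


def csv_encode(paths):
    """Join bytes paths using the csv module (strict RFC 4180-style quoting)."""
    buf = io.StringIO(newline='')
    csv.writer(buf).writerow([os.fsdecode(p) for p in paths])
    # Drop the '\r\n' row terminator that the default dialect appends.
    text = buf.getvalue()[:-len('\r\n')]
    return os.fsencode(text)


def csv_decode(data):
    """Split a value produced by csv_encode back into bytes paths."""
    if not data:
        return []
    row = next(csv.reader([os.fsdecode(data)]))
    return [os.fsencode(item) for item in row]


paths = [b'a,b', b'c"d', b'caf\xe9']
wire = csv_encode(paths)
assert csv_decode(wire) == paths
```

Note that the item `c"d` is quoted here even though it contains no comma, which is the compatibility break with old servers described in the first bullet above.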

Other escaping mechanisms:

the original version of this change did use a different (home-grown) escaping mechanism. This received pushback during review.
using \ escaping (\->\\, and , -> \,) is annoying to decode, though overall pretty similar to what we have to do here. It would introduce significantly more changes for cross-platform and cross-version compatibility, however, as a server that encounters a string with \, can't know if the client is escaping a comma, or if the client is old and the path ends in a backslash (among other ambiguities).
using the existing batch escaping would likely have worked, but would introduce cross-version ambiguities if any path contained a : character followed by one of the four characters used in that escaping mechanism. The author does not know if their deployment has any paths with these character sequences in them, so decided to err on the side of caution.
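The backslash ambiguity mentioned above can be shown with a small sketch. Both functions are hypothetical: `old_client_encode` models a client that joins patterns without any escaping, and `backslash_decode` models a server that assumes backslash escaping was applied.

```python
def old_client_encode(paths):
    # Old behavior: plain join, no escaping at all.
    return b','.join(paths)


def backslash_decode(data):
    # Hypothetical decoder: a backslash makes the next byte literal,
    # and an unescaped comma separates items.
    items, cur = [], bytearray()
    i = 0
    while i < len(data):
        ch = data[i:i + 1]
        if ch == b'\\' and i + 1 < len(data):
            cur += data[i + 1:i + 2]
            i += 2
        elif ch == b',':
            items.append(bytes(cur))
            cur = bytearray()
            i += 1
        else:
            cur += ch
            i += 1
    items.append(bytes(cur))
    return items


# An old client sends two patterns, the first ending in a backslash.
wire = old_client_encode([b'dir\\', b'other'])

# A server that assumes backslash escaping merges them into one item.
assert backslash_decode(wire) == [b'dir,other']
# A plain split recovers what the old client actually meant.
assert wire.split(b',') == [b'dir\\', b'other']
```

The same bytes on the wire have two valid readings, and the server has no way to tell which client produced them.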

Diff Detail

Repository

rHG Mercurial

Branch

default

Lint

No Linters Available

Unit

No Unit Test Coverage

Event Timeline

spectral created this revision. Mar 13 2020, 12:12 AM

Herald added a reviewer: durin42. Mar 13 2020, 12:12 AM

Herald added a reviewer: martinvonz.

Herald added a reviewer: hg-reviewers.

Herald added a subscriber: mercurial-devel.

The Windows path changes seem like a good idea.

Would quoting paths with commas eliminate the need for custom escaping? I don't feel strongly about it, but custom escaping always feels weird to me. (In fact, a coworker did some homebrew escaping for CSV files a few days ago, but I forget how it ultimately ended up.)

spectral updated this revision to Diff 20767. Mar 13 2020, 1:21 AM

In D8281#123621, @mharbison72 wrote:

The Windows path changes seem like a good idea.
Would quoting paths with commas eliminate the need for custom escaping? I don't feel strongly about it, but custom escaping always feels weird to me. (In fact, a coworker did some homebrew escaping for CSV files a few days ago, but I forget how it ultimately ended up.)

Let me play with that a bit, I think it'll work and be detectable on the server since the first character can't currently be a double-quote, so there wouldn't really be any BC issues aside from the pconvert (which wouldn't be as important anymore, but still probably a good idea?)

In D8281#123625, @spectral wrote:

In D8281#123621, @mharbison72 wrote:

The Windows path changes seem like a good idea.
Would quoting paths with commas eliminate the need for custom escaping? I don't feel strongly about it, but custom escaping always feels weird to me. (In fact, a coworker did some homebrew escaping for CSV files a few days ago, but I forget how it ultimately ended up.)

Let me play with that a bit, I think it'll work and be detectable on the server since the first character can't currently be a double-quote, so there wouldn't really be any BC issues aside from the pconvert (which wouldn't be as important anymore, but still probably a good idea?)

It's a little weird to me that there's no step corresponding to pconvert() in the decoding function. Maybe we should do the pconvert() step outside encodecsvpaths(), maybe in narrowspec.normalizepattern()? However, that would make it impossible to have backslashes in paths in the narrowspec on Windows even after this patch (because they would always be converted to slashes), including narrowspecs returned from the server. However^2, maybe it wouldn't work to check out such paths on Windows anyway, so it doesn't really matter (so it's fine to call pconvert())?

The escaping scheme is a bit puzzling to me. Could we use something more standard for this (like urlencode)?

(Requesting a change of the function names. Now that we can, let's make them readable.)

mercurial/wireprototypes.py
182	Let's name it `encode_csv_paths` now that `_` are permitted. It would be easier to read.
219	Let's name it `decode_csv_paths` now that `_` are permitted. It would be easier to read.

This revision now requires changes to proceed. Mar 13 2020, 4:58 AM

In D8281#123625, @spectral wrote:

In D8281#123621, @mharbison72 wrote:

The Windows path changes seem like a good idea.
Would quoting paths with commas eliminate the need for custom escaping? I don't feel strongly about it, but custom escaping always feels weird to me. (In fact, a coworker did some homebrew escaping for CSV files a few days ago, but I forget how it ultimately ended up.)

Let me play with that a bit, I think it'll work and be detectable on the server since the first character can't currently be a double-quote, so there wouldn't really be any BC issues aside from the pconvert (which wouldn't be as important anymore, but still probably a good idea?)

I haven't played with narrow yet, so I'm not sure of the context. Are these user input paths that would end up being ignored/rejected if a Windows user used path\to\file when talking to a Unix server? Or are these stored in a tracked file? (Which I think could still cause problems.) I can't think of a good reason to stay inconsistent, and it is still experimental, so it still seems like a good idea while we still can fix it.

In D8281#123659, @mharbison72 wrote:

In D8281#123625, @spectral wrote:

In D8281#123621, @mharbison72 wrote:

The Windows path changes seem like a good idea.
Would quoting paths with commas eliminate the need for custom escaping? I don't feel strongly about it, but custom escaping always feels weird to me. (In fact, a coworker did some homebrew escaping for CSV files a few days ago, but I forget how it ultimately ended up.)

Let me play with that a bit, I think it'll work and be detectable on the server since the first character can't currently be a double-quote, so there wouldn't really be any BC issues aside from the pconvert (which wouldn't be as important anymore, but still probably a good idea?)

I haven't played with narrow yet, so I'm not sure of the context. Are these user input paths that would end up being ignored/rejected if a Windows user used path\to\file when talking to a Unix server? Or are these stored in a tracked file? (Which I think could still cause problems.) I can't think of a good reason to stay inconsistent, and it is still experimental, so it still seems like a good idea while we still can fix it.

They are stored in .hg/store/narrowspec. They usually get into that file via hg clone --narrow --include and similar, but the server may also send them. We should ideally do the conversion early when the user provides them. I think the pconvert in this patch is to handle existing repos as well as possible.

spectral edited parent revisions, added: D8294: tests: make test-doctest.t automatically find files to run tests on; removed: D8280: tests: make test-doctest.t module list match reality. Mar 13 2020, 10:29 PM

spectral updated this revision to Diff 20789.

spectral retitled this revision from "narrow: escape includepats/excludepats when sending over the wire (BC)" to "narrow: escape includepats/excludepats when sending over the wire". Mar 17 2020, 5:25 PM

spectral edited the summary of this revision.

spectral updated this revision to Diff 20813.

spectral updated this revision to Diff 20818. Mar 17 2020, 5:50 PM

spectral updated this revision to Diff 20819. Mar 17 2020, 5:59 PM

Since narrow is still experimental, I don't think we should try too hard for backward compatibility. We could introduce a new end-point for a new encoding and drop the old one in a couple of versions.

I am really not enthusiastic about having our own version of an encoding, because this means extra overhead for people working on this in the future, especially if it needs to be reimplemented (e.g. in Rust). If we drop the hard BC constraint on this, as if starting from scratch, what would be your (@spectral) encoding of choice? Could we go for something simple but widely available like url-encode?

In D8281#124058, @marmoute wrote:

Since narrow is still experimental, I don't think we should try too hard for backward compatibility. We could introduce a new end-point for a new encoding and drop the old one in a couple of versions.

+0, honestly. I won't require it, but I'd really rather we shaved this yak _now_ rather than when narrow has even more users.

In D8281#124129, @durin42 wrote:

In D8281#124058, @marmoute wrote:

Since narrow is still experimental, I don't think we should try too hard for backward compatibility. We could introduce a new end-point for a new encoding and drop the old one in a couple of versions.

+0, honestly. I won't require it, but I'd really rather we shaved this yak _now_ rather than when narrow has even more users.

I'm getting a bit frustrated with how much time I've spent on this, made worse by the fact that I agree with everything everyone's saying and so it's not like I'm frustrated at the review process, just how slow I've been at accomplishing this.

So, before I go down another rabbit hole, here's what I'm thinking:

Server emits a new capability narrow-exp-1-escaped (in addition to the current narrow-exp-1, this is not replacing the existing capability)
wireprototypes's map will change these items from csv to csv.escaped
Compatible clients will detect this capability from the server and send items of type csv.escaped during getbundle with keys like <previousname>.escaped (ex: include.escaped). If the server doesn't support csv.escaped, the client sends with the old names (unescaped).
The escaping will be urllibcompat.quote
The server will strip the .escaped suffix on the keys, split on comma, and urllibcompat.unquote the individual items
I'm *not* expecting to do anything about \ -> / conversion.

Since these are part of getbundle, I haven't found a way of doing this that's not one of:

a custom escaping mechanism that's backwards compatible
adding a capability and renaming the keys that are sent (so the server can tell when it needs to unescape)
having the client always send duplicate items (i.e. send include and include.escaped). I'm not even sure that older servers would tolerate receiving keys they aren't expecting.
having the client only escape when necessary (i.e. it includes a comma), and then always send the path as include.escaped (which runs into the problem of old servers rejecting).
having the server always unescape and the client always escape. This breaks the server's ability to interact with older clients that aren't escaping (which we'll need to support for at least a week or two).

For the non-getbundle parts (I think the wireproto command is 'widen'), we can easily make a widen2 or something, but it's probably easier to just keep the same command name and do the same thing as in getbundle: detect the capability, send as foo.escaped if supported.
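A minimal sketch of the negotiation proposed above, using the standard library's `urllib.parse` in place of Mercurial's `urllibcompat` wrappers. The capability name comes from the proposal; the function names `encode_args` and `decode_args`, and the shape of their arguments, are assumptions made for illustration and do not correspond to existing Mercurial APIs.

```python
from urllib.parse import quote, unquote_to_bytes

# Capability name as proposed in this thread (not an existing capability).
ESCAPED_CAP = b'narrow-exp-1-escaped'
SUFFIX = b'.escaped'


def encode_args(server_caps, args):
    """Client side: args maps a key (bytes) to a list of patterns (bytes)."""
    out = {}
    for key, patterns in args.items():
        if ESCAPED_CAP in server_caps:
            # quote() accepts bytes and returns an ASCII-only str.
            items = [quote(p).encode('ascii') for p in patterns]
            out[key + SUFFIX] = b','.join(items)
        else:
            # Old server: fall back to the unescaped wire format.
            out[key] = b','.join(patterns)
    return out


def decode_args(wire_args):
    """Server side: strip the suffix, split on comma, unquote each item."""
    out = {}
    for key, value in wire_args.items():
        items = value.split(b',') if value else []
        if key.endswith(SUFFIX):
            key = key[:-len(SUFFIX)]
            items = [unquote_to_bytes(item) for item in items]
        out[key] = items
    return out


args = {b'includepats': [b'path:a,b', b'path:c']}

new_wire = encode_args({b'narrow-exp-1', ESCAPED_CAP}, args)
assert decode_args(new_wire) == args

# Against an old server the comma is still ambiguous, as it is today.
old_wire = encode_args({b'narrow-exp-1'}, args)
assert decode_args(old_wire) == {b'includepats': [b'path:a', b'b', b'path:c']}
```

Because the key name itself signals whether escaping was applied, the server never has to guess how a value was encoded.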

In D8281#124246, @spectral wrote:

In D8281#124129, @durin42 wrote:

In D8281#124058, @marmoute wrote:

Since narrow is still experimental, I don't think we should try too hard for backward compatibility. We could introduce a new end-point for a new encoding and drop the old one in a couple of versions.

+0, honestly. I won't require it, but I'd really rather we shaved this yak _now_ rather than when narrow has even more users.

I'm getting a bit frustrated with how much time I've spent on this, made worse by the fact that I agree with everything everyone's saying and so it's not like I'm frustrated at the review process, just how slow I've been at accomplishing this.
So, before I go down another rabbit hole, here's what I'm thinking:

Server emits a new capability narrow-exp-1-escaped (in addition to the current narrow-exp-1, this is not replacing the existing capability)

nit: I *think* the "1" in the name was supposed to be a version number, so the new capability's name would be narrow-exp-2.

wireprototypes's map will change these items from csv to csv.escaped

Compatible clients will detect this capability from the server and send items of type csv.escaped during getbundle with keys like <previousname>.escaped (ex: include.escaped). If the server doesn't support csv.escaped, the client sends with the old names (unescaped).

The escaping will be urllibcompat.quote

The server will strip the .escaped suffix on the keys, split on comma, and urllibcompat.unquote the individual items

I'm *not* expecting to do anything about \ -> / conversion.

This all sounds good to me.

Since these are part of getbundle, I haven't found a way of doing this that's not one of:

a custom escaping mechanism that's backwards compatible

adding a capability and renaming the keys that are sent (so the server can tell when it needs to unescape)

having the client always send duplicate items (i.e. send include and include.escaped). I'm not even sure that older servers would tolerate receiving keys they aren't expecting.

having the client only escape when necessary (i.e. it includes a comma), and then always send the path as include.escaped (which runs into the problem of old servers rejecting).

having the server always unescape and the client always escape. This breaks the server's ability to interact with older clients that aren't escaping (which we'll need to support for at least a week or two).

For the non-getbundle parts (I think the wireproto command is 'widen'), we can easily make a widen2 or something, but it's probably easier to just keep the same command name and do the same thing as in getbundle: detect the capability, send as foo.escaped if supported.

Maybe no one uses the "widen" command so we don't even need to worry about compatibility there?

In D8281#124247, @martinvonz wrote:

In D8281#124246, @spectral wrote:

Server emits a new capability narrow-exp-1-escaped (in addition to the current narrow-exp-1, this is not replacing the existing capability)

nit: I *think* the "1" in the name was supposed to be a version number, so the new capability's name would be narrow-exp-2.

Yes, I had assumed that as well. This isn't really a new version of the protocol, though, just a minor tweak, and it's primarily to the 'csv' type used in getbundle (see the current version of this patch that adds 'qcsv' for the actual locations it's used). Honestly I went back and forth between announcing it as getbundle-csv-escaped and something related to narrow (and ended up on narrow, as you see, since while it's generically useful besides narrow, nothing else needs it today, and future things wouldn't need this to be announced forever - they'll always have used foo.escaped and been transmitted as escaped).

Maybe no one uses the "widen" command so we don't even need to worry about compatibility there?

I don't know how often it's used :) I just know that there's something *not* getbundle-related in narrowwirepeer.py (looks like it's called narrow_widen) that needed to be modified or else the tests wouldn't pass. I honestly don't even know if we're using it internally at Google right now. If not, that's fewer things to change, which I'm OK with :)

In D8281#124246, @spectral wrote:

I'm *not* expecting to do anything about \ -> / conversion.

So would there be some interoperability issue between Windows and not-Windows if paths aren't pconverted, if paths can also come from the server as Martin mentioned? Is there anything here that makes it more difficult to pconvert in the future? (I assume it only came up in the first place to allow the custom escaping. I understand your frustration, so I'm not looking to sign you up for more work. But I only know about narrow from a very high conceptual level, so I figure I might as well ask now and save this info for later.)

In D8281#124246, @spectral wrote:

In D8281#124129, @durin42 wrote:

In D8281#124058, @marmoute wrote:

Since narrow is still experimental, I don't think we should try too hard for backward compatibility. We could introduce a new end-point for a new encoding and drop the old one in a couple of versions.

+0, honestly. I won't require it, but I'd really rather we shaved this yak _now_ rather than when narrow has even more users.

I'm getting a bit frustrated with how much time I've spent on this, made worse by the fact that I agree with everything everyone's saying and so it's not like I'm frustrated at the review process, just how slow I've been at accomplishing this.

I know the feeling, thanks a lot for revisiting your original plan.

So, before I go down another rabbit hole, here's what I'm thinking:

Server emits a new capability narrow-exp-1-escaped (in addition to the current narrow-exp-1, this is not replacing the existing capability)

wireprototypes's map will change these items from csv to csv.escaped

Compatible clients will detect this capability from the server and send items of type csv.escaped during getbundle with keys like <previousname>.escaped (ex: include.escaped). If the server doesn't support csv.escaped, the client sends with the old names (unescaped).

The escaping will be urllibcompat.quote

The server will strip the .escaped suffix on the keys, split on comma, and urllibcompat.unquote the individual items

This looks overall good to me.

I'm *not* expecting to do anything about \ -> / conversion.

Does this mean the client side is expected to enforce using / as the directory separator?

Since these are part of getbundle, I haven't found a way of doing this that's not one of:

a custom escaping mechanism that's backwards compatible

adding a capability and renaming the keys that are sent (so the server can tell when it needs to unescape)

having the client always send duplicate items (i.e. send include and include.escaped). I'm not even sure that older servers would tolerate receiving keys they aren't expecting.

It would not work. Server would reject unknown arguments to getbundle.

having the client only escape when necessary (i.e. it includes a comma), and then always send the path as include.escaped (which runs into the problem of old servers rejecting).

In my opinion, we don't get much benefit from conditional escaping, so keeping things simple seems better.

having the server always unescape and the client always escape. This breaks the server's ability to interact with older clients that aren't escaping (which we'll need to support for at least a week or two).

As much as I think we don't need strong BC on narrow because it is experimental, having a couple of versions that can still speak to each other is preferable.

For the non-getbundle parts (I think the wireproto command is 'widen'), we can easily make a widen2 or something, but it's probably easier to just keep the same command name and do the same thing as in getbundle: detect the capability, send as foo.escaped if supported.

If I understood the situation correctly, a rewrite is planned.

This revision now requires changes to proceed. Apr 22 2020, 12:11 PM

Herald added a subscriber: mercurial-patches. Apr 22 2020, 12:11 PM

spectral abandoned this revision. Oct 29 2021, 5:44 PM

Revision Contents
Changeset List

			Path	Packages
M			hgext/narrow/narrowbundle2.py (4 lines)
M			hgext/narrow/narrowwirepeer.py (14 lines)
M			mercurial/wireprototypes.py (203 lines)
M			mercurial/wireprotov1peer.py (2 lines)
M			mercurial/wireprotov1server.py (2 lines)
M			relnotes/next (4 lines)
M			tests/test-doctest.py (1 line)
M			tests/test-narrow-exchange.t (59 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	20765		Mar 13 2020, 12:12 AM	★	★
Diff 2	20767		Mar 13 2020, 1:21 AM	★	★
Diff 3	20789		Mar 13 2020, 10:29 PM	★	★
Diff 4	20813		Mar 17 2020, 5:25 PM	★	★
Diff 5	20818		Mar 17 2020, 5:50 PM	★	★
Diff 6	20819		Mar 17 2020, 5:59 PM	★	★

Commit	Parents	Author	Summary	Date
54ad8cc96f44	0af56d3ee24c	Kyle Lippincott		Mar 17 2020, 1:05 AM

Status	Author	Revision
Abandoned	spectral	D8281 narrow: escape includepats/excludepats when sending over the wire
Closed	spectral	D8294 tests: make test-doctest.t automatically find files to run tests on
Closed	spectral	D8280 tests: make test-doctest.t module list match reality
Closed	spectral	D8279 tests: remove doctest in narrowspec, it is broken

Diff 20813

hgext/narrow/narrowbundle2.py



	def setup():			def setup():
	"""Enable narrow repo support in bundle2-related extension points."""			"""Enable narrow repo support in bundle2-related extension points."""
	getbundleargs = wireprototypes.GETBUNDLE_ARGUMENTS			getbundleargs = wireprototypes.GETBUNDLE_ARGUMENTS

	getbundleargs[b'narrow'] = b'boolean'			getbundleargs[b'narrow'] = b'boolean'
	getbundleargs[b'depth'] = b'plain'			getbundleargs[b'depth'] = b'plain'
	getbundleargs[b'oldincludepats'] = b'csv'			getbundleargs[b'oldincludepats'] = b'qcsv'
	getbundleargs[b'oldexcludepats'] = b'csv'			getbundleargs[b'oldexcludepats'] = b'qcsv'
	getbundleargs[b'known'] = b'csv'			getbundleargs[b'known'] = b'csv'

	# Extend changegroup serving to handle requests from narrow clients.			# Extend changegroup serving to handle requests from narrow clients.
	origcgfn = exchange.getbundle2partsmapping[b'changegroup']			origcgfn = exchange.getbundle2partsmapping[b'changegroup']

	def wrappedcgfn(*args, **kwargs):			def wrappedcgfn(*args, **kwargs):
	repo = args[1]			repo = args[1]
	if repo.ui.has_section(_NARROWACL_SECTION):			if repo.ui.has_section(_NARROWACL_SECTION):

hgext/narrow/narrowwirepeer.py

	cgversion(maybe): the changegroup version to produce			cgversion(maybe): the changegroup version to produce
	known: list of nodes which are known on the client (used in ellipses cases)			known: list of nodes which are known on the client (used in ellipses cases)
	ellipses: whether to send ellipses data or not			ellipses: whether to send ellipses data or not
	"""			"""

	preferuncompressed = False			preferuncompressed = False
	try:			try:

	def splitpaths(data):			oldincludes = wireprototypes.decode_qcsv(oldincludes)
	# work around ''.split(',') => ['']			newincludes = wireprototypes.decode_qcsv(newincludes)
	return data.split(b',') if data else []			oldexcludes = wireprototypes.decode_qcsv(oldexcludes)
				newexcludes = wireprototypes.decode_qcsv(newexcludes)
	oldincludes = splitpaths(oldincludes)
	newincludes = splitpaths(newincludes)
	oldexcludes = splitpaths(oldexcludes)
	newexcludes = splitpaths(newexcludes)
	# validate the patterns			# validate the patterns
	narrowspec.validatepatterns(set(oldincludes))			narrowspec.validatepatterns(set(oldincludes))
	narrowspec.validatepatterns(set(newincludes))			narrowspec.validatepatterns(set(newincludes))
	narrowspec.validatepatterns(set(oldexcludes))			narrowspec.validatepatterns(set(oldexcludes))
	narrowspec.validatepatterns(set(newexcludes))			narrowspec.validatepatterns(set(newexcludes))

	common = wireprototypes.decodelist(commonheads)			common = wireprototypes.decodelist(commonheads)
	known = wireprototypes.decodelist(known)			known = wireprototypes.decodelist(known)
	)			)


	def peernarrowwiden(remote, **kwargs):			def peernarrowwiden(remote, **kwargs):
	for ch in ('commonheads', 'known'):			for ch in ('commonheads', 'known'):
	kwargs[ch] = wireprototypes.encodelist(kwargs[ch])			kwargs[ch] = wireprototypes.encodelist(kwargs[ch])

	for ch in ('oldincludes', 'newincludes', 'oldexcludes', 'newexcludes'):			for ch in ('oldincludes', 'newincludes', 'oldexcludes', 'newexcludes'):
	kwargs[ch] = b','.join(kwargs[ch])			kwargs[ch] = wireprototypes.encode_qcsv(kwargs[ch])

	kwargs['ellipses'] = b'%i' % bool(kwargs['ellipses'])			kwargs['ellipses'] = b'%i' % bool(kwargs['ellipses'])
	f = remote._callcompressable(b'narrow_widen', **kwargs)			f = remote._callcompressable(b'narrow_widen', **kwargs)
	return bundle2.getunbundler(remote.ui, f)			return bundle2.getunbundler(remote.ui, f)

mercurial/wireprototypes.py

	# Meant to be extended by extensions. It is the extension's responsibility to			# Meant to be extended by extensions. It is the extension's responsibility to
	# ensure such options are properly processed in exchange.getbundle.			# ensure such options are properly processed in exchange.getbundle.
	#			#
	# supported types are:			# supported types are:
	#			#
	# :nodes: list of binary nodes, transmitted as space-separated hex nodes			# :nodes: list of binary nodes, transmitted as space-separated hex nodes
	# :csv: list of values, transmitted as comma-separated values			# :csv: list of values, transmitted as comma-separated values
	# :scsv: set of values, transmitted as comma-separated values			# :scsv: set of values, transmitted as comma-separated values
				# :qcsv: list of values, transmitted as quote-escaped comma-separated values
	# :plain: string with no transformation needed.			# :plain: string with no transformation needed.
	GETBUNDLE_ARGUMENTS = {			GETBUNDLE_ARGUMENTS = {
	b'heads': b'nodes',			b'heads': b'nodes',
	b'bookmarks': b'boolean',			b'bookmarks': b'boolean',
	b'common': b'nodes',			b'common': b'nodes',
	b'obsmarkers': b'boolean',			b'obsmarkers': b'boolean',
	b'phases': b'boolean',			b'phases': b'boolean',
	b'bundlecaps': b'scsv',			b'bundlecaps': b'scsv',
	b'listkeys': b'csv',			b'listkeys': b'csv',
	b'cg': b'boolean',			b'cg': b'boolean',
	b'cbattempted': b'boolean',			b'cbattempted': b'boolean',
	b'stream': b'boolean',			b'stream': b'boolean',
	b'includepats': b'csv',			b'includepats': b'qcsv',
	b'excludepats': b'csv',			b'excludepats': b'qcsv',
	}			}


				def encode_qcsv(paths):
				r'''escape and join a value of type 'qcsv', producing a bytes object

				This produces an RFC 4180-like encoding, with the primary difference being
				that we allow and produce items that have an embedded " character without
				forcing the item to be completely surrounded by " characters. That is,
				b'a"b' serializes as b'a"b', not b'"a""b"'. We also do not treat \r or \n
				special in any way.

				If an item starts with a " character or has a , character in it, the entire
				item is escaped.

				>>> from mercurial.pycompat import sysstr
				>>> def check(paths):
				... return sysstr(encode_qcsv(paths))
				>>> check([b'a', b'b', b'c'])
				'a,b,c'
				>>> check([b'a"b', b'c'])
				'a"b,c'
				>>> check([b'a,b', b'c'])
				'"a,b",c'
				>>> check([b'a,b,c'])
				'"a,b,c"'
				>>> check([b'a"b,"'])
				'"a""b,"""'
				>>> check([b'a"cb', b'"'])
				'a"cb,""""'
				>>> check([b'"'])
				'""""'
				>>> check([b'', b''])
				','
				>>> check([b'', b'', b'', b''])
				',,,'
				>>> check([b','])
				'","'
				>>> check([b'",,",""'])
				'""",,"","""""'
				>>> check([])
				''
				>>> check([b''])
				''
				'''
				def maybequote(p):
				if p.startswith(b'"') or b',' in p:
				return b'"%s"' % p.replace(b'"', b'""')
				return p
				return b','.join([maybequote(p) for p in paths])


				def _commasplit(s):
				'''Split a csv value, doing RFC 4180-like unescaping.

				If an item in the csv list is escaped, the whole item will be enclosed in "
				characters. Embedded " characters in these strings are represented by two
				sequential " characters.

				If the item does not start with a " character, it is returned as-is
				(including any embedded " characters). If the item starts with a " character
				but does not end in one, we will remove the " and the final character of the
				string, behavior in this situation is not guaranteed to be stable.

				>>> from mercurial.pycompat import sysstr
				>>> def check(s):
				... return list([sysstr(x) for x in _commasplit(s)])
				>>> check(b'a,b,c')
				['a', 'b', 'c']
				>>> check(b'a"b,c')
				['a"b', 'c']
				>>> check(b'"a,b",c')
				['a,b', 'c']
				>>> check(b'"a,b,c"')
				['a,b,c']
				>>> check(b'"a""b,"""')
				['a"b,"']
				>>> check(b'""')
				['']
				>>> check(b'""""')
				['"']
				>>> check(b'')
				['']
				>>> check(b',')
				['', '']
				>>> check(b',,,')
				['', '', '', '']
				>>> check(b'","')
				[',']
				>>> check(b'""",,"","""""')
				['",,",""']
				>>> # These are invalid, so just make sure it doesn't crash/hang, the actual
				>>> # value is essentially irrelevant.
				>>> check(b'"')
				['']
				>>> check(b',",')
				['', '']
				>>> # Note the missing 'c'.
				>>> check(b'"abc')
				['ab']
				'''

				dbg = False
				startpos = 0
				while True:
				quoted = False
				if s[startpos:startpos+1] == b'"':
				quoted = True
				startpos += 1
				if dbg:
				yield 'DBG: quoted, startpos: %d' % startpos
				else:
				if dbg:
				yield 'DBG: not quoted'
				pass
				searchpos = startpos
				while True:
				commapos = s.find(b',', searchpos)
				if dbg:
				yield 'DBG: comma found at: %d' % commapos
				if commapos < 0:
				if quoted:
				if dbg:
				yield 'DBG: to the end of the string (quoted)'
				yield s[startpos:-1].replace(b'""', b'"')
				else:
				if dbg:
				yield 'DBG: to the end of the string (not quoted)'
				yield s[startpos:]
				return
				else:
				if quoted:
				if s[startpos:commapos].count(b'"') % 2 == 0:
				searchpos = commapos + 1
				if dbg:
				yield 'DBG: found embedded comma at %d, continuing...' % commapos
				continue # find another comma, this one is quoted
				yield s[startpos:commapos - 1].replace(b'""', b'"')

				elif commapos == startpos:
				yield ''
				else:
				if dbg:
				yield 'DBG: no embedded comma found'
				yield s[startpos:commapos]
				startpos = commapos + 1
				break # Stop searching for embedded commas, go to next item


				def decode_qcsv(s):
				r'''decode an value of type 'qcsv', producing a list

				If `s` is an empty string, decodes to an empty list.

				>>> from mercurial.pycompat import sysstr
				>>> def check(s):
				... return list([sysstr(x) for x in decode_qcsv(s)])
				>>> check(b'a,b,c')
				['a', 'b', 'c']
				>>> check(b'a"b,c')
				['a"b', 'c']
				>>> check(b'"a,b",c')
				['a,b', 'c']
				>>> check(b'"a,b,c"')
				['a,b,c']
				>>> check(b'"a""b,"""')
				['a"b,"']
				>>> check(b'""')
				['']
				>>> check(b'""""')
				['"']
				>>> check(b'')
				[]
				>>> check(b',')
				['', '']
				>>> check(b',,,')
				['', '', '', '']
				>>> check(b'","')
				[',']
				>>> check(b'""",,"","""""')
				['",,",""']
				>>> # These are invalid, so just make sure decoding doesn't crash or hang; the
				>>> # actual value is essentially irrelevant.
				>>> check(b'"')
				['']
				>>> check(b',",')
				['', '']
				>>> # Note the missing 'c'.
				>>> check(b'"abc')
				['ab']
				'''
				if not s:
				return []

				if s[0:1] != b'"' and b',"' not in s:
				# fast path
				return s.split(b',')

				return list(_commasplit(s))
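
As a standalone illustration of the round trip exercised by the doctests above, here is a minimal sketch that pairs a decoder with an encoder written from the escaping rule in the summary (quote an item only if it contains a `,` or starts with a `"`, doubling embedded `"` characters). The names `sketch_encode` and `sketch_decode` are hypothetical and are not the functions in this diff; the decoder is a plain character scanner rather than the `find`-based `_commasplit`, and it is only meant for well-formed input.

```python
def sketch_encode(items):
    """Join byte strings with commas, quoting only where needed (hypothetical)."""
    out = []
    for item in items:
        if b',' in item or item.startswith(b'"'):
            # Wrap in quotes and double every embedded quote.
            item = b'"' + item.replace(b'"', b'""') + b'"'
        out.append(item)
    return b','.join(out)


def sketch_decode(s):
    """Inverse of sketch_encode for well-formed input (hypothetical)."""
    if not s:
        return []
    items = []
    pos = 0
    while True:
        if s[pos:pos + 1] == b'"':
            # Quoted item: scan to the closing quote, unescaping "" as ".
            pos += 1
            buf = bytearray()
            while pos < len(s):
                if s[pos:pos + 1] == b'"':
                    if s[pos + 1:pos + 2] == b'"':
                        buf += b'"'
                        pos += 2
                        continue
                    pos += 1  # closing quote
                    break
                buf += s[pos:pos + 1]
                pos += 1
            items.append(bytes(buf))
        else:
            # Unquoted item: runs to the next comma or the end of input.
            end = s.find(b',', pos)
            if end < 0:
                end = len(s)
            items.append(s[pos:end])
            pos = end
        if s[pos:pos + 1] != b',':
            return items
        pos += 1  # skip the separator
```

For example, `sketch_encode([b'a,b', b'c'])` gives `b'"a,b",c'`, and `sketch_decode` turns that back into the original list. An item such as `b'a"b'` is sent unquoted, which is the backwards-compatible behaviour the summary argues for. Note that in this sketch a list holding a single empty item encodes to `b''`, which decodes to an empty list.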


	class baseprotocolhandler(interfaceutil.Interface):			class baseprotocolhandler(interfaceutil.Interface):
	"""Abstract base class for wire protocol handlers.			"""Abstract base class for wire protocol handlers.

	A wire protocol handler serves as an interface between protocol command			A wire protocol handler serves as an interface between protocol command
	handlers and the wire protocol transport layer. Protocol handlers provide			handlers and the wire protocol transport layer. Protocol handlers provide
	methods to read command arguments, redirect stdio for the duration of			methods to read command arguments, redirect stdio for the duration of
	the request, handle response types, etc.			the request, handle response types, etc.
	"""			"""

mercurial/wireprotov1peer.py

	b'Unexpectedly None keytype for key %s' % key			b'Unexpectedly None keytype for key %s' % key
	)			)
	elif keytype == b'nodes':			elif keytype == b'nodes':
	value = wireprototypes.encodelist(value)			value = wireprototypes.encodelist(value)
	elif keytype == b'csv':			elif keytype == b'csv':
	value = b','.join(value)			value = b','.join(value)
	elif keytype == b'scsv':			elif keytype == b'scsv':
	value = b','.join(sorted(value))			value = b','.join(sorted(value))
				elif keytype == b'qcsv':
				value = wireprototypes.encode_qcsv(value)
	elif keytype == b'boolean':			elif keytype == b'boolean':
	value = b'%i' % bool(value)			value = b'%i' % bool(value)
	elif keytype != b'plain':			elif keytype != b'plain':
	raise KeyError(b'unknown getbundle option type %s' % keytype)			raise KeyError(b'unknown getbundle option type %s' % keytype)
	opts[key] = value			opts[key] = value
	f = self._callcompressable(b"getbundle", **pycompat.strkwargs(opts))			f = self._callcompressable(b"getbundle", **pycompat.strkwargs(opts))
	if any((cap.startswith(b'HG2') for cap in bundlecaps)):			if any((cap.startswith(b'HG2') for cap in bundlecaps)):
	return bundle2.getunbundler(self.ui, f)			return bundle2.getunbundler(self.ui, f)

mercurial/wireprotov1server.py

	for k, v in pycompat.iteritems(opts):			for k, v in pycompat.iteritems(opts):
	keytype = wireprototypes.GETBUNDLE_ARGUMENTS[k]			keytype = wireprototypes.GETBUNDLE_ARGUMENTS[k]
	if keytype == b'nodes':			if keytype == b'nodes':
	opts[k] = wireprototypes.decodelist(v)			opts[k] = wireprototypes.decodelist(v)
	elif keytype == b'csv':			elif keytype == b'csv':
	opts[k] = list(v.split(b','))			opts[k] = list(v.split(b','))
	elif keytype == b'scsv':			elif keytype == b'scsv':
	opts[k] = set(v.split(b','))			opts[k] = set(v.split(b','))
				elif keytype == b'qcsv':
				opts[k] = wireprototypes.decode_qcsv(v)
	elif keytype == b'boolean':			elif keytype == b'boolean':
	# Client should serialize False as '0', which is a non-empty string			# Client should serialize False as '0', which is a non-empty string
	# so it evaluates as a True bool.			# so it evaluates as a True bool.
	if v == b'0':			if v == b'0':
	opts[k] = False			opts[k] = False
	else:			else:
	opts[k] = bool(v)			opts[k] = bool(v)
	elif keytype != b'plain':			elif keytype != b'plain':
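
The client and server hunks above are two halves of one mapping: each argument type listed in `GETBUNDLE_ARGUMENTS` needs an encoding branch on the client and a decoding branch on the server, and both must test the same type string. The following is a minimal sketch of that symmetry for the pre-existing types only. `encode_arg` and `decode_arg` are hypothetical helpers, not Mercurial APIs; a `qcsv` branch would slot into both functions in the same way.

```python
def encode_arg(keytype, value):
    """Client side: turn a Python value into wire bytes (hypothetical helper)."""
    if keytype == b'csv':
        return b','.join(value)
    if keytype == b'scsv':
        return b','.join(sorted(value))
    if keytype == b'boolean':
        return b'%i' % bool(value)
    if keytype == b'plain':
        return value
    raise KeyError(keytype)


def decode_arg(keytype, wire):
    """Server side: must test the same keytype strings as encode_arg."""
    if keytype == b'csv':
        return wire.split(b',')
    if keytype == b'scsv':
        return set(wire.split(b','))
    if keytype == b'boolean':
        # b'0' is a non-empty string, so it needs an explicit check.
        return False if wire == b'0' else bool(wire)
    if keytype == b'plain':
        return wire
    raise KeyError(keytype)
```

If the two sides disagree on a type string, the value never reaches the matching decoder branch, so a round-trip check per type is a cheap way to catch the mismatch.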

relnotes/next

	revlog-compression=zstd, zlib			revlog-compression=zstd, zlib

	Will use `zstd` compression for new repositories is available, and will			Will use `zstd` compression for new repositories is available, and will
	simply fall back to `zlib` if not.			simply fall back to `zlib` if not.

	* `hg debugmergestate` output is now templated, which may be useful			* `hg debugmergestate` output is now templated, which may be useful
	e.g. for IDEs that want to help the user resolve merge conflicts.			e.g. for IDEs that want to help the user resolve merge conflicts.

				* The experimental `narrow` extension now supports include and exclude
				patterns that contain a comma, provided both client and server are
				updated.


	== New Experimental Features ==			== New Experimental Features ==

	* `hg copy` now supports a `--at-rev` argument to mark files as			* `hg copy` now supports a `--at-rev` argument to mark files as
	copied in the specified commit. It only works with `--after` for			copied in the specified commit. It only works with `--after` for
	now (i.e., it's only useful for marking files copied using non-hg			now (i.e., it's only useful for marking files copied using non-hg
	`cp` as copied).			`cp` as copied).

tests/test-doctest.py

	('mercurial.minirst', '{}'),			('mercurial.minirst', '{}'),
	('mercurial.parser', '{}'),			('mercurial.parser', '{}'),
	('mercurial.patch', '{}'),			('mercurial.patch', '{}'),
	('mercurial.pathutil', '{}'),			('mercurial.pathutil', '{}'),
	('mercurial.pycompat', '{}'),			('mercurial.pycompat', '{}'),
	('mercurial.revlogutils.deltas', '{}'),			('mercurial.revlogutils.deltas', '{}'),
	('mercurial.revset', '{}'),			('mercurial.revset', '{}'),
	('mercurial.revsetlang', '{}'),			('mercurial.revsetlang', '{}'),
				('mercurial.wireprototypes', '{}'),
	('mercurial.simplemerge', '{}'),			('mercurial.simplemerge', '{}'),
	('mercurial.smartset', '{}'),			('mercurial.smartset', '{}'),
	('mercurial.store', '{}'),			('mercurial.store', '{}'),
	('mercurial.subrepo', '{}'),			('mercurial.subrepo', '{}'),
	('mercurial.templater', '{}'),			('mercurial.templater', '{}'),
	('mercurial.ui', '{}'),			('mercurial.ui', '{}'),
	('mercurial.util', "{'testtarget': 'platform'}"),			('mercurial.util', "{'testtarget': 'platform'}"),
	('mercurial.util', '{}'),			('mercurial.util', '{}'),

tests/test-narrow-exchange.t

	remote: adding manifests			remote: adding manifests
	remote: adding file changes			remote: adding file changes
	remote: added 1 changesets with 0 changes to 0 files (no-lfs-on !)			remote: added 1 changesets with 0 changes to 0 files (no-lfs-on !)
	remote: error: pretxnchangegroup.lfs hook raised an exception: data/inside2/f.i@f59b4e021835: no match found (lfs-on !)			remote: error: pretxnchangegroup.lfs hook raised an exception: data/inside2/f.i@f59b4e021835: no match found (lfs-on !)
	remote: transaction abort! (lfs-on !)			remote: transaction abort! (lfs-on !)
	remote: rollback completed (lfs-on !)			remote: rollback completed (lfs-on !)
	remote: abort: data/inside2/f.i@f59b4e021835: no match found! (lfs-on !)			remote: abort: data/inside2/f.i@f59b4e021835: no match found! (lfs-on !)
	abort: stream ended unexpectedly (got 0 bytes, expected 4) (lfs-on !)			abort: stream ended unexpectedly (got 0 bytes, expected 4) (lfs-on !)

				Test paths with commas in them
				$ cd $TESTTMP
				$ hg init commas-master
				$ cd commas-master
				$ mkdir a,b
				$ mkdir a,b/c,d
				$ mkdir a,b/e,f
				$ mkdir g
				$ echo abcd > a,b/c,d/abcd
				$ echo abef > a,b/e,f/abef
				$ echo ghi > g/h,i
				$ hg ci -qAm r0
				$ echo abcd2 >> a,b/c,d/abcd
				$ echo abef2 >> a,b/e,f/abef
				$ echo ghi2 >> g/h,i
				$ hg ci -qm r1
				$ cd ..

				Test that we can pull and push with a file that has a comma in the name, even
				though the commas don't appear in the narrowspec file (since they're just
				filenames)
				$ hg clone --narrow ssh://user@dummy/commas-master commas-in-file \
				> --include g -qr 0
				$ cd commas-in-file
				$ hg pull -q
				$ echo ghi3 >> g/h,i
				$ hg ci -qm 'modify g/h,i'
				$ hg push -qf
				$ cd ..

				Test commas in the --include, plus pull+push
				$ hg clone --narrow ssh://user@dummy/commas-master commas-in-dir \
				> --include a,b --exclude a,b/c,d -qr 0
				$ cd commas-in-dir
				$ hg pull -q
				$ echo abef3 >> a,b/e,f/abef
				$ hg ci -qm 'modify a,b/e,f'
				$ hg push -qf

				Test that --{add,remove}{include,exclude} work with commas in the directory
				names.
				$ hg tracked
				I path:a,b
				X path:a,b/c,d
				$ hg tracked --removeexclude a,b/c,d --addinclude a,b/e,f -q
				$ hg tracked
				I path:a,b
				I path:a,b/e,f
				$ hg files
				a,b/c,d/abcd
				a,b/e,f/abef
				$ hg tracked --removeinclude a,b/e,f --addexclude a,b/c,d -q
				$ hg tracked
				I path:a,b
				X path:a,b/c,d
				$ hg files
				a,b/e,f/abef
				$ cd ..