This is an archive of the discontinued Mercurial Phabricator instance.

Differential D7252

dirs: reject consecutive slashes in paths
ClosedPublic

Authored by durin42 on Nov 6 2019, 12:16 AM.

Details

Reviewers

indygreg

Group Reviewers

hg-reviewers

Commits

rHG5d40317d42b7: dirs: reject consecutive slashes in paths
rHGd14d245c78ed: dirs: reject consecutive slashes in paths

Summary

We shouldn't ever see those, and the fuzzer got really excited that if
it gives us a 65k string with 55k slashes in it we use a lot of RAM.

This is a better fix than what I tried in D7105. It was suggested by
Yuya, and I verified it does in fact cause the fuzzer to not OOM.

This is a revision of D7234, but with the missing set of an error
added. I added a unit test of the dirs behavior because I needed to
reason more carefully about the failure modes around consecutive
slashes.
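
To see why such an input is expensive, here is a rough, self-contained sketch. The `finddirs` below is a simplified stand-in for the helper that `dirs` iterates over, and `prefix_bytes` is a hypothetical helper; both are assumptions for illustration, not Mercurial's code. Each slash produces one prefix that would become a dictionary key, so a long run of slashes yields tens of thousands of distinct, long keys.

```python
def finddirs(path):
    """Yield each ancestor prefix of ``path``, longest first, then b''.

    Simplified stand-in for the helper used by ``dirs``; an assumption
    made for illustration only.
    """
    pos = path.rfind(b'/')
    while pos != -1:
        yield path[:pos]
        pos = path.rfind(b'/', 0, pos)
    yield b''


def prefix_bytes(path):
    """Return (number of prefixes, total bytes across all prefixes)."""
    count = 0
    total = 0
    for base in finddirs(path):
        count += 1
        total += len(base)
    return count, total


if __name__ == '__main__':
    ordinary = b'alpha/beta/gamma'
    # Shaped like the fuzzer input from the summary: 65k bytes, 55k slashes.
    hostile = b'a' * 10000 + b'/' * 55000
    print(prefix_bytes(ordinary))
    print(prefix_bytes(hostile))
```

Under these assumptions the ordinary path yields 3 prefixes totalling 15 bytes, while the hostile one yields 55,001 prefixes totalling about 2 GB of key data, before any per-object overhead.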

Diff Detail

Repository

rHG Mercurial

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

durin42 created this revision. Nov 6 2019, 12:16 AM

Herald added a reviewer: hg-reviewers. Nov 6 2019, 12:16 AM

Herald added a subscriber: mercurial-devel.

durin42 edited the summary of this revision. Nov 6 2019, 12:18 AM

durin42 updated this revision to Diff 17619.

durin42 edited the summary of this revision. Nov 6 2019, 1:10 AM

durin42 updated this revision to Diff 17620.

indygreg accepted this revision. Nov 7 2019, 3:34 AM

This revision is now accepted and ready to land. Nov 7 2019, 3:34 AM

durin42 added a commit: rHGd14d245c78ed: dirs: reject consecutive slashes in paths. Nov 7 2019, 3:36 AM

Closed by commit rHGd14d245c78ed: dirs: reject consecutive slashes in paths (authored by durin42).

This revision was automatically updated to reflect the committed changes.

durin42 added a commit: rHG5d40317d42b7: dirs: reject consecutive slashes in paths. Nov 7 2019, 11:06 AM

Sorry to necropost, but since this broke the Rust implementation, I was wondering what the best approach would be to replicate this behavior, and I am starting to think that this should be reverted.

IIUC, currently any new path passes through the pathauditor first for validation, so any further check would be completely redundant.
My intuition is that adding this check is purely here to satisfy the fuzzer, but would never happen in real life. Adding checks to this (very) internal data structure comes at a cost, both in performance and in code ergonomics.

In D7252#109627, @Alphare wrote:

Sorry to necropost, but since this broke the Rust implementation, I was wondering what the best approach would be to replicate this behavior, and I am starting to think that this should be reverted.

Oh, you mean the Rust version doesn't do the same rejection?

Given that you're about to do a hash lookup, I'm a little skeptical that an endswith('/') check would show up meaningfully in a profiler, but I'm willing to be proven wrong?

IIUC, currently any new path passes through the pathauditor first for validation, so any further check would be completely redundant.

Plausible, but I'd like some sort of test coverage demonstrating that.

My intuition is that adding this check is purely here to satisfy the fuzzer, but would never happen in real life. Adding checks to this (very) internal data structure comes at a cost, both in performance and in code ergonomics.

Yes, this was largely added to make the fuzzer not get stuck on OOM conditions. That said, if the pathauditor can't catch this, we need to defend against this DoS vector at this layer, and it's such a small check at this layer I'm inclined to keep it unless it is measurably slowing down real uses...
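
One way to put a number on "measurably slowing down" is a micro-benchmark along these lines. This is only a sketch: `finddirs`, `addpath_checked`, `addpath_unchecked`, and `bench` are hypothetical stand-ins written for this example, the path list is synthetic, and it exercises a pure-Python loop rather than the C extension in dirs.c, so it can at best bound the cost for the Python fallback.

```python
import timeit


def finddirs(path):
    """Yield each ancestor prefix of ``path``, longest first, then b''."""
    pos = path.rfind(b'/')
    while pos != -1:
        yield path[:pos]
        pos = path.rfind(b'/', 0, pos)
    yield b''


def addpath_checked(dirs, path):
    """Reference-count ancestors of ``path``, rejecting consecutive slashes."""
    for base in finddirs(path):
        if base.endswith(b'/'):
            raise ValueError(
                "found invalid consecutive slashes in path: %r" % base
            )
        if base in dirs:
            dirs[base] += 1
            return
        dirs[base] = 1


def addpath_unchecked(dirs, path):
    """Same loop without the consecutive-slash check."""
    for base in finddirs(path):
        if base in dirs:
            dirs[base] += 1
            return
        dirs[base] = 1


def bench(addpath, paths, repeat=5):
    """Return the best wall-clock time to add every path to a fresh dict."""
    def run():
        dirs = {}
        for p in paths:
            addpath(dirs, p)
    return min(timeit.repeat(run, number=1, repeat=repeat))


if __name__ == '__main__':
    # Synthetic workload; real manifests will have a different shape.
    paths = [b'dir%d/sub%d/file%d' % (i % 50, i % 500, i)
             for i in range(20000)]
    print('checked:  ', bench(addpath_checked, paths))
    print('unchecked:', bench(addpath_unchecked, paths))
```

Timings vary by machine and interpreter, so the two figures are only meaningful relative to each other on the same run.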

In D7252#109656, @durin42 wrote:

In D7252#109627, @Alphare wrote:

Sorry to necropost, but since this broke the Rust implementation, I was wondering what the best approach would be to replicate this behavior, and I am starting to think that this should be reverted.

Oh, you mean the Rust version doesn't do the same rejection?
Given that you're about to do a hash lookup, I'm a little skeptical that an endswith('/') check would show up meaningfully in a profiler, but I'm willing to be proven wrong?

IIUC, currently any new path passes through the pathauditor first for validation, so any further check would be completely redundant.

Plausible, but I'd like some sort of test coverage demonstrating that.

My intuition is that adding this check is purely here to satisfy the fuzzer, but would never happen in real life. Adding checks to this (very) internal data structure comes at a cost, both in performance and in code ergonomics.

Yes, this was largely added to make the fuzzer not get stuck on OOM conditions. That said, if the pathauditor can't catch this, we need to defend against this DoS vector at this layer, and it's such a small check at this layer I'm inclined to keep it unless it is measurably slowing down real uses...

I should add: if we can substantiate that such a path can't make it through the pathauditor (and we have tests for that in the pathauditor layer so we don't break that in the future!) we can push this consecutive-slashes check into dirs_fuzzer.cc and remove it from dirs.c.

Oh, you mean the Rust version doesn't do the same rejection?

It does not, currently.

Given that you're about to do a hash lookup, I'm a little skeptical that an endswith('/') check would show up meaningfully in a profiler, but I'm willing to be proven wrong?

I have the same intuition you do; it's mostly about not repeating operations that have any impact at all on performance or code.

I should add: if we can substantiate that such a path can't make it through the pathauditor (and we have tests for that in the pathauditor layer so we don't break that in the future!) we can push this consecutive-slashes check into dirs_fuzzer.cc and remove it from dirs.c.

It was my candid intuition that the pathauditor was already "ensured" as the barrier for path-based vulnerabilities.
Since my plan for the Rust implementation was to enforce that very fact, I guess I'll be the one to write said tests when I write a Rust pathauditor. It should come up not too long after my current work (in-progress) about matchers, since handling unknown files in any capacity will require the pathauditor.

I'm not 100% sure of the implications of writing the aforementioned tests. If it's too much work, I can replicate this check in Rust for the time being.

In practice, this means the tests consistently fail when testing with Rust. Can we either have a quick fix of the Rust code or a temporary backout of this (until the Rust code is fixed)?

Alphare mentioned this in D7503: rust-dirs: address failing tests for `dirs` impl with a temporary fix. Nov 22 2019, 4:46 AM

Alphare mentioned this in rHG20a3bf5e71d6: rust-dirs: address failing tests for `dirs` impl with a temporary fix. Nov 22 2019, 12:44 PM

Alphare mentioned this in rHG1fe2e574616e: rust-dirs: address failing tests for `dirs` impl with a temporary fix. Dec 3 2019, 10:59 AM

Revision Contents
Changeset List

		Path
M		mercurial/cext/dirs.c (8 lines)
M		mercurial/util.py (4 lines)
A	M	tests/test-dirs.py (27 lines)

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	17618		Nov 6 2019, 12:16 AM	★	★
Diff 2	17619		Nov 6 2019, 12:18 AM	★	★
Diff 3	17620		Nov 6 2019, 1:10 AM	★	★
Diff 4	17709	rHGd14d245c78edccb51db215d3c2afc0acb8894593	Oct 17 2019, 7:29 PM	★	★

Diff 17709

mercurial/cext/dirs.c

      * implementation details. We also commit violations of the Python
      * "protocol" such as mutating immutable objects. But since we only
      * mutate objects created in this function or in other well-defined
      * locations, the references are known so these violations should go
      * unnoticed. */
     while ((pos = _finddir(cpath, pos - 1)) != -1) {
         PyObject *val;

+        /* Sniff for trailing slashes, a marker of an invalid input. */
+        if (pos > 0 && cpath[pos - 1] == '/') {
+            PyErr_SetString(
+                PyExc_ValueError,
+                "found invalid consecutive slashes in path");
+            goto bail;
+        }
+
         key = PyBytes_FromStringAndSize(cpath, pos);
         if (key == NULL)
             goto bail;

         val = PyDict_GetItem(dirs, key);
         if (val != NULL) {
             PYLONG_VALUE(val) += 1;
             Py_CLEAR(key);

mercurial/util.py

             )
         else:
             for f in map:
                 addpath(f)

     def addpath(self, path):
         dirs = self._dirs
         for base in finddirs(path):
+            if base.endswith(b'/'):
+                raise ValueError(
+                    "found invalid consecutive slashes in path: %r" % base
+                )
             if base in dirs:
                 dirs[base] += 1
                 return
             dirs[base] = 1

     def delpath(self, path):
         dirs = self._dirs
         for base in finddirs(path):
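
The check looks at each prefix rather than scanning the path for `//`, so it may help to see which inputs it would reject. The following is a small sketch under stated assumptions: `finddirs` is a simplified stand-in for the helper used above, and `accepted` is a hypothetical function that mirrors the new `endswith` test against an empty collection. It has not been run against Mercurial itself.

```python
def finddirs(path):
    """Yield each ancestor prefix of ``path``, longest first, then b''.

    Simplified stand-in for illustration; an assumption, not a copy of
    Mercurial's helper.
    """
    pos = path.rfind(b'/')
    while pos != -1:
        yield path[:pos]
        pos = path.rfind(b'/', 0, pos)
    yield b''


def accepted(path):
    """Return False if any prefix ends in a slash, as the new check does."""
    for base in finddirs(path):
        if base.endswith(b'/'):
            return False
    return True


if __name__ == '__main__':
    for case in [b'a/b', b'a/b/', b'/a', b'a//b', b'//a', b'a//']:
        print(case, 'accepted' if accepted(case) else 'rejected')
```

In this sketch `a//b`, `//a`, and `a//` are rejected, while a single leading or trailing slash (`/a`, `a/b/`) passes, because only a prefix that itself ends in a slash trips the test.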

tests/test-dirs.py

This file was added.

from __future__ import absolute_import

import unittest

import silenttestrunner

from mercurial import util


class dirstests(unittest.TestCase):
    def testdirs(self):
        for case, want in [
            (b'a/a/a', [b'a', b'a/a', b'']),
            (b'alpha/beta/gamma', [b'', b'alpha', b'alpha/beta']),
        ]:
            d = util.dirs({})
            d.addpath(case)
            self.assertEqual(sorted(d), sorted(want))

    def testinvalid(self):
        with self.assertRaises(ValueError):
            d = util.dirs({})
            d.addpath(b'a//b')


if __name__ == '__main__':
    silenttestrunner.main(__name__)