This is an archive of the discontinued Mercurial Phabricator instance.

patch: buffer lines for a same hunk
ClosedPublic

Authored by quark on Apr 9 2018, 6:59 PM.

Details

Reviewers

yuja
durin42

Group Reviewers

hg-reviewers

Commits

rHG5471348921c1: patch: buffer lines for a same hunk

Summary

Instead of yielding tokens directly, buffer them if they belong to the same
hunk. This lets the upcoming new worddiff algorithm focus only on the diff
hunk, without having to worry about other content.

This breaks how the existing experimental worddiff algorithm works, so the
algorithm was removed, and related tests are disabled for now. The next patch
will add a new worddiff algorithm.
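
The buffering idea in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the name `group_hunks` is hypothetical, it works on plain `str` lines, and it assumes the input holds only hunk body lines (file headers such as `---` and `+++` are already filtered out, which the real `difflabel()` tracks with its `head` flag).

```python
def group_hunks(lines):
    """Group adjacent '-'/'+' lines into hunks (hypothetical helper).

    Yields ('hunk', [lines]) for each run of adjacent changed lines and
    ('other', line) for every other line, preserving the input order.
    """
    buffer = []
    for line in lines:
        if line.startswith(('-', '+')):
            # changed line: hold it back until the run ends
            buffer.append(line)
            continue
        if buffer:
            # a non-changed line ends the run, so flush the buffer first
            yield ('hunk', buffer)
            buffer = []
        yield ('other', line)
    if buffer:
        # flush a run that reaches the end of the input
        yield ('hunk', buffer)


lines = [
    '@@ -1,3 +1,3 @@',
    ' context',
    '-old one',
    '-old two',
    '+new one',
    ' context',
    '+tail',
]
result = list(group_hunks(lines))
```

With every changed line of a hunk available at once, a word-level diff can compare the removed lines against the added lines as a unit instead of seeing one line at a time.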

Diff Detail

Repository

rHG Mercurial

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

Event Timeline

quark created this revision. Apr 9 2018, 6:59 PM

Herald added a reviewer: hg-reviewers. Apr 9 2018, 6:59 PM

Herald added a subscriber: mercurial-devel.

quark added a child revision: D3212: patch: implement a new worddiff algorithm. Apr 9 2018, 6:59 PM

yuja requested changes to this revision. Apr 10 2018, 10:46 AM

yuja added a subscriber: yuja.

yuja added inline comments.

mercurial/patch.py
2490	Don't use `bytes[n]` since it returns an integer on Python 3. That's why there were silly `startswith(char)`.
2554	I think `hunkbuffer` could be `(alines_without_pluses, blines_without_minuses)` so `difsinglehunkinline()` function will be slightly simpler.
2578	Can you split this patch to drop the current inlinediff implementation and buffer hunk lines?
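
The first inline comment concerns a Python 2/3 difference in indexing byte strings. A small sketch of the pitfall, assuming the diff lines are `bytes` objects (the helper name `linekind` is hypothetical and not part of Mercurial):

```python
line = b'-removed line'

# On Python 3, indexing bytes gives an int (45 for b'-'), so comparing
# the result with a one-byte string is always False.
by_index = (line[0] == b'-')

# startswith() and one-element slices keep working with bytes on both
# Python 2 and Python 3.
by_prefix = line.startswith(b'-')
by_slice = (line[0:1] == b'-')


def linekind(line):
    """Classify a hunk line without indexing into it (hypothetical helper)."""
    if line.startswith(b'-'):
        return 'diff.deleted'
    if line.startswith(b'+'):
        return 'diff.inserted'
    raise ValueError('unexpected hunk line: %r' % (line,))
```

On Python 3, `by_index` is False while `by_prefix` and `by_slice` are True, which is why a `line[0] == '-'` check on bytes would fall through to the error branch.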

This revision now requires changes to proceed. Apr 10 2018, 10:46 AM

durin42 accepted this revision. Apr 16 2018, 7:01 PM

Closed by commit rHG5471348921c1: patch: buffer lines for a same hunk (authored by quark). Apr 16 2018, 7:11 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents
Changeset List

			Path	Packages
M			mercurial/patch.py (160 lines)
M			tests/test-diff-color.t (5 lines)

Status	Author	Revision
Closed	quark	D3212 patch: implement a new worddiff algorithm
Closed	quark	D3211 patch: buffer lines for a same hunk
Closed	quark	D3210 patch: move yielding "\n" to the end of loop

Diff 8334

mercurial/patch.py

	# patch.py - patch file parsing routines			# patch.py - patch file parsing routines
	#			#
	# Copyright 2006 Brendan Cully <brendan@kublai.com>			# Copyright 2006 Brendan Cully <brendan@kublai.com>
	# Copyright 2007 Chris Mason <chris.mason@oracle.com>			# Copyright 2007 Chris Mason <chris.mason@oracle.com>
	#			#
	# This software may be used and distributed according to the terms of the			# This software may be used and distributed according to the terms of the
	# GNU General Public License version 2 or any later version.			# GNU General Public License version 2 or any later version.

	from __future__ import absolute_import, print_function			from __future__ import absolute_import, print_function

	import collections			import collections
	import contextlib			import contextlib
	import copy			import copy
	import difflib
	import email			import email
	import errno			import errno
	import hashlib			import hashlib
	import os			import os
	import posixpath			import posixpath
	import re			import re
	import shutil			import shutil
	import tempfile			import tempfile
	raise GitDiffRequired			raise GitDiffRequired
	# Buffer the whole output until we are sure it can be generated			# Buffer the whole output until we are sure it can be generated
	return list(difffn(opts.copy(git=False), losedata))			return list(difffn(opts.copy(git=False), losedata))
	except GitDiffRequired:			except GitDiffRequired:
	return difffn(opts.copy(git=True), None)			return difffn(opts.copy(git=True), None)
	else:			else:
	return difffn(opts, None)			return difffn(opts, None)

				def diffsinglehunk(hunklines):
				"""yield tokens for a list of lines in a single hunk"""
				for line in hunklines:
				# chomp
				chompline = line.rstrip('\n')
				# highlight tabs and trailing whitespace
				stripline = chompline.rstrip()
				if line[0] == '-':
				label = 'diff.deleted'
				elif line[0] == '+':
				label = 'diff.inserted'
				else:
				raise error.ProgrammingError('unexpected hunk line: %s' % line)
				for token in tabsplitter.findall(stripline):
				if '\t' == token[0]:
				yield (token, 'diff.tab')
				else:
				yield (token, label)

				if chompline != stripline:
				yield (chompline[len(stripline):], 'diff.trailingwhitespace')
				if chompline != line:
				yield (line[len(chompline):], '')

	def difflabel(func, args, *kw):			def difflabel(func, args, *kw):
	'''yields 2-tuples of (output, label) based on the output of func()'''			'''yields 2-tuples of (output, label) based on the output of func()'''
	inlinecolor = False
	if kw.get(r'opts'):
	inlinecolor = kw[r'opts'].worddiff
	headprefixes = [('diff', 'diff.diffline'),			headprefixes = [('diff', 'diff.diffline'),
	('copy', 'diff.extended'),			('copy', 'diff.extended'),
	('rename', 'diff.extended'),			('rename', 'diff.extended'),
	('old', 'diff.extended'),			('old', 'diff.extended'),
	('new', 'diff.extended'),			('new', 'diff.extended'),
	('deleted', 'diff.extended'),			('deleted', 'diff.extended'),
	('index', 'diff.extended'),			('index', 'diff.extended'),
	('similarity', 'diff.extended'),			('similarity', 'diff.extended'),
	('---', 'diff.file_a'),			('---', 'diff.file_a'),
	('+++', 'diff.file_b')]			('+++', 'diff.file_b')]
	textprefixes = [('@', 'diff.hunk'),			textprefixes = [('@', 'diff.hunk'),
	('-', 'diff.deleted'),			# - and + are handled by diffsinglehunk
	('+', 'diff.inserted')]			]
	head = False			head = False

				# buffers a hunk, i.e. adjacent "-", "+" lines without other changes.
				hunkbuffer = []
				def consumehunkbuffer():
				if hunkbuffer:
				for token in diffsinglehunk(hunkbuffer):
				yield token
				hunkbuffer[:] = []

	for chunk in func(args, *kw):			for chunk in func(args, *kw):
	lines = chunk.split('\n')			lines = chunk.split('\n')
	matches = {}
	if inlinecolor:
	matches = _findmatches(lines)
	linecount = len(lines)			linecount = len(lines)
	for i, line in enumerate(lines):			for i, line in enumerate(lines):
	if head:			if head:
	if line.startswith('@'):			if line.startswith('@'):
	head = False			head = False
	else:			else:
	if line and not line.startswith((' ', '+', '-', '@', '\\')):			if line and not line.startswith((' ', '+', '-', '@', '\\')):
	head = True			head = True
	stripline = line
	diffline = False			diffline = False
	if not head and line and line.startswith(('+', '-')):			if not head and line and line.startswith(('+', '-')):
	# highlight tabs and trailing whitespace, but only in
	# changed lines
	stripline = line.rstrip()
	diffline = True			diffline = True

	prefixes = textprefixes			prefixes = textprefixes
	if head:			if head:
	prefixes = headprefixes			prefixes = headprefixes
	for prefix, label in prefixes:
	if stripline.startswith(prefix):
	if diffline:			if diffline:
	if i in matches:			# buffered
	for t, l in _inlinediff(lines[i].rstrip(),			bufferedline = line
	lines[matches[i]].rstrip(),			if i + 1 < linecount:
	label):			bufferedline += "\n"
	yield (t, l)			hunkbuffer.append(bufferedline)
	else:
	for token in tabsplitter.findall(stripline):
	if token.startswith('\t'):
	yield (token, 'diff.tab')
	else:
	yield (token, label)
	else:			else:
				# unbuffered
				for token in consumehunkbuffer():
				yield token
				stripline = line.rstrip()
				for prefix, label in prefixes:
				if stripline.startswith(prefix):
	yield (stripline, label)			yield (stripline, label)
				if line != stripline:
				yield (line[len(stripline):],
				'diff.trailingwhitespace')
	break			break
	else:			else:
	yield (line, '')			yield (line, '')
	if line != stripline:
	yield (line[len(stripline):], 'diff.trailingwhitespace')
	if i + 1 < linecount:			if i + 1 < linecount:
	yield ('\n', '')			yield ('\n', '')
				for token in consumehunkbuffer():
	def _findmatches(slist):			yield token
	'''Look for insertion matches to deletion and returns a dict of
	correspondences.
	'''
	lastmatch = 0
	matches = {}
	for i, line in enumerate(slist):
	if line == '':
	continue
	if line.startswith('-'):
	lastmatch = max(lastmatch, i)
	newgroup = False
	for j, newline in enumerate(slist[lastmatch + 1:]):
	if newline == '':
	continue
	if newline.startswith('-') and newgroup: # too far, no match
	break
	if newline.startswith('+'): # potential match
	newgroup = True
	sim = difflib.SequenceMatcher(None, line, newline).ratio()
	if sim > 0.7:
	lastmatch = lastmatch + 1 + j
	matches[i] = lastmatch
	matches[lastmatch] = i
	break
	return matches

	def _inlinediff(s1, s2, operation):
	'''Perform string diff to highlight specific changes.'''
	operation_skip = ('+', '?') if operation == 'diff.deleted' else ('-', '?')
	if operation == 'diff.deleted':
	s2, s1 = s1, s2

	buff = []
	# we never want to higlight the leading +-
	if operation == 'diff.deleted' and s2.startswith('-'):
	label = operation
	token = '-'
	s2 = s2[1:]
	s1 = s1[1:]
	elif operation == 'diff.inserted' and s1.startswith('+'):
	label = operation
	token = '+'
	s2 = s2[1:]
	s1 = s1[1:]
	else:
	raise error.ProgrammingError("Case not expected, operation = %s" %
	operation)

	s = difflib.ndiff(_nonwordre.split(s2), _nonwordre.split(s1))
	for part in s:
	if part.startswith(operation_skip) or len(part) == 2:
	continue
	l = operation + '.highlight'
	if part.startswith(' '):
	l = operation
	if part[2:] == '\t':
	l = 'diff.tab'
	if l == label: # contiguous token with same label
	token += part[2:]
	continue
	else:
	buff.append((token, label))
	label = l
	token = part[2:]
	buff.append((token, label))

	return buff

	def diffui(args, *kw):			def diffui(args, *kw):
	'''like diff(), but yields 2-tuples of (output, label) for ui.write()'''			'''like diff(), but yields 2-tuples of (output, label) for ui.write()'''
	return difflabel(diff, args, *kw)			return difflabel(diff, args, *kw)

	def _filepairs(modified, added, removed, copy, opts):			def _filepairs(modified, added, removed, copy, opts):
	'''generates tuples (f1, f2, copyop), where f1 is the name of the file			'''generates tuples (f1, f2, copyop), where f1 is the name of the file
	before and f2 is the the name after. For added files, f1 will be None,			before and f2 is the the name after. For added files, f1 will be None,

tests/test-diff-color.t

	[diff.inserted\|+ assuming this works)]			[diff.inserted\|+ assuming this works)]
	[diff.inserted\|+be changed into four!]			[diff.inserted\|+be changed into four!]

	[diff.deleted\|-three of those lines will]			[diff.deleted\|-three of those lines will]
	[diff.deleted\|-collapse onto one]			[diff.deleted\|-collapse onto one]
	[diff.deleted\|-(to see if it works)]			[diff.deleted\|-(to see if it works)]
	[diff.inserted\|+three of those lines have]			[diff.inserted\|+three of those lines have]
	[diff.inserted\|+collapsed onto one]			[diff.inserted\|+collapsed onto one]
				#if false
	$ hg diff --config experimental.worddiff=True --color=debug			$ hg diff --config experimental.worddiff=True --color=debug
	[diff.diffline\|diff --git a/file1 b/file1]			[diff.diffline\|diff --git a/file1 b/file1]
	[diff.file_a\|--- a/file1]			[diff.file_a\|--- a/file1]
	[diff.file_b\|+++ b/file1]			[diff.file_b\|+++ b/file1]
	[diff.hunk\|@@ -1,16 +1,17 @@]			[diff.hunk\|@@ -1,16 +1,17 @@]
	[diff.deleted\|-this is the ][diff.deleted.highlight\|first][diff.deleted\| line]			[diff.deleted\|-this is the ][diff.deleted.highlight\|first][diff.deleted\| line]
	[diff.deleted\|-this is the second line]			[diff.deleted\|-this is the second line]
	[diff.deleted\|-][diff.deleted.highlight\| ][diff.deleted\|third line starts with space]			[diff.deleted\|-][diff.deleted.highlight\| ][diff.deleted\|third line starts with space]
	[diff.inserted\|+ assuming this works)]			[diff.inserted\|+ assuming this works)]
	[diff.inserted\|+be changed into ][diff.inserted.highlight\|four][diff.inserted\|!]			[diff.inserted\|+be changed into ][diff.inserted.highlight\|four][diff.inserted\|!]

	[diff.deleted\|-three of those lines ][diff.deleted.highlight\|will]			[diff.deleted\|-three of those lines ][diff.deleted.highlight\|will]
	[diff.deleted\|-][diff.deleted.highlight\|collapse][diff.deleted\| onto one]			[diff.deleted\|-][diff.deleted.highlight\|collapse][diff.deleted\| onto one]
	[diff.deleted\|-(to see if it works)]			[diff.deleted\|-(to see if it works)]
	[diff.inserted\|+three of those lines ][diff.inserted.highlight\|have]			[diff.inserted\|+three of those lines ][diff.inserted.highlight\|have]
	[diff.inserted\|+][diff.inserted.highlight\|collapsed][diff.inserted\| onto one]			[diff.inserted\|+][diff.inserted.highlight\|collapsed][diff.inserted\| onto one]
				#endif

	multibyte character shouldn't be broken up in word diff:			multibyte character shouldn't be broken up in word diff:

	$ $PYTHON <<'EOF'			$ $PYTHON <<'EOF'
	> with open("utf8", "wb") as f:			> with open("utf8", "wb") as f:
	> f.write(b"blah \xe3\x82\xa2 blah\n")			> f.write(b"blah \xe3\x82\xa2 blah\n")
	> EOF			> EOF
	$ hg ci -Am 'add utf8 char' utf8			$ hg ci -Am 'add utf8 char' utf8
	$ $PYTHON <<'EOF'			$ $PYTHON <<'EOF'
	> with open("utf8", "wb") as f:			> with open("utf8", "wb") as f:
	> f.write(b"blah \xe3\x82\xa4 blah\n")			> f.write(b"blah \xe3\x82\xa4 blah\n")
	> EOF			> EOF
	$ hg ci -m 'slightly change utf8 char' utf8			$ hg ci -m 'slightly change utf8 char' utf8

				#if false
	$ hg diff --config experimental.worddiff=True --color=debug -c.			$ hg diff --config experimental.worddiff=True --color=debug -c.
	[diff.diffline\|diff --git a/utf8 b/utf8]			[diff.diffline\|diff --git a/utf8 b/utf8]
	[diff.file_a\|--- a/utf8]			[diff.file_a\|--- a/utf8]
	[diff.file_b\|+++ b/utf8]			[diff.file_b\|+++ b/utf8]
	[diff.hunk\|@@ -1,1 +1,1 @@]			[diff.hunk\|@@ -1,1 +1,1 @@]
	[diff.deleted\|-blah ][diff.deleted.highlight\|\xe3\x82\xa2][diff.deleted\| blah] (esc)			[diff.deleted\|-blah ][diff.deleted.highlight\|\xe3\x82\xa2][diff.deleted\| blah] (esc)
	[diff.inserted\|+blah ][diff.inserted.highlight\|\xe3\x82\xa4][diff.inserted\| blah] (esc)			[diff.inserted\|+blah ][diff.inserted.highlight\|\xe3\x82\xa4][diff.inserted\| blah] (esc)
				#endif

Diff	ID	Description	Created	Lint	Unit
Base		Base
Diff 1	7923		Apr 9 2018, 6:59 PM	★	★
Diff 2	8334	rHG5471348921c1e569c15019105ec82f15c522c22d	Mar 19 2018, 7:28 AM	★	★
