This is an archive of the discontinued Mercurial Phabricator instance.

xdiff: use xxhash in xdiff
AbandonedPublic

Authored by indygreg on Mar 3 2018, 8:37 PM.

Details

Reviewers
quark
Group Reviewers
hg-reviewers
Summary

Diffing functions split inputs into blocks, typically lines. When
inspecting each block, the diffing function needs to see if two blocks
are identical. Because repeated memcmp() can be expensive, diff
implementations will hash each block to an integer and then do a simple
integer/register compare to check for block equivalence.

It looks like xxhash was using djb2 for this hash. This hash was fine
for its use. But the hash was operating on single bytes at a time
and the output from a previous byte was needed before feeding in a
new byte. So our CPU pipeline was very constricted.

This commit changes the hash to xxhash. This hashing function takes
a start address and a length. Internally, it operates on multiple
bytes at a time. This requires fewer assembly instructions and makes
the hash faster.

It's worth noting we are using the 32-bit version of xxhash. The 64-bit
version is faster on 64-bit hardware and we should ideally use it.
But the 64-bit version returns an unsigned long long and xdiff wants
the hash to be 32 bits. We could probably truncate the hash. Let's
use the 32-bit hash for now.

This change yields a minor perf win on the mozilla-central repo:

$ hg perfbdiff --alldata -c --count 10 --blocks --xdiff 400000
! wall 0.845893 comb 0.840000 user 0.840000 sys 0.000000 (best of 12)
! wall 0.796796 comb 0.790000 user 0.790000 sys 0.000000 (best of 13)

$ hg perfbdiff -m --count 100 --blocks --xdiff 400000
! wall 10.026769 comb 10.030000 user 9.010000 sys 1.020000 (best of 3)
! wall  9.450092 comb  9.460000 user 8.470000 sys 0.990000 (best of 3)

Taken on its own, this isn't a significant improvement. But it does
open the door to more efficient line searching...

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

indygreg created this revision.Mar 3 2018, 8:37 PM
quark accepted this revision.Mar 3 2018, 8:55 PM
quark added a subscriber: quark.

I think the perf win comes from less hash collisions.

quark requested changes to this revision.Mar 3 2018, 9:14 PM

It seems xxhash can make things slower if the line is short.

## test case
open('a','w').write(''.join('%s\n' % (i % 100000) for i in xrange(10000000)))
open('b','w').write(''.join('%s\n' % (i % 100000) for i in xrange(10000001)))
## before xxhash
./xdiff /tmp/[ab] &> /dev/null  0.67s user 0.21s system 99% cpu 0.875 total
## after xxhash
./xdiff /tmp/[ab] &> /dev/null  2.86s user 0.43s system 98% cpu 3.315 total

The memchr change does not change much - it's still about 3 seconds on my machine.

Code I use for testing:

// contrib/xdiff/xdiff.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "mercurial/thirdparty/xdiff/xdiff.h"

#define abort(...)                                                             \
	{                                                                      \
		fprintf(stderr, __VA_ARGS__);                                  \
		exit(-1);                                                      \
	}

char buf[4096000];
void readfile(const char *path, mmfile_t *file)
{
	memset(file, 0, sizeof(*file));
	FILE *fp = fopen(path, "r");
	if (!fp) {
		abort("cannot open %s\n", path);
	}
	while (!feof(fp)) {
		size_t size = fread(buf, 1, sizeof buf, fp);
		if (size > 0) {
			size_t new_size = file->size + size;
			file->ptr = realloc(file->ptr, new_size);
			if (!file->ptr) {
				abort("cannot allocate\n");
			}
			memcpy(file->ptr + file->size, buf, size);
			file->size = new_size;
		}
	}
	fclose(fp);
}

static int xdiff_outf(void *priv_, mmbuffer_t *mb, int nbuf)
{
	int i;
	for (i = 0; i < nbuf; i++) {
		write(STDOUT_FILENO, mb[i].ptr, mb[i].size);
	}
	return 0;
}

int main(int argc, char const *argv[])
{
	if (argc < 3) {
		abort("usage: %s FILE1 FILE2\n", argv[0]);
	}

	mmfile_t a, b;

	readfile(argv[1], &a);
	readfile(argv[2], &b);

	xpparam_t xpp = {
	    0,    /* flags */
	    NULL, /* anchors */
	    0,    /* anchors_nr */
	};
	xdemitconf_t xecfg = {
	    3,    /* ctxlen */
	    0,    /* interhunkctxlen */
	    0,    /* flags */
	    NULL, /* find_func */
	    NULL, /* find_func_priv */
	    NULL, /* hunk_consume_func */
	};
	xdemitcb_t ecb = {
	    0,           /* priv */
	    &xdiff_outf, /* outf */
	};

	xdl_diff(&a, &b, &xpp, &xecfg, &ecb);

	free(a.ptr);
	free(b.ptr);

	return 0;
}
## contrib/xdiff/Makefile
xdiff: ../../mercurial/thirdparty/xdiff/*.c xdiff.c
	gcc -O2 -g -std=c99 -I../.. -I../../mercurial/thirdparty/xdiff -I../../mercurial -o $@ $^

clean:
	-rm -f xdiff
This revision now requires changes to proceed.Mar 3 2018, 9:14 PM

I can reproduce the results. xxhash needs more than 4-8 bytes to outperform the old hasher. I also suspect more hash collisions are present with xxhash. Although I haven't verified.

The input showing the regression effectively turns the code into a hash table benchmark.

But my results on actual repo data show an improvement.

FWIW, memchr+djb hash perform about the same as the original code on the perfbdiff -m case.

indygreg abandoned this revision.Mar 4 2018, 2:30 PM