This is an archive of the discontinued Mercurial Phabricator instance.

[DRAFT] xdiff: skip trimmed lines when preparing the hashtable
AbandonedPublic

Authored by quark on Mar 3 2018, 9:51 PM.

Details

Reviewers
None
Group Reviewers
hg-reviewers
Summary
NOTE: I'm still reading xdiffi.c to understand whether this is a safe optimization or not.

xdiff has a "xdl_trim_ends" function that removes common prefix and suffix.

Previously, xdiff will build a hashtable for all lines. That is a waste of
time for trimmed lines. This diff changes the logic so trimmed lines will be
ignored when building the hashtable. Note: the hashtable is still needed for
shifting purpose, so it does not blindly take whatever xdl_trim_ends says,
but also looks around.

For the following test case:

#!python
open('a','w').write(''.join('%s\n' % (i % 100000) for i in xrange(10000000)))
open('b','w').write(''.join('%s\n' % (i % 100000) for i in xrange(10000001)))

This series reduces xdiff's time for the above case from 1.1 seconds (D2604)
to 0.6 seconds.

Benchmarking on commands.py, it's 1/4 faster:

hg perfbdiff --count 3000 --blocks --xdiff .hg/store/data/mercurial/commands.py.i 1
# before
! wall 2.050600 comb 2.050000 user 2.040000 sys 0.010000 (best of 5)
# after
! wall 1.510821 comb 1.500000 user 1.500000 sys 0.000000 (best of 6)

However, GNU diffutils can perform even better (<0.1 seconds), there are
still things to catch up.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

quark created this revision.Mar 3 2018, 9:51 PM
quark retitled this revision from [RFC] xdiff: skip trimmed lines when preparing the hashtable to [DRAFT] xdiff: skip trimmed lines when preparing the hashtable.Mar 3 2018, 9:52 PM
quark edited the summary of this revision. (Show Details)Mar 3 2018, 10:18 PM
quark abandoned this revision.Mar 4 2018, 7:50 PM