bdiff: write a native version of splitnewlines

Authored by durin42 on Feb 1 2018, 4:58 PM.



./hg perfunidiff mercurial/ 0 --count 500 --profile before:
! wall 0.309280 comb 0.350000 user 0.290000 sys 0.060000 (best of 32)

./hg perfunidiff mercurial/ 0 --count 500 --profile after:
! wall 0.241572 comb 0.260000 user 0.240000 sys 0.020000 (best of 39)

so it's about 20% faster. I hate Python. I wish we could usefully
write this in Rust, but it doesn't look like that's realistic without
using the cpython crate, which I'd still like to avoid.

Diff Detail

rHG Mercurial
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.
durin42 created this revision.Feb 1 2018, 4:58 PM
indygreg requested changes to this revision.Feb 1 2018, 6:08 PM
indygreg added subscribers: yuja, indygreg.

This looks mostly good. Needs some minor tweaks. Some of my comments are informative and can probably be ignored as far as reviewing goes.


Let's use PyBytes here for compatibility with Python 3.

Also, Py_buffer here would let us avoid a memory allocation. Unfortunately, various C extensions in older versions of Python 2.7 don't recognize the Py_buffer interface (I'm looking at you zlib). So we need to be careful about where all we use Py_buffer tricks :( It would be really nice if *all* of this code were in C so we didn't have to allocate a PyObject for every line: we could just pass an array of line offsets around.


The PyList is pre-allocated. That means you can use PyList_SET_ITEM() for even faster execution.


Does our style guideline not require braces yet? Seeing this reminds me of goto fail :(


For a micro optimization, I bet if you rewrite this to iterate over chunks of size size_t and do bit tests that this will be even faster. The searching for newlines is a hot loop in the bdiff code. Unfortunately, I never completed my refactor to optimize the line scanning.


I have a feeling this extra line scan will matter in a benchmark. Could you perf record the new hg perf* command and verify? If it is a big deal, I would allocate an int[16384] array on the stack or something to deal with the common case and store additional newline offsets on the heap so we only do the newline scan once.


@yuja may have an opinion on this, but mine is that we now have stronger guarantees around C extension versioning, so if the active module policy is to use C extensions, we should ensure we get splitnewlines from the C module. I /think/ if we move splitnewlines to the bdiff module that things will sort themselves out? This could be done as a follow-up easily enough though.

This revision now requires changes to proceed.Feb 1 2018, 6:08 PM
yuja added inline comments.Feb 2 2018, 8:34 AM

Perhaps Py_ssize_t is preferred.


Here result isn't initialized to NULL yet.


I don't know which is faster, but if we do preallocate a buffer, we can create
a large PyList and shrink it by PyList_SetSlice(..., NULL) at the end.

builtin_filter() of Python 2 do that.


if we move splitnewlines to the bdiff module that things will sort themselves out

Yes, splitnewlines should be moved to pure/ And we need to bump the C API version.

durin42 updated this revision to Diff 5160.Feb 2 2018, 11:26 AM
durin42 marked 7 inline comments as done.
durin42 added inline comments.Feb 2 2018, 11:26 AM

It's already enough of a win I'd like to land the simple version and then focus on spending effort on a Rust version rather than continuing with this. How do you feel about that as a plan?

yuja requested changes to this revision.Feb 3 2018, 3:17 AM

Looks mostly good to me. I hesitated to fix minor nits in flight because
it's C.


Nit: static bool sliceintolist(


start < size should be always true because size > 0. Otherwise,
PyList_New(nelts + 1) could be wrong.

This revision now requires changes to proceed.Feb 3 2018, 3:17 AM
durin42 updated this revision to Diff 5487.Feb 12 2018, 10:23 AM
durin42 marked an inline comment as done.
durin42 marked an inline comment as done.Feb 12 2018, 10:24 AM

Good catches. This should be ready now.

I also added bdiff.c to clang-format oversight in a newly inserted parent, because the file is simple enough that doing so was easy.

indygreg accepted this revision.Feb 12 2018, 2:11 PM

I'm happy with this as a first revision.

While I'm accepting as hg-reviewers, I think C code should have an extra set of eyes. So I'll defer to @yuja to queue it.

For the record, I'm no fan of not having braces for all bodies of conditionals. Can't wait to globally reformat our code to fix that.


This doesn't need to be static. I'd declare it as inline bool sliceintolist(.

yuja accepted this revision.Feb 13 2018, 7:10 AM

Queued, thanks.


IIRC, inline doesn't imply a function has a file scope, and inline
could be `` on unsupported platforms.

This revision is now accepted and ready to land.Feb 13 2018, 7:10 AM
This revision was automatically updated to reflect the committed changes.