This is an archive of the discontinued Mercurial Phabricator instance.

bdiff: write a native version of splitnewlines
ClosedPublic

Authored by durin42 on Feb 1 2018, 4:58 PM.

Details

Summary

./hg perfunidiff mercurial/manifest.py 0 --count 500 --profile before:
! wall 0.309280 comb 0.350000 user 0.290000 sys 0.060000 (best of 32)

./hg perfunidiff mercurial/manifest.py 0 --count 500 --profile after:
! wall 0.241572 comb 0.260000 user 0.240000 sys 0.020000 (best of 39)

so it's about 20% faster. I hate Python. I wish we could usefully
write this in Rust, but it doesn't look like that's realistic without
using the cpython crate, which I'd still like to avoid.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

durin42 created this revision.Feb 1 2018, 4:58 PM
indygreg requested changes to this revision.

This looks mostly good. Needs some minor tweaks. Some of my comments are informative and can probably be ignored as far as reviewing goes.

mercurial/cext/bdiff.c
185

Let's use PyBytes here for compatibility with Python 3.

Also, Py_buffer here would let us avoid a memory allocation. Unfortunately, various C extensions in older versions of Python 2.7 don't recognize the Py_buffer interface (I'm looking at you zlib). So we need to be careful about where all we use Py_buffer tricks :( It would be really nice if *all* of this code were in C so we didn't have to allocate a PyObject for every line: we could just pass an array of line offsets around.

188

The PyList is pre-allocated. That means you can use PyList_SET_ITEM() for even faster execution.

199–200

Does our style guideline not require braces yet? Seeing this reminds me of goto fail :(

206–210

For a micro optimization, I bet if you rewrite this to iterate over chunks of size size_t and do bit tests that this will be even faster. The searching for newlines is a hot loop in the bdiff code. Unfortunately, I never completed my refactor to optimize the line scanning.

214–215

I have a feeling this extra line scan will matter in a benchmark. Could you perf record the new hg perf* command and verify? If it is a big deal, I would allocate an int[16384] array on the stack or something to deal with the common case and store additional newline offsets on the heap so we only do the newline scan once.

mercurial/mdiff.py
34

@yuja may have an opinion on this, but mine is that we now have stronger guarantees around C extension versioning, so if the active module policy is to use C extensions, we should ensure we get splitnewlines from the C module. I /think/ if we move splitnewlines to the bdiff module that things will sort themselves out? This could be done as a follow-up easily enough though.

This revision now requires changes to proceed.Feb 1 2018, 6:08 PM
yuja added inline comments.Feb 2 2018, 8:34 AM
mercurial/cext/bdiff.c
195

Perhaps Py_ssize_t is preferred.

200

Here result isn't initialized to NULL yet.

214–215

I don't know which is faster, but if we do preallocate a buffer, we can create
a large PyList and shrink it by PyList_SetSlice(..., NULL) at the end.

builtin_filter() of Python 2 do that.

mercurial/mdiff.py
34

if we move splitnewlines to the bdiff module that things will sort themselves out

Yes, splitnewlines should be moved to pure/bdiff.py. And we need to bump the C API version.

durin42 marked 7 inline comments as done.Feb 2 2018, 11:26 AM
durin42 updated this revision to Diff 5160.
durin42 added inline comments.Feb 2 2018, 11:26 AM
mercurial/cext/bdiff.c
214–215

It's already enough of a win I'd like to land the simple version and then focus on spending effort on a Rust version rather than continuing with this. How do you feel about that as a plan?

yuja requested changes to this revision.Feb 3 2018, 3:17 AM

Looks mostly good to me. I hesitated to fix minor nits in flight because
it's C.

mercurial/cext/bdiff.c
183

Nit: static bool sliceintolist(

222

start < size should be always true because size > 0. Otherwise,
PyList_New(nelts + 1) could be wrong.

This revision now requires changes to proceed.Feb 3 2018, 3:17 AM
durin42 marked an inline comment as done.Feb 12 2018, 10:23 AM
durin42 updated this revision to Diff 5487.
durin42 marked an inline comment as done.Feb 12 2018, 10:24 AM

Good catches. This should be ready now.

I also added bdiff.c to clang-format oversight in a newly inserted parent, because the file is simple enough that doing so was easy.

indygreg accepted this revision.Feb 12 2018, 2:11 PM

I'm happy with this as a first revision.

While I'm accepting as hg-reviewers, I think C code should have an extra set of eyes. So I'll defer to @yuja to queue it.

For the record, I'm no fan of not having braces for all bodies of conditionals. Can't wait to globally reformat our code to fix that.

mercurial/cext/bdiff.c
183

This doesn't need to be static. I'd declare it as inline bool sliceintolist(.

yuja accepted this revision.Feb 13 2018, 7:10 AM

Queued, thanks.

mercurial/cext/bdiff.c
183

IIRC, inline doesn't imply a function has a file scope, and inline
could be `` on unsupported platforms.

This revision is now accepted and ready to land.Feb 13 2018, 7:10 AM
This revision was automatically updated to reflect the committed changes.