Page MenuHomePhabricator

workers: implemented worker on windows
ClosedPublic

Authored by wlis on Nov 20 2017, 1:37 PM.

Details

Summary

This change implements thread based worker on windows.
The handling of exception from within threads will happen in separate diff.

The worker is for now used in mercurial/merge.py and in lfs extension

After multiple tests and milions of files materiealized, thousands lfs fetched
it seems that neither merge.py nor lfs/blobstore.py is thread unsafe. I also
looked through the code and besides the backgroundfilecloser (handled in base
of this) things look good.

The performance boost of this on windows is

~50% for sparse --enable-profile

  • Speedup of hg up/rebase - not exactly measured
Test Plan

Ran 10s of hg sparse --enable-profile and --disable-profile operations on large profiles and verified that workers are running. Used sysinternals suite to see that all threads are spawned and run as they should

Run various other operations on the repo including update and rebase

Ran tests on CentOS and all tests that pass on @ pass here

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

wlis created this revision.Nov 20 2017, 1:37 PM
wlis edited the summary of this revision. (Show Details)Nov 20 2017, 1:39 PM

I haven't looked at the code in much detail.

The use of threads for the workers on Windows is obviously better performance wise than the current world where we use a single thread in a single process. However, we'll hit the upper limit of performance pretty quickly because the process will be CPU bound in revlogs. And the upper limit with Python threads will be much worse than the upper limit with a concurrency primitive that doesn't have the GIL.

FWIW, last I checked we only had 1 consumer of the worker API in core: working directory updates. Given limited users of this API and the inability to easily implement multi-process workers on Windows (figuring out how to invoke a new Python process is not trivial because the ways Mercurial can be invoked on Windows), I've been very tempted to remove the worker API. I'm actually thinking about replacing its one use case with Rust. But I don't have any code to show for that and likely won't for several weeks. That being said, I don't want to discourage this code from landing. If others feel it is useful, let's get it in. But its lifetime may be short if the Rust solution materializes.

wlis planned changes to this revision.Nov 21 2017, 12:08 AM

I need to test these changes a bit more. I found a place in merge.py that has a risk of race condition and need to figure out how to protect it.
Right now there are 2 places where we use workers. 1 in core (merge.py) and there is also us in lfs in fb extensions.

We actually hit the CPU limit right away when we start using threading, and I initially looked into implementing something closer to what we do in posix, but as you say that is not easy on Windows as fork doesn't exist. I think that from my perspective putting the applyupdates code into rust and starting multiple processes with that would work, and as you say gives more space for perf improvement than this change.

I am happy with making this code short lived if it gets replaced by a better solution.

wlis added a comment.Nov 25 2017, 11:50 AM

The previous issues were related to fb-hgext remotefilelog and https://phab.mercurial-scm.org/D1513 fixes it on the side of remotefilelog.
I will still need to test this code on a repo without remotefilelog to make sure that the normal filelog doesn't hit similar issues.

wlis requested review of this revision.Nov 30 2017, 7:42 PM
durin42 added a subscriber: durin42.Dec 5 2017, 5:26 PM

Did you test this on a non-remotefilelog repo?

wlis added a comment.EditedDec 12 2017, 4:26 PM

@durin42 yes, I tested without remotefilelog (at least I believe it was not being used at that time). I cloned a repo with --config extensions.remotefilelog=! and then put appropriate section in .hg/hgrc
hg config had line

extensions.remotefilelog=!

Ran updates between far revisions and verified that threads get started during update. Hg didn't complain about anything and repo stayed healthy.

This revision was automatically updated to reflect the committed changes.