Jan 16 2018
I finally got around to testing this properly and I can reproduce the issue. I looked into the code a bit, and it is possible that we create keepalive connections before forking, so we end up illegally multiplexing the same connection.
The quick fix on our side is to not use workers on the upload action, and that fixes the issue right away. The proper fix would be in the https handler, but it doesn't look like an easy one; I don't think I'll get to it any time soon.
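To illustrate why that is broken, here is a minimal standalone sketch (hypothetical demo code, not the actual hg keepalive handler): after a fork, parent and child inherit the very same socket, so their requests interleave on one stream and the server has no way to route each response back to the right process.

```python
import os
import socket

# Hypothetical demo: the 'client' end plays a keepalive connection that is
# created before forking and then used by both processes.
client, server = socket.socketpair()

pid = os.fork()
if pid == 0:
    # Child inherits the same socket and writes its own request to it.
    client.sendall(b"child-request\n")
    os._exit(0)

os.waitpid(pid, 0)          # wait, so the write order is deterministic here
client.sendall(b"parent-request\n")

# The server side sees both requests mixed on one connection; it cannot
# tell which process each response should go back to.
data = b""
while b"parent-request" not in data:
    data += server.recv(1024)
print(data.decode())
```

In real traffic the two writes race instead of being serialized by `waitpid`, so request bytes from different processes can even interleave mid-message.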
Jan 4 2018
@mharbison72 Thank you for commenting on this issue. We haven't rolled this out to many people yet and haven't seen the issue ourselves. I will try to test the scenario with an upload of many large files and will comment back here soon.
Dec 12 2017
@durin42 yes, I tested without remotefilelog (at least I believe it was not being used at the time). I cloned a repo with --config extensions.remotefilelog=! and then put the appropriate section in .hg/hgrc.
hg config output had the line:
I ran updates between distant revisions and verified that workers get started during the update. Hg didn't complain about anything and the repo stayed healthy.
Dec 11 2017
@mharbison72 I am not sure these tests are able to satisfy the conditions to actually go multi-worker. But you are right that if there is an issue, we can force 1 worker.
The workers on POSIX are implemented by forking, and the only way of communication is through pipes created by the worker.py code. Once forked, they only communicate after some number of tasks (file fetches in this case) has been finished by the worker (I think ~100, but I'm not sure). We would have to change the POSIX behaviour to allow reporting smaller increments of progress through the pipe (potentially 0 tasks finished). That would need changes in a bunch of layers (worker, merge, blobstore) instead of the current simple use of the progress(...) function.
It is possible to implement that communication, but it is a significant amount of work and testing.
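The fork-plus-pipe scheme described above can be sketched as follows (a standalone toy, not the real worker.py; the batch size of 100 is the assumed value mentioned above): the forked worker writes a progress record to the pipe only after finishing a whole batch of tasks, which is why the parent cannot display finer-grained progress.

```python
import os

BATCH = 100   # assumed batch size; the worker reports only per batch
TASKS = 250   # pretend tasks, e.g. file fetches

rfd, wfd = os.pipe()
pid = os.fork()
if pid == 0:
    # Worker child: the write end of the pipe is its only channel back.
    os.close(rfd)
    finished = 0
    for _ in range(TASKS):
        finished += 1                 # complete one task
        if finished == BATCH:
            os.write(wfd, b"%d\n" % finished)   # one coarse progress record
            finished = 0
    if finished:
        os.write(wfd, b"%d\n" % finished)       # flush the final partial batch
    os._exit(0)

os.close(wfd)
reports = []
with os.fdopen(rfd) as pipe:
    for line in pipe:
        reports.append(int(line))
os.waitpid(pid, 0)
print(reports)  # [100, 100, 50] -- the parent only sees batch-sized updates
```

Reporting smaller increments would mean writing to the pipe more often (potentially for 0 finished tasks) and teaching every layer above to consume those finer records.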
Updated the test (as my changes change the output) and retested. Now everything works fine.
I must have messed something up when running the tests previously, probably a wrong revision. The tests actually catch the failure above:
@mharbison72 you are right, the upload doesn't work because I removed the filewithprogress wrapper around the file, which adds a couple of functions I didn't realize were needed, including len.
Will fix very soon.
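For context, a wrapper of that shape can be sketched like this (hypothetical code, not hg's actual filewithprogress): it forwards read() while counting bytes for progress reporting, and exposes __len__ so upload code can compute a Content-Length for the request body.

```python
import io

class FileWithProgress:
    """Hypothetical sketch of a progress-reporting file wrapper."""

    def __init__(self, fp, size, callback):
        self._fp = fp
        self._size = size
        self._read = 0
        self._callback = callback

    def __len__(self):
        # Upload code calls len() on the body to set Content-Length;
        # dropping the wrapper is what made len() fail above.
        return self._size

    def read(self, size=-1):
        data = self._fp.read(size)
        if data:
            self._read += len(data)
            self._callback(self._read, self._size)
        return data

# Usage: wrap a 10-byte stream and watch progress arrive per read() call.
progress = []
wrapped = FileWithProgress(io.BytesIO(b"0123456789"), 10,
                           lambda done, total: progress.append(done))
while wrapped.read(4):
    pass
print(len(wrapped), progress)  # 10 [4, 8, 10]
```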
Dec 1 2017
Tested with the server:
Nov 25 2017
The previous issues were related to fb-hgext's remotefilelog, and https://phab.mercurial-scm.org/D1513 fixes it on the remotefilelog side.
I will still need to test this code on a repo without remotefilelog to make sure that the normal filelog doesn't hit similar issues.
Nov 21 2017
I need to test these changes a bit more. I found a place in merge.py that risks a race condition and need to figure out how to protect it.
Right now there are two places where we use workers: one in core (merge.py), and our use in lfs in the fb extensions.
Nov 20 2017
That sounds good. I actually started with a change to manage a single background closer shared between threads, but the locking code got a bit complicated and seemed more risky. I didn't know that the main reason for a single background closer was the number of open file descriptors.
I'll check what disabling backgroundcloser does to the performance.
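As a rough sketch of the shared-closer idea being discussed (hypothetical code, not hg's actual backgroundfilecloser): one worker thread owns all the deferred close() calls, and producer threads just enqueue file objects, so the descriptor backlog is bounded by a single queue rather than one closer per thread.

```python
import io
import queue
import threading

class BackgroundCloser:
    """Hypothetical single shared background closer: callers hand file
    objects off and one worker thread closes them without blocking."""

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def close(self, fp):
        self._queue.put(fp)          # Queue is thread-safe; no extra locking

    def _run(self):
        while True:
            fp = self._queue.get()
            if fp is None:           # sentinel: shut down
                return
            fp.close()

    def shutdown(self):
        self._queue.put(None)
        self._thread.join()          # all queued files are closed by now

# Usage: any number of threads could share this one closer.
closer = BackgroundCloser()
f = io.BytesIO(b"data")
closer.close(f)
closer.shutdown()
print(f.closed)  # True
```

Pushing all synchronization into the queue is what keeps the locking simple here; the complexity in the earlier attempt presumably came from sharing mutable closer state directly between threads.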