store: don't read the whole fncache in memory
ClosedPublic

Authored by pulkit on Nov 22 2018, 7:27 AM.

Details

Summary

In large repositories with lot of files, the fncache grows more than 100 MB and
reading that whole thing into memory slows things down. Let's not read the whole
thing into memory.

This patch changes fncache loading code to read 1 MB at once. Loading 1 MB at
once saves ~1 sec on perffncacheload for our internal repository. I tried
various values such as 0.5 MB, 5 MB, 10 MB but best results were produced using
1 MB as the chunksize.

On a narrow clone with fncache around 40 MB, this patch saves ~0.04 seconds on
average on perffncacheload.

To test the code, I have coded an extension in test-fncache.t which set
chunksize to 1 byte, and the test passes with that.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
pulkit created this revision.Nov 22 2018, 7:27 AM
pulkit edited the summary of this revision. (Show Details)Nov 22 2018, 7:28 AM
yuja added a subscriber: yuja.Nov 22 2018, 8:32 AM

diff --git a/mercurial/store.py b/mercurial/store.py

    • a/mercurial/store.py +++ b/mercurial/store.py @@ -461,13 +461,13 @@
      1. skip nonexistent file self.entries = set() return
  • self.entries = set(decodedir(fp.read()).splitlines())
  • if '' in self.entries:
  • fp.seek(0)
  • for n, line in enumerate(util.iterfile(fp)):
  • if not line.rstrip('\n'):
  • t = _('invalid entry in fncache, line %d') % (n + 1)
  • raise error.Abort(t) + self.entries = set() + for n, line in enumerate(util.iterfile(fp)): + entry = line.rstrip('\n') + if not entry: + t = _('invalid entry in fncache, line %d') % (n + 1) + raise error.Abort(t) + self.entries.add(decodedir(entry))

This goes the opposite direction to 9fca5b056c0a. I guess the current code
would be faster if the fncache is around ~10MB. Can you benchmark it?

In D5296#78834, @yuja wrote:

diff --git a/mercurial/store.py b/mercurial/store.py

    • a/mercurial/store.py +++ b/mercurial/store.py @@ -461,13 +461,13 @@
      1. skip nonexistent file self.entries = set() return
  • self.entries = set(decodedir(fp.read()).splitlines())
  • if '' in self.entries:
  • fp.seek(0)
  • for n, line in enumerate(util.iterfile(fp)):
  • if not line.rstrip('\n'):
  • t = _('invalid entry in fncache, line %d') % (n + 1)
  • raise error.Abort(t) + self.entries = set() + for n, line in enumerate(util.iterfile(fp)): + entry = line.rstrip('\n') + if not entry: + t = _('invalid entry in fncache, line %d') % (n + 1) + raise error.Abort(t) + self.entries.add(decodedir(entry))

This goes the opposite direction to 9fca5b056c0a. I guess the current code
would be faster if the fncache is around ~10MB. Can you benchmark it?

Yeah the current code is much faster if fncache is not large.

Seeing the performance benefit it brings on our repo, I want to try other ways we can do this. Do we like having a conditional which checks the size of fncache and choose one of the approaches depending on how large it is?

If size(fncache) < 50M, use the current approach
else, use the approach described in this patch

@yuja what do you think?

I suspect 9fca5b056c0a made things faster because the code before was using 1 I/O operation for every entry. I would also not be surprised if CPython from that era did something very inefficient with regards to line reading.

The current code is pretty bad because it buffers the entire file content in memory! I agree we should change it.

I like this patch as written. If profiling shows it to be slow, I think there is room to optimize util.iterfile() or even to teach the vfs layer how to efficiently open files for line-based I/O. This is something I could help optimize if needed.

While I'm here, the fncache file being a newline delimited list of full file paths is kinda ridiculous. We could do much better by using compression and/or a more complicated data structure. It is kinda silly that we have to load this decoded data structure into memory. So if your file on disk is ~100MB, you are going to have a Python set that also consumes ~100MB. That's really not great.

yuja added a comment.Feb 25 2019, 10:03 PM

(resend without the "On ... wrote:" line)

Seeing the performance benefit it brings on our repo, I want to try other ways we can do this. Do we like having a conditional which checks the size of fncache and choose one of the approaches depending on how large it is?

If size(fncache) < 50M, use the current approach
else, use the approach described in this patch

Suppose the current code is fast because it avoids running Python in loop,
I think it can be extended to not use too much memory.

chunk = b''
while True:
    chunk += fp.read(chunk_size)  # maybe ~10MB?
    p = chunk.rindex(b'\n')  # need to handle error
    decodedir(chunk[:p + 1])...
    chunk = chunk[p + 1:]

Just an idea. Not profiled.

pulkit updated this revision to Diff 14255.Feb 27 2019, 9:05 AM
pulkit edited the summary of this revision. (Show Details)
In D5296#87892, @yuja wrote:

(resend without the "On ... wrote:" line)

Seeing the performance benefit it brings on our repo, I want to try other ways we can do this. Do we like having a conditional which checks the size of fncache and choose one of the approaches depending on how large it is?

If size(fncache) < 50M, use the current approach
else, use the approach described in this patch

Suppose the current code is fast because it avoids running Python in loop,
I think it can be extended to not use too much memory.

chunk = b''
while True:
    chunk += fp.read(chunk_size)  # maybe ~10MB?
    p = chunk.rindex(b'\n')  # need to handle error
    decodedir(chunk[:p + 1])...
    chunk = chunk[p + 1:]

Just an idea. Not profiled.

I implemented this idea. It saves ~1 sec on perffncacheload for us when chunksize is taken 1 MB.

I suspect 9fca5b056c0a made things faster because the code before was using 1 I/O operation for every entry. I would also not be surprised if CPython from that era did something very inefficient with regards to line reading.

The current code is pretty bad because it buffers the entire file content in memory! I agree we should change it.

I like this patch as written. If profiling shows it to be slow, I think there is room to optimize util.iterfile() or even to teach the vfs layer how to efficiently open files for line-based I/O. This is something I could help optimize if needed.

While I'm here, the fncache file being a newline delimited list of full file paths is kinda ridiculous. We could do much better by using compression and/or a more complicated data structure. It is kinda silly that we have to load this decoded data structure into memory. So if your file on disk is ~100MB, you are going to have a Python set that also consumes ~100MB. That's really not great.

I agree. We should either use compression or use a much complicated data structure here. Right now there is no way to check whether an entry exists or not without parsing the whole fncache. It does not affect much on small repositories but on large repositories such as mozilla, the difference of parsing the whole fncache is noticable. We have perffncacheload taking ~4 seconds :/

pulkit updated this revision to Diff 14256.Feb 27 2019, 12:47 PM
pulkit edited the summary of this revision. (Show Details)Feb 27 2019, 1:05 PM
yuja added a comment.Feb 27 2019, 8:03 PM
  • self.entries = set(decodedir(fp.read()).splitlines()) + + self.entries = [] + totalsize = self.vfs.stat('fncache').st_size

I don't think totalsize has to be stat()ed. We can just loop over until
fp.read() reaches to EOF. It's unreliable to assume that the stat('fncache')
see the identical file to the fp open()ed before.

+ chunksize = (10 ** 6) # 1 MB
+ chunk = b''
+ chunksize = min(totalsize, chunksize)

Do we have any test covering totalsize > chunksize case?

+ totalsize -= chunksize
+ while chunksize:
+ chunk += fp.read(chunksize)
+ try:
+ p = chunk.rindex(b'\n')
+ self.entries.extend(decodedir(chunk[:p + 1]).splitlines())

Any reason to not build a set directly?

+ chunk = chunk[p + 1:]
+ except ValueError:
+ # substring '\n' not found, maybe the entry is bigger than the
+ # chunksize, so let's keep iterating
+ pass
+ chunksize = min(totalsize, chunksize)
+ totalsize -= chunksize
+
+ self.entries = set(self.entries)

pulkit updated this revision to Diff 14529.Mar 16 2019, 8:33 PM
In D5296#88016, @yuja wrote:
  • self.entries = set(decodedir(fp.read()).splitlines()) + + self.entries = [] + totalsize = self.vfs.stat('fncache').st_size

I don't think totalsize has to be stat()ed. We can just loop over until
fp.read() reaches to EOF. It's unreliable to assume that the stat('fncache')
see the identical file to the fp open()ed before.

+ chunksize = (10 ** 6) # 1 MB
+ chunk = b''
+ chunksize = min(totalsize, chunksize)

Do we have any test covering totalsize > chunksize case?

Right now this patch does not have any tests. How should I add them?

  1. add a debug config option and pass that to fncachestore and then to fncache
  2. have a function which returns the chunk_size, write an extenion in tests which wrap that function and enable that extension in tests

I think 2) is better option here. I am unable to think of any solution expect this two. I think you will have better ideas here. What do you think?

yuja added a comment.Mar 16 2019, 9:51 PM

+ for c in iter(functools.partial(fp.read, chunksize), b''):
+ chunk += c
+ try:
+ p = chunk.rindex(b'\n')
+ self.entries |= set(decodedir(chunk[:p + 1]).splitlines())

Nit: you can entries.update(any_iterable) which would be slightly faster
when bucket size changes.

Right now this patch does not have any tests. How should I add them?

1. add a debug config option and pass that to fncachestore and then to fncache
2. have a function which returns the chunk_size, write an extenion in tests which wrap that function and enable that extension in tests

Something like 2. chunksize can be a module constant, which can later be
updated by an extension. That's probably the easiest option.

pulkit updated this revision to Diff 14535.Mar 17 2019, 10:54 AM
pulkit edited the summary of this revision. (Show Details)
pulkit updated this revision to Diff 14541.Mar 18 2019, 7:32 AM
This revision was automatically updated to reflect the committed changes.
yuja added a comment.Mar 18 2019, 8:29 AM

Queued, thanks.

+ self.entries = set()
+ chunk = b''
+ for c in iter(functools.partial(fp.read, fncache_chunksize), b''):
+ chunk += c
+ try:
+ p = chunk.rindex(b'\n')
+ self.entries.update(decodedir(chunk[:p + 1]).splitlines())
+ chunk = chunk[p + 1:]
+ except ValueError:
+ # substring '\n' not found, maybe the entry is bigger than the
+ # chunksize, so let's keep iterating
+ pass

We might want to check if the chunk is fully consumed. If the file doesn't
end with '\n', which I think is invalid though, the last line would be silently
ignored.