This is an archive of the discontinued Mercurial Phabricator instance.

fsmonitor: layer on another hack in bser.c for os.stat() compat (issue5811)
ClosedPublic

Authored by durin42 on Mar 24 2018, 2:30 PM.

Details

Summary

It's unclear to me how these bserobj_tuple objects are used, other
than as stat objects. This should fix fsmonitor in the wake of
ffa3026d4196 and similar changes. I regret the hack here, but the code
already has plenty of hg-specific hacks. :(

It feels like we should be able to use int(result.st_mtime) globally,
but that doesn't work. See issue4836 for a bug that was hard to track
down relating to rounding behavior causing very subtle dirstate
problems.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

durin42 created this revision.Mar 24 2018, 2:30 PM

@indygreg I think you saw this failure mode, I'd appreciate it if you could check if this fixes the watchman failures I introduced (I don't use watchman, and so I'm not quite sure how to reproduce.)

durin42 updated this revision to Diff 7271.Mar 24 2018, 2:52 PM
indygreg requested changes to this revision.Mar 26 2018, 12:04 PM

This still fails for me with:

  File "/home/gps/src/hg/mercurial/context.py", line 1590, in _dirstatestatus
    clean=clean, unknown=unknown)
  File "/home/gps/src/hg/mercurial/dirstate.py", line 1073, in status
    elif (time != st[stat.ST_MTIME]
IndexError: tuple index out of range

Also, hacking the low-level bser data type at the bser level feels very wrong. I would rather we roll our own C type that holds a reference to the bserObject and provides the necessary object API. Although there are already Mercurial hacks in bser.c for st_* attribute access. So I could be convinced that this hack is acceptable.

I can try debugging this failure if that would be helpful. Perhaps something in CPython land is evaluating len() and short-circuiting the [8] access.

hgext/fsmonitor/pywatchman/bser.c
102

Is there an off by 2 with 7? Shouldn't it be 9 (since accesses up to [7] - an object with 8 elements) should be allowed?

109

This leaks the st_mtime PyObject.

Also, strictly speaking, we should verify that the return value is not NULL.

This revision now requires changes to proceed.Mar 26 2018, 12:04 PM

This still fails for me with:

  File "/home/gps/src/hg/mercurial/context.py", line 1590, in _dirstatestatus
    clean=clean, unknown=unknown)
  File "/home/gps/src/hg/mercurial/dirstate.py", line 1073, in status
    elif (time != st[stat.ST_MTIME]
IndexError: tuple index out of range

Also, hacking the low-level bser data type at the bser level feels very wrong. I would rather we roll our own C type that holds a reference to the bserObject and provides the necessary object API. Although there are already Mercurial hacks in bser.c for st_* attribute access. So I could be convinced that this hack is acceptable.

Yeah, I mostly wanted to verify a fix before trying to do something principled.

I can try debugging this failure if that would be helpful. Perhaps something in CPython land is evaluating len() and short-circuiting the [8] access.

Yes, debugging would help a ton. I don't really have enough understanding to make enough headway here :(

I may look into this today. Having fsmonitor disabled makes operations on Firefox repos much slower and this is hampering my ability to hack on that repo :/

spectral added inline comments.
hgext/fsmonitor/pywatchman/bser.c
103

My system pythons (2.7.13 and 3.5.3) have the following behaviors:

≻ python -c 'from future import print_function; import os, sys, stat; print(sys.version); s=os.stat("."); print(s.st_mtime, s[stat.ST_MTIME], type(s.st_mtime), type(s[stat.ST_MTIME]), stat.ST_MTIME)'
2.7.13 (default, Nov 24 2017, 17:33:09)
[GCC 6.3.0 20170516]
1522089418.2 1522089418 <type 'float'> <type 'int'> 8

≻ python3 -c 'from future import print_function; import os, sys, stat; print(sys.version); s=os.stat("."); print(s.st_mtime, s[stat.ST_MTIME], type(s.st_mtime), type(s[stat.ST_MTIME]), stat.ST_MTIME)'
3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118]
1522089422.693618 1522089422 <class 'float'> <class 'int'> 8

Seems like we could just use int(os.stat().st_mtime) instead of os.stat()[stat.ST_MTIME] (if we really need an int instead of a float), and avoid most of the issues with attempting to access a stat_result that isn't the official one (like the one we create in cext/osutil.c, or I know other extensions create...)

I have no idea if that would solve this actual problem though, bser might be a separate problem from what I'm thinking of.

durin42 added inline comments.Mar 26 2018, 3:19 PM
hgext/fsmonitor/pywatchman/bser.c
103

We used to do int(...st_mtime) and had to stop because of rounding issues: https://bz.mercurial-scm.org/show_bug.cgi?id=4836

I managed to debug this.

st[state.ST_MTIME] will invoke PyObject_GetItem. Its implementation does this:

m = o->ob_type->tp_as_mapping;
if (m && m->mp_subscript)
    return m->mp_subscript(o, key);

if (o->ob_type->tp_as_sequence) {
    if (PyIndex_Check(key)) {
        Py_ssize_t key_value;
        key_value = PyNumber_AsSsize_t(key, PyExc_IndexError);
        if (key_value == -1 && PyErr_Occurred())
            return NULL;
        return PySequence_GetItem(o, key_value);
    }

Our bserObjectType defines tp_as_mapping and st[state.ST_MTIME] ends up calling bserobj_getattrro(). That function sees the argument is an integer and ends up calling PySequence_GetItem():

if (PyIndex_Check(name)) {
  i = PyNumber_AsSsize_t(name, PyExc_IndexError);
  if (i == -1 && PyErr_Occurred()) {
    goto bail;
  }
  ret = PySequence_GetItem(obj->values, i);
  goto bail;
}

PySequence_GetItem() does the following:

m = s->ob_type->tp_as_sequence;
if (m && m->sq_item) {
    if (i < 0) {
        if (m->sq_length) {
            Py_ssize_t l = (*m->sq_length)(s);
            if (l < 0)
                return NULL;
            i += l;
        }
    }
    return m->sq_item(s, i);
}

sq_item here is tupleitem(), not bserobj_tuple_item(). It tells the truth and IndexError is raised.

So, I think we want the hack to live in bserobj_getattrro(). I don't think the hack needs to live in bserobj_tuple_item() because that function only ever gets called through bserobj_getattrro().

durin42 marked 4 inline comments as done.Apr 9 2018, 5:22 PM
durin42 edited the summary of this revision. (Show Details)
durin42 updated this revision to Diff 7915.

PTAL - I think this should be ready to go.

indygreg accepted this revision.Apr 12 2018, 11:52 AM

This is super hacky. That's par for the course for this file, sadly.

Also, I'm kinda surprised we're fully realizing the Python tuple to represent stat results. If we made tuple attribute access lazy, that would probably make hg status a bit faster, since we never access every field of that tuple. These kinds of things never show up in Python profilers though. And it is difficult to pin performance problems on Python object creation. But we know it is a very real thing. Obviously work for another day.

This revision is now accepted and ready to land.Apr 12 2018, 11:52 AM
This revision was automatically updated to reflect the committed changes.