Page MenuHomePhabricator

grep: make grep search on working directory by default
AbandonedPublic

Authored by durin42 on Mar 23 2018, 4:05 PM.

Details

Reviewers
sangeet259
Group Reviewers
hg-reviewers
Summary

The earlier behaviour of grep was quite counter intuitive as it searches
history by default. It throws up matches from files which have been
deleted and commited out. This patch handles only the default behaviour
by iterating on the files being tracked by the workingctx are
not marked for removal. The node is set by useing scmutil.binnode(ctx).
All the tests of hg grep are updated to reflected this change.
Additional tests have been added to show the new behaviour.

Diff Detail

Repository
rHG Mercurial
Lint
Lint Skipped
Unit
Unit Tests Skipped

Event Timeline

sangeet259 created this revision.Mar 23 2018, 4:05 PM
sangeet259 updated this revision to Diff 7262.Mar 23 2018, 4:14 PM
sangeet259 edited the summary of this revision. (Show Details)Mar 24 2018, 3:21 AM
sangeet259 updated this revision to Diff 7266.
pulkit added a subscriber: pulkit.Mar 24 2018, 3:39 AM

No need to add Edit1, Edit2 in commit message. If something changes from one version to another that deserves mention, include that in commit message as normal.

mercurial/commands.py
2577

Nit: no need to add bool() calls.

2578

The above if condition wants opts['rev'] to be empty and here you are using it's value.

tests/test-grep.t
341–366

For better testing, please add untracked files which can be probable match. Also I suspect there will be existing tests for hg grep which should break because of new behavior.

355

what's this None here?

sangeet259 edited the summary of this revision. (Show Details)Mar 24 2018, 6:08 AM
sangeet259 updated this revision to Diff 7267.
yuja added a subscriber: yuja.Mar 24 2018, 6:43 AM

Perhaps we can start with adding an experimental option to grep files
including unchanged ones?

IIUC, the new default behavior is something like hg grep -r "wdir()" --all-files,
which is basically s/ctx.files()/ctx/.
(needless to say --all-files is a bad name, and the following patch is just a hack.)

diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -1835,7 +1835,7 @@ def walkchangerevs(repo, match, opts, pr
     if not revs:
         return []
     wanted = set()
-    slowpath = match.anypats() or (not match.always() and opts.get('removed'))
+    slowpath = True
     fncache = {}
     change = repo.changectx
 
@@ -1889,7 +1889,7 @@ def walkchangerevs(repo, match, opts, pr
                 else:
                     self.revs.discard(value)
                     ctx = change(value)
-                    matches = [f for f in ctx.files() if match(f)]
+                    matches = [f for f in ctx if match(f)]
                     if matches:
                         fncache[value] = matches
                         self.set.add(value)
@@ -1939,7 +1939,7 @@ def walkchangerevs(repo, match, opts, pr
                 ctx = change(rev)
                 if not fns:
                     def fns_generator():
-                        for f in ctx.files():
+                        for f in ctx:
                             if match(f):
                                 yield f
                     fns = fns_generator()
yuja added inline comments.Mar 24 2018, 7:11 AM
mercurial/commands.py
2474

Better to test if ctx is a workingctx (i.e. ctx.rev() is None).

Another idea is to make wctx.filenode() return wdirid and catch
WdirUnsupported exception
to fall back to the slow path.

2475

Nit: fctx.isbinary() is preferred.

av6 added a subscriber: av6.Mar 24 2018, 8:06 AM
av6 added inline comments.
mercurial/commands.py
2487

This line looks identical to the one later on, with Pulkit's comment about usage of bool().

2487–2490

This block is not indented enough, but see my other comment below.

2494

Since the difference in both branches for this if-else block seems to be only in this line, and just one value, it probably makes sense to store said value in a variable and use it in place of False/True and reduce duplication.

sangeet259 marked 7 inline comments as done.Mar 25 2018, 3:17 AM
sangeet259 updated this revision to Diff 7280.

@yuja Can you please clarify this a bit more?

Perhaps we can start with adding an experimental option to grep files
including unchanged ones?"

mercurial/commands.py
2474

Sure, pushing the first approach now.
But, I think the second approach fits the "Easier to ask for forgiveness than permission" philosophy of python.

Currently the filenode of workingctx returns this: return self._fileinfo(path)[0] . Shall I change that to returning wdirid?

Also can you please highlight the slowpath part? Any reference or links to mailing lists, where I can learn more about it.

2475

Done. Any reason why that is better?

2494

Great. Thanks !

2578

I could have made the call by simply passing '' instead of opts.get('rev') but I chose this because I am planning to build upon this to handle revisions as well.

tests/test-grep.t
355

The None was a result of displaying revision in the ui.write. Removed that. I had forgotten to update this result.

yuja added a comment.Mar 25 2018, 4:36 AM

@yuja Can you please clarify this a bit more?

Perhaps we can start with adding an experimental option to grep files
including unchanged ones?"

This patch appears to do 3 separate things by adding a big "if" branch.

  • fix grep -r 'wdir()'
  • add mode to include unmodified files
  • change the default

It's probably better to write one (or more) patches for each change, in a way
not duplicating code for each combination of possible options.

mercurial/commands.py
2474

Shall I change that to returning wdirid?

That isn't easy to answer because wctx.filenode() can return
another pseudo
hash (e.g. 000000000000000added) if wctx._manifest is preloaded.

highlight the slowpath part

It's quite old, so I have no concrete idea. But maybe reusing filelog objects is the key.

2475

Extensions may override it to provide a faster implementation.

yuja added inline comments.Mar 25 2018, 5:57 AM
mercurial/commands.py
2474

I did try that. D2940 .. D2942

No idea if they are good or bad.

sangeet259 updated this revision to Diff 7289.Mar 26 2018, 9:01 AM
sangeet259 marked 2 inline comments as done.Apr 17 2018, 11:05 AM

@yuja

fix grep -r 'wdir()'

What exactly do you mean by fixing 'grep -r wdir()' ?

yuja added a comment.EditedApr 19 2018, 8:39 AM

What exactly do you mean by fixing 'grep -r wdir()' ?

Make it return the same result as

$ hg ci -m 'this was wdir()'
$ hg grep -r .

(except the revision number/hash)

@yuja
So I was working on it, but the way I tried to correct "wdir()" is by handling it as a special case, so introducing an "if" block again similar to this.
But you were asking to eliminate the if/else block altogether, I wonder if there is way ?

yuja added a comment.May 25 2018, 9:14 AM
So I was working on it, but the way I tried to correct "wdir()" is by handling it as a special case, so introducing an "if" block again similar to this.
But you were asking to eliminate the if/else block altogether, I wonder if there is way ?

IIRC, my point was to avoid adding *big* if/else branch which will be likely
to introduce untested code paths.

I don't think there would be a way to go without if/else or try/except as
long as we use the low-level filelog API for performance reasons. Perhaps,
we'll have to catch WdirUnsupported error to fall back to the filectx API.

try:
    flog.read(fnode)
except error.WdirUnsupported:
    ctx[fn].data()

Perhaps, we'll have to catch WdirUnsupported error to fall back to the filectx API.

This seems a good idea.
Trying it.

Hey @yuja
Can you explain why you did this ?

  • slowpath = match.anypats() or (not match.always() and opts.get('removed'))

+ slowpath = True

yuja added a comment.Jun 6 2018, 10:11 AM
Hey @yuja 
Can you explain why you did this ?
> - slowpath = match.anypats() or (not match.always() and opts.get('removed')) +    slowpath = True

From my vague memory, I guess I would think that if we do s/ctx.files()/ctx/
(which means we're looking for revisions where files exist, not files change),
the use of walkfilerevs() is not valid since filelogs only contain "changes."

To be clear, I'm not saying that will be the desired behavior of hg grep of
"--all-files" of multiple revisions. It might be better to skip intermediate
revisions where files are unchanged.

sangeet259 added a comment.EditedJun 11 2018, 11:43 AM

@yuja I am adding an --unmodified flag to change to change the mode to grep on the unmodified files as well.
What I am trying to do is ::

def fns_generator():
    if --unmodified:
        for f in ctx:
            for f in ctx:
                if match(f):
                    yield f
    else :
        for f in ctx.files():
            for f in ctx:
                if match(f):
                    yield f

Currently, this would work for a single revision, so I am modifying the prep function in grep.
Shall I continue and work on sending the required patch ?

martinvonz added inline comments.
mercurial/commands.py
2575–2585

nit: can you change 'if part' to 'if-part' or '"if" part' to make it easier to parse? Also drop the trailing backslash.

Actually, the comment doesn't seem to add any useful information, so I'd suggest either removing it or changing it to something like "When neither -r nor --all is passed, search the working copy" (if that's accurate)

2577

drop one of the spaces before "or"

2578

I agree with Pulkit: it's clearer to change this to

ctx = repo[None]
m = scmutil.match(ctx, pats, opts, default='relglob')

for now and start calling revsingle() in a later patch

2585

can you drop "rev is None" for now (since it's always true)?

2585

It's surprising to me that ctx.matches() can include files that are not tracked. I'll see if I can send a patch to change that. (No action expected from you)

tests/test-grep.t
293–294

Perhaps it's better to preserve the old behavior here (by adding --all, I think)? I'm not sure what the reason for this call is, though, so perhaps it could instead be removed?

@martinvonz, thanks for the reviews, but @yuja has asked me to break this patch into three patches. So I guess this is not going to be merged anyway.

Currently, I am trying to add a flag like --all_files or --unmodified which will enable one to grep all the files that were tracked by that revision and not just the changed files.

yuja added a comment.Jun 12 2018, 10:42 AM
@yuja  I am adding an --unmodified flag to change to change the mode to grep on the unmodified files as well.
What I am trying to do is ::
  def fns_generator():
      if --unmodified:
          for f in ctx:
              for f in ctx:
                  if match(f):
                      yield f
      else :
          for f in ctx.files():
              for f in ctx:
                  if match(f):
                      yield f

Nit: if can be narrowed to ctx/ctx.files().

if --unmodified:
    fiter = iter(ctx)
else:
    fiter = ctx.files()
Currently, this would work for a single revision, so I am modifying the `prep` function in grep.
Shall I continue and send the patch ?

Seems fine. It should be generally encouraged to send patches than discussing
for long without a working code.

but @yuja has asked me to break this patch into three patches. So I guess
this is not going to be merged anyway.

To be clear, I suggested to split patches because the original code seemed
to add a single knob to switch two more features at once.

durin42 added a subscriber: durin42.Jul 3 2018, 2:08 PM

It looks like this is obsoleted by D3826?

durin42 commandeered this revision.
durin42 abandoned this revision.