Page MenuHomePhabricator

git: skeleton of a new extension to _directly_ operate on git repos
ClosedPublic

Authored by durin42 on Aug 16 2019, 4:54 PM.

Details

Summary

This is based in part of work I did years ago in hgit, but it's mostly
new code since I'm using pygit2 instead of dulwich and the hg storage
interfaces have improved. Some cleanup of old hgit code by Pulkit,
which I greatly appreciate.

test-git-interop.t does not cover a whole lot of cases, but it
passes. It includes status, diff, making a new commit, and hg annotate
working on the git repository.

This is _not_ (yet) production quality code: this is an
experiment. Known technical debt lurking in this implementation:

  • Writing bookmarks just totally ignores transactions.
  • The way progress is threaded down into the gitstore is awful.
  • Ideally we'd find a way to incrementally reindex DAGs. I'm not sure how to do that efficiently, so we might need a "known only fast-forwards" mode on the DAG indexer for use on hg commit and friends.
  • We don't even _try_ to do anything reasonable for hg pull or hg push.
  • Mercurial need an interface for the changelog type.

Tests currently require git 2.24 as far as I'm aware: git status has
some changed output that I didn't try and handle in a compatible way.

This patch has produced some interesting cleanups, most recently on
the manifest type. I expect continuing down this road will produce
other meritorious cleanups throughout our code.

Diff Detail

Repository
rHG Mercurial
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

durin42 created this revision.Aug 16 2019, 4:54 PM
durin42 edited the summary of this revision. (Show Details)
durin42 updated this revision to Diff 16293.
durin42 updated this revision to Diff 16387.Sep 5 2019, 3:20 PM
durin42 edited the summary of this revision. (Show Details)Sep 16 2019, 10:26 AM
durin42 updated this revision to Diff 16554.
durin42 edited the summary of this revision. (Show Details)Sep 25 2019, 6:14 PM
durin42 updated this revision to Diff 16622.
durin42 edited the summary of this revision. (Show Details)Sep 30 2019, 3:58 PM
durin42 updated this revision to Diff 16704.
durin42 updated this revision to Diff 16714.Sep 30 2019, 6:46 PM
durin42 updated this revision to Diff 16858.Oct 5 2019, 4:28 PM
durin42 edited the summary of this revision. (Show Details)Oct 9 2019, 5:28 PM
durin42 updated this revision to Diff 17007.
durin42 edited the summary of this revision. (Show Details)Dec 16 2019, 3:46 PM
durin42 updated this revision to Diff 18756.
durin42 updated this revision to Diff 18757.Dec 16 2019, 4:04 PM

Hmm has this RFC been abandoned?

Is there relevant discussion regarding this somewhere?

I need to make some time to clean up the manifest implementation in this to land it, and then we'll need help improving it. It's not dead, just resting. :)

durin42 edited the summary of this revision. (Show Details)Feb 11 2020, 12:29 AM
durin42 updated this revision to Diff 20144.
durin42 planned changes to this revision.Feb 11 2020, 12:32 AM

Planned changes:

  • Fix up writing files not at repo root
  • Code formatting

Uploaded mainly so people don't despair. I'm much happier with the manifest implementation now, and I think we're close to having something that could be landed that others could contribute to. I don't have time to put all the polish into this that it would need, but would love to help out...

durin42 edited the summary of this revision. (Show Details)Feb 14 2020, 4:06 PM
durin42 updated this revision to Diff 20224.
durin42 updated this revision to Diff 20225.Feb 14 2020, 4:19 PM
durin42 updated this revision to Diff 20226.Feb 14 2020, 4:33 PM
durin42 edited the summary of this revision. (Show Details)Feb 14 2020, 4:44 PM
durin42 updated this revision to Diff 20230.

This is now ready for review: I would be happy to see this land, and have others contribute towards it. I don't know that I have time to do all that needs doing, but would be delighted to mentor others that want to help!

durin42 retitled this revision from git: RFC of a new extension to _directly_ operate on git repositories to git: skeleton of a new extension to _directly_ operate on git repos.Feb 14 2020, 4:47 PM
durin42 edited the summary of this revision. (Show Details)
durin42 updated this revision to Diff 20300.Feb 25 2020, 1:52 PM

Not done reviewing, but I need to switch over to other work. Here are a few comments for now.

hgext/git/__init__.py
39–40

What blocks that?

48

Why not @util.propertycache and do away with self._db_handle?

69–73

Or maybe pygit2 takes care of locking while it updates? So I wonder if this is fine the way it is. No action required.

126

nit: drop the leading \n and teach people to include newline at EOF instead?

127–130

Did you intend to call the file requires and not need this-is-git? I think this extension should also register with featuresetupfuncs.

martinvonz added inline comments.Feb 27 2020, 7:36 PM
hgext/git/gitlog.py
30

Could we also get __iter__? We can of course add that later, but maybe it seems easy to add anyway (revlog.py has return iter(pycompat.xrange(len(self)))).

Maybe also copy the following from revlog.py?

def tiprev(self):
    return len(self.index) - 1 # well, use "len(self)" here, I guess

def tip(self):
    return self.node(self.tiprev())

def revs(self, start=0, stop=None):
    """iterate over all rev in this revlog (from start to stop)"""
    return storageutil.iterrevs(len(self), start=start, stop=stop)
151

Will the ? be replaced by abc123% or b'abc123%' on py3? (Same applies further down.)

durin42 marked 4 inline comments as done.Mar 4 2020, 1:46 PM
durin42 updated this revision to Diff 20471.
durin42 added inline comments.
hgext/git/__init__.py
39–40

I've added a TODO.md that documents what needs to happen here (missing an interface.)

69–73

Noted this in the TODO.md

126

This was intentional: I don't want to take a valid-but-missing-trailing-newline .git/info/exclude and blindly write a .hg at the end of an existing line. Are you saying the paranoia feels misguided?

127–130

It honestly didn't occur to me to use the presence of git in requires to trigger the "this is a git repo" behavior. Should I add a TODO about that?

hgext/git/gitlog.py
30

I'm hoping we can only implement iter on changelog, not on baselog. Ditto for tip and revs, but I was going to block that on the interface definition (I haven't yet seen anything that wants these methods...)

151

It should be the former. I've actually been developing this extension exclusively on Python 3, so the tests already pass on 3.

mharbison72 added inline comments.
hgext/git/__init__.py
127–130

That seems like a good idea, because there's the capability to load extensions automatically based on entries in requires. That would make it easier to not have to configure it globally.

Just sending two comments I forgot to send yesterday.

hgext/git/__init__.py
127–130

Please do, because that seems like the natural way for it to work.

hgext/git/gitlog.py
30

(I haven't yet seen anything that wants these methods...)

I found some things while playing with the extension :) I don't remember anymore what those were.

durin42 updated this revision to Diff 20536.Mar 5 2020, 1:55 PM

Alright, TODO.md updated. Let me know if there's anything missing. I'm motivated to land this, since I'm now getting more than one patch /a week/ for this code and managing it is tricky.

martinvonz added inline comments.Mar 5 2020, 5:26 PM
hgext/git/gitlog.py
151

I think I asked because I noticed that the shortest() template function always returned the full hash and things were really slow (I assume that was because of the byte string on line 173, not here). Does it work for you? (I'm fine with adding a TODO about fixing it if you see the same brokenness.)

hgext/git/gitutil.py
16–18

equivalent to return pycompat.sysstr(hex(n)) (would be using utf-8 instead of ascii, but that shouldn't matter)?

hgext/git/manifest.py
28

Call it gittreemanifest in that case?

durin42 marked 3 inline comments as done.Mar 6 2020, 2:30 PM
durin42 updated this revision to Diff 20572.
durin42 added inline comments.Mar 6 2020, 2:30 PM
hgext/git/gitlog.py
151

I'm seeing shortest() actually returning short hashes (4-bytes, like I've got configured) so I'm not sure what's broken.

I added a test, which passes under python3.

durin42 updated this revision to Diff 20573.Mar 6 2020, 2:34 PM

I think there are many py3 errors in this patch, but I'll queue it (mostly) as is for now. I'll send a patch to fix some of those py3 issues as a follow-up.

hgext/git/gitlog.py
151

That test case fails on py3 for me. I'll wrap the id + b'%' in pycompat.sysstr() in flight to make the test case pass.

This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.