This is an archive of the discontinued Mercurial Phabricator instance.

The state of this stack is not quite clear: there are abandoned revisions and the first changeset (introducing "mercurial/remotenames.py" file) seems to be missing.

In D1358#22836, @dlax wrote:

The state of this stack is not quite clear: there are abandoned revisions and the first changeset (introducing "mercurial/remotenames.py" file) seems to be missing.

Ah, I will resend the series maybe with new differentials. Thanks!

pulkit (Pulkit Goyal) <phabricator@mercurial-scm.org> writes:

pulkit created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.
REPOSITORY
rHG Mercurial
REVISION DETAIL
https://phab.mercurial-scm.org/D1358
AFFECTED FILES
mercurial/remotenames.py
CHANGE DETAILS
diff --git a/mercurial/remotenames.py b/mercurial/remotenames.py

a/mercurial/remotenames.py

+++ b/mercurial/remotenames.py

A bit jumping the gun, no? I don't think we should be skipping putting
this initial stuff into hgext/remotenames.py.

signature.asc832 BDownload

In D1358#22899, @smf wrote:
pulkit (Pulkit Goyal) <phabricator@mercurial-scm.org> writes:
pulkit created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.
REPOSITORY
rHG Mercurial
REVISION DETAIL
https://phab.mercurial-scm.org/D1358
AFFECTED FILES
mercurial/remotenames.py
CHANGE DETAILS
diff --git a/mercurial/remotenames.py b/mercurial/remotenames.py

a/mercurial/remotenames.py

+++ b/mercurial/remotenames.py
A bit jumping the gun, no? I don't think we should be skipping putting
this initial stuff into hgext/remotenames.py.

I don't think so. IMO the storage-only parts can and should go straight in core and be on by default. Exposing the data (by default, at least) we're collecting should probably go through an extension rodeo first.

durin42 (Augie Fackler) <phabricator@mercurial-scm.org> writes:

durin42 added a comment.

In https://phab.mercurial-scm.org/D1358#22899, @smf wrote:
> pulkit (Pulkit Goyal) <phabricator@mercurial-scm.org> writes:
>
> > pulkit created this revision.
> >  Herald added a subscriber: mercurial-devel.
> >  Herald added a reviewer: hg-reviewers.
> >
> > REPOSITORY
> >
> >   rHG Mercurial
> >
> > REVISION DETAIL
> >
> >   https://phab.mercurial-scm.org/D1358
> >
> > AFFECTED FILES
> >
> >   mercurial/remotenames.py
> >
> > CHANGE DETAILS
> >
> > diff --git a/mercurial/remotenames.py b/mercurial/remotenames.py
> >
> > - a/mercurial/remotenames.py +++ b/mercurial/remotenames.py
>
> A bit jumping the gun, no? I don't think we should be skipping putting
>  this initial stuff into hgext/remotenames.py.
I don't think so. IMO the storage-only parts can and should go straight in core and be on by default. Exposing the data (by default, at least) we're collecting should probably go through an extension rodeo first.

I agree with putting data generating functions and storage into core.
Though, I disagree with the approach shown so far.

We currently have at least two extensions in core that do something
similar: blackbox and journal. Including remotenames, all three have
overlapping data storage needs. The storage design of each is roughly:

blackbox logs each command
journal logs update / move commands
remotenames logs exchange commands

For something to make it to core, I would expect a unified way to deal
with this duplication. For example, one solution could be:

Create a v1 storage format api to hold this information (maybe start with sqlite or something very simple at first for the backend)

Create a helper utility (decorator, or just wrap ui, etc)

Use helper utility on all these commands to keep the data extension-agnostic

This would be a great way to improve all our data logging extensions as
well as keep things tidy in core. If it’s needed, I’d be more than happy
to guide Pulkit and do the review for him if that is a bottleneck.

signature.asc832 BDownload

In D1358#23388, @smf wrote:

For something to make it to core, I would expect a unified way to deal
with this duplication. For example, one solution could be:

Create a v1 storage format api to hold this information (maybe start with sqlite or something very simple at first for the backend)

Create a helper utility (decorator, or just wrap ui, etc)

Use helper utility on all these commands to keep the data extension-agnostic

This would be a great way to improve all our data logging extensions as
well as keep things tidy in core. If it’s needed, I’d be more than happy
to guide Pulkit and do the review for him if that is a bottleneck.

signature.asc832 BDownload

I like where your head is at, but I'm unclear if Pulkit et al will be willing to do this extra work. Pulkit, is this something that we could get you to put on the path of "remotenames in core"?

In D1358#23388, @smf wrote:
durin42 added a comment.
I don't think so. IMO the storage-only parts can and should go straight in core and be on by default. Exposing the data (by default, at least) we're collecting should probably go through an extension rodeo first.
I agree with putting data generating functions and storage into core.
Though, I disagree with the approach shown so far.
We currently have at least two extensions in core that do something
similar: blackbox and journal. Including remotenames, all three have
overlapping data storage needs. The storage design of each is roughly:

blackbox logs each command

journal logs update / move commands

remotenames logs exchange commands

For something to make it to core, I would expect a unified way to deal
with this duplication.

While I agree with you that we should have a unified way to deal with any kind of duplication, I am unable to understand the duplication between data stored by remotanemes and one stored by journal and blackbox.

Data stored for journal, blackbox and remotenames are as follows:

Journal: time, user, namespace, name of changed item, command, nodes(old and new)
Blackbox: date, user, node, process id, source and some kind of formatter message
Remotenames: node, remotepath, name of bookmark or branch

I strongly feel that data storage for journal and blackbox should be unified, I am not sure how these overlaps with remotenames one. I want to keep remotenames stored separately as information about a remoterepo can be used in operations like push and pull. @durin42 and others had some ideas around it. Moreover journal and blackbox data serves different purpose than that of remotenames.

Can you please help me understanding how you think these data can be merged in one format.

For example, one solution could be:

Create a v1 storage format api to hold this information (maybe start with sqlite or something very simple at first for the backend)

Create a helper utility (decorator, or just wrap ui, etc)

Use helper utility on all these commands to keep the data extension-agnostic

This would be a great way to improve all our data logging extensions as
well as keep things tidy in core. If it’s needed, I’d be more than happy
to guide Pulkit and do the review for him if that is a bottleneck.

I will also like to work on improving storage things in core but I don't think that intersects very much with remotenames. This thread will also interests you: https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-November/107411.html

In D1358#23388, @smf wrote:

Create a v1 storage format api to hold this information (maybe start with sqlite or something very simple at first for the backend)

On sqlite:

Some people use traditional tools like grep, less on blackbox.log. Using sqlite would make blackbox harder to use.

sqlite calls fsync by default. It's a performance issue under heavy I/O.

Create a helper utility (decorator, or just wrap ui, etc)

Use helper utility on all these commands to keep the data extension-agnostic

The current APIs (ui.log and repo.journal.record) do not look
terrible. Could you give an example of how this diff could be
improved with your proposed new API? i.e. is it becoming shorter,
less cryptic?

If the code under the new API look similar to the existing one,
I wonder whether they could be implemented independently - i.e.
this series continue using the existing API and changing the
format is independent from this series.

I'm a little confused here, in the same way Pulkit has expressed. remotenames has a couple parts related to storage: one is the storing of the remotenames themselves, and the other is tracking how they change over time. The remotename storage itself doesn't seem like it overlaps with journal or blackbox, as it's not a log. The tracking of how they change over time seems to fit the journal exactly, which is a store for tracking how various references to commits (working copy parent, bookmarks, and now remotenames) change over time.

So I guess I'm confused about what part of remotenames needs unification? They seem to either be completely orthogonal, or a complete match for an existing system.

In D1358#23588, @durin42 wrote:

In D1358#23388, @smf wrote:

For something to make it to core, I would expect a unified way to deal
with this duplication. For example, one solution could be:

Create a v1 storage format api to hold this information (maybe start with sqlite or something very simple at first for the backend)

Create a helper utility (decorator, or just wrap ui, etc)

Use helper utility on all these commands to keep the data extension-agnostic

This would be a great way to improve all our data logging extensions as
well as keep things tidy in core. If it’s needed, I’d be more than happy
to guide Pulkit and do the review for him if that is a bottleneck.

signature.asc832 BDownload

I like where your head is at, but I'm unclear if Pulkit et al will be willing to do this extra work. Pulkit, is this something that we could get you to put on the path of "remotenames in core"?

I will be happy to do that work but there are two things involved here:

Improving end-user experience
Improving our backend storage

If we do 1) after the completion of 2), then I think we won't have enough time to test the new feature in this cycle or maybe we need to wait for the next cycle. I don't see a very hard dependency here between both and they can go in parallel which I am again happy to do. @smf how does this sounds to you? If this sounds good to you, I will send a new version of the series and start a thread on mailing related to improving the storage API's.

pulkit (Pulkit Goyal) <phabricator@mercurial-scm.org> writes:

pulkit added a comment.

In https://phab.mercurial-scm.org/D1358#23388, @smf wrote:
> > durin42 added a comment.
> > 
> >   I don't think so. IMO the storage-only parts can and should go straight in core and be on by default. Exposing the data (by default, at least) we're collecting should probably go through an extension rodeo first.
>
> I agree with putting data generating functions and storage into core.
>  Though, I disagree with the approach shown so far.
>
> We currently have at least two extensions in core that do something
>  similar: blackbox and journal. Including remotenames, all three have
>  overlapping data storage needs. The storage design of each is roughly:
>
> - blackbox logs each command
> - journal logs update / move commands
> - remotenames logs exchange commands
>
>   For something to make it to core, I would expect a unified way to deal with this duplication.
While I agree with you that we should have a unified way to deal with any kind of duplication, I am unable to understand the duplication between data stored by remotanemes and one stored by journal and blackbox.
Data stored for journal, blackbox and remotenames are as follows:
Journal: time, user, namespace, name of changed item, command, nodes(old and new)
Blackbox: date, user, node, process id, source and some kind of formatter message
Remotenames: node, remotepath, name of bookmark or branch

Why not union them as such:

datetime, user, namespace, node, process id, args (name of changed item,
remotepath, etc), input (old nodes), result of command (new nodes, etc)

I strongly feel that data storage for journal and blackbox should be unified, I am not sure how these overlaps with remotenames one. I want to keep remotenames stored separately as information about a remoterepo can be used in operations like push and pull. @durin42 and others had some ideas around it. Moreover journal and blackbox data serves different purpose than that of remotenames.

I don't really see them as different purposes. Questions like, "What was
the state of my repo yesterday (journal)?" or "What was the state of
remote yesterday (remotenames, though future features that is append
only)?" or "What were the last five commands and their input
(blackbox)?"

We're talking about putting something in core and I believe we should
really think this stuff out.

signature.asc832 BDownload

In D1358#23866, @smf wrote:

pulkit (Pulkit Goyal) <phabricator@mercurial-scm.org> writes:

pulkit added a comment.

In https://phab.mercurial-scm.org/D1358#23388, @smf wrote:
> > durin42 added a comment.
> > 
> >   I don't think so. IMO the storage-only parts can and should go straight in core and be on by default. Exposing the data (by default, at least) we're collecting should probably go through an extension rodeo first.
>
> I agree with putting data generating functions and storage into core.
>  Though, I disagree with the approach shown so far.
>
> We currently have at least two extensions in core that do something
>  similar: blackbox and journal. Including remotenames, all three have
>  overlapping data storage needs. The storage design of each is roughly:
>
> - blackbox logs each command
> - journal logs update / move commands
> - remotenames logs exchange commands
>
>   For something to make it to core, I would expect a unified way to deal with this duplication.
While I agree with you that we should have a unified way to deal with any kind of duplication, I am unable to understand the duplication between data stored by remotanemes and one stored by journal and blackbox.
Data stored for journal, blackbox and remotenames are as follows:
Journal: time, user, namespace, name of changed item, command, nodes(old and new)
Blackbox: date, user, node, process id, source and some kind of formatter message
Remotenames: node, remotepath, name of bookmark or branch

Why not union them as such:
datetime, user, namespace, node, process id, args (name of changed item,
remotepath, etc), input (old nodes), result of command (new nodes, etc)

remotenames don't need datetime, user, process id, result of commands and the etc's.
if we store all the journal data, blackbox data and remotenames data all at one place, we will certainly need one more field which will store whether it's remotenames data, journal or blackbox as otherwise it will be slow to read all the data and find the data we need.

I strongly feel that data storage for journal and blackbox should be unified, I am not sure how these overlaps with remotenames one. I want to keep remotenames stored separately as information about a remoterepo can be used in operations like push and pull. @durin42 and others had some ideas around it. Moreover journal and blackbox data serves different purpose than that of remotenames.
I don't really see them as different purposes. Questions like, "What was
the state of my repo yesterday (journal)?" or "What was the state of
remote yesterday (remotenames, though future features that is append
only)?" or "What were the last five commands and their input
(blackbox)?"

Sorry Sean, but I do see them as different purposes because of the following points:

This information is related to a remote repo. It will be very good if we have storage of remote repo information and local repo information separately.
remotenames will be a part of the UI (commands like log etc.)
There are revsets related to remotenames
remotenames will have new namespaces
There is lot of data stored by journal and blackbox which is not required for remotenames.

Even we get to a stage where we pulls the journal or blackbox information from a remote repo and store them locally, we should store them atleast in different files as that of local journal and blackbox information.

If we want to unify the storage for remote and local things, then we should think about having a unified storage for bookmarks(local+remote) and branches(local+remote).

We're talking about putting something in core and I believe we should
really think this stuff out.

pulkit (Pulkit Goyal) <phabricator@mercurial-scm.org> writes:

pulkit added a comment.

In https://phab.mercurial-scm.org/D1358#23866, @smf wrote:
> pulkit (Pulkit Goyal) <phabricator@mercurial-scm.org> writes:
>
> > pulkit added a comment.
> >
> >   In https://phab.mercurial-scm.org/D1358#23388, @smf wrote:
> >
> >   > > durin42 added a comment.
> >   > >
> >   > >   I don't think so. IMO the storage-only parts can and should go straight in core and be on by default. Exposing the data (by default, at least) we're collecting should probably go through an extension rodeo first.
> >   >
> >   > I agree with putting data generating functions and storage into core.
> >   >  Though, I disagree with the approach shown so far.
> >   >
> >   > We currently have at least two extensions in core that do something
> >   >  similar: blackbox and journal. Including remotenames, all three have
> >   >  overlapping data storage needs. The storage design of each is roughly:
> >   >
> >   > - blackbox logs each command
> >   > - journal logs update / move commands
> >   > - remotenames logs exchange commands
> >   >
> >   >   For something to make it to core, I would expect a unified way to deal with this duplication.
> >
> >
> >   While I agree with you that we should have a unified way to deal with any kind of duplication, I am unable to understand the duplication between data stored by remotanemes and one stored by journal and blackbox.
> >
> >   Data stored for journal, blackbox and remotenames are as follows:
> >
> >   Journal: time, user, namespace, name of changed item, command, nodes(old and new)
> >   Blackbox: date, user, node, process id, source and some kind of formatter message
> >   Remotenames: node, remotepath, name of bookmark or branch
>
> Why not union them as such:
>
> datetime, user, namespace, node, process id, args (name of changed item,
>  remotepath, etc), input (old nodes), result of command (new nodes, etc)
1. remotenames don't need datetime, user, process id, result of commands and the etc's.

Just because we don't use them now doesn't mean we won't in future (in
fact, I could easily see a use-case for seeing the previous locations of
a remote repo by date or even user).

if we store all the journal data, blackbox data and remotenames data all at one place, we will certainly need one more field which will store whether it's remotenames data, journal or blackbox as otherwise it will be slow to read all the data and find the data we need.

Perhaps but not necessarily. We could just query, for instance, all the
times we've pulled or pushed. The resulting query set would have the
same-ish data as remotenames has now.

>
>
>>   I strongly feel that data storage for journal and blackbox should be unified, I am not sure how these overlaps with remotenames one. I want to keep remotenames stored separately as information about a remoterepo can be used in operations like push and pull. @durin42 and others had some ideas around it. Moreover journal and blackbox data serves different purpose than that of remotenames.
>
> I don't really see them as different purposes. Questions like, "What was
>  the state of my repo yesterday (journal)?" or "What was the state of
>  remote yesterday (remotenames, though future features that is append
>  only)?" or "What were the last five commands and their input
>  (blackbox)?"
Sorry Sean, but I do see them as different purposes because of the following points:
1. This information is related to a remote repo. It will be very good if we have storage of remote repo information and local repo information separately.

Perhaps. But that sounds more like an implementation detail than a
logging data design.

remotenames will be a part of the UI (commands like log etc.)

Sure, but so could journal and others?

There are revsets related to remotenames

Are you asking points (2) and (3) due to performance reasons?

remotenames will have new namespaces

Only when reading the data which will still live in an extension for now.

There is lot of data stored by journal and blackbox which is not required for remotenames.

I mean, sort of. After using remotenames for so long, it would have been
nice if I stored the datetime, proc, command, etc because I found myself
needing that information sometimes.

Even we get to a stage where we pulls the journal or blackbox information from a remote repo and store them locally, we should store them atleast in different files as that of local journal and blackbox information.

My main point in this thread is that this information (journal,
blackbox, remotenames, etc) should be, at least from core's perspective,
agnostic of which extension wants that data. By writing code with only
one extension in mind, we end up in a place where extensions may have
(unwritten) dependencies. I ran into this often with hg-git,
hgsubversion, and remotenames.

Keeping the data in a centralized place through a common api (e.g.
something akin to namespace) we avoid having extensions step on each
other's toes.

If we want to unify the storage for remote and local things, then we should think about having a unified storage for bookmarks(local+remote) and branches(local+remote).

That's not a bad idea, though a bit more specialized than a data store.
I have long wanted a generic cache object so that extensions didn't have
to implement their own.

Journal seems to have a lot of similar code to remotenames. In fact,
probably even better code because I see it has an actual storage class.
We could start by abstracting out journalstorage and friends to another
module and then importing that into remotenames for use as a storage
layer.

Restricting to just journal and remotenames seems easier for now.
Something like this is what I have in mind:

abstract out storage of journal (hgext/extstorage.py or something)
use new module in remotenames
iterate on usage
graduate extstorage to core

I might be able to take a crack at (1) this weekend.

signature.asc832 BDownload

There seems like two use cases for remotenames: knowledge about the most recently seen location of each remote bookmark, and knowledge about where the remote bookmarks have been over time. I think 99% of the benefit of remotenames comes from the first part. Building a storage layer which suits both the journal and remotenames seems like massive scope creep to address the 1% use case. Even if we did unify the storage models, I still don't believe remotenames and the journal could share a common storage format because they don't have similar access patterns. Remotenames is much more of a random access dict-like object, while the journal is much more of a sequential log. If I want to lookup the value of the latest remote master I don't want to deal with scanning some log of recent activity, and if I want to maintain a log of recent activity I don't want to build an index that allows for random access.

I have left out this patch and updated the series with new differentials D1547 to D1551.

durin42 mentioned this in D1551: remotenames: consider existing data while storing newer data.Dec 6 2017, 2:20 PM

Revision Contents
Changeset List

			Path	Packages
M			mercurial/remotenames.py (20 lines)

Status	Author	Revision
Abandoned	pulkit	D1356 remotenames: add test showing overwriting on remotenames data
Abandoned	pulkit	D941 remotenames: add a function to return the vfs object
Abandoned	pulkit	D1358 remotenames: store journal entry for bookmarks if journal is loaded
Abandoned	pulkit	D1357 remotenames: consider existing data while storing newer data
Abandoned	pulkit	D940 remotenames: add functions to read remotenames data from .hg/remotenames/
Abandoned	pulkit	D939 remotenames: add functionality to store remotenames under .hg/hgremotenames/
Abandoned	pulkit	D938 remotenames: move utility functions to join and split names to core
Abandoned	pulkit	D937 remotenames: move function to pull remotenames from the remoterepo to core

Diff 3397

mercurial/remotenames.py

	# remotenames.py			# remotenames.py
	#			#
	# Copyright 2017 Augie Fackler <augie@google.com>			# Copyright 2017 Augie Fackler <augie@google.com>
	# Copyright 2017 Sean Farley <sean@farley.io>			# Copyright 2017 Sean Farley <sean@farley.io>
	#			#
	# This software may be used and distributed according to the terms of the			# This software may be used and distributed according to the terms of the
	# GNU General Public License version 2 or any later version.			# GNU General Public License version 2 or any later version.

	from __future__ import absolute_import			from __future__ import absolute_import

	from .node import hex			from .node import (
				bin,
				hex,
				)

	from . import (			from . import (
				util,
	vfs as vfsmod,			vfs as vfsmod,
	)			)

				# namespace to use when recording an hg journal entry
				journalremotebookmarktype = 'remotebookmark'
	# directory name in .hg/ in which remotenames files will be present			# directory name in .hg/ in which remotenames files will be present
	remotenamedir = 'remotenames'			remotenamedir = 'remotenames'

	def readremotenamefile(repo, filename):			def readremotenamefile(repo, filename):
	"""			"""
	reads a file from .hg/remotenames/ directory and yields it's content			reads a file from .hg/remotenames/ directory and yields it's content
	filename: the file to be read			filename: the file to be read
	yield a tuple (node, remotepath, name)			yield a tuple (node, remotepath, name)
	save remote bookmarks in .hg/remotenames/bookmarks.			save remote bookmarks in .hg/remotenames/bookmarks.
	The format of the data stored will be:			The format of the data stored will be:

	`node remotepath bookmarkname`			`node remotepath bookmarkname`

	bookmarks is a dictionary of remote bookmarks.			bookmarks is a dictionary of remote bookmarks.
	"""			"""
	olddata = set(readremotenamefile(repo, 'bookmarks'))			olddata = set(readremotenamefile(repo, 'bookmarks'))
				oldbooks = {}

	vfs = vfsmod.vfs(repo.vfs.join(remotenamedir))			vfs = vfsmod.vfs(repo.vfs.join(remotenamedir))
	f = vfs('bookmarks', 'w', atomictemp=True)			f = vfs('bookmarks', 'w', atomictemp=True)

	# re-save the data from a different remote than this one.			# re-save the data from a different remote than this one.
	for node, oldpath, rname in sorted(olddata):			for node, oldpath, rname in sorted(olddata):
	if oldpath != remotepath:			if oldpath != remotepath:
	f.write('%s %s %s\n' % (node, oldpath, rname))			f.write('%s %s %s\n' % (node, oldpath, rname))
				else:
				oldbooks[rname] = node

				# record a journal entry if journal is loaded
				if util.safehasattr(repo, 'journal'):
				for bname, newnode in bookmarks.iteritems():
				oldnode = oldbooks.get(bname, hex(nullid))
				if oldnode != newnode:
				joinedremotename = remotepath + '/' + bname
				repo.journal.record(journalremotebookmarktype, joinedremotename,
				bin(oldnode), bin(newnode))

	for bookmark, node in sorted(bookmarks.iteritems()):			for bookmark, node in sorted(bookmarks.iteritems()):
	if node:			if node:
	f.write('%s %s %s\n' % (node, remotepath, bookmark))			f.write('%s %s %s\n' % (node, remotepath, bookmark))
	f.close()			f.close()

	def saveremotebranches(repo, remotepath, branches):			def saveremotebranches(repo, remotepath, branches):
	"""			"""

Diff	ID	Base	Description	Created	Lint	Unit
Base			Base
Diff 1	3397			Nov 10 2017, 2:01 PM	★	★

This is an archive of the discontinued Mercurial Phabricator instance.

remotenames: store journal entry for bookmarks if journal is loadedAbandonedPublic

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 3397

mercurial/remotenames.py

remotenames: store journal entry for bookmarks if journal is loaded
AbandonedPublic

Revision Contents
Changeset List