git@vger.kernel.org mailing list mirror (one of many)
* Allowing weak references to blobs and strong references to commits
@ 2015-03-31 10:07 Mike Hommey
  2015-03-31 19:55 ` Philip Oakley
  2015-03-31 20:23 ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Mike Hommey @ 2015-03-31 10:07 UTC
  To: git

Hi,

Currently, in git-cinnabar[1], I'm using a private namespace
(refs/cinnabar) for various different things:
- references to all the imported heads (which may or may not
  match remote refs),
- the last refs used for a fetch (part of the refspec protocol for
  remote-helpers)
- a branch containing mappings from mercurial sha1s to git objects.
- a branch used to store all mercurial manifests.
- a cache of some sort (used for tags only atm)

So essentially there are a bit more than twice as many refs as actually
necessary (and up to more than three times as many when there is only
one remote).

Ideally, the mercurial manifests data would use as many refs as
branches, so that their parent information wouldn't have to be guessed
from the corresponding git commits, but I didn't do that initially to
avoid making the number of necessary refs even bigger (that would make
the number three or four times as many as necessary).

I won't bother you with all the whys and hows, but that ends up being
a lot of unwanted noise for users, because many commands don't limit
themselves to refs/heads, refs/tags and refs/remotes.

One way to reduce this noise would be for me to create fake octopus
merges and reduce the number of heads to one, or at least one per
category. But this is cumbersome and would create a lot of useless
commits that would end up loose, except if they are kept forever which
seems even worse.

So I thought, since commits are already allowed in tree objects, for
submodules, why not add a bit to the mode that would tell git that
those commit object references are meant to always be there (strong
references), as opposed to the current weak references for submodules?
I was thinking of something like 0200000, which is above S_IFMT, but I
haven't checked whether mode is expected to be a short anywhere. Maybe
one of the file permission flags could be abused instead (sticky bit?).
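
To make the bit arithmetic concrete, here is a small Python sketch. The
file-type values are the usual octal tree-entry modes; the name
STRONG_REF_BIT and the helper functions are made up for illustration and
do not exist in git. It only shows that 0200000 sits outside the S_IFMT
mask and does not fit in a 16-bit field, whereas the sticky bit does.

```python
# File-type bits as they appear in tree entry modes (octal).
S_IFMT_MASK = 0o170000   # mask selecting the file-type bits
S_IFGITLINK = 0o160000   # commit entry in a tree (submodule)
S_ISVTX = 0o001000       # sticky bit

# Hypothetical flag from the proposal above; not a real git constant.
STRONG_REF_BIT = 0o200000


def entry_type(mode):
    """Return only the file-type bits of a mode."""
    return mode & S_IFMT_MASK


def fits_in_u16(mode):
    """True if the mode can be stored in an unsigned 16-bit field."""
    return 0 <= mode <= 0xFFFF


strong_gitlink = S_IFGITLINK | STRONG_REF_BIT
sticky_gitlink = S_IFGITLINK | S_ISVTX

# The extra bit does not disturb the type bits...
print(oct(entry_type(strong_gitlink)))
# ...but the resulting mode no longer fits in 16 bits.
print(fits_in_u16(strong_gitlink), fits_in_u16(sticky_gitlink))
```

The sticky-bit variant stays within 16 bits, which is why it looks like
the safer candidate if mode is stored as a short somewhere.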

I could see this used in the future to e.g. implement a fetchable reflog
(which could be a ref to a tree with strong references to commits).

Then that got me thinking that the opposite would be useful to me as
well: I'm currently storing mercurial manifests as git trees with
(weak) commit references using the mercurial sha1s for files.
Unfortunately, that doesn't allow storing the corresponding file
permissions, so I'm jumping through hoops to get that. It would be
simpler for me if I could just declare files or symlinks with the right
permissions and say 'the corresponding blob doesn't need to exist'.
I'm sure other tools using git as storage would have a use for such
weak references.

What do you think about this? Does that seem reasonable to have in git
core, and if yes, how would you go about implementing it: the same bit
with a different meaning for blobs and commits, or would you rather that
were only done for commits and not for blobs? What should I be careful
about, besides making sure gc and fsck don't mess up?

Cheers,

Mike

1. a git-remote-hg tool, https://github.com/glandium/git-cinnabar/


* Re: Allowing weak references to blobs and strong references to commits
  2015-03-31 10:07 Allowing weak references to blobs and strong references to commits Mike Hommey
@ 2015-03-31 19:55 ` Philip Oakley
  2015-03-31 21:08   ` Randall S. Becker
  2015-03-31 20:23 ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Philip Oakley @ 2015-03-31 19:55 UTC
  To: Mike Hommey, git

From: "Mike Hommey" <mh@glandium.org>
[...]

> So I thought, since commits are already allowed in tree objects, for
> submodules, why not add a bit to the mode that would tell git that
> those commit object references are meant to always be there aka strong
> reference, as opposed to the current weak references for submodules.
> I was thinking something like 0200000, which is above S_IFMT, but I
> haven't checked if mode is expected to be a short anywhere, maybe one
> of the file permission flags could be abused instead (sticky bit?).
>
> I could see this used in the future to e.g. implement a fetchable
> reflog (which could be a ref to a tree with strong references to
> commits).
>
> Then that got me thinking that the opposite would be useful to me as
> well: I'm currently storing mercurial manifests as git trees with
> (weak) commit references using the mercurial sha1s for files.
> Unfortunately, that doesn't allow to store the corresponding file
> permissions, so I'm going through hoops to get that. It would be
> simpler for me if I could just declare files or symlinks with the
> right permissions and say 'the corresponding blob doesn't need to
> exist'.
> I'm sure other tools using git as storage would have a use for such
> weak references.
>
The "weak references" idea is something that's on my back list of 
Toh-Doh's for the purpose of having a Narrow clone.

However it's not that easy as you need to consider three areas - what's 
on disk (worktree/file system), what's in the index, and what's in the 
object store and how a coherent view is kept of all three without 
breakage.

The 'Sparse Checkout' / 'Skip Worktree' (see `git help read-tree`)
covers the first two but not the third (which submodules do) [that's
your 'the corresponding blob doesn't need to exist' aspect from my
perspective].


> What do you think about this? Does that seem reasonable to have in git
> core, and if yes, how would you go about implementing it (same bit
> with different meaning for blobs and commits (or would you rather that
> were only done for commits and not for blobs)? what should I be
> careful about, besides making sure gc and fsck don't mess up?)
>
> Cheers,
>
> Mike
>
> 1. a git-remote-hg tool, https://github.com/glandium/git-cinnabar/
> --
Philip 


* Re: Allowing weak references to blobs and strong references to commits
  2015-03-31 10:07 Allowing weak references to blobs and strong references to commits Mike Hommey
  2015-03-31 19:55 ` Philip Oakley
@ 2015-03-31 20:23 ` Junio C Hamano
  2015-03-31 22:39   ` Mike Hommey
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2015-03-31 20:23 UTC
  To: Mike Hommey; +Cc: git

Mike Hommey <mh@glandium.org> writes:

> So I thought, since commits are already allowed in tree objects, for
> submodules, why not add a bit to the mode that would tell git that
> those commit object references are meant to always be there aka strong
> reference, as opposed to the current weak references for submodules.

Unless you are recording the paths to these "commits" to be
potentially checked out on the filesystem, do not put them in a
"tree".  The entries in a tree object represent "This thing goes to
this path in the working tree".

It is not clear to me (and because you said "I won't bother you with
all the whys and hows", I am guessing that it is OK for readers to
be unclear), but I think you only want to make sure "git fetch" and
"git push" transfers these objects, the graph formed by which is
*not* any part of the main history of the project.  It is perfectly
OK to represent these objects as a special purpose history and have
a ref point at its tip.  The "notes" database is represented that
way, for example.  And I do not see anything wrong with using octopus
merges in the history if you want to represent "here are the commit
objects that I care about at this point in the 'special purpose'
history (not the main history)".
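
As a rough illustration of such a special-purpose history, the Python
sketch below builds a `git fast-import` stream for a single octopus
commit whose parents are the tips to keep alive. The ref name, the
committer identity, the timestamp and the function name are all
placeholders invented for the example; only the `commit`, `committer`,
`data`, `from` and `merge` commands come from the fast-import format.

```python
def octopus_commit_stream(ref, parents, message,
                          committer="Example <example@example.com>",
                          when="1427833403 +0000"):
    """Build a fast-import stream for one commit with many parents.

    `parents` is a list of commit ids; the first becomes `from`, the
    rest become `merge` lines.  All default values are placeholders.
    """
    if not parents:
        raise ValueError("need at least one parent")
    msg = message.encode("utf-8")
    parts = [
        f"commit {ref}\n".encode(),
        f"committer {committer} {when}\n".encode(),
        b"data %d\n" % len(msg),   # exact byte count of the message
        msg + b"\n",
        f"from {parents[0]}\n".encode(),
    ]
    parts += [f"merge {p}\n".encode() for p in parents[1:]]
    parts.append(b"\n")
    return b"".join(parts)


# Twenty made-up commit ids, just to show the shape of the stream.
fake_tips = [f"{i:040x}" for i in range(1, 21)]
stream = octopus_commit_stream("refs/cinnabar/metadata", fake_tips,
                               "tips I care about")
print(stream.decode())
```

With real commit ids, the output could be piped into `git fast-import`
in the target repository; since the stream has no file changes, the new
commit would simply inherit the tree of its first parent.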


* RE: Allowing weak references to blobs and strong references to commits
  2015-03-31 19:55 ` Philip Oakley
@ 2015-03-31 21:08   ` Randall S. Becker
  0 siblings, 0 replies; 10+ messages in thread
From: Randall S. Becker @ 2015-03-31 21:08 UTC
  To: 'Philip Oakley', 'Mike Hommey', git

On March 31, 2015 3:55 PM Philip Oakley wrote:
> From: "Mike Hommey" <mh@glandium.org>
> [...]
> > So I thought, since commits are already allowed in tree objects, for
> > submodules, why not add a bit to the mode that would tell git that
> > those commit object references are meant to always be there aka strong
> > reference, as opposed to the current weak references for submodules.
> > I was thinking something like 0200000, which is above S_IFMT, but I
> > haven't checked if mode is expected to be a short anywhere, maybe one
> > of the file permission flags could be abused instead (sticky bit?).
> >
> > I could see this used in the future to e.g. implement a fetchable
> > reflog (which could be a ref to a tree with strong references to
> > commits).
> >
> > Then that got me thinking that the opposite would be useful to me as
> > well: I'm currently storing mercurial manifests as git trees with
> > (weak) commit references using the mercurial sha1s for files.
> > Unfortunately, that doesn't allow to store the corresponding file
> > permissions, so I'm going through hoops to get that. It would be
> > simpler for me if I could just declare files or symlinks with the
> > right permissions and say 'the corresponding blob doesn't need to
> > exist'.
> > I'm sure other tools using git as storage would have a use for such
> > weak references.
> >
> The "weak references" idea is something that's on my back list of
> Toh-Doh's for the purpose of having a Narrow clone.
>
> However it's not that easy as you need to consider three areas - what's
> on disk (worktree/file system), what's in the index, and what's in the
> object store and how a coherent view is kept of all three without
> breakage.
>
> The 'Sparse Checkout' / 'Skip Worktree' (see `git help read-tree`)
> covers the first two but not the third (which submodules does) [that's
> your 'the corresponding blob doesn't need to exist' aspect from my
> perspective]
> 
> 
> > What do you think about this? Does that seem reasonable to have in git
> > core, and if yes, how would you go about implementing it (same bit
> > with different meaning for blobs and commits (or would you rather that
> > were only done for commits and not for blobs)? what should I be
> > careful about, besides making sure gc and fsck don't mess up?)

I don't know whether this is relevant or not - forgiveness requested in
advance. It may be useful to store primarily the SHA1 for a weak object. In
a product called RMS, this was called an "External Reference". The file
itself was not stored, but its signature was. It was possible to tell that
the commit was validly and completely on disk, only if the signature matched
(so git status would know). If the file was missing, or had an invalid
signature, the working area was considered dirty (so git status would
presumably report "modified"). All signatures were stored for these types of
files, but the contents were not - hence "external". Otherwise, we stored
all other repository attributes - except the contents, with the obvious
risks. This was typically used to track versions of the compilers and
headers being used for builds, which we did not want to store in the
repository, managed by a separate systems operations group, but wanted to
know the signatures in case we had to go back in time. From my point of
view, I would like to be able to have /usr/include (example only) as a
working area where I can be 100% certain it contains what I expect it to
contain, but I don't really want to store the objects in a repository - and
may not have root anyway.
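
A minimal Python sketch of that kind of check, assuming the recorded
signature is an ordinary git blob id (SHA-1 over a "blob <size>\0"
header plus the content). The function names and the three status
strings are invented for the example; this is not how RMS or git
implements anything.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the id git would give a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def external_ref_status(recorded_id, content):
    """Compare on-disk content (or None if missing) with a recorded id."""
    if content is None:
        return "missing"
    if git_blob_id(content) == recorded_id:
        return "clean"
    return "modified"


recorded = git_blob_id(b"hello\n")
print(recorded)
print(external_ref_status(recorded, b"hello\n"))
print(external_ref_status(recorded, b"hello world\n"))
print(external_ref_status(recorded, None))
```

Only the id is stored; the content stays wherever the external group
keeps it, and a missing or changed file makes the working area dirty.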

Cheers,
Randall

-- Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)/NonStop(211288444200000000)
-- In my real life, I talk too much.


* Re: Allowing weak references to blobs and strong references to commits
  2015-03-31 20:23 ` Junio C Hamano
@ 2015-03-31 22:39   ` Mike Hommey
  2015-03-31 23:00     ` Junio C Hamano
  2015-03-31 23:14     ` Jonathan Nieder
  0 siblings, 2 replies; 10+ messages in thread
From: Mike Hommey @ 2015-03-31 22:39 UTC
  To: Junio C Hamano; +Cc: git

On Tue, Mar 31, 2015 at 01:23:23PM -0700, Junio C Hamano wrote:
> Mike Hommey <mh@glandium.org> writes:
> 
> > So I thought, since commits are already allowed in tree objects, for
> > submodules, why not add a bit to the mode that would tell git that
> > those commit object references are meant to always be there aka strong
> > reference, as opposed to the current weak references for submodules.
> 
> Unless you are recording the paths to these "commits" to be
> potentially checked out on the filesystem, do not put them in a
> "tree".  The entries in a tree object represent "This thing go to
> this path in the working tree".
> 
> It is not clear to me (and because you said "I won't bother you with
> all the whys and hows", I am guessing that it is OK for readers to
> be unclear), but I think you only want to make sure "git fetch" and
> "git push" transfers these objects, the graph formed by which is
> *not* any part of the main history of the project.

Transfer is not the main reason I have those refs, although it's a nice
plus. I'm using the git database as a storage for metadata I need to
keep track of the remote mercurial content that isn't part of the
content history represented in the corresponding git history. Obviously,
this isn't meant to be checked out, like a notes tree. The refs are only
there so that a) git-cinnabar can find its data and b) git gc doesn't
remove it.

> It is perfectly OK to represent these objects as a special purpose
> history and have a ref point at its tip.  The "notes" database is
> represented that way, for example.

Indeed, the notes database is in a similar situation. But the fact that
in my use-case (cloning Mozilla mercurial repositories) _thousands_ of
refs are involved is not really making it user-friendly. I'd rather hide
those away from users.

> And I do not see anything wrong to use octopus merges in the history
> if you want to represent "here are the commit objects that I care
> about at this point in the 'special purpose' history (not the main
> history)".

Octopus merges are limited to 16 parents. That means to merge thousands
of refs, I need to do that on at least 3 levels, involving hundreds of
commit objects, many of which would become loose quickly, or become
extra noise in the "metadata" "branches"... and while researching this
further, I realize it doesn't seem there is such a limit to the number
of parents for octopus merges. Where did I get that from? Was there such
a limit in the past or was I high?
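
For what it's worth, the arithmetic under that (apparently outdated)
16-parent assumption can be sketched in Python as follows. The function
name and the figure of 3000 tips are only illustrative.

```python
def octopus_levels(num_tips, max_parents=16):
    """Count merge levels and merge commits needed to reduce
    `num_tips` heads to a single head, with at most `max_parents`
    parents per merge commit."""
    levels = 0
    commits = 0
    remaining = num_tips
    while remaining > 1:
        # Ceiling division: each merge absorbs up to max_parents heads.
        remaining = -(-remaining // max_parents)
        commits += remaining
        levels += 1
    return levels, commits


levels, commits = octopus_levels(3000)
print(levels, commits)  # 3 levels, 201 merge commits (188 + 12 + 1)
```

Without a cap on the number of parents, the same tips need a single
level and a single commit.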

That being said, having names associated with those tips _is_ useful to
me (it lets me find the corresponding mercurial sha1 and branch
without having to do a note lookup for each), and a tree of commits
would help here, although I could put a tree of weak refs to commits
as the content of an octopus merge...

So I guess I could live without the strong commit refs. Weak blob refs
would still be useful, though.

Mike


* Re: Allowing weak references to blobs and strong references to commits
  2015-03-31 22:39   ` Mike Hommey
@ 2015-03-31 23:00     ` Junio C Hamano
  2015-03-31 23:14     ` Jonathan Nieder
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2015-03-31 23:00 UTC
  To: Mike Hommey; +Cc: git

Mike Hommey <mh@glandium.org> writes:

> Octopus merges are limited to 16 parents.

Huh?


* Re: Allowing weak references to blobs and strong references to commits
  2015-03-31 22:39   ` Mike Hommey
  2015-03-31 23:00     ` Junio C Hamano
@ 2015-03-31 23:14     ` Jonathan Nieder
  2015-03-31 23:18       ` Jonathan Nieder
                         ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Jonathan Nieder @ 2015-03-31 23:14 UTC
  To: Mike Hommey; +Cc: Junio C Hamano, git

Mike Hommey wrote:

> Octopus merges are limited to 16 parents.

The note about this in fast-import is out of date (e.g., see
t/t7602-merge-octopus-many.sh and v1.6.0-rc0~194, 2008-06-27).  How
about this patch?

-- >8 --
Subject: fast-import doc: remove suggested 16-parent limit

Merges with an absurd number of parents are still a bad idea because
they do not render well in tools like gitk, but if they are present
in the repository being imported into git then there's no need to
avoid reproducing them faithfully.

In olden times, before v1.6.0-rc0~194 (2008-06-27), git commit-tree
and higher-level tools built on top of it were limited to writing 16
parents for a commit.  Nowadays normal git operations are happy to
write more parents when asked, so the motivation for this note in the
fast-import documentation is gone and we can remove it.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Thanks,
Jonathan

 Documentation/git-fast-import.txt | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index f71fb01..773584e 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -506,11 +506,8 @@ If the `from` command is
 omitted when creating a new branch, the first `merge` commit will be
 the first ancestor of the current commit, and the branch will start
 out with no files.  An unlimited number of `merge` commands per
-commit are permitted by fast-import, thereby establishing an n-way merge.
-However Git's other tools never create commits with more than 15
-additional ancestors (forming a 16-way merge).  For this reason
-it is suggested that frontends do not use more than 15 `merge`
-commands per commit; 16, if starting a new, empty branch.
+commit are permitted by fast-import and other git commands, thereby
+establishing an n-way merge.
 
 Here `<commit-ish>` is any of the commit specification expressions
 also accepted by `from` (see above).
-- 
2.2.0.rc0.207.ga3a616c


* Re: Allowing weak references to blobs and strong references to commits
  2015-03-31 23:14     ` Jonathan Nieder
@ 2015-03-31 23:18       ` Jonathan Nieder
  2015-03-31 23:25       ` Junio C Hamano
  2015-03-31 23:35       ` Mike Hommey
  2 siblings, 0 replies; 10+ messages in thread
From: Jonathan Nieder @ 2015-03-31 23:18 UTC
  To: Mike Hommey; +Cc: Junio C Hamano, git

Jonathan Nieder wrote:

>                                                                 How
> about this patch?

I think I botched the wording.  Here's a second try.

-- >8 --
Subject: fast-import doc: remove suggested 16-parent limit

Merges with an absurd number of parents are still a bad idea because
they do not render well in tools like gitk, but if they are present
in the repository being imported into git then there's no need to
avoid reproducing them faithfully.

In olden times, before v1.6.0-rc0~194 (2008-06-27), git commit-tree
and higher-level tools built on top of it were limited to writing 16
parents for a commit.  Nowadays normal git operations are happy to
write more parents when asked, so the motivation for this note in the
fast-import documentation is gone and we can remove it.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-fast-import.txt | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index f71fb01..690fed3 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -507,10 +507,6 @@ omitted when creating a new branch, the first `merge` commit will be
 the first ancestor of the current commit, and the branch will start
 out with no files.  An unlimited number of `merge` commands per
 commit are permitted by fast-import, thereby establishing an n-way merge.
-However Git's other tools never create commits with more than 15
-additional ancestors (forming a 16-way merge).  For this reason
-it is suggested that frontends do not use more than 15 `merge`
-commands per commit; 16, if starting a new, empty branch.
 
 Here `<commit-ish>` is any of the commit specification expressions
 also accepted by `from` (see above).
-- 


* Re: Allowing weak references to blobs and strong references to commits
  2015-03-31 23:14     ` Jonathan Nieder
  2015-03-31 23:18       ` Jonathan Nieder
@ 2015-03-31 23:25       ` Junio C Hamano
  2015-03-31 23:35       ` Mike Hommey
  2 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2015-03-31 23:25 UTC
  To: Jonathan Nieder; +Cc: Mike Hommey, Git Mailing List

On Tue, Mar 31, 2015 at 4:14 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Mike Hommey wrote:
>
>> Octopus merges are limited to 16 parents.
>
> The note about this in fast-import is out of date (e.g., see
> t/t7602-merge-octopus-many.sh and v1.6.0-rc0~194, 2008-06-27).  How
> about this patch?

Ahh, I thought we eradicated all mentions of that ancient limit back then.

Thanks for catching.


* Re: Allowing weak references to blobs and strong references to commits
  2015-03-31 23:14     ` Jonathan Nieder
  2015-03-31 23:18       ` Jonathan Nieder
  2015-03-31 23:25       ` Junio C Hamano
@ 2015-03-31 23:35       ` Mike Hommey
  2 siblings, 0 replies; 10+ messages in thread
From: Mike Hommey @ 2015-03-31 23:35 UTC
  To: Jonathan Nieder; +Cc: Junio C Hamano, git

On Tue, Mar 31, 2015 at 04:14:49PM -0700, Jonathan Nieder wrote:
> Mike Hommey wrote:
> 
> > Octopus merges are limited to 16 parents.
> 
> The note about this in fast-import is out of date (e.g., see
> t/t7602-merge-octopus-many.sh and v1.6.0-rc0~194, 2008-06-27).  How
> about this patch?

Aha! I wasn't stoned! Thanks for fixing this :)

Mike
