git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug report: git branch behaves as if --no-replace-objects is passed
@ 2021-03-30  6:05 Elijah Newren
  2021-03-30  7:02 ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2021-03-30  6:05 UTC (permalink / raw)
  To: Git Mailing List

Hi,

Forwarding a report from
https://github.com/newren/git-filter-repo/issues/229, with a few more
details...

log, diff, etc. seem to all support replace objects nicely:

    git init -b main whatever
    cd whatever/
    >empty
    git add empty
    git commit -m initial
    git replace -f deadbeefdeadbeefdeadbeefdeadbeefdeadbeef $(git
rev-parse main)

    echo stuff >>empty

Now, If I try to use this replace object:

    $ git diff --stat deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
     empty | 1 +
     1 file changed, 1 insertion(+)
    $ git log --oneline deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
    deadbee (replaced) initial

it all works well.  BUT, if I try to use it with branch it doesn't work:

    $ git branch --contains deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
    $

and possibly worse, if I create a new branch based on it and use it:

    $ git branch foobar deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
    $ git checkout foobar
    $ echo stuff >empty
    $ git add empty
    $ git commit -m more

then it's clear that branch created foobar pointing to the replaced
object rather than the replacement object -- despite the fact that the
replaced object doesn't even exist within this repo:

    $ git cat-file -p HEAD
    tree 18108bae26dc91af2055bc66cc9fea278012dbd3
    parent deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
    author Elijah Newren <newren@gmail.com> 1617083739 -0700
    committer Elijah Newren <newren@gmail.com> 1617083739 -0700

    more

I poked around in the code a little but it is not at all clear to me
why some parts of the code (log, diff) translate replace refs
correctly, while others (branch) don't.  It is clear from the output
that log is aware that the refs are replaced, which makes me wonder if
every caller needs to be aware of replace refs for them to work
correctly everywhere, because I couldn't find a missing environment
setup for "branch".

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug report: git branch behaves as if --no-replace-objects is passed
  2021-03-30  6:05 Bug report: git branch behaves as if --no-replace-objects is passed Elijah Newren
@ 2021-03-30  7:02 ` Jeff King
  2021-03-30 18:58   ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2021-03-30  7:02 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

On Mon, Mar 29, 2021 at 11:05:05PM -0700, Elijah Newren wrote:

> it all works well.  BUT, if I try to use it with branch it doesn't work:
> 
>     $ git branch --contains deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
>     $

I'm not surprised here. The replace mechanism is usually "if you are
trying to access object X, then access the contents of Y instead". But I
don't think we generally rewrite references from other objects. So in
that sense, no ref "contains" deadbeef, because nobody points to it as
an ancestor.

I guess "branch --contains" could convert the mention of the replaced
object on the command-line, something like this:

diff --git a/parse-options-cb.c b/parse-options-cb.c
index 4542d4d3f9..200be4e3d2 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -6,6 +6,7 @@
 #include "string-list.h"
 #include "strvec.h"
 #include "oid-array.h"
+#include "replace-object.h"
 
 /*----- some often used options -----*/
 
@@ -84,6 +85,7 @@ int parse_opt_verbosity_cb(const struct option *opt, const char *arg,
 int parse_opt_commits(const struct option *opt, const char *arg, int unset)
 {
 	struct object_id oid;
+	const struct object_id *replace;
 	struct commit *commit;
 
 	BUG_ON_OPT_NEG(unset);
@@ -92,7 +94,9 @@ int parse_opt_commits(const struct option *opt, const char *arg, int unset)
 		return -1;
 	if (get_oid(arg, &oid))
 		return error("malformed object name %s", arg);
-	commit = lookup_commit_reference(the_repository, &oid);
+
+	replace = lookup_replace_object(the_repository, &oid);
+	commit = lookup_commit_reference(the_repository, replace);
 	if (!commit)
 		return error("no such commit %s", arg);
 	commit_list_insert(commit, opt->value);

though if we go that route, I suspect we ought to be adding both the
original _and_ the replacement.

I'm not entirely sure this is a good direction, though.

> and possibly worse, if I create a new branch based on it and use it:
> 
>     $ git branch foobar deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
>     $ git checkout foobar
>     $ echo stuff >empty
>     $ git add empty
>     $ git commit -m more
> 
> then it's clear that branch created foobar pointing to the replaced
> object rather than the replacement object -- despite the fact that the
> replaced object doesn't even exist within this repo:
> 
>     $ git cat-file -p HEAD
>     tree 18108bae26dc91af2055bc66cc9fea278012dbd3
>     parent deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
>     author Elijah Newren <newren@gmail.com> 1617083739 -0700
>     committer Elijah Newren <newren@gmail.com> 1617083739 -0700
> 
>     more

Yeah, that's pretty horrible. I do think you're using replace objects in
a way they weren't really intended for, in two ways:

  - usually you'd actually have deadbeef, and you just want to rewrite
    it to something else

  - you wouldn't usually work directly with the replace object on the
    command line like this. Usually the intent is to patch some old
    section of history to get a different view.

None of which is an excuse exactly. But just to say that I'm not too
surprised that the "replace" mechanism is a bit half-baked, and there
are probably a lot of weird implications nobody has thought through.

Patches welcome, of course, though I suspect it may be a rabbit hole
that isn't worth your time. :)

> I poked around in the code a little but it is not at all clear to me
> why some parts of the code (log, diff) translate replace refs
> correctly, while others (branch) don't.  It is clear from the output
> that log is aware that the refs are replaced, which makes me wonder if
> every caller needs to be aware of replace refs for them to work
> correctly everywhere, because I couldn't find a missing environment
> setup for "branch".

Right. To work as you expect above, basically everything that is going
to look at an oid would need to consider whether to use a replace ref.
Shoving the lookup into get_oid() would "fix" that, but I suspect would
confuse some other callers. Right now the rule is "when you access the
object contents, pretend as if it had X instead of Y", so we can enforce
that at the layer of accessing the object database.

-Peff

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Bug report: git branch behaves as if --no-replace-objects is passed
  2021-03-30  7:02 ` Jeff King
@ 2021-03-30 18:58   ` Junio C Hamano
  2021-03-30 21:19     ` Elijah Newren
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2021-03-30 18:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren, Git Mailing List

Jeff King <peff@peff.net> writes:

> ... though if we go that route, I suspect we ought to be adding both the
> original _and_ the replacement.

So "branch --contains X" would ask "which of these branches reach X
or its replacement?" and "branch --no-contains X" would ask "which
of these do not reach X nor its replacement?" --- I guess the result
is still internally consistent (meaning: any and all branches fall
into either "--contains X" or "--no-contains X" camp).

> I'm not entirely sure this is a good direction, though.
>
>> and possibly worse, if I create a new branch based on it and use it:
>> 
>>     $ git branch foobar deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
>>     $ git checkout foobar
>>     $ echo stuff >empty
>>     $ git add empty
>>     $ git commit -m more
>> 
>> then it's clear that branch created foobar pointing to the replaced
>> object rather than the replacement object -- despite the fact that the
>> replaced object doesn't even exist within this repo:
>> 
>>     $ git cat-file -p HEAD
>>     tree 18108bae26dc91af2055bc66cc9fea278012dbd3
>>     parent deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
>>     author Elijah Newren <newren@gmail.com> 1617083739 -0700
>>     committer Elijah Newren <newren@gmail.com> 1617083739 -0700
>> 
>>     more
>
> Yeah, that's pretty horrible.

I am not sure.  As you analize below, the replace mechanism is about
telling Git: when anybody refers to deadbeef, use its replacement if
defined instead.

And one of the points in the mechanism is to allow to do so even
retroactively, so the HEAD object there may be referring to deadbeef
that may not exist does not matter, as long as the object that is to
replace deadbeef is available.  If not, that is a repository
corruption.  After all, the commit object you cat-file'ed may have
been created by somebody else in a separate repository that had
deadbeef before they were told by Elijah that the object is obsolete
and to be replaced by something else (Git supports distributed
development) and then pulled into Elijah's repository, and we should
be prepared to seeing "parent deadbeef" in such a commit.  As long as
replacement happens when accessing the contents, that would be OK.

So, I do not see it as "pretty horrible", but I may be missing
something.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug report: git branch behaves as if --no-replace-objects is passed
  2021-03-30 18:58   ` Junio C Hamano
@ 2021-03-30 21:19     ` Elijah Newren
  2021-03-30 21:30       ` Elijah Newren
  2021-03-30 21:53       ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Elijah Newren @ 2021-03-30 21:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

On Tue, Mar 30, 2021 at 11:58 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > ... though if we go that route, I suspect we ought to be adding both the
> > original _and_ the replacement.
>
> So "branch --contains X" would ask "which of these branches reach X
> or its replacement?" and "branch --no-contains X" would ask "which
> of these do not reach X nor its replacement?" --- I guess the result
> is still internally consistent (meaning: any and all branches fall
> into either "--contains X" or "--no-contains X" camp).

I'm not so sure about this interpretation.  Based on the documentation
in git-replace(1):

       Replacement references will be used by default by all Git commands
       except those doing reachability traversal (prune, pack transfer and
       fsck).

I would have thought that

* "branch --contains X" would ask "which of these branches reach X's
replacement?"
* "git --no-replace-objects branch --contains X" would ask "which of
these branches reach X?"

and if folks really wanted to check whether either X or its
replacement were reachable then they'd need to run both commands.

The only place outside of reachability traversal where I think it
makes sense for a command to distinguish between X being a replace ref
for Y and Y itself is in `git log` where it can show the "replaced"
moniker.

> > I'm not entirely sure this is a good direction, though.
> >
> >> and possibly worse, if I create a new branch based on it and use it:
> >>
> >>     $ git branch foobar deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> >>     $ git checkout foobar
> >>     $ echo stuff >empty
> >>     $ git add empty
> >>     $ git commit -m more
> >>
> >> then it's clear that branch created foobar pointing to the replaced
> >> object rather than the replacement object -- despite the fact that the
> >> replaced object doesn't even exist within this repo:
> >>
> >>     $ git cat-file -p HEAD
> >>     tree 18108bae26dc91af2055bc66cc9fea278012dbd3
> >>     parent deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> >>     author Elijah Newren <newren@gmail.com> 1617083739 -0700
> >>     committer Elijah Newren <newren@gmail.com> 1617083739 -0700
> >>
> >>     more
> >
> > Yeah, that's pretty horrible.
>
> I am not sure.  As you analize below, the replace mechanism is about
> telling Git: when anybody refers to deadbeef, use its replacement if
> defined instead.
>
> And one of the points in the mechanism is to allow to do so even
> retroactively, so the HEAD object there may be referring to deadbeef
> that may not exist does not matter, as long as the object that is to
> replace deadbeef is available.  If not, that is a repository
> corruption.  After all, the commit object you cat-file'ed may have
> been created by somebody else in a separate repository that had
> deadbeef before they were told by Elijah that the object is obsolete
> and to be replaced by something else (Git supports distributed
> development) and then pulled into Elijah's repository, and we should
> be prepared to seeing "parent deadbeef" in such a commit.  As long as
> replacement happens when accessing the contents, that would be OK.
>
> So, I do not see it as "pretty horrible", but I may be missing
> something.

I think you're focusing on git commit, or perhaps on git checkout.
I'm focusing on git branch; what it did does not seem fine to me.
Using your own words:

"the replace mechanism is about telling Git: when anybody refers to
deadbeef, use its replacement if defined instead."

git branch didn't do that; it put deadbeef into refs/heads/foobar.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug report: git branch behaves as if --no-replace-objects is passed
  2021-03-30 21:19     ` Elijah Newren
@ 2021-03-30 21:30       ` Elijah Newren
  2021-03-30 21:59         ` Junio C Hamano
  2021-03-30 21:53       ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2021-03-30 21:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

On Tue, Mar 30, 2021 at 2:19 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Mar 30, 2021 at 11:58 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Jeff King <peff@peff.net> writes:
> >
> > > ... though if we go that route, I suspect we ought to be adding both the
> > > original _and_ the replacement.
> >
> > So "branch --contains X" would ask "which of these branches reach X
> > or its replacement?" and "branch --no-contains X" would ask "which
> > of these do not reach X nor its replacement?" --- I guess the result
> > is still internally consistent (meaning: any and all branches fall
> > into either "--contains X" or "--no-contains X" camp).
>
> I'm not so sure about this interpretation.  Based on the documentation
> in git-replace(1):
>
>        Replacement references will be used by default by all Git commands
>        except those doing reachability traversal (prune, pack transfer and
>        fsck).
>
> I would have thought that
>
> * "branch --contains X" would ask "which of these branches reach X's
> replacement?"
> * "git --no-replace-objects branch --contains X" would ask "which of
> these branches reach X?"
>
> and if folks really wanted to check whether either X or its
> replacement were reachable then they'd need to run both commands.
>
> The only place outside of reachability traversal where I think it
> makes sense for a command to distinguish between X being a replace ref
> for Y and Y itself is in `git log` where it can show the "replaced"
> moniker.
>
> > > I'm not entirely sure this is a good direction, though.
> > >
> > >> and possibly worse, if I create a new branch based on it and use it:
> > >>
> > >>     $ git branch foobar deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> > >>     $ git checkout foobar
> > >>     $ echo stuff >empty
> > >>     $ git add empty
> > >>     $ git commit -m more
> > >>
> > >> then it's clear that branch created foobar pointing to the replaced
> > >> object rather than the replacement object -- despite the fact that the
> > >> replaced object doesn't even exist within this repo:
> > >>
> > >>     $ git cat-file -p HEAD
> > >>     tree 18108bae26dc91af2055bc66cc9fea278012dbd3
> > >>     parent deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> > >>     author Elijah Newren <newren@gmail.com> 1617083739 -0700
> > >>     committer Elijah Newren <newren@gmail.com> 1617083739 -0700
> > >>
> > >>     more
> > >
> > > Yeah, that's pretty horrible.
> >
> > I am not sure.  As you analize below, the replace mechanism is about
> > telling Git: when anybody refers to deadbeef, use its replacement if
> > defined instead.
> >
> > And one of the points in the mechanism is to allow to do so even
> > retroactively, so the HEAD object there may be referring to deadbeef
> > that may not exist does not matter, as long as the object that is to
> > replace deadbeef is available.  If not, that is a repository
> > corruption.  After all, the commit object you cat-file'ed may have
> > been created by somebody else in a separate repository that had
> > deadbeef before they were told by Elijah that the object is obsolete
> > and to be replaced by something else (Git supports distributed
> > development) and then pulled into Elijah's repository, and we should
> > be prepared to seeing "parent deadbeef" in such a commit.  As long as
> > replacement happens when accessing the contents, that would be OK.
> >
> > So, I do not see it as "pretty horrible", but I may be missing
> > something.
>
> I think you're focusing on git commit, or perhaps on git checkout.
> I'm focusing on git branch; what it did does not seem fine to me.
> Using your own words:
>
> "the replace mechanism is about telling Git: when anybody refers to
> deadbeef, use its replacement if defined instead."
>
> git branch didn't do that; it put deadbeef into refs/heads/foobar.

Perhaps I should also add why it not only breaks expectations, but why
that broken expectation causes problems:

* People tend to have commit hashes stored in lots of weird placed --
bug trackers, reports, emails, etc.  These tend to be important for a
short time period, but the number of these references make it harder
for folks who want to rewrite history to fix various past issues (very
large binary blobs and other misdeeds).

* filter-repo uses replace refs to provide users with a way to access
new commits using old commit hashes, to help them through this
transition period.

* Additional refs (especially one for every commit) will cause some
slowness.  So it's nice to be able to provide these replace refs for
short term transition, but tell users they can simply delete the
replace refs when they no longer need them without consequence.


The fact that git branch puts deadbeef into refs/heads/foobar, leads
to a chain where new commits now rely on replacement refs.  In the
best case, others will not be able to pull from this user and the user
will not be able to push the new commits anywhere -- and that user
will have some work to do to rewrite (rebase?) the commits
appropriately.  In the worst case, the users do succeed in
distributing this new history, and now all users everywhere will be
mandated to keep all replace refs for all time (or at least until the
next major repository rewrite)...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug report: git branch behaves as if --no-replace-objects is passed
  2021-03-30 21:19     ` Elijah Newren
  2021-03-30 21:30       ` Elijah Newren
@ 2021-03-30 21:53       ` Junio C Hamano
  2021-03-30 22:43         ` Elijah Newren
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2021-03-30 21:53 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jeff King, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> Using your own words:
>
> "the replace mechanism is about telling Git: when anybody refers to
> deadbeef, use its replacement if defined instead."

"When anybody wants the contents of deadbeef, give the contents of
its replacement" is what the replace (graft, too) is.

> git branch didn't do that; it put deadbeef into refs/heads/foobar.

Yes, but refs/heads/foobar having deadbeef does not have a problem,
as long as you get the contents of the replacement object when you
ask for the contents of deadbeef (or "the object referred to by
the refs/heads/foobar ref").


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug report: git branch behaves as if --no-replace-objects is passed
  2021-03-30 21:30       ` Elijah Newren
@ 2021-03-30 21:59         ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2021-03-30 21:59 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jeff King, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> On Tue, Mar 30, 2021 at 2:19 PM Elijah Newren <newren@gmail.com> wrote:
>>
>> On Tue, Mar 30, 2021 at 11:58 AM Junio C Hamano <gitster@pobox.com> wrote:
>> >
>> > Jeff King <peff@peff.net> writes:
>> >
>> > > ... though if we go that route, I suspect we ought to be adding both the
>> > > original _and_ the replacement.
>> >
>> > So "branch --contains X" would ask "which of these branches reach X
>> > or its replacement?" and "branch --no-contains X" would ask "which
>> > of these do not reach X nor its replacement?" --- I guess the result
>> > is still internally consistent (meaning: any and all branches fall
>> > into either "--contains X" or "--no-contains X" camp).
>>
>> I'm not so sure about this interpretation.  Based on the documentation
>> in git-replace(1):
>>
>>        Replacement references will be used by default by all Git commands
>>        except those doing reachability traversal (prune, pack transfer and
>>        fsck).

This "rechability" sidenote is primarily so that we won't result in
a corrupt repository when replacement is lifted.  When object A is
replaced by object B, and somebody makes A reachable (e.g. a ref
points at deadbeef), we mark both A (and the objects A refers to)
and B reachable, so that "prune" won't lose A.  It would allow the
replacement lifted after "prune".

Tweaking "branch --contains X" to list branches that reach either X
or its replacement would probably have the same effect, so I would
think it would be a good change (and fix to your original issue).

The "the result is still internally consistent" comment was the
result of my attempt to make sure such a change would not introduce
gross incoherency to the resulting system.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug report: git branch behaves as if --no-replace-objects is passed
  2021-03-30 21:53       ` Junio C Hamano
@ 2021-03-30 22:43         ` Elijah Newren
  2021-03-30 23:04           ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2021-03-30 22:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

On Tue, Mar 30, 2021 at 2:53 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Using your own words:
> >
> > "the replace mechanism is about telling Git: when anybody refers to
> > deadbeef, use its replacement if defined instead."
>
> "When anybody wants the contents of deadbeef, give the contents of
> its replacement" is what the replace (graft, too) is.
>
> > git branch didn't do that; it put deadbeef into refs/heads/foobar.
>
> Yes, but refs/heads/foobar having deadbeef does not have a problem,
> as long as you get the contents of the replacement object when you
> ask for the contents of deadbeef (or "the object referred to by
> the refs/heads/foobar ref").

Your "as long as" is I think the assumption that's violated in the
workflow in question.  You may have the replace ref defined, but
others don't[1].  Neither party has the actual original deadbeef
commit[2].  Having deadbeef in refs/heads/foobar leads eventually to
creating commits with deadbeef as an explicit parent, as we discussed
above.  While that's internally consistent, as you point out, can you
push your new commit elsewhere without pushing the replace refs too?

* If you can, isn't that repository corruption?

* If you can't, how are users supposed to understand, debug, and
correct the fact that their history is unshareable?

Why does `git branch` (in conjunction with one user deciding to fetch
replace refs) make it so easy to create a branch that cannot readily
be shared with others?


[1] Everyone can get the replace refs, but fetch/clone does not grab
them by default, and push does not include them by default, so it's
multiple manual steps.
[2] For the rewrite at $DAYJOB, we'll make it as hard as possible for
users to access the original deadbeef commit(s).  People who cloned
the old unrewritten repo in the past might still have it somewhere if
they didn't follow instructions to delete old clones and reclone, but
new hires won't have deadbeef and won't have a way to get it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug report: git branch behaves as if --no-replace-objects is passed
  2021-03-30 22:43         ` Elijah Newren
@ 2021-03-30 23:04           ` Junio C Hamano
  2021-03-31  0:32             ` Elijah Newren
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2021-03-30 23:04 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jeff King, Git Mailing List

Elijah Newren <newren@gmail.com> writes:

> Your "as long as" is I think the assumption that's violated in the
> workflow in question.  You may have the replace ref defined, but
> others don't[1].  Neither party has the actual original deadbeef
> commit[2].  Having deadbeef in refs/heads/foobar leads eventually to
> creating commits with deadbeef as an explicit parent, as we discussed
> above.  While that's internally consistent, as you point out, can you
> push your new commit elsewhere without pushing the replace refs too?

I think the change to "branch --contains" would be an improvement
whether you actually have deadbeef or not, but in any case, defining
(eh, rather, being able to define) a replacement for something you
do not have is the ultimate source of the problem.  And that "bug"
has not very much specific to how "branch --contains" should behave.

> Why does `git branch` (in conjunction with one user deciding to fetch
> replace refs) make it so easy to create a branch that cannot readily
> be shared with others?

In other words, I do not think it is "git branch" or "git checkout -b"
that brought your repository into a broken state.  The "replace"
mechanism may have room for improvement to avoid such a corruption.

IIRC, the original "graft" mechanism did not even have any UI, so it
was pretty much "you can graft any parent to any child, and if you
break the repository you can keep both halves".  Now "replace" has a
dedicated UI component in the form of "git replace" command, we should
be able to teach it how to record replacement more safely.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug report: git branch behaves as if --no-replace-objects is passed
  2021-03-30 23:04           ` Junio C Hamano
@ 2021-03-31  0:32             ` Elijah Newren
  0 siblings, 0 replies; 10+ messages in thread
From: Elijah Newren @ 2021-03-31  0:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

On Tue, Mar 30, 2021 at 4:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Your "as long as" is I think the assumption that's violated in the
> > workflow in question.  You may have the replace ref defined, but
> > others don't[1].  Neither party has the actual original deadbeef
> > commit[2].  Having deadbeef in refs/heads/foobar leads eventually to
> > creating commits with deadbeef as an explicit parent, as we discussed
> > above.  While that's internally consistent, as you point out, can you
> > push your new commit elsewhere without pushing the replace refs too?
>
> I think the change to "branch --contains" would be an improvement
> whether you actually have deadbeef or not, but in any case, defining

Great, we agree that branch --contains can be improved.

> (eh, rather, being able to define) a replacement for something you
> do not have is the ultimate source of the problem.  And that "bug"
> has not very much specific to how "branch --contains" should behave.

Okay, perhaps you consider the ability to create a replacement for an
object that doesn't exist as a bug.  How do we handle this bug,
though?  That's not at all clear to me.  Is it documentation updates?
More error messages from various commands?  Code changes to handle
these cases better?  Something else?  This behavior has been allowed
for over a decade, the refs can be created outside of "git replace"
and the replacement mechanism comes with
* documentation claiming there are only two restrictions on
replacement or replaced refs, both bypassable[1]
* documentation claims of robustness for the replace mechanism[2]
* documentation claiming that all non-reachability traversal related
commands will translate replacement refs to the real commit IDs[3]
* user-facing UI to support creating/updating/deleting replace refs

Based on the above, filter-repo has been creating replacement refs for
years now, one for every commit in the repository that wasn't filtered
out.  And I thought it all worked great without any problems, until
the recent report.  I guess I only ever used the old commit IDs to
pass to things like diff, log, etc., and I guess most users either
didn't use the replace refs at all or only used them that way
too...until this week. However, I'm planning to do a very big testcase
of history rewriting (ancient and huge binary blobs) at $DAYJOB later
this year including replace refs for some of the release-team folks,
so I'm kind of concerned about what I need to fix in git or at least
what I need to document.

[1] See the part up to "...There is no other restriction on the
replaced and replacement objects.", from git-replace(1)
[2] "Note that the grafts mechanism is outdated and can lead to
problems...see git-replace(1) for a more flexible and robust system",
from gitrepository-layout(1)
[3] "Replacement references will be used by default by all Git
commands except those doing reachability traversal", from
git-replace(1)

> > Why does `git branch` (in conjunction with one user deciding to fetch
> > replace refs) make it so easy to create a branch that cannot readily
> > be shared with others?
>
> In other words, I do not think it is "git branch" or "git checkout -b"
> that brought your repository into a broken state.  The "replace"
> mechanism may have room for improvement to avoid such a corruption.
>
> IIRC, the original "graft" mechanism did not even have any UI, so it
> was pretty much "you can graft any parent to any child, and if you
> break the repository you can keep both halves".  Now "replace" has a
> dedicated UI component in the form of "git replace" command, we should
> be able to teach it how to record replacement more safely.

I would like to avoid corruption, and I'm happy to have alternate
solutions.  I don't understand how this could be fixed in "git
replace", though, especially since "git replace" was not even invoked
in the real use case under consideration (git filter-repo uses "git
update-ref --stdin" to create the replace objects).

Are there other alternative ways to fix this outside of "git
branch"/"git checkout -b"?

If not, are there technical reasons that "git branch <branchname>
<replace-hash>" and other commands like it could not be adjusted to
write the replacement hash rather than the replaced hash to the new
branch?

Even if there are no alternative fixes and there are technical reasons
that "git branch" and other commands like it cannot be adjusted, I
still feel that translate-old-commit-hashes via
replacement-of-non-existent-commits feature is valuable enough to keep
in filter-repo anyway.  It'd make me a little uneasy, but I'd continue
creating the replace refs and just document the drawbacks in that
scenario.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-03-31  0:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-30  6:05 Bug report: git branch behaves as if --no-replace-objects is passed Elijah Newren
2021-03-30  7:02 ` Jeff King
2021-03-30 18:58   ` Junio C Hamano
2021-03-30 21:19     ` Elijah Newren
2021-03-30 21:30       ` Elijah Newren
2021-03-30 21:59         ` Junio C Hamano
2021-03-30 21:53       ` Junio C Hamano
2021-03-30 22:43         ` Elijah Newren
2021-03-30 23:04           ` Junio C Hamano
2021-03-31  0:32             ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).