git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Bad refspec messes up bundle.
@ 2018-03-19  8:39 Luciano Joublanc
  2018-03-19 17:36 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Luciano Joublanc @ 2018-03-19  8:39 UTC (permalink / raw)
  To: git

Yesterday I created a git bundle as best as I can remember like this

git bundle save chunk chunk.bundle --all master

Note the 'master' I added accidentally at the end - this was a user
error but still the bundle was created.

When I tried to clone this, I get

~\local\src> git clone 'G:\My Drive\chunk.bundle' fs2-columns
Cloning into 'fs2-columns'...
Receiving objects: 100% (31/31), done.
Resolving deltas: 100% (5/5), done.
fatal: multiple updates for ref 'refs/remotes/origin/master' not allowed.
~\local\src> git bundle verify chunk.bundle
The bundle contains these 3 refs:
3c804437a5f8537db1bfb5d09b7bff4f9950605e refs/heads/master
3c804437a5f8537db1bfb5d09b7bff4f9950605e HEAD
3c804437a5f8537db1bfb5d09b7bff4f9950605e refs/heads/master
The bundle records a complete history.
chunk.bundle is okay

After trying a couple of things, I finally managed to clone it using

~\local\src> git clone -b master --single-branch .\chunk.bundle fs2-columns

i.e. the '--single-branch' option saved me.

Is this a bug? Should bundle allow providing multiple refspecs when
`--all` is provided? I admit this was clearly a case of 'caveat
emptor', but shouldn't this be disallowed (i.e. is there any situation
when this is useful?)

Thanks!

Luciano

-- 
This message is intended only for the personal and confidential use of the 
designated recipient(s) named above.  If you are not the intended recipient 
of this message you are hereby notified that any review, dissemination, 
distribution or copying of this message is strictly prohibited.  This 
communication is for information purposes only and should not be regarded 
as an offer to sell or as a solicitation of an offer to buy any financial 
product, an official confirmation of any transaction, or as an official 
statement of the Dinosaur Group.  Email transmission cannot be guaranteed 
to be secure or error-free.  Therefore, we do not represent that this 
information is complete or accurate and it should not be relied upon as 
such.  All information is subject to change without notice.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bad refspec messes up bundle.
  2018-03-19  8:39 Bad refspec messes up bundle Luciano Joublanc
@ 2018-03-19 17:36 ` Junio C Hamano
  2018-03-30 10:20   ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2018-03-19 17:36 UTC (permalink / raw)
  To: Luciano Joublanc; +Cc: git

Luciano Joublanc <ljoublanc@dinogroup.eu> writes:

> Yesterday I created a git bundle as best as I can remember like this
>
> git bundle save chunk chunk.bundle --all master
>
> Note the 'master' I added accidentally at the end - this was a user
> error but still the bundle was created.
>
> When I tried to clone this, I get
>
> ~\local\src> git clone 'G:\My Drive\chunk.bundle' fs2-columns
> Cloning into 'fs2-columns'...
> Receiving objects: 100% (31/31), done.
> Resolving deltas: 100% (5/5), done.
> fatal: multiple updates for ref 'refs/remotes/origin/master' not allowed.
> ~\local\src> git bundle verify chunk.bundle
> The bundle contains these 3 refs:
> 3c804437a5f8537db1bfb5d09b7bff4f9950605e refs/heads/master
> 3c804437a5f8537db1bfb5d09b7bff4f9950605e HEAD
> 3c804437a5f8537db1bfb5d09b7bff4f9950605e refs/heads/master
> The bundle records a complete history.
> chunk.bundle is okay
>
> After trying a couple of things, I finally managed to clone it using
>
> ~\local\src> git clone -b master --single-branch .\chunk.bundle fs2-columns
>
> i.e. the '--single-branch' option saved me.
>
> Is this a bug? Should bundle allow providing multiple refspecs when
> `--all` is provided? I admit this was clearly a case of 'caveat
> emptor', but shouldn't this be disallowed (i.e. is there any situation
> when this is useful?)

Thanks for a report.

Just like a remote repository that reports the same ref more than
once in its initial advertisement (i.e. "git ls-remote $remote"
gives duplicate entries), a bundle file that records the same ref
more than once *is* a bug, I would think.

A "git bundle create" command that creates such a bundle file
shouldn't.  It is not very useful to diagnose it as an error; it
probably makes more sense to dedup the refs instead when writing the
bundle file.  Of course, we should abort with an error *if* the code
ever tries to store the same ref twice with different object name
(i.e. attempt to dedup, in vain).

Also, "git clone" from such a bundle file (or for that matter, a
remote repository that advertises the same ref twice) probably
should do a similar deduping, with a warning message.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bad refspec messes up bundle.
  2018-03-19 17:36 ` Junio C Hamano
@ 2018-03-30 10:20   ` Johannes Schindelin
  2018-03-30 17:18     ` Junio C Hamano
  2018-03-31  8:50     ` Luciano Joublanc
  0 siblings, 2 replies; 7+ messages in thread
From: Johannes Schindelin @ 2018-03-30 10:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Luciano Joublanc, git

Hi Luciano,

On Mon, 19 Mar 2018, Junio C Hamano wrote:

> Luciano Joublanc <ljoublanc@dinogroup.eu> writes:
> 
> > Yesterday I created a git bundle as best as I can remember like this
> >
> > git bundle save chunk chunk.bundle --all master
> >
> > Note the 'master' I added accidentally at the end - this was a user
> > error but still the bundle was created.
> >
> > When I tried to clone this, I get
> >
> > ~\local\src> git clone 'G:\My Drive\chunk.bundle' fs2-columns
> > Cloning into 'fs2-columns'...
> > Receiving objects: 100% (31/31), done.
> > Resolving deltas: 100% (5/5), done.
> > fatal: multiple updates for ref 'refs/remotes/origin/master' not allowed.
> > ~\local\src> git bundle verify chunk.bundle
> > The bundle contains these 3 refs:
> > 3c804437a5f8537db1bfb5d09b7bff4f9950605e refs/heads/master
> > 3c804437a5f8537db1bfb5d09b7bff4f9950605e HEAD
> > 3c804437a5f8537db1bfb5d09b7bff4f9950605e refs/heads/master
> > The bundle records a complete history.
> > chunk.bundle is okay
> >
> > After trying a couple of things, I finally managed to clone it using
> >
> > ~\local\src> git clone -b master --single-branch .\chunk.bundle fs2-columns
> >
> > i.e. the '--single-branch' option saved me.
> >
> > Is this a bug? Should bundle allow providing multiple refspecs when
> > `--all` is provided? I admit this was clearly a case of 'caveat
> > emptor', but shouldn't this be disallowed (i.e. is there any situation
> > when this is useful?)
> 
> Thanks for a report.
> 
> Just like a remote repository that reports the same ref more than
> once in its initial advertisement (i.e. "git ls-remote $remote"
> gives duplicate entries), a bundle file that records the same ref
> more than once *is* a bug, I would think.
> 
> A "git bundle create" command that creates such a bundle file
> shouldn't.  It is not very useful to diagnose it as an error; it
> probably makes more sense to dedup the refs instead when writing the
> bundle file.  Of course, we should abort with an error *if* the code
> ever tries to store the same ref twice with different object name
> (i.e. attempt to dedup, in vain).
> 
> Also, "git clone" from such a bundle file (or for that matter, a
> remote repository that advertises the same ref twice) probably
> should do a similar deduping, with a warning message.

I agree that it is a bug if a bundle file records a ref multiple times.
Luciano, here are some pointers so you can fix it:

- probably the best way to start would be to add a new test case to
  t/t5607-clone-bundle.sh. The script *should* be relatively easy to
  understand and imitate. The new test case would probably look somewhat
  like this:

	test_expect_failure 'bundles must not contain multiple refs' '
		git bundle create all.bundle --all master &&
		grep master all.bundle >master.lines &&
		test_line_count = 1 master.lines
	'

- then, ensure that this test passes and reports this correctly as a known
  breakage.

- now it is time to fix the actual bug. The code in question is the
  write_bundle_refs() function in bundle.c (careful, there are two files
  of that name in Git's source code, one in the builtin/ subdirectory, the
  other one in the toplevel directory, you will want to look at the
  latter one).

- notice that there is already a construct in the loop over the pending
  refs where some are skipped:

		if (e->item->flags & UNINTERESTING)
			continue;

  The "uninteresting" refs would be those on the excluded end of a commit
  range, e.g. if you called `git bundle create x.bundle master..next`, the
  `master` ref would be such an "uninteresting" ref.

- one might be tempted to introduce another flag and set it once a ref was
  shown and skip a ref for which that flag is set. And indeed, this was my
  own first thought!

  However, this would be incorrect, as the flags are stored with the
  *commit*, not with the ref. So if two refs point to the same commit,
  that new code would skip the second one by mistake!

  By the way, this makes me think that there is another very real bug in
  the bundle code, in the part I showed above. Suppose you have a `master`
  and a `next` ref, and both point at the same commit, then you would want
  `git bundle create next.bundle master..next` to list `next`, don't you
  think? This one would be a lot harder to fix, though, as the
  setup_revisions() function really only sets the flag on the respective
  commits because it wants to set things up for revision walking, not
  necessarily to list "interesting" refs. A possible solution might be to
  abuse the `mode` field in the object_array (which is the type of the
  `pending` field in the `rev_info` struct where the parsed command-line
  parameters are held, when that `mode` makes only sense for blobs and
  we store refs pointing to commits or tags).

  That's a side-track, though.

- most likely, the best way to avoid duplicate refs entries is to use the
  actual ref name and put it into a hash set. Luckily, we do have code
  for this, and examples how to use it, too. See e.g. fc65b00da7e
  (merge-recursive: change current file dir string_lists to hashmap,
  2017-09-07). So you would define something like

	struct ref_name_entry {
		struct hashmap_entry e; /* must be first field */
		char ref_name[FLEX_ARRAY];
	}

  and implement the cmp function similar to the path_hashmap_cmp() in
  merge-recursive.c, then initialize a hashset in write_bundle_refs(),
  add handled ref names via

	FLEX_ALLOC_STR(entry, ref_name, e->name);
	hashmap_entry_init(entry, strhash(e->name));
	hashmap_add(&shown_refs, entry);

  and look up whether a ref should be skipped via

	if (hashmap_get_from_hash(&shown_refs, strhash(e->name), e->name))
		continue;

  Oh, and before returning from write_bundle_refs(), the hashset
  (including its entries) should be released via

	hashmap_free(&shown_refs, 1);


These hints should get you started on this project.

Looking forward to your contribution,
Johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bad refspec messes up bundle.
  2018-03-30 10:20   ` Johannes Schindelin
@ 2018-03-30 17:18     ` Junio C Hamano
  2018-03-30 18:58       ` Johannes Schindelin
  2018-03-31  8:50     ` Luciano Joublanc
  1 sibling, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2018-03-30 17:18 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Luciano Joublanc, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Luciano,
>
>> > Is this a bug? Should bundle allow providing multiple refspecs when
> ...
> I agree that it is a bug if a bundle file records a ref multiple times.
> Luciano, here are some pointers so you can fix it:
>
> - probably the best way to start would be to add a new test case to
>   t/t5607-clone-bundle.sh. The script *should* be relatively easy to
>   understand and imitate. The new test case would probably look somewhat
>   like this:
>
> 	test_expect_failure 'bundles must not contain multiple refs' '

s/multiple/duplicate/.  It is not unusual for a bundle to record
more than one ref; it is (1) useless and harmful to unsuspecting
clients to record the same ref twice with the same value and (2)
nonesnse to record the same ref twice with different value.

Other than that, the outline seems to go in the right general
direction.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bad refspec messes up bundle.
  2018-03-30 17:18     ` Junio C Hamano
@ 2018-03-30 18:58       ` Johannes Schindelin
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2018-03-30 18:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Luciano Joublanc, git

Hi Junio,

On Fri, 30 Mar 2018, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> > Is this a bug? Should bundle allow providing multiple refspecs when
> > ...
> > I agree that it is a bug if a bundle file records a ref multiple
> > times.  Luciano, here are some pointers so you can fix it:
> >
> > - probably the best way to start would be to add a new test case to
> >   t/t5607-clone-bundle.sh. The script *should* be relatively easy to
> >   understand and imitate. The new test case would probably look somewhat
> >   like this:
> >
> > 	test_expect_failure 'bundles must not contain multiple refs' '
> 
> s/multiple/duplicate/.  It is not unusual for a bundle to record
> more than one ref; it is (1) useless and harmful to unsuspecting
> clients to record the same ref twice with the same value and (2)
> nonesnse to record the same ref twice with different value.

Of course! Thanks for pointing this out.

> Other than that, the outline seems to go in the right general
> direction.

Excellent.

Luciano, the ball is in your court now. If you get stuck with anything
(such as getting started with building Git), please let us know. We can
help.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bad refspec messes up bundle.
  2018-03-30 10:20   ` Johannes Schindelin
  2018-03-30 17:18     ` Junio C Hamano
@ 2018-03-31  8:50     ` Luciano Joublanc
  2018-04-03 14:38       ` Johannes Schindelin
  1 sibling, 1 reply; 7+ messages in thread
From: Luciano Joublanc @ 2018-03-31  8:50 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

Hi Johannes,

With such a comprehensive reply, I would feel guilty not making a
contribution now :) Be forewarned though, It's been about ten years
since I wrote anything in `C`!

I've cloned the `maint` branch, built the project, and added the test
as you suggested - it's failing as expected.

I'm somewhat confused though. I think it's m limited understanding of
'ref' and 'commit'.

On 30 March 2018 at 11:20, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
>
>   However, this would be incorrect, as the flags are stored with the
>   *commit*, not with the ref. So if two refs point to the same commit,
>   that new code would skip the second one by mistake!


Isn't that the point here? to deduplicate commits?  My limited
understanding is that at a 'ref' is like an alias or pointer to a
commit.

>
>   By the way, this makes me think that there is another very real bug in
>   the bundle code, in the part I showed above. Suppose you have a `master`
>   and a `next` ref, and both point at the same commit, then you would want
>   `git bundle create next.bundle master..next` to list `next`, don't you
>   think?


Doesn't this contradict what you just said, that we don't want to skip
refs with the same commit #?

In fact, if you look in the calling function, there is a
`    object_array_remove_duplicates(&revs.pending);`
Which to the best of my understanding removes duplicate refs (not
commits). However, I suspect this doesn't cover the `--all` case as
it's a switch rather than a revspec? Would that be right?

>
>
> - most likely, the best way to avoid duplicate refs entries is to use the
>   actual ref name and put it into a hash set. Luckily, we do have code
>   for this, and examples how to use it, too. See e.g. fc65b00da7e
>   (merge-recursive: change current file dir string_lists to hashmap,
>   2017-09-07). So you would define something like
>

Separately, if I do end up including the hashmap code, it should be
refactored out into it's own file, right?

Thanks again,

Luciano

-- 
This message is intended only for the personal and confidential use of the 
designated recipient(s) named above.  If you are not the intended recipient 
of this message you are hereby notified that any review, dissemination, 
distribution or copying of this message is strictly prohibited.  This 
communication is for information purposes only and should not be regarded 
as an offer to sell or as a solicitation of an offer to buy any financial 
product, an official confirmation of any transaction, or as an official 
statement of the Dinosaur Group.  Email transmission cannot be guaranteed 
to be secure or error-free.  Therefore, we do not represent that this 
information is complete or accurate and it should not be relied upon as 
such.  All information is subject to change without notice.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Bad refspec messes up bundle.
  2018-03-31  8:50     ` Luciano Joublanc
@ 2018-04-03 14:38       ` Johannes Schindelin
  0 siblings, 0 replies; 7+ messages in thread
From: Johannes Schindelin @ 2018-04-03 14:38 UTC (permalink / raw)
  To: Luciano Joublanc; +Cc: Junio C Hamano, git

Hi Luciano,

On Sat, 31 Mar 2018, Luciano Joublanc wrote:

> I've cloned the `maint` branch, built the project, and added the test
> as you suggested - it's failing as expected.

Excellent.

> On 30 March 2018 at 11:20, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> >   However, this would be incorrect, as the flags are stored with the
> >   *commit*, not with the ref. So if two refs point to the same commit,
> >   that new code would skip the second one by mistake!
> 
> Isn't that the point here? to deduplicate commits?  My limited
> understanding is that at a 'ref' is like an alias or pointer to a
> commit.

The point is to deduplicate refs, not commits ;-)

Imagine that you have a git.git clone, and then you work on some topic,
say, `bundle-refs` and then call `git checkout -b bundle-refs-wip` from
there. If you now say

	git bundle create wip.bundle bundle-refs bundle-refs-wip

you will want both bundle-refs and bundle-refs-wip to show up in that
bundle, not just bundle-refs, even if both refs point at the same commit.

> >   By the way, this makes me think that there is another very real bug
> >   in the bundle code, in the part I showed above. Suppose you have a
> >   `master` and a `next` ref, and both point at the same commit, then
> >   you would want `git bundle create next.bundle master..next` to list
> >   `next`, don't you think?
> 
> Doesn't this contradict what you just said, that we don't want to skip
> refs with the same commit #?

I would rather be able to generate such a wip.bundle as outlined above
where calling `git ls-remote wip.bundle` would list *both* refs, with the
same commit.

> In fact, if you look in the calling function, there is a
> `    object_array_remove_duplicates(&revs.pending);`
> Which to the best of my understanding removes duplicate refs (not
> commits). However, I suspect this doesn't cover the `--all` case as
> it's a switch rather than a revspec? Would that be right?

Oh, I missed that!

And I also missed that this is implemented with something *else* than a
hashmap, so it won't have linear complexity but instead quadratic. Gross.

But you got an interesting nugget there, as it indeed tries to
deduplicate, but not by object ID, otherwise the bug you reported would
not occur (but other bugs, as I outlined above).

Instead, the object_array_remove_duplicates() code does this:

-- snip --
void object_array_remove_duplicates(struct object_array *array)
{
        unsigned nr = array->nr, src;
        struct object_array_entry *objects = array->objects;

        array->nr = 0;
        for (src = 0; src < nr; src++) {
                if (!contains_name(array, objects[src].name)) {
                        if (src != array->nr)
                                objects[array->nr] = objects[src];
                        array->nr++;
                } else {
                        object_array_release_entry(&objects[src]);
                }
        }
}
-- snap --

And indeed, the `contains_name()` function iterates through all of the
re-added entries and compares the *name*.

Running this in a debugger shows the culprit, too: there is a
`refs/heads/master`, a `HEAD` and a `master`. Note how the last entry
(which was taken directly from the command-line arguments) lacks the
`refs/heads/` prefix? *That* is the culprit...

> > - most likely, the best way to avoid duplicate refs entries is to use the
> >   actual ref name and put it into a hash set. Luckily, we do have code
> >   for this, and examples how to use it, too. See e.g. fc65b00da7e
> >   (merge-recursive: change current file dir string_lists to hashmap,
> >   2017-09-07). So you would define something like
> 
> Separately, if I do end up including the hashmap code, it should be
> refactored out into it's own file, right?

I do not think that is necessary. Personally, I'd just add the hashmap as
local variable to `write_bundle_refs()`, initialize it before the loop, add
the struct for the hashmap entry and the _cmp function as file-local (i.e.
`static`) function before `write_bundle_refs()`, then add all shown refs
(as stored in the `display_ref` variable) to the hashmap, and add another
conditional `goto skip_write_ref` after all the others, contingent on
`display_ref` *not* being found in the hashmap via
`hashmap_get_from_hash(&displayed_refs, strhash(display_ref),
display_ref)`.

In the same run, I would remove that `object_array_remove_duplicates()`
function altogether, as its only caller is now no longer necessary.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-04-03 14:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-19  8:39 Bad refspec messes up bundle Luciano Joublanc
2018-03-19 17:36 ` Junio C Hamano
2018-03-30 10:20   ` Johannes Schindelin
2018-03-30 17:18     ` Junio C Hamano
2018-03-30 18:58       ` Johannes Schindelin
2018-03-31  8:50     ` Luciano Joublanc
2018-04-03 14:38       ` Johannes Schindelin

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git