git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] pull: do not segfault when HEAD refers to missing object file
@ 2017-03-05 23:42 André Laszlo
  2017-03-05 23:52 ` brian m. carlson
  2017-03-06  3:46 ` Jeff King
  0 siblings, 2 replies; 6+ messages in thread
From: André Laszlo @ 2017-03-05 23:42 UTC (permalink / raw)
  To: git
  Cc: Nguyễn Thái Ngọc Duy, brian m . carlson,
	René Scharfe, Johannes Schindelin, Jeff King,
	André Laszlo

git pull --rebase on a corrupted HEAD used to segfault; it has been
corrected to error out with a message. A test has also been added to
verify this fix.

Signed-off-by: André Laszlo <andre@laszlo.nu>
---

Notes:
    When add_head_to_pending fails to add a pending object, git pull
    --rebase segfaults. This can happen if HEAD is referring to a corrupt
    or missing object.
    
    I discovered this segfault on my machine after pulling a repo from
    GitHub, but I have been unable to reproduce the sequence of events
    that lead to the corrupted HEAD (I think it may have been caused by a
    lost network connection in my case).
    
    The following commands use add_head_to_pending:
    
    format-patch  setup_revisions before add_head_to_pending
    diff          checks rev.pending.nr
    shortlog      checks rev.pending.nr
    log           uses resolve_ref_unsafe
    
    All of the above return an error code of 128 and print "fatal: bad
    object HEAD" instead of segfaulting, which I think is correct
    behavior. The check and error message have been added to
    has_uncommitted_changes, where they were missing, as well as to
    diff-lib.c (without the error message).

 diff-lib.c      |  2 +-
 t/t5520-pull.sh | 12 ++++++++++++
 wt-status.c     |  5 +++++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/diff-lib.c b/diff-lib.c
index 52447466b..9d26b18c3 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -512,7 +512,7 @@ int run_diff_index(struct rev_info *revs, int cached)
 	struct object_array_entry *ent;
 
 	ent = revs->pending.objects;
-	if (diff_cache(revs, ent->item->oid.hash, ent->name, cached))
+	if (!ent || diff_cache(revs, ent->item->oid.hash, ent->name, cached))
 		exit(128);
 
 	diff_set_mnemonic_prefix(&revs->diffopt, "c/", cached ? "i/" : "w/");
diff --git a/t/t5520-pull.sh b/t/t5520-pull.sh
index 17f4d0fe4..1edb6a97a 100755
--- a/t/t5520-pull.sh
+++ b/t/t5520-pull.sh
@@ -664,4 +664,16 @@ test_expect_success 'git pull --rebase against local branch' '
 	test file = "$(cat file2)"
 '
 
+test_expect_success 'git pull --rebase with corrupt HEAD does not segfault' '
+	mkdir corrupted &&
+	(cd corrupted &&
+	git init &&
+	echo one >file && git add file &&
+	git commit -m one &&
+	REV=$(git rev-parse HEAD) &&
+	rm -f .git/objects/${REV:0:2}/${REV:2} &&
+	test_expect_code 128 git pull --rebase > /dev/null
+	)
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index d47012048..3d60eaff5 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -2252,6 +2252,11 @@ int has_uncommitted_changes(int ignore_submodules)
 		DIFF_OPT_SET(&rev_info.diffopt, IGNORE_SUBMODULES);
 	DIFF_OPT_SET(&rev_info.diffopt, QUICK);
 	add_head_to_pending(&rev_info);
+
+	/* The add_head_to_pending call might not have added anything. */
+	if (!rev_info.pending.nr)
+		die("bad object %s", "HEAD");
+
 	diff_setup_done(&rev_info.diffopt);
 	result = run_diff_index(&rev_info, 1);
 	return diff_result_code(&rev_info.diffopt, result);
-- 
2.12.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] pull: do not segfault when HEAD refers to missing object file
  2017-03-05 23:42 [PATCH] pull: do not segfault when HEAD refers to missing object file André Laszlo
@ 2017-03-05 23:52 ` brian m. carlson
  2017-03-06  3:51   ` Jeff King
  2017-03-06  3:46 ` Jeff King
  1 sibling, 1 reply; 6+ messages in thread
From: brian m. carlson @ 2017-03-05 23:52 UTC (permalink / raw)
  To: André Laszlo
  Cc: git, Nguyễn Thái Ngọc Duy, René Scharfe,
	Johannes Schindelin, Jeff King

[-- Attachment #1: Type: text/plain, Size: 780 bytes --]

On Mon, Mar 06, 2017 at 12:42:22AM +0100, André Laszlo wrote:
> +test_expect_success 'git pull --rebase with corrupt HEAD does not segfault' '
> +	mkdir corrupted &&
> +	(cd corrupted &&
> +	git init &&
> +	echo one >file && git add file &&
> +	git commit -m one &&
> +	REV=$(git rev-parse HEAD) &&
> +	rm -f .git/objects/${REV:0:2}/${REV:2} &&

I think this is a bashism.  On dash, I get the following:

  genre ok % dash -c 'foo=abcdefg; echo ${foo:0:2}; echo ${foo:2}'
  dash: 1: Bad substitution

> +	test_expect_code 128 git pull --rebase > /dev/null
> +	)
> +'
> +
>  test_done
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pull: do not segfault when HEAD refers to missing object file
  2017-03-05 23:42 [PATCH] pull: do not segfault when HEAD refers to missing object file André Laszlo
  2017-03-05 23:52 ` brian m. carlson
@ 2017-03-06  3:46 ` Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-03-06  3:46 UTC (permalink / raw)
  To: André Laszlo
  Cc: git, Nguyễn Thái Ngọc Duy, brian m . carlson,
	René Scharfe, Johannes Schindelin

On Mon, Mar 06, 2017 at 12:42:22AM +0100, André Laszlo wrote:

> git pull --rebase on a corrupted HEAD used to segfault;it has been
> corrected to error out with a message. A test has also been added to
> verify this fix.

Your commit message mostly tells us the "what" that we can see from the
diff. But why are we segfaulting? What assumption is being violated, and
why is the fix the right thing?

You've stuck some of the details in your notes, and they should probably
make it into the commit message.

>     When add_head_to_pending fails to add a pending object, git pull
>     --rebase segfaults. This can happen if HEAD is referring to a corrupt
>     or missing object.

The other obvious time when HEAD is not valid is when you are on an
unborn branch. So we should also consider how such a case interacts with
the callsites you are touching.

I think it is primarily this hunk:

> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -2252,6 +2252,11 @@ int has_uncommitted_changes(int ignore_submodules)
>  		DIFF_OPT_SET(&rev_info.diffopt, IGNORE_SUBMODULES);
>  	DIFF_OPT_SET(&rev_info.diffopt, QUICK);
>  	add_head_to_pending(&rev_info);
> +
> +	/* The add_head_to_pending call might not have added anything. */
> +	if (!rev_info.pending.nr)
> +		die("bad object %s", "HEAD");
> +
>  	diff_setup_done(&rev_info.diffopt);
>  	result = run_diff_index(&rev_info, 1);
>  	return diff_result_code(&rev_info.diffopt, result);

that is the fix. We assume that add_head_to_pending() put something into
rev_info.pending, but it might not.

In the case of corruption, "bad object HEAD" is a reasonable outcome.

In the case of an unborn branch, we'd probably want to compare against
the empty tree. This trick is used elsewhere in wt-status.c, as in
wt_status_collect_changes_index().

I'm not sure if we need to deal with that here or not. I wasn't able to
trigger this code with an unborn branch. If you have no index, then the
is_cache_unborn() check triggers earlier in the function. If you do have
an index, then we have an earlier check:

  if (is_null_sha1(orig_head) && !is_cache_unborn())
	die(_("Updating an unborn branch with changes added to the index."));

that covers this case. I am not 100% sure that check is correct, though.
If I do:

  git init
  echo content >foo
  git add foo
  git rm -f foo

then I have an index which is not "unborn", but also does not contain
any changes. I think the conditional above should just be checking
"active_nr". If we were to fix that, then I think
has_uncommitted_changes() would need to be adjusted as well.

I admit that this is probably an absurd corner case.  So I think I am OK
with your fix for now, and we can revisit the logic later if anybody
starts to care.

>     I discovered this segfault on my machine after pulling a repo from
>     GitHub, but I have been unable to reproduce the sequence of events
>     that lead to the corrupted HEAD (I think it may have been caused by a
>     lost network connection in my case).

Yikes. That should never lead to a corrupted HEAD (unless you are on a
networked filesystem). I can't think of another read add_head_to_pending
would fail, though, aside from a missing or corrupted object.

Arguably add_head_to_pending() should die loudly if it can't parse the
object pointed to by HEAD, rather than quietly returning without adding
anything.

> diff --git a/diff-lib.c b/diff-lib.c
> index 52447466b..9d26b18c3 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -512,7 +512,7 @@ int run_diff_index(struct rev_info *revs, int cached)
>  	struct object_array_entry *ent;
>  
>  	ent = revs->pending.objects;
> -	if (diff_cache(revs, ent->item->oid.hash, ent->name, cached))
> +	if (!ent || diff_cache(revs, ent->item->oid.hash, ent->name, cached))
>  		exit(128);

So if I understand correctly, this hunk should not trigger anymore,
because we would never call run_diff_index() without something in
pending.

I think it would be a programming error elsewhere to do so, and we
should treat this as an assertion by complaining loudly (for this and
any other confusing cases):

  if (revs->pending.nr != 1)
          die("BUG: run_diff_index requires exactly one rev (got %d)",
	      revs->pending.nr);

Lastly, I think this is your first patch. So welcome, and thank you. :)
I know there was a lot of discussion and critique above, but I think
overall your patch is going in the right direction.

-Peff


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pull: do not segfault when HEAD refers to missing object file
  2017-03-05 23:52 ` brian m. carlson
@ 2017-03-06  3:51   ` Jeff King
  2017-03-06  6:52     ` Johannes Sixt
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2017-03-06  3:51 UTC (permalink / raw)
  To: brian m. carlson, André Laszlo, git,
	Nguyễn Thái Ngọc Duy, René Scharfe,
	Johannes Schindelin

On Sun, Mar 05, 2017 at 11:52:22PM +0000, brian m. carlson wrote:

> On Mon, Mar 06, 2017 at 12:42:22AM +0100, André Laszlo wrote:
> > +test_expect_success 'git pull --rebase with corrupt HEAD does not segfault' '
> > +	mkdir corrupted &&
> > +	(cd corrupted &&
> > +	git init &&
> > +	echo one >file && git add file &&
> > +	git commit -m one &&
> > +	REV=$(git rev-parse HEAD) &&
> > +	rm -f .git/objects/${REV:0:2}/${REV:2} &&
> 
> I think this is a bashism.  On dash, I get the following:
> 
>   genre ok % dash -c 'foo=abcdefg; echo ${foo:0:2}; echo ${foo:2}'
>   dash: 1: Bad substitution

Yeah, it is. You can do it easily with 'sed', of course, but if you want
to avoid the extra process and do it in pure shell, it's more like:

  last38=${REV#??}
  first2=${REV%$last38}
  rm -f .git/objects/$first2/$last38

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pull: do not segfault when HEAD refers to missing object file
  2017-03-06  3:51   ` Jeff King
@ 2017-03-06  6:52     ` Johannes Sixt
  2017-03-06  7:33       ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Sixt @ 2017-03-06  6:52 UTC (permalink / raw)
  To: Jeff King, André Laszlo
  Cc: brian m. carlson, git, Nguyễn Thái Ngọc Duy,
	René Scharfe, Johannes Schindelin

Am 06.03.2017 um 04:51 schrieb Jeff King:
> On Sun, Mar 05, 2017 at 11:52:22PM +0000, brian m. carlson wrote:
>
>> On Mon, Mar 06, 2017 at 12:42:22AM +0100, André Laszlo wrote:
>>> +test_expect_success 'git pull --rebase with corrupt HEAD does not segfault' '
>>> +	mkdir corrupted &&
>>> +	(cd corrupted &&

We usally indent this like so:

	(
		cd corrupted &&
		echo one >file &&
		git add file &&
...
	) &&

>>> +	git init &&
>>> +	echo one >file && git add file &&
>>> +	git commit -m one &&
>>> +	REV=$(git rev-parse HEAD) &&
>>> +	rm -f .git/objects/${REV:0:2}/${REV:2} &&
>>
>> I think this is a bashism.  On dash, I get the following:
>>
>>   genre ok % dash -c 'foo=abcdefg; echo ${foo:0:2}; echo ${foo:2}'
>>   dash: 1: Bad substitution
>
> Yeah, it is. You can do it easily with 'sed', of course, but if you want
> to avoid the extra process and do it in pure shell, it's more like:
>
>   last38=${REV#??}
>   first2=${REV%$last38}
>   rm -f .git/objects/$first2/$last38

Is it "HEAD points to non-existent object" or ".git/HEAD contains junk"? 
In both cases there are simpler solutions than to remove an object. For 
example, `echo "$_x40" >.git/HEAD` or `echo "this is junk" >.git/HEAD`?

-- Hannes


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pull: do not segfault when HEAD refers to missing object file
  2017-03-06  6:52     ` Johannes Sixt
@ 2017-03-06  7:33       ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-03-06  7:33 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: André Laszlo, brian m. carlson, git,
	Nguyễn Thái Ngọc Duy, René Scharfe,
	Johannes Schindelin

On Mon, Mar 06, 2017 at 07:52:10AM +0100, Johannes Sixt wrote:

> > Yeah, it is. You can do it easily with 'sed', of course, but if you want
> > to avoid the extra process and do it in pure shell, it's more like:
> > 
> >   last38=${REV#??}
> >   first2=${REV%$last38}
> >   rm -f .git/objects/$first2/$last38
> 
> Is it "HEAD points to non-existent object" or ".git/HEAD contains junk"? In
> both cases there are simpler solutions than to remove an object. For
> example, `echo "$_x40" >.git/HEAD` or `echo "this is junk" >.git/HEAD`?

Good point. Unfortunately, it's a little tricky. You can't use "this is
junk" because then git does not recognize it as a git repo at all. You
can't use $_x40 because the null sha1 is used internally to signal an
unborn branch, so we mix it up with that case.

Something like "111..." for 40 characters would work, though at that
point I think just simulating an actual corruption might be less
convoluted.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-03-06  7:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-05 23:42 [PATCH] pull: do not segfault when HEAD refers to missing object file André Laszlo
2017-03-05 23:52 ` brian m. carlson
2017-03-06  3:51   ` Jeff King
2017-03-06  6:52     ` Johannes Sixt
2017-03-06  7:33       ` Jeff King
2017-03-06  3:46 ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).