Re: [PATCH v2] Avoid segfault and provide fallback when cloning invalid branch/tag

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Junio C Hamano <gitster@pobox.com>
To: "Sohom Datta via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Sohom Datta <sohom.datta@learner.manipal.edu>
Subject: Re: [PATCH v2] Avoid segfault and provide fallback when cloning invalid branch/tag
Date: Wed, 04 Nov 2020 11:31:05 -0800	[thread overview]
Message-ID: <xmqq7dr0ncli.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <pull.906.v2.git.git.1604510485432.gitgitgadget@gmail.com> (Sohom Datta via GitGitGadget's message of "Wed, 04 Nov 2020 17:21:25 +0000")

"Sohom Datta via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Sohom Datta <sohom.datta@learner.manipal.edu>

> Subject: Re: [PATCH v2] Avoid segfault and provide fallback when cloning invalid branch/tag

cf. https://git-scm.com/docs/SubmittingPatches#describe-changes

e.g. Subject: [PATCH v2] clone: do not put a non-commit on branch head

or something like that?

>
> The Git command to clone a specific branch of a git repository always
> assumes that the commit pointed to by the branch are going to valid. It

s/are going to valid/is going to be valid/, or just "is valid".
Probably the latter is preferable.

> thus segfaults and crashes whenever a tag or a branch with a invalid
> commit is specified.
>
> Aborting the operation is not desirable since the user is left with no
> usable git folder to work with despite git having done most of the work

s/folder/repository/; not everything is Windows ;-)

Insert a comma after "work with" before "despite".

> in setting everything up.

The above description is not wrong per-se, but the most expensive
part of discarding the result at this poihnt would be the copying of
contents from the other side, not just setting things up locally,
no?

> This commit is based of David Berardi's commit at
> https://lore.kernel.org/git/20191103180716.GA72007@carpenter.lan/
>
> Make git fallback on creating a unborn master branch when encountering
> a invalid commit.
>
> Signed-off-by: Sohom Datta <sohom.datta@learner.manipal.edu>
> ---
>     Fix potential segfault on cloning invalid tag
>     
>     Changes since v1:
>     
>      * Reworked the patch to use the fallback approach based on feedback
>        from Jeff King.

Thanks.  Other than the nit pointed out above, the log message looks
readable.  Let's take a look at the code.

> diff --git a/builtin/clone.c b/builtin/clone.c
> index a0841923cf..53930f7536 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -711,10 +711,63 @@ static void update_remote_refs(const struct ref *refs,
>  	}
>  }
>  
> -static void update_head(const struct ref *our, const struct ref *remote,
> +static struct commit * commit_lookup_helper (const struct ref *our,
> +					      const struct ref *remote,
> +					      const char *msg,
> +					      int *err)

Style.  Remove SP around commit_lookup_helper on both sides.

foo_helper() is not particularly a meaningful name.  Let's read on
and see if we can come up with a better name.

> +{
> +	const struct object_id *tip = NULL;

This initialization is not strictly needed, no?  I'd rather see us
leave it uninitialized.

> +	struct commit *tip_commit =  NULL;

Excess SP before NULL.

> +	if (our)
> +		tip = &our->old_oid;
> +	else if (remote)
> +		tip = &remote->old_oid;
> +	else {
> +		/*
> +		 * We have no local branch requested with "-b", and the
> +		 * remote HEAD is unborn. There's nothing to update HEAD
> +		 * to, but this state is not an error.
> +		 */
> +		return NULL;
> +	}

OK, so tip points at the location that has the object name of the
object that is planned to be at the tip of the branch that is
checked out to the working tree.

> +	if ( !lookup_object(the_repository, tip)) {

Style. No SP after '(' before the condition.

> +		/**

Why double asterisk here, unlike the early return we saw earlier?
Let's lose the extra one.

> +		 * We have a object id associated with the tip of the branch

s/a object id/an object name/;

> +		 * but the object id doesn't resolve to a object. This will be
> +		 * handled downstream in update_ref().
> +		 */

The design to punt on a non-existing object at this point in the
callflow may be sound, but is !lookup_object() a reliable indication
that the object name refers to a non-existing object?

In other words, I do not see what guarantees that we have already
saw and parsed the named object and placed it in the hashtable
the_repository->parsed_objects->obj_hash[].

Is there a reason we should avoid using parse_object() here instead?

> +		return NULL;
> +	}
> +
> +	tip_commit = lookup_commit_reference_gently(the_repository, tip, 1);
> +	if (!tip_commit) {
> +		/*
> +		 * The given non-commit cannot be checked out,
> +		 * so have a 'master' branch and leave it unborn.
> +		 */

OK, the reasoning here is that we checked if tip refers to
non-existing object before we tried lookup_commit_reference_gently(),
so a failure to find a commit here must mean the object does exist
but is not a committish.  And in such a case, we decided to abort
the calling update_head() and create an unborn branch ourselves.

Perhaps makes sense, I need to think about it a bit more to be
certain, though.

> +		error(_("non-commit branch cannot be checked out."));
> +		create_symref("HEAD", "refs/heads/master" ,msg);

Style.  Wrong placement of a comma before 'msg'.

Post v2.29 codebase should ask what our default first branch name
should be.

> +		*err = -1;
> +		return NULL;

It is customary to use the returned value from error(), like so

		*status = error(_("..."));

that would save you one line.  Taking all the nits above together,
perhaps

	if (!tip_commit) {
		char *ref = xstrfmt("refs/heads/%s", git_default_branch_name());
		create_symref("HEAD", ref, msg);
		*err = error(_("..."));
		free(ref);
		return NULL;
	}

HOWEVER, I think it is much cleaner to lose the "we first check if
the object exists" part and directly see if tip points at a
committish first.  Only when the tip does not point at a committish,
we need to tell the two cases apart.  So after figuring out into
which ref the variable "tip" should point, we'd do something like:

	tip_commit = lookup_commit_refernece_gently(... tip, 1);
	if (tip_commit)
		return tip_commit;

	/*
	 * Now, "tip" is not a committish.  Does it even exist?
	 */
	if (!lookup_object(... tip))
		/* let update_ref() handle this error condition */
		return NULL;

	/*
	 * Give an error, but leave the default initial branch
	 * unborn, instead of failing the whole operation to cause
	 * the directory removed.
	 */
	... do the create_symref() thing ...
	return NULL;

Wouldn't it be cleaner?

> +	}
> +
> +	return tip_commit;
> +}
> +
> +static int update_head(const struct ref *our,
> +			const struct ref *remote,

If you did not do unnecessary rewrapping, the patch noise in the
function definition here would have been avoided, no?

>  			const char *msg)
>  {
>  	const char *head;
> +	int err = 0;
> +	const struct commit * c;

Style.  Lose SP between asterisk and c; in our codebase the asterisk
sticks to the identifier (not type).

Have a blank line here, between the end of decl and the first stmt
in the block.

> +	c = commit_lookup_helper(our, remote, msg, &err);
> +	if (err < 0) {
> +		return -1;
> +	}

Style. No {} around a single stmt.

More importantly, without a name that is more meaningful than
"commit_lookup_helper()" and without any comment near its definition
to describe what the function is supposed to compute and what it
considers an error, the reader of this callsite cannot tell what we
are calling the "helper" for, and under what condition we are told
by the helper to return an error to our callers.

With the analysis and rewrite of that _helper() function, I think a
more descriptive name for the helper should contain words like
"check/ensure" and "commit" in it.  And the "err" variables, both in
the caller and in the helper function, are somewhat misnamed.  It is
a signal from the callee to the caller that updating the branch head
has already been done by the callee, so the caller does not have to 
do anything further.  Perhaps

	int skip_update_ref = 0;
	const struct commit *c;

	c = check_tip_commit(our, remote, msg, &skip_update_ref);
	if (skip_update_ref)
		return -1; /* left a fallback unborn branch */

	... and then "c" is a commit (already peeled) usable for
	... pointing HEAD to at this point.

I'd suggest giving the "c" variable, whose lifespan is now much
longer than in the original code, a more meaningful name, though.

>  	if (our && skip_prefix(our->name, "refs/heads/", &head)) {
>  		/* Local default branch link */
>  		if (create_symref("HEAD", our->name, NULL) < 0)
> @@ -725,8 +778,6 @@ static void update_head(const struct ref *our, const struct ref *remote,
>  			install_branch_config(0, head, remote_name, our->name);
>  		}
>  	} else if (our) {
> -		struct commit *c = lookup_commit_reference(the_repository,
> -							   &our->old_oid);
>  		/* --branch specifies a non-branch (i.e. tags), detach HEAD */
>  		update_ref(msg, "HEAD", &c->object.oid, NULL, REF_NO_DEREF,
>  			   UPDATE_REFS_DIE_ON_ERR);
> @@ -739,6 +790,8 @@ static void update_head(const struct ref *our, const struct ref *remote,
>  		update_ref(msg, "HEAD", &remote->old_oid, NULL, REF_NO_DEREF,
>  			   UPDATE_REFS_DIE_ON_ERR);

Can't this cause discrepancy when the helper picked remote->old_oid
as the "tip" there?  We called lookup_commit_reference_gently() on
that oid, and decided to check out the resulting commit and point
HEAD at it, so we are certain that your "c" is a commit object, but
remote->old_oid may name an annotated tag that points at the commit
object.  Don't we want to use &c->object.old here as well?

If that is the case (you need to think about it yourself---I didn't
spend more than 5 minutes on this particular point myself and the
above might be total nonsense), this part would become

	if (our && our is in our local branch namespace) {
		... do the local thing ...
	} else if (our || remote) {
		update_ref(msg, "HEAD", &c->object.oid, NULL, ...);
	}

as the body for 'our' and 'remote' are the same.

>  	}
> +
> +	return 0;
>  }
>  
>  static int git_sparse_checkout_init(const char *repo)
> @@ -1346,7 +1399,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  			   branch_top.buf, reflog_msg.buf, transport,
>  			   !is_local);
>  
> -	update_head(our_head_points_at, remote_head, reflog_msg.buf);
> +	err = update_head(our_head_points_at, remote_head, reflog_msg.buf);
>  
>  	/*
>  	 * We want to show progress for recursive submodule clones iff
> @@ -1365,7 +1418,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	}
>  
>  	junk_mode = JUNK_LEAVE_REPO;
> -	err = checkout(submodule_progress);
> +	if ( err == 0 ) {
> +		/*
> +		 * Only try to checkout if we successfully updated HEAD; otherwise
> +		 * HEAD isn't pointing to the thing the user wanted.
> +		 */
> +		err = checkout(submodule_progress);
> +	}

The logic looks OK.  We avoid checking out an unborn if we left HEAD
unborn because there is no better choice.  And by not calling
checkout() and overwriting the err variable, we make sure that the
entire command still reports a failure.

I am not absolutely sure if reusing 'err' like the above two hunks
is the best thing to do, or the code becomes easier to follow if we
introduced another variable to receive the returned value from
update_head().  Perhaps what we have above is good enough.  I dunno.

> diff --git a/t/t5609-clone-branch.sh b/t/t5609-clone-branch.sh
> index 6e7a7be052..a589db9aa0 100755
> --- a/t/t5609-clone-branch.sh
> +++ b/t/t5609-clone-branch.sh
> @@ -20,6 +20,9 @@ test_expect_success 'setup' '
>  	 echo one >file && git add file && git commit -m one &&
>  	 git checkout -b two &&
>  	 echo two >file && git add file && git commit -m two &&
> +	 blob=$(git rev-parse HEAD:file) &&
> +	 echo $blob > .git/refs/heads/broken-tag &&
> +	 echo $blob > .git/refs/heads/broken-head &&

Style.  Do not leave SP between redirection operator and its target,
i.e.

	echo $blob >.git/refs/heads/...

Didn't you mean to use refs/tags/broken-tag for the first one, though?
Otherwise, I do not see how anybody would expect the two tests check
different things.

Thanks.

> +test_expect_success 'cloning -b for invalid tag must fail and fallback on remote head' '
> +	test_must_fail git clone -b broken-tag parent broken-tag 2>error &&
> +	test_i18ngrep "non-commit branch cannot be checked out." error &&
> +	(cd broken-tag && check_HEAD master)
> +'
> +
> +test_expect_success 'cloning -b for broken head must fail and fallback on remote head' '
> +	test_must_fail git clone -b broken-head parent broken-head &&
> +	test_i18ngrep "non-commit branch cannot be checked out." error &&
> +	(cd broken-head && check_HEAD master)
> +'
> +
>  test_done
>
> base-commit: 7f7ebe054af6d831b999d6c2241b9227c4e4e08d

     prev parent reply	other threads:[~2020-11-04 19:31 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-30 11:46 [PATCH] Fix potential segfault on cloning invalid tag Sohom Datta via GitGitGadget
2020-10-30 15:09 ` Jeff King
2020-11-04 17:21 ` [PATCH v2] Avoid segfault and provide fallback when cloning invalid branch/tag Sohom Datta via GitGitGadget
2020-11-04 19:31   ` Junio C Hamano [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7dr0ncli.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=peff@peff.net \
    --cc=sohom.datta@learner.manipal.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).