git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Ben Peart <peartben@gmail.com>
Cc: git@vger.kernel.org, pclouds@gmail.com,
	Ben Peart <benpeart@microsoft.com>
Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
Date: Tue, 13 Sep 2016 15:34:53 -0700	[thread overview]
Message-ID: <xmqq7fafv376.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: 20160913142628.15440-1-benpeart@microsoft.com

Ben Peart <peartben@gmail.com> writes:

> +static int needs_working_tree_merge(const struct checkout_opts *opts,
> +	const struct branch_info *old,
> +	const struct branch_info *new)
> +{
> +...
> +}

I do not think I need to repeat the same remarks on the conditions
in this helper, which hasn't changed since v2.  Many "comments" in
the code do not explain why skipping is justified, or what they
claim to check looks to me just plain wrong.

For example, there is

       /*
        * If we're not creating a new branch, by definition we're changing
        * the existing one so need to do the merge
        */
       if (!opts->new_branch)
               return 1;
	
but "git checkout" (no other argument) hits this condition.  It
disables the most trivial optimization opportunity, because we are
not "creating".

"By definition, we're changing"?  Really?  Not quite.

If you disable this bogus check, "git checkout" (no other argument)
would be allowed to skip the merge_working_tree(), and that in turn
reveals another case that the helper is not checking when
unpack_trees() MUST be called.

    Note: namely, when sparse checkout is in effect, switching from
    HEAD to HEAD can nuke existing working tree files outside the
    sparse pattern -- YUCK!  See penultimate test in t1011 for
    an example.

This yuckiness is not your fault, but needs_working_tree_merge()
logic you added needs to refrain from skipping unpack_trees() call
when sparse thing is in effect.  I'd expect "git checkout -b foo"
instead of "git checkout" (no other argument) would fail to honor
the sparse thing and reveal this bug, because the above bogus
"!opts->new_branch" check will not protect you for that case.

In other words, these random series of "if (...) return 1" are bugs
hiding other real bugs and we need to reason about which ones are
bugs that are hiding what other bugs that are not covered by this
function.  As Peff said earlier for v1, this is still an unreadable
mess.  We need to figure out a way to make sure we are skipping on
the right condition and not accidentally hiding a bug of failing to
check the right condition.  I offhand do not have a good suggestion
on this; sorry.

>  static int merge_working_tree(const struct checkout_opts *opts,
>  			      struct branch_info *old,
>  			      struct branch_info *new,
>  			      int *writeout_error)
>  {
> +	/*
> +	 * Optimize the performance of "git checkout -b foo" by avoiding
> +	 * the expensive merge, index and working directory updates if they
> +	 * are not needed.
> +	 */
> +	if (!needs_working_tree_merge(opts, old, new))
> +		return 0;
> +
>  	int ret;
>  	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));

With the change you made at the beginning of this function, it no
longer compiles with -Wdecl-after-stmt, but that is the smallest of
the problems.

It is a small step in the right direction to move the call to the
helper from the caller to this function, but it is a bit too small.

Notice that the lines after the above context look like this:

	hold_locked_index(lock_file, 1);
	if (read_cache_preload(NULL) < 0)
		return error(_("index file corrupt"));

	resolve_undo_clear();
	if (opts->force) {
		ret = reset_tree(new->commit->tree, opts, 1, writeout_error);
		if (ret)
			return ret;
	} else {
		struct tree_desc trees[2];
		...

I would have expected that the check goes inside the "else" thing
that actually does a two-tree merge, and the helper loses the check
with opts->force, at least.  That would still be a change smaller
than desired, but at least a meaningful improvement compared to the
previous one.  As I have already pointed out, in the "else" clause
there is a check "is the index free of conflicted entries? if so
error out", and that must be honored in !opt->force case, no matter
what your needs_working_tree_merge() says.  I also was hoping that
you would notice, when you were told about the unmerged check, by
reading the remainder of the merge_working_tree(), that we need to
call show_local_changes() when we are not doing force and when we
are not quiet---returning early like the above patch will never be
able to call that one downstream in the function.

Regardless of what the actual checks end up to be, the right place
to do this "optimization" would look more like:

 builtin/checkout.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2b50a49..a6b9e17 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -508,14 +508,19 @@ static int merge_working_tree(const struct checkout_opts *opts,
 			topts.dir->flags |= DIR_SHOW_IGNORED;
 			setup_standard_excludes(topts.dir);
 		}
+
+		if ( we know we can skip the unpack ) {
+			ret = 0;
+		} else {
 			tree = parse_tree_indirect(old->commit ?
 						   old->commit->object.oid.hash :
 						   EMPTY_TREE_SHA1_BIN);
 			init_tree_desc(&trees[0], tree->buffer, tree->size);
 			tree = parse_tree_indirect(new->commit->object.oid.hash);
 			init_tree_desc(&trees[1], tree->buffer, tree->size);
-
 			ret = unpack_trees(2, trees, &topts);
+		}
+
 		if (ret == -1) {
 			/*
 			 * Unpack couldn't do a trivial merge; either

I'd think.  Note that the determination of "we can skip" would
involve knowing the object names of the two trees involved, so for
performance reasons, some of the parse-tree calls may have to come
before the call to "do we know we can skip?", but that does not
fundamentally change the basic code structure.

Thanks.

  reply	other threads:[~2016-09-13 22:35 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-13 14:26 [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
2016-09-13 22:34 ` Junio C Hamano [this message]
2016-09-14  6:30   ` Oleg Taranenko
2016-09-14 15:48     ` Junio C Hamano
     [not found]   ` <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com>
     [not found]     ` <BL2PR03MB323E1B2F810C63CB01AA234F4F30@BL2PR03MB323.namprd03.prod.outlook.com>
2016-09-19 13:18       ` Ben Peart
2016-09-19 16:30         ` Junio C Hamano
2016-09-19 17:03           ` Junio C Hamano
2016-09-21 18:32             ` Ben Peart
2016-09-24 14:28               ` Philip Oakley
2016-09-24 18:26                 ` Junio C Hamano
2016-09-24 19:31                   ` Philip Oakley
  -- strict thread matches above, loose matches on Subject: below --
2016-09-28 17:02 Ben Peart
2016-09-28 17:52 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7fafv376.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=benpeart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=peartben@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).