git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
@ 2016-09-13 14:26 Ben Peart
  2016-09-13 22:34 ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Peart @ 2016-09-13 14:26 UTC (permalink / raw)
  To: git; +Cc: gitster, pclouds, peartben, Ben Peart

Teach git to avoid unnecessary merge during trivial checkout.  When
running 'git checkout -b foo' git follows a common code path through
the expensive merge_working_tree even when it is unnecessary.  As a
result, 95% of the time is spent in merge_working_tree doing the 2-way
merge between the new and old commit trees that is unneeded.

The time breakdown is as follows:

    merge_working_tree <-- 95%
        unpack_trees <-- 80%
            traverse_trees <-- 50%
            cache_tree_update <-- 17%
            mark_new_skip_worktree <-- 10%

With a large repo, this cost is pronounced.  Using "git checkout -b r"
to create and switch to a new branch costs 166 seconds (all times worst
case with a cold file system cache).

git.c:406               trace: built-in: git 'checkout' '-b' 'r'
read-cache.c:1667       performance: 17.442926555 s: read_index_from
name-hash.c:128         performance: 2.912145231 s: lazy_init_name_hash
read-cache.c:2208       performance: 4.387713335 s: write_locked_index
trace.c:420             performance: 166.458921289 s: git command:
                                        'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 'r'
Switched to a new branch 'r'

By adding a test to skip the unnecessary call to merge_working_tree in
this case reduces the cost to 16 seconds.

git.c:406               trace: built-in: git 'checkout' '-b' 's'
read-cache.c:1667       performance: 16.100742476 s: read_index_from
trace.c:420             performance: 16.461547867 s: git command: 'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 's'
Switched to a new branch 's'

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 builtin/checkout.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8672d07..8b2f428 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -38,6 +38,10 @@ struct checkout_opts {
 	int ignore_skipworktree;
 	int ignore_other_worktrees;
 	int show_progress;
+	/*
+	 * If new checkout options are added, needs_working_tree_merge
+	 * should be updated accordingly.
+	 */
 
 	const char *new_branch;
 	const char *new_branch_force;
@@ -460,11 +464,99 @@ static void setup_branch_path(struct branch_info *branch)
 	branch->path = strbuf_detach(&buf, NULL);
 }
 
+static int needs_working_tree_merge(const struct checkout_opts *opts,
+	const struct branch_info *old,
+	const struct branch_info *new)
+{
+	/*
+	 * We must do the merge if we are actually moving to a new
+	 * commit tree.
+	 */
+	if (!old->commit || !new->commit ||
+		oidcmp(&old->commit->tree->object.oid, &new->commit->tree->object.oid))
+		return 1;
+
+	/*
+	 * opts->patch_mode cannot be used with switching branches so is
+	 * not tested here
+	 */
+
+	/*
+	 * opts->quiet only impacts output so doesn't require a merge
+	 */
+
+	/*
+	 * Honor the explicit request for a three-way merge or to throw away
+	 * local changes
+	 */
+	if (opts->merge || opts->force)
+		return 1;
+
+	/*
+	 * Checking out the requested commit may require updating the working
+	 * directory and index, let the merge handle it.
+	 */
+	if (opts->force_detach)
+		return 1;
+
+	/*
+	 * opts->writeout_stage cannot be used with switching branches so is
+	 * not tested here
+	 */
+
+	/*
+	 * Honor the explicit ignore requests
+	 */
+	if (!opts->overwrite_ignore || opts->ignore_skipworktree ||
+		opts->ignore_other_worktrees)
+		return 1;
+
+	/*
+	 * opts->show_progress only impacts output so doesn't require a merge
+	 */
+
+	/*
+	 * If we're not creating a new branch, by definition we're changing
+	 * the existing one so need to do the merge
+	 */
+	if (!opts->new_branch)
+		return 1;
+
+	/*
+	 * new_branch_force is defined to "create/reset and checkout a branch"
+	 * so needs to go through the merge to do the reset
+	 */
+	if (opts->new_branch_force)
+		return 1;
+
+	/*
+	 * A new orphaned branch requrires the index and the working tree to be
+	 * adjusted to <start_point>
+	 */
+	if (opts->new_orphan_branch)
+		return 1;
+
+	/*
+	 * Remaining variables are not checkout options but used to track state
+	 * that doesn't trigger the need for a merge.
+	 */
+
+	return 0;
+}
+
 static int merge_working_tree(const struct checkout_opts *opts,
 			      struct branch_info *old,
 			      struct branch_info *new,
 			      int *writeout_error)
 {
+	/*
+	 * Optimize the performance of "git checkout -b foo" by avoiding
+	 * the expensive merge, index and working directory updates if they
+	 * are not needed.
+	 */
+	if (!needs_working_tree_merge(opts, old, new))
+		return 0;
+
 	int ret;
 	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
 
-- 
2.10.0.windows.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-13 14:26 [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
@ 2016-09-13 22:34 ` Junio C Hamano
  2016-09-14  6:30   ` Oleg Taranenko
       [not found]   ` <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com>
  0 siblings, 2 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-09-13 22:34 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, pclouds, Ben Peart

Ben Peart <peartben@gmail.com> writes:

> +static int needs_working_tree_merge(const struct checkout_opts *opts,
> +	const struct branch_info *old,
> +	const struct branch_info *new)
> +{
> +...
> +}

I do not think I need to repeat the same remarks on the conditions
in this helper, which hasn't changed since v2.  Many "comments" in
the code do not explain why skipping is justified, or what they
claim to check looks to me just plain wrong.

For example, there is

       /*
        * If we're not creating a new branch, by definition we're changing
        * the existing one so need to do the merge
        */
       if (!opts->new_branch)
               return 1;
	
but "git checkout" (no other argument) hits this condition.  It
disables the most trivial optimization opportunity, because we are
not "creating".

"By definition, we're changing"?  Really?  Not quite.

If you disable this bogus check, "git checkout" (no other argument)
would be allowed to skip the merge_working_tree(), and that in turn
reveals another case that the helper is not checking when
unpack_trees() MUST be called.

    Note: namely, when sparse checkout is in effect, switching from
    HEAD to HEAD can nuke existing working tree files outside the
    sparse pattern -- YUCK!  See penultimate test in t1011 for
    an example.

This yuckiness is not your fault, but needs_working_tree_merge()
logic you added needs to refrain from skipping unpack_trees() call
when sparse thing is in effect.  I'd expect "git checkout -b foo"
instead of "git checkout" (no other argument) would fail to honor
the sparse thing and reveal this bug, because the above bogus
"!opts->new_branch" check will not protect you for that case.

In other words, these random series of "if (...) return 1" are bugs
hiding other real bugs and we need to reason about which ones are
bugs that are hiding what other bugs that are not covered by this
function.  As Peff said earlier for v1, this is still an unreadable
mess.  We need to figure out a way to make sure we are skipping on
the right condition and not accidentally hiding a bug of failing to
check the right condition.  I offhand do not have a good suggestion
on this; sorry.

>  static int merge_working_tree(const struct checkout_opts *opts,
>  			      struct branch_info *old,
>  			      struct branch_info *new,
>  			      int *writeout_error)
>  {
> +	/*
> +	 * Optimize the performance of "git checkout -b foo" by avoiding
> +	 * the expensive merge, index and working directory updates if they
> +	 * are not needed.
> +	 */
> +	if (!needs_working_tree_merge(opts, old, new))
> +		return 0;
> +
>  	int ret;
>  	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));

With the change you made at the beginning of this function, it no
longer compiles with -Wdecl-after-stmt, but that is the smallest of
the problems.

It is a small step in the right direction to move the call to the
helper from the caller to this function, but it is a bit too small.

Notice that the lines after the above context look like this:

	hold_locked_index(lock_file, 1);
	if (read_cache_preload(NULL) < 0)
		return error(_("index file corrupt"));

	resolve_undo_clear();
	if (opts->force) {
		ret = reset_tree(new->commit->tree, opts, 1, writeout_error);
		if (ret)
			return ret;
	} else {
		struct tree_desc trees[2];
		...

I would have expected that the check goes inside the "else" thing
that actually does a two-tree merge, and the helper loses the check
with opts->force, at least.  That would still be a change smaller
than desired, but at least a meaningful improvement compared to the
previous one.  As I have already pointed out, in the "else" clause
there is a check "is the index free of conflicted entries? if so
error out", and that must be honored in !opt->force case, no matter
what your needs_working_tree_merge() says.  I also was hoping that
you would notice, when you were told about the unmerged check, by
reading the remainder of the merge_working_tree(), that we need to
call show_local_changes() when we are not doing force and when we
are not quiet---returning early like the above patch will never be
able to call that one downstream in the function.

Regardless of what the actual checks end up to be, the right place
to do this "optimization" would look more like:

 builtin/checkout.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2b50a49..a6b9e17 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -508,14 +508,19 @@ static int merge_working_tree(const struct checkout_opts *opts,
 			topts.dir->flags |= DIR_SHOW_IGNORED;
 			setup_standard_excludes(topts.dir);
 		}
+
+		if ( we know we can skip the unpack ) {
+			ret = 0;
+		} else {
 			tree = parse_tree_indirect(old->commit ?
 						   old->commit->object.oid.hash :
 						   EMPTY_TREE_SHA1_BIN);
 			init_tree_desc(&trees[0], tree->buffer, tree->size);
 			tree = parse_tree_indirect(new->commit->object.oid.hash);
 			init_tree_desc(&trees[1], tree->buffer, tree->size);
-
 			ret = unpack_trees(2, trees, &topts);
+		}
+
 		if (ret == -1) {
 			/*
 			 * Unpack couldn't do a trivial merge; either

I'd think.  Note that the determination of "we can skip" would
involve knowing the object names of the two trees involved, so for
performance reasons, some of the parse-tree calls may have to come
before the call to "do we know we can skip?", but that does not
fundamentally change the basic code structure.

Thanks.

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-13 22:34 ` Junio C Hamano
@ 2016-09-14  6:30   ` Oleg Taranenko
  2016-09-14 15:48     ` Junio C Hamano
       [not found]   ` <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com>
  1 sibling, 1 reply; 13+ messages in thread
From: Oleg Taranenko @ 2016-09-14  6:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ben Peart, git, pclouds, Ben Peart

Sorry for bothering, why not introduce a brand new option like git
checkout -b foo --skip-worktree-merge for such rare optimization use
case?

On Wed, Sep 14, 2016 at 12:34 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ben Peart <peartben@gmail.com> writes:
>
>> +static int needs_working_tree_merge(const struct checkout_opts *opts,
>> +     const struct branch_info *old,
>> +     const struct branch_info *new)
>> +{
>> +...
>> +}
>
> I do not think I need to repeat the same remarks on the conditions
> in this helper, which hasn't changed since v2.  Many "comments" in
> the code do not explain why skipping is justified, or what they
> claim to check looks to me just plain wrong.
>
> For example, there is
>
>        /*
>         * If we're not creating a new branch, by definition we're changing
>         * the existing one so need to do the merge
>         */
>        if (!opts->new_branch)
>                return 1;
>
> but "git checkout" (no other argument) hits this condition.  It
> disables the most trivial optimization opportunity, because we are
> not "creating".
>
> "By definition, we're changing"?  Really?  Not quite.
>
> If you disable this bogus check, "git checkout" (no other argument)
> would be allowed to skip the merge_working_tree(), and that in turn
> reveals another case that the helper is not checking when
> unpack_trees() MUST be called.
>
>     Note: namely, when sparse checkout is in effect, switching from
>     HEAD to HEAD can nuke existing working tree files outside the
>     sparse pattern -- YUCK!  See penultimate test in t1011 for
>     an example.
>
> This yuckiness is not your fault, but needs_working_tree_merge()
> logic you added needs to refrain from skipping unpack_trees() call
> when sparse thing is in effect.  I'd expect "git checkout -b foo"
> instead of "git checkout" (no other argument) would fail to honor
> the sparse thing and reveal this bug, because the above bogus
> "!opts->new_branch" check will not protect you for that case.
>
> In other words, these random series of "if (...) return 1" are bugs
> hiding other real bugs and we need to reason about which ones are
> bugs that are hiding what other bugs that are not covered by this
> function.  As Peff said earlier for v1, this is still an unreadable
> mess.  We need to figure out a way to make sure we are skipping on
> the right condition and not accidentally hiding a bug of failing to
> check the right condition.  I offhand do not have a good suggestion
> on this; sorry.
>
>>  static int merge_working_tree(const struct checkout_opts *opts,
>>                             struct branch_info *old,
>>                             struct branch_info *new,
>>                             int *writeout_error)
>>  {
>> +     /*
>> +      * Optimize the performance of "git checkout -b foo" by avoiding
>> +      * the expensive merge, index and working directory updates if they
>> +      * are not needed.
>> +      */
>> +     if (!needs_working_tree_merge(opts, old, new))
>> +             return 0;
>> +
>>       int ret;
>>       struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
>
> With the change you made at the beginning of this function, it no
> longer compiles with -Wdecl-after-stmt, but that is the smallest of
> the problems.
>
> It is a small step in the right direction to move the call to the
> helper from the caller to this function, but it is a bit too small.
>
> Notice that the lines after the above context look like this:
>
>         hold_locked_index(lock_file, 1);
>         if (read_cache_preload(NULL) < 0)
>                 return error(_("index file corrupt"));
>
>         resolve_undo_clear();
>         if (opts->force) {
>                 ret = reset_tree(new->commit->tree, opts, 1, writeout_error);
>                 if (ret)
>                         return ret;
>         } else {
>                 struct tree_desc trees[2];
>                 ...
>
> I would have expected that the check goes inside the "else" thing
> that actually does a two-tree merge, and the helper loses the check
> with opts->force, at least.  That would still be a change smaller
> than desired, but at least a meaningful improvement compared to the
> previous one.  As I have already pointed out, in the "else" clause
> there is a check "is the index free of conflicted entries? if so
> error out", and that must be honored in !opt->force case, no matter
> what your needs_working_tree_merge() says.  I also was hoping that
> you would notice, when you were told about the unmerged check, by
> reading the remainder of the merge_working_tree(), that we need to
> call show_local_changes() when we are not doing force and when we
> are not quiet---returning early like the above patch will never be
> able to call that one downstream in the function.
>
> Regardless of what the actual checks end up to be, the right place
> to do this "optimization" would look more like:
>
>  builtin/checkout.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 2b50a49..a6b9e17 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -508,14 +508,19 @@ static int merge_working_tree(const struct checkout_opts *opts,
>                         topts.dir->flags |= DIR_SHOW_IGNORED;
>                         setup_standard_excludes(topts.dir);
>                 }
> +
> +               if ( we know we can skip the unpack ) {
> +                       ret = 0;
> +               } else {
>                         tree = parse_tree_indirect(old->commit ?
>                                                    old->commit->object.oid.hash :
>                                                    EMPTY_TREE_SHA1_BIN);
>                         init_tree_desc(&trees[0], tree->buffer, tree->size);
>                         tree = parse_tree_indirect(new->commit->object.oid.hash);
>                         init_tree_desc(&trees[1], tree->buffer, tree->size);
> -
>                         ret = unpack_trees(2, trees, &topts);
> +               }
> +
>                 if (ret == -1) {
>                         /*
>                          * Unpack couldn't do a trivial merge; either
>
> I'd think.  Note that the determination of "we can skip" would
> involve knowing the object names of the two trees involved, so for
> performance reasons, some of the parse-tree calls may have to come
> before the call to "do we know we can skip?", but that does not
> fundamentally change the basic code structure.
>
> Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-14  6:30   ` Oleg Taranenko
@ 2016-09-14 15:48     ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-09-14 15:48 UTC (permalink / raw)
  To: Oleg Taranenko; +Cc: Ben Peart, git, pclouds, Ben Peart

Oleg Taranenko <olegtaranenko@gmail.com> writes:

> Sorry for bothering, why not introduce a brand new option like git
> checkout -b foo --skip-worktree-merge for such rare optimization use
> case?

I am not sure what problem such a new option solves.  How would you
describe and explain what "--skip-worktree-merge" option to the end
user?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
       [not found]     ` <BL2PR03MB323E1B2F810C63CB01AA234F4F30@BL2PR03MB323.namprd03.prod.outlook.com>
@ 2016-09-19 13:18       ` Ben Peart
  2016-09-19 16:30         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Peart @ 2016-09-19 13:18 UTC (permalink / raw)
  To: 'Junio C Hamano'; +Cc: pclouds, peartben, 'Ben Peart', git

Let me see if I can better explain what I’m trying to accomplish with this
patch.  
 
"git checkout -b foo" (without -f -m or <start_point>) is defined in the
manual as being a shortcut for/equivalent to:
 
        (1a) "git branch foo"
        (1b) "git checkout foo"
 
However, it has been our experience in our observed use cases and all the
existing git tests, that it can be treated as equivalent to:
 
        (2a) "git branch foo"
        (2b) "git symbolic-ref HEAD refs/heads/foo"
 
That is, the common perception (use case) is to just create a new branch
"foo" (pointing at the current commit) and point HEAD at it WITHOUT making
any changes to the index or worktree.
 
However, the (1b) command has "git reset" connotations in that it should
examine and manipulate the trees, index, and worktree in the expectation
that there MIGHT be work to do.
 
Since this additional work in (1b) takes minutes on large repos and (2b)
takes less than a second, my intent was to identify the conditions that this
additional work will have no affect and thereby avoid it.
 
Alternatively, was the "-b" option just created as a shortcut only to avoid
calling the separate "git branch foo" command and we should not think about
the common perception and usage?
 
More comments inline...
 
> -----Original Message-----
> From: Junio C Hamano [mailto:gitster@pobox.com]
> Sent: Tuesday, September 13, 2016 6:35 PM
> To: Ben Peart <mailto:peartben@gmail.com>
> Cc: mailto:git@vger.kernel.org; mailto:pclouds@gmail.com; Ben Peart
> <mailto:Ben.Peart@microsoft.com>
> Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> Ben Peart <mailto:peartben@gmail.com> writes:
> 
> > +static int needs_working_tree_merge(const struct checkout_opts *opts,
> > +      const struct branch_info *old,
> > +      const struct branch_info *new)
> > +{
> > +...
> > +}
> 
> I do not think I need to repeat the same remarks on the conditions in this
> helper, which hasn't changed since v2.  Many "comments" in the code do not
> explain why skipping is justified, or what they claim to check looks to me
just
> plain wrong.
> 
> For example, there is
> 
>        /*
>         * If we're not creating a new branch, by definition we're changing
>         * the existing one so need to do the merge
>         */
>        if (!opts->new_branch)
>                return 1;
> 
> but "git checkout" (no other argument) hits this condition.  It disables
the
> most trivial optimization opportunity, because we are not "creating".
> 
 
Disabling the optimization for "git checkout" with no argument was
intentional. This command does not create a new branch but instead, performs
a "soft reset" which will update the index and working directory to reflect
changes to the sparse-checkout (for example).  If this was not disabled,
many tests fail as they expect this behavior.  Because "git checkout" does
not actually change the refs, if we skipped the merge/index/working
directory update, this command becomes a no-op.
 
> "By definition, we're changing"?  Really?  Not quite.
> 
 
What I was attempting to communicate is that if we aren't creating a new
branch any changes or updates will happen in the existing branch.  Since
that could only be updating the index and working directory, we don't want
to skip those steps or we've defeated any purpose in running the command. 
 
> If you disable this bogus check, "git checkout" (no other argument) would
be
> allowed to skip the merge_working_tree(), and that in turn reveals another
> case that the helper is not checking when
> unpack_trees() MUST be called.
> 
>     Note: namely, when sparse checkout is in effect, switching from
>     HEAD to HEAD can nuke existing working tree files outside the
>     sparse pattern -- YUCK!  See penultimate test in t1011 for
>     an example.
> 
> This yuckiness is not your fault, but needs_working_tree_merge() logic you
> added needs to refrain from skipping unpack_trees() call when sparse thing
> is in effect.  I'd expect "git checkout -b foo"
> instead of "git checkout" (no other argument) would fail to honor the
sparse
> thing and reveal this bug, because the above bogus "!opts->new_branch"
> check will not protect you for that case.
> 
 
It is correct that this optimization will skip updating the tree to honor
any changes to the sparse-checkout in the case of creating a new branch.
Unfortunately, I don't know of any way to detect the changes other than
actually doing all the work to update the skip work tree bit in the index.
If this behavior is required, then this optimization will need to check  if
sparse-checkout is enabled and skip the optimization just in case there have
been changes.
 
> In other words, these random series of "if (...) return 1" are bugs hiding
> other real bugs and we need to reason about which ones are bugs that are
> hiding what other bugs that are not covered by this function.  As Peff
said
> earlier for v1, this is still an unreadable mess.  We need to figure out a
way to
> make sure we are skipping on the right condition and not accidentally
hiding
> a bug of failing to check the right condition.  I offhand do not have a
good
> suggestion on this; sorry.
> 
 
Beyond this code review process and testing, I don't know how else we make
sure we're caught all the conditions where we are OK skipping some of the
steps.  Any change has inherent risk - a change in behavior even more so.
 
> >  static int merge_working_tree(const struct checkout_opts *opts,
> >                                                struct branch_info *old,
> >                                                struct branch_info *new,
> >                                                int *writeout_error)
> >  {
> > +      /*
> > +      * Optimize the performance of "git checkout -b foo" by avoiding
> > +      * the expensive merge, index and working directory updates if
they
> > +      * are not needed.
> > +      */
> > +      if (!needs_working_tree_merge(opts, old, new))
> > +                      return 0;
> > +
> >          int ret;
> >          struct lock_file *lock_file = xcalloc(1, sizeof(struct
lock_file));
> 
> With the change you made at the beginning of this function, it no longer
> compiles with -Wdecl-after-stmt, but that is the smallest of the problems.
 
I apologize, I didn't realize this was a requirement.  It built and passed
all existing tests on Windows but I will reorder the declarations to prevent
causing issues with other platforms/compilers.
 
> 
> It is a small step in the right direction to move the call to the helper
from the
> caller to this function, but it is a bit too small.
> 
> Notice that the lines after the above context look like this:
> 
>             hold_locked_index(lock_file, 1);
>             if (read_cache_preload(NULL) < 0)
>                             return error(_("index file corrupt"));
> 
>             resolve_undo_clear();
>             if (opts->force) {
>                             ret = reset_tree(new->commit->tree, opts, 1,
> writeout_error);
>                             if (ret)
>                                             return ret;
>             } else {
>                             struct tree_desc trees[2];
>                             ...
> 
> I would have expected that the check goes inside the "else" thing that
> actually does a two-tree merge, and the helper loses the check with opts-
> >force, at least.  That would still be a change smaller than desired, but
at
> least a meaningful improvement compared to the previous one.  
 
I'll restructure it that way.
 
> As I have
> already pointed out, in the "else" clause there is a check "is the index
free of
> conflicted entries? if so error out", and that must be honored in
!opt->force
> case, no matter what your needs_working_tree_merge() says.  
 
Given we're not merging trees, updating the index, or work tree, why do we
need to error out in this case?  We aren't attempting this optimization if
they pass "-m."  If there are conflicted entries that haven't been fixed,
they will still exist.  We're essentially just creating a new reference for
the existing commit/index/work tree.
 
> I also was
> hoping that you would notice, when you were told about the unmerged
> check, by reading the remainder of the merge_working_tree(), that we need
> to call show_local_changes() when we are not doing force and when we are
> not quiet---returning early like the above patch will never be able to
call that
> one downstream in the function.
 
It is a good point that my optimization skipped the call to
show_local_changes.  Thanks for catching that, I've fixed it for my next
iteration.
 
> 
> Regardless of what the actual checks end up to be, the right place to do
this
> "optimization" would look more like:
> 
>  builtin/checkout.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/checkout.c b/builtin/checkout.c index
2b50a49..a6b9e17
> 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -508,14 +508,19 @@ static int merge_working_tree(const struct
> checkout_opts *opts,
>                                             topts.dir->flags |=
DIR_SHOW_IGNORED;
>                                            
setup_standard_excludes(topts.dir);
>                             }
> +
> +                          if ( we know we can skip the unpack ) {
> +                                          ret = 0;
> +                          } else {
>                                             tree =
parse_tree_indirect(old->commit ?
                                                                            
                  old->commit-
> >object.oid.hash :
> 
                                                                            
                  EMPTY_TREE_SHA1_BIN);
>                                             init_tree_desc(&trees[0],
tree->buffer, tree->size);
>                                             tree =
parse_tree_indirect(new->commit-
> >object.oid.hash);
>                                             init_tree_desc(&trees[1],
tree->buffer, tree->size);
> -
>                                             ret = unpack_trees(2, trees,
&topts);
> +                          }
> +
>                             if (ret == -1) {
>                                             /*
>                                              * Unpack couldn't do a
trivial merge; either
> 
 
I'll restructure it to be like you suggest above however, given we will not
be merging the tress, we won't have any index changes to write out. I will
also skip the calls to cache_tree_update and write_locked_index.
 
> I'd think.  Note that the determination of "we can skip" would involve
> knowing the object names of the two trees involved, so for performance
> reasons, some of the parse-tree calls may have to come before the call to
> "do we know we can skip?", but that does not fundamentally change the
> basic code structure.
> 
> Thanks.
 
I don't understand why we'd need to know the object names of the two trees
given we have the IDs.  What did you have in mind that would need those?
 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-19 13:18       ` Ben Peart
@ 2016-09-19 16:30         ` Junio C Hamano
  2016-09-19 17:03           ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-09-19 16:30 UTC (permalink / raw)
  To: Ben Peart; +Cc: pclouds, 'Ben Peart', git

"Ben Peart" <peartben@gmail.com> writes:

> Let me see if I can better explain what I’m trying to accomplish with this
> patch.  
>  
> "git checkout -b foo" (without -f -m or <start_point>) is defined in the
> manual as being a shortcut for/equivalent to:
>  
>         (1a) "git branch foo"
>         (1b) "git checkout foo"
>  
> However, it has been our experience in our observed use cases and all the
> existing git tests, that it can be treated as equivalent to:
>  
>         (2a) "git branch foo"
>         (2b) "git symbolic-ref HEAD refs/heads/foo"
>  
> That is, the common perception (use case) is to just create a new branch
> "foo" (pointing at the current commit) and point HEAD at it WITHOUT making
> any changes to the index or worktree.
>  
> However, the (1b) command has "git reset" connotations in that it should
> examine and manipulate the trees, index, and worktree in the expectation
> that there MIGHT be work to do.
>  
> Since this additional work in (1b) takes minutes on large repos and (2b)
> takes less than a second, my intent was to identify the conditions that this
> additional work will have no affect and thereby avoid it.
>  
> Alternatively, was the "-b" option just created as a shortcut only to avoid
> calling the separate "git branch foo" command and we should not think about
> the common perception and usage?

If you are trying to change the definition of "checkout -b" from 1
to 2 above, that is a completely different issue.  I thought this
was an attempt to optimize for the performance without changing the
behaviour.

So if you did not apologize like this...

> It is correct that this optimization will skip updating the tree to honor
> any changes to the sparse-checkout in the case of creating a new branch.
> Unfortunately, I don't know of any way to detect the changes other than
> actually doing all the work to update the skip work tree bit in the index.

... but insisted that skipping the yucky sparse-checkout adjustment
in this case was an intended behaviour change, I would have
understood (not necessarily agreed, though) what you were trying to
do.

> Beyond this code review process and testing, I don't know how else we make
> sure we're caught all the conditions where we are OK skipping some of the
> steps. Any change has inherent risk - a change in behavior even more so.

At least we made one-step progress today.  I now know that you are
trying to change the behaviour, but I didn't know that last week,
when I was primarily reacting that your claim that this was
performance thing and assuming you meant no change in behaviour, but
there was clearly behaviour change, and it was apparent that the
denseness of the code made it almost impossible to see if there are
unintended changes.

I am still not sure if I like the change of what "checkout -b" is
this late in the game, though.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-19 16:30         ` Junio C Hamano
@ 2016-09-19 17:03           ` Junio C Hamano
  2016-09-21 18:32             ` Ben Peart
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-09-19 17:03 UTC (permalink / raw)
  To: Ben Peart; +Cc: pclouds, 'Ben Peart', git

Junio C Hamano <gitster@pobox.com> writes:

>> "git checkout -b foo" (without -f -m or <start_point>) is defined in the
>> manual as being a shortcut for/equivalent to:
>>  
>>         (1a) "git branch foo"
>>         (1b) "git checkout foo"
>>  
>> However, it has been our experience in our observed use cases and all the
>> existing git tests, that it can be treated as equivalent to:
>>  
>>         (2a) "git branch foo"
>>         (2b) "git symbolic-ref HEAD refs/heads/foo"
>> ...
>
> I am still not sure if I like the change of what "checkout -b" is
> this late in the game, though.

Having said all that.

I do see the merit of having a shorthand way to invoke your 2 above.
It is just that I am not convinced that it is the best way to
achieve that goal to redefine what "git checkout -b <new-name>" (no
other parameters) does.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-19 17:03           ` Junio C Hamano
@ 2016-09-21 18:32             ` Ben Peart
  2016-09-24 14:28               ` Philip Oakley
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Peart @ 2016-09-21 18:32 UTC (permalink / raw)
  To: Junio C Hamano, Ben Peart; +Cc: pclouds@gmail.com, git@vger.kernel.org

I understand the reluctance to change the existing behavior of the "git checkout -b <new-name>" command.

I see this as a tradeoff between taking advantage of the muscle memory for the existing command and coming up with a new shortcut command and training people to use it instead.

The fact that all the use cases we've observed and all the git test cases actually produce the same results but significantly faster with that change in behavior made me hope we could redefine the command to take advantage of the muscle memory.

That said, you're much more on the frontline of receiving negative feedback about doing that than I am. :)  How would you like to proceed?

Ben

-----Original Message-----
From: Junio C Hamano [mailto:gitster@pobox.com] 
Sent: Monday, September 19, 2016 1:04 PM
To: Ben Peart <peartben@gmail.com>
Cc: pclouds@gmail.com; Ben Peart <Ben.Peart@microsoft.com>; git@vger.kernel.org
Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout

Junio C Hamano <gitster@pobox.com> writes:

>> "git checkout -b foo" (without -f -m or <start_point>) is defined in 
>> the manual as being a shortcut for/equivalent to:
>>  
>>         (1a) "git branch foo"
>>         (1b) "git checkout foo"
>>  
>> However, it has been our experience in our observed use cases and all 
>> the existing git tests, that it can be treated as equivalent to:
>>  
>>         (2a) "git branch foo"
>>         (2b) "git symbolic-ref HEAD refs/heads/foo"
>> ...
>
> I am still not sure if I like the change of what "checkout -b" is this 
> late in the game, though.

Having said all that.

I do see the merit of having a shorthand way to invoke your 2 above.
It is just that I am not convinced that it is the best way to achieve that goal to redefine what "git checkout -b <new-name>" (no other parameters) does.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-21 18:32             ` Ben Peart
@ 2016-09-24 14:28               ` Philip Oakley
  2016-09-24 18:26                 ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Philip Oakley @ 2016-09-24 14:28 UTC (permalink / raw)
  To: Ben Peart, Junio C Hamano; +Cc: pclouds, git

Ben,
Using a 'bottom / in-line' posting flow is much preferred, which may require 
some manual editing[1], hopefully I have it about right...
Philip
--
[1] this is massaged and mangled Outlook Express, sometimes one has to work 
with the tools at hand...

From: "Ben Peart" <Ben.Peart@microsoft.com>
> From: Junio C Hamano [mailto:gitster@pobox.com]
> > Junio C Hamano <gitster@pobox.comwrites:
> >
> > >"git checkout -b foo" (without -f -m or <start_point>) is defined in
> > >the manual as being a shortcut for/equivalent to:
> > >
> > >        (1a) "git branch foo"
> > >        (1b) "git checkout foo"
> > >
> > >However, it has been our experience in our observed use cases and all
> > >the existing git tests, that it can be treated as equivalent to:
> > >
> > >        (2a) "git branch foo"
> > >        (2b) "git symbolic-ref HEAD refs/heads/foo"
> > >...
> > >
> > I am still not sure if I like the change of what "checkout -b" is this
> > late in the game, though.
> >
> > Having said all that.
> >
> > I do see the merit of having a shorthand way to invoke your 2 above.
> > It is just that I am not convinced that it is the best way to achieve 
> > that goal to redefine what "git checkout -b <new-name>" (no other 
> > parameters) does.
> >
> ---
>
> I understand the reluctance to change the existing behavior of the "git 
> checkout -b <new-name>" command.
>
> I see this as a tradeoff between taking advantage of the muscle memory for 
> the existing command and coming up with a new shortcut command and 
> training people to use it instead.
>
> The fact that all the use cases we've observed and all the git test cases 
> actually produce the same results but significantly faster with that 
> change in behavior made me hope we could redefine the command to take 
> advantage of the muscle memory.
>
> That said, you're much more on the frontline of receiving negative 
> feedback about doing that than I am. :)  How would you like to proceed?

The discussion can often feel harsh [2], especially if there is accidental 
'talking past each other', which is usually because of differing 
perspectives on the issues.

I didn't see an initial confirmation as to what the issue really was. You 
indicated the symptom ('a long checkout time'), but then we missed out on 
hard facts and example repos, so that the issue was replicable.

Is there an example public repo that you can show the issue on? (or 
anonymise a private one - there is a script for that [3])

Can you give local timings (and indication of the hardware and software 
versions used for the test, and if appropriate, network setup)?

I know at my work that sometime our home drives are multiply mapped to H:, a 
C:/homedrive directory and a $netshare/me network directory via the 
Microsofy roaming profiles, and if there is hard synchronization (or 
whatever term is appropriate) there can be sudden slowdowns as local C: 
writes drop from 'instant' to 'forever'...

Is there anything special about the repos that have the delays? Is it a 
local process issue that causes the repos to develop those symptoms (see 
above about not being sure why you have these issues), in which case it 
could be local self inflicted issues, or it could be that you have a 
regulatory issue for that domain that requires such symptoms, which would 
shift the problem from a 'don't do that' response to a 'hmm, how to cover 
this'.


At the moment there is the simple workaround of an alias that executes that 
two step command dance to achieve what you needed, and Junio has outlined 
the issues he needed to be covered from his maintainer perspective (e.g. the 
detection of sparse checkouts). Confirming the root causes would help in 
setting a baseline.

I hope that is of help - I'd seen that the discussion had gone quiet.

--
Philip


[2] Been there, feel your pain. It's not in any way malicious, just a 
reflection that email can be a poor medium for such discussions.
[3] https://public-inbox.org/git/20140827170127.GA6138@peff.net/ suggest 
that the `git fast-export --anonymize --all` maybe the approach. 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-24 14:28               ` Philip Oakley
@ 2016-09-24 18:26                 ` Junio C Hamano
  2016-09-24 19:31                   ` Philip Oakley
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-09-24 18:26 UTC (permalink / raw)
  To: Philip Oakley; +Cc: Ben Peart, pclouds, git

"Philip Oakley" <philipoakley@iee.org> writes:

>> > >"git checkout -b foo" (without -f -m or <start_point>) is defined in
>> > >the manual as being a shortcut for/equivalent to:
>> > >
>> > >        (1a) "git branch foo"
>> > >        (1b) "git checkout foo"
>> > >
>> > >However, it has been our experience in our observed use cases and all
>> > >the existing git tests, that it can be treated as equivalent to:
>> > >
>> > >        (2a) "git branch foo"
>> > >        (2b) "git symbolic-ref HEAD refs/heads/foo"
>> > >...
>> > >
>> > I am still not sure if I like the change of what "checkout -b" is this
>> > late in the game, though.
>>
>> ...
>> That said, you're much more on the frontline of receiving negative
>> feedback about doing that than I am. :)  How would you like to
>> proceed?
>
> I didn't see an initial confirmation as to what the issue really
> was. You indicated the symptom ('a long checkout time'), but then we
> missed out on hard facts and example repos, so that the issue was
> replicable.

I took it as a given, trivial and obvious optimization opportunity,
that it is wasteful having to traverse two trees to consolidate and
reflect their differences into the working tree when we know upfront
that these two trees are identical, no matter what the overhead for
doing so is.

> At the moment there is the simple workaround of an alias that executes
> that two step command dance to achieve what you needed, and Junio has
> outlined the issues he needed to be covered from his maintainer
> perspective (e.g. the detection of sparse checkouts). Confirming the
> root causes would help in setting a baseline.
>
> I hope that is of help - I'd seen that the discussion had gone quiet.

Some of the problems I have are:

 (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
     and "git checkout HEAD" (no other parameters to any of them)
     ought to give identical index and working tree.  It is too
     confusing to leave subtly different results that will lead to
     hard to diagnose bugs for only one of them.

 (2) The proposed log message talks only about "performance
     optimization", while the purpose of the change is more about
     changing the definition of what "git checkout -b NEW" is from
     "git branch NEW && git checkout NEW" to "git branch NEW && git
     symbolic-ref HEAD refs/heads/NEW".  The explanation in a Ben's
     later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does
     a much better job contrasting the two.

 (3) I identified only one difference as an example sufficient to
     point out why the patch provided is not a pure optimization but
     behaviour change.  Fixing that example alone to avoid change in
     the behaviour is trivial (see if the "info/sparse-checkout"
     file is present and refrain from skipping the proper checkout),
     but a much larger problem is that I do not know (and Ben does
     not, I suspect) know what other behaviour changes the patch is
     introducing, and worse, the checks are sufficiently dense too
     detailed and intimate to the implementation of unpack_trees()
     that it is impossible for anybody to make sure the exceptions
     defined in this patch and updates to other parts of the system
     will be kept in sync.

So my inclination at this point, unless we see somebody invents a
clever way to solve (3), is that any change that violates (1),
i.e. as long as the patch does "Are we doing '-b NEW'?  Then we do
something subtly different", is not acceptable, and solving (3) in a
maintainable way smells like quite a hard thing to do.  But it would
be ideal if (3) is solved cleanly, as we will then not have to worry
about changing behaviour at all and can apply the optimization for
all of the four cases equally.  As a side effect, that approach
would solve problem (2) above.

If we were to punt on keeping the sanity (1) and introduce a subtly
different "create a new branch and point the HEAD at it", an easier
way out may be be one of

 1. a totally new command, e.g. "git branch-switch NEW" that takes
    only a single argument and no other "checkout" options, or

 2. a new option to "git checkout" that takes _ONLY_ a single
    argument and incompatible with any other option or command line
    argument, or

 3. an alias that does "git branch" followed by "git symbolic-ref".

Neither of the first two sounds palatable, though.




        

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-24 18:26                 ` Junio C Hamano
@ 2016-09-24 19:31                   ` Philip Oakley
  0 siblings, 0 replies; 13+ messages in thread
From: Philip Oakley @ 2016-09-24 19:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ben Peart, pclouds, git

Hi Junio,

From: "Junio C Hamano" <gitster@pobox.com>
> "Philip Oakley" <philipoakley@iee.org> writes:
>
>>> > >"git checkout -b foo" (without -f -m or <start_point>) is defined in
>>> > >the manual as being a shortcut for/equivalent to:
>>> > >
>>> > >        (1a) "git branch foo"
>>> > >        (1b) "git checkout foo"
>>> > >
>>> > >However, it has been our experience in our observed use cases and all
>>> > >the existing git tests, that it can be treated as equivalent to:
>>> > >
>>> > >        (2a) "git branch foo"
>>> > >        (2b) "git symbolic-ref HEAD refs/heads/foo"
>>> > >...
>>> > >
>>> > I am still not sure if I like the change of what "checkout -b" is this
>>> > late in the game, though.
>>>
>>> ...
>>> That said, you're much more on the frontline of receiving negative
>>> feedback about doing that than I am. :)  How would you like to
>>> proceed?
>>
>> I didn't see an initial confirmation as to what the issue really
>> was. You indicated the symptom ('a long checkout time'), but then we
>> missed out on hard facts and example repos, so that the issue was
>> replicable.
>
> I took it as a given, trivial and obvious optimization opportunity,
> that it is wasteful having to traverse two trees to consolidate and
> reflect their differences into the working tree when we know upfront
> that these two trees are identical, no matter what the overhead for
> doing so is.

I agree, and I believe Ben agrees.

>
>> At the moment there is the simple workaround of an alias that executes
>> that two step command dance to achieve what you needed, and Junio has
>> outlined the issues he needed to be covered from his maintainer
>> perspective (e.g. the detection of sparse checkouts). Confirming the
>> root causes would help in setting a baseline.
>>
>> I hope that is of help - I'd seen that the discussion had gone quiet.
>
> Some of the problems I have are:
>
> (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
>     and "git checkout HEAD" (no other parameters to any of them)
>     ought to give identical index and working tree.  It is too
>     confusing to leave subtly different results that will lead to
>     hard to diagnose bugs for only one of them.
>
> (2) The proposed log message talks only about "performance
>     optimization",

>                                while the purpose of the change is more 
> about
>     changing the definition

Here I think is the misunderstanding. His purpose is NOT to change the 
definition (IIUC). As I read the message you reference below (and Ben's 
other messages), I understood that he was trying to achieve what you said 
(i.e. optimise the trivial and obvious opportunity) of selecting for the 
common case (underlying conditions) where the two command sequences are 
identical. If the selected case / conditions is not identical then it is 
defined wrongly...

I suspect that it was Ben's 'soft' explanation that allowed the discussion 
to diverge.


>                                                 of what "git checkout -b 
> NEW" is from
>     "git branch NEW && git checkout NEW" to "git branch NEW && git
>     symbolic-ref HEAD refs/heads/NEW".  The explanation in a Ben's
>     later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does
>     a much better job contrasting the two.
>
> (3) I identified only one difference as an example sufficient to
>     point out why the patch provided is not a pure optimization but
>     behaviour change.  Fixing that example alone to avoid change in
>     the behaviour is trivial (see if the "info/sparse-checkout"
>     file is present and refrain from skipping the proper checkout),

This is probably the point Ben needs to take on board to narrow the 
conditions down. There may be others.

>     but a much larger problem is that I do not know (and Ben does
>     not, I suspect) know what other behaviour changes the patch is
>     introducing, and worse, the checks are sufficiently dense too
>     detailed and intimate to the implementation of unpack_trees()
>     that it is impossible for anybody to make sure the exceptions
>     defined in this patch and updates to other parts of the system
>     will be kept in sync.

I did not believe he was proposing such a change to behaviour, hence his 
difficulty in responding (or at least that is my perception). I.e. he was 
digging a hole in the wrong place.

It is possible that he had accidentally introduced a behavious change, and 
having failed to explictly say "This patch (should) produces no behavious 
change", which then continued to re-inforce the misunderstanding.

>
> So my inclination at this point, unless we see somebody invents a
> clever way to solve (3), is that any change that violates (1),
> i.e. as long as the patch does "Are we doing '-b NEW'?  Then we do
> something subtly different", is not acceptable, and solving (3) in a
> maintainable way smells like quite a hard thing to do.  But it would
> be ideal if (3) is solved cleanly, as we will then not have to worry
> about changing behaviour at all and can apply the optimization for
> all of the four cases equally.  As a side effect, that approach
> would solve problem (2) above.
>
> If we were to punt on keeping the sanity (1) and introduce a subtly
> different "create a new branch and point the HEAD at it", an easier
> way out may be be one of
>
> 1. a totally new command, e.g. "git branch-switch NEW" that takes
>    only a single argument and no other "checkout" options, or
>
> 2. a new option to "git checkout" that takes _ONLY_ a single
>    argument and incompatible with any other option or command line
>    argument, or
>
> 3. an alias that does "git branch" followed by "git symbolic-ref".
>
> Neither of the first two sounds palatable, though.

It will need Ben to come back and clarify, if he did, or did not, want any 
behaviour change (beyond speed of action;-)

Thanks

Philip 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
@ 2016-09-28 17:02 Ben Peart
  2016-09-28 17:52 ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Peart @ 2016-09-28 17:02 UTC (permalink / raw)
  To: git
  Cc: Ben Peart, pclouds, Jeff Hostetler, philipoakley,
	'Junio C Hamano'

Resending

> -----Original Message-----
> From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On
> Behalf Of Philip Oakley
> Sent: Saturday, September 24, 2016 3:31 PM
> To: Junio C Hamano <gitster@pobox.com>
> Cc: Ben Peart <Ben.Peart@microsoft.com>; pclouds@gmail.com;
> git@vger.kernel.org
> Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> Hi Junio,
> 
> From: "Junio C Hamano" <gitster@pobox.com>
> > "Philip Oakley" <philipoakley@iee.org> writes:
> >
> >>> > >"git checkout -b foo" (without -f -m or <start_point>) is defined
> >>> > >in the manual as being a shortcut for/equivalent to:
> >>> > >
> >>> > >        (1a) "git branch foo"
> >>> > >        (1b) "git checkout foo"
> >>> > >
> >>> > >However, it has been our experience in our observed use cases and
> >>> > >all the existing git tests, that it can be treated as equivalent
to:
> >>> > >
> >>> > >        (2a) "git branch foo"
> >>> > >        (2b) "git symbolic-ref HEAD refs/heads/foo"
> >>> > >...
> >>> > >
> >>> > I am still not sure if I like the change of what "checkout -b" is
> >>> > this late in the game, though.
> >>>
> >>> ...
> >>> That said, you're much more on the frontline of receiving negative
> >>> feedback about doing that than I am. :)  How would you like to
> >>> proceed?
> >>
> >> I didn't see an initial confirmation as to what the issue really was.
> >> You indicated the symptom ('a long checkout time'), but then we
> >> missed out on hard facts and example repos, so that the issue was
> >> replicable.
> >
> > I took it as a given, trivial and obvious optimization opportunity,
> > that it is wasteful having to traverse two trees to consolidate and
> > reflect their differences into the working tree when we know upfront
> > that these two trees are identical, no matter what the overhead for
> > doing so is.
> 
> I agree, and I believe Ben agrees.
> 

Correct.  In my original patch request I put more specific information on 
the impact this optimization has in our specific case (reducing the cost 
from 166 seconds to 16 seconds).

> >
> >> At the moment there is the simple workaround of an alias that
> >> executes that two step command dance to achieve what you needed, and
> >> Junio has outlined the issues he needed to be covered from his
> >> maintainer perspective (e.g. the detection of sparse checkouts).
> >> Confirming the root causes would help in setting a baseline.
> >>
> >> I hope that is of help - I'd seen that the discussion had gone quiet.
> >
> > Some of the problems I have are:
> >
> > (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
> >     and "git checkout HEAD" (no other parameters to any of them)
> >     ought to give identical index and working tree.  It is too
> >     confusing to leave subtly different results that will lead to
> >     hard to diagnose bugs for only one of them.
> >
> > (2) The proposed log message talks only about "performance
> >     optimization",
> 
> >                                while the purpose of the change is more
> > about
> >     changing the definition
> 
> Here I think is the misunderstanding. His purpose is NOT to change the
> definition (IIUC). As I read the message you reference below (and Ben's
other
> messages), I understood that he was trying to achieve what you said (i.e.
> optimise the trivial and obvious opportunity) of selecting for the common
> case (underlying conditions) where the two command sequences are
> identical. If the selected case / conditions is not identical then it is
defined
> wrongly...
> 
> I suspect that it was Ben's 'soft' explanation that allowed the discussion
to
> diverge.
> 

I'm unaccustomed to doing reviews like this via email so have been 
struggling with how to most effectively communicate about the proposed
change.  I appreciate any help and understanding as I go through this
for the first time.

My intention was not to change the users expected results which
I believe are to "create a new branch and switch to it."  We reinforce
that expectation with the output of the command which completes 
with the text "Switched to a new branch 'foo'"

> 
> >                                                 of what "git checkout -b
> > NEW" is from
> >     "git branch NEW && git checkout NEW" to "git branch NEW && git
> >     symbolic-ref HEAD refs/heads/NEW".  The explanation in a Ben's
> >     later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does
> >     a much better job contrasting the two.
> >
> > (3) I identified only one difference as an example sufficient to
> >     point out why the patch provided is not a pure optimization but
> >     behaviour change.  Fixing that example alone to avoid change in
> >     the behaviour is trivial (see if the "info/sparse-checkout"
> >     file is present and refrain from skipping the proper checkout),
> 
> This is probably the point Ben needs to take on board to narrow the
> conditions down. There may be others.
> 

The fact that "git checkout -b NEW" updates the index and as a
result reflects any changes in the sparse-checkout and the issue 
Junio pointed out earlier about not calling show_local_changes 
at the end of merge_working_tree are the only difference in behavior
I am aware of.  Both of these are easily rectified.

That said, given we are skipping huge amounts of work by no longer 
merging the commit trees, generating a new index, and merging the 
local modifications in the working tree, it is possible that there are
other behavior changes I'm just not aware of.

> >     but a much larger problem is that I do not know (and Ben does
> >     not, I suspect) know what other behaviour changes the patch is
> >     introducing, and worse, the checks are sufficiently dense too
> >     detailed and intimate to the implementation of unpack_trees()
> >     that it is impossible for anybody to make sure the exceptions
> >     defined in this patch and updates to other parts of the system
> >     will be kept in sync.
> 
> I did not believe he was proposing such a change to behaviour, hence his
> difficulty in responding (or at least that is my perception). I.e. he was
> digging a hole in the wrong place.
> 
> It is possible that he had accidentally introduced a behavious change, and
> having failed to explictly say "This patch (should) produces no behavious
> change", which then continued to re-inforce the misunderstanding.
> 
> >
> > So my inclination at this point, unless we see somebody invents a
> > clever way to solve (3), is that any change that violates (1),
> > i.e. as long as the patch does "Are we doing '-b NEW'?  Then we do
> > something subtly different", is not acceptable, and solving (3) in a
> > maintainable way smells like quite a hard thing to do.  But it would
> > be ideal if (3) is solved cleanly, as we will then not have to worry
> > about changing behaviour at all and can apply the optimization for
> > all of the four cases equally.  As a side effect, that approach
> > would solve problem (2) above.
> >
> > If we were to punt on keeping the sanity (1) and introduce a subtly
> > different "create a new branch and point the HEAD at it", an easier
> > way out may be be one of
> >
> > 1. a totally new command, e.g. "git branch-switch NEW" that takes
> >    only a single argument and no other "checkout" options, or
> >
> > 2. a new option to "git checkout" that takes _ONLY_ a single
> >    argument and incompatible with any other option or command line
> >    argument, or
> >
> > 3. an alias that does "git branch" followed by "git symbolic-ref".
> >
> > Neither of the first two sounds palatable, though.
> 
> It will need Ben to come back and clarify, if he did, or did not, want any
> behaviour change (beyond speed of action;-)
> 

There is a subtlety here in what is meant by "any behavior change."
I did not want to change the users expectations of what this command
is used for.  The only noticeable behavior change should only be that it 
sped up by an order of magnitude.  

To get that speed up, there is a change in behavior from git's 
perspective as it is no longer doing a bunch of work it used to do 
which is what is saving the time.

I was aware that skipping the commit merge/new index/merge working 
tree meant that "git checkout NEW" would no longer update these to 
reflect any potential changes to the sparse-checkout file.  

To determine if this would change the results the user was *expecting*,  
I searched the web and found that all the instructions I could locate
that taught people how to update the index/working tree after
making changes to the sparse-checkout file instructed them to use
"git read-tree -mu HEAD."  I didn't find any that told people to use
"git checkout -b NEW"  

Finally, when I made the optimization to skip these steps I then
verified that the test suite still passed all tests.  I realize that 
there is not 100% coverage of tests but I thought it was a good
indication that none of them were impacted by this optimization.

I've tried to think of a way to solve (3) in a more maintainable way 
but have not been able to come up with anything.  Ultimately,
to ensure are only applying the optimization in this specific case,
we have to test to make sure other options don't require the extra
steps.  I'm open to suggestions!

I'm going to be out for the next 2 weeks so will be unable to respond 
to activity on the thread but a co-worker who has been involved will
be responsive to feedback and rolling any new versions of the patch.

Thanks,

Ben




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-28 17:02 Ben Peart
@ 2016-09-28 17:52 ` Junio C Hamano
  0 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-09-28 17:52 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, Ben Peart, pclouds, Jeff Hostetler, philipoakley

"Ben Peart" <peartben@gmail.com> writes:

> The fact that "git checkout -b NEW" updates the index and as a
> result reflects any changes in the sparse-checkout and the issue 
> Junio pointed out earlier about not calling show_local_changes 
> at the end of merge_working_tree are the only difference in behavior
> I am aware of.  Both of these are easily rectified.
>
> That said, given we are skipping huge amounts of work by no longer 
> merging the commit trees, generating a new index, and merging the 
> local modifications in the working tree, it is possible that there are
> other behavior changes I'm just not aware of.

That is OK.  It is not ok to leave such bugs at the end of the
development before the topic is merged to 'master' to be delivered
to the end users, but you do not have to fight alone to produce a
perfect piece of code with your first attempt.  That's what the
reviews and testing period are for.

If you are shooting for the same behaviour, then that is much better
than "make 'checkout -b NEW' be equivalent to a sequence of
update-ref && symbolic-ref, which is different from others", which
was the second explanation you gave earlier.  I am much happier with
that goal.

But if that is the case, I really do not see any point of singling
out "-b NEW" case.  The following property MUST be kept:

 (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
     and "git checkout HEAD" (no other parameters to any of them)
     ought to give identical index and working tree.  It is too
     confusing to leave subtly different results that will lead to
     hard to diagnose bugs for only one of them.

That would make the "do we skip unpack_trees() call?" decision a lot
simpler to make, I would suspect.  We only need to see "are the two
trees we would fed unpack_trees() the same as HEAD's tree?" and do
not have to look at new_branch and other irrelevant things at all.
What happens in the ref namespace is immaterial, as making or
skipping an unpack_trees() call would not affect anything other than
the resulting index and the working tree.  If we want to keep that
sparse-checkout wart, we would also need to see if the control file
sparse-checkout keeps in $GIT_DIR/ exists, but the result will be
much simpler set of rules, and would hopefully help remove the "the
optimization kicks in following logic that is an unreviewable-mess"
issue.




^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-09-28 17:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-13 14:26 [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
2016-09-13 22:34 ` Junio C Hamano
2016-09-14  6:30   ` Oleg Taranenko
2016-09-14 15:48     ` Junio C Hamano
     [not found]   ` <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com>
     [not found]     ` <BL2PR03MB323E1B2F810C63CB01AA234F4F30@BL2PR03MB323.namprd03.prod.outlook.com>
2016-09-19 13:18       ` Ben Peart
2016-09-19 16:30         ` Junio C Hamano
2016-09-19 17:03           ` Junio C Hamano
2016-09-21 18:32             ` Ben Peart
2016-09-24 14:28               ` Philip Oakley
2016-09-24 18:26                 ` Junio C Hamano
2016-09-24 19:31                   ` Philip Oakley
  -- strict thread matches above, loose matches on Subject: below --
2016-09-28 17:02 Ben Peart
2016-09-28 17:52 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).