git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ben Peart" <peartben@gmail.com>
To: "'Junio C Hamano'" <gitster@pobox.com>
Cc: <git@vger.kernel.org>, <pclouds@gmail.com>,
	"'Ben Peart'" <benpeart@microsoft.com>
Subject: RE: [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout
Date: Mon, 12 Sep 2016 14:12:07 -0400	[thread overview]
Message-ID: <13ef001d20d21$2d2ea840$878bf8c0$@gmail.com> (raw)
In-Reply-To: <xmqq1t0sagcm.fsf@gitster.mtv.corp.google.com>



> -----Original Message-----
> From: Junio C Hamano [mailto:gitster@pobox.com]
> Sent: Friday, September 9, 2016 5:55 PM
> To: Ben Peart <peartben@gmail.com>
> Cc: git@vger.kernel.org; pclouds@gmail.com; Ben Peart
> <benpeart@microsoft.com>
> Subject: Re: [PATCH v2] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> Ben Peart <peartben@gmail.com> writes:
> 
> > @@ -802,6 +806,87 @@ static void orphaned_commit_warning(struct
> commit *old, struct commit *new)
> >  	free(refs.objects);
> >  }
> >
> > +static int needs_working_tree_merge(const struct checkout_opts *opts,
> > +	const struct branch_info *old,
> > +	const struct branch_info *new)
> > +{
> > +	/*
> > +	 * We must do the merge if we are actually moving to a new
> > +	 * commit tree.
> > +	 */
> > +	if (!old->commit || !new->commit ||
> > +		oidcmp(&old->commit->tree->object.oid, &new->commit-
> >tree->object.oid))
> > +		return 1;
> 
> A huge helper function helps it somewhat, compared with the earlier
> unreadable mess ;-).
> 
> Are we certain that at this point the commit objects are both parsed and
> their tree->object.oid are both valid?
> 
> > +	/*
> > +	 * Honor the explicit request for a three-way merge or to throw away
> > +	 * local changes
> > +	 */
> > +	if (opts->merge || opts->force)
> > +		return 1;
> 
> Hmph, "git checkout -m HEAD" wouldn't have to do anything wrt the index
> status, no?
> 
> For that matter, neither "git checkout -f HEAD".  Unless we rely on
> unpack_trees() to write over the working tree files.
> 
>     ... me goes and looks, and finds that merge_working_tree()
>     indeed does have a logic to do quite different thing when
>     "--force" is given.
> 
> This makes me wonder if the "merge_working_tree() is expensive, so
> selectively skip calling it" approach is working at a wrong level.
> Wouldn't the merge_working_tree() function itself a better place to do
this
> kind of "we may not have to do the full two-way merge"
> optimization?  It already looks at opts and does things differently (e.g.
when
> running with "--force", it does not even call unpack).
> If you can optimize even more by looking at other fields in opts to avoid
> unpack, that would fit better with the structure of the code that we
already
> have.
> 

I completely agree that optimizing within merge_working_tree would provide 
more opportunities for optimization.  I can certainly move the test into
that 
function as a first step.  I've looked into it a little but came to the
conclusion
that it will be non-trivial to determine how to ensure the minimal work is 
done for any arbitrary set of options passed in without breaking something.


While I'd love to see that work done, I just don't have the time to pursue
further 
optimizations that may be available at this point in time.  There are other
things 
(like speeding up status on large repos) I need to work on first.

> > +	/*
> > +	 * Checking out the requested commit may require updating the
> working
> > +	 * directory and index, let the merge handle it.
> > +	 */
> > +	if (opts->force_detach)
> > +		return 1;
> 
> This does not make much sense to me.  After "git branch -f foo HEAD",
there
> is no difference in what is done to the index and the working directory
> between "git checkout --detach HEAD" and "git checkout foo", is there?
> 

I'm attempting to optimize for a single, common path where checkout is 
just creating a new branch (ie "git checkout -b foo") to minimize the 
possibility that I broke some other path I didn't fully understand.  

It is quite possible that there are cases where the index and/or working
directory do not need to be updated or where a merge won't actually 
change anything that this test is not optimized for.  Perhaps I should 
emphasize the "*may* require updating the working directory" in my 
comment.  Because it *could* happen, I let the code fall back to the
old behavior.

> > +	/*
> > +	 * opts->writeout_stage cannot be used with switching branches so is
> > +	 * not tested here
> > +	 */
> > +
> > +	 /*
> > +	  * Honor the explicit ignore requests
> > +	  */
> > +	if (!opts->overwrite_ignore || opts->ignore_skipworktree
> > +		|| opts->ignore_other_worktrees)
> > +		return 1;
> 
> Style.  I think you earlier had
> 
> 	if (a || b ||
>             c)
> 
> and here you are doing
> 
> 	if (a || b
>             || c)
> 
> Please pick one and stick to it (I'd pick the former).

Done

> 
> > +	 /*
> > +	 * If we're not creating a new branch, by definition we're changing
> > +	 * the existing one so need to do the merge
> > +	 */
> > +	if (!opts->new_branch)
> > +		return 1;
> 
> Sorry, but I fail to follow that line of thought.  Starting from a state
where
> your HEAD points at commit A,
> 
>  - switching to a detached HEAD pointing at a commit A,
>  - switching to an existing branch that already points at the same
>    commit A, and
>  - force updating an existing branch that was pointing at something
>    else to point at the same commit A,
> 
> would have the same effect as creating a new branch at commit A and
> switching to it, no?  The same comment applies to the remainder of this
> function.
> 
> More importantly, merge_working_tree() checks things other than what this
> function is checking.  For example, it prevents you from branch-switching
> (whether it is to switch to an existing branch that has the same commit as
the
> current HEAD, to switch to detached HEAD state at the same commit as the
> current HEAD, or to switch to a new branch that points at the same commit
> as the current HEAD) if your index is unmerged (i.e. you are in the middle
of
> a mergy operation).
> 
> So my gut feeling is that this:
> 
> > +	/*
> > +	 * Optimize the performance of "git checkout foo" by skipping the
call
> > +	 * to merge_working_tree where possible.
> > +	 */
> > +	if (needs_working_tree_merge(opts, &old, new)) {
> > +		ret = merge_working_tree(opts, &old, new,
> &writeout_error);
> 
> works at the wrong level.  The comment up to 'Optimize the performance of
> "git checkout foo"' may correctly state what we want to achieve, but I
think
> we should do so not with "by skipping the call to", but with "by
optimizing
> merge_working_tree()".
> 

I agree that optimizing merge_working_tree could result in even greater 
savings and could definitely optimize for more paths/options than this 
patch. While I'd love to see that done, I'm also happy to get a 10x 
improvement in the common case of creating a new branch.

I'll reroll the patch moving the current optimization into 
merge_working_tree and fixing up the style issues you pointed out.

> Thanks.
> 



  reply	other threads:[~2016-09-12 18:12 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-09 19:25 [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
2016-09-09 21:55 ` Junio C Hamano
2016-09-12 18:12   ` Ben Peart [this message]
2016-09-12 20:31     ` Junio C Hamano
2016-09-13 12:33       ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='13ef001d20d21$2d2ea840$878bf8c0$@gmail.com' \
    --to=peartben@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).