git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ben Peart" <peartben@gmail.com>
To: <git@vger.kernel.org>
Cc: "Ben Peart" <Ben.Peart@microsoft.com>, <pclouds@gmail.com>,
	"Jeff Hostetler" <jeffhost@microsoft.com>, <philipoakley@iee.org>,
	"'Junio C Hamano'" <gitster@pobox.com>
Subject: RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
Date: Wed, 28 Sep 2016 13:02:04 -0400	[thread overview]
Message-ID: <004d01d219aa$0a941fa0$1fbc5ee0$@gmail.com> (raw)

Resending

> -----Original Message-----
> From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On
> Behalf Of Philip Oakley
> Sent: Saturday, September 24, 2016 3:31 PM
> To: Junio C Hamano <gitster@pobox.com>
> Cc: Ben Peart <Ben.Peart@microsoft.com>; pclouds@gmail.com;
> git@vger.kernel.org
> Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> Hi Junio,
> 
> From: "Junio C Hamano" <gitster@pobox.com>
> > "Philip Oakley" <philipoakley@iee.org> writes:
> >
> >>> > >"git checkout -b foo" (without -f -m or <start_point>) is defined
> >>> > >in the manual as being a shortcut for/equivalent to:
> >>> > >
> >>> > >        (1a) "git branch foo"
> >>> > >        (1b) "git checkout foo"
> >>> > >
> >>> > >However, it has been our experience in our observed use cases and
> >>> > >all the existing git tests, that it can be treated as equivalent
to:
> >>> > >
> >>> > >        (2a) "git branch foo"
> >>> > >        (2b) "git symbolic-ref HEAD refs/heads/foo"
> >>> > >...
> >>> > >
> >>> > I am still not sure if I like the change of what "checkout -b" is
> >>> > this late in the game, though.
> >>>
> >>> ...
> >>> That said, you're much more on the frontline of receiving negative
> >>> feedback about doing that than I am. :)  How would you like to
> >>> proceed?
> >>
> >> I didn't see an initial confirmation as to what the issue really was.
> >> You indicated the symptom ('a long checkout time'), but then we
> >> missed out on hard facts and example repos, so that the issue was
> >> replicable.
> >
> > I took it as a given, trivial and obvious optimization opportunity,
> > that it is wasteful having to traverse two trees to consolidate and
> > reflect their differences into the working tree when we know upfront
> > that these two trees are identical, no matter what the overhead for
> > doing so is.
> 
> I agree, and I believe Ben agrees.
> 

Correct.  In my original patch request I put more specific information on 
the impact this optimization has in our specific case (reducing the cost 
from 166 seconds to 16 seconds).

> >
> >> At the moment there is the simple workaround of an alias that
> >> executes that two step command dance to achieve what you needed, and
> >> Junio has outlined the issues he needed to be covered from his
> >> maintainer perspective (e.g. the detection of sparse checkouts).
> >> Confirming the root causes would help in setting a baseline.
> >>
> >> I hope that is of help - I'd seen that the discussion had gone quiet.
> >
> > Some of the problems I have are:
> >
> > (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
> >     and "git checkout HEAD" (no other parameters to any of them)
> >     ought to give identical index and working tree.  It is too
> >     confusing to leave subtly different results that will lead to
> >     hard to diagnose bugs for only one of them.
> >
> > (2) The proposed log message talks only about "performance
> >     optimization",
> 
> >                                while the purpose of the change is more
> > about
> >     changing the definition
> 
> Here I think is the misunderstanding. His purpose is NOT to change the
> definition (IIUC). As I read the message you reference below (and Ben's
other
> messages), I understood that he was trying to achieve what you said (i.e.
> optimise the trivial and obvious opportunity) of selecting for the common
> case (underlying conditions) where the two command sequences are
> identical. If the selected case / conditions is not identical then it is
defined
> wrongly...
> 
> I suspect that it was Ben's 'soft' explanation that allowed the discussion
to
> diverge.
> 

I'm unaccustomed to doing reviews like this via email so have been 
struggling with how to most effectively communicate about the proposed
change.  I appreciate any help and understanding as I go through this
for the first time.

My intention was not to change the users expected results which
I believe are to "create a new branch and switch to it."  We reinforce
that expectation with the output of the command which completes 
with the text "Switched to a new branch 'foo'"

> 
> >                                                 of what "git checkout -b
> > NEW" is from
> >     "git branch NEW && git checkout NEW" to "git branch NEW && git
> >     symbolic-ref HEAD refs/heads/NEW".  The explanation in a Ben's
> >     later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does
> >     a much better job contrasting the two.
> >
> > (3) I identified only one difference as an example sufficient to
> >     point out why the patch provided is not a pure optimization but
> >     behaviour change.  Fixing that example alone to avoid change in
> >     the behaviour is trivial (see if the "info/sparse-checkout"
> >     file is present and refrain from skipping the proper checkout),
> 
> This is probably the point Ben needs to take on board to narrow the
> conditions down. There may be others.
> 

The fact that "git checkout -b NEW" updates the index and as a
result reflects any changes in the sparse-checkout and the issue 
Junio pointed out earlier about not calling show_local_changes 
at the end of merge_working_tree are the only difference in behavior
I am aware of.  Both of these are easily rectified.

That said, given we are skipping huge amounts of work by no longer 
merging the commit trees, generating a new index, and merging the 
local modifications in the working tree, it is possible that there are
other behavior changes I'm just not aware of.

> >     but a much larger problem is that I do not know (and Ben does
> >     not, I suspect) know what other behaviour changes the patch is
> >     introducing, and worse, the checks are sufficiently dense too
> >     detailed and intimate to the implementation of unpack_trees()
> >     that it is impossible for anybody to make sure the exceptions
> >     defined in this patch and updates to other parts of the system
> >     will be kept in sync.
> 
> I did not believe he was proposing such a change to behaviour, hence his
> difficulty in responding (or at least that is my perception). I.e. he was
> digging a hole in the wrong place.
> 
> It is possible that he had accidentally introduced a behavious change, and
> having failed to explictly say "This patch (should) produces no behavious
> change", which then continued to re-inforce the misunderstanding.
> 
> >
> > So my inclination at this point, unless we see somebody invents a
> > clever way to solve (3), is that any change that violates (1),
> > i.e. as long as the patch does "Are we doing '-b NEW'?  Then we do
> > something subtly different", is not acceptable, and solving (3) in a
> > maintainable way smells like quite a hard thing to do.  But it would
> > be ideal if (3) is solved cleanly, as we will then not have to worry
> > about changing behaviour at all and can apply the optimization for
> > all of the four cases equally.  As a side effect, that approach
> > would solve problem (2) above.
> >
> > If we were to punt on keeping the sanity (1) and introduce a subtly
> > different "create a new branch and point the HEAD at it", an easier
> > way out may be be one of
> >
> > 1. a totally new command, e.g. "git branch-switch NEW" that takes
> >    only a single argument and no other "checkout" options, or
> >
> > 2. a new option to "git checkout" that takes _ONLY_ a single
> >    argument and incompatible with any other option or command line
> >    argument, or
> >
> > 3. an alias that does "git branch" followed by "git symbolic-ref".
> >
> > Neither of the first two sounds palatable, though.
> 
> It will need Ben to come back and clarify, if he did, or did not, want any
> behaviour change (beyond speed of action;-)
> 

There is a subtlety here in what is meant by "any behavior change."
I did not want to change the users expectations of what this command
is used for.  The only noticeable behavior change should only be that it 
sped up by an order of magnitude.  

To get that speed up, there is a change in behavior from git's 
perspective as it is no longer doing a bunch of work it used to do 
which is what is saving the time.

I was aware that skipping the commit merge/new index/merge working 
tree meant that "git checkout NEW" would no longer update these to 
reflect any potential changes to the sparse-checkout file.  

To determine if this would change the results the user was *expecting*,  
I searched the web and found that all the instructions I could locate
that taught people how to update the index/working tree after
making changes to the sparse-checkout file instructed them to use
"git read-tree -mu HEAD."  I didn't find any that told people to use
"git checkout -b NEW"  

Finally, when I made the optimization to skip these steps I then
verified that the test suite still passed all tests.  I realize that 
there is not 100% coverage of tests but I thought it was a good
indication that none of them were impacted by this optimization.

I've tried to think of a way to solve (3) in a more maintainable way 
but have not been able to come up with anything.  Ultimately,
to ensure are only applying the optimization in this specific case,
we have to test to make sure other options don't require the extra
steps.  I'm open to suggestions!

I'm going to be out for the next 2 weeks so will be unable to respond 
to activity on the thread but a co-worker who has been involved will
be responsive to feedback and rolling any new versions of the patch.

Thanks,

Ben




             reply	other threads:[~2016-09-28 17:02 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-28 17:02 Ben Peart [this message]
2016-09-28 17:52 ` [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2016-09-13 14:26 Ben Peart
2016-09-13 22:34 ` Junio C Hamano
2016-09-14  6:30   ` Oleg Taranenko
2016-09-14 15:48     ` Junio C Hamano
     [not found]   ` <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com>
     [not found]     ` <BL2PR03MB323E1B2F810C63CB01AA234F4F30@BL2PR03MB323.namprd03.prod.outlook.com>
2016-09-19 13:18       ` Ben Peart
2016-09-19 16:30         ` Junio C Hamano
2016-09-19 17:03           ` Junio C Hamano
2016-09-21 18:32             ` Ben Peart
2016-09-24 14:28               ` Philip Oakley
2016-09-24 18:26                 ` Junio C Hamano
2016-09-24 19:31                   ` Philip Oakley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='004d01d219aa$0a941fa0$1fbc5ee0$@gmail.com' \
    --to=peartben@gmail.com \
    --cc=Ben.Peart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=pclouds@gmail.com \
    --cc=philipoakley@iee.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).