git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Shaun Case <warmsocks@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Derrick Stolee <stolee@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Possible regression for sparse-checkout in git 2.27.0
Date: Thu, 4 Jun 2020 13:50:08 -0700	[thread overview]
Message-ID: <CAD3+_6AK390F5iVqCmP-FY8MxJtcUoS7HsbunhZ0qNzOdT53CQ@mail.gmail.com> (raw)
In-Reply-To: <CABPp-BHyDUrOLg6-VONewbmXNFHUvKoDXfRTLgg8aEhJFtJLuQ@mail.gmail.com>

Hi Elijah, Derrick,

On Thu, Jun 4, 2020 at 12:27 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi,
>
> On Wed, Jun 3, 2020 at 8:36 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 6/3/2020 12:37 AM, Elijah Newren wrote:
> > > I think it'd be more natural to run
> >
> > >   git clone --filter=blob:none --sparse
> > > https://github.com/r-spacex/launch-timeline.git
> >
> > > in place of the combination of
> >
> > >   git clone --filter=blob:none --no-checkout
> > > https://github.com/r-spacex/launch-timeline.git
> > >   git sparse-checkout init --cone
> >
> > > since the --sparse flag was added just for this kind of case -- to do
> > > a clone but start with only a few things checked out.  It's easier, is
> > > the route we're moving towards, and as a bonus also happens to work.
> >
> > Just one warning: the --sparse option in "git clone" does not currently
> > enable core.sparseCheckoutCone, so running "git sparse-checkout init --cone"
> > afterwards is a good idea, or else your "git sparse-checkout (set|add)"
> > commands will not behave the way you expect.

Ok, great.  My production script now does this:

git clone --filter=blob:none --sparse <URL>
git sparse-checkout init --cone
git sparse-checkout add <PATH1>
git sparse-checkout add <PATH2>
[ ... ]

... and everything works as expected in 2.26.2 and 2.27.0. Perfection!

See below for why I was doing it the other way.

> > (I think that I will propose a change in behavior to make it do so during
> > this release cycle.)

That sounds good to me.  Could you also consider adding an error
message for git sparse-checkout init, if something unexpected follows?  My
rationale is that my script initially contained this:

     git sparse-checkout init cone

Note the missing "--" in front of cone.  This failed silently, so I
thought I was in
cone mode, but wasn't.  An error message would have helped.  Your proposal,
having it happen automatically, would too.

> > > A bit of a side note, or a few of them, but this command of yours is broken:
> > >   git sparse-checkout set README.md
> > > because --cone mode means you are specifying *directories* that should
> > > be checked out.

Thank you for pointing this out.  This is a surprise.  I'm migrating some large
Subversion repos and workspace setup scripts to git (thus the interest
in partial
cloning and sparse checkouts), and svn has the ability to sparse checkout
individual files as well as directories, so I expected it to work the
same way in git.

Is this limitation a design decision, a technical limitation, a
planned feature?  It
seems to be fairly popular in svn, and  there is even more interest
for it in git:

https://stackoverflow.com/questions/122107/checkout-one-file-from-subversion
https://stackoverflow.com/questions/2466735/how-to-sparsely-checkout-only-one-single-file-from-a-git-repository

> > > Currently, this gives no error, it instead silently
> > > drops you back to non-cone mode, which seems bad to me.
> > > sparse-checkout should provide some kind of error -- or at very least
> > > a warning -- when you make that mistake.

Agreed.

I'm going to leave the following here for context, but please scroll down.

> > > Now let's talk about the commit in question that changed behavior
> > > here.  The point of sparse-checkout is never to switch branches or
> > > checkout new commits; all it does is update which paths are in the
> > > current working directory.  A related point to this is it should never
> > > add or remove entries from the index and shouldn't change any hashes
> > > of files in the index.  It used to violate this, at first via an
> > > implementation that was literally invoking `git read-tree -mu HEAD` in
> > > a subprocess, and then later using internal code equivalent to
> > > invoking that command in a subprocess.  But by violating the
> > > leave-index-entries-alone mandate, it left folks who were in the
> > > middle of a rebase and wanted to update their sparse-checkout to
> > > include some more directories in their working tree in a precarious
> > > spot -- if they didn't update, then they didn't have the files
> > > necessary to build, and if they did forcibly update via `git read-tree
> > > -mu HEAD` then their staged changes would all get wiped out.  I spent
> > > some quality time helping users recover their files and teaching them
> > > about the git storage model.
> > >
> > > So that brings us back to your original question.  When you said
> > > --no-checkout, it means that there is no commit checked out and the
> > > index is empty.  update_sparsity() is correctly toggling the
> > > SKIP_WORKTREE bits for the existing index entries that don't match the
> > > sparsity patterns, and it is correctly calling check_updates().
> > > check_updates() is correctly checking for files currently in the index
> > > which have toggled to being needed in the current worktree so that it
> > > can issue downloads related to promisor packs.  The problem is just
> > > that there aren't any index entries to begin with, so there are no
> > > SKIP_WORKTREE bits to update, and thus no files that need to be
> > > downloaded.
> > >
> > > It seems a bit risky to make sparse-checkout start doing
> > > checkout/switch behavior and adding entries to the index.  There's a
> > > couple ways forward.  One, we could decide this is a special edge or
> > > corner case where we allow it: if the index is completely empty, then
> > > there's no data to lose and thus we could make `git sparse-checkout
> > > init [--cone]` in that one case use the old 'read-tree -mu HEAD'
> > > logic.  Alternatively, we could just require users to run 'git reset
> > > --hard' at the end of your script.
> > >
> > > Stolee: Thoughts?
> >
> > I agree that using "--sparse" instead of "--no-checkout" is the
> > best way forward for now, but I'll classify that as a workaround
> > and not necessarily the end of the conversation.
>
> Agreed.

Agree too.  I also agree that Elijah's changes are necessary, and will save me
a ton of headaches going forward.  I don't want them backed out.

So, it seems --sparse instead of --no-checkout is what I should have been doing
from the beginning, I had to fiddle a bit to get what I had working,
by following
some online resources and experimenting, and what I ended up with working
satisfactorily was --no-checkout, with 2.26.2.

Why?

Perhaps the messaging has inadvertently become a little muddled.  I started
by working on getting partial cloning working; when I searched, these are the
resources I found and followed:

https://about.gitlab.com/blog/2020/03/13/partial-clone-for-massive-repositories/
https://github.blog/2020-01-13-highlights-from-git-2-25/
https://stackoverflow.com/questions/4114887/is-it-possible-to-do-a-sparse-checkout-without-checking-out-the-whole-repository
https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/
https://docs.gitlab.com/ce/topics/git/partial_clone.html

The first four mention --no-checkout, but do not mention --sparse.  The last one
actually gives proper guidance, but by the time I came across it, I had the
--no-checkout method working and I came away not even realizing --sparse
exists.  Perhaps I'm the only one who went down this path.  If not, the
--no-checkout case may be a bit more common than you might expect.

> > In general, the commit in question is doing something extremely
> > valuable for common situations, like rebase that you mention.
> > I also think that this change in behavior is warranted by the
> > clear warning placed at the top of the docs [1]:
> >
> >         THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE
> >         BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-
> >         CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE.
> >
> > [1] https://git-scm.com/docs/git-sparse-checkout#_description

Got it.  Consider your butts covered.  :)

Thanks for the guidance and explanation.

Shaun

  reply	other threads:[~2020-06-04 20:50 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-03  1:08 Possible regression for sparse-checkout in git 2.27.0 Shaun Case
2020-06-03  4:37 ` Elijah Newren
2020-06-03 15:36   ` Derrick Stolee
2020-06-04  7:27     ` Elijah Newren
2020-06-04 20:50       ` Shaun Case [this message]
2020-06-05  0:32         ` Derrick Stolee
2020-06-05  0:56           ` Junio C Hamano
2020-06-05 13:07             ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAD3+_6AK390F5iVqCmP-FY8MxJtcUoS7HsbunhZ0qNzOdT53CQ@mail.gmail.com \
    --to=warmsocks@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).