git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Shaun Case <warmsocks@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: Possible regression for sparse-checkout in git 2.27.0
Date: Tue, 2 Jun 2020 21:37:05 -0700	[thread overview]
Message-ID: <CABPp-BGvc3GZfFuiXaqDk6391ZexQ7D3x9gr5JK6-L+rDQQ4sg@mail.gmail.com> (raw)
In-Reply-To: <CAD3+_6CUX0RPr-dgfUnfGDNNfqu80SYCskioYnu=MS6aJv2dEQ@mail.gmail.com>

Hi,

On Tue, Jun 2, 2020 at 6:12 PM Shaun Case <warmsocks@gmail.com> wrote:
>
> I'm seeing an unexpected change in behavior of git sparse-checkout
> between 2.26.2 and 2.27.0 on CentOS 7.8.  Problem manifests with both
> github and multiple recent versions of gitlab.
>
> Sources came from
> https://mirrors.edge.kernel.org/pub/software/scm/git/, I verified the
> sha256 hashes for both 2.26.2 and 2.27.0 matched.
>
> Built with
> gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
>
> By doing:
>
> ./configure \
>     --prefix=/usr/local \
>     --with-python=python3
> make -j 10
> sudo make -j 10 install install-doc install-html
>
> Problem detail:
>
> In 2.26.2, git sparse-checkout commands...
>
>   init --cone
>   set
>   add
>
> ... all bring in files.  In 2.27.0, no files appear except the .git subdir
> until I do:
>
> git read-tree -mu HEAD
>
> However, according to my reading of this, git read-tree should not be
> necessary in 2.27.0:
> https://stackoverflow.com/questions/28760940/why-does-one-call-git-read-tree-after-a-sparse-checkout/
>
> When I do export GIT_TRACE_PACKET=1, I see that the init --cone, set
> and add sparse-checkout commands all talk to the server.  In 2.27.0,
> these commands do not result in any packets sent to the server.  It
> sounds like they should be calling update_sparsity() themselves, but
> aren't.  Is that wrong?  I couldn't find a git CLI client command to
> invoke it directly except for read-tree.
>
> Below is a small bash script and its output that shows the problem
> with a small random public repo on github.
>
> Best regards,
> Shaun
>
> Script:
> ==============================================
> #!/bin/bash -x
>
> export GIT_TRACE_PACKET=0
> git --version
> git clone --filter=blob:none --no-checkout
> https://github.com/r-spacex/launch-timeline.git
> cd launch-timeline
>
> # Note no server traffic at from here...
> export GIT_TRACE_PACKET=1
> git sparse-checkout init --cone
> git sparse-checkout set README.md
> git sparse-checkout add css
> # ... to here.
>
> # There is nothing here yet except .git/
> ls -las
>
> # This brings in the files as expected, uncomment to verify.
> # git read-tree -mu HEAD
> ==============================================
>
>
> Output:
> ==============================================
> 3sc3396:/tmp>demo-git-sparse-checkout-bug.sh
> + export GIT_TRACE_PACKET=0
> + GIT_TRACE_PACKET=0
> + git --version
> git version 2.27.0
> + git clone --filter=blob:none --no-checkout
> https://github.com/r-spacex/launch-timeline.git
> Cloning into 'launch-timeline'...
> remote: Enumerating objects: 529, done.
> remote: Total 529 (delta 0), reused 0 (delta 0), pack-reused 529
> Receiving objects: 100% (529/529), 67.49 KiB | 987.00 KiB/s, done.
> Resolving deltas: 100% (168/168), done.
> + cd launch-timeline
> + export GIT_TRACE_PACKET=1
> + GIT_TRACE_PACKET=1
> + git sparse-checkout init --cone
> + git sparse-checkout set README.md
> + git sparse-checkout add css
> + ls -las
> total 68
>  4 drwxrwxr-x   3 scase scase  4096 Jun  2 17:46 .
> 60 drwxrwxrwt. 34 root  root  57344 Jun  2 17:46 ..
>  4 drwxrwxr-x   8 scase scase  4096 Jun  2 17:46 .git
> ==============================================

Thanks for the detailed report.  This change came from commit
f56f31af03 ("sparse-checkout: use new update_sparsity() function",
2020-03-27), made by me.  After playing with it for a bit, the
--filter=blob:none turns out to not be relevant to reproducing the
behavior of not getting any files in the working tree, other than
perhaps providing you a way to measure what was or wasn't happening.

I think it'd be more natural to run
  git clone --filter=blob:none --sparse
https://github.com/r-spacex/launch-timeline.git
in place of the combination of
  git clone --filter=blob:none --no-checkout
https://github.com/r-spacex/launch-timeline.git
  git sparse-checkout init --cone
since the --sparse flag was added just for this kind of case -- to do
a clone but start with only a few things checked out.  It's easier, is
the route we're moving towards, and as a bonus also happens to work.
;-)

A bit of a side note, or a few of them, but this command of yours is broken:
  git sparse-checkout set README.md
because --cone mode means you are specifying *directories* that should
be checked out.  Currently, this gives no error, it instead silently
drops you back to non-cone mode, which seems bad to me.
sparse-checkout should provide some kind of error -- or at very least
a warning -- when you make that mistake.

Now let's talk about the commit in question that changed behavior
here.  The point of sparse-checkout is never to switch branches or
checkout new commits; all it does is update which paths are in the
current working directory.  A related point to this is it should never
add or remove entries from the index and shouldn't change any hashes
of files in the index.  It used to violate this, at first via an
implementation that was literally invoking `git read-tree -mu HEAD` in
a subprocess, and then later using internal code equivalent to
invoking that command in a subprocess.  But by violating the
leave-index-entries-alone mandate, it left folks who were in the
middle of a rebase and wanted to update their sparse-checkout to
include some more directories in their working tree in a precarious
spot -- if they didn't update, then they didn't have the files
necessary to build, and if they did forcibly update via `git read-tree
-mu HEAD` then their staged changes would all get wiped out.  I spent
some quality time helping users recover their files and teaching them
about the git storage model.

So that brings us back to your original question.  When you said
--no-checkout, it means that there is no commit checked out and the
index is empty.  update_sparsity() is correctly toggling the
SKIP_WORKTREE bits for the existing index entries that don't match the
sparsity patterns, and it is correctly calling check_updates().
check_updates() is correctly checking for files currently in the index
which have toggled to being needed in the current worktree so that it
can issue downloads related to promisor packs.  The problem is just
that there aren't any index entries to begin with, so there are no
SKIP_WORKTREE bits to update, and thus no files that need to be
downloaded.

It seems a bit risky to make sparse-checkout start doing
checkout/switch behavior and adding entries to the index.  There's a
couple ways forward.  One, we could decide this is a special edge or
corner case where we allow it: if the index is completely empty, then
there's no data to lose and thus we could make `git sparse-checkout
init [--cone]` in that one case use the old 'read-tree -mu HEAD'
logic.  Alternatively, we could just require users to run 'git reset
--hard' at the end of your script.

Stolee: Thoughts?

  reply	other threads:[~2020-06-03  4:37 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-03  1:08 Possible regression for sparse-checkout in git 2.27.0 Shaun Case
2020-06-03  4:37 ` Elijah Newren [this message]
2020-06-03 15:36   ` Derrick Stolee
2020-06-04  7:27     ` Elijah Newren
2020-06-04 20:50       ` Shaun Case
2020-06-05  0:32         ` Derrick Stolee
2020-06-05  0:56           ` Junio C Hamano
2020-06-05 13:07             ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BGvc3GZfFuiXaqDk6391ZexQ7D3x9gr5JK6-L+rDQQ4sg@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=stolee@gmail.com \
    --cc=warmsocks@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).