Re: [PATCH v4] sparse-checkout.txt: new document with sparse-checkout directions

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Elijah Newren <newren@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Victoria Dye <vdye@github.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>,
	Matheus Tavares <matheus.bernardino@usp.br>,
	Glen Choo <chooglen@google.com>,
	Martin von Zweigbergk <martinvonz@google.com>
Subject: Re: [PATCH v4] sparse-checkout.txt: new document with sparse-checkout directions
Date: Tue, 15 Nov 2022 21:49:08 -0800	[thread overview]
Message-ID: <CABPp-BHaKH4sOPx2tx7CU+Uymvtu=mU1ZweGBDdWvhb-FgGA_Q@mail.gmail.com> (raw)
In-Reply-To: <CAOLTT8TzpfoH7pz7gxgFvNWOaUZUcg1q_Tap+2anwHfAUgDV8Q@mail.gmail.com>

On Mon, Nov 14, 2022 at 8:03 PM ZheNing Hu <adlternative@gmail.com> wrote:
> Elijah Newren via GitGitGadget <gitgitgadget@gmail.com> 于2022年11月6日周日 14:04写道：
[...]
> > +sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to
> > +       define the set of files of interest.  A warning: It is easy to
> > +       over-use this term (or the shortened "patterns" term), for two
> > +       reasons: (1) users in cone mode specify directories rather than
> > +       patterns (their directories are transformed into patterns, but
> > +       users may think you are talking about non-cone mode if you use the
> > +       word "patterns"), and (b) the sparse specification might
>
> nit: s/(b)/(2)/g

Thanks.

> > +sparse specification: The set of paths in the user's area of focus.  This
> > +       is typically just the tracked files that match the sparsity
> > +       patterns, but the sparse specification can temporarily differ and
> > +       include additional files.  (See also "Sparse specification
> > +       vs. sparsity patterns")
> > +
> > +       * When working with history, the sparse specification is exactly
> > +         the set of files matching the sparsity patterns.
> > +       * When interacting with the working tree, the sparse specification
> > +         is the set of tracked files with a clear SKIP_WORKTREE bit or
> > +         tracked files present in the working copy.
>
> I'm guessing what you mean here is:
> Some files are stored with a flag bit of !SKIP_WORKTREE in its index entry.
> But files are "vivifying" (restore to worktree) or new files added to
> index (tracked files),
> they also belong to the sparse specification.

For this case, when interacting with the working tree, I mean the
union of the following two sets of files:
  * files with !SKIP_WORKTREE in the index entry
  * files present in the working copy whose index entry has the
SKIP_WORKTREE bit set.

The fact that files might be new index entries (i.e. newly tracked
files) is irrelevant; it only matters whether such files fit into one
of the two categories above or not.

The fact that files have been "vivified" is slightly ambiguous and
thus a bad way to define this set.  When git vivifies files, it'll
clear the SKIP_WORKTREE bit.  If an editor, or script external to git,
or something else restores the file, it will likely overlook that
detail.  We want vivified files to be part of the sparse specification
when interacting with the working tree regardless of whether the
SKIP_WORKTREE bit was correctly updated, so I defined it the way I did
to remove such ambiguity.  (I guess I should note that as per
af6a51875a ("repo_read_index: clear SKIP_WORKTREE bit from files
present in worktree", 2022-01-14), git will clear the SKIP_WORKTREE
bit for files present in the working copy as one of the first things
it does, but that could leave people wondering whether I meant the
SKIP_WORKTREE bit was set as of the time of git invocation.  So, I
explicitly call out files present in the working copy for which the
index entry has the SKIP_WORKTREE bit set, so folks know these files
are definitely included in the sparse specification.)

> I think we can add some examples to describe these terms.
>
> #!/bin/sh
>
> set -x
>
> rm -rf mono-repo
> git init mono-repo -b main
> (
>   cd mono-repo &&
>   mkdir p1 p2 &&
>   echo a >p1/a &&
>   echo b >p1/b &&
>   echo a >p2/a &&
>   echo b >p2/b &&
>   git add . &&
>   git commit -m ok &&
>   git sparse-checkout set p1 &&
>   git ls-files -t &&
>   echo a >>p1/a &&
>   echo b >>p1/b &&
>   mkdir p2 p3 &&
>   echo next >>p2/a &&
>   echo next >>p3/c &&
>   git add p3/c &&
>   # p2/a and p3/c vivify
>   git ls-files -t &&
>   # compare wortree/commit
>   git --no-pager diff HEAD --name-only
> )

You've added a bunch of code with this example, but you have not said
what the output should be, so how exactly does this help describe the
terms?

> > +       * When modifying or showing results from the index, the sparse
> > +         specification is the set of files with a clear SKIP_WORKTREE bit
> > +         or that differ in the index from HEAD.
>
> #!/bin/sh
>
> set -x
>
> rm -rf mono-repo
> git init mono-repo -b main
> (
>   cd mono-repo &&
>   mkdir p1 p2 &&
>   echo a >p1/a &&
>   echo b >p1/b &&
>   echo a >p2/a &&
>   echo b >p2/b &&
>   git add . &&
>   git commit -m ok &&
>   git sparse-checkout set p1 &&
>   git update-index --chmod=+x p2/a &&
>   # compare commit/index
>   git --no-pager diff --cached --name-only
> )

Same issue here; you haven't stated the expected output of these
commands, so I don't see how they help with the description at all.

Perhaps it's worth noting why I think the sparse specification should
be extended when dealing with the index:

  * "mergy" commands (merge, rebase, cherry-pick, am, revert) can
modify the index outside the sparsity patterns, without creating a
commit.
  * `git commit` (or `rebase --continue`, or whatever) will create a
commit from whatever staged versions of files there are
  => `git status` should show what is about to be committed
  => `git diff --cached --name-only` ought to be usable to show what
is to be committed
  => `git grep --cached ...` ought to be usable to search through what
is about to be committed

See also https://lore.kernel.org/git/CABPp-BESkb=04vVnqTvZyeCa+7cymX7rosUW3rhtA02khMJKHA@mail.gmail.com/
(starting with the paragraph with "leery" in it), and the thread
starting there.  If the sparse specification is not expanded, users
will get some nasty surprises, and the only other alternative I can
think of to avoid such surprises would be making several commands
always run full tree.  Running full-tree with a non-default option to
run sparse forces behavior A folks into a "pick your poison"
situation, which is not nice.  Extending the sparse specification to
include files whose index entries do not match HEAD for index-related
operations provides the nice middle ground that avoids such usability
problems while also allowing users to avoid operating on a full tree.

[...]
>  I think A's users might need a little more refinement.
>
> A: Some users are _only_ interested in the sparse portion of the repo,
> but they don't want to download all the blobs, they can accept to download
> other data later using partial-clone, which can reduce the local storage size.
> (Current default behavior)

Behavior A is definitely not the current default behavior.  Also,
behavior A is not tied to partial clones; some users may well want it
even with a dense clone, so we need to avoid suggesting it is only for
users with partial clones.  (Though, if users are using partial clones
with behavior A, then I agree with the part you wrote other than your
parenthetical comment.)

> A** : Some users are _only_ interested in the sparse portion of the repo,
> but they want to download all the blobs in it to avoid some unnecessary
> network connections afterwards.

Here you just repeated `A*` but relabelled it as `A**`.  Yes, this one
is explicitly tied to partial clone behavior.

[...]
> > +    The fact that files can move between the 'tracked' and 'untracked'
> > +    categories means some commands will have to treat untracked files
> > +    differently.  But if we have to treat untracked files differently,
> > +    then additional commands may also need changes:
> > +
> > +      * status
> > +      * clean
> > +
>
> I'm a bit worried about git status, because it's used in many shells
> (e.g. zsh) i
> in the git prompt function. Its default behavior is restricted, otherwise users
> may get blocked when they use zsh to cd to that directory. I don't know how
> to reproduce this problem (since the scenario is built on checkout to a local
> unborn branch).

Could you elaborate?  I'm not sure if you are talking about an
existing problem that you are worried about being exacerbated, or a
hypothetical problem that could occur with changes.  Further, your
wording is so vague about the problem, that I have no idea what its
nature is or whether any changes to status would even possibly have
any bearing on it.  But the suggested changes to git status are
simply:

> > +    In particular, `status` may need to report any untracked files outside
> > +    the sparsity specification as an erroneous condition (especially to
> > +    avoid the user trying to `git add` them, forcing `git add` to display
> > +    an error).

[...]
> > +    ls-files may be slightly special in that e.g. `git ls-files -t` is
> > +    often used to see what is sparse and what is not.  Perhaps -t should
> > +    always work on the full tree?
> > +
>
> Recently git ls-files added a --format option, perhaps this can be modified to
> show if a file is SKIP_WORKTREE in the future.

If so, then you've made my question also applicable for `--format`;
much like -t, should --format always work on the full tree?  (Or maybe
just when the format specifies the skip worktree bit?)

[...]
> > +  * If a config option is added (sparse.scope?) what should the values and
> > +    description be?  "sparse" (behavior A), "worktree-sparse-history-dense"
> > +    (behavior B), "dense" (behavior C)?  There's a risk of confusion,
> > +    because even for Behaviors A and B we want some commands to be
> > +    full-tree and others to operate sparsely, so the wording may need to be
> > +    more tied to the usecases and somehow explain that.  Also, right now,
> > +    the primary difference we are focusing is just the history-querying
> > +    commands (log/diff/grep).  Previous config suggestion here: [13]
> > +
>
> Maybe sparse.scope={sparse, all}?

I guess that's people's common first guess.  However, when you dig in,
I think this would be badly broken -- see my response to Stolee I just
sent out.

next prev parent reply	other threads:[~2022-11-16  5:58 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-25  0:09 [PATCH] sparse-checkout.txt: new document with sparse-checkout directions Elijah Newren via GitGitGadget
2022-09-26 17:20 ` Junio C Hamano
2022-09-26 17:38 ` Junio C Hamano
2022-09-27  3:05   ` Elijah Newren
2022-09-27  4:30     ` Junio C Hamano
2022-09-26 20:08 ` Victoria Dye
2022-09-26 22:36   ` Junio C Hamano
2022-09-27  7:30     ` Elijah Newren
2022-09-27 16:07       ` Junio C Hamano
2022-09-28  6:13         ` Elijah Newren
2022-09-27  6:09   ` Elijah Newren
2022-09-27 16:42   ` Derrick Stolee
2022-09-28  5:42     ` Elijah Newren
2022-09-27 15:43 ` Junio C Hamano
2022-09-28  7:49   ` Elijah Newren
2022-09-27 16:36 ` Derrick Stolee
2022-09-28  5:38   ` Elijah Newren
2022-09-28 13:22     ` Derrick Stolee
2022-10-06  7:10       ` Elijah Newren
2022-10-06 18:27         ` Derrick Stolee
2022-10-07  2:56           ` Elijah Newren
2022-09-30  9:54     ` ZheNing Hu
2022-10-06  7:53       ` Elijah Newren
2022-10-15  2:17         ` ZheNing Hu
2022-10-15  4:37           ` Elijah Newren
2022-10-15 14:49             ` ZheNing Hu
2022-09-30  9:09   ` ZheNing Hu
2022-09-28  8:32 ` [PATCH v2] " Elijah Newren via GitGitGadget
2022-10-08 22:52   ` [PATCH v3] " Elijah Newren via GitGitGadget
2022-11-06  6:04     ` [PATCH v4] " Elijah Newren via GitGitGadget
2022-11-07 20:44       ` Derrick Stolee
2022-11-16  4:39         ` Elijah Newren
2022-11-15  4:03       ` ZheNing Hu
2022-11-16  3:18         ` ZheNing Hu
2022-11-16  6:51           ` Elijah Newren
2022-11-16  5:49         ` Elijah Newren [this message]
2022-11-16 10:04           ` ZheNing Hu
2022-11-16 10:10             ` ZheNing Hu
2022-11-16 14:33               ` ZheNing Hu
2022-11-19  2:36                 ` Elijah Newren
2022-11-19  2:15             ` Elijah Newren
2022-11-23  9:08               ` ZheNing Hu
2023-01-14 10:18           ` ZheNing Hu
2023-01-20  4:30             ` Elijah Newren
2023-01-23 15:05               ` ZheNing Hu
2023-01-24  3:17                 ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BHaKH4sOPx2tx7CU+Uymvtu=mU1ZweGBDdWvhb-FgGA_Q@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=adlternative@gmail.com \
    --cc=chooglen@google.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=martinvonz@google.com \
    --cc=matheus.bernardino@usp.br \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).