git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Victoria Dye <vdye@github.com>,
	Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <derrickstolee@github.com>,
	Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>,
	Matheus Tavares <matheus.bernardino@usp.br>,
	ZheNing Hu <adlternative@gmail.com>
Subject: Re: [PATCH] sparse-checkout.txt: new document with sparse-checkout directions
Date: Tue, 27 Sep 2022 00:30:34 -0700	[thread overview]
Message-ID: <CABPp-BESkb=04vVnqTvZyeCa+7cymX7rosUW3rhtA02khMJKHA@mail.gmail.com> (raw)
In-Reply-To: <xmqqv8p9ahly.fsf@gitster.g>

On Mon, Sep 26, 2022 at 3:36 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Victoria Dye <vdye@github.com> writes:
>
> >> +* Commands behaving the same regardless of high-level use-case
> >> +
> >> +  * commands that only look at files within the sparsity specification
> >> +
> >> +      * status
> >> +      * diff (without --cached or REVISION arguments)
> >> +      * grep (without --cached or REVISION arguments)
> >
> > 'status' and 'diff' currently show information about untracked files outside
> > the working tree (since, not being in the index, they don't have a
> > 'SKIP_WORKTREE' to use). Should that change with the proposed '--restrict'
> > option?
>
> Most likely not.  When sparsity specification is in effect, as you
> said elsewhere in your response, no files, whether tracked or
> untrcked, should exist that are outside your area of interest.
> Their presence should be reported as anomalies by "git status".
>
> Unless the command is being run with the "-uno" option, that is.

Oh, wow, that's something completely outside what I had considered.  I
had viewed sparse-checkouts as splitting "tracked files" into two
subsets.  As such, `--[no-]restrict` could only affect selecting
whether the smaller or larger set of tracked files was of interest.
From that viewpoint, untracked files seemed orthogonal, and thus there
couldn't be such a thing as an "anamalous untracked file".

But this idea is very interesting.  Hmm...

>
> > - 'switch', 'checkout' (switch-like), and 'read-tree -m' block the operation
> >   & advise on how to clean up the modified files to re-align with the
> >   sparsity patterns.
> > - 'reset --hard' silently drops the modified file and resets the
> >   'SKIP_WORKTREE' bit on the corresponding index entry.
> >
> > With the exception of 'reset --hard' (aggressively and unconditionally
> > cleaning the worktree & index is an important aspect of the command, IMO),
> > I'd personally like to see commands in this category align with the behavior
> > of 'switch' where they don't already. Regardless of what we decide, though,
> > I think it's probably worth documenting the "modified outside of sparsity
> > patterns" case.
>
> True.  I agree on both counts.
>
> > Also, 'read-tree' (no args) doesn't apply the 'SKIP_WORKTREE' bit to *any*
> > of the entries it reads into the index. Having all of your files suddenly
> > appear "deleted" probably isn't desired behavior, so it might be a good
> > candidate for the "Known bugs" section.
>
> I would imagine that it actually is OK to say that it is the
> responsibility of whoever invokes read-tree the plumbing command
> to reapply the skip-worktree bits and/or collapse the index entries
> outside the area of interest into trees afterwards.

I'll keep that in mind, but that sounds very error prone to me.

> >> +* Commands that differ for behavior A vs. behavior B:
> >> +
> >> +  * commands that make modifications:
> >
> > nit: "make modifications" -> "make modifications to the index"?
>
> That clarification actually raises an interesting question.  Do we
> want three level distinction, i.e. different behaviour between
> commands that touch and do not touch the working tree, between those
> that touch and do not touch the index, and between those that touch
> and do not touch the commit?
>
> As the index is merely a way to express what the user did to
> eventually create the next tree to be recorded in the commit, my gut
> feeling is that it may be easier to understand if we treated the
> working tree and the index at the same level, actually.  I.e. if
> grepping in the working tree of a sparse checkout does not find a
> match outside the cones of interest, it may make sense to do the
> same at least by default in "grep --cached" mode.
>
> If I understand Stolee's write-up on the use case of those in the
> camp B, they are more aware of the larger whole and expect to see
> hits outside the area they have checkout when running "grep HEAD".
> But in their use case, they do not touch (only look) the area
> outside their cone of interest, so if we limit the operation to
> their cone of interest by default for working tree, the same default
> probably should apply equally for an operation that inspect the
> index.

That is an interesting angle to view things; I wondered if an idea
along these lines was going to come up when I was first responding to
Shaoxuan.  I also wondered if people would come to different
conclusions on whether "git grep --cached" should search outside the
sparsity-paths depending upon whether the sparse index was in use.

One thing that makes me a little leery about this path is whether we
can consistently apply the scoped-to-sparse-specification rule for
index operations.  For example:

  * You previously agreed that `git format-patch` should ignore sparse
specification and operate full tree.
  * `git apply --cached $PATCH` only updates the index, and you
suggested in an alternate email that apply should operate full-tree
(at least with --index or without --cached, but I assume by extension
it probably also applies with --cached).
  * What if someone ran the last two commands, and then goes to commit
the result?  Do we want to scope `git commit` to only accept staged
changes within the sparse specification by default?  I thought we
wouldn't and marked commit as a full-tree operation, by default.
  * What if someone runs `git diff --cached` just before that commit?
Do we scope the diff to only those paths within the sparse
specification?
  * What if someone runs `git status` just before that commit?  Do we
only show staged changes within the sparse specification?

It feels like "git grep --cached" is perhaps the next thing along this
sequence, and I don't see a clear line where to draw that we should
limit things to the sparse specification for the index while treating
the other operations as full tree; it seems like something feels
broken or inconsistent in this sequence of commands if we attempt to
do so.


Also, I have some users in camp B.  They specifically have been using
"git grep --cached ..." for a few years now to find other code of
interest outside of their current sparse-checkout (often in stubbed
out dependencies or other projects that depend on the area you are
modifying).  This allows them to make internal API changes and find
the other sites that need to be modified, including outside the normal
sparse cone.  Perhaps I could re-teach them to use "git grep ... HEAD"
instead, but it may feel like a bit of a break to them.  I've found
"git grep --cached" being documented by others who wrote various "how
to work in sparse checkouts" documents, all commenting on this being
the trick to do a whole-tree search.  I did warn them that we might
change that command on them (and sparse-checkouts in general have a
warning about potentially changing behavior), but I'm a little
hesitant to do so.  So that's a second reason I lean towards treating
index searches the same as REVISION ones -- full-tree for camp B.

  reply	other threads:[~2022-09-27  7:31 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-25  0:09 [PATCH] sparse-checkout.txt: new document with sparse-checkout directions Elijah Newren via GitGitGadget
2022-09-26 17:20 ` Junio C Hamano
2022-09-26 17:38 ` Junio C Hamano
2022-09-27  3:05   ` Elijah Newren
2022-09-27  4:30     ` Junio C Hamano
2022-09-26 20:08 ` Victoria Dye
2022-09-26 22:36   ` Junio C Hamano
2022-09-27  7:30     ` Elijah Newren [this message]
2022-09-27 16:07       ` Junio C Hamano
2022-09-28  6:13         ` Elijah Newren
2022-09-27  6:09   ` Elijah Newren
2022-09-27 16:42   ` Derrick Stolee
2022-09-28  5:42     ` Elijah Newren
2022-09-27 15:43 ` Junio C Hamano
2022-09-28  7:49   ` Elijah Newren
2022-09-27 16:36 ` Derrick Stolee
2022-09-28  5:38   ` Elijah Newren
2022-09-28 13:22     ` Derrick Stolee
2022-10-06  7:10       ` Elijah Newren
2022-10-06 18:27         ` Derrick Stolee
2022-10-07  2:56           ` Elijah Newren
2022-09-30  9:54     ` ZheNing Hu
2022-10-06  7:53       ` Elijah Newren
2022-10-15  2:17         ` ZheNing Hu
2022-10-15  4:37           ` Elijah Newren
2022-10-15 14:49             ` ZheNing Hu
2022-09-30  9:09   ` ZheNing Hu
2022-09-28  8:32 ` [PATCH v2] " Elijah Newren via GitGitGadget
2022-10-08 22:52   ` [PATCH v3] " Elijah Newren via GitGitGadget
2022-11-06  6:04     ` [PATCH v4] " Elijah Newren via GitGitGadget
2022-11-07 20:44       ` Derrick Stolee
2022-11-16  4:39         ` Elijah Newren
2022-11-15  4:03       ` ZheNing Hu
2022-11-16  3:18         ` ZheNing Hu
2022-11-16  6:51           ` Elijah Newren
2022-11-16  5:49         ` Elijah Newren
2022-11-16 10:04           ` ZheNing Hu
2022-11-16 10:10             ` ZheNing Hu
2022-11-16 14:33               ` ZheNing Hu
2022-11-19  2:36                 ` Elijah Newren
2022-11-19  2:15             ` Elijah Newren
2022-11-23  9:08               ` ZheNing Hu
2023-01-14 10:18           ` ZheNing Hu
2023-01-20  4:30             ` Elijah Newren
2023-01-23 15:05               ` ZheNing Hu
2023-01-24  3:17                 ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BESkb=04vVnqTvZyeCa+7cymX7rosUW3rhtA02khMJKHA@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=adlternative@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=matheus.bernardino@usp.br \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).