git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: "Tim Renouf (open source)" <tpr.ll@botech.co.uk>
Cc: Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <stolee@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config
Date: Wed, 2 Jun 2021 16:41:58 -0700	[thread overview]
Message-ID: <CABPp-BHT3ZrGrDVBcSTuzhSmy0EdtGUeHF_RAVohw=nemGhoOA@mail.gmail.com> (raw)
In-Reply-To: <9BCB8981-09D5-4BF4-981B-2BF0AA0D6E5A@botech.co.uk>

Hi,

On Tue, Jun 1, 2021 at 11:14 PM Tim Renouf (open source)
<tpr.ll@botech.co.uk> wrote:
>
> Hi all
>
> Thanks for the reviews and comments.
>
> My use case is that I never want it to remove or otherwise touch those files outside of sparse-checkout that happen to be the same path as index paths. I realize that currently gives me complications (e.g. I must never try and merge/cherry-pick/rebase a commit that might cause a merge conflict out there), and I realize that’s not what everyone else wants. For example, I don’t want git reset --hard or whatever to remove those files. Hence the config option.

Every case I've ever seen of present-despite-skip-worktree files is an
accidental and erroneous condition.  If you're trying to use it for
something else, we'd really need to understand the higher level
use-cases so that we can make ranges of commands work together nicely.
The sparse checkout capability itself was started by a low-level
implementational detail ("treat paths as though they matched HEAD and
don't write them to the working tree") and led to a maze of surprises,
bugs, edge cases, etc.  I'd rather not compound that problem by adding
another configuration option defined via a similar low-level
implementational detail.

I'm also leaning towards having `git reset --hard` not clear
present-despite-skip-worktree files, and not having `git status`
report them; both seem like an unnecessary performance issue for this
rare accidental/erroneous condition.  I totally agree that `git
switch/checkout` should not delete or overwrite such files if they
differ from HEAD; but similar to how having a dirty file prevents a
branch switch to some branch that deletes the file, a
present-despite-skip-worktree file that differs from HEAD should
result in an error message and an aborted switch/checkout.  At least
for the standard sparse-checkout behavior; don't know what your
usecase is.

> Am I right in saying that the sparse-index work makes it easier to achieve my use case? In that those outside-sparse-checkout paths would not ever get merged into the index, even if I merge/cherry-pick/rebase a tree with paths there?

If you merge/cherry-pick/rebase a commit with changes to a path
outside the sparse-checkout, and it merges cleanly with your current
branch, then with sparse-index it wouldn't even show up in the index
(though one of its parent trees would be modified because of the
changes to that file).  But that's not very different than without the
sparse-index, at least with the ort merge backend (available and ready
for using starting with git-2.32).  The recursive merge strategy (the
default) is known to write files to the working tree despite the
sparse-checkout requests, even when the merge is clean, but that's
just a bug in the recursive backend.  (One that isn't easily fixable
within its design, which is one of many reasons it was being written
in the form of the ort backend.)

If you merge/cherry-pick/rebase a commit with changes to a path
outside the sparse-checkout, and it conflicts with your branch, then
with or without sparse-index, that file is going to show up in the
index with higher order stages and be written to the working tree.
That is, so long as you don't have a file in the way.  Since the ort
backend makes use of the checkout machinery to switch from the current
state to the new state, fixing checkout to throw an error and abort
when a present-despite-skip-worktree file is present would cause
merges/rebases/cherry-picks to do the same.

> I can go into more details on how I arrived ay my use case if it helps.
>
> So maybe there are two separate things here:
>
> 1. The bug that checkout removes my file when it is dirty, instead of refusing (unless -f) or just ignoring it.

Yep, we need to fix this.

> 2. My use case, which is to do its best to never remove or otherwise touch worktree files outside sparse-checkout.

Can you expand on this use case a bit?  Should git diff or git status
report on your changes for your usecase?  Should "git restore" ignore
the given paths and not overwrite it?  What if the user explicitly
listed the path in question?  Should stash save these changes or
ignore them?  Should add include these changes or ignore them?  Since
your indexed file is different than the worktree file, should "git mv"
move just the file recorded in the index, just the one in the
worktree, or both?  If someone tries to run "git archive", should your
modified files be included?  If you don't like anything touching these
paths, does that mean they are also uninteresting?  So for example,
should "git log -p" or "git grep $REVISION" only return results inside
the sparse-checkout list of paths?

From a higher level, what are you trying to achieve?  Is it similar to
`git update-index --assume-unchanged`?  The point of sparse-checkout
was to reduce the number of files in the working tree, but it appears
you aren't trying to do that; you are putting files back into the
working tree anyway.  So, what's the point of sparse-checkout then?

It's possible sparse-checkout doesn't meet your needs, and just isn't
designed to meet them, and we need another special concept for your
case.  Or perhaps there's a config option that's meaningful.  Or maybe
you're just struggling with the bugs of sparse-checkout, of which
there are *many*.  See e.g.
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

> > I'm also worried that making a config option may have masked subtle
> > bugs in the patch that the rest of the testsuite would have turned up.
>
> It is true that not hiding it in a config option makes a few tests fail, including ones that specifically test that a git reset after a materialization from a merge conflict causes the file to be removed.

Yeah, not so surprising that it has weird interactions with merging
(and perhaps different weird interactions with different merge
backends).  Anything that touches unpack-trees.c either needs to be
part of standard operation, or if it's behind a special config option,
then it needs to be accompanied with a huge number of tests from many
different commands since it affects so many things.  We're trying to
add a battery of tests for sparse-checkout and sparse-index, and we
still obviously need a lot more.

Hope that helps,
Elijah

  reply	other threads:[~2021-06-02 23:44 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-28 20:14 bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Tim Renouf (open source)
2021-05-28 22:44 ` Elijah Newren
2021-06-01 18:31   ` [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config Tim Renouf
2021-06-02  2:00     ` Derrick Stolee
2021-06-02  2:32       ` Junio C Hamano
2021-06-02  5:53         ` Elijah Newren
2021-06-02  6:13           ` Tim Renouf (open source)
2021-06-02 23:41             ` Elijah Newren [this message]
2021-06-05 15:33               ` Tim Renouf (open source)
2021-06-02  1:37 ` bug report: git checkout deletes worktree file even though it is excluded by sparse-checkout Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BHT3ZrGrDVBcSTuzhSmy0EdtGUeHF_RAVohw=nemGhoOA@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=stolee@gmail.com \
    --cc=tpr.ll@botech.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).