git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Martin von Zweigbergk <martinvonz@google.com>
Cc: Victoria Dye <vdye@github.com>, Glen Choo <chooglen@google.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: Bug report: `git restore --source --staged` deals poorly with sparse-checkout
Date: Wed, 5 Oct 2022 00:51:43 -0700	[thread overview]
Message-ID: <CABPp-BH5_=Tq9DM6iAfG3+DuzEE7dR-H8rhP34x-A5FQhLO+bg@mail.gmail.com> (raw)
In-Reply-To: <CAESOdVByucFm=yJn2yL1mwKGqey7tHXH4A-JM-yP125Ok+_Q+g@mail.gmail.com>

On Tue, Oct 4, 2022 at 9:53 PM Martin von Zweigbergk
<martinvonz@google.com> wrote:
>
> On Tue, Oct 4, 2022 at 1:35 PM Victoria Dye <vdye@github.com> wrote:
> >
> > Martin von Zweigbergk wrote:
> > >> On Tue, Oct 4, 2022 at 9:34 AM Victoria Dye <vdye@github.com <mailto:vdye@github.com>> wrote:
> > >>
[...]
> > >> Thanks for reporting this! There are a few confusing things going on with
> > >> 'restore' here.
> > >>
> > >> First is that the out-of-cone was even restored in the first place.
> > >>
> > >
> > > I was actually happy that the out-of-cone paths were restored. I ran that
> > > command as an experiment while reading Elijah's doc because I was curious
> > > what would happen. The reason I think it should restore out-of-cone paths is
> > > so you can do `git restore --staged --source <some commit> && git commit -m
> > > "restore to old commit"` without caring about the sparse spec.
> >
> > Conversely, that's behavior a user *wouldn't* want if they want to keep
> > their sparse cone intact (not to mention the performance impact of checking
> > out the entire worktree). I think it does more harm to those users than it
> > would benefit the ones that want to checkout out-of-cone files.
> >
> > The use-case you're describing should be served by the
> > '--ignore-skip-worktree-bits' option (not the most intuitive name,
> > unfortunately). Luckily, there's an increasing desire to improve the naming
> > of sparse-related options, so the UX situation should improve in the future.
>
> I realized after sending my previous email that I might have a
> different view of what
> sparse checkout is about. To me, it seems like it should be just a performance
> optimization. That's why I feel like commands should behave the same way with
> or without a sparse spec (unless that proposed `--restrict` flag is passed). I
> understand if that's just not feasible. Sorry about the noise in that case :)

The problem I see with that definition is that I'm not even sure what
that means.  Behaving the same way with or without a sparse
specification at the extreme means that switching branches should
populate all the files in that branch...meaning a dense checkout.
That kind of defeats the point.  So, I'm sure you don't mean that they
behave the same in all cases...but where do you draw the line?

Is it just switch/checkout & `reset --hard` that avoid reading and
writing outside the sparse specification?

Should diff & status ignore files outside the sparse specification
even if users wrote to such files?  A "performance optimization" might
suggest we should, but would users get confused?

What about merge/rebase/cherry-pick/revert?  Should those write
additional files to the working tree or avoid it?  What about if there
are conflicts outside the sparse specification?

And if extra files get written outside the sparse specification, are
there ways for this to get "cleaned up" where after resolving
conflicts or changes we can again remove the file from the working
tree?

What about `git grep PATTERN`?  That's documented to search the
tracked files in the working tree.  But should that include the files
that would have been there were it not for the "performance
optimization" of not checking them out?  (Similarly, what about `git
grep --cached PATTERN` or `git grep PATTERN REVISION`?)  I mean, if
these commands should behave the same regardless of sparse
specification, then you should search those other files, right?  But
isn't that a nasty performance penalty if the user has a
sparse/partial clone since Git will have to do many network operations
to get additional blobs in order to search them?  Is that really
wanted?

What about `git rm '*.png'` to remove all the tracked png files from
my working tree.  Should that also remove all the files that would
have been there were it not for the "performance optimization"?  Will
that result in very negative surprises for those with a "I want to
concentrate on just this subset of files" mental model?

What about `git worktree add`?  Should the sparse specification be the
same for all worktrees?  Be per-worktree?  Should it default to dense?
 To just top-level sparse?  To the same sparsity as the worktree you
were in when you created a new one?

There are more questions along these lines, but perhaps this gives a
flavor of the kinds of questions that can come up which need
underlying user mental models and usecases to help us answer how these
commands should behave.

> > But suppose you're doing something like 'git restore --staged --no-worktree
> > --ignore-skip-worktree-bits --source HEAD^'. In that case:
> >
> > - you are restoring to the index
> > - you are *not* restoring to the worktree
> > - you're restoring files with SKIP_WORKTREE applied
> >
> > Right now, SKIP_WORKTREE is removed from the matched entries, but suppose
> > (per your comment) it wasn't. That wouldn't mean that we've "modified the
> > working copy"; the working tree is defined with respect to the index, and if
> > the index entry says "I don't care about the worktree", then there are no
> > differences to reflect.
>
> Yes, not technically changed. I was (and still am) thinking of the working copy
> as logically containing all the files even if some of them are not written out
> to disk.

Git started with a nearly identical definition and still sadly
includes it (though buried deep in a plumbing manual that thankfully
virtually no one will ever read):

       Skip-worktree bit can be defined in one (long) sentence: When reading
       an entry, if it is marked as skip-worktree, then Git pretends its
       working directory version is up to date and read the index version
       instead.

Such a mental model is useless to me in answering how commands should
behave in any interesting cases, and actually led to several
inconsistencies and bugs[1].  However, you didn't leave it there; you
took it a step further...

[1] https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

> I understand if that seems like an odd way of thinking about
> it. It might
> help to think about how it might appear in a virtual file system where clean
> files outside the sparse spec are presented from the index. Or it may be better
> to not try to think about it in this weird way at all :)

Now you've provided a real usecase (virtual file system really
pretending that all files are present), which gives us some context
with which we can try to answer those questions I asked above.  I
think some of the answers might be different for this usecase than the
"behavior A" and "behavior B" in my current sparse-checkout.txt
document, so maybe we want to add this usecase to the mix.

Anyway, thanks for the interesting bug report, and the context of
another usecase to consider.

  reply	other threads:[~2022-10-05  7:52 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-03 22:05 Bug report: `git restore --source --staged` deals poorly with sparse-checkout Glen Choo
2022-10-04 16:34 ` Victoria Dye
     [not found]   ` <CAESOdVAh68HoQoyicfZn4XbjGfiRFCu1zFQmUjMcSAg3tUzr4Q@mail.gmail.com>
2022-10-04 20:34     ` Victoria Dye
2022-10-05  4:53       ` Martin von Zweigbergk
2022-10-05  7:51         ` Elijah Newren [this message]
2022-10-05 20:00           ` Martin von Zweigbergk
2022-10-06  4:20             ` Elijah Newren
2022-10-05 16:11         ` Victoria Dye
2022-10-05  5:22       ` Elijah Newren
2022-10-06 19:30       ` Junio C Hamano
2022-10-06 19:38         ` Junio C Hamano
2022-10-05  5:00     ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BH5_=Tq9DM6iAfG3+DuzEE7dR-H8rhP34x-A5FQhLO+bg@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=chooglen@google.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=martinvonz@google.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).