git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Martin von Zweigbergk <martinvonz@google.com>
To: Elijah Newren <newren@gmail.com>
Cc: Victoria Dye <vdye@github.com>, Glen Choo <chooglen@google.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: Bug report: `git restore --source --staged` deals poorly with sparse-checkout
Date: Wed, 5 Oct 2022 13:00:00 -0700	[thread overview]
Message-ID: <CAESOdVDt7SU=OJhF0mgyZ=B3sncB49aML8oOzKTKAnmGO5BaVQ@mail.gmail.com> (raw)
In-Reply-To: <CABPp-BH5_=Tq9DM6iAfG3+DuzEE7dR-H8rhP34x-A5FQhLO+bg@mail.gmail.com>

On Wed, Oct 5, 2022 at 12:51 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Oct 4, 2022 at 9:53 PM Martin von Zweigbergk
> <martinvonz@google.com> wrote:
> >
> > On Tue, Oct 4, 2022 at 1:35 PM Victoria Dye <vdye@github.com> wrote:
> > >
> > > Martin von Zweigbergk wrote:
> > > >> On Tue, Oct 4, 2022 at 9:34 AM Victoria Dye <vdye@github.com <mailto:vdye@github.com>> wrote:
> > > >>
> [...]
> > > >> Thanks for reporting this! There are a few confusing things going on with
> > > >> 'restore' here.
> > > >>
> > > >> First is that the out-of-cone was even restored in the first place.
> > > >>
> > > >
> > > > I was actually happy that the out-of-cone paths were restored. I ran that
> > > > command as an experiment while reading Elijah's doc because I was curious
> > > > what would happen. The reason I think it should restore out-of-cone paths is
> > > > so you can do `git restore --staged --source <some commit> && git commit -m
> > > > "restore to old commit"` without caring about the sparse spec.
> > >
> > > Conversely, that's behavior a user *wouldn't* want if they want to keep
> > > their sparse cone intact (not to mention the performance impact of checking
> > > out the entire worktree). I think it does more harm to those users than it
> > > would benefit the ones that want to checkout out-of-cone files.
> > >
> > > The use-case you're describing should be served by the
> > > '--ignore-skip-worktree-bits' option (not the most intuitive name,
> > > unfortunately). Luckily, there's an increasing desire to improve the naming
> > > of sparse-related options, so the UX situation should improve in the future.
> >
> > I realized after sending my previous email that I might have a
> > different view of what
> > sparse checkout is about. To me, it seems like it should be just a performance
> > optimization. That's why I feel like commands should behave the same way with
> > or without a sparse spec (unless that proposed `--restrict` flag is passed). I
> > understand if that's just not feasible. Sorry about the noise in that case :)
>
> The problem I see with that definition is that I'm not even sure what
> that means.  Behaving the same way with or without a sparse
> specification at the extreme means that switching branches should
> populate all the files in that branch...meaning a dense checkout.
> That kind of defeats the point.  So, I'm sure you don't mean that they
> behave the same in all cases...but where do you draw the line?

I agree with you and Stolee that there are two different cases: some
people use sparse checkouts to restrict what they see (behavior A), and
some people use it just as a performance optimization (behavior B). So I
suspect we roughly agree about what should happen if you pass
`--restrict` (or if that becomes the default so you don't actually need to
pass it). My arguments were about the `--no-restrict` case. Sorry, I
should have made that clear.

I also agree that having a way to make commands restrict to certain paths
by default is useful, and I agree that tying that set of paths to the current
worktree's sparse spec makes sense.

I'll answer the questions below for the `--no-restrict` case
(behavior B).

>
> Is it just switch/checkout & `reset --hard` that avoid reading and
> writing outside the sparse specification?
>
> Should diff & status ignore files outside the sparse specification
> even if users wrote to such files?  A "performance optimization" might
> suggest we should, but would users get confused?

I think they should be included (again, in the `--no-restrict` case).

>
> What about merge/rebase/cherry-pick/revert?  Should those write
> additional files to the working tree or avoid it?  What about if there
> are conflicts outside the sparse specification?

I think they should avoid it, but since the user will need to resolve
that conflict anyway, I can see it makes sense to write them to disk
if there are conflicts.

>
> And if extra files get written outside the sparse specification, are
> there ways for this to get "cleaned up" where after resolving
> conflicts or changes we can again remove the file from the working
> tree?

I've never really used `git sparse-checkout` (until I read your doc),
but isn't that what `git sparse-checkout reapply` is for?

>
> What about `git grep PATTERN`?  That's documented to search the
> tracked files in the working tree.  But should that include the files
> that would have been there were it not for the "performance
> optimization" of not checking them out?  (Similarly, what about `git
> grep --cached PATTERN` or `git grep PATTERN REVISION`?)  I mean, if
> these commands should behave the same regardless of sparse
> specification, then you should search those other files, right?  But
> isn't that a nasty performance penalty if the user has a
> sparse/partial clone since Git will have to do many network operations
> to get additional blobs in order to search them?  Is that really
> wanted?

I think it's consistent to search them with `--no-restrict` (but not
with `--restrict`, of course).

>
> What about `git rm '*.png'` to remove all the tracked png files from
> my working tree.  Should that also remove all the files that would
> have been there were it not for the "performance optimization"?  Will
> that result in very negative surprises for those with a "I want to
> concentrate on just this subset of files" mental model?

Same here.

>
> What about `git worktree add`?  Should the sparse specification be the
> same for all worktrees?  Be per-worktree?  Should it default to dense?
>  To just top-level sparse?  To the same sparsity as the worktree you
> were in when you created a new one?

That's an interesting case. If someone does `git worktree add` and expects
all files to be available in the working copy, they might be surprised, yes.
I think that's a much smaller risk than
`git restore --source HEAD^ --staged && git commit -m 'undo changes'` being
partial, however.

  reply	other threads:[~2022-10-05 20:00 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-03 22:05 Bug report: `git restore --source --staged` deals poorly with sparse-checkout Glen Choo
2022-10-04 16:34 ` Victoria Dye
     [not found]   ` <CAESOdVAh68HoQoyicfZn4XbjGfiRFCu1zFQmUjMcSAg3tUzr4Q@mail.gmail.com>
2022-10-04 20:34     ` Victoria Dye
2022-10-05  4:53       ` Martin von Zweigbergk
2022-10-05  7:51         ` Elijah Newren
2022-10-05 20:00           ` Martin von Zweigbergk [this message]
2022-10-06  4:20             ` Elijah Newren
2022-10-05 16:11         ` Victoria Dye
2022-10-05  5:22       ` Elijah Newren
2022-10-06 19:30       ` Junio C Hamano
2022-10-06 19:38         ` Junio C Hamano
2022-10-05  5:00     ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAESOdVDt7SU=OJhF0mgyZ=B3sncB49aML8oOzKTKAnmGO5BaVQ@mail.gmail.com' \
    --to=martinvonz@google.com \
    --cc=chooglen@google.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).