list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <>
To: Matheus Tavares Bernardino <>
Cc: Elijah Newren via GitGitGadget <>,
	git <>, Derrick Stolee <>
Subject: Re: [PATCH] unpack-trees: do not set SKIP_WORKTREE on submodules
Date: Wed, 17 Jun 2020 17:24:26 -0700	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

Hi Matheus,

On Wed, Jun 17, 2020 at 10:58 AM Matheus Tavares Bernardino
<> wrote:
> On Tue, Jun 16, 2020 at 10:21 PM Elijah Newren via GitGitGadget
> <> wrote:
> >
> > From: Elijah Newren <>
> >
> > As noted in commit e7d7c73249 ("git-sparse-checkout: clarify
> > interactions with submodules", 2020-06-10), sparse-checkout cannot
> > remove submodules even if they don't match the sparsity patterns,
> > because doing so would risk data loss -- unpushed, uncommitted, or
> > untracked changes could all be lost.  That commit also updated the
> > documentation to point out that submodule initialization state was a
> > parallel, orthogonal reason that entries in the index might not be
> > present in the working tree.
> >
> > However, sparsity and submodule initialization weren't actually fully
> > orthogonal yet.  The SKIP_WORKTREE handling in unpack_trees would
> > attempt to set the SKIP_WORKTREE bit on submodules when the submodule
> > did not match the sparsity patterns.  This resulted in innocuous but
> > potentially alarming warning messages:
> >
> >     warning: unable to rmdir 'sha1collisiondetection': Directory not empty
> >
> > It could also make things confusing since the entry would be marked as
> > SKIP_WORKTREE in the index but actually still be present in the working
> > tree:
> >
> >     $ git ls-files -t | grep sha1collisiondetection
> >     S sha1collisiondetection
> >     $ ls -al sha1collisiondetection/ | wc -l
> >     13
> >
> > Submodules have always been their own form of "partial checkout"
> > behavior, with their own "present or not" state determined by running
> > "git submodule [init|deinit|update]".  Enforce that separation by having
> > the SKIP_WORKTREE logic not touch submodules and allow submodules to
> > continue using their own initialization state for determining if the
> > submodule is present.
> Makes sense to me.
> I'm just thinking about the possible implications in grep (with
> mt/grep-sparse-checkout). As you mentioned in [1], users might think
> of "git grep $rev $pat" as an optimized version of "git checkout $rev
> && git grep $pat". And, in this sense, they probably expect the output
> of these two operations to be equal. But if we don't set the
> SKIP_WORKTREE bit for submodules they might diverge.
> As an example, if we have a repository like:
> .
> |-- A
> |   `-- sub
> `-- B
> And the [cone-mode] sparsity rules:
> /*
> !/*/
> /B/
> Then, "git grep --recurse-submodules $rev $pat" would search only in B
> (as A doesn't match the sparsity patterns and thus, is not recursed
> into). But "git checkout $rev && git grep --recurse-submodules $pat"
> would search in both B and A/sub (as the latter would not have the
> SKIP_WORKTREE bit set).
> This might be a problem for git-grep, not git-sparse-checkout. But I'm
> not sure how we could solve it efficiently, as the submodule might be
> deep down in a path whose first dir was already ignored for not
> matching the sparsity patterns. Is this a problem we should consider,
> or is it OK if the outputs of these two operations diverge?

You always have an eagle eye for catching these things.  Good point.

So, ignoring submodules for a second... In the case of unmerged files
that do not match the sparsity patterns, we previously felt it was
fine that
    git grep $revision $pattern
won't search through $revision:${currently_unmerged_file} despite the fact that
    git grep $pattern
will search through ${currently_unmerged_file} from the working directory.

If we follow that analogy, then we're fine having those two "almost
command equivalents" diverge.

Let's dig a little further anyway and compare these "almost command
equivalents" a bit more, which as a reminder are (1) "git grep
--recurse-submodules $rev $pat" vs. (2) "git checkout $rev && git grep
--recurse-submodules $pat".  (I just wanted to give them simple
numerical labels.)  In all cases, we'll be considering a $FILENAME
which does not match sparsity patterns but is present in the working
directory for reasons listed below:

First case: $FILENAME has uncommitted changes:
(Command 1) This will search only through files in $REV that match
sparsity patterns; $REV:$FILENAME will be excluded
(Command 2) If $FILENAME is different between HEAD and $REV this fails
and shows nothing, even if other paths in the sparsity patterns had
matches.  Otherwise, it searches through $FILENAME.

If the command is modified to add a -m to the `git checkout` command
to make the checkout not fail, then the second command will always
search through $REV:$FILENAME, as modified by the dirty changes being
merged in.  If the user cleans up the file with uncommitted changes
first, then the checkout would make the file go away and thus it
wouldn't be searched after the switch.

Second case: $FILENAME has unmerged changes:
(Command 1) This will search through files in $REV that match sparsity
patterns; $REV:$FILENAME will be excluded.
(Command 2) This will always fail and show nothing, regardless of any
other matches.

Here the checkout command always fails due to the unmerged entries and
won't even allow the user to switch with -m.

So, my comparison of commands #1 and #2 isn't quite right, even for
non-submodule cases.  But let's dig a little further.  If someone were
to try do change their sparsity patterns or even just run a "git
sparse-checkout reapply" when they had the above issues, they'd see
something like:

    $ git sparse-checkout reapply
    warning: The following paths are unmerged and were left despite
sparse patterns:

    After fixing the above paths, you may want to run `git
sparse-checkout reapply`.

This basically suggests that we consider uncommitted and unmerged
files to be "unclean" in some way (sparse-checkout wants to set the
SKIP_WORKTREE bit on all files that do not match the sparsity
specification, so "clean" means sparse-checkout is able to do so).  So
I could amend my earlier comparison and say that IF the user has a
clean directory, then "git grep --recurse-submodules $REVISION
$PATTERN" should be equivalent to "git checkout $REVISION && git grep
--recurse-submodules $PATTERN".  I could also say that given the big
warnings we give users when we can't set the SKIP_WORKTREE bit, that
we expect it to be a transient state and thus that we expect them to
more likely than not clear it out by the time they do switch branches.
That would lead us to the follow-up rule that if the user does not
have a clean directory then "git grep --recurse-submodules $REVISION
$PATTERN" should be equivalent to what you would get if the unclean
entries were ignored (expecting them to be cleaned before the any `git
checkout` could be run) and you then otherwise ran "git checkout
$REVISION && git grep --recurse-submodules $PATTERN".

That suggests that grep's implementation we agreed on earlier is still
correct (when given a $REVISION ignore submodulees that do not match
the sparsity patterns), but that unpack-trees/sparse-checkout still
need an update:

When we notice an initialized submodule that does not match the
sparsity patterns, we should print a warning just like we do for
unmerged and dirty entries.

So my patch needs a bit more work.

  reply	other threads:[~2020-06-18  0:25 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-17  1:21 Elijah Newren via GitGitGadget
2020-06-17 17:58 ` Matheus Tavares Bernardino
2020-06-18  0:24   ` Elijah Newren [this message]
2020-06-18 14:34     ` Matheus Tavares Bernardino
2020-06-18 19:19       ` Elijah Newren
2020-06-18 20:29         ` Matheus Tavares Bernardino
2020-06-18  1:51 How soon
2020-10-06  0:06 Luv MeZza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

  List information:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this inbox:

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).