git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Possible bug with git restore
@ 2020-08-12 18:51 Sergii Shkarnikov
  2020-08-14 22:41 ` Eric Sunshine
  0 siblings, 1 reply; 11+ messages in thread
From: Sergii Shkarnikov @ 2020-08-12 18:51 UTC (permalink / raw)
  To: git

Hi.

I see some strange behavior of the "git restore" command. Here is a
bug report template filled out:

*********************************************************************
Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)

    I tried to restore a couple of files from an earlier commit
running the restore command with a wildcard:

    git restore -s HEAD~ -- */filename.*

    In my work tree those are .cpp and .hpp files stored in different folders.

What did you expect to happen? (Expected behavior)

    Both files are restored to what they were in HEAD~.

What happened instead? (Actual behavior)

    Both files were deleted (and got (delete) status).

What's different between what you expected and what actually happened?

    Files became deleted for no apparent reason instead of restoring
to expected revision.

Anything else you want to add:

    Running this command without wildcards for each file separately
works as expected.

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.27.0.windows.1
cpu: x86_64
built from commit: 907ab1011dce9112700498e034b974ba60f8b407
sizeof-long: 4
sizeof-size_t: 8
uname: Windows 10.0 18363
compiler info: gnuc: 10.1
libc info: no libc information available


[Enabled Hooks]
post-commit
post-checkout
post-merge
pre-push
***************************************************************
Regards,
Sergii Shkarnikov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bug with git restore
  2020-08-12 18:51 Possible bug with git restore Sergii Shkarnikov
@ 2020-08-14 22:41 ` Eric Sunshine
  2020-08-20 12:59   ` Sergii Shkarnikov
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Sunshine @ 2020-08-14 22:41 UTC (permalink / raw)
  To: Sergii Shkarnikov; +Cc: Git List

On Wed, Aug 12, 2020 at 2:51 PM Sergii Shkarnikov
<sergii.shkarnikov@globallogic.com> wrote:
>     I tried to restore a couple of files from an earlier commit
> running the restore command with a wildcard:
>
>     git restore -s HEAD~ -- */filename.*
>
>     In my work tree those are .cpp and .hpp files stored in different folders.
>     Both files were deleted (and got (delete) status).
>     Running this command without wildcards for each file separately
> works as expected.

Thanks for the report. Can you provide a complete recipe in the form
of shell command to make this happen so others can reproduce the
behavior? Doing so will help track down the issue. Also, since this is
Windows, do the cases of the filenames in the referenced commit match
the cases actually on the filesystem (and have the cases changed
between commits)?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bug with git restore
  2020-08-14 22:41 ` Eric Sunshine
@ 2020-08-20 12:59   ` Sergii Shkarnikov
  2020-08-20 13:40     ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Sergii Shkarnikov @ 2020-08-20 12:59 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

Here is a script to reproduce the issue that works for me in Git Bash:

=============================================
#!/bin/bash

#create repo with corresponding structure
mkdir restore_bug_test
cd restore_bug_test
mkdir incl
mkdir src
touch incl/test_file.hpp
touch src/test_file.cpp
git init
git add .
git commit -m"initial"

#add a couple of commits
echo "1" >> incl/test_file.hpp
echo "1" >> src/test_file.cpp
git commit -am"1"
echo "2" >> incl/test_file.hpp
echo "2" >> src/test_file.cpp
git commit -am"2"

#reproduce bug
git restore -s HEAD~ -- *test_file.*
git status
===============================================

Haven't checked filenames explicitly, but they haven't been changed manually.
And there are no explicit case changes in the attached script.

On Sat, Aug 15, 2020 at 1:42 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Wed, Aug 12, 2020 at 2:51 PM Sergii Shkarnikov
> <sergii.shkarnikov@globallogic.com> wrote:
> >     I tried to restore a couple of files from an earlier commit
> > running the restore command with a wildcard:
> >
> >     git restore -s HEAD~ -- */filename.*
> >
> >     In my work tree those are .cpp and .hpp files stored in different folders.
> >     Both files were deleted (and got (delete) status).
> >     Running this command without wildcards for each file separately
> > works as expected.
>
> Thanks for the report. Can you provide a complete recipe in the form
> of shell command to make this happen so others can reproduce the
> behavior? Doing so will help track down the issue. Also, since this is
> Windows, do the cases of the filenames in the referenced commit match
> the cases actually on the filesystem (and have the cases changed
> between commits)?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bug with git restore
  2020-08-20 12:59   ` Sergii Shkarnikov
@ 2020-08-20 13:40     ` Jeff King
  2020-08-20 17:48       ` René Scharfe
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2020-08-20 13:40 UTC (permalink / raw)
  To: Sergii Shkarnikov; +Cc: Eric Sunshine, Git List

On Thu, Aug 20, 2020 at 03:59:00PM +0300, Sergii Shkarnikov wrote:

> Here is a script to reproduce the issue that works for me in Git Bash:
> 
> =============================================
> #!/bin/bash
> 
> #create repo with corresponding structure
> mkdir restore_bug_test
> cd restore_bug_test
> mkdir incl
> mkdir src
> touch incl/test_file.hpp
> touch src/test_file.cpp
> git init
> git add .
> git commit -m"initial"
> 
> #add a couple of commits
> echo "1" >> incl/test_file.hpp
> echo "1" >> src/test_file.cpp
> git commit -am"1"
> echo "2" >> incl/test_file.hpp
> echo "2" >> src/test_file.cpp
> git commit -am"2"
> 
> #reproduce bug
> git restore -s HEAD~ -- *test_file.*
> git status
> ===============================================

That reproduces for me here on Linux, as well (for those just joining,
the interesting thing is that the final "git status" reports the files
as deleted, rather than modified back to "1").

Interestingly, if I do:

  git restore -s HEAD~ --overlay -- *test_file.*

then I get:

  error: pathspec '*test_file.*' did not match any file(s) known to git

So there are two oddities here:

  - shouldn't that wildcard pathspec match those files? I've confirmed
    that the glob characters make it into Git's pathspec machinery, and
    since it doesn't have slashes, I think we'd match a basename (and
    certainly "git ls-files *test_file.*" does what I expect).

  - even if it doesn't match, it seems weird that overlay mode would
    remove files that don't match. I'd expect it to remove files in
    trees that _did_ match the pathspec, but leave anything outside of
    the pathspec untouched.

    It's almost like we matched the pathspec in the pass over the
    working tree files, but failed to do so when reading in the tree.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bug with git restore
  2020-08-20 13:40     ` Jeff King
@ 2020-08-20 17:48       ` René Scharfe
  2020-08-20 18:27         ` Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: René Scharfe @ 2020-08-20 17:48 UTC (permalink / raw)
  To: Jeff King, Sergii Shkarnikov; +Cc: Eric Sunshine, Git List

Am 20.08.20 um 15:40 schrieb Jeff King:
> On Thu, Aug 20, 2020 at 03:59:00PM +0300, Sergii Shkarnikov wrote:
>
>> Here is a script to reproduce the issue that works for me in Git Bash:

That's very helpful!

>   - shouldn't that wildcard pathspec match those files? I've confirmed
>     that the glob characters make it into Git's pathspec machinery, and
>     since it doesn't have slashes, I think we'd match a basename (and
>     certainly "git ls-files *test_file.*" does what I expect).

No, because restore doesn't interpret pathspecs recursively.  I don't
know why that causes files to disappear, though.  But here's a fix.

No sign-off because I don't understand why pathspec recursiveness is a
thing that can be turned off -- I'd expect pathspec syntax to be
consistent for all commands.  So there might be a good reason why it was
not enabled for restore (and switch and checkout).

The flag was introduced in bc96cc87dbb (tree_entry_interesting():
support depth limit, 2010-12-15), but I still don't fully get it.
Anyway, here's what I got -- feel free to incorporate it in a real
patch:

René


---
 builtin/checkout.c               |  4 ++++
 t/t2072-restore-pathspec-file.sh | 19 ++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 28371954912..8d2dc0cfa48 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -66,6 +66,7 @@ struct checkout_opts {
 	int can_switch_when_in_progress;
 	int orphan_from_empty_tree;
 	int empty_pathspec_ok;
+	int recursive_pathspec;
 	int checkout_index;
 	int checkout_worktree;
 	const char *ignore_unmerged_opt;
@@ -1707,6 +1708,8 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
 		die(_("--pathspec-file-nul requires --pathspec-from-file"));
 	}

+	opts->pathspec.recursive = opts->recursive_pathspec;
+
 	if (opts->pathspec.nr) {
 		if (1 < !!opts->writeout_stage + !!opts->force + !!opts->merge)
 			die(_("git checkout: --ours/--theirs, --force and --merge are incompatible when\n"
@@ -1852,6 +1855,7 @@ int cmd_restore(int argc, const char **argv, const char *prefix)
 	opts.checkout_index = -1;    /* default off */
 	opts.checkout_worktree = -2; /* default on */
 	opts.ignore_unmerged_opt = "--ignore-unmerged";
+	opts.recursive_pathspec = 1;

 	options = parse_options_dup(restore_options);
 	options = add_common_options(&opts, options);
diff --git a/t/t2072-restore-pathspec-file.sh b/t/t2072-restore-pathspec-file.sh
index 0d47946e8a9..da976665095 100755
--- a/t/t2072-restore-pathspec-file.sh
+++ b/t/t2072-restore-pathspec-file.sh
@@ -9,18 +9,21 @@ test_tick
 test_expect_success setup '
 	test_commit file0 &&

+	mkdir dir1 &&
+	echo 1 >dir1/file &&
 	echo 1 >fileA.t &&
 	echo 1 >fileB.t &&
 	echo 1 >fileC.t &&
 	echo 1 >fileD.t &&
-	git add fileA.t fileB.t fileC.t fileD.t &&
+	git add dir1 fileA.t fileB.t fileC.t fileD.t &&
 	git commit -m "files 1" &&

+	echo 2 >dir1/file &&
 	echo 2 >fileA.t &&
 	echo 2 >fileB.t &&
 	echo 2 >fileC.t &&
 	echo 2 >fileD.t &&
-	git add fileA.t fileB.t fileC.t fileD.t &&
+	git add dir1 fileA.t fileB.t fileC.t fileD.t &&
 	git commit -m "files 2" &&

 	git tag checkpoint
@@ -31,7 +34,7 @@ restore_checkpoint () {
 }

 verify_expect () {
-	git status --porcelain --untracked-files=no -- fileA.t fileB.t fileC.t fileD.t >actual &&
+	git status --porcelain --untracked-files=no -- dir1 fileA.t fileB.t fileC.t fileD.t >actual &&
 	test_cmp expect actual
 }

@@ -161,4 +164,14 @@ test_expect_success 'error conditions' '
 	test_i18ngrep -e "you must specify path(s) to restore" err
 '

+test_expect_success 'pathspec matches file in subdirectory' '
+	restore_checkpoint &&
+
+	echo "*file" | git restore --pathspec-from-file=- --source=HEAD^1 &&
+	cat >expect <<-\EOF &&
+	 M dir1/file
+	EOF
+	verify_expect
+'
+
 test_done
--
2.28.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bug with git restore
  2020-08-20 17:48       ` René Scharfe
@ 2020-08-20 18:27         ` Jeff King
  2020-08-22  8:57           ` [PATCH] checkout, restore: make pathspec recursive René Scharfe
                             ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jeff King @ 2020-08-20 18:27 UTC (permalink / raw)
  To: René Scharfe; +Cc: Sergii Shkarnikov, Eric Sunshine, Git List

On Thu, Aug 20, 2020 at 07:48:48PM +0200, René Scharfe wrote:

> >   - shouldn't that wildcard pathspec match those files? I've confirmed
> >     that the glob characters make it into Git's pathspec machinery, and
> >     since it doesn't have slashes, I think we'd match a basename (and
> >     certainly "git ls-files *test_file.*" does what I expect).
> 
> No, because restore doesn't interpret pathspecs recursively.  I don't
> know why that causes files to disappear, though.  But here's a fix.

I think it's because of this comment from bc96cc87dbb:

  When pathspec.recursive == 0, the behavior depends on match functions:
  non-recursive for tree_entry_interesting() and recursive for
  match_pathspec{,_depth}

So when we read the tree, we don't match recursively, and those entries
don't appear. But then we correlate that with the index:

          /*
           * Make sure all pathspecs participated in locating the paths
           * to be checked out.
           */
          for (pos = 0; pos < active_nr; pos++)
                  if (opts->overlay_mode)
                          mark_ce_for_checkout_overlay(active_cache[pos],
                                                       ps_matched,
                                                       opts);
                  else
                          mark_ce_for_checkout_no_overlay(active_cache[pos],
                                                          ps_matched,
                                                          opts);

And in no-overlay mode (the default for restore), we do:

  
  static void mark_ce_for_checkout_no_overlay(struct cache_entry *ce,
                                              char *ps_matched,
                                              const struct checkout_opts *opts)
  {
          ce->ce_flags &= ~CE_MATCHED;
          if (!opts->ignore_skipworktree && ce_skip_worktree(ce))
                  return;
          if (ce_path_match(&the_index, ce, &opts->pathspec, ps_matched)) {
                  ce->ce_flags |= CE_MATCHED;
                  if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
                          /*
                           * In overlay mode, but the path is not in
                           * tree-ish, which means we should remove it
                           * from the index and the working tree.
                           */
                          ce->ce_flags |= CE_REMOVE | CE_WT_REMOVE;
          }
  }

And that ce_path_match() _does_ treat the pathspec recursively. So we
say "yes, it matches in the index but wasn't in the tree, and therefore
we must delete it".

So the fundamental issue is treating the pathspec in two different ways,
and then correlating the results. We need to either do a recursive match
for the tree match (as your patch does), or do non-recursive for this
index match (which I don't think is trivial, because of the way the
recursive flag works).

> No sign-off because I don't understand why pathspec recursiveness is a
> thing that can be turned off -- I'd expect pathspec syntax to be
> consistent for all commands.  So there might be a good reason why it was
> not enabled for restore (and switch and checkout).

I think it was originally done this way for compatibility of some
commands as we unified the pathspec code. But I'm having trouble digging
up the exact details.

However, it seems particularly egregious in checkout/restore, because we
may also be using the index as a source, in which case the pathspecs
_would_ be recursive by default. E.g., in the test repo we've been
discussing:

  [make the index and working tree differ]
  $ git reset HEAD^
  Unstaged changes after reset:
  M	incl/test_file.hpp
  M	src/test_file.cpp

  [restore using a wildcard, but out of the index rather than a tree]
  $ git restore -- '*.hpp'

  [and check that we did indeed match]
  $ git status
  On branch master
  Changes not staged for commit:
	modified:   src/test_file.cpp

So I think this inconsistency in pathspec matching between trees and the
index has probably existed in git-checkout for ages (and I guess people
don't do wildcards with trees often enough for anybody to have noticed).
But it didn't cause the index-deletion problem, because that only
appeared more recently with the --no-overlay mode. That's the default
for restore, but you can trigger the problem with checkout, too:

  $ git reset --hard
  $ git checkout --no-overlay HEAD^ '*.hpp'
  Updated 0 paths from 2668463
  $ git status
  On branch master
  Changes to be committed:
	deleted:    incl/test_file.hpp

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] checkout, restore: make pathspec recursive
  2020-08-20 18:27         ` Jeff King
@ 2020-08-22  8:57           ` René Scharfe
  2020-08-24 20:21             ` Jeff King
  2020-08-22 10:29           ` Possible bug with git restore René Scharfe
  2020-08-22 19:36           ` Junio C Hamano
  2 siblings, 1 reply; 11+ messages in thread
From: René Scharfe @ 2020-08-22  8:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Sergii Shkarnikov, Eric Sunshine, Git List, Junio C Hamano

The pathspec given to git checkout and git restore is used with both
tree_entry_interesting (via read_tree_recursive) and match_pathspec
(via ce_path_match).  The latter effectively only supports recursive
matching regardless of the value of the pathspec flag "recursive",
which is unset here.

That causes different match results for pathspecs with wildcards, and
can lead checkout and restore in no-overlay mode to remove entries
instead of modifying them.  Enable recursive matching for both checkout
and restore to make matching consistent.

Setting the flag in checkout_main() technically also affects git switch,
but since that command doesn't accept pathspecs at all this has no
actual consequence.

Reported-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Initial-test-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/checkout.c               |  2 ++
 t/t2025-checkout-no-overlay.sh   | 12 ++++++++++++
 t/t2072-restore-pathspec-file.sh | 19 ++++++++++++++++---
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 28371954912..a65b4bf0b71 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -1707,6 +1707,8 @@ static int checkout_main(int argc, const char **argv, const char *prefix,
 		die(_("--pathspec-file-nul requires --pathspec-from-file"));
 	}

+	opts->pathspec.recursive = 1;
+
 	if (opts->pathspec.nr) {
 		if (1 < !!opts->writeout_stage + !!opts->force + !!opts->merge)
 			die(_("git checkout: --ours/--theirs, --force and --merge are incompatible when\n"
diff --git a/t/t2025-checkout-no-overlay.sh b/t/t2025-checkout-no-overlay.sh
index 76330cb5ab7..fa9e0987063 100755
--- a/t/t2025-checkout-no-overlay.sh
+++ b/t/t2025-checkout-no-overlay.sh
@@ -44,4 +44,16 @@ test_expect_success '--no-overlay --theirs with D/F conflict deletes file' '
 	test_path_is_missing file1
 '

+test_expect_success 'wildcard pathspec matches file in subdirectory' '
+	git reset --hard &&
+	mkdir subdir &&
+	test_commit file3-1 subdir/file3 &&
+	test_commit file3-2 subdir/file3 &&
+
+	git checkout --no-overlay file3-1 "*file3" &&
+	echo file3-1 >expect &&
+	test_path_is_file subdir/file3 &&
+	test_cmp expect subdir/file3
+'
+
 test_done
diff --git a/t/t2072-restore-pathspec-file.sh b/t/t2072-restore-pathspec-file.sh
index 0d47946e8a9..b48345bf95f 100755
--- a/t/t2072-restore-pathspec-file.sh
+++ b/t/t2072-restore-pathspec-file.sh
@@ -9,18 +9,21 @@ test_tick
 test_expect_success setup '
 	test_commit file0 &&

+	mkdir dir1 &&
+	echo 1 >dir1/file &&
 	echo 1 >fileA.t &&
 	echo 1 >fileB.t &&
 	echo 1 >fileC.t &&
 	echo 1 >fileD.t &&
-	git add fileA.t fileB.t fileC.t fileD.t &&
+	git add dir1 fileA.t fileB.t fileC.t fileD.t &&
 	git commit -m "files 1" &&

+	echo 2 >dir1/file &&
 	echo 2 >fileA.t &&
 	echo 2 >fileB.t &&
 	echo 2 >fileC.t &&
 	echo 2 >fileD.t &&
-	git add fileA.t fileB.t fileC.t fileD.t &&
+	git add dir1 fileA.t fileB.t fileC.t fileD.t &&
 	git commit -m "files 2" &&

 	git tag checkpoint
@@ -31,7 +34,7 @@ restore_checkpoint () {
 }

 verify_expect () {
-	git status --porcelain --untracked-files=no -- fileA.t fileB.t fileC.t fileD.t >actual &&
+	git status --porcelain --untracked-files=no -- dir1 fileA.t fileB.t fileC.t fileD.t >actual &&
 	test_cmp expect actual
 }

@@ -161,4 +164,14 @@ test_expect_success 'error conditions' '
 	test_i18ngrep -e "you must specify path(s) to restore" err
 '

+test_expect_success 'wildcard pathspec matches file in subdirectory' '
+	restore_checkpoint &&
+
+	echo "*file" | git restore --pathspec-from-file=- --source=HEAD^1 &&
+	cat >expect <<-\EOF &&
+	 M dir1/file
+	EOF
+	verify_expect
+'
+
 test_done
--
2.28.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bug with git restore
  2020-08-20 18:27         ` Jeff King
  2020-08-22  8:57           ` [PATCH] checkout, restore: make pathspec recursive René Scharfe
@ 2020-08-22 10:29           ` René Scharfe
  2020-08-24 20:25             ` Jeff King
  2020-08-22 19:36           ` Junio C Hamano
  2 siblings, 1 reply; 11+ messages in thread
From: René Scharfe @ 2020-08-22 10:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Sergii Shkarnikov, Eric Sunshine, Git List

Am 20.08.20 um 20:27 schrieb Jeff King:
> On Thu, Aug 20, 2020 at 07:48:48PM +0200, René Scharfe wrote:
>
>>>   - shouldn't that wildcard pathspec match those files? I've confirmed
>>>     that the glob characters make it into Git's pathspec machinery, and
>>>     since it doesn't have slashes, I think we'd match a basename (and
>>>     certainly "git ls-files *test_file.*" does what I expect).
>>
>> No, because restore doesn't interpret pathspecs recursively.  I don't
>> know why that causes files to disappear, though.  But here's a fix.
>
> I think it's because of this comment from bc96cc87dbb:
>
>   When pathspec.recursive == 0, the behavior depends on match functions:
>   non-recursive for tree_entry_interesting() and recursive for
>   match_pathspec{,_depth}

> So the fundamental issue is treating the pathspec in two different ways,
> and then correlating the results. We need to either do a recursive match
> for the tree match (as your patch does), or do non-recursive for this
> index match (which I don't think is trivial, because of the way the
> recursive flag works).

If using the same pathspec with both tree_entry_interesting and
match_pathspec gives inconsistent results and can even lead to data loss
as we've seen here, then we better prevent it.

The easiest way to do that would be to BUG out in match_pathspec if
recursive is unset, to indicate that it doesn't support non-recursive
matching.  Finding all the places that didn't bothered to set this flag
since it doesn't affect match_pathspec anyway would be quite tedious,
though.

At least the test suite still completes with the following evil patch
and the fix I sent earlier (evil because it ignores const), so we
currently don't have any other mismatches in covered code.

René

---
 dir.c       | 4 ++++
 pathspec.h  | 2 ++
 tree-walk.c | 5 +++++
 3 files changed, 11 insertions(+)

diff --git a/dir.c b/dir.c
index fe64be30ed6..87d5ffa62d0 100644
--- a/dir.c
+++ b/dir.c
@@ -432,6 +432,10 @@ static int do_match_pathspec(const struct index_state *istate,
 {
 	int i, retval = 0, exclude = flags & DO_MATCH_EXCLUDE;

+	((struct pathspec *)ps)->match_pathspec = 1;
+	if (ps->tree_entry_interesting && !ps->recursive)
+		BUG("match_pathspec tree_entry_interesting !recursive");
+
 	GUARD_PATHSPEC(ps,
 		       PATHSPEC_FROMTOP |
 		       PATHSPEC_MAXDEPTH |
diff --git a/pathspec.h b/pathspec.h
index 454ce364fac..bbae0abb249 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -32,6 +32,8 @@ struct pathspec {
 	unsigned int has_wildcard:1;
 	unsigned int recursive:1;
 	unsigned int recurse_submodules:1;
+	unsigned int match_pathspec:1;
+	unsigned int tree_entry_interesting:1;
 	unsigned magic;
 	int max_depth;
 	struct pathspec_item {
diff --git a/tree-walk.c b/tree-walk.c
index 0160294712b..f6465cd9cf4 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -1185,6 +1185,11 @@ enum interesting tree_entry_interesting(struct index_state *istate,
 					const struct pathspec *ps)
 {
 	enum interesting positive, negative;
+
+	((struct pathspec *)ps)->tree_entry_interesting = 1;
+	if (ps->match_pathspec && !ps->recursive)
+		BUG("match_pathspec tree_entry_interesting !recursive");
+
 	positive = do_match(istate, entry, base, base_offset, ps, 0);

 	/*
--
2.28.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bug with git restore
  2020-08-20 18:27         ` Jeff King
  2020-08-22  8:57           ` [PATCH] checkout, restore: make pathspec recursive René Scharfe
  2020-08-22 10:29           ` Possible bug with git restore René Scharfe
@ 2020-08-22 19:36           ` Junio C Hamano
  2 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2020-08-22 19:36 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Sergii Shkarnikov, Eric Sunshine, Git List

Jeff King <peff@peff.net> writes:

> So I think this inconsistency in pathspec matching between trees and the
> index has probably existed in git-checkout for ages (and I guess people
> don't do wildcards with trees often enough for anybody to have noticed).
> But it didn't cause the index-deletion problem, because that only
> appeared more recently with the --no-overlay mode. That's the default
> for restore, but you can trigger the problem with checkout, too:
>
>   $ git reset --hard
>   $ git checkout --no-overlay HEAD^ '*.hpp'
>   Updated 0 paths from 2668463
>   $ git status
>   On branch master
>   Changes to be committed:
> 	deleted:    incl/test_file.hpp

The --no-overlay mode is an enhancement added on top of reasonably
aged codebase relatively recently.  Most of the core code in
checkout dates back to early 2008, while --no-overlay was done as a
long-overdue-afterthought in early last year.

And it is not all that surprising that this issue took a long time
to be discovered.

Thank you all for finding, analysing and fixing it promptly.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] checkout, restore: make pathspec recursive
  2020-08-22  8:57           ` [PATCH] checkout, restore: make pathspec recursive René Scharfe
@ 2020-08-24 20:21             ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2020-08-24 20:21 UTC (permalink / raw)
  To: René Scharfe
  Cc: Sergii Shkarnikov, Eric Sunshine, Git List, Junio C Hamano

On Sat, Aug 22, 2020 at 10:57:59AM +0200, René Scharfe wrote:

> The pathspec given to git checkout and git restore is used with both
> tree_entry_interesting (via read_tree_recursive) and match_pathspec
> (via ce_path_match).  The latter effectively only supports recursive
> matching regardless of the value of the pathspec flag "recursive",
> which is unset here.
> 
> That causes different match results for pathspecs with wildcards, and
> can lead checkout and restore in no-overlay mode to remove entries
> instead of modifying them.  Enable recursive matching for both checkout
> and restore to make matching consistent.
> 
> Setting the flag in checkout_main() technically also affects git switch,
> but since that command doesn't accept pathspecs at all this has no
> actual consequence.

Thanks, I think this is the thing to do (and the code change looks
obviously correct ;) ).

This may cause a user-visible behavior change even in overlay mode
(because we'll now match some tree entries that we wouldn't before). But
I think it's an improvement, because it makes:

  git checkout <tree> -- <paths>

consistent with:

  git checkout -- <paths>

I don't know if you want to call that out more directly in the commit
message.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bug with git restore
  2020-08-22 10:29           ` Possible bug with git restore René Scharfe
@ 2020-08-24 20:25             ` Jeff King
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2020-08-24 20:25 UTC (permalink / raw)
  To: René Scharfe; +Cc: Sergii Shkarnikov, Eric Sunshine, Git List

On Sat, Aug 22, 2020 at 12:29:23PM +0200, René Scharfe wrote:

> > So the fundamental issue is treating the pathspec in two different ways,
> > and then correlating the results. We need to either do a recursive match
> > for the tree match (as your patch does), or do non-recursive for this
> > index match (which I don't think is trivial, because of the way the
> > recursive flag works).
> 
> If using the same pathspec with both tree_entry_interesting and
> match_pathspec gives inconsistent results and can even lead to data loss
> as we've seen here, then we better prevent it.

Yeah, it would be nice to do away with this inconsistency entirely. I
don't know if that even causes user-visible behavior changes at this
point, though.

> The easiest way to do that would be to BUG out in match_pathspec if
> recursive is unset, to indicate that it doesn't support non-recursive
> matching.  Finding all the places that didn't bothered to set this flag
> since it doesn't affect match_pathspec anyway would be quite tedious,
> though.

Yeah. I think it's easier to approach it from the tree-entry side, and
say: which spots are not recursive, and could/should they be. Which I
think is where your patch below is going.

> At least the test suite still completes with the following evil patch
> and the fix I sent earlier (evil because it ignores const), so we
> currently don't have any other mismatches in covered code.

That's kind-of good news, in that it lends support to the notion that
basically everything wants to be recursive. But I'd be worried there's a
lingering code path that is not covered in the test suite. Certainly
shipping your BUG() patch would flush it out, but it's not very friendly
to users who do run into it.

I wonder if we can drop it to a warning() and see if anybody reports it.

Or the more-work but more-responsible version: manually trace all of the
pathspec calls to see which code paths might rely on leaving "recursive"
unset.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-08-24 20:25 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-12 18:51 Possible bug with git restore Sergii Shkarnikov
2020-08-14 22:41 ` Eric Sunshine
2020-08-20 12:59   ` Sergii Shkarnikov
2020-08-20 13:40     ` Jeff King
2020-08-20 17:48       ` René Scharfe
2020-08-20 18:27         ` Jeff King
2020-08-22  8:57           ` [PATCH] checkout, restore: make pathspec recursive René Scharfe
2020-08-24 20:21             ` Jeff King
2020-08-22 10:29           ` Possible bug with git restore René Scharfe
2020-08-24 20:25             ` Jeff King
2020-08-22 19:36           ` Junio C Hamano

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git