git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Torsten Krah <krah.tm@gmail.com>
Cc: Emily Shaffer <emilyshaffer@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Broken branch after git commit - tracked files in staging area can't be removed with restore --staged, or commit or stash
Date: Wed, 8 Jan 2020 05:40:08 -0500	[thread overview]
Message-ID: <20200108104008.GA2207365@coredump.intra.peff.net> (raw)
In-Reply-To: <2423f8c0b91578c0faf7527b7d97b0e1e9666261.camel@gmail.com>

On Wed, Jan 08, 2020 at 11:02:41AM +0100, Torsten Krah wrote:

> Hi Jeff, I have a poc you can try:

Thanks, I can see the problem now.

> cd /tmp
> mkdir testrepo
> cd testrepo
> touch TEST1 TEST2
> git add -A
> git commit -m First
> touch TEST3 TEST4
> git add -A
> git commit -m Second
> git reset --soft HEAD~1

OK, so we'd expect to still see TEST3 and TEST4 in the index. And we do:

> git status
> 
>    Auf Branch master
>    Zum Commit vorgemerkte Änderungen:
>      (benutzen Sie "git restore --staged <Datei>..." zum Entfernen aus
>    der Staging-Area)
>    	neue Datei:     TEST3
>    	neue Datei:     TEST4

And then if we try to unstage one of them, we should see that:

> git restore --staged TEST3
> 
>    [10:57:26][tkrah@torstenknbl:/tmp/testrepo]  (master) $ LC_ALL=C git
>    status
>    On branch master
>    Changes to be committed:
>      (use "git restore --staged <file>..." to unstage)
>    	new file:   TEST4
> 
>    Untracked files:
>      (use "git add <file>..." to include in what will be committed)
>    	TEST3

So far so good. But I think there is a lurking problem in the index,
because...

> git commit -m Second
> 
>    [master 5b62331] Second
>     2 files changed, 0 insertions(+), 0
>    deletions(-)
>     create mode 100644 TEST3
>     create mode 100644 TEST4

This is very wrong (and is a bug, not a problem with what you're doing).
We wrote a commit with TEST3 in it, even though it wasn't in the index.
Why would we do that? Almost certainly it's because of a cache-tree
extension in the index. Presumably "git restore --staged" is not
properly invalidating the cache-tree entry. And that would explain what
you see next:

> And now TEST3 is in the commit and what is even more "interesting" is
> the next one:
> 
>    [10:59:16][tkrah@torstenknbl:/tmp/testrepo]  (master) $ LC_ALL=C git
> status
>    On branch master
>    Changes to be committed:
>      (use "git restore --staged <file>..." to unstage)
>    	deleted:    TEST3
> 
>    Untracked files:
>      (use "git add <file>..." to include in what will be committed)
>    	TEST3
> 
> TEST3 is unstaged and deleted now.

That happens because we wrote out a commit with TEST3, but it's still
not in the index! So it appears as if a deletion was staged (remember
that what is "staged" is really just the diff between the index and
HEAD).

We can repeat the same process, substituting "git reset -- TEST3" for
git-restore, and it works fine. So the problem is in git-restore itself.

Here's another observation. If we do:

  git restore --staged TEST3 TEST4

then running "git commit" won't do anything, since there's nothing
staged (and this is why I had trouble reproducing the problem earlier).
If we add a new file, like:

  echo whatever >new
  git add new

then the problem won't exhibit, because that will also invalidate the
cache-tree (because it has to account for "new"). But we could put those
files in their own subdirectory and do:

  $ git restore --staged subdir/TEST3
  $ git status
  On branch master
  Changes to be committed:
  	new file:   subdir/TEST4
  
  Untracked files:
  	subdir/TEST3

and if I commit that, I see the bug again:

  $ git commit -m second
  [master 5c3b8c1] second
   2 files changed, 0 insertions(+), 0 deletions(-)
   create mode 100644 subdir/TEST3
   create mode 100644 subdir/TEST4

Now here's even more fun. I wanted to do another test, so I tried to
reset back to the original "second" commit, but it segfaults! Yikes.

  $ git reset --hard HEAD@{2}
  Segmentation fault

Running it under valgrind shows we're indeed seeing a bogus cache-tree:

  ==2214443== Invalid read of size 4
  ==2214443==    at 0x362DFF: traverse_by_cache_tree (unpack-trees.c:734)
  ==2214443==    by 0x3630A3: traverse_trees_recursive (unpack-trees.c:799)
  ==2214443==    by 0x364209: unpack_callback (unpack-trees.c:1258)
  ==2214443==    by 0x35F925: traverse_trees (tree-walk.c:497)
  ==2214443==    by 0x364DDD: unpack_trees (unpack-trees.c:1589)
  ==2214443==    by 0x1BE6CF: reset_index (reset.c:95)
  ==2214443==    by 0x1BF894: cmd_reset (reset.c:421)
  ==2214443==    by 0x124868: run_builtin (git.c:444)
  ==2214443==    by 0x124BAE: handle_builtin (git.c:674)
  ==2214443==    by 0x124E24: run_argv (git.c:741)
  ==2214443==    by 0x1252AB: cmd_main (git.c:872)
  ==2214443==    by 0x1E811A: main (common-main.c:52)
  ==2214443==  Address 0x40 is not stack'd, malloc'd or (recently) free'd
  ==2214443== 
  ==2214443== Process terminating with default action of signal 11 (SIGSEGV)

This isn't the same spot that Emily fixed earlier today in [1], but it
seems like a very similar problem. Weird.

So instead I blew away the index and then did hard reset:

  $ rm .git/index
  $ git reset --hard
  HEAD is now at 5c3b8c1 second

And now we can see what happens if both files are unstaged, and then we
add a new file, and then commit:

  $ git reset --soft HEAD^
  $ git restore --staged subdir/TEST3 subdir/TEST4
  $ echo whatever >new
  $ git add new
  $ git commit -m again
  [master 119d9e4] again
   1 file changed, 1 insertion(+)
   create mode 100644 new

So that _doesn't_ exhibit the problem. I guess because adding the new
file causes git-add to reexamine the whole cache-tree and clean it up.

So there seem to be at least two bugs:

 - git-restore doesn't properly invalidate the cache-tree

 - the index-reading code is not careful enough about bogus cache-trees,
   and may segfault

-Peff

[1] https://lore.kernel.org/git/20200108023127.219429-1-emilyshaffer@google.com/

  parent reply	other threads:[~2020-01-08 10:40 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-07 12:55 Broken branch after git commit - tracked files in staging area can't be removed with restore --staged, or commit or stash Torsten Krah
2020-01-07 13:43 ` Torsten Krah
2020-01-07 15:28   ` Torsten Krah
2020-01-08  9:11     ` Jeff King
2020-01-08 10:02       ` Torsten Krah
2020-01-08 10:31         ` Torsten Krah
2020-01-08 10:40         ` Jeff King [this message]
2020-01-08 11:43           ` [PATCH] restore: invalidate cache-tree when removing entries with --staged Jeff King
2020-01-08 15:41             ` Junio C Hamano
2020-02-05 20:24             ` Dennis Kaarsemaker
2020-01-08 12:42           ` Broken branch after git commit - tracked files in staging area can't be removed with restore --staged, or commit or stash Torsten Krah
2020-01-09  9:39             ` Jeff King
2020-01-09 10:49               ` Torsten Krah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200108104008.GA2207365@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=krah.tm@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).