git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Tracking a merge bug
@ 2020-10-26 15:08 Ephrim Khong
  2020-10-26 15:50 ` Elijah Newren
  0 siblings, 1 reply; 3+ messages in thread
From: Ephrim Khong @ 2020-10-26 15:08 UTC (permalink / raw)
  To: GIT Mailing-list

Dear All,

I am trying to find the root cause for what I believe might be a strange
bug in git merge.  I have a feature branch A which branched off master
not too long ago, and want to bring it up to date with master:

    git checkout A
    git merge master

which yields

    Removing somefile
    Removing anotherfile
    error: add_cacheinfo failed to refresh for path 'c/d/e.sh'; merge
aborting.

the offending file, c/d/e.sh, does not exist in my feature branch but
was added to master since branching off. After aborting, the working
directory is in an inconsistent state and c/d/e.sh exists with the
correct content.

Below is a stacktrace - the merger handles the file as a rename
(apparently there is a similar / identical file 'c/f/g.sh' that is
renamed to 'c/d/e.sh'), but that fails because the file has MODE_CHANGED
set. (Which appears strange - at the time where the merge is aborting,
the file apparently was already written to the working directory. Is it
renaming two different files to the same target file?).

Any hint is appreciated, especially where to look: Is the root cause
more likely to be at the filesystem level (the stat returns something
off), or in the merge logic? What else could be wrong here?


The stacktrace looks roughly as follows:

-> read-cache.c, ie_modified(): ie_match_stat returned 63, which is
		MTIME_CHANGED	| CTIME_CHANGED | OWNER_CHANGED |
		MODE_CHANGED | INODE_CHANGED | DATA_CHANGED
	and is_modified() returns 63 because MODE_CHANGED is set.

-> read-cache.c, refresh_cache_ent(): at the call to ie_modified

-> read-cache.c, refresh_cache_entry()

-> merge-recursive.c, add_cacheinfo(), is in the refresh-path (i.e.
make_cache_entry() worked, but refresh_cache_entry() will fail)

-> merge-recursive.c, update_file_flags(), after the update_index: label

-> merge-recursive.c, update_file()

-> merge-recursive.c, handle_content_merge() is in the very last
update_file() call, close to the end of the function

-> merge-recursive.c, handle_rename_normal()

-> merge-recursive.c, process_entry()
	is in the RENAME_NORMAL / RENAME_ONE_FILE_TO_ONE block

-> [...]

Thanks
- Eph

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Tracking a merge bug
  2020-10-26 15:08 Tracking a merge bug Ephrim Khong
@ 2020-10-26 15:50 ` Elijah Newren
  2021-03-05 11:07   ` Ephrim Khong
  0 siblings, 1 reply; 3+ messages in thread
From: Elijah Newren @ 2020-10-26 15:50 UTC (permalink / raw)
  To: Ephrim Khong; +Cc: GIT Mailing-list

On Mon, Oct 26, 2020 at 8:11 AM Ephrim Khong <dr.khong@gmail.com> wrote:
>
> Dear All,
>
> I am trying to find the root cause for what I believe might be a strange
> bug in git merge.  I have a feature branch A which branched off master
> not too long ago, and want to bring it up to date with master:
>
>     git checkout A
>     git merge master
>
> which yields
>
>     Removing somefile
>     Removing anotherfile
>     error: add_cacheinfo failed to refresh for path 'c/d/e.sh'; merge
> aborting.

"add_cacheinfo failed to refresh"?  Wow, that's a new one.  Some years
back we had a "add_cacheinfo failed for path" corresponding to the
other error site within that function, but we fixed that one up long
ago.  I've never seen anything hit the refresh failure.

The fact that you're seeing this error message means you're using git
>= 2.18.0.  Could you verify the exact version?

Also, I suspect "somefile" and "anotherfile" are placeholders.  Any
chance we could see the real names?

> the offending file, c/d/e.sh, does not exist in my feature branch but
> was added to master since branching off. After aborting, the working
> directory is in an inconsistent state and c/d/e.sh exists with the
> correct content.

Is c/d a file or submodule or anything in the feature branch?  Do you
have any untracked files at the beginning of the merge named c/d or
c/d/e.sh?  I'm trying to guess at what might be weird about the paths
here.

> Below is a stacktrace - the merger handles the file as a rename
> (apparently there is a similar / identical file 'c/f/g.sh' that is
> renamed to 'c/d/e.sh'), but that fails because the file has MODE_CHANGED
> set. (Which appears strange - at the time where the merge is aborting,
> the file apparently was already written to the working directory. Is it
> renaming two different files to the same target file?).

What's the output of
  git ls-tree -r master | grep c/d/e.sh
  git ls-tree -r feature | grep c/d/f.sh
  git ls-tree -r $(git merge-base master feature) | grep c/d/f.sh
?

And, before the merge, what is the output of
  ls -l c/d/e.sh c/d/f.sh
?

> Any hint is appreciated, especially where to look: Is the root cause
> more likely to be at the filesystem level (the stat returns something
> off), or in the merge logic? What else could be wrong here?

Not sure; I've never hit this before.  I'm curious if you've got a
weird mode recorded for one of the files in your tree or something.
We might have to get someone who understands the index data structure
and read-cache.c to take a look.

> The stacktrace looks roughly as follows:
>
> -> read-cache.c, ie_modified(): ie_match_stat returned 63, which is
>                 MTIME_CHANGED   | CTIME_CHANGED | OWNER_CHANGED |
>                 MODE_CHANGED | INODE_CHANGED | DATA_CHANGED
>         and is_modified() returns 63 because MODE_CHANGED is set.
>
> -> read-cache.c, refresh_cache_ent(): at the call to ie_modified
>
> -> read-cache.c, refresh_cache_entry()
>
> -> merge-recursive.c, add_cacheinfo(), is in the refresh-path (i.e.
> make_cache_entry() worked, but refresh_cache_entry() will fail)
>
> -> merge-recursive.c, update_file_flags(), after the update_index: label
>
> -> merge-recursive.c, update_file()
>
> -> merge-recursive.c, handle_content_merge() is in the very last
> update_file() call, close to the end of the function
>
> -> merge-recursive.c, handle_rename_normal()
>
> -> merge-recursive.c, process_entry()
>         is in the RENAME_NORMAL / RENAME_ONE_FILE_TO_ONE block

RENAME_NORMAL / RENAME_ONE_FILE_TO_ONE means this is NOT a case of
renaming two files to one.  From just this information, it could be
that both sides renamed c/d/f.sh -> c/d/e.sh -- but you ruled that out
when you said c/d/e.sh didn't exist in your branch.  So, that
basically leaves us with a normal rename; i.e. c/d/f.sh existed in the
merge base and in your feature branch, but it was renamed to c/d/e.sh
in master.


Any chance this repository is available for others to access to try to
reproduce the problem?  If the repository has sensitive contents, are
the files c/d/f.sh and c/d/e.sh sensitive?  If the repository overall
is sensitive but those two files aren't, we might be able to get a
reproduction by cloning the repository and running
   git filter-repo --path c/d/e.sh --path c/d/f.sh
to filter the repository down to just these two files (after first
installing git-filter-repo; see
https://github.com/newren/git-filter-repo).

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Tracking a merge bug
  2020-10-26 15:50 ` Elijah Newren
@ 2021-03-05 11:07   ` Ephrim Khong
  0 siblings, 0 replies; 3+ messages in thread
From: Ephrim Khong @ 2021-03-05 11:07 UTC (permalink / raw)
  To: Elijah Newren; +Cc: GIT Mailing-list

On 26.10.2020 16:50, Elijah Newren wrote:
> On Mon, Oct 26, 2020 at 8:11 AM Ephrim Khong <dr.khong@gmail.com> wrote:
>>
>> I am trying to find the root cause for what I believe might be a strange
>> bug in git merge.  I have a feature branch A which branched off master
>> not too long ago, and want to bring it up to date with master:
>>
>>     git checkout A
>>     git merge master
>>
>> which yields
>>
>>     error: add_cacheinfo failed to refresh for path 'c/d/e.sh'; merge
>> aborting.
> 
> "add_cacheinfo failed to refresh"?  Wow, that's a new one.  Some years
> back we had a "add_cacheinfo failed for path" corresponding to the
> other error site within that function, but we fixed that one up long
> ago.  I've never seen anything hit the refresh failure.

Thank you for looking into this and sorry for the delay. I ran into this
again and did some more testing.

The merge works if I copy the complete repository to a different
filesystem (in this case a local SSD) with cp -a. It was originally on a
network share.

An strace on the affected file seems to show that git creates and writes
the file, attempting to set the executable bit. But the subsequent lstat
reports that no executable bit is set (and that is correct, when looking
at the file after the merge errors out):

	
> lstat("tools/ci/nightly/run_benchmarks.sh", 0x7ffc2f148c20) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "tools/ci/nightly/run_benchmarks.sh", O_WRONLY|O_CREAT|O_EXCL, 0777) = 4
> write(4, "#!/bin/bash\n#\n# Build and run hb"..., 1973) = 1973
> fstat(4, {st_mode=S_IFREG|0755, st_size=1973, ...}) = 0
> close(4)                                = 0
> 	
> lstat("tools/ci/nightly/run_benchmarks.sh", {st_mode=S_IFREG|0755, st_size=1973, ...}) = 0   #<-- first lstat, mode is OK
> unlink("tools/ci/nightly/run_benchmarks.sh") = 0
> openat(AT_FDCWD, "tools/ci/nightly/run_benchmarks.sh", O_WRONLY|O_CREAT|O_TRUNC, 0777) = 4
> write(4, "#!/bin/bash\n#\n# Build and run hb"..., 1973) = 1973
> close(4)                                = 0
> lstat("tools/ci/nightly/run_benchmarks.sh", {st_mode=S_IFREG|0640, st_size=1973, ...}) = 0   #<-- second lstat, mode is wrong
> 	
> write(2, "error: add_cacheinfo failed to r"..., 120error: add_cacheinfo failed to refresh for path 'tools/ci/nightly/run_benchmarks.sh' (2); merge aborting.

note how the last lstat() reports 0640, even though openat() wanted
0777. The same thing when merging on the SSD shows

> lstat("tools/ci/nightly/run_benchmarks.sh", 0x7ffd491745c0) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "tools/ci/nightly/run_benchmarks.sh", O_WRONLY|O_CREAT|O_EXCL, 0777) = 4
> write(4, "#!/bin/bash\n#\n# Build and run hb"..., 1973) = 1973
> fstat(4, {st_mode=S_IFREG|0755, st_size=1973, ...}) = 0
> close(4)                                = 0
> 		
> lstat("tools/ci/nightly/run_benchmarks.sh", {st_mode=S_IFREG|0755, st_size=1973, ...}) = 0
> unlink("tools/ci/nightly/run_benchmarks.sh") = 0
> openat(AT_FDCWD, "tools/ci/nightly/run_benchmarks.sh", O_WRONLY|O_CREAT|O_TRUNC, 0777) = 4
> write(4, "#!/bin/bash\n#\n# Build and run hb"..., 1973) = 1973
> close(4)                                = 0
> lstat("tools/ci/nightly/run_benchmarks.sh", {st_mode=S_IFREG|0755, st_size=1973, ...}) = 0
> 		
> openat(AT_FDCWD, "tools/ci/nightly/run_benchmarks.sh", O_RDONLY|O_CLOEXEC) = 4
> read(4, "#!/bin/bash\n#\n# Build and run hb"..., 1973) = 1973
> close(4)                                = 0

here, lstat() reports 0755 and the merge continues.

This could very well be an issue with the storage system backing my
network share, but maybe I overlooked something in that strace that git
is doing wrong or could do better. It does, for example, write the file
twice with identical content for some reason.

I have patched my git for now, and in read-cache.c, refresh_cache_ent()
call chmod() and re-run lstat() if the mode is incorrect. That fixes the
problem for now. It's not perfect, since chmod() ignores the umask, but
at least allows to merge.

> Any chance this repository is available for others to access to try to
> reproduce the problem?

Unfortunately it's not public. I have tried to cut out the offending
part, but was unable to reproduce it with a smaller test case yet.

Thanks
- Eph

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-03-05 11:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-26 15:08 Tracking a merge bug Ephrim Khong
2020-10-26 15:50 ` Elijah Newren
2021-03-05 11:07   ` Ephrim Khong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).