git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: git <git@vger.kernel.org>
Subject: submodules and merging (Was: Re: [PATCH 02/30] merge-recursive: Fix logic ordering issue)
Date: Tue, 14 Nov 2017 09:17:21 -0800	[thread overview]
Message-ID: <CABPp-BHDrw_dAESic3xK7kC3jMgKeNQuPQF69OpbVYhRkbhJsw@mail.gmail.com> (raw)

On Mon, Nov 13, 2017 at 3:46 PM, Stefan Beller <sbeller@google.com> wrote:
> On Mon, Nov 13, 2017 at 3:39 PM, Elijah Newren <newren@gmail.com> wrote:
>> On Mon, Nov 13, 2017 at 2:12 PM, Stefan Beller <sbeller@google.com> wrote:
>>> I wanted to debug a very similar issue today just after reviewing this
>>> series, see
>>> https://public-inbox.org/git/743acc29-85bb-3773-b6a0-68d4a0b8fd63@ispras.ru/
>>
>> Oh, bleh.  That's not a D/F conflict at all, it's the code assuming
>> there's a D/F conflict because the entry it is processing ("sub") is a
>> submodule rather than a file, and it panics when it sees "a directory
>> in the way" -- a directory that just so happens to be named "sub" and
>> which is in fact the desired submodule, meaning that the working
>> directory is already good and needs no changes.
>
> yup, I came to find the same snippet of code to be the offender,
> I just haven't figured out how to fix this bug.
>
> Thanks for taking a look!
>
> As you have a lot of fresh knowledge in the merge-recursive case
> currently, how would we approach the fix here?

submodules and merging looks pretty broken, to me.  Backing off from
the current bug and just looking at merging with submodules in
general, makes me a little uneasy with what I see.  I mean, just look
at update_file_flags, when it wants the working directory updated, the
code for a submodule is the following:

    if (update_wd) {
<snip>

        if (S_ISGITLINK(mode)) {
            /*
             * We may later decide to recursively descend into
             * the submodule directory and update its index
             * and/or work tree, but we do not do that now.
             */
            update_wd = 0;
            goto update_index;
        }

So, it just doesn't put anything there, so the working directory is
made out-of-date immediately.  Users are happy with that?  Maybe it is
what makes sense, but it surprised me.

From there, we can start stacking on the weird.  For example let's say
we have this setup:
O (merge base): path/contents containing "whatever\n"
A (ours): path is a submodule
B (theirs): path/contents containing "whatever\nworks\n"

Merging A into B results in

CONFLICT (modify/delete): path/contents deleted in A and modified in
HEAD. Version HEAD of path/contents left in tree.
CONFLICT (directory/file): There is a directory with name path in
HEAD. Adding path as path~A
Automatic merge failed; fix conflicts and then commit the result.

but path~A is never created, because of the update_file_flags code
snippet above.  If the user looks at the path directory, it doesn't
have any submodule info either.  It has just "disappeared", despite
the claim made in the conflict notice.  Maybe okay, but slightly
confusing.

Now let's say the user just wants to back out of this merge.  What if
they run 'git merge --abort'?  Well, they're greeted with:

error: 'path/contents' appears as both a file and as a directory
error: path/contents: cannot drop to stage #0
error: Untracked working tree file 'path/contents' would be
overwritten by merge.
fatal: Could not reset index file to revision 'HEAD'.

"Would be overwritten by merge"?  We're unmerging, and I agree it
shouldn't be overwritten, but the fact that it thinks it needs to try
is worrisome.  And I don't like the fact that it just outright failed.
Okay, what about 'git reset --hard' at this point:

error: 'path/contents' appears as both a file and as a directory
error: path/contents: cannot drop to stage #0
warning: unable to rmdir path: Directory not empty
HEAD is now at 3e098a0 B

Cannot drop to stage #0?  Oh, boy.  Things must be really messed up.
Except they're not; it did drop path/contents to stage #0 despite the
error, and git status is clean again.

Now, let's switch tracks and look at things from the other side.  What
if the user had been on A and tried to merge B into A?

$ git checkout A
Switched to branch 'A'
$ git merge B
CONFLICT (modify/delete): path/contents deleted in HEAD and modified
in B. Version B of path/contents left in tree.
Adding path
Automatic merge failed; fix conflicts and then commit the result.

path/contents is left in the tree?  But there's a submodule in the
way!  Did it really put it there...?  Yep:

$ ls -al path/
total 16
drwxrwxr-x 3 newren newren 4096 Nov 14 08:40 .
drwxrwxr-x 4 newren newren 4096 Nov 14 08:38 ..
-rw-rw-r-- 1 newren newren   15 Nov 14 08:40 contents
-rw-rw-r-- 1 newren newren    0 Nov 14 08:39 foo
drwxrwxr-x 8 newren newren 4096 Nov 14 08:40 .git

At least git add and git rm on that location, from the supermodule, do
the right thing.

$ git add path/contents
fatal: Pathspec 'path/contents' is in submodule 'path'
$ git rm path/contents
path/contents: needs merge
rm 'path/contents'

But the fact that merge-recursive wrote unmerged entries back to the
working tree within the submodule feels pretty weird and dirty to me.
(Luckily, if the locations conflict with something in the submodule,
it'll write things out to an alternate path, such as path/contents~B,
so it merely muddies the submodule rather than destroying stuff within
it).


Also, now I'm worried that instead of just D/F (directory/file)
conflicts, we now have two new classes that need to be checked
everywhere throughout the code: submodule/file and submodule/directory
conflicts.  No idea how pervasive of a problem that is.  Maybe there's
a clever way to handle them all with the current D/F conflict code (it
certainly didn't take submodules into consideration when it was
written), but someone would have to take a careful look.  There are
already many codepaths due to many different conflict types, and folks
already have to keep several things in their head: renames, directory
renames, D/F conflicts, recursive cases, dirty files, untracked
files/directories, and now submodules.  It would be nice if we could
reduce the number of things folks have to focus on all at once while
reading merge-recursive.c.  I'm really starting to think we should at
least change how unpack_trees() and merge-recursive interoperate,
because at least the dirty files & untracked files/directories should
be able to be split off, so folks can focus on just the index for most
the code paths, and everything working tree related could be handled
separately.  But that's a topic for a different thread.

Anyway, circling back the problem you brought up, there is actually a
fairly easy fix if you want _just_ this issue cleaned up.  I'll post a
simple patch, but it doesn't address the above problems so it feels a
bit like a band-aid to me.

             reply	other threads:[~2017-11-14 17:17 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-14 17:17 Elijah Newren [this message]
2017-11-14 18:13 ` submodules and merging (Was: Re: [PATCH 02/30] merge-recursive: Fix logic ordering issue) Stefan Beller
2017-11-15 17:13   ` Jacob Keller
2017-11-25 22:37     ` Elijah Newren
2017-11-26  5:59       ` Jacob Keller
2017-12-04 19:33         ` Stefan Beller
2017-12-04 20:16           ` Jacob Keller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BHDrw_dAESic3xK7kC3jMgKeNQuPQF69OpbVYhRkbhJsw@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).