git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Stefan Beller <sbeller@google.com>
Subject: Re: [PATCH v9 29.75/30] merge-recursive: Fix was_tracked() to quit lying with some renamed paths
Date: Mon, 16 Apr 2018 14:21:08 -0700	[thread overview]
Message-ID: <CABPp-BFPTJsTUVoPxxN=2u5jEqn1ngdDvMNhp+VLZKTgZaUkvw@mail.gmail.com> (raw)
In-Reply-To: <xmqqo9ikyojz.fsf@gitster-ct.c.googlers.com>

On Sun, Apr 15, 2018 at 5:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Elijah Newren <newren@gmail.com> writes:
>
>> @@ -362,13 +363,17 @@ static int git_merge_trees(struct merge_options *o,
>>       init_tree_desc_from_tree(t+2, merge);
>>
>>       rc = unpack_trees(3, t, &o->unpack_opts);
>> +     cache_tree_free(&active_cache_tree);
>> +
>> +     o->orig_index = the_index;
>> +     the_index = tmp_index;
>> +
>>       /*
>> -      * unpack_trees NULLifies src_index, but it's used in verify_uptodate,
>> -      * so set to the new index which will usually have modification
>> -      * timestamp info copied over.
>> +      * src_index is used in verify_uptodate, but was NULLified in
>> +      * unpack_trees, so we need to set it back to the original index.
>>        */
>
> Was NULLified?  I thought that the point of src/dst distinction
> Linus introduced long time ago at 34110cd4 ("Make 'unpack_trees()'
> have a separate source and destination index", 2008-03-06) was that
> we can then keep the source side of the traversal unmodified.

That comment is messed up; maybe I edited and re-edited the comment
multiple times and then didn't notice the big problems when
re-reading?

Anyway, I should move the comment a few lines up, and make the code
instead read:

    /*
     * Update the_index to match the new results, AFTER saving a copy
     * in o->orig_index.  Update src_index to point to the saved copy.
     * (verify_uptodate() checks src_index, and the original index is
     * the one that had the necessary modification timestamps.)
     */
    o->orig_index = the_index;
    the_index = tmp_index;
    o->unpack_opts.src_index = &o->orig_index;

>>  static int would_lose_untracked(const char *path)
>>  {
>> -     return !was_tracked(path) && file_exists(path);
>> +     /*
>> +      * This may look like it can be simplified to:
>> +      *   return !was_tracked(o, path) && file_exists(path)
>> +      * but it can't.  This function needs to know whether path was
>> +      * in the working tree due to EITHER having been tracked in the
>> +      * index before the merge OR having been put into the working copy
>> +      * and index by unpack_trees().  Due to that either-or requirement,
>> +      * we check the current index instead of the original one.
>> +      */
>
> If this path was created by merge-recursive, not by unpack_trees(),
> what does this function want to say?  Say, we are looking at path P,
> the other branch we are merging moved some other path Q to P (while
> our side modified contents at path Q).  Then path P we are looking
> at has contents of Q at the merge base at stage #1, the contents of
> Q from our HEAD at stage #2 and the contents of P from the other
> branch at stage #3.  The code below says "path P is OK, we won't
> lose it" in such a case, but it is unclear if the above comment
> wants to also cover that case.

Oh, boy, here be dragons...

The comment as-is actually does cover your example case with Q and P:
unpack_trees(), which is unaware of renames, will see that P only
exists on one side of history and thus load it into the index at stage
0 rather than stage 3.

But your general comment about whether something else in
merge-recursive could create a path in the current index after
unpack_trees() is interesting...it touches on a pitfall that has bit
me multiple times.  There is a required ordering in merge-recursive.c
that for any given path, the working directory must be updated before
the index is -- otherwise, would_lose_untracked() will return faulty
information.  update_file_flags() has this ordering builtin,
update_stages() has a big obnoxious comment at the beginning about how
it should not be called until after update_file() is,
apply_directory_rename_modifications() has a big comment about this
~80% of the way through the function (look for
"would_lose_untracked"), and conflict_rename_rename_2to1() has a big
obnoxious comment near the end painstakingly pointing out that some
code that feels like it would make more sense being combined with a
previous function cannot be due to the
update-working-directory-before-index requirement.

I should probably add to this comment something about this annoying
(and error-prone) ordering restriction, since this function is the
source of those particular pains.  Your suggested ideal-world rewrite
(run unpack_trees() with unpack_opts.index_only=1, do merge in memory,
then update working tree), would make this whole problem go away.

  reply	other threads:[~2018-04-16 21:21 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-11  7:19 [Git] recursive merge on 'master' severely broken? Junio C Hamano
2018-04-11  9:06 ` Junio C Hamano
2018-04-11  9:57   ` Junio C Hamano
2018-04-11 15:51 ` Elijah Newren
2018-04-12  1:37   ` Junio C Hamano
2018-04-13 19:56 ` [RFC PATCH v9 0/30] Add directory rename detection to git Elijah Newren
2018-04-13 19:56   ` [PATCH v9 29.25/30] merge-recursive: improve output precision around skipping updates Elijah Newren
2018-04-13 19:56   ` [PATCH v9 29.50/30] t6046: testcases checking whether updates can be skipped in a merge Elijah Newren
2018-04-13 21:03     ` Stefan Beller
2018-04-13 19:56   ` [PATCH v9 29.75/30] merge-recursive: Fix was_tracked() to quit lying with some renamed paths Elijah Newren
2018-04-16  0:37     ` Junio C Hamano
2018-04-16 21:21       ` Elijah Newren [this message]
2018-04-13 19:56   ` [PATCH v9 30/30] merge-recursive: fix check for skipability of working tree updates Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BFPTJsTUVoPxxN=2u5jEqn1ngdDvMNhp+VLZKTgZaUkvw@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sbeller@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).