git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info
Date: Mon, 9 Nov 2020 15:05:48 -0800	[thread overview]
Message-ID: <CABPp-BFNk5ZQLyo5G-7vq3XD3s6z_rGUNM3QL14hXvMYTzK_Ug@mail.gmail.com> (raw)
In-Reply-To: <20201109220431.2534786-1-jonathantanmy@google.com>

On Mon, Nov 9, 2020 at 2:04 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > On Fri, Nov 6, 2020 at 2:52 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> > >
> > > > +     /*
> > > > +      * Note: We only label files with df_conflict, not directories.
> > > > +      * Since directories stay where they are, and files move out of the
> > > > +      * way to make room for a directory, we don't care if there was a
> > > > +      * directory/file conflict for a parent directory of the current path.
> > > > +      */
> > > > +     unsigned df_conflict = (filemask != 0) && (dirmask != 0);
> > >
> > > Suppose you have:
> > >
> > >  [ours]
> > >   foo/
> > >     bar/
> > >       baz
> > >     quux
> > >  [theirs]
> > >   foo
> > >
> > > By "we only label files with df_conflict, not directories", are you
> > > referring to not labelling "foo/" in [ours], or to "bar/", "baz", and
> > > "quux" (so, the files and directories within a directory)? At first I
> > > thought you were referring to the former, but perhaps you are referring
> > > to the latter.
> >
> > The former.  I was drawing a distinction between how this code
> > operates, and how unpack_trees() operates, which probably only matters
> > to those familiar with unpack_trees() or who have been reading through
> > it recently.
>
> Just for clarification: do you mean "the latter"? (The "not" in my
> question might be confusing.)

Yeah, probably was confusing, so let me just state where you are
almost right below.

> To be more illustrative in what I meant, at first I thought that you
> were NOT labelling "foo/" in [ours], hence:
>
>  [ours]
>   foo/  <- unlabeled
>  [theirs]
>   foo   <- labeled
>
> In this way, in a sense, you are indeed labelling only the file, not the
> directory.
>
> But instead what you seem to be doing is this:
>
>  [ours]
>   foo/     <- labeled
>     bar/   <- unlabeled
>       baz  <- unlabeled
>     quux   <- unlabeled
>  [theirs]
>   foo      <- labeled
>
> which is what I meant by NOT labelling "bar/", "baz", and "quux".

I'm doing something /really/ close to this, yes.  However, just to be
pedantic, there is no "foo/".  '/' is an illegal character in a
filename to record in a tree.  One side has a "foo" whose mode and
object_id happen to reflect a tree rather than a blob.  But I only
have one conflict_info per pathname, not 3 (can't have three since
strmaps don't allow duplicate keys, and wouldn't want it if I could).
That one conflict_info stores 3 (mode, object_id) pairs, and also has
a single df_conflict bit.  So, I label "foo" by setting that
df_conflict bit.  But I only pay attention to it for the pairs
representing a blob, not the ones representing a tree.  And I don't
propagate the information down to paths below the foo directory.

> > unpack_trees() will note when there is a directory/file
> > conflict, and propagates that information to all subtrees, with every
> > path specially checking for the o->df_conflict_entry and then handling
> > it specially (e.g. keeping higher order stages instead of using an
> > aggressive or trivial resolutions).
>
> And here it seems like you're describing that unpack_trees() would label
> it in this way:
>
>  [ours]
>   foo/     <- labeled
>     bar/   <- labeled
>       baz  <- labeled
>     quux   <- labeled
>  [theirs]
>   foo      <- labeled
>
> (and you're emphasizing by contrast that merge-ort is NOT doing this).

Correct.

> > However, leaving both a file and
> > a directory at the same path, while allowed in the index, makes for
> > ugliness and difficulty for users to resolve.   Plus it isn't allowed
> > in the working tree anyway.  We decided a while ago that it'd be
> > better to represent these conflicts differently[1], [2].
> >
> > Also, at the time you are unpacking or traversing trees, you only know
> > if one side had a directory where the other side had a file.  You
> > don't know if the final merge result will actually have a
> > directory/file conflict.  If the file existed in both the base version
> > and unmodified on one side, for example, then the file will be removed
> > as part of the merge.  It is similarly possible that the entire
> > directory of files all need to be deleted or are all renamed
> > elsewhere.  So, you have to keep track of a df_conflict bit, but you
> > can't act on it until you've processed several other things first.
> >
> > Since I already know I'm not going to move a whole directory of files
> > out of the way so that a file can be placed in the working tree
> > instead of that whole directory, the directory doesn't need to be
> > tweaked.  I'm not going to propagate any information about a
> > directory/file conflict at some path down to all subpaths of the
> > directory.  I only track it for the file that immediately conflicts,
> > and then only take action on it after resolving all the paths under
> > the corresponding directory to see if the directory/file conflict
> > remains.
> >
> > [1] https://lore.kernel.org/git/xmqqbmabcuhf.fsf@gitster-ct.c.googlers.com/
> > and the thread surrounding it
> > [2] https://lore.kernel.org/git/f27f12e8e50e56c010c29caa00296475d4de205b.1603731704.git.gitgitgadget@gmail.com/,
> > which is now commit ef52778708 ("merge tests: expect improved
> > directory/file conflict handling in ort", 2020-10-26)
>
> Makes sense.
>
> > > > @@ -161,6 +179,13 @@ static int collect_merge_info_callback(int n,
> > > >               newinfo.name = p->path;
> > > >               newinfo.namelen = p->pathlen;
> > > >               newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
> > > > +             /*
> > > > +              * If we did care about parent directories having a D/F
> > > > +              * conflict, then we'd include
> > > > +              *    newinfo.df_conflicts |= (mask & ~dirmask);
> > > > +              * here.  But we don't.  (See comment near setting of local
> > > > +              * df_conflict variable near the beginning of this function).
> > > > +              */
> > >
> > > I'm not sure how "mask" and "dirmask" contains information about parent
> > > directories. "mask" represents the available entries, and "dirmask"
> > > represents which of them are directories, as far as I know. So we can
> > > notice when something is missing, but I don't see how this distinguishes
> > > between the case that something is missing because it was in a parent
> > > directory that got deleted, vs something is missing because it itself
> > > got deleted.
> >
> > Yeah, this is more comparisons to unpack_trees.  This code is about to
> > set up a recursive call into subdirectories.  newinfo is set based on
> > the mask and dirmask of the current entry, and then subdirectories can
> > consult newinfo.df_conflicts to see if that path is within a directory
> > that was involved in a directory/file conflict.  For example:
> >
> > Tree in base version:
> >     foo/
> >         bar
> >     stuff.txt
> > Tree on side 1: (adds foo/baz)
> >     foo/
> >         bar
> >         baz
> >     stuff.txt
> > Tree on side 2: (deletes foo/, adds new file foo)
> >    foo
> >    stuff.txt
> >
> > When processing 'foo', we have mask=7, dirmask = 3.  So, here
> > unpack_trees() would have set newinfo.df_conflicts = (mask & ~dirmask)
> > = 4.  Then when we process foo/bar or foo/baz, we have mask=2,
> > dirmask=0, which looks like there are no directory/file conflicts.
> > However, we can note that these paths are under a directory involved
> > in a directory/file conflict via info.df_conflicts whose value is 4.
> > unpack_trees() cared about paths under a directory that was involved
> > in a directory/file conflict, and someone familiar with that code
> > might ask why I don't also track the same information.  The answer is
> > that I don't track it, even though I thought about it, because it's
> > useless overhead since I'm going to leave the directory alone and move
> > the file out of the way.
> >
> > Does that make sense?
>
> Ah...yes, that makes sense. I think I didn't notice the "newinfo", so I
> didn't realize that we were setting the info of our children based on
> ourselves. Perhaps I would have noticed it sooner if the comment had
> read "If this file/directory cared about its parent directory (the
> current directory) having a D/F conflict, then we'd propagate the masks
> in this way:" instead of "If we did care about parent directories having
> a D/F conflict", but perhaps the point is already obvious enough.

I'm happy to reword it if that makes it clearer.  Thanks for the suggestion.

  reply	other threads:[~2020-11-09 23:06 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-02 20:43 [PATCH v2 00/20] fundamentals of merge-ort implementation Elijah Newren
2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren
2020-11-06 22:05   ` Jonathan Tan
2020-11-06 22:45     ` Elijah Newren
2020-11-09 20:55       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren
2020-11-02 20:43 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren
2020-11-11 13:52   ` Derrick Stolee
2020-11-11 16:22     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren
2020-11-11 13:54   ` Derrick Stolee
2020-11-11 16:47     ` Elijah Newren
2020-11-11 16:51       ` Derrick Stolee
2020-11-11 17:03         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren
2020-11-11 13:58   ` Derrick Stolee
2020-11-11 17:07     ` Elijah Newren
2020-11-11 17:10       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren
2020-11-06 22:19   ` Jonathan Tan
2020-11-06 23:10     ` Elijah Newren
2020-11-09 20:59       ` Jonathan Tan
2020-11-11 14:38   ` Derrick Stolee
2020-11-11 17:02     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren
2020-11-11 14:51   ` Derrick Stolee
2020-11-11 17:13     ` Elijah Newren
2020-11-11 17:21       ` Eric Sunshine
2020-11-02 20:43 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren
2020-11-06 22:52   ` Jonathan Tan
2020-11-06 23:41     ` Elijah Newren
2020-11-09 22:04       ` Jonathan Tan
2020-11-09 23:05         ` Elijah Newren [this message]
2020-11-02 20:43 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren
2020-11-06 22:58   ` Jonathan Tan
2020-11-07  0:26     ` Elijah Newren
2020-11-09 22:09       ` Jonathan Tan
2020-11-09 23:08         ` Elijah Newren
2020-11-11 15:26   ` Derrick Stolee
2020-11-11 18:16     ` Elijah Newren
2020-11-11 22:06       ` Elijah Newren
2020-11-12 18:23         ` Derrick Stolee
2020-11-12 18:39       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren
2020-11-11 15:31   ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren
2020-11-11 19:51   ` Jonathan Tan
2020-11-12  1:48     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren
2020-11-11 16:09   ` Derrick Stolee
2020-11-11 18:58     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren
2020-11-11 20:01   ` Jonathan Tan
2020-11-11 20:24     ` Elijah Newren
2020-11-12 20:39       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren
2020-11-11 20:47   ` Jonathan Tan
2020-11-11 21:21     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren
2020-11-12 20:15   ` Jonathan Tan
2020-11-12 22:30     ` Elijah Newren
2020-11-24 20:19       ` Elijah Newren
2020-11-25  2:07         ` Jonathan Tan
2020-11-26 18:13           ` Elijah Newren
2020-11-30 18:41             ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren
2020-11-02 20:43 ` [PATCH v2 19/20] merge-ort: add implementation of record_unmerged_index_entries() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren
2020-11-03 14:49 ` [PATCH v2 00/20] fundamentals of merge-ort implementation Derrick Stolee
2020-11-03 16:36   ` Elijah Newren
2020-11-07  6:06     ` Elijah Newren
2020-11-07 15:02       ` Derrick Stolee
2020-11-07 19:39         ` Elijah Newren
2020-11-09 12:30           ` Derrick Stolee
2020-11-09 17:13             ` Elijah Newren
2020-11-09 19:51               ` Derrick Stolee
2020-11-09 22:44                 ` Elijah Newren
2020-11-11 17:08 ` Derrick Stolee
2020-11-11 18:35   ` Elijah Newren
2020-11-11 20:48     ` Derrick Stolee
2020-11-11 21:18       ` Elijah Newren
  -- strict thread matches above, loose matches on Subject: below --
2020-11-29  7:43 [PATCH " Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BFNk5ZQLyo5G-7vq3XD3s6z_rGUNM3QL14hXvMYTzK_Ug@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).