git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v2 01/20] merge-ort: setup basic internal data structures
Date: Fri, 6 Nov 2020 14:45:02 -0800	[thread overview]
Message-ID: <CABPp-BHx4w1_JAen0fupCzosu0NfoTsT9d8dEve7n5VJZ0EvBQ@mail.gmail.com> (raw)
In-Reply-To: <20201106220521.757698-1-jonathantanmy@google.com>

On Fri, Nov 6, 2020 at 2:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> I'm not very familiar with the merge machinery, but I'll attempt a
> review of at least the first 10 patches.

Thanks for taking a look!

> > merged_info contains all relevant information for a non-conflicted
> > entry.  conflict_info contains a merged_info, plus any additional
> > information about a conflict such as the higher orders stages involved
> > and the names of the paths those came from (handy once renames get
> > involved).  If an entry remains conflicted, the merged_info portion of a
> > conflict_info will later be filled with whatever version of the file
> > should be placed in the working directory (e.g. an as-merged-as-possible
> > variation that contains conflict markers).
>
> I think that this information should be in the .c file.

Okay.

> > diff --git a/merge-ort.c b/merge-ort.c
> > index b487901d3e..9d5ea0930d 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -17,6 +17,46 @@
> >  #include "cache.h"
> >  #include "merge-ort.h"
> >
> > +#include "strmap.h"
> > +
> > +struct merge_options_internal {
> > +     struct strmap paths;    /* maps path -> (merged|conflict)_info */
> > +     struct strmap unmerged; /* maps path -> conflict_info */
> > +     const char *current_dir_name;
> > +     int call_depth;
> > +};
>
> Maybe document if the path is from the root of the directory or just the
> filename as it appears in a tree object?

Yeah, full relative path from toplevel.  I'll try to add some
documentation to that effect.

> I would have expected "unmerged" to be a "strset", but I guess it's more
> convenient for it to be a map itself. Maybe just document it as "all
> mappings in paths wherein the value is a struct conflict_info".

Makes sense.  And yeah, it's not a strset just because of the simple
optimization to avoid needing to do another lookup in paths afterwards
to get the actual conflict_info; the only time it is used is as a loop
over the still-unmerged entries to try to three-way merge them and
such.

> There seems to be 2 ways of referring to something that we couldn't
> merge - "conflicted" (or "having a conflict") and "unmerged". Should we
> stick to one of them?

Uhm...perhaps, but it feels like I'm going to miss some while looking
over it.  Also, there are some semantic differences at play.  Since
the whole algorithm is divided around multiple stages --
collect_merge_info(), detect_and_process_renames(), process_entries(),
as of a given early stage we might just know that we couldn't merge it
*yet*.  For example, both sides modified the entry, or one side
modified and the other side is missing ("did they delete it or rename
it?").  After rename detection and three-way content merge, something
that had not been automatically mergeable as of an earlier step might
end up being so.  But we need names for stuff in the interim state.
But it's also possible for us to know at an early state that thing are
definitely going to be a conflict regardless of what later stages do
(e.g. both sides rename a path, but rename it differently).

> Also, looking ahead, I see that current_dir_name is used as a temporary
> variable in the recursive calls to collect_merge_info_callback(). I
> would prefer if current_dir_name went in the cbdata to that function
> instead, but if that's not possible, maybe document here that
> current_dir_name is only used in collect_merge_info_callback(), and
> temporarily at that.

Yeah, not possible.  collect_merge_info_callback() has to be of
traverse_callback_t type (from tree-walk.h), which provides no extra
parameters for extra callback data.  I can add a documentation
comment.

> > +struct version_info {
> > +     struct object_id oid;
> > +     unsigned short mode;
> > +};
>
> OK.
>
> > +struct merged_info {
> > +     struct version_info result;
> > +     unsigned is_null:1;
> > +     unsigned clean:1;
> > +     size_t basename_offset;
> > +      /*
> > +       * Containing directory name.  Note that we assume directory_name is
> > +       * constructed such that
> > +       *    strcmp(dir1_name, dir2_name) == 0 iff dir1_name == dir2_name,
> > +       * i.e. string equality is equivalent to pointer equality.  For this
> > +       * to hold, we have to be careful setting directory_name.
> > +       */
> > +     const char *directory_name;
> > +};
>
> I'm not sure how most of the fields in this struct are to be used, but
> perhaps that will be clearer once I read the subsequent code.
>
> > +struct conflict_info {
> > +     struct merged_info merged;
> > +     struct version_info stages[3];
> > +     const char *pathnames[3];
>
> Why would these be different across stages? (Rename detection?)

Yes, as noted in the portion of the commit message that you said you
wanted in the .c file.

>
> > +     unsigned df_conflict:1;
>
> OK.
>
> > +     unsigned path_conflict:1;
>
> This doesn't seem to be assigned anywhere in this patch set?

Oh, right, I could drop it for now and add it back later when it is
used.  It's basically non-content conflict other than the specialized
D/F conflict.  So, things like rename/delete, moved by directory
rename, rename/rename(1to2), and rename/add/delete.  I could have
potentially lumped it in with df_conflict or made df_conflict a
subset, but df_conflict is different enough from the others that it
got a special flag.

But yeah, since it's all rename-related stuff and this patchset
doesn't have any rename handling yet, I probably should have left it
out until I added that stuff later.

> > +     unsigned filemask:3;
> > +     unsigned dirmask:3;
>
> I wonder if this needs to be documented that the least significant bit
> corresponds to stages[0], and so forth.

Maybe I should just put a comment to look at tree-walk.h?  The struct
traverse_info has a "fn" member with a big comment above it describing
mask & dirmask; filemask is just mask & ~dirmask.

> > +     unsigned match_mask:3;
>
> I think this can be derived by just looking at the stages array? Maybe
> document as:
>
>   Optimization to track which stages match. Either 0 or at least 2 bits
>   are set; if at least 2 bits are set, their corresponding stages match.

Yep, I only wanted to compute the match_mask once (I always got
annoyed in merge-recursive.c that we were re-comparing what had
already been compared and computed within unpack_trees()).  I like
your suggested comment; will add it.

  reply	other threads:[~2020-11-06 22:45 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-02 20:43 [PATCH v2 00/20] fundamentals of merge-ort implementation Elijah Newren
2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren
2020-11-06 22:05   ` Jonathan Tan
2020-11-06 22:45     ` Elijah Newren [this message]
2020-11-09 20:55       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren
2020-11-02 20:43 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren
2020-11-11 13:52   ` Derrick Stolee
2020-11-11 16:22     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren
2020-11-11 13:54   ` Derrick Stolee
2020-11-11 16:47     ` Elijah Newren
2020-11-11 16:51       ` Derrick Stolee
2020-11-11 17:03         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren
2020-11-11 13:58   ` Derrick Stolee
2020-11-11 17:07     ` Elijah Newren
2020-11-11 17:10       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren
2020-11-06 22:19   ` Jonathan Tan
2020-11-06 23:10     ` Elijah Newren
2020-11-09 20:59       ` Jonathan Tan
2020-11-11 14:38   ` Derrick Stolee
2020-11-11 17:02     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren
2020-11-11 14:51   ` Derrick Stolee
2020-11-11 17:13     ` Elijah Newren
2020-11-11 17:21       ` Eric Sunshine
2020-11-02 20:43 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren
2020-11-06 22:52   ` Jonathan Tan
2020-11-06 23:41     ` Elijah Newren
2020-11-09 22:04       ` Jonathan Tan
2020-11-09 23:05         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren
2020-11-06 22:58   ` Jonathan Tan
2020-11-07  0:26     ` Elijah Newren
2020-11-09 22:09       ` Jonathan Tan
2020-11-09 23:08         ` Elijah Newren
2020-11-11 15:26   ` Derrick Stolee
2020-11-11 18:16     ` Elijah Newren
2020-11-11 22:06       ` Elijah Newren
2020-11-12 18:23         ` Derrick Stolee
2020-11-12 18:39       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren
2020-11-11 15:31   ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren
2020-11-11 19:51   ` Jonathan Tan
2020-11-12  1:48     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren
2020-11-11 16:09   ` Derrick Stolee
2020-11-11 18:58     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren
2020-11-11 20:01   ` Jonathan Tan
2020-11-11 20:24     ` Elijah Newren
2020-11-12 20:39       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren
2020-11-11 20:47   ` Jonathan Tan
2020-11-11 21:21     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren
2020-11-12 20:15   ` Jonathan Tan
2020-11-12 22:30     ` Elijah Newren
2020-11-24 20:19       ` Elijah Newren
2020-11-25  2:07         ` Jonathan Tan
2020-11-26 18:13           ` Elijah Newren
2020-11-30 18:41             ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren
2020-11-02 20:43 ` [PATCH v2 19/20] merge-ort: add implementation of record_unmerged_index_entries() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren
2020-11-03 14:49 ` [PATCH v2 00/20] fundamentals of merge-ort implementation Derrick Stolee
2020-11-03 16:36   ` Elijah Newren
2020-11-07  6:06     ` Elijah Newren
2020-11-07 15:02       ` Derrick Stolee
2020-11-07 19:39         ` Elijah Newren
2020-11-09 12:30           ` Derrick Stolee
2020-11-09 17:13             ` Elijah Newren
2020-11-09 19:51               ` Derrick Stolee
2020-11-09 22:44                 ` Elijah Newren
2020-11-11 17:08 ` Derrick Stolee
2020-11-11 18:35   ` Elijah Newren
2020-11-11 20:48     ` Derrick Stolee
2020-11-11 21:18       ` Elijah Newren
  -- strict thread matches above, loose matches on Subject: below --
2020-11-29  7:43 [PATCH " Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BHx4w1_JAen0fupCzosu0NfoTsT9d8dEve7n5VJZ0EvBQ@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).