git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 00/20] fundamentals of merge-ort implementation
@ 2020-11-29  7:43 Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
                   ` (21 more replies)
  0 siblings, 22 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

This is actually v3 of this series; but v2 depended on two topics that
hadn't graduated yet so I couldn't easily use gitgitgadget and get its
testing. Now that the topics have graduated, I have rebased on master. You
can see v2 and comments on it over here: 
https://lore.kernel.org/git/20201102204344.342633-1-newren@gmail.com/

The goal of this series is to show the new design and structure behind
merge-ort, particularly the bits that are completely different to how
merge-recursive operates. There are still multiple important codepaths that
die with a "Not yet implemented" message, so the new merge algorithm is
still not very usable. However, it can handle very trivial rebases or
cherry-picks at the end of the series, and the number of test failures when
run under GIT_TEST_MERGE_ALGORITHM=ort drops from 2281 down to 1453.

At a high level, merge-ort avoids unpack_trees() and the index, instead
using traverse_trees() and its own data structure. After it is done
processing each path, it writes a tree. Only after it has created a new tree
will it touch the working copy or the index. It does so by using a simple
checkout-like step to switch from head to the newly created tree. If there
are conflicted entries, it touches up the index after the checkout-like step
to record those higher order stages.

In the series:

 * Patch 1 adds some basic data structures.
 * Patch 2 documents the high-level steps.
 * Patches 3-5 are some simple setup.
 * Patches 6-10 collect data from the traverse_trees() operation.
 * Patches 11-15 process the individual paths and create a tree.
 * Patches 16-19 handle checkout-and-then-write-higher-order-stages.
 * Patch 20 frees data from the merge_options_internal data structure.

Changes since v2 (Thanks to Stolee and Jonathan Tan for their excellent and
detailed reviews!):

 * Add thorough code comments to data structures, to try to answer multiple
   questions the reviewers brought up.
 * Add comments in various areas of the code that were a bit tricky.
 * As per Stolee's request/suggestion, restructured accesses to
   conflict_info* to make the code document that we
 * As per Jonathan's suggestion, restructured tree writing to have
   "END_TREE" markers for directories -- namely, the directory itself.
 * Removed path_conflict field (will add it back in a later series when it
   is actually used)
 * Improved commit messages as requested
 * Settled on "conflicted" instead of mix of "conflicted" and "unmerged"
 * Fixed various typos, missing words, etc.
 * Fixed various spaces around operators, missing blank lines, etc.
 * Some other small tweaks I probably forgot or overlooked while typing up
   this summary.

Things for reviewers to concentrate on:

 * Patch 1: I added lots of comments describing the data structures, based
   on reviewer questions/comments. Are they clear?
 * Patch 9: Stolee was worried about the allocation of merged_info OR
   conflict_info and whether we were safely accessing the fields. Since this
   is our primary and largest data structure and many times most entries
   only need a smaller merged_info, I really do want the space savings of
   not always allocating the larger type. I typically access as a
   merged_info* now, and added some accessor macros to document why each
   access as a conflict_info* is okay. Does this seem like a good solution
   to the concern? 
 * Patch 15: Jonathan felt the previous version of this patch was hard to
   follow. I've restructured so that we process the directories in
   opt->priv->paths directly; you can kind of view them as non-synthetic
   end-of-tree markers. They may not stand out as such, though (since they
   aren't synthetic with special names or handling), so I've added some
   pretty big comments to explain things. Does this address concerns?
 * Patches 16-20: these have not yet been reviewed, though these are easier
   patches to review than many of the first 15! A quick guide:
    * Patches 16, 18, and 20 are very straightforward; patches 17 and 19
      are the ones that would benefit more from review.
    * Patch 17 is basically the twoway_merge subset of
      merge_working_tree() from builtin/checkout.c. Find that bit of code
      and it's a direct comparison.
    * Patch 19 amounts to "how do I remove stage 0 entries in the index
      and replace them with 1-3 higher order stages?".

Elijah Newren (20):
  merge-ort: setup basic internal data structures
  merge-ort: add some high-level algorithm structure
  merge-ort: port merge_start() from merge-recursive
  merge-ort: use histogram diff
  merge-ort: add an err() function similar to one from merge-recursive
  merge-ort: implement a very basic collect_merge_info()
  merge-ort: avoid repeating fill_tree_descriptor() on the same tree
  merge-ort: compute a few more useful fields for collect_merge_info
  merge-ort: record stage and auxiliary info for every path
  merge-ort: avoid recursing into identical trees
  merge-ort: add a preliminary simple process_entries() implementation
  merge-ort: have process_entries operate in a defined order
  merge-ort: step 1 of tree writing -- record basenames, modes, and oids
  merge-ort: step 2 of tree writing -- function to create tree object
  merge-ort: step 3 of tree writing -- handling subdirectories as we go
  merge-ort: basic outline for merge_switch_to_result()
  merge-ort: add implementation of checkout()
  tree: enable cmp_cache_name_compare() to be used elsewhere
  merge-ort: add implementation of record_conflicted_index_entries()
  merge-ort: free data structures in merge_finalize()

 merge-ort.c | 1207 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 tree.c      |    2 +-
 tree.h      |    2 +
 3 files changed, 1207 insertions(+), 4 deletions(-)


base-commit: e67fbf927dfdf13d0b21dc6ea15dc3c7ef448ea0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-923%2Fnewren%2Fort-basics-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-923/newren/ort-basics-v1
Pull-Request: https://github.com/git/git/pull/923
-- 
gitgitgadget


* [PATCH 01/20] merge-ort: setup basic internal data structures
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Set up some basic internal data structures.  The only carry-over from
merge-recursive.c is call_depth, though needed_rename_limit will be
added later.

The central piece of data will definitely be the strmap "paths", which
will map every relevant pathname under consideration to either a
merged_info or a conflict_info.  ("conflicted" is a strmap that is a
subset of "paths".)

merged_info contains all relevant information for a non-conflicted
entry.  conflict_info contains a merged_info, plus any additional
information about a conflict such as the higher order stages involved
and the names of the paths those came from (handy once renames get
involved).  If an entry remains conflicted, the merged_info portion of a
conflict_info will later be filled with whatever version of the file
should be placed in the working directory (e.g. an as-merged-as-possible
variation that contains conflict markers).
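
As a rough illustration of the "merged_info OR conflict_info" layout
described above, the following is a minimal sketch, not the code from this
patch. The structs are cut down to placeholder fields, and the helpers
new_clean_entry(), new_conflicted_entry() and as_conflict_info() are
hypothetical names invented for the example. The point is only that a
conflict_info can be handed around as a merged_info pointer because merged
is its first member, and that clean must be checked before looking at the
larger struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the structs in this patch (most fields omitted). */
struct merged_info {
	unsigned clean:1;
	int result_mode;           /* placeholder for result.mode */
};

struct conflict_info {
	struct merged_info merged; /* must stay the first member */
	int stage_modes[3];        /* placeholder for stages[i].mode */
};

/* Hypothetical helper: allocate only the smaller struct for a clean path. */
static struct merged_info *new_clean_entry(int mode)
{
	struct merged_info *mi = calloc(1, sizeof(*mi));

	if (!mi)
		abort();
	mi->clean = 1;
	mi->result_mode = mode;
	return mi;
}

/* Hypothetical helper: allocate the larger struct, hand out its prefix. */
static struct merged_info *new_conflicted_entry(const int modes[3])
{
	struct conflict_info *ci = calloc(1, sizeof(*ci));
	int i;

	if (!ci)
		abort();
	for (i = 0; i < 3; i++)
		ci->stage_modes[i] = modes[i];
	/* merged.clean is already 0 thanks to calloc() */
	return &ci->merged;
}

/*
 * Hypothetical accessor: return the conflict_info view only when that is
 * safe, i.e. when clean == 0; otherwise return NULL.
 */
static struct conflict_info *as_conflict_info(struct merged_info *mi)
{
	if (mi->clean)
		return NULL;
	/* Valid because merged is the first member of conflict_info. */
	return (struct conflict_info *)mi;
}
```

Once an entry's clean bit is flipped to 1, as_conflict_info() refuses to
expose the extra fields, which mirrors the rule that later steps may only
look at the merged_info portion of a resolved entry.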

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index b487901d3e..bb37fdf838 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,143 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "strmap.h"
+
+struct merge_options_internal {
+	/*
+	 * paths: primary data structure in all of merge ort.
+	 *
+	 * The keys of paths:
+	 *   * are full relative paths from the toplevel of the repository
+	 *     (e.g. "drivers/firmware/raspberrypi.c").
+	 *   * store all relevant paths in the repo, both directories and
+	 *     files (e.g. drivers, drivers/firmware would also be included)
+	 *   * these keys serve to intern all the path strings, which allows
+	 *     us to do pointer comparison on directory names instead of
+	 *     strcmp; we just have to be careful to use the interned strings.
+	 *
+	 * The values of paths:
+	 *   * either a pointer to a merged_info, or a conflict_info struct
+	 *   * merged_info contains all relevant information for a
+	 *     non-conflicted entry.
+	 *   * conflict_info contains a merged_info, plus any additional
+	 *     information about a conflict such as the higher order stages
+	 *     involved and the names of the paths those came from (handy
+	 *     once renames get involved).
+	 *   * a path may start "conflicted" (i.e. point to a conflict_info)
+	 *     and then a later step (e.g. three-way content merge) determines
+	 *     it can be cleanly merged, at which point it'll be marked clean
+	 *     and the algorithm will ignore any data outside the contained
+	 *     merged_info for that entry
+	 *   * If an entry remains conflicted, the merged_info portion of a
+	 *     conflict_info will later be filled with whatever version of
+	 *     the file should be placed in the working directory (e.g. an
+	 *     as-merged-as-possible variation that contains conflict markers).
+	 */
+	struct strmap paths;
+
+	/*
+	 * conflicted: a subset of keys->values from "paths"
+	 *
+	 * conflicted is basically an optimization between process_entries()
+	 * and record_conflicted_index_entries(); the latter could loop over
+	 * ALL the entries in paths AGAIN and look for the ones that are
+	 * still conflicted, but since process_entries() has to loop over
+	 * all of them, it saves the ones it couldn't resolve in this strmap
+	 * so that record_conflicted_index_entries() can iterate just the
+	 * relevant entries.
+	 */
+	struct strmap conflicted;
+
+	/*
+	 * current_dir_name: temporary var used in collect_merge_info_callback()
+	 *
+	 * Used to set merged_info.directory_name; see documentation for that
+	 * variable and the requirements placed on that field.
+	 */
+	const char *current_dir_name;
+
+	/* call_depth: recursion level counter for merging merge bases */
+	int call_depth;
+};
+
+struct version_info {
+	struct object_id oid;
+	unsigned short mode;
+};
+
+struct merged_info {
+	/* if is_null, ignore result.  otherwise result has oid & mode */
+	struct version_info result;
+	unsigned is_null:1;
+
+	/*
+	 * clean: whether the path in question is cleanly merged.
+	 *
+	 * see conflict_info.merged for more details.
+	 */
+	unsigned clean:1;
+
+	/*
+	 * basename_offset: offset of basename of path.
+	 *
+	 * perf optimization to avoid recomputing offset of final '/'
+	 * character in pathname (0 if no '/' in pathname).
+	 */
+	size_t basename_offset;
+
+	 /*
+	  * directory_name: containing directory name.
+	  *
+	  * Note that we assume directory_name is constructed such that
+	  *    strcmp(dir1_name, dir2_name) == 0 iff dir1_name == dir2_name,
+	  * i.e. string equality is equivalent to pointer equality.  For this
+	  * to hold, we have to be careful setting directory_name.
+	  */
+	const char *directory_name;
+};
+
+struct conflict_info {
+	/*
+	 * merged: the version of the path that will be written to working tree
+	 *
+	 * WARNING: It is critical to check merged.clean and ensure it is 0
+	 * before reading any conflict_info fields outside of merged.
+	 * Allocated merged_info structs will always have clean set to 1.
+	 * Allocated conflict_info structs will have merged.clean set to 0
+	 * initially.  The merged.clean field is how we know if it is safe
+	 * to access other parts of conflict_info besides merged; if a
+	 * conflict_info's merged.clean is changed to 1, the rest of the
+	 * algorithm is not allowed to look at anything outside of the
+	 * merged member anymore.
+	 */
+	struct merged_info merged;
+
+	/* oids & modes from each of the three trees for this path */
+	struct version_info stages[3];
+
+	/* pathnames for each stage; may differ due to rename detection */
+	const char *pathnames[3];
+
+	/* Whether this path is/was involved in a directory/file conflict */
+	unsigned df_conflict:1;
+
+	/*
+	 * For filemask and dirmask, see tree-walk.h's struct traverse_info,
+	 * particularly the documentation above the "fn" member.  Note that
+	 * filemask = mask & ~dirmask from that documentation.
+	 */
+	unsigned filemask:3;
+	unsigned dirmask:3;
+
+	/*
+	 * Optimization to track which stages match, to avoid the need to
+	 * recompute it in multiple steps. Either 0 or at least 2 bits are
+	 * set; if at least 2 bits are set, their corresponding stages match.
+	 */
+	unsigned match_mask:3;
+};
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
-- 
gitgitgadget



* [PATCH 02/20] merge-ort: add some high-level algorithm structure
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge_ort_nonrecursive_internal() will be used by both
merge_incore_nonrecursive() and merge_incore_recursive(); let's
focus on it for now.  It involves some setup -- merge_start() --
followed by the following chain of functions:

  collect_merge_info()
    This function will populate merge_options_internal's paths field,
    via a call to traverse_trees() and a new callback that will be added
    later.

  detect_and_process_renames()
    This function will detect renames, and then adjust entries in paths
    to move conflict stages from old pathnames into those for new
    pathnames, so that the next step doesn't have to think about renames
    and can just do three-way content merging and such.

  process_entries()
    This function determines how to take the various stages (versions of
    a file from the three different sides) and merge them, and whether
    to mark the result as conflicted or cleanly merged.  It also writes
    out these merged file versions as it goes to create a tree.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index bb37fdf838..97ef2276bd 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -18,6 +18,7 @@
 #include "merge-ort.h"
 
 #include "strmap.h"
+#include "tree.h"
 
 struct merge_options_internal {
 	/*
@@ -154,6 +155,37 @@ struct conflict_info {
 	unsigned match_mask:3;
 };
 
+static int collect_merge_info(struct merge_options *opt,
+			      struct tree *merge_base,
+			      struct tree *side1,
+			      struct tree *side2)
+{
+	die("Not yet implemented.");
+}
+
+static int detect_and_process_renames(struct merge_options *opt,
+				      struct tree *merge_base,
+				      struct tree *side1,
+				      struct tree *side2)
+{
+	int clean = 1;
+
+	/*
+	 * Rename detection works by detecting file similarity.  Here we use
+	 * a really easy-to-implement scheme: files are similar IFF they have
+	 * the same filename.  Therefore, by this scheme, there are no renames.
+	 *
+	 * TODO: Actually implement a real rename detection scheme.
+	 */
+	return clean;
+}
+
+static void process_entries(struct merge_options *opt,
+			    struct object_id *result_oid)
+{
+	die("Not yet implemented.");
+}
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
@@ -170,13 +202,46 @@ void merge_finalize(struct merge_options *opt,
 	die("Not yet implemented");
 }
 
+static void merge_start(struct merge_options *opt, struct merge_result *result)
+{
+	die("Not yet implemented.");
+}
+
+/*
+ * Originally from merge_trees_internal(); heavily adapted, though.
+ */
+static void merge_ort_nonrecursive_internal(struct merge_options *opt,
+					    struct tree *merge_base,
+					    struct tree *side1,
+					    struct tree *side2,
+					    struct merge_result *result)
+{
+	struct object_id working_tree_oid;
+
+	collect_merge_info(opt, merge_base, side1, side2);
+	result->clean = detect_and_process_renames(opt, merge_base,
+						   side1, side2);
+	process_entries(opt, &working_tree_oid);
+
+	/* Set return values */
+	result->tree = parse_tree_indirect(&working_tree_oid);
+	/* existence of conflicted entries implies unclean */
+	result->clean &= strmap_empty(&opt->priv->conflicted);
+	if (!opt->priv->call_depth) {
+		result->priv = opt->priv;
+		opt->priv = NULL;
+	}
+}
+
 void merge_incore_nonrecursive(struct merge_options *opt,
 			       struct tree *merge_base,
 			       struct tree *side1,
 			       struct tree *side2,
 			       struct merge_result *result)
 {
-	die("Not yet implemented");
+	assert(opt->ancestor != NULL);
+	merge_start(opt, result);
+	merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);
 }
 
 void merge_incore_recursive(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH 03/20] merge-ort: port merge_start() from merge-recursive
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge_start() basically does a bunch of sanity checks, then allocates
and initializes opt->priv -- a struct merge_options_internal.

Most of the sanity checks are usable as-is.  The
allocation/initialization is a bit different since merge-ort has a very
different merge_options_internal than merge-recursive, but the idea is
the same.
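
The initialization in the diff below passes strdup_strings=0 for both
"paths" and "conflicted", with "paths" taking ownership of the key strings
and "conflicted" only borrowing them. A toy sketch of that ownership split
follows. It assumes nothing about the real strmap implementation: toy_map,
toy_put(), toy_clear() and toy_strdup() are all made-up names for
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a map that stores key pointers without copying them. */
struct toy_map {
	const char *keys[8];
	size_t nr;
	int owns_keys; /* 1: free keys on clear; 0: keys are borrowed */
};

/* Portable strdup() replacement so the sketch only needs ISO C. */
static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, s, len);
	return copy;
}

static void toy_put(struct toy_map *m, const char *key)
{
	assert(m->nr < 8);
	m->keys[m->nr++] = key; /* pointer stored as-is, no copy */
}

/* Empties the map and returns the number of key strings it freed. */
static size_t toy_clear(struct toy_map *m)
{
	size_t i, freed = 0;

	for (i = 0; i < m->nr; i++) {
		if (m->owns_keys) {
			free((char *)m->keys[i]);
			freed++;
		}
	}
	m->nr = 0;
	return freed;
}
```

With one owning map and one borrowing map that share key pointers, clearing
the borrower frees nothing and clearing the owner frees each string exactly
once, which is the duplicate-free hazard the code comment warns about.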

The weirdest part here is that merge-ort and merge-recursive use the
same struct merge_options, even though merge_options has a number of
fields that are oddly specific to merge-recursive's internal
implementation and don't even make sense with merge-ort's high-level
design (e.g. buffer_output, which merge-ort has to always do).  I reused
the same data structure because:
  * most of the fields made sense to both merge algorithms
  * making a new struct would have required making new enums or somehow
    externalizing them, and that was getting messy.
  * it simplifies converting the existing callers by not having to
    have different code paths for merge_options setup.

I also marked detect_renames as ignored.  We can revisit that later, but
in short: merge-recursive allowed turning off rename detection because
it was sometimes glacially slow.  When you speed something up by a few
orders of magnitude, it's worth revisiting whether that justification is
still relevant.  Besides, if folks find it's still too slow, perhaps
they have a better scaling case than I could find and maybe it turns up
some more optimizations we can add.  If it still is needed as an option,
it is easy to add later.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 97ef2276bd..3581a7d278 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,8 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "diff.h"
+#include "diffcore.h"
 #include "strmap.h"
 #include "tree.h"
 
@@ -204,7 +206,48 @@ void merge_finalize(struct merge_options *opt,
 
 static void merge_start(struct merge_options *opt, struct merge_result *result)
 {
-	die("Not yet implemented.");
+	/* Sanity checks on opt */
+	assert(opt->repo);
+
+	assert(opt->branch1 && opt->branch2);
+
+	assert(opt->detect_directory_renames >= MERGE_DIRECTORY_RENAMES_NONE &&
+	       opt->detect_directory_renames <= MERGE_DIRECTORY_RENAMES_TRUE);
+	assert(opt->rename_limit >= -1);
+	assert(opt->rename_score >= 0 && opt->rename_score <= MAX_SCORE);
+	assert(opt->show_rename_progress >= 0 && opt->show_rename_progress <= 1);
+
+	assert(opt->xdl_opts >= 0);
+	assert(opt->recursive_variant >= MERGE_VARIANT_NORMAL &&
+	       opt->recursive_variant <= MERGE_VARIANT_THEIRS);
+
+	/*
+	 * detect_renames, verbosity, buffer_output, and obuf are ignored
+	 * fields that were used by "recursive" rather than "ort" -- but
+	 * sanity check them anyway.
+	 */
+	assert(opt->detect_renames >= -1 &&
+	       opt->detect_renames <= DIFF_DETECT_COPY);
+	assert(opt->verbosity >= 0 && opt->verbosity <= 5);
+	assert(opt->buffer_output <= 2);
+	assert(opt->obuf.len == 0);
+
+	assert(opt->priv == NULL);
+
+	/* Initialization of opt->priv, our internal merge data */
+	opt->priv = xcalloc(1, sizeof(*opt->priv));
+
+	/*
+	 * Although we initialize opt->priv->paths with strdup_strings=0,
+	 * that's just to avoid making yet another copy of an allocated
+	 * string.  Putting the entry into paths means we are taking
+	 * ownership, so we will later free it.
+	 *
+	 * In contrast, conflicted just has a subset of keys from paths, so
+	 * we don't want to free those (it'd be a duplicate free).
+	 */
+	strmap_init_with_options(&opt->priv->paths, NULL, 0);
+	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
 }
 
 /*
-- 
gitgitgadget



* [PATCH 04/20] merge-ort: use histogram diff
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

In my cursory investigation, histogram diffs are about 2% slower than
Myers diffs.  Others have probably done more detailed benchmarks.  But,
in short, histogram diffs have been around for years and in a number of
cases provide obviously better looking diffs where Myers diffs are
unintelligible but the performance hit has kept them from becoming the
default.

However, there are real merge bugs we know about that have triggered on
git.git and linux.git, which I don't have a clue how to address without
the additional information that I believe is provided by histogram
diffs.  See the following:

https://lore.kernel.org/git/20190816184051.GB13894@sigill.intra.peff.net/
https://lore.kernel.org/git/CABPp-BHvJHpSJT7sdFwfNcPn_sOXwJi3=o14qjZS3M8Rzcxe2A@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BGtez4qjbtFT1hQoREfcJPmk9MzjhY5eEq1QhXT23tFOw@mail.gmail.com/

I don't like mismerges.  I really don't like silent mismerges.  While I
am sometimes willing to make performance and correctness tradeoffs, I'm
much more interested in correctness in general.  I want to fix the above
bugs.  I have not yet started doing so, but I believe histogram diff at
least gives me an angle.  Unfortunately, I can't rely on using the
information from histogram diff unless it's in use.  And it hasn't been
used because of a performance hit of a few percent.

In testcases I have looked at, merge-ort is _much_ faster than
merge-recursive for non-trivial merges/rebases/cherry-picks.  As such,
this is a golden opportunity to switch out the underlying diff algorithm
(at least the one used by the merge machinery; git-diff and git-log are
separate questions); doing so will allow me to get additional data and
improved diffs, and I believe it will help me fix the above bugs at some
point in the future.
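
For readers unfamiliar with how a diff algorithm is selected through an
option bitfield, here is a small sketch of the general idea: clear whichever
algorithm bits are set, then set the requested one, leaving unrelated option
bits alone. This is an assumption about what DIFF_WITH_ALG() in the diff
below does, not something this patch shows, and the TOY_* flag values and
with_algorithm() are invented for the example; the real XDF_* constants may
differ.

```c
#include <assert.h>

/* Toy flag values; the real xdiff constants may use different bits. */
#define TOY_IGNORE_WHITESPACE (1u << 1)
#define TOY_PATIENCE_DIFF     (1u << 14)
#define TOY_HISTOGRAM_DIFF    (1u << 15)
#define TOY_ALGORITHM_MASK    (TOY_PATIENCE_DIFF | TOY_HISTOGRAM_DIFF)

/* Replace the selected algorithm, preserving all other option bits. */
static unsigned with_algorithm(unsigned xdl_opts, unsigned alg)
{
	/* alg must be an algorithm bit (or 0 for the default algorithm) */
	assert((alg & ~TOY_ALGORITHM_MASK) == 0);
	return (xdl_opts & ~TOY_ALGORITHM_MASK) | alg;
}
```

Because the result is assigned back unconditionally, any algorithm a caller
had configured is overridden, which matches the "just hardcode it...for now"
comment in the patch.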

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 3581a7d278..d737762700 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -21,6 +21,7 @@
 #include "diffcore.h"
 #include "strmap.h"
 #include "tree.h"
+#include "xdiff-interface.h"
 
 struct merge_options_internal {
 	/*
@@ -234,6 +235,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	assert(opt->priv == NULL);
 
+	/* Default to histogram diff.  Actually, just hardcode it...for now. */
+	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
+
 	/* Initialization of opt->priv, our internal merge data */
 	opt->priv = xcalloc(1, sizeof(*opt->priv));
 
-- 
gitgitgadget



* [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29 10:23   ` Ævar Arnfjörð Bjarmason
  2020-11-29 10:26   ` Ævar Arnfjörð Bjarmason
  2020-11-29  7:43 ` [PATCH 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
                   ` (16 subsequent siblings)
  21 siblings, 2 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Various places in merge-recursive used an err() function when they hit
some kind of unrecoverable error.  That code was from the reusable bits
of merge-recursive.c that we liked, such as merge_3way, writing object
files to the object store, reading blobs from the object store, etc.  So
create a similar function to allow us to port that code over, and use it
for when we detect problems returned from collect_merge_info()'s
traverse_trees() call, which we will be adding next.
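
The shape of such a helper, outside of git, can be sketched with nothing but
the C standard library. This is only an illustration of the pattern (format
a message from variadic arguments, report it, return -1 so callers can write
"return report_error(...)"); report_error() and its caller-supplied buffer
are hypothetical and stand in for the strbuf-based code in the diff below.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for err(): format the message into buf, print it
 * to stderr with an "error: " prefix, and always return -1.
 */
static int report_error(char *buf, size_t buflen, const char *fmt, ...)
{
	va_list params;

	assert(buf && buflen > 0);
	va_start(params, fmt);
	vsnprintf(buf, buflen, fmt, params); /* truncates if buf is too small */
	va_end(params);

	fprintf(stderr, "error: %s\n", buf);
	return -1;
}
```

Returning a constant -1 lets an error path both report and propagate the
failure in a single statement.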

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index d737762700..baf31bcc28 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -158,11 +158,28 @@ struct conflict_info {
 	unsigned match_mask:3;
 };
 
+static int err(struct merge_options *opt, const char *err, ...)
+{
+	va_list params;
+	struct strbuf sb = STRBUF_INIT;
+
+	strbuf_addstr(&sb, "error: ");
+	va_start(params, err);
+	strbuf_vaddf(&sb, err, params);
+	va_end(params);
+
+	error("%s", sb.buf);
+	strbuf_release(&sb);
+
+	return -1;
+}
+
 static int collect_merge_info(struct merge_options *opt,
 			      struct tree *merge_base,
 			      struct tree *side1,
 			      struct tree *side2)
 {
+	/* TODO: Implement this using traverse_trees() */
 	die("Not yet implemented.");
 }
 
@@ -265,7 +282,15 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 {
 	struct object_id working_tree_oid;
 
-	collect_merge_info(opt, merge_base, side1, side2);
+	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
+		err(opt, _("collecting merge info failed for trees %s, %s, %s"),
+		    oid_to_hex(&merge_base->object.oid),
+		    oid_to_hex(&side1->object.oid),
+		    oid_to_hex(&side2->object.oid));
+		result->clean = -1;
+		return;
+	}
+
 	result->clean = detect_and_process_renames(opt, merge_base,
 						   side1, side2);
 	process_entries(opt, &working_tree_oid);
-- 
gitgitgadget



* [PATCH 06/20] merge-ort: implement a very basic collect_merge_info()
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This does not actually collect any necessary info other than the
pathnames involved, since it just allocates an all-zero conflict_info
and stuffs that into paths.  However, it invokes the traverse_trees()
machinery to walk over all the paths and sets up the basic
infrastructure we need.

I have left out a few obvious optimizations to try to make this patch as
short and obvious as possible.  A subsequent patch will add some of
those back in with some more useful data fields before we introduce a
patch that actually sets up the conflict_info fields.
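
The callback in the diff below leans on the mask and dirmask arguments, so a
standalone sketch of how those bits decode may help. It follows the
conventions stated in the callback's comments (bit 0 is the merge base,
bit 1 is side1, bit 2 is side2, and filemask = mask & ~dirmask). The struct
path_presence and decode_masks() are hypothetical names, and the check that
dirmask is a subset of mask is an assumption of this sketch.

```c
#include <assert.h>

/* Bit i of mask/dirmask refers to tree i: 0 = base, 1 = side1, 2 = side2. */
struct path_presence {
	unsigned filemask;     /* trees where the path is a non-directory */
	unsigned mbase_null:1; /* path absent from the merge base */
	unsigned side1_null:1; /* path absent from side1 */
	unsigned side2_null:1; /* path absent from side2 */
};

static struct path_presence decode_masks(unsigned long mask,
					  unsigned long dirmask)
{
	struct path_presence p;

	assert(mask > 0 && mask < 8);   /* present in at least one tree */
	assert((dirmask & ~mask) == 0); /* assumed: directories are present */

	p.filemask = (unsigned)(mask & ~dirmask);
	p.mbase_null = !(mask & 1);
	p.side1_null = !(mask & 2);
	p.side2_null = !(mask & 4);
	return p;
}
```

For example, mask 7 with dirmask 2 describes a path that is a directory on
side1 but a file in the merge base and on side2, giving filemask 5; a
nonzero dirmask is what triggers the recursion into subdirectories.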

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 117 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index baf31bcc28..a3096876d4 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -174,13 +174,128 @@ static int err(struct merge_options *opt, const char *err, ...)
 	return -1;
 }
 
+static int collect_merge_info_callback(int n,
+				       unsigned long mask,
+				       unsigned long dirmask,
+				       struct name_entry *names,
+				       struct traverse_info *info)
+{
+	/*
+	 * n is 3.  Always.
+	 * common ancestor (mbase) has mask 1, and stored in index 0 of names
+	 * head of side 1  (side1) has mask 2, and stored in index 1 of names
+	 * head of side 2  (side2) has mask 4, and stored in index 2 of names
+	 */
+	struct merge_options *opt = info->data;
+	struct merge_options_internal *opti = opt->priv;
+	struct conflict_info *ci;
+	struct name_entry *p;
+	size_t len;
+	char *fullpath;
+	unsigned filemask = mask & ~dirmask;
+	unsigned mbase_null = !(mask & 1);
+	unsigned side1_null = !(mask & 2);
+	unsigned side2_null = !(mask & 4);
+
+	/* n = 3 is a fundamental assumption. */
+	if (n != 3)
+		BUG("Called collect_merge_info_callback wrong");
+
+	/*
+	 * A bunch of sanity checks verifying that traverse_trees() calls
+	 * us the way I expect.  Could just remove these at some point,
+	 * though maybe they are helpful to future code readers.
+	 */
+	assert(mbase_null == is_null_oid(&names[0].oid));
+	assert(side1_null == is_null_oid(&names[1].oid));
+	assert(side2_null == is_null_oid(&names[2].oid));
+	assert(!mbase_null || !side1_null || !side2_null);
+	assert(mask > 0 && mask < 8);
+
+	/*
+	 * Get the name of the relevant filepath, which we'll pass to
+	 * setup_path_info() for tracking.
+	 */
+	p = names;
+	while (!p->mode)
+		p++;
+	len = traverse_path_len(info, p->pathlen);
+
+	/* +1 in both of the following lines to include the NUL byte */
+	fullpath = xmalloc(len + 1);
+	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
+
+	/*
+	 * TODO: record information about the path other than all zeros,
+	 * so we can resolve later in process_entries.
+	 */
+	ci = xcalloc(1, sizeof(struct conflict_info));
+	strmap_put(&opti->paths, fullpath, ci);
+
+	/* If dirmask, recurse into subdirectories */
+	if (dirmask) {
+		struct traverse_info newinfo;
+		struct tree_desc t[3];
+		void *buf[3] = {NULL, NULL, NULL};
+		const char *original_dir_name;
+		int i, ret;
+
+		ci->match_mask &= filemask;
+		newinfo = *info;
+		newinfo.prev = info;
+		newinfo.name = p->path;
+		newinfo.namelen = p->pathlen;
+		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
+
+		for (i = 0; i < 3; i++) {
+			const struct object_id *oid = NULL;
+			if (dirmask & 1)
+				oid = &names[i].oid;
+			buf[i] = fill_tree_descriptor(opt->repo, t + i, oid);
+			dirmask >>= 1;
+		}
+
+		original_dir_name = opti->current_dir_name;
+		opti->current_dir_name = fullpath;
+		ret = traverse_trees(NULL, 3, t, &newinfo);
+		opti->current_dir_name = original_dir_name;
+
+		for (i = 0; i < 3; i++)
+			free(buf[i]);
+
+		if (ret < 0)
+			return -1;
+	}
+
+	return mask;
+}
+
 static int collect_merge_info(struct merge_options *opt,
 			      struct tree *merge_base,
 			      struct tree *side1,
 			      struct tree *side2)
 {
-	/* TODO: Implement this using traverse_trees() */
-	die("Not yet implemented.");
+	int ret;
+	struct tree_desc t[3];
+	struct traverse_info info;
+	const char *toplevel_dir_placeholder = "";
+
+	opt->priv->current_dir_name = toplevel_dir_placeholder;
+	setup_traverse_info(&info, toplevel_dir_placeholder);
+	info.fn = collect_merge_info_callback;
+	info.data = opt;
+	info.show_all_errors = 1;
+
+	parse_tree(merge_base);
+	parse_tree(side1);
+	parse_tree(side2);
+	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
+	init_tree_desc(t + 1, side1->buffer, side1->size);
+	init_tree_desc(t + 2, side2->buffer, side2->size);
+
+	ret = traverse_trees(NULL, 3, t, &info);
+
+	return ret;
 }
 
 static int detect_and_process_renames(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Three-way merges, by their nature, will often have two or more trees
match at a given subdirectory.  We can avoid calling
fill_tree_descriptor() on the same tree more than once by checking when
these trees match.  Noting when various oids match will also be useful
in other calculations and optimizations.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
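For illustration only (not part of the patch): the reuse rule in the
loop below can be read as a small function that, for each of the three
tree slots, names an earlier slot whose descriptor can be copied, or
returns -1 when fill_tree_descriptor() has to be called.  The name
descriptor_source() is hypothetical and exists only in this sketch.

```c
#include <assert.h>

/*
 * Hypothetical helper, not in merge-ort.c: return the index of an
 * earlier tree slot whose descriptor slot i may copy, or -1 if slot i
 * needs its own fill_tree_descriptor() call.
 * Slots: 0 = merge base, 1 = side 1, 2 = side 2.
 */
static int descriptor_source(int i,
			     unsigned side1_matches_mbase,
			     unsigned side2_matches_mbase,
			     unsigned sides_match)
{
	assert(i >= 0 && i < 3);
	if (i == 1 && side1_matches_mbase)
		return 0;
	if (i == 2 && side2_matches_mbase)
		return 0;
	if (i == 2 && sides_match)
		return 1;
	return -1;
}
```

Since buf[] starts out as all NULL, a slot that copies another
descriptor leaves buf[i] as NULL, so the existing free(buf[i]) loop
stays safe.
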
 merge-ort.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index a3096876d4..820809f67e 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -196,6 +196,15 @@ static int collect_merge_info_callback(int n,
 	unsigned mbase_null = !(mask & 1);
 	unsigned side1_null = !(mask & 2);
 	unsigned side2_null = !(mask & 4);
+	unsigned side1_matches_mbase = (!side1_null && !mbase_null &&
+					names[0].mode == names[1].mode &&
+					oideq(&names[0].oid, &names[1].oid));
+	unsigned side2_matches_mbase = (!side2_null && !mbase_null &&
+					names[0].mode == names[2].mode &&
+					oideq(&names[0].oid, &names[2].oid));
+	unsigned sides_match = (!side1_null && !side2_null &&
+				names[1].mode == names[2].mode &&
+				oideq(&names[1].oid, &names[2].oid));
 
 	/* n = 3 is a fundamental assumption. */
 	if (n != 3)
@@ -248,10 +257,19 @@ static int collect_merge_info_callback(int n,
 		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
 
 		for (i = 0; i < 3; i++) {
-			const struct object_id *oid = NULL;
-			if (dirmask & 1)
-				oid = &names[i].oid;
-			buf[i] = fill_tree_descriptor(opt->repo, t + i, oid);
+			if (i == 1 && side1_matches_mbase)
+				t[1] = t[0];
+			else if (i == 2 && side2_matches_mbase)
+				t[2] = t[0];
+			else if (i == 2 && sides_match)
+				t[2] = t[1];
+			else {
+				const struct object_id *oid = NULL;
+				if (dirmask & 1)
+					oid = &names[i].oid;
+				buf[i] = fill_tree_descriptor(opt->repo,
+							      t + i, oid);
+			}
 			dirmask >>= 1;
 		}
 
-- 
gitgitgadget



* [PATCH 08/20] merge-ort: compute a few more useful fields for collect_merge_info
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (6 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
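For illustration only (not part of the patch): the two fields computed
below can be viewed as pure functions of the inputs the callback already
has.  The function names are hypothetical; the bit layout (1 = merge
base, 2 = side 1, 4 = side 2) is the one documented at the top of
collect_merge_info_callback().

```c
#include <assert.h>

/* Bit 1 = merge base, bit 2 = side 1, bit 4 = side 2. */
static unsigned compute_match_mask(unsigned side1_matches_mbase,
				   unsigned side2_matches_mbase,
				   unsigned sides_match)
{
	if (side1_matches_mbase)
		return side2_matches_mbase ? 7 : 3;
	if (side2_matches_mbase)
		return 5;
	if (sides_match)
		return 6;
	return 0;
}

/* A D/F conflict: the path is a file in one tree, a directory in another. */
static unsigned compute_df_conflict(unsigned mask, unsigned dirmask)
{
	unsigned filemask = mask & ~dirmask;

	/* dirmask only has bits for trees that have an entry at all */
	assert((dirmask & ~mask) == 0);
	return (filemask != 0) && (dirmask != 0);
}
```

For example, mask 7 with dirmask 4 (file in the base and side 1,
directory in side 2) gives filemask 3 and therefore a D/F conflict.
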
 merge-ort.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 820809f67e..e5bca25a8d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -193,6 +193,7 @@ static int collect_merge_info_callback(int n,
 	size_t len;
 	char *fullpath;
 	unsigned filemask = mask & ~dirmask;
+	unsigned match_mask = 0; /* will be updated below */
 	unsigned mbase_null = !(mask & 1);
 	unsigned side1_null = !(mask & 2);
 	unsigned side2_null = !(mask & 4);
@@ -206,6 +207,22 @@ static int collect_merge_info_callback(int n,
 				names[1].mode == names[2].mode &&
 				oideq(&names[1].oid, &names[2].oid));
 
+	/*
+	 * Note: When a path is a file on one side of history and a directory
+	 * in another, we have a directory/file conflict.  In such cases, if
+	 * the conflict doesn't resolve from renames and deletions, then we
+	 * always leave directories where they are and move files out of the
+	 * way.  Thus, while struct conflict_info has a df_conflict field to
+	 * track such conflicts, we ignore that field for any directories at
+	 * a path and only pay attention to it for files at the given path.
+	 * The fact that we leave directories where they are also means that
+	 * we do not need to worry about getting additional df_conflict
+	 * information propagated from parent directories down to children
+	 * (unlike, say, traverse_trees_recursive() in unpack-trees.c, which
+	 * sets a newinfo.df_conflicts field specifically to propagate it).
+	 */
+	unsigned df_conflict = (filemask != 0) && (dirmask != 0);
+
 	/* n = 3 is a fundamental assumption. */
 	if (n != 3)
 		BUG("Called collect_merge_info_callback wrong");
@@ -221,6 +238,14 @@ static int collect_merge_info_callback(int n,
 	assert(!mbase_null || !side1_null || !side2_null);
 	assert(mask > 0 && mask < 8);
 
+	/* Determine match_mask */
+	if (side1_matches_mbase)
+		match_mask = (side2_matches_mbase ? 7 : 3);
+	else if (side2_matches_mbase)
+		match_mask = 5;
+	else if (sides_match)
+		match_mask = 6;
+
 	/*
 	 * Get the name of the relevant filepath, which we'll pass to
 	 * setup_path_info() for tracking.
@@ -239,6 +264,8 @@ static int collect_merge_info_callback(int n,
 	 * so we can resolve later in process_entries.
 	 */
 	ci = xcalloc(1, sizeof(struct conflict_info));
+	ci->df_conflict = df_conflict;
+	ci->match_mask = match_mask;
 	strmap_put(&opti->paths, fullpath, ci);
 
 	/* If dirmask, recurse into subdirectories */
@@ -255,6 +282,15 @@ static int collect_merge_info_callback(int n,
 		newinfo.name = p->path;
 		newinfo.namelen = p->pathlen;
 		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
+		/*
+		 * If this directory we are about to recurse into cared about
+		 * its parent directory (the current directory) having a D/F
+		 * conflict, then we'd propagate the masks in this way:
+		 *    newinfo.df_conflicts |= (mask & ~dirmask);
+		 * But we don't worry about propagating D/F conflicts.  (See
+		 * comment near setting of local df_conflict variable near
+		 * the beginning of this function).
+		 */
 
 		for (i = 0; i < 3; i++) {
 			if (i == 1 && side1_matches_mbase)
-- 
gitgitgadget



* [PATCH 09/20] merge-ort: record stage and auxiliary info for every path
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (7 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create a helper function, setup_path_info(), which can be used to record
all the information we want in a merged_info or conflict_info.  There is
currently only one caller of this new function, and it passes fixed
values for some of the parameters; more callers will be added later.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
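For illustration only (not part of the patch): a minimal sketch of the
allocation and downcast pattern that setup_path_info() and INITIALIZE_CI
rely on.  The *_sketch structs are stripped-down stand-ins, and their
field sets are assumptions; the only property carried over from the real
code is that the merged_info is the first member of the conflict_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the real structs; field sets are assumptions. */
struct merged_info_sketch {
	unsigned clean:1;
};

struct conflict_info_sketch {
	struct merged_info_sketch merged; /* must stay the first member */
	unsigned match_mask:3;
};

/* Allocate the small struct for resolved paths, the large one otherwise. */
static struct merged_info_sketch *alloc_path_info(int resolved)
{
	struct merged_info_sketch *mi;

	/* The downcast below is only valid because 'merged' is at offset 0. */
	assert(offsetof(struct conflict_info_sketch, merged) == 0);

	mi = calloc(1, resolved ? sizeof(struct merged_info_sketch)
				: sizeof(struct conflict_info_sketch));
	if (!mi)
		abort();
	mi->clean = !!resolved;
	return mi;
}

/* Same idea as INITIALIZE_CI: only downcast when the entry is not clean. */
static struct conflict_info_sketch *
as_conflict_info(struct merged_info_sketch *mi)
{
	if (!mi || mi->clean)
		return NULL;
	return (struct conflict_info_sketch *)mi;
}
```

A clean entry was allocated with the smaller size, so treating it as a
conflict_info would read past the end of the allocation; checking clean
first is what makes the cast safe.
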
 merge-ort.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 90 insertions(+), 7 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index e5bca25a8d..52a8c41cf8 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -158,6 +158,26 @@ struct conflict_info {
 	unsigned match_mask:3;
 };
 
+/*
+ * For the next three macros, see warning for conflict_info.merged.
+ *
+ * In each of the below, mi is a struct merged_info*, and ci was defined
+ * as a struct conflict_info* (but we need to verify ci isn't actually
+ * pointed at a struct merged_info*).
+ *
+ * INITIALIZE_CI: Assign ci to mi but only if it's safe; set to NULL otherwise.
+ * VERIFY_CI: Ensure that something we assigned to a conflict_info* is one.
+ * ASSIGN_AND_VERIFY_CI: Similar to VERIFY_CI but do assignment first.
+ */
+#define INITIALIZE_CI(ci, mi) do {                                           \
+	(ci) = (!(mi) || (mi)->clean) ? NULL : (struct conflict_info *)(mi); \
+} while (0)
+#define VERIFY_CI(ci) assert((ci) && !(ci)->merged.clean)
+#define ASSIGN_AND_VERIFY_CI(ci, mi) do {    \
+	(ci) = (struct conflict_info *)(mi);  \
+	assert((ci) && !(mi)->clean);        \
+} while (0)
+
 static int err(struct merge_options *opt, const char *err, ...)
 {
 	va_list params;
@@ -174,6 +194,65 @@ static int err(struct merge_options *opt, const char *err, ...)
 	return -1;
 }
 
+static void setup_path_info(struct merge_options *opt,
+			    struct string_list_item *result,
+			    const char *current_dir_name,
+			    int current_dir_name_len,
+			    char *fullpath, /* we'll take over ownership */
+			    struct name_entry *names,
+			    struct name_entry *merged_version,
+			    unsigned is_null,     /* boolean */
+			    unsigned df_conflict, /* boolean */
+			    unsigned filemask,
+			    unsigned dirmask,
+			    int resolved          /* boolean */)
+{
+	/* result->util is void*, so mi is a convenience typed variable */
+	struct merged_info *mi;
+
+	assert(!is_null || resolved);
+	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
+	assert(resolved == (merged_version != NULL));
+
+	mi = xcalloc(1, resolved ? sizeof(struct merged_info) :
+				   sizeof(struct conflict_info));
+	mi->directory_name = current_dir_name;
+	mi->basename_offset = current_dir_name_len;
+	mi->clean = !!resolved;
+	if (resolved) {
+		mi->result.mode = merged_version->mode;
+		oidcpy(&mi->result.oid, &merged_version->oid);
+		mi->is_null = !!is_null;
+	} else {
+		int i;
+		struct conflict_info *ci;
+
+		ASSIGN_AND_VERIFY_CI(ci, mi);
+		for (i = 0; i < 3; i++) {
+			ci->pathnames[i] = fullpath;
+			ci->stages[i].mode = names[i].mode;
+			oidcpy(&ci->stages[i].oid, &names[i].oid);
+		}
+		ci->filemask = filemask;
+		ci->dirmask = dirmask;
+		ci->df_conflict = !!df_conflict;
+		if (dirmask)
+			/*
+			 * Assume is_null for now, but if we have entries
+			 * under the directory then when it is complete in
+			 * write_completed_directory() it'll update this.
+			 * Also, for D/F conflicts, we have to handle the
+			 * directory first, then clear this bit and process
+			 * the file to see how it is handled -- that occurs
+			 * near the top of process_entry().
+			 */
+			mi->is_null = 1;
+	}
+	strmap_put(&opt->priv->paths, fullpath, mi);
+	result->string = fullpath;
+	result->util = mi;
+}
+
 static int collect_merge_info_callback(int n,
 				       unsigned long mask,
 				       unsigned long dirmask,
@@ -188,10 +267,12 @@ static int collect_merge_info_callback(int n,
 	 */
 	struct merge_options *opt = info->data;
 	struct merge_options_internal *opti = opt->priv;
-	struct conflict_info *ci;
+	struct string_list_item pi;  /* Path Info */
+	struct conflict_info *ci; /* typed alias to pi.util (which is void*) */
 	struct name_entry *p;
 	size_t len;
 	char *fullpath;
+	const char *dirname = opti->current_dir_name;
 	unsigned filemask = mask & ~dirmask;
 	unsigned match_mask = 0; /* will be updated below */
 	unsigned mbase_null = !(mask & 1);
@@ -260,13 +341,15 @@ static int collect_merge_info_callback(int n,
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
-	 * TODO: record information about the path other than all zeros,
-	 * so we can resolve later in process_entries.
+	 * Record information about the path so we can resolve later in
+	 * process_entries.
 	 */
-	ci = xcalloc(1, sizeof(struct conflict_info));
-	ci->df_conflict = df_conflict;
+	setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+			names, NULL, 0, df_conflict, filemask, dirmask, 0);
+
+	ci = pi.util;
+	VERIFY_CI(ci);
 	ci->match_mask = match_mask;
-	strmap_put(&opti->paths, fullpath, ci);
 
 	/* If dirmask, recurse into subdirectories */
 	if (dirmask) {
@@ -310,7 +393,7 @@ static int collect_merge_info_callback(int n,
 		}
 
 		original_dir_name = opti->current_dir_name;
-		opti->current_dir_name = fullpath;
+		opti->current_dir_name = pi.string;
 		ret = traverse_trees(NULL, 3, t, &newinfo);
 		opti->current_dir_name = original_dir_name;
 
-- 
gitgitgadget



* [PATCH 10/20] merge-ort: avoid recursing into identical trees
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (8 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

When all three trees have the same oid, there is no need to recurse into
these trees to find that all files within them happen to match.  We can
just record any one of the trees as the resolution of merging that
particular path.

Immediately resolving trees for other types of trivial tree merges (such
as one side matches the merge base, or the two sides match each other)
would prevent us from detecting renames for some paths, and thus prevent
us from doing three-way content merges for those paths whose renames we
did not detect.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 52a8c41cf8..0789816ae9 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -340,6 +340,19 @@ static int collect_merge_info_callback(int n,
 	fullpath = xmalloc(len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
+	/*
+	 * If mbase, side1, and side2 all match, we can resolve early.  Even
+	 * if these are trees, there will be no renames or anything
+	 * underneath.
+	 */
+	if (side1_matches_mbase && side2_matches_mbase) {
+		/* mbase, side1, & side2 all match; use mbase as resolution */
+		setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+				names, names+0, mbase_null, 0,
+				filemask, dirmask, 1);
+		return mask;
+	}
+
 	/*
 	 * Record information about the path so we can resolve later in
 	 * process_entries.
-- 
gitgitgadget



* [PATCH 11/20] merge-ort: add a preliminary simple process_entries() implementation
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (9 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add a process_entries() implementation that just loops over the paths
and processes each one individually with an auxiliary process_entry()
call.  Add a basic process_entry() as well, which handles several cases
but leaves a few of the more involved ones with die-not-implemented
messages.  Also, although process_entries() is supposed to create a
tree, it does not yet have code to do so -- except in the special case
of merging completely empty trees.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
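For illustration only (not part of the patch): when exactly two of the
three versions match, the result is taken from the side that differs
from the merge base, or from either side if the two sides agree.  The
helper name below is hypothetical; the logic mirrors the match_mask
branch of process_entry().

```c
#include <assert.h>

/*
 * Hypothetical helper: given a match_mask of 3, 5, or 6, return which
 * stage (1 or 2) supplies the merge result.
 */
static int stage_for_match_mask(unsigned match_mask)
{
	unsigned othermask;

	assert(match_mask == 3 || match_mask == 5 || match_mask == 6);
	if (match_mask == 6)
		return 1; /* both sides agree; stage 1 equals stage 2 */
	othermask = 7 & ~match_mask; /* the one side that changed */
	return (othermask == 4) ? 2 : 1;
}
```

So match_mask 3 (side 1 matches the base) takes stage 2, and match_mask
5 (side 2 matches the base) takes stage 1.
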
 merge-ort.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 0789816ae9..04127a32f8 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -465,10 +465,111 @@ static int detect_and_process_renames(struct merge_options *opt,
 	return clean;
 }
 
+/* Per entry merge function */
+static void process_entry(struct merge_options *opt,
+			  const char *path,
+			  struct conflict_info *ci)
+{
+	VERIFY_CI(ci);
+	assert(ci->filemask >= 0 && ci->filemask <= 7);
+	/* ci->match_mask == 7 was handled in collect_merge_info_callback() */
+	assert(ci->match_mask == 0 || ci->match_mask == 3 ||
+	       ci->match_mask == 5 || ci->match_mask == 6);
+
+	if (ci->df_conflict) {
+		die("Not yet implemented.");
+	}
+
+	/*
+	 * NOTE: Below there is a long switch-like if-elseif-elseif... block
+	 *       which the code goes through even for the df_conflict cases
+	 *       above.  Well, it will once we don't die-not-implemented above.
+	 */
+	if (ci->match_mask) {
+		ci->merged.clean = 1;
+		if (ci->match_mask == 6) {
+			/* stages[1] == stages[2] */
+			ci->merged.result.mode = ci->stages[1].mode;
+			oidcpy(&ci->merged.result.oid, &ci->stages[1].oid);
+		} else {
+			/* determine the mask of the side that didn't match */
+			unsigned int othermask = 7 & ~ci->match_mask;
+			int side = (othermask == 4) ? 2 : 1;
+
+			ci->merged.result.mode = ci->stages[side].mode;
+			ci->merged.is_null = !ci->merged.result.mode;
+			oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
+
+			assert(othermask == 2 || othermask == 4);
+			assert(ci->merged.is_null ==
+			       (ci->filemask == ci->match_mask));
+		}
+	} else if (ci->filemask >= 6 &&
+		   (S_IFMT & ci->stages[1].mode) !=
+		   (S_IFMT & ci->stages[2].mode)) {
+		/*
+		 * Two different items from (file/submodule/symlink)
+		 */
+		die("Not yet implemented.");
+	} else if (ci->filemask >= 6) {
+		/*
+		 * TODO: Needs a two-way or three-way content merge, but we're
+		 * just being lazy and copying the version from HEAD and
+		 * leaving it as conflicted.
+		 */
+		ci->merged.clean = 0;
+		ci->merged.result.mode = ci->stages[1].mode;
+		oidcpy(&ci->merged.result.oid, &ci->stages[1].oid);
+	} else if (ci->filemask == 3 || ci->filemask == 5) {
+		/* Modify/delete */
+		die("Not yet implemented.");
+	} else if (ci->filemask == 2 || ci->filemask == 4) {
+		/* Added on one side */
+		int side = (ci->filemask == 4) ? 2 : 1;
+		ci->merged.result.mode = ci->stages[side].mode;
+		oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
+		ci->merged.clean = !ci->df_conflict;
+	} else if (ci->filemask == 1) {
+		/* Deleted on both sides */
+		ci->merged.is_null = 1;
+		ci->merged.result.mode = 0;
+		oidcpy(&ci->merged.result.oid, &null_oid);
+		ci->merged.clean = 1;
+	}
+
+	/*
+	 * If still conflicted, record it separately.  This allows us to later
+	 * iterate over just conflicted entries when updating the index instead
+	 * of iterating over all entries.
+	 */
+	if (!ci->merged.clean)
+		strmap_put(&opt->priv->conflicted, path, ci);
+}
+
 static void process_entries(struct merge_options *opt,
 			    struct object_id *result_oid)
 {
-	die("Not yet implemented.");
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	if (strmap_empty(&opt->priv->paths)) {
+		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
+		return;
+	}
+
+	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
+		/*
+		 * NOTE: mi may actually be a pointer to a conflict_info, but
+		 * we have to check mi->clean first to see if it's safe to
+		 * reassign to such a pointer type.
+		 */
+		struct merged_info *mi = e->value;
+
+		if (!mi->clean)
+			process_entry(opt, e->key, e->value);
+	}
+
+	die("Tree creation not yet implemented");
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH 12/20] merge-ort: have process_entries operate in a defined order
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (10 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

We want to handle paths below a directory before needing to handle the
directory itself.  Also, we want to handle the directory immediately
after the paths below it, so we can't use simple lexicographic ordering
from strcmp (which would insert foo.txt between foo and foo/file.c).
Copy string_list_df_name_compare() from merge-recursive.c, and set up a
string list of paths sorted by that function so that we can iterate in
the desired order.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
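For illustration only (not part of the patch): a self-contained sketch
of the ordering described above.  df_compare_sketch() is a simplified
stand-in for df_name_compare() called with S_IFDIR for both names, plus
the length tie-break; it is an approximation written for this note, not
git's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified stand-in for df_name_compare() with both modes S_IFDIR:
 * the end of a name is treated as if it were followed by '/'.
 */
static int df_compare_sketch(const char *one, const char *two)
{
	size_t onelen = strlen(one), twolen = strlen(two);
	size_t len = onelen < twolen ? onelen : twolen;
	unsigned char c1, c2;
	int cmp = memcmp(one, two, len);

	if (cmp)
		return cmp;
	if (onelen != twolen) {
		c1 = one[len] ? (unsigned char)one[len] : '/';
		c2 = two[len] ? (unsigned char)two[len] : '/';
		if (c1 != c2)
			return c1 - c2;
	}
	/* 'foo' and 'foo/bar' tie above; put the shorter name first. */
	return (onelen > twolen) - (onelen < twolen);
}

static int qsort_df_compare(const void *a, const void *b)
{
	return df_compare_sketch(*(const char *const *)a,
				 *(const char *const *)b);
}

/* Sort an array of path strings in place using the sketch comparator. */
static void sort_paths_df(const char **paths, size_t nr)
{
	assert(paths || nr == 0);
	qsort(paths, nr, sizeof(*paths), qsort_df_compare);
}
```

Sorting "foo", "foo.txt", and "foo/file.c" this way yields foo.txt, foo,
foo/file.c, so a reverse walk visits foo/file.c, then foo, then foo.txt.
Plain strcmp() ordering would give foo, foo.txt, foo/file.c instead.
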
 merge-ort.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 04127a32f8..eec3b41e7e 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -465,6 +465,33 @@ static int detect_and_process_renames(struct merge_options *opt,
 	return clean;
 }
 
+static int string_list_df_name_compare(const char *one, const char *two)
+{
+	int onelen = strlen(one);
+	int twolen = strlen(two);
+	/*
+	 * Here we only care that entries for D/F conflicts are
+	 * adjacent, in particular with the file of the D/F conflict
+	 * appearing before files below the corresponding directory.
+	 * The order of the rest of the list is irrelevant for us.
+	 *
+	 * To achieve this, we sort with df_name_compare and provide
+	 * the mode S_IFDIR so that D/F conflicts will sort correctly.
+	 * We use the mode S_IFDIR for everything else for simplicity,
+	 * since in other cases any changes in their order due to
+	 * sorting cause no problems for us.
+	 */
+	int cmp = df_name_compare(one, onelen, S_IFDIR,
+				  two, twolen, S_IFDIR);
+	/*
+	 * Now that 'foo' and 'foo/bar' compare equal, we have to make sure
+	 * that 'foo' comes before 'foo/bar'.
+	 */
+	if (cmp)
+		return cmp;
+	return onelen - twolen;
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
@@ -551,24 +578,44 @@ static void process_entries(struct merge_options *opt,
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *e;
+	struct string_list plist = STRING_LIST_INIT_NODUP;
+	struct string_list_item *entry;
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
 		return;
 	}
 
+	/* Hack to pre-allocate plist to the desired size */
+	ALLOC_GROW(plist.items, strmap_get_size(&opt->priv->paths), plist.alloc);
+
+	/* Put every entry from paths into plist, then sort */
 	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
+		string_list_append(&plist, e->key)->util = e->value;
+	}
+	plist.cmp = string_list_df_name_compare;
+	string_list_sort(&plist);
+
+	/*
+	 * Iterate over the items in reverse order, so we can handle paths
+	 * below a directory before needing to handle the directory itself.
+	 */
+	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
+		char *path = entry->string;
 		/*
 		 * NOTE: mi may actually be a pointer to a conflict_info, but
 		 * we have to check mi->clean first to see if it's safe to
 		 * reassign to such a pointer type.
 		 */
-		struct merged_info *mi = e->value;
+		struct merged_info *mi = entry->util;
 
-		if (!mi->clean)
-			process_entry(opt, e->key, e->value);
+		if (!mi->clean) {
+			struct conflict_info *ci = (struct conflict_info *)mi;
+			process_entry(opt, path, ci);
+		}
 	}
 
+	string_list_clear(&plist, 0);
 	die("Tree creation not yet implemented");
 }
 
-- 
gitgitgadget



* [PATCH 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (11 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

As a step towards transforming the processed path->conflict_info entries
into an actual tree object, start recording basenames, modes, and oids
in a dir_metadata structure.  Subsequent commits will make use of this
to actually write a tree.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
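For illustration only (not part of the patch): the basename recorded for
each entry is just the full path advanced by the stored offset, which
(as set up earlier in the series) is the length of the containing
directory's path including its trailing slash, or 0 at the toplevel.
The helper name below is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: given a full path and the offset stored for it,
 * return a pointer to the basename within that same string.
 */
static const char *basename_from_offset(const char *path,
					size_t basename_offset)
{
	const char *basename;

	assert(basename_offset <= strlen(path));
	basename = path + basename_offset;
	/* A correct offset leaves no directory separator behind. */
	assert(strchr(basename, '/') == NULL);
	return basename;
}
```

For "src/main.c" the directory "src" contributes 3 + 1 bytes, so the
offset is 4 and the basename is "main.c".  No copy is made; the returned
pointer aliases the full path string.
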
 merge-ort.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index eec3b41e7e..970708fff9 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -492,10 +492,31 @@ static int string_list_df_name_compare(const char *one, const char *two)
 	return onelen - twolen;
 }
 
+struct directory_versions {
+	struct string_list versions;
+};
+
+static void record_entry_for_tree(struct directory_versions *dir_metadata,
+				  const char *path,
+				  struct merged_info *mi)
+{
+	const char *basename;
+
+	if (mi->is_null)
+		/* nothing to record */
+		return;
+
+	basename = path + mi->basename_offset;
+	assert(strchr(basename, '/') == NULL);
+	string_list_append(&dir_metadata->versions,
+			   basename)->util = &mi->result;
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
-			  struct conflict_info *ci)
+			  struct conflict_info *ci,
+			  struct directory_versions *dir_metadata)
 {
 	VERIFY_CI(ci);
 	assert(ci->filemask >= 0 && ci->filemask <= 7);
@@ -503,6 +524,14 @@ static void process_entry(struct merge_options *opt,
 	assert(ci->match_mask == 0 || ci->match_mask == 3 ||
 	       ci->match_mask == 5 || ci->match_mask == 6);
 
+	if (ci->dirmask) {
+		record_entry_for_tree(dir_metadata, path, &ci->merged);
+		if (ci->filemask == 0)
+			/* nothing else to handle */
+			return;
+		assert(ci->df_conflict);
+	}
+
 	if (ci->df_conflict) {
 		die("Not yet implemented.");
 	}
@@ -571,6 +600,7 @@ static void process_entry(struct merge_options *opt,
 	 */
 	if (!ci->merged.clean)
 		strmap_put(&opt->priv->conflicted, path, ci);
+	record_entry_for_tree(dir_metadata, path, &ci->merged);
 }
 
 static void process_entries(struct merge_options *opt,
@@ -580,6 +610,7 @@ static void process_entries(struct merge_options *opt,
 	struct strmap_entry *e;
 	struct string_list plist = STRING_LIST_INIT_NODUP;
 	struct string_list_item *entry;
+	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP };
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
@@ -609,13 +640,16 @@ static void process_entries(struct merge_options *opt,
 		 */
 		struct merged_info *mi = entry->util;
 
-		if (!mi->clean) {
+		if (mi->clean)
+			record_entry_for_tree(&dir_metadata, path, mi);
+		else {
 			struct conflict_info *ci = (struct conflict_info *)mi;
-			process_entry(opt, path, ci);
+			process_entry(opt, path, ci, &dir_metadata);
 		}
 	}
 
 	string_list_clear(&plist, 0);
+	string_list_clear(&dir_metadata.versions, 0);
 	die("Tree creation not yet implemented");
 }
 
-- 
gitgitgadget



* [PATCH 14/20] merge-ort: step 2 of tree writing -- function to create tree object
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (12 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create a new function, write_tree(), which will take a list of
basenames, modes, and oids for a single directory and create a tree
object in the object-store.  We do not yet have basenames, modes, and
oids for just a single directory (we have a mixture of entries from
all directory levels in the hierarchy) so we still die() before the
current call to write_tree(), but the next patch will rectify that.
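
As a rough illustration of the buffer layout write_tree() builds, here is
a minimal standalone sketch.  It uses nothing from git: struct toy_entry,
serialize_tree(), and the fixed 20-byte hash size are hypothetical
stand-ins, and the sketch stops at filling the buffer rather than calling
write_object_file().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_HASH_SIZE 20 /* assumption: SHA-1 sized raw hashes */

/* Hypothetical stand-in for one (basename, mode, oid) triple. */
struct toy_entry {
	const char *name;
	unsigned int mode;
	unsigned char hash[TOY_HASH_SIZE];
};

static int toy_entry_cmp(const void *a, const void *b)
{
	const struct toy_entry *ea = a, *eb = b;
	return strcmp(ea->name, eb->name);
}

/*
 * Serialize entries as "<octal mode> <name>\0<raw hash>", the layout
 * used inside a tree object.  Returns a malloc'd buffer (caller frees)
 * and stores its length in *len, or returns NULL on allocation failure.
 */
static unsigned char *serialize_tree(struct toy_entry *entries, size_t nr,
				     size_t *len)
{
	size_t i, maxlen = 0, used = 0;
	unsigned char *buf;

	/* Plain strcmp order; real trees sort directories as "name/". */
	qsort(entries, nr, sizeof(*entries), toy_entry_cmp);

	/* 8: up to 6 mode digits, 1 space, 1 NUL */
	for (i = 0; i < nr; i++)
		maxlen += strlen(entries[i].name) + 8 + TOY_HASH_SIZE;
	buf = malloc(maxlen ? maxlen : 1);
	if (!buf)
		return NULL;

	for (i = 0; i < nr; i++) {
		/* snprintf's return excludes the NUL; +1 keeps it in buf */
		used += snprintf((char *)buf + used, maxlen - used, "%o %s",
				 entries[i].mode, entries[i].name) + 1;
		memcpy(buf + used, entries[i].hash, TOY_HASH_SIZE);
		used += TOY_HASH_SIZE;
	}
	*len = used;
	return buf;
}
```

Passing entries for umm.c and baz.c yields a buffer with baz.c first,
since the entries are sorted before being written out.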

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 970708fff9..59355de628 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -19,6 +19,7 @@
 
 #include "diff.h"
 #include "diffcore.h"
+#include "object-store.h"
 #include "strmap.h"
 #include "tree.h"
 #include "xdiff-interface.h"
@@ -496,6 +497,51 @@ struct directory_versions {
 	struct string_list versions;
 };
 
+static void write_tree(struct object_id *result_oid,
+		       struct string_list *versions,
+		       unsigned int offset,
+		       size_t hash_size)
+{
+	size_t maxlen = 0, extra;
+	unsigned int nr = versions->nr - offset;
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list relevant_entries = STRING_LIST_INIT_NODUP;
+	int i;
+
+	/*
+	 * We want to sort the last (versions->nr-offset) entries in versions.
+	 * Do so by abusing the string_list API a bit: make another string_list
+	 * that contains just those entries and then sort them.
+	 *
+	 * We won't use relevant_entries again and will let it just pop off the
+	 * stack, so there won't be allocation worries or anything.
+	 */
+	relevant_entries.items = versions->items + offset;
+	relevant_entries.nr = versions->nr - offset;
+	string_list_sort(&relevant_entries);
+
+	/* Pre-allocate some space in buf */
+	extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */
+	for (i = 0; i < nr; i++) {
+		maxlen += strlen(versions->items[offset+i].string) + extra;
+	}
+	strbuf_grow(&buf, maxlen);
+
+	/* Write each entry out to buf */
+	for (i = 0; i < nr; i++) {
+		struct merged_info *mi = versions->items[offset+i].util;
+		struct version_info *ri = &mi->result;
+		strbuf_addf(&buf, "%o %s%c",
+			    ri->mode,
+			    versions->items[offset+i].string, '\0');
+		strbuf_add(&buf, ri->oid.hash, hash_size);
+	}
+
+	/* Write this object file out, and record in result_oid */
+	write_object_file(buf.buf, buf.len, tree_type, result_oid);
+	strbuf_release(&buf);
+}
+
 static void record_entry_for_tree(struct directory_versions *dir_metadata,
 				  const char *path,
 				  struct merged_info *mi)
@@ -648,9 +694,17 @@ static void process_entries(struct merge_options *opt,
 		}
 	}
 
+	/*
+	 * TODO: We can't actually write a tree yet, because dir_metadata just
+	 * contains all basenames of all files throughout the tree with their
+	 * mode and hash.  Not only is that a nonsensical tree, it will have
+	 * lots of duplicates for paths such as "Makefile" or ".gitignore".
+	 */
+	die("Not yet implemented; need to process subtrees separately");
+	write_tree(result_oid, &dir_metadata.versions, 0,
+		   opt->repo->hash_algo->rawsz);
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
-	die("Tree creation not yet implemented");
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (13 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Our order for processing of entries means that if we have a tree of
files that looks like
   Makefile
   src/moduleA/foo.c
   src/moduleA/bar.c
   src/moduleB/baz.c
   src/moduleB/umm.c
   tokens.txt

Then we will process paths in the order of the leftmost column below.  I
have added two additional columns that help explain the algorithm that
follows: the second column is a reminder that we track oid & mode info
for each of these paths (the actual values differ from path to path,
which this placeholder does not show), and the third column annotates
the parent directory of the entry:
   tokens.txt               <version_info>    ""
   src/moduleB/umm.c        <version_info>    src/moduleB
   src/moduleB/baz.c        <version_info>    src/moduleB
   src/moduleB              <version_info>    src
   src/moduleA/foo.c        <version_info>    src/moduleA
   src/moduleA/bar.c        <version_info>    src/moduleA
   src/moduleA              <version_info>    src
   src                      <version_info>    ""
   Makefile                 <version_info>    ""

When the parent directory changes, if it's a subdirectory of the previous
parent directory (e.g. "" -> src/moduleB) then we can just keep appending.
If the parent directory differs from the previous parent directory and is
not a subdirectory, then we should process that directory.

So, for example, when we get to this point:
   tokens.txt               <version_info>    ""
   src/moduleB/umm.c        <version_info>    src/moduleB
   src/moduleB/baz.c        <version_info>    src/moduleB

and note that the next entry (src/moduleB) has a parent directory (src)
that differs from the previous one (src/moduleB) and is not a
subdirectory of it, we should write out a tree for src/moduleB
   100644 blob <HASH> umm.c
   100644 blob <HASH> baz.c

then pop all the entries under that directory while recording the new
hash for that directory, leaving us with
   tokens.txt               <version_info>        ""
   src/moduleB              <new version_info>    src

This process repeats until at the end we get to
   tokens.txt               <version_info>        ""
   src                      <new version_info>    ""
   Makefile                 <version_info>        ""

and then we can write out the toplevel tree.  Our string_list can
hold entries from several different directories at once; e.g. a slightly
different repository might have:
   whizbang.txt             <version_info>        ""
   tokens.txt               <version_info>        ""
   src/moduleD              <new version_info>    src
   src/moduleC              <new version_info>    src
   src/moduleB              <new version_info>    src
   src/moduleA/foo.c        <version_info>        src/moduleA
   src/moduleA/bar.c        <version_info>        src/moduleA

When src/moduleA is popped off, we need to know that the "last
directory" reverts back to src, and how many entries in our string_list
are associated with that parent directory.  So I use an auxiliary
offsets string_list which would have (parent_directory,offset)
information of the form
   ""             0
   src            2
   src/moduleA    5

Whenever I write out a tree for a subdirectory, I set versions.nr to
the final offset value and then decrement offsets.nr...and then add
an entry to versions with a hash for the new directory.

The idea is relatively simple; there's just a lot of accounting needed
to implement it.
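
The accounting described above can be sketched with a toy model.  This is
not the code from the patch: struct toy_dirs, toy_completed_directory(),
toy_record(), and is_within() are hypothetical names, strings are compared
with strcmp() rather than by pointer, the prefix test additionally checks
for a '/' boundary, and "writing a tree" is reduced to printing a line.

```c
#include <stdio.h>
#include <string.h>

#define TOY_MAX 32

/* Hypothetical, simplified stand-in for struct directory_versions. */
struct toy_dirs {
	const char *versions[TOY_MAX];   /* basenames not yet in a tree */
	size_t versions_nr;
	const char *offset_dir[TOY_MAX]; /* directory names ... */
	size_t offset_val[TOY_MAX];      /* ... and where each starts */
	size_t offsets_nr;
};

/* True if dir equals parent or lies somewhere beneath it. */
static int is_within(const char *dir, const char *parent)
{
	size_t len = strlen(parent);
	if (!len)
		return 1;
	return !strncmp(dir, parent, len) && (dir[len] == '/' || !dir[len]);
}

/*
 * Call before recording an entry whose parent directory is new_dir.
 * Returns the number of entries folded into a tree (0 if none).
 */
static size_t toy_completed_directory(struct toy_dirs *d, const char *new_dir)
{
	const char *last = d->offsets_nr ?
		d->offset_dir[d->offsets_nr - 1] : NULL;
	size_t offset, count;

	if (last && !strcmp(new_dir, last))
		return 0; /* same directory: keep appending */
	if (!last || is_within(new_dir, last)) {
		/* descending: remember where new_dir's entries start */
		d->offset_dir[d->offsets_nr] = new_dir;
		d->offset_val[d->offsets_nr++] = d->versions_nr;
		return 0;
	}

	/* last is complete: "write" its tree, then pop its entries */
	offset = d->offset_val[--d->offsets_nr];
	count = d->versions_nr - offset;
	printf("tree %s: %zu entries\n", last, count);
	d->versions_nr = offset;

	/* make sure new_dir is the last entry in offsets */
	if (!d->offsets_nr ||
	    strcmp(new_dir, d->offset_dir[d->offsets_nr - 1])) {
		d->offset_dir[d->offsets_nr] = new_dir;
		d->offset_val[d->offsets_nr++] = d->versions_nr;
	}
	return count;
}

/* Append a basename; the caller does this after the call above. */
static void toy_record(struct toy_dirs *d, const char *basename)
{
	d->versions[d->versions_nr++] = basename;
}
```

Feeding the nine paths from the first example through these two functions
in the order shown (parent directory to toy_completed_directory(), then
basename to toy_record()) leaves a single offsets entry for "" at 0 and
three versions entries: tokens.txt, src, and Makefile.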

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 242 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 234 insertions(+), 8 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 59355de628..65dbdadc5e 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -494,7 +494,46 @@ static int string_list_df_name_compare(const char *one, const char *two)
 }
 
 struct directory_versions {
+	/*
+	 * versions: list of (basename -> version_info)
+	 *
+	 * The basenames are in reverse lexicographic order of full pathnames,
+	 * as processed in process_entries().  This puts all entries within
+	 * a directory together, and covers the directory itself after
+	 * everything within it, allowing us to write subtrees before needing
+	 * to record information for the tree itself.
+	 */
 	struct string_list versions;
+
+	/*
+	 * offsets: list of (full relative path directories -> integer offsets)
+	 *
+	 * Since versions contains basenames from files in multiple different
+	 * directories, we need to know which entries in versions correspond
+	 * to which directories.  Values of e.g.
+	 *     ""             0
+	 *     src            2
+	 *     src/moduleA    5
+	 * Would mean that entries 0-1 of versions are files in the toplevel
+	 * directory, entries 2-4 are files under src/, and the remaining
+	 * entries starting at index 5 are files under src/moduleA/.
+	 */
+	struct string_list offsets;
+
+	/*
+	 * last_directory: directory in which the previous file was found
+	 *
+	 * last_directory starts NULL, but records the directory in which the
+	 * previous file was found.  As soon as
+	 *    directory(current_file) != last_directory
+	 * then we need to start updating accounting in versions & offsets.
+	 * Note that last_directory is always the last path in "offsets" (or
+	 * NULL if "offsets" is empty) so this exists just for quick access.
+	 */
+	const char *last_directory;
+
+	/* last_directory_len: cached computation of strlen(last_directory) */
+	unsigned last_directory_len;
 };
 
 static void write_tree(struct object_id *result_oid,
@@ -558,6 +597,181 @@ static void record_entry_for_tree(struct directory_versions *dir_metadata,
 			   basename)->util = &mi->result;
 }
 
+static void write_completed_directory(struct merge_options *opt,
+				      const char *new_directory_name,
+				      struct directory_versions *info)
+{
+	const char *prev_dir;
+	struct merged_info *dir_info = NULL;
+	unsigned int offset;
+
+	/*
+	 * Some explanation of info->versions and info->offsets...
+	 *
+	 * process_entries() iterates over all relevant files AND
+	 * directories in reverse lexicographic order, and calls this
+	 * function.  Thus, an example of the paths that process_entries()
+	 * could operate on (along with the directories for those paths
+	 * being shown) is:
+	 *
+	 *     xtract.c             ""
+	 *     tokens.txt           ""
+	 *     src/moduleB/umm.c    src/moduleB
+	 *     src/moduleB/stuff.h  src/moduleB
+	 *     src/moduleB/baz.c    src/moduleB
+	 *     src/moduleB          src
+	 *     src/moduleA/foo.c    src/moduleA
+	 *     src/moduleA/bar.c    src/moduleA
+	 *     src/moduleA          src
+	 *     src                  ""
+	 *     Makefile             ""
+	 *
+	 * info->versions:
+	 *
+	 *     always contains the unprocessed entries and their
+	 *     version_info information.  For example, after the first five
+	 *     entries above, info->versions would be:
+	 *
+	 *     	   xtract.c     <xtract.c's version_info>
+	 *     	   tokens.txt   <tokens.txt's version_info>
+	 *     	   umm.c        <src/moduleB/umm.c's version_info>
+	 *     	   stuff.h      <src/moduleB/stuff.h's version_info>
+	 *     	   baz.c        <src/moduleB/baz.c's version_info>
+	 *
+	 *     Once a subdirectory is completed we remove the entries in
+	 *     that subdirectory from info->versions, writing it as a tree
+	 *     (write_tree()).  Thus, as soon as we get to src/moduleB,
+	 *     info->versions would be updated to
+	 *
+	 *     	   xtract.c     <xtract.c's version_info>
+	 *     	   tokens.txt   <tokens.txt's version_info>
+	 *     	   moduleB      <src/moduleB's version_info>
+	 *
+	 * info->offsets:
+	 *
+	 *     helps us track which entries in info->versions correspond to
+	 *     which directories.  When we are N directories deep (e.g. 4
+	 *     for src/modA/submod/subdir/), we have up to N+1 unprocessed
+	 *     directories (+1 because of toplevel dir).  Corresponding to
+	 *     the info->versions example above, after processing five entries
+	 *     info->offsets will be:
+	 *
+	 *     	   ""           0
+	 *     	   src/moduleB  2
+	 *
+	 *     which is used to know that xtract.c & tokens.txt are from the
+	 *     toplevel directory, while umm.c & stuff.h & baz.c are from the
+	 *     src/moduleB directory.  Again, following the example above,
+	 *     once we need to process src/moduleB, then info->offsets is
+	 *     updated to
+	 *
+	 *     	   ""           0
+	 *     	   src          2
+	 *
+	 *     which says that moduleB (and only moduleB so far) is in the
+	 *     src directory.
+	 *
+	 *     One unique thing to note about info->offsets here is that
+	 *     "src" was not added to info->offsets until there was a path
+	 *     (a file OR directory) immediately below src/ that got
+	 *     processed.
+	 *
+	 * Since process_entry() just appends new entries to info->versions,
+	 * write_completed_directory() only needs to do work if the next path
+	 * is in a directory that is different than the last directory found
+	 * in info->offsets.
+	 */
+
+	/*
+	 * If we are working with the same directory as the last entry, there
+	 * is no work to do.  (See comments above the directory_name member of
+	 * struct merged_info for why we can use pointer comparison instead of
+	 * strcmp here.)
+	 */
+	if (new_directory_name == info->last_directory)
+		return;
+
+	/*
+	 * If we are just starting (last_directory is NULL), or last_directory
+	 * is a prefix of the current directory, then we can just update
+	 * info->offsets to record the offset where we started this directory
+	 * and update last_directory to have quick access to it.
+	 */
+	if (info->last_directory == NULL ||
+	    !strncmp(new_directory_name, info->last_directory,
+		     info->last_directory_len)) {
+		uintptr_t offset = info->versions.nr;
+
+		info->last_directory = new_directory_name;
+		info->last_directory_len = strlen(info->last_directory);
+		/*
+		 * Record the offset into info->versions where we will
+		 * start recording basenames of paths found within
+		 * new_directory_name.
+		 */
+		string_list_append(&info->offsets,
+				   info->last_directory)->util = (void*)offset;
+		return;
+	}
+
+	/*
+	 * The next entry that will be processed will be within
+	 * new_directory_name.  Since at this point we know that
+	 * new_directory_name is within a different directory than
+	 * info->last_directory, we have all entries for info->last_directory
+	 * in info->versions and we need to create a tree object for them.
+	 */
+	dir_info = strmap_get(&opt->priv->paths, info->last_directory);
+	assert(dir_info);
+	offset = (uintptr_t)info->offsets.items[info->offsets.nr-1].util;
+	if (offset == info->versions.nr) {
+		/*
+		 * Actually, we don't need to create a tree object in this
+		 * case.  Whenever all files within a directory disappear
+		 * during the merge (e.g. unmodified on one side and
+		 * deleted on the other, or files were renamed elsewhere),
+		 * then we get here and the directory itself needs to be
+		 * omitted from its parent tree as well.
+		 */
+		dir_info->is_null = 1;
+	} else {
+		/*
+		 * Write out the tree to the git object directory, and also
+		 * record the mode and oid in dir_info->result.
+		 */
+		dir_info->is_null = 0;
+		dir_info->result.mode = S_IFDIR;
+		write_tree(&dir_info->result.oid, &info->versions, offset,
+			   opt->repo->hash_algo->rawsz);
+	}
+
+	/*
+	 * We've now used several entries from info->versions and one entry
+	 * from info->offsets, so we get rid of those values.
+	 */
+	info->offsets.nr--;
+	info->versions.nr = offset;
+
+	/*
+	 * Now we've taken care of the completed directory, but we need to
+	 * prepare things since future entries will be in
+	 * new_directory_name.  (In particular, process_entry() will be
+	 * appending new entries to info->versions.)  So, we need to make
+	 * sure new_directory_name is the last entry in info->offsets.
+	 */
+	prev_dir = info->offsets.nr == 0 ? NULL :
+		   info->offsets.items[info->offsets.nr-1].string;
+	if (new_directory_name != prev_dir) {
+		uintptr_t c = info->versions.nr;
+		string_list_append(&info->offsets,
+				   new_directory_name)->util = (void*)c;
+	}
+
+	/* And, of course, we need to update last_directory to match. */
+	info->last_directory = new_directory_name;
+	info->last_directory_len = strlen(info->last_directory);
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
@@ -656,7 +870,9 @@ static void process_entries(struct merge_options *opt,
 	struct strmap_entry *e;
 	struct string_list plist = STRING_LIST_INIT_NODUP;
 	struct string_list_item *entry;
-	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP };
+	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP,
+						   STRING_LIST_INIT_NODUP,
+						   NULL, 0 };
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
@@ -676,6 +892,11 @@ static void process_entries(struct merge_options *opt,
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
 	 * below a directory before needing to handle the directory itself.
+	 *
+	 * This allows us to write subtrees before we need to write trees,
+	 * and it also enables sane handling of directory/file conflicts
+	 * (because it allows us to know whether the directory is still in
+	 * the way when it is time to process the file at the same path).
 	 */
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
@@ -686,6 +907,8 @@ static void process_entries(struct merge_options *opt,
 		 */
 		struct merged_info *mi = entry->util;
 
+		write_completed_directory(opt, mi->directory_name,
+					  &dir_metadata);
 		if (mi->clean)
 			record_entry_for_tree(&dir_metadata, path, mi);
 		else {
@@ -694,17 +917,20 @@ static void process_entries(struct merge_options *opt,
 		}
 	}
 
-	/*
-	 * TODO: We can't actually write a tree yet, because dir_metadata just
-	 * contains all basenames of all files throughout the tree with their
-	 * mode and hash.  Not only is that a nonsensical tree, it will have
-	 * lots of duplicates for paths such as "Makefile" or ".gitignore".
-	 */
-	die("Not yet implemented; need to process subtrees separately");
+	if (dir_metadata.offsets.nr != 1 ||
+	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
+		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
+		       dir_metadata.offsets.nr);
+		printf("dir_metadata.offsets.items[0].util = %u (should be 0)\n",
+		       (unsigned)(uintptr_t)dir_metadata.offsets.items[0].util);
+		fflush(stdout);
+		BUG("dir_metadata accounting completely off; shouldn't happen");
+	}
 	write_tree(result_oid, &dir_metadata.versions, 0,
 		   opt->repo->hash_algo->rawsz);
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
+	string_list_clear(&dir_metadata.offsets, 0);
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH 16/20] merge-ort: basic outline for merge_switch_to_result()
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (14 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds a basic implementation for merge_switch_to_result(), though
just in terms of a few new stub functions that will be implemented in
subsequent commits.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 65dbdadc5e..1ef32a4053 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -933,13 +933,53 @@ static void process_entries(struct merge_options *opt,
 	string_list_clear(&dir_metadata.offsets, 0);
 }
 
+static int checkout(struct merge_options *opt,
+		    struct tree *prev,
+		    struct tree *next)
+{
+	die("Not yet implemented.");
+}
+
+static int record_conflicted_index_entries(struct merge_options *opt,
+					   struct index_state *index,
+					   struct strmap *paths,
+					   struct strmap *conflicted)
+{
+	if (strmap_empty(conflicted))
+		return 0;
+
+	die("Not yet implemented.");
+}
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
 			    int update_worktree_and_index,
 			    int display_update_msgs)
 {
-	die("Not yet implemented");
+	assert(opt->priv == NULL);
+	if (result->clean >= 0 && update_worktree_and_index) {
+		struct merge_options_internal *opti = result->priv;
+
+		if (checkout(opt, head, result->tree)) {
+			/* failure to function */
+			result->clean = -1;
+			return;
+		}
+
+		if (record_conflicted_index_entries(opt, opt->repo->index,
+						    &opti->paths,
+						    &opti->conflicted)) {
+			/* failure to function */
+			result->clean = -1;
+			return;
+		}
+	}
+
+	if (display_update_msgs) {
+		/* TODO: print out CONFLICT and other informational messages. */
+	}
+
 	merge_finalize(opt, result);
 }
 
-- 
gitgitgadget



* [PATCH 17/20] merge-ort: add implementation of checkout()
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (15 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Since merge-ort creates a tree for its output, when there are no
conflicts, updating the working tree and index is as simple as using the
unpack_trees() machinery with a twoway_merge (i.e. doing the equivalent
of a "checkout" operation).

If there were conflicts in the merge, then since the tree we created
included all the conflict markers, using the unpack_trees machinery
in this manner will still update the working tree correctly.  Further,
all index entries corresponding to cleanly merged files will also be
updated correctly by this procedure.  Index entries corresponding to
conflicted entries will appear as though the user had run "git add -u"
after the merge to accept all files as-is with conflict markers.

Thus, after running unpack_trees(), there needs to be a separate step
for updating the entries in the index corresponding to conflicted files.
This will be the job for the function record_conflicted_index_entries(),
which will be implemented in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 1ef32a4053..69b9fbe591 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -19,9 +19,11 @@
 
 #include "diff.h"
 #include "diffcore.h"
+#include "dir.h"
 #include "object-store.h"
 #include "strmap.h"
 #include "tree.h"
+#include "unpack-trees.h"
 #include "xdiff-interface.h"
 
 struct merge_options_internal {
@@ -937,7 +939,48 @@ static int checkout(struct merge_options *opt,
 		    struct tree *prev,
 		    struct tree *next)
 {
-	die("Not yet implemented.");
+	/* Switch the index/working copy from old to new */
+	int ret;
+	struct tree_desc trees[2];
+	struct unpack_trees_options unpack_opts;
+
+	memset(&unpack_opts, 0, sizeof(unpack_opts));
+	unpack_opts.head_idx = -1;
+	unpack_opts.src_index = opt->repo->index;
+	unpack_opts.dst_index = opt->repo->index;
+
+	setup_unpack_trees_porcelain(&unpack_opts, "merge");
+
+	/*
+	 * NOTE: if this were just "git checkout" code, we would probably
+	 * read or refresh the cache and check for a conflicted index, but
+	 * builtin/merge.c or sequencer.c really needs to read the index
+	 * and check for conflicted entries before starting merging for a
+	 * good user experience (no sense waiting for merges/rebases before
+	 * erroring out), so there's no reason to duplicate that work here.
+	 */
+
+	/* 2-way merge to the new branch */
+	unpack_opts.update = 1;
+	unpack_opts.merge = 1;
+	unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
+	unpack_opts.verbose_update = (opt->verbosity > 2);
+	unpack_opts.fn = twoway_merge;
+	if (1/* FIXME: opts->overwrite_ignore*/) {
+		unpack_opts.dir = xcalloc(1, sizeof(*unpack_opts.dir));
+		unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(unpack_opts.dir);
+	}
+	parse_tree(prev);
+	init_tree_desc(&trees[0], prev->buffer, prev->size);
+	parse_tree(next);
+	init_tree_desc(&trees[1], next->buffer, next->size);
+
+	ret = unpack_trees(2, trees, &unpack_opts);
+	clear_unpack_trees_porcelain(&unpack_opts);
+	dir_clear(unpack_opts.dir);
+	FREE_AND_NULL(unpack_opts.dir);
+	return ret;
 }
 
 static int record_conflicted_index_entries(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (16 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:43 ` [PATCH 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 tree.c | 2 +-
 tree.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tree.c b/tree.c
index e76517f6b1..a52479812c 100644
--- a/tree.c
+++ b/tree.c
@@ -144,7 +144,7 @@ int read_tree_recursive(struct repository *r,
 	return ret;
 }
 
-static int cmp_cache_name_compare(const void *a_, const void *b_)
+int cmp_cache_name_compare(const void *a_, const void *b_)
 {
 	const struct cache_entry *ce1, *ce2;
 
diff --git a/tree.h b/tree.h
index 9383745073..3eb0484cbf 100644
--- a/tree.h
+++ b/tree.h
@@ -28,6 +28,8 @@ void free_tree_buffer(struct tree *tree);
 /* Parses and returns the tree in the given ent, chasing tags and commits. */
 struct tree *parse_tree_indirect(const struct object_id *oid);
 
+int cmp_cache_name_compare(const void *a_, const void *b_);
+
 #define READ_TREE_RECURSIVE 1
 typedef int (*read_tree_fn_t)(const struct object_id *, struct strbuf *, const char *, unsigned int, int, void *);
 
-- 
gitgitgadget



* [PATCH 19/20] merge-ort: add implementation of record_conflicted_index_entries()
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (17 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29 10:20   ` Ævar Arnfjörð Bjarmason
  2020-11-29  7:43 ` [PATCH 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

After checkout(), the working tree has the appropriate contents, and the
index matches the working copy.  That means that all unmodified and
cleanly merged files have correct index entries, but conflicted entries
need to be updated.

We do this by looping over the conflicted entries, marking the existing
index entry for the path with CE_REMOVE, adding new higher order stages
for the path at the end of the index (ignoring normal index sort order),
and then at the end of the loop removing the CE_REMOVE-marked cache
entries and sorting the index.
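
The mark, append, then compact-and-sort-once pattern can be sketched on a
toy array.  Everything here is hypothetical (struct toy_ce, toy_lookup(),
toy_record_conflicts()); unlike the real code, which consults filemask,
the sketch always adds all three stages for a conflicted path.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cache entry: a path, a stage, a flag. */
struct toy_ce {
	const char *path;
	int stage;
	int remove; /* plays the role of CE_REMOVE */
};

static int toy_ce_cmp(const void *a, const void *b)
{
	const struct toy_ce *x = a, *y = b;
	int c = strcmp(x->path, y->path);
	return c ? c : x->stage - y->stage;
}

/* Find the stage-0 entry for path within the first nr sorted entries. */
static struct toy_ce *toy_lookup(struct toy_ce *cache, size_t nr,
				 const char *path)
{
	struct toy_ce key = { path, 0, 0 };
	return bsearch(&key, cache, nr, sizeof(*cache), toy_ce_cmp);
}

/*
 * Replace the stage-0 entry of each conflicted path with stages 1-3.
 * cache must have room for 3 extra entries per conflicted path.
 * Returns the new number of entries.
 */
static size_t toy_record_conflicts(struct toy_ce *cache, size_t nr,
				   const char **conflicted,
				   size_t nr_conflicted)
{
	size_t sorted_nr = nr, i, j;
	int stage;

	for (i = 0; i < nr_conflicted; i++) {
		/* Search only the still-sorted prefix, like the cache_nr swap. */
		struct toy_ce *ce = toy_lookup(cache, sorted_nr, conflicted[i]);
		if (ce)
			ce->remove = 1;
		for (stage = 1; stage <= 3; stage++) {
			cache[nr].path = conflicted[i];
			cache[nr].stage = stage;
			cache[nr++].remove = 0;
		}
	}

	/* One compaction pass and one sort, instead of per-entry memmove. */
	for (i = j = 0; i < nr; i++)
		if (!cache[i].remove)
			cache[j++] = cache[i];
	qsort(cache, j, sizeof(*cache), toy_ce_cmp);
	return j;
}
```

Limiting the lookup to the original, still-sorted prefix is what makes the
binary search safe while unsorted entries accumulate at the end.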

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 88 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 69b9fbe591..d1b98e2fca 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,7 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "cache-tree.h"
 #include "diff.h"
 #include "diffcore.h"
 #include "dir.h"
@@ -988,10 +989,96 @@ static int record_conflicted_index_entries(struct merge_options *opt,
 					   struct strmap *paths,
 					   struct strmap *conflicted)
 {
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+	int errs = 0;
+	int original_cache_nr;
+
 	if (strmap_empty(conflicted))
 		return 0;
 
-	die("Not yet implemented.");
+	original_cache_nr = index->cache_nr;
+
+	/* Replace the stage-0 index entry of each conflicted path */
+	strmap_for_each_entry(conflicted, &iter, e) {
+		const char *path = e->key;
+		struct conflict_info *ci = e->value;
+		int pos;
+		struct cache_entry *ce;
+		int i;
+
+		VERIFY_CI(ci);
+
+		/*
+		 * The index will already have a stage=0 entry for this path,
+		 * because we created an as-merged-as-possible version of the
+		 * file and checkout() moved the working copy and index over
+		 * to that version.
+		 *
+		 * However, previous iterations through this loop will have
+		 * added unstaged entries to the end of the cache which
+		 * ignore the standard alphabetical ordering of cache
+		 * entries and break invariants needed for index_name_pos()
+		 * to work.  However, we know the entry we want is before
+		 * those appended cache entries, so do a temporary swap on
+		 * cache_nr to only look through entries of interest.
+		 */
+		SWAP(index->cache_nr, original_cache_nr);
+		pos = index_name_pos(index, path, strlen(path));
+		SWAP(index->cache_nr, original_cache_nr);
+		if (pos < 0) {
+			if (ci->filemask == 1)
+				cache_tree_invalidate_path(index, path);
+			else
+				BUG("Conflicted %s but nothing in basic working tree or index; this shouldn't happen", path);
+		} else {
+			ce = index->cache[pos];
+
+			/*
+			 * Clean paths with CE_SKIP_WORKTREE set will not be
+			 * written to the working tree by the unpack_trees()
+			 * call in checkout().  Our conflicted entries would
+			 * have appeared clean to that code since we ignored
+			 * the higher order stages.  Thus, we need to override
+			 * the CE_SKIP_WORKTREE bit and manually write those
+			 * files to the working disk here.
+			 *
+			 * TODO: Implement this CE_SKIP_WORKTREE fixup.
+			 */
+
+			/*
+			 * Mark this cache entry for removal and instead add
+			 * new stage>0 entries corresponding to the
+			 * conflicts.  If there are many conflicted entries, we
+			 * want to avoid memmove'ing O(NM) entries by
+			 * inserting the new entries one at a time.  So,
+			 * instead, we just add the new cache entries to the
+			 * end (ignoring normal index requirements on sort
+			 * order) and sort the index once we're all done.
+			 */
+			ce->ce_flags |= CE_REMOVE;
+		}
+
+		for (i = 0; i < 3; i++) {
+			struct version_info *vi;
+			if (!(ci->filemask & (1ul << i)))
+				continue;
+			vi = &ci->stages[i];
+			ce = make_cache_entry(index, vi->mode, &vi->oid,
+					      path, i+1, 0);
+			add_index_entry(index, ce, ADD_CACHE_JUST_APPEND);
+		}
+	}
+
+	/*
+	 * Remove the unused cache entries (and invalidate the relevant
+	 * cache-trees), then sort the index entries to get the conflicted
+	 * entries we added to the end into their right locations.
+	 */
+	remove_marked_cache_entries(index, 1);
+	QSORT(index->cache, index->cache_nr, cmp_cache_name_compare);
+
+	return errs;
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget
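
The comment above about avoiding O(NM) memmove work can be illustrated
outside of git.  What follows is only a minimal sketch under stated
assumptions: struct entry, append_stages() and compact_and_sort() are
hypothetical names, a plain int stands in for the CE_REMOVE flag, and
strcmp() stands in for git's real index ordering.  It marks the stage-0
entry, appends one entry per bit set in filemask (bit i becomes stage
i+1), and only afterwards compacts and sorts once.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cache entry: a path plus a stage number. */
struct entry {
	const char *path;
	int stage;   /* 0 = merged, 1..3 = conflict stages */
	int remove;  /* stand-in for the CE_REMOVE flag */
};

/* Order by path, then by stage, roughly like a sorted index. */
static int entry_cmp(const void *a_, const void *b_)
{
	const struct entry *a = a_, *b = b_;
	int cmp = strcmp(a->path, b->path);
	return cmp ? cmp : a->stage - b->stage;
}

/*
 * Mark the stage-0 entry at pos for removal and append one entry per bit
 * set in filemask (bit i -> stage i+1).  The caller must have room for up
 * to three more entries.  Returns the new entry count.
 */
static size_t append_stages(struct entry *arr, size_t nr, size_t pos,
			    unsigned filemask)
{
	int i;

	arr[pos].remove = 1;
	for (i = 0; i < 3; i++) {
		if (!(filemask & (1u << i)))
			continue;
		arr[nr].path = arr[pos].path;
		arr[nr].stage = i + 1;
		arr[nr].remove = 0;
		nr++;
	}
	return nr;
}

/* Drop marked entries in one pass, then sort once. */
static size_t compact_and_sort(struct entry *arr, size_t nr)
{
	size_t src, dst = 0;

	for (src = 0; src < nr; src++)
		if (!arr[src].remove)
			arr[dst++] = arr[src];
	qsort(arr, dst, sizeof(*arr), entry_cmp);
	return dst;
}
```

While entries are being appended the array is not sorted, which is why
the patch limits index_name_pos() to the original cache_nr during the
loop.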



* [PATCH 20/20] merge-ort: free data structures in merge_finalize()
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (18 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
@ 2020-11-29  7:43 ` Elijah Newren via GitGitGadget
  2020-11-29  7:47 ` [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-11-29  7:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index d1b98e2fca..ea6a9d7348 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -182,6 +182,16 @@ struct conflict_info {
 	assert((ci) && !(mi)->clean);        \
 } while (0)
 
+static void free_strmap_strings(struct strmap *map)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strmap_for_each_entry(map, &iter, entry) {
+		free((char*)entry->key);
+	}
+}
+
 static int err(struct merge_options *opt, const char *err, ...)
 {
 	va_list params;
@@ -1116,7 +1126,27 @@ void merge_switch_to_result(struct merge_options *opt,
 void merge_finalize(struct merge_options *opt,
 		    struct merge_result *result)
 {
-	die("Not yet implemented");
+	struct merge_options_internal *opti = result->priv;
+
+	assert(opt->priv == NULL);
+
+	/*
+	 * We marked opti->paths with strdup_strings = 0, so that we
+	 * wouldn't have to make another copy of the fullpath created by
+	 * make_traverse_path from setup_path_info().  But, now that we've
+	 * used it and have no other references to these strings, it is time
+	 * to deallocate them.
+	 */
+	free_strmap_strings(&opti->paths);
+	strmap_clear(&opti->paths, 1);
+
+	/*
+	 * All keys and values in opti->conflicted are a subset of those in
+	 * opti->paths.  We don't want to deallocate anything twice, so we
+	 * don't free the keys and we pass 0 for free_values.
+	 */
+	strmap_clear(&opti->conflicted, 0);
+	FREE_AND_NULL(opti);
 }
 
 static void merge_start(struct merge_options *opt, struct merge_result *result)
-- 
gitgitgadget
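
The ownership rule in merge_finalize() (free everything through "paths",
free nothing through "conflicted") can be sketched with a toy container.
This is not git's strmap API: struct toy_map, toy_map_put(),
toy_map_clear() and the frees counter are hypothetical and exist only to
make the single-deallocation property visible.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy map: parallel arrays instead of a real hash table. */
struct toy_map {
	char **keys;
	void **values;
	size_t nr;
};

static int frees; /* counts key/value deallocations, for illustration only */

static void counted_free(void *p)
{
	if (p)
		frees++;
	free(p);
}

/* Add an entry; the map stores the pointers as given, without copying. */
static void toy_map_put(struct toy_map *map, char *key, void *value)
{
	map->keys = realloc(map->keys, (map->nr + 1) * sizeof(*map->keys));
	map->values = realloc(map->values, (map->nr + 1) * sizeof(*map->values));
	if (!map->keys || !map->values)
		abort();
	map->keys[map->nr] = key;
	map->values[map->nr] = value;
	map->nr++;
}

/* Release the map; free keys and/or values only if this map owns them. */
static void toy_map_clear(struct toy_map *map, int free_keys, int free_values)
{
	size_t i;

	for (i = 0; i < map->nr; i++) {
		if (free_keys)
			counted_free(map->keys[i]);
		if (free_values)
			counted_free(map->values[i]);
	}
	free(map->keys);
	free(map->values);
	memset(map, 0, sizeof(*map));
}
```

With two entries in an owning map and one of them also stored in a
non-owning subset map, clearing the first with (1, 1) and the second
with (0, 0) releases each key and value exactly once.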


* Re: [PATCH 00/20] fundamentals of merge-ort implementation
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (19 preceding siblings ...)
  2020-11-29  7:43 ` [PATCH 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
@ 2020-11-29  7:47 ` Elijah Newren
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
  21 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren @ 2020-11-29  7:47 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: Git Mailing List, Derrick Stolee, Jonathan Tan

On Sat, Nov 28, 2020 at 11:43 PM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is actually v3 of this series; but v2 depended on two topics that
> hadn't graduated yet so I couldn't easily use gitgitgadget and get its
> testing. Now that the topics have graduated, I have rebased on master. You
> can see v2 and comments on it over here:
> https://lore.kernel.org/git/20201102204344.342633-1-newren@gmail.com/

Oops, I forgot to CC my two great reviewers of previous rounds.
Fixing that now...


* Re: [PATCH 19/20] merge-ort: add implementation of record_conflicted_index_entries()
  2020-11-29  7:43 ` [PATCH 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
@ 2020-11-29 10:20   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 84+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-11-29 10:20 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren


On Sun, Nov 29 2020, Elijah Newren via GitGitGadget wrote:

> +		if (pos < 0) {
> +			if (ci->filemask == 1)
> +				cache_tree_invalidate_path(index, path);
> +			else
> +				BUG("Conflicted %s but nothing in basic working tree or index; this shouldn't happen", path);

Trivial style comment: elsewhere in the series you avoid indentation
with BUG(...). Would be better here as:

    if (x != 1)
        BUG(...)
    rest_of_code();


* Re: [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive
  2020-11-29  7:43 ` [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
@ 2020-11-29 10:23   ` Ævar Arnfjörð Bjarmason
  2020-11-30 16:56     ` Elijah Newren
  2020-11-29 10:26   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 84+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-11-29 10:23 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren


On Sun, Nov 29 2020, Elijah Newren via GitGitGadget wrote:

>  static int collect_merge_info(struct merge_options *opt,
>  			      struct tree *merge_base,
>  			      struct tree *side1,
>  			      struct tree *side2)
>  {
> +	/* TODO: Implement this using traverse_trees() */
>  	die("Not yet implemented.");
>  }
>  

Looks like this doesn't belong in this patch & should instead be
squashed into "[PATCH 02/20] merge-ort: add some high-level algorithm
structure".


* Re: [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive
  2020-11-29  7:43 ` [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
  2020-11-29 10:23   ` Ævar Arnfjörð Bjarmason
@ 2020-11-29 10:26   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 84+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2020-11-29 10:26 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren


On Sun, Nov 29 2020, Elijah Newren via GitGitGadget wrote:

> +		err(opt, _("collecting merge info failed for trees %s, %s, %s"),
> +		    oid_to_hex(&merge_base->object.oid),
> +		    oid_to_hex(&side1->object.oid),
> +		    oid_to_hex(&side2->object.oid));

(Sorry about the two E-Mails, didn't spot this at first). This error
message without context for translators is going to give them no idea
what %s/%s/%s are. Maybe this before it:

 /*
  * TRANSLATORS: The %s arguments are: 1) tree SHA-1 of a merge base 2-3) the
  * trees for the two trees we're merging.
  */


* Re: [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive
  2020-11-29 10:23   ` Ævar Arnfjörð Bjarmason
@ 2020-11-30 16:56     ` Elijah Newren
  0 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren @ 2020-11-30 16:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Sun, Nov 29, 2020 at 2:23 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Sun, Nov 29 2020, Elijah Newren via GitGitGadget wrote:
>
> >  static int collect_merge_info(struct merge_options *opt,
> >                             struct tree *merge_base,
> >                             struct tree *side1,
> >                             struct tree *side2)
> >  {
> > +     /* TODO: Implement this using traverse_trees() */
> >       die("Not yet implemented.");
> >  }
> >
>
> Looks like this doesn't belong in this patch & should instead be
> squashed into "[PATCH 02/20] merge-ort: add some high-level algorithm
> structure".

Indeed, and Derrick pointed out the same thing but when I went back
through all the emails to try to make sure I covered everything, I
somehow missed that particular piece of his comments.  Anyway, I've
fixed it up locally along with your two other suggestions.  I'll wait
a bit more for other feedback before sending the next re-roll.

Thanks for taking a look!


* [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                   ` (20 preceding siblings ...)
  2020-11-29  7:47 ` [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren
@ 2020-12-04 20:47 ` Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
                     ` (20 more replies)
  21 siblings, 21 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren

This is actually v4 of this series (the first two rounds depended on topics
that hadn't graduated yet, so I hadn't yet used gitgitgadget for submitting
it). As a reminder, if you need to see the first two rounds before I started
submitting this series with gitgitgadget, you can see them over here: 
https://lore.kernel.org/git/20201102204344.342633-1-newren@gmail.com/

Changes since v3:

 * Made the small tweaks suggested by Ævar
 * Fixed an embarrassing tree ordering bug in commit 13; base_name_compare()
   != strcmp() is important.

(Tree ordering bug found due to the fact that merge-ort, including many
patches not yet submitted to this list, is in live use at $DAYJOB.)
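
For readers wondering why base_name_compare() != strcmp() matters: tree
entries are ordered as if directory names ended in '/'.  The following
is a simplified sketch of that rule, not a copy of git's function;
tree_name_compare() is a hypothetical name and the is_dir flags stand in
for checking S_ISDIR() on the entry modes.

```c
#include <string.h>

/*
 * Sketch of tree-entry ordering: compare names bytewise, but treat a
 * directory name as if it ended in '/'.  is_dir1/is_dir2 are simplified
 * stand-ins for checking S_ISDIR() on the entry modes.
 */
static int tree_name_compare(const char *name1, int is_dir1,
			     const char *name2, int is_dir2)
{
	size_t len1 = strlen(name1), len2 = strlen(name2);
	size_t len = len1 < len2 ? len1 : len2;
	unsigned char c1, c2;
	int cmp = memcmp(name1, name2, len);

	if (cmp)
		return cmp;
	/* One name is a prefix of the other; look at the next byte. */
	c1 = name1[len];
	c2 = name2[len];
	if (!c1 && is_dir1)
		c1 = '/';
	if (!c2 && is_dir2)
		c2 = '/';
	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
}
```

Under this rule a directory "foo" sorts after a file "foo.c" (since '.'
is less than '/'), whereas strcmp("foo", "foo.c") puts "foo" first.  A
plain string sort therefore produces a misordered tree whenever such
names sit side by side.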

Elijah Newren (20):
  merge-ort: setup basic internal data structures
  merge-ort: add some high-level algorithm structure
  merge-ort: port merge_start() from merge-recursive
  merge-ort: use histogram diff
  merge-ort: add an err() function similar to one from merge-recursive
  merge-ort: implement a very basic collect_merge_info()
  merge-ort: avoid repeating fill_tree_descriptor() on the same tree
  merge-ort: compute a few more useful fields for collect_merge_info
  merge-ort: record stage and auxiliary info for every path
  merge-ort: avoid recursing into identical trees
  merge-ort: add a preliminary simple process_entries() implementation
  merge-ort: have process_entries operate in a defined order
  merge-ort: step 1 of tree writing -- record basenames, modes, and oids
  merge-ort: step 2 of tree writing -- function to create tree object
  merge-ort: step 3 of tree writing -- handling subdirectories as we go
  merge-ort: basic outline for merge_switch_to_result()
  merge-ort: add implementation of checkout()
  tree: enable cmp_cache_name_compare() to be used elsewhere
  merge-ort: add implementation of record_conflicted_index_entries()
  merge-ort: free data structures in merge_finalize()

 merge-ort.c | 1221 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 tree.c      |    2 +-
 tree.h      |    2 +
 3 files changed, 1221 insertions(+), 4 deletions(-)


base-commit: e67fbf927dfdf13d0b21dc6ea15dc3c7ef448ea0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-923%2Fnewren%2Fort-basics-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-923/newren/ort-basics-v2
Pull-Request: https://github.com/git/git/pull/923

Range-diff vs v1:

  1:  2568ec92c6 =  1:  2568ec92c6 merge-ort: setup basic internal data structures
  2:  3a063865c3 !  2:  b658536f59 merge-ort: add some high-level algorithm structure
     @@ merge-ort.c: struct conflict_info {
      +			      struct tree *side1,
      +			      struct tree *side2)
      +{
     ++	/* TODO: Implement this using traverse_trees() */
      +	die("Not yet implemented.");
      +}
      +
  3:  5615f0eecb =  3:  acb40f5c16 merge-ort: port merge_start() from merge-recursive
  4:  564b072ac1 =  4:  22fecf6ccd merge-ort: use histogram diff
  5:  91516799e4 !  5:  6c4c0c15b3 merge-ort: add an err() function similar to one from merge-recursive
     @@ merge-ort.c: struct conflict_info {
       			      struct tree *side1,
       			      struct tree *side2)
       {
     -+	/* TODO: Implement this using traverse_trees() */
     +-	/* TODO: Implement this using traverse_trees() */
       	die("Not yet implemented.");
       }
       
     @@ merge-ort.c: static void merge_ort_nonrecursive_internal(struct merge_options *o
       
      -	collect_merge_info(opt, merge_base, side1, side2);
      +	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
     ++		/*
     ++		 * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
     ++		 * base, and 2-3) the trees for the two trees we're merging.
     ++		 */
      +		err(opt, _("collecting merge info failed for trees %s, %s, %s"),
      +		    oid_to_hex(&merge_base->object.oid),
      +		    oid_to_hex(&side1->object.oid),
  6:  ab743967aa !  6:  27268ef8a3 merge-ort: implement a very basic collect_merge_info()
     @@ merge-ort.c: static int err(struct merge_options *opt, const char *err, ...)
       			      struct tree *side1,
       			      struct tree *side2)
       {
     --	/* TODO: Implement this using traverse_trees() */
      -	die("Not yet implemented.");
      +	int ret;
      +	struct tree_desc t[3];
  7:  bff758c5dd =  7:  c6e5621c21 merge-ort: avoid repeating fill_tree_descriptor() on the same tree
  8:  61b3d66fdc =  8:  93fd69fa3c merge-ort: compute a few more useful fields for collect_merge_info
  9:  4e4298fa70 =  9:  decff4b375 merge-ort: record stage and auxiliary info for every path
 10:  3ec087eb68 = 10:  86c661fe1e merge-ort: avoid recursing into identical trees
 11:  0c89cee34e = 11:  aa3b13ffd8 merge-ort: add a preliminary simple process_entries() implementation
 12:  605cbc19d2 = 12:  b54306fd0e merge-ort: have process_entries operate in a defined order
 13:  242c3cab13 = 13:  8ee8561d7a merge-ort: step 1 of tree writing -- record basenames, modes, and oids
 14:  33a5d23c85 ! 14:  6ff56824c3 merge-ort: step 2 of tree writing -- function to create tree object
     @@ merge-ort.c: struct directory_versions {
       	struct string_list versions;
       };
       
     ++static int tree_entry_order(const void *a_, const void *b_)
     ++{
     ++	const struct string_list_item *a = a_;
     ++	const struct string_list_item *b = b_;
     ++
     ++	const struct merged_info *ami = a->util;
     ++	const struct merged_info *bmi = b->util;
     ++	return base_name_compare(a->string, strlen(a->string), ami->result.mode,
     ++				 b->string, strlen(b->string), bmi->result.mode);
     ++}
     ++
      +static void write_tree(struct object_id *result_oid,
      +		       struct string_list *versions,
      +		       unsigned int offset,
     @@ merge-ort.c: struct directory_versions {
      +	 */
      +	relevant_entries.items = versions->items + offset;
      +	relevant_entries.nr = versions->nr - offset;
     -+	string_list_sort(&relevant_entries);
     ++	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
      +
      +	/* Pre-allocate some space in buf */
      +	extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */
 15:  29615c366f ! 15:  da4fe90049 merge-ort: step 3 of tree writing -- handling subdirectories as we go
     @@ merge-ort.c: static int string_list_df_name_compare(const char *one, const char
      +	unsigned last_directory_len;
       };
       
     - static void write_tree(struct object_id *result_oid,
     + static int tree_entry_order(const void *a_, const void *b_)
      @@ merge-ort.c: static void record_entry_for_tree(struct directory_versions *dir_metadata,
       			   basename)->util = &mi->result;
       }
 16:  da54fa454a = 16:  8e90d211c5 merge-ort: basic outline for merge_switch_to_result()
 17:  68307f1b67 = 17:  61fada146c merge-ort: add implementation of checkout()
 18:  a3cd563621 = 18:  f5a13a0b08 tree: enable cmp_cache_name_compare() to be used elsewhere
 19:  56b162c609 ! 19:  4efac38116 merge-ort: add implementation of record_conflicted_index_entries()
     @@ merge-ort.c: static int record_conflicted_index_entries(struct merge_options *op
      +		pos = index_name_pos(index, path, strlen(path));
      +		SWAP(index->cache_nr, original_cache_nr);
      +		if (pos < 0) {
     -+			if (ci->filemask == 1)
     -+				cache_tree_invalidate_path(index, path);
     -+			else
     ++			if (ci->filemask != 1)
      +				BUG("Conflicted %s but nothing in basic working tree or index; this shouldn't happen", path);
     ++			cache_tree_invalidate_path(index, path);
      +		} else {
      +			ce = index->cache[pos];
      +
 20:  a4f722a46e = 20:  fbeb527d67 merge-ort: free data structures in merge_finalize()

-- 
gitgitgadget


* [PATCH v2 01/20] merge-ort: setup basic internal data structures
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
@ 2020-12-04 20:47   ` Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Set up some basic internal data structures.  The only carry-over from
merge-recursive.c is call_depth, though needed_rename_limit will be
added later.

The central piece of data will definitely be the strmap "paths", which
will map every relevant pathname under consideration to either a
merged_info or a conflict_info.  ("conflicted" is a strmap that is a
subset of "paths".)

merged_info contains all relevant information for a non-conflicted
entry.  conflict_info contains a merged_info, plus any additional
information about a conflict such as the higher order stages involved
and the names of the paths those came from (handy once renames get
involved).  If an entry remains conflicted, the merged_info portion of a
conflict_info will later be filled with whatever version of the file
should be placed in the working directory (e.g. an as-merged-as-possible
variation that contains conflict markers).
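
A minimal sketch of how the embedding described above can be consumed,
assuming trimmed-down stand-ins for the real structs (the _sketch names,
the int placeholders and count_sides() are hypothetical).  Because the
merged_info is the first member of conflict_info, one pointer can refer
to either kind of value, and the clean bit decides whether the
conflict-only fields may be read.

```c
/* Simplified stand-ins for the structs in the patch. */
struct merged_info_sketch {
	int result;          /* placeholder for the oid/mode pair */
	unsigned clean:1;
};

struct conflict_info_sketch {
	struct merged_info_sketch merged; /* must be the first member */
	int stages[3];                    /* placeholder for per-side versions */
	unsigned filemask:3;
};

/*
 * Count the sides present for a path, or return -1 if the entry is clean.
 * A value may point at either struct; since merged is the first member, a
 * pointer to it is valid for both.  Only when clean == 0 may the
 * conflict-only fields be read.
 */
static int count_sides(void *value)
{
	struct merged_info_sketch *mi = value;
	struct conflict_info_sketch *ci;
	int i, n = 0;

	if (mi->clean)
		return -1;
	ci = value;
	for (i = 0; i < 3; i++)
		if (ci->filemask & (1u << i))
			n++;
	return n;
}
```

Flipping merged.clean to 1 on a conflict_info makes the same function
ignore everything outside the embedded merged_info, which mirrors the
"marked clean later" case.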

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index b487901d3e..bb37fdf838 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,143 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "strmap.h"
+
+struct merge_options_internal {
+	/*
+	 * paths: primary data structure in all of merge ort.
+	 *
+	 * The keys of paths:
+	 *   * are full relative paths from the toplevel of the repository
+	 *     (e.g. "drivers/firmware/raspberrypi.c").
+	 *   * store all relevant paths in the repo, both directories and
+	 *     files (e.g. drivers, drivers/firmware would also be included)
+	 *   * these keys serve to intern all the path strings, which allows
+	 *     us to do pointer comparison on directory names instead of
+	 *     strcmp; we just have to be careful to use the interned strings.
+	 *
+	 * The values of paths:
+	 *   * either a pointer to a merged_info, or a conflict_info struct
+	 *   * merged_info contains all relevant information for a
+	 *     non-conflicted entry.
+	 *   * conflict_info contains a merged_info, plus any additional
+	 *     information about a conflict such as the higher order stages
+	 *     involved and the names of the paths those came from (handy
+	 *     once renames get involved).
+	 *   * a path may start "conflicted" (i.e. point to a conflict_info)
+	 *     and then a later step (e.g. three-way content merge) determines
+	 *     it can be cleanly merged, at which point it'll be marked clean
+	 *     and the algorithm will ignore any data outside the contained
+	 *     merged_info for that entry
+	 *   * If an entry remains conflicted, the merged_info portion of a
+	 *     conflict_info will later be filled with whatever version of
+	 *     the file should be placed in the working directory (e.g. an
+	 *     as-merged-as-possible variation that contains conflict markers).
+	 */
+	struct strmap paths;
+
+	/*
+	 * conflicted: a subset of keys->values from "paths"
+	 *
+	 * conflicted is basically an optimization between process_entries()
+	 * and record_conflicted_index_entries(); the latter could loop over
+	 * ALL the entries in paths AGAIN and look for the ones that are
+	 * still conflicted, but since process_entries() has to loop over
+	 * all of them, it saves the ones it couldn't resolve in this strmap
+	 * so that record_conflicted_index_entries() can iterate just the
+	 * relevant entries.
+	 */
+	struct strmap conflicted;
+
+	/*
+	 * current_dir_name: temporary var used in collect_merge_info_callback()
+	 *
+	 * Used to set merged_info.directory_name; see documentation for that
+	 * variable and the requirements placed on that field.
+	 */
+	const char *current_dir_name;
+
+	/* call_depth: recursion level counter for merging merge bases */
+	int call_depth;
+};
+
+struct version_info {
+	struct object_id oid;
+	unsigned short mode;
+};
+
+struct merged_info {
+	/* if is_null, ignore result.  otherwise result has oid & mode */
+	struct version_info result;
+	unsigned is_null:1;
+
+	/*
+	 * clean: whether the path in question is cleanly merged.
+	 *
+	 * see conflict_info.merged for more details.
+	 */
+	unsigned clean:1;
+
+	/*
+	 * basename_offset: offset of basename of path.
+	 *
+	 * perf optimization to avoid recomputing offset of final '/'
+	 * character in pathname (0 if no '/' in pathname).
+	 */
+	size_t basename_offset;
+
+	 /*
+	  * directory_name: containing directory name.
+	  *
+	  * Note that we assume directory_name is constructed such that
+	  *    strcmp(dir1_name, dir2_name) == 0 iff dir1_name == dir2_name,
+	  * i.e. string equality is equivalent to pointer equality.  For this
+	  * to hold, we have to be careful setting directory_name.
+	  */
+	const char *directory_name;
+};
+
+struct conflict_info {
+	/*
+	 * merged: the version of the path that will be written to working tree
+	 *
+	 * WARNING: It is critical to check merged.clean and ensure it is 0
+	 * before reading any conflict_info fields outside of merged.
+	 * Allocated merged_info structs will always have clean set to 1.
+	 * Allocated conflict_info structs will have merged.clean set to 0
+	 * initially.  The merged.clean field is how we know if it is safe
+	 * to access other parts of conflict_info besides merged; if a
+	 * conflict_info's merged.clean is changed to 1, the rest of the
+	 * algorithm is not allowed to look at anything outside of the
+	 * merged member anymore.
+	 */
+	struct merged_info merged;
+
+	/* oids & modes from each of the three trees for this path */
+	struct version_info stages[3];
+
+	/* pathnames for each stage; may differ due to rename detection */
+	const char *pathnames[3];
+
+	/* Whether this path is/was involved in a directory/file conflict */
+	unsigned df_conflict:1;
+
+	/*
+	 * For filemask and dirmask, see tree-walk.h's struct traverse_info,
+	 * particularly the documentation above the "fn" member.  Note that
+	 * filemask = mask & ~dirmask from that documentation.
+	 */
+	unsigned filemask:3;
+	unsigned dirmask:3;
+
+	/*
+	 * Optimization to track which stages match, to avoid the need to
+	 * recompute it in multiple steps. Either 0 or at least 2 bits are
+	 * set; if at least 2 bits are set, their corresponding stages match.
+	 */
+	unsigned match_mask:3;
+};
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
-- 
gitgitgadget
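
The comments on the "paths" keys and on directory_name rely on interned
strings, so that string equality reduces to pointer equality.  Here is a
rough standalone sketch of the idea.  In the patch the strmap keys
themselves play the role of the intern table; the struct intern_table,
intern() and same_directory() below are hypothetical, and a linear scan
stands in for a hash lookup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy intern table; a linear scan stands in for a hash lookup. */
struct intern_table {
	char **strings;
	size_t nr;
};

/* Return the canonical copy of s, adding it on first use. */
static const char *intern(struct intern_table *t, const char *s)
{
	size_t i;
	char *copy;

	for (i = 0; i < t->nr; i++)
		if (!strcmp(t->strings[i], s))
			return t->strings[i];
	copy = malloc(strlen(s) + 1);
	t->strings = realloc(t->strings, (t->nr + 1) * sizeof(*t->strings));
	if (!copy || !t->strings)
		abort();
	strcpy(copy, s);
	t->strings[t->nr++] = copy;
	return copy;
}

/* With interned names, "same directory?" is a pointer comparison. */
static int same_directory(const char *dir1, const char *dir2)
{
	return dir1 == dir2;
}
```

The caveat from the comment applies here too: the comparison is only
meaningful if both pointers came out of intern().  A freshly built
buffer with the same contents compares unequal.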



* [PATCH v2 02/20] merge-ort: add some high-level algorithm structure
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
@ 2020-12-04 20:47   ` Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge_ort_nonrecursive_internal() will be used by both
merge_incore_nonrecursive() and merge_incore_recursive(); let's
focus on it for now.  It involves some setup -- merge_start() --
followed by the following chain of functions:

  collect_merge_info()
    This function will populate merge_options_internal's paths field,
    via a call to traverse_trees() and a new callback that will be added
    later.

  detect_and_process_renames()
    This function will detect renames, and then adjust entries in paths
    to move conflict stages from old pathnames into those for new
    pathnames, so that the next step doesn't have to think about renames
    and can just do three-way content merging and such.

  process_entries()
    This function determines how to take the various stages (versions of
    a file from the three different sides) and merge them, and whether
    to mark the result as conflicted or cleanly merged.  It also writes
    out these merged file versions as it goes to create a tree.
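
The chain above can be sketched as a tiny pipeline.  This is an
illustration only: struct merge_state and the four functions are
hypothetical stubs, and the counts passed in replace real tree
traversal and content merging.  It shows the ordering of the steps and
how the final clean flag combines the rename step's result with "no
conflicted entries remain".

```c
/* Hypothetical, heavily simplified state; not git's merge_options. */
struct merge_state {
	int nr_paths;       /* filled in by the collect step */
	int nr_conflicted;  /* filled in by the process step */
};

/* Step 1: gather every path of interest (stub: pretend we saw n paths). */
static int collect_info(struct merge_state *st, int n)
{
	st->nr_paths = n;
	return 0;
}

/* Step 2: rename handling; returns 1 if clean (stub: no renames detected). */
static int handle_renames(struct merge_state *st)
{
	(void)st;
	return 1;
}

/* Step 3: resolve each path (stub: caller says how many stay conflicted). */
static void process_paths(struct merge_state *st, int unresolved)
{
	st->nr_conflicted = unresolved <= st->nr_paths ? unresolved : st->nr_paths;
}

/* Run the three steps in order; the merge is clean only if every step was. */
static int run_merge(struct merge_state *st, int n, int unresolved)
{
	int clean;

	if (collect_info(st, n) != 0)
		return -1;
	clean = handle_renames(st);
	process_paths(st, unresolved);
	clean &= (st->nr_conflicted == 0);
	return clean;
}
```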

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index bb37fdf838..8c9fea1a5a 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -18,6 +18,7 @@
 #include "merge-ort.h"
 
 #include "strmap.h"
+#include "tree.h"
 
 struct merge_options_internal {
 	/*
@@ -154,6 +155,38 @@ struct conflict_info {
 	unsigned match_mask:3;
 };
 
+static int collect_merge_info(struct merge_options *opt,
+			      struct tree *merge_base,
+			      struct tree *side1,
+			      struct tree *side2)
+{
+	/* TODO: Implement this using traverse_trees() */
+	die("Not yet implemented.");
+}
+
+static int detect_and_process_renames(struct merge_options *opt,
+				      struct tree *merge_base,
+				      struct tree *side1,
+				      struct tree *side2)
+{
+	int clean = 1;
+
+	/*
+	 * Rename detection works by detecting file similarity.  Here we use
+	 * a really easy-to-implement scheme: files are similar IFF they have
+	 * the same filename.  Therefore, by this scheme, there are no renames.
+	 *
+	 * TODO: Actually implement a real rename detection scheme.
+	 */
+	return clean;
+}
+
+static void process_entries(struct merge_options *opt,
+			    struct object_id *result_oid)
+{
+	die("Not yet implemented.");
+}
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
@@ -170,13 +203,46 @@ void merge_finalize(struct merge_options *opt,
 	die("Not yet implemented");
 }
 
+static void merge_start(struct merge_options *opt, struct merge_result *result)
+{
+	die("Not yet implemented.");
+}
+
+/*
+ * Originally from merge_trees_internal(); heavily adapted, though.
+ */
+static void merge_ort_nonrecursive_internal(struct merge_options *opt,
+					    struct tree *merge_base,
+					    struct tree *side1,
+					    struct tree *side2,
+					    struct merge_result *result)
+{
+	struct object_id working_tree_oid;
+
+	collect_merge_info(opt, merge_base, side1, side2);
+	result->clean = detect_and_process_renames(opt, merge_base,
+						   side1, side2);
+	process_entries(opt, &working_tree_oid);
+
+	/* Set return values */
+	result->tree = parse_tree_indirect(&working_tree_oid);
+	/* existence of conflicted entries implies unclean */
+	result->clean &= strmap_empty(&opt->priv->conflicted);
+	if (!opt->priv->call_depth) {
+		result->priv = opt->priv;
+		opt->priv = NULL;
+	}
+}
+
 void merge_incore_nonrecursive(struct merge_options *opt,
 			       struct tree *merge_base,
 			       struct tree *side1,
 			       struct tree *side2,
 			       struct merge_result *result)
 {
-	die("Not yet implemented");
+	assert(opt->ancestor != NULL);
+	merge_start(opt, result);
+	merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);
 }
 
 void merge_incore_recursive(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
@ 2020-12-04 20:47   ` Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
                     ` (17 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge_start() basically does a bunch of sanity checks, then allocates
and initializes opt->priv -- a struct merge_options_internal.

Most of the sanity checks are usable as-is.  The
allocation/initialization is a bit different since merge-ort has a very
different merge_options_internal than merge-recursive, but the idea is
the same.
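
As a rough illustration of that "check, then allocate" shape, here is a
sketch with trimmed-down, hypothetical types (enum variant, struct
priv_state, struct options_sketch and start_sketch() are not git's).
The check that priv is still NULL before allocating is an assumption of
the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down option and private-state structs. */
enum variant { VARIANT_NORMAL = 0, VARIANT_OURS, VARIANT_THEIRS };

struct priv_state {
	int call_depth;
};

struct options_sketch {
	const char *branch1, *branch2;
	int rename_limit;
	enum variant variant;
	struct priv_state *priv;
};

/* Check caller-supplied fields first, then set up zeroed private state. */
static void start_sketch(struct options_sketch *opt)
{
	assert(opt->branch1 && opt->branch2);
	assert(opt->rename_limit >= -1);
	assert(opt->variant >= VARIANT_NORMAL && opt->variant <= VARIANT_THEIRS);

	/* Assumption of this sketch: private state must not exist yet. */
	assert(opt->priv == NULL);
	opt->priv = calloc(1, sizeof(*opt->priv));
	if (!opt->priv)
		abort();
}
```

Doing the range checks before any allocation means a misconfigured
caller fails fast, before there is any state to clean up.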

The weirdest part here is that merge-ort and merge-recursive use the
same struct merge_options, even though merge_options has a number of
fields that are oddly specific to merge-recursive's internal
implementation and don't even make sense with merge-ort's high-level
design (e.g. buffer_output, which merge-ort has to always do).  I reused
the same data structure because:
  * most of the fields made sense to both merge algorithms
  * making a new struct would have required making new enums or somehow
    externalizing them, and that was getting messy.
  * it simplifies converting the existing callers by not having to
    have different code paths for merge_options setup.

I also marked detect_renames as ignored.  We can revisit that later, but
in short: merge-recursive allowed turning off rename detection because
it was sometimes glacially slow.  When you speed something up by a few
orders of magnitude, it's worth revisiting whether that justification is
still relevant.  Besides, if folks find it's still too slow, perhaps
they have a better scaling case than I could find and maybe it turns up
some more optimizations we can add.  If it still is needed as an option,
it is easy to add later.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 8c9fea1a5a..f8ac721aa3 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,8 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "diff.h"
+#include "diffcore.h"
 #include "strmap.h"
 #include "tree.h"
 
@@ -205,7 +207,48 @@ void merge_finalize(struct merge_options *opt,
 
 static void merge_start(struct merge_options *opt, struct merge_result *result)
 {
-	die("Not yet implemented.");
+	/* Sanity checks on opt */
+	assert(opt->repo);
+
+	assert(opt->branch1 && opt->branch2);
+
+	assert(opt->detect_directory_renames >= MERGE_DIRECTORY_RENAMES_NONE &&
+	       opt->detect_directory_renames <= MERGE_DIRECTORY_RENAMES_TRUE);
+	assert(opt->rename_limit >= -1);
+	assert(opt->rename_score >= 0 && opt->rename_score <= MAX_SCORE);
+	assert(opt->show_rename_progress >= 0 && opt->show_rename_progress <= 1);
+
+	assert(opt->xdl_opts >= 0);
+	assert(opt->recursive_variant >= MERGE_VARIANT_NORMAL &&
+	       opt->recursive_variant <= MERGE_VARIANT_THEIRS);
+
+	/*
+	 * detect_renames, verbosity, buffer_output, and obuf are ignored
+	 * fields that were used by "recursive" rather than "ort" -- but
+	 * sanity check them anyway.
+	 */
+	assert(opt->detect_renames >= -1 &&
+	       opt->detect_renames <= DIFF_DETECT_COPY);
+	assert(opt->verbosity >= 0 && opt->verbosity <= 5);
+	assert(opt->buffer_output <= 2);
+	assert(opt->obuf.len == 0);
+
+	assert(opt->priv == NULL);
+
+	/* Initialization of opt->priv, our internal merge data */
+	opt->priv = xcalloc(1, sizeof(*opt->priv));
+
+	/*
+	 * Although we initialize opt->priv->paths with strdup_strings=0,
+	 * that's just to avoid making yet another copy of an allocated
+	 * string.  Putting the entry into paths means we are taking
+	 * ownership, so we will later free it.
+	 *
+	 * In contrast, conflicted just has a subset of keys from paths, so
+	 * we don't want to free those (it'd be a duplicate free).
+	 */
+	strmap_init_with_options(&opt->priv->paths, NULL, 0);
+	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
 }
 
 /*
-- 
gitgitgadget


* [PATCH v2 04/20] merge-ort: use histogram diff
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-12-04 20:47   ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
@ 2020-12-04 20:47   ` Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
                     ` (16 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

In my cursory investigation, histogram diffs are about 2% slower than
Myers diffs.  Others have probably done more detailed benchmarks.  But,
in short, histogram diffs have been around for years and in a number of
cases provide obviously better looking diffs where Myers diffs are
unintelligible, but the performance hit has kept them from becoming the
default.

However, there are real merge bugs we know about that have triggered on
git.git and linux.git, which I don't have a clue how to address without
the additional information that I believe is provided by histogram
diffs.  See the following:

https://lore.kernel.org/git/20190816184051.GB13894@sigill.intra.peff.net/
https://lore.kernel.org/git/CABPp-BHvJHpSJT7sdFwfNcPn_sOXwJi3=o14qjZS3M8Rzcxe2A@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BGtez4qjbtFT1hQoREfcJPmk9MzjhY5eEq1QhXT23tFOw@mail.gmail.com/

I don't like mismerges.  I really don't like silent mismerges.  While I
am sometimes willing to make a performance and correctness tradeoff, I'm
much more interested in correctness in general.  I want to fix the above
bugs.  I have not yet started doing so, but I believe histogram diff at
least gives me an angle.  Unfortunately, I can't rely on using the
information from histogram diff unless it's in use.  And it hasn't been
used because of a performance hit of a few percent.

In testcases I have looked at, merge-ort is _much_ faster than
merge-recursive for non-trivial merges/rebases/cherry-picks.  As such,
this is a golden opportunity to switch out the underlying diff algorithm
(at least the one used by the merge machinery; git-diff and git-log are
separate questions); doing so will allow me to get additional data and
improved diffs, and I believe it will help me fix the above bugs at some
point in the future.
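
For readers unfamiliar with how the algorithm is selected: the hunk below
rewrites the algorithm bits inside opt->xdl_opts while leaving the other
option bits alone.  A rough, self-contained sketch of that idea follows.
The SKETCH_* constants and with_algorithm() are made-up names for
illustration; the real flag values are defined by xdiff and may differ.

```c
#include <assert.h>

/* Hypothetical stand-ins for the diff algorithm flag bits. */
#define SKETCH_PATIENCE_DIFF   (1L << 14)
#define SKETCH_HISTOGRAM_DIFF  (1L << 15)
#define SKETCH_ALGORITHM_MASK  (SKETCH_PATIENCE_DIFF | SKETCH_HISTOGRAM_DIFF)

/*
 * Clear whichever algorithm was selected before, keep every other
 * option bit, and then select the requested algorithm.
 */
static long with_algorithm(long xdl_opts, long algorithm)
{
	/* The caller must pass algorithm bits only. */
	assert((algorithm & ~SKETCH_ALGORITHM_MASK) == 0);
	return (xdl_opts & ~SKETCH_ALGORITHM_MASK) | algorithm;
}
```

Because the old algorithm bits are masked out first, a caller that had
asked for a different algorithm still ends up with exactly one selected.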

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index f8ac721aa3..ff305bcbe4 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -21,6 +21,7 @@
 #include "diffcore.h"
 #include "strmap.h"
 #include "tree.h"
+#include "xdiff-interface.h"
 
 struct merge_options_internal {
 	/*
@@ -235,6 +236,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	assert(opt->priv == NULL);
 
+	/* Default to histogram diff.  Actually, just hardcode it...for now. */
+	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
+
 	/* Initialization of opt->priv, our internal merge data */
 	opt->priv = xcalloc(1, sizeof(*opt->priv));
 
-- 
gitgitgadget


* [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-12-04 20:47   ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
@ 2020-12-04 20:47   ` Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
                     ` (15 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Various places in merge-recursive used an err() function when they hit
some kind of unrecoverable error.  That code was from the reusable bits
of merge-recursive.c that we liked, such as merge_3way, writing object
files to the object store, reading blobs from the object store, etc.  So
create a similar function to allow us to port that code over, and use it
for when we detect problems returned from collect_merge_info()'s
traverse_trees() call, which we will be adding next.
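
The general shape of such a helper, stripped of git's strbuf and error()
machinery, might look like the sketch below.  It is only an illustration
in plain C: sketch_err() and its out/outlen parameters are hypothetical
and are not what the patch adds.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "error: <message>" into out and return -1, so that callers
 * can write "return sketch_err(...)".  The out/outlen pair stands in
 * for git's strbuf in this sketch.
 */
static int sketch_err(char *out, size_t outlen, const char *fmt, ...)
{
	static const char prefix[] = "error: ";
	size_t plen = strlen(prefix);
	va_list params;

	assert(out && outlen > plen);
	memcpy(out, prefix, plen);
	va_start(params, fmt);
	/* vsnprintf() NUL-terminates and truncates if needed. */
	vsnprintf(out + plen, outlen - plen, fmt, params);
	va_end(params);
	return -1;
}
```

Returning -1 unconditionally is what lets error paths report and bail
out in a single statement.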

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index ff305bcbe4..b056db6fc8 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -158,12 +158,27 @@ struct conflict_info {
 	unsigned match_mask:3;
 };
 
+static int err(struct merge_options *opt, const char *err, ...)
+{
+	va_list params;
+	struct strbuf sb = STRBUF_INIT;
+
+	strbuf_addstr(&sb, "error: ");
+	va_start(params, err);
+	strbuf_vaddf(&sb, err, params);
+	va_end(params);
+
+	error("%s", sb.buf);
+	strbuf_release(&sb);
+
+	return -1;
+}
+
 static int collect_merge_info(struct merge_options *opt,
 			      struct tree *merge_base,
 			      struct tree *side1,
 			      struct tree *side2)
 {
-	/* TODO: Implement this using traverse_trees() */
 	die("Not yet implemented.");
 }
 
@@ -266,7 +281,19 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 {
 	struct object_id working_tree_oid;
 
-	collect_merge_info(opt, merge_base, side1, side2);
+	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
+		/*
+		 * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
+		 * base, and 2-3) the trees for the two trees we're merging.
+		 */
+		err(opt, _("collecting merge info failed for trees %s, %s, %s"),
+		    oid_to_hex(&merge_base->object.oid),
+		    oid_to_hex(&side1->object.oid),
+		    oid_to_hex(&side2->object.oid));
+		result->clean = -1;
+		return;
+	}
+
 	result->clean = detect_and_process_renames(opt, merge_base,
 						   side1, side2);
 	process_entries(opt, &working_tree_oid);
-- 
gitgitgadget


* [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info()
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-12-04 20:47   ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
@ 2020-12-04 20:47   ` Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
                     ` (14 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This does not actually collect any necessary info other than the
pathnames involved, since it just allocates an all-zero conflict_info
and stuffs that into paths.  However, it invokes the traverse_trees()
machinery to walk over all the paths and sets up the basic
infrastructure we need.

I have left out a few obvious optimizations to try to make this patch as
short and obvious as possible.  A subsequent patch will add some of
those back in with some more useful data fields before we introduce a
patch that actually sets up the conflict_info fields.
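
As a reading aid for the callback in the diff below, here is a small
standalone sketch of how the mask and dirmask arguments decode into
per-side facts.  struct presence and decode_masks() are hypothetical
names, and the sketch assumes dirmask only has bits set that are also
set in mask.

```c
#include <assert.h>

struct presence {
	unsigned filemask;   /* sides where the path is not a directory */
	unsigned mbase_null; /* 1 if the path is absent from the merge base */
	unsigned side1_null; /* 1 if the path is absent from side 1 */
	unsigned side2_null; /* 1 if the path is absent from side 2 */
};

/* Bit 0 (value 1) is the merge base, bit 1 is side 1, bit 2 is side 2. */
static struct presence decode_masks(unsigned long mask, unsigned long dirmask)
{
	struct presence p;

	assert(mask > 0 && mask < 8);
	assert((dirmask & ~mask) == 0); /* assumption: dirs are also present */
	p.filemask = (unsigned)(mask & ~dirmask);
	p.mbase_null = !(mask & 1);
	p.side1_null = !(mask & 2);
	p.side2_null = !(mask & 4);
	return p;
}
```

For example, mask 6 with dirmask 4 describes a path missing from the
merge base that is a file on side 1 and a directory on side 2.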

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 117 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index b056db6fc8..0c37f8bf52 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -174,12 +174,128 @@ static int err(struct merge_options *opt, const char *err, ...)
 	return -1;
 }
 
+static int collect_merge_info_callback(int n,
+				       unsigned long mask,
+				       unsigned long dirmask,
+				       struct name_entry *names,
+				       struct traverse_info *info)
+{
+	/*
+	 * n is 3.  Always.
+	 * common ancestor (mbase) has mask 1, and stored in index 0 of names
+	 * head of side 1  (side1) has mask 2, and stored in index 1 of names
+	 * head of side 2  (side2) has mask 4, and stored in index 2 of names
+	 */
+	struct merge_options *opt = info->data;
+	struct merge_options_internal *opti = opt->priv;
+	struct conflict_info *ci;
+	struct name_entry *p;
+	size_t len;
+	char *fullpath;
+	unsigned filemask = mask & ~dirmask;
+	unsigned mbase_null = !(mask & 1);
+	unsigned side1_null = !(mask & 2);
+	unsigned side2_null = !(mask & 4);
+
+	/* n = 3 is a fundamental assumption. */
+	if (n != 3)
+		BUG("Called collect_merge_info_callback wrong");
+
+	/*
+	 * A bunch of sanity checks verifying that traverse_trees() calls
+	 * us the way I expect.  Could just remove these at some point,
+	 * though maybe they are helpful to future code readers.
+	 */
+	assert(mbase_null == is_null_oid(&names[0].oid));
+	assert(side1_null == is_null_oid(&names[1].oid));
+	assert(side2_null == is_null_oid(&names[2].oid));
+	assert(!mbase_null || !side1_null || !side2_null);
+	assert(mask > 0 && mask < 8);
+
+	/*
+	 * Get the name of the relevant filepath, which we'll pass to
+	 * setup_path_info() for tracking.
+	 */
+	p = names;
+	while (!p->mode)
+		p++;
+	len = traverse_path_len(info, p->pathlen);
+
+	/* +1 in both of the following lines to include the NUL byte */
+	fullpath = xmalloc(len + 1);
+	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
+
+	/*
+	 * TODO: record information about the path other than all zeros,
+	 * so we can resolve later in process_entries.
+	 */
+	ci = xcalloc(1, sizeof(struct conflict_info));
+	strmap_put(&opti->paths, fullpath, ci);
+
+	/* If dirmask, recurse into subdirectories */
+	if (dirmask) {
+		struct traverse_info newinfo;
+		struct tree_desc t[3];
+		void *buf[3] = {NULL, NULL, NULL};
+		const char *original_dir_name;
+		int i, ret;
+
+		ci->match_mask &= filemask;
+		newinfo = *info;
+		newinfo.prev = info;
+		newinfo.name = p->path;
+		newinfo.namelen = p->pathlen;
+		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
+
+		for (i = 0; i < 3; i++) {
+			const struct object_id *oid = NULL;
+			if (dirmask & 1)
+				oid = &names[i].oid;
+			buf[i] = fill_tree_descriptor(opt->repo, t + i, oid);
+			dirmask >>= 1;
+		}
+
+		original_dir_name = opti->current_dir_name;
+		opti->current_dir_name = fullpath;
+		ret = traverse_trees(NULL, 3, t, &newinfo);
+		opti->current_dir_name = original_dir_name;
+
+		for (i = 0; i < 3; i++)
+			free(buf[i]);
+
+		if (ret < 0)
+			return -1;
+	}
+
+	return mask;
+}
+
 static int collect_merge_info(struct merge_options *opt,
 			      struct tree *merge_base,
 			      struct tree *side1,
 			      struct tree *side2)
 {
-	die("Not yet implemented.");
+	int ret;
+	struct tree_desc t[3];
+	struct traverse_info info;
+	const char *toplevel_dir_placeholder = "";
+
+	opt->priv->current_dir_name = toplevel_dir_placeholder;
+	setup_traverse_info(&info, toplevel_dir_placeholder);
+	info.fn = collect_merge_info_callback;
+	info.data = opt;
+	info.show_all_errors = 1;
+
+	parse_tree(merge_base);
+	parse_tree(side1);
+	parse_tree(side2);
+	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
+	init_tree_desc(t + 1, side1->buffer, side1->size);
+	init_tree_desc(t + 2, side2->buffer, side2->size);
+
+	ret = traverse_trees(NULL, 3, t, &info);
+
+	return ret;
 }
 
 static int detect_and_process_renames(struct merge_options *opt,
-- 
gitgitgadget


* [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2020-12-04 20:47   ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
@ 2020-12-04 20:47   ` Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
                     ` (13 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Three-way merges, by their nature, will often have two or more trees
match at a given subdirectory.  We can avoid calling
fill_tree_descriptor() on the same tree by checking when these trees
match.  Noting when various oids match will also be useful in other
calculations and optimizations.
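
The reuse rule can be sketched on its own, without any tree loading.
shared_slot() and loads_needed() below are hypothetical helpers that
mirror the order of the checks in the loop from the diff; they are an
illustration, not code from the patch.

```c
#include <assert.h>

/*
 * Return the index of the already-loaded tree that slot i can share,
 * or i itself when slot i needs its own load.
 * Slots: 0 = merge base, 1 = side 1, 2 = side 2.
 */
static int shared_slot(int i, int side1_matches_mbase,
		       int side2_matches_mbase, int sides_match)
{
	assert(i >= 0 && i < 3);
	if (i == 1 && side1_matches_mbase)
		return 0;
	if (i == 2 && side2_matches_mbase)
		return 0;
	if (i == 2 && sides_match)
		return 1;
	return i;
}

/* Count how many distinct loads one directory level needs. */
static int loads_needed(int side1_matches_mbase,
			int side2_matches_mbase, int sides_match)
{
	int i, n = 0;

	for (i = 0; i < 3; i++)
		if (shared_slot(i, side1_matches_mbase,
				side2_matches_mbase, sides_match) == i)
			n++;
	return n;
}
```

When only one side touched a directory, this drops from three loads to
two; when neither side did, to one.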

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 0c37f8bf52..ab3119d2d8 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -196,6 +196,15 @@ static int collect_merge_info_callback(int n,
 	unsigned mbase_null = !(mask & 1);
 	unsigned side1_null = !(mask & 2);
 	unsigned side2_null = !(mask & 4);
+	unsigned side1_matches_mbase = (!side1_null && !mbase_null &&
+					names[0].mode == names[1].mode &&
+					oideq(&names[0].oid, &names[1].oid));
+	unsigned side2_matches_mbase = (!side2_null && !mbase_null &&
+					names[0].mode == names[2].mode &&
+					oideq(&names[0].oid, &names[2].oid));
+	unsigned sides_match = (!side1_null && !side2_null &&
+				names[1].mode == names[2].mode &&
+				oideq(&names[1].oid, &names[2].oid));
 
 	/* n = 3 is a fundamental assumption. */
 	if (n != 3)
@@ -248,10 +257,19 @@ static int collect_merge_info_callback(int n,
 		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
 
 		for (i = 0; i < 3; i++) {
-			const struct object_id *oid = NULL;
-			if (dirmask & 1)
-				oid = &names[i].oid;
-			buf[i] = fill_tree_descriptor(opt->repo, t + i, oid);
+			if (i == 1 && side1_matches_mbase)
+				t[1] = t[0];
+			else if (i == 2 && side2_matches_mbase)
+				t[2] = t[0];
+			else if (i == 2 && sides_match)
+				t[2] = t[1];
+			else {
+				const struct object_id *oid = NULL;
+				if (dirmask & 1)
+					oid = &names[i].oid;
+				buf[i] = fill_tree_descriptor(opt->repo,
+							      t + i, oid);
+			}
 			dirmask >>= 1;
 		}
 
-- 
gitgitgadget


* [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (6 preceding siblings ...)
  2020-12-04 20:47   ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
@ 2020-12-04 20:47   ` Elijah Newren via GitGitGadget
  2020-12-04 20:47   ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
                     ` (12 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>
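
The diff below computes two per-path values, match_mask and df_conflict.
The following standalone sketch restates both calculations outside of
the callback; the function names are hypothetical and the bit layout
(1 = merge base, 2 = side 1, 4 = side 2) is taken from the callback's
comments.

```c
#include <assert.h>

/* Which of the three versions are identical; 7 means all of them. */
static unsigned compute_match_mask(int side1_matches_mbase,
				   int side2_matches_mbase, int sides_match)
{
	if (side1_matches_mbase)
		return side2_matches_mbase ? 7 : 3;
	if (side2_matches_mbase)
		return 5;
	if (sides_match)
		return 6;
	return 0;
}

/* A path is a D/F conflict if it is a file somewhere and a dir elsewhere. */
static unsigned is_df_conflict(unsigned filemask, unsigned dirmask)
{
	assert((filemask & dirmask) == 0); /* a version is one or the other */
	return filemask != 0 && dirmask != 0;
}
```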

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index ab3119d2d8..b4e4c1f157 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -193,6 +193,7 @@ static int collect_merge_info_callback(int n,
 	size_t len;
 	char *fullpath;
 	unsigned filemask = mask & ~dirmask;
+	unsigned match_mask = 0; /* will be updated below */
 	unsigned mbase_null = !(mask & 1);
 	unsigned side1_null = !(mask & 2);
 	unsigned side2_null = !(mask & 4);
@@ -206,6 +207,22 @@ static int collect_merge_info_callback(int n,
 				names[1].mode == names[2].mode &&
 				oideq(&names[1].oid, &names[2].oid));
 
+	/*
+	 * Note: When a path is a file on one side of history and a directory
+	 * in another, we have a directory/file conflict.  In such cases, if
+	 * the conflict doesn't resolve from renames and deletions, then we
+	 * always leave directories where they are and move files out of the
+	 * way.  Thus, while struct conflict_info has a df_conflict field to
+	 * track such conflicts, we ignore that field for any directories at
+	 * a path and only pay attention to it for files at the given path.
+	 * The fact that we leave directories where they are also means that
+	 * we do not need to worry about getting additional df_conflict
+	 * information propagated from parent directories down to children
+	 * (unlike, say, traverse_trees_recursive() in unpack-trees.c, which
+	 * sets a newinfo.df_conflicts field specifically to propagate it).
+	 */
+	unsigned df_conflict = (filemask != 0) && (dirmask != 0);
+
 	/* n = 3 is a fundamental assumption. */
 	if (n != 3)
 		BUG("Called collect_merge_info_callback wrong");
@@ -221,6 +238,14 @@ static int collect_merge_info_callback(int n,
 	assert(!mbase_null || !side1_null || !side2_null);
 	assert(mask > 0 && mask < 8);
 
+	/* Determine match_mask */
+	if (side1_matches_mbase)
+		match_mask = (side2_matches_mbase ? 7 : 3);
+	else if (side2_matches_mbase)
+		match_mask = 5;
+	else if (sides_match)
+		match_mask = 6;
+
 	/*
 	 * Get the name of the relevant filepath, which we'll pass to
 	 * setup_path_info() for tracking.
@@ -239,6 +264,8 @@ static int collect_merge_info_callback(int n,
 	 * so we can resolve later in process_entries.
 	 */
 	ci = xcalloc(1, sizeof(struct conflict_info));
+	ci->df_conflict = df_conflict;
+	ci->match_mask = match_mask;
 	strmap_put(&opti->paths, fullpath, ci);
 
 	/* If dirmask, recurse into subdirectories */
@@ -255,6 +282,15 @@ static int collect_merge_info_callback(int n,
 		newinfo.name = p->path;
 		newinfo.namelen = p->pathlen;
 		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
+		/*
+		 * If this directory we are about to recurse into cared about
+		 * its parent directory (the current directory) having a D/F
+		 * conflict, then we'd propagate the masks in this way:
+		 *    newinfo.df_conflicts |= (mask & ~dirmask);
+		 * But we don't worry about propagating D/F conflicts.  (See
+		 * comment near setting of local df_conflict variable near
+		 * the beginning of this function).
+		 */
 
 		for (i = 0; i < 3; i++) {
 			if (i == 1 && side1_matches_mbase)
-- 
gitgitgadget


* [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (7 preceding siblings ...)
  2020-12-04 20:47   ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
@ 2020-12-04 20:47   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
                     ` (11 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:47 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create a helper function, setup_path_info(), which can be used to record
all the information we want in a merged_info or conflict_info.  There is
currently only one caller of this new function, and some of its
parameters are fixed for that caller, but more callers will be added
later.
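
The helper and the *_CI macros rely on a conflict_info starting with a
merged_info, so that one pointer type can refer to either and only the
smaller struct is allocated for clean paths.  A minimal sketch of that
layout trick, using hypothetical sketch_* types rather than the real
structs, could look like this:

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_merged {
	unsigned clean:1; /* 1: only this smaller struct was allocated */
	unsigned mode;
};

struct sketch_conflict {
	struct sketch_merged merged; /* must stay the first member */
	unsigned filemask:3;
	unsigned dirmask:3;
};

/* Allocate the small struct for resolved paths, the big one otherwise. */
static struct sketch_merged *alloc_info(int resolved)
{
	struct sketch_merged *mi;

	assert(resolved == 0 || resolved == 1);
	mi = calloc(1, resolved ? sizeof(struct sketch_merged)
				: sizeof(struct sketch_conflict));
	if (!mi)
		abort();
	mi->clean = !!resolved;
	return mi;
}

/* Downcast only when the larger struct was really allocated. */
static struct sketch_conflict *as_conflict(struct sketch_merged *mi)
{
	return (!mi || mi->clean) ? NULL : (struct sketch_conflict *)mi;
}
```

Checking the clean bit before the cast is what keeps code from reading
past the end of the smaller allocation.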

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 90 insertions(+), 7 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index b4e4c1f157..007c6fc067 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -158,6 +158,26 @@ struct conflict_info {
 	unsigned match_mask:3;
 };
 
+/*
+ * For the next three macros, see warning for conflict_info.merged.
+ *
+ * In each of the below, mi is a struct merged_info*, and ci was defined
+ * as a struct conflict_info* (but we need to verify ci isn't actually
+ * pointed at a struct merged_info*).
+ *
+ * INITIALIZE_CI: Assign ci to mi but only if it's safe; set to NULL otherwise.
+ * VERIFY_CI: Ensure that something we assigned to a conflict_info* is one.
+ * ASSIGN_AND_VERIFY_CI: Similar to VERIFY_CI but do assignment first.
+ */
+#define INITIALIZE_CI(ci, mi) do {                                           \
+	(ci) = (!(mi) || (mi)->clean) ? NULL : (struct conflict_info *)(mi); \
+} while (0)
+#define VERIFY_CI(ci) assert(ci && !ci->merged.clean);
+#define ASSIGN_AND_VERIFY_CI(ci, mi) do {    \
+	(ci) = (struct conflict_info *)(mi);  \
+	assert((ci) && !(mi)->clean);        \
+} while (0)
+
 static int err(struct merge_options *opt, const char *err, ...)
 {
 	va_list params;
@@ -174,6 +194,65 @@ static int err(struct merge_options *opt, const char *err, ...)
 	return -1;
 }
 
+static void setup_path_info(struct merge_options *opt,
+			    struct string_list_item *result,
+			    const char *current_dir_name,
+			    int current_dir_name_len,
+			    char *fullpath, /* we'll take over ownership */
+			    struct name_entry *names,
+			    struct name_entry *merged_version,
+			    unsigned is_null,     /* boolean */
+			    unsigned df_conflict, /* boolean */
+			    unsigned filemask,
+			    unsigned dirmask,
+			    int resolved          /* boolean */)
+{
+	/* result->util is void*, so mi is a convenience typed variable */
+	struct merged_info *mi;
+
+	assert(!is_null || resolved);
+	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
+	assert(resolved == (merged_version != NULL));
+
+	mi = xcalloc(1, resolved ? sizeof(struct merged_info) :
+				   sizeof(struct conflict_info));
+	mi->directory_name = current_dir_name;
+	mi->basename_offset = current_dir_name_len;
+	mi->clean = !!resolved;
+	if (resolved) {
+		mi->result.mode = merged_version->mode;
+		oidcpy(&mi->result.oid, &merged_version->oid);
+		mi->is_null = !!is_null;
+	} else {
+		int i;
+		struct conflict_info *ci;
+
+		ASSIGN_AND_VERIFY_CI(ci, mi);
+		for (i = 0; i < 3; i++) {
+			ci->pathnames[i] = fullpath;
+			ci->stages[i].mode = names[i].mode;
+			oidcpy(&ci->stages[i].oid, &names[i].oid);
+		}
+		ci->filemask = filemask;
+		ci->dirmask = dirmask;
+		ci->df_conflict = !!df_conflict;
+		if (dirmask)
+			/*
+			 * Assume is_null for now, but if we have entries
+			 * under the directory then when it is complete in
+			 * write_completed_directory() it'll update this.
+			 * Also, for D/F conflicts, we have to handle the
+			 * directory first, then clear this bit and process
+			 * the file to see how it is handled -- that occurs
+			 * near the top of process_entry().
+			 */
+			mi->is_null = 1;
+	}
+	strmap_put(&opt->priv->paths, fullpath, mi);
+	result->string = fullpath;
+	result->util = mi;
+}
+
 static int collect_merge_info_callback(int n,
 				       unsigned long mask,
 				       unsigned long dirmask,
@@ -188,10 +267,12 @@ static int collect_merge_info_callback(int n,
 	 */
 	struct merge_options *opt = info->data;
 	struct merge_options_internal *opti = opt->priv;
-	struct conflict_info *ci;
+	struct string_list_item pi;  /* Path Info */
+	struct conflict_info *ci; /* typed alias to pi.util (which is void*) */
 	struct name_entry *p;
 	size_t len;
 	char *fullpath;
+	const char *dirname = opti->current_dir_name;
 	unsigned filemask = mask & ~dirmask;
 	unsigned match_mask = 0; /* will be updated below */
 	unsigned mbase_null = !(mask & 1);
@@ -260,13 +341,15 @@ static int collect_merge_info_callback(int n,
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
-	 * TODO: record information about the path other than all zeros,
-	 * so we can resolve later in process_entries.
+	 * Record information about the path so we can resolve later in
+	 * process_entries.
 	 */
-	ci = xcalloc(1, sizeof(struct conflict_info));
-	ci->df_conflict = df_conflict;
+	setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+			names, NULL, 0, df_conflict, filemask, dirmask, 0);
+
+	ci = pi.util;
+	VERIFY_CI(ci);
 	ci->match_mask = match_mask;
-	strmap_put(&opti->paths, fullpath, ci);
 
 	/* If dirmask, recurse into subdirectories */
 	if (dirmask) {
@@ -310,7 +393,7 @@ static int collect_merge_info_callback(int n,
 		}
 
 		original_dir_name = opti->current_dir_name;
-		opti->current_dir_name = fullpath;
+		opti->current_dir_name = pi.string;
 		ret = traverse_trees(NULL, 3, t, &newinfo);
 		opti->current_dir_name = original_dir_name;
 
-- 
gitgitgadget


* [PATCH v2 10/20] merge-ort: avoid recursing into identical trees
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (8 preceding siblings ...)
  2020-12-04 20:47   ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
                     ` (10 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

When all three trees have the same oid, there is no need to recurse into
these trees to find that all files within them happen to match.  We can
just record any one of the trees as the resolution of merging that
particular path.

Immediately resolving trees for other types of trivial tree merges (such
as one side matches the merge base, or the two sides match each other)
would prevent us from detecting renames for some paths, and thus prevent
us from doing three-way content merges for those paths whose renames we
did not detect.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 007c6fc067..2dd52ab426 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -340,6 +340,19 @@ static int collect_merge_info_callback(int n,
 	fullpath = xmalloc(len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
+	/*
+	 * If mbase, side1, and side2 all match, we can resolve early.  Even
+	 * if these are trees, there will be no renames or anything
+	 * underneath.
+	 */
+	if (side1_matches_mbase && side2_matches_mbase) {
+		/* mbase, side1, & side2 all match; use mbase as resolution */
+		setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+				names, names+0, mbase_null, 0,
+				filemask, dirmask, 1);
+		return mask;
+	}
+
 	/*
 	 * Record information about the path so we can resolve later in
 	 * process_entries.
-- 
gitgitgadget


* [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (9 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
                     ` (9 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add a process_entries() implementation that just loops over the paths
and processes each one individually with an auxiliary process_entry()
call.  Add a basic process_entry() as well, which handles several cases
but leaves a few of the more involved ones with die-not-implemented
messages.  Also, although process_entries() is supposed to create a
tree, it does not yet have code to do so -- except in the special case
of merging completely empty trees.
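
One of the cases process_entry() handles is a nonzero match_mask, where
at least two of the three versions are identical and the result can be
copied from one side.  The sketch below isolates just the choice of
side; resolving_side() is a hypothetical helper, not a function in the
patch.

```c
#include <assert.h>

/*
 * Given a match_mask of 3, 5, or 6, return which stage supplies the
 * result: 1 for side 1, 2 for side 2.
 */
static int resolving_side(unsigned match_mask)
{
	unsigned othermask;

	assert(match_mask == 3 || match_mask == 5 || match_mask == 6);
	if (match_mask == 6)
		return 1; /* both sides agree; either one works */
	othermask = 7 & ~match_mask; /* the one side that differs */
	return (othermask == 4) ? 2 : 1;
}
```

With match_mask 3 (side 1 equals the merge base), only side 2 changed
the path, so side 2's version wins; match_mask 5 is the mirror image.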

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 2dd52ab426..fbbbde1c3f 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -465,10 +465,111 @@ static int detect_and_process_renames(struct merge_options *opt,
 	return clean;
 }
 
+/* Per entry merge function */
+static void process_entry(struct merge_options *opt,
+			  const char *path,
+			  struct conflict_info *ci)
+{
+	VERIFY_CI(ci);
+	assert(ci->filemask >= 0 && ci->filemask <= 7);
+	/* ci->match_mask == 7 was handled in collect_merge_info_callback() */
+	assert(ci->match_mask == 0 || ci->match_mask == 3 ||
+	       ci->match_mask == 5 || ci->match_mask == 6);
+
+	if (ci->df_conflict) {
+		die("Not yet implemented.");
+	}
+
+	/*
+	 * NOTE: Below there is a long switch-like if-elseif-elseif... block
+	 *       which the code goes through even for the df_conflict cases
+	 *       above.  Well, it will once we don't die-not-implemented above.
+	 */
+	if (ci->match_mask) {
+		ci->merged.clean = 1;
+		if (ci->match_mask == 6) {
+			/* stages[1] == stages[2] */
+			ci->merged.result.mode = ci->stages[1].mode;
+			oidcpy(&ci->merged.result.oid, &ci->stages[1].oid);
+		} else {
+			/* determine the mask of the side that didn't match */
+			unsigned int othermask = 7 & ~ci->match_mask;
+			int side = (othermask == 4) ? 2 : 1;
+
+			ci->merged.result.mode = ci->stages[side].mode;
+			ci->merged.is_null = !ci->merged.result.mode;
+			oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
+
+			assert(othermask == 2 || othermask == 4);
+			assert(ci->merged.is_null ==
+			       (ci->filemask == ci->match_mask));
+		}
+	} else if (ci->filemask >= 6 &&
+		   (S_IFMT & ci->stages[1].mode) !=
+		   (S_IFMT & ci->stages[2].mode)) {
+		/*
+		 * Two different items from (file/submodule/symlink)
+		 */
+		die("Not yet implemented.");
+	} else if (ci->filemask >= 6) {
+		/*
+		 * TODO: Needs a two-way or three-way content merge, but we're
+		 * just being lazy and copying the version from HEAD and
+		 * leaving it as conflicted.
+		 */
+		ci->merged.clean = 0;
+		ci->merged.result.mode = ci->stages[1].mode;
+		oidcpy(&ci->merged.result.oid, &ci->stages[1].oid);
+	} else if (ci->filemask == 3 || ci->filemask == 5) {
+		/* Modify/delete */
+		die("Not yet implemented.");
+	} else if (ci->filemask == 2 || ci->filemask == 4) {
+		/* Added on one side */
+		int side = (ci->filemask == 4) ? 2 : 1;
+		ci->merged.result.mode = ci->stages[side].mode;
+		oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
+		ci->merged.clean = !ci->df_conflict;
+	} else if (ci->filemask == 1) {
+		/* Deleted on both sides */
+		ci->merged.is_null = 1;
+		ci->merged.result.mode = 0;
+		oidcpy(&ci->merged.result.oid, &null_oid);
+		ci->merged.clean = 1;
+	}
+
+	/*
+	 * If still conflicted, record it separately.  This allows us to later
+	 * iterate over just conflicted entries when updating the index instead
+	 * of iterating over all entries.
+	 */
+	if (!ci->merged.clean)
+		strmap_put(&opt->priv->conflicted, path, ci);
+}
+
 static void process_entries(struct merge_options *opt,
 			    struct object_id *result_oid)
 {
-	die("Not yet implemented.");
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	if (strmap_empty(&opt->priv->paths)) {
+		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
+		return;
+	}
+
+	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
+		/*
+		 * NOTE: mi may actually be a pointer to a conflict_info, but
+		 * we have to check mi->clean first to see if it's safe to
+		 * reassign to such a pointer type.
+		 */
+		struct merged_info *mi = e->value;
+
+		if (!mi->clean)
+			process_entry(opt, e->key, e->value);
+	}
+
+	die("Tree creation not yet implemented");
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (10 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
                     ` (8 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

We want to handle paths below a directory before needing to handle the
directory itself.  Also, we want to handle the directory immediately
after the paths below it, so we can't use simple lexicographic ordering
from strcmp (which would insert foo.txt between foo and foo/file.c).
Copy string_list_df_name_compare() from merge-recursive.c, and set up a
string list of paths sorted by that function so that we can iterate in
the desired order.
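
To make the ordering concrete, here is a small standalone sketch.  It
does not call df_name_compare(); df_order_cmp() is a hypothetical
stand-in that compares names as if each one ended in '/', which is my
understanding of the effect of passing S_IFDIR for both sides, with the
shorter name winning ties.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: compare as if every name had a trailing '/'. */
static int df_order_cmp(const char *one, const char *two)
{
	size_t onelen = strlen(one), twolen = strlen(two);
	size_t len = onelen < twolen ? onelen : twolen;
	unsigned char c1, c2;
	int cmp = memcmp(one, two, len);

	if (cmp)
		return cmp;
	/* a name that ends here is treated as ending in '/' */
	c1 = one[len] ? (unsigned char)one[len] : '/';
	c2 = two[len] ? (unsigned char)two[len] : '/';
	if (c1 != c2)
		return c1 - c2;
	/* "foo" vs "foo/bar" compare equal so far; shorter goes first */
	return (onelen > twolen) - (onelen < twolen);
}

/* qsort() adapter: the array elements are pointers to strings */
static int df_order_qsort_cmp(const void *a, const void *b)
{
	return df_order_cmp(*(const char *const *)a,
			    *(const char *const *)b);
}

static void sort_df_order(const char **paths, size_t nr)
{
	qsort(paths, nr, sizeof(*paths), df_order_qsort_cmp);
}
```

Sorting foo, foo.txt and foo/file.c this way puts foo.txt first and
keeps foo directly before foo/file.c, so a reverse walk visits
foo/file.c, then foo, then foo.txt.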

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index fbbbde1c3f..c54837999f 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -465,6 +465,33 @@ static int detect_and_process_renames(struct merge_options *opt,
 	return clean;
 }
 
+static int string_list_df_name_compare(const char *one, const char *two)
+{
+	int onelen = strlen(one);
+	int twolen = strlen(two);
+	/*
+	 * Here we only care that entries for D/F conflicts are
+	 * adjacent, in particular with the file of the D/F conflict
+	 * appearing before files below the corresponding directory.
+	 * The order of the rest of the list is irrelevant for us.
+	 *
+	 * To achieve this, we sort with df_name_compare and provide
+	 * the mode S_IFDIR so that D/F conflicts will sort correctly.
+	 * We use the mode S_IFDIR for everything else for simplicity,
+	 * since in other cases any changes in their order due to
+	 * sorting cause no problems for us.
+	 */
+	int cmp = df_name_compare(one, onelen, S_IFDIR,
+				  two, twolen, S_IFDIR);
+	/*
+	 * Now that 'foo' and 'foo/bar' compare equal, we have to make sure
+	 * that 'foo' comes before 'foo/bar'.
+	 */
+	if (cmp)
+		return cmp;
+	return onelen - twolen;
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
@@ -551,24 +578,44 @@ static void process_entries(struct merge_options *opt,
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *e;
+	struct string_list plist = STRING_LIST_INIT_NODUP;
+	struct string_list_item *entry;
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
 		return;
 	}
 
+	/* Hack to pre-allocate plist to the desired size */
+	ALLOC_GROW(plist.items, strmap_get_size(&opt->priv->paths), plist.alloc);
+
+	/* Put every entry from paths into plist, then sort */
 	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
+		string_list_append(&plist, e->key)->util = e->value;
+	}
+	plist.cmp = string_list_df_name_compare;
+	string_list_sort(&plist);
+
+	/*
+	 * Iterate over the items in reverse order, so we can handle paths
+	 * below a directory before needing to handle the directory itself.
+	 */
+	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
+		char *path = entry->string;
 		/*
 		 * NOTE: mi may actually be a pointer to a conflict_info, but
 		 * we have to check mi->clean first to see if it's safe to
 		 * reassign to such a pointer type.
 		 */
-		struct merged_info *mi = e->value;
+		struct merged_info *mi = entry->util;
 
-		if (!mi->clean)
-			process_entry(opt, e->key, e->value);
+		if (!mi->clean) {
+			struct conflict_info *ci = (struct conflict_info *)mi;
+			process_entry(opt, path, ci);
+		}
 	}
 
+	string_list_clear(&plist, 0);
 	die("Tree creation not yet implemented");
 }
 
-- 
gitgitgadget



* [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (11 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
                     ` (7 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

As a step towards transforming the processed path->conflict_info entries
into an actual tree object, start recording basenames, modes, and oids
in a dir_metadata structure.  Subsequent commits will make use of this
to actually write a tree.
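
For readers unfamiliar with the basename bookkeeping, a minimal sketch
follows.  The patch reads the offset from mi->basename_offset; how that
field gets filled in is not shown here, so basename_offset_of() below is
an assumption made purely for illustration, and both helpers are
hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: offset of the last path component within path. */
static size_t basename_offset_of(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? (size_t)(slash - path) + 1 : 0;
}

/* Hypothetical: the name that would be recorded in the parent's tree. */
static const char *basename_of(const char *path)
{
	const char *basename = path + basename_offset_of(path);

	/* same sanity check as record_entry_for_tree() */
	assert(strchr(basename, '/') == NULL);
	return basename;
}
```

Only the basename is recorded because a tree object names its direct
children, never full paths.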

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index c54837999f..60cd73416e 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -492,10 +492,31 @@ static int string_list_df_name_compare(const char *one, const char *two)
 	return onelen - twolen;
 }
 
+struct directory_versions {
+	struct string_list versions;
+};
+
+static void record_entry_for_tree(struct directory_versions *dir_metadata,
+				  const char *path,
+				  struct merged_info *mi)
+{
+	const char *basename;
+
+	if (mi->is_null)
+		/* nothing to record */
+		return;
+
+	basename = path + mi->basename_offset;
+	assert(strchr(basename, '/') == NULL);
+	string_list_append(&dir_metadata->versions,
+			   basename)->util = &mi->result;
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
-			  struct conflict_info *ci)
+			  struct conflict_info *ci,
+			  struct directory_versions *dir_metadata)
 {
 	VERIFY_CI(ci);
 	assert(ci->filemask >= 0 && ci->filemask <= 7);
@@ -503,6 +524,14 @@ static void process_entry(struct merge_options *opt,
 	assert(ci->match_mask == 0 || ci->match_mask == 3 ||
 	       ci->match_mask == 5 || ci->match_mask == 6);
 
+	if (ci->dirmask) {
+		record_entry_for_tree(dir_metadata, path, &ci->merged);
+		if (ci->filemask == 0)
+			/* nothing else to handle */
+			return;
+		assert(ci->df_conflict);
+	}
+
 	if (ci->df_conflict) {
 		die("Not yet implemented.");
 	}
@@ -571,6 +600,7 @@ static void process_entry(struct merge_options *opt,
 	 */
 	if (!ci->merged.clean)
 		strmap_put(&opt->priv->conflicted, path, ci);
+	record_entry_for_tree(dir_metadata, path, &ci->merged);
 }
 
 static void process_entries(struct merge_options *opt,
@@ -580,6 +610,7 @@ static void process_entries(struct merge_options *opt,
 	struct strmap_entry *e;
 	struct string_list plist = STRING_LIST_INIT_NODUP;
 	struct string_list_item *entry;
+	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP };
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
@@ -609,13 +640,16 @@ static void process_entries(struct merge_options *opt,
 		 */
 		struct merged_info *mi = entry->util;
 
-		if (!mi->clean) {
+		if (mi->clean)
+			record_entry_for_tree(&dir_metadata, path, mi);
+		else {
 			struct conflict_info *ci = (struct conflict_info *)mi;
-			process_entry(opt, path, ci);
+			process_entry(opt, path, ci, &dir_metadata);
 		}
 	}
 
 	string_list_clear(&plist, 0);
+	string_list_clear(&dir_metadata.versions, 0);
 	die("Tree creation not yet implemented");
 }
 
-- 
gitgitgadget



* [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (12 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
                     ` (6 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create a new function, write_tree(), which will take a list of
basenames, modes, and oids for a single directory and create a tree
object in the object-store.  We do not yet have just basenames, modes,
and oids for just a single directory (we have a mixture of entries from
all directory levels in the hierarchy) so we still die() before the
current call to write_tree(), but the next patch will rectify that.
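
The per-entry layout that write_tree() emits can be sketched without
strbuf as follows.  append_tree_entry() is a hypothetical helper using a
plain caller-supplied buffer; it is only meant to show the byte layout
(octal mode, space, name, NUL, raw hash), not how the patch manages
memory.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of merge-ort.c: append one tree entry
 * ("<octal mode> <name>", a NUL byte, then the raw hash) to buf at
 * position pos.  Returns the new position, or 0 if it does not fit.
 */
static size_t append_tree_entry(unsigned char *buf, size_t cap, size_t pos,
				unsigned mode, const char *name,
				const unsigned char *hash, size_t hash_size)
{
	/* length of "<mode> <name>", not counting the NUL */
	int textlen = snprintf(NULL, 0, "%o %s", mode, name);

	if (textlen < 0 || pos + textlen + 1 + hash_size > cap)
		return 0;
	/* snprintf() also writes the terminating NUL, our separator */
	snprintf((char *)buf + pos, cap - pos, "%o %s", mode, name);
	pos += textlen + 1;
	memcpy(buf + pos, hash, hash_size);
	return pos + hash_size;
}
```

Calling it once per sorted entry and hashing the resulting buffer as a
tree is, roughly, what the strbuf-based loop in the patch does.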

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 60cd73416e..eec6874943 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -19,6 +19,7 @@
 
 #include "diff.h"
 #include "diffcore.h"
+#include "object-store.h"
 #include "strmap.h"
 #include "tree.h"
 #include "xdiff-interface.h"
@@ -496,6 +497,62 @@ struct directory_versions {
 	struct string_list versions;
 };
 
+static int tree_entry_order(const void *a_, const void *b_)
+{
+	const struct string_list_item *a = a_;
+	const struct string_list_item *b = b_;
+
+	const struct merged_info *ami = a->util;
+	const struct merged_info *bmi = b->util;
+	return base_name_compare(a->string, strlen(a->string), ami->result.mode,
+				 b->string, strlen(b->string), bmi->result.mode);
+}
+
+static void write_tree(struct object_id *result_oid,
+		       struct string_list *versions,
+		       unsigned int offset,
+		       size_t hash_size)
+{
+	size_t maxlen = 0, extra;
+	unsigned int nr = versions->nr - offset;
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list relevant_entries = STRING_LIST_INIT_NODUP;
+	int i;
+
+	/*
+	 * We want to sort the last (versions->nr-offset) entries in versions.
+	 * Do so by abusing the string_list API a bit: make another string_list
+	 * that contains just those entries and then sort them.
+	 *
+	 * We won't use relevant_entries again and will let it just pop off the
+	 * stack, so there won't be allocation worries or anything.
+	 */
+	relevant_entries.items = versions->items + offset;
+	relevant_entries.nr = versions->nr - offset;
+	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
+
+	/* Pre-allocate some space in buf */
+	extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */
+	for (i = 0; i < nr; i++) {
+		maxlen += strlen(versions->items[offset+i].string) + extra;
+	}
+	strbuf_grow(&buf, maxlen);
+
+	/* Write each entry out to buf */
+	for (i = 0; i < nr; i++) {
+		struct merged_info *mi = versions->items[offset+i].util;
+		struct version_info *ri = &mi->result;
+		strbuf_addf(&buf, "%o %s%c",
+			    ri->mode,
+			    versions->items[offset+i].string, '\0');
+		strbuf_add(&buf, ri->oid.hash, hash_size);
+	}
+
+	/* Write this object file out, and record in result_oid */
+	write_object_file(buf.buf, buf.len, tree_type, result_oid);
+	strbuf_release(&buf);
+}
+
 static void record_entry_for_tree(struct directory_versions *dir_metadata,
 				  const char *path,
 				  struct merged_info *mi)
@@ -648,9 +705,17 @@ static void process_entries(struct merge_options *opt,
 		}
 	}
 
+	/*
+	 * TODO: We can't actually write a tree yet, because dir_metadata just
+	 * contains all basenames of all files throughout the tree with their
+	 * mode and hash.  Not only is that a nonsensical tree, it will have
+	 * lots of duplicates for paths such as "Makefile" or ".gitignore".
+	 */
+	die("Not yet implemented; need to process subtrees separately");
+	write_tree(result_oid, &dir_metadata.versions, 0,
+		   opt->repo->hash_algo->rawsz);
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
-	die("Tree creation not yet implemented");
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (13 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
                     ` (5 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Our order for processing of entries means that if we have a tree of
files that looks like
   Makefile
   src/moduleA/foo.c
   src/moduleA/bar.c
   src/moduleB/baz.c
   src/moduleB/umm.c
   tokens.txt

Then we will process paths in the order of the leftmost column below.  I
have added two additional columns that help explain the algorithm that
follows: the second column is a reminder that we are tracking oid & mode
info for each of these paths (the actual values differ from path to path,
which the placeholder does not show), and the third column annotates
the parent directory of the entry:
   tokens.txt               <version_info>    ""
   src/moduleB/umm.c        <version_info>    src/moduleB
   src/moduleB/baz.c        <version_info>    src/moduleB
   src/moduleB              <version_info>    src
   src/moduleA/foo.c        <version_info>    src/moduleA
   src/moduleA/bar.c        <version_info>    src/moduleA
   src/moduleA              <version_info>    src
   src                      <version_info>    ""
   Makefile                 <version_info>    ""

When the parent directory changes, if it's a subdirectory of the previous
parent directory (e.g. "" -> src/moduleB) then we can just keep appending.
If the parent directory differs from the previous parent directory and is
not a subdirectory of it, then the previous parent directory is complete
and we should write out a tree for it.

So, for example, when we get to this point:
   tokens.txt               <version_info>    ""
   src/moduleB/umm.c        <version_info>    src/moduleB
   src/moduleB/baz.c        <version_info>    src/moduleB

and note that the next entry (src/moduleB) has a different parent (src)
than the last one (src/moduleB), and that this new parent isn't a
subdirectory of the old one, we should write out a tree for src/moduleB
   100644 blob <HASH> umm.c
   100644 blob <HASH> baz.c

then pop all the entries under that directory while recording the new
hash for that directory, leaving us with
   tokens.txt               <version_info>        ""
   src/moduleB              <new version_info>    src

This process repeats until at the end we get to
   tokens.txt               <version_info>        ""
   src                      <new version_info>    ""
   Makefile                 <version_info>        ""

and then we can write out the toplevel tree.  At any point our
string_list may hold entries belonging to several different
directories, e.g. a slightly different repository might have:
   whizbang.txt             <version_info>        ""
   tokens.txt               <version_info>        ""
   src/moduleD              <new version_info>    src
   src/moduleC              <new version_info>    src
   src/moduleB              <new version_info>    src
   src/moduleA/foo.c        <version_info>        src/moduleA
   src/moduleA/bar.c        <version_info>        src/moduleA

When src/moduleA is popped off, we need to know that the "last
directory" reverts back to src, and how many entries in our string_list
are associated with that parent directory.  So I use an auxiliary
offsets string_list which would have (parent_directory,offset)
information of the form
   ""             0
   src            2
   src/moduleA    5

Whenever I write out a tree for a subdirectory, I set versions.nr to
the final offset value and then decrement offsets.nr...and then add
an entry to versions with a hash for the new directory.

The idea is relatively simple; there is just a lot of accounting needed
to implement it.
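
The accounting can be sketched in isolation like this.  It is a
simplified model under stated assumptions: struct dir_accounting and its
three helpers are hypothetical names, only counts are tracked (no
version_info, no trees written), and the caller decides when a
directory opens or closes instead of comparing directory names.

```c
#include <assert.h>

#define MAX_OPEN_DIRS 32

/* Hypothetical, simplified stand-in for the versions/offsets pair. */
struct dir_accounting {
	const char *dirs[MAX_OPEN_DIRS];	/* open parent directories */
	unsigned offsets[MAX_OPEN_DIRS];	/* where each starts in versions */
	unsigned offsets_nr;
	unsigned versions_nr;			/* entries currently in versions */
};

/* Start collecting entries for dir at the current end of versions. */
static void open_dir(struct dir_accounting *a, const char *dir)
{
	assert(a->offsets_nr < MAX_OPEN_DIRS);
	a->dirs[a->offsets_nr] = dir;
	a->offsets[a->offsets_nr] = a->versions_nr;
	a->offsets_nr++;
}

/* Record one basename in the innermost open directory. */
static void add_entry(struct dir_accounting *a)
{
	a->versions_nr++;
}

/*
 * Pop the innermost directory: its entries are the tail of versions
 * starting at its offset.  Returns how many entries its tree would hold.
 */
static unsigned close_dir(struct dir_accounting *a)
{
	unsigned offset, count;

	assert(a->offsets_nr > 0);
	offset = a->offsets[a->offsets_nr - 1];
	count = a->versions_nr - offset;
	a->offsets_nr--;
	a->versions_nr = offset;
	return count;
}
```

Walking the first example through this model (open "", add tokens.txt,
open src/moduleB, add two files, close it, and so on) ends with a single
open directory "" at offset 0 holding three entries, which is the
condition the toplevel write relies on.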

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 242 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 234 insertions(+), 8 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index eec6874943..cf6f395c69 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -494,7 +494,46 @@ static int string_list_df_name_compare(const char *one, const char *two)
 }
 
 struct directory_versions {
+	/*
+	 * versions: list of (basename -> version_info)
+	 *
+	 * The basenames are in reverse lexicographic order of full pathnames,
+	 * as processed in process_entries().  This puts all entries within
+	 * a directory together, and covers the directory itself after
+	 * everything within it, allowing us to write subtrees before needing
+	 * to record information for the tree itself.
+	 */
 	struct string_list versions;
+
+	/*
+	 * offsets: list of (full relative path directories -> integer offsets)
+	 *
+	 * Since versions contains basenames from files in multiple different
+	 * directories, we need to know which entries in versions correspond
+	 * to which directories.  Values of e.g.
+	 *     ""             0
+	 *     src            2
+	 *     src/moduleA    5
+	 * Would mean that entries 0-1 of versions are files in the toplevel
+	 * directory, entries 2-4 are files under src/, and the remaining
+	 * entries starting at index 5 are files under src/moduleA/.
+	 */
+	struct string_list offsets;
+
+	/*
+	 * last_directory: directory of the previously processed file
+	 *
+	 * last_directory starts NULL, but records the directory in which the
+	 * previous file was found.  As soon as
+	 *    directory(current_file) != last_directory
+	 * then we need to start updating accounting in versions & offsets.
+	 * Note that last_directory is always the last path in "offsets" (or
+	 * NULL if "offsets" is empty) so this exists just for quick access.
+	 */
+	const char *last_directory;
+
+	/* last_directory_len: cached computation of strlen(last_directory) */
+	unsigned last_directory_len;
 };
 
 static int tree_entry_order(const void *a_, const void *b_)
@@ -569,6 +608,181 @@ static void record_entry_for_tree(struct directory_versions *dir_metadata,
 			   basename)->util = &mi->result;
 }
 
+static void write_completed_directory(struct merge_options *opt,
+				      const char *new_directory_name,
+				      struct directory_versions *info)
+{
+	const char *prev_dir;
+	struct merged_info *dir_info = NULL;
+	unsigned int offset;
+
+	/*
+	 * Some explanation of info->versions and info->offsets...
+	 *
+	 * process_entries() iterates over all relevant files AND
+	 * directories in reverse lexicographic order, and calls this
+	 * function.  Thus, an example of the paths that process_entries()
+	 * could operate on (along with the directories for those paths
+	 * being shown) is:
+	 *
+	 *     xtract.c             ""
+	 *     tokens.txt           ""
+	 *     src/moduleB/umm.c    src/moduleB
+	 *     src/moduleB/stuff.h  src/moduleB
+	 *     src/moduleB/baz.c    src/moduleB
+	 *     src/moduleB          src
+	 *     src/moduleA/foo.c    src/moduleA
+	 *     src/moduleA/bar.c    src/moduleA
+	 *     src/moduleA          src
+	 *     src                  ""
+	 *     Makefile             ""
+	 *
+	 * info->versions:
+	 *
+	 *     always contains the unprocessed entries and their
+	 *     version_info information.  For example, after the first five
+	 *     entries above, info->versions would be:
+	 *
+	 *     	   xtract.c     <xtract.c's version_info>
+	 *     	   tokens.txt   <tokens.txt's version_info>
+	 *     	   umm.c        <src/moduleB/umm.c's version_info>
+	 *     	   stuff.h      <src/moduleB/stuff.h's version_info>
+	 *     	   baz.c        <src/moduleB/baz.c's version_info>
+	 *
+	 *     Once a subdirectory is completed we remove the entries in
+	 *     that subdirectory from info->versions, writing it as a tree
+	 *     (write_tree()).  Thus, as soon as we get to src/moduleB,
+	 *     info->versions would be updated to
+	 *
+	 *     	   xtract.c     <xtract.c's version_info>
+	 *     	   tokens.txt   <tokens.txt's version_info>
+	 *     	   moduleB      <src/moduleB's version_info>
+	 *
+	 * info->offsets:
+	 *
+	 *     helps us track which entries in info->versions correspond to
+	 *     which directories.  When we are N directories deep (e.g. 4
+	 *     for src/modA/submod/subdir/), we have up to N+1 unprocessed
+	 *     directories (+1 because of toplevel dir).  Corresponding to
+	 *     the info->versions example above, after processing five entries
+	 *     info->offsets will be:
+	 *
+	 *     	   ""           0
+	 *     	   src/moduleB  2
+	 *
+	 *     which is used to know that xtract.c & tokens.txt are from the
+	 *     toplevel directory, while umm.c & stuff.h & baz.c are from the
+	 *     src/moduleB directory.  Again, following the example above,
+	 *     once we need to process src/moduleB, then info->offsets is
+	 *     updated to
+	 *
+	 *     	   ""           0
+	 *     	   src          2
+	 *
+	 *     which says that moduleB (and only moduleB so far) is in the
+	 *     src directory.
+	 *
+	 *     One unique thing to note about info->offsets here is that
+	 *     "src" was not added to info->offsets until there was a path
+	 *     (a file OR directory) immediately below src/ that got
+	 *     processed.
+	 *
+	 * Since process_entry() just appends new entries to info->versions,
+	 * write_completed_directory() only needs to do work if the next path
+	 * is in a directory that is different than the last directory found
+	 * in info->offsets.
+	 */
+
+	/*
+	 * If we are working with the same directory as the last entry, there
+	 * is no work to do.  (See comments above the directory_name member of
+	 * struct merged_info for why we can use pointer comparison instead of
+	 * strcmp here.)
+	 */
+	if (new_directory_name == info->last_directory)
+		return;
+
+	/*
+	 * If we are just starting (last_directory is NULL), or last_directory
+	 * is a prefix of the current directory, then we can just update
+	 * info->offsets to record the offset where we started this directory
+	 * and update last_directory to have quick access to it.
+	 */
+	if (info->last_directory == NULL ||
+	    !strncmp(new_directory_name, info->last_directory,
+		     info->last_directory_len)) {
+		uintptr_t offset = info->versions.nr;
+
+		info->last_directory = new_directory_name;
+		info->last_directory_len = strlen(info->last_directory);
+		/*
+		 * Record the offset into info->versions where we will
+		 * start recording basenames of paths found within
+		 * new_directory_name.
+		 */
+		string_list_append(&info->offsets,
+				   info->last_directory)->util = (void*)offset;
+		return;
+	}
+
+	/*
+	 * The next entry that will be processed will be within
+	 * new_directory_name.  Since at this point we know that
+	 * new_directory_name is within a different directory than
+	 * info->last_directory, we have all entries for info->last_directory
+	 * in info->versions and we need to create a tree object for them.
+	 */
+	dir_info = strmap_get(&opt->priv->paths, info->last_directory);
+	assert(dir_info);
+	offset = (uintptr_t)info->offsets.items[info->offsets.nr-1].util;
+	if (offset == info->versions.nr) {
+		/*
+		 * Actually, we don't need to create a tree object in this
+		 * case.  Whenever all files within a directory disappear
+		 * during the merge (e.g. unmodified on one side and
+		 * deleted on the other, or files were renamed elsewhere),
+		 * then we get here and the directory itself needs to be
+		 * omitted from its parent tree as well.
+		 */
+		dir_info->is_null = 1;
+	} else {
+		/*
+		 * Write out the tree to the git object directory, and also
+		 * record the mode and oid in dir_info->result.
+		 */
+		dir_info->is_null = 0;
+		dir_info->result.mode = S_IFDIR;
+		write_tree(&dir_info->result.oid, &info->versions, offset,
+			   opt->repo->hash_algo->rawsz);
+	}
+
+	/*
+	 * We've now used several entries from info->versions and one entry
+	 * from info->offsets, so we get rid of those values.
+	 */
+	info->offsets.nr--;
+	info->versions.nr = offset;
+
+	/*
+	 * Now we've taken care of the completed directory, but we need to
+	 * prepare things since future entries will be in
+	 * new_directory_name.  (In particular, process_entry() will be
+	 * appending new entries to info->versions.)  So, we need to make
+	 * sure new_directory_name is the last entry in info->offsets.
+	 */
+	prev_dir = info->offsets.nr == 0 ? NULL :
+		   info->offsets.items[info->offsets.nr-1].string;
+	if (new_directory_name != prev_dir) {
+		uintptr_t c = info->versions.nr;
+		string_list_append(&info->offsets,
+				   new_directory_name)->util = (void*)c;
+	}
+
+	/* And, of course, we need to update last_directory to match. */
+	info->last_directory = new_directory_name;
+	info->last_directory_len = strlen(info->last_directory);
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
@@ -667,7 +881,9 @@ static void process_entries(struct merge_options *opt,
 	struct strmap_entry *e;
 	struct string_list plist = STRING_LIST_INIT_NODUP;
 	struct string_list_item *entry;
-	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP };
+	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP,
+						   STRING_LIST_INIT_NODUP,
+						   NULL, 0 };
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
@@ -687,6 +903,11 @@ static void process_entries(struct merge_options *opt,
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
 	 * below a directory before needing to handle the directory itself.
+	 *
+	 * This allows us to write subtrees before we need to write trees,
+	 * and it also enables sane handling of directory/file conflicts
+	 * (because it allows us to know whether the directory is still in
+	 * the way when it is time to process the file at the same path).
 	 */
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
@@ -697,6 +918,8 @@ static void process_entries(struct merge_options *opt,
 		 */
 		struct merged_info *mi = entry->util;
 
+		write_completed_directory(opt, mi->directory_name,
+					  &dir_metadata);
 		if (mi->clean)
 			record_entry_for_tree(&dir_metadata, path, mi);
 		else {
@@ -705,17 +928,20 @@ static void process_entries(struct merge_options *opt,
 		}
 	}
 
-	/*
-	 * TODO: We can't actually write a tree yet, because dir_metadata just
-	 * contains all basenames of all files throughout the tree with their
-	 * mode and hash.  Not only is that a nonsensical tree, it will have
-	 * lots of duplicates for paths such as "Makefile" or ".gitignore".
-	 */
-	die("Not yet implemented; need to process subtrees separately");
+	if (dir_metadata.offsets.nr != 1 ||
+	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
+		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
+		       dir_metadata.offsets.nr);
+		printf("dir_metadata.offsets.items[0].util = %u (should be 0)\n",
+		       (unsigned)(uintptr_t)dir_metadata.offsets.items[0].util);
+		fflush(stdout);
+		BUG("dir_metadata accounting completely off; shouldn't happen");
+	}
 	write_tree(result_oid, &dir_metadata.versions, 0,
 		   opt->repo->hash_algo->rawsz);
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
+	string_list_clear(&dir_metadata.offsets, 0);
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result()
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (14 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
                     ` (4 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds a basic implementation for merge_switch_to_result(), though
just in terms of a few new empty functions that will be defined in
subsequent commits.
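
As a rough illustration of the control flow only, here is a standalone
sketch.  It assumes the tri-state meaning of "clean" (1: clean, 0:
conflicts, <0: aborted); every toy_* name is hypothetical and the two
steps are stand-ins for checkout() and
record_conflicted_index_entries(), not the real functions.

```c
#include <assert.h>

/* Hypothetical stand-in for struct merge_result; only "clean" is modeled. */
struct toy_result {
	int clean; /* 1: clean, 0: conflicts, <0: aborted */
};

/* A step returns 0 on success and nonzero on failure. */
typedef int (*toy_step_fn)(void);

static int toy_step_ok(void) { return 0; }
static int toy_step_fail(void) { return -1; }

/*
 * Run the checkout-like step, then the index fix-up step, but only
 * when the merge itself did not abort and the caller asked for the
 * working tree and index to be updated.  A failing step turns the
 * result into "aborted"; the second step is skipped if the first fails.
 */
static void toy_switch_to_result(struct toy_result *result,
				 int update_worktree_and_index,
				 toy_step_fn checkout_step,
				 toy_step_fn record_step)
{
	assert(result && checkout_step && record_step);
	if (result->clean >= 0 && update_worktree_and_index) {
		if (checkout_step() || record_step())
			result->clean = -1;
	}
}
```

Note that a conflicted merge (clean == 0) still goes through both
steps; only an aborted merge (clean < 0) skips them.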

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index cf6f395c69..fe22751d22 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -944,13 +944,53 @@ static void process_entries(struct merge_options *opt,
 	string_list_clear(&dir_metadata.offsets, 0);
 }
 
+static int checkout(struct merge_options *opt,
+		    struct tree *prev,
+		    struct tree *next)
+{
+	die("Not yet implemented.");
+}
+
+static int record_conflicted_index_entries(struct merge_options *opt,
+					   struct index_state *index,
+					   struct strmap *paths,
+					   struct strmap *conflicted)
+{
+	if (strmap_empty(conflicted))
+		return 0;
+
+	die("Not yet implemented.");
+}
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
 			    int update_worktree_and_index,
 			    int display_update_msgs)
 {
-	die("Not yet implemented");
+	assert(opt->priv == NULL);
+	if (result->clean >= 0 && update_worktree_and_index) {
+		struct merge_options_internal *opti = result->priv;
+
+		if (checkout(opt, head, result->tree)) {
+			/* failure to function */
+			result->clean = -1;
+			return;
+		}
+
+		if (record_conflicted_index_entries(opt, opt->repo->index,
+						    &opti->paths,
+						    &opti->conflicted)) {
+			/* failure to function */
+			result->clean = -1;
+			return;
+		}
+	}
+
+	if (display_update_msgs) {
+		/* TODO: print out CONFLICT and other informational messages. */
+	}
+
 	merge_finalize(opt, result);
 }
 
-- 
gitgitgadget



* [PATCH v2 17/20] merge-ort: add implementation of checkout()
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (15 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
                     ` (3 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Since merge-ort creates a tree for its output, when there are no
conflicts, updating the working tree and index is as simple as using the
unpack_trees() machinery with a twoway_merge (i.e. doing the equivalent
of a "checkout" operation).

If there were conflicts in the merge, the tree we created already
includes all the conflict markers, so using the unpack_trees() machinery
in this manner will still update the working tree correctly.  Further,
all index entries corresponding to cleanly merged files will also be
updated correctly by this procedure.  Index entries corresponding to
conflicted entries will appear as though the user had run "git add -u"
after the merge to accept all files as-is with conflict markers.

Thus, after running unpack_trees(), there needs to be a separate step
for updating the entries in the index corresponding to conflicted files.
This will be the job for the function record_conflicted_index_entries(),
which will be implemented in a subsequent commit.
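
For readers unfamiliar with two-way merges, the per-path decision can
be sketched roughly as below.  This is a deliberately simplified model
and not the actual twoway_merge() code: object ids are plain ints, 0
means "path absent", working tree dirtiness is ignored, and all toy_*
names are hypothetical.

```c
/* Possible outcomes for one path in the simplified model. */
enum toy_outcome {
	TOY_KEEP_INDEX, /* leave the index entry as it is */
	TOY_TAKE_NEXT,  /* switch the entry to the version in "next" */
	TOY_REJECT      /* local change would be lost; refuse */
};

/*
 * Decide what to do with one path when moving from tree "prev" to
 * tree "next", given what the index currently holds for that path.
 */
static enum toy_outcome toy_twoway(int index_id, int prev_id, int next_id)
{
	if (prev_id == next_id)
		return TOY_KEEP_INDEX; /* path untouched by the switch */
	if (index_id == prev_id)
		return TOY_TAKE_NEXT;  /* no local change; safe to update */
	if (index_id == next_id)
		return TOY_KEEP_INDEX; /* index already has the new version */
	return TOY_REJECT;             /* index differs from both trees */
}
```

In the merge case, "prev" is head and "next" is the tree merge-ort
wrote, so a path whose index entry still matches head is simply
updated to the merged version.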

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index fe22751d22..ba62f80420 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -19,9 +19,11 @@
 
 #include "diff.h"
 #include "diffcore.h"
+#include "dir.h"
 #include "object-store.h"
 #include "strmap.h"
 #include "tree.h"
+#include "unpack-trees.h"
 #include "xdiff-interface.h"
 
 struct merge_options_internal {
@@ -948,7 +950,48 @@ static int checkout(struct merge_options *opt,
 		    struct tree *prev,
 		    struct tree *next)
 {
-	die("Not yet implemented.");
+	/* Switch the index/working copy from old to new */
+	int ret;
+	struct tree_desc trees[2];
+	struct unpack_trees_options unpack_opts;
+
+	memset(&unpack_opts, 0, sizeof(unpack_opts));
+	unpack_opts.head_idx = -1;
+	unpack_opts.src_index = opt->repo->index;
+	unpack_opts.dst_index = opt->repo->index;
+
+	setup_unpack_trees_porcelain(&unpack_opts, "merge");
+
+	/*
+	 * NOTE: if this were just "git checkout" code, we would probably
+	 * read or refresh the cache and check for a conflicted index, but
+	 * builtin/merge.c or sequencer.c really needs to read the index
+	 * and check for conflicted entries before starting merging for a
+	 * good user experience (no sense waiting for merges/rebases before
+	 * erroring out), so there's no reason to duplicate that work here.
+	 */
+
+	/* 2-way merge to the new branch */
+	unpack_opts.update = 1;
+	unpack_opts.merge = 1;
+	unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
+	unpack_opts.verbose_update = (opt->verbosity > 2);
+	unpack_opts.fn = twoway_merge;
+	if (1/* FIXME: opts->overwrite_ignore*/) {
+		unpack_opts.dir = xcalloc(1, sizeof(*unpack_opts.dir));
+		unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(unpack_opts.dir);
+	}
+	parse_tree(prev);
+	init_tree_desc(&trees[0], prev->buffer, prev->size);
+	parse_tree(next);
+	init_tree_desc(&trees[1], next->buffer, next->size);
+
+	ret = unpack_trees(2, trees, &unpack_opts);
+	clear_unpack_trees_porcelain(&unpack_opts);
+	dir_clear(unpack_opts.dir);
+	FREE_AND_NULL(unpack_opts.dir);
+	return ret;
 }
 
 static int record_conflicted_index_entries(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (16 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 tree.c | 2 +-
 tree.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tree.c b/tree.c
index e76517f6b1..a52479812c 100644
--- a/tree.c
+++ b/tree.c
@@ -144,7 +144,7 @@ int read_tree_recursive(struct repository *r,
 	return ret;
 }
 
-static int cmp_cache_name_compare(const void *a_, const void *b_)
+int cmp_cache_name_compare(const void *a_, const void *b_)
 {
 	const struct cache_entry *ce1, *ce2;
 
diff --git a/tree.h b/tree.h
index 9383745073..3eb0484cbf 100644
--- a/tree.h
+++ b/tree.h
@@ -28,6 +28,8 @@ void free_tree_buffer(struct tree *tree);
 /* Parses and returns the tree in the given ent, chasing tags and commits. */
 struct tree *parse_tree_indirect(const struct object_id *oid);
 
+int cmp_cache_name_compare(const void *a_, const void *b_);
+
 #define READ_TREE_RECURSIVE 1
 typedef int (*read_tree_fn_t)(const struct object_id *, struct strbuf *, const char *, unsigned int, int, void *);
 
-- 
gitgitgadget



* [PATCH v2 19/20] merge-ort: add implementation of record_conflicted_index_entries()
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (17 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-04 20:48   ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

After checkout(), the working tree has the appropriate contents, and the
index matches the working copy.  That means that all unmodified and
cleanly merged files have correct index entries, but conflicted entries
need to be updated.

We do this by looping over the conflicted entries, marking the existing
index entry for the path with CE_REMOVE, adding new higher order stages
for the path at the end of the index (ignoring normal index sort order),
and then at the end of the loop removing the CE_REMOVE-marked cache
entries and sorting the index.
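
The mark/append/sort idea can be sketched on a toy array as follows.
This is only an illustration under stated assumptions: toy_entry,
TOY_REMOVE and the toy_* helpers are hypothetical, the lookup is a
linear scan rather than index_name_pos(), and the caller is assumed to
have allocated enough room for the appended entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_REMOVE 1 /* hypothetical analogue of CE_REMOVE */

struct toy_entry {
	const char *path;
	int stage; /* 0 = merged, 1..3 = base, side1, side2 */
	int flags;
};

/* Order by path, then by stage. */
static int toy_entry_cmp(const void *a_, const void *b_)
{
	const struct toy_entry *a = a_, *b = b_;
	int c = strcmp(a->path, b->path);
	return c ? c : a->stage - b->stage;
}

/*
 * Mark the stage-0 entry for path (searching only the first sorted_nr
 * entries, i.e. the part that is still sorted) and append one entry
 * per bit set in filemask.  Returns the new entry count.
 */
static size_t toy_record_conflict(struct toy_entry *e, size_t nr,
				  size_t sorted_nr, const char *path,
				  unsigned filemask)
{
	size_t i;
	int side;

	assert(e && path && sorted_nr <= nr);
	for (i = 0; i < sorted_nr; i++)
		if (!strcmp(e[i].path, path) && e[i].stage == 0)
			e[i].flags |= TOY_REMOVE;
	for (side = 0; side < 3; side++) {
		if (!(filemask & (1u << side)))
			continue;
		e[nr].path = path;
		e[nr].stage = side + 1;
		e[nr].flags = 0;
		nr++;
	}
	return nr;
}

/* Compact the array, dropping marked entries; returns the new count. */
static size_t toy_remove_marked(struct toy_entry *e, size_t nr)
{
	size_t i, j = 0;

	for (i = 0; i < nr; i++)
		if (!(e[i].flags & TOY_REMOVE))
			e[j++] = e[i];
	return j;
}
```

After all conflicted paths have been recorded, one call to
toy_remove_marked() followed by a single
qsort(e, nr, sizeof(*e), toy_entry_cmp) restores the ordering, instead
of shifting the tail of the array once per inserted entry.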

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index ba62f80420..faebee8e7e 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,7 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "cache-tree.h"
 #include "diff.h"
 #include "diffcore.h"
 #include "dir.h"
@@ -999,10 +1000,95 @@ static int record_conflicted_index_entries(struct merge_options *opt,
 					   struct strmap *paths,
 					   struct strmap *conflicted)
 {
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+	int errs = 0;
+	int original_cache_nr;
+
 	if (strmap_empty(conflicted))
 		return 0;
 
-	die("Not yet implemented.");
+	original_cache_nr = index->cache_nr;
+
+	/* Replace each conflicted path's stage-0 entry with stage>0 entries */
+	strmap_for_each_entry(conflicted, &iter, e) {
+		const char *path = e->key;
+		struct conflict_info *ci = e->value;
+		int pos;
+		struct cache_entry *ce;
+		int i;
+
+		VERIFY_CI(ci);
+
+		/*
+		 * The index will already have a stage=0 entry for this path,
+		 * because we created an as-merged-as-possible version of the
+		 * file and checkout() moved the working copy and index over
+		 * to that version.
+		 *
+		 * However, previous iterations through this loop will have
+		 * added unstaged entries to the end of the cache which
+		 * ignore the standard alphabetical ordering of cache
+		 * entries and break invariants needed for index_name_pos()
+		 * to work.  However, we know the entry we want is before
+		 * those appended cache entries, so do a temporary swap on
+		 * cache_nr to only look through entries of interest.
+		 */
+		SWAP(index->cache_nr, original_cache_nr);
+		pos = index_name_pos(index, path, strlen(path));
+		SWAP(index->cache_nr, original_cache_nr);
+		if (pos < 0) {
+			if (ci->filemask != 1)
+				BUG("Conflicted %s but nothing in basic working tree or index; this shouldn't happen", path);
+			cache_tree_invalidate_path(index, path);
+		} else {
+			ce = index->cache[pos];
+
+			/*
+			 * Clean paths with CE_SKIP_WORKTREE set will not be
+			 * written to the working tree by the unpack_trees()
+			 * call in checkout().  Our conflicted entries would
+			 * have appeared clean to that code since we ignored
+			 * the higher order stages.  Thus, we need to override
+			 * the CE_SKIP_WORKTREE bit and manually write those
+			 * files to the working disk here.
+			 *
+			 * TODO: Implement this CE_SKIP_WORKTREE fixup.
+			 */
+
+			/*
+			 * Mark this cache entry for removal and instead add
+			 * new stage>0 entries corresponding to the
+			 * conflicts.  If there are many conflicted entries, we
+			 * want to avoid memmove'ing O(NM) entries by
+			 * inserting the new entries one at a time.  So,
+			 * instead, we just add the new cache entries to the
+			 * end (ignoring normal index requirements on sort
+			 * order) and sort the index once we're all done.
+			 */
+			ce->ce_flags |= CE_REMOVE;
+		}
+
+		for (i = 0; i < 3; i++) {
+			struct version_info *vi;
+			if (!(ci->filemask & (1ul << i)))
+				continue;
+			vi = &ci->stages[i];
+			ce = make_cache_entry(index, vi->mode, &vi->oid,
+					      path, i+1, 0);
+			add_index_entry(index, ce, ADD_CACHE_JUST_APPEND);
+		}
+	}
+
+	/*
+	 * Remove the unused cache entries (and invalidate the relevant
+	 * cache-trees), then sort the index entries to get the conflicted
+	 * entries we added to the end into their right locations.
+	 */
+	remove_marked_cache_entries(index, 1);
+	QSORT(index->cache, index->cache_nr, cmp_cache_name_compare);
+
+	return errs;
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH v2 20/20] merge-ort: free data structures in merge_finalize()
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (18 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
@ 2020-12-04 20:48   ` Elijah Newren via GitGitGadget
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-04 20:48 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index faebee8e7e..5d13932dd9 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -182,6 +182,16 @@ struct conflict_info {
 	assert((ci) && !(mi)->clean);        \
 } while (0)
 
+static void free_strmap_strings(struct strmap *map)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strmap_for_each_entry(map, &iter, entry) {
+		free((char*)entry->key);
+	}
+}
+
 static int err(struct merge_options *opt, const char *err, ...)
 {
 	va_list params;
@@ -1126,7 +1136,27 @@ void merge_switch_to_result(struct merge_options *opt,
 void merge_finalize(struct merge_options *opt,
 		    struct merge_result *result)
 {
-	die("Not yet implemented");
+	struct merge_options_internal *opti = result->priv;
+
+	assert(opt->priv == NULL);
+
+	/*
+	 * We marked opti->paths with strdup_strings = 0, so that we
+	 * wouldn't have to make another copy of the fullpath created by
+	 * make_traverse_path from setup_path_info().  But, now that we've
+	 * used it and have no other references to these strings, it is time
+	 * to deallocate them.
+	 */
+	free_strmap_strings(&opti->paths);
+	strmap_clear(&opti->paths, 1);
+
+	/*
+	 * All keys and values in opti->conflicted are a subset of those in
+	 * opti->paths.  We don't want to deallocate anything twice, so we
+	 * don't free the keys and we pass 0 for free_values.
+	 */
+	strmap_clear(&opti->conflicted, 0);
+	FREE_AND_NULL(opti);
 }
 
 static void merge_start(struct merge_options *opt, struct merge_result *result)
-- 
gitgitgadget


* [PATCH v3 00/20] fundamentals of merge-ort implementation
  2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (19 preceding siblings ...)
  2020-12-04 20:48   ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
@ 2020-12-13  8:04   ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
                       ` (20 more replies)
  20 siblings, 21 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren

This is actually v5 of this series, and is being sent due to review comments
from a different series, namely en/merge-ort-3[1].

I have rerolls of en/merge-ort-2 and en/merge-ort-3 already prepared, but
since gitgitgadget will not allow me to send a series dependent on a
not-published-by-Junio series, I cannot yet send them. You will need to
temporarily drop them, and I'll resend after you publish the updated version
of this series. I do not like this solution, and I was tempted to just push
the updates into en/merge-ort-3, but since this series was still hanging in
'seen' awaiting feedback and a lot of the suggestions were for things from
this series, I decided to go this route anyway...

[1]
https://lore.kernel.org/git/CABPp-BHa0zehQd-axmb4bF6fR4PTWwGu9uLjOzgTW8_Gu12iZA@mail.gmail.com/

Changes since v4:

 * Improved documentation of filemask and dirmask
 * Improved documentation of merge_result.clean
 * Added new enum merge_side and documentation with it to try to make the
   code a bit more self-documenting.
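
To make the filemask/dirmask relationship concrete, here is a small
standalone sketch.  The enum merge_side values match the ones added in
this series; toy_kind, toy_masks and toy_compute_masks() are
hypothetical names used only for illustration (the real masks come
from a traverse_trees() callback).

```c
#include <assert.h>

enum merge_side {
	MERGE_BASE = 0,
	MERGE_SIDE1 = 1,
	MERGE_SIDE2 = 2
};

/* What a path looks like on one side of the merge (toy model). */
enum toy_kind { TOY_ABSENT, TOY_FILE, TOY_DIR };

struct toy_masks {
	unsigned filemask:3;
	unsigned dirmask:3;
};

/* Bit i of each mask describes the path on side i. */
static struct toy_masks toy_compute_masks(const enum toy_kind kinds[3])
{
	struct toy_masks m = { 0, 0 };
	int i;

	for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
		if (kinds[i] == TOY_FILE)
			m.filemask |= 1u << i;
		else if (kinds[i] == TOY_DIR)
			m.dirmask |= 1u << i;
	}
	/* A side cannot hold both a file and a directory at one path. */
	assert((m.filemask & m.dirmask) == 0);
	return m;
}
```

For a path that is a file in the merge base, a directory on side 1 and
absent on side 2, this gives filemask = 1 and dirmask = 2, so
filemask | dirmask is 3 rather than 7.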

Elijah Newren (20):
  merge-ort: setup basic internal data structures
  merge-ort: add some high-level algorithm structure
  merge-ort: port merge_start() from merge-recursive
  merge-ort: use histogram diff
  merge-ort: add an err() function similar to one from merge-recursive
  merge-ort: implement a very basic collect_merge_info()
  merge-ort: avoid repeating fill_tree_descriptor() on the same tree
  merge-ort: compute a few more useful fields for collect_merge_info
  merge-ort: record stage and auxiliary info for every path
  merge-ort: avoid recursing into identical trees
  merge-ort: add a preliminary simple process_entries() implementation
  merge-ort: have process_entries operate in a defined order
  merge-ort: step 1 of tree writing -- record basenames, modes, and oids
  merge-ort: step 2 of tree writing -- function to create tree object
  merge-ort: step 3 of tree writing -- handling subdirectories as we go
  merge-ort: basic outline for merge_switch_to_result()
  merge-ort: add implementation of checkout()
  tree: enable cmp_cache_name_compare() to be used elsewhere
  merge-ort: add implementation of record_conflicted_index_entries()
  merge-ort: free data structures in merge_finalize()

 merge-ort.c | 1248 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 merge-ort.h |    9 +-
 tree.c      |    2 +-
 tree.h      |    2 +
 4 files changed, 1256 insertions(+), 5 deletions(-)


base-commit: 3cf59784d42c4152a0b3de7bb7a75d0071e5f878
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-923%2Fnewren%2Fort-basics-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-923/newren/ort-basics-v3
Pull-Request: https://github.com/git/git/pull/923

Range-diff vs v2:

  1:  2568ec92c6d !  1:  518dde86966 merge-ort: setup basic internal data structures
     @@ merge-ort.c
      +	unsigned df_conflict:1;
      +
      +	/*
     -+	 * For filemask and dirmask, see tree-walk.h's struct traverse_info,
     -+	 * particularly the documentation above the "fn" member.  Note that
     -+	 * filemask = mask & ~dirmask from that documentation.
     ++	 * For filemask and dirmask, the ith bit corresponds to whether the
     ++	 * ith entry is a file (filemask) or a directory (dirmask).  Thus,
     ++	 * filemask & dirmask is always zero, and filemask | dirmask is at
     ++	 * most 7 but can be less when a path does not appear as either a
     ++	 * file or a directory on at least one side of history.
     ++	 *
     ++	 * Note that these masks are related to enum merge_side, as the ith
     ++	 * entry corresponds to side i.
     ++	 *
     ++	 * These values come from a traverse_trees() call; more info may be
     ++	 * found looking at tree-walk.h's struct traverse_info,
     ++	 * particularly the documentation above the "fn" member (note that
     ++	 * filemask = mask & ~dirmask from that documentation).
      +	 */
      +	unsigned filemask:3;
      +	unsigned dirmask:3;
  2:  b658536f59d =  2:  5827ec7f3eb merge-ort: add some high-level algorithm structure
  3:  acb40f5c165 =  3:  8295591ee13 merge-ort: port merge_start() from merge-recursive
  4:  22fecf6ccd1 =  4:  38b4f9cf78c merge-ort: use histogram diff
  5:  6c4c0c15b3d !  5:  95143bebf09 merge-ort: add an err() function similar to one from merge-recursive
     @@ Commit message
          for when we detect problems returned from collect_merge_info()'s
          traverse_trees() call, which we will be adding next.
      
     +    While we are at it, also add more documentation for the "clean" field
     +    from struct merge_result, particularly since the name suggests a boolean
     +    but it is not quite one and this is our first non-boolean usage.
     +
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## merge-ort.c ##
     @@ merge-ort.c: static void merge_ort_nonrecursive_internal(struct merge_options *o
       	result->clean = detect_and_process_renames(opt, merge_base,
       						   side1, side2);
       	process_entries(opt, &working_tree_oid);
     +
     + ## merge-ort.h ##
     +@@ merge-ort.h: struct commit;
     + struct tree;
     + 
     + struct merge_result {
     +-	/* Whether the merge is clean */
     ++	/*
     ++	 * Whether the merge is clean; possible values:
     ++	 *    1: clean
     ++	 *    0: not clean (merge conflicts)
     ++	 *   <0: operation aborted prematurely.  (object database
     ++	 *       unreadable, disk full, etc.)  Worktree may be left in an
     ++	 *       inconsistent state if operation failed near the end.
     ++	 */
     + 	int clean;
     + 
     + 	/*
  6:  27268ef8a3c !  6:  242f6462ebb merge-ort: implement a very basic collect_merge_info()
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## merge-ort.c ##
     +@@
     + #include "tree.h"
     + #include "xdiff-interface.h"
     + 
     ++/*
     ++ * We have many arrays of size 3.  Whenever we have such an array, the
     ++ * indices refer to one of the sides of the three-way merge.  This is so
     ++ * pervasive that the constants 0, 1, and 2 are used in many places in the
     ++ * code (especially in arithmetic operations to find the other side's index
     ++ * or to compute a relevant mask), but sometimes these enum names are used
     ++ * to aid code clarity.
     ++ *
     ++ * See also 'filemask' and 'dirmask' in struct conflict_info; the "ith side"
     ++ * referred to there is one of these three sides.
     ++ */
     ++enum merge_side {
     ++	MERGE_BASE = 0,
     ++	MERGE_SIDE1 = 1,
     ++	MERGE_SIDE2 = 2
     ++};
     ++
     + struct merge_options_internal {
     + 	/*
     + 	 * paths: primary data structure in all of merge ort.
      @@ merge-ort.c: static int err(struct merge_options *opt, const char *err, ...)
       	return -1;
       }
     @@ merge-ort.c: static int err(struct merge_options *opt, const char *err, ...)
      +		newinfo.namelen = p->pathlen;
      +		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
      +
     -+		for (i = 0; i < 3; i++) {
     ++		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
      +			const struct object_id *oid = NULL;
      +			if (dirmask & 1)
      +				oid = &names[i].oid;
     @@ merge-ort.c: static int err(struct merge_options *opt, const char *err, ...)
      +		ret = traverse_trees(NULL, 3, t, &newinfo);
      +		opti->current_dir_name = original_dir_name;
      +
     -+		for (i = 0; i < 3; i++)
     ++		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++)
      +			free(buf[i]);
      +
      +		if (ret < 0)
  7:  c6e5621c210 !  7:  c18bdc1b052 merge-ort: avoid repeating fill_tree_descriptor() on the same tree
     @@ merge-ort.c: static int collect_merge_info_callback(int n,
      @@ merge-ort.c: static int collect_merge_info_callback(int n,
       		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
       
     - 		for (i = 0; i < 3; i++) {
     + 		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
      -			const struct object_id *oid = NULL;
      -			if (dirmask & 1)
      -				oid = &names[i].oid;
  8:  93fd69fa3c6 !  8:  be5708dc628 merge-ort: compute a few more useful fields for collect_merge_info
     @@ merge-ort.c: static int collect_merge_info_callback(int n,
      +		 * the beginning of this function).
      +		 */
       
     - 		for (i = 0; i < 3; i++) {
     + 		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
       			if (i == 1 && side1_matches_mbase)
  9:  decff4b3754 !  9:  be4bdfac876 merge-ort: record stage and auxiliary info for every path
     @@ merge-ort.c: static int err(struct merge_options *opt, const char *err, ...)
      +		struct conflict_info *ci;
      +
      +		ASSIGN_AND_VERIFY_CI(ci, mi);
     -+		for (i = 0; i < 3; i++) {
     ++		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
      +			ci->pathnames[i] = fullpath;
      +			ci->stages[i].mode = names[i].mode;
      +			oidcpy(&ci->stages[i].oid, &names[i].oid);
 10:  86c661fe1ee = 10:  6fdf85c8f1a merge-ort: avoid recursing into identical trees
 11:  aa3b13ffd87 = 11:  8b001ae643a merge-ort: add a preliminary simple process_entries() implementation
 12:  b54306fd0e6 = 12:  260b12290fb merge-ort: have process_entries operate in a defined order
 13:  8ee8561d7a3 = 13:  092e77bbb15 merge-ort: step 1 of tree writing -- record basenames, modes, and oids
 14:  6ff56824c33 = 14:  b5d9ba10f8c merge-ort: step 2 of tree writing -- function to create tree object
 15:  da4fe900496 = 15:  81374cbf205 merge-ort: step 3 of tree writing -- handling subdirectories as we go
 16:  8e90d211c5d = 16:  3198efe3188 merge-ort: basic outline for merge_switch_to_result()
 17:  61fada146cf ! 17:  119f40c77f8 merge-ort: add implementation of checkout()
     @@ merge-ort.c
      +#include "unpack-trees.h"
       #include "xdiff-interface.h"
       
     - struct merge_options_internal {
     + /*
      @@ merge-ort.c: static int checkout(struct merge_options *opt,
       		    struct tree *prev,
       		    struct tree *next)
 18:  f5a13a0b084 = 18:  b4c400051ad tree: enable cmp_cache_name_compare() to be used elsewhere
 19:  4efac38116d ! 19:  ee831c8cece merge-ort: add implementation of record_conflicted_index_entries()
     @@ merge-ort.c: static int record_conflicted_index_entries(struct merge_options *op
      +			ce->ce_flags |= CE_REMOVE;
      +		}
      +
     -+		for (i = 0; i < 3; i++) {
     ++		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
      +			struct version_info *vi;
      +			if (!(ci->filemask & (1ul << i)))
      +				continue;
 20:  fbeb527d671 = 20:  55451a79eec merge-ort: free data structures in merge_finalize()

-- 
gitgitgadget


* [PATCH v3 01/20] merge-ort: setup basic internal data structures
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
                       ` (19 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Set up some basic internal data structures.  The only carry-over from
merge-recursive.c is call_depth, though needed_rename_limit will be
added later.

The central piece of data will definitely be the strmap "paths", which
will map every relevant pathname under consideration to either a
merged_info or a conflict_info.  ("conflicted" is a strmap that is a
subset of "paths".)

merged_info contains all relevant information for a non-conflicted
entry.  conflict_info contains a merged_info, plus any additional
information about a conflict such as the higher order stages involved
and the names of the paths those came from (handy once renames get
involved).  If an entry remains conflicted, the merged_info portion of a
conflict_info will later be filled with whatever version of the file
should be placed in the working directory (e.g. an as-merged-as-possible
variation that contains conflict markers).
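
The "conflict_info contains a merged_info" layout can be sketched as
below.  This is a simplified illustration that assumes the merged_info
is the first member of conflict_info, so a single value pointer can be
inspected as either type; the toy_* structs are hypothetical and carry
only a couple of fields.

```c
#include <assert.h>
#include <stddef.h>

struct toy_merged_info {
	unsigned clean:1;
	const char *directory_name;
};

struct toy_conflict_info {
	struct toy_merged_info merged; /* assumed to be the first member */
	unsigned filemask:3;
};

/*
 * Given a value as it would be stored in the "paths" map, return the
 * filemask if the entry is still conflicted and 0 otherwise.  The
 * conflict-only fields are read only after checking that the entry is
 * not clean, since a clean entry may be a bare toy_merged_info.
 */
static unsigned toy_filemask_or_zero(void *value)
{
	struct toy_merged_info *mi = value;
	struct toy_conflict_info *ci;

	assert(mi);
	if (mi->clean)
		return 0;
	ci = value;
	return ci->filemask;
}
```

The check on clean before touching the larger struct is the important
part: reading conflict-only fields from a value that is really just a
merged_info would read past the end of the allocation.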

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index b487901d3ec..3325c9c0a2c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,153 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "strmap.h"
+
+struct merge_options_internal {
+	/*
+	 * paths: primary data structure in all of merge ort.
+	 *
+	 * The keys of paths:
+	 *   * are full relative paths from the toplevel of the repository
+	 *     (e.g. "drivers/firmware/raspberrypi.c").
+	 *   * store all relevant paths in the repo, both directories and
+	 *     files (e.g. drivers, drivers/firmware would also be included)
+	 *   * these keys serve to intern all the path strings, which allows
+	 *     us to do pointer comparison on directory names instead of
+	 *     strcmp; we just have to be careful to use the interned strings.
+	 *
+	 * The values of paths:
+	 *   * either a pointer to a merged_info, or a conflict_info struct
+	 *   * merged_info contains all relevant information for a
+	 *     non-conflicted entry.
+	 *   * conflict_info contains a merged_info, plus any additional
+	 *     information about a conflict such as the higher order stages
+	 *     involved and the names of the paths those came from (handy
+	 *     once renames get involved).
+	 *   * a path may start "conflicted" (i.e. point to a conflict_info)
+	 *     and then a later step (e.g. three-way content merge) determines
+	 *     it can be cleanly merged, at which point it'll be marked clean
+	 *     and the algorithm will ignore any data outside the contained
+	 *     merged_info for that entry
+	 *   * If an entry remains conflicted, the merged_info portion of a
+	 *     conflict_info will later be filled with whatever version of
+	 *     the file should be placed in the working directory (e.g. an
+	 *     as-merged-as-possible variation that contains conflict markers).
+	 */
+	struct strmap paths;
+
+	/*
+	 * conflicted: a subset of keys->values from "paths"
+	 *
+	 * conflicted is basically an optimization between process_entries()
+	 * and record_conflicted_index_entries(); the latter could loop over
+	 * ALL the entries in paths AGAIN and look for the ones that are
+	 * still conflicted, but since process_entries() has to loop over
+	 * all of them, it saves the ones it couldn't resolve in this strmap
+	 * so that record_conflicted_index_entries() can iterate just the
+	 * relevant entries.
+	 */
+	struct strmap conflicted;
+
+	/*
+	 * current_dir_name: temporary var used in collect_merge_info_callback()
+	 *
+	 * Used to set merged_info.directory_name; see documentation for that
+	 * variable and the requirements placed on that field.
+	 */
+	const char *current_dir_name;
+
+	/* call_depth: recursion level counter for merging merge bases */
+	int call_depth;
+};
+
+struct version_info {
+	struct object_id oid;
+	unsigned short mode;
+};
+
+struct merged_info {
+	/* if is_null, ignore result.  otherwise result has oid & mode */
+	struct version_info result;
+	unsigned is_null:1;
+
+	/*
+	 * clean: whether the path in question is cleanly merged.
+	 *
+	 * see conflict_info.merged for more details.
+	 */
+	unsigned clean:1;
+
+	/*
+	 * basename_offset: offset of basename of path.
+	 *
+	 * perf optimization to avoid recomputing offset of final '/'
+	 * character in pathname (0 if no '/' in pathname).
+	 */
+	size_t basename_offset;
+
+	 /*
+	  * directory_name: containing directory name.
+	  *
+	  * Note that we assume directory_name is constructed such that
+	  *    strcmp(dir1_name, dir2_name) == 0 iff dir1_name == dir2_name,
+	  * i.e. string equality is equivalent to pointer equality.  For this
+	  * to hold, we have to be careful setting directory_name.
+	  */
+	const char *directory_name;
+};
+
+struct conflict_info {
+	/*
+	 * merged: the version of the path that will be written to working tree
+	 *
+	 * WARNING: It is critical to check merged.clean and ensure it is 0
+	 * before reading any conflict_info fields outside of merged.
+	 * Allocated merged_info structs will always have clean set to 1.
+	 * Allocated conflict_info structs will have merged.clean set to 0
+	 * initially.  The merged.clean field is how we know if it is safe
+	 * to access other parts of conflict_info besides merged; if a
+	 * conflict_info's merged.clean is changed to 1, the rest of the
+	 * algorithm is not allowed to look at anything outside of the
+	 * merged member anymore.
+	 */
+	struct merged_info merged;
+
+	/* oids & modes from each of the three trees for this path */
+	struct version_info stages[3];
+
+	/* pathnames for each stage; may differ due to rename detection */
+	const char *pathnames[3];
+
+	/* Whether this path is/was involved in a directory/file conflict */
+	unsigned df_conflict:1;
+
+	/*
+	 * For filemask and dirmask, the ith bit corresponds to whether the
+	 * ith entry is a file (filemask) or a directory (dirmask).  Thus,
+	 * filemask & dirmask is always zero, and filemask | dirmask is at
+	 * most 7 but can be less when a path does not appear as either a
+	 * file or a directory on at least one side of history.
+	 *
+	 * Note that these masks are related to enum merge_side, as the ith
+	 * entry corresponds to side i.
+	 *
+	 * These values come from a traverse_trees() call; more info may be
+	 * found looking at tree-walk.h's struct traverse_info,
+	 * particularly the documentation above the "fn" member (note that
+	 * filemask = mask & ~dirmask from that documentation).
+	 */
+	unsigned filemask:3;
+	unsigned dirmask:3;
+
+	/*
+	 * Optimization to track which stages match, to avoid the need to
+	 * recompute it in multiple steps. Either 0 or at least 2 bits are
+	 * set; if at least 2 bits are set, their corresponding stages match.
+	 */
+	unsigned match_mask:3;
+};
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
-- 
gitgitgadget


* [PATCH v3 02/20] merge-ort: add some high-level algorithm structure
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
                       ` (18 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge_ort_nonrecursive_internal() will be used by both
merge_incore_nonrecursive() and merge_incore_recursive(); let's
focus on it for now.  It involves some setup -- merge_start() --
followed by this chain of functions:

  collect_merge_info()
    This function will populate merge_options_internal's paths field,
    via a call to traverse_trees() and a new callback that will be added
    later.

  detect_and_process_renames()
    This function will detect renames, and then adjust entries in paths
    to move conflict stages from old pathnames into those for new
    pathnames, so that the next step doesn't have to think about renames
    and can just do three-way content merging and such.

  process_entries()
    This function determines how to take the various stages (versions of
    a file from the three different sides) and merge them, and whether
    to mark the result as conflicted or cleanly merged.  It also writes
    out these merged file versions as it goes to create a tree.
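
As a rough illustration of how the three stages combine into a single
cleanliness value, here is a toy sketch.  Everything named demo_* is
hypothetical and the stages are stubs; the early return on a failed
collection step is an assumption for the sketch rather than something
this patch does.

```c
#include <assert.h>

/* Hypothetical state: counts what the stages did. */
struct demo_state {
	int stages_run;
	int n_conflicted;
};

static int demo_collect_merge_info(struct demo_state *s)
{
	s->stages_run++;
	return 0; /* 0 on success, negative on failure */
}

static int demo_detect_renames(struct demo_state *s)
{
	s->stages_run++;
	return 1; /* no renames detected, so nothing unclean yet */
}

static void demo_process_entries(struct demo_state *s, int unresolved)
{
	s->stages_run++;
	s->n_conflicted = unresolved; /* paths left conflicted */
}

/* Returns 1 if clean, 0 if conflicts remain, negative on error. */
static int demo_merge(struct demo_state *s, int unresolved)
{
	int clean;

	if (demo_collect_merge_info(s) < 0)
		return -1;
	clean = demo_detect_renames(s);
	demo_process_entries(s, unresolved);
	/* existence of conflicted entries implies unclean */
	clean &= (s->n_conflicted == 0);
	return clean;
}
```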

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 3325c9c0a2c..d0abee9b6ab 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -18,6 +18,7 @@
 #include "merge-ort.h"
 
 #include "strmap.h"
+#include "tree.h"
 
 struct merge_options_internal {
 	/*
@@ -164,6 +165,38 @@ struct conflict_info {
 	unsigned match_mask:3;
 };
 
+static int collect_merge_info(struct merge_options *opt,
+			      struct tree *merge_base,
+			      struct tree *side1,
+			      struct tree *side2)
+{
+	/* TODO: Implement this using traverse_trees() */
+	die("Not yet implemented.");
+}
+
+static int detect_and_process_renames(struct merge_options *opt,
+				      struct tree *merge_base,
+				      struct tree *side1,
+				      struct tree *side2)
+{
+	int clean = 1;
+
+	/*
+	 * Rename detection works by detecting file similarity.  Here we use
+	 * a really easy-to-implement scheme: files are similar IFF they have
+	 * the same filename.  Therefore, by this scheme, there are no renames.
+	 *
+	 * TODO: Actually implement a real rename detection scheme.
+	 */
+	return clean;
+}
+
+static void process_entries(struct merge_options *opt,
+			    struct object_id *result_oid)
+{
+	die("Not yet implemented.");
+}
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
@@ -180,13 +213,46 @@ void merge_finalize(struct merge_options *opt,
 	die("Not yet implemented");
 }
 
+static void merge_start(struct merge_options *opt, struct merge_result *result)
+{
+	die("Not yet implemented.");
+}
+
+/*
+ * Originally from merge_trees_internal(); heavily adapted, though.
+ */
+static void merge_ort_nonrecursive_internal(struct merge_options *opt,
+					    struct tree *merge_base,
+					    struct tree *side1,
+					    struct tree *side2,
+					    struct merge_result *result)
+{
+	struct object_id working_tree_oid;
+
+	collect_merge_info(opt, merge_base, side1, side2);
+	result->clean = detect_and_process_renames(opt, merge_base,
+						   side1, side2);
+	process_entries(opt, &working_tree_oid);
+
+	/* Set return values */
+	result->tree = parse_tree_indirect(&working_tree_oid);
+	/* existence of conflicted entries implies unclean */
+	result->clean &= strmap_empty(&opt->priv->conflicted);
+	if (!opt->priv->call_depth) {
+		result->priv = opt->priv;
+		opt->priv = NULL;
+	}
+}
+
 void merge_incore_nonrecursive(struct merge_options *opt,
 			       struct tree *merge_base,
 			       struct tree *side1,
 			       struct tree *side2,
 			       struct merge_result *result)
 {
-	die("Not yet implemented");
+	assert(opt->ancestor != NULL);
+	merge_start(opt, result);
+	merge_ort_nonrecursive_internal(opt, merge_base, side1, side2, result);
 }
 
 void merge_incore_recursive(struct merge_options *opt,
-- 
gitgitgadget


* [PATCH v3 03/20] merge-ort: port merge_start() from merge-recursive
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
                       ` (17 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

merge_start() basically does a bunch of sanity checks, then allocates
and initializes opt->priv -- a struct merge_options_internal.

Most of the sanity checks are usable as-is.  The
allocation/initialization is a bit different since merge-ort has a very
different merge_options_internal than merge-recursive, but the idea is
the same.

The weirdest part here is that merge-ort and merge-recursive use the
same struct merge_options, even though merge_options has a number of
fields that are oddly specific to merge-recursive's internal
implementation and don't even make sense with merge-ort's high-level
design (e.g. buffer_output, which merge-ort has to always do).  I reused
the same data structure because:
  * most of the fields made sense to both merge algorithms
  * making a new struct would have required making new enums or somehow
    externalizing them, and that was getting messy
  * it simplifies converting the existing callers by not having to
    have different code paths for merge_options setup

I also marked detect_renames as ignored.  We can revisit that later, but
in short: merge-recursive allowed turning off rename detection because
it was sometimes glacially slow.  When you speed something up by a few
orders of magnitude, it's worth revisiting whether that justification is
still relevant.  Besides, if folks find it's still too slow, perhaps
they have a better scaling case than I could find and maybe it turns up
some more optimizations we can add.  If it still is needed as an option,
it is easy to add later.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index d0abee9b6ab..fb07c8f2b30 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,8 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "diff.h"
+#include "diffcore.h"
 #include "strmap.h"
 #include "tree.h"
 
@@ -215,7 +217,48 @@ void merge_finalize(struct merge_options *opt,
 
 static void merge_start(struct merge_options *opt, struct merge_result *result)
 {
-	die("Not yet implemented.");
+	/* Sanity checks on opt */
+	assert(opt->repo);
+
+	assert(opt->branch1 && opt->branch2);
+
+	assert(opt->detect_directory_renames >= MERGE_DIRECTORY_RENAMES_NONE &&
+	       opt->detect_directory_renames <= MERGE_DIRECTORY_RENAMES_TRUE);
+	assert(opt->rename_limit >= -1);
+	assert(opt->rename_score >= 0 && opt->rename_score <= MAX_SCORE);
+	assert(opt->show_rename_progress >= 0 && opt->show_rename_progress <= 1);
+
+	assert(opt->xdl_opts >= 0);
+	assert(opt->recursive_variant >= MERGE_VARIANT_NORMAL &&
+	       opt->recursive_variant <= MERGE_VARIANT_THEIRS);
+
+	/*
+	 * detect_renames, verbosity, buffer_output, and obuf are ignored
+	 * fields that were used by "recursive" rather than "ort" -- but
+	 * sanity check them anyway.
+	 */
+	assert(opt->detect_renames >= -1 &&
+	       opt->detect_renames <= DIFF_DETECT_COPY);
+	assert(opt->verbosity >= 0 && opt->verbosity <= 5);
+	assert(opt->buffer_output <= 2);
+	assert(opt->obuf.len == 0);
+
+	assert(opt->priv == NULL);
+
+	/* Initialization of opt->priv, our internal merge data */
+	opt->priv = xcalloc(1, sizeof(*opt->priv));
+
+	/*
+	 * Although we initialize opt->priv->paths with strdup_strings=0,
+	 * that's just to avoid making yet another copy of an allocated
+	 * string.  Putting the entry into paths means we are taking
+	 * ownership, so we will later free it.
+	 *
+	 * In contrast, conflicted just has a subset of keys from paths, so
+	 * we don't want to free those (it'd be a duplicate free).
+	 */
+	strmap_init_with_options(&opt->priv->paths, NULL, 0);
+	strmap_init_with_options(&opt->priv->conflicted, NULL, 0);
 }
 
 /*
-- 
gitgitgadget


* [PATCH v3 04/20] merge-ort: use histogram diff
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
                       ` (16 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

In my cursory investigation, histogram diffs are about 2% slower than
Myers diffs.  Others have probably done more detailed benchmarks.  But,
in short, histogram diffs have been around for years and in a number of
cases provide obviously better looking diffs where Myers diffs are
unintelligible, but the performance hit has kept them from becoming the
default.

However, there are real merge bugs we know about that have triggered on
git.git and linux.git, which I don't have a clue how to address without
the additional information that I believe is provided by histogram
diffs.  See the following:

https://lore.kernel.org/git/20190816184051.GB13894@sigill.intra.peff.net/
https://lore.kernel.org/git/CABPp-BHvJHpSJT7sdFwfNcPn_sOXwJi3=o14qjZS3M8Rzcxe2A@mail.gmail.com/
https://lore.kernel.org/git/CABPp-BGtez4qjbtFT1hQoREfcJPmk9MzjhY5eEq1QhXT23tFOw@mail.gmail.com/

I don't like mismerges.  I really don't like silent mismerges.  While I
am sometimes willing to make performance and correctness tradeoffs, I'm
much more interested in correctness in general.  I want to fix the above
bugs.  I have not yet started doing so, but I believe histogram diff at
least gives me an angle.  Unfortunately, I can't rely on using the
information from histogram diff unless it's in use.  And it hasn't been
used because of a performance hit of a few percent.

In testcases I have looked at, merge-ort is _much_ faster than
merge-recursive for non-trivial merges/rebases/cherry-picks.  As such,
this is a golden opportunity to switch out the underlying diff algorithm
(at least the one used by the merge machinery; git-diff and git-log are
separate questions); doing so will allow me to get additional data and
improved diffs, and I believe it will help me fix the above bugs at some
point in the future.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index fb07c8f2b30..85942cfa7c7 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -21,6 +21,7 @@
 #include "diffcore.h"
 #include "strmap.h"
 #include "tree.h"
+#include "xdiff-interface.h"
 
 struct merge_options_internal {
 	/*
@@ -245,6 +246,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	assert(opt->priv == NULL);
 
+	/* Default to histogram diff.  Actually, just hardcode it...for now. */
+	opt->xdl_opts = DIFF_WITH_ALG(opt, HISTOGRAM_DIFF);
+
 	/* Initialization of opt->priv, our internal merge data */
 	opt->priv = xcalloc(1, sizeof(*opt->priv));
 
-- 
gitgitgadget


* [PATCH v3 05/20] merge-ort: add an err() function similar to one from merge-recursive
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (3 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
                       ` (15 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Various places in merge-recursive used an err() function when they hit
some kind of unrecoverable error.  That code was from the reusable bits
of merge-recursive.c that we liked, such as merge_3way, writing object
files to the object store, reading blobs from the object store, etc.  So
create a similar function to allow us to port that code over, and use it
for when we detect problems returned from collect_merge_info()'s
traverse_trees() call, which we will be adding next.

While we are at it, also add more documentation for the "clean" field
from struct merge_result, particularly since the name suggests a boolean
but it is not quite one and this is our first non-boolean usage.
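
Since "clean" is a tri-state rather than a boolean, a caller has to
compare it explicitly.  The sketch below is illustrative only; the
demo_* helpers are hypothetical and are not part of merge-ort.h.

```c
#include <assert.h>
#include <string.h>

/* Map the three documented ranges of 'clean' to a description. */
static const char *demo_describe_clean(int clean)
{
	if (clean < 0)
		return "aborted";
	return clean ? "clean" : "conflicts";
}

/* A plain truthiness test would wrongly count -1 as clean. */
static int demo_is_clean(int clean)
{
	return clean > 0;
}
```

In particular, "if (result.clean)" is true for an aborted merge, so
callers that only care about success would want a "> 0" comparison.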

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 31 +++++++++++++++++++++++++++++--
 merge-ort.h |  9 ++++++++-
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 85942cfa7c7..76c0f934279 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -168,12 +168,27 @@ struct conflict_info {
 	unsigned match_mask:3;
 };
 
+static int err(struct merge_options *opt, const char *err, ...)
+{
+	va_list params;
+	struct strbuf sb = STRBUF_INIT;
+
+	strbuf_addstr(&sb, "error: ");
+	va_start(params, err);
+	strbuf_vaddf(&sb, err, params);
+	va_end(params);
+
+	error("%s", sb.buf);
+	strbuf_release(&sb);
+
+	return -1;
+}
+
 static int collect_merge_info(struct merge_options *opt,
 			      struct tree *merge_base,
 			      struct tree *side1,
 			      struct tree *side2)
 {
-	/* TODO: Implement this using traverse_trees() */
 	die("Not yet implemented.");
 }
 
@@ -276,7 +291,19 @@ static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 {
 	struct object_id working_tree_oid;
 
-	collect_merge_info(opt, merge_base, side1, side2);
+	if (collect_merge_info(opt, merge_base, side1, side2) != 0) {
+		/*
+		 * TRANSLATORS: The %s arguments are: 1) tree hash of a merge
+		 * base, and 2-3) tree hashes of the two sides we're merging.
+		 */
+		err(opt, _("collecting merge info failed for trees %s, %s, %s"),
+		    oid_to_hex(&merge_base->object.oid),
+		    oid_to_hex(&side1->object.oid),
+		    oid_to_hex(&side2->object.oid));
+		result->clean = -1;
+		return;
+	}
+
 	result->clean = detect_and_process_renames(opt, merge_base,
 						   side1, side2);
 	process_entries(opt, &working_tree_oid);
diff --git a/merge-ort.h b/merge-ort.h
index 74adccad162..55ae7ee865d 100644
--- a/merge-ort.h
+++ b/merge-ort.h
@@ -7,7 +7,14 @@ struct commit;
 struct tree;
 
 struct merge_result {
-	/* Whether the merge is clean */
+	/*
+	 * Whether the merge is clean; possible values:
+	 *    1: clean
+	 *    0: not clean (merge conflicts)
+	 *   <0: operation aborted prematurely.  (object database
+	 *       unreadable, disk full, etc.)  Worktree may be left in an
+	 *       inconsistent state if operation failed near the end.
+	 */
 	int clean;
 
 	/*
-- 
gitgitgadget


* [PATCH v3 06/20] merge-ort: implement a very basic collect_merge_info()
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (4 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
                       ` (14 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This does not actually collect any necessary info other than the
pathnames involved, since it just allocates an all-zero conflict_info
and stuffs that into paths.  However, it invokes the traverse_trees()
machinery to walk over all the paths and sets up the basic
infrastructure we need.

I have left out a few obvious optimizations to try to make this patch as
short and obvious as possible.  A subsequent patch will add some of
those back in with some more useful data fields before we introduce a
patch that actually sets up the conflict_info fields.
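
For readers unfamiliar with the masks that traverse_trees() hands to
its callback, the following sketch shows how they decode with three
trees.  The demo_* names are hypothetical; the bit assignments (1 for
the merge base, 2 for side 1, 4 for side 2) and the relation
filemask = mask & ~dirmask are taken from the callback in this patch.

```c
#include <assert.h>

/* Bit i of mask is set when tree i has an entry at the path. */
enum {
	DEMO_MBASE = 1 << 0,
	DEMO_SIDE1 = 1 << 1,
	DEMO_SIDE2 = 1 << 2
};

struct demo_presence {
	unsigned filemask;   /* trees where the path is not a directory */
	unsigned mbase_null; /* 1 if the path is absent in the merge base */
	unsigned side1_null; /* 1 if the path is absent on side 1 */
	unsigned side2_null; /* 1 if the path is absent on side 2 */
};

static struct demo_presence demo_decode(unsigned long mask,
					unsigned long dirmask)
{
	struct demo_presence p;

	/* At least one tree has the path, and only three bits are used. */
	assert(mask > 0 && mask < 8);

	p.filemask = (unsigned)(mask & ~dirmask);
	p.mbase_null = !(mask & DEMO_MBASE);
	p.side1_null = !(mask & DEMO_SIDE1);
	p.side2_null = !(mask & DEMO_SIDE2);
	return p;
}
```

For example, mask = 7 with dirmask = 4 describes a path that is a file
in the merge base and on side 1 but a directory on side 2.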

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 135 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 134 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 76c0f934279..4a2c7de6e8e 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -23,6 +23,23 @@
 #include "tree.h"
 #include "xdiff-interface.h"
 
+/*
+ * We have many arrays of size 3.  Whenever we have such an array, the
+ * indices refer to one of the sides of the three-way merge.  This is so
+ * pervasive that the constants 0, 1, and 2 are used in many places in the
+ * code (especially in arithmetic operations to find the other side's index
+ * or to compute a relevant mask), but sometimes these enum names are used
+ * to aid code clarity.
+ *
+ * See also 'filemask' and 'dirmask' in struct conflict_info; the "ith side"
+ * referred to there is one of these three sides.
+ */
+enum merge_side {
+	MERGE_BASE = 0,
+	MERGE_SIDE1 = 1,
+	MERGE_SIDE2 = 2
+};
+
 struct merge_options_internal {
 	/*
 	 * paths: primary data structure in all of merge ort.
@@ -184,12 +201,128 @@ static int err(struct merge_options *opt, const char *err, ...)
 	return -1;
 }
 
+static int collect_merge_info_callback(int n,
+				       unsigned long mask,
+				       unsigned long dirmask,
+				       struct name_entry *names,
+				       struct traverse_info *info)
+{
+	/*
+	 * n is 3.  Always.
+	 * common ancestor (mbase) has mask 1, and stored in index 0 of names
+	 * head of side 1  (side1) has mask 2, and stored in index 1 of names
+	 * head of side 2  (side2) has mask 4, and stored in index 2 of names
+	 */
+	struct merge_options *opt = info->data;
+	struct merge_options_internal *opti = opt->priv;
+	struct conflict_info *ci;
+	struct name_entry *p;
+	size_t len;
+	char *fullpath;
+	unsigned filemask = mask & ~dirmask;
+	unsigned mbase_null = !(mask & 1);
+	unsigned side1_null = !(mask & 2);
+	unsigned side2_null = !(mask & 4);
+
+	/* n = 3 is a fundamental assumption. */
+	if (n != 3)
+		BUG("Called collect_merge_info_callback wrong");
+
+	/*
+	 * A bunch of sanity checks verifying that traverse_trees() calls
+	 * us the way I expect.  Could just remove these at some point,
+	 * though maybe they are helpful to future code readers.
+	 */
+	assert(mbase_null == is_null_oid(&names[0].oid));
+	assert(side1_null == is_null_oid(&names[1].oid));
+	assert(side2_null == is_null_oid(&names[2].oid));
+	assert(!mbase_null || !side1_null || !side2_null);
+	assert(mask > 0 && mask < 8);
+
+	/*
+	 * Get the name of the relevant filepath, which we'll pass to
+	 * setup_path_info() for tracking.
+	 */
+	p = names;
+	while (!p->mode)
+		p++;
+	len = traverse_path_len(info, p->pathlen);
+
+	/* +1 in both of the following lines to include the NUL byte */
+	fullpath = xmalloc(len + 1);
+	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
+
+	/*
+	 * TODO: record information about the path other than all zeros,
+	 * so we can resolve later in process_entries.
+	 */
+	ci = xcalloc(1, sizeof(struct conflict_info));
+	strmap_put(&opti->paths, fullpath, ci);
+
+	/* If dirmask, recurse into subdirectories */
+	if (dirmask) {
+		struct traverse_info newinfo;
+		struct tree_desc t[3];
+		void *buf[3] = {NULL, NULL, NULL};
+		const char *original_dir_name;
+		int i, ret;
+
+		ci->match_mask &= filemask;
+		newinfo = *info;
+		newinfo.prev = info;
+		newinfo.name = p->path;
+		newinfo.namelen = p->pathlen;
+		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
+
+		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
+			const struct object_id *oid = NULL;
+			if (dirmask & 1)
+				oid = &names[i].oid;
+			buf[i] = fill_tree_descriptor(opt->repo, t + i, oid);
+			dirmask >>= 1;
+		}
+
+		original_dir_name = opti->current_dir_name;
+		opti->current_dir_name = fullpath;
+		ret = traverse_trees(NULL, 3, t, &newinfo);
+		opti->current_dir_name = original_dir_name;
+
+		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++)
+			free(buf[i]);
+
+		if (ret < 0)
+			return -1;
+	}
+
+	return mask;
+}
+
 static int collect_merge_info(struct merge_options *opt,
 			      struct tree *merge_base,
 			      struct tree *side1,
 			      struct tree *side2)
 {
-	die("Not yet implemented.");
+	int ret;
+	struct tree_desc t[3];
+	struct traverse_info info;
+	const char *toplevel_dir_placeholder = "";
+
+	opt->priv->current_dir_name = toplevel_dir_placeholder;
+	setup_traverse_info(&info, toplevel_dir_placeholder);
+	info.fn = collect_merge_info_callback;
+	info.data = opt;
+	info.show_all_errors = 1;
+
+	parse_tree(merge_base);
+	parse_tree(side1);
+	parse_tree(side2);
+	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
+	init_tree_desc(t + 1, side1->buffer, side1->size);
+	init_tree_desc(t + 2, side2->buffer, side2->size);
+
+	ret = traverse_trees(NULL, 3, t, &info);
+
+	return ret;
 }
 
 static int detect_and_process_renames(struct merge_options *opt,
-- 
gitgitgadget


* [PATCH v3 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (5 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
                       ` (13 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Three-way merges, by their nature, will often have two or more trees
match at a given subdirectory.  We can avoid calling
fill_tree_descriptor() on the same tree by checking when these trees
match.  Noting when various oids match will also be useful in other
calculations and optimizations.
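
The reuse decision can be sketched in isolation as follows.  This is an
illustration under simplifying assumptions, not the code in the patch:
the demo_* names are hypothetical, an integer id stands in for an
object id, and mode 0 is used to mean "absent on this side".

```c
#include <assert.h>

/* Hypothetical stand-in for a tree entry; mode 0 means absent. */
struct demo_entry {
	unsigned mode;
	int id;
};

/* Two entries match only if both exist with equal mode and id. */
static int demo_same(const struct demo_entry *a, const struct demo_entry *b)
{
	return a->mode && b->mode && a->mode == b->mode && a->id == b->id;
}

/*
 * For each index i, record which index's already-loaded tree can be
 * reused; source[i] == i means tree i must be loaded itself.
 */
static void demo_reuse_plan(const struct demo_entry e[3], int source[3])
{
	source[0] = 0;
	source[1] = demo_same(&e[0], &e[1]) ? 0 : 1;
	if (demo_same(&e[0], &e[2]))
		source[2] = 0;
	else if (demo_same(&e[1], &e[2]))
		source[2] = 1;
	else
		source[2] = 2;
}
```

With base and side 1 identical, only two trees need loading; when all
three match, only one does.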

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 4a2c7de6e8e..690c64fe264 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -223,6 +223,15 @@ static int collect_merge_info_callback(int n,
 	unsigned mbase_null = !(mask & 1);
 	unsigned side1_null = !(mask & 2);
 	unsigned side2_null = !(mask & 4);
+	unsigned side1_matches_mbase = (!side1_null && !mbase_null &&
+					names[0].mode == names[1].mode &&
+					oideq(&names[0].oid, &names[1].oid));
+	unsigned side2_matches_mbase = (!side2_null && !mbase_null &&
+					names[0].mode == names[2].mode &&
+					oideq(&names[0].oid, &names[2].oid));
+	unsigned sides_match = (!side1_null && !side2_null &&
+				names[1].mode == names[2].mode &&
+				oideq(&names[1].oid, &names[2].oid));
 
 	/* n = 3 is a fundamental assumption. */
 	if (n != 3)
@@ -275,10 +284,19 @@ static int collect_merge_info_callback(int n,
 		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
 
 		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
-			const struct object_id *oid = NULL;
-			if (dirmask & 1)
-				oid = &names[i].oid;
-			buf[i] = fill_tree_descriptor(opt->repo, t + i, oid);
+			if (i == 1 && side1_matches_mbase)
+				t[1] = t[0];
+			else if (i == 2 && side2_matches_mbase)
+				t[2] = t[0];
+			else if (i == 2 && sides_match)
+				t[2] = t[1];
+			else {
+				const struct object_id *oid = NULL;
+				if (dirmask & 1)
+					oid = &names[i].oid;
+				buf[i] = fill_tree_descriptor(opt->repo,
+							      t + i, oid);
+			}
 			dirmask >>= 1;
 		}
 
-- 
gitgitgadget


* [PATCH v3 08/20] merge-ort: compute a few more useful fields for collect_merge_info
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (6 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
                       ` (12 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 690c64fe264..a6876191c02 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -220,6 +220,7 @@ static int collect_merge_info_callback(int n,
 	size_t len;
 	char *fullpath;
 	unsigned filemask = mask & ~dirmask;
+	unsigned match_mask = 0; /* will be updated below */
 	unsigned mbase_null = !(mask & 1);
 	unsigned side1_null = !(mask & 2);
 	unsigned side2_null = !(mask & 4);
@@ -233,6 +234,22 @@ static int collect_merge_info_callback(int n,
 				names[1].mode == names[2].mode &&
 				oideq(&names[1].oid, &names[2].oid));
 
+	/*
+	 * Note: When a path is a file on one side of history and a directory
+	 * in another, we have a directory/file conflict.  In such cases, if
+	 * the conflict doesn't resolve from renames and deletions, then we
+	 * always leave directories where they are and move files out of the
+	 * way.  Thus, while struct conflict_info has a df_conflict field to
+	 * track such conflicts, we ignore that field for any directories at
+	 * a path and only pay attention to it for files at the given path.
+	 * The fact that we leave directories where they are also means that
+	 * we do not need to worry about getting additional df_conflict
+	 * information propagated from parent directories down to children
+	 * (unlike, say traverse_trees_recursive() in unpack-trees.c, which
+	 * sets a newinfo.df_conflicts field specifically to propagate it).
+	 */
+	unsigned df_conflict = (filemask != 0) && (dirmask != 0);
+
 	/* n = 3 is a fundamental assumption. */
 	if (n != 3)
 		BUG("Called collect_merge_info_callback wrong");
@@ -248,6 +265,14 @@ static int collect_merge_info_callback(int n,
 	assert(!mbase_null || !side1_null || !side2_null);
 	assert(mask > 0 && mask < 8);
 
+	/* Determine match_mask */
+	if (side1_matches_mbase)
+		match_mask = (side2_matches_mbase ? 7 : 3);
+	else if (side2_matches_mbase)
+		match_mask = 5;
+	else if (sides_match)
+		match_mask = 6;
+
 	/*
 	 * Get the name of the relevant filepath, which we'll pass to
 	 * setup_path_info() for tracking.
@@ -266,6 +291,8 @@ static int collect_merge_info_callback(int n,
 	 * so we can resolve later in process_entries.
 	 */
 	ci = xcalloc(1, sizeof(struct conflict_info));
+	ci->df_conflict = df_conflict;
+	ci->match_mask = match_mask;
 	strmap_put(&opti->paths, fullpath, ci);
 
 	/* If dirmask, recurse into subdirectories */
@@ -282,6 +309,15 @@ static int collect_merge_info_callback(int n,
 		newinfo.name = p->path;
 		newinfo.namelen = p->pathlen;
 		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
+		/*
+		 * If this directory we are about to recurse into cared about
+		 * its parent directory (the current directory) having a D/F
+		 * conflict, then we'd propagate the masks in this way:
+		 *    newinfo.df_conflicts |= (mask & ~dirmask);
+		 * But we don't worry about propagating D/F conflicts.  (See
+		 * comment near setting of local df_conflict variable near
+		 * the beginning of this function).
+		 */
 
 		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
 			if (i == 1 && side1_matches_mbase)
-- 
gitgitgadget



* [PATCH v3 09/20] merge-ort: record stage and auxiliary info for every path
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (7 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
                       ` (11 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create a helper function, setup_path_info(), which can be used to record
all the information we want in a merged_info or conflict_info.  This new
function currently has only one caller, which passes fixed values for
some of its parameters; more callers will be added later.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 90 insertions(+), 7 deletions(-)
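
For illustration only (not part of the patch): a minimal standalone sketch
of the allocation and downcast pattern that setup_path_info() and the
INITIALIZE_CI-style macros below rely on.  The toy_* names are
hypothetical and the structs are reduced to a couple of fields.  The
point is that the smaller struct is the first member of the larger one,
and the clean bit says which of the two was actually allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced version of struct merged_info. */
struct toy_merged {
	unsigned clean; /* 1: only a toy_merged was allocated */
	int result;
};

/* Hypothetical, reduced version of struct conflict_info. */
struct toy_conflict {
	struct toy_merged merged; /* must stay the first member */
	int stages[3];
};

/* Allocate the small struct for resolved paths, the big one otherwise. */
static struct toy_merged *toy_new_path_info(int resolved)
{
	struct toy_merged *mi = calloc(1, resolved ?
				       sizeof(struct toy_merged) :
				       sizeof(struct toy_conflict));
	if (!mi)
		abort();
	mi->clean = !!resolved;
	return mi;
}

/*
 * Downcast only when it is safe: a clean entry is too small to be
 * viewed as a toy_conflict, so return NULL for it.
 */
static struct toy_conflict *toy_as_conflict(struct toy_merged *mi)
{
	return (!mi || mi->clean) ? NULL : (struct toy_conflict *)mi;
}
```

A caller that gets NULL back from toy_as_conflict() knows it may only
touch the toy_merged fields of that entry.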

diff --git a/merge-ort.c b/merge-ort.c
index a6876191c02..bbfc056300b 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -185,6 +185,26 @@ struct conflict_info {
 	unsigned match_mask:3;
 };
 
+/*
+ * For the next three macros, see warning for conflict_info.merged.
+ *
+ * In each of the below, mi is a struct merged_info*, and ci was defined
+ * as a struct conflict_info* (but we need to verify ci isn't actually
+ * pointed at a struct merged_info*).
+ *
+ * INITIALIZE_CI: Assign ci to mi but only if it's safe; set to NULL otherwise.
+ * VERIFY_CI: Ensure that something we assigned to a conflict_info* is one.
+ * ASSIGN_AND_VERIFY_CI: Similar to VERIFY_CI but do assignment first.
+ */
+#define INITIALIZE_CI(ci, mi) do {                                           \
+	(ci) = (!(mi) || (mi)->clean) ? NULL : (struct conflict_info *)(mi); \
+} while (0)
+#define VERIFY_CI(ci) assert((ci) && !(ci)->merged.clean)
+#define ASSIGN_AND_VERIFY_CI(ci, mi) do {    \
+	(ci) = (struct conflict_info *)(mi);  \
+	assert((ci) && !(mi)->clean);        \
+} while (0)
+
 static int err(struct merge_options *opt, const char *err, ...)
 {
 	va_list params;
@@ -201,6 +221,65 @@ static int err(struct merge_options *opt, const char *err, ...)
 	return -1;
 }
 
+static void setup_path_info(struct merge_options *opt,
+			    struct string_list_item *result,
+			    const char *current_dir_name,
+			    int current_dir_name_len,
+			    char *fullpath, /* we'll take over ownership */
+			    struct name_entry *names,
+			    struct name_entry *merged_version,
+			    unsigned is_null,     /* boolean */
+			    unsigned df_conflict, /* boolean */
+			    unsigned filemask,
+			    unsigned dirmask,
+			    int resolved          /* boolean */)
+{
+	/* result->util is void*, so mi is a convenience typed variable */
+	struct merged_info *mi;
+
+	assert(!is_null || resolved);
+	assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
+	assert(resolved == (merged_version != NULL));
+
+	mi = xcalloc(1, resolved ? sizeof(struct merged_info) :
+				   sizeof(struct conflict_info));
+	mi->directory_name = current_dir_name;
+	mi->basename_offset = current_dir_name_len;
+	mi->clean = !!resolved;
+	if (resolved) {
+		mi->result.mode = merged_version->mode;
+		oidcpy(&mi->result.oid, &merged_version->oid);
+		mi->is_null = !!is_null;
+	} else {
+		int i;
+		struct conflict_info *ci;
+
+		ASSIGN_AND_VERIFY_CI(ci, mi);
+		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
+			ci->pathnames[i] = fullpath;
+			ci->stages[i].mode = names[i].mode;
+			oidcpy(&ci->stages[i].oid, &names[i].oid);
+		}
+		ci->filemask = filemask;
+		ci->dirmask = dirmask;
+		ci->df_conflict = !!df_conflict;
+		if (dirmask)
+			/*
+			 * Assume is_null for now, but if we have entries
+			 * under the directory then when it is complete in
+			 * write_completed_directory() it'll update this.
+			 * Also, for D/F conflicts, we have to handle the
+			 * directory first, then clear this bit and process
+			 * the file to see how it is handled -- that occurs
+			 * near the top of process_entry().
+			 */
+			mi->is_null = 1;
+	}
+	strmap_put(&opt->priv->paths, fullpath, mi);
+	result->string = fullpath;
+	result->util = mi;
+}
+
 static int collect_merge_info_callback(int n,
 				       unsigned long mask,
 				       unsigned long dirmask,
@@ -215,10 +294,12 @@ static int collect_merge_info_callback(int n,
 	 */
 	struct merge_options *opt = info->data;
 	struct merge_options_internal *opti = opt->priv;
-	struct conflict_info *ci;
+	struct string_list_item pi;  /* Path Info */
+	struct conflict_info *ci; /* typed alias to pi.util (which is void*) */
 	struct name_entry *p;
 	size_t len;
 	char *fullpath;
+	const char *dirname = opti->current_dir_name;
 	unsigned filemask = mask & ~dirmask;
 	unsigned match_mask = 0; /* will be updated below */
 	unsigned mbase_null = !(mask & 1);
@@ -287,13 +368,15 @@ static int collect_merge_info_callback(int n,
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
 	/*
-	 * TODO: record information about the path other than all zeros,
-	 * so we can resolve later in process_entries.
+	 * Record information about the path so we can resolve later in
+	 * process_entries.
 	 */
-	ci = xcalloc(1, sizeof(struct conflict_info));
-	ci->df_conflict = df_conflict;
+	setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+			names, NULL, 0, df_conflict, filemask, dirmask, 0);
+
+	ci = pi.util;
+	VERIFY_CI(ci);
 	ci->match_mask = match_mask;
-	strmap_put(&opti->paths, fullpath, ci);
 
 	/* If dirmask, recurse into subdirectories */
 	if (dirmask) {
@@ -337,7 +420,7 @@ static int collect_merge_info_callback(int n,
 		}
 
 		original_dir_name = opti->current_dir_name;
-		opti->current_dir_name = fullpath;
+		opti->current_dir_name = pi.string;
 		ret = traverse_trees(NULL, 3, t, &newinfo);
 		opti->current_dir_name = original_dir_name;
 
-- 
gitgitgadget



* [PATCH v3 10/20] merge-ort: avoid recursing into identical trees
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (8 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
                       ` (10 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

When all three trees have the same oid, there is no need to recurse into
these trees to find that all files within them happen to match.  We can
just record any one of the trees as the resolution of merging that
particular path.

Immediately resolving trees for other types of trivial tree merges (such
as one side matches the merge base, or the two sides match each other)
would prevent us from detecting renames for some paths, and thus prevent
us from doing three-way content merges for those paths whose renames we
did not detect.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
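
For illustration only (not part of the patch): a minimal sketch of the
decision described above.  toy_can_resolve_early() is a hypothetical
name; the scenario in the comment is an assumed example of why the other
trivial cases are not short-circuited.

```c
#include <assert.h>

/*
 * match_mask bits: 1 = merge base, 2 = side 1, 4 = side 2.
 *
 * Only match_mask == 7 is resolved while collecting paths.  Assumed
 * example for the other cases: side 1 modifies src/foo.c while side 2
 * renames it to lib/foo.c.  The tree lib/ then matches the merge base
 * on side 1 (match_mask == 3), but taking side 2's lib/ wholesale would
 * mean lib/foo.c is never seen as an individual path, so it could not
 * be paired with src/foo.c for a three-way content merge.
 */
static int toy_can_resolve_early(unsigned match_mask)
{
	assert(match_mask == 0 || match_mask == 3 || match_mask == 5 ||
	       match_mask == 6 || match_mask == 7);
	return match_mask == 7;
}
```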

diff --git a/merge-ort.c b/merge-ort.c
index bbfc056300b..868ac65091b 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -367,6 +367,19 @@ static int collect_merge_info_callback(int n,
 	fullpath = xmalloc(len + 1);
 	make_traverse_path(fullpath, len + 1, info, p->path, p->pathlen);
 
+	/*
+	 * If mbase, side1, and side2 all match, we can resolve early.  Even
+	 * if these are trees, there will be no renames or anything
+	 * underneath.
+	 */
+	if (side1_matches_mbase && side2_matches_mbase) {
+		/* mbase, side1, & side2 all match; use mbase as resolution */
+		setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+				names, names+0, mbase_null, 0,
+				filemask, dirmask, 1);
+		return mask;
+	}
+
 	/*
 	 * Record information about the path so we can resolve later in
 	 * process_entries.
-- 
gitgitgadget



* [PATCH v3 11/20] merge-ort: add a preliminary simple process_entries() implementation
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (9 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
                       ` (9 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add a process_entries() implementation that just loops over the paths
and processes each one individually with an auxiliary process_entry()
call.  Add a basic process_entry() as well, which handles several cases
but leaves a few of the more involved ones with die-not-implemented
messages.  Also, although process_entries() is supposed to create a
tree, it does not yet have code to do so -- except in the special case
of merging completely empty trees.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 1 deletion(-)
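
For illustration only (not part of the patch): a minimal standalone sketch
of the case analysis that process_entry() performs below, reduced to the
two masks.  The enum and toy_classify() are hypothetical names; D/F
conflicts and differing file types are deliberately left out, which is an
assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical labels for the branches of the if/else-if chain. */
enum toy_outcome {
	TOY_TAKE_SIDE1,    /* result is side 1's version (possibly absent) */
	TOY_TAKE_SIDE2,    /* result is side 2's version (possibly absent) */
	TOY_CONTENT_MERGE, /* both sides have differing files */
	TOY_MODIFY_DELETE, /* one side changed it, the other removed it */
	TOY_DELETED_BOTH   /* only the merge base has the path */
};

/* Mask bits: 1 = merge base, 2 = side 1, 4 = side 2. */
static enum toy_outcome toy_classify(unsigned filemask, unsigned match_mask)
{
	assert(filemask >= 1 && filemask <= 7);
	/* match_mask == 7 never gets this far; it was resolved earlier. */
	assert(match_mask == 0 || match_mask == 3 ||
	       match_mask == 5 || match_mask == 6);

	if (match_mask == 6)
		return TOY_TAKE_SIDE1; /* both sides agree; either works */
	if (match_mask) {
		/* One side matches the base; the other side wins. */
		unsigned othermask = 7 & ~match_mask;
		return othermask == 4 ? TOY_TAKE_SIDE2 : TOY_TAKE_SIDE1;
	}
	if (filemask >= 6)
		return TOY_CONTENT_MERGE;
	if (filemask == 3 || filemask == 5)
		return TOY_MODIFY_DELETE;
	if (filemask == 2 || filemask == 4)
		return filemask == 4 ? TOY_TAKE_SIDE2 : TOY_TAKE_SIDE1;
	return TOY_DELETED_BOTH; /* filemask == 1 */
}
```

For example, filemask = 3 with match_mask = 3 means side 1 still matches
the merge base and side 2 deleted the path, so the sketch picks side 2's
(absent) version, i.e. the deletion.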

diff --git a/merge-ort.c b/merge-ort.c
index 868ac65091b..d78b6b0873d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -492,10 +492,111 @@ static int detect_and_process_renames(struct merge_options *opt,
 	return clean;
 }
 
+/* Per entry merge function */
+static void process_entry(struct merge_options *opt,
+			  const char *path,
+			  struct conflict_info *ci)
+{
+	VERIFY_CI(ci);
+	assert(ci->filemask >= 0 && ci->filemask <= 7);
+	/* ci->match_mask == 7 was handled in collect_merge_info_callback() */
+	assert(ci->match_mask == 0 || ci->match_mask == 3 ||
+	       ci->match_mask == 5 || ci->match_mask == 6);
+
+	if (ci->df_conflict) {
+		die("Not yet implemented.");
+	}
+
+	/*
+	 * NOTE: Below there is a long switch-like if-elseif-elseif... block
+	 *       which the code goes through even for the df_conflict cases
+	 *       above.  Well, it will once we don't die-not-implemented above.
+	 */
+	if (ci->match_mask) {
+		ci->merged.clean = 1;
+		if (ci->match_mask == 6) {
+			/* stages[1] == stages[2] */
+			ci->merged.result.mode = ci->stages[1].mode;
+			oidcpy(&ci->merged.result.oid, &ci->stages[1].oid);
+		} else {
+			/* determine the mask of the side that didn't match */
+			unsigned int othermask = 7 & ~ci->match_mask;
+			int side = (othermask == 4) ? 2 : 1;
+
+			ci->merged.result.mode = ci->stages[side].mode;
+			ci->merged.is_null = !ci->merged.result.mode;
+			oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
+
+			assert(othermask == 2 || othermask == 4);
+			assert(ci->merged.is_null ==
+			       (ci->filemask == ci->match_mask));
+		}
+	} else if (ci->filemask >= 6 &&
+		   (S_IFMT & ci->stages[1].mode) !=
+		   (S_IFMT & ci->stages[2].mode)) {
+		/*
+		 * Two different items from (file/submodule/symlink)
+		 */
+		die("Not yet implemented.");
+	} else if (ci->filemask >= 6) {
+		/*
+		 * TODO: Needs a two-way or three-way content merge, but we're
+		 * just being lazy and copying the version from HEAD and
+		 * leaving it as conflicted.
+		 */
+		ci->merged.clean = 0;
+		ci->merged.result.mode = ci->stages[1].mode;
+		oidcpy(&ci->merged.result.oid, &ci->stages[1].oid);
+	} else if (ci->filemask == 3 || ci->filemask == 5) {
+		/* Modify/delete */
+		die("Not yet implemented.");
+	} else if (ci->filemask == 2 || ci->filemask == 4) {
+		/* Added on one side */
+		int side = (ci->filemask == 4) ? 2 : 1;
+		ci->merged.result.mode = ci->stages[side].mode;
+		oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
+		ci->merged.clean = !ci->df_conflict;
+	} else if (ci->filemask == 1) {
+		/* Deleted on both sides */
+		ci->merged.is_null = 1;
+		ci->merged.result.mode = 0;
+		oidcpy(&ci->merged.result.oid, &null_oid);
+		ci->merged.clean = 1;
+	}
+
+	/*
+	 * If still conflicted, record it separately.  This allows us to later
+	 * iterate over just conflicted entries when updating the index instead
+	 * of iterating over all entries.
+	 */
+	if (!ci->merged.clean)
+		strmap_put(&opt->priv->conflicted, path, ci);
+}
+
 static void process_entries(struct merge_options *opt,
 			    struct object_id *result_oid)
 {
-	die("Not yet implemented.");
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+
+	if (strmap_empty(&opt->priv->paths)) {
+		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
+		return;
+	}
+
+	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
+		/*
+		 * NOTE: mi may actually be a pointer to a conflict_info, but
+		 * we have to check mi->clean first to see if it's safe to
+		 * reassign to such a pointer type.
+		 */
+		struct merged_info *mi = e->value;
+
+		if (!mi->clean)
+			process_entry(opt, e->key, e->value);
+	}
+
+	die("Tree creation not yet implemented");
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH v3 12/20] merge-ort: have process_entries operate in a defined order
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (10 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
                       ` (8 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

We want to handle paths below a directory before needing to handle the
directory itself.  Also, we want to handle the directory immediately
after the paths below it, so we can't use simple lexicographic ordering
from strcmp (which would insert foo.txt between foo and foo/file.c).
Copy string_list_df_name_compare() from merge-recursive.c, and set up a
string list of paths sorted by that function so that we can iterate in
the desired order.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 50 insertions(+), 3 deletions(-)
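
For illustration only (not part of the patch): a minimal standalone sketch
of the ordering described above.  toy_df_compare() is a hypothetical,
simplified re-implementation that treats the end of every path as if it
were followed by '/'; the patch itself gets this behaviour from
df_name_compare() with S_IFDIR passed for both modes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compare paths as if each one ended in '/'; shorter path wins ties. */
static int toy_df_compare(const char *one, const char *two)
{
	size_t onelen = strlen(one), twolen = strlen(two);
	size_t len = onelen < twolen ? onelen : twolen;
	int cmp = memcmp(one, two, len);
	unsigned char c1, c2;

	if (cmp)
		return cmp;
	if (onelen == twolen)
		return 0;
	/* Exactly one path ended here; pretend it continues with '/'. */
	c1 = one[len] ? (unsigned char)one[len] : '/';
	c2 = two[len] ? (unsigned char)two[len] : '/';
	if (c1 != c2)
		return c1 < c2 ? -1 : 1;
	/* 'foo' and 'foo/bar' tie so far; make 'foo' come first. */
	return onelen < twolen ? -1 : 1;
}

static int toy_df_compare_qsort(const void *a, const void *b)
{
	return toy_df_compare(*(const char *const *)a,
			      *(const char *const *)b);
}

/* Sort an array of path strings into the order sketched above. */
static void toy_sort_paths(const char **paths, size_t nr)
{
	if (nr)
		qsort(paths, nr, sizeof(*paths), toy_df_compare_qsort);
}
```

Sorting foo/file.c, foo.txt and foo this way yields foo.txt, foo,
foo/file.c, so walking the result backwards visits foo/file.c, then foo,
then foo.txt: the directory comes immediately after the paths below it.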

diff --git a/merge-ort.c b/merge-ort.c
index d78b6b0873d..d83ed8768f5 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -492,6 +492,33 @@ static int detect_and_process_renames(struct merge_options *opt,
 	return clean;
 }
 
+static int string_list_df_name_compare(const char *one, const char *two)
+{
+	int onelen = strlen(one);
+	int twolen = strlen(two);
+	/*
+	 * Here we only care that entries for D/F conflicts are
+	 * adjacent, in particular with the file of the D/F conflict
+	 * appearing before files below the corresponding directory.
+	 * The order of the rest of the list is irrelevant for us.
+	 *
+	 * To achieve this, we sort with df_name_compare and provide
+	 * the mode S_IFDIR so that D/F conflicts will sort correctly.
+	 * We use the mode S_IFDIR for everything else for simplicity,
+	 * since in other cases any changes in their order due to
+	 * sorting cause no problems for us.
+	 */
+	int cmp = df_name_compare(one, onelen, S_IFDIR,
+				  two, twolen, S_IFDIR);
+	/*
+	 * Now that 'foo' and 'foo/bar' compare equal, we have to make sure
+	 * that 'foo' comes before 'foo/bar'.
+	 */
+	if (cmp)
+		return cmp;
+	return onelen - twolen;
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
@@ -578,24 +605,44 @@ static void process_entries(struct merge_options *opt,
 {
 	struct hashmap_iter iter;
 	struct strmap_entry *e;
+	struct string_list plist = STRING_LIST_INIT_NODUP;
+	struct string_list_item *entry;
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
 		return;
 	}
 
+	/* Hack to pre-allocate plist to the desired size */
+	ALLOC_GROW(plist.items, strmap_get_size(&opt->priv->paths), plist.alloc);
+
+	/* Put every entry from paths into plist, then sort */
 	strmap_for_each_entry(&opt->priv->paths, &iter, e) {
+		string_list_append(&plist, e->key)->util = e->value;
+	}
+	plist.cmp = string_list_df_name_compare;
+	string_list_sort(&plist);
+
+	/*
+	 * Iterate over the items in reverse order, so we can handle paths
+	 * below a directory before needing to handle the directory itself.
+	 */
+	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
+		char *path = entry->string;
 		/*
 		 * NOTE: mi may actually be a pointer to a conflict_info, but
 		 * we have to check mi->clean first to see if it's safe to
 		 * reassign to such a pointer type.
 		 */
-		struct merged_info *mi = e->value;
+		struct merged_info *mi = entry->util;
 
-		if (!mi->clean)
-			process_entry(opt, e->key, e->value);
+		if (!mi->clean) {
+			struct conflict_info *ci = (struct conflict_info *)mi;
+			process_entry(opt, path, ci);
+		}
 	}
 
+	string_list_clear(&plist, 0);
 	die("Tree creation not yet implemented");
 }
 
-- 
gitgitgadget



* [PATCH v3 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (11 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
                       ` (7 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

As a step towards transforming the processed path->conflict_info entries
into an actual tree object, start recording basenames, modes, and oids
in a dir_metadata structure.  Subsequent commits will make use of this
to actually write a tree.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)
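
For illustration only (not part of the patch): a minimal standalone sketch
of what gets recorded per path.  The toy_* names and the fixed-size array
are hypothetical.  One deliberate difference is that this sketch finds the
basename by searching for the last '/', whereas the patch uses the
basename_offset stored when the path was collected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical, fixed-size stand-in for the versions string_list. */
struct toy_versions {
	const char *basenames[TOY_MAX];
	size_t nr;
};

/* Offset of the last path component (0 for toplevel paths). */
static size_t toy_basename_offset(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? (size_t)(slash - path) + 1 : 0;
}

/* Record the basename of a path unless the merge result is "no entry". */
static void toy_record(struct toy_versions *v, const char *path, int is_null)
{
	const char *basename;

	if (is_null)
		return; /* nothing to record */

	basename = path + toy_basename_offset(path);
	assert(!strchr(basename, '/'));
	assert(v->nr < TOY_MAX);
	v->basenames[v->nr++] = basename;
}
```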

diff --git a/merge-ort.c b/merge-ort.c
index d83ed8768f5..95369c6a052 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -519,10 +519,31 @@ static int string_list_df_name_compare(const char *one, const char *two)
 	return onelen - twolen;
 }
 
+struct directory_versions {
+	struct string_list versions;
+};
+
+static void record_entry_for_tree(struct directory_versions *dir_metadata,
+				  const char *path,
+				  struct merged_info *mi)
+{
+	const char *basename;
+
+	if (mi->is_null)
+		/* nothing to record */
+		return;
+
+	basename = path + mi->basename_offset;
+	assert(strchr(basename, '/') == NULL);
+	string_list_append(&dir_metadata->versions,
+			   basename)->util = &mi->result;
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
-			  struct conflict_info *ci)
+			  struct conflict_info *ci,
+			  struct directory_versions *dir_metadata)
 {
 	VERIFY_CI(ci);
 	assert(ci->filemask >= 0 && ci->filemask <= 7);
@@ -530,6 +551,14 @@ static void process_entry(struct merge_options *opt,
 	assert(ci->match_mask == 0 || ci->match_mask == 3 ||
 	       ci->match_mask == 5 || ci->match_mask == 6);
 
+	if (ci->dirmask) {
+		record_entry_for_tree(dir_metadata, path, &ci->merged);
+		if (ci->filemask == 0)
+			/* nothing else to handle */
+			return;
+		assert(ci->df_conflict);
+	}
+
 	if (ci->df_conflict) {
 		die("Not yet implemented.");
 	}
@@ -598,6 +627,7 @@ static void process_entry(struct merge_options *opt,
 	 */
 	if (!ci->merged.clean)
 		strmap_put(&opt->priv->conflicted, path, ci);
+	record_entry_for_tree(dir_metadata, path, &ci->merged);
 }
 
 static void process_entries(struct merge_options *opt,
@@ -607,6 +637,7 @@ static void process_entries(struct merge_options *opt,
 	struct strmap_entry *e;
 	struct string_list plist = STRING_LIST_INIT_NODUP;
 	struct string_list_item *entry;
+	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP };
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
@@ -636,13 +667,16 @@ static void process_entries(struct merge_options *opt,
 		 */
 		struct merged_info *mi = entry->util;
 
-		if (!mi->clean) {
+		if (mi->clean)
+			record_entry_for_tree(&dir_metadata, path, mi);
+		else {
 			struct conflict_info *ci = (struct conflict_info *)mi;
-			process_entry(opt, path, ci);
+			process_entry(opt, path, ci, &dir_metadata);
 		}
 	}
 
 	string_list_clear(&plist, 0);
+	string_list_clear(&dir_metadata.versions, 0);
 	die("Tree creation not yet implemented");
 }
 
-- 
gitgitgadget



* [PATCH v3 14/20] merge-ort: step 2 of tree writing -- function to create tree object
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (12 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
                       ` (6 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Create a new function, write_tree(), which will take a list of
basenames, modes, and oids for a single directory and create a tree
object in the object-store.  We do not yet have the basenames, modes,
and oids for just a single directory (we have a mixture of entries from
all directory levels in the hierarchy), so we still die() before the
current call to write_tree(); the next patch will rectify that.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)
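
For illustration only (not part of the patch): a minimal standalone sketch
of how one entry of a tree object is laid out, matching the "%o %s%c"
plus raw hash sequence in write_tree() below.  toy_format_tree_entry() is
a hypothetical helper writing into a caller-supplied buffer; the patch
uses a strbuf instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<octal mode> <name>\0<raw hash>" into out.  Returns the number
 * of bytes written, or 0 if the entry does not fit into cap bytes.
 */
static size_t toy_format_tree_entry(unsigned char *out, size_t cap,
				    unsigned mode, const char *name,
				    const unsigned char *hash,
				    size_t hash_size)
{
	char header[16];
	int hlen = snprintf(header, sizeof(header), "%o ", mode);
	size_t namelen = strlen(name);
	size_t total;

	assert(hlen > 0 && (size_t)hlen < sizeof(header));
	total = (size_t)hlen + namelen + 1 + hash_size;
	if (total > cap)
		return 0;

	memcpy(out, header, (size_t)hlen);
	memcpy(out + hlen, name, namelen + 1); /* copies the NUL too */
	memcpy(out + hlen + namelen + 1, hash, hash_size);
	return total;
}
```

A regular file named "a" with a 20-byte hash takes 6 + 1 + 1 + 1 + 20 = 29
bytes.  A directory's mode prints as "40000", one digit shorter, which is
why the 8 extra bytes reserved per entry in write_tree() are an upper
bound.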

diff --git a/merge-ort.c b/merge-ort.c
index 95369c6a052..f7041cfeac4 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -19,6 +19,7 @@
 
 #include "diff.h"
 #include "diffcore.h"
+#include "object-store.h"
 #include "strmap.h"
 #include "tree.h"
 #include "xdiff-interface.h"
@@ -523,6 +524,62 @@ struct directory_versions {
 	struct string_list versions;
 };
 
+static int tree_entry_order(const void *a_, const void *b_)
+{
+	const struct string_list_item *a = a_;
+	const struct string_list_item *b = b_;
+
+	const struct merged_info *ami = a->util;
+	const struct merged_info *bmi = b->util;
+	return base_name_compare(a->string, strlen(a->string), ami->result.mode,
+				 b->string, strlen(b->string), bmi->result.mode);
+}
+
+static void write_tree(struct object_id *result_oid,
+		       struct string_list *versions,
+		       unsigned int offset,
+		       size_t hash_size)
+{
+	size_t maxlen = 0, extra;
+	unsigned int nr = versions->nr - offset;
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list relevant_entries = STRING_LIST_INIT_NODUP;
+	int i;
+
+	/*
+	 * We want to sort the last (versions->nr-offset) entries in versions.
+	 * Do so by abusing the string_list API a bit: make another string_list
+	 * that contains just those entries and then sort them.
+	 *
+	 * We won't use relevant_entries again and will let it just pop off the
+	 * stack, so there won't be allocation worries or anything.
+	 */
+	relevant_entries.items = versions->items + offset;
+	relevant_entries.nr = versions->nr - offset;
+	QSORT(relevant_entries.items, relevant_entries.nr, tree_entry_order);
+
+	/* Pre-allocate some space in buf */
+	extra = hash_size + 8; /* 8: 6 for mode, 1 for space, 1 for NUL char */
+	for (i = 0; i < nr; i++) {
+		maxlen += strlen(versions->items[offset+i].string) + extra;
+	}
+	strbuf_grow(&buf, maxlen);
+
+	/* Write each entry out to buf */
+	for (i = 0; i < nr; i++) {
+		struct merged_info *mi = versions->items[offset+i].util;
+		struct version_info *ri = &mi->result;
+		strbuf_addf(&buf, "%o %s%c",
+			    ri->mode,
+			    versions->items[offset+i].string, '\0');
+		strbuf_add(&buf, ri->oid.hash, hash_size);
+	}
+
+	/* Write this object file out, and record in result_oid */
+	write_object_file(buf.buf, buf.len, tree_type, result_oid);
+	strbuf_release(&buf);
+}
+
 static void record_entry_for_tree(struct directory_versions *dir_metadata,
 				  const char *path,
 				  struct merged_info *mi)
@@ -675,9 +732,17 @@ static void process_entries(struct merge_options *opt,
 		}
 	}
 
+	/*
+	 * TODO: We can't actually write a tree yet, because dir_metadata just
+	 * contains all basenames of all files throughout the tree with their
+	 * mode and hash.  Not only is that a nonsensical tree, it will have
+	 * lots of duplicates for paths such as "Makefile" or ".gitignore".
+	 */
+	die("Not yet implemented; need to process subtrees separately");
+	write_tree(result_oid, &dir_metadata.versions, 0,
+		   opt->repo->hash_algo->rawsz);
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
-	die("Tree creation not yet implemented");
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH v3 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (13 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
                       ` (5 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Our order for processing of entries means that if we have a tree of
files that looks like
   Makefile
   src/moduleA/foo.c
   src/moduleA/bar.c
   src/moduleB/baz.c
   src/moduleB/umm.c
   tokens.txt

Then we will process paths in the order of the leftmost column below.  I
have added two additional columns that help explain the algorithm that
follows: the second column is a reminder that we track oid & mode info
for each of these paths (the actual values differ from path to path,
which the placeholder does not show), and the third column gives the
parent directory of the entry:
   tokens.txt               <version_info>    ""
   src/moduleB/umm.c        <version_info>    src/moduleB
   src/moduleB/baz.c        <version_info>    src/moduleB
   src/moduleB              <version_info>    src
   src/moduleA/foo.c        <version_info>    src/moduleA
   src/moduleA/bar.c        <version_info>    src/moduleA
   src/moduleA              <version_info>    src
   src                      <version_info>    ""
   Makefile                 <version_info>    ""

When the parent directory changes, if it's a subdirectory of the previous
parent directory (e.g. "" -> src/moduleB) then we can just keep appending.
If the parent directory differs from the previous parent directory and is
not a subdirectory, then we should process that directory.

So, for example, when we get to this point:
   tokens.txt               <version_info>    ""
   src/moduleB/umm.c        <version_info>    src/moduleB
   src/moduleB/baz.c        <version_info>    src/moduleB

and note that the next entry (src/moduleB) has a parent directory (src)
that differs from the previous one and is not a subdirectory of it, we
should write out a tree for src/moduleB
   100644 blob <HASH> umm.c
   100644 blob <HASH> baz.c

then pop all the entries under that directory while recording the new
hash for that directory, leaving us with
   tokens.txt               <version_info>        ""
   src/moduleB              <new version_info>    src

This process repeats until at the end we get to
   tokens.txt               <version_info>        ""
   src                      <new version_info>    ""
   Makefile                 <version_info>        ""

and then we can write out the toplevel tree.  Our string_list can
simultaneously hold entries from multiple different directories, e.g. a
slightly different repository might have:
   whizbang.txt             <version_info>        ""
   tokens.txt               <version_info>        ""
   src/moduleD              <new version_info>    src
   src/moduleC              <new version_info>    src
   src/moduleB              <new version_info>    src
   src/moduleA/foo.c        <version_info>        src/moduleA
   src/moduleA/bar.c        <version_info>        src/moduleA

When src/moduleA is popped off, we need to know that the "last
directory" reverts back to src, and how many entries in our string_list
are associated with that parent directory.  So I use an auxiliary
offsets string_list which would have (parent_directory,offset)
information of the form
   ""             0
   src            2
   src/moduleA    5

Whenever I write out a tree for a subdirectory, I set versions.nr to
the final offset value and then decrement offsets.nr...and then add
an entry to versions with a hash for the new directory.

The idea is relatively simple; there's just a lot of accounting to
implement it.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
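The offsets accounting described in the commit message can be sketched in
isolation.  This is a toy model under stated assumptions, not the code in
this patch: struct dir_stack, struct path_entry, pop_directory(),
enter_directory(), process_all() and example_paths are hypothetical names,
fixed-size arrays stand in for the two string_lists, "writing a tree" is
reduced to bumping a counter, strcmp() replaces the pointer comparison the
real code relies on, and the empty-directory (is_null) case is left out.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 64

/* Hypothetical stand-in for the versions and offsets string_lists. */
struct dir_stack {
	const char *versions[MAX_ENTRIES]; /* basenames not yet in a tree */
	int versions_nr;
	const char *dirs[MAX_ENTRIES];     /* open directories, outermost first */
	int offsets[MAX_ENTRIES];          /* where each directory starts in versions */
	int offsets_nr;
	int trees_written;                 /* number of subtrees "written" so far */
};

struct path_entry {
	const char *basename;
	const char *dir; /* parent directory, "" for the toplevel */
};

/* "Write" a tree for the innermost open directory, then pop it. */
static void pop_directory(struct dir_stack *s)
{
	assert(s->offsets_nr > 0);
	s->trees_written++;
	s->versions_nr = s->offsets[s->offsets_nr - 1]; /* drop its entries */
	s->offsets_nr--;                                /* drop the directory */
}

/* Called with the parent directory of each path before recording the path. */
static void enter_directory(struct dir_stack *s, const char *dir)
{
	const char *last = s->offsets_nr ? s->dirs[s->offsets_nr - 1] : NULL;

	if (last && !strcmp(last, dir))
		return; /* same directory as before: keep appending */
	if (last && strncmp(dir, last, strlen(last)))
		pop_directory(s); /* not below `last`, so `last` is complete */

	last = s->offsets_nr ? s->dirs[s->offsets_nr - 1] : NULL;
	if (!last || strcmp(last, dir)) {
		s->dirs[s->offsets_nr] = dir;
		s->offsets[s->offsets_nr] = s->versions_nr;
		s->offsets_nr++;
	}
}

/* Process paths that are already in reverse lexicographic order. */
static void process_all(struct dir_stack *s, const struct path_entry *e, int n)
{
	int i;

	memset(s, 0, sizeof(*s));
	for (i = 0; i < n; i++) {
		enter_directory(s, e[i].dir);
		s->versions[s->versions_nr++] = e[i].basename;
	}
}

/* The first example from the commit message, in processing order. */
static const struct path_entry example_paths[] = {
	{ "tokens.txt", "" },
	{ "umm.c", "src/moduleB" },
	{ "baz.c", "src/moduleB" },
	{ "moduleB", "src" },
	{ "foo.c", "src/moduleA" },
	{ "bar.c", "src/moduleA" },
	{ "moduleA", "src" },
	{ "src", "" },
	{ "Makefile", "" },
};
```

Feeding example_paths to process_all() pops src/moduleB, src/moduleA and
src in turn, and leaves tokens.txt, src and Makefile as the entries for the
toplevel tree, with a single "" entry at offset 0 remaining in the stack.
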
 merge-ort.c | 242 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 234 insertions(+), 8 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index f7041cfeac4..a7b0df8cb08 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -521,7 +521,46 @@ static int string_list_df_name_compare(const char *one, const char *two)
 }
 
 struct directory_versions {
+	/*
+	 * versions: list of (basename -> version_info)
+	 *
+	 * The basenames are in reverse lexicographic order of full pathnames,
+	 * as processed in process_entries().  This puts all entries within
+	 * a directory together, and covers the directory itself after
+	 * everything within it, allowing us to write subtrees before needing
+	 * to record information for the tree itself.
+	 */
 	struct string_list versions;
+
+	/*
+	 * offsets: list of (full relative path directories -> integer offsets)
+	 *
+	 * Since versions contains basenames from files in multiple different
+	 * directories, we need to know which entries in versions correspond
+	 * to which directories.  Values of e.g.
+	 *     ""             0
+	 *     src            2
+	 *     src/moduleA    5
+	 * Would mean that entries 0-1 of versions are files in the toplevel
+	 * directory, entries 2-4 are files under src/, and the remaining
+	 * entries starting at index 5 are files under src/moduleA/.
+	 */
+	struct string_list offsets;
+
+	/*
+	 * last_directory: directory of the previously processed file
+	 *
+	 * last_directory starts NULL, but records the directory in which the
+	 * previous file was found.  As soon as
+	 *    directory(current_file) != last_directory
+	 * then we need to start updating accounting in versions & offsets.
+	 * Note that last_directory is always the last path in "offsets" (or
+	 * NULL if "offsets" is empty) so this exists just for quick access.
+	 */
+	const char *last_directory;
+
+	/* last_directory_len: cached computation of strlen(last_directory) */
+	unsigned last_directory_len;
 };
 
 static int tree_entry_order(const void *a_, const void *b_)
@@ -596,6 +635,181 @@ static void record_entry_for_tree(struct directory_versions *dir_metadata,
 			   basename)->util = &mi->result;
 }
 
+static void write_completed_directory(struct merge_options *opt,
+				      const char *new_directory_name,
+				      struct directory_versions *info)
+{
+	const char *prev_dir;
+	struct merged_info *dir_info = NULL;
+	unsigned int offset;
+
+	/*
+	 * Some explanation of info->versions and info->offsets...
+	 *
+	 * process_entries() iterates over all relevant files AND
+	 * directories in reverse lexicographic order, and calls this
+	 * function.  Thus, an example of the paths that process_entries()
+	 * could operate on (along with the directories for those paths
+	 * being shown) is:
+	 *
+	 *     xtract.c             ""
+	 *     tokens.txt           ""
+	 *     src/moduleB/umm.c    src/moduleB
+	 *     src/moduleB/stuff.h  src/moduleB
+	 *     src/moduleB/baz.c    src/moduleB
+	 *     src/moduleB          src
+	 *     src/moduleA/foo.c    src/moduleA
+	 *     src/moduleA/bar.c    src/moduleA
+	 *     src/moduleA          src
+	 *     src                  ""
+	 *     Makefile             ""
+	 *
+	 * info->versions:
+	 *
+	 *     always contains the unprocessed entries and their
+	 *     version_info information.  For example, after the first five
+	 *     entries above, info->versions would be:
+	 *
+	 *     	   xtract.c     <xtract.c's version_info>
+	 *     	   tokens.txt   <tokens.txt's version_info>
+	 *     	   umm.c        <src/moduleB/umm.c's version_info>
+	 *     	   stuff.h      <src/moduleB/stuff.h's version_info>
+	 *     	   baz.c        <src/moduleB/baz.c's version_info>
+	 *
+	 *     Once a subdirectory is completed we remove the entries in
+	 *     that subdirectory from info->versions, writing it as a tree
+	 *     (write_tree()).  Thus, as soon as we get to src/moduleB,
+	 *     info->versions would be updated to
+	 *
+	 *     	   xtract.c     <xtract.c's version_info>
+	 *     	   tokens.txt   <tokens.txt's version_info>
+	 *     	   moduleB      <src/moduleB's version_info>
+	 *
+	 * info->offsets:
+	 *
+	 *     helps us track which entries in info->versions correspond to
+	 *     which directories.  When we are N directories deep (e.g. 4
+	 *     for src/modA/submod/subdir/), we have up to N+1 unprocessed
+	 *     directories (+1 because of toplevel dir).  Corresponding to
+	 *     the info->versions example above, after processing five entries
+	 *     info->offsets will be:
+	 *
+	 *     	   ""           0
+	 *     	   src/moduleB  2
+	 *
+	 *     which is used to know that xtract.c & tokens.txt are from the
+	 *     toplevel directory, while umm.c & stuff.h & baz.c are from the
+	 *     src/moduleB directory.  Again, following the example above,
+	 *     once we need to process src/moduleB, then info->offsets is
+	 *     updated to
+	 *
+	 *     	   ""           0
+	 *     	   src          2
+	 *
+	 *     which says that moduleB (and only moduleB so far) is in the
+	 *     src directory.
+	 *
+	 *     One unique thing to note about info->offsets here is that
+	 *     "src" was not added to info->offsets until there was a path
+	 *     (a file OR directory) immediately below src/ that got
+	 *     processed.
+	 *
+	 * Since process_entry() just appends new entries to info->versions,
+	 * write_completed_directory() only needs to do work if the next path
+	 * is in a directory that is different than the last directory found
+	 * in info->offsets.
+	 */
+
+	/*
+	 * If we are working with the same directory as the last entry, there
+	 * is no work to do.  (See comments above the directory_name member of
+	 * struct merged_info for why we can use pointer comparison instead of
+	 * strcmp here.)
+	 */
+	if (new_directory_name == info->last_directory)
+		return;
+
+	/*
+	 * If we are just starting (last_directory is NULL), or last_directory
+	 * is a prefix of the current directory, then we can just update
+	 * info->offsets to record the offset where we started this directory
+	 * and update last_directory to have quick access to it.
+	 */
+	if (info->last_directory == NULL ||
+	    !strncmp(new_directory_name, info->last_directory,
+		     info->last_directory_len)) {
+		uintptr_t offset = info->versions.nr;
+
+		info->last_directory = new_directory_name;
+		info->last_directory_len = strlen(info->last_directory);
+		/*
+		 * Record the offset into info->versions where we will
+		 * start recording basenames of paths found within
+		 * new_directory_name.
+		 */
+		string_list_append(&info->offsets,
+				   info->last_directory)->util = (void*)offset;
+		return;
+	}
+
+	/*
+	 * The next entry that will be processed will be within
+	 * new_directory_name.  Since at this point we know that
+	 * new_directory_name is within a different directory than
+	 * info->last_directory, we have all entries for info->last_directory
+	 * in info->versions and we need to create a tree object for them.
+	 */
+	dir_info = strmap_get(&opt->priv->paths, info->last_directory);
+	assert(dir_info);
+	offset = (uintptr_t)info->offsets.items[info->offsets.nr-1].util;
+	if (offset == info->versions.nr) {
+		/*
+		 * Actually, we don't need to create a tree object in this
+		 * case.  Whenever all files within a directory disappear
+		 * during the merge (e.g. unmodified on one side and
+		 * deleted on the other, or files were renamed elsewhere),
+		 * then we get here and the directory itself needs to be
+		 * omitted from its parent tree as well.
+		 */
+		dir_info->is_null = 1;
+	} else {
+		/*
+		 * Write out the tree to the git object directory, and also
+		 * record the mode and oid in dir_info->result.
+		 */
+		dir_info->is_null = 0;
+		dir_info->result.mode = S_IFDIR;
+		write_tree(&dir_info->result.oid, &info->versions, offset,
+			   opt->repo->hash_algo->rawsz);
+	}
+
+	/*
+	 * We've now used several entries from info->versions and one entry
+	 * from info->offsets, so we get rid of those values.
+	 */
+	info->offsets.nr--;
+	info->versions.nr = offset;
+
+	/*
+	 * Now we've taken care of the completed directory, but we need to
+	 * prepare things since future entries will be in
+	 * new_directory_name.  (In particular, process_entry() will be
+	 * appending new entries to info->versions.)  So, we need to make
+	 * sure new_directory_name is the last entry in info->offsets.
+	 */
+	prev_dir = info->offsets.nr == 0 ? NULL :
+		   info->offsets.items[info->offsets.nr-1].string;
+	if (new_directory_name != prev_dir) {
+		uintptr_t c = info->versions.nr;
+		string_list_append(&info->offsets,
+				   new_directory_name)->util = (void*)c;
+	}
+
+	/* And, of course, we need to update last_directory to match. */
+	info->last_directory = new_directory_name;
+	info->last_directory_len = strlen(info->last_directory);
+}
+
 /* Per entry merge function */
 static void process_entry(struct merge_options *opt,
 			  const char *path,
@@ -694,7 +908,9 @@ static void process_entries(struct merge_options *opt,
 	struct strmap_entry *e;
 	struct string_list plist = STRING_LIST_INIT_NODUP;
 	struct string_list_item *entry;
-	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP };
+	struct directory_versions dir_metadata = { STRING_LIST_INIT_NODUP,
+						   STRING_LIST_INIT_NODUP,
+						   NULL, 0 };
 
 	if (strmap_empty(&opt->priv->paths)) {
 		oidcpy(result_oid, opt->repo->hash_algo->empty_tree);
@@ -714,6 +930,11 @@ static void process_entries(struct merge_options *opt,
 	/*
 	 * Iterate over the items in reverse order, so we can handle paths
 	 * below a directory before needing to handle the directory itself.
+	 *
+	 * This allows us to write subtrees before we need to write trees,
+	 * and it also enables sane handling of directory/file conflicts
+	 * (because it allows us to know whether the directory is still in
+	 * the way when it is time to process the file at the same path).
 	 */
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
@@ -724,6 +945,8 @@ static void process_entries(struct merge_options *opt,
 		 */
 		struct merged_info *mi = entry->util;
 
+		write_completed_directory(opt, mi->directory_name,
+					  &dir_metadata);
 		if (mi->clean)
 			record_entry_for_tree(&dir_metadata, path, mi);
 		else {
@@ -732,17 +955,20 @@ static void process_entries(struct merge_options *opt,
 		}
 	}
 
-	/*
-	 * TODO: We can't actually write a tree yet, because dir_metadata just
-	 * contains all basenames of all files throughout the tree with their
-	 * mode and hash.  Not only is that a nonsensical tree, it will have
-	 * lots of duplicates for paths such as "Makefile" or ".gitignore".
-	 */
-	die("Not yet implemented; need to process subtrees separately");
+	if (dir_metadata.offsets.nr != 1 ||
+	    (uintptr_t)dir_metadata.offsets.items[0].util != 0) {
+		printf("dir_metadata.offsets.nr = %d (should be 1)\n",
+		       dir_metadata.offsets.nr);
+		printf("dir_metadata.offsets.items[0].util = %u (should be 0)\n",
+		       (unsigned)(uintptr_t)dir_metadata.offsets.items[0].util);
+		fflush(stdout);
+		BUG("dir_metadata accounting completely off; shouldn't happen");
+	}
 	write_tree(result_oid, &dir_metadata.versions, 0,
 		   opt->repo->hash_algo->rawsz);
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
+	string_list_clear(&dir_metadata.offsets, 0);
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH v3 16/20] merge-ort: basic outline for merge_switch_to_result()
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (14 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
                       ` (4 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This adds a basic implementation for merge_switch_to_result(), though
just in terms of a few new stub functions that will be implemented in
subsequent commits.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index a7b0df8cb08..ee7fbe71404 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -971,13 +971,53 @@ static void process_entries(struct merge_options *opt,
 	string_list_clear(&dir_metadata.offsets, 0);
 }
 
+static int checkout(struct merge_options *opt,
+		    struct tree *prev,
+		    struct tree *next)
+{
+	die("Not yet implemented.");
+}
+
+static int record_conflicted_index_entries(struct merge_options *opt,
+					   struct index_state *index,
+					   struct strmap *paths,
+					   struct strmap *conflicted)
+{
+	if (strmap_empty(conflicted))
+		return 0;
+
+	die("Not yet implemented.");
+}
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
 			    int update_worktree_and_index,
 			    int display_update_msgs)
 {
-	die("Not yet implemented");
+	assert(opt->priv == NULL);
+	if (result->clean >= 0 && update_worktree_and_index) {
+		struct merge_options_internal *opti = result->priv;
+
+		if (checkout(opt, head, result->tree)) {
+			/* failure to function */
+			result->clean = -1;
+			return;
+		}
+
+		if (record_conflicted_index_entries(opt, opt->repo->index,
+						    &opti->paths,
+						    &opti->conflicted)) {
+			/* failure to function */
+			result->clean = -1;
+			return;
+		}
+	}
+
+	if (display_update_msgs) {
+		/* TODO: print out CONFLICT and other informational messages. */
+	}
+
 	merge_finalize(opt, result);
 }
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH v3 17/20] merge-ort: add implementation of checkout()
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (15 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
                       ` (3 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Since merge-ort creates a tree for its output, when there are no
conflicts, updating the working tree and index is as simple as using the
unpack_trees() machinery with a twoway_merge (i.e. doing the equivalent
of a "checkout" operation).

If there were conflicts in the merge, then since the tree we created
included all the conflict markers, using the unpack_trees machinery
in this manner will still update the working tree correctly.  Further,
all index entries corresponding to cleanly merged files will also be
updated correctly by this procedure.  Index entries corresponding to
conflicted entries will appear as though the user had run "git add -u"
after the merge to accept all files as-is with conflict markers.

Thus, after running unpack_trees(), there needs to be a separate step
for updating the entries in the index corresponding to conflicted files.
This will be the job for the function record_conflicted_index_entries(),
which will be implemented in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index ee7fbe71404..3c4f64e2675 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -19,9 +19,11 @@
 
 #include "diff.h"
 #include "diffcore.h"
+#include "dir.h"
 #include "object-store.h"
 #include "strmap.h"
 #include "tree.h"
+#include "unpack-trees.h"
 #include "xdiff-interface.h"
 
 /*
@@ -975,7 +977,48 @@ static int checkout(struct merge_options *opt,
 		    struct tree *prev,
 		    struct tree *next)
 {
-	die("Not yet implemented.");
+	/* Switch the index/working copy from old to new */
+	int ret;
+	struct tree_desc trees[2];
+	struct unpack_trees_options unpack_opts;
+
+	memset(&unpack_opts, 0, sizeof(unpack_opts));
+	unpack_opts.head_idx = -1;
+	unpack_opts.src_index = opt->repo->index;
+	unpack_opts.dst_index = opt->repo->index;
+
+	setup_unpack_trees_porcelain(&unpack_opts, "merge");
+
+	/*
+	 * NOTE: if this were just "git checkout" code, we would probably
+	 * read or refresh the cache and check for a conflicted index, but
+	 * builtin/merge.c or sequencer.c really needs to read the index
+	 * and check for conflicted entries before starting merging for a
+	 * good user experience (no sense waiting for merges/rebases before
+	 * erroring out), so there's no reason to duplicate that work here.
+	 */
+
+	/* 2-way merge to the new branch */
+	unpack_opts.update = 1;
+	unpack_opts.merge = 1;
+	unpack_opts.quiet = 0; /* FIXME: sequencer might want quiet? */
+	unpack_opts.verbose_update = (opt->verbosity > 2);
+	unpack_opts.fn = twoway_merge;
+	if (1/* FIXME: opts->overwrite_ignore*/) {
+		unpack_opts.dir = xcalloc(1, sizeof(*unpack_opts.dir));
+		unpack_opts.dir->flags |= DIR_SHOW_IGNORED;
+		setup_standard_excludes(unpack_opts.dir);
+	}
+	parse_tree(prev);
+	init_tree_desc(&trees[0], prev->buffer, prev->size);
+	parse_tree(next);
+	init_tree_desc(&trees[1], next->buffer, next->size);
+
+	ret = unpack_trees(2, trees, &unpack_opts);
+	clear_unpack_trees_porcelain(&unpack_opts);
+	dir_clear(unpack_opts.dir);
+	FREE_AND_NULL(unpack_opts.dir);
+	return ret;
 }
 
 static int record_conflicted_index_entries(struct merge_options *opt,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH v3 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (16 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
                       ` (2 subsequent siblings)
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 tree.c | 2 +-
 tree.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tree.c b/tree.c
index e76517f6b18..a52479812ce 100644
--- a/tree.c
+++ b/tree.c
@@ -144,7 +144,7 @@ int read_tree_recursive(struct repository *r,
 	return ret;
 }
 
-static int cmp_cache_name_compare(const void *a_, const void *b_)
+int cmp_cache_name_compare(const void *a_, const void *b_)
 {
 	const struct cache_entry *ce1, *ce2;
 
diff --git a/tree.h b/tree.h
index 93837450739..3eb0484cbf2 100644
--- a/tree.h
+++ b/tree.h
@@ -28,6 +28,8 @@ void free_tree_buffer(struct tree *tree);
 /* Parses and returns the tree in the given ent, chasing tags and commits. */
 struct tree *parse_tree_indirect(const struct object_id *oid);
 
+int cmp_cache_name_compare(const void *a_, const void *b_);
+
 #define READ_TREE_RECURSIVE 1
 typedef int (*read_tree_fn_t)(const struct object_id *, struct strbuf *, const char *, unsigned int, int, void *);
 
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH v3 19/20] merge-ort: add implementation of record_conflicted_index_entries()
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (17 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-13  8:04     ` [PATCH v3 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
  2020-12-14 14:24     ` [PATCH v3 00/20] fundamentals of merge-ort implementation Felipe Contreras
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

After checkout(), the working tree has the appropriate contents, and the
index matches the working copy.  That means that all unmodified and
cleanly merged files have correct index entries, but conflicted entries
need to be updated.

We do this by looping over the conflicted entries, marking the existing
index entry for the path with CE_REMOVE, adding new higher order stages
for the path at the end of the index (ignoring normal index sort order),
and then at the end of the loop removing the CE_REMOVE-marked cache
entries and sorting the index.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
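The mark, append, then sort-once approach from the commit message can be
sketched with a toy index.  Everything here is an assumption made for
illustration: struct toy_entry, struct toy_index, toy_cmp(), toy_find(),
toy_mark_conflicted() and toy_finish() are hypothetical names, the `remove`
field plays the role of CE_REMOVE, and the comparison only orders by name
and then stage, which is simpler than real cache entry comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX 32

/* Hypothetical, simplified stand-in for a cache entry. */
struct toy_entry {
	const char *name;
	int stage;  /* 0 = merged, 1..3 = conflict stages */
	int remove; /* plays the role of CE_REMOVE */
};

struct toy_index {
	struct toy_entry e[TOY_MAX];
	int nr;
};

/* Order by name, then by stage. */
static int toy_cmp(const void *a_, const void *b_)
{
	const struct toy_entry *a = a_, *b = b_;
	int c = strcmp(a->name, b->name);

	return c ? c : a->stage - b->stage;
}

/* Binary search restricted to the first sorted_nr (still sorted) entries. */
static int toy_find(const struct toy_index *idx, int sorted_nr,
		    const char *name)
{
	int lo = 0, hi = sorted_nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int c = strcmp(idx->e[mid].name, name);

		if (!c)
			return mid;
		if (c < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/*
 * Mark the stage 0 entry for removal and append one entry per bit set in
 * filemask (bit i becomes stage i+1), ignoring sort order for now.
 */
static void toy_mark_conflicted(struct toy_index *idx, int sorted_nr,
				const char *name, unsigned filemask)
{
	int pos = toy_find(idx, sorted_nr, name);
	int i;

	if (pos >= 0)
		idx->e[pos].remove = 1;
	for (i = 0; i < 3; i++) {
		if (!(filemask & (1u << i)))
			continue;
		assert(idx->nr < TOY_MAX);
		idx->e[idx->nr].name = name;
		idx->e[idx->nr].stage = i + 1;
		idx->e[idx->nr].remove = 0;
		idx->nr++;
	}
}

/* Drop the marked entries in one pass, then sort the whole index once. */
static void toy_finish(struct toy_index *idx)
{
	int i, j = 0;

	for (i = 0; i < idx->nr; i++)
		if (!idx->e[i].remove)
			idx->e[j++] = idx->e[i];
	idx->nr = j;
	qsort(idx->e, idx->nr, sizeof(idx->e[0]), toy_cmp);
}
```

Starting from sorted stage 0 entries a.c, b.c and c.c, marking b.c with
filemask 7 and a.c with filemask 6 (searching only the original three
entries each time) and then calling toy_finish() leaves a.c at stages 2
and 3, b.c at stages 1 to 3, and c.c untouched at stage 0.  Each conflicted
path costs one append rather than an insertion into the middle of the
array, which is the point of deferring the sort.
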
 merge-ort.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 3c4f64e2675..47cd772e805 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,7 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "cache-tree.h"
 #include "diff.h"
 #include "diffcore.h"
 #include "dir.h"
@@ -1026,10 +1027,95 @@ static int record_conflicted_index_entries(struct merge_options *opt,
 					   struct strmap *paths,
 					   struct strmap *conflicted)
 {
+	struct hashmap_iter iter;
+	struct strmap_entry *e;
+	int errs = 0;
+	int original_cache_nr;
+
 	if (strmap_empty(conflicted))
 		return 0;
 
-	die("Not yet implemented.");
+	original_cache_nr = index->cache_nr;
+
+	/* Swap each conflicted path's stage 0 entry for stage>0 entries */
+	strmap_for_each_entry(conflicted, &iter, e) {
+		const char *path = e->key;
+		struct conflict_info *ci = e->value;
+		int pos;
+		struct cache_entry *ce;
+		int i;
+
+		VERIFY_CI(ci);
+
+		/*
+		 * The index will already have a stage=0 entry for this path,
+		 * because we created an as-merged-as-possible version of the
+		 * file and checkout() moved the working copy and index over
+		 * to that version.
+		 *
+		 * However, previous iterations through this loop will have
+		 * added higher order stage entries to the end of the cache
+		 * which ignore the standard alphabetical ordering of cache
+		 * entries and break invariants needed for index_name_pos()
+		 * to work.  Fortunately, we know the entry we want is before
+		 * those appended cache entries, so do a temporary swap on
+		 * cache_nr to only look through entries of interest.
+		 */
+		SWAP(index->cache_nr, original_cache_nr);
+		pos = index_name_pos(index, path, strlen(path));
+		SWAP(index->cache_nr, original_cache_nr);
+		if (pos < 0) {
+			if (ci->filemask != 1)
+				BUG("Conflicted %s but nothing in basic working tree or index; this shouldn't happen", path);
+			cache_tree_invalidate_path(index, path);
+		} else {
+			ce = index->cache[pos];
+
+			/*
+			 * Clean paths with CE_SKIP_WORKTREE set will not be
+			 * written to the working tree by the unpack_trees()
+			 * call in checkout().  Our conflicted entries would
+			 * have appeared clean to that code since we ignored
+			 * the higher order stages.  Thus, we need to override
+			 * the CE_SKIP_WORKTREE bit and manually write those
+			 * files to the working disk here.
+			 *
+			 * TODO: Implement this CE_SKIP_WORKTREE fixup.
+			 */
+
+			/*
+			 * Mark this cache entry for removal and instead add
+			 * new stage>0 entries corresponding to the
+			 * conflicts.  If there are many conflicted entries, we
+			 * want to avoid memmove'ing O(NM) entries by
+			 * inserting the new entries one at a time.  So,
+			 * instead, we just add the new cache entries to the
+			 * end (ignoring normal index requirements on sort
+			 * order) and sort the index once we're all done.
+			 */
+			ce->ce_flags |= CE_REMOVE;
+		}
+
+		for (i = MERGE_BASE; i <= MERGE_SIDE2; i++) {
+			struct version_info *vi;
+			if (!(ci->filemask & (1ul << i)))
+				continue;
+			vi = &ci->stages[i];
+			ce = make_cache_entry(index, vi->mode, &vi->oid,
+					      path, i+1, 0);
+			add_index_entry(index, ce, ADD_CACHE_JUST_APPEND);
+		}
+	}
+
+	/*
+	 * Remove the unused cache entries (and invalidate the relevant
+	 * cache-trees), then sort the index entries to get the conflicted
+	 * entries we added to the end into their right locations.
+	 */
+	remove_marked_cache_entries(index, 1);
+	QSORT(index->cache, index->cache_nr, cmp_cache_name_compare);
+
+	return errs;
 }
 
 void merge_switch_to_result(struct merge_options *opt,
-- 
gitgitgadget


^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH v3 20/20] merge-ort: free data structures in merge_finalize()
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (18 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
@ 2020-12-13  8:04     ` Elijah Newren via GitGitGadget
  2020-12-14 14:24     ` [PATCH v3 00/20] fundamentals of merge-ort implementation Felipe Contreras
  20 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-13  8:04 UTC (permalink / raw)
  To: git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 47cd772e805..51b049358e4 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -209,6 +209,16 @@ struct conflict_info {
 	assert((ci) && !(mi)->clean);        \
 } while (0)
 
+static void free_strmap_strings(struct strmap *map)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strmap_for_each_entry(map, &iter, entry) {
+		free((char*)entry->key);
+	}
+}
+
 static int err(struct merge_options *opt, const char *err, ...)
 {
 	va_list params;
@@ -1153,7 +1163,27 @@ void merge_switch_to_result(struct merge_options *opt,
 void merge_finalize(struct merge_options *opt,
 		    struct merge_result *result)
 {
-	die("Not yet implemented");
+	struct merge_options_internal *opti = result->priv;
+
+	assert(opt->priv == NULL);
+
+	/*
+	 * We marked opti->paths with strdup_strings = 0, so that we
+	 * wouldn't have to make another copy of the fullpath created by
+	 * make_traverse_path from setup_path_info().  But, now that we've
+	 * used it and have no other references to these strings, it is time
+	 * to deallocate them.
+	 */
+	free_strmap_strings(&opti->paths);
+	strmap_clear(&opti->paths, 1);
+
+	/*
+	 * All keys and values in opti->conflicted are a subset of those in
+	 * opti->paths.  We don't want to deallocate anything twice, so we
+	 * don't free the keys and we pass 0 for free_values.
+	 */
+	strmap_clear(&opti->conflicted, 0);
+	FREE_AND_NULL(opti);
 }
 
 static void merge_start(struct merge_options *opt, struct merge_result *result)
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 84+ messages in thread

* RE: [PATCH v3 00/20] fundamentals of merge-ort implementation
  2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
                       ` (19 preceding siblings ...)
  2020-12-13  8:04     ` [PATCH v3 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
@ 2020-12-14 14:24     ` Felipe Contreras
  2020-12-14 16:24       ` Elijah Newren
  20 siblings, 1 reply; 84+ messages in thread
From: Felipe Contreras @ 2020-12-14 14:24 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: jonathantanmy, dstolee, Elijah Newren,
	Ævar Arnfjörð Bjarmason, Elijah Newren

Elijah Newren via GitGitGadget wrote:
> This is actually v5 of this series, and is being sent due to review comments
> from a different series, namely en/merge-ort-3[1].
> 
> I have rerolls of en/merge-ort-2 and en/merge-ort-3 already prepared, but
> since gitgitgadget will not allow me to send a series dependent on a
> not-published-by-Junio series, I cannot yet send them. You will need to
> temporarily drop them, and I'll resend after you publish the updated version
> of this series. I do not like this solution, and I was tempted to just push
> the updates into en/merge-ort-3, but since this series was still hanging in
> 'seen' awaiting feedback and a lot of the suggestions were for things from
> this series, I decided to go this route anyway...

You could send it the old-fashioned way.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v3 00/20] fundamentals of merge-ort implementation
  2020-12-14 14:24     ` [PATCH v3 00/20] fundamentals of merge-ort implementation Felipe Contreras
@ 2020-12-14 16:24       ` Elijah Newren
  0 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren @ 2020-12-14 16:24 UTC (permalink / raw)
  To: Felipe Contreras
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Tan,
	Derrick Stolee, Ævar Arnfjörð Bjarmason

On Mon, Dec 14, 2020 at 6:24 AM Felipe Contreras
<felipe.contreras@gmail.com> wrote:
>
> Elijah Newren via GitGitGadget wrote:
> > This is actually v5 of this series, and is being sent due to review comments
> > from a different series, namely en/merge-ort-3[1].
> >
> > I have rerolls of en/merge-ort-2 and en/merge-ort-3 already prepared, but
> > since gitgitgadget will not allow me to send a series dependent on a
> > not-published-by-Junio series, I cannot yet send them. You will need to
> > temporarily drop them, and I'll resend after you publish the updated version
> > of this series. I do not like this solution, and I was tempted to just push
> > the updates into en/merge-ort-3, but since this series was still hanging in
> > 'seen' awaiting feedback and a lot of the suggestions were for things from
> > this series, I decided to go this route anyway...
>
> You could send it the old-fashioned way.

As mentioned in the first line of the message, I already did the first
two rounds that way.  There are sometimes reasons to use send-email,
but gitgitgadget comes with really nice cross-platform testing so I do
tend to prefer it.  Also, mix-and-matching between send-email and
gitgitgadget causes some confusion on which-number-in-the-series-is-it
counts (as noted above), so I tend to avoid it.

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-11 20:48     ` Derrick Stolee
@ 2020-11-11 21:18       ` Elijah Newren
  0 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren @ 2020-11-11 21:18 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

Hi Derrick,

On Wed, Nov 11, 2020 at 12:48 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/11/2020 1:35 PM, Elijah Newren wrote:
> > On Wed, Nov 11, 2020 at 9:09 AM Derrick Stolee <stolee@gmail.com> wrote:
> >> For the series as a whole I'd love to see at least one test that
> >> demonstrates that this code does something, if even only for a very
> >> narrow case.
> >>
> >> There's a lot of code being moved here, and it would be nice to have
> >> even a very simple test case that can check that we didn't leave any
> >> important die("not implemented") calls lying around or worse accessing
> >> an uninitialized pointer or something.
> >
> > We absolutely left several die("not implemented") calls lying around.
> > The series was long enough at 20 patches; reviewers lose steam at 10
> > (at least both you and Jonathan have), so maybe I should have left
> > even more in there as an attempt to split up this series more.
> >
> > However, if you run the testsuite with GIT_TEST_MERGE_ALGORITHM=ort,
> > then this series drops the number of failures in the testsuite from
> > around 2200, down to 1500.  So, there's about 700 testcases for you.
>
> Sorry that I'm jumping in to the series-of-series in the middle, so
> I am unfamiliar with the previous progress and testing strategy. This

Not a problem at all.  Thanks much for jumping in and taking a look!
You always provide some good feedback and suggestions.

(Besides, those testcase changes have been spread over two and a half
years...hard to stay on top of all of them.)

> "number of test failures" metric is sufficient to demonstrate the
> progress provided in this series. Perhaps it was even in your v1 cover
> letter.

Um, oops; it's not.  I did mention there were still some "not
implemented" messages left, but didn't mention the testcase counts.
But even that mention is apparently in the v1 cover letter rather than
v2, and v2 wasn't sent in-reply-to v1, so it's harder to catch that.
Sorry about that; I'll include the testcase counts in the v3 cover
letter.

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-11 18:35   ` Elijah Newren
@ 2020-11-11 20:48     ` Derrick Stolee
  2020-11-11 21:18       ` Elijah Newren
  0 siblings, 1 reply; 84+ messages in thread
From: Derrick Stolee @ 2020-11-11 20:48 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

On 11/11/2020 1:35 PM, Elijah Newren wrote:
> On Wed, Nov 11, 2020 at 9:09 AM Derrick Stolee <stolee@gmail.com> wrote:
>> For the series as a whole I'd love to see at least one test that
>> demonstrates that this code does something, if even only for a very
>> narrow case.
>>
>> There's a lot of code being moved here, and it would be nice to have
>> even a very simple test case that can check that we didn't leave any
>> important die("not implemented") calls lying around or worse accessing
>> an uninitialized pointer or something.
> 
> We absolutely left several die("not implemented") calls lying around.
> The series was long enough at 20 patches; reviewers lose steam at 10
> (at least both you and Jonathan have), so maybe I should have left
> even more in there as an attempt to split up this series more.
> 
> However, if you run the testsuite with GIT_TEST_MERGE_ALGORITHM=ort,
> then this series drops the number of failures in the testsuite from
> around 2200, down to 1500.  So, there's about 700 testcases for you.

Sorry that I'm jumping in to the series-of-series in the middle, so
I am unfamiliar with the previous progress and testing strategy. This
"number of test failures" metric is sufficient to demonstrate the
progress provided in this series. Perhaps it was even in your v1 cover
letter.

Thanks,
-Stolee


* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-11 17:08 ` Derrick Stolee
@ 2020-11-11 18:35   ` Elijah Newren
  2020-11-11 20:48     ` Derrick Stolee
  0 siblings, 1 reply; 84+ messages in thread
From: Elijah Newren @ 2020-11-11 18:35 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

On Wed, Nov 11, 2020 at 9:09 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/2/2020 3:43 PM, Elijah Newren wrote:
> > Elijah Newren (20):
> >   merge-ort: setup basic internal data structures
> >   merge-ort: add some high-level algorithm structure
> >   merge-ort: port merge_start() from merge-recursive
> >   merge-ort: use histogram diff
> >   merge-ort: add an err() function similar to one from merge-recursive
> >   merge-ort: implement a very basic collect_merge_info()
> >   merge-ort: avoid repeating fill_tree_descriptor() on the same tree
> >   merge-ort: compute a few more useful fields for collect_merge_info
> >   merge-ort: record stage and auxiliary info for every path
> >   merge-ort: avoid recursing into identical trees
> >   merge-ort: add a preliminary simple process_entries() implementation
> >   merge-ort: have process_entries operate in a defined order
>
> I got this far before my attention to detail really started slipping.
>
> >   merge-ort: step 1 of tree writing -- record basenames, modes, and oids
> >   merge-ort: step 2 of tree writing -- function to create tree object
> >   merge-ort: step 3 of tree writing -- handling subdirectories as we go
> >   merge-ort: basic outline for merge_switch_to_result()
> >   merge-ort: add implementation of checkout()
> >   tree: enable cmp_cache_name_compare() to be used elsewhere
> >   merge-ort: add implementation of record_unmerged_index_entries()
> >   merge-ort: free data structures in merge_finalize()
>
> I'll try to take another pass on these commits tomorrow.
>
> For the series as a whole I'd love to see at least one test that
> demonstrates that this code does something, if even only for a very
> narrow case.
>
> There's a lot of code being moved here, and it would be nice to have
> even a very simple test case that can check that we didn't leave any
> important die("not implemented") calls lying around or worse accessing
> an uninitialized pointer or something.

We absolutely left several die("not implemented") calls lying around.
The series was long enough at 20 patches; reviewers lose steam at 10
(at least both you and Jonathan have), so maybe I should have left
even more in there as an attempt to split up this series more.

However, if you run the testsuite with GIT_TEST_MERGE_ALGORITHM=ort,
then this series drops the number of failures in the testsuite from
around 2200, down to 1500.  So, there's about 700 testcases for you.
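
As a rough sketch of how such failure counts could be tallied (an
illustration only, not how the numbers above were produced): run the
suite with GIT_TEST_MERGE_ALGORITHM=ort set in the environment, save the
TAP output of the test scripts, and count the "not ok" lines.  The
count_failures helper and the sample file below are hypothetical, and how
the output gets captured is left open.

```shell
# Hypothetical helper (not part of git): count unexpected failures in
# saved TAP output, skipping expected "known breakage" entries.
count_failures() {
    grep -h '^not ok' "$@" | grep -vc '# TODO known breakage'
}

# Tiny made-up sample so the helper can be tried without a git.git build.
sample=$(mktemp)
cat >"$sample" <<'EOF'
ok 1 - setup
not ok 2 - rename/add conflict
not ok 3 - directory rename # TODO known breakage
ok 4 - trivial cherry-pick
EOF

count_failures "$sample"    # prints 1
```

Running the same tally on a build before and after the series would give
the two numbers to compare.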

Also, there were several preparatory series all designed for getting
the testsuite in order for this new merge algorithm.  See the
following currently cooking topics:
  * en/merge-tests topic
  * en/dir-rename-tests
and the following topics that were previously merged:
  * 36d225c7d4 ("Merge branch 'en/merge-tests'", 2020-08-19)
  * cf372dc815 ("Merge branch 'en/test-cleanup'", 2020-03-09)
  * ac193e0e0a ("Merge branch 'en/merge-path-collision'", 2019-01-04)
  * c99033060f ("Merge branch 'en/t7405-recursive-submodule-conflicts'", 2018-08-02)
  * e6da45c7cd ("Merge branch 'en/t6036-merge-recursive-tests'", 2018-08-02)
  * 84e74c6403 ("Merge branch 'en/t6042-insane-merge-rename-testcases'", 2018-08-02)
  * bba1a5559c ("Merge branch 'en/t6036-recursive-corner-cases'", 2018-08-02)
  * 93b74a7cfa ("Merge branch 'en/merge-recursive-tests'", 2018-06-25)
and maybe others I missed.

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-02 20:43 [PATCH v2 " Elijah Newren
  2020-11-03 14:49 ` Derrick Stolee
@ 2020-11-11 17:08 ` Derrick Stolee
  2020-11-11 18:35   ` Elijah Newren
  1 sibling, 1 reply; 84+ messages in thread
From: Derrick Stolee @ 2020-11-11 17:08 UTC (permalink / raw)
  To: Elijah Newren, git

On 11/2/2020 3:43 PM, Elijah Newren wrote:
> Elijah Newren (20):
>   merge-ort: setup basic internal data structures
>   merge-ort: add some high-level algorithm structure
>   merge-ort: port merge_start() from merge-recursive
>   merge-ort: use histogram diff
>   merge-ort: add an err() function similar to one from merge-recursive
>   merge-ort: implement a very basic collect_merge_info()
>   merge-ort: avoid repeating fill_tree_descriptor() on the same tree
>   merge-ort: compute a few more useful fields for collect_merge_info
>   merge-ort: record stage and auxiliary info for every path
>   merge-ort: avoid recursing into identical trees
>   merge-ort: add a preliminary simple process_entries() implementation
>   merge-ort: have process_entries operate in a defined order

I got this far before my attention to detail really started slipping.

>   merge-ort: step 1 of tree writing -- record basenames, modes, and oids
>   merge-ort: step 2 of tree writing -- function to create tree object
>   merge-ort: step 3 of tree writing -- handling subdirectories as we go
>   merge-ort: basic outline for merge_switch_to_result()
>   merge-ort: add implementation of checkout()
>   tree: enable cmp_cache_name_compare() to be used elsewhere
>   merge-ort: add implementation of record_unmerged_index_entries()
>   merge-ort: free data structures in merge_finalize()

I'll try to take another pass on these commits tomorrow.

For the series as a whole I'd love to see at least one test that
demonstrates that this code does something, if even only for a very
narrow case.

There's a lot of code being moved here, and it would be nice to have
even a very simple test case that can check that we didn't leave any
important die("not implemented") calls lying around or worse accessing
an uninitialized pointer or something.

Thanks,
-Stolee

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-09 19:51               ` Derrick Stolee
@ 2020-11-09 22:44                 ` Elijah Newren
  0 siblings, 0 replies; 84+ messages in thread
From: Elijah Newren @ 2020-11-09 22:44 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List, Jeff Hostetler

Hi Derrick,

On Mon, Nov 9, 2020 at 11:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/9/20 12:13 PM, Elijah Newren wrote:
> > Actually, this was pretty enlightening.  I think I know about what's
> > happening...
> >
> > First, a few years ago, Ben said that merges in the Microsoft repos
> > took about an hour[1]:
> > "For the repro that I have been using this drops the merge time from ~1 hour to
> > ~5 minutes and the unmerged entries goes down from ~40,000 to 1."
> > The change he made to drop it that far was to turn off rename detection.
> >
> > [1] https://lore.kernel.org/git/20180426205202.23056-1-benpeart@microsoft.com/
> >
> > Keep that in mind, especially since your times are actually
> > significantly less than 5 minutes...
>
> Yes, the other thing to keep in mind is that this is
> a Scalar repo with the default cone-mode sparse-checkout
> of only the files at root. For this repo, that means that
> there are only ~10 files actually present.
>
> I wanted to remove as many working directory updates/checks
> from the performance check as possible.

Ah, that explains how you got under 20s.  I remember elsewhere on the
list someone (I think it was Ben again) mentioned that a "git checkout
-b <newbranch>" took 20s, despite no need to update the working tree
or index.

I have only done one cursory test of merge-ort with sparse-checkouts;
I should do more.  There might be a bug somewhere, though it does at
least pass the regression tests and I think for the most part it's
actually better: there are cases where merge-recursive will vivify
files outside the sparse-checkout which were not conflicted (see e.g.
https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/);
in contrast, merge-ort shouldn't have any such cases -- it'll only add
files to the working copy if they match the sparsity patterns or the
path has conflicts.

> >> $ /_git/git/summarize-perf git rebase --onto to from test
> >> Successfully rebased and updated refs/heads/test.
> >> Accumulated times:
> >>     8.511 : <unmeasured> (74.9%)
> >
> > Wild guess: This is setup_git_directory() loading your ~3 million entry index.
>
> I think there is also some commit walking happening, but
> it shouldn't be too much. 'from' and 'to' are not very
> far away.

Makes sense.  I suspect that with your commit-graphs this ends up
being fast enough that you might have difficulty even measuring it,
though.

> > Did you include two runs of recursive and two runs of ort just to show
> > that the timings were stable and thus there wasn't warm or cold disk
> > cache issues affecting things?  If so, good plan.  (If there was
> > another reason, let me know; I missed it.)
>
> For the rebase, I did "--onto to from test" and "--onto from to test"
> to show both directions of the rebase. The merge I did twice for the
> cache issues ;)

Oh, good call.  Thanks for pointing it out, I missed that on first reading.

> > .004s on label:incore_nonrecursive -- that's the actual merge
> > operation.  This was a trivial rebase, and the merging took just 4
> > milliseconds.  But the overall run took 11.442 seconds because working
> > with 3M+ entries in the index just takes forever, and my code didn't
> > touch any on-disk formats, certainly not the index format.
> >
> > _All_ of my optimization work was on the merging piece, not the stuff
> > outside.  But for what you're testing here, it appears to be
> > irrelevant compared to the overhead.
>
> OK, so since we already disable rename detection through config,
> the machinery that you are changing is already fast with the old
> algorithm in these trivial cases.
>
> To actually show any benefits, we would need to disable rename
> detection or use a larger change.

...or both.  :-)

> >> And here are timings for a simple merge. Two files at root were changed in the
> >> commits I made, but there are also some larger changes from the commit history.
> >> These should all be seen as "this tree updated in one of the two, so take that
> >> tree".
> >
> > Ahah!  That's a microsoft-specific optimization you guys made in the
> > recursive strategy, yes?
>
> I'm not aware of any logic we have that's different from core Git.
> The config we use [1] includes "merge.stat = false" and "merge.renames
> = false" but otherwise seems to be using stock Git.
>
> [1] https://github.com/microsoft/scalar/blob/1d7938d2df6921f7a3b4f3f1cce56a00929adc40/Scalar.Common/Maintenance/ConfigStep.cs#L100-L127
>
> I'm CC'ing Jeff Hostetler to see if he knows anything about a custom
> merge algorithm in microsoft/git.

Oh, I took your wording that 'These should all be seen as "this tree
updated in one of the two, so take that tree"' as an implication that
you had a special merge tweak and wanted to verify it didn't regress.
I think I read too much into your wording.

Also, thinking over it more, I remember now that Ben also turned on
unpack_opts.aggressive when rename detection was turned off -- see
commit 6f10a09e0a ("merge: pass aggressive when rename detection is
turned off", 2018-05-02).  That isn't quite as advantageous as doing a
trivial tree merge, but if the algorithm that does the trivial tree
merge has to end up updating a complete index later anyway via the
checkout logic of unpack_trees, then the differences are basically a
wash.

> > It does NOT exist in upstream git.  It's
> > also one that is nearly incompatible with rename detection; it turns
> > out you can only do that optimization in the face of rename detection
> > if you do a HUGE amount of specialized work and tracking in order to
> > determine when it's safe _despite_ needing to detect renames.
>
> Perhaps merge.renames=false is enough to trigger this logic already?

Yeah, since I read too much into what you wrote, and now that I
remember the "if (no_renames) o.aggressive = 1" bit, this would be
enough.

> > I
> > thought that optimization was totally incompatible with rename
> > detection for a long time; I tried it a couple times while working on
> > ort and watched it break all kinds of rename tests...but I eventually
> > discovered some tricks involving a lot of work to be able to run that
> > optimization.
>
> I will try to keep this in mind.
>
> > So, you aren't comparing upstream "recursive" to "ort", you're
> > comparing a tweaked version of recursive, and one that is incompatible
> > with how recursive's rename detection works.  In fact, just to be clear
> > in case you go looking, I suspect that this tweak is to be found
> > within unpack_trees.c (which recursive relies on heavily).
> >
> > Further, you've set it up so there are only a few files changed after
> > unpack_trees returns.
> >
> > In total, you have: (1) turned off rename detection (most of my
> > optimizations are for improving this factor, meaning I can't show an
> > advantage), (2) you took advantage of no rename detection to implement
> > trivial-tree merges (thus killing the main second advantage my
> > algorithm has), and (3) you are looking at a case with a tiny number
> > of changes for the merge algorithm to process (thus killing a third
> > optimization that removes quadratic performance).  Those are my three
> > big optimizations, and you've made them all irrelevant.  In fact,
> > you're in an area I would have been worried that ort would do _worse_
> > than recursive.  I track an awful lot of things and there is overhead
> > in checking and filling all that information in; if there are only a
> > few entries to merge, then all that information was a waste to collect
> > and ort might be slower than recursive.  But then again, that should
> > be a case where both algorithms are "nearly instantaneous" (or would
> > be if it weren't for your 3M+ index entry repo causing run_builtin()'s
> > call to setup_git_directory() in git.c to take a huge amount of time
> > before the builtin is even called.)
>
> Thanks for your time isolating this case. I appreciate knowing exactly
> which portions of the merge algorithm are being touched and which are
> not.
>
> > 5 seconds.  I do have to hand it to Ben and anyone else involved,
> > though.  From 1 hour down to 5 seconds is pretty good, even if it was
> > done by hacks (turning off rename detection, and then implementing
> > trivial-tree merging that would have broken rename detection).  I
> > suspect that whoever did that work might have found the unconditional
> > discarding and re-reading of the index and fixed it as well?
>
> As you can probably tell from my general confusion, I had nothing
> to do with it. ;)
>
> > Heh, yeah 0.002 seconds for ..label:incore_recursive.  Only 2
> > milliseconds to create the actual merge tree.  That does suggest you
> > might have fun with 'git log -p --remerge-diff'; if you can redo
> > merges in 2 milliseconds, showing them in git log output is very
> > reasonable.  :-)
>
> Yeah, 'git merge-tree' is very fast for these cases, so I assumed
> that something else was going on for that command.

Oh, interesting.  I forgot about merge-tree.  Maybe I should make a
version based on merge-ort (and then it'd handle rename detection too,
something it doesn't currently do)?  However, that wouldn't be
comparing merge algorithms, because builtin/merge-tree.c doesn't use
merge-recursive.[ch].  (It would be easy to get confused into thinking
it does, since merge-recursive.[ch] defines a function called
merge_trees(), but builtin/merge-tree.c doesn't use it despite the
name similarity.)

> > Could we have some fun, though?  What if you have some merge or rebase
> > involving lots of changes, and you turn rename detection back on, and
> > you disable that trivial-tree resolution optimization that breaks
> > recursive's rename detection handling...and then compare recursive and
> > ort?  (It might be easiest to just compare upstream recursive rather
> > than the one with all the microsoft changes to make sure you undid
> > whatever trivial tree handling work exists.)
>
> I can try these kinds of cases, but it won't be today. I'm on kid duty
> today, and answering emails in between running around with them.

One word of caution: merge.renameLimit may get in your way.  The
default of 1000 means that you're likely to hit that limit on your
first run, and get a warning message like the following printed out:

warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at
least 27328 and retry the command.

You then need to undo your rebase or merge, bump the limit, and
re-run.  Also, you will need a higher limit for merge-recursive than
you do for merge-ort.  The default of 1000 is enough for merge-ort to
detect all the renames in my 26K-files-in-a-directory rename testcase
of the linux kernel, but the value needs to be bumped to 27328 for
merge-recursive.  And if you don't have the limit high enough, then
one algorithm is doing the work to detect renames and the other is
bailing and skipping it, so it's not an apples-to-apples comparison.
If that warning doesn't appear for either backend, then you have an
apples-to-apples comparison.
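
To make the check concrete, here is a small sketch that looks for that
warning in captured stderr and pulls out the suggested value.  The
suggested_rename_limit helper and the log file are hypothetical; the
warning text is the one quoted above, and the sample keeps the line wrap
from this email.

```shell
# Hypothetical helper (not part of git): print the limit suggested by the
# warning in a captured stderr log, or nothing if the warning is absent.
suggested_rename_limit() {
    # Join lines first so a wrapped warning still matches.
    tr '\n' ' ' <"$1" |
        sed -n 's/.*merge\.renamelimit variable to at least \([0-9][0-9]*\).*/\1/p'
}

# Sample log containing the warning text quoted above.
log=$(mktemp)
cat >"$log" <<'EOF'
warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at
least 27328 and retry the command.
EOF

limit=$(suggested_rename_limit "$log")
echo "${limit:-no warning seen}"

# With a limit in hand, undo the merge or rebase and re-run, e.g.:
#   git -c merge.renameLimit="$limit" merge <branch>
```

If the helper prints nothing for both backends, neither one bailed out of
rename detection and the timings are comparable.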

> > For example, my testcase in the linux kernel was finding a series of a
> > few dozen patches I could rebase back to an older version, but
> > tweaking the "older" version by renaming drivers/ -> pilots/ (with
> > about 26K files under that directory, that meant about 26K renames).
> > So, I got to see rebasing of dozens of real changes across a massive
> > rename boundary -- and the massive rename boundary also guaranteed
> > there were lots of entries for the merge algorithm to deal with.
> >
> > In the end, though, 4 milliseconds for the rebase and 2 milliseconds
> > for the merge, with the rest all being overhead of interfacing to the
> > index and working tree actually seems pretty good to me.  I'm just
> > curious if we can check how things work for more involved cases.
>
> I'm definitely interested in identifying how your algorithm improves
> over the previous cases, and perhaps re-enabling rename detection for
> merges is enough of a benefit to justify the new one.
>
> Eventually, I hope to actually engage with your patches in the form
> of review. Just trying to build a mental model for what's going on
> first.

Ooh, I can help with that; here's what's going on:  *** Magic ***

(Black, evil magic in the case of merge-recursive.  Good magic in the
case of merge-ort.)

Glad I could help clear things up for you.  :-)

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-09 17:13             ` Elijah Newren
@ 2020-11-09 19:51               ` Derrick Stolee
  2020-11-09 22:44                 ` Elijah Newren
  0 siblings, 1 reply; 84+ messages in thread
From: Derrick Stolee @ 2020-11-09 19:51 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List, git

On 11/9/20 12:13 PM, Elijah Newren wrote:
> Actually, this was pretty enlightening.  I think I know about what's
> happening...
> 
> First, a few years ago, Ben said that merges in the Microsoft repos
> took about an hour[1]:
> "For the repro that I have been using this drops the merge time from ~1 hour to
> ~5 minutes and the unmerged entries goes down from ~40,000 to 1."
> The change he made to drop it that far was to turn off rename detection.
> 
> [1] https://lore.kernel.org/git/20180426205202.23056-1-benpeart@microsoft.com/
> 
> Keep that in mind, especially since your times are actually
> significantly less than 5 minutes...

Yes, the other thing to keep in mind is that this is
a Scalar repo with the default cone-mode sparse-checkout
of only the files at root. For this repo, that means that
there are only ~10 files actually present.

I wanted to remove as many working directory updates/checks
from the performance check as possible.

>> $ /_git/git/summarize-perf git rebase --onto to from test
>> Successfully rebased and updated refs/heads/test.
>> Accumulated times:
>>     8.511 : <unmeasured> (74.9%)
> 
> Wild guess: This is setup_git_directory() loading your ~3 million entry index.

I think there is also some commit walking happening, but
it shouldn't be too much. 'from' and 'to' are not very
far away.

> Did you include two runs of recursive and two runs of ort just to show
> that the timings were stable and thus there weren't warm or cold disk
> cache issues affecting things?  If so, good plan.  (If there was
> another reason, let me know; I missed it.)

For the rebase, I did "--onto to from test" and "--onto from to test"
to show both directions of the rebase. The merge I did twice for the
cache issues ;)

> .004s on label:incore_nonrecursive -- that's the actual merge
> operation.  This was a trivial rebase, and the merging took just 4
> milliseconds.  But the overall run took 11.442 seconds because working
> with 3M+ entries in the index just takes forever, and my code didn't
> touch any on-disk formats, certainly not the index format.
> 
> _All_ of my optimization work was on the merging piece, not the stuff
> outside.  But for what you're testing here, it appears to be
> irrelevant compared to the overhead.

OK, so since we already disable rename detection through config,
the machinery that you are changing is already fast with the old
algorithm in these trivial cases.

To actually show any benefits, we would need to disable rename
detection or use a larger change.

>> And here are timings for a simple merge. Two files at root were changed in the
>> commits I made, but there are also some larger changes from the commit history.
>> These should all be seen as "this tree updated in one of the two, so take that
>> tree".
> 
> Ahah!  That's a microsoft-specific optimization you guys made in the
> recursive strategy, yes? 

I'm not aware of any logic we have that's different from core Git.
The config we use [1] includes "merge.stat = false" and "merge.renames
= false" but otherwise seems to be using stock Git.

[1] https://github.com/microsoft/scalar/blob/1d7938d2df6921f7a3b4f3f1cce56a00929adc40/Scalar.Common/Maintenance/ConfigStep.cs#L100-L127

I'm CC'ing Jeff Hostetler to see if he knows anything about a custom
merge algorithm in microsoft/git.

> It does NOT exist in upstream git.  It's
> also one that is nearly incompatible with rename detection; it turns
> out you can only do that optimization in the face of rename detection
> if you do a HUGE amount of specialized work and tracking in order to
> determine when it's safe _despite_ needing to detect renames. 

Perhaps merge.renames=false is enough to trigger this logic already?

> I
> thought that optimization was totally incompatible with rename
> detection for a long time; I tried it a couple times while working on
> ort and watched it break all kinds of rename tests...but I eventually
> discovered some tricks involving a lot of work to be able to run that
> optimization.

I will try to keep this in mind.

> So, you aren't comparing upstream "recursive" to "ort", you're
> comparing a tweaked version of recursive, and one that is incompatible
> with how recursive's rename detection works.  In fact, just to be clear
> in case you go looking, I suspect that this tweak is to be found
> within unpack_trees.c (which recursive relies on heavily).
> 
> Further, you've set it up so there are only a few files changed after
> unpack_trees returns.
> 
> In total, you have: (1) turned off rename detection (most of my
> optimizations are for improving this factor, meaning I can't show an
> advantage), (2) you took advantage of no rename detection to implement
> trivial-tree merges (thus killing the main second advantage my
> algorithm has), and (3) you are looking at a case with a tiny number
> of changes for the merge algorithm to process (thus killing a third
> optimization that removes quadratic performance).  Those are my three
> big optimizations, and you've made them all irrelevant.  In fact,
> you're in an area I would have been worried that ort would do _worse_
> than recursive.  I track an awful lot of things and there is overhead
> in checking and filling all that information in; if there are only a
> few entries to merge, then all that information was a waste to collect
> and ort might be slower than recursive.  But then again, that should
> be a case where both algorithms are "nearly instantaneous" (or would
> be if it weren't for your 3M+ index entry repo causing run_builtin()'s
> call to setup_git_directory() in git.c to take a huge amount of time
> before the builtin is even called.)

Thanks for your time isolating this case. I appreciate knowing exactly
which portions of the merge algorithm are being touched and which are
not.

> 5 seconds.  I do have to hand it to Ben and anyone else involved,
> though.  From 1 hour down to 5 seconds is pretty good, even if it was
> done by hacks (turning off rename detection, and then implementing
> trivial-tree merging that would have broken rename detection).  I
> suspect that whoever did that work might have found the unconditional
> discarding and re-reading of the index and fixed it as well?

As you can probably tell from my general confusion, I had nothing
to do with it. ;)

> Heh, yeah 0.002 seconds for ..label:incore_recursive.  Only 2
> milliseconds to create the actual merge tree.  That does suggest you
> might have fun with 'git log -p --remerge-diff'; if you can redo
> merges in 2 milliseconds, showing them in git log output is very
> reasonable.  :-)

Yeah, 'git merge-tree' is very fast for these cases, so I assumed
that something else was going on for that command.

> Could we have some fun, though?  What if you have some merge or rebase
> involving lots of changes, and you turn rename detection back on, and
> you disable that trivial-tree resolution optimization that breaks
> recursive's rename detection handling...and then compare recursive and
> ort?  (It might be easiest to just compare upstream recursive rather
> than the one with all the Microsoft changes to make sure you undid
> whatever trivial tree handling work exists.)

I can try these kinds of cases, but it won't be today. I'm on kid duty
today, and answering emails in between running around with them.

> For example, my testcase in the linux kernel was finding a series of a
> few dozen patches I could rebase back to an older version, but
> tweaking the "older" version by renaming drivers/ -> pilots/ (with
> about 26K files under that directory, that meant about 26K renames).
> So, I got to see rebasing of dozens of real changes across a massive
> rename boundary -- and the massive rename boundary also guaranteed
> there were lots of entries for the merge algorithm to deal with.
> 
> In the end, though, 4 milliseconds for the rebase and 2 milliseconds
> for the merge, with the rest all being overhead of interfacing to the
> index and working tree actually seems pretty good to me.  I'm just
> curious if we can check how things work for more involved cases.

I'm definitely interested in identifying how your algorithm improves
over the previous cases, and perhaps re-enabling rename detection for
merges is enough of a benefit to justify the new one.

Eventually, I hope to actually engage with your patches in the form
of review. Just trying to build a mental model for what's going on
first.

Thanks,
-Stolee


* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-09 12:30           ` Derrick Stolee
@ 2020-11-09 17:13             ` Elijah Newren
  2020-11-09 19:51               ` Derrick Stolee
  0 siblings, 1 reply; 84+ messages in thread
From: Elijah Newren @ 2020-11-09 17:13 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

Hi Derrick,

On Mon, Nov 9, 2020 at 4:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/7/2020 2:39 PM, Elijah Newren wrote:

> > *1*. Could you give me the accumulated times from the trace2_regions
> > so we can verify where the time is spent?  The 'summarize-perf' script
> > at the toplevel of the repo in my ort branch might be helpful for
> > this; just prefix any git command with that script and it accumulates
> > the trace2 region times and prints them out.  For example, I could run
> > 'summarize-perf git merge --no-edit B^0' or 'summarize-perf test-tool
> > fast-rebase --onto HEAD ca76bea9 myfeature'.  Here's an example:
> >
> > === BEGIN OUTPUT ===
> > $ /home/newren/floss/git/summarize-perf test-tool fast-rebase --onto HEAD 4703d9119972bf586d2cca76ec6438f819ffa30e hwmon-updates
> > Rebasing fd8bdb23b91876ac1e624337bb88dc1dcc21d67e...
> > Done.
> > Accumulated times:
> >     0.031 : <unmeasured> ( 3.2%)
> >     0.837 : 35 : label:incore_nonrecursive
> >        0.003 : <unmeasured> ( 0.4%)
> >        0.476 : 41 : ..label:collect_merge_info
> >           0.001 : <unmeasured> ( 0.2%)
> >           0.475 : 41 : ....label:traverse_trees
> >        0.298 : 41 : ..label:renames
> >           0.015 : <unmeasured> ( 5.1%)
> >           0.280 : 41 : ....label:regular renames
> >              0.036 : <unmeasured> (12.7%)
> >              0.244 : 6 : ......label:diffcore_rename
> >                 0.001 : <unmeasured> ( 0.4%)
> >                 0.078 : 6 : ........label:dir rename setup
> >                 0.055 : 6 : ........label:basename matches
> >                 0.051 : 6 : ........label:exact renames
> >                 0.031 : 6 : ........label:write back to queue
> >                 0.017 : 6 : ........label:setup
> >                 0.009 : 6 : ........label:cull basename
> >                 0.003 : 6 : ........label:cull exact
> >           0.002 : 35 : ....label:directory renames
> >           0.001 : 35 : ....label:process renames
> >        0.052 : 35 : ..label:process_entries
> >           0.001 : <unmeasured> ( 1.7%)
> >           0.033 : 35 : ....label:processing
> >           0.017 : 35 : ....label:process_entries setup
> >              0.001 : <unmeasured> ( 5.8%)
> >              0.008 : 35 : ......label:plist copy
> >              0.008 : 35 : ......label:plist sort
> >              0.000 : 35 : ......label:plist grow
> >           0.001 : 35 : ....label:finalize
> >        0.005 : 35 : ..label:merge_start
> >           0.001 : <unmeasured> (18.8%)
> >           0.004 : 34 : ....label:reset_maps
> >           0.000 : 35 : ....label:sanity checks
> >           0.000 : 1 : ....label:allocate/init
> >        0.003 : 6 : ..label:reset_maps
> >     0.035 : 1 : label:do_write_index /home/newren/floss/linux-stable/.git/index.lock
> >     0.034 : 1 : label:checkout
> >        0.034 : <unmeasured> (99.9%)
> >        0.000 : 1 : ..label:Filtering content
> >     0.009 : 1 : label:do_read_index .git/index
> >     0.000 : 1 : label:write_auto_merge
> >     0.000 : 1 : label:record_unmerged
> > Estimated measurement overhead (.010 ms/region-measure * 679): 0.006790000000000001
> > Timing including forking:  0.960 (0.013 additional seconds)
> > === END OUTPUT ===
> > This was a run that took just under 1s (and was a hot-cache case; I
> > had just done the same rebase before to warm the caches), and the
> > combination of index/working tree bits (everything at and after
> > do_write_index in the output) was 0.035+0.034+0.009+0+0=0.078 seconds,
> > corresponding to just over 8.1% of overall time.  I'm curious where
> > that lands for your repository testcase; if the larger time ends up
> > somewhere under the indented label:incore_nonrecursive region, then
> > it's due to something other than index reading/updating/writing.
> >
> > *2*. If it really is due to index reading/updating/writing, then index
> > handling in merge-ort is confined to two functions: checkout() and
> > record_unmerged_index_entries().  Both functions aren't too long, and
> > neither one calls into any other function within merge-ort.c.
> > (Further, checkout() is a near copy of code from merge_working_tree()
> > in builtin/checkout.c, or at least a copy of that function from a year
> > or so ago.)  As such, it's possible you can go in and make whatever
> > special tweaks you have for partial index reading/writing to those
> > functions.
> >
> > I'm curious to hear back more on this.
>
> I don't have a lot of time to dig into this right now, but here are
> the stats for my rebases and merges with and without your option.

Actually, this was pretty enlightening.  I think I know roughly what's
happening...

First, a few years ago, Ben said that merges in the Microsoft repos
took about an hour[1]:
"For the repro that I have been using this drops the merge time from ~1 hour to
~5 minutes and the unmerged entries goes down from ~40,000 to 1."
The change he made to drop it that far was to turn off rename detection.

[1] https://lore.kernel.org/git/20180426205202.23056-1-benpeart@microsoft.com/

Keep that in mind, especially since your times are actually
significantly less than 5 minutes...

> The first thing I notice for each is that there is a significant
> amount of "unmeasured" time at the beginning of each, and that
> could possibly be improved separately.
>
> First, try a rebase forward and backward.
>
> $ /_git/git/summarize-perf git rebase --onto to from test
> Successfully rebased and updated refs/heads/test.
> Accumulated times:
>     8.511 : <unmeasured> (74.9%)

Wild guess: This is setup_git_directory() loading your ~3 million entry index.

>     1.331 : 1 : ......label:unpack_trees
>        0.200 : <unmeasured> (15.1%)
>        0.580 : 1 : ........label:traverse_trees
>        0.403 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
>        0.126 : 1 : ........label:check_updates
>           0.126 : <unmeasured> (100.0%)
>           0.000 : 1 : ..........label:Filtering content
>        0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
>        0.000 : 1 : ........label:fully_valid
>     1.059 : 1 : ......label:do_write_index /_git/office/src/.git/index.lock
>        0.930 : <unmeasured> (87.9%)
>        0.128 : 1 : ........label:write/extension/cache_tree
>     0.455 : 2 : ......label:fully_valid
>     0.001 : 1 : ......label:traverse_trees
>     0.000 : 1 : ......label:check_updates
> Estimated measurement overhead (.010 ms/region-measure * 41): 0.00041000000000000005
> Timing including forking: 11.382 (0.026 additional seconds)
>
> $ /_git/git/summarize-perf git rebase --onto from to test
> Successfully rebased and updated refs/heads/test.
> Accumulated times:
>     8.556 : <unmeasured> (75.2%)
>     1.315 : 1 : ......label:unpack_trees
>        0.197 : <unmeasured> (15.0%)
>        0.580 : 1 : ........label:traverse_trees
>        0.391 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
>        0.126 : 1 : ........label:check_updates
>           0.126 : <unmeasured> (100.0%)
>           0.000 : 1 : ..........label:Filtering content
>        0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
>        0.000 : 1 : ........label:fully_valid
>     1.071 : 1 : ......label:do_write_index /_git/office/src/.git/index.lock
>        0.942 : <unmeasured> (88.0%)
>        0.129 : 1 : ........label:write/extension/cache_tree
>     0.431 : 2 : ......label:fully_valid
>     0.001 : 1 : ......label:traverse_trees
>     0.000 : 1 : ......label:check_updates
> Estimated measurement overhead (.010 ms/region-measure * 41): 0.00041000000000000005
> Timing including forking: 11.399 (0.026 additional seconds)

Did you include two runs of recursive and two runs of ort just to show
that the timings were stable and thus that there weren't warm or cold
disk cache issues affecting things?  If so, good plan.  (If there was
another reason, let me know; I missed it.)

> Then do the same with the ort strategy.
>
> $ /_git/git/summarize-perf git -c pull.twohead=ort rebase --onto to from test
> Successfully rebased and updated refs/heads/test.
> Accumulated times:
>     8.350 : <unmeasured> (73.2%)
>     1.403 : 1 : ....label:checkout
>        0.000 : <unmeasured> ( 0.0%)
>        1.403 : 1 : ......label:unpack_trees
>           0.312 : <unmeasured> (22.3%)
>           0.539 : 1 : ........label:traverse_trees
>           0.401 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
>           0.128 : 1 : ........label:check_updates
>              0.128 : <unmeasured> (100.0%)
>              0.000 : 1 : ..........label:Filtering content
>           0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
>           0.000 : 1 : ........label:fully_valid
>     1.081 : 1 : ....label:do_write_index /_git/office/src/.git/index.lock
>        0.951 : <unmeasured> (88.1%)
>        0.129 : 1 : ......label:write/extension/cache_tree
>     0.432 : 2 : ....label:fully_valid
>     0.143 : 1 : ....label:do_read_index .git/index
>        0.019 : <unmeasured> (13.1%)
>        0.125 : 1 : label:read/extension/cache_tree
>     0.004 : 1 : ....label:incore_nonrecursive
>        0.001 : <unmeasured> (25.8%)
>        0.002 : 1 : ......label:process_entries
>           0.000 : <unmeasured> ( 2.6%)
>           0.001 : 1 : ........label:finalize
>           0.001 : 1 : ........label:process_entries setup
>              0.000 : <unmeasured> ( 6.7%)
>              0.001 : 1 : ..........label:plist sort
>              0.000 : 1 : ..........label:plist copy
>              0.000 : 1 : ..........label:plist grow
>           0.000 : 1 : ........label:processing
>        0.001 : 1 : ......label:collect_merge_info
>           0.000 : <unmeasured> (35.3%)
>           0.001 : 1 : ........label:traverse_trees
>        0.000 : 1 : ......label:merge_start
>           0.000 : <unmeasured> (42.3%)
>           0.000 : 1 : ........label:allocate/init
>           0.000 : 1 : ........label:sanity checks
>        0.000 : 1 : ......label:renames
>     0.001 : 1 : ....label:traverse_trees
>     0.000 : 1 : ....label:write_auto_merge
>     0.000 : 1 : ....label:check_updates
>     0.000 : 1 : ....label:record_unmerged
> Estimated measurement overhead (.010 ms/region-measure * 56): 0.0005600000000000001
> Timing including forking: 11.442 (0.027 additional seconds)

.004s on label:incore_nonrecursive -- that's the actual merge
operation.  This was a trivial rebase, and the merging took just 4
milliseconds.  But the overall run took 11.442 seconds because working
with 3M+ entries in the index just takes forever, and my code didn't
touch any on-disk formats, certainly not the index format.

_All_ of my optimization work was on the merging piece, not the stuff
outside.  But for what you're testing here, it appears to be
irrelevant compared to the overhead.
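
To put a number on that, here is a minimal sketch of the arithmetic,
using only two figures printed in the run above (0.004 s under
label:incore_nonrecursive and 11.442 s for "Timing including forking").
The helper name share_percent is made up for illustration.

```python
def share_percent(part_seconds, total_seconds):
    """Return part_seconds as a percentage of total_seconds."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * part_seconds / total_seconds

# Figures copied from the ort rebase output quoted above.
merge_seconds = 0.004    # label:incore_nonrecursive
total_seconds = 11.442   # "Timing including forking"

merge_share = share_percent(merge_seconds, total_seconds)
overhead_share = 100.0 - merge_share
print(f"merge itself:    {merge_share:.3f}% of wall-clock time")
print(f"everything else: {overhead_share:.3f}%")
```

With these inputs the merge step comes out to a few hundredths of one
percent of the run, so no speedup inside the merge algorithm could be
visible in the total for this testcase.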

> $ /_git/git/summarize-perf git -c pull.twohead=ort rebase --onto from to test
> Successfully rebased and updated refs/heads/test.
> Accumulated times:
>     8.337 : <unmeasured> (73.2%)
>     1.395 : 1 : ....label:checkout
>        0.000 : <unmeasured> ( 0.0%)
>        1.395 : 1 : ......label:unpack_trees
>           0.309 : <unmeasured> (22.1%)
>           0.537 : 1 : ........label:traverse_trees
>           0.403 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
>           0.124 : 1 : ........label:check_updates
>              0.124 : <unmeasured> (100.0%)
>              0.000 : 1 : ..........label:Filtering content
>           0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
>           0.000 : 1 : ........label:fully_valid
>     1.084 : 1 : ....label:do_write_index /_git/office/src/.git/index.lock
>        0.955 : <unmeasured> (88.1%)
>        0.129 : 1 : ......label:write/extension/cache_tree
>     0.436 : 2 : ....label:fully_valid
>     0.137 : 1 : ....label:do_read_index .git/index
>        0.013 : <unmeasured> ( 9.3%)
>        0.125 : 1 : label:read/extension/cache_tree
>     0.004 : 1 : ....label:incore_nonrecursive
>        0.001 : <unmeasured> (24.5%)
>        0.002 : 1 : ......label:process_entries
>           0.000 : <unmeasured> ( 2.5%)
>           0.001 : 1 : ........label:finalize
>           0.001 : 1 : ........label:process_entries setup
>              0.000 : <unmeasured> ( 6.5%)
>              0.001 : 1 : ..........label:plist sort
>              0.000 : 1 : ..........label:plist copy
>              0.000 : 1 : ..........label:plist grow
>           0.000 : 1 : ........label:processing
>        0.001 : 1 : ......label:collect_merge_info
>           0.000 : <unmeasured> (26.5%)
>           0.001 : 1 : ........label:traverse_trees
>        0.000 : 1 : ......label:merge_start
>           0.000 : <unmeasured> (43.1%)
>           0.000 : 1 : ........label:allocate/init
>           0.000 : 1 : ........label:sanity checks
>        0.000 : 1 : ......label:renames
>     0.001 : 1 : ....label:traverse_trees
>     0.000 : 1 : ....label:write_auto_merge
>     0.000 : 1 : ....label:check_updates
>     0.000 : 1 : ....label:record_unmerged
> Estimated measurement overhead (.010 ms/region-measure * 56): 0.0005600000000000001
> Timing including forking: 11.418 (0.024 additional seconds)

Ah, you included two copies for merge-ort too.  I'm guessing you did
that just to show there weren't any cold cache issues or something and
that the runs showed consistent timings?
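
One way to check that consistency is to compare the four "Timing
including forking" totals directly.  The sketch below is only an
illustration: the numbers are copied from the outputs above, the
function name is hypothetical, and two runs per strategy are far too
few for real statistics (the forward and backward rebases are also not
the same operation).

```python
from statistics import mean

# "Timing including forking" values copied from the four rebase runs above.
timings = {
    "recursive": [11.382, 11.399],
    "ort": [11.442, 11.418],
}

def summarize(runs):
    """Return (mean, spread), where spread is max minus min."""
    return mean(runs), max(runs) - min(runs)

for strategy, runs in timings.items():
    avg, spread = summarize(runs)
    print(f"{strategy:9s} mean={avg:.3f}s spread={spread:.3f}s")

# Positive means ort was slower on average in these runs.
gap = mean(timings["ort"]) - mean(timings["recursive"])
print(f"ort minus recursive: {gap:+.3f}s")
```

The spread within each strategy and the gap between the two strategies
are both in the tens of milliseconds, against totals above 11 seconds.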


> And here are timings for a simple merge. Two files at root were changed in the
> commits I made, but there are also some larger changes from the commit history.
> These should all be seen as "this tree updated in one of the two, so take that
> tree".

Ahah!  That's a Microsoft-specific optimization you guys made in the
recursive strategy, yes?  It does NOT exist in upstream git.  It's
also one that is nearly incompatible with rename detection; it turns
out you can only do that optimization in the face of rename detection
if you do a HUGE amount of specialized work and tracking in order to
determine when it's safe _despite_ needing to detect renames.  I
thought that optimization was totally incompatible with rename
detection for a long time; I tried it a couple times while working on
ort and watched it break all kinds of rename tests...but I eventually
discovered some tricks involving a lot of work to be able to run that
optimization.

So, you aren't comparing upstream "recursive" to "ort", you're
comparing a tweaked version of recursive, and one that is incompatible
with how recursive's rename detection works.  In fact, just to be clear
in case you go looking, I suspect that this tweak is to be found
within unpack_trees.c (which recursive relies on heavily).

Further, you've set it up so there are only a few files changed after
unpack_trees returns.

In total, you have: (1) turned off rename detection (most of my
optimizations are for improving this factor, meaning I can't show an
advantage), (2) you took advantage of no rename detection to implement
trivial-tree merges (thus killing the main second advantage my
algorithm has), and (3) you are looking at a case with a tiny number
of changes for the merge algorithm to process (thus killing a third
optimization that removes quadratic performance).  Those are my three
big optimizations, and you've made them all irrelevant.  In fact,
you're in an area I would have been worried that ort would do _worse_
than recursive.  I track an awful lot of things and there is overhead
in checking and filling all that information in; if there are only a
few entries to merge, then all that information was a waste to collect
and ort might be slower than recursive.  But then again, that should
be a case where both algorithms are "nearly instantaneous" (or would
be if it weren't for your 3M+ index entry repo causing run_builtin()'s
call to setup_git_directory() in git.c to take a huge amount of time
before the builtin is even called.)


> $ git reset --hard test2 && /_git/git/summarize-perf git merge test -m test
> Merge made by the 'recursive' strategy.
> Accumulated times:
>     2.647 : <unmeasured> (48.6%)
>     1.384 : 1 : ..label:unpack_trees
>        0.267 : <unmeasured> (19.3%)
>        0.582 : 1 : ....label:traverse_trees
>        0.391 : 1 : ....label:clear_ce_flags/0x00000000_0x02000000
>        0.124 : 1 : ....label:check_updates
>           0.124 : <unmeasured> (100.0%)
>           0.000 : 1 : ......label:Filtering content
>        0.021 : 1 : ....label:clear_ce_flags/0x00080000_0x42000000
>        0.000 : 1 : ....label:fully_valid
>     1.060 : 1 : ..label:do_write_index /_git/office/src/.git/index.lock
>        0.931 : <unmeasured> (87.9%)
>        0.128 : 1 : ....label:write/extension/cache_tree
>     0.226 : 1 : ..label:fully_valid
>     0.134 : 1 : ..label:do_read_index .git/index
>        0.008 : <unmeasured> ( 5.8%)
>        0.126 : 1 : label:read/extension/cache_tree
>     0.001 : 1 : ..label:traverse_trees
>     0.000 : 1 : ..label:check_updates
>     0.000 : 1 : ..label:setup
>     0.000 : 1 : ..label:write back to queue
> Estimated measurement overhead (.010 ms/region-measure * 20): 0.0002
> Timing including forking:  5.466 (0.015 additional seconds)

5 seconds.  I do have to hand it to Ben and anyone else involved,
though.  From 1 hour down to 5 seconds is pretty good, even if it was
done by hacks (turning off rename detection, and then implementing
trivial-tree merging that would have broken rename detection).  I
suspect that whoever did that work might have found the unconditional
discarding and re-reading of the index and fixed it as well?

> $ git reset --hard test2 && /_git/git/summarize-perf git -c pull.twohead=ort merge test -m test
> Merge made by the 'ort' strategy.
> Accumulated times:
>     2.531 : <unmeasured> (49.1%)
>     1.328 : 1 : ..label:checkout
>        0.000 : <unmeasured> ( 0.0%)
>        1.328 : 1 : ....label:unpack_trees
>           0.228 : <unmeasured> (17.2%)
>           0.566 : 1 : ......label:traverse_trees
>           0.388 : 1 : ......label:clear_ce_flags/0x00000000_0x02000000
>           0.125 : 1 : ......label:check_updates
>              0.125 : <unmeasured> (100.0%)
>              0.000 : 1 : ........label:Filtering content
>           0.021 : 1 : ......label:clear_ce_flags/0x00080000_0x42000000
>           0.000 : 1 : ......label:fully_valid
>     1.067 : 1 : ..label:do_write_index /_git/office/src/.git/index.lock
>        0.938 : <unmeasured> (87.9%)
>        0.129 : 1 : ....label:write/extension/cache_tree
>     0.230 : 1 : ..label:fully_valid
>     0.002 : 1 : ..label:incore_recursive
>        0.001 : <unmeasured> (22.3%)
>        0.001 : 1 : ....label:collect_merge_info
>           0.001 : <unmeasured> (60.2%)
>           0.000 : 1 : ......label:traverse_trees
>        0.001 : 1 : ....label:process_entries
>           0.000 : <unmeasured> ( 2.8%)
>           0.001 : 1 : ......label:finalize
>           0.000 : 1 : ......label:process_entries setup
>              0.000 : <unmeasured> ( 6.9%)
>              0.000 : 1 : ........label:plist sort
>              0.000 : 1 : ........label:plist copy
>              0.000 : 1 : ........label:plist grow
>           0.000 : 1 : ......label:processing
>        0.000 : 1 : ....label:merge_start
>           0.000 : <unmeasured> (50.0%)
>           0.000 : 1 : ......label:allocate/init
>           0.000 : 1 : ......label:sanity checks
>        0.000 : 1 : ....label:renames
>     0.001 : 1 : ..label:traverse_trees
>     0.000 : 1 : ..label:write_auto_merge
>     0.000 : 1 : ..label:check_updates
>     0.000 : 1 : ..label:setup
>     0.000 : 1 : ..label:display messages
>     0.000 : 1 : ..label:write back to queue
>     0.000 : 1 : ..label:record_unmerged
> Estimated measurement overhead (.010 ms/region-measure * 36): 0.00036
> Timing including forking:  5.174 (0.015 additional seconds)

Heh, yeah 0.002 seconds for ..label:incore_recursive.  Only 2
milliseconds to create the actual merge tree.  That does suggest you
might have fun with 'git log -p --remerge-diff'; if you can redo
merges in 2 milliseconds, showing them in git log output is very
reasonable.  :-)


Could we have some fun, though?  What if you have some merge or rebase
involving lots of changes, and you turn rename detection back on, and
you disable that trivial-tree resolution optimization that breaks
recursive's rename detection handling...and then compare recursive and
ort?  (It might be easiest to just compare upstream recursive rather
than the one with all the Microsoft changes to make sure you undid
whatever trivial tree handling work exists.)

For example, my testcase in the linux kernel was finding a series of a
few dozen patches I could rebase back to an older version, but
tweaking the "older" version by renaming drivers/ -> pilots/ (with
about 26K files under that directory, that meant about 26K renames).
So, I got to see rebasing of dozens of real changes across a massive
rename boundary -- and the massive rename boundary also guaranteed
there were lots of entries for the merge algorithm to deal with.
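
A testcase of that shape could be set up roughly as sketched below.
This is an assumption-laden outline, not the exact procedure used for
the kernel numbers (those were timed with 'test-tool fast-rebase'):
the function name, the scratch branch name, and the placeholder
revisions in the example call are all hypothetical.  The function only
builds the git command lines; it does not run anything.

```python
def rename_boundary_commands(old_base, upstream, topic,
                             src="drivers", dst="pilots",
                             scratch="renamed-base"):
    """Return git command lines (as argument lists) for the experiment.

    Nothing is executed here; the caller decides how to run them.
    """
    return [
        # Start a scratch branch at the older version.
        ["git", "checkout", "-b", scratch, old_base],
        # Rename the whole directory, which renames every file under it.
        ["git", "mv", src, dst],
        ["git", "commit", "-m", f"Rename {src}/ to {dst}/"],
        # Replay the commits in upstream..topic onto the renamed base.
        ["git", "rebase", "--onto", scratch, upstream, topic],
    ]

# Placeholder revisions; substitute real ones from your repository.
for cmd in rename_boundary_commands("OLD_VERSION", "SERIES_BASE", "my-topic"):
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) inside a
throwaway clone.  Timing the final rebase once per strategy (for
example with 'git -c pull.twohead=ort' as in the runs above) would give
the recursive versus ort comparison across the rename boundary.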


In the end, though, 4 milliseconds for the rebase and 2 milliseconds
for the merge, with the rest all being overhead of interfacing to the
index and working tree actually seems pretty good to me.  I'm just
curious if we can check how things work for more involved cases.


* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-07 19:39         ` Elijah Newren
@ 2020-11-09 12:30           ` Derrick Stolee
  2020-11-09 17:13             ` Elijah Newren
  0 siblings, 1 reply; 84+ messages in thread
From: Derrick Stolee @ 2020-11-09 12:30 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

On 11/7/2020 2:39 PM, Elijah Newren wrote:
> On Sat, Nov 7, 2020 at 7:02 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 11/7/20 1:06 AM, Elijah Newren wrote:
>>> Hi Derrick,
>>>
>>> On Tue, Nov 3, 2020 at 8:36 AM Elijah Newren <newren@gmail.com> wrote:
>>>> All that said, for testing either branch you just need to first set
>>>> pull.twohead=ort in your git config (see
>>>> https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
>>>> or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.
>>>
>>> I probably also should have mentioned that merge-ort does not (yet?)
>>> heed merge.renames configuration setting; it always detects renames.
>>> I know you run with merge.renames=false, so you won't quite get an
>>> apples-to-apples comparison.  However, part of my point was I wanted
>>> to make renames fast enough that they could be left turned on, even
>>> for the large scale repos, so I'm very interested in your experience.
>>> If you need an escape hatch, though, just put a "return 1" at the top
>>> of detect_and_process_renames() to turn it off.
>>>
>>> Oh, and I went through and re-merged all the merge commits in the
>>> linux kernel and found a bug in merge-ort while doing that (causing it
>>> to die, not to merge badly).  I'm kind of surprised that none of my
>>> testcases triggered that failure earlier; if you're testing it out,
>>> you might want to update to get the fix (commit 067e5c1a38,
>>> "merge-ort: fix bug with cached_target_names not being initialized in
>>> redos", 2020-11-06).
>>
>> I did manage to do some testing to see what happens with
>> a large repo under a small sparse-checkout. And using
>> trace2, I was able to see that your code is being exercised.
>> Unfortunately, I didn't see any performance improvement, and
>> that is likely due to needing to expand the index entirely
>> when checking out the merge commit.
>>
>> Is there a command to construct a merge commit without
>> actually checking it out? That would reduce the time spent
>> expanding the index, which would allow your algorithm to
>> really show its benefits!
> 
> Wow, very interesting.  I am working on a --remerge-diff option for
> log, which implies -p and is similar to -c or --cc in that it makes
> merge commits show a diff, but which in particular remerges the two
> parent commits complete with conflict markers and such and then diffs
> the merge commit against that intermediate remerge.  That's a case
> that constructs a merge commit without ever touching the index (or
> working tree)...but there's no equivalent comparison point for
> merge-recursive.  So, it doesn't provide something to compare against
> (and while the code can be used I don't actually have a --remerge-diff
> option yet -- it just hardcodes the behavior on if wanted or not), so
> I'm not sure if you'd be interested in it.  If you are, let me know
> though, and I'll send details.
> 
> However, I'm really surprised here, because merge-recursive always
> reads and writes the index too (the index is the basis for its whole
> algorithm).  In fact, merge-recursive always reads the index at least
> *twice* (it unconditionally discards and re-reads the index), so you
> must have some kind of specialized tweaking of merge-recursive if it
> somehow avoids a full index read/write.  In order to do an
> apples-to-apples comparison, we'd need to make those same tweaks to
> merge-ort, but I don't have a clue what kind of tweaks you've made
> here.  So, some investigation points:
> 
> *1*. Could you give me the accumulated times from the trace2_regions
> so we can verify where the time is spent?  The 'summarize-perf' script
> at the toplevel of the repo in my ort branch might be helpful for
> this; just prefix any git command with that script and it accumulates
> the trace2 region times and prints them out.  For example, I could run
> 'summarize-perf git merge --no-edit B^0' or 'summarize-perf test-tool
> fast-rebase --onto HEAD ca76bea9 myfeature'.  Here's an example:
> 
> === BEGIN OUTPUT ===
> $ /home/newren/floss/git/summarize-perf test-tool fast-rebase --onto HEAD 4703d9119972bf586d2cca76ec6438f819ffa30e hwmon-updates
> Rebasing fd8bdb23b91876ac1e624337bb88dc1dcc21d67e...
> Done.
> Accumulated times:
>     0.031 : <unmeasured> ( 3.2%)
>     0.837 : 35 : label:incore_nonrecursive
>        0.003 : <unmeasured> ( 0.4%)
>        0.476 : 41 : ..label:collect_merge_info
>           0.001 : <unmeasured> ( 0.2%)
>           0.475 : 41 : ....label:traverse_trees
>        0.298 : 41 : ..label:renames
>           0.015 : <unmeasured> ( 5.1%)
>           0.280 : 41 : ....label:regular renames
>              0.036 : <unmeasured> (12.7%)
>              0.244 : 6 : ......label:diffcore_rename
>                 0.001 : <unmeasured> ( 0.4%)
>                 0.078 : 6 : ........label:dir rename setup
>                 0.055 : 6 : ........label:basename matches
>                 0.051 : 6 : ........label:exact renames
>                 0.031 : 6 : ........label:write back to queue
>                 0.017 : 6 : ........label:setup
>                 0.009 : 6 : ........label:cull basename
>                 0.003 : 6 : ........label:cull exact
>           0.002 : 35 : ....label:directory renames
>           0.001 : 35 : ....label:process renames
>        0.052 : 35 : ..label:process_entries
>           0.001 : <unmeasured> ( 1.7%)
>           0.033 : 35 : ....label:processing
>           0.017 : 35 : ....label:process_entries setup
>              0.001 : <unmeasured> ( 5.8%)
>              0.008 : 35 : ......label:plist copy
>              0.008 : 35 : ......label:plist sort
>              0.000 : 35 : ......label:plist grow
>           0.001 : 35 : ....label:finalize
>        0.005 : 35 : ..label:merge_start
>           0.001 : <unmeasured> (18.8%)
>           0.004 : 34 : ....label:reset_maps
>           0.000 : 35 : ....label:sanity checks
>           0.000 : 1 : ....label:allocate/init
>        0.003 : 6 : ..label:reset_maps
>     0.035 : 1 : label:do_write_index /home/newren/floss/linux-stable/.git/index.lock
>     0.034 : 1 : label:checkout
>        0.034 : <unmeasured> (99.9%)
>        0.000 : 1 : ..label:Filtering content
>     0.009 : 1 : label:do_read_index .git/index
>     0.000 : 1 : label:write_auto_merge
>     0.000 : 1 : label:record_unmerged
> Estimated measurement overhead (.010 ms/region-measure * 679): 0.006790000000000001
> Timing including forking:  0.960 (0.013 additional seconds)
> === END OUTPUT ===
> This was a run that took just under 1s (and was a hot-cache case; I
> had just done the same rebase before to warm the caches), and the
> combination of index/working tree bits (everything at and after
> do_write_index in the output) was 0.035+0.034+0.009+0+0=0.078 seconds,
> corresponding to just over 8.1% of overall time.  I'm curious where
> that lands for your repository testcase; if the larger time ends up
> somewhere under the indented label:incore_nonrecursive region, then
> it's due to something other than index reading/updating/writing.
> 
> *2*. If it really is due to index reading/updating/writing, then index
> handling in merge-ort is confined to two functions: checkout() and
> record_unmerged_index_entries().  Both functions aren't too long, and
> neither one calls into any other function within merge-ort.c.
> (Further, checkout() is a near copy of code from merge_working_tree()
> in builtin/checkout.c, or at least a copy of that function from a year
> or so ago.)  As such, it's possible you can go in and make whatever
> special tweaks you have for partial index reading/writing to those
> functions.
> 
> I'm curious to hear back more on this.

I don't have a lot of time to dig into this right now, but here are
the stats for my rebases and merges with and without your option.

The first thing I notice for each is that there is a significant
amount of "unmeasured" time at the beginning of each, and that
could possibly be improved separately.

First, try a rebase forward and backward.

$ /_git/git/summarize-perf git rebase --onto to from test
Successfully rebased and updated refs/heads/test.
Accumulated times:
    8.511 : <unmeasured> (74.9%)
    1.331 : 1 : ......label:unpack_trees
       0.200 : <unmeasured> (15.1%)
       0.580 : 1 : ........label:traverse_trees
       0.403 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
       0.126 : 1 : ........label:check_updates
          0.126 : <unmeasured> (100.0%)
          0.000 : 1 : ..........label:Filtering content
       0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
       0.000 : 1 : ........label:fully_valid
    1.059 : 1 : ......label:do_write_index /_git/office/src/.git/index.lock
       0.930 : <unmeasured> (87.9%)
       0.128 : 1 : ........label:write/extension/cache_tree
    0.455 : 2 : ......label:fully_valid
    0.001 : 1 : ......label:traverse_trees
    0.000 : 1 : ......label:check_updates
Estimated measurement overhead (.010 ms/region-measure * 41): 0.00041000000000000005
Timing including forking: 11.382 (0.026 additional seconds)

$ /_git/git/summarize-perf git rebase --onto from to test
Successfully rebased and updated refs/heads/test.
Accumulated times:
    8.556 : <unmeasured> (75.2%)
    1.315 : 1 : ......label:unpack_trees
       0.197 : <unmeasured> (15.0%)
       0.580 : 1 : ........label:traverse_trees
       0.391 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
       0.126 : 1 : ........label:check_updates
          0.126 : <unmeasured> (100.0%)
          0.000 : 1 : ..........label:Filtering content
       0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
       0.000 : 1 : ........label:fully_valid
    1.071 : 1 : ......label:do_write_index /_git/office/src/.git/index.lock
       0.942 : <unmeasured> (88.0%)
       0.129 : 1 : ........label:write/extension/cache_tree
    0.431 : 2 : ......label:fully_valid
    0.001 : 1 : ......label:traverse_trees
    0.000 : 1 : ......label:check_updates
Estimated measurement overhead (.010 ms/region-measure * 41): 0.00041000000000000005
Timing including forking: 11.399 (0.026 additional seconds)

Then do the same with the ort strategy.

$ /_git/git/summarize-perf git -c pull.twohead=ort rebase --onto to from test
Successfully rebased and updated refs/heads/test.
Accumulated times:
    8.350 : <unmeasured> (73.2%)
    1.403 : 1 : ....label:checkout  
       0.000 : <unmeasured> ( 0.0%)
       1.403 : 1 : ......label:unpack_trees
          0.312 : <unmeasured> (22.3%)
          0.539 : 1 : ........label:traverse_trees
          0.401 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
          0.128 : 1 : ........label:check_updates
             0.128 : <unmeasured> (100.0%)
             0.000 : 1 : ..........label:Filtering content
          0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
          0.000 : 1 : ........label:fully_valid
    1.081 : 1 : ....label:do_write_index /_git/office/src/.git/index.lock
       0.951 : <unmeasured> (88.1%)
       0.129 : 1 : ......label:write/extension/cache_tree
    0.432 : 2 : ....label:fully_valid
    0.143 : 1 : ....label:do_read_index .git/index
       0.019 : <unmeasured> (13.1%)
       0.125 : 1 : label:read/extension/cache_tree
    0.004 : 1 : ....label:incore_nonrecursive
       0.001 : <unmeasured> (25.8%)
       0.002 : 1 : ......label:process_entries
          0.000 : <unmeasured> ( 2.6%)
          0.001 : 1 : ........label:finalize
          0.001 : 1 : ........label:process_entries setup
             0.000 : <unmeasured> ( 6.7%)
             0.001 : 1 : ..........label:plist sort
             0.000 : 1 : ..........label:plist copy
             0.000 : 1 : ..........label:plist grow
          0.000 : 1 : ........label:processing
       0.001 : 1 : ......label:collect_merge_info
          0.000 : <unmeasured> (35.3%)
          0.001 : 1 : ........label:traverse_trees
       0.000 : 1 : ......label:merge_start
          0.000 : <unmeasured> (42.3%)
          0.000 : 1 : ........label:allocate/init
          0.000 : 1 : ........label:sanity checks
       0.000 : 1 : ......label:renames 
    0.001 : 1 : ....label:traverse_trees
    0.000 : 1 : ....label:write_auto_merge
    0.000 : 1 : ....label:check_updates
    0.000 : 1 : ....label:record_unmerged
Estimated measurement overhead (.010 ms/region-measure * 56): 0.0005600000000000001
Timing including forking: 11.442 (0.027 additional seconds)

$ /_git/git/summarize-perf git -c pull.twohead=ort rebase --onto from to test
Successfully rebased and updated refs/heads/test.
Accumulated times:
    8.337 : <unmeasured> (73.2%)
    1.395 : 1 : ....label:checkout  
       0.000 : <unmeasured> ( 0.0%)
       1.395 : 1 : ......label:unpack_trees
          0.309 : <unmeasured> (22.1%)
          0.537 : 1 : ........label:traverse_trees
          0.403 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
          0.124 : 1 : ........label:check_updates
             0.124 : <unmeasured> (100.0%)
             0.000 : 1 : ..........label:Filtering content
          0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
          0.000 : 1 : ........label:fully_valid
    1.084 : 1 : ....label:do_write_index /_git/office/src/.git/index.lock
       0.955 : <unmeasured> (88.1%)
       0.129 : 1 : ......label:write/extension/cache_tree
    0.436 : 2 : ....label:fully_valid
    0.137 : 1 : ....label:do_read_index .git/index
       0.013 : <unmeasured> ( 9.3%)
       0.125 : 1 : label:read/extension/cache_tree
    0.004 : 1 : ....label:incore_nonrecursive
       0.001 : <unmeasured> (24.5%)
       0.002 : 1 : ......label:process_entries
          0.000 : <unmeasured> ( 2.5%)
          0.001 : 1 : ........label:finalize
          0.001 : 1 : ........label:process_entries setup
             0.000 : <unmeasured> ( 6.5%)
             0.001 : 1 : ..........label:plist sort
             0.000 : 1 : ..........label:plist copy
             0.000 : 1 : ..........label:plist grow
          0.000 : 1 : ........label:processing
       0.001 : 1 : ......label:collect_merge_info
          0.000 : <unmeasured> (26.5%)
          0.001 : 1 : ........label:traverse_trees
       0.000 : 1 : ......label:merge_start
          0.000 : <unmeasured> (43.1%)
          0.000 : 1 : ........label:allocate/init
          0.000 : 1 : ........label:sanity checks
       0.000 : 1 : ......label:renames 
    0.001 : 1 : ....label:traverse_trees
    0.000 : 1 : ....label:write_auto_merge
    0.000 : 1 : ....label:check_updates
    0.000 : 1 : ....label:record_unmerged
Estimated measurement overhead (.010 ms/region-measure * 56): 0.0005600000000000001
Timing including forking: 11.418 (0.024 additional seconds)

And here are timings for a simple merge. Two files at root were changed in the
commits I made, but there are also some larger changes from the commit history.
These should all be seen as "this tree updated in one of the two, so take that
tree".

$ git reset --hard test2 && /_git/git/summarize-perf git merge test -m test
Merge made by the 'recursive' strategy.
Accumulated times:
    2.647 : <unmeasured> (48.6%)
    1.384 : 1 : ..label:unpack_trees
       0.267 : <unmeasured> (19.3%)
       0.582 : 1 : ....label:traverse_trees
       0.391 : 1 : ....label:clear_ce_flags/0x00000000_0x02000000
       0.124 : 1 : ....label:check_updates
          0.124 : <unmeasured> (100.0%)
          0.000 : 1 : ......label:Filtering content
       0.021 : 1 : ....label:clear_ce_flags/0x00080000_0x42000000
       0.000 : 1 : ....label:fully_valid
    1.060 : 1 : ..label:do_write_index /_git/office/src/.git/index.lock
       0.931 : <unmeasured> (87.9%)
       0.128 : 1 : ....label:write/extension/cache_tree
    0.226 : 1 : ..label:fully_valid 
    0.134 : 1 : ..label:do_read_index .git/index
       0.008 : <unmeasured> ( 5.8%)
       0.126 : 1 : label:read/extension/cache_tree
    0.001 : 1 : ..label:traverse_trees
    0.000 : 1 : ..label:check_updates
    0.000 : 1 : ..label:setup       
    0.000 : 1 : ..label:write back to queue
Estimated measurement overhead (.010 ms/region-measure * 20): 0.0002
Timing including forking:  5.466 (0.015 additional seconds)

$ git reset --hard test2 && /_git/git/summarize-perf git -c pull.twohead=ort merge test -m test
Merge made by the 'ort' strategy.
Accumulated times:
    2.531 : <unmeasured> (49.1%)
    1.328 : 1 : ..label:checkout    
       0.000 : <unmeasured> ( 0.0%)
       1.328 : 1 : ....label:unpack_trees
          0.228 : <unmeasured> (17.2%)
          0.566 : 1 : ......label:traverse_trees
          0.388 : 1 : ......label:clear_ce_flags/0x00000000_0x02000000
          0.125 : 1 : ......label:check_updates
             0.125 : <unmeasured> (100.0%)
             0.000 : 1 : ........label:Filtering content
          0.021 : 1 : ......label:clear_ce_flags/0x00080000_0x42000000
          0.000 : 1 : ......label:fully_valid
    1.067 : 1 : ..label:do_write_index /_git/office/src/.git/index.lock
       0.938 : <unmeasured> (87.9%)
       0.129 : 1 : ....label:write/extension/cache_tree
    0.230 : 1 : ..label:fully_valid 
    0.002 : 1 : ..label:incore_recursive
       0.001 : <unmeasured> (22.3%)
       0.001 : 1 : ....label:collect_merge_info
          0.001 : <unmeasured> (60.2%)
          0.000 : 1 : ......label:traverse_trees
       0.001 : 1 : ....label:process_entries
          0.000 : <unmeasured> ( 2.8%)
          0.001 : 1 : ......label:finalize
          0.000 : 1 : ......label:process_entries setup
             0.000 : <unmeasured> ( 6.9%)
             0.000 : 1 : ........label:plist sort
             0.000 : 1 : ........label:plist copy
             0.000 : 1 : ........label:plist grow
          0.000 : 1 : ......label:processing
       0.000 : 1 : ....label:merge_start
          0.000 : <unmeasured> (50.0%)
          0.000 : 1 : ......label:allocate/init
          0.000 : 1 : ......label:sanity checks
       0.000 : 1 : ....label:renames   
    0.001 : 1 : ..label:traverse_trees
    0.000 : 1 : ..label:write_auto_merge
    0.000 : 1 : ..label:check_updates
    0.000 : 1 : ..label:setup       
    0.000 : 1 : ..label:display messages
    0.000 : 1 : ..label:write back to queue
    0.000 : 1 : ..label:record_unmerged
Estimated measurement overhead (.010 ms/region-measure * 36): 0.00036
Timing including forking:  5.174 (0.015 additional seconds)
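
To compare the runs above at a glance, here is a rough sketch (not part
of summarize-perf itself) that keeps only the top-level entries of an
"Accumulated times" listing. The function name top_level_times and the
four-space indentation rule are assumptions made for illustration, based
only on the output format shown above; the sample is an abbreviated copy
of the first ort rebase run.

```python
import re

# Assumption: top-level entries are indented by exactly four spaces, as in
# the output pasted above; nested regions are indented further and skipped.
TOP_LEVEL = re.compile(r"^ {4}(\d+\.\d+) : (?:\d+ : \.*label:)?(.+?)\s*$")


def top_level_times(report):
    """Map each top-level region name to its accumulated time in seconds."""
    times = {}
    for line in report.splitlines():
        match = TOP_LEVEL.match(line)
        if match:
            # Drop the "(73.2%)" suffix or a trailing path argument.
            name = match.group(2).split(" ")[0]
            times[name] = float(match.group(1))
    return times


# Abbreviated excerpt of the first ort rebase run above.
sample = """\
    8.350 : <unmeasured> (73.2%)
    1.403 : 1 : ....label:checkout
       0.000 : <unmeasured> ( 0.0%)
       1.403 : 1 : ......label:unpack_trees
    1.081 : 1 : ....label:do_write_index /_git/office/src/.git/index.lock
    0.432 : 2 : ....label:fully_valid
    0.143 : 1 : ....label:do_read_index .git/index
    0.004 : 1 : ....label:incore_nonrecursive
"""

times = top_level_times(sample)
for name, seconds in sorted(times.items(), key=lambda item: -item[1]):
    print(f"{seconds:6.3f}  {name}")
```

In this excerpt the incore_nonrecursive region accounts for 0.004
seconds, while checkout, do_write_index, fully_valid and do_read_index
together account for roughly 3 seconds.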

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-07 15:02       ` Derrick Stolee
@ 2020-11-07 19:39         ` Elijah Newren
  2020-11-09 12:30           ` Derrick Stolee
  0 siblings, 1 reply; 84+ messages in thread
From: Elijah Newren @ 2020-11-07 19:39 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

On Sat, Nov 7, 2020 at 7:02 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/7/20 1:06 AM, Elijah Newren wrote:
> > Hi Derrick,
> >
> > On Tue, Nov 3, 2020 at 8:36 AM Elijah Newren <newren@gmail.com> wrote:
> >> All that said, for testing either branch you just need to first set
> >> pull.twohead=ort in your git config (see
> >> https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
> >> or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.
> >
> > I probably also should have mentioned that merge-ort does not (yet?)
> > heed the merge.renames configuration setting; it always detects renames.
> > I know you run with merge.renames=false, so you won't quite get an
> > apples-to-apples comparison.  However, part of my point was I wanted
> > to make renames fast enough that they could be left turned on, even
> > for the large scale repos, so I'm very interested in your experience.
> > If you need an escape hatch, though, just put a "return 1" at the top
> > of detect_and_process_renames() to turn it off.
> >
> > Oh, and I went through and re-merged all the merge commits in the
> > linux kernel and found a bug in merge-ort while doing that (causing it
> > to die, not to merge badly).  I'm kind of surprised that none of my
> > testcases triggered that failure earlier; if you're testing it out,
> > you might want to update to get the fix (commit 067e5c1a38,
> > "merge-ort: fix bug with cached_target_names not being initialized in
> > redos", 2020-11-06).
>
> I did manage to do some testing to see what happens with
> a large repo under a small sparse-checkout. And using
> trace2, I was able to see that your code is being exercised.
> Unfortunately, I didn't see any performance improvement, and
> that is likely due to needing to expand the index entirely
> when checking out the merge commit.
>
> Is there a command to construct a merge commit without
> actually checking it out? That would reduce the time spent
> expanding the index, which would allow your algorithm to
> really show its benefits!

Wow, very interesting.  I am working on a --remerge-diff option for
log, which implies -p and is similar to -c or --cc in that it makes
merge commits show a diff, but which in particular remerges the two
parent commits complete with conflict markers and such and then diffs
the merge commit against that intermediate remerge.  That's a case
that constructs a merge commit without ever touching the index (or
working tree)...but there's no equivalent comparison point for
merge-recursive.  So, it doesn't provide something to compare against
(and while the code can be used, I don't actually have a --remerge-diff
option yet; the behavior is just hardcoded on, whether wanted or not), so
I'm not sure if you'd be interested in it.  If you are, let me know
though, and I'll send details.

However, I'm really surprised here, because merge-recursive always
reads and writes the index too (the index is the basis for its whole
algorithm).  In fact, merge-recursive always reads the index at least
*twice* (it unconditionally discards and re-reads the index), so you
must have some kind of specialized tweaking of merge-recursive if it
somehow avoids a full index read/write.  In order to do an
apples-to-apples comparison, we'd need to make those same tweaks to
merge-ort, but I don't have a clue what kind of tweaks you've made
here.  So, some investigation points:

*1*. Could you give me the accumulated times from the trace2_regions
so we can verify where the time is spent?  The 'summarize-perf' script
at the toplevel of the repo in my ort branch might be helpful for
this; just prefix any git command with that script and it accumulates
the trace2 region times and prints them out.  For example, I could run
'summarize-perf git merge --no-edit B^0' or 'summarize-perf test-tool
fast-rebase --onto HEAD ca76bea9 myfeature'.  Here's an example:

=== BEGIN OUTPUT ===
$ /home/newren/floss/git/summarize-perf test-tool fast-rebase --onto HEAD 4703d9119972bf586d2cca76ec6438f819ffa30e hwmon-updates
Rebasing fd8bdb23b91876ac1e624337bb88dc1dcc21d67e...
Done.
Accumulated times:
    0.031 : <unmeasured> ( 3.2%)
    0.837 : 35 : label:incore_nonrecursive
       0.003 : <unmeasured> ( 0.4%)
       0.476 : 41 : ..label:collect_merge_info
          0.001 : <unmeasured> ( 0.2%)
          0.475 : 41 : ....label:traverse_trees
       0.298 : 41 : ..label:renames
          0.015 : <unmeasured> ( 5.1%)
          0.280 : 41 : ....label:regular renames
             0.036 : <unmeasured> (12.7%)
             0.244 : 6 : ......label:diffcore_rename
                0.001 : <unmeasured> ( 0.4%)
                0.078 : 6 : ........label:dir rename setup
                0.055 : 6 : ........label:basename matches
                0.051 : 6 : ........label:exact renames
                0.031 : 6 : ........label:write back to queue
                0.017 : 6 : ........label:setup
                0.009 : 6 : ........label:cull basename
                0.003 : 6 : ........label:cull exact
          0.002 : 35 : ....label:directory renames
          0.001 : 35 : ....label:process renames
       0.052 : 35 : ..label:process_entries
          0.001 : <unmeasured> ( 1.7%)
          0.033 : 35 : ....label:processing
          0.017 : 35 : ....label:process_entries setup
             0.001 : <unmeasured> ( 5.8%)
             0.008 : 35 : ......label:plist copy
             0.008 : 35 : ......label:plist sort
             0.000 : 35 : ......label:plist grow
          0.001 : 35 : ....label:finalize
       0.005 : 35 : ..label:merge_start
          0.001 : <unmeasured> (18.8%)
          0.004 : 34 : ....label:reset_maps
          0.000 : 35 : ....label:sanity checks
          0.000 : 1 : ....label:allocate/init
       0.003 : 6 : ..label:reset_maps
    0.035 : 1 : label:do_write_index /home/newren/floss/linux-stable/.git/index.lock
    0.034 : 1 : label:checkout
       0.034 : <unmeasured> (99.9%)
       0.000 : 1 : ..label:Filtering content
    0.009 : 1 : label:do_read_index .git/index
    0.000 : 1 : label:write_auto_merge
    0.000 : 1 : label:record_unmerged
Estimated measurement overhead (.010 ms/region-measure * 679): 0.006790000000000001
Timing including forking:  0.960 (0.013 additional seconds)
=== END OUTPUT ===

This was a run that took just under 1s (and was a hot-cache case; I
had just done the same rebase before to warm the caches), and the
combination of index/working tree bits (everything at and after
do_write_index in the output) was 0.035+0.034+0.009+0+0=0.078 seconds,
corresponding to just over 8.1% of overall time.  I'm curious where
that lands for your repository testcase; if the larger time ends up
somewhere under the indented label:incore_nonrecursive region, then
it's due to something other than index reading/updating/writing.
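
For reference, the arithmetic above can be reproduced with a small
sketch. The numbers are copied from the example output; the dictionary
layout and the helper name index_share are made up for illustration and
are not part of summarize-perf.

```python
# Top-level region times in seconds, copied from the example output above.
top_level = {
    "<unmeasured>": 0.031,
    "incore_nonrecursive": 0.837,
    "do_write_index": 0.035,
    "checkout": 0.034,
    "do_read_index": 0.009,
    "write_auto_merge": 0.000,
    "record_unmerged": 0.000,
}
total = 0.960  # the "Timing including forking" figure

# Everything at and after do_write_index counts as index/working tree work.
INDEX_REGIONS = (
    "do_write_index",
    "checkout",
    "do_read_index",
    "write_auto_merge",
    "record_unmerged",
)


def index_share(times, total_seconds):
    """Return (seconds, percent of total) spent in the index-related regions."""
    index_time = sum(times[name] for name in INDEX_REGIONS)
    return index_time, 100.0 * index_time / total_seconds


index_time, percent = index_share(top_level, total)
print(f"{index_time:.3f}s of {total:.3f}s = {percent:.1f}%")
# prints "0.078s of 0.960s = 8.1%"
```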

*2*. If it really is due to index reading/updating/writing, then index
handling in merge-ort is confined to two functions: checkout() and
record_unmerged_index_entries().  Both functions aren't too long, and
neither one calls into any other function within merge-ort.c.
(Further, checkout() is a near copy of code from merge_working_tree()
in builtin/checkout.c, or at least a copy of that function from a year
or so ago.)  As such, it's possible you can go in and make whatever
special tweaks you have for partial index reading/writing to those
functions.

I'm curious to hear back more on this.

Elijah

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-07  6:06     ` Elijah Newren
@ 2020-11-07 15:02       ` Derrick Stolee
  2020-11-07 19:39         ` Elijah Newren
  0 siblings, 1 reply; 84+ messages in thread
From: Derrick Stolee @ 2020-11-07 15:02 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

On 11/7/20 1:06 AM, Elijah Newren wrote:
> Hi Derrick,
> 
> On Tue, Nov 3, 2020 at 8:36 AM Elijah Newren <newren@gmail.com> wrote:
>> All that said, for testing either branch you just need to first set
>> pull.twohead=ort in your git config (see
>> https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
>> or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.
> 
> I probably also should have mentioned that merge-ort does not (yet?)
> heed the merge.renames configuration setting; it always detects renames.
> I know you run with merge.renames=false, so you won't quite get an
> apples-to-apples comparison.  However, part of my point was I wanted
> to make renames fast enough that they could be left turned on, even
> for the large scale repos, so I'm very interested in your experience.
> If you need an escape hatch, though, just put a "return 1" at the top
> of detect_and_process_renames() to turn it off.
> 
> Oh, and I went through and re-merged all the merge commits in the
> linux kernel and found a bug in merge-ort while doing that (causing it
> to die, not to merge badly).  I'm kind of surprised that none of my
> testcases triggered that failure earlier; if you're testing it out,
> you might want to update to get the fix (commit 067e5c1a38,
> "merge-ort: fix bug with cached_target_names not being initialized in
> redos", 2020-11-06).

I did manage to do some testing to see what happens with
a large repo under a small sparse-checkout. And using
trace2, I was able to see that your code is being exercised.
Unfortunately, I didn't see any performance improvement, and
that is likely due to needing to expand the index entirely
when checking out the merge commit.

Is there a command to construct a merge commit without
actually checking it out? That would reduce the time spent
expanding the index, which would allow your algorithm to
really show its benefits!

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-03 16:36   ` Elijah Newren
@ 2020-11-07  6:06     ` Elijah Newren
  2020-11-07 15:02       ` Derrick Stolee
  0 siblings, 1 reply; 84+ messages in thread
From: Elijah Newren @ 2020-11-07  6:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

Hi Derrick,

On Tue, Nov 3, 2020 at 8:36 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Nov 3, 2020 at 6:50 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 11/2/2020 3:43 PM, Elijah Newren wrote:
> > > This series depends on a merge of en/strmap (after updating to v3) and
> > > en/merge-ort-api-null-impl.
> > >
> > > As promised, here's the update of the series due to the strmap
> > > updates...and two other tiny updates.
> >
> > Hi Elijah,
> >
> > I'm sorry that I've been unavailable to read and review your series
> > on this topic. I'm very excited about the opportunities here, and I
> > wanted to take your topic and merge it with our microsoft/git fork
> > so I could test the performance in a Scalar-enabled monorepo. My
> > branch is available in my fork [1]
> >
> > [1] https://github.com/derrickstolee/git/tree/merge-ort-vfs
> >
> > However, I'm unable to discover how to trigger your ort strategy,
> > even for a simple rebase. Perhaps you could supply a recommended
> > command for testing?
> >
> > Thanks,
> > -Stolee
>
> If you want to test performance, you shouldn't test this particular
> submission, you should test the end result which exists as the 'ort'
> branch of my repo.  It actually passes all the tests rather than just
> trivial cherry-picks and rebases, and has lots (and lots) of
> performance work that hasn't even begun at the point of the
> 'ort-basics' branch.  (However, it also contains some unrelated memory
> cleanup in revision.c, chdir-notify.c, and a number of other places
> because I was annoyed that a rebase wouldn't run valgrind-free and
> made it harder to spot my memory leaks.  And the day I went hunting
> those memory "leaks", I went and grabbed some unrelated memory leaks
> too.  If it causes you merge conflicts, let me know and I'll try to
> create a branch for you that has the minimal changes outside of
> merge-ort*.[ch] and diffcore*.[ch])
>
> All that said, for testing either branch you just need to first set
> pull.twohead=ort in your git config (see
> https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
> or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.

I probably also should have mentioned that merge-ort does not (yet?)
heed the merge.renames configuration setting; it always detects renames.
I know you run with merge.renames=false, so you won't quite get an
apples-to-apples comparison.  However, part of my point was I wanted
to make renames fast enough that they could be left turned on, even
for the large scale repos, so I'm very interested in your experience.
If you need an escape hatch, though, just put a "return 1" at the top
of detect_and_process_renames() to turn it off.

Oh, and I went through and re-merged all the merge commits in the
linux kernel and found a bug in merge-ort while doing that (causing it
to die, not to merge badly).  I'm kind of surprised that none of my
testcases triggered that failure earlier; if you're testing it out,
you might want to update to get the fix (commit 067e5c1a38,
"merge-ort: fix bug with cached_target_names not being initialized in
redos", 2020-11-06).

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-03 14:49 ` Derrick Stolee
@ 2020-11-03 16:36   ` Elijah Newren
  2020-11-07  6:06     ` Elijah Newren
  0 siblings, 1 reply; 84+ messages in thread
From: Elijah Newren @ 2020-11-03 16:36 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git Mailing List

On Tue, Nov 3, 2020 at 6:50 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/2/2020 3:43 PM, Elijah Newren wrote:
> > This series depends on a merge of en/strmap (after updating to v3) and
> > en/merge-ort-api-null-impl.
> >
> > As promised, here's the update of the series due to the strmap
> > updates...and two other tiny updates.
>
> Hi Elijah,
>
> I'm sorry that I've been unavailable to read and review your series
> on this topic. I'm very excited about the opportunities here, and I
> wanted to take your topic and merge it with our microsoft/git fork
> so I could test the performance in a Scalar-enabled monorepo. My
> branch is available in my fork [1]
>
> [1] https://github.com/derrickstolee/git/tree/merge-ort-vfs
>
> However, I'm unable to discover how to trigger your ort strategy,
> even for a simple rebase. Perhaps you could supply a recommended
> command for testing?
>
> Thanks,
> -Stolee

If you want to test performance, you shouldn't test this particular
submission, you should test the end result which exists as the 'ort'
branch of my repo.  It actually passes all the tests rather than just
trivial cherry-picks and rebases, and has lots (and lots) of
performance work that hasn't even begun at the point of the
'ort-basics' branch.  (However, it also contains some unrelated memory
cleanup in revision.c, chdir-notify.c, and a number of other places
because I was annoyed that a rebase wouldn't run valgrind-free and
made it harder to spot my memory leaks.  And the day I went hunting
those memory "leaks", I went and grabbed some unrelated memory leaks
too.  If it causes you merge conflicts, let me know and I'll try to
create a branch for you that has the minimal changes outside of
merge-ort*.[ch] and diffcore*.[ch])

All that said, for testing either branch you just need to first set
pull.twohead=ort in your git config (see
https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 00/20] fundamentals of merge-ort implementation
  2020-11-02 20:43 [PATCH v2 " Elijah Newren
@ 2020-11-03 14:49 ` Derrick Stolee
  2020-11-03 16:36   ` Elijah Newren
  2020-11-11 17:08 ` Derrick Stolee
  1 sibling, 1 reply; 84+ messages in thread
From: Derrick Stolee @ 2020-11-03 14:49 UTC (permalink / raw)
  To: Elijah Newren, git

On 11/2/2020 3:43 PM, Elijah Newren wrote:
> This series depends on a merge of en/strmap (after updating to v3) and
> en/merge-ort-api-null-impl.
> 
> As promised, here's the update of the series due to the strmap
> updates...and two other tiny updates.

Hi Elijah,

I'm sorry that I've been unavailable to read and review your series
on this topic. I'm very excited about the opportunities here, and I
wanted to take your topic and merge it with our microsoft/git fork
so I could test the performance in a Scalar-enabled monorepo. My
branch is available in my fork [1]

[1] https://github.com/derrickstolee/git/tree/merge-ort-vfs

However, I'm unable to discover how to trigger your ort strategy,
even for a simple rebase. Perhaps you could supply a recommended
command for testing?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH v2 00/20] fundamentals of merge-ort implementation
@ 2020-11-02 20:43 Elijah Newren
  2020-11-03 14:49 ` Derrick Stolee
  2020-11-11 17:08 ` Derrick Stolee
  0 siblings, 2 replies; 84+ messages in thread
From: Elijah Newren @ 2020-11-02 20:43 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

This series depends on a merge of en/strmap (after updating to v3) and
en/merge-ort-api-null-impl.

As promised, here's the update of the series due to the strmap
updates...and two other tiny updates.

Changes since v1:
  * updates needed based on changes made in v3 of the strmap series
  * fixed a typo in a comment
  * tiny tweak to move a strmap_put() into setup_paths()

Elijah Newren (20):
  merge-ort: setup basic internal data structures
  merge-ort: add some high-level algorithm structure
  merge-ort: port merge_start() from merge-recursive
  merge-ort: use histogram diff
  merge-ort: add an err() function similar to one from merge-recursive
  merge-ort: implement a very basic collect_merge_info()
  merge-ort: avoid repeating fill_tree_descriptor() on the same tree
  merge-ort: compute a few more useful fields for collect_merge_info
  merge-ort: record stage and auxiliary info for every path
  merge-ort: avoid recursing into identical trees
  merge-ort: add a preliminary simple process_entries() implementation
  merge-ort: have process_entries operate in a defined order
  merge-ort: step 1 of tree writing -- record basenames, modes, and oids
  merge-ort: step 2 of tree writing -- function to create tree object
  merge-ort: step 3 of tree writing -- handling subdirectories as we go
  merge-ort: basic outline for merge_switch_to_result()
  merge-ort: add implementation of checkout()
  tree: enable cmp_cache_name_compare() to be used elsewhere
  merge-ort: add implementation of record_unmerged_index_entries()
  merge-ort: free data structures in merge_finalize()

 merge-ort.c | 929 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 tree.c      |   2 +-
 tree.h      |   2 +
 3 files changed, 929 insertions(+), 4 deletions(-)

-- 
2.29.0.471.ga4f56089c0


^ permalink raw reply	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2020-12-14 16:26 UTC | newest]

Thread overview: 84+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
2020-11-29 10:23   ` Ævar Arnfjörð Bjarmason
2020-11-30 16:56     ` Elijah Newren
2020-11-29 10:26   ` Ævar Arnfjörð Bjarmason
2020-11-29  7:43 ` [PATCH 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
2020-11-29 10:20   ` Ævar Arnfjörð Bjarmason
2020-11-29  7:43 ` [PATCH 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
2020-11-29  7:47 ` [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren
2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
2020-12-14 14:24     ` [PATCH v3 00/20] fundamentals of merge-ort implementation Felipe Contreras
2020-12-14 16:24       ` Elijah Newren
  -- strict thread matches above, loose matches on Subject: below --
2020-11-02 20:43 [PATCH v2 " Elijah Newren
2020-11-03 14:49 ` Derrick Stolee
2020-11-03 16:36   ` Elijah Newren
2020-11-07  6:06     ` Elijah Newren
2020-11-07 15:02       ` Derrick Stolee
2020-11-07 19:39         ` Elijah Newren
2020-11-09 12:30           ` Derrick Stolee
2020-11-09 17:13             ` Elijah Newren
2020-11-09 19:51               ` Derrick Stolee
2020-11-09 22:44                 ` Elijah Newren
2020-11-11 17:08 ` Derrick Stolee
2020-11-11 18:35   ` Elijah Newren
2020-11-11 20:48     ` Derrick Stolee
2020-11-11 21:18       ` Elijah Newren

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git