git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: jonathantanmy@google.com, dstolee@microsoft.com,
	"Elijah Newren" <newren@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v3 01/20] merge-ort: setup basic internal data structures
Date: Sun, 13 Dec 2020 08:04:08 +0000	[thread overview]
Message-ID: <518dde86966cba9aba211f933e74672fba95509a.1607846667.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.923.v3.git.git.1607846667.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

Set up some basic internal data structures.  The only carry-over from
merge-recursive.c is call_depth, though needed_rename_limit will be
added later.

The central piece of data will definitely be the strmap "paths", which
will map every relevant pathname under consideration to either a
merged_info or a conflict_info.  ("conflicted" is a strmap that is a
subset of "paths".)

merged_info contains all relevant information for a non-conflicted
entry.  conflict_info contains a merged_info, plus any additional
information about a conflict such as the higher orders stages involved
and the names of the paths those came from (handy once renames get
involved).  If an entry remains conflicted, the merged_info portion of a
conflict_info will later be filled with whatever version of the file
should be placed in the working directory (e.g. an as-merged-as-possible
variation that contains conflict markers).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index b487901d3ec..3325c9c0a2c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,153 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "strmap.h"
+
+struct merge_options_internal {
+	/*
+	 * paths: primary data structure in all of merge ort.
+	 *
+	 * The keys of paths:
+	 *   * are full relative paths from the toplevel of the repository
+	 *     (e.g. "drivers/firmware/raspberrypi.c").
+	 *   * store all relevant paths in the repo, both directories and
+	 *     files (e.g. drivers, drivers/firmware would also be included)
+	 *   * these keys serve to intern all the path strings, which allows
+	 *     us to do pointer comparison on directory names instead of
+	 *     strcmp; we just have to be careful to use the interned strings.
+	 *
+	 * The values of paths:
+	 *   * either a pointer to a merged_info, or a conflict_info struct
+	 *   * merged_info contains all relevant information for a
+	 *     non-conflicted entry.
+	 *   * conflict_info contains a merged_info, plus any additional
+	 *     information about a conflict such as the higher orders stages
+	 *     involved and the names of the paths those came from (handy
+	 *     once renames get involved).
+	 *   * a path may start "conflicted" (i.e. point to a conflict_info)
+	 *     and then a later step (e.g. three-way content merge) determines
+	 *     it can be cleanly merged, at which point it'll be marked clean
+	 *     and the algorithm will ignore any data outside the contained
+	 *     merged_info for that entry
+	 *   * If an entry remains conflicted, the merged_info portion of a
+	 *     conflict_info will later be filled with whatever version of
+	 *     the file should be placed in the working directory (e.g. an
+	 *     as-merged-as-possible variation that contains conflict markers).
+	 */
+	struct strmap paths;
+
+	/*
+	 * conflicted: a subset of keys->values from "paths"
+	 *
+	 * conflicted is basically an optimization between process_entries()
+	 * and record_conflicted_index_entries(); the latter could loop over
+	 * ALL the entries in paths AGAIN and look for the ones that are
+	 * still conflicted, but since process_entries() has to loop over
+	 * all of them, it saves the ones it couldn't resolve in this strmap
+	 * so that record_conflicted_index_entries() can iterate just the
+	 * relevant entries.
+	 */
+	struct strmap conflicted;
+
+	/*
+	 * current_dir_name: temporary var used in collect_merge_info_callback()
+	 *
+	 * Used to set merged_info.directory_name; see documentation for that
+	 * variable and the requirements placed on that field.
+	 */
+	const char *current_dir_name;
+
+	/* call_depth: recursion level counter for merging merge bases */
+	int call_depth;
+};
+
+struct version_info {
+	struct object_id oid;
+	unsigned short mode;
+};
+
+struct merged_info {
+	/* if is_null, ignore result.  otherwise result has oid & mode */
+	struct version_info result;
+	unsigned is_null:1;
+
+	/*
+	 * clean: whether the path in question is cleanly merged.
+	 *
+	 * see conflict_info.merged for more details.
+	 */
+	unsigned clean:1;
+
+	/*
+	 * basename_offset: offset of basename of path.
+	 *
+	 * perf optimization to avoid recomputing offset of final '/'
+	 * character in pathname (0 if no '/' in pathname).
+	 */
+	size_t basename_offset;
+
+	 /*
+	  * directory_name: containing directory name.
+	  *
+	  * Note that we assume directory_name is constructed such that
+	  *    strcmp(dir1_name, dir2_name) == 0 iff dir1_name == dir2_name,
+	  * i.e. string equality is equivalent to pointer equality.  For this
+	  * to hold, we have to be careful setting directory_name.
+	  */
+	const char *directory_name;
+};
+
+struct conflict_info {
+	/*
+	 * merged: the version of the path that will be written to working tree
+	 *
+	 * WARNING: It is critical to check merged.clean and ensure it is 0
+	 * before reading any conflict_info fields outside of merged.
+	 * Allocated merge_info structs will always have clean set to 1.
+	 * Allocated conflict_info structs will have merged.clean set to 0
+	 * initially.  The merged.clean field is how we know if it is safe
+	 * to access other parts of conflict_info besides merged; if a
+	 * conflict_info's merged.clean is changed to 1, the rest of the
+	 * algorithm is not allowed to look at anything outside of the
+	 * merged member anymore.
+	 */
+	struct merged_info merged;
+
+	/* oids & modes from each of the three trees for this path */
+	struct version_info stages[3];
+
+	/* pathnames for each stage; may differ due to rename detection */
+	const char *pathnames[3];
+
+	/* Whether this path is/was involved in a directory/file conflict */
+	unsigned df_conflict:1;
+
+	/*
+	 * For filemask and dirmask, the ith bit corresponds to whether the
+	 * ith entry is a file (filemask) or a directory (dirmask).  Thus,
+	 * filemask & dirmask is always zero, and filemask | dirmask is at
+	 * most 7 but can be less when a path does not appear as either a
+	 * file or a directory on at least one side of history.
+	 *
+	 * Note that these masks are related to enum merge_side, as the ith
+	 * entry corresponds to side i.
+	 *
+	 * These values come from a traverse_trees() call; more info may be
+	 * found looking at tree-walk.h's struct traverse_info,
+	 * particularly the documentation above the "fn" member (note that
+	 * filemask = mask & ~dirmask from that documentation).
+	 */
+	unsigned filemask:3;
+	unsigned dirmask:3;
+
+	/*
+	 * Optimization to track which stages match, to avoid the need to
+	 * recompute it in multiple steps. Either 0 or at least 2 bits are
+	 * set; if at least 2 bits are set, their corresponding stages match.
+	 */
+	unsigned match_mask:3;
+};
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
-- 
gitgitgadget


  reply	other threads:[~2020-12-13  8:07 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-29  7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
2020-11-29 10:23   ` Ævar Arnfjörð Bjarmason
2020-11-30 16:56     ` Elijah Newren
2020-11-29 10:26   ` Ævar Arnfjörð Bjarmason
2020-11-29  7:43 ` [PATCH 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
2020-11-29  7:43 ` [PATCH 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
2020-11-29 10:20   ` Ævar Arnfjörð Bjarmason
2020-11-29  7:43 ` [PATCH 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
2020-11-29  7:47 ` [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren
2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
2020-12-04 20:48   ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
2020-12-13  8:04   ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
2020-12-13  8:04     ` Elijah Newren via GitGitGadget [this message]
2020-12-13  8:04     ` [PATCH v3 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
2020-12-13  8:04     ` [PATCH v3 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
2020-12-14 14:24     ` [PATCH v3 00/20] fundamentals of merge-ort implementation Felipe Contreras
2020-12-14 16:24       ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=518dde86966cba9aba211f933e74672fba95509a.1607846667.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).