From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: jonathantanmy@google.com, dstolee@microsoft.com,
"Elijah Newren" <newren@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Elijah Newren" <newren@gmail.com>,
"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v2 01/20] merge-ort: setup basic internal data structures
Date: Fri, 04 Dec 2020 20:47:51 +0000 [thread overview]
Message-ID: <2568ec92c6d96dc51aff4a411900eaec8d32ce27.1607114890.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.923.v2.git.git.1607114890.gitgitgadget@gmail.com>
From: Elijah Newren <newren@gmail.com>
Set up some basic internal data structures. The only carry-over from
merge-recursive.c is call_depth, though needed_rename_limit will be
added later.
The central piece of data will definitely be the strmap "paths", which
will map every relevant pathname under consideration to either a
merged_info or a conflict_info. ("conflicted" is a strmap that is a
subset of "paths".)
merged_info contains all relevant information for a non-conflicted
entry. conflict_info contains a merged_info, plus any additional
information about a conflict such as the higher orders stages involved
and the names of the paths those came from (handy once renames get
involved). If an entry remains conflicted, the merged_info portion of a
conflict_info will later be filled with whatever version of the file
should be placed in the working directory (e.g. an as-merged-as-possible
variation that contains conflict markers).
Signed-off-by: Elijah Newren <newren@gmail.com>
---
merge-ort.c | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 137 insertions(+)
diff --git a/merge-ort.c b/merge-ort.c
index b487901d3e..bb37fdf838 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,143 @@
#include "cache.h"
#include "merge-ort.h"
+#include "strmap.h"
+
+struct merge_options_internal {
+ /*
+ * paths: primary data structure in all of merge ort.
+ *
+ * The keys of paths:
+ * * are full relative paths from the toplevel of the repository
+ * (e.g. "drivers/firmware/raspberrypi.c").
+ * * store all relevant paths in the repo, both directories and
+ * files (e.g. drivers, drivers/firmware would also be included)
+ * * these keys serve to intern all the path strings, which allows
+ * us to do pointer comparison on directory names instead of
+ * strcmp; we just have to be careful to use the interned strings.
+ *
+ * The values of paths:
+ * * either a pointer to a merged_info, or a conflict_info struct
+ * * merged_info contains all relevant information for a
+ * non-conflicted entry.
+ * * conflict_info contains a merged_info, plus any additional
+ * information about a conflict such as the higher orders stages
+ * involved and the names of the paths those came from (handy
+ * once renames get involved).
+ * * a path may start "conflicted" (i.e. point to a conflict_info)
+ * and then a later step (e.g. three-way content merge) determines
+ * it can be cleanly merged, at which point it'll be marked clean
+ * and the algorithm will ignore any data outside the contained
+ * merged_info for that entry
+ * * If an entry remains conflicted, the merged_info portion of a
+ * conflict_info will later be filled with whatever version of
+ * the file should be placed in the working directory (e.g. an
+ * as-merged-as-possible variation that contains conflict markers).
+ */
+ struct strmap paths;
+
+ /*
+ * conflicted: a subset of keys->values from "paths"
+ *
+ * conflicted is basically an optimization between process_entries()
+ * and record_conflicted_index_entries(); the latter could loop over
+ * ALL the entries in paths AGAIN and look for the ones that are
+ * still conflicted, but since process_entries() has to loop over
+ * all of them, it saves the ones it couldn't resolve in this strmap
+ * so that record_conflicted_index_entries() can iterate just the
+ * relevant entries.
+ */
+ struct strmap conflicted;
+
+ /*
+ * current_dir_name: temporary var used in collect_merge_info_callback()
+ *
+ * Used to set merged_info.directory_name; see documentation for that
+ * variable and the requirements placed on that field.
+ */
+ const char *current_dir_name;
+
+ /* call_depth: recursion level counter for merging merge bases */
+ int call_depth;
+};
+
+struct version_info {
+ struct object_id oid;
+ unsigned short mode;
+};
+
+struct merged_info {
+ /* if is_null, ignore result. otherwise result has oid & mode */
+ struct version_info result;
+ unsigned is_null:1;
+
+ /*
+ * clean: whether the path in question is cleanly merged.
+ *
+ * see conflict_info.merged for more details.
+ */
+ unsigned clean:1;
+
+ /*
+ * basename_offset: offset of basename of path.
+ *
+ * perf optimization to avoid recomputing offset of final '/'
+ * character in pathname (0 if no '/' in pathname).
+ */
+ size_t basename_offset;
+
+ /*
+ * directory_name: containing directory name.
+ *
+ * Note that we assume directory_name is constructed such that
+ * strcmp(dir1_name, dir2_name) == 0 iff dir1_name == dir2_name,
+ * i.e. string equality is equivalent to pointer equality. For this
+ * to hold, we have to be careful setting directory_name.
+ */
+ const char *directory_name;
+};
+
+struct conflict_info {
+ /*
+ * merged: the version of the path that will be written to working tree
+ *
+ * WARNING: It is critical to check merged.clean and ensure it is 0
+ * before reading any conflict_info fields outside of merged.
+ * Allocated merge_info structs will always have clean set to 1.
+ * Allocated conflict_info structs will have merged.clean set to 0
+ * initially. The merged.clean field is how we know if it is safe
+ * to access other parts of conflict_info besides merged; if a
+ * conflict_info's merged.clean is changed to 1, the rest of the
+ * algorithm is not allowed to look at anything outside of the
+ * merged member anymore.
+ */
+ struct merged_info merged;
+
+ /* oids & modes from each of the three trees for this path */
+ struct version_info stages[3];
+
+ /* pathnames for each stage; may differ due to rename detection */
+ const char *pathnames[3];
+
+ /* Whether this path is/was involved in a directory/file conflict */
+ unsigned df_conflict:1;
+
+ /*
+ * For filemask and dirmask, see tree-walk.h's struct traverse_info,
+ * particularly the documentation above the "fn" member. Note that
+ * filemask = mask & ~dirmask from that documentation.
+ */
+ unsigned filemask:3;
+ unsigned dirmask:3;
+
+ /*
+ * Optimization to track which stages match, to avoid the need to
+ * recompute it in multiple steps. Either 0 or at least 2 bits are
+ * set; if at least 2 bits are set, their corresponding stages match.
+ */
+ unsigned match_mask:3;
+};
+
void merge_switch_to_result(struct merge_options *opt,
struct tree *head,
struct merge_result *result,
--
gitgitgadget
next prev parent reply other threads:[~2020-12-04 20:50 UTC|newest]
Thread overview: 74+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-29 7:43 [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
2020-11-29 10:23 ` Ævar Arnfjörð Bjarmason
2020-11-30 16:56 ` Elijah Newren
2020-11-29 10:26 ` Ævar Arnfjörð Bjarmason
2020-11-29 7:43 ` [PATCH 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
2020-11-29 7:43 ` [PATCH 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
2020-11-29 10:20 ` Ævar Arnfjörð Bjarmason
2020-11-29 7:43 ` [PATCH 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
2020-11-29 7:47 ` [PATCH 00/20] fundamentals of merge-ort implementation Elijah Newren
2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
2020-12-04 20:47 ` Elijah Newren via GitGitGadget [this message]
2020-12-04 20:47 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
2020-12-04 20:48 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 00/20] fundamentals of merge-ort implementation Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 01/20] merge-ort: setup basic internal data structures Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 02/20] merge-ort: add some high-level algorithm structure Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 04/20] merge-ort: use histogram diff Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 10/20] merge-ort: avoid recursing into identical trees Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 17/20] merge-ort: add implementation of checkout() Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 19/20] merge-ort: add implementation of record_conflicted_index_entries() Elijah Newren via GitGitGadget
2020-12-13 8:04 ` [PATCH v3 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren via GitGitGadget
2020-12-14 14:24 ` [PATCH v3 00/20] fundamentals of merge-ort implementation Felipe Contreras
2020-12-14 16:24 ` Elijah Newren
-- strict thread matches above, loose matches on Subject: below --
2020-11-02 20:43 [PATCH v2 " Elijah Newren
2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren
2020-11-06 22:05 ` Jonathan Tan
2020-11-06 22:45 ` Elijah Newren
2020-11-09 20:55 ` Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2568ec92c6d96dc51aff4a411900eaec8d32ce27.1607114890.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=avarab@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).