From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>, Elijah Newren <newren@gmail.com>
Subject: [PATCH 01/20] merge-ort: setup basic internal data structures
Date: Sun, 29 Nov 2020 07:43:04 +0000
Message-ID: <2568ec92c6d96dc51aff4a411900eaec8d32ce27.1606635803.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.923.git.git.1606635803.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

Set up some basic internal data structures. The only carry-over from
merge-recursive.c is call_depth, though needed_rename_limit will be
added later.

The central piece of data will definitely be the strmap "paths", which
will map every relevant pathname under consideration to either a
merged_info or a conflict_info. ("conflicted" is a strmap that is a
subset of "paths".)

merged_info contains all relevant information for a non-conflicted
entry. conflict_info contains a merged_info, plus any additional
information about a conflict such as the higher order stages involved
and the names of the paths those came from (handy once renames get
involved). If an entry remains conflicted, the merged_info portion of a
conflict_info will later be filled with whatever version of the file
should be placed in the working directory (e.g. an as-merged-as-possible
variation that contains conflict markers).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index b487901d3e..bb37fdf838 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -17,6 +17,143 @@
 #include "cache.h"
 #include "merge-ort.h"
 
+#include "strmap.h"
+
+struct merge_options_internal {
+	/*
+	 * paths: primary data structure in all of merge ort.
+	 *
+	 * The keys of paths:
+	 *   * are full relative paths from the toplevel of the repository
+	 *     (e.g. "drivers/firmware/raspberrypi.c").
+	 *   * store all relevant paths in the repo, both directories and
+	 *     files (e.g. drivers, drivers/firmware would also be included)
+	 *   * these keys serve to intern all the path strings, which allows
+	 *     us to do pointer comparison on directory names instead of
+	 *     strcmp; we just have to be careful to use the interned strings.
+	 *
+	 * The values of paths:
+	 *   * either a pointer to a merged_info, or a conflict_info struct
+	 *   * merged_info contains all relevant information for a
+	 *     non-conflicted entry.
+	 *   * conflict_info contains a merged_info, plus any additional
+	 *     information about a conflict such as the higher order stages
+	 *     involved and the names of the paths those came from (handy
+	 *     once renames get involved).
+	 *   * a path may start "conflicted" (i.e. point to a conflict_info)
+	 *     and then a later step (e.g. three-way content merge) determines
+	 *     it can be cleanly merged, at which point it'll be marked clean
+	 *     and the algorithm will ignore any data outside the contained
+	 *     merged_info for that entry
+	 *   * If an entry remains conflicted, the merged_info portion of a
+	 *     conflict_info will later be filled with whatever version of
+	 *     the file should be placed in the working directory (e.g. an
+	 *     as-merged-as-possible variation that contains conflict markers).
+	 */
+	struct strmap paths;
+
+	/*
+	 * conflicted: a subset of keys->values from "paths"
+	 *
+	 * conflicted is basically an optimization between process_entries()
+	 * and record_conflicted_index_entries(); the latter could loop over
+	 * ALL the entries in paths AGAIN and look for the ones that are
+	 * still conflicted, but since process_entries() has to loop over
+	 * all of them, it saves the ones it couldn't resolve in this strmap
+	 * so that record_conflicted_index_entries() can iterate just the
+	 * relevant entries.
+	 */
+	struct strmap conflicted;
+
+	/*
+	 * current_dir_name: temporary var used in collect_merge_info_callback()
+	 *
+	 * Used to set merged_info.directory_name; see documentation for that
+	 * variable and the requirements placed on that field.
+	 */
+	const char *current_dir_name;
+
+	/* call_depth: recursion level counter for merging merge bases */
+	int call_depth;
+};
+
+struct version_info {
+	struct object_id oid;
+	unsigned short mode;
+};
+
+struct merged_info {
+	/* if is_null, ignore result. otherwise result has oid & mode */
+	struct version_info result;
+	unsigned is_null:1;
+
+	/*
+	 * clean: whether the path in question is cleanly merged.
+	 *
+	 * see conflict_info.merged for more details.
+	 */
+	unsigned clean:1;
+
+	/*
+	 * basename_offset: offset of basename of path.
+	 *
+	 * perf optimization to avoid recomputing offset of final '/'
+	 * character in pathname (0 if no '/' in pathname).
+	 */
+	size_t basename_offset;
+
+	/*
+	 * directory_name: containing directory name.
+	 *
+	 * Note that we assume directory_name is constructed such that
+	 *    strcmp(dir1_name, dir2_name) == 0 iff dir1_name == dir2_name,
+	 * i.e. string equality is equivalent to pointer equality. For this
+	 * to hold, we have to be careful setting directory_name.
+	 */
+	const char *directory_name;
+};
+
+struct conflict_info {
+	/*
+	 * merged: the version of the path that will be written to working tree
+	 *
+	 * WARNING: It is critical to check merged.clean and ensure it is 0
+	 * before reading any conflict_info fields outside of merged.
+	 * Allocated merged_info structs will always have clean set to 1.
+	 * Allocated conflict_info structs will have merged.clean set to 0
+	 * initially. The merged.clean field is how we know if it is safe
+	 * to access other parts of conflict_info besides merged; if a
+	 * conflict_info's merged.clean is changed to 1, the rest of the
+	 * algorithm is not allowed to look at anything outside of the
+	 * merged member anymore.
+	 */
+	struct merged_info merged;
+
+	/* oids & modes from each of the three trees for this path */
+	struct version_info stages[3];
+
+	/* pathnames for each stage; may differ due to rename detection */
+	const char *pathnames[3];
+
+	/* Whether this path is/was involved in a directory/file conflict */
+	unsigned df_conflict:1;
+
+	/*
+	 * For filemask and dirmask, see tree-walk.h's struct traverse_info,
+	 * particularly the documentation above the "fn" member. Note that
+	 * filemask = mask & ~dirmask from that documentation.
+	 */
+	unsigned filemask:3;
+	unsigned dirmask:3;
+
+	/*
+	 * Optimization to track which stages match, to avoid the need to
+	 * recompute it in multiple steps. Either 0 or at least 2 bits are
+	 * set; if at least 2 bits are set, their corresponding stages match.
+	 */
+	unsigned match_mask:3;
+};
+
 void merge_switch_to_result(struct merge_options *opt,
 			    struct tree *head,
 			    struct merge_result *result,
-- 
gitgitgadget
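
The WARNING comment on conflict_info.merged describes a convention that
may be easier to follow with a small standalone sketch. The code below is
not part of the patch: the structs are cut-down stand-ins (the object id
is reduced to a short string, most fields are dropped), and
conflict_details(), new_merged_info(), new_conflict_info() and
mark_clean() are hypothetical helper names chosen for illustration. It
only shows the idea that, because merged is the first member of
conflict_info, a value taken from "paths" can always be read as a
merged_info first, and the remaining fields are consulted only while
clean is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Simplified stand-ins for the structs in the patch. Assumption: the
 * object id is a short string here and most fields are omitted.
 */
struct version_info {
	char oid[8];
	unsigned short mode;
};

struct merged_info {
	struct version_info result;
	unsigned is_null:1;
	unsigned clean:1;
};

struct conflict_info {
	struct merged_info merged;	/* first member, so the casts below are valid */
	struct version_info stages[3];
	const char *pathnames[3];
};

/*
 * Hypothetical helper: given a value stored in "paths", hand back the
 * full conflict_info only if the entry is still conflicted.
 */
static struct conflict_info *conflict_details(void *value)
{
	struct merged_info *mi = value;

	if (mi->clean)
		return NULL;	/* only the merged_info part may be read */
	return value;
}

/* Hypothetical allocator: a merged_info always starts with clean = 1. */
static struct merged_info *new_merged_info(void)
{
	struct merged_info *mi = calloc(1, sizeof(*mi));

	if (mi)
		mi->clean = 1;
	return mi;
}

/* Hypothetical allocator: calloc() leaves merged.clean at 0. */
static struct conflict_info *new_conflict_info(void)
{
	return calloc(1, sizeof(struct conflict_info));
}

/*
 * Hypothetical: a later step resolved the path. From here on, nothing
 * outside ci->merged is supposed to be looked at.
 */
static void mark_clean(struct conflict_info *ci)
{
	ci->merged.clean = 1;
}
```

A caller would allocate entries with the two allocators, store them as
plain pointers, and go through conflict_details() before touching stages
or pathnames. Once mark_clean() has run on an entry, conflict_details()
returns NULL for it, which matches the rule that the rest of the
algorithm must stop looking outside the merged member.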