From: Thomas Gummerer <t.gummerer@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
Thomas Gummerer <t.gummerer@gmail.com>
Subject: [PATCH v4 05/11] rerere: add documentation for conflict normalization
Date: Sun, 5 Aug 2018 18:20:31 +0100 [thread overview]
Message-ID: <20180805172037.12530-6-t.gummerer@gmail.com> (raw)
In-Reply-To: <20180805172037.12530-1-t.gummerer@gmail.com>
Add some documentation for the logic behind the conflict normalization
in rerere.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
Documentation/technical/rerere.txt | 140 +++++++++++++++++++++++++++++
rerere.c | 4 -
2 files changed, 140 insertions(+), 4 deletions(-)
create mode 100644 Documentation/technical/rerere.txt
diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..3d10dbfa67
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,140 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are normalized by stripping
+the labels from the conflict markers, and removing the common ancestor
+version from the `diff3` conflict style. Branches that are merged
+in different order are normalized by sorting the conflict hunks. More
+on each of those steps in the following sections.
+
+Once these two normalization operations are applied, a conflict ID is
+calculated based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Removing the common ancestor version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2. The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this is called "line A" in the rest of the document). In
+branch AB this line is changed to "B", in AC, this line is changed to
+"C", and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Forking a branch ABAC off of branch AB and then merging AC into it, we
+get a conflict like the following:
+
+ <<<<<<< HEAD
+ B
+ =======
+ C
+ >>>>>>> AC
+
+Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
+and then merging branch AC2 into it), using the diff3 conflict style,
+we get a conflict like the following:
+
+ <<<<<<< HEAD
+ B
+ ||||||| merged common ancestors
+ A
+ =======
+ C
+ >>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+ After examining what branches AB and AC did, I believe that making
+ line A into line D is the best thing to do that is compatible with
+ what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same. To do this, the labels on the conflict
+markers are stripped, and the common ancestor version is removed. The above
+examples would both result in the following normalized conflict:
+
+ <<<<<<<
+ B
+ =======
+ C
+ >>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, lets imagine that a common ancestor had a file with line A
+its early part, and line X in its late part. And then four branches
+are forked that do these things:
+
+ - AB: changes A to B
+ - AC: changes A to C
+ - XY: changes X to Y
+ - XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order. The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+ After examining what branches AB and AC did, I believe that
+ making line A into line D is the best thing to do that is
+ compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean. We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z". We can see there are
+4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or C, a late part becames X or Y", and whenever
+any of these four patterns appear, and we can get to the same conflict
+and resolution that we saw earlier.
+
+Without the sorting, we'd have to somehow find a previous resolution
+from combinatorial explosion.
+
+Conflict ID calculation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the conflict normalization is done, the conflict ID is calculated
+as the sha1 hash of the conflict hunks appended to each other,
+separated by <NUL> characters. The conflict markers are stripped out
+before the sha1 is calculated. So in the example above, where we
+merge branch AC which changes line A to line C, into branch AB, which
+changes line A to line C, the conflict ID would be
+SHA1('B<NUL>C<NUL>').
+
+If there are multiple conflicts in one file, the sha1 is calculated
+the same way with all hunks appended to each other, in the order in
+which they appear in the file, separated by a <NUL> character.
diff --git a/rerere.c b/rerere.c
index be98c0afcb..da1ab54027 100644
--- a/rerere.c
+++ b/rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
* and NUL concatenated together.
*
* Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
*/
static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
{
--
2.18.0.720.gf7a957e2e7
next prev parent reply other threads:[~2018-08-05 17:20 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
2018-05-21 19:00 ` Stefan Beller
2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
2018-05-24 7:20 ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
2018-05-24 9:20 ` Junio C Hamano
2018-06-03 11:41 ` Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-05-24 9:47 ` Junio C Hamano
2018-05-24 18:54 ` Thomas Gummerer
2018-05-25 1:20 ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-05-24 10:02 ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 6/7] rerere: factor out handle_conflict function Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-05-24 10:21 ` Junio C Hamano
2018-05-24 19:07 ` Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 02/10] rerere: lowercase error messages Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 03/10] rerere: wrap paths in output in sq Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 04/10] rerere: mark strings for translation Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 05/10] rerere: add some documentation Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 07/10] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 08/10] rerere: factor out handle_conflict function Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
2018-07-03 21:05 ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
2018-07-06 17:56 ` Junio C Hamano
2018-07-10 21:37 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 00/11] " Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 02/11] rerere: lowercase error messages Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 03/11] rerere: wrap paths in output in sq Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
2018-07-15 13:24 ` Simon Ruderich
2018-07-16 20:40 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
2018-07-30 17:50 ` Junio C Hamano
2018-07-30 20:21 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-07-30 17:50 ` Junio C Hamano
2018-07-30 20:45 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-07-30 17:50 ` Junio C Hamano
2018-07-30 20:47 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 08/11] rerere: factor out handle_conflict function Thomas Gummerer
2018-07-30 17:51 ` Junio C Hamano
2018-07-14 21:44 ` [PATCH v3 09/11] rerere: return strbuf from handle path Thomas Gummerer
2018-07-30 17:51 ` Junio C Hamano
2018-07-14 21:44 ` [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-07-30 17:45 ` Junio C Hamano
2018-07-30 20:20 ` Thomas Gummerer
2018-07-14 21:44 ` [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
2018-07-30 17:50 ` [PATCH v3 00/11] rerere: handle nested conflicts Junio C Hamano
2018-07-30 20:49 ` Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 " Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 02/11] rerere: lowercase error messages Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 03/11] rerere: wrap paths in output in sq Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 04/11] rerere: mark strings for translation Thomas Gummerer
2018-08-05 17:20 ` Thomas Gummerer [this message]
2018-08-05 17:20 ` [PATCH v4 06/11] rerere: fix crash with files rerere can't handle Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 08/11] rerere: factor out handle_conflict function Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 09/11] rerere: return strbuf from handle path Thomas Gummerer
2018-08-05 17:20 ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-08-22 11:00 ` Ævar Arnfjörð Bjarmason
2018-08-22 16:06 ` Junio C Hamano
2018-08-22 20:34 ` Thomas Gummerer
2018-08-22 21:07 ` Junio C Hamano
2018-08-24 21:56 ` Thomas Gummerer
2018-08-24 22:10 ` [PATCH 1/2] rerere: remove documentation for "nested conflicts" Thomas Gummerer
2018-08-24 22:10 ` [PATCH 2/2] rerere: add not about files with existing conflict markers Thomas Gummerer
2018-08-28 21:27 ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Thomas Gummerer
2018-08-28 21:27 ` [PATCH v2 2/2] rerere: add note about files with existing " Thomas Gummerer
2018-08-29 16:04 ` [PATCH v2 1/2] rerere: mention caveat about unmatched " Junio C Hamano
2018-09-01 9:00 ` Thomas Gummerer
2018-08-27 17:33 ` [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts Junio C Hamano
2018-08-28 22:05 ` Thomas Gummerer
2018-08-27 19:36 ` Junio C Hamano
2018-08-05 17:20 ` [PATCH v4 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180805172037.12530-6-t.gummerer@gmail.com \
--to=t.gummerer@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).