git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [RFC/PATCH 0/7] rerere: handle nested conflicts
@ 2018-05-20 21:12 Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
                   ` (7 more replies)
  0 siblings, 8 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

I started this whole patch series when I did a git rebase, and was too
lazy to resolve a conflict and just added the file with the conflict
markers and continued.  Once I got nested conflicts in the file, I
decided to abort the rebase with 'git rebase --abort' and got a
segfault in 'git rerere clear'.

Even if we can't handle the conflict, we shouldn't end with crashing
'git rerere clear'.  While trying to understand how 'git rerere' works
internally I noticed some other improvements that could be made, such
as marking the strings for translation and adding some docs on how
rerere works, since I had to find out from the code, and reading some
documentation would definitely have been helpful.

The next patches are more related to the actual problem I encountered,
first fixing the the possible crashing of 'git rerere clear' when we
can't handle conflicts in a file, and then actually trying to handle
nested conflicts.

I don't know if it's actually worth trying to handle nested conflicts,
as they are more than likely a very rare use-case, but on the other
hand resolving such conflicts is especially painful, so only having to
do it once would be much nicer.

This whole patch series is marked as RFC/PATCH, as this is my first
time touching the rerere code, so I may well misunderstand some bits
of the code.

Thomas Gummerer (7):
  rerere: unify error message when read_cache fails
  rerere: mark strings for translation
  rerere: add some documentation
  rerere: fix crash when conflict goes unresolved
  rerere: only return whether a path has conflicts or not
  rerere: factor out handle_conflict function
  rerere: teach rerere to handle nested conflicts

 Documentation/technical/rerere.txt |  43 +++++
 rerere.c                           | 244 ++++++++++++++---------------
 t/t4200-rerere.sh                  |  25 +++
 3 files changed, 186 insertions(+), 126 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

-- 
2.17.0.588.g4d217cdf8e.dirty


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PATCH 1/7] rerere: unify error message when read_cache fails
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-21 19:00   ` Stefan Beller
  2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We have multiple different variants of the error message we show to
the user if 'read_cache' fails.  The "Could not read index" variant we
are using in 'rerere.c' is currently not used anywhere in translated
form.

As a subsequent commit will mark all output that comes from 'rerere.c'
for translation, make the life of the translators a little bit easier
by using a string that is used elsewhere, and marked for translation
there, and thus most likely already translated.

"index file corrupt" seems to be the most common error message we show
when 'read_cache' fails, so use that here as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

"index file corrupt" is also what Stefan chose for his series unifying
these error messages (and 'die'ing, which I'm not sure is the right
thing to do here as also mentioned in my reply to [1]).  I'm happy to
drop this if we decide to go with that series.

[1]: <20180516222118.233868-8-sbeller@google.com>

 rerere.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rerere.c b/rerere.c
index 18cae2d11c..4b4869662d 100644
--- a/rerere.c
+++ b/rerere.c
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -1104,7 +1104,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
-- 
2.17.0.588.g4d217cdf8e.dirty


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PATCH 2/7] rerere: mark strings for translation
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24  7:20   ` Junio C Hamano
  2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

'git rerere' is considered a plumbing command and as such its output
should be translated.  Its functionality is also only enabled through
a config setting, so scripts really shouldn't rely on its output
either way.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 68 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/rerere.c b/rerere.c
index 4b4869662d..af5e6179a9 100644
--- a/rerere.c
+++ b/rerere.c
@@ -212,7 +212,7 @@ static void read_rr(struct string_list *rr)
 
 		/* There has to be the hash, tab, path and then NUL */
 		if (buf.len < 42 || get_sha1_hex(buf.buf, sha1))
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 
 		if (buf.buf[40] != '.') {
 			variant = 0;
@@ -221,10 +221,10 @@ static void read_rr(struct string_list *rr)
 			errno = 0;
 			variant = strtol(buf.buf + 41, &path, 10);
 			if (errno)
-				die("corrupt MERGE_RR");
+				die(_("corrupt MERGE_RR"));
 		}
 		if (*(path++) != '\t')
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 		buf.buf[40] = '\0';
 		id = new_rerere_id_hex(buf.buf);
 		id->variant = variant;
@@ -259,12 +259,12 @@ static int write_rr(struct string_list *rr, int out_fd)
 				    rr->items[i].string, 0);
 
 		if (write_in_full(out_fd, buf.buf, buf.len) < 0)
-			die("unable to write rerere record");
+			die(_("unable to write rerere record"));
 
 		strbuf_release(&buf);
 	}
 	if (commit_lock_file(&write_lock) != 0)
-		die("unable to write rerere record");
+		die(_("unable to write rerere record"));
 	return 0;
 }
 
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("Could not open %s", path);
+		return error_errno(_("Could not open %s"), path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("Could not write %s", output);
+			error_errno(_("Could not write %s"), output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("There were errors while writing %s (%s)",
+		error(_("There were errors while writing %s (%s)"),
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("Failed to flush %s", path);
+		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("Could not parse conflict hunks in %s", path);
+		return error(_("Could not parse conflict hunks in %s"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on %s",
+		warning_errno(_("failed utime() on %s"),
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("Could not open %s", path);
+		return error_errno(_("Could not open %s"), path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("Could not write %s", path);
+		error_errno(_("Could not write %s"), path);
 	if (fclose(f))
-		return error_errno("Writing %s failed", path);
+		return error_errno(_("Writing %s failed"), path);
 
 out:
 	free(cur.ptr);
@@ -715,13 +715,13 @@ static void update_paths(struct string_list *update)
 		struct string_list_item *item = &update->items[i];
 		if (add_file_to_cache(item->string, 0))
 			exit(128);
-		fprintf(stderr, "Staged '%s' using previous resolution.\n",
+		fprintf_ln(stderr, _("Staged '%s' using previous resolution."),
 			item->string);
 	}
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("Unable to write new index file");
+		die(_("Unable to write new index file"));
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -753,7 +753,7 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (!handle_file(path, NULL, NULL)) {
 			copy_file(rerere_path(id, "postimage"), path, 0666);
 			id->collection->status[variant] |= RR_HAS_POSTIMAGE;
-			fprintf(stderr, "Recorded resolution for '%s'.\n", path);
+			fprintf_ln(stderr, _("Recorded resolution for '%s'."), path);
 			free_rerere_id(rr_item);
 			rr_item->util = NULL;
 			return;
@@ -787,9 +787,9 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (rerere_autoupdate)
 			string_list_insert(update, path);
 		else
-			fprintf(stderr,
-				"Resolved '%s' using previous resolution.\n",
-				path);
+			fprintf_ln(stderr,
+				   _("Resolved '%s' using previous resolution."),
+				   path);
 		free_rerere_id(rr_item);
 		rr_item->util = NULL;
 		return;
@@ -803,11 +803,11 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 	if (id->collection->status[variant] & RR_HAS_POSTIMAGE) {
 		const char *path = rerere_path(id, "postimage");
 		if (unlink(path))
-			die_errno("cannot unlink stray '%s'", path);
+			die_errno(_("cannot unlink stray '%s'"), path);
 		id->collection->status[variant] &= ~RR_HAS_POSTIMAGE;
 	}
 	id->collection->status[variant] |= RR_HAS_PREIMAGE;
-	fprintf(stderr, "Recorded preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Recorded preimage for '%s'"), path);
 }
 
 static int do_plain_rerere(struct string_list *rr, int fd)
@@ -879,7 +879,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("Could not create directory %s", git_path_rr_cache());
+		die(_("Could not create directory %s"), git_path_rr_cache());
 	return 1;
 }
 
@@ -1032,7 +1032,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("Could not parse conflict hunks in '%s'", path);
+		return error(_("Could not parse conflict hunks in '%s'"), path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1050,7 +1050,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("Failed to update conflicted state in '%s'", path);
+			error(_("Failed to update conflicted state in '%s'"), path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
@@ -1061,16 +1061,16 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	}
 
 	if (id->collection->status_nr <= id->variant) {
-		error("no remembered resolution for '%s'", path);
+		error(_("no remembered resolution for '%s'"), path);
 		goto fail_exit;
 	}
 
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for %s", path);
+			error(_("no remembered resolution for %s"), path);
 		else
-			error_errno("cannot unlink %s", filename);
+			error_errno(_("cannot unlink %s"), filename);
 		goto fail_exit;
 	}
 
@@ -1080,7 +1080,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 * the postimage.
 	 */
 	handle_cache(path, sha1, rerere_path(id, "preimage"));
-	fprintf(stderr, "Updated preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Updated preimage for '%s'"), path);
 
 	/*
 	 * And remember that we can record resolution for this
@@ -1089,7 +1089,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for %s\n", path);
+	fprintf_ln(stderr, _("Forgot resolution for %s"), path);
 	return 0;
 
 fail_exit:
@@ -1104,7 +1104,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
@@ -1192,7 +1192,7 @@ void rerere_gc(struct string_list *rr)
 	git_config(git_default_config, NULL);
 	dir = opendir(git_path("rr-cache"));
 	if (!dir)
-		die_errno("unable to open rr-cache directory");
+		die_errno(_("unable to open rr-cache directory"));
 	/* Collect stale conflict IDs ... */
 	while ((e = readdir(dir))) {
 		struct rerere_dir *rr_dir;
-- 
2.17.0.588.g4d217cdf8e.dirty


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PATCH 3/7] rerere: add some documentation
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24  9:20   ` Junio C Hamano
  2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Add some documentation for the logic behind the conflict normalization
in rerere.  Also describe a bug that happens because we just linearly
scan for conflict markers.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

This documents my understanding of the rerere conflict normalization
and conflict ID computation logic.  Writing this down helped me
understand the logic, and I thought it may be useful to have this as
documentation in Documentation/technical as well.  Junio: as you wrote
the original NEEDSWORK comment, did you have something more in mind
here that should be documented?

 Documentation/technical/rerere.txt | 43 ++++++++++++++++++++++++++++++
 rerere.c                           |  4 ---
 2 files changed, 43 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..94cc6a7ef0
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,43 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To try and re-do a conflict resolution, even when different merge
+strategies are used, 'rerere' computes a conflict ID for each
+conflict in the file.
+
+This is done by discarding the common ancestor version in the
+diff3-style, and re-ordering the two sides of the conflict, in
+alphabetic order.
+
+Using this technique a conflict that looks as follows when for example
+'master' was merged into a topic branch:
+
+    <<<<<<< HEAD
+    foo
+    =======
+    bar
+    >>>>>>> master
+
+and the opposite way when the topic branch is merged into 'master':
+
+    <<<<<<< HEAD
+    bar
+    =======
+    foo
+    >>>>>>> topic
+
+can be recognized as the same conflict, and can automatically be
+re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would
+be calculated from 'bar<NUL>foo<NUL>' in both cases.
+
+If there are multiple conflicts in one file, they are all appended to
+one another, both in the 'preimage' file as well as in the conflict
+ID.
+
+This is currently implemented by simply scanning through the file and
+looking for conflict markers.
diff --git a/rerere.c b/rerere.c
index af5e6179a9..a02a38e072 100644
--- a/rerere.c
+++ b/rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * and NUL concatenated together.
  *
  * Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
-- 
2.17.0.588.g4d217cdf8e.dirty


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (2 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24  9:47   ` Junio C Hamano
  2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict in a file, but
commits the file with the conflict markers, and later the file ends up
in a state in which rerere can't handle it, subsequent rerere
operations that are interested in that path, such as 'rerere clear' or
'rerere forget <path>' will fail, or even worse in the case of 'rerere
clear' segfault.

Such states include nested conflicts, or an extra conflict marker that
doesn't have any match.

This is because the first 'git rerere' when there was only one
conflict in the file leaves an entry in the MERGE_RR file behind.  The
next 'git rerere' will then pick the rerere ID for that file up, and
not assign a new ID as it can't successfully calculate one.  It will
however still try to do the rerere operation, because of the existing
ID.  As the handle_file function fails, it will remove the 'preimage'
for the ID in the process, while leaving the ID in the MERGE_RR file.

Now when 'rerere clear' for example is run, it will segfault in
'has_rerere_resolution', because status is NULL.

To fix this, remove the rerere ID from the MERGE_RR file in case we
can't handle it, and remove the folder for the ID.  Removing it
unconditionally is fine here, because if the user would have resolved
the conflict and ran rerere, the entry would no longer be in the
MERGE_RR file, so we wouldn't have this problem in the first place,
while if the conflict was not resolved, the only thing that's left in
the folder is the 'preimage', which by itself will be regenerated by
git if necessary, so the user won't loose any work.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

I realize the test here may not be as complete as we would want it to
be.  But I first wanted to get some feedback on the approach, before
spending too much time on a proper test (I did test it manually, and
the test does show that the original problem is fixed, but it probably
deserves some cleanup).

 rerere.c          | 12 +++++++-----
 t/t4200-rerere.sh | 25 +++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/rerere.c b/rerere.c
index a02a38e072..49ace8e108 100644
--- a/rerere.c
+++ b/rerere.c
@@ -824,10 +824,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret;
-
-		if (string_list_has_string(rr, path))
-			continue;
+		int ret, has_string;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -835,7 +832,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		if (ret < 1)
+		has_string = string_list_has_string(rr, path);
+		if (ret < 0 && has_string) {
+			remove_variant(string_list_lookup(rr, path)->util);
+			string_list_remove(rr, path, 1);
+		}
+		if (ret < 1 || has_string)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index eaf18c81cb..27f8afc0b4 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -580,4 +580,29 @@ test_expect_success 'multiple identical conflicts' '
 	count_pre_post 0 0
 '
 
+test_expect_success 'rerere with extra conflict markers keeps working' '
+	git reset --hard &&
+
+	git checkout -b branch-1 master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+	echo "baz" >test &&
+	git add test &&
+	git commit -q -m three &&
+
+	git reset --hard &&
+	git checkout -b branch-2 master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge branch-1~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge branch-1 &&
+
+	git rerere clear
+'
+
 test_done
-- 
2.17.0.588.g4d217cdf8e.dirty


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (3 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24 10:02   ` Junio C Hamano
  2018-05-20 21:12 ` [RFC/PATCH 6/7] rerere: factor out handle_conflict function Thomas Gummerer
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We currently return the exact number of conflict hunks a certain path
has from the 'handle_paths' function.  However all of its callers only
care whether there are conflicts or not or if there is an error.
Return only that information, and document that only that information
is returned.  This will simplify the code in the subsequent steps.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 49ace8e108..f3e658e374 100644
--- a/rerere.c
+++ b/rerere.c
@@ -393,12 +393,13 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * one side of the conflict, NUL, the other side of the conflict,
  * and NUL concatenated together.
  *
- * Return the number of conflict hunks found.
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	enum {
 		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_CONTEXT;
@@ -426,7 +427,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			hunk_no++;
+			has_conflicts = 1;
 			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
@@ -462,7 +463,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 		git_SHA1_Final(sha1, &ctx);
 	if (hunk != RR_CONTEXT)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -471,7 +472,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
  */
 static int handle_file(const char *path, unsigned char *sha1, const char *output)
 {
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	struct rerere_io_file io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -491,7 +492,7 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 		}
 	}
 
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 
 	fclose(io.input);
 	if (io.io.wrerror)
@@ -500,14 +501,14 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	if (io.io.output && fclose(io.io.output))
 		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
 
-	if (hunk_no < 0) {
+	if (has_conflicts < 0) {
 		if (output)
 			unlink_or_warn(output);
 		return error(_("Could not parse conflict hunks in %s"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -955,7 +956,7 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	mmfile_t mmfile[3] = {{NULL}};
 	mmbuffer_t result = {NULL, 0};
 	const struct cache_entry *ce;
-	int pos, len, i, hunk_no;
+	int pos, len, i, has_conflicts;
 	struct rerere_io_mem io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -1009,11 +1010,11 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	 * Grab the conflict ID and optionally write the original
 	 * contents with conflict markers out.
 	 */
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 	strbuf_release(&io.input);
 	if (io.io.output)
 		fclose(io.io.output);
-	return hunk_no;
+	return has_conflicts;
 }
 
 static int rerere_forget_one_path(const char *path, struct string_list *rr)
-- 
2.17.0.588.g4d217cdf8e.dirty


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PATCH 6/7] rerere: factor out handle_conflict function
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (4 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
  7 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Factor out the handle_conflict function, which handles a single
conflict in a path.  This is a preparation for the next step, where
this function will be re-used.  No functional changes intended.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 143 +++++++++++++++++++++++++------------------------------
 1 file changed, 65 insertions(+), 78 deletions(-)

diff --git a/rerere.c b/rerere.c
index f3e658e374..f3cfd1c09b 100644
--- a/rerere.c
+++ b/rerere.c
@@ -302,38 +302,6 @@ static void rerere_io_putstr(const char *str, struct rerere_io *io)
 		ferr_puts(str, io->output, &io->wrerror);
 }
 
-/*
- * Write a conflict marker to io->output (if defined).
- */
-static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
-{
-	char buf[64];
-
-	while (size) {
-		if (size <= sizeof(buf) - 2) {
-			memset(buf, ch, size);
-			buf[size] = '\n';
-			buf[size + 1] = '\0';
-			size = 0;
-		} else {
-			int sz = sizeof(buf) - 1;
-
-			/*
-			 * Make sure we will not write everything out
-			 * in this round by leaving at least 1 byte
-			 * for the next round, giving the next round
-			 * a chance to add the terminating LF.  Yuck.
-			 */
-			if (size <= sz)
-				sz -= (sz - size) + 1;
-			memset(buf, ch, sz);
-			buf[sz] = '\0';
-			size -= sz;
-		}
-		rerere_io_putstr(buf, io);
-	}
-}
-
 static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
 {
 	if (io->output)
@@ -384,37 +352,25 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-/*
- * Read contents a file with conflicts, normalize the conflicts
- * by (1) discarding the common ancestor version in diff3-style,
- * (2) reordering our side and their side so that whichever sorts
- * alphabetically earlier comes before the other one, while
- * computing the "conflict ID", which is just an SHA-1 hash of
- * one side of the conflict, NUL, the other side of the conflict,
- * and NUL concatenated together.
- *
- * Return 1 if conflict hunks are found, 0 if there are no conflict
- * hunks and -1 if an error occured.
- */
-static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
+{
+	strbuf_addchars(buf, ch, size);
+	strbuf_addch(buf, '\n');
+}
+
+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
+			   int marker_size, git_SHA_CTX *ctx)
 {
-	git_SHA_CTX ctx;
-	int has_conflicts = 0;
 	enum {
-		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
-	} hunk = RR_CONTEXT;
+		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
+	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
 	struct strbuf buf = STRBUF_INIT;
-
-	if (sha1)
-		git_SHA1_Init(&ctx);
-
+	int has_conflicts = 1;
 	while (!io->getline(&buf, io)) {
-		if (is_cmarker(buf.buf, '<', marker_size)) {
-			if (hunk != RR_CONTEXT)
-				goto bad;
-			hunk = RR_SIDE_1;
-		} else if (is_cmarker(buf.buf, '|', marker_size)) {
+		if (is_cmarker(buf.buf, '<', marker_size))
+			goto bad;
+		else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				goto bad;
 			hunk = RR_ORIGINAL;
@@ -427,42 +383,73 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			has_conflicts = 1;
-			hunk = RR_CONTEXT;
-			rerere_io_putconflict('<', marker_size, io);
-			rerere_io_putmem(one.buf, one.len, io);
-			rerere_io_putconflict('=', marker_size, io);
-			rerere_io_putmem(two.buf, two.len, io);
-			rerere_io_putconflict('>', marker_size, io);
-			if (sha1) {
-				git_SHA1_Update(&ctx, one.buf ? one.buf : "",
+			rerere_strbuf_putconflict(out, '<', marker_size);
+			strbuf_addbuf(out, &one);
+			rerere_strbuf_putconflict(out, '=', marker_size);
+			strbuf_addbuf(out, &two);
+			rerere_strbuf_putconflict(out, '>', marker_size);
+			if (ctx) {
+				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
-				git_SHA1_Update(&ctx, two.buf ? two.buf : "",
+				git_SHA1_Update(ctx, two.buf ? two.buf : "",
 					    two.len + 1);
 			}
-			strbuf_reset(&one);
-			strbuf_reset(&two);
+			goto out;
 		} else if (hunk == RR_SIDE_1)
 			strbuf_addbuf(&one, &buf);
 		else if (hunk == RR_ORIGINAL)
 			; /* discard */
 		else if (hunk == RR_SIDE_2)
 			strbuf_addbuf(&two, &buf);
-		else
-			rerere_io_putstr(buf.buf, io);
-		continue;
-	bad:
-		hunk = 99; /* force error exit */
-		break;
 	}
+bad:
+	has_conflicts = -1;
+out:
 	strbuf_release(&one);
 	strbuf_release(&two);
 	strbuf_release(&buf);
 
+	return has_conflicts;
+}
+
+/*
+ * Read contents a file with conflicts, normalize the conflicts
+ * by (1) discarding the common ancestor version in diff3-style,
+ * (2) reordering our side and their side so that whichever sorts
+ * alphabetically earlier comes before the other one, while
+ * computing the "conflict ID", which is just an SHA-1 hash of
+ * one side of the conflict, NUL, the other side of the conflict,
+ * and NUL concatenated together.
+ *
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
+ */
+static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+{
+	git_SHA_CTX ctx;
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf out = STRBUF_INIT;
+	int has_conflicts = 0;
+	if (sha1)
+		git_SHA1_Init(&ctx);
+
+	while (!io->getline(&buf, io)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			has_conflicts = handle_conflict(&out, io, marker_size,
+							    sha1 ? &ctx : NULL);
+			if (has_conflicts < 0)
+				break;
+			rerere_io_putmem(out.buf, out.len, io);
+			strbuf_reset(&out);
+		} else
+			rerere_io_putstr(buf.buf, io);
+	}
+	strbuf_release(&buf);
+	strbuf_release(&out);
+
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-	if (hunk != RR_CONTEXT)
-		return -1;
+
 	return has_conflicts;
 }
 
-- 
2.17.0.588.g4d217cdf8e.dirty


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (5 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 6/7] rerere: factor out handle_conflict function Thomas Gummerer
@ 2018-05-20 21:12 ` Thomas Gummerer
  2018-05-24 10:21   ` Junio C Hamano
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
  7 siblings, 1 reply; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-20 21:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently rerere can't handle nested conflicts and will error out when
it encounters such conflicts.  Do that by recursively calling the
'handle_conflict' function to normalize the conflict.

The conflict ID calculation here deserves some explanation:

As we are using the same handle_conflict function, the nested conflict
is normalized the same way as for non-nested conflicts, which means
the ancestor in the diff3 case is stripped out, and the parts of the
conflict are ordered alphabetically.

The conflict ID is however is only calculated in the top level
handle_conflict call, so it will include the markers that 'rerere'
adds to the output.  e.g. say there's the following conflict:

    <<<<<<< HEAD
    1
    =======
    <<<<<<< HEAD
    3
    =======
    2
    >>>>>>> branch-2
    >>>>>>> branch-3~

it would be reordered as follows in the preimage:

    <<<<<<<
    1
    =======
    <<<<<<<
    2
    =======
    3
    >>>>>>>
    >>>>>>>

and the conflict ID would be calculated as

    sha1(1<NUL><<<<<<<
    2
    =======
    3
    >>>>>>><NUL>)

Stripping out vs. leaving the conflict markers in place should have no
practical impact, but it simplifies the implementation.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

No automated test for this yet.  As mentioned in the cover letter as
well, I'm not sure if this is common enough for us to actually
consider this use case.  I don't know how nested conflicts could
actually be created apart from committing a file with conflict
markers, but maybe I'm just lacking imagination, so if someone has an
example for that I would be very grateful :)  If we decide to do this,
it probably also merits a mention in
Documentation/technical/rerere.txt.

 rerere.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/rerere.c b/rerere.c
index f3cfd1c09b..45e2bd6ff1 100644
--- a/rerere.c
+++ b/rerere.c
@@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
 	int has_conflicts = 1;
 	while (!io->getline(&buf, io)) {
-		if (is_cmarker(buf.buf, '<', marker_size))
-			goto bad;
-		else if (is_cmarker(buf.buf, '|', marker_size)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
+				goto bad;
+			if (hunk == RR_SIDE_1)
+				strbuf_addbuf(&one, &conflict);
+			else
+				strbuf_addbuf(&two, &conflict);
+			strbuf_release(&conflict);
+		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				goto bad;
 			hunk = RR_ORIGINAL;
-- 
2.17.0.588.g4d217cdf8e.dirty


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 1/7] rerere: unify error message when read_cache fails
  2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
@ 2018-05-21 19:00   ` Stefan Beller
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Beller @ 2018-05-21 19:00 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git, Junio C Hamano

On Sun, May 20, 2018 at 2:12 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> We have multiple different variants of the error message we show to
> the user if 'read_cache' fails.  The "Could not read index" variant we
> are using in 'rerere.c' is currently not used anywhere in translated
> form.
>
> As a subsequent commit will mark all output that comes from 'rerere.c'
> for translation, make the life of the translators a little bit easier
> by using a string that is used elsewhere, and marked for translation
> there, and thus most likely already translated.
>
> "index file corrupt" seems to be the most common error message we show
> when 'read_cache' fails, so use that here as well.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>
> "index file corrupt" is also what Stefan chose for his series unifying
> these error messages (and 'die'ing, which I'm not sure is the right
> thing to do here as also mentioned in my reply to [1]).  I'm happy to
> drop this if we decide to go with that series.

Acked-by: <me>

I'd happily have this patch instead of the one in my series.

I was about to ask for translation, but the commit message hints
at a follow up patch marking this for translation, so I'll read on.

Stefan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 2/7] rerere: mark strings for translation
  2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
@ 2018-05-24  7:20   ` Junio C Hamano
  0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2018-05-24  7:20 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

>  		if (write_in_full(out_fd, buf.buf, buf.len) < 0)
> -			die("unable to write rerere record");
> +			die(_("unable to write rerere record"));

As we'd be adding these new strings to the .po file, perhaps we
would want to downcase the first letter in the message to match the
convention?

> ...
> -		return error_errno("Could not open %s", path);
> +		return error_errno(_("Could not open %s"), path);
> ...
> -		error("There were errors while writing %s (%s)",
> +		error(_("There were errors while writing %s (%s)"),
> ...
> -		io.io.wrerror = error_errno("Failed to flush %s", path);
> +		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
> ...
> -		return error("Could not parse conflict hunks in %s", path);
> +		return error(_("Could not parse conflict hunks in %s"), path);

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 3/7] rerere: add some documentation
  2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
@ 2018-05-24  9:20   ` Junio C Hamano
  2018-06-03 11:41     ` Thomas Gummerer
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2018-05-24  9:20 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> +Conflict normalization
> +----------------------
> +
> +To try and re-do a conflict resolution, even when different merge
> +strategies are used, 'rerere' computes a conflict ID for each
> +conflict in the file.
> +
> +This is done by discarding the common ancestor version in the
> +diff3-style, and re-ordering the two sides of the conflict, in
> +alphabetic order.

s/discarding.*-style/normalising the conflicted section to 'merge' style/

The motivation behind the normalization should probably be given
upfront in the first paragraph.  It is to ensure the recorded
resolutions can be looked up from the rerere database for
application, even when branches are merged in different order.  I am
not sure what you meant by even when different merge stratagies are
used; I'd drop that if I were writing the paragraph.

> +Using this technique a conflict that looks as follows when for example
> +'master' was merged into a topic branch:
> +
> +    <<<<<<< HEAD
> +    foo
> +    =======
> +    bar
> +    >>>>>>> master
> +
> +and the opposite way when the topic branch is merged into 'master':
> +
> +    <<<<<<< HEAD
> +    bar
> +    =======
> +    foo
> +    >>>>>>> topic
> +
> +can be recognized as the same conflict, and can automatically be
> +re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would
> +be calculated from 'bar<NUL>foo<NUL>' in both cases.

You earlier talked about normalizing and reordering, but did not
talk about "concatenate both with NUL in between and hash", so the
explanation in the last two lines are not quite understandable by
mere mortals, even though I know which part of the code you are
referring to.  When you talk about hasing, you may want to make sure
the readers understand that the branch label on <<< and >>> lines
are ignored.

> +If there are multiple conflicts in one file, they are all appended to
> +one another, both in the 'preimage' file as well as in the conflict
> +ID.

In case it was not clear (and I do not think it is to those who only
read your description and haven't thought things through
themselves), this concatenation is why the normalization by
reordering is helpful.  Imagine that a common ancestor had a file
with a line with string "A" on it (I'll call such a line "line A"
for brevity in the following) in its early part, and line X in its
late part.  And then you fork four branches that do these things:

    - AB: changes A to B
    - AC: changes A to C
    - XY: changes X to Y
    - XZ: changes X to Z

Now, forking a branch ABAC off of branch AB and then merging AC into
it, and forking a branch ACAB off of branch AC and then merging AB
into it, would yield the conflict in a different order.  The former
would say "A became B or C, what now?" while the latter would say "A
became C or B, what now?"

But the act of merging AC into ABAC and resolving the conflict to
leave line D means that you declare: 

    After examining what branches AB and AC did, I believe that
    making line A into line D is the best thing to do that is
    compatible with what AB and AC wanted to do.

So the conflict we would see when merging AB into ACAB should be
resolved the same way---it is the resolution that is in line with
that declaration.

Imagine that similarly you had previously forked branch XYXZ from
XY, merged XZ into it, and resolved "X became Y or Z" into "X became
W".

Now, if you forked a branch ABXY from AB and then merged XY, then
ABXY would have line B in its early part and line Y in its later
part.  Such a merge would be quite clean.  We can construct
4 combinations using these four branches ((AB, AC) x (XY, XZ)).

Merging ABXY and ACXZ would make "an early A became B or C, a late X
became Y or Z" conflict, while merging ACXY and ABXZ would make "an
early A became C or B, a late X became Y or Z".  We can see there
are 4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").

By sorting, we can give the conflict its canonical name, namely, "an
early part became B or C, a late part becames X or Y", and whenever
any of these four patterns appear, we can get to the same conflict
and resolution that we saw earlier.  Without the sorting, we will
have to somehow find a previous resolution from combinatorial
explosion ;-)

These days post ec34a8b1 ("Merge branch 'jc/rerere-multi'",
2016-05-23), the conflict ID can safely collide, i.e. hash
collisions that drops completely different conflicts and their
resolutions into the same .git/rr-cache/$id directory will not
interfere with proper operation of the system, thanks to that
rerere-multi topic that allows us to store multiple preimage
conflicts that happens to share the same conflict ID with their
corresponding postimage resolutions.

In theory, we *should* be able to stub out the SHA-1 computation and
give every conflict the same ID and rerere should still operate
correctly, even though I haven't tried it yet myself.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved
  2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
@ 2018-05-24  9:47   ` Junio C Hamano
  2018-05-24 18:54     ` Thomas Gummerer
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2018-05-24  9:47 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> To fix this, remove the rerere ID from the MERGE_RR file in case we
> can't handle it, and remove the folder for the ID.  Removing it
> unconditionally is fine here, because if the user would have resolved
> the conflict and ran rerere, the entry would no longer be in the
> MERGE_RR file, so we wouldn't have this problem in the first place,

I do not think removing the directory and losing _other_ conflicts
and their resolutions, if they exist, is fine in the modern world
order post rerere-multi update in 2016.  Well, it is just as safe as
"rm -rf .git/rr-cache/" in the sense that it won't make Git start
segfaulting, but it is not fine as it is discarding information of
conflicts that has nothing to do with the current one that is
problematic.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not
  2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-05-24 10:02   ` Junio C Hamano
  0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2018-05-24 10:02 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> We currently return the exact number of conflict hunks a certain path
> has from the 'handle_paths' function.  However all of its callers only
> care whether there are conflicts or not or if there is an error.
> Return only that information, and document that only that information
> is returned.  This will simplify the code in the subsequent steps.

Makes sense.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts
  2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-05-24 10:21   ` Junio C Hamano
  2018-05-24 19:07     ` Thomas Gummerer
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2018-05-24 10:21 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> No automated test for this yet.  As mentioned in the cover letter as
> well, I'm not sure if this is common enough for us to actually
> consider this use case.  I don't know how nested conflicts could
> actually be created apart from committing a file with conflict
> markers,

Recursive merge whose inner merge leaves conflict markers?

One thing that makes me wonder is that the conflict markers may not
"nest" so nicely.  For example, if inner merges had two conflicts
like these:

<<<
 <<<<<
 A
 =====
 B
 >>>>>
===
 <<<<<
 A
 =====
 C
 >>>>>
>>>

where one side made something to A or B, while the other side made
something (or something else) to A or C, I would imagine that the
outer conflict could be "optimized" to produce this instead:


 <<<<<
 A
 =====
<<<
 B
===
 C
>>>
 >>>>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved
  2018-05-24  9:47   ` Junio C Hamano
@ 2018-05-24 18:54     ` Thomas Gummerer
  2018-05-25  1:20       ` Junio C Hamano
  0 siblings, 1 reply; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-24 18:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 05/24, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > To fix this, remove the rerere ID from the MERGE_RR file in case we
> > can't handle it, and remove the folder for the ID.  Removing it
> > unconditionally is fine here, because if the user would have resolved
> > the conflict and ran rerere, the entry would no longer be in the
> > MERGE_RR file, so we wouldn't have this problem in the first place,
> 
> I do not think removing the directory and losing _other_ conflicts
> and their resolutions, if they exist, is fine in the modern world
> order post rerere-multi update in 2016.  Well, it is just as safe as
> "rm -rf .git/rr-cache/" in the sense that it won't make Git start
> segfaulting, but it is not fine as it is discarding information of
> conflicts that has nothing to do with the current one that is
> problematic.

Sorry I botched the description here, and failed to describe what the
code is actually doing.  We're actually only removing the variant in
the MERGE_RR file, whose path we are now no longer able to handle.
And I think that's fine to do, because if it is still in the MERGE_RR
file the conflict hasn't been resolved yet, afaiu.

Will update the commit message.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts
  2018-05-24 10:21   ` Junio C Hamano
@ 2018-05-24 19:07     ` Thomas Gummerer
  0 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-05-24 19:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 05/24, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > No automated test for this yet.  As mentioned in the cover letter as
> > well, I'm not sure if this is common enough for us to actually
> > consider this use case.  I don't know how nested conflicts could
> > actually be created apart from committing a file with conflict
> > markers,
> 
> Recursive merge whose inner merge leaves conflict markers?

Thanks, lots of stuff in Git I still have to learn :)

> One thing that makes me wonder is that the conflict markers may not
> "nest" so nicely.  For example, if inner merges had two conflicts
> like these:
> 
> <<<
>  <<<<<
>  A
>  =====
>  B
>  >>>>>
> ===
>  <<<<<
>  A
>  =====
>  C
>  >>>>>
> >>>
> 
> where one side made something to A or B, while the other side made
> something (or something else) to A or C, I would imagine that the
> outer conflict could be "optimized" to produce this instead:
> 
> 
>  <<<<<
>  A
>  =====
> <<<
>  B
> ===
>  C
> >>>
>  >>>>>

Yeah, I do think that would be a nicer merge conflict to solve.  But I
think that should be done in a separate patch series if we decide to
do so.  When this one lands rerere will be able to handle the conflict
either way :)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved
  2018-05-24 18:54     ` Thomas Gummerer
@ 2018-05-25  1:20       ` Junio C Hamano
  0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2018-05-25  1:20 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Sorry I botched the description here, and failed to describe what the
> code is actually doing.  We're actually only removing the variant in
> the MERGE_RR file, whose path we are now no longer able to handle.

Oh, that's absolutely fine, then.  Thanks for a prompt update.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [RFC/PATCH 3/7] rerere: add some documentation
  2018-05-24  9:20   ` Junio C Hamano
@ 2018-06-03 11:41     ` Thomas Gummerer
  0 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-03 11:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 05/24, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > +Conflict normalization
> > +----------------------
> > +
> > +To try and re-do a conflict resolution, even when different merge
> > +strategies are used, 'rerere' computes a conflict ID for each
> > +conflict in the file.
> > +
> > +This is done by discarding the common ancestor version in the
> > +diff3-style, and re-ordering the two sides of the conflict, in
> > +alphabetic order.
> 
> s/discarding.*-style/normalising the conflicted section to 'merge' style/
> 
> The motivation behind the normalization should probably be given
> upfront in the first paragraph.  It is to ensure the recorded
> resolutions can be looked up from the rerere database for
> application, even when branches are merged in different order.  I am
> not sure what you meant by even when different merge stratagies are
> used; I'd drop that if I were writing the paragraph.

What I meant was when different conflict styles are used, and when the
branches are merged in different orders.  But merge strategies is
obviously not a good word for that.  Will rephrase this.

> > +Using this technique a conflict that looks as follows when for example
> > +'master' was merged into a topic branch:
> > +
> > +    <<<<<<< HEAD
> > +    foo
> > +    =======
> > +    bar
> > +    >>>>>>> master
> > +
> > +and the opposite way when the topic branch is merged into 'master':
> > +
> > +    <<<<<<< HEAD
> > +    bar
> > +    =======
> > +    foo
> > +    >>>>>>> topic
> > +
> > +can be recognized as the same conflict, and can automatically be
> > +re-resolved by 'rerere', as the SHA-1 sum of the two conflicts would
> > +be calculated from 'bar<NUL>foo<NUL>' in both cases.
> 
> You earlier talked about normalizing and reordering, but did not
> talk about "concatenate both with NUL in between and hash", so the
> explanation in the last two lines are not quite understandable by
> mere mortals, even though I know which part of the code you are
> referring to.  When you talk about hasing, you may want to make sure
> the readers understand that the branch label on <<< and >>> lines
> are ignored.
> 
> > +If there are multiple conflicts in one file, they are all appended to
> > +one another, both in the 'preimage' file as well as in the conflict
> > +ID.
> 
> In case it was not clear (and I do not think it is to those who only
> read your description and haven't thought things through
> themselves), this concatenation is why the normalization by
> reordering is helpful.  Imagine that a common ancestor had a file
> with a line with string "A" on it (I'll call such a line "line A"
> for brevity in the following) in its early part, and line X in its
> late part.  And then you fork four branches that do these things:
> 
>     - AB: changes A to B
>     - AC: changes A to C
>     - XY: changes X to Y
>     - XZ: changes X to Z
> 
> Now, forking a branch ABAC off of branch AB and then merging AC into
> it, and forking a branch ACAB off of branch AC and then merging AB
> into it, would yield the conflict in a different order.  The former
> would say "A became B or C, what now?" while the latter would say "A
> became C or B, what now?"
> 
> But the act of merging AC into ABAC and resolving the conflict to
> leave line D means that you declare: 
> 
>     After examining what branches AB and AC did, I believe that
>     making line A into line D is the best thing to do that is
>     compatible with what AB and AC wanted to do.
> 
> So the conflict we would see when merging AB into ACAB should be
> resolved the same way---it is the resolution that is in line with
> that declaration.
> 
> Imagine that similarly you had previously forked branch XYXZ from
> XY, merged XZ into it, and resolved "X became Y or Z" into "X became
> W".
> 
> Now, if you forked a branch ABXY from AB and then merged XY, then
> ABXY would have line B in its early part and line Y in its later
> part.  Such a merge would be quite clean.  We can construct
> 4 combinations using these four branches ((AB, AC) x (XY, XZ)).
> 
> Merging ABXY and ACXZ would make "an early A became B or C, a late X
> became Y or Z" conflict, while merging ACXY and ABXZ would make "an
> early A became C or B, a late X became Y or Z".  We can see there
> are 4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
> 
> By sorting, we can give the conflict its canonical name, namely, "an
> early part became B or C, a late part becames X or Y", and whenever
> any of these four patterns appear, we can get to the same conflict
> and resolution that we saw earlier.  Without the sorting, we will
> have to somehow find a previous resolution from combinatorial
> explosion ;-)

Thanks for the in depth explanation!  I'll incorporate this into the
document.

> These days post ec34a8b1 ("Merge branch 'jc/rerere-multi'",
> 2016-05-23), the conflict ID can safely collide, i.e. hash
> collisions that drops completely different conflicts and their
> resolutions into the same .git/rr-cache/$id directory will not
> interfere with proper operation of the system, thanks to that
> rerere-multi topic that allows us to store multiple preimage
> conflicts that happens to share the same conflict ID with their
> corresponding postimage resolutions.
> 
> In theory, we *should* be able to stub out the SHA-1 computation and
> give every conflict the same ID and rerere should still operate
> correctly, even though I haven't tried it yet myself.

I gave this a quick try, and the test suite seems to pass with the
hash computation giving the same ID to all conflicts.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 00/10] rerere: handle nested conflicts
  2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
                   ` (6 preceding siblings ...)
  2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-06-05 21:52 ` " Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
                     ` (11 more replies)
  7 siblings, 12 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

The previous round was at
<20180520211210.1248-1-t.gummerer@gmail.com>.

Thanks Junio for the comments on the previous round.

Changes since v2:
 - lowercase the first letter in some error/warning messages before
   marking them for translation
 - wrap paths in output in single quotes, for consistency, and to make
   some of the messages the same as ones that are already translated
 - mark messages in builtin/rerere.c for translation as well, which I
   had previously forgotten.
 - expanded the technical documentation on rerere.  The entire
   document is basically rewritten.
 - changed the test in 6/10 to just fake a conflict marker inside of
   one of the hunks instead of using an inner conflict created by a
   merge.  This is to make sure the codepath is still hit after we
   handle inner conflicts properly.
 - added tests for handling inner conflict markers
 - added one commit to recalculate the conflict ID when an unresolved
   conflict is committed, and the subsequent operation conflicts again
   in the same file.  More explanation in the commit message of that
   commit.

range-diff below.  A few commits changed enough for range-diff
to give up showing the differences in those, they are probably best
reviewed as the whole patch anyway:

1:  901b638400 ! 1:  2825342cc2 rerere: unify error message when read_cache fails
    @@ -1,6 +1,6 @@
     Author: Thomas Gummerer <t.gummerer@gmail.com>
     
    -    rerere: unify error message when read_cache fails
    +    rerere: unify error messages when read_cache fails
     
         We have multiple different variants of the error message we show to
         the user if 'read_cache' fails.  The "Could not read index" variant we
-:  ---------- > 2:  d1500028aa rerere: lowercase error messages
-:  ---------- > 3:  ed3601ee71 rerere: wrap paths in output in sq
2:  c48ffededd ! 4:  6ead84a199 rerere: mark strings for translation
    @@ -9,6 +9,28 @@
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
    +diff --git a/builtin/rerere.c b/builtin/rerere.c
    +--- a/builtin/rerere.c
    ++++ b/builtin/rerere.c
    +@@
    + 	if (!strcmp(argv[0], "forget")) {
    + 		struct pathspec pathspec;
    + 		if (argc < 2)
    +-			warning("'git rerere forget' without paths is deprecated");
    ++			warning(_("'git rerere forget' without paths is deprecated"));
    + 		parse_pathspec(&pathspec, 0, PATHSPEC_PREFER_CWD,
    + 			       prefix, argv + 1);
    + 		return rerere_forget(&pathspec);
    +@@
    + 			const char *path = merge_rr.items[i].string;
    + 			const struct rerere_id *id = merge_rr.items[i].util;
    + 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
    +-				die("unable to generate diff for '%s'", rerere_path(id, NULL));
    ++				die(_("unable to generate diff for '%s'"), rerere_path(id, NULL));
    + 		}
    + 	} else
    + 		usage_with_options(rerere_usage, options);
    +
     diff --git a/rerere.c b/rerere.c
     --- a/rerere.c
     +++ b/rerere.c
    @@ -53,14 +75,14 @@
      	io.input = fopen(path, "r");
      	io.io.wrerror = 0;
      	if (!io.input)
    --		return error_errno("Could not open %s", path);
    -+		return error_errno(_("Could not open %s"), path);
    +-		return error_errno("could not open '%s'", path);
    ++		return error_errno(_("could not open '%s'"), path);
      
      	if (output) {
      		io.io.output = fopen(output, "w");
      		if (!io.io.output) {
    --			error_errno("Could not write %s", output);
    -+			error_errno(_("Could not write %s"), output);
    +-			error_errno("could not write '%s'", output);
    ++			error_errno(_("could not write '%s'"), output);
      			fclose(io.input);
      			return -1;
      		}
    @@ -68,18 +90,18 @@
      
      	fclose(io.input);
      	if (io.io.wrerror)
    --		error("There were errors while writing %s (%s)",
    -+		error(_("There were errors while writing %s (%s)"),
    +-		error("there were errors while writing '%s' (%s)",
    ++		error(_("there were errors while writing '%s' (%s)"),
      		      path, strerror(io.io.wrerror));
      	if (io.io.output && fclose(io.io.output))
    --		io.io.wrerror = error_errno("Failed to flush %s", path);
    -+		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
    +-		io.io.wrerror = error_errno("failed to flush '%s'", path);
    ++		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
      
      	if (hunk_no < 0) {
      		if (output)
      			unlink_or_warn(output);
    --		return error("Could not parse conflict hunks in %s", path);
    -+		return error(_("Could not parse conflict hunks in %s"), path);
    +-		return error("could not parse conflict hunks in '%s'", path);
    ++		return error(_("could not parse conflict hunks in '%s'"), path);
      	}
      	if (io.io.wrerror)
      		return -1;
    @@ -105,21 +127,21 @@
      	 * Mark that "postimage" was used to help gc.
      	 */
      	if (utime(rerere_path(id, "postimage"), NULL) < 0)
    --		warning_errno("failed utime() on %s",
    -+		warning_errno(_("failed utime() on %s"),
    +-		warning_errno("failed utime() on '%s'",
    ++		warning_errno(_("failed utime() on '%s'"),
      			      rerere_path(id, "postimage"));
      
      	/* Update "path" with the resolution */
      	f = fopen(path, "w");
      	if (!f)
    --		return error_errno("Could not open %s", path);
    -+		return error_errno(_("Could not open %s"), path);
    +-		return error_errno("could not open '%s'", path);
    ++		return error_errno(_("could not open '%s'"), path);
      	if (fwrite(result.ptr, result.size, 1, f) != 1)
    --		error_errno("Could not write %s", path);
    -+		error_errno(_("Could not write %s"), path);
    +-		error_errno("could not write '%s'", path);
    ++		error_errno(_("could not write '%s'"), path);
      	if (fclose(f))
    --		return error_errno("Writing %s failed", path);
    -+		return error_errno(_("Writing %s failed"), path);
    +-		return error_errno("writing '%s' failed", path);
    ++		return error_errno(_("writing '%s' failed"), path);
      
      out:
      	free(cur.ptr);
    @@ -134,8 +156,8 @@
      
      	if (write_locked_index(&the_index, &index_lock,
      			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
    --		die("Unable to write new index file");
    -+		die(_("Unable to write new index file"));
    +-		die("unable to write new index file");
    ++		die(_("unable to write new index file"));
      }
      
      static void remove_variant(struct rerere_id *id)
    @@ -179,8 +201,8 @@
      		return rr_cache_exists;
      
      	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
    --		die("Could not create directory %s", git_path_rr_cache());
    -+		die(_("Could not create directory %s"), git_path_rr_cache());
    +-		die("could not create directory '%s'", git_path_rr_cache());
    ++		die(_("could not create directory '%s'"), git_path_rr_cache());
      	return 1;
      }
      
    @@ -188,8 +210,8 @@
      	 */
      	ret = handle_cache(path, sha1, NULL);
      	if (ret < 1)
    --		return error("Could not parse conflict hunks in '%s'", path);
    -+		return error(_("Could not parse conflict hunks in '%s'"), path);
    +-		return error("could not parse conflict hunks in '%s'", path);
    ++		return error(_("could not parse conflict hunks in '%s'"), path);
      
      	/* Nuke the recorded resolution for the conflict */
      	id = new_rerere_id(sha1);
    @@ -214,11 +236,11 @@
      	filename = rerere_path(id, "postimage");
      	if (unlink(filename)) {
      		if (errno == ENOENT)
    --			error("no remembered resolution for %s", path);
    -+			error(_("no remembered resolution for %s"), path);
    +-			error("no remembered resolution for '%s'", path);
    ++			error(_("no remembered resolution for '%s'"), path);
      		else
    --			error_errno("cannot unlink %s", filename);
    -+			error_errno(_("cannot unlink %s"), filename);
    +-			error_errno("cannot unlink '%s'", filename);
    ++			error_errno(_("cannot unlink '%s'"), filename);
      		goto fail_exit;
      	}
      
    @@ -235,8 +257,8 @@
      	item = string_list_insert(rr, path);
      	free_rerere_id(item);
      	item->util = id;
    --	fprintf(stderr, "Forgot resolution for %s\n", path);
    -+	fprintf_ln(stderr, _("Forgot resolution for %s"), path);
    +-	fprintf(stderr, "Forgot resolution for '%s'\n", path);
    ++	fprintf(stderr, _("Forgot resolution for '%s'\n"), path);
      	return 0;
      
      fail_exit:
3:  e29449406f < -:  ---------- rerere: add some documentation
-:  ---------- > 5:  caad276aca rerere: add some documentation
4:  3b41520b28 ! 6:  ad88a6b8a8 rerere: fix crash when conflict goes unresolved
    @@ -23,14 +23,18 @@
         Now when 'rerere clear' for example is run, it will segfault in
         'has_rerere_resolution', because status is NULL.
     
    -    To fix this, remove the rerere ID from the MERGE_RR file in case we
    -    can't handle it, and remove the folder for the ID.  Removing it
    -    unconditionally is fine here, because if the user would have resolved
    -    the conflict and ran rerere, the entry would no longer be in the
    -    MERGE_RR file, so we wouldn't have this problem in the first place,
    -    while if the conflict was not resolved, the only thing that's left in
    -    the folder is the 'preimage', which by itself will be regenerated by
    -    git if necessary, so the user won't loose any work.
    +    To fix this, remove the rerere ID from the MERGE_RR file in the case
    +    when we can't handle it, and remove the corresponding variant from
    +    .git/rr-cache/.  Removing it unconditionally is fine here, because if
    +    the user would have resolved the conflict and ran rerere, the entry
    +    would no longer be in the MERGE_RR file, so we wouldn't have this
    +    problem in the first place, while if the conflict was not resolved,
    +    the only thing that's left in the folder is the 'preimage', which by
    +    itself will be regenerated by git if necessary, so the user won't
    +    loose any work.
    +
    +    Note that other variants that have the same conflict ID will not be
    +    touched.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
    @@ -71,16 +75,13 @@
      	count_pre_post 0 0
      '
      
    -+test_expect_success 'rerere with extra conflict markers keeps working' '
    ++test_expect_success 'rerere with unexpected conflict markers does not crash' '
     +	git reset --hard &&
     +
     +	git checkout -b branch-1 master &&
     +	echo "bar" >test &&
     +	git add test &&
     +	git commit -q -m two &&
    -+	echo "baz" >test &&
    -+	git add test &&
    -+	git commit -q -m three &&
     +
     +	git reset --hard &&
     +	git checkout -b branch-2 master &&
    @@ -88,10 +89,10 @@
     +	git add test &&
     +	git commit -q -a -m one &&
     +
    -+	test_must_fail git merge branch-1~ &&
    -+	git add test &&
    -+	git commit -q -m "will solve conflicts later" &&
     +	test_must_fail git merge branch-1 &&
    ++	sed "s/bar/>>>>>>> a/" >test.tmp <test &&
    ++	mv test.tmp test &&
    ++	git rerere &&
     +
     +	git rerere clear
     +'
5:  411a4ee37e ! 7:  15f9efcba6 rerere: only return whether a path has conflicts or not
    @@ -67,13 +67,13 @@
      	if (io.io.wrerror)
     @@
      	if (io.io.output && fclose(io.io.output))
    - 		io.io.wrerror = error_errno(_("Failed to flush %s"), path);
    + 		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
      
     -	if (hunk_no < 0) {
     +	if (has_conflicts < 0) {
      		if (output)
      			unlink_or_warn(output);
    - 		return error(_("Could not parse conflict hunks in %s"), path);
    + 		return error(_("could not parse conflict hunks in '%s'"), path);
      	}
      	if (io.io.wrerror)
      		return -1;
6:  fc9f715913 = 8:  1490efaad3 rerere: factor out handle_conflict function
7:  f7dea09a0a < -:  ---------- rerere: teach rerere to handle nested conflicts
-:  ---------- > 9:  6619650c42 rerere: teach rerere to handle nested conflicts
-:  ---------- > 10:  4b11dce7dd rerere: recalculate conflict ID when unresolved conflict is committed

Thomas Gummerer (10):
  rerere: unify error messages when read_cache fails
  rerere: lowercase error messages
  rerere: wrap paths in output in sq
  rerere: mark strings for translation
  rerere: add some documentation
  rerere: fix crash when conflict goes unresolved
  rerere: only return whether a path has conflicts or not
  rerere: factor out handle_conflict function
  rerere: teach rerere to handle nested conflicts
  rerere: recalculate conflict ID when unresolved conflict is committed

 Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
 builtin/rerere.c                   |   4 +-
 rerere.c                           | 246 ++++++++++++++---------------
 t/t4200-rerere.sh                  |  67 ++++++++
 4 files changed, 372 insertions(+), 127 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

-- 
2.18.0.rc1.242.g61856ae69


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 01/10] rerere: unify error messages when read_cache fails
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 02/10] rerere: lowercase error messages Thomas Gummerer
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We have multiple different variants of the error message we show to
the user if 'read_cache' fails.  The "Could not read index" variant we
are using in 'rerere.c' is currently not used anywhere in translated
form.

As a subsequent commit will mark all output that comes from 'rerere.c'
for translation, make the life of the translators a little bit easier
by using a string that is used elsewhere, and marked for translation
there, and thus most likely already translated.

"index file corrupt" seems to be the most common error message we show
when 'read_cache' fails, so use that here as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rerere.c b/rerere.c
index 18cae2d11c..4b4869662d 100644
--- a/rerere.c
+++ b/rerere.c
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -1104,7 +1104,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 02/10] rerere: lowercase error messages
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 03/10] rerere: wrap paths in output in sq Thomas Gummerer
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Documentation/CodingGuidelines mentions that error messages should be
lowercase.  Prior to marking them for translation follow that pattern
in rerere as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 4b4869662d..eca182023f 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("Could not write %s", output);
+			error_errno("could not write %s", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("There were errors while writing %s (%s)",
+		error("there were errors while writing %s (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("Failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush %s", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("Could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in %s", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -690,11 +690,11 @@ static int merge(const struct rerere_id *id, const char *path)
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("Could not write %s", path);
+		error_errno("could not write %s", path);
 	if (fclose(f))
-		return error_errno("Writing %s failed", path);
+		return error_errno("writing %s failed", path);
 
 out:
 	free(cur.ptr);
@@ -721,7 +721,7 @@ static void update_paths(struct string_list *update)
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("Unable to write new index file");
+		die("unable to write new index file");
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -879,7 +879,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("Could not create directory %s", git_path_rr_cache());
+		die("could not create directory %s", git_path_rr_cache());
 	return 1;
 }
 
@@ -1032,7 +1032,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("Could not parse conflict hunks in '%s'", path);
+		return error("could not parse conflict hunks in '%s'", path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 03/10] rerere: wrap paths in output in sq
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 02/10] rerere: lowercase error messages Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 04/10] rerere: mark strings for translation Thomas Gummerer
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

It looks like most paths in the output in the git codebase are wrapped
in single quotes.  Standardize on that in rerere as well.

Apart from being more consistent, this also makes some of the strings
match strings that are already translated in other parts of the
codebase, thus reducing the work for translators, when the strings are
marked for translation in a subsequent commit.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/rerere.c |  2 +-
 rerere.c         | 26 +++++++++++++-------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index 0bc40298c2..e0c67c98e9 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for %s", rerere_path(id, NULL));
+				die("unable to generate diff for '%s'", rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index eca182023f..0e5956a51c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write %s", output);
+			error_errno("could not write '%s'", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing %s (%s)",
+		error("there were errors while writing '%s' (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush '%s'", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in '%s'", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on %s",
+		warning_errno("failed utime() on '%s'",
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write %s", path);
+		error_errno("could not write '%s'", path);
 	if (fclose(f))
-		return error_errno("writing %s failed", path);
+		return error_errno("writing '%s' failed", path);
 
 out:
 	free(cur.ptr);
@@ -879,7 +879,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory %s", git_path_rr_cache());
+		die("could not create directory '%s'", git_path_rr_cache());
 	return 1;
 }
 
@@ -1068,9 +1068,9 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for %s", path);
+			error("no remembered resolution for '%s'", path);
 		else
-			error_errno("cannot unlink %s", filename);
+			error_errno("cannot unlink '%s'", filename);
 		goto fail_exit;
 	}
 
@@ -1089,7 +1089,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for %s\n", path);
+	fprintf(stderr, "Forgot resolution for '%s'\n", path);
 	return 0;
 
 fail_exit:
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 04/10] rerere: mark strings for translation
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (2 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 03/10] rerere: wrap paths in output in sq Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 05/10] rerere: add some documentation Thomas Gummerer
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

'git rerere' is considered a plumbing command and as such its output
should be translated.  Its functionality is also only enabled through
a config setting, so scripts really shouldn't rely on its output
either way.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/rerere.c |  4 +--
 rerere.c         | 68 ++++++++++++++++++++++++------------------------
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index e0c67c98e9..5ed941b91f 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -75,7 +75,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 	if (!strcmp(argv[0], "forget")) {
 		struct pathspec pathspec;
 		if (argc < 2)
-			warning("'git rerere forget' without paths is deprecated");
+			warning(_("'git rerere forget' without paths is deprecated"));
 		parse_pathspec(&pathspec, 0, PATHSPEC_PREFER_CWD,
 			       prefix, argv + 1);
 		return rerere_forget(&pathspec);
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for '%s'", rerere_path(id, NULL));
+				die(_("unable to generate diff for '%s'"), rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index 0e5956a51c..74ce422634 100644
--- a/rerere.c
+++ b/rerere.c
@@ -212,7 +212,7 @@ static void read_rr(struct string_list *rr)
 
 		/* There has to be the hash, tab, path and then NUL */
 		if (buf.len < 42 || get_sha1_hex(buf.buf, sha1))
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 
 		if (buf.buf[40] != '.') {
 			variant = 0;
@@ -221,10 +221,10 @@ static void read_rr(struct string_list *rr)
 			errno = 0;
 			variant = strtol(buf.buf + 41, &path, 10);
 			if (errno)
-				die("corrupt MERGE_RR");
+				die(_("corrupt MERGE_RR"));
 		}
 		if (*(path++) != '\t')
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 		buf.buf[40] = '\0';
 		id = new_rerere_id_hex(buf.buf);
 		id->variant = variant;
@@ -259,12 +259,12 @@ static int write_rr(struct string_list *rr, int out_fd)
 				    rr->items[i].string, 0);
 
 		if (write_in_full(out_fd, buf.buf, buf.len) < 0)
-			die("unable to write rerere record");
+			die(_("unable to write rerere record"));
 
 		strbuf_release(&buf);
 	}
 	if (commit_lock_file(&write_lock) != 0)
-		die("unable to write rerere record");
+		die(_("unable to write rerere record"));
 	return 0;
 }
 
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write '%s'", output);
+			error_errno(_("could not write '%s'"), output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing '%s' (%s)",
+		error(_("there were errors while writing '%s' (%s)"),
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush '%s'", path);
+		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on '%s'",
+		warning_errno(_("failed utime() on '%s'"),
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write '%s'", path);
+		error_errno(_("could not write '%s'"), path);
 	if (fclose(f))
-		return error_errno("writing '%s' failed", path);
+		return error_errno(_("writing '%s' failed"), path);
 
 out:
 	free(cur.ptr);
@@ -715,13 +715,13 @@ static void update_paths(struct string_list *update)
 		struct string_list_item *item = &update->items[i];
 		if (add_file_to_cache(item->string, 0))
 			exit(128);
-		fprintf(stderr, "Staged '%s' using previous resolution.\n",
+		fprintf_ln(stderr, _("Staged '%s' using previous resolution."),
 			item->string);
 	}
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("unable to write new index file");
+		die(_("unable to write new index file"));
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -753,7 +753,7 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (!handle_file(path, NULL, NULL)) {
 			copy_file(rerere_path(id, "postimage"), path, 0666);
 			id->collection->status[variant] |= RR_HAS_POSTIMAGE;
-			fprintf(stderr, "Recorded resolution for '%s'.\n", path);
+			fprintf_ln(stderr, _("Recorded resolution for '%s'."), path);
 			free_rerere_id(rr_item);
 			rr_item->util = NULL;
 			return;
@@ -787,9 +787,9 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (rerere_autoupdate)
 			string_list_insert(update, path);
 		else
-			fprintf(stderr,
-				"Resolved '%s' using previous resolution.\n",
-				path);
+			fprintf_ln(stderr,
+				   _("Resolved '%s' using previous resolution."),
+				   path);
 		free_rerere_id(rr_item);
 		rr_item->util = NULL;
 		return;
@@ -803,11 +803,11 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 	if (id->collection->status[variant] & RR_HAS_POSTIMAGE) {
 		const char *path = rerere_path(id, "postimage");
 		if (unlink(path))
-			die_errno("cannot unlink stray '%s'", path);
+			die_errno(_("cannot unlink stray '%s'"), path);
 		id->collection->status[variant] &= ~RR_HAS_POSTIMAGE;
 	}
 	id->collection->status[variant] |= RR_HAS_PREIMAGE;
-	fprintf(stderr, "Recorded preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Recorded preimage for '%s'"), path);
 }
 
 static int do_plain_rerere(struct string_list *rr, int fd)
@@ -879,7 +879,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory '%s'", git_path_rr_cache());
+		die(_("could not create directory '%s'"), git_path_rr_cache());
 	return 1;
 }
 
@@ -1032,7 +1032,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1050,7 +1050,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("Failed to update conflicted state in '%s'", path);
+			error(_("Failed to update conflicted state in '%s'"), path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
@@ -1061,16 +1061,16 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	}
 
 	if (id->collection->status_nr <= id->variant) {
-		error("no remembered resolution for '%s'", path);
+		error(_("no remembered resolution for '%s'"), path);
 		goto fail_exit;
 	}
 
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for '%s'", path);
+			error(_("no remembered resolution for '%s'"), path);
 		else
-			error_errno("cannot unlink '%s'", filename);
+			error_errno(_("cannot unlink '%s'"), filename);
 		goto fail_exit;
 	}
 
@@ -1080,7 +1080,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 * the postimage.
 	 */
 	handle_cache(path, sha1, rerere_path(id, "preimage"));
-	fprintf(stderr, "Updated preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Updated preimage for '%s'"), path);
 
 	/*
 	 * And remember that we can record resolution for this
@@ -1089,7 +1089,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for '%s'\n", path);
+	fprintf(stderr, _("Forgot resolution for '%s'\n"), path);
 	return 0;
 
 fail_exit:
@@ -1104,7 +1104,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
@@ -1192,7 +1192,7 @@ void rerere_gc(struct string_list *rr)
 	git_config(git_default_config, NULL);
 	dir = opendir(git_path("rr-cache"));
 	if (!dir)
-		die_errno("unable to open rr-cache directory");
+		die_errno(_("unable to open rr-cache directory"));
 	/* Collect stale conflict IDs ... */
 	while ((e = readdir(dir))) {
 		struct rerere_dir *rr_dir;
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 05/10] rerere: add some documentation
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (3 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 04/10] rerere: mark strings for translation Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved Thomas Gummerer
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Add some documentation for the logic behind the conflict normalization
in rerere.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/rerere.txt | 142 +++++++++++++++++++++++++++++
 rerere.c                           |   4 -
 2 files changed, 142 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..2c517fe0fc
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,142 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Differnt conflict styles and branch names are dealt with by stripping
+that information from the conflict markers, and removing extraneous
+information from the `diff3` conflict style.
+
+Branches being merged in different order are dealt with by sorting the
+conflict hunks.  More on each of those parts in the following
+sections.
+
+Once these two normalization operations are applied, a conflict ID is
+created based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Stripping extraneous information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2.  The common ancestor of
+these branches has a file with with a line with the string "A" (for
+brevity this line is called "line A" for brevity in the following) in
+it.  In branch AB this line is changed to "B", in AC, this line is
+changed to C, and branch AC2 is forked off of AC, after the line was
+changed to C.
+
+Now forking a branch ABAC off of branch AB and then merging AC into it,
+we'd get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    =======
+    C
+    >>>>>>> AC
+
+Now doing the analogous with AC2 (forking a branch ABAC2 off of branch
+AB and then merging branch AC2 into it), maybe using the diff3
+conflict style, we'd get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    ||||||| merged common ancestors
+    A
+    =======
+    C
+    >>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+    After examining what branches AB and AC did, I believe that making
+    line A into line D is the best thing to do that is compatible with
+    what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same.  To do this, the labels on the conflict
+markers are stripped, and the diff3 output is removed.  The above
+examples would both result in the following normalized conflict:
+
+    <<<<<<<
+    B
+    =======
+    C
+    >>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, lets imagine that a common ancestor had a file with line A
+its early part, and line X in its late part.  And then four branches
+are forked that do these things:
+
+    - AB: changes A to B
+    - AC: changes A to C
+    - XY: changes X to Y
+    - XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order.  The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+    After examining what branches AB and AC did, I believe that
+    making line A into line D is the best thing to do that is
+    compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean.  We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z".  We can see there are
+4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or C, a late part becames X or Y", and whenever
+any of these four patterns appear, and we can get to the same conflict
+and resolution that we saw earlier.
+
+Without the sorting, we'd have to somehow find a previous resolution
+from combinatorial explosion.
+
+Conflict ID calculation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the conflict normalization is done, the conflict ID is calculated
+as the sha1 hash of the conflict hunks appended to each other,
+separated by <NUL> characters.  The conflict markers are stripped out
+before the sha1 is calculated.  So in the example above, where we
+merge branch AC which changes line A to line C, into branch AB, which
+changes line A to line C, the conflict ID would be
+SHA1('B<NUL>C<NUL>').
+
+If there are multiple conflicts in one file, the sha1 is calculated
+the same way with all hunks appended to each other, in the order in
+which they appear in the file, separated by a <NUL> character.
diff --git a/rerere.c b/rerere.c
index 74ce422634..ef23abe4dd 100644
--- a/rerere.c
+++ b/rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * and NUL concatenated together.
  *
  * Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (4 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 05/10] rerere: add some documentation Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 07/10] rerere: only return whether a path has conflicts or not Thomas Gummerer
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict in a file, but
commits the file with the conflict markers, and later the file ends up
in a state in which rerere can't handle it, subsequent rerere
operations that are interested in that path, such as 'rerere clear' or
'rerere forget <path>' will fail, or even worse in the case of 'rerere
clear' segfault.

Such states include nested conflicts, or an extra conflict marker that
doesn't have any match.

This is because the first 'git rerere' when there was only one
conflict in the file leaves an entry in the MERGE_RR file behind.  The
next 'git rerere' will then pick the rerere ID for that file up, and
not assign a new ID as it can't successfully calculate one.  It will
however still try to do the rerere operation, because of the existing
ID.  As the handle_file function fails, it will remove the 'preimage'
for the ID in the process, while leaving the ID in the MERGE_RR file.

Now when 'rerere clear' for example is run, it will segfault in
'has_rerere_resolution', because status is NULL.

To fix this, remove the rerere ID from the MERGE_RR file in the case
when we can't handle it, and remove the corresponding variant from
.git/rr-cache/.  Removing it unconditionally is fine here, because if
the user would have resolved the conflict and ran rerere, the entry
would no longer be in the MERGE_RR file, so we wouldn't have this
problem in the first place, while if the conflict was not resolved,
the only thing that's left in the folder is the 'preimage', which by
itself will be regenerated by git if necessary, so the user won't
loose any work.

Note that other variants that have the same conflict ID will not be
touched.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c          | 12 +++++++-----
 t/t4200-rerere.sh | 22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/rerere.c b/rerere.c
index ef23abe4dd..220020187b 100644
--- a/rerere.c
+++ b/rerere.c
@@ -824,10 +824,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret;
-
-		if (string_list_has_string(rr, path))
-			continue;
+		int ret, has_string;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -835,7 +832,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		if (ret < 1)
+		has_string = string_list_has_string(rr, path);
+		if (ret < 0 && has_string) {
+			remove_variant(string_list_lookup(rr, path)->util);
+			string_list_remove(rr, path, 1);
+		}
+		if (ret < 1 || has_string)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index eaf18c81cb..5ce411b70d 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -580,4 +580,26 @@ test_expect_success 'multiple identical conflicts' '
 	count_pre_post 0 0
 '
 
+test_expect_success 'rerere with unexpected conflict markers does not crash' '
+	git reset --hard &&
+
+	git checkout -b branch-1 master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+
+	git reset --hard &&
+	git checkout -b branch-2 master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge branch-1 &&
+	sed "s/bar/>>>>>>> a/" >test.tmp <test &&
+	mv test.tmp test &&
+	git rerere &&
+
+	git rerere clear
+'
+
 test_done
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 07/10] rerere: only return whether a path has conflicts or not
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (5 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 08/10] rerere: factor out handle_conflict function Thomas Gummerer
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We currently return the exact number of conflict hunks a certain path
has from the 'handle_paths' function.  However all of its callers only
care whether there are conflicts or not or if there is an error.
Return only that information, and document that only that information
is returned.  This will simplify the code in the subsequent steps.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 220020187b..da3744b86b 100644
--- a/rerere.c
+++ b/rerere.c
@@ -393,12 +393,13 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * one side of the conflict, NUL, the other side of the conflict,
  * and NUL concatenated together.
  *
- * Return the number of conflict hunks found.
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	enum {
 		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_CONTEXT;
@@ -426,7 +427,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			hunk_no++;
+			has_conflicts = 1;
 			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
@@ -462,7 +463,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 		git_SHA1_Final(sha1, &ctx);
 	if (hunk != RR_CONTEXT)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -471,7 +472,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
  */
 static int handle_file(const char *path, unsigned char *sha1, const char *output)
 {
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	struct rerere_io_file io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -491,7 +492,7 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 		}
 	}
 
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 
 	fclose(io.input);
 	if (io.io.wrerror)
@@ -500,14 +501,14 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	if (io.io.output && fclose(io.io.output))
 		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
-	if (hunk_no < 0) {
+	if (has_conflicts < 0) {
 		if (output)
 			unlink_or_warn(output);
 		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -955,7 +956,7 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	mmfile_t mmfile[3] = {{NULL}};
 	mmbuffer_t result = {NULL, 0};
 	const struct cache_entry *ce;
-	int pos, len, i, hunk_no;
+	int pos, len, i, has_conflicts;
 	struct rerere_io_mem io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -1009,11 +1010,11 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	 * Grab the conflict ID and optionally write the original
 	 * contents with conflict markers out.
 	 */
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 	strbuf_release(&io.input);
 	if (io.io.output)
 		fclose(io.io.output);
-	return hunk_no;
+	return has_conflicts;
 }
 
 static int rerere_forget_one_path(const char *path, struct string_list *rr)
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 08/10] rerere: factor out handle_conflict function
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (6 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 07/10] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts Thomas Gummerer
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Factor out the handle_conflict function, which handles a single
conflict in a path.  This is a preparation for the next step, where
this function will be re-used.  No functional changes intended.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 143 +++++++++++++++++++++++++------------------------------
 1 file changed, 65 insertions(+), 78 deletions(-)

diff --git a/rerere.c b/rerere.c
index da3744b86b..fac90663b0 100644
--- a/rerere.c
+++ b/rerere.c
@@ -302,38 +302,6 @@ static void rerere_io_putstr(const char *str, struct rerere_io *io)
 		ferr_puts(str, io->output, &io->wrerror);
 }
 
-/*
- * Write a conflict marker to io->output (if defined).
- */
-static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
-{
-	char buf[64];
-
-	while (size) {
-		if (size <= sizeof(buf) - 2) {
-			memset(buf, ch, size);
-			buf[size] = '\n';
-			buf[size + 1] = '\0';
-			size = 0;
-		} else {
-			int sz = sizeof(buf) - 1;
-
-			/*
-			 * Make sure we will not write everything out
-			 * in this round by leaving at least 1 byte
-			 * for the next round, giving the next round
-			 * a chance to add the terminating LF.  Yuck.
-			 */
-			if (size <= sz)
-				sz -= (sz - size) + 1;
-			memset(buf, ch, sz);
-			buf[sz] = '\0';
-			size -= sz;
-		}
-		rerere_io_putstr(buf, io);
-	}
-}
-
 static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
 {
 	if (io->output)
@@ -384,37 +352,25 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-/*
- * Read contents a file with conflicts, normalize the conflicts
- * by (1) discarding the common ancestor version in diff3-style,
- * (2) reordering our side and their side so that whichever sorts
- * alphabetically earlier comes before the other one, while
- * computing the "conflict ID", which is just an SHA-1 hash of
- * one side of the conflict, NUL, the other side of the conflict,
- * and NUL concatenated together.
- *
- * Return 1 if conflict hunks are found, 0 if there are no conflict
- * hunks and -1 if an error occured.
- */
-static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
+{
+	strbuf_addchars(buf, ch, size);
+	strbuf_addch(buf, '\n');
+}
+
+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
+			   int marker_size, git_SHA_CTX *ctx)
 {
-	git_SHA_CTX ctx;
-	int has_conflicts = 0;
 	enum {
-		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
-	} hunk = RR_CONTEXT;
+		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
+	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
 	struct strbuf buf = STRBUF_INIT;
-
-	if (sha1)
-		git_SHA1_Init(&ctx);
-
+	int has_conflicts = 1;
 	while (!io->getline(&buf, io)) {
-		if (is_cmarker(buf.buf, '<', marker_size)) {
-			if (hunk != RR_CONTEXT)
-				goto bad;
-			hunk = RR_SIDE_1;
-		} else if (is_cmarker(buf.buf, '|', marker_size)) {
+		if (is_cmarker(buf.buf, '<', marker_size))
+			goto bad;
+		else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				goto bad;
 			hunk = RR_ORIGINAL;
@@ -427,42 +383,73 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			has_conflicts = 1;
-			hunk = RR_CONTEXT;
-			rerere_io_putconflict('<', marker_size, io);
-			rerere_io_putmem(one.buf, one.len, io);
-			rerere_io_putconflict('=', marker_size, io);
-			rerere_io_putmem(two.buf, two.len, io);
-			rerere_io_putconflict('>', marker_size, io);
-			if (sha1) {
-				git_SHA1_Update(&ctx, one.buf ? one.buf : "",
+			rerere_strbuf_putconflict(out, '<', marker_size);
+			strbuf_addbuf(out, &one);
+			rerere_strbuf_putconflict(out, '=', marker_size);
+			strbuf_addbuf(out, &two);
+			rerere_strbuf_putconflict(out, '>', marker_size);
+			if (ctx) {
+				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
-				git_SHA1_Update(&ctx, two.buf ? two.buf : "",
+				git_SHA1_Update(ctx, two.buf ? two.buf : "",
 					    two.len + 1);
 			}
-			strbuf_reset(&one);
-			strbuf_reset(&two);
+			goto out;
 		} else if (hunk == RR_SIDE_1)
 			strbuf_addbuf(&one, &buf);
 		else if (hunk == RR_ORIGINAL)
 			; /* discard */
 		else if (hunk == RR_SIDE_2)
 			strbuf_addbuf(&two, &buf);
-		else
-			rerere_io_putstr(buf.buf, io);
-		continue;
-	bad:
-		hunk = 99; /* force error exit */
-		break;
 	}
+bad:
+	has_conflicts = -1;
+out:
 	strbuf_release(&one);
 	strbuf_release(&two);
 	strbuf_release(&buf);
 
+	return has_conflicts;
+}
+
+/*
+ * Read contents a file with conflicts, normalize the conflicts
+ * by (1) discarding the common ancestor version in diff3-style,
+ * (2) reordering our side and their side so that whichever sorts
+ * alphabetically earlier comes before the other one, while
+ * computing the "conflict ID", which is just an SHA-1 hash of
+ * one side of the conflict, NUL, the other side of the conflict,
+ * and NUL concatenated together.
+ *
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
+ */
+static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+{
+	git_SHA_CTX ctx;
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf out = STRBUF_INIT;
+	int has_conflicts = 0;
+	if (sha1)
+		git_SHA1_Init(&ctx);
+
+	while (!io->getline(&buf, io)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			has_conflicts = handle_conflict(&out, io, marker_size,
+							    sha1 ? &ctx : NULL);
+			if (has_conflicts < 0)
+				break;
+			rerere_io_putmem(out.buf, out.len, io);
+			strbuf_reset(&out);
+		} else
+			rerere_io_putstr(buf.buf, io);
+	}
+	strbuf_release(&buf);
+	strbuf_release(&out);
+
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-	if (hunk != RR_CONTEXT)
-		return -1;
+
 	return has_conflicts;
 }
 
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (7 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 08/10] rerere: factor out handle_conflict function Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-06-05 21:52   ` [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently rerere can't handle nested conflicts and will error out when
it encounters such conflicts.  Do that by recursively calling the
'handle_conflict' function to normalize the conflict.

The conflict ID calculation here deserves some explanation:

As we are using the same handle_conflict function, the nested conflict
is normalized the same way as for non-nested conflicts, which means
the ancestor in the diff3 case is stripped out, and the parts of the
conflict are ordered alphabetically.

The conflict ID is however is only calculated in the top level
handle_conflict call, so it will include the markers that 'rerere'
adds to the output.  e.g. say there's the following conflict:

    <<<<<<< HEAD
    1
    =======
    <<<<<<< HEAD
    3
    =======
    2
    >>>>>>> branch-2
    >>>>>>> branch-3~

it would be reordered as follows in the preimage:

    <<<<<<<
    1
    =======
    <<<<<<<
    2
    =======
    3
    >>>>>>>
    >>>>>>>

and the conflict ID would be calculated as
    sha1(1<NUL><<<<<<<
    2
    =======
    3
    >>>>>>><NUL>)

Stripping out vs. leaving the conflict markers in place should have no
practical impact, but it simplifies the implementation.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

I couldn't actually get the conflict markers the right way just using
merge-recursive.  But I think that would be fixed either way by
d694a17986 ("ll-merge: use a longer conflict marker for internal
merge", 2016-04-14), if I read that correctly.

Either way I still think this can be an improvement for when the user
commits merge conflicts (even though they shouldn't do that in the
first place), and for possible other edge cases that I'm not able to
produce right now, but I may just not be creative enough for those.

 Documentation/technical/rerere.txt | 40 ++++++++++++++++++++++++++++++
 rerere.c                           | 14 ++++++++---
 t/t4200-rerere.sh                  | 38 ++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
index 2c517fe0fc..7077ab4a08 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -140,3 +140,43 @@ SHA1('B<NUL>C<NUL>').
 If there are multiple conflicts in one file, the sha1 is calculated
 the same way with all hunks appended to each other, in the order in
 which they appear in the file, separated by a <NUL> character.
+
+Nested conflicts
+~~~~~~~~~~~~~~~~
+
+Nested conflicts are handled very similarly to "simple" conflicts.
+Same as before, labels on conflict markers and diff3 output is
+stripped, and the conflict hunks are sorted, for both the outer and
+the inner conflict.
+
+The only difference is in how the conflict ID is calculated.  For the
+inner conflict, the conflict markers themselves are not stripped out
+before calculating the sha1.
+
+Say we have the following conflict for example:
+
+    <<<<<<< HEAD
+    1
+    =======
+    <<<<<<< HEAD
+    3
+    =======
+    2
+    >>>>>>> branch-2
+    >>>>>>> branch-3~
+
+After stripping out the labels of the conflict markers, the conflict
+would look as follows:
+
+    <<<<<<<
+    1
+    =======
+    <<<<<<<
+    3
+    =======
+    2
+    >>>>>>>
+    >>>>>>>
+
+and finally the conflict ID would be calculated as:
+`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
diff --git a/rerere.c b/rerere.c
index fac90663b0..f611db7873 100644
--- a/rerere.c
+++ b/rerere.c
@@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
 	int has_conflicts = 1;
 	while (!io->getline(&buf, io)) {
-		if (is_cmarker(buf.buf, '<', marker_size))
-			goto bad;
-		else if (is_cmarker(buf.buf, '|', marker_size)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
+				goto bad;
+			if (hunk == RR_SIDE_1)
+				strbuf_addbuf(&one, &conflict);
+			else
+				strbuf_addbuf(&two, &conflict);
+			strbuf_release(&conflict);
+		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				goto bad;
 			hunk = RR_ORIGINAL;
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 5ce411b70d..f433848ccb 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -602,4 +602,42 @@ test_expect_success 'rerere with unexpected conflict markers does not crash' '
 	git rerere clear
 '
 
+test_expect_success 'rerere with inner conflict markers' '
+	git reset --hard &&
+
+	git checkout -b A master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+	echo "baz" >test &&
+	git add test &&
+	git commit -q -m three &&
+
+	git reset --hard &&
+	git checkout -b B master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+
+	echo "resolved" >test &&
+	git add test &&
+	git commit -q -m "solved conflict" &&
+
+	echo "resolved" >expect &&
+
+	git reset --hard HEAD~~ &&
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+	cat test >actual &&
+	test_cmp expect actual
+'
+
+
 test_done
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (8 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-06-05 21:52   ` Thomas Gummerer
  2018-07-03 21:05   ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
  11 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-06-05 21:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict, commits the results,
and does an operation which creates another conflict, rerere will use
the ID of the previously unresolved conflict for the new conflict.
This is because the conflict is kept in the MERGE_RR file, which
'rerere' reads every time it is invoked.

After the new conflict is solved, rerere will record the resolution
with the ID of the old conflict.  So in order to replay the conflict,
both merges would have to be re-done, instead of just the last one, in
order for rerere to be able to automatically resolve the conflict.

Instead of that, assign a new conflict ID if there are still conflicts
in a file and the file had conflicts at a previous step.  This ID
matches the conflict we actually resolved at the corresponding step.

Note that there are no backwards compatibility worries here, as rerere
would have failed to even normalize the conflict before this patch
series.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c          | 7 +++----
 t/t4200-rerere.sh | 7 +++++++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/rerere.c b/rerere.c
index f611db7873..644f185180 100644
--- a/rerere.c
+++ b/rerere.c
@@ -818,7 +818,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret, has_string;
+		int ret;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -826,12 +826,11 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		has_string = string_list_has_string(rr, path);
-		if (ret < 0 && has_string) {
+		if (ret != 0 && string_list_has_string(rr, path)) {
 			remove_variant(string_list_lookup(rr, path)->util);
 			string_list_remove(rr, path, 1);
 		}
-		if (ret < 1 || has_string)
+		if (ret < 1)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index f433848ccb..9578215ff2 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -636,6 +636,13 @@ test_expect_success 'rerere with inner conflict markers' '
 	git commit -q -m "will solve conflicts later" &&
 	test_must_fail git merge A &&
 	cat test >actual &&
+	test_cmp expect actual &&
+
+	git add test &&
+	git commit -m "rerere solved conflict" &&
+	git reset --hard HEAD~ &&
+	test_must_fail git merge A &&
+	cat test >actual &&
 	test_cmp expect actual
 '
 
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 00/10] rerere: handle nested conflicts
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (9 preceding siblings ...)
  2018-06-05 21:52   ` [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
@ 2018-07-03 21:05   ` Thomas Gummerer
  2018-07-06 17:56     ` Junio C Hamano
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
  11 siblings, 1 reply; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-03 21:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

On 06/05, Thomas Gummerer wrote:
> The previous round was at
> <20180520211210.1248-1-t.gummerer@gmail.com>.
> 
> Thanks Junio for the comments on the previous round.
> 
> Changes since v2:
>  - lowercase the first letter in some error/warning messages before
>    marking them for translation
>  - wrap paths in output in single quotes, for consistency, and to make
>    some of the messages the same as ones that are already translated
>  - mark messages in builtin/rerere.c for translation as well, which I
>    had previously forgotten.
>  - expanded the technical documentation on rerere.  The entire
>    document is basically rewritten.
>  - changed the test in 6/10 to just fake a conflict marker inside of
>    one of the hunks instead of using an inner conflict created by a
>    merge.  This is to make sure the codepath is still hit after we
>    handle inner conflicts properly.
>  - added tests for handling inner conflict markers
>  - added one commit to recalculate the conflict ID when an unresolved
>    conflict is committed, and the subsequent operation conflicts again
>    in the same file.  More explanation in the commit message of that
>    commit.

Now that 2.18 is out (and I'm caught up on the list after being away
from it for a few days), is there any interest in this series? I guess
it was overlooked as it's been sent in the rc phase for 2.18.

I think the most important bit here is 6/10 which fixes a crash that
can happen in "normal" usage of git.  The translation bits are also
nice to have I think, but I could send them in a different series if
that's preferred.

The other patches would be nice to have, but are arguably less
important.

> range-diff below.  A few commits changed enough for range-diff
> to give up showing the differences in those, they are probably best
> reviewed as the whole patch anyway:
>
> [snip]
> 
> Thomas Gummerer (10):
>   rerere: unify error messages when read_cache fails
>   rerere: lowercase error messages
>   rerere: wrap paths in output in sq
>   rerere: mark strings for translation
>   rerere: add some documentation
>   rerere: fix crash when conflict goes unresolved
>   rerere: only return whether a path has conflicts or not
>   rerere: factor out handle_conflict function
>   rerere: teach rerere to handle nested conflicts
>   rerere: recalculate conflict ID when unresolved conflict is committed
> 
>  Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
>  builtin/rerere.c                   |   4 +-
>  rerere.c                           | 246 ++++++++++++++---------------
>  t/t4200-rerere.sh                  |  67 ++++++++
>  4 files changed, 372 insertions(+), 127 deletions(-)
>  create mode 100644 Documentation/technical/rerere.txt
> 
> -- 
> 2.18.0.rc1.242.g61856ae69
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 00/10] rerere: handle nested conflicts
  2018-07-03 21:05   ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
@ 2018-07-06 17:56     ` Junio C Hamano
  2018-07-10 21:37       ` Thomas Gummerer
  0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2018-07-06 17:56 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git

Thomas Gummerer <t.gummerer@gmail.com> writes:

> On 06/05, Thomas Gummerer wrote:
>> The previous round was at
>> <20180520211210.1248-1-t.gummerer@gmail.com>.
>> 
>> Thanks Junio for the comments on the previous round.
>> 
>> Changes since v2:
>>  - lowercase the first letter in some error/warning messages before
>>    marking them for translation
>>  - wrap paths in output in single quotes, for consistency, and to make
>>    some of the messages the same as ones that are already translated
>>  - mark messages in builtin/rerere.c for translation as well, which I
>>    had previously forgotten.
>>  - expanded the technical documentation on rerere.  The entire
>>    document is basically rewritten.
>>  - changed the test in 6/10 to just fake a conflict marker inside of
>>    one of the hunks instead of using an inner conflict created by a
>>    merge.  This is to make sure the codepath is still hit after we
>>    handle inner conflicts properly.
>>  - added tests for handling inner conflict markers
>>  - added one commit to recalculate the conflict ID when an unresolved
>>    conflict is committed, and the subsequent operation conflicts again
>>    in the same file.  More explanation in the commit message of that
>>    commit.
>
> Now that 2.18 is out (and I'm caught up on the list after being away
> from it for a few days), is there any interest in this series? I guess
> it was overlooked as it's been sent in the rc phase for 2.18.

I deliberately ignored, not because I wasn't interested in it, but
because I'd be distracted during the pre-release feature freeze as
I'd be heavily intereseted in it.

Now is a good time to repost to stir/re-ignite the interest from
others, possibly after rebasing on v2.18.0 and polishing further.

Thanks.

>
> I think the most important bit here is 6/10 which fixes a crash that
> can happen in "normal" usage of git.  The translation bits are also
> nice to have I think, but I could send them in a different series if
> that's preferred.
>
> The other patches would be nice to have, but are arguably less
> important.
>
>> range-diff below.  A few commits changed enough for range-diff
>> to give up showing the differences in those, they are probably best
>> reviewed as the whole patch anyway:
>>
>> [snip]
>> 
>> Thomas Gummerer (10):
>>   rerere: unify error messages when read_cache fails
>>   rerere: lowercase error messages
>>   rerere: wrap paths in output in sq
>>   rerere: mark strings for translation
>>   rerere: add some documentation
>>   rerere: fix crash when conflict goes unresolved
>>   rerere: only return whether a path has conflicts or not
>>   rerere: factor out handle_conflict function
>>   rerere: teach rerere to handle nested conflicts
>>   rerere: recalculate conflict ID when unresolved conflict is committed
>> 
>>  Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
>>  builtin/rerere.c                   |   4 +-
>>  rerere.c                           | 246 ++++++++++++++---------------
>>  t/t4200-rerere.sh                  |  67 ++++++++
>>  4 files changed, 372 insertions(+), 127 deletions(-)
>>  create mode 100644 Documentation/technical/rerere.txt
>> 
>> -- 
>> 2.18.0.rc1.242.g61856ae69
>> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 00/10] rerere: handle nested conflicts
  2018-07-06 17:56     ` Junio C Hamano
@ 2018-07-10 21:37       ` Thomas Gummerer
  0 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-10 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On 07/06, Junio C Hamano wrote:
> Thomas Gummerer <t.gummerer@gmail.com> writes:
> 
> > On 06/05, Thomas Gummerer wrote:
> >> The previous round was at
> >> <20180520211210.1248-1-t.gummerer@gmail.com>.
> >> 
> >> Thanks Junio for the comments on the previous round.
> >> 
> >> Changes since v2:
> >>  - lowercase the first letter in some error/warning messages before
> >>    marking them for translation
> >>  - wrap paths in output in single quotes, for consistency, and to make
> >>    some of the messages the same as ones that are already translated
> >>  - mark messages in builtin/rerere.c for translation as well, which I
> >>    had previously forgotten.
> >>  - expanded the technical documentation on rerere.  The entire
> >>    document is basically rewritten.
> >>  - changed the test in 6/10 to just fake a conflict marker inside of
> >>    one of the hunks instead of using an inner conflict created by a
> >>    merge.  This is to make sure the codepath is still hit after we
> >>    handle inner conflicts properly.
> >>  - added tests for handling inner conflict markers
> >>  - added one commit to recalculate the conflict ID when an unresolved
> >>    conflict is committed, and the subsequent operation conflicts again
> >>    in the same file.  More explanation in the commit message of that
> >>    commit.
> >
> > Now that 2.18 is out (and I'm caught up on the list after being away
> > from it for a few days), is there any interest in this series? I guess
> > it was overlooked as it's been sent in the rc phase for 2.18.
> 
> I deliberately ignored, not because I wasn't interested in it, but
> because I'd be distracted during the pre-release feature freeze as
> I'd be heavily intereseted in it.
> 
> Now is a good time to repost to stir/re-ignite the interest from
> others, possibly after rebasing on v2.18.0 and polishing further.

I sometimes find it hard to gauge whether there are no replies because
nobody is interested in the series, or if it is because it was ignored
or slipped to the cracks.  I guess I could have inferred it from your
replies to the previous iteration though :)

I'll go back and polish my patches, and then send a new iteration,
thanks! 

> Thanks.
> 
> >
> > I think the most important bit here is 6/10 which fixes a crash that
> > can happen in "normal" usage of git.  The translation bits are also
> > nice to have I think, but I could send them in a different series if
> > that's preferred.
> >
> > The other patches would be nice to have, but are arguably less
> > important.
> >
> >> range-diff below.  A few commits changed enough for range-diff
> >> to give up showing the differences in those, they are probably best
> >> reviewed as the whole patch anyway:
> >>
> >> [snip]
> >> 
> >> Thomas Gummerer (10):
> >>   rerere: unify error messages when read_cache fails
> >>   rerere: lowercase error messages
> >>   rerere: wrap paths in output in sq
> >>   rerere: mark strings for translation
> >>   rerere: add some documentation
> >>   rerere: fix crash when conflict goes unresolved
> >>   rerere: only return whether a path has conflicts or not
> >>   rerere: factor out handle_conflict function
> >>   rerere: teach rerere to handle nested conflicts
> >>   rerere: recalculate conflict ID when unresolved conflict is committed
> >> 
> >>  Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
> >>  builtin/rerere.c                   |   4 +-
> >>  rerere.c                           | 246 ++++++++++++++---------------
> >>  t/t4200-rerere.sh                  |  67 ++++++++
> >>  4 files changed, 372 insertions(+), 127 deletions(-)
> >>  create mode 100644 Documentation/technical/rerere.txt
> >> 
> >> -- 
> >> 2.18.0.rc1.242.g61856ae69
> >> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 00/11] rerere: handle nested conflicts
  2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
                     ` (10 preceding siblings ...)
  2018-07-03 21:05   ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
@ 2018-07-14 21:44   ` " Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
                       ` (10 more replies)
  11 siblings, 11 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

The previous rounds were at
<20180520211210.1248-1-t.gummerer@gmail.com> and
<20180605215219.28783-1-t.gummerer@gmail.com>.

This round is a more polished version of the previous round, as
suggested by Junio in <xmqq1scgmemy.fsf@gitster-ct.c.googlers.com>.
It's also rebased on v2.18.

The series grew by one patch, because 8/10 has been split into two
patches hopefully making it easier to follow.

range-diff before below:

1:  2825342cc2 = 1:  018bd68a8a rerere: unify error messages when read_cache fails
2:  d1500028aa ! 2:  281fcbf24f rerere: lowercase error messages
    @@ -4,7 +4,8 @@
     
         Documentation/CodingGuidelines mentions that error messages should be
         lowercase.  Prior to marking them for translation follow that pattern
    -    in rerere as well.
    +    in rerere as well, so translators won't have to translate messages
    +    that don't conform to our guidelines.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
    @@ -87,3 +88,12 @@
      
      	/* Nuke the recorded resolution for the conflict */
      	id = new_rerere_id(sha1);
    +@@
    + 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
    + 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
    + 			free(cur.ptr);
    +-			error("Failed to update conflicted state in '%s'", path);
    ++			error("failed to update conflicted state in '%s'", path);
    + 			goto fail_exit;
    + 		}
    + 		cleanly_resolved = !try_merge(id, path, &cur, &result);
3:  ed3601ee71 = 3:  b6d5e2e26d rerere: wrap paths in output in sq
4:  6ead84a199 ! 4:  45f0d7a99f rerere: mark strings for translation
    @@ -4,7 +4,7 @@
     
         'git rerere' is considered a plumbing command and as such its output
         should be translated.  Its functionality is also only enabled through
    -    a config setting, so scripts really shouldn't rely on its output
    +    a config setting, so scripts really shouldn't rely on the output
         either way.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
    @@ -219,8 +219,8 @@
      		handle_cache(path, sha1, rerere_path(id, "thisimage"));
      		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
      			free(cur.ptr);
    --			error("Failed to update conflicted state in '%s'", path);
    -+			error(_("Failed to update conflicted state in '%s'"), path);
    +-			error("failed to update conflicted state in '%s'", path);
    ++			error(_("failed to update conflicted state in '%s'"), path);
      			goto fail_exit;
      		}
      		cleanly_resolved = !try_merge(id, path, &cur, &result);
5:  caad276aca ! 5:  993857a816 rerere: add some documentation
    @@ -1,6 +1,6 @@
     Author: Thomas Gummerer <t.gummerer@gmail.com>
     
    -    rerere: add some documentation
    +    rerere: add documentation for conflict normalization
     
         Add some documentation for the logic behind the conflict normalization
         in rerere.
    @@ -27,30 +27,28 @@
     +when different conflict style settings are used, rerere normalizes the
     +conflicts before writing them to the rerere database.
     +
    -+Differnt conflict styles and branch names are dealt with by stripping
    -+that information from the conflict markers, and removing extraneous
    -+information from the `diff3` conflict style.
    -+
    -+Branches being merged in different order are dealt with by sorting the
    -+conflict hunks.  More on each of those parts in the following
    -+sections.
    ++Different conflict styles and branch names are normalized by stripping
    ++the labels from the conflict markers, and removing extraneous
    ++information from the `diff3` conflict style. Branches that are merged
    ++in different order are normalized by sorting the conflict hunks.  More
    ++on each of those steps in the following sections.
     +
     +Once these two normalization operations are applied, a conflict ID is
    -+created based on the normalized conflict, which is later used by
    ++calculated based on the normalized conflict, which is later used by
     +rerere to look up the conflict in the rerere database.
     +
     +Stripping extraneous information
     +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +
     +Say we have three branches AB, AC and AC2.  The common ancestor of
    -+these branches has a file with with a line with the string "A" (for
    -+brevity this line is called "line A" for brevity in the following) in
    -+it.  In branch AB this line is changed to "B", in AC, this line is
    -+changed to C, and branch AC2 is forked off of AC, after the line was
    -+changed to C.
    ++these branches has a file with a line containing the string "A" (for
    ++brevity this is called "line A" in the rest of the document).  In
    ++branch AB this line is changed to "B", in AC, this line is changed to
    ++"C", and branch AC2 is forked off of AC, after the line was changed to
    ++"C".
     +
    -+Now forking a branch ABAC off of branch AB and then merging AC into it,
    -+we'd get a conflict like the following:
    ++Forking a branch ABAC off of branch AB and then merging AC into it, we
    ++get a conflict like the following:
     +
     +    <<<<<<< HEAD
     +    B
    @@ -58,9 +56,9 @@
     +    C
     +    >>>>>>> AC
     +
    -+Now doing the analogous with AC2 (forking a branch ABAC2 off of branch
    -+AB and then merging branch AC2 into it), maybe using the diff3
    -+conflict style, we'd get a conflict like the following:
    ++Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
    ++and then merging branch AC2 into it), using the diff3 conflict style,
    ++we get a conflict like the following:
     +
     +    <<<<<<< HEAD
     +    B
6:  ad88a6b8a8 = 6:  a7a0f657f3 rerere: fix crash when conflict goes unresolved
7:  15f9efcba6 = 7:  f1afd4b9a4 rerere: only return whether a path has conflicts or not
8:  1490efaad3 ! 8:  1f5cef506a rerere: factor out handle_conflict function
    @@ -3,53 +3,14 @@
         rerere: factor out handle_conflict function
     
         Factor out the handle_conflict function, which handles a single
    -    conflict in a path.  This is a preparation for the next step, where
    -    this function will be re-used.  No functional changes intended.
    +    conflict in a path.  This is in preparation for a subsequent commit,
    +    where this function will be re-used.  No functional changes intended.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
     diff --git a/rerere.c b/rerere.c
     --- a/rerere.c
     +++ b/rerere.c
    -@@
    - 		ferr_puts(str, io->output, &io->wrerror);
    - }
    - 
    --/*
    -- * Write a conflict marker to io->output (if defined).
    -- */
    --static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
    --{
    --	char buf[64];
    --
    --	while (size) {
    --		if (size <= sizeof(buf) - 2) {
    --			memset(buf, ch, size);
    --			buf[size] = '\n';
    --			buf[size + 1] = '\0';
    --			size = 0;
    --		} else {
    --			int sz = sizeof(buf) - 1;
    --
    --			/*
    --			 * Make sure we will not write everything out
    --			 * in this round by leaving at least 1 byte
    --			 * for the next round, giving the next round
    --			 * a chance to add the terminating LF.  Yuck.
    --			 */
    --			if (size <= sz)
    --				sz -= (sz - size) + 1;
    --			memset(buf, ch, sz);
    --			buf[sz] = '\0';
    --			size -= sz;
    --		}
    --		rerere_io_putstr(buf, io);
    --	}
    --}
    --
    - static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
    - {
    - 	if (io->output)
     @@
      	return isspace(*buf);
      }
    @@ -67,14 +28,7 @@
     - * hunks and -1 if an error occured.
     - */
     -static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
    -+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
    -+{
    -+	strbuf_addchars(buf, ch, size);
    -+	strbuf_addch(buf, '\n');
    -+}
    -+
    -+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
    -+			   int marker_size, git_SHA_CTX *ctx)
    ++static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *ctx)
      {
     -	git_SHA_CTX ctx;
     -	int has_conflicts = 0;
    @@ -88,38 +42,39 @@
     -
     -	if (sha1)
     -		git_SHA1_Init(&ctx);
    --
    -+	int has_conflicts = 1;
    ++	int has_conflicts = -1;
    + 
      	while (!io->getline(&buf, io)) {
    --		if (is_cmarker(buf.buf, '<', marker_size)) {
    + 		if (is_cmarker(buf.buf, '<', marker_size)) {
     -			if (hunk != RR_CONTEXT)
     -				goto bad;
     -			hunk = RR_SIDE_1;
    --		} else if (is_cmarker(buf.buf, '|', marker_size)) {
    -+		if (is_cmarker(buf.buf, '<', marker_size))
    -+			goto bad;
    -+		else if (is_cmarker(buf.buf, '|', marker_size)) {
    ++			break;
    + 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
      			if (hunk != RR_SIDE_1)
    - 				goto bad;
    +-				goto bad;
    ++				break;
      			hunk = RR_ORIGINAL;
    -@@
    - 				goto bad;
    + 		} else if (is_cmarker(buf.buf, '=', marker_size)) {
    + 			if (hunk != RR_SIDE_1 && hunk != RR_ORIGINAL)
    +-				goto bad;
    ++				break;
    + 			hunk = RR_SIDE_2;
    + 		} else if (is_cmarker(buf.buf, '>', marker_size)) {
    + 			if (hunk != RR_SIDE_2)
    +-				goto bad;
    ++				break;
      			if (strbuf_cmp(&one, &two) > 0)
      				strbuf_swap(&one, &two);
    --			has_conflicts = 1;
    + 			has_conflicts = 1;
     -			hunk = RR_CONTEXT;
    --			rerere_io_putconflict('<', marker_size, io);
    --			rerere_io_putmem(one.buf, one.len, io);
    --			rerere_io_putconflict('=', marker_size, io);
    --			rerere_io_putmem(two.buf, two.len, io);
    --			rerere_io_putconflict('>', marker_size, io);
    + 			rerere_io_putconflict('<', marker_size, io);
    + 			rerere_io_putmem(one.buf, one.len, io);
    + 			rerere_io_putconflict('=', marker_size, io);
    + 			rerere_io_putmem(two.buf, two.len, io);
    + 			rerere_io_putconflict('>', marker_size, io);
     -			if (sha1) {
     -				git_SHA1_Update(&ctx, one.buf ? one.buf : "",
    -+			rerere_strbuf_putconflict(out, '<', marker_size);
    -+			strbuf_addbuf(out, &one);
    -+			rerere_strbuf_putconflict(out, '=', marker_size);
    -+			strbuf_addbuf(out, &two);
    -+			rerere_strbuf_putconflict(out, '>', marker_size);
     +			if (ctx) {
     +				git_SHA1_Update(ctx, one.buf ? one.buf : "",
      					    one.len + 1);
    @@ -129,7 +84,7 @@
      			}
     -			strbuf_reset(&one);
     -			strbuf_reset(&two);
    -+			goto out;
    ++			break;
      		} else if (hunk == RR_SIDE_1)
      			strbuf_addbuf(&one, &buf);
      		else if (hunk == RR_ORIGINAL)
    @@ -143,9 +98,6 @@
     -		hunk = 99; /* force error exit */
     -		break;
      	}
    -+bad:
    -+	has_conflicts = -1;
    -+out:
      	strbuf_release(&one);
      	strbuf_release(&two);
      	strbuf_release(&buf);
    @@ -169,24 +121,20 @@
     +{
     +	git_SHA_CTX ctx;
     +	struct strbuf buf = STRBUF_INIT;
    -+	struct strbuf out = STRBUF_INIT;
     +	int has_conflicts = 0;
     +	if (sha1)
     +		git_SHA1_Init(&ctx);
     +
     +	while (!io->getline(&buf, io)) {
     +		if (is_cmarker(buf.buf, '<', marker_size)) {
    -+			has_conflicts = handle_conflict(&out, io, marker_size,
    -+							    sha1 ? &ctx : NULL);
    ++			has_conflicts = handle_conflict(io, marker_size,
    ++							sha1 ? &ctx : NULL);
     +			if (has_conflicts < 0)
     +				break;
    -+			rerere_io_putmem(out.buf, out.len, io);
    -+			strbuf_reset(&out);
     +		} else
     +			rerere_io_putstr(buf.buf, io);
     +	}
     +	strbuf_release(&buf);
    -+	strbuf_release(&out);
     +
      	if (sha1)
      		git_SHA1_Final(sha1, &ctx);
-:  ---------- > 9:  8ac0d3e903 rerere: return strbuf from handle path
9:  6619650c42 ! 10:  ef84fdc201 rerere: teach rerere to handle nested conflicts
    @@ -27,7 +27,7 @@
             >>>>>>> branch-2
             >>>>>>> branch-3~
     
    -    it would be reordered as follows in the preimage:
    +    it would be recorde as follows in the preimage:
     
             <<<<<<<
             1
    @@ -40,14 +40,16 @@
             >>>>>>>
     
         and the conflict ID would be calculated as
    +
             sha1(1<NUL><<<<<<<
             2
             =======
             3
             >>>>>>><NUL>)
     
    -    Stripping out vs. leaving the conflict markers in place should have no
    -    practical impact, but it simplifies the implementation.
    +    Stripping out vs. leaving the conflict markers in place in the inner
    +    conflict should have no practical impact, but it simplifies the
    +    implementation.
     
         Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
     
    @@ -63,9 +65,11 @@
     +~~~~~~~~~~~~~~~~
     +
     +Nested conflicts are handled very similarly to "simple" conflicts.
    -+Same as before, labels on conflict markers and diff3 output is
    -+stripped, and the conflict hunks are sorted, for both the outer and
    -+the inner conflict.
    ++Similar to simple conflicts, the conflict is first normalized by
    ++stripping the labels from conflict markers, stripping the diff3
    ++output, and the sorting the conflict hunks, both for the outer and the
    ++inner conflict.  This is done recursively, so any number of nested
    ++conflicts can be handled.
     +
     +The only difference is in how the conflict ID is calculated.  For the
     +inner conflict, the conflict markers themselves are not stripped out
    @@ -83,16 +87,16 @@
     +    >>>>>>> branch-2
     +    >>>>>>> branch-3~
     +
    -+After stripping out the labels of the conflict markers, the conflict
    -+would look as follows:
    ++After stripping out the labels of the conflict markers, and sorting
    ++the hunks, the conflict would look as follows:
     +
     +    <<<<<<<
     +    1
     +    =======
     +    <<<<<<<
    -+    3
    -+    =======
     +    2
    ++    =======
    ++    3
     +    >>>>>>>
     +    >>>>>>>
     +
    @@ -108,23 +112,21 @@
      	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
     -	struct strbuf buf = STRBUF_INIT;
     +	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
    - 	int has_conflicts = 1;
    + 	int has_conflicts = -1;
    + 
      	while (!io->getline(&buf, io)) {
    --		if (is_cmarker(buf.buf, '<', marker_size))
    --			goto bad;
    --		else if (is_cmarker(buf.buf, '|', marker_size)) {
    -+		if (is_cmarker(buf.buf, '<', marker_size)) {
    + 		if (is_cmarker(buf.buf, '<', marker_size)) {
    +-			break;
     +			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
    -+				goto bad;
    ++				break;
     +			if (hunk == RR_SIDE_1)
     +				strbuf_addbuf(&one, &conflict);
     +			else
     +				strbuf_addbuf(&two, &conflict);
     +			strbuf_release(&conflict);
    -+		} else if (is_cmarker(buf.buf, '|', marker_size)) {
    + 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
      			if (hunk != RR_SIDE_1)
    - 				goto bad;
    - 			hunk = RR_ORIGINAL;
    + 				break;
     
     diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
     --- a/t/t4200-rerere.sh
    @@ -169,6 +171,5 @@
     +	cat test >actual &&
     +	test_cmp expect actual
     +'
    -+
     +
      test_done
10:  4b11dce7dd = 11:  35a826908f rerere: recalculate conflict ID when unresolved conflict is committed


Thomas Gummerer (11):
  rerere: unify error messages when read_cache fails
  rerere: lowercase error messages
  rerere: wrap paths in output in sq
  rerere: mark strings for translation
  rerere: add documentation for conflict normalization
  rerere: fix crash when conflict goes unresolved
  rerere: only return whether a path has conflicts or not
  rerere: factor out handle_conflict function
  rerere: return strbuf from handle path
  rerere: teach rerere to handle nested conflicts
  rerere: recalculate conflict ID when unresolved conflict is committed

 Documentation/technical/rerere.txt | 182 +++++++++++++++++++++
 builtin/rerere.c                   |   4 +-
 rerere.c                           | 243 ++++++++++++++---------------
 t/t4200-rerere.sh                  |  66 ++++++++
 4 files changed, 366 insertions(+), 129 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

-- 
2.18.0.233.g985f88cf7e

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 01/11] rerere: unify error messages when read_cache fails
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 02/11] rerere: lowercase error messages Thomas Gummerer
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We have multiple different variants of the error message we show to
the user if 'read_cache' fails.  The "Could not read index" variant we
are using in 'rerere.c' is currently not used anywhere in translated
form.

As a subsequent commit will mark all output that comes from 'rerere.c'
for translation, make the life of the translators a little bit easier
by using a string that is used elsewhere, and marked for translation
there, and thus most likely already translated.

"index file corrupt" seems to be the most common error message we show
when 'read_cache' fails, so use that here as well.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rerere.c b/rerere.c
index e0862e2778..473d32a5cd 100644
--- a/rerere.c
+++ b/rerere.c
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -1103,7 +1103,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("Could not read index");
+		return error("index file corrupt");
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 02/11] rerere: lowercase error messages
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 03/11] rerere: wrap paths in output in sq Thomas Gummerer
                       ` (8 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Documentation/CodingGuidelines mentions that error messages should be
lowercase.  Prior to marking them for translation follow that pattern
in rerere as well, so translators won't have to translate messages
that don't conform to our guidelines.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/rerere.c b/rerere.c
index 473d32a5cd..c5d9ea171f 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("Could not write %s", output);
+			error_errno("could not write %s", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("There were errors while writing %s (%s)",
+		error("there were errors while writing %s (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("Failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush %s", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("Could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in %s", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -690,11 +690,11 @@ static int merge(const struct rerere_id *id, const char *path)
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("Could not open %s", path);
+		return error_errno("could not open %s", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("Could not write %s", path);
+		error_errno("could not write %s", path);
 	if (fclose(f))
-		return error_errno("Writing %s failed", path);
+		return error_errno("writing %s failed", path);
 
 out:
 	free(cur.ptr);
@@ -720,7 +720,7 @@ static void update_paths(struct string_list *update)
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("Unable to write new index file");
+		die("unable to write new index file");
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("Could not create directory %s", git_path_rr_cache());
+		die("could not create directory %s", git_path_rr_cache());
 	return 1;
 }
 
@@ -1031,7 +1031,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("Could not parse conflict hunks in '%s'", path);
+		return error("could not parse conflict hunks in '%s'", path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1049,7 +1049,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("Failed to update conflicted state in '%s'", path);
+			error("failed to update conflicted state in '%s'", path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 03/11] rerere: wrap paths in output in sq
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 02/11] rerere: lowercase error messages Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
                       ` (7 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

It looks like most paths in the output in the git codebase are wrapped
in single quotes.  Standardize on that in rerere as well.

Apart from being more consistent, this also makes some of the strings
match strings that are already translated in other parts of the
codebase, thus reducing the work for translators, when the strings are
marked for translation in a subsequent commit.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/rerere.c |  2 +-
 rerere.c         | 26 +++++++++++++-------------
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index 0bc40298c2..e0c67c98e9 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for %s", rerere_path(id, NULL));
+				die("unable to generate diff for '%s'", rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index c5d9ea171f..cde1f6e696 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write %s", output);
+			error_errno("could not write '%s'", output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing %s (%s)",
+		error("there were errors while writing '%s' (%s)",
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush %s", path);
+		io.io.wrerror = error_errno("failed to flush '%s'", path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in %s", path);
+		return error("could not parse conflict hunks in '%s'", path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on %s",
+		warning_errno("failed utime() on '%s'",
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open %s", path);
+		return error_errno("could not open '%s'", path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write %s", path);
+		error_errno("could not write '%s'", path);
 	if (fclose(f))
-		return error_errno("writing %s failed", path);
+		return error_errno("writing '%s' failed", path);
 
 out:
 	free(cur.ptr);
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory %s", git_path_rr_cache());
+		die("could not create directory '%s'", git_path_rr_cache());
 	return 1;
 }
 
@@ -1067,9 +1067,9 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for %s", path);
+			error("no remembered resolution for '%s'", path);
 		else
-			error_errno("cannot unlink %s", filename);
+			error_errno("cannot unlink '%s'", filename);
 		goto fail_exit;
 	}
 
@@ -1088,7 +1088,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for %s\n", path);
+	fprintf(stderr, "Forgot resolution for '%s'\n", path);
 	return 0;
 
 fail_exit:
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 04/11] rerere: mark strings for translation
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (2 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 03/11] rerere: wrap paths in output in sq Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-15 13:24       ` Simon Ruderich
  2018-07-14 21:44     ` [PATCH v3 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
                       ` (6 subsequent siblings)
  10 siblings, 1 reply; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

'git rerere' is considered a plumbing command and as such its output
should be translated.  Its functionality is also only enabled through
a config setting, so scripts really shouldn't rely on the output
either way.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/rerere.c |  4 +--
 rerere.c         | 68 ++++++++++++++++++++++++------------------------
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index e0c67c98e9..5ed941b91f 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -75,7 +75,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 	if (!strcmp(argv[0], "forget")) {
 		struct pathspec pathspec;
 		if (argc < 2)
-			warning("'git rerere forget' without paths is deprecated");
+			warning(_("'git rerere forget' without paths is deprecated"));
 		parse_pathspec(&pathspec, 0, PATHSPEC_PREFER_CWD,
 			       prefix, argv + 1);
 		return rerere_forget(&pathspec);
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
 			const char *path = merge_rr.items[i].string;
 			const struct rerere_id *id = merge_rr.items[i].util;
 			if (diff_two(rerere_path(id, "preimage"), path, path, path))
-				die("unable to generate diff for '%s'", rerere_path(id, NULL));
+				die(_("unable to generate diff for '%s'"), rerere_path(id, NULL));
 		}
 	} else
 		usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index cde1f6e696..be98c0afcb 100644
--- a/rerere.c
+++ b/rerere.c
@@ -212,7 +212,7 @@ static void read_rr(struct string_list *rr)
 
 		/* There has to be the hash, tab, path and then NUL */
 		if (buf.len < 42 || get_sha1_hex(buf.buf, sha1))
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 
 		if (buf.buf[40] != '.') {
 			variant = 0;
@@ -221,10 +221,10 @@ static void read_rr(struct string_list *rr)
 			errno = 0;
 			variant = strtol(buf.buf + 41, &path, 10);
 			if (errno)
-				die("corrupt MERGE_RR");
+				die(_("corrupt MERGE_RR"));
 		}
 		if (*(path++) != '\t')
-			die("corrupt MERGE_RR");
+			die(_("corrupt MERGE_RR"));
 		buf.buf[40] = '\0';
 		id = new_rerere_id_hex(buf.buf);
 		id->variant = variant;
@@ -259,12 +259,12 @@ static int write_rr(struct string_list *rr, int out_fd)
 				    rr->items[i].string, 0);
 
 		if (write_in_full(out_fd, buf.buf, buf.len) < 0)
-			die("unable to write rerere record");
+			die(_("unable to write rerere record"));
 
 		strbuf_release(&buf);
 	}
 	if (commit_lock_file(&write_lock) != 0)
-		die("unable to write rerere record");
+		die(_("unable to write rerere record"));
 	return 0;
 }
 
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	io.input = fopen(path, "r");
 	io.io.wrerror = 0;
 	if (!io.input)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 
 	if (output) {
 		io.io.output = fopen(output, "w");
 		if (!io.io.output) {
-			error_errno("could not write '%s'", output);
+			error_errno(_("could not write '%s'"), output);
 			fclose(io.input);
 			return -1;
 		}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 
 	fclose(io.input);
 	if (io.io.wrerror)
-		error("there were errors while writing '%s' (%s)",
+		error(_("there were errors while writing '%s' (%s)"),
 		      path, strerror(io.io.wrerror));
 	if (io.io.output && fclose(io.io.output))
-		io.io.wrerror = error_errno("failed to flush '%s'", path);
+		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
 	if (hunk_no < 0) {
 		if (output)
 			unlink_or_warn(output);
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
 	int i;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
 	if (setup_rerere(merge_rr, RERERE_READONLY))
 		return 0;
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	for (i = 0; i < active_nr;) {
 		int conflict_type;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char *path)
 	 * Mark that "postimage" was used to help gc.
 	 */
 	if (utime(rerere_path(id, "postimage"), NULL) < 0)
-		warning_errno("failed utime() on '%s'",
+		warning_errno(_("failed utime() on '%s'"),
 			      rerere_path(id, "postimage"));
 
 	/* Update "path" with the resolution */
 	f = fopen(path, "w");
 	if (!f)
-		return error_errno("could not open '%s'", path);
+		return error_errno(_("could not open '%s'"), path);
 	if (fwrite(result.ptr, result.size, 1, f) != 1)
-		error_errno("could not write '%s'", path);
+		error_errno(_("could not write '%s'"), path);
 	if (fclose(f))
-		return error_errno("writing '%s' failed", path);
+		return error_errno(_("writing '%s' failed"), path);
 
 out:
 	free(cur.ptr);
@@ -714,13 +714,13 @@ static void update_paths(struct string_list *update)
 		struct string_list_item *item = &update->items[i];
 		if (add_file_to_cache(item->string, 0))
 			exit(128);
-		fprintf(stderr, "Staged '%s' using previous resolution.\n",
+		fprintf_ln(stderr, _("Staged '%s' using previous resolution."),
 			item->string);
 	}
 
 	if (write_locked_index(&the_index, &index_lock,
 			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
-		die("unable to write new index file");
+		die(_("unable to write new index file"));
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -752,7 +752,7 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (!handle_file(path, NULL, NULL)) {
 			copy_file(rerere_path(id, "postimage"), path, 0666);
 			id->collection->status[variant] |= RR_HAS_POSTIMAGE;
-			fprintf(stderr, "Recorded resolution for '%s'.\n", path);
+			fprintf_ln(stderr, _("Recorded resolution for '%s'."), path);
 			free_rerere_id(rr_item);
 			rr_item->util = NULL;
 			return;
@@ -786,9 +786,9 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 		if (rerere_autoupdate)
 			string_list_insert(update, path);
 		else
-			fprintf(stderr,
-				"Resolved '%s' using previous resolution.\n",
-				path);
+			fprintf_ln(stderr,
+				   _("Resolved '%s' using previous resolution."),
+				   path);
 		free_rerere_id(rr_item);
 		rr_item->util = NULL;
 		return;
@@ -802,11 +802,11 @@ static void do_rerere_one_path(struct string_list_item *rr_item,
 	if (id->collection->status[variant] & RR_HAS_POSTIMAGE) {
 		const char *path = rerere_path(id, "postimage");
 		if (unlink(path))
-			die_errno("cannot unlink stray '%s'", path);
+			die_errno(_("cannot unlink stray '%s'"), path);
 		id->collection->status[variant] &= ~RR_HAS_POSTIMAGE;
 	}
 	id->collection->status[variant] |= RR_HAS_PREIMAGE;
-	fprintf(stderr, "Recorded preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Recorded preimage for '%s'"), path);
 }
 
 static int do_plain_rerere(struct string_list *rr, int fd)
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
 		return rr_cache_exists;
 
 	if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-		die("could not create directory '%s'", git_path_rr_cache());
+		die(_("could not create directory '%s'"), git_path_rr_cache());
 	return 1;
 }
 
@@ -1031,7 +1031,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 */
 	ret = handle_cache(path, sha1, NULL);
 	if (ret < 1)
-		return error("could not parse conflict hunks in '%s'", path);
+		return error(_("could not parse conflict hunks in '%s'"), path);
 
 	/* Nuke the recorded resolution for the conflict */
 	id = new_rerere_id(sha1);
@@ -1049,7 +1049,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 		handle_cache(path, sha1, rerere_path(id, "thisimage"));
 		if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
 			free(cur.ptr);
-			error("failed to update conflicted state in '%s'", path);
+			error(_("failed to update conflicted state in '%s'"), path);
 			goto fail_exit;
 		}
 		cleanly_resolved = !try_merge(id, path, &cur, &result);
@@ -1060,16 +1060,16 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	}
 
 	if (id->collection->status_nr <= id->variant) {
-		error("no remembered resolution for '%s'", path);
+		error(_("no remembered resolution for '%s'"), path);
 		goto fail_exit;
 	}
 
 	filename = rerere_path(id, "postimage");
 	if (unlink(filename)) {
 		if (errno == ENOENT)
-			error("no remembered resolution for '%s'", path);
+			error(_("no remembered resolution for '%s'"), path);
 		else
-			error_errno("cannot unlink '%s'", filename);
+			error_errno(_("cannot unlink '%s'"), filename);
 		goto fail_exit;
 	}
 
@@ -1079,7 +1079,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	 * the postimage.
 	 */
 	handle_cache(path, sha1, rerere_path(id, "preimage"));
-	fprintf(stderr, "Updated preimage for '%s'\n", path);
+	fprintf_ln(stderr, _("Updated preimage for '%s'"), path);
 
 	/*
 	 * And remember that we can record resolution for this
@@ -1088,7 +1088,7 @@ static int rerere_forget_one_path(const char *path, struct string_list *rr)
 	item = string_list_insert(rr, path);
 	free_rerere_id(item);
 	item->util = id;
-	fprintf(stderr, "Forgot resolution for '%s'\n", path);
+	fprintf(stderr, _("Forgot resolution for '%s'\n"), path);
 	return 0;
 
 fail_exit:
@@ -1103,7 +1103,7 @@ int rerere_forget(struct pathspec *pathspec)
 	struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
 	if (read_cache() < 0)
-		return error("index file corrupt");
+		return error(_("index file corrupt"));
 
 	fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
 	if (fd < 0)
@@ -1191,7 +1191,7 @@ void rerere_gc(struct string_list *rr)
 	git_config(git_default_config, NULL);
 	dir = opendir(git_path("rr-cache"));
 	if (!dir)
-		die_errno("unable to open rr-cache directory");
+		die_errno(_("unable to open rr-cache directory"));
 	/* Collect stale conflict IDs ... */
 	while ((e = readdir(dir))) {
 		struct rerere_dir *rr_dir;
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 05/11] rerere: add documentation for conflict normalization
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (3 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved Thomas Gummerer
                       ` (5 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Add some documentation for the logic behind the conflict normalization
in rerere.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/rerere.txt | 140 +++++++++++++++++++++++++++++
 rerere.c                           |   4 -
 2 files changed, 140 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
new file mode 100644
index 0000000000..4102cce7aa
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,140 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are normalized by stripping
+the labels from the conflict markers, and removing extraneous
+information from the `diff3` conflict style. Branches that are merged
+in different order are normalized by sorting the conflict hunks.  More
+on each of those steps in the following sections.
+
+Once these two normalization operations are applied, a conflict ID is
+calculated based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Stripping extraneous information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2.  The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this is called "line A" in the rest of the document).  In
+branch AB this line is changed to "B", in AC, this line is changed to
+"C", and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Forking a branch ABAC off of branch AB and then merging AC into it, we
+get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    =======
+    C
+    >>>>>>> AC
+
+Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
+and then merging branch AC2 into it), using the diff3 conflict style,
+we get a conflict like the following:
+
+    <<<<<<< HEAD
+    B
+    ||||||| merged common ancestors
+    A
+    =======
+    C
+    >>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+    After examining what branches AB and AC did, I believe that making
+    line A into line D is the best thing to do that is compatible with
+    what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same.  To do this, the labels on the conflict
+markers are stripped, and the diff3 output is removed.  The above
+examples would both result in the following normalized conflict:
+
+    <<<<<<<
+    B
+    =======
+    C
+    >>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, lets imagine that a common ancestor had a file with line A
+its early part, and line X in its late part.  And then four branches
+are forked that do these things:
+
+    - AB: changes A to B
+    - AC: changes A to C
+    - XY: changes X to Y
+    - XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order.  The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+    After examining what branches AB and AC did, I believe that
+    making line A into line D is the best thing to do that is
+    compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Imagine that similarly previously a branch XYXZ was forked from XY,
+and XZ was merged into it, and resolved "X became Y or Z" into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean.  We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging ACXY and ABXZ would make "an
+early A became C or B, a late X became Y or Z".  We can see there are
+4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X").
+
+By sorting, the conflict is given its canonical name, namely, "an
+early part became B or C, a late part becames X or Y", and whenever
+any of these four patterns appear, and we can get to the same conflict
+and resolution that we saw earlier.
+
+Without the sorting, we'd have to somehow find a previous resolution
+from combinatorial explosion.
+
+Conflict ID calculation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the conflict normalization is done, the conflict ID is calculated
+as the sha1 hash of the conflict hunks appended to each other,
+separated by <NUL> characters.  The conflict markers are stripped out
+before the sha1 is calculated.  So in the example above, where we
+merge branch AC which changes line A to line C, into branch AB, which
+changes line A to line C, the conflict ID would be
+SHA1('B<NUL>C<NUL>').
+
+If there are multiple conflicts in one file, the sha1 is calculated
+the same way with all hunks appended to each other, in the order in
+which they appear in the file, separated by a <NUL> character.
diff --git a/rerere.c b/rerere.c
index be98c0afcb..da1ab54027 100644
--- a/rerere.c
+++ b/rerere.c
@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * and NUL concatenated together.
  *
  * Return the number of conflict hunks found.
- *
- * NEEDSWORK: the logic and theory of operation behind this conflict
- * normalization may deserve to be documented somewhere, perhaps in
- * Documentation/technical/rerere.txt.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (4 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict in a file, but
commits the file with the conflict markers, and later the file ends up
in a state in which rerere can't handle it, subsequent rerere
operations that are interested in that path, such as 'rerere clear' or
'rerere forget <path>' will fail, or even worse in the case of 'rerere
clear' segfault.

Such states include nested conflicts, or an extra conflict marker that
doesn't have any match.

This is because the first 'git rerere' when there was only one
conflict in the file leaves an entry in the MERGE_RR file behind.  The
next 'git rerere' will then pick the rerere ID for that file up, and
not assign a new ID as it can't successfully calculate one.  It will
however still try to do the rerere operation, because of the existing
ID.  As the handle_file function fails, it will remove the 'preimage'
for the ID in the process, while leaving the ID in the MERGE_RR file.

Now when 'rerere clear' for example is run, it will segfault in
'has_rerere_resolution', because status is NULL.

To fix this, remove the rerere ID from the MERGE_RR file in the case
when we can't handle it, and remove the corresponding variant from
.git/rr-cache/.  Removing it unconditionally is fine here, because if
the user would have resolved the conflict and ran rerere, the entry
would no longer be in the MERGE_RR file, so we wouldn't have this
problem in the first place, while if the conflict was not resolved,
the only thing that's left in the folder is the 'preimage', which by
itself will be regenerated by git if necessary, so the user won't
loose any work.

Note that other variants that have the same conflict ID will not be
touched.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c          | 12 +++++++-----
 t/t4200-rerere.sh | 22 ++++++++++++++++++++++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/rerere.c b/rerere.c
index da1ab54027..895ad80c0c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -823,10 +823,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret;
-
-		if (string_list_has_string(rr, path))
-			continue;
+		int ret, has_string;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -834,7 +831,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		if (ret < 1)
+		has_string = string_list_has_string(rr, path);
+		if (ret < 0 && has_string) {
+			remove_variant(string_list_lookup(rr, path)->util);
+			string_list_remove(rr, path, 1);
+		}
+		if (ret < 1 || has_string)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 8417e5a4b1..34f0518a5e 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -580,4 +580,26 @@ test_expect_success 'multiple identical conflicts' '
 	count_pre_post 0 0
 '
 
+test_expect_success 'rerere with unexpected conflict markers does not crash' '
+	git reset --hard &&
+
+	git checkout -b branch-1 master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+
+	git reset --hard &&
+	git checkout -b branch-2 master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge branch-1 &&
+	sed "s/bar/>>>>>>> a/" >test.tmp <test &&
+	mv test.tmp test &&
+	git rerere &&
+
+	git rerere clear
+'
+
 test_done
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 07/11] rerere: only return whether a path has conflicts or not
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (5 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 08/11] rerere: factor out handle_conflict function Thomas Gummerer
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

We currently return the exact number of conflict hunks a certain path
has from the 'handle_paths' function.  However all of its callers only
care whether there are conflicts or not or if there is an error.
Return only that information, and document that only that information
is returned.  This will simplify the code in the subsequent steps.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 895ad80c0c..bf803043e2 100644
--- a/rerere.c
+++ b/rerere.c
@@ -393,12 +393,13 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
  * one side of the conflict, NUL, the other side of the conflict,
  * and NUL concatenated together.
  *
- * Return the number of conflict hunks found.
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	enum {
 		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_CONTEXT;
@@ -426,7 +427,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 				goto bad;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
-			hunk_no++;
+			has_conflicts = 1;
 			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
@@ -462,7 +463,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
 		git_SHA1_Final(sha1, &ctx);
 	if (hunk != RR_CONTEXT)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -471,7 +472,7 @@ static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_siz
  */
 static int handle_file(const char *path, unsigned char *sha1, const char *output)
 {
-	int hunk_no = 0;
+	int has_conflicts = 0;
 	struct rerere_io_file io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -491,7 +492,7 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 		}
 	}
 
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 
 	fclose(io.input);
 	if (io.io.wrerror)
@@ -500,14 +501,14 @@ static int handle_file(const char *path, unsigned char *sha1, const char *output
 	if (io.io.output && fclose(io.io.output))
 		io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
-	if (hunk_no < 0) {
+	if (has_conflicts < 0) {
 		if (output)
 			unlink_or_warn(output);
 		return error(_("could not parse conflict hunks in '%s'"), path);
 	}
 	if (io.io.wrerror)
 		return -1;
-	return hunk_no;
+	return has_conflicts;
 }
 
 /*
@@ -954,7 +955,7 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	mmfile_t mmfile[3] = {{NULL}};
 	mmbuffer_t result = {NULL, 0};
 	const struct cache_entry *ce;
-	int pos, len, i, hunk_no;
+	int pos, len, i, has_conflicts;
 	struct rerere_io_mem io;
 	int marker_size = ll_merge_marker_size(path);
 
@@ -1008,11 +1009,11 @@ static int handle_cache(const char *path, unsigned char *sha1, const char *outpu
 	 * Grab the conflict ID and optionally write the original
 	 * contents with conflict markers out.
 	 */
-	hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+	has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 	strbuf_release(&io.input);
 	if (io.io.output)
 		fclose(io.io.output);
-	return hunk_no;
+	return has_conflicts;
 }
 
 static int rerere_forget_one_path(const char *path, struct string_list *rr)
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 08/11] rerere: factor out handle_conflict function
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (6 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 09/11] rerere: return strbuf from handle path Thomas Gummerer
                       ` (2 subsequent siblings)
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Factor out the handle_conflict function, which handles a single
conflict in a path.  This is in preparation for a subsequent commit,
where this function will be re-used.  No functional changes intended.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 87 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 47 insertions(+), 40 deletions(-)

diff --git a/rerere.c b/rerere.c
index bf803043e2..2d62251943 100644
--- a/rerere.c
+++ b/rerere.c
@@ -384,85 +384,92 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-/*
- * Read contents a file with conflicts, normalize the conflicts
- * by (1) discarding the common ancestor version in diff3-style,
- * (2) reordering our side and their side so that whichever sorts
- * alphabetically earlier comes before the other one, while
- * computing the "conflict ID", which is just an SHA-1 hash of
- * one side of the conflict, NUL, the other side of the conflict,
- * and NUL concatenated together.
- *
- * Return 1 if conflict hunks are found, 0 if there are no conflict
- * hunks and -1 if an error occured.
- */
-static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *ctx)
 {
-	git_SHA_CTX ctx;
-	int has_conflicts = 0;
 	enum {
-		RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
-	} hunk = RR_CONTEXT;
+		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
+	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
 	struct strbuf buf = STRBUF_INIT;
-
-	if (sha1)
-		git_SHA1_Init(&ctx);
+	int has_conflicts = -1;
 
 	while (!io->getline(&buf, io)) {
 		if (is_cmarker(buf.buf, '<', marker_size)) {
-			if (hunk != RR_CONTEXT)
-				goto bad;
-			hunk = RR_SIDE_1;
+			break;
 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
-				goto bad;
+				break;
 			hunk = RR_ORIGINAL;
 		} else if (is_cmarker(buf.buf, '=', marker_size)) {
 			if (hunk != RR_SIDE_1 && hunk != RR_ORIGINAL)
-				goto bad;
+				break;
 			hunk = RR_SIDE_2;
 		} else if (is_cmarker(buf.buf, '>', marker_size)) {
 			if (hunk != RR_SIDE_2)
-				goto bad;
+				break;
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
 			has_conflicts = 1;
-			hunk = RR_CONTEXT;
 			rerere_io_putconflict('<', marker_size, io);
 			rerere_io_putmem(one.buf, one.len, io);
 			rerere_io_putconflict('=', marker_size, io);
 			rerere_io_putmem(two.buf, two.len, io);
 			rerere_io_putconflict('>', marker_size, io);
-			if (sha1) {
-				git_SHA1_Update(&ctx, one.buf ? one.buf : "",
+			if (ctx) {
+				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
-				git_SHA1_Update(&ctx, two.buf ? two.buf : "",
+				git_SHA1_Update(ctx, two.buf ? two.buf : "",
 					    two.len + 1);
 			}
-			strbuf_reset(&one);
-			strbuf_reset(&two);
+			break;
 		} else if (hunk == RR_SIDE_1)
 			strbuf_addbuf(&one, &buf);
 		else if (hunk == RR_ORIGINAL)
 			; /* discard */
 		else if (hunk == RR_SIDE_2)
 			strbuf_addbuf(&two, &buf);
-		else
-			rerere_io_putstr(buf.buf, io);
-		continue;
-	bad:
-		hunk = 99; /* force error exit */
-		break;
 	}
 	strbuf_release(&one);
 	strbuf_release(&two);
 	strbuf_release(&buf);
 
+	return has_conflicts;
+}
+
+/*
+ * Read contents a file with conflicts, normalize the conflicts
+ * by (1) discarding the common ancestor version in diff3-style,
+ * (2) reordering our side and their side so that whichever sorts
+ * alphabetically earlier comes before the other one, while
+ * computing the "conflict ID", which is just an SHA-1 hash of
+ * one side of the conflict, NUL, the other side of the conflict,
+ * and NUL concatenated together.
+ *
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
+ */
+static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
+{
+	git_SHA_CTX ctx;
+	struct strbuf buf = STRBUF_INIT;
+	int has_conflicts = 0;
+	if (sha1)
+		git_SHA1_Init(&ctx);
+
+	while (!io->getline(&buf, io)) {
+		if (is_cmarker(buf.buf, '<', marker_size)) {
+			has_conflicts = handle_conflict(io, marker_size,
+							sha1 ? &ctx : NULL);
+			if (has_conflicts < 0)
+				break;
+		} else
+			rerere_io_putstr(buf.buf, io);
+	}
+	strbuf_release(&buf);
+
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-	if (hunk != RR_CONTEXT)
-		return -1;
+
 	return has_conflicts;
 }
 
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 09/11] rerere: return strbuf from handle path
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (7 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 08/11] rerere: factor out handle_conflict function Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently we write the conflict to disk directly in the handle_path
function.  To make it re-usable for nested conflicts, instead of
writing the conflict out directly, store it in a strbuf and let the
caller write it out.

This does mean some slight increase in memory usage, however that
increase is limited to the size of the largest conflict we've
currently processed.  We already keep one copy of the conflict in
memory, and it shouldn't be too large, so the increase in memory usage
seems acceptable.

As a bonus this lets us get replace the rerere_io_putconflict function
with a trivial two line function.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c | 58 ++++++++++++++++++--------------------------------------
 1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/rerere.c b/rerere.c
index 2d62251943..a35b88916c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -302,38 +302,6 @@ static void rerere_io_putstr(const char *str, struct rerere_io *io)
 		ferr_puts(str, io->output, &io->wrerror);
 }
 
-/*
- * Write a conflict marker to io->output (if defined).
- */
-static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
-{
-	char buf[64];
-
-	while (size) {
-		if (size <= sizeof(buf) - 2) {
-			memset(buf, ch, size);
-			buf[size] = '\n';
-			buf[size + 1] = '\0';
-			size = 0;
-		} else {
-			int sz = sizeof(buf) - 1;
-
-			/*
-			 * Make sure we will not write everything out
-			 * in this round by leaving at least 1 byte
-			 * for the next round, giving the next round
-			 * a chance to add the terminating LF.  Yuck.
-			 */
-			if (size <= sz)
-				sz -= (sz - size) + 1;
-			memset(buf, ch, sz);
-			buf[sz] = '\0';
-			size -= sz;
-		}
-		rerere_io_putstr(buf, io);
-	}
-}
-
 static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
 {
 	if (io->output)
@@ -384,7 +352,14 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 	return isspace(*buf);
 }
 
-static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *ctx)
+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
+{
+	strbuf_addchars(buf, ch, size);
+	strbuf_addch(buf, '\n');
+}
+
+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
+			   int marker_size, git_SHA_CTX *ctx)
 {
 	enum {
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
@@ -410,11 +385,11 @@ static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *c
 			if (strbuf_cmp(&one, &two) > 0)
 				strbuf_swap(&one, &two);
 			has_conflicts = 1;
-			rerere_io_putconflict('<', marker_size, io);
-			rerere_io_putmem(one.buf, one.len, io);
-			rerere_io_putconflict('=', marker_size, io);
-			rerere_io_putmem(two.buf, two.len, io);
-			rerere_io_putconflict('>', marker_size, io);
+			rerere_strbuf_putconflict(out, '<', marker_size);
+			strbuf_addbuf(out, &one);
+			rerere_strbuf_putconflict(out, '=', marker_size);
+			strbuf_addbuf(out, &two);
+			rerere_strbuf_putconflict(out, '>', marker_size);
 			if (ctx) {
 				git_SHA1_Update(ctx, one.buf ? one.buf : "",
 					    one.len + 1);
@@ -451,21 +426,24 @@ static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX *c
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
 {
 	git_SHA_CTX ctx;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, out = STRBUF_INIT;
 	int has_conflicts = 0;
 	if (sha1)
 		git_SHA1_Init(&ctx);
 
 	while (!io->getline(&buf, io)) {
 		if (is_cmarker(buf.buf, '<', marker_size)) {
-			has_conflicts = handle_conflict(io, marker_size,
+			has_conflicts = handle_conflict(&out, io, marker_size,
 							sha1 ? &ctx : NULL);
 			if (has_conflicts < 0)
 				break;
+			rerere_io_putmem(out.buf, out.len, io);
+			strbuf_reset(&out);
 		} else
 			rerere_io_putstr(buf.buf, io);
 	}
 	strbuf_release(&buf);
+	strbuf_release(&out);
 
 	if (sha1)
 		git_SHA1_Final(sha1, &ctx);
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (8 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 09/11] rerere: return strbuf from handle path Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  2018-07-14 21:44     ` [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently rerere can't handle nested conflicts and will error out when
it encounters such conflicts.  Do that by recursively calling the
'handle_conflict' function to normalize the conflict.

The conflict ID calculation here deserves some explanation:

As we are using the same handle_conflict function, the nested conflict
is normalized the same way as for non-nested conflicts, which means
the ancestor in the diff3 case is stripped out, and the parts of the
conflict are ordered alphabetically.

The conflict ID is however is only calculated in the top level
handle_conflict call, so it will include the markers that 'rerere'
adds to the output.  e.g. say there's the following conflict:

    <<<<<<< HEAD
    1
    =======
    <<<<<<< HEAD
    3
    =======
    2
    >>>>>>> branch-2
    >>>>>>> branch-3~

it would be recorde as follows in the preimage:

    <<<<<<<
    1
    =======
    <<<<<<<
    2
    =======
    3
    >>>>>>>
    >>>>>>>

and the conflict ID would be calculated as

    sha1(1<NUL><<<<<<<
    2
    =======
    3
    >>>>>>><NUL>)

Stripping out vs. leaving the conflict markers in place in the inner
conflict should have no practical impact, but it simplifies the
implementation.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/rerere.txt | 42 ++++++++++++++++++++++++++++++
 rerere.c                           | 10 +++++--
 t/t4200-rerere.sh                  | 37 ++++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/rerere.txt b/Documentation/technical/rerere.txt
index 4102cce7aa..60d48dc4fe 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -138,3 +138,45 @@ SHA1('B<NUL>C<NUL>').
 If there are multiple conflicts in one file, the sha1 is calculated
 the same way with all hunks appended to each other, in the order in
 which they appear in the file, separated by a <NUL> character.
+
+Nested conflicts
+~~~~~~~~~~~~~~~~
+
+Nested conflicts are handled very similarly to "simple" conflicts.
+Similar to simple conflicts, the conflict is first normalized by
+stripping the labels from conflict markers, stripping the diff3
+output, and the sorting the conflict hunks, both for the outer and the
+inner conflict.  This is done recursively, so any number of nested
+conflicts can be handled.
+
+The only difference is in how the conflict ID is calculated.  For the
+inner conflict, the conflict markers themselves are not stripped out
+before calculating the sha1.
+
+Say we have the following conflict for example:
+
+    <<<<<<< HEAD
+    1
+    =======
+    <<<<<<< HEAD
+    3
+    =======
+    2
+    >>>>>>> branch-2
+    >>>>>>> branch-3~
+
+After stripping out the labels of the conflict markers, and sorting
+the hunks, the conflict would look as follows:
+
+    <<<<<<<
+    1
+    =======
+    <<<<<<<
+    2
+    =======
+    3
+    >>>>>>>
+    >>>>>>>
+
+and finally the conflict ID would be calculated as:
+`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
diff --git a/rerere.c b/rerere.c
index a35b88916c..f78bef80b1 100644
--- a/rerere.c
+++ b/rerere.c
@@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct rerere_io *io,
 		RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
 	} hunk = RR_SIDE_1;
 	struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
-	struct strbuf buf = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
 	int has_conflicts = -1;
 
 	while (!io->getline(&buf, io)) {
 		if (is_cmarker(buf.buf, '<', marker_size)) {
-			break;
+			if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
+				break;
+			if (hunk == RR_SIDE_1)
+				strbuf_addbuf(&one, &conflict);
+			else
+				strbuf_addbuf(&two, &conflict);
+			strbuf_release(&conflict);
 		} else if (is_cmarker(buf.buf, '|', marker_size)) {
 			if (hunk != RR_SIDE_1)
 				break;
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 34f0518a5e..d63fe2b33b 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -602,4 +602,41 @@ test_expect_success 'rerere with unexpected conflict markers does not crash' '
 	git rerere clear
 '
 
+test_expect_success 'rerere with inner conflict markers' '
+	git reset --hard &&
+
+	git checkout -b A master &&
+	echo "bar" >test &&
+	git add test &&
+	git commit -q -m two &&
+	echo "baz" >test &&
+	git add test &&
+	git commit -q -m three &&
+
+	git reset --hard &&
+	git checkout -b B master &&
+	echo "foo" >test &&
+	git add test &&
+	git commit -q -a -m one &&
+
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+
+	echo "resolved" >test &&
+	git add test &&
+	git commit -q -m "solved conflict" &&
+
+	echo "resolved" >expect &&
+
+	git reset --hard HEAD~~ &&
+	test_must_fail git merge A~ &&
+	git add test &&
+	git commit -q -m "will solve conflicts later" &&
+	test_must_fail git merge A &&
+	cat test >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed
  2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
                       ` (9 preceding siblings ...)
  2018-07-14 21:44     ` [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
@ 2018-07-14 21:44     ` Thomas Gummerer
  10 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-14 21:44 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Thomas Gummerer

Currently when a user doesn't resolve a conflict, commits the results,
and does an operation which creates another conflict, rerere will use
the ID of the previously unresolved conflict for the new conflict.
This is because the conflict is kept in the MERGE_RR file, which
'rerere' reads every time it is invoked.

After the new conflict is solved, rerere will record the resolution
with the ID of the old conflict.  So in order to replay the conflict,
both merges would have to be re-done, instead of just the last one, in
order for rerere to be able to automatically resolve the conflict.

Instead of that, assign a new conflict ID if there are still conflicts
in a file and the file had conflicts at a previous step.  This ID
matches the conflict we actually resolved at the corresponding step.

Note that there are no backwards compatibility worries here, as rerere
would have failed to even normalize the conflict before this patch
series.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 rerere.c          | 7 +++----
 t/t4200-rerere.sh | 7 +++++++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/rerere.c b/rerere.c
index f78bef80b1..dd81d09e19 100644
--- a/rerere.c
+++ b/rerere.c
@@ -815,7 +815,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		struct rerere_id *id;
 		unsigned char sha1[20];
 		const char *path = conflict.items[i].string;
-		int ret, has_string;
+		int ret;
 
 		/*
 		 * Ask handle_file() to scan and assign a
@@ -823,12 +823,11 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 		 * yet.
 		 */
 		ret = handle_file(path, sha1, NULL);
-		has_string = string_list_has_string(rr, path);
-		if (ret < 0 && has_string) {
+		if (ret != 0 && string_list_has_string(rr, path)) {
 			remove_variant(string_list_lookup(rr, path)->util);
 			string_list_remove(rr, path, 1);
 		}
-		if (ret < 1 || has_string)
+		if (ret < 1)
 			continue;
 
 		id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index d63fe2b33b..bfb37ed4fc 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -636,6 +636,13 @@ test_expect_success 'rerere with inner conflict markers' '
 	git commit -q -m "will solve conflicts later" &&
 	test_must_fail git merge A &&
 	cat test >actual &&
+	test_cmp expect actual &&
+
+	git add test &&
+	git commit -m "rerere solved conflict" &&
+	git reset --hard HEAD~ &&
+	test_must_fail git merge A &&
+	cat test >actual &&
 	test_cmp expect actual
 '
 
-- 
2.17.0.410.g65aef3a6c4


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 04/11] rerere: mark strings for translation
  2018-07-14 21:44     ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
@ 2018-07-15 13:24       ` Simon Ruderich
  2018-07-16 20:40         ` Thomas Gummerer
  0 siblings, 1 reply; 46+ messages in thread
From: Simon Ruderich @ 2018-07-15 13:24 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: git, Junio C Hamano

On Sat, Jul 14, 2018 at 10:44:36PM +0100, Thomas Gummerer wrote:
> 'git rerere' is considered a plumbing command and as such its output

s/plumbing/porcelain/?

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3 04/11] rerere: mark strings for translation
  2018-07-15 13:24       ` Simon Ruderich
@ 2018-07-16 20:40         ` Thomas Gummerer
  0 siblings, 0 replies; 46+ messages in thread
From: Thomas Gummerer @ 2018-07-16 20:40 UTC (permalink / raw)
  To: Simon Ruderich; +Cc: git, Junio C Hamano

On 07/15, Simon Ruderich wrote:
> On Sat, Jul 14, 2018 at 10:44:36PM +0100, Thomas Gummerer wrote:
> > 'git rerere' is considered a plumbing command and as such its output
> 
> s/plumbing/porcelain/?

Ah yes indeed.  Thanks for catching!

> Regards
> Simon
> -- 
> + privacy is necessary
> + using gnupg http://gnupg.org
> + public key id: 0x92FEFDB7E44C32F9

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, back to index

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-20 21:12 [RFC/PATCH 0/7] rerere: handle nested conflicts Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 1/7] rerere: unify error message when read_cache fails Thomas Gummerer
2018-05-21 19:00   ` Stefan Beller
2018-05-20 21:12 ` [RFC/PATCH 2/7] rerere: mark strings for translation Thomas Gummerer
2018-05-24  7:20   ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 3/7] rerere: add some documentation Thomas Gummerer
2018-05-24  9:20   ` Junio C Hamano
2018-06-03 11:41     ` Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 4/7] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-05-24  9:47   ` Junio C Hamano
2018-05-24 18:54     ` Thomas Gummerer
2018-05-25  1:20       ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 5/7] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-05-24 10:02   ` Junio C Hamano
2018-05-20 21:12 ` [RFC/PATCH 6/7] rerere: factor out handle_conflict function Thomas Gummerer
2018-05-20 21:12 ` [RFC/PATCH 7/7] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-05-24 10:21   ` Junio C Hamano
2018-05-24 19:07     ` Thomas Gummerer
2018-06-05 21:52 ` [PATCH v2 00/10] rerere: " Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 01/10] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 02/10] rerere: lowercase error messages Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 03/10] rerere: wrap paths in output in sq Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 04/10] rerere: mark strings for translation Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 05/10] rerere: add some documentation Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 06/10] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 07/10] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 08/10] rerere: factor out handle_conflict function Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 09/10] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-06-05 21:52   ` [PATCH v2 10/10] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer
2018-07-03 21:05   ` [PATCH v2 00/10] rerere: handle nested conflicts Thomas Gummerer
2018-07-06 17:56     ` Junio C Hamano
2018-07-10 21:37       ` Thomas Gummerer
2018-07-14 21:44   ` [PATCH v3 00/11] " Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 01/11] rerere: unify error messages when read_cache fails Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 02/11] rerere: lowercase error messages Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 03/11] rerere: wrap paths in output in sq Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 04/11] rerere: mark strings for translation Thomas Gummerer
2018-07-15 13:24       ` Simon Ruderich
2018-07-16 20:40         ` Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 05/11] rerere: add documentation for conflict normalization Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 07/11] rerere: only return whether a path has conflicts or not Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 08/11] rerere: factor out handle_conflict function Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 09/11] rerere: return strbuf from handle path Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts Thomas Gummerer
2018-07-14 21:44     ` [PATCH v3 11/11] rerere: recalculate conflict ID when unresolved conflict is committed Thomas Gummerer

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox