git@vger.kernel.org mailing list mirror (one of many)
* [RFC PATCH 00/19] Diff machine: highlight moved lines.
@ 2017-05-14  4:00 Stefan Beller
  2017-05-14  4:00 ` [PATCH 01/19] diff: readability fix Stefan Beller
                   ` (20 more replies)
  0 siblings, 21 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:00 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

For details on *why* see the commit message of the last commit.

The first five patches are slight refactorings to get the code into good
shape; the next patches funnel all output through emit_line_*.

The second-to-last patch introduces an option to buffer up all output
before printing, and then the last patch can color moved lines
of code.

Any feedback welcome.

Thanks,
Stefan 


Stefan Beller (19):
  diff: readability fix
  diff: move line ending check into emit_hunk_header
  diff.c: drop 'nofirst' from emit_line_0
  diff.c: factor out diff_flush_patch_all_file_pairs
  diff.c: emit_line_0 can handle no color setting
  diff: add emit_line_fmt
  diff.c: convert fn_out_consume to use emit_line_*
  diff.c: convert builtin_diff to use emit_line_*
  diff.c: convert emit_rewrite_diff to use emit_line_*
  diff.c: convert emit_rewrite_lines to use emit_line_*
  submodule.c: convert show_submodule_summary to use emit_line_fmt
  diff.c: convert emit_binary_diff_body to use emit_line_*
  diff.c: convert show_stats to use emit_line_*
  diff.c: convert word diffing to use emit_line_*
  diff.c: convert diff_flush to use emit_line_*
  diff.c: convert diff_summary to use emit_line_*
  diff.c: factor out emit_line_ws for coloring whitespaces
  diff: buffer all output if asked to
  diff.c: color moved lines differently

 Documentation/config.txt   |  12 +-
 diff.c                     | 815 ++++++++++++++++++++++++++++++++++-----------
 diff.h                     |  69 +++-
 submodule.c                |  78 ++---
 submodule.h                |   9 +-
 t/t4015-diff-whitespace.sh | 229 +++++++++++++
 6 files changed, 960 insertions(+), 252 deletions(-)

-- 
2.13.0.18.g183880de0a
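
To illustrate the two-pass idea from the cover letter (buffer all lines
first, then decide which ones were moved), here is a minimal standalone
sketch. It is not the algorithm from this series: struct buffered_line
and mark_moved_lines() are hypothetical names, and the matching rule
(exact text appearing once as '-' and once as '+') is an assumption
chosen only to show why a single pass is not enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one buffered diff line (not a struct from diff.c). */
struct buffered_line {
	char sign;        /* '+', '-' or ' ' */
	const char *text; /* line content without the sign */
	int moved;        /* filled in by the second pass */
};

/*
 * Second pass: a '+' line counts as moved if the same text was buffered
 * as a '-' line, and vice versa. Quadratic to keep the sketch short.
 * Returns the number of lines marked.
 */
static int mark_moved_lines(struct buffered_line *lines, size_t nr)
{
	size_t i, j;
	int marked = 0;

	assert(lines || !nr);
	for (i = 0; i < nr; i++) {
		char want;

		if (lines[i].sign == '+')
			want = '-';
		else if (lines[i].sign == '-')
			want = '+';
		else
			continue; /* context lines are never moved */

		for (j = 0; j < nr; j++) {
			if (lines[j].sign == want &&
			    !strcmp(lines[j].text, lines[i].text)) {
				lines[i].moved = 1;
				marked++;
				break;
			}
		}
	}
	return marked;
}
```

The '-' line at the top of a diff can only be marked once the matching
'+' line further down has been seen, which is why the output has to be
buffered before any color is chosen.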



* [PATCH 01/19] diff: readability fix
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
@ 2017-05-14  4:00 ` Stefan Beller
  2017-05-14  4:01 ` [PATCH 02/19] diff: move line ending check into emit_hunk_header Stefan Beller
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:00 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

We already have dereferenced 'p->two' into a local variable 'two'. Use
that.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 74283d9001..3f5bf8b5a4 100644
--- a/diff.c
+++ b/diff.c
@@ -3283,8 +3283,8 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
 	const char *other;
 	const char *attr_path;
 
-	name  = p->one->path;
-	other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+	name  = one->path;
+	other = (strcmp(name, two->path) ? two->path : NULL);
 	attr_path = name;
 	if (o->prefix_length)
 		strip_prefix(o->prefix_length, &name, &other);
-- 
2.13.0.18.g183880de0a



* [PATCH 02/19] diff: move line ending check into emit_hunk_header
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
  2017-05-14  4:00 ` [PATCH 01/19] diff: readability fix Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-15  6:48   ` Junio C Hamano
  2017-05-14  4:01 ` [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0 Stefan Beller
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
route all output such that the only emitting function is
emit_line_0.

This patch moves code that is conceptually part of
emit_hunk_header, but was emitting its output from fn_out_consume,
back to emit_hunk_header.

While at it, simplify the check by using strbuf_complete_line(),
which is designed for exactly this.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3f5bf8b5a4..c2ed605cd0 100644
--- a/diff.c
+++ b/diff.c
@@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	}
 
 	strbuf_add(&msgbuf, line + len, org_len - len);
+	strbuf_complete_line(&msgbuf);
+
 	emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
@@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		len = sane_truncate_line(ecbdata, line, len);
 		find_lno(line, ecbdata);
 		emit_hunk_header(ecbdata, line, len);
-		if (line[len-1] != '\n')
-			putc('\n', o->file);
 		return;
 	}
 
-- 
2.13.0.18.g183880de0a
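
The line-completion step that this patch moves into emit_hunk_header
can be sketched in isolation as follows. This is a simplified stand-in
that works on a fixed-size buffer; complete_line() is a hypothetical
name, and the behavior shown (append '\n' only to a non-empty buffer
that lacks one) is what strbuf_complete_line() is understood to do.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for strbuf_complete_line() on a plain buffer:
 * 'buf' holds 'len' bytes and has room for 'cap' bytes in total.
 * Appends '\n' unless the buffer is empty or already ends in one.
 * Returns the new length; the result is always NUL-terminated.
 */
static size_t complete_line(char *buf, size_t len, size_t cap)
{
	assert(len + 1 < cap); /* room for '\n' and the NUL */
	if (len && buf[len - 1] != '\n')
		buf[len++] = '\n';
	buf[len] = '\0';
	return len;
}
```

Doing this on the buffered hunk header before it is emitted means the
newline travels through the same emit path as the rest of the line,
instead of being written to the file separately.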



* [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
  2017-05-14  4:00 ` [PATCH 01/19] diff: readability fix Stefan Beller
  2017-05-14  4:01 ` [PATCH 02/19] diff: move line ending check into emit_hunk_header Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-15 18:26   ` Jonathan Tan
  2017-05-15 19:22   ` Brandon Williams
  2017-05-14  4:01 ` [PATCH 04/19] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
                   ` (17 subsequent siblings)
  20 siblings, 2 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In 250f79930d (diff.c: split emit_line() from the first char and the rest
of the line, 2009-09-14) we introduced the local variable 'nofirst' that
indicates if we have no first sign character. With the given implementation
we had to use an extra variable rather than reusing 'first' because the
line's first character could be '\0'.

Change the meaning of the 'first' argument to not mean the first character
of the line, but rather just the sign that is prepended to the
line. Refactor emit_line to not split off the line's first character, but
to pass the complete line as well as a '\0' sign, which now serves as an
indication not to print a sign.

With this patch the other callers hard-code the sign (which is one of '+',
'-', ' ' and '\\') such that we do not run into unexpectedly emitting an
erroneous '\0'.

An audit of the callers revealed that the sign cannot be '\n' or '\r',
so remove that condition for trailing newline or carriage return in the
sign; the else part of the condition handles len == 0 perfectly,
so we can drop the if/else construct.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 40 +++++++++++++++++-----------------------
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/diff.c b/diff.c
index c2ed605cd0..4269b8dccf 100644
--- a/diff.c
+++ b/diff.c
@@ -517,33 +517,24 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 }
 
 static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
-			int first, const char *line, int len)
+			int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
-	int nofirst;
 	FILE *file = o->file;
 
 	fputs(diff_line_prefix(o), file);
 
-	if (len == 0) {
-		has_trailing_newline = (first == '\n');
-		has_trailing_carriage_return = (!has_trailing_newline &&
-						(first == '\r'));
-		nofirst = has_trailing_newline || has_trailing_carriage_return;
-	} else {
-		has_trailing_newline = (len > 0 && line[len-1] == '\n');
-		if (has_trailing_newline)
-			len--;
-		has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
-		if (has_trailing_carriage_return)
-			len--;
-		nofirst = 0;
-	}
+	has_trailing_newline = (len > 0 && line[len-1] == '\n');
+	if (has_trailing_newline)
+		len--;
+	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
+	if (has_trailing_carriage_return)
+		len--;
 
-	if (len || !nofirst) {
+	if (len || sign) {
 		fputs(set, file);
-		if (!nofirst)
-			fputc(first, file);
+		if (sign)
+			fputc(sign, file);
 		fwrite(line, len, 1, file);
 		fputs(reset, file);
 	}
@@ -556,7 +547,7 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 static void emit_line(struct diff_options *o, const char *set, const char *reset,
 		      const char *line, int len)
 {
-	emit_line_0(o, set, reset, line[0], line+1, len-1);
+	emit_line_0(o, set, reset, 0, line, len);
 }
 
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
@@ -4822,9 +4813,12 @@ void diff_flush(struct diff_options *options)
 
 	if (output_format & DIFF_FORMAT_PATCH) {
 		if (separator) {
-			fprintf(options->file, "%s%c",
-				diff_line_prefix(options),
-				options->line_termination);
+			char term[2];
+			term[0] = options->line_termination;
+			term[1] = '\0';
+
+			emit_line(options, NULL, NULL,
+				  term, 1);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
 				fputs(options->stat_sep, options->file);
-- 
2.13.0.18.g183880de0a
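
The new meaning of the sign argument can be tried out with a standalone
sketch of the post-patch logic. It writes to a plain FILE * instead of
a struct diff_options, omits the line prefix, and additionally
tolerates NULL colors (which the series only adds in patch 05); the
sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of emit_line_0 after this patch: 'sign' is printed before the
 * line unless it is 0, and a trailing "\r\n" or "\n" is kept outside
 * of the colored region.
 */
static void sketch_emit_line_0(FILE *file, const char *set, const char *reset,
			       int sign, const char *line, int len)
{
	int has_trailing_newline, has_trailing_carriage_return;

	assert(file && line && len >= 0);

	has_trailing_newline = (len > 0 && line[len - 1] == '\n');
	if (has_trailing_newline)
		len--;
	has_trailing_carriage_return = (len > 0 && line[len - 1] == '\r');
	if (has_trailing_carriage_return)
		len--;

	if (len || sign) {
		if (set)
			fputs(set, file);
		if (sign)
			fputc(sign, file);
		fwrite(line, 1, (size_t)len, file);
		if (reset)
			fputs(reset, file);
	}
	if (has_trailing_carriage_return)
		fputc('\r', file);
	if (has_trailing_newline)
		fputc('\n', file);
}

/* Counterpart of emit_line(): the whole line is passed, with no sign. */
static void sketch_emit_line(FILE *file, const char *set, const char *reset,
			     const char *line)
{
	sketch_emit_line_0(file, set, reset, 0, line, (int)strlen(line));
}
```

A caller that wants an added line passes '+' as the sign and the line
without its first character; a caller that just wants a bare line
terminator passes a sign of 0 and "\n", which emits no color codes at
all.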



* [PATCH 04/19] diff.c: factor out diff_flush_patch_all_file_pairs
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (2 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0 Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-14  4:01 ` [PATCH 05/19] diff.c: emit_line_0 can handle no color setting Stefan Beller
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch we want to do more things before and after all filepairs
are flushed. So factor flushing out all file pairs into its own function,
so that the new code can be plugged in easily.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 4269b8dccf..381b572d76 100644
--- a/diff.c
+++ b/diff.c
@@ -4728,6 +4728,17 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc)
 		warning(_(rename_limit_advice), varname, needed);
 }
 
+static void diff_flush_patch_all_file_pairs(struct diff_options *o)
+{
+	int i;
+	struct diff_queue_struct *q = &diff_queued_diff;
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		if (check_pair_status(p))
+			diff_flush_patch(p, o);
+	}
+}
+
 void diff_flush(struct diff_options *options)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
@@ -4825,11 +4836,7 @@ void diff_flush(struct diff_options *options)
 			}
 		}
 
-		for (i = 0; i < q->nr; i++) {
-			struct diff_filepair *p = q->queue[i];
-			if (check_pair_status(p))
-				diff_flush_patch(p, options);
-		}
+		diff_flush_patch_all_file_pairs(options);
 	}
 
 	if (output_format & DIFF_FORMAT_CALLBACK)
-- 
2.13.0.18.g183880de0a



* [PATCH 05/19] diff.c: emit_line_0 can handle no color setting
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (3 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 04/19] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-15 18:31   ` Jonathan Tan
  2017-05-14  4:01 ` [PATCH 06/19] diff: add emit_line_fmt Stefan Beller
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
route all output such that the only emitting function is
emit_line_0.

In later patches we may pass lines that are not colored to
the central function emit_line_0, so we need to emit the
color only when it is non-NULL.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 381b572d76..48f0fb98dc 100644
--- a/diff.c
+++ b/diff.c
@@ -532,11 +532,13 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 		len--;
 
 	if (len || sign) {
-		fputs(set, file);
+		if (set)
+			fputs(set, file);
 		if (sign)
 			fputc(sign, file);
 		fwrite(line, len, 1, file);
-		fputs(reset, file);
+		if (reset)
+			fputs(reset, file);
 	}
 	if (has_trailing_carriage_return)
 		fputc('\r', file);
-- 
2.13.0.18.g183880de0a



* [PATCH 06/19] diff: add emit_line_fmt
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (4 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 05/19] diff.c: emit_line_0 can handle no color setting Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-15 19:31   ` Brandon Williams
  2017-05-14  4:01 ` [PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_* Stefan Beller
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In the following patches we'll convert all printing functions to use
the emit_line_* family of functions.

Many of the printing functions to be converted take a format string. So
offer a formatted function in the emit_line function family as well.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/diff.c b/diff.c
index 48f0fb98dc..aef159a919 100644
--- a/diff.c
+++ b/diff.c
@@ -552,6 +552,20 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset
 	emit_line_0(o, set, reset, 0, line, len);
 }
 
+static void emit_line_fmt(struct diff_options *o,
+			  const char *set, const char *reset,
+			  const char *fmt, ...)
+{
+	struct strbuf sb = STRBUF_INIT;
+	va_list ap;
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	va_end(ap);
+
+	emit_line(o, set, reset, sb.buf, sb.len);
+	strbuf_release(&sb);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
-- 
2.13.0.18.g183880de0a
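
The pattern used by emit_line_fmt (format into a temporary buffer, then
hand the finished line to the one emitting function) can be sketched
without strbuf as follows. sketch_emit_line_fmt() is a hypothetical
name; the buffer is sized with a measuring vsnprintf() call in place of
strbuf_vaddf(), and the emit step is reduced to a few writes on a
FILE *.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format the arguments into a heap buffer, then emit the result as one
 * line with optional color codes around it. A trailing '\n' is written
 * after the reset code. Returns the formatted length, or -1 on error.
 */
static int sketch_emit_line_fmt(FILE *file, const char *set, const char *reset,
				const char *fmt, ...)
{
	va_list ap, ap2;
	char *buf;
	size_t n;
	int len, has_newline;

	assert(file && fmt);

	va_start(ap, fmt);
	va_copy(ap2, ap);
	len = vsnprintf(NULL, 0, fmt, ap); /* first call only measures */
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return -1;
	}

	buf = malloc((size_t)len + 1);
	if (!buf) {
		va_end(ap2);
		return -1;
	}
	vsnprintf(buf, (size_t)len + 1, fmt, ap2);
	va_end(ap2);

	n = (size_t)len;
	has_newline = (n > 0 && buf[n - 1] == '\n');
	if (has_newline)
		n--;

	if (set)
		fputs(set, file);
	fwrite(buf, 1, n, file);
	if (reset)
		fputs(reset, file);
	if (has_newline)
		fputc('\n', file);

	free(buf);
	return len;
}
```

Because the whole line exists in memory before anything is written, a
later change can store it in a buffer instead of printing it, which is
what the two-pass approach needs.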



* [PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_*
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (5 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 06/19] diff: add emit_line_fmt Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-16  1:00   ` Junio C Hamano
  2017-05-14  4:01 ` [PATCH 08/19] diff.c: convert builtin_diff " Stefan Beller
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
route all output such that the only emitting function is
emit_line_0.

This covers the parts of fn_out_consume.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index aef159a919..93343a9ccc 100644
--- a/diff.c
+++ b/diff.c
@@ -1289,7 +1289,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT);
 	const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
 	struct diff_options *o = ecbdata->opt;
-	const char *line_prefix = diff_line_prefix(o);
 
 	o->found_changes = 1;
 
@@ -1301,14 +1300,12 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 
 	if (ecbdata->label_path[0]) {
 		const char *name_a_tab, *name_b_tab;
-
 		name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
 		name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
-
-		fprintf(o->file, "%s%s--- %s%s%s\n",
-			line_prefix, meta, ecbdata->label_path[0], reset, name_a_tab);
-		fprintf(o->file, "%s%s+++ %s%s%s\n",
-			line_prefix, meta, ecbdata->label_path[1], reset, name_b_tab);
+		emit_line_fmt(o, meta, reset, "--- %s%s\n",
+			      ecbdata->label_path[0], name_a_tab);
+		emit_line_fmt(o, meta, reset, "+++ %s%s\n",
+			      ecbdata->label_path[1], name_b_tab);
 		ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
 	}
 
@@ -1349,7 +1346,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
 			emit_line(o, context, reset, line, len);
-			fputs("~\n", o->file);
+			emit_line(o, NULL, NULL, "~\n", 2);
 		} else {
 			/*
 			 * Skip the prefix character, if any.  With
-- 
2.13.0.18.g183880de0a



* [PATCH 08/19] diff.c: convert builtin_diff to use emit_line_*
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (6 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_* Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-15 18:42   ` Jonathan Tan
  2017-05-14  4:01 ` [PATCH 09/19] diff.c: convert emit_rewrite_diff " Stefan Beller
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
route all output such that the only emitting function is
emit_line_0.

This covers builtin_diff.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index 93343a9ccc..8e00010bf4 100644
--- a/diff.c
+++ b/diff.c
@@ -1293,8 +1293,9 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	o->found_changes = 1;
 
 	if (ecbdata->header) {
-		fprintf(o->file, "%s", ecbdata->header->buf);
-		strbuf_reset(ecbdata->header);
+		emit_line(o, NULL, NULL,
+			  ecbdata->header->buf, ecbdata->header->len);
+		strbuf_release(ecbdata->header);
 		ecbdata->header = NULL;
 	}
 
@@ -2407,7 +2408,7 @@ static void builtin_diff(const char *name_a,
 	b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
 	lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
 	lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
-	strbuf_addf(&header, "%s%sdiff --git %s %s%s\n", line_prefix, meta, a_one, b_two, reset);
+	strbuf_addf(&header, "%sdiff --git %s %s%s\n", meta, a_one, b_two, reset);
 	if (lbl[0][0] == '/') {
 		/* /dev/null */
 		strbuf_addf(&header, "%s%snew file mode %06o%s\n", line_prefix, meta, two->mode, reset);
@@ -2439,7 +2440,7 @@ static void builtin_diff(const char *name_a,
 		if (complete_rewrite &&
 		    (textconv_one || !diff_filespec_is_binary(one)) &&
 		    (textconv_two || !diff_filespec_is_binary(two))) {
-			fprintf(o->file, "%s", header.buf);
+			emit_line(o, NULL, NULL, header.buf, header.len);
 			strbuf_reset(&header);
 			emit_rewrite_diff(name_a, name_b, one, two,
 						textconv_one, textconv_two, o);
@@ -2449,7 +2450,8 @@ static void builtin_diff(const char *name_a,
 	}
 
 	if (o->irreversible_delete && lbl[1][0] == '/') {
-		fprintf(o->file, "%s", header.buf);
+		if (header.len)
+			emit_line(o, NULL, NULL, header.buf, header.len);
 		strbuf_reset(&header);
 		goto free_ab_and_return;
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2459,13 +2461,16 @@ static void builtin_diff(const char *name_a,
 		    S_ISREG(one->mode) && S_ISREG(two->mode) &&
 		    !DIFF_OPT_TST(o, BINARY)) {
 			if (!oidcmp(&one->oid, &two->oid)) {
-				if (must_show_header)
-					fprintf(o->file, "%s", header.buf);
+				if (must_show_header && header.len)
+					emit_line(o, NULL, NULL,
+						  header.buf, header.len);
 				goto free_ab_and_return;
 			}
-			fprintf(o->file, "%s", header.buf);
-			fprintf(o->file, "%sBinary files %s and %s differ\n",
-				line_prefix, lbl[0], lbl[1]);
+			if (header.len)
+				emit_line(o, NULL, NULL,
+					  header.buf, header.len);
+			emit_line_fmt(o, 0, 0, "Binary files %s and %s differ\n",
+				      lbl[0], lbl[1]);
 			goto free_ab_and_return;
 		}
 		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
@@ -2473,17 +2478,21 @@ static void builtin_diff(const char *name_a,
 		/* Quite common confusing case */
 		if (mf1.size == mf2.size &&
 		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
-			if (must_show_header)
-				fprintf(o->file, "%s", header.buf);
+			if (must_show_header && header.len)
+				emit_line(o, NULL, NULL,
+					  header.buf, header.len);
 			goto free_ab_and_return;
 		}
-		fprintf(o->file, "%s", header.buf);
+		if (header.len)
+			emit_line(o, NULL, NULL,
+				  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
 			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
 		else
-			fprintf(o->file, "%sBinary files %s and %s differ\n",
-				line_prefix, lbl[0], lbl[1]);
+			emit_line_fmt(o, NULL, NULL,
+				      "Binary files %s and %s differ\n",
+				      lbl[0], lbl[1]);
 		o->found_changes = 1;
 	} else {
 		/* Crazy xdl interfaces.. */
@@ -2494,8 +2503,8 @@ static void builtin_diff(const char *name_a,
 		struct emit_callback ecbdata;
 		const struct userdiff_funcname *pe;
 
-		if (must_show_header) {
-			fprintf(o->file, "%s", header.buf);
+		if (must_show_header && header.len) {
+			emit_line(o, NULL, NULL, header.buf, header.len);
 			strbuf_reset(&header);
 		}
 
-- 
2.13.0.18.g183880de0a



* [PATCH 09/19] diff.c: convert emit_rewrite_diff to use emit_line_*
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (7 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 08/19] diff.c: convert builtin_diff " Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-14  4:01 ` [PATCH 10/19] diff.c: convert emit_rewrite_lines " Stefan Beller
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
route all output such that the only emitting function is
emit_line_0.

This covers emit_rewrite_diff.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index 8e00010bf4..e4b46fee4f 100644
--- a/diff.c
+++ b/diff.c
@@ -708,17 +708,17 @@ static void remove_tempfile(void)
 	}
 }
 
-static void print_line_count(FILE *file, int count)
+static void add_line_count(struct strbuf *out, int count)
 {
 	switch (count) {
 	case 0:
-		fprintf(file, "0,0");
+		strbuf_addstr(out, "0,0");
 		break;
 	case 1:
-		fprintf(file, "1");
+		strbuf_addstr(out, "1");
 		break;
 	default:
-		fprintf(file, "1,%d", count);
+		strbuf_addf(out, "1,%d", count);
 		break;
 	}
 }
@@ -772,7 +772,7 @@ static void emit_rewrite_diff(const char *name_a,
 	char *data_one, *data_two;
 	size_t size_one, size_two;
 	struct emit_callback ecbdata;
-	const char *line_prefix = diff_line_prefix(o);
+	struct strbuf out = STRBUF_INIT;
 
 	if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) {
 		a_prefix = o->b_prefix;
@@ -810,20 +810,23 @@ static void emit_rewrite_diff(const char *name_a,
 	ecbdata.lno_in_preimage = 1;
 	ecbdata.lno_in_postimage = 1;
 
+	emit_line_fmt(o, metainfo, reset, "--- %s%s\n", a_name.buf, name_a_tab);
+	emit_line_fmt(o, metainfo, reset, "+++ %s%s\n", b_name.buf, name_b_tab);
+
 	lc_a = count_lines(data_one, size_one);
 	lc_b = count_lines(data_two, size_two);
-	fprintf(o->file,
-		"%s%s--- %s%s%s\n%s%s+++ %s%s%s\n%s%s@@ -",
-		line_prefix, metainfo, a_name.buf, name_a_tab, reset,
-		line_prefix, metainfo, b_name.buf, name_b_tab, reset,
-		line_prefix, fraginfo);
+
+	strbuf_addstr(&out, "@@ -");
 	if (!o->irreversible_delete)
-		print_line_count(o->file, lc_a);
+		add_line_count(&out, lc_a);
 	else
-		fprintf(o->file, "?,?");
-	fprintf(o->file, " +");
-	print_line_count(o->file, lc_b);
-	fprintf(o->file, " @@%s\n", reset);
+		strbuf_addstr(&out, "?,?");
+	strbuf_addstr(&out, " +");
+	add_line_count(&out, lc_b);
+	strbuf_addstr(&out, " @@\n");
+	emit_line(o, fraginfo, reset, out.buf, out.len);
+	strbuf_release(&out);
+
 	if (lc_a && !o->irreversible_delete)
 		emit_rewrite_lines(&ecbdata, '-', data_one, size_one);
 	if (lc_b)
-- 
2.13.0.18.g183880de0a
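
The hunk header that emit_rewrite_diff now assembles in a buffer before
emitting it can be reproduced with a small standalone sketch. It uses
snprintf() on fixed-size buffers in place of strbuf, and
format_rewrite_hunk_header() is a hypothetical helper that does not
exist in diff.c; the "0,0" / "1" / "1,<count>" cases follow
add_line_count in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the range for 'count' lines: "0,0", "1" or "1,<count>". */
static int add_line_count(char *out, size_t cap, int count)
{
	switch (count) {
	case 0:
		return snprintf(out, cap, "0,0");
	case 1:
		return snprintf(out, cap, "1");
	default:
		return snprintf(out, cap, "1,%d", count);
	}
}

/*
 * Build the complete "@@ -<old> +<new> @@\n" header in one buffer so it
 * can be emitted as a single line. With 'irreversible_delete' set, the
 * old range is shown as "?,?". Returns the header length.
 */
static int format_rewrite_hunk_header(char *out, size_t cap,
				      int lc_a, int lc_b,
				      int irreversible_delete)
{
	char a[32], b[32];

	assert(out && cap);
	if (irreversible_delete)
		strcpy(a, "?,?");
	else
		add_line_count(a, sizeof(a), lc_a);
	add_line_count(b, sizeof(b), lc_b);
	return snprintf(out, cap, "@@ -%s +%s @@\n", a, b);
}
```

Collecting the pieces first replaces the sequence of separate fprintf()
calls, so the header reaches the emitting function as one line with one
color.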



* [PATCH 10/19] diff.c: convert emit_rewrite_lines to use emit_line_*
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (8 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 09/19] diff.c: convert emit_rewrite_diff " Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-15 19:09   ` Jonathan Tan
  2017-05-14  4:01 ` [PATCH 11/19] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
route all output such that the only emitting function is
emit_line_0.

This covers emit_rewrite_lines.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index e4b46fee4f..369c804f03 100644
--- a/diff.c
+++ b/diff.c
@@ -748,7 +748,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 	if (!endp) {
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
-		putc('\n', ecb->opt->file);
+		emit_line(ecb->opt, NULL, NULL, "\n", 1);
 		emit_line_0(ecb->opt, context, reset, '\\',
 			    nneof, strlen(nneof));
 	}
-- 
2.13.0.18.g183880de0a



* [PATCH 11/19] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (9 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 10/19] diff.c: convert emit_rewrite_lines " Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-14  4:01 ` [PATCH 12/19] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
route all output such that the only emitting function is
emit_line_0.

This prepares the code for submodules to go through the
emit_line function.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c      | 18 +++++++-------
 diff.h      |  5 ++++
 submodule.c | 78 ++++++++++++++++++++++++++++++-------------------------------
 submodule.h |  9 +++----
 4 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/diff.c b/diff.c
index 369c804f03..45ec311828 100644
--- a/diff.c
+++ b/diff.c
@@ -546,15 +546,15 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 		fputc('\n', file);
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char *reset,
-		      const char *line, int len)
+void emit_line(struct diff_options *o, const char *set, const char *reset,
+	       const char *line, int len)
 {
 	emit_line_0(o, set, reset, 0, line, len);
 }
 
-static void emit_line_fmt(struct diff_options *o,
-			  const char *set, const char *reset,
-			  const char *fmt, ...)
+void emit_line_fmt(struct diff_options *o,
+		   const char *set, const char *reset,
+		   const char *fmt, ...)
 {
 	struct strbuf sb = STRBUF_INIT;
 	va_list ap;
@@ -2379,8 +2379,7 @@ static void builtin_diff(const char *name_a,
 	    (!two->mode || S_ISGITLINK(two->mode))) {
 		const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
 		const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-		show_submodule_summary(o->file, one->path ? one->path : two->path,
-				line_prefix,
+		show_submodule_summary(o, one->path ? one->path : two->path,
 				&one->oid, &two->oid,
 				two->dirty_submodule,
 				meta, del, add, reset);
@@ -2390,11 +2389,10 @@ static void builtin_diff(const char *name_a,
 		   (!two->mode || S_ISGITLINK(two->mode))) {
 		const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
 		const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-		show_submodule_inline_diff(o->file, one->path ? one->path : two->path,
-				line_prefix,
+		show_submodule_inline_diff(o, one->path ? one->path : two->path,
 				&one->oid, &two->oid,
 				two->dirty_submodule,
-				meta, del, add, reset, o);
+				meta, del, add, reset);
 		return;
 	}
 
diff --git a/diff.h b/diff.h
index 5be1ee77a7..addebd5a0f 100644
--- a/diff.h
+++ b/diff.h
@@ -188,6 +188,11 @@ struct diff_options {
 	int diff_path_counter;
 };
 
+void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
+		   const char *fmt, ...);
+void emit_line(struct diff_options *o, const char *set, const char *reset,
+	       const char *line, int len);
+
 enum color_diff {
 	DIFF_RESET = 0,
 	DIFF_CONTEXT = 1,
diff --git a/submodule.c b/submodule.c
index d3299e29c0..cfad469a2f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -362,8 +362,8 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path,
 	return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, FILE *f,
-		const char *line_prefix,
+static void print_submodule_summary(struct rev_info *rev,
+		struct diff_options *o,
 		const char *del, const char *add, const char *reset)
 {
 	static const char format[] = "  %m %s";
@@ -375,18 +375,12 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
 		ctx.date_mode = rev->date_mode;
 		ctx.output_encoding = get_log_output_encoding();
 		strbuf_setlen(&sb, 0);
-		strbuf_addstr(&sb, line_prefix);
-		if (commit->object.flags & SYMMETRIC_LEFT) {
-			if (del)
-				strbuf_addstr(&sb, del);
-		}
-		else if (add)
-			strbuf_addstr(&sb, add);
 		format_commit_message(commit, format, &sb, &ctx);
-		if (reset)
-			strbuf_addstr(&sb, reset);
 		strbuf_addch(&sb, '\n');
-		fprintf(f, "%s", sb.buf);
+		if (commit->object.flags & SYMMETRIC_LEFT)
+			emit_line(o, del, reset, sb.buf, sb.len);
+		else
+			emit_line(o, add, reset, sb.buf, sb.len);
 	}
 	strbuf_release(&sb);
 }
@@ -413,8 +407,7 @@ void prepare_submodule_repo_env(struct argv_array *out)
  * attempt to lookup both the left and right commits and put them into the
  * left and right pointers.
  */
-static void show_submodule_header(FILE *f, const char *path,
-		const char *line_prefix,
+static void show_submodule_header(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *reset,
@@ -426,11 +419,11 @@ static void show_submodule_header(FILE *f, const char *path,
 	int fast_forward = 0, fast_backward = 0;
 
 	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
-		fprintf(f, "%sSubmodule %s contains untracked content\n",
-			line_prefix, path);
+		emit_line_fmt(o, NULL, NULL,
+			      "Submodule %s contains untracked content\n", path);
 	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
-		fprintf(f, "%sSubmodule %s contains modified content\n",
-			line_prefix, path);
+		emit_line_fmt(o, NULL, NULL,
+			      "Submodule %s contains modified content\n", path);
 
 	if (is_null_oid(one))
 		message = "(new submodule)";
@@ -472,21 +465,20 @@ static void show_submodule_header(FILE *f, const char *path,
 	}
 
 output_header:
-	strbuf_addf(&sb, "%s%sSubmodule %s ", line_prefix, meta, path);
+	strbuf_addf(&sb, "Submodule %s ", path);
 	strbuf_add_unique_abbrev(&sb, one->hash, DEFAULT_ABBREV);
 	strbuf_addstr(&sb, (fast_backward || fast_forward) ? ".." : "...");
 	strbuf_add_unique_abbrev(&sb, two->hash, DEFAULT_ABBREV);
 	if (message)
-		strbuf_addf(&sb, " %s%s\n", message, reset);
+		strbuf_addf(&sb, " %s\n", message);
 	else
-		strbuf_addf(&sb, "%s:%s\n", fast_backward ? " (rewind)" : "", reset);
-	fwrite(sb.buf, sb.len, 1, f);
+		strbuf_addf(&sb, "%s:\n", fast_backward ? " (rewind)" : "");
+	emit_line(o, meta, reset, sb.buf, sb.len);
 
 	strbuf_release(&sb);
 }
 
-void show_submodule_summary(FILE *f, const char *path,
-		const char *line_prefix,
+void show_submodule_summary(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset)
@@ -495,7 +487,7 @@ void show_submodule_summary(FILE *f, const char *path,
 	struct commit *left = NULL, *right = NULL;
 	struct commit_list *merge_bases = NULL;
 
-	show_submodule_header(f, path, line_prefix, one, two, dirty_submodule,
+	show_submodule_header(o, path, one, two, dirty_submodule,
 			      meta, reset, &left, &right, &merge_bases);
 
 	/*
@@ -508,11 +500,12 @@ void show_submodule_summary(FILE *f, const char *path,
 
 	/* Treat revision walker failure the same as missing commits */
 	if (prepare_submodule_summary(&rev, path, left, right, merge_bases)) {
-		fprintf(f, "%s(revision walker failed)\n", line_prefix);
+		const char *error = "(revision walker failed)\n";
+		emit_line(o, NULL, NULL, error, strlen(error));
 		goto out;
 	}
 
-	print_submodule_summary(&rev, f, line_prefix, del, add, reset);
+	print_submodule_summary(&rev, o, del, add, reset);
 
 out:
 	if (merge_bases)
@@ -521,20 +514,18 @@ void show_submodule_summary(FILE *f, const char *path,
 	clear_commit_marks(right, ~0);
 }
 
-void show_submodule_inline_diff(FILE *f, const char *path,
-		const char *line_prefix,
+void show_submodule_inline_diff(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
-		const char *del, const char *add, const char *reset,
-		const struct diff_options *o)
+		const char *del, const char *add, const char *reset)
 {
 	const struct object_id *old = &empty_tree_oid, *new = &empty_tree_oid;
 	struct commit *left = NULL, *right = NULL;
 	struct commit_list *merge_bases = NULL;
-	struct strbuf submodule_dir = STRBUF_INIT;
 	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf sb = STRBUF_INIT;
 
-	show_submodule_header(f, path, line_prefix, one, two, dirty_submodule,
+	show_submodule_header(o, path, one, two, dirty_submodule,
 			      meta, reset, &left, &right, &merge_bases);
 
 	/* We need a valid left and right commit to display a difference */
@@ -547,15 +538,14 @@ void show_submodule_inline_diff(FILE *f, const char *path,
 	if (right)
 		new = two;
 
-	fflush(f);
 	cp.git_cmd = 1;
 	cp.dir = path;
-	cp.out = dup(fileno(f));
+	cp.out = -1;
 	cp.no_stdin = 1;
 
 	/* TODO: other options may need to be passed here. */
 	argv_array_push(&cp.args, "diff");
-	argv_array_pushf(&cp.args, "--line-prefix=%s", line_prefix);
+	argv_array_pushf(&cp.args, "--line-prefix=%s", diff_line_prefix(o));
 	if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
 		argv_array_pushf(&cp.args, "--src-prefix=%s%s/",
 				 o->b_prefix, path);
@@ -578,11 +568,21 @@ void show_submodule_inline_diff(FILE *f, const char *path,
 		argv_array_push(&cp.args, oid_to_hex(new));
 
 	prepare_submodule_repo_env(&cp.env_array);
-	if (run_command(&cp))
-		fprintf(f, "(diff failed)\n");
+	if (start_command(&cp)) {
+		const char *error = "(diff failed)\n";
+		emit_line(o, NULL, NULL, error, strlen(error));
+	}
+
+	while (strbuf_getwholeline_fd(&sb, cp.out, '\n') != EOF)
+		emit_line(o, NULL, NULL, sb.buf, sb.len);
+
+	if (finish_command(&cp)) {
+		const char *error = "(diff failed)\n";
+		emit_line(o, NULL, NULL, error, strlen(error));
+	}
 
 done:
-	strbuf_release(&submodule_dir);
+	strbuf_release(&sb);
 	if (merge_bases)
 		free_commit_list(merge_bases);
 	if (left)
diff --git a/submodule.h b/submodule.h
index 1277480add..9df0a3aea2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -53,17 +53,14 @@ extern int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 extern const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
 extern void handle_ignore_submodules_arg(struct diff_options *, const char *);
-extern void show_submodule_summary(FILE *f, const char *path,
-		const char *line_prefix,
+extern void show_submodule_summary(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset);
-extern void show_submodule_inline_diff(FILE *f, const char *path,
-		const char *line_prefix,
+extern void show_submodule_inline_diff(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
-		const char *del, const char *add, const char *reset,
-		const struct diff_options *opt);
+		const char *del, const char *add, const char *reset);
 extern void set_config_fetch_recurse_submodules(int value);
 extern void set_config_update_recurse_submodules(int value);
 /* Check if we want to update any submodule.*/
-- 
2.13.0.18.g183880de0a


* [PATCH 12/19] diff.c: convert emit_binary_diff_body to use emit_line_*
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (10 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 11/19] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-14  4:01 ` [PATCH 13/19] diff.c: convert show_stats " Stefan Beller
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -), we
cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer all output and then operate on the result),
route all output through the emit_line_* functions, such that
the only emitting function is emit_line_0.

This covers emit_binary_diff_body.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)
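
As a side note on the line buffer touched below, a small sketch of the
per-line arithmetic. The length byte mapping is taken from the loop in
emit_binary_diff_body; the buffer size helper assumes that the base85
encoder emits 5 characters for every started group of 4 input bytes.
Both function names are hypothetical.

```c
#include <assert.h>

/*
 * Length byte that starts each line of a binary patch body:
 * 1..26 map to 'A'..'Z' and 27..52 map to 'a'..'z'.
 */
static char binary_line_length_char(int bytes)
{
	assert(bytes >= 1 && bytes <= 52);
	if (bytes <= 26)
		return (char)(bytes + 'A' - 1);
	return (char)(bytes - 26 + 'a' - 1);
}

/*
 * Buffer size needed for one line under the assumption above:
 * length byte, encoded payload, '\n' and the terminating NUL.
 */
static int binary_line_buffer_size(int bytes)
{
	assert(bytes >= 1 && bytes <= 52);
	return 1 + 5 * ((bytes + 3) / 4) + 1 + 1;
}
```

Under that assumption a full 52-byte line needs 68 bytes, so a 71-byte
buffer leaves room for the newline that is now appended in place.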

diff --git a/diff.c b/diff.c
index 45ec311828..ffdb728810 100644
--- a/diff.c
+++ b/diff.c
@@ -2233,8 +2233,8 @@ static unsigned char *deflate_it(char *data,
 	return deflated;
 }
 
-static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
-				  const char *prefix)
+static void emit_binary_diff_body(struct diff_options *o,
+				  mmfile_t *one, mmfile_t *two)
 {
 	void *cp;
 	void *delta;
@@ -2263,13 +2263,12 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 	}
 
 	if (delta && delta_size < deflate_size) {
-		fprintf(file, "%sdelta %lu\n", prefix, orig_size);
+		emit_line_fmt(o, NULL, NULL, "delta %lu\n", orig_size);
 		free(deflated);
 		data = delta;
 		data_size = delta_size;
-	}
-	else {
-		fprintf(file, "%sliteral %lu\n", prefix, two->size);
+	} else {
+		emit_line_fmt(o, NULL, NULL, "literal %lu\n", two->size);
 		free(delta);
 		data = deflated;
 		data_size = deflate_size;
@@ -2278,8 +2277,9 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 	/* emit data encoded in base85 */
 	cp = data;
 	while (data_size) {
+		int len;
 		int bytes = (52 < data_size) ? 52 : data_size;
-		char line[70];
+		char line[71];
 		data_size -= bytes;
 		if (bytes <= 26)
 			line[0] = bytes + 'A' - 1;
@@ -2287,20 +2287,25 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 			line[0] = bytes - 26 + 'a' - 1;
 		encode_85(line + 1, cp, bytes);
 		cp = (char *) cp + bytes;
-		fprintf(file, "%s", prefix);
-		fputs(line, file);
-		fputc('\n', file);
+
+		len = strlen(line);
+		line[len++] = '\n';
+		line[len] = '\0';
+
+		emit_line(o, NULL, NULL, line, len);
 	}
-	fprintf(file, "%s\n", prefix);
+	emit_line(o, NULL, NULL, "\n", 1);
 	free(data);
 }
 
-static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two,
-			     const char *prefix)
+static void emit_binary_diff(struct diff_options *o,
+			     mmfile_t *one, mmfile_t *two)
 {
-	fprintf(file, "%sGIT binary patch\n", prefix);
-	emit_binary_diff_body(file, one, two, prefix);
-	emit_binary_diff_body(file, two, one, prefix);
+	const char *s = "GIT binary patch\n";
+	const int len = strlen(s);
+	emit_line(o, NULL, NULL, s, len);
+	emit_binary_diff_body(o, one, two);
+	emit_binary_diff_body(o, two, one);
 }
 
 int diff_filespec_is_binary(struct diff_filespec *one)
@@ -2489,7 +2494,7 @@ static void builtin_diff(const char *name_a,
 				  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
-			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
+			emit_binary_diff(o, &mf1, &mf2);
 		else
 			emit_line_fmt(o, NULL, NULL,
 				      "Binary files %s and %s differ\n",
-- 
2.13.0.18.g183880de0a


* [PATCH 13/19] diff.c: convert show_stats to use emit_line_*
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (11 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 12/19] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-14  4:01 ` [PATCH 14/19] diff.c: convert word diffing " Stefan Beller
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -), we
cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer all output and then operate on the result),
route all output through the emit_line_* functions, such that
the only emitting function is emit_line_0.

print_stat_summary is also called from builtin/apply, which only
has a file pointer. So introduce print_stat_summary_0, which uses
the emit_line_* machinery, and keep print_stat_summary with the
same arguments as a thin wrapper around it. Note that
print_stat_summary now returns void instead of int.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 89 ++++++++++++++++++++++++++++++++++++++----------------------------
 diff.h |  4 +--
 2 files changed, 53 insertions(+), 40 deletions(-)
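
The wrapper arrangement described above can be sketched in isolation
as follows. This is a simplified illustration, not the code in this
patch: struct mini_options is a hypothetical stand-in for struct
diff_options, and the worker prints directly instead of going through
emit_line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct diff_options. */
struct mini_options {
	FILE *file;
	const char *line_prefix;	/* may be NULL */
};

/* Worker: everything it needs comes from the options struct. */
static void mini_summary_0(struct mini_options *o, int files)
{
	assert(o && o->file);
	fprintf(o->file, "%s %d file%s changed\n",
		o->line_prefix ? o->line_prefix : "",
		files, files == 1 ? "" : "s");
}

/* Compatibility wrapper for callers that only have a FILE pointer. */
static void mini_summary(FILE *fp, int files)
{
	struct mini_options o;

	memset(&o, 0, sizeof(o));
	o.file = fp;
	mini_summary_0(&o, files);
}
```

Zeroing the struct means the wrapper's callers get no line prefix and
no colors, which matches what a bare FILE pointer could express.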

diff --git a/diff.c b/diff.c
index ffdb728810..91dc045a45 100644
--- a/diff.c
+++ b/diff.c
@@ -1529,20 +1529,19 @@ static int scale_linear(int it, int width, int max_change)
 	return 1 + (it * (width - 1) / max_change);
 }
 
-static void show_name(FILE *file,
+static void show_name(struct strbuf *out,
 		      const char *prefix, const char *name, int len)
 {
-	fprintf(file, " %s%-*s |", prefix, len, name);
+	strbuf_addf(out, " %s%-*s |", prefix, len, name);
 }
 
-static void show_graph(FILE *file, char ch, int cnt, const char *set, const char *reset)
+static void show_graph(struct strbuf *out, char ch, int cnt, const char *set, const char *reset)
 {
 	if (cnt <= 0)
 		return;
-	fprintf(file, "%s", set);
-	while (cnt--)
-		putc(ch, file);
-	fprintf(file, "%s", reset);
+	strbuf_addstr(out, set);
+	strbuf_addchars(out, ch, cnt);
+	strbuf_addstr(out, reset);
 }
 
 static void fill_print_name(struct diffstat_file *file)
@@ -1566,14 +1565,16 @@ static void fill_print_name(struct diffstat_file *file)
 	file->print_name = pname;
 }
 
-int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
+void print_stat_summary_0(struct diff_options *options, int files,
+			  int insertions, int deletions)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
 
 	if (!files) {
 		assert(insertions == 0 && deletions == 0);
-		return fprintf(fp, "%s\n", " 0 files changed");
+		strbuf_addstr(&sb, " 0 files changed\n");
+		emit_line(options, NULL, NULL, sb.buf, sb.len);
+		return;
 	}
 
 	strbuf_addf(&sb,
@@ -1600,9 +1601,17 @@ int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
 			    deletions);
 	}
 	strbuf_addch(&sb, '\n');
-	ret = fputs(sb.buf, fp);
+	emit_line(options, NULL, NULL, sb.buf, sb.len);
 	strbuf_release(&sb);
-	return ret;
+}
+
+void print_stat_summary(FILE *fp, int files,
+			int insertions, int deletions)
+{
+	struct diff_options o;
+	memset(&o, 0, sizeof(o));
+	o.file = fp;
+	print_stat_summary_0(&o, files, insertions, deletions);
 }
 
 static void show_stats(struct diffstat_t *data, struct diff_options *options)
@@ -1612,13 +1621,13 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 	int total_files = data->nr, count;
 	int width, name_width, graph_width, number_width = 0, bin_width = 0;
 	const char *reset, *add_c, *del_c;
-	const char *line_prefix = "";
 	int extra_shown = 0;
+	const char *line_prefix = diff_line_prefix(options);
+	struct strbuf out = STRBUF_INIT;
 
 	if (data->nr == 0)
 		return;
 
-	line_prefix = diff_line_prefix(options);
 	count = options->stat_count ? options->stat_count : data->nr;
 
 	reset = diff_get_color_opt(options, DIFF_RESET);
@@ -1772,26 +1781,29 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		}
 
 		if (file->is_binary) {
-			fprintf(options->file, "%s", line_prefix);
-			show_name(options->file, prefix, name, len);
-			fprintf(options->file, " %*s", number_width, "Bin");
+			show_name(&out, prefix, name, len);
+			strbuf_addf(&out, " %*s", number_width, "Bin");
 			if (!added && !deleted) {
-				putc('\n', options->file);
+				strbuf_addch(&out, '\n');
+				emit_line(options, NULL, NULL, out.buf, out.len);
+				strbuf_reset(&out);
 				continue;
 			}
-			fprintf(options->file, " %s%"PRIuMAX"%s",
+			strbuf_addf(&out, " %s%"PRIuMAX"%s",
 				del_c, deleted, reset);
-			fprintf(options->file, " -> ");
-			fprintf(options->file, "%s%"PRIuMAX"%s",
+			strbuf_addstr(&out, " -> ");
+			strbuf_addf(&out, "%s%"PRIuMAX"%s",
 				add_c, added, reset);
-			fprintf(options->file, " bytes");
-			fprintf(options->file, "\n");
+			strbuf_addstr(&out, " bytes\n");
+			emit_line(options, NULL, NULL, out.buf, out.len);
+			strbuf_reset(&out);
 			continue;
 		}
 		else if (file->is_unmerged) {
-			fprintf(options->file, "%s", line_prefix);
-			show_name(options->file, prefix, name, len);
-			fprintf(options->file, " Unmerged\n");
+			show_name(&out, prefix, name, len);
+			strbuf_addstr(&out, " Unmerged\n");
+			emit_line(options, NULL, NULL, out.buf, out.len);
+			strbuf_reset(&out);
 			continue;
 		}
 
@@ -1814,14 +1826,15 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 				add = total - del;
 			}
 		}
-		fprintf(options->file, "%s", line_prefix);
-		show_name(options->file, prefix, name, len);
-		fprintf(options->file, " %*"PRIuMAX"%s",
+		show_name(&out, prefix, name, len);
+		strbuf_addf(&out, " %*"PRIuMAX"%s",
 			number_width, added + deleted,
 			added + deleted ? " " : "");
-		show_graph(options->file, '+', add, add_c, reset);
-		show_graph(options->file, '-', del, del_c, reset);
-		fprintf(options->file, "\n");
+		show_graph(&out, '+', add, add_c, reset);
+		show_graph(&out, '-', del, del_c, reset);
+		strbuf_addch(&out, '\n');
+		emit_line(options, NULL, NULL, out.buf, out.len);
+		strbuf_reset(&out);
 	}
 
 	for (i = 0; i < data->nr; i++) {
@@ -1842,11 +1855,12 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		if (i < count)
 			continue;
 		if (!extra_shown)
-			fprintf(options->file, "%s ...\n", line_prefix);
+			emit_line(options, NULL, NULL,
+				  " ...\n", strlen(" ...\n"));
 		extra_shown = 1;
 	}
-	fprintf(options->file, "%s", line_prefix);
-	print_stat_summary(options->file, total_files, adds, dels);
+
+	print_stat_summary_0(options, total_files, adds, dels);
 }
 
 static void show_shortstats(struct diffstat_t *data, struct diff_options *options)
@@ -1858,7 +1872,7 @@ static void show_shortstats(struct diffstat_t *data, struct diff_options *option
 
 	for (i = 0; i < data->nr; i++) {
 		int added = data->files[i]->added;
-		int deleted= data->files[i]->deleted;
+		int deleted = data->files[i]->deleted;
 
 		if (data->files[i]->is_unmerged ||
 		    (!data->files[i]->is_interesting && (added + deleted == 0))) {
@@ -1868,8 +1882,7 @@ static void show_shortstats(struct diffstat_t *data, struct diff_options *option
 			dels += deleted;
 		}
 	}
-	fprintf(options->file, "%s", diff_line_prefix(options));
-	print_stat_summary(options->file, total_files, adds, dels);
+	print_stat_summary_0(options, total_files, adds, dels);
 }
 
 static void show_numstat(struct diffstat_t *data, struct diff_options *options)
diff --git a/diff.h b/diff.h
index addebd5a0f..5e89481769 100644
--- a/diff.h
+++ b/diff.h
@@ -394,8 +394,8 @@ extern int parse_rename_score(const char **cp_p);
 
 extern long parse_algorithm_value(const char *value);
 
-extern int print_stat_summary(FILE *fp, int files,
-			      int insertions, int deletions);
+extern void print_stat_summary(FILE *fp, int files,
+			       int insertions, int deletions);
 extern void setup_diff_pager(struct diff_options *);
 
 #endif /* DIFF_H */
-- 
2.13.0.18.g183880de0a


* [PATCH 14/19] diff.c: convert word diffing to use emit_line_*
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (12 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 13/19] diff.c: convert show_stats " Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-15 22:40   ` Jonathan Tan
  2017-05-14  4:01 ` [PATCH 15/19] diff.c: convert diff_flush " Stefan Beller
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -), we
cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer all output and then operate on the result),
route all output through the emit_line_* functions, such that
the only emitting function is emit_line_0.

This covers all code related to diffing words.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 56 ++++++++++++++++++++++++++++----------------------------
 1 file changed, 28 insertions(+), 28 deletions(-)
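
The write helper converted below walks its input with memchr and
emits one line at a time. A stripped-down sketch of that splitting
loop, without colors, prefixes or suffixes, could look like this. The
names line_sink, for_each_line and count_bytes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink; called once per complete or trailing partial line. */
typedef void (*line_sink)(void *data, const char *line, size_t len);

/*
 * Walk buf, handing each newline-terminated piece (newline included)
 * to the sink, followed by any unterminated remainder.
 * Returns the number of sink calls.
 */
static int for_each_line(const char *buf, size_t count,
			 line_sink sink, void *data)
{
	int calls = 0;

	assert(sink);
	while (count) {
		const char *p = memchr(buf, '\n', count);
		size_t len = p ? (size_t)(p + 1 - buf) : count;

		sink(data, buf, len);
		calls++;
		buf += len;
		count -= len;
	}
	return calls;
}

/* Example sink: adds up the bytes it has seen. */
static void count_bytes(void *data, const char *line, size_t len)
{
	(void)line;
	*(size_t *)data += len;
}
```

Handing whole lines to a single sink is what lets a buffering layer
sit behind it later without the caller noticing.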

diff --git a/diff.c b/diff.c
index 91dc045a45..07041a35ad 100644
--- a/diff.c
+++ b/diff.c
@@ -886,37 +886,38 @@ struct diff_words_data {
 	struct diff_words_style *style;
 };
 
-static int fn_out_diff_words_write_helper(FILE *fp,
+static int fn_out_diff_words_write_helper(struct diff_options *o,
 					  struct diff_words_style_elem *st_el,
 					  const char *newline,
 					  size_t count, const char *buf,
 					  const char *line_prefix)
 {
-	int print = 0;
+	struct strbuf sb = STRBUF_INIT;
 
 	while (count) {
 		char *p = memchr(buf, '\n', count);
-		if (print)
-			fputs(line_prefix, fp);
+
 		if (p != buf) {
-			if (st_el->color && fputs(st_el->color, fp) < 0)
-				return -1;
-			if (fputs(st_el->prefix, fp) < 0 ||
-			    fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
-			    fputs(st_el->suffix, fp) < 0)
-				return -1;
-			if (st_el->color && *st_el->color
-			    && fputs(GIT_COLOR_RESET, fp) < 0)
-				return -1;
+			if (st_el->color)
+				strbuf_addstr(&sb, st_el->color);
+			strbuf_addstr(&sb, st_el->prefix);
+			strbuf_add(&sb, buf, p ? p - buf : count);
+			strbuf_addstr(&sb, st_el->suffix);
+			if (st_el->color && *st_el->color)
+				strbuf_addstr(&sb, GIT_COLOR_RESET);
 		}
 		if (!p)
-			return 0;
-		if (fputs(newline, fp) < 0)
-			return -1;
+			goto out;
+		strbuf_addstr(&sb, newline);
+		emit_line(o, NULL, NULL, sb.buf, sb.len);
+		strbuf_reset(&sb);
 		count -= p + 1 - buf;
 		buf = p + 1;
-		print = 1;
 	}
+out:
+	if (sb.len)
+		emit_line(o, NULL, NULL, sb.buf, sb.len);
+	strbuf_release(&sb);
 	return 0;
 }
 
@@ -994,25 +995,25 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	} else
 		plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
 
-	if (color_words_output_graph_prefix(diff_words)) {
-		fputs(line_prefix, diff_words->opt->file);
-	}
+	if (color_words_output_graph_prefix(diff_words))
+		emit_line(diff_words->opt, NULL, NULL, "", 0);
+
 	if (diff_words->current_plus != plus_begin) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->ctx, style->newline,
 				plus_begin - diff_words->current_plus,
 				diff_words->current_plus, line_prefix);
 		if (*(plus_begin - 1) == '\n')
-			fputs(line_prefix, diff_words->opt->file);
+			emit_line(diff_words->opt, NULL, NULL, "", 0);
 	}
 	if (minus_begin != minus_end) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->old, style->newline,
 				minus_end - minus_begin, minus_begin,
 				line_prefix);
 	}
 	if (plus_begin != plus_end) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->new, style->newline,
 				plus_end - plus_begin, plus_begin,
 				line_prefix);
@@ -1109,8 +1110,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
 
 	/* special case: only removal */
 	if (!diff_words->plus.text.size) {
-		fputs(line_prefix, diff_words->opt->file);
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->old, style->newline,
 			diff_words->minus.text.size,
 			diff_words->minus.text.ptr, line_prefix);
@@ -1136,8 +1136,8 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	if (diff_words->current_plus != diff_words->plus.text.ptr +
 			diff_words->plus.text.size) {
 		if (color_words_output_graph_prefix(diff_words))
-			fputs(line_prefix, diff_words->opt->file);
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+			emit_line(diff_words->opt, NULL, NULL, "", 0);
+		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->ctx, style->newline,
 			diff_words->plus.text.ptr + diff_words->plus.text.size
 			- diff_words->current_plus, diff_words->current_plus,
-- 
2.13.0.18.g183880de0a


* [PATCH 15/19] diff.c: convert diff_flush to use emit_line_*
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (13 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 14/19] diff.c: convert word diffing " Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-15 20:21   ` Jonathan Tan
  2017-05-14  4:01 ` [PATCH 16/19] diff.c: convert diff_summary " Stefan Beller
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -), we
cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer all output and then operate on the result),
route all output through the emit_line_* functions, such that
the only emitting function is emit_line_0.

This covers diff_flush.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 07041a35ad..386b28cf47 100644
--- a/diff.c
+++ b/diff.c
@@ -4873,7 +4873,9 @@ void diff_flush(struct diff_options *options)
 				  term, 1);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
-				fputs(options->stat_sep, options->file);
+				emit_line(options, NULL, NULL,
+					  options->stat_sep,
+					  strlen(options->stat_sep));
 			}
 		}
 
-- 
2.13.0.18.g183880de0a


* [PATCH 16/19] diff.c: convert diff_summary to use emit_line_*
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (14 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 15/19] diff.c: convert diff_flush " Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-14  4:01 ` [PATCH 17/19] diff.c: factor out emit_line_ws for coloring whitespaces Stefan Beller
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -), we
cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer all output and then operate on the result),
route all output through the emit_line_* functions, such that
the only emitting function is emit_line_0.

This covers diff_summary.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 62 +++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 33 insertions(+), 29 deletions(-)
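
The conversions below replace several partial writes per summary line
with one fully formatted line that is emitted once. A minimal sketch
of that idea for the "mode change" line, using a plain char buffer
instead of a strbuf and leaving out path quoting; format_mode_change
is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a "mode change" summary line into out in one go.
 * name may be NULL when no file name should be shown.
 * Returns what snprintf returns: the length of the full line.
 */
static int format_mode_change(char *out, size_t size,
			      unsigned old_mode, unsigned new_mode,
			      const char *name)
{
	assert(out && size);
	return snprintf(out, size, " mode change %06o => %06o%s%s\n",
			old_mode, new_mode,
			name ? " " : "", name ? name : "");
}
```

With the complete line in hand, the caller needs exactly one emit
call, so the line prefix and any buffering are handled in one place.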

diff --git a/diff.c b/diff.c
index 386b28cf47..899dc69dff 100644
--- a/diff.c
+++ b/diff.c
@@ -4504,67 +4504,71 @@ static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
 	}
 }
 
-static void show_file_mode_name(FILE *file, const char *newdelete, struct diff_filespec *fs)
+static void show_file_mode_name(struct diff_options *opt, const char *newdelete, struct diff_filespec *fs)
 {
+	struct strbuf sb = STRBUF_INIT;
 	if (fs->mode)
-		fprintf(file, " %s mode %06o ", newdelete, fs->mode);
+		strbuf_addf(&sb, " %s mode %06o ", newdelete, fs->mode);
 	else
-		fprintf(file, " %s ", newdelete);
-	write_name_quoted(fs->path, file, '\n');
+		strbuf_addf(&sb, " %s ", newdelete);
+
+	quote_c_style(fs->path, &sb, NULL, 0);
+	strbuf_addch(&sb, '\n');
+	emit_line(opt, NULL, NULL, sb.buf, sb.len);
+	strbuf_release(&sb);
 }
 
 
-static void show_mode_change(FILE *file, struct diff_filepair *p, int show_name,
-		const char *line_prefix)
+static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
+		int show_name)
 {
 	if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) {
-		fprintf(file, "%s mode change %06o => %06o%c", line_prefix, p->one->mode,
-			p->two->mode, show_name ? ' ' : '\n');
+		struct strbuf sb = STRBUF_INIT;
 		if (show_name) {
-			write_name_quoted(p->two->path, file, '\n');
+			strbuf_addch(&sb, ' ');
+			quote_c_style(p->two->path, &sb, NULL, 0);
 		}
+		emit_line_fmt(opt, NULL, NULL,
+			      " mode change %06o => %06o%s\n",
+			      p->one->mode, p->two->mode,
+			      show_name ? sb.buf : "");
+		strbuf_release(&sb);
 	}
 }
 
-static void show_rename_copy(FILE *file, const char *renamecopy, struct diff_filepair *p,
-			const char *line_prefix)
+static void show_rename_copy(struct diff_options *opt, const char *renamecopy,
+		struct diff_filepair *p)
 {
 	char *names = pprint_rename(p->one->path, p->two->path);
-
-	fprintf(file, " %s %s (%d%%)\n", renamecopy, names, similarity_index(p));
+	emit_line_fmt(opt, NULL, NULL, " %s %s (%d%%)\n", renamecopy, names, similarity_index(p));
 	free(names);
-	show_mode_change(file, p, 0, line_prefix);
+	show_mode_change(opt, p, 0);
 }
 
 static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 {
-	FILE *file = opt->file;
-	const char *line_prefix = diff_line_prefix(opt);
-
 	switch(p->status) {
 	case DIFF_STATUS_DELETED:
-		fputs(line_prefix, file);
-		show_file_mode_name(file, "delete", p->one);
+		show_file_mode_name(opt, "delete", p->one);
 		break;
 	case DIFF_STATUS_ADDED:
-		fputs(line_prefix, file);
-		show_file_mode_name(file, "create", p->two);
+		show_file_mode_name(opt, "create", p->two);
 		break;
 	case DIFF_STATUS_COPIED:
-		fputs(line_prefix, file);
-		show_rename_copy(file, "copy", p, line_prefix);
+		show_rename_copy(opt, "copy", p);
 		break;
 	case DIFF_STATUS_RENAMED:
-		fputs(line_prefix, file);
-		show_rename_copy(file, "rename", p, line_prefix);
+		show_rename_copy(opt, "rename", p);
 		break;
 	default:
 		if (p->score) {
-			fprintf(file, "%s rewrite ", line_prefix);
-			write_name_quoted(p->two->path, file, ' ');
-			fprintf(file, "(%d%%)\n", similarity_index(p));
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addstr(&sb, " rewrite ");
+			quote_c_style(p->two->path, &sb, NULL, 0);
+			strbuf_addf(&sb, " (%d%%)\n", similarity_index(p));
+			emit_line(opt, NULL, NULL, sb.buf, sb.len);
 		}
-		show_mode_change(file, p, !p->score, line_prefix);
+		show_mode_change(opt, p, !p->score);
 		break;
 	}
 }
-- 
2.13.0.18.g183880de0a


* [PATCH 17/19] diff.c: factor out emit_line_ws for coloring whitespaces
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (15 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 16/19] diff.c: convert diff_summary " Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-14  4:01 ` [PATCH 18/19] diff: buffer all output if asked to Stefan Beller
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

Introduce emit_line_ws, a helper that wraps the call to ws_check_emit.
It is a separate helper because a later patch will add buffering there.

In a later patch we want to buffer up all output, and to do that we
need to keep around the information required for outputting a line,
such as the whitespace information. A later patch will put this
information into a new struct 'buffered_filepair', which will persist
longer than builtin_diff.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 899dc69dff..08dcc56bb9 100644
--- a/diff.c
+++ b/diff.c
@@ -552,6 +552,16 @@ void emit_line(struct diff_options *o, const char *set, const char *reset,
 	emit_line_0(o, set, reset, 0, line, len);
 }
 
+static void emit_line_ws(struct diff_options *o,
+			 const char *set, const char *reset, int sign,
+			 const char *line, int len,
+			 const char *ws, unsigned ws_rule)
+{
+	emit_line_0(o, set, reset, sign, "", 0);
+	ws_check_emit(line, len, ws_rule,
+		      o->file, set, reset, ws);
+}
+
 void emit_line_fmt(struct diff_options *o,
 		   const char *set, const char *reset,
 		   const char *fmt, ...)
@@ -598,12 +608,10 @@ static void emit_line_checked(const char *reset,
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
 		emit_line_0(ecbdata->opt, ws, reset, sign, line, len);
-	else {
+	else
 		/* Emit just the prefix, then the rest. */
-		emit_line_0(ecbdata->opt, set, reset, sign, "", 0);
-		ws_check_emit(line, len, ecbdata->ws_rule,
-			      ecbdata->opt->file, set, reset, ws);
-	}
+		emit_line_ws(ecbdata->opt, set, reset, sign, line, len,
+			     ws, ecbdata->ws_rule);
 }
 
 static void emit_add_line(const char *reset,
-- 
2.13.0.18.g183880de0a


* [PATCH 18/19] diff: buffer all output if asked to
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (16 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 17/19] diff.c: factor out emit_line_ws for coloring whitespaces Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-14  4:06   ` Jeff King
  2017-05-16  4:14   ` Jonathan Tan
  2017-05-14  4:01 ` [PATCH 19/19] diff.c: color moved lines differently Stefan Beller
                   ` (2 subsequent siblings)
  20 siblings, 2 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

Introduce a new option 'use_buffer' in the struct diff_options which
controls whether all output is buffered up until all output is available.

We'll have two new structs in diff.h: 'buffered_patch_line' will be
used to buffer each line, and 'buffered_filepair' will store the
information that is relevant on a per-file basis.
The buffered_patch_line will duplicate the memory of the line to buffer,
as that is easiest to reason about for now.  In a future patch we may want
to decrease the memory usage by not duplicating all output for buffering;
instead we could store offsets into the file or, in the case of hunk
descriptions such as the similarity score, we could store just the relevant
number and reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with less memory footprint.

emit_line_0 factors out the emission part into emit_buffered_patch_line,
and depending on diff_options->use_buffer the emission is performed
either directly when calling emit_line_0 or after the whole process is
done, i.e. buffering adds the possibility of a second pass over the
whole output before doing the actual output.
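
The two-pass idea above can be sketched as follows. This is a minimal,
standalone illustration and not code from this series: sketch_line,
sketch_buffer, sketch_append, sketch_count_sign and sketch_flush are
hypothetical names, plain malloc/realloc stand in for ALLOC_GROW and
xmemdupz, and the output goes into a caller-supplied char array instead
of o->file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct buffered_patch_line. */
struct sketch_line {
	char *text;	/* owned copy of the line, NUL-terminated */
	size_t len;
	int sign;	/* '+', '-', ' ', or 0 for "no sign" */
};

struct sketch_buffer {
	struct sketch_line *lines;
	size_t nr, alloc;
};

/* First pass: instead of printing, remember a copy of the line. */
static void sketch_append(struct sketch_buffer *b, int sign,
			  const char *text, size_t len)
{
	struct sketch_line *l;

	if (b->nr == b->alloc) {
		size_t alloc = b->alloc ? 2 * b->alloc : 16;
		struct sketch_line *p = realloc(b->lines, alloc * sizeof(*p));
		if (!p)
			abort();
		b->lines = p;
		b->alloc = alloc;
	}
	l = &b->lines[b->nr++];
	l->text = malloc(len + 1);
	if (!l->text)
		abort();
	memcpy(l->text, text, len);
	l->text[len] = '\0';
	l->len = len;
	l->sign = sign;
}

/* An extra pass over the complete output, possible only once buffered. */
static size_t sketch_count_sign(const struct sketch_buffer *b, int sign)
{
	size_t i, n = 0;

	for (i = 0; i < b->nr; i++)
		if (b->lines[i].sign == sign)
			n++;
	return n;
}

/*
 * Final pass: emit every line into out (at most outsz bytes including
 * the trailing NUL; output stops at the first line that does not fit),
 * then free each line before freeing and clearing the array itself.
 */
static size_t sketch_flush(struct sketch_buffer *b, char *out, size_t outsz)
{
	size_t i, pos = 0;
	int full = 0;

	assert(outsz > 0);
	for (i = 0; i < b->nr; i++) {
		struct sketch_line *l = &b->lines[i];
		size_t need = l->len + (l->sign ? 1 : 0);

		if (!full && pos + need < outsz) {
			if (l->sign)
				out[pos++] = (char)l->sign;
			memcpy(out + pos, l->text, l->len);
			pos += l->len;
		} else {
			full = 1;
		}
		free(l->text);
	}
	out[pos] = '\0';
	free(b->lines);
	b->lines = NULL;
	b->nr = b->alloc = 0;
	return pos;
}
```

A caller would run sketch_append for each line while the diff is
generated, do any analysis over the complete buffer (here just
sketch_count_sign), and only then call sketch_flush. Note that the
sketch frees the array before clearing the pointer, so nothing is lost
when the buffer is reused.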

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.

When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the main
buffer.

Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing, i.e. all the diff tests from the
test suite pass without having reordering issues (i.e. only parts
of the output got buffered, and we forgot to buffer other parts).
The test suite passes, which gives confidence that we converted all
functions to use emit_line_* for output.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 diff.h |  39 ++++++++++++++++
 2 files changed, 181 insertions(+), 16 deletions(-)

diff --git a/diff.c b/diff.c
index 08dcc56bb9..dbab7fb44e 100644
--- a/diff.c
+++ b/diff.c
@@ -516,29 +516,29 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
-			int sign, const char *line, int len)
+static void emit_buffered_patch_line(struct diff_options *o,
+				     struct buffered_patch_line *e)
 {
-	int has_trailing_newline, has_trailing_carriage_return;
+	int has_trailing_newline, has_trailing_carriage_return, len = e->len;
 	FILE *file = o->file;
 
 	fputs(diff_line_prefix(o), file);
 
-	has_trailing_newline = (len > 0 && line[len-1] == '\n');
+	has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
 	if (has_trailing_newline)
 		len--;
-	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
+	has_trailing_carriage_return = (len > 0 && e->line[len-1] == '\r');
 	if (has_trailing_carriage_return)
 		len--;
 
-	if (len || sign) {
-		if (set)
-			fputs(set, file);
-		if (sign)
-			fputc(sign, file);
-		fwrite(line, len, 1, file);
-		if (reset)
-			fputs(reset, file);
+	if (len || e->sign) {
+		if (e->set)
+			fputs(e->set, file);
+		if (e->sign)
+			fputc(e->sign, file);
+		fwrite(e->line, len, 1, file);
+		if (e->reset)
+			fputs(e->reset, file);
 	}
 	if (has_trailing_carriage_return)
 		fputc('\r', file);
@@ -546,6 +546,65 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 		fputc('\n', file);
 }
 
+static void emit_buffered_patch_line_ws(struct diff_options *o,
+					struct buffered_patch_line *e,
+					const char *ws, unsigned ws_rule)
+{
+	struct buffered_patch_line s = {e->set, e->reset, "", 0, e->sign};
+
+	emit_buffered_patch_line(o, &s);
+	ws_check_emit(e->line, e->len, ws_rule,
+		      o->file, e->set, e->reset, ws);
+}
+
+static void process_next_buffered_patch_line(struct diff_options *o, int line_no)
+{
+	struct buffered_patch_line *e = &o->line_buffer[line_no];
+
+	const char *ws = o->current_filepair->ws;
+	unsigned ws_rule = o->current_filepair->ws_rule;
+
+	switch (e->state) {
+		case BPL_EMIT_LINE_ASIS:
+			emit_buffered_patch_line(o, e);
+			break;
+		case BPL_EMIT_LINE_WS:
+			emit_buffered_patch_line_ws(o, e, ws, ws_rule);
+			break;
+		case BPL_HANDOVER:
+			o->current_filepair++;
+			break;
+		default:
+			die("BUG: malformatted buffered patch line: '%d'", e->state);
+	}
+}
+
+static void append_buffered_patch_line(struct diff_options *o,
+				       struct buffered_patch_line *e)
+{
+	struct buffered_patch_line *f;
+	ALLOC_GROW(o->line_buffer,
+		   o->line_buffer_nr + 1,
+		   o->line_buffer_alloc);
+	f = &o->line_buffer[o->line_buffer_nr];
+	o->line_buffer_nr++;
+
+	memcpy(f, e, sizeof(struct buffered_patch_line));
+	f->line = e->line ? xmemdupz(e->line, e->len) : NULL;
+}
+
+static void emit_line_0(struct diff_options *o,
+			const char *set, const char *reset,
+			int sign, const char *line, int len)
+{
+	struct buffered_patch_line e = {set, reset, line, len, sign, BPL_EMIT_LINE_ASIS};
+
+	if (o->use_buffer)
+		append_buffered_patch_line(o, &e);
+	else
+		emit_buffered_patch_line(o, &e);
+}
+
 void emit_line(struct diff_options *o, const char *set, const char *reset,
 	       const char *line, int len)
 {
@@ -557,9 +616,12 @@ static void emit_line_ws(struct diff_options *o,
 			 const char *line, int len,
 			 const char *ws, unsigned ws_rule)
 {
-	emit_line_0(o, set, reset, sign, "", 0);
-	ws_check_emit(line, len, ws_rule,
-		      o->file, set, reset, ws);
+	struct buffered_patch_line e = {set, reset, line, len, sign, BPL_EMIT_LINE_WS};
+
+	if (o->use_buffer)
+		append_buffered_patch_line(o, &e);
+	else
+		emit_buffered_patch_line_ws(o, &e, ws, ws_rule);
 }
 
 void emit_line_fmt(struct diff_options *o,
@@ -1160,6 +1222,16 @@ static void diff_words_flush(struct emit_callback *ecbdata)
 	if (ecbdata->diff_words->minus.text.size ||
 	    ecbdata->diff_words->plus.text.size)
 		diff_words_show(ecbdata->diff_words);
+
+	if (ecbdata->diff_words->opt->line_buffer_nr) {
+		int i;
+		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
+			append_buffered_patch_line(ecbdata->opt,
+				&ecbdata->diff_words->opt->line_buffer[i]);
+
+		ecbdata->diff_words->opt->line_buffer_nr = 0;
+		/* TODO: free memory as well */
+	}
 }
 
 static void diff_filespec_load_driver(struct diff_filespec *one)
@@ -1195,6 +1267,11 @@ static void init_diff_words_data(struct emit_callback *ecbdata,
 		xcalloc(1, sizeof(struct diff_words_data));
 	ecbdata->diff_words->type = o->word_diff;
 	ecbdata->diff_words->opt = o;
+
+	o->line_buffer = NULL;
+	o->line_buffer_nr = 0;
+	o->line_buffer_alloc = 0;
+
 	if (!o->word_regex)
 		o->word_regex = userdiff_word_regex(one);
 	if (!o->word_regex)
@@ -2568,9 +2645,25 @@ static void builtin_diff(const char *name_a,
 			xecfg.ctxlen = strtoul(v, NULL, 10);
 		if (o->word_diff)
 			init_diff_words_data(&ecbdata, o, one, two);
+		if (o->use_buffer) {
+			ALLOC_GROW(o->filepair_buffer,
+				   o->filepair_buffer_nr + 1,
+				   o->filepair_buffer_alloc);
+			o->current_filepair =
+				&o->filepair_buffer[o->filepair_buffer_nr++];
+
+			o->current_filepair->ws_rule = ecbdata.ws_rule;
+			o->current_filepair->ws =
+				diff_get_color(ecbdata.color_diff, DIFF_WHITESPACE);
+		}
 		if (xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
 				  &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
+		if (o->use_buffer) {
+			struct buffered_patch_line e = BUFFERED_PATCH_LINE_INIT;
+			e.state = BPL_HANDOVER; /* handover to next file pair */
+			append_buffered_patch_line(o, &e);
+		}
 		if (o->word_diff)
 			free_diff_words_data(&ecbdata);
 		if (textconv_one)
@@ -4785,11 +4878,44 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 {
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
+	/*
+	 * For testing purposes we want to make sure the diff machinery
+	 * works completely with the buffer. If there is anything emitted
+	 * outside the emit_buffered_patch_line, then the order is screwed
+	 * up and the tests will fail.
+	 *
+	 * TODO (later in this series):
+	 * We'll unset this flag in a later patch.
+	 */
+	o->use_buffer = 1;
+
+	if (o->use_buffer) {
+		ALLOC_GROW(o->filepair_buffer,
+			   o->filepair_buffer_nr + 1,
+			   o->filepair_buffer_alloc);
+		o->current_filepair = &o->filepair_buffer[o->filepair_buffer_nr];
+	}
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		if (check_pair_status(p))
 			diff_flush_patch(p, o);
 	}
+
+	if (o->use_buffer) {
+		o->current_filepair = &o->filepair_buffer[0];
+		for (i = 0; i < o->line_buffer_nr; i++)
+			process_next_buffered_patch_line(o, i);
+
+		for (i = 0; i < o->line_buffer_nr; i++)
+			free((void*)o->line_buffer[i].line);
+
+		free(o->line_buffer);
+		o->line_buffer = NULL;
+		o->line_buffer_nr = 0;
+		free(o->filepair_buffer);
+		o->filepair_buffer = NULL;
+		o->filepair_buffer_nr = 0;
+	}
 }
 
 void diff_flush(struct diff_options *options)
diff --git a/diff.h b/diff.h
index 5e89481769..c334aac02e 100644
--- a/diff.h
+++ b/diff.h
@@ -115,6 +115,36 @@ enum diff_submodule_format {
 	DIFF_SUBMODULE_INLINE_DIFF
 };
 
+/*
+ * This struct is used when we need to buffer the output of the diff output.
+ *
+ * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer
+ * into the pre/post image file. This pointer could be a union with the
+ * line pointer. By storing an offset into the file instead of the literal line,
+ * we can decrease the memory footprint for the buffered output. At first we
+ * may want to only have indirection for the content lines, but we could
+ * also have an enum (based on sign?) that stores prefabricated lines, e.g.
+ * the similarity score line or hunk/file headers.
+ */
+struct buffered_patch_line {
+	const char *set;
+	const char *reset;
+	const char *line;
+	int len;
+	int sign;
+	enum {
+		BPL_EMIT_LINE_WS,
+		BPL_EMIT_LINE_ASIS,
+		BPL_HANDOVER
+	} state;
+};
+#define BUFFERED_PATCH_LINE_INIT {NULL, NULL, NULL, 0, 0, 0}
+
+struct buffered_filepair {
+	const char *ws;
+	unsigned ws_rule;
+};
+
 struct diff_options {
 	const char *orderfile;
 	const char *pickaxe;
@@ -186,6 +216,15 @@ struct diff_options {
 	void *output_prefix_data;
 
 	int diff_path_counter;
+
+	int use_buffer;
+
+	struct buffered_patch_line *line_buffer;
+	int line_buffer_nr, line_buffer_alloc;
+
+	struct buffered_filepair *filepair_buffer;
+	int filepair_buffer_nr, filepair_buffer_alloc;
+	struct buffered_filepair *current_filepair;
 };
 
 void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
-- 
2.13.0.18.g183880de0a


* [PATCH 19/19] diff.c: color moved lines differently
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (17 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 18/19] diff: buffer all output if asked to Stefan Beller
@ 2017-05-14  4:01 ` Stefan Beller
  2017-05-15 22:42   ` Brandon Williams
                     ` (2 more replies)
  2017-05-15 12:43 ` [RFC PATCH 00/19] Diff machine: highlight moved lines Junio C Hamano
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
  20 siblings, 3 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:01 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, peff, gitster, mhagger, jrnieder, bmwill,
	Stefan Beller

When a lot of code is moved around, such as in 11979b9 (2005-11-18,
"http.c: reorder to avoid compilation failure."), reviewing is hard,
not because it is mentally challenging, but because it is a rather
tedious process that gets boring quickly. However, you still need to read
through all of the code to make sure the moved lines are there as intended.

It is trivial to color up a patch like the following

    $ git diff
    diff --git a/file2.c b/file2.c
    index 9163a0f..8e66dc0 100644
    --- a/file2.c
    +++ b/file2.c
    @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
            return memcpy(xmallocz(len), data, len);
     }

    -int secure_foo(struct user *u)
    -{
    -       if (!u->is_allowed_foo)
    -               return;
    -       foo(u);
    -}
    -
     char *xstrndup(const char *str, size_t len)
     {
            char *p = memchr(str, '\0', len);
    diff --git a/test.c b/test.c
    index a95e6fe..81eb0eb 100644
    --- a/test.c
    +++ b/test.c
    @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
            return total;
     }

    +int secure_foo(struct user *u)
    +{
    +       if (!u->is_allowed_foo)
    +               return;
    +       foo(u);
    +}
    +
     int xdup(int fd)
     {
            int ret = dup(fd);

as in this patch all added and removed lines
should be colored in the new color that indicates moved
lines.

However, the intention of this patch is to help reviewers
spot permutations in the moved code. So consider the
following malicious move:

    diff --git a/file2.c b/file2.c
    index 9163a0f..8e66dc0 100644
    --- a/file2.c
    +++ b/file2.c
    @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
            return memcpy(xmallocz(len), data, len);
     }

    -int secure_foo(struct user *u)
    -{
    -       if (!u->is_allowed_foo)
    -               return;
    -       foo(u);
    -}
    -
     char *xstrndup(const char *str, size_t len)
     {
            char *p = memchr(str, '\0', len);
    diff --git a/test.c b/test.c
    index a95e6fe..a679c40 100644
    --- a/test.c
    +++ b/test.c
    @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
            return total;
     }

    +int secure_foo(struct user *u)
    +{
    +       foo(u);
    +       if (!u->is_allowed_foo)
    +               return;
    +}
    +
     int xdup(int fd)
     {
            int ret = dup(fd);

If the moved code is larger, it is easier to hide some permutation in the
code, which is why we would not want to color all lines as "moved" in this
case. So it is not enough to color lines differently when they are added
and removed in the same diff; we need to tweak the algorithm a bit more.

As the reviewer's attention should be drawn to the places where the
difference is introduced into the moved code, we cannot just have one new
color for all of the moved code.

First I implemented an alternative design, which would show a moved hunk
in one color and its boundaries in another color. This idea was
error-prone, as it inspected each line and its neighboring lines to
determine (a) if the line was moved and (b) if it was deep inside a hunk,
by having matching neighboring lines. This is unreliable, as we can
construct hunks which have equal neighbors that just exceed the number of
lines inspected. See one of the tests as an example.

Instead, this patch provides a dynamic programming greedy algorithm that
finds the largest moved hunk and then switches to the alternative color
for the next hunk. By doing this, any permutation is recognized and
displayed. That implies that there is no dedicated boundary or
inside-hunk color; instead we'll have just two colors alternating
between hunks.
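
The alternating-color idea can be sketched in a simplified, standalone
form. This is an illustration under stated assumptions, not the code in
this patch: sketch_mark_moved and SKETCH_MAX are hypothetical names, the
removed lines are assumed to form one contiguous run, matching is done
by linear search with strcmp instead of a hashmap, and a non-matching
added line simply ends all running blocks.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX 64	/* hypothetical limit on the number of removed lines */

/*
 * For each added line, set color[i] to 0 (not moved) or to 1 or 2
 * (moved; the value alternates from one moved block to the next).
 */
static void sketch_mark_moved(const char **removed, int removed_nr,
			      const char **added, int added_nr,
			      int *color)
{
	/*
	 * Potential moved blocks: for every run that is still alive, the
	 * index into removed[] of the line it matched last.
	 */
	int pmb[SKETCH_MAX];
	int pmb_nr = 0;
	int cur = 2;	/* the first block toggles this to 1 */
	int i, j, k;

	assert(removed_nr <= SKETCH_MAX);

	for (i = 0; i < added_nr; i++) {
		int matched = 0;

		for (j = 0; j < removed_nr; j++) {
			if (!strcmp(added[i], removed[j])) {
				matched = 1;
				break;
			}
		}
		if (!matched) {
			/* Assumption of this sketch: end all runs here. */
			color[i] = 0;
			pmb_nr = 0;
			continue;
		}

		/* Keep only the runs whose next removed line is this line. */
		k = 0;
		for (j = 0; j < pmb_nr; j++) {
			int next = pmb[j] + 1;
			if (next < removed_nr && !strcmp(removed[next], added[i]))
				pmb[k++] = next;
		}
		pmb_nr = k;

		if (!pmb_nr) {
			/* No run continues: a new block starts, toggle color. */
			cur = (cur == 2) ? 1 : 2;
			for (j = 0; j < removed_nr; j++)
				if (!strcmp(removed[j], added[i]))
					pmb[pmb_nr++] = j;
		}
		color[i] = cur;
	}
}
```

Fed with the five non-blank lines of secure_foo from the two examples
above (leading whitespace stripped), the straight move gets color 1 on
every added line, while the malicious move gets 1, 2, 1, 1, 2: every
color switch marks a place where the order of the moved lines differs.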

It would be a bit more UX friendly if the two corresponding hunks
(of added and deleted lines) for one move would get the same color id.
(Both get "regular moved" or "alternative moved"). This problem is
deferred to a later patch for now.

Algorithm-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt   |  12 +-
 diff.c                     | 265 ++++++++++++++++++++++++++++++++++++++++++---
 diff.h                     |  21 +++-
 t/t4015-diff-whitespace.sh | 229 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 506 insertions(+), 21 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..90403c06e3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,14 +1051,22 @@ This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=<when>]` option.
 
+color.moved::
+	A boolean value, whether a diff should color moved lines
+	differently. The moved lines are searched for in the diff only.
+	Duplicated lines from somewhere in the project that are not
+	part of the diff are not colored as moved.
+	Defaults to false.
+
 color.diff.<slot>::
 	Use customized color for diff colorization.  `<slot>` specifies
 	which part of the patch to use the specified color, and is one
 	of `context` (context text - `plain` is a historical synonym),
 	`meta` (metainformation), `frag`
 	(hunk header), 'func' (function in hunk header), `old` (removed lines),
-	`new` (added lines), `commit` (commit headers), or `whitespace`
-	(highlighting whitespace errors).
+	`new` (added lines), `commit` (commit headers), `whitespace`
+	(highlighting whitespace errors), `movedFrom` (removed lines that
+	reappear), `movedTo` (added lines that were removed elsewhere).
 
 color.decorate.<slot>::
 	Use customized color for 'git log --decorate' output.  `<slot>` is one
diff --git a/diff.c b/diff.c
index dbab7fb44e..6372e0eb25 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static int diff_indent_heuristic; /* experimental */
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
+static int diff_color_moved_default;
 static int diff_context_default = 3;
 static int diff_interhunk_context_default;
 static const char *diff_word_regex_cfg;
@@ -55,6 +56,10 @@ static char diff_colors[][COLOR_MAXLEN] = {
 	GIT_COLOR_YELLOW,	/* COMMIT */
 	GIT_COLOR_BG_RED,	/* WHITESPACE */
 	GIT_COLOR_NORMAL,	/* FUNCINFO */
+	GIT_COLOR_BOLD_RED,	/* OLD_MOVED_A */
+	GIT_COLOR_BG_RED,	/* OLD_MOVED_B */
+	GIT_COLOR_BOLD_GREEN,	/* NEW_MOVED_A */
+	GIT_COLOR_BG_GREEN,	/* NEW_MOVED_B */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -80,6 +85,14 @@ static int parse_diff_color_slot(const char *var)
 		return DIFF_WHITESPACE;
 	if (!strcasecmp(var, "func"))
 		return DIFF_FUNCINFO;
+	if (!strcasecmp(var, "oldmoved"))
+		return DIFF_FILE_OLD_MOVED;
+	if (!strcasecmp(var, "oldmovedalternative"))
+		return DIFF_FILE_OLD_MOVED_ALT;
+	if (!strcasecmp(var, "newmoved"))
+		return DIFF_FILE_NEW_MOVED;
+	if (!strcasecmp(var, "newmovedalternative"))
+		return DIFF_FILE_NEW_MOVED_ALT;
 	return -1;
 }
 
@@ -234,6 +247,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_use_color_default = git_config_colorbool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "color.moved")) {
+		diff_color_moved_default = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.context")) {
 		diff_context_default = git_config_int(var, value);
 		if (diff_context_default < 0)
@@ -354,6 +371,81 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+struct moved_entry {
+	struct hashmap_entry ent;
+	const struct buffered_patch_line *line;
+	struct moved_entry *next_line;
+};
+
+static void get_ws_cleaned_string(const struct buffered_patch_line *l,
+				  struct strbuf *out)
+{
+	int i;
+	for (i = 0; i < l->len; i++) {
+		if (isspace(l->line[i]))
+			continue;
+		strbuf_addch(out, l->line[i]);
+	}
+}
+
+static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
+					 const struct buffered_patch_line *b,
+					 const void *keydata)
+{
+	struct strbuf sba = STRBUF_INIT;
+	struct strbuf sbb = STRBUF_INIT;
+	get_ws_cleaned_string(a, &sba);
+	get_ws_cleaned_string(b, &sbb);
+	return sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);
+}
+
+static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
+				   const struct buffered_patch_line *b,
+				   const void *keydata)
+{
+	return a->len != b->len || strncmp(a->line, b->line, a->len);
+}
+
+static int moved_entry_cmp(const struct moved_entry *a,
+			   const struct moved_entry *b,
+			   const void *keydata)
+{
+	return buffered_patch_line_cmp(a->line, b->line, keydata);
+}
+
+static int moved_entry_cmp_no_ws(const struct moved_entry *a,
+				 const struct moved_entry *b,
+				 const void *keydata)
+{
+	return buffered_patch_line_cmp_no_ws(a->line, b->line, keydata);
+}
+
+static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_ws)
+{
+	static struct strbuf sb = STRBUF_INIT;
+
+	if (ignore_ws) {
+		strbuf_reset(&sb);
+		get_ws_cleaned_string(line, &sb);
+		return memhash(sb.buf, sb.len);
+	} else
+		return memhash(line->line, line->len);
+}
+
+static struct moved_entry *prepare_entry(struct diff_options *o,
+					 int line_no)
+{
+	struct moved_entry *ret = xmalloc(sizeof(*ret));
+	unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+	struct buffered_patch_line *l = &o->line_buffer[line_no];
+
+	ret->ent.hash = get_line_hash(l, ignore_ws);
+	ret->line = l;
+	ret->next_line = NULL;
+
+	return ret;
+}
+
 static char *quote_two(const char *one, const char *two)
 {
 	int need_one = quote_c_style(one, NULL, NULL, 1);
@@ -516,8 +608,98 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
+static void mark_color_as_moved(struct diff_options *o, int line_no)
+{
+	struct hashmap *hm = NULL;
+	struct moved_entry *key = prepare_entry(o, line_no);
+	struct moved_entry *match = NULL;
+	struct buffered_patch_line *l = &o->line_buffer[line_no];
+	int alt_flag;
+	int i, lp, rp;
+
+	switch (l->sign) {
+	case '+':
+		hm = o->deleted_lines;
+		break;
+	case '-':
+		hm = o->added_lines;
+		break;
+	default:
+		/* reset to standard, non-alt move color */
+		o->color_moved = 1;
+		break;
+	}
+
+	/* Check for any match to color it as a move. */
+	if (!hm)
+		return;
+	match = hashmap_get(hm, key, o);
+	free(key);
+	if (!match)
+		return;
+
+	/* Check any potential block runs, advance each or nullify */
+	for (i = 0; i < o->pmb_nr; i++) {
+		struct moved_entry *p = o->pmb[i];
+		if (p && p->next_line &&
+		    !buffered_patch_line_cmp(p->next_line->line, l, o)) {
+			o->pmb[i] = p->next_line;
+		} else {
+			o->pmb[i] = NULL;
+		}
+	}
+
+	/* Shrink the set to the remaining runs */
+	for (lp = 0, rp = o->pmb_nr - 1; lp <= rp;) {
+		while (lp < o->pmb_nr && o->pmb[lp])
+			lp ++;
+		/* lp points at the first NULL now */
+
+		while (rp > -1 && !o->pmb[rp])
+			rp--;
+		/* rp points at the last non-NULL */
+
+		if (lp < o->pmb_nr && rp > -1 && lp < rp) {
+			o->pmb[lp] = o->pmb[rp];
+			o->pmb[rp] = NULL;
+			rp--;
+			lp++;
+		}
+	}
+
+	if (rp > -1) {
+		/* Remember the number of running sets */
+		o->pmb_nr = rp + 1;
+	} else {
+		/* Toggle color */
+		o->color_moved = o->color_moved == 2 ? 1 : 2;
+
+		/* Build up a new set */
+		i = 0;
+		for (; match; match = hashmap_get_next(hm, match)) {
+			ALLOC_GROW(o->pmb, i + 1, o->pmb_alloc);
+			o->pmb[i] = match;
+			i++;
+		}
+		o->pmb_nr = i;
+	}
+
+	alt_flag = o->color_moved - 1;
+	switch (l->sign) {
+	case '+':
+		l->set = diff_get_color_opt(o, DIFF_FILE_NEW_MOVED + alt_flag);
+		break;
+	case '-':
+		l->set = diff_get_color_opt(o, DIFF_FILE_OLD_MOVED + alt_flag);
+		break;
+	default:
+		; /* nothing */
+	}
+}
+
 static void emit_buffered_patch_line(struct diff_options *o,
-				     struct buffered_patch_line *e)
+				     struct buffered_patch_line *e,
+				     int pass)
 {
 	int has_trailing_newline, has_trailing_carriage_return, len = e->len;
 	FILE *file = o->file;
@@ -548,11 +730,11 @@ static void emit_buffered_patch_line(struct diff_options *o,
 
 static void emit_buffered_patch_line_ws(struct diff_options *o,
 					struct buffered_patch_line *e,
-					const char *ws, unsigned ws_rule)
+					const char *ws, unsigned ws_rule,
+					int pass)
 {
 	struct buffered_patch_line s = {e->set, e->reset, "", 0, e->sign};
-
-	emit_buffered_patch_line(o, &s);
+	emit_buffered_patch_line(o, &s, 0);
 	ws_check_emit(e->line, e->len, ws_rule,
 		      o->file, e->set, e->reset, ws);
 }
@@ -564,12 +746,14 @@ static void process_next_buffered_patch_line(struct diff_options *o, int line_no
 	const char *ws = o->current_filepair->ws;
 	unsigned ws_rule = o->current_filepair->ws_rule;
 
+	mark_color_as_moved(o, line_no);
+
 	switch (e->state) {
 		case BPL_EMIT_LINE_ASIS:
-			emit_buffered_patch_line(o, e);
+			emit_buffered_patch_line(o, e, 1);
 			break;
 		case BPL_EMIT_LINE_WS:
-			emit_buffered_patch_line_ws(o, e, ws, ws_rule);
+			emit_buffered_patch_line_ws(o, e, ws, ws_rule, 1);
 			break;
 		case BPL_HANDOVER:
 			o->current_filepair++;
@@ -602,7 +786,7 @@ static void emit_line_0(struct diff_options *o,
 	if (o->use_buffer)
 		append_buffered_patch_line(o, &e);
 	else
-		emit_buffered_patch_line(o, &e);
+		emit_buffered_patch_line(o, &e, 0);
 }
 
 void emit_line(struct diff_options *o, const char *set, const char *reset,
@@ -621,7 +805,7 @@ static void emit_line_ws(struct diff_options *o,
 	if (o->use_buffer)
 		append_buffered_patch_line(o, &e);
 	else
-		emit_buffered_patch_line_ws(o, &e, ws, ws_rule);
+		emit_buffered_patch_line_ws(o, &e, ws, ws_rule, 0);
 }
 
 void emit_line_fmt(struct diff_options *o,
@@ -676,6 +860,36 @@ static void emit_line_checked(const char *reset,
 			     ws, ecbdata->ws_rule);
 }
 
+static void add_line_to_move_detection(struct diff_options *o, int line_idx)
+{
+	int sign = 0;
+	struct hashmap *hm;
+	struct moved_entry *key;
+
+	switch (o->line_buffer[line_idx].sign) {
+	case '+':
+		sign = '+';
+		hm = o->added_lines;
+		break;
+	case '-':
+		sign = '-';
+		hm = o->deleted_lines;
+		break;
+	case ' ':
+	default:
+		o->prev_line = NULL;
+		return;
+	}
+
+	key = prepare_entry(o, line_idx);
+	if (o->prev_line &&
+	    o->prev_line->line->sign == sign)
+		o->prev_line->next_line = key;
+
+	hashmap_add(hm, key);
+	o->prev_line = key;
+}
+
 static void emit_add_line(const char *reset,
 			  struct emit_callback *ecbdata,
 			  const char *line, int len)
@@ -3649,6 +3863,9 @@ void diff_setup_done(struct diff_options *options)
 
 	if (DIFF_OPT_TST(options, FOLLOW_RENAMES) && options->pathspec.nr != 1)
 		die(_("--follow requires exactly one pathspec"));
+
+	if (!options->use_color || external_diff())
+		options->color_moved = 0;
 }
 
 static int opt_arg(const char *arg, int arg_short, const char *arg_long, int *val)
@@ -4073,6 +4290,10 @@ int diff_opt_parse(struct diff_options *options,
 	}
 	else if (!strcmp(arg, "--no-color"))
 		options->use_color = 0;
+	else if (!strcmp(arg, "--color-moved"))
+		options->color_moved = 1;
+	else if (!strcmp(arg, "--no-color-moved"))
+		options->color_moved = 0;
 	else if (!strcmp(arg, "--color-words")) {
 		options->use_color = 1;
 		options->word_diff = DIFF_WORDS_COLOR;
@@ -4878,16 +5099,19 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 {
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
-	/*
-	 * For testing purposes we want to make sure the diff machinery
-	 * works completely with the buffer. If there is anything emitted
-	 * outside the emit_buffered_patch_line, then the order is screwed
-	 * up and the tests will fail.
-	 *
-	 * TODO (later in this series):
-	 * We'll unset this flag in a later patch.
-	 */
-	o->use_buffer = 1;
+
+	if (o->color_moved) {
+		unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+		o->use_buffer = 1;
+		o->deleted_lines = xmallocz(sizeof(*o->deleted_lines));
+		o->added_lines = xmallocz(sizeof(*o->added_lines));
+		hashmap_init(o->deleted_lines, ignore_ws ?
+			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+			(hashmap_cmp_fn)moved_entry_cmp, 0);
+		hashmap_init(o->added_lines, ignore_ws ?
+			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+			(hashmap_cmp_fn)moved_entry_cmp, 0);
+	}
 
 	if (o->use_buffer) {
 		ALLOC_GROW(o->filepair_buffer,
@@ -4902,6 +5126,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 	}
 
 	if (o->use_buffer) {
+		o->current_filepair = &o->filepair_buffer[0];
+		for (i = 0; i < o->line_buffer_nr; i++)
+			add_line_to_move_detection(o, i);
+
 		o->current_filepair = &o->filepair_buffer[0];
 		for (i = 0; i < o->line_buffer_nr; i++)
 			process_next_buffered_patch_line(o, i);
@@ -4992,6 +5220,7 @@ void diff_flush(struct diff_options *options)
 		if (!options->file)
 			die_errno("Could not open /dev/null");
 		options->close_file = 1;
+		options->color_moved = 0;
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
 			if (check_pair_status(p))
diff --git a/diff.h b/diff.h
index c334aac02e..b83d6fefcc 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "hashmap.h"
 
 struct rev_info;
 struct diff_options;
@@ -145,6 +146,8 @@ struct buffered_filepair {
 	unsigned ws_rule;
 };
 
+struct moved_entry;
+
 struct diff_options {
 	const char *orderfile;
 	const char *pickaxe;
@@ -217,6 +220,8 @@ struct diff_options {
 
 	int diff_path_counter;
 
+	/* Determines color moved code. Flipped between 1, 2 for alt. color. */
+	int color_moved;
 	int use_buffer;
 
 	struct buffered_patch_line *line_buffer;
@@ -225,6 +230,16 @@ struct diff_options {
 	struct buffered_filepair *filepair_buffer;
 	int filepair_buffer_nr, filepair_buffer_alloc;
 	struct buffered_filepair *current_filepair;
+
+	/* built up in the first pass: */
+	struct hashmap *deleted_lines;
+	struct hashmap *added_lines;
+	/* needed for building up */
+	struct moved_entry *prev_line;
+
+	/* state in the second pass */
+	struct moved_entry **pmb; /* potentially moved blocks */
+	int pmb_nr, pmb_alloc;
 };
 
 void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
@@ -241,7 +256,11 @@ enum color_diff {
 	DIFF_FILE_NEW = 5,
 	DIFF_COMMIT = 6,
 	DIFF_WHITESPACE = 7,
-	DIFF_FUNCINFO = 8
+	DIFF_FUNCINFO = 8,
+	DIFF_FILE_OLD_MOVED = 9,
+	DIFF_FILE_OLD_MOVED_ALT = 10,
+	DIFF_FILE_NEW_MOVED = 11,
+	DIFF_FILE_NEW_MOVED_ALT = 12
 };
 const char *diff_get_color(int diff_use_color, enum color_diff ix);
 #define diff_get_color_opt(o, ix) \
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 289806d0c7..232d9ad55e 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -972,4 +972,233 @@ test_expect_success 'option overrides diff.wsErrorHighlight' '
 
 '
 
+test_expect_success 'detect moved code, complete file' '
+	git reset --hard &&
+	cat <<-\EOF >test.c &&
+	#include<stdio.h>
+	main()
+	{
+	printf("Hello World");
+	}
+	EOF
+	git add test.c &&
+	git commit -m "add main function" &&
+	git mv test.c main.c &&
+	git diff HEAD --color-moved --no-renames | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>new file mode 100644<RESET>
+	<BOLD>index 0000000..a986c57<RESET>
+	<BOLD>--- /dev/null<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -0,0 +1,5 @@<RESET>
+	<BGREEN>+<RESET><BGREEN>#include<stdio.h><RESET>
+	<BGREEN>+<RESET><BGREEN>main()<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BGREEN>+<RESET><BGREEN>printf("Hello World");<RESET>
+	<BGREEN>+<RESET><BGREEN>}<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>deleted file mode 100644<RESET>
+	<BOLD>index a986c57..0000000<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ /dev/null<RESET>
+	<CYAN>@@ -1,5 +0,0 @@<RESET>
+	<BRED>-#include<stdio.h><RESET>
+	<BRED>-main()<RESET>
+	<BRED>-{<RESET>
+	<BRED>-printf("Hello World");<RESET>
+	<BRED>-}<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'detect moved code, inside file' '
+	git reset --hard &&
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			if (!u->is_allowed_foo)
+				return;
+			foo(u);
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git add main.c test.c &&
+	git commit -m "add main and test file" &&
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			if (!u->is_allowed_foo)
+				return;
+			foo(u);
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>index 27a619c..7cf9336 100644<RESET>
+	<BOLD>--- a/main.c<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
+	 printf("World\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BRED>-int secure_foo(struct user *u)<RESET>
+	<BRED>-{<RESET>
+	<BRED>-if (!u->is_allowed_foo)<RESET>
+	<BRED>-return;<RESET>
+	<BRED>-foo(u);<RESET>
+	<BRED>-}<RESET>
+	<BRED>-<RESET>
+	 int main()<RESET>
+	 {<RESET>
+	 foo();<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>index 1dc1d85..e34eb69 100644<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ b/test.c<RESET>
+	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
+	 printf("Hello World, but different\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
+	<BGREEN>+<RESET><BGREEN>return;<RESET>
+	<BGREEN>+<RESET><BGREEN>foo(u);<RESET>
+	<BGREEN>+<RESET><BGREEN>}<RESET>
+	<BGREEN>+<RESET>
+	 int another_function()<RESET>
+	 {<RESET>
+	 bar();<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'detect permutations inside moved code, ' '
+	# reusing the move example from last test:
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			foo(u);
+			if (!u->is_allowed_foo)
+				return;
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>index 27a619c..7cf9336 100644<RESET>
+	<BOLD>--- a/main.c<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
+	 printf("World\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BRED>-int secure_foo(struct user *u)<RESET>
+	<BRED>-{<RESET>
+	<BOLD;RED>-if (!u->is_allowed_foo)<RESET>
+	<BOLD;RED>-return;<RESET>
+	<BRED>-foo(u);<RESET>
+	<BOLD;RED>-}<RESET>
+	<BOLD;RED>-<RESET>
+	 int main()<RESET>
+	 {<RESET>
+	 foo();<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>index 1dc1d85..2bedec9 100644<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ b/test.c<RESET>
+	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
+	 printf("Hello World, but different\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BOLD;GREEN>+<RESET><BOLD;GREEN>foo(u);<RESET>
+	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
+	<BGREEN>+<RESET><BGREEN>return;<RESET>
+	<BOLD;GREEN>+<RESET><BOLD;GREEN>}<RESET>
+	<BOLD;GREEN>+<RESET>
+	 int another_function()<RESET>
+	 {<RESET>
+	 bar();<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.13.0.18.g183880de0a


* Re: [PATCH 18/19] diff: buffer all output if asked to
  2017-05-14  4:01 ` [PATCH 18/19] diff: buffer all output if asked to Stefan Beller
@ 2017-05-14  4:06   ` Jeff King
  2017-05-14  4:25     ` Stefan Beller
  2017-05-16  4:14   ` Jonathan Tan
  1 sibling, 1 reply; 128+ messages in thread
From: Jeff King @ 2017-05-14  4:06 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy, gitster, mhagger, jrnieder, bmwill

On Sat, May 13, 2017 at 09:01:16PM -0700, Stefan Beller wrote:

> +		for (i = 0; i < o->line_buffer_nr; i++);
> +			free((void*)o->line_buffer[i].line);

I haven't looked at the patches yet, but this ";" on the for line is
almost certainly a typo (gcc catches it due to the misleading
indentation of the next line).
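
To make the effect concrete, here is a minimal sketch (not code from
the series; the function name is made up for illustration). With the
stray ";" the loop body is empty, so the statement on the next line
runs exactly once, after the loop, with i equal to the loop bound. In
the quoted hunk that means free() is called once for index
line_buffer_nr and never for the entries before it:

```c
#include <assert.h>
#include <stddef.h>

/*
 * What was written (note the ';' ending the for line):
 *
 *	for (i = 0; i < nr; i++);
 *		free(buf[i]);
 *
 * What the compiler parses: a loop with an empty body, followed by
 * one unconditional statement.  This hypothetical helper returns the
 * index that single statement would see.
 */
static size_t index_seen_after_empty_loop(size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		; /* empty body: this is all the loop does */
	return i; /* always equal to nr */
}
```

So for three buffered lines the "body" sees index 3 only, and indices
0..2 are never visited.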

-Peff

* Re: [PATCH 18/19] diff: buffer all output if asked to
  2017-05-14  4:06   ` Jeff King
@ 2017-05-14  4:25     ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-14  4:25 UTC (permalink / raw)
  To: Jeff King
  Cc: git@vger.kernel.org, Jonathan Tan, Junio C Hamano,
	Michael Haggerty, Jonathan Nieder, Brandon Williams

On Sat, May 13, 2017 at 9:06 PM, Jeff King <peff@peff.net> wrote:
> On Sat, May 13, 2017 at 09:01:16PM -0700, Stefan Beller wrote:
>
>> +             for (i = 0; i < o->line_buffer_nr; i++);
>> +                     free((void*)o->line_buffer[i].line);
>
> I haven't looked at the patches yet, but this ";" on the for line is
> almost certainly a typo (gcc catches it due to the misleading
> indentation of the next line).

Grr  :/

I have spent hours trying to figure out why this does not work,
questioning the design, my mental model of how pointers work
and programming in general.

/me should get gcc 6 and set  -Wmisleading-indentation

Thanks,
Stefan

>
> -Peff

* Re: [PATCH 02/19] diff: move line ending check into emit_hunk_header
  2017-05-14  4:01 ` [PATCH 02/19] diff: move line ending check into emit_hunk_header Stefan Beller
@ 2017-05-15  6:48   ` Junio C Hamano
  2017-05-15 16:13     ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Junio C Hamano @ 2017-05-15  6:48 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy, peff, mhagger, jrnieder, bmwill

Stefan Beller <sbeller@google.com> writes:

> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This patch moves code that is conceptually part of
> emit_hunk_header, but was using output in fn_out_consume,
> back to emit_hunk_header.

Makes sort-of sense.  If I were selling this patch, I'd remove the
first two paragraphs and stress how completing the line inside
emit_hunk_header() is conceptually cleaner than doing it outside.

	emit_hunk_header() function is responsible for assembling a
	hunk header and calling emit_line() to send the hunk header
	to the output file.  Its only caller fn_out_consume() needs
	to prepare for a case where the function emits an incomplete
	line and add the terminating LF.  

	Instead, make sure emit_hunk_header() always sends a
	completed line to emit_line().

or something like that.

Note that I am not saying "buffering the entire diff in-core?  why
should we support such a use case?".  I am saying that this change
is a clean-up that is justifiable, without having to answer such a
question.
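
As an illustration of "always send a completed line", here is a
standalone sketch of what completing a line amounts to.  It is not
the strbuf code; complete_line() and its fixed-size buffer interface
are invented for this example, and it assumes the buffer has room for
one more byte plus the terminating NUL:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Append '\n' to buf[0..len) unless the buffer is empty or already
 * ends in '\n'.  'cap' is the capacity of buf.  Returns the new
 * length.
 */
static size_t complete_line(char *buf, size_t len, size_t cap)
{
	if (len && buf[len - 1] != '\n') {
		assert(len + 1 < cap); /* room for '\n' and NUL */
		buf[len++] = '\n';
		buf[len] = '\0';
	}
	return len;
}
```

With the completion done before the hunk header is handed over, the
caller no longer has to inspect the last byte and emit the LF itself.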

>
> Meanwhile simplify it by using a function that is designed for it.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 3f5bf8b5a4..c2ed605cd0 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
>  	}
>  
>  	strbuf_add(&msgbuf, line + len, org_len - len);
> +	strbuf_complete_line(&msgbuf);
> +
>  	emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
>  	strbuf_release(&msgbuf);
>  }
> @@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
>  		len = sane_truncate_line(ecbdata, line, len);
>  		find_lno(line, ecbdata);
>  		emit_hunk_header(ecbdata, line, len);
> -		if (line[len-1] != '\n')
> -			putc('\n', o->file);
>  		return;
>  	}

* Re: [RFC PATCH 00/19] Diff machine: highlight moved lines.
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (18 preceding siblings ...)
  2017-05-14  4:01 ` [PATCH 19/19] diff.c: color moved lines differently Stefan Beller
@ 2017-05-15 12:43 ` Junio C Hamano
  2017-05-15 16:33   ` Stefan Beller
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
  20 siblings, 1 reply; 128+ messages in thread
From: Junio C Hamano @ 2017-05-15 12:43 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy, peff, mhagger, jrnieder, bmwill

Stefan Beller <sbeller@google.com> writes:

> For details on *why* see the commit message of the last commit.

Luckily, we have a good test case in flight to see how effective this
approach is.  Running

  $ git diff master...'pu^{/Merge branch .js/blame-lib}'^2

with your new feature should tell us that the bulk of blame.[ch], which
are new files, came from builtin/blame.c with some symbols renamed.

;-)

* Re: [PATCH 02/19] diff: move line ending check into emit_hunk_header
  2017-05-15  6:48   ` Junio C Hamano
@ 2017-05-15 16:13     ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-15 16:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jonathan Tan, Jeff King, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Sun, May 14, 2017 at 11:48 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> In a later patch, I want to propose an option to detect&color
>> moved lines in a diff, which cannot be done in a one-pass over
>> the diff. Instead we need to go over the whole diff twice,
>> because we cannot detect the first line of the two corresponding
>> lines (+ and -) that got moved.
>>
>> So to prepare the diff machinery for two pass algorithms
>> (i.e. buffer it all up and then operate on the result),
>> move all emissions to places, such that the only emitting
>> function is emit_line_0.
>>
>> This patch moves code that is conceptually part of
>> emit_hunk_header, but was using output in fn_out_consume,
>> back to emit_hunk_header.
>
> Makes sort-of sense.  If I were selling this patch, I'd remove the
> first two paragraphs and stress how completing the line inside
> emit_hunk_header() is conceptually cleaner than doing it outside.
>
>         emit_hunk_header() function is responsible for assembling a
>         hunk header and calling emit_line() to send the hunk header
>         to the output file.  Its only caller fn_out_consume() needs
>         to prepare for a case where the function emits an incomplete
>         line and add the terminating LF.
>
>         Instead, make sure emit_hunk_header() always sends a
>         completed line to emit_line().
>
> or something like that.
>
> Note that I am not saying "buffering the entire diff in-core?  why
> should we support such a use case?".  I am saying that this change
> is a clean-up that is justifiable, without having to answer such a
> question.

Right, the first couple of patches are more cleanup than preparation.
I considered sending them on their own, but then decided to include
them in this series instead.

I'll reword the commit message for a resend.

Thanks,
Stefan

* Re: [RFC PATCH 00/19] Diff machine: highlight moved lines.
  2017-05-15 12:43 ` [RFC PATCH 00/19] Diff machine: highlight moved lines Junio C Hamano
@ 2017-05-15 16:33   ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-15 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jonathan Tan, Jeff King, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Mon, May 15, 2017 at 5:43 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> For details on *why* see the commit message of the last commit.
>
> Luckily, we have a good test case in flight to see how effective this
> approach is.  Running
>
>   $ git diff master...'pu^{/Merge branch .js/blame-lib}'^2
>
> with your new feature should tell us that the bulk of blame.[ch], which
> are new files, came from builtin/blame.c with some symbols renamed.

Oh! Yeah, that looks nice. Though looking through that[1] it seems not yet
optimal to me.

So we have 2 additional colors for moved code to differentiate between
adjacent moved blocks. However in this implementation we toggle between
these two colors even if we're not adjacent, such that the moved_alternative
color shows up quite frequently.

So if we have normal lines in between, we may want to reset to "default"
moved color.
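
A rough sketch of that idea, under assumptions: this is not what the
patch does today, and the enum, the function name and the "block id
per line" input are all invented for illustration.  The alternate
color is used only when a moved block directly follows a different
moved block; any non-moved line in between resets to the default
moved color:

```c
#include <assert.h>
#include <stddef.h>

enum line_color { PLAIN, MOVED, MOVED_ALT };

/*
 * block[i] is 0 for a line that is not part of a moved block, or a
 * positive id naming the moved block that line i belongs to.
 */
static void color_moved_lines(const int *block, enum line_color *out,
			      size_t nr)
{
	enum line_color cur = MOVED;
	int prev = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!block[i]) {
			out[i] = PLAIN;
			cur = MOVED; /* reset: next block starts at default */
		} else {
			/* flip only when two different blocks touch */
			if (prev && prev != block[i])
				cur = (cur == MOVED) ? MOVED_ALT : MOVED;
			out[i] = cur;
		}
		prev = block[i];
	}
}
```

For block ids 1 1 2 2 0 3 3 this gives default, default, alt, alt,
plain, default, default, i.e. block 3 does not inherit a toggle from
the blocks before the plain line.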

[1] http://i.imgur.com/djAoTf8.png

Thanks for pointing out this series,
Stefan

* Re: [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0
  2017-05-14  4:01 ` [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0 Stefan Beller
@ 2017-05-15 18:26   ` Jonathan Tan
  2017-05-15 18:33     ` Stefan Beller
  2017-05-15 19:22   ` Brandon Williams
  1 sibling, 1 reply; 128+ messages in thread
From: Jonathan Tan @ 2017-05-15 18:26 UTC (permalink / raw)
  To: Stefan Beller, git; +Cc: peff, gitster, mhagger, jrnieder, bmwill

On 05/13/2017 09:01 PM, Stefan Beller wrote:
> In 250f79930d (diff.c: split emit_line() from the first char and the rest
> of the line, 2009-09-14) we introduced the local variable 'nofirst' that
> indicates if we have no first sign character. With the given implementation
> we had to use an extra variable unlike reusing 'first' because the lines
> first character could be '\0'.
>
> Change the meaning of the 'first' argument to not mean the first character
> of the line, but rather just containing the sign that is prepended to the
> line. Refactor emit_line to not include the lines first character, but pass
> the complete line as well as a '\0' sign, which now serves as an indication
> not to print a sign.
>
> With this patch other callers hard code the sign (which are '+', '-',
> ' ' and '\\') such that we do not run into unexpectedly emitting an
> error-nous '\0'.

"erroneous"?

I also don't understand the meaning of this paragraph - if you mean that 
this patch teaches other callers to hardcode the sign, I don't see any 
such changes in the diff below.

After reading the patch below, would this commit message be better:

[begin]
diff.c: teach emit_line_0 to accept sign parameter

Instead of a separate "first" parameter representing the first character 
of the line to be printed, make emit_line_0 take an optional "sign" 
parameter specifically intended to hold the sign of the line. Callers 
that store the sign and line separately can use the "sign" parameter 
like they used the "first" parameter previously, and callers that store 
the sign and line together (or do not have a sign) no longer need to 
manipulate their arguments to fit the requirements of emit_line_0.

(And then mention that you have checked all the callers and that none of 
them send '\n' or '\r' as the sign, as you have done in this version.)
[end]
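
For readers following along, the semantics described above can be
sketched in standalone C.  This is a simplified, hypothetical
stand-in, not the code in the patch: format_line() and append() are
invented names, output goes into a caller-supplied buffer instead of
o->file, and diff_line_prefix() is left out.  A sign of 0 means "no
sign", and set/reset may be NULL, as a later patch in the series
allows:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static size_t append(char *out, size_t n, const char *s, size_t len)
{
	memcpy(out + n, s, len);
	return n + len;
}

/* Writes [set][sign][line body][reset][\r][\n]; returns bytes written. */
static size_t format_line(char *out, size_t cap,
			  const char *set, const char *reset,
			  int sign, const char *line, size_t len)
{
	size_t set_len = set ? strlen(set) : 0;
	size_t reset_len = reset ? strlen(reset) : 0;
	size_t n = 0;
	int has_nl, has_cr;

	/* Strip the line ending so the reset lands before it. */
	has_nl = (len > 0 && line[len - 1] == '\n');
	if (has_nl)
		len--;
	has_cr = (len > 0 && line[len - 1] == '\r');
	if (has_cr)
		len--;

	/* sign + "\r\n" + NUL account for the extra 4 bytes */
	assert(set_len + reset_len + len + 4 <= cap);

	if (len || sign) {
		if (set)
			n = append(out, n, set, set_len);
		if (sign)
			out[n++] = (char)sign;
		n = append(out, n, line, len);
		if (reset)
			n = append(out, n, reset, reset_len);
	}
	if (has_cr)
		out[n++] = '\r';
	if (has_nl)
		out[n++] = '\n';
	out[n] = '\0';
	return n;
}
```

A caller that keeps the sign separately passes '+', '-' or ' ' as
sign; a caller that already has the whole line passes 0 and the line
unchanged, which is what the new emit_line() wrapper does.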

>
> The audit of the caller revealed that the sign cannot be '\n' or '\r',
> so remove that condition for trailing newline or carriage return in the
> sign; the else part of the condition handles the len==0 perfectly,
> so we can drop the if/else construct.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 40 +++++++++++++++++-----------------------
>  1 file changed, 17 insertions(+), 23 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index c2ed605cd0..4269b8dccf 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -517,33 +517,24 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
>  }
>
>  static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
> -			int first, const char *line, int len)
> +			int sign, const char *line, int len)
>  {
>  	int has_trailing_newline, has_trailing_carriage_return;
> -	int nofirst;
>  	FILE *file = o->file;
>
>  	fputs(diff_line_prefix(o), file);
>
> -	if (len == 0) {
> -		has_trailing_newline = (first == '\n');
> -		has_trailing_carriage_return = (!has_trailing_newline &&
> -						(first == '\r'));
> -		nofirst = has_trailing_newline || has_trailing_carriage_return;
> -	} else {
> -		has_trailing_newline = (len > 0 && line[len-1] == '\n');
> -		if (has_trailing_newline)
> -			len--;
> -		has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
> -		if (has_trailing_carriage_return)
> -			len--;
> -		nofirst = 0;
> -	}
> +	has_trailing_newline = (len > 0 && line[len-1] == '\n');
> +	if (has_trailing_newline)
> +		len--;
> +	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
> +	if (has_trailing_carriage_return)
> +		len--;
>
> -	if (len || !nofirst) {
> +	if (len || sign) {
>  		fputs(set, file);
> -		if (!nofirst)
> -			fputc(first, file);
> +		if (sign)
> +			fputc(sign, file);
>  		fwrite(line, len, 1, file);
>  		fputs(reset, file);
>  	}
> @@ -556,7 +547,7 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
>  static void emit_line(struct diff_options *o, const char *set, const char *reset,
>  		      const char *line, int len)
>  {
> -	emit_line_0(o, set, reset, line[0], line+1, len-1);
> +	emit_line_0(o, set, reset, 0, line, len);
>  }

Maybe this function is unnecessary now that emit_line_0 can take the 
line directly.

>
>  static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
> @@ -4822,9 +4813,12 @@ void diff_flush(struct diff_options *options)
>
>  	if (output_format & DIFF_FORMAT_PATCH) {
>  		if (separator) {
> -			fprintf(options->file, "%s%c",
> -				diff_line_prefix(options),
> -				options->line_termination);
> +			char term[2];
> +			term[0] = options->line_termination;
> +			term[1] = '\0';
> +
> +			emit_line(options, NULL, NULL,
> +				  term, 1);

If options->line_termination is 0, this is actually a zero-length string 
(not 1).

>  			if (options->stat_sep) {
>  				/* attach patch instead of inline */
>  				fputs(options->stat_sep, options->file);
>

* Re: [PATCH 05/19] diff.c: emit_line_0 can handle no color setting
  2017-05-14  4:01 ` [PATCH 05/19] diff.c: emit_line_0 can handle no color setting Stefan Beller
@ 2017-05-15 18:31   ` Jonathan Tan
  2017-05-15 22:11     ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Jonathan Tan @ 2017-05-15 18:31 UTC (permalink / raw)
  To: Stefan Beller, git; +Cc: peff, gitster, mhagger, jrnieder, bmwill

On 05/13/2017 09:01 PM, Stefan Beller wrote:
> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> In later patches we may pass lines that are not colored to
> the central function emit_line_0, so we
> need to emit the color only when it is non-NULL.

The diff below seems to just make emit_line_0 allow NULL for set and 
reset, unlike what the commit message above describes. (And is that 
necessary? Couldn't the caller just pass "" if they don't want any 
setting and/or resetting?)

>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 381b572d76..48f0fb98dc 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -532,11 +532,13 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
>  		len--;
>
>  	if (len || sign) {
> -		fputs(set, file);
> +		if (set)
> +			fputs(set, file);
>  		if (sign)
>  			fputc(sign, file);
>  		fwrite(line, len, 1, file);
> -		fputs(reset, file);
> +		if (reset)
> +			fputs(reset, file);
>  	}
>  	if (has_trailing_carriage_return)
>  		fputc('\r', file);
>

* Re: [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0
  2017-05-15 18:26   ` Jonathan Tan
@ 2017-05-15 18:33     ` Stefan Beller
  2017-05-16 16:05       ` Jonathan Tan
  0 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-15 18:33 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Mon, May 15, 2017 at 11:26 AM, Jonathan Tan <jonathantanmy@google.com> wrote:

> "erroneous"?

yep, words are hard.

>
> I also don't understand the meaning of this paragraph - if you mean that
> this patch teaches other callers to hardcode the sign, I don't see any such
> changes in the diff below.

The last two hunks of the patch switch two callers that call with a sign
that is hard to reason about.

> After reading the patch below, would this commit message be better:
>
> [begin]
> diff.c: teach emit_line_0 to accept sign parameter
>
> Instead of a separate "first" parameter representing the first character of
> the line to be printed, make emit_line_0 take an optional "sign" parameter
> specifically intended to hold the sign of the line. Callers that store the
> sign and line separately can use the "sign" parameter like they used the
> "first" parameter previously, and callers that store the sign and line
> together (or do not have a sign) no longer need to manipulate their
> arguments to fit the requirements of emit_line_0.
>
> (And then mention that you have checked all the callers and that none of
> them send '\n' or '\r' as the sign, as you have done in this version.)
> [end]

That describes the situation better, indeed.

>> @@ -556,7 +547,7 @@ static void emit_line_0(struct diff_options *o, const
>> char *set, const char *res
>>  static void emit_line(struct diff_options *o, const char *set, const char
>> *reset,
>>                       const char *line, int len)
>>  {
>> -       emit_line_0(o, set, reset, line[0], line+1, len-1);
>> +       emit_line_0(o, set, reset, 0, line, len);
>>  }
>
>
> Maybe this function is unnecessary now that emit_line_0 can take the line
> directly.

That would produce a lot of code churn, specifically in later patches;
but I can remove that function if anyone feels strongly about it.

>> +                       char term[2];
>> +                       term[0] = options->line_termination;
>> +                       term[1] = '\0';
>> +
>> +                       emit_line(options, NULL, NULL,
>> +                                 term, 1);
>
>
> If options->line_termination is 0, this is actually a zero-length string
> (not 1).

So passing in !!options->line_termination should be fine?
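
To spell out what that expression evaluates to, a small standalone
sketch (terminator_string() is a made-up helper, not something in
diff.c): "!!x" is 1 for any non-zero x and 0 for x == 0, so the
length matches the C string stored in term[] in both cases.  Whether
a length of 0 is the desired behaviour when the terminator is NUL is
a separate question this sketch does not answer:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill term[] with the one-character terminator string and return
 * the length to pass along with it: 1 for a real terminator such as
 * '\n', 0 when line_termination is NUL.
 */
static size_t terminator_string(int line_termination, char term[2])
{
	term[0] = (char)line_termination;
	term[1] = '\0';
	return !!line_termination;
}
```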

Thanks,
Stefan

* Re: [PATCH 08/19] diff.c: convert builtin_diff to use emit_line_*
  2017-05-14  4:01 ` [PATCH 08/19] diff.c: convert builtin_diff " Stefan Beller
@ 2017-05-15 18:42   ` Jonathan Tan
  0 siblings, 0 replies; 128+ messages in thread
From: Jonathan Tan @ 2017-05-15 18:42 UTC (permalink / raw)
  To: Stefan Beller, git; +Cc: peff, gitster, mhagger, jrnieder, bmwill

On 05/13/2017 09:01 PM, Stefan Beller wrote:
> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This covers builtin_diff.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 43 ++++++++++++++++++++++++++-----------------
>  1 file changed, 26 insertions(+), 17 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 93343a9ccc..8e00010bf4 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1293,8 +1293,9 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
>  	o->found_changes = 1;
>
>  	if (ecbdata->header) {
> -		fprintf(o->file, "%s", ecbdata->header->buf);
> -		strbuf_reset(ecbdata->header);
> +		emit_line(o, NULL, NULL,
> +			  ecbdata->header->buf, ecbdata->header->len);
> +		strbuf_release(ecbdata->header);
>  		ecbdata->header = NULL;
>  	}
>
> @@ -2407,7 +2408,7 @@ static void builtin_diff(const char *name_a,
>  	b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
>  	lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
>  	lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
> -	strbuf_addf(&header, "%s%sdiff --git %s %s%s\n", line_prefix, meta, a_one, b_two, reset);
> +	strbuf_addf(&header, "%sdiff --git %s %s%s\n", meta, a_one, b_two, reset);
>  	if (lbl[0][0] == '/') {
>  		/* /dev/null */
>  		strbuf_addf(&header, "%s%snew file mode %06o%s\n", line_prefix, meta, two->mode, reset);
> @@ -2439,7 +2440,7 @@ static void builtin_diff(const char *name_a,
>  		if (complete_rewrite &&
>  		    (textconv_one || !diff_filespec_is_binary(one)) &&
>  		    (textconv_two || !diff_filespec_is_binary(two))) {
> -			fprintf(o->file, "%s", header.buf);
> +			emit_line(o, NULL, NULL, header.buf, header.len);
>  			strbuf_reset(&header);
>  			emit_rewrite_diff(name_a, name_b, one, two,
>  						textconv_one, textconv_two, o);
> @@ -2449,7 +2450,8 @@ static void builtin_diff(const char *name_a,
>  	}
>
>  	if (o->irreversible_delete && lbl[1][0] == '/') {
> -		fprintf(o->file, "%s", header.buf);
> +		if (header.len)
> +			emit_line(o, NULL, NULL, header.buf, header.len);

This used to unconditionally output diff_line_prefix(o), but now outputs
it only if the non-prefix part is non-blank. Is that expected? (The same
comment applies below.)

>  		strbuf_reset(&header);
>  		goto free_ab_and_return;
>  	} else if (!DIFF_OPT_TST(o, TEXT) &&
> @@ -2459,13 +2461,16 @@ static void builtin_diff(const char *name_a,
>  		    S_ISREG(one->mode) && S_ISREG(two->mode) &&
>  		    !DIFF_OPT_TST(o, BINARY)) {
>  			if (!oidcmp(&one->oid, &two->oid)) {
> -				if (must_show_header)
> -					fprintf(o->file, "%s", header.buf);
> +				if (must_show_header && header.len)
> +					emit_line(o, NULL, NULL,
> +						  header.buf, header.len);
>  				goto free_ab_and_return;
>  			}
> -			fprintf(o->file, "%s", header.buf);
> -			fprintf(o->file, "%sBinary files %s and %s differ\n",
> -				line_prefix, lbl[0], lbl[1]);
> +			if (header.len)
> +				emit_line(o, NULL, NULL,
> +					  header.buf, header.len);
> +			emit_line_fmt(o, 0, 0, "Binary files %s and %s differ\n",
> +				      lbl[0], lbl[1]);
>  			goto free_ab_and_return;
>  		}
>  		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
> @@ -2473,17 +2478,21 @@ static void builtin_diff(const char *name_a,
>  		/* Quite common confusing case */
>  		if (mf1.size == mf2.size &&
>  		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
> -			if (must_show_header)
> -				fprintf(o->file, "%s", header.buf);
> +			if (must_show_header && header.len)
> +				emit_line(o, NULL, NULL,
> +					  header.buf, header.len);
>  			goto free_ab_and_return;
>  		}
> -		fprintf(o->file, "%s", header.buf);
> +		if (header.len)
> +			emit_line(o, NULL, NULL,
> +				  header.buf, header.len);
>  		strbuf_reset(&header);
>  		if (DIFF_OPT_TST(o, BINARY))
>  			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
>  		else
> -			fprintf(o->file, "%sBinary files %s and %s differ\n",
> -				line_prefix, lbl[0], lbl[1]);
> +			emit_line_fmt(o, NULL, NULL,
> +				      "Binary files %s and %s differ\n",
> +				      lbl[0], lbl[1]);
>  		o->found_changes = 1;
>  	} else {
>  		/* Crazy xdl interfaces.. */
> @@ -2494,8 +2503,8 @@ static void builtin_diff(const char *name_a,
>  		struct emit_callback ecbdata;
>  		const struct userdiff_funcname *pe;
>
> -		if (must_show_header) {
> -			fprintf(o->file, "%s", header.buf);
> +		if (must_show_header && header.len) {
> +			emit_line(o, NULL, NULL, header.buf, header.len);
>  			strbuf_reset(&header);
>  		}
>
>


* Re: [PATCH 10/19] diff.c: convert emit_rewrite_lines to use emit_line_*
  2017-05-14  4:01 ` [PATCH 10/19] diff.c: convert emit_rewrite_lines " Stefan Beller
@ 2017-05-15 19:09   ` Jonathan Tan
  2017-05-15 19:31     ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Jonathan Tan @ 2017-05-15 19:09 UTC (permalink / raw)
  To: Stefan Beller, git; +Cc: peff, gitster, mhagger, jrnieder, bmwill



On 05/13/2017 09:01 PM, Stefan Beller wrote:
> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This covers emit_rewrite_lines.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/diff.c b/diff.c
> index e4b46fee4f..369c804f03 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -748,7 +748,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
>  	if (!endp) {
>  		const char *context = diff_get_color(ecb->color_diff,
>  						     DIFF_CONTEXT);
> -		putc('\n', ecb->opt->file);
> +		emit_line(ecb->opt, NULL, NULL, "\n", 1);

This outputs diff_line_prefix(ecb->opt) - is that OK?

>  		emit_line_0(ecb->opt, context, reset, '\\',
>  			    nneof, strlen(nneof));
>  	}
>


* Re: [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0
  2017-05-14  4:01 ` [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0 Stefan Beller
  2017-05-15 18:26   ` Jonathan Tan
@ 2017-05-15 19:22   ` Brandon Williams
  2017-05-15 19:35     ` Stefan Beller
  1 sibling, 1 reply; 128+ messages in thread
From: Brandon Williams @ 2017-05-15 19:22 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy, peff, gitster, mhagger, jrnieder

On 05/13, Stefan Beller wrote:
> In 250f79930d (diff.c: split emit_line() from the first char and the rest
> of the line, 2009-09-14) we introduced the local variable 'nofirst' that
> indicates if we have no first sign character. With the given implementation
> we had to use an extra variable instead of reusing 'first' because the
> line's first character could be '\0'.
> 
> Change the meaning of the 'first' argument to not mean the first character
> of the line, but rather just the sign that is prepended to the
> line. Refactor emit_line to not split off the line's first character, but
> pass the complete line as well as a '\0' sign, which now serves as an
> indication not to print a sign.
> 
> With this patch other callers hard code the sign (which is one of '+', '-',
> ' ' and '\\') such that we do not run into unexpectedly emitting an
> erroneous '\0'.
> 
> The audit of the callers revealed that the sign cannot be '\n' or '\r',
> so remove the check for a trailing newline or carriage return in the
> sign; the else part of the condition handles the len==0 case correctly,
> so we can drop the if/else construct.
> 
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 40 +++++++++++++++++-----------------------
>  1 file changed, 17 insertions(+), 23 deletions(-)
> 
> diff --git a/diff.c b/diff.c
> index c2ed605cd0..4269b8dccf 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -517,33 +517,24 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
>  }
>  
>  static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
> -			int first, const char *line, int len)
> +			int sign, const char *line, int len)
>  {
>  	int has_trailing_newline, has_trailing_carriage_return;
> -	int nofirst;
>  	FILE *file = o->file;
>  
>  	fputs(diff_line_prefix(o), file);
>  
> -	if (len == 0) {
> -		has_trailing_newline = (first == '\n');
> -		has_trailing_carriage_return = (!has_trailing_newline &&
> -						(first == '\r'));
> -		nofirst = has_trailing_newline || has_trailing_carriage_return;
> -	} else {
> -		has_trailing_newline = (len > 0 && line[len-1] == '\n');
> -		if (has_trailing_newline)
> -			len--;
> -		has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
> -		if (has_trailing_carriage_return)
> -			len--;
> -		nofirst = 0;
> -	}
> +	has_trailing_newline = (len > 0 && line[len-1] == '\n');
> +	if (has_trailing_newline)
> +		len--;
> +	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
> +	if (has_trailing_carriage_return)
> +		len--;

Is the order of newline/carriage return always the same?

>  
> -	if (len || !nofirst) {
> +	if (len || sign) {
>  		fputs(set, file);
> -		if (!nofirst)
> -			fputc(first, file);
> +		if (sign)
> +			fputc(sign, file);
>  		fwrite(line, len, 1, file);
>  		fputs(reset, file);
>  	}
> @@ -556,7 +547,7 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
>  static void emit_line(struct diff_options *o, const char *set, const char *reset,
>  		      const char *line, int len)
>  {
> -	emit_line_0(o, set, reset, line[0], line+1, len-1);
> +	emit_line_0(o, set, reset, 0, line, len);
>  }
>  
>  static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
> @@ -4822,9 +4813,12 @@ void diff_flush(struct diff_options *options)
>  
>  	if (output_format & DIFF_FORMAT_PATCH) {
>  		if (separator) {
> -			fprintf(options->file, "%s%c",
> -				diff_line_prefix(options),
> -				options->line_termination);
> +			char term[2];
> +			term[0] = options->line_termination;
> +			term[1] = '\0';
> +
> +			emit_line(options, NULL, NULL,
> +				  term, 1);
>  			if (options->stat_sep) {
>  				/* attach patch instead of inline */
>  				fputs(options->stat_sep, options->file);
> -- 
> 2.13.0.18.g183880de0a
> 

-- 
Brandon Williams


* Re: [PATCH 10/19] diff.c: convert emit_rewrite_lines to use emit_line_*
  2017-05-15 19:09   ` Jonathan Tan
@ 2017-05-15 19:31     ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-15 19:31 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Mon, May 15, 2017 at 12:09 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
>
>
> On 05/13/2017 09:01 PM, Stefan Beller wrote:
>>
>> In a later patch, I want to propose an option to detect&color
>> moved lines in a diff, which cannot be done in a one-pass over
>> the diff. Instead we need to go over the whole diff twice,
>> because we cannot detect the first line of the two corresponding
>> lines (+ and -) that got moved.
>>
>> So to prepare the diff machinery for two pass algorithms
>> (i.e. buffer it all up and then operate on the result),
>> move all emissions to places, such that the only emitting
>> function is emit_line_0.
>>
>> This covers emit_rewrite_lines.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  diff.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/diff.c b/diff.c
>> index e4b46fee4f..369c804f03 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -748,7 +748,7 @@ static void emit_rewrite_lines(struct emit_callback
>> *ecb,
>>         if (!endp) {
>>                 const char *context = diff_get_color(ecb->color_diff,
>>                                                      DIFF_CONTEXT);
>> -               putc('\n', ecb->opt->file);
>> +               emit_line(ecb->opt, NULL, NULL, "\n", 1);
>
>
> This outputs diff_line_prefix(ecb->opt) - is that OK?

It shows this area is not covered well by our test suite.

My first reaction was that this is not ok, but we'd have
to inspect the situation. It was introduced in
35e2d03c2c (Fix '\ No newline...' annotation in rewrite diffs,
2012-08-04).

And looking at the code of the function I think this is broken.

I wonder what the best way forward is for this patch series, as we'd
need to buffer the last line. Buffering should be fine since this is
a corner case; maybe something like:

diff --git a/diff.c b/diff.c
index 0f10736ee6..f46e52135d 100644
--- a/diff.c
+++ b/diff.c
@@ -1011,15 +1011,27 @@ static void add_line_count(struct strbuf *out,
int count)
 static void emit_rewrite_lines(struct emit_callback *ecb,
                               int prefix, const char *data, int size)
 {
-       const char *endp = NULL;
-       static const char *nneof = " No newline at end of file\n";
+       static const char *nneof = "\\ No newline at end of file\n";
        const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
+       struct strbuf sb = STRBUF_INIT;

        while (0 < size) {
                int len;

                endp = memchr(data, '\n', size);
-               len = endp ? (endp - data + 1) : size;
+               if (endp)
+                       len = endp - data + 1;
+               else {
+                       /* last line has no \n */
+                       while (0 < size) {
+                               strbuf_addch(&sb, *data);
+                               size -= len;
+                               data += len;
+                       }
+                       strbuf_addch(&sb, '\n');
+                       data = sb.buf;
+                       len = sb.len;
+               }
                if (prefix != '+') {
                        ecb->lno_in_preimage++;
                        emit_del_line(reset, ecb, data, len);
@@ -1030,12 +1042,12 @@ static void emit_rewrite_lines(struct
emit_callback *ecb,
                size -= len;
                data += len;
        }
-       if (!endp) {
+       if (sb.len) {
                const char *context = diff_get_color(ecb->color_diff,
                                                     DIFF_CONTEXT);
-               emit_line(ecb->opt, NULL, NULL, "\n", 1);
-               emit_line_0(ecb->opt, context, reset, '\\',
+               emit_line(ecb->opt, context, reset,
                            nneof, strlen(nneof));
+               strbuf_release(&sb);
        }
 }
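
To make the buffering idea concrete, here is a minimal, self-contained
sketch. It is not the code from this series: `struct out`,
`emit_full_line()` and `emit_rewrite_lines_sketch()` are hypothetical
stand-ins, and output goes to a fixed-size buffer rather than a FILE so
the result can be inspected. The only point is that an unterminated last
line is completed in a copy before it is emitted, so the prefix can never
land in the middle of a line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the diff output: appends to a fixed buffer. */
struct out {
	char buf[512];
	size_t len;
};

/* Emit one complete line: prefix, sign, then text that ends in '\n'. */
static void emit_full_line(struct out *o, const char *prefix, char sign,
			   const char *line, size_t len)
{
	size_t plen = strlen(prefix);

	assert(o->len + plen + 1 + len < sizeof(o->buf));
	memcpy(o->buf + o->len, prefix, plen);
	o->len += plen;
	o->buf[o->len++] = sign;
	memcpy(o->buf + o->len, line, len);
	o->len += len;
	o->buf[o->len] = '\0';
}

void emit_rewrite_lines_sketch(struct out *o, const char *prefix,
			       char sign, const char *data, size_t size)
{
	static const char nneof[] = " No newline at end of file\n";
	int missing_newline = 0;

	while (size > 0) {
		const char *endp = memchr(data, '\n', size);
		size_t len = endp ? (size_t)(endp - data) + 1 : size;

		if (endp) {
			emit_full_line(o, prefix, sign, data, len);
		} else {
			/* Last line has no '\n': complete it in a copy first. */
			char *copy = malloc(len + 1);

			assert(copy);
			memcpy(copy, data, len);
			copy[len] = '\n';
			emit_full_line(o, prefix, sign, copy, len + 1);
			free(copy);
			missing_newline = 1;
		}
		size -= len;
		data += len;
	}
	/* The marker is emitted as its own prefixed line, with sign '\\'. */
	if (missing_newline)
		emit_full_line(o, prefix, '\\', nneof, strlen(nneof));
}
```

For the input "a\nb" with prefix "| " and sign '-', this produces the
three lines "| -a", "| -b" and "| \ No newline at end of file".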


* Re: [PATCH 06/19] diff: add emit_line_fmt
  2017-05-14  4:01 ` [PATCH 06/19] diff: add emit_line_fmt Stefan Beller
@ 2017-05-15 19:31   ` Brandon Williams
  0 siblings, 0 replies; 128+ messages in thread
From: Brandon Williams @ 2017-05-15 19:31 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy, peff, gitster, mhagger, jrnieder

On 05/13, Stefan Beller wrote:
> In the following patches we'll convert all printing functions to use
> the emit_line_* family of functions.
> 
> Many of the printing functions to be converted are formatted. So offer
> a formatted function in the emit_line function family as well.
> 
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/diff.c b/diff.c
> index 48f0fb98dc..aef159a919 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -552,6 +552,20 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset
>  	emit_line_0(o, set, reset, 0, line, len);
>  }
>  
> +static void emit_line_fmt(struct diff_options *o,
> +			  const char *set, const char *reset,
> +			  const char *fmt, ...)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	va_list ap;
> +	va_start(ap, fmt);
> +	strbuf_vaddf(&sb, fmt, ap);
> +	va_end(ap);
> +
> +	emit_line(o, set, reset, sb.buf, sb.len);
> +	strbuf_release(&sb);
> +}
> +
>  static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
>  {
>  	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
> -- 
> 2.13.0.18.g183880de0a
> 

Since this is a new function that is marked static and has no callers
yet, this patch on its own should at least trigger an unused-function
warning (-Wunused-function in GCC and Clang), which breaks the build
under -Werror.
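
For readers less familiar with the pattern in the quoted patch, a
formatted wrapper around a line emitter can be sketched as follows. This
is an illustration only: `struct sink`, `sink_line()` and
`sink_line_fmt()` are hypothetical names, and a fixed-size buffer with
vsnprintf() stands in for git's strbuf. The helpers are deliberately not
marked static here, so compiling the snippet on its own does not produce
an unused-function warning.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink standing in for emit_line(): records prefix + line. */
struct sink {
	char buf[256];
};

void sink_line(struct sink *s, const char *prefix,
	       const char *line, size_t len)
{
	snprintf(s->buf, sizeof(s->buf), "%s%.*s", prefix, (int)len, line);
}

/* Format into a temporary buffer, then hand the result to the emitter. */
void sink_line_fmt(struct sink *s, const char *prefix, const char *fmt, ...)
{
	char tmp[128];
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(tmp, sizeof(tmp), fmt, ap);
	va_end(ap);

	/* Sketch only: a real implementation would grow the buffer. */
	assert(n >= 0 && (size_t)n < sizeof(tmp));
	sink_line(s, prefix, tmp, (size_t)n);
}
```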

-- 
Brandon Williams


* Re: [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0
  2017-05-15 19:22   ` Brandon Williams
@ 2017-05-15 19:35     ` Stefan Beller
  2017-05-15 19:45       ` Brandon Williams
  0 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-15 19:35 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git@vger.kernel.org, Jonathan Tan, Jeff King, Junio C Hamano,
	Michael Haggerty, Jonathan Nieder

On Mon, May 15, 2017 at 12:22 PM, Brandon Williams <bmwill@google.com> wrote:

> Is the order of newline/carriage return always the same?

https://en.wikipedia.org/wiki/Newline

There are operating systems that like it the other way round.
The BBC micro is no longer relevant (IMHO), but RISC OS
spooled text output *may* be relevant as they released a stable
version not that long ago.

But I would think this code would have issues with RISC OS
text spooling without this patch as well.
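
For reference, the check order used in the patch can be isolated like
this. `strip_eol()` is a hypothetical helper, not a function in diff.c;
it strips '\n' first and '\r' second, so a "\r\n" ending loses both
bytes while a "\n\r" ending only loses the trailing '\r'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Strip one trailing '\n' and then one trailing '\r', in that order.
 * Returns the remaining length and reports what was stripped.
 */
size_t strip_eol(const char *line, size_t len, int *has_nl, int *has_cr)
{
	assert(line && has_nl && has_cr);

	*has_nl = (len > 0 && line[len - 1] == '\n');
	if (*has_nl)
		len--;
	*has_cr = (len > 0 && line[len - 1] == '\r');
	if (*has_cr)
		len--;
	return len;
}
```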

Stefan


* Re: [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0
  2017-05-15 19:35     ` Stefan Beller
@ 2017-05-15 19:45       ` Brandon Williams
  0 siblings, 0 replies; 128+ messages in thread
From: Brandon Williams @ 2017-05-15 19:45 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jonathan Tan, Jeff King, Junio C Hamano,
	Michael Haggerty, Jonathan Nieder

On 05/15, Stefan Beller wrote:
> On Mon, May 15, 2017 at 12:22 PM, Brandon Williams <bmwill@google.com> wrote:
> 
> > Is the order of newline/carriage return always the same?
> 
> https://en.wikipedia.org/wiki/Newline
> 
> There are operating systems that like it the other way round.
> The BBC micro is no longer relevant (IMHO), but RISC OS
> spooled text output *may* be relevant as they released a stable
> version not that long ago.
> 
> But I would think this code would have issues with RISC OS
> text spooling without this patch as well.
> 
> Stefan

Fair enough, it's not relevant to the series.  I was just pointing it
out.

-- 
Brandon Williams


* Re: [PATCH 15/19] diff.c: convert diff_flush to use emit_line_*
  2017-05-14  4:01 ` [PATCH 15/19] diff.c: convert diff_flush " Stefan Beller
@ 2017-05-15 20:21   ` Jonathan Tan
  2017-05-15 22:08     ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Jonathan Tan @ 2017-05-15 20:21 UTC (permalink / raw)
  To: Stefan Beller, git; +Cc: peff, gitster, mhagger, jrnieder, bmwill

On 05/13/2017 09:01 PM, Stefan Beller wrote:
> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This covers diff_flush.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/diff.c b/diff.c
> index 07041a35ad..386b28cf47 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4873,7 +4873,9 @@ void diff_flush(struct diff_options *options)
>  				  term, 1);
>  			if (options->stat_sep) {
>  				/* attach patch instead of inline */
> -				fputs(options->stat_sep, options->file);
> +				emit_line(options, NULL, NULL,
> +					  options->stat_sep,
> +					  strlen(options->stat_sep));

Same comment as patch 10 - is it OK that we now output a prefix too?

>  			}
>  		}
>
>


* Re: [PATCH 15/19] diff.c: convert diff_flush to use emit_line_*
  2017-05-15 20:21   ` Jonathan Tan
@ 2017-05-15 22:08     ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-15 22:08 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Mon, May 15, 2017 at 1:21 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On 05/13/2017 09:01 PM, Stefan Beller wrote:
>>
>> In a later patch, I want to propose an option to detect&color
>> moved lines in a diff, which cannot be done in a one-pass over
>> the diff. Instead we need to go over the whole diff twice,
>> because we cannot detect the first line of the two corresponding
>> lines (+ and -) that got moved.
>>
>> So to prepare the diff machinery for two pass algorithms
>> (i.e. buffer it all up and then operate on the result),
>> move all emissions to places, such that the only emitting
>> function is emit_line_0.
>>
>> This covers diff_flush.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  diff.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/diff.c b/diff.c
>> index 07041a35ad..386b28cf47 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -4873,7 +4873,9 @@ void diff_flush(struct diff_options *options)
>>                                   term, 1);
>>                         if (options->stat_sep) {
>>                                 /* attach patch instead of inline */
>> -                               fputs(options->stat_sep, options->file);
>> +                               emit_line(options, NULL, NULL,
>> +                                         options->stat_sep,
>> +                                         strlen(options->stat_sep));
>
>
> Same comment as patch 10 - is it OK that we now output a prefix too?

In this case, I think it is OK. stat_sep is only used by, for example,
"format-patch --attach", which makes no sense in combination with
--line-prefix.

(That combination is already broken, at least the line-prefix part: we
do *not* prefix every line with the given prefix, because stat_sep is
emitted as one multi-line string that starts with '\n'.)

Thanks,
Stefan


* Re: [PATCH 05/19] diff.c: emit_line_0 can handle no color setting
  2017-05-15 18:31   ` Jonathan Tan
@ 2017-05-15 22:11     ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-15 22:11 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Mon, May 15, 2017 at 11:31 AM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On 05/13/2017 09:01 PM, Stefan Beller wrote:
>>
>> In a later patch, I want to propose an option to detect&color
>> moved lines in a diff, which cannot be done in a one-pass over
>> the diff. Instead we need to go over the whole diff twice,
>> because we cannot detect the first line of the two corresponding
>> lines (+ and -) that got moved.
>>
>> So to prepare the diff machinery for two pass algorithms
>> (i.e. buffer it all up and then operate on the result),
>> move all emissions to places, such that the only emitting
>> function is emit_line_0.
>>
>> In later patches we may pass lines that are not colored to
>> the central function emit_line_0, so we
>> need to emit the color only when it is non-NULL.
>
>
> The diff below seems to just make emit_line_0 allow NULL for set and reset,
> unlike what the commit message above describes. (And is that necessary?
> Couldn't the caller just pass "" if they don't want any setting and/or
> resetting?)
>

They could just pass ""; but rather than making a call that writes
nothing, I went for this shortcut.
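
A minimal sketch of that shortcut, under the assumption that NULL means
"no color": `emit_colored()` and `append()` are hypothetical names, and
the output is a caller-supplied string buffer (which must already hold a
NUL-terminated string) instead of o->file.

```c
#include <assert.h>
#include <string.h>

/* Append s to the NUL-terminated string in out, checking for space. */
static void append(char *out, size_t outsz, const char *s)
{
	size_t used = strlen(out);

	assert(used + strlen(s) < outsz);
	strcpy(out + used, s);
}

/*
 * `set` and `reset` are optional color sequences. NULL means "no color",
 * so the corresponding write is skipped instead of writing "".
 */
void emit_colored(char *out, size_t outsz, const char *set,
		  const char *reset, const char *line)
{
	if (set)
		append(out, outsz, set);
	append(out, outsz, line);
	if (reset)
		append(out, outsz, reset);
}
```

Passing "" for both arguments would give the same output; the NULL check
only saves the two no-op writes.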

I'll reword the commit message.

Thanks,
Stefan


* Re: [PATCH 14/19] diff.c: convert word diffing to use emit_line_*
  2017-05-14  4:01 ` [PATCH 14/19] diff.c: convert word diffing " Stefan Beller
@ 2017-05-15 22:40   ` Jonathan Tan
  2017-05-15 23:12     ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Jonathan Tan @ 2017-05-15 22:40 UTC (permalink / raw)
  To: Stefan Beller, git; +Cc: peff, gitster, mhagger, jrnieder, bmwill

On 05/13/2017 09:01 PM, Stefan Beller wrote:
> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This covers all code related to diffing words.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 56 ++++++++++++++++++++++++++++----------------------------
>  1 file changed, 28 insertions(+), 28 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 91dc045a45..07041a35ad 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -886,37 +886,38 @@ struct diff_words_data {
>  	struct diff_words_style *style;
>  };
>
> -static int fn_out_diff_words_write_helper(FILE *fp,
> +static int fn_out_diff_words_write_helper(struct diff_options *o,
>  					  struct diff_words_style_elem *st_el,
>  					  const char *newline,
>  					  size_t count, const char *buf,
>  					  const char *line_prefix)
>  {
> -	int print = 0;
> +	struct strbuf sb = STRBUF_INIT;
>
>  	while (count) {
>  		char *p = memchr(buf, '\n', count);
> -		if (print)
> -			fputs(line_prefix, fp);
> +
>  		if (p != buf) {
> -			if (st_el->color && fputs(st_el->color, fp) < 0)
> -				return -1;
> -			if (fputs(st_el->prefix, fp) < 0 ||
> -			    fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
> -			    fputs(st_el->suffix, fp) < 0)
> -				return -1;
> -			if (st_el->color && *st_el->color
> -			    && fputs(GIT_COLOR_RESET, fp) < 0)
> -				return -1;
> +			if (st_el->color)
> +				strbuf_addstr(&sb, st_el->color);
> +			strbuf_addstr(&sb, st_el->prefix);
> +			strbuf_add(&sb, buf, p ? p - buf : count);
> +			strbuf_addstr(&sb, st_el->suffix);
> +			if (st_el->color && *st_el->color)
> +			    strbuf_addstr(&sb, GIT_COLOR_RESET);
>  		}
>  		if (!p)
> -			return 0;
> -		if (fputs(newline, fp) < 0)
> -			return -1;
> +			goto out;
> +		strbuf_addstr(&sb, newline);
> +		emit_line(o, NULL, NULL, sb.buf, sb.len);

I suspect that this will need to be refactored more thoroughly. Here,
for example, emit_line (which prints the prefix) is called nearly
unconditionally, whereas in the original version, "fputs(line_prefix,
fp)" is only executed when "print" is true, i.e. not for the first
segment.
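
The difference can be sketched in isolation. `write_segments()` is a
hypothetical helper, not the function from the patch: with
prefix_first == 0 it mimics the old "print" flag (no prefix before the
first segment), and with prefix_first == 1 it mimics emitting every
segment through a function that always writes the prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy buf to out one line at a time, writing prefix before every
 * segment except, when prefix_first is 0, the first one.
 */
size_t write_segments(char *out, size_t outsz, const char *prefix,
		      const char *buf, int prefix_first)
{
	int print = prefix_first;
	size_t used = 0;

	assert(outsz > 0);
	out[0] = '\0';
	while (*buf) {
		const char *p = strchr(buf, '\n');
		size_t len = p ? (size_t)(p - buf) + 1 : strlen(buf);
		int n = snprintf(out + used, outsz - used, "%s%.*s",
				 print ? prefix : "", (int)len, buf);

		assert(n >= 0 && (size_t)n < outsz - used);
		used += (size_t)n;
		buf += len;
		print = 1; /* every later segment starts a new output line */
	}
	return used;
}
```

With prefix "| " and input "one\ntwo\n", the old behavior gives
"one\n| two\n" while the always-prefix behavior gives "| one\n| two\n".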

> +		strbuf_reset(&sb);
>  		count -= p + 1 - buf;
>  		buf = p + 1;
> -		print = 1;
>  	}
> +out:
> +	if (sb.len)
> +		emit_line(o, NULL, NULL, sb.buf, sb.len);
> +	strbuf_release(&sb);
>  	return 0;
>  }
>
> @@ -994,25 +995,25 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
>  	} else
>  		plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
>
> -	if (color_words_output_graph_prefix(diff_words)) {
> -		fputs(line_prefix, diff_words->opt->file);
> -	}
> +	if (color_words_output_graph_prefix(diff_words))
> +		emit_line(diff_words->opt, NULL, NULL, "", 0);
> +
>  	if (diff_words->current_plus != plus_begin) {
> -		fn_out_diff_words_write_helper(diff_words->opt->file,
> +		fn_out_diff_words_write_helper(diff_words->opt,
>  				&style->ctx, style->newline,
>  				plus_begin - diff_words->current_plus,
>  				diff_words->current_plus, line_prefix);
>  		if (*(plus_begin - 1) == '\n')
> -			fputs(line_prefix, diff_words->opt->file);
> +			emit_line(diff_words->opt, NULL, NULL, "", 0);
>  	}
>  	if (minus_begin != minus_end) {
> -		fn_out_diff_words_write_helper(diff_words->opt->file,
> +		fn_out_diff_words_write_helper(diff_words->opt,
>  				&style->old, style->newline,
>  				minus_end - minus_begin, minus_begin,
>  				line_prefix);
>  	}
>  	if (plus_begin != plus_end) {
> -		fn_out_diff_words_write_helper(diff_words->opt->file,
> +		fn_out_diff_words_write_helper(diff_words->opt,
>  				&style->new, style->newline,
>  				plus_end - plus_begin, plus_begin,
>  				line_prefix);
> @@ -1109,8 +1110,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
>
>  	/* special case: only removal */
>  	if (!diff_words->plus.text.size) {
> -		fputs(line_prefix, diff_words->opt->file);
> -		fn_out_diff_words_write_helper(diff_words->opt->file,
> +		fn_out_diff_words_write_helper(diff_words->opt,
>  			&style->old, style->newline,
>  			diff_words->minus.text.size,
>  			diff_words->minus.text.ptr, line_prefix);
> @@ -1136,8 +1136,8 @@ static void diff_words_show(struct diff_words_data *diff_words)
>  	if (diff_words->current_plus != diff_words->plus.text.ptr +
>  			diff_words->plus.text.size) {
>  		if (color_words_output_graph_prefix(diff_words))
> -			fputs(line_prefix, diff_words->opt->file);
> -		fn_out_diff_words_write_helper(diff_words->opt->file,
> +			emit_line(diff_words->opt, NULL, NULL, "", 0);
> +		fn_out_diff_words_write_helper(diff_words->opt,
>  			&style->ctx, style->newline,
>  			diff_words->plus.text.ptr + diff_words->plus.text.size
>  			- diff_words->current_plus, diff_words->current_plus,
>


* Re: [PATCH 19/19] diff.c: color moved lines differently
  2017-05-14  4:01 ` [PATCH 19/19] diff.c: color moved lines differently Stefan Beller
@ 2017-05-15 22:42   ` Brandon Williams
  2017-05-16  4:34   ` Jonathan Tan
  2017-05-16 12:31   ` Jeff King
  2 siblings, 0 replies; 128+ messages in thread
From: Brandon Williams @ 2017-05-15 22:42 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy, peff, gitster, mhagger, jrnieder

On 05/13, Stefan Beller wrote:
> When there is a lot of code moved around such as in 11979b9 (2005-11-18,
> "http.c: reorder to avoid compilation failure.") for example, the review
> process is quite hard, not because it is mentally challenging, but because
> it is a rather tedious process that gets boring quickly. However you still
> need to read through all of the code to make sure the moved lines are there
> as intended.
> 
> It is trivial to color up a patch like the following
> 
>     $ git diff
>     diff --git a/file2.c b/file2.c
>     index 9163a0f..8e66dc0 100644
>     --- a/file2.c
>     +++ b/file2.c
>     @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
>             return memcpy(xmallocz(len), data, len);
>      }
> 
>     -int secure_foo(struct user *u)
>     -{
>     -       if (!u->is_allowed_foo)
>     -               return;
>     -       foo(u);
>     -}
>     -
>      char *xstrndup(const char *str, size_t len)
>      {
>             char *p = memchr(str, '\0', len);
>     diff --git a/test.c b/test.c
>     index a95e6fe..81eb0eb 100644
>     --- a/test.c
>     +++ b/test.c
>     @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
>             return total;
>      }
> 
>     +int secure_foo(struct user *u)
>     +{
>     +       if (!u->is_allowed_foo)
>     +               return;
>     +       foo(u);
>     +}
>     +
>      int xdup(int fd)
>      {
>             int ret = dup(fd);
> 
> as in this patch all added and removed lines
> should be colored in the new color that indicates moved
> lines.
> 
> However the intention of this patch is to aid reviewers
> in spotting permutations in the moved code. So consider the
> following malicious move:
> 
>     diff --git a/file2.c b/file2.c
>     index 9163a0f..8e66dc0 100644
>     --- a/file2.c
>     +++ b/file2.c
>     @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
>             return memcpy(xmallocz(len), data, len);
>      }
> 
>     -int secure_foo(struct user *u)
>     -{
>     -       if (!u->is_allowed_foo)
>     -               return;
>     -       foo(u);
>     -}
>     -
>      char *xstrndup(const char *str, size_t len)
>      {
>             char *p = memchr(str, '\0', len);
>     diff --git a/test.c b/test.c
>     index a95e6fe..a679c40 100644
>     --- a/test.c
>     +++ b/test.c
>     @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
>             return total;
>      }
> 
>     +int secure_foo(struct user *u)
>     +{
>     +       foo(u);
>     +       if (!u->is_allowed_foo)
>     +               return;
>     +}
>     +
>      int xdup(int fd)
>      {
>             int ret = dup(fd);
> 
> If the moved code is larger, it is easier to hide some permutation in the
> code, which is why we would not want to color all lines as "moved" in this
> case. So we do not just need to color lines differently that are added and
> removed in the same diff, we need to tweak the algorithm a bit more.
> 
> As the reviewer's attention should be brought to the places where the
> difference is introduced to the moved code, we cannot just have one new
> color for all of the moved code.
> 
> First I implemented an alternative design, which would show a moved hunk
> in one color, and its boundaries in another color. This idea was error
> prone as it inspected each line and its neighboring lines to determine
> if the line was (a) moved and (b) deep inside a hunk by having
> matching neighboring lines. This is unreliable as we can construct
> hunks which have equal neighbors that just exceed the number of lines
> inspected. See one of the tests as an example.
> 
> Instead this provides a dynamic programming greedy algorithm that finds
> the largest moved hunk and then switches color to the alternative color
> for the next hunk. By doing this any permutation is recognized and
> displayed. That implies that there is no dedicated boundary or
> inside-hunk color, but instead we'll have just two colors alternating
> for hunks.
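
To make the alternation concrete, here is a much-simplified sketch of the
idea described above. It is not the code from this patch: it uses plain
arrays and strcmp() instead of the hashmaps, looks at the added side only,
and the names color_moved_lines, NOT_MOVED, MOVED and MOVED_ALT are made up
for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical color ids, loosely mirroring the 1/2 toggle in the patch. */
enum { NOT_MOVED = 0, MOVED = 1, MOVED_ALT = 2 };

/*
 * For each line in added[], decide whether it continues a run of
 * consecutive lines from deleted[].  Lines of one run share a color;
 * whenever no run can be continued, a new run starts and the color
 * toggles.  Returns 0 on success, -1 if memory could not be allocated.
 */
static int color_moved_lines(const char **deleted, size_t n_del,
			     const char **added, size_t n_add,
			     int *colors)
{
	/* Positions in deleted[] of the runs that are still alive. */
	size_t *cand = malloc((n_del ? n_del : 1) * sizeof(*cand));
	size_t n_cand = 0, i, k;
	int color = MOVED_ALT; /* the first toggle yields MOVED */

	if (!cand)
		return -1;

	for (i = 0; i < n_add; i++) {
		size_t kept = 0;

		/* Advance every run whose next deleted line matches. */
		for (k = 0; k < n_cand; k++) {
			size_t next = cand[k] + 1;
			if (next < n_del && !strcmp(deleted[next], added[i]))
				cand[kept++] = next;
		}
		n_cand = kept;

		if (!n_cand) {
			/* No run continues: start new runs at every match. */
			for (k = 0; k < n_del; k++)
				if (!strcmp(deleted[k], added[i]))
					cand[n_cand++] = k;
			if (n_cand)
				color = (color == MOVED) ? MOVED_ALT : MOVED;
		}
		colors[i] = n_cand ? color : NOT_MOVED;
	}

	free(cand);
	return 0;
}
```

Fed the permuted secure_foo example from the commit message (indentation
stripped), the six added lines come out as MOVED, MOVED, MOVED_ALT, MOVED,
MOVED, MOVED_ALT, so the reordered "foo(u);" line stands out.
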
> 
> It would be a bit more UX friendly if the two corresponding hunks
> (of added and deleted lines) for one move would get the same color id.
> (Both get "regular moved" or "alternative moved"). This problem is
> deferred to a later patch for now.
> 
> Algorithm-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  Documentation/config.txt   |  12 +-
>  diff.c                     | 265 ++++++++++++++++++++++++++++++++++++++++++---
>  diff.h                     |  21 +++-
>  t/t4015-diff-whitespace.sh | 229 +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 506 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 475e874d51..90403c06e3 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1051,14 +1051,22 @@ This does not affect linkgit:git-format-patch[1] or the
>  'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
>  command line with the `--color[=<when>]` option.
>  
> +color.moved::
> +	A boolean value, whether a diff should color moved lines
> +	differently. The moved lines are searched for in the diff only.
> +	Duplicated lines from somewhere in the project that are not
> +	part of the diff are not colored as moved.
> +	Defaults to false.
> +
>  color.diff.<slot>::
>  	Use customized color for diff colorization.  `<slot>` specifies
>  	which part of the patch to use the specified color, and is one
>  	of `context` (context text - `plain` is a historical synonym),
>  	`meta` (metainformation), `frag`
>  	(hunk header), 'func' (function in hunk header), `old` (removed lines),
> -	`new` (added lines), `commit` (commit headers), or `whitespace`
> -	(highlighting whitespace errors).
> +	`new` (added lines), `commit` (commit headers), `whitespace`
> +	(highlighting whitespace errors), `movedFrom` (removed lines that
> +	reappear), `movedTo` (added lines that were removed elsewhere).
>  
>  color.decorate.<slot>::
>  	Use customized color for 'git log --decorate' output.  `<slot>` is one
> diff --git a/diff.c b/diff.c
> index dbab7fb44e..6372e0eb25 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -31,6 +31,7 @@ static int diff_indent_heuristic; /* experimental */
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> +static int diff_color_moved_default;
>  static int diff_context_default = 3;
>  static int diff_interhunk_context_default;
>  static const char *diff_word_regex_cfg;
> @@ -55,6 +56,10 @@ static char diff_colors[][COLOR_MAXLEN] = {
>  	GIT_COLOR_YELLOW,	/* COMMIT */
>  	GIT_COLOR_BG_RED,	/* WHITESPACE */
>  	GIT_COLOR_NORMAL,	/* FUNCINFO */
> +	GIT_COLOR_BOLD_RED,	/* OLD_MOVED_A */
> +	GIT_COLOR_BG_RED,	/* OLD_MOVED_B */
> +	GIT_COLOR_BOLD_GREEN,	/* NEW_MOVED_A */
> +	GIT_COLOR_BG_GREEN,	/* NEW_MOVED_B */
>  };
>  
>  static NORETURN void die_want_option(const char *option_name)
> @@ -80,6 +85,14 @@ static int parse_diff_color_slot(const char *var)
>  		return DIFF_WHITESPACE;
>  	if (!strcasecmp(var, "func"))
>  		return DIFF_FUNCINFO;
> +	if (!strcasecmp(var, "oldmoved"))
> +		return DIFF_FILE_OLD_MOVED;
> +	if (!strcasecmp(var, "oldmovedalternative"))
> +		return DIFF_FILE_OLD_MOVED_ALT;
> +	if (!strcasecmp(var, "newmoved"))
> +		return DIFF_FILE_NEW_MOVED;
> +	if (!strcasecmp(var, "newmovedalternative"))
> +		return DIFF_FILE_NEW_MOVED_ALT;
>  	return -1;
>  }
>  
> @@ -234,6 +247,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>  		diff_use_color_default = git_config_colorbool(var, value);
>  		return 0;
>  	}
> +	if (!strcmp(var, "color.moved")) {
> +		diff_color_moved_default = git_config_bool(var, value);
> +		return 0;
> +	}
>  	if (!strcmp(var, "diff.context")) {
>  		diff_context_default = git_config_int(var, value);
>  		if (diff_context_default < 0)
> @@ -354,6 +371,81 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
>  	return git_default_config(var, value, cb);
>  }
>  
> +struct moved_entry {
> +	struct hashmap_entry ent;
> +	const struct buffered_patch_line *line;
> +	struct moved_entry *next_line;
> +};
> +
> +static void get_ws_cleaned_string(const struct buffered_patch_line *l,
> +				  struct strbuf *out)
> +{
> +	int i;
> +	for (i = 0; i < l->len; i++) {
> +		if (isspace(l->line[i]))
> +			continue;
> +		strbuf_addch(out, l->line[i]);
> +	}
> +}
> +
> +static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
> +					 const struct buffered_patch_line *b,
> +					 const void *keydata)
> +{
> +	struct strbuf sba = STRBUF_INIT;
> +	struct strbuf sbb = STRBUF_INIT;
> +	get_ws_cleaned_string(a, &sba);
> +	get_ws_cleaned_string(b, &sbb);
> +	return sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);

You have a memory leak here: the strbufs need to be released before
returning.  Also, this seems very computationally heavy, in that you
need to reconstruct a string every time you perform this comparison.
Maybe that's ok because the alternative (storing the string w/o
whitespace) would be too memory intensive.
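
The leak itself only needs a strbuf_release() on both buffers before
returning. For the cost, one possible direction is to compare the two lines
in place and skip whitespace on the fly, so nothing is allocated at all.
A minimal sketch, assuming plain (pointer, length) pairs rather than
struct buffered_patch_line; the name cmp_ignoring_ws is made up:

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Return 0 if a and b are equal once all whitespace is ignored,
 * non-zero otherwise.  Neither input is copied or modified.
 */
static int cmp_ignoring_ws(const char *a, size_t alen,
			   const char *b, size_t blen)
{
	size_t i = 0, j = 0;

	for (;;) {
		/* Skip whitespace on both sides before comparing. */
		while (i < alen && isspace((unsigned char)a[i]))
			i++;
		while (j < blen && isspace((unsigned char)b[j]))
			j++;
		if (i == alen || j == blen)
			break;
		if (a[i] != b[j])
			return 1;
		i++;
		j++;
	}
	/* Equal only if both inputs are exhausted. */
	return i != alen || j != blen;
}
```

The hash would still have to be computed over the whitespace-free bytes so
that lines comparing equal land in the same bucket.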

> +}
> +
> +static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
> +				   const struct buffered_patch_line *b,
> +				   const void *keydata)
> +{
> +	return a->len != b->len || strncmp(a->line, b->line, a->len);
> +}
> +
> +static int moved_entry_cmp(const struct moved_entry *a,
> +			   const struct moved_entry *b,
> +			   const void *keydata)
> +{
> +	return buffered_patch_line_cmp(a->line, b->line, keydata);
> +}
> +
> +static int moved_entry_cmp_no_ws(const struct moved_entry *a,
> +				 const struct moved_entry *b,
> +				 const void *keydata)
> +{
> +	return buffered_patch_line_cmp_no_ws(a->line, b->line, keydata);
> +}
> +
> +static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_ws)
> +{
> +	static struct strbuf sb = STRBUF_INIT;
> +
> +	if (ignore_ws) {
> +		strbuf_reset(&sb);
> +		get_ws_cleaned_string(line, &sb);
> +		return memhash(sb.buf, sb.len);
> +	} else
> +		return memhash(line->line, line->len);

nit: braces on the else block.
I know others may not agree, but I will still point it out when it's
mismatched with a block with braces.

> +}
> +
> +static struct moved_entry *prepare_entry(struct diff_options *o,
> +					 int line_no)
> +{
> +	struct moved_entry *ret = xmalloc(sizeof(*ret));
> +	unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
> +	struct buffered_patch_line *l = &o->line_buffer[line_no];
> +
> +	ret->ent.hash = get_line_hash(l, ignore_ws);
> +	ret->line = l;
> +	ret->next_line = NULL;
> +
> +	return ret;
> +}
> +
>  static char *quote_two(const char *one, const char *two)
>  {
>  	int need_one = quote_c_style(one, NULL, NULL, 1);
> @@ -516,8 +608,98 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
>  	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
>  }
>  
> +static void mark_color_as_moved(struct diff_options *o, int line_no)
> +{
> +	struct hashmap *hm = NULL;
> +	struct moved_entry *key = prepare_entry(o, line_no);
> +	struct moved_entry *match = NULL;
> +	struct buffered_patch_line *l = &o->line_buffer[line_no];
> +	int alt_flag;
> +	int i, lp, rp;
> +
> +	switch (l->sign) {
> +	case '+':
> +		hm = o->deleted_lines;
> +		break;
> +	case '-':
> +		hm = o->added_lines;
> +		break;
> +	default:
> +		/* reset to standard, on-alt move color */
> +		o->color_moved = 1;
> +		break;
> +	}
> +
> +	/* Check for any match to color it as a move. */
> +	if (!hm)
> +		return;
> +	match = hashmap_get(hm, key, o);
> +	free(key);

memory leak? if (!hm), then 'key' is never freed.

> +	if (!match)
> +		return;
> +
> +	/* Check any potential block runs, advance each or nullify */
> +	for (i = 0; i < o->pmb_nr; i++) {
> +		struct moved_entry *p = o->pmb[i];
> +		if (p && p->next_line &&
> +		    !buffered_patch_line_cmp(p->next_line->line, l, o)) {
> +			o->pmb[i] = p->next_line;
> +		} else {
> +			o->pmb[i] = NULL;
> +		}
> +	}
> +
> +	/* Shrink the set to the remaining runs */
> +	for (lp = 0, rp = o->pmb_nr - 1; lp <= rp;) {
> +		while (lp < o->pmb_nr && o->pmb[lp])
> +			lp ++;
> +		/* lp points at the first NULL now */
> +
> +		while (rp > -1 && !o->pmb[rp])
> +			rp--;
> +		/* rp points at the last non-NULL */
> +
> +		if (lp < o->pmb_nr && rp > -1 && lp < rp) {
> +			o->pmb[lp] = o->pmb[rp];
> +			o->pmb[rp] = NULL;
> +			rp--;
> +			lp++;
> +		}
> +	}
> +
> +	if (rp > -1) {
> +		/* Remember the number of running sets */
> +		o->pmb_nr = rp + 1;
> +	} else {
> +		/* Toggle color */
> +		o->color_moved = o->color_moved == 2 ? 1 : 2;
> +
> +		/* Build up a new set */
> +		i = 0;
> +		for (; match; match = hashmap_get_next(hm, match)) {
> +			ALLOC_GROW(o->pmb, i + 1, o->pmb_alloc);
> +			o->pmb[i] = match;
> +			i++;
> +		}
> +		o->pmb_nr = i;

why not just use o->pmb_nr directly instead of using 'i'?  I was
immediately concerned when 'i' was used in the ALLOC_GROW macro.
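
Roughly what using the counter directly could look like, as a standalone
sketch. It assumes nothing from git: realloc() stands in for ALLOC_GROW,
and struct ptr_set and ptr_set_push are hypothetical names.

```c
#include <stdlib.h>

/* A growable array of pointers with its own element count. */
struct ptr_set {
	const void **items;
	size_t nr, alloc;
};

/* Append one item, growing the array as needed; returns 0 on success. */
static int ptr_set_push(struct ptr_set *s, const void *item)
{
	if (s->nr + 1 > s->alloc) {
		size_t new_alloc = s->alloc ? 2 * s->alloc : 16;
		const void **grown = realloc(s->items,
					     new_alloc * sizeof(*grown));
		if (!grown)
			return -1;
		s->items = grown;
		s->alloc = new_alloc;
	}
	/* The count is the only index; no separate loop variable. */
	s->items[s->nr++] = item;
	return 0;
}
```

Resetting nr to 0 before the loop and pushing each match keeps the count
and the array contents in sync at every step.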

> +	}
> +
> +	alt_flag = o->color_moved - 1;
> +	switch (l->sign) {
> +	case '+':
> +		l->set = diff_get_color_opt(o, DIFF_FILE_NEW_MOVED + alt_flag);
> +		break;
> +	case '-':
> +		l->set = diff_get_color_opt(o, DIFF_FILE_OLD_MOVED + alt_flag);
> +		break;
> +	default:
> +		; /* nothing */
> +	}
> +}
> +
>  static void emit_buffered_patch_line(struct diff_options *o,
> -				     struct buffered_patch_line *e)
> +				     struct buffered_patch_line *e,
> +				     int pass)
>  {
>  	int has_trailing_newline, has_trailing_carriage_return, len = e->len;
>  	FILE *file = o->file;
> @@ -548,11 +730,11 @@ static void emit_buffered_patch_line(struct diff_options *o,
>  
>  static void emit_buffered_patch_line_ws(struct diff_options *o,
>  					struct buffered_patch_line *e,
> -					const char *ws, unsigned ws_rule)
> +					const char *ws, unsigned ws_rule,
> +					int pass)
>  {
>  	struct buffered_patch_line s = {e->set, e->reset, "", 0, e->sign};
> -
> -	emit_buffered_patch_line(o, &s);
> +	emit_buffered_patch_line(o, &s, 0);
>  	ws_check_emit(e->line, e->len, ws_rule,
>  		      o->file, e->set, e->reset, ws);
>  }
> @@ -564,12 +746,14 @@ static void process_next_buffered_patch_line(struct diff_options *o, int line_no
>  	const char *ws = o->current_filepair->ws;
>  	unsigned ws_rule = o->current_filepair->ws_rule;
>  
> +	mark_color_as_moved(o, line_no);
> +
>  	switch (e->state) {
>  		case BPL_EMIT_LINE_ASIS:
> -			emit_buffered_patch_line(o, e);
> +			emit_buffered_patch_line(o, e, 1);
>  			break;
>  		case BPL_EMIT_LINE_WS:
> -			emit_buffered_patch_line_ws(o, e, ws, ws_rule);
> +			emit_buffered_patch_line_ws(o, e, ws, ws_rule, 1);
>  			break;
>  		case BPL_HANDOVER:
>  			o->current_filepair++;
> @@ -602,7 +786,7 @@ static void emit_line_0(struct diff_options *o,
>  	if (o->use_buffer)
>  		append_buffered_patch_line(o, &e);
>  	else
> -		emit_buffered_patch_line(o, &e);
> +		emit_buffered_patch_line(o, &e, 0);
>  }
>  
>  void emit_line(struct diff_options *o, const char *set, const char *reset,
> @@ -621,7 +805,7 @@ static void emit_line_ws(struct diff_options *o,
>  	if (o->use_buffer)
>  		append_buffered_patch_line(o, &e);
>  	else
> -		emit_buffered_patch_line_ws(o, &e, ws, ws_rule);
> +		emit_buffered_patch_line_ws(o, &e, ws, ws_rule, 0);
>  }
>  
>  void emit_line_fmt(struct diff_options *o,
> @@ -676,6 +860,36 @@ static void emit_line_checked(const char *reset,
>  			     ws, ecbdata->ws_rule);
>  }
>  
> +static void add_line_to_move_detection(struct diff_options *o, int line_idx)
> +{
> +	int sign = 0;
> +	struct hashmap *hm;
> +	struct moved_entry *key;
> +
> +	switch (o->line_buffer[line_idx].sign) {
> +	case '+':
> +		sign = '+';
> +		hm = o->added_lines;
> +		break;
> +	case '-':
> +		sign = '-';
> +		hm = o->deleted_lines;
> +		break;
> +	case ' ':
> +	default:
> +		o->prev_line = NULL;
> +		return;
> +	}
> +
> +	key = prepare_entry(o, line_idx);
> +	if (o->prev_line &&
> +	    o->prev_line->line->sign == sign)
> +		o->prev_line->next_line = key;
> +
> +	hashmap_add(hm, key);
> +	o->prev_line = key;

'key' was freed in another function, does it need to be freed here?

> +}
> +
>  static void emit_add_line(const char *reset,
>  			  struct emit_callback *ecbdata,
>  			  const char *line, int len)
> @@ -3649,6 +3863,9 @@ void diff_setup_done(struct diff_options *options)
>  
>  	if (DIFF_OPT_TST(options, FOLLOW_RENAMES) && options->pathspec.nr != 1)
>  		die(_("--follow requires exactly one pathspec"));
> +
> +	if (!options->use_color || external_diff())
> +		options->color_moved = 0;
>  }
>  
>  static int opt_arg(const char *arg, int arg_short, const char *arg_long, int *val)
> @@ -4073,6 +4290,10 @@ int diff_opt_parse(struct diff_options *options,
>  	}
>  	else if (!strcmp(arg, "--no-color"))
>  		options->use_color = 0;
> +	else if (!strcmp(arg, "--color-moved"))
> +		options->color_moved = 1;
> +	else if (!strcmp(arg, "--no-color-moved"))
> +		options->color_moved = 0;
>  	else if (!strcmp(arg, "--color-words")) {
>  		options->use_color = 1;
>  		options->word_diff = DIFF_WORDS_COLOR;
> @@ -4878,16 +5099,19 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>  {
>  	int i;
>  	struct diff_queue_struct *q = &diff_queued_diff;
> -	/*
> -	 * For testing purposes we want to make sure the diff machinery
> -	 * works completely with the buffer. If there is anything emitted
> -	 * outside the emit_buffered_patch_line, then the order is screwed
> -	 * up and the tests will fail.
> -	 *
> -	 * TODO (later in this series):
> -	 * We'll unset this flag in a later patch.
> -	 */
> -	o->use_buffer = 1;
> +
> +	if (o->color_moved) {
> +		unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
> +		o->use_buffer = 1;
> +		o->deleted_lines = xmallocz(sizeof(*o->deleted_lines));
> +		o->added_lines = xmallocz(sizeof(*o->added_lines));
> +		hashmap_init(o->deleted_lines, ignore_ws ?
> +			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
> +			(hashmap_cmp_fn)moved_entry_cmp, 0);
> +		hashmap_init(o->added_lines, ignore_ws ?
> +			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
> +			(hashmap_cmp_fn)moved_entry_cmp, 0);
> +	}
>  
>  	if (o->use_buffer) {
>  		ALLOC_GROW(o->filepair_buffer,
> @@ -4902,6 +5126,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>  	}
>  
>  	if (o->use_buffer) {
> +		o->current_filepair = &o->filepair_buffer[0];
> +		for (i = 0; i < o->line_buffer_nr; i++)
> +			add_line_to_move_detection(o, i);
> +
>  		o->current_filepair = &o->filepair_buffer[0];
>  		for (i = 0; i < o->line_buffer_nr; i++)
>  			process_next_buffered_patch_line(o, i);
> @@ -4992,6 +5220,7 @@ void diff_flush(struct diff_options *options)
>  		if (!options->file)
>  			die_errno("Could not open /dev/null");
>  		options->close_file = 1;
> +		options->color_moved = 0;
>  		for (i = 0; i < q->nr; i++) {
>  			struct diff_filepair *p = q->queue[i];
>  			if (check_pair_status(p))
> diff --git a/diff.h b/diff.h
> index c334aac02e..b83d6fefcc 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -7,6 +7,7 @@
>  #include "tree-walk.h"
>  #include "pathspec.h"
>  #include "object.h"
> +#include "hashmap.h"
>  
>  struct rev_info;
>  struct diff_options;
> @@ -145,6 +146,8 @@ struct buffered_filepair {
>  	unsigned ws_rule;
>  };
>  
> +struct moved_entry;
> +
>  struct diff_options {
>  	const char *orderfile;
>  	const char *pickaxe;
> @@ -217,6 +220,8 @@ struct diff_options {
>  
>  	int diff_path_counter;
>  
> +	/* Determines color moved code. Flipped between 1, 2 for alt. color. */
> +	int color_moved;
>  	int use_buffer;
>  
>  	struct buffered_patch_line *line_buffer;
> @@ -225,6 +230,16 @@ struct diff_options {
>  	struct buffered_filepair *filepair_buffer;
>  	int filepair_buffer_nr, filepair_buffer_alloc;
>  	struct buffered_filepair *current_filepair;
> +
> +	/* built up in the first pass: */
> +	struct hashmap *deleted_lines;
> +	struct hashmap *added_lines;
> +	/* needed for building up */
> +	struct moved_entry *prev_line;
> +
> +	/* state in the second pass */
> +	struct moved_entry **pmb; /* potentially moved blocks */
> +	int pmb_nr, pmb_alloc;
>  };
>  
>  void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
> @@ -241,7 +256,11 @@ enum color_diff {
>  	DIFF_FILE_NEW = 5,
>  	DIFF_COMMIT = 6,
>  	DIFF_WHITESPACE = 7,
> -	DIFF_FUNCINFO = 8
> +	DIFF_FUNCINFO = 8,
> +	DIFF_FILE_OLD_MOVED = 9,
> +	DIFF_FILE_OLD_MOVED_ALT = 10,
> +	DIFF_FILE_NEW_MOVED = 11,
> +	DIFF_FILE_NEW_MOVED_ALT = 12
>  };
>  const char *diff_get_color(int diff_use_color, enum color_diff ix);
>  #define diff_get_color_opt(o, ix) \
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 289806d0c7..232d9ad55e 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -972,4 +972,233 @@ test_expect_success 'option overrides diff.wsErrorHighlight' '
>  
>  '
>  
> +test_expect_success 'detect moved code, complete file' '
> +	git reset --hard &&
> +	cat <<-\EOF >test.c &&
> +	#include<stdio.h>
> +	main()
> +	{
> +	printf("Hello World");
> +	}
> +	EOF
> +	git add test.c &&
> +	git commit -m "add main function" &&
> +	git mv test.c main.c &&
> +	git diff HEAD --color-moved --no-renames | test_decode_color >actual &&
> +	cat >expected <<-\EOF &&
> +	<BOLD>diff --git a/main.c b/main.c<RESET>
> +	<BOLD>new file mode 100644<RESET>
> +	<BOLD>index 0000000..a986c57<RESET>
> +	<BOLD>--- /dev/null<RESET>
> +	<BOLD>+++ b/main.c<RESET>
> +	<CYAN>@@ -0,0 +1,5 @@<RESET>
> +	<BGREEN>+<RESET><BGREEN>#include<stdio.h><RESET>
> +	<BGREEN>+<RESET><BGREEN>main()<RESET>
> +	<BGREEN>+<RESET><BGREEN>{<RESET>
> +	<BGREEN>+<RESET><BGREEN>printf("Hello World");<RESET>
> +	<BGREEN>+<RESET><BGREEN>}<RESET>
> +	<BOLD>diff --git a/test.c b/test.c<RESET>
> +	<BOLD>deleted file mode 100644<RESET>
> +	<BOLD>index a986c57..0000000<RESET>
> +	<BOLD>--- a/test.c<RESET>
> +	<BOLD>+++ /dev/null<RESET>
> +	<CYAN>@@ -1,5 +0,0 @@<RESET>
> +	<BRED>-#include<stdio.h><RESET>
> +	<BRED>-main()<RESET>
> +	<BRED>-{<RESET>
> +	<BRED>-printf("Hello World");<RESET>
> +	<BRED>-}<RESET>
> +	EOF
> +
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'detect moved code, inside file' '
> +	git reset --hard &&
> +	cat <<-\EOF >main.c &&
> +		#include<stdio.h>
> +		int stuff()
> +		{
> +			printf("Hello ");
> +			printf("World\n");
> +		}
> +
> +		int secure_foo(struct user *u)
> +		{
> +			if (!u->is_allowed_foo)
> +				return;
> +			foo(u);
> +		}
> +
> +		int main()
> +		{
> +			foo();
> +		}
> +	EOF
> +	cat <<-\EOF >test.c &&
> +		#include<stdio.h>
> +		int bar()
> +		{
> +			printf("Hello World, but different\n");
> +		}
> +
> +		int another_function()
> +		{
> +			bar();
> +		}
> +	EOF
> +	git add main.c test.c &&
> +	git commit -m "add main and test file" &&
> +	cat <<-\EOF >main.c &&
> +		#include<stdio.h>
> +		int stuff()
> +		{
> +			printf("Hello ");
> +			printf("World\n");
> +		}
> +
> +		int main()
> +		{
> +			foo();
> +		}
> +	EOF
> +	cat <<-\EOF >test.c &&
> +		#include<stdio.h>
> +		int bar()
> +		{
> +			printf("Hello World, but different\n");
> +		}
> +
> +		int secure_foo(struct user *u)
> +		{
> +			if (!u->is_allowed_foo)
> +				return;
> +			foo(u);
> +		}
> +
> +		int another_function()
> +		{
> +			bar();
> +		}
> +	EOF
> +	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
> +	cat <<-\EOF >expected &&
> +	<BOLD>diff --git a/main.c b/main.c<RESET>
> +	<BOLD>index 27a619c..7cf9336 100644<RESET>
> +	<BOLD>--- a/main.c<RESET>
> +	<BOLD>+++ b/main.c<RESET>
> +	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
> +	 printf("World\n");<RESET>
> +	 }<RESET>
> +	 <RESET>
> +	<BRED>-int secure_foo(struct user *u)<RESET>
> +	<BRED>-{<RESET>
> +	<BRED>-if (!u->is_allowed_foo)<RESET>
> +	<BRED>-return;<RESET>
> +	<BRED>-foo(u);<RESET>
> +	<BRED>-}<RESET>
> +	<BRED>-<RESET>
> +	 int main()<RESET>
> +	 {<RESET>
> +	 foo();<RESET>
> +	<BOLD>diff --git a/test.c b/test.c<RESET>
> +	<BOLD>index 1dc1d85..e34eb69 100644<RESET>
> +	<BOLD>--- a/test.c<RESET>
> +	<BOLD>+++ b/test.c<RESET>
> +	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
> +	 printf("Hello World, but different\n");<RESET>
> +	 }<RESET>
> +	 <RESET>
> +	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
> +	<BGREEN>+<RESET><BGREEN>{<RESET>
> +	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
> +	<BGREEN>+<RESET><BGREEN>return;<RESET>
> +	<BGREEN>+<RESET><BGREEN>foo(u);<RESET>
> +	<BGREEN>+<RESET><BGREEN>}<RESET>
> +	<BGREEN>+<RESET>
> +	 int another_function()<RESET>
> +	 {<RESET>
> +	 bar();<RESET>
> +	EOF
> +
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'detect permutations inside moved code, ' '
> +	# reusing the move example from last test:
> +	cat <<-\EOF >main.c &&
> +		#include<stdio.h>
> +		int stuff()
> +		{
> +			printf("Hello ");
> +			printf("World\n");
> +		}
> +
> +		int main()
> +		{
> +			foo();
> +		}
> +	EOF
> +	cat <<-\EOF >test.c &&
> +		#include<stdio.h>
> +		int bar()
> +		{
> +			printf("Hello World, but different\n");
> +		}
> +
> +		int secure_foo(struct user *u)
> +		{
> +			foo(u);
> +			if (!u->is_allowed_foo)
> +				return;
> +		}
> +
> +		int another_function()
> +		{
> +			bar();
> +		}
> +	EOF
> +	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
> +	cat <<-\EOF >expected &&
> +	<BOLD>diff --git a/main.c b/main.c<RESET>
> +	<BOLD>index 27a619c..7cf9336 100644<RESET>
> +	<BOLD>--- a/main.c<RESET>
> +	<BOLD>+++ b/main.c<RESET>
> +	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
> +	 printf("World\n");<RESET>
> +	 }<RESET>
> +	 <RESET>
> +	<BRED>-int secure_foo(struct user *u)<RESET>
> +	<BRED>-{<RESET>
> +	<BOLD;RED>-if (!u->is_allowed_foo)<RESET>
> +	<BOLD;RED>-return;<RESET>
> +	<BRED>-foo(u);<RESET>
> +	<BOLD;RED>-}<RESET>
> +	<BOLD;RED>-<RESET>
> +	 int main()<RESET>
> +	 {<RESET>
> +	 foo();<RESET>
> +	<BOLD>diff --git a/test.c b/test.c<RESET>
> +	<BOLD>index 1dc1d85..2bedec9 100644<RESET>
> +	<BOLD>--- a/test.c<RESET>
> +	<BOLD>+++ b/test.c<RESET>
> +	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
> +	 printf("Hello World, but different\n");<RESET>
> +	 }<RESET>
> +	 <RESET>
> +	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
> +	<BGREEN>+<RESET><BGREEN>{<RESET>
> +	<BOLD;GREEN>+<RESET><BOLD;GREEN>foo(u);<RESET>
> +	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
> +	<BGREEN>+<RESET><BGREEN>return;<RESET>
> +	<BOLD;GREEN>+<RESET><BOLD;GREEN>}<RESET>
> +	<BOLD;GREEN>+<RESET>
> +	 int another_function()<RESET>
> +	 {<RESET>
> +	 bar();<RESET>
> +	EOF
> +
> +	test_cmp expected actual
> +'
> +
>  test_done
> -- 
> 2.13.0.18.g183880de0a
> 

-- 
Brandon Williams

* Re: [PATCH 14/19] diff.c: convert word diffing to use emit_line_*
  2017-05-15 22:40   ` Jonathan Tan
@ 2017-05-15 23:12     ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-15 23:12 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Mon, May 15, 2017 at 3:40 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
>
> I suspect that this will need to be refactored more thoroughly. Here, for
> example, emit_line (which prints the prefix) is printed nearly
> unconditionally, whereas in the original version, "fputs(line_prefix, fp)"
> is only printed when "print" is true.

Yes, manual testing confirms this is the case. Maybe I should add a test
for "git diff --word-diff --line-prefix".

Thanks,
Stefan

* Re: [PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_*
  2017-05-14  4:01 ` [PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_* Stefan Beller
@ 2017-05-16  1:00   ` Junio C Hamano
  2017-05-16  1:05     ` Junio C Hamano
  0 siblings, 1 reply; 128+ messages in thread
From: Junio C Hamano @ 2017-05-16  1:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy, peff, mhagger, jrnieder, bmwill

Stefan Beller <sbeller@google.com> writes:

> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This covers the parts of fn_out_consume.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index aef159a919..93343a9ccc 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1289,7 +1289,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
>  	const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT);
>  	const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
>  	struct diff_options *o = ecbdata->opt;
> -	const char *line_prefix = diff_line_prefix(o);
>  
>  	o->found_changes = 1;
>  
> @@ -1301,14 +1300,12 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
>  
>  	if (ecbdata->label_path[0]) {
>  		const char *name_a_tab, *name_b_tab;
> -
>  		name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
>  		name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
> -
> -		fprintf(o->file, "%s%s--- %s%s%s\n",
> -			line_prefix, meta, ecbdata->label_path[0], reset, name_a_tab);
> -		fprintf(o->file, "%s%s+++ %s%s%s\n",
> -			line_prefix, meta, ecbdata->label_path[1], reset, name_b_tab);
> +		emit_line_fmt(o, meta, reset, "--- %s%s\n",
> +			      ecbdata->label_path[0], name_a_tab);
> +		emit_line_fmt(o, meta, reset, "+++ %s%s\n",
> +			      ecbdata->label_path[1], name_b_tab);

How is the loss of line_prefix from this call site compensated?
emit_line_fmt() receives o so it is possible diff_line_prefix(o)
may be called there and prepended to the output over there, but I
somehow do not think that is the case---in fact 06/19 does not seem
to teach emit_line_fmt() to do something like that.

Unless emit_line() is now taught to do the line_prefix thing, that
is.

But then the hunk we see below, which didn't add line_prefix to the
output, would add an unwanted prefix, so that is not likely.

Hmph...

>  		ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
>  	}
>  
> @@ -1349,7 +1346,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
>  		diff_words_flush(ecbdata);
>  		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
>  			emit_line(o, context, reset, line, len);
> -			fputs("~\n", o->file);
> +			emit_line(o, NULL, NULL, "~\n", 2);
>  		} else {
>  			/*
>  			 * Skip the prefix character, if any.  With

* Re: [PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_*
  2017-05-16  1:00   ` Junio C Hamano
@ 2017-05-16  1:05     ` Junio C Hamano
  2017-05-16 16:23       ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Junio C Hamano @ 2017-05-16  1:05 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy, peff, mhagger, jrnieder, bmwill

Junio C Hamano <gitster@pobox.com> writes:

>> -		fprintf(o->file, "%s%s--- %s%s%s\n",
>> -			line_prefix, meta, ecbdata->label_path[0], reset, name_a_tab);
>> -		fprintf(o->file, "%s%s+++ %s%s%s\n",
>> -			line_prefix, meta, ecbdata->label_path[1], reset, name_b_tab);
>> +		emit_line_fmt(o, meta, reset, "--- %s%s\n",
>> +			      ecbdata->label_path[0], name_a_tab);
>> +		emit_line_fmt(o, meta, reset, "+++ %s%s\n",
>> +			      ecbdata->label_path[1], name_b_tab);
>
> How is the loss of line_prefix from this call site compensated?

OK, emit_line_0() has already been aware of line_prefix, so that is
how the loss of line_prefix in the above is accounted for.  We are
good here.

>>  		ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
>>  	}
>>  
>> @@ -1349,7 +1346,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
>>  		diff_words_flush(ecbdata);
>>  		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
>>  			emit_line(o, context, reset, line, len);
>> -			fputs("~\n", o->file);
>> +			emit_line(o, NULL, NULL, "~\n", 2);

So unless we have some magic here, we would see an extra line-prefix
before that "~\n" thing, no?


>>  		} else {
>>  			/*
>>  			 * Skip the prefix character, if any.  With

* Re: [PATCH 18/19] diff: buffer all output if asked to
  2017-05-14  4:01 ` [PATCH 18/19] diff: buffer all output if asked to Stefan Beller
  2017-05-14  4:06   ` Jeff King
@ 2017-05-16  4:14   ` Jonathan Tan
  2017-05-16 16:42     ` Stefan Beller
  1 sibling, 1 reply; 128+ messages in thread
From: Jonathan Tan @ 2017-05-16  4:14 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git mailing list, Jeff King, Junio C Hamano, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

Overall, this patch seems larger than it should to me, although there
might be good reasons for that that I don't know. I'll remark on what
I find unexpected.

On Sat, May 13, 2017 at 9:01 PM, Stefan Beller <sbeller@google.com> wrote:
> diff --git a/diff.c b/diff.c
> index 08dcc56bb9..dbab7fb44e 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -516,29 +516,29 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
>         ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
>  }
>
> -static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
> -                       int sign, const char *line, int len)
> +static void emit_buffered_patch_line(struct diff_options *o,
> +                                    struct buffered_patch_line *e)
>  {
> -       int has_trailing_newline, has_trailing_carriage_return;
> +       int has_trailing_newline, has_trailing_carriage_return, len = e->len;
>         FILE *file = o->file;
>
>         fputs(diff_line_prefix(o), file);
>
> -       has_trailing_newline = (len > 0 && line[len-1] == '\n');
> +       has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
>         if (has_trailing_newline)
>                 len--;
> -       has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
> +       has_trailing_carriage_return = (len > 0 && e->line[len-1] == '\r');
>         if (has_trailing_carriage_return)
>                 len--;
>
> -       if (len || sign) {
> -               if (set)
> -                       fputs(set, file);
> -               if (sign)
> -                       fputc(sign, file);
> -               fwrite(line, len, 1, file);
> -               if (reset)
> -                       fputs(reset, file);
> +       if (len || e->sign) {
> +               if (e->set)
> +                       fputs(e->set, file);
> +               if (e->sign)
> +                       fputc(e->sign, file);
> +               fwrite(e->line, len, 1, file);
> +               if (e->reset)
> +                       fputs(e->reset, file);
>         }
>         if (has_trailing_carriage_return)
>                 fputc('\r', file);
> @@ -546,6 +546,65 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
>                 fputc('\n', file);
>  }
>
> +static void emit_buffered_patch_line_ws(struct diff_options *o,
> +                                       struct buffered_patch_line *e,
> +                                       const char *ws, unsigned ws_rule)

This introduces a new _ws emission function - how is this used and how
is this different from the non-ws one? I see BPL_EMIT_LINE_WS, but I
don't see the caller that introduces that constant in this patch.

> +{
> +       struct buffered_patch_line s = {e->set, e->reset, "", 0, e->sign};
> +
> +       emit_buffered_patch_line(o, &s);
> +       ws_check_emit(e->line, e->len, ws_rule,
> +                     o->file, e->set, e->reset, ws);
> +}
> +
> +static void process_next_buffered_patch_line(struct diff_options *o, int line_no)
> +{
> +       struct buffered_patch_line *e = &o->line_buffer[line_no];
> +
> +       const char *ws = o->current_filepair->ws;
> +       unsigned ws_rule = o->current_filepair->ws_rule;
> +
> +       switch (e->state) {
> +               case BPL_EMIT_LINE_ASIS:
> +                       emit_buffered_patch_line(o, e);
> +                       break;
> +               case BPL_EMIT_LINE_WS:
> +                       emit_buffered_patch_line_ws(o, e, ws, ws_rule);
> +                       break;
> +               case BPL_HANDOVER:
> +                       o->current_filepair++;

If we're just buffering the diff output, do we need to store
per-file-pair metadata? (I assume that's why you need a special
handover constant.) Clients can already read what they need from the
diff output.

> +                       break;
> +               default:
> +                       die("BUG: malformatted buffered patch line: '%d'", e->state);
> +       }
> +}
> +
> +static void append_buffered_patch_line(struct diff_options *o,
> +                                      struct buffered_patch_line *e)
> +{
> +       struct buffered_patch_line *f;
> +       ALLOC_GROW(o->line_buffer,
> +                  o->line_buffer_nr + 1,
> +                  o->line_buffer_alloc);
> +       f = &o->line_buffer[o->line_buffer_nr];
> +       o->line_buffer_nr++;
> +
> +       memcpy(f, e, sizeof(struct buffered_patch_line));
> +       f->line = e->line ? xmemdupz(e->line, e->len) : NULL;
> +}
> +
> +static void emit_line_0(struct diff_options *o,
> +                       const char *set, const char *reset,
> +                       int sign, const char *line, int len)
> +{
> +       struct buffered_patch_line e = {set, reset, line, len, sign, BPL_EMIT_LINE_ASIS};
> +
> +       if (o->use_buffer)
> +               append_buffered_patch_line(o, &e);
> +       else
> +               emit_buffered_patch_line(o, &e);
> +}
> +
>  void emit_line(struct diff_options *o, const char *set, const char *reset,
>                const char *line, int len)
>  {
> @@ -557,9 +616,12 @@ static void emit_line_ws(struct diff_options *o,
>                          const char *line, int len,
>                          const char *ws, unsigned ws_rule)
>  {
> -       emit_line_0(o, set, reset, sign, "", 0);
> -       ws_check_emit(line, len, ws_rule,
> -                     o->file, set, reset, ws);
> +       struct buffered_patch_line e = {set, reset, line, len, sign, BPL_EMIT_LINE_WS};
> +
> +       if (o->use_buffer)
> +               append_buffered_patch_line(o, &e);
> +       else
> +               emit_buffered_patch_line_ws(o, &e, ws, ws_rule);
>  }
>
>  void emit_line_fmt(struct diff_options *o,
> @@ -1160,6 +1222,16 @@ static void diff_words_flush(struct emit_callback *ecbdata)
>         if (ecbdata->diff_words->minus.text.size ||
>             ecbdata->diff_words->plus.text.size)
>                 diff_words_show(ecbdata->diff_words);
> +
> +       if (ecbdata->diff_words->opt->line_buffer_nr) {
> +               int i;
> +               for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
> +                       append_buffered_patch_line(ecbdata->opt,
> +                               &ecbdata->diff_words->opt->line_buffer[i]);
> +
> +               ecbdata->diff_words->opt->line_buffer_nr = 0;
> +               /* TODO: free memory as well */
> +       }
>  }
>
>  static void diff_filespec_load_driver(struct diff_filespec *one)
> @@ -1195,6 +1267,11 @@ static void init_diff_words_data(struct emit_callback *ecbdata,
>                 xcalloc(1, sizeof(struct diff_words_data));
>         ecbdata->diff_words->type = o->word_diff;
>         ecbdata->diff_words->opt = o;
> +
> +       o->line_buffer = NULL;
> +       o->line_buffer_nr = 0;
> +       o->line_buffer_alloc = 0;
> +
>         if (!o->word_regex)
>                 o->word_regex = userdiff_word_regex(one);
>         if (!o->word_regex)
> @@ -2568,9 +2645,25 @@ static void builtin_diff(const char *name_a,
>                         xecfg.ctxlen = strtoul(v, NULL, 10);
>                 if (o->word_diff)
>                         init_diff_words_data(&ecbdata, o, one, two);
> +               if (o->use_buffer) {
> +                       ALLOC_GROW(o->filepair_buffer,
> +                                  o->filepair_buffer_nr + 1,
> +                                  o->filepair_buffer_alloc);
> +                       o->current_filepair =
> +                               &o->filepair_buffer[o->filepair_buffer_nr++];
> +
> +                       o->current_filepair->ws_rule = ecbdata.ws_rule;
> +                       o->current_filepair->ws =
> +                               diff_get_color(ecbdata.color_diff, DIFF_WHITESPACE);
> +               }
>                 if (xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
>                                   &xpp, &xecfg))
>                         die("unable to generate diff for %s", one->path);
> +               if (o->use_buffer) {
> +                       struct buffered_patch_line e = BUFFERED_PATCH_LINE_INIT;
> +                       e.state = BPL_HANDOVER; /* handover to next file pair */
> +                       append_buffered_patch_line(o, &e);
> +               }
>                 if (o->word_diff)
>                         free_diff_words_data(&ecbdata);
>                 if (textconv_one)
> @@ -4785,11 +4878,44 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>  {
>         int i;
>         struct diff_queue_struct *q = &diff_queued_diff;
> +       /*
> +        * For testing purposes we want to make sure the diff machinery
> +        * works completely with the buffer. If there is anything emitted
> +        * outside the emit_buffered_patch_line, then the order is screwed
> +        * up and the tests will fail.
> +        *
> +        * TODO (later in this series):
> +        * We'll unset this flag in a later patch.
> +        */
> +       o->use_buffer = 1;

What I would do is to add a demonstration patch at the end of the
patch series (which is not supposed to be queued) to avoid such churn
in history, but I'm not sure how the Git project prefers to do this.

> +
> +       if (o->use_buffer) {
> +               ALLOC_GROW(o->filepair_buffer,
> +                          o->filepair_buffer_nr + 1,
> +                          o->filepair_buffer_alloc);
> +               o->current_filepair = &o->filepair_buffer[o->filepair_buffer_nr];
> +       }
>         for (i = 0; i < q->nr; i++) {
>                 struct diff_filepair *p = q->queue[i];
>                 if (check_pair_status(p))
>                         diff_flush_patch(p, o);
>         }
> +
> +       if (o->use_buffer) {
> +               o->current_filepair = &o->filepair_buffer[0];
> +               for (i = 0; i < o->line_buffer_nr; i++)
> +                       process_next_buffered_patch_line(o, i);
> +
> +               for (i = 0; i < o->line_buffer_nr; i++);
> +                       free((void*)o->line_buffer[i].line);
> +
> +               o->line_buffer = NULL;
> +               o->line_buffer_nr = 0;
> +               free(o->line_buffer);
> +               o->filepair_buffer = NULL;
> +               o->filepair_buffer_nr = 0;
> +               free(o->filepair_buffer);
> +       }
>  }
>
>  void diff_flush(struct diff_options *options)
> diff --git a/diff.h b/diff.h
> index 5e89481769..c334aac02e 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -115,6 +115,36 @@ enum diff_submodule_format {
>         DIFF_SUBMODULE_INLINE_DIFF
>  };
>
> +/*
> + * This struct is used when we need to buffer the output of the diff output.
> + *
> + * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer
> + * into the pre/post image file. This pointer could be a union with the
> + * line pointer. By storing an offset into the file instead of the literal line,
> + * we can decrease the memory footprint for the buffered output. At first we
> + * may want to only have indirection for the content lines, but we could
> + * also have an enum (based on sign?) that stores prefabricated lines, e.g.
> + * the similarity score line or hunk/file headers.

This would be nice, but come to think of it, might not be possible.
When requesting --word-diff, control characters (or others) might
appear in the output, right?

> + */
> +struct buffered_patch_line {
> +       const char *set;
> +       const char *reset;
> +       const char *line;
> +       int len;
> +       int sign;
> +       enum {
> +               BPL_EMIT_LINE_WS,
> +               BPL_EMIT_LINE_ASIS,
> +               BPL_HANDOVER
> +       } state;

It might be better, for simplicity, just to have one big buffer
including everything (if we decide that we really can't add pointers
to input later).

> +};
> +#define BUFFERED_PATCH_LINE_INIT {NULL, NULL, NULL, 0, 0, 0}
> +
> +struct buffered_filepair {
> +       const char *ws;
> +       unsigned ws_rule;
> +};
> +
>  struct diff_options {
>         const char *orderfile;
>         const char *pickaxe;
> @@ -186,6 +216,15 @@ struct diff_options {
>         void *output_prefix_data;
>
>         int diff_path_counter;
> +
> +       int use_buffer;
> +
> +       struct buffered_patch_line *line_buffer;
> +       int line_buffer_nr, line_buffer_alloc;
> +
> +       struct buffered_filepair *filepair_buffer;
> +       int filepair_buffer_nr, filepair_buffer_alloc;
> +       struct buffered_filepair *current_filepair;
>  };
>
>  void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
> --
> 2.13.0.18.g183880de0a
>


* Re: [PATCH 19/19] diff.c: color moved lines differently
  2017-05-14  4:01 ` [PATCH 19/19] diff.c: color moved lines differently Stefan Beller
  2017-05-15 22:42   ` Brandon Williams
@ 2017-05-16  4:34   ` Jonathan Tan
  2017-05-16 12:31   ` Jeff King
  2 siblings, 0 replies; 128+ messages in thread
From: Jonathan Tan @ 2017-05-16  4:34 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git mailing list, Jeff King, Junio C Hamano, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

I expected there to be one main function that takes in the diff options
and returns the appropriate output without many (if any) changes in
other functions... but (as with the previous patch) maybe there are
some complications that I didn't foresee.

On Sat, May 13, 2017 at 9:01 PM, Stefan Beller <sbeller@google.com> wrote:
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 475e874d51..90403c06e3 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1051,14 +1051,22 @@ This does not affect linkgit:git-format-patch[1] or the
>  'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
>  command line with the `--color[=<when>]` option.
>
> +color.moved::
> +       A boolean value, whether a diff should color moved lines
> +       differently. The moved lines are searched for in the diff only.
> +       Duplicated lines from somewhere in the project that are not
> +       part of the diff are not colored as moved.
> +       Defaults to false.
> +
>  color.diff.<slot>::
>         Use customized color for diff colorization.  `<slot>` specifies
>         which part of the patch to use the specified color, and is one
>         of `context` (context text - `plain` is a historical synonym),
>         `meta` (metainformation), `frag`
>         (hunk header), 'func' (function in hunk header), `old` (removed lines),
> -       `new` (added lines), `commit` (commit headers), or `whitespace`
> -       (highlighting whitespace errors).
> +       `new` (added lines), `commit` (commit headers), `whitespace`
> +       (highlighting whitespace errors), `movedFrom` (removed lines that
> +       reappear), `movedTo` (added lines that were removed elsewhere).

There should be 4 "moved" colors. I think the code below is correct
(oldmoved, oldmovedalternative, etc.) but the documentation above is
wrong.

>
>  color.decorate.<slot>::
>         Use customized color for 'git log --decorate' output.  `<slot>` is one
> diff --git a/diff.c b/diff.c
> index dbab7fb44e..6372e0eb25 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -31,6 +31,7 @@ static int diff_indent_heuristic; /* experimental */
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> +static int diff_color_moved_default;
>  static int diff_context_default = 3;
>  static int diff_interhunk_context_default;
>  static const char *diff_word_regex_cfg;
> @@ -55,6 +56,10 @@ static char diff_colors[][COLOR_MAXLEN] = {
>         GIT_COLOR_YELLOW,       /* COMMIT */
>         GIT_COLOR_BG_RED,       /* WHITESPACE */
>         GIT_COLOR_NORMAL,       /* FUNCINFO */
> +       GIT_COLOR_BOLD_RED,     /* OLD_MOVED_A */
> +       GIT_COLOR_BG_RED,       /* OLD_MOVED_B */
> +       GIT_COLOR_BOLD_GREEN,   /* NEW_MOVED_A */
> +       GIT_COLOR_BG_GREEN,     /* NEW_MOVED_B */
>  };
>
>  static NORETURN void die_want_option(const char *option_name)
> @@ -80,6 +85,14 @@ static int parse_diff_color_slot(const char *var)
>                 return DIFF_WHITESPACE;
>         if (!strcasecmp(var, "func"))
>                 return DIFF_FUNCINFO;
> +       if (!strcasecmp(var, "oldmoved"))
> +               return DIFF_FILE_OLD_MOVED;
> +       if (!strcasecmp(var, "oldmovedalternative"))
> +               return DIFF_FILE_OLD_MOVED_ALT;
> +       if (!strcasecmp(var, "newmoved"))
> +               return DIFF_FILE_NEW_MOVED;
> +       if (!strcasecmp(var, "newmovedalternative"))
> +               return DIFF_FILE_NEW_MOVED_ALT;
>         return -1;
>  }
>
> @@ -234,6 +247,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>                 diff_use_color_default = git_config_colorbool(var, value);
>                 return 0;
>         }
> +       if (!strcmp(var, "color.moved")) {
> +               diff_color_moved_default = git_config_bool(var, value);
> +               return 0;
> +       }
>         if (!strcmp(var, "diff.context")) {
>                 diff_context_default = git_config_int(var, value);
>                 if (diff_context_default < 0)
> @@ -354,6 +371,81 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
>         return git_default_config(var, value, cb);
>  }
>
> +struct moved_entry {
> +       struct hashmap_entry ent;
> +       const struct buffered_patch_line *line;
> +       struct moved_entry *next_line;
> +};
> +
> +static void get_ws_cleaned_string(const struct buffered_patch_line *l,
> +                                 struct strbuf *out)
> +{
> +       int i;
> +       for (i = 0; i < l->len; i++) {
> +               if (isspace(l->line[i]))
> +                       continue;
> +               strbuf_addch(out, l->line[i]);
> +       }
> +}
> +
> +static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
> +                                        const struct buffered_patch_line *b,
> +                                        const void *keydata)
> +{
> +       struct strbuf sba = STRBUF_INIT;
> +       struct strbuf sbb = STRBUF_INIT;
> +       get_ws_cleaned_string(a, &sba);
> +       get_ws_cleaned_string(b, &sbb);
> +       return sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);
> +}
> +
> +static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
> +                                  const struct buffered_patch_line *b,
> +                                  const void *keydata)
> +{
> +       return a->len != b->len || strncmp(a->line, b->line, a->len);
> +}
> +
> +static int moved_entry_cmp(const struct moved_entry *a,
> +                          const struct moved_entry *b,
> +                          const void *keydata)
> +{
> +       return buffered_patch_line_cmp(a->line, b->line, keydata);
> +}
> +
> +static int moved_entry_cmp_no_ws(const struct moved_entry *a,
> +                                const struct moved_entry *b,
> +                                const void *keydata)
> +{
> +       return buffered_patch_line_cmp_no_ws(a->line, b->line, keydata);
> +}
> +
> +static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_ws)
> +{
> +       static struct strbuf sb = STRBUF_INIT;
> +
> +       if (ignore_ws) {
> +               strbuf_reset(&sb);
> +               get_ws_cleaned_string(line, &sb);
> +               return memhash(sb.buf, sb.len);
> +       } else
> +               return memhash(line->line, line->len);
> +}
> +
> +static struct moved_entry *prepare_entry(struct diff_options *o,
> +                                        int line_no)
> +{
> +       struct moved_entry *ret = xmalloc(sizeof(*ret));
> +       unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
> +       struct buffered_patch_line *l = &o->line_buffer[line_no];
> +
> +       ret->ent.hash = get_line_hash(l, ignore_ws);
> +       ret->line = l;
> +       ret->next_line = NULL;
> +
> +       return ret;
> +}
> +
>  static char *quote_two(const char *one, const char *two)
>  {
>         int need_one = quote_c_style(one, NULL, NULL, 1);
> @@ -516,8 +608,98 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
>         ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
>  }
>
> +static void mark_color_as_moved(struct diff_options *o, int line_no)
> +{
> +       struct hashmap *hm = NULL;
> +       struct moved_entry *key = prepare_entry(o, line_no);
> +       struct moved_entry *match = NULL;
> +       struct buffered_patch_line *l = &o->line_buffer[line_no];
> +       int alt_flag;
> +       int i, lp, rp;
> +
> +       switch (l->sign) {
> +       case '+':
> +               hm = o->deleted_lines;
> +               break;
> +       case '-':
> +               hm = o->added_lines;
> +               break;
> +       default:
> +               /* reset to standard, on-alt move color */
> +               o->color_moved = 1;
> +               break;
> +       }
> +
> +       /* Check for any match to color it as a move. */
> +       if (!hm)
> +               return;
> +       match = hashmap_get(hm, key, o);
> +       free(key);
> +       if (!match)
> +               return;
> +
> +       /* Check any potential block runs, advance each or nullify */
> +       for (i = 0; i < o->pmb_nr; i++) {
> +               struct moved_entry *p = o->pmb[i];
> +               if (p && p->next_line &&
> +                   !buffered_patch_line_cmp(p->next_line->line, l, o)) {
> +                       o->pmb[i] = p->next_line;
> +               } else {
> +                       o->pmb[i] = NULL;
> +               }
> +       }
> +
> +       /* Shrink the set to the remaining runs */
> +       for (lp = 0, rp = o->pmb_nr - 1; lp <= rp;) {
> +               while (lp < o->pmb_nr && o->pmb[lp])
> +                       lp ++;
> +               /* lp points at the first NULL now */
> +
> +               while (rp > -1 && !o->pmb[rp])
> +                       rp--;
> +               /* rp points at the last non-NULL */
> +
> +               if (lp < o->pmb_nr && rp > -1 && lp < rp) {
> +                       o->pmb[lp] = o->pmb[rp];
> +                       o->pmb[rp] = NULL;
> +                       rp--;
> +                       lp++;
> +               }
> +       }
> +
> +       if (rp > -1) {
> +               /* Remember the number of running sets */
> +               o->pmb_nr = rp + 1;
> +       } else {
> +               /* Toggle color */
> +               o->color_moved = o->color_moved == 2 ? 1 : 2;
> +
> +               /* Build up a new set */
> +               i = 0;
> +               for (; match; match = hashmap_get_next(hm, match)) {
> +                       ALLOC_GROW(o->pmb, i + 1, o->pmb_alloc);
> +                       o->pmb[i] = match;
> +                       i++;
> +               }
> +               o->pmb_nr = i;
> +       }
> +
> +       alt_flag = o->color_moved - 1;
> +       switch (l->sign) {
> +       case '+':
> +               l->set = diff_get_color_opt(o, DIFF_FILE_NEW_MOVED + alt_flag);
> +               break;
> +       case '-':
> +               l->set = diff_get_color_opt(o, DIFF_FILE_OLD_MOVED + alt_flag);
> +               break;
> +       default:
> +               ; /* nothing */
> +       }
> +}
> +
>  static void emit_buffered_patch_line(struct diff_options *o,
> -                                    struct buffered_patch_line *e)
> +                                    struct buffered_patch_line *e,
> +                                    int pass)

1. I didn't expect such a function to need to be modified in this patch.
2. What does "pass" do?

>  {
>         int has_trailing_newline, has_trailing_carriage_return, len = e->len;
>         FILE *file = o->file;
> @@ -548,11 +730,11 @@ static void emit_buffered_patch_line(struct diff_options *o,
>
>  static void emit_buffered_patch_line_ws(struct diff_options *o,
>                                         struct buffered_patch_line *e,
> -                                       const char *ws, unsigned ws_rule)
> +                                       const char *ws, unsigned ws_rule,
> +                                       int pass)
>  {
>         struct buffered_patch_line s = {e->set, e->reset, "", 0, e->sign};
> -
> -       emit_buffered_patch_line(o, &s);
> +       emit_buffered_patch_line(o, &s, 0);
>         ws_check_emit(e->line, e->len, ws_rule,
>                       o->file, e->set, e->reset, ws);
>  }
> @@ -564,12 +746,14 @@ static void process_next_buffered_patch_line(struct diff_options *o, int line_no
>         const char *ws = o->current_filepair->ws;
>         unsigned ws_rule = o->current_filepair->ws_rule;
>
> +       mark_color_as_moved(o, line_no);
> +
>         switch (e->state) {
>                 case BPL_EMIT_LINE_ASIS:
> -                       emit_buffered_patch_line(o, e);
> +                       emit_buffered_patch_line(o, e, 1);
>                         break;
>                 case BPL_EMIT_LINE_WS:
> -                       emit_buffered_patch_line_ws(o, e, ws, ws_rule);
> +                       emit_buffered_patch_line_ws(o, e, ws, ws_rule, 1);
>                         break;
>                 case BPL_HANDOVER:
>                         o->current_filepair++;
> @@ -602,7 +786,7 @@ static void emit_line_0(struct diff_options *o,
>         if (o->use_buffer)
>                 append_buffered_patch_line(o, &e);
>         else
> -               emit_buffered_patch_line(o, &e);
> +               emit_buffered_patch_line(o, &e, 0);
>  }
>
>  void emit_line(struct diff_options *o, const char *set, const char *reset,
> @@ -621,7 +805,7 @@ static void emit_line_ws(struct diff_options *o,
>         if (o->use_buffer)
>                 append_buffered_patch_line(o, &e);
>         else
> -               emit_buffered_patch_line_ws(o, &e, ws, ws_rule);
> +               emit_buffered_patch_line_ws(o, &e, ws, ws_rule, 0);
>  }
>
>  void emit_line_fmt(struct diff_options *o,
> @@ -676,6 +860,36 @@ static void emit_line_checked(const char *reset,
>                              ws, ecbdata->ws_rule);
>  }
>
> +static void add_line_to_move_detection(struct diff_options *o, int line_idx)
> +{
> +       int sign = 0;
> +       struct hashmap *hm;
> +       struct moved_entry *key;
> +
> +       switch (o->line_buffer[line_idx].sign) {
> +       case '+':
> +               sign = '+';
> +               hm = o->added_lines;
> +               break;
> +       case '-':
> +               sign = '-';
> +               hm = o->deleted_lines;
> +               break;
> +       case ' ':
> +       default:
> +               o->prev_line = NULL;
> +               return;
> +       }
> +
> +       key = prepare_entry(o, line_idx);
> +       if (o->prev_line &&
> +           o->prev_line->line->sign == sign)
> +               o->prev_line->next_line = key;
> +
> +       hashmap_add(hm, key);
> +       o->prev_line = key;
> +}
> +
>  static void emit_add_line(const char *reset,
>                           struct emit_callback *ecbdata,
>                           const char *line, int len)
> @@ -3649,6 +3863,9 @@ void diff_setup_done(struct diff_options *options)
>
>         if (DIFF_OPT_TST(options, FOLLOW_RENAMES) && options->pathspec.nr != 1)
>                 die(_("--follow requires exactly one pathspec"));
> +
> +       if (!options->use_color || external_diff())
> +               options->color_moved = 0;
>  }
>
>  static int opt_arg(const char *arg, int arg_short, const char *arg_long, int *val)
> @@ -4073,6 +4290,10 @@ int diff_opt_parse(struct diff_options *options,
>         }
>         else if (!strcmp(arg, "--no-color"))
>                 options->use_color = 0;
> +       else if (!strcmp(arg, "--color-moved"))
> +               options->color_moved = 1;
> +       else if (!strcmp(arg, "--no-color-moved"))
> +               options->color_moved = 0;
>         else if (!strcmp(arg, "--color-words")) {
>                 options->use_color = 1;
>                 options->word_diff = DIFF_WORDS_COLOR;
> @@ -4878,16 +5099,19 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>  {
>         int i;
>         struct diff_queue_struct *q = &diff_queued_diff;
> -       /*
> -        * For testing purposes we want to make sure the diff machinery
> -        * works completely with the buffer. If there is anything emitted
> -        * outside the emit_buffered_patch_line, then the order is screwed
> -        * up and the tests will fail.
> -        *
> -        * TODO (later in this series):
> -        * We'll unset this flag in a later patch.
> -        */
> -       o->use_buffer = 1;
> +
> +       if (o->color_moved) {
> +               unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
> +               o->use_buffer = 1;
> +               o->deleted_lines = xmallocz(sizeof(*o->deleted_lines));
> +               o->added_lines = xmallocz(sizeof(*o->added_lines));
> +               hashmap_init(o->deleted_lines, ignore_ws ?
> +                       (hashmap_cmp_fn)moved_entry_cmp_no_ws :
> +                       (hashmap_cmp_fn)moved_entry_cmp, 0);
> +               hashmap_init(o->added_lines, ignore_ws ?
> +                       (hashmap_cmp_fn)moved_entry_cmp_no_ws :
> +                       (hashmap_cmp_fn)moved_entry_cmp, 0);
> +       }
>
>         if (o->use_buffer) {
>                 ALLOC_GROW(o->filepair_buffer,
> @@ -4902,6 +5126,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>         }
>
>         if (o->use_buffer) {
> +               o->current_filepair = &o->filepair_buffer[0];
> +               for (i = 0; i < o->line_buffer_nr; i++)
> +                       add_line_to_move_detection(o, i);
> +
>                 o->current_filepair = &o->filepair_buffer[0];
>                 for (i = 0; i < o->line_buffer_nr; i++)
>                         process_next_buffered_patch_line(o, i);
> @@ -4992,6 +5220,7 @@ void diff_flush(struct diff_options *options)
>                 if (!options->file)
>                         die_errno("Could not open /dev/null");
>                 options->close_file = 1;
> +               options->color_moved = 0;
>                 for (i = 0; i < q->nr; i++) {
>                         struct diff_filepair *p = q->queue[i];
>                         if (check_pair_status(p))
> diff --git a/diff.h b/diff.h
> index c334aac02e..b83d6fefcc 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -7,6 +7,7 @@
>  #include "tree-walk.h"
>  #include "pathspec.h"
>  #include "object.h"
> +#include "hashmap.h"
>
>  struct rev_info;
>  struct diff_options;
> @@ -145,6 +146,8 @@ struct buffered_filepair {
>         unsigned ws_rule;
>  };
>
> +struct moved_entry;
> +
>  struct diff_options {
>         const char *orderfile;
>         const char *pickaxe;
> @@ -217,6 +220,8 @@ struct diff_options {
>
>         int diff_path_counter;
>
> +       /* Determines color moved code. Flipped between 1, 2 for alt. color. */
> +       int color_moved;
>         int use_buffer;
>
>         struct buffered_patch_line *line_buffer;
> @@ -225,6 +230,16 @@ struct diff_options {
>         struct buffered_filepair *filepair_buffer;
>         int filepair_buffer_nr, filepair_buffer_alloc;
>         struct buffered_filepair *current_filepair;
> +
> +       /* built up in the first pass: */
> +       struct hashmap *deleted_lines;
> +       struct hashmap *added_lines;
> +       /* needed for building up */
> +       struct moved_entry *prev_line;
> +
> +       /* state in the second pass */
> +       struct moved_entry **pmb; /* potentially moved blocks */
> +       int pmb_nr, pmb_alloc;

Placing these in the public API makes the scope unnecessarily large -
could these be stored in a struct (or better, in local variables)
private to the .c file? In particular, prev_line should not need a
scope greater than a function - a function could just loop through all
the buffered lines and construct the two hash maps, and prev_line
would not be needed elsewhere.
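
As an illustration only, here is a minimal sketch of that shape. The
names (sketch_entry, link_runs) are made up for this example and are not
git code; in the real patch the loop body would also add each entry to
the added/deleted hash map. The point is just that "prev" can be a local:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct moved_entry; not the real git struct. */
struct sketch_entry {
	int sign;                       /* '+', '-' or ' ' */
	struct sketch_entry *next_line; /* next line of the same run, or NULL */
};

/*
 * Link consecutive entries that share a '+' or '-' sign. "prev" lives
 * only inside this function, so it needs no slot in a shared struct.
 * Returns the number of links made.
 */
int link_runs(struct sketch_entry *e, size_t nr)
{
	struct sketch_entry *prev = NULL;
	int links = 0;
	size_t i;

	assert(e || !nr);
	for (i = 0; i < nr; i++) {
		e[i].next_line = NULL;
		if (e[i].sign != '+' && e[i].sign != '-') {
			prev = NULL; /* a context line ends the current run */
			continue;
		}
		if (prev && prev->sign == e[i].sign) {
			prev->next_line = &e[i];
			links++;
		}
		prev = &e[i];
	}
	return links;
}
```

With something like this, the state needed while building would stay
inside one function instead of living in struct diff_options.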

>  };
>
>  void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
> @@ -241,7 +256,11 @@ enum color_diff {
>         DIFF_FILE_NEW = 5,
>         DIFF_COMMIT = 6,
>         DIFF_WHITESPACE = 7,
> -       DIFF_FUNCINFO = 8
> +       DIFF_FUNCINFO = 8,
> +       DIFF_FILE_OLD_MOVED = 9,
> +       DIFF_FILE_OLD_MOVED_ALT = 10,
> +       DIFF_FILE_NEW_MOVED = 11,
> +       DIFF_FILE_NEW_MOVED_ALT = 12
>  };
>  const char *diff_get_color(int diff_use_color, enum color_diff ix);
>  #define diff_get_color_opt(o, ix) \
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 289806d0c7..232d9ad55e 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
[snip]


* Re: [PATCH 19/19] diff.c: color moved lines differently
  2017-05-14  4:01 ` [PATCH 19/19] diff.c: color moved lines differently Stefan Beller
  2017-05-15 22:42   ` Brandon Williams
  2017-05-16  4:34   ` Jonathan Tan
@ 2017-05-16 12:31   ` Jeff King
  2 siblings, 0 replies; 128+ messages in thread
From: Jeff King @ 2017-05-16 12:31 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jonathantanmy, gitster, mhagger, jrnieder, bmwill

On Sat, May 13, 2017 at 09:01:17PM -0700, Stefan Beller wrote:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 475e874d51..90403c06e3 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1051,14 +1051,22 @@ This does not affect linkgit:git-format-patch[1] or the
>  'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
>  command line with the `--color[=<when>]` option.
>  
> +color.moved::
> +	A boolean value, whether a diff should color moved lines
> +	differently. The moved lines are searched for in the diff only.
> +	Duplicated lines from somewhere in the project that are not
> +	part of the diff are not colored as moved.
> +	Defaults to false.

I wanted to play with this series to see how it looked on a few commits.
Since this was the only documentation change, I tried "git -c
color.moved=true diff ...". But it doesn't seem to work.

If we grep for diff_color_moved_default, we can see it declared:

> diff --git a/diff.c b/diff.c
> index dbab7fb44e..6372e0eb25 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -31,6 +31,7 @@ static int diff_indent_heuristic; /* experimental */
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> +static int diff_color_moved_default;
>  static int diff_context_default = 3;
>  static int diff_interhunk_context_default;
>  static const char *diff_word_regex_cfg;

and we can see it parsed:

> @@ -234,6 +247,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>  		diff_use_color_default = git_config_colorbool(var, value);
>  		return 0;
>  	}
> +	if (!strcmp(var, "color.moved")) {
> +		diff_color_moved_default = git_config_bool(var, value);
> +		return 0;
> +	}

But then nobody uses it. ;)

I suspect diff_setup() needs to copy the default into the diff_options
struct?
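
Something along these lines, perhaps. This is a stand-alone sketch and not the real diff.c code: the `*_sketch` names and the trimmed-down struct are made up for illustration. It only shows the two steps, remembering the configured value and then seeding each new options struct from it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the file-scope default set while parsing config. */
static int color_moved_default_sketch;

/* Hypothetical, trimmed-down stand-in for struct diff_options. */
struct opts_sketch {
	int color_moved;
};

/* Parse step: remember the configured value in the file-scope default. */
static int config_sketch(const char *var, int value)
{
	assert(var);
	if (!strcmp(var, "color.moved")) {
		color_moved_default_sketch = value;
		return 0;
	}
	return 1; /* not a key this sketch handles */
}

/* Setup step: seed every new options struct from the default. */
static void setup_sketch(struct opts_sketch *o)
{
	assert(o);
	memset(o, 0, sizeof(*o));
	o->color_moved = color_moved_default_sketch;
}
```

Without the assignment in the setup step, the parsed value is stored but never reaches the code that does the coloring, which matches what I saw.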

By reading the code, I found --color-moved, which did work. Yay. So then
I wanted to see how it looked on a wide variety of commits. So I ran
"git log -p --color-moved", but it segfaulted. :-/

I didn't dig, but here's what valgrind says, in case it helps:

  $ valgrind ./git --no-pager log --oneline -p --color-moved
  [...]
  fcdb8874d diff: buffer all output if asked to
  ==8801== Invalid write of size 4
  ==8801==    at 0x1FD521: builtin_diff (diff.c:2869)
  ==8801==    by 0x1FF3CB: run_diff_cmd (diff.c:3582)
  ==8801==    by 0x1FF7F9: run_diff (diff.c:3670)
  ==8801==    by 0x202CC8: diff_flush_patch (diff.c:4664)
  ==8801==    by 0x204123: diff_flush_patch_all_file_pairs (diff.c:5125)
  ==8801==    by 0x204682: diff_flush (diff.c:5249)
  ==8801==    by 0x22CE18: log_tree_diff_flush (log-tree.c:775)
  ==8801==    by 0x22D063: log_tree_diff (log-tree.c:842)
  ==8801==    by 0x22D195: log_tree_commit (log-tree.c:871)
  ==8801==    by 0x1648CE: cmd_log_walk (log.c:365)
  ==8801==    by 0x1659B0: cmd_log (log.c:689)
  ==8801==    by 0x11A434: run_builtin (git.c:371)
  ==8801==  Address 0x8 is not stack'd, malloc'd or (recently) free'd
  ==8801== 
  ==8801== 
  ==8801== Process terminating with default action of signal 11 (SIGSEGV)
  ==8801==  Access not within mapped region at address 0x8
  ==8801==    at 0x1FD521: builtin_diff (diff.c:2869)
  ==8801==    by 0x1FF3CB: run_diff_cmd (diff.c:3582)
  ==8801==    by 0x1FF7F9: run_diff (diff.c:3670)
  ==8801==    by 0x202CC8: diff_flush_patch (diff.c:4664)
  ==8801==    by 0x204123: diff_flush_patch_all_file_pairs (diff.c:5125)
  ==8801==    by 0x204682: diff_flush (diff.c:5249)
  ==8801==    by 0x22CE18: log_tree_diff_flush (log-tree.c:775)
  ==8801==    by 0x22D063: log_tree_diff (log-tree.c:842)
  ==8801==    by 0x22D195: log_tree_commit (log-tree.c:871)
  ==8801==    by 0x1648CE: cmd_log_walk (log.c:365)
  ==8801==    by 0x1659B0: cmd_log (log.c:689)
  ==8801==    by 0x11A434: run_builtin (git.c:371)

It does show one commit correctly; fcdb8874d (your commit, but picked up
from the list, so my commit sha1) is the second one. I don't know if
that's coincidence, or maybe there's something that's not properly
reset between diff runs.

-Peff


* Re: [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0
  2017-05-15 18:33     ` Stefan Beller
@ 2017-05-16 16:05       ` Jonathan Tan
  0 siblings, 0 replies; 128+ messages in thread
From: Jonathan Tan @ 2017-05-16 16:05 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jeff King, Junio C Hamano, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Mon, May 15, 2017 at 11:33 AM, Stefan Beller <sbeller@google.com> wrote:
> On Mon, May 15, 2017 at 11:26 AM, Jonathan Tan <jonathantanmy@google.com> wrote:
>> I also don't understand the meaning of this paragraph - if you mean that
>> this patch teaches other callers to hardcode the sign, I don't see any such
>> changes in the diff below.
>
> The last two hunks of the patch switch two callers that call with a sign
> that is hard to reason about.

The last two hunks don't hardcode any signs, as far as I can see. They
do pass in a "first" character that may or may not be a sign, if that
is what you mean.

In any case, can you reword that paragraph into an imperative
statement (e.g. "teach X to...", "make X...")?

>>> +                       char term[2];
>>> +                       term[0] = options->line_termination;
>>> +                       term[1] = '\0';
>>> +
>>> +                       emit_line(options, NULL, NULL,
>>> +                                 term, 1);
>>
>>
>> If options->line_termination is 0, this is actually a zero-length string
>> (not 1).
>
> So passing in !!options->line_termination should be fine?

Yes, that would work. I slightly prefer !!term[0].
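
To spell out the length issue, here is a small stand-alone sketch. The helper name is hypothetical and nothing like it exists in diff.c. It builds the terminator string the same way the quoted hunk does and returns the length that should be passed along.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: build the one-character terminator string and
 * return its length.  When line_termination is 0, the string is empty,
 * so the length has to be 0, not 1.
 */
static int fill_term_sketch(char term[2], int line_termination)
{
	assert(term);
	term[0] = (char)line_termination;
	term[1] = '\0';
	return !!term[0];
}
```

Using !!term[0] keeps the length tied to the buffer that is actually passed, rather than to the option it was copied from.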


* Re: [PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_*
  2017-05-16  1:05     ` Junio C Hamano
@ 2017-05-16 16:23       ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-16 16:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jonathan Tan, Jeff King, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Mon, May 15, 2017 at 6:05 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>>> -            fprintf(o->file, "%s%s--- %s%s%s\n",
>>> -                    line_prefix, meta, ecbdata->label_path[0], reset, name_a_tab);
>>> -            fprintf(o->file, "%s%s+++ %s%s%s\n",
>>> -                    line_prefix, meta, ecbdata->label_path[1], reset, name_b_tab);
>>> +            emit_line_fmt(o, meta, reset, "--- %s%s\n",
>>> +                          ecbdata->label_path[0], name_a_tab);
>>> +            emit_line_fmt(o, meta, reset, "+++ %s%s\n",
>>> +                          ecbdata->label_path[1], name_b_tab);
>>
>> How is the loss of line_prefix from this call site compensated?
>
> OK, emit_line_0() has already been aware of line_prefix, so that is
> how the loss of line_prefix in the above is accounted for.  We are
> good here.
>
>>>              ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
>>>      }
>>>
>>> @@ -1349,7 +1346,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
>>>              diff_words_flush(ecbdata);
>>>              if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
>>>                      emit_line(o, context, reset, line, len);
>>> -                    fputs("~\n", o->file);
>>> +                    emit_line(o, NULL, NULL, "~\n", 2);
>
> So unless we have some magic here, we would see an extra line-prefix
> before that "~\n" thing, no?

Right.

With all the discussion on the convert* patches, I am considering just
introducing another parameter to emit_line that specifies whether we
want emit_line to output a line_prefix. Then the conversion becomes a
lot easier as well.
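
Roughly like this, as a stand-alone sketch only. The function name and the choice to format into a caller-supplied buffer (instead of writing to o->file) are assumptions made so the example is easy to check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: format one output line into buf, optionally
 * preceded by the line prefix.  Returns what snprintf returns, i.e.
 * the number of characters the full line needs.
 */
static int emit_line_sketch(char *buf, size_t size, const char *line_prefix,
			    int add_line_prefix, const char *line)
{
	assert(buf && line_prefix && line);
	return snprintf(buf, size, "%s%s",
			add_line_prefix ? line_prefix : "", line);
}
```

A call site that used to print the prefix itself passes 1; a call site like the "~\n" one, which never printed a prefix, passes 0 and keeps its old output.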

Thanks,
Stefan


* Re: [PATCH 18/19] diff: buffer all output if asked to
  2017-05-16  4:14   ` Jonathan Tan
@ 2017-05-16 16:42     ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-16 16:42 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Git mailing list, Jeff King, Junio C Hamano, Michael Haggerty,
	Jonathan Nieder, Brandon Williams

On Mon, May 15, 2017 at 9:14 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> Overall, this patch seems larger than it should to me, although there
> might be good reasons for that that I don't know. I'll remark on what
> I find unexpected.
>


>>
>> +static void emit_buffered_patch_line_ws(struct diff_options *o,
>> +                                       struct buffered_patch_line *e,
>> +                                       const char *ws, unsigned ws_rule)
>
> This introduces a new _ws emission function - how is this used and how
> is this different from the non-ws one? I see BPL_EMIT_LINE_WS, but I
> don't see the caller that introduces that constant in this patch.

See emit_line_ws, which makes use of BPL_EMIT_LINE_WS.
The difference between BPL_EMIT_LINE_WS and BPL_EMIT_LINE_ASIS
is that _WS is emitted marking up whitespace differently (e.g. 8 consecutive
spaces instead of a tab or such), see the core.whitespace option.
Relevant hunk:

@@ -557,9 +616,12 @@ static void emit_line_ws(struct diff_options *o,
                         const char *line, int len,
                         const char *ws, unsigned ws_rule)
 {
-       emit_line_0(o, set, reset, sign, "", 0);
-       ws_check_emit(line, len, ws_rule,
-                     o->file, set, reset, ws);
+       struct buffered_patch_line e = {set, reset, line, len, sign, BPL_EMIT_LINE_WS};
+
+       if (o->use_buffer)
+               append_buffered_patch_line(o, &e);
+       else
+               emit_buffered_patch_line_ws(o, &e, ws, ws_rule);
 }
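
The "buffer if asked to, else emit now" shape of that hunk can be shown in isolation. The following is a reduced sketch with hypothetical names; the structs are stand-ins and the immediate-emit path is modelled by a counter instead of real output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct buffered_patch_line. */
struct line_sketch {
	char *text;
	int markup_ws;	/* 1: whitespace-checked (_WS), 0: as is (_ASIS) */
};

/* Hypothetical, reduced stand-in for struct diff_options. */
struct out_sketch {
	int use_buffer;
	struct line_sketch *buf;
	size_t nr, alloc;
	size_t emitted;	/* stands in for "written to the file right away" */
};

/* Buffer a private copy; the caller's storage may be reused later. */
static void append_sketch(struct out_sketch *o, const char *text, int markup_ws)
{
	size_t len = strlen(text);
	char *copy = malloc(len + 1);

	if (!copy)
		abort();
	memcpy(copy, text, len + 1);

	if (o->nr == o->alloc) {
		size_t n = o->alloc ? 2 * o->alloc : 16;
		struct line_sketch *p = realloc(o->buf, n * sizeof(*p));
		if (!p)
			abort();
		o->buf = p;
		o->alloc = n;
	}
	o->buf[o->nr].text = copy;
	o->buf[o->nr].markup_ws = markup_ws;
	o->nr++;
}

/* Same entry point for both modes: buffer if asked to, else emit now. */
static void emit_sketch(struct out_sketch *o, const char *text, int markup_ws)
{
	assert(o && text);
	if (o->use_buffer)
		append_sketch(o, text, markup_ws);
	else
		o->emitted++;
}

/* Free everything that append_sketch() allocated. */
static void release_sketch(struct out_sketch *o)
{
	size_t i;

	for (i = 0; i < o->nr; i++)
		free(o->buf[i].text);
	free(o->buf);
	o->buf = NULL;
	o->nr = o->alloc = 0;
}
```

The state recorded per line (here markup_ws) is what lets the later output pass pick the right emission routine.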

>> +       switch (e->state) {
>> +               case BPL_EMIT_LINE_ASIS:
>> +                       emit_buffered_patch_line(o, e);
>> +                       break;
>> +               case BPL_EMIT_LINE_WS:
>> +                       emit_buffered_patch_line_ws(o, e, ws, ws_rule);
>> +                       break;
>> +               case BPL_HANDOVER:
>> +                       o->current_filepair++;
>
> If we're just buffering the diff output, do we need to store
> per-file-pair metadata? (I assume that's why you need a special
> handover constant.) Clients can already read what they need from the
> diff output.

Currently we keep only whitespace settings per-file separate as they are
defined per path (via attributes).

So I read this comment as saying that you consider the per-file buffer
unneeded and that we could just detect the next file via the line

    <line-prefix>diff --git a/<file> b/<file>

and then re-read the attributes and remember only the current file's settings in
the output pass. I'll look into that.
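
Detecting that boundary could look roughly like the sketch below. The function name is hypothetical, and it assumes the line prefix is known as a plain string and that the header always starts with "diff --git ".

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical check: does this buffered line start a new file pair?
 * It must begin with the line prefix, followed by "diff --git ".
 */
static int starts_new_file_sketch(const char *line, const char *line_prefix)
{
	static const char header[] = "diff --git ";
	size_t plen;

	assert(line && line_prefix);
	plen = strlen(line_prefix);
	if (strncmp(line, line_prefix, plen))
		return 0;
	return !strncmp(line + plen, header, sizeof(header) - 1);
}
```

Content lines carry a sign character in front, so an added line whose text is "diff --git ..." would not be mistaken for a header by this check.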

>> +        *
>> +        * TODO (later in this series):
>> +        * We'll unset this flag in a later patch.
>> +        */
>> +       o->use_buffer = 1;
>
> What I would do is to add a demonstration patch at the end of the
> patch series (which is not supposed to be queued) to avoid such churn
> in history, but I'm not sure how the Git project prefers to do this.

ok, I can omit this part in a reroll.

>> + *
>> + * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer
>> + * into the pre/post image file. This pointer could be a union with the
>> + * line pointer. By storing an offset into the file instead of the literal line,
>> + * we can decrease the memory footprint for the buffered output. At first we
>> + * may want to only have indirection for the content lines, but we could
>> + * also have an enum (based on sign?) that stores prefabricated lines, e.g.
>> + * the similarity score line or hunk/file headers.
>
> This would be nice, but come to think of it, might not be possible.
> When requesting --word-diff, control characters (or others) might
> appear in the output, right?

That is why we'd have even more states. ;)
Or duplicate word diff still, but lines left intact are referenced via offsets.
It's a hard problem to get right, so I defer it via this comment.
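
For reference, the indirection from the NEEDSWORK comment could be sketched as follows. This is only an illustration with hypothetical names; it ignores word diff entirely and assumes the pre/post image is available as one contiguous buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical line reference: either a literal line (headers and other
 * prefabricated text) or an offset into the pre/post image.
 */
struct line_ref_sketch {
	int is_offset;
	union {
		const char *line;	/* literal text */
		size_t offset;		/* position inside the image */
	} u;
	int len;
};

/* Turn a reference back into a pointer to the line's first byte. */
static const char *resolve_sketch(const struct line_ref_sketch *r,
				  const char *image)
{
	assert(r);
	if (r->is_offset) {
		assert(image);
		return image + r->u.offset;
	}
	return r->u.line;
}
```

Content lines would then cost an offset instead of a copy, while anything synthesized on the fly still needs its own storage.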

>
>> + */
>> +struct buffered_patch_line {
>> +       const char *set;
>> +       const char *reset;
>> +       const char *line;
>> +       int len;
>> +       int sign;
>> +       enum {
>> +               BPL_EMIT_LINE_WS,
>> +               BPL_EMIT_LINE_ASIS,
>> +               BPL_HANDOVER
>> +       } state;
>
> It might be better, for simplicity, just to have one big buffer
> including everything (if we decide that we really can't add pointers
> to input later).

What do you mean here? (Drop the other structs such as the file pair?)

Thanks for the review!
Stefan


* [PATCHv2 00/20] Diff machine: highlight moved lines.
  2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
                   ` (19 preceding siblings ...)
  2017-05-15 12:43 ` [RFC PATCH 00/19] Diff machine: highlight moved lines Junio C Hamano
@ 2017-05-17  2:58 ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 01/20] diff: readability fix Stefan Beller
                     ` (20 more replies)
  20 siblings, 21 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

v2:
* emit_line now takes an argument that indicates if we want it
  to emit the line prefix as well. This should allow for a more faithful
  refactoring in the beginning. (Thanks Jonathan!)
* fixed memleaks (Thanks Brandon!)
* "git -c color.moved=true log -p" works now! (Thanks Jeff)
* interdiff below, though it is large.
* less intrusive than v1 (Thanks Jonathan!)
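
The memleak fix in the whitespace-insensitive comparison follows the usual "compute, release, return" pattern. A stand-alone sketch of that pattern, with hypothetical helpers and the assumption that cleaning means dropping all whitespace (the real get_ws_cleaned_string may differ):

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of s without whitespace. */
static char *ws_cleaned_sketch(const char *s)
{
	char *out = malloc(strlen(s) + 1);
	char *p = out;

	if (!out)
		abort();
	for (; *s; s++)
		if (!isspace((unsigned char)*s))
			*p++ = *s;
	*p = '\0';
	return out;
}

/* Returns 0 if a and b are equal modulo whitespace, 1 otherwise. */
static int cmp_no_ws_sketch(const char *a, const char *b)
{
	char *ca, *cb;
	int ret;

	assert(a && b);
	ca = ws_cleaned_sketch(a);
	cb = ws_cleaned_sketch(b);
	ret = strcmp(ca, cb) != 0;

	/* release the temporaries before returning, not after */
	free(ca);
	free(cb);
	return ret;
}
```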

v1:

For details on *why* see the commit message of the last commit.

The first five patches are slight refactorings to get into good
shape, the next patches are funneling all output through emit_line_*.

The second-to-last patch introduces an option to buffer up all output
before printing, and then the last patch can color moved lines
of code.
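
As a rough illustration of what "moved" means here, the sketch below marks an added line as moved if the same text was removed somewhere in the diff, and vice versa. It is not the algorithm from the patch: the names are hypothetical, a quadratic scan replaces the hashmaps, and block tracking and the alternative colors are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced diff line: sign is '+', '-' or ' '. */
struct dl_sketch {
	char sign;
	const char *text;
	int moved;
};

static int is_change_sketch(char sign)
{
	return sign == '+' || sign == '-';
}

/* Mark each changed line that has a twin with the opposite sign. */
static void mark_moved_sketch(struct dl_sketch *l, size_t nr)
{
	size_t i, j;

	assert(l || !nr);
	for (i = 0; i < nr; i++) {
		l[i].moved = 0;
		if (!is_change_sketch(l[i].sign))
			continue;
		for (j = 0; j < nr; j++) {
			if (is_change_sketch(l[j].sign) &&
			    l[j].sign != l[i].sign &&
			    !strcmp(l[i].text, l[j].text)) {
				l[i].moved = 1;
				break;
			}
		}
	}
}
```

Only lines inside the diff are considered, which matches the documented behavior that duplicates elsewhere in the project are not colored as moved.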

Any feedback welcome.

Thanks,
Stefan

Stefan Beller (20):
  diff: readability fix
  diff: move line ending check into emit_hunk_header
  diff.c: factor out diff_flush_patch_all_file_pairs
  diff.c: teach emit_line_0 to accept sign parameter
  diff.c: emit_line_0 can handle no color setting
  diff.c: emit_line_0 takes parameter whether to output line prefix
  diff.c: inline emit_line_0 into emit_line
  diff.c: convert fn_out_consume to use emit_line
  diff.c: convert builtin_diff to use emit_line_*
  diff.c: convert emit_rewrite_diff to use emit_line_*
  diff.c: convert emit_rewrite_lines to use emit_line_*
  submodule.c: convert show_submodule_summary to use emit_line_fmt
  diff.c: convert emit_binary_diff_body to use emit_line_*
  diff.c: convert show_stats to use emit_line_*
  diff.c: convert word diffing to use emit_line_*
  diff.c: convert diff_flush to use emit_line_*
  diff.c: convert diff_summary to use emit_line_*
  diff.c: emit_line includes whitespace highlighting
  diff: buffer all output if asked to
  diff.c: color moved lines differently

 Documentation/config.txt   |  14 +-
 diff.c                     | 845 +++++++++++++++++++++++++++++++++------------
 diff.h                     |  61 +++-
 submodule.c                |  78 ++---
 submodule.h                |   9 +-
 t/t4015-diff-whitespace.sh | 229 ++++++++++++
 6 files changed, 960 insertions(+), 276 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 90403c06e3..902d017c3b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1065,8 +1065,10 @@ color.diff.<slot>::
 	`meta` (metainformation), `frag`
 	(hunk header), 'func' (function in hunk header), `old` (removed lines),
 	`new` (added lines), `commit` (commit headers), `whitespace`
-	(highlighting whitespace errors), `movedFrom` (removed lines that
-	reappear), `movedTo` (added lines that were removed elsewhere).
+	(highlighting whitespace errors), `oldMoved` (removed lines that
+	reappear), `newMoved` (added lines that were removed elsewhere),
+	`oldMovedAlternative` and `newMovedAlternative` (as a fallback to
+	cover adjacent blocks of moved code)
 
 color.decorate.<slot>::
 	Use customized color for 'git log --decorate' output.  `<slot>` is one
diff --git a/diff.c b/diff.c
index 5dfd582084..15cf322b50 100644
--- a/diff.c
+++ b/diff.c
@@ -392,11 +392,17 @@ static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
 					 const struct buffered_patch_line *b,
 					 const void *keydata)
 {
+	int ret;
 	struct strbuf sba = STRBUF_INIT;
 	struct strbuf sbb = STRBUF_INIT;
+
 	get_ws_cleaned_string(a, &sba);
 	get_ws_cleaned_string(b, &sbb);
-	return sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);
+	ret = sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);
+
+	strbuf_release(&sba);
+	strbuf_release(&sbb);
+	return ret;
 }
 
 static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
@@ -428,8 +434,9 @@ static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_
 		strbuf_reset(&sb);
 		get_ws_cleaned_string(line, &sb);
 		return memhash(sb.buf, sb.len);
-	} else
+	} else {
 		return memhash(line->line, line->len);
+	}
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -608,159 +615,185 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void mark_color_as_moved(struct diff_options *o, int line_no)
+static void add_lines_to_move_detection(struct diff_options *o)
 {
-	struct hashmap *hm = NULL;
-	struct moved_entry *key = prepare_entry(o, line_no);
-	struct moved_entry *match = NULL;
-	struct buffered_patch_line *l = &o->line_buffer[line_no];
-	int alt_flag;
-	int i, lp, rp;
+	struct moved_entry *prev_line;
 
-	switch (l->sign) {
-	case '+':
-		hm = o->deleted_lines;
-		break;
-	case '-':
-		hm = o->added_lines;
-		break;
-	default:
-		/* reset to standard, on-alt move color */
-		o->color_moved = 1;
-		break;
+	int n;
+	for (n = 0; n < o->line_buffer_nr; n++) {
+		int sign = 0;
+		struct hashmap *hm;
+		struct moved_entry *key;
+
+		switch (o->line_buffer[n].sign) {
+		case '+':
+			sign = '+';
+			hm = o->added_lines;
+			break;
+		case '-':
+			sign = '-';
+			hm = o->deleted_lines;
+			break;
+		case ' ':
+		default:
+			prev_line = NULL;
+			continue;
+		}
+
+		key = prepare_entry(o, n);
+		if (prev_line &&
+		    prev_line->line->sign == sign)
+			prev_line->next_line = key;
+
+		hashmap_add(hm, key);
+		prev_line = key;
 	}
+}
 
-	/* Check for any match to color it as a move. */
-	if (!hm)
-		return;
-	match = hashmap_get(hm, key, o);
-	free(key);
-	if (!match)
-		return;
+static void mark_color_as_moved(struct diff_options *o)
+{
+	struct moved_entry **pmb = NULL; /* potentially moved blocks */
+	int pmb_nr = 0, pmb_alloc = 0;
+	int alt_flag = 0;
+	int n;
 
-	/* Check any potential block runs, advance each or nullify */
-	for (i = 0; i < o->pmb_nr; i++) {
-		struct moved_entry *p = o->pmb[i];
-		if (p && p->next_line &&
-		    !buffered_patch_line_cmp(p->next_line->line, l, o)) {
-			o->pmb[i] = p->next_line;
-		} else {
-			o->pmb[i] = NULL;
+	for (n = 0; n < o->line_buffer_nr; n++) {
+		struct hashmap *hm = NULL;
+		struct moved_entry *key;
+		struct moved_entry *match = NULL;
+		struct buffered_patch_line *l = &o->line_buffer[n];
+		int i, lp, rp;
+
+		switch (l->sign) {
+		case '+':
+			hm = o->deleted_lines;
+			break;
+		case '-':
+			hm = o->added_lines;
+			break;
+		default:
+			alt_flag = 0; /* reset to standard, no-alt move color */
+			pmb_nr = 0; /* no running sets */
+			continue;
 		}
-	}
 
-	/* Shrink the set to the remaining runs */
-	for (lp = 0, rp = o->pmb_nr - 1; lp <= rp;) {
-		while (lp < o->pmb_nr && o->pmb[lp])
-			lp ++;
-		/* lp points at the first NULL now */
+		/* Check for any match to color it as a move. */
+		key = prepare_entry(o, n);
+		match = hashmap_get(hm, key, o);
+		free(key);
+		if (!match)
+			continue;
 
-		while (rp > -1 && !o->pmb[rp])
-			rp--;
-		/* rp points at the last non-NULL */
+		/* Check any potential block runs, advance each or nullify */
+		for (i = 0; i < pmb_nr; i++) {
+			struct moved_entry *p = pmb[i];
+			struct moved_entry *pnext = (p && p->next_line) ?
+					p->next_line : NULL;
+			if (pnext &&
+			    !buffered_patch_line_cmp(pnext->line, l, o)) {
+				pmb[i] = p->next_line;
+			} else {
+				pmb[i] = NULL;
+			}
+		}
 
-		if (lp < o->pmb_nr && rp > -1 && lp < rp) {
-			o->pmb[lp] = o->pmb[rp];
-			o->pmb[rp] = NULL;
-			rp--;
-			lp++;
+		/* Shrink the set to the remaining runs */
+		for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
+			while (lp < pmb_nr && pmb[lp])
+				lp ++;
+			/* lp points at the first NULL now */
+
+			while (rp > -1 && !pmb[rp])
+				rp--;
+			/* rp points at the last non-NULL */
+
+			if (lp < pmb_nr && rp > -1 && lp < rp) {
+				pmb[lp] = pmb[rp];
+				pmb[rp] = NULL;
+				rp--;
+				lp++;
+			}
 		}
-	}
 
-	if (rp > -1) {
-		/* Remember the number of running sets */
-		o->pmb_nr = rp + 1;
-	} else {
-		/* Toggle color */
-		o->color_moved = o->color_moved == 2 ? 1 : 2;
-
-		/* Build up a new set */
-		i = 0;
-		for (; match; match = hashmap_get_next(hm, match)) {
-			ALLOC_GROW(o->pmb, i + 1, o->pmb_alloc);
-			o->pmb[i] = match;
-			i++;
+		if (rp > -1) {
+			/* Remember the number of running sets */
+			pmb_nr = rp + 1;
+		} else {
+			/* Toggle color */
+			alt_flag = (alt_flag + 1) % 2;
+
+			/* Build up a new set */
+			pmb_nr = 0;
+			for (; match; match = hashmap_get_next(hm, match)) {
+				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+				pmb[pmb_nr++] = match;
+			}
 		}
-		o->pmb_nr = i;
-	}
 
-	alt_flag = o->color_moved - 1;
-	switch (l->sign) {
-	case '+':
-		l->set = diff_get_color_opt(o, DIFF_FILE_NEW_MOVED + alt_flag);
-		break;
-	case '-':
-		l->set = diff_get_color_opt(o, DIFF_FILE_OLD_MOVED + alt_flag);
-		break;
-	default:
-		/* reset to standard, on-alt move color */
-		o->color_moved = 1;
+		switch (l->sign) {
+		case '+':
+			l->set = diff_get_color_opt(o, DIFF_FILE_NEW_MOVED + alt_flag);
+			break;
+		case '-':
+			l->set = diff_get_color_opt(o, DIFF_FILE_OLD_MOVED + alt_flag);
+			break;
+		default:
+			die("BUG: we should have continued earlier?");
+		}
 	}
+	free(pmb);
 }
 
 static void emit_buffered_patch_line(struct diff_options *o,
-				     struct buffered_patch_line *e,
-				     int pass)
+				     struct buffered_patch_line *e)
 {
-	int has_trailing_newline, has_trailing_carriage_return, len = e->len;
+	const char *ws;
+	int has_trailing_newline, has_trailing_carriage_return;
+	int len = e->len;
 	FILE *file = o->file;
 
-	fputs(diff_line_prefix(o), file);
-
-	has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
-	if (has_trailing_newline)
-		len--;
-	has_trailing_carriage_return = (len > 0 && e->line[len-1] == '\r');
-	if (has_trailing_carriage_return)
-		len--;
+	if (e->add_line_prefix)
+		fputs(diff_line_prefix(o), file);
 
-	if (len || e->sign) {
+	switch (e->state) {
+	case BPL_EMIT_LINE_WS:
+		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
 		if (e->set)
 			fputs(e->set, file);
 		if (e->sign)
 			fputc(e->sign, file);
-		fwrite(e->line, len, 1, file);
 		if (e->reset)
 			fputs(e->reset, file);
-	}
-	if (has_trailing_carriage_return)
-		fputc('\r', file);
-	if (has_trailing_newline)
-		fputc('\n', file);
-}
-
-static void emit_buffered_patch_line_ws(struct diff_options *o,
-					struct buffered_patch_line *e,
-					const char *ws, unsigned ws_rule,
-					int pass)
-{
-	struct buffered_patch_line s = {e->set, e->reset, "", 0, e->sign};
-	emit_buffered_patch_line(o, &s, 0);
-	ws_check_emit(e->line, e->len, ws_rule,
-		      o->file, e->set, e->reset, ws);
-}
-
-static void process_next_buffered_patch_line(struct diff_options *o, int line_no)
-{
-	struct buffered_patch_line *e = &o->line_buffer[line_no];
-
-	const char *ws = o->current_filepair->ws;
-	unsigned ws_rule = o->current_filepair->ws_rule;
-
-	mark_color_as_moved(o, line_no);
+		ws_check_emit(e->line, e->len, o->ws_rule,
+			      file, e->set, e->reset, ws);
+		return;
+	case BPL_EMIT_LINE_ASIS:
+		has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
+		if (has_trailing_newline)
+			len--;
+		has_trailing_carriage_return = (len > 0 && e->line[len-1] == '\r');
+		if (has_trailing_carriage_return)
+			len--;
 
-	switch (e->state) {
-		case BPL_EMIT_LINE_ASIS:
-			emit_buffered_patch_line(o, e, 1);
-			break;
-		case BPL_EMIT_LINE_WS:
-			emit_buffered_patch_line_ws(o, e, ws, ws_rule, 1);
-			break;
-		case BPL_HANDOVER:
-			o->current_filepair++;
-			break;
-		default:
-			die("BUG: malformatted buffered patch line: '%d'", e->state);
+		if (len || e->sign) {
+			if (e->set)
+				fputs(e->set, file);
+			if (e->sign)
+				fputc(e->sign, file);
+			fwrite(e->line, len, 1, file);
+			if (e->reset)
+				fputs(e->reset, file);
+		}
+		if (has_trailing_carriage_return)
+			fputc('\r', file);
+		if (has_trailing_newline)
+			fputc('\n', file);
+		return;
+	case BPL_HANDOVER:
+		o->ws_rule = whitespace_rule(e->line); /*read from file, stored in line?*/
+		return;
+	default:
+		die("BUG: malformatted buffered patch line: '%d'", e->state);
 	}
 }
 
@@ -771,46 +804,30 @@ static void append_buffered_patch_line(struct diff_options *o,
 	ALLOC_GROW(o->line_buffer,
 		   o->line_buffer_nr + 1,
 		   o->line_buffer_alloc);
-	f = &o->line_buffer[o->line_buffer_nr];
-	o->line_buffer_nr++;
+	f = &o->line_buffer[o->line_buffer_nr++];
 
 	memcpy(f, e, sizeof(struct buffered_patch_line));
 	f->line = e->line ? xmemdupz(e->line, e->len) : NULL;
 }
 
-static void emit_line_0(struct diff_options *o,
-			const char *set, const char *reset,
-			int sign, const char *line, int len)
+void emit_line(struct diff_options *o,
+	       const char *set, const char *reset,
+	       int add_line_prefix, int markup_ws,
+	       int sign, const char *line, int len)
 {
-	struct buffered_patch_line e = {set, reset, line, len, sign, BPL_EMIT_LINE_ASIS};
+	struct buffered_patch_line e = {set, reset, line,
+		len, sign, add_line_prefix,
+		markup_ws ? BPL_EMIT_LINE_WS : BPL_EMIT_LINE_ASIS};
 
 	if (o->use_buffer)
 		append_buffered_patch_line(o, &e);
 	else
-		emit_buffered_patch_line(o, &e, 0);
-}
-
-void emit_line(struct diff_options *o, const char *set, const char *reset,
-	       const char *line, int len)
-{
-	emit_line_0(o, set, reset, 0, line, len);
-}
-
-static void emit_line_ws(struct diff_options *o,
-			 const char *set, const char *reset, int sign,
-			 const char *line, int len,
-			 const char *ws, unsigned ws_rule)
-{
-	struct buffered_patch_line e = {set, reset, line, len, sign, BPL_EMIT_LINE_WS};
-
-	if (o->use_buffer)
-		append_buffered_patch_line(o, &e);
-	else
-		emit_buffered_patch_line_ws(o, &e, ws, ws_rule, 0);
+		emit_buffered_patch_line(o, &e);
 }
 
 void emit_line_fmt(struct diff_options *o,
 		   const char *set, const char *reset,
+		   int add_line_prefix,
 		   const char *fmt, ...)
 {
 	struct strbuf sb = STRBUF_INIT;
@@ -819,7 +836,7 @@ void emit_line_fmt(struct diff_options *o,
 	strbuf_vaddf(&sb, fmt, ap);
 	va_end(ap);
 
-	emit_line(o, set, reset, sb.buf, sb.len);
+	emit_line(o, set, reset, add_line_prefix, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
@@ -851,44 +868,15 @@ static void emit_line_checked(const char *reset,
 	}
 
 	if (!ws)
-		emit_line_0(ecbdata->opt, set, reset, sign, line, len);
+		emit_line(ecbdata->opt, set, reset, 1, 0, sign, line, len);
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
-		emit_line_0(ecbdata->opt, ws, reset, sign, line, len);
-	else
+		emit_line(ecbdata->opt, ws, reset, 1, 1, sign, line, len);
+	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line_ws(ecbdata->opt, set, reset, sign, line, len,
-			     ws, ecbdata->ws_rule);
-}
-
-static void add_line_to_move_detection(struct diff_options *o, int line_idx)
-{
-	int sign = 0;
-	struct hashmap *hm;
-	struct moved_entry *key;
-
-	switch (o->line_buffer[line_idx].sign) {
-	case '+':
-		sign = '+';
-		hm = o->added_lines;
-		break;
-	case '-':
-		sign = '-';
-		hm = o->deleted_lines;
-		break;
-	case ' ':
-	default:
-		o->prev_line = NULL;
-		return;
+		emit_line(ecbdata->opt, set, reset, 1, 1, sign, line, len);
 	}
 
-	key = prepare_entry(o, line_idx);
-	if (o->prev_line &&
-	    o->prev_line->line->sign == sign)
-		o->prev_line->next_line = key;
-
-	hashmap_add(hm, key);
-	o->prev_line = key;
 }
 
 static void emit_add_line(const char *reset,
@@ -935,7 +923,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	if (len < 10 ||
 	    memcmp(line, atat, 2) ||
 	    !(ep = memmem(line + 2, len - 2, atat, 2))) {
-		emit_line(ecbdata->opt, context, reset, line, len);
+		emit_line(ecbdata->opt, context, reset, 1, 0, 0, line, len);
 		return;
 	}
 	ep += 2; /* skip over @@ */
@@ -971,7 +959,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	strbuf_add(&msgbuf, line + len, org_len - len);
 	strbuf_complete_line(&msgbuf);
 
-	emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
+	emit_line(ecbdata->opt, "", "", 1, 0, 0, msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
 
@@ -1011,15 +999,25 @@ static void add_line_count(struct strbuf *out, int count)
 static void emit_rewrite_lines(struct emit_callback *ecb,
 			       int prefix, const char *data, int size)
 {
-	const char *endp = NULL;
-	static const char *nneof = " No newline at end of file\n";
 	const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
+	struct strbuf sb = STRBUF_INIT;
 
 	while (0 < size) {
 		int len;
 
-		endp = memchr(data, '\n', size);
-		len = endp ? (endp - data + 1) : size;
+		const char *endp = memchr(data, '\n', size);
+		if (endp)
+			len = endp - data + 1;
+		else {
+			while (0 < size) {
+				strbuf_addch(&sb, *data);
+				size -= len;
+				data += len;
+			}
+			strbuf_addch(&sb, '\n');
+			data = sb.buf;
+			len = sb.len;
+		}
 		if (prefix != '+') {
 			ecb->lno_in_preimage++;
 			emit_del_line(reset, ecb, data, len);
@@ -1030,12 +1028,13 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		size -= len;
 		data += len;
 	}
-	if (!endp) {
+	if (sb.len) {
+		static const char *nneof = "\\ No newline at end of file\n";
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
-		emit_line(ecb->opt, NULL, NULL, "\n", 1);
-		emit_line_0(ecb->opt, context, reset, '\\',
+		emit_line(ecb->opt, context, reset, 1, 0, 0,
 			    nneof, strlen(nneof));
+		strbuf_release(&sb);
 	}
 }
 
@@ -1095,8 +1094,8 @@ static void emit_rewrite_diff(const char *name_a,
 	ecbdata.lno_in_preimage = 1;
 	ecbdata.lno_in_postimage = 1;
 
-	emit_line_fmt(o, metainfo, reset, "--- %s%s\n", a_name.buf, name_a_tab);
-	emit_line_fmt(o, metainfo, reset, "+++ %s%s\n", b_name.buf, name_b_tab);
+	emit_line_fmt(o, metainfo, reset, 1, "--- %s%s\n", a_name.buf, name_a_tab);
+	emit_line_fmt(o, metainfo, reset, 1, "+++ %s%s\n", b_name.buf, name_b_tab);
 
 	lc_a = count_lines(data_one, size_one);
 	lc_b = count_lines(data_two, size_two);
@@ -1109,7 +1108,7 @@ static void emit_rewrite_diff(const char *name_a,
 	strbuf_addstr(&out, " +");
 	add_line_count(&out, lc_b);
 	strbuf_addstr(&out, " @@\n");
-	emit_line(o, fraginfo, reset, out.buf, out.len);
+	emit_line(o, fraginfo, reset, 1, 0, 0, out.buf, out.len);
 	strbuf_release(&out);
 
 	if (lc_a && !o->irreversible_delete)
@@ -1174,34 +1173,38 @@ struct diff_words_data {
 static int fn_out_diff_words_write_helper(struct diff_options *o,
 					  struct diff_words_style_elem *st_el,
 					  const char *newline,
-					  size_t count, const char *buf,
-					  const char *line_prefix)
+					  size_t count, const char *buf)
 {
+	int print = 0;
 	struct strbuf sb = STRBUF_INIT;
 
 	while (count) {
 		char *p = memchr(buf, '\n', count);
+		if (print)
+			emit_line(o, NULL, NULL, 1, 0, 0, "", 0);
 
 		if (p != buf) {
-			if (st_el->color)
-				strbuf_addstr(&sb, st_el->color);
+			const char *reset = st_el->color && *st_el->color ?
+					    GIT_COLOR_RESET : NULL;
 			strbuf_addstr(&sb, st_el->prefix);
 			strbuf_add(&sb, buf, p ? p - buf : count);
 			strbuf_addstr(&sb, st_el->suffix);
-			if (st_el->color && *st_el->color)
-			    strbuf_addstr(&sb, GIT_COLOR_RESET);
+			emit_line(o, st_el->color, reset,
+				  0, 0, 0, sb.buf, sb.len);
+			strbuf_reset(&sb);
 		}
 		if (!p)
 			goto out;
+
 		strbuf_addstr(&sb, newline);
-		emit_line(o, NULL, NULL, sb.buf, sb.len);
+		emit_line(o, NULL, NULL, 0, 0, 0, sb.buf, sb.len);
 		strbuf_reset(&sb);
 		count -= p + 1 - buf;
 		buf = p + 1;
+		print = 1;
 	}
+
 out:
-	if (sb.len)
-		emit_line(o, NULL, NULL, sb.buf, sb.len);
 	strbuf_release(&sb);
 	return 0;
 }
@@ -1256,14 +1259,12 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	int minus_first, minus_len, plus_first, plus_len;
 	const char *minus_begin, *minus_end, *plus_begin, *plus_end;
 	struct diff_options *opt = diff_words->opt;
-	const char *line_prefix;
 
 	if (line[0] != '@' || parse_hunk_header(line, len,
 			&minus_first, &minus_len, &plus_first, &plus_len))
 		return;
 
 	assert(opt);
-	line_prefix = diff_line_prefix(opt);
 
 	/* POSIX requires that first be decremented by one if len == 0... */
 	if (minus_len) {
@@ -1280,28 +1281,21 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	} else
 		plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
 
-	if (color_words_output_graph_prefix(diff_words))
-		emit_line(diff_words->opt, NULL, NULL, "", 0);
-
 	if (diff_words->current_plus != plus_begin) {
 		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->ctx, style->newline,
 				plus_begin - diff_words->current_plus,
-				diff_words->current_plus, line_prefix);
-		if (*(plus_begin - 1) == '\n')
-			emit_line(diff_words->opt, NULL, NULL, "", 0);
+				diff_words->current_plus);
 	}
 	if (minus_begin != minus_end) {
 		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->old, style->newline,
-				minus_end - minus_begin, minus_begin,
-				line_prefix);
+				minus_end - minus_begin, minus_begin);
 	}
 	if (plus_begin != plus_end) {
 		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->new, style->newline,
-				plus_end - plus_begin, plus_begin,
-				line_prefix);
+				plus_end - plus_begin, plus_begin);
 	}
 
 	diff_words->current_plus = plus_end;
@@ -1388,17 +1382,14 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	struct diff_words_style *style = diff_words->style;
 
 	struct diff_options *opt = diff_words->opt;
-	const char *line_prefix;
-
 	assert(opt);
-	line_prefix = diff_line_prefix(opt);
 
 	/* special case: only removal */
 	if (!diff_words->plus.text.size) {
 		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->old, style->newline,
 			diff_words->minus.text.size,
-			diff_words->minus.text.ptr, line_prefix);
+			diff_words->minus.text.ptr);
 		diff_words->minus.text.size = 0;
 		return;
 	}
@@ -1421,12 +1412,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	if (diff_words->current_plus != diff_words->plus.text.ptr +
 			diff_words->plus.text.size) {
 		if (color_words_output_graph_prefix(diff_words))
-			emit_line(diff_words->opt, NULL, NULL, "", 0);
+			emit_line(diff_words->opt, NULL, NULL, 1, 0, 0, "", 0);
 		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->ctx, style->newline,
 			diff_words->plus.text.ptr + diff_words->plus.text.size
-			- diff_words->current_plus, diff_words->current_plus,
-			line_prefix);
+			- diff_words->current_plus, diff_words->current_plus);
 	}
 	diff_words->minus.text.size = diff_words->plus.text.size = 0;
 }
@@ -1444,8 +1434,10 @@ static void diff_words_flush(struct emit_callback *ecbdata)
 			append_buffered_patch_line(ecbdata->opt,
 				&ecbdata->diff_words->opt->line_buffer[i]);
 
+		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
+			free((void*) ecbdata->diff_words->opt->line_buffer[i].line);
+
 		ecbdata->diff_words->opt->line_buffer_nr = 0;
-		/* TODO: free memory as well */
 	}
 }
 
@@ -1521,6 +1513,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
 {
 	if (ecbdata->diff_words) {
 		diff_words_flush(ecbdata);
+		free (ecbdata->diff_words->opt->line_buffer);
 		free (ecbdata->diff_words->opt);
 		free (ecbdata->diff_words->minus.text.ptr);
 		free (ecbdata->diff_words->minus.orig);
@@ -1596,7 +1589,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	o->found_changes = 1;
 
 	if (ecbdata->header) {
-		emit_line(o, NULL, NULL,
+		emit_line(o, NULL, NULL, 0, 0, 0,
 			  ecbdata->header->buf, ecbdata->header->len);
 		strbuf_release(ecbdata->header);
 		ecbdata->header = NULL;
@@ -1606,9 +1599,9 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		const char *name_a_tab, *name_b_tab;
 		name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
 		name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
-		emit_line_fmt(o, meta, reset, "--- %s%s\n",
+		emit_line_fmt(o, meta, reset, 1, "--- %s%s\n",
 			      ecbdata->label_path[0], name_a_tab);
-		emit_line_fmt(o, meta, reset, "+++ %s%s\n",
+		emit_line_fmt(o, meta, reset, 1, "+++ %s%s\n",
 			      ecbdata->label_path[1], name_b_tab);
 		ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
 	}
@@ -1649,8 +1642,8 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		}
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
-			emit_line(o, context, reset, line, len);
-			emit_line(o, NULL, NULL, "~\n", 2);
+			emit_line(o, context, reset, 1, 0, 0, line, len);
+			emit_line(o, NULL, NULL, 0, 0, 0, "~\n", 2);
 		} else {
 			/*
 			 * Skip the prefix character, if any.  With
@@ -1661,7 +1654,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			      line++;
 			      len--;
 			}
-			emit_line(o, context, reset, line, len);
+			emit_line(o, context, reset, 1, 0, 0, line, len);
 		}
 		return;
 	}
@@ -1684,7 +1677,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		/* incomplete line at the end */
 		ecbdata->lno_in_preimage++;
 		emit_line(o, diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
-			  reset, line, len);
+			  reset, 1, 0, 0, line, len);
 		break;
 	}
 }
@@ -1873,7 +1866,7 @@ void print_stat_summary_0(struct diff_options *options, int files,
 	if (!files) {
 		assert(insertions == 0 && deletions == 0);
 		strbuf_addstr(&sb, " 0 files changed");
-		emit_line(options, NULL, NULL, sb.buf, sb.len);
+		emit_line(options, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 		return;
 	}
 
@@ -1901,7 +1894,7 @@ void print_stat_summary_0(struct diff_options *options, int files,
 			    deletions);
 	}
 	strbuf_addch(&sb, '\n');
-	emit_line(options, NULL, NULL, sb.buf, sb.len);
+	emit_line(options, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
@@ -2085,7 +2078,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			strbuf_addf(&out, " %*s", number_width, "Bin");
 			if (!added && !deleted) {
 				strbuf_addch(&out, '\n');
-				emit_line(options, NULL, NULL, out.buf, out.len);
+				emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 				strbuf_reset(&out);
 				continue;
 			}
@@ -2095,14 +2088,14 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			strbuf_addf(&out, "%s%"PRIuMAX"%s",
 				add_c, added, reset);
 			strbuf_addstr(&out, " bytes\n");
-			emit_line(options, NULL, NULL, out.buf, out.len);
+			emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 			strbuf_reset(&out);
 			continue;
 		}
 		else if (file->is_unmerged) {
 			show_name(&out, prefix, name, len);
 			strbuf_addstr(&out, " Unmerged\n");
-			emit_line(options, NULL, NULL, out.buf, out.len);
+			emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 			strbuf_reset(&out);
 			continue;
 		}
@@ -2133,7 +2126,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		show_graph(&out, '+', add, add_c, reset);
 		show_graph(&out, '-', del, del_c, reset);
 		strbuf_addch(&out, '\n');
-		emit_line(options, NULL, NULL, out.buf, out.len);
+		emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 		strbuf_reset(&out);
 	}
 
@@ -2155,7 +2148,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		if (i < count)
 			continue;
 		if (!extra_shown)
-			emit_line(options, NULL, NULL,
+			emit_line(options, NULL, NULL, 1, 0, 0,
 				  " ...\n", strlen(" ...\n"));
 		extra_shown = 1;
 	}
@@ -2509,7 +2502,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
 			line_prefix, data->filename, data->lineno, err);
 		free(err);
-		emit_line(data->o, set, reset, line, 1);
+		emit_line(data->o, set, reset, 1, 0, 0, line, 1);
 		ws_check_emit(line + 1, len - 1, data->ws_rule,
 			      data->o->file, set, reset, ws);
 	} else if (line[0] == ' ') {
@@ -2576,12 +2569,12 @@ static void emit_binary_diff_body(struct diff_options *o,
 	}
 
 	if (delta && delta_size < deflate_size) {
-		emit_line_fmt(o, NULL, NULL, "delta %lu\n", orig_size);
+		emit_line_fmt(o, NULL, NULL, 1, "delta %lu\n", orig_size);
 		free(deflated);
 		data = delta;
 		data_size = delta_size;
 	} else {
-		emit_line_fmt(o, NULL, NULL, "literal %lu\n", two->size);
+		emit_line_fmt(o, NULL, NULL, 1, "literal %lu\n", two->size);
 		free(delta);
 		data = deflated;
 		data_size = deflate_size;
@@ -2605,9 +2598,9 @@ static void emit_binary_diff_body(struct diff_options *o,
 		line[len++] = '\n';
 		line[len] = '\0';
 
-		emit_line(o, NULL, NULL, line, len);
+		emit_line(o, NULL, NULL, 1, 0, 0, line, len);
 	}
-	emit_line(o, NULL, NULL, "\n", 1);
+	emit_line(o, NULL, NULL, 1, 0, 0, "\n", 1);
 	free(data);
 }
 
@@ -2616,7 +2609,7 @@ static void emit_binary_diff(struct diff_options *o,
 {
 	const char *s = "GIT binary patch\n";
 	const int len = strlen(s);
-	emit_line(o, NULL, NULL, s, len);
+	emit_line(o, NULL, NULL, 1, 0, 0, s, len);
 	emit_binary_diff_body(o, one, two);
 	emit_binary_diff_body(o, two, one);
 }
@@ -2727,7 +2720,7 @@ static void builtin_diff(const char *name_a,
 	b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
 	lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
 	lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
-	strbuf_addf(&header, "%sdiff --git %s %s%s\n", meta, a_one, b_two, reset);
+	strbuf_addf(&header, "%s%sdiff --git %s %s%s\n", line_prefix, meta, a_one, b_two, reset);
 	if (lbl[0][0] == '/') {
 		/* /dev/null */
 		strbuf_addf(&header, "%s%snew file mode %06o%s\n", line_prefix, meta, two->mode, reset);
@@ -2759,7 +2752,7 @@ static void builtin_diff(const char *name_a,
 		if (complete_rewrite &&
 		    (textconv_one || !diff_filespec_is_binary(one)) &&
 		    (textconv_two || !diff_filespec_is_binary(two))) {
-			emit_line(o, NULL, NULL, header.buf, header.len);
+			emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 			emit_rewrite_diff(name_a, name_b, one, two,
 						textconv_one, textconv_two, o);
@@ -2769,8 +2762,7 @@ static void builtin_diff(const char *name_a,
 	}
 
 	if (o->irreversible_delete && lbl[1][0] == '/') {
-		if (header.len)
-			emit_line(o, NULL, NULL, header.buf, header.len);
+		emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 		strbuf_reset(&header);
 		goto free_ab_and_return;
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2780,15 +2772,15 @@ static void builtin_diff(const char *name_a,
 		    S_ISREG(one->mode) && S_ISREG(two->mode) &&
 		    !DIFF_OPT_TST(o, BINARY)) {
 			if (!oidcmp(&one->oid, &two->oid)) {
-				if (must_show_header && header.len)
-					emit_line(o, NULL, NULL,
+				if (must_show_header)
+					emit_line(o, NULL, NULL, 0, 0, 0,
 						  header.buf, header.len);
 				goto free_ab_and_return;
 			}
-			if (header.len)
-				emit_line(o, NULL, NULL,
-					  header.buf, header.len);
-			emit_line_fmt(o, 0, 0, "Binary files %s and %s differ\n",
+			emit_line(o, NULL, NULL, 0, 0, 0,
+				  header.buf, header.len);
+			emit_line_fmt(o, NULL, NULL, 1,
+				      "Binary files %s and %s differ\n",
 				      lbl[0], lbl[1]);
 			goto free_ab_and_return;
 		}
@@ -2797,19 +2789,18 @@ static void builtin_diff(const char *name_a,
 		/* Quite common confusing case */
 		if (mf1.size == mf2.size &&
 		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
-			if (must_show_header && header.len)
-				emit_line(o, NULL, NULL,
+			if (must_show_header)
+				emit_line(o, NULL, NULL, 0, 0, 0,
 					  header.buf, header.len);
 			goto free_ab_and_return;
 		}
-		if (header.len)
-			emit_line(o, NULL, NULL,
-				  header.buf, header.len);
+		emit_line(o, NULL, NULL, 0, 0, 0,
+			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
 			emit_binary_diff(o, &mf1, &mf2);
 		else
-			emit_line_fmt(o, NULL, NULL,
+			emit_line_fmt(o, NULL, NULL, 1,
 				      "Binary files %s and %s differ\n",
 				      lbl[0], lbl[1]);
 		o->found_changes = 1;
@@ -2822,8 +2813,8 @@ static void builtin_diff(const char *name_a,
 		struct emit_callback ecbdata;
 		const struct userdiff_funcname *pe;
 
-		if (must_show_header && header.len) {
-			emit_line(o, NULL, NULL, header.buf, header.len);
+		if (must_show_header) {
+			emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 		}
 
@@ -2840,6 +2831,7 @@ static void builtin_diff(const char *name_a,
 		ecbdata.label_path = lbl;
 		ecbdata.color_diff = want_color(o->use_color);
 		ecbdata.ws_rule = whitespace_rule(name_b);
+		o->ws_rule = ecbdata.ws_rule;
 		if (ecbdata.ws_rule & WS_BLANK_AT_EOF)
 			check_blank_at_eof(&mf1, &mf2, &ecbdata);
 		ecbdata.opt = o;
@@ -2861,24 +2853,15 @@ static void builtin_diff(const char *name_a,
 		if (o->word_diff)
 			init_diff_words_data(&ecbdata, o, one, two);
 		if (o->use_buffer) {
-			ALLOC_GROW(o->filepair_buffer,
-				   o->filepair_buffer_nr + 1,
-				   o->filepair_buffer_alloc);
-			o->current_filepair =
-				&o->filepair_buffer[o->filepair_buffer_nr++];
-
-			o->current_filepair->ws_rule = ecbdata.ws_rule;
-			o->current_filepair->ws =
-				diff_get_color(ecbdata.color_diff, DIFF_WHITESPACE);
+			struct buffered_patch_line e = BUFFERED_PATCH_LINE_INIT;
+			e.state = BPL_HANDOVER;
+			e.line = name_b;
+			e.len = strlen(name_b);
+			append_buffered_patch_line(o, &e);
 		}
 		if (xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
 				  &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
-		if (o->use_buffer) {
-			struct buffered_patch_line e = BUFFERED_PATCH_LINE_INIT;
-			e.state = BPL_HANDOVER; /* handover to next file pair */
-			append_buffered_patch_line(o, &e);
-		}
 		if (o->word_diff)
 			free_diff_words_data(&ecbdata);
 		if (textconv_one)
@@ -3755,6 +3738,12 @@ void diff_setup(struct diff_options *options)
 		options->a_prefix = "a/";
 		options->b_prefix = "b/";
 	}
+
+	options->line_buffer = NULL;
+	options->line_buffer_nr = 0;
+	options->line_buffer_alloc = 0;
+
+	options->color_moved = diff_color_moved_default;
 }
 
 void diff_setup_done(struct diff_options *options)
@@ -4837,11 +4826,10 @@ static void show_file_mode_name(struct diff_options *opt, const char *newdelete,
 
 	quote_c_style(fs->path, &sb, NULL, 0);
 	strbuf_addch(&sb, '\n');
-	emit_line(opt, NULL, NULL, sb.buf, sb.len);
+	emit_line(opt, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
-
 static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
 		int show_name)
 {
@@ -4851,7 +4839,7 @@ static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
 			strbuf_addch(&sb, ' ');
 			quote_c_style(p->two->path, &sb, NULL, 0);
 		}
-		emit_line_fmt(opt, NULL, NULL,
+		emit_line_fmt(opt, NULL, NULL, 1,
 			      " mode change %06o => %06o%s\n",
 			      p->one->mode, p->two->mode,
 			      show_name ? sb.buf : "");
@@ -4863,7 +4851,8 @@ static void show_rename_copy(struct diff_options *opt, const char *renamecopy,
 		struct diff_filepair *p)
 {
 	char *names = pprint_rename(p->one->path, p->two->path);
-	emit_line_fmt(opt, NULL, NULL, " %s %s (%d%%)\n", renamecopy, names, similarity_index(p));
+	emit_line_fmt(opt, NULL, NULL, 1, " %s %s (%d%%)\n",
+		      renamecopy, names, similarity_index(p));
 	free(names);
 	show_mode_change(opt, p, 0);
 }
@@ -4889,7 +4878,7 @@ static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 			strbuf_addstr(&sb, " rewrite ");
 			quote_c_style(p->two->path, &sb, NULL, 0);
 			strbuf_addf(&sb, " (%d%%)\n", similarity_index(p));
-			emit_line(opt, NULL, NULL, sb.buf, sb.len);
+			emit_line(opt, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 		}
 		show_mode_change(opt, p, !p->score);
 		break;
@@ -5114,12 +5103,6 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			(hashmap_cmp_fn)moved_entry_cmp, 0);
 	}
 
-	if (o->use_buffer) {
-		ALLOC_GROW(o->filepair_buffer,
-			   o->filepair_buffer_nr + 1,
-			   o->filepair_buffer_alloc);
-		o->current_filepair = &o->filepair_buffer[o->filepair_buffer_nr];
-	}
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		if (check_pair_status(p))
@@ -5127,23 +5110,22 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 	}
 
 	if (o->use_buffer) {
-		o->current_filepair = &o->filepair_buffer[0];
-		for (i = 0; i < o->line_buffer_nr; i++)
-			add_line_to_move_detection(o, i);
+		if (o->color_moved) {
+			add_lines_to_move_detection(o);
+			mark_color_as_moved(o);
+		}
 
-		o->current_filepair = &o->filepair_buffer[0];
 		for (i = 0; i < o->line_buffer_nr; i++)
-			process_next_buffered_patch_line(o, i);
+			emit_buffered_patch_line(o, &o->line_buffer[i]);
 
-		for (i = 0; i < o->line_buffer_nr; i++);
+		for (i = 0; i < o->line_buffer_nr; i++)
 			free((void*)o->line_buffer[i].line);
 
+		free(o->line_buffer);
+
 		o->line_buffer = NULL;
 		o->line_buffer_nr = 0;
-		free(o->line_buffer);
-		o->filepair_buffer = NULL;
-		o->filepair_buffer_nr = 0;
-		free(o->filepair_buffer);
+		o->line_buffer_alloc = 0;
 	}
 }
 
@@ -5237,11 +5219,10 @@ void diff_flush(struct diff_options *options)
 			term[0] = options->line_termination;
 			term[1] = '\0';
 
-			emit_line(options, NULL, NULL,
-				  term, 1);
+			emit_line(options, NULL, NULL, 1, 0, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
-				emit_line(options, NULL, NULL,
+				emit_line(options, NULL, NULL, 0, 0, 0,
 					  options->stat_sep,
 					  strlen(options->stat_sep));
 			}
diff --git a/diff.h b/diff.h
index b83d6fefcc..b8b2a33ccc 100644
--- a/diff.h
+++ b/diff.h
@@ -133,21 +133,24 @@ struct buffered_patch_line {
 	const char *line;
 	int len;
 	int sign;
+	int add_line_prefix;
 	enum {
+		/*
+		 * Emits [lineprefix][set][sign][reset] and then calls
+		 * ws_check_emit which will output "line", marked up
+		 * according to ws_rule.
+		 */
 		BPL_EMIT_LINE_WS,
+
+		/* Emits [lineprefix][set][sign] line [reset] */
 		BPL_EMIT_LINE_ASIS,
+
+		/* Reloads the ws_rule; line contains the file name */
 		BPL_HANDOVER
 	} state;
 };
 #define BUFFERED_PATCH_LINE_INIT {NULL, NULL, NULL, 0, 0, 0}
 
-struct buffered_filepair {
-	const char *ws;
-	unsigned ws_rule;
-};
-
-struct moved_entry;
-
 struct diff_options {
 	const char *orderfile;
 	const char *pickaxe;
@@ -220,32 +223,21 @@ struct diff_options {
 
 	int diff_path_counter;
 
-	/* Determines color moved code. Flipped between 1, 2 for alt. color. */
-	int color_moved;
+	unsigned ws_rule;
 	int use_buffer;
 
 	struct buffered_patch_line *line_buffer;
 	int line_buffer_nr, line_buffer_alloc;
 
-	struct buffered_filepair *filepair_buffer;
-	int filepair_buffer_nr, filepair_buffer_alloc;
-	struct buffered_filepair *current_filepair;
-
-	/* built up in the first pass: */
+	int color_moved;
 	struct hashmap *deleted_lines;
 	struct hashmap *added_lines;
-	/* needed for building up */
-	struct moved_entry *prev_line;
-
-	/* state in the second pass */
-	struct moved_entry **pmb; /* potentially moved blocks */
-	int pmb_nr, pmb_alloc;
 };
 
 void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
-		   const char *fmt, ...);
+		   int add_line_prefix, const char *fmt, ...);
 void emit_line(struct diff_options *o, const char *set, const char *reset,
-	       const char *line, int len);
+	       int add_line_prefix, int markup_ws, int sign, const char *line, int len);
 
 enum color_diff {
 	DIFF_RESET = 0,
diff --git a/submodule.c b/submodule.c
index cfad469a2f..868f913971 100644
--- a/submodule.c
+++ b/submodule.c
@@ -378,9 +378,9 @@ static void print_submodule_summary(struct rev_info *rev,
 		format_commit_message(commit, format, &sb, &ctx);
 		strbuf_addch(&sb, '\n');
 		if (commit->object.flags & SYMMETRIC_LEFT)
-			emit_line(o, del, reset, sb.buf, sb.len);
+			emit_line(o, del, reset, 1, 0, 0, sb.buf, sb.len);
 		else if (add)
-			emit_line(o, add, reset, sb.buf, sb.len);
+			emit_line(o, add, reset, 1, 0, 0, sb.buf, sb.len);
 	}
 	strbuf_release(&sb);
 }
@@ -419,10 +419,10 @@ static void show_submodule_header(struct diff_options *o, const char *path,
 	int fast_forward = 0, fast_backward = 0;
 
 	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
-		emit_line_fmt(o, NULL, NULL,
+		emit_line_fmt(o, NULL, NULL, 1,
 			      "Submodule %s contains untracked content\n", path);
 	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
-		emit_line_fmt(o, NULL, NULL,
+		emit_line_fmt(o, NULL, NULL, 1,
 			      "Submodule %s contains modified content\n", path);
 
 	if (is_null_oid(one))
@@ -473,7 +473,7 @@ static void show_submodule_header(struct diff_options *o, const char *path,
 		strbuf_addf(&sb, " %s\n", message);
 	else
 		strbuf_addf(&sb, "%s:\n", fast_backward ? " (rewind)" : "");
-	emit_line(o, meta, reset, sb.buf, sb.len);
+	emit_line(o, meta, reset, 1, 0, 0, sb.buf, sb.len);
 
 	strbuf_release(&sb);
 }
@@ -501,7 +501,7 @@ void show_submodule_summary(struct diff_options *o, const char *path,
 	/* Treat revision walker failure the same as missing commits */
 	if (prepare_submodule_summary(&rev, path, left, right, merge_bases)) {
 		const char *error = "(revision walker failed)\n";
-		emit_line(o, NULL, NULL, error, strlen(error));
+		emit_line(o, NULL, NULL, 1, 0, 0, error, strlen(error));
 		goto out;
 	}
 
@@ -570,15 +570,15 @@ void show_submodule_inline_diff(struct diff_options *o, const char *path,
 	prepare_submodule_repo_env(&cp.env_array);
 	if (start_command(&cp)) {
 		const char *error = "(diff failed)\n";
-		emit_line(o, NULL, NULL, error, strlen(error));
+		emit_line(o, NULL, NULL, 1, 0, 0, error, strlen(error));
 	}
 
 	while (strbuf_getwholeline_fd(&sb, cp.out, '\n') != EOF)
-		emit_line(o, NULL, NULL, sb.buf, sb.len);
+		emit_line(o, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 
 	if (finish_command(&cp)) {
 		const char *error = "(diff failed)\n";
-		emit_line(o, NULL, NULL, error, strlen(error));
+		emit_line(o, NULL, NULL, 1, 0, 0, error, strlen(error));
 	}
 
 done:


-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv2 01/20] diff: readability fix
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 02/20] diff: move line ending check into emit_hunk_header Stefan Beller
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

We already have dereferenced 'p->two' into a local variable 'two'. Use
that.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 74283d9001..3f5bf8b5a4 100644
--- a/diff.c
+++ b/diff.c
@@ -3283,8 +3283,8 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
 	const char *other;
 	const char *attr_path;
 
-	name  = p->one->path;
-	other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+	name  = one->path;
+	other = (strcmp(name, two->path) ? two->path : NULL);
 	attr_path = name;
 	if (o->prefix_length)
 		strip_prefix(o->prefix_length, &name, &other);
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 02/20] diff: move line ending check into emit_hunk_header
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 01/20] diff: readability fix Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 03/20] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

The emit_hunk_header() function is responsible for assembling a
hunk header and calling emit_line() to send the hunk header
to the output file.  Its only caller, fn_out_consume(), has to
prepare for the case where the function emits an incomplete
line, and add the terminating LF itself.

Instead, make sure that emit_hunk_header() always sends a
completed line to emit_line().

Signed-off-by: Stefan Beller <sbeller@google.com>
---
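Note (not part of the commit message): a minimal standalone sketch of
what "completing" a line means here.  complete_line() is a hypothetical
helper operating on a plain char buffer; it is modeled on what
strbuf_complete_line() is understood to do (append an LF only if the
buffer is non-empty and does not already end in one) and is not the
strbuf implementation itself.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the effect of strbuf_complete_line():
 * append '\n' to buf (holding len bytes, with capacity cap) unless
 * the buffer is empty or already ends in '\n'.
 * Returns the new length; returns len unchanged when there is no
 * room for the extra byte plus the NUL terminator.
 */
static size_t complete_line(char *buf, size_t len, size_t cap)
{
	if (len == 0 || buf[len - 1] == '\n')
		return len;	/* nothing to complete */
	if (len + 2 > cap)
		return len;	/* no space for '\n' and '\0' */
	buf[len++] = '\n';
	buf[len] = '\0';
	return len;
}
```

With the line completed before it is handed to emit_line(), the caller
no longer needs its own check of line[len-1].
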
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3f5bf8b5a4..c2ed605cd0 100644
--- a/diff.c
+++ b/diff.c
@@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	}
 
 	strbuf_add(&msgbuf, line + len, org_len - len);
+	strbuf_complete_line(&msgbuf);
+
 	emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
@@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		len = sane_truncate_line(ecbdata, line, len);
 		find_lno(line, ecbdata);
 		emit_hunk_header(ecbdata, line, len);
-		if (line[len-1] != '\n')
-			putc('\n', o->file);
 		return;
 	}
 
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 03/20] diff.c: factor out diff_flush_patch_all_file_pairs
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 01/20] diff: readability fix Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 02/20] diff: move line ending check into emit_hunk_header Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 04/20] diff.c: teach emit_line_0 to accept sign parameter Stefan Beller
                     ` (17 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch we want to do more things before and after all file pairs
are flushed.  So factor out the flushing of all file pairs into its own
function, so that the new code can be plugged in easily.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index c2ed605cd0..2f9722b382 100644
--- a/diff.c
+++ b/diff.c
@@ -4737,6 +4737,17 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc)
 		warning(_(rename_limit_advice), varname, needed);
 }
 
+static void diff_flush_patch_all_file_pairs(struct diff_options *o)
+{
+	int i;
+	struct diff_queue_struct *q = &diff_queued_diff;
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		if (check_pair_status(p))
+			diff_flush_patch(p, o);
+	}
+}
+
 void diff_flush(struct diff_options *options)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
@@ -4831,11 +4842,7 @@ void diff_flush(struct diff_options *options)
 			}
 		}
 
-		for (i = 0; i < q->nr; i++) {
-			struct diff_filepair *p = q->queue[i];
-			if (check_pair_status(p))
-				diff_flush_patch(p, options);
-		}
+		diff_flush_patch_all_file_pairs(options);
 	}
 
 	if (output_format & DIFF_FORMAT_CALLBACK)
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 04/20] diff.c: teach emit_line_0 to accept sign parameter
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (2 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 03/20] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 05/20] diff.c: emit_line_0 can handle no color setting Stefan Beller
                     ` (16 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

Teach emit_line_0 to take an optional "sign" parameter specifically intended
to hold the sign of the line, instead of a separate "first" parameter
representing the first character of the line to be printed.  Callers
that store the sign and line separately can use the "sign" parameter
like they used the "first" parameter previously, and callers that store
the sign and line together (or do not have a sign) no longer need to
manipulate their arguments to fit the requirements of emit_line_0.

With this patch, the other callers hard-code the sign (one of '+', '-',
' ' and '\\'), so that we do not unexpectedly emit an erroneous '\0'.

An audit of the callers revealed that the sign cannot be '\n' or '\r',
so remove the special handling of a trailing newline or carriage return
in the sign; the else branch of that condition already handles the
len == 0 case correctly, so we can drop the if/else construct.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
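Note (not part of the commit message): a simplified, standalone sketch
of the logic emit_line_0 is left with after this patch.  The name
emit_sketch() is hypothetical, it writes to a plain FILE * instead of
going through struct diff_options, and the line prefix is left out;
treat it as an illustration under those assumptions rather than the
code in diff.c.

```c
#include <stdio.h>

/*
 * Hypothetical, simplified version of the post-patch emit_line_0().
 * A sign of 0 means "this line has no sign".  In this sketch set and
 * reset must be non-NULL.
 */
static void emit_sketch(FILE *file, const char *set, const char *reset,
			int sign, const char *line, int len)
{
	int has_nl, has_cr;

	/*
	 * Peel off a trailing LF and then a trailing CR, so that the
	 * reset sequence is written before the line ending.
	 */
	has_nl = (len > 0 && line[len - 1] == '\n');
	if (has_nl)
		len--;
	has_cr = (len > 0 && line[len - 1] == '\r');
	if (has_cr)
		len--;

	/* Emit color codes only if there is something to color. */
	if (len || sign) {
		fputs(set, file);
		if (sign)
			fputc(sign, file);
		fwrite(line, len, 1, file);
		fputs(reset, file);
	}
	if (has_cr)
		fputc('\r', file);
	if (has_nl)
		fputc('\n', file);
}
```

A caller that keeps sign and content apart would call
emit_sketch(f, set, reset, '+', text, len); a caller whose buffer
already holds the whole line passes 0 as the sign.
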
 diff.c | 39 ++++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/diff.c b/diff.c
index 2f9722b382..73e55b0c10 100644
--- a/diff.c
+++ b/diff.c
@@ -517,33 +517,24 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 }
 
 static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
-			int first, const char *line, int len)
+			int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
-	int nofirst;
 	FILE *file = o->file;
 
 	fputs(diff_line_prefix(o), file);
 
-	if (len == 0) {
-		has_trailing_newline = (first == '\n');
-		has_trailing_carriage_return = (!has_trailing_newline &&
-						(first == '\r'));
-		nofirst = has_trailing_newline || has_trailing_carriage_return;
-	} else {
-		has_trailing_newline = (len > 0 && line[len-1] == '\n');
-		if (has_trailing_newline)
-			len--;
-		has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
-		if (has_trailing_carriage_return)
-			len--;
-		nofirst = 0;
-	}
+	has_trailing_newline = (len > 0 && line[len-1] == '\n');
+	if (has_trailing_newline)
+		len--;
+	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
+	if (has_trailing_carriage_return)
+		len--;
 
-	if (len || !nofirst) {
+	if (len || sign) {
 		fputs(set, file);
-		if (!nofirst)
-			fputc(first, file);
+		if (sign)
+			fputc(sign, file);
 		fwrite(line, len, 1, file);
 		fputs(reset, file);
 	}
@@ -556,7 +547,7 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 static void emit_line(struct diff_options *o, const char *set, const char *reset,
 		      const char *line, int len)
 {
-	emit_line_0(o, set, reset, line[0], line+1, len-1);
+	emit_line_0(o, set, reset, 0, line, len);
 }
 
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
@@ -4833,9 +4824,11 @@ void diff_flush(struct diff_options *options)
 
 	if (output_format & DIFF_FORMAT_PATCH) {
 		if (separator) {
-			fprintf(options->file, "%s%c",
-				diff_line_prefix(options),
-				options->line_termination);
+			char term[2];
+			term[0] = options->line_termination;
+			term[1] = '\0';
+
+			emit_line(options, NULL, NULL, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
 				fputs(options->stat_sep, options->file);
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 05/20] diff.c: emit_line_0 can handle no color setting
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (3 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 04/20] diff.c: teach emit_line_0 to accept sign parameter Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix Stefan Beller
                     ` (15 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In later patches we may pass lines that are not colored to the central
function emit_line_0, so we need to emit the color only when it is
non-NULL.

We could have chosen to pass "" instead of NULL, but that would be more
work.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
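
Note for readers, not part of the commit message: a minimal sketch of
the NULL handling described above. sketch_emit() is a hypothetical
stand-in, not the emit_line_0() from diff.c; it assumes a plain FILE *
and leaves out the line prefix and the trailing newline/CR handling.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the core of emit_line_0(): write an
 * optional color-start sequence, an optional sign character, the
 * payload, and an optional color-reset sequence.  Passing NULL for
 * `set` or `reset` means "emit nothing for that part".
 */
void sketch_emit(FILE *file, const char *set, const char *reset,
		 int sign, const char *line, size_t len)
{
	if (!len && !sign)
		return;		/* nothing to print, so no color either */
	if (set)
		fputs(set, file);
	if (sign)
		fputc(sign, file);
	fwrite(line, 1, len, file);
	if (reset)
		fputs(reset, file);
}
```

With this shape a caller that has no color at hand can pass NULL
instead of looking up an empty string first.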
 diff.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 73e55b0c10..6c1886d495 100644
--- a/diff.c
+++ b/diff.c
@@ -532,11 +532,13 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 		len--;
 
 	if (len || sign) {
-		fputs(set, file);
+		if (set)
+			fputs(set, file);
 		if (sign)
 			fputc(sign, file);
 		fwrite(line, len, 1, file);
-		fputs(reset, file);
+		if (reset)
+			fputs(reset, file);
 	}
 	if (has_trailing_carriage_return)
 		fputc('\r', file);
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (4 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 05/20] diff.c: emit_line_0 can handle no color setting Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 07/20] diff.c: inline emit_line_0 into emit_line Stefan Beller
                     ` (14 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

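In later patches we'll make extensive use of emit_line_0, as we want
to funnel all output through this function so that we can add buffering
there. Some of that output must not start with the line prefix, so
teach emit_line_0 a parameter that controls whether the prefix is
printed.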

Signed-off-by: Stefan Beller <sbeller@google.com>
---
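
Note for readers, not part of the commit message: a minimal sketch of
what an add_line_prefix flag buys the caller. struct sketch_opts and
sketch_emit_prefixed() are hypothetical names for illustration only;
the real function takes a struct diff_options and also handles colors
and line endings.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct diff_options. */
struct sketch_opts {
	FILE *file;
	const char *line_prefix;	/* e.g. the --line-prefix string */
};

/*
 * Emit `len` bytes of `line`.  The prefix is printed only when the
 * caller asks for it, so text that continues an already started line
 * (or that carries its own prefix) can go through the same function.
 */
void sketch_emit_prefixed(const struct sketch_opts *o, int add_line_prefix,
			  const char *line, size_t len)
{
	if (add_line_prefix && o->line_prefix)
		fputs(o->line_prefix, o->file);
	fwrite(line, 1, len, o->file);
}
```

A caller passes 1 for ordinary diff lines and 0 for output that
already contains its prefix, such as a pre-built header buffer.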
 diff.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 6c1886d495..25735f03d2 100644
--- a/diff.c
+++ b/diff.c
@@ -517,12 +517,13 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 }
 
 static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
-			int sign, const char *line, int len)
+			int add_line_prefix, int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
 	FILE *file = o->file;
 
-	fputs(diff_line_prefix(o), file);
+	if (add_line_prefix)
+		fputs(diff_line_prefix(o), file);
 
 	has_trailing_newline = (len > 0 && line[len-1] == '\n');
 	if (has_trailing_newline)
@@ -549,7 +550,7 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 static void emit_line(struct diff_options *o, const char *set, const char *reset,
 		      const char *line, int len)
 {
-	emit_line_0(o, set, reset, 0, line, len);
+	emit_line_0(o, set, reset, 1, 0, line, len);
 }
 
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
@@ -580,13 +581,13 @@ static void emit_line_checked(const char *reset,
 	}
 
 	if (!ws)
-		emit_line_0(ecbdata->opt, set, reset, sign, line, len);
+		emit_line_0(ecbdata->opt, set, reset, 1, sign, line, len);
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
-		emit_line_0(ecbdata->opt, ws, reset, sign, line, len);
+		emit_line_0(ecbdata->opt, ws, reset, 1, sign, line, len);
 	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line_0(ecbdata->opt, set, reset, sign, "", 0);
+		emit_line_0(ecbdata->opt, set, reset, 1, sign, "", 0);
 		ws_check_emit(line, len, ecbdata->ws_rule,
 			      ecbdata->opt->file, set, reset, ws);
 	}
@@ -735,7 +736,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
 		putc('\n', ecb->opt->file);
-		emit_line_0(ecb->opt, context, reset, '\\',
+		emit_line_0(ecb->opt, context, reset, 1, '\\',
 			    nneof, strlen(nneof));
 	}
 }
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 07/20] diff.c: inline emit_line_0 into emit_line
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (5 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 08/20] diff.c: convert fn_out_consume to use emit_line Stefan Beller
                     ` (13 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

emit_line only forwards to emit_line_0 with two more arguments
hard-coded. Eliminate this intermediate function, and rename
emit_line_0 to emit_line by dropping the '_0'.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/diff.c b/diff.c
index 25735f03d2..3569857818 100644
--- a/diff.c
+++ b/diff.c
@@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
-			int add_line_prefix, int sign, const char *line, int len)
+static void emit_line(struct diff_options *o, const char *set, const char *reset,
+		      int add_line_prefix, int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
 	FILE *file = o->file;
@@ -547,12 +547,6 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 		fputc('\n', file);
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char *reset,
-		      const char *line, int len)
-{
-	emit_line_0(o, set, reset, 1, 0, line, len);
-}
-
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -581,13 +575,13 @@ static void emit_line_checked(const char *reset,
 	}
 
 	if (!ws)
-		emit_line_0(ecbdata->opt, set, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, set, reset, 1, sign, line, len);
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
-		emit_line_0(ecbdata->opt, ws, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, ws, reset, 1, sign, line, len);
 	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line_0(ecbdata->opt, set, reset, 1, sign, "", 0);
+		emit_line(ecbdata->opt, set, reset, 1, sign, "", 0);
 		ws_check_emit(line, len, ecbdata->ws_rule,
 			      ecbdata->opt->file, set, reset, ws);
 	}
@@ -637,7 +631,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	if (len < 10 ||
 	    memcmp(line, atat, 2) ||
 	    !(ep = memmem(line + 2, len - 2, atat, 2))) {
-		emit_line(ecbdata->opt, context, reset, line, len);
+		emit_line(ecbdata->opt, context, reset, 1, 0, line, len);
 		return;
 	}
 	ep += 2; /* skip over @@ */
@@ -673,7 +667,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	strbuf_add(&msgbuf, line + len, org_len - len);
 	strbuf_complete_line(&msgbuf);
 
-	emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
+	emit_line(ecbdata->opt, "", "", 1, 0, msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
 
@@ -736,8 +730,8 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
 		putc('\n', ecb->opt->file);
-		emit_line_0(ecb->opt, context, reset, 1, '\\',
-			    nneof, strlen(nneof));
+		emit_line(ecb->opt, context, reset, 1, '\\',
+			  nneof, strlen(nneof));
 	}
 }
 
@@ -1335,7 +1329,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		}
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
-			emit_line(o, context, reset, line, len);
+			emit_line(o, context, reset, 1, 0, line, len);
 			fputs("~\n", o->file);
 		} else {
 			/*
@@ -1347,7 +1341,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			      line++;
 			      len--;
 			}
-			emit_line(o, context, reset, line, len);
+			emit_line(o, context, reset, 1, 0, line, len);
 		}
 		return;
 	}
@@ -1370,7 +1364,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		/* incomplete line at the end */
 		ecbdata->lno_in_preimage++;
 		emit_line(o, diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
-			  reset, line, len);
+			  reset, 1, 0, line, len);
 		break;
 	}
 }
@@ -2182,7 +2176,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
 			line_prefix, data->filename, data->lineno, err);
 		free(err);
-		emit_line(data->o, set, reset, line, 1);
+		emit_line(data->o, set, reset, 1, 0, line, 1);
 		ws_check_emit(line + 1, len - 1, data->ws_rule,
 			      data->o->file, set, reset, ws);
 	} else if (line[0] == ' ') {
@@ -4831,7 +4825,7 @@ void diff_flush(struct diff_options *options)
 			term[0] = options->line_termination;
 			term[1] = '\0';
 
-			emit_line(options, NULL, NULL, term, !!term[0]);
+			emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
 				fputs(options->stat_sep, options->file);
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 08/20] diff.c: convert fn_out_consume to use emit_line
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (6 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 07/20] diff.c: inline emit_line_0 into emit_line Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 09/20] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
                     ` (12 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only
emitting function is emit_line.

This covers the parts of fn_out_consume.  In the next
patches we'll convert more functions that want to emit
formatted output, so we'd want to have a formatted emit
function. Add it here.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
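
Note for readers, not part of the commit message: a minimal sketch of
a formatted emit helper that funnels into a single line emitter. It
uses plain vsnprintf() and malloc() instead of git's strbuf API, and
sketch_emit_line() / sketch_emit_line_fmt() are hypothetical names,
so treat it as an illustration of the idea rather than the code in
the patch below.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical single sink: every byte of output goes through here. */
void sketch_emit_line(FILE *file, const char *line, size_t len)
{
	fwrite(line, 1, len, file);
}

/*
 * Format into a temporary buffer first, then hand the finished line
 * to the single sink.  Returns 0 on success, -1 on error.
 */
int sketch_emit_line_fmt(FILE *file, const char *fmt, ...)
{
	va_list ap, ap2;
	char *buf;
	int n;

	va_start(ap, fmt);
	va_copy(ap2, ap);
	n = vsnprintf(NULL, 0, fmt, ap);	/* measure only */
	va_end(ap);
	if (n < 0) {
		va_end(ap2);
		return -1;
	}
	buf = malloc((size_t)n + 1);
	if (!buf) {
		va_end(ap2);
		return -1;
	}
	vsnprintf(buf, (size_t)n + 1, fmt, ap2);
	va_end(ap2);

	sketch_emit_line(file, buf, (size_t)n);
	free(buf);
	return 0;
}
```

Because the formatted text exists as one buffer before it is written,
a later change can store that buffer instead of printing it.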
 diff.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 3569857818..8186289734 100644
--- a/diff.c
+++ b/diff.c
@@ -547,6 +547,21 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset
 		fputc('\n', file);
 }
 
+static void emit_line_fmt(struct diff_options *o,
+			  const char *set, const char *reset,
+			  int add_line_prefix,
+			  const char *fmt, ...)
+{
+	struct strbuf sb = STRBUF_INIT;
+	va_list ap;
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	va_end(ap);
+
+	emit_line(o, set, reset, add_line_prefix, 0, sb.buf, sb.len);
+	strbuf_release(&sb);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -1270,7 +1285,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT);
 	const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
 	struct diff_options *o = ecbdata->opt;
-	const char *line_prefix = diff_line_prefix(o);
 
 	o->found_changes = 1;
 
@@ -1282,14 +1296,12 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 
 	if (ecbdata->label_path[0]) {
 		const char *name_a_tab, *name_b_tab;
-
 		name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
 		name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
-
-		fprintf(o->file, "%s%s--- %s%s%s\n",
-			line_prefix, meta, ecbdata->label_path[0], reset, name_a_tab);
-		fprintf(o->file, "%s%s+++ %s%s%s\n",
-			line_prefix, meta, ecbdata->label_path[1], reset, name_b_tab);
+		emit_line_fmt(o, meta, reset, 1, "--- %s%s\n",
+			      ecbdata->label_path[0], name_a_tab);
+		emit_line_fmt(o, meta, reset, 1, "+++ %s%s\n",
+			      ecbdata->label_path[1], name_b_tab);
 		ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
 	}
 
@@ -1330,7 +1342,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
 			emit_line(o, context, reset, 1, 0, line, len);
-			fputs("~\n", o->file);
+			emit_line(o, NULL, NULL, 0, 0, "~\n", 2);
 		} else {
 			/*
 			 * Skip the prefix character, if any.  With
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 09/20] diff.c: convert builtin_diff to use emit_line_*
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (7 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 08/20] diff.c: convert fn_out_consume to use emit_line Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 10/20] diff.c: convert emit_rewrite_diff " Stefan Beller
                     ` (11 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only
emitting function is emit_line.

This covers builtin_diff.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
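
Note for readers, not part of the commit message: a minimal sketch of
why a single emitting function makes a two-pass algorithm possible.
All names here (struct sketch_out, sketch_out_emit, sketch_out_flush)
are hypothetical; the buffering in this series is introduced by a
later patch and may look quite different.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct sketch_line {
	char *text;
	size_t len;
};

struct sketch_out {
	FILE *file;
	int use_buffer;			/* 0: write through, 1: collect */
	struct sketch_line *lines;
	size_t nr, alloc;
};

/* The only place that produces output: write now or remember for later. */
int sketch_out_emit(struct sketch_out *o, const char *line, size_t len)
{
	if (!o->use_buffer) {
		fwrite(line, 1, len, o->file);
		return 0;
	}
	if (o->nr == o->alloc) {
		size_t alloc = o->alloc ? 2 * o->alloc : 16;
		struct sketch_line *p = realloc(o->lines, alloc * sizeof(*p));
		if (!p)
			return -1;
		o->lines = p;
		o->alloc = alloc;
	}
	o->lines[o->nr].text = malloc(len ? len : 1);
	if (!o->lines[o->nr].text)
		return -1;
	memcpy(o->lines[o->nr].text, line, len);
	o->lines[o->nr].len = len;
	o->nr++;
	return 0;
}

/*
 * Second pass: by now every line is known, so a real implementation
 * could inspect or recolor o->lines[] before writing them out.
 */
void sketch_out_flush(struct sketch_out *o)
{
	size_t i;

	for (i = 0; i < o->nr; i++) {
		fwrite(o->lines[i].text, 1, o->lines[i].len, o->file);
		free(o->lines[i].text);
	}
	free(o->lines);
	o->lines = NULL;
	o->nr = o->alloc = 0;
}
```

Any fprintf() that bypasses the emit function would show up out of
order once buffering is on, which is why each call site is converted.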
 diff.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 8186289734..4fa976d43c 100644
--- a/diff.c
+++ b/diff.c
@@ -1289,8 +1289,9 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	o->found_changes = 1;
 
 	if (ecbdata->header) {
-		fprintf(o->file, "%s", ecbdata->header->buf);
-		strbuf_reset(ecbdata->header);
+		emit_line(o, NULL, NULL, 0, 0,
+			  ecbdata->header->buf, ecbdata->header->len);
+		strbuf_release(ecbdata->header);
 		ecbdata->header = NULL;
 	}
 
@@ -2435,7 +2436,7 @@ static void builtin_diff(const char *name_a,
 		if (complete_rewrite &&
 		    (textconv_one || !diff_filespec_is_binary(one)) &&
 		    (textconv_two || !diff_filespec_is_binary(two))) {
-			fprintf(o->file, "%s", header.buf);
+			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 			emit_rewrite_diff(name_a, name_b, one, two,
 						textconv_one, textconv_two, o);
@@ -2445,7 +2446,7 @@ static void builtin_diff(const char *name_a,
 	}
 
 	if (o->irreversible_delete && lbl[1][0] == '/') {
-		fprintf(o->file, "%s", header.buf);
+		emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
 		strbuf_reset(&header);
 		goto free_ab_and_return;
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2456,12 +2457,15 @@ static void builtin_diff(const char *name_a,
 		    !DIFF_OPT_TST(o, BINARY)) {
 			if (!oidcmp(&one->oid, &two->oid)) {
 				if (must_show_header)
-					fprintf(o->file, "%s", header.buf);
+					emit_line(o, NULL, NULL, 0, 0,
+						  header.buf, header.len);
 				goto free_ab_and_return;
 			}
-			fprintf(o->file, "%s", header.buf);
-			fprintf(o->file, "%sBinary files %s and %s differ\n",
-				line_prefix, lbl[0], lbl[1]);
+			emit_line(o, NULL, NULL, 0, 0,
+				  header.buf, header.len);
+			emit_line_fmt(o, NULL, NULL, 1,
+				      "Binary files %s and %s differ\n",
+				      lbl[0], lbl[1]);
 			goto free_ab_and_return;
 		}
 		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
@@ -2470,16 +2474,19 @@ static void builtin_diff(const char *name_a,
 		if (mf1.size == mf2.size &&
 		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
 			if (must_show_header)
-				fprintf(o->file, "%s", header.buf);
+				emit_line(o, NULL, NULL, 0, 0,
+					  header.buf, header.len);
 			goto free_ab_and_return;
 		}
-		fprintf(o->file, "%s", header.buf);
+		emit_line(o, NULL, NULL, 0, 0,
+			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
 			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
 		else
-			fprintf(o->file, "%sBinary files %s and %s differ\n",
-				line_prefix, lbl[0], lbl[1]);
+			emit_line_fmt(o, NULL, NULL, 1,
+				      "Binary files %s and %s differ\n",
+				      lbl[0], lbl[1]);
 		o->found_changes = 1;
 	} else {
 		/* Crazy xdl interfaces.. */
@@ -2491,7 +2498,7 @@ static void builtin_diff(const char *name_a,
 		const struct userdiff_funcname *pe;
 
 		if (must_show_header) {
-			fprintf(o->file, "%s", header.buf);
+			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 		}
 
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 10/20] diff.c: convert emit_rewrite_diff to use emit_line_*
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (8 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 09/20] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
                     ` (10 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only
emitting function is emit_line.

This covers emit_rewrite_diff.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
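
Note for readers, not part of the commit message: a minimal sketch of
building the rewrite hunk header in memory before emitting it. It
mirrors the line-count rules visible in the diff below (0 becomes
"0,0", 1 becomes "1", n becomes "1,n") but uses snprintf() on a
caller-supplied buffer instead of strbuf; the function names are
hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Format one side's line range the way a complete rewrite shows it. */
static void sketch_add_line_count(char *buf, size_t size, int count)
{
	switch (count) {
	case 0:
		snprintf(buf, size, "0,0");
		break;
	case 1:
		snprintf(buf, size, "1");
		break;
	default:
		snprintf(buf, size, "1,%d", count);
		break;
	}
}

/*
 * Build "@@ -<old> +<new> @@\n" into buf and return what snprintf()
 * returns.  With irreversible_delete set, the old side is "?,?".
 */
int sketch_rewrite_hunk_header(char *buf, size_t size,
			       int lc_a, int lc_b, int irreversible_delete)
{
	char a[32], b[32];

	if (irreversible_delete)
		snprintf(a, sizeof(a), "?,?");
	else
		sketch_add_line_count(a, sizeof(a), lc_a);
	sketch_add_line_count(b, sizeof(b), lc_b);

	return snprintf(buf, size, "@@ -%s +%s @@\n", a, b);
}
```

Once the header is a single string, the color and reset sequences can
be applied by the emitter rather than being spliced into the format.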
 diff.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index 4fa976d43c..3dda9f3c8e 100644
--- a/diff.c
+++ b/diff.c
@@ -704,17 +704,17 @@ static void remove_tempfile(void)
 	}
 }
 
-static void print_line_count(FILE *file, int count)
+static void add_line_count(struct strbuf *out, int count)
 {
 	switch (count) {
 	case 0:
-		fprintf(file, "0,0");
+		strbuf_addstr(out, "0,0");
 		break;
 	case 1:
-		fprintf(file, "1");
+		strbuf_addstr(out, "1");
 		break;
 	default:
-		fprintf(file, "1,%d", count);
+		strbuf_addf(out, "1,%d", count);
 		break;
 	}
 }
@@ -768,7 +768,7 @@ static void emit_rewrite_diff(const char *name_a,
 	char *data_one, *data_two;
 	size_t size_one, size_two;
 	struct emit_callback ecbdata;
-	const char *line_prefix = diff_line_prefix(o);
+	struct strbuf out = STRBUF_INIT;
 
 	if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) {
 		a_prefix = o->b_prefix;
@@ -806,20 +806,23 @@ static void emit_rewrite_diff(const char *name_a,
 	ecbdata.lno_in_preimage = 1;
 	ecbdata.lno_in_postimage = 1;
 
+	emit_line_fmt(o, metainfo, reset, 1, "--- %s%s\n", a_name.buf, name_a_tab);
+	emit_line_fmt(o, metainfo, reset, 1, "+++ %s%s\n", b_name.buf, name_b_tab);
+
 	lc_a = count_lines(data_one, size_one);
 	lc_b = count_lines(data_two, size_two);
-	fprintf(o->file,
-		"%s%s--- %s%s%s\n%s%s+++ %s%s%s\n%s%s@@ -",
-		line_prefix, metainfo, a_name.buf, name_a_tab, reset,
-		line_prefix, metainfo, b_name.buf, name_b_tab, reset,
-		line_prefix, fraginfo);
+
+	strbuf_addstr(&out, "@@ -");
 	if (!o->irreversible_delete)
-		print_line_count(o->file, lc_a);
+		add_line_count(&out, lc_a);
 	else
-		fprintf(o->file, "?,?");
-	fprintf(o->file, " +");
-	print_line_count(o->file, lc_b);
-	fprintf(o->file, " @@%s\n", reset);
+		strbuf_addstr(&out, "?,?");
+	strbuf_addstr(&out, " +");
+	add_line_count(&out, lc_b);
+	strbuf_addstr(&out, " @@\n");
+	emit_line(o, fraginfo, reset, 1, 0, out.buf, out.len);
+	strbuf_release(&out);
+
 	if (lc_a && !o->irreversible_delete)
 		emit_rewrite_lines(&ecbdata, '-', data_one, size_one);
 	if (lc_b)
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 11/20] diff.c: convert emit_rewrite_lines to use emit_line_*
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (9 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 10/20] diff.c: convert emit_rewrite_diff " Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  5:03     ` Junio C Hamano
  2017-05-18  3:35     ` Junio C Hamano
  2017-05-17  2:58   ` [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
                     ` (9 subsequent siblings)
  20 siblings, 2 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only
emitting function is emit_line.

This covers emit_rewrite_lines.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
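
Note for readers, not part of the commit message: a minimal sketch of
splitting a blob into lines and handling a missing newline at the end.
sketch_emit_lines() is a hypothetical, color-free simplification that
writes straight to a FILE *; it is not the emit_rewrite_lines() in the
diff below.

```c
#include <stdio.h>
#include <string.h>

/*
 * Print every line of data[0..size) prefixed with `sign`.  If the
 * last line lacks a newline, complete it and follow it with the
 * usual marker.  Returns 1 if the marker was printed, else 0.
 */
int sketch_emit_lines(FILE *file, int sign, const char *data, size_t size)
{
	int missing_newline = 0;

	while (size > 0) {
		const char *endp = memchr(data, '\n', size);
		size_t len = endp ? (size_t)(endp - data) + 1 : size;

		fputc(sign, file);
		fwrite(data, 1, len, file);
		if (!endp) {
			/* incomplete last line: terminate it ourselves */
			fputc('\n', file);
			missing_newline = 1;
		}
		data += len;
		size -= len;
	}
	if (missing_newline)
		fputs("\\ No newline at end of file\n", file);
	return missing_newline;
}
```

Note that `len` is always assigned before it is used to advance `data`
and shrink `size`, whichever branch is taken.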
 diff.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 3dda9f3c8e..690794aeb8 100644
--- a/diff.c
+++ b/diff.c
@@ -722,15 +722,25 @@ static void add_line_count(struct strbuf *out, int count)
 static void emit_rewrite_lines(struct emit_callback *ecb,
 			       int prefix, const char *data, int size)
 {
-	const char *endp = NULL;
-	static const char *nneof = " No newline at end of file\n";
 	const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
+	struct strbuf sb = STRBUF_INIT;
 
 	while (0 < size) {
 		int len;
 
-		endp = memchr(data, '\n', size);
-		len = endp ? (endp - data + 1) : size;
+		const char *endp = memchr(data, '\n', size);
+		if (endp)
+			len = endp - data + 1;
+		else {
+			while (0 < size) {
+				strbuf_addch(&sb, *data);
+				size--;
+				data++;
+			}
+			strbuf_addch(&sb, '\n');
+			data = sb.buf;
+			len = sb.len;
+		}
 		if (prefix != '+') {
 			ecb->lno_in_preimage++;
 			emit_del_line(reset, ecb, data, len);
@@ -741,12 +751,13 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		size -= len;
 		data += len;
 	}
-	if (!endp) {
+	if (sb.len) {
+		static const char *nneof = "\\ No newline at end of file\n";
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
-		putc('\n', ecb->opt->file);
-		emit_line(ecb->opt, context, reset, 1, '\\',
-			  nneof, strlen(nneof));
+		emit_line(ecb->opt, context, reset, 1, 0,
+			    nneof, strlen(nneof));
+		strbuf_release(&sb);
 	}
 }
 
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (10 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  5:19     ` Junio C Hamano
  2017-05-17  2:58   ` [PATCHv2 13/20] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
                     ` (8 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in one pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of two corresponding lines
(+ and -), we cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only
emitting function is emit_line.

This prepares the code for submodules to go through the
emit_line function.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c      | 20 +++++++---------
 diff.h      |  5 ++++
 submodule.c | 78 ++++++++++++++++++++++++++++++-------------------------------
 submodule.h |  9 +++----
 4 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/diff.c b/diff.c
index 690794aeb8..7c8d6a5d12 100644
--- a/diff.c
+++ b/diff.c
@@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char *reset,
-		      int add_line_prefix, int sign, const char *line, int len)
+void emit_line(struct diff_options *o, const char *set, const char *reset,
+	       int add_line_prefix, int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
 	FILE *file = o->file;
@@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset
 		fputc('\n', file);
 }
 
-static void emit_line_fmt(struct diff_options *o,
-			  const char *set, const char *reset,
-			  int add_line_prefix,
-			  const char *fmt, ...)
+void emit_line_fmt(struct diff_options *o,
+		   const char *set, const char *reset,
+		   int add_line_prefix,
+		   const char *fmt, ...)
 {
 	struct strbuf sb = STRBUF_INIT;
 	va_list ap;
@@ -2386,8 +2386,7 @@ static void builtin_diff(const char *name_a,
 	    (!two->mode || S_ISGITLINK(two->mode))) {
 		const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
 		const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-		show_submodule_summary(o->file, one->path ? one->path : two->path,
-				line_prefix,
+		show_submodule_summary(o, one->path ? one->path : two->path,
 				&one->oid, &two->oid,
 				two->dirty_submodule,
 				meta, del, add, reset);
@@ -2397,11 +2396,10 @@ static void builtin_diff(const char *name_a,
 		   (!two->mode || S_ISGITLINK(two->mode))) {
 		const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
 		const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-		show_submodule_inline_diff(o->file, one->path ? one->path : two->path,
-				line_prefix,
+		show_submodule_inline_diff(o, one->path ? one->path : two->path,
 				&one->oid, &two->oid,
 				two->dirty_submodule,
-				meta, del, add, reset, o);
+				meta, del, add, reset);
 		return;
 	}
 
diff --git a/diff.h b/diff.h
index 5be1ee77a7..6e14100102 100644
--- a/diff.h
+++ b/diff.h
@@ -188,6 +188,11 @@ struct diff_options {
 	int diff_path_counter;
 };
 
+void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
+		   int add_line_prefix, const char *fmt, ...);
+void emit_line(struct diff_options *o, const char *set, const char *reset,
+	       int add_line_prefix, int sign, const char *line, int len);
+
 enum color_diff {
 	DIFF_RESET = 0,
 	DIFF_CONTEXT = 1,
diff --git a/submodule.c b/submodule.c
index d3299e29c0..5996ebca44 100644
--- a/submodule.c
+++ b/submodule.c
@@ -362,8 +362,8 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path,
 	return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, FILE *f,
-		const char *line_prefix,
+static void print_submodule_summary(struct rev_info *rev,
+		struct diff_options *o,
 		const char *del, const char *add, const char *reset)
 {
 	static const char format[] = "  %m %s";
@@ -375,18 +375,12 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
 		ctx.date_mode = rev->date_mode;
 		ctx.output_encoding = get_log_output_encoding();
 		strbuf_setlen(&sb, 0);
-		strbuf_addstr(&sb, line_prefix);
-		if (commit->object.flags & SYMMETRIC_LEFT) {
-			if (del)
-				strbuf_addstr(&sb, del);
-		}
-		else if (add)
-			strbuf_addstr(&sb, add);
 		format_commit_message(commit, format, &sb, &ctx);
-		if (reset)
-			strbuf_addstr(&sb, reset);
 		strbuf_addch(&sb, '\n');
-		fprintf(f, "%s", sb.buf);
+		if (commit->object.flags & SYMMETRIC_LEFT)
+			emit_line(o, del, reset, 1, 0, sb.buf, sb.len);
+		else if (add)
+			emit_line(o, add, reset, 1, 0, sb.buf, sb.len);
 	}
 	strbuf_release(&sb);
 }
@@ -413,8 +407,7 @@ void prepare_submodule_repo_env(struct argv_array *out)
  * attempt to lookup both the left and right commits and put them into the
  * left and right pointers.
  */
-static void show_submodule_header(FILE *f, const char *path,
-		const char *line_prefix,
+static void show_submodule_header(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *reset,
@@ -426,11 +419,11 @@ static void show_submodule_header(FILE *f, const char *path,
 	int fast_forward = 0, fast_backward = 0;
 
 	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
-		fprintf(f, "%sSubmodule %s contains untracked content\n",
-			line_prefix, path);
+		emit_line_fmt(o, NULL, NULL, 1,
+			      "Submodule %s contains untracked content\n", path);
 	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
-		fprintf(f, "%sSubmodule %s contains modified content\n",
-			line_prefix, path);
+		emit_line_fmt(o, NULL, NULL, 1,
+			      "Submodule %s contains modified content\n", path);
 
 	if (is_null_oid(one))
 		message = "(new submodule)";
@@ -472,21 +465,20 @@ static void show_submodule_header(FILE *f, const char *path,
 	}
 
 output_header:
-	strbuf_addf(&sb, "%s%sSubmodule %s ", line_prefix, meta, path);
+	strbuf_addf(&sb, "Submodule %s ", path);
 	strbuf_add_unique_abbrev(&sb, one->hash, DEFAULT_ABBREV);
 	strbuf_addstr(&sb, (fast_backward || fast_forward) ? ".." : "...");
 	strbuf_add_unique_abbrev(&sb, two->hash, DEFAULT_ABBREV);
 	if (message)
-		strbuf_addf(&sb, " %s%s\n", message, reset);
+		strbuf_addf(&sb, " %s\n", message);
 	else
-		strbuf_addf(&sb, "%s:%s\n", fast_backward ? " (rewind)" : "", reset);
-	fwrite(sb.buf, sb.len, 1, f);
+		strbuf_addf(&sb, "%s:\n", fast_backward ? " (rewind)" : "");
+	emit_line(o, meta, reset, 1, 0,  sb.buf, sb.len);
 
 	strbuf_release(&sb);
 }
 
-void show_submodule_summary(FILE *f, const char *path,
-		const char *line_prefix,
+void show_submodule_summary(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset)
@@ -495,7 +487,7 @@ void show_submodule_summary(FILE *f, const char *path,
 	struct commit *left = NULL, *right = NULL;
 	struct commit_list *merge_bases = NULL;
 
-	show_submodule_header(f, path, line_prefix, one, two, dirty_submodule,
+	show_submodule_header(o, path, one, two, dirty_submodule,
 			      meta, reset, &left, &right, &merge_bases);
 
 	/*
@@ -508,11 +500,12 @@ void show_submodule_summary(FILE *f, const char *path,
 
 	/* Treat revision walker failure the same as missing commits */
 	if (prepare_submodule_summary(&rev, path, left, right, merge_bases)) {
-		fprintf(f, "%s(revision walker failed)\n", line_prefix);
+		const char *error = "(revision walker failed)\n";
+		emit_line(o, NULL, NULL, 1, 0, error, strlen(error));
 		goto out;
 	}
 
-	print_submodule_summary(&rev, f, line_prefix, del, add, reset);
+	print_submodule_summary(&rev, o, del, add, reset);
 
 out:
 	if (merge_bases)
@@ -521,20 +514,18 @@ void show_submodule_summary(FILE *f, const char *path,
 	clear_commit_marks(right, ~0);
 }
 
-void show_submodule_inline_diff(FILE *f, const char *path,
-		const char *line_prefix,
+void show_submodule_inline_diff(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
-		const char *del, const char *add, const char *reset,
-		const struct diff_options *o)
+		const char *del, const char *add, const char *reset)
 {
 	const struct object_id *old = &empty_tree_oid, *new = &empty_tree_oid;
 	struct commit *left = NULL, *right = NULL;
 	struct commit_list *merge_bases = NULL;
-	struct strbuf submodule_dir = STRBUF_INIT;
 	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf sb = STRBUF_INIT;
 
-	show_submodule_header(f, path, line_prefix, one, two, dirty_submodule,
+	show_submodule_header(o, path, one, two, dirty_submodule,
 			      meta, reset, &left, &right, &merge_bases);
 
 	/* We need a valid left and right commit to display a difference */
@@ -547,15 +538,14 @@ void show_submodule_inline_diff(FILE *f, const char *path,
 	if (right)
 		new = two;
 
-	fflush(f);
 	cp.git_cmd = 1;
 	cp.dir = path;
-	cp.out = dup(fileno(f));
+	cp.out = -1;
 	cp.no_stdin = 1;
 
 	/* TODO: other options may need to be passed here. */
 	argv_array_push(&cp.args, "diff");
-	argv_array_pushf(&cp.args, "--line-prefix=%s", line_prefix);
+	argv_array_pushf(&cp.args, "--line-prefix=%s", diff_line_prefix(o));
 	if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
 		argv_array_pushf(&cp.args, "--src-prefix=%s%s/",
 				 o->b_prefix, path);
@@ -578,11 +568,21 @@ void show_submodule_inline_diff(FILE *f, const char *path,
 		argv_array_push(&cp.args, oid_to_hex(new));
 
 	prepare_submodule_repo_env(&cp.env_array);
-	if (run_command(&cp))
-		fprintf(f, "(diff failed)\n");
+	if (start_command(&cp)) {
+		const char *error = "(diff failed)\n";
+		emit_line(o, NULL, NULL, 1, 0, error, strlen(error));
+	}
+
+	while (strbuf_getwholeline_fd(&sb, cp.out, '\n') != EOF)
+		emit_line(o, NULL, NULL, 1, 0, sb.buf, sb.len);
+
+	if (finish_command(&cp)) {
+		const char *error = "(diff failed)\n";
+		emit_line(o, NULL, NULL, 1, 0, error, strlen(error));
+	}
 
 done:
-	strbuf_release(&submodule_dir);
+	strbuf_release(&sb);
 	if (merge_bases)
 		free_commit_list(merge_bases);
 	if (left)
diff --git a/submodule.h b/submodule.h
index 1277480add..9df0a3aea2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -53,17 +53,14 @@ extern int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 extern const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
 extern void handle_ignore_submodules_arg(struct diff_options *, const char *);
-extern void show_submodule_summary(FILE *f, const char *path,
-		const char *line_prefix,
+extern void show_submodule_summary(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset);
-extern void show_submodule_inline_diff(FILE *f, const char *path,
-		const char *line_prefix,
+extern void show_submodule_inline_diff(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
-		const char *del, const char *add, const char *reset,
-		const struct diff_options *opt);
+		const char *del, const char *add, const char *reset);
 extern void set_config_fetch_recurse_submodules(int value);
 extern void set_config_update_recurse_submodules(int value);
 /* Check if we want to update any submodule.*/
-- 
2.13.0.18.g7d86cc8ba0


* [PATCHv2 13/20] diff.c: convert emit_binary_diff_body to use emit_line_*
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (11 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 14/20] diff.c: convert show_stats " Stefan Beller
                     ` (7 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -) we
cannot yet know that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers emit_binary_diff_body.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
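Note (not part of the commit message): each body line of a binary
patch starts with one character that encodes how many payload bytes
the line carries, as computed in the loop touched below. A minimal
standalone sketch of that mapping follows; the function name
binary_line_length_char is hypothetical and does not exist in diff.c.

```c
#include <assert.h>

/*
 * Map the number of payload bytes on one base85 line (1..52) to the
 * length character that starts the line: 1..26 -> 'A'..'Z',
 * 27..52 -> 'a'..'z'. Hypothetical helper, for illustration only.
 */
char binary_line_length_char(int bytes)
{
	assert(bytes >= 1 && bytes <= 52);
	if (bytes <= 26)
		return bytes + 'A' - 1;
	return bytes - 26 + 'a' - 1;
}
```

A full 52-byte line therefore starts with 'z', and a line carrying a
single byte starts with 'A'.
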
 diff.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index 7c8d6a5d12..2dd10fa16a 100644
--- a/diff.c
+++ b/diff.c
@@ -2240,8 +2240,8 @@ static unsigned char *deflate_it(char *data,
 	return deflated;
 }
 
-static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
-				  const char *prefix)
+static void emit_binary_diff_body(struct diff_options *o,
+				  mmfile_t *one, mmfile_t *two)
 {
 	void *cp;
 	void *delta;
@@ -2270,13 +2270,12 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 	}
 
 	if (delta && delta_size < deflate_size) {
-		fprintf(file, "%sdelta %lu\n", prefix, orig_size);
+		emit_line_fmt(o, NULL, NULL, 1, "delta %lu\n", orig_size);
 		free(deflated);
 		data = delta;
 		data_size = delta_size;
-	}
-	else {
-		fprintf(file, "%sliteral %lu\n", prefix, two->size);
+	} else {
+		emit_line_fmt(o, NULL, NULL, 1, "literal %lu\n", two->size);
 		free(delta);
 		data = deflated;
 		data_size = deflate_size;
@@ -2285,8 +2284,9 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 	/* emit data encoded in base85 */
 	cp = data;
 	while (data_size) {
+		int len;
 		int bytes = (52 < data_size) ? 52 : data_size;
-		char line[70];
+		char line[71];
 		data_size -= bytes;
 		if (bytes <= 26)
 			line[0] = bytes + 'A' - 1;
@@ -2294,20 +2294,25 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 			line[0] = bytes - 26 + 'a' - 1;
 		encode_85(line + 1, cp, bytes);
 		cp = (char *) cp + bytes;
-		fprintf(file, "%s", prefix);
-		fputs(line, file);
-		fputc('\n', file);
+
+		len = strlen(line);
+		line[len++] = '\n';
+		line[len] = '\0';
+
+		emit_line(o, NULL, NULL, 1, 0, line, len);
 	}
-	fprintf(file, "%s\n", prefix);
+	emit_line(o, NULL, NULL, 1, 0, "\n", 1);
 	free(data);
 }
 
-static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two,
-			     const char *prefix)
+static void emit_binary_diff(struct diff_options *o,
+			     mmfile_t *one, mmfile_t *two)
 {
-	fprintf(file, "%sGIT binary patch\n", prefix);
-	emit_binary_diff_body(file, one, two, prefix);
-	emit_binary_diff_body(file, two, one, prefix);
+	const char *s = "GIT binary patch\n";
+	const int len = strlen(s);
+	emit_line(o, NULL, NULL, 1, 0, s, len);
+	emit_binary_diff_body(o, one, two);
+	emit_binary_diff_body(o, two, one);
 }
 
 int diff_filespec_is_binary(struct diff_filespec *one)
@@ -2494,7 +2499,7 @@ static void builtin_diff(const char *name_a,
 			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
-			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
+			emit_binary_diff(o, &mf1, &mf2);
 		else
 			emit_line_fmt(o, NULL, NULL, 1,
 				      "Binary files %s and %s differ\n",
-- 
2.13.0.18.g7d86cc8ba0


* [PATCHv2 14/20] diff.c: convert show_stats to use emit_line_*
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (12 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 13/20] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 15/20] diff.c: convert word diffing " Stefan Beller
                     ` (6 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -) we
cannot yet know that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

print_stat_summary is also called from builtin/apply, so we still
need a version that takes a file pointer. Introduce
print_stat_summary_0, which uses the emit_line_* machinery, and
keep print_stat_summary around with the same arguments as a thin
wrapper.

The responsibility to print the line prefix moves from the callers
of print_stat_summary_0 into the function itself.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
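Note (not part of the commit message): the conversion below builds
each stat line completely in a buffer and hands it to the emitter in
one call, instead of printing it piecewise. A rough standalone sketch
of that pattern, under the assumption of a plain char buffer instead
of a strbuf and without colors or the line prefix; format_stat_line
and add_chars are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append up to `cnt` copies of `ch` to the NUL-terminated string in out. */
static void add_chars(char *out, size_t cap, char ch, int cnt)
{
	size_t len = strlen(out);

	while (cnt-- > 0 && len + 1 < cap)
		out[len++] = ch;
	out[len] = '\0';
}

/*
 * Build one complete stat line (name column, change count, graph,
 * trailing newline) in `out` and return its length. The caller can
 * then emit the finished line with a single call.
 */
size_t format_stat_line(char *out, size_t cap, const char *name,
			int name_width, int added, int deleted)
{
	assert(cap > 0);
	snprintf(out, cap, " %-*s | %d ", name_width, name, added + deleted);
	add_chars(out, cap, '+', added);
	add_chars(out, cap, '-', deleted);
	add_chars(out, cap, '\n', 1);
	return strlen(out);
}
```

Because the line is complete before it is emitted, a buffering
emitter only ever sees whole lines.
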
 diff.c | 89 ++++++++++++++++++++++++++++++++++++++----------------------------
 diff.h |  4 +--
 2 files changed, 53 insertions(+), 40 deletions(-)

diff --git a/diff.c b/diff.c
index 2dd10fa16a..ccd28953d7 100644
--- a/diff.c
+++ b/diff.c
@@ -1536,20 +1536,19 @@ static int scale_linear(int it, int width, int max_change)
 	return 1 + (it * (width - 1) / max_change);
 }
 
-static void show_name(FILE *file,
+static void show_name(struct strbuf *out,
 		      const char *prefix, const char *name, int len)
 {
-	fprintf(file, " %s%-*s |", prefix, len, name);
+	strbuf_addf(out, " %s%-*s |", prefix, len, name);
 }
 
-static void show_graph(FILE *file, char ch, int cnt, const char *set, const char *reset)
+static void show_graph(struct strbuf *out, char ch, int cnt, const char *set, const char *reset)
 {
 	if (cnt <= 0)
 		return;
-	fprintf(file, "%s", set);
-	while (cnt--)
-		putc(ch, file);
-	fprintf(file, "%s", reset);
+	strbuf_addstr(out, set);
+	strbuf_addchars(out, ch, cnt);
+	strbuf_addstr(out, reset);
 }
 
 static void fill_print_name(struct diffstat_file *file)
@@ -1573,14 +1572,16 @@ static void fill_print_name(struct diffstat_file *file)
 	file->print_name = pname;
 }
 
-int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
+void print_stat_summary_0(struct diff_options *options, int files,
+			  int insertions, int deletions)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
 
 	if (!files) {
 		assert(insertions == 0 && deletions == 0);
-		return fprintf(fp, "%s\n", " 0 files changed");
+		strbuf_addstr(&sb, " 0 files changed\n");
+		emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
+		return;
 	}
 
 	strbuf_addf(&sb,
@@ -1607,9 +1608,17 @@ int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
 			    deletions);
 	}
 	strbuf_addch(&sb, '\n');
-	ret = fputs(sb.buf, fp);
+	emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
-	return ret;
+}
+
+void print_stat_summary(FILE *fp, int files,
+			int insertions, int deletions)
+{
+	struct diff_options o;
+	memset(&o, 0, sizeof(o));
+	o.file = fp;
+	print_stat_summary_0(&o, files, insertions, deletions);
 }
 
 static void show_stats(struct diffstat_t *data, struct diff_options *options)
@@ -1619,13 +1628,13 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 	int total_files = data->nr, count;
 	int width, name_width, graph_width, number_width = 0, bin_width = 0;
 	const char *reset, *add_c, *del_c;
-	const char *line_prefix = "";
 	int extra_shown = 0;
+	const char *line_prefix = diff_line_prefix(options);
+	struct strbuf out = STRBUF_INIT;
 
 	if (data->nr == 0)
 		return;
 
-	line_prefix = diff_line_prefix(options);
 	count = options->stat_count ? options->stat_count : data->nr;
 
 	reset = diff_get_color_opt(options, DIFF_RESET);
@@ -1779,26 +1788,29 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		}
 
 		if (file->is_binary) {
-			fprintf(options->file, "%s", line_prefix);
-			show_name(options->file, prefix, name, len);
-			fprintf(options->file, " %*s", number_width, "Bin");
+			show_name(&out, prefix, name, len);
+			strbuf_addf(&out, " %*s", number_width, "Bin");
 			if (!added && !deleted) {
-				putc('\n', options->file);
+				strbuf_addch(&out, '\n');
+				emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+				strbuf_reset(&out);
 				continue;
 			}
-			fprintf(options->file, " %s%"PRIuMAX"%s",
+			strbuf_addf(&out, " %s%"PRIuMAX"%s",
 				del_c, deleted, reset);
-			fprintf(options->file, " -> ");
-			fprintf(options->file, "%s%"PRIuMAX"%s",
+			strbuf_addstr(&out, " -> ");
+			strbuf_addf(&out, "%s%"PRIuMAX"%s",
 				add_c, added, reset);
-			fprintf(options->file, " bytes");
-			fprintf(options->file, "\n");
+			strbuf_addstr(&out, " bytes\n");
+			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			strbuf_reset(&out);
 			continue;
 		}
 		else if (file->is_unmerged) {
-			fprintf(options->file, "%s", line_prefix);
-			show_name(options->file, prefix, name, len);
-			fprintf(options->file, " Unmerged\n");
+			show_name(&out, prefix, name, len);
+			strbuf_addstr(&out, " Unmerged\n");
+			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			strbuf_reset(&out);
 			continue;
 		}
 
@@ -1821,14 +1833,15 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 				add = total - del;
 			}
 		}
-		fprintf(options->file, "%s", line_prefix);
-		show_name(options->file, prefix, name, len);
-		fprintf(options->file, " %*"PRIuMAX"%s",
+		show_name(&out, prefix, name, len);
+		strbuf_addf(&out, " %*"PRIuMAX"%s",
 			number_width, added + deleted,
 			added + deleted ? " " : "");
-		show_graph(options->file, '+', add, add_c, reset);
-		show_graph(options->file, '-', del, del_c, reset);
-		fprintf(options->file, "\n");
+		show_graph(&out, '+', add, add_c, reset);
+		show_graph(&out, '-', del, del_c, reset);
+		strbuf_addch(&out, '\n');
+		emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+		strbuf_reset(&out);
 	}
 
 	for (i = 0; i < data->nr; i++) {
@@ -1849,11 +1862,12 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		if (i < count)
 			continue;
 		if (!extra_shown)
-			fprintf(options->file, "%s ...\n", line_prefix);
+			emit_line(options, NULL, NULL, 1, 0,
+				  " ...\n", strlen(" ...\n"));
 		extra_shown = 1;
 	}
-	fprintf(options->file, "%s", line_prefix);
-	print_stat_summary(options->file, total_files, adds, dels);
+
+	print_stat_summary_0(options, total_files, adds, dels);
 }
 
 static void show_shortstats(struct diffstat_t *data, struct diff_options *options)
@@ -1865,7 +1879,7 @@ static void show_shortstats(struct diffstat_t *data, struct diff_options *option
 
 	for (i = 0; i < data->nr; i++) {
 		int added = data->files[i]->added;
-		int deleted= data->files[i]->deleted;
+		int deleted = data->files[i]->deleted;
 
 		if (data->files[i]->is_unmerged ||
 		    (!data->files[i]->is_interesting && (added + deleted == 0))) {
@@ -1875,8 +1889,7 @@ static void show_shortstats(struct diffstat_t *data, struct diff_options *option
 			dels += deleted;
 		}
 	}
-	fprintf(options->file, "%s", diff_line_prefix(options));
-	print_stat_summary(options->file, total_files, adds, dels);
+	print_stat_summary_0(options, total_files, adds, dels);
 }
 
 static void show_numstat(struct diffstat_t *data, struct diff_options *options)
diff --git a/diff.h b/diff.h
index 6e14100102..b75b0d7283 100644
--- a/diff.h
+++ b/diff.h
@@ -394,8 +394,8 @@ extern int parse_rename_score(const char **cp_p);
 
 extern long parse_algorithm_value(const char *value);
 
-extern int print_stat_summary(FILE *fp, int files,
-			      int insertions, int deletions);
+extern void print_stat_summary(FILE *fp, int files,
+			       int insertions, int deletions);
 extern void setup_diff_pager(struct diff_options *);
 
 #endif /* DIFF_H */
-- 
2.13.0.18.g7d86cc8ba0


* [PATCHv2 15/20] diff.c: convert word diffing to use emit_line_*
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (13 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 14/20] diff.c: convert show_stats " Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 16/20] diff.c: convert diff_flush " Stefan Beller
                     ` (5 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -) we
cannot yet know that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers all code related to diffing words.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
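Note (not part of the commit message): the write helper converted
below walks a buffer with memchr, emits each non-empty run of text
between newlines, and then emits the newline separately. A minimal
standalone sketch of just that splitting loop, which records the runs
instead of emitting them; struct text_run and split_text_runs are
hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* One non-empty run of text between two newlines. */
struct text_run {
	const char *start;
	size_t len;
};

/*
 * Split buf[0..count) at '\n' and record every non-empty run, using
 * the same loop shape as the helper in the patch. Returns the number
 * of runs stored (at most max).
 */
size_t split_text_runs(const char *buf, size_t count,
		       struct text_run *runs, size_t max)
{
	size_t nr = 0;

	while (count) {
		const char *p = memchr(buf, '\n', count);

		if (p != buf && nr < max) {
			runs[nr].start = buf;
			runs[nr].len = p ? (size_t)(p - buf) : count;
			nr++;
		}
		if (!p)
			break;
		count -= p + 1 - buf;
		buf = p + 1;
	}
	return nr;
}
```

An empty line produces no run at all, which matches the `p != buf`
check in the patch: only the newline is written for it.
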
 diff.c | 73 +++++++++++++++++++++++++++++-------------------------------------
 1 file changed, 32 insertions(+), 41 deletions(-)

diff --git a/diff.c b/diff.c
index ccd28953d7..f1cb0b7799 100644
--- a/diff.c
+++ b/diff.c
@@ -893,37 +893,42 @@ struct diff_words_data {
 	struct diff_words_style *style;
 };
 
-static int fn_out_diff_words_write_helper(FILE *fp,
+static int fn_out_diff_words_write_helper(struct diff_options *o,
 					  struct diff_words_style_elem *st_el,
 					  const char *newline,
-					  size_t count, const char *buf,
-					  const char *line_prefix)
+					  size_t count, const char *buf)
 {
 	int print = 0;
+	struct strbuf sb = STRBUF_INIT;
 
 	while (count) {
 		char *p = memchr(buf, '\n', count);
 		if (print)
-			fputs(line_prefix, fp);
+			emit_line(o, NULL, NULL, 1, 0, "", 0);
+
 		if (p != buf) {
-			if (st_el->color && fputs(st_el->color, fp) < 0)
-				return -1;
-			if (fputs(st_el->prefix, fp) < 0 ||
-			    fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
-			    fputs(st_el->suffix, fp) < 0)
-				return -1;
-			if (st_el->color && *st_el->color
-			    && fputs(GIT_COLOR_RESET, fp) < 0)
-				return -1;
+			const char *reset = st_el->color && *st_el->color ?
+					    GIT_COLOR_RESET : NULL;
+			strbuf_addstr(&sb, st_el->prefix);
+			strbuf_add(&sb, buf, p ? p - buf : count);
+			strbuf_addstr(&sb, st_el->suffix);
+			emit_line(o, st_el->color, reset,
+				  0, 0, sb.buf, sb.len);
+			strbuf_reset(&sb);
 		}
 		if (!p)
-			return 0;
-		if (fputs(newline, fp) < 0)
-			return -1;
+			goto out;
+
+		strbuf_addstr(&sb, newline);
+		emit_line(o, NULL, NULL, 0, 0, sb.buf, sb.len);
+		strbuf_reset(&sb);
 		count -= p + 1 - buf;
 		buf = p + 1;
 		print = 1;
 	}
+
+out:
+	strbuf_release(&sb);
 	return 0;
 }
 
@@ -977,14 +982,12 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	int minus_first, minus_len, plus_first, plus_len;
 	const char *minus_begin, *minus_end, *plus_begin, *plus_end;
 	struct diff_options *opt = diff_words->opt;
-	const char *line_prefix;
 
 	if (line[0] != '@' || parse_hunk_header(line, len,
 			&minus_first, &minus_len, &plus_first, &plus_len))
 		return;
 
 	assert(opt);
-	line_prefix = diff_line_prefix(opt);
 
 	/* POSIX requires that first be decremented by one if len == 0... */
 	if (minus_len) {
@@ -1001,28 +1004,21 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	} else
 		plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
 
-	if (color_words_output_graph_prefix(diff_words)) {
-		fputs(line_prefix, diff_words->opt->file);
-	}
 	if (diff_words->current_plus != plus_begin) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->ctx, style->newline,
 				plus_begin - diff_words->current_plus,
-				diff_words->current_plus, line_prefix);
-		if (*(plus_begin - 1) == '\n')
-			fputs(line_prefix, diff_words->opt->file);
+				diff_words->current_plus);
 	}
 	if (minus_begin != minus_end) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->old, style->newline,
-				minus_end - minus_begin, minus_begin,
-				line_prefix);
+				minus_end - minus_begin, minus_begin);
 	}
 	if (plus_begin != plus_end) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->new, style->newline,
-				plus_end - plus_begin, plus_begin,
-				line_prefix);
+				plus_end - plus_begin, plus_begin);
 	}
 
 	diff_words->current_plus = plus_end;
@@ -1109,18 +1105,14 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	struct diff_words_style *style = diff_words->style;
 
 	struct diff_options *opt = diff_words->opt;
-	const char *line_prefix;
-
 	assert(opt);
-	line_prefix = diff_line_prefix(opt);
 
 	/* special case: only removal */
 	if (!diff_words->plus.text.size) {
-		fputs(line_prefix, diff_words->opt->file);
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->old, style->newline,
 			diff_words->minus.text.size,
-			diff_words->minus.text.ptr, line_prefix);
+			diff_words->minus.text.ptr);
 		diff_words->minus.text.size = 0;
 		return;
 	}
@@ -1143,12 +1135,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	if (diff_words->current_plus != diff_words->plus.text.ptr +
 			diff_words->plus.text.size) {
 		if (color_words_output_graph_prefix(diff_words))
-			fputs(line_prefix, diff_words->opt->file);
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+			emit_line(diff_words->opt, NULL, NULL, 1, 0, "", 0);
+		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->ctx, style->newline,
 			diff_words->plus.text.ptr + diff_words->plus.text.size
-			- diff_words->current_plus, diff_words->current_plus,
-			line_prefix);
+			- diff_words->current_plus, diff_words->current_plus);
 	}
 	diff_words->minus.text.size = diff_words->plus.text.size = 0;
 }
-- 
2.13.0.18.g7d86cc8ba0


* [PATCHv2 16/20] diff.c: convert diff_flush to use emit_line_*
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (14 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 15/20] diff.c: convert word diffing " Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 17/20] diff.c: convert diff_summary " Stefan Beller
                     ` (4 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -) we
cannot yet know that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers diff_flush.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
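Note (not part of the commit message): to illustrate why a single
pass is not enough, here is a deliberately naive standalone sketch.
It assumes the whole diff has already been buffered into an array
and then flags lines that occur both as '+' and as '-'. It is not
the detection algorithm of this series; struct buffered_line and
mark_moved_lines are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* One buffered diff line; the struct is hypothetical. */
struct buffered_line {
	char sign;        /* '+', '-' or ' ' */
	const char *text; /* line content without the sign */
	int moved;        /* set by mark_moved_lines() */
};

/*
 * Second pass over a fully buffered diff: flag every '+' line whose
 * text also appears as a '-' line, and vice versa. Quadratic on
 * purpose; it only demonstrates why buffering is needed.
 */
void mark_moved_lines(struct buffered_line *lines, size_t nr)
{
	size_t i, j;

	for (i = 0; i < nr; i++) {
		char other;

		lines[i].moved = 0;
		if (lines[i].sign == '+')
			other = '-';
		else if (lines[i].sign == '-')
			other = '+';
		else
			continue;

		for (j = 0; j < nr; j++) {
			if (lines[j].sign == other &&
			    !strcmp(lines[j].text, lines[i].text)) {
				lines[i].moved = 1;
				break;
			}
		}
	}
}
```

The removed line at the top of a diff can only be flagged once its
added counterpart further down has been seen, so nothing may be
written out before the whole diff is buffered.
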
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index f1cb0b7799..4cd1b3c520 100644
--- a/diff.c
+++ b/diff.c
@@ -4868,7 +4868,9 @@ void diff_flush(struct diff_options *options)
 			emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
-				fputs(options->stat_sep, options->file);
+				emit_line(options, NULL, NULL, 0, 0,
+					  options->stat_sep,
+					  strlen(options->stat_sep));
 			}
 		}
 
-- 
2.13.0.18.g7d86cc8ba0


* [PATCHv2 17/20] diff.c: convert diff_summary to use emit_line_*
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (15 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 16/20] diff.c: convert diff_flush " Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 18/20] diff.c: emit_line includes whitespace highlighting Stefan Beller
                     ` (3 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of the two corresponding lines (+ and -) we
cannot yet know that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers diff_summary.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
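Note (not part of the commit message): show_mode_change below now
formats the whole summary line, including the optional name, in one
call. A minimal standalone sketch of the resulting line format,
assuming snprintf into a caller-provided buffer and no path quoting
(the patch itself uses quote_c_style for that); format_mode_change
is a hypothetical name.

```c
#include <stdio.h>

/*
 * Format " mode change OLD => NEW[ NAME]\n" into out. Modes are
 * printed as six octal digits, as in the patch. Pass NULL as name
 * to omit it. Returns what snprintf returns.
 */
int format_mode_change(char *out, size_t cap, unsigned old_mode,
		       unsigned new_mode, const char *name)
{
	return snprintf(out, cap, " mode change %06o => %06o%s%s\n",
			old_mode, new_mode,
			name ? " " : "", name ? name : "");
}
```

For example, going from 0100644 to 0100755 for a file named run.sh
yields " mode change 100644 => 100755 run.sh".
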
 diff.c | 64 ++++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/diff.c b/diff.c
index 4cd1b3c520..964b5cb5a7 100644
--- a/diff.c
+++ b/diff.c
@@ -4500,67 +4500,71 @@ static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
 	}
 }
 
-static void show_file_mode_name(FILE *file, const char *newdelete, struct diff_filespec *fs)
+static void show_file_mode_name(struct diff_options *opt, const char *newdelete, struct diff_filespec *fs)
 {
+	struct strbuf sb = STRBUF_INIT;
 	if (fs->mode)
-		fprintf(file, " %s mode %06o ", newdelete, fs->mode);
+		strbuf_addf(&sb, " %s mode %06o ", newdelete, fs->mode);
 	else
-		fprintf(file, " %s ", newdelete);
-	write_name_quoted(fs->path, file, '\n');
-}
+		strbuf_addf(&sb, " %s ", newdelete);
 
+	quote_c_style(fs->path, &sb, NULL, 0);
+	strbuf_addch(&sb, '\n');
+	emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+	strbuf_release(&sb);
+}
 
-static void show_mode_change(FILE *file, struct diff_filepair *p, int show_name,
-		const char *line_prefix)
+static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
+		int show_name)
 {
 	if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) {
-		fprintf(file, "%s mode change %06o => %06o%c", line_prefix, p->one->mode,
-			p->two->mode, show_name ? ' ' : '\n');
+		struct strbuf sb = STRBUF_INIT;
 		if (show_name) {
-			write_name_quoted(p->two->path, file, '\n');
+			strbuf_addch(&sb, ' ');
+			quote_c_style(p->two->path, &sb, NULL, 0);
 		}
+		emit_line_fmt(opt, NULL, NULL, 1,
+			      " mode change %06o => %06o%s\n",
+			      p->one->mode, p->two->mode,
+			      show_name ? sb.buf : "");
+		strbuf_release(&sb);
 	}
 }
 
-static void show_rename_copy(FILE *file, const char *renamecopy, struct diff_filepair *p,
-			const char *line_prefix)
+static void show_rename_copy(struct diff_options *opt, const char *renamecopy,
+		struct diff_filepair *p)
 {
 	char *names = pprint_rename(p->one->path, p->two->path);
-
-	fprintf(file, " %s %s (%d%%)\n", renamecopy, names, similarity_index(p));
+	emit_line_fmt(opt, NULL, NULL, 1, " %s %s (%d%%)\n",
+		      renamecopy, names, similarity_index(p));
 	free(names);
-	show_mode_change(file, p, 0, line_prefix);
+	show_mode_change(opt, p, 0);
 }
 
 static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 {
-	FILE *file = opt->file;
-	const char *line_prefix = diff_line_prefix(opt);
-
 	switch(p->status) {
 	case DIFF_STATUS_DELETED:
-		fputs(line_prefix, file);
-		show_file_mode_name(file, "delete", p->one);
+		show_file_mode_name(opt, "delete", p->one);
 		break;
 	case DIFF_STATUS_ADDED:
-		fputs(line_prefix, file);
-		show_file_mode_name(file, "create", p->two);
+		show_file_mode_name(opt, "create", p->two);
 		break;
 	case DIFF_STATUS_COPIED:
-		fputs(line_prefix, file);
-		show_rename_copy(file, "copy", p, line_prefix);
+		show_rename_copy(opt, "copy", p);
 		break;
 	case DIFF_STATUS_RENAMED:
-		fputs(line_prefix, file);
-		show_rename_copy(file, "rename", p, line_prefix);
+		show_rename_copy(opt, "rename", p);
 		break;
 	default:
 		if (p->score) {
-			fprintf(file, "%s rewrite ", line_prefix);
-			write_name_quoted(p->two->path, file, ' ');
-			fprintf(file, "(%d%%)\n", similarity_index(p));
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addstr(&sb, " rewrite ");
+			quote_c_style(p->two->path, &sb, NULL, 0);
+			strbuf_addf(&sb, " (%d%%)\n", similarity_index(p));
+			emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
 		}
-		show_mode_change(file, p, !p->score, line_prefix);
+		show_mode_change(opt, p, !p->score);
 		break;
 	}
 }
-- 
2.13.0.18.g7d86cc8ba0


* [PATCHv2 18/20] diff.c: emit_line includes whitespace highlighting
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (16 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 17/20] diff.c: convert diff_summary " Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 19/20] diff: buffer all output if asked to Stefan Beller
                     ` (2 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

Currently all whitespace highlighting happens outside the emit_line
function. Teach emit_line to do the highlighting itself, triggered
by a new parameter.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c      | 104 +++++++++++++++++++++++++++++++++++-------------------------
 diff.h      |   4 ++-
 submodule.c |  14 ++++----
 3 files changed, 71 insertions(+), 51 deletions(-)

diff --git a/diff.c b/diff.c
index 964b5cb5a7..34482a6a09 100644
--- a/diff.c
+++ b/diff.c
@@ -516,15 +516,33 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-void emit_line(struct diff_options *o, const char *set, const char *reset,
-	       int add_line_prefix, int sign, const char *line, int len)
+void emit_line(struct diff_options *o,
+	       const char *set, const char *reset,
+	       int add_line_prefix, int markup_ws,
+	       int sign, const char *line, int len)
 {
+	const char *ws;
 	int has_trailing_newline, has_trailing_carriage_return;
 	FILE *file = o->file;
 
 	if (add_line_prefix)
 		fputs(diff_line_prefix(o), file);
 
+	if (markup_ws) {
+		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+
+		if (set)
+			fputs(set, file);
+		if (sign)
+			fputc(sign, file);
+		if (reset)
+			fputs(reset, file);
+		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+		ws_check_emit(line, len, o->ws_rule,
+			      file, set, reset, ws);
+		return;
+	}
+
 	has_trailing_newline = (len > 0 && line[len-1] == '\n');
 	if (has_trailing_newline)
 		len--;
@@ -558,7 +576,7 @@ void emit_line_fmt(struct diff_options *o,
 	strbuf_vaddf(&sb, fmt, ap);
 	va_end(ap);
 
-	emit_line(o, set, reset, add_line_prefix, 0, sb.buf, sb.len);
+	emit_line(o, set, reset, add_line_prefix, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
@@ -590,16 +608,15 @@ static void emit_line_checked(const char *reset,
 	}
 
 	if (!ws)
-		emit_line(ecbdata->opt, set, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, set, reset, 1, 0, sign, line, len);
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
-		emit_line(ecbdata->opt, ws, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, ws, reset, 1, 1, sign, line, len);
 	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line(ecbdata->opt, set, reset, 1, sign, "", 0);
-		ws_check_emit(line, len, ecbdata->ws_rule,
-			      ecbdata->opt->file, set, reset, ws);
+		emit_line(ecbdata->opt, set, reset, 1, 1, sign, line, len);
 	}
+
 }
 
 static void emit_add_line(const char *reset,
@@ -646,7 +663,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	if (len < 10 ||
 	    memcmp(line, atat, 2) ||
 	    !(ep = memmem(line + 2, len - 2, atat, 2))) {
-		emit_line(ecbdata->opt, context, reset, 1, 0, line, len);
+		emit_line(ecbdata->opt, context, reset, 1, 0, 0, line, len);
 		return;
 	}
 	ep += 2; /* skip over @@ */
@@ -682,7 +699,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	strbuf_add(&msgbuf, line + len, org_len - len);
 	strbuf_complete_line(&msgbuf);
 
-	emit_line(ecbdata->opt, "", "", 1, 0, msgbuf.buf, msgbuf.len);
+	emit_line(ecbdata->opt, "", "", 1, 0, 0, msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
 
@@ -755,7 +772,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		static const char *nneof = "\\ No newline at end of file\n";
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
-		emit_line(ecb->opt, context, reset, 1, 0,
+		emit_line(ecb->opt, context, reset, 1, 0, 0,
 			    nneof, strlen(nneof));
 		strbuf_release(&sb);
 	}
@@ -831,7 +848,7 @@ static void emit_rewrite_diff(const char *name_a,
 	strbuf_addstr(&out, " +");
 	add_line_count(&out, lc_b);
 	strbuf_addstr(&out, " @@\n");
-	emit_line(o, fraginfo, reset, 1, 0, out.buf, out.len);
+	emit_line(o, fraginfo, reset, 1, 0, 0, out.buf, out.len);
 	strbuf_release(&out);
 
 	if (lc_a && !o->irreversible_delete)
@@ -904,7 +921,7 @@ static int fn_out_diff_words_write_helper(struct diff_options *o,
 	while (count) {
 		char *p = memchr(buf, '\n', count);
 		if (print)
-			emit_line(o, NULL, NULL, 1, 0, "", 0);
+			emit_line(o, NULL, NULL, 1, 0, 0, "", 0);
 
 		if (p != buf) {
 			const char *reset = st_el->color && *st_el->color ?
@@ -913,14 +930,14 @@ static int fn_out_diff_words_write_helper(struct diff_options *o,
 			strbuf_add(&sb, buf, p ? p - buf : count);
 			strbuf_addstr(&sb, st_el->suffix);
 			emit_line(o, st_el->color, reset,
-				  0, 0, sb.buf, sb.len);
+				  0, 0, 0, sb.buf, sb.len);
 			strbuf_reset(&sb);
 		}
 		if (!p)
 			goto out;
 
 		strbuf_addstr(&sb, newline);
-		emit_line(o, NULL, NULL, 0, 0, sb.buf, sb.len);
+		emit_line(o, NULL, NULL, 0, 0, 0, sb.buf, sb.len);
 		strbuf_reset(&sb);
 		count -= p + 1 - buf;
 		buf = p + 1;
@@ -1135,7 +1152,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	if (diff_words->current_plus != diff_words->plus.text.ptr +
 			diff_words->plus.text.size) {
 		if (color_words_output_graph_prefix(diff_words))
-			emit_line(diff_words->opt, NULL, NULL, 1, 0, "", 0);
+			emit_line(diff_words->opt, NULL, NULL, 1, 0, 0, "", 0);
 		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->ctx, style->newline,
 			diff_words->plus.text.ptr + diff_words->plus.text.size
@@ -1294,7 +1311,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	o->found_changes = 1;
 
 	if (ecbdata->header) {
-		emit_line(o, NULL, NULL, 0, 0,
+		emit_line(o, NULL, NULL, 0, 0, 0,
 			  ecbdata->header->buf, ecbdata->header->len);
 		strbuf_release(ecbdata->header);
 		ecbdata->header = NULL;
@@ -1347,8 +1364,8 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		}
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
-			emit_line(o, context, reset, 1, 0, line, len);
-			emit_line(o, NULL, NULL, 0, 0, "~\n", 2);
+			emit_line(o, context, reset, 1, 0, 0, line, len);
+			emit_line(o, NULL, NULL, 0, 0, 0, "~\n", 2);
 		} else {
 			/*
 			 * Skip the prefix character, if any.  With
@@ -1359,7 +1376,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			      line++;
 			      len--;
 			}
-			emit_line(o, context, reset, 1, 0, line, len);
+			emit_line(o, context, reset, 1, 0, 0, line, len);
 		}
 		return;
 	}
@@ -1382,7 +1399,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		/* incomplete line at the end */
 		ecbdata->lno_in_preimage++;
 		emit_line(o, diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
-			  reset, 1, 0, line, len);
+			  reset, 1, 0, 0, line, len);
 		break;
 	}
 }
@@ -1571,7 +1588,7 @@ void print_stat_summary_0(struct diff_options *options, int files,
 	if (!files) {
 		assert(insertions == 0 && deletions == 0);
 		strbuf_addstr(&sb, " 0 files changed");
-		emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
+		emit_line(options, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 		return;
 	}
 
@@ -1599,7 +1616,7 @@ void print_stat_summary_0(struct diff_options *options, int files,
 			    deletions);
 	}
 	strbuf_addch(&sb, '\n');
-	emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
+	emit_line(options, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
@@ -1783,7 +1800,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			strbuf_addf(&out, " %*s", number_width, "Bin");
 			if (!added && !deleted) {
 				strbuf_addch(&out, '\n');
-				emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+				emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 				strbuf_reset(&out);
 				continue;
 			}
@@ -1793,14 +1810,14 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			strbuf_addf(&out, "%s%"PRIuMAX"%s",
 				add_c, added, reset);
 			strbuf_addstr(&out, " bytes\n");
-			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 			strbuf_reset(&out);
 			continue;
 		}
 		else if (file->is_unmerged) {
 			show_name(&out, prefix, name, len);
 			strbuf_addstr(&out, " Unmerged\n");
-			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 			strbuf_reset(&out);
 			continue;
 		}
@@ -1831,7 +1848,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		show_graph(&out, '+', add, add_c, reset);
 		show_graph(&out, '-', del, del_c, reset);
 		strbuf_addch(&out, '\n');
-		emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+		emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 		strbuf_reset(&out);
 	}
 
@@ -1853,7 +1870,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		if (i < count)
 			continue;
 		if (!extra_shown)
-			emit_line(options, NULL, NULL, 1, 0,
+			emit_line(options, NULL, NULL, 1, 0, 0,
 				  " ...\n", strlen(" ...\n"));
 		extra_shown = 1;
 	}
@@ -2207,7 +2224,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
 			line_prefix, data->filename, data->lineno, err);
 		free(err);
-		emit_line(data->o, set, reset, 1, 0, line, 1);
+		emit_line(data->o, set, reset, 1, 0, 0, line, 1);
 		ws_check_emit(line + 1, len - 1, data->ws_rule,
 			      data->o->file, set, reset, ws);
 	} else if (line[0] == ' ') {
@@ -2303,9 +2320,9 @@ static void emit_binary_diff_body(struct diff_options *o,
 		line[len++] = '\n';
 		line[len] = '\0';
 
-		emit_line(o, NULL, NULL, 1, 0, line, len);
+		emit_line(o, NULL, NULL, 1, 0, 0, line, len);
 	}
-	emit_line(o, NULL, NULL, 1, 0, "\n", 1);
+	emit_line(o, NULL, NULL, 1, 0, 0, "\n", 1);
 	free(data);
 }
 
@@ -2314,7 +2331,7 @@ static void emit_binary_diff(struct diff_options *o,
 {
 	const char *s = "GIT binary patch\n";
 	const int len = strlen(s);
-	emit_line(o, NULL, NULL, 1, 0, s, len);
+	emit_line(o, NULL, NULL, 1, 0, 0, s, len);
 	emit_binary_diff_body(o, one, two);
 	emit_binary_diff_body(o, two, one);
 }
@@ -2457,7 +2474,7 @@ static void builtin_diff(const char *name_a,
 		if (complete_rewrite &&
 		    (textconv_one || !diff_filespec_is_binary(one)) &&
 		    (textconv_two || !diff_filespec_is_binary(two))) {
-			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
+			emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 			emit_rewrite_diff(name_a, name_b, one, two,
 						textconv_one, textconv_two, o);
@@ -2467,7 +2484,7 @@ static void builtin_diff(const char *name_a,
 	}
 
 	if (o->irreversible_delete && lbl[1][0] == '/') {
-		emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
+		emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 		strbuf_reset(&header);
 		goto free_ab_and_return;
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2478,11 +2495,11 @@ static void builtin_diff(const char *name_a,
 		    !DIFF_OPT_TST(o, BINARY)) {
 			if (!oidcmp(&one->oid, &two->oid)) {
 				if (must_show_header)
-					emit_line(o, NULL, NULL, 0, 0,
+					emit_line(o, NULL, NULL, 0, 0, 0,
 						  header.buf, header.len);
 				goto free_ab_and_return;
 			}
-			emit_line(o, NULL, NULL, 0, 0,
+			emit_line(o, NULL, NULL, 0, 0, 0,
 				  header.buf, header.len);
 			emit_line_fmt(o, NULL, NULL, 1,
 				      "Binary files %s and %s differ\n",
@@ -2495,11 +2512,11 @@ static void builtin_diff(const char *name_a,
 		if (mf1.size == mf2.size &&
 		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
 			if (must_show_header)
-				emit_line(o, NULL, NULL, 0, 0,
+				emit_line(o, NULL, NULL, 0, 0, 0,
 					  header.buf, header.len);
 			goto free_ab_and_return;
 		}
-		emit_line(o, NULL, NULL, 0, 0,
+		emit_line(o, NULL, NULL, 0, 0, 0,
 			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
@@ -2519,7 +2536,7 @@ static void builtin_diff(const char *name_a,
 		const struct userdiff_funcname *pe;
 
 		if (must_show_header) {
-			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
+			emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 		}
 
@@ -2536,6 +2553,7 @@ static void builtin_diff(const char *name_a,
 		ecbdata.label_path = lbl;
 		ecbdata.color_diff = want_color(o->use_color);
 		ecbdata.ws_rule = whitespace_rule(name_b);
+		o->ws_rule = ecbdata.ws_rule;
 		if (ecbdata.ws_rule & WS_BLANK_AT_EOF)
 			check_blank_at_eof(&mf1, &mf2, &ecbdata);
 		ecbdata.opt = o;
@@ -4510,7 +4528,7 @@ static void show_file_mode_name(struct diff_options *opt, const char *newdelete,
 
 	quote_c_style(fs->path, &sb, NULL, 0);
 	strbuf_addch(&sb, '\n');
-	emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+	emit_line(opt, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
@@ -4562,7 +4580,7 @@ static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 			strbuf_addstr(&sb, " rewrite ");
 			quote_c_style(p->two->path, &sb, NULL, 0);
 			strbuf_addf(&sb, " (%d%%)\n", similarity_index(p));
-			emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+			emit_line(opt, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 		}
 		show_mode_change(opt, p, !p->score);
 		break;
@@ -4869,10 +4887,10 @@ void diff_flush(struct diff_options *options)
 			term[0] = options->line_termination;
 			term[1] = '\0';
 
-			emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
+			emit_line(options, NULL, NULL, 1, 0, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
-				emit_line(options, NULL, NULL, 0, 0,
+				emit_line(options, NULL, NULL, 0, 0, 0,
 					  options->stat_sep,
 					  strlen(options->stat_sep));
 			}
diff --git a/diff.h b/diff.h
index b75b0d7283..267acf1980 100644
--- a/diff.h
+++ b/diff.h
@@ -186,12 +186,14 @@ struct diff_options {
 	void *output_prefix_data;
 
 	int diff_path_counter;
+
+	unsigned ws_rule;
 };
 
 void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
 		   int add_line_prefix, const char *fmt, ...);
 void emit_line(struct diff_options *o, const char *set, const char *reset,
-	       int add_line_prefix, int sign, const char *line, int len);
+	       int add_line_prefix, int markup_ws, int sign, const char *line, int len);
 
 enum color_diff {
 	DIFF_RESET = 0,
diff --git a/submodule.c b/submodule.c
index 5996ebca44..868f913971 100644
--- a/submodule.c
+++ b/submodule.c
@@ -378,9 +378,9 @@ static void print_submodule_summary(struct rev_info *rev,
 		format_commit_message(commit, format, &sb, &ctx);
 		strbuf_addch(&sb, '\n');
 		if (commit->object.flags & SYMMETRIC_LEFT)
-			emit_line(o, del, reset, 1, 0, sb.buf, sb.len);
+			emit_line(o, del, reset, 1, 0, 0, sb.buf, sb.len);
 		else if (add)
-			emit_line(o, add, reset, 1, 0, sb.buf, sb.len);
+			emit_line(o, add, reset, 1, 0, 0, sb.buf, sb.len);
 	}
 	strbuf_release(&sb);
 }
@@ -473,7 +473,7 @@ static void show_submodule_header(struct diff_options *o, const char *path,
 		strbuf_addf(&sb, " %s\n", message);
 	else
 		strbuf_addf(&sb, "%s:\n", fast_backward ? " (rewind)" : "");
-	emit_line(o, meta, reset, 1, 0,  sb.buf, sb.len);
+	emit_line(o, meta, reset, 1, 0, 0, sb.buf, sb.len);
 
 	strbuf_release(&sb);
 }
@@ -501,7 +501,7 @@ void show_submodule_summary(struct diff_options *o, const char *path,
 	/* Treat revision walker failure the same as missing commits */
 	if (prepare_submodule_summary(&rev, path, left, right, merge_bases)) {
 		const char *error = "(revision walker failed)\n";
-		emit_line(o, NULL, NULL, 1, 0, error, strlen(error));
+		emit_line(o, NULL, NULL, 1, 0, 0, error, strlen(error));
 		goto out;
 	}
 
@@ -570,15 +570,15 @@ void show_submodule_inline_diff(struct diff_options *o, const char *path,
 	prepare_submodule_repo_env(&cp.env_array);
 	if (start_command(&cp)) {
 		const char *error = "(diff failed)\n";
-		emit_line(o, NULL, NULL, 1, 0, error, strlen(error));
+		emit_line(o, NULL, NULL, 1, 0, 0, error, strlen(error));
 	}
 
 	while (strbuf_getwholeline_fd(&sb, cp.out, '\n') != EOF)
-		emit_line(o, NULL, NULL, 1, 0, sb.buf, sb.len);
+		emit_line(o, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 
 	if (finish_command(&cp)) {
 		const char *error = "(diff failed)\n";
-		emit_line(o, NULL, NULL, 1, 0, error, strlen(error));
+		emit_line(o, NULL, NULL, 1, 0, 0, error, strlen(error));
 	}
 
 done:
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 19/20] diff: buffer all output if asked to
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (17 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 18/20] diff.c: emit_line includes whitespace highlighting Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-17  2:58   ` [PATCHv2 20/20] diff.c: color moved lines differently Stefan Beller
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

Introduce a new option 'use_buffer' in the struct diff_options which
controls whether all output is buffered up until all output is available.

We'll have a new struct 'buffered_patch_line' in diff.h which will be
used to buffer each line.  The buffered_patch_line will duplicate the
memory of the line to buffer as that is easiest to reason about for now.
In a future patch we may want to decrease the memory usage by not
duplicating all output for buffering.  Instead we could store offsets
into the file, or in case of hunk descriptions such as the similarity
score, store just the relevant number and reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with a smaller memory footprint.

emit_line factors out the emission part into emit_buffered_patch_line.
Depending on diff_options->use_buffer the emission is performed either
directly when calling emit_line or after the whole process is done,
i.e. buffering adds the possibility of a second pass over the whole
output before doing the actual output.
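
The buffer-or-emit decision can be sketched in isolation as follows.
This is a simplified model under assumptions, not the patch itself: the
names sketch_out, sketch_line, sketch_emit_line and sketch_flush are
hypothetical, colors and the line prefix are left out, and plain
realloc() stands in for ALLOC_GROW.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One buffered line; owns a private copy of the text. */
struct sketch_line {
	char *line;
	size_t len;
};

/* Hypothetical, stripped-down counterpart of struct diff_options. */
struct sketch_out {
	FILE *file;
	int use_buffer;
	struct sketch_line *buf;
	size_t nr, alloc;
};

/* Emit directly, or append a copy of the line to the buffer. */
static void sketch_emit_line(struct sketch_out *o, const char *line, size_t len)
{
	struct sketch_line *e;

	assert(line != NULL);
	if (!o->use_buffer) {
		fwrite(line, 1, len, o->file);
		return;
	}
	if (o->nr == o->alloc) {
		size_t alloc = o->alloc ? 2 * o->alloc : 16;
		struct sketch_line *grown = realloc(o->buf, alloc * sizeof(*grown));
		if (!grown)
			abort();
		o->buf = grown;
		o->alloc = alloc;
	}
	e = &o->buf[o->nr++];
	e->line = malloc(len + 1);
	if (!e->line)
		abort();
	memcpy(e->line, line, len);
	e->line[len] = '\0';
	e->len = len;
}

/* Write out all buffered lines in order, free them; returns their count. */
static size_t sketch_flush(struct sketch_out *o)
{
	size_t i, emitted = o->nr;

	/* a second pass over o->buf could inspect or recolor lines here */
	for (i = 0; i < o->nr; i++) {
		fwrite(o->buf[i].line, 1, o->buf[i].len, o->file);
		free(o->buf[i].line);
	}
	free(o->buf);
	o->buf = NULL;
	o->nr = o->alloc = 0;
	return emitted;
}
```

Because every caller goes through the one emit function, flipping
use_buffer changes when the bytes reach the file but not their order,
which is the property the test suite is used to check below.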

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.
When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the
main buffer.

Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing: if only parts of the output got
buffered and we forgot to buffer other parts, the output would be
reordered and the diff tests would fail.  The test suite passes, which
gives confidence that we converted all functions to use emit_line for
output.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++---------------
 diff.h |  39 +++++++++++++++++
 2 files changed, 159 insertions(+), 34 deletions(-)

diff --git a/diff.c b/diff.c
index 34482a6a09..1d8d1786fd 100644
--- a/diff.c
+++ b/diff.c
@@ -516,53 +516,85 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-void emit_line(struct diff_options *o,
-	       const char *set, const char *reset,
-	       int add_line_prefix, int markup_ws,
-	       int sign, const char *line, int len)
+static void emit_buffered_patch_line(struct diff_options *o,
+				     struct buffered_patch_line *e)
 {
 	const char *ws;
 	int has_trailing_newline, has_trailing_carriage_return;
+	int len = e->len;
 	FILE *file = o->file;
 
-	if (add_line_prefix)
+	if (e->add_line_prefix)
 		fputs(diff_line_prefix(o), file);
 
-	if (markup_ws) {
+	switch (e->state) {
+	case BPL_EMIT_LINE_WS:
 		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+		if (e->set)
+			fputs(e->set, file);
+		if (e->sign)
+			fputc(e->sign, file);
+		if (e->reset)
+			fputs(e->reset, file);
+		ws_check_emit(e->line, e->len, o->ws_rule,
+			      file, e->set, e->reset, ws);
+		return;
+	case BPL_EMIT_LINE_ASIS:
+		has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
+		if (has_trailing_newline)
+			len--;
+		has_trailing_carriage_return = (len > 0 && e->line[len-1] == '\r');
+		if (has_trailing_carriage_return)
+			len--;
 
-		if (set)
-			fputs(set, file);
-		if (sign)
-			fputc(sign, file);
-		if (reset)
-			fputs(reset, file);
-		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
-		ws_check_emit(line, len, o->ws_rule,
-			      file, set, reset, ws);
+		if (len || e->sign) {
+			if (e->set)
+				fputs(e->set, file);
+			if (e->sign)
+				fputc(e->sign, file);
+			fwrite(e->line, len, 1, file);
+			if (e->reset)
+				fputs(e->reset, file);
+		}
+		if (has_trailing_carriage_return)
+			fputc('\r', file);
+		if (has_trailing_newline)
+			fputc('\n', file);
+		return;
+	case BPL_HANDOVER:
+		o->ws_rule = whitespace_rule(e->line); /*read from file, stored in line?*/
 		return;
+	default:
+		die("BUG: malformatted buffered patch line: '%d'", e->state);
 	}
+}
 
-	has_trailing_newline = (len > 0 && line[len-1] == '\n');
-	if (has_trailing_newline)
-		len--;
-	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
-	if (has_trailing_carriage_return)
-		len--;
+static void append_buffered_patch_line(struct diff_options *o,
+				       struct buffered_patch_line *e)
+{
+	struct buffered_patch_line *f;
+	ALLOC_GROW(o->line_buffer,
+		   o->line_buffer_nr + 1,
+		   o->line_buffer_alloc);
+	f = &o->line_buffer[o->line_buffer_nr++];
 
-	if (len || sign) {
-		if (set)
-			fputs(set, file);
-		if (sign)
-			fputc(sign, file);
-		fwrite(line, len, 1, file);
-		if (reset)
-			fputs(reset, file);
-	}
-	if (has_trailing_carriage_return)
-		fputc('\r', file);
-	if (has_trailing_newline)
-		fputc('\n', file);
+	memcpy(f, e, sizeof(struct buffered_patch_line));
+	f->line = e->line ? xmemdupz(e->line, e->len) : NULL;
+}
+
+void emit_line(struct diff_options *o,
+	       const char *set, const char *reset,
+	       int add_line_prefix, int markup_ws,
+	       int sign, const char *line, int len)
+{
+	struct buffered_patch_line e = {set, reset, line,
+		len, sign, add_line_prefix,
+		markup_ws ? BPL_EMIT_LINE_WS : BPL_EMIT_LINE_ASIS};
+
+	if (o->use_buffer)
+		append_buffered_patch_line(o, &e);
+	else
+		emit_buffered_patch_line(o, &e);
 }
 
 void emit_line_fmt(struct diff_options *o,
@@ -1167,6 +1199,18 @@ static void diff_words_flush(struct emit_callback *ecbdata)
 	if (ecbdata->diff_words->minus.text.size ||
 	    ecbdata->diff_words->plus.text.size)
 		diff_words_show(ecbdata->diff_words);
+
+	if (ecbdata->diff_words->opt->line_buffer_nr) {
+		int i;
+		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
+			append_buffered_patch_line(ecbdata->opt,
+				&ecbdata->diff_words->opt->line_buffer[i]);
+
+		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
+			free((void*) ecbdata->diff_words->opt->line_buffer[i].line);
+
+		ecbdata->diff_words->opt->line_buffer_nr = 0;
+	}
 }
 
 static void diff_filespec_load_driver(struct diff_filespec *one)
@@ -1202,6 +1246,11 @@ static void init_diff_words_data(struct emit_callback *ecbdata,
 		xcalloc(1, sizeof(struct diff_words_data));
 	ecbdata->diff_words->type = o->word_diff;
 	ecbdata->diff_words->opt = o;
+
+	o->line_buffer = NULL;
+	o->line_buffer_nr = 0;
+	o->line_buffer_alloc = 0;
+
 	if (!o->word_regex)
 		o->word_regex = userdiff_word_regex(one);
 	if (!o->word_regex)
@@ -1236,6 +1285,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
 {
 	if (ecbdata->diff_words) {
 		diff_words_flush(ecbdata);
+		free (ecbdata->diff_words->opt->line_buffer);
 		free (ecbdata->diff_words->opt);
 		free (ecbdata->diff_words->minus.text.ptr);
 		free (ecbdata->diff_words->minus.orig);
@@ -2574,6 +2624,13 @@ static void builtin_diff(const char *name_a,
 			xecfg.ctxlen = strtoul(v, NULL, 10);
 		if (o->word_diff)
 			init_diff_words_data(&ecbdata, o, one, two);
+		if (o->use_buffer) {
+			struct buffered_patch_line e = BUFFERED_PATCH_LINE_INIT;
+			e.state = BPL_HANDOVER;
+			e.line = name_b;
+			e.len = strlen(name_b);
+			append_buffered_patch_line(o, &e);
+		}
 		if (xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
 				  &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
@@ -3453,6 +3510,10 @@ void diff_setup(struct diff_options *options)
 		options->a_prefix = "a/";
 		options->b_prefix = "b/";
 	}
+
+	options->line_buffer = NULL;
+	options->line_buffer_nr = 0;
+	options->line_buffer_alloc = 0;
 }
 
 void diff_setup_done(struct diff_options *options)
@@ -4791,11 +4852,36 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 {
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
+	/*
+	 * For testing purposes we want to make sure the diff machinery
+	 * works completely with the buffer. If there is anything emitted
+	 * outside the emit_buffered_patch_line, then the order is screwed
+	 * up and the tests will fail.
+	 *
+	 * TODO (later in this series):
+	 * We'll unset this flag in a later patch.
+	 */
+	o->use_buffer = 1;
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		if (check_pair_status(p))
 			diff_flush_patch(p, o);
 	}
+
+	if (o->use_buffer) {
+		for (i = 0; i < o->line_buffer_nr; i++)
+			emit_buffered_patch_line(o, &o->line_buffer[i]);
+
+		for (i = 0; i < o->line_buffer_nr; i++)
+			free((void*)o->line_buffer[i].line);
+
+		free(o->line_buffer);
+
+		o->line_buffer = NULL;
+		o->line_buffer_nr = 0;
+		o->line_buffer_alloc = 0;
+	}
 }
 
 void diff_flush(struct diff_options *options)
diff --git a/diff.h b/diff.h
index 267acf1980..64bc9bcd8c 100644
--- a/diff.h
+++ b/diff.h
@@ -115,6 +115,41 @@ enum diff_submodule_format {
 	DIFF_SUBMODULE_INLINE_DIFF
 };
 
+/*
+ * This struct is used when we need to buffer the output of the diff output.
+ *
+ * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer
+ * into the pre/post image file. This pointer could be a union with the
+ * line pointer. By storing an offset into the file instead of the literal line,
+ * we can decrease the memory footprint for the buffered output. At first we
+ * may want to only have indirection for the content lines, but we could
+ * also have an enum (based on sign?) that stores prefabricated lines, e.g.
+ * the similarity score line or hunk/file headers.
+ */
+struct buffered_patch_line {
+	const char *set;
+	const char *reset;
+	const char *line;
+	int len;
+	int sign;
+	int add_line_prefix;
+	enum {
+		/*
+		 * Emits [lineprefix][set][sign][reset] and then calls
+		 * ws_check_emit which will output "line", marked up
+		 * according to ws_rule.
+		 */
+		BPL_EMIT_LINE_WS,
+
+		/* Emits [lineprefix][set][sign] line [reset] */
+		BPL_EMIT_LINE_ASIS,
+
+		/* Reloads the ws_rule; line contains the file name */
+		BPL_HANDOVER
+	} state;
+};
+#define BUFFERED_PATCH_LINE_INIT {NULL, NULL, NULL, 0, 0, 0}
+
 struct diff_options {
 	const char *orderfile;
 	const char *pickaxe;
@@ -188,6 +223,10 @@ struct diff_options {
 	int diff_path_counter;
 
 	unsigned ws_rule;
+	int use_buffer;
+
+	struct buffered_patch_line *line_buffer;
+	int line_buffer_nr, line_buffer_alloc;
 };
 
 void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv2 20/20] diff.c: color moved lines differently
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (18 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 19/20] diff: buffer all output if asked to Stefan Beller
@ 2017-05-17  2:58   ` Stefan Beller
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17  2:58 UTC (permalink / raw)
  To: git; +Cc: jrnieder, gitster, jonathantanmy, bmwill, peff, mhagger,
	Stefan Beller

When there is a lot of code moved around, such as in 11979b9 (2005-11-18,
"http.c: reorder to avoid compilation failure."), the review process is
quite hard, even though it is not mentally challenging.  It is a rather
tedious process that gets boring quickly.  However you still need to read
through all of the code to make sure the moved lines are there as expected.

While it is trivial to color up a patch like the following

    $ git diff
    diff --git a/file2.c b/file2.c
    index 9163a0f..8e66dc0 100644
    --- a/file2.c
    +++ b/file2.c
    @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
            return memcpy(xmallocz(len), data, len);
     }

    -int secure_foo(struct user *u)
    -{
    -       if (!u->is_allowed_foo)
    -               return;
    -       foo(u);
    -}
    -
     char *xstrndup(const char *str, size_t len)
     {
            char *p = memchr(str, '\0', len);
    diff --git a/test.c b/test.c
    index a95e6fe..81eb0eb 100644
    --- a/test.c
    +++ b/test.c
    @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
            return total;
     }

    +int secure_foo(struct user *u)
    +{
    +       if (!u->is_allowed_foo)
    +               return;
    +       foo(u);
    +}
    +
     int xdup(int fd)
     {
            int ret = dup(fd);

as all lines added or removed in this patch should simply be colored in
the new color that indicates moved lines, the intention of this patch is
to aid reviewers in spotting permutations in the moved code. So consider
the following malicious move:

    diff --git a/file2.c b/file2.c
    index 9163a0f..8e66dc0 100644
    --- a/file2.c
    +++ b/file2.c
    @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
            return memcpy(xmallocz(len), data, len);
     }

    -int secure_foo(struct user *u)
    -{
    -       if (!u->is_allowed_foo)
    -               return;
    -       foo(u);
    -}
    -
     char *xstrndup(const char *str, size_t len)
     {
            char *p = memchr(str, '\0', len);
    diff --git a/test.c b/test.c
    index a95e6fe..a679c40 100644
    --- a/test.c
    +++ b/test.c
    @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
            return total;
     }

    +int secure_foo(struct user *u)
    +{
    +       foo(u);
    +       if (!u->is_allowed_foo)
    +               return;
    +}
    +
     int xdup(int fd)
     {
            int ret = dup(fd);

If the moved code is larger, it is easier to hide some permutation in the
code, which is why we would not want to color all lines as "moved" in this
case. So it is not enough to color lines differently that are added and
removed in the same diff; we need to tweak the algorithm a bit more.

As the reviewer's attention should be brought to the places where the
difference is introduced to the moved code, we cannot just have one new
color for all of the moved code.

First I implemented an alternative design, which would show a moved hunk
in one color, and its boundaries in another color. This idea was error
prone as it inspected each line and its neighboring lines to determine
if the line was (a) moved and (b) deep inside a hunk by having matching
neighboring lines. This is unreliable as we can construct hunks which
have equal neighbors that just exceed the number of lines inspected.
(Think of 'AXYZBXYZCXYZD..' with each letter as a line, that is
permuted to 'AXYZCXYZBXYZD..').

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then switches color to the alternative color
for the next hunk. By doing this any permutation is recognized and
displayed. That implies that there is no dedicated boundary or
inside-hunk color, but instead we'll have just two colors alternating
for hunks.
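
To illustrate the alternating colors, here is a brute-force sketch that
works on plain arrays of lines. It is only an approximation of the idea
and not the algorithm implemented in this patch: the function name
sketch_color_moved and the color ids 0, 1 and 2 are made up for the
example, and the search is a simple quadratic scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * For each added line, set color[i] to 0 (not moved), 1 (moved) or
 * 2 (moved, alternative color).  At each position take the longest run
 * of added lines that also appears as a consecutive run of removed
 * lines, paint it in one color, then switch colors for the next run.
 */
static void sketch_color_moved(const char **removed, size_t nr_removed,
			       const char **added, size_t nr_added,
			       int *color)
{
	size_t i = 0;
	int next = 1;

	assert(color != NULL);
	while (i < nr_added) {
		size_t best = 0, j, k;

		/* longest block starting at added[i] found among removed */
		for (j = 0; j < nr_removed; j++) {
			k = 0;
			while (i + k < nr_added && j + k < nr_removed &&
			       !strcmp(added[i + k], removed[j + k]))
				k++;
			if (k > best)
				best = k;
		}

		if (!best) {
			color[i++] = 0;	/* a genuinely new line */
			continue;
		}
		for (k = 0; k < best; k++)
			color[i + k] = next;
		i += best;
		next = (next == 1) ? 2 : 1;
	}
}
```

Fed with the six lines of secure_foo() from the malicious move above
(ignoring the blank line), the added side gets the colors 1 1 2 1 1 2,
so the color changes exactly where the permutation starts and ends. For
the unmodified move all six lines get color 1.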

It would be a bit more UX friendly if the two corresponding hunks
(of added and deleted lines) for one move would get the same color id.
(Both get "regular moved" or "alternative moved"). This problem is
deferred to a later patch for now.

Algorithm-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt   |  14 ++-
 diff.c                     | 266 +++++++++++++++++++++++++++++++++++++++++++--
 diff.h                     |  11 +-
 t/t4015-diff-whitespace.sh | 229 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 507 insertions(+), 13 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..902d017c3b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,14 +1051,24 @@ This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=<when>]` option.
 
+color.moved::
+	A boolean value, whether a diff should color moved lines
+	differently. The moved lines are searched for in the diff only.
+	Duplicated lines from somewhere in the project that are not
+	part of the diff are not colored as moved.
+	Defaults to false.
+
 color.diff.<slot>::
 	Use customized color for diff colorization.  `<slot>` specifies
 	which part of the patch to use the specified color, and is one
 	of `context` (context text - `plain` is a historical synonym),
 	`meta` (metainformation), `frag`
 	(hunk header), 'func' (function in hunk header), `old` (removed lines),
-	`new` (added lines), `commit` (commit headers), or `whitespace`
-	(highlighting whitespace errors).
+	`new` (added lines), `commit` (commit headers), `whitespace`
+	(highlighting whitespace errors), `oldMoved` (removed lines that
+	reappear), `newMoved` (added lines that were removed elsewhere),
+	`oldMovedAlternative` and `newMovedAlternative` (as a fallback to
+	cover adjacent blocks of moved code)
 
 color.decorate.<slot>::
 	Use customized color for 'git log --decorate' output.  `<slot>` is one
diff --git a/diff.c b/diff.c
index 1d8d1786fd..15cf322b50 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static int diff_indent_heuristic; /* experimental */
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
+static int diff_color_moved_default;
 static int diff_context_default = 3;
 static int diff_interhunk_context_default;
 static const char *diff_word_regex_cfg;
@@ -55,6 +56,10 @@ static char diff_colors[][COLOR_MAXLEN] = {
 	GIT_COLOR_YELLOW,	/* COMMIT */
 	GIT_COLOR_BG_RED,	/* WHITESPACE */
 	GIT_COLOR_NORMAL,	/* FUNCINFO */
+	GIT_COLOR_BOLD_RED,	/* OLD_MOVED_A */
+	GIT_COLOR_BG_RED,	/* OLD_MOVED_B */
+	GIT_COLOR_BOLD_GREEN,	/* NEW_MOVED_A */
+	GIT_COLOR_BG_GREEN,	/* NEW_MOVED_B */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -80,6 +85,14 @@ static int parse_diff_color_slot(const char *var)
 		return DIFF_WHITESPACE;
 	if (!strcasecmp(var, "func"))
 		return DIFF_FUNCINFO;
+	if (!strcasecmp(var, "oldmoved"))
+		return DIFF_FILE_OLD_MOVED;
+	if (!strcasecmp(var, "oldmovedalternative"))
+		return DIFF_FILE_OLD_MOVED_ALT;
+	if (!strcasecmp(var, "newmoved"))
+		return DIFF_FILE_NEW_MOVED;
+	if (!strcasecmp(var, "newmovedalternative"))
+		return DIFF_FILE_NEW_MOVED_ALT;
 	return -1;
 }
 
@@ -234,6 +247,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_use_color_default = git_config_colorbool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "color.moved")) {
+		diff_color_moved_default = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.context")) {
 		diff_context_default = git_config_int(var, value);
 		if (diff_context_default < 0)
@@ -354,6 +371,88 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+struct moved_entry {
+	struct hashmap_entry ent;
+	const struct buffered_patch_line *line;
+	struct moved_entry *next_line;
+};
+
+static void get_ws_cleaned_string(const struct buffered_patch_line *l,
+				  struct strbuf *out)
+{
+	int i;
+	for (i = 0; i < l->len; i++) {
+		if (isspace(l->line[i]))
+			continue;
+		strbuf_addch(out, l->line[i]);
+	}
+}
+
+static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
+					 const struct buffered_patch_line *b,
+					 const void *keydata)
+{
+	int ret;
+	struct strbuf sba = STRBUF_INIT;
+	struct strbuf sbb = STRBUF_INIT;
+
+	get_ws_cleaned_string(a, &sba);
+	get_ws_cleaned_string(b, &sbb);
+	ret = sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);
+
+	strbuf_release(&sba);
+	strbuf_release(&sbb);
+	return ret;
+}
+
+static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
+				   const struct buffered_patch_line *b,
+				   const void *keydata)
+{
+	return a->len != b->len || strncmp(a->line, b->line, a->len);
+}
+
+static int moved_entry_cmp(const struct moved_entry *a,
+			   const struct moved_entry *b,
+			   const void *keydata)
+{
+	return buffered_patch_line_cmp(a->line, b->line, keydata);
+}
+
+static int moved_entry_cmp_no_ws(const struct moved_entry *a,
+				 const struct moved_entry *b,
+				 const void *keydata)
+{
+	return buffered_patch_line_cmp_no_ws(a->line, b->line, keydata);
+}
+
+static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_ws)
+{
+	static struct strbuf sb = STRBUF_INIT;
+
+	if (ignore_ws) {
+		strbuf_reset(&sb);
+		get_ws_cleaned_string(line, &sb);
+		return memhash(sb.buf, sb.len);
+	} else {
+		return memhash(line->line, line->len);
+	}
+}
+
+static struct moved_entry *prepare_entry(struct diff_options *o,
+					 int line_no)
+{
+	struct moved_entry *ret = xmalloc(sizeof(*ret));
+	unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+	struct buffered_patch_line *l = &o->line_buffer[line_no];
+
+	ret->ent.hash = get_line_hash(l, ignore_ws);
+	ret->line = l;
+	ret->next_line = NULL;
+
+	return ret;
+}
+
 static char *quote_two(const char *one, const char *two)
 {
 	int need_one = quote_c_style(one, NULL, NULL, 1);
@@ -516,6 +615,135 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
+static void add_lines_to_move_detection(struct diff_options *o)
+{
+	struct moved_entry *prev_line;
+
+	int n;
+	for (n = 0; n < o->line_buffer_nr; n++) {
+		int sign = 0;
+		struct hashmap *hm;
+		struct moved_entry *key;
+
+		switch (o->line_buffer[n].sign) {
+		case '+':
+			sign = '+';
+			hm = o->added_lines;
+			break;
+		case '-':
+			sign = '-';
+			hm = o->deleted_lines;
+			break;
+		case ' ':
+		default:
+			prev_line = NULL;
+			continue;
+		}
+
+		key = prepare_entry(o, n);
+		if (prev_line &&
+		    prev_line->line->sign == sign)
+			prev_line->next_line = key;
+
+		hashmap_add(hm, key);
+		prev_line = key;
+	}
+}
+
+static void mark_color_as_moved(struct diff_options *o)
+{
+	struct moved_entry **pmb = NULL; /* potentially moved blocks */
+	int pmb_nr = 0, pmb_alloc = 0;
+	int alt_flag = 0;
+	int n;
+
+	for (n = 0; n < o->line_buffer_nr; n++) {
+		struct hashmap *hm = NULL;
+		struct moved_entry *key;
+		struct moved_entry *match = NULL;
+		struct buffered_patch_line *l = &o->line_buffer[n];
+		int i, lp, rp;
+
+		switch (l->sign) {
+		case '+':
+			hm = o->deleted_lines;
+			break;
+		case '-':
+			hm = o->added_lines;
+			break;
+		default:
+			alt_flag = 0; /* reset to standard, no-alt move color */
+			pmb_nr = 0; /* no running sets */
+			continue;
+		}
+
+		/* Check for any match to color it as a move. */
+		key = prepare_entry(o, n);
+		match = hashmap_get(hm, key, o);
+		free(key);
+		if (!match)
+			continue;
+
+		/* Check any potential block runs, advance each or nullify */
+		for (i = 0; i < pmb_nr; i++) {
+			struct moved_entry *p = pmb[i];
+			struct moved_entry *pnext = (p && p->next_line) ?
+					p->next_line : NULL;
+			if (pnext &&
+			    !buffered_patch_line_cmp(pnext->line, l, o)) {
+				pmb[i] = p->next_line;
+			} else {
+				pmb[i] = NULL;
+			}
+		}
+
+		/* Shrink the set to the remaining runs */
+		for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
+			while (lp < pmb_nr && pmb[lp])
+				lp ++;
+			/* lp points at the first NULL now */
+
+			while (rp > -1 && !pmb[rp])
+				rp--;
+			/* rp points at the last non-NULL */
+
+			if (lp < pmb_nr && rp > -1 && lp < rp) {
+				pmb[lp] = pmb[rp];
+				pmb[rp] = NULL;
+				rp--;
+				lp++;
+			}
+		}
+
+		if (rp > -1) {
+			/* Remember the number of running sets */
+			pmb_nr = rp + 1;
+		} else {
+			/* Toggle color */
+			alt_flag = (alt_flag + 1) % 2;
+
+			/* Build up a new set */
+			pmb_nr = 0;
+			for (; match; match = hashmap_get_next(hm, match)) {
+				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+				pmb[pmb_nr++] = match;
+			}
+		}
+
+		switch (l->sign) {
+		case '+':
+			l->set = diff_get_color_opt(o, DIFF_FILE_NEW_MOVED + alt_flag);
+			break;
+		case '-':
+			l->set = diff_get_color_opt(o, DIFF_FILE_OLD_MOVED + alt_flag);
+			break;
+		default:
+			die("BUG: we should have continued earlier?");
+		}
+	}
+	free(pmb);
+}
+
 static void emit_buffered_patch_line(struct diff_options *o,
 				     struct buffered_patch_line *e)
 {
@@ -3514,6 +3742,8 @@ void diff_setup(struct diff_options *options)
 	options->line_buffer = NULL;
 	options->line_buffer_nr = 0;
 	options->line_buffer_alloc = 0;
+
+	options->color_moved = diff_color_moved_default;
 }
 
 void diff_setup_done(struct diff_options *options)
@@ -3623,6 +3853,9 @@ void diff_setup_done(struct diff_options *options)
 
 	if (DIFF_OPT_TST(options, FOLLOW_RENAMES) && options->pathspec.nr != 1)
 		die(_("--follow requires exactly one pathspec"));
+
+	if (!options->use_color || external_diff())
+		options->color_moved = 0;
 }
 
 static int opt_arg(const char *arg, int arg_short, const char *arg_long, int *val)
@@ -4047,6 +4280,10 @@ int diff_opt_parse(struct diff_options *options,
 	}
 	else if (!strcmp(arg, "--no-color"))
 		options->use_color = 0;
+	else if (!strcmp(arg, "--color-moved"))
+		options->color_moved = 1;
+	else if (!strcmp(arg, "--no-color-moved"))
+		options->color_moved = 0;
 	else if (!strcmp(arg, "--color-words")) {
 		options->use_color = 1;
 		options->word_diff = DIFF_WORDS_COLOR;
@@ -4852,16 +5089,19 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 {
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
-	/*
-	 * For testing purposes we want to make sure the diff machinery
-	 * works completely with the buffer. If there is anything emitted
-	 * outside the emit_buffered_patch_line, then the order is screwed
-	 * up and the tests will fail.
-	 *
-	 * TODO (later in this series):
-	 * We'll unset this flag in a later patch.
-	 */
-	o->use_buffer = 1;
+
+	if (o->color_moved) {
+		unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+		o->use_buffer = 1;
+		o->deleted_lines = xmallocz(sizeof(*o->deleted_lines));
+		o->added_lines = xmallocz(sizeof(*o->added_lines));
+		hashmap_init(o->deleted_lines, ignore_ws ?
+			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+			(hashmap_cmp_fn)moved_entry_cmp, 0);
+		hashmap_init(o->added_lines, ignore_ws ?
+			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+			(hashmap_cmp_fn)moved_entry_cmp, 0);
+	}
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -4870,6 +5110,11 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 	}
 
 	if (o->use_buffer) {
+		if (o->color_moved) {
+			add_lines_to_move_detection(o);
+			mark_color_as_moved(o);
+		}
+
 		for (i = 0; i < o->line_buffer_nr; i++)
 			emit_buffered_patch_line(o, &o->line_buffer[i]);
 
@@ -4958,6 +5203,7 @@ void diff_flush(struct diff_options *options)
 		if (!options->file)
 			die_errno("Could not open /dev/null");
 		options->close_file = 1;
+		options->color_moved = 0;
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
 			if (check_pair_status(p))
diff --git a/diff.h b/diff.h
index 64bc9bcd8c..b8b2a33ccc 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "hashmap.h"
 
 struct rev_info;
 struct diff_options;
@@ -227,6 +228,10 @@ struct diff_options {
 
 	struct buffered_patch_line *line_buffer;
 	int line_buffer_nr, line_buffer_alloc;
+
+	int color_moved;
+	struct hashmap *deleted_lines;
+	struct hashmap *added_lines;
 };
 
 void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
@@ -243,7 +248,11 @@ enum color_diff {
 	DIFF_FILE_NEW = 5,
 	DIFF_COMMIT = 6,
 	DIFF_WHITESPACE = 7,
-	DIFF_FUNCINFO = 8
+	DIFF_FUNCINFO = 8,
+	DIFF_FILE_OLD_MOVED = 9,
+	DIFF_FILE_OLD_MOVED_ALT = 10,
+	DIFF_FILE_NEW_MOVED = 11,
+	DIFF_FILE_NEW_MOVED_ALT = 12
 };
 const char *diff_get_color(int diff_use_color, enum color_diff ix);
 #define diff_get_color_opt(o, ix) \
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 289806d0c7..232d9ad55e 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -972,4 +972,233 @@ test_expect_success 'option overrides diff.wsErrorHighlight' '
 
 '
 
+test_expect_success 'detect moved code, complete file' '
+	git reset --hard &&
+	cat <<-\EOF >test.c &&
+	#include<stdio.h>
+	main()
+	{
+	printf("Hello World");
+	}
+	EOF
+	git add test.c &&
+	git commit -m "add main function" &&
+	git mv test.c main.c &&
+	git diff HEAD --color-moved --no-renames | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>new file mode 100644<RESET>
+	<BOLD>index 0000000..a986c57<RESET>
+	<BOLD>--- /dev/null<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -0,0 +1,5 @@<RESET>
+	<BGREEN>+<RESET><BGREEN>#include<stdio.h><RESET>
+	<BGREEN>+<RESET><BGREEN>main()<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BGREEN>+<RESET><BGREEN>printf("Hello World");<RESET>
+	<BGREEN>+<RESET><BGREEN>}<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>deleted file mode 100644<RESET>
+	<BOLD>index a986c57..0000000<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ /dev/null<RESET>
+	<CYAN>@@ -1,5 +0,0 @@<RESET>
+	<BRED>-#include<stdio.h><RESET>
+	<BRED>-main()<RESET>
+	<BRED>-{<RESET>
+	<BRED>-printf("Hello World");<RESET>
+	<BRED>-}<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'detect moved code, inside file' '
+	git reset --hard &&
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			if (!u->is_allowed_foo)
+				return;
+			foo(u);
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git add main.c test.c &&
+	git commit -m "add main and test file" &&
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			if (!u->is_allowed_foo)
+				return;
+			foo(u);
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>index 27a619c..7cf9336 100644<RESET>
+	<BOLD>--- a/main.c<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
+	 printf("World\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BRED>-int secure_foo(struct user *u)<RESET>
+	<BRED>-{<RESET>
+	<BRED>-if (!u->is_allowed_foo)<RESET>
+	<BRED>-return;<RESET>
+	<BRED>-foo(u);<RESET>
+	<BRED>-}<RESET>
+	<BRED>-<RESET>
+	 int main()<RESET>
+	 {<RESET>
+	 foo();<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>index 1dc1d85..e34eb69 100644<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ b/test.c<RESET>
+	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
+	 printf("Hello World, but different\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
+	<BGREEN>+<RESET><BGREEN>return;<RESET>
+	<BGREEN>+<RESET><BGREEN>foo(u);<RESET>
+	<BGREEN>+<RESET><BGREEN>}<RESET>
+	<BGREEN>+<RESET>
+	 int another_function()<RESET>
+	 {<RESET>
+	 bar();<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'detect permutations inside moved code, ' '
+	# reusing the move example from last test:
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			foo(u);
+			if (!u->is_allowed_foo)
+				return;
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>index 27a619c..7cf9336 100644<RESET>
+	<BOLD>--- a/main.c<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
+	 printf("World\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BRED>-int secure_foo(struct user *u)<RESET>
+	<BRED>-{<RESET>
+	<BOLD;RED>-if (!u->is_allowed_foo)<RESET>
+	<BOLD;RED>-return;<RESET>
+	<BRED>-foo(u);<RESET>
+	<BOLD;RED>-}<RESET>
+	<BOLD;RED>-<RESET>
+	 int main()<RESET>
+	 {<RESET>
+	 foo();<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>index 1dc1d85..2bedec9 100644<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ b/test.c<RESET>
+	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
+	 printf("Hello World, but different\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BOLD;GREEN>+<RESET><BOLD;GREEN>foo(u);<RESET>
+	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
+	<BGREEN>+<RESET><BGREEN>return;<RESET>
+	<BOLD;GREEN>+<RESET><BOLD;GREEN>}<RESET>
+	<BOLD;GREEN>+<RESET>
+	 int another_function()<RESET>
+	 {<RESET>
+	 bar();<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.13.0.18.g7d86cc8ba0



* Re: [PATCHv2 11/20] diff.c: convert emit_rewrite_lines to use emit_line_*
  2017-05-17  2:58   ` [PATCHv2 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
@ 2017-05-17  5:03     ` Junio C Hamano
  2017-05-17 21:16       ` Stefan Beller
  2017-05-18  3:35     ` Junio C Hamano
  1 sibling, 1 reply; 128+ messages in thread
From: Junio C Hamano @ 2017-05-17  5:03 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jrnieder, jonathantanmy, bmwill, peff, mhagger

Stefan Beller <sbeller@google.com> writes:

> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This covers emit_rewrite_lines.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 3dda9f3c8e..690794aeb8 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -722,15 +722,25 @@ static void add_line_count(struct strbuf *out, int count)
>  static void emit_rewrite_lines(struct emit_callback *ecb,
>  			       int prefix, const char *data, int size)
>  {
> -	const char *endp = NULL;
> -	static const char *nneof = " No newline at end of file\n";
>  	const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
> +	struct strbuf sb = STRBUF_INIT;
>  
>  	while (0 < size) {
>  		int len;
>  
> -		endp = memchr(data, '\n', size);
> -		len = endp ? (endp - data + 1) : size;
> +		const char *endp = memchr(data, '\n', size);
> +		if (endp)
> +			len = endp - data + 1;
> +		else {
> +			while (0 < size) {
> +				strbuf_addch(&sb, *data);
> +				size -= len;
> +				data += len;
> +			}
> +			strbuf_addch(&sb, '\n');
> +			data = sb.buf;
> +			len = sb.len;
> +		}
>  		if (prefix != '+') {
>  			ecb->lno_in_preimage++;
>  			emit_del_line(reset, ecb, data, len);
> @@ -741,12 +751,13 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
>  		size -= len;
>  		data += len;
>  	}
> -	if (!endp) {
> +	if (sb.len) {
> +		static const char *nneof = "\\ No newline at end of file\n";
>  		const char *context = diff_get_color(ecb->color_diff,
>  						     DIFF_CONTEXT);
> -		putc('\n', ecb->opt->file);
> -		emit_line(ecb->opt, context, reset, 1, '\\',
> -			  nneof, strlen(nneof));
> +		emit_line(ecb->opt, context, reset, 1, 0,
> +			    nneof, strlen(nneof));
> +		strbuf_release(&sb);

The reason why we can lose the LF immediately after the incomplete
line we found in the above loop is because the updated emit_line_0()
adds LF if its input is an incomplete line?  Even before this series
started, emit_line_0() was already prepared to see a complete or
incomplete line and emit the "reset" color after the optional EOL
bytes at the end, so emit_line() and emit_{add,del}_line() calls
throughout the code can pass the body of the line with or without
the EOL and right things will happen.  Sounds about right.




>  	}
>  }


* Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-17  2:58   ` [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
@ 2017-05-17  5:19     ` Junio C Hamano
  2017-05-17 21:05       ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Junio C Hamano @ 2017-05-17  5:19 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jrnieder, jonathantanmy, bmwill, peff, mhagger

Stefan Beller <sbeller@google.com> writes:

> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This prepares the code for submodules to go through the
> emit_line function.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c      | 20 +++++++---------
>  diff.h      |  5 ++++
>  submodule.c | 78 ++++++++++++++++++++++++++++++-------------------------------
>  submodule.h |  9 +++----
>  4 files changed, 56 insertions(+), 56 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 690794aeb8..7c8d6a5d12 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
>  	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
>  }
>  
> -static void emit_line(struct diff_options *o, const char *set, const char *reset,
> -		      int add_line_prefix, int sign, const char *line, int len)
> +void emit_line(struct diff_options *o, const char *set, const char *reset,
> +	       int add_line_prefix, int sign, const char *line, int len)
>  {
>  	int has_trailing_newline, has_trailing_carriage_return;
>  	FILE *file = o->file;
> @@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset
>  		fputc('\n', file);
>  }
>  
> -static void emit_line_fmt(struct diff_options *o,
> -			  const char *set, const char *reset,
> -			  int add_line_prefix,
> -			  const char *fmt, ...)
> +void emit_line_fmt(struct diff_options *o,
> +		   const char *set, const char *reset,
> +		   int add_line_prefix,
> +		   const char *fmt, ...)

Interesting...

> -static void show_submodule_header(FILE *f, const char *path,
> -		const char *line_prefix,
> +static void show_submodule_header(struct diff_options *o, const char *path,
>  		struct object_id *one, struct object_id *two,
>  		unsigned dirty_submodule, const char *meta,
>  		const char *reset,

Is this ONLY called when the caller wants its output inserted to the
"diff" (or "log -p") output?  If so, I think it makes sense to pass
'o', but if the function is oblivious that it is driven to produce
part of a "diff", it feels wrong to pass 'o'.  The original was
taking a "FILE *" and line_prefix, so it is rather clear that the
answer to the question is "yes, this is very closely tied to diff
output".  Now you have access to 'o', so you do not need to pass
them separately.  Good.

Each line in its output, when incorporated in "diff" or "log -p"
output, must be prefixed with the line-prefix to accommodate users of
"log --graph", so I guess it cannot be helped.  Your calls to
emit_line_fmt() below seem to ask the line-prefix to be added,
which is good, too.

How does capturing these lines help moved line detection, by the
way?  These must never be matched with any other added or removed
line in the real patch output.

> @@ -426,11 +419,11 @@ static void show_submodule_header(FILE *f, const char *path,
>  	int fast_forward = 0, fast_backward = 0;
>  
>  	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
> -		fprintf(f, "%sSubmodule %s contains untracked content\n",
> -			line_prefix, path);
> +		emit_line_fmt(o, NULL, NULL, 1,
> +			      "Submodule %s contains untracked content\n", path);
>  	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
> -		fprintf(f, "%sSubmodule %s contains modified content\n",
> -			line_prefix, path);
> +		emit_line_fmt(o, NULL, NULL, 1,
> +			      "Submodule %s contains modified content\n", path);
>  
>  	if (is_null_oid(one))
>  		message = "(new submodule)";

emit_line() and emit_line_fmt() are both inappropriate names for a
global function.  These are very closely tied to diff generation, so
we probably want to see some form of "diff" in their names.  

The fact that it is clear because its first parameter is "struct
diff_options" is insufficient---"you cannot tell what context the
function is meant to be used by only looking at its name" is
certainly solved by its function signature, but the other issue with
an overly generic name is that other codepaths in different contexts
may want to use such a short and sweet name.

Thanks.


* Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-17  5:19     ` Junio C Hamano
@ 2017-05-17 21:05       ` Stefan Beller
  2017-05-18  3:25         ` Junio C Hamano
  0 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-17 21:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jonathan Nieder, Jonathan Tan,
	Brandon Williams, Jeff King, Michael Haggerty

On Tue, May 16, 2017 at 10:19 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> In a later patch, I want to propose an option to detect&color
>> moved lines in a diff, which cannot be done in a one-pass over
>> the diff. Instead we need to go over the whole diff twice,
>> because we cannot detect the first line of the two corresponding
>> lines (+ and -) that got moved.
>>
>> So to prepare the diff machinery for two pass algorithms
>> (i.e. buffer it all up and then operate on the result),
>> move all emissions to places, such that the only emitting
>> function is emit_line_0.
>>
>> This prepares the code for submodules to go through the
>> emit_line function.
>>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>  diff.c      | 20 +++++++---------
>>  diff.h      |  5 ++++
>>  submodule.c | 78 ++++++++++++++++++++++++++++++-------------------------------
>>  submodule.h |  9 +++----
>>  4 files changed, 56 insertions(+), 56 deletions(-)
>>
>> diff --git a/diff.c b/diff.c
>> index 690794aeb8..7c8d6a5d12 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
>>       ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
>>  }
>>
>> -static void emit_line(struct diff_options *o, const char *set, const char *reset,
>> -                   int add_line_prefix, int sign, const char *line, int len)
>> +void emit_line(struct diff_options *o, const char *set, const char *reset,
>> +            int add_line_prefix, int sign, const char *line, int len)
>>  {
>>       int has_trailing_newline, has_trailing_carriage_return;
>>       FILE *file = o->file;
>> @@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset
>>               fputc('\n', file);
>>  }
>>
>> -static void emit_line_fmt(struct diff_options *o,
>> -                       const char *set, const char *reset,
>> -                       int add_line_prefix,
>> -                       const char *fmt, ...)
>> +void emit_line_fmt(struct diff_options *o,
>> +                const char *set, const char *reset,
>> +                int add_line_prefix,
>> +                const char *fmt, ...)
>
> Interesting...
>
>> -static void show_submodule_header(FILE *f, const char *path,
>> -             const char *line_prefix,
>> +static void show_submodule_header(struct diff_options *o, const char *path,
>>               struct object_id *one, struct object_id *two,
>>               unsigned dirty_submodule, const char *meta,
>>               const char *reset,
>
> Is this ONLY called when the caller wants its output inserted to the
> "diff" (or "log -p") output?

Yes.

>  If so, I think it makes sense to pass
> 'o', but if the function is oblivious that it is driven to produce
> part of a "diff", it feels wrong to pass 'o'.  The original was
> taking a "FILE *" and line_prefix, so it is rather clear that the
> answer to the question is "yes, this is very closely tied to diff
> output".  Now you have access to 'o', so you do not need to pass
> them separately.  Good.

ok.

> Each line in its output, when incorporated in "diff" or "log -p"
> output, must be prefixed with the line-prefix to accommodate users of
> "log --graph", so I guess it cannot be helped.  Your calls to
> emit_line_fmt() below seem to ask the line-prefix to be added,
> which is good, too.
>
> How does capturing these lines help moved line detection, by the
> way?  These must never be matched with any other added or removed
> line in the real patch output.

Why?

Actually I think it has some value if it can match across
(submodule-)repository boundaries, e.g. think of Ævar's RFC to put
SHA1DC into a submodule. When reviewing that commit later on, a user
may be interested in "what is the difference between what we carried so
far in this repo compared to what we point at now in the submodule".
Most of the code should be the same, but anchored at a different
path/repo, so move detection would be super helpful.

I do understand that you may not want to see a move crossing
a repo boundary, but I would prefer to defer that to a later patch, once
we have a better understanding of the use cases of this new feature.

>>       if (is_null_oid(one))
>>               message = "(new submodule)";
>
> emit_line() and emit_line_fmt() are both inappropriate names for a
> global function.  These are very closely tied to diff generation, so
> we probably want to see some form of "diff" in their names.

Oh, uh. You're right.

I would think inside of diff.c we'd still want to keep the short name,
so maybe I'd expose a wrapper to the outside world.

Maybe

    diff_emit_strbuf(diff_options *, strbuf *)

would be fine for all use cases from outside.


> The fact that it is clear because its first parameter is "struct
> diff_options" is insufficient---"you cannot tell what context the
> function is meant to be used by only looking at its name" is
> certainly solved by its function signature, but the other issue with
> an overly generic name is that other codepaths in different contexts
> may want to use such a short and sweet name.
>
> Thanks.

I am bad at naming, so if you have a better idea for names,
feel free to mention them.

Thanks,
Stefan


* Re: [PATCHv2 11/20] diff.c: convert emit_rewrite_lines to use emit_line_*
  2017-05-17  5:03     ` Junio C Hamano
@ 2017-05-17 21:16       ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-17 21:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jonathan Nieder, Jonathan Tan,
	Brandon Williams, Jeff King, Michael Haggerty

On Tue, May 16, 2017 at 10:03 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:

> The reason why we can lose the LF immediately after the incomplete
> line we found in the above loop is because the updated emit_line_0()
> adds LF if its input is an incomplete line?

No. Because there are no incomplete lines any more, as we complete
the line above via strbuf_addch(&sb, '\n');

In a buffered world, we need to think about what to buffer, and I think
we would rather buffer all lines with the same line ending; otherwise
the comparison function is harder. So in that case we'd rather buffer

    line1="last line, but we added EOL\n"
    line2="\\ No newline at end of file\n"

because line1 could occur somewhere else as is with the \n.

>  Even before this series
> started, emit_line_0() was already prepared to see a complete or
> incomplete line and emit the "reset" color after the optional EOL
> bytes at the end, so emit_line() and emit_{add,del}_line() calls
> throughout the code can pass the body of the line with or without
> the EOL and right things will happen.  Sounds about right.

Yes, emit_line_0 will strip off the trailing \n if there is one and
output it after a potential color reset.
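
A minimal sketch of that behavior, for illustration only: format_line()
is a hypothetical stand-in for emit_line_0() that formats into a caller
supplied buffer rather than writing to o->file, and it leaves out the
line prefix.  The point it shows is that the EOL bytes are split off the
payload and appended after the reset sequence.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for emit_line_0(): the trailing "\n" (and an
 * optional "\r" before it) is stripped from the payload and appended
 * after the color reset, so the reset never ends up on the next line.
 * Returns what snprintf() returns.
 */
int format_line(char *out, size_t cap, const char *set, const char *reset,
		int sign, const char *line, int len)
{
	int has_nl, has_cr, show;
	char signbuf[2] = { (char)sign, '\0' };

	has_nl = (len > 0 && line[len - 1] == '\n');
	if (has_nl)
		len--;
	has_cr = (len > 0 && line[len - 1] == '\r');
	if (has_cr)
		len--;

	/* Colors are only emitted when there is something to paint. */
	show = (len || sign);
	return snprintf(out, cap, "%s%s%.*s%s%s%s",
			(show && set) ? set : "",
			sign ? signbuf : "",
			len, line,
			(show && reset) ? reset : "",
			has_cr ? "\r" : "",
			has_nl ? "\n" : "");
}
```

Called with or without a trailing newline in "line", the colored part is
the same; only the bytes after the reset differ.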

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-17 21:05       ` Stefan Beller
@ 2017-05-18  3:25         ` Junio C Hamano
  2017-05-18 17:12           ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Junio C Hamano @ 2017-05-18  3:25 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jonathan Nieder, Jonathan Tan,
	Brandon Williams, Jeff King, Michael Haggerty

Stefan Beller <sbeller@google.com> writes:

>>> +static void show_submodule_header(struct diff_options *o, const char *path,
>>>               struct object_id *one, struct object_id *two,
>>>               unsigned dirty_submodule, const char *meta,
>>>               const char *reset,
>> ...
>> How does capturing these lines help moved line detection, by the
>> way?  These must never be matched with any other added or removed
>> line in the real patch output.
>
> Why?

What are buffered are not patch text, but informational text like
"Submodule X contains untracked content", etc.  When a text file is
modified elsewhere and lost a line that happened to say the same
contents, we do not want to consider that such a line was moved to
where a submodule had an untracked file.

I have a suspicion that the two-pass buffering is done at too high a
level in this series.  Doesn't the code (I haven't reached the end
of the series) update emit_line() to buffer the patch text and these
non-patch text with all the coloring and resetting sequences?
Because the "ah, this old line removed corresponds to that new line
that appears elsewhere?" logic does not want to see these color/reset
sequences, the buffering code needs to become quite specific to how
the current diff code is colored (e.g. a line must be painted in a
single color and have reset at the end) and makes future changes to
color things differently almost impossible (e.g. imagine how you
would add a "feature" that paints certain words on added lines
differently?).

Ahh, yes, I see the NEEDSWORK comment in patch 19/20.

Yes, I agree that this code really should be working in terms of
offsets into pre/post images when finding matching changes, which
probably should happen without letting fn_out_consume produce fully
colored textual diff output in the first pass.  For the purpose of
"moved lines detection", the logic to match a stretch of preimage
lines with postimage lines does not want to bother with "--- a/$path"
headers, and it does not want to care if a line that begins with '+'
needs to be added by calling emit_add_line() that knows how to check
ws errors or the payload needs to be painted in green.  After the
first pass determines which added lines are not true additions but
merely moved, the second pass would know a lot better how that '+'
line needs to be painted (e.g. it may not be painted in green).
Letting fn_out_consume() call emit_add_line() only to compute
information (e.g. "'+'? ok, green") that the first pass does not
want to see and the second pass will compute better is probably not
a good longer term direction to go in.

Having said that, we need to start somewhere, and I think it is a
reasonable first-cut attempt to work on top of the textual output
like this series does (IOW, while I do agree with the NEEDSWORK and
the way this series currently does things must be revamped in the
longer term, I do not think we should wait until that happens to
start playing with this topic).

Thanks.
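
To illustrate the two-pass shape described above, here is a rough
sketch.  Everything in it is an assumption made for the example: struct
buffered_line, mark_moved_lines() and color_for() are hypothetical and
are not taken from the series, lines are matched one by one rather than
in blocks, and a quadratic scan stands in for a hashmap lookup.  The
first pass sees only sign and payload; colors are chosen in the second
pass, and text with sign 0 (headers, submodule messages) never
participates in matching.

```c
#include <string.h>

/* Hypothetical record for one buffered diff line; not git's real struct. */
struct buffered_line {
	char sign;           /* '+', '-', ' ' or 0 for non-patch text */
	const char *payload; /* line content without sign or color codes */
	int moved;           /* filled in by the first pass */
};

/*
 * First pass: an added line whose payload also appears as a removed
 * line is marked as moved, and so is its removed counterpart.
 */
void mark_moved_lines(struct buffered_line *lines, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (lines[i].sign != '+')
			continue;
		for (int j = 0; j < nr; j++) {
			if (lines[j].sign == '-' &&
			    !strcmp(lines[i].payload, lines[j].payload)) {
				lines[i].moved = 1;
				lines[j].moved = 1;
			}
		}
	}
}

/* Second pass: pick the color only now that "moved" is known. */
const char *color_for(const struct buffered_line *l)
{
	if (l->sign == '+')
		return l->moved ? "moved-new" : "new";
	if (l->sign == '-')
		return l->moved ? "moved-old" : "old";
	return "plain";
}
```

Because the buffered records carry no color or reset bytes, changing how
a line is painted later would not affect the matching step.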




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCHv2 11/20] diff.c: convert emit_rewrite_lines to use emit_line_*
  2017-05-17  2:58   ` [PATCHv2 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
  2017-05-17  5:03     ` Junio C Hamano
@ 2017-05-18  3:35     ` Junio C Hamano
  1 sibling, 0 replies; 128+ messages in thread
From: Junio C Hamano @ 2017-05-18  3:35 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jrnieder, jonathantanmy, bmwill, peff, mhagger

Stefan Beller <sbeller@google.com> writes:

> In a later patch, I want to propose an option to detect&color
> moved lines in a diff, which cannot be done in a one-pass over
> the diff. Instead we need to go over the whole diff twice,
> because we cannot detect the first line of the two corresponding
> lines (+ and -) that got moved.
>
> So to prepare the diff machinery for two pass algorithms
> (i.e. buffer it all up and then operate on the result),
> move all emissions to places, such that the only emitting
> function is emit_line_0.
>
> This covers emit_rewrite_lines.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 3dda9f3c8e..690794aeb8 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -722,15 +722,25 @@ static void add_line_count(struct strbuf *out, int count)
>  static void emit_rewrite_lines(struct emit_callback *ecb,
>  			       int prefix, const char *data, int size)
>  {
> -	const char *endp = NULL;
> -	static const char *nneof = " No newline at end of file\n";
>  	const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
> +	struct strbuf sb = STRBUF_INIT;
>  
>  	while (0 < size) {
>  		int len;
>  
> -		endp = memchr(data, '\n', size);
> -		len = endp ? (endp - data + 1) : size;
> +		const char *endp = memchr(data, '\n', size);
> +		if (endp)
> +			len = endp - data + 1;
> +		else {

This side does not initialize "len" at all, which means ...

> +			while (0 < size) {
> +				strbuf_addch(&sb, *data);
> +				size -= len;
> +				data += len;

... we do random computation here.

> +			}
> +			strbuf_addch(&sb, '\n');
> +			data = sb.buf;
> +			len = sb.len;
> +		}
>  		if (prefix != '+') {
>  			ecb->lno_in_preimage++;
>  			emit_del_line(reset, ecb, data, len);
> @@ -741,12 +751,13 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
>  		size -= len;
>  		data += len;
>  	}
> -	if (!endp) {
> +	if (sb.len) {
> +		static const char *nneof = "\\ No newline at end of file\n";
>  		const char *context = diff_get_color(ecb->color_diff,
>  						     DIFF_CONTEXT);
> -		putc('\n', ecb->opt->file);
> -		emit_line(ecb->opt, context, reset, 1, '\\',
> -			  nneof, strlen(nneof));
> +		emit_line(ecb->opt, context, reset, 1, 0,
> +			    nneof, strlen(nneof));
> +		strbuf_release(&sb);
>  	}
>  }

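One way the loop could look with "len" assigned on both branches is
sketched below.  It is not the code from the series: it uses malloc()
instead of a strbuf, and for_each_completed_line(), line_fn, struct
line_stats and count_line() are hypothetical names introduced for the
example.  A final line without a newline is completed in a copy and the
loop ends right after it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type; always receives a line ending in '\n'. */
typedef void (*line_fn)(const char *line, size_t len, void *cb_data);

/*
 * Feed "data" to "fn" one line at a time.  Returns 1 if the last line
 * had to be completed (the caller would then emit the
 * "\ No newline at end of file" marker), 0 if not, -1 on allocation
 * failure.
 */
int for_each_completed_line(const char *data, size_t size,
			    line_fn fn, void *cb_data)
{
	while (size > 0) {
		const char *endp = memchr(data, '\n', size);
		size_t len;

		if (endp) {
			len = (size_t)(endp - data) + 1;
			fn(data, len, cb_data);
			size -= len;
			data += len;
		} else {
			/* Incomplete last line: complete a copy, then stop. */
			char *copy = malloc(size + 1);

			if (!copy)
				return -1;
			memcpy(copy, data, size);
			copy[size] = '\n';
			len = size + 1;
			fn(copy, len, cb_data);
			free(copy);
			return 1;
		}
	}
	return 0;
}

/* Example callback: counts lines and remembers the last length seen. */
struct line_stats {
	int nr;
	size_t last_len;
};

void count_line(const char *line, size_t len, void *cb_data)
{
	struct line_stats *st = cb_data;

	(void)line;
	st->nr++;
	st->last_len = len;
}
```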
^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-18  3:25         ` Junio C Hamano
@ 2017-05-18 17:12           ` Stefan Beller
  2017-05-20  4:50             ` Junio C Hamano
  0 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 17:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jonathan Nieder, Jonathan Tan,
	Brandon Williams, Jeff King, Michael Haggerty

On Wed, May 17, 2017 at 8:25 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>>> +static void show_submodule_header(struct diff_options *o, const char *path,
>>>>               struct object_id *one, struct object_id *two,
>>>>               unsigned dirty_submodule, const char *meta,
>>>>               const char *reset,
>>> ...
>>> How does capturing these lines help moved line detection, by the
>>> way?  These must never be matched with any other added or removed
>>> line in the real patch output.
>>
>> Why?
>
> What are buffered are not patch text, but informational text like
> "Submodule X contains untracked content", etc.  When a text file is
> modified elsewhere and lost a line that happened to say the same
> contents, we do not want to consider that such a line was moved to
> where a submodule had an untracked file.
>
> I have a suspicion that the two-pass buffering is done at too high a
> level in this series.  Doesn't the code (I haven't reached the end
> of the series) update emit_line() to buffer the patch text and these
> non-patch text with all the coloring and resetting sequences?
> Because the "ah, this old line removed corresponds to that new line
> that appears elsewhere?" logic do not want to see these color/reset
> sequence, the buffering code needs to become quite specific to how
> the current diff code is colored (e.g. a line must be painted in a
> single color and have reset at the end) and makes future change to
> color things differently almost impossible (e.g. imagine how you
> would add a "feature" that paints certain words on added lines
> differently?).

That could be added in ws.c:ws_check_emit, as these certain words
are similar to coloring whitespace.

It depends on the precedence of such a future feature: is the move
detection or the word highlighting more important to keep its color?

> Ahh, yes, I see NEEDSWORK comment in patch 19/20.
>
> Yes, I agree that this code really should be working in terms of
> offsets into pre/post images when finding matching changes, which
> probably should happen without letting fn_out_consume produce fully
> colored textual diff output in the first pass.  For the purpose of
> "moved lines detection", the logic to match a stretch of preimage
> lines with postimage lines do not want to bother with "--- a/$path"
> headers, and it does not want to care if a line that begins with '+'
> needs to be added by calling emit_add_line() that knows how to check
> ws errors or the payload needs to be painted in green.  After the
> first pass determines which added lines are not true addition but
> merely moved, the second pass would know how that '+' line needs to
> be painted a lot better (e.g. it may not be painted in green).
> Letting fn_out_consume() call emit_add_line() only to compute
> information (e.g. "'+'? ok, green") that the first pass does not
> want to see and the second pass will compute better is probably not
> a good longer term direction to go in.
>
> Having said that, we need to start somewhere, and I think it is a
> reasonable first-cut attempt to work on top of the textual output
> like this series does (IOW, while I do agree with the NEEDSWORK and
> the way this series currently does things must be revamped in the
> longer term, I do not think we should wait until that happens to
> start playing with this topic).

OK. I had a similar reaction to the submodule diffs that we discussed
above and to word coloring, which Jonathan Tan brought up off list.

Both of them are broken in this implementation, but the NEEDSWORK
would hint at how to fix them.

For word coloring, we'd invent a new state BPL_EMIT_WITH_BRACES
that would store only the word; at output time we'd have to add the
sign, braces and colors. Then block movement detection is possible
(and this would also work with offset/len into the files longer term).

For the submodule diffs, I am really looking forward to seeing how
Brandon's current work on a repository struct comes along, such that we
can process submodules in the same process. For this diffing the
repo object would need to learn about the attribute system of
the submodules, such that we can obtain the whitespace coloring
rules, as well as the config (a submodule may be configured to use
different colors for diffs).

^ permalink raw reply	[flat|nested] 128+ messages in thread

* [PATCHv3 00/20] Diff machine: highlight moved lines.
  2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
                     ` (19 preceding siblings ...)
  2017-05-17  2:58   ` [PATCHv2 20/20] diff.c: color moved lines differently Stefan Beller
@ 2017-05-18 19:37   ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 01/20] diff: readability fix Stefan Beller
                       ` (20 more replies)
  20 siblings, 21 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

v3:
* see interdiff below.
* fixing one invalid computation (Thanks Junio!)
* I reasoned more about submodule and word diffing, see the commit message
  of the last patch:
  
    A note on the options '--submodule=diff' and '--color-words/--word-diff':
    In the conversion to use emit_line in the prior patches both submodules
    as well as word diff output carefully chose to call emit_line with sign=0.
    All output with sign=0 is ignored for move detection purposes in this
    patch, such that no weird looking output will be generated for these
    cases. This leads to another thought: We could pass on '--color-moved' to
    submodules such that they color up moved lines for themselves. If we did
    so, only line moves within a repository boundary would be marked up.

* better name for emit_line outside of diff.[ch]

v2:
* emit_line now takes an argument that indicates if we want it
  to emit the line prefix as well. This should allow for a more faithful
  refactoring in the beginning. (Thanks Jonathan!)
* fixed memleaks (Thanks Brandon!)
* "git -c color.moved=true log -p" works now! (Thanks Jeff)
* interdiff below, though it is large.
* less intrusive than v1 (Thanks Jonathan!)

v1:

For details on *why* see the commit message of the last commit.

The first five patches are slight refactorings to get into good
shape, the next patches are funneling all output through emit_line_*.

The second last patch introduces an option to buffer up all output
before printing, and then the last patch can color up moved lines
of code.

Any feedback welcome.

Thanks,
Stefan

Stefan Beller (20):
  diff: readability fix
  diff: move line ending check into emit_hunk_header
  diff.c: factor out diff_flush_patch_all_file_pairs
  diff.c: teach emit_line_0 to accept sign parameter
  diff.c: emit_line_0 can handle no color setting
  diff.c: emit_line_0 takes parameter whether to output line prefix
  diff.c: inline emit_line_0 into emit_line
  diff.c: convert fn_out_consume to use emit_line
  diff.c: convert builtin_diff to use emit_line_*
  diff.c: convert emit_rewrite_diff to use emit_line_*
  diff.c: convert emit_rewrite_lines to use emit_line_*
  submodule.c: convert show_submodule_summary to use emit_line_fmt
  diff.c: convert emit_binary_diff_body to use emit_line_*
  diff.c: convert show_stats to use emit_line_*
  diff.c: convert word diffing to use emit_line_*
  diff.c: convert diff_flush to use emit_line_*
  diff.c: convert diff_summary to use emit_line_*
  diff.c: emit_line includes whitespace highlighting
  diff: buffer all output if asked to
  diff.c: color moved lines differently

 Documentation/config.txt   |  14 +-
 diff.c                     | 849 +++++++++++++++++++++++++++++++++------------
 diff.h                     |  59 +++-
 submodule.c                |  87 ++---
 submodule.h                |   9 +-
 t/t4015-diff-whitespace.sh | 229 ++++++++++++
 6 files changed, 969 insertions(+), 278 deletions(-)

diff --git a/diff.c b/diff.c
index 15cf322b50..451cab2875 100644
--- a/diff.c
+++ b/diff.c
@@ -840,6 +840,12 @@ void emit_line_fmt(struct diff_options *o,
 	strbuf_release(&sb);
 }
 
+void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
+		    const char *line, int len)
+{
+	emit_line(o, set, reset, 1, 0, 0, line, len);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -1009,12 +1015,10 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		if (endp)
 			len = endp - data + 1;
 		else {
-			while (0 < size) {
-				strbuf_addch(&sb, *data);
-				size -= len;
-				data += len;
-			}
+			strbuf_add(&sb, data, size);
 			strbuf_addch(&sb, '\n');
+			size = 0; /* to exit the loop. */
+
 			data = sb.buf;
 			len = sb.len;
 		}
diff --git a/diff.h b/diff.h
index b8b2a33ccc..2d86e3a012 100644
--- a/diff.h
+++ b/diff.h
@@ -234,10 +234,8 @@ struct diff_options {
 	struct hashmap *added_lines;
 };
 
-void emit_line_fmt(struct diff_options *o, const char *set, const char *reset,
-		   int add_line_prefix, const char *fmt, ...);
-void emit_line(struct diff_options *o, const char *set, const char *reset,
-	       int add_line_prefix, int markup_ws, int sign, const char *line, int len);
+void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
+		    const char *line, int len);
 
 enum color_diff {
 	DIFF_RESET = 0,
diff --git a/submodule.c b/submodule.c
index 868f913971..19c63197fb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -378,9 +378,9 @@ static void print_submodule_summary(struct rev_info *rev,
 		format_commit_message(commit, format, &sb, &ctx);
 		strbuf_addch(&sb, '\n');
 		if (commit->object.flags & SYMMETRIC_LEFT)
-			emit_line(o, del, reset, 1, 0, 0, sb.buf, sb.len);
+			diff_emit_line(o, del, reset, sb.buf, sb.len);
 		else if (add)
-			emit_line(o, add, reset, 1, 0, 0, sb.buf, sb.len);
+			diff_emit_line(o, add, reset, sb.buf, sb.len);
 	}
 	strbuf_release(&sb);
 }
@@ -418,12 +418,17 @@ static void show_submodule_header(struct diff_options *o, const char *path,
 	struct strbuf sb = STRBUF_INIT;
 	int fast_forward = 0, fast_backward = 0;
 
-	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
-		emit_line_fmt(o, NULL, NULL, 1,
-			      "Submodule %s contains untracked content\n", path);
-	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
-		emit_line_fmt(o, NULL, NULL, 1,
-			      "Submodule %s contains modified content\n", path);
+	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) {
+		strbuf_addf(&sb, "Submodule %s contains untracked content\n", path);
+		diff_emit_line(o, NULL, NULL, sb.buf, sb.len);
+		strbuf_reset(&sb);
+	}
+
+	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED) {
+		strbuf_addf(&sb, "Submodule %s contains modified content\n", path);
+		diff_emit_line(o, NULL, NULL, sb.buf, sb.len);
+		strbuf_reset(&sb);
+	}
 
 	if (is_null_oid(one))
 		message = "(new submodule)";
@@ -473,7 +478,7 @@ static void show_submodule_header(struct diff_options *o, const char *path,
 		strbuf_addf(&sb, " %s\n", message);
 	else
 		strbuf_addf(&sb, "%s:\n", fast_backward ? " (rewind)" : "");
-	emit_line(o, meta, reset, 1, 0, 0, sb.buf, sb.len);
+	diff_emit_line(o, meta, reset, sb.buf, sb.len);
 
 	strbuf_release(&sb);
 }
@@ -501,7 +506,7 @@ void show_submodule_summary(struct diff_options *o, const char *path,
 	/* Treat revision walker failure the same as missing commits */
 	if (prepare_submodule_summary(&rev, path, left, right, merge_bases)) {
 		const char *error = "(revision walker failed)\n";
-		emit_line(o, NULL, NULL, 1, 0, 0, error, strlen(error));
+		diff_emit_line(o, NULL, NULL, error, strlen(error));
 		goto out;
 	}
 
@@ -570,15 +575,15 @@ void show_submodule_inline_diff(struct diff_options *o, const char *path,
 	prepare_submodule_repo_env(&cp.env_array);
 	if (start_command(&cp)) {
 		const char *error = "(diff failed)\n";
-		emit_line(o, NULL, NULL, 1, 0, 0, error, strlen(error));
+		diff_emit_line(o, NULL, NULL, error, strlen(error));
 	}
 
 	while (strbuf_getwholeline_fd(&sb, cp.out, '\n') != EOF)
-		emit_line(o, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
+		diff_emit_line(o, NULL, NULL, sb.buf, sb.len);
 
 	if (finish_command(&cp)) {
 		const char *error = "(diff failed)\n";
-		emit_line(o, NULL, NULL, 1, 0, 0, error, strlen(error));
+		diff_emit_line(o, NULL, NULL, error, strlen(error));
 	}
 
 done:


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 01/20] diff: readability fix
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 02/20] diff: move line ending check into emit_hunk_header Stefan Beller
                       ` (19 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

We already have dereferenced 'p->two' into a local variable 'two'. Use
that.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 74283d9001..3f5bf8b5a4 100644
--- a/diff.c
+++ b/diff.c
@@ -3283,8 +3283,8 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
 	const char *other;
 	const char *attr_path;
 
-	name  = p->one->path;
-	other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+	name  = one->path;
+	other = (strcmp(name, two->path) ? two->path : NULL);
 	attr_path = name;
 	if (o->prefix_length)
 		strip_prefix(o->prefix_length, &name, &other);
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 02/20] diff: move line ending check into emit_hunk_header
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 01/20] diff: readability fix Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 03/20] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
                       ` (18 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

The emit_hunk_header() function is responsible for assembling a
hunk header and calling emit_line() to send the hunk header
to the output file.  Its only caller fn_out_consume() needs
to prepare for a case where the function emits an incomplete
line and add the terminating LF.

Instead, make sure emit_hunk_header() always sends a
completed line to emit_line().

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3f5bf8b5a4..c2ed605cd0 100644
--- a/diff.c
+++ b/diff.c
@@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	}
 
 	strbuf_add(&msgbuf, line + len, org_len - len);
+	strbuf_complete_line(&msgbuf);
+
 	emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
@@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		len = sane_truncate_line(ecbdata, line, len);
 		find_lno(line, ecbdata);
 		emit_hunk_header(ecbdata, line, len);
-		if (line[len-1] != '\n')
-			putc('\n', o->file);
 		return;
 	}
 
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 03/20] diff.c: factor out diff_flush_patch_all_file_pairs
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 01/20] diff: readability fix Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 02/20] diff: move line ending check into emit_hunk_header Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 04/20] diff.c: teach emit_line_0 to accept sign parameter Stefan Beller
                       ` (17 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch we want to do more things before and after all filepairs
are flushed. So factor flushing out all file pairs into its own function
so that the new code can be plugged in easily.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index c2ed605cd0..2f9722b382 100644
--- a/diff.c
+++ b/diff.c
@@ -4737,6 +4737,17 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc)
 		warning(_(rename_limit_advice), varname, needed);
 }
 
+static void diff_flush_patch_all_file_pairs(struct diff_options *o)
+{
+	int i;
+	struct diff_queue_struct *q = &diff_queued_diff;
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		if (check_pair_status(p))
+			diff_flush_patch(p, o);
+	}
+}
+
 void diff_flush(struct diff_options *options)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
@@ -4831,11 +4842,7 @@ void diff_flush(struct diff_options *options)
 			}
 		}
 
-		for (i = 0; i < q->nr; i++) {
-			struct diff_filepair *p = q->queue[i];
-			if (check_pair_status(p))
-				diff_flush_patch(p, options);
-		}
+		diff_flush_patch_all_file_pairs(options);
 	}
 
 	if (output_format & DIFF_FORMAT_CALLBACK)
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 04/20] diff.c: teach emit_line_0 to accept sign parameter
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (2 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 03/20] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 23:33       ` Jonathan Tan
  2017-05-18 19:37     ` [PATCHv3 05/20] diff.c: emit_line_0 can handle no color setting Stefan Beller
                       ` (16 subsequent siblings)
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

Teach emit_line_0 to take a "sign" parameter specifically intended
to hold the sign of the line instead of a separate "first" parameter
representing the first character of the line to be printed.  Callers
that store the sign and line separately can use the "sign" parameter
like they used the "first" parameter previously, and callers that store
the sign and line together (or do not have a sign) no longer need to
manipulate their arguments to fit the requirements of emit_line_0.

With this patch other callers hard-code the sign (one of '+', '-',
' ' and '\\') such that we do not unexpectedly emit an erroneous
'\0'.

The audit of the callers revealed that the sign cannot be '\n' or '\r',
so remove the special handling of a trailing newline or carriage return
in the sign; the else part of the condition handles len==0 perfectly,
so we can drop the if/else construct.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 39 ++++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/diff.c b/diff.c
index 2f9722b382..73e55b0c10 100644
--- a/diff.c
+++ b/diff.c
@@ -517,33 +517,24 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 }
 
 static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
-			int first, const char *line, int len)
+			int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
-	int nofirst;
 	FILE *file = o->file;
 
 	fputs(diff_line_prefix(o), file);
 
-	if (len == 0) {
-		has_trailing_newline = (first == '\n');
-		has_trailing_carriage_return = (!has_trailing_newline &&
-						(first == '\r'));
-		nofirst = has_trailing_newline || has_trailing_carriage_return;
-	} else {
-		has_trailing_newline = (len > 0 && line[len-1] == '\n');
-		if (has_trailing_newline)
-			len--;
-		has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
-		if (has_trailing_carriage_return)
-			len--;
-		nofirst = 0;
-	}
+	has_trailing_newline = (len > 0 && line[len-1] == '\n');
+	if (has_trailing_newline)
+		len--;
+	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
+	if (has_trailing_carriage_return)
+		len--;
 
-	if (len || !nofirst) {
+	if (len || sign) {
 		fputs(set, file);
-		if (!nofirst)
-			fputc(first, file);
+		if (sign)
+			fputc(sign, file);
 		fwrite(line, len, 1, file);
 		fputs(reset, file);
 	}
@@ -556,7 +547,7 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 static void emit_line(struct diff_options *o, const char *set, const char *reset,
 		      const char *line, int len)
 {
-	emit_line_0(o, set, reset, line[0], line+1, len-1);
+	emit_line_0(o, set, reset, 0, line, len);
 }
 
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
@@ -4833,9 +4824,11 @@ void diff_flush(struct diff_options *options)
 
 	if (output_format & DIFF_FORMAT_PATCH) {
 		if (separator) {
-			fprintf(options->file, "%s%c",
-				diff_line_prefix(options),
-				options->line_termination);
+			char term[2];
+			term[0] = options->line_termination;
+			term[1] = '\0';
+
+			emit_line(options, NULL, NULL, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
 				fputs(options->stat_sep, options->file);
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 05/20] diff.c: emit_line_0 can handle no color setting
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (3 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 04/20] diff.c: teach emit_line_0 to accept sign parameter Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix Stefan Beller
                       ` (15 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In later patches we may pass lines that are not colored to the central
function emit_line_0, so we need to emit the color only when it is
non-NULL.

We could have chosen to pass "" instead of NULL, but that would be more
work.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 73e55b0c10..6c1886d495 100644
--- a/diff.c
+++ b/diff.c
@@ -532,11 +532,13 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 		len--;
 
 	if (len || sign) {
-		fputs(set, file);
+		if (set)
+			fputs(set, file);
 		if (sign)
 			fputc(sign, file);
 		fwrite(line, len, 1, file);
-		fputs(reset, file);
+		if (reset)
+			fputs(reset, file);
 	}
 	if (has_trailing_carriage_return)
 		fputc('\r', file);
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (4 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 05/20] diff.c: emit_line_0 can handle no color setting Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 07/20] diff.c: inline emit_line_0 into emit_line Stefan Beller
                       ` (14 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

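In later patches we'll make extensive use of emit_line_0, as we want
to funnel all output through this function so that we can add buffering
there. Not every future caller wants the line prefix printed, so teach
emit_line_0 a parameter that controls whether the prefix is emitted.
All existing callers pass 1 and keep their current behavior.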

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 6c1886d495..25735f03d2 100644
--- a/diff.c
+++ b/diff.c
@@ -517,12 +517,13 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 }
 
 static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
-			int sign, const char *line, int len)
+			int add_line_prefix, int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
 	FILE *file = o->file;
 
-	fputs(diff_line_prefix(o), file);
+	if (add_line_prefix)
+		fputs(diff_line_prefix(o), file);
 
 	has_trailing_newline = (len > 0 && line[len-1] == '\n');
 	if (has_trailing_newline)
@@ -549,7 +550,7 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 static void emit_line(struct diff_options *o, const char *set, const char *reset,
 		      const char *line, int len)
 {
-	emit_line_0(o, set, reset, 0, line, len);
+	emit_line_0(o, set, reset, 1, 0, line, len);
 }
 
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
@@ -580,13 +581,13 @@ static void emit_line_checked(const char *reset,
 	}
 
 	if (!ws)
-		emit_line_0(ecbdata->opt, set, reset, sign, line, len);
+		emit_line_0(ecbdata->opt, set, reset, 1, sign, line, len);
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
-		emit_line_0(ecbdata->opt, ws, reset, sign, line, len);
+		emit_line_0(ecbdata->opt, ws, reset, 1, sign, line, len);
 	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line_0(ecbdata->opt, set, reset, sign, "", 0);
+		emit_line_0(ecbdata->opt, set, reset, 1, sign, "", 0);
 		ws_check_emit(line, len, ecbdata->ws_rule,
 			      ecbdata->opt->file, set, reset, ws);
 	}
@@ -735,7 +736,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
 		putc('\n', ecb->opt->file);
-		emit_line_0(ecb->opt, context, reset, '\\',
+		emit_line_0(ecb->opt, context, reset, 1, '\\',
 			    nneof, strlen(nneof));
 	}
 }
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv3 07/20] diff.c: inline emit_line_0 into emit_line
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (5 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 08/20] diff.c: convert fn_out_consume to use emit_line Stefan Beller
                       ` (13 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

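emit_line is a thin wrapper around emit_line_0 that merely hard-codes
two additional arguments (add_line_prefix = 1 and sign = 0). Eliminate
this intermediate function, rename the remaining function by dropping
the '_0', and pass the two arguments explicitly at the former emit_line
call sites.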

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/diff.c b/diff.c
index 25735f03d2..3569857818 100644
--- a/diff.c
+++ b/diff.c
@@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
-			int add_line_prefix, int sign, const char *line, int len)
+static void emit_line(struct diff_options *o, const char *set, const char *reset,
+		      int add_line_prefix, int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
 	FILE *file = o->file;
@@ -547,12 +547,6 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 		fputc('\n', file);
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char *reset,
-		      const char *line, int len)
-{
-	emit_line_0(o, set, reset, 1, 0, line, len);
-}
-
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -581,13 +575,13 @@ static void emit_line_checked(const char *reset,
 	}
 
 	if (!ws)
-		emit_line_0(ecbdata->opt, set, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, set, reset, 1, sign, line, len);
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
-		emit_line_0(ecbdata->opt, ws, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, ws, reset, 1, sign, line, len);
 	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line_0(ecbdata->opt, set, reset, 1, sign, "", 0);
+		emit_line(ecbdata->opt, set, reset, 1, sign, "", 0);
 		ws_check_emit(line, len, ecbdata->ws_rule,
 			      ecbdata->opt->file, set, reset, ws);
 	}
@@ -637,7 +631,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	if (len < 10 ||
 	    memcmp(line, atat, 2) ||
 	    !(ep = memmem(line + 2, len - 2, atat, 2))) {
-		emit_line(ecbdata->opt, context, reset, line, len);
+		emit_line(ecbdata->opt, context, reset, 1, 0, line, len);
 		return;
 	}
 	ep += 2; /* skip over @@ */
@@ -673,7 +667,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	strbuf_add(&msgbuf, line + len, org_len - len);
 	strbuf_complete_line(&msgbuf);
 
-	emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
+	emit_line(ecbdata->opt, "", "", 1, 0, msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
 
@@ -736,8 +730,8 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
 		putc('\n', ecb->opt->file);
-		emit_line_0(ecb->opt, context, reset, 1, '\\',
-			    nneof, strlen(nneof));
+		emit_line(ecb->opt, context, reset, 1, '\\',
+			  nneof, strlen(nneof));
 	}
 }
 
@@ -1335,7 +1329,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		}
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
-			emit_line(o, context, reset, line, len);
+			emit_line(o, context, reset, 1, 0, line, len);
 			fputs("~\n", o->file);
 		} else {
 			/*
@@ -1347,7 +1341,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			      line++;
 			      len--;
 			}
-			emit_line(o, context, reset, line, len);
+			emit_line(o, context, reset, 1, 0, line, len);
 		}
 		return;
 	}
@@ -1370,7 +1364,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		/* incomplete line at the end */
 		ecbdata->lno_in_preimage++;
 		emit_line(o, diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
-			  reset, line, len);
+			  reset, 1, 0, line, len);
 		break;
 	}
 }
@@ -2182,7 +2176,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
 			line_prefix, data->filename, data->lineno, err);
 		free(err);
-		emit_line(data->o, set, reset, line, 1);
+		emit_line(data->o, set, reset, 1, 0, line, 1);
 		ws_check_emit(line + 1, len - 1, data->ws_rule,
 			      data->o->file, set, reset, ws);
 	} else if (line[0] == ' ') {
@@ -4831,7 +4825,7 @@ void diff_flush(struct diff_options *options)
 			term[0] = options->line_termination;
 			term[1] = '\0';
 
-			emit_line(options, NULL, NULL, term, !!term[0]);
+			emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
 				fputs(options->stat_sep, options->file);
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv3 08/20] diff.c: convert fn_out_consume to use emit_line
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (6 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 07/20] diff.c: inline emit_line_0 into emit_line Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 09/20] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
                       ` (12 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) of a move, we cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
convert all places that produce output such that the only
emitting function is emit_line.

This covers the parts of fn_out_consume.  In the next
patches we'll convert more functions that want to emit
formatted output, so we want a formatted emit function.
Add it here as emit_line_fmt.
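
For illustration only (not part of this patch): a rough standalone
sketch of the "buffer it all up, then operate on the result" idea,
including a formatted variant. All names here (struct line_buffer,
buffer_line, buffer_line_fmt, flush_lines) are hypothetical and do not
exist in diff.c; the fixed-size scratch buffer is a simplification,
whereas the patch below uses a strbuf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical line buffer; must start out zero-initialized. */
struct line_buffer {
	char **lines;
	size_t nr, alloc;
};

/* First pass: remember a copy of the line instead of printing it. */
void buffer_line(struct line_buffer *lb, const char *line, size_t len)
{
	char *copy;

	assert(lb != NULL && line != NULL);
	if (lb->nr == lb->alloc) {
		lb->alloc = lb->alloc ? 2 * lb->alloc : 16;
		lb->lines = realloc(lb->lines, lb->alloc * sizeof(*lb->lines));
		if (!lb->lines)
			abort();
	}
	copy = malloc(len + 1);
	if (!copy)
		abort();
	memcpy(copy, line, len);
	copy[len] = '\0';
	lb->lines[lb->nr++] = copy;
}

/* Formatted variant: format into a scratch buffer, then buffer that. */
void buffer_line_fmt(struct line_buffer *lb, const char *fmt, ...)
{
	char scratch[1024];
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(scratch, sizeof(scratch), fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	if ((size_t)n >= sizeof(scratch))
		n = sizeof(scratch) - 1; /* truncated; acceptable in a sketch */
	buffer_line(lb, scratch, (size_t)n);
}

/* Second pass: all lines are known now, so inspect and print them. */
void flush_lines(struct line_buffer *lb, FILE *out)
{
	size_t i;

	for (i = 0; i < lb->nr; i++) {
		fputs(lb->lines[i], out);
		free(lb->lines[i]);
	}
	free(lb->lines);
	lb->lines = NULL;
	lb->nr = lb->alloc = 0;
}
```

Because nothing reaches the output before flush_lines runs, a second
pass could look at every buffered line (for example to pair up '+' and
'-' lines) before deciding how to color each one.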

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 3569857818..8186289734 100644
--- a/diff.c
+++ b/diff.c
@@ -547,6 +547,21 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset
 		fputc('\n', file);
 }
 
+static void emit_line_fmt(struct diff_options *o,
+			  const char *set, const char *reset,
+			  int add_line_prefix,
+			  const char *fmt, ...)
+{
+	struct strbuf sb = STRBUF_INIT;
+	va_list ap;
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	va_end(ap);
+
+	emit_line(o, set, reset, add_line_prefix, 0, sb.buf, sb.len);
+	strbuf_release(&sb);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -1270,7 +1285,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT);
 	const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
 	struct diff_options *o = ecbdata->opt;
-	const char *line_prefix = diff_line_prefix(o);
 
 	o->found_changes = 1;
 
@@ -1282,14 +1296,12 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 
 	if (ecbdata->label_path[0]) {
 		const char *name_a_tab, *name_b_tab;
-
 		name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
 		name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
-
-		fprintf(o->file, "%s%s--- %s%s%s\n",
-			line_prefix, meta, ecbdata->label_path[0], reset, name_a_tab);
-		fprintf(o->file, "%s%s+++ %s%s%s\n",
-			line_prefix, meta, ecbdata->label_path[1], reset, name_b_tab);
+		emit_line_fmt(o, meta, reset, 1, "--- %s%s\n",
+			      ecbdata->label_path[0], name_a_tab);
+		emit_line_fmt(o, meta, reset, 1, "+++ %s%s\n",
+			      ecbdata->label_path[1], name_b_tab);
 		ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
 	}
 
@@ -1330,7 +1342,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
 			emit_line(o, context, reset, 1, 0, line, len);
-			fputs("~\n", o->file);
+			emit_line(o, NULL, NULL, 0, 0, "~\n", 2);
 		} else {
 			/*
 			 * Skip the prefix character, if any.  With
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv3 09/20] diff.c: convert builtin_diff to use emit_line_*
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (7 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 08/20] diff.c: convert fn_out_consume to use emit_line Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 10/20] diff.c: convert emit_rewrite_diff " Stefan Beller
                       ` (11 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) of a move, we cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
convert all places that produce output such that the only
emitting function is emit_line.

This covers builtin_diff, as well as the header emission in
fn_out_consume.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 8186289734..4fa976d43c 100644
--- a/diff.c
+++ b/diff.c
@@ -1289,8 +1289,9 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	o->found_changes = 1;
 
 	if (ecbdata->header) {
-		fprintf(o->file, "%s", ecbdata->header->buf);
-		strbuf_reset(ecbdata->header);
+		emit_line(o, NULL, NULL, 0, 0,
+			  ecbdata->header->buf, ecbdata->header->len);
+		strbuf_release(ecbdata->header);
 		ecbdata->header = NULL;
 	}
 
@@ -2435,7 +2436,7 @@ static void builtin_diff(const char *name_a,
 		if (complete_rewrite &&
 		    (textconv_one || !diff_filespec_is_binary(one)) &&
 		    (textconv_two || !diff_filespec_is_binary(two))) {
-			fprintf(o->file, "%s", header.buf);
+			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 			emit_rewrite_diff(name_a, name_b, one, two,
 						textconv_one, textconv_two, o);
@@ -2445,7 +2446,7 @@ static void builtin_diff(const char *name_a,
 	}
 
 	if (o->irreversible_delete && lbl[1][0] == '/') {
-		fprintf(o->file, "%s", header.buf);
+		emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
 		strbuf_reset(&header);
 		goto free_ab_and_return;
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2456,12 +2457,15 @@ static void builtin_diff(const char *name_a,
 		    !DIFF_OPT_TST(o, BINARY)) {
 			if (!oidcmp(&one->oid, &two->oid)) {
 				if (must_show_header)
-					fprintf(o->file, "%s", header.buf);
+					emit_line(o, NULL, NULL, 0, 0,
+						  header.buf, header.len);
 				goto free_ab_and_return;
 			}
-			fprintf(o->file, "%s", header.buf);
-			fprintf(o->file, "%sBinary files %s and %s differ\n",
-				line_prefix, lbl[0], lbl[1]);
+			emit_line(o, NULL, NULL, 0, 0,
+				  header.buf, header.len);
+			emit_line_fmt(o, NULL, NULL, 1,
+				      "Binary files %s and %s differ\n",
+				      lbl[0], lbl[1]);
 			goto free_ab_and_return;
 		}
 		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
@@ -2470,16 +2474,19 @@ static void builtin_diff(const char *name_a,
 		if (mf1.size == mf2.size &&
 		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
 			if (must_show_header)
-				fprintf(o->file, "%s", header.buf);
+				emit_line(o, NULL, NULL, 0, 0,
+					  header.buf, header.len);
 			goto free_ab_and_return;
 		}
-		fprintf(o->file, "%s", header.buf);
+		emit_line(o, NULL, NULL, 0, 0,
+			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
 			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
 		else
-			fprintf(o->file, "%sBinary files %s and %s differ\n",
-				line_prefix, lbl[0], lbl[1]);
+			emit_line_fmt(o, NULL, NULL, 1,
+				      "Binary files %s and %s differ\n",
+				      lbl[0], lbl[1]);
 		o->found_changes = 1;
 	} else {
 		/* Crazy xdl interfaces.. */
@@ -2491,7 +2498,7 @@ static void builtin_diff(const char *name_a,
 		const struct userdiff_funcname *pe;
 
 		if (must_show_header) {
-			fprintf(o->file, "%s", header.buf);
+			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 		}
 
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv3 10/20] diff.c: convert emit_rewrite_diff to use emit_line_*
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (8 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 09/20] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
                       ` (10 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) of a move, we cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
convert all places that produce output such that the only
emitting function is emit_line.

This covers emit_rewrite_diff.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index 4fa976d43c..3dda9f3c8e 100644
--- a/diff.c
+++ b/diff.c
@@ -704,17 +704,17 @@ static void remove_tempfile(void)
 	}
 }
 
-static void print_line_count(FILE *file, int count)
+static void add_line_count(struct strbuf *out, int count)
 {
 	switch (count) {
 	case 0:
-		fprintf(file, "0,0");
+		strbuf_addstr(out, "0,0");
 		break;
 	case 1:
-		fprintf(file, "1");
+		strbuf_addstr(out, "1");
 		break;
 	default:
-		fprintf(file, "1,%d", count);
+		strbuf_addf(out, "1,%d", count);
 		break;
 	}
 }
@@ -768,7 +768,7 @@ static void emit_rewrite_diff(const char *name_a,
 	char *data_one, *data_two;
 	size_t size_one, size_two;
 	struct emit_callback ecbdata;
-	const char *line_prefix = diff_line_prefix(o);
+	struct strbuf out = STRBUF_INIT;
 
 	if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) {
 		a_prefix = o->b_prefix;
@@ -806,20 +806,23 @@ static void emit_rewrite_diff(const char *name_a,
 	ecbdata.lno_in_preimage = 1;
 	ecbdata.lno_in_postimage = 1;
 
+	emit_line_fmt(o, metainfo, reset, 1, "--- %s%s\n", a_name.buf, name_a_tab);
+	emit_line_fmt(o, metainfo, reset, 1, "+++ %s%s\n", b_name.buf, name_b_tab);
+
 	lc_a = count_lines(data_one, size_one);
 	lc_b = count_lines(data_two, size_two);
-	fprintf(o->file,
-		"%s%s--- %s%s%s\n%s%s+++ %s%s%s\n%s%s@@ -",
-		line_prefix, metainfo, a_name.buf, name_a_tab, reset,
-		line_prefix, metainfo, b_name.buf, name_b_tab, reset,
-		line_prefix, fraginfo);
+
+	strbuf_addstr(&out, "@@ -");
 	if (!o->irreversible_delete)
-		print_line_count(o->file, lc_a);
+		add_line_count(&out, lc_a);
 	else
-		fprintf(o->file, "?,?");
-	fprintf(o->file, " +");
-	print_line_count(o->file, lc_b);
-	fprintf(o->file, " @@%s\n", reset);
+		strbuf_addstr(&out, "?,?");
+	strbuf_addstr(&out, " +");
+	add_line_count(&out, lc_b);
+	strbuf_addstr(&out, " @@\n");
+	emit_line(o, fraginfo, reset, 1, 0, out.buf, out.len);
+	strbuf_release(&out);
+
 	if (lc_a && !o->irreversible_delete)
 		emit_rewrite_lines(&ecbdata, '-', data_one, size_one);
 	if (lc_b)
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv3 11/20] diff.c: convert emit_rewrite_lines to use emit_line_*
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (9 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 10/20] diff.c: convert emit_rewrite_diff " Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
                       ` (9 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) of a move, we cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
convert all places that produce output such that the only
emitting function is emit_line.

This covers emit_rewrite_lines.
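
For illustration only (not part of this patch): the diff below no
longer writes the missing final newline with a separate putc(), but
completes an incomplete last line in a scratch buffer before emitting
it. A standalone sketch of that idea follows. The names line_fn,
line_stats, collect_stats and for_each_complete_line are made up for
this example, and the fixed-size scratch buffer is a simplification of
the strbuf used in the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback type; stands in for emit_add_line/emit_del_line. */
typedef void (*line_fn)(const char *line, size_t len, void *data);

/* Example callback state: line count, and whether all lines end in '\n'. */
struct line_stats {
	int nr;
	int all_terminated;
};

void collect_stats(const char *line, size_t len, void *data)
{
	struct line_stats *st = data;

	st->nr++;
	if (!len || line[len - 1] != '\n')
		st->all_terminated = 0;
}

/*
 * Call "fn" once per line of "data".  Every line handed to "fn" ends in
 * '\n'; an incomplete last line is completed in a scratch buffer first.
 * Returns 1 if the input lacked a final newline, 0 otherwise.
 */
int for_each_complete_line(const char *data, size_t size,
			   line_fn fn, void *cb_data)
{
	char scratch[1024];

	assert(data != NULL || size == 0);
	while (size > 0) {
		const char *endp = memchr(data, '\n', size);
		size_t len;

		if (!endp) {
			/* sketch only: an overly long tail is truncated */
			len = size < sizeof(scratch) - 1 ?
				size : sizeof(scratch) - 1;
			memcpy(scratch, data, len);
			scratch[len] = '\n';
			fn(scratch, len + 1, cb_data);
			return 1;
		}
		len = (size_t)(endp - data) + 1;
		fn(data, len, cb_data);
		data += len;
		size -= len;
	}
	return 0;
}
```

A caller would emit the "\ No newline at end of file" marker as one
more complete line whenever the function returns 1.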

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 3dda9f3c8e..ca6b48cf49 100644
--- a/diff.c
+++ b/diff.c
@@ -722,15 +722,23 @@ static void add_line_count(struct strbuf *out, int count)
 static void emit_rewrite_lines(struct emit_callback *ecb,
 			       int prefix, const char *data, int size)
 {
-	const char *endp = NULL;
-	static const char *nneof = " No newline at end of file\n";
 	const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
+	struct strbuf sb = STRBUF_INIT;
 
 	while (0 < size) {
 		int len;
 
-		endp = memchr(data, '\n', size);
-		len = endp ? (endp - data + 1) : size;
+		const char *endp = memchr(data, '\n', size);
+		if (endp)
+			len = endp - data + 1;
+		else {
+			strbuf_add(&sb, data, size);
+			strbuf_addch(&sb, '\n');
+			size = 0; /* to exit the loop. */
+
+			data = sb.buf;
+			len = sb.len;
+		}
 		if (prefix != '+') {
 			ecb->lno_in_preimage++;
 			emit_del_line(reset, ecb, data, len);
@@ -741,12 +749,13 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		size -= len;
 		data += len;
 	}
-	if (!endp) {
+	if (sb.len) {
+		static const char *nneof = "\\ No newline at end of file\n";
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
-		putc('\n', ecb->opt->file);
-		emit_line(ecb->opt, context, reset, 1, '\\',
-			  nneof, strlen(nneof));
+		emit_line(ecb->opt, context, reset, 1, 0,
+			    nneof, strlen(nneof));
+		strbuf_release(&sb);
 	}
 }
 
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv3 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (10 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 13/20] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
                       ` (8 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) of a move, we cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
convert all places that produce output such that the only
emitting function is emit_line.

This converts the submodule output in submodule.c to go through
emit_line as well. As emit_line has to be reachable from outside
diff.c for that, make it non-static and add a public wrapper,
diff_emit_line, that submodule.c calls.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c      | 26 ++++++++++--------
 diff.h      |  3 +++
 submodule.c | 87 ++++++++++++++++++++++++++++++++-----------------------------
 submodule.h |  9 +++----
 4 files changed, 67 insertions(+), 58 deletions(-)

diff --git a/diff.c b/diff.c
index ca6b48cf49..2950455e49 100644
--- a/diff.c
+++ b/diff.c
@@ -516,8 +516,8 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char *reset,
-		      int add_line_prefix, int sign, const char *line, int len)
+void emit_line(struct diff_options *o, const char *set, const char *reset,
+	       int add_line_prefix, int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
 	FILE *file = o->file;
@@ -547,10 +547,10 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset
 		fputc('\n', file);
 }
 
-static void emit_line_fmt(struct diff_options *o,
-			  const char *set, const char *reset,
-			  int add_line_prefix,
-			  const char *fmt, ...)
+void emit_line_fmt(struct diff_options *o,
+		   const char *set, const char *reset,
+		   int add_line_prefix,
+		   const char *fmt, ...)
 {
 	struct strbuf sb = STRBUF_INIT;
 	va_list ap;
@@ -562,6 +562,12 @@ static void emit_line_fmt(struct diff_options *o,
 	strbuf_release(&sb);
 }
 
+void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
+		    const char *line, int len)
+{
+	emit_line(o, set, reset, 1, 0, line, len);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -2384,8 +2390,7 @@ static void builtin_diff(const char *name_a,
 	    (!two->mode || S_ISGITLINK(two->mode))) {
 		const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
 		const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-		show_submodule_summary(o->file, one->path ? one->path : two->path,
-				line_prefix,
+		show_submodule_summary(o, one->path ? one->path : two->path,
 				&one->oid, &two->oid,
 				two->dirty_submodule,
 				meta, del, add, reset);
@@ -2395,11 +2400,10 @@ static void builtin_diff(const char *name_a,
 		   (!two->mode || S_ISGITLINK(two->mode))) {
 		const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
 		const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-		show_submodule_inline_diff(o->file, one->path ? one->path : two->path,
-				line_prefix,
+		show_submodule_inline_diff(o, one->path ? one->path : two->path,
 				&one->oid, &two->oid,
 				two->dirty_submodule,
-				meta, del, add, reset, o);
+				meta, del, add, reset);
 		return;
 	}
 
diff --git a/diff.h b/diff.h
index 5be1ee77a7..9ad546361a 100644
--- a/diff.h
+++ b/diff.h
@@ -188,6 +188,9 @@ struct diff_options {
 	int diff_path_counter;
 };
 
+void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
+		    const char *line, int len);
+
 enum color_diff {
 	DIFF_RESET = 0,
 	DIFF_CONTEXT = 1,
diff --git a/submodule.c b/submodule.c
index d3299e29c0..19c63197fb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -362,8 +362,8 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path,
 	return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, FILE *f,
-		const char *line_prefix,
+static void print_submodule_summary(struct rev_info *rev,
+		struct diff_options *o,
 		const char *del, const char *add, const char *reset)
 {
 	static const char format[] = "  %m %s";
@@ -375,18 +375,12 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
 		ctx.date_mode = rev->date_mode;
 		ctx.output_encoding = get_log_output_encoding();
 		strbuf_setlen(&sb, 0);
-		strbuf_addstr(&sb, line_prefix);
-		if (commit->object.flags & SYMMETRIC_LEFT) {
-			if (del)
-				strbuf_addstr(&sb, del);
-		}
-		else if (add)
-			strbuf_addstr(&sb, add);
 		format_commit_message(commit, format, &sb, &ctx);
-		if (reset)
-			strbuf_addstr(&sb, reset);
 		strbuf_addch(&sb, '\n');
-		fprintf(f, "%s", sb.buf);
+		if (commit->object.flags & SYMMETRIC_LEFT)
+			diff_emit_line(o, del, reset, sb.buf, sb.len);
+		else if (add)
+			diff_emit_line(o, add, reset, sb.buf, sb.len);
 	}
 	strbuf_release(&sb);
 }
@@ -413,8 +407,7 @@ void prepare_submodule_repo_env(struct argv_array *out)
  * attempt to lookup both the left and right commits and put them into the
  * left and right pointers.
  */
-static void show_submodule_header(FILE *f, const char *path,
-		const char *line_prefix,
+static void show_submodule_header(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *reset,
@@ -425,12 +418,17 @@ static void show_submodule_header(FILE *f, const char *path,
 	struct strbuf sb = STRBUF_INIT;
 	int fast_forward = 0, fast_backward = 0;
 
-	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
-		fprintf(f, "%sSubmodule %s contains untracked content\n",
-			line_prefix, path);
-	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
-		fprintf(f, "%sSubmodule %s contains modified content\n",
-			line_prefix, path);
+	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) {
+		strbuf_addf(&sb, "Submodule %s contains untracked content\n", path);
+		diff_emit_line(o, NULL, NULL, sb.buf, sb.len);
+		strbuf_reset(&sb);
+	}
+
+	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED) {
+		strbuf_addf(&sb, "Submodule %s contains modified content\n", path);
+		diff_emit_line(o, NULL, NULL, sb.buf, sb.len);
+		strbuf_reset(&sb);
+	}
 
 	if (is_null_oid(one))
 		message = "(new submodule)";
@@ -472,21 +470,20 @@ static void show_submodule_header(FILE *f, const char *path,
 	}
 
 output_header:
-	strbuf_addf(&sb, "%s%sSubmodule %s ", line_prefix, meta, path);
+	strbuf_addf(&sb, "Submodule %s ", path);
 	strbuf_add_unique_abbrev(&sb, one->hash, DEFAULT_ABBREV);
 	strbuf_addstr(&sb, (fast_backward || fast_forward) ? ".." : "...");
 	strbuf_add_unique_abbrev(&sb, two->hash, DEFAULT_ABBREV);
 	if (message)
-		strbuf_addf(&sb, " %s%s\n", message, reset);
+		strbuf_addf(&sb, " %s\n", message);
 	else
-		strbuf_addf(&sb, "%s:%s\n", fast_backward ? " (rewind)" : "", reset);
-	fwrite(sb.buf, sb.len, 1, f);
+		strbuf_addf(&sb, "%s:\n", fast_backward ? " (rewind)" : "");
+	diff_emit_line(o, meta, reset, sb.buf, sb.len);
 
 	strbuf_release(&sb);
 }
 
-void show_submodule_summary(FILE *f, const char *path,
-		const char *line_prefix,
+void show_submodule_summary(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset)
@@ -495,7 +492,7 @@ void show_submodule_summary(FILE *f, const char *path,
 	struct commit *left = NULL, *right = NULL;
 	struct commit_list *merge_bases = NULL;
 
-	show_submodule_header(f, path, line_prefix, one, two, dirty_submodule,
+	show_submodule_header(o, path, one, two, dirty_submodule,
 			      meta, reset, &left, &right, &merge_bases);
 
 	/*
@@ -508,11 +505,12 @@ void show_submodule_summary(FILE *f, const char *path,
 
 	/* Treat revision walker failure the same as missing commits */
 	if (prepare_submodule_summary(&rev, path, left, right, merge_bases)) {
-		fprintf(f, "%s(revision walker failed)\n", line_prefix);
+		const char *error = "(revision walker failed)\n";
+		diff_emit_line(o, NULL, NULL, error, strlen(error));
 		goto out;
 	}
 
-	print_submodule_summary(&rev, f, line_prefix, del, add, reset);
+	print_submodule_summary(&rev, o, del, add, reset);
 
 out:
 	if (merge_bases)
@@ -521,20 +519,18 @@ void show_submodule_summary(FILE *f, const char *path,
 	clear_commit_marks(right, ~0);
 }
 
-void show_submodule_inline_diff(FILE *f, const char *path,
-		const char *line_prefix,
+void show_submodule_inline_diff(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
-		const char *del, const char *add, const char *reset,
-		const struct diff_options *o)
+		const char *del, const char *add, const char *reset)
 {
 	const struct object_id *old = &empty_tree_oid, *new = &empty_tree_oid;
 	struct commit *left = NULL, *right = NULL;
 	struct commit_list *merge_bases = NULL;
-	struct strbuf submodule_dir = STRBUF_INIT;
 	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf sb = STRBUF_INIT;
 
-	show_submodule_header(f, path, line_prefix, one, two, dirty_submodule,
+	show_submodule_header(o, path, one, two, dirty_submodule,
 			      meta, reset, &left, &right, &merge_bases);
 
 	/* We need a valid left and right commit to display a difference */
@@ -547,15 +543,14 @@ void show_submodule_inline_diff(FILE *f, const char *path,
 	if (right)
 		new = two;
 
-	fflush(f);
 	cp.git_cmd = 1;
 	cp.dir = path;
-	cp.out = dup(fileno(f));
+	cp.out = -1;
 	cp.no_stdin = 1;
 
 	/* TODO: other options may need to be passed here. */
 	argv_array_push(&cp.args, "diff");
-	argv_array_pushf(&cp.args, "--line-prefix=%s", line_prefix);
+	argv_array_pushf(&cp.args, "--line-prefix=%s", diff_line_prefix(o));
 	if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
 		argv_array_pushf(&cp.args, "--src-prefix=%s%s/",
 				 o->b_prefix, path);
@@ -578,11 +573,21 @@ void show_submodule_inline_diff(FILE *f, const char *path,
 		argv_array_push(&cp.args, oid_to_hex(new));
 
 	prepare_submodule_repo_env(&cp.env_array);
-	if (run_command(&cp))
-		fprintf(f, "(diff failed)\n");
+	if (start_command(&cp)) {
+		const char *error = "(diff failed)\n";
+		diff_emit_line(o, NULL, NULL, error, strlen(error));
+	}
+
+	while (strbuf_getwholeline_fd(&sb, cp.out, '\n') != EOF)
+		diff_emit_line(o, NULL, NULL, sb.buf, sb.len);
+
+	if (finish_command(&cp)) {
+		const char *error = "(diff failed)\n";
+		diff_emit_line(o, NULL, NULL, error, strlen(error));
+	}
 
 done:
-	strbuf_release(&submodule_dir);
+	strbuf_release(&sb);
 	if (merge_bases)
 		free_commit_list(merge_bases);
 	if (left)
diff --git a/submodule.h b/submodule.h
index 1277480add..9df0a3aea2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -53,17 +53,14 @@ extern int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 extern const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
 extern void handle_ignore_submodules_arg(struct diff_options *, const char *);
-extern void show_submodule_summary(FILE *f, const char *path,
-		const char *line_prefix,
+extern void show_submodule_summary(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset);
-extern void show_submodule_inline_diff(FILE *f, const char *path,
-		const char *line_prefix,
+extern void show_submodule_inline_diff(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
-		const char *del, const char *add, const char *reset,
-		const struct diff_options *opt);
+		const char *del, const char *add, const char *reset);
 extern void set_config_fetch_recurse_submodules(int value);
 extern void set_config_update_recurse_submodules(int value);
 /* Check if we want to update any submodule.*/
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 13/20] diff.c: convert emit_binary_diff_body to use emit_line_*
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (11 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 14/20] diff.c: convert show_stats " Stefan Beller
                       ` (7 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -), we
cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers emit_binary_diff_body.
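
To illustrate why a single emitting function matters, here is a minimal,
self-contained sketch. It is not the code from this series: the names
`struct out_opts`, `emit` and `flush_buffered` are hypothetical. Every
producer calls one function, which either writes immediately or appends
to a buffer that a second pass can inspect before flushing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the output-related part of the diff options. */
struct out_opts {
	FILE *file;        /* where output finally goes */
	int use_buffer;    /* 0: write immediately, 1: collect first */
	char **lines;      /* buffered lines (owned, NUL-terminated copies) */
	size_t nr, alloc;
};

/* The single choke point: all output goes through here. */
static void emit(struct out_opts *o, const char *line, size_t len)
{
	char *copy;

	assert(o && line);
	if (!o->use_buffer) {
		fwrite(line, 1, len, o->file);
		return;
	}
	if (o->nr == o->alloc) {
		o->alloc = o->alloc ? 2 * o->alloc : 16;
		o->lines = realloc(o->lines, o->alloc * sizeof(*o->lines));
		if (!o->lines)
			abort();
	}
	copy = malloc(len + 1);
	if (!copy)
		abort();
	memcpy(copy, line, len);
	copy[len] = '\0';
	o->lines[o->nr++] = copy;
}

/*
 * Second pass: write out everything that was buffered and free it.
 * A real implementation would look at all lines here (e.g. to find
 * moved ones) before deciding how to decorate each of them.
 * Returns the number of lines flushed.
 */
static size_t flush_buffered(struct out_opts *o)
{
	size_t i, n = o->nr;

	for (i = 0; i < o->nr; i++) {
		fputs(o->lines[i], o->file);
		free(o->lines[i]);
	}
	free(o->lines);
	o->lines = NULL;
	o->nr = o->alloc = 0;
	return n;
}
```

With `use_buffer` unset the sketch behaves like direct printing, so callers
do not need to know which mode is active.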

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index 2950455e49..126038696d 100644
--- a/diff.c
+++ b/diff.c
@@ -2244,8 +2244,8 @@ static unsigned char *deflate_it(char *data,
 	return deflated;
 }
 
-static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
-				  const char *prefix)
+static void emit_binary_diff_body(struct diff_options *o,
+				  mmfile_t *one, mmfile_t *two)
 {
 	void *cp;
 	void *delta;
@@ -2274,13 +2274,12 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 	}
 
 	if (delta && delta_size < deflate_size) {
-		fprintf(file, "%sdelta %lu\n", prefix, orig_size);
+		emit_line_fmt(o, NULL, NULL, 1, "delta %lu\n", orig_size);
 		free(deflated);
 		data = delta;
 		data_size = delta_size;
-	}
-	else {
-		fprintf(file, "%sliteral %lu\n", prefix, two->size);
+	} else {
+		emit_line_fmt(o, NULL, NULL, 1, "literal %lu\n", two->size);
 		free(delta);
 		data = deflated;
 		data_size = deflate_size;
@@ -2289,8 +2288,9 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 	/* emit data encoded in base85 */
 	cp = data;
 	while (data_size) {
+		int len;
 		int bytes = (52 < data_size) ? 52 : data_size;
-		char line[70];
+		char line[71];
 		data_size -= bytes;
 		if (bytes <= 26)
 			line[0] = bytes + 'A' - 1;
@@ -2298,20 +2298,25 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 			line[0] = bytes - 26 + 'a' - 1;
 		encode_85(line + 1, cp, bytes);
 		cp = (char *) cp + bytes;
-		fprintf(file, "%s", prefix);
-		fputs(line, file);
-		fputc('\n', file);
+
+		len = strlen(line);
+		line[len++] = '\n';
+		line[len] = '\0';
+
+		emit_line(o, NULL, NULL, 1, 0, line, len);
 	}
-	fprintf(file, "%s\n", prefix);
+	emit_line(o, NULL, NULL, 1, 0, "\n", 1);
 	free(data);
 }
 
-static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two,
-			     const char *prefix)
+static void emit_binary_diff(struct diff_options *o,
+			     mmfile_t *one, mmfile_t *two)
 {
-	fprintf(file, "%sGIT binary patch\n", prefix);
-	emit_binary_diff_body(file, one, two, prefix);
-	emit_binary_diff_body(file, two, one, prefix);
+	const char *s = "GIT binary patch\n";
+	const int len = strlen(s);
+	emit_line(o, NULL, NULL, 1, 0, s, len);
+	emit_binary_diff_body(o, one, two);
+	emit_binary_diff_body(o, two, one);
 }
 
 int diff_filespec_is_binary(struct diff_filespec *one)
@@ -2498,7 +2503,7 @@ static void builtin_diff(const char *name_a,
 			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
-			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
+			emit_binary_diff(o, &mf1, &mf2);
 		else
 			emit_line_fmt(o, NULL, NULL, 1,
 				      "Binary files %s and %s differ\n",
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 14/20] diff.c: convert show_stats to use emit_line_*
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (12 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 13/20] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 15/20] diff.c: convert word diffing " Stefan Beller
                       ` (6 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -), we
cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

print_stat_summary is also called from builtin/apply, so we still
need a version that takes a file pointer. Introduce
print_stat_summary_0, which uses the emit_line_* machinery, and
keep print_stat_summary with the same arguments as a thin wrapper.

The responsibility to print the line prefix moves from the callers
of print_stat_summary_0 into the function itself.
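
The wrapper pattern described above can be sketched in isolation as
follows. This is only an illustration under simplifying assumptions:
`struct mini_opts`, `summary_0` and `summary` are hypothetical names, and
the struct is a drastically reduced stand-in for the real options.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced options struct; the real one has many more fields. */
struct mini_opts {
	FILE *file;
	const char *line_prefix;   /* may be NULL, meaning "no prefix" */
};

/* New-style function: takes the options and prints the prefix itself. */
static int summary_0(struct mini_opts *o, int files)
{
	const char *prefix;

	assert(o && o->file);
	prefix = o->line_prefix ? o->line_prefix : "";
	return fprintf(o->file, "%s %d file%s changed\n",
		       prefix, files, files == 1 ? "" : "s");
}

/* Old-style entry point, kept for callers that only have a FILE pointer. */
static int summary(FILE *fp, int files)
{
	struct mini_opts o;

	/* A zeroed struct means "no prefix, defaults everywhere". */
	memset(&o, 0, sizeof(o));
	o.file = fp;
	return summary_0(&o, files);
}
```

Callers that already have an options struct use `summary_0` and no longer
print the prefix themselves; everybody else keeps calling `summary`.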

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 89 ++++++++++++++++++++++++++++++++++++++----------------------------
 diff.h |  4 +--
 2 files changed, 53 insertions(+), 40 deletions(-)

diff --git a/diff.c b/diff.c
index 126038696d..f45f034036 100644
--- a/diff.c
+++ b/diff.c
@@ -1540,20 +1540,19 @@ static int scale_linear(int it, int width, int max_change)
 	return 1 + (it * (width - 1) / max_change);
 }
 
-static void show_name(FILE *file,
+static void show_name(struct strbuf *out,
 		      const char *prefix, const char *name, int len)
 {
-	fprintf(file, " %s%-*s |", prefix, len, name);
+	strbuf_addf(out, " %s%-*s |", prefix, len, name);
 }
 
-static void show_graph(FILE *file, char ch, int cnt, const char *set, const char *reset)
+static void show_graph(struct strbuf *out, char ch, int cnt, const char *set, const char *reset)
 {
 	if (cnt <= 0)
 		return;
-	fprintf(file, "%s", set);
-	while (cnt--)
-		putc(ch, file);
-	fprintf(file, "%s", reset);
+	strbuf_addstr(out, set);
+	strbuf_addchars(out, ch, cnt);
+	strbuf_addstr(out, reset);
 }
 
 static void fill_print_name(struct diffstat_file *file)
@@ -1577,14 +1576,16 @@ static void fill_print_name(struct diffstat_file *file)
 	file->print_name = pname;
 }
 
-int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
+void print_stat_summary_0(struct diff_options *options, int files,
+			  int insertions, int deletions)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
 
 	if (!files) {
 		assert(insertions == 0 && deletions == 0);
-		return fprintf(fp, "%s\n", " 0 files changed");
+		emit_line(options, NULL, NULL, 1, 0,
+			  " 0 files changed\n", strlen(" 0 files changed\n"));
+		return;
 	}
 
 	strbuf_addf(&sb,
@@ -1611,9 +1612,17 @@ int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
 			    deletions);
 	}
 	strbuf_addch(&sb, '\n');
-	ret = fputs(sb.buf, fp);
+	emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
-	return ret;
+}
+
+void print_stat_summary(FILE *fp, int files,
+			int insertions, int deletions)
+{
+	struct diff_options o;
+	memset(&o, 0, sizeof(o));
+	o.file = fp;
+	print_stat_summary_0(&o, files, insertions, deletions);
 }
 
 static void show_stats(struct diffstat_t *data, struct diff_options *options)
@@ -1623,13 +1632,13 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 	int total_files = data->nr, count;
 	int width, name_width, graph_width, number_width = 0, bin_width = 0;
 	const char *reset, *add_c, *del_c;
-	const char *line_prefix = "";
 	int extra_shown = 0;
+	const char *line_prefix = diff_line_prefix(options);
+	struct strbuf out = STRBUF_INIT;
 
 	if (data->nr == 0)
 		return;
 
-	line_prefix = diff_line_prefix(options);
 	count = options->stat_count ? options->stat_count : data->nr;
 
 	reset = diff_get_color_opt(options, DIFF_RESET);
@@ -1783,26 +1792,29 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		}
 
 		if (file->is_binary) {
-			fprintf(options->file, "%s", line_prefix);
-			show_name(options->file, prefix, name, len);
-			fprintf(options->file, " %*s", number_width, "Bin");
+			show_name(&out, prefix, name, len);
+			strbuf_addf(&out, " %*s", number_width, "Bin");
 			if (!added && !deleted) {
-				putc('\n', options->file);
+				strbuf_addch(&out, '\n');
+				emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+				strbuf_reset(&out);
 				continue;
 			}
-			fprintf(options->file, " %s%"PRIuMAX"%s",
+			strbuf_addf(&out, " %s%"PRIuMAX"%s",
 				del_c, deleted, reset);
-			fprintf(options->file, " -> ");
-			fprintf(options->file, "%s%"PRIuMAX"%s",
+			strbuf_addstr(&out, " -> ");
+			strbuf_addf(&out, "%s%"PRIuMAX"%s",
 				add_c, added, reset);
-			fprintf(options->file, " bytes");
-			fprintf(options->file, "\n");
+			strbuf_addstr(&out, " bytes\n");
+			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			strbuf_reset(&out);
 			continue;
 		}
 		else if (file->is_unmerged) {
-			fprintf(options->file, "%s", line_prefix);
-			show_name(options->file, prefix, name, len);
-			fprintf(options->file, " Unmerged\n");
+			show_name(&out, prefix, name, len);
+			strbuf_addstr(&out, " Unmerged\n");
+			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			strbuf_reset(&out);
 			continue;
 		}
 
@@ -1825,14 +1837,15 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 				add = total - del;
 			}
 		}
-		fprintf(options->file, "%s", line_prefix);
-		show_name(options->file, prefix, name, len);
-		fprintf(options->file, " %*"PRIuMAX"%s",
+		show_name(&out, prefix, name, len);
+		strbuf_addf(&out, " %*"PRIuMAX"%s",
 			number_width, added + deleted,
 			added + deleted ? " " : "");
-		show_graph(options->file, '+', add, add_c, reset);
-		show_graph(options->file, '-', del, del_c, reset);
-		fprintf(options->file, "\n");
+		show_graph(&out, '+', add, add_c, reset);
+		show_graph(&out, '-', del, del_c, reset);
+		strbuf_addch(&out, '\n');
+		emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+		strbuf_reset(&out);
 	}
 
 	for (i = 0; i < data->nr; i++) {
@@ -1853,11 +1866,12 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		if (i < count)
 			continue;
 		if (!extra_shown)
-			fprintf(options->file, "%s ...\n", line_prefix);
+			emit_line(options, NULL, NULL, 1, 0,
+				  " ...\n", strlen(" ...\n"));
 		extra_shown = 1;
 	}
-	fprintf(options->file, "%s", line_prefix);
-	print_stat_summary(options->file, total_files, adds, dels);
+
+	print_stat_summary_0(options, total_files, adds, dels);
 }
 
 static void show_shortstats(struct diffstat_t *data, struct diff_options *options)
@@ -1869,7 +1883,7 @@ static void show_shortstats(struct diffstat_t *data, struct diff_options *option
 
 	for (i = 0; i < data->nr; i++) {
 		int added = data->files[i]->added;
-		int deleted= data->files[i]->deleted;
+		int deleted = data->files[i]->deleted;
 
 		if (data->files[i]->is_unmerged ||
 		    (!data->files[i]->is_interesting && (added + deleted == 0))) {
@@ -1879,8 +1893,7 @@ static void show_shortstats(struct diffstat_t *data, struct diff_options *option
 			dels += deleted;
 		}
 	}
-	fprintf(options->file, "%s", diff_line_prefix(options));
-	print_stat_summary(options->file, total_files, adds, dels);
+	print_stat_summary_0(options, total_files, adds, dels);
 }
 
 static void show_numstat(struct diffstat_t *data, struct diff_options *options)
diff --git a/diff.h b/diff.h
index 9ad546361a..56d8dd036e 100644
--- a/diff.h
+++ b/diff.h
@@ -392,8 +392,8 @@ extern int parse_rename_score(const char **cp_p);
 
 extern long parse_algorithm_value(const char *value);
 
-extern int print_stat_summary(FILE *fp, int files,
-			      int insertions, int deletions);
+extern void print_stat_summary(FILE *fp, int files,
+			       int insertions, int deletions);
 extern void setup_diff_pager(struct diff_options *);
 
 #endif /* DIFF_H */
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 15/20] diff.c: convert word diffing to use emit_line_*
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (13 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 14/20] diff.c: convert show_stats " Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 16/20] diff.c: convert diff_flush " Stefan Beller
                       ` (5 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -), we
cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers all code related to diffing words.
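
The core of the word-diff helper is splitting a buffer at newlines and
wrapping each non-empty segment in style markers. A minimal sketch of
that loop, writing straight to a FILE for simplicity (the function name
`write_marked` and its return value are assumptions made for
illustration, not part of this patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write buf[0..count) to out, wrapping each non-empty line segment in
 * prefix/suffix and replacing every '\n' with the given newline string.
 * Returns the number of newlines seen.
 */
static int write_marked(FILE *out, const char *prefix, const char *suffix,
			const char *newline, const char *buf, size_t count)
{
	int newlines = 0;

	assert(buf || !count);
	while (count) {
		const char *p = memchr(buf, '\n', count);
		size_t seg = p ? (size_t)(p - buf) : count;

		if (seg) {
			fputs(prefix, out);
			fwrite(buf, 1, seg, out);
			fputs(suffix, out);
		}
		if (!p)
			break;          /* last segment had no newline */
		fputs(newline, out);
		newlines++;
		count -= seg + 1;       /* skip the segment and its '\n' */
		buf = p + 1;
	}
	return newlines;
}
```

For example, `write_marked(stdout, "[-", "-]", "\n", "a\nb", 3)` wraps
`a` and `b` separately and emits one newline between them.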

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 73 +++++++++++++++++++++++++++++-------------------------------------
 1 file changed, 32 insertions(+), 41 deletions(-)

diff --git a/diff.c b/diff.c
index f45f034036..383c8c4b52 100644
--- a/diff.c
+++ b/diff.c
@@ -897,37 +897,42 @@ struct diff_words_data {
 	struct diff_words_style *style;
 };
 
-static int fn_out_diff_words_write_helper(FILE *fp,
+static int fn_out_diff_words_write_helper(struct diff_options *o,
 					  struct diff_words_style_elem *st_el,
 					  const char *newline,
-					  size_t count, const char *buf,
-					  const char *line_prefix)
+					  size_t count, const char *buf)
 {
 	int print = 0;
+	struct strbuf sb = STRBUF_INIT;
 
 	while (count) {
 		char *p = memchr(buf, '\n', count);
 		if (print)
-			fputs(line_prefix, fp);
+			emit_line(o, NULL, NULL, 1, 0, "", 0);
+
 		if (p != buf) {
-			if (st_el->color && fputs(st_el->color, fp) < 0)
-				return -1;
-			if (fputs(st_el->prefix, fp) < 0 ||
-			    fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
-			    fputs(st_el->suffix, fp) < 0)
-				return -1;
-			if (st_el->color && *st_el->color
-			    && fputs(GIT_COLOR_RESET, fp) < 0)
-				return -1;
+			const char *reset = st_el->color && *st_el->color ?
+					    GIT_COLOR_RESET : NULL;
+			strbuf_addstr(&sb, st_el->prefix);
+			strbuf_add(&sb, buf, p ? p - buf : count);
+			strbuf_addstr(&sb, st_el->suffix);
+			emit_line(o, st_el->color, reset,
+				  0, 0, sb.buf, sb.len);
+			strbuf_reset(&sb);
 		}
 		if (!p)
-			return 0;
-		if (fputs(newline, fp) < 0)
-			return -1;
+			goto out;
+
+		strbuf_addstr(&sb, newline);
+		emit_line(o, NULL, NULL, 0, 0, sb.buf, sb.len);
+		strbuf_reset(&sb);
 		count -= p + 1 - buf;
 		buf = p + 1;
 		print = 1;
 	}
+
+out:
+	strbuf_release(&sb);
 	return 0;
 }
 
@@ -981,14 +986,12 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	int minus_first, minus_len, plus_first, plus_len;
 	const char *minus_begin, *minus_end, *plus_begin, *plus_end;
 	struct diff_options *opt = diff_words->opt;
-	const char *line_prefix;
 
 	if (line[0] != '@' || parse_hunk_header(line, len,
 			&minus_first, &minus_len, &plus_first, &plus_len))
 		return;
 
 	assert(opt);
-	line_prefix = diff_line_prefix(opt);
 
 	/* POSIX requires that first be decremented by one if len == 0... */
 	if (minus_len) {
@@ -1005,28 +1008,21 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	} else
 		plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
 
-	if (color_words_output_graph_prefix(diff_words)) {
-		fputs(line_prefix, diff_words->opt->file);
-	}
 	if (diff_words->current_plus != plus_begin) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->ctx, style->newline,
 				plus_begin - diff_words->current_plus,
-				diff_words->current_plus, line_prefix);
-		if (*(plus_begin - 1) == '\n')
-			fputs(line_prefix, diff_words->opt->file);
+				diff_words->current_plus);
 	}
 	if (minus_begin != minus_end) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->old, style->newline,
-				minus_end - minus_begin, minus_begin,
-				line_prefix);
+				minus_end - minus_begin, minus_begin);
 	}
 	if (plus_begin != plus_end) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->new, style->newline,
-				plus_end - plus_begin, plus_begin,
-				line_prefix);
+				plus_end - plus_begin, plus_begin);
 	}
 
 	diff_words->current_plus = plus_end;
@@ -1113,18 +1109,14 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	struct diff_words_style *style = diff_words->style;
 
 	struct diff_options *opt = diff_words->opt;
-	const char *line_prefix;
-
 	assert(opt);
-	line_prefix = diff_line_prefix(opt);
 
 	/* special case: only removal */
 	if (!diff_words->plus.text.size) {
-		fputs(line_prefix, diff_words->opt->file);
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->old, style->newline,
 			diff_words->minus.text.size,
-			diff_words->minus.text.ptr, line_prefix);
+			diff_words->minus.text.ptr);
 		diff_words->minus.text.size = 0;
 		return;
 	}
@@ -1147,12 +1139,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	if (diff_words->current_plus != diff_words->plus.text.ptr +
 			diff_words->plus.text.size) {
 		if (color_words_output_graph_prefix(diff_words))
-			fputs(line_prefix, diff_words->opt->file);
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+			emit_line(diff_words->opt, NULL, NULL, 1, 0, "", 0);
+		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->ctx, style->newline,
 			diff_words->plus.text.ptr + diff_words->plus.text.size
-			- diff_words->current_plus, diff_words->current_plus,
-			line_prefix);
+			- diff_words->current_plus, diff_words->current_plus);
 	}
 	diff_words->minus.text.size = diff_words->plus.text.size = 0;
 }
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 16/20] diff.c: convert diff_flush to use emit_line_*
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (14 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 15/20] diff.c: convert word diffing " Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 17/20] diff.c: convert diff_summary " Stefan Beller
                       ` (4 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -), we
cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers diff_flush.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 383c8c4b52..774f1acdd3 100644
--- a/diff.c
+++ b/diff.c
@@ -4872,7 +4872,9 @@ void diff_flush(struct diff_options *options)
 			emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
-				fputs(options->stat_sep, options->file);
+				emit_line(options, NULL, NULL, 0, 0,
+					  options->stat_sep,
+					  strlen(options->stat_sep));
 			}
 		}
 
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 17/20] diff.c: convert diff_summary to use emit_line_*
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (15 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 16/20] diff.c: convert diff_flush " Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 18/20] diff.c: emit_line includes whitespace highlighting Stefan Beller
                       ` (3 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -), we
cannot yet tell that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers diff_summary.
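
The conversion replaces several partial writes per summary line with one
fully formatted line that is emitted in a single call. A small sketch of
that idea for the "mode change" line, assuming a fixed-size buffer and
no path quoting (the real code quotes the path via quote_c_style; the
function name `format_mode_change` is hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a complete "mode change" summary line into buf.
 * name may be NULL when the file name should not be shown.
 * Returns what snprintf returns: the length the full line needs.
 */
static int format_mode_change(char *buf, size_t size,
			      unsigned old_mode, unsigned new_mode,
			      const char *name)
{
	assert(buf && size);
	return snprintf(buf, size, " mode change %06o => %06o%s%s\n",
			old_mode, new_mode,
			name ? " " : "", name ? name : "");
}
```

Because the whole line exists as one string before anything is written,
it can be handed to a single emitting function, or buffered for later.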

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 64 ++++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/diff.c b/diff.c
index 774f1acdd3..0945802ebf 100644
--- a/diff.c
+++ b/diff.c
@@ -4504,67 +4504,71 @@ static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
 	}
 }
 
-static void show_file_mode_name(FILE *file, const char *newdelete, struct diff_filespec *fs)
+static void show_file_mode_name(struct diff_options *opt, const char *newdelete, struct diff_filespec *fs)
 {
+	struct strbuf sb = STRBUF_INIT;
 	if (fs->mode)
-		fprintf(file, " %s mode %06o ", newdelete, fs->mode);
+		strbuf_addf(&sb, " %s mode %06o ", newdelete, fs->mode);
 	else
-		fprintf(file, " %s ", newdelete);
-	write_name_quoted(fs->path, file, '\n');
-}
+		strbuf_addf(&sb, " %s ", newdelete);
 
+	quote_c_style(fs->path, &sb, NULL, 0);
+	strbuf_addch(&sb, '\n');
+	emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+	strbuf_release(&sb);
+}
 
-static void show_mode_change(FILE *file, struct diff_filepair *p, int show_name,
-		const char *line_prefix)
+static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
+		int show_name)
 {
 	if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) {
-		fprintf(file, "%s mode change %06o => %06o%c", line_prefix, p->one->mode,
-			p->two->mode, show_name ? ' ' : '\n');
+		struct strbuf sb = STRBUF_INIT;
 		if (show_name) {
-			write_name_quoted(p->two->path, file, '\n');
+			strbuf_addch(&sb, ' ');
+			quote_c_style(p->two->path, &sb, NULL, 0);
 		}
+		emit_line_fmt(opt, NULL, NULL, 1,
+			      " mode change %06o => %06o%s\n",
+			      p->one->mode, p->two->mode,
+			      show_name ? sb.buf : "");
+		strbuf_release(&sb);
 	}
 }
 
-static void show_rename_copy(FILE *file, const char *renamecopy, struct diff_filepair *p,
-			const char *line_prefix)
+static void show_rename_copy(struct diff_options *opt, const char *renamecopy,
+		struct diff_filepair *p)
 {
 	char *names = pprint_rename(p->one->path, p->two->path);
-
-	fprintf(file, " %s %s (%d%%)\n", renamecopy, names, similarity_index(p));
+	emit_line_fmt(opt, NULL, NULL, 1, " %s %s (%d%%)\n",
+		      renamecopy, names, similarity_index(p));
 	free(names);
-	show_mode_change(file, p, 0, line_prefix);
+	show_mode_change(opt, p, 0);
 }
 
 static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 {
-	FILE *file = opt->file;
-	const char *line_prefix = diff_line_prefix(opt);
-
 	switch(p->status) {
 	case DIFF_STATUS_DELETED:
-		fputs(line_prefix, file);
-		show_file_mode_name(file, "delete", p->one);
+		show_file_mode_name(opt, "delete", p->one);
 		break;
 	case DIFF_STATUS_ADDED:
-		fputs(line_prefix, file);
-		show_file_mode_name(file, "create", p->two);
+		show_file_mode_name(opt, "create", p->two);
 		break;
 	case DIFF_STATUS_COPIED:
-		fputs(line_prefix, file);
-		show_rename_copy(file, "copy", p, line_prefix);
+		show_rename_copy(opt, "copy", p);
 		break;
 	case DIFF_STATUS_RENAMED:
-		fputs(line_prefix, file);
-		show_rename_copy(file, "rename", p, line_prefix);
+		show_rename_copy(opt, "rename", p);
 		break;
 	default:
 		if (p->score) {
-			fprintf(file, "%s rewrite ", line_prefix);
-			write_name_quoted(p->two->path, file, ' ');
-			fprintf(file, "(%d%%)\n", similarity_index(p));
+			struct strbuf sb = STRBUF_INIT;
+			quote_c_style(p->two->path, &sb, NULL, 0);
+			emit_line_fmt(opt, NULL, NULL, 1, " rewrite %s (%d%%)\n",
+				      sb.buf, similarity_index(p));
+			strbuf_release(&sb);
 		}
-		show_mode_change(file, p, !p->score, line_prefix);
+		show_mode_change(opt, p, !p->score);
 		break;
 	}
 }
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv3 18/20] diff.c: emit_line includes whitespace highlighting
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (16 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 17/20] diff.c: convert diff_summary " Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 19/20] diff: buffer all output if asked to Stefan Beller
                       ` (2 subsequent siblings)
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

Currently all whitespace highlighting happens outside the emit_line
function. Teach emit_line to do the highlighting itself, triggered by
a new parameter (markup_ws).
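
As a rough illustration of a flag-controlled highlighting step inside the
emitting function, here is a self-contained sketch. It only handles
trailing blanks, whereas the real ws_check_emit knows several more
whitespace rules; the names `trailing_ws_start` and `emit_sketch` are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Offset at which trailing blanks start, ignoring a final '\n'. */
static size_t trailing_ws_start(const char *line, size_t len)
{
	assert(line || !len);
	if (len && line[len - 1] == '\n')
		len--;
	while (len && (line[len - 1] == ' ' || line[len - 1] == '\t'))
		len--;
	return len;
}

/* Emit one line; highlight trailing blanks only when markup_ws is set. */
static void emit_sketch(FILE *out, int markup_ws, const char *ws_color,
			const char *reset, const char *line, size_t len)
{
	/* body: the line without its terminating newline, if any */
	size_t body = (len && line[len - 1] == '\n') ? len - 1 : len;
	size_t ws = markup_ws ? trailing_ws_start(line, len) : body;

	fwrite(line, 1, ws, out);
	if (ws < body) {
		fputs(ws_color, out);
		fwrite(line + ws, 1, body - ws, out);
		fputs(reset, out);
	}
	if (body < len)
		fputc('\n', out);
}
```

With `markup_ws` unset the line is written unchanged, so existing callers
that pass 0 keep their current output.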

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 106 ++++++++++++++++++++++++++++++++++++++---------------------------
 diff.h |   2 ++
 2 files changed, 64 insertions(+), 44 deletions(-)

diff --git a/diff.c b/diff.c
index 0945802ebf..50d91643b6 100644
--- a/diff.c
+++ b/diff.c
@@ -516,15 +516,33 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-void emit_line(struct diff_options *o, const char *set, const char *reset,
-	       int add_line_prefix, int sign, const char *line, int len)
+void emit_line(struct diff_options *o,
+	       const char *set, const char *reset,
+	       int add_line_prefix, int markup_ws,
+	       int sign, const char *line, int len)
 {
+	const char *ws;
 	int has_trailing_newline, has_trailing_carriage_return;
 	FILE *file = o->file;
 
 	if (add_line_prefix)
 		fputs(diff_line_prefix(o), file);
 
+	if (markup_ws) {
+		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+
+		if (set)
+			fputs(set, file);
+		if (sign)
+			fputc(sign, file);
+		if (reset)
+			fputs(reset, file);
+		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+		ws_check_emit(line, len, o->ws_rule,
+			      file, set, reset, ws);
+		return;
+	}
+
 	has_trailing_newline = (len > 0 && line[len-1] == '\n');
 	if (has_trailing_newline)
 		len--;
@@ -558,14 +576,14 @@ void emit_line_fmt(struct diff_options *o,
 	strbuf_vaddf(&sb, fmt, ap);
 	va_end(ap);
 
-	emit_line(o, set, reset, add_line_prefix, 0, sb.buf, sb.len);
+	emit_line(o, set, reset, add_line_prefix, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
 void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
 		    const char *line, int len)
 {
-	emit_line(o, set, reset, 1, 0, line, len);
+	emit_line(o, set, reset, 1, 0, 0, line, len);
 }
 
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
@@ -596,16 +614,15 @@ static void emit_line_checked(const char *reset,
 	}
 
 	if (!ws)
-		emit_line(ecbdata->opt, set, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, set, reset, 1, 0, sign, line, len);
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
-		emit_line(ecbdata->opt, ws, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, ws, reset, 1, 1, sign, line, len);
 	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line(ecbdata->opt, set, reset, 1, sign, "", 0);
-		ws_check_emit(line, len, ecbdata->ws_rule,
-			      ecbdata->opt->file, set, reset, ws);
+		emit_line(ecbdata->opt, set, reset, 1, 1, sign, line, len);
 	}
+
 }
 
 static void emit_add_line(const char *reset,
@@ -652,7 +669,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	if (len < 10 ||
 	    memcmp(line, atat, 2) ||
 	    !(ep = memmem(line + 2, len - 2, atat, 2))) {
-		emit_line(ecbdata->opt, context, reset, 1, 0, line, len);
+		emit_line(ecbdata->opt, context, reset, 1, 0, 0, line, len);
 		return;
 	}
 	ep += 2; /* skip over @@ */
@@ -688,7 +705,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	strbuf_add(&msgbuf, line + len, org_len - len);
 	strbuf_complete_line(&msgbuf);
 
-	emit_line(ecbdata->opt, "", "", 1, 0, msgbuf.buf, msgbuf.len);
+	emit_line(ecbdata->opt, "", "", 1, 0, 0, msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
 
@@ -759,7 +776,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		static const char *nneof = "\\ No newline at end of file\n";
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
-		emit_line(ecb->opt, context, reset, 1, 0,
+		emit_line(ecb->opt, context, reset, 1, 0, 0,
 			    nneof, strlen(nneof));
 		strbuf_release(&sb);
 	}
@@ -835,7 +852,7 @@ static void emit_rewrite_diff(const char *name_a,
 	strbuf_addstr(&out, " +");
 	add_line_count(&out, lc_b);
 	strbuf_addstr(&out, " @@\n");
-	emit_line(o, fraginfo, reset, 1, 0, out.buf, out.len);
+	emit_line(o, fraginfo, reset, 1, 0, 0, out.buf, out.len);
 	strbuf_release(&out);
 
 	if (lc_a && !o->irreversible_delete)
@@ -908,7 +925,7 @@ static int fn_out_diff_words_write_helper(struct diff_options *o,
 	while (count) {
 		char *p = memchr(buf, '\n', count);
 		if (print)
-			emit_line(o, NULL, NULL, 1, 0, "", 0);
+			emit_line(o, NULL, NULL, 1, 0, 0, "", 0);
 
 		if (p != buf) {
 			const char *reset = st_el->color && *st_el->color ?
@@ -917,14 +934,14 @@ static int fn_out_diff_words_write_helper(struct diff_options *o,
 			strbuf_add(&sb, buf, p ? p - buf : count);
 			strbuf_addstr(&sb, st_el->suffix);
 			emit_line(o, st_el->color, reset,
-				  0, 0, sb.buf, sb.len);
+				  0, 0, 0, sb.buf, sb.len);
 			strbuf_reset(&sb);
 		}
 		if (!p)
 			goto out;
 
 		strbuf_addstr(&sb, newline);
-		emit_line(o, NULL, NULL, 0, 0, sb.buf, sb.len);
+		emit_line(o, NULL, NULL, 0, 0, 0, sb.buf, sb.len);
 		strbuf_reset(&sb);
 		count -= p + 1 - buf;
 		buf = p + 1;
@@ -1139,7 +1156,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	if (diff_words->current_plus != diff_words->plus.text.ptr +
 			diff_words->plus.text.size) {
 		if (color_words_output_graph_prefix(diff_words))
-			emit_line(diff_words->opt, NULL, NULL, 1, 0, "", 0);
+			emit_line(diff_words->opt, NULL, NULL, 1, 0, 0, "", 0);
 		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->ctx, style->newline,
 			diff_words->plus.text.ptr + diff_words->plus.text.size
@@ -1298,7 +1315,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	o->found_changes = 1;
 
 	if (ecbdata->header) {
-		emit_line(o, NULL, NULL, 0, 0,
+		emit_line(o, NULL, NULL, 0, 0, 0,
 			  ecbdata->header->buf, ecbdata->header->len);
 		strbuf_release(ecbdata->header);
 		ecbdata->header = NULL;
@@ -1351,8 +1368,8 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		}
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
-			emit_line(o, context, reset, 1, 0, line, len);
-			emit_line(o, NULL, NULL, 0, 0, "~\n", 2);
+			emit_line(o, context, reset, 1, 0, 0, line, len);
+			emit_line(o, NULL, NULL, 0, 0, 0, "~\n", 2);
 		} else {
 			/*
 			 * Skip the prefix character, if any.  With
@@ -1363,7 +1380,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			      line++;
 			      len--;
 			}
-			emit_line(o, context, reset, 1, 0, line, len);
+			emit_line(o, context, reset, 1, 0, 0, line, len);
 		}
 		return;
 	}
@@ -1386,7 +1403,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		/* incomplete line at the end */
 		ecbdata->lno_in_preimage++;
 		emit_line(o, diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
-			  reset, 1, 0, line, len);
+			  reset, 1, 0, 0, line, len);
 		break;
 	}
 }
@@ -1575,7 +1592,7 @@ void print_stat_summary_0(struct diff_options *options, int files,
 	if (!files) {
 		assert(insertions == 0 && deletions == 0);
 		strbuf_addstr(&sb, " 0 files changed");
-		emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
+		emit_line(options, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 		return;
 	}
 
@@ -1603,7 +1620,7 @@ void print_stat_summary_0(struct diff_options *options, int files,
 			    deletions);
 	}
 	strbuf_addch(&sb, '\n');
-	emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
+	emit_line(options, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
@@ -1787,7 +1804,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			strbuf_addf(&out, " %*s", number_width, "Bin");
 			if (!added && !deleted) {
 				strbuf_addch(&out, '\n');
-				emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+				emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 				strbuf_reset(&out);
 				continue;
 			}
@@ -1797,14 +1814,14 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			strbuf_addf(&out, "%s%"PRIuMAX"%s",
 				add_c, added, reset);
 			strbuf_addstr(&out, " bytes\n");
-			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 			strbuf_reset(&out);
 			continue;
 		}
 		else if (file->is_unmerged) {
 			show_name(&out, prefix, name, len);
 			strbuf_addstr(&out, " Unmerged\n");
-			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 			strbuf_reset(&out);
 			continue;
 		}
@@ -1835,7 +1852,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		show_graph(&out, '+', add, add_c, reset);
 		show_graph(&out, '-', del, del_c, reset);
 		strbuf_addch(&out, '\n');
-		emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+		emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 		strbuf_reset(&out);
 	}
 
@@ -1857,7 +1874,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		if (i < count)
 			continue;
 		if (!extra_shown)
-			emit_line(options, NULL, NULL, 1, 0,
+			emit_line(options, NULL, NULL, 1, 0, 0,
 				  " ...\n", strlen(" ...\n"));
 		extra_shown = 1;
 	}
@@ -2211,7 +2228,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
 			line_prefix, data->filename, data->lineno, err);
 		free(err);
-		emit_line(data->o, set, reset, 1, 0, line, 1);
+		emit_line(data->o, set, reset, 1, 0, 0, line, 1);
 		ws_check_emit(line + 1, len - 1, data->ws_rule,
 			      data->o->file, set, reset, ws);
 	} else if (line[0] == ' ') {
@@ -2307,9 +2324,9 @@ static void emit_binary_diff_body(struct diff_options *o,
 		line[len++] = '\n';
 		line[len] = '\0';
 
-		emit_line(o, NULL, NULL, 1, 0, line, len);
+		emit_line(o, NULL, NULL, 1, 0, 0, line, len);
 	}
-	emit_line(o, NULL, NULL, 1, 0, "\n", 1);
+	emit_line(o, NULL, NULL, 1, 0, 0, "\n", 1);
 	free(data);
 }
 
@@ -2318,7 +2335,7 @@ static void emit_binary_diff(struct diff_options *o,
 {
 	const char *s = "GIT binary patch\n";
 	const int len = strlen(s);
-	emit_line(o, NULL, NULL, 1, 0, s, len);
+	emit_line(o, NULL, NULL, 1, 0, 0, s, len);
 	emit_binary_diff_body(o, one, two);
 	emit_binary_diff_body(o, two, one);
 }
@@ -2461,7 +2478,7 @@ static void builtin_diff(const char *name_a,
 		if (complete_rewrite &&
 		    (textconv_one || !diff_filespec_is_binary(one)) &&
 		    (textconv_two || !diff_filespec_is_binary(two))) {
-			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
+			emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 			emit_rewrite_diff(name_a, name_b, one, two,
 						textconv_one, textconv_two, o);
@@ -2471,7 +2488,7 @@ static void builtin_diff(const char *name_a,
 	}
 
 	if (o->irreversible_delete && lbl[1][0] == '/') {
-		emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
+		emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 		strbuf_reset(&header);
 		goto free_ab_and_return;
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2482,11 +2499,11 @@ static void builtin_diff(const char *name_a,
 		    !DIFF_OPT_TST(o, BINARY)) {
 			if (!oidcmp(&one->oid, &two->oid)) {
 				if (must_show_header)
-					emit_line(o, NULL, NULL, 0, 0,
+					emit_line(o, NULL, NULL, 0, 0, 0,
 						  header.buf, header.len);
 				goto free_ab_and_return;
 			}
-			emit_line(o, NULL, NULL, 0, 0,
+			emit_line(o, NULL, NULL, 0, 0, 0,
 				  header.buf, header.len);
 			emit_line_fmt(o, NULL, NULL, 1,
 				      "Binary files %s and %s differ\n",
@@ -2499,11 +2516,11 @@ static void builtin_diff(const char *name_a,
 		if (mf1.size == mf2.size &&
 		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
 			if (must_show_header)
-				emit_line(o, NULL, NULL, 0, 0,
+				emit_line(o, NULL, NULL, 0, 0, 0,
 					  header.buf, header.len);
 			goto free_ab_and_return;
 		}
-		emit_line(o, NULL, NULL, 0, 0,
+		emit_line(o, NULL, NULL, 0, 0, 0,
 			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
@@ -2523,7 +2540,7 @@ static void builtin_diff(const char *name_a,
 		const struct userdiff_funcname *pe;
 
 		if (must_show_header) {
-			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
+			emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 		}
 
@@ -2540,6 +2557,7 @@ static void builtin_diff(const char *name_a,
 		ecbdata.label_path = lbl;
 		ecbdata.color_diff = want_color(o->use_color);
 		ecbdata.ws_rule = whitespace_rule(name_b);
+		o->ws_rule = ecbdata.ws_rule;
 		if (ecbdata.ws_rule & WS_BLANK_AT_EOF)
 			check_blank_at_eof(&mf1, &mf2, &ecbdata);
 		ecbdata.opt = o;
@@ -4514,7 +4532,7 @@ static void show_file_mode_name(struct diff_options *opt, const char *newdelete,
 
 	quote_c_style(fs->path, &sb, NULL, 0);
 	strbuf_addch(&sb, '\n');
-	emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+	emit_line(opt, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
@@ -4566,7 +4584,7 @@ static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 			strbuf_addstr(&sb, " rewrite ");
 			quote_c_style(p->two->path, &sb, NULL, 0);
 			strbuf_addf(&sb, " (%d%%)\n", similarity_index(p));
-			emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+			emit_line(opt, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 		}
 		show_mode_change(opt, p, !p->score);
 		break;
@@ -4873,10 +4891,10 @@ void diff_flush(struct diff_options *options)
 			term[0] = options->line_termination;
 			term[1] = '\0';
 
-			emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
+			emit_line(options, NULL, NULL, 1, 0, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
-				emit_line(options, NULL, NULL, 0, 0,
+				emit_line(options, NULL, NULL, 0, 0, 0,
 					  options->stat_sep,
 					  strlen(options->stat_sep));
 			}
diff --git a/diff.h b/diff.h
index 56d8dd036e..85948ed65a 100644
--- a/diff.h
+++ b/diff.h
@@ -186,6 +186,8 @@ struct diff_options {
 	void *output_prefix_data;
 
 	int diff_path_counter;
+
+	unsigned ws_rule;
 };
 
 void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv3 19/20] diff: buffer all output if asked to
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (17 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 18/20] diff.c: emit_line includes whitespace highlighting Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-18 19:37     ` [PATCHv3 20/20] diff.c: color moved lines differently Stefan Beller
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
  20 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

Introduce a new option 'use_buffer' in the struct diff_options which
controls whether all output is buffered up until all output is available.

We'll have a new struct 'buffered_patch_line' in diff.h which will be
used to buffer each line.  The buffered_patch_line will duplicate the
memory of the line to buffer as that is easiest to reason about for now.
In a future patch we may want to decrease the memory usage by not
duplicating all output for buffering. Instead we could store offsets
into the file or, for hunk descriptions such as the similarity score,
just the relevant number, and reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with a smaller memory footprint.

emit_line factors out the emission part into emit_buffered_patch_line,
and depending on diff_options->use_buffer the emission is performed
either directly when calling emit_line or after the whole process is
done, i.e. buffering adds the possibility of a second pass over the
whole output before doing the actual output.
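
The buffer-or-emit decision described above can be sketched in a few
lines of standalone C. This is only an illustration under simplifying
assumptions: struct sketch_line, struct sketch_out and the sketch_*
functions are hypothetical names, colors and line prefixes are left out,
and plain realloc stands in for git's ALLOC_GROW.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much reduced counterpart of buffered_patch_line. */
struct sketch_line {
	char *text;	/* owned copy of the line */
	char sign;	/* '+', '-', ' ' or 0 for "no sign" */
};

struct sketch_out {
	int use_buffer;	/* 0: emit immediately, 1: collect first */
	struct sketch_line *lines;
	size_t nr, alloc;
	FILE *file;
};

static void sketch_emit_now(struct sketch_out *o, char sign, const char *text)
{
	if (sign)
		fputc(sign, o->file);
	fputs(text, o->file);
}

/* Either print the line right away or remember a copy of it for later. */
static void sketch_emit_line(struct sketch_out *o, char sign, const char *text)
{
	size_t len = strlen(text) + 1;

	if (!o->use_buffer) {
		sketch_emit_now(o, sign, text);
		return;
	}
	if (o->nr == o->alloc) {
		size_t alloc = o->alloc ? 2 * o->alloc : 16;
		struct sketch_line *grown =
			realloc(o->lines, alloc * sizeof(*grown));
		if (!grown)
			abort();
		o->lines = grown;
		o->alloc = alloc;
	}
	/* Duplicate the memory so the caller may reuse its buffer. */
	o->lines[o->nr].text = malloc(len);
	if (!o->lines[o->nr].text)
		abort();
	memcpy(o->lines[o->nr].text, text, len);
	o->lines[o->nr].sign = sign;
	o->nr++;
}

/* A second pass over all lines could run here before anything is printed. */
static void sketch_flush(struct sketch_out *o)
{
	size_t i;

	for (i = 0; i < o->nr; i++) {
		sketch_emit_now(o, o->lines[i].sign, o->lines[i].text);
		free(o->lines[i].text);
	}
	free(o->lines);
	o->lines = NULL;
	o->nr = o->alloc = 0;
}
```

Because every line is copied, the caller's strbuf can be released right
after the call, which is what makes this variant easy to reason about at
the cost of holding the whole output in memory.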

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.
When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the
main buffer.

Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing: all the diff tests from the test
suite pass without reordering issues (which would show up if only
parts of the output got buffered and we forgot to buffer other parts).
This gives confidence that we converted all functions to use emit_line
for output.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++---------------
 diff.h |  39 +++++++++++++++++
 2 files changed, 159 insertions(+), 34 deletions(-)

diff --git a/diff.c b/diff.c
index 50d91643b6..2ccf93cd09 100644
--- a/diff.c
+++ b/diff.c
@@ -516,53 +516,85 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-void emit_line(struct diff_options *o,
-	       const char *set, const char *reset,
-	       int add_line_prefix, int markup_ws,
-	       int sign, const char *line, int len)
+static void emit_buffered_patch_line(struct diff_options *o,
+				     struct buffered_patch_line *e)
 {
 	const char *ws;
 	int has_trailing_newline, has_trailing_carriage_return;
+	int len = e->len;
 	FILE *file = o->file;
 
-	if (add_line_prefix)
+	if (e->add_line_prefix)
 		fputs(diff_line_prefix(o), file);
 
-	if (markup_ws) {
+	switch (e->state) {
+	case BPL_EMIT_LINE_WS:
 		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+		if (e->set)
+			fputs(e->set, file);
+		if (e->sign)
+			fputc(e->sign, file);
+		if (e->reset)
+			fputs(e->reset, file);
+		ws_check_emit(e->line, e->len, o->ws_rule,
+			      file, e->set, e->reset, ws);
+		return;
+	case BPL_EMIT_LINE_ASIS:
+		has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
+		if (has_trailing_newline)
+			len--;
+		has_trailing_carriage_return = (len > 0 && e->line[len-1] == '\r');
+		if (has_trailing_carriage_return)
+			len--;
 
-		if (set)
-			fputs(set, file);
-		if (sign)
-			fputc(sign, file);
-		if (reset)
-			fputs(reset, file);
-		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
-		ws_check_emit(line, len, o->ws_rule,
-			      file, set, reset, ws);
+		if (len || e->sign) {
+			if (e->set)
+				fputs(e->set, file);
+			if (e->sign)
+				fputc(e->sign, file);
+			fwrite(e->line, len, 1, file);
+			if (e->reset)
+				fputs(e->reset, file);
+		}
+		if (has_trailing_carriage_return)
+			fputc('\r', file);
+		if (has_trailing_newline)
+			fputc('\n', file);
+		return;
+	case BPL_HANDOVER:
+		o->ws_rule = whitespace_rule(e->line); /*read from file, stored in line?*/
 		return;
+	default:
+		die("BUG: malformatted buffered patch line: '%d'", e->state);
 	}
+}
 
-	has_trailing_newline = (len > 0 && line[len-1] == '\n');
-	if (has_trailing_newline)
-		len--;
-	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
-	if (has_trailing_carriage_return)
-		len--;
+static void append_buffered_patch_line(struct diff_options *o,
+				       struct buffered_patch_line *e)
+{
+	struct buffered_patch_line *f;
+	ALLOC_GROW(o->line_buffer,
+		   o->line_buffer_nr + 1,
+		   o->line_buffer_alloc);
+	f = &o->line_buffer[o->line_buffer_nr++];
 
-	if (len || sign) {
-		if (set)
-			fputs(set, file);
-		if (sign)
-			fputc(sign, file);
-		fwrite(line, len, 1, file);
-		if (reset)
-			fputs(reset, file);
-	}
-	if (has_trailing_carriage_return)
-		fputc('\r', file);
-	if (has_trailing_newline)
-		fputc('\n', file);
+	memcpy(f, e, sizeof(struct buffered_patch_line));
+	f->line = e->line ? xmemdupz(e->line, e->len) : NULL;
+}
+
+void emit_line(struct diff_options *o,
+	       const char *set, const char *reset,
+	       int add_line_prefix, int markup_ws,
+	       int sign, const char *line, int len)
+{
+	struct buffered_patch_line e = {set, reset, line,
+		len, sign, add_line_prefix,
+		markup_ws ? BPL_EMIT_LINE_WS : BPL_EMIT_LINE_ASIS};
+
+	if (o->use_buffer)
+		append_buffered_patch_line(o, &e);
+	else
+		emit_buffered_patch_line(o, &e);
 }
 
 void emit_line_fmt(struct diff_options *o,
@@ -1171,6 +1203,18 @@ static void diff_words_flush(struct emit_callback *ecbdata)
 	if (ecbdata->diff_words->minus.text.size ||
 	    ecbdata->diff_words->plus.text.size)
 		diff_words_show(ecbdata->diff_words);
+
+	if (ecbdata->diff_words->opt->line_buffer_nr) {
+		int i;
+		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
+			append_buffered_patch_line(ecbdata->opt,
+				&ecbdata->diff_words->opt->line_buffer[i]);
+
+		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
+			free((void*) ecbdata->diff_words->opt->line_buffer[i].line);
+
+		ecbdata->diff_words->opt->line_buffer_nr = 0;
+	}
 }
 
 static void diff_filespec_load_driver(struct diff_filespec *one)
@@ -1206,6 +1250,11 @@ static void init_diff_words_data(struct emit_callback *ecbdata,
 		xcalloc(1, sizeof(struct diff_words_data));
 	ecbdata->diff_words->type = o->word_diff;
 	ecbdata->diff_words->opt = o;
+
+	o->line_buffer = NULL;
+	o->line_buffer_nr = 0;
+	o->line_buffer_alloc = 0;
+
 	if (!o->word_regex)
 		o->word_regex = userdiff_word_regex(one);
 	if (!o->word_regex)
@@ -1240,6 +1289,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
 {
 	if (ecbdata->diff_words) {
 		diff_words_flush(ecbdata);
+		free (ecbdata->diff_words->opt->line_buffer);
 		free (ecbdata->diff_words->opt);
 		free (ecbdata->diff_words->minus.text.ptr);
 		free (ecbdata->diff_words->minus.orig);
@@ -2578,6 +2628,13 @@ static void builtin_diff(const char *name_a,
 			xecfg.ctxlen = strtoul(v, NULL, 10);
 		if (o->word_diff)
 			init_diff_words_data(&ecbdata, o, one, two);
+		if (o->use_buffer) {
+			struct buffered_patch_line e = BUFFERED_PATCH_LINE_INIT;
+			e.state = BPL_HANDOVER;
+			e.line = name_b;
+			e.len = strlen(name_b);
+			append_buffered_patch_line(o, &e);
+		}
 		if (xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
 				  &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
@@ -3457,6 +3514,10 @@ void diff_setup(struct diff_options *options)
 		options->a_prefix = "a/";
 		options->b_prefix = "b/";
 	}
+
+	options->line_buffer = NULL;
+	options->line_buffer_nr = 0;
+	options->line_buffer_alloc = 0;
 }
 
 void diff_setup_done(struct diff_options *options)
@@ -4795,11 +4856,36 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 {
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
+	/*
+	 * For testing purposes we want to make sure the diff machinery
+	 * works completely with the buffer. If there is anything emitted
+	 * outside the emit_buffered_patch_line, then the order is screwed
+	 * up and the tests will fail.
+	 *
+	 * TODO (later in this series):
+	 * We'll unset this flag in a later patch.
+	 */
+	o->use_buffer = 1;
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		if (check_pair_status(p))
 			diff_flush_patch(p, o);
 	}
+
+	if (o->use_buffer) {
+		for (i = 0; i < o->line_buffer_nr; i++)
+			emit_buffered_patch_line(o, &o->line_buffer[i]);
+
+		for (i = 0; i < o->line_buffer_nr; i++)
+			free((void*)o->line_buffer[i].line);
+
+		free(o->line_buffer);
+
+		o->line_buffer = NULL;
+		o->line_buffer_nr = 0;
+		o->line_buffer_alloc = 0;
+	}
 }
 
 void diff_flush(struct diff_options *options)
diff --git a/diff.h b/diff.h
index 85948ed65a..f9fd0ea3ae 100644
--- a/diff.h
+++ b/diff.h
@@ -115,6 +115,41 @@ enum diff_submodule_format {
 	DIFF_SUBMODULE_INLINE_DIFF
 };
 
+/*
+ * This struct is used when we need to buffer the output of the diff output.
+ *
+ * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer
+ * into the pre/post image file. This pointer could be a union with the
+ * line pointer. By storing an offset into the file instead of the literal line,
+ * we can decrease the memory footprint for the buffered output. At first we
+ * may want to only have indirection for the content lines, but we could
+ * also have an enum (based on sign?) that stores prefabricated lines, e.g.
+ * the similarity score line or hunk/file headers.
+ */
+struct buffered_patch_line {
+	const char *set;
+	const char *reset;
+	const char *line;
+	int len;
+	int sign;
+	int add_line_prefix;
+	enum {
+		/*
+		 * Emits [lineprefix][set][sign][reset] and then calls
+		 * ws_check_emit which will output "line", marked up
+		 * according to ws_rule.
+		 */
+		BPL_EMIT_LINE_WS,
+
+		/* Emits [lineprefix][set][sign] line [reset] */
+		BPL_EMIT_LINE_ASIS,
+
+		/* Reloads the ws_rule; line contains the file name */
+		BPL_HANDOVER
+	} state;
+};
+#define BUFFERED_PATCH_LINE_INIT {NULL, NULL, NULL, 0, 0, 0}
+
 struct diff_options {
 	const char *orderfile;
 	const char *pickaxe;
@@ -188,6 +223,10 @@ struct diff_options {
 	int diff_path_counter;
 
 	unsigned ws_rule;
+	int use_buffer;
+
+	struct buffered_patch_line *line_buffer;
+	int line_buffer_nr, line_buffer_alloc;
 };
 
 void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv3 20/20] diff.c: color moved lines differently
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (18 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 19/20] diff: buffer all output if asked to Stefan Beller
@ 2017-05-18 19:37     ` Stefan Beller
  2017-05-19 18:23       ` Jonathan Tan
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
  20 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-18 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: bmwill, git, gitster, jonathantanmy, jrnieder, mhagger, peff

When there is a lot of code moved around such as in 11979b9 (2005-11-18,
"http.c: reorder to avoid compilation failure.") for example, the review
process is quite hard, not because it is mentally challenging but because
it is tedious and gets boring quickly. However, you still need to read
through all of the code to make sure the moved lines are there as expected.

It is trivial to color up a patch like the following

    $ git diff
    diff --git a/file2.c b/file2.c
    index 9163a0f..8e66dc0 100644
    --- a/file2.c
    +++ b/file2.c
    @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
            return memcpy(xmallocz(len), data, len);
     }

    -int secure_foo(struct user *u)
    -{
    -       if (!u->is_allowed_foo)
    -               return;
    -       foo(u);
    -}
    -
     char *xstrndup(const char *str, size_t len)
     {
            char *p = memchr(str, '\0', len);
    diff --git a/test.c b/test.c
    index a95e6fe..81eb0eb 100644
    --- a/test.c
    +++ b/test.c
    @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
            return total;
     }

    +int secure_foo(struct user *u)
    +{
    +       if (!u->is_allowed_foo)
    +               return;
    +       foo(u);
    +}
    +
     int xdup(int fd)
     {
            int ret = dup(fd);

as in this patch all added and removed lines should be
colored in the new color that indicates moved lines.

However, the intention of this patch is to aid reviewers
in spotting permutations in the moved code. So consider the
following malicious move:

    diff --git a/file2.c b/file2.c
    index 9163a0f..8e66dc0 100644
    --- a/file2.c
    +++ b/file2.c
    @@ -3,13 +3,6 @@ void *xmemdupz(const void *data, size_t len)
            return memcpy(xmallocz(len), data, len);
     }

    -int secure_foo(struct user *u)
    -{
    -       if (!u->is_allowed_foo)
    -               return;
    -       foo(u);
    -}
    -
     char *xstrndup(const char *str, size_t len)
     {
            char *p = memchr(str, '\0', len);
    diff --git a/test.c b/test.c
    index a95e6fe..a679c40 100644
    --- a/test.c
    +++ b/test.c
    @@ -18,6 +18,13 @@ ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset)
            return total;
     }

    +int secure_foo(struct user *u)
    +{
    +       foo(u);
    +       if (!u->is_allowed_foo)
    +               return;
    +}
    +
     int xdup(int fd)
     {
            int ret = dup(fd);

If the moved code is larger, it is easier to hide some permutation in the
code, which is why we would not want to color all lines as "moved" in this
case. So we do not just need to color lines differently that are added and
removed in the same diff; we need to tweak the algorithm a bit more.

As the reviewer's attention should be drawn to the places where the
difference is introduced to the moved code, we cannot just have one new
color for all of the moved code.

First I implemented an alternative design, which would show a moved hunk
in one color, and its boundaries in another color. This idea was error
prone as it inspected each line and its neighboring lines to determine
whether the line was (a) moved and (b) deep inside a hunk by having
matching neighboring lines. This is unreliable as we can construct
hunks which have equal neighbors that just exceed the number of lines
inspected. (Think of 'AXYZBXYZCXYZD..' with each letter as a line, that
is permuted to 'AXYZCXYZBXYZD..').
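
To make the failure mode concrete, here is a small sketch of such a
fixed-window check, with one character standing for one line. It is an
illustration written for this explanation, not the abandoned
implementation; window_matches, permutation_goes_unnoticed and the
symmetric window of k lines on each side are assumptions.

```c
#include <string.h>

/*
 * Returns 1 if the line at new_text[pos] has a counterpart in old_text
 * whose k neighbours on each side are identical, i.e. a fixed-window
 * heuristic would classify it as lying inside a moved hunk.
 * A neighbour beyond the start or end only matches a neighbour that is
 * beyond the start or end as well.
 */
static int window_matches(const char *old_text, const char *new_text,
			  size_t pos, int k)
{
	long old_len = (long)strlen(old_text);
	long new_len = (long)strlen(new_text);
	long i, d;

	for (i = 0; i < old_len; i++) {
		int same = 1;
		for (d = -k; d <= k && same; d++) {
			long a = i + d, b = (long)pos + d;
			int a_in = a >= 0 && a < old_len;
			int b_in = b >= 0 && b < new_len;
			if (a_in != b_in)
				same = 0;
			else if (a_in && old_text[a] != new_text[b])
				same = 0;
		}
		if (same)
			return 1;
	}
	return 0;
}

/* Returns 1 if no line is flagged, i.e. the reordering stays invisible. */
static int permutation_goes_unnoticed(const char *old_text,
				      const char *new_text, int k)
{
	size_t pos, n = strlen(new_text);

	for (pos = 0; pos < n; pos++)
		if (!window_matches(old_text, new_text, pos, k))
			return 0;
	return 1;
}
```

With old_text "AXYZBXYZCXYZD" and new_text "AXYZCXYZBXYZD", a window of
one neighbour on each side flags nothing, although the B and C blocks
were swapped. A window of two neighbours does notice it, but a longer
run of repeated lines between the swapped ones would defeat that
window too.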

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then switches color to the alternative color
for the next hunk. By doing this any permutation is recognized and
displayed. That implies that there is no dedicated boundary or
inside-hunk color, but instead we'll have just two colors alternating
for hunks.
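
The alternating-color idea can be sketched as follows, again with one
character per line. This is a deliberately simplified greedy variant and
not the algorithm in this patch: it always takes the first matching
removed line instead of tracking all candidates through a hashmap, so it
does not search for the largest hunk. sketch_color_moved and the color
ids 0 (not moved), 1 (moved) and 2 (moved, alternative) are assumptions
for the example.

```c
#include <string.h>

/*
 * Assigns a color id to every added line:
 *   0 = not moved, 1 = moved, 2 = moved (alternative color).
 * A block is extended as long as the next added line equals the removed
 * line that follows the previous match; whenever that chain breaks and a
 * new match is found, the color flips.
 * `color` must have room for strlen(added) entries.
 */
static void sketch_color_moved(const char *removed, const char *added,
			       int *color)
{
	const char *prev = NULL;	/* removed line matched last */
	int cur = 0;			/* color of the current block */
	size_t i;

	for (i = 0; added[i]; i++) {
		const char *m;

		if (prev && prev[1] == added[i]) {
			/* Continues the current block: keep its color. */
			prev++;
			color[i] = cur;
			continue;
		}

		m = strchr(removed, added[i]);
		if (!m) {
			/* Genuinely new line: not part of any move. */
			prev = NULL;
			color[i] = 0;
			continue;
		}

		/* A new block starts here: switch to the other color. */
		cur = (cur == 1) ? 2 : 1;
		prev = m;
		color[i] = cur;
	}
}
```

For the secure_foo example, treating its six lines as "ABCDEF", the
honest move yields the colors 1 1 1 1 1 1, while the malicious order
"ABECDF" yields 1 1 2 1 1 2, so every seam where the order was changed
shows up as a color switch.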

It would be a bit more UX friendly if the two corresponding hunks
(of added and deleted lines) for one move would get the same color id.
(Both get "regular moved" or "alternative moved"). This problem is
deferred to a later patch for now.

A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches, both the submodule
and the word diff output were carefully converted to call emit_line with
sign=0. All output with sign=0 is ignored for move detection purposes in
this patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we did
so, only line moves within a repository boundary would be marked up.

Algorithm-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/config.txt   |  14 ++-
 diff.c                     | 266 +++++++++++++++++++++++++++++++++++++++++++--
 diff.h                     |  11 +-
 t/t4015-diff-whitespace.sh | 229 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 507 insertions(+), 13 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..902d017c3b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,14 +1051,24 @@ This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=<when>]` option.
 
+color.moved::
+	A boolean value, whether a diff should color moved lines
+	differently. The moved lines are searched for in the diff only.
+	Duplicated lines from somewhere in the project that are not
+	part of the diff are not colored as moved.
+	Defaults to false.
+
 color.diff.<slot>::
 	Use customized color for diff colorization.  `<slot>` specifies
 	which part of the patch to use the specified color, and is one
 	of `context` (context text - `plain` is a historical synonym),
 	`meta` (metainformation), `frag`
 	(hunk header), 'func' (function in hunk header), `old` (removed lines),
-	`new` (added lines), `commit` (commit headers), or `whitespace`
-	(highlighting whitespace errors).
+	`new` (added lines), `commit` (commit headers), `whitespace`
+	(highlighting whitespace errors), `oldMoved` (removed lines that
+	reappear), `newMoved` (added lines that were removed elsewhere),
+	`oldMovedAlternative` and `newMovedAlternative` (as a fallback to
+	cover adjacent blocks of moved code)
 
 color.decorate.<slot>::
 	Use customized color for 'git log --decorate' output.  `<slot>` is one
diff --git a/diff.c b/diff.c
index 2ccf93cd09..451cab2875 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static int diff_indent_heuristic; /* experimental */
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
+static int diff_color_moved_default;
 static int diff_context_default = 3;
 static int diff_interhunk_context_default;
 static const char *diff_word_regex_cfg;
@@ -55,6 +56,10 @@ static char diff_colors[][COLOR_MAXLEN] = {
 	GIT_COLOR_YELLOW,	/* COMMIT */
 	GIT_COLOR_BG_RED,	/* WHITESPACE */
 	GIT_COLOR_NORMAL,	/* FUNCINFO */
+	GIT_COLOR_BOLD_RED,	/* OLD_MOVED_A */
+	GIT_COLOR_BG_RED,	/* OLD_MOVED_B */
+	GIT_COLOR_BOLD_GREEN,	/* NEW_MOVED_A */
+	GIT_COLOR_BG_GREEN,	/* NEW_MOVED_B */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -80,6 +85,14 @@ static int parse_diff_color_slot(const char *var)
 		return DIFF_WHITESPACE;
 	if (!strcasecmp(var, "func"))
 		return DIFF_FUNCINFO;
+	if (!strcasecmp(var, "oldmoved"))
+		return DIFF_FILE_OLD_MOVED;
+	if (!strcasecmp(var, "oldmovedalternative"))
+		return DIFF_FILE_OLD_MOVED_ALT;
+	if (!strcasecmp(var, "newmoved"))
+		return DIFF_FILE_NEW_MOVED;
+	if (!strcasecmp(var, "newmovedalternative"))
+		return DIFF_FILE_NEW_MOVED_ALT;
 	return -1;
 }
 
@@ -234,6 +247,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_use_color_default = git_config_colorbool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "color.moved")) {
+		diff_color_moved_default = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.context")) {
 		diff_context_default = git_config_int(var, value);
 		if (diff_context_default < 0)
@@ -354,6 +371,88 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+struct moved_entry {
+	struct hashmap_entry ent;
+	const struct buffered_patch_line *line;
+	struct moved_entry *next_line;
+};
+
+static void get_ws_cleaned_string(const struct buffered_patch_line *l,
+				  struct strbuf *out)
+{
+	int i;
+	for (i = 0; i < l->len; i++) {
+		if (isspace(l->line[i]))
+			continue;
+		strbuf_addch(out, l->line[i]);
+	}
+}
+
+static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
+					 const struct buffered_patch_line *b,
+					 const void *keydata)
+{
+	int ret;
+	struct strbuf sba = STRBUF_INIT;
+	struct strbuf sbb = STRBUF_INIT;
+
+	get_ws_cleaned_string(a, &sba);
+	get_ws_cleaned_string(b, &sbb);
+	ret = sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);
+
+	strbuf_release(&sba);
+	strbuf_release(&sbb);
+	return ret;
+}
+
+static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
+				   const struct buffered_patch_line *b,
+				   const void *keydata)
+{
+	return a->len != b->len || strncmp(a->line, b->line, a->len);
+}
+
+static int moved_entry_cmp(const struct moved_entry *a,
+			   const struct moved_entry *b,
+			   const void *keydata)
+{
+	return buffered_patch_line_cmp(a->line, b->line, keydata);
+}
+
+static int moved_entry_cmp_no_ws(const struct moved_entry *a,
+				 const struct moved_entry *b,
+				 const void *keydata)
+{
+	return buffered_patch_line_cmp_no_ws(a->line, b->line, keydata);
+}
+
+static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_ws)
+{
+	static struct strbuf sb = STRBUF_INIT;
+
+	if (ignore_ws) {
+		strbuf_reset(&sb);
+		get_ws_cleaned_string(line, &sb);
+		return memhash(sb.buf, sb.len);
+	} else {
+		return memhash(line->line, line->len);
+	}
+}
+
+static struct moved_entry *prepare_entry(struct diff_options *o,
+					 int line_no)
+{
+	struct moved_entry *ret = xmalloc(sizeof(*ret));
+	unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+	struct buffered_patch_line *l = &o->line_buffer[line_no];
+
+	ret->ent.hash = get_line_hash(l, ignore_ws);
+	ret->line = l;
+	ret->next_line = NULL;
+
+	return ret;
+}
+
 static char *quote_two(const char *one, const char *two)
 {
 	int need_one = quote_c_style(one, NULL, NULL, 1);
@@ -516,6 +615,135 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
+static void add_lines_to_move_detection(struct diff_options *o)
+{
+	struct moved_entry *prev_line;
+
+	int n;
+	for (n = 0; n < o->line_buffer_nr; n++) {
+		int sign = 0;
+		struct hashmap *hm;
+		struct moved_entry *key;
+
+		switch (o->line_buffer[n].sign) {
+		case '+':
+			sign = '+';
+			hm = o->added_lines;
+			break;
+		case '-':
+			sign = '-';
+			hm = o->deleted_lines;
+			break;
+		case ' ':
+		default:
+			prev_line = NULL;
+			continue;
+		}
+
+		key = prepare_entry(o, n);
+		if (prev_line &&
+		    prev_line->line->sign == sign)
+			prev_line->next_line = key;
+
+		hashmap_add(hm, key);
+		prev_line = key;
+	}
+}
+
+static void mark_color_as_moved(struct diff_options *o)
+{
+	struct moved_entry **pmb = NULL; /* potentially moved blocks */
+	int pmb_nr = 0, pmb_alloc = 0;
+	int alt_flag = 0;
+	int n;
+
+	for (n = 0; n < o->line_buffer_nr; n++) {
+		struct hashmap *hm = NULL;
+		struct moved_entry *key;
+		struct moved_entry *match = NULL;
+		struct buffered_patch_line *l = &o->line_buffer[n];
+		int i, lp, rp;
+
+		switch (l->sign) {
+		case '+':
+			hm = o->deleted_lines;
+			break;
+		case '-':
+			hm = o->added_lines;
+			break;
+		default:
+			alt_flag = 0; /* reset to standard, no-alt move color */
+			pmb_nr = 0; /* no running sets */
+			continue;
+		}
+
+		/* Check for any match to color it as a move. */
+		key = prepare_entry(o, n);
+		match = hashmap_get(hm, key, o);
+		free(key);
+		if (!match)
+			continue;
+
+		/* Check any potential block runs, advance each or nullify */
+		for (i = 0; i < pmb_nr; i++) {
+			struct moved_entry *p = pmb[i];
+			struct moved_entry *pnext = (p && p->next_line) ?
+					p->next_line : NULL;
+			if (pnext &&
+			    !buffered_patch_line_cmp(pnext->line, l, o)) {
+				pmb[i] = p->next_line;
+			} else {
+				pmb[i] = NULL;
+			}
+		}
+
+		/* Shrink the set to the remaining runs */
+		for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
+			while (lp < pmb_nr && pmb[lp])
+				lp ++;
+			/* lp points at the first NULL now */
+
+			while (rp > -1 && !pmb[rp])
+				rp--;
+			/* rp points at the last non-NULL */
+
+			if (lp < pmb_nr && rp > -1 && lp < rp) {
+				pmb[lp] = pmb[rp];
+				pmb[rp] = NULL;
+				rp--;
+				lp++;
+			}
+		}
+
+		if (rp > -1) {
+			/* Remember the number of running sets */
+			pmb_nr = rp + 1;
+		} else {
+			/* Toggle color */
+			alt_flag = (alt_flag + 1) % 2;
+
+			/* Build up a new set */
+			pmb_nr = 0;
+			for (; match; match = hashmap_get_next(hm, match)) {
+				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+				pmb[pmb_nr++] = match;
+			}
+		}
+
+		switch (l->sign) {
+		case '+':
+			l->set = diff_get_color_opt(o, DIFF_FILE_NEW_MOVED + alt_flag);
+			break;
+		case '-':
+			l->set = diff_get_color_opt(o, DIFF_FILE_OLD_MOVED + alt_flag);
+			break;
+		default:
+			die("BUG: we should have continued earlier?");
+		}
+	}
+	free(pmb);
+}
+
 static void emit_buffered_patch_line(struct diff_options *o,
 				     struct buffered_patch_line *e)
 {
@@ -3518,6 +3746,8 @@ void diff_setup(struct diff_options *options)
 	options->line_buffer = NULL;
 	options->line_buffer_nr = 0;
 	options->line_buffer_alloc = 0;
+
+	options->color_moved = diff_color_moved_default;
 }
 
 void diff_setup_done(struct diff_options *options)
@@ -3627,6 +3857,9 @@ void diff_setup_done(struct diff_options *options)
 
 	if (DIFF_OPT_TST(options, FOLLOW_RENAMES) && options->pathspec.nr != 1)
 		die(_("--follow requires exactly one pathspec"));
+
+	if (!options->use_color || external_diff())
+		options->color_moved = 0;
 }
 
 static int opt_arg(const char *arg, int arg_short, const char *arg_long, int *val)
@@ -4051,6 +4284,10 @@ int diff_opt_parse(struct diff_options *options,
 	}
 	else if (!strcmp(arg, "--no-color"))
 		options->use_color = 0;
+	else if (!strcmp(arg, "--color-moved"))
+		options->color_moved = 1;
+	else if (!strcmp(arg, "--no-color-moved"))
+		options->color_moved = 0;
 	else if (!strcmp(arg, "--color-words")) {
 		options->use_color = 1;
 		options->word_diff = DIFF_WORDS_COLOR;
@@ -4856,16 +5093,19 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 {
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
-	/*
-	 * For testing purposes we want to make sure the diff machinery
-	 * works completely with the buffer. If there is anything emitted
-	 * outside the emit_buffered_patch_line, then the order is screwed
-	 * up and the tests will fail.
-	 *
-	 * TODO (later in this series):
-	 * We'll unset this flag in a later patch.
-	 */
-	o->use_buffer = 1;
+
+	if (o->color_moved) {
+		unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+		o->use_buffer = 1;
+		o->deleted_lines = xmallocz(sizeof(*o->deleted_lines));
+		o->added_lines = xmallocz(sizeof(*o->added_lines));
+		hashmap_init(o->deleted_lines, ignore_ws ?
+			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+			(hashmap_cmp_fn)moved_entry_cmp, 0);
+		hashmap_init(o->added_lines, ignore_ws ?
+			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+			(hashmap_cmp_fn)moved_entry_cmp, 0);
+	}
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -4874,6 +5114,11 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 	}
 
 	if (o->use_buffer) {
+		if (o->color_moved) {
+			add_lines_to_move_detection(o);
+			mark_color_as_moved(o);
+		}
+
 		for (i = 0; i < o->line_buffer_nr; i++)
 			emit_buffered_patch_line(o, &o->line_buffer[i]);
 
@@ -4962,6 +5207,7 @@ void diff_flush(struct diff_options *options)
 		if (!options->file)
 			die_errno("Could not open /dev/null");
 		options->close_file = 1;
+		options->color_moved = 0;
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
 			if (check_pair_status(p))
diff --git a/diff.h b/diff.h
index f9fd0ea3ae..2d86e3a012 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "hashmap.h"
 
 struct rev_info;
 struct diff_options;
@@ -227,6 +228,10 @@ struct diff_options {
 
 	struct buffered_patch_line *line_buffer;
 	int line_buffer_nr, line_buffer_alloc;
+
+	int color_moved;
+	struct hashmap *deleted_lines;
+	struct hashmap *added_lines;
 };
 
 void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
@@ -241,7 +246,11 @@ enum color_diff {
 	DIFF_FILE_NEW = 5,
 	DIFF_COMMIT = 6,
 	DIFF_WHITESPACE = 7,
-	DIFF_FUNCINFO = 8
+	DIFF_FUNCINFO = 8,
+	DIFF_FILE_OLD_MOVED = 9,
+	DIFF_FILE_OLD_MOVED_ALT = 10,
+	DIFF_FILE_NEW_MOVED = 11,
+	DIFF_FILE_NEW_MOVED_ALT = 12
 };
 const char *diff_get_color(int diff_use_color, enum color_diff ix);
 #define diff_get_color_opt(o, ix) \
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 289806d0c7..232d9ad55e 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -972,4 +972,233 @@ test_expect_success 'option overrides diff.wsErrorHighlight' '
 
 '
 
+test_expect_success 'detect moved code, complete file' '
+	git reset --hard &&
+	cat <<-\EOF >test.c &&
+	#include<stdio.h>
+	main()
+	{
+	printf("Hello World");
+	}
+	EOF
+	git add test.c &&
+	git commit -m "add main function" &&
+	git mv test.c main.c &&
+	git diff HEAD --color-moved --no-renames | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>new file mode 100644<RESET>
+	<BOLD>index 0000000..a986c57<RESET>
+	<BOLD>--- /dev/null<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -0,0 +1,5 @@<RESET>
+	<BGREEN>+<RESET><BGREEN>#include<stdio.h><RESET>
+	<BGREEN>+<RESET><BGREEN>main()<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BGREEN>+<RESET><BGREEN>printf("Hello World");<RESET>
+	<BGREEN>+<RESET><BGREEN>}<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>deleted file mode 100644<RESET>
+	<BOLD>index a986c57..0000000<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ /dev/null<RESET>
+	<CYAN>@@ -1,5 +0,0 @@<RESET>
+	<BRED>-#include<stdio.h><RESET>
+	<BRED>-main()<RESET>
+	<BRED>-{<RESET>
+	<BRED>-printf("Hello World");<RESET>
+	<BRED>-}<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'detect moved code, inside file' '
+	git reset --hard &&
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			if (!u->is_allowed_foo)
+				return;
+			foo(u);
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git add main.c test.c &&
+	git commit -m "add main and test file" &&
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			if (!u->is_allowed_foo)
+				return;
+			foo(u);
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>index 27a619c..7cf9336 100644<RESET>
+	<BOLD>--- a/main.c<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
+	 printf("World\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BRED>-int secure_foo(struct user *u)<RESET>
+	<BRED>-{<RESET>
+	<BRED>-if (!u->is_allowed_foo)<RESET>
+	<BRED>-return;<RESET>
+	<BRED>-foo(u);<RESET>
+	<BRED>-}<RESET>
+	<BRED>-<RESET>
+	 int main()<RESET>
+	 {<RESET>
+	 foo();<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>index 1dc1d85..e34eb69 100644<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ b/test.c<RESET>
+	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
+	 printf("Hello World, but different\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
+	<BGREEN>+<RESET><BGREEN>return;<RESET>
+	<BGREEN>+<RESET><BGREEN>foo(u);<RESET>
+	<BGREEN>+<RESET><BGREEN>}<RESET>
+	<BGREEN>+<RESET>
+	 int another_function()<RESET>
+	 {<RESET>
+	 bar();<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'detect permutations inside moved code, ' '
+	# reusing the move example from last test:
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			foo(u);
+			if (!u->is_allowed_foo)
+				return;
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>index 27a619c..7cf9336 100644<RESET>
+	<BOLD>--- a/main.c<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
+	 printf("World\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BRED>-int secure_foo(struct user *u)<RESET>
+	<BRED>-{<RESET>
+	<BOLD;RED>-if (!u->is_allowed_foo)<RESET>
+	<BOLD;RED>-return;<RESET>
+	<BRED>-foo(u);<RESET>
+	<BOLD;RED>-}<RESET>
+	<BOLD;RED>-<RESET>
+	 int main()<RESET>
+	 {<RESET>
+	 foo();<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>index 1dc1d85..2bedec9 100644<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ b/test.c<RESET>
+	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
+	 printf("Hello World, but different\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BOLD;GREEN>+<RESET><BOLD;GREEN>foo(u);<RESET>
+	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
+	<BGREEN>+<RESET><BGREEN>return;<RESET>
+	<BOLD;GREEN>+<RESET><BOLD;GREEN>}<RESET>
+	<BOLD;GREEN>+<RESET>
+	 int another_function()<RESET>
+	 {<RESET>
+	 bar();<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* Re: [PATCHv3 04/20] diff.c: teach emit_line_0 to accept sign parameter
  2017-05-18 19:37     ` [PATCHv3 04/20] diff.c: teach emit_line_0 to accept sign parameter Stefan Beller
@ 2017-05-18 23:33       ` Jonathan Tan
  2017-05-22 23:36         ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Jonathan Tan @ 2017-05-18 23:33 UTC (permalink / raw)
  To: Stefan Beller; +Cc: bmwill, git, gitster, jrnieder, mhagger, peff

On Thu, 18 May 2017 12:37:30 -0700
Stefan Beller <sbeller@google.com> wrote:

> Teach emit_line_0 to take a "sign" parameter specifically intended
> to hold the sign of the line instead of a separate "first" parameter
> representing the first character of the line to be printed.  Callers
> that store the sign and line separately can use the "sign" parameter
> like they used the "first" parameter previously, and callers that
> store the sign and line together (or do not have a sign) no longer
> need to manipulate their arguments to fit the requirements of
> emit_line_0.

I know I suggested the paragraph above, but after rereading your patch
set, I think I finally understand what you're trying to accomplish.
I think it's better to combine patches 4/20, 5/20, and 6/20, with the
following commit message:

  diff: introduce more flexible emit function

  Currently, diff output is written either through the emit_line_0
  function or through the FILE * in struct diff_options directly. To
  make it easier to teach diff to buffer its output (which will be done
  in a subsequent commit), introduce a more flexible emit() function. In
  this commit, direct usages of emit_line_0() are replaced with emit();
  subsequent commits will also replace usages of the FILE * with emit().

And the function itself can be documented this way (with the appropriate
formatting):

  /*
  Emits the following line or part of line. It is expected that "set"
  and "reset", if not NULL, should contain terminal color codes (or
  markup denoting color) and that "sign", if not 0, should contain '+',
  '-', or ' '. But those are not requirements.
  */
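
For illustration only, here is a minimal sketch of a function that fits the
comment above. It formats into a caller-supplied buffer rather than writing
to a FILE *. The name format_line and its signature are assumptions made up
for this example, not the emit() from the series, and the placement of
"reset" relative to a trailing newline is deliberately left out:

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: format "<set><sign><line><reset>" into out.
 * "set" and "reset" may be NULL and "sign" may be 0, in which case
 * they contribute nothing. Returns the length that was (or would
 * have been) written, like snprintf().
 */
static int format_line(char *out, size_t outsz,
		       const char *set, const char *reset,
		       int sign, const char *line, size_t len)
{
	char signbuf[2] = { 0, 0 };

	if (sign)
		signbuf[0] = (char)sign;

	return snprintf(out, outsz, "%s%s%.*s%s",
			set ? set : "",
			signbuf,
			(int)len, line,
			reset ? reset : "");
}
```

A caller that stores sign and line separately passes '+', '-' or ' ' as
"sign"; a caller that stores them together passes 0 and the whole line.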

If you do all that, then the buffering patch (19/20) can be improved by
adding this comment somewhere in the file:

  Buffer the diff output into ??? instead of immediately writing it to
  "file". 

  NEEDSWORK: The contents of the ??? array - in particular, how the diff
  output is divided into array elements - is not precisely defined; some
  functions may emit a line all at once (resulting in one element)
  whereas some others may emit a line piecemeal (resulting in more than
  one element). Ideally, the code in this file should be structured so
  that we do not have such imprecision, but in the meantime, callers
  that request buffering should ensure that the diff output is divided
  the way they expect (and have tests to ensure that it remains so).

> With this patch other callers hard code the sign (which are '+', '-',
> ' ' and '\\') such that we do not run into unexpectedly emitting an
> erroneous '\0'.

I still don't understand this paragraph - can you rewrite this in the
imperative tense?
 
> The audit of the caller revealed that the sign cannot be '\n' or '\r',
> so remove that condition for trailing newline or carriage return in
> the sign; the else part of the condition handles the len==0 perfectly,
> so we can drop the if/else construct.
> 
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  diff.c | 39 ++++++++++++++++-----------------------
>  1 file changed, 16 insertions(+), 23 deletions(-)
[snip]

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCHv3 20/20] diff.c: color moved lines differently
  2017-05-18 19:37     ` [PATCHv3 20/20] diff.c: color moved lines differently Stefan Beller
@ 2017-05-19 18:23       ` Jonathan Tan
  2017-05-19 18:40         ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Jonathan Tan @ 2017-05-19 18:23 UTC (permalink / raw)
  To: Stefan Beller; +Cc: bmwill, git, gitster, jrnieder, mhagger, peff

On Thu, 18 May 2017 12:37:46 -0700
Stefan Beller <sbeller@google.com> wrote:

[snip]

> Instead this provides a dynamic programming greedy algorithm that

Not sure if this is called "dynamic programming".

> finds the largest moved hunk and then switches color to the
> alternative color for the next hunk. By doing this any permutation is
> recognized and displayed. That implies that there is no dedicated
> boundary or inside-hunk color, but instead we'll have just two colors
> alternating for hunks.

[snip]

I would title this "color moved blocks differently" to emphasize that we
are treating the moves in terms of blocks, not just lines.

The first part of the commit message could probably be written more
concisely, like the following:

  When a patch consists mostly of moving blocks of code around, it can
  be quite tedious to ensure that the blocks are moved verbatim, and not
  undesirably modified in the move. To that end, color blocks that are
  moved within the same patch differently. For example (OM, del, add,
  and NM are different colors):

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OM]  -        if (!is_authorized_user())
    [OM]  -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OM]  -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NM]  +        sensitive_stuff(spanning,
    [NM]  +                        multiple,
    [NM]  +                        lines);
    [NM]  +}

  Adjacent blocks are colored differently. For example, in this
  potentially malicious patch, the swapping of blocks can be spotted:

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OMA] -        if (!is_authorized_user())
    [OMA] -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OMA] -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NMA] +        sensitive_stuff(spanning,
    [NMA] +                        multiple,
    [NMA] +                        lines);
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NMA] +}

Having said that, thanks - this version is much more like what I would
expect.

> +static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
> +					 const struct buffered_patch_line *b,
> +					 const void *keydata)
> +{
> +	int ret;
> +	struct strbuf sba = STRBUF_INIT;
> +	struct strbuf sbb = STRBUF_INIT;
> +
> +	get_ws_cleaned_string(a, &sba);
> +	get_ws_cleaned_string(b, &sbb);
> +	ret = sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);
> +	strbuf_release(&sba);
> +	strbuf_release(&sbb);
> +	return ret;
> +}
> +
> +static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
> +				   const struct buffered_patch_line *b,
> +				   const void *keydata)
> +{
> +	return a->len != b->len || strncmp(a->line, b->line, a->len);
> +}

Instead of having 2 versions of all the comparison functions, could the
ws-ness be passed as the keydata?

> +static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_ws)
> +{
> +	static struct strbuf sb = STRBUF_INIT;
> +
> +	if (ignore_ws) {
> +		strbuf_reset(&sb);
> +		get_ws_cleaned_string(line, &sb);

Memory leak here, I think.

> +		return memhash(sb.buf, sb.len);
> +	} else {
> +		return memhash(line->line, line->len);
> +	}
> +}

[snip]

> +static void add_lines_to_move_detection(struct diff_options *o)
> +{
> +	struct moved_entry *prev_line;

gcc says (rightly) that this must be initialized.

> +
> +	int n;
> +	for (n = 0; n < o->line_buffer_nr; n++) {
> +		int sign = 0;
> +		struct hashmap *hm;
> +		struct moved_entry *key;
> +
> +		switch (o->line_buffer[n].sign) {
> +		case '+':
> +			sign = '+';
> +			hm = o->added_lines;
> +			break;
> +		case '-':
> +			sign = '-';
> +			hm = o->deleted_lines;
> +			break;
> +		case ' ':
> +		default:
> +			prev_line = NULL;
> +			continue;
> +		}
> +
> +		key = prepare_entry(o, n);
> +		if (prev_line &&
> +		    prev_line->line->sign == sign)
> +			prev_line->next_line = key;
> +
> +		hashmap_add(hm, key);
> +		prev_line = key;
> +	}
> +}
> +
> +static void mark_color_as_moved(struct diff_options *o)
> +{
> +	struct moved_entry **pmb = NULL; /* potentially moved blocks */
> +	int pmb_nr = 0, pmb_alloc = 0;
> +	int alt_flag = 0;

Probably call this "use_alt_color" or something similar.

> +	int n;
> +
> +	for (n = 0; n < o->line_buffer_nr; n++) {
> +		struct hashmap *hm = NULL;
> +		struct moved_entry *key;
> +		struct moved_entry *match = NULL;
> +		struct buffered_patch_line *l = &o->line_buffer[n];
> +		int i, lp, rp;
> +
> +		switch (l->sign) {
> +		case '+':
> +			hm = o->deleted_lines;
> +			break;
> +		case '-':
> +			hm = o->added_lines;
> +			break;
> +		default:
> +			alt_flag = 0; /* reset to standard, no-alt move color */
> +			pmb_nr = 0; /* no running sets */
> +			continue;
> +		}
> +
> +		/* Check for any match to color it as a move. */
> +		key = prepare_entry(o, n);
> +		match = hashmap_get(hm, key, o);
> +		free(key);
> +		if (!match)
> +			continue;
> +
> +		/* Check any potential block runs, advance each or nullify */
> +		for (i = 0; i < pmb_nr; i++) {
> +			struct moved_entry *p = pmb[i];
> +			struct moved_entry *pnext = (p && p->next_line) ?
> +					p->next_line : NULL;
> +			if (pnext &&
> +			    !buffered_patch_line_cmp(pnext->line, l, o)) {
> +				pmb[i] = p->next_line;
> +			} else {
> +				pmb[i] = NULL;
> +			}

Memory leak of pmb[i] somewhere here?

> +		}
> +
> +		/* Shrink the set to the remaining runs */
> +		for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
> +			while (lp < pmb_nr && pmb[lp])
> +				lp ++;
> +			/* lp points at the first NULL now */
> +
> +			while (rp > -1 && !pmb[rp])
> +				rp--;
> +			/* rp points at the last non-NULL */
> +
> +			if (lp < pmb_nr && rp > -1 && lp < rp) {
> +				pmb[lp] = pmb[rp];
> +				pmb[rp] = NULL;
> +				rp--;
> +				lp++;
> +			}
> +		}
> +
> +		if (rp > -1) {
> +			/* Remember the number of running sets */
> +			pmb_nr = rp + 1;
> +		} else {
> +			/* Toggle color */
> +			alt_flag = (alt_flag + 1) % 2;
> +
> +			/* Build up a new set */
> +			pmb_nr = 0;
> +			for (; match; match = hashmap_get_next(hm, match)) {
> +				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
> +				pmb[pmb_nr++] = match;
> +			}
> +		}
> +
> +		switch (l->sign) {
> +		case '+':
> +			l->set = diff_get_color_opt(o, DIFF_FILE_NEW_MOVED + alt_flag);
> +			break;
> +		case '-':
> +			l->set = diff_get_color_opt(o, DIFF_FILE_OLD_MOVED + alt_flag);
> +			break;
> +		default:
> +			die("BUG: we should have continued earlier?");
> +		}
> +	}
> +	free(pmb);
> +}

[snip]

> @@ -4874,6 +5114,11 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>  
>  	if (o->use_buffer) {
> +		if (o->color_moved) {

Can you just declare the two hashmaps here, so that we do not need to
put them in o? They don't seem to be used outside this block anyway.

> +			add_lines_to_move_detection(o);
> +			mark_color_as_moved(o);
> +		}
> +
>  		for (i = 0; i < o->line_buffer_nr; i++)
>  			emit_buffered_patch_line(o, &o->line_buffer[i]); 

[snip]

> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 289806d0c7..232d9ad55e 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh

As for the tests, also add a test checking the interaction with
whitespace highlighting, and a test showing that diff errors out if we
ask for both move coloring and word-by-word diffing.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCHv3 20/20] diff.c: color moved lines differently
  2017-05-19 18:23       ` Jonathan Tan
@ 2017-05-19 18:40         ` Stefan Beller
  2017-05-19 19:34           ` Jonathan Tan
  0 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-19 18:40 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Brandon Williams, git@vger.kernel.org, Junio C Hamano,
	Jonathan Nieder, Michael Haggerty, Jeff King

On Fri, May 19, 2017 at 11:23 AM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Thu, 18 May 2017 12:37:46 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
> [snip]
>
>> Instead this provides a dynamic programming greedy algorithm that
>
> Not sure if this is called "dynamic programming".

https://loveforprogramming.quora.com/Backtracking-Memoization-Dynamic-Programming
http://stackoverflow.com/questions/3592943/difference-between-back-tracking-and-dynamic-programming

Instead of doing backtracking (finding the longest hunk for
each line), we keep a set of potential hunks around; this sounds
very much like the examples given in these links.
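
A minimal sketch of that idea, for illustration and not the code in the
patch: lines are plain C strings, the old side is scanned linearly instead
of being looked up in a hashmap, and the names seed_candidates,
advance_candidates and mark_blocks are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Collect every index in old_lines whose text equals line. */
static size_t seed_candidates(const char **old_lines, size_t old_nr,
			      size_t *cand, const char *line)
{
	size_t i, nr = 0;

	for (i = 0; i < old_nr; i++)
		if (!strcmp(old_lines[i], line))
			cand[nr++] = i;
	return nr;
}

/*
 * Step every candidate to its following line and keep only those
 * whose following line equals line. Returns the number kept.
 */
static size_t advance_candidates(const char **old_lines, size_t old_nr,
				 size_t *cand, size_t cand_nr,
				 const char *line)
{
	size_t i, kept = 0;

	for (i = 0; i < cand_nr; i++) {
		size_t next = cand[i] + 1;

		if (next < old_nr && !strcmp(old_lines[next], line))
			cand[kept++] = next;
	}
	return kept;
}

/*
 * For each new line store -1 (no match on the old side) or 0/1,
 * alternating every time a new block has to be started.
 * cand is scratch space with room for old_nr entries.
 */
static void mark_blocks(const char **old_lines, size_t old_nr,
			const char **new_lines, size_t new_nr,
			size_t *cand, int *color)
{
	size_t n, cand_nr = 0;
	int block = -1;

	for (n = 0; n < new_nr; n++) {
		cand_nr = advance_candidates(old_lines, old_nr,
					     cand, cand_nr, new_lines[n]);
		if (!cand_nr) {
			/* No running block continues here; try to start one. */
			cand_nr = seed_candidates(old_lines, old_nr,
						  cand, new_lines[n]);
			if (!cand_nr) {
				color[n] = -1;
				continue;
			}
			block++;
		}
		color[n] = block % 2;
	}
}
```

Each new line is looked at once; the candidate set only shrinks until no
candidate survives, at which point it is rebuilt and the color flips.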


> The first part of the commit message could probably be written more
> concisely, like the following:
...
> Having said that, thanks - this version is much more like what I would
> expect.

Thanks for giving a more concise commit message, will fix in a reroll.


>
>> +static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
>
>> +static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
>
> Instead of having 2 versions of all the comparison functions, could the
> ws-ness be passed as the keydata?

No, that would be a misuse of the API; peff explains:

https://public-inbox.org/git/20170513085050.plmau5ffvzn6ibfp@sigill.intra.peff.net/



>
>> +static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_ws)
>> +{
>> +     static struct strbuf sb = STRBUF_INIT;
>> +
>> +     if (ignore_ws) {
>> +             strbuf_reset(&sb);
>> +             get_ws_cleaned_string(line, &sb);
>
> Memory leak here, I think.

It's static, so we don't care.
I can make it non-static and release the memory in a resend.

>
>> +             return memhash(sb.buf, sb.len);
>> +     } else {
>> +             return memhash(line->line, line->len);
>> +     }
>> +}
>
> [snip]
>
>> +static void add_lines_to_move_detection(struct diff_options *o)
>> +{
>> +     struct moved_entry *prev_line;
>
> gcc says (rightly) that this must be initialized.

This was one of the last refactorings I did on this patch: I moved
prev_line out of the diff_options struct (which is memset in its
init) and forgot to initialize it here. Will fix.

>> +     int alt_flag = 0;
>
> Probably call this "use_alt_color" or something similar.

Sounds better than alt_flag.

>> +                     struct moved_entry *p = pmb[i];
>> +                     struct moved_entry *pnext = (p && p->next_line) ?
>> +                                     p->next_line : NULL;
>> +                     if (pnext &&
>> +                         !buffered_patch_line_cmp(pnext->line, l, o)) {
>> +                             pmb[i] = p->next_line;
>> +                     } else {
>> +                             pmb[i] = NULL;
>> +                     }
>
> Memory leak of pmb[i] somewhere here?

pmb[] holds pointers to moved_entry elements that
are obtained via hashmap_get_next(hm, match), such that
any pmb[] element is also part of a hashmap.

When freeing the hashmap, we'll free the memory. This
array doesn't own the underlying memory.
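
As a small sketch of that ownership rule (the struct and the name
compact_borrowed are hypothetical, chosen only for this example): an array
of borrowed pointers can drop or overwrite slots freely, and only the array
itself is ever freed by its holder.

```c
#include <stddef.h>

struct entry {
	int id;
};

/*
 * Remove NULL slots from an array of borrowed pointers, keeping the
 * order of the survivors. Nothing is freed: the entries belong to
 * whoever allocated them. Returns the new number of elements.
 */
static size_t compact_borrowed(struct entry **arr, size_t nr)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++)
		if (arr[i])
			arr[kept++] = arr[i];
	return kept;
}
```

Unlike the shrink loop in the patch, which fills holes from the end of the
array, this variant preserves order; either way the dropped entries stay
alive in their owner.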

>> @@ -4874,6 +5114,11 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>>
>>       if (o->use_buffer) {
>> +             if (o->color_moved) {
>
> Can you just declare the two hashmaps here, so that we do not need to
> put them in o? They don't seem to be used outside this block anyway.

Obviously. Thanks for that pointer as well.


>> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
>> index 289806d0c7..232d9ad55e 100755
>> --- a/t/t4015-diff-whitespace.sh
>> +++ b/t/t4015-diff-whitespace.sh
>
> As for the tests, also add a test checking the interaction with
> whitespace highlighting, and a test showing that diff errors out if we
> ask for both move coloring and word-by-word diffing.

We do not error out; the move coloring is effectively ignored because
the heuristic doesn't find any blocks. I can make it error out instead
(and add tests).

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCHv3 20/20] diff.c: color moved lines differently
  2017-05-19 18:40         ` Stefan Beller
@ 2017-05-19 19:34           ` Jonathan Tan
  0 siblings, 0 replies; 128+ messages in thread
From: Jonathan Tan @ 2017-05-19 19:34 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Brandon Williams, git@vger.kernel.org, Junio C Hamano,
	Jonathan Nieder, Michael Haggerty, Jeff King

On Fri, May 19, 2017 at 11:40 AM, Stefan Beller <sbeller@google.com> wrote:
>>> +static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_ws)
>>> +{
>>> +     static struct strbuf sb = STRBUF_INIT;
>>> +
>>> +     if (ignore_ws) {
>>> +             strbuf_reset(&sb);
>>> +             get_ws_cleaned_string(line, &sb);
>>
>> Memory leak here, I think.
>
> It's static, so we don't care.
> I can make it non-static and release the memory in a resend.

Ah, I missed the "static". It seems that "static" is used elsewhere
too, so these functions are not reentrant anyway, so this is fine.
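
To spell out the trade-off with a small sketch (an analogy only: a fixed
char array stands in for the static strbuf, and `strip_spaces` is a
hypothetical helper, not a function from diff.c):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return `in` with spaces and tabs removed. The result lives in a
 * static buffer, so nothing leaks, but every call overwrites the
 * previous result and the function is not reentrant.
 */
static const char *strip_spaces(const char *in)
{
	static char buf[256];
	size_t n = 0;

	assert(in != NULL);
	for (; *in && n < sizeof(buf) - 1; in++)
		if (*in != ' ' && *in != '\t')
			buf[n++] = *in;
	buf[n] = '\0';
	return buf;
}
```

A caller that needs two results at the same time has to copy the first
one before calling again.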

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-18 17:12           ` Stefan Beller
@ 2017-05-20  4:50             ` Junio C Hamano
  2017-05-20 22:00               ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Junio C Hamano @ 2017-05-20  4:50 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Jonathan Nieder, Jonathan Tan,
	Brandon Williams, Jeff King, Michael Haggerty

Stefan Beller <sbeller@google.com> writes:

> That could be added in ws.c:ws_check_emit, as these certain words
> are similar to coloring whitespace.

I actually was envisioning of highlighting a part of a line, like

    -Very <red>poor</red> SCM
    +Very <green>nice</green> SCM

which would be done by finding semi-matching removed and added lines
in the same hunk (i.e. local buffering) and makes a coloring decision.
That does not have any place in ws.c.

>> Having said that, we need to start somewhere, and I think it is a
>> reasonable first-cut attempt to work on top of the textual output
>> like this series does (IOW, while I do agree with the NEEDSWORK and
>> the way this series currently does things must be revamped in the
>> longer term, I do not think we should wait until that happens to
>> start playing with this topic).
>
> Ok. I share a similar reaction to submodule diffs that we discuss above
> and word coloring, that Jonathan Tan brought up off list.
>
> Both of them are broken in this implementation, but the NEEDSWORK
> would hint at how to fix them.

Yes, but if NEEDSWORK has to say "the current hack is working at a
wrong level, we need to do all of this before producing textual
diffs that are passed to the layer that colors lines", that wouldn't
help that much as a hint X-<.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-20  4:50             ` Junio C Hamano
@ 2017-05-20 22:00               ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-20 22:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Jonathan Nieder, Jonathan Tan,
	Brandon Williams, Jeff King, Michael Haggerty

On Fri, May 19, 2017 at 9:50 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> That could be added in ws.c:ws_check_emit, as these certain words
>> are similar to coloring whitespace.
>
> I actually was envisioning of highlighting a part of a line, like
>
>     -Very <red>poor</red> SCM
>     +Very <green>nice</green> SCM
>
> which would be done by finding semi-matching removed and added lines
> in the same hunk (i.e. local buffering) and makes a coloring decision.
> That does not have any place in ws.c.

Yes, such a feature does not belong in ws.c.

For the problem above, we're still fine with the given data structures IMO.
Though it may hint at bad naming of the struct 'buffered_patch_line'
as it is not a complete line.

Assuming the example above, running "show --word-diff" would produce
the following "buffered_patch_lines"

{show_prefix=1, sign='\0', line="-Very ", set=NULL, reset=NULL}
{show_prefix=0, sign='\0', line="{- poor}", set='red', reset='normal'}
{show_prefix=0, sign='\0', line="{+ nice}", set='green', reset='normal'}
{show_prefix=0, sign='\0', line=" SCM\n", set=NULL, reset=NULL}

so in the future, we may want to produce

{show_prefix=1, sign='-', line="Very ", set=NULL, reset=NULL}
{show_prefix=0, sign='\0', line="poor", set='red', reset='normal'}
{show_prefix=0, sign='\0', line=" SCM\n", set=NULL, reset=NULL}
{show_prefix=1, sign='+', line="Very ", set=NULL, reset=NULL}
{show_prefix=0, sign='\0', line="nice", set='green', reset='normal'}
{show_prefix=0, sign='\0', line=" SCM\n", set=NULL, reset=NULL}

instead.

I think the data structure can be used as-is, but is just mis-named.
I'll fix that in a resend.
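
A minimal sketch of how such pieces could be stitched back into output,
assuming the field names from the listing above (`struct piece` and
`render_pieces` are hypothetical, and "<red>"-style tags stand in for the
real color escape sequences):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the fields shown in the listing above. */
struct piece {
	int show_prefix;	/* emit the line prefix before this piece? */
	char sign;		/* '+', '-', ' ' or '\0' for "no sign" */
	const char *line;	/* text; may be only part of a line */
	const char *set;	/* color on, or NULL */
	const char *reset;	/* color off, or NULL */
};

/*
 * Concatenate pieces into out as [prefix][set][sign]line[reset].
 * Stops at the first piece that does not fit. Returns the number of
 * bytes written for the pieces that fit completely.
 */
static size_t render_pieces(const struct piece *p, size_t nr,
			    const char *prefix, char *out, size_t outsz)
{
	size_t i, used = 0;

	assert(out != NULL && outsz > 0);
	out[0] = '\0';
	for (i = 0; i < nr; i++) {
		/* a '\0' sign becomes an empty string, i.e. no output */
		char sign[2] = { p[i].sign, '\0' };
		int n = snprintf(out + used, outsz - used, "%s%s%s%s%s",
				 p[i].show_prefix ? prefix : "",
				 p[i].set ? p[i].set : "",
				 sign, p[i].line,
				 p[i].reset ? p[i].reset : "");

		if (n < 0 || (size_t)n >= outsz - used)
			break;
		used += (size_t)n;
	}
	return used;
}
```

With the three "deleted" pieces from the second listing this yields the
"-Very <red>poor</red> SCM" line from Junio's example.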

The algorithm to highlight what changed on a word level after a hunk
would have to be put into the current hunk finding
("mark_color_as_moved"), and then split up the actual lines into pieces
of a line and add different colors like we see above.

>
>>> Having said that, we need to start somewhere, and I think it is a
>>> reasonable first-cut attempt to work on top of the textual output
>>> like this series does (IOW, while I do agree with the NEEDSWORK and
>>> the way this series currently does things must be revamped in the
>>> longer term, I do not think we should wait until that happens to
>>> start playing with this topic).
>>
>> Ok. I share a similar reaction to submodule diffs that we discuss above
>> and word coloring, that Jonathan Tan brought up off list.
>>
>> Both of them are broken in this implementation, but the NEEDSWORK
>> would hint at how to fix them.
>
> Yes, but if NEEDSWORK has to say "the current hack is working at a
> wrong level, we need to do all of this before producing textual
> diffs that are passed to the layer that colors lines", that wouldn't
> help that much as a hint X-<.

For the word coloring, I think we'd just need better post processing.

For cross-submodule move detection, we may want to wait for Brandon's
work to be able to run submodule stuff in-process; then we would
have the lines buffered directly there, so we can operate on them as well.

I agree that "NEEDSWORK: the current hack is working at a wrong level"
is not useful. But I thought we're in agreement that it is not? I must have
misunderstood parts of what you were saying.

By saying it could go to ws.c in my reply I rather meant:
it could go into some post-processing function of the "buffered_patch_line"
struct. Currently we only have ws_emit_line that does it, but we can do
it either similarly or just split up one buffered_patch_line into more
than one and color up each individually.

This would also be possible for us now, too. Instead of running it through
ws_emit_line we could split up the line and color each piece differently.
However that could be a problem in the algorithm to find similar hunks
(as we do have different rules for added and deleted text).

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCHv3 04/20] diff.c: teach emit_line_0 to accept sign parameter
  2017-05-18 23:33       ` Jonathan Tan
@ 2017-05-22 23:36         ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-22 23:36 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Brandon Williams, git@vger.kernel.org, Junio C Hamano,
	Jonathan Nieder, Michael Haggerty, Jeff King

On Thu, May 18, 2017 at 4:33 PM, Jonathan Tan <jonathantanmy@google.com> wrote:

> I know I suggested the paragraph above, but after rereading your patch
> set, I think I finally understand what you're trying to accomplish.
> I think it's better to combine patches 4/20, 5/20, and 6/20, with the
> following commit message:

and patch 7/20 (diff.c: inline emit_line_0 into emit_line)

>   diff: introduce more flexible emit function
>
>   Currently, diff output is written either through the emit_line_0
>   function or through the FILE * in struct diff_options directly. To
>   make it easier to teach diff to buffer its output (which will be done
>   in a subsequent commit), introduce a more flexible emit() function. In
>   this commit, direct usages of emit_line_0() are replaced with emit();
>   subsequent commits will also replace usages of the FILE * with emit().

ok. sounds reasonable to me. I kept them separate for easier review
originally, but I can certainly combine them again.

>
> And the function itself can be documented this way (with the appropriate
> formatting):

I have some documentation in 19/20 in diff.h for the data structure stored.
We'd want to discuss where to put the documentation. I'd not like to add
such documentation to some static function in diff.c but rather only document
the data structure, and then (after in-lining <...>_0 to emit_line) keep the
function documentation rather short as:

/* see struct buffered_patch_line */
static void emit_line(...lots of args...)
{
    /* work on said data structure */
    ...


> If you do all that, then the buffering patch (19/20) can be improved by
> adding this comment somewhere in the file:
>
>   Buffer the diff output into ??? instead of immediately writing it to
>   "file".
>
>   NEEDSWORK: The contents of the ??? array - in particular, how the diff
>   output is divided into array elements - is not precisely defined; some
>   functions may emit a line all at once (resulting in one element)
>   whereas some others may emit a line piecemeal (resulting in more than
>   one element). Ideally, the code in this file should be structured so
>   that we do not have such imprecision, but in the meantime, callers
>   that request buffering should ensure that the diff output is divided
>   the way they expect (and have tests to ensure that it remains so).

I want to rename that struct to "buffered_diff_piece" or
"partial_diff_output", not implying we have a line or another
specific piece.

Rethinking the discussion with Junio, what the correct abstraction level is,
we could even name it "one_color_piece", or "colored_diff_part", to
imply we'd want to have as much content as possible in the same
color. (And line endings always being in RESET color, we'd have a
natural separation at EOL)
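
To illustrate the "line ending in RESET color" point, here is a sketch of
the usual pattern (hypothetical helper `format_colored`, writing into a
buffer rather than a FILE *; it assumes set and reset are non-NULL
strings):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one colored piece into out as [set]text[reset][CR][LF].
 * A trailing "\n" or "\r\n" is split off first, so the color is
 * always reset before the line ending is written.
 * Returns what snprintf returns.
 */
static int format_colored(char *out, size_t outsz,
			  const char *set, const char *reset,
			  const char *line, size_t len)
{
	int has_nl, has_cr;

	assert(out != NULL && line != NULL);
	has_nl = len > 0 && line[len - 1] == '\n';
	if (has_nl)
		len--;
	has_cr = len > 0 && line[len - 1] == '\r';
	if (has_cr)
		len--;

	return snprintf(out, outsz, "%s%.*s%s%s%s",
			set, (int)len, line, reset,
			has_cr ? "\r" : "", has_nl ? "\n" : "");
}
```

Because the reset always comes before the newline, a colored piece never
extends across EOL, which gives the natural separation mentioned above.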

>
>> With this patch other callers hard code the sign (which are '+', '-',
>> ' ' and '\\') such that we do not run into unexpectedly emitting an
>> erroneous '\0'.
>
> I still don't understand this paragraph - can you rewrite this in the
> imperative tense?

In this patch the function signature stays the same, but the meaning of
one of the arguments changes. One could imagine a caller passing
first='\0' for some interesting data or mode of operation, which
previously instructed the function to literally output a '\0'. We no
longer do that, as the meaning changed.

However, this hypothetical subtle bug was not introduced, because
there are no callers that call the function with a 'first' depending on
user-provided data or with an otherwise hard-coded expectation of a '\0' output.

Maybe I'll just drop this part.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* [PATCHv4 00/17] Diff machine: highlight moved lines.
  2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
                       ` (19 preceding siblings ...)
  2017-05-18 19:37     ` [PATCHv3 20/20] diff.c: color moved lines differently Stefan Beller
@ 2017-05-23  2:40     ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 01/17] diff: readability fix Stefan Beller
                         ` (17 more replies)
  20 siblings, 18 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

v4:
* interdiff to v3 (what is currently origin/sb/diff-color-move) below.
* renamed the "buffered_patch_line" to "diff_line". Originally I planned
  to not carry the "line" part as it can be a piece of a line as well.
  But for the intended functionality it is best to keep the name.
  If we'd want to add more functionality to say have a move detection
  for words as well, we'd rename the struct to have a better name then.
  For now diff_line is the best. (Thanks Jonathan Nieder!)
* tests to demonstrate it doesn't mess with --color-words as well as
  submodules. (Thanks Jonathan Tan!)
* added in the statics (Thanks Ramsay!)
* smaller scope for the hashmaps (Thanks Jonathan Tan!)
* some commit messages were updated, prior patch 4-7 is squashed into one
  (Thanks Jonathan Tan!)
* the tests added revealed an actual fault: now that the submodule process
  is not attached to a dupe of our stdout, it would stop coloring the
  output. We need to pass on use-color explicitly.
* updated the NEEDSWORK comment in the second last patch.

Thanks for bearing with me,
Stefan

v3:
* see interdiff below.
* fixing one invalid computation (Thanks Junio!)
* I reasoned more about submodule and word diffing, see the commit message
  of the last patch:
  
    A note on the options '--submodule=diff' and '--color-words/--word-diff':
    In the conversion to use emit_line in the prior patches both submodules
    as well as word diff output carefully chose to call emit_line with sign=0.
    All output with sign=0 is ignored for move detection purposes in this
    patch, such that no weird looking output will be generated for these
    cases. This leads to another thought: We could pass on '--color-moved' to
    submodules such that they color up moved lines for themselves. If we'd do
    so only line moves within a repository boundary are marked up.

* better name for emit_line outside of diff.[ch]

v2:
* emit_line now takes an argument that indicates if we want it
  to emit the line prefix as well. This should allow for a more faithful
  refactoring in the beginning. (Thanks Jonathan!)
* fixed memleaks (Thanks Brandon!)
* "git -c color.moved=true log -p" works now! (Thanks Jeff)
* interdiff below, though it is large.
* less intrusive than v1 (Thanks Jonathan!)

v1:

For details on *why* see the commit message of the last commit.

The first five patches are slight refactorings to get into good
shape, the next patches are funneling all output through emit_line_*.

The second last patch introduces an option to buffer up all output
before printing, and then the last patch can color up moved lines
of code.

Any feedback welcome.

Thanks,
Stefan

Stefan Beller (17):
  diff: readability fix
  diff: move line ending check into emit_hunk_header
  diff.c: factor out diff_flush_patch_all_file_pairs
  diff: introduce more flexible emit function
  diff.c: convert fn_out_consume to use emit_line
  diff.c: convert builtin_diff to use emit_line_*
  diff.c: convert emit_rewrite_diff to use emit_line_*
  diff.c: convert emit_rewrite_lines to use emit_line_*
  submodule.c: convert show_submodule_summary to use emit_line_fmt
  diff.c: convert emit_binary_diff_body to use emit_line_*
  diff.c: convert show_stats to use emit_line_*
  diff.c: convert word diffing to use emit_line_*
  diff.c: convert diff_flush to use emit_line_*
  diff.c: convert diff_summary to use emit_line_*
  diff.c: emit_line includes whitespace highlighting
  diff: buffer all output if asked to
  diff.c: color moved lines differently

 Documentation/config.txt   |  14 +-
 diff.c                     | 858 +++++++++++++++++++++++++++++++++------------
 diff.h                     |  59 +++-
 submodule.c                |  89 ++---
 submodule.h                |   9 +-
 t/t4015-diff-whitespace.sh | 267 ++++++++++++++
 6 files changed, 1018 insertions(+), 278 deletions(-)

diff --git a/diff.c b/diff.c
index b977a5d95b..23e70d348e 100644
--- a/diff.c
+++ b/diff.c
@@ -373,11 +373,11 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 
 struct moved_entry {
 	struct hashmap_entry ent;
-	const struct buffered_patch_line *line;
+	const struct diff_line *line;
 	struct moved_entry *next_line;
 };
 
-static void get_ws_cleaned_string(const struct buffered_patch_line *l,
+static void get_ws_cleaned_string(const struct diff_line *l,
 				  struct strbuf *out)
 {
 	int i;
@@ -388,8 +388,8 @@ static void get_ws_cleaned_string(const struct buffered_patch_line *l,
 	}
 }
 
-static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
-					 const struct buffered_patch_line *b,
+static int diff_line_cmp_no_ws(const struct diff_line *a,
+					 const struct diff_line *b,
 					 const void *keydata)
 {
 	int ret;
@@ -405,8 +405,8 @@ static int buffered_patch_line_cmp_no_ws(const struct buffered_patch_line *a,
 	return ret;
 }
 
-static int buffered_patch_line_cmp(const struct buffered_patch_line *a,
-				   const struct buffered_patch_line *b,
+static int diff_line_cmp(const struct diff_line *a,
+				   const struct diff_line *b,
 				   const void *keydata)
 {
 	return a->len != b->len || strncmp(a->line, b->line, a->len);
@@ -416,17 +416,17 @@ static int moved_entry_cmp(const struct moved_entry *a,
 			   const struct moved_entry *b,
 			   const void *keydata)
 {
-	return buffered_patch_line_cmp(a->line, b->line, keydata);
+	return diff_line_cmp(a->line, b->line, keydata);
 }
 
 static int moved_entry_cmp_no_ws(const struct moved_entry *a,
 				 const struct moved_entry *b,
 				 const void *keydata)
 {
-	return buffered_patch_line_cmp_no_ws(a->line, b->line, keydata);
+	return diff_line_cmp_no_ws(a->line, b->line, keydata);
 }
 
-static unsigned get_line_hash(struct buffered_patch_line *line, unsigned ignore_ws)
+static unsigned get_line_hash(struct diff_line *line, unsigned ignore_ws)
 {
 	static struct strbuf sb = STRBUF_INIT;
 
@@ -444,7 +444,7 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 {
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
-	struct buffered_patch_line *l = &o->line_buffer[line_no];
+	struct diff_line *l = &o->line_buffer[line_no];
 
 	ret->ent.hash = get_line_hash(l, ignore_ws);
 	ret->line = l;
@@ -615,7 +615,9 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void add_lines_to_move_detection(struct diff_options *o)
+static void add_lines_to_move_detection(struct diff_options *o,
+					struct hashmap *add_lines,
+					struct hashmap *del_lines)
 {
 	struct moved_entry *prev_line = NULL;
 
@@ -628,11 +630,11 @@ static void add_lines_to_move_detection(struct diff_options *o)
 		switch (o->line_buffer[n].sign) {
 		case '+':
 			sign = '+';
-			hm = o->added_lines;
+			hm = add_lines;
 			break;
 		case '-':
 			sign = '-';
-			hm = o->deleted_lines;
+			hm = del_lines;
 			break;
 		case ' ':
 		default:
@@ -650,29 +652,31 @@ static void add_lines_to_move_detection(struct diff_options *o)
 	}
 }
 
-static void mark_color_as_moved(struct diff_options *o)
+static void mark_color_as_moved(struct diff_options *o,
+				struct hashmap *add_lines,
+				struct hashmap *del_lines)
 {
 	struct moved_entry **pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
-	int alt_flag = 0;
+	int use_alt_color = 0;
 	int n;
 
 	for (n = 0; n < o->line_buffer_nr; n++) {
 		struct hashmap *hm = NULL;
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
-		struct buffered_patch_line *l = &o->line_buffer[n];
+		struct diff_line *l = &o->line_buffer[n];
 		int i, lp, rp;
 
 		switch (l->sign) {
 		case '+':
-			hm = o->deleted_lines;
+			hm = del_lines;
 			break;
 		case '-':
-			hm = o->added_lines;
+			hm = add_lines;
 			break;
 		default:
-			alt_flag = 0; /* reset to standard, no-alt move color */
+			use_alt_color = 0;
 			pmb_nr = 0; /* no running sets */
 			continue;
 		}
@@ -690,7 +694,7 @@ static void mark_color_as_moved(struct diff_options *o)
 			struct moved_entry *pnext = (p && p->next_line) ?
 					p->next_line : NULL;
 			if (pnext &&
-			    !buffered_patch_line_cmp(pnext->line, l, o)) {
+			    !diff_line_cmp(pnext->line, l, o)) {
 				pmb[i] = p->next_line;
 			} else {
 				pmb[i] = NULL;
@@ -720,7 +724,7 @@ static void mark_color_as_moved(struct diff_options *o)
 			pmb_nr = rp + 1;
 		} else {
 			/* Toggle color */
-			alt_flag = (alt_flag + 1) % 2;
+			use_alt_color = (use_alt_color + 1) % 2;
 
 			/* Build up a new set */
 			pmb_nr = 0;
@@ -732,10 +736,12 @@ static void mark_color_as_moved(struct diff_options *o)
 
 		switch (l->sign) {
 		case '+':
-			l->set = diff_get_color_opt(o, DIFF_FILE_NEW_MOVED + alt_flag);
+			l->set = diff_get_color_opt(o,
+				DIFF_FILE_NEW_MOVED + use_alt_color);
 			break;
 		case '-':
-			l->set = diff_get_color_opt(o, DIFF_FILE_OLD_MOVED + alt_flag);
+			l->set = diff_get_color_opt(o,
+				DIFF_FILE_OLD_MOVED + use_alt_color);
 			break;
 		default:
 			die("BUG: we should have continued earlier?");
@@ -744,8 +750,8 @@ static void mark_color_as_moved(struct diff_options *o)
 	free(pmb);
 }
 
-static void emit_buffered_patch_line(struct diff_options *o,
-				     struct buffered_patch_line *e)
+static void emit_diff_line(struct diff_options *o,
+				     struct diff_line *e)
 {
 	const char *ws;
 	int has_trailing_newline, has_trailing_carriage_return;
@@ -756,7 +762,7 @@ static void emit_buffered_patch_line(struct diff_options *o,
 		fputs(diff_line_prefix(o), file);
 
 	switch (e->state) {
-	case BPL_EMIT_LINE_WS:
+	case DIFF_LINE_WS:
 		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
 		if (e->set)
 			fputs(e->set, file);
@@ -767,7 +773,7 @@ static void emit_buffered_patch_line(struct diff_options *o,
 		ws_check_emit(e->line, e->len, o->ws_rule,
 			      file, e->set, e->reset, ws);
 		return;
-	case BPL_EMIT_LINE_ASIS:
+	case DIFF_LINE_ASIS:
 		has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
 		if (has_trailing_newline)
 			len--;
@@ -789,46 +795,46 @@ static void emit_buffered_patch_line(struct diff_options *o,
 		if (has_trailing_newline)
 			fputc('\n', file);
 		return;
-	case BPL_HANDOVER:
-		o->ws_rule = whitespace_rule(e->line); /*read from file, stored in line?*/
+	case DIFF_LINE_RELOAD_WS_RULE:
+		o->ws_rule = whitespace_rule(e->line);
 		return;
 	default:
 		die("BUG: malformatted buffered patch line: '%d'", e->state);
 	}
 }
 
-static void append_buffered_patch_line(struct diff_options *o,
-				       struct buffered_patch_line *e)
+static void append_diff_line(struct diff_options *o,
+				       struct diff_line *e)
 {
-	struct buffered_patch_line *f;
+	struct diff_line *f;
 	ALLOC_GROW(o->line_buffer,
 		   o->line_buffer_nr + 1,
 		   o->line_buffer_alloc);
 	f = &o->line_buffer[o->line_buffer_nr++];
 
-	memcpy(f, e, sizeof(struct buffered_patch_line));
+	memcpy(f, e, sizeof(struct diff_line));
 	f->line = e->line ? xmemdupz(e->line, e->len) : NULL;
 }
 
-void emit_line(struct diff_options *o,
-	       const char *set, const char *reset,
-	       int add_line_prefix, int markup_ws,
-	       int sign, const char *line, int len)
+static void emit_line(struct diff_options *o,
+		      const char *set, const char *reset,
+		      int add_line_prefix, int markup_ws,
+		      int sign, const char *line, int len)
 {
-	struct buffered_patch_line e = {set, reset, line,
+	struct diff_line e = {set, reset, line,
 		len, sign, add_line_prefix,
-		markup_ws ? BPL_EMIT_LINE_WS : BPL_EMIT_LINE_ASIS};
+		markup_ws ? DIFF_LINE_WS : DIFF_LINE_ASIS};
 
 	if (o->use_buffer)
-		append_buffered_patch_line(o, &e);
+		append_diff_line(o, &e);
 	else
-		emit_buffered_patch_line(o, &e);
+		emit_diff_line(o, &e);
 }
 
-void emit_line_fmt(struct diff_options *o,
-		   const char *set, const char *reset,
-		   int add_line_prefix,
-		   const char *fmt, ...)
+static void emit_line_fmt(struct diff_options *o,
+			  const char *set, const char *reset,
+			  int add_line_prefix,
+			  const char *fmt, ...)
 {
 	struct strbuf sb = STRBUF_INIT;
 	va_list ap;
@@ -1435,7 +1441,7 @@ static void diff_words_flush(struct emit_callback *ecbdata)
 	if (ecbdata->diff_words->opt->line_buffer_nr) {
 		int i;
 		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
-			append_buffered_patch_line(ecbdata->opt,
+			append_diff_line(ecbdata->opt,
 				&ecbdata->diff_words->opt->line_buffer[i]);
 
 		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
@@ -1862,8 +1868,8 @@ static void fill_print_name(struct diffstat_file *file)
 	file->print_name = pname;
 }
 
-void print_stat_summary_0(struct diff_options *options, int files,
-			  int insertions, int deletions)
+static void print_stat_summary_0(struct diff_options *options, int files,
+				 int insertions, int deletions)
 {
 	struct strbuf sb = STRBUF_INIT;
 
@@ -2857,11 +2863,11 @@ static void builtin_diff(const char *name_a,
 		if (o->word_diff)
 			init_diff_words_data(&ecbdata, o, one, two);
 		if (o->use_buffer) {
-			struct buffered_patch_line e = BUFFERED_PATCH_LINE_INIT;
-			e.state = BPL_HANDOVER;
+			struct diff_line e = diff_line_INIT;
+			e.state = DIFF_LINE_RELOAD_WS_RULE;
 			e.line = name_b;
 			e.len = strlen(name_b);
-			append_buffered_patch_line(o, &e);
+			append_diff_line(o, &e);
 		}
 		if (xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
 				  &xpp, &xecfg))
@@ -5094,18 +5100,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
 
-	if (o->color_moved) {
-		unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+	if (o->color_moved)
 		o->use_buffer = 1;
-		o->deleted_lines = xmallocz(sizeof(*o->deleted_lines));
-		o->added_lines = xmallocz(sizeof(*o->added_lines));
-		hashmap_init(o->deleted_lines, ignore_ws ?
-			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
-			(hashmap_cmp_fn)moved_entry_cmp, 0);
-		hashmap_init(o->added_lines, ignore_ws ?
-			(hashmap_cmp_fn)moved_entry_cmp_no_ws :
-			(hashmap_cmp_fn)moved_entry_cmp, 0);
-	}
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -5115,12 +5111,25 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 	if (o->use_buffer) {
 		if (o->color_moved) {
-			add_lines_to_move_detection(o);
-			mark_color_as_moved(o);
+			struct hashmap add_lines, del_lines;
+			unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+
+			hashmap_init(&del_lines, ignore_ws ?
+				(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+				(hashmap_cmp_fn)moved_entry_cmp, 0);
+			hashmap_init(&add_lines, ignore_ws ?
+				(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+				(hashmap_cmp_fn)moved_entry_cmp, 0);
+
+			add_lines_to_move_detection(o, &add_lines, &del_lines);
+			mark_color_as_moved(o, &add_lines, &del_lines);
+
+			hashmap_free(&add_lines, 0);
+			hashmap_free(&del_lines, 0);
 		}
 
 		for (i = 0; i < o->line_buffer_nr; i++)
-			emit_buffered_patch_line(o, &o->line_buffer[i]);
+			emit_diff_line(o, &o->line_buffer[i]);
 
 		for (i = 0; i < o->line_buffer_nr; i++)
 			free((void *)o->line_buffer[i].line);
diff --git a/diff.h b/diff.h
index 2d86e3a012..445259ebf7 100644
--- a/diff.h
+++ b/diff.h
@@ -123,11 +123,12 @@ enum diff_submodule_format {
  * into the pre/post image file. This pointer could be a union with the
  * line pointer. By storing an offset into the file instead of the literal line,
  * we can decrease the memory footprint for the buffered output. At first we
- * may want to only have indirection for the content lines, but we could
- * also have an enum (based on sign?) that stores prefabricated lines, e.g.
- * the similarity score line or hunk/file headers.
+ * may want to only have indirection for the content lines, but we could also
+ * enhance the state for emitting prefabricated lines, e.g. the similarity
+ * score line or hunk/file headers would only need to store a number or path
+ * and then the output can be constructed later on depending on state.
  */
-struct buffered_patch_line {
+struct diff_line {
 	const char *set;
 	const char *reset;
 	const char *line;
@@ -140,16 +141,16 @@ struct buffered_patch_line {
 		 * ws_check_emit which will output "line", marked up
 		 * according to ws_rule.
 		 */
-		BPL_EMIT_LINE_WS,
+		DIFF_LINE_WS,
 
 		/* Emits [lineprefix][set][sign] line [reset] */
-		BPL_EMIT_LINE_ASIS,
+		DIFF_LINE_ASIS,
 
 		/* Reloads the ws_rule; line contains the file name */
-		BPL_HANDOVER
+		DIFF_LINE_RELOAD_WS_RULE
 	} state;
 };
-#define BUFFERED_PATCH_LINE_INIT {NULL, NULL, NULL, 0, 0, 0}
+#define diff_line_INIT {NULL, NULL, NULL, 0, 0, 0}
 
 struct diff_options {
 	const char *orderfile;
@@ -226,14 +227,13 @@ struct diff_options {
 	unsigned ws_rule;
 	int use_buffer;
 
-	struct buffered_patch_line *line_buffer;
+	struct diff_line *line_buffer;
 	int line_buffer_nr, line_buffer_alloc;
 
 	int color_moved;
-	struct hashmap *deleted_lines;
-	struct hashmap *added_lines;
 };
 
+/* Emit [line_prefix] [set] line [reset] */
 void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
 		    const char *line, int len);
 
diff --git a/submodule.c b/submodule.c
index 19c63197fb..428c996c97 100644
--- a/submodule.c
+++ b/submodule.c
@@ -550,6 +550,8 @@ void show_submodule_inline_diff(struct diff_options *o, const char *path,
 
 	/* TODO: other options may need to be passed here. */
 	argv_array_push(&cp.args, "diff");
+	if (o->use_color)
+		argv_array_push(&cp.args, "--color=always");
 	argv_array_pushf(&cp.args, "--line-prefix=%s", diff_line_prefix(o));
 	if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
 		argv_array_pushf(&cp.args, "--src-prefix=%s%s/",
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 232d9ad55e..0e92bf94bf 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1124,7 +1124,7 @@ test_expect_success 'detect moved code, inside file' '
 	test_cmp expected actual
 '
 
-test_expect_success 'detect permutations inside moved code, ' '
+test_expect_success 'detect permutations inside moved code' '
 	# reusing the move example from last test:
 	cat <<-\EOF >main.c &&
 		#include<stdio.h>
@@ -1201,4 +1201,42 @@ test_expect_success 'detect permutations inside moved code, ' '
 	test_cmp expected actual
 '
 
+test_expect_success 'move detection does not mess up colored words' '
+	cat <<-\EOF >text.txt &&
+	Lorem Ipsum is simply dummy text of the printing and typesetting industry.
+	EOF
+	git add text.txt &&
+	git commit -a -m "clean state" &&
+	cat <<-\EOF >text.txt &&
+	simply Lorem Ipsum dummy is text of the typesetting and printing industry.
+	EOF
+	git diff --color-moved --word-diff >actual &&
+	git diff --word-diff >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'move detection with submodules' '
+	test_create_repo bananas &&
+	echo ripe >bananas/recipe &&
+	git -C bananas add recipe &&
+	test_commit fruit &&
+	test_commit -C bananas recipe &&
+	git submodule add ./bananas &&
+	git add bananas &&
+	git commit -a -m "bananas are like a heavy library?" &&
+	echo foul >bananas/recipe &&
+	echo ripe >fruit.t &&
+
+	git diff --submodule=diff --color-moved >actual &&
+
+	# no move detection as the moved line is across repository boundaries.
+	test_decode_color <actual >decoded_actual &&
+	! grep BGREEN decoded_actual &&
+	! grep BRED decoded_actual &&
+
+	# nor did we mess with it another way
+	git diff --submodule=diff | test_decode_color >expect &&
+	test_cmp expect decoded_actual
+'
+
 test_done



^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv4 01/17] diff: readability fix
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 02/17] diff: move line ending check into emit_hunk_header Stefan Beller
                         ` (16 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

We already have dereferenced 'p->two' into a local variable 'two'. Use
that.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 74283d9001..3f5bf8b5a4 100644
--- a/diff.c
+++ b/diff.c
@@ -3283,8 +3283,8 @@ static void run_diff(struct diff_filepair *p, struct diff_options *o)
 	const char *other;
 	const char *attr_path;
 
-	name  = p->one->path;
-	other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+	name  = one->path;
+	other = (strcmp(name, two->path) ? two->path : NULL);
 	attr_path = name;
 	if (o->prefix_length)
 		strip_prefix(o->prefix_length, &name, &other);
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 02/17] diff: move line ending check into emit_hunk_header
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 01/17] diff: readability fix Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 03/17] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
                         ` (15 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

The emit_hunk_header() function is responsible for assembling a
hunk header and calling emit_line() to send the hunk header
to the output file.  Its only caller, fn_out_consume(), needs
to prepare for the case where the function emits an incomplete
line, and adds the terminating LF itself.

Instead, make sure emit_hunk_header() always sends a
completed line to emit_line().
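
To make "completed line" concrete, here is a minimal standalone sketch.
It is not git's strbuf API: complete_line() is a hypothetical helper
working on a plain char buffer, and the capacity handling is an
assumption made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not git's strbuf_complete_line()): make sure the
 * text in buf[0..len) ends with '\n'.  cap is the size of the buffer.
 * Returns the new length; the length is returned unchanged when the
 * text is empty, already ends in '\n', or there is no room left for
 * the extra byte plus the terminating NUL.
 */
static size_t complete_line(char *buf, size_t len, size_t cap)
{
	assert(buf != NULL);

	if (len == 0 || buf[len - 1] == '\n')
		return len;		/* nothing to complete */
	if (len + 1 >= cap)
		return len;		/* no room in this sketch */
	buf[len++] = '\n';
	buf[len] = '\0';
	return len;
}
```

With such a guarantee at the producer, the caller no longer has to
inspect the last byte and emit a stray LF on its own.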

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 3f5bf8b5a4..c2ed605cd0 100644
--- a/diff.c
+++ b/diff.c
@@ -677,6 +677,8 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	}
 
 	strbuf_add(&msgbuf, line + len, org_len - len);
+	strbuf_complete_line(&msgbuf);
+
 	emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
@@ -1315,8 +1317,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		len = sane_truncate_line(ecbdata, line, len);
 		find_lno(line, ecbdata);
 		emit_hunk_header(ecbdata, line, len);
-		if (line[len-1] != '\n')
-			putc('\n', o->file);
 		return;
 	}
 
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 03/17] diff.c: factor out diff_flush_patch_all_file_pairs
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 01/17] diff: readability fix Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 02/17] diff: move line ending check into emit_hunk_header Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 04/17] diff: introduce more flexible emit function Stefan Beller
                         ` (14 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch we want to do more things before and after all filepairs
are flushed. So factor out the flushing of all file pairs into its own
function, so that the new code can be plugged in easily.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index c2ed605cd0..2f9722b382 100644
--- a/diff.c
+++ b/diff.c
@@ -4737,6 +4737,17 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc)
 		warning(_(rename_limit_advice), varname, needed);
 }
 
+static void diff_flush_patch_all_file_pairs(struct diff_options *o)
+{
+	int i;
+	struct diff_queue_struct *q = &diff_queued_diff;
+	for (i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		if (check_pair_status(p))
+			diff_flush_patch(p, o);
+	}
+}
+
 void diff_flush(struct diff_options *options)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
@@ -4831,11 +4842,7 @@ void diff_flush(struct diff_options *options)
 			}
 		}
 
-		for (i = 0; i < q->nr; i++) {
-			struct diff_filepair *p = q->queue[i];
-			if (check_pair_status(p))
-				diff_flush_patch(p, options);
-		}
+		diff_flush_patch_all_file_pairs(options);
 	}
 
 	if (output_format & DIFF_FORMAT_CALLBACK)
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 04/17] diff: introduce more flexible emit function
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (2 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 03/17] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 05/17] diff.c: convert fn_out_consume to use emit_line Stefan Beller
                         ` (13 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

Currently, diff output is written either through the emit_line_0
function or through the FILE * in struct diff_options directly. To
make it easier to teach diff to buffer its output (which will be done
in a subsequent commit), introduce a more flexible emit_line() function.
In this commit, direct usages of emit_line_0() are replaced with
emit_line(); subsequent commits will also replace usages of the
FILE * with emit_line().

Instead of having a 'first' parameter containing the first character
of a line, have a dedicated 'sign' parameter that is just set when
the first character of the line is part of the actual content, i.e.
' ', '+', '-'.
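
As a rough illustration of the resulting ordering (color, sign, content,
reset, then CR/LF outside the colored span), here is a standalone sketch.
format_line() and append() are hypothetical names; unlike emit_line() the
sketch writes into a caller-supplied buffer rather than a FILE *, and it
leaves out the line prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append n bytes of src at out[*pos]; the caller guarantees enough room. */
static void append(char *out, size_t *pos, const char *src, size_t n)
{
	memcpy(out + *pos, src, n);
	*pos += n;
}

/*
 * Hypothetical stand-in for emit_line() that formats into 'out'.
 * 'out' must have room for strlen(set) + strlen(reset) + len + 2 bytes.
 * A 'sign' of 0 means "no sign character"; set and reset may be NULL.
 * Returns the number of bytes written, not counting the final NUL.
 */
static size_t format_line(char *out, const char *set, const char *reset,
			  int sign, const char *line, size_t len)
{
	size_t pos = 0;
	int has_nl, has_cr;

	assert(out != NULL && line != NULL);

	/* Peel off a trailing LF, then a trailing CR. */
	has_nl = (len > 0 && line[len - 1] == '\n');
	if (has_nl)
		len--;
	has_cr = (len > 0 && line[len - 1] == '\r');
	if (has_cr)
		len--;

	/* Only color when there is something visible to color. */
	if (len || sign) {
		if (set)
			append(out, &pos, set, strlen(set));
		if (sign)
			out[pos++] = (char)sign;
		append(out, &pos, line, len);
		if (reset)
			append(out, &pos, reset, strlen(reset));
	}
	/* Line endings go after the reset sequence. */
	if (has_cr)
		out[pos++] = '\r';
	if (has_nl)
		out[pos++] = '\n';
	out[pos] = '\0';
	return pos;
}
```

In this sketch an empty line with no sign produces just "\n", with no
color sequences around it.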

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 76 +++++++++++++++++++++++++++++-------------------------------------
 1 file changed, 33 insertions(+), 43 deletions(-)

diff --git a/diff.c b/diff.c
index 2f9722b382..3569857818 100644
--- a/diff.c
+++ b/diff.c
@@ -516,36 +516,30 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line_0(struct diff_options *o, const char *set, const char *reset,
-			int first, const char *line, int len)
+static void emit_line(struct diff_options *o, const char *set, const char *reset,
+		      int add_line_prefix, int sign, const char *line, int len)
 {
 	int has_trailing_newline, has_trailing_carriage_return;
-	int nofirst;
 	FILE *file = o->file;
 
-	fputs(diff_line_prefix(o), file);
+	if (add_line_prefix)
+		fputs(diff_line_prefix(o), file);
 
-	if (len == 0) {
-		has_trailing_newline = (first == '\n');
-		has_trailing_carriage_return = (!has_trailing_newline &&
-						(first == '\r'));
-		nofirst = has_trailing_newline || has_trailing_carriage_return;
-	} else {
-		has_trailing_newline = (len > 0 && line[len-1] == '\n');
-		if (has_trailing_newline)
-			len--;
-		has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
-		if (has_trailing_carriage_return)
-			len--;
-		nofirst = 0;
-	}
+	has_trailing_newline = (len > 0 && line[len-1] == '\n');
+	if (has_trailing_newline)
+		len--;
+	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
+	if (has_trailing_carriage_return)
+		len--;
 
-	if (len || !nofirst) {
-		fputs(set, file);
-		if (!nofirst)
-			fputc(first, file);
+	if (len || sign) {
+		if (set)
+			fputs(set, file);
+		if (sign)
+			fputc(sign, file);
 		fwrite(line, len, 1, file);
-		fputs(reset, file);
+		if (reset)
+			fputs(reset, file);
 	}
 	if (has_trailing_carriage_return)
 		fputc('\r', file);
@@ -553,12 +547,6 @@ static void emit_line_0(struct diff_options *o, const char *set, const char *res
 		fputc('\n', file);
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char *reset,
-		      const char *line, int len)
-{
-	emit_line_0(o, set, reset, line[0], line+1, len-1);
-}
-
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -587,13 +575,13 @@ static void emit_line_checked(const char *reset,
 	}
 
 	if (!ws)
-		emit_line_0(ecbdata->opt, set, reset, sign, line, len);
+		emit_line(ecbdata->opt, set, reset, 1, sign, line, len);
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
-		emit_line_0(ecbdata->opt, ws, reset, sign, line, len);
+		emit_line(ecbdata->opt, ws, reset, 1, sign, line, len);
 	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line_0(ecbdata->opt, set, reset, sign, "", 0);
+		emit_line(ecbdata->opt, set, reset, 1, sign, "", 0);
 		ws_check_emit(line, len, ecbdata->ws_rule,
 			      ecbdata->opt->file, set, reset, ws);
 	}
@@ -643,7 +631,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	if (len < 10 ||
 	    memcmp(line, atat, 2) ||
 	    !(ep = memmem(line + 2, len - 2, atat, 2))) {
-		emit_line(ecbdata->opt, context, reset, line, len);
+		emit_line(ecbdata->opt, context, reset, 1, 0, line, len);
 		return;
 	}
 	ep += 2; /* skip over @@ */
@@ -679,7 +667,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	strbuf_add(&msgbuf, line + len, org_len - len);
 	strbuf_complete_line(&msgbuf);
 
-	emit_line(ecbdata->opt, "", "", msgbuf.buf, msgbuf.len);
+	emit_line(ecbdata->opt, "", "", 1, 0, msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
 
@@ -742,8 +730,8 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
 		putc('\n', ecb->opt->file);
-		emit_line_0(ecb->opt, context, reset, '\\',
-			    nneof, strlen(nneof));
+		emit_line(ecb->opt, context, reset, 1, '\\',
+			  nneof, strlen(nneof));
 	}
 }
 
@@ -1341,7 +1329,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		}
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
-			emit_line(o, context, reset, line, len);
+			emit_line(o, context, reset, 1, 0, line, len);
 			fputs("~\n", o->file);
 		} else {
 			/*
@@ -1353,7 +1341,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			      line++;
 			      len--;
 			}
-			emit_line(o, context, reset, line, len);
+			emit_line(o, context, reset, 1, 0, line, len);
 		}
 		return;
 	}
@@ -1376,7 +1364,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		/* incomplete line at the end */
 		ecbdata->lno_in_preimage++;
 		emit_line(o, diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
-			  reset, line, len);
+			  reset, 1, 0, line, len);
 		break;
 	}
 }
@@ -2188,7 +2176,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
 			line_prefix, data->filename, data->lineno, err);
 		free(err);
-		emit_line(data->o, set, reset, line, 1);
+		emit_line(data->o, set, reset, 1, 0, line, 1);
 		ws_check_emit(line + 1, len - 1, data->ws_rule,
 			      data->o->file, set, reset, ws);
 	} else if (line[0] == ' ') {
@@ -4833,9 +4821,11 @@ void diff_flush(struct diff_options *options)
 
 	if (output_format & DIFF_FORMAT_PATCH) {
 		if (separator) {
-			fprintf(options->file, "%s%c",
-				diff_line_prefix(options),
-				options->line_termination);
+			char term[2];
+			term[0] = options->line_termination;
+			term[1] = '\0';
+
+			emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
 				fputs(options->stat_sep, options->file);
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 05/17] diff.c: convert fn_out_consume to use emit_line
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (3 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 04/17] diff: introduce more flexible emit function Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 06/17] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
                         ` (12 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line.

This covers the parts of fn_out_consume.  In the next
patches we'll convert more functions that want to emit
formatted output, so we'd want to have a formatted emit
function. Add it here.
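
The shape of such a formatted emitter can be sketched standalone as
follows. struct sink, sink_emit() and sink_emit_fmt() are hypothetical
names for this illustration; the fixed-size buffers and the lack of
truncation handling are simplifications, whereas the patch uses a
growable strbuf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output sink standing in for "the one emitting function". */
struct sink {
	char buf[256];
	size_t len;
};

/* The single low-level emitter: append one already-formatted line. */
static void sink_emit(struct sink *s, const char *line, size_t len)
{
	assert(s->len + len < sizeof(s->buf));
	memcpy(s->buf + s->len, line, len);
	s->len += len;
	s->buf[s->len] = '\0';
}

/* Formatted wrapper: expand the format first, then delegate. */
static void sink_emit_fmt(struct sink *s, const char *fmt, ...)
{
	char tmp[128];
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(tmp, sizeof(tmp), fmt, ap);
	va_end(ap);

	/* Sketch only: a real implementation would grow the buffer. */
	assert(n >= 0 && (size_t)n < sizeof(tmp));
	sink_emit(s, tmp, (size_t)n);
}
```

Because the wrapper only formats and then delegates, everything still
passes through one function, which is what a later buffering step needs.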

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 3569857818..8186289734 100644
--- a/diff.c
+++ b/diff.c
@@ -547,6 +547,21 @@ static void emit_line(struct diff_options *o, const char *set, const char *reset
 		fputc('\n', file);
 }
 
+static void emit_line_fmt(struct diff_options *o,
+			  const char *set, const char *reset,
+			  int add_line_prefix,
+			  const char *fmt, ...)
+{
+	struct strbuf sb = STRBUF_INIT;
+	va_list ap;
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	va_end(ap);
+
+	emit_line(o, set, reset, add_line_prefix, 0, sb.buf, sb.len);
+	strbuf_release(&sb);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -1270,7 +1285,6 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	const char *context = diff_get_color(ecbdata->color_diff, DIFF_CONTEXT);
 	const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
 	struct diff_options *o = ecbdata->opt;
-	const char *line_prefix = diff_line_prefix(o);
 
 	o->found_changes = 1;
 
@@ -1282,14 +1296,12 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 
 	if (ecbdata->label_path[0]) {
 		const char *name_a_tab, *name_b_tab;
-
 		name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
 		name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
-
-		fprintf(o->file, "%s%s--- %s%s%s\n",
-			line_prefix, meta, ecbdata->label_path[0], reset, name_a_tab);
-		fprintf(o->file, "%s%s+++ %s%s%s\n",
-			line_prefix, meta, ecbdata->label_path[1], reset, name_b_tab);
+		emit_line_fmt(o, meta, reset, 1, "--- %s%s\n",
+			      ecbdata->label_path[0], name_a_tab);
+		emit_line_fmt(o, meta, reset, 1, "+++ %s%s\n",
+			      ecbdata->label_path[1], name_b_tab);
 		ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
 	}
 
@@ -1330,7 +1342,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
 			emit_line(o, context, reset, 1, 0, line, len);
-			fputs("~\n", o->file);
+			emit_line(o, NULL, NULL, 0, 0, "~\n", 2);
 		} else {
 			/*
 			 * Skip the prefix character, if any.  With
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 06/17] diff.c: convert builtin_diff to use emit_line_*
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (4 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 05/17] diff.c: convert fn_out_consume to use emit_line Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 07/17] diff.c: convert emit_rewrite_diff " Stefan Beller
                         ` (11 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line.

This covers builtin_diff.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 8186289734..4fa976d43c 100644
--- a/diff.c
+++ b/diff.c
@@ -1289,8 +1289,9 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	o->found_changes = 1;
 
 	if (ecbdata->header) {
-		fprintf(o->file, "%s", ecbdata->header->buf);
-		strbuf_reset(ecbdata->header);
+		emit_line(o, NULL, NULL, 0, 0,
+			  ecbdata->header->buf, ecbdata->header->len);
+		strbuf_release(ecbdata->header);
 		ecbdata->header = NULL;
 	}
 
@@ -2435,7 +2436,7 @@ static void builtin_diff(const char *name_a,
 		if (complete_rewrite &&
 		    (textconv_one || !diff_filespec_is_binary(one)) &&
 		    (textconv_two || !diff_filespec_is_binary(two))) {
-			fprintf(o->file, "%s", header.buf);
+			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 			emit_rewrite_diff(name_a, name_b, one, two,
 						textconv_one, textconv_two, o);
@@ -2445,7 +2446,7 @@ static void builtin_diff(const char *name_a,
 	}
 
 	if (o->irreversible_delete && lbl[1][0] == '/') {
-		fprintf(o->file, "%s", header.buf);
+		emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
 		strbuf_reset(&header);
 		goto free_ab_and_return;
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2456,12 +2457,15 @@ static void builtin_diff(const char *name_a,
 		    !DIFF_OPT_TST(o, BINARY)) {
 			if (!oidcmp(&one->oid, &two->oid)) {
 				if (must_show_header)
-					fprintf(o->file, "%s", header.buf);
+					emit_line(o, NULL, NULL, 0, 0,
+						  header.buf, header.len);
 				goto free_ab_and_return;
 			}
-			fprintf(o->file, "%s", header.buf);
-			fprintf(o->file, "%sBinary files %s and %s differ\n",
-				line_prefix, lbl[0], lbl[1]);
+			emit_line(o, NULL, NULL, 0, 0,
+				  header.buf, header.len);
+			emit_line_fmt(o, NULL, NULL, 1,
+				      "Binary files %s and %s differ\n",
+				      lbl[0], lbl[1]);
 			goto free_ab_and_return;
 		}
 		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
@@ -2470,16 +2474,19 @@ static void builtin_diff(const char *name_a,
 		if (mf1.size == mf2.size &&
 		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
 			if (must_show_header)
-				fprintf(o->file, "%s", header.buf);
+				emit_line(o, NULL, NULL, 0, 0,
+					  header.buf, header.len);
 			goto free_ab_and_return;
 		}
-		fprintf(o->file, "%s", header.buf);
+		emit_line(o, NULL, NULL, 0, 0,
+			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
 			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
 		else
-			fprintf(o->file, "%sBinary files %s and %s differ\n",
-				line_prefix, lbl[0], lbl[1]);
+			emit_line_fmt(o, NULL, NULL, 1,
+				      "Binary files %s and %s differ\n",
+				      lbl[0], lbl[1]);
 		o->found_changes = 1;
 	} else {
 		/* Crazy xdl interfaces.. */
@@ -2491,7 +2498,7 @@ static void builtin_diff(const char *name_a,
 		const struct userdiff_funcname *pe;
 
 		if (must_show_header) {
-			fprintf(o->file, "%s", header.buf);
+			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 		}
 
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 07/17] diff.c: convert emit_rewrite_diff to use emit_line_*
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (5 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 06/17] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 08/17] diff.c: convert emit_rewrite_lines " Stefan Beller
                         ` (10 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line.

This covers emit_rewrite_diff.
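
For orientation, the hunk header that a complete rewrite produces can be
sketched standalone like this. rewrite_hunk_header() is a hypothetical
name, the snprintf-based add_line_count() only mirrors the switch in the
patch, and the "?,?" case for irreversible deletes is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a line count the way a rewrite hunk header expects it. */
static int add_line_count(char *out, size_t cap, int count)
{
	switch (count) {
	case 0:
		return snprintf(out, cap, "0,0");
	case 1:
		return snprintf(out, cap, "1");
	default:
		return snprintf(out, cap, "1,%d", count);
	}
}

/*
 * Hypothetical helper: build the whole "@@ -a +b @@" line in one buffer
 * so that it can be handed to a single emitting function afterwards.
 * Returns the length of the formatted header.
 */
static int rewrite_hunk_header(char *out, size_t cap, int lc_a, int lc_b)
{
	char a[32], b[32];

	assert(lc_a >= 0 && lc_b >= 0);
	add_line_count(a, sizeof(a), lc_a);
	add_line_count(b, sizeof(b), lc_b);
	return snprintf(out, cap, "@@ -%s +%s @@\n", a, b);
}
```

Assembling the header first and emitting it once is what allows the
color and reset sequences to be applied by the emitter rather than being
spliced into the format string.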

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index 4fa976d43c..3dda9f3c8e 100644
--- a/diff.c
+++ b/diff.c
@@ -704,17 +704,17 @@ static void remove_tempfile(void)
 	}
 }
 
-static void print_line_count(FILE *file, int count)
+static void add_line_count(struct strbuf *out, int count)
 {
 	switch (count) {
 	case 0:
-		fprintf(file, "0,0");
+		strbuf_addstr(out, "0,0");
 		break;
 	case 1:
-		fprintf(file, "1");
+		strbuf_addstr(out, "1");
 		break;
 	default:
-		fprintf(file, "1,%d", count);
+		strbuf_addf(out, "1,%d", count);
 		break;
 	}
 }
@@ -768,7 +768,7 @@ static void emit_rewrite_diff(const char *name_a,
 	char *data_one, *data_two;
 	size_t size_one, size_two;
 	struct emit_callback ecbdata;
-	const char *line_prefix = diff_line_prefix(o);
+	struct strbuf out = STRBUF_INIT;
 
 	if (diff_mnemonic_prefix && DIFF_OPT_TST(o, REVERSE_DIFF)) {
 		a_prefix = o->b_prefix;
@@ -806,20 +806,23 @@ static void emit_rewrite_diff(const char *name_a,
 	ecbdata.lno_in_preimage = 1;
 	ecbdata.lno_in_postimage = 1;
 
+	emit_line_fmt(o, metainfo, reset, 1, "--- %s%s\n", a_name.buf, name_a_tab);
+	emit_line_fmt(o, metainfo, reset, 1, "+++ %s%s\n", b_name.buf, name_b_tab);
+
 	lc_a = count_lines(data_one, size_one);
 	lc_b = count_lines(data_two, size_two);
-	fprintf(o->file,
-		"%s%s--- %s%s%s\n%s%s+++ %s%s%s\n%s%s@@ -",
-		line_prefix, metainfo, a_name.buf, name_a_tab, reset,
-		line_prefix, metainfo, b_name.buf, name_b_tab, reset,
-		line_prefix, fraginfo);
+
+	strbuf_addstr(&out, "@@ -");
 	if (!o->irreversible_delete)
-		print_line_count(o->file, lc_a);
+		add_line_count(&out, lc_a);
 	else
-		fprintf(o->file, "?,?");
-	fprintf(o->file, " +");
-	print_line_count(o->file, lc_b);
-	fprintf(o->file, " @@%s\n", reset);
+		strbuf_addstr(&out, "?,?");
+	strbuf_addstr(&out, " +");
+	add_line_count(&out, lc_b);
+	strbuf_addstr(&out, " @@\n");
+	emit_line(o, fraginfo, reset, 1, 0, out.buf, out.len);
+	strbuf_release(&out);
+
 	if (lc_a && !o->irreversible_delete)
 		emit_rewrite_lines(&ecbdata, '-', data_one, size_one);
 	if (lc_b)
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 08/17] diff.c: convert emit_rewrite_lines to use emit_line_*
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (6 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 07/17] diff.c: convert emit_rewrite_diff " Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
                         ` (9 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line.

This covers emit_rewrite_lines.
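
The line walk at the heart of this function can be sketched standalone
as below. walk_lines() is a hypothetical name; it only counts lines and
reports whether the last one was incomplete, whereas the patch also
synthesizes the missing LF and emits the "\ No newline at end of file"
marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: walk data[0..size) one line at a time.
 * Returns the number of lines seen and sets *incomplete to 1 when the
 * final line lacks its terminating '\n', 0 otherwise.
 */
static int walk_lines(const char *data, size_t size, int *incomplete)
{
	int nr = 0;

	assert(data != NULL && incomplete != NULL);
	*incomplete = 0;

	while (size > 0) {
		const char *endp = memchr(data, '\n', size);
		size_t len;

		if (endp) {
			len = (size_t)(endp - data) + 1;	/* include the LF */
		} else {
			len = size;		/* last line, no LF */
			*incomplete = 1;
		}
		nr++;
		data += len;
		size -= len;
	}
	return nr;
}
```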

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 3dda9f3c8e..ca6b48cf49 100644
--- a/diff.c
+++ b/diff.c
@@ -722,15 +722,23 @@ static void add_line_count(struct strbuf *out, int count)
 static void emit_rewrite_lines(struct emit_callback *ecb,
 			       int prefix, const char *data, int size)
 {
-	const char *endp = NULL;
-	static const char *nneof = " No newline at end of file\n";
 	const char *reset = diff_get_color(ecb->color_diff, DIFF_RESET);
+	struct strbuf sb = STRBUF_INIT;
 
 	while (0 < size) {
 		int len;
 
-		endp = memchr(data, '\n', size);
-		len = endp ? (endp - data + 1) : size;
+		const char *endp = memchr(data, '\n', size);
+		if (endp)
+			len = endp - data + 1;
+		else {
+			strbuf_add(&sb, data, size);
+			strbuf_addch(&sb, '\n');
+			size = 0; /* to exit the loop. */
+
+			data = sb.buf;
+			len = sb.len;
+		}
 		if (prefix != '+') {
 			ecb->lno_in_preimage++;
 			emit_del_line(reset, ecb, data, len);
@@ -741,12 +749,13 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		size -= len;
 		data += len;
 	}
-	if (!endp) {
+	if (sb.len) {
+		static const char *nneof = "\\ No newline at end of file\n";
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
-		putc('\n', ecb->opt->file);
-		emit_line(ecb->opt, context, reset, 1, '\\',
-			  nneof, strlen(nneof));
+		emit_line(ecb->opt, context, reset, 1, 0,
+			    nneof, strlen(nneof));
+		strbuf_release(&sb);
 	}
 }
 
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (7 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 08/17] diff.c: convert emit_rewrite_lines " Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  5:59         ` Junio C Hamano
  2017-05-23  2:40       ` [PATCHv4 10/17] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
                         ` (8 subsequent siblings)
  17 siblings, 1 reply; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice,
because when we see the first of the two corresponding lines
(+ and -) we cannot yet tell that it got moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line.

This prepares the code for submodules to go through the
emit_line function.

As the submodule process is no longer attached to the
same stdout as the superproject's process, we need to
pass on the usage of colors explicitly.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c      | 14 ++++++----
 diff.h      |  3 +++
 submodule.c | 89 +++++++++++++++++++++++++++++++++----------------------------
 submodule.h |  9 +++----
 4 files changed, 63 insertions(+), 52 deletions(-)

diff --git a/diff.c b/diff.c
index ca6b48cf49..3357c0fca3 100644
--- a/diff.c
+++ b/diff.c
@@ -562,6 +562,12 @@ static void emit_line_fmt(struct diff_options *o,
 	strbuf_release(&sb);
 }
 
+void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
+		    const char *line, int len)
+{
+	emit_line(o, set, reset, 1, 0, line, len);
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
 {
 	if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -2384,8 +2390,7 @@ static void builtin_diff(const char *name_a,
 	    (!two->mode || S_ISGITLINK(two->mode))) {
 		const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
 		const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-		show_submodule_summary(o->file, one->path ? one->path : two->path,
-				line_prefix,
+		show_submodule_summary(o, one->path ? one->path : two->path,
 				&one->oid, &two->oid,
 				two->dirty_submodule,
 				meta, del, add, reset);
@@ -2395,11 +2400,10 @@ static void builtin_diff(const char *name_a,
 		   (!two->mode || S_ISGITLINK(two->mode))) {
 		const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
 		const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
-		show_submodule_inline_diff(o->file, one->path ? one->path : two->path,
-				line_prefix,
+		show_submodule_inline_diff(o, one->path ? one->path : two->path,
 				&one->oid, &two->oid,
 				two->dirty_submodule,
-				meta, del, add, reset, o);
+				meta, del, add, reset);
 		return;
 	}
 
diff --git a/diff.h b/diff.h
index 5be1ee77a7..9ad546361a 100644
--- a/diff.h
+++ b/diff.h
@@ -188,6 +188,9 @@ struct diff_options {
 	int diff_path_counter;
 };
 
+void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
+		    const char *line, int len);
+
 enum color_diff {
 	DIFF_RESET = 0,
 	DIFF_CONTEXT = 1,
diff --git a/submodule.c b/submodule.c
index d3299e29c0..428c996c97 100644
--- a/submodule.c
+++ b/submodule.c
@@ -362,8 +362,8 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path,
 	return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, FILE *f,
-		const char *line_prefix,
+static void print_submodule_summary(struct rev_info *rev,
+		struct diff_options *o,
 		const char *del, const char *add, const char *reset)
 {
 	static const char format[] = "  %m %s";
@@ -375,18 +375,12 @@ static void print_submodule_summary(struct rev_info *rev, FILE *f,
 		ctx.date_mode = rev->date_mode;
 		ctx.output_encoding = get_log_output_encoding();
 		strbuf_setlen(&sb, 0);
-		strbuf_addstr(&sb, line_prefix);
-		if (commit->object.flags & SYMMETRIC_LEFT) {
-			if (del)
-				strbuf_addstr(&sb, del);
-		}
-		else if (add)
-			strbuf_addstr(&sb, add);
 		format_commit_message(commit, format, &sb, &ctx);
-		if (reset)
-			strbuf_addstr(&sb, reset);
 		strbuf_addch(&sb, '\n');
-		fprintf(f, "%s", sb.buf);
+		if (commit->object.flags & SYMMETRIC_LEFT)
+			diff_emit_line(o, del, reset, sb.buf, sb.len);
+		else if (add)
+			diff_emit_line(o, add, reset, sb.buf, sb.len);
 	}
 	strbuf_release(&sb);
 }
@@ -413,8 +407,7 @@ void prepare_submodule_repo_env(struct argv_array *out)
  * attempt to lookup both the left and right commits and put them into the
  * left and right pointers.
  */
-static void show_submodule_header(FILE *f, const char *path,
-		const char *line_prefix,
+static void show_submodule_header(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *reset,
@@ -425,12 +418,17 @@ static void show_submodule_header(FILE *f, const char *path,
 	struct strbuf sb = STRBUF_INIT;
 	int fast_forward = 0, fast_backward = 0;
 
-	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
-		fprintf(f, "%sSubmodule %s contains untracked content\n",
-			line_prefix, path);
-	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
-		fprintf(f, "%sSubmodule %s contains modified content\n",
-			line_prefix, path);
+	if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) {
+		strbuf_addf(&sb, "Submodule %s contains untracked content\n", path);
+		diff_emit_line(o, NULL, NULL, sb.buf, sb.len);
+		strbuf_reset(&sb);
+	}
+
+	if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED) {
+		strbuf_addf(&sb, "Submodule %s contains modified content\n", path);
+		diff_emit_line(o, NULL, NULL, sb.buf, sb.len);
+		strbuf_reset(&sb);
+	}
 
 	if (is_null_oid(one))
 		message = "(new submodule)";
@@ -472,21 +470,20 @@ static void show_submodule_header(FILE *f, const char *path,
 	}
 
 output_header:
-	strbuf_addf(&sb, "%s%sSubmodule %s ", line_prefix, meta, path);
+	strbuf_addf(&sb, "Submodule %s ", path);
 	strbuf_add_unique_abbrev(&sb, one->hash, DEFAULT_ABBREV);
 	strbuf_addstr(&sb, (fast_backward || fast_forward) ? ".." : "...");
 	strbuf_add_unique_abbrev(&sb, two->hash, DEFAULT_ABBREV);
 	if (message)
-		strbuf_addf(&sb, " %s%s\n", message, reset);
+		strbuf_addf(&sb, " %s\n", message);
 	else
-		strbuf_addf(&sb, "%s:%s\n", fast_backward ? " (rewind)" : "", reset);
-	fwrite(sb.buf, sb.len, 1, f);
+		strbuf_addf(&sb, "%s:\n", fast_backward ? " (rewind)" : "");
+	diff_emit_line(o, meta, reset, sb.buf, sb.len);
 
 	strbuf_release(&sb);
 }
 
-void show_submodule_summary(FILE *f, const char *path,
-		const char *line_prefix,
+void show_submodule_summary(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset)
@@ -495,7 +492,7 @@ void show_submodule_summary(FILE *f, const char *path,
 	struct commit *left = NULL, *right = NULL;
 	struct commit_list *merge_bases = NULL;
 
-	show_submodule_header(f, path, line_prefix, one, two, dirty_submodule,
+	show_submodule_header(o, path, one, two, dirty_submodule,
 			      meta, reset, &left, &right, &merge_bases);
 
 	/*
@@ -508,11 +505,12 @@ void show_submodule_summary(FILE *f, const char *path,
 
 	/* Treat revision walker failure the same as missing commits */
 	if (prepare_submodule_summary(&rev, path, left, right, merge_bases)) {
-		fprintf(f, "%s(revision walker failed)\n", line_prefix);
+		const char *error = "(revision walker failed)\n";
+		diff_emit_line(o, NULL, NULL, error, strlen(error));
 		goto out;
 	}
 
-	print_submodule_summary(&rev, f, line_prefix, del, add, reset);
+	print_submodule_summary(&rev, o, del, add, reset);
 
 out:
 	if (merge_bases)
@@ -521,20 +519,18 @@ void show_submodule_summary(FILE *f, const char *path,
 	clear_commit_marks(right, ~0);
 }
 
-void show_submodule_inline_diff(FILE *f, const char *path,
-		const char *line_prefix,
+void show_submodule_inline_diff(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
-		const char *del, const char *add, const char *reset,
-		const struct diff_options *o)
+		const char *del, const char *add, const char *reset)
 {
 	const struct object_id *old = &empty_tree_oid, *new = &empty_tree_oid;
 	struct commit *left = NULL, *right = NULL;
 	struct commit_list *merge_bases = NULL;
-	struct strbuf submodule_dir = STRBUF_INIT;
 	struct child_process cp = CHILD_PROCESS_INIT;
+	struct strbuf sb = STRBUF_INIT;
 
-	show_submodule_header(f, path, line_prefix, one, two, dirty_submodule,
+	show_submodule_header(o, path, one, two, dirty_submodule,
 			      meta, reset, &left, &right, &merge_bases);
 
 	/* We need a valid left and right commit to display a difference */
@@ -547,15 +543,16 @@ void show_submodule_inline_diff(FILE *f, const char *path,
 	if (right)
 		new = two;
 
-	fflush(f);
 	cp.git_cmd = 1;
 	cp.dir = path;
-	cp.out = dup(fileno(f));
+	cp.out = -1;
 	cp.no_stdin = 1;
 
 	/* TODO: other options may need to be passed here. */
 	argv_array_push(&cp.args, "diff");
-	argv_array_pushf(&cp.args, "--line-prefix=%s", line_prefix);
+	if (o->use_color)
+		argv_array_push(&cp.args, "--color=always");
+	argv_array_pushf(&cp.args, "--line-prefix=%s", diff_line_prefix(o));
 	if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
 		argv_array_pushf(&cp.args, "--src-prefix=%s%s/",
 				 o->b_prefix, path);
@@ -578,11 +575,21 @@ void show_submodule_inline_diff(FILE *f, const char *path,
 		argv_array_push(&cp.args, oid_to_hex(new));
 
 	prepare_submodule_repo_env(&cp.env_array);
-	if (run_command(&cp))
-		fprintf(f, "(diff failed)\n");
+	if (start_command(&cp)) {
+		const char *error = "(diff failed)\n";
+		diff_emit_line(o, NULL, NULL, error, strlen(error));
+	}
+
+	while (strbuf_getwholeline_fd(&sb, cp.out, '\n') != EOF)
+		diff_emit_line(o, NULL, NULL, sb.buf, sb.len);
+
+	if (finish_command(&cp)) {
+		const char *error = "(diff failed)\n";
+		diff_emit_line(o, NULL, NULL, error, strlen(error));
+	}
 
 done:
-	strbuf_release(&submodule_dir);
+	strbuf_release(&sb);
 	if (merge_bases)
 		free_commit_list(merge_bases);
 	if (left)
diff --git a/submodule.h b/submodule.h
index 1277480add..9df0a3aea2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -53,17 +53,14 @@ extern int parse_submodule_update_strategy(const char *value,
 		struct submodule_update_strategy *dst);
 extern const char *submodule_strategy_to_string(const struct submodule_update_strategy *s);
 extern void handle_ignore_submodules_arg(struct diff_options *, const char *);
-extern void show_submodule_summary(FILE *f, const char *path,
-		const char *line_prefix,
+extern void show_submodule_summary(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
 		const char *del, const char *add, const char *reset);
-extern void show_submodule_inline_diff(FILE *f, const char *path,
-		const char *line_prefix,
+extern void show_submodule_inline_diff(struct diff_options *o, const char *path,
 		struct object_id *one, struct object_id *two,
 		unsigned dirty_submodule, const char *meta,
-		const char *del, const char *add, const char *reset,
-		const struct diff_options *opt);
+		const char *del, const char *add, const char *reset);
 extern void set_config_fetch_recurse_submodules(int value);
 extern void set_config_update_recurse_submodules(int value);
 /* Check if we want to update any submodule.*/
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 10/17] diff.c: convert emit_binary_diff_body to use emit_line_*
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (8 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 11/17] diff.c: convert show_stats " Stefan Beller
                         ` (7 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -) we
cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers emit_binary_diff_body.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)
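
The loop touched below starts every base85 line with one character
that encodes how many raw bytes (at most 52) the line carries. As a
reading aid, here is a minimal sketch of that mapping, taken from the
two assignments to line[0] in the hunk. binary_line_prefix() and
binary_line_length() are hypothetical names used only here, they do
not exist in diff.c.

```c
#include <assert.h>

/*
 * Map a chunk length (1..52 bytes) to the character that starts a
 * line of a binary patch body: 'A'..'Z' for 1..26 bytes and
 * 'a'..'z' for 27..52 bytes.
 * Hypothetical helper, named only for this sketch.
 */
char binary_line_prefix(int bytes)
{
	assert(bytes >= 1 && bytes <= 52);
	if (bytes <= 26)
		return (char)(bytes + 'A' - 1);
	return (char)(bytes - 26 + 'a' - 1);
}

/* Inverse mapping; returns -1 for a character outside both ranges. */
int binary_line_length(char c)
{
	if (c >= 'A' && c <= 'Z')
		return c - 'A' + 1;
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 27;
	return -1;
}
```

A full line of 52 bytes encodes to 65 base85 characters, so with the
length character, the newline and the terminating NUL the largest
line needs 68 bytes, which fits into the 71 byte buffer.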

diff --git a/diff.c b/diff.c
index 3357c0fca3..b5a5261a4e 100644
--- a/diff.c
+++ b/diff.c
@@ -2244,8 +2244,8 @@ static unsigned char *deflate_it(char *data,
 	return deflated;
 }
 
-static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
-				  const char *prefix)
+static void emit_binary_diff_body(struct diff_options *o,
+				  mmfile_t *one, mmfile_t *two)
 {
 	void *cp;
 	void *delta;
@@ -2274,13 +2274,12 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 	}
 
 	if (delta && delta_size < deflate_size) {
-		fprintf(file, "%sdelta %lu\n", prefix, orig_size);
+		emit_line_fmt(o, NULL, NULL, 1, "delta %lu\n", orig_size);
 		free(deflated);
 		data = delta;
 		data_size = delta_size;
-	}
-	else {
-		fprintf(file, "%sliteral %lu\n", prefix, two->size);
+	} else {
+		emit_line_fmt(o, NULL, NULL, 1, "literal %lu\n", two->size);
 		free(delta);
 		data = deflated;
 		data_size = deflate_size;
@@ -2289,8 +2288,9 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 	/* emit data encoded in base85 */
 	cp = data;
 	while (data_size) {
+		int len;
 		int bytes = (52 < data_size) ? 52 : data_size;
-		char line[70];
+		char line[71];
 		data_size -= bytes;
 		if (bytes <= 26)
 			line[0] = bytes + 'A' - 1;
@@ -2298,20 +2298,25 @@ static void emit_binary_diff_body(FILE *file, mmfile_t *one, mmfile_t *two,
 			line[0] = bytes - 26 + 'a' - 1;
 		encode_85(line + 1, cp, bytes);
 		cp = (char *) cp + bytes;
-		fprintf(file, "%s", prefix);
-		fputs(line, file);
-		fputc('\n', file);
+
+		len = strlen(line);
+		line[len++] = '\n';
+		line[len] = '\0';
+
+		emit_line(o, NULL, NULL, 1, 0, line, len);
 	}
-	fprintf(file, "%s\n", prefix);
+	emit_line(o, NULL, NULL, 1, 0, "\n", 1);
 	free(data);
 }
 
-static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two,
-			     const char *prefix)
+static void emit_binary_diff(struct diff_options *o,
+			     mmfile_t *one, mmfile_t *two)
 {
-	fprintf(file, "%sGIT binary patch\n", prefix);
-	emit_binary_diff_body(file, one, two, prefix);
-	emit_binary_diff_body(file, two, one, prefix);
+	const char *s = "GIT binary patch\n";
+	const int len = strlen(s);
+	emit_line(o, NULL, NULL, 1, 0, s, len);
+	emit_binary_diff_body(o, one, two);
+	emit_binary_diff_body(o, two, one);
 }
 
 int diff_filespec_is_binary(struct diff_filespec *one)
@@ -2498,7 +2503,7 @@ static void builtin_diff(const char *name_a,
 			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
-			emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
+			emit_binary_diff(o, &mf1, &mf2);
 		else
 			emit_line_fmt(o, NULL, NULL, 1,
 				      "Binary files %s and %s differ\n",
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 11/17] diff.c: convert show_stats to use emit_line_*
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (9 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 10/17] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 12/17] diff.c: convert word diffing " Stefan Beller
                         ` (6 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -) we
cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

print_stat_summary is also called from builtin/apply, so we still
need a version that takes a file pointer. Introduce
print_stat_summary_0, which uses the emit_line_* machinery, and
keep print_stat_summary with the same arguments as a thin wrapper
around it. The wrapper now returns void instead of int.

The responsibility to print the line prefix moves from the callers
of print_stat_summary_0 into the function itself.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 89 ++++++++++++++++++++++++++++++++++++++----------------------------
 diff.h |  4 +--
 2 files changed, 53 insertions(+), 40 deletions(-)
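
The pattern used throughout show_stats below is "assemble the whole
line in a buffer, then hand it to the emitter once". A rough
stand-alone sketch of that idea follows. It assumes a fixed-size
buffer instead of git's strbuf, and struct line_buf, line_buf_add(),
add_graph() and emit_buffered() are made-up names for illustration
only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a strbuf: a fixed-size line buffer. */
struct line_buf {
	char buf[256];
	size_t len;
};

/* Append n bytes, truncating rather than overflowing the buffer. */
void line_buf_add(struct line_buf *lb, const char *s, size_t n)
{
	size_t room = sizeof(lb->buf) - 1 - lb->len;

	if (n > room)
		n = room;
	memcpy(lb->buf + lb->len, s, n);
	lb->len += n;
	lb->buf[lb->len] = '\0';
}

/* Append cnt copies of ch, wrapped in the set/reset color strings. */
void add_graph(struct line_buf *lb, char ch, int cnt,
	       const char *set, const char *reset)
{
	if (cnt <= 0)
		return;
	line_buf_add(lb, set, strlen(set));
	while (cnt--)
		line_buf_add(lb, &ch, 1);
	line_buf_add(lb, reset, strlen(reset));
}

/* The single place that writes: line prefix first, then the line. */
void emit_buffered(FILE *f, const char *prefix, const struct line_buf *lb)
{
	fputs(prefix, f);
	fwrite(lb->buf, 1, lb->len, f);
}
```

Because only emit_buffered() touches the FILE, a later change could
store the finished lines instead of writing them, which is what the
buffering patch at the end of this series relies on.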

diff --git a/diff.c b/diff.c
index b5a5261a4e..a76abf5f69 100644
--- a/diff.c
+++ b/diff.c
@@ -1540,20 +1540,19 @@ static int scale_linear(int it, int width, int max_change)
 	return 1 + (it * (width - 1) / max_change);
 }
 
-static void show_name(FILE *file,
+static void show_name(struct strbuf *out,
 		      const char *prefix, const char *name, int len)
 {
-	fprintf(file, " %s%-*s |", prefix, len, name);
+	strbuf_addf(out, " %s%-*s |", prefix, len, name);
 }
 
-static void show_graph(FILE *file, char ch, int cnt, const char *set, const char *reset)
+static void show_graph(struct strbuf *out, char ch, int cnt, const char *set, const char *reset)
 {
 	if (cnt <= 0)
 		return;
-	fprintf(file, "%s", set);
-	while (cnt--)
-		putc(ch, file);
-	fprintf(file, "%s", reset);
+	strbuf_addstr(out, set);
+	strbuf_addchars(out, ch, cnt);
+	strbuf_addstr(out, reset);
 }
 
 static void fill_print_name(struct diffstat_file *file)
@@ -1577,14 +1576,16 @@ static void fill_print_name(struct diffstat_file *file)
 	file->print_name = pname;
 }
 
-int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
+static void print_stat_summary_0(struct diff_options *options, int files,
+				 int insertions, int deletions)
 {
 	struct strbuf sb = STRBUF_INIT;
-	int ret;
 
 	if (!files) {
 		assert(insertions == 0 && deletions == 0);
-		return fprintf(fp, "%s\n", " 0 files changed");
+		emit_line(options, NULL, NULL, 1, 0,
+			  " 0 files changed\n", strlen(" 0 files changed\n"));
+		return;
 	}
 
 	strbuf_addf(&sb,
@@ -1611,9 +1612,17 @@ int print_stat_summary(FILE *fp, int files, int insertions, int deletions)
 			    deletions);
 	}
 	strbuf_addch(&sb, '\n');
-	ret = fputs(sb.buf, fp);
+	emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
-	return ret;
+}
+
+void print_stat_summary(FILE *fp, int files,
+			int insertions, int deletions)
+{
+	struct diff_options o;
+	memset(&o, 0, sizeof(o));
+	o.file = fp;
+	print_stat_summary_0(&o, files, insertions, deletions);
 }
 
 static void show_stats(struct diffstat_t *data, struct diff_options *options)
@@ -1623,13 +1632,13 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 	int total_files = data->nr, count;
 	int width, name_width, graph_width, number_width = 0, bin_width = 0;
 	const char *reset, *add_c, *del_c;
-	const char *line_prefix = "";
 	int extra_shown = 0;
+	const char *line_prefix = diff_line_prefix(options);
+	struct strbuf out = STRBUF_INIT;
 
 	if (data->nr == 0)
 		return;
 
-	line_prefix = diff_line_prefix(options);
 	count = options->stat_count ? options->stat_count : data->nr;
 
 	reset = diff_get_color_opt(options, DIFF_RESET);
@@ -1783,26 +1792,29 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		}
 
 		if (file->is_binary) {
-			fprintf(options->file, "%s", line_prefix);
-			show_name(options->file, prefix, name, len);
-			fprintf(options->file, " %*s", number_width, "Bin");
+			show_name(&out, prefix, name, len);
+			strbuf_addf(&out, " %*s", number_width, "Bin");
 			if (!added && !deleted) {
-				putc('\n', options->file);
+				strbuf_addch(&out, '\n');
+				emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+				strbuf_reset(&out);
 				continue;
 			}
-			fprintf(options->file, " %s%"PRIuMAX"%s",
+			strbuf_addf(&out, " %s%"PRIuMAX"%s",
 				del_c, deleted, reset);
-			fprintf(options->file, " -> ");
-			fprintf(options->file, "%s%"PRIuMAX"%s",
+			strbuf_addstr(&out, " -> ");
+			strbuf_addf(&out, "%s%"PRIuMAX"%s",
 				add_c, added, reset);
-			fprintf(options->file, " bytes");
-			fprintf(options->file, "\n");
+			strbuf_addstr(&out, " bytes\n");
+			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			strbuf_reset(&out);
 			continue;
 		}
 		else if (file->is_unmerged) {
-			fprintf(options->file, "%s", line_prefix);
-			show_name(options->file, prefix, name, len);
-			fprintf(options->file, " Unmerged\n");
+			show_name(&out, prefix, name, len);
+			strbuf_addstr(&out, " Unmerged\n");
+			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			strbuf_reset(&out);
 			continue;
 		}
 
@@ -1825,14 +1837,15 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 				add = total - del;
 			}
 		}
-		fprintf(options->file, "%s", line_prefix);
-		show_name(options->file, prefix, name, len);
-		fprintf(options->file, " %*"PRIuMAX"%s",
+		show_name(&out, prefix, name, len);
+		strbuf_addf(&out, " %*"PRIuMAX"%s",
 			number_width, added + deleted,
 			added + deleted ? " " : "");
-		show_graph(options->file, '+', add, add_c, reset);
-		show_graph(options->file, '-', del, del_c, reset);
-		fprintf(options->file, "\n");
+		show_graph(&out, '+', add, add_c, reset);
+		show_graph(&out, '-', del, del_c, reset);
+		strbuf_addch(&out, '\n');
+		emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+		strbuf_reset(&out);
 	}
 
 	for (i = 0; i < data->nr; i++) {
@@ -1853,11 +1866,12 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		if (i < count)
 			continue;
 		if (!extra_shown)
-			fprintf(options->file, "%s ...\n", line_prefix);
+			emit_line(options, NULL, NULL, 1, 0,
+				  " ...\n", strlen(" ...\n"));
 		extra_shown = 1;
 	}
-	fprintf(options->file, "%s", line_prefix);
-	print_stat_summary(options->file, total_files, adds, dels);
+
+	print_stat_summary_0(options, total_files, adds, dels);
 }
 
 static void show_shortstats(struct diffstat_t *data, struct diff_options *options)
@@ -1869,7 +1883,7 @@ static void show_shortstats(struct diffstat_t *data, struct diff_options *option
 
 	for (i = 0; i < data->nr; i++) {
 		int added = data->files[i]->added;
-		int deleted= data->files[i]->deleted;
+		int deleted = data->files[i]->deleted;
 
 		if (data->files[i]->is_unmerged ||
 		    (!data->files[i]->is_interesting && (added + deleted == 0))) {
@@ -1879,8 +1893,7 @@ static void show_shortstats(struct diffstat_t *data, struct diff_options *option
 			dels += deleted;
 		}
 	}
-	fprintf(options->file, "%s", diff_line_prefix(options));
-	print_stat_summary(options->file, total_files, adds, dels);
+	print_stat_summary_0(options, total_files, adds, dels);
 }
 
 static void show_numstat(struct diffstat_t *data, struct diff_options *options)
diff --git a/diff.h b/diff.h
index 9ad546361a..56d8dd036e 100644
--- a/diff.h
+++ b/diff.h
@@ -392,8 +392,8 @@ extern int parse_rename_score(const char **cp_p);
 
 extern long parse_algorithm_value(const char *value);
 
-extern int print_stat_summary(FILE *fp, int files,
-			      int insertions, int deletions);
+extern void print_stat_summary(FILE *fp, int files,
+			       int insertions, int deletions);
 extern void setup_diff_pager(struct diff_options *);
 
 #endif /* DIFF_H */
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 12/17] diff.c: convert word diffing to use emit_line_*
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (10 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 11/17] diff.c: convert show_stats " Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 13/17] diff.c: convert diff_flush " Stefan Beller
                         ` (5 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -) we
cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers all code related to diffing words.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 73 +++++++++++++++++++++++++++++-------------------------------------
 1 file changed, 32 insertions(+), 41 deletions(-)
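
fn_out_diff_words_write_helper walks a buffer and emits the text
between newlines separately from the newlines themselves. The
following sketch shows just that memchr-based splitting, detached
from the diff machinery. for_each_segment(), segment_fn and
count_segments() are hypothetical names, and the callback stands in
for the emit_line calls in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Called once per text run (is_newline == 0) and once per '\n'. */
typedef void (*segment_fn)(const char *seg, size_t len,
			   int is_newline, void *data);

/*
 * Split buf[0..count) at every '\n'. Non-empty text runs and the
 * newlines are reported separately, in order of appearance.
 */
void for_each_segment(const char *buf, size_t count,
		      segment_fn fn, void *data)
{
	while (count) {
		const char *p = memchr(buf, '\n', count);
		size_t seg = p ? (size_t)(p - buf) : count;

		if (seg)
			fn(buf, seg, 0, data);
		if (!p)
			return;
		fn(p, 1, 1, data);
		count -= seg + 1;
		buf = p + 1;
	}
}

/* Example callback: counts[0] tallies text runs, counts[1] newlines. */
void count_segments(const char *seg, size_t len, int is_newline, void *data)
{
	int *counts = data;

	(void)seg;
	(void)len;
	counts[is_newline ? 1 : 0]++;
}
```

In the patch the text runs get the word style's color, prefix and
suffix, while the newline is emitted without color, which is why the
two cases are kept apart.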

diff --git a/diff.c b/diff.c
index a76abf5f69..8317824963 100644
--- a/diff.c
+++ b/diff.c
@@ -897,37 +897,42 @@ struct diff_words_data {
 	struct diff_words_style *style;
 };
 
-static int fn_out_diff_words_write_helper(FILE *fp,
+static int fn_out_diff_words_write_helper(struct diff_options *o,
 					  struct diff_words_style_elem *st_el,
 					  const char *newline,
-					  size_t count, const char *buf,
-					  const char *line_prefix)
+					  size_t count, const char *buf)
 {
 	int print = 0;
+	struct strbuf sb = STRBUF_INIT;
 
 	while (count) {
 		char *p = memchr(buf, '\n', count);
 		if (print)
-			fputs(line_prefix, fp);
+			emit_line(o, NULL, NULL, 1, 0, "", 0);
+
 		if (p != buf) {
-			if (st_el->color && fputs(st_el->color, fp) < 0)
-				return -1;
-			if (fputs(st_el->prefix, fp) < 0 ||
-			    fwrite(buf, p ? p - buf : count, 1, fp) != 1 ||
-			    fputs(st_el->suffix, fp) < 0)
-				return -1;
-			if (st_el->color && *st_el->color
-			    && fputs(GIT_COLOR_RESET, fp) < 0)
-				return -1;
+			const char *reset = st_el->color && *st_el->color ?
+					    GIT_COLOR_RESET : NULL;
+			strbuf_addstr(&sb, st_el->prefix);
+			strbuf_add(&sb, buf, p ? p - buf : count);
+			strbuf_addstr(&sb, st_el->suffix);
+			emit_line(o, st_el->color, reset,
+				  0, 0, sb.buf, sb.len);
+			strbuf_reset(&sb);
 		}
 		if (!p)
-			return 0;
-		if (fputs(newline, fp) < 0)
-			return -1;
+			goto out;
+
+		strbuf_addstr(&sb, newline);
+		emit_line(o, NULL, NULL, 0, 0, sb.buf, sb.len);
+		strbuf_reset(&sb);
 		count -= p + 1 - buf;
 		buf = p + 1;
 		print = 1;
 	}
+
+out:
+	strbuf_release(&sb);
 	return 0;
 }
 
@@ -981,14 +986,12 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	int minus_first, minus_len, plus_first, plus_len;
 	const char *minus_begin, *minus_end, *plus_begin, *plus_end;
 	struct diff_options *opt = diff_words->opt;
-	const char *line_prefix;
 
 	if (line[0] != '@' || parse_hunk_header(line, len,
 			&minus_first, &minus_len, &plus_first, &plus_len))
 		return;
 
 	assert(opt);
-	line_prefix = diff_line_prefix(opt);
 
 	/* POSIX requires that first be decremented by one if len == 0... */
 	if (minus_len) {
@@ -1005,28 +1008,21 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	} else
 		plus_begin = plus_end = diff_words->plus.orig[plus_first].end;
 
-	if (color_words_output_graph_prefix(diff_words)) {
-		fputs(line_prefix, diff_words->opt->file);
-	}
 	if (diff_words->current_plus != plus_begin) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->ctx, style->newline,
 				plus_begin - diff_words->current_plus,
-				diff_words->current_plus, line_prefix);
-		if (*(plus_begin - 1) == '\n')
-			fputs(line_prefix, diff_words->opt->file);
+				diff_words->current_plus);
 	}
 	if (minus_begin != minus_end) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->old, style->newline,
-				minus_end - minus_begin, minus_begin,
-				line_prefix);
+				minus_end - minus_begin, minus_begin);
 	}
 	if (plus_begin != plus_end) {
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 				&style->new, style->newline,
-				plus_end - plus_begin, plus_begin,
-				line_prefix);
+				plus_end - plus_begin, plus_begin);
 	}
 
 	diff_words->current_plus = plus_end;
@@ -1113,18 +1109,14 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	struct diff_words_style *style = diff_words->style;
 
 	struct diff_options *opt = diff_words->opt;
-	const char *line_prefix;
-
 	assert(opt);
-	line_prefix = diff_line_prefix(opt);
 
 	/* special case: only removal */
 	if (!diff_words->plus.text.size) {
-		fputs(line_prefix, diff_words->opt->file);
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->old, style->newline,
 			diff_words->minus.text.size,
-			diff_words->minus.text.ptr, line_prefix);
+			diff_words->minus.text.ptr);
 		diff_words->minus.text.size = 0;
 		return;
 	}
@@ -1147,12 +1139,11 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	if (diff_words->current_plus != diff_words->plus.text.ptr +
 			diff_words->plus.text.size) {
 		if (color_words_output_graph_prefix(diff_words))
-			fputs(line_prefix, diff_words->opt->file);
-		fn_out_diff_words_write_helper(diff_words->opt->file,
+			emit_line(diff_words->opt, NULL, NULL, 1, 0, "", 0);
+		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->ctx, style->newline,
 			diff_words->plus.text.ptr + diff_words->plus.text.size
-			- diff_words->current_plus, diff_words->current_plus,
-			line_prefix);
+			- diff_words->current_plus, diff_words->current_plus);
 	}
 	diff_words->minus.text.size = diff_words->plus.text.size = 0;
 }
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 13/17] diff.c: convert diff_flush to use emit_line_*
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (11 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 12/17] diff.c: convert word diffing " Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 14/17] diff.c: convert diff_summary " Stefan Beller
                         ` (4 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -) we
cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers diff_flush.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
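
To illustrate why a single pass is not enough, here is a deliberately
naive sketch of a two-pass approach: the first pass (not shown) has
buffered all lines, the second pass marks every added or removed line
whose text also occurs with the opposite sign. This is NOT the
algorithm proposed later in the series; struct buffered_line and
mark_moved() are invented for this note, and the quadratic search is
only there to keep the example short.

```c
#include <string.h>

/* One buffered diff line; hypothetical layout for this sketch only. */
struct buffered_line {
	char sign;		/* '+', '-' or ' ' for context */
	const char *text;	/* line content without the sign */
	int moved;		/* set by mark_moved() */
};

/*
 * Second pass over the buffered lines: a '+' line counts as moved if
 * some '-' line has the same text, and vice versa. Context lines are
 * never marked.
 */
void mark_moved(struct buffered_line *lines, int nr)
{
	int i, j;

	for (i = 0; i < nr; i++) {
		char want;

		if (lines[i].sign != '+' && lines[i].sign != '-')
			continue;
		want = lines[i].sign == '+' ? '-' : '+';
		for (j = 0; j < nr; j++) {
			if (lines[j].sign == want &&
			    !strcmp(lines[j].text, lines[i].text)) {
				lines[i].moved = 1;
				break;
			}
		}
	}
}
```

The point is only that the '-' line at the top of a diff cannot be
colored as moved until the matching '+' line further down has been
seen, so the output has to be held back until then.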

diff --git a/diff.c b/diff.c
index 8317824963..8ebe673331 100644
--- a/diff.c
+++ b/diff.c
@@ -4872,7 +4872,9 @@ void diff_flush(struct diff_options *options)
 			emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
-				fputs(options->stat_sep, options->file);
+				emit_line(options, NULL, NULL, 0, 0,
+					  options->stat_sep,
+					  strlen(options->stat_sep));
 			}
 		}
 
-- 
2.13.0.18.g7d86cc8ba0



* [PATCHv4 14/17] diff.c: convert diff_summary to use emit_line_*
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (12 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 13/17] diff.c: convert diff_flush " Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 15/17] diff.c: emit_line includes whitespace highlighting Stefan Beller
                         ` (3 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

In a later patch, I want to propose an option to detect and color
moved lines in a diff, which cannot be done in a single pass over
the diff. Instead we need to go over the whole diff twice, because
when we see the first of two corresponding lines (+ and -) we
cannot yet know that it was moved.

So to prepare the diff machinery for two-pass algorithms
(i.e. buffer it all up and then operate on the result),
funnel all output through one place, such that the only emitting
function is emit_line_0.

This covers diff_summary.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 64 ++++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 34 insertions(+), 30 deletions(-)
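
show_mode_change below now produces its summary line with a single
format call instead of an fprintf followed by an optional name. A
small sketch of the resulting line layout, assuming the path needs no
quoting (the real code runs it through quote_c_style first).
format_mode_change() is a hypothetical helper, not part of diff.c.

```c
#include <stdio.h>

/*
 * Format a " mode change OLD => NEW[ NAME]" summary line into out.
 * name may be NULL, in which case only the modes are shown.
 * Returns what snprintf returns: the length of the full line.
 */
int format_mode_change(char *out, size_t size,
		       unsigned old_mode, unsigned new_mode,
		       const char *name)
{
	return snprintf(out, size, " mode change %06o => %06o%s%s\n",
			old_mode, new_mode,
			name ? " " : "", name ? name : "");
}
```

For example, old mode 0100644, new mode 0100755 and the name "run.sh"
give the line " mode change 100644 => 100755 run.sh".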

diff --git a/diff.c b/diff.c
index 8ebe673331..76cafde4be 100644
--- a/diff.c
+++ b/diff.c
@@ -4504,67 +4504,71 @@ static void flush_one_pair(struct diff_filepair *p, struct diff_options *opt)
 	}
 }
 
-static void show_file_mode_name(FILE *file, const char *newdelete, struct diff_filespec *fs)
+static void show_file_mode_name(struct diff_options *opt, const char *newdelete, struct diff_filespec *fs)
 {
+	struct strbuf sb = STRBUF_INIT;
 	if (fs->mode)
-		fprintf(file, " %s mode %06o ", newdelete, fs->mode);
+		strbuf_addf(&sb, " %s mode %06o ", newdelete, fs->mode);
 	else
-		fprintf(file, " %s ", newdelete);
-	write_name_quoted(fs->path, file, '\n');
-}
+		strbuf_addf(&sb, " %s ", newdelete);
 
+	quote_c_style(fs->path, &sb, NULL, 0);
+	strbuf_addch(&sb, '\n');
+	emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+	strbuf_release(&sb);
+}
 
-static void show_mode_change(FILE *file, struct diff_filepair *p, int show_name,
-		const char *line_prefix)
+static void show_mode_change(struct diff_options *opt, struct diff_filepair *p,
+		int show_name)
 {
 	if (p->one->mode && p->two->mode && p->one->mode != p->two->mode) {
-		fprintf(file, "%s mode change %06o => %06o%c", line_prefix, p->one->mode,
-			p->two->mode, show_name ? ' ' : '\n');
+		struct strbuf sb = STRBUF_INIT;
 		if (show_name) {
-			write_name_quoted(p->two->path, file, '\n');
+			strbuf_addch(&sb, ' ');
+			quote_c_style(p->two->path, &sb, NULL, 0);
 		}
+		emit_line_fmt(opt, NULL, NULL, 1,
+			      " mode change %06o => %06o%s\n",
+			      p->one->mode, p->two->mode,
+			      show_name ? sb.buf : "");
+		strbuf_release(&sb);
 	}
 }
 
-static void show_rename_copy(FILE *file, const char *renamecopy, struct diff_filepair *p,
-			const char *line_prefix)
+static void show_rename_copy(struct diff_options *opt, const char *renamecopy,
+		struct diff_filepair *p)
 {
 	char *names = pprint_rename(p->one->path, p->two->path);
-
-	fprintf(file, " %s %s (%d%%)\n", renamecopy, names, similarity_index(p));
+	emit_line_fmt(opt, NULL, NULL, 1, " %s %s (%d%%)\n",
+		      renamecopy, names, similarity_index(p));
 	free(names);
-	show_mode_change(file, p, 0, line_prefix);
+	show_mode_change(opt, p, 0);
 }
 
 static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 {
-	FILE *file = opt->file;
-	const char *line_prefix = diff_line_prefix(opt);
-
 	switch(p->status) {
 	case DIFF_STATUS_DELETED:
-		fputs(line_prefix, file);
-		show_file_mode_name(file, "delete", p->one);
+		show_file_mode_name(opt, "delete", p->one);
 		break;
 	case DIFF_STATUS_ADDED:
-		fputs(line_prefix, file);
-		show_file_mode_name(file, "create", p->two);
+		show_file_mode_name(opt, "create", p->two);
 		break;
 	case DIFF_STATUS_COPIED:
-		fputs(line_prefix, file);
-		show_rename_copy(file, "copy", p, line_prefix);
+		show_rename_copy(opt, "copy", p);
 		break;
 	case DIFF_STATUS_RENAMED:
-		fputs(line_prefix, file);
-		show_rename_copy(file, "rename", p, line_prefix);
+		show_rename_copy(opt, "rename", p);
 		break;
 	default:
 		if (p->score) {
-			fprintf(file, "%s rewrite ", line_prefix);
-			write_name_quoted(p->two->path, file, ' ');
-			fprintf(file, "(%d%%)\n", similarity_index(p));
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_addstr(&sb, " rewrite ");
+			quote_c_style(p->two->path, &sb, NULL, 0);
+			strbuf_addf(&sb, " (%d%%)\n", similarity_index(p));
+			emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
 		}
-		show_mode_change(file, p, !p->score, line_prefix);
+		show_mode_change(opt, p, !p->score);
 		break;
 	}
 }
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv4 15/17] diff.c: emit_line includes whitespace highlighting
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (13 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 14/17] diff.c: convert diff_summary " Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 16/17] diff: buffer all output if asked to Stefan Beller
                         ` (2 subsequent siblings)
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

Currently all whitespace highlighting happens outside the emit_line
function. Teach emit_line to do the highlighting itself, triggered by a
new parameter.
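
To illustrate the idea, here is a minimal sketch (not the code of this
patch) of an emit function that takes a markup_ws flag. All sketch_*
names are hypothetical, and sketch_ws_emit is a simplified stand-in for
ws_check_emit that only highlights trailing blanks; carriage returns
are ignored for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for ws_check_emit(): marks trailing blanks only. */
static void sketch_ws_emit(FILE *f, const char *line, size_t len,
			   const char *set, const char *reset, const char *ws)
{
	int nl = (len > 0 && line[len - 1] == '\n');
	size_t end = nl ? len - 1 : len;	/* end of text before newline */
	size_t body = end;			/* end of non-blank text */

	while (body > 0 && (line[body - 1] == ' ' || line[body - 1] == '\t'))
		body--;
	if (body) {
		fputs(set, f);
		fwrite(line, 1, body, f);
		fputs(reset, f);
	}
	if (end > body) {
		fputs(ws, f);
		fwrite(line + body, 1, end - body, f);
		fputs(reset, f);
	}
	if (nl)
		fputc('\n', f);
}

/* One entry point; markup_ws selects the whitespace-highlighting path. */
static void sketch_emit_line(FILE *f, const char *set, const char *reset,
			     const char *ws, int markup_ws, int sign,
			     const char *line, size_t len)
{
	assert(f && line);
	if (!set)
		set = "";
	if (!reset)
		reset = "";

	if (markup_ws) {
		/* Emit only the colored sign, then the marked-up line. */
		fputs(set, f);
		if (sign)
			fputc(sign, f);
		fputs(reset, f);
		sketch_ws_emit(f, line, len, set, reset, ws ? ws : "");
		return;
	}

	/* Plain path: [set][sign]line[reset], newline kept outside the color. */
	if (len > 0 && line[len - 1] == '\n') {
		fputs(set, f);
		if (sign)
			fputc(sign, f);
		fwrite(line, 1, len - 1, f);
		fputs(reset, f);
		fputc('\n', f);
	} else {
		fputs(set, f);
		if (sign)
			fputc(sign, f);
		fwrite(line, 1, len, f);
		fputs(reset, f);
	}
}
```

With this shape, callers no longer need to emit the sign and the
marked-up content in two separate steps.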

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 107 ++++++++++++++++++++++++++++++++++++++---------------------------
 diff.h |   2 ++
 2 files changed, 65 insertions(+), 44 deletions(-)

diff --git a/diff.c b/diff.c
index 76cafde4be..514c5facd7 100644
--- a/diff.c
+++ b/diff.c
@@ -516,15 +516,34 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-static void emit_line(struct diff_options *o, const char *set, const char *reset,
-		      int add_line_prefix, int sign, const char *line, int len)
+
+static void emit_line(struct diff_options *o,
+		      const char *set, const char *reset,
+		      int add_line_prefix, int markup_ws,
+		      int sign, const char *line, int len)
 {
+	const char *ws;
 	int has_trailing_newline, has_trailing_carriage_return;
 	FILE *file = o->file;
 
 	if (add_line_prefix)
 		fputs(diff_line_prefix(o), file);
 
+	if (markup_ws) {
+		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+
+		if (set)
+			fputs(set, file);
+		if (sign)
+			fputc(sign, file);
+		if (reset)
+			fputs(reset, file);
+		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+		ws_check_emit(line, len, o->ws_rule,
+			      file, set, reset, ws);
+		return;
+	}
+
 	has_trailing_newline = (len > 0 && line[len-1] == '\n');
 	if (has_trailing_newline)
 		len--;
@@ -558,14 +577,14 @@ static void emit_line_fmt(struct diff_options *o,
 	strbuf_vaddf(&sb, fmt, ap);
 	va_end(ap);
 
-	emit_line(o, set, reset, add_line_prefix, 0, sb.buf, sb.len);
+	emit_line(o, set, reset, add_line_prefix, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
 void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
 		    const char *line, int len)
 {
-	emit_line(o, set, reset, 1, 0, line, len);
+	emit_line(o, set, reset, 1, 0, 0, line, len);
 }
 
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char *line, int len)
@@ -596,16 +615,15 @@ static void emit_line_checked(const char *reset,
 	}
 
 	if (!ws)
-		emit_line(ecbdata->opt, set, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, set, reset, 1, 0, sign, line, len);
 	else if (sign == '+' && new_blank_line_at_eof(ecbdata, line, len))
 		/* Blank line at EOF - paint '+' as well */
-		emit_line(ecbdata->opt, ws, reset, 1, sign, line, len);
+		emit_line(ecbdata->opt, ws, reset, 1, 1, sign, line, len);
 	else {
 		/* Emit just the prefix, then the rest. */
-		emit_line(ecbdata->opt, set, reset, 1, sign, "", 0);
-		ws_check_emit(line, len, ecbdata->ws_rule,
-			      ecbdata->opt->file, set, reset, ws);
+		emit_line(ecbdata->opt, set, reset, 1, 1, sign, line, len);
 	}
+
 }
 
 static void emit_add_line(const char *reset,
@@ -652,7 +670,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	if (len < 10 ||
 	    memcmp(line, atat, 2) ||
 	    !(ep = memmem(line + 2, len - 2, atat, 2))) {
-		emit_line(ecbdata->opt, context, reset, 1, 0, line, len);
+		emit_line(ecbdata->opt, context, reset, 1, 0, 0, line, len);
 		return;
 	}
 	ep += 2; /* skip over @@ */
@@ -688,7 +706,7 @@ static void emit_hunk_header(struct emit_callback *ecbdata,
 	strbuf_add(&msgbuf, line + len, org_len - len);
 	strbuf_complete_line(&msgbuf);
 
-	emit_line(ecbdata->opt, "", "", 1, 0, msgbuf.buf, msgbuf.len);
+	emit_line(ecbdata->opt, "", "", 1, 0, 0, msgbuf.buf, msgbuf.len);
 	strbuf_release(&msgbuf);
 }
 
@@ -759,7 +777,7 @@ static void emit_rewrite_lines(struct emit_callback *ecb,
 		static const char *nneof = "\\ No newline at end of file\n";
 		const char *context = diff_get_color(ecb->color_diff,
 						     DIFF_CONTEXT);
-		emit_line(ecb->opt, context, reset, 1, 0,
+		emit_line(ecb->opt, context, reset, 1, 0, 0,
 			    nneof, strlen(nneof));
 		strbuf_release(&sb);
 	}
@@ -835,7 +853,7 @@ static void emit_rewrite_diff(const char *name_a,
 	strbuf_addstr(&out, " +");
 	add_line_count(&out, lc_b);
 	strbuf_addstr(&out, " @@\n");
-	emit_line(o, fraginfo, reset, 1, 0, out.buf, out.len);
+	emit_line(o, fraginfo, reset, 1, 0, 0, out.buf, out.len);
 	strbuf_release(&out);
 
 	if (lc_a && !o->irreversible_delete)
@@ -908,7 +926,7 @@ static int fn_out_diff_words_write_helper(struct diff_options *o,
 	while (count) {
 		char *p = memchr(buf, '\n', count);
 		if (print)
-			emit_line(o, NULL, NULL, 1, 0, "", 0);
+			emit_line(o, NULL, NULL, 1, 0, 0, "", 0);
 
 		if (p != buf) {
 			const char *reset = st_el->color && *st_el->color ?
@@ -917,14 +935,14 @@ static int fn_out_diff_words_write_helper(struct diff_options *o,
 			strbuf_add(&sb, buf, p ? p - buf : count);
 			strbuf_addstr(&sb, st_el->suffix);
 			emit_line(o, st_el->color, reset,
-				  0, 0, sb.buf, sb.len);
+				  0, 0, 0, sb.buf, sb.len);
 			strbuf_reset(&sb);
 		}
 		if (!p)
 			goto out;
 
 		strbuf_addstr(&sb, newline);
-		emit_line(o, NULL, NULL, 0, 0, sb.buf, sb.len);
+		emit_line(o, NULL, NULL, 0, 0, 0, sb.buf, sb.len);
 		strbuf_reset(&sb);
 		count -= p + 1 - buf;
 		buf = p + 1;
@@ -1139,7 +1157,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	if (diff_words->current_plus != diff_words->plus.text.ptr +
 			diff_words->plus.text.size) {
 		if (color_words_output_graph_prefix(diff_words))
-			emit_line(diff_words->opt, NULL, NULL, 1, 0, "", 0);
+			emit_line(diff_words->opt, NULL, NULL, 1, 0, 0, "", 0);
 		fn_out_diff_words_write_helper(diff_words->opt,
 			&style->ctx, style->newline,
 			diff_words->plus.text.ptr + diff_words->plus.text.size
@@ -1298,7 +1316,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 	o->found_changes = 1;
 
 	if (ecbdata->header) {
-		emit_line(o, NULL, NULL, 0, 0,
+		emit_line(o, NULL, NULL, 0, 0, 0,
 			  ecbdata->header->buf, ecbdata->header->len);
 		strbuf_release(ecbdata->header);
 		ecbdata->header = NULL;
@@ -1351,8 +1369,8 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		}
 		diff_words_flush(ecbdata);
 		if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
-			emit_line(o, context, reset, 1, 0, line, len);
-			emit_line(o, NULL, NULL, 0, 0, "~\n", 2);
+			emit_line(o, context, reset, 1, 0, 0, line, len);
+			emit_line(o, NULL, NULL, 0, 0, 0, "~\n", 2);
 		} else {
 			/*
 			 * Skip the prefix character, if any.  With
@@ -1363,7 +1381,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 			      line++;
 			      len--;
 			}
-			emit_line(o, context, reset, 1, 0, line, len);
+			emit_line(o, context, reset, 1, 0, 0, line, len);
 		}
 		return;
 	}
@@ -1386,7 +1404,7 @@ static void fn_out_consume(void *priv, char *line, unsigned long len)
 		/* incomplete line at the end */
 		ecbdata->lno_in_preimage++;
 		emit_line(o, diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
-			  reset, 1, 0, line, len);
+			  reset, 1, 0, 0, line, len);
 		break;
 	}
 }
@@ -1575,7 +1593,7 @@ static void print_stat_summary_0(struct diff_options *options, int files,
 	if (!files) {
 		assert(insertions == 0 && deletions == 0);
 		strbuf_addstr(&sb, " 0 files changed");
-		emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
+		emit_line(options, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 		return;
 	}
 
@@ -1603,7 +1621,7 @@ static void print_stat_summary_0(struct diff_options *options, int files,
 			    deletions);
 	}
 	strbuf_addch(&sb, '\n');
-	emit_line(options, NULL, NULL, 1, 0, sb.buf, sb.len);
+	emit_line(options, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
@@ -1787,7 +1805,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			strbuf_addf(&out, " %*s", number_width, "Bin");
 			if (!added && !deleted) {
 				strbuf_addch(&out, '\n');
-				emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+				emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 				strbuf_reset(&out);
 				continue;
 			}
@@ -1797,14 +1815,14 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			strbuf_addf(&out, "%s%"PRIuMAX"%s",
 				add_c, added, reset);
 			strbuf_addstr(&out, " bytes\n");
-			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 			strbuf_reset(&out);
 			continue;
 		}
 		else if (file->is_unmerged) {
 			show_name(&out, prefix, name, len);
 			strbuf_addstr(&out, " Unmerged\n");
-			emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+			emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 			strbuf_reset(&out);
 			continue;
 		}
@@ -1835,7 +1853,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		show_graph(&out, '+', add, add_c, reset);
 		show_graph(&out, '-', del, del_c, reset);
 		strbuf_addch(&out, '\n');
-		emit_line(options, NULL, NULL, 1, 0, out.buf, out.len);
+		emit_line(options, NULL, NULL, 1, 0, 0, out.buf, out.len);
 		strbuf_reset(&out);
 	}
 
@@ -1857,7 +1875,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 		if (i < count)
 			continue;
 		if (!extra_shown)
-			emit_line(options, NULL, NULL, 1, 0,
+			emit_line(options, NULL, NULL, 1, 0, 0,
 				  " ...\n", strlen(" ...\n"));
 		extra_shown = 1;
 	}
@@ -2211,7 +2229,7 @@ static void checkdiff_consume(void *priv, char *line, unsigned long len)
 		fprintf(data->o->file, "%s%s:%d: %s.\n",
 			line_prefix, data->filename, data->lineno, err);
 		free(err);
-		emit_line(data->o, set, reset, 1, 0, line, 1);
+		emit_line(data->o, set, reset, 1, 0, 0, line, 1);
 		ws_check_emit(line + 1, len - 1, data->ws_rule,
 			      data->o->file, set, reset, ws);
 	} else if (line[0] == ' ') {
@@ -2307,9 +2325,9 @@ static void emit_binary_diff_body(struct diff_options *o,
 		line[len++] = '\n';
 		line[len] = '\0';
 
-		emit_line(o, NULL, NULL, 1, 0, line, len);
+		emit_line(o, NULL, NULL, 1, 0, 0, line, len);
 	}
-	emit_line(o, NULL, NULL, 1, 0, "\n", 1);
+	emit_line(o, NULL, NULL, 1, 0, 0, "\n", 1);
 	free(data);
 }
 
@@ -2318,7 +2336,7 @@ static void emit_binary_diff(struct diff_options *o,
 {
 	const char *s = "GIT binary patch\n";
 	const int len = strlen(s);
-	emit_line(o, NULL, NULL, 1, 0, s, len);
+	emit_line(o, NULL, NULL, 1, 0, 0, s, len);
 	emit_binary_diff_body(o, one, two);
 	emit_binary_diff_body(o, two, one);
 }
@@ -2461,7 +2479,7 @@ static void builtin_diff(const char *name_a,
 		if (complete_rewrite &&
 		    (textconv_one || !diff_filespec_is_binary(one)) &&
 		    (textconv_two || !diff_filespec_is_binary(two))) {
-			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
+			emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 			emit_rewrite_diff(name_a, name_b, one, two,
 						textconv_one, textconv_two, o);
@@ -2471,7 +2489,7 @@ static void builtin_diff(const char *name_a,
 	}
 
 	if (o->irreversible_delete && lbl[1][0] == '/') {
-		emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
+		emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 		strbuf_reset(&header);
 		goto free_ab_and_return;
 	} else if (!DIFF_OPT_TST(o, TEXT) &&
@@ -2482,11 +2500,11 @@ static void builtin_diff(const char *name_a,
 		    !DIFF_OPT_TST(o, BINARY)) {
 			if (!oidcmp(&one->oid, &two->oid)) {
 				if (must_show_header)
-					emit_line(o, NULL, NULL, 0, 0,
+					emit_line(o, NULL, NULL, 0, 0, 0,
 						  header.buf, header.len);
 				goto free_ab_and_return;
 			}
-			emit_line(o, NULL, NULL, 0, 0,
+			emit_line(o, NULL, NULL, 0, 0, 0,
 				  header.buf, header.len);
 			emit_line_fmt(o, NULL, NULL, 1,
 				      "Binary files %s and %s differ\n",
@@ -2499,11 +2517,11 @@ static void builtin_diff(const char *name_a,
 		if (mf1.size == mf2.size &&
 		    !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
 			if (must_show_header)
-				emit_line(o, NULL, NULL, 0, 0,
+				emit_line(o, NULL, NULL, 0, 0, 0,
 					  header.buf, header.len);
 			goto free_ab_and_return;
 		}
-		emit_line(o, NULL, NULL, 0, 0,
+		emit_line(o, NULL, NULL, 0, 0, 0,
 			  header.buf, header.len);
 		strbuf_reset(&header);
 		if (DIFF_OPT_TST(o, BINARY))
@@ -2523,7 +2541,7 @@ static void builtin_diff(const char *name_a,
 		const struct userdiff_funcname *pe;
 
 		if (must_show_header) {
-			emit_line(o, NULL, NULL, 0, 0, header.buf, header.len);
+			emit_line(o, NULL, NULL, 0, 0, 0, header.buf, header.len);
 			strbuf_reset(&header);
 		}
 
@@ -2540,6 +2558,7 @@ static void builtin_diff(const char *name_a,
 		ecbdata.label_path = lbl;
 		ecbdata.color_diff = want_color(o->use_color);
 		ecbdata.ws_rule = whitespace_rule(name_b);
+		o->ws_rule = ecbdata.ws_rule;
 		if (ecbdata.ws_rule & WS_BLANK_AT_EOF)
 			check_blank_at_eof(&mf1, &mf2, &ecbdata);
 		ecbdata.opt = o;
@@ -4514,7 +4533,7 @@ static void show_file_mode_name(struct diff_options *opt, const char *newdelete,
 
 	quote_c_style(fs->path, &sb, NULL, 0);
 	strbuf_addch(&sb, '\n');
-	emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+	emit_line(opt, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 	strbuf_release(&sb);
 }
 
@@ -4566,7 +4585,7 @@ static void diff_summary(struct diff_options *opt, struct diff_filepair *p)
 			strbuf_addstr(&sb, " rewrite ");
 			quote_c_style(p->two->path, &sb, NULL, 0);
 			strbuf_addf(&sb, " (%d%%)\n", similarity_index(p));
-			emit_line(opt, NULL, NULL, 1, 0, sb.buf, sb.len);
+			emit_line(opt, NULL, NULL, 1, 0, 0, sb.buf, sb.len);
 		}
 		show_mode_change(opt, p, !p->score);
 		break;
@@ -4873,10 +4892,10 @@ void diff_flush(struct diff_options *options)
 			term[0] = options->line_termination;
 			term[1] = '\0';
 
-			emit_line(options, NULL, NULL, 1, 0, term, !!term[0]);
+			emit_line(options, NULL, NULL, 1, 0, 0, term, !!term[0]);
 			if (options->stat_sep) {
 				/* attach patch instead of inline */
-				emit_line(options, NULL, NULL, 0, 0,
+				emit_line(options, NULL, NULL, 0, 0, 0,
 					  options->stat_sep,
 					  strlen(options->stat_sep));
 			}
diff --git a/diff.h b/diff.h
index 56d8dd036e..85948ed65a 100644
--- a/diff.h
+++ b/diff.h
@@ -186,6 +186,8 @@ struct diff_options {
 	void *output_prefix_data;
 
 	int diff_path_counter;
+
+	unsigned ws_rule;
 };
 
 void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv4 16/17] diff: buffer all output if asked to
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (14 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 15/17] diff.c: emit_line includes whitespace highlighting Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-23  2:40       ` [PATCHv4 17/17] diff.c: color moved lines differently Stefan Beller
  2017-05-27  1:04       ` [PATCHv4 00/17] Diff machine: highlight moved lines Jacob Keller
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

Introduce a new option 'use_buffer' in the struct diff_options which
controls whether all output is buffered up until all output is available.

We'll have a new struct 'diff_line' in diff.h which will be used to buffer
each line.  The diff_line will duplicate the memory of the line to buffer,
as that is easiest to reason about for now. A future patch may decrease
the memory usage by not duplicating all output for buffering: content
lines could be stored as offsets into the file, and for hunk descriptions
such as the similarity score we could store just the relevant number and
reproduce the text later on.

This approach was chosen as a first step because it is quite simple
compared to the alternative with less memory footprint.

emit_line factors out the emission part into emit_diff_line.
Depending on diff_options->use_buffer, the emission is performed
either directly when calling emit_line or after the whole process
is done; i.e. by buffering we add the possibility of a second pass
over the whole output before doing the actual output.
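
The control flow described above can be sketched as follows. This is a
simplified illustration under assumptions, not the code of this patch:
the sketch_* names are hypothetical, and the line record carries only
the text, whereas the real struct also stores colors, sign and state.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a buffered output line. */
struct sketch_line {
	char *text;	/* owned copy of the line */
	size_t len;
};

/* Hypothetical stand-in for the buffering fields in the options struct. */
struct sketch_out {
	FILE *file;
	int use_buffer;
	struct sketch_line *buf;
	size_t nr, alloc;
};

/* Write one line to the output stream right away. */
static void sketch_emit_now(struct sketch_out *o, const char *text, size_t len)
{
	fwrite(text, 1, len, o->file);
}

/* Duplicate the line and append the copy to the buffer. */
static void sketch_append(struct sketch_out *o, const char *text, size_t len)
{
	char *copy;

	if (o->nr == o->alloc) {
		size_t alloc = o->alloc ? 2 * o->alloc : 16;
		struct sketch_line *grown = realloc(o->buf, alloc * sizeof(*grown));
		if (!grown)
			abort();
		o->buf = grown;
		o->alloc = alloc;
	}
	assert(o->nr < o->alloc);
	copy = malloc(len + 1);
	if (!copy)
		abort();
	memcpy(copy, text, len);
	copy[len] = '\0';
	o->buf[o->nr].text = copy;
	o->buf[o->nr].len = len;
	o->nr++;
}

/* Entry point: buffer the line or emit it, depending on use_buffer. */
static void sketch_emit_line(struct sketch_out *o, const char *text, size_t len)
{
	if (o->use_buffer)
		sketch_append(o, text, len);
	else
		sketch_emit_now(o, text, len);
}

/* Emit everything buffered so far, then release the buffer. */
static void sketch_flush(struct sketch_out *o)
{
	size_t i;

	/* A second pass over o->buf could inspect or annotate lines here. */
	for (i = 0; i < o->nr; i++)
		sketch_emit_now(o, o->buf[i].text, o->buf[i].len);
	for (i = 0; i < o->nr; i++)
		free(o->buf[i].text);
	free(o->buf);
	o->buf = NULL;
	o->nr = o->alloc = 0;
}
```

Because every line goes through the single entry point, any output
written around it would appear out of order once buffering is enabled.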

In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.
When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the
main buffer.

Unconditionally enable output via buffer in this patch, as it yields
a great opportunity for testing: if only parts of the output were
buffered and we forgot to buffer other parts, the output would be
reordered and the diff tests would fail. The test suite passes, which
gives confidence that we converted all functions to use emit_line for
output.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++---------------
 diff.h |  41 +++++++++++++++++
 2 files changed, 161 insertions(+), 35 deletions(-)

diff --git a/diff.c b/diff.c
index 514c5facd7..c0b8afa38f 100644
--- a/diff.c
+++ b/diff.c
@@ -516,54 +516,85 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
-
-static void emit_line(struct diff_options *o,
-		      const char *set, const char *reset,
-		      int add_line_prefix, int markup_ws,
-		      int sign, const char *line, int len)
+static void emit_diff_line(struct diff_options *o,
+				     struct diff_line *e)
 {
 	const char *ws;
 	int has_trailing_newline, has_trailing_carriage_return;
+	int len = e->len;
 	FILE *file = o->file;
 
-	if (add_line_prefix)
+	if (e->add_line_prefix)
 		fputs(diff_line_prefix(o), file);
 
-	if (markup_ws) {
+	switch (e->state) {
+	case DIFF_LINE_WS:
 		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
+		if (e->set)
+			fputs(e->set, file);
+		if (e->sign)
+			fputc(e->sign, file);
+		if (e->reset)
+			fputs(e->reset, file);
+		ws_check_emit(e->line, e->len, o->ws_rule,
+			      file, e->set, e->reset, ws);
+		return;
+	case DIFF_LINE_ASIS:
+		has_trailing_newline = (len > 0 && e->line[len-1] == '\n');
+		if (has_trailing_newline)
+			len--;
+		has_trailing_carriage_return = (len > 0 && e->line[len-1] == '\r');
+		if (has_trailing_carriage_return)
+			len--;
 
-		if (set)
-			fputs(set, file);
-		if (sign)
-			fputc(sign, file);
-		if (reset)
-			fputs(reset, file);
-		ws = diff_get_color(o->use_color, DIFF_WHITESPACE);
-		ws_check_emit(line, len, o->ws_rule,
-			      file, set, reset, ws);
+		if (len || e->sign) {
+			if (e->set)
+				fputs(e->set, file);
+			if (e->sign)
+				fputc(e->sign, file);
+			fwrite(e->line, len, 1, file);
+			if (e->reset)
+				fputs(e->reset, file);
+		}
+		if (has_trailing_carriage_return)
+			fputc('\r', file);
+		if (has_trailing_newline)
+			fputc('\n', file);
+		return;
+	case DIFF_LINE_RELOAD_WS_RULE:
+		o->ws_rule = whitespace_rule(e->line);
 		return;
+	default:
+		die("BUG: malformatted buffered patch line: '%d'", e->state);
 	}
+}
 
-	has_trailing_newline = (len > 0 && line[len-1] == '\n');
-	if (has_trailing_newline)
-		len--;
-	has_trailing_carriage_return = (len > 0 && line[len-1] == '\r');
-	if (has_trailing_carriage_return)
-		len--;
+static void append_diff_line(struct diff_options *o,
+				       struct diff_line *e)
+{
+	struct diff_line *f;
+	ALLOC_GROW(o->line_buffer,
+		   o->line_buffer_nr + 1,
+		   o->line_buffer_alloc);
+	f = &o->line_buffer[o->line_buffer_nr++];
 
-	if (len || sign) {
-		if (set)
-			fputs(set, file);
-		if (sign)
-			fputc(sign, file);
-		fwrite(line, len, 1, file);
-		if (reset)
-			fputs(reset, file);
-	}
-	if (has_trailing_carriage_return)
-		fputc('\r', file);
-	if (has_trailing_newline)
-		fputc('\n', file);
+	memcpy(f, e, sizeof(struct diff_line));
+	f->line = e->line ? xmemdupz(e->line, e->len) : NULL;
+}
+
+static void emit_line(struct diff_options *o,
+		      const char *set, const char *reset,
+		      int add_line_prefix, int markup_ws,
+		      int sign, const char *line, int len)
+{
+	struct diff_line e = {set, reset, line,
+		len, sign, add_line_prefix,
+		markup_ws ? DIFF_LINE_WS : DIFF_LINE_ASIS};
+
+	if (o->use_buffer)
+		append_diff_line(o, &e);
+	else
+		emit_diff_line(o, &e);
 }
 
 static void emit_line_fmt(struct diff_options *o,
@@ -1172,6 +1203,18 @@ static void diff_words_flush(struct emit_callback *ecbdata)
 	if (ecbdata->diff_words->minus.text.size ||
 	    ecbdata->diff_words->plus.text.size)
 		diff_words_show(ecbdata->diff_words);
+
+	if (ecbdata->diff_words->opt->line_buffer_nr) {
+		int i;
+		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
+			append_diff_line(ecbdata->opt,
+				&ecbdata->diff_words->opt->line_buffer[i]);
+
+		for (i = 0; i < ecbdata->diff_words->opt->line_buffer_nr; i++)
+			free((void *)ecbdata->diff_words->opt->line_buffer[i].line);
+
+		ecbdata->diff_words->opt->line_buffer_nr = 0;
+	}
 }
 
 static void diff_filespec_load_driver(struct diff_filespec *one)
@@ -1207,6 +1250,11 @@ static void init_diff_words_data(struct emit_callback *ecbdata,
 		xcalloc(1, sizeof(struct diff_words_data));
 	ecbdata->diff_words->type = o->word_diff;
 	ecbdata->diff_words->opt = o;
+
+	o->line_buffer = NULL;
+	o->line_buffer_nr = 0;
+	o->line_buffer_alloc = 0;
+
 	if (!o->word_regex)
 		o->word_regex = userdiff_word_regex(one);
 	if (!o->word_regex)
@@ -1241,6 +1289,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
 {
 	if (ecbdata->diff_words) {
 		diff_words_flush(ecbdata);
+		free (ecbdata->diff_words->opt->line_buffer);
 		free (ecbdata->diff_words->opt);
 		free (ecbdata->diff_words->minus.text.ptr);
 		free (ecbdata->diff_words->minus.orig);
@@ -2579,6 +2628,13 @@ static void builtin_diff(const char *name_a,
 			xecfg.ctxlen = strtoul(v, NULL, 10);
 		if (o->word_diff)
 			init_diff_words_data(&ecbdata, o, one, two);
+		if (o->use_buffer) {
+			struct diff_line e = diff_line_INIT;
+			e.state = DIFF_LINE_RELOAD_WS_RULE;
+			e.line = name_b;
+			e.len = strlen(name_b);
+			append_diff_line(o, &e);
+		}
 		if (xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
 				  &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
@@ -3458,6 +3514,10 @@ void diff_setup(struct diff_options *options)
 		options->a_prefix = "a/";
 		options->b_prefix = "b/";
 	}
+
+	options->line_buffer = NULL;
+	options->line_buffer_nr = 0;
+	options->line_buffer_alloc = 0;
 }
 
 void diff_setup_done(struct diff_options *options)
@@ -4796,11 +4856,36 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 {
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
+	/*
+	 * For testing purposes we want to make sure the diff machinery
+	 * works completely with the buffer. If there is anything emitted
+	 * outside the emit_diff_line, then the order is screwed
+	 * up and the tests will fail.
+	 *
+	 * TODO (later in this series):
+	 * We'll unset this flag in a later patch.
+	 */
+	o->use_buffer = 1;
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		if (check_pair_status(p))
 			diff_flush_patch(p, o);
 	}
+
+	if (o->use_buffer) {
+		for (i = 0; i < o->line_buffer_nr; i++)
+			emit_diff_line(o, &o->line_buffer[i]);
+
+		for (i = 0; i < o->line_buffer_nr; i++)
+			free((void *)o->line_buffer[i].line);
+
+		free(o->line_buffer);
+
+		o->line_buffer = NULL;
+		o->line_buffer_nr = 0;
+		o->line_buffer_alloc = 0;
+	}
 }
 
 void diff_flush(struct diff_options *options)
diff --git a/diff.h b/diff.h
index 85948ed65a..fad1258556 100644
--- a/diff.h
+++ b/diff.h
@@ -115,6 +115,42 @@ enum diff_submodule_format {
 	DIFF_SUBMODULE_INLINE_DIFF
 };
 
+/*
+ * This struct is used when we need to buffer the output of the diff output.
+ *
+ * NEEDSWORK: Instead of storing a copy of the line, add an offset pointer
+ * into the pre/post image file. This pointer could be a union with the
+ * line pointer. By storing an offset into the file instead of the literal line,
+ * we can decrease the memory footprint for the buffered output. At first we
+ * may want to only have indirection for the content lines, but we could also
+ * enhance the state for emitting prefabricated lines, e.g. the similarity
+ * score line or hunk/file headers would only need to store a number or path
+ * and then the output can be constructed later on depending on state.
+ */
+struct diff_line {
+	const char *set;
+	const char *reset;
+	const char *line;
+	int len;
+	int sign;
+	int add_line_prefix;
+	enum {
+		/*
+		 * Emits [lineprefix][set][sign][reset] and then calls
+		 * ws_check_emit which will output "line", marked up
+		 * according to ws_rule.
+		 */
+		DIFF_LINE_WS,
+
+		/* Emits [lineprefix][set][sign] line [reset] */
+		DIFF_LINE_ASIS,
+
+		/* Reloads the ws_rule; line contains the file name */
+		DIFF_LINE_RELOAD_WS_RULE
+	} state;
+};
+#define diff_line_INIT {NULL, NULL, NULL, 0, 0, 0}
+
 struct diff_options {
 	const char *orderfile;
 	const char *pickaxe;
@@ -188,8 +224,13 @@ struct diff_options {
 	int diff_path_counter;
 
 	unsigned ws_rule;
+	int use_buffer;
+
+	struct diff_line *line_buffer;
+	int line_buffer_nr, line_buffer_alloc;
 };
 
+/* Emit [line_prefix] [set] line [reset] */
 void diff_emit_line(struct diff_options *o, const char *set, const char *reset,
 		    const char *line, int len);
 
-- 
2.13.0.18.g7d86cc8ba0


^ permalink raw reply related	[flat|nested] 128+ messages in thread

* [PATCHv4 17/17] diff.c: color moved lines differently
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (15 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 16/17] diff: buffer all output if asked to Stefan Beller
@ 2017-05-23  2:40       ` Stefan Beller
  2017-05-27  1:04       ` [PATCHv4 00/17] Diff machine: highlight moved lines Jacob Keller
  17 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23  2:40 UTC (permalink / raw)
  To: gitster; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger,
	Stefan Beller

When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OM]  -        if (!is_authorized_user())
    [OM]  -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OM]  -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NM]  +        sensitive_stuff(spanning,
    [NM]  +                        multiple,
    [NM]  +                        lines);
    [NM]  +}

Adjacent blocks are colored differently. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:

    [OM]  -void sensitive_stuff(void)
    [OM]  -{
    [OMA] -        if (!is_authorized_user())
    [OMA] -                die("unauthorized");
    [OM]  -        sensitive_stuff(spanning,
    [OM]  -                        multiple,
    [OM]  -                        lines);
    [OMA] -}

           void another_function()
           {
    [del] -        printf("foo");
    [add] +        printf("bar");
           }

    [NM]  +void sensitive_stuff(void)
    [NM]  +{
    [NMA] +        sensitive_stuff(spanning,
    [NMA] +                        multiple,
    [NMA] +                        lines);
    [NM]  +        if (!is_authorized_user())
    [NM]  +                die("unauthorized");
    [NMA] +}

If the moved code is larger, it is easier to hide some permutation in the
code, which is why the alternative coloring is really needed.

As the reviewer's attention should be brought to the places where a
difference is introduced into the moved code, we cannot just have one new
color for all moved code.

First I implemented an alternative design, which would show a moved hunk
in one color, and its boundaries in another color. This idea was error
prone as it inspected each line and its neighboring lines to determine
if the line was (a) moved and (b) deep inside a hunk by having
matching neighboring lines. This is unreliable as we can construct
hunks which have equal neighbors that just exceed the number of lines
inspected. (Think of 'AXYZBXYZCXYZD..' with each letter as a line, that
is permuted to 'AXYZCXYZBXYZD..').

Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then switches color to the alternative color
for the next hunk. By doing this any permutation is recognized and
displayed. That implies that there is no dedicated boundary or
inside-hunk color, but instead we'll have just two colors alternating
for hunks.
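
The alternation can be illustrated with a simplified sketch. This is
not the algorithm of this patch: it assumes a prior step has already
matched each added line to the position of an identical removed line
(src[i], or -1 if unmatched), and it assumes the color resets to the
regular one after a gap of unmoved lines. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_color {
	SKETCH_PLAIN = 0,	/* not part of a move */
	SKETCH_MOVED = 1,	/* regular moved color */
	SKETCH_MOVED_ALT = 2	/* alternative moved color */
};

/*
 * src[i] is the position on the removed side that added line i was
 * matched to, or -1 if the line was not moved. A block continues as
 * long as consecutive added lines map to consecutive removed lines;
 * a new block directly following a moved block flips the color.
 */
static void sketch_color_moved(const int *src, size_t n,
			       enum sketch_color *out)
{
	enum sketch_color cur = SKETCH_MOVED;
	size_t i;

	assert(n == 0 || (src && out));
	for (i = 0; i < n; i++) {
		if (src[i] < 0) {
			out[i] = SKETCH_PLAIN;
			continue;
		}
		if (i == 0 || src[i - 1] < 0)
			cur = SKETCH_MOVED;	/* first block after a gap */
		else if (src[i] != src[i - 1] + 1)
			cur = (cur == SKETCH_MOVED) ?
				SKETCH_MOVED_ALT : SKETCH_MOVED;
		out[i] = cur;
	}
}
```

For the swapped example above, numbering the removed lines 0..7 gives
src = {0, 1, 4, 5, 6, 2, 3, 7} for the added side, and the sketch
assigns the colors in the same pattern as the [NM]/[NMA] labels shown
there.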

It would be a bit more UX friendly if the two corresponding hunks
(of added and deleted lines) for one move would get the same color id.
(Both get "regular moved" or "alternative moved"). This problem is
deferred to a later patch for now.

A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we did
so, only line moves within a repository boundary would be marked up.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

# Conflicts:
#	diff.c
---
 Documentation/config.txt   |  14 ++-
 diff.c                     | 275 +++++++++++++++++++++++++++++++++++++++++++--
 diff.h                     |   9 +-
 t/t4015-diff-whitespace.sh | 267 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 552 insertions(+), 13 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d51..902d017c3b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,14 +1051,24 @@ This does not affect linkgit:git-format-patch[1] or the
 'git-diff-{asterisk}' plumbing commands.  Can be overridden on the
 command line with the `--color[=<when>]` option.
 
+color.moved::
+	A boolean value, whether a diff should color moved lines
+	differently. The moved lines are searched for in the diff only.
+	Duplicated lines from somewhere in the project that are not
+	part of the diff are not colored as moved.
+	Defaults to false.
+
 color.diff.<slot>::
 	Use customized color for diff colorization.  `<slot>` specifies
 	which part of the patch to use the specified color, and is one
 	of `context` (context text - `plain` is a historical synonym),
 	`meta` (metainformation), `frag`
 	(hunk header), 'func' (function in hunk header), `old` (removed lines),
-	`new` (added lines), `commit` (commit headers), or `whitespace`
-	(highlighting whitespace errors).
+	`new` (added lines), `commit` (commit headers), `whitespace`
+	(highlighting whitespace errors), `oldMoved` (removed lines that
+	reappear), `newMoved` (added lines that were removed elsewhere),
+	`oldMovedAlternative` and `newMovedAlternative` (as a fallback to
+	cover adjacent blocks of moved code)
 
 color.decorate.<slot>::
 	Use customized color for 'git log --decorate' output.  `<slot>` is one
diff --git a/diff.c b/diff.c
index c0b8afa38f..23e70d348e 100644
--- a/diff.c
+++ b/diff.c
@@ -31,6 +31,7 @@ static int diff_indent_heuristic; /* experimental */
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
+static int diff_color_moved_default;
 static int diff_context_default = 3;
 static int diff_interhunk_context_default;
 static const char *diff_word_regex_cfg;
@@ -55,6 +56,10 @@ static char diff_colors[][COLOR_MAXLEN] = {
 	GIT_COLOR_YELLOW,	/* COMMIT */
 	GIT_COLOR_BG_RED,	/* WHITESPACE */
 	GIT_COLOR_NORMAL,	/* FUNCINFO */
+	GIT_COLOR_BOLD_RED,	/* OLD_MOVED_A */
+	GIT_COLOR_BG_RED,	/* OLD_MOVED_B */
+	GIT_COLOR_BOLD_GREEN,	/* NEW_MOVED_A */
+	GIT_COLOR_BG_GREEN,	/* NEW_MOVED_B */
 };
 
 static NORETURN void die_want_option(const char *option_name)
@@ -80,6 +85,14 @@ static int parse_diff_color_slot(const char *var)
 		return DIFF_WHITESPACE;
 	if (!strcasecmp(var, "func"))
 		return DIFF_FUNCINFO;
+	if (!strcasecmp(var, "oldmoved"))
+		return DIFF_FILE_OLD_MOVED;
+	if (!strcasecmp(var, "oldmovedalternative"))
+		return DIFF_FILE_OLD_MOVED_ALT;
+	if (!strcasecmp(var, "newmoved"))
+		return DIFF_FILE_NEW_MOVED;
+	if (!strcasecmp(var, "newmovedalternative"))
+		return DIFF_FILE_NEW_MOVED_ALT;
 	return -1;
 }
 
@@ -234,6 +247,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_use_color_default = git_config_colorbool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "color.moved")) {
+		diff_color_moved_default = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.context")) {
 		diff_context_default = git_config_int(var, value);
 		if (diff_context_default < 0)
@@ -354,6 +371,88 @@ int git_diff_basic_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+struct moved_entry {
+	struct hashmap_entry ent;
+	const struct diff_line *line;
+	struct moved_entry *next_line;
+};
+
+static void get_ws_cleaned_string(const struct diff_line *l,
+				  struct strbuf *out)
+{
+	int i;
+	for (i = 0; i < l->len; i++) {
+		if (isspace(l->line[i]))
+			continue;
+		strbuf_addch(out, l->line[i]);
+	}
+}
+
+static int diff_line_cmp_no_ws(const struct diff_line *a,
+					 const struct diff_line *b,
+					 const void *keydata)
+{
+	int ret;
+	struct strbuf sba = STRBUF_INIT;
+	struct strbuf sbb = STRBUF_INIT;
+
+	get_ws_cleaned_string(a, &sba);
+	get_ws_cleaned_string(b, &sbb);
+	ret = sba.len != sbb.len || strncmp(sba.buf, sbb.buf, sba.len);
+
+	strbuf_release(&sba);
+	strbuf_release(&sbb);
+	return ret;
+}
+
+static int diff_line_cmp(const struct diff_line *a,
+				   const struct diff_line *b,
+				   const void *keydata)
+{
+	return a->len != b->len || strncmp(a->line, b->line, a->len);
+}
+
+static int moved_entry_cmp(const struct moved_entry *a,
+			   const struct moved_entry *b,
+			   const void *keydata)
+{
+	return diff_line_cmp(a->line, b->line, keydata);
+}
+
+static int moved_entry_cmp_no_ws(const struct moved_entry *a,
+				 const struct moved_entry *b,
+				 const void *keydata)
+{
+	return diff_line_cmp_no_ws(a->line, b->line, keydata);
+}
+
+static unsigned get_line_hash(struct diff_line *line, unsigned ignore_ws)
+{
+	static struct strbuf sb = STRBUF_INIT;
+
+	if (ignore_ws) {
+		strbuf_reset(&sb);
+		get_ws_cleaned_string(line, &sb);
+		return memhash(sb.buf, sb.len);
+	} else {
+		return memhash(line->line, line->len);
+	}
+}
+
+static struct moved_entry *prepare_entry(struct diff_options *o,
+					 int line_no)
+{
+	struct moved_entry *ret = xmalloc(sizeof(*ret));
+	unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+	struct diff_line *l = &o->line_buffer[line_no];
+
+	ret->ent.hash = get_line_hash(l, ignore_ws);
+	ret->line = l;
+	ret->next_line = NULL;
+
+	return ret;
+}
+
 static char *quote_two(const char *one, const char *two)
 {
 	int need_one = quote_c_style(one, NULL, NULL, 1);
@@ -516,6 +615,141 @@ static void check_blank_at_eof(mmfile_t *mf1, mmfile_t *mf2,
 	ecbdata->blank_at_eof_in_postimage = (at - l2) + 1;
 }
 
+static void add_lines_to_move_detection(struct diff_options *o,
+					struct hashmap *add_lines,
+					struct hashmap *del_lines)
+{
+	struct moved_entry *prev_line = NULL;
+
+	int n;
+	for (n = 0; n < o->line_buffer_nr; n++) {
+		int sign = 0;
+		struct hashmap *hm;
+		struct moved_entry *key;
+
+		switch (o->line_buffer[n].sign) {
+		case '+':
+			sign = '+';
+			hm = add_lines;
+			break;
+		case '-':
+			sign = '-';
+			hm = del_lines;
+			break;
+		case ' ':
+		default:
+			prev_line = NULL;
+			continue;
+		}
+
+		key = prepare_entry(o, n);
+		if (prev_line &&
+		    prev_line->line->sign == sign)
+			prev_line->next_line = key;
+
+		hashmap_add(hm, key);
+		prev_line = key;
+	}
+}
+
+static void mark_color_as_moved(struct diff_options *o,
+				struct hashmap *add_lines,
+				struct hashmap *del_lines)
+{
+	struct moved_entry **pmb = NULL; /* potentially moved blocks */
+	int pmb_nr = 0, pmb_alloc = 0;
+	int use_alt_color = 0;
+	int n;
+
+	for (n = 0; n < o->line_buffer_nr; n++) {
+		struct hashmap *hm = NULL;
+		struct moved_entry *key;
+		struct moved_entry *match = NULL;
+		struct diff_line *l = &o->line_buffer[n];
+		int i, lp, rp;
+
+		switch (l->sign) {
+		case '+':
+			hm = del_lines;
+			break;
+		case '-':
+			hm = add_lines;
+			break;
+		default:
+			use_alt_color = 0;
+			pmb_nr = 0; /* no running sets */
+			continue;
+		}
+
+		/* Check for any match to color it as a move. */
+		key = prepare_entry(o, n);
+		match = hashmap_get(hm, key, o);
+		free(key);
+		if (!match)
+			continue;
+
+		/* Check any potential block runs, advance each or nullify */
+		for (i = 0; i < pmb_nr; i++) {
+			struct moved_entry *p = pmb[i];
+			struct moved_entry *pnext = (p && p->next_line) ?
+					p->next_line : NULL;
+			if (pnext &&
+			    !diff_line_cmp(pnext->line, l, o)) {
+				pmb[i] = p->next_line;
+			} else {
+				pmb[i] = NULL;
+			}
+		}
+
+		/* Shrink the set to the remaining runs */
+		for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
+			while (lp < pmb_nr && pmb[lp])
+				lp++;
+			/* lp points at the first NULL now */
+
+			while (rp > -1 && !pmb[rp])
+				rp--;
+			/* rp points at the last non-NULL */
+
+			if (lp < pmb_nr && rp > -1 && lp < rp) {
+				pmb[lp] = pmb[rp];
+				pmb[rp] = NULL;
+				rp--;
+				lp++;
+			}
+		}
+
+		if (rp > -1) {
+			/* Remember the number of running sets */
+			pmb_nr = rp + 1;
+		} else {
+			/* Toggle color */
+			use_alt_color = (use_alt_color + 1) % 2;
+
+			/* Build up a new set */
+			pmb_nr = 0;
+			for (; match; match = hashmap_get_next(hm, match)) {
+				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+				pmb[pmb_nr++] = match;
+			}
+		}
+
+		switch (l->sign) {
+		case '+':
+			l->set = diff_get_color_opt(o,
+				DIFF_FILE_NEW_MOVED + use_alt_color);
+			break;
+		case '-':
+			l->set = diff_get_color_opt(o,
+				DIFF_FILE_OLD_MOVED + use_alt_color);
+			break;
+		default:
+			die("BUG: we should have continued earlier?");
+		}
+	}
+	free(pmb);
+}
+
 static void emit_diff_line(struct diff_options *o,
 				     struct diff_line *e)
 {
@@ -3518,6 +3752,8 @@ void diff_setup(struct diff_options *options)
 	options->line_buffer = NULL;
 	options->line_buffer_nr = 0;
 	options->line_buffer_alloc = 0;
+
+	options->color_moved = diff_color_moved_default;
 }
 
 void diff_setup_done(struct diff_options *options)
@@ -3627,6 +3863,9 @@ void diff_setup_done(struct diff_options *options)
 
 	if (DIFF_OPT_TST(options, FOLLOW_RENAMES) && options->pathspec.nr != 1)
 		die(_("--follow requires exactly one pathspec"));
+
+	if (!options->use_color || external_diff())
+		options->color_moved = 0;
 }
 
 static int opt_arg(const char *arg, int arg_short, const char *arg_long, int *val)
@@ -4051,6 +4290,10 @@ int diff_opt_parse(struct diff_options *options,
 	}
 	else if (!strcmp(arg, "--no-color"))
 		options->use_color = 0;
+	else if (!strcmp(arg, "--color-moved"))
+		options->color_moved = 1;
+	else if (!strcmp(arg, "--no-color-moved"))
+		options->color_moved = 0;
 	else if (!strcmp(arg, "--color-words")) {
 		options->use_color = 1;
 		options->word_diff = DIFF_WORDS_COLOR;
@@ -4856,16 +5099,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 {
 	int i;
 	struct diff_queue_struct *q = &diff_queued_diff;
-	/*
-	 * For testing purposes we want to make sure the diff machinery
-	 * works completely with the buffer. If there is anything emitted
-	 * outside the emit_diff_line, then the order is screwed
-	 * up and the tests will fail.
-	 *
-	 * TODO (later in this series):
-	 * We'll unset this flag in a later patch.
-	 */
-	o->use_buffer = 1;
+
+	if (o->color_moved)
+		o->use_buffer = 1;
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
@@ -4874,6 +5110,24 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 	}
 
 	if (o->use_buffer) {
+		if (o->color_moved) {
+			struct hashmap add_lines, del_lines;
+			unsigned ignore_ws = DIFF_XDL_TST(o, IGNORE_WHITESPACE);
+
+			hashmap_init(&del_lines, ignore_ws ?
+				(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+				(hashmap_cmp_fn)moved_entry_cmp, 0);
+			hashmap_init(&add_lines, ignore_ws ?
+				(hashmap_cmp_fn)moved_entry_cmp_no_ws :
+				(hashmap_cmp_fn)moved_entry_cmp, 0);
+
+			add_lines_to_move_detection(o, &add_lines, &del_lines);
+			mark_color_as_moved(o, &add_lines, &del_lines);
+
+			hashmap_free(&add_lines, 0);
+			hashmap_free(&del_lines, 0);
+		}
+
 		for (i = 0; i < o->line_buffer_nr; i++)
 			emit_diff_line(o, &o->line_buffer[i]);
 
@@ -4962,6 +5216,7 @@ void diff_flush(struct diff_options *options)
 		if (!options->file)
 			die_errno("Could not open /dev/null");
 		options->close_file = 1;
+		options->color_moved = 0;
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
 			if (check_pair_status(p))
diff --git a/diff.h b/diff.h
index fad1258556..445259ebf7 100644
--- a/diff.h
+++ b/diff.h
@@ -7,6 +7,7 @@
 #include "tree-walk.h"
 #include "pathspec.h"
 #include "object.h"
+#include "hashmap.h"
 
 struct rev_info;
 struct diff_options;
@@ -228,6 +229,8 @@ struct diff_options {
 
 	struct diff_line *line_buffer;
 	int line_buffer_nr, line_buffer_alloc;
+
+	int color_moved;
 };
 
 /* Emit [line_prefix] [set] line [reset] */
@@ -243,7 +246,11 @@ enum color_diff {
 	DIFF_FILE_NEW = 5,
 	DIFF_COMMIT = 6,
 	DIFF_WHITESPACE = 7,
-	DIFF_FUNCINFO = 8
+	DIFF_FUNCINFO = 8,
+	DIFF_FILE_OLD_MOVED = 9,
+	DIFF_FILE_OLD_MOVED_ALT = 10,
+	DIFF_FILE_NEW_MOVED = 11,
+	DIFF_FILE_NEW_MOVED_ALT = 12
 };
 const char *diff_get_color(int diff_use_color, enum color_diff ix);
 #define diff_get_color_opt(o, ix) \
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 289806d0c7..0e92bf94bf 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -972,4 +972,271 @@ test_expect_success 'option overrides diff.wsErrorHighlight' '
 
 '
 
+test_expect_success 'detect moved code, complete file' '
+	git reset --hard &&
+	cat <<-\EOF >test.c &&
+	#include<stdio.h>
+	main()
+	{
+	printf("Hello World");
+	}
+	EOF
+	git add test.c &&
+	git commit -m "add main function" &&
+	git mv test.c main.c &&
+	git diff HEAD --color-moved --no-renames | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>new file mode 100644<RESET>
+	<BOLD>index 0000000..a986c57<RESET>
+	<BOLD>--- /dev/null<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -0,0 +1,5 @@<RESET>
+	<BGREEN>+<RESET><BGREEN>#include<stdio.h><RESET>
+	<BGREEN>+<RESET><BGREEN>main()<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BGREEN>+<RESET><BGREEN>printf("Hello World");<RESET>
+	<BGREEN>+<RESET><BGREEN>}<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>deleted file mode 100644<RESET>
+	<BOLD>index a986c57..0000000<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ /dev/null<RESET>
+	<CYAN>@@ -1,5 +0,0 @@<RESET>
+	<BRED>-#include<stdio.h><RESET>
+	<BRED>-main()<RESET>
+	<BRED>-{<RESET>
+	<BRED>-printf("Hello World");<RESET>
+	<BRED>-}<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'detect moved code, inside file' '
+	git reset --hard &&
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			if (!u->is_allowed_foo)
+				return;
+			foo(u);
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git add main.c test.c &&
+	git commit -m "add main and test file" &&
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			if (!u->is_allowed_foo)
+				return;
+			foo(u);
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>index 27a619c..7cf9336 100644<RESET>
+	<BOLD>--- a/main.c<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
+	 printf("World\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BRED>-int secure_foo(struct user *u)<RESET>
+	<BRED>-{<RESET>
+	<BRED>-if (!u->is_allowed_foo)<RESET>
+	<BRED>-return;<RESET>
+	<BRED>-foo(u);<RESET>
+	<BRED>-}<RESET>
+	<BRED>-<RESET>
+	 int main()<RESET>
+	 {<RESET>
+	 foo();<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>index 1dc1d85..e34eb69 100644<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ b/test.c<RESET>
+	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
+	 printf("Hello World, but different\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
+	<BGREEN>+<RESET><BGREEN>return;<RESET>
+	<BGREEN>+<RESET><BGREEN>foo(u);<RESET>
+	<BGREEN>+<RESET><BGREEN>}<RESET>
+	<BGREEN>+<RESET>
+	 int another_function()<RESET>
+	 {<RESET>
+	 bar();<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'detect permutations inside moved code' '
+	# reusing the move example from last test:
+	cat <<-\EOF >main.c &&
+		#include<stdio.h>
+		int stuff()
+		{
+			printf("Hello ");
+			printf("World\n");
+		}
+
+		int main()
+		{
+			foo();
+		}
+	EOF
+	cat <<-\EOF >test.c &&
+		#include<stdio.h>
+		int bar()
+		{
+			printf("Hello World, but different\n");
+		}
+
+		int secure_foo(struct user *u)
+		{
+			foo(u);
+			if (!u->is_allowed_foo)
+				return;
+		}
+
+		int another_function()
+		{
+			bar();
+		}
+	EOF
+	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/main.c b/main.c<RESET>
+	<BOLD>index 27a619c..7cf9336 100644<RESET>
+	<BOLD>--- a/main.c<RESET>
+	<BOLD>+++ b/main.c<RESET>
+	<CYAN>@@ -5,13 +5,6 @@<RESET> <RESET>printf("Hello ");<RESET>
+	 printf("World\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BRED>-int secure_foo(struct user *u)<RESET>
+	<BRED>-{<RESET>
+	<BOLD;RED>-if (!u->is_allowed_foo)<RESET>
+	<BOLD;RED>-return;<RESET>
+	<BRED>-foo(u);<RESET>
+	<BOLD;RED>-}<RESET>
+	<BOLD;RED>-<RESET>
+	 int main()<RESET>
+	 {<RESET>
+	 foo();<RESET>
+	<BOLD>diff --git a/test.c b/test.c<RESET>
+	<BOLD>index 1dc1d85..2bedec9 100644<RESET>
+	<BOLD>--- a/test.c<RESET>
+	<BOLD>+++ b/test.c<RESET>
+	<CYAN>@@ -4,6 +4,13 @@<RESET> <RESET>int bar()<RESET>
+	 printf("Hello World, but different\n");<RESET>
+	 }<RESET>
+	 <RESET>
+	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
+	<BGREEN>+<RESET><BGREEN>{<RESET>
+	<BOLD;GREEN>+<RESET><BOLD;GREEN>foo(u);<RESET>
+	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
+	<BGREEN>+<RESET><BGREEN>return;<RESET>
+	<BOLD;GREEN>+<RESET><BOLD;GREEN>}<RESET>
+	<BOLD;GREEN>+<RESET>
+	 int another_function()<RESET>
+	 {<RESET>
+	 bar();<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'move detection does not mess up colored words' '
+	cat <<-\EOF >text.txt &&
+	Lorem Ipsum is simply dummy text of the printing and typesetting industry.
+	EOF
+	git add text.txt &&
+	git commit -a -m "clean state" &&
+	cat <<-\EOF >text.txt &&
+	simply Lorem Ipsum dummy is text of the typesetting and printing industry.
+	EOF
+	git diff --color-moved --word-diff >actual &&
+	git diff --word-diff >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'move detection with submodules' '
+	test_create_repo bananas &&
+	echo ripe >bananas/recipe &&
+	git -C bananas add recipe &&
+	test_commit fruit &&
+	test_commit -C bananas recipe &&
+	git submodule add ./bananas &&
+	git add bananas &&
+	git commit -a -m "bananas are like a heavy library?" &&
+	echo foul >bananas/recipe &&
+	echo ripe >fruit.t &&
+
+	git diff --submodule=diff --color-moved >actual &&
+
+	# no move detection as the moved line is across repository boundaries.
+	test_decode_color <actual >decoded_actual &&
+	! grep BGREEN decoded_actual &&
+	! grep BRED decoded_actual &&
+
+	# nor did we mess with it another way
+	git diff --submodule=diff | test_decode_color >expect &&
+	test_cmp expect decoded_actual
+'
+
 test_done
-- 
2.13.0.18.g7d86cc8ba0



* Re: [PATCHv4 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-23  2:40       ` [PATCHv4 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
@ 2017-05-23  5:59         ` Junio C Hamano
  2017-05-23 18:14           ` Stefan Beller
  0 siblings, 1 reply; 128+ messages in thread
From: Junio C Hamano @ 2017-05-23  5:59 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, bmwill, jrnieder, jonathantanmy, peff, mhagger

Stefan Beller <sbeller@google.com> writes:

> diff --git a/submodule.c b/submodule.c
> index d3299e29c0..428c996c97 100644
> --- a/submodule.c
> +++ b/submodule.c
> ...
> @@ -547,15 +543,16 @@ void show_submodule_inline_diff(FILE *f, const char *path,
>  	if (right)
>  		new = two;
>  
> -	fflush(f);
>  	cp.git_cmd = 1;
>  	cp.dir = path;
> -	cp.out = dup(fileno(f));
> +	cp.out = -1;
>  	cp.no_stdin = 1;
>  
>  	/* TODO: other options may need to be passed here. */
>  	argv_array_push(&cp.args, "diff");
> -	argv_array_pushf(&cp.args, "--line-prefix=%s", line_prefix);
> +	if (o->use_color)
> +		argv_array_push(&cp.args, "--color=always");
> +	argv_array_pushf(&cp.args, "--line-prefix=%s", diff_line_prefix(o));

This makes me wonder if we also need to explicitly decline coloring
when o->use_color is not set.  After all, even if configuration in
the submodule's config file says diff.color=never, we will enable
the color with this codepath (because the user explicitly asked to
use the color in the top-level), so we should do the same for the
opposite case where the config says yes/auto if the user said no at
the top-level, no?


* Re: [PATCHv4 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt
  2017-05-23  5:59         ` Junio C Hamano
@ 2017-05-23 18:14           ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-23 18:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git@vger.kernel.org, Brandon Williams, Jonathan Nieder,
	Jonathan Tan, Jeff King, Michael Haggerty

On Mon, May 22, 2017 at 10:59 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> diff --git a/submodule.c b/submodule.c
>> index d3299e29c0..428c996c97 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> ...
>> @@ -547,15 +543,16 @@ void show_submodule_inline_diff(FILE *f, const char *path,
>>       if (right)
>>               new = two;
>>
>> -     fflush(f);
>>       cp.git_cmd = 1;
>>       cp.dir = path;
>> -     cp.out = dup(fileno(f));
>> +     cp.out = -1;
>>       cp.no_stdin = 1;
>>
>>       /* TODO: other options may need to be passed here. */
>>       argv_array_push(&cp.args, "diff");
>> -     argv_array_pushf(&cp.args, "--line-prefix=%s", line_prefix);
>> +     if (o->use_color)
>> +             argv_array_push(&cp.args, "--color=always");
>> +     argv_array_pushf(&cp.args, "--line-prefix=%s", diff_line_prefix(o));
>
> This makes me wonder if we also need to explicitly decline coloring
> when o->use_color is not set.  After all, even if configuration in
> the submodule's config file says diff.color=never, we will enable
> the color with this codepath (because the user explicitly asked to
> use the color in the top-level), so we should do the same for the
> opposite case where the config says yes/auto if the user said no at
> the top-level, no?

That makes sense, so instead we'd do

             argv_array_pushf(&cp.args, "--color=%s",
                              o->use_color ? "always" : "never");

to override the submodule config in all cases.

However, that changes the current behavior.

You could imagine that you want to see the superproject colored
and the submodule non-colored to easily spot that it is a submodule change.
Currently this can be made to work via setting color=never in the
submodule and then run the diff from the superproject.

What we really want here is a switch that influences the automatic detection
and says: pretend "dup(fileno(f));" was your stdout, now run your auto-detection
to decide for yourself.
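
A rough sketch of what such a switch would boil down to, outside of
git. The names toy_color_mode and toy_want_color() are made up for
illustration, and git's real auto-detection takes more into account
than isatty(), so this only shows the idea of running the detection
against a caller-supplied file descriptor instead of stdout:

```c
#include <assert.h>
#include <unistd.h>

enum toy_color_mode {
	TOY_COLOR_NEVER,
	TOY_COLOR_ALWAYS,
	TOY_COLOR_AUTO
};

/*
 * Decide whether to color output that will end up on 'fd'.
 * In auto mode the decision is made for 'fd', not for stdout.
 */
static int toy_want_color(enum toy_color_mode mode, int fd)
{
	assert(fd >= 0);

	switch (mode) {
	case TOY_COLOR_NEVER:
		return 0;
	case TOY_COLOR_ALWAYS:
		return 1;
	case TOY_COLOR_AUTO:
	default:
		/* isatty() returns 1 only if fd refers to a terminal */
		return isatty(fd) == 1;
	}
}
```

With something like this the submodule process could keep honoring its
own never/always/auto setting while still being told where its output
really goes.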

I am not sure if it is worth the effort to fix this hypothetical situation, though.

Thanks,
Stefan


* Re: [PATCHv4 00/17] Diff machine: highlight moved lines.
  2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
                         ` (16 preceding siblings ...)
  2017-05-23  2:40       ` [PATCHv4 17/17] diff.c: color moved lines differently Stefan Beller
@ 2017-05-27  1:04       ` Jacob Keller
  2017-05-30 21:38         ` Stefan Beller
  17 siblings, 1 reply; 128+ messages in thread
From: Jacob Keller @ 2017-05-27  1:04 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Git mailing list, Brandon Williams,
	Jonathan Nieder, Jonathan Tan, Jeff King, Michael Haggerty

On Mon, May 22, 2017 at 7:40 PM, Stefan Beller <sbeller@google.com> wrote:
> v4:
> * interdiff to v3 (what is currently origin/sb/diff-color-move) below.
> * renamed the "buffered_patch_line" to "diff_line". Originally I planned
>   to not carry the "line" part as it can be a piece of a line as well.
>   But for the intended functionality it is best to keep the name.
>   If we'd want to add more functionality to say have a move detection
>   for words as well, we'd rename the struct to have a better name then.
>   For now diff_line is the best. (Thanks Jonathan Nieder!)
> * tests to demonstrate it doesn't mess with --color-words as well as
>   submodules. (Thanks Jonathan Tan!)
> * added in the statics (Thanks Ramsay!)
> * smaller scope for the hashmaps (Thanks Jonathan Tan!)
> * some commit messages were updated, prior patch 4-7 is squashed into one
>   (Thanks Jonathan Tan!)
> * the tests added revealed an actual fault: now that the submodule process
>   is not attached to a dupe of our stdout, it would stop coloring the
>   output. We need to pass on use-color explicitly.
> * updated the NEEDSWORK comment in the second last patch.
>
> Thanks for bearing,
> Stefan
>

One thing to note when I was playing around with what's on pu right
now, I noticed that the oldMovedAlternative and newMovedAlternative
are the first moved colors to be used if there is only one move. (Ie:
a simple case of literally one section moved) This is a bit weird that
the alternative colors are used before the "main" colors. I would have
thought it would be the other way.

I noticed this because the default colors do not work well for my
terminal color scheme and I had to configure but realized that I
needed to configure the alternative ones to make a difference in the
simple diff I was viewing.

Thanks,
Jake


* Re: [PATCHv4 00/17] Diff machine: highlight moved lines.
  2017-05-27  1:04       ` [PATCHv4 00/17] Diff machine: highlight moved lines Jacob Keller
@ 2017-05-30 21:38         ` Stefan Beller
  0 siblings, 0 replies; 128+ messages in thread
From: Stefan Beller @ 2017-05-30 21:38 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Junio C Hamano, Git mailing list, Brandon Williams,
	Jonathan Nieder, Jonathan Tan, Jeff King, Michael Haggerty

On Fri, May 26, 2017 at 6:04 PM, Jacob Keller <jacob.keller@gmail.com> wrote:
> On Mon, May 22, 2017 at 7:40 PM, Stefan Beller <sbeller@google.com> wrote:
>> v4:
>> * interdiff to v3 (what is currently origin/sb/diff-color-move) below.
>> * renamed the "buffered_patch_line" to "diff_line". Originally I planned
>>   to not carry the "line" part as it can be a piece of a line as well.
>>   But for the intended functionality it is best to keep the name.
>>   If we'd want to add more functionality to say have a move detection
>>   for words as well, we'd rename the struct to have a better name then.
>>   For now diff_line is the best. (Thanks Jonathan Nieder!)
>> * tests to demonstrate it doesn't mess with --color-words as well as
>>   submodules. (Thanks Jonathan Tan!)
>> * added in the statics (Thanks Ramsay!)
>> * smaller scope for the hashmaps (Thanks Jonathan Tan!)
>> * some commit messages were updated, prior patch 4-7 is squashed into one
>>   (Thanks Jonathan Tan!)
>> * the tests added revealed an actual fault: now that the submodule process
>>   is not attached to a dupe of our stdout, it would stop coloring the
>>   output. We need to pass on use-color explicitly.
>> * updated the NEEDSWORK comment in the second last patch.
>>
>> Thanks for bearing,
>> Stefan
>>
>
> One thing to note when I was playing around with what's on pu right
> now, I noticed that the oldMovedAlternative and newMovedAlternative
> are the first moved colors to be used if there is only one move. (Ie:
> a simple case of literally one section moved) This is a bit weird that
> the alternative colors are used before the "main" colors. I would have
> thought it would be the other way.

While pu is not up-to-date, I double checked with the most recent
implementation and that is no longer the case.

> I noticed this because the default colors do not work well for my
> terminal color scheme and I had to configure but realized that I
> needed to configure the alternative ones to make a difference in the
> simple diff I was viewing.

The v4 that you tested, is the "alternate" scheme in the resend
https://public-inbox.org/git/20170527001820.25214-2-sbeller@google.com/

Thanks,
Stefan


end of thread, other threads:[~2017-05-30 21:38 UTC | newest]

Thread overview: 128+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-14  4:00 [RFC PATCH 00/19] Diff machine: highlight moved lines Stefan Beller
2017-05-14  4:00 ` [PATCH 01/19] diff: readability fix Stefan Beller
2017-05-14  4:01 ` [PATCH 02/19] diff: move line ending check into emit_hunk_header Stefan Beller
2017-05-15  6:48   ` Junio C Hamano
2017-05-15 16:13     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 03/19] diff.c: drop 'nofirst' from emit_line_0 Stefan Beller
2017-05-15 18:26   ` Jonathan Tan
2017-05-15 18:33     ` Stefan Beller
2017-05-16 16:05       ` Jonathan Tan
2017-05-15 19:22   ` Brandon Williams
2017-05-15 19:35     ` Stefan Beller
2017-05-15 19:45       ` Brandon Williams
2017-05-14  4:01 ` [PATCH 04/19] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
2017-05-14  4:01 ` [PATCH 05/19] diff.c: emit_line_0 can handle no color setting Stefan Beller
2017-05-15 18:31   ` Jonathan Tan
2017-05-15 22:11     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 06/19] diff: add emit_line_fmt Stefan Beller
2017-05-15 19:31   ` Brandon Williams
2017-05-14  4:01 ` [PATCH 07/19] diff.c: convert fn_out_consume to use emit_line_* Stefan Beller
2017-05-16  1:00   ` Junio C Hamano
2017-05-16  1:05     ` Junio C Hamano
2017-05-16 16:23       ` Stefan Beller
2017-05-14  4:01 ` [PATCH 08/19] diff.c: convert builtin_diff " Stefan Beller
2017-05-15 18:42   ` Jonathan Tan
2017-05-14  4:01 ` [PATCH 09/19] diff.c: convert emit_rewrite_diff " Stefan Beller
2017-05-14  4:01 ` [PATCH 10/19] diff.c: convert emit_rewrite_lines " Stefan Beller
2017-05-15 19:09   ` Jonathan Tan
2017-05-15 19:31     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 11/19] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
2017-05-14  4:01 ` [PATCH 12/19] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
2017-05-14  4:01 ` [PATCH 13/19] diff.c: convert show_stats " Stefan Beller
2017-05-14  4:01 ` [PATCH 14/19] diff.c: convert word diffing " Stefan Beller
2017-05-15 22:40   ` Jonathan Tan
2017-05-15 23:12     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 15/19] diff.c: convert diff_flush " Stefan Beller
2017-05-15 20:21   ` Jonathan Tan
2017-05-15 22:08     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 16/19] diff.c: convert diff_summary " Stefan Beller
2017-05-14  4:01 ` [PATCH 17/19] diff.c: factor out emit_line_ws for coloring whitespaces Stefan Beller
2017-05-14  4:01 ` [PATCH 18/19] diff: buffer all output if asked to Stefan Beller
2017-05-14  4:06   ` Jeff King
2017-05-14  4:25     ` Stefan Beller
2017-05-16  4:14   ` Jonathan Tan
2017-05-16 16:42     ` Stefan Beller
2017-05-14  4:01 ` [PATCH 19/19] diff.c: color moved lines differently Stefan Beller
2017-05-15 22:42   ` Brandon Williams
2017-05-16  4:34   ` Jonathan Tan
2017-05-16 12:31   ` Jeff King
2017-05-15 12:43 ` [RFC PATCH 00/19] Diff machine: highlight moved lines Junio C Hamano
2017-05-15 16:33   ` Stefan Beller
2017-05-17  2:58 ` [PATCHv2 00/20] " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 01/20] diff: readability fix Stefan Beller
2017-05-17  2:58   ` [PATCHv2 02/20] diff: move line ending check into emit_hunk_header Stefan Beller
2017-05-17  2:58   ` [PATCHv2 03/20] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
2017-05-17  2:58   ` [PATCHv2 04/20] diff.c: teach emit_line_0 to accept sign parameter Stefan Beller
2017-05-17  2:58   ` [PATCHv2 05/20] diff.c: emit_line_0 can handle no color setting Stefan Beller
2017-05-17  2:58   ` [PATCHv2 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix Stefan Beller
2017-05-17  2:58   ` [PATCHv2 07/20] diff.c: inline emit_line_0 into emit_line Stefan Beller
2017-05-17  2:58   ` [PATCHv2 08/20] diff.c: convert fn_out_consume to use emit_line Stefan Beller
2017-05-17  2:58   ` [PATCHv2 09/20] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
2017-05-17  2:58   ` [PATCHv2 10/20] diff.c: convert emit_rewrite_diff " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
2017-05-17  5:03     ` Junio C Hamano
2017-05-17 21:16       ` Stefan Beller
2017-05-18  3:35     ` Junio C Hamano
2017-05-17  2:58   ` [PATCHv2 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
2017-05-17  5:19     ` Junio C Hamano
2017-05-17 21:05       ` Stefan Beller
2017-05-18  3:25         ` Junio C Hamano
2017-05-18 17:12           ` Stefan Beller
2017-05-20  4:50             ` Junio C Hamano
2017-05-20 22:00               ` Stefan Beller
2017-05-17  2:58   ` [PATCHv2 13/20] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
2017-05-17  2:58   ` [PATCHv2 14/20] diff.c: convert show_stats " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 15/20] diff.c: convert word diffing " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 16/20] diff.c: convert diff_flush " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 17/20] diff.c: convert diff_summary " Stefan Beller
2017-05-17  2:58   ` [PATCHv2 18/20] diff.c: emit_line includes whitespace highlighting Stefan Beller
2017-05-17  2:58   ` [PATCHv2 19/20] diff: buffer all output if asked to Stefan Beller
2017-05-17  2:58   ` [PATCHv2 20/20] diff.c: color moved lines differently Stefan Beller
2017-05-18 19:37   ` [PATCHv3 00/20] Diff machine: highlight moved lines Stefan Beller
2017-05-18 19:37     ` [PATCHv3 01/20] diff: readability fix Stefan Beller
2017-05-18 19:37     ` [PATCHv3 02/20] diff: move line ending check into emit_hunk_header Stefan Beller
2017-05-18 19:37     ` [PATCHv3 03/20] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
2017-05-18 19:37     ` [PATCHv3 04/20] diff.c: teach emit_line_0 to accept sign parameter Stefan Beller
2017-05-18 23:33       ` Jonathan Tan
2017-05-22 23:36         ` Stefan Beller
2017-05-18 19:37     ` [PATCHv3 05/20] diff.c: emit_line_0 can handle no color setting Stefan Beller
2017-05-18 19:37     ` [PATCHv3 06/20] diff.c: emit_line_0 takes parameter whether to output line prefix Stefan Beller
2017-05-18 19:37     ` [PATCHv3 07/20] diff.c: inline emit_line_0 into emit_line Stefan Beller
2017-05-18 19:37     ` [PATCHv3 08/20] diff.c: convert fn_out_consume to use emit_line Stefan Beller
2017-05-18 19:37     ` [PATCHv3 09/20] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
2017-05-18 19:37     ` [PATCHv3 10/20] diff.c: convert emit_rewrite_diff " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 11/20] diff.c: convert emit_rewrite_lines " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 12/20] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
2017-05-18 19:37     ` [PATCHv3 13/20] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
2017-05-18 19:37     ` [PATCHv3 14/20] diff.c: convert show_stats " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 15/20] diff.c: convert word diffing " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 16/20] diff.c: convert diff_flush " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 17/20] diff.c: convert diff_summary " Stefan Beller
2017-05-18 19:37     ` [PATCHv3 18/20] diff.c: emit_line includes whitespace highlighting Stefan Beller
2017-05-18 19:37     ` [PATCHv3 19/20] diff: buffer all output if asked to Stefan Beller
2017-05-18 19:37     ` [PATCHv3 20/20] diff.c: color moved lines differently Stefan Beller
2017-05-19 18:23       ` Jonathan Tan
2017-05-19 18:40         ` Stefan Beller
2017-05-19 19:34           ` Jonathan Tan
2017-05-23  2:40     ` [PATCHv4 00/17] Diff machine: highlight moved lines Stefan Beller
2017-05-23  2:40       ` [PATCHv4 01/17] diff: readability fix Stefan Beller
2017-05-23  2:40       ` [PATCHv4 02/17] diff: move line ending check into emit_hunk_header Stefan Beller
2017-05-23  2:40       ` [PATCHv4 03/17] diff.c: factor out diff_flush_patch_all_file_pairs Stefan Beller
2017-05-23  2:40       ` [PATCHv4 04/17] diff: introduce more flexible emit function Stefan Beller
2017-05-23  2:40       ` [PATCHv4 05/17] diff.c: convert fn_out_consume to use emit_line Stefan Beller
2017-05-23  2:40       ` [PATCHv4 06/17] diff.c: convert builtin_diff to use emit_line_* Stefan Beller
2017-05-23  2:40       ` [PATCHv4 07/17] diff.c: convert emit_rewrite_diff " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 08/17] diff.c: convert emit_rewrite_lines " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 09/17] submodule.c: convert show_submodule_summary to use emit_line_fmt Stefan Beller
2017-05-23  5:59         ` Junio C Hamano
2017-05-23 18:14           ` Stefan Beller
2017-05-23  2:40       ` [PATCHv4 10/17] diff.c: convert emit_binary_diff_body to use emit_line_* Stefan Beller
2017-05-23  2:40       ` [PATCHv4 11/17] diff.c: convert show_stats " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 12/17] diff.c: convert word diffing " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 13/17] diff.c: convert diff_flush " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 14/17] diff.c: convert diff_summary " Stefan Beller
2017-05-23  2:40       ` [PATCHv4 15/17] diff.c: emit_line includes whitespace highlighting Stefan Beller
2017-05-23  2:40       ` [PATCHv4 16/17] diff: buffer all output if asked to Stefan Beller
2017-05-23  2:40       ` [PATCHv4 17/17] diff.c: color moved lines differently Stefan Beller
2017-05-27  1:04       ` [PATCHv4 00/17] Diff machine: highlight moved lines Jacob Keller
2017-05-30 21:38         ` Stefan Beller
