git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics
@ 2016-04-15 16:51 Jacob Keller
  2016-04-15 16:51 ` [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function Jacob Keller
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 16:51 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

From: Jacob Keller <jacob.keller@gmail.com>

I took up Stefan's patch, and modified it to resolve a couple issues. I
also tried to implement the suggestions from Junio's review, as well as
the suggestions I had. It appears to produce equivalent output as Jeff's
script. This version is still missing a few things, and it is possible
Stefan is working on a v2 of his own, but I thought I'd submit this.

There's still several TODOs:
* Add some tests
* better name for the heuristic(?)
* better patch description plus documentation
* is_emptyline should be more generic and handle CRLF

Changes since Stefan's v1:
* Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
* Fixed a segfault in Stefan's patch
* Added XDL flag to configure the behavior
* Used an int and counted empty lines via += instead of |=
* Renamed starts_with_emptyline to is_emptyline
* Added diff command line and config options

For reviewer convenience, the interdiff between this and Stefan's version:

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba56522bce..cebf82702d2a 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,12 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.emptyLineHeuristic::
+	Set this option to true to enable the empty line chunk heuristic when
+	producing diff output. This heuristic will attempt to shift hunks such
+	that a common empty line occurs below the hunk with the rest of the
+	context above it.
+
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e2ac15..6c1cd8b35584 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
 	Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--empty-line-heuristic::
+--no-empty-line-heuristic::
+	When possible, shift common empty lines in diff hunks below the hunk
+	such that the last common empty line for each hunk is below, with the
+	rest of the context above the hunk.
+
 --minimal::
 	Spend extra time to make sure the smallest possible
 	diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe6609d059..8ab9a492928d 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_empty_line_heuristic = 0;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_detect_rename_default = git_config_rename(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "diff.emptylineheuristic")) {
+		diff_empty_line_heuristic = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.autorefreshindex")) {
 		diff_auto_refresh_index = git_config_bool(var, value);
 		return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
 	options->use_color = diff_use_color_default;
 	options->detect_rename = diff_detect_rename_default;
 	options->xdl_opts |= diff_algorithm;
+	if (diff_empty_line_heuristic)
+		DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
 
 	options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+	else if (!strcmp(arg, "--empty-line-heuristic"))
+		DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
+	else if (!strcmp(arg, "--no-empty-line-heuristic"))
+		DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
 	else if (!strcmp(arg, "--patience"))
 		options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
 	else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 4fb7e79410c2..9195a5c0e958 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,6 +41,8 @@ extern "C" {
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
+#define XDF_EMPTY_LINE_HEURISTIC (1 << 8)
+
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 5f14beb98049..83984b90f82f 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,7 +400,7 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
-static int starts_with_emptyline(const char *recs)
+static int is_emptyline(const char *recs)
 {
 	return recs[0] == '\n'; /* CRLF not covered here */
 }
@@ -416,7 +416,7 @@ static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long flags
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
-	unsigned char has_emptyline;
+	unsigned int emptylines;
 	xrecord_t **recs = xdf->recs;
 
 	/*
@@ -450,7 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 
 		do {
 			grpsiz = ix - ixs;
-			has_emptyline = 0;
+			emptylines = 0;
 
 			/*
 			 * If the line before the current change group, is equal to
@@ -458,11 +458,11 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the group.
 			 */
 			while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
+				emptylines += is_emptyline(recs[ix - 1]->ptr);
+
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
 
-				has_emptyline |=
-					starts_with_emptyline(recs[ix]->ptr);
 				/*
 				 * This change might have joined two change groups,
 				 * so we try to take this scenario in account by moving
@@ -492,9 +492,6 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
 
-				has_emptyline |=
-					starts_with_emptyline(recs[ix]->ptr);
-
 				/*
 				 * This change might have joined two change groups,
 				 * so we try to take this scenario in account by moving
@@ -527,21 +524,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 		 * As we shifted the group forward as far as possible, we only
 		 * need to shift it back if at all.
 		 */
-		if (has_emptyline) {
-			while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
-			       xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags) &&
-			       !starts_with_emptyline(recs[ix - 1]->ptr)) {
+		if ((flags & XDF_EMPTY_LINE_HEURISTIC) && emptylines) {
+			while (ixs > 0 &&
+			       !is_emptyline(recs[ix - 1]->ptr) &&
+			       xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
-
-				/*
-				 * This change did not join two change groups,
-				 * as we did that before already, so there is no
-				 * need to adapt the other-file, i.e.
-				 * running
-				 *     for (; rchg[ixs - 1]; ixs--);
-				 *     while (rchgo[--ixo]);
-				 */
 			}
 		}
 	}

Jacob Keller (1):
  xdiff: add xdl_hash_and_recmatch helper function

Stefan Beller (1):
  xdiff: implement empty line chunk heuristic

 Documentation/diff-config.txt  |  6 ++++++
 Documentation/diff-options.txt |  6 ++++++
 diff.c                         | 11 +++++++++++
 xdiff/xdiff.h                  |  2 ++
 xdiff/xdiffi.c                 | 42 ++++++++++++++++++++++++++++++++++++++----
 5 files changed, 63 insertions(+), 4 deletions(-)

-- 
2.8.1.369.geae769a

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function
  2016-04-15 16:51 [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics Jacob Keller
@ 2016-04-15 16:51 ` Jacob Keller
  2016-04-15 17:25   ` Junio C Hamano
  2016-04-15 16:51 ` [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic Jacob Keller
  2016-04-15 17:02 ` [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics Stefan Beller
  2 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 16:51 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller, Jacob Keller

From: Jacob Keller <jacob.keller@gmail.com>

It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this change causes very
long lines and makes it hard to follow the intention. Introduce a helper
function xdl_hash_and_recmatch which performs both checks to increase
code readability.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 xdiff/xdiffi.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 2358a2d6326e..fff97ac3e3c7 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
+static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long flags)
+{
+	return (recs[ixs]->ha == recs[ix]->ha &&
+		xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
+			     recs[ix]->ptr, recs[ix]->size,
+			     flags));
+}
+
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
@@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the last line of the current change group, shift backward
 			 * the group.
 			 */
-			while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
-			       xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
+			while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
 
@@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the line next of the current change group, shift forward
 			 * the group.
 			 */
-			while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
-			       xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) {
+			while (ix < nrec && xdl_hash_and_recmatch(recs, ixs, ix, flags)) {
+				emptylines += is_emptyline(recs[ix]->ptr);
+
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
 
-- 
2.8.1.369.geae769a

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 16:51 [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics Jacob Keller
  2016-04-15 16:51 ` [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function Jacob Keller
@ 2016-04-15 16:51 ` Jacob Keller
  2016-04-15 19:57   ` Stefan Beller
  2016-04-15 17:02 ` [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics Stefan Beller
  2 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 16:51 UTC (permalink / raw)
  To: git
  Cc: Stefan Beller, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

From: Stefan Beller <sbeller@google.com>

In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Implement the following heuristic to (optionally) produce the desired
output.

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, leave the heuristic disabled by default.

Add an XDIFF flag to enable this heuristic only conditionally. Add
a diff command line option and diff configuration option to allow users
to enable this option when desired.

TODO:
* Add tests
* Add better/more documentation explaining the heuristic, possibly with
  examples(?)
* better name(?)

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 Documentation/diff-config.txt  |  6 ++++++
 Documentation/diff-options.txt |  6 ++++++
 diff.c                         | 11 +++++++++++
 xdiff/xdiff.h                  |  2 ++
 xdiff/xdiffi.c                 | 26 ++++++++++++++++++++++++++
 5 files changed, 51 insertions(+)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba56522bce..cebf82702d2a 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,12 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.emptyLineHeuristic::
+	Set this option to true to enable the empty line chunk heuristic when
+	producing diff output. This heuristic will attempt to shift hunks such
+	that a common empty line occurs below the hunk with the rest of the
+	context above it.
+
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e2ac15..6c1cd8b35584 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
 	Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--empty-line-heuristic::
+--no-empty-line-heuristic::
+	When possible, shift common empty lines in diff hunks below the hunk
+	such that the last common empty line for each hunk is below, with the
+	rest of the context above the hunk.
+
 --minimal::
 	Spend extra time to make sure the smallest possible
 	diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe6609d059..8ab9a492928d 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_empty_line_heuristic = 0;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_detect_rename_default = git_config_rename(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "diff.emptylineheuristic")) {
+		diff_empty_line_heuristic = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.autorefreshindex")) {
 		diff_auto_refresh_index = git_config_bool(var, value);
 		return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
 	options->use_color = diff_use_color_default;
 	options->detect_rename = diff_detect_rename_default;
 	options->xdl_opts |= diff_algorithm;
+	if (diff_empty_line_heuristic)
+		DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
 
 	options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+	else if (!strcmp(arg, "--empty-line-heuristic"))
+		DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
+	else if (!strcmp(arg, "--no-empty-line-heuristic"))
+		DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
 	else if (!strcmp(arg, "--patience"))
 		options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
 	else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 4fb7e79410c2..9195a5c0e958 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,6 +41,8 @@ extern "C" {
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
+#define XDF_EMPTY_LINE_HEURISTIC (1 << 8)
+
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index fff97ac3e3c7..83984b90f82f 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,11 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
+static int is_emptyline(const char *recs)
+{
+	return recs[0] == '\n'; /* CRLF not covered here */
+}
+
 static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long flags)
 {
 	return (recs[ixs]->ha == recs[ix]->ha &&
@@ -411,6 +416,7 @@ static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long flags
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
+	unsigned int emptylines;
 	xrecord_t **recs = xdf->recs;
 
 	/*
@@ -444,6 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 
 		do {
 			grpsiz = ix - ixs;
+			emptylines = 0;
 
 			/*
 			 * If the line before the current change group, is equal to
@@ -451,6 +458,8 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the group.
 			 */
 			while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
+				emptylines += is_emptyline(recs[ix - 1]->ptr);
+
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
 
@@ -506,6 +515,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			rchg[--ix] = 0;
 			while (rchgo[--ixo]);
 		}
+
+		/*
+		 * If a group can be moved back and forth, see if there is an
+		 * empty line in the moving space. If there is an empty line,
+		 * make sure the last empty line is the end of the group.
+		 *
+		 * As we shifted the group forward as far as possible, we only
+		 * need to shift it back if at all.
+		 */
+		if ((flags & XDF_EMPTY_LINE_HEURISTIC) && emptylines) {
+			while (ixs > 0 &&
+			       !is_emptyline(recs[ix - 1]->ptr) &&
+			       xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
+				rchg[--ixs] = 1;
+				rchg[--ix] = 0;
+			}
+		}
 	}
 
 	return 0;
-- 
2.8.1.369.geae769a

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics
  2016-04-15 16:51 [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics Jacob Keller
  2016-04-15 16:51 ` [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function Jacob Keller
  2016-04-15 16:51 ` [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic Jacob Keller
@ 2016-04-15 17:02 ` Stefan Beller
  2016-04-15 17:10   ` Stefan Beller
  2016-04-15 19:07   ` Jacob Keller
  2 siblings, 2 replies; 22+ messages in thread
From: Stefan Beller @ 2016-04-15 17:02 UTC (permalink / raw)
  To: Jacob Keller
  Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

On Fri, Apr 15, 2016 at 9:51 AM, Jacob Keller <jacob.e.keller@intel.com> wrote:
> From: Jacob Keller <jacob.keller@gmail.com>
>
> I took up Stefan's patch, and modified it to resolve a couple issues. I
> also tried to implement the suggestions from Junio's review, as well as
> the suggestions I had. It appears to produce equivalent output as Jeff's
> script. This version is still missing a few things, and it is possible
> Stefan is working on a v2 of his own, but I thought I'd submit this.
>
> There's still several TODOs:
> * Add some tests
> * better name for the heuristic(?)
> * better patch description plus documentation
> * is_emptyline should be more generic and handle CRLF
>
> Changes since Stefan's v1:
> * Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
> * Fixed a segfault in Stefan's patch
> * Added XDL flag to configure the behavior
> * Used an int and counted empty lines via += instead of |=
> * Renamed starts_with_emptyline to is_emptyline
> * Added diff command line and config options
>
> For reviewer convenience, the interdiff between this and Stefan's version:
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index edba56522bce..cebf82702d2a 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -170,6 +170,12 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> +diff.emptyLineHeuristic::
> +       Set this option to true to enable the empty line chunk heuristic when
> +       producing diff output. This heuristic will attempt to shift hunks such
> +       that a common empty line occurs below the hunk with the rest of the
> +       context above it.
> +

    This heuristic will attempt to shift hunks such that *the last* common
    empty line occurs below the hunk with the rest of the context above it.

maybe?


>  diff.algorithm::
>         Choose a diff algorithm.  The variants are as follows:
>  +
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 4b0318e2ac15..6c1cd8b35584 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -63,6 +63,12 @@ ifndef::git-format-patch[]
>         Synonym for `-p --raw`.
>  endif::git-format-patch[]
>
> +--empty-line-heuristic::
> +--no-empty-line-heuristic::
> +       When possible, shift common empty lines in diff hunks below the hunk
> +       such that the last common empty line for each hunk is below, with the
> +       rest of the context above the hunk.
> +
>  --minimal::
>         Spend extra time to make sure the smallest possible
>         diff is produced.
> diff --git a/diff.c b/diff.c
> index 4dfe6609d059..8ab9a492928d 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -26,6 +26,7 @@
>  #endif
>
>  static int diff_detect_rename_default;
> +static int diff_empty_line_heuristic = 0;
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>                 diff_detect_rename_default = git_config_rename(var, value);
>                 return 0;
>         }
> +       if (!strcmp(var, "diff.emptylineheuristic")) {
> +               diff_empty_line_heuristic = git_config_bool(var, value);
> +               return 0;
> +       }
>         if (!strcmp(var, "diff.autorefreshindex")) {
>                 diff_auto_refresh_index = git_config_bool(var, value);
>                 return 0;
> @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
>         options->use_color = diff_use_color_default;
>         options->detect_rename = diff_detect_rename_default;
>         options->xdl_opts |= diff_algorithm;
> +       if (diff_empty_line_heuristic)
> +               DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
>
>         options->orderfile = diff_order_file_cfg;
>
> @@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
>                 DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
>         else if (!strcmp(arg, "--ignore-blank-lines"))
>                 DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
> +       else if (!strcmp(arg, "--empty-line-heuristic"))
> +               DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
> +       else if (!strcmp(arg, "--no-empty-line-heuristic"))
> +               DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
>         else if (!strcmp(arg, "--patience"))
>                 options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
>         else if (!strcmp(arg, "--histogram"))
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index 4fb7e79410c2..9195a5c0e958 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -41,6 +41,8 @@ extern "C" {
>
>  #define XDF_IGNORE_BLANK_LINES (1 << 7)
>
> +#define XDF_EMPTY_LINE_HEURISTIC (1 << 8)
> +
>  #define XDL_EMIT_FUNCNAMES (1 << 0)
>  #define XDL_EMIT_FUNCCONTEXT (1 << 2)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 5f14beb98049..83984b90f82f 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -400,7 +400,7 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
>  }
>
>
> -static int starts_with_emptyline(const char *recs)
> +static int is_emptyline(const char *recs)
>  {
>         return recs[0] == '\n'; /* CRLF not covered here */
>  }
> @@ -416,7 +416,7 @@ static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long flags
>  int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>         long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
>         char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
> -       unsigned char has_emptyline;
> +       unsigned int emptylines;
>         xrecord_t **recs = xdf->recs;
>
>         /*
> @@ -450,7 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>
>                 do {
>                         grpsiz = ix - ixs;
> -                       has_emptyline = 0;
> +                       emptylines = 0;
>
>                         /*
>                          * If the line before the current change group, is equal to
> @@ -458,11 +458,11 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>                          * the group.
>                          */
>                         while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
> +                               emptylines += is_emptyline(recs[ix - 1]->ptr);
> +
>                                 rchg[--ixs] = 1;
>                                 rchg[--ix] = 0;
>
> -                               has_emptyline |=
> -                                       starts_with_emptyline(recs[ix]->ptr);

How is this fixing the segfault bug?
From my understanding ix should have the same value here as ix-1 above.

>                                 /*
>                                  * This change might have joined two change groups,
>                                  * so we try to take this scenario in account by moving
> @@ -492,9 +492,6 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>                                 rchg[ixs++] = 0;
>                                 rchg[ix++] = 1;
>
> -                               has_emptyline |=
> -                                       starts_with_emptyline(recs[ix]->ptr);
> -

Or would this fix the segfault bug?
I think we would need to have the check for empty lines
in both loops to be sure to cover the whole movable range.


>                                 /*
>                                  * This change might have joined two change groups,
>                                  * so we try to take this scenario in account by moving
> @@ -527,21 +524,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>                  * As we shifted the group forward as far as possible, we only
>                  * need to shift it back if at all.
>                  */
> -               if (has_emptyline) {
> -                       while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
> -                              xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags) &&
> -                              !starts_with_emptyline(recs[ix - 1]->ptr)) {
> +               if ((flags & XDF_EMPTY_LINE_HEURISTIC) && emptylines) {
> +                       while (ixs > 0 &&
> +                              !is_emptyline(recs[ix - 1]->ptr) &&
> +                              xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
>                                 rchg[--ixs] = 1;
>                                 rchg[--ix] = 0;
> -
> -                               /*
> -                                * This change did not join two change groups,
> -                                * as we did that before already, so there is no
> -                                * need to adapt the other-file, i.e.
> -                                * running
> -                                *     for (; rchg[ixs - 1]; ixs--);
> -                                *     while (rchgo[--ixo]);
> -                                */
>                         }
>                 }
>         }
>
> Jacob Keller (1):
>   xdiff: add xdl_hash_and_recmatch helper function
>
> Stefan Beller (1):
>   xdiff: implement empty line chunk heuristic
>
>  Documentation/diff-config.txt  |  6 ++++++
>  Documentation/diff-options.txt |  6 ++++++
>  diff.c                         | 11 +++++++++++
>  xdiff/xdiff.h                  |  2 ++
>  xdiff/xdiffi.c                 | 42 ++++++++++++++++++++++++++++++++++++++----
>  5 files changed, 63 insertions(+), 4 deletions(-)
>
> --
> 2.8.1.369.geae769a
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics
  2016-04-15 17:02 ` [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics Stefan Beller
@ 2016-04-15 17:10   ` Stefan Beller
  2016-04-15 17:27     ` Junio C Hamano
  2016-04-15 19:08     ` Jacob Keller
  2016-04-15 19:07   ` Jacob Keller
  1 sibling, 2 replies; 22+ messages in thread
From: Stefan Beller @ 2016-04-15 17:10 UTC (permalink / raw)
  To: Jacob Keller
  Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

On Fri, Apr 15, 2016 at 10:02 AM, Stefan Beller <sbeller@google.com> wrote:
>> @@ -458,11 +458,11 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>                          * the group.
>>                          */
>>                         while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
>> +                               emptylines += is_emptyline(recs[ix - 1]->ptr);
>> +
>>                                 rchg[--ixs] = 1;
>>                                 rchg[--ix] = 0;
>>
>> -                               has_emptyline |=
>> -                                       starts_with_emptyline(recs[ix]->ptr);
>
> How is this fixing the segfault bug?
> From my understanding ix should have the same value here as ix-1 above.
>
>>                                 /*
>>                                  * This change might have joined two change groups,
>>                                  * so we try to take this scenario in account by moving
>> @@ -492,9 +492,6 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>                                 rchg[ixs++] = 0;
>>                                 rchg[ix++] = 1;
>>
>> -                               has_emptyline |=
>> -                                       starts_with_emptyline(recs[ix]->ptr);
>> -
>
> Or would this fix the segfault bug?
> I think we would need to have the check for empty lines
> in both loops to be sure to cover the whole movable range.

Actually we would only need to have the empty line count in the second loop as
the first loop shifted it as much up(backwards) as possible, such that
the hunk sits on one
end of the movable range. The second loop would iterate over the whole
range, so that
would be the only place needed to count.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function
  2016-04-15 16:51 ` [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function Jacob Keller
@ 2016-04-15 17:25   ` Junio C Hamano
  2016-04-15 19:09     ` Jacob Keller
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-04-15 17:25 UTC (permalink / raw)
  To: Jacob Keller
  Cc: git, Stefan Beller, Jeff King, Jens Lehmann, Davide Libenzi,
	Jacob Keller

Jacob Keller <jacob.e.keller@intel.com> writes:

> From: Jacob Keller <jacob.keller@gmail.com>
>
> It is a common pattern in xdl_change_compact to check that hashes and
> strings match. The resulting code to perform this change causes very
> long lines and makes it hard to follow the intention. Introduce a helper
> function xdl_hash_and_recmatch which performs both checks to increase
> code readability.

Think _why_ it is common to check hash and then do recmatch().  What
is the combination of two trying to compute?

How about calling it after "what" it computes, not after "how" it
computes it?  E.g.

    static int recs_match(xrecord_t **recs, long x, long y, long flags)

if we answer the above question "they try to see if two records match".
We could also go s/recs/lines/.

The xdl_recmatch() function appears in xutils.c, and over there the
functions do not use arrays of (xrecord_t *), so I think we are
better off without xdl_ prefix to avoid confusion.

> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  xdiff/xdiffi.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 2358a2d6326e..fff97ac3e3c7 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
>  }
>  
>  
> +static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long flags)
> +{
> +	return (recs[ixs]->ha == recs[ix]->ha &&
> +		xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
> +			     recs[ix]->ptr, recs[ix]->size,
> +			     flags));
> +}
> +
>  int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>  	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
>  	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
> @@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>  			 * the last line of the current change group, shift backward
>  			 * the group.
>  			 */
> -			while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
> -			       xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
> +			while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
>  				rchg[--ixs] = 1;
>  				rchg[--ix] = 0;
>  
> @@ -470,8 +477,9 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>  			 * the line next of the current change group, shift forward
>  			 * the group.
>  			 */
> -			while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
> -			       xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) {
> +			while (ix < nrec && xdl_hash_and_recmatch(recs, ixs, ix, flags)) {
> +				emptylines += is_emptyline(recs[ix]->ptr);
> +
>  				rchg[ixs++] = 0;
>  				rchg[ix++] = 1;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics
  2016-04-15 17:10   ` Stefan Beller
@ 2016-04-15 17:27     ` Junio C Hamano
  2016-04-15 17:33       ` Stefan Beller
  2016-04-15 19:08     ` Jacob Keller
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-04-15 17:27 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jacob Keller, git@vger.kernel.org, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

Stefan Beller <sbeller@google.com> writes:

> Actually we would only need to have the empty line count in the
> second loop as the first loop shifted it as much up(backwards) as
> possible, such that the hunk sits on one end of the movable
> range. The second loop would iterate over the whole range, so that
> would be the only place needed to count.

The description makes me wonder if we can do without two loops ;-)

That is, can we teach the first loop not to be so aggressive to
shift "as much as possible" but notice there is an empty line we
would want to treat specially?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics
  2016-04-15 17:27     ` Junio C Hamano
@ 2016-04-15 17:33       ` Stefan Beller
  2016-04-15 17:48         ` Junio C Hamano
  2016-04-15 19:09         ` Jacob Keller
  0 siblings, 2 replies; 22+ messages in thread
From: Stefan Beller @ 2016-04-15 17:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jacob Keller, git@vger.kernel.org, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

On Fri, Apr 15, 2016 at 10:27 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Actually we would only need to have the empty line count in the
>> second loop as the first loop shifted it as much up(backwards) as
>> possible, such that the hunk sits on one end of the movable
>> range. The second loop would iterate over the whole range, so that
>> would be the only place needed to count.
>
> The description makes me wonder if we can do without two loops ;-)

I think the existing 2 loops are needed to move "as much as possible"
to either boundary to see if there is overlap to another group and combine
the groups if so. (This whole construction prior to these patches remind
me of shaker sort)

>
> That is, can we teach the first loop not to be so aggressive to
> shift "as much as possible" but notice there is an empty line we
> would want to treat specially?

I think we need to be aggressive to find adjacent groups and only after
that is done we should think about optimizing look&feel? That is why I
even proposed to not touch the current 2 loops at all and have our own
loop to find out if there is at least one empty line.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics
  2016-04-15 17:33       ` Stefan Beller
@ 2016-04-15 17:48         ` Junio C Hamano
  2016-04-15 19:09         ` Jacob Keller
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2016-04-15 17:48 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jacob Keller, git@vger.kernel.org, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

Stefan Beller <sbeller@google.com> writes:

> I think we need to be aggressive to find adjacent groups and only after
> that is done we should think about optimizing look&feel?

OK.  I was just gauging to see if those involved in the codepath
thought things through, and apparently you did, so I'm happy ;-)

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics
  2016-04-15 17:02 ` [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics Stefan Beller
  2016-04-15 17:10   ` Stefan Beller
@ 2016-04-15 19:07   ` Jacob Keller
  1 sibling, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 19:07 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jacob Keller, git@vger.kernel.org, Junio C Hamano, Jeff King,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 10:02 AM, Stefan Beller <sbeller@google.com> wrote:
> On Fri, Apr 15, 2016 at 9:51 AM, Jacob Keller <jacob.e.keller@intel.com> wrote:
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> I took up Stefan's patch, and modified it to resolve a couple issues. I
>> also tried to implement the suggestions from Junio's review, as well as
>> the suggestions I had. It appears to produce equivalent output as Jeff's
>> script. This version is still missing a few things, and it is possible
>> Stefan is working on a v2 of his own, but I thought I'd submit this.
>>
>> There's still several TODOs:
>> * Add some tests
>> * better name for the heuristic(?)
>> * better patch description plus documentation
>> * is_emptyline should be more generic and handle CRLF
>>
>> Changes since Stefan's v1:
>> * Added a patch to implement xdl_hash_and_recmatch as Junio suggested.
>> * Fixed a segfault in Stefan's patch
>> * Added XDL flag to configure the behavior
>> * Used an int and counted empty lines via += instead of |=
>> * Renamed starts_with_emptyline to is_emptyline
>> * Added diff command line and config options
>>
>> For reviewer convenience, the interdiff between this and Stefan's version:
>>
>> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
>> index edba56522bce..cebf82702d2a 100644
>> --- a/Documentation/diff-config.txt
>> +++ b/Documentation/diff-config.txt
>> @@ -170,6 +170,12 @@ diff.tool::
>>
>>  include::mergetools-diff.txt[]
>>
>> +diff.emptyLineHeuristic::
>> +       Set this option to true to enable the empty line chunk heuristic when
>> +       producing diff output. This heuristic will attempt to shift hunks such
>> +       that a common empty line occurs below the hunk with the rest of the
>> +       context above it.
>> +
>
>     This heuristic will attempt to shift hunks such that *the last* common
>     empty line occurs below the hunk with the rest of the context above it.
>
> maybe?
>

That sounds reasonable.

>
>>  diff.algorithm::
>>         Choose a diff algorithm.  The variants are as follows:
>>  +
>> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
>> index 4b0318e2ac15..6c1cd8b35584 100644
>> --- a/Documentation/diff-options.txt
>> +++ b/Documentation/diff-options.txt
>> @@ -63,6 +63,12 @@ ifndef::git-format-patch[]
>>         Synonym for `-p --raw`.
>>  endif::git-format-patch[]
>>
>> +--empty-line-heuristic::
>> +--no-empty-line-heuristic::
>> +       When possible, shift common empty lines in diff hunks below the hunk
>> +       such that the last common empty line for each hunk is below, with the
>> +       rest of the context above the hunk.
>> +
>>  --minimal::
>>         Spend extra time to make sure the smallest possible
>>         diff is produced.
>> diff --git a/diff.c b/diff.c
>> index 4dfe6609d059..8ab9a492928d 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -26,6 +26,7 @@
>>  #endif
>>
>>  static int diff_detect_rename_default;
>> +static int diff_empty_line_heuristic = 0;
>>  static int diff_rename_limit_default = 400;
>>  static int diff_suppress_blank_empty;
>>  static int diff_use_color_default = -1;
>> @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>>                 diff_detect_rename_default = git_config_rename(var, value);
>>                 return 0;
>>         }
>> +       if (!strcmp(var, "diff.emptylineheuristic")) {
>> +               diff_empty_line_heuristic = git_config_bool(var, value);
>> +               return 0;
>> +       }
>>         if (!strcmp(var, "diff.autorefreshindex")) {
>>                 diff_auto_refresh_index = git_config_bool(var, value);
>>                 return 0;
>> @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
>>         options->use_color = diff_use_color_default;
>>         options->detect_rename = diff_detect_rename_default;
>>         options->xdl_opts |= diff_algorithm;
>> +       if (diff_empty_line_heuristic)
>> +               DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
>>
>>         options->orderfile = diff_order_file_cfg;
>>
>> @@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
>>                 DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
>>         else if (!strcmp(arg, "--ignore-blank-lines"))
>>                 DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
>> +       else if (!strcmp(arg, "--empty-line-heuristic"))
>> +               DIFF_XDL_SET(options, EMPTY_LINE_HEURISTIC);
>> +       else if (!strcmp(arg, "--no-empty-line-heuristic"))
>> +               DIFF_XDL_CLR(options, EMPTY_LINE_HEURISTIC);
>>         else if (!strcmp(arg, "--patience"))
>>                 options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
>>         else if (!strcmp(arg, "--histogram"))
>> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
>> index 4fb7e79410c2..9195a5c0e958 100644
>> --- a/xdiff/xdiff.h
>> +++ b/xdiff/xdiff.h
>> @@ -41,6 +41,8 @@ extern "C" {
>>
>>  #define XDF_IGNORE_BLANK_LINES (1 << 7)
>>
>> +#define XDF_EMPTY_LINE_HEURISTIC (1 << 8)
>> +
>>  #define XDL_EMIT_FUNCNAMES (1 << 0)
>>  #define XDL_EMIT_FUNCCONTEXT (1 << 2)
>>
>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>> index 5f14beb98049..83984b90f82f 100644
>> --- a/xdiff/xdiffi.c
>> +++ b/xdiff/xdiffi.c
>> @@ -400,7 +400,7 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
>>  }
>>
>>
>> -static int starts_with_emptyline(const char *recs)
>> +static int is_emptyline(const char *recs)
>>  {
>>         return recs[0] == '\n'; /* CRLF not covered here */
>>  }
>> @@ -416,7 +416,7 @@ static int xdl_hash_and_recmatch(xrecord_t **recs, long ixs, long ix, long flags
>>  int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>         long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
>>         char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
>> -       unsigned char has_emptyline;
>> +       unsigned int emptylines;
>>         xrecord_t **recs = xdf->recs;
>>
>>         /*
>> @@ -450,7 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>
>>                 do {
>>                         grpsiz = ix - ixs;
>> -                       has_emptyline = 0;
>> +                       emptylines = 0;
>>
>>                         /*
>>                          * If the line before the current change group, is equal to
>> @@ -458,11 +458,11 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>                          * the group.
>>                          */
>>                         while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
>> +                               emptylines += is_emptyline(recs[ix - 1]->ptr);
>> +
>>                                 rchg[--ixs] = 1;
>>                                 rchg[--ix] = 0;
>>
>> -                               has_emptyline |=
>> -                                       starts_with_emptyline(recs[ix]->ptr);
>
> How is this fixing the segfault bug?
> From my understanding ix should have the same value here as ix-1 above.
>

This change was for style reasons to check for empty lines first
thing, and it matches how we do "ix - 1" into xdl_hash_and_recmatch.
But as per your following comment, I think you're right and we don't
need this.

>>                                 /*
>>                                  * This change might have joined two change groups,
>>                                  * so we try to take this scenario in account by moving
>> @@ -492,9 +492,6 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>                                 rchg[ixs++] = 0;
>>                                 rchg[ix++] = 1;
>>
>> -                               has_emptyline |=
>> -                                       starts_with_emptyline(recs[ix]->ptr);
>> -
>
> Or would this fix the segfault bug?
> I think we would need to have the check for empty lines
> in both loops to be sure to cover the whole movable range.
>

This is the real segfault fix, I only changed the other in order to
make it appear consistent.

>
>>                                 /*
>>                                  * This change might have joined two change groups,
>>                                  * so we try to take this scenario in account by moving
>> @@ -527,21 +524,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>                  * As we shifted the group forward as far as possible, we only
>>                  * need to shift it back if at all.
>>                  */
>> -               if (has_emptyline) {
>> -                       while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
>> -                              xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags) &&
>> -                              !starts_with_emptyline(recs[ix - 1]->ptr)) {
>> +               if ((flags & XDF_EMPTY_LINE_HEURISTIC) && emptylines) {
>> +                       while (ixs > 0 &&
>> +                              !is_emptyline(recs[ix - 1]->ptr) &&
>> +                              xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
>>                                 rchg[--ixs] = 1;
>>                                 rchg[--ix] = 0;
>> -
>> -                               /*
>> -                                * This change did not join two change groups,
>> -                                * as we did that before already, so there is no
>> -                                * need to adapt the other-file, i.e.
>> -                                * running
>> -                                *     for (; rchg[ixs - 1]; ixs--);
>> -                                *     while (rchgo[--ixo]);
>> -                                */
>>                         }
>>                 }
>>         }
>>
>> Jacob Keller (1):
>>   xdiff: add xdl_hash_and_recmatch helper function
>>
>> Stefan Beller (1):
>>   xdiff: implement empty line chunk heuristic
>>
>>  Documentation/diff-config.txt  |  6 ++++++
>>  Documentation/diff-options.txt |  6 ++++++
>>  diff.c                         | 11 +++++++++++
>>  xdiff/xdiff.h                  |  2 ++
>>  xdiff/xdiffi.c                 | 42 ++++++++++++++++++++++++++++++++++++++----
>>  5 files changed, 63 insertions(+), 4 deletions(-)
>>
>> --
>> 2.8.1.369.geae769a
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics
  2016-04-15 17:10   ` Stefan Beller
  2016-04-15 17:27     ` Junio C Hamano
@ 2016-04-15 19:08     ` Jacob Keller
  1 sibling, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 19:08 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jacob Keller, git@vger.kernel.org, Junio C Hamano, Jeff King,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 10:10 AM, Stefan Beller <sbeller@google.com> wrote:
> On Fri, Apr 15, 2016 at 10:02 AM, Stefan Beller <sbeller@google.com> wrote:
>>> @@ -458,11 +458,11 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>>                          * the group.
>>>                          */
>>>                         while (ixs > 0 && xdl_hash_and_recmatch(recs, ixs - 1, ix - 1, flags)) {
>>> +                               emptylines += is_emptyline(recs[ix - 1]->ptr);
>>> +
>>>                                 rchg[--ixs] = 1;
>>>                                 rchg[--ix] = 0;
>>>
>>> -                               has_emptyline |=
>>> -                                       starts_with_emptyline(recs[ix]->ptr);
>>
>> How is this fixing the segfault bug?
>> From my understanding ix should have the same value here as ix-1 above.
>>
>>>                                 /*
>>>                                  * This change might have joined two change groups,
>>>                                  * so we try to take this scenario in account by moving
>>> @@ -492,9 +492,6 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>>                                 rchg[ixs++] = 0;
>>>                                 rchg[ix++] = 1;
>>>
>>> -                               has_emptyline |=
>>> -                                       starts_with_emptyline(recs[ix]->ptr);
>>> -
>>
>> Or would this fix the segfault bug?
>> I think we would need to have the check for empty lines
>> in both loops to be sure to cover the whole movable range.
>
> Actually we would only need to have the empty line count in the second loop as
> the first loop shifted it as much up(backwards) as possible, such that
> the hunk sits on one
> end of the movable range. The second loop would iterate over the whole
> range, so that
> would be the only place needed to count.

I agree that we can drop the first part and I am testing it now.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics
  2016-04-15 17:33       ` Stefan Beller
  2016-04-15 17:48         ` Junio C Hamano
@ 2016-04-15 19:09         ` Jacob Keller
  1 sibling, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 19:09 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Jacob Keller, git@vger.kernel.org, Jeff King,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 10:33 AM, Stefan Beller <sbeller@google.com> wrote:
> On Fri, Apr 15, 2016 at 10:27 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Stefan Beller <sbeller@google.com> writes:
>>
>>> Actually we would only need to have the empty line count in the
>>> second loop as the first loop shifted it as much up(backwards) as
>>> possible, such that the hunk sits on one end of the movable
>>> range. The second loop would iterate over the whole range, so that
>>> would be the only place needed to count.
>>
>> The description makes me wonder if we can do without two loops ;-)
>
> I think the existing 2 loops are needed to move "as much as possible"
> to either boundary to see if there is overlap to another group and combine
> the groups if so. (This whole construction prior to these patches remind
> me of shaker sort)
>
>>
>> That is, can we teach the first loop not to be so aggressive to
>> shift "as much as possible" but notice there is an empty line we
>> would want to treat specially?
>
> I think we need to be aggressive to find adjacent groups and only after
> that is done we should think about optimizing look&feel? That is why I
> even proposed to not touch the current 2 loops at all and have our own
> loop to find out if there is at least one empty line.

Agreed, we need the two loops in order to aggressively merge all the
changes together first.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function
  2016-04-15 17:25   ` Junio C Hamano
@ 2016-04-15 19:09     ` Jacob Keller
  0 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 19:09 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jacob Keller, Git mailing list, Stefan Beller, Jeff King,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 10:25 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
>
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> It is a common pattern in xdl_change_compact to check that hashes and
>> strings match. The resulting code to perform this change causes very
>> long lines and makes it hard to follow the intention. Introduce a helper
>> function xdl_hash_and_recmatch which performs both checks to increase
>> code readability.
>
> Think _why_ it is common to check hash and then do recmatch().  What
> is the combination of two trying to compute?
>
> How about calling it after "what" it computes, not after "how" it
> computes it?  E.g.
>
>     static int recs_match(xrecord_t **recs, long x, long y, long flags)
>
> if we answer the above question "they try to see if two records match".
> We could also go s/recs/lines/.
>
> The xdl_recmatch() function appears in xutils.c, and over there the
> functions do not use arrays of (xrecord_t *), so I think we are
> better off without xdl_ prefix to avoid confusion.
>

That makes sense. I like the sound of recs_match, and it's shorter too
which is nice.

Regards,
Jake

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 16:51 ` [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic Jacob Keller
@ 2016-04-15 19:57   ` Stefan Beller
  2016-04-15 20:06     ` Junio C Hamano
  2016-04-15 20:06     ` Jacob Keller
  0 siblings, 2 replies; 22+ messages in thread
From: Stefan Beller @ 2016-04-15 19:57 UTC (permalink / raw)
  To: Jacob Keller
  Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi

On Fri, Apr 15, 2016 at 9:51 AM, Jacob Keller <jacob.e.keller@intel.com> wrote:
> From: Stefan Beller <sbeller@google.com>
>
> In order to produce the smallest possible diff and combine several diff
> hunks together, we implement a heuristic from GNU Diff which moves diff
> hunks forward as far as possible when we find common context above and
> below a diff hunk. This sometimes produces less readable diffs when
> writing C, Shell, or other programming languages, ie:
>
> ...
>  /*
> + *
> + *
> + */
> +
> +/*
> ...
>
> instead of the more readable equivalent of
>
> ...
> +/*
> + *
> + *
> + */
> +
>  /*
> ...
>
> Implement the following heuristic to (optionally) produce the desired
> output.
>
>   If there are diff chunks which can be shifted around, shift each hunk
>   such that the last common empty line is below the chunk with the rest
>   of the context above.
>
> This heuristic appears to resolve the above example and several other
> common issues without producing significantly weird results. However, as
> with any heuristic it is not really known whether this will always be
> more optimal. Thus, leave the heuristic disabled by default.
>
> Add an XDIFF flag to enable this heuristic only conditionally. Add
> a diff command line option and diff configuration option to allow users
> to enable this option when desired.
>
> TODO:
> * Add tests
> * Add better/more documentation explaining the heuristic, possibly with
>   examples(?)
> * better name(?)
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  Documentation/diff-config.txt  |  6 ++++++
>  Documentation/diff-options.txt |  6 ++++++
>  diff.c                         | 11 +++++++++++
>  xdiff/xdiff.h                  |  2 ++
>  xdiff/xdiffi.c                 | 26 ++++++++++++++++++++++++++
>  5 files changed, 51 insertions(+)
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index edba56522bce..cebf82702d2a 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -170,6 +170,12 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> +diff.emptyLineHeuristic::

I was looking at the TODO here and thought about the name:
It should not encode the `emptyLine` into the config option as
it is only one of many heuristics.

It should be something like `diff.heuristic=lastEmptyLine`
The we could add firstEmptyLine, aggressiveUp, aggressiveDown,
breakAtShortestLineLength or other styles as well later on.

I do not quite understand the difference between diff.algorithm
and the newly proposed heuristic as the heuristic is part of
the algorithm? So I guess we'd need to have some documentation
saying how these differ. (fundamental algorithm vs last minute
style fixup?)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 19:57   ` Stefan Beller
@ 2016-04-15 20:06     ` Junio C Hamano
  2016-04-15 20:17       ` Jacob Keller
  2016-04-15 20:06     ` Jacob Keller
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-04-15 20:06 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jacob Keller, git@vger.kernel.org, Jeff King, Jens Lehmann,
	Davide Libenzi

Stefan Beller <sbeller@google.com> writes:

>> +diff.emptyLineHeuristic::
>
> I was looking at the TODO here and thought about the name:
> It should not encode the `emptyLine` into the config option as
> it is only one of many heuristics.
>
> It should be something like `diff.heuristic=lastEmptyLine`
> The we could add firstEmptyLine, aggressiveUp, aggressiveDown,
> breakAtShortestLineLength or other styles as well later on.
>
> I do not quite understand the difference between diff.algorithm
> and the newly proposed heuristic as the heuristic is part of
> the algorithm? So I guess we'd need to have some documentation
> saying how these differ. (fundamental algorithm vs last minute
> style fixup?)

I actually do not think these knobs should exist when the code is
mature enough to be shipped to the end users.

Use "diff.compactionHeuristics = <uint>" as an opaque set of bits to
help the developers while they compare notes and reach consensus on
a single tweak that they can agree on being good enough, and then
remove that variable before the code hits 'next'.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 19:57   ` Stefan Beller
  2016-04-15 20:06     ` Junio C Hamano
@ 2016-04-15 20:06     ` Jacob Keller
  1 sibling, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 20:06 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jacob Keller, git@vger.kernel.org, Junio C Hamano, Jeff King,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 12:57 PM, Stefan Beller <sbeller@google.com> wrote:
> I was looking at the TODO here and thought about the name:
> It should not encode the `emptyLine` into the config option as
> it is only one of many heuristics.
>
> It should be something like `diff.heuristic=lastEmptyLine`
> The we could add firstEmptyLine, aggressiveUp, aggressiveDown,
> breakAtShortestLineLength or other styles as well later on.
>

This sounds better, but how does this handle multiple heuristics?

> I do not quite understand the difference between diff.algorithm
> and the newly proposed heuristic as the heuristic is part of
> the algorithm? So I guess we'd need to have some documentation
> saying how these differ. (fundamental algorithm vs last minute
> style fixup?)

It is not part of the algorithm. It's applied after the algorithm.
xdl_change_compact is run after the algorithm and run for all
algorithms.

These are last minute style changes, and should probably not use the
term heuristic, but somehow capture "last minute style fixup"

Thanks,
Jake

> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 20:06     ` Junio C Hamano
@ 2016-04-15 20:17       ` Jacob Keller
  2016-04-15 20:30         ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 20:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Jacob Keller, git@vger.kernel.org, Jeff King,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 1:06 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>> +diff.emptyLineHeuristic::
>>
>> I was looking at the TODO here and thought about the name:
>> It should not encode the `emptyLine` into the config option as
>> it is only one of many heuristics.
>>
>> It should be something like `diff.heuristic=lastEmptyLine`
>> The we could add firstEmptyLine, aggressiveUp, aggressiveDown,
>> breakAtShortestLineLength or other styles as well later on.
>>
>> I do not quite understand the difference between diff.algorithm
>> and the newly proposed heuristic as the heuristic is part of
>> the algorithm? So I guess we'd need to have some documentation
>> saying how these differ. (fundamental algorithm vs last minute
>> style fixup?)
>
> I actually do not think these knobs should exist when the code is
> mature enough to be shipped to the end users.
>
> Use "diff.compactionHeuristics = <uint>" as an opaque set of bits to
> help the developers while they compare notes and reach consensus on
> a single tweak that they can agree on being good enough, and then
> remove that variable before the code hits 'next'.
>
> Thanks.

I was under the impression that we would want a longer lived
configuration until we had enough data to say whether it was helpful
to make it default. I guess i had thought it would need to be longer
lived since there may be cases where it's not optimal and being able
to turn it off would be good?

I'd rather keep it semi-human readable vs a uint since it would help
keep me sane when looking at it in the interim.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 20:17       ` Jacob Keller
@ 2016-04-15 20:30         ` Junio C Hamano
  2016-04-15 21:15           ` Jacob Keller
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-04-15 20:30 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Stefan Beller, Jacob Keller, git@vger.kernel.org, Jeff King,
	Jens Lehmann, Davide Libenzi

Jacob Keller <jacob.keller@gmail.com> writes:

> On Fri, Apr 15, 2016 at 1:06 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
>> I actually do not think these knobs should exist when the code is
>> mature enough to be shipped to the end users.
>>
>> Use "diff.compactionHeuristics = <uint>" as an opaque set of bits to
>> help the developers while they compare notes and reach consensus on
>> a single tweak that they can agree on being good enough, and then
>> remove that variable before the code hits 'next'.
>>
>> Thanks.
>
> I was under the impression that we would want a longer lived
> configuration until we had enough data to say whether it was
> helpful to make it default. I guess i had thought it would need to
> be longer lived since there may be cases where it's not optimal
> and being able to turn it off would be good?

Once you start worrying about "some cases this may misbehave", a
configuration variable is a wrong mechanism to do so anyway.  You
would need to tie this to attributes, so the users can say "use this
heuristics for my C code, but do not apply it for my AsciiDoc
input", etc.

What you have is a pure developer support; aim to come up with "good
enough" way, giving developers an easier way to experiment with, and
remove it before the feature is shipped to the end user.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 20:30         ` Junio C Hamano
@ 2016-04-15 21:15           ` Jacob Keller
  2016-04-15 21:22             ` Jacob Keller
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 21:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Jacob Keller, git@vger.kernel.org, Jeff King,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 1:30 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jacob Keller <jacob.keller@gmail.com> writes:
>
>> On Fri, Apr 15, 2016 at 1:06 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>>> I actually do not think these knobs should exist when the code is
>>> mature enough to be shipped to the end users.
>>>
>>> Use "diff.compactionHeuristics = <uint>" as an opaque set of bits to
>>> help the developers while they compare notes and reach consensus on
>>> a single tweak that they can agree on being good enough, and then
>>> remove that variable before the code hits 'next'.
>>>
>>> Thanks.
>>
>> I was under the impression that we would want a longer lived
>> configuration until we had enough data to say whether it was
>> helpful to make it default. I guess i had thought it would need to
>> be longer lived since there may be cases where it's not optimal
>> and being able to turn it off would be good?
>
> Once you start worrying about "some cases this may misbehave", a
> configuration variable is a wrong mechanism to do so anyway.  You
> would need to tie this to attributes, so the users can say "use this
> heuristics for my C code, but do not apply it for my AsciiDoc
> input", etc.
>

I think this makes perfect sense to apply this as an attribute,
however.. why isn't the current diff algorithm done this way?

Thanks,
Jake

> What you have is a pure developer support; aim to come up with "good
> enough" way, giving developers an easier way to experiment with, and
> remove it before the feature is shipped to the end user.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 21:15           ` Jacob Keller
@ 2016-04-15 21:22             ` Jacob Keller
  2016-04-15 21:44               ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 21:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Jacob Keller, git@vger.kernel.org, Jeff King,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 2:15 PM, Jacob Keller <jacob.keller@gmail.com> wrote:
> On Fri, Apr 15, 2016 at 1:30 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jacob Keller <jacob.keller@gmail.com> writes:
>>
>>> On Fri, Apr 15, 2016 at 1:06 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>>> I actually do not think these knobs should exist when the code is
>>>> mature enough to be shipped to the end users.
>>>>
>>>> Use "diff.compactionHeuristics = <uint>" as an opaque set of bits to
>>>> help the developers while they compare notes and reach consensus on
>>>> a single tweak that they can agree on being good enough, and then
>>>> remove that variable before the code hits 'next'.
>>>>
>>>> Thanks.
>>>
>>> I was under the impression that we would want a longer lived
>>> configuration until we had enough data to say whether it was
>>> helpful to make it default. I guess i had thought it would need to
>>> be longer lived since there may be cases where it's not optimal
>>> and being able to turn it off would be good?
>>
>> Once you start worrying about "some cases this may misbehave", a
>> configuration variable is a wrong mechanism to do so anyway.  You
>> would need to tie this to attributes, so the users can say "use this
>> heuristics for my C code, but do not apply it for my AsciiDoc
>> input", etc.
>>
>
> I think this makes perfect sense to apply this as an attribute,
> however.. why isn't the current diff algorithm done this way?
>
> Thanks,
> Jake
>
>> What you have is a pure developer support; aim to come up with "good
>> enough" way, giving developers an easier way to experiment with, and
>> remove it before the feature is shipped to the end user.


What are your thoughts on adding this do the gitattributes diff
section? Ie: modifications to the diff driver.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 21:22             ` Jacob Keller
@ 2016-04-15 21:44               ` Junio C Hamano
  2016-04-15 21:56                 ` Jacob Keller
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2016-04-15 21:44 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Stefan Beller, Jacob Keller, git@vger.kernel.org, Jeff King,
	Jens Lehmann, Davide Libenzi

Jacob Keller <jacob.keller@gmail.com> writes:

>>> What you have is a pure developer support; aim to come up with "good
>>> enough" way, giving developers an easier way to experiment with, and
>>> remove it before the feature is shipped to the end user.
>
> What are your thoughts on adding this do the gitattributes diff
> section? Ie: modifications to the diff driver.

I do try to foresee the future needs but I try to limit the forecast
to "just enough so that we won't waste engineering effort on a wrong
thing".  "It may need to be turned off conditionally" suggests we
would want attributes added per path, and that is sufficient to make
me say "don't waste time on bikeshedding configuration variable
names, it will not be in the final product".

We do not need yet to know what the final name of the attributes
are, or how exactly the bit to affect the low level mechanism will
be set by the attribute mechanism.  I do not think this topic is
there yet, and it is a waste of engineering effort to prematurely
trying to make things too flexible and customizable, when the thing
that will eventually become flexible by conditionally enabled is not
even there yet.

As long as the low-level thing has a knob, set of internal bits, to
enable and disable it, that is all that is necessary to know at this
point.

Having said all that, I'd expect we'd compute the right bit to use
in the same place where we currently pick the custom textconv
driver, diff backend, etc., by consulting the attribute system
before running the diff.

But again, I'd think it would be waste of time to think beyond that
at this point, identifying exactly at which line of which source
file the new code would go and what that new code would look like,
until we are ready to start integrating it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 21:44               ` Junio C Hamano
@ 2016-04-15 21:56                 ` Jacob Keller
  0 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2016-04-15 21:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Jacob Keller, git@vger.kernel.org, Jeff King,
	Jens Lehmann, Davide Libenzi

On Fri, Apr 15, 2016 at 2:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jacob Keller <jacob.keller@gmail.com> writes:
>
>>>> What you have is a pure developer support; aim to come up with "good
>>>> enough" way, giving developers an easier way to experiment with, and
>>>> remove it before the feature is shipped to the end user.
>>
>> What are your thoughts on adding this do the gitattributes diff
>> section? Ie: modifications to the diff driver.
>
> I do try to foresee the future needs but I try to limit the forecast
> to "just enough so that we won't waste engineering effort on a wrong
> thing".  "It may need to be turned off conditionally" suggests we
> would want attributes added per path, and that is sufficient to make
> me say "don't waste time on bikeshedding configuration variable
> names, it will not be in the final product".
>
> We do not need yet to know what the final name of the attributes
> are, or how exactly the bit to affect the low level mechanism will
> be set by the attribute mechanism.  I do not think this topic is
> there yet, and it is a waste of engineering effort to prematurely
> trying to make things too flexible and customizable, when the thing
> that will eventually become flexible by conditionally enabled is not
> even there yet.
>
> As long as the low-level thing has a knob, set of internal bits, to
> enable and disable it, that is all that is necessary to know at this
> point.
>
> Having said all that, I'd expect we'd compute the right bit to use
> in the same place where we currently pick the custom textconv
> driver, diff backend, etc., by consulting the attribute system
> before running the diff.
>
> But again, I'd think it would be waste of time to think beyond that
> at this point, identifying exactly at which line of which source
> file the new code would go and what that new code would look like,
> until we are ready to start integrating it.

Ok, for now I'll leave this as is then.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-04-15 21:56 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-15 16:51 [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics Jacob Keller
2016-04-15 16:51 ` [RFC PATCH, WAS: "weird diff output?" 1/2] xdiff: add xdl_hash_and_recmatch helper function Jacob Keller
2016-04-15 17:25   ` Junio C Hamano
2016-04-15 19:09     ` Jacob Keller
2016-04-15 16:51 ` [RFC PATCH, WAS: "weird diff output?" 2/2] xdiff: implement empty line chunk heuristic Jacob Keller
2016-04-15 19:57   ` Stefan Beller
2016-04-15 20:06     ` Junio C Hamano
2016-04-15 20:17       ` Jacob Keller
2016-04-15 20:30         ` Junio C Hamano
2016-04-15 21:15           ` Jacob Keller
2016-04-15 21:22             ` Jacob Keller
2016-04-15 21:44               ` Junio C Hamano
2016-04-15 21:56                 ` Jacob Keller
2016-04-15 20:06     ` Jacob Keller
2016-04-15 17:02 ` [RFC PATCH, WAS: "weird diff output?" 0/2] implement better chunk heuristics Stefan Beller
2016-04-15 17:10   ` Stefan Beller
2016-04-15 17:27     ` Junio C Hamano
2016-04-15 17:33       ` Stefan Beller
2016-04-15 17:48         ` Junio C Hamano
2016-04-15 19:09         ` Jacob Keller
2016-04-15 19:08     ` Jacob Keller
2016-04-15 19:07   ` Jacob Keller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).