git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: peff@peff.net
Cc: git@vger.kernel.org, jacob.keller@gmail.com,
	mhagger@alum.mit.edu, Stefan Beller <sbeller@google.com>
Subject: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines
Date: Thu, 16 Jun 2016 10:46:20 -0700	[thread overview]
Message-ID: <20160616174620.1011-1-sbeller@google.com> (raw)

The compaction heuristic for diffs was deemed quite good, but in 5580b27
we have an example demonstrates a common case, that fails with the existing
heuristic. That is why we disabled the heuristic in the v2.9.0 release.

With this patch a diff looks like:

         def bar
                 do_bar_stuff()

                 common_ending()
         end

+        def bal
+                do_bal_stuff()
+
+                common_ending()
+        end
+
         def baz
                 do_baz_stuff()

                 common_ending()
         end

whereas before we had:

  WIP (TODO: ask peff to provide an example that actually triggers, I seem to be
       unable to write one, though I thought the above was one)


The way we do it, is by inspecting the neighboring lines and see how
much indent there is and we choose the line that has the shortest amount
of blanks in the neighboring lines.

(TODO: update comments in the file)

Signed-off-by: Stefan Beller <sbeller@google.com>
---

Hi Jeff, hi Michael,

thanks for pointing out the flaw in this ruby code base before the 2.9 release.
I think this patch would fix your finding, though I cannot reproduce it.
Can you provide an example repo/patch that looks ugly on current origin/master,
so I can be sure this fixes the issue?

This can also be a start of a discussion if this is a too short-sighted enhancement
of the heuristic. Essentially we'd want to prefer 

+        end
+
+        def bal

over:

+                do_bal_stuff()
+
+                common_ending()

because there is less space between line start and {end, def bal}
than for {do_bal_stuff, common_ending}.

Thanks,
Stefan

 xdiff/xdiffi.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index b3c6848..58adc74 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -405,6 +405,35 @@ static int is_blank_line(xrecord_t **recs, long ix, long flags)
 	return xdl_blankline(recs[ix]->ptr, recs[ix]->size, flags);
 }
 
+static unsigned int leading_blank(const char *line)
+{
+	unsigned int ret = 0;
+	while (*line) {
+		if (*line == '\t')
+			ret += 8;
+		else if (*line == ' ')
+			ret ++;
+		else
+			break;
+		line++;
+	}
+	return ret;
+}
+
+static unsigned int surrounding_leading_blank(xrecord_t **recs, long ix,
+		long flags, long nrec)
+{
+	unsigned int i, ret = UINT_MAX;
+	if (ix > 0)
+		ret = leading_blank(recs[ix - 1]->ptr);
+	if (ix < nrec - 1) {
+		i = leading_blank(recs[ix + 1]->ptr);
+		if (i < ret)
+			ret = i;
+	}
+	return ret;
+}
+
 static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 {
 	return (recs[ixs]->ha == recs[ix]->ha &&
@@ -416,7 +445,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
-	unsigned int blank_lines;
+	unsigned int blank_lines, min_bl_neigh_indent;
 	xrecord_t **recs = xdf->recs;
 
 	/*
@@ -451,6 +480,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 		do {
 			grpsiz = ix - ixs;
 			blank_lines = 0;
+			min_bl_neigh_indent = UINT_MAX;
 
 			/*
 			 * If the line before the current change group, is equal to
@@ -485,7 +515,13 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the group.
 			 */
 			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
-				blank_lines += is_blank_line(recs, ix, flags);
+				if (is_blank_line(recs, ix, flags)) {
+					unsigned int bl_neigh_indent =
+						surrounding_leading_blank(recs, ix, flags, nrec);
+					if (min_bl_neigh_indent > bl_neigh_indent)
+						min_bl_neigh_indent = min_bl_neigh_indent;
+					blank_lines++;
+				}
 
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
@@ -525,6 +561,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 		if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
 			while (ixs > 0 &&
 			       !is_blank_line(recs, ix - 1, flags) &&
+			       surrounding_leading_blank(recs, ix - 1, flags, nrec) > min_bl_neigh_indent &&
 			       recs_match(recs, ixs - 1, ix - 1, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
-- 
2.9.0.8.gb6cbe66


             reply	other threads:[~2016-06-16 17:46 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16 17:46 Stefan Beller [this message]
2016-06-16 20:27 ` [PATCH] diff compaction heuristic: favor shortest neighboring blank lines Junio C Hamano
2016-06-16 21:06   ` Stefan Beller
2016-06-16 21:10 ` Michael Haggerty
2016-06-16 21:36   ` Stefan Beller
2016-06-17 15:36 ` Jeff King
2016-06-17 16:09   ` Stefan Beller
2016-06-23 17:10     ` Michael Haggerty
2016-06-23 17:25       ` Stefan Beller
2016-06-23 17:37       ` Junio C Hamano
2016-06-23 20:13         ` Michael Haggerty
2016-06-30 13:54       ` Michael Haggerty
2016-07-01 17:04         ` diff heuristics dramatically improved by considering line indentation and " Michael Haggerty
2016-07-01 18:01         ` [PATCH] diff compaction heuristic: favor shortest neighboring " Junio C Hamano
2016-07-04 14:33           ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160616174620.1011-1-sbeller@google.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=jacob.keller@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).